Fugu-MT: arXiv Paper Translations

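This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.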

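The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).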

These are the papers whose publication date is 20230614.

Title	Authors	Abstract	Publication date / Translation date
# Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning ( http://arxiv.org/abs/2304.11384v3 ) License: see the linked page	Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao	Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that, given a code snippet, they can only generate one comment, while developers usually need information from diverse perspectives, such as what the functionality of the code snippet is and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Our intuition is based on the facts that (1) the code and its pairwise comment are used during the pre-training process of LLMs to build the semantic connection between the natural language and programming language, and (2) comments in real-world projects, which are collected for the pre-training, usually contain different developers' intents. We thus postulate that the LLMs can already understand the code from different perspectives after the pre-training. Indeed, experiments on two large-scale datasets demonstrate the rationale of our insights: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach on generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs to achieve comment generation.	Translated: 2023-10-24 12:35:35 Published: 2023-06-14
# Characterizing Bugs in Python and R Data Analytics Programs ( http://arxiv.org/abs/2306.08632v1 ) License: see the linked page	Shibbir Ahmed, Mohammad Wardat, Hamid Bagheri, Breno Dantas Cruz, Hridesh Rajan	R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages w.r.t. bugs encountered in data analytics tasks. What types of bugs are common? What are the main root causes? What is the relation between bugs and root causes? How can these bugs be mitigated? We present a comprehensive study of 5,068 Stack Overflow posts, 1,800 bug fix commits from GitHub repositories, and several GitHub issues of the most used libraries to understand bugs in R and Python. Our key findings include: while both R and Python have bugs due to inexperience with data analysis, Python sees significantly more data preprocessing bugs compared to R. Developers experience significantly more data flow bugs in R because intermediate results are often implicit. We also found that changes and bugs in packages and libraries cause more bugs in R compared to Python, while package or library misselection and conflicts cause more bugs in Python than R. While R has a slightly higher readability barrier for data analysts, the statistical power of R leads to fewer bad performance bugs. In terms of data visualization, R packages have significantly more bugs than Python libraries. We also identified a strong correlation between comparable packages in R and Python despite their linguistic and methodological differences. Lastly, we contribute a large dataset of manually verified R and Python bugs.	Translated: 2023-10-23 19:48:22 Published: 2023-06-14
# Towards Automatic Identification of Violation Symptoms of Architecture Erosion ( http://arxiv.org/abs/2306.08616v1 ) License: see the linked page	Ruiyin Li, Peng Liang, Paris Avgeriou	Architecture erosion has a detrimental effect on maintenance and evolution, as the implementation drifts away from the intended architecture. To prevent this, development teams need to understand early enough the symptoms of erosion, and particularly violations of the intended architecture. One way to achieve this is through the automatic identification of architecture violations from textual artifacts, and particularly code reviews. In this paper, we developed 15 machine learning-based and 4 deep learning-based classifiers with three pre-trained word embeddings to identify violation symptoms of architecture erosion from developer discussions in code reviews. Specifically, we looked at code review comments from four large open-source projects from the OpenStack (Nova and Neutron) and Qt (Qt Base and Qt Creator) communities. We then conducted a survey to acquire feedback from the involved participants who discussed architecture violations in code reviews, to validate the usefulness of our trained classifiers. The results show that the SVM classifier based on word2vec pre-trained word embedding performs the best with an F1-score of 0.779. In most cases, classifiers with the fastText pre-trained word embedding model can achieve relatively good performance. Furthermore, classifiers that use 200-dimensional pre-trained word embedding models outperform classifiers that use 100- and 300-dimensional models. In addition, an ensemble classifier based on the majority voting strategy can further enhance the classifier and outperforms the individual classifiers. Finally, an online survey of the involved developers reveals that the violation symptoms identified by our approaches have practical value and can provide early warnings for impending architecture erosion.	Translated: 2023-10-23 19:48:01 Published: 2023-06-14
# Team Composition in Software Engineering Education ( http://arxiv.org/abs/2306.08431v1 ) License: see the linked page	Sajid Ibrahim Hashmi and Jouni Markkula	One of the objectives of software engineering education is to have students learn essential teamwork skills. This is done by having the students work in groups for course assignments. Student team composition plays a vital role in this, as it significantly affects learning outcomes, what is learned, and how. The study presented in this paper aims to better understand student team composition in software engineering education and to investigate the factors affecting it in the international software engineering education context. Those factors should be taken into consideration by software engineering teachers when they design group work assignments in their courses. In this paper, the initial findings of the ongoing action research study are presented. The results give some identified principles that should be considered when designing student team composition in software engineering courses.	Translated: 2023-10-23 19:47:38 Published: 2023-06-14
# A statistical approach for finding property-access errors ( http://arxiv.org/abs/2306.08741v1 ) License: see the linked page	Ellen Arteca, Max Schäfer, Frank Tip	We study the problem of finding incorrect property accesses in JavaScript, where objects do not have a fixed layout, and properties (including methods) can be added, overwritten, and deleted freely throughout the lifetime of an object. Since referencing a non-existent property is not an error in JavaScript, accidental accesses to non-existent properties (caused, perhaps, by a typo or by a misunderstanding of API documentation) can go undetected without thorough testing, and may manifest far from the source of the problem. We propose a two-phase approach for detecting property access errors based on the observation that, in practice, most property accesses will be correct. First, a large number of property access patterns are collected from an extensive corpus of real-world JavaScript code, and a statistical analysis is performed to identify anomalous usage patterns. Specific instances of these patterns may not be bugs (due, e.g., to dynamic type checks), so a local data-flow analysis filters out instances of anomalous property accesses that are safe and leaves only those likely to be actual bugs. We experimentally validate our approach, showing that on a set of 100 concrete instances of anomalous property accesses, the approach achieves a precision of 82% with a recall of 90%, making it suitable for practical use. We also conducted an experiment to determine how effective the popular VSCode code completion feature is at suggesting object properties, and found that, while it never suggested an incorrect property (precision of 100%), it failed to suggest the correct property in 62 out of 80 cases (recall of 22.5%). This shows that developers cannot rely on VSCode's code completion alone to ensure that all property accesses are valid.	Translated: 2023-10-23 19:34:58 Published: 2023-06-14
# Transpiling RTL Pseudo-code of the POWER Instruction Set Architecture to C for Real-time Performance Analysis on Cavatools Simulator ( http://arxiv.org/abs/2306.08701v1 ) License: see the linked page	Kinar S, Prashanth K V, Adithya Hegde, Aditya Subrahmanya Bhat, Narender M	This paper presents a transpiler framework for converting RTL pseudo-code of the POWER Instruction Set Architecture (ISA) to C code, enabling its execution on the Cavatools simulator. The transpiler consists of a lexer and parser, which parse the RTL pseudo-code and generate corresponding C code representations. The lexer tokenizes the input code, while the parser applies grammar rules to build an abstract syntax tree (AST). The transpiler ensures compatibility with the Cavatools simulator by generating C code that adheres to its requirements. The resulting C code can be executed on the Cavatools simulator, allowing developers to analyze the instruction-level performance of the Power ISA in real time. The proposed framework facilitates the seamless integration of RTL pseudo-code into the Cavatools ecosystem, enabling comprehensive performance analysis and optimization of Power ISA-based code.	Translated: 2023-10-23 19:34:01 Published: 2023-06-14
# Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning ( http://arxiv.org/abs/2306.08655v1 ) License: see the linked page	Susmita Haldar, Luiz Fernando Capretz	Predicting the number of defects in a project is critical for project test managers to allocate budget, resources, and schedule for testing, support, and maintenance efforts. Software defect prediction models predict the number of defects in given projects after training the model with historical defect-related information. The majority of defect prediction studies focused on predicting defect-prone modules from method- and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. This study utilizes software sizing metrics, effort metrics, and defect density information, and focuses on developing defect prediction models that apply various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. Consequently, the explainability of the developed model has been demonstrated using the state-of-the-art post-hoc model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified.	Translated: 2023-10-23 19:33:40 Published: 2023-06-14
# Deep Policy Gradient Methods in Commodity Markets ( http://arxiv.org/abs/2308.01910v1 ) License: see the linked page	Jonas Hanetho	The energy transition has increased the reliance on intermittent energy sources, destabilizing energy markets and causing unprecedented volatility, culminating in the global energy crisis of 2021. In addition to harming producers and consumers, volatile energy markets may jeopardize vital decarbonization efforts. Traders play an important role in stabilizing markets by providing liquidity and reducing volatility. Several mathematical and statistical models have been proposed for forecasting future returns. However, developing such models is non-trivial due to financial markets' low signal-to-noise ratios and nonstationary dynamics. This thesis investigates the effectiveness of deep reinforcement learning methods in commodities trading. It formalizes the commodities trading problem as a continuing discrete-time stochastic dynamical system. This system employs a novel time-discretization scheme that is reactive and adaptive to market volatility, providing better statistical properties for the sub-sampled financial time series. Two policy gradient algorithms, an actor-based and an actor-critic-based, are proposed for optimizing a transaction-cost- and risk-sensitive trading agent. The agent maps historical price observations to market positions through parametric function approximators utilizing deep neural network architectures, specifically CNNs and LSTMs. On average, the deep reinforcement learning models produce an 83 percent higher Sharpe ratio than the buy-and-hold baseline when backtested on front-month natural gas futures from 2017 to 2022. The backtests demonstrate that the risk tolerance of the deep reinforcement learning agents can be adjusted using a risk-sensitivity term. The actor-based policy gradient algorithm performs significantly better than the actor-critic-based algorithm, and the CNN-based models perform slightly better than those based on the LSTM.	Translated: 2023-10-23 15:31:41 Published: 2023-06-14
# The Story of Bose, Photon Spin and Indistinguishability ( http://arxiv.org/abs/2308.01909v1 ) License: see the linked page	Partha Ghose	As we approach the centenary of the discovery of quantum statistics in 1924, it is important to revisit Bose's original derivation of Planck's law, which is usually ignored in most standard presentations of Bose-Einstein statistics. It introduced not only the novel concept of the indistinguishability of photons but also that of their intrinsic spin, a fact unknown to most physicists.	Translated: 2023-10-23 15:31:12 Published: 2023-06-14
# Is there a Trojan! : Literature survey and critical evaluation of the latest ML based modern intrusion detection systems in IoT environments ( http://arxiv.org/abs/2310.10778v1 ) License: see the linked page	Vishal Karanam	IoT as a domain has grown so much in the last few years that it rivals mobile network environments in terms of data volumes as well as cybersecurity threats. The confidentiality and privacy of data within IoT environments have become very important areas of security research within the last few years. More and more security experts are interested in designing robust IDS systems to protect IoT environments as a supplement to the more traditional security methods. Given that IoT devices are resource-constrained and have a heterogeneous protocol stack, most traditional intrusion detection approaches don't work well within these schematic boundaries. This has led security researchers to innovate at the intersection of Machine Learning and IDS to solve the shortcomings of non-learning based IDS systems in the IoT ecosystem. Despite various ML algorithms already having high accuracy with IoT datasets, we can see a lack of sufficient production grade models. This survey paper presents a comprehensive summary of the latest learning-based approaches used in IoT intrusion detection systems, conducts a thorough critical review of these systems, potential pitfalls in ML pipelines, and challenges from an ML perspective, and discusses future research scope and recommendations.	Translated: 2023-10-23 02:21:32 Published: 2023-06-14
# M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery ( http://arxiv.org/abs/2307.05378v1 ) License: see the linked page	Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu, Tian Xie, Chenru Duan, John M. Gregoire, Carla P. Gomes	We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lags behind, which is partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets that cover 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for the purpose of generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect real-world materials discovery scenarios. State-of-the-art machine learning methods (including those that are suitable for materials structures but never compared in the literature) are benchmarked on representative tasks. Our codes and library are publicly available at https://github.com/yuanqidu/M2Hub.	Translated: 2023-07-16 03:43:09 Published: 2023-06-14
# Unconventional Cognitive Intelligent Robotic Control: Quantum Soft Computing Approach in Human Being Emotion Estimation -- QCOptKB Toolkit Application ( http://arxiv.org/abs/2307.06858v1 ) License: see the linked page	Sergey V. Ulyanov, Ichiro Kurawaki, Viktor S. Ulyanov, Takakhide Hagiwara	A strategy for intelligent cognitive control systems based on quantum and soft computing is presented. The synergetic effect of quantum self-organization of a knowledge base, extracted from the imperfect knowledge bases of intelligent fuzzy controllers, is described. This technology improves the robustness of intelligent cognitive control systems in hazardous control situations, which are described with a cognitive neuro-interface and different types of robot cooperation. Examples demonstrate the introduction of a quantum fuzzy inference gate design as a prepared programmable algorithmic solution for on-board embedded control systems. The possibility of applying a neuro-interface, based on a cognitive helmet with a quantum fuzzy controller, to driving a vehicle is shown.	Translated: 2023-07-16 03:17:24 Published: 2023-06-14
# The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks ( http://arxiv.org/abs/2306.16922v1 ) License: see the linked page	Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, Anna Levina	Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity for this many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration. To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and shows superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.	Translated: 2023-07-02 13:07:13 Published: 2023-06-14
# Towards social generative AI for education: theory, practices and ethics ( http://arxiv.org/abs/2306.10063v1 ) License: see the linked page	Mike Sharples	This paper explores educational interactions involving humans and artificial intelligences not as sequences of prompts and responses, but as a social process of conversation and exploration. In this conception, learners continually converse with AI language models within a dynamic computational medium of internet tools and resources. Learning happens when this distributed system sets goals, builds meaning from data, consolidates understanding, reconciles differences, and transfers knowledge to new domains. Building social generative AI for education will require development of powerful AI systems that can converse with each other as well as humans, construct external representations such as knowledge maps, access and contribute to internet resources, and act as teachers, learners, guides and mentors. This raises fundamental problems of ethics. Such systems should be aware of their limitations, their responsibility to learners and the integrity of the internet, and their respect for human teachers and experts. We need to consider how to design and constrain social generative AI for education.	Translated: 2023-06-26 01:29:51 Published: 2023-06-14
# Revealing the structure of language model capabilities ( http://arxiv.org/abs/2306.10062v1 ) License: see the linked page	Ryan Burnell, Han Hao, Andrew R. A. Conway, and Jose Hernandez Orallo	Building a theoretical understanding of the capabilities of large language models (LLMs) is vital for our ability to predict and explain the behavior of these systems. Here, we investigate the structure of LLM capabilities by extracting latent capabilities from patterns of individual differences across a varied population of LLMs. Using a combination of Bayesian and frequentist factor analysis, we analyzed data from 29 different LLMs across 27 cognitive tasks. We found evidence that LLM capabilities are not monolithic. Instead, they are better explained by three well-delineated factors that represent reasoning, comprehension and core language modeling. Moreover, we found that these three factors can explain a high proportion of the variance in model performance. These results reveal a consistent structure in the capabilities of different LLMs and demonstrate the multifaceted nature of these capabilities. We also found that the three abilities show different relationships to model properties such as model size and instruction tuning. These patterns help refine our understanding of scaling laws and indicate that changes to a model that improve one ability might simultaneously impair others. Based on these findings, we suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.	Translated: 2023-06-26 01:29:35 Published: 2023-06-14
# The Ontology for Agents, Systems and Integration of Services: OASIS version 2 ( http://arxiv.org/abs/2306.10061v1 ) License: see the linked page	Giampaolo Bella, Domenico Cantone, Carmelo Fabio Longo, Marianna Nicolosi-Asmundo and Daniele Francesco Santamaria	Semantic representation is a key enabler for several application domains, and the multi-agent systems realm is no exception. Among the methods for semantically representing agents, one has been essentially achieved by taking a behaviouristic vision, through which one can describe how they operate and engage with their peers. The approach essentially aims at defining the operational capabilities of agents through the mental states related to the achievement of tasks. The OASIS ontology -- An Ontology for Agent, Systems, and Integration of Services, presented in 2019 -- pursues the behaviouristic approach to deliver a semantic representation system and a communication protocol for agents and their commitments. This paper reports on the main modeling choices concerning the representation of agents in OASIS 2, the latest major upgrade of OASIS, and the achievements reached by the ontology since it was first introduced, in particular in the context of ontologies for blockchains.	Translated: 2023-06-26 01:29:16 Published: 2023-06-14
# MUBen:分子特性予測のための事前学習モデルの不確かさのベンチマーク MUBen: Benchmarking the Uncertainty of Pre-Trained Models for Molecular Property Prediction ( http://arxiv.org/abs/2306.10060v1 ) ライセンス: Link先を確認	Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang	(参考訳) 大量のラベルのない分子データに基づいて事前訓練された大きなトランスフォーマーモデルは、分子特性を予測することに成功している。しかし、これらのモデルは微調整中に過度に適合しがちであり、トレーニング分布の外側にあるテストデータに対する過密な予測が引き起こされる。この問題を解決するために、モデルのキャリブレーションを改善するために不確実量化法(UQ)を用いることができる。多くのUQアプローチが存在するが、それらすべてが性能改善につながるわけではない。分子前訓練モデルを改善するためにUQを用いた研究もあるが、信頼性の高い分子不確実性推定のための適切なバックボーンとUQ法を選択するプロセスはまだ未定である。このギャップに対処するために,backboneモデルとuqモデルの異なる組み合わせを評価し,特性予測と不確実性推定の両方のパフォーマンスを定量化するmubenを提案する。異なる分子記述子を用いた様々なバックボーン分子表現モデルを、異なるカテゴリからのUQ手法による入力として微調整することにより、アーキテクチャ決定とトレーニング戦略の影響を批判的に評価する。本研究は、材料科学や薬物発見などの分野における不確実性クリティカルな応用の研究を促進するために、UQモデルとバックボーンモデルを選択するための洞察を提供する。 Large Transformer models pre-trained on massive unlabeled molecular data have shown great success in predicting molecular properties. However, these models can be prone to overfitting during fine-tuning, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the models' calibration of predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have used UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different combinations of backbone and UQ models to quantify their performance for both property prediction and uncertainty estimation. By fine-tuning various backbone molecular representation models using different molecular descriptors as inputs with UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies. 
Our study offers insights for selecting UQ and backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.	翻訳日:2023-06-26 01:29:00 公開日:2023-06-14
# EM-Network:Oracleがシーケンス学習のための自己蒸留をガイド EM-Network: Oracle Guided Self-distillation for Sequence Learning ( http://arxiv.org/abs/2306.10058v1 ) ライセンス: Link先を確認	Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim	(参考訳) 我々は,seq2seq学習における目標情報を有効に活用する,新しい自己蒸留法であるem-networkを提案する。従来の手法とは対照的に、ターゲットシーケンスから派生したoracle guidanceでトレーニングされる。オラクルガイダンスは、タスク解決時にシーケンスモデルを支援するターゲット側コンテキストをコンパクトに表現するため、EM-Networkは、ソース入力のみを使用する場合よりも予測が優れている。そこで本研究では,EM-Network の有望な能力を引き継ぐために,EM-Network の知識を1段階的に活用できる新たな自己蒸留手法を提案する。音声認識のためのコネクショニスト時間分類(ctc)と機械翻訳のためのアテンションベースエンコーダデコーダ(aed)の2種類のseq2seqモデルについて包括的実験を行った。実験の結果,em-networkは,音声認識における最善の先行作業よりも改善し,wmt'14およびiwslt'14における最先端性能を確立した。 We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which is derived from the target sequence. Since the oracle guidance compactly represents the target-side context that can assist the sequence model in solving the task, the EM-Network achieves a better prediction compared to using only the source input. To allow the sequence model to inherit the promising capability of the EM-Network, we propose a new self-distillation strategy, where the original sequence model can benefit from the knowledge of the EM-Network in a one-stage manner. We conduct comprehensive experiments on two types of seq2seq models: connectionist temporal classification (CTC) for speech recognition and attention-based encoder-decoder (AED) for machine translation. Experimental results demonstrate that the EM-Network significantly advances the current state-of-the-art approaches, improving over the best prior work on speech recognition and establishing state-of-the-art performance on WMT'14 and IWSLT'14.	翻訳日:2023-06-26 01:28:38 公開日:2023-06-14
# 表現を理解するために生成する Generate to Understand for Representation ( http://arxiv.org/abs/2306.10056v1 ) ライセンス: Link先を確認	Changshang Xue, Xiande Zhong, Xiaoqing Liu	(参考訳) 近年,自然言語理解(NLU)や自然言語生成(NLG),テキスト表現タスクなど,高品質な事前訓練モデルが多数出現している。従来、これらのモデルはカスタムドメインコーパスで事前トレーニングされ、特定のタスク用に微調整されており、gpuの使用と労力に関するコストが高くなる。残念ながら、最近の言語モデリングのトレンドは、スケーリングによるパフォーマンス向上に移行し、関連するコストをさらに高めている。 GUR: 言語モデリングと対照的な学習目標を組み合わせた事前トレーニングフレームワークを,単一のトレーニングステップで導入する。文書からLCS(Longest Common Substring)に基づいて類似したテキストペアを選択し,マスク付き言語モデリングと教師なしコントラスト学習を用いてモデルを訓練する。その結果得られたモデルであるGURは、ラベル付きトレーニングデータを使わずに印象的な結果を得ることができ、ゼロショット設定でリコールベンチマークにおいて、他のトレーニング済みベースラインよりも優れている。さらに,我々のアブレーション実験で示されたように,GURは言語モデリング能力を維持している。我々のコードは \url{https://github.com/laohur/GUR} で入手できる。 In recent years, a significant number of high-quality pretrained models have emerged, greatly impacting Natural Language Understanding (NLU), Natural Language Generation (NLG), and Text Representation tasks. Traditionally, these models are pretrained on custom domain corpora and finetuned for specific tasks, resulting in high costs related to GPU usage and labor. Unfortunately, recent trends in language modeling have shifted towards enhancing performance through scaling, further exacerbating the associated costs. Introducing GUR: a pretraining framework that combines language modeling and contrastive learning objectives in a single training step. We select similar text pairs based on their Longest Common Substring (LCS) from raw unlabeled documents and train the model using masked language modeling and unsupervised contrastive learning. The resulting model, GUR, achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever at the recall benchmark in a zero-shot setting. Additionally, GUR maintains its language modeling ability, as demonstrated in our ablation experiment. Our code is available at \url{https://github.com/laohur/GUR}.	翻訳日:2023-06-26 01:28:17 公開日:2023-06-14
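The abstract says text pairs are selected by their Longest Common Substring (LCS). As a minimal sketch of that building block only, the following pure-Python routine finds the longest contiguous shared substring by dynamic programming. The helper `lcs_similarity` and its normalization by the shorter string are hypothetical and are not taken from the GUR paper or repository.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Classic O(len(a) * len(b)) dynamic programme that keeps only two rows.
    On ties, the match that ends earliest in `a` is returned.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in `a` one past the end of that match
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at (i-1, j-1).
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score: LCS length divided by the shorter string's length."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / min(len(a), len(b))
```

A pair-mining step could then keep the pairs whose score exceeds a threshold. How GUR actually thresholds or samples pairs is not stated in the abstract.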
# 効率的なメッシュセグメンテーションのためのニューラル形状直径関数 Neural Shape Diameter Function for Efficient Mesh Segmentation ( http://arxiv.org/abs/2306.11737v1 ) ライセンス: Link先を確認	Bruno Roy	(参考訳) 多角形メッシュを意味のある部分に分割することは難しい。多くのアプリケーションはコンピュータグラフィックスのさらなる処理のためにそのような構造を分解する必要がある。この10年間、膨大な計算時間を代償として、この問題に取り組むためのいくつかの方法が提案された。近年,3次元構造のセグメンテーション作業に機械学習が有効であることが証明されている。それでも、これらの最先端の手法は、しばしば一般化しにくく、オーバーフィッティングを避けるために学習したモデルをいくつかの特定のオブジェクトクラスに分割する必要がある。本稿では,複数のアプリケーション向けに,メッシュセグメンテーションに先立つマッピング関数を深層学習で符号化するデータ駆動型アプローチを提案する。我々のネットワークは, 頂点近傍間の類似性を利用し, Shape Diameter Function (SDF) 法の知識を用いて近傍マップを再現する。我々のアプローチは、入力メッシュをダウンサンプリングし、近傍の寄与のためだけに全解像度構造をクエリするので、解像度に依存しない。予測したSDF値を用いることで、得られた構造をグラフカットアルゴリズムに注入し、効率良くロバストなメッシュセグメンテーションを生成し、必要な計算時間をかなり削減できる。 Partitioning a polygonal mesh into meaningful parts can be challenging. Many applications require decomposing such structures for further processing in computer graphics. In the last decade, several methods were proposed to tackle this problem, at the cost of intensive computational times. Recently, machine learning has proven to be effective for the segmentation task on 3D structures. Nevertheless, these state-of-the-art methods are often hardly generalizable and require dividing the learned model into several specific classes of objects to avoid overfitting. We present a data-driven approach leveraging deep learning to encode a mapping function prior to mesh segmentation for multiple applications. Our network reproduces a neighborhood map using our knowledge of the Shape Diameter Function (SDF) method using similarities among vertex neighborhoods. Our approach is resolution-agnostic as we downsample the input meshes and query the full-resolution structure solely for neighborhood contributions. Using our predicted SDF values, we can inject the resulting structure into a graph-cut algorithm to generate an efficient and robust mesh segmentation while considerably reducing the required computation times.	翻訳日:2023-06-26 01:20:47 公開日:2023-06-14
# 容量達成入力分布の近傍における相互情報量 The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions ( http://arxiv.org/abs/2304.14219v3 ) ライセンス: Link先を確認	Barış Nakiboğlu and Hao-Chung Cheng	(参考訳) 容量達成入力分布の小さな近傍では、Topsøeによる恒等式とPinskerの不等式を用いることで、(複数の場合もある)線形制約と有限入力集合をもつすべてのチャネルについて、容量達成入力分布からの距離に伴う相互情報量の減少が、その距離の2乗の線形関数によって下から抑えられることが示される。そのような二次的な下界が存在しないことを示す反例は、無限個の線形制約の場合と無限入力集合の場合に与えられる。Pinskerの不等式ではなくテイラー級数近似を用いて、容量達成入力分布の小さな近傍において、容量達成入力分布からの距離に伴う相互情報量の最も遅い減少の正確な特徴付けを与える。出力密度作用素が可分ヒルベルト空間上で定義される古典量子チャネルに対しても、同様の結果が確立される。チャネル符号化問題に対するこれらの観測の意義と、関連する問題への証明手法の適用について論じる。 On small neighborhoods of the capacity-achieving input distributions, the decrease of the mutual information with the distance to the capacity-achieving input distributions is bounded below by a linear function of the square of the distance to the capacity-achieving input distributions for all channels with (possibly multiple) linear constraints and finite input sets using an identity due to Topsøe and Pinsker's inequality. Counter examples demonstrating non-existence of such a quadratic bound are provided for the case of infinite many linear constraints and the case of infinite input sets. Using a Taylor series approximation, rather than Pinsker's inequality, the exact characterization of the slowest decrease of the mutual information with the distance to the capacity-achieving input distributions is determined on small neighborhoods of the capacity-achieving input distributions. Analogous results are established for classical-quantum channels whose output density operators are defined on a separable Hilbert spaces. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.	翻訳日:2023-06-19 17:14:34 公開日:2023-06-14
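To see where the two named tools enter, here is a sketch of the textbook first step for an unconstrained discrete channel $W$ with capacity $C$. It only bounds the loss in terms of the output distributions. The paper's result is stated in terms of the distance between input distributions and covers linear constraints, which needs further argument not reproduced here. The notation ($q_p$, $q^{*}$, natural logarithms) is an assumption made for this sketch.

```latex
% Notation (illustrative): q_p(y) = \sum_x p(x) W(y|x) is the output distribution
% induced by input p; q^* is the capacity-achieving output distribution.
\begin{align}
  % Topsoe's identity, valid for any output distribution q:
  I(p;W) &= \sum_x p(x)\, D\!\left(W(\cdot|x) \,\|\, q\right) - D\!\left(q_p \,\|\, q\right), \\
  % Choosing q = q^* and using D(W(\cdot|x) \| q^*) \le C for every input x:
  C - I(p;W) &\ge D\!\left(q_p \,\|\, q^{*}\right), \\
  % Pinsker's inequality (in nats) then gives a quadratic lower bound:
  C - I(p;W) &\ge \tfrac{1}{2} \left\| q_p - q^{*} \right\|_{1}^{2}.
\end{align}
```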
# SAFER:顔の感情認識を意識した状況 SAFER: Situation Aware Facial Emotion Recognition ( http://arxiv.org/abs/2306.09372v1 ) ライセンス: Link先を確認	Mijanur Palash, Bharat Bhargava	(参考訳) 本稿では,表情から感情を認識する新しいシステムであるSAFERを提案する。最先端のディープラーニング技術を使用して、顔画像からさまざまな特徴を抽出し、背景や位置といったコンテキスト情報を組み込んでパフォーマンスを向上させる。このシステムはオープンワールドで動作するように設計されており、目に見えない様々な表情に適応でき、現実世界のアプリケーションに適している。この分野における既存の作業に対するSAFERの広範な評価は、CAER-Sデータセットで91.4%の精度で改善された性能を示す。さらに、Covid-19パンデミック時の顔マスクなどの新奇性が顔の感情認識に及ぼす影響を調査し、主流の表情データセットの限界を批判的に調査する。これらの制約に対処するために,表情認識のための新しいデータセットを提案する。提案するデータセットとシステムは,人間とコンピュータのインタラクションやセキュリティ,監視など,さまざまな用途に有用であると思われる。 In this paper, we present SAFER, a novel system for emotion recognition from facial expressions. It employs state-of-the-art deep learning techniques to extract various features from facial images and incorporates contextual information, such as background and location type, to enhance its performance. The system has been designed to operate in an open-world setting, meaning it can adapt to unseen and varied facial expressions, making it suitable for real-world applications. An extensive evaluation of SAFER against existing works in the field demonstrates improved performance, achieving an accuracy of 91.4% on the CAER-S dataset. Additionally, the study investigates the effect of novelty such as face masks during the Covid-19 pandemic on facial emotion recognition and critically examines the limitations of mainstream facial expressions datasets. To address these limitations, a novel dataset for facial emotion recognition is proposed. The proposed dataset and the system are expected to be useful for various applications such as human-computer interaction, security, and surveillance.	翻訳日:2023-06-19 16:47:42 公開日:2023-06-14
# デルタポテンシャル相互作用を持ついくつかの一次元量子力学モデルについて On some one-dimensional quantum-mechanical models with a delta-potential interaction ( http://arxiv.org/abs/2306.09371v1 ) ライセンス: Link先を確認	Francisco M. Fernández	(参考訳) 我々は無次元量子力学的方程式の体系的構成について論じる。このプロセスは、独立したモデルパラメータの数を最小限に減らし、同時に、長さやエネルギーなどの自然な単位を明確かつ直接的な方法で提供する。この体系的な手順を、$\hbar=1$ とおくという広く採用されている手順と比較する。具体例として、不均質媒質中の局在状態の研究のために最近提案された単純な一次元モデルをいくつか選択する。 We discuss a systematic construction of dimensionless quantum-mechanical equations. The process reduces the number of independent model parameters to a minimum and, at the same time, provides the natural units of length, energy, etc. in a clear, straightforward way. We compare this systematic procedure with the widely adopted one that consists of setting $\hbar=1$. As illustrative examples, we choose some simple one-dimensional models proposed recently for the study of localized states in inhomogeneous media.	翻訳日:2023-06-19 16:47:26 公開日:2023-06-14
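As a worked illustration of the procedure the abstract describes, consider the simplest case: a single attractive delta well of strength $g > 0$. This choice of model and the symbols $L$, $q$, $\varepsilon$ are assumptions made here. The models treated in the paper may differ.

```latex
% Starting equation (single attractive delta well; illustrative choice):
\begin{equation}
  -\frac{\hbar^{2}}{2m}\,\psi''(x) - g\,\delta(x)\,\psi(x) = E\,\psi(x).
\end{equation}
% Change of variable x = L q with q dimensionless, \varphi(q) = \psi(Lq),
% using \delta(Lq) = \delta(q)/L, then multiplying by m L^2 / \hbar^2:
\begin{equation}
  -\frac{1}{2}\,\varphi''(q) - \frac{m g L}{\hbar^{2}}\,\delta(q)\,\varphi(q)
  = \frac{m L^{2} E}{\hbar^{2}}\,\varphi(q).
\end{equation}
% Choosing the unit of length L = \hbar^2/(m g) removes every model parameter:
\begin{equation}
  -\frac{1}{2}\,\varphi''(q) - \delta(q)\,\varphi(q) = \varepsilon\,\varphi(q),
  \qquad
  \varepsilon = \frac{m L^{2} E}{\hbar^{2}} = \frac{\hbar^{2} E}{m g^{2}}.
\end{equation}
% The single bound state \varphi(q) = e^{-|q|} has \varepsilon = -1/2, that is,
% E = -m g^2 / (2 \hbar^2), with natural energy unit \hbar^2/(m L^2) = m g^2/\hbar^2.
```

The natural units of length and energy fall out of the substitution itself, which is the point of contrast with simply setting $\hbar = 1$.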
# Warpformer:不規則な臨床時系列のマルチスケールモデリング手法 Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series ( http://arxiv.org/abs/2306.09368v1 ) ライセンス: Link先を確認	Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, Jia Li	(参考訳) 不規則にサンプリングされた多変量時系列は、様々な分野、特に医療分野においてユビキタスであり、シリーズ内不規則性とシリーズ間不一致の2つの重要な特徴を示す。シリーズ内不規則性は、時系列信号が不規則な間隔でしばしば記録されるという事実であり、シリーズ間不一致はシリーズ間のサンプリングレートの顕著な変動を指す。しかし、不規則時系列の最近の進歩は、シリーズ間の不規則性の問題を見越して、シリーズ内不規則性に対処することに集中している。このギャップを埋めるために、これらの2つの特徴を完全に考慮した新しいアプローチであるWarpformerを提案する。簡単に言えば、warpformerには、シリーズ内不規則性とシリーズ間不一致の両方を明示的に特徴付ける特定の入力表現、所定のスケールで不規則な時系列を適応的に統一するワーピングモジュール、表現学習のためのカスタマイズされたアテンションモジュールなど、いくつかの重要な設計がある。さらに、複数のワープモジュールとアテンションモジュールを積み重ねて異なるスケールで学習し、下流のタスクに対して粗くきめ細かな信号のバランスをとるマルチスケール表現を生成する。広範に使用されるデータセットと臨床データベースから構築した新しい大規模ベンチマークについて広範な実験を行った。この結果は、warpformerが既存の最先端のアプローチよりも優れていることを示している。 Irregularly sampled multivariate time series are ubiquitous in various fields, particularly in healthcare, and exhibit two key characteristics: intra-series irregularity and inter-series discrepancy. Intra-series irregularity refers to the fact that time-series signals are often recorded at irregular intervals, while inter-series discrepancy refers to the significant variability in sampling rates among diverse series. However, recent advances in irregular time series have primarily focused on addressing intra-series irregularity, overlooking the issue of inter-series discrepancy. To bridge this gap, we present Warpformer, a novel approach that fully considers these two characteristics. In a nutshell, Warpformer has several crucial designs, including a specific input representation that explicitly characterizes both intra-series irregularity and inter-series discrepancy, a warping module that adaptively unifies irregular time series in a given scale, and a customized attention module for representation learning. 
Additionally, we stack multiple warping and attention modules to learn at different scales, producing multi-scale representations that balance coarse-grained and fine-grained signals for downstream tasks. We conduct extensive experiments on widely used datasets and a new large-scale benchmark built from clinical databases. The results demonstrate the superiority of Warpformer over existing state-of-the-art approaches.	翻訳日:2023-06-19 16:47:19 公開日:2023-06-14
# 2次元ハイゼンベルク反強磁性体の確率級数展開におけるループアンサンブル Loop ensembles in Stochastic Series Expansion of Two-Dimensional Heisenberg Antiferromagnets ( http://arxiv.org/abs/2306.09366v1 ) ライセンス: Link先を確認	Vedant Motamarri	(参考訳) 確率級数展開 (sse) 法はスピンあるいはフレーバー値の再開とともに、量子反強磁性体の分割関数を1つの高次元の密充填ループガスモデルに写像する。 Nahumらによる以前の研究は、特定の密充填された3次元ループガスモデルがマクロループに支配される位相を示し、ループ長の対応する結合分布はポアソン・ディリクレであることを示した。普遍性の観点からは、(2+1)次元の量子反強磁性体で得られるループのアンサンブルは、異なる微視的起源からループが現れるにもかかわらず、同様に予想される。モンテカルロを用いた二乗格子上のSU(N)反強磁性体に対するSSEループアンサンブルをサンプリングし、関節分布が表現度Nと逆温度$\beta$でどのように変化するかを調べる。低温および低N($\leq$4)では,反強磁性相関が系を支配している場合,ポアソン-ディリクレ挙動の特性が実際に示される。 The Stochastic Series Expansion (SSE) method along with resummation over the spin or flavor values maps the partition function of a quantum antiferromagnet to a closely-packed loop gas model in one higher dimension. Earlier work by Nahum et al. has shown that certain closely-packed three-dimensional loop gas models exhibit phases dominated by macroscopic loops, wherein the corresponding joint distribution of loop lengths is Poisson-Dirichlet. On grounds of universality, the same is expected of the ensemble of loops obtained in (2+1)-dimensional quantum antiferromagnets, albeit the loops emerge from a different microscopic origin. We sample the SSE loop ensemble for SU(N) antiferromagnets on a square lattice using Monte Carlo and study how the joint distribution varies with the degree of representation N and inverse temperature $\beta$. We observe that, for low temperatures and small N($\leq$ 4), the distribution indeed shows characteristics of Poisson-Dirichlet behaviour when antiferromagnetic correlations dominate the system.	翻訳日:2023-06-19 16:46:51 公開日:2023-06-14
# 関数次元削減法による誘導電動機の故障検出 Fault Detection in Induction Motors using Functional Dimensionality Reduction Methods ( http://arxiv.org/abs/2306.09365v1 ) ライセンス: Link先を確認	María Barroso, José M. Bossio, Carlos M. Alaíz and Ángela Fernández	(参考訳) 回転する電気機械の故障検出および診断のための戦略の実装は、現代の産業システムの信頼性と安全性に不可欠である。本研究の貢献は、誘導電動機の故障状況を検出し分類するために、従来のモータ電流シグネチャ解析の戦略と、関数主成分分析および関数拡散マップという関数次元削減手法を組み合わせた方法論である。提案手法から得られた結果は非常に有望であり, 誘導電動機における故障の存在をリアルタイムに検出するだけでなく, オフライン解析によってより多くの種類の故障を同定することにも将来利用できる可能性を示している。 The implementation of strategies for fault detection and diagnosis on rotating electrical machines is crucial for the reliability and safety of modern industrial systems. The contribution of this work is a methodology that combines conventional strategy of Motor Current Signature Analysis with functional dimensionality reduction methods, namely Functional Principal Components Analysis and Functional Diffusion Maps, for detecting and classifying fault conditions in induction motors. The results obtained from the proposed scheme are very encouraging, revealing a potential use in the future not only for real-time detection of the presence of a fault in an induction motor, but also in the identification of a greater number of types of faults present through an offline analysis.	翻訳日:2023-06-19 16:46:30 公開日:2023-06-14
# TSMixer:多変量時系列予測のための軽量MLPミクサモデル TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.09364v1 ) ライセンス: Link先を確認	Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam	(参考訳) トランスフォーマーは時系列予測において、長い列の相互作用を捉える能力で人気を集めている。しかし、その高いメモリとコンピューティング要件は長期的な予測に重大なボトルネックをもたらす。そこで本研究では,多層パーセプトロン(MLP)モジュールのみからなる軽量ニューラルネットワークTSMixerを提案する。 tsmixerはパッチ付き時系列の多変量予測と表現学習のために設計されており、トランスフォーマーの効率的な代替手段を提供する。我々のモデルはコンピュータビジョンにおけるMLP-Mixerモデルの成功からインスピレーションを得ている。時系列にVision MLP-Mixerを適用する際の課題を示し、精度を高めるために経験的検証されたコンポーネントを導入する。これは、階層構造やチャネル相関などの時系列特性を明示的にモデル化するための、MLP-Mixerバックボーンにオンライン和解ヘッドを付加する新しい設計パラダイムを含む。また,既存のパッチチャネル混合方式では一般的な課題である,多種多様なデータセット間のノイズチャネルインタラクションと一般化を効果的に処理するためのハイブリッドチャネルモデリング手法を提案する。さらに、重要な特徴を優先するために、バックボーンに単純なゲートアテンション機構が導入される。これらの軽量なコンポーネントを組み込むことで、単純なmlp構造の学習能力を大幅に向上させ、最小の計算使用量で複雑なトランスフォーマーモデルを上回る。さらに、TSMixerのモジュール設計により、教師付きとマスク付きの両方の自己教師付き学習手法との互換性が実現され、時系列基礎モデルのための有望なビルディングブロックとなる。 TSMixer は最先端の MLP と Transformer のモデルよりも 8-60% の差で予測できる。また、Patch-Transformerモデルの最新の強力なベンチマーク(1～2%)を上回り、メモリとランタイム(2～3倍)を大幅に削減した。 Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. 
This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).	翻訳日:2023-06-19 16:46:20 公開日:2023-06-14
# 特徴分布歪型フェデレーション学習のための簡易データ拡張法 A Simple Data Augmentation for Feature Distribution Skewed Federated Learning ( http://arxiv.org/abs/2306.09363v1 ) ライセンス: Link先を確認	Yunlu Yan, Lei Zhu	(参考訳) フェデレートラーニング(FL)は、複数のクライアント間の協調学習を分散的に支援し、プライバシ保護を保証する。しかし、その性能は不均一なデータ、すなわち非IIDデータとして必然的に劣化する。本稿では,現実世界のアプリケーションに広く普及しているFLシナリオの特徴分布に着目した。主な課題は、ローカルデータセットのさまざまな基礎的な分布に起因する機能シフトにある。前回の試みは進展したが、データ自体に注意を払う研究はほとんどなく、この問題の根源となっている。そこで本論文の主な目的は,特徴シフトを軽減するため,入力レベルでの汎用データ拡張手法を開発することである。この目的を達成するために,フェデレーション全体からクライアントのデータにデータセットの統計をランダムに注入する特徴分散スキュードFLのための,シンプルで驚くほど効果的なデータ拡張手法であるFedRDNを提案する。これにより,特徴の一般化を効果的に改善でき,特徴のシフトを緩和できる。さらにFedRDNは、数行のコードだけでデータ拡張フローにシームレスに統合できるプラグイン・アンド・プレイコンポーネントである。いくつかのデータセットに対する大規模な実験により、FedRDNと組み合わせることで、様々な代表FLワークの性能をさらに向上できることが示されている。ソースコードはリリースされます。 Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While the previous attempts achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level, to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. By this, our method can effectively improve the generalization of features, thereby mitigating the feature shift. 
Moreover, FedRDN is a plug-and-play component, which can be seamlessly integrated into the data augmentation flow with only a few lines of code. Extensive experiments on several datasets show that the performance of various representative FL works can be further improved by combining them with FedRDN, which demonstrates the strong scalability and generalizability of FedRDN. The source code will be released.	翻訳日:2023-06-19 16:45:50 公開日:2023-06-14
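The idea of injecting federation-wide dataset statistics into a client's data can be sketched in a few lines. This is a loose illustration under stated assumptions, not the released FedRDN code: features are reduced to scalars, the statistics are a plain mean and standard deviation, and the function names (`client_statistics`, `augment_batch`) are hypothetical.

```python
import random
import statistics


def client_statistics(samples):
    """Mean and population standard deviation of one client's local data."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def normalize(x, mean, std, eps=1e-8):
    """Standardize x with the given statistics; eps guards against std == 0."""
    return (x - mean) / (std + eps)


def augment_batch(batch, federation_stats, rng):
    """Normalize each sample with statistics drawn at random from the federation.

    `federation_stats` is a list of (mean, std) pairs, one per client.
    Assumption: only these aggregate statistics are shared, never raw data.
    """
    augmented = []
    for x in batch:
        mean, std = rng.choice(federation_stats)  # pick one client's statistics
        augmented.append(normalize(x, mean, std))
    return augmented
```

In this sketch a client would call `augment_batch` during local training and fall back to its own `(mean, std)` at test time. That train/test split is our reading of the method and should be checked against the paper.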
# パラメータアウェアポリシを用いた汎用ワンショットロープマニピュレーション Generalizable One-shot Rope Manipulation with Parameter-Aware Policy ( http://arxiv.org/abs/2306.09872v1 ) ライセンス: Link先を確認	So Kuroki, Jiaxian Guo, Tatsuya Matsushima, Takuya Okubo, Masato Kobayashi, Yuya Ikeda, Ryosuke Takanami, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa	(参考訳) 従来のロープ操作では、動作中の変形性に固有の不確実性があるため、ロープのゴール到達のような単純なタスクであっても、ロープの操作ポリシーをトレーニングするために、何百もの実世界のデモを必要とする場合が多い。この問題に対処するため、実世界の1つのデモで異なる変形可能なロープを操作できるフレームワークであるGenORMを紹介します。これを実現するために, 変形可能なロープパラメータに条件付けし, 各種の模擬変形可能なロープをトレーニングすることにより, 異なるロープパラメータに基づいて動作を調整できるようにした。新しいロープが与えられたとき、GenORMは、実世界の実演とシミュレーションの点雲の格子密度の差を最小限にして、変形可能なロープパラメータを推定する。微分可能な物理シミュレータの助けを借りて、我々は1つの実世界のデモンストレーションしか必要としない。シミュレーションと実世界のロープ操作の両セットアップにおける実証的検証により,1回のデモンストレーションで異なるロープを操作でき,両環境でのベースラインを著しく上回る(ドメイン内ロープの62%向上,シミュレーションでの分散外ロープの15%向上,実世界の26%改善)ことが明らかとなり,ワンショットロープ操作におけるアプローチの有効性が実証された。 Due to the inherent uncertainty in their deformability during motion, previous methods in rope manipulation often require hundreds of real-world demonstrations to train a manipulation policy for each rope, even for simple tasks such as rope goal reaching, which hinder their applications in our ever-changing world. To address this issue, we introduce GenORM, a framework that allows the manipulation policy to handle different deformable ropes with a single real-world demonstration. To achieve this, we augment the policy by conditioning it on deformable rope parameters and training it with a diverse range of simulated deformable ropes so that the policy can adjust actions based on different rope parameters. At the time of inference, given a new rope, GenORM estimates the deformable rope parameters by minimizing the disparity between the grid density of point clouds of real-world demonstrations and simulations. With the help of a differentiable physics simulator, we require only a single real-world demonstration. 
Empirical validations on both simulated and real-world rope manipulation setups clearly show that our method can manipulate different ropes with a single demonstration and significantly outperforms the baseline in both environments (62% improvement in in-domain ropes, and 15% improvement in out-of-distribution ropes in simulation, 26% improvement in real-world), demonstrating the effectiveness of our approach in one-shot rope manipulation.	翻訳日:2023-06-19 13:31:49 公開日:2023-06-14
# 養殖システムにおける摂水制御と水質モニタリング--機会と課題 Feeding control and water quality monitoring in aquaculture systems: Opportunities and challenges ( http://arxiv.org/abs/2306.09920v1 ) ライセンス: Link先を確認	Fahad Aljehani, Ibrahima N'Doye, Taous-Meriem Laleg-Kirati	(参考訳) 養殖システムは、運営コストと魚の損失を低減し、成長生産効率を高め、魚の福祉と健康に繋がる先進的な管理戦略の最近の発展から恩恵を受けることができる。水質のモニタリングと給餌の制御は、魚類の生産性のバランスと魚類の成長過程の形成の基本的な要素である。現在、ほとんどの魚の養殖プロセスは異なる段階で手動で行われ、時間と挑戦的な人工的差別に依存している。摂餌制御アプローチは、飼料転換率を通じて魚類の成長と繁殖に影響を与えるため、これらの摂餌パラメータの制御は、魚の福祉の強化と一般的な漁業コストの最小化に不可欠である。アンモニア濃度やpHなどの環境因子の高濃度は水質や魚の生存に影響を及ぼす。したがって、最適で効率的で信頼性の高い供給プロセスを決定し、水質を監視するための制御戦略を開発する必要がある。本稿では,養殖システムにおける魚の成長制御技術,すなわち動的養殖プロセスの給水・水質を最適化するアルゴリズムについて概説する。具体的には,魚の成長と生存を最適化するためのモデルベース制御手法とモデルフリー強化学習戦略について検討した。モデルフリーフレームワークは近似魚の成長動的モデルを使用し、制約を満たさない。モデルに基づくアプローチが強化学習フレームワークをどのようにサポートし、制約満足度を効率的に処理し、価値に基づく強化学習からより良い軌道とポリシーを見つけるかについて議論する。 Aquaculture systems can benefit from the recent development of advanced control strategies to reduce operating costs and fish loss and increase growth production efficiency, resulting in fish welfare and health. Monitoring the water quality and controlling feeding are fundamental elements of balancing fish productivity and shaping the fish growth process. Currently, most fish-feeding processes are conducted manually in different phases and rely on time-consuming and challenging artificial discrimination. The feeding control approach influences fish growth and breeding through the feed conversion rate; hence, controlling these feeding parameters is crucial for enhancing fish welfare and minimizing general fishery costs. The high concentration of environmental factors, such as a high ammonia concentration and pH, affect the water quality and fish survival. Therefore, there is a critical need to develop control strategies to determine optimal, efficient, and reliable feeding processes and monitor water quality. 
This paper reviews the main control design techniques for fish growth in aquaculture systems, namely algorithms that optimize the feeding and water quality of a dynamic fish growth process. Specifically, we review model-based control approaches and model-free reinforcement learning strategies to optimize the growth and survival of the fish or track a desired reference live-weight growth trajectory. The model-free framework uses an approximate fish growth dynamic model and does not satisfy constraints. We discuss how model-based approaches can support a reinforcement learning framework to efficiently handle constraint satisfaction and find better trajectories and policies from value-based reinforcement learning.	翻訳日:2023-06-19 13:12:35 公開日:2023-06-14
# 等スペクトル変形による自由粒子から複素KdVブリーザーへ Free Particle to Complex KdV breathers through Isospectral Deformation ( http://arxiv.org/abs/1110.3708v5 ) ライセンス: Link先を確認	Kumar Abhinav, Aradhya Shukla, and Prasanta K. Panigrahi	(参考訳) 実空間における量子力学の自由粒子には超対称性が備わっており、これはパリティ(P)と時間反転(T)の対称性が組み込まれた複素スペクトルへの自然な拡張を可能にする。また、PT対称性の破れていない相と破れた相の起源と、それぞれの実固有値および複素固有値との関係についても説明し、後者はさらにゼロ幅共鳴を示す。これは、固有値問題を複素平面へ拡張することで、拡大されたヒルベルト空間に束縛状態と減衰状態を取り込めるようになるために可能となる。超対称性においてスペクトルを変えずにポテンシャルを修正できるという固有の自由度は、KdVの複素ブリーザー解とPT対称性および複素平面上の自由粒子との関係を自然に説明する。さらに、破れたPT相における非自明なゼロ幅共鳴は、sl(2, R) ポテンシャル代数に直接結びつく一般化を必要とする。 The free particle in quantum mechanics in real space is endowed with supersymmetry, which enables a natural extension to complex spectra with a built-in parity (P) and time reversal (T) symmetry. It also explains the origin of unbroken and broken phases of the PT-symmetry and their relationship with the real and complex eigenvalues respectively, the latter further displaying zero-width resonances. This is possible as the extension of the eigenvalue problem to the complex plane enables the incorporation of bound and decaying states in the enlarged Hilbert space. The inherent freedom of modification of the potential without changing the spectra in supersymmetry naturally explains the connection of complex breather solutions of KdV with PT-symmetry and the free particle on the complex plane. Further, non-trivial zero-width resonances in the broken PT phase mandate a generalization that is directly connected to the sl(2, R) potential algebra.	翻訳日:2023-06-18 14:56:39 公開日:2023-06-14
# カテゴリ間の変換へのロバストネス:不変神経表現による変換へのロバストネス? Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations? ( http://arxiv.org/abs/2007.00112v4 ) ライセンス: Link先を確認	Hojin Jang, Syed Suleman Abbas Zaidi, Xavier Boix, Neeraj Prasad, Sharon Gilad-Gutnick, Shlomit Ben-Ami, Pawan Sinha	(参考訳) 深層畳み込みニューラルネットワーク(DCNN)は、これらの変換がトレーニングセットに含まれる場合、変換中のオブジェクト(例えば、ぼやけやノイズ)を認識するための印象的な堅牢性を示している。このようなロバスト性を説明する仮説は、dcnnが画像が変換された後も不変な神経表現を発達させることである。しかし、この仮説がどの程度真であるかは、例えば不変性とは異なる性質で変換に対する堅牢性が達成できるため、決定的な疑問である。ネットワークの一部は、変換された画像または非変換された画像を認識するために特化することができる。本稿では, 学習分布を超えた変換に対するロバスト性を促進することによって, 不変な神経表現が出現する条件について検討する。具体的には、トレーニング中にいくつかのオブジェクトカテゴリのみが変換されるトレーニングパラダイムを分析し、dcnnが変換されないカテゴリ全体の変換にロバストであるかどうかを評価する。その結果,不変なニューラルネットワーク表現がない場合でも,ネットワークはトレーニング中に変換されるカテゴリのロバスト性を示すため,不変なニューラルネットワーク表現が変換に対するロバスト性を常に駆動するとは限らない。不変性は、トレーニングセット内の変換されたカテゴリの数が増えるときにのみ現れる。この現象は、物体の空間配置の変化を伴う回転や薄化のような幾何学的変換よりも、ぼやけやハイパスフィルタリングのような局所的変換で顕著である。その結果,深層学習における不変神経表現の理解が深層学習と自然発生状態の理解を深めることができた。 Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations (eg. blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance, eg. parts of the network could be specialized to recognize either transformed or non-transformed images. This paper investigates the conditions under which invariant neural representations emerge by leveraging that they facilitate robustness to transformations beyond the training distribution. 
Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance only emerges as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which it spontaneously emerges.	翻訳日:2023-06-17 04:48:12 公開日:2023-06-14
# Entropic Issues in Likelihood-Based OOD Detection ( http://arxiv.org/abs/2109.10794v2 ) License: see link	Anthony L. Caterini, Gabriel Loaiza-Ganem	Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.	Translated: 2023-06-17 04:43:29 Published: 2023-06-14
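The decomposition mentioned in this abstract can be written out explicitly. The following is a sketch in notation assumed here rather than taken from the paper: $p$ is the distribution that generated a dataset, $q_\theta$ is the trained model density, $\mathrm{KL}$ is the KL divergence, and $H$ is the (differential) entropy.

```latex
% Average log-likelihood of data drawn from p under the model q_theta
\mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right]
  = -\,\mathrm{KL}\left(p \,\|\, q_\theta\right) - H(p)

% For a ratio of two models evaluated on the same data, H(p) cancels
\mathbb{E}_{x \sim p}\left[\log q_{\theta_1}(x) - \log q_{\theta_2}(x)\right]
  = \mathrm{KL}\left(p \,\|\, q_{\theta_2}\right) - \mathrm{KL}\left(p \,\|\, q_{\theta_1}\right)
```

Read this way, a dataset with larger $H(p)$ receives a lower average log-likelihood even when the KL term is small, while the expected likelihood ratio depends only on the two KL terms.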
# LLVIP: A Visible-infrared Paired Dataset for Low-light Vision ( http://arxiv.org/abs/2108.10831v4 ) License: see link	Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, Shengjie Liu, Wenli Zhou	Various visual tasks such as image fusion, pedestrian detection and image-to-image translation are very challenging in low light conditions due to the loss of effective target areas. In this case, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. This dataset contains 30976 images, or 15488 pairs, most of which were taken at very dark scenes, and all of the images are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of some popular visual algorithms including image fusion, pedestrian detection and image-to-image translation on the dataset. The experimental results demonstrate the complementary effect of fusion on image information, and find the deficiency of existing algorithms of the three visual tasks in very low-light conditions. We believe the LLVIP dataset will contribute to the community of computer vision by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. The dataset is being released in https://bupt-ai-cz.github.io/LLVIP. Raw data is also provided for further research such as image registration.	Translated: 2023-06-17 04:43:14 Published: 2023-06-14
# Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses ( http://arxiv.org/abs/2106.09779v8 ) License: see link	Andrew Lowy and Meisam Razaviyayn	This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.	Translated: 2023-06-17 04:42:55 Published: 2023-06-14
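To make the requirement that a silo's communications be record-level private more concrete, here is a minimal Python sketch of a generic Gaussian-mechanism style update: the silo clips each record's gradient, averages, and adds noise before anything leaves the silo. This is an illustration under stated assumptions, not the paper's algorithm; the names `clip`, `noisy_silo_gradient`, `max_norm`, and `sigma` are hypothetical, and no specific privacy budget is claimed for any value of `sigma`.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def noisy_silo_gradient(per_record_grads, max_norm, sigma, rng=random):
    """Average clipped per-record gradients, then add Gaussian noise.

    sigma is a hypothetical noise scale; calibrating it to a target
    privacy budget is outside the scope of this sketch.
    """
    n = len(per_record_grads)
    dim = len(per_record_grads[0])
    # Clipping bounds how much any single record can move the average.
    clipped = [clip(g, max_norm) for g in per_record_grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(dim)]
    # Noise is added inside the silo, before the message is sent out.
    return [m + rng.gauss(0.0, sigma) for m in mean]
```

The design point the sketch tries to capture is where the noise is added: inside the silo, so that neither the server nor other silos ever see an un-noised message.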
# Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population ( http://arxiv.org/abs/2105.00581v2 ) License: see link	Rui Chen, Jared D. Huling, Guanhua Chen, Menggang Yu	Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.	Translated: 2023-06-17 04:42:02 Published: 2023-06-14
# Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs ( http://arxiv.org/abs/2010.15285v3 ) License: see link	Raif Rustamov and Subhabrata Majumdar	Collections of probability distributions arise in a variety of applications ranging from user activity pattern analysis to brain connectomics. In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, allowing us to reduce the distribution collection comparison problem to a more familiar mean testing problem in a Hilbert space. We provide two testing procedures, one based on resampling and another on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and the p-values are well-calibrated.	Translated: 2023-06-17 04:40:57 Published: 2023-06-14
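Sliced Wasserstein distances in general are built from one-dimensional Wasserstein distances, which are cheap to compute from sorted samples. The sketch below shows only that one-dimensional building block, under the assumption of two samples of equal size with uniform weights; the function name `wasserstein_1d` is hypothetical, and the paper's intrinsic slicing construction on manifolds and graphs is not reproduced here.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two empirical distributions with equally many samples.

    With uniform weights and equal sample counts, the optimal coupling
    matches the i-th smallest point of xs to the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


# Usage: shifting every sample by 1 gives a distance of 1.
print(wasserstein_1d([0.0, 1.0], [2.0, 1.0]))  # 1.0
```

A sliced distance would then aggregate such one-dimensional distances over many slices; how the slices are obtained is exactly what the intrinsic construction in the abstract addresses.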
# Investigating Membership Inference Attacks under Data Dependencies ( http://arxiv.org/abs/2010.12112v4 ) License: see link	Thomas Humphries, Simon Oya, Lindsey Tulloch, Matthew Rafuse, Ian Goldberg, Urs Hengartner, Florian Kerschbaum	Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.	Translated: 2023-06-17 04:40:41 Published: 2023-06-14
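One standard way to see how a privacy parameter can grow with the number of dependent records is the group privacy property of differential privacy. Presenting it here is an assumption about the kind of reasoning involved, not a quotation of the paper's argument; $M$, $D$, $D'$, $S$, and $k$ are notation introduced for this sketch.

```latex
% If M is \epsilon-DP, then for all datasets D, D' differing in at most
% k records and every measurable set of outputs S:
\Pr\left[M(D) \in S\right] \le e^{k\epsilon}\, \Pr\left[M(D') \in S\right]
```

If changing one person's data can effectively change up to $k$ correlated records, the guarantee that applies is the one with $k\epsilon$, and in the worst case $k$ can be as large as the training set size $n$.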
# Factorized linear discriminant analysis and its application in computational biology ( http://arxiv.org/abs/2010.02171v5 ) License: see link	Mu Qiao	Navigating the complex landscape of single-cell transcriptomic data presents significant challenges. Central to this challenge is the identification of a meaningful representation of high-dimensional gene expression patterns that sheds light on the structural and functional properties of cell types. Pursuing model interpretability and computational simplicity, we often look for a linear transformation of the original data that aligns with key phenotypic features of cells. In response to this need, we introduce factorized linear discriminant analysis (FLDA), a novel method for linear dimensionality reduction. The crux of FLDA lies in identifying a linear function of gene expression levels that is highly correlated with one phenotypic feature while minimizing the influence of others. To augment this method, we integrate it with a sparsity-based regularization algorithm. This integration is crucial as it selects a subset of genes pivotal to a specific phenotypic feature or a combination thereof. To illustrate the effectiveness of FLDA, we apply it to transcriptomic datasets from neurons in the Drosophila optic lobe. We demonstrate that FLDA not only captures the inherent structural patterns aligned with phenotypic features but also uncovers key genes associated with each phenotype.	Translated: 2023-06-17 04:40:21 Published: 2023-06-14
# Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks ( http://arxiv.org/abs/2202.00293v4 ) License: see link	Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová	Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.	Translated: 2023-06-17 04:34:42 Published: 2023-06-14
# A Post-Quantum Associative Memory ( http://arxiv.org/abs/2201.12305v2 ) License: see link	Ludovico Lami, Daniel Goldwater, Gerardo Adesso	Associative memories are devices storing information that can be fully retrieved given partial disclosure of it. We examine a toy model of associative memory and the ultimate limitations it is subjected to within the framework of general probabilistic theories (GPTs), which represent the most general class of physical theories satisfying some basic operational axioms. We ask ourselves how large the dimension of a GPT should be so that it can accommodate $2^m$ states with the property that any $N$ of them are perfectly distinguishable. Call $d(N,m)$ the minimal such dimension. Invoking an old result by Danzer and Grünbaum, we prove that $d(2,m)=m+1$, to be compared with $O(2^m)$ when the GPT is required to be either classical or quantum. This yields an example of a task where GPTs outperform both classical and quantum theory exponentially. More generally, we resolve the case of fixed $N$ and asymptotically large $m$, proving that $d(N,m) \leq m^{1+o_N(1)}$ (as $m\to\infty$) for every $N\geq 2$, which yields again an exponential improvement over classical and quantum theories. Finally, we develop a numerical approach to the general problem of finding the largest $N$-wise mutually distinguishable set for a given GPT, which can be seen as an instance of the maximum clique problem on $N$-regular hypergraphs.	Translated: 2023-06-17 04:34:28 Published: 2023-06-14
# Focusing on Potential Named Entities During Active Label Acquisition ( http://arxiv.org/abs/2111.03837v3 ) License: see link	Ali Osman Berk Sapci, Oznur Tastan, Reyyan Yeniterzi	Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.	Translated: 2023-06-17 04:33:03 Published: 2023-06-14
# Self-conditioning pre-trained language models ( http://arxiv.org/abs/2110.02802v4 ) License: see link	Xavier Suau, Luca Zappella, Nicholas Apostoloff	In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that the activation of a surprisingly small number of units is sufficient to steer text generation (as little as 3 units in a model with 345M parameters). While the objective of this work is to learn more about how TLMs work, we show that our method is effective for conditioning without fine-tuning or using extra parameters, even on fine-grained homograph concepts. Additionally, we show that our method can be used to correct gender bias present in the output of TLMs and achieves gender parity for all evaluated contexts. We compare our method with FUDGE and PPLM-BoW, and show that our approach is able to achieve gender parity at a lower perplexity. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute needs. The findings in this paper are a step forward in understanding the generative mechanisms of TLMs.	Translated: 2023-06-17 04:31:26 Published: 2023-06-14
# Visualizing Deep Neural Networks with Topographic Activation Maps ( http://arxiv.org/abs/2204.03528v2 ) License: see link	Valerie Krug, Raihan Kabir Ratul, Christopher Olson, Sebastian Stober	Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.	Translated: 2023-06-17 04:23:49 Published: 2023-06-14
# Accessing inaccessible information via quantum indistinguishability ( http://arxiv.org/abs/2203.16592v3 ) License: see link	Sebastian Horvat, Borivoje Dakić	In this paper we present and analyze an information-theoretic task that consists in learning a bit of information by spatially moving the "target" particle that encodes it. On one hand, we show that the task can be solved with the use of additional independently prepared quantum particles, only if these are indistinguishable from the target particle. On the other hand, the task can be solved with the use of distinguishable quantum particles, only if they are entangled with the target particle. These two features, as we argue, support the following claim: the entanglement that, in the first quantization formalism, appears to be inherently present in systems comprised of independently prepared indistinguishable quantum particles, is more than a mere representational artefact and can indeed be used as a resource for information processing. Besides analyzing the class of quantum-mechanical protocols that solve our task, we gesture towards possible ways of generalizing our results and of applying them in cryptography.	Translated: 2023-06-17 04:23:32 Published: 2023-06-14
# SC2 Benchmark: Supervised Compression for Split Computing ( http://arxiv.org/abs/2203.08875v2 ) License: see link	Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt	With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.	Translated: 2023-06-17 04:22:20 Published: 2023-06-14
# Compositional Scene Representation Learning via Reconstruction: A Survey ( http://arxiv.org/abs/2202.07135v4 ) License: see link	Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue	Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.	Translated: 2023-06-17 04:21:42 Published: 2023-06-14
# 不均一無線ネットワーク上での動的分散モデルトレーニングのための並列逐次学習 Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks ( http://arxiv.org/abs/2202.02947v6 ) ライセンス: Link先を確認	Seyyedali Hosseinalipour, Su Wang, Nicolo Michelusi, Vaneet Aggarwal, Christopher G. Brinton, David J. Love, Mung Chiang	(参考訳) フェデレートラーニング(FedL)は,一連の無線デバイス上で,反復的なローカルアップデート(デバイス)とグローバルアグリゲーション(サーバ)を通じて,モデルトレーニングを分散する一般的なテクニックとして登場した。本稿では,FedLアーキテクチャを3次元に拡張した並列逐次学習(PSL)を開発する。 i)デバイス間通信(D2D)を介してデバイス間の分散協調を可能にするネットワーク。 (ii-a)学習:pslは、デバイスで異なるミニバッチサイズを持つ確率的勾配降下イテレーションの異種数を考慮し、(ii-b)データ:pslはデータの到着と出発を伴う動的環境を想定し、ローカルデータセットの分布は時間とともに進化し、モデル/コンセプトドリフトの新しいメトリックを介してキャプチャされる。 (ii-c) デバイス: PSLは計算能力と通信能力の異なるデバイスを考える。 (iii)近接、デバイス同士の距離とアクセスポイントが異なる。 pslは、資源効率の改善のためにそれらの間にアイドルタイムでグローバルアグリゲーションが実行され、データ分散とモデル分散と局所モデル凝縮をfederに組み込む現実的なシナリオを考察している。我々の分析は、分散機械学習におけるコールド対ウォームアップモデルの概念とモデル慣性について光を当てている。次に、ネットワーク対応動的モデルトラッキングを提案し、モデル学習とリソース効率のトレードオフを最適化し、NPハードなシグナミカルプログラミング問題を示す。最後に, 一般最適化解法を提案することで, この問題を解決した。数値計算により,グローバルアグリゲーション,モデル/コンセプションドリフト,D2D協調構成の間におけるアイドル時間間の相互依存性が明らかになった。 Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. 
(iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. We finally solve this problem through proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times in-between the global aggregations, model/concept drift, and D2D cooperation configuration.	翻訳日:2023-06-17 04:21:08 公開日:2023-06-14
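The baseline pattern that the abstract above starts from, iterative local updates at the devices followed by a global aggregation at the server, can be sketched as follows. This is only a minimal illustration of plain federated averaging on a toy scalar model, not the PSL method itself: the quadratic loss, the `devices` dictionaries, and all function names are assumptions made for this example. The devices deliberately use different numbers of SGD iterations and different mini-batch sizes, which is the kind of learning heterogeneity the abstract mentions.

```python
import random

def local_sgd(w, data, steps, batch_size, lr, rng):
    """Run `steps` mini-batch SGD iterations on the toy loss 0.5 * (w - x)^2."""
    for _ in range(steps):
        batch = rng.sample(data, min(batch_size, len(data)))
        grad = sum(w - x for x in batch) / len(batch)  # gradient of the toy loss
        w -= lr * grad
    return w

def federated_round(w_global, devices, lr, rng):
    """One global aggregation: every device trains locally, the server averages."""
    total = sum(len(d["data"]) for d in devices)
    w_new = 0.0
    for d in devices:
        w_local = local_sgd(w_global, d["data"], d["steps"], d["batch_size"], lr, rng)
        w_new += (len(d["data"]) / total) * w_local  # weight by local dataset size
    return w_new

def train(devices, rounds=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, devices, lr, rng)
    return w

# Hypothetical devices with heterogeneous step counts and mini-batch sizes.
devices = [
    {"data": [1.0, 1.2, 0.8, 1.1], "steps": 5, "batch_size": 2},
    {"data": [3.0, 2.9, 3.1], "steps": 2, "batch_size": 3},
    {"data": [2.0, 2.1, 1.9, 2.0, 2.0], "steps": 8, "batch_size": 1},
]
w_final = train(devices)
print(w_final)
```

Because devices that run more local iterations pull the average harder toward their own data, the aggregated model need not coincide with the minimizer of the pooled loss. Effects of this kind are what a heterogeneity-aware analysis has to account for.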
# 字幕からの視覚言語トランスフォーマーの訓練 Training Vision-Language Transformers from Captions ( http://arxiv.org/abs/2205.09256v3 ) ライセンス: Link先を確認	Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk	(参考訳) 視覚言語トランスフォーマーは、低レベルな人間のラベル(クラスラベル、バウンディングボックスなど)なしで学習することができる。既存の作業は、バウンディングボックスやパッチを明示的に利用するにせよ、視覚的なバックボーンは、マルチモーダル言語パイプラインに統合される前に、ImageNetクラス予測に基づいてトレーニングする必要があると仮定する。これは不要であることを示し、この監督を必要としない、マスク付きオートエンコーダ上に構築された新しいモデルVision-Language from Captions (VLC)を導入する。実際、教師ありの物体分類で事前訓練された現在の最先端のパッチベース視覚言語トランスフォーマーであるViLTと、我々のモデルであるVLCとの直接比較では、我々のアプローチは次のことが分かる。 1. 標準ベンチマークでViLTを上回っている。 2. より解釈可能で直感的なパッチ視覚化を提供する。 3. アノテーション付きバウンディングボックスでトレーニングされたROIを利用する多くのより大きなモデルと競合する。 Vision-Language Transformers can be learned without low-level human labels (e.g. class labels, bounding boxes, etc). Existing work, whether explicitly utilizing bounding boxes or patches, assumes that the visual backbone must first be trained on ImageNet class prediction before being integrated into a multimodal linguistic pipeline. We show that this is not necessary and introduce a new model Vision-Language from Captions (VLC) built on top of Masked Auto-Encoders that does not require this supervision. In fact, in a head-to-head comparison between ViLT, the current state-of-the-art patch-based vision-language transformer which is pretrained with supervised object classification, and our model, VLC, we find that our approach 1. outperforms ViLT on standard benchmarks, 2. provides more interpretable and intuitive patch visualizations, and 3. is competitive with many larger models that utilize ROIs trained on annotated bounding-boxes.	翻訳日:2023-06-17 04:13:21 公開日:2023-06-14
# テキスト・画像生成のためのプロンプト修飾器の分類 A Taxonomy of Prompt Modifiers for Text-To-Image Generation ( http://arxiv.org/abs/2204.13988v3 ) ライセンス: Link先を確認	Jonas Oppenlaender	(参考訳) テキストから画像への生成は2021年以来、注目を集めている。今日では、美しい、興味深いデジタル画像やアートワークが、テキスト入力("prompts")と深い生成モデルから合成することができる。テキスト・ツー・画像生成とAI生成アートに関するオンラインコミュニティが急速に現れている。本稿では,3ヶ月のエスノグラフィー研究に基づいて,オンラインコミュニティの実践者が使用する6種類のプロンプト修飾剤を同定する。プロンプト修飾子の新しい分類法により、研究者はテキストから画像への生成の実践を研究するための概念的な出発点を提供するが、aiが生成した芸術の実践者がイメージを改善するのに役立つかもしれない。さらに,「プロンプトエンジニアリング」の実践における即時修飾器の応用について概説する。本稿では,ヒューマン・コンピュータ・インタラクション(HCI)分野における新しい創造的実践の機会について論じる。この論文は、テキスト・ツー・イメージ生成とAI生成技術以外の将来の応用におけるヒューマン・AIインタラクション(HAI)の観点から、迅速なエンジニアリングの幅広い意味を論じる。 Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI generated art have quickly emerged. This paper identifies six types of prompt modifiers used by practitioners in the online community based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers a conceptual starting point for investigating the practice of text-to-image generation, but may also help practitioners of AI generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI generated art.	翻訳日:2023-06-17 04:13:05 公開日:2023-06-14
# 相互作用ナノ粒子のフィードバック冷却による力勾配センシングと絡み合い Force-Gradient Sensing and Entanglement via Feedback Cooling of Interacting Nanoparticles ( http://arxiv.org/abs/2204.13684v3 ) ライセンス: Link先を確認	Henning Rudolph, Uro\v{s} Deli\'c, Markus Aspelmeyer, Klaus Hornberger, and Benjamin A. Stickler	(参考訳) 本研究では, 2つの浮遊ナノ粒子のフィードバック冷却により, 力の差分知覚と定常的絡み合いの観察が可能となることを示す。このフィードバックにより、2つの粒子は不均質な力場に影響を受けやすく、十分に強い粒子間カップリングの絡み合いを示す定常的な非熱状態へと誘導される。マイクロンあたりのゼプトニュートンの力勾配センシングは実現可能であり、荷電粒子間のクーロン相互作用による絡み合いは最先端のセットアップで現実的に観測できると予測した。 We show theoretically that feedback-cooling of two levitated, interacting nanoparticles enables differential sensing of forces and the observation of stationary entanglement. The feedback drives the two particles into a stationary, non-thermal state which is susceptible to inhomogeneous force fields and which exhibits entanglement for sufficiently strong inter-particle couplings. We predict that force-gradient sensing at the zepto-Newton per micron range is feasible and that entanglement due to the Coulomb interaction between charged particles can be realistically observed in state-of-the-art setups.	翻訳日:2023-06-17 04:12:21 公開日:2023-06-14
# 実験グレーボックス量子システム同定と制御 Experimental graybox quantum system identification and control ( http://arxiv.org/abs/2206.12201v3 ) ライセンス: Link先を確認	Akram Youssry, Yang Yang, Robert J. Chapman, Ben Haylock, Francesco Lenzini, Mirko Lobino, Alberto Peruzzo	(参考訳) エンジニアリングされた量子システムの理解と制御は、実用的な量子技術を開発するための鍵である。しかし、製造の不完全さや環境騒音といった現在の技術的限界を考えると、これは必ずしも可能とは限らない。これらの問題に対処するため、量子システム同定と制御のための理論的および数値的手法が数多く開発されている。これらの手法は、システムを記述するモデルの精度によって制限される従来の曲線フィッティングから、効率的な制御ソリューションを提供するが、モデルの出力を超えた制御や、基礎となる物理プロセスへの洞察を提供する機械学習手法まで、幅広い。ここでは,量子システムの物理モデルを構築し,最適制御を設計するための"グレーボックス"手法を実験的に実証する。標準教師付き機械学習モデルでは使用できない量であるユニタリとハミルトニアンを生成する一方で,モデルフィッティングよりも優れた性能を示す。提案手法は,物理原理と高精度機械学習を組み合わせることで,必要な制御量を直接測定できない問題に対して有効である。この方法は自然に時間依存的かつオープンな量子システムに拡張され、量子ノイズ分光とキャンセルへの応用がある。 Understanding and controlling engineered quantum systems is key to developing practical quantum technology. However, given the current technological limitations, such as fabrication imperfections and environmental noise, this is not always possible. To address these issues, a great deal of theoretical and numerical methods for quantum system identification and control have been developed. These methods range from traditional curve fittings, which are limited by the accuracy of the model that describes the system, to machine learning methods, which provide efficient control solutions but no control beyond the output of the model, nor insights into the underlying physical process. Here we experimentally demonstrate a "graybox" approach to construct a physical model of a quantum system and use it to design optimal control. We report superior performance over model fitting, while generating unitaries and Hamiltonians, which are quantities not available from the structure of standard supervised machine learning models. Our approach combines physics principles with high-accuracy machine learning and is effective with any problem where the required controlled quantities cannot be directly measured in experiments. 
This method naturally extends to time-dependent and open quantum systems, with applications in quantum noise spectroscopy and cancellation.	翻訳日:2023-06-17 04:04:13 公開日:2023-06-14
# 視覚異常検出のためのオートエンコーダによる自己教師付きトレーニング Self-Supervised Training with Autoencoders for Visual Anomaly Detection ( http://arxiv.org/abs/2206.11723v3 ) ライセンス: Link先を確認	Alexander Bauer	(参考訳) 深層畳み込みオートエンコーダは、教師なしの方法で非線形次元の減少を学習するための効果的なツールを提供する。近年,視覚領域における異常検出作業に用いられている。異常のない例を用いて再構成誤差を最適化することにより、対応するネットワークがアプリケーションフェーズ内の異常領域を正確に再構成できない、という考え方が一般的である。この目標は通常、ボトルネック層のサイズを縮小するか、アクティベーションに間隔制約を課すことで、ネットワークの容量を制御することで対処される。しかし、どちらの手法も異常信号の再構成を明示的に罰しないため、しばしば検出が困難になる。我々は,データ多様体に着目した学習において,修正された再構成誤差を用いて識別情報を活用できる自己教師型学習システムを適用することで,この問題に対処する。これにより、モデルが局所的に一貫した再構成を生成するとともに、異常パターンのフィルタとして機能することで不規則性を置き換えることができる。関連する手法とは対照的に,本手法による推論は,1ステップで入力画像全体を処理する訓練や予測において極めて効率的である。 MVTec ADデータセットを用いた実験により,提案手法の高認識と局所化性能を示す。特にテクスチャ・サブセットでは,本手法は最近の異常検出手法を大きなマージンで一貫して上回っている。 Deep convolutional autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimising for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network by either reducing the size of the bottleneck layer or enforcing sparsity constraints on its activations. However, neither of these techniques does explicitly penalize reconstruction of anomalous signals often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime, which allows to use discriminative information during training focusing on the data manifold by means of a modified reconstruction error. This regularizes the model to produce locally consistent reconstructions, while replacing irregularities by acting as a filter for anomalous patterns. 
In contrast to related approaches, inference with our method is very efficient during training and prediction processing the entire input image in one single step. Our experiments on the MVTec AD dataset demonstrate high recognition and localization performance of the proposed method. On the texture-subset, in particular, our approach consistently outperforms a bunch of recent anomaly detection methods by a big margin.	翻訳日:2023-06-17 04:03:50 公開日:2023-06-14
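The generic idea this abstract builds on, scoring anomalies by how poorly an autoencoder reconstructs them, can be sketched in a few lines. This is only an illustration of reconstruction-error scoring, not the modified training objective the paper proposes. The `reconstruction` grid below is a hand-written stand-in for the output of a trained autoencoder, and the threshold value and function names are assumptions chosen for this example.

```python
def squared_error_map(image, reconstruction):
    """Per-pixel squared difference between an image and its reconstruction."""
    return [[(p - q) ** 2 for p, q in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reconstruction)]

def detect_anomalies(image, reconstruction, threshold):
    """Return a boolean anomaly mask and an image-level anomaly score."""
    error_map = squared_error_map(image, reconstruction)
    mask = [[err > threshold for err in row] for row in error_map]
    image_score = max(max(row) for row in error_map)  # worst pixel decides
    return mask, image_score

# Toy 3x3 "image" with one bright defect in the centre.
image = [[0.1, 0.1, 0.1],
         [0.1, 0.9, 0.1],
         [0.1, 0.1, 0.1]]
# Stand-in for an autoencoder output that has filtered the defect out.
reconstruction = [[0.1] * 3 for _ in range(3)]

mask, score = detect_anomalies(image, reconstruction, threshold=0.25)
print(mask, score)
```

The scheme only works if the model does not reproduce the defect. If the autoencoder reconstructed the bright pixel faithfully, the error map would be flat and nothing would be flagged, which is the failure mode the abstract describes.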
# E2PN: 効率的なSE(3)-等変点ネットワーク E2PN: Efficient SE(3)-Equivariant Point Network ( http://arxiv.org/abs/2206.05398v3 ) ライセンス: Link先を確認	Minghan Zhu, Maani Ghaffari, William A. Clark, Huei Peng	(参考訳) 本稿では,3次元点雲からSE(3)-等価特徴を学習するための畳み込み構造を提案する。これはカーネルポイント畳み込み(kpconv)の同変バージョンと見なすことができ、ポイントクラウドデータを処理するために広く使用される畳み込み形式である。既存の等価ネットワークと比較して、私たちの設計はシンプルで軽量で、高速で、既存のタスク固有のポイントクラウド学習パイプラインと統合が容易です。群畳み込みと商表現を組み合わせることでこれらの望ましい性質を達成する。具体的には、SO(2) を安定化部分群として使用し、計算を省くために球面商特徴体を形成する際に、SO(3) を有限群に区別する。また, 回転を区別するキャパシティを保持するために, 球状特徴からSO(3)特徴を復元する置換層を提案する。実験の結果,オブジェクト分類,ポーズ推定,キーポイントマッチングなどのタスクにおいて,既存の作業よりもはるかに少ないメモリ消費と高速実行を実現していることがわかった。提案手法は,点雲に基づく実世界のアプリケーションのための同変モデルの開発を促進することができる。 This paper proposes a convolution structure for learning SE(3)-equivariant features from 3D point clouds. It can be viewed as an equivariant version of kernel point convolutions (KPConv), a widely used convolution form to process point cloud data. Compared with existing equivariant networks, our design is simple, lightweight, fast, and easy to be integrated with existing task-specific point cloud learning pipelines. We achieve these desirable properties by combining group convolutions and quotient representations. Specifically, we discretize SO(3) to finite groups for their simplicity while using SO(2) as the stabilizer subgroup to form spherical quotient feature fields to save computations. We also propose a permutation layer to recover SO(3) features from spherical features to preserve the capacity to distinguish rotations. Experiments show that our method achieves comparable or superior performance in various tasks, including object classification, pose estimation, and keypoint-matching, while consuming much less memory and running faster than existing work. The proposed method can foster the development of equivariant models for real-world applications based on point clouds.	翻訳日:2023-06-17 04:02:38 公開日:2023-06-14
# ニューラル共分散SDE:初期化時の無限深さ幅ネットワークの形状 The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization ( http://arxiv.org/abs/2206.02768v3 ) ライセンス: Link先を確認	Mufan Bill Li, Mihai Nica, Daniel M. Roy	(参考訳) 初期化時のフィードフォワードニューラルネットワークのロジット出力は、垂直層で定義されたランダムな共分散行列を条件付きガウス行列とする。本研究では,このランダム行列の分布について検討する。近年の研究では、この共分散行列が非退化するためには、ネットワーク深さが大きくなるにつれて活性化関数を形成する必要があることが示されている。しかし、この形状法に対する現在の無限幅スタイルの理解は大深度では不十分であり、無限幅解析は層間における微視的変動を無視するが、これらのゆらぎは多くの層に蓄積する。この欠点を克服するために、形状の無限深さと幅の極限におけるランダム共分散行列を考察する。非自明な極限に達するのに必要な活性化関数の正確なスケーリングを特定し、確率微分方程式(SDE)によってランダムな共分散行列が支配されることを示す。シミュレーションを用いて、sde は有限ネットワークのランダム共分散行列の分布と密接に一致することを示す。さらに,活性化関数に基づき,大形ネットワークの爆発や消滅のノルムに対するif-and-only-if条件を回復する。 The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.	翻訳日:2023-06-17 04:01:50 公開日:2023-06-14
# AdaProp: グラフニューラルネットワークに基づく知識グラフ推論のための学習適応伝播 AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning ( http://arxiv.org/abs/2205.15319v2 ) ライセンス: Link先を確認	Yongqi Zhang, Zhanke Zhou, Quanming Yao, Xiaowen Chu, Bo Han	(参考訳) グラフニューラルネットワーク(GNN)の人気により、知識グラフ(KG)を推論する様々なGNNベースの手法が設計されている。 GNNベースのKG推論法の重要な設計要素は伝搬経路と呼ばれ、各伝播ステップに関連するエンティティの集合を含んでいる。既存の手法では手作業で設計した伝搬経路を使用し、エンティティとクエリ関係の相関を無視している。さらに、関与するエンティティの数は、より大きな伝播ステップで爆発的に増加する。本研究では,有望な目標を維持しつつ,無関係なエンティティをフィルタリングするために適応的な伝搬経路を学習する動機付けを行う。まず,近傍のターゲットと層間接続を線形の複雑さで保存できるインクリメンタルサンプリング機構を設計する。第2に,意味的関連のあるエンティティを識別するために,学習に基づくサンプリング分布を設計する。広範な実験により,本手法は強力で効率的であり,意味論的であることが示された。コードは https://github.com/LARS-research/AdaProp で公開されている。 Due to the popularity of Graph Neural Networks (GNNs), various GNN-based methods have been designed to reason on knowledge graphs (KGs). An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step. Existing methods use hand-designed propagation paths, ignoring the correlation between the entities and the query relation. In addition, the number of involved entities will explosively grow at larger propagation steps. In this work, we are motivated to learn an adaptive propagation path in order to filter out irrelevant entities while preserving promising targets. First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at https://github.com/LARS-research/AdaProp.	翻訳日:2023-06-17 04:01:29 公開日:2023-06-14
# 米国の政治家によるコミュニケーションにおける誠実さの代替概念から代替事実へ From alternative conceptions of honesty to alternative facts in communications by U.S. politicians ( http://arxiv.org/abs/2208.10814v3 ) ライセンス: Link先を確認	Jana Lasser, Segun Taofeek Aroyehun, Fabio Carrella, Almog Simchon, David Garcia, Stephan Lewandowsky	(参考訳) ソーシャルメディアにおけるオンライン誤報の拡散は、社会的結束と民主主義の問題としてますます認識されている。この過程における政治指導者の役割は、たとえ証拠によって支持されていなくても、「自分の心を語る」政治家は、国民のセグメントによって真正かつ正直であると認識されているにもかかわらず、研究の注意を引いている。 2011年から2022年の間、Twitter上で米国議会のメンバーによるコミュニケーションを分析すると、政治家の正直性の概念は、証拠から切り離された真正な信念が、明白な証拠に基づく真理の探求とより区別されるようになることを示している。我々は、民主党ではなく共和党員にとって、10%の信念話者の増加は、ツイートで共有された情報源の12.8ポイントの質(ニューガードスコアシステム)の低下と関連していることを示した。逆に、真理検索言語の増加は、双方の情報源の品質の向上と関連している。この結果は、政治談話における現在の誤報の拡散は、証拠への依存を犠牲にして主観的信念の喚起を強調する真理と誠実性の代替的理解によって部分的に引き起こされているという仮説を支持している。 The spread of online misinformation on social media is increasingly perceived as a problem for societal cohesion and democracy. The role of political leaders in this process has attracted less research attention, even though politicians who "speak their mind" are perceived by segments of the public as authentic and honest even if their statements are unsupported by evidence. Analyzing communications by members of the U.S. Congress on Twitter between 2011 and 2022, we show that politicians' conception of honesty has undergone a distinct shift, with authentic belief-speaking that may be decoupled from evidence becoming more prominent and more differentiated from explicitly evidence-based truth seeking. We show that for Republicans - but not Democrats - an increase of belief-speaking of 10% is associated with a decrease of 12.8 points of quality (NewsGuard scoring system) in the sources shared in a tweet. Conversely, an increase in truth-seeking language is associated with an increase in quality of sources for both parties. 
The results support the hypothesis that the current dissemination of misinformation in political discourse is in part driven by an alternative understanding of truth and honesty that emphasizes invocation of subjective belief at the expense of reliance on evidence.	翻訳日:2023-06-17 03:56:15 公開日:2023-06-14
# テンプレートに基づく時間適応による動的文脈化単語埋め込みの学習 Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation ( http://arxiv.org/abs/2208.10734v3 ) ライセンス: Link先を確認	Xiaohang Tang, Yi Zhou, Danushka Bollegala	(参考訳) dynamic contextized word embeddeds (dcwes) は、単語の時間的意味変化を表す。本稿では,事前学習されたマスク言語モデル(mlm)の時間適応化によるdcwes学習法を提案する。 2つの異なるタイムスタンプ $t_1$ と $t_2$ でそれぞれ取られたコーパスの2つのスナップショット $c_1$ と $c_2$ を考えると、まずは教師なしの方法を提案する。 (a)$c_1$ と $c_2$ のどちらも関連する用語と、 (b)個々のスナップショットの特定のピボット項に関連付けられたemph{anchor}用語。次に、抽出されたピボットとアンカーを使って手動でコンパイルされたテンプレートを埋めてプロンプトを生成します。さらに,人間による監督を必要とせず,C_1$とC_2$からタイムセンシティブなテンプレートを自動的に学習する手法を提案する。次に、生成されたプロンプトを使用して、プリトレーニングされたmlmをこれらのプロンプトを使用して微調整することで$t_2$に適応させる。複数の実験により, 提案手法はテスト文の難易度を$C_2$で低減し, 現状よりも優れていた。 Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpus taken respectively at two distinct timestamps $T_1$ and $T_2$, we first propose an unsupervised method to select (a) \emph{pivot} terms related to both $C_1$ and $C_2$, and (b) \emph{anchor} terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates using the extracted pivot and anchor terms. Moreover, we propose an automatic method to learn time-sensitive templates from $C_1$ and $C_2$, without requiring any human supervision. Next, we use the generated prompts to adapt a pretrained MLM to $T_2$ by fine-tuning using those prompts. Multiple experiments show that our proposed method reduces the perplexity of test sentences in $C_2$, outperforming the current state-of-the-art.	翻訳日:2023-06-17 03:55:49 公開日:2023-06-14
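The prompt-generation step described in this abstract, filling manually compiled templates with extracted pivot and anchor terms, can be sketched roughly as below. The template wording, the example terms, and the function name are hypothetical placeholders chosen for illustration; the abstract does not list the actual templates or extracted terms, and the unsupervised pivot/anchor selection is not shown.

```python
def build_prompts(pivot_to_anchors, templates, timestamp):
    """Fill every template with every (pivot, anchor) pair for one snapshot."""
    prompts = []
    for pivot, anchors in pivot_to_anchors.items():
        for anchor in anchors:
            for template in templates:
                prompts.append(
                    template.format(pivot=pivot, anchor=anchor, time=timestamp)
                )
    return prompts

# Hypothetical template and terms; real ones would come from the corpus snapshots.
templates = ["{pivot} is associated with {anchor} in {time}."]
pivot_to_anchors = {"mask": ["hospital", "vaccine"]}

prompts = build_prompts(pivot_to_anchors, templates, "2021")
for p in prompts:
    print(p)
```

Prompts of this kind would then serve as the fine-tuning text used to adapt the pretrained masked language model to the later timestamp.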
# 少数ショット学習のためのプリミティブアウェア識別表現の学習 Learning Primitive-aware Discriminative Representations for Few-shot Learning ( http://arxiv.org/abs/2208.09717v2 ) ライセンス: Link先を確認	Jianpeng Yang, Yuhang Niu, Xuemei Xie, Guangming Shi	(参考訳) FSL (Few-shot Learning) は、いくつかのラベル付き例で簡単に新しいクラスを認識できる分類器を学習することを目的としている。 FSLに関する最近の研究は有望な分類性能をもたらし、画像レベルの特徴を使って分類のためのサンプル間の類似性を計算する。しかし、画像レベルの特徴は、既知のクラスと未知のクラスの間で転送可能で一貫性のあるオブジェクトの微細で構造的な情報を無視する。人間はどのようにしてわずかなサンプルで新しいクラスを識別できるのか? 認知科学からのいくつかの研究は、人間がプリミティブを通して新しいカテゴリーを認識できると主張している。基本と新規のカテゴリは重複しないが、共通のプリミティブを共有することができる。上記の研究に触発されて,計量に基づくFSLモデルに基づいてプリミティブを考慮した表現を学習するPrimitive Mining and Reasoning Network (PMRN) を提案する。具体的には,特徴抽出器にSSJ(Self-supervision Jigsaw Task)を並列に追加し,オブジェクトの部分に対応する視覚パターンを特徴チャネルにエンコードするようモデルを導く。さらに識別表現をマイニングするために、アダプティブチャンネルグルーピング(ACG)法を適用して、空間的および意味的に関連する視覚パターンをクラスタリングして重み付けし、視覚プリミティブのグループを生成する。プリミティブの識別可能性と伝達可能性を高めるために,グラフコンボリューションネットワークに基づく視覚的プリミティブ相関推論ネットワーク(CRN)を提案し,プリミティブ間の豊富な構造情報と内部相関を学習する。最後に、エピソディックトレーニング戦略に基づいてメタタスクの分類のためのプリミティブレベルの計量を行う。広範な実験により,6つの標準ベンチマークで最新の結果が得られた。 Few-shot learning (FSL) aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples. Some recent work about FSL has yielded promising classification performance, where the image-level feature is used to calculate the similarity among samples for classification. However, the image-level feature ignores abundant fine-grained and structural information of objects that may be transferable and consistent between seen and unseen classes. How can humans easily identify novel classes with several samples? Some study from cognitive science argues that humans can recognize novel categories through primitives. Although base and novel categories are non-overlapping, they can share some primitives in common. Inspired by above research, we propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations based on metric-based FSL model. 
Concretely, we first add Self-supervision Jigsaw task (SSJ) for feature extractor parallelly, guiding the model to encode visual pattern corresponding to object parts into feature channels. To further mine discriminative representations, an Adaptive Channel Grouping (ACG) method is applied to cluster and weight spatially and semantically related visual patterns to generate a group of visual primitives. To further enhance the discriminability and transferability of primitives, we propose a visual primitive Correlation Reasoning Network (CRN) based on graph convolutional network to learn abundant structural information and internal correlation among primitives. Finally, a primitive-level metric is conducted for classification in a meta-task based on episodic training strategy. Extensive experiments show that our method achieves state-of-the-art results on six standard benchmarks.	翻訳日:2023-06-17 03:55:25 公開日:2023-06-14
# ラベル雑音の存在下での一般化に及ぼすモデル幅と密度の影響の検討 Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise ( http://arxiv.org/abs/2208.08003v4 ) ライセンス: Link先を確認	Yihao Xue, Kyle Whitecross, Baharan Mirzasoleiman	(参考訳) 過パラメータ化されたニューラルネットワークのサイズ拡大は、最先端のパフォーマンスを達成する上で鍵となる。これは二重降下現象によって捉えられ、モデル幅が増加するにつれて、試験損失は減少・減少パターンに従う。しかし, 試験損失曲線に対するラベルノイズの影響は十分に検討されていない。本研究では、ラベルノイズが元々観測された二重降下曲線において \textit{final ascent} となる興味深い現象を明らかにする。具体的には、ノイズ対サンプルサイズ比が十分大きい場合には、中間幅で最適一般化が達成される。理論的解析を通じて、この現象はラベルノイズによる試験損失分散の形状遷移に起因している。さらに,最終昇華現象をモデル密度に拡張し,トレーニング可能なパラメータをランダムに落とせば,ラベルノイズ下での一般化が向上することを示す最初の理論的特徴を与える。また,正規化とサンプルサイズの役割についても徹底的に検討した。驚いたことに、ラベルノイズに対する大きな$\ell_2$正規化と堅牢な学習手法が最終的な上昇を悪化させる。我々は,MNISTでトレーニングされたReLuネットワーク,CIFAR-10/100でトレーニングされたResNet,および現実世界の雑音ラベルを持つスタンフォードカーでトレーニングされたInceptionResNet-v2を用いて,その妥当性を確認した。 Increasing the size of overparameterized neural networks has been a key in achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. 
Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.	翻訳日:2023-06-17 03:54:41 公開日:2023-06-14
# 一様局所測定による任意絡み合い状態の効率的な検証 Efficient verification of arbitrary entangled states with homogeneous local measurements ( http://arxiv.org/abs/2208.01083v2 ) ライセンス: Link先を確認	Ye-Chao Liu, Yinfei Li, Jiangwei Shang, Xiangdong Zhang	(参考訳) 量子状態検証(QSV)は、特定の量子デバイスが所望の目標状態を生成することを検証するためにのみ、局所的な測定に依存するタスクである。今のところ、ある種の絡み合った状態はQSVによって効率よく、あるいは最適に検証できる。しかし、任意の絡み合った状態を考えると、その検証プロトコルをどのように設計するかは未解決の問題である。そこで本研究では, 選択非依存計測プロトコルとして導入する手法の局所性を考慮し, 操作者が均質である場合に直接達成できる手法を提案する。いくつかの典型的な絡み合った状態を例にとると、標準ポーリ射影を用いたプロトコル設計の明示的な手順を示し、より優れたqsv戦略を実現する方法の優位性を示す。さらに,本フレームワークは,絡み目の構築やパラメータ推定など,他のタスクにも自然に拡張することができる。 Quantum state verification (QSV) is the task of relying on local measurements only to verify that a given quantum device does produce the desired target state. Up to now, certain types of entangled states can be verified efficiently or even optimally by QSV. However, given an arbitrary entangled state, how to design its verification protocol remains an open problem. In this work, we present a systematic strategy to tackle this problem by considering the locality of what we initiate as the choice-independent measurement protocols, whose operators can be directly achieved when they are homogeneous. Taking several typical entangled states as examples, we show the explicit procedures of the protocol design using standard Pauli projections, demonstrating the superiority of our method for attaining better QSV strategies. Moreover, our framework can be naturally extended to other tasks such as the construction of entanglement witness, and even parameter estimation.	翻訳日:2023-06-17 03:54:16 公開日:2023-06-14
# ニューラルネットワークによる新規なテスト選択による機能被覆の高速化 Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure ( http://arxiv.org/abs/2207.00445v3 ) ライセンス: Link先を確認	Xuan Zheng, Kerstin Eder and Tim Blackmore	(参考訳) シミュレーションに基づく検証に使用される新しいテストセレクタは、カバレッジホールの数に関わらず、カバレッジ閉鎖を著しく加速することが示されている。本稿ではニューラルネットワークに基づく新しいテスト選択のための構成可能かつ高度に自動化されたフレームワークを提案する。このフレームワークの3つの構成は商用信号処理ユニットでテストされる。 3つともランダムなテスト選択を明確に上回っており、99.5%のカバレッジに到達するために必要なシミュレーションを最大49.37%削減した。構成の計算コストは、シミュレーションの削減と比べて無視できる。実験結果を比較し,構成の性能に関する重要な特徴について考察する。 Novel test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly-automated framework for novel test selection based on neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection with the largest saving of simulation being 49.37% to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the simulation reduction. We compare the experimental results and discuss important characteristics related to the performance of the configurations.	翻訳日:2023-06-17 03:52:28 公開日:2023-06-14
# 浮揚光機械センサによる横軌道角運動量計測 Structured transverse orbital angular momentum probed by a levitated optomechanical sensor ( http://arxiv.org/abs/2209.09759v3 ) ライセンス: Link先を確認	Yanhui Hu, Jack J. Kingsley-Smith, Maryam Nikkhou, James A. Sabin, Francisco J. Rodr\'iguez-Fortu\~no, Xiaohao Xu and James Millen	(参考訳) 構造された光電場によって運ばれる運動量は、様々な驚くべき特徴を示す。本研究では,2つの平行な直線偏光集束ビームの干渉場における横軌道角運動量(TOAM)を生成し,固有TOAMを有する同一のハンドネス渦列を合成する。回転が光角運動量のプローブであり、非常に大きなトルクを発生させる光学浮揚シリコンナノロッドからなる光機械センサを用いて、この構造された光場を探索する。この単純なTOAMの生成と直接観察は、基礎物理学、物質の光学的操作、量子光学の研究に応用される。 The momentum carried by structured light fields exhibits a rich array of surprising features. In this work, we generate transverse orbital angular momentum (TOAM) in the interference field of two parallel and counterpropagating linearly-polarised focused beams, synthesising an array of identical handedness vortices carrying intrinsic TOAM. We explore this structured light field using an optomechanical sensor, consisting of an optically levitated silicon nanorod, whose rotation is a probe of the optical angular momentum, which generates an exceptionally large torque. This simple creation and direct observation of TOAM will have applications in studies of fundamental physics, the optical manipulation of matter and quantum optomechanics.	翻訳日:2023-06-17 03:45:47 公開日:2023-06-14
# 分類基準の分析と比較 Analysis and Comparison of Classification Metrics ( http://arxiv.org/abs/2209.05355v3 ) ライセンス: Link先を確認	Luciana Ferrer	(参考訳) さまざまなパフォーマンス指標が、分類システムの評価のために機械学習文献で一般的に使用されている。ハード決定の質を測る最も一般的なものは、標準とバランスの取れた精度、標準とバランスの取れた誤差率、Fベータスコア、マシューズ相関係数(MCC)である。本稿では,これらと他の指標の定義をレビューし,各統計学習コースで導入されているが機械学習文献では滅多に用いられていない期待コスト(ec)と比較する。標準および平衡誤差率の両方がECの特別な場合であることを示す。さらに、f-score と mcc との関係を示し、ec は従来のメトリクスよりも優れており、よりエレガントで汎用的で直感的であり、統計の基本的な原則に基づいていると主張する。上記のメトリクスは、難しい決定の質を測定します。しかし、現代のほとんどの分類システムは、直接評価したいクラスに対して連続スコアを出力する。システムスコアの測定基準には、ROC曲線下の領域、等誤差率、クロスエントロピー、ブライアスコア、ベイズECまたはベイズリスクなどが含まれる。最後の3つのメトリクスは、適切なスコアリングルール(PSR)の期待値によって与えられるメトリクスのファミリーの特別なケースである。これらの指標の背景にある理論を概観し、系が生み出す後部確率の質を測る最も原理的な方法であると主張している。最後に,これらの測定値を用いてシステムのキャリブレーション損失を計算し,この測定値と標準期待キャリブレーション誤差(ECE)を比較し,PSRに基づくキャリブレーション損失は様々な理由からECEよりも優れていると主張した。 A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. 
Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.	翻訳日:2023-06-17 03:44:57 公開日:2023-06-14
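The claim in this abstract that the standard and balanced error rates are special cases of the expected cost (EC) can be checked numerically. The sketch below assumes the usual definition EC = sum_i P_i sum_j c_ij R_ij, where P_i is the empirical prior of class i and R_ij is the fraction of class-i samples decided as class j. The toy labels, decisions, and function names are assumptions made for this example, and the code is not taken from the paper.

```python
def normalized_confusion(labels, decisions, num_classes):
    """R[i][j] = fraction of class-i samples decided as class j.

    Assumes every class occurs at least once in `labels`.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, d in zip(labels, decisions):
        counts[y][d] += 1
    return [[c / sum(row) for c in row] for row in counts]

def expected_cost(labels, decisions, cost, num_classes):
    """EC = sum_i P_i * sum_j cost[i][j] * R[i][j], with empirical priors P_i."""
    n = len(labels)
    priors = [labels.count(i) / n for i in range(num_classes)]
    R = normalized_confusion(labels, decisions, num_classes)
    return sum(priors[i] * cost[i][j] * R[i][j]
               for i in range(num_classes) for j in range(num_classes))

# Toy, imbalanced 3-class problem (hypothetical data).
labels    = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
decisions = [0, 0, 0, 0, 1, 2, 1, 0, 2, 2]
K = 3
priors = [labels.count(i) / len(labels) for i in range(K)]

# 0-1 cost matrix: every error costs 1.
cost_01 = [[0.0 if i == j else 1.0 for j in range(K)] for i in range(K)]
# Prior-normalized costs: errors on class i cost 1 / (K * P_i).
cost_bal = [[0.0 if i == j else 1.0 / (K * priors[i]) for j in range(K)]
            for i in range(K)]

ec_01 = expected_cost(labels, decisions, cost_01, K)
ec_bal = expected_cost(labels, decisions, cost_bal, K)

# Reference values computed directly from their textbook definitions.
error_rate = sum(y != d for y, d in zip(labels, decisions)) / len(labels)
R = normalized_confusion(labels, decisions, K)
balanced_error_rate = sum(1.0 - R[i][i] for i in range(K)) / K

print(ec_01, error_rate)
print(ec_bal, balanced_error_rate)
```

With the 0-1 cost matrix the EC matches the standard error rate, and with costs inversely proportional to the class priors it matches the balanced error rate. Other cost matrices give metrics tailored to applications where some errors are more expensive than others.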
# トランザクションからの重力:エントロピー重力プログラムをフルフィルする Gravity from Transactions: Fulfilling the Entropic Gravity Program ( http://arxiv.org/abs/2209.04025v2 ) ライセンス: Link先を確認	A. Schlatter and R. E. Kastner	(参考訳) 本稿では,相対論的トランザクション解釈(RTI)の観点から,エントロピー重力の新展開を概観する。時空事象に対するトランザクショナルなアプローチは、エントロピック重力(もともとエリック・ヴェルリンデが提唱した方法で)に対する自然な方法を生み出し、その研究プログラムに対する既存の反対を克服する。この理論は自然に宇宙定数と修正ニュートン力学(MOND)を生じさせ、歴史的に「暗黒エネルギー」と「暗黒物質」に由来する現象の物理的説明を与える。 This is a review of new developments in entropic gravity in light of the Relativistic Transactional Interpretation (RTI). A transactional approach to spacetime events can give rise in a natural way to entropic gravity (in the way originally proposed by Erik Verlinde) while also overcoming extant objections to that research program. The theory also naturally gives rise to a Cosmological Constant and to Modified Newtonian Dynamics (MOND) and thus provides a physical explanation for the phenomena historically attributed to "dark energy" and "dark matter".	翻訳日:2023-06-17 03:44:06 公開日:2023-06-14
# 多様な相互相関を持つ単段広帯域マルチラベル学習(bmiml)とその医用画像分類への応用 Single-Stage Broad Multi-Instance Multi-Label Learning (BMIML) with Diverse Inter-Correlations and its application to medical image classification ( http://arxiv.org/abs/2209.02625v2 ) ライセンス: Link先を確認	Qi Lai, Jianhang Zhou, Yanfen Gan, Chi-Man Vong, Deshuang Huang	(参考訳) 複数のインスタンス(イメージパッチなど)によって記述され、同時に複数のラベルに関連付けられる。既存のMIMLメソッドは多くのアプリケーションで有用であるが、そのほとんどはいくつかの問題により比較的低い精度と訓練効率に悩まされている。一ラベル間の相関関係(即ち、対象に対応する複数のラベル間の確率的相関関係)を無視すること。二インスタンス間相関(すなわち、オブジェクトラベルの予測において異なるインスタンスの確率的相関)は、欠落したインスタンスラベルによる他の種類の相関を直接(又は共同で)学習することはできない。三多様な相互相関(例えば、ラベル間相関、インスタンス間相関)は、複数の段階でしか学べない。これらの問題を解決するために,広帯域マルチインスタンス・マルチラベル学習(BMIML)と呼ばれる新しいシングルステージフレームワークを提案する。 BMIMLには3つの革新的なモジュールがある。一広範学習システム(BLS)に基づく自己重み付きラベル強化学習(AWLEL)を設計し、従来のBLSでは不可能でありながら、ラベル間相関を同時にかつ効率的に取得する。二スケーラブルマルチインスタンス確率回帰(SMIPR)と呼ばれる特定のMIMLニューラルネットワークを構築して、オブジェクトラベルのみを用いてインスタンス間相関を効果的に推定し、学習のためのさらなる確率情報を提供する。三最後に、対話型意思決定最適化(IDO)を設計し、AWLELとSMIPRの結果を組み合わせ、最適化し、単一ステージのフレームワークを形成する。実験の結果、BMIMLは既存の手法よりも精度が高く、大きな医療画像データセット(>90K画像)であってもほとんどのMIML法よりもはるかに高速であることがわかった。 described by multiple instances (e.g., image patches) and simultaneously associated with multiple labels. Existing MIML methods are useful in many applications but most of which suffer from relatively low accuracy and training efficiency due to several issues: i) the inter-label correlations(i.e., the probabilistic correlations between the multiple labels corresponding to an object) are neglected; ii) the inter-instance correlations (i.e., the probabilistic correlations of different instances in predicting the object label) cannot be learned directly (or jointly) with other types of correlations due to the missing instance labels; iii) diverse inter-correlations (e.g., inter-label correlations, inter-instance correlations) can only be learned in multiple stages. To resolve these issues, a new single-stage framework called broad multi-instance multi-label learning (BMIML) is proposed. 
In BMIML, there are three innovative modules: i) an auto-weighted label enhancement learning (AWLEL) based on broad learning system (BLS) is designed, which simultaneously and efficiently captures the inter-label correlations while traditional BLS cannot; ii) A specific MIML neural network called scalable multi-instance probabilistic regression (SMIPR) is constructed to effectively estimate the inter-instance correlations using the object label only, which can provide additional probabilistic information for learning; iii) Finally, an interactive decision optimization (IDO) is designed to combine and optimize the results from AWLEL and SMIPR and form a single-stage framework. Experiments show that BMIML is highly competitive to (or even better than) existing methods in accuracy and much faster than most MIML methods even for large medical image data sets (> 90K images).	翻訳日:2023-06-17 03:43:56 公開日:2023-06-14
# 量子領域におけるランダムアクセスコードの2例 Two instances of random access code in the quantum regime ( http://arxiv.org/abs/2208.14422v3 ) ライセンス: Link先を確認	Nitica Sakharwade, Michał Studziński, Michał Eckstein, and Paweł Horodecki	(参考訳) 我々は、ランダムアクセスコード(RAC)の量子一般化の2つのクラスを検討し、そのようなタスクの成功確率の下界を研究する。これは、制約のあるリソースを用いた情報処理タスクの研究に有用なフレームワークを提供する。最初のクラスは、量子入力と量子出力を持つランダムアクセスコードに基づいており、ノーシグナリング量子RAC (NS-QRAC) [A. Grudka et al. Phys. Rev. A 92, 052312 (2015)] と呼ばれる。これは非有界なエンタングルメントと制約付き古典通信が許される設定であり、制約付き古典通信による量子テレポーテーションと見なすことができ、これに対して量子下界を与える。 NS-QRACシナリオの2つの修正について検討する。第1は非有界なエンタングルメントと制約付き量子通信が許される場合であり、第2は有界なエンタングルメントと制約のない古典通信が許される場合である。後者では伝送忠実度に対するモノガミー関係が得られ、これは通常の通信方式とは対照的に、複数の送信者と1人の受信者が関与する。これらのシナリオに対して下界を与える。第2のクラスは、量子チャネルと共有エンタングルメント[A. Tavakoli et al. PRX Quantum 2 (4) 040357 (2021)]を持つランダムアクセスコードに基づいている。 $d$進数の2桁からなる2つの入力をquditと最大エンタングル状態に符号化するタスクの集合を研究する。これは制約付き量子通信による量子稠密符号化と見なすことができ、$d=2,3,4$に対して量子下界を与える。符号化にはグレイコードを利用する。 We consider two classes of quantum generalisations of Random Access Code (RAC) and study lower bounds for probabilities of success for such tasks. It provides a useful framework for the study of certain information processing tasks with constrained resources. The first class is based on a random access code with quantum inputs and output known as No-Signalling Quantum RAC (NS-QRAC) [A. Grudka et al. Phys. Rev. A 92, 052312 (2015)], where unbounded entanglement and constrained classical communication are allowed, which can be seen as quantum teleportation with constrained classical communication, for which we provide a quantum lower bound. We consider two modifications to the NS-QRAC scenario, first where unbounded entanglement and constrained quantum communication are allowed and, second where bounded entanglement and unconstrained classical communication are allowed, where we find a monogamy relation for the transmission fidelities, which -- in contrast to the usual communication schemes -- involves multiple senders and a single receiver. We provide lower bounds for these scenarios. 
The second class is based on a random access code with a quantum channel and shared entanglement [A. Tavakoli et al. PRX Quantum 2 (4) 040357 (2021)]. We study the set of tasks where two inputs made of two digits of $d$-base are encoded over a qudit and a maximally entangled state, which can be seen as quantum dense coding with constrained quantum communication, for which we provide quantum lower bounds for $d=2,3,4$. The encoding employed utilises Gray codes.	翻訳日:2023-06-17 03:43:07 公開日:2023-06-14
# Perspective-1-Ellipsoid:1つの楕円・楕円体対応によるカメラポーズ推定問題の定式化、解析、および解法 Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence ( http://arxiv.org/abs/2208.12513v3 ) ライセンス: Link先を確認	Vincent Gaudillière, Gilles Simon, Marie-Odile Berger	(参考訳) コンピュータビジョンでは、3次元幾何学的実体と画像への投影との対応からカメラのポーズ推定が広く研究されている。多くの最先端の手法は、ポイントやラインのような低レベルプリミティブを利用するが、近年の非常に効果的なCNNベースのオブジェクト検出器の出現は、意味論的に意味のある情報を持つ高レベルな特徴の使用への道を開いた。この方向の先駆的な研究は、楕円体による3Dオブジェクトのモデリングと楕円による2D検出が、2Dデータと3Dデータをリンクするのに便利な方法であることを示した。しかし、関連文献でよく使われる数学的形式は、楕円体や楕円を他の二次曲面や円錐曲線と容易に区別することはできず、いくつかの発展において潜在的に有害な特異性の喪失に繋がる。さらに、投影方程式の線形化過程は、カメラパラメータの過剰表現を生成し、効率損失を引き起こす可能性がある。そこで本稿では,楕円体固有の理論的枠組みを導入し,ポーズ推定の文脈においてその有益性を示す。より正確には、提案形式は、残りの未知数を閉形式で導出できる位置のみ、または向きのみの推定問題に、ポーズ推定問題を還元することができることを示す。次に,1自由度 (1DoF) 問題にさらに縮小できることを示し,その唯一のスカラー未知数の関数として,ポーズの解析的導出を提供する。視覚的な例で理論的考察を例示し,実用的側面について考察する。最後に,楕円体関連ポーズ推定問題のより効率的な解決に向けて,対応するソースコードとともに本論文をリリースする。 In computer vision, camera pose estimation from correspondences between 3D geometric entities and their projections into the image has been a widely investigated problem. Although most state-of-the-art methods exploit low-level primitives such as points or lines, the emergence of very effective CNN-based object detectors in recent years has paved the way to the use of higher-level features carrying semantically meaningful information. Pioneering works in that direction have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient manner to link 2D and 3D data. However, the mathematical formalism most often used in the related literature does not make it easy to distinguish ellipsoids and ellipses from other quadrics and conics, leading to a loss of specificity potentially detrimental in some developments. Moreover, the linearization process of the projection equation creates an over-representation of the camera parameters, also possibly causing an efficiency loss. 
In this paper, we therefore introduce an ellipsoid-specific theoretical framework and demonstrate its beneficial properties in the context of pose estimation. More precisely, we first show that the proposed formalism makes it possible to reduce the pose estimation problem to a position-only or orientation-only estimation problem in which the remaining unknowns can be derived in closed form. Then, we demonstrate that it can be further reduced to a 1 Degree-of-Freedom (1DoF) problem and provide the analytical derivations of the pose as a function of that unique scalar unknown. We illustrate our theoretical considerations by visual examples and include a discussion on the practical aspects. Finally, we release this paper along with the corresponding source code in order to contribute towards more efficient resolutions of ellipsoid-related pose estimation problems.	翻訳日:2023-06-17 03:42:17 公開日:2023-06-14
# 量子コンピュータを用いたネットワークにおける感染拡大のシミュレーション Simulating the Spread of Infection in Networks with Quantum Computers ( http://arxiv.org/abs/2208.11394v2 ) ライセンス: Link先を確認	Xiaoyang Wang and Yinchenguang Lyu and Changyu Yao and Xiao Yuan	(参考訳) 本稿では,ネットワークの感染拡大をシミュレーションする量子コンピュータを提案する。まず,Ising型相互作用による感染分布とスピン格子構成の類似性を示す。次に, 拡散過程を古典マルコフ過程としてモデル化できるので, パラメータ化されたハミルトニアンを持つ量子熱力学モデルの進化を用いて拡散過程をシミュレートできることを示す。特に,ハミルトニアンの進化挙動を解析的および数値的に解析し,その進化が古典マルコフ過程をシミュレートすることを証明する。疫学的な入力から熱力学的ハミルトニアンのパラメータを決定するための実用的な方法を示す。例として,SARS-Cov-2変異株Omicronの感染拡散過程のシミュレーションを行った。 We propose to use quantum computers to simulate infection spreading in networks. We first show the analogy between the infection distribution and spin-lattice configurations with Ising-type interactions. Then, since the spreading process can be modeled as a classical Markovian process, we show that the spreading process can be simulated using the evolution of a quantum thermal dynamic model with a parameterized Hamiltonian. In particular, we analytically and numerically analyze the evolution behavior of the Hamiltonian, and prove that the evolution simulates a classical Markovian process, which describes the well-known epidemiological stochastic susceptible and infectious (SI) model. A practical method to determine the parameters of the thermal dynamic Hamiltonian from epidemiological inputs is exhibited. As an example, we simulate the infection spreading process of the SARS-Cov-2 variant Omicron in a small-world network.	翻訳日:2023-06-17 03:41:46 公開日:2023-06-14
# MAMO:細粒度視覚言語表現学習のためのマスク付きマルチモーダルモデリング MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning ( http://arxiv.org/abs/2210.04183v3 ) ライセンス: Link先を確認	Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu	(参考訳) マルチモーダル表現学習は様々な視覚言語タスクにおいて有望な改善を示している。既存のほとんどの手法は、視覚と言語の間のグローバルレベルのアライメントを構築するのに優れ、効果的なきめ細かい画像とテキストの相互作用を欠いている。本稿では,細粒度マルチモーダル表現を学習するための複合マスク型マルチモーダルモデリング手法を提案する。本手法は,画像テキスト入力の共用マスキングを行い,マスキング信号の暗黙的および明示的ターゲットを統合して復元する。暗黙のターゲットは視覚と言語に対する統一的で不偏の目的を与え、そこでモデルは非マスキーク入力の潜在マルチモーダル表現を予測する。明示的なターゲットは、画像パッチの運動量視覚的特徴や単語トークンの概念といった高レベルで意味のある情報を復元することで、マルチモーダル表現をさらに強化する。このようなマスク付きモデリングプロセスを通じて、我々のモデルは微細なマルチモーダル相互作用を学習するだけでなく、高レベルの表現と低レベルの予測ターゲット(画像画素など)のセマンティックギャップを回避し、ゼロショットと微調整の両方でうまく機能するセマンティックにリッチなマルチモーダル表現を生成する。先行学習モデル(mamo)は,画像テキスト検索,視覚的質問応答,視覚的推論,弱教師付き視覚接地など,下流の視覚言語タスクにおいて最先端のパフォーマンスを実現する。 Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens. Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. 
image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.	翻訳日:2023-06-17 03:36:23 公開日:2023-06-14
# RGB-Dパノプティックセグメンテーションのためのロバスト二重エンコーダネットワーク Robust Double-Encoder Network for RGB-D Panoptic Segmentation ( http://arxiv.org/abs/2210.02834v2 ) ライセンス: Link先を確認	Matteo Sodano, Federico Magistri, Tiziano Guadagnino, Jens Behley, Cyrill Stachniss	(参考訳) 知覚は、現実の環境で行動するロボットにとって不可欠である。自律システムは周囲の世界を見て理解し、適切に行動する必要があるからだ。パノプティックセグメンテーションは、ピクセルワイズセマンティックラベルをインスタンスIDと共に計算することでシーンの解釈を提供する。本稿では,室内シーンのRGB-Dデータを用いたパノプティックセグメンテーションについて述べる。本稿では、2つのエンコーダを通してRGBと深度を別々に処理する新しいエンコーダデコーダニューラルネットワークを提案する。個々のエンコーダの特徴は異なる解像度で徐々にマージされ、RGBの特徴は相補的な深度情報を用いて強化される。本稿では,特徴マップの各エントリをその重要度に応じて再重み付けする,ResidualExciteと呼ばれる新しいマージ手法を提案する。ダブルエンコーダアーキテクチャにより、欠けている手がかりに対して頑健である。特に、同じモデルは、特殊なモデルを訓練することなく、RGB-D、RGBのみ、深度のみの入力データで学習および推論ができる。提案手法を公開データセット上で評価し,他の一般的なパノプティックセグメンテーション手法と比較して優れた結果が得られることを示す。 Perception is crucial for robots that act in real-world environments, as autonomous systems need to see and understand the world around them to act properly. Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs. In this paper, we address panoptic segmentation using RGB-D data of indoor scenes. We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders. The features of the individual encoders are progressively merged at different resolutions, such that the RGB features are enhanced using complementary depth information. We propose a novel merging approach called ResidualExcite, which reweighs each entry of the feature map according to its importance. With our double-encoder architecture, we are robust to missing cues. In particular, the same model can train and infer on RGB-D, RGB-only, and depth-only input data, without the need to train specialized models. We evaluate our method on publicly available datasets and show that our approach achieves superior results compared to other common approaches for panoptic segmentation.	翻訳日:2023-06-17 03:35:29 公開日:2023-06-14
# 時間変化する重要度重みを用いたデータドリフト下の学習 Learning under Data Drift with Time-Varying Importance Weights ( http://arxiv.org/abs/2210.01422v3 ) ライセンス: Link先を確認	Rasool Fakoor and Jonas Mueller and Zachary C. Lipton and Pratik Chaudhari and Alexander J. Smola	(参考訳) データが時間とともに進化するため、機械学習モデルの現実世界でのデプロイメントは難しい。データが任意の方法で進化する場合にはどのモデルも機能しないが、これらの変化に何らかのパターンがある場合、それに対応する手法を設計できるかもしれない。本稿では,データが徐々に進化する状況に対処する。我々は、データ分布の段階的な変化を検知し、過去のデータを選択的にサンプリングしてモデルを更新できる時間変化傾向スコアを導入する。これは、標準的な傾向スコアのように過去の類似データだけでなく、過去に同様の形で進化したデータも選択できる。時間変化傾向スコアは非常に一般的であり,その実装方法をいくつか示し,データが段階的な変化を連続的に受ける教師付き学習(画像分類問題など)から,方針やタスクの変化に伴ってデータがシフトする強化学習タスク(ロボット操作や連続制御など)まで,さまざまな問題に対して評価を行う。 Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data, which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.	翻訳日:2023-06-17 03:35:11 公開日:2023-06-14
# 複数のスケールでの位相特異性検出 Topological Singularity Detection at Multiple Scales ( http://arxiv.org/abs/2210.00069v4 ) ライセンス: Link先を確認	Julius von Rohrscheidt and Bastian Rieck	(参考訳) データが低本質次元の未知多様体上またはその近くにあると仮定する多様体仮説は、現代の機械学習研究の出発点である。しかし、最近の研究により、実世界のデータは、特異点、すなわち誤った発見につながる可能性のある異なる非多様体構造を示すことが示されている。このような特異点の検出は補間および推論タスクの前駆体として重要である。この問題に対処するために、我々はトポロジカルな枠組みを開発します。 (i)局所的な内在次元を定量化し、 (ii)複数の尺度に沿った点の「多様体性」を評価するためのユークリディシティスコアを得る。画像データの特異構造や局所幾何学的複雑性を捉えながら,複素空間の特異点を同定する。 The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.	翻訳日:2023-06-17 03:34:27 公開日:2023-06-14
# truncated-cumulant trajectoriesによる開量子スピン格子の量子および古典的相関 Quantum and classical correlations in open quantum-spin lattices via truncated-cumulant trajectories ( http://arxiv.org/abs/2209.13377v4 ) ライセンス: Link先を確認	Wouter Verstraelen and Dolf Huybrechts and Tommaso Roscilde and Michiel Wouters	(参考訳) リウビリアン開量子システムにおける量子多体物理学の研究は、散逸系に対する最近の実験的制御の進展と、その技術的利用によってますます重要になっている。オープン量子系における中心的な問題は、量子相関の運命と、ハミルトン力学と浴槽とのカップリングの競合を工学的に制御する可能性に関するものである。このような問題は、量子相関を忠実に説明する数値的な方法が正確な対角化に依存しているか、扱える大きさを劇的に制限しているか、あるいは密度行列に対する特定のアンサッツの選択に関連する量子相関の範囲や強度を近似しているため、理論的観点からは難しい。本研究では,開放系力学の解に対する確率的量子軌道に基づいて,開量子スピン格子を扱う新しい手法を提案する。各軌道に沿って、多点スピンスピンコレレータの運動方程式の階層は、カットオフ$k_c$を超える$k$の多変量$k$-次累積が消えると仮定して、与えられた有限順序に切り替わる。これにより、全ての長さスケールに対して、量子スピン-スピン相関の進化を追跡することができる。自発的崩壊を受ける2次元xyz格子の相転移のパラダイム的場合において、このアプローチを検証する。我々は,パラ磁性から強磁性への定常相転移の存在を,ハミルトニアンカップリングの1つを増加させ,またその古典的イジングの性質を説得力をもって評価する。さらに, このアプローチにより, 散逸臨界点近傍に有意な量子相関が存在することを示し, 量子フィッシャー情報と密接な結合であるスピンスクイーズの存在を明らかにすることができる。 The study of quantum many-body physics in Liouvillian open quantum systems becomes increasingly important with the recent progress in experimental control on dissipative systems and their technological exploitation . A central question in open quantum systems concerns the fate of quantum correlations, and the possibility of controlling them by engineering the competition between the Hamiltonian dynamics and the coupling to a bath. Such a question is challenging from a theoretical point of view, as numerical methods faithfully accounting for quantum correlations are either relying on exact diagonalization, limiting drastically the sizes that can be treated; or on approximations on the range or strength of quantum correlations, associated to the choice of a specific Ansatz for the density matrix. In this work we propose a new method to treat open quantum-spin lattices, based on stochastic quantum trajectories for the solution of the open-system dynamics. 
Along each trajectory, the hierarchy of equations of motion for many-point spin-spin correlators is truncated to a given finite order, assuming that multivariate $k$-th order cumulants vanish for $k$ exceeding a cutoff $k_c$. This allows tracking the evolution of quantum spin-spin correlations up to order $k_c$ for all length scales. We validate this approach in the paradigmatic case of the phase transitions of the dissipative 2D XYZ lattice, subject to spontaneous decay. We convincingly assess the existence of steady-state phase transitions from paramagnetic to ferromagnetic, and back to paramagnetic, upon increasing one of the Hamiltonian couplings; as well as their classical Ising nature. Moreover, the approach allows us to show the presence of significant quantum correlations in the vicinity of the dissipative critical point, and to unveil the presence of spin squeezing, a tight lower bound to the quantum Fisher information.	翻訳日:2023-06-17 03:33:57 公開日:2023-06-14
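As a hedged illustration of the cumulant truncation described above (a standard identity, not the authors' specific equations), consider the lowest non-trivial cutoff $k_c = 2$. Setting the third-order cumulant of three operators $A$, $B$, $C$ to zero closes the hierarchy by expressing three-point correlators through one- and two-point ones, with the operator ordering inside each expectation value kept as in the product $ABC$:

$$
\langle ABC \rangle \approx \langle AB \rangle \langle C \rangle + \langle AC \rangle \langle B \rangle + \langle A \rangle \langle BC \rangle - 2 \langle A \rangle \langle B \rangle \langle C \rangle
$$

Higher cutoffs follow the same pattern: the $(k_c+1)$-point correlators are replaced by sums of products of lower-order ones, so the equations of motion involve only correlators up to order $k_c$.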
# NISQデバイス以降におけるトロタライゼーション適応化とエネルギー自己補正 Making Trotterization adaptive and energy-self-correcting for NISQ devices and beyond ( http://arxiv.org/abs/2209.12653v2 ) ライセンス: Link先を確認	Hongzheng Zhao, Marin Bukov, Markus Heyl, and Roderich Moessner	(参考訳) 連続時間進化のシミュレーションは、古典コンピュータと量子コンピュータの両方で時間離散化を必要とする。より細かい時間ステップはシミュレーションの精度を向上させるが、必然的に計算労力が増加する。これは、今日のノイズの多い中間スケール量子コンピュータにとって特にコストがかかり、有名なゲートの不完全さは、与えられた精度で実行可能な回路の深さを制限する。古典的適応解法は数値計算時間を節約するためによく開発されている。しかしながら、適応時間ステップによって利用可能な量子リソースを最適に利用することは、依然として際立った課題である。本稿では,局所観測器の量子多体ダイナミクスの制御解を提供するため,この問題を解決する量子アルゴリズムを提案する。提案アルゴリズムの鍵となる概念要素は、時間ステップを適応させることでシミュレーションエラーを自己修正するフィードバックループであり、これにより、従来のトロッタースキームを基本レベルで大幅に上回り、回路深さを減少させる。さらには、通常のトロッタライズドダイナミクスが困難に直面している、制御された漸近的長時間エラーも可能にします。我々の量子アルゴリズムのもう1つの重要な利点は、望ましい保存則を自己修正フィードバックループに含めることができることである。我々は、格子ゲージ理論の忠実で長期にわたる量子シミュレーションに不可欠なゲージ不変性を強制することによって、その能力を実証する。このアルゴリズムは、例えば、時間発展ブロックデシメーション法に基づく数値的アプローチなど、時間的離散化が関与する場合には、より一般的なレベルで有用である可能性がある。 Simulation of continuous time evolution requires time discretization on both classical and quantum computers. A finer time step improves simulation precision, but it inevitably leads to increased computational efforts. This is particularly costly for today's noisy intermediate scale quantum computers, where notable gate imperfections limit the circuit depth that can be executed at a given accuracy. Classical adaptive solvers are well-developed to save numerical computation times. However, it remains an outstanding challenge to make optimal usage of the available quantum resources by means of adaptive time steps. Here, we introduce a quantum algorithm to solve this problem, providing a controlled solution of the quantum many-body dynamics of local observables. The key conceptual element of our algorithm is a feedback loop which self-corrects the simulation errors by adapting time steps, thereby significantly outperforming conventional Trotter schemes on a fundamental level and reducing the circuit depth. 
It even allows for a controlled asymptotic long-time error, where conventional Trotterized dynamics faces difficulties. Another key advantage of our quantum algorithm is that any desired conservation law can be included in the self-correcting feedback loop, which potentially gives it a wide range of applicability. We demonstrate its capabilities by enforcing gauge invariance, which is crucial for a faithful and long-sought quantum simulation of lattice gauge theories. Our algorithm may also be useful more generally whenever time discretization is involved, for instance in numerical approaches based on time-evolving block decimation methods.	翻訳日:2023-06-17 03:33:23 公開日:2023-06-14
# テキストマッチングレコメンデーションシステムのアウトオブディストリビューション一般化のための介入の利用 Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems ( http://arxiv.org/abs/2210.10636v2 ) ライセンス: Link先を確認	Parikshit Bansal, Yashoteja Prabhu, Emre Kiciman, Amit Sharma	(参考訳) ユーザの入力テキストが与えられた場合、テキストマッチングレコメンダシステムは、eコマースプラットフォームにおける製品間レコメンデーションなど、入力テキストと利用可能なアイテムの説明を比較して関連項目を出力する。ユーザの関心や項目のインベントリが変化すると期待されているため、テキストマッチングシステムがデータシフト(out-of-distribution (ood) generalization)と呼ばれるタスクに一般化することが重要である。しかし、ペアアイテム関連データ(例えば、ユーザークリック)上で大きなベース言語モデルを微調整する一般的なアプローチは、ood一般化の逆生成的であることがわかった。製品レコメンデーションタスクでは、新しいカテゴリや将来の期間の項目を推奨する場合、微調整はベースモデルよりも精度が悪くなる。この一般化の失敗を説明するために、微調整されたモデルが散発的な相関を捉え、2つのテキスト入力間の関連性を決定する因果的特徴を学習できないことを示す、介入に基づく重要度指標を考える。また、この設定では因果規則化の標準的な手法は適用されないが、画像とは異なり、テキストマッチングタスクには普遍的にスプリアスな特徴が存在しない(同じトークンがマッチしているテキストによってスプリアスか因果的になる可能性がある)。そこで本研究では,テキスト入力におけるOOD一般化について,特定の特徴に対する高い重要点の回避という,異なる目標を掲げる。これは、モデルの関連度スコアに対するトークンの因果効果を、ベースモデルに類似するように制約する介入ベースの正規化器を使用します。 amazon製品と3つの質問推奨データセットの結果から,提案する正規化器は,特にベースモデルが正確でない場合の難解なシナリオにおいて,分布内評価とood評価の両方の一般化を改善できることが分かる。 Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' description, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. 
To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on Amazon product and 3 question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.	翻訳日:2023-06-17 03:25:43 公開日:2023-06-14
# Pareto Manifold Learning:シングルタスクモデルのアンサンブルを通じて複数のタスクに取り組む Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models ( http://arxiv.org/abs/2210.09759v2 ) ライセンス: Link先を確認	Nikolaos Dimitriadis, Pascal Frossard, François Fleuret	(参考訳) MTL(Multi-Task Learning)では、タスクは、単一タスクで訓練されたすべてのモデルより優れた解へと最適化を導くのではなく、互いに競合し、達成できる性能を制限することがある。すべてのタスクに最適なユニークなソリューションが存在しないことが多いため、実践者はタスクのパフォーマンス間のトレードオフをバランスさせ、Paretoの意味での最適性に頼る必要がある。ほとんどのMTL方法論は、この側面を完全に無視し、パレートフロントを学習する代わりに最適化スキームによって事前に定義された1つの解を生成するか、あるいは多様だが離散的な解を生成する。最近のアプローチでは、ニューラルネットワークを介してPareto Frontをパラメータ化し、トレードオフから目的空間への複雑なマッピングにつながっている。本稿では,パレートフロントがパラメータ空間における線形パラメータ化を許容すると予想し,重み空間におけるアンサンブル法である Pareto Manifold Learning を提案する。当社のアプローチでは,単一のトレーニングランで連続的なPareto Frontを生成し,推論中に各タスクのパフォーマンスを調整できる。画像分類から表データセット、シーン理解まで、マルチタスク学習ベンチマークの実験では、Pareto Manifold Learning が最先端の単一ポイントアルゴリズムより優れており、マルチポイントベースラインよりも優れたパレートパラメータ化を学習している。 In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect, and instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose Pareto Manifold Learning, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, which allows modulating the performance on each task during inference. 
Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that Pareto Manifold Learning outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.	翻訳日:2023-06-17 03:24:34 公開日:2023-06-14
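The idea of a linear parameterization in weight space, as described in the abstract above, can be sketched as a convex combination of per-task weight vectors. This is only an illustration of inference-time interpolation under stated assumptions, not the authors' training procedure: the function `interpolate_weights`, the flat-list weight representation, and the example values are all hypothetical.

```python
def interpolate_weights(member_weights, alphas):
    """Convex combination of ensemble members in weight space.

    member_weights: list of flat weight vectors, one per ensemble member.
    alphas: non-negative trade-off coefficients that sum to 1.
    """
    if len(member_weights) != len(alphas):
        raise ValueError("need exactly one coefficient per ensemble member")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must lie on the probability simplex")
    n_params = len(member_weights[0])
    return [
        sum(a * weights[k] for a, weights in zip(alphas, member_weights))
        for k in range(n_params)
    ]


# Two hypothetical ensemble members, each a flattened weight vector.
theta_task1 = [0.0, 0.0]
theta_task2 = [2.0, 4.0]

# Lean the trade-off towards the second member at inference time.
blended = interpolate_weights([theta_task1, theta_task2], [0.25, 0.75])
```

Sweeping the coefficients over the simplex traces a continuous family of models. Whether that family approximates the Pareto Front depends on how the members were trained, which this sketch does not cover.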
# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v3 ) ライセンス: Link先を確認	Hao Xue and Flora D.Salim	(参考訳) 本稿では,時系列予測の新しい視点を提案する。既存の時系列予測手法では、モデルは入力として数値の列を取り、出力として数値値を生成する。既存のSOTAモデルはトランスフォーマーアーキテクチャに基づいており、複数のエンコーディング機構で変更され、歴史的データのコンテキストとセマンティクスが組み込まれている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを疑問視する。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (promptcast)を提案する。この新しいタスクでは、数値入力と出力をプロンプトに変換し、予測タスクを文から文へのフレーム化することで、予測目的の言語モデルを直接適用することができる。本研究を支援するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)を提案する。我々は異なるSOTA数値に基づく予測手法と言語生成モデルを評価する。様々な予測設定によるベンチマーク結果は、言語生成モデルで提案するプロンプトキャストが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較すると、PromptCastはゼロショット設定下でのより優れた一般化能力を示す。 This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. 
The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.	翻訳日:2023-06-17 03:24:08 公開日:2023-06-14
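The sentence-to-sentence framing described above can be sketched as two small helpers: one that renders a numeric history as an input prompt, and one that reads a number back from a generated answer. The template wording, the function names `build_prompt` and `parse_forecast`, and the example values are assumptions made for illustration; they are not the PISA dataset's actual templates.

```python
import re


def build_prompt(history, place, unit):
    """Render a numeric history as an input sentence (hypothetical template)."""
    values = ", ".join(str(v) for v in history)
    return (
        f"From day 1 to day {len(history)}, the average temperature in {place} "
        f"was {values} {unit} on each day. "
        f"What is the temperature going to be on day {len(history) + 1}?"
    )


def parse_forecast(answer):
    """Return the first number found in an output sentence, or None if absent."""
    match = re.search(r"-?\d+(?:\.\d+)?", answer)
    return float(match.group()) if match else None


prompt = build_prompt([21, 23, 22], place="Region A", unit="degrees")
# The answer string stands in for text produced by a language model.
forecast = parse_forecast("The temperature will be 24.5 degrees.")
```

Taking the first number is a deliberately naive choice. It would misread an answer that mentions the day index before the value, so a real pipeline would need an output template that fixes where the forecast appears.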
# 農業領域におけるジョイントセマンティクス,植物インスタンス,葉のインスタンスセグメンテーションの階層的アプローチ Hierarchical Approach for Joint Semantic, Plant Instance, and Leaf Instance Segmentation in the Agricultural Domain ( http://arxiv.org/abs/2210.07879v2 ) ライセンス: Link先を確認	Gianmarco Roggiolani, Matteo Sodano, Tiziano Guadagnino, Federico Magistri, Jens Behley, Cyrill Stachniss	(参考訳) 植物表現型は、植物の成長段階、発達、その他の関連する量を記述するため、農業において中心的な役割である。ロボットは、葉の数、葉面積、植物の大きさなどの植物形質を正確に推定することで、このプロセスの自動化を支援する。本稿では,RGBデータから作物の連接意味,植物インスタンス,葉のインスタンスセグメンテーションの問題に対処する。本稿では,3つのタスクを同時に処理し,その基盤となる階層構造を活用する畳み込みニューラルネットワークを提案する。タスク固有のスキップ接続を導入することで,従来のスキームよりも有益であることが実験的評価で証明される。また,葉の重なり合っているため,農業領域に共通する空間的近接インスタンスの問題に明示的に対処する,新しい自動後処理を提案する。私たちのアーキテクチャは、農業の文脈で同時にこれらの問題に取り組みます。以前の作品は植物または葉のセグメンテーションに焦点を当てるか、意味的なセグメンテーションを最適化しない。その結果,システムの性能は最先端の手法に比べて優れ,パラメータ数が減少し,カメラフレームレートで動作していることがわかった。 Plant phenotyping is a central task in agriculture, as it describes plants' growth stage, development, and other relevant quantities. Robots can help automate this process by accurately estimating plant traits such as the number of leaves, leaf area, and the plant size. In this paper, we address the problem of joint semantic, plant instance, and leaf instance segmentation of crop fields from RGB data. We propose a single convolutional neural network that addresses the three tasks simultaneously, exploiting their underlying hierarchical structure. We introduce task-specific skip connections, which our experimental evaluation proves to be more beneficial than the usual schemes. We also propose a novel automatic post-processing, which explicitly addresses the problem of spatially close instances, common in the agricultural domain because of overlapping leaves. Our architecture simultaneously tackles these problems jointly in the agricultural context. Previous works either focus on plant or leaf segmentation, or do not optimise for semantic segmentation. 
Results show that our system outperforms state-of-the-art approaches while having fewer parameters and operating at camera frame rate.	翻訳日:2023-06-17 03:22:56 公開日:2023-06-14
# CORL: 深部オフライン強化学習ライブラリ CORL: Research-oriented Deep Offline Reinforcement Learning Library ( http://arxiv.org/abs/2210.07105v3 ) ライセンス: Link先を確認	Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov	(参考訳) CORLはオープンソースのライブラリで、ディープオフライン強化学習とオフラインからオンラインへの強化学習の両方のアルゴリズムについて、徹底的にベンチマークされた単一ファイルの実装を提供する。簡単なコードベースと現代的な分析追跡ツールを使って、シンプルな開発体験を強調する。 CORLでは、メソッドの実装を個別のファイルに分離し、パフォーマンス関連の詳細を認識しやすくする。さらに、メトリクス、ハイパーパラメータ、依存関係などをクラウドにログする実験追跡機能も提供されている。最後に、一般的なD4RLデータセットをベンチマークすることで実装の信頼性を保証し、パフォーマンスプロファイルや改善の確率、期待されるオンラインパフォーマンスなどの堅牢な評価ツールに再利用可能な、透過的な結果のソースを提供する。 CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple developing experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate methods implementation into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.	翻訳日:2023-06-17 03:22:37 公開日:2023-06-14
# FP拡散:下記のスコアフォッカー・プランク方程式によるスコアベース拡散モデルの改善 FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation ( http://arxiv.org/abs/2210.04296v4 ) ライセンス: Link先を確認	Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon	(参考訳) スコアベース生成モデル (sgms) は, 音量の増加に伴って摂動するデータ密度に対応する雑音条件スコア関数の族を学習する。これらの摂動データ密度は、拡散過程を経た密度の空間-時間発展を管理する偏微分方程式(pde)であるフォッカー・プランク方程式(fpe)によって結合される。本研究では,摂動データ密度(すなわち勾配)のノイズ条件スコアを特徴付けるスコアfpeと呼ばれる対応する方程式を導出する。驚くべきことに、印象的な経験的性能にもかかわらず、DSM(denoising score matching)によって学習されたスコアは、基礎となるスコアFPEを満たさないことが観察された。スコアFPEの満足度や保守度を向上させるため,FPEの満足度が望ましいことを示す。そこで,本研究では,スコアFPEの満足度を高めるためにDSM目標を標準化することを提案する。 Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.	翻訳日:2023-06-17 03:22:23 公開日:2023-06-14
# 有限データからの重力電流再構成のための物理インフォームドニューラルネットワーク Physics-informed neural networks for gravity currents reconstruction from limited data ( http://arxiv.org/abs/2211.09715v2 ) ライセンス: Link先を確認	Mickaël Delcey, Yoann Cheny, Sébastien Kiesgen de Richter	(参考訳) 本研究では, 物理インフォームドニューラルネットワーク(PINN)を用いた非定常重力電流の3次元再構成について検討した。 PINNの文脈では、目的関数がネットワーク予測と観測データとのミスマッチをペナルティ化し、自動微分を用いて基礎となる方程式を埋め込むニューラルネットワークを訓練することにより、流れ場を再構築する。本研究は、正準ロック交換構成の高忠実度数値実験に依存する。これにより、密度と速度に関する最先端の実験的な測定技術を模倣した、いくつかのトレーニングデータベース上で、PINNの再構築能力を定量的にベンチマークすることができる。特に、光減衰法(LAT)による空間平均密度測定がトレーニング手順に採用されている。 PINNによるフロー再構成のための最適実験セットアップは,実装の複雑さと推定フィールドの精度という2つの基準に従って提案されている。 The present work investigates the use of physics-informed neural networks (PINNs) for the 3D reconstruction of unsteady gravity currents from limited data. In the PINN context, the flow fields are reconstructed by training a neural network whose objective function penalizes the mismatch between the network predictions and the observed data and embeds the underlying equations using automatic differentiation. This study relies on a high-fidelity numerical experiment of the canonical lock-exchange configuration. This allows us to benchmark quantitatively the PINNs reconstruction capabilities on several training databases that mimic state-of-the-art experimental measurement techniques for density and velocity. Notably, spatially averaged density measurements by light attenuation technique (LAT) are employed for the training procedure. An optimal experimental setup for flow reconstruction by PINNs is proposed according to two criteria: the implementation complexity and the accuracy of the inferred fields.	翻訳日:2023-06-17 03:17:00 公開日:2023-06-14
# 因子化階層型変分オートエンコーダにおけるコントラスト学習による不等角化音声表現の改善 Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder ( http://arxiv.org/abs/2211.08191v2 ) ライセンス: Link先を確認	Yuying Xie, Thomas Arildsen, Zheng-Hua Tan	(参考訳) 話者のアイデンティティと内容が異なる時間スケールで異なるという事実を活用すると、因子化階層型変分オートエンコーダ(FHVAE)は2つの属性を象徴するために異なる潜在変数を使用する。これらの属性の切り離しは、対応する潜在変数の異なる事前設定によって実行される。話者同一性変数の事前について、FHVAEは、発話スケールの変化平均と固定分散を持つガウス分布であると仮定する。トレーニングプロセスは、小さな一定の分散を設定することにより、先行する平均に近い1つの発話におけるアイデンティティ変数を促進する。しかし、この制約は、発話間の先行的な変化の平均として比較的弱い。そこで,本研究では,同じ話者を表す場合の話者識別変数を,他の話者と可能な限り距離を置けるようにするために,コントラスト学習をFHVAEフレームワークに導入する。この作業ではモデル構造は変更されていないが、トレーニングプロセスのみであるため、テスト中に追加のコストは必要ない。本論文の応用例として音声変換が選択されている。潜在変数評価には、話者識別変数の話者検証と識別、コンテンツ変数の音声認識が含まれる。さらに, 偽音声検出実験の結果から, 音声変換性能の評価を行った。その結果,提案手法はFHVAEと比較して話者識別とコンテンツ特徴抽出の両面で改善し,変換のベースラインよりも優れた性能を示した。 Leveraging the fact that speaker identity and content vary on different time scales, factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of speaker identity variable, FHVAE assumes it is a Gaussian distribution with an utterance-scale varying mean and a fixed variance. By setting a small fixed variance, the training process promotes identity variables within one utterance gathering close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, to make the speaker identity variables gathering when representing the same speaker, while distancing themselves as far as possible from those of other speakers. The model structure has not been changed in this work but only the training process, thus no additional cost is needed during testing. 
Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, assessments of voice conversion performance are on the grounds of fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and has better performance than baseline on conversion.	翻訳日:2023-06-17 03:16:18 公開日:2023-06-14
# ディープラーニングのための方向性プライバシ Directional Privacy for Deep Learning ( http://arxiv.org/abs/2211.04686v2 ) ライセンス: Link先を確認	Pedro Faustini, Natasha Fernandes, Shakila Tonni, Annabelle McIver, Mark Dras	(参考訳) Differentially Private Stochastic Gradient Descent (DP-SGD)は、ディープラーニングモデルのトレーニングにプライバシーを適用するための重要な方法である。これはトレーニング中の勾配に等方性ガウスノイズを適用し、任意の方向にこれらの勾配を摂動させ、有用性を損なう。しかし、メトリックDPは、ユーティリティの保存にもっと適した任意のメトリクスに基づいた代替メカニズムを提供することができる。本稿では,von mises-fisher (vmf) 分布に基づく機構を用いて,von mises-fisher (vmf) 分布に基づく \textit{directional privacy} を適用し,グラデーション方向が広く保存されるように \textit{angular distance} を用いて勾配を摂動させる。このことは、ガウスのメカニズムの$(\epsilon, \delta)$-privacyではなく、深層学習トレーニングに$\epsilon$-DPと$\epsilon d$-privacyの両方を提供することを示している。これらの異なるフレームワーク間の$\epsilon$sを直接比較できないため、MIA(メンバシップ推論攻撃)を用いた標準DPフレームワーク内のプライバシを実証的に校正する経験的プライバシ校正メカニズムを検証し、MIAの強化と再構築攻撃の組み合わせが、プライバシ校正に適した方法であることを示した。キーデータセットの実験は、VMFメカニズムがユーティリティとプライバシのトレードオフでガウシアンを上回っていることを示している。特に,本実験は,再建と会員推定に対する防御能力の観点から,2つのアプローチのプライバシーを直接比較するものである。 Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. This applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply \textit{directional privacy}, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of \textit{angular distance} so that gradient direction is broadly preserved. We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism; we observe that the $\epsilon d$-privacy guarantee does not require a $\delta>0$ term but degrades smoothly according to the dissimilarity of the input gradients. 
As $\epsilon$s between these different frameworks cannot be directly compared, we examine empirical privacy calibration mechanisms that go beyond previous work on empirically calibrating privacy within standard DP frameworks using membership inference attacks (MIA); we show that a combination of enhanced MIA and reconstruction attacks provides a suitable method for privacy calibration. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.	翻訳日:2023-06-17 03:15:50 公開日:2023-06-14
# LMD:話者検証の逆例を検出する学習可能なマスクネットワーク LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification ( http://arxiv.org/abs/2211.00825v2 ) ライセンス: Link先を確認	Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, and Kunde Yang	(参考訳) 自動話者検証(ASV)のセキュリティは、最近出現した敵攻撃によって深刻な脅威を受けているが、脅威を緩和するための対策がいくつかある。しかし、多くの防御的アプローチは、攻撃者の事前の知識を必要とするだけでなく、弱い解釈性も持っている。そこで本稿では,学習可能なマスク検出器 (LMD) と呼ばれる攻撃者非依存かつ解釈可能な手法を提案する。スコア変動は、元のオーディオ録音のASVスコアと、そのマスク付き複素スペクトログラムから合成された変換オーディオとの絶対的な差である、逆例を検出する指標としてスコア変動を利用する。スコア変動検出装置のコアコンポーネントは、ニューラルネットワークによってマスクされたスペクトログラムを生成することである。ニューラルネットワークはトレーニングの真の例のみを必要とするため、アタッカー非依存のアプローチになる。その解釈性は、ニューラルネットワークがターゲットのasvのスコア変動を最小限に抑えるように訓練され、本物のトレーニング例のマスキングされたスペクトログラムビンの数を最大化する。その基礎は、話者情報が少ない分光器箱の大部分をマスキングすることで、必然的に敵対的な例に大きなスコアの変動をもたらし、実際の例に小さなスコアの変動をもたらすという観察に基づいている。 12人の攻撃者と2人の代表的ASVシステムによる実験結果から,提案手法は最先端の5つのベースラインより優れていることがわかった。大規模な実験結果は、検出に基づくASV防御のベンチマークでもある。 Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require the prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is to generate the masked spectrogram by a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. 
Its interpretability lies in the fact that the neural network is trained to minimize the score variation of the targeted ASV, and maximize the number of the masked spectrogram bins of the genuine training examples. Its foundation is based on the observation that masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation to the adversarial example, and a small score variation to the genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also be a benchmark for the detection-based ASV defenses.	翻訳日:2023-06-17 03:14:43 公開日:2023-06-14
# ランク制約最適化問題に対するDantzig-Wolfe緩和の効果について On the Exactness of Dantzig-Wolfe Relaxation for Rank Constrained Optimization Problems ( http://arxiv.org/abs/2210.16191v3 ) ライセンス: Link先を確認	Yongchun Li and Weijun Xie	(参考訳) 階数制約最適化問題(RCOP)では、予め定義された階数制約付き領域集合上の線形目的関数を最小化し、一般的な二辺行列の不等式を$m$とする。多くの非凸最適化問題を解く一般的なアプローチであるダンツィヒ=ウルフ分解(DW)によって動機付けられ、RCOPのDW緩和(DWR)の強さについて検討する。特に、我々の目標は、DWRが任意の m 個の二辺行列の不等式に対して RCOP と一致する条件を特徴づけることである。初歩的な観点からは、最初に知られた必要条件と十分条件を同時に開発する。 (i)極点正確性 -- DWR の可能な集合のすべての極点が RCOP の極点に属する。 (ii) 凸船体精度 -- DWR 実現可能な集合は RCOP 実現可能な集合の閉凸船体と同一である。 (iii) 客観的厳密性 - dwr と rcop の最適値が一致する。提案した条件は,2次制約付き2次プログラム(QCQP)と公正な教師なし学習において,既存の正確性を統一,洗練,拡張する。これらの条件は、2つの同質な2辺2次制約を持つ不均一な目的関数を許容するQCQP問題の極点完全性や、フェアSVDの凸包完全性など、新しい結果の同定に非常に有用である。 In the rank-constrained optimization problem (RCOP), it minimizes a linear objective function over a prespecified closed rank-constrained domain set and $m$ generic two-sided linear matrix inequalities. Motivated by the Dantzig-Wolfe (DW) decomposition, a popular approach of solving many nonconvex optimization problems, we investigate the strength of DW relaxation (DWR) of the RCOP, which admits the same formulation as RCOP except replacing the domain set by its closed convex hull. Notably, our goal is to characterize conditions under which the DWR matches RCOP for any m two-sided linear matrix inequalities. From the primal perspective, we develop the first-known simultaneously necessary and sufficient conditions that achieve: (i) extreme point exactness -- all the extreme points of the DWR feasible set belong to that of the RCOP; (ii) convex hull exactness -- the DWR feasible set is identical to the closed convex hull of RCOP feasible set; and (iii) objective exactness -- the optimal values of the DWR and RCOP coincide. The proposed conditions unify, refine, and extend the existing exactness results in the quadratically constrained quadratic program (QCQP) and fair unsupervised learning. 
These conditions can be very useful to identify new results, including the extreme point exactness for a QCQP problem that admits an inhomogeneous objective function with two homogeneous two-sided quadratic constraints and the convex hull exactness for fair SVD.	翻訳日:2023-06-17 03:14:15 公開日:2023-06-14
# 群対称性を用いた連続視覚に基づく強化学習 Continual Vision-based Reinforcement Learning with Group Symmetries ( http://arxiv.org/abs/2210.12301v2 ) ライセンス: Link先を確認	Shiqi Liu, Mengdi Xu, Piede Huang, Yongkang Liu, Kentaro Oguchi, Ding Zhao	(参考訳) 継続的な強化学習は、様々なタスクを順次学習し、以前遭遇したタスクを実行する能力を維持し、同時に新しいタスクのための新しいポリシーを開発することを目的としている。しかし、現在の連続RLアプローチは、特定のタスクが回転や翻訳といった基本的なグループ操作、特に視覚入力において同一であるという事実を無視する。彼らは、同じタスクごとに新しいポリシーを学習し、維持する必要があり、サンプル効率が悪く、一般化能力が弱い。そこで本研究では,個別のタスクではなく,個別のタスク群に対するポリシーを育成し,グループ対称性を認識できる,一意な連続視覚に基づく強化学習手法を提案する。 COVERSは、近似ポリシー最適化に基づくRLアルゴリズムと、同変特徴抽出器と、抽出した不変特徴に依存する新しいタスクグループ化機構を用いる。シミュレーションと実ロボットプラットフォームの両方において,画像観察とロボット固有情報を含むテーブルトップ操作タスクのシーケンスについて評価する。その結果, COVERS は各グループにタスクを正確に割り当て, 一般化能力において既存手法よりも優れていた。 Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. 
Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.	翻訳日:2023-06-17 03:13:20 公開日:2023-06-14
# GREAD: グラフニューラル反応拡散ネットワーク GREAD: Graph Neural Reaction-Diffusion Networks ( http://arxiv.org/abs/2211.14208v3 ) ライセンス: Link先を確認	Jeongwhan Choi, Seoyoung Hong, Noseong Park, Sung-Bae Cho	(参考訳) グラフニューラルネットワーク(GNN)は、ディープラーニングに関する最も人気のある研究トピックの1つである。 GNN法は通常、グラフ信号処理理論に基づいて設計されている。特に、拡散方程式はGNNのコア処理層の設計に広く用いられており、悪名高い過密問題に対して必然的に脆弱である。最近、いくつかの論文が拡散方程式とともに反応方程式に注意を払っている。しかし、それらはすべて限定的な反応方程式である。そこで本研究では,我々が設計した1つの特殊反応方程式に加えて,一般的な反応方程式をすべて考慮した反応拡散式に基づくgnn法を提案する。本論文は,反応拡散式に基づくgnnに関する最も包括的な研究の1つである。 9つのデータセットと28のベースラインを用いた実験では、GREADと呼ばれる手法がほとんどのケースで優れています。さらなる合成データ実験により、オーバースムーシング問題を緩和し、様々なホモフィリー率でうまく機能することが示された。 Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunctions with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.	翻訳日:2023-06-17 03:05:07 公開日:2023-06-14
# プレイヤーは次に動くのか? バドミントンにおける運動予測のための動的グラフと階層融合 Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton ( http://arxiv.org/abs/2211.12217v2 ) ライセンス: Link先を確認	Kai-Shiang Chang, Wei-Yao Wang, Wen-Chih Peng	(参考訳) 各種データの分析により,トレーニング戦略やプレーヤ評価などの洞察が得られ,スポーツ分析が注目を集めている。そこで本稿では,どの種類の復帰ストロークが作られるか,また,選手が前回のストロークに基づいてどこに移動するかを予測することに焦点を当てる。この問題はこれまで解決されていないため、シーケンス予測タスクとして定式化することにより、シーケンスベースおよびグラフベースのモデルを通じて動き予測に取り組むことができる。しかし、既存のシーケンスベースのモデルはプレイヤー間の相互作用の影響を無視しており、グラフベースのモデルは次の動きに対する多面的視点に苦しむ。また、プレイヤーのショットタイプや動きの戦略的関係を表現する作業は現存していない。これらの課題に対処するために,まず,プレイヤーの動き(pm)グラフの手順を導入し,プレイヤーの構造的動きを戦略的関係に活用する。 PMグラフに基づいて,対話スタイル抽出器を用いた動的グラフと階層型動き予測モデル(DyMF)を提案する。さらに、階層的融合モジュールはプレイヤーとラリー相互作用の両方のスタイルの影響を組み込むように設計されている。広範な実験により,本モデルが逐次的およびグラフ的手法を経験的に上回っており,動き予測の実用性が示される。 Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of returning strokes will be made, and where players will move to based on previous strokes. As this problem has not been addressed to date, movement forecasting can be tackled through sequence-based and graph-based models by formulating as a sequence prediction task. However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. 
Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture the mutual interactions of players themselves and between both players within a rally, and dynamic players' tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrate the practical usage of movement forecasting.	翻訳日:2023-06-17 03:04:18 公開日:2023-06-14
# Aging with GRACE: 離散キーバリューアダプタによる生涯モデル編集 Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters ( http://arxiv.org/abs/2211.11031v3 ) ライセンス: Link先を確認	Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi	(参考訳) デプロイされたモデルは、入力のシフト、ユーザニーズの変化、あるいは創発的な知識ギャップによって、時間の経過とともに崩壊する。有害な行動が特定される場合、ターゲットとする編集が必要である。しかし、事前訓練されたモデルの特定の振る舞いを調整する現在のモデルエディタは、複数の編集でモデル性能を低下させる。本稿では,展開モデルのストリーミングエラーにスポットフィックスを実装し,無関係な入力への影響を最小限に抑えるライフロングモデル編集手法であるGRACEを提案する。 GRACEはトレーニング済みモデルの潜在空間に新しいマッピングを書き、モデルの重みを変えることなく、個別にローカルな編集のコードブックを作成する。これはストリーミングエラーのみを使用して、数千のシーケンシャルな編集を可能にする最初の方法である。 T5,BERT,GPTモデルを用いた実験では,非表示入力に一般化しつつ,編集および保持におけるGRACEの最先端性能を示す。私たちのコードは https://www.github.com/thartvigsen/grace で入手できる。 Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model performance over multiple edits. We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at https://www.github.com/thartvigsen/grace.	翻訳日:2023-06-17 03:03:38 公開日:2023-06-14
# 特徴帰属を伴うニューラルマシン翻訳における幻覚の低減 Reducing Hallucinations in Neural Machine Translation with Feature Attribution ( http://arxiv.org/abs/2211.09878v2 ) ライセンス: Link先を確認	Joël Tang, Marina Fomicheva, Lucia Specia	(参考訳) ニューラル条件付き言語生成モデルは、ニューラルネットワーク翻訳(NMT)の最先端を実現するが、並列トレーニングデータセットの品質に大きく依存する。低品質のデータセットでトレーニングすると、これらのモデルは幻覚、すなわち、流動的だが原文とは無関係な出力を含む様々なエラータイプに傾向がある。これらの誤りは特に危険である、なぜなら表面上は翻訳が正しい出力であると認識でき、特に読者がソース言語を理解していない場合である。 NMTにおける幻覚の軽減を目的としたモデル理解と正規化に着目したケーススタディを提案する。まず,幻覚を発生させるNMTモデルの行動を研究するために特徴帰属法を用いる。次に,これらの手法を用いて幻覚を低減し,モデルをスクラッチから再トレーニングする必要のない新しい損失関数を提案する。 Neural conditional language generation models achieve the state-of-the-art in Neural Machine Translation (NMT) but are highly dependent on the quality of parallel training dataset. When trained on low-quality datasets, these models are prone to various error types, including hallucinations, i.e. outputs that are fluent, but unrelated to the source sentences. These errors are particularly dangerous, because on the surface the translation can be perceived as a correct output, especially if the reader does not understand the source language. We present a case study focusing on model understanding and regularisation to reduce hallucinations in NMT. We first use feature attribution methods to study the behaviour of an NMT model that produces hallucinations. We then leverage these methods to propose a novel loss function that substantially helps reduce hallucinations and does not require retraining the model from scratch.	翻訳日:2023-06-17 03:02:47 公開日:2023-06-14
# ポイントクラウド登録のための解空間切断を用いた進化的マルチタスク Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration ( http://arxiv.org/abs/2212.05679v2 ) ライセンス: Link先を確認	Wu Yue, Peiran Gong, Maoguo Gong, Hangqi Ding, Zedong Tang, Yibo Liu, Wenping Ma, Qiguang Miao	(参考訳) ポイントクラウド登録(PCR)はコンピュータビジョンにおいて人気のある研究トピックである。近年,対象関数設計における初期ポーズに対する頑健さと柔軟性から,進化的手法による登録法が注目されている。しかし、ほとんどの登録法は局所最適にうまく対応できず、成功率を調査することはめったになく、これは局所最適に陥らない可能性を示し、アルゴリズムの実用性に密接に関係している。進化的マルチタスク最適化(EMTO)は、関連するタスク間の知識伝達を通じて探索能力を向上するパラダイムである。この概念に着想を得た本研究では,マルチタスク構成を解空間切断の考え方に基づくEMTOによる新規な登録アルゴリズムを提案する。具体的には, カットスペースを探索するタスクは, 局所最適から逃れ, 登録率を向上する上で, 複雑な関数ランドスケープを伴うタスクを支援する。不要な計算コストを削減するため,スパース・トゥ・ダンス戦略を提案する。また,様々なオーバーラップ率に頑健な新しい適合関数と,計算コストの課題特異的指標を導入する。 8つの進化的アプローチ,4つの従来のアプローチ,および3つのディープラーニングアプローチによるオブジェクトスケールおよびシーンスケールの登録データセットと比較し,実験結果から,提案手法は精度と局所最適処理において優れた性能を示した。 Point cloud registration (PCR) is a popular research topic in computer vision. Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. 
In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 8 evolving approaches, 4 traditional approaches and 3 deep learning approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.	翻訳日:2023-06-17 02:56:54 公開日:2023-06-14
# TIDE: グラフによるディープラーニングのための時間微分拡散 TIDE: Time Derivative Diffusion for Deep Learning on Graphs ( http://arxiv.org/abs/2212.02483v2 ) ライセンス: Link先を確認	Maysam Behmanesh, Maximilian Krahn, Maks Ovsjanikov	(参考訳) グラフニューラルネットワークの顕著なパラダイムは、メッセージパッシングフレームワークに基づいている。この枠組みでは、隣接ノード間のみの情報通信を実現する。このパラダイムを使用するアプローチの課題は、深層畳み込みネットワークが過密になりやすいため、ノード間の効率的で正確な長距離通信を保証することである。本稿では,メッセージパッシングフレームワークの構造的制約を克服するために,時間微分グラフ拡散(TIDE)に基づく新しい手法を提案する。提案手法により,様々なタスクやネットワークチャネル間の空間的拡散範囲を最適化し,中長距離通信を効率的に行うことができる。さらに, アーキテクチャ設計により, ローカルメッセージパッシングが可能であり, ローカルメッセージパッシングの能力を継承できることを示す。グラフベンチマークと合成メッシュとグラフデータセットの両方において,提案フレームワークが最先端手法を著しく上回っていることを示す。 A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits from the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin.	翻訳日:2023-06-17 02:56:11 公開日:2023-06-14
# 組込みラベルノイズロバスト深部画像表現学習の創発的推論 Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning ( http://arxiv.org/abs/2212.01261v2 ) ライセンス: Link先を確認	Gencer Sumbul and Begüm Demir	(参考訳) 深層学習に基づく画像表現学習(IRL)手法の開発は,様々な画像理解問題に対して大きな注目を集めている。これらの手法の多くは、大量の注釈付き訓練画像の可用性と品質を必要としており、収集には時間と費用がかかる。ラベル費用を削減するため、クラウドソースデータ、自動ラベル付け手順、市民科学プロジェクトなどが考えられる。しかしながら、このようなアプローチは、トレーニングデータにラベルノイズを含めるリスクを増大させる。差別的推論が採用されると、ノイズラベルが過小評価される可能性がある。これにより、準最適学習手順が導き出され、画像の特徴が不正確になる。そこで本研究では,生成的推論統合ラベル雑音ロバスト深部表現学習(GRID)手法を提案する。本研究の目的は、雑音ラベル下でのIRLの識別的・生成的推論の相補的特性をモデル化することである。そこで我々はまず,教師付き変分オートエンコーダを用いて生成的推論を識別的推論に統合する。これにより、GRIDはノイズラベルでトレーニングサンプルを自動的に検出できる。そして,ラベルノイズによる頑健なハイブリッド表現学習戦略を通じて,これらのサンプルのIRLの学習手順全体を,識別的推論により生成的推論および他のサンプルの学習手法によって調整する。提案手法は,IRL法とは独立に雑音ラベルの干渉を防止しつつ,識別的画像表現を学習する。したがって、既存の手法とは異なり、GRIDはアノテーションの種類、ニューラルネットワークアーキテクチャ、損失関数、学習タスクに依存しないため、様々な問題に直接利用することができる。実験結果から, 最先端手法と比較して有効性を示した。 GRIDのコードはhttps://github.com/gencersumbul/GRIDで公開されている。 The development of deep learning based image representation learning (IRL) methods has attracted great attention for various image understanding problems. Most of these methods require the availability of a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen science projects can be considered. However, such approaches increase the risk of including label noise in training data. It may result in overfitting on noisy labels when discriminative reasoning is employed. This leads to sub-optimal learning procedures, and thus inaccurate characterization of images. To address this, we introduce a generative reasoning integrated label noise robust deep representation learning (GRID) approach. Our approach aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. 
To this end, we first integrate generative reasoning into discriminative reasoning through a supervised variational autoencoder. This allows GRID to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure for IRL of these samples through generative reasoning and that of other samples through discriminative reasoning. Our approach learns discriminative image representations while preventing interference of noisy labels independently from the IRL method being selected. Thus, unlike the existing methods, GRID does not depend on the type of annotation, neural network architecture, loss function or learning task, and thus can be directly utilized for various problems. Experimental results show its effectiveness compared to state-of-the-art methods. The code of GRID is publicly available at https://github.com/gencersumbul/GRID.	翻訳日:2023-06-17 02:55:15 公開日:2023-06-14
# 非凸低ランク半有限緩和による逆学習ニューラルネットワークのタイト認証 Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations ( http://arxiv.org/abs/2211.17244v3 ) ライセンス: Link先を確認	Hong-Ming Chiu and Richard Y. Zhang	(参考訳) adversarial trainingは、adversarial perturbationに対して経験的に堅牢な高品質なニューラルネットワークモデルを作成することでよく知られている。それでも、一度モデルが逆行訓練を受けたら、モデルが将来の攻撃に対して真に堅牢であることを証明したいと願うことが多い。残念なことに、敵対的に訓練されたモデルに直面した場合、既存のアプローチはすべて、実用的に使えるほど強力な認証を作成するのに苦労しています。特に線形プログラミング(LP)技術は「凸緩和障壁」(convex relaxation barrier)に直面しており、混合整数線形プログラミング(MILP)と分岐およびバウンド(BnB)技術で洗練されても高品質な認証を行うことができない。本稿では,半定値プログラミング(SDP)緩和の低ランク制約に基づく非凸認証手法を提案する。非凸緩和により、より高価なSDPメソッドに匹敵する強力な認証が得られ、より弱いLPメソッドに匹敵する、劇的に少ない変数を最適化する。非凸性にもかかわらず、既製の局所最適化アルゴリズムが多項式時間における大域的最適性の実現と証明にどのように役立つかを示す。実験の結果,非凸緩和は正反対に訓練されたモデルの正確な認証に対するギャップをほぼ完全に埋めることがわかった。 Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevent them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. 
Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.	翻訳日:2023-06-17 02:54:26 公開日:2023-06-14
# Prompt-Augmented Linear Probing:Few-shot In-Context Learnersの限界を超えるスケーリング Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners ( http://arxiv.org/abs/2212.10873v3 ) ライセンス: Link先を確認	Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim	(参考訳) In-context Learning (ICL) を通じて、大規模言語モデルは、追加のモデル微調整なしで効果的な数ショット学習者となる。しかし、ICLの性能は、基礎となる言語モデル固有の入力長制約によって制限されるため、利用可能なトレーニングサンプルの数に対してうまくスケールしない。一方、言語モデルもまた強力な特徴抽出器であり、ブラックボックス方式で利用でき、事前抽出された入力表現の上に軽量な識別器を訓練する線形探索パラダイムを可能にすることが多くの研究で明らかにされている。本稿では,両世界の最善を生かす線形プローブと ICL のハイブリッドである prompt-augmented linear probing (PALP) を提案する。 PALPは線形探索のスケーラビリティと言語モデルを強制することで、入力をより知覚可能な形式に調整することでより意味のある表現を導き出す能力を継承する。各種データセットの詳細な調査を通じて、PALPは、データ・ハングリーシナリオにおけるICL間のギャップを閉じる入力表現と、トレーニングオーバーヘッドの少ないデータ・バウンダントシナリオでの微調整を著しく強化し、ブラックボックスシナリオにおいてPALPが強力な代替手段となる可能性を検証した。 Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. 
Through in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.	翻訳日:2023-06-17 02:45:34 公開日:2023-06-14
# GPT-3は良いデータアノテーションか? Is GPT-3 a Good Data Annotator? ( http://arxiv.org/abs/2212.10450v2 ) ライセンス: Link先を確認	Bosheng Ding, Chengwei Qin, Linlin Liu, Yew Ken Chia, Shafiq Joty, Boyang Li, Lidong Bing	(参考訳) データアノテーションは、機械学習モデルのトレーニングに使用できるデータのラベル付けプロセスである。モデルが入力データと所望の出力の関係を学習できるようにするため、高品質なアノテーションを持つことが不可欠である。 OpenAIが開発した大規模言語モデルであるGPT-3は、広範囲なNLPタスクにおいて、ゼロショットと少数ショットのパフォーマンスを誇示している。したがって、NLPタスクのデータに効果的にアノテートできるかどうか疑問に思うのが自然である。本稿では,GPT-3を従来のデータアノテーション手法と比較し,その出力を様々なタスクで分析することにより,データアノテータとしての性能を評価する。そこで本研究では,NLPにおける汎用データアノテータとしてのGPT-3の可能性について考察する。 Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.	翻訳日:2023-06-17 02:45:12 公開日:2023-06-14
# 干し草の山に刺さる針--要約タスクのためのMTurkにおける高一致度ワーカーの分析 Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization ( http://arxiv.org/abs/2212.10397v3 ) ライセンス: Link先を確認	Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc	(参考訳) 低品質アノテーションにおけるリソースのコストと非効率的な使用を防止するため、自動要約評価などの困難なタスクを効果的に完了できる信頼可能なアノテータのプールを作成する方法が望まれる。そこで本研究では,amazon mechanical turk workersの2段階パイプラインによる採用について検討する。我々は、評価を行う前にサブパーワーカーをフィルタリングし、同様のリソース制約のもとで高一致度のアノテーションを得られることを示す。当社のワーカーは、自分自身とクラウドリサーチワーカーの間で強いコンセンサスを示していますが、データのサブセットに対する専門家の判断との一致は期待どおりではなく、正確性に関するさらなるトレーニングが必要です。この論文は、他の困難なアノテーションタスクにおいて、資格アノテータを採用するためのベストプラクティスとして機能する。 To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization. Thus, we investigate the recruitment of high-quality Amazon Mechanical Turk workers via a two-step pipeline. We show that we can successfully filter out subpar workers before they carry out the evaluations and obtain high-agreement annotations with similar constraints on resources. Although our workers demonstrate a strong consensus among themselves and CloudResearch workers, their alignment with expert judgments on a subset of the data is not as expected and needs further training in correctness. This paper still serves as a best practice for the recruitment of qualified annotators in other challenging annotation tasks.	翻訳日:2023-06-17 02:44:58 公開日:2023-06-14
# DOC:詳細なアウトライン制御による長いストーリーコヒーレンスの改善 DOC: Improving Long Story Coherence With Detailed Outline Control ( http://arxiv.org/abs/2212.10077v3 ) ライセンス: Link先を確認	Kevin Yang, Dan Klein, Nanyun Peng, Yuandong Tian	(参考訳) 複数単語のストーリーを自動的に生成する際の長距離プロットコヒーレンスを改善するための詳細アウトライン制御(DOC)フレームワークを提案する。 DOCは2つの補完的なコンポーネントで構成されている。詳細アウトラインは、より詳細で階層的に構造化されたアウトラインを作成し、クリエイティブな負担をメインのドラフト手順から計画段階に移行する。詳細コントローラは、アウトラインの詳細に合わせてストーリーの節を制御することで、生成中もより詳細なアウトラインが尊重されるようにします。自動生成ストーリーの人間による評価では、DOCはプロットコヒーレンス(22.5%の絶対ゲイン)、アウトライン関連(28.2%)、面白さ(20.7%)で強いRe3ベースライン(Yang et al., 2022)を大幅に上回る。人間はまた、DOCは対話的な世代設定においてはるかに制御可能であると判断した。 We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.	翻訳日:2023-06-17 02:44:43 公開日:2023-06-14
# マルコフ決定過程における因果時間推論 Causal Temporal Reasoning for Markov Decision Processes ( http://arxiv.org/abs/2212.08712v2 ) ライセンス: Link先を確認	Milad Kazemi and Nicola Paoletti	(参考訳) 我々は,マルコフ決定過程(MDP)の検証のための新しい確率的時間論理である$\textit{PCFTL (Probabilistic CounterFactual Temporal Logic)}$を導入する。 PCFTLは因果推論の演算子を初めて含み、介入的および反事実的クエリを表現できる。経路公式 $\phi$ が与えられたとき、介入性は、特定の変更 $I$ を MDP に適用した場合に$\phi$ の満足度確率に関係する(例えば、別のポリシーに切り替えるなど)。 MDP の異なる構成を含む \textit{what-if} のシナリオを推論できるため、我々のアプローチは、固定されたシステム構成のみを推論できる既存の確率的時間論理から逸脱している。統語論的観点から,PCTLなどの従来の確率演算子と同様に,介入確率と反ファクト確率の両方を仮定する一般化された反ファクト演算子を導入する。意味論の観点からは、我々の論理はMDPの構造的因果モデル変換によって解釈され、これは反事実的推論を許容する表現を与える。グリッドワールドモデルのベンチマークを用いて,PCFTLを安全な強化学習の文脈で評価する。 We introduce $\textit{PCFTL (Probabilistic CounterFactual Temporal Logic)}$, a new probabilistic temporal logic for the verification of Markov Decision Processes (MDP). PCFTL is the first to include operators for causal reasoning, allowing us to express interventional and counterfactual queries. Given a path formula $\phi$, an interventional property is concerned with the satisfaction probability of $\phi$ if we apply a particular change $I$ to the MDP (e.g., switching to a different policy); a counterfactual allows us to compute, given an observed MDP path $\tau$, what the outcome of $\phi$ would have been had we applied $I$ in the past. For its ability to reason about \textit{what-if} scenarios involving different configurations of the MDP, our approach represents a departure from existing probabilistic temporal logics that can only reason about a fixed system configuration. From a syntactic viewpoint, we introduce a generalized counterfactual operator that subsumes both interventional and counterfactual probabilities as well as the traditional probabilistic operator found in e.g., PCTL. From a semantics viewpoint, our logic is interpreted over a structural causal model translation of the MDP, which gives us a representation amenable to counterfactual reasoning. 
We evaluate PCFTL in the context of safe reinforcement learning using a benchmark of grid-world models.	翻訳日:2023-06-17 02:44:10 公開日:2023-06-14
# 説明付きAI意思決定における人間直観の役割の理解 Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations ( http://arxiv.org/abs/2301.07255v3 ) ライセンス: Link先を確認	Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, Gagan Bansal	(参考訳) AIの説明は、人間とAIの意思決定を改善する方法としてしばしば言及されるが、実証的研究は、説明の有効性の一貫性のある証拠を見出さず、逆に、AIシステムが間違っている場合に過度な信頼性を高めることができることを示唆している。多くの要因がAIサポートに依存する可能性があるが、意思決定者がAIの予測をいつオーバーライドするかを決定するためにAIシステムが提供する情報と、事前知識、経験、パターン認識に基づいて、自身の直観(信念やヒューリスティックス)をどう解釈するかが重要な要素である。我々は、意思決定者の直感がAIの予測と説明の使用にどのように影響するか、そして最終的にAIに依存するタイミングを選択するために、2つの予測タスクのための2つの説明タイプ(機能と例に基づく)で、思考アラウドと混合メソッドの研究を行う。結果から,AIの予測と説明に関する推論に関わる3つの直観,すなわちタスク結果,特徴,AIの限界に関する直観を抽出した。これらに基づいて、意思決定者が自身の直感を適用し、AI予測を上書きする3つの観察経路を要約する。筆者らは,(1)特徴に基づく説明が参加者の判断結果を改善せず,AIに対する信頼度を高めなかった理由,(2)特徴に基づく説明よりも意思決定者のパフォーマンスを向上し,補完的な人間-AIのパフォーマンスを実現した事例に基づく説明を,これらの経路を用いて説明している。全体として、私たちの研究は、意思決定者がAIに適切に依存するための直感を効果的に適用するのに役立つAI意思決定支援システムと説明方法のさらなる発展に向けた方向性を特定します。 AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong. While many factors may affect reliance on AI support, one important factor is how decision-makers reconcile their own intuition -- beliefs or heuristics, based on prior knowledge, experience, or pattern recognition, used to make judgments -- with the information provided by the AI system to determine when to override AI predictions. We conduct a think-aloud, mixed-methods study with two explanation types (feature- and example-based) for two prediction tasks to explore how decision-makers' intuition affects their use of AI predictions and explanations, and ultimately their choice of when to rely on AI. Our results identify three types of intuition involved in reasoning about AI predictions and explanations: intuition about the task outcome, features, and AI limitations. 
Building on these, we summarize three observed pathways for decision-makers to apply their own intuition and override AI predictions. We use these pathways to explain why (1) the feature-based explanations we used did not improve participants' decision outcomes and increased their overreliance on AI, and (2) the example-based explanations we used improved decision-makers' performance over feature-based explanations and helped achieve complementary human-AI performance. Overall, our work identifies directions for further development of AI decision-support systems and explanation methods that help decision-makers effectively apply their intuition to achieve appropriate reliance on AI.	翻訳日:2023-06-17 02:36:43 公開日:2023-06-14
# 手術集約:分散医用画像データセットを多様なタスクで調和させる協調学習フレームワーク Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks ( http://arxiv.org/abs/2301.06683v4 ) ライセンス: Link先を確認	Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh	(参考訳) 大規模胸部X線データセットは、深層学習を用いて異常を検出するためにキュレートされ、多くの臨床応用において大きな利益をもたらす可能性がある。しかしながら、各データセットは、患者に同時に存在する可能性のある発見のサブセットのみに焦点を当てており、複数のデータセットをまとめるモデルをトレーニングすることは困難である。したがって、これらのデータセットを集約的に活用し、胸腔内に存在する可能性のある異常の完全な表現で臨床的に有用なモデルを訓練することが重要である。そこで本研究では,分散異種データセットから知識を部分的アノテーションで融合・集約する協調学習フレームワークである外科的アグリゲーション(surgical aggregation)を提案する。人工的および実世界の異種データセットにまたがる外科的アグリゲーションを部分的アノテーションを用いて評価する。以上の結果から, 外科的アグリゲーションは現在の戦略より優れ, より一般化し, 異種疾患ラベル付きデータセットを用いても, 臨床的に有用なモデルの開発を促進できる可能性が示唆された。 Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.	翻訳日:2023-06-17 02:36:06 公開日:2023-06-14
# AIモデルのための理論的枠組みとバイオメディシンへの応用 A Theoretical Framework for AI Models Explainability with Application in Biomedicine ( http://arxiv.org/abs/2212.14447v4 ) ライセンス: Link先を確認	Matteo Rizzo, Alberto Veneri, Andrea Albarelli, Claudio Lucchese, Marco Nobile, Cristina Conati	(参考訳) 説明可能な人工知能(XAI)は、人工知能コミュニティにおいて活発な研究テーマであり、メソッドやドメインにまたがる関心が高まっている。この問題については多くが書かれてきたが、XAIはいまだに共通用語と説明に構造的健全性を提供するフレームワークを欠いている。本研究では,文献に見ることができるものの合成である説明の新しい定義を提案することで,これらの課題に対処した。我々は、説明が原子性ではなく、モデルとその入出力マッピングに由来する証拠の組み合わせであり、この証拠の人間の解釈であると認識する。さらに、忠実性(すなわち、モデルの内部動作と意思決定プロセスの真の説明である説明)と可否性(つまり、その説明がどの程度ユーザにとって説得力のあるように見えるか)について説明する。提案する理論的枠組みを用いて,これらの特性の操作方法を単純化し,ケーススタディとして分析する共通説明法に対する新たな洞察を与える。 EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.	翻訳日:2023-06-17 02:34:33 公開日:2023-06-14
# MixupE: 方向微分の観点からのミックスアップの理解と改善 MixupE: Understanding and Improving Mixup from Directional Derivative Perspective ( http://arxiv.org/abs/2212.13381v3 ) ライセンス: Link先を確認	Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi	(参考訳) Mixupはディープニューラルネットワークをトレーニングするための一般的なデータ拡張テクニックで、入力とラベルを線形に補間することで追加サンプルを生成する。この技術は多くの学習パラダイムや応用において一般化性能を向上させることが知られている。本研究では,まず混合を解析し,すべての順序の無限個の方向微分を暗黙的に規則化することを示す。この新たな知見に基づいて,理論上はバニラミックスアップよりも優れた一般化性能を提供するため,mixupの改良版を提案する。提案手法の有効性を示すために,画像,表データ,音声,グラフなどの様々な領域で実験を行った。提案手法は,様々なアーキテクチャを用いて,複数のデータセットのミックスアップを改良し,ImageNet Top-1の精度が0.8%向上したことを示す。 Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.	翻訳日:2023-06-17 02:34:14 公開日:2023-06-14
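As a rough illustration of the vanilla Mixup interpolation that the MixupE abstract above starts from (not the improved MixupE variant, whose details are not given in this listing), a minimal pure-Python sketch follows. The function name `mixup_pair`, the toy data, and the fixed mixing coefficient `lam` are illustrative assumptions; in practice the coefficient is usually sampled at random for each pair.

```python
def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two (input, label) pairs with coefficient lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # The same coefficient is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Two toy samples with one-hot labels (hypothetical data).
mixed_x, mixed_y = mixup_pair([0.0, 2.0], [1.0, 0.0], [2.0, 4.0], [0.0, 1.0], lam=0.25)
# mixed_x == [1.5, 3.5], mixed_y == [0.25, 0.75]
```

The mixed label is a soft label, so a training loss that accepts probability vectors as targets is needed when samples like these are used.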
# 反復生成のためのスケーラブル適応計算 Scalable Adaptive Computation for Iterative Generation ( http://arxiv.org/abs/2212.11972v2 ) ライセンス: Link先を確認	Allan Jabri, David Fleet, Ting Chen	(参考訳) 自然データは冗長だが支配的なアーキテクチャであり、入出力空間を均一に計算する。本稿では,データ次元からコア計算を分離し,よりスケーラブルな高次元データ生成のための適応計算を可能にする注目型アーキテクチャであるRecurrent Interface Networks (RINs)を提案する。 RINは、潜在トークンとデータトークンの間の情報(すなわちルート)を読み書きするためにクロスアテンションを使用して、計算の大部分(すなわちグローバルな自己アテンション)を潜在トークンの集合にフォーカスする。 RINブロックの積み重ねにより、ボトムアップ(データから遅延)とトップダウン(データに近い)のフィードバックが可能になり、より深く表現力のあるルーティングが可能になる。このルーティングには課題が伴うが、拡散モデルによる反復生成のようなタスク(およびルーティング問題)が徐々に変化する繰り返し計算設定では問題が少なくなる。逆拡散過程の各前方通過に潜時トークンを前処理、すなわち潜時自己条件で条件付けすることで再帰性を活用する方法を示す。 RINは、画像生成とビデオ生成のための最先端のピクセル拡散モデルを生成し、カスケードやガイダンスなしで1024X1024画像にスケーリングすると同時に、ドメインに依存しず、2Dや3D U-Netよりも最大10倍効率が高い。 Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. 
RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024X1024 images without cascades or guidance, while being domain-agnostic and up to 10X more efficient than 2D and 3D U-Nets.	翻訳日:2023-06-17 02:33:59 公開日:2023-06-14
# 小型潜時ネットワークを用いた適応型シームズ追跡 Adaptive Siamese Tracking with a Compact Latent Network ( http://arxiv.org/abs/2302.00930v2 ) ライセンス: Link先を確認	Xingping Dong, Jianbing Shen, Fatih Porikli, Jiebo Luo, and Ling Shao	(参考訳) 本稿では,シームズに基づくトラッカーを簡易化するために,トラッキングタスクを分類に変換し,直感的なビューアを提供する。この見地から,視覚シミュレーションや実追跡例を通じて詳細な解析を行い,いくつかの困難な状況における障害事例をオフライントレーニングにおける決定的サンプルの欠落問題とみなすことができる。最初の(最初の)フレームのサンプルは、豊富なシーケンス固有情報を含んでいるので、シーケンス全体を表す決定的なサンプルとみなすことができる。ベースモデルを新しいシーンに迅速に適応させるために、これらの決定的なサンプルをフル活用して、コンパクトな潜在ネットワークを提示する。具体的には,逐次的情報抽出を効率的に行うことで,高速調整のための統計に基づくコンパクトな潜在性特徴を提案する。さらに,提案するコンパクト潜在ネットワークの識別能力をさらに向上させるための,新たな多種多様なサンプルマイニング戦略を考案した。最後に,追跡フェーズ中のシーン変動を効率的に処理するために,基本モデルを更新するための条件付き更新戦略を提案する。本手法の一般化と有効性を評価するため,siamrpn++,siamfc,siambanの3つの古典的なsiameseベースのトラッカーを調整した。最近の6つのデータセットの大規模な実験結果から、3つの調整されたトラッカーは高い走行速度を保ちながら精度で優れた性能が得られることが示された。 In this paper, we provide an intuitive viewing to simplify the Siamese-based trackers by converting the tracking task to a classification. Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training. Since the samples in the initial (first) frame contain rich sequence-specific information, we can regard them as the decisive samples to represent the whole sequence. To quickly adapt the base model to new scenes, a compact latent network is presented via fully using these decisive samples. Specifically, we present a statistics-based compact latent feature for fast adjustment by efficiently extracting the sequence-specific information. Furthermore, a new diverse sample mining strategy is designed for training to further improve the discrimination ability of the proposed compact latent network. Finally, a conditional updating strategy is proposed to efficiently update the basic models to handle scene variation during the tracking phase. 
To evaluate the generalization ability and effectiveness of our method, we apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN. Extensive experimental results on six recent datasets demonstrate that all three adjusted trackers achieve superior accuracy while maintaining a high running speed.	翻訳日:2023-06-17 02:25:16 公開日:2023-06-14
# 電子化エージェントへのインターネットスケールビジョンランゲージモデルの蒸留 Distilling Internet-Scale Vision-Language Models into Embodied Agents ( http://arxiv.org/abs/2301.12507v2 ) ライセンス: Link先を確認	Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta	(参考訳) 命令追従エージェントは言語を観察空間と行動空間に基礎付ける必要がある。基底言語への学習は、通常、ドメイン固有のエンジニアリングまたは大量のヒューマンインタラクションデータを必要とする。この課題に対処するために,事前に訓練された視覚言語モデル (VLM) を用いてエンボディエージェントを監督する手法を提案する。モデル蒸留と後視体験再生(HER)のアイデアを組み合わせて, VLMを用いてエージェントの動作を記述する言語を遡及的に生成する。単純なプロンプトによって監督信号を制御でき、エージェントに3dレンダリングされた環境で名前(平面など)や特徴(色など)に基づいて、新しいオブジェクトと対話するように教えます。 fewshotプロンプトでは、既存のカテゴリ(食べ物とおもちゃ)やアドホックなもの(オブジェクトよりもアービタリーな好み)など、抽象的なカテゴリのメンバシップを教えられます。我々の研究は、インターネットスケールのVLMを使うための新しい効果的な方法を概説し、そのようなモデルが獲得した汎用言語基盤を再利用し、エージェントにタスク関連基盤を教える。 Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Fewshot prompting lets us teach abstract category membership, including pre-existing categories (food vs toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.	翻訳日:2023-06-17 02:23:50 公開日:2023-06-14
# SOBER:離散空間と混合空間上の高並列ベイズ最適化とベイズ四分法 SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces ( http://arxiv.org/abs/2301.11832v3 ) ライセンス: Link先を確認	Masaki Adachi, Satoshi Hayakawa, Saad Hamid, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne	(参考訳) Batch Bayesian optimization と Bayesian quadrature は、高価な目的関数を並列にクエリできる最適化と二次化を行うサンプル効率のよい方法であることが示されている。しかし、現在の手法は大規模なバッチサイズにはスケールしない -- 実際には頻繁なデシデラタム(例えば、薬物の発見やシミュレーションに基づく推論)である。本稿では,分散空間上の任意の獲得関数とカーネルを持つ,スケーラブルで多様なバッチグローバル最適化と定式化を実現する新しいアルゴリズム SOBER を提案する。我々のアプローチの鍵は、二次問題としてグローバル最適化のためのバッチ選択を再構成することであり、これは獲得関数の最大化(非凸)をカーネル再結合(凸)に緩和する。グローバル最適化と二次のブリッジは、搾取ベイズ最適化と探索ベイズ二次のメリットをバランスさせることで、両方のタスクを効率的に解決することができる。実世界の12のタスクにおいて,SOBERが11の競争ベースラインを上回っていることを示す。 Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.	翻訳日:2023-06-17 02:23:30 公開日:2023-06-14
# パラメーター効率の高い転送学習による言語モデルの分布外ロバスト性の検出 Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning ( http://arxiv.org/abs/2301.11660v4 ) ライセンス: Link先を確認	Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min Yoo, and Sang-goo Lee	(参考訳) プレトレーニング言語モデル (PLM) のサイズが増加し続けるにつれて, 微調整の膨大なコストを補うために, パラメータ効率の学習手法が多数提案されている。大規模な事前学習言語モデル (PLM) と各種パラメータ効率変換学習法 (PETL) が日没ベンチマークで達成した印象的な結果にもかかわらず, 分散的にシフトした入力を効果的に処理できるかどうかは不明である。本研究では,plmの大きさや転送方法が変化するにつれて,od(out-of-distribution)がどう変化するかを体系的に検討する。具体的には,異なるスケールの様々な言語モデルを用いて,3つの異なる意図分類タスクにおいて,微調整,アダプタ,lora,プレフィックスチューニングを含む様々なpetl手法を評価した。 As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive results achieved by large pre-trained language models (PLMs) and various parameter-efficient transfer learning (PETL) methods on sundry benchmarks, it remains unclear if they can handle inputs that have been distributionally shifted effectively. In this study, we systematically explore how the ability to detect out-of-distribution (OOD) changes as the size of the PLM grows or the transfer methods are altered. Specifically, we evaluated various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks, each utilizing various language models with different scales.	翻訳日:2023-06-17 02:23:13 公開日:2023-06-14
# LightGCL:レコメンデーションのためのシンプルで効果的なグラフコントラスト学習 LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation ( http://arxiv.org/abs/2302.08191v3 ) ライセンス: Link先を確認	Xuheng Cai, Chao Huang, Lianghao Xia, Xubin Ren	(参考訳) グラフニューラルネットワーク(GNN)は、グラフベースのレコメンデータシステムのための強力な学習手法である。近年, コントラスト学習と統合されたGNNは, 高度にスパースなデータを扱うことを目的としたデータ拡張方式により, 優れた性能を示した。その成功にもかかわらず、既存のグラフのコントラスト学習手法のほとんどは、ユーザ-itemの相互作用グラフ上で確率的拡張(ノード/エッジの摂動)を行うか、あるいはコントラスト的なビューを生成するためにヒューリスティックベースの拡張技術(ユーザクラスタリングなど)に依存する。これらの手法は本質的な意味構造を十分に保ち得ず、ノイズの摂動によって容易にバイアスを受けることができる。本稿では,これらの問題を緩和し,CLベースのレコメンデータの汎用性と堅牢性を損なう,簡易で効果的なグラフコントラッシブ学習パラダイムLightGCLを提案する。本モデルでは, コントラスト拡張のために特異値分解を排他的に活用し, 協調関係モデリングによる制約のない構造改善を可能にする。いくつかのベンチマークデータセットで行った実験は、最先端のモデルよりもモデルの性能が大幅に向上したことを示している。さらなる分析は、データスパーシリティと人気バイアスに対するLightGCLの頑健さの優位性を示している。私たちのモデルのソースコードはhttps://github.com/HKUDS/LightGCLで公開されています。 Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structural refinement with global collaborative relation modeling. 
Experiments conducted on several benchmark datasets demonstrate the significant improvement in performance of our model over state-of-the-art methods. Further analyses demonstrate the superiority of LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.	翻訳日:2023-06-17 02:17:18 公開日:2023-06-14
# バンディット・ソーシャル・ラーニング: 近視眼的行動下での探索 Bandit Social Learning: Exploration under Myopic Behavior ( http://arxiv.org/abs/2302.07425v3 ) ライセンス: Link先を確認	Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins	(参考訳) エージェントが単純なマルチアームバンディットプロトコルに従う社会学習のダイナミクスについて検討する。エージェントは順次到着し、腕を選び、関連する報酬を受け取る。各エージェントは、前のエージェントの完全な履歴(腕と報酬)を観察し、プライベートシグナルは存在しない。集団としてのエージェントは探索と活用のトレードオフに直面するが、各エージェントは探索を考慮せず近視眼的に行動する。モチベーションシナリオは、オンラインプラットフォームにおけるレビューと評価に関するものだ。我々は、「偏見のない」行動や様々な行動バイアスを含む、(パラメータ化された)信頼区間と整合した幅広い近視眼的行動を許容する。これらの行動の極端なバージョンはよく知られたバンディットアルゴリズムに対応しているが、より穏健なバージョンは深刻な探索失敗につながり、結果としてエージェント数に線形な後悔率をもたらすことを証明している。我々は「適度に楽観的な」エージェントを分析して後悔の上限を一致させる。独立した興味の対象となる特別な場合として,多腕バンディットにおけるグリーディアルゴリズムの失敗に関する一般的な結果を得る。我々の知る限り、これは文献における最初のそのような結果である。 We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.	翻訳日:2023-06-17 02:16:43 公開日:2023-06-14
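The exploration failure of greedy (myopic) play mentioned in the bandit abstract above can be sketched on a toy two-armed Bernoulli bandit. This is only an illustration under assumed values, not the paper's model or proof: the function `run_greedy`, the arm means, and the forced first rewards are all hypothetical. Each arm is pulled once, after which every arriving agent picks the arm with the highest empirical mean.

```python
import random


def run_greedy(means, first_rewards, horizon, seed=0):
    """Greedy play on a Bernoulli bandit after one forced pull of each arm.

    means: true success probability of each arm (hypothetical values).
    first_rewards: reward observed on each arm's forced first pull.
    Returns the number of pulls per arm after `horizon` further greedy rounds.
    """
    rng = random.Random(seed)
    counts = [1] * len(means)
    sums = list(first_rewards)
    for _ in range(horizon):
        # Pick the arm with the highest empirical mean; no exploration bonus.
        arm = max(range(len(means)), key=lambda a: sums[a] / counts[a])
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Arm 0 is truly better (0.9 vs 0.5) but happened to return 0 on its first pull.
counts = run_greedy(means=[0.9, 0.5], first_rewards=[0, 1], horizon=1000)
# counts == [1, 1001]: the better arm is never tried again.
```

Because arm 1 already has one success, its empirical mean stays strictly positive while arm 0's stays at zero, so the outcome does not depend on the random draws, and the expected loss relative to always playing arm 0 grows linearly with the horizon.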
# Androidマルウェア検出のための継続的学習 Continuous Learning for Android Malware Detection ( http://arxiv.org/abs/2302.04332v2 ) ライセンス: Link先を確認	Yizheng Chen, Zhoujie Ding, David Wagner	(参考訳) 機械学習は、androidのマルウェアを非常に高い精度で検出できる。しかし、これらの分類器にはコンセプトドリフトというアキレス腱があり、マルウェアアプリや良質なアプリの進化によって、それらは急速に時代遅れになり、非効率になる。我々の研究によると、Androidのマルウェア分類器を1年分のデータでトレーニングした後、新しいテストサンプルに6ヶ月デプロイした後、F1スコアはすぐに0.99から0.76に低下した。本稿では,androidマルウェア分類器の概念ドリフト問題に対処する新しい手法を提案する。マシンラーニングのテクニックを継続的にデプロイする必要があるため、私たちはアクティブラーニングを使用します。アナリストがラベル付けする新しいサンプルを選択し、ラベル付きサンプルをトレーニングセットに追加して、分類器を再トレーニングします。私たちの重要なアイデアは、類似性に基づく不確実性が、コンセプトドリフトに対してより堅牢であることです。そこで我々は,コントラスト学習とアクティブラーニングを組み合わせる。本稿では,新しい階層的コントラスト学習スキームと,androidマルウェア分類器を継続的に学習するための新しいサンプル選択手法を提案する。評価の結果,前回公表したアクティブラーニング手法と比較して,大幅な改善がみられた。我々のアプローチは、偽陰性率を14%(最良のベースライン)から9%に削減するとともに、偽陽性率(0.86%から0.48%)を低下させる。また,従来の手法よりも7年間にわたって一貫した性能を維持する。 Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective, due to the evolution of malware apps and benign apps. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning techniques need to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier. 
Our evaluation shows that this leads to significant improvements, compared to previously published methods for active learning. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach maintains more consistent performance across a seven-year time period than past methods.	翻訳日:2023-06-17 02:15:34 公開日:2023-06-14
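The active-learning loop in the abstract above (select samples for analysts, label them, add them to the training set) might be sketched as follows. This is an illustration under stated assumptions, not the paper's method. Distance to the nearest labeled embedding stands in for the similarity-based uncertainty score, the retraining step is omitted, and `select_for_labeling`, `active_learning_round` and `oracle` are hypothetical names.

```python
import math


def nearest_distance(x, labeled):
    """Distance from embedding x to the closest labeled embedding."""
    return min(math.dist(x, emb) for emb, _ in labeled)


def select_for_labeling(unlabeled, labeled, budget):
    """Pick the `budget` samples least similar to anything already labeled."""
    ranked = sorted(
        unlabeled, key=lambda x: nearest_distance(x, labeled), reverse=True
    )
    return ranked[:budget]


def active_learning_round(unlabeled, labeled, budget, oracle):
    """One round: select samples, ask the analyst (oracle), grow the training set."""
    chosen = select_for_labeling(unlabeled, labeled, budget)
    new_labeled = labeled + [(x, oracle(x)) for x in chosen]
    remaining = [x for x in unlabeled if x not in chosen]
    # A real system would retrain the classifier on new_labeled at this point.
    return new_labeled, remaining
```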
# 効率的な同変GNNのためのSO(3)のSO(2)への畳み込み Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs ( http://arxiv.org/abs/2302.03655v2 ) ライセンス: Link先を確認	Saro Passaro, C. Lawrence Zitnick	(参考訳) 点雲や原子などの3Dデータをモデル化するグラフニューラルネットワークは、通常、$SO(3)$等式、すなわち3Dローテーションに同変することを望んでいる。残念ながら、同変ネットワークの基本的な操作である同変畳み込みは、高次テンソルを使用すると計算複雑性が大幅に増加する。本稿では、$SO(3)$畳み込みあるいはテンソル積を$SO(2)$ の数学的に等価な畳み込みに還元することでこの問題に対処する。これは、ノード埋め込みの一次軸をエッジベクトルに合わせることで達成され、これはテンソル積を分散させ、計算複雑性を$O(L^6)$から$O(L^3)$に減らし、$L$は表現の次数である。本研究では,大規模oc-20およびoc-22データセットの最先端結果を実現する等変畳み込み法を用いて,グラフニューラルネットワークである等変球状チャネルネットワーク(escn)を提案することで,この改善の可能性を示す。 Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be $SO(3)$ equivariant, i.e., equivariant to 3D rotations. Unfortunately equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$ . This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from $O(L^6)$ to $O(L^3)$, where $L$ is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.	翻訳日:2023-06-17 02:15:11 公開日:2023-06-14
# ブラックボックスモデルで単純なタスクをオーバーキルしなくなり、代わりに透明モデルを使用する Stop overkilling simple tasks with black-box models and use transparent models instead ( http://arxiv.org/abs/2302.02804v2 ) ライセンス: Link先を確認	Matteo Rizzo, Matteo Marcuzzo, Alessandro Zangari, Andrea Gasparetto, Andrea Albarelli	(参考訳) 近年、ディープラーニングの手法が採用され、人工知能にいくつかの大きなブレークスルーをもたらした。従来の機械学習モデルとは異なり、ディープラーニングベースのアプローチは、生データから自律的に特徴を抽出することができる。これにより、一般的にエラーを起こしやすく、面倒であると考えられる機能エンジニアリングプロセスをバイパスすることができる。さらに、ディープラーニング戦略は、精度で従来のモデルより優れていることが多い。 In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious. Moreover, deep learning strategies often outperform traditional models in terms of accuracy.	翻訳日:2023-06-17 02:14:49 公開日:2023-06-14
# 予算・ROI制約付きマルチチャンネル自動入札 Multi-channel Autobidding with Budget and ROI Constraints ( http://arxiv.org/abs/2302.01523v3 ) ライセンス: Link先を確認	Yuan Deng, Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni	(参考訳) デジタルオンライン広告では、広告主は複数のプラットフォーム、あるいはGoogle Ads、Meta Ads Managerなどのいわゆるチャンネルで広告インプレッションを同時に調達し、各チャンネルは多数の広告オークションから構成される。広告主が全チャンネルの総コンバージョン(広告クリックなど)を最大化しつつ、全体の投資収益率(ROI)と予算の制約を満たす方法について検討する。実際には、広告主は各チャンネルで参加する個別の広告オークションを制御できず、したがってグローバルに最適化することもできないため、チャンネルに自身の代理としてインプレッションを調達する権限を与える。広告主が各チャンネルで利用できるレバーは、チャンネルごとの予算とチャンネルごとの目標ROIの設定の2つだけである。本研究では,広告主のグローバルなマルチチャンネル問題を解決するために,これらの各レバーの有効性をまず分析する。広告主がチャンネルごとのROIのみを最適化する場合、総コンバージョンがグローバルな問題で得られるものよりも任意に悪化しうることを示す。さらに,チャンネルごとの予算のみを最適化する場合には,広告主がグローバルに最適なコンバージョンを実現できることを示す。この発見を踏まえ、広告主が各チャンネルの広告オークションやチャンネルの広告調達方法について限られた情報しか持たない実世界のシナリオを模倣したバンディットフィードバック設定の下で、チャンネルごとの予算を生成する効率的な学習アルゴリズムを提案し、その結果得られるコンバージョンはグローバルな最適問題のものを近似する。最後に、我々の結果はすべて、チャンネルが広告主の代理としてインプレッションを調達するシングルアイテムおよびマルチアイテムのオークションの両方に当てはまることを論じる。 In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. 
Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.	翻訳日:2023-06-17 02:14:30 公開日:2023-06-14
# MADDPGにおけるGumbel-Softmaxの再検討 Revisiting the Gumbel-Softmax in MADDPG ( http://arxiv.org/abs/2302.11793v2 ) ライセンス: Link先を確認	Callum Rhys Tilbury, Filippos Christianos, Stefano V. Albrecht	(参考訳) MADDPGはマルチエージェント強化学習(MARL)におけるアルゴリズムであり、一般的な単エージェント法であるDDPGをマルチエージェントシナリオに拡張する。 DDPGは、状態-作用値関数の勾配が存在する連続的な行動空間向けに設計されたアルゴリズムである。このアルゴリズムが離散作用空間で動作するためには、離散勾配推定を行う必要がある。 maddpgでは、gumbel-softmax (gs) 推定器が使用されている -- 離散分布を同様の連続分布に緩和する再パラメータ化である。しかし、この手法は統計的に偏りがあり、最近のMARLベンチマークでは、このバイアスにより、アクション空間が離散的なグリッドワールド環境でのMADDPGの性能が低下することが示唆されている。幸いにもGSの代替品は数多く存在し、幅広い特性を誇っている。本稿では,これらの選択肢のいくつかを探索し,離散グリッドワールドシナリオのためのMADDPGに統合する。さまざまなパフォーマンス指標に対する対応する影響を計測して分析する。提案した推定器の1つは、いくつかのタスクにおいて元のGSよりもはるかに優れた性能を示し、最大で55%高いリターンを達成し、より高速な収束を実現している。 MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.	翻訳日:2023-06-17 02:06:55 公開日:2023-06-14
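The Gumbel-Softmax relaxation mentioned in the abstract above can be sketched in a few lines. This is a generic textbook form written with the standard library only, not the MADDPG implementation studied in the paper. The function name and the clamping constant are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(U)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the output toward a one-hot vector and higher temperatures smooth it toward uniform. The relaxed sample is differentiable in the logits, which is what lets DDPG-style gradients flow, at the cost of the bias the abstract refers to.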
# MalProtect: MLベースのマルウェア検出における敵対的クエリ攻撃に対するステートフル防御 MalProtect: Stateful Defense Against Adversarial Query Attacks in ML-based Malware Detection ( http://arxiv.org/abs/2302.10739v2 ) ライセンス: Link先を確認	Aqib Rashid and Jose Such	(参考訳) MLモデルは、敵対的クエリ攻撃に対して脆弱であることが知られている。これらの攻撃では、出力以外にターゲットモデルに関する知識を持たないまま、クエリが特定のクラスに向けて反復的に摂動される。リモートホスト型ML分類モデルとMachine-Learning-as-a-Serviceプラットフォームの普及は、クエリ攻撃がこれらのシステムのセキュリティに現実の脅威をもたらすことを意味する。これに対処するため、システムが受信したクエリのシーケンスを監視し分析することで、クエリ攻撃を検出し敵対的サンプルの生成を防止するステートフルな防御が提案されている。近年、いくつかのステートフルな防御が提案されている。しかし、これらの防御は、他の領域では有効かもしれない類似性または分布外検出手法のみに依存している。マルウェア検出領域では、敵対的サンプルを生成する方法は本質的に異なるため、そのような検出機構は著しく効果が低い。そこで本研究では,マルウェア検出領域におけるクエリ攻撃に対するステートフルな防御であるMalProtectを提案する。 MalProtectはいくつかの脅威指標を使用して攻撃を検出する。以上の結果から,Android および Windows マルウェアでは,さまざまな攻撃者シナリオにおいて,敵対的クエリ攻撃の回避率を80%以上削減できることがわかった。この種の最初の評価では、MalProtectは、特に敵対的脅威が最大となる状況下で、以前のステートフルな防御よりも優れている。 ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model besides its output. The prevalence of remotely-hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To deal with this, stateful defenses have been proposed to detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. Several stateful defenses have been proposed in recent years. However, these defenses rely solely on similarity or out-of-distribution detection methods that may be effective in other domains. In the malware detection domain, the methods to generate adversarial examples are inherently different, and therefore we find that such detection mechanisms are significantly less effective. Hence, in this paper, we present MalProtect, which is a stateful defense against query attacks in the malware detection domain. 
MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by more than 80% in Android and Windows malware, across a range of attacker scenarios. In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.	翻訳日:2023-06-17 02:06:11 公開日:2023-06-14
# ディープニューラルネットワークにおけるショートカット学習の取り組み--解釈可能なモデルによる反復的アプローチ Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models ( http://arxiv.org/abs/2302.10289v6 ) ライセンス: Link先を確認	Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich	(参考訳) 概念に基づく解釈可能なモデルを用いてショートカット学習を緩和する。既存の方法には解釈性がない。 Blackbox (BB) から始めて、解釈可能な専門家の混合物 (MoIE) と残差ネットワークを反復的に切り出す。各専門家は、FOL(First Order Logic)を使用してデータのサブセットを説明する。サンプルを説明する際に、偏りのあるBB由来のMoIEから得られるFOLがショートカットを効果的に検出する。 BBをメタデータ正規化(MDN)で微調整すると、ショートカットがなくなる。微調整したBB由来のMoIEから得られるFOLはショートカットの除去を検証する。実験の結果,MoIEは元のBBの精度を損なわず,ショートカットを効果的に除去することがわかった。 We use concept-based interpretable models to mitigate shortcut learning. Existing methods lack interpretability. Beginning with a Blackbox (BB), we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.	翻訳日:2023-06-17 02:05:48 公開日:2023-06-14
# 深層学習アルゴリズムによる多変量系リスク対策と計算 Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms ( http://arxiv.org/abs/2302.10183v2 ) ライセンス: Link先を確認	Alessandro Doldi, Yichen Feng, Jean-Pierre Fouque, Marco Frittelli	(参考訳) 本研究では,多変量ユーティリティ関数によって定義されるシステム的短絡リスク尺度の計算のための深層学習に基づくアルゴリズムを提案する。本稿では,主観的最適性と関連するリスク割り当ての公平性に着目し,重要な理論的側面について論じる。私たちが提供しているアルゴリズムは、予備最適化の学習、二重表現の最適化、およびそれに対応する公正なリスク割り当てを可能にします。アルゴリズムをベンチマークモデルと比較し,一対の指数的ユーティリティ関数をベースとして,明示的な公式を提供するアルゴリズムを検証した。また、明示的な公式が得られない場合においても収束の証拠を示す。 In this work we propose deep learning-based algorithms for the computation of systemic shortfall risk measures defined via multivariate utility functions. We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations. The algorithms we provide allow for learning primal optimizers, optima for the dual representation and corresponding fair risk allocations. We test our algorithms by comparison to a benchmark model, based on a paired exponential utility function, for which we can provide explicit formulas. We also show evidence of convergence in a case for which explicit formulas are not available.	翻訳日:2023-06-17 02:05:36 公開日:2023-06-14
# 人選好による言語モデルの事前学習 Pretraining Language Models with Human Preferences ( http://arxiv.org/abs/2302.08582v2 ) ライセンス: Link先を確認	Tomasz Korbak and Kejian Shi and Angelica Chen and Rasika Bhalerao and Christopher L. Buckley and Jason Phang and Samuel R. Bowman and Ethan Perez	(参考訳) 言語モデル(LM)はインターネットテキストを模倣するために事前訓練されており、LMが生成したコンテンツには、偽造、攻撃的なコメント、個人識別可能な情報、品質の低いコード、バギーコードなどが含まれる。本稿では,人間の嗜好に沿ったテキストを生成する方法として,LMの事前学習のための代替目的を検討する。我々は,3つのタスクにまたがるフィードバックによる事前学習の5つの目標をベンチマークし,それらが事前訓練されたLMのアライメントと能力のトレードオフに与える影響について検討する。そこで我々は、条件付きトレーニングや、報酬モデルによって与えられる人間の嗜好スコアに基づくトークン上の分布の学習という、パレート最適で簡単なアプローチを見出した。条件付きトレーニングは、プロンプトを使わずに生成する時と逆行するプロンプトを伴って、望ましくないコンテンツの速度を最大で桁違いに減少させる。さらに条件付きトレーニングは、タスク固有の微調整前後において、標準lmプリトレーニングのダウンストリームタスクパフォーマンスを維持する。人間のフィードバックによる事前トレーニングは、標準のlmプリトレーニングよりもずっと優れた好み満足度をもたらし、続いてフィードバックによる微調整、すなわち学習、そして望ましくない行動を学習する。この結果から,LMの事前学習では模倣学習を超越し,訓練開始から人間の嗜好を取り入れるべきであることが示唆された。 Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. 
Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.	翻訳日:2023-06-17 02:04:27 公開日:2023-06-14
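Conditional training, as summarized in the abstract above, learns a token distribution conditioned on reward-model scores. One simple way to picture the data preparation is to prefix each text segment with a control token chosen from its score. The sketch below assumes exactly that. The token strings, the threshold rule and the function name `tag_segments` are illustrative assumptions, not details given in the abstract.

```python
GOOD = "<|good|>"  # assumed control token for segments at or above the threshold
BAD = "<|bad|>"    # assumed control token for segments below the threshold


def tag_segments(segments, scores, threshold):
    """Prefix each segment with a control token based on its preference score."""
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    tagged = []
    for segment, score in zip(segments, scores):
        token = GOOD if score >= threshold else BAD
        tagged.append(token + segment)
    return "".join(tagged)
```

An LM pretrained on such tagged text could then be prompted with the "good" token at inference time, which is one way of steering generation toward preferred content without discarding the low-scoring training data.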
# iSAGE: データストリームのオンライン説明のためのSAGEのインクリメンタルバージョン iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams ( http://arxiv.org/abs/2303.01181v2 ) ライセンス: Link先を確認	Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier	(参考訳) SAGEのような一般的な特徴重要度尺度を含む既存の説明可能な人工知能(XAI)の方法は、主にバッチ学習シナリオに限定されている。しかしながら、機械学習は、データが継続的に到着し、学習をオンライン形式で行わなければならない動的環境に適用されることが多い。そこで本研究では,SAGEの時間・メモリ効率のインクリメンタル化であるiSAGEを提案する。さらに,機能依存性を(干渉的に)破壊し,(観察的に)保持する,効率的な機能削除手法も提供する。さらに,iSAGEがSAGEと類似した理論的性質に固執していることを示すための説明法を正式に分析した。最後に,確立されたデータセットと概念ドリフトを伴うデータストリームに基づいて,我々のアプローチを徹底した実験分析で評価する。 Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.	翻訳日:2023-06-17 01:57:30 公開日:2023-06-14
# GNOT: 演算子学習のための一般ニューラル演算子トランスフォーマー GNOT: A General Neural Operator Transformer for Operator Learning ( http://arxiv.org/abs/2302.14376v3 ) ライセンス: Link先を確認	Zhongkai Hao, Zhengyi Wang, Hang Su, Chengyang Ying, Yinpeng Dong, Songming Liu, Ze Cheng, Jian Song, Jun Zhu	(参考訳) 偏微分方程式(PDEs)の解演算子の学習は、機械学習において不可欠な問題である。しかし、不規則メッシュ、複数入力関数、PDEの解の複雑さなど、実践的な応用における演算子学習にはいくつかの課題がある。そこで本研究では,演算子学習のためのスケーラブルで効果的なトランスフォーマーベースのフレームワークであるgeneral neural operator transformer (GNOT)を提案する。新たな不均一正規化アテンション層を設計することにより,複数の入力関数や不規則メッシュを柔軟に扱うことができる。また,マルチスケール問題を解くためにソフトドメイン分解と見なすことのできる幾何学的ゲーティング機構を導入する。トランスフォーマーアーキテクチャの大規模モデルキャパシティは,大規模データセットと実用上の問題にスケールする可能性をモデルに与える。異なる領域の複数の挑戦的データセットで広範囲に実験し,代替手法と比較して著しい改善を達成した。私たちのコードとデータは、https://github.com/thu-ml/GNOT で公開されている。 Learning partial differential equations' (PDEs) solution operators is an essential problem in machine learning. However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs' solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators. By designing a novel heterogeneous normalized attention layer, our model is highly flexible to handle multiple input functions and irregular meshes. Besides, we introduce a geometric gating mechanism which could be viewed as a soft domain decomposition to solve the multi-scale problems. The large model capacity of the transformer architecture grants our model the possibility to scale to large datasets and practical problems. We conduct extensive experiments on multiple challenging datasets from different domains and achieve a remarkable improvement compared with alternative methods. Our code and data are publicly available at https://github.com/thu-ml/GNOT.	翻訳日:2023-06-17 01:57:15 公開日:2023-06-14
# 胸部X線画像における知識強化視覚言語事前学習 Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images ( http://arxiv.org/abs/2302.14042v3 ) ライセンス: Link先を確認	Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie	(参考訳) 大規模データで事前学習されたマルチモーダル基礎モデルは自然言語理解や視覚認識に成功しているが、医療領域におけるそれらの使用は、医学的タスクのきめ細かい性質とドメイン知識への高い要求のために依然として制限されている。この課題に対処するために,既存の医学領域の知識を活用して,ペアになった胸部X線と放射線学レポートを用いた視覚言語事前学習を導く,知識強化型自動診断(KAD)という新しいアプローチを提案する。我々は,4つの外部X線データセット上でKADを評価し,そのゼロショット性能が完全教師付きモデルに匹敵するだけでなく,5つの病理のうち3つにおいて3名の専門放射線科医の平均を統計的に有意に上回ることを示した。さらに、少数ショットのアノテーションが利用できる場合、KADは、微調整設定で既存のすべてのアプローチより優れており、異なる臨床シナリオにおける適用の可能性を示している。 While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.	翻訳日:2023-06-17 01:56:41 公開日:2023-06-14
# 正規化動的プログラミングによる最適計画 Optimistic Planning by Regularized Dynamic Programming ( http://arxiv.org/abs/2302.14004v3 ) ライセンス: Link先を確認	Antoine Moulin, Gergely Neu	(参考訳) 本稿では,標準近似値反復手順の更新に正規化を加えるという考え方に基づいて,無限ホライゾン割引マルコフ決定過程における楽観的計画手法を提案する。この手法により, 線形関数近似を用いたMDPの最小二乗法により推定される近似遷移関数を, 既存の近似動的プログラミング手法の分析で必要とされる縮退や単調性引数を回避することができる。本手法は,表付きMDPの既知保証を回復し,また,1つの経験ストリームから,割引された線形混合MDPの準最適ポリシーを学習するための計算効率の良いアルゴリズムを提供する。 We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.	翻訳日:2023-06-17 01:56:22 公開日:2023-06-14
# DiffusioNeRF: Denoising Diffusion Modelを用いた正則化ニューラルラジアンス場 DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models ( http://arxiv.org/abs/2302.12231v2 ) ライセンス: Link先を確認	Jamie Wynn, Daniyar Turmukhambetov	(参考訳) 良好な条件下では、ニューラルレージアンス場(NeRF)は、新しいビュー合成タスクにおいて印象的な結果を示している。 NeRFは、トレーニングビューとシーンの異なるレンダリングとの光度差を最小限にして、シーンの色と密度場を学習する。十分な一連のビューからトレーニングされたNeRFは、任意のカメラ位置から新しいビューを生成することができる。しかし、シーンの幾何学とカラーフィールドは厳しい制約下にあり、特に少ない入力ビューでトレーニングされた場合、アーティファクトにつながる可能性がある。この問題を軽減するために,ddm(denoising diffusion model)を用いて,風景形状と色彩の先行学習を行う。我々のDDMは、合成HypersimデータセットのRGBDパッチに基づいて訓練されており、色と深さの確率分布の対数勾配を予測できる。これらのrgbdパッチプリエントの対数勾配は,シーンの形状や色を規則化するのに役立つ。 nerfトレーニング中、ランダムなrgbdパッチがレンダリングされ、ログ類似度の推定勾配が色と密度フィールドに再伝播される。最も関連するデータセットであるllffの評価は、学習済みの事前学習によって再構成された幾何学の質が向上し、新しい視点への一般化が改善されたことを示している。 DTUの評価では、NeRF法で再現性が改善された。 Under good conditions, Neural Radiance Fields (NeRFs) have shown impressive results on novel view synthesis tasks. NeRFs learn a scene's color and density fields by minimizing the photometric discrepancy between training views and differentiable renderings of the scene. Once trained from a sufficient set of views, NeRFs can generate novel views from arbitrary camera positions. However, the scene geometry and color fields are severely under-constrained, which can lead to artifacts, especially when trained with few input views. To alleviate this problem we learn a prior over scene geometry and color, using a denoising diffusion model (DDM). Our DDM is trained on RGBD patches of the synthetic Hypersim dataset and can be used to predict the gradient of the logarithm of a joint probability distribution of color and depth patches. We show that, these gradients of logarithms of RGBD patch priors serve to regularize geometry and color of a scene. During NeRF training, random RGBD patches are rendered and the estimated gradient of the log-likelihood is backpropagated to the color and density fields. 
Evaluations on LLFF, the most relevant dataset, show that our learned prior achieves improved quality in the reconstructed geometry and improved generalization to novel views. Evaluations on DTU show improved reconstruction quality among NeRF methods.	翻訳日:2023-06-17 01:55:06 公開日:2023-06-14
# MedNeXt: 医用画像セグメンテーションのためのConvNetのトランスフォーマー駆動スケーリング MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation ( http://arxiv.org/abs/2303.09975v3 ) ライセンス: Link先を確認	Saikat Roy, Gregor Koehler, Constantin Ulrich, Michael Baumgartner, Jens Petersen, Fabian Isensee, Paul F. Jaeger, Klaus Maier-Hein	(参考訳) 医療画像セグメンテーションのためにTransformerベースのアーキテクチャを採用することへの関心は爆発的に高まっている。しかし、大規模な注釈付き医療データセットの欠如により、自然画像のそれと同等のパフォーマンスを達成することは困難である。対照的に畳み込みネットワークは帰納バイアスが高く、その結果、高い性能で容易に訓練できる。近年、ConvNeXtアーキテクチャはトランスフォーマーブロックをミラーリングすることで標準ConvNetの近代化を試みた。そこで本研究では, これを改良し, データの乏しい医療現場の課題に合わせてカスタマイズした, 現代的でスケーラブルな畳み込み型アーキテクチャを設計した。トランスフォーマーにインスパイアされた大規模カーネルセグメンテーションネットワークであるMedNeXtを導入し,1)医療画像セグメンテーションのための完全なConvNeXt 3Dエンコーダデコーダネットワークを導入する。 2) 規模にまたがる意味的豊かさを維持するため,残留ConvNeXtのアップアンドダウンサンプリングブロック。 3)小規模カーネルネットワークのアップサンプリングによるカーネルサイズを反復的に増加させ,限られた医療データの性能飽和を防止する新手法 4)MedNeXtの複数レベルの複合スケーリング(深さ,幅,カーネルサイズ)。これにより、CTとMRIの4つのタスクにおける最先端のパフォーマンスと、さまざまなデータセットサイズが実現され、医療画像セグメンテーションのための近代化されたディープアーキテクチャが表される。私たちのコードは、https://github.com/MIC-DKFZ/MedNeXt で公開されています。 There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets makes achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. 
We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces: 1) a fully ConvNeXt 3D encoder-decoder network for medical image segmentation, 2) residual ConvNeXt up- and downsampling blocks to preserve semantic richness across scales, 3) a novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, and 4) compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at: https://github.com/MIC-DKFZ/MedNeXt.	翻訳日:2023-06-17 01:47:39 公開日:2023-06-14
# スピンネットワークにおける励起伝達制御のロバスト性評価と統一化 Analyzing and Unifying Robustness Measures for Excitation Transfer Control in Spin Networks ( http://arxiv.org/abs/2303.09518v2 ) ライセンス: Link先を確認	S. P. O'Neil, I. Khalid, A. A. Rompokos, C. A. Weidner, F. C. Langbein, S. G. Schirmer, E. A. Jonckheere	(参考訳) 量子制御の最近の業績は、量子通信、コンピューティング、センシングのアプリケーションのためのコントローラを設計するための高度な技術を生み出した。しかし、そのようなシステムのノイズや不確実性への感受性は、量子デバイスの完全なポテンシャルを実現するために、これらの条件下で効果的に機能する堅牢なコントローラを必要とする。時間領域のログ感度と最近導入されたロバストネス不忠実度測定(RIM)は、量子システムにおけるコントローラのロバストネスを定量化する2つの方法である。前者は分析的に見つかるが、後者はモンテカルロサンプリングを必要とする。本研究は, スピン鎖および環における単一励起伝達の堅牢性を評価するために, 対数感度とRIMの相関関係について検討した。予測される誤差の差分感度は, RIMの差分感度と一致し, 予測値が誤差確率分布上にあることを示す。統計的解析により、対数感度とRIMは差分感度を介してリンクされ、差分感度とRIMは極めて一致していることが示された。様々な現実的なシナリオにおけるコントローラーの堅牢性を評価するための2つの手段(分析的手法とサンプリング的手法)の統合は、量子コントローラの堅牢性をモデル化し評価するための様々なツールを統一する第一歩となる。 Recent achievements in quantum control have resulted in advanced techniques for designing controllers for applications in quantum communication, computing, and sensing. However, the susceptibility of such systems to noise and uncertainties necessitates robust controllers that perform effectively under these conditions to realize the full potential of quantum devices. The time-domain log-sensitivity and a recently introduced robustness infidelity measure (RIM) are two means to quantify controller robustness in quantum systems. The former can be found analytically, while the latter requires Monte-Carlo sampling. In this work, the correlation between the log-sensitivity and the RIM for evaluating the robustness of single excitation transfer fidelity in spin chains and rings in the presence of dephasing is investigated. We show that the expected differential sensitivity of the error agrees with the differential sensitivity of the RIM, where the expectation is over the error probability distribution. 
Statistical analysis also demonstrates that the log-sensitivity and the RIM are linked via the differential sensitivity, and that the differential sensitivity and RIM are highly concordant. This unification of two means (one analytic and one via sampling) to assess controller robustness in a variety of realistic scenarios provides a first step in unifying various tools to model and assess robustness of quantum controllers.	翻訳日:2023-06-17 01:46:49 公開日:2023-06-14
# 離散変調連続可変量子鍵分布のセキュリティ Security of discrete-modulated continuous-variable quantum key distribution ( http://arxiv.org/abs/2303.09255v2 ) ライセンス: Link先を確認	Stefan Bäuml, Carlos Pascual García, Victoria Wright, Omar Fawzi, Antonio Acín	(参考訳) 離散変調による連続可変量子鍵分布は、広く利用可能な光学素子と既存の通信インフラを用いて量子物理セキュリティを提供する可能性がある。その実装はガウス変調に基づくプロトコルよりもはるかに単純であるが、コヒーレント攻撃に対する有限サイズのセキュリティを証明することは困難である。本研究では、4つのコヒーレント状態とヘテロダイン検出を含む離散変調量子鍵分布プロトコルに対するコヒーレント攻撃に対する有限サイズのセキュリティを証明するために、これまで離散変数の設定に用いられてきたエントロピー累積定理を適用する。そのために,従来の手法とは対照的に,すべての情報が離散化されたプロトコルを考える。我々はまず、現実的な光子数カットオフ仮定の下でその漸近レートの限界を導出する。この限界はエントロピー累積を用いて有限サイズのセキュリティ証明にアップグレードされる。解析では、最大100kmまでの距離に対してラウンドあたり$10^{-4}$から$0.1$ビットの範囲の漸近レートが得られ、有限サイズの場合には現実的なパラメータの下で、$n=10^{12}$ラウンド後に数十kmの距離で$10$ Gbitのオーダーの秘密鍵が得られる。 Continuous variable quantum key distribution with discrete modulation has the potential to provide quantum physical security using widely available optical elements and existing telecom infrastructure. While their implementation is significantly simpler than that for protocols based on Gaussian modulation, proving their finite-size security against coherent attacks poses a challenge. In this work we apply the entropy accumulation theorem, a tool that has previously been used in the setting of discrete variables, to prove finite-size security against coherent attacks for a discrete-modulated quantum key distribution protocol involving four coherent states and heterodyne detection. To do so, and contrary to previous approaches, we consider a protocol in which all the information is discretised. We first bound its asymptotic rate under a realistic photon number cutoff assumption. This bound is then upgraded into a finite-size security proof using entropy accumulation. 
Our analysis provides asymptotic rates in the range of $10^{-4}$ to $0.1$ bits per round for distances up to a hundred kilometres, while in the finite case and for realistic parameters, we obtain on the order of $10$ Gbits of secret key after $n=10^{12}$ rounds at distances of a few tens of kilometres.	翻訳日:2023-06-17 01:46:26 公開日:2023-06-14
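As a rough sanity check on the finite-size figure quoted in the abstract above, the per-round key rate can be worked out directly. This assumes 1 Gbit means $10^{9}$ bits and reads "of the order of $10$ Gbits" as exactly $10$ Gbits, so it is an order-of-magnitude sketch and not a result from the paper.

```latex
\[
  \frac{10\ \text{Gbit}}{n}
  = \frac{10 \times 10^{9}\ \text{bits}}{10^{12}\ \text{rounds}}
  = 10^{-2}\ \text{bits per round}
\]
```

Under these assumptions the finite-size rate is about $10^{-2}$ bits per round, which lies inside the range quoted for the asymptotic rates.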
# 画像再構成におけるヒューマンインストラクションの回避を学習する説明可能なテキスト・ビジュアル・チャット Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation ( http://arxiv.org/abs/2303.05983v2 ) ライセンス: Link先を確認	Zhiwei Zhang, Yuliang Liu	(参考訳) chatgptとgpt-4の成功はマルチモーダル対話システムに広く注目されている。しかし、学術コミュニティには、テキスト・ビジュアルチャットタスクでVisual Language Models(VLM)のマルチモーダル生成能力を検証できるデータセットが欠けている。本稿では,合成CLEVR-ATVCデータセット(620K)と手動によるFruit-ATVCデータセット(50K)の2つの新しいマルチモーダルデータセットを構築する。さらに、言語ベースのChatGPT会話のように、マルチモーダルシステムが人間の要求を拒否する(すなわち、説明責任を示す)ために、データセットに特定のルールを組み込んで監視信号とする。これにより、トレーニングされたVLMは、視覚的およびテキスト的推論の後、なぜ人間の指示を抽出できないのかという言語説明とともに、イエスまたはノー回答を提供することができる。本研究では,画像の自動エンコーダと自動回帰変換器をスクラッチからトレーニングするための2状態トレーニング手法を提案する。第1の状態は、各画像を短いトークンに圧縮する離散変分オートエンコーダ(dVAE)を含み、その後、単一のデータストリームとしてテキストトークンと結合してデコーダベースのトランスフォーマーに送信し、第2状態において視覚的再生成とテキストフィードバックを生成する。本研究では,画像品質,回答精度,不確実性や不完全なユーザクエリに直面する場合のモデル行動の観点から,実験結果を総合的に分析する。本研究の成果は,テキスト・視覚生成モデルの説明可能性に関する貴重な知見に寄与することを期待している。 The recent success of ChatGPT and GPT-4 has drawn widespread attention to multimodal dialogue systems. However, the academia community lacks a dataset that can validate the multimodal generation capabilities of Visual Language Models (VLMs) in textual-visual chat tasks. In this paper, we construct two new multimodal datasets: the synthetic CLEVR-ATVC dataset (620K) and the manually pictured Fruit-ATVC dataset (50K), both featuring visual and text-based inputs and outputs. Additionally, to enable the multimodal system to reject human requests (i.e., demonstrate accountability), as in language-based ChatGPT conversations, we develop and incorporate specific rules into the datasets as supervisory signals. This allows the trained VLM to provide a yes or no answer after visual and textual reasoning, accompanied by a language explanation as to why the human instruction cannot be excuted. In our method, we propose a two-state training procedure to train the image auto-encoder and auto-regressive transformer from scratch. 
The first state involves a discrete variational autoencoder (dVAE) to compress each image into short tokens, which are then concatenated with text tokens as a single data stream to be fed into the decoder-based transformer for generating visual re-creation and textual feedback in the second state. We provide comprehensive analyses of experimental results in terms of re-created image quality, answer accuracy, and the model behavior when faced with uncertainty and imperfect user queries. We hope our explorations and findings contribute valuable insights regarding the accountability of textual-visual generative models.	翻訳日:2023-06-17 01:45:24 公開日:2023-06-14
# 確率的トリガーアームを用いた文脈組合せバンディット Contextual Combinatorial Bandits with Probabilistically Triggered Arms ( http://arxiv.org/abs/2303.17110v2 ) ライセンス: Link先を確認	Xutong Liu, Jinhang Zuo, Siwei Wang, John C.S. Lui, Mohammad Hajiesmaili, Adam Wierman, Wei Chen	(参考訳) 本研究では,文脈カスケードバンディットや文脈影響最大化バンディットなど,幅広い応用を捉えた様々な平滑性条件下で,確率的トリガーアームを用いた文脈組合せバンディット(C$^2$MAB-T)の研究を行った。トリガリング確率変調 (TPM) 条件の下では、C$^2$-UCB-T アルゴリズムを考案し、$\tilde{O}(d\sqrt{KT})$ のリグレット上界を達成する新しい解析法を提案し、潜在的に指数関数的に大きな因子である $O(1/p_{\min})$ を除去する。ここで $d$ は文脈の次元であり、$p_{\min}$ は任意のアームをトリガできる最小の正の確率であり、バッチサイズ $K$ はラウンド毎にトリガできる最大のアーム数である。分散変調 (VM) またはトリガー確率および分散変調 (TPVM) 条件の下で, 分散適応アルゴリズム VAC$^2$-UCB を提案し, バッチサイズ $K$ とは独立なリグレット上界 $\tilde{O}(d\sqrt{T})$ を導出する。価値ある副産物として,CMAB-T および C$^2$MAB 設定に解析手法と分散適応アルゴリズムを適用し,既存の結果も改善した。合成および実世界のデータセットのベンチマークアルゴリズムと比較して,アルゴリズムの性能向上を示す実験も含んでいる。 We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB settings, improving existing results there as well. 
We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.	翻訳日:2023-06-17 01:39:07 公開日:2023-06-14
# 視覚的に配線されたNFT : 非触覚における吸気の役割を探る Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens ( http://arxiv.org/abs/2303.17031v3 ) ライセンス: Link先を確認	Lucio La Cava, Davide Costa, Andrea Tagarelli	(参考訳) 非フランジブルトークン(nfts)への熱意は無数のクリエイターを惹きつけ、多くの創造的プロセスのように、潜在性や明示的なインスピレーションによって引き起こされるデジタル資産のビッグバンにつながった。この研究は、視覚変換器とグラフベースのモデリングを利用して、NFT間の視覚的なインスピレーション現象を長年研究してきた。私たちの目標は、視覚インスピレーションネットワークを形成する主な構造特性の公開、視覚インスピレーションとアセットパフォーマンスの相互関係の探索、インスピレーションプロセスに対する暗号の影響の調査、NFT間のインスピレーション関係の説明などです。インスピレーションの広汎さが視覚的特徴空間の一時的な飽和、インスピレーションとインスピレーションの2分断が財務成績に及ぼす影響、市場とインスピレーションの波による本質的な自己調節機構の解明につながった。私たちの仕事は、web3の進化のより広い視点を得るための出発点となり得る。 The fervor for Non-Fungible Tokens (NFTs) attracted countless creators, leading to a Big Bang of digital assets driven by latent or explicit forms of inspiration, as in many creative processes. This work exploits Vision Transformers and graph-based modeling to delve into visual inspiration phenomena between NFTs over the years. Our goals include unveiling the main structural traits that shape visual inspiration networks, exploring the interrelation between visual inspiration and asset performances, investigating crypto influence on inspiration processes, and explaining the inspiration relationships among NFTs. Our findings unveil how the pervasiveness of inspiration led to a temporary saturation of the visual feature space, the impact of the dichotomy between inspiring and inspired NFTs on their financial performance, and an intrinsic self-regulatory mechanism between markets and inspiration waves. Our work can serve as a starting point for gaining a broader view of the evolution of Web3.	翻訳日:2023-06-17 01:38:32 公開日:2023-06-14
# 品質多様性トランスフォーマ:決定トランスを用いた行動条件形軌道の生成 The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers ( http://arxiv.org/abs/2303.16207v2 ) ライセンス: Link先を確認	Valentin Mac\'e, Rapha\"el Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert	(参考訳) 神経進化の文脈において、品質多様性アルゴリズムは行動空間の定義に依存することにより、多様で効率的なポリシーのレパートリーを生成するのに有効であることが証明されている。このようなレパートリーの作成によって引き起こされる自然な目標は、レパートリーから対応するポリシーを実行することで実現可能な、需要に対する行動を達成することである。しかし、不確実な環境では2つの問題が生じる。第一に、ポリシーは堅牢性と再現性に欠ける可能性があるため、わずかに異なる条件下での複数のエピソードは、しばしば非常に異なる振る舞いをもたらす。第二に、レパートリーの離散的性質のため、解は不連続に変化する。本稿では,まず,行動空間において最も一貫した解に対する解の選択を制約するMAP-Elites Low-Spread (ME-LS) という2つのメカニズムに基づく行動条件付き軌道生成を実現するための新しい手法を提案する。第二に、連続的な動作記述子に基づくトランスフォーマティブベースのモデルである quality-diversity transformer (qdt) は、me-lsレパートリーからのポリシによって生成されたデータセットをトレーニングし、ターゲットの動作を達成するアクションのシーケンスを自己回帰的に生成することを学ぶ。その結果,ME-LSは一貫性とロバストなポリシを生成し,QDTと組み合わせることで,要求に対する多様な振る舞いを高い精度で達成可能な単一ポリシが得られることがわかった。 In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. 
Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.	翻訳日:2023-06-17 01:38:07 公開日:2023-06-14
# LLaMA-Adapter:ゼロ初期化アテンションによる言語モデルの効率的な微調整 LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention ( http://arxiv.org/abs/2303.16199v2 ) ライセンス: Link先を確認	Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao	(参考訳) 命令追従モデルにLLaMAを効率よく微調整する軽量適応手法であるLLaMA-Adapterを提案する。 LLaMA-Adapterは52Kの自己命令型デモを使用して、凍結したLLaMA 7Bモデルに1.2Mの学習可能なパラメータのみを導入し、8基のA100 GPUでの微調整は1時間未満で済む。具体的には、学習可能な適応プロンプトを採用し、上位のトランスフォーマー層で単語トークンの前にそれらを付加する。次に,新しい指導手がかりをLLaMAに適応的に注入し,事前学習した知識を効果的に保持する,ゼロゲーティングによるゼロ初期化注意機構を提案する。効率的なトレーニングにより、LLaMA-Adapterは、7Bパラメータを完全に微調整したAlpacaに匹敵する高品質な応答を生成することができる。言語コマンドの他に,ScienceQA や COCO Caption のベンチマークにおいて,より優れた推論性能を実現する画像条件付き LLaMA モデルを学習するためのマルチモーダル命令にも簡単に拡張できる。さらに,従来の視覚や言語タスクに対して,事前学習した他のモデル (ViT, RoBERTa) を微調整するゼロ初期化アテンション機構も評価し,提案手法のより優れた一般化能力を示す。コードはhttps://github.com/OpenGVLab/LLaMA-Adapterで公開されている。 We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and takes less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions to learn an image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. 
Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.	翻訳日:2023-06-17 01:37:43 公開日:2023-06-14
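The zero-initialized, gated attention described in the LLaMA-Adapter abstract can be illustrated with a small sketch. It is only one reading of the abstract, not the released implementation: the function names, the toy vectors, and the choice to compute a separate softmax over the adapter prompts and scale that part by a scalar `gate` are all assumptions made for illustration. The point it shows is that with the gate at zero the layer reproduces ordinary attention over the word tokens, so the pre-trained behaviour is untouched at the start of fine-tuning.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, values):
    """Sum of value vectors, each scaled by its attention weight."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def plain_attention(query, word_keys, word_values):
    """Standard scaled dot-product attention over the word tokens only."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in word_keys])
    return weighted_sum(weights, word_values)


def gated_prompt_attention(query, word_keys, word_values,
                           prompt_keys, prompt_values, gate):
    """Word-token attention plus a prompt contribution scaled by `gate`.

    Hypothetical helper: `gate` stands in for the learnable gating factor,
    which is initialized to zero.
    """
    scale = 1.0 / math.sqrt(len(query))
    base = plain_attention(query, word_keys, word_values)
    prompt_weights = softmax([dot(query, k) * scale for k in prompt_keys])
    prompt_part = weighted_sum(prompt_weights, prompt_values)
    return [b + gate * p for b, p in zip(base, prompt_part)]


# Toy data (made up for illustration).
query = [1.0, 0.0]
word_keys = [[1.0, 0.0], [0.0, 1.0]]
word_values = [[1.0, 0.0], [0.0, 1.0]]
prompt_keys = [[0.5, 0.5]]
prompt_values = [[2.0, 2.0]]

reference = plain_attention(query, word_keys, word_values)
at_init = gated_prompt_attention(query, word_keys, word_values,
                                 prompt_keys, prompt_values, gate=0.0)
after_training = gated_prompt_attention(query, word_keys, word_values,
                                        prompt_keys, prompt_values, gate=0.5)
```

With `gate=0.0` the output matches `plain_attention`; a non-zero gate adds the prompt information on top of it.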
# 単一光子に対する波動粒子双対性の確率論的考察 A probabilistic view of wave-particle duality for single photons ( http://arxiv.org/abs/2303.15185v3 ) ライセンス: Link先を確認	Andrea Aiello	(参考訳) 古典物理学から借用された概念の観点から量子力学を解釈する最も厄介な結果の1つは、いわゆる波動粒子双対性である。通常、波と粒子の双対性は、干渉実験における経路識別性とフリンジ可視性の相補性の観点から示される。そこで本研究では,波動の連続的性質と粒子の離散的特性との間に生じる新しい相補性を提案する。量子場理論の確率論的手法を用いて、同じ光のビーム内の波振幅と光子数を同時に測定することは、ある状況下では量子力学の法則によって禁止されることを示した。その結果,`interferometric duality'という概念は,より一般的な`continuous-vs-discrete duality'の概念に置き換えられる可能性が示唆された。 One of the most puzzling consequences of interpreting quantum mechanics in terms of concepts borrowed from classical physics, is the so-called wave-particle duality. Usually, wave-particle duality is illustrated in terms of complementarity between path distinguishability and fringe visibility in interference experiments. In this work, we instead propose a new type of complementarity, that between the continuous nature of waves and the discrete character of particles. Using the probabilistic methods of quantum field theory, we show that the simultaneous measurement of the wave amplitude and the number of photons in the same beam of light is, under certain circumstances, prohibited by the laws of quantum mechanics. Our results suggest that the concept of ``interferometric duality'' could be eventually replaced by the more general one of ``continuous-vs-discrete duality''.	翻訳日:2023-06-17 01:37:18 公開日:2023-06-14
# CCL:LiDAR位置認識のための連続的コントラスト学習 CCL: Continual Contrastive Learning for LiDAR Place Recognition ( http://arxiv.org/abs/2303.13952v2 ) ライセンス: Link先を確認	Jiafeng Cui, Xieyuanli Chen	(参考訳) 位置認識は、ロボットや自動運転アプリケーションのためのループクローズとグローバルローカライズにおいて、必須かつ困難なタスクである。近年のディープラーニング技術の発展により,LiDAR位置認識(LPR)の性能は大幅に向上した。しかし、現在のディープラーニングベースの手法は、一般化能力の低さと破滅的な忘れることの2つの大きな問題に悩まされている。本稿では,大惨な忘れの問題に対処し,LPRアプローチの堅牢性を改善するために,CCLという連続的なコントラスト学習手法を提案する。我々のCCLは、コントラスト的特徴プールを構築し、コントラスト的損失を利用して、より移動可能な場所表現を訓練する。新たな環境に移行すると、CCLはコントラストメモリバンクを継続的にレビューし、新しいデータから新しい場所を認識することを継続的に学習しながら、過去のデータの検索能力を維持するために分布ベースの知識蒸留を適用します。我々は3つの異なるLPR手法を用いてオックスフォード、MulRan、PNVデータセットに対するアプローチを徹底的に評価した。実験の結果,我々のCCLは,異なる環境における異なる手法の性能を常に改善し,最先端の継続的学習法よりも優れていた。このメソッドの実装はhttps://github.com/cloudcjf/cclでリリースされた。 Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. 
The experimental results show that our CCL consistently improves the performance of different methods in different environments, outperforming the state-of-the-art continual learning method. The implementation of our method has been released at https://github.com/cloudcjf/CCL.	翻訳日:2023-06-17 01:37:04 公開日:2023-06-14
# OpenAGI: LLMがドメインエキスパートと出会ったとき OpenAGI: When LLM Meets Domain Experts ( http://arxiv.org/abs/2304.04370v3 ) ライセンス: Link先を確認	Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang	(参考訳) 人間の知性は、複雑なタスクを解決するために、基本的なスキルを複雑なものに組み立てる素晴らしい能力を持っている。この能力は人工知能(AI)にも等しく重要であり、大規模で包括的な知的モデルの開発に加えて、汎用人工知能(AGI)の追求において複雑なタスク解決のために様々なドメイン固有のエキスパートモデルを活用する能力を備えることが重要であると主張する。近年の大規模言語モデル(LLM)の発展は驚くべき学習能力と推論能力を示しており、複雑なタスクを解決するために外部モデルを選択、合成、実行するためのコントローラとして有望である。本稿では,オープンソースのAGI研究プラットフォームであるOpenAGIを開発し,タスク固有のデータセット,評価指標,さまざまな拡張可能なモデルなどを伴って,複雑なマルチステップタスクを提供する。 OpenAGIは複雑なタスクを自然言語クエリとして定式化し、LLMへの入力として機能する。 LLMはその後、タスクに対処するためにOpenAGIが提供するモデルを選択し、合成し、実行します。さらに,課題解決結果をフィードバックとして利用してLLMのタスク解決能力を向上させる、タスクフィードバックからの強化学習(RLTF)機構を提案する。したがって、LLMは複雑なタスクを解決するために様々な外部モデルを合成する責任を持ち、RLTFはタスク解決能力を改善するためのフィードバックを提供し、自己改善AIのためのフィードバックループを可能にする。我々は、複雑なタスク解決のための様々な専門家モデルを操作するLLMのパラダイムが、AGIに対する有望なアプローチであると信じている。コミュニティによるAGIの能力の長期的な改善と評価を容易にするため、私たちはOpenAGIプロジェクトのコード、ベンチマーク、評価方法をhttps://github.com/agiresearch/OpenAGIでオープンソース化しました。 Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. 
OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI.	翻訳日:2023-06-17 01:28:16 公開日:2023-06-14
# Einstein-Podolsky-Rosen-Bohm実験:離散データ駆動アプローチ Einstein-Podolsky-Rosen-Bohm experiments: a discrete data driven approach ( http://arxiv.org/abs/2304.03962v3 ) ライセンス: Link先を確認	Hans De Raedt, Mikhail I. Katsnelson, Manpreet S. Jattana, Vrinda Mehta, Madita Willsch, Dennis Willsch, Kristel Michielsen, Fengping Jin	(参考訳) 我々は、実験データから数学モデルへの一方的な橋渡しを構築することは、後者で使われる記号に意味を付けることによって引き起こされる論争を回避できるという観点から考える。特に、アインシュタイン-ポドルスキー-ローゼン-ボーム実験の結果を解釈するための数学的モデルを構築する上で、この考え方を採用することが新しい視点をもたらすことを示す。まず, 4つの異なる条件でアインシュタイン-ポドルスキー-ローゼン-ボーム実験を行って得られた4つの相関の値に制約を与える新しいベル型不等式を証明する。証明は、データを生成したと想定される数学的モデルに一切言及しないという意味で ``model-free'' である。制約は、相関値を変更することなく、4つのデータセットでデータを再シャッフルすることで得られる四つ組の数にのみ依存する。四つ組の最大割合が1に等しい場合、これらの新しい不等式は、よく知られたベル型不等式のモデルフリー版に帰着する。モデルフリーであることから、実験データによる後者の違反は、4つのデータセットのすべてのデータが四つ組を成すように並べ替えられるわけではないことを意味する。さらに、モデルフリーな不等式であるため、実験データによる後者の違反は、このデータを生成すると仮定される数学的モデルが適用されないことだけを意味する。 Einstein-Podolsky-Rosen-Bohm実験によって得られたデータから出発して、これらのデータの主な特徴を記述する数学的モデルを、仮定する代わりに構築する。合理的推論の数学的枠組みを再現可能で頑健なデータに適用することで、量子論のいかなる概念も使わずに、一重項状態にある2つのスピン1/2オブジェクトの系に対する相関の式が得られる。 (以下省略) We take the point of view that building a one-way bridge from experimental data to mathematical models instead of the other way around avoids running into controversies resulting from attaching meaning to the symbols used in the latter. In particular, we show that adopting this view offers new perspectives for constructing mathematical models for and interpreting the results of Einstein-Podolsky-Rosen-Bohm experiments. We first prove new Bell-type inequalities constraining the values of the four correlations obtained by performing Einstein-Podolsky-Rosen-Bohm experiments under four different conditions. The proof is ``model-free'' in the sense that it does not refer to any mathematical model that one imagines to have produced the data. The constraints only depend on the number of quadruples obtained by reshuffling the data in the four data sets without changing the values of the correlations. 
These new inequalities reduce to model-free versions of the well-known Bell-type inequalities if the maximum fraction of quadruples is equal to one. Being model-free, a violation of the latter by experimental data implies that not all the data in the four data sets can be reshuffled to form quadruples. Furthermore, being model-free inequalities, a violation of the latter by experimental data only implies that any mathematical model assumed to produce this data does not apply. Starting from the data obtained by performing Einstein-Podolsky-Rosen-Bohm experiments, we construct instead of postulate mathematical models that describe the main features of these data. The mathematical framework of plausible reasoning is applied to reproducible and robust data, yielding without using any concept of quantum theory, the expression of the correlation for a system of two spin-1/2 objects in the singlet state. (truncated here)	翻訳日:2023-06-17 01:27:41 公開日:2023-06-14
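Two facts behind the Einstein-Podolsky-Rosen-Bohm abstract can be checked numerically. The sketch below uses the textbook CHSH combination and the standard singlet correlation $-\cos(a-b)$; it does not reproduce the paper's new model-free inequalities, whose exact form the abstract does not state. The function names and the measurement angles are illustrative choices. The first part shows that any data arranged as quadruples of $\pm 1$ outcomes gives a CHSH term of exactly $\pm 2$ per quadruple, so the averaged value cannot exceed 2 in magnitude. The second part shows that the singlet correlation reaches $2\sqrt{2}$.

```python
import itertools
import math


def chsh_term(x1, x2, y1, y2):
    """CHSH combination for one quadruple of +/-1 outcomes."""
    return x1 * y1 - x1 * y2 + x2 * y1 + x2 * y2


# Every one of the 16 possible quadruples yields +2 or -2, so an average
# over quadruples is bounded by 2 in absolute value.
quadruple_values = {
    chsh_term(x1, x2, y1, y2)
    for x1, x2, y1, y2 in itertools.product((-1, 1), repeat=4)
}


def singlet_correlation(a, b):
    """Standard quantum-theoretical correlation of the singlet state
    for measurement directions at angles a and b (radians)."""
    return -math.cos(a - b)


def chsh_from_correlations(corr, a1, a2, b1, b2):
    """Same combination, built from four pairwise correlations."""
    return corr(a1, b1) - corr(a1, b2) + corr(a2, b1) + corr(a2, b2)


# Angles commonly used to maximize the CHSH value (illustrative choice).
s_value = chsh_from_correlations(
    singlet_correlation, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
)
```

Here `quadruple_values` contains only 2 and -2, while `abs(s_value)` equals $2\sqrt{2}$, which is why data reproducing the singlet correlation cannot all be rearranged into quadruples.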
# ASPEST: アクティブラーニングと選択的予測のギャップを埋める ASPEST: Bridging the Gap Between Active Learning and Selective Prediction ( http://arxiv.org/abs/2304.03870v2 ) ライセンス: Link先を確認	Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister	(参考訳) 選択的予測は、モデルの不確実性が高い場合の予測を省略する信頼できるモデルを学ぶことを目的としている。これらの予測は、さらなる評価のために人間の専門家に延期することができる。多くの実世界のシナリオでは、テストデータの分布はトレーニングデータとは異なる。これにより、より不正確な予測が行われ、人間のラベル付けが増えるが、これは困難でコストがかかる。アクティブラーニングは、最も有益な例のみを問うことでこれを回避し、いくつかのケースでは、全体的なラベル付けの労力を減らすことが示されている。そこで本研究では,選択予測とアクティブラーニングを橋渡しし,シフトした対象領域からより有益なサンプルをクエリしつつ精度とカバレッジを高める、新しい学習パラダイムであるactive selective predictionを提案する。この新たな問題に対して,モデルスナップショットのアンサンブルと,集約された出力を擬似ラベルとする自己学習を利用する,シンプルで効果的なソリューションであるASPESTを提案する。多くの画像、テキスト、構造化データセット、特にドメインシフトに苦しむデータセットに関する広範な実験は、提案手法が選択的予測とアクティブラーニングに関する先行研究を大きく上回り(例えば、ラベル付け予算$100$のMNIST$\to$SVHNベンチマークでは、AUCメトリックを$79.36\%$から$88.84\%$に改善する)、ループ内の人間をより効率的に活用できることを実証している。 Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which can be difficult and expensive. Active learning circumvents this by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge selective prediction and active learning, proposing a new learning paradigm called active selective prediction which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. 
Extensive experiments on numerous image, text and structured datasets, particularly those that suffer from domain shifts, demonstrate that our proposed method can significantly outperform prior work on selective prediction and active learning (e.g., on the MNIST$\to$SVHN benchmark with a labeling budget of $100$, ASPEST improves the AUC metric from $79.36\%$ to $88.84\%$) and achieves better utilization of humans in the loop.	翻訳日:2023-06-17 01:27:08 公開日:2023-06-14
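The trade-off between accuracy and coverage that selective prediction targets can be made concrete with a small sketch. This is not ASPEST itself (the abstract describes snapshot ensembles and self-training, which are not reproduced here). It assumes, purely for illustration, that confidence is the largest predicted class probability and that predictions below a threshold are deferred to a human. All names and numbers are made up.

```python
def selective_predict(probs, threshold):
    """Return the predicted class index, or None to abstain (defer to a human)."""
    confidence = max(probs)
    if confidence < threshold:
        return None
    return probs.index(confidence)


def coverage_and_accuracy(prob_rows, labels, threshold):
    """Coverage = fraction of inputs the model answers;
    accuracy is measured only on the answered inputs."""
    predictions = [selective_predict(p, threshold) for p in prob_rows]
    answered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(answered) / len(labels)
    if not answered:
        return coverage, 0.0
    accuracy = sum(1 for p, y in answered if p == y) / len(answered)
    return coverage, accuracy


# Toy predicted probabilities for a two-class problem (made-up numbers).
prob_rows = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 1, 1, 1]

strict = coverage_and_accuracy(prob_rows, labels, threshold=0.7)
lenient = coverage_and_accuracy(prob_rows, labels, threshold=0.5)
```

On this toy data the strict threshold answers half of the inputs and gets all of them right, while the lenient threshold answers everything at lower accuracy.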
# ImageEye:プログラム合成を用いたバッチ画像処理 ImageEye: Batch Image Processing Using Program Synthesis ( http://arxiv.org/abs/2304.03253v3 ) ライセンス: Link先を確認	Celeste Barnaby, Qiaochu Chen, Roopsha Samanta, Isil Dillig	(参考訳) 本稿では,バッチ画像処理のための新しい合成手法を提案する。全画像にグローバル編集しか適用できない既存のツールとは異なり、この方法は画像内の個々のオブジェクトに対してきめ細かい編集を施すことができる。例えば、特定の特性を持つ特定のオブジェクトを選択的にぼかしたり、収穫することができる。このようなきめ細かい画像編集作業を容易にするために,事前学習したニューラルネットワークと記号推論を可能にする他の言語構造を組み合わせた,ニューロシンボリックドメイン固有言語(DSL)を提案する。本手法は,新しい合成アルゴリズムを用いて,ユーザの実演から,このdslのプログラムを自動的に学習する。提案手法をImageEyeと呼ばれるツールに実装し,50個の画像編集タスクで評価した。評価の結果,ImageEyeはこれらのタスクの96%を自動化できることがわかった。 This paper presents a new synthesis-based approach for batch image processing. Unlike existing tools that can only apply global edits to the entire image, our method can apply fine-grained edits to individual objects within the image. For example, our method can selectively blur or crop specific objects that have a certain property. To facilitate such fine-grained image editing tasks, we propose a neuro-symbolic domain-specific language (DSL) that combines pre-trained neural networks for image classification with other language constructs that enable symbolic reasoning. Our method can automatically learn programs in this DSL from user demonstrations by utilizing a novel synthesis algorithm. We have implemented the proposed technique in a tool called ImageEye and evaluated it on 50 image editing tasks. Our evaluation shows that ImageEye is able to automate 96% of these tasks.	翻訳日:2023-06-17 01:26:40 公開日:2023-06-14
# ニューラル遅延微分方程式を用いた遅延学習 Learning the Delay Using Neural Delay Differential Equations ( http://arxiv.org/abs/2304.01329v2 ) ライセンス: Link先を確認	Maria Oprea and Mark Walth and Robert Stephany and Gabriella Torres Nothaft and Arnaldo Rodriguez-Gonzalez and William Clark	(参考訳) 機械学習と動的システムの交点は最近かなりの関心を集めている。ニューラルネットワークの常微分方程式(ノード)は、これらのフィールド間の重なりが豊富である。本稿では,遅延微分方程式(ddes)に基づく連続時間ニューラルネットワーク手法を提案する。本モデルでは,データからモデルパラメータと遅延を直接学習するために随伴感度法を用いる。我々のアプローチはNODEにインスパイアされ、遅延の値が先行値であることが仮定された初期のニューラルDDEモデルを拡張します。我々は,提案手法の感度解析を行い,ベンチマークシステムからddeパラメータを学習する能力を示す。我々は今後の方向性と応用の可能性で議論を終える。 The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.	翻訳日:2023-06-17 01:25:50 公開日:2023-06-14
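To see why the delay is a parameter worth learning, it helps to look at how a delay differential equation is integrated forward in time. The sketch below is a plain forward Euler solver for $x'(t) = f(x(t), x(t-\tau))$ with a prescribed history for $t \le 0$. It is only the forward model: the adjoint sensitivity method that the abstract uses to learn $\tau$ and the model parameters is not implemented here, and the function names, step size, and test equation are illustrative assumptions.

```python
def simulate_dde(f, history, tau, t_end, dt):
    """Forward Euler for x'(t) = f(x(t), x(t - tau)).

    `history(t)` supplies x(t) for t <= 0. The delay is rounded to a whole
    number of steps, which is a simplification of this sketch.
    """
    n_steps = int(round(t_end / dt))
    lag = int(round(tau / dt))
    xs = [history(0.0)]
    for k in range(n_steps):
        j = k - lag
        # Before enough steps have elapsed, read the delayed state from the history.
        x_delayed = xs[j] if j >= 0 else history(j * dt)
        xs.append(xs[k] + dt * f(xs[k], x_delayed))
    return xs


def decay_on_delayed_state(x, x_delayed):
    """Test equation x'(t) = -x(t - tau)."""
    return -x_delayed


def constant_history(t):
    return 1.0


# Same equation, two different delays: the trajectories differ markedly.
no_delay = simulate_dde(decay_on_delayed_state, constant_history,
                        tau=0.0, t_end=1.0, dt=0.001)
unit_delay = simulate_dde(decay_on_delayed_state, constant_history,
                          tau=1.0, t_end=1.0, dt=0.001)
```

With no delay the solution follows exponential decay, so `no_delay[-1]` is close to $e^{-1}$. With a delay of one time unit the derivative stays at $-1$ over the whole interval, so `unit_delay[-1]` is close to zero.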
# 個人差分学習におけるユーティリティ損失の軽減について:幾何学的カーネルアプローチによる新しい視点 On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach ( http://arxiv.org/abs/2304.01300v2 ) ライセンス: Link先を確認	Mohit Kumar, Bernhard A. Moser, Lukas Fischer	(参考訳) プライバシとユーティリティのトレードオフは、差分プライベート機械学習の基本的な問題のひとつとして残っている。本稿では,幾何学的インスパイアされたカーネルに基づく分類の精度低下を緩和する手法を提案する。このアプローチでは、与えられたデータポイントのアフィン殻の表現が、Reproduction Kernel Hilbert Spaces (RKHS) で学習される。これにより、個々のデータポイントに関するプライバシーに敏感な情報を隠蔽し、メンバシップ推論攻撃のリスクを大幅に低減することで、プライバシとユーティリティのトレードオフを改善する新しい距離尺度が導かれる。このアプローチの有効性は、MNISTデータセット、フライブルク食料品データセット、本物のバイオメディカルデータセットの実験を通じて実証される。このアプローチが計算上実用的であることは確認されている。フェデレーション学習へのアプローチの適用を考察し,分散データによる精度損失は限界値か,あるいはそれほど高くないことが観察された。 Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.	翻訳日:2023-06-17 01:25:42 公開日:2023-06-14
# Leanのための機械学習型前提選択 Machine-Learned Premise Selection for Lean ( http://arxiv.org/abs/2304.00994v2 ) ライセンス: Link先を確認	Bartosz Piotrowski, Ramon Fern\'andez Mir, Edward Ayers	(参考訳) ユーザが証明中の定理に関連する前提を提案する,Lean証明アシスタントのための機械学習ベースのツールを紹介する。ツールの設計原則は,(1)証明アシスタントとの緊密な統合,(2)使いやすさとインストールの容易さ,(3)軽量で高速なアプローチである。この目的のために、オンラインで訓練されるランダムフォレストモデルのカスタムバージョンを設計した。これはLeanで直接実装されており、Lean 4のリッチで効率的なメタプログラミング機能のおかげで可能になった。ランダムフォレストは、Leanの数学ライブラリであるmathlibから抽出されたデータに基づいて訓練されている。トレーニング用の特徴量やラベルを作成するための様々なオプションを試す。トレーニングされたモデルからのアドバイスは、対話的に証明を構築しながらエディターで呼び出すことができるsuggest_premisesタクティクを介してユーザに提供される。 We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.	翻訳日:2023-06-17 01:25:26 公開日:2023-06-14
# 局所エネルギー分布に基づく確率的アニーリングのハイパーパラメータ決定 Local Energy Distribution Based Hyperparameter Determination for Stochastic Simulated Annealing ( http://arxiv.org/abs/2304.11839v2 ) ライセンス: Link先を確認	Naoya Onizawa, Kyo Kuroki, Duckgyu Shin, Takahiro Hanyu	(参考訳) 本稿では,局所エネルギー分布に基づく確率的模擬焼鈍(SSA)のためのハイパーパラメータ決定法を提案する。 SSAは、一般的な模擬焼鈍(SA)よりも高速に組合せ最適化問題を解くことができるが、時間を要するハイパーパラメーター探索が必要である。提案手法はスピン(確率ビット)の局所エネルギー分布に基づいてハイパーパラメータを決定する。スピンはSSAの基本計算要素であり、その重みで他のスピンとグラフィカルに接続されている。局所エネルギーの分布は中心極限定理(CLT)に基づいて推定できる。 CLTに基づく正規分布は、従来の手法のO(n^3)からO(1)へのハイパーパラメータ探索の時間的複雑さを低減するために用いられる。最大カット問題に対するGsetおよびK2000ベンチマークにおいて,決定されたハイパーパラメータを用いたSSAの性能を評価する。その結果,提案手法は最もよく知られたカット値の約98%の平均カット値が得られることがわかった。 This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). The spin is a basic computing element of SSA and is graphically connected to other spins with its weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity for hyperparameter search from O(n^3) of the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.	翻訳日:2023-06-17 01:19:13 公開日:2023-06-14
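The central-limit-theorem argument in the stochastic simulated annealing abstract can be illustrated as follows. The sketch assumes that the local energy of a spin is its bias plus the weighted sum of its neighbours' states, and that neighbouring spins are independent and equally likely to be $+1$ or $-1$. Under those assumptions the local energy is approximately normal with mean equal to the bias and variance equal to the sum of squared weights. How the paper maps this distribution to concrete SSA hyperparameters is not stated in the abstract and is not reproduced here; all names are hypothetical.

```python
import math
import random


def local_energy_std(weights_row):
    """Predicted standard deviation of a spin's local energy under the
    independence assumption: sqrt of the sum of squared weights."""
    return math.sqrt(sum(w * w for w in weights_row))


def sample_local_energy(weights_row, bias, rng):
    """Draw one local energy with every neighbouring spin set to +/-1 at random."""
    return bias + sum(w * rng.choice((-1, 1)) for w in weights_row)


def empirical_std(weights_row, bias, n_samples, seed):
    """Monte Carlo estimate of the same standard deviation, for comparison."""
    rng = random.Random(seed)
    samples = [sample_local_energy(weights_row, bias, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    variance = sum((s - mean) ** 2 for s in samples) / n_samples
    return math.sqrt(variance)


# A spin connected to 50 neighbours with unit weights (illustrative graph).
weights_row = [1.0] * 50
predicted = local_energy_std(weights_row)
measured = empirical_std(weights_row, bias=0.0, n_samples=5000, seed=0)
```

The closed-form estimate needs only the weights, whereas the Monte Carlo estimate needs many samples; the two agree closely on this example.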
# 自律運転のためのニューラルマップ Neural Map Prior for Autonomous Driving ( http://arxiv.org/abs/2304.08481v2 ) ライセンス: Link先を確認	Xuan Xiong, Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, Hang Zhao	(参考訳) 高精細(HD)セマンティックマップは、自動運転車が都市環境をナビゲートするために不可欠である。オフラインのhdマップを作成する従来の方法は、コストがかかるだけでなく、タイムリーな更新には不十分な、労働集約的な手動アノテーションプロセスを伴う。近年,オンラインセンサを用いた局所地図作成手法が提案されている。しかし、このアプローチはセンサーの知覚範囲と咬合に対する感受性によって制限される。本研究では,グローバルマップのニューラルマップ表現であるneural map prior (nmp)を提案する。この表現は自動的に更新され、ローカルマップ推論のパフォーマンスが向上する。具体的には、これを実現するために2つのアプローチを利用する。まず,局所写像推論に先行する強写像を統合するために,現在と過去の特徴の相関関係を動的に同定する機構であるクロスアテンションを適用した。第2に,グローバルニューラルマップを事前に更新するために,前回のトラバーサルから特徴を抽出してネットワークを誘導する学習ベースのフュージョンモジュールを用いる。 nuScenesデータセットをベースとした実験結果から,本フレームワークは様々なマップセグメンテーションおよび検出アーキテクチャと高い互換性を示す。より長い知覚範囲の厳しい気象条件や状況であっても、地図予測性能を著しく向上させる。私たちの知る限りでは、グローバルマップを事前に作成するための学習ベースのシステムとしてはこれが初めてです。 High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments. The traditional method of creating offline HD maps involves labor-intensive manual annotation processes, which are not only costly but also insufficient for timely updates. Recent studies have proposed an alternative approach that generates local maps using online sensor observations. However, this approach is limited by the sensor's perception range and its susceptibility to occlusions. In this study, we propose Neural Map Prior (NMP), a neural representation of global maps. This representation automatically updates itself and improves the performance of local map inference. Specifically, we utilize two approaches to achieve this. Firstly, to integrate a strong map prior into local map inference, we apply cross-attention, a mechanism that dynamically identifies correlations between current and prior features. Secondly, to update the global neural map prior, we utilize a learning-based fusion module that guides the network in fusing features from previous traversals. 
Our experimental results, based on the nuScenes dataset, demonstrate that our framework is highly compatible with various map segmentation and detection architectures. It significantly improves map prediction performance, even in challenging weather conditions and situations with a longer perception range. To the best of our knowledge, this is the first learning-based system for creating a global map prior.	翻訳日:2023-06-17 01:18:32 公開日:2023-06-14
# 「自己教師付き深層学習による全スライド画像の高速かつスケーラブルな検索」に関するコメント Comments on 'Fast and scalable search of whole-slide images via self-supervised deep learning' ( http://arxiv.org/abs/2304.08297v4 ) ライセンス: Link先を確認	Milad Sikaroudi, Mehdi Afshari, Abubakr Shafique, Shivam Kalra, H.R. Tizhoosh	(参考訳) Chen et al. [Chen2022]は最近、Nature Biomedical Engineering誌で「Fast and scalable search of whole-slide images via self-supervised deep learning」という論文を発表した。著者らはこの手法を「組織学のための自己教師付き画像検索」、略称SISHと呼んでいる。 SISHはYottixelの漸進的な修正であり、MinMaxバイナライゼーションを用いているにもかかわらず原著を引用しておらず、「自己教師付き画像検索」という誤った名称に基づいている、という懸念を表明する。また、Chenらによる実験と比較に関する他のいくつかの懸念についても指摘する。 Chen et al. [Chen2022] recently published the article 'Fast and scalable search of whole-slide images via self-supervised deep learning' in Nature Biomedical Engineering. The authors call their method 'self-supervised image search for histology', SISH for short. We express our concerns that SISH is an incremental modification of Yottixel, has used MinMax binarization but does not cite the original works, and is based on a misnomer 'self-supervised image search'. We also point to several other concerns regarding experiments and comparisons performed by Chen et al.	翻訳日:2023-06-17 01:17:31 公開日:2023-06-14
# 多くのディープネットワークの訓練過程は、同じ低次元多様体を探索する The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold ( http://arxiv.org/abs/2305.01604v2 ) ライセンス: Link先を確認	Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari	(参考訳) 我々は,訓練中の深層ネットワーク予測の軌跡を解析するための情報幾何学的手法を開発した。基礎となる高次元確率モデルを調べることにより,訓練過程が効果的に低次元多様体を探索することを明らかにする。様々なアーキテクチャ、サイズを持つネットワークは、様々な最適化手法、正規化技術、データ拡張技術、重み付け初期化を訓練し、予測空間の同じ多様体上に配置する。この多様体の詳細を調べたところ、異なるアーキテクチャを持つネットワークは区別可能な軌跡に従うが、他の要因は最小限の影響を受けており、より大きなネットワークはより小さなネットワークと同様の多様体に沿って訓練し、予測空間の非常に異なる部分で初期化されるネットワークは、同様の多様体に沿って解に収束する。 We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.	翻訳日:2023-06-17 01:08:59 公開日:2023-06-14
# ユーザクエリのためのコンテキスト多言語スペルチェッカ Contextual Multilingual Spellchecker for User Queries ( http://arxiv.org/abs/2305.01082v2 ) ライセンス: Link先を確認	Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora	(参考訳) Spellcheckingは、最も基本的で広く使われている検索機能の一つだ。不正な綴りのユーザクエリの修正は、ユーザエクスペリエンスの向上だけでなく、ユーザの期待も高めます。しかしながら、最も広く利用されているスペルチェックソリューションは、最先端のソリューションよりも精度が低いか、レイテンシが重要な要件である検索ユースケースで使用するには遅すぎるかのどちらかである。さらに、最近の最も革新的なアーキテクチャは英語に重点を置いており、多言語で訓練されておらず、長文の綴り訂正のために訓練されている。最後に、ほとんどの企業は製品名のような独自の語彙を持っているため、既製のスペルソリューションはユーザのニーズに届かない。本研究では,非常に高速でスケーラブルで,その語彙に適応し,特定の製品のニーズに応じた綴り出力を行う多言語スペルチェッカを構築した。さらに、スペルはドメイン内のデータセットに対して広いマージンで汎用スペルを上回ります。私たちの多言語スペルはAdobe製品の検索に使われ、様々なアプリケーションでオートコンプリートに使われています。 Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English and are not trained in a multilingual fashion and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller out-performs general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.	翻訳日:2023-06-17 01:08:43 公開日:2023-06-14
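The abstract above does not describe the speller's architecture, so the following is only a minimal sketch of the vocabulary-adaptation idea it mentions: correcting a short query against a product-specific word list. It uses fuzzy matching from Python's standard `difflib`, which is a stand-in technique and not the authors' method. The `correct_query` helper, the `PRODUCT_VOCAB` entries, and the `cutoff` value are all hypothetical.

```python
from difflib import get_close_matches


def correct_query(query, vocabulary, cutoff=0.8):
    """Correct each token of a short query against a product vocabulary.

    Hypothetical helper: a token with no sufficiently similar vocabulary
    entry is returned unchanged rather than forced to a wrong word.
    """
    corrected = []
    for token in query.lower().split():
        if token in vocabulary:
            # Already a known product term; keep it as-is.
            corrected.append(token)
            continue
        # Pick the single closest vocabulary entry above the similarity cutoff.
        matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


# Illustrative vocabulary (assumption, not taken from the paper).
PRODUCT_VOCAB = ["photoshop", "lightroom", "illustrator", "brush", "layer"]
print(correct_query("photoshp brsh", PRODUCT_VOCAB))
```

Swapping `PRODUCT_VOCAB` for another product's term list changes the corrections without touching the matching logic, which is the kind of per-product adaptation the abstract refers to. A production system would also need the contextual and multilingual modelling that this sketch omits.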
# クラスバランス拡散モデル Class-Balancing Diffusion Models ( http://arxiv.org/abs/2305.00562v2 ) ライセンス: Link先を確認	Yiming Qin, Huangjie Zheng, Jiangchao Yao, Mingyuan Zhou, Ya Zhang	(参考訳) 拡散に基づくモデルは、近年の研究でより良い多様性を保ちながら高品質な視覚データを生成する利点を示している。しかし、そのような観察は、データサンプルがラベルの点から一様に配布されるように適切に事前処理されたキュレートされたデータ分布でのみ正当化される。実際には、ロングテールデータ分布はより一般的であり、そのようなクラス不均衡データに対して拡散モデルがどのように振る舞うかは不明である。本研究では,この問題をまず研究し,拡散モデルがクラス不均衡分布を持つデータセット上で訓練された場合,多様性と忠実性の両面で有意な劣化を観測する。特に尾のクラスでは、世代は多様性をほとんど失い、重度のモード崩壊の問題を観察します。そこで本研究では,データ分布がクラスバランスではないという仮説から,分布調整正規化器を用いて学習したクラスバランス拡散モデル(cbdm)を提案する。 CBDMが生成した画像は,定量的および質的両面で高い多様性と品質を示した。提案手法は,CIFAR100/CIFAR100LTデータセットで生成結果をベンチマークし,下流認識タスクにおいて優れた性能を示す。 Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we set from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative ways. Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.	翻訳日:2023-06-17 01:07:54 公開日:2023-06-14
# ニオブ表面カプセル化によるトランスモン量子コヒーレンスの系統的改善 Systematic Improvements in Transmon Qubit Coherence Enabled by Niobium Surface Encapsulation ( http://arxiv.org/abs/2304.13257v2 ) ライセンス: Link先を確認	Mustafa Bal, Akshay A. Murthy, Shaojiang Zhu, Francesco Crisa, Xinyuan You, Ziwen Huang, Tanay Roy, Jaeyel Lee, David van Zanten, Roman Pilipenko, Ivan Nekrashevich, Daniel Bafia, Yulia Krasnikova, Cameron J. Kopas, Ella O. Lachman, Duncan Miller, Josh Y. Mutus, Matthew J. Reagor, Hilal Cansizoglu, Jayss Marshall, David P. Pappas, Kim Vu, Kameshwar Yadavalli, Jin-Su Oh, Lin Zhou, Matthew J. Kramer, Florent Q. Lecocq, Dominic P. Goronzy, Carlos G. Torres-Castanedo, Graham Pritchard, Vinayak P. Dravid, James M. Rondinelli, Michael J. Bedzyk, Mark C. Hersam, John Zasadzinski, Jens Koch, James A. Sauls, Alexander Romanenko, and Anna Grassellino	(参考訳) 本稿では,T$_1$コヒーレンス時間を体系的に改善するトランスモンキュービット製造手法を提案する。我々は, ニオブの表面を緩和し, 損失表面の酸化物の形成を阻害するカプセル化戦略を用いて, デバイスを作製した。同じ超伝導金属を維持し, 表面構造だけを変化させることにより, 異なる量子ビットファイントリーにまたがる異なるキャッピング材料および膜基板について, ニオブ酸化物が超伝導量子ビットのコヒーレンス時間に与える影響をタンタル, アルミニウム, 窒化チタンのネイティブ酸化物と比較して明らかに実証した。表面封入したニオブ量子ビットデバイスは,ネイティブなニオブ酸化物を用いたベースラインニオブ量子ビットデバイスよりも2～5倍のコヒーレンス時間を示す。ニオブをタンタルで捕獲すると、200マイクロ秒以上で平均クビット寿命が得られる。アモルファスなニオブ酸化物は, 他のアモルファスな酸化物に比べて高い損失を生じる可能性が示唆された。これらの結果は,超高Q超伝導ラジオ周波数(SRF)キャビティで得られた酸化ニオブ損失タンジェントの高精度測定と一致した。この新しい表面カプセル化戦略は、シリコンプロセスとの互換性により製造とスケーラブルな製造性を維持しつつ、環境安定材料によるパッシベーションによる誘電損失のさらなる低減を可能にする。 We present a novel transmon qubit fabrication technique that yields systematic improvements in T$_1$ coherence times. We fabricate devices using an encapsulation strategy that involves passivating the surface of niobium and thereby preventing the formation of its lossy surface oxide. 
By maintaining the same superconducting metal and only varying the surface structure, this comparative investigation examining different capping materials and film substrates across different qubit foundries definitively demonstrates the detrimental impact that niobium oxides have on the coherence times of superconducting qubits, compared to native oxides of tantalum, aluminum or titanium nitride. Our surface-encapsulated niobium qubit devices exhibit T$_1$ coherence times 2 to 5 times longer than baseline niobium qubit devices with native niobium oxides. When capping niobium with tantalum, we obtain median qubit lifetimes above 200 microseconds. Our comparative structural and chemical analysis suggests that amorphous niobium oxides may induce higher losses compared to other amorphous oxides. These results are in line with high-accuracy measurements of the niobium oxide loss tangent obtained with ultra-high Q superconducting radiofrequency (SRF) cavities. This new surface encapsulation strategy enables even further reduction of dielectric losses via passivation with ambient-stable materials, while preserving fabrication and scalable manufacturability thanks to the compatibility with silicon processes.	翻訳日:2023-06-17 01:06:33 公開日:2023-06-14
# 編集可能なステップバイステップの説明によるインタラクティブなText-to-SQL生成 Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations ( http://arxiv.org/abs/2305.07372v2 ) ライセンス: Link先を確認	Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang	(参考訳) リレーショナルデータベースは、このビッグデータ時代において重要な役割を果たす。しかし、非専門家はSQLのようなデータベース言語に慣れていないため、リレーショナルデータベースの分析能力を十分に引き出すことは困難である。自然言語からSQLを自動的に生成する多くの手法が提案されているが、それらには2つの問題がある。(1) 特に複雑なクエリにおいて依然として多くの誤りを犯すこと、(2) 非専門家のユーザが誤ったクエリを検証し修正するための柔軟な方法を提供していないことである。これらの問題に対処するために、誤ったSQLのステップバイステップの説明をユーザが直接編集してSQLの誤りを修正できる、新しいインタラクション機構を導入する。Spiderベンチマークの実験では、我々の手法は実行精度において3つのSOTAアプローチを少なくとも31.6%上回っている。 24人の参加者によるユーザスタディでは、我々のアプローチによって、ユーザがより短い時間と高い確信度で、はるかに多くのSQLタスクを解決できることが示された。 Relational databases play an important role in this Big Data era. However, it is challenging for non-experts to fully unleash the analytical power of relational databases, since they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine the incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of an incorrect SQL query to fix SQL errors. Experiments on the Spider benchmark show that our approach outperforms three SOTA approaches by at least 31.6% in terms of execution accuracy. A user study with 24 participants further shows that our approach helped users solve significantly more SQL tasks with less time and higher confidence, demonstrating its potential to expand access to databases, particularly for non-experts.	翻訳日:2023-06-17 00:59:41 公開日:2023-06-14
# 時系列予測のためのスペクトル-時間グラフニューラルネットワークの表現力 How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting? ( http://arxiv.org/abs/2305.06587v2 ) ライセンス: Link先を確認	Ming Jin, Guangsi Shi, Yuan-Fang Li, Qingsong Wen, Bo Xiong, Tian Zhou, Shirui Pan	(参考訳) スペクトル時間グラフニューラルネットワークは、グラフニューラルネットワーク(GNN)に基づくほとんどの時系列予測モデルに基づく、有望な抽象化である。しかし、この手法の根底についてもっと知る必要がある。本稿では,スペクトル時間GNNの表現力を向上する理論的枠組みを確立する。その結果,線形スペクトル時間GNNは軽微な仮定の下で普遍的であり,その表現力は離散時間動的グラフ上の1次Weisfeiler-Lemanアルゴリズムによって有界であることがわかった。有効なインスタンス化を実践するために、関連する制約を詳細に検討し、スペクトル領域における空間的および時間的モジュールを設計するための理論的青写真について概説する。これらの知見に基づいて、我々のフレームワークに基づいて、スペクトル時間GNNがいかに強力であるかを示すために、TGC(Temporal Graph GegenConv)というシンプルなインスタンスを提案し、線形成分のみで既存のモデルよりも大幅に優れ、モデル効率が向上した。 Spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models that are based on graph neural networks (GNNs). However, more is needed to know about the underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings useful in practice on valid instantiations, we discuss related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. Building on these insights and to demonstrate how powerful spectral-temporal GNNs are based on our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.	翻訳日:2023-06-17 00:59:21 公開日:2023-06-14
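The name "GegenConv" in the abstract above suggests spectral filters expressed in a Gegenbauer polynomial basis, although the abstract itself gives no formula. As a hedged illustration of what such a basis looks like, the sketch below evaluates Gegenbauer polynomials with their standard three-term recurrence and combines them into a polynomial filter response. The `filter_response` helper, its coefficient list, and the reading of `x` as a graph frequency rescaled to [-1, 1] are assumptions made for illustration, not the paper's TGC definition.

```python
def gegenbauer(n, alpha, x):
    """Evaluate the Gegenbauer polynomial C_n^(alpha)(x).

    Standard recurrence:
      C_0 = 1, C_1 = 2*alpha*x,
      k*C_k = 2*x*(k + alpha - 1)*C_{k-1} - (k + 2*alpha - 2)*C_{k-2}.
    """
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        nxt = (2.0 * x * (k + alpha - 1.0) * curr
               - (k + 2.0 * alpha - 2.0) * prev) / k
        prev, curr = curr, nxt
    return curr


def filter_response(coeffs, alpha, x):
    """Hypothetical polynomial filter: sum_k coeffs[k] * C_k^(alpha)(x).

    `x` is assumed to be a frequency already rescaled to [-1, 1].
    """
    return sum(theta * gegenbauer(k, alpha, x) for k, theta in enumerate(coeffs))


# alpha = 1 recovers Chebyshev polynomials of the second kind,
# alpha = 0.5 recovers Legendre polynomials.
print(gegenbauer(2, 1.0, 0.5), gegenbauer(2, 0.5, 0.5))
```

A filter built this way is linear in its coefficients, which matches the abstract's emphasis on models with only linear components. How the paper parameterizes and learns those coefficients is not stated here.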
# 歴史データセットのライター検索に向けて Towards Writer Retrieval for Historical Datasets ( http://arxiv.org/abs/2305.05358v2 ) ライセンス: Link先を確認	Marco Peer, Florian Kleber, Robert Sablatnig	(参考訳) 本稿では,キーポイント位置で検出されたクラスタリングSIFT記述子に基づいて,擬似クラスタラベルによる文字検索を行う手法を提案する。これらのクラスタラベルを用いて,NetVLADに比べて複雑性の低い符号化層であるNetRVLADをキーポイント位置32x32パッチでトレーニングした。さらに,ページ埋め込みの類似性を生かして検索性能を向上させるため,SGRと呼ばれるグラフベースの再ランクアルゴリズムを提案する。本手法は2つの歴史的データセット(Historical-WIとHisIR19)で評価する。我々は異なるバックボーンとNetRVLADの評価を含む。明示的なエンコーディングを使わずに、歴史的なデータセットに関する関連作業と競合する。再ランキング方式を適用することで,両データセットに新たな最先端技術を設定し,現代的なデータセットでも同等のパフォーマンスを達成できることを実証した。 This paper presents an unsupervised approach for writer retrieval based on clustering SIFT descriptors detected at keypoint locations resulting in pseudo-cluster labels. With those cluster labels, a residual network followed by our proposed NetRVLAD, an encoding layer with reduced complexity compared to NetVLAD, is trained on 32x32 patches at keypoint locations. Additionally, we suggest a graph-based reranking algorithm called SGR to exploit similarities of the page embeddings to boost the retrieval performance. Our approach is evaluated on two historical datasets (Historical-WI and HisIR19). We include an evaluation of different backbones and NetRVLAD. It competes with related work on historical datasets without using explicit encodings. We set a new State-of-the-art on both datasets by applying our reranking scheme and show that our approach achieves comparable performance on a modern dataset as well.	翻訳日:2023-06-17 00:58:39 公開日:2023-06-14
# 複数の注意機構と深層学習に基づく底部血管像のセグメンテーション Fundus vascular image segmentation based on multiple attention mechanisms and deep learning ( http://arxiv.org/abs/2305.03617v2 ) ライセンス: Link先を確認	Yuanyuan Peng, Pengpeng Luan, Zixu Zhang	(参考訳) 網膜眼底画像中の血管を正確に分割することは、眼疾患の早期スクリーニング、診断、評価において重要であるが、重要な光変化、不均一な曲率構造、非一様コントラストなどの様々な要因により、セグメンテーションタスクに不明瞭な不確実性をもたらす。その結果,網膜基底画像の血管を正確に検出するために,複数の注意機構と深部学習に基づく有用なアプローチが提案された。シーン情報補償の喪失に関する文脈情報を強化するため、トランスフォーマーによって構築された空間的注意機構とチャネル注意を結合した注意融合機構を用いて、空間的およびチャネル的な寸法の網膜基底画像から血管の様々な特徴を抽出する。その後、スキップ接続で低レベル機能から冗長な情報やノイズを除去し、高レベル機能との統合性を向上させるために、ユニークな空間的注意機構が導入される。さらに、ドロップアウト層を使用して、いくつかのニューロンをランダムに破棄することで、ディープラーニングネットワークの過剰フィットを防止し、その一般化性能を向上させることができる。 Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosing, and evaluating some ocular diseases, yet it poses a nontrivial uncertainty for the segmentation task due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a useful approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To enrich contextual information for the loss of scene information compensation, an attention fusion mechanism that combines the channel attention with spatial attention mechanisms constructed by Transformer is employed to extract various features of blood vessels from retinal fundus images in both spatial and channel dimensions. Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.	翻訳日:2023-06-17 00:58:05 公開日:2023-06-14
# Floaters No More: カメラ近傍の学習を改善するための放射輝度場の勾配スケーリング Floaters No More: Radiance Field Gradient Scaling for Improved Near-Camera Training ( http://arxiv.org/abs/2305.02756v2 ) ライセンス: Link先を確認	Julien Philip and Valentin Deschaintre	(参考訳) NeRFの取得では通常、各カメラのニアプレーンを慎重に選択する必要があり、そうしない場合は背景崩壊が生じ、撮影シーンの端に浮遊アーティファクトが発生する。本研究の鍵となる洞察は、背景崩壊がカメラ近傍の領域でサンプルの密度が高いことに起因するという点である。このサンプリング不均衡の結果、カメラ近傍のボリュームははるかに多くの勾配を受け取り、誤った密度の蓄積につながる。本稿では、このサンプリング不均衡を相殺し、ニアプレーンを不要にしつつ背景崩壊を防ぐ勾配スケーリング手法を提案する。我々の手法は数行で実装でき、大きなオーバーヘッドを生じさせず、ほとんどのNeRF実装と互換性がある。 NeRF acquisition typically requires careful choice of near planes for the different cameras or suffers from background collapse, creating floating artifacts on the edges of the captured scene. The key insight of this work is that background collapse is caused by a higher density of samples in regions near cameras. As a result of this sampling imbalance, near-camera volumes receive significantly more gradients, leading to incorrect density buildup. We propose a gradient scaling approach to counter-balance this sampling imbalance, removing the need for near planes, while preventing background collapse. Our method can be implemented in a few lines, does not induce any significant overhead, and is compatible with most NeRF implementations.	翻訳日:2023-06-17 00:57:26 publicly:2023-06-14
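The abstract above says the gradient scaling can be implemented in a few lines but does not state the scaling function. The sketch below shows one plausible shape of the idea: per-sample gradients are attenuated for samples close to the camera and left untouched beyond a chosen distance. The quadratic ramp, the `near_radius` parameter, and both function names are assumptions for illustration and should not be read as the paper's formula.

```python
def near_camera_scale(distance, near_radius=1.0):
    """Hypothetical scale factor in [0, 1] for a sample at `distance` from the camera.

    Quadratic ramp inside `near_radius` (assumption), no attenuation beyond it.
    """
    ratio = distance / near_radius
    return min(1.0, ratio * ratio)


def scale_gradients(grads, distances, near_radius=1.0):
    """Attenuate per-sample gradients so near-camera samples do not dominate."""
    return [g * near_camera_scale(d, near_radius) for g, d in zip(grads, distances)]


# Two samples with equal raw gradient: one near the camera, one far away.
print(scale_gradients([1.0, 1.0], [0.5, 3.0]))
```

In an autodiff framework the same factor would typically be applied in the backward pass only, so that rendered colors in the forward pass stay unchanged. That wiring depends on the framework and is left out of this sketch.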
# LLM-Pruner:大規模言語モデルの構造的プルーニングについて LLM-Pruner: On the Structural Pruning of Large Language Models ( http://arxiv.org/abs/2305.11627v2 ) ライセンス: Link先を確認	Xinyin Ma, Gongfan Fang, Xinchao Wang	(参考訳) 大規模言語モデル(LLM)は、言語理解と生成において顕著な能力を示している。しかしながら、そのような印象的な能力は通常、相当なモデルサイズを伴い、デプロイメント、推論、トレーニングのすべての段階において大きな課題が生じる。 LLMは汎用的なタスクソルバであるため、元のLLMのマルチタスク解決能力と言語生成能力の維持を目的として、タスク非依存の方法で圧縮を探索する。これを実現する上での課題の1つは、LLMのトレーニングコーパスが巨大であり、データ転送とモデルの事後学習の双方が過大な負担になることである。そこで本研究では、タスク非依存であること、元のトレーニングデータセットへの依存を最小限に抑えること、という2つの制約の範囲内でLLMの圧縮に取り組む。 LLM-Prunerという本手法では、勾配情報に基づいて重要でない結合構造を選択的に除去し、LLMの機能の大部分を最大限に保持する構造的プルーニングを採用する。さらに、プルーニングされたモデルの性能は、チューニング技術であるLoRAによって、わずか3時間、わずか50Kのデータで効率よく回復できる。 LLaMA, Vicuna, ChatGLM の3つの LLM 上で LLM-Pruner の有効性を検証し、圧縮されたモデルがゼロショットの分類と生成において依然として満足のいく能力を示すことを確認した。コードは、https://github.com/horseee/LLM-Prunerで入手できる。 Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in the deployment, inference, and training stages. With LLMs being general-purpose task solvers, we explore their compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM. One challenge to achieving this is the enormous size of the training corpus of LLMs, which makes both data transfer and model post-training over-burdensome. Thus, we tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures based on gradient information, maximally preserving the majority of the LLM's functionality. To this end, the performance of pruned models can be efficiently recovered through a tuning technique, LoRA, in merely 3 hours, requiring only 50K data. 
We validate the LLM-Pruner on three LLMs, including LLaMA, Vicuna, and ChatGLM, and demonstrate that the compressed models still exhibit satisfactory capabilities in zero-shot classification and generation. The code is available at: https://github.com/horseee/LLM-Pruner	翻訳日:2023-06-17 00:49:50 公開日:2023-06-14
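The abstract above states that coupled structures are removed "based on gradient information" without giving the scoring rule. One common gradient-based criterion is first-order saliency, where a group's importance is the sum of |weight × gradient| over its parameters. The sketch below ranks groups by that score and selects the least important ones. The criterion, the group names, and both helper functions are assumptions for illustration and are not taken from the LLM-Pruner code.

```python
def group_importance(weights, grads):
    """First-order saliency of one coupled group: sum of |w * dL/dw| (assumed criterion)."""
    return sum(abs(w * g) for w, g in zip(weights, grads))


def select_groups_to_prune(groups, prune_ratio):
    """Return the names of the least important groups.

    groups: dict mapping a group name to a (weights, grads) pair of equal-length lists.
    prune_ratio: fraction of groups to remove, rounded down.
    """
    ranked = sorted(groups, key=lambda name: group_importance(*groups[name]))
    n_prune = int(len(ranked) * prune_ratio)
    return ranked[:n_prune]


# Toy example with four hypothetical attention heads.
toy_groups = {
    "head_0": ([0.5, -0.2], [0.1, 0.3]),
    "head_1": ([1.0, 1.0], [0.5, 0.5]),
    "head_2": ([0.01, 0.02], [0.1, 0.1]),
    "head_3": ([0.3], [0.3]),
}
print(select_groups_to_prune(toy_groups, prune_ratio=0.5))
```

Scoring whole groups rather than single weights is what makes the pruning structural: every parameter that depends on a removed unit goes with it, so the remaining network stays dense.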
# 制御可能な画像合成のための遅延制約拡散誘導 Late-Constraint Diffusion Guidance for Controllable Image Synthesis ( http://arxiv.org/abs/2305.11520v4 ) ライセンス: Link先を確認	Chang Liu, Dong Liu	(参考訳) 拡散モデルは、テキスト条件の有無にかかわらず、数語または全くの単語を与えられたフォトリアリスティック画像の合成能力を示す。通常のユーザーやアーティストは、全体的なレイアウト、色、構造、オブジェクトの形状など、特定のガイダンスで合成画像を制御するつもりなので、これらのモデルはユーザーのニーズを十分に満たさないかもしれない。制御可能な画像合成に拡散モデルを適用するために,拡散復調ネットワークの中間特性を正則化するためのいくつかの手法が提案されている。本稿では, 早期制約法として知られ, 単一解法で複数の条件を扱うのに困難がある。彼らは、多くのトレーニングコストと一般化不可能なソリューションを必要とする、特定の条件ごとに別々のモデルをトレーニングすることを意図している。これらの問題に対処するために,我々は拡散ネットワークをそのまま残しながら,その出力が要求条件に合致するように制約するという,遅延制約という新しいアプローチを提案する。具体的には,外部条件と拡散モデルの内部表現との相関性を確立するために,軽量条件アダプタを訓練する。反復分別処理の間、条件付きガイダンスを対応する条件アダプタに送信してサンプリングプロセスを確立された相関で操作する。さらに,提案手法に準拠した合成画像の品質向上を図るため,時間ステップリサンプリング法と早期停止法を用いて,導入した遅延制約戦略を導入する。提案手法は,既存の早期制約法よりも優れ,未確認条件の一般化に優れる。私たちのコードは利用できます。 Diffusion models, either with or without text condition, have demonstrated impressive capability in synthesizing photorealistic images given a few or even no words. These models may not fully satisfy user need, as normal users or artists intend to control the synthesized images with specific guidance, like overall layout, color, structure, object shape, and so on. To adapt diffusion models for controllable image synthesis, several methods have been proposed to incorporate the required conditions as regularization upon the intermediate features of the diffusion denoising network. These methods, known as early-constraint ones in this paper, have difficulties in handling multiple conditions with a single solution. They intend to train separate models for each specific condition, which require much training cost and result in non-generalizable solutions. To address these difficulties, we propose a new approach namely late-constraint: we leave the diffusion networks unchanged, but constrain its output to be aligned with the required conditions. 
Specifically, we train a lightweight condition adapter to establish the correlation between external conditions and internal representations of diffusion models. During the iterative denoising process, the conditional guidance is sent into the corresponding condition adapter to manipulate the sampling process with the established correlation. We further equip the introduced late-constraint strategy with a timestep resampling method and an early stopping technique, which boost the quality of the synthesized images while complying with the guidance. Our method outperforms the existing early-constraint methods and generalizes better to unseen conditions. Our code will be made available.	翻訳日:2023-06-17 00:49:24 公開日:2023-06-14
# the beauty or the beast: 合成医療画像のどの側面が注目に値するか? The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus? ( http://arxiv.org/abs/2305.09789v2 ) ライセンス: Link先を確認	Xiaodan Xing, Yang Nan, Federico Felder, Simon Walsh and Guang Yang	(参考訳) 医療用AIアルゴリズムのトレーニングには、大量の正確なラベル付きデータセットが必要である。深層生成モデルから生成された合成画像は、データの不足問題を緩和するのに役立つが、それらの効果は実世界の画像への忠実さに依存する。通常、研究者は画質測定に基づいて合成モデルを選択し、リアルに見える合成画像を優先する。しかし,本研究では,高忠実度で視覚的に魅力的な合成画像が必ずしも優れているとは限らない。実際,下流タスクにおいて,低忠実度合成画像が高忠実度画像よりも優れている場合を示す。本研究は,現実世界のアプリケーションに合成データを組み込む前に,総合分析の重要性を浮き彫りにする。我々は,医療用AIアルゴリズムのトレーニングにおいて,低忠実度合成画像の価値について,研究コミュニティの間で認識を深めることを期待している。 Training medical AI algorithms requires large volumes of accurately labeled datasets, which are difficult to obtain in the real world. Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images. Typically, researchers select synthesis models based on image quality measurements, prioritizing synthetic images that appear realistic. However, our empirical analysis shows that high-fidelity and visually appealing synthetic images are not necessarily superior. In fact, we present a case where low-fidelity synthetic images outperformed their high-fidelity counterparts in downstream tasks. Our findings highlight the importance of comprehensive analysis before incorporating synthetic data into real-world applications. We hope our results will raise awareness among the research community of the value of low-fidelity synthetic images in medical AI algorithm training.	翻訳日:2023-06-17 00:48:44 公開日:2023-06-14
# LoViT:手術フェーズ認識のためのLong Video Transformer LoViT: Long Video Transformer for Surgical Phase Recognition ( http://arxiv.org/abs/2305.08989v3 ) ライセンス: Link先を確認	Yang Liu, Maxence Boels, Luis C. Garcia-Peraza-Herrera, Tom Vercauteren, Prokar Dasgupta, Alejandro Granados and Sebastien Ourselin	(参考訳) オンラインの手術フェーズ認識は、パフォーマンスを定量化し、手術ワークフローの実行を監督するコンテキストツールを構築する上で重要な役割を果たす。現在のアプローチには限界がある。フレームレベルの教師信号で空間的特徴抽出器を訓練するため、異なるフェーズに類似したフレームが現れると誤った予測につながりうること、また計算上の制約から局所的特徴と大域的特徴をうまく融合できず、外科手術でよく見られる長時間ビデオの分析に影響しうることである。本稿では、短期および長期の時間的情報を融合する2段階の手法であるLong Video Transformer (LoViT) を提案する。LoViTは、時間的に豊富な空間的特徴抽出器と、自己注意に基づく2つのカスケード接続されたL-Transモジュール、およびそれに続くProbSparse自己注意に基づく大域的時間情報処理用のG-Informerモジュールからなるマルチスケール時間的アグリゲータを組み合わせる。マルチスケールの時間的ヘッドは、局所的および大域的な特徴を結合し、フェーズ遷移を考慮した教師信号を用いて手術フェーズを分類する。このアプローチは、Cholec80とAutoLaparoデータセットで最先端手法を一貫して上回る。 Trans-SVNetと比較すると、LoViTはCholec80におけるビデオレベルの精度が2.4pp、AutoLaparoでは3.1pp向上した。さらに、AutoLaparoのフェーズレベルJaccardで5.3pp、Cholec80で1.55ppの改善を達成している。以上の結果から、本手法は、手術手順と時間的順序特性の異なる2つのデータセットにおいて手術フェーズ認識の最先端性能を達成するのに有効であり、長時間ビデオに対応する機構も導入している。 Online surgical phase recognition plays a significant role in building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information that combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. 
The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Moreover, it achieves a 5.3 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance in surgical phase recognition on two datasets covering different surgical procedures and temporal sequencing characteristics, while introducing mechanisms that cope with long videos.	翻訳日:2023-06-17 00:48:30 公開日:2023-06-14
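The results above are reported as video-level accuracy and phase-level Jaccard. To make the second metric concrete, the sketch below computes a plain per-phase Jaccard index (intersection over union of frames) and its mean over the phases present in the ground truth. This is a generic definition under stated assumptions. Published benchmark protocols may differ, for example by relaxing phase boundaries, and the function names and toy labels are hypothetical.

```python
def phase_jaccard(pred, truth):
    """Per-phase Jaccard index over frame labels.

    pred, truth: equal-length sequences of per-frame phase labels.
    Only phases that occur in `truth` are scored, so every union is non-empty.
    """
    scores = {}
    for phase in sorted(set(truth)):
        inter = sum(1 for p, t in zip(pred, truth) if p == phase and t == phase)
        union = sum(1 for p, t in zip(pred, truth) if p == phase or t == phase)
        scores[phase] = inter / union
    return scores


def mean_phase_jaccard(pred, truth):
    """Unweighted mean of the per-phase Jaccard scores."""
    scores = phase_jaccard(pred, truth)
    return sum(scores.values()) / len(scores)


# Toy video with six frames and three phases.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(phase_jaccard(pred, truth), mean_phase_jaccard(pred, truth))
```

Because the score is averaged per phase rather than per frame, a short phase that is mostly misclassified lowers the metric as much as a long one. This is why Jaccard can move differently from frame accuracy.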
# Mobile-Env: LLM時代のインタラクティブエージェントの評価プラットフォームとベンチマーク Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era ( http://arxiv.org/abs/2305.08144v2 ) ライセンス: Link先を確認	Danyang Zhang, Lu Chen, Zihan Zhao, Ruisheng Cao, Kai Yu	(参考訳) 様々な評価ベンチマークは、大規模言語モデル(LLM)の幅広い機能を評価する上で重要な役割を果たす。価値あるベンチマークの構築に多くの取り組みがなされているが、マルチステップ対話環境におけるllmの能力評価を目的とした作業はまだ少ない。 LLMは、インタラクションのための環境観測のテキスト表現を必要とすることに気づき、情報ユーザインタフェース(InfoUI)に基づいた新しいベンチマークを構築することで、そのような空白を埋めることを選択します。 infouiはリッチテキストコンテンツで構成され、いくつかのテキストフォーマットで表現できるため、llmの相互作用能力の評価に適している。さらに、infouiの複雑な構造は、llmがプレーンテキストではなく構造化テキストを理解することの難しさをさらに高めることができる。インタラクションプラットフォームはエージェントを評価するために常に使用されるが、InfoUI専用に十分なインタラクションプラットフォームがまだ存在しない。そこで本研究では,新たな拡張性,適応性,親密なインタラクションプラットフォームであるmobile-envを構築し,適切なベンチマークのベースを提供する。 Mobile-Env をベースにした InfoUI タスクセット WikiHow が構築され,構造化テキストベースの環境における LLM のマルチステップインタラクション能力のベンチマークを確立する。一連のLLMをベースとしたエージェントをタスクセット上でテストし,InfoUIインタラクションにおけるLLMの可能性と課題について考察する。コミュニティがmobile-envの新しい環境と新しいタスクセットを提供し、より良いテストベンチマークを提供し、対応するドメインの開発を促進することを心から歓迎します。 Diverse evaluation benchmarks play a crucial role to assess a wide range of capabilities of large language models (LLM). Although plenty of endeavors have been dedicated to building valuable benchmarks, there is still little work aiming at evaluating the capability of LLM in multistep interactive environments. Noticing that LLM requires a text representation of the environment observations for interaction, we choose to fill such a blank by building a novel benchmark based on the information user interface (InfoUI). InfoUI consists of rich text contents and can be represented in some text formats, thus is suitable for the assessment of interaction ability of LLM. Additionally, the complex structures of InfoUI can further raise a challenge for LLM to understand structured texts rather than plain texts. An interaction platform is always used to evaluate an agent, however, there is still a lack of a satisfactory interaction platform dedicated to InfoUI. 
Consequently, we propose to build a novel, easily extendable, adaptable, and close-to-reality interaction platform, Mobile-Env, to provide a base for an appropriate benchmark. Based on Mobile-Env, an InfoUI task set, WikiHow, is then built to establish a benchmark for the multistep interaction capability of LLMs in structured text-based environments. Agents based on a series of LLMs are tested on the task set to gain insight into the potential and challenges of LLMs for InfoUI interaction. We sincerely welcome community contributions of new environments and new task sets for Mobile-Env to provide better test benchmarks and facilitate the development of the corresponding domains.	翻訳日:2023-06-17 00:47:49 公開日:2023-06-14
# 自然言語処理における拡散モデルの検討 A Survey of Diffusion Models in Natural Language Processing ( http://arxiv.org/abs/2305.14671v2 ) ライセンス: Link先を確認	Hao Zou, Zae Myung Kim, Dongyeop Kang	(参考訳) 本稿では,自然言語処理(NLP)における拡散モデルの利用について概説する。拡散モデル(英: Diffusion model)は、ネットワークや多様体にまたがる情報や信号の拡散を捉えることを目的とした数学モデルのクラスである。 NLPでは、自然言語生成、感情分析、トピックモデリング、機械翻訳などの様々な応用で拡散モデルが使われている。本稿では,NLPにおける拡散モデルの異なる定式化,その強度と限界,応用について論じる。また、拡散モデルと代替生成モデルとの徹底的な比較を行い、特に自己回帰(AR)モデルを強調し、拡散モデルとともにトランスフォーマーがいかに多様なアーキテクチャを組み込むかを検討する。 ARモデルと比較して、拡散モデルは、並列生成、テキスト補間、構文構造や意味的内容などのトークンレベルの制御、堅牢性に対して大きな利点がある。トランスフォーマーを拡散モデルに統合するさらなる応用を探求することは、価値ある追求である。また,nlpにおける拡散モデルの発展に向けて,多変量拡散モデルや,数発学習の特長を持つ大規模拡散言語モデルの開発が重要となる。 This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation. This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications. We also perform a thorough comparison between diffusion models and alternative generative models, specifically highlighting the autoregressive (AR) models, while also examining how diverse architectures incorporate the Transformer in conjunction with diffusion models. Compared to AR models, diffusion models have significant advantages for parallel generation, text interpolation, token-level controls such as syntactic structures and semantic contents, and robustness. Exploring further permutations of integrating Transformers into diffusion models would be a valuable pursuit. 
Also, the development of multimodal diffusion models and large-scale diffusion language models with notable capabilities for few-shot learning would be important directions for the future advancement of diffusion models in NLP.	翻訳日:2023-06-17 00:40:34 公開日:2023-06-14
# リポジトリにおける科学PDFアクセシビリティの現状:スイスにおける調査 The state of scientific PDF accessibility in repositories: A survey in Switzerland ( http://arxiv.org/abs/2305.14041v2 ) ライセンス: Link先を確認	Alireza Darvishy, Rolf Sethe, Ines Engler, Oriane Pierres, Juliet Manning	(参考訳) 本調査は、スイスのオンラインリポジトリにおけるPDF文書の品質を分析し、視覚障害者に対するアクセシビリティを検討した。 2つの最小限のアクセシビリティ機能が分析された。PDFにはタグと階層的な見出し構造が必要とされた。調査には、PDFアクセシビリティに関する一般的な意見や知識を評価するため、複数のスイスの大学のリポジトリの管理者や責任者へのインタビューも含まれていた。インタビュー対象者の回答の分析は、PDFアクセシビリティに対する全体的な認識の欠如を示し、オンラインリポジトリには現在この問題に対処する具体的な計画がないことを示した。本稿では最後に、PDF文書のアクセシビリティを向上させるための、オンラインリポジトリ向けの一連の推奨事項を提示する。 This survey analyzed the quality of the PDF documents on online repositories in Switzerland, examining their accessibility for people with visual impairments. Two minimal accessibility features were analyzed: the PDFs had to have tags and a hierarchical heading structure. The survey also included interviews with the managers or heads of multiple Swiss universities' repositories to assess the general opinion and knowledge of PDF accessibility. An analysis of interviewee responses indicates an overall lack of awareness of PDF accessibility, and showed that online repositories currently have no concrete plans to address the issue. This paper concludes by presenting a set of recommendations for online repositories to improve the accessibility of their PDF documents.	翻訳日:2023-06-17 00:39:14 公開日:2023-06-14
# 深部強化学習によるスラムの道路計画 Road Planning for Slums via Deep Reinforcement Learning ( http://arxiv.org/abs/2305.13060v3 ) ライセンス: Link先を確認	Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, Yong Li	(参考訳) 何百万人ものスラム住民がスラム内の不適切な道路インフラのために都市サービスへのアクセシビリティが低下しており、スラムの道路計画が都市の持続可能な発展に不可欠である。既存の再ブロックやヒューリスティックな手法は、異なるスラムに一般化できない時間を要するか、アクセシビリティや建設コストの観点から最適以下の道路計画が得られる。本稿では,スラムの道路配置を自動的に行うための深層強化学習手法を提案する。本研究では,スラムのトポロジー構造を捉える汎用グラフモデルを提案し,計画道路の場所を選択するための新しいグラフニューラルネットワークを考案する。マスキングポリシー最適化により,スラム内の場所を最小限の建設コストで接続する道路計画を作成することができる。異なる国における実世界のスラムに関する広範囲な実験により、モデルの有効性が検証され、既存のベースラインメソッドに対するアクセシビリティが14.3%向上した。異なるタスク間での移動に関するさらなる調査は、我々のモデルが単純なシナリオで道路計画スキルを習得し、より複雑なシナリオに適応できることを示し、我々のモデルを現実世界のスラムアップグレードに適用する可能性を示している。コードとデータはhttps://github.com/tsinghua-fib-lab/road-planning-for-slumsで入手できる。 Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. 
Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.	翻訳日:2023-06-17 00:38:35 公開日:2023-06-14
# 3次元流れ場分割と分類のための新しい深層学習法 Novel deep learning methods for 3D flow field segmentation and classification ( http://arxiv.org/abs/2305.11884v2 ) ライセンス: Link先を確認	Xiaorui Bai, Wenyong Wang, Jun Zhang, Yueqing Wang, Yu Xiang	(参考訳) 流れ場のセグメンテーションと分類は、渦の構造や乱流を理解するのに役立つ。既存の深層学習手法は主にグローバル情報に基づいており、2次元の状況に焦点を当てている。流れ場理論に基づいて,3次元空間における新しい流れ場セグメンテーションおよび分類の深層学習法を提案する。本研究では,局所速度情報に基づくセグメンテーション基準と,局所渦度と渦後流の関係に基づく分類基準を構築し,3次元流れ場の渦構造を同定し,さらに渦後流のタイプを正確かつ迅速に分類する。シミュレーション実験の結果,既存の手法と比較して,セグメンテーション法は渦領域をより正確に識別でき,時間消費も50%以上削減されること,分類法は同じ分類精度を維持しつつ,時間消費を90%以上削減できることがわかった。 Flow field segmentation and classification help researchers to understand vortex structure and thus turbulent flow. Existing deep learning methods are mainly based on global information and focus on 2D circumstances. Based on flow field theory, we propose novel flow field segmentation and classification deep learning methods in three-dimensional space. We construct a segmentation criterion based on local velocity information and a classification criterion based on the relationship between local vorticity and vortex wake, to identify vortex structure in 3D flow fields, and further classify the type of vortex wakes accurately and rapidly. Simulation experiment results showed that, compared with existing methods, our segmentation method can identify the vortex area more accurately, while the time consumption is reduced by more than 50%; our classification method can reduce the time consumption by more than 90% while maintaining the same classification accuracy level.	翻訳日:2023-06-17 00:37:32 公開日:2023-06-14
# OVO: Open-Vocabulary Occupancy OVO: Open-Vocabulary Occupancy ( http://arxiv.org/abs/2305.16133v2 ) ライセンス: Link先を確認	Zhiyu Tan, Zichao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li	(参考訳) semantic occupancy predictionは、自律エージェントが3d環境で安全に動作するために、周囲の密度の幾何と意味を推測することを目的としている。既存の占有率予測手法は,人間の注釈付きボリュームデータに基づいてほぼ完全に訓練されている。高品質ではあるが、そのような3Dアノテーションの生成は面倒でコストがかかり、トレーニングデータセット内のいくつかの特定のオブジェクトカテゴリに制限される。この制限に対処するために,任意のクラスを意味的に占有できるが,訓練中に3Dアノテーションを必要としない新しい手法であるOpen Vocabulary Occupancy (OVO)を提案する。提案手法の鍵は,(1)事前訓練した2次元開語彙セグメンテーションモデルから3次元占有ネットワークへの知識蒸留,(2)高品質トレーニングデータ生成のためのピクセルボクセルフィルタリングである。結果として得られるフレームワークはシンプルでコンパクトで、ほとんどの最先端のセマンティック占有予測モデルと互換性がある。 NYUv2とSemanticKITTIデータセットでは、OVOは教師付きセマンティック占有予測アプローチと比較して、競争性能が向上する。さらに,提案フレームワークの設計に関する知見を提供するため,広範な解析およびアブレーション研究を行う。私たちのコードはhttps://github.com/dzcgaara/ovoで公開されています。 Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. 
Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.	翻訳日:2023-06-17 00:28:29 公開日:2023-06-14
# Bhasha-Abhijnaanam:22言語におけるネイティブスクリプトとロマン化言語同定 Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages ( http://arxiv.org/abs/2305.15814v2 ) ライセンス: Link先を確認	Yash Madhani, Mitesh M. Khapra, Anoop Kunchukuttan	(参考訳) 我々は、インド憲法に記載されている22の言語について、言語識別(LID)データセットとモデルを作成する。まず、ネイティブスクリプト用の言語識別テストセットであるbhasha-abhijnaanamと、22のindic言語にまたがるローマ字テキストを作成します。 IndicLIDは、上記のすべての言語をネイティブおよびローマン化されたスクリプトで識別する言語である。ネイティブテキストでは、既存のLIDよりも言語カバレッジが良く、他のLIDよりも競争力がある。 IndicLIDは、インド語でロマライズされたテキストのための最初のLIDである。 romanized text LIDの2つの大きな課題は、トレーニングデータの欠如と、言語が似ている場合の低LIDパフォーマンスである。これらの問題に対する単純で効果的な解決策を提供する。一般に、いかなる言語においてもローマ字化テキストに関する作業は限られており、この発見はローマ字化言語識別を必要とする他の言語に関連している。私たちのモデルはオープンソースライセンスの下でhttps://ai4bharat.iitm.ac.in/indiclidで公開されています。私たちのトレーニングとテストセットは、オープンソースライセンスの下でhttps://ai4bharat.iitm.ac.in/bhasha-abhijnaanamで公開されています。 We create publicly available language identification (LID) datasets and models in all 22 Indian languages listed in the Indian constitution in both native-script and romanized text. First, we create Bhasha-Abhijnaanam, a language identification test set for native-script as well as romanized text which spans all 22 Indic languages. We also train IndicLID, a language identifier for all the above-mentioned languages in both native and romanized script. For native-script text, it has better language coverage than existing LIDs and is competitive or better than other LIDs. IndicLID is the first LID for romanized text in Indian languages. Two major challenges for romanized text LID are the lack of training data and low-LID performance when languages are similar. We provide simple and effective solutions to these problems. In general, there has been limited work on romanized text in any language, and our findings are relevant to other languages that need romanized language identification. Our models are publicly available at https://ai4bharat.iitm.ac.in/indiclid under open-source licenses. 
Our training and test sets are also publicly available at https://ai4bharat.iitm.ac.in/bhasha-abhijnaanam under open-source licenses.	翻訳日:2023-06-17 00:28:09 公開日:2023-06-14
# モバイル支払い受け入れのドライバー:ネットワーク外部性の影響 Drivers of Mobile Payment Acceptance: The Impact of Network Externalities ( http://arxiv.org/abs/2305.15436v2 ) ライセンス: Link先を確認	Qasim Ajao and E. Abdullah Abu-Shanab	(参考訳) スマートフォンとそのアプリケーションの普及により、モバイル決済はますます人気が高まっている。しかし、アフリカ諸国での採用は、私たちの生活を単純化する可能性にもかかわらず、制限されている。本研究の目的は,ナイジェリアにおけるモバイル決済の受容に影響を与える要因の理解を深めることである。そこで本稿では,従来の技術受容要因に加えて,ネットワーク外部性の影響について検討する。この研究は、モバイル支払いの受け入れの主要な要因は、パフォーマンスの期待、努力の期待、社会的影響、信頼、ネットワーク外部性である、と仮定している。調査の結果は、従来のドライバーは依然としてモバイル決済を採用する顧客の意思に影響を与えているが、ネットワーク外部性は最も強い影響を与えることを示唆している。本論文は, 努力期待の影響を裏付けるものではないが, 今後の研究を推奨する。 Mobile payment has become increasingly popular due to the widespread use of smartphones and their applications. However, its adoption in African countries has been limited, despite its potential to simplify our lives. This study aims to enhance our understanding of the factors that affect the acceptance of mobile payment in Nigeria. To achieve this, the paper explores the impact of "network externalities" in addition to traditional technology acceptance factors. The study hypothesizes that the key drivers of mobile payment acceptance are performance expectancy, effort expectancy, social influence, trust, and network externality. The research findings suggest that while traditional drivers still play a role in customers' willingness to adopt mobile payment, network externalities have the strongest impact. Although the results did not support the influence of effort expectancy, the paper provides recommendations for future research.	翻訳日:2023-06-17 00:27:51 公開日:2023-06-14
# 言語から見た弱視映像の再検討 Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective ( http://arxiv.org/abs/2306.00595v3 ) ライセンス: Link先を確認	Yingying Fan and Yu Wu and Yutian Lin and Bo Du	(参考訳) 音声/視覚モダリティのすべてのイベントを識別・特定することを目的とした,弱い教師付き音声映像解析タスク(avvp)に注目した。それまでの作業は、モダリティにまたがるビデオレベルのラベルにのみフォーカスするが、隣接するビデオセグメント(すなわち1秒のビデオクリップ)が異なるイベントを含むセグメントレベルのラベルノイズを見落としている。しかし、セグメント内のイベントを認識することは、そのラベルがビデオ内で発生するイベントの組み合わせである可能性があるため、難しい。この問題を解決するために、言語の観点からAVVPに取り組むことを検討する。なぜなら、言語は固定ラベルを超えて各セグメントにどのように様々なイベントが現れるかを自由に記述できるからだ。具体的には、各ビデオのイベント出現のすべてのケースを記述する言語プロンプトを設計します。次に、最も類似したプロンプトのイベントをセグメントレベルラベルとして、言語プロンプトとセグメントの類似度を算出する。また,ラベルの誤りに対処するため,信頼できないセグメントに対して動的再重み付けを行い,ラベルを調整することを提案する。実験により, 単純かつ効果的なアプローチが最先端の手法を大差で上回っていることが示された。 We focus on the weakly-supervised audio-visual video parsing task (AVVP), which aims to identify and locate all the events in audio/visual modalities. Previous works only concentrate on video-level overall label denoising across modalities, but overlook the segment-level label noise, where adjacent video segments (i.e., 1-second video clips) may contain different events. However, recognizing events in the segment is challenging because its label could be any combination of events that occur in the video. To address this issue, we consider tackling AVVP from the language perspective, since language could freely describe how various events appear in each segment beyond fixed labels. Specifically, we design language prompts to describe all cases of event appearance for each video. Then, the similarity between language prompts and segments is calculated, where the event of the most similar prompt is regarded as the segment-level label. In addition, to deal with the mislabeled segments, we propose to perform dynamic re-weighting on the unreliable segments to adjust their labels. Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin.	翻訳日:2023-06-17 00:21:40 公開日:2023-06-14
# Concordiaによる並列神経シンボル統合 Parallel Neurosymbolic Integration with Concordia ( http://arxiv.org/abs/2306.00480v2 ) ライセンス: Link先を確認	Jonathan Feldstein, Modestas Jurčius, Efthymia Tsamoura	(参考訳) 並列型ニューロシンボリックアーキテクチャは論理理論からの知識を深層モデルに蒸留することでNLPに効果的に適用されているが、従来の技術は制限された形式の論理理論しかサポートせず、論理と深層ネットワークの独立性の仮定に依存するなど、いくつかの制限に直面している。先行技術の限界を克服するフレームワークであるConcordiaを提示する。Concordiaはディープネットワークと論理理論の両方に非依存であり、幅広い確率論的理論を支持する。我々のフレームワークは、両方のコンポーネントの教師ありトレーニングと神経コンポーネントの教師なしトレーニングをサポートすることができる。ConcordiaはNLPやデータ分類以外のタスクにも適用され、集団活動の検出、エンティティリンク、レコメンデーションタスクにおける最先端の精度を向上させる。 Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model. However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a framework overcoming the limitations of prior art. Concordia is agnostic both to the deep network and the logic theory offering support for a wide range of probabilistic theories. Our framework can support supervised training of both components and unsupervised training of the neural component. Concordia has been successfully applied to tasks beyond NLP and data classification, improving the accuracy of state-of-the-art on collective activity detection, entity linking and recommendation tasks.	翻訳日:2023-06-17 00:20:58 公開日:2023-06-14
# スケッチリファインメントによるインタラクティブな画像インペインティング Towards Interactive Image Inpainting via Sketch Refinement ( http://arxiv.org/abs/2306.00407v2 ) ライセンス: Link先を確認	Chang Liu, Shunxin Xu, Jialun Peng, Kaidong Zhang and Dong Liu	(参考訳) イメージインペインティングの難しい問題は、腐敗した領域の複雑な構造を復元することである。インタラクティブなイメージのインパインティングを動機付け、スケッチなどの追加ヒントを活用してインパインティングプロセスを支援する。 sketchはエンドユーザーにはシンプルで直感的だが、ランダム性のあるフリーフォームがある。このようなランダム性は、塗装されたモデルと混同し、完成した画像に深刻なアーティファクトを引き起こす可能性がある。この問題に対処するため,sketchrefinerと呼ばれる2段階画像インペインティング手法を提案する。第1段階では,利用者に提供されたスケッチを粗い方法で校正し,洗練するために,相互相関損失関数を用いることを提案する。第2段階では,特徴空間の抽象的スケッチから情報的特徴を抽出し,着色過程を変調する。また,実際のスケッチを自動的にシミュレートし,異なるアプリケーションでテストプロトコルを構築するアルゴリズムを提案する。公開データセットの実験結果によると、SketchRefinerはスケッチ情報を効果的に利用し、フリーフォームスケッチによるアーティファクトを排除している。本手法は定性的にも量的にも常に最先端の手法よりも優れており,一方で実世界のアプリケーションにおいても大きな可能性を秘めている。コードとデータセットが利用可能です。 One tough problem of image inpainting is to restore complex structures in the corrupted regions. It motivates interactive image inpainting which leverages additional hints, e.g., sketches, to assist the inpainting process. Sketch is simple and intuitive to end users, but meanwhile has free forms with much randomness. Such randomness may confuse the inpainting models, and incur severe artifacts in completed images. To address this problem, we propose a two-stage image inpainting method termed SketchRefiner. In the first stage, we propose using a cross-correlation loss function to robustly calibrate and refine the user-provided sketches in a coarse-to-fine fashion. In the second stage, we learn to extract informative features from the abstracted sketches in the feature space and modulate the inpainting process. We also propose an algorithm to simulate real sketches automatically and build a test protocol with different applications. Experimental results on public datasets demonstrate that SketchRefiner effectively utilizes sketch information and eliminates the artifacts due to the free-form sketches. 
Our method consistently outperforms the state-of-the-art ones both qualitatively and quantitatively, meanwhile revealing great potential in real-world applications. Our code and dataset are available.	翻訳日:2023-06-17 00:20:44 公開日:2023-06-14
# 合成ゼロショット学習における条件属性の学習 Learning Conditional Attributes for Compositional Zero-Shot Learning ( http://arxiv.org/abs/2305.17940v2 ) ライセンス: Link先を確認	Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen	(参考訳) 合成ゼロショット学習(CZSL)は、属性オブジェクトの組み合わせのような学習概念に基づいて、新しい合成概念を認識するためのモデルを訓練することを目的としている。課題の一つは、異なるオブジェクトと相互作用する属性をモデル化することである。例えば、「wet apple」と「wet cat」における属性「wet」は異なる。本研究では,属性が認識対象と入力画像に条件付けされていることを解析し,属性ハイパーラーナと属性ベースラーナを含む属性学習フレームワークによる条件付き属性埋め込みの学習を探索する。条件付き属性を符号化することにより、既知の合成から未知の合成への一般化のための柔軟な属性埋め込みを生成することができる。より挑戦的なC-GQAデータセットを含むCZSLベンチマークの実験は、他の最先端のアプローチよりも優れたパフォーマンスを示し、学習条件属性の重要性を検証する。コードはhttps://github.com/wqshmzh/CANet-CZSLで入手できる。 Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations. One of the challenges is to model attributes that interact with different objects, e.g., the attribute "wet" in "wet apple" and "wet cat" is different. As a solution, we provide analysis and argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings by a proposed attribute learning framework containing an attribute hyper learner and an attribute base learner. By encoding conditional attributes, our model can generate flexible attribute embeddings for generalization from seen to unseen compositions. Experiments on CZSL benchmarks, including the more challenging C-GQA dataset, demonstrate better performances compared with other state-of-the-art approaches and validate the importance of learning conditional attributes. Code is available at https://github.com/wqshmzh/CANet-CZSL	翻訳日:2023-06-17 00:18:47 公開日:2023-06-14
# 強凸最適化のための下次手法の原始双対理論 Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization ( http://arxiv.org/abs/2305.17323v2 ) ライセンス: Link先を確認	Benjamin Grimmer, Danlin Li	(参考訳) 強凸だが非滑らかな非リプシッツ最適化のための(統計的)部分次数法を考える。古典的下位段階法,近位下位段階法,スイッチング下位段階法に対して,新しい等価な二重記述(二重平均化のスタイル)を提供する。これらの同値性により、$O(1/T)$収束保証は古典的原始的ギャップと、強い凸最適化のための以前に解析されなかった双対ギャップの両方の観点から可能である。その結果,本理論は,計算コストを増すことなく,簡便で最適な停止基準と最適性証明書をこれらの古典的手法に提供する。結論は, 段階的選択や, 非リプシッツ非条件問題において, 段階的手法の初期イテレーションが指数関数的に変動する可能性(我々の知識の最大値に対して, 先行研究が対処されない現象)に対して適用できる。このような望ましくない振る舞いが存在する場合でも、我々の理論は最終的な収束を保証し、境界を与える。 We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply under nearly any stepsize selection and for a range of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.	翻訳日:2023-06-17 00:18:20 公開日:2023-06-14
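The classic primal subgradient method that the abstract above analyzes can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the paper's primal-dual construction: the test objective `f(x) = ||x||_1 + (mu/2)||x - c||^2`, the stepsize `2 / (mu * (t + 1))`, and the iterate-weighted average are standard textbook choices picked here for illustration, and all function names are hypothetical. The objective is nonsmooth, strongly convex, and not globally Lipschitz, which matches the setting described; its minimizer has a closed form (soft-thresholding), so the primal gap can be checked directly.

```python
import math


def objective(x, c, mu):
    """f(x) = ||x||_1 + (mu/2) * ||x - c||^2 (nonsmooth, mu-strongly convex)."""
    l1 = sum(abs(xi) for xi in x)
    quad = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return l1 + 0.5 * mu * quad


def subgradient(x, c, mu):
    """One valid subgradient of f at x (uses 0 as the subgradient of |.| at 0)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [sign(xi) + mu * (xi - ci) for xi, ci in zip(x, c)]


def subgradient_method(c, mu, num_steps):
    """Run the subgradient method from x = 0 and return the weighted average
    of the iterates, where iterate t receives weight t (assumed scheme)."""
    x = [0.0] * len(c)
    avg = [0.0] * len(c)
    weight_sum = 0.0
    for t in range(1, num_steps + 1):
        # Fold the current iterate into the running weighted average.
        weight_sum += t
        avg = [a + (t / weight_sum) * (xi - a) for a, xi in zip(avg, x)]
        # Take a subgradient step with the decaying stepsize 2 / (mu * (t + 1)).
        g = subgradient(x, c, mu)
        step = 2.0 / (mu * (t + 1))
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return avg


def soft_threshold(c, mu):
    """Closed-form minimizer of f: shrink each coordinate of c by 1/mu."""
    return [math.copysign(max(abs(ci) - 1.0 / mu, 0.0), ci) for ci in c]


if __name__ == "__main__":
    c, mu = [3.0, -0.5, 0.2], 1.0
    x_bar = subgradient_method(c, mu, 2000)
    x_star = soft_threshold(c, mu)
    print("primal gap:", objective(x_bar, c, mu) - objective(x_star, c, mu))
```

The primal gap of the averaged iterate shrinks roughly like 1/T under this stepsize; the dual gap and stopping certificates discussed in the abstract are not reproduced here.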
# DreamSparse: スパースビューによる2次元凍結拡散モデルによるプラトンの洞窟からの脱出 DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views ( http://arxiv.org/abs/2306.03414v3 ) ライセンス: Link先を確認	Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu	(参考訳) いくつかの視点から新しいビューイメージを合成することは、難しいが実践的な問題である。既存の手法では、提供された情報不足のため、品質の高い結果を生成するのに苦労するか、オブジェクトごとの最適化を必要とすることが多い。本研究では,事前学習した拡散モデルにおける2次元先行の強みを利用した新しいビュー画像の合成について検討する。しかし、2D拡散モデルには3D認識が欠如しており、画像合成の歪曲化とアイデンティティの妥協に繋がる。このような問題に対処するために,凍結した事前学習拡散モデルにより幾何学的,アイデンティティに一貫性のある新しいビュー画像を生成するフレームワークDreamSparseを提案する。具体的には、DreamSparseにはスパースビューから3D特徴を3D事前情報として捉えるための幾何学モジュールが組み込まれている。その後、これらの3次元特徴写像を生成過程の空間情報に変換するための空間誘導モデルを導入する。この情報は、事前訓練された拡散モデルを導くために使用され、モデルを調整することなく幾何的に一貫した画像を生成することができる。事前訓練された拡散モデルの強い画像事前情報を活用することで、DreamSparseはオブジェクトレベルの画像とシーンレベルの画像の両方に対して高品質なノベルビューを合成し、オープンセットイメージに一般化することができる。実験により,本フレームワークは,スパースビューから新しいビューイメージを効果的に合成し,訓練されたカテゴリイメージとオープンセットのカテゴリイメージの両方において,ベースラインに優れることを示した。さらなる結果はプロジェクトページ https://sites.google.com/view/dreamsparse-webpage で確認できる。 Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry and identity-consistent novel view image. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. 
This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.	翻訳日:2023-06-17 00:10:49 公開日:2023-06-14
# 低エネルギー中性子-陽子散乱における絡み合い最大化 Entanglement Maximization in Low-Energy Neutron-Proton Scattering ( http://arxiv.org/abs/2306.03239v2 ) ライセンス: Link先を確認	Gerald A. Miller	(参考訳) 中性子-陽子散乱の絡み合い特性を, 中性子-陽子状態に対する散乱作用素の作用によって生じる絡み合い対の数を数える尺度を用いて検討した。 350mevまでの実験室エネルギーの散乱に関連する全ての位相シフトが用いられる。エンタングルメントは、非常に低いエネルギー散乱で最大化される。そのようなエネルギーでは、ハミルトニアンはウィグナーSU(4)対称性に従い、絡み合いの最大度はその対称性の符号である。高エネルギーでは、エンタングルメントの角度依存性は強く、エンタングルメントは多くの散乱角に対して大きい。テンソル力は、約50MeV以上の実験室運動エネルギーで絡み合いを発生させる重要な役割を担っている。 The entanglement properties of neutron-proton scattering are investigated using a measure that counts the number of entangled pairs produced by the action of a scattering operator on a given initial neutron-proton state. All phase shifts relevant for scattering at laboratory energies up to 350 MeV are used. Entanglement is found to be maximized in very low energy scattering. At such energies the Hamiltonian obeys Wigner SU(4) symmetry, and an entanglement maximum is a sign of that symmetry. At higher energies the angular dependence of entanglement is strong and the entanglement is large for many scattering angles. The tensor force is shown to play a significant role in producing entanglement at lab kinetic energies greater than about 50 MeV.	翻訳日:2023-06-17 00:10:24 公開日:2023-06-14
# midmed:医療相談のための混合型対話に向けて MidMed: Towards Mixed-Type Dialogues for Medical Consultation ( http://arxiv.org/abs/2306.02923v2 ) ライセンス: Link先を確認	Xiaoming Shi, Zeming Liu, Chuan Wang, Haitao Leng, Kui Xue, Xiaofan Zhang, Shaoting Zhang	(参考訳) ほとんどの医療対話システムは、患者が医療相談の前に明確な目標(医療問合せ、外科手術問合せなど)を持っていると仮定している。しかし、多くの現実シナリオでは、医学的な知識が不足しているため、患者が必要な全てのスロットで明確な目標を決定することは通常困難である。本稿では,この課題を,患者の目標を明確にするための医療相談対話システムの構築方法として認識する。そこで本研究では,この課題を軽減すべく,タスク指向対話,レコメンデーション,知識基盤対話,qa,chitchatの5つの対話タイプをカバーする「midmed」と呼ばれるヒューマン・ツー・ヒューマン混合型医療相談対話コーパスを提案する。 MidMedは4つの部門(耳鼻咽喉科、眼科、皮膚、消化器科)と8,175の対話をカバーしている。さらに,この課題に対処するため,MidMed上にベースラインを構築し,InsMedと呼ばれる指導指導型医療対話生成フレームワークを提案する。実験の結果,InsMedの有効性が示された。 Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before medical consultation. However, in many real scenarios, due to the lack of medical knowledge, it is usually difficult for patients to determine clear goals with all necessary slots. In this paper, we identify this challenge as how to construct medical consultation dialogue systems to help patients clarify their goals. To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system), with 8,175 dialogues. Furthermore, we build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, termed InsMed, to address this task. Experimental results show the effectiveness of InsMed.	翻訳日:2023-06-17 00:10:12 公開日:2023-06-14
# 生成型AI応用に関する調査 A survey of Generative AI Applications ( http://arxiv.org/abs/2306.02781v2 ) ライセンス: Link先を確認	Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchán	(参考訳) ジェネレーティブAIは近年顕著な成長を遂げており、多様なドメインにまたがる幅広いアプリケーションを生み出している。本稿では,350以上の生成AI応用に関する包括的調査を行い,様々なユニモーダルおよびマルチモーダルの生成AIの構造化分類と簡潔な記述について述べる。この調査は、テキスト、画像、ビデオ、ゲーム、脳情報など、幅広いユニモーダルな生成AIアプリケーションをカバーするセクションに分割されている。我々の調査は、研究者や実践者が、急速に拡大する生成AIの風景をナビゲートし、現在の最先端の理解を深め、この分野におけるさらなるイノベーションを促進するための貴重なリソースとなることを目的としています。 Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.	翻訳日:2023-06-17 00:09:53 公開日:2023-06-14
# MoviePuzzle:マルチモーダル順序学習による視覚的ナラティブ推論 MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning ( http://arxiv.org/abs/2306.02252v2 ) ライセンス: Link先を確認	Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng	(参考訳) 視覚的物語的推論と全体論的映画理解をターゲットとした新しい挑戦であるMoviePuzzleを紹介する。ビデオ理解の領域で注目すべき進歩にもかかわらず、ほとんどの先行作品は、長い形式のビデオに存在する総合的なビデオ理解と生来のビジュアルナラティブ構造に対処するためのタスクやモデルの提供に失敗している。そこで本研究では,映像対話情報の存在下で映画セグメントの撮影,フレーム,クリップ層を再分割することにより,映像モデルの時間的特徴学習と構造学習を増幅するmoviepuzzleタスクを行った。まず,映画を階層層に分割し,ランダムに順序を並べ替えることで,movienetに基づく精巧なデータセットを構築する。映画理解の先行技術を用いて映画パズルをベンチマークすると同時に,映画再注文の基盤構造と視覚的意味的順序を考慮した階層的コントラスト映画クラスタリング(hcmc)モデルを考案する。具体的には、ペアワイズで対照的な学習アプローチを通じて、各層の正しい順序を予測するためにモデルを訓練する。これにより、映画の視覚的物語構造を解読し、ビデオデータに潜む障害を処理するためのネックが装備される。実験により,本手法は,既存の<MoviePuzzle>ベンチマークよりも高い性能を示し,その有効性を裏付ける。 We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding. Despite the notable progress that has been witnessed in the realm of video understanding, most prior works fail to present tasks and models to address holistic video understanding and the innate visual narrative structures existing in long-form videos. To tackle this quandary, we put forth MoviePuzzle task that amplifies the temporal feature learning and structure learning of video models by reshuffling the shot, frame, and clip layers of movie segments in the presence of video-dialogue information. We start by establishing a carefully refined dataset based on MovieNet by dissecting movies into hierarchical layers and randomly permuting the orders. Besides benchmarking the MoviePuzzle with prior arts on movie understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC) model that considers the underlying structure and visual semantic orders for movie reordering. Specifically, through a pairwise and contrastive learning approach, we train models to predict the correct order of each layer. 
This equips them with the knack for deciphering the visual narrative structure of movies and handling the disorder lurking in video data. Experiments show that our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark, underscoring its efficacy.	翻訳日:2023-06-17 00:09:40 公開日:2023-06-14
# PassGPT: 大きな言語モデルを用いたパスワードモデリングと(ガイド付き)生成 PassGPT: Password Modeling and (Guided) Generation with Large Language Models ( http://arxiv.org/abs/2306.01545v2 ) ライセンス: Link先を確認	Javier Rando and Fernando Perez-Cruz and Briland Hitaj	(参考訳) 大規模言語モデル(LLM)は、明示的な監督なしに大量のテキストから自然言語をモデル化することに成功した。本稿では,パスワードのモデリングにおけるLLMの有効性について検討する。パスワード生成のためのパスワードリークを訓練したllmであるpassgptを提案する。 passgptは、従来の2倍のパスワードを推測することで、generative adversarial networks (gan) に基づく既存の方法よりも優れています。さらに,任意の制約に対応するパスワードを生成するためにPassGPTサンプリング手法を利用する誘導型パスワード生成の概念を導入する。最後に、passgptがパスワード上で定義しているエントロピーと確率分布の詳細な分析を行い、既存のパスワード強度推定器の強化における使用について論じる。 Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.	翻訳日:2023-06-17 00:08:52 公開日:2023-06-14
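To make the idea of "the probability distribution a model defines over passwords" concrete, here is a toy sketch. It is emphatically not PassGPT: a character-bigram model with add-one smoothing stands in for the autoregressive LLM, and the function names, the sentinel characters, the tiny `leaked` list, and the vocabulary size of 96 (95 printable ASCII characters plus an end marker) are all assumptions made for illustration. The point is only that any autoregressive model assigns a password a log-probability, and that the negative log2-probability can be read as a rough strength score in bits.

```python
import math

# Sentinel symbols marking the start and end of a password (assumed not to
# occur inside real passwords).
START, END = "\x02", "\x03"


def train_bigram_counts(passwords):
    """Count character-to-character transitions, including start and end."""
    counts = {}
    for pw in passwords:
        prev = START
        for ch in pw + END:
            row = counts.setdefault(prev, {})
            row[ch] = row.get(ch, 0) + 1
            prev = ch
    return counts


def password_bits(pw, counts, vocab_size=96):
    """Return -log2 P(pw) under the smoothed bigram model.

    Higher values mean the model finds the password less predictable.
    """
    bits = 0.0
    prev = START
    for ch in pw + END:
        row = counts.get(prev, {})
        total = sum(row.values())
        # Add-one smoothing so unseen transitions keep nonzero probability.
        prob = (row.get(ch, 0) + 1) / (total + vocab_size)
        bits -= math.log2(prob)
        prev = ch
    return bits


if __name__ == "__main__":
    leaked = ["password", "password1", "pass1234", "letmein"]
    model = train_bigram_counts(leaked)
    for candidate in ["password", "zq9x#kfw"]:
        print(candidate, round(password_bits(candidate, model), 1))
```

A password that resembles the training leaks receives fewer bits than an unrelated random-looking string of the same length, which is the kind of signal a strength estimator could consume.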
# 判例要約のための事前学習された抽象モデルとllmは、どの程度準備ができているか? How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization? ( http://arxiv.org/abs/2306.01248v2 ) ライセンス: Link先を確認	Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh	(参考訳) 判例判断の自動要約は伝統的に抽出的要約法を用いて試みられている。しかし近年では,より自然で一貫性のある要約を生成できるため,抽象要約モデルが普及している。法的なドメイン固有の事前学習された抽象要約モデルが利用可能である。さらに、ChatGPTのような汎用ドメイン事前訓練された大規模言語モデル(LLM)は高品質なテキストを生成することで知られており、テキスト要約の能力を持っている。したがって、これらのモデルが、ケース判断のための抽象的な要約を自動生成するオフザシェルフアプリケーションの準備が整っているかどうかを問うのは自然である。そこで本研究では,インドの裁判所判決に対して,最先端のドメイン固有抽象要約モデルと一般ドメインLLMを適用し,生成した要約の質を確認する。要約品質の標準指標に加えて、要約における矛盾や幻覚も確認する。抽象的な要約モデルでは,ROUGEやBLEUなどの標準要約評価指標を用いて,抽出モデルよりも若干高いスコアが得られる。しかし、生成した抽象要約には矛盾する情報や幻覚的な情報がしばしば見出される。全体として,事前学習した抽象要約モデルとLLMは,ケース判断要約のための完全自動展開にはまだ準備が整っていないことが示唆されている。 Automatic summarization of legal case judgements has traditionally been attempted by using extractive summarization methods. However, in recent years, abstractive summarization models are gaining popularity since they can generate more natural and coherent summaries. Legal domain-specific pre-trained abstractive summarization models are now available. Moreover, general-domain pre-trained Large Language Models (LLMs), such as ChatGPT, are known to generate high-quality text and have the capacity for text summarization. Hence it is natural to ask if these models are ready for off-the-shelf application to automatically generate abstractive summaries for case judgements. To explore this question, we apply several state-of-the-art domain-specific abstractive summarization models and general-domain LLMs on Indian court case judgements, and check the quality of the generated summaries. In addition to standard metrics for summary quality, we check for inconsistencies and hallucinations in the summaries. We see that abstractive summarization models generally achieve slightly higher scores than extractive models in terms of standard summary evaluation metrics such as ROUGE and BLEU. 
However, we often find inconsistent or hallucinated information in the generated abstractive summaries. Overall, our investigation indicates that the pre-trained abstractive summarization models and LLMs are not yet ready for fully automatic deployment for case judgement summarization; rather a human-in-the-loop approach including manual checks for inconsistencies is more suitable at present.	翻訳日:2023-06-17 00:08:41 公開日:2023-06-14
# 同時運動量と位置測定とインストゥルメンタルワイル・ハイゼンベルク群 Simultaneous Momentum and Position Measurement and the Instrumental Weyl-Heisenberg Group ( http://arxiv.org/abs/2306.01045v2 ) ライセンス: Link先を確認	Christopher S. Jackson and Carlton M. Caves	(参考訳) 正準交換関係、$[Q,P] = i\hbar$ は量子論の基礎とヒルベルト空間の原点である。可観測量としての$P$ & $Q$の解釈は、ヒルベルト空間のユニタリ変換と古典位相空間の正準(別名、接触)変換の間の類似に常に依存している。量子測定の理論は本質的に完成している(これにはしばらく時間がかかった)ため、ユニタリ変換ではなく正の変換に量子論の基礎を置く方法で正準交換関係を再考することができる。本稿では,同時測定の概念が基本的な微分幾何学問題にどのようにつながるかを示し,その解は次のことを示す。同時 $P$ & $Q$ 測定 (SPQM) は,7次元多様体の形をとる普遍測定器を定義し,これは普遍被覆群であり,インストゥルメンタルワイル・ハイゼンベルク群 (IWH) と呼ぶ。群 IWH は、正作用素値測度 (POVM) がエネルギー量子化の完全な代替となるほど、予期せぬ方法で恒等元を古典位相空間に接続する。 5つの次元は、容易に認識し理解できるプロセスを定義する。他の2次元、IWHの中心における正規化と位相は、あまり知られていない。正規化は特に、SPQM測定器を記述し理解するために特別な処理を必要とする。 The canonical commutation relation, $[Q,P] = i\hbar$, stands at the foundation of quantum theory and the original Hilbert space. The interpretation of $P$ & $Q$ as observables has always relied on the analogies that exist between the unitary transformations of Hilbert space and the canonical (a.k.a. contact) transformations of classical phase space. Now that the theory of quantum measurement is essentially complete (this took a while), it is possible to revisit the canonical commutation relation in a way that sets the foundation of quantum theory not on unitary transformations, but on positive transformations. This paper shows how the concept of simultaneous measurement leads to a fundamental differential geometric problem whose solution shows us the following: The simultaneous $P$ & $Q$ measurement (SPQM) defines a universal measuring instrument, which takes the shape of a 7-dimensional manifold, a universal covering group we call the Instrumental Weyl-Heisenberg Group, IWH. The group IWH connects the identity to classical phase space in unexpected ways that are significant enough that the positive-operator-valued measure (POVM) offers a complete alternative to energy quantization. 
Five of the dimensions define processes that can be easily recognized and understood. The other two dimensions, the normalization and phase in the center of IWH, are less familiar. The normalization, in particular, requires special handling in order to describe and understand the SPQM instrument.	翻訳日:2023-06-17 00:07:57 公開日:2023-06-14
# CorrMatch:半教師付きセマンティックセグメンテーションのための相関マッチングによるラベル伝播 CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation ( http://arxiv.org/abs/2306.04300v2 ) ライセンス: Link先を確認	Boyuan Sun, Yuqi Yang, Weifeng Yuan, Le Zhang, Ming-Ming Cheng, Qibin Hou	(参考訳) 本稿では,CorrMatch と呼ばれる,単純だが高性能な半教師付きセマンティックセグメンテーション手法を提案する。我々のゴールは、ラベルのない画像からより高品質な領域を抽出し、一貫性の正則化によってラベルのないデータをより効率的に活用することである。 CorrMatchの主な貢献は、2つの新しい、補完的な戦略です。まず,高品質な領域を拡大するために,初期化を緩和した適応しきい値更新戦略を導入する。さらに,画素間の対の類似度を測定することにより,高信頼度予測の伝播を提案する。その単純さにもかかわらず、CorrMatchは人気のある半教師付きセマンティックセグメンテーションベンチマークで素晴らしいパフォーマンスを達成していることを示している。 ResNet-101バックボーンを使用したDeepLabV3+フレームワークをセグメンテーションモデルとして、92枚の注釈付き画像のみでPascal VOC 2012セグメンテーションベンチマークにおいて76%以上のmIoUスコアを取得しました。また,従来の半教師付きセマンティックセグメンテーションモデルよりも一貫した改善を実現している。コードは公開される予定だ。 In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch. Our goal is to mine more high-quality regions from the unlabeled images to leverage the unlabeled data more efficiently via consistency regularization. The key contributions of our CorrMatch are two novel and complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Furthermore, we propose to propagate high-confidence predictions through measuring the pairwise similarities between pixels. Despite its simplicity, we show that CorrMatch achieves great performance on popular semi-supervised semantic segmentation benchmarks. Taking the DeepLabV3+ framework with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images provided. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.	翻訳日:2023-06-17 00:00:23 公開日:2023-06-14
# DEMIST : 深層学習に基づく心筋灌流SPECTのためのタスク特異的 denoising アプローチ DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT ( http://arxiv.org/abs/2306.04249v2 ) ライセンス: Link先を確認	Md Ashequr Rahman, Zitong Yu, Richard Laforest, Craig K. Abbey, Barry A. Siegel, Abhinav K. Jha	(参考訳) 低い放射線量および/または短い取得時間で取得した心筋血流イメージング(MPI)SPECT画像を処理し、灌流欠陥の検出という臨床課題において観察者性能を向上させる方法が強く求められている。このニーズに対処するために、モデル観察者理論と人間の視覚システムの理解に基づいて、MPI SPECT画像をノイズ除去する検出タスク特化型の深層学習ベースのアプローチ(DEMIST)を提案する。この手法は、ノイズ除去を行いながら、検出タスクにおける観察者性能に影響する特徴を保存するように設計されている。 2台のスキャナーでMPI検査を受けた患者(N=338)の匿名化臨床データを用いた後ろ向き研究により,灌流欠陥の検出タスクでDEMISTを客観的に評価した。評価は6.25%, 12.5%, 25%の低線量レベルで,擬人的チャネル化ホテリング観察者(anthropomorphic channelized Hotelling observer)を用いて行った。性能は受信者動作特性曲線下面積 (AUC) で定量化した。 DEMISTでノイズ除去した画像は、対応する低線量画像や、一般的に使われているタスク非依存のDLベースのノイズ除去法で処理した画像と比較してAUCが有意に高かった。同様の結果は, 患者の性別と欠陥タイプに基づく層別解析でも観察された。さらに、DEMISTは、二乗平均平方根誤差と構造類似度指標で定量化される低線量画像の視覚的忠実度を改善した。数学的解析により、DEMISTはノイズ特性を改善しながら検出タスクを助ける特徴を保存し、その結果として観察者性能が向上することが明らかになった。以上の結果は,MPI SPECTの低カウント画像のノイズ除去についてDEMISTのさらなる臨床評価を行う強い根拠となる。 There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose a Detection task-specific deep-learning-based approach for denoising MPI SPECT images (DEMIST). The approach, while performing denoising, is designed to preserve features that influence observer performance on detection tasks. We objectively evaluated DEMIST on the task of detecting perfusion defects using a retrospective study with anonymized clinical data in patients who underwent MPI studies across two scanners (N = 338). The evaluation was performed at low-dose levels of 6.25%, 12.5% and 25% and using an anthropomorphic channelized Hotelling observer. 
Performance was quantified using area under the receiver operating characteristics curve (AUC). Images denoised with DEMIST yielded significantly higher AUC compared to corresponding low-dose images and images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed with stratified analysis based on patient sex and defect type. Additionally, DEMIST improved visual fidelity of the low-dose images as quantified using root mean squared error and structural similarity index metric. A mathematical analysis revealed that DEMIST preserved features that assist in detection tasks while improving the noise properties, resulting in improved observer performance. The results provide strong evidence for further clinical evaluation of DEMIST to denoise low-count images in MPI SPECT.	翻訳日:2023-06-17 00:00:01 公開日:2023-06-14
# CrazyFlie 2.Xの強化学習に基づく制御 Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor ( http://arxiv.org/abs/2306.03951v2 ) ライセンス: Link先を確認	Arshad Javeed, Valentín López Jiménez	(参考訳) プロジェクトの目的は、PIDのような古典的な制御アルゴリズムと現代の強化学習アルゴリズムの相乗効果を探求し、CrazyFlie 2.Xクアッドロータを制御するための実用的な制御機構を考案することである。第一の目的は強化学習戦略を用いたPIDチューニングを行うことである。第二の目的は、最初のタスクで得た知見を活用し、Lighthouse測位システムと統合してナビゲーションの制御を実装することである。ナビゲーションには2つのアプローチを検討する。有限個の事前定義された動作プリミティブを用いるDeep Q-Learningによる離散的なナビゲーションと、深層強化学習による連続的なナビゲーションである。 RLトレーニングのシミュレーションは、強化学習のためのオープンソースのgymベース環境であるgym-pybullet-dronesで実施し、RLの実装にはstable-baselines3を用いる。 The objective of the project is to explore synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms to come up with a pragmatic control mechanism to control the CrazyFlie 2.X quadrotor. The primary objective would be performing PID tuning using reinforcement learning strategies. The secondary objective is to leverage the learnings from the first task to implement control for navigation by integrating with the lighthouse positioning system. Two approaches are considered for navigation, a discrete navigation problem using Deep Q-Learning with finite predefined motion primitives, and deep reinforcement learning for a continuous navigation approach. Simulations for RL training will be performed on gym-pybullet-drones, an open-source gym-based environment for reinforcement learning, and the RL implementations are provided by stable-baselines3.	翻訳日:2023-06-16 23:59:32 公開日:2023-06-14
# 残念ながら、それはできません:ブラックボックス生成言語モデルにおける即時拒否の予測 I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models ( http://arxiv.org/abs/2306.03423v2 ) ライセンス: Link先を確認	Max Reuter, William Schulze	(参考訳) OpenAIのChatGPTのリリース以来、生成言語モデルは広く注目を集めている。利用の増加は生成モデルの広範な実用性を強調しているが、いくつかの形態の埋め込みバイアスも明らかにしている。いくつかは事前学習コーパスによって引き起こされるが、生成モデルに特有の追加のバイアスは、有害なコンテンツを生成するのを避けるために主観的微調整を使用することから生じる。微調整バイアスは、個々のエンジニアと企業のポリシーから生じ、モデルが拒否する方向に影響を及ぼす可能性がある。本実験では,ブラックボックス攻撃によるChatGPTの拒絶動作を特徴付ける。まずChatGPTにさまざまな攻撃的かつ良心的なプロンプト(n=1,706)を問い合わせ、それから手動で各レスポンスをコンプライアンスや拒否としてラベル付けします。応答の手動検査は、拒絶はクリーンなバイナリではなく、連続体上にあることを示し、いくつかの異なる種類の応答をコンプライアンスや拒否のバイナリにマップする。手動でラベルされた小さなデータセットは、拒絶分類器のトレーニングに使用され、96%の精度を実現している。次に、この拒絶分類器を使用して、Quora Insincere Questionsデータセットから適合したより大きな(n=10,000)データセットをブートストラップします。この機械ラベル付きデータを用いて、ChatGPTの応答を見ることなく、ChatGPTが与えられた質問を拒否するかどうかを予測するプロンプト分類器を訓練する。このプロンプト分類器は、手動ラベル付き質問(n=985)のテストセットで76%の精度を達成する。コンプライアンスや拒否を最も予測する分類器とn-gramのプロンプトについて検討した。私たちのデータセットとコードはhttps://github.com/maxwellreuter/chatgpt-refusalsで利用可能です。 Since the release of OpenAI's ChatGPT, generative language models have attracted extensive public attention. The increased usage has highlighted generative models' broad utility, but also revealed several forms of embedded bias. Some is induced by the pre-training corpus; but additional bias specific to generative models arises from the use of subjective fine-tuning to avoid generating harmful content. Fine-tuning bias may come from individual engineers and company policies, and affects which prompts the model chooses to refuse. In this experiment, we characterize ChatGPT's refusal behavior using a black-box attack. We first query ChatGPT with a variety of offensive and benign prompts (n=1,706), then manually label each response as compliance or refusal. Manual examination of responses reveals that refusal is not cleanly binary, and lies on a continuum; as such, we map several different kinds of responses to a binary of compliance or refusal. 
The small manually-labeled dataset is used to train a refusal classifier, which achieves an accuracy of 96%. Second, we use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question, without seeing ChatGPT's response. This prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=985). We examine our classifiers and the prompt n-grams that are most predictive of either compliance or refusal. Our datasets and code are available at https://github.com/maxwellreuter/chatgpt-refusals.	翻訳日:2023-06-16 23:59:08 公開日:2023-06-14
# 分布外検出と条件付き正規化フローによる適応を備えた高速ライトフィールド3次元顕微鏡 Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows ( http://arxiv.org/abs/2306.06408v2 ) ライセンス: Link先を確認	Josué Page Vizcaíno, Panagiotis Symvoulidis, Zeguan Wang, Jonas Jelten, Paolo Favaro, Edward S. Boyden, Tobias Lasser	(参考訳) リアルタイム3次元蛍光顕微鏡は、神経活動モニタリングなど、生きた生物の時空間分析に不可欠である。フーリエライトフィールド顕微鏡とも呼ばれる拡張視野ライトフィールド顕微鏡(eXtended field-of-view light field microscope, XLFM)は、これを実現する単純なシングルスナップショット方式である。 XLFMは、単一のカメラ露光で空間角情報を取得する。その後のステップで3Dボリュームをアルゴリズムにより再構成できるため、リアルタイムの3D取得と潜在的な分析に非常に適している。残念なことに、従来の再構成手法(デコンボリューションなど)は長い処理時間(0.0220Hz)を必要とし、XLFMの速度優位性を妨げている。ニューラルネットワークアーキテクチャは速度の制約を克服できるが、確実性の指標を欠くため、バイオメディカル領域では信頼しにくい。本研究は, 条件付き正規化フローに基づいて, 生きた固定化ゼブラフィッシュ神経活動の高速な3次元再構成を行う新しいアーキテクチャを提案する。 512x512x96ボクセルのボリュームを8Hzで再構成し、必要なデータセットが小さい(画像とボリュームのペア10組)ため2時間未満で学習できる。さらに、正規化フローでは正確な尤度計算が可能なため、分布の監視ができ、新規サンプルが検出された場合には分布外検出とシステムの再学習を行える。提案手法を,複数の分布内サンプル(遺伝的に同一のゼブラフィッシュ)と様々な分布外サンプルを含む交差検証で評価した。 Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. 
This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.	翻訳日:2023-06-16 23:51:56 公開日:2023-06-14
# エラーフィードバックはプリコンディショナーを正確に圧縮できる Error Feedback Can Accurately Compress Preconditioners ( http://arxiv.org/abs/2306.06098v2 ) ライセンス: Link先を確認	Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Dan Alistarh	(参考訳) 深層ネットワークの規模で2次情報を活用することは、ディープラーニングのための現在の最適化器の性能を改善するための主要なアプローチの1つだ。しかしながら、フルマトリクスアダグラード(ggt)やマトリクスフリー近似曲率(m-fac)のような、正確なフルマトリクスプリコンディショニングのための既存のアプローチは、中規模モデルにも適用される場合、モデル次元でメモリ要求が乗算されるような勾配のスライディングウィンドウを格納しなければならないため、膨大なストレージコストを被る。本稿では, この問題を, 収束の損失なく, プリコンディショナーの最大2桁圧縮に適用可能な, 効率的かつ簡易に実装したエラーフィードバック手法を用いて解決する。具体的には、スペーシフィケーションや低ランク圧縮 \emph{before} を用いて勾配情報をプレコンディショナーに入力し、圧縮誤差を将来の繰り返しにフィードバックする。ビジョンのためのディープニューラルネットワークに関する広範な実験により、このアプローチは精度に影響を与えず、フルマトリックスプリコンディショナーを最大2桁圧縮し、フルマトリックスアダグラード(ggt)と自然勾配(m-fac)の実装のためのフルマトリックスプリコンディショニングのメモリオーバーヘッドを効果的に除去できることが示されている。私たちのコードはhttps://github.com/IST-DASLab/EFCPで利用可能です。 Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression \emph{before} it is fed into the preconditioner, feeding the compression error back into future iterations. 
Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.	翻訳日:2023-06-16 23:51:10 公開日:2023-06-14
# CARSO:合成観測の対向的リコール CARSO: Counter-Adversarial Recall of Synthetic Observations ( http://arxiv.org/abs/2306.06081v2 ) ライセンス: Link先を確認	Emanuele Ballarin, Alessio Ansuini, Luca Bortolussi	(参考訳) 本稿では,認知神経科学からのヒントに触発された画像分類のための新しい防御機構カルソを提案する。この方法は相乗的に敵の訓練に相補的であり、攻撃された分類器の内部表現に関する知識に依存している。このような表現を条件とした生成モデルを利用して、最終的に分類される入力の再構成をサンプリングする。 CARSOは、さまざまな画像データセットと分類器アーキテクチャをまたいだ、多種多様で強力な適応攻撃に関するよく確立されたベンチマークによる実験的評価によると、CARSOは、最先端の対人訓練単独よりもはるかに優れた分類器を、許容可能な正確さで防御することができる。さらに防御アーキテクチャは、予期せぬ脅威から効果的に身を守ることに成功し、愚かな確率的防御に適応したエンドツーエンド攻撃にも成功している。コードと事前トレーニングされたモデルはhttps://github.com/emaballarin/CARSO で公開されている。 In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. Code and pre-trained models are available at https://github.com/emaballarin/CARSO .	翻訳日:2023-06-16 23:50:39 公開日:2023-06-14
# COVER:言語モデルにおけるプロンプトに基づく学習に対するヒューリスティックなグレディ・アドバイザリアタック COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models ( http://arxiv.org/abs/2306.05659v2 ) ライセンス: Link先を確認	Zihao Tan, Qingliang Chen, Wenbin Zhu and Yongjian Huang	(参考訳) プロンプトベースの学習は、プレトレーニング言語モデル(PLM)、特に数ショット設定のような低リソースシナリオにおいて、効果的な方法であることが証明されている。しかしながら、PLMの信頼性は最重要であり、言語モデルの予測を誤解させ、重大なセキュリティ上の懸念を引き起こす可能性のあるプロンプトベースのテンプレートに潜在的な脆弱性が示されている。本稿では,ブラックボックスシナリオにおける手動テンプレートに対する即時攻撃を提案することにより,PLMの脆弱性について明らかにする。まず,手動テンプレートを分割するための文字レベルと単語レベルのヒューリスティックアプローチを設計する。次に,上記のヒューリスティック破壊手法に基づく攻撃に対する欲深いアルゴリズムを提案する。最後に,3種類のBERT系列モデルと8つのデータセットの分類タスクを用いて,本手法の評価を行った。総合的な実験結果から,攻撃成功率と攻撃速度の観点から,本手法の有効性を検証した。さらに, 提案手法は, ショット数, テンプレート長, クエリ回数の異なるシナリオにおいても優れた性能を示し, 高い一般化性を示した。 Prompt-based learning has been proved to be an effective way in pre-trained language models (PLMs), especially in low-resource scenarios like few-shot settings. However, the trustworthiness of PLMs is of paramount significance and potential vulnerabilities have been shown in prompt-based templates that could mislead the predictions of language models, causing serious security concerns. In this paper, we will shed light on some vulnerabilities of PLMs, by proposing a prompt-based adversarial attack on manual templates in black box scenarios. First of all, we design character-level and word-level heuristic approaches to break manual templates separately. Then we present a greedy algorithm for the attack based on the above heuristic destructive approaches. Finally, we evaluate our approach with the classification tasks on three variants of BERT series models and eight datasets. And comprehensive experimental results justify the effectiveness of our approach in terms of attack success rate and attack speed. Further experimental studies indicate that our proposed method also displays good capabilities in scenarios with varying shot counts, template lengths and query counts, exhibiting good generalizability.	
翻訳日:2023-06-16 23:49:00 公開日:2023-06-14
# CVXPYを用いたロバストな経験的リスク最小化問題の特定と解決 Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY ( http://arxiv.org/abs/2306.05649v2 ) ライセンス: Link先を確認	Eric Luxenberg and Dhruv Malik and Yuanzhi Li and Aarti Singh and Stephen Boyd	(参考訳) 本研究では,各データポイントが所定の凸不確実性集合上で変動する場合の最悪の経験的損失を最小限に抑えるために,モデルパラメータが選択される,ロバストな経験的リスク最小化(ERM)を考える。単純な場合では、そのような問題は分析形式で表現できる。一般に、問題は双対化によって引き出すことができ、min-max問題からmin-min問題へと変換される。二重化には専門知識が必要です。本稿では,CVXPYを用いて,この二重化手順をユーザフレンドリな方法で自動化する方法を示す。当社のフレームワークでは,コンベックス損失の一般的なクラスを用いて,堅牢なERM問題の特定と解決を可能にし,多くの標準回帰および分類問題を捕捉する。ユーザーはdisciplined convex programming (dcp) 制約によって表現可能な任意の複雑な不確実性集合を容易に指定できる。 We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in an analytical form. In general the problem can be made tractable via dualization, which turns a min-max problem into a min-min problem. Dualization requires expertise and is tedious and error-prone. We demonstrate how CVXPY can be used to automate this dualization procedure in a user-friendly manner. Our framework allows practitioners to specify and solve robust ERM problems with a general class of convex losses, capturing many standard regression and classification problems. Users can easily specify any complex uncertainty set that is representable via disciplined convex programming (DCP) constraints.	翻訳日:2023-06-16 23:48:42 公開日:2023-06-14
# 大規模ファウンドリ製シリコンフォトニクス上の波長可変量子エミッタ Tunable quantum emitters on large-scale foundry silicon photonics ( http://arxiv.org/abs/2306.06460v2 ) ライセンス: Link先を確認	Hugo Larocque, Mustafa Atabey Buyukkaya, Carlos Errando-Herranz, Samuel Harper, Jacques Carolan, Chang-Min Lee, Christopher J.K. Richardson, Gerald L. Leake, Daniel J. Coleman, Michael L. Fanto, Edo Waks, Dirk Englund	(参考訳) 単一光子と単一原子系のレベルでの大規模多体量子システム制御は、量子情報科学と技術における中心的な目標である。集中的な研究と開発により、ファウンドリベースのシリコン・オン・インシュレーターフォトニック集積回路は、個々のモードをプログラム可能な大規模光制御のための主要なプラットフォームへと押し上げられた。しかし、単一エミッタ単位の波長可変性を備えた原子量子系を集積することは、未解決の課題である。ここでは,高輝度赤外半導体量子ドット単一光子エミッタを含む複数のInAs/InPマイクロチップレットを、300 mmのファウンドリプロセスで作製した先進的なシリコン・オン・インシュレーターフォトニック集積回路にハイブリッド集積することで,この障壁を克服する。このプラットフォームでは、共鳴蛍光による単一光子放出と、電気的に制御された不揮発性メモリによるスケーラブルな発光波長可変性を実現する。フォトニックと量子システムの複合制御は、主要な半導体ファウンドリで製造されるプログラマブルな量子情報プロセッサへの扉を開く。 Controlling large-scale many-body quantum systems at the level of single photons and single atomic systems is a central goal in quantum information science and technology. Intensive research and development has propelled foundry-based silicon-on-insulator photonic integrated circuits to a leading platform for large-scale optical control with individual mode programmability. However, integrating atomic quantum systems with single-emitter tunability remains an open challenge. Here, we overcome this barrier through the hybrid integration of multiple InAs/InP microchiplets containing high-brightness infrared semiconductor quantum dot single photon emitters into advanced silicon-on-insulator photonic integrated circuits fabricated in a 300 mm foundry process. With this platform, we achieve single photon emission via resonance fluorescence and scalable emission wavelength tunability through an electrically controlled non-volatile memory. The combined control of photonic and quantum systems opens the door to programmable quantum information processors manufactured in leading semiconductor foundries.	翻訳日:2023-06-16 23:39:40 公開日:2023-06-14
# ユーザの意図に基づく文脈フォント推薦 Contextual Font Recommendations based on User Intent ( http://arxiv.org/abs/2306.08188v1 ) ライセンス: Link先を確認	Sanat Sharma, Jayant Kumar, Jing Zheng, Tracy Holloway King	(参考訳) Adobe Fontsには2万以上のユニークなフォントのリッチライブラリがあり、Adobeユーザーがグラフィック、ポスター、コンポジットの作成に使っている。大きなライブラリの性質から、どのフォントを選択するかを知ることは、多くの経験を必要とする大変な作業である。Adobe製品の多くのユーザー、特にAdobe Expressのカジュアルなユーザーにとって、これは利用可能なリッチで多様なフォントを使わずにデフォルトのフォントを選択することを意味することが多い。本研究では,ユーザの創造的体験を支援するために,文脈的フォントレコメンデーションを提供する意図駆動システムを構築する。本システムは多言語テキスト入力を取り入れ,ユーザの意図に基づいて適切なフォントを推薦する。ユーザの利用権限(entitlement)に基づいて、無料フォントと有料フォントの混合が調整される。この機能は、現在数百万のAdobe Expressユーザーが利用しており、CTRは25%を超えている。 Adobe Fonts has a rich library of over 20,000 unique fonts that Adobe users utilize for creating graphics, posters, composites etc. Due to the nature of the large library, knowing what font to select can be a daunting task that requires a lot of experience. For most users in Adobe products, especially casual users of Adobe Express, this often means choosing the default font instead of utilizing the rich and diverse fonts available. In this work, we create an intent-driven system to provide contextual font recommendations to users to aid in their creative journey. Our system takes in multilingual text input and recommends suitable fonts based on the user's intent. Based on user entitlements, the mix of free and paid fonts is adjusted. The feature is currently used by millions of Adobe Express users with a CTR of >25%.	翻訳日:2023-06-16 20:56:06 公開日:2023-06-14
# ZeroForge:3Dスーパービジョンのないフィードフォワードテキスト・ツー・シェイプ ZeroForge: Feedforward Text-to-Shape Without 3D Supervision ( http://arxiv.org/abs/2306.08183v1 ) ライセンス: Link先を確認	Kelly O. Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya Balu, Adarsh Krishnamurthy, Chinmay Hegde	(参考訳) 現在のtext-to-shape生成の最先端手法では、事前に定義された3D形状のラベル付きデータセットを使った教師付きトレーニングが必要か、暗黙的ニューラル表現の高価な推論時最適化が必要となる。本稿では,この両方の欠点を回避するゼロショットテキスト・ツー・シェイプ生成手法であるZeroForgeを提案する。オープンボキャブラリーの形状生成を実現するためには,既存のフィードフォワードアプローチの注意深いアーキテクチャ適応と,モード崩壊を避けるためのデータフリーなCLIP損失と対照損失の組み合わせが必要となる。これらの技術を用いて、CLIP-Forgeのような既存のフィードフォワード型テキスト・ツー・シェイプモデルの生成能力を大幅に拡張することができる。我々はこの手法を広範な定性的・定量的評価によって裏付ける。 Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.	翻訳日:2023-06-16 20:55:53 公開日:2023-06-14
# 量子コンピュータにおける量子グリード最適化の実験的実装 Experimental implementation of quantum greedy optimization on quantum computer ( http://arxiv.org/abs/2306.08181v1 ) ライセンス: Link先を確認	Tadayoshi Matsumori, Tadashi Kadowaki	(参考訳) 本稿では,時間進化の離散化(d-QGO)に基づく量子グリード最適化アルゴリズムを提案する。もともと、反断熱駆動による処理時間を短縮するために開発された量子グリード最適化は、エネルギーの感度解析から反断熱項のパラメータを順次選択し、パラメータ値を決定する。量子コンピュータにd-QGOを実装する場合、感度解析はデバイスやショットノイズにより短時間で基底状態を見つけるためにボトルネックとなる可能性がある。本稿では,d-qgoに対して十分に大きな差分間隔を用いた感度解析法を提案する。 d-qgoは、成功確率を維持しながら感度を決定するのに必要なショット数を減少させる。 This paper implements a quantum greedy optimization algorithm based on the discretization of time evolution (d-QGO). Quantum greedy optimization, which was originally developed for reducing processing time via counterdiabatic driving, sequentially selects a parameter in the counterdiabatic term from the sensitivity analysis of energy and then determines the parameter value. For implementing d-QGO on a quantum computer, the sensitivity analysis may become a bottleneck to find the ground state in a short time due to device and shot noise. In this paper, we present an improved sensitivity analysis for d-QGO that employs a sufficiently large differential interval. We demonstrate that d-QGO reduces the number of shots required to determine the sensitivity while maintaining the success probability.	翻訳日:2023-06-16 20:55:39 公開日:2023-06-14
# unraveling the arc puzzle: オブジェクト中心決定トランスフォーマーで人間のソリューションを模倣する Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer ( http://arxiv.org/abs/2306.08204v1 ) ライセンス: Link先を確認	Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Sejin Kim, Sundong Kim	(参考訳) 人工知能(AGI)の追求において,新たな2段階アプローチを用いて抽象・推論コーパス(ARC)の課題に取り組む。本稿では,人間の問題解決をモデル化するための模擬学習パラダイムとしてDecision Transformerを使用し,オブジェクト検出アルゴリズムであるPush and Pullクラスタリング手法を導入する。この二重戦略はAIのARC問題解決スキルを強化し、AGIの進歩に対する洞察を提供する。しかし、我々の研究は、高度なデータ収集ツール、堅牢なトレーニングデータセット、洗練されたモデル構造の必要性を明らかにしています。本研究は意思決定トランスフォーマーの潜在的な改善を浮き彫りにし,今後のagi研究を推進する。 In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.	翻訳日:2023-06-16 20:47:20 公開日:2023-06-14
# 指数型家族雑音を用いたグラフラプラシアン学習 Graph Laplacian Learning with Exponential Family Noise ( http://arxiv.org/abs/2306.08201v1 ) ライセンス: Link先を確認	Changhao Shi, Gal Mishne	(参考訳) グラフ機械学習手法を適用する際の一般的な課題は、システムの基盤となるグラフがしばしば未知であることである。連続グラフ信号に対して異なるグラフ推定法が提案されているが、離散数などの他の種類のデータに基づくグラフ構造を推定するには未定である。本稿では,スムーズなグラフ信号から指数関数的な家族雑音分布へグラフを学習するグラフ信号処理(GSP)フレームワークを一般化し,様々なデータタイプをモデル化する。本稿では,グラフラプラシアンと雑音信号からの非可観測滑らかな表現を推定する交互アルゴリズムを提案する。我々は合成データと実世界データを用いて,新しいアルゴリズムがノイズモデルミスマッチ下でのラプラシアン推定法を上回っていることを示す。 A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family noise distribution to model various data types. We propose an alternating algorithm that estimates the graph Laplacian as well as the unobserved smooth representation from the noisy signals. We demonstrate in synthetic and real-world data that our new algorithm outperforms competing Laplacian estimation methods under noise model mismatch.	翻訳日:2023-06-16 20:47:09 公開日:2023-06-14
# POP:継続的な学習のためのプロンプトのプロンプト POP: Prompt Of Prompts for Continual Learning ( http://arxiv.org/abs/2306.08200v1 ) ライセンス: Link先を確認	Zhiyuan Hu, Jiancheng Lyu, Dashan Gao, Nuno Vasconcelos	(参考訳) 近年,継続的な学習 (CL) が注目されている。破滅的な忘れることなく新しい概念を学ぶ人間の能力を模倣することを目的としている。既存のCLメソッドはある程度これを達成しているが、学習した特徴空間のセマンティックなドリフトがまだある。基盤モデルには、非常に大きなデータセットから学んだ堅牢な特徴表現が与えられ、cl問題の解のための興味深い基盤を提供する。最近の研究は、表現の一般性をほとんど無スケールで残すような技法を迅速にチューニングすることで、特定のタスクに適応できることも示している。しかし、オープンな質問は、タスク固有のプロンプトと、グローバルであるプロンプト、すなわち、クロスタスク情報を取得する方法である。本研究では、タスク特定プロンプトのグループと、popと呼ばれるグローバルプロンプトのグループを段階的に学習して、前者からの情報を統合することにより、この目標に対処するprompion of prompts(pop)モデルを提案する。 POP学習を用いた基礎モデルでは,古典的なCL手法よりも優れた性能が得られることを示す。さらに、プロンプトチューニングは、少数のトレーニングサンプルのみを必要とするため、POPは、データセット全体でトレーニングされた競合メソッドよりも優れたパフォーマンスを保ちながら、数ショット設定でCLを実行することができる。 Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. 
Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.	翻訳日:2023-06-16 20:46:54 公開日:2023-06-14
# デジタル病理学における説明可能・位置認識学習 Explainable and Position-Aware Learning in Digital Pathology ( http://arxiv.org/abs/2306.08198v1 ) ライセンス: Link先を確認	Milan Aryal and Nasim Yahyasoltani	(参考訳) 全スライド画像(WSI)をグラフとしてエンコードすることには十分な動機がある。ギガピクセル解像度のWSI全体を、グラフ学習のために表現できるからである。この目的のために、WSIはグラフのノードを表す小さなパッチに分割される。これにより、がんのグレーディングと分類にグラフベースの学習方法が利用できる。隣接ノード間のメッセージパッシングは、グラフベースの学習手法の基礎である。しかし、それらはパッチの位置情報を一切考慮せず、2つのパッチが位相的に同型な近傍にある場合、それらの埋め込みは互いにほぼ同じになる。本研究は, 位置埋め込みとグラフアテンションを用いて, WSIからがんの分類を行う。グラフ分類におけるノードの位置埋め込みを表現するために,提案手法ではスプライン畳み込みニューラルネットワーク(CNN)を用いる。このアルゴリズムは、前立腺がんと腎臓がんのグレーディング用WSIデータセットでテストされる。がん診断とグレーディングの主要な手法との比較により, 提案手法の性能向上が確認された。 WSIにおけるがん領域の同定は、がん診断におけるもう一つの重要な課題である。本研究では,提案モデルの説明可能性についても論じる。勾配に基づく説明可能性手法を用いて、WSIの顕著性マップ(saliency map)を生成する。これは、がん診断の根拠となったWSIの領域を調べるために使用でき、提案モデルを説明可能にする。 Encoding whole slide images (WSI) as graphs is well motivated since it makes it possible for the gigapixel resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Then, graph-based learning methods can be utilized for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, they do not take into consideration any positional information for any of the patches, and if two patches are found in topologically isomorphic neighborhoods, their embeddings are nearly similar to one another. In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. In order to represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNN). The algorithm is then tested with the WSI dataset for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. 
The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. A gradient-based explainability approach is used to generate the saliency mapping for the WSIs. This can be used to look into regions of WSI that are responsible for cancer diagnosis, thus rendering the proposed model explainable.	翻訳日:2023-06-16 20:46:37 公開日:2023-06-14
# ラベル雑音下でのグラフの学習 Learning on Graphs under Label Noise ( http://arxiv.org/abs/2306.08194v1 ) ライセンス: Link先を確認	Jingyang Yuan, Xiao Luo, Yifang Qin, Yusheng Zhao, Wei Ju, Ming Zhang	(参考訳) グラフ上のノード分類は、ソーシャル分析や異常検出など、幅広いアプリケーションにおいて重要なタスクである。グラフニューラルネットワーク(GNN)はこのタスクで有望な結果を生んでいるが、現在の手法ではノードのラベル情報が正確であると推定されることが多い。この問題に対処するために,ラベルノイズのあるグラフについて学習する問題を調査し,それを解決するためにCGNN(Consistent Graph Neural Network)と呼ばれる新しいアプローチを開発した。具体的には、グラフコントラスト学習を正規化項として採用し、拡張ノードの2つのビューを一貫した表現へと促進する。この正規化項はラベル情報を利用できないため、ラベルノイズに対するノード表現の堅牢性を高めることができる。さらに,グラフ上の雑音ラベルを検出するために,隣接ノードとラベルの整合性を計測して雑音ノードを識別するホモフィリー仮定に基づくサンプル選択手法を提案する。最後に,これらの信頼度の高い雑音ラベルを純化し,効率的な意味的グラフ学習を実現する。 3つの有名なベンチマークデータセットに関する広範な実験は、競合するアプローチよりもcgnnが優れていることを示している。 Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between the labels with their neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.	
翻訳日:2023-06-16 20:46:17 公開日:2023-06-14
# 自然言語処理における操作表現 Operationalising Representation in Natural Language Processing ( http://arxiv.org/abs/2306.08193v1 ) ライセンス: Link先を確認	Jacqueline Harding	(参考訳) 認知科学の哲学の中心性にもかかわらず、現代のNLP実践における表現の概念にかかわる哲学的な研究はほとんどない。本稿では,認知科学のアイデアに基づいて,ニューラルNLPモデルの構成要素に関する表現的クレームを評価するための枠組みを提案し,モデルの構成要素が特性を表すかどうかを評価するための3つの基準を提案し,これらの基準を,NLP(およびより広義の深層学習)で一般的な分析手法であるプローブ分類器を用いて運用する。哲学的にインフォームドされた表現の概念を運用するプロジェクトは、科学の哲学者とNLP実践者の両方にとって興味がある。これは哲学者に表現の性質に関する主張のための新しい試験場を与え、NLPの研究者が実証実験に関する大規模な文献を整理するのを手助けし、経験的研究のための新しい道筋を示唆している。 Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.	翻訳日:2023-06-16 20:45:59 公開日:2023-06-14
# 単発ノード分類のためのインダクティブ線形探索 Inductive Linear Probing for Few-shot Node Classification ( http://arxiv.org/abs/2306.08192v1 ) ライセンス: Link先を確認	Hirthik Mathavan, Zhen Tan, Nivedh Mudiam, Huan Liu	(参考訳) メタラーニングは、数ショットのノード分類のための強力なトレーニング戦略として現れ、トランスダクティブ環境での有効性を実証している。しかし、既存の文献は主にトランスダクティブな少数ショットノードの分類に焦点を当てており、幅広い少数ショット学習コミュニティで広く研究されているインダクティブな設定を無視している。これにより,グラフデータに基づくメタラーニング手法の性能に対する包括的理解が制限される。本研究では,インダクティブな数ショットノード分類設定において,現在のフレームワークの限界を明らかにするための実証的研究を行う。さらに,帰納的ノード分類タスクに適した,単純かつ競争的なベースラインアプローチを提案する。私たちは、メタラーニングパラダイムがグラフ領域でどのように機能するかをよりよく理解するために、私たちの仕事が新しい道を提供することを期待しています。 Meta-learning has emerged as a powerful training strategy for few-shot node classification, demonstrating its effectiveness in the transductive setting. However, the existing literature predominantly focuses on transductive few-shot node classification, neglecting the widely studied inductive setting in the broader few-shot learning community. This oversight limits our comprehensive understanding of the performance of meta-learning based methods on graph data. In this work, we conduct an empirical study to highlight the limitations of current frameworks in the inductive few-shot node classification setting. Additionally, we propose a simple yet competitive baseline approach specifically tailored for inductive few-shot node classification tasks. We hope our work can provide a new path forward to better understand how the meta-learning paradigm works in the graph domain.	翻訳日:2023-06-16 20:45:43 公開日:2023-06-14
# 畳み込みニューラルネットワークによる大規模空間問題の解法 Solving Large-scale Spatial Problems with Convolutional Neural Networks ( http://arxiv.org/abs/2306.08191v1 ) ライセンス: Link先を確認	Damian Owerko, Charilaos I. Kanatsoulis, Charilaos I. Kanatsoulis	(参考訳) 過去10年間で、ディープラーニングの研究はますます強力なハードウェアによって加速され、モデルの複雑さとデータ量の増加が促進された。これは持続不可能になりつつあるため、効率に再フォーカスする必要がある。本稿では,大規模空間問題に対する学習効率を向上させるために,トランスファー学習を用いる。畳み込みニューラルネットワーク (cnn) は, 信号の小さな窓上で学習できるが, 性能劣化が少なく, 任意に大きい信号で評価し, 結果の一般化誤差に対する理論的拘束力を提供する。我々の証明は、伝達学習において過小評価されている特性であるCNNのシフト等価性を利用する。理論的結果は、モバイルインフラの需要(MID)の文脈で実験的に支持される。提案手法は数百のエージェントで大規模に中規模に取り組むことが可能であり,その前に計算処理が難しかった。 Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in the model complexity and the amount of data ingested. This is becoming unsustainable and therefore refocusing on efficiency is necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation, and provide a theoretical bound on the resulting generalization error. Our proof leverages shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.	翻訳日:2023-06-16 20:45:28 公開日:2023-06-14
# 偽の政治文書検出におけるGPT-3の有効性の評価:LIARデータセットを事例として Assessing the Effectiveness of GPT-3 in Detecting False Political Statements: A Case Study on the LIAR Dataset ( http://arxiv.org/abs/2306.08190v1 ) ライセンス: Link先を確認	Mars Gokturk Buchholz	(参考訳) 政治的偽言の検出は、情報の完全性を維持し、社会における誤報の拡散を防ぐために重要である。歴史的に、最先端の機械学習モデルは、偽造文を検出する様々な方法を用いていた。これらの手法にはメタデータ(W. Wang et al., 2018)、n-grams analysis(Singh et al., 2021)、言語(Wu et al., 2022)、スタイリスティックな特徴(Islam et al., 2020)の使用が含まれる。 GPT-3(Brown et al., 2020)のような大規模言語モデルの最近の進歩は、幅広いタスクにおいて最先端のパフォーマンスを実現している。本研究では,LIARデータセット(W. Wang et al., 2018)上でGPT-3を用いて実験を行い,メタモデルや言語学的特徴を使わずに最先端モデルよりも高い精度を実現した。さらに, 注意深く設計したプロンプトを用いてゼロショット学習を実験し, ほぼ最先端の性能を達成した。このアプローチの利点は、モデルが決定の証拠を提供し、モデルの意思決定に透明性を与え、ユーザが提供された証拠の有効性を検証する機会を提供することである。 The detection of political fake statements is crucial for maintaining information integrity and preventing the spread of misinformation in society. Historically, state-of-the-art machine learning models employed various methods for detecting deceptive statements. These methods include the use of metadata (W. Wang et al., 2018), n-grams analysis (Singh et al., 2021), and linguistic (Wu et al., 2022) and stylometric (Islam et al., 2020) features. Recent advancements in large language models, such as GPT-3 (Brown et al., 2020) have achieved state-of-the-art performance on a wide range of tasks. In this study, we conducted experiments with GPT-3 on the LIAR dataset (W. Wang et al., 2018) and achieved higher accuracy than state-of-the-art models without using any additional meta or linguistic features. Additionally, we experimented with zero-shot learning using a carefully designed prompt and achieved near state-of-the-art performance. An advantage of this approach is that the model provided evidence for its decision, which adds transparency to the model's decision-making and offers a chance for users to verify the validity of the evidence provided.	翻訳日:2023-06-16 20:45:13 公開日:2023-06-14
# 言語モデルは否定者ではない:否定ベンチマークによる言語モデルの解析 Language models are not naysayers: An analysis of language models on negation benchmarks ( http://arxiv.org/abs/2306.08189v1 ) ライセンス: Link先を確認	Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn	(参考訳) BERTのようなマスキング言語モデルでは、否定が大きなボトルネックであることが示されている。しかし、この発見がより大規模な自己回帰型言語モデル(「LLMs」)にも当てはまるかどうかは、まだ包括的に研究されていない。LLMの研究と応用が増え続けるなか、一歩引いて、言語理解の中心となる基本的な言語現象である否定に対処する現世代のLLMの能力を評価する。我々は,オープンソースの GPT-neo や GPT-3, InstructGPT など,さまざまな LLM を,幅広い否定ベンチマークに対して評価する。様々なモデルサイズとプロンプトを用いた系統的実験により, LLMは否定の存在に対する無感性, 否定の語彙意味を捉えることができないこと, 否定下での推論に失敗することなど, いくつかの制限があることが示されている。 Negation has been shown to be a major bottleneck for masked language models, such as BERT. However, whether this finding still holds for larger-sized auto-regressive language models ("LLMs") has not been studied comprehensively. With the ever-increasing volume of research and applications of LLMs, we take a step back to evaluate the ability of current-generation LLMs to handle negation, a fundamental linguistic phenomenon that is central to language understanding. We evaluate different LLMs -- including the open-source GPT-neo, GPT-3, and InstructGPT -- against a wide range of negation benchmarks. Through systematic experimentation with varying model sizes and prompts, we show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.	翻訳日:2023-06-16 20:44:49 公開日:2023-06-14
# 離散表現構造を持つ深部生成モデルの不偏学習 Unbiased Learning of Deep Generative Models with Structured Discrete Representations ( http://arxiv.org/abs/2306.08230v1 ) ライセンス: Link先を確認	Harry Bendekgey, Gabriel Hope and Erik B. Sudderth	(参考訳) グラフィカルモデルとディープラーニングアーキテクチャを組み合わせることで、両方のフレームワークの強みで生成モデルを学びます。構造化変分オートエンコーダ(SVAE)は、グラフィカルモデルから構造と解釈可能性を受け継ぎ、ディープラーニングから高次元データに柔軟な可能性をもたらすが、かなりの最適化課題が生じる。本稿では,svaeを学習するための新しいアルゴリズムを提案し,離散的潜在変数を組み込んだデータ欠落時のマルチモーダル不確実性に対処するsvaeの能力を示す。メモリ効率の高い暗黙差分法により,SVAEは不完全最適化に対して頑健さを示しつつ,勾配降下により学習しやすくなった。正確なグラフィカルモデルパラメータをより迅速に学習するために,手作業による導出を伴わずに自然勾配を計算する手法を導出する。これらの最適化の革新はSVAEと最先端の時系列モデルの最初の比較を可能にし、SVAEは解釈可能で構造化された離散データ表現を学習しながら競争的に機能する。 By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.	翻訳日:2023-06-16 20:36:45 公開日:2023-06-14
# 通信帯域集積型マルチモードフォトニック量子メモリ Telecom-band integrated multimode photonic quantum memory ( http://arxiv.org/abs/2306.08229v1 ) ライセンス: Link先を確認	Xueying Zhang and Bin Zhang and Shihai Wei and Hao Li and Jinyu Liao and Cheng Li and Guangwei Deng and You Wang and Haizhi Song and Lixing You and Bo Jing and Feng Chen and Guang-Can Guo and Qiang Zhou	(参考訳) テレコムバンド集積量子メモリは、ファイバ通信インフラと互換性のある量子ネットワークを開発するための基本的なビルディングブロックである。このような大容量ネットワークに向けて、テレコムバンドにおける集積マルチモードフォトニック量子メモリがまだ実証されていない。本稿では,ファイバ集積型マルチモード量子記憶装置を,レーザ描画チップ上のテレコムバンドで実現したことを報告する。 記憶装置はファイバピグテール付きのEr3+:LiNbO3導波路であり、1532nmで4GHz幅の伝令付き単一光子を最大330個の時間モードで保存でき、単一モードに対して同時計数率が167倍に増大する。全ファイバーアドレス付きメモリシステムは、通信帯域ファイバ統合およびオンチップデバイスを用いて行う。この結果は、統合フォトニクスデバイスを用いた将来の量子ネットワークにとって重要なステップである。 Telecom-band integrated quantum memory is an elementary building block for developing quantum networks compatible with fiber communication infrastructures. Towards such a network with large capacity, an integrated multimode photonic quantum memory at telecom band has yet to be demonstrated. Here we report a fiber-integrated multimode quantum storage of single photons at telecom band on a laser-written chip. The storage device is a fiber-pigtailed Er3+:LiNbO3 waveguide and allows storage of up to 330 temporal modes of heralded single photons with 4-GHz-wide bandwidth at 1532 nm and a 167-fold increase in coincidence detection rate with respect to single mode. Our memory system with all-fiber addressing is performed using telecom-band fiber-integrated and on-chip devices. The results represent an important step for the future quantum networks using integrated photonics devices.	翻訳日:2023-06-16 20:36:25 公開日:2023-06-14
# CLIPXPlore: 3次元形状探索のための複合CLIPと形状空間 CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration ( http://arxiv.org/abs/2306.08226v1 ) ライセンス: Link先を確認	Jingyu Hu, Ka-Hei Hui, Zhengzhe liu, Hao Zhang and Chi-Wing Fu	(参考訳) 本稿では,3次元形状空間の探索を支援するために視覚言語モデルを活用した新しいフレームワークであるCLIPXPloreを提案する。近年,3次元形状を学習された潜在形状空間にエンコードして生成設計とモデリングを可能にする手法が数多く開発されている。しかし、豊富な情報にもかかわらず、既存の手法には効果的な探索機構がない。そこで我々は,形状空間探索を支援するために,事前学習された視覚言語モデルである clip を活用することを提案する。私たちの考えは3倍です。まず,CLIPと形状空間をペアにし,スケッチ画像からCLIPと形状コードを生成し,2つの空間を接続するマッパーネットワークを訓練する。第二に、与えられた形状の周囲の空間を探索するために、形状の幾何によくマッチするCLIPコードを探すための最適化戦略を定式化します。第3に,2成分誘導,テキスト誘導,スケッチ誘導の3つの探索モードを設計し,形状空間における適切な探索軌跡を特定し,形状に有意な変化をもたらす。我々は,CLIPXPloreを3つの探索モードごとに異なるベースラインと定量的かつ視覚的に比較する一連の実験を行い,既存のソリューションでは達成できない多くの有意義な探索結果が得られることを示した。 This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape. 
We perform a series of experiments to quantitatively and visually compare CLIPXPlore with different baselines in each of the three exploration modes, showing that CLIPXPlore can produce many meaningful exploration results that cannot be achieved by the existing solutions.	翻訳日:2023-06-16 20:36:12 公開日:2023-06-14
# 潜在状態表現を用いた政策移行によるアジャイルロコモーションの汎用性の拡大 Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation ( http://arxiv.org/abs/2306.08224v1 ) ライセンス: Link先を確認	Guilherme Christmann, Ying-Sheng Luo, Jonathan Hans Soeseno, Wei-Chao Chen	(参考訳) 本稿では,実環境におけるロボット移動の汎用性を高めるロバストな遷移戦略であるtransition-netを提案する。この目的のために、我々は異なる歩行の複雑さを現実世界のロボットに適用可能な専用の移動ポリシーに分散することから始める。次に、ロバストな遷移を伴うポリシーを、潜在状態表現を調べることによって単一のコヒーレントなメタコントローラに統一することにより、ロボットの汎用性を拡大する。本手法により,ロボットはリパートリーを反復的に拡張し,ライブラリ内の任意のポリシーペア間の堅牢な遷移を可能にする。我々のフレームワークでは、新しいスキルを追加することは、以前に学んだスキルを変えるプロセスを導入しない。さらに、locomotionポリシーのトレーニングには、1つのコンシューマgpuで1時間もかからない。我々のアプローチは実世界で有効であり、既存のアプローチに比べて実験において最も困難な移行ペアの平均成功率は19%高い。 This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in the real-world setting. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training of a locomotion policy takes less than an hour with a single consumer GPU. Our approach is effective in the real-world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.	翻訳日:2023-06-16 20:35:48 公開日:2023-06-14
# 反対の損失は、パラレルラインとしてアナロジーを回復するために必要なもの Contrastive Loss is All You Need to Recover Analogies as Parallel Lines ( http://arxiv.org/abs/2306.08221v1 ) ライセンス: Link先を確認	Narutatsu Ri, Fei-Tzin Lee, Nakul Verma	(参考訳) 静的な単語埋め込みモデルは、言語アナロジーを高次元空間における平行線として表現することが知られているが、なぜそのような幾何学的構造をもたらすのかというメカニズムは、いまだ不明である。学習時間に劇的な高速化を図りながら、分布情報よりも基本的なコントラスト型手法が、アナログ回復タスクにおける一般的な単語埋め込みモデルと競合することを示した。さらに, コントラスト損失は, 単語埋め込みにおいてこれらの並列構造を作るのに十分であることを示すとともに, 共起統計量と単語埋め込みの幾何学的構造との正確な関係を確立する。 While static word embedding models are known to represent linguistic analogies as parallel lines in high-dimensional space, the underlying mechanism as to why they result in such geometric structures remains obscure. We find that an elementary contrastive-style method employed over distributional information performs competitively with popular word embedding models on analogy recovery tasks, while achieving dramatic speedups in training time. Further, we demonstrate that a contrastive loss is sufficient to create these parallel structures in word embeddings, and establish a precise relationship between the co-occurrence statistics and the geometric structure of the resulting word embeddings.	翻訳日:2023-06-16 20:35:32 公開日:2023-06-14
# SMC-UDA:unsupervised cross-domain Renal Segmentationのための構造的制約 SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation ( http://arxiv.org/abs/2306.08213v1 ) ライセンス: Link先を確認	Zhusi Zhong, Jie Li, Lulu Bi, Li Yang, Ihab Kamel, Rama Chellappa, Xinbo Gao, Harrison Bai, Zhicheng Jiao	(参考訳) 深層学習に基づく医用画像のセグメンテーションは、異なる領域の画像にデプロイされると、しばしば失敗する。ドメイン適応手法はドメインシフトの問題を解決することを目的としているが、まだいくつかの問題に直面している。転送学習法は対象ドメインのアノテーションを必要とし、生成的非教師なしドメイン適応(uda)モデルはドメイン固有の表現を無視し、その生成品質はセグメンテーション性能を非常に制限する。本研究では,識別パラダイムに基づく新しい構造モード制約(SMC) UDA フレームワークを提案し,ドメイン間のブリッジとしてエッジ構造を導入する。提案するマルチモーダル学習バックボーンは,画像テクスチャから構造情報を蒸留し,領域不変エッジ構造を識別する。構造に制約のある自己学習とプログレッシブROIでは,エッジの3次元空間構造を見極めることで腎臓を分節する。我々は,SMC-UDAを,ラベル付きソースドメイン (CT) からラベルなしターゲットドメイン (CT/MRI) に適応させることにより,公開腎セグメンテーションデータセット上で評価した。実験の結果,提案するSMC-UDAの一般化は良好であり,生成的UDA法よりも優れていた。 Medical image segmentation based on deep learning often fails when deployed on images from a different domain. The domain adaptation methods aim to solve domain-shift challenges, but still face some problems. The transfer learning methods require annotation on the target domain, and the generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, whose generated quality highly restricts segmentation performance. In this study, we propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains. The proposed multi-modal learning backbone distills structure information from image texture to distinguish domain-invariant edge structure. With the structure-constrained self-learning and progressive ROI, our methods segment the kidney by locating the 3D spatial structure of the edge. We evaluated SMC-UDA on public renal segmentation datasets, adapting from the labeled source domain (CT) to the unlabeled target domain (CT/MRI). 
The experiments show that our proposed SMC-UDA generalizes well and outperforms generative UDA methods.	翻訳日:2023-06-16 20:35:20 公開日:2023-06-14
# PEPSにおけるコーナー・トランスファー法のコスト削減 Reduced Contraction Costs of Corner-Transfer Methods for PEPS ( http://arxiv.org/abs/2306.08212v1 ) ライセンス: Link先を確認	Wangwei Lan, Glen Evenbly	(参考訳) コーナー・トランスファー・アプローチを用いる場合、無限に投影される絡み合ったペア状態(iPEPS)を縮約する際の主要な計算コストを$\mathcal{O}(\chi^3D^6)$から$\mathcal{O}(\chi^3D^3)$に削減する一対の近似法を提案する。第1の近似は(i)境界テンソルの切断に必要な環境を削減し、第2の近似は(ii)確立されたアルゴリズムのように両者を同時に行うのではなく、ブラとケットの添字の逐次的な縮約と切断に依存する。このアルゴリズムを検証するため、正方格子ハイゼンベルクモデル上でベンチマークシミュレーションを行い、標準iPEPSアルゴリズムに匹敵する結果を得る。計算コストの向上により,大きな結合次元計算が可能となり,課題解決への可能性を広げることができた。 We propose a pair of approximations that allows the leading order computational cost of contracting an infinite projected entangled-pair state (iPEPS) to be reduced from $\mathcal{O}(\chi^3D^6)$ to $\mathcal{O}(\chi^3D^3)$ when using a corner-transfer approach. The first approximation (i) reduces the environment needed for truncation of the boundary tensors, while the second (ii) relies on the sequential contraction and truncation of bra and ket indices, rather than doing both together as in the established algorithm. To verify the algorithm, we perform benchmark simulations on the square-lattice Heisenberg model and obtain results that are comparable to the standard iPEPS algorithm. The improvement in computational cost enables us to perform large bond dimension calculations, extending its potential to solve challenging problems.	翻訳日:2023-06-16 20:34:59 公開日:2023-06-14
# 不確実性を考慮した雑音グラフのロバスト学習 Uncertainty-Aware Robust Learning on Noisy Graphs ( http://arxiv.org/abs/2306.08210v1 ) ライセンス: Link先を確認	Shuyi Chen, Kaize Ding, Shixiang Zhu	(参考訳) グラフニューラルネットワークは、さまざまなグラフ学習タスク、特にノード分類において優れた解決能力を示している。しかし、それらの効果は、実世界のグラフに存在する位相情報やノイズ情報に関連するノイズ測定が広く存在することから生じる課題によって妨げられる。これらの観測の不正確さは、グラフデータ内の重要なパターンを損なう可能性があり、最終的には実用上望ましくない性能をもたらす。そこで本稿では,分散的ロバスト最適化を動機とする新しい不確実性対応グラフ学習フレームワークを提案する。具体的には、グラフニューラルネットワークベースのエンコーダを用いて、ノードの特徴を埋め込み、最小限の定式化により最悪のリスクを最小限に抑えて最適なノード埋め込みを見つける。このような不確実性を考慮した学習プロセスは、ノード表現の改善と、データノイズによる不確実性の影響を効果的に軽減するより堅牢なグラフ予測モデルをもたらす。実験の結果,提案手法は,様々な雑音条件下での最先端のベースラインと比較して,優れた予測性能が得られることがわかった。 Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. Our experimental result shows that the proposed framework achieves superior predictive performance compared to the state-of-the-art baselines under various noisy settings.	翻訳日:2023-06-16 20:34:44 公開日:2023-06-14
# 主観-目的政策形成アプローチ:マルチエージェントシミュレーションによる複数回帰分析と住民価値の結合 Subjective-objective policy making approach: Coupling of resident-values multiple regression analysis with value-indices, multi-agent-based simulation ( http://arxiv.org/abs/2306.08208v1 ) ライセンス: Link先を確認	Misa Owa, Junichi Miyakoshi, Takeshi Kato	(参考訳) 本研究は,既存の主観的・客観的政策評価アプローチに関する懸念を踏まえ,市民の意思を反映し,客観的事実に裏付けられたより良い政策を選択するための,主観的・客観的政策評価手法を提案する。生活満足アプローチや随伴評価法といった主観的アプローチは主観性を経済価値に転換し、より高い経済価値が市民の望むものと本当に一致するかどうかという疑問を提起する。エビデンスに基づく政策立案やマルチエージェントに基づくシミュレーションといった客観的政策評価アプローチは主観性を考慮しておらず、多元的かつ多元的候補政策の選択が困難である。提案手法は,住民アンケートの結果の複数回帰分析に基づいて主観的目標関数を確立し,mabsを用いて候補政策の客観的評価指標を算出した。次に、主観的対象関数の説明変数と客観的評価指標を組み合わせた新しい主観的目的結合目標関数を設定し、多数の候補から望ましいポリシーを選択するように最適化する。このアプローチを評価するため,宮崎県高春町において再生可能エネルギー導入政策の検証を行った。その結果,MABSの社会的,生態的,経済的な価値に対する2万の政策候補から,住民の価値観に整合した政策を選択するために,新たな主観的・客観的結合目標関数を使用する可能性が示唆された。新しいアプローチを使っていくつかのポリシーを比較することで、さまざまな価値を持つ利害関係者の意志を具体的に表現でき、建設的な議論やコンセンサス構築に寄与する。 Given the concerns around the existing subjective and objective policy evaluation approaches, this study proposes a new combined subjective-objective policy evaluation approach to choose better policy that reflects the will of citizens and is backed up by objective facts. Subjective approaches, such as the Life Satisfaction Approach and the Contingent Valuation Method, convert subjectivity into economic value, raising the question whether a higher economic value really accords with what citizens want. Objective policy evaluation approaches, such as Evidence Based Policy Making and Multi-Agent-Based Simulation, do not take subjectivity into account, making it difficult to choose from diverse and pluralistic candidate policies. The proposed approach establishes a subjective target function based on a multiple regression analysis of the results of a residents questionnaire survey, and uses MABS to calculate the objective evaluation indices for a number of candidate policies. 
Next, a new subjective-objective coupling target function, combining the explanatory variables of the subjective target function with objective evaluation indices, is set up and optimized to select the preferred policies from numerous candidates. To evaluate this approach, we conducted a verification of renewable energy introduction policies at Takaharu Town in Miyazaki Prefecture, Japan. The results show good potential for using the new subjective-objective coupling target function to select policies consistent with the residents' values for well-being from 20,000 policy candidates for social, ecological, and economic values obtained in MABS. Using the new approach to compare several policies enables concrete expression of the will of stakeholders with diverse values, and contributes to constructive discussions and consensus-building.	翻訳日:2023-06-16 20:34:28 公開日:2023-06-14
# 集合変換器と階層型Bi-LSTMを用いた多エージェントスポーツコンテキストからの球軌道推定 Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM ( http://arxiv.org/abs/2306.08206v1 ) ライセンス: Link先を確認	Hyunsung Kim, Han-Jun Choi, Chang Jo Kim, Jinsung Yoon, Sang-Ki Ko	(参考訳) 人工知能が多くの分野に広がるにつれ、スポーツ分析へのAIの適用も注目されている。しかしながら、主な課題の1つは、スポーツの試合中に連続移動データを自動取得することの困難さである。特に、オクルージョンや模倣などの障害物のある広いサッカーピッチで、小さなボールを確実に追跡することはコンウンダラムである。そこで本稿では,ボールトラッキングに代わる費用対効果として,選手軌道からの球軌道推定フレームワークを提案する。我々は、集合トランスフォーマーを組み合わせ、マルチエージェントコンテキストの置換不変および同変表現と、プレイヤーボールの保持を中間的に予測して最終軌道推定をサポートする階層的アーキテクチャを得る。また,現実的損失項とポストプロセッシングを導入し,推定軌跡を物理的に現実的なものにする。実験の結果, 本モデルは, 球の保持と同時に, 自然かつ正確な軌道を提供することがわかった。最後に, 軌道インプテーションの欠如, 半自動パスアノテーション, マッチブロードキャストのための自動ズームイン, ホールドワイズ性能指標の算出など, 本フレームワークの実用的応用について提案する。 As artificial intelligence spreads out to numerous fields, the application of AI to sports analytics is also in the spotlight. However, one of the major challenges is the difficulty of automated acquisition of continuous movement data during sports matches. In particular, it is a conundrum to reliably track a tiny ball on a wide soccer pitch with obstacles such as occlusion and imitations. Tackling the problem, this paper proposes an inference framework of ball trajectory from player trajectories as a cost-efficient alternative to ball tracking. We combine Set Transformers to get permutation-invariant and equivariant representations of the multi-agent contexts with a hierarchical architecture that intermediately predicts the player ball possession to support the final trajectory inference. Also, we introduce the reality loss term and postprocessing to secure the estimated trajectories to be physically realistic. The experimental results show that our model provides natural and accurate trajectories as well as admissible player ball possession at the same time. 
Lastly, we suggest several practical applications of our framework including missing trajectory imputation, semi-automated pass annotation, automated zoom-in for match broadcasting, and calculating possession-wise running performance metrics.	翻訳日:2023-06-16 20:34:01 公開日:2023-06-14
# 生成拡散モデルによる震度予測のためのデータ拡張 Data Augmentation for Seizure Prediction with Generative Diffusion Model ( http://arxiv.org/abs/2306.08256v1 ) ライセンス: Link先を確認	Kai Shu, Yuchang Zhao, Le Wu, Aiping Liu, Ruobing Qian, and Xun Chen	(参考訳) 目的: 発作予測は患者の生活を改善する上で非常に重要である。焦点は、先天状態と間天状態とを区別することである。機械学習の発展により、発作予測法は大きな進歩を遂げた。しかし, 先行データと間欠データとの間の深刻な不均衡問題は, 分類器の性能を制限し, 依然として大きな課題となっている。データ拡張はこの問題を解決する直感的な方法である。既存のデータ拡張手法はデータの重複や再結合によってサンプルを生成する。生成したサンプルの分布は、特徴空間を完全に探索し、新しい情報を提供することができないため、元のデータによって制限される。てんかんの脳波の表現は発作によって異なるため、これらのサンプルは新しい発作で高いパフォーマンスを達成するのに十分な多様性を提供することができない。その結果,DiffEEGと呼ばれる拡散モデルを用いた新しいデータ拡張手法を提案する。方法:拡散モデルは、2つのプロセスからなる生成モデルのクラスである。具体的には、拡散過程において、入力脳波サンプルに段階的にノイズを付加し、ノイズを出力ランダムノイズに変換し、出力と付加されたノイズの損失を最小限にしてデータの分布を探索する。離散化過程において、モデルはノイズを徐々に除去し、データ分布を外側に拡散させ、異なるクラスタ間の距離を狭めることによって合成データをサンプリングする。結果: DiffEEGを既存の手法と比較し, 3つの代表的な分類器に統合した。実験の結果、DiffEEGはパフォーマンスをさらに改善し、既存の手法よりも優れていることが示された。結論: 本論文では, 不均衡を解消し, 本手法の有効性と汎用性を実証する手法を提案する。 Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method with diffusion model called DiffEEG. 
Methods: Diffusion models are a class of generative models that consist of two processes. Specifically, in the diffusion process, the model adds noise to the input EEG sample step by step and converts the noisy sample into output random noise, exploring the distribution of data by minimizing the loss between the output and the noise added. In the denoising process, the model samples the synthetic data by removing the noise gradually, diffusing the data distribution to outward areas and narrowing the distance between different clusters. Results: We compared DiffEEG with existing methods, and integrated them into three representative classifiers. The experiments indicate that DiffEEG could further improve the performance and is superior to existing methods. Conclusion: This paper proposes a novel and effective method to solve the imbalance problem and demonstrates the effectiveness and generality of our method.	翻訳日:2023-06-16 20:30:01 公開日:2023-06-14
# GBSD: ステージ拡散による生成型ボケ GBSD: Generative Bokeh with Stage Diffusion ( http://arxiv.org/abs/2306.08251v1 ) ライセンス: Link先を確認	Jieren Deng, Xin Zhou, Hao Tian, Zhihong Pan, and Derek Aguiar	(参考訳) ボケ効果(ボケエフェクト、bokeh effect)は、写真中の焦点領域をぼかす芸術的手法であり、テキストから画像への合成や、スマートフォンカメラや写真共有アプリの普及により関心を集めている。ボケ効果のレンダリングに関する以前の研究は、古典的なコンピュータグラフィックスやニューラルレンダリング技術を用いて既存の写真に類似したぼやけた効果を生み出すために、ポストホック画像操作に焦点を合わせてきたが、深度不連続アーティファクトを持つか、トレーニングデータに存在するボケ効果の再生に制限されている。より最近の拡散モデルでは、イメージを芸術的なスタイルで合成することができるが、高次元マスクの生成、高価な微調整、あるいはグローバルなイメージ特性に影響を与える必要がある。本稿では,フォトリアリスティックな画像をボケスタイルで合成する最初の画像生成モデルであるgbsdを提案する。拡散モデルにおける画像合成の進行に動機づけられ, 潜在拡散モデルと2段階のコンディショニングアルゴリズムを組み合わせることで, 意味論的に定義された物体に対するボケ効果を表現できる。オブジェクトに効果を集中することができるので、このセマンティックボケ効果は古典的なレンダリング技術よりも汎用性が高い。我々は,gbsdを定量的かつ質的に評価し,テキストから画像への設定と画像から画像への設定の両方に適用できることを実証する。 The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smart-phone cameras and photo-sharing apps. Prior work on rendering bokeh effects have focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but have either depth discontinuity artifacts or are restricted to reproducing bokeh effects that are present in the training data. More recent diffusion based models can synthesize images with an artistic style, but either require the generation of high-dimensional masks, expensive fine-tuning, or affect global image characteristics. In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. 
Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.	翻訳日:2023-06-16 20:29:36 公開日:2023-06-14
# 超音波画像認識におけるマスク付きオートエンコーダの劣化 Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition ( http://arxiv.org/abs/2306.08249v1 ) ライセンス: Link先を確認	Qingbo Kang, Jun Gao, Kang Li, Qicheng Lao	(参考訳) masked autoencoder (mae) は前例のない注目を集め、多くの視覚タスクで顕著なパフォーマンスを達成している。事前トレーニング中にランダムにマスクされたイメージパッチ(プロキシタスクと呼ばれる)を再構築し、下流タスクに転送できる意味のある意味表現を学ぶ。しかし、超音波画像では、MAEは十分に調査されていない。本研究では,超音波画像認識におけるMAEの可能性を検討する。超音波画像の高雑音/信号比に特有の特徴を生かして,プリトレーニング中のプロキシタスクにデブラーリングを組み込んだ新しいデブラーリングMAE手法を提案する。デブロアリングの追加により、超音波画像に表示される微妙な細部をよりよく復元し、下流分類タスクの性能を向上させることができる。超音波画像分類における最新の性能を実現するため, 脱毛性maeの有効性を実証した。全体としては,超音波画像認識におけるmaeの可能性に注目し,デブラリングを組み込んだ新しい手法を提案する。 Masked autoencoder (MAE) has attracted unprecedented attention and achieves remarkable performance in many vision tasks. It reconstructs random masked image patches (known as proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the potential of MAE for ultrasound image recognition. Motivated by the unique property of ultrasound imaging in high noise-to-signal ratio, we propose a novel deblurring MAE approach that incorporates deblurring into the proxy task during pretraining. The addition of deblurring facilitates the pretraining to better recover the subtle details presented in the ultrasound images, thus improving the performance of the downstream classification task. Our experimental results demonstrate the effectiveness of our deblurring MAE, achieving state-of-the-art performance in ultrasound image classification. Overall, our work highlights the potential of MAE for ultrasound image recognition and presents a novel approach that incorporates deblurring to further improve its effectiveness.	翻訳日:2023-06-16 20:29:11 公開日:2023-06-14
# 拡散の拡散:周期的一方向拡散によるテキストビジョン条件付き生成 Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation ( http://arxiv.org/abs/2306.08247v1 ) ライセンス: Link先を確認	Yongqi Yang (1), Ruoyu Wang (1), Zhihao Qian (1), Ye Zhu (2), Yu Wu (1) ((1) Wuhan University, (2) Princeton University)	(参考訳) 拡散モデルを用いたテキスト・ツー・イメージ(T2I)生成により、ユーザはテキスト条件が与えられた合成画像のセマンティックコンテンツを制御することができる。よりカスタマイズされた画像生成アプリケーションに向けたさらなるステップとして、セマンティックレベルのテキスト入力だけでなく、ピクセルレベルの視覚条件にもとづく画像の合成を行う、新しいマルチモダリティ生成設定を導入する。既存の文献は、まず与えられた視覚情報を言語と接続して意味論的表現に変換し、それから元の分節化プロセスに組み込む。一見直感的に見えるように、このような方法論設計は意味遷移中にピクセル値を失うため、低レベルのビジョン(例えば、顔画像のid)の保存が望まれるタスクシナリオを満たせない。そこで本研究では,セマンティックテキストやピクセル・ビジュアル・コンディショニングに関して,カスタマイズされた画像を作成するためのトレーニングフリーフレームワークであるCyclic One-Way Diffusion (COW)を提案する。特に,画像のサブ領域は,物理的拡散と同様に相互干渉を伴い,消音軌道に沿った究極の調和を達成する。そこで我々は,視覚条件を高濃度の「セド」としてデノナイズプロセスの初期段階に配置し,視覚条件からの一方向情報の流れを制御することで,その「拡散」を調和図形にすることで,与えられた視覚条件を周期的に繰り返し活用することを提案する。画像内における内部拡散過程を段階的に実施するために, 破壊・構築過程を何回も繰り返す。難解なワンショット顔とテキストコンディショニング画像合成タスクの実験は,学習に基づくテキスト・ビジョン条件付き手法と比較して,速度,画質,条件付き忠実性において優れることを示した。 Text-to-Image (T2I) generation with diffusion models allows users to control the semantic content in the synthesized images given text conditions. As a further step toward a more customized image creation application, we introduce a new multi-modality generation setting that synthesizes images based on not only the semantic-level textual input but also on the pixel-level visual conditions. Existing literature first converts the given visual information to semantic-level representation by connecting it to languages, and then incorporates it into the original denoising process. Seemingly intuitive, such methodological design loses the pixel values during the semantic transition, thus failing to fulfill the task scenario where the preservation of low-level vision is desired (e.g., ID of a given face image). 
To this end, we propose Cyclic One-Way Diffusion (COW), a training-free framework for creating customized images with respect to semantic text and pixel-visual conditioning. Notably, we observe that sub-regions of an image impose mutual interference, just like physical diffusion, to achieve ultimate harmony along the denoising trajectory. Thus we propose to repetitively utilize the given visual condition in a cyclic way, by planting the visual condition as a high-concentration "seed" at the initialization step of the denoising process, and "diffuse" it into a harmonious picture by controlling a one-way information flow from the visual condition. We repeat the destroy-and-construct process multiple times to gradually but steadily impose the internal diffusion process within the image. Experiments on the challenging one-shot face and text-conditioned image synthesis task demonstrate our superiority in terms of speed, image quality, and conditional fidelity compared to learning-based text-vision conditional methods.	翻訳日:2023-06-16 20:28:54 公開日:2023-06-14
# MMASD:自閉症介入分析のためのマルチモーダルデータセット MMASD: A Multimodal Dataset for Autism Intervention Analysis ( http://arxiv.org/abs/2306.08243v1 ) ライセンス: Link先を確認	Jicheng Li, Vuthea Chheang, Pinar Kullu, Eli Brignac, Zhang Guo, Kenneth E. Barner, Anjana Bhat, Roghayeh Leila Barmaki Name	(参考訳) 自閉症スペクトラム障害(Autism spectrum disorder、ASD)は、発達障害の一つで、社会的コミュニケーション障害とコミュニケーションの困難さを特徴とする。機械学習技術は、自閉症の研究と評価を促進するために広く採用されている。しかしながら、計算モデルは、主に特定の分析に集中しており、プライバシを保存するデータ共有の複雑さによるモデル間の比較を制限する自閉症コミュニティのプライベートデータセットに検証されている。本研究は,自閉症児の遊び療法介入から収集した,新たなプライバシー保護オープンソースデータセットであるMMASDをマルチモーダルASDベンチマークデータセットとして提示する。 MMASDには、ASDを持つ32人の子供のデータと、100時間以上の介入記録から区切られた1,315のデータが含まれている。パブリックアクセスを促進するために、各データサンプルは、(1)光学フロー、(2)2Dスケルトン、(3)3Dスケルトン、(4)クリニカルASD評価スコア、例えばADOSスコアの4つのプライバシー保護モードから構成される。 MMASDは、研究者やセラピストが子どもの認知状態を理解し、治療中の進捗を監視し、それに応じて治療計画をカスタマイズすることを目的としている。また、行動品質評価や対人同期推定といった下流タスクにもインスピレーションを与えている。 MMASDデータセットはhttps://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysisで簡単にアクセスできる。 Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analysis and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. 
To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It can also inspire downstream tasks such as action quality assessment and interpersonal synchrony estimation. The MMASD dataset can be easily accessed at https://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysis.	翻訳日:2023-06-16 20:27:59 公開日:2023-06-14
# 量子エネルギーテレポーテーションを用いた量子インタラクティブ証明 Quantum interactive proofs using quantum energy teleportation ( http://arxiv.org/abs/2306.08242v1 ) ライセンス: Link先を確認	Kazuki Ikeda, Adam Lowe	(参考訳) 本稿では,量子状態テレポーテーション(qst)と量子エネルギーテレポーテーション(qet)プロトコルを用いた単純な量子インタラクティブ証明(qip)プロトコルを提案する。 qetは、サプライヤから注入されたエネルギーを担保として、ローカル操作と古典的通信(locc)によって、遠方から受信者がローカルエネルギーを抽出する技術である。 QETは、絡み合う任意の局所ハミルトニアンに対して作用し、我々の研究では、一般的な局所ハミルトニアンの基底状態を得るのが量子メリン・アーサー(QMA)ハードであることが重要である。これらの目的のためにQETを採用する主な動機は明確である。まず、証明者が正しい状態を保持し、適切な操作を行う場合、検証者は、高い確率(完全性)で負のエネルギーの存在を効果的に検証することができる。適切な演算子や誤った状態を選択することができないと、検証者は負のエネルギー(音量)を観測できない。重要なことに、検証者は証明者の伝達状態から1つの量子ビットのみを観測するが、証明者のハミルトニアン状態と状態(ゼロ・ノウトリッジ)は無視できない。さらに,分散量子インタラクティブな証明に解析を拡張し,各プレイヤーの計測値の検証のための複数の解を提案する。最も一般的な場合におけるプロトコルの複雑性クラスは、QIP(3)=PSPACEに属するので、小さな量子通信デバイスで実装可能なセキュアな量子認証スキームを提供する。プロトコルをQuantum Multi-Prover Interactive Proof (QMIP) システムに拡張するのは簡単で、複雑さはより強力になる(PSPACE$\subset$QMIP=NEXPTIME)。この場合、すべてのプローバーは基底状態の絡み合いを共有するので、より強力な複雑性クラス QMIP$^$ に属するべきである。 We present a simple quantum interactive proof (QIP) protocol using the quantum state teleportation (QST) and quantum energy teleportation (QET) protocols. QET is a technique that allows a receiver at a distance to extract the local energy by local operations and classical communication (LOCC), using the energy injected by the supplier as collateral. QET works for any local Hamiltonian with entanglement and, for our study, it is important that getting the ground state of a generic local Hamiltonian is quantum Merlin Arthur (QMA)-hard. The key motivations behind employing QET for these purposes are clarified. Firstly, in cases where a prover possesses the correct state and executes the appropriate operations, the verifier can effectively validate the presence of negative energy with a high probability (Completeness). Failure to select the appropriate operators or an incorrect state renders the verifier incapable of observing negative energy (Soundness). 
Importantly, the verifier solely observes a single qubit from the prover's transmitted state, while remaining oblivious to the prover's Hamiltonian and state (Zero-knowledge). Furthermore, the analysis is extended to distributed quantum interactive proofs, where we propose multiple solutions for the verification of each player's measurement. The complexity class of our protocol in the most general case belongs to QIP(3)=PSPACE, hence it provides a secure quantum authentication scheme that can be implemented in small quantum communication devices. It is straightforward to extend our protocol to Quantum Multi-Prover Interactive Proof (QMIP) systems, where the complexity is expected to be more powerful (PSPACE$\subset$QMIP=NEXPTIME). In our case, all provers share the ground state entanglement, hence it should belong to a more powerful complexity class QMIP$^*$.	翻訳日:2023-06-16 20:27:20 公開日:2023-06-14
# 点監督下での半教師細胞認識 Semi-supervised Cell Recognition under Point Supervision ( http://arxiv.org/abs/2306.08240v1 ) ライセンス: Link先を確認	Zhongyi Shui, Yizhi Zhao, Sunyi Zheng, Yunlong Zhang, Honglin Li, Shichuan Zhang, Xiaoxuan Yu, Chenglu Zhu, Lin Yang	(参考訳) 細胞認識はデジタル病理画像解析の基本的な課題である。ポイントベースの細胞認識(PCR)法は通常、非常にコストがかかり、時間がかかり、労力がかかる大量のアノテーションを必要とする。 Semi-supervised learning (SSL) はギガピクセル全体のスライド画像でセル情報をフル活用するためのショートカットを提供する。しかし、半教師付きポイントベース細胞認識(SSPCR)の研究はほとんど見過ごされている。従来のSSPCRの研究はすべて密度マップベースのPCRモデルに基づいており、精度が不十分で推論速度が遅く、ハイパーパラメータに対する感度が高い。これらの問題に対処するため,近年,エンドツーエンドPCRモデルが提案されている。本稿では,エンド・ツー・エンドのPCRモデルに適したSSPCRフレームワークを初めて開発する。全体としては、現在のモデルを用いてラベルなし画像の擬似ラベルを生成し、モデルトレーニングを監督するために使用される。さらに,自己学習に一般的に存在する確証バイアス問題を克服するために,共同学習戦略を導入する。分散アライメント技術も組み込まれ、ラベルなしデータに対して高品質でバイアスのない擬似ラベルを生成する。各種染色スタイルに関する4つの病理組織学的データセットの実験結果から,提案手法の有効性と妥当性が示された。コードは https://github.com/windygooo/SSPCR で入手できる。 Cell recognition is a fundamental task in digital histopathology image analysis. Point-based cell recognition (PCR) methods normally require a vast number of annotations, which is extremely costly, time-consuming and labor-intensive. Semi-supervised learning (SSL) can provide a shortcut to make full use of cell information in gigapixel whole slide images without exhaustive labeling. However, research into semi-supervised point-based cell recognition (SSPCR) remains largely overlooked. Previous SSPCR works are all built on density map-based PCR models, which suffer from unsatisfactory accuracy, slow inference speed and high sensitivity to hyper-parameters. To address these issues, end-to-end PCR models have recently been proposed. In this paper, we develop an SSPCR framework suitable for the end-to-end PCR models for the first time. Overall, we use the current models to generate pseudo labels for unlabeled images, which are in turn utilized to supervise model training. 
Besides, we introduce a co-teaching strategy to overcome the confirmation bias problem that generally exists in self-training. A distribution alignment technique is also incorporated to produce high-quality, unbiased pseudo labels for unlabeled data. Experimental results on four histopathology datasets concerning different types of staining styles show the effectiveness and versatility of the proposed framework. Code is available at https://github.com/windygooo/SSPCR.	翻訳日:2023-06-16 20:26:20 公開日:2023-06-14
# Maestro:AIロバストネスを教えるためのゲームプラットフォーム Maestro: A Gamified Platform for Teaching AI Robustness ( http://arxiv.org/abs/2306.08238v1 ) ライセンス: Link先を確認	Margarita Geleta and Jiacen Xu and Manikanta Loya and Junlin Wang and Sameer Singh and Zhou Li and Sergio Gago-Masague	(参考訳) AI脆弱性の防止は、ユーザや企業の安全とプライバシを保護するために重要であるが、堅牢なAIのための教育ツールはまだ世界中で未開発である。本稿では,maestroの設計,実装,評価について述べる。 Maestroは、堅牢なAI教育の発展に寄与する、効果的なオープンソースのゲームベースのプラットフォームである。 maestroは、競争の激しいプログラミング環境において、学生が人生に触発された課題に直面するゴールベースのシナリオを提供する。本研究では,学生の関与,モチベーション,学習成功に対するmaestroの影響を評価した。この作業は、堅牢なAIドメインにおけるアクティブな学習機会を促進するオンライン学習ツールの設計機能に関する洞察も提供する。マエストロを用いた147人の大学生の反射反応を,AIの4分の2の授業で分析した。その結果、堅牢なAIで新しいスキルを習得したと感じた学生は、Maestroを高く評価する傾向があり、堅牢なAIにおける物質統合、好奇心、熟達に高く評価された。さらに,マエストロの主要なゲーミフィケーション要素であるリーダーボードは,学生のエンゲージメントと学習に効果的に寄与している。また,maestroは教育的品質を損なうことなく,任意のコースの長さや深さに効果的に適応できることを示した。 Although the prevention of AI vulnerabilities is critical to preserve the safety and privacy of users and businesses, educational tools for robust AI are still underdeveloped worldwide. We present the design, implementation, and assessment of Maestro. Maestro is an effective open-source game-based platform that contributes to the advancement of robust AI education. Maestro provides goal-based scenarios where college students are exposed to challenging life-inspired assignments in a competitive programming environment. We assessed Maestro's influence on students' engagement, motivation, and learning success in robust AI. This work also provides insights into the design features of online learning tools that promote active learning opportunities in the robust AI domain. We analyzed the reflection responses (measured with Likert scales) of 147 undergraduate students using Maestro in two quarterly college courses in AI. According to the results, students who felt the acquisition of new skills in robust AI tended to appreciate highly Maestro and scored highly on material consolidation, curiosity, and mastery in robust AI. 
Moreover, the leaderboard, our key gamification element in Maestro, has effectively contributed to students' engagement and learning. Results also indicate that Maestro can be effectively adapted to any course length and depth without losing its educational quality.	翻訳日:2023-06-16 20:25:59 公開日:2023-06-14
# OT-Net: 再利用可能なニューラル最適輸送ソリューション OT-Net: A Reusable Neural Optimal Transport Solver ( http://arxiv.org/abs/2306.08233v1 ) ライセンス: Link先を確認	Zezeng Li, Shenghao Li, Lianbao Jin, Na Lei, Zhongxuan Luo	(参考訳) 最適輸送(OT)の広範な適用により、その計算は必須となり、様々なアルゴリズムが出現した。しかし、既存の手法は効率が低いか、不連続写像を表現できないかのいずれかである。そこで,新しい再利用可能なニューラルネットワークOTソルバOT-Netが提案され,まずブレニアの高さ表現をニューラルネットワークで学習してポテンシャルを求め,そのポテンシャルの勾配を計算してOTマップを得る。アルゴリズムには2つのメリットがある。 1) 不連続写像を容易に表現でき、不連続な支持を持つ任意の対象分布と一致し、鋭い境界を達成することができる。これにより、生成モデルのモード崩壊をなくすことができる。 2) OTマップは,新たなターゲットサンプルを追加すると,提案アルゴリズムによって直接的に計算できるため,マップの効率と再利用性が大幅に向上する。さらに, アルゴリズムの理論的誤差境界を解析し, 画像生成, 色移動, ドメイン適応におけるアプローチの実証的成功を実証した。 With the widespread application of optimal transport (OT), its calculation becomes essential, and various algorithms have emerged. However, the existing methods either have low efficiency or cannot represent discontinuous maps. A novel reusable neural OT solver OT-Net is thus presented, which first learns Brenier's height representation via the neural network to obtain its potential, and then obtains the OT map by computing the gradient of the potential. The algorithm has two merits: 1) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous supports and achieve sharp boundaries. This can effectively eliminate mode collapse in generative models. 2) The OT map can be calculated directly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, the theoretical error bound of the algorithm is analyzed, and we have demonstrated the empirical success of our approach in image generation, color transfer, and domain adaptation.	翻訳日:2023-06-16 20:25:38 公開日:2023-06-14
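The idea of reading an OT map off the gradient of a potential can be illustrated with the classical semi-discrete construction. This is a sketch under assumptions, not OT-Net itself: it takes the "height representation" to mean a height vector h with one entry per target sample, uses the piecewise-linear convex potential u(x) = max_i (<x, y_i> + h_i), and supplies the heights by hand where the paper learns them with a network. The function names and toy data are hypothetical.

```python
def brenier_potential(x, targets, heights):
    """u(x) = max_i (<x, y_i> + h_i): convex and piecewise linear."""
    return max(
        sum(a * b for a, b in zip(x, y)) + h
        for y, h in zip(targets, heights)
    )


def ot_map(x, targets, heights):
    """Gradient of u at x: the target whose affine piece attains the max."""
    scores = [
        sum(a * b for a, b in zip(x, y)) + h
        for y, h in zip(targets, heights)
    ]
    best = max(range(len(targets)), key=scores.__getitem__)
    return targets[best]


# Two target samples; equal heights split the plane along the line x[0] = 0.
targets = [(1.0, 0.0), (-1.0, 0.0)]
heights = [0.0, 0.0]
left = ot_map((-0.3, 5.0), targets, heights)
right = ot_map((0.5, 0.2), targets, heights)
```

The map jumps from one target to the other as x crosses the cell boundary, which is the kind of discontinuity the abstract refers to. For the map to be a true OT map, the heights must be tuned so that each cell receives the source mass prescribed for its target; that fitting step is omitted here.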
# 逆強化学習のためのカリキュラムサブゴール Curricular Subgoals for Inverse Reinforcement Learning ( http://arxiv.org/abs/2306.08232v1 ) ライセンス: Link先を確認	Shunyu Liu, Yunpeng Qing, Shuqi Xu, Hongyan Wu, Jiangtao Zhang, Jingyuan Cong, Tianhao Chen, Yunfu Liu, Mingli Song	(参考訳) Inverse Reinforcement Learning (IRL)は、政策学習を促進するために専門家によるデモンストレーションから報酬関数を再構築することを目的としており、模倣学習においてその顕著な成功を実証している。専門家的な行動を促進するため、既存のIRL法は主に、模倣者と専門家の軌跡の違いを最小限に抑えるグローバル報酬関数の学習に焦点を当てている。しかし、これらのグローバルな設計は、冗長なノイズとエラー伝搬の問題によって依然として制限されており、複雑なマルチステージタスクにおいてエージェント能力の低下につながる。本稿では,一タスクを複数のローカルサブゴールで明示的に切り離し,エージェントの模倣をガイドする,Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL)フレームワークを提案する。具体的には、csirlはまず、訓練されたエージェントが専門家の軌道上で決定的不確実性を導入し、異なるタスクステージの探索境界を直接決定するサブゴールを動的に選択する。さらに,各ステージの局所報酬関数を取得するために,これらのキュラーサブゴールに基づいてメタシミュレーション対象をカスタマイズし,固有報酬生成装置を訓練する。 D4RLと自律走行ベンチマークの実験では、提案手法が最先端技術よりも優れた結果をもたらすとともに、より優れた解釈可能性を示す。私たちのコードはhttps://github.com/Plankson/CSIRLで公開されています。 Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated its remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by the redundant noise and error propagation problems, leading to the unsuitable reward assignment and thus downgrading the agent capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework, that explicitly disentangles one task with several local subgoals to guide agent imitation. Specifically, CSIRL firstly introduces decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determines the exploration boundary of different task stages. 
To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.	翻訳日:2023-06-16 20:25:22 公開日:2023-06-14
# 直交列を用いた微分プライベート無線フェデレート学習 Differentially Private Wireless Federated Learning Using Orthogonal Sequences ( http://arxiv.org/abs/2306.08280v1 ) ライセンス: Link先を確認	Xizixiang Wei, Tianhao Wang, Ruiquan Huang, Cong Shen, Jing Yang, H. Vincent Poor	(参考訳) 本稿では,単入力単一出力(siso)無線フェデレート学習(fl)システムのための,新しいプライバシー保護型上界計算(aircomp)法を提案する。通信設計の観点から、FLORASは直交配列の特性を利用して送信機(CSIT)のチャネル状態情報の要求をなくす。プライバシの観点から、FLORASはアイテムレベルとクライアントレベルの差分プライバシー(DP)の両方を保証できることを示す。さらに、システムパラメータを調整することで、FLORASは追加コストなしで異なるDPレベルを柔軟に達成することができる。新たなFL収束バウンダリが導出され、プライバシー保証と組み合わせることで、収束率と差分プライバシーレベルのスムーズなトレードオフが可能になる。数値計算により, FLORASの利点をAirComp法と比較し, モデル収束度とプライバシレベルの異なるトレードオフ条件を持つプライバシ保存FLの設計を導出できることが検証された。 We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.	翻訳日:2023-06-16 20:17:04 公開日:2023-06-14
# FRIGATE:Frugal Spatio-temporal Preecasting on Road Networks (英語) FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks ( http://arxiv.org/abs/2306.08277v1 ) ライセンス: Link先を確認	Mridul Gupta, Hariprasad Kodamana, Sayan Ranu	(参考訳) 道路網における時空間過程のモデル化は重要度を高める課題である。時空間グラフニューラルネットワーク(gnns)の開発には大きな進展があるが、既存の研究は現実の道路網では実用的でない3つの仮定に基づいている。まず、道路ネットワークのすべてのノードを検知すると仮定する。実際には、予算制約やセンサーの故障のため、すべての位置(ノード)にはセンサーが備わっていない。第2に、すべてのセンサーでセンサー履歴が利用できると仮定する。これは、センサーの故障、通信中のパケットの損失など、非現実的である。最後に、静的な道路網の仮定がある。道路の閉鎖や新道路の建設などにより、ネットワーク内の接続性は変化する。この作業では、これらの欠点に対処するためにフリゲートを開発します。 FRIGATEは、位置情報、位相情報、時間情報をリッチな帰納ノード表現に統合する時空間Gnnによって駆動される。この多様な情報の融合は、Lstms と gated Lipschitz の埋め込みによる新しい組み合わせによって実現可能である。提案するgnnアーキテクチャは,最先端アルゴリズムで使用されるメッセージパッシングgnnよりも表現力が高いことが証明される。 FRIGATEの高い表現性は、実世界のネットワーク制約されたトラフィックデータに対して行われる経験的性能に自然に変換される。さらに、フルールセンサーの展開、道路網の接続性の変化、センシングの時間的不規則性にも耐えられる。 Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (Gnns), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget-constraints or sensor failures, all locations (nodes) may not be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks change due to road closures, constructions of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal Gnn that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with Lstms. 
We prove that the proposed Gnn architecture is more expressive than message-passing Gnns used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance on real-world network-constrained traffic data. In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.	翻訳日:2023-06-16 20:16:47 公開日:2023-06-14
# TryOnDiffusion: 2つのユニペットの物語 TryOnDiffusion: A Tale of Two UNets ( http://arxiv.org/abs/2306.08276v1 ) ライセンス: Link先を確認	Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman	(参考訳) 他者が着ている人物と衣服のイメージが2つある場合、私たちのゴールは、入力された人の衣服がどのように見えるかを可視化することです。重要な課題は、被服の写実的な細部保存の可視化を合成し、被服に重要な身体のポーズと形状の変化を適応させる。それまでの方法は、効果的なポーズや形状の変化を伴わずに衣服の細部保存に焦点を合わせるか、望ましい形状で試してみるか、衣服の細部を欠くかのどちらかであった。本稿では,2つのunets(parallel-unet と呼ぶ)を統一した拡散ベースのアーキテクチャを提案する。 Parallel-UNetの背景にある主要なアイデアは以下のとおりである。 1)衣服はクロスアテンション機構によって暗黙的に反動される。 2) 衣服ワープと人物ブレンドは,2つの異なるタスクのシーケンスとは対照的に,統一プロセスの一部として発生する。実験の結果,tryondiffusionは定性的にも定量的にも最先端のパフォーマンスを達成できた。 Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) garment is warped implicitly via a cross attention mechanism, 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.	翻訳日:2023-06-16 20:16:24 公開日:2023-06-14
# c$^3$ps:半教師付き医用画像セグメンテーションのための文脈認識条件付きクロス擬似監督 C$^3$PS: Context-aware Conditional Cross Pseudo Supervision for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2306.08275v1 ) ライセンス: Link先を確認	Peng Liu and Guoyan Zheng	(参考訳) 性能向上のために大量のラベル付きデータを活用できる半教師付き学習(SSL)手法が近年注目を集めている。本稿では,半教師型医用画像分割のためのコンテキスト対応コンディショナルクロス擬似スーパービジョン法(C$^3$PS)を提案する。先述したクロス擬似監督 (cps) とは異なり、本論文では条件付きクロス擬似監督 (ccps) 機構を導入し、与えられたクラスラベル上でクロス擬似監督を行う。文脈認識はCCPSにも導入され、相互監督のための擬似ラベルの品質が向上した。提案手法には,後期の訓練段階において,硬臓器の学習に集中できるという利点がある。医用画像分割作業の典型的な2つの課題に対して,本手法は最先端の手法よりも優れた性能を示す。 Semi-supervised learning (SSL) methods, which can leverage a large amount of unlabeled data for improved performance, has attracted increasing attention recently. In this paper, we introduce a novel Context-aware Conditional Cross Pseudo Supervision method (referred as C$^3$PS) for semi-supervised medical image segmentation. Unlike previously published Cross Pseudo Supervision (CPS) works, this paper introduces a novel Conditional Cross Pseudo Supervision (CCPS) mechanism where the cross pseudo supervision is conditioned on a given class label. Context-awareness is further introduced in the CCPS to improve the quality of pseudo-labels for cross pseudo supervision. The proposed method has the additional advantage that in the later training stage, it can focus on the learning of hard organs. Validated on two typical yet challenging medical image segmentation tasks, our method demonstrates superior performance over the state-of-the-art methods.	翻訳日:2023-06-16 20:16:06 公開日:2023-06-14
# なぜ間接グラフや非間接グラフで集約機能や隣接リストを使うのか? 実証的研究と簡易分類法 Why Using Either Aggregated Features or Adjacency Lists in Directed or Undirected Graph? Empirical Study and Simple Classification Method ( http://arxiv.org/abs/2306.08274v1 ) ライセンス: Link先を確認	Seiji Maekawa, Yuya Sasaki, Makoto Onizuka	(参考訳) ノード分類は、グラフ分析で最もホットなタスクの1つです。本稿では,ノード表現(特徴の集約対隣接リスト)と入力グラフのエッジ方向(指向対非指向)の選択に焦点をあて,分類結果に大きな影響を与えている。本研究は,ノード表現とエッジ方向の組み合わせを用いた各種GNNの性能評価を行うための実証的研究である。実験の結果,データセット間の静的な組み合わせは得られず,データセットの特性に応じて適切な組み合わせを選択する必要があることが示された。そこで本研究では,有向グラフと無向グラフのノード表現のすべての組み合わせを利用する,単純だが包括的分類法A2DUGを提案する。我々は,A2DUGが様々なデータセットで安定して動作することを示す。驚くべきことに、いくつかのデータセットにおいて、現在の最先端のメソッドを大きく上回っている。この結果は,ノード表現とエッジ方向の組み合わせに対する適応効果制御の重要性を検証する。 Node classification is one of the hottest tasks in graph analysis. In this paper, we focus on the choices of node representations (aggregated features vs. adjacency lists) and the edge direction of an input graph (directed vs. undirected), which have a large influence on classification results. We address the first empirical study to benchmark the performance of various GNNs that use either combination of node representations and edge directions. Our experiments demonstrate that no single combination stably achieves state-of-the-art results across datasets, which indicates that we need to select appropriate combinations depending on the characteristics of datasets. In response, we propose a simple yet holistic classification method A2DUG which leverages all combinations of node representation variants in directed and undirected graphs. We demonstrate that A2DUG stably performs well on various datasets. Surprisingly, it largely outperforms the current state-of-the-art methods in several datasets. This result validates the importance of the adaptive effect control on the combinations of node representations and edge directions.	翻訳日:2023-06-16 20:15:49 公開日:2023-06-14
# 局在2次元波束の密度ゆらぎにおけるカルダー・パリ・チャン物理 Kardar-Parisi-Zhang Physics in the Density Fluctuations of Localized Two-Dimensional Wave Packets ( http://arxiv.org/abs/2306.08272v1 ) ライセンス: Link先を確認	Sen Mu, Jiangbin Gong, Gabriel Lemarié	(参考訳) 2次元アンダーソン局在波パケットにおいて,波密度対数のゆらぎにおけるKardar-Parisi-Zhang普遍性クラスの鍵となる特徴を同定する。数値解析により,ゆらぎは,指数$1/3$で特徴づけられる距離に対する代数的スケーリングと,トレイシー・ウィドム確率分布を示すことがわかった。さらに、KPZ物理の指向性ポリマー描像において、指向性経路の波束密度への支配的寄与を同定し、その横変動が粗さ指数$2/3$で特徴づけられることを発見した。このKPZ物理との接続を利用して、2次元のアンダーソン局在波パケットがそのよく知られた指数的局在に対する拡張指数的補正を示すことを検証した。 We identify the key features of the Kardar-Parisi-Zhang (KPZ) universality class in the fluctuations of the wave density logarithm, in a two-dimensional Anderson localized wave packet. In our numerical analysis, the fluctuations are found to exhibit an algebraic scaling with distance characterized by an exponent of $1/3$, and a Tracy-Widom probability distribution of the fluctuations. Additionally, within a directed polymer picture of KPZ physics, we identify the dominant contribution of a directed path to the wave packet density and find that its transverse fluctuations are characterized by a roughness exponent $2/3$. Leveraging this connection with KPZ physics, we verify that an Anderson localized wave packet in 2D exhibits a stretched-exponential correction to its well-known exponential localization.	翻訳日:2023-06-16 20:15:34 公開日:2023-06-14
# 物体検出のための多クラス信頼度と位置校正 Multiclass Confidence and Localization Calibration for Object Detection ( http://arxiv.org/abs/2306.08271v1 ) ライセンス: Link先を確認	Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan	(参考訳) 多くの挑戦的なコンピュータビジョン問題に対して高い予測精度を達成する一方で、最近の研究は、ディープニューラルネットワーク(DNN)が過信的な予測を行う傾向があり、キャリブレーションが不十分であることを示唆している。DNNのキャリブレーションを改善する既存の試みのほとんどは、分類タスクに限定され、ドメイン内予測のキャリブレーションに限られている。驚くべきことに、視覚ベースのセキュリティに敏感で安全性が重要なアプリケーションにおいて中心的な位置を占める物体検出手法のキャリブレーションを研究する試みは、ほとんど、あるいは全く行われていない。本稿では,最近の物体検出手法を校正するための新しい学習時(train-time)手法を提案する。この手法は、予測の不確実性を利用することで、マルチクラス信頼度とボックスの位置推定を同時に校正できる。我々は複数のドメイン内およびドメイン外検出ベンチマークで広範囲な実験を行う。その結果,提案する学習時キャリブレーション手法は,ドメイン内とドメイン外の両方の予測におけるキャリブレーション誤差の低減において,複数のベースラインを一貫して上回ることがわかった。私たちのコードとモデルは https://github.com/bimsarapathiraja/MCCL で利用可能です。 Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make overconfident predictions, rendering them poorly calibrated. Most of the existing attempts for improving DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very little to no attempts have been made in studying the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive, and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and box localization by leveraging their predictive uncertainties. We perform extensive experiments on several in-domain and out-of-domain detection benchmarks. Results demonstrate that our proposed train-time calibration method consistently outperforms several baselines in reducing calibration error for both in-domain and out-of-domain predictions. Our code and models are available at https://github.com/bimsarapathiraja/MCCL.	翻訳日:2023-06-16 20:15:18 公開日:2023-06-14
# 2次元円形カーネル時系列変換, エントロピー尺度, 機械学習アプローチを用いた太陽活動の画像追跡 Imagery Tracking of Sun Activity Using 2D Circular Kernel Time Series Transformation, Entropy Measures and Machine Learning Approaches ( http://arxiv.org/abs/2306.08270v1 ) ライセンス: Link先を確認	Irewola Aaron Oludehinwa, Andrei Velichko, Maksim Belyaev and Olasunkanmi I. Olusola	(参考訳) 太陽は本質的に非常に複雑であり、その観測画像の特徴は太陽活動、宇宙、地球の気象条件に関する最も重要な情報源の1つである。NASAのソーラー・ダイナミクス・オブザーバトリーは1日あたり約7万枚の太陽活動の画像を撮影しており、これらの太陽観測画像を継続的に目視検査することは難しい。本研究では,2次元円形カーネル時系列変換,統計的およびエントロピー尺度,機械学習アプローチを用いて,太陽活動を追跡する手法を開発した。この手法は、太陽観測画像の断面を1次元時系列(1-DTS)に変換し、統計的およびエントロピー尺度(Approach 1)と直接分類(Approach 2)を用いて1-DTSから特徴を抽出し、機械学習により'solar storm'と'no storm'に分類する。その結果、太陽活動の追跡におけるモデルの精度は、Approach 1で約0.981、Approach 2で約0.999であることがわかった。開発した手法は、太陽観測画像の回転変換に対して安定であることが明らかである。元のデータセットで学習した場合、太陽嵐領域の分布の一致指数(T90)は、Approach 1でT90 ~ 0.993、Approach 2でT90 ~ 0.951に達する。また、拡張した学習データを使用すると、一致指数はそれぞれT90 ~ 0.994とT90 ~ 1に増加した。このモデルは、太陽嵐に関連する渦巻く磁力線の領域を一貫して分類し、画像の回転、グレア、光学的アーティファクトに対して頑健である。 The sun is highly complex in nature and its observatory imagery features is one of the most important sources of information about the sun activity, space and Earth's weather conditions. NASA's Solar Dynamics Observatory captures approximately 70,000 images of the sun activity in a day and the continuous visual inspection of these solar observatory images is challenging. In this study, we developed a technique of tracking the sun's activity using 2D circular kernel time series transformation, statistical and entropy measures, with machine learning approaches. The technique involves transforming the solar observatory image section into 1-Dimensional time series (1-DTS) while the statistical and entropy measures (Approach 1) and direct classification (Approach 2) is used to capture the extraction features from the 1-DTS for machine learning classification into 'solar storm' and 'no storm'. 
We found that the potential accuracy of the model in tracking the activity of the sun is approximately 0.981 for Approach 1 and 0.999 for Approach 2. The stability of the developed approach to rotational transformation of the solar observatory image is evident. When training on the original dataset for Approach 1, the match index (T90) of the distribution of solar storm areas reaches T90 ~ 0.993, and T90 ~ 0.951 for Approach 2. In addition, when using the extended training base, the match indices increased to T90 ~ 0.994 and T90 ~ 1, respectively. This model consistently classifies areas with swirling magnetic lines associated with solar storms and is robust to image rotation, glare, and optical artifacts.	翻訳日:2023-06-16 20:15:00 公開日:2023-06-14
# LargeST: 大規模トラフィック予測のためのベンチマークデータセット LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting ( http://arxiv.org/abs/2306.08259v1 ) ライセンス: Link先を確認	Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann	(参考訳) 交通予測はスマートシティのイニシアチブにおいて重要な役割を担い、トラフィックデータの非線形パターンを捉える深層学習の力により、大きな進歩を遂げている。しかし、現在の公開データセットで達成された有望な結果は、これらのデータセットの制約のため、実用的なシナリオには適用できない可能性がある。第一に、これらのデータセットの限られた規模は、実際の交通ネットワークの規模を反映していない可能性がある。第二に、これらのデータセットの時間的カバレッジは通常短く、長期的なパターンを研究し、深層モデルのトレーニングに十分なサンプルを取得する上でハードルとなる。第三に、これらのデータセットはセンサーに関する十分なメタデータを欠いていることが多く、データの信頼性と解釈性を損なう。これらの制限を軽減するため、LargeSTベンチマークデータセットを導入する。LargeSTは合計8,600個のセンサーを5年間にわたってカバーし、包括的なメタデータを含んでいる。LargeSTを用いて詳細なデータ分析を行ってデータに関する知見を抽出し、性能と効率の観点からよく知られたベースラインをベンチマークし、課題と将来の研究の機会を特定する。データセットとベースラインの実装は https://github.com/liuxu77/LargeST で公開している。 Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. 
We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.	翻訳日:2023-06-16 20:14:33 公開日:2023-06-14
# 潜在拡散モデルのロバスト性について On the Robustness of Latent Diffusion Models ( http://arxiv.org/abs/2306.08257v1 ) ライセンス: Link先を確認	Jianping Zhang, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weibin Wu, Michael R. Lyu	(参考訳) 潜在拡散モデルは、画像合成や画像編集など、様々な生成タスクで最先端のパフォーマンスを達成する。しかし, 潜在拡散モデルのロバスト性は十分に研究されていない。従来の研究は、デノイジング過程を考慮せず、ホワイトボックス設定下でのエンコーダや出力画像に対する敵対的攻撃のみに焦点を当てていた。そこで本研究では,潜在拡散モデルのロバスト性をより詳細に解析することを目的とする。まず,潜在拡散モデル内の各構成要素がホワイトボックスロバスト性に及ぼす影響について検討する。ホワイトボックスのシナリオに加えて、転送攻撃による潜在拡散モデルのブラックボックスロバスト性を評価し、プロンプト転送とモデル転送の両方の設定と、考えられる防御機構について検討する。しかし、これらの調査には包括的なベンチマークデータセットが必要であり、それは文献に欠けている。そこで, 潜在拡散モデルのロバスト性の研究を容易にするために, 2種類の画像編集モデルのための2つの自動データセット構築パイプラインを提案し, データセット全体を公開する。コードとデータセットは https://github.com/jpzhang1810/LDM-Robustness で公開されている。 Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim to analyze the robustness of latent diffusion models more thoroughly. We first study the influence of the components inside latent diffusion models on their white-box robustness. In addition to white-box scenarios, we evaluate the black-box robustness of latent diffusion models via transfer attacks, where we consider both prompt-transfer and model-transfer settings and possible defense mechanisms. However, all these explorations need a comprehensive benchmark dataset, which is missing in the literature. Therefore, to facilitate the research of the robustness of latent diffusion models, we propose two automatic dataset construction pipelines for two kinds of image editing models and release the whole dataset. Our code and dataset are available at https://github.com/jpzhang1810/LDM-Robustness.	翻訳日:2023-06-16 20:14:16 公開日:2023-06-14
# 予測輝度による可逆的ハーフトーン変換 Taming Reversible Halftoning via Predictive Luminance ( http://arxiv.org/abs/2306.08309v1 ) ライセンス: Link先を確認	Cheuk-Kit Lau, Menghan Xia, Tien-Tsin Wong	(参考訳) 伝統的なハーフトーン処理は通常、二値ドットで画像をディザリングする際に色を落とすため、元の色情報を復元することが困難になる。我々は,カラー画像を,元の画像へ完全に復元可能な二値ハーフトーンに変換する新しいハーフトーン技術を提案した。この新しい基本手法は,可逆ハーフトーンパターンを生成する2つの畳み込みニューラルネットワーク(CNN)と,CNNの平坦性劣化問題を緩和するノイズインセンティブブロック(NIB)から構成される。さらに,基本手法におけるブルーノイズ品質と復元精度の矛盾に対処するため,予測可能な情報をネットワークからオフロードする予測器組込み手法を提案した。本手法の場合,その情報はハーフトーンパターンから得られる輝度情報である。このアプローチにより、ネットワークは、復元品質を損なうことなく、より優れたブルーノイズ品質のハーフトーンを生成する柔軟性を得る。多段階訓練法と損失重み付けに関する詳細な検討を行った。ハーフトーンのスペクトル解析,ハーフトーン精度,復元精度,データ埋め込みの検討について,予測器組込み手法と基本手法を比較した。エントロピー評価の結果,我々のハーフトーンは基本手法よりも符号化情報が少ないことが示された。実験により, 予測器組込み手法は,ハーフトーンのブルーノイズ品質を改善する柔軟性が高く,外乱に対する許容度を高めつつ同等の復元品質を維持することが示された。 Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We proposed a novel halftoning technique that converts a color image into a binary halftone with full restorability to its original version. Our novel base halftoning technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the conflicts between the blue-noise quality and restoration accuracy in our novel base method, we proposed a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information resembling from the halftone pattern. Such an approach allows the network to gain more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. Detailed studies on the multiple-stage training method and loss weightings have been conducted. 
We have compared our predictor-embedded method and our novel method regarding spectrum analysis on halftone, halftone accuracy, restoration accuracy, and the data embedding studies. Our entropy evaluation evidences our halftone contains less encoding information than our novel base method. The experiments show our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones and maintains a comparable restoration quality with a higher tolerance for disturbances.	翻訳日:2023-06-16 20:08:42 公開日:2023-06-14
# マルチモーダル分類のためのバランスの取れたアクティブラーニングに向けて Towards Balanced Active Learning for Multimodal Classification ( http://arxiv.org/abs/2306.08306v1 ) ライセンス: Link先を確認	Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See	(参考訳) マルチモーダルネットワークのトレーニングには、ユニモーダルネットワークに比べてパラメータ空間が大きいため、膨大なデータが必要である。アクティブラーニングは、モデルの性能向上に寄与しうるサンプルのみを選択することで、データアノテーションコストを削減するために広く使われているテクニックである。しかし、現在のアクティブラーニング戦略は、主にユニモーダルなタスクのために設計されており、マルチモーダルデータに適用すると、支配的なモダリティに偏ったサンプル選択になることが多い。この不公平さは、最適なパフォーマンスを達成するために重要な、バランスの取れたマルチモーダル学習を妨げる。本稿では,よりバランスの取れたマルチモーダルアクティブラーニング戦略を設計するための3つのガイドラインを提案する。これらのガイドラインに従い、モダリティ間の優越度で勾配埋め込みを変調することにより、より公平なデータ選択を実現する新しい手法を提案する。本研究は,提案手法が支配的モダリティからの貪欲なサンプル選択を回避し,よりバランスの取れたマルチモーダル学習を実現することを実証する。提案手法は,様々なマルチモーダル分類タスクにおいて,既存のアクティブラーニング戦略より優れている。本研究は,マルチモーダルアクティブラーニングにおけるサンプル選択のバランスをとることの重要性を強調し,マルチモーダル分類のためのよりバランスの取れたアクティブラーニングを実現するための実践的ソリューションを提供する。 Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality. This unfairness hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, a novel approach is proposed to achieve more fair data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality. 
Our approach outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balancing sample selection in multimodal active learning and provides a practical solution for achieving more balanced active learning for multimodal classification.	翻訳日:2023-06-16 20:08:15 公開日:2023-06-14
# マイクロドップラーシグネチャに基づくレーダーデータエンハンス型深層学習アプローチによる歩行者認識 Pedestrian Recognition with Radar Data-Enhanced Deep Learning Approach Based on Micro-Doppler Signatures ( http://arxiv.org/abs/2306.08303v1 ) ライセンス: Link先を確認	Haoming Li, Yu Xiang, Haodong Xu, Wenyong Wang	(参考訳) 近年のホットな話題として、レーダーマイクロドップラーシグネチャに基づく歩行者識別の能力は、適切な訓練データの不足により制限されている。本稿では,データ拡張(DE)モジュールとマルチ特性学習(MCL)モジュールを備えたデータ強化型マルチ特性学習(DEMCL)モデルを提案し,より補完的な歩行者のマイクロドップラー(m-D)シグネチャを学習する。DEモジュールでは、フリーウォーキングデータセットを強化するためにレンジ・ドップラー敵対的生成ネットワーク(RDGAN)を提案し、マルチスケール畳み込みニューラルネットワーク(MCNN)とラジアル基底関数ニューラルネットワーク(RBFNN)を備えたMCLモジュールは、強化されたデータセットから抽出されたm-Dシグネチャを学習するように訓練される。実験の結果,本モデルは他の研究より3.33%から10.24%高い精度を示し,25分間の歩行データセットにおける実行時間は0.9324秒と短い。 As a hot topic in recent years, the ability of pedestrians identification based on radar micro-Doppler signatures is limited by the lack of adequate training data. In this paper, we propose a data-enhanced multi-characteristic learning (DEMCL) model with data enhancement (DE) module and multi-characteristic learning (MCL) module to learn more complementary pedestrian micro-Doppler (m-D) signatures. In DE module, a range-Doppler generative adversarial network (RDGAN) is proposed to enhance free walking datasets, and MCL module with multi-scale convolution neural network (MCNN) and radial basis function neural network (RBFNN) is trained to learn m-D signatures extracted from enhanced datasets. Experimental results show that our model is 3.33% to 10.24% more accurate than other studies and has a short run time of 0.9324 seconds on a 25-minute walking dataset.	翻訳日:2023-06-16 20:07:54 公開日:2023-06-14
# 大規模言語モデルと知識グラフの統合:ロードマップ Unifying Large Language Models and Knowledge Graphs: A Roadmap ( http://arxiv.org/abs/2306.08302v1 ) ライセンス: Link先を確認	Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, Xindong Wu	(参考訳) ChatGPTやGPT4のような大規模言語モデル(LLM)は、その創発的能力と汎化性により、自然言語処理と人工知能の分野で新たな波を起こしている。しかし、LLMはブラックボックスモデルであり、事実知識を捉えてアクセスすることができないことが多い。対照的に、WikipediaやHuapuなどの知識グラフ(KG)は、豊富な事実知識を明示的に格納する構造化知識モデルである。KGは推論と解釈可能性のための外部知識を提供することでLLMを強化できる。一方、KGは構築が難しく、本質的に進化し続けるため、新しい事実を生成し未知の知識を表現するうえで既存のKG手法には課題がある。したがって、LLMとKGを統合し、両者の利点を同時に活用することは相補的である。本稿では,LLMとKGの統合に向けた将来を見据えたロードマップを示す。私たちのロードマップは3つの一般的なフレームワークで構成されている。 1) KG強化LLM: LLMの事前学習および推論段階でKGを組み込む、あるいはLLMが学習した知識の理解を深めることを目的とする。 2) LLM拡張KG: 埋め込み、補完、構築、グラフからテキストへの生成、質問応答などの異なるKGタスクにLLMを活用する。 3) 協調型LLM + KG: LLMとKGが同等の役割を担い、相互に有益な方法で機能し、データと知識の両方によって駆動される双方向推論のためにLLMとKGの両方を強化する。我々は、ロードマップにおいてこれら3つのフレームワークの既存の取り組みをレビュー・要約し、今後の研究の方向性を示す。 Large language models (LLMs), such as ChatGPT and GPT4, are making new waves in the field of natural language processing and artificial intelligence, due to their emergent ability and generalizability. However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge. In contrast, Knowledge Graphs (KGs), Wikipedia and Huapu for example, are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability. Meanwhile, KGs are difficult to construct and evolving by nature, which challenges the existing methods in KGs to generate new facts and represent unseen knowledge. Therefore, it is complementary to unify LLMs and KGs together and simultaneously leverage their advantages. In this article, we present a forward-looking roadmap for the unification of LLMs and KGs. 
Our roadmap consists of three general frameworks, namely, 1) KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or for the purpose of enhancing understanding of the knowledge learned by LLMs; 2) LLM-augmented KGs, that leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and 3) Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge. We review and summarize existing efforts within these three frameworks in our roadmap and pinpoint their future research directions.	翻訳日:2023-06-16 20:07:34 公開日:2023-06-14
# SaDI: 極端イベント下での電力負荷予測のための自己適応型分解型解釈可能なフレームワーク SaDI: A Self-adaptive Decomposed Interpretable Framework for Electric Load Forecasting under Extreme Events ( http://arxiv.org/abs/2306.08299v1 ) ライセンス: Link先を確認	Hengbo Liu, Ziqing Ma, Linxiao Yang, Tian Zhou, Rui Xia, Yi Wang, Qingsong Wen, Liang Sun	(参考訳) 電力網の計画と管理には電力負荷の正確な予測が不可欠である。本稿では,猛暑などの極端事象下での電力負荷予測問題を扱う。正確な予測の課題の1つは、極端な条件下でのトレーニングサンプルの不足である。また、こうした極端な条件下では負荷が劇的に変化することが多く、より良い意思決定のために解釈可能なモデルが求められる。本稿では, 長期トレンド, 短期トレンド, 周期のモデリングをアンサンブルし, 異なる成分の時間特性を捉える, 自己適応型分解型解釈可能フレームワーク(SaDI)を提案する。極端事象下での不均衡学習のために,外部変数トリガ損失を提案する。さらに,望ましい解釈性を得るために一般化加法モデル(GAM)をフレームワークに採用する。中国中部(Central China)の電力負荷と建物の公開エネルギーメータの双方に関する実験により、提案するSaDIフレームワークは、正規化RMSEの日平均において、極端事象下での予測における現在の最先端アルゴリズムと比較して平均22.14%の改善を達成することが示された。コード、公開データセット、および付録は https://doi.org/10.24433/CO.9696980.v1 で利用可能である。 Accurate prediction of electric load is crucial in power grid planning and management. In this paper, we solve the electric load forecasting problem under extreme events such as scorching heats. One challenge for accurate forecasting is the lack of training samples under extreme conditions. Also load usually changes dramatically in these extreme conditions, which calls for interpretable model to make better decisions. In this paper, we propose a novel forecasting framework, named Self-adaptive Decomposed Interpretable framework~(SaDI), which ensembles long-term trend, short-term trend, and period modelings to capture temporal characteristics in different components. The external variable triggered loss is proposed for the imbalanced learning under extreme events. Furthermore, Generalized Additive Model (GAM) is employed in the framework for desirable interpretability. 
The experiments on both Central China electric load and public energy meters from buildings show that the proposed SaDI framework achieves average 22.14% improvement compared with the current state-of-the-art algorithms in forecasting under extreme events in terms of daily mean of normalized RMSE. Code, Public datasets, and Appendix are available at: https://doi.org/10.24433/CO.9696980.v1 .	翻訳日:2023-06-16 20:07:03 公開日:2023-06-14
# 直接グリッドリファインメントアルゴリズムを用いた物理形ニューラルネットワークの効率的な学習 Efficient Training of Physics-Informed Neural Networks with Direct Grid Refinement Algorithm ( http://arxiv.org/abs/2306.08293v1 ) ライセンス: Link先を確認	Shikhar Nilabh and Fidel Grandia	(参考訳) 本研究では,物理インフォームドニューラルネットワーク(PINN)の枠組みにおける残点の適応サンプリングに適したアルゴリズムの開発について述べる。提案手法は,既存の適応サンプリング手法に内在する制限に対処することで,計算効率と適応点配置の両方を効果的に保証する直接メッシュリファインメント手法を導入する。本アルゴリズムの性能を評価するために検証を行い,本手法とベンチマークモデルの結果に基づいて,モデル間の合理的な一致を示した。従来の適応型再サンプリング技術との比較分析により,特に高精細化率で実施した場合,本手法の優れた性能が示された。本研究は, 適応サンプリングアルゴリズムを物理インフォームドニューラルネットワークに適用することにより, シミュレーション精度の向上を図ったものである。 This research presents the development of an innovative algorithm tailored for the adaptive sampling of residual points within the framework of Physics-Informed Neural Networks (PINNs). By addressing the limitations inherent in existing adaptive sampling techniques, our proposed methodology introduces a direct mesh refinement approach that effectively ensures both computational efficiency and adaptive point placement. Verification studies were conducted to evaluate the performance of our algorithm, showcasing reasonable agreement between the model based on our novel approach and benchmark model results. Comparative analyses with established adaptive resampling techniques demonstrated the superior performance of our approach, particularly when implemented with higher refinement factor. Overall, our findings highlight the enhancement of simulation accuracy achievable through the application of our adaptive sampling algorithm for Physics-Informed Neural Networks.	翻訳日:2023-06-16 20:06:43 公開日:2023-06-14
# 高次解法におけるp適応のための強化学習戦略 A reinforcement learning strategy for p-adaptation in high order solvers ( http://arxiv.org/abs/2306.08292v1 ) ライセンス: Link先を確認	David Huergo, Gonzalo Rubio, Esteban Ferrer	(参考訳) 強化学習(Reinforcement Learning, RL)は、意思決定プロセスを自動化するための有望なアプローチである。本稿では,高次解法を用いる際に計算メッシュの多項式次数を最適化するRL手法の適用について検討する。メッシュ適応は,コストを低減しつつ精度を向上させることで,数値シミュレーションの効率を高める上で重要な役割を担っている。ここで、近位ポリシー最適化(Proximal Policy Optimization)に基づくアクター・クリティックRLモデルは、エージェントが変化する条件に基づいて最適なメッシュ修正を学習するためのデータ駆動アプローチを提供する。本稿では,高次解法におけるp適応戦略を提案し,適切な報酬構造の定式化やRLエージェントとシミュレーション環境との相互作用など,RLベースのメッシュ適応の主な側面について考察する。RLに基づくメッシュp適応が計算効率と精度に与える影響を論じる。RLによるp適応戦略を1次元非粘性バーガース方程式で検証し,この戦略の有効性を実証する。RL戦略は、一様な適応と比べて計算コストを削減し精度を向上させるとともに、人間の介入を最小限に抑える。 Reinforcement learning (RL) has emerged as a promising approach to automating decision processes. This paper explores the application of RL techniques to optimise the polynomial order in the computational mesh when using high-order solvers. Mesh adaptation plays a crucial role in improving the efficiency of numerical simulations by improving accuracy while reducing the cost. Here, actor-critic RL models based on Proximal Policy Optimization offer a data-driven approach for agents to learn optimal mesh modifications based on evolving conditions. The paper provides a strategy for p-adaptation in high-order solvers and includes insights into the main aspects of RL-based mesh adaptation, including the formulation of appropriate reward structures and the interaction between the RL agent and the simulation environment. We discuss the impact of RL-based mesh p-adaptation on computational efficiency and accuracy. We test the RL p-adaptation strategy on a 1D inviscid Burgers' equation to demonstrate the effectiveness of the strategy. The RL strategy reduces the computational cost and improves accuracy over uniform adaptation, while minimising human intervention.	翻訳日:2023-06-16 20:06:32 公開日:2023-06-14
# 注意機構とctcに基づくデコードを用いたフランス語 cued 音声における手と唇のダイナミックスの検討 Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding ( http://arxiv.org/abs/2306.08290v1 ) ライセンス: Link先を確認	Sanjana Sankar (GIPSA-CRISSP), Denis Beautemps (GIPSA-CRISSP), Frédéric Elisei (ICP), Olivier Perrotin (GIPSA-CRISSP), Thomas Hueber (GIPSA-CRISSP)	(参考訳) 難聴者や難聴者は、音声言語を理解するためのコミュニケーションツールとして、CS(cued speech)を利用する。音声情報に関連する手がかりを提供することで、CSはリップリーディングを強化する手段を提供する。文献では、人間の生産の文脈において、手と唇の動態に関するいくつかの研究がなされている。本稿では,ニューラルネットワークが単一話者に対して,注意機構を用いて認識タスクを実行しながら,この関係を学習する方法を提案する。さらに、学習ダイナミクスの分析を用いて、2つのモダリティ間の関係を確立し、自動セグメントを抽出する。本研究の目的のために,フランスCS向けに新しいデータセットが記録されている。このデータセットのリリースとともに、単語レベルの認識のためのベンチマークが報告される。 Hard of hearing or profoundly deaf people make use of cued speech (CS) as a communication tool to understand spoken language. By delivering cues that are relevant to the phonetic information, CS offers a way to enhance lipreading. In literature, there have been several studies on the dynamics between the hand and the lips in the context of human production. This article proposes a way to investigate how a neural network learns this relation for a single speaker while performing a recognition task using attention mechanisms. Further, an analysis of the learnt dynamics is utilized to establish the relationship between the two modalities and extract automatic segments. For the purpose of this study, a new dataset has been recorded for French CS. Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French CS.	翻訳日:2023-06-16 20:06:15 公開日:2023-06-14
# $\textbf{A}^2\textbf{CiD}^2$:分散ディープラーニングにおける非同期通信の高速化 $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning ( http://arxiv.org/abs/2306.08289v1 ) ライセンス: Link先を確認	Adel Nabli (MLIA, Mila), Eugene Belilovsky (Mila), Edouard Oyallon (MLIA)	(参考訳) ディープラーニングモデルの分散トレーニングは、この分野における多くの成功に不可欠である。現在の標準手法は主に同期集中型アルゴリズムに依存しており、大きな通信ボトルネックを引き起こし、その利用を強い接続性を持つハイパフォーマンスコンピューティング(HPC)環境に制限する。分散非同期アルゴリズムは潜在的な代替手段として登場しているが、実用性はまだ遅れている。本研究では,その柔軟性と並列化の可能性から,ピアツーピア非同期手法に着目する。大規模かつ接続の不十分な状況で必要となる帯域幅の増加を緩和するために,$\textbf{A}^2\textbf{CiD}^2$と名付けた連続モメンタムにより動作する,原理に基づいた非同期・ランダム化・ゴシップベースのアルゴリズムを導入する。パラメータの倍増以外のコストなしに大幅な通信の高速化をもたらすことに加えて、$\textbf{A}^2\textbf{CiD}^2$を他の非同期アプローチに組み込むには最小限の変更しか必要としない。理論的・数値的にその効率を実証する。リンググラフ上では経験的に、$\textbf{A}^2\textbf{CiD}^2$の追加は通信レートを倍にするのと同じ効果を持つ。特に,最大64個の非同期ワーカ(A100 GPU)と各種通信ネットワークトポロジを用いて,ImageNetデータセットで一貫した改善を示す。 Distributed training of Deep Learning models has been critical to many recent successes in the field. Current standard methods primarily rely on synchronous centralized algorithms which induce major communication bottlenecks and limit their usability to High-Performance Computing (HPC) environments with strong connectivity. Decentralized asynchronous algorithms are emerging as a potential alternative but their practical applicability still lags. In this work, we focus on peer-to-peer asynchronous methods due to their flexibility and parallelization potentials. In order to mitigate the increase in bandwidth they require at large scale and in poorly connected contexts, we introduce a principled asynchronous, randomized, gossip-based algorithm which works thanks to a continuous momentum named $\textbf{A}^2\textbf{CiD}^2$. In addition to inducing a significant communication acceleration at no cost other than doubling the parameters, minimal adaptation is required to incorporate $\textbf{A}^2\textbf{CiD}^2$ to other asynchronous approaches. We demonstrate its efficiency theoretically and numerically. 
Empirically on the ring graph, adding $\textbf{A}^2\textbf{CiD}^2$ has the same effect as doubling the communication rate. In particular, we show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers (A100 GPUs) and various communication network topologies.	翻訳日:2023-06-16 20:06:02 公開日:2023-06-14
# 3次元音波位相不変エコー定位 3-Dimensional Sonic Phase-invariant Echo Localization ( http://arxiv.org/abs/2306.08281v1 ) ライセンス: Link先を確認	Christopher Hahne	(参考訳) パララックスと飛行時間(ToF)は、様々な光や気象条件がカメラベースの高度な3次元(3-D)再構成にとって依然として課題であるロボットビジョンにおいて、相補的なものとみなされることが多い。そこで本研究では,3次元空間における任意のセンサ位置から音響ToFパルスを三角測量するための,対応エコー間のパララックス(PaCE)を初めて確立する。これは新しい往復反射モデルによって達成され、センサー位置と検出された到着時刻によって張られる楕円体の交点でターゲットを特定する。チャネル間のエコー対応付けは標的検出の重要な前提条件となり、シャム(Siamese)多層パーセプトロン(MLP)のスタックから得られる特徴類似性から学習される。PaCEアルゴリズムは、センサ位置の制約を緩和しつつ、1個の等方性エミッタと少なくとも3個のToF受信機のみから位相不変の3次元物体定位を可能にする。空中超音波センサハードウェアを用いて実験を行い、定量的な結果によってこの仮説を裏付ける。 Parallax and Time-of-Flight (ToF) are often regarded as complementary in robotic vision where various light and weather conditions remain challenges for advanced camera-based 3-Dimensional (3-D) reconstruction. To this end, this paper establishes Parallax among Corresponding Echoes (PaCE) to triangulate acoustic ToF pulses from arbitrary sensor positions in 3-D space for the first time. This is achieved through a novel round-trip reflection model that pinpoints targets at the intersection of ellipsoids, which are spanned by sensor locations and detected arrival times. Inter-channel echo association becomes a crucial prerequisite for target detection and is learned from feature similarity obtained by a stack of Siamese Multi-Layer Perceptrons (MLPs). The PaCE algorithm enables phase-invariant 3-D object localization from only 1 isotropic emitter and at least 3 ToF receivers with relaxed sensor position constraints. Experiments are conducted with airborne ultrasound sensor hardware and back this hypothesis with quantitative results.	翻訳日:2023-06-16 20:05:35 公開日:2023-06-14
# GCformer: 正確でスケーラブルな長期多変量時系列予測のための効率的なフレームワーク GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.08325v1 ) ライセンス: Link先を確認	YanJun Zhao, Ziqing Ma, Tian Zhou, Liang Sun, Mengni Ye, Yi Qian	(参考訳) トランスフォーマーベースのモデルは、時系列予測の有望なツールとして登場した。しかし、これらのモデルは長い入力時系列に対して正確な予測ができない。一方では、時系列データ内のグローバルな依存関係を捉えられない。他方では、長い入力シーケンスは通常、大きなモデルサイズと高い時間複雑性をもたらす。これらの制限に対処するために、長い入力系列を処理する構造化グローバル畳み込みブランチと、短い直近の信号を捉えるローカルなトランスフォーマーベースのブランチを組み合わせたGCformerを提案する。3つの異なるパラメータ化手法を用いた,グローバル畳み込みカーネルのための統一的な枠組みを導入した。グローバルブランチで選択された構造化畳み込みカーネルは、劣線形の複雑さを持つように設計されており、長くノイズの多い入力信号の効率的かつ効果的な処理を可能にしている。6つのベンチマークデータセットに関する実証的研究により、GCformerは最先端の手法より優れており、多変量時系列ベンチマークのMSE誤差を4.38%、モデルパラメータを61.92%削減することが示された。特に、グローバル畳み込みブランチは他のモデルの性能を向上させるプラグインブロックとして機能し、最近発表された様々なトランスフォーマーベースのモデルを含めて平均31.93%の改善が得られる。私たちのコードは https://github.com/zyj-111/GCformer で公開しています。 Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they failed to capture global dependencies within time series data. On the other hand, the long input sequence usually leads to large model size and high time complexity. To address these limitations, we present GCformer, which combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for capturing short, recent signals. A cohesive framework for a global convolution kernel has been introduced, utilizing three distinct parameterization methods. The selected structured convolutional kernel in the global branch has been specifically crafted with sublinear complexity, thereby allowing for the efficient and effective processing of lengthy and noisy input signals. 
Empirical studies on six benchmark datasets demonstrate that GCformer outperforms state-of-the-art methods, reducing MSE error in multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to enhance the performance of other models, with an average improvement of 31.93%, including various recently published Transformer-based models. Our code is publicly available at https://github.com/zyj-111/GCformer.	翻訳日:2023-06-16 19:58:00 公開日:2023-06-14
# ディープラーニングモデルをトレーニングする際のカーボンフットプリントの推定方法ガイドとレビュー How to estimate carbon footprint when training deep learning models? A guide and review ( http://arxiv.org/abs/2306.08323v1 ) ライセンス: Link先を確認	Lucia Bouza Heguerte (MAP5), Aurélie Bugeau (IUF, LaBRI, UB), Loïc Lannelongue	(参考訳) 機械学習とディープラーニングモデルは、最近の社会の多くの分野における人工知能の急速な発展に欠かせないものとなっている。現在、これらのモデルの開発には多くの研究で分析された環境コストがあることが広く認識されている。機械学習モデルをトレーニングしながらエネルギー消費を追跡するために、いくつかのオンラインおよびソフトウェアツールが開発されている。本稿では,自らの作業の環境影響を推定し始めたいAI実践者を対象に,これらのツールの包括的な紹介と比較を行う。特有の語彙、各ツールの技術的要件をレビューし、これらのツールを使用する方法と時期についてアドバイスします。 Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of the society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary, the technical requirements for each tool, and provide some advice on how and when to use these tools.	翻訳日:2023-06-16 19:57:38 公開日:2023-06-14
# 過パラメータ化された浅層ReLUニューラルネットワークを用いた非パラメトリック回帰 Nonparametric regression using over-parameterized shallow ReLU neural networks ( http://arxiv.org/abs/2306.08321v1 ) ライセンス: Link先を確認	Yunfei Yang, Ding-Xuan Zhou	(参考訳) 重みが適切に制約されたり正則化されたりした場合、過パラメータ化されたニューラルネットワークは、ある滑らかな関数クラスから関数を学習するためのミニマックス最適な収束率(対数係数まで)を達成することができる。具体的には、浅いReLUニューラルネットワークを用いて未知の$d$変量関数を推定する非パラメトリック回帰を考える。回帰関数は、平滑性$\alpha<(d+3)/2$のHölder空間や、無限に広いニューラルネットワークと見なすことができる浅層ニューラルネットワークに対応する変動空間に由来すると仮定される。この設定では、重みに対する一定のノルム制約を持つ浅層ニューラルネットワークに基づく最小二乗推定器は、ネットワーク幅が十分大きい場合、ミニマックス最適であることが証明される。副産物として、浅いReLUニューラルネットワークの局所ラデマッハ複雑性に対する新しいサイズ非依存の境界が導出され、これはそれ自体でも興味深い結果となりうる。 It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.	翻訳日:2023-06-16 19:57:28 公開日:2023-06-14
# オンラインカーネル回帰に対する劣線形計算複雑度をもつほぼ最適なアルゴリズム Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression ( http://arxiv.org/abs/2306.08320v1 ) ライセンス: Link先を確認	Junfan Li and Shizhong Liao	(参考訳) 後悔と計算コストのトレードオフは、オンラインカーネル回帰の基本的な問題であり、このトレードオフに取り組んでいた以前のアルゴリズムは、劣線形の計算複雑性で最適な後悔の境界を維持することはできない。本稿では,劣線形の計算複雑性でほぼ最適な後悔境界を維持できるAOGD-ALDとNONS-ALDの2つの新しいアルゴリズムを提案し,アルゴリズムが機能するための十分条件を与える。どちらのアルゴリズムも、カーネルマッピングを近似するために使用されるほぼ直交基底の群を動的に維持し、近似誤差を制御することでほぼ最適な後悔境界を維持する。基底の数は、近似誤差とカーネル行列の固有値の減衰率に依存する。固有値が指数関数的に減衰すると、AOGD-ALD と NONS-ALD は $O(\ln^2{T})$ の計算複雑性で、それぞれ $O(\sqrt{L(f)})$ と $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$ の後悔を達成する。固有値が次数$p\geq 1$で多項式的に減衰した場合、我々のアルゴリズムは、それぞれ$p>4$ と $p\geq 10$ の場合、同じ後悔境界を$o(T)$ の計算複雑性で保持する。 $L(f)$ は$f$ の累積損失であり、$\mathrm{d}_{\mathrm{eff}}(\mu)$ は問題の有効次元である。 2つの後悔境界はほぼ最適であり、互いに比較可能ではない。 The trade-off between regret and computational cost is a fundamental problem for online kernel regression, and previous algorithms that addressed this trade-off cannot keep optimal regret bounds at sublinear computational complexity. In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at sublinear computational complexity, and give sufficient conditions under which our algorithms work. Both algorithms dynamically maintain a group of nearly orthogonal basis elements used to approximate the kernel mapping, and keep nearly optimal regret bounds by controlling the approximation error. The number of basis elements depends on the approximation error and the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay exponentially, then AOGD-ALD and NONS-ALD achieve regrets of $O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a computational complexity in $O(\ln^2{T})$. If the eigenvalues decay polynomially with degree $p\geq 1$, then our algorithms keep the same regret bounds at a computational complexity in $o(T)$ in the case of $p>4$ and $p\geq 10$, respectively. 
$L(f)$ is the cumulative loss of $f$ and $\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The two regret bounds are nearly optimal and are not comparable.	翻訳日:2023-06-16 19:57:10 公開日:2023-06-14
# パレート最適解の集合によるエネルギー管理構成概念の同定 Identification of Energy Management Configuration Concepts from a Set of Pareto-optimal Solutions ( http://arxiv.org/abs/2306.08318v1 ) ライセンス: Link先を確認	Felix Lanfermann and Qiqi Liu and Yaochu Jin and Sebastian Schmitt	(参考訳) 効率的なエネルギー利用のための建築構成の最適化は、現在研究によって注目されており、この課題に対処するいくつかの方法が開発されている。しかし、初期投資コスト、繰り返しコスト、グリッド操作の不確実性に対する堅牢性など、複数の相反する目標に基づく適切な構成の選択は、難しいマルチクリトリア意思決定問題である。概念識別は、構成オプションを意味のあるグループ(概念)に分類し、目的の選択に対するトレードオフ期待を満たすための制約を導入することで意思決定を容易にする。本研究では,多目的進化最適化による20000pareto-optimal building energy managementの設定に対して,複数の概念識別イテレーションを実施し,インフォームド投資決定の基盤を提供する。その後の一連の分析ステップで、記述空間の選択、すなわち、一貫性のある概念と重複しない概念を必要とする集合への特徴の分割が、抽出可能な情報のタイプに影響を与え、記述空間の異なるセットアップが構成データのいくつかの異なる側面を照らしていることを示す。 Optimizing building configurations for an efficient use of energy is increasingly receiving attention by current research and several methods have been developed to address this task. Selecting a suitable configuration based on multiple conflicting objectives, such as initial investment cost, recurring cost, robustness with respect to uncertainty of grid operation is, however, a difficult multi-criteria decision making problem. Concept identification can facilitate a decision maker by sorting configuration options into semantically meaningful groups (concepts), further introducing constraints to meet trade-off expectations for a selection of objectives. In this study, for a set of 20000 Pareto-optimal building energy management configurations, resulting from a many-objective evolutionary optimization, multiple concept identification iterations are conducted to provide a basis for making an informed investment decision. 
In a series of subsequent analysis steps, it is shown how the choice of description spaces, i.e., the partitioning of the features into sets for which consistent and non-overlapping concepts are required, impacts the type of information that can be extracted and that different setups of description spaces illuminate several different aspects of the configuration data - an important aspect that has not been addressed in previous work.	翻訳日:2023-06-16 19:56:40 公開日:2023-06-14
# R-Drop構造を有する改良型変圧器における名前付きエンティティ認識に関する研究 Research on Named Entity Recognition in Improved transformer with R-Drop structure ( http://arxiv.org/abs/2306.08315v1 ) ライセンス: Link先を確認	Weidong Ji, Yousheng Zhang, Guohui Zhou, Xu Wang	(参考訳) 本稿では,モデル一般化能力の向上と名前付きエンティティ認識タスクにおける変換器の有効性向上のために,XLNet-Transformer-Rモデルを提案する。 XLNet事前学習モデルと相対的な位置エンコーディングを備えたTransformerエンコーダを組み合わせることで、長いテキストの処理能力を高め、コンテキスト情報を学習することで堅牢性を向上させる。オーバーフィッティングを防止するため、R-Drop構造を用いて一般化能力を改善し、名前付きエンティティ認識タスクにおけるモデルの精度を高める。本稿では,MSRAデータセット上でのアブレーション実験と,XLNet-Transformer-Rモデルの戦略的有効性を示す4つのデータセットにおける他のモデルとの比較実験を行う。 To enhance the generalization ability of the model and improve the effectiveness of the transformer for named entity recognition tasks, the XLNet-Transformer-R model is proposed in this paper. The XLNet pre-trained model and the Transformer encoder with relative positional encodings are combined to enhance the model's ability to process long text and learn contextual information to improve robustness. To prevent overfitting, the R-Drop structure is used to improve the generalization capability and enhance the accuracy of the model in named entity recognition tasks. The model in this paper performs ablation experiments on the MSRA dataset and comparison experiments with other models on four datasets with excellent performance, demonstrating the strategic effectiveness of the XLNet-Transformer-R model.	翻訳日:2023-06-16 19:56:20 公開日:2023-06-14
# 自動話者独立視覚音声認識:包括的調査 Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey ( http://arxiv.org/abs/2306.08314v1 ) ライセンス: Link先を確認	Praneeth Nemani, G. Sai Krishna, Supriya Kundrapu	(参考訳) 話者非依存のVSRは、話者の顔の動きのビデオ記録から音声語やフレーズを識別する複雑なタスクである。長年にわたり、システムパフォーマンスを評価するために異なるアルゴリズムとデータセットを含むvsrの分野でかなりの研究が行われてきた。これらの取り組みは有効なVSRモデルの開発に大きな進歩をもたらし、この分野におけるさらなる研究の機会を生み出した。この調査は、過去30年間のVSRの進展を詳細に調査し、特に話者に依存しないシステムから話者に依存しないシステムへの移行に焦点を当てている。また、VSR研究で使用される各種データセットの概要と、話者独立を達成するために使用される事前処理技術についても概説する。この調査は1990年から2023年にかけて出版された著作を網羅し、各著作を徹底的に分析し、様々なパラメータと比較している。この調査は、1990年から2023年までの話者に依存しないVSRシステムの進化を詳細に分析する。 VSRシステムの開発について概説し、話者に依存しないVSRのためのエンドツーエンドパイプラインを開発する必要性を強調している。画像表現は、話者に依存しないVSRで使用されるテクニックの明確かつ簡潔な概要を提供し、それによって様々な方法論の理解と分析を支援する。調査ではまた、それぞれのテクニックの強みと限界を強調し、視覚音声の手がかりを分析するための新しいアプローチの開発に関する洞察を提供する。全体として、この総合的なレビューは、現在最先端の話者非依存のVSRに関する洞察を提供し、将来の研究の可能性を強調している。 Speaker-independent VSR is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, there has been a considerable amount of research in the field of VSR involving different algorithms and datasets to evaluate system performance. These efforts have resulted in significant progress in developing effective VSR models, creating new opportunities for further research in this area. This survey provides a detailed examination of the progression of VSR over the past three decades, with a particular emphasis on the transition from speaker-dependent to speaker-independent systems. We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence. The survey covers the works published from 1990 to 2023, thoroughly analyzing each work and comparing them on various parameters. This survey provides an in-depth analysis of speaker-independent VSR systems evolution from 1990 to 2023. 
It outlines the development of VSR systems over time and highlights the need to develop end-to-end pipelines for speaker-independent VSR. The pictorial representation offers a clear and concise overview of the techniques used in speaker-independent VSR, thereby aiding in the comprehension and analysis of the various methodologies. The survey also highlights the strengths and limitations of each technique and provides insights into developing novel approaches for analyzing visual speech cues. Overall, this comprehensive review provides insights into the current state of the art in speaker-independent VSR and highlights potential areas for future research.	翻訳日:2023-06-16 19:56:06 公開日:2023-06-14
# バックドア攻撃におけるポイズニング効率向上のためのプロキシフリー戦略 A Proxy-Free Strategy for Practically Improving the Poisoning Efficiency in Backdoor Attacks ( http://arxiv.org/abs/2306.08313v1 ) ライセンス: Link先を確認	Ziqiang Li, Hong Sun, Pengfei Xia, Beihao Xia, Xue Rui, Wei Zhang, Bin Li	(参考訳) ポイズニング効率は、ポイズニングベースのバックドア攻撃において重要な要素である。攻撃者は、検出されない状態を保つために、できるだけ少ない汚染サンプルで同じレベルの攻撃強度を達成することを好む。効率的なトリガーはポイズニング効率を大幅に改善したが、改善の余地はまだある。近年,効率のよいサンプルを選択することは有望であるが,効率的な汚染サンプルセットを見つけるためにはプロキシバックドアインジェクションタスクが必要であるため,プロキシ攻撃の設定が被害者の実際の設定と異なる場合,パフォーマンスが低下する可能性がある。本稿では,個別の類似性と集合の多様性に基づいて効率的な汚染サンプルを選定し,この課題を効果的に解決する,新規なプロキシフリー戦略(PFS)を提案する。提案手法は,いくつかのデータセット,トリガ,ポイズニング率,アーキテクチャ,トレーニングのハイパーパラメータで評価する。実験の結果, PFSは従来のプロキシベース選択手法よりも500倍高速でありながら, より高いバックドア攻撃強度を達成することがわかった。 Poisoning efficiency is a crucial factor in poisoning-based backdoor attacks. Attackers prefer to use as few poisoned samples as possible to achieve the same level of attack strength, in order to remain undetected. Efficient triggers have significantly improved poisoning efficiency, but there is still room for improvement. Recently, selecting efficient samples has shown promise, but it requires a proxy backdoor injection task to find an efficient poisoned sample set, which can lead to performance degradation if the proxy attack settings are different from the actual settings used by the victims. In this paper, we propose a novel Proxy-Free Strategy (PFS) that selects efficient poisoned samples based on individual similarity and set diversity, effectively addressing this issue. We evaluate the proposed strategy on several datasets, triggers, poisoning ratios, architectures, and training hyperparameters. Our experimental results demonstrate that PFS achieves higher backdoor attack strength while being 500x faster than previous proxy-based selection approaches.	翻訳日:2023-06-16 19:55:42 公開日:2023-06-14
# 量子ゼノと反ゼノ効果に基づくコヒーレント制御:コヒーレンスとタイミングの役割 Coherent control based on quantum Zeno and anti-Zeno effects: Role of coherences and timing ( http://arxiv.org/abs/2306.08311v1 ) ライセンス: Link先を確認	Jacob Levitt and Artur F. Izmaylov	(参考訳) 量子ゼノ効果と反ゼノ効果(QZE/AZE)は長い間知られている。多数の離散準位が結合した一般的な量子系では、特定の準位の占有数の測定は、その占有数が他の準位へ移動する過程の加速(AZE)または遅延(QZE)のいずれかにつながる。ここでは,結合した量子状態からの占有数の流れを特定の時刻における測定によってどのように制御できるか,またどの系のパラメータがそれを左右するかを考察する。本稿では,時間依存密度行列摂動理論に基づく量子ゼノダイナミクス解析の枠組みを提案する。この枠組みにより、状態の占有数をコヒーレンスから明確に分離し、QZEまたはAZEの出現を予測することができる。 2つのモデルシステムについて分析する。 1)2つの結合準位と 2)連続体に結合した準位。どちらの場合も、量子コヒーレンスのダイナミクスは重要な役割を担い、摂動的考慮によって射影測定の効果を予測することができる。さらに,密接に関連するコヒーレント制御シナリオ,すなわち系の波動関数を記述するコヒーレント重ね合わせにおいて状態の符号を反転させるユニタリ変換にも考察を拡張した。 The quantum Zeno and anti-Zeno effects (QZE/AZE) have been known for a long time. In a general quantum system with a number of coupled discrete levels, the measurement of a particular level population can lead to either acceleration (i.e., AZE) or retardation (i.e., QZE) of its population transfer to other levels. Here we consider how one can control the population flow from a coupled quantum state by measurement at a particular time, and what system parameters are responsible for that. We propose a framework for the analysis of quantum Zeno dynamics based on time-dependent density matrix perturbation theory. This framework allows us to clearly separate state populations from their coherences and to predict the appearance of either QZE or AZE. We illustrate our analysis on two model systems: 1) two coupled levels and 2) a level coupled to a continuum. In both cases the dynamics of quantum coherences play a crucial role, and perturbative considerations allow us to predict the effect of projective measurements. In addition, we have extended our consideration to a closely related coherent control scenario, a unitary transformation flipping the sign of a state in a coherent superposition describing the system wavefunction.	翻訳日:2023-06-16 19:55:25 公開日:2023-06-14
# TWIGMA: Twitterのメタデータを備えたAI生成画像のデータセット TWIGMA: A dataset of AI-Generated Images with Metadata From Twitter ( http://arxiv.org/abs/2306.08310v1 ) ライセンス: Link先を確認	Yiqun Chen, James Zou	(参考訳) 生成型人工知能(gen-AI)の最近の進歩により、写真リアリスティック写真や芸術的インスピレーション写真が1クリックで生成できるようになった。 DALLEやStableDiffusionといったジェネラルAIモデルの使用方法を検討するためには、AI生成写真に存在するテーマ、内容、バリエーションを理解することが重要である。本稿では,2021年1月から2023年3月までにTwitter上で収集された800,000以上のgen-AIイメージを含む包括的なデータセットであるTWIGMA(TWItter Generative-aiイメージ with MetadatA)を紹介する。 TWIGMAと自然画像と人間のアートワークを比較した結果,gen-AI画像は特徴的特徴を有し,非gen-AI画像と比較した場合,平均的,低変動性を示すことがわかった。さらに,gen-AI画像と自然画像との類似性も明らかになった。 (i)「いいね」の数と逆相関し、 (ii)は、gen-AI創造のインスピレーションとなる人間の画像を特定するために用いられる。最後に、Twitter上でAI生成画像のテーマの経年変化を観察し、ユーザーは複雑な人間の肖像画などの芸術的に洗練されたコンテンツをシェアする一方で、自然の場面や動物のような単純な主題への関心は減少している。我々は,AI生成画像の研究において,TWIGMAがユニークなデータ資源であることを示す。 Recent progress in generative artificial intelligence (gen-AI) has enabled the generation of photo-realistic and artistically-inspiring photos at a single click, catering to millions of users online. To explore how people use gen-AI models such as DALLE and StableDiffusion, it is critical to understand the themes, contents, and variations present in the AI-generated photos. In this work, we introduce TWIGMA (TWItter Generative-ai images with MetadatA), a comprehensive dataset encompassing over 800,000 gen-AI images collected from Jan 2021 to March 2023 on Twitter, with associated metadata (e.g., tweet text, creation date, number of likes). Through a comparative analysis of TWIGMA with natural images and human artwork, we find that gen-AI images possess distinctive characteristics and exhibit, on average, lower variability when compared to their non-gen-AI counterparts. Additionally, we find that the similarity between a gen-AI image and natural images (i) is inversely correlated with the number of likes; and (ii) can be used to identify human images that served as inspiration for the gen-AI creations. 
Finally, we observe a longitudinal shift in the themes of AI-generated images on Twitter, with users increasingly sharing artistically sophisticated content such as intricate human portraits, whereas their interest in simple subjects such as natural scenes and animals has decreased. Our analyses and findings underscore the significance of TWIGMA as a unique data resource for studying AI-generated images.	翻訳日:2023-06-16 19:55:04 公開日:2023-06-14
# ランダムフーリエ特徴を用いたベイズ非線形潜在変数モデリング Bayesian Non-linear Latent Variable Modeling via Random Fourier Features ( http://arxiv.org/abs/2306.08352v1 ) ライセンス: Link先を確認	Michael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt	(参考訳) ガウス過程潜在変数モデル(英: Gaussian process latent variable model、GPLVM)は、非線形次元の減少、行列分解、状態空間モデリングによく用いられる確率論的手法である。 gplvmsの推論は、データがガウス的である場合にのみ計算可能である。さらに、gplvmsの推論は、一般的に、後方不確かさを誤認するオーバーフィッティングや変分近似につながる最大後方点推定を得るために制限されている。本稿では,一般化ベイズ非線形潜在変数モデリングのためのマルコフ連鎖モンテカルロ(mcmc)推定を行う手法を提案する。 GPLVMを任意の観測モデルに一般化するために必要な重要な洞察は、ガウス過程写像におけるカーネル関数をランダムなフーリエ特徴と近似することである。ランダム特徴潜時変数モデル(RFLVM)を用いて,ポアソン,負二項分布,多項分布などの非ガウス観測にGPLVMを一般化できることを示す。一般化されたRFLVMは, 動作キャプチャ, 画像, テキストデータなど, 様々なアプリケーション上で, 最先端の潜伏変数モデルと同等に動作し, 潜伏構造を推定し, 複雑なデータセットの欠落データを出力する。 The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). 
Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.	翻訳日:2023-06-16 19:49:00 公開日:2023-06-14
# コード事前学習モデルに対するマルチターゲットバックドア攻撃 Multi-target Backdoor Attacks for Code Pre-trained Models ( http://arxiv.org/abs/2306.08350v1 ) ライセンス: Link先を確認	Yanzhou Li, Shangqing Liu, Kangjie Chen, Xiaofei Xie, Tianwei Zhang and Yang Liu	(参考訳) ニューラルコードモデルのバックドア攻撃は、コードインテリジェンスの進歩により、かなりの注目を集めている。しかし、既存の作業の多くは、コードに関連する下流タスクのタスク固有のデータにトリガーを挿入することで、攻撃範囲を制限している。さらに、事前訓練されたモデルに対する攻撃の大半は、タスクを理解するために設計されている。本稿では,コード事前学習モデルに対するタスク非依存のバックドア攻撃を提案する。我々のバックドアモデルは、下流のコード理解と生成タスクのマルチターゲット攻撃をサポートする2つの学習戦略(Poisoned Seq2Seq学習とトークン表現学習)で事前訓練されている。デプロイフェーズでは、ターゲットの攻撃を達成するために設計したトリガーによって、被害者モデルに埋め込まれたバックドアを起動することができる。 7つのデータセット上で2つのコード理解タスクと3つのコード生成タスクに対するアプローチを評価した。大規模な実験により、我々のアプローチは、コードに関連する下流タスクを効果的に、そして、密かに攻撃できることを示した。 Backdoor attacks for neural code models have gained considerable attention due to the advancement of code intelligence. However, most existing works insert triggers into task-specific data for code-related downstream tasks, thereby limiting the scope of attacks. Moreover, the majority of attacks for pre-trained models are designed for understanding tasks. In this paper, we propose task-agnostic backdoor attacks for code pre-trained models. Our backdoored model is pre-trained with two learning strategies (i.e., Poisoned Seq2Seq learning and token representation learning) to support the multi-target attack of downstream code understanding and generation tasks. During the deployment phase, the implanted backdoors in the victim models can be activated by the designed triggers to achieve the targeted attack. We evaluate our approach on two code understanding tasks and three code generation tasks over seven datasets. Extensive experiments demonstrate that our approach can effectively and stealthily attack code-related downstream tasks.	翻訳日:2023-06-16 19:48:33 公開日:2023-06-14
# 生成的深層学習はフェルミオン系の集団変数を明らかにする Generative deep-learning reveals collective variables of Fermionic systems ( http://arxiv.org/abs/2306.08348v1 ) ライセンス: Link先を確認	Raphaël-David Lasseri, David Regnier, Mikaël Frosini, Marc Verriere, Nicolas Schunck	(参考訳) タンパク質の折りたたみから核分裂までの複雑な過程は、いくつかの集団変数でパラメータ化された低次元反応経路に従うことが多い。核理論において、平均場描像における核密度の形状に関連する変数は、中性子と陽子の大振幅集団運動を記述する鍵となる。これらの自由度によって張られる断熱エネルギーランドスケープを探索すると可能な反応チャネルが明らかになり、この縮約空間でのダイナミクスをシミュレートすることでそれぞれの確率が得られる。残念ながら、この理論の枠組みは、系が集団変数に関して量子相転移に遭遇するたびに崩壊する。本稿では,核過程を高度に表現し,そのフェルミオン波動関数への微分可能写像を保証しながら,新たな集団変数を構築できる生成的深層学習アルゴリズムを提案する。この集団空間内では、核はその断熱量子相の一方からもう一方へ、ポテンシャルエネルギー障壁を越えることを代償に連続的に進化することができる。このアプローチは、密度汎関数理論で記述された電子系を包含する単一のスレーター行列式によって記述された任意のフェルミオン系に適用できる。 Complex processes ranging from protein folding to nuclear fission often follow a low-dimensional reaction path parameterized in terms of a few collective variables. In nuclear theory, variables related to the shape of the nuclear density in a mean-field picture are key to describing the large amplitude collective motion of the neutrons and protons. Exploring the adiabatic energy landscape spanned by these degrees of freedom reveals the possible reaction channels while simulating the dynamics in this reduced space yields their respective probabilities. Unfortunately, this theoretical framework breaks down whenever the system encounters a quantum phase transition with respect to the collective variables. Here we propose a generative-deep-learning algorithm capable of building new collective variables highly representative of a nuclear process while ensuring a differentiable mapping to its Fermionic wave function. Within this collective space, the nucleus can evolve continuously from one of its adiabatic quantum phases to the other at the price of crossing a potential energy barrier. This approach applies to any Fermionic system described by a single Slater determinant, which encompasses electronic systems described within the density functional theory.	翻訳日:2023-06-16 19:48:20 公開日:2023-06-14
# UIERL:水中画像強調のための外部表現学習ネットワーク UIERL: Internal-External Representation Learning Network for Underwater Image Enhancement ( http://arxiv.org/abs/2306.08344v1 ) ライセンス: Link先を確認	Zhengyong Wang, Liquan Shen, Yihan Yu and Yuan Hui	(参考訳) 水中画像強調(uie)は有意義だが困難な課題であり,近年,学習に基づくuie手法が数多く提案されている。多くの進展がみられたが,(1)水中撮像過程による水中画像の局所的品質差は,特に風景深度の異なる領域において有意な差がある。しかし, 従来の手法では, 水中画像の内部特性は無視されており, 性能は劣っている。(2) 取得手法の特異性のため, 水中画像取得ツールは通常, 同一又は類似のシーンで複数の画像をキャプチャする。したがって, 実用化に資する水中画像は, 高い相関関係にある。しかし、単一の画像を処理する場合、既存の手法では、関連画像が提供するリッチな外部情報を考慮していない。彼らのパフォーマンスにはまだ改善の余地がある。これら2つの側面を動機として,UIEタスクを内部情報と外部情報とを同時に実行するための,UIERL(internal-external representation learning)ネットワークを提案する。内部表現学習段階において、シーン深度に基づく領域セグメンテーションを含む、新しい深度に基づく領域特徴誘導網を設計し、異なる品質レベルの領域を検知し、次いで領域ワイド空間エンコーダモジュールを設計する。異なる品質の地域に対して地域的特徴学習を行うことで、ネットワークはグローバルな特徴の効果的なガイダンスを提供し、画像内差分エンハンスメントのガイドとなる。外部表現学習段階において,まず,関連画像中のリッチな外部情報をマイニングする外部情報抽出ネットワークを提案する。次に、提案する外部アシスト-内部モジュールと内部アシスト-eを介して、内部および外部特徴が相互に相互作用する。 Underwater image enhancement (UIE) is a meaningful but challenging task, and many learning-based UIE methods have been proposed in recent years. Although much progress has been made, these methods still exist two issues: (1) There exists a significant region-wise quality difference in a single underwater image due to the underwater imaging process, especially in regions with different scene depths. However, existing methods neglect this internal characteristic of underwater images, resulting in inferior performance; (2) Due to the uniqueness of the acquisition approach, underwater image acquisition tools usually capture multiple images in the same or similar scenes. Thus, the underwater images to be enhanced in practical usage are highly correlated. However, when processing a single image, existing methods do not consider the rich external information provided by the related images. There is still room for improvement in their performance. 
Motivated by these two aspects, we propose a novel internal-external representation learning (UIERL) network to better perform UIE tasks with internal and external information, simultaneously. In the internal representation learning stage, a new depth-based region feature guidance network is designed, including a region segmentation based on scene depth to sense regions with different quality levels, followed by a region-wise space encoder module. With performing region-wise feature learning for regions with different quality separately, the network provides an effective guidance for global features and thus guides intra-image differentiated enhancement. In the external representation learning stage, we first propose an external information extraction network to mine the rich external information in the related images. Then, internal and external features interact with each other via the proposed external-assist-internal module and internal-assist-e	翻訳日:2023-06-16 19:48:03 公開日:2023-06-14
# 調和ポテンシャルにおける量子粒子の繰り返し測定について On Repeated Measurements of a Quantum Particle in a Harmonic Potential ( http://arxiv.org/abs/2306.08342v1 ) ライセンス: Link先を確認	Filip Gampel, Mariusz Gajda	(参考訳) 位置と運動量が繰り返し監視される調和ポテンシャルにおける量子粒子の進化を研究する。測定装置のバックアクションが考慮される。本モデルは、正の演算子値測度に対応する一般化計測を用いる。測定すると、粒子の波動関数は観測結果に応じて検出可能な状態の1つに投影されると仮定する。我々は、これらの測定後の状態がガウス波束を動かすことを選択した。波動関数量子モンテカルロ形式は粒子の単一の量子軌道をシミュレートするために用いられる。本研究では, 粒子の位置と運動量の分散を詳細に観察し, 古典的軌道がどのように出現するかを示す。 We study evolution of a quantum particle in a harmonic potential whose position and momentum are repeatedly monitored. A back-action of measuring devices is accounted for. Our model utilizes a generalized measurement corresponding to the Positive Operator-Valued Measure. We assume that upon measurement the particle's wavefunction is projected onto one of possible detector states depending on the observed result. We chose these post-measurement states to be moving Gaussian wavepackets. The Wave Function Quantum Monte-Carlo formalism is used to simulate single quantum trajectories of the particle. We show how classical trajectories emerge in course of observation and study in detail dispersion of position and momentum of the particle.	翻訳日:2023-06-16 19:47:33 公開日:2023-06-14
# 畳み込みニューラルネットワークにおけるグローバルローカル処理 Global-Local Processing in Convolutional Neural Networks ( http://arxiv.org/abs/2306.08336v1 ) ライセンス: Link先を確認	Zahra Rezvani, Soroor Shekarizeh, Mohammad Sabokrou	(参考訳) 畳み込みニューラルネットワーク(CNN)は、画像処理の課題において優れたパフォーマンスを達成した。実際、cnnはマイクロレベルのヒト脳構造(人工ニューロン)を模倣している。同時に、マクロアーキテクチャ(ハイレベル認知)における人間の自然な視覚知覚の模倣から距離を置いている。近年,CNNは局所的な特徴に非常に偏りがあり,入力のグローバルな側面を検知できないことが研究されている。しかしながら、この文献はこの問題に関する限られた手がかりを提供している。そこで本研究では,人間の瞳孔の無意識行動に触発された単純かつ効果的な解法を提案する。我々は,Global Advantage Stream (GAS)と呼ばれるシンプルなモジュールを考案し,入力サンプルの全体的特徴(グローバル機能)を学習し,捉える。次に,グローバル/ローカル処理(glp)モデルと呼ばれるプラグ・アンド・プレイコンポーネントとして,ガスの特徴をcnnネットワークと組み合わせた。実験の結果,このストリームは計算量や時間負荷を増加させることで精度が向上し,ネットワークを敵の攻撃に対してより堅牢にすることを確認した。さらに、モデルの解釈を調べることで、健康な人間の知覚システムに似た、より包括的な表現を学習できることが分かる。 Convolutional Neural Networks (CNNs) have achieved outstanding performance on image processing challenges. Actually, CNNs imitate the typically developed human brain structures at the micro-level (Artificial neurons). At the same time, they distance themselves from imitating natural visual perception in humans at the macro architectures (high-level cognition). Recently it has been investigated that CNNs are highly biased toward local features and fail to detect the global aspects of their input. Nevertheless, the literature offers limited clues on this problem. To this end, we propose a simple yet effective solution inspired by the unconscious behavior of the human pupil. We devise a simple module called Global Advantage Stream (GAS) to learn and capture the holistic features of input samples (i.e., the global features). Then, the GAS features were combined with a CNN network as a plug-and-play component called the Global/Local Processing (GLP) model. The experimental results confirm that this stream improves the accuracy with an insignificant additional computational/temporal load and makes the network more robust to adversarial attacks. 
Furthermore, investigating the interpretation of the model shows that it learns a more holistic representation similar to the perceptual system of healthy humans.	翻訳日:2023-06-16 19:47:26 公開日:2023-06-14
# 生存予測のためのグローバル構造整合性を有するマルチモーダル最適輸送型コアテンショントランス Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction ( http://arxiv.org/abs/2306.08330v1 ) ライセンス: Link先を確認	Yingxue Xu and Hao Chen	(参考訳) 生存予測(Survival prediction)は、死のランク付けリスクを予測することを目的とした複雑な順序回帰タスクであり、一般的には、組織学とゲノムデータの統合の恩恵を受ける。病理学とゲノム学による共同学習の進展にもかかわらず、既存の方法はまだ困難な問題に悩まされている。 1) 病理像の大きさが大きいため, ギガピクセル全体のスライド画像(wsis)を効果的に表現することは困難である。 2) 組織学における腫瘍微小環境(TME)内の相互作用は生存分析に不可欠である。現在のアプローチは、ヒストロジーとゲノムデータの間のコアテンションを通じてこれらの相互作用をモデル化しようとするが、それらはモダリティ間の密集した局所的類似性のみに焦点をあてる。そこで本稿では,グローバル構造一貫性を持つ多モード最適トランスポートベースコアテンショントランスフォーマティブフレームワークを提案する。このフレームワークでは,ggapixel wsiを表すために,wsiのパッチと遺伝子組込みをマッチさせるために最適なトランスポート(ot)を適用する。さらに重要なことは、OTベースのコアテンションは、生存予測のためにTME内の構造的相互作用を効果的に捉えるグローバルな認識を提供する。 OTの計算複雑性の増大を克服するため,不均衡なミニバッチOTで元のOTを近似することにより,WSIパッチのマイクロバッチに対する堅牢かつ効率的な実装を提案する。大規模実験により,5つのベンチマークデータセット上での手法の優位性を示した。コードはリリースされている。 Survival prediction is a complicated ordinal regression task that aims to predict the ranking risk of death, which generally benefits from the integration of histology and genomic data. Despite the progress in joint learning from pathology and genomics, existing methods still suffer from challenging issues: 1) Due to the large size of pathological images, it is difficult to effectively represent the gigapixel whole slide images (WSIs). 2) Interactions within tumor microenvironment (TME) in histology are essential for survival analysis. Although current approaches attempt to model these interactions via co-attention between histology and genomic data, they focus on only dense local similarity across modalities, which fails to capture global consistency between potential structures, i.e. TME-related interactions of histology and co-expression of genomic data. 
To address these challenges, we propose a Multimodal Optimal Transport-based Co-Attention Transformer framework with global structure consistency, in which optimal transport (OT) is applied to match patches of a WSI and gene embeddings for selecting informative patches to represent the gigapixel WSI. More importantly, OT-based co-attention provides a global awareness to effectively capture structural interactions within TME for survival prediction. To overcome the high computational complexity of OT, we propose a robust and efficient implementation over micro-batches of WSI patches by approximating the original OT with unbalanced mini-batch OT. Extensive experiments show the superiority of our method on five benchmark datasets compared to the state-of-the-art methods. The code is released.	翻訳日:2023-06-16 19:47:07 公開日:2023-06-14
# r-drop構造を有する改良されたコンフォーマントエンドツーエンド音声認識モデルに関する研究 Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure ( http://arxiv.org/abs/2306.08329v1 ) ライセンス: Link先を確認	Weidong Ji, Shijie Zan, Guohui Zhou, and Xu Wang	(参考訳) 深層学習におけるエンド・ツー・エンド音声認識モデルにおける一般化能力の低下に対処するため,R-drop構造を組み込んだコンフォーマーベース音声認識モデル"Conformer-R"を提案する。このモデルは、音声認識で有望な結果を示す適合モデルとr-drop構造を組み合わせたものである。これにより、R-drop構造を用いることで、局所的およびグローバルな音声情報の両方を効果的にモデル化し、過度な適合を低減できる。これにより、モデルの一般化能力が向上し、全体的な認識効率が向上する。このモデルは、まず一般ドメイン適応のためにAishell1とWenetspeechデータセットで事前訓練され、その後、コンピュータ関連のオーディオデータに基づいて微調整された。 LAS や Wenet といった古典モデルとの比較テストは同じテストセットで実施され、Conformer-R モデルの一般化を効果的に改善する能力を示した。 To address the issue of poor generalization ability in end-to-end speech recognition models within deep learning, this study proposes a new Conformer-based speech recognition model called "Conformer-R" that incorporates the R-drop structure. This model combines the Conformer model, which has shown promising results in speech recognition, with the R-drop structure. By doing so, the model is able to effectively model both local and global speech information while also reducing overfitting through the use of the R-drop structure. This enhances the model's ability to generalize and improves overall recognition efficiency. The model was first pre-trained on the Aishell1 and Wenetspeech datasets for general domain adaptation, and subsequently fine-tuned on computer-related audio data. Comparison tests with classic models such as LAS and Wenet were performed on the same test set, demonstrating the Conformer-R model's ability to effectively improve generalization.	翻訳日:2023-06-16 19:46:38 公開日:2023-06-14
# out-of-distribution predictionのための分布シフトインバージョン Distribution Shift Inversion for Out-of-Distribution Prediction ( http://arxiv.org/abs/2306.08328v1 ) ライセンス: Link先を確認	Runpeng Yu, Songhua Liu, Xingyi Yang, Xinchao Wang	(参考訳) 機械学習学会は、統一予測器や不変特徴表現を探索することによって、トレーニングとテスト分布の間の分散シフトに対処する、無数の分散(ood)アルゴリズムの出現を目撃している。しかし、トレーニング期間中に試験分布が不有効であることや、トレーニングとテスト分布間の分布トランスレータマッピングのトレーニングが不可能であることから、未確認の試験セットにおける分布シフトを直接緩和するタスクはめったに検討されない。本稿では,分散翻訳訓練における分散テストの必要性を回避し,分散翻訳をood予測に役立てる方法について検討する。そこで本研究では, 予測モデルに入力される前に, ood試験試料をガウス雑音と線形結合し, 音源分布にのみ訓練された拡散モデルを用いて, トレーニング分布に戻す可搬分布シフトインバージョンアルゴリズムを提案する。理論解析により本手法の有効性が明らかになった。複数領域の一般化データセットと単一領域の一般化データセットを併用した実験結果から,OoDアルゴリズムを幅広く使用する場合,本手法は汎用的な性能向上をもたらすことが示された。 Machine learning society has witnessed the emergence of a myriad of Out-of-Distribution (OoD) algorithms, which address the distribution shift between the training and the testing distribution by searching for a unified predictor or invariant feature representation. However, the task of directly mitigating the distribution shift in the unseen testing set is rarely investigated, due to the unavailability of the testing distribution during the training phase and thus the impossibility of training a distribution translator mapping between the training and testing distribution. In this paper, we explore how to bypass the requirement of testing distribution for distribution translator training and make the distribution translation useful for OoD prediction. We propose a portable Distribution Shift Inversion algorithm, in which, before being fed into the prediction model, the OoD testing samples are first linearly combined with additional Gaussian noise and then transferred back towards the training distribution using a diffusion model trained only on the source distribution. Theoretical analysis reveals the feasibility of our method. 
Experimental results, on both multiple-domain generalization datasets and single-domain generalization datasets, show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.	翻訳日:2023-06-16 19:46:21 公開日:2023-06-14
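The noising step described in the entry above (an OoD test sample linearly combined with Gaussian noise before a diffusion model maps it back) might look like the sketch below. The variance-preserving coefficients `sqrt(alpha)` and `sqrt(1 - alpha)` are an assumption borrowed from common diffusion formulations, since the abstract does not state the exact combination. `denoise` is a hypothetical placeholder for a diffusion model trained only on the source distribution.

```python
import math
import random


def noise_ood_sample(x, alpha, rng):
    """Linearly combine a sample with standard Gaussian noise.

    Assumed form: sqrt(alpha) * x + sqrt(1 - alpha) * eps, eps ~ N(0, 1).
    alpha = 1.0 keeps the sample unchanged; alpha = 0.0 yields pure noise.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    keep = math.sqrt(alpha)
    mix = math.sqrt(1.0 - alpha)
    return [keep * xi + mix * rng.gauss(0.0, 1.0) for xi in x]


def invert_shift(x, alpha, denoise, rng):
    """Noise the OoD sample, then hand it to a denoiser.

    `denoise` is a hypothetical callable standing in for the reverse
    diffusion process; the result would be fed to the prediction model.
    """
    noisy = noise_ood_sample(x, alpha, rng)
    return denoise(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    identity = lambda v: v  # stand-in for a trained diffusion model
    print(invert_shift([0.5, -1.0, 2.0], 0.7, identity, rng))
```

The noise level is the main design choice here. More noise removes more of the shifted distribution's structure, but it also removes more of the sample's own content.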
# Histogram Oriented Gradient Based Support Vector Machine を用いた遅発性トマト病の早期診断 Early Detection of Late Blight Tomato Disease using Histogram Oriented Gradient based Support Vector Machine ( http://arxiv.org/abs/2306.08326v1 ) ライセンス: Link先を確認	M. Ishaq, M. Waqas	(参考訳) トマトは地球上で最も重要な果物の1つである。あらゆる国の農業生産において重要な役割を担っている。本研究はトマトにおける遅発性病の早期発見のための新しいスマート手法を提案する。本研究は,フィールドからの画像を追加してデータセット(Plant Villageデータセット)を改善し,遅発性トマト病のリアルタイム検出のためのサポートベクターマシン(SVM)とヒストグラム指向勾配(HOG)からなるハイブリッドアルゴリズムを提案する。目的は,遅発性トマト葉病の早期発見のためのHOGに基づくSVMモデルを提案し,MSE,正解率,適合率,再現率の観点から,提案モデルの性能を決定木やKNNと比較することである。農業における先進技術の統合は産業に革命をもたらす可能性があり、より効率的で持続可能で収益性の高いものにする。トマト病の早期発見に関する本研究は、スマート農業の重要性の高まり、気候に配慮した農業の必要性、天然資源をより効率的に活用する必要性の高まり、収穫高向上の需要に寄与する。提案したSVMとHOGのハイブリッドアルゴリズムは,トマトの遅発性病の早期発見に有意な可能性を秘めている。決定木とKNNアルゴリズムに対する提案モデルの性能比較は,将来のアプリケーションに最適なアルゴリズムを選択するのに役立つ可能性がある。この研究は、農家が作物の収量と品質を最適化し、農業慣行の環境への影響を減らし、データ駆動による決定を下すのに役立つ。 The tomato is one of the most important fruits on earth. It plays an important and useful role in the agricultural production of any country. This research proposes a novel smart technique for early detection of late blight disease in tomatoes. This work improves the dataset with additional images from the field (the Plant Village dataset) and proposes a hybrid algorithm composed of support vector machines (SVM) and histogram of oriented gradients (HOG) for real-time detection of late blight tomato disease. The objectives are to propose a HOG-based SVM model for early detection of late blight tomato leaf disease and to compare its performance, in terms of MSE, accuracy, precision, and recall, with Decision Tree and KNN. The integration of advanced technology in agriculture has the potential to revolutionize the industry, making it more efficient, sustainable, and profitable. 
This research work on the early detection of tomato diseases contributes to the growing importance of smart farming, the need for climate-smart agriculture, the rising need to utilize natural resources more efficiently, and the demand for higher crop yields. The proposed hybrid algorithm of SVM and HOG has significant potential for the early detection of late blight disease in tomato plants. The performance comparison of the proposed model against decision tree and KNN algorithms may assist in selecting the best algorithm for future applications. The research work can help farmers make data-driven decisions to optimize crop yield and quality while also reducing the environmental impact of farming practices.	翻訳日:2023-06-16 19:46:00 公開日:2023-06-14
# NodeFormer: ノード分類のためのスケーラブルなグラフ構造学習トランスフォーマー NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification ( http://arxiv.org/abs/2306.08385v1 ) ライセンス: Link先を確認	Qitian Wu, Wentao Zhao, Zenan Li, David Wipf, Junchi Yan	(参考訳) グラフニューラルネットワークは、相互接続データによる学習のために広く研究されている。これにもかかわらず、近年の証拠は、GNNが過剰なスカッシング、ヘテロフィリー、長距離依存関係の扱い、エッジの不完全性、特にグラフの完全欠如に関連する欠陥を明らかにしている。メッセージパッシングのための新しい適応トポロジを学習することが有効な解決策であるが、二次複雑性に関する問題は、大規模ネットワークにおけるスケーラビリティと精度の同時保証を妨げる。本稿では,大規模グラフ上でノード分類を行うTransformerスタイルネットワーク NodeFormer の構成要素として,任意のノード間のノード信号を効率的に伝搬するための新しい全ペアメッセージパッシング方式を提案する。具体的には、効率的な計算はカーネル化されたGumbel-Softmax演算子によって実現され、アルゴリズムの複雑さをノード数に対して線形に抑えつつ、潜在グラフ構造を大きな、潜在的に完全連結なグラフから微分可能な方法で学習する。設計の正当化として、付随する理論も提供します。広範な実験により、グラフ上のノード分類(最大200万ノード)や、入力グラフが欠落しているグラフ強調アプリケーション(画像分類など)を含む様々なタスクにおいて、この手法の有望な有効性が示された。 Graph neural networks have been extensively studied for learning with inter-connected data. Despite this, recent evidence has revealed GNNs' deficiencies related to over-squashing, heterophily, handling long-range dependencies, edge incompleteness and particularly, the absence of graphs altogether. While a plausible solution is to learn new adaptive topology for message passing, issues concerning quadratic complexity hinder simultaneous guarantees for scalability and precision in large networks. In this paper, we introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes, as an important building block for a pioneering Transformer-style network for node classification on large graphs, dubbed NodeFormer. Specifically, the efficient computation is enabled by a kernelized Gumbel-Softmax operator that reduces the algorithmic complexity to linearity w.r.t. node numbers for learning latent graph structures from large, potentially fully-connected graphs in a differentiable manner. We also provide accompanying theory as justification for our design. 
Extensive experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs (with up to 2M nodes) and graph-enhanced applications (e.g., image classification) where input graphs are missing.	翻訳日:2023-06-16 19:39:51 公開日:2023-06-14
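For readers unfamiliar with the Gumbel-Softmax operator mentioned in the NodeFormer entry, the sketch below shows the plain, non-kernelized relaxation. It adds Gumbel noise to a node's attention logits and applies a temperature-scaled softmax, which gives a differentiable stand-in for sampling a neighbor. It does not reproduce the paper's kernelized linear-complexity operator, and the function name and example logits are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed categorical sample via the Gumbel-Softmax trick.

    logits: unnormalized scores (e.g. one node's affinity to each candidate
    neighbor); temperature > 0 controls how close the output is to one-hot.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scores = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        scores.append((logit + gumbel) / temperature)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    print(gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng))
```

Lower temperatures push the output toward a one-hot choice of edge, while higher temperatures spread the weight across candidates. Applied to all node pairs, this plain form costs quadratic time, which is the bottleneck the abstract says the kernelized operator avoids.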
# speechglue: 自己教師付き音声モデルが言語知識をいかにうまく捉えられるか? SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? ( http://arxiv.org/abs/2306.08374v1 ) ライセンス: Link先を確認	Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma	(参考訳) 音声表現のための自己教師付き学習(SSL)は、音声認識や話者認識など、様々な下流タスクにうまく適用されている。最近では、音声SSLモデルも音声言語理解タスクの進行に有用であることが示され、SSLモデルが音響だけでなく言語情報も学習できる可能性が示唆されている。本稿では,音声ssl技術が言語知識をうまく捉えることができるかを明らかにすることを目的とする。本研究では,汎用言語理解評価(GLUE)ベンチマークの音声バージョンであるSpeechGLUEを紹介する。 GLUEは様々な自然言語理解タスクから構成されるため、SpeechGLUEは音声SSLモデルの言語能力の程度を解明することができる。実験では、テキストベースのSSLモデルに劣らず、音声SSLモデルはベースラインよりも優れた性能を示し、ラベルなしの音声データからある程度の言語知識を得られることを示唆している。 Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken language understanding tasks, implying that the SSL models have the potential to learn not only acoustic but also linguistic information. In this paper, we aim to clarify if speech SSL techniques can well capture linguistic knowledge. For this purpose, we introduce SpeechGLUE, a speech version of the General Language Understanding Evaluation (GLUE) benchmark. Since GLUE comprises a variety of natural language understanding tasks, SpeechGLUE can elucidate the degree of linguistic ability of speech SSL models. Experiments demonstrate that speech SSL models, although inferior to text-based SSL models, perform better than baselines, suggesting that they can acquire a certain amount of general linguistic knowledge from just unlabeled speech data.	翻訳日:2023-06-16 19:39:26 公開日:2023-06-14
# アスペクト感情三重項抽出のための意味的拡張二重エンコーダ A semantically enhanced dual encoder for aspect sentiment triplet extraction ( http://arxiv.org/abs/2306.08373v1 ) ライセンス: Link先を確認	Baoxing Jiang, Shehui Liang, Peiyu Liu, Kaifang Dong, Hongye Li	(参考訳) 感情三重項抽出(ASTE)は、感情三重項を包括的に識別することを目的としたアスペクトベース感情分析(ABSA)の重要なサブタスクである。従来の研究は、革新的なテーブル充填戦略によるASTEの強化に重点を置いてきた。しかし、これらのアプローチはしばしば言語表現の多面的性質を見落とし、アスペクトと意見の間の貴重な相互作用情報を失うことになる。この制限に対処するために,BERTをベースとした基本エンコーダと,Bi-LSTMネットワークとGCN(Graph Convolutional Network)で構成される特定のエンコーダの両方を利用するフレームワークを提案する。基本エンコーダは言語表現の表面レベルセマンティクスをキャプチャし、特定のエンコーダは構文情報や語彙情報を含むより深いセマンティクスを抽出する。コメントの係り受け木をモデル化し,単語の係り受けや位置情報を考慮し,文の根底にある意図とより関連のある意味を捉えることを目的とする。相互作用戦略は、2つのエンコーダが学んだ意味を組み合わせ、複数の視点の融合を可能にし、アスペクト-オピニオン関係のより包括的な理解を促進する。ベンチマークデータセットを用いた実験により,提案フレームワークの最先端性能を実証した。 Aspect sentiment triplet extraction (ASTE) is a crucial subtask of aspect-based sentiment analysis (ABSA) that aims to comprehensively identify sentiment triplets. Previous research has focused on enhancing ASTE through innovative table-filling strategies. However, these approaches often overlook the multi-perspective nature of language expressions, resulting in a loss of valuable interaction information between aspects and opinions. To address this limitation, we propose a framework that leverages both a basic encoder, primarily based on BERT, and a particular encoder comprising a Bi-LSTM network and graph convolutional network (GCN ). The basic encoder captures the surface-level semantics of linguistic expressions, while the particular encoder extracts deeper semantics, including syntactic and lexical information. By modeling the dependency tree of comments and considering the part-of-speech and positional information of words, we aim to capture semantics that are more relevant to the underlying intentions of the sentences. 
An interaction strategy combines the semantics learned by the two encoders, enabling the fusion of multiple perspectives and facilitating a more comprehensive understanding of aspect-opinion relationships. Experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our proposed framework.	翻訳日:2023-06-16 19:39:10 公開日:2023-06-14
# 統一スペクトル空間特徴集合によるハイパースペクトル像の物体検出 Object Detection in Hyperspectral Image via Unified Spectral-Spatial Feature Aggregation ( http://arxiv.org/abs/2306.08370v1 ) ライセンス: Link先を確認	Xiao He, Chang Tang, Xinwang Liu, Wei Zhang, Kun Sun, Jiangfeng Xu	(参考訳) 深層学習に基づくハイパースペクトル画像(HSI)の分類と物体検出技術は,画像コンテンツ解析,解釈,より広いHSI応用において重要な役割を担っているため,注目されている。しかし、現在のハイパースペクトルオブジェクト検出アプローチは、主にスペクトル情報または空間情報のいずれかを強調し、これら2つの側面間の貴重な相補関係を見落としている。本研究では,ハイパースペクトル画像に固有の豊富なスペクトル情報と空間補完情報を効果的に活用する,新しい Spectral-Spatial Aggregation (S2ADet) オブジェクト検出器を提案する。 S2ADetは、ハイパースペクトル情報デカップリング(HID)モジュールと、2ストリーム特徴抽出ネットワークと1ステージ検出ヘッドとを備える。 HIDモジュールは、帯域選択と主成分分析によりスペクトルおよび空間情報を集約することによりハイパースペクトル画像を処理する。得られた空間的およびスペクトル的集約情報に基づいて,スペクトル空間的特徴を相互作用する特徴集約2ストリームネットワークを提案する。さらに、既存のデータベースの制限に対処するために、HOD3Kという、さまざまな実世界のシーンでキャプチャされた3,242のハイパースペクトルイメージを含む、広範なデータセットに注釈を付け、3つのオブジェクトクラスを包含する。これらの画像は512x256ピクセルの解像度を持ち、470nmから620nmまでの16バンドをカバーしている。 2つのデータセットに関する総合的な実験は、S2ADetが既存の最先端の手法を超え、堅牢で信頼性の高い結果が得られることを示した。この作業のデモコードとデータセットは、 https://github.com/hexiao-cs/S2ADet で公開されている。 Deep learning-based hyperspectral image (HSI) classification and object detection techniques have gained significant attention due to their vital role in image content analysis, interpretation, and wider HSI applications. However, current hyperspectral object detection approaches predominantly emphasize either spectral or spatial information, overlooking the valuable complementary relationship between these two aspects. In this study, we present a novel Spectral-Spatial Aggregation (S2ADet) object detector that effectively harnesses the rich spectral and spatial complementary information inherent in hyperspectral images. S2ADet comprises a hyperspectral information decoupling (HID) module, a two-stream feature extraction network, and a one-stage detection head. 
The HID module processes hyperspectral images by aggregating spectral and spatial information via band selection and principal components analysis, consequently reducing redundancy. Based on the acquired spatial and spectral aggregation information, we propose a feature aggregation two-stream network for interacting spectral-spatial features. Furthermore, to address the limitations of existing databases, we annotate an extensive dataset, designated as HOD3K, containing 3,242 hyperspectral images captured across diverse real-world scenes and encompassing three object classes. These images possess a resolution of 512x256 pixels and cover 16 bands ranging from 470 nm to 620 nm. Comprehensive experiments on two datasets demonstrate that S2ADet surpasses existing state-of-the-art methods, achieving robust and reliable results. The demo code and dataset of this work are publicly available at https://github.com/hexiao-cs/S2ADet.	翻訳日:2023-06-16 19:38:49 公開日:2023-06-14
# T5-SR: セマンティックパーシングのための統一Seq-to-Seqデコーディング戦略 T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing ( http://arxiv.org/abs/2306.08368v1 ) ライセンス: Link先を確認	Yuntao Li and Zhenpeng Su and Yutian Li and Hanchu Zhang and Sirui Wang and Wei Wu and Yan Zhang	(参考訳) 自然言語クエリをseq2seq形式でsqlに変換することが近年注目を集めている。しかし、抽象シンタクティックツリーベースのSQL生成と比較すると、セq2seqセマンティックパーザは、スキーマ情報予測の質の低下や自然言語クエリとSQL間のセマンティックコヒーレンス不足など、多くの課題に直面している。本稿では,上記の問題点を分析し,srと呼ばれる新たな中間表現ssqlとスコア再推定器を用いた再ランキング法を用いて,上記の障害を解決するseq2seq指向の復号戦略を提案する。実験の結果,提案手法の有効性が示され,T5-SR-3b はスパイダーデータセット上で新たな最先端結果を得ることができた。 Translating natural language queries into SQLs in a seq2seq manner has attracted much attention recently. However, compared with abstract-syntactic-tree-based SQL generation, seq2seq semantic parsers face much more challenges, including poor quality on schematical information prediction and poor semantic coherence between natural language queries and SQLs. This paper analyses the above difficulties and proposes a seq2seq-oriented decoding strategy called SR, which includes a new intermediate representation SSQL and a reranking method with score re-estimator to solve the above obstacles respectively. Experimental results demonstrate the effectiveness of our proposed techniques and T5-SR-3b achieves new state-of-the-art results on the Spider dataset.	翻訳日:2023-06-16 19:38:15 公開日:2023-06-14
# SaliencyCut:Open-set Fine-Grained Anomaly Detectionのための可塑性異常の増大 SaliencyCut: Augmenting Plausible Anomalies for Open-set Fine-Grained Anomaly Detection ( http://arxiv.org/abs/2306.08366v1 ) ライセンス: Link先を確認	Jianan Ye, Yijie Hu, Xi Yang, Qiu-Feng Wang, Chao Huang, Kaizhu Huang	(参考訳) オープンセットのきめ細かい異常検出は、トレーニング中に見えなかったような異常を検出するために、識別可能なきめ細かい特徴を学習する必要がある難しいタスクである。安価で効果的なアプローチとして、データ拡張は、そのようなモデルのトレーニングを改善するために擬似異常を作成するために広く使われている。拡張手法の最近の知恵は、ランダムな擬似インスタンスの生成に焦点が当てられており、これにより、拡張インスタンスと異常が混ざり合ったり、典型的な異常範囲から外れたりする可能性がある。この問題に対処するため,本論文では,疑似だがより一般的な異常を発生させるために,サリエンシー誘導型データ拡張手法であるsaliencycutを提案する。さらに,各サンプルの異常スコアを学習するために,正規および異常学習ヘッドからなる2頭学習戦略を展開した。理論的解析により、このメカニズムはより扱いやすく、データログライクな下限を提供することが示された。次に、各サンプルから微細な異常特徴を抽出・評価し、異常事例の識別表現の学習を容易にするために、異常学習ヘッドにパッチワイド残余モジュールを新たに設計する。 6つの実世界の異常検出データセットで行った大規模な実験は、様々な条件下でのベースラインや他の最先端手法に対する我々の手法の優位性を実証している。 Open-set fine-grained anomaly detection is a challenging task that requires learning discriminative fine-grained features to detect anomalies that were even unseen during training. As a cheap yet effective approach, data augmentation has been widely used to create pseudo anomalies for better training of such models. Recent wisdom of augmentation methods focuses on generating random pseudo instances that may lead to a mixture of augmented instances with seen anomalies, or out of the typical range of anomalies. To address this issue, we propose a novel saliency-guided data augmentation method, SaliencyCut, to produce pseudo but more common anomalies which tend to stay in the plausible range of anomalies. Furthermore, we deploy a two-head learning strategy consisting of normal and anomaly learning heads, to learn the anomaly score of each sample. Theoretical analyses show that this mechanism offers a more tractable and tighter lower bound of the data log-likelihood. 
We then design a novel patch-wise residual module in the anomaly learning head to extract and assess the fine-grained anomaly features from each sample, facilitating the learning of discriminative representations of anomaly instances. Extensive experiments conducted on six real-world anomaly detection datasets demonstrate the superiority of our method to the baseline and other state-of-the-art methods under various settings.	翻訳日:2023-06-16 19:37:52 公開日:2023-06-14
# 摂動データを用いた高能率オフライン強化学習 Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources ( http://arxiv.org/abs/2306.08364v1 ) ライセンス: Link先を確認	Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang	(参考訳) オフライン強化学習(rl)に関する既存の理論的研究は、ターゲットタスクから直接サンプリングされたデータセットをほとんど考慮している。しかし実際には、データは複数の異種だが関連する情報源から来ることが多い。このギャップによって動機づけられたこの研究は、ターゲットタスクのランダムな摂動バージョンから収集される複数のデータセットでオフラインRLを厳格に理解することを目的としている。情報理論の下限が導出され、データサンプルの数に加えて、関係するソースの数に関する必要条件が明らかにされる。次に,データソース毎に有限個のデータサンプルからのサンプル不確実性と,利用可能なデータソースの有限個数によるソース不確実性を同時に考慮した,新しいhetpeviアルゴリズムを提案する。理論的解析により、HetPEVIは、データソースが優れたデータカバレッジを提供する限り、ターゲットタスクを解決できることを示した。さらに、HetPEVIは水平長の多項式係数まで最適であることが示されている。最後に、この研究はオフラインのマルコフゲームとオフラインのロバストなRLに拡張され、提案された設計の一般化と理論的解析を示す。 Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to that on the number of data samples. Then, a novel HetPEVI algorithm is proposed, which simultaneously considers the sample uncertainties from a finite number of data samples per data source and the source uncertainties due to a finite number of available data sources. Theoretical analyses demonstrate that HetPEVI can solve the target task as long as the data sources collectively provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, which demonstrates the generality of the proposed designs and theoretical analyses.	翻訳日:2023-06-16 19:37:01 公開日:2023-06-14
# テキスト対画像生成の知覚と現実 Perceptions and Realities of Text-to-Image Generation ( http://arxiv.org/abs/2306.08363v1 ) ライセンス: Link先を確認	Jonas Oppenlaender, Johanna Silvennoinen, Ville Paananen, Aku Visuri	(参考訳) 生成人工知能(AI)は広く普及している技術であり、社会や個人に大きな影響を与える。 10年足らず前には、クリエイティブな作業は自動化される最後のものの1つになると考えられていたが、今日ではAIが多くの創造的領域に進出している。本稿では,テキストから画像への生成に対する人々の知覚に関する調査の結果について述べる。我々は,新興技術に対する参加者の技術的理解,その恐怖と懸念,および個人や社会に対するテキスト・ツー・イメージ・ジェネレーションのリスクと危険性について考察する。参加者は、この技術に関連するリスクと危険性を認識していたが、技術が個人的リスクであると考える参加者はごくわずかである。他人のリスクは参加者にとってより容易に認識できた。芸術家は特に危険にさらされていると見なされた。この技術を試した参加者は、試していない人よりも将来の重要性を低く評価した。この結果は、多くの人々が、生成的人工知能の潜在的な個人的リスクと、この技術に関連する差し迫った社会的変化をまだ知らないことを示している。 Generative artificial intelligence (AI) is a widely popular technology that will have a profound impact on society and individuals. Less than a decade ago, it was thought that creative work would be among the last to be automated - yet today, we see AI encroaching on many creative domains. In this paper, we present the findings of a survey study on people's perceptions of text-to-image generation. We touch on participants' technical understanding of the emerging technology, their fears and concerns, and thoughts about risks and dangers of text-to-image generation to the individual and society. We find that while participants were aware of the risks and dangers associated with the technology, only a few participants considered the technology to be a personal risk. The risks for others were easier for participants to recognize. Artists were particularly seen at risk. Participants who had tried the technology rated its future importance lower than those who had not tried it. This result shows that many people are still oblivious to the potential personal risks of generative artificial intelligence and the impending societal changes associated with this technology.	翻訳日:2023-06-16 19:36:29 公開日:2023-06-14
# 協調型マルチエージェント強化学習を支援する階層型タスクネットワーク計画 Hierarchical Task Network Planning for Facilitating Cooperative Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2306.08359v1 ) ライセンス: Link先を確認	Xuechen Mu, Hankz Hankui Zhuo, Chen Chen, Kai Zhang, Chao Yu and Jianye Hao	(参考訳) sparse reward multi-agent reinforcement learning (marl)環境を共同方法でトラップで探索することは複雑なタスクである。エージェントは通常、目標状態に達しず、トラップに陥り、システム全体のパフォーマンスに影響を与えます。そこで本稿では,事前知識を用いて探索空間を縮小し,学習を支援するフレームワークであるSOMARLを提案する。 SOMARLではエージェントはMARL環境の一部として扱われ、シンボリック知識は木構造を用いて組み込まれ、知識階層を構築する。本フレームワークは,階層型タスクネットワーク(HTN)とメタコントローラを備えたハイブリッドモジュールを高レベルで,MARLベースの対話モジュールを低レベルとする2層階層構造を有する。 HTNモジュールとメタコントローラは階層的ドメイン定義言語(HDDL)とオプションフレームワークを使用して、それぞれ記号的知識を形式化し、ドメイン知識と記号的オプションセットを取得する。さらに、HTNモジュールはドメイン知識を活用し、メタコントローラがシンボリックオプションを選択するのを支援することで、低レベルのエージェント探索を誘導する。メタコントローラはさらに、探索行動を制限し、必要に応じてHTN計画ソリューションを調整するために、シンボリックオプションの固有の報酬を計算する。我々は,findtreasureとmoveboxの2つのベンチマークでsomarlを評価し,最先端のmarlおよびmarl環境におけるsubgoalベースラインよりも優れた性能を報告した。 Exploring sparse reward multi-agent reinforcement learning (MARL) environments with traps in a collaborative manner is a complex task. Agents typically fail to reach the goal state and fall into traps, which affects the overall performance of the system. To overcome this issue, we present SOMARL, a framework that uses prior knowledge to reduce the exploration space and assist learning. In SOMARL, agents are treated as part of the MARL environment, and symbolic knowledge is embedded using a tree structure to build a knowledge hierarchy. The framework has a two-layer hierarchical structure, comprising a hybrid module with a Hierarchical Task Network (HTN) planning and meta-controller at the higher level, and a MARL-based interactive module at the lower level. The HTN module and meta-controller use Hierarchical Domain Definition Language (HDDL) and the option framework to formalize symbolic knowledge and obtain domain knowledge and a symbolic option set, respectively. 
Moreover, the HTN module leverages domain knowledge to guide low-level agent exploration by assisting the meta-controller in selecting symbolic options. The meta-controller further computes intrinsic rewards of symbolic options to limit exploration behavior and adjust HTN planning solutions as needed. We evaluate SOMARL on two benchmarks, FindTreasure and MoveBox, and report significantly better performance than state-of-the-art MARL and subgoal-based baselines.	翻訳日:2023-06-16 19:36:11 公開日:2023-06-14
# 場-曲率結合による絡み合い領域法違反 Entanglement area law violation from field-curvature coupling ( http://arxiv.org/abs/2306.08357v1 ) ライセンス: Link先を確認	Alessio Belfiglio, Orlando Luongo, Stefano Mancini	(参考訳) 時空曲率と最小結合しない大スカラー場の絡み合いエントロピーを静的で球対称な背景を仮定して検討する。我々は、球状殻の格子を導入し、半径方向のカットオフを付与することで、フィールドハミルトンを識別する。次にフィールドの基底状態を調べ,非ミニマルカップリングによる領域則からの逸脱を定量化し,特にシュワルツシルト・ド・ジッターとヘイワード時空に着目し,ド・ジッター時空を制限ケースとして論じた。また, 大規模正のカップリング定数は, 小さいフィールド質量の場合であっても, 境界領域に対するエントロピースケーリングを著しく変化させることができることを示した。我々の結果はブラックホールのエントロピー生成と初期宇宙シナリオの観点から解釈される。 We investigate the entanglement entropy of a massive scalar field nonminimally coupled to spacetime curvature, assuming a static, spherically symmetric background. We discretize the field Hamiltonian by introducing a lattice of spherical shells and imposing a cutoff in the radial direction. We then study the ground state of the field and quantify deviations from area law due to nonminimal coupling, focusing in particular on Schwarzschild-de Sitter and Hayward spacetimes, also discussing de Sitter spacetime as a limiting case. We show that large positive coupling constants can significantly alter the entropy scaling with respect to the boundary area, even in case of small field mass. Our outcomes are interpreted in view of black hole entropy production and early universe scenarios.	翻訳日:2023-06-16 19:35:46 公開日:2023-06-14
# MgO(001)基板上のDy原子:DFT+U(HIA)法による研究 Dy adatom on MgO(001) substrate: DFT+U(HIA) study ( http://arxiv.org/abs/2306.08415v1 ) ライセンス: Link先を確認	Alexander B. Shick (1,2), Eduard Belsch (1,3), Alexander I. Lichtenstein (3,4) ((1) Institute of Physics, Czech Academy of Sciences, Na Slovance 2, 182 21 Prague, Czech Republic (2) Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovoth 76100, Israel (3) Institute of Theoretical Physics, University of Hamburg, 20355 Hamburg, Germany (4) European X-Ray Free-Electron Laser Facility, Holzkoppel 4, 22869 Schenefeld, Germany)	(参考訳) MgO(001)基板上に吸着した個々のDy原子の電子構造と磁性について、密度汎関数理論とアンダーソン不純物モデルに対するハバードI近似の組み合わせ(DFT+U(HIA))を用いて検討した。 $f^{10}$配置の2価のDy$^{2+}$吸着原子が見出された。算出したX線吸収(XAS)と磁気円二色性(XMCD)スペクトルを実験データと比較した。縮退した$|J=8.0, J_z=\pm 4.0\rangle$状態の間の量子トンネルは、磁気モーメントの面内配向を持つ$|J=8.0, J_z=0.0\rangle$基底状態を形成する。これはMgO(001)基板上のDy吸着原子における残留磁化の欠如を説明する。我々の研究は、希土類単原子磁石のさらなる研究と予測に有効なルートを提供することができる。 The electronic structure and magnetism of an individual Dy atom adsorbed on the MgO(001) substrate are investigated using the combination of the density functional theory with the Hubbard-I approximation to the Anderson impurity model (DFT+U(HIA)). The divalent Dy$^{2+}$ adatom in $f^{10}$ configuration is found. The calculated x-ray absorption (XAS) and magnetic circular dichroism (XMCD) spectra are compared to the experimental data. Quantum tunneling between degenerate $|J=8.0, J_z=\pm 4.0\rangle$ states leads to formation of the $|J=8.0, J_z=0.0\rangle$ ground state with an in-plane orientation of the magnetic moment. It explains the absence of remanent magnetization in the Dy adatom on top of the MgO(001) substrate. Our studies can provide a viable route for further investigation and prediction of the rare-earth single atom magnets.	翻訳日:2023-06-16 19:29:48 公開日:2023-06-14
# 音声強調のための微調整自己監督モデルの特徴正規化 Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement ( http://arxiv.org/abs/2306.08406v1 ) ライセンス: Link先を確認	Hejung Yang, Hong-Goo Kang	(参考訳) 自己教師付き学習を用いて訓練された大規模で事前訓練された表現モデルは、入力データから高品質な有能な特徴を抽出できるため、機械学習の様々な分野で人気を集めている。そのため、音声認識など様々なパターン分類タスクのベースネットワークとして頻繁に使用されている。しかし、これらのモデルを音声信号生成の分野に適用する研究はあまり行われていない。本稿では,下流音声強調タスクにおける事前学習音声表現モデルの有用性について検討する。事前学習モデルの入力特徴と目標拡張モデルとのミスマッチを軽減するために,これらのモジュールをスムーズにリンクする新しい特徴正規化手法を採用する。提案手法は, 各種事前学習音声モデルと組み合わせた場合, ベースラインと比較し, 音声品質の大幅な向上を実現する。 Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.	翻訳日:2023-06-16 19:29:31 公開日:2023-06-14
# 生物医学的関連抽出のためのコーパスの構築 Building a Corpus for Biomedical Relation Extraction of Species Mentions ( http://arxiv.org/abs/2306.08403v1 ) ライセンス: Link先を確認	Oumaima El Khettari, Solen Quiniou, Samuel Chaffron	(参考訳) バイオメディカルテキストの文レベルで,生物間の有意義な連接関係を抽出するために,手動で注釈付きコーパス,種-種間相互作用(種-種間相互作用)を提案する。このコーパスはpubtatorを利用して、異なる名前付きエンティティ認識種タガーを評価した後、全文記事に種を注釈付けする。最初の成果は、BERTとその生物医学的変異体を用いた種間関係の抽出である。 We present a manually annotated corpus, Species-Species Interaction, for extracting meaningful binary relations between species, in biomedical texts, at sentence level, with a focus on the gut microbiota. The corpus leverages PubTator to annotate species in full-text articles after evaluating different Named Entity Recognition species taggers. Our first results are promising for extracting relations between species using BERT and its biomedical variants.	翻訳日:2023-06-16 19:29:17 公開日:2023-06-14
# LiveChat:ライブストリーミングから自動構築された大規模パーソナライズされた対話データセット LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming ( http://arxiv.org/abs/2306.08401v1 ) ライセンス: Link先を確認	Jingsheng Gao, Yixin Lian, Ziyi Zhou, Yuzhuo Fu, Baoyuan Wang	(参考訳) 近年,オープンドメイン対話システムは有望な進歩を遂げている。最先端の対話エージェントは、大規模なテキストベースのソーシャルメディアデータと大規模な事前訓練されたモデルに基づいて構築されているが、事前訓練されたモデルの転移可能性には限界があり、RedditやWeiboなどの公開データセットの分布にも偏りがあるため、ライブストリーミングなど急速に成長するシナリオでもこれらのエージェントがうまく機能する保証はない。実写オープンドメインシナリオにおける応答の本質的な能力を改善し、ベンチマークを確立するため、351人のペルソナにわたりペルソナあたり平均約3800セッション、133万件の実際の中国語対話と、各ペルソナの詳細なプロファイルからなるLiveChatデータセットを紹介する。 LiveChatは、インターネット上の多数のライブビデオを処理することで自動的に構築され、誰が誰に何を言うかを考慮すべきマルチパーティ会話の範囲に自然に含まれる。そこで本研究では,応答モデリングと宛先認識の2つの重要な課題を対象とし,高度な手法に基づく検索ベースラインを提案する。実験により、ペルソナプロファイルとペルソナ当たりの平均セッションの活用によるポジティブな効果が検証された。さらに、LiveChat上の先進的な生成モデルの転送可能性もベンチマークし、現在の課題に対する今後の方向性を示す。 Open-domain dialogue systems have made promising progress in recent years. While the state-of-the-art dialogue agents are built upon large-scale text-based social media data and large pre-trained models, there is no guarantee these agents could also perform well in fast-growing scenarios, such as live streaming, due to the bounded transferability of pre-trained models and biased distributions of public datasets from Reddit and Weibo, etc. To improve the essential capability of responding and establish a benchmark in the live open-domain scenario, we introduce the LiveChat dataset, composed of 1.33 million real-life Chinese dialogues with almost 3800 average sessions across 351 personas and fine-grained profiles for each persona. LiveChat is automatically constructed by processing numerous live videos on the Internet and naturally falls within the scope of multi-party conversations, where the issues of Who says What to Whom should be considered. Therefore, we target two critical tasks of response modeling and addressee recognition and propose retrieval-based baselines grounded on advanced techniques. 
Experimental results have validated the positive effects of leveraging persona profiles and larger average sessions per persona. In addition, we also benchmark the transferability of advanced generation-based models on LiveChat and pose some future directions for current challenges.	翻訳日:2023-06-16 19:29:07 公開日:2023-06-14
# メタ強化学習の副産物としての単純エンボディード言語学習 Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning ( http://arxiv.org/abs/2306.08400v1 ) ライセンス: Link先を確認	Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn	(参考訳) 機械学習モデルは通常、言語タスク(例えば、次の単語予測)を直接訓練することで言語を学ぶが、非言語タスク(例えば、食べ物の取得)を解決する副産物として、人間の子供に言語が現れる。 embodied reinforcement learning (rl)エージェントは、非言語タスクから間接的に言語を学習できるのでしょうか? 言語とその意味を関連付ける学習には、動的環境と様々な言語が必要である。そこで本稿では,タスクによって異なる言語を持つマルチタスク環境において,この問題を考察する。具体的には、エージェントが特定のオフィスを見つけることを目標とするオフィスナビゲーション環境を設計し、異なる建物(タスク)でオフィスの位置が異なる。それぞれの建物には、ゴールオフィスの位置を簡単な言語で記述したフロアプランが含まれており、訪問時にRGBイメージとして視覚的に読むことができる。 RLエージェントは言語を間接的に学習することができる。現在のメタRLアルゴリズムで訓練されたエージェントは、ホールドアウトレイアウトと言語フレーズを備えたフロアプランの読み込みに成功し、直接的な言語監督を受けていないにも関わらず、すぐに正しいオフィスに移動する。 Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate language with its meaning requires a dynamic environment with varied language. Therefore, we investigate this question in a multi-task environment with language that varies across the different tasks. Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). Each building includes a floor plan with a simple language description of the goal office's location, which can be visually read as an RGB image when visited. We find RL agents indeed are able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases, and quickly navigate to the correct office, despite receiving no direct language supervision.	
翻訳日:2023-06-16 19:28:41 公開日:2023-06-14
# スケーラブルなニューラル確率的解集合プログラミング Scalable Neural-Probabilistic Answer Set Programming ( http://arxiv.org/abs/2306.08397v1 ) ライセンス: Link先を確認	Arseny Skryagin and Daniel Ochs and Devendra Singh Dhami and Kristian Kersting	(参考訳) ニューラルネットワークのロバスト性とシンボリックメソッドの表現性を組み合わせた目標は、ニューロシンボリックaiへの関心を再び高めた。深層ニューラルネットワークの確率推定により確率論的論理プログラミングを行うために,DPPL(Deep Probabilistic Programming Languages)が開発された。しかし、最近のSOTA DPPLアプローチでは、条件付き確率的クエリに限られており、真の関節確率推定のパワーを提供していない。そこで本研究では,DPPL内でのトラクタブル確率的推論の容易な統合を提案する。本稿では,NPP(Neural-Probabilistic Predicates)と解集合プログラミング(ASP)を介して結合された論理プログラムからなる新しいDPPLであるSLASHを紹介する。 NPPは、すべての深いモデルタイプとそれらの組み合わせを単一の確率的述語として表現できる新しい設計原理である。この文脈では、述語の原子表記を調整することにより、様々な種類の確率的クエリに応答する新しい$+/-$表記を導入する。提案手法は, 予測性能を犠牲にすることなく, 推論を高速化し, 統計的に重要でない部分(地上)を創出する方法を示す。我々は、MNIST追加のベンチマークタスクやVQA(Visual Question Answering)など、様々なタスクでSLASHを評価する。 The goal of combining the robustness of neural networks and the expressiveness of symbolic methods has rekindled the interest in Neuro-Symbolic AI. Deep Probabilistic Programming Languages (DPPLs) have been developed for probabilistic logic programming to be carried out via the probability estimations of deep neural networks. However, recent SOTA DPPL approaches allow only for limited conditional probabilistic queries and do not offer the power of true joint probability estimation. In our work, we propose an easy integration of tractable probabilistic inference within a DPPL. To this end, we introduce SLASH, a novel DPPL that consists of Neural-Probabilistic Predicates (NPPs) and a logic program, united via answer set programming (ASP). NPPs are a novel design principle allowing for combining all deep model types and combinations thereof to be represented as a single probabilistic predicate. In this context, we introduce a novel $+/-$ notation for answering various types of probabilistic queries by adjusting the atom notations of a predicate. 
To scale well, we show how to prune the stochastically insignificant parts of the (ground) program, speeding up reasoning without sacrificing the predictive performance. We evaluate SLASH on a variety of different tasks, including the benchmark task of MNIST addition and Visual Question Answering (VQA).	翻訳日:2023-06-16 19:28:20 公開日:2023-06-14
# 公正度とEU非差別法との整合性:復調パリティと条件付き復調異性 Compatibility of Fairness Metrics with EU Non-Discrimination Laws: Demographic Parity & Conditional Demographic Disparity ( http://arxiv.org/abs/2306.08394v1 ) ライセンス: Link先を確認	Lisa Koutsoviti Koumeri, Magali Legast, Yasaman Yousefi, Koen Vanhoof, Axel Legay, Christoph Schommer	(参考訳) 実証的な証拠は、機械学習(ML)技術によって駆動されるアルゴリズムによる決定が、法的に保護されたグループに対する差別を脅かしたり、新たな不公平な情報源を創り出すことを示唆している。この研究は、EUの非差別的法的枠組みにおける公正に対する文脈的アプローチをサポートし、公正度メトリクスと公正性制約による法的公正性を保証するためのポイントを評価することを目的としている。そこで本研究では, 公平性定義(DP)による非差別・差分処理の法的概念を, 条件付き復号法(CDD)を用いて分析する。我々は、EU非差別法の下で実施される司法解釈に対する文脈的アプローチを有効化しつつ、予測のバイアスを減らすことができるかどうかを評価するために、異なる分類器を公正な制約で訓練し比較する。 3つのシナリオにおける実験結果から,処理バイアス軽減アルゴリズムがそれぞれ異なる性能をもたらすことが示された。我々の実験と分析は、手元にあるケースと法的正当性に応じて、AIによる意思決定が法的な観点から公平である可能性を示唆している。これらの予備的な結果は、さらなるケーススタディ、メトリクス、公平性の概念を含む将来の研究を促進する。 Empirical evidence suggests that algorithmic decisions driven by Machine Learning (ML) techniques threaten to discriminate against legally protected groups or create new sources of unfairness. This work supports the contextual approach to fairness in EU non-discrimination legal framework and aims at assessing up to what point we can assure legal fairness through fairness metrics and under fairness constraints. For that, we analyze the legal notion of non-discrimination and differential treatment with the fairness definition Demographic Parity (DP) through Conditional Demographic Disparity (CDD). We train and compare different classifiers with fairness constraints to assess whether it is possible to reduce bias in the prediction while enabling the contextual approach to judicial interpretation practiced under EU non-discrimination laws. Our experimental results on three scenarios show that the in-processing bias mitigation algorithm leads to different performances in each of them. Our experiments and analysis suggest that AI-assisted decision-making can be fair from a legal perspective depending on the case at hand and the legal justification. 
These preliminary results encourage future work which will involve further case studies, metrics, and fairness notions.	翻訳日:2023-06-16 19:27:59 公開日:2023-06-14
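As a rough illustration of the Demographic Parity notion named in the abstract above, the sketch below computes the gap in positive-prediction rates between groups. The function names, the toy predictions, and the group labels are hypothetical and are not taken from the paper; the conditional variant (CDD) is not shown.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Return the share of positive predictions (value 1) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for y_hat, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += y_hat
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups.

    A value of 0 means every group receives positive predictions at the
    same rate, which is what Demographic Parity asks for.
    """
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: binary predictions for two groups "a" and "b".
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, groups)
```

Here group "a" receives positive predictions at a rate of 0.75 and group "b" at 0.25, so the gap is 0.5.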
# 証明可能な個人化とロバスト性を備えたフェデレーション学習 Provably Personalized and Robust Federated Learning ( http://arxiv.org/abs/2306.08393v1 ) ライセンス: Link先を確認	Mariel Werner, Lie He, Sai Praneeth Karimireddy, Michael Jordan, Martin Jaggi	(参考訳) 類似の目的を持ったクライアントのクラスタ化とクラスタ単位のモデル学習は、フェデレーション学習におけるパーソナライゼーションに対する直感的で解釈可能なアプローチである。しかし、証明可能かつ最適な保証で実施することは、依然としてオープンな課題である。本研究では、クライアント上の確率勾配が$K$個の分布のいずれかに対応する確率最適化問題として、パーソナライズされたフェデレーション学習を定式化する。この設定では、i) 単純なしきい値ベースのクラスタリングアルゴリズムと ii) ローカルなクライアント勾配を用いることで、最適な収束保証が得られることを示す。実際、我々の収束レートは、クライアントの真のクラスタリングが既知である場合に得られるレートと漸近的に一致する。さらに、我々のアルゴリズムは、勾配の一部が破損するビザンチン設定においても証明可能な頑健性を持つ。 Clustering clients with similar objectives and learning a model per cluster is an intuitive and interpretable approach to personalization in federated learning. However, doing so with provable and optimal guarantees has remained an open challenge. In this work, we formalize personalized federated learning as a stochastic optimization problem where the stochastic gradients on a client may correspond to one of $K$ distributions. In such a setting, we show that using i) a simple thresholding-based clustering algorithm, and ii) local client gradients obtains optimal convergence guarantees. In fact, our rates asymptotically match those obtained if we knew the true underlying clustering of the clients. Furthermore, our algorithms are provably robust in the Byzantine setting where some fraction of the gradients are corrupted.	翻訳日:2023-06-16 19:27:38 公開日:2023-06-14
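To make the idea of thresholding-based clustering of client gradients more concrete, here is a minimal greedy sketch. It is not the algorithm from the paper, whose details the abstract does not give: the function name, the threshold `tau`, the running-mean update, and the toy gradients are all assumptions for illustration.

```python
import math


def threshold_cluster(gradients, tau):
    """Greedy threshold clustering (illustrative, not the paper's method).

    Each gradient joins the first cluster whose running mean lies within
    Euclidean distance tau; otherwise it starts a new cluster.
    Returns (clusters, means), where clusters holds lists of indices.
    """
    clusters = []
    means = []
    for i, g in enumerate(gradients):
        for c, mean in enumerate(means):
            if math.dist(g, mean) <= tau:
                clusters[c].append(i)
                n = len(clusters[c])
                # Incremental update of the cluster mean.
                means[c] = [m + (x - m) / n for m, x in zip(mean, g)]
                break
        else:
            clusters.append([i])
            means.append(list(g))
    return clusters, means


# Hypothetical client gradients drawn around two centers, (1, 0) and (-1, 0).
grads = [(1.0, 0.0), (1.1, 0.1), (-1.0, 0.0), (0.9, -0.1), (-1.1, 0.1)]
clusters, means = threshold_cluster(grads, tau=0.5)
```

With these toy inputs, clients 0, 1 and 3 end up in one cluster and clients 2 and 4 in the other; averaging gradients per cluster then gives one update direction per group of similar clients.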
# Skill-Critic: 強化学習のための学習スキルの精製 Skill-Critic: Refining Learned Skills for Reinforcement Learning ( http://arxiv.org/abs/2306.08388v1 ) ライセンス: Link先を確認	Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan	(参考訳) 階層的強化学習(RL)は、政策を時間的に複数のレベルに抽象化することで、長期的な意思決定を促進することができる。スパース報酬環境における有望な結果は、スキル、すなわちプリミティブアクションの系列を用いることで得られている。通常、スキル潜在空間とポリシはオフラインデータから検出されるが、結果として生じる低レベルのポリシは、低カバレッジのデモンストレーションや分布シフトのために信頼性が低い可能性がある。そこで,我々は,ハイレベルなスキル選択と連動して,低レベルのポリシーを微調整する手法を提案する。我々のSkill-Criticアルゴリズムは低レベルと高レベルの両方のポリシーを最適化する。これらのポリシーは、オフラインデモから学んだ潜在空間によって初期化され、正則化され、統合ポリシー最適化のガイドとなる。我々は,Gran Turismo Sportにおける新しいスパース報酬自律レースタスクを含む,複数のスパースRL環境でのアプローチを検証する。実験の結果,Skill-Criticの低レベル政策の微調整と実証誘導正則化が最適性能に不可欠であることが示唆された。画像とビデオは https://sites.google.com/view/skill-critic で入手できる。最終バージョンでコードをオープンソース化する予定です。 Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data, but the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low and high-level policies; these policies are also initialized and regularized by the latent space learned from offline demonstrations to guide the joint policy optimization. We validate our approach in multiple sparse RL environments, including a new sparse reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open source the code with the final version.	翻訳日:2023-06-16 19:27:27 公開日:2023-06-14
# 実世界シナリオにおけるディープニューラルネットワークの効率的なバックドア攻撃 Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios ( http://arxiv.org/abs/2306.08386v1 ) ライセンス: Link先を確認	Hong Sun, Ziqiang Li, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li	(参考訳) 近年のディープニューラルネットワーク(DNN)は、大量のトレーニングデータに依存しており、悪意のある攻撃者がデータを悪用して汚染し、バックドア攻撃を行う機会となっている。これらの攻撃はDNNの信頼性を著しく損なう。しかし、既存のバックドア攻撃手法は、すべてのトレーニングデータが単一のソースから来ており、攻撃者がトレーニングデータへの完全なアクセスを前提として、非現実的な仮定をする。本稿では、被害者が複数のソースからデータを収集し、攻撃者が完全なトレーニングデータにアクセスできないような、より現実的な攻撃シナリオを導入することで、この制限に対処する。このシナリオを、データ制約されたバックドア攻撃と呼んでいる。このような場合、以前の攻撃方法は、バックドア注入の過程で良性と毒物の特徴が絡み合うことによる効率の低下に苦しむ。そこで本研究では,CLIP(Contrastive Language- Image Pre-Training)モデルを用いた新しい手法を提案する。そこで,本研究では,清潔な特徴の影響を抑制することを目的とした,清潔な特徴抑制技術と,モデルの動作を効果的に操作するための中毒機能の存在と影響を増強することに焦点を当てた中毒機能増強技術という,2つの異なる流れからのクリップベースの技術を紹介する。本手法の有効性, 正確性に対する無害性, およびステルスネスを評価するため, 3つのターゲットモデル, 3つのデータセット, 15以上の異なる設定について広範な実験を行った。その結果、データ制約のあるシナリオにおける既存の攻撃と比較して、いくつかの設定で100%以上の改善が達成された。本研究は,既存の手法の限界に対処し,データ制約されたバックドア攻撃に対する実用的で効果的な解決策を提供する。 Recent deep neural networks (DNNs) have come to rely on vast amounts of training data, providing an opportunity for malicious attackers to exploit and contaminate the data to carry out backdoor attacks. These attacks significantly undermine the reliability of DNNs. However, existing backdoor attack methods make unrealistic assumptions, assuming that all training data comes from a single source and that attackers have full access to the training data. In this paper, we address this limitation by introducing a more realistic attack scenario where victims collect data from multiple sources, and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer from severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. 
To tackle this problem, we propose a novel approach that leverages the pre-trained Contrastive Language-Image Pre-Training (CLIP) model. We introduce three CLIP-based technologies from two distinct streams: Clean Feature Suppression, which aims to suppress the influence of clean features to enhance the prominence of poisoning features, and Poisoning Feature Augmentation, which focuses on augmenting the presence and impact of poisoning features to effectively manipulate the model's behavior. To evaluate the effectiveness, harmlessness to benign accuracy, and stealthiness of our method, we conduct extensive experiments on 3 target models, 3 datasets, and over 15 different settings. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Our research contributes to addressing the limitations of existing methods and provides a practical and effective solution for data-constrained backdoor attacks.	翻訳日:2023-06-16 19:27:11 公開日:2023-06-14
# 準-1次元格子上の1次元ハイゼンベルクハミルトニアンによる量子状態転移 Quantum state transfer using 1D Heisenberg Hamiltonian on quasi-1D lattices ( http://arxiv.org/abs/2306.08440v1 ) ライセンス: Link先を確認	Chandrima B. Pushpan, Harikrishnan K. J., Amit Kumar Pal	(参考訳) 準1次元格子上での単一およびマルチキュービット状態の転送について検討し、状態伝達プロトコルに関わる時間進化は1Dハミルトニアンによってのみ生成される。準-1D 等方性ハイゼンベルク模型を$z$方向の磁場下で使用し、スピンスピン相互作用の強さは、ラング(rungs)と呼ばれ、他の部分格子に沿った相互作用よりもはるかに強い。フィールド強度を特殊値にチューニングすると、強いrung結合限界において、準1次元等方性ハイゼンベルクモデルが有効な1d xxzモデルにマッピングされ、各rungは効果的な2レベル系を模倣する。したがって、1つのrungから別のrungへの低エネルギーrung状態の転送は、1d xxzモデルを用いて任意の1量子ビット状態から1つの格子サイトから別の場所への転送によって表現できる。そこで本研究では,単一キュービット状態の特定エンコーディングを低エネルギーrung状態とし,その後に転送された状態を受信側rung上でデコードすることにより,任意の単一キュービット状態をある格子サイトから別のラッチサイトへ転送するプロトコルを提案する。これらの符号化および復号プロトコルは、1Dラングハミルトニアンとシングルキュービット位相ゲートによって生成される時間進化を含み、単一キュービット状態の転送に必要なすべての時間進化が1Dハミルトニアンから生成される。提案プロトコルを用いた単一量子状態転送の性能は,全準1Dハミルトニアンによる時間進化を用いた場合,常に同じよりも優れていることを示す。 We consider transfer of single and multi-qubit states on a quasi-1D lattice, where the time evolutions involved in the state transfer protocol are generated by only 1D Hamiltonians. We use the quasi-1D isotropic Heisenberg model under a magnetic field along the $z$ direction, where the spin-spin interaction strengths along the vertical sublattices, referred to as rungs, are much stronger than the interactions along other sublattices. Tuning the field-strength to a special value, in the strong rung-coupling limit, the quasi-1D isotropic Heisenberg model can be mapped to an effective 1D XXZ model, where each rung mimics an effective two-level system. Consequently, the transfer of low-energy rung states from one rung to another can be represented by a transfer of an arbitrary single-qubit state from one lattice site to another using the 1D XXZ model. 
Exploiting this, we propose protocols for transferring arbitrary single-qubit states from one lattice site to another by using specific encoding of the single-qubit state into a low-energy rung state, and a subsequent decoding of the transferred state on the receiver rung. These encoding and decoding protocols involve a time evolution generated by the 1D rung Hamiltonian and single-qubit phase gates, ensuring that all time-evolutions required for transferring the single-qubit state are generated from 1D Hamiltonians. We show that the performance of the single-qubit state transfer using the proposed protocol is always better than the same when a time-evolution generated by the full quasi-1D Hamiltonian is used.	翻訳日:2023-06-16 19:18:39 公開日:2023-06-14
# 2レベル結合系からのコヒーレント散乱 Coherent scattering from coupled two level systems ( http://arxiv.org/abs/2306.08439v1 ) ライセンス: Link先を確認	Thomas Nutz, Samuel T. Mister, Petros Androvitsaneas, Andrew Young, E. Harbord, J. G. Rarity, Ruth Oulton and Dara P. S. McCutcheon	(参考訳) 光活性スピン1/2系の共鳴蛍光特性について検討し,散乱光のコヒーレンスに及ぼす磁場の影響を解明した。本研究では, 2レベル系(TLS)の結果を再現し, 基底状態結合を持つスピン系にも適用可能な, このシステムのためのマスター方程式モデルを導出する。このモデルは弱励起領域において解析的に解かれる。このモデルにおけるスピンダイナミクスの包含は、コヒーレントに散乱した光の性質を基本レベルで変化させる。 TLSの場合、コヒーレンス特性は入力レーザーによって決定されることが知られている。スピン散乱光はスピンのコヒーレンス特性を継承することを示す。このマッピングにより、散乱場の直接測定によりスピンダイナミクスとコヒーレンス時間を測定することができる。さらに,自然線幅以下のゼーマン分裂を分解できることを示す。スピン分光の貴重なツールとなるだけでなく、スピン散乱場のコヒーレンス特性の理解は、スピン光子ベースの量子技術にとって不可欠である。 We study the resonance fluorescence properties of an optically active spin 1/2 system, elucidating the effects of a magnetic field on the coherence of the scattered light. We derive a master equation model for this system that reproduces the results of a two level system (TLS) while also being applicable to a spin system with ground state coupling. This model is then solved analytically in the weak excitation regime. The inclusion of spin dynamics in our model alters the properties of the coherently scattered light at a fundamental level. For a TLS the coherence properties are known to be determined by the input laser. We show that spin scattered light inherits the coherence properties of the spin. This mapping allows us to measure spin dynamics and coherence time through direct measurement of the scattered fields. Furthermore, we show the ability to resolve sub-natural linewidth Zeeman splittings. Along with representing an invaluable tool for spin spectroscopy, understanding the coherence properties of the spin-scattered field will be vital for spin-photon based quantum technologies.	翻訳日:2023-06-16 19:18:11 公開日:2023-06-14
# 実空間表現を用いた反対称変分量子状態の構築 Construction of Antisymmetric Variational Quantum States with Real-Space Representation ( http://arxiv.org/abs/2306.08434v1 ) ライセンス: Link先を確認	Takahiro Horiba, Soichi Shirai, Hirotoshi Hirai	(参考訳) 量子コンピュータを用いた電子状態計算は、主に量子ビット表現に適した第二量子化に基づいている。量子コンピュータ上で電子状態を記述する別の方法は第一量子化であり、第二量子化よりも基底関数の数に関してより小さなスケーリングを実現することが期待されている。基底関数のうち、実空間基底はフォールトトレラント量子計算(FTQC)時代の量子ダイナミクスシミュレーションにとって魅力的な選択肢である。実空間基底を持つ第一量子化における大きな困難は多体電子系の状態準備である。この困難は電子の反対称性から来ており、量子回路上に反対称な量子状態を構築することは容易ではない。本稿では,反対称量子状態を作成するために,変分量子回路を構築するための設計原理を提案する。提案回路は、指数関数的に多くのスレーター行列式の重ね合わせ、すなわちマルチコンフィギュレーション状態を生成し、正確な基底状態を近似するための体系的なアプローチを提供する。我々は1次元水素分子系の基底状態を得るために変分量子固有解法(VQE)を実装した。その結果、提案回路は正確な反対称基底状態とそのエネルギーを十分に再現したが、従来の変分回路は反対称状態も対称状態も得られなかった。さらに,量子情報理論に基づいて多体波動関数を解析し,電子相関と量子絡み合いの関係を示した。 Electronic state calculations using quantum computers are mostly based on second quantization, which is suitable for qubit representation. Another way to describe electronic states on a quantum computer is first quantization, which is expected to achieve smaller scaling with respect to the number of basis functions than second quantization. Among basis functions, a real-space basis is an attractive option for quantum dynamics simulations in the fault-tolerant quantum computation (FTQC) era. A major difficulty in first quantization with a real-space basis is state preparation for many-body electronic systems. This difficulty stems from the antisymmetry of electrons, and it is not straightforward to construct antisymmetric quantum states on a quantum circuit. In the present paper, we provide a design principle for constructing a variational quantum circuit to prepare an antisymmetric quantum state. The proposed circuit generates the superposition of exponentially many Slater determinants, that is, a multi-configuration state, which provides a systematic approach to approximating the exact ground state. 
We implemented the variational quantum eigensolver (VQE) to obtain the ground state of a one-dimensional hydrogen molecular system. As a result, the proposed circuit well reproduced the exact antisymmetric ground state and its energy, whereas the conventional variational circuit yielded neither an antisymmetric nor a symmetric state. Furthermore, we analyzed the many-body wave functions based on quantum information theory, which illustrated the relation between the electron correlation and the quantum entanglement.	翻訳日:2023-06-16 19:17:57 公開日:2023-06-14
# 『定義モデリング:定義をモデル化する』意味論をほとんど、あるいは全く使わずに定義を生成する "Definition Modeling: To model definitions." Generating Definitions With Little to No Semantics ( http://arxiv.org/abs/2306.08433v1 ) ライセンス: Link先を確認	Vincent Segonne and Timothee Mickus	(参考訳) 定義モデリング(定義を生成するタスク)は、当初、単語埋め込みの意味的品質を評価する手段として提案された。文脈における単語の一貫した語彙的意味表現は、その定義を生成するために必要な全ての情報を含むはずだからである。このタスクは比較的新しいため、定義モデリングシステムが実際にどの要因に依存しているかは分かっていない。本稿では、このタスクが期待されるほど意味論を必要としない可能性を示す証拠を提示する。すなわち、文献中の先行モデルが、明示的な多義性などの意味的側面にあまり敏感でない一方で、見出し語とその語釈に現れる単語との形式的類似性に依存していることを示し、埋め込みを評価する手段としてのこのタスクの妥当性に疑問を投げかける。 Definition Modeling, the task of generating definitions, was first proposed as a means to evaluate the semantic quality of word embeddings: a coherent lexical semantic representation of a word in context should contain all the information necessary to generate its definition. The relative novelty of this task entails that we do not know which factors are actually relied upon by a Definition Modeling system. In this paper, we present evidence that the task may not involve as much semantics as one might expect: we show how an earlier model from the literature is both rather insensitive to semantic aspects such as explicit polysemy, as well as reliant on formal similarities between headwords and words occurring in its glosses, casting doubt on the validity of the task as a means to evaluate embeddings.	翻訳日:2023-06-16 19:17:37 公開日:2023-06-14
# 高次元過度線形回帰における最小ノルムリスクのバッチ安定化 Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression ( http://arxiv.org/abs/2306.08432v1 ) ライセンス: Link先を確認	Shahar Stein Ioushua, Inbar Hasidim, Ofer Shayevitz and Meir Feder	(参考訳) データをバッチに分割する学習アルゴリズムは、多くの機械学習アプリケーションで一般的であり、典型的には計算効率と性能のトレードオフを提供する。本稿では,等方的ガウス特徴を持つ最小ノルム過パラメータ線形回帰モデルのレンズによるバッチ分割の利点について検討する。最小ノルム推定器の自然な小バッチバージョンを提案し,その二次リスクの上限を導出し,最適バッチサイズの選択のために,ノイズレベルと過パラメータ比に逆比例することを示した。極小ノルムとは対照的に, 推定器は単調に過パラメータ化比が増加する安定なリスク挙動を認め, 補間点での爆発と二重発振現象の両方を除去する。興味深いことに、バッチパーティションによって提供されるこの暗黙の規則化は、バッチ間の機能の重複によって部分的に説明される。我々の境界は、新しい手法の組み合わせ、特にランダム部分空間上のノイズ射影のワッサーシュタイン計量における正規近似によって導かれる。 Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk, showing it is inversely proportional to the noise level as well as to the overparameterization ratio, for the optimal choice of batch size. In contrast to minimum-norm, our estimator admits a stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that this implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.	翻訳日:2023-06-16 19:17:23 公開日:2023-06-14
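For reference, the minimum-norm estimator named in the abstract above has a standard closed form in the overparameterized case. The sketch below writes it out, together with one simple way to form a batch version. The symbols $X$, $y$, $n$, $d$, $m$ and the uniform averaging over batches are assumptions for illustration; the abstract does not state how the paper combines the per-batch estimators.

```latex
% Data: X \in \mathbb{R}^{n \times d} with n < d (overparameterized), y \in \mathbb{R}^{n}.
% Minimum-norm interpolator (X has full row rank almost surely for Gaussian features):
\hat{\beta}_{\mathrm{MN}}
  = \arg\min_{\beta \,:\, X\beta = y} \lVert \beta \rVert_2
  = X^{\top} \left( X X^{\top} \right)^{-1} y .

% Split the rows into m batches (X_j, y_j), compute the minimum-norm solution
% on each batch, and average (uniform weights are an assumption here):
\hat{\beta}_{j} = X_j^{\top} \left( X_j X_j^{\top} \right)^{-1} y_j ,
\qquad
\hat{\beta}_{\mathrm{batch}} = \frac{1}{m} \sum_{j=1}^{m} \hat{\beta}_{j} .
```

With a single batch ($m = 1$) the second expression reduces to the ordinary minimum-norm interpolator, so the batch size acts as the tuning knob the abstract refers to.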
# 量子コンピューティングノイズモデルのボリュームベンチマーク Volumetric Benchmarking of Quantum Computing Noise Models ( http://arxiv.org/abs/2306.08427v1 ) ライセンス: Link先を確認	Tom Weber, Kerstin Borras, Karl Jansen, Dirk Krücker and Matthias Riebisch	(参考訳) スケーラビリティに向かっている量子コンピューティングの主な課題は、現在のデバイスの誤った振る舞いである。計算への影響の理解と予測は、これらのエラーを量子エラー緩和のような手法で対処するために不可欠である。したがって、正確なノイズモデルの構築と評価が必要である。しかし、ノイズモデルの評価はまだ体系的なアプローチに従っていないため、あるアプリケーションに対するモデルの精度を推定することはほぼ不可能である。そこで我々は,量子コンピューティングアプリケーションのためのノイズモデルベンチマーク手法を開発し,提案する。ハードウェア実験の結果と、量子回路の代表集合に対するノイズモデルの予測を比較する。また,ノイズモデルを構築し,そのパラメータを一連のトレーニング回路で最適化する。次に、文献から他のモデルと比較したボリュームベンチマークを実行します。 The main challenge of quantum computing on its way to scalability is the erroneous behaviour of current devices. Understanding and predicting their impact on computations is essential to counteract these errors with methods such as quantum error mitigation. Thus, it is necessary to construct and evaluate accurate noise models. However, the evaluation of noise models does not yet follow a systematic approach, making it nearly impossible to estimate the accuracy of a model for a given application. Therefore, we developed and present a systematic approach to benchmark noise models for quantum computing applications. It compares the results of hardware experiments to predictions of noise models for a representative set of quantum circuits. We also construct a noise model and optimize its parameters with a series of training circuits. We then perform a volumetric benchmark comparing our model to other models from the literature.	翻訳日:2023-06-16 19:17:06 公開日:2023-06-14
# 選択的概念モデル: テスト時にステークホルダーのカスタマイズを許可する Selective Concept Models: Permitting Stakeholder Customisation at Test-Time ( http://arxiv.org/abs/2306.08424v1 ) ライセンス: Link先を確認	Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt	(参考訳) 概念に基づくモデルは、ステークホルダーに解釈可能な概念のセットを使用して予測を行う。しかし、そのようなモデルは、しばしば固定された多くの概念を伴い、利害関係者にかなりの認知負荷を与える可能性がある。 SCOM(Selective Concept Models)を提案する。これは概念のサブセットのみを用いて予測を行い、その好みに応じてテスト時に利害関係者がカスタマイズできる。複数の実世界のデータセットに対して最適な精度を実現するために、SCOMは全概念のごく一部しか必要としないことを示す。さらに、人気のcubデータセットから900の鳥画像に対して、人間のコンセプトセット選択からなる新しいデータセットcub-selを収集し、リリースする。 CUB-Selを用いて、人間は推論を好む概念を選択し、最も理論的に意味のある概念を特定するのに苦労していることが示される。 SCOMが提供するカスタマイズとコンセプトの選択は、ステークホルダーの解釈と介入の効率を向上させる。 Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We propose Selective COncept Models (SCOMs) which make predictions using only a subset of concepts and can be customised by stakeholders at test-time according to their preferences. We show that SCOMs only require a fraction of the total concepts to achieve optimal accuracy on multiple real-world datasets. Further, we collect and release a new dataset, CUB-Sel, consisting of human concept set selections for 900 bird images from the popular CUB dataset. Using CUB-Sel, we show that humans have unique individual preferences for the choice of concepts they prefer to reason about, and struggle to identify the most theoretically informative concepts. The customisation and concept selection provided by SCOM improves the efficiency of interpretation and intervention for stakeholders.	翻訳日:2023-06-16 19:16:56 公開日:2023-06-14
# X-Detect: 小売業における物体検出器のための説明可能な敵対的パッチ検出 X-Detect: Explainable Adversarial Patch Detection for Object Detectors in Retail ( http://arxiv.org/abs/2306.08422v1 ) ライセンス: Link先を確認	Omer Hofman, Amit Giloni, Yarin Hayun, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici and Asaf Shabtai	(参考訳) 様々なドメイン(小売など)で広く使われている物体検出モデルは、敵対的攻撃に対して脆弱であることが示されている。物体検出器に対する敵対的攻撃を検出する既存の手法は、新しい実世界の攻撃の検出が困難であった。我々は、新しい敵対的パッチ検出器であるX-Detectを提示する。X-Detectは、i) 敵対的サンプルをリアルタイムで検出し、防御者が予防措置を講じられるようにすること、ii) 発せられた警告に対する説明を提供し、防御者の意思決定プロセスを支援すること、iii) 新たな攻撃という形の未知の脅威に対処することができる。新しいシーンが与えられると、X-Detectは、物体抽出、シーン操作、特徴変換技術を利用する、設計段階から説明可能な検出器のアンサンブルを用いて、警告を発する必要があるかどうかを判断する。 X-Detectは5つの異なる攻撃シナリオ(適応的攻撃を含む)とCOCOデータセットおよび新しいSuperstoreデータセットを使用して、物理空間とデジタル空間の両方で評価された。物理的評価は実環境でスマートショッピングカートのセットアップを用いて行い,17件の敵対的パッチ攻撃を1,700件の敵対的ビデオに記録した。その結果、X-Detectは、すべての攻撃シナリオにおいて良性シーンと敵対的シーンの識別で最先端の手法を上回り、0%のFPR(誤報なし)を維持しつつ、警告に対する実用的な説明を提供した。デモが公開されている。 Object detection models, which are widely used in various domains (such as retail), have been shown to be vulnerable to adversarial attacks. Existing methods for detecting adversarial attacks on object detectors have had difficulty detecting new real-life attacks. We present X-Detect, a novel adversarial patch detector that can: i) detect adversarial samples in real time, allowing the defender to take preventive action; ii) provide explanations for the alerts raised to support the defender's decision-making process, and iii) handle unfamiliar threats in the form of new attacks. Given a new scene, X-Detect uses an ensemble of explainable-by-design detectors that utilize object extraction, scene manipulation, and feature transformation techniques to determine whether an alert needs to be raised. X-Detect was evaluated in both the physical and digital space using five different attack scenarios (including adaptive attacks) and the COCO dataset and our new Superstore dataset. 
The physical evaluation was performed using a smart shopping cart setup in real-world settings and included 17 adversarial patch attacks recorded in 1,700 adversarial videos. The results showed that X-Detect outperforms the state-of-the-art methods in distinguishing between benign and adversarial scenes for all attack scenarios while maintaining a 0% FPR (no false alarms) and providing actionable explanations for the alerts raised. A demo is available.	翻訳日:2023-06-16 19:16:41 公開日:2023-06-14
# マルチエージェント強化学習 Mediated Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2306.08419v1 ) ライセンス: Link先を確認	Dmitry Ivanov, Ilya Zisman, Kirill Chernyshev	(参考訳) マルチエージェント強化学習(MARL: Multi-Agent Reinforcement Learning, MARL)の文献の大半は、社会的福祉の最大化の問題に、混合環境における自己関心のエージェントの協力と一致する。この結果、個人の目標を放棄して社会的利益を優先するエージェントが生まれ、利己的な離反者によって悪用される可能性がある。協力はまた、創発的な行動が均衡であること、すなわち、エージェントがより高い個別の報酬を受け取れないことを保証することによって、エージェントのアイデンティティと境界を尊重することを要求する。機構設計の進歩に触発されて,メディエータを用いて社会的に有益な均衡を見出すものとして定義された協調問題の解決を提案する。仲介者は、代理人のために行動するが、それに同意する代理人のためにのみ行動する好意的な存在である。本研究では,政策勾配を有するエージェントと並行して仲介者を訓練し,仲介者を通じて協力を促す制約を受ける社会福祉を最大化する方法を示す。行列ゲームと反復ゲームにおける我々の実験は、MARLにおけるメディエータの適用の可能性を強調している。 The majority of Multi-Agent Reinforcement Learning (MARL) literature equates the cooperation of self-interested agents in mixed environments to the problem of social welfare maximization, allowing agents to arbitrarily share rewards and private information. This results in agents that forgo their individual goals in favour of social good, which can potentially be exploited by selfish defectors. We argue that cooperation also requires agents' identities and boundaries to be respected by making sure that the emergent behaviour is an equilibrium, i.e., a convention that no agent can deviate from and receive higher individual payoffs. Inspired by advances in mechanism design, we propose to solve the problem of cooperation, defined as finding socially beneficial equilibrium, by using mediators. A mediator is a benevolent entity that may act on behalf of agents, but only for the agents that agree to it. We show how a mediator can be trained alongside agents with policy gradient to maximize social welfare subject to constraints that encourage agents to cooperate through the mediator. Our experiments in matrix and iterative games highlight the potential power of applying mediators in MARL.	翻訳日:2023-06-16 19:16:15 公開日:2023-06-14
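The abstract above defines an equilibrium as a convention from which no agent can deviate and receive a higher individual payoff. The sketch below checks that condition in a two-player matrix game and compares it with social welfare. The payoff values (a standard prisoner's dilemma) and the function names are hypothetical; the mediator itself is not modeled here.

```python
def is_equilibrium(payoffs, profile):
    """Check whether no single player gains by deviating unilaterally.

    payoffs maps a joint action (a0, a1) to a tuple (reward0, reward1).
    profile is the joint action being tested.
    """
    actions = sorted({action for joint in payoffs for action in joint})
    for player in (0, 1):
        current = payoffs[profile][player]
        for alternative in actions:
            deviated = list(profile)
            deviated[player] = alternative
            if payoffs[tuple(deviated)][player] > current:
                return False
    return True


def social_welfare(payoffs, profile):
    """Sum of all players' rewards under the joint action."""
    return sum(payoffs[profile])


# Hypothetical prisoner's dilemma: C = cooperate, D = defect.
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
mutual_cooperation = ("C", "C")
mutual_defection = ("D", "D")
```

In this toy game mutual cooperation maximizes welfare but is not an equilibrium, while mutual defection is an equilibrium with lower welfare; that gap is the tension the mediated setting is meant to address.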
# 悪魔は細部に宿る: オンライン広告エコシステムにおける収益性の高い広告詐欺パターンの分析 The Devil is in the Details: Analyzing the Lucrative Ad Fraud Patterns of the Online Ad Ecosystem ( http://arxiv.org/abs/2306.08418v1 ) ライセンス: Link先を確認	Emmanouil Papadogiannakis, Nicolas Kourtellis, Panagiotis Papadopoulos, Evangelos P. Markatos	(参考訳) オンライン広告市場は最近5000億ドル規模に達し、ユーザーと最高入札者を一瞬でマッチングする必要性に対応するため、多数のエージェントや仲介業者を含む複雑な自動化モデルへと移行した。潜在的な収益と透明性の欠如に刺激され、悪質な業者はこの仕組みを悪用し、制限を回避し、好ましくないコンテンツや違法なコンテンツからかなりの収益を生み出す方法を見出した。さらに悪いことに、こうした業者は、これらの違法行為とは無関係な信頼ある企業の広告を受け取ることが多い。結果として、広告主の資金は未知の主体に流れ、好ましくない活動を支え、その存続を助けている。このプロジェクトでは、問題の程度を理解し、怪しげな業者が広告エコシステムの隙間を利用して事業を収益化する方法について光を当てる。我々は700万以上のウェブサイトを調査し、オンライン広告に関する最先端の標準がどのように適用されているかを調べた。我々は、実環境で観測された実際の手口を明らかにし、パブリッシャーが好ましくないコンテンツや違法なコンテンツを収益化し、毎月数千ドルの収益を得られることを示す。 The online advertising market has recently reached the 500 billion dollar mark, and to accommodate the need to match a user with the highest bidder at a fraction of a second, it has moved towards a complex automated model involving numerous agents and middle men. Stimulated by potential revenue and the lack of transparency, bad actors have found ways to abuse it, circumvent restrictions, and generate substantial revenue from objectionable and even illegal content. To make matters worse, they often receive advertisements from respectable companies which have nothing to do with these illegal activities. Altogether, advertiser money is funneled towards unknown entities, supporting their objectionable operations and maintaining their existence. In this project, we work towards understanding the extent of the problem and shed light on how shady agents take advantage of gaps in the ad ecosystem to monetize their operations. We study over 7 million websites and examine how state-of-the-art standards associated with online advertising are applied. 
We discover and present actual practices observed in the wild and show that publishers are able to monetize objectionable and illegal content and generate thousands of dollars of revenue on a monthly basis.	翻訳日:2023-06-16 19:15:53 公開日:2023-06-14
# 畳み込み定理に基づく量子乗算アルゴリズム Quantum Multiplication Algorithm Based on Convolution Theorem ( http://arxiv.org/abs/2306.08473v1 ) ライセンス: Link先を確認	Mehdi Ramezani, Morteza Nikaeen, Farnaz Farman, Seyed Mahmoud Ashrafi and Alireza Bahrampour	(参考訳) 大きな数の効率的な乗算の問題は古典計算における長年の課題であり、何世紀にもわたって広く研究されてきた。既存の古典的アルゴリズムは理論上の限界に近づいており、さらなる改善の余地はほとんどないようである。しかし、量子コンピュータの出現と量子ハードウェア上で乗算を実行できる量子アルゴリズムの必要性により、新しいパラダイムが出現する。本稿では,畳み込み定理と古典的高速フーリエ変換に依拠するストラッセン法に着想を得て,量子資源を用いることで現代的な古典的乗算アルゴリズムに対していくつかの利点を持つ、このアルゴリズムの量子版を提案する。畳み込み定理の量子版が、精度、空間計算量の指数的削減、時間効率の(確率的な)向上の観点から、乗算アルゴリズムに顕著な改善をもたらすことを実証する。本論文はまた、古典的乗算アルゴリズムの歴史と発展をレビューし、量子資源がこの根本的な問題に対してどのように新しい視点と可能性を提供できるかを探る動機を与える。 The problem of efficient multiplication of large numbers has been a long-standing challenge in classical computation and has been extensively studied for centuries. It appears that the existing classical algorithms are close to their theoretical limit and offer little room for further enhancement. However, with the advent of quantum computers and the need for quantum algorithms that can perform multiplication on quantum hardware, a new paradigm emerges. In this paper, inspired by the Strassen method that relies on the convolution theorem and classical Fast Fourier Transform, we propose a quantum version of this algorithm that can perform multiplication with some advantages over the modern classical multiplication algorithms by using quantum resources. We demonstrate how the quantum version of the convolution theorem can offer significant improvements to multiplication algorithms in terms of accuracy, exponential reduction of space complexity and (probabilistic) enhancement of time efficiency. The paper also reviews the history and development of classical multiplication algorithms and motivates us to explore how quantum resources can provide new perspectives and possibilities for this fundamental problem.	翻訳日:2023-06-16 19:10:01 公開日:2023-06-14
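The classical idea that the abstract above builds on can be sketched in a few lines: the digits of a product are the convolution of the digits of the factors, and the convolution theorem turns that into a pointwise product in the Fourier domain. This is only the classical counterpart, written for non-negative integers with a textbook recursive FFT; it is not the quantum algorithm from the paper, and the function names are chosen for illustration.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True the conjugate transform is computed without the
    1/n normalization, which the caller applies afterwards.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def multiply(x, y):
    """Multiply two non-negative integers via digit convolution."""
    a = [int(digit) for digit in reversed(str(x))]
    b = [int(digit) for digit in reversed(str(y))]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))
    # Convolution theorem: transform, multiply pointwise, transform back.
    spectrum = [p * q for p, q in zip(fft(a), fft(b))]
    raw = fft(spectrum, invert=True)
    coefficients = [round(c.real / size) for c in raw]
    # Summing with powers of ten performs the carry propagation.
    return sum(c * 10 ** i for i, c in enumerate(coefficients))


product = multiply(12345, 6789)
```

Floating-point rounding limits this sketch to moderately sized inputs; production implementations use larger digit bases and exact number-theoretic transforms.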
# 補充可能なナップサック付きバンディット:両世界のベスト Bandits with Replenishable Knapsacks: the Best of both Worlds ( http://arxiv.org/abs/2306.08470v1 ) ライセンス: Link先を確認	Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco	(参考訳) bandits with knapsack(BwK)フレームワークは、エージェントがリソース消費の制約の下で一連の決定を下すオンライン意思決定問題をモデル化する。従来のモデルでは、各アクションが非負の量のリソースを消費し、初期予算が完全に枯渇するとプロセスが終了すると仮定する。本研究では,非単調な資源利用を許容する,すなわち資源が正の量だけ補充されうるBwKフレームワークの自然な一般化を検討する。そこで本稿では,適切なプライマル後悔最小化器が存在する任意の補充付きオンライン学習問題を扱える,best-of-both-worlds型のプライマル・デュアルテンプレートを提案する。特に、我々のフレームワークは、$B=\Omega(T)$ の場合、または1ラウンドあたりの可能な補充量が正の定数である場合に、一定の競合比 $\alpha$ を保証することを示すことによって、敵対的入力の場合の最初の肯定的な結果を与える。さらに,確率的入力モデルの下では,本アルゴリズムは同じ設定における既存のインスタンス依存境界を補完する、インスタンス非依存の $\tilde{O}(T^{1/2})$ regret boundを達成する。最後に,我々の枠組みを実用上重要ないくつかの経済的問題に適用する。 The bandits with knapsack (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.	翻訳日:2023-06-16 19:09:43 公開日:2023-06-14
# ヘテロフィア下での自己教師あり学習とグラフ分類 Self-supervised Learning and Graph Classification under Heterophily ( http://arxiv.org/abs/2306.08469v1 ) ライセンス: Link先を確認	Yilin Ding, Zhen Liu, Hao Hao	(参考訳) 自己教師型学習は近年,グラフ表現学習において有望な能力を示している。既存のほとんどの事前学習戦略は、通常、特殊なローパスフィルタと見なすことのできる一般的なグラフニューラルネットワーク(gnns)を選択するが、ヘテロフィリを効果的に捉えることができない。本稿では,ヘテロフィリグラフの分類において,低域通過フィルタと高域通過フィルタの性能を実験的に検討し,ヘテロフィリグラフ表現の学習において高周波信号が重要であることを示す。一方で,グラフの構造パターンを効果的に捉える方法や,自己教師付き事前学習戦略がグラフ構造をキャプチャする上での能力を測定する方法はまだ不明である。そこで我々はまず,グラフの類似度とグラフペアの埋め込み類似度との相関関係を解析し,グラフ構造を測定する定量的な尺度を設計する。次に,自己教師付き学習で取得したグラフ構造情報を強化するために,計量(pgm)に基づく事前学習のための新しい自己教師付き戦略を提案する。分子特性予測とタンパク質機能予測のために,我々の事前学習戦略を検証した。さらに,ヘテロフィリグラフ分類のための適切な事前学習戦略を設計するよりも,適切なフィルタを選択する方が良い場合もある。 Self-supervised learning has shown its promising capability in graph representation learning in recent work. Most existing pre-training strategies usually choose the popular Graph neural networks (GNNs), which can be seen as a special form of low-pass filter, fail to effectively capture heterophily. In this paper, we first present an experimental investigation exploring the performance of low-pass and high-pass filters in heterophily graph classification, where the results clearly show that high-frequency signal is important for learning heterophily graph representation. On the other hand, it is still unclear how to effectively capture the structural pattern of graphs and how to measure the capability of the self-supervised pre-training strategy in capturing graph structure. To address the problem, we first design a quantitative metric to Measure Graph Structure (MGS), which analyzes correlation between structural similarity and embedding similarity of graph pairs. Then, to enhance the graph structural information captured by self-supervised learning, we propose a novel self-supervised strategy for Pre-training GNNs based on the Metric (PGM). 
Extensive experiments validate our pre-training strategy achieves state-of-the-art performance for molecular property prediction and protein function prediction. In addition, we find choosing the suitable filter sometimes may be better than designing good pre-training strategies for heterophily graph classification.	翻訳日:2023-06-16 19:09:25 公開日:2023-06-14
# 時間依存非可換量子系における創発的幾何位相 Emergent geometric phase in time-dependent noncommutative quantum system ( http://arxiv.org/abs/2306.08467v1 ) ライセンス: Link先を確認	Anwesha Chakraborty	(参考訳) 量子重力効果が観測されると予測される唯一の領域であるプランク長スケールの近傍に事象を局在化させようとするいかなる試みも、必然的に重力崩壊を引き起こす。そのような状況の発生を防ぐためには、演算子の地位に格上げされた時空座標の間に非可換 (NC) 代数を仮定しなければならない。一方、時間を演算子とする量子力学そのものの一貫した定式化は、困難で長年の問題である。ここでは, 1+1次元NC時空(Moyal型非可換性)上の非相対論的量子力学を扱いやすい形で定式化する体系的な方法を与え、これには等価な可換理論の定式化が必要となる。時空の非可換性の影響は、おそらく非常に高いエネルギースケールで重要になるはずであるが、低エネルギー領域においても量子時空の効果の名残が存在すると推測するのは興味深い。この動機のもとで、時間依存系、すなわちNC時空における強制調和振動子の研究を行い、NCパラメータをゼロに設定すると消失する幾何位相の出現を示した。これは、幾何位相の発生が時空の非可換性に完全に依存していることを示している。 Any effort to localise an event in the vicinity of the Planck length scale, only where the quantum gravitational effects are predicted to be observed, will invariably result in gravitational collapse. One must postulate noncommutative (NC) algebra between space-time coordinates, which are now elevated to the status of operators, in order to prevent such a situation from occurring. On the other hand, a consistent formulation of Quantum mechanics itself, with time being an operator is a challenging and longstanding problem. Here we have given a systematic way to formulate non-relativistic quantum mechanics on 1+1 dimensional NC space-time (Moyal type noncommutativity) in a user-friendly way, which mandates the formulation of an equivalent commutative theory. Although the effect of noncommutativity of space-time should presumably become significant at a very high energy scale, it is intriguing to speculate that there should be some relics of the effects of quantum space-time even in a low-energy regime. With this motivation in mind, we undertake the study of a time-dependent system, namely a forced harmonic oscillator in NC space-time and have shown the emergence of a geometric phase, which vanishes if the NC parameter is put to zero, proving the fact that, the occurrence of geometric phase is totally dependent on the non-commutativity of space-time.	翻訳日:2023-06-16 19:09:04 公開日:2023-06-14
# Meta-Gradient Augmentation によるメタ学習の汎化の改善 Improving Generalization in Meta-Learning via Meta-Gradient Augmentation ( http://arxiv.org/abs/2306.08460v1 ) ライセンス: Link先を確認	Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin	(参考訳) メタ学習の手法は一般的に2ループのフレームワークに従い、各ループは悪名高い過剰適合に苦しむ可能性があり、新しいタスクへの迅速な適応と一般化を妨げる。既存の手法は、訓練サンプルの相互排他性や多様性を高めることでこれに対処するが、これらのデータ操作戦略はデータに依存しており、柔軟性が不十分である。本研究は,勾配正則化の観点からメタ学習の過剰適合を緩和し,データ非依存な \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) 法を提案する。鍵となるアイデアは、まずネットワークプルーニングによって丸暗記を壊し、内側ループの記憶過剰適合に対処することであり、その後、プルーニングされたサブネットワークの勾配が自然にメタ勾配の高品質な拡張を形成し、外側ループの学習者過剰適合を緩和する。具体的には,\textit{random width pruning}, \textit{random parameter pruning},および新たに提案する \textit{catfish pruning} の3つのプルーニング戦略を検討する。\textit{catfish pruning} は各パラメータに対して Meta-Memorization Carrying Amount (MMCA) スコアを計測し、高スコアのパラメータを刈り込むことで丸暗記を可能な限り壊す。提案した MGAug は、PAC-Bayes フレームワークの汎化境界によって理論的に保証される。さらに、性能向上とリソースオーバーヘッドのトレードオフとして、MGAug-MaxUpと呼ばれる軽量版に拡張する。複数のfew-shot学習ベンチマークに対する広範な実験により、MGAugの有効性と様々なメタベースラインに対する大幅な改善が検証された。コードは \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation} で公開されている。 Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) method. The key idea is to first break the rote memories by network pruning to address memorization overfitting in the inner loop, and then the gradients of pruned sub-networks naturally form the high-quality augmentation of the meta-gradient to alleviate learner overfitting in the outer loop. 
Specifically, we explore three pruning strategies, including \textit{random width pruning}, \textit{random parameter pruning}, and a newly proposed \textit{catfish pruning} that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework. In addition, we extend a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation}.	翻訳日:2023-06-16 19:08:39 公開日:2023-06-14
# PoetryDiffusion: 詩生成における意味と韻律の同時制御に向けて PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation ( http://arxiv.org/abs/2306.08456v1 ) ライセンス: Link先を確認	Zhiyuan Hu, Chumin Liu, Yue Feng, Bryan Hooi	(参考訳) 詩生成は自然言語生成において典型的で一般的なタスクである。先行研究は、詩生成の意味的側面または韻律的側面のいずれかの制御には成功しているが、両方の観点を同時に扱うことには依然として課題がある。本稿では,このような課題に取り組むため、Diffusionモデルを初めて用いてSonnetおよび中国語のSongCiの詩を生成する。自己回帰生成と異なり、Diffusionモデルに基づくPoetryDiffusionモデルは、文全体の情報を考慮して完全な文や詩を生成し、意味表現を改善する。さらに、韻律(形式とリズム)を操作・評価するために、新しい韻律コントローラを組み込む。PoetryDiffusionのデノイジング過程は、意味の段階的な強化と韻律コントローラの柔軟な統合を可能にする。2つのデータセットに対する実験結果から,本モデルが意味的,韻律的,および総合的な性能において既存モデルより優れていることが示された。 Poetry generation is a typical and popular task in natural language generation. While prior works have shown success in controlling either semantic or metrical aspects of poetry generation, there are still challenges in addressing both perspectives simultaneously. In this paper, we employ the Diffusion model to generate poetry in Sonnet and SongCi in Chinese for the first time to tackle such challenges. Different from autoregressive generation, our PoetryDiffusion model, based on Diffusion model, generates the complete sentence or poetry by taking into account the whole sentence information, resulting in improved semantic expression. Additionally, we incorporate a novel metrical controller to manipulate and evaluate metrics (format and rhythm). The denoising process in PoetryDiffusion allows for gradual enhancement of semantics and flexible integration of the metrical controller. Experimental results on two datasets demonstrate that our model outperforms existing models in terms of semantic, metrical and overall performance.	翻訳日:2023-06-16 19:08:03 公開日:2023-06-14
# 血圧測定技術に関する調査研究 : バイアスの潜在的源への取り組み A Survey on Blood Pressure Measurement Technologies: Addressing Potential Sources of Bias ( http://arxiv.org/abs/2306.08451v1 ) ライセンス: Link先を確認	Seyedeh Somayyeh Mousavi and Reza Sameni	(参考訳) 血圧は、健康、特に心血管の健康に関する重要な洞察を与える重要なサインである。疾病予防、診断、治療、管理のための医療施設や家庭において重要な役割を担っている。医師は決定を下すために血圧値に大きく依存している。ほとんどの商用機器は血圧測定にカフを使用し、高血圧の頻度が高いために自動装置は人気を博している。血圧の自己測定とホームモニタリングも推奨されている。しかし、血圧測定技術の精度や、報告された値と実際の値との整合性に懸念が生じる。人々はこれらの報告された値に基づいて薬を調整し、正確さを不可欠にします。本研究は「バイアス」の概念に着目し、報告された血圧値と実際の血圧値との潜在的な不一致を強調する。これまでの研究では,(1)血圧測定装置,(2)主観的要因,(3)測定セッションの3つのカテゴリから発せられるバイアスを同定した。具体的には,カフをベースとした血圧技術にまつわるバイアスについて,医療応用の普及と在宅モニタリングの傾向について検討した。バイアスの主な原因を特定し、対処することは、バイアスの伝播を防ぎ、潜在的な影響を軽減するために不可欠である。さらに,機械学習を用いた血圧モニタリングの今後の展望についても検討した。 Blood pressure is a vital sign that offers important insights into overall health, particularly cardiovascular well-being. It plays a critical role in medical settings and homes for disease prevention, diagnosis, treatment, and management. Physicians heavily rely on blood pressure values for making crucial decisions. Most commercial devices utilize cuffs for blood pressure measurement, and automatic devices have gained popularity due to the high prevalence of hypertension. Self-measurement and home monitoring of blood pressure are also recommended. However, concerns arise regarding the accuracy of blood pressure measurement technologies and the alignment of reported values with actual values. People often adjust their medication based on these reported values, making accuracy vital. This study focuses on the concept of ``bias'' to highlight potential discrepancies between reported and actual blood pressure values. Previous research has identified biases originating from three categories: (1) blood pressure measurement devices, (2) subject-specific factors, and (3) measurement sessions. 
Specifically, this study examines biases associated with cuff-based blood pressure technologies due to their widespread use in medical applications and the growing trend of home monitoring. Identifying and addressing the primary sources of biases is crucial to prevent their propagation and mitigate potential consequences. Additionally, the study explores the future prospects of blood pressure monitoring using machine learning methods.	翻訳日:2023-06-16 19:07:46 公開日:2023-06-14
# 非定常データのオンライン分類のためのカルマンフィルタ Kalman Filter for Online Classification of Non-Stationary Data ( http://arxiv.org/abs/2306.08448v1 ) ライセンス: Link先を確認	Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein	(参考訳) オンライン連続学習(ocl)では、学習システムはデータのストリームを受け取り、予測およびトレーニングステップを順次実行する。 OCLの重要な課題は、データの特定の非定常構造への自動適応と予測の不確実性の定量化である。これらの課題に触発され、線形予測量に対する(おそらく事前学習された)ニューラル表現と状態空間モデルを用いて確率的ベイズオンライン学習モデルを導入する。線形予測子重みの非定常性は、忘れることを定量化する係数によってパラメータドリフト遷移密度を用いてモデル化される。このモデルの推論は、線形重みの後方分布を追跡する効率的なカルマンフィルタ再帰によって実装されるが、遷移ダイナミクス係数のオンラインsgd更新により、データに見られる非定常性に適応することができる。フレームワークは線形ガウスモデルとして開発されているが、分類問題やディープラーニング表現の微調整のために拡張する。 CIFAR-100 や CLOC などのデータセットを用いたマルチクラス分類実験では,モデルの予測能力と非定常性を捉える柔軟性を示す。 In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are concerned with automatic adaptation to the particular non-stationary structure of the data, and with quantification of predictive uncertainty. Motivated by these challenges we introduce a probabilistic Bayesian online learning model by using a (possibly pretrained) neural representation and a state space model over the linear predictor weights. Non-stationarity over the linear predictor weights is modelled using a parameter drift transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions which track the posterior distribution over the linear weights, while online SGD updates over the transition dynamics coefficient allows to adapt to the non-stationarity seen in data. While the framework is developed assuming a linear Gaussian model, we also extend it to deal with classification problems and for fine-tuning the deep learning representation. 
In a set of experiments in multi-class classification using data sets such as CIFAR-100 and CLOC we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.	翻訳日:2023-06-16 19:07:26 公開日:2023-06-14
# OoD検出器の厳密な設計に向けて Towards Rigorous Design of OoD Detectors ( http://arxiv.org/abs/2306.08447v1 ) ライセンス: Link先を確認	Chih-Hong Cheng, Changshun Wu, Harald Ruess, Saddek Bensalem	(参考訳) out-of-distribution(OoD)検出技術は、安全関連ニューラルネットワークに有用である。しかし,期待キャリブレーション誤差などの指標の達成を目指す現在の性能指向のOoD検出技術は,安全性の主張を確立するには不十分であると我々は主張する。欠けているのは、OoD検出器の開発、検証、妥当性確認のための厳密な設計アプローチである。これらの設計原則は、意図した機能と運用ドメインに整合している必要がある。そこで我々は,OoD検出器のための厳密で安全関連の設計手法を開発するための重要な技術的課題のいくつかを、可能な今後の方向性とともに定式化する。 Out-of-distribution (OoD) detection techniques are instrumental for safety-related neural networks. We are arguing, however, that current performance-oriented OoD detection techniques geared towards matching metrics such as expected calibration error, are not sufficient for establishing safety claims. What is missing is a rigorous design approach for developing, verifying, and validating OoD detectors. These design principles need to be aligned with the intended functionality and the operational domain. Here, we formulate some of the key technical challenges, together with a possible way forward, for developing a rigorous and safety-related design methodology for OoD detectors.	翻訳日:2023-06-16 19:07:05 公開日:2023-06-14
# グラフ構造化力学系に対する深いガウス的マルコフランダム場 Deep Gaussian Markov Random Fields for Graph-Structured Dynamical Systems ( http://arxiv.org/abs/2306.08445v1 ) ライセンス: Link先を確認	Fiona Lippert, Bart Kranstauber, E. Emiel van Loon, Patrick Forr\'e	(参考訳) 高次元状態空間モデルにおける確率的推論は計算上困難である。しかし、多くの時空間系では、状態変数の依存性構造に関する事前知識が利用可能である。この構造を利用して、(部分的に)未知のダイナミクスと限られた履歴データを持つグラフ構造状態空間モデルにおける状態推定と学習のための計算効率の高い手法を開発する。ガウスマルコフ確率場(英語版)(GMRF)の原理推論とディープラーニングからのアイデアを組み合わせた最近の手法に基づいて、簡単な空間グラフ層と時間グラフ層によって定義されたディープGMRFとしてグラフ構造化状態空間モデルを再構成する。これにより、変動推論によって単一の時間列から効率的に学習できるフレキシブルな時空間前処理が実現される。線形ガウスの仮定の下では、共役勾配法を用いて効率的にサンプリングできる閉形式後部を保ち、古典カルマンフィルタに基づくアプローチと比較して好ましくスケーリングする。 Probabilistic inference in high-dimensional state-space models is computationally challenging. For many spatiotemporal systems, however, prior knowledge about the dependency structure of state variables is available. We leverage this structure to develop a computationally efficient approach to state estimation and learning in graph-structured state-space models with (partially) unknown dynamics and limited historical data. Building on recent methods that combine ideas from deep learning with principled inference in Gaussian Markov random fields (GMRF), we reformulate graph-structured state-space models as Deep GMRFs defined by simple spatial and temporal graph layers. This results in a flexible spatiotemporal prior that can be learned efficiently from a single time sequence via variational inference. Under linear Gaussian assumptions, we retain a closed-form posterior, which can be sampled efficiently using the conjugate gradient method, scaling favourably compared to classical Kalman filter based approaches	翻訳日:2023-06-16 19:06:54 公開日:2023-06-14
# 科学的シンボリック推論に先立つ確率的正則木 Probabilistic Regular Tree Priors for Scientific Symbolic Reasoning ( http://arxiv.org/abs/2306.08506v1 ) ライセンス: Link先を確認	Tim Schneider, Amin Totounferoush, Wolfgang Nowak, Steffen Staab	(参考訳) シンボリック回帰(SR)は、データから科学方程式を発見できる。可能な方程式の大きな探索空間を制限するため、任意の文字列の部分集合を特徴づける形式文法の用語で事前知識が表現されている。しかし、構文的に正しい方程式の集合を表現するのに必要な文脈自由文法、前者の閉包特性の欠如、後者のツリー構造の間にはミスマッチがある。私たちの貢献は (i)確率正規木表現(pRTE)によりどの方程式が予想されるかという専門家の事前の信念をコンパクトに表現し、 (ii)有限状態機械として符号化された記号回帰に対して、ベイズ推論を効率的に利用できるように適応させる。本研究は土壌科学における吸収等温線の発見と超弾性材料のモデル化に有効性を示す。 Symbolic Regression (SR) allows for the discovery of scientific equations from data. To limit the large search space of possible equations, prior knowledge has been expressed in terms of formal grammars that characterize subsets of arbitrary strings. However, there is a mismatch between context-free grammars required to express the set of syntactically correct equations, missing closure properties of the former, and a tree structure of the latter. Our contributions are to (i) compactly express experts' prior beliefs about which equations are more likely to be expected by probabilistic Regular Tree Expressions (pRTE), and (ii) adapt Bayesian inference to make such priors efficiently available for symbolic regression encoded as finite state machines. Our scientific case studies show its effectiveness in soil science to find sorption isotherms and for modeling hyper-elastic materials.	翻訳日:2023-06-16 18:58:52 公開日:2023-06-14
# DiffuDetox: テキストのデトックス化のための混合拡散モデル DiffuDetox: A Mixed Diffusion Model for Text Detoxification ( http://arxiv.org/abs/2306.08505v1 ) ライセンス: Link先を確認	Griffin Floto, Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Zhenwei Tang, Ali Pesaranghader, Manasa Bharadwaj, Scott Sanner	(参考訳) テキストデトックス化(text detoxification)は、有害なテキストから攻撃的なコンテンツを除去することを目的とした条件付きテキスト生成タスクである。攻撃的なコンテンツに頻繁に遭遇するオンラインフォーラムやソーシャルメディアにおいて非常に有用である。直感的には、意味を保ちながら文をデトックス化する方法は多様であり、ユーザにテキストを表示する前に、デトックス化された文の中から選択できる。条件付き拡散モデルは、言語モデルに基づく既存の条件付きテキスト生成モデルよりも高い生成的多様性を示すため、このタスクに特に適している。それでも、不十分なデータで訓練された場合にはテキストの流暢さが低下し、本タスクはまさにその状況にある。本研究では,テキストデトックス化のための条件付き・非条件付き混合拡散モデルであるDiffuDetoxを提案する。条件付きモデルは、有害なテキストを条件として取り、その有害性を低減し、多様なデトックス化文を生成する。非条件付きモデルは、入力テキストを復元するように訓練され、これにより追加の流暢なテキストを訓練に導入でき、テキストの流暢さが確保される。広範な実験結果と詳細な解析により、提案するDiffuDetoxの有効性が示された。 Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text. It is highly useful for online forums and social media, where offensive content is frequently encountered. Intuitively, there are diverse ways to detoxify sentences while preserving their meanings, and we can select from detoxified sentences before displaying text to users. Conditional diffusion models are particularly suitable for this task given their demonstrated higher generative diversity than existing conditional text generation models based on language models. Nonetheless, text fluency declines when they are trained with insufficient data, which is the case for this task. In this work, we propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification. The conditional model takes toxic text as the condition and reduces its toxicity, yielding a diverse set of detoxified sentences. The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus ensures text fluency. Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed DiffuDetox.	翻訳日:2023-06-16 18:58:37 公開日:2023-06-14
# ITALIC: イタリアのインテント分類データセット ITALIC: An Italian Intent Classification Dataset ( http://arxiv.org/abs/2306.08502v1 ) ライセンス: Link先を確認	Alkis Koudounas, Moreno La Quatra, Lorenzo Vaiani, Luca Colomba, Giuseppe Attanasio, Eliana Pastor, Luca Cagliero, Elena Baralis	(参考訳) 最近の大規模音声言語理解データセットは、主に英語に焦点を当てており、特定の音素や異なる発話中の単語といった言語固有の現象を考慮していない。 ITALICはイタリア語で意図分類用に設計された最初の大規模音声データセットである。このデータセットは、イタリア各地の70人の話者が記録した16,521人のクラウドソースオーディオサンプルからなり、インテントラベルと追加メタデータが付加されている。我々は現在最先端の音声とテキストモデルを評価することでITALICの汎用性を探求する。意図分類の結果から,大規模化や言語適応の促進により,より優れた音声モデルが得られ,モノリンガルテキストモデルが多言語モデルよりも優れていることが示唆された。我々は、新しいイタリアSLUモデルと言語固有のデータセットの開発を効率化するために、データセットとアノテーションスキームの両方をリリースする。 Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Italian regions and annotated with intent labels and additional metadata. We explore the versatility of ITALIC by evaluating current state-of-the-art speech and text models. Results on intent classification suggest that increasing scale and running language adaptation yield better speech models, monolingual text models outscore multilingual ones, and that speech recognition on ITALIC is more challenging than on existing Italian benchmarks. We release both the dataset and the annotation scheme to streamline the development of new Italian SLU models and language-specific datasets.	翻訳日:2023-06-16 18:58:20 公開日:2023-06-14
# 機械学習を用いた都市変化過程追跡のための衛星誘導夜間光時系列の適応モデリング Adaptive Modeling of Satellite-Derived Nighttime Lights Time-Series for Tracking Urban Change Processes Using Machine Learning ( http://arxiv.org/abs/2306.08501v1 ) ライセンス: Link先を確認	Srija Chakraborty and Eleanor C. Stokes	(参考訳) リモートセンシングされた夜間灯(ntl)は、都市化、社会・政治の衝突と変位、災害、休日、日々の人間の行動パターンの変化など、人間と生態的な幸福にとって重要な都市変化プロセスを独特に捉えている。グローバルなNTL製品はいくつかあるが、開発レベルや社会的、経済的、文化的特徴など、光に影響を与える固有の都市特有の要因は、各都市特有のものであり、NTLシグネチャに埋め込まれた都市プロセスの特徴付けが困難であり、都市の変化分析のスケーラビリティを制限している。本研究では,各都市に適応し,都市固有の時間パターンの学習に有効である日次衛星由来NTLデータから都市変化を検出するためのデータ駆動型手法を提案する。提案手法は,過去のデータ記録からニューラルネットワークを用いてntlシグネチャを予測し,大量のラベルなしデータの利用を可能にし,アノテーションの手間を省く。異常検出手法を用いたモデル予測から観測されたNTLの偏差に基づいて都市の変化を検出する。モデル予測と観測されたNTLを比較することで、変更の方向(正または負)を特定したり、変更の重大度を監視してリカバリを追跡することもできる。このモデルの運用にあたっては,動的ntl時系列を持つ多様な地域から10の都市圏を考察し,ntl偏差に基づいて,異なるドライバによる変化過程とこれらの都市域内で発生するレートを検出する手法の一般化可能性を示す。毎日のリモートセンシング観測から変化を監視するこのスケーラブルなアプローチは、大規模なデータボリュームを効率的に活用し、継続的な監視と意思決定をサポートする。 Remotely sensed nighttime lights (NTL) uniquely capture urban change processes that are important to human and ecological well-being, such as urbanization, socio-political conflicts and displacement, impacts from disasters, holidays, and changes in daily human patterns of movement. Though several NTL products are global in extent, intrinsic city-specific factors that affect lighting, such as development levels, and social, economic, and cultural characteristics, are unique to each city, making the urban processes embedded in NTL signatures difficult to characterize, and limiting the scalability of urban change analyses. In this study, we propose a data-driven approach to detect urban changes from daily satellite-derived NTL data records that is adaptive across cities and effective at learning city-specific temporal patterns. The proposed method learns to forecast NTL signatures from past data records using neural networks and allows the use of large volumes of unlabeled data, eliminating annotation effort. 
Urban changes are detected based on deviations of observed NTL from model forecasts using an anomaly detection approach. Comparing model forecasts with observed NTL also allows identifying the direction of change (positive or negative) and monitoring change severity for tracking recovery. In operationalizing the model, we consider ten urban areas from diverse geographic regions with dynamic NTL time-series and demonstrate the generalizability of the approach for detecting the change processes with different drivers and rates occurring within these urban areas based on NTL deviation. This scalable approach for monitoring changes from daily remote sensing observations efficiently utilizes large data volumes to support continuous monitoring and decision making.	翻訳日:2023-06-16 18:58:04 公開日:2023-06-14
# 線形応答による非平衡量子プロービング Non-equilibrium quantum probing through linear response ( http://arxiv.org/abs/2306.08500v1 ) ライセンス: Link先を確認	Sherry Blair, Giorgio Zicari, Alessio Belenchia, Alessandro Ferraro, Mauro Paternostro	(参考訳) 線形応答理論の形式論は、開量子系が非平衡定常状態に向かって進化する物理的状況を含むように拡張することができる。ここでは、Konopik と Lutz [Phys. Rev. Research {\bf 1}, 033156 (2019)] が提案したフレームワークを用いて、力学のユニタリ摂動を超えた場合を扱う。2つの結合した量子調和振動子からなる開放系を考察し、ハミルトニアンダイナミクスに影響を及ぼすユニタリ摂動、および温度やスクイージングなど環境の性質に影響を及ぼす非ユニタリ摂動に対する系の応答を調べる。線形応答は, 量子プロービングの手法と組み合わせることで, 非ユニタリ力学の場合であっても, 摂動や環境の特性について有用な定量的情報を効果的に提供できることを示す。 The formalism of linear response theory can be extended to encompass physical situations where an open quantum system evolves towards a non-equilibrium steady-state. Here, we use the framework put forward by Konopik and Lutz [Phys. Rev. Research {\bf 1}, 033156 (2019)] to go beyond unitary perturbations of the dynamics. Considering an open system comprised of two coupled quantum harmonic oscillators, we study the system's response to unitary perturbations, affecting the Hamiltonian dynamics, as well as non-unitary perturbations, affecting the properties of the environment, e.g., its temperature and squeezing. We show that linear response, combined with a quantum probing approach, can effectively provide valuable quantitative information about the perturbation and characteristics of the environment, even in cases of non-unitary dynamics.	翻訳日:2023-06-16 18:57:33 公開日:2023-06-14
# RISCLIP: CLIPを用いた参照画像セグメンテーションフレームワーク RISCLIP: Referring Image Segmentation Framework using CLIP ( http://arxiv.org/abs/2306.08498v1 ) ライセンス: Link先を確認	Seoyeon Kim, Minguk Kang, Jaesik Park	(参考訳) 近年のコンピュータビジョンと自然言語処理の進歩は、Referring Image Segmentation (RIS)を含むマルチモーダルタスクの活発な研究につながっている。最近のアプローチは、RISのフロンティアを目覚ましい幅で前進させているが、最先端の性能を達成するには、外部の視覚的グラウンディングデータセットでの追加の事前学習段階が必要である。本稿では, CLIP(Contrastive Language-Image Pretraining) を RIS に効果的に適応させることにより, この要件からの脱却を試みる。本稿では,Fusion AdaptersとBackbone Adaptersを用いて,凍結したCLIP特徴をRISに残差的に適応させる新しいフレームワークを提案する。CLIPを凍結することでバックボーンの豊富で汎用的な画像テキストアライメントの知識が保たれ、Fusion Adaptersはマルチモーダル間の通信を導入し、Backbone AdaptersはRISの解決に有用な新しい知識を注入する。提案手法は3つの主要なRISベンチマークで新たな最高性能に達する。この性能は追加の事前学習なしで達成され、追加の訓練やデータ準備の必要性をなくす。ソースコードとモデルの重みは、公開時に提供される予定である。 Recent advances in computer vision and natural language processing have naturally led to active research in multi-modal tasks, including Referring Image Segmentation (RIS). Recent approaches have advanced the frontier of RIS by impressive margins, but they require an additional pretraining stage on external visual grounding datasets to achieve the state-of-the-art performances. We attempt to break free from this requirement by effectively adapting Contrastive Language-Image Pretraining (CLIP) to RIS. We propose a novel framework that residually adapts frozen CLIP features to RIS with Fusion Adapters and Backbone Adapters. Freezing CLIP preserves the backbone's rich, general image-text alignment knowledge, whilst Fusion Adapters introduce multi-modal communication and Backbone Adapters inject new knowledge useful in solving RIS. Our method reaches a new state of the art on three major RIS benchmarks. We attain such performance without additional pretraining and thereby absolve the necessity of extra training and data preparation. Source code and model weights will be available upon publication.	翻訳日:2023-06-16 18:57:19 公開日:2023-06-14
# 強対数凹分布に対するランゲヴィン・モンテカルロ:ランダム化された中間点の再検討 Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited ( http://arxiv.org/abs/2306.08494v1 ) ライセンス: Link先を確認	Lu Yu, Avetik Karagulyan, Arnak Dalalyan	(参考訳) 我々は,$\mathbb R^p$ の至る所で滑らかな強対数凹密度を持つ目標分布からサンプリングする問題を再検討する。この文脈では、密度に関する追加情報が利用できない場合、速度論的ランゲヴィン拡散のランダム化中間点離散化は、条件数が大きい高次元において最もスケーラブルな方法であることが知られている。我々の主な結果は、この手法のワッサーシュタイン-2誤差に対する、非漸近的で計算の容易な上界である。この計算可能な上界を確立する方法をより詳しく説明するために,バニラ・ランゲヴィン過程に対する中間点離散化の解析を行う。この解析は根底にある原理を明らかにするのに役立ち、中間点離散化を伴う速度論的ランゲヴィン過程の改善された上界を確立するために用いる有益な洞察を与える。さらに、これらの手法を適用することで、オイラー離散化を伴う速度論的ランゲヴィン過程に対して、既存の上界よりも条件数への依存性が良い新しい保証を確立する。 We revisit the problem of sampling from a target distribution that has a smooth strongly log-concave density everywhere in $\mathbb R^p$. In this context, if no additional density information is available, the randomized midpoint discretization for the kinetic Langevin diffusion is known to be the most scalable method in high dimensions with large condition numbers. Our main result is a nonasymptotic and easy to compute upper bound on the Wasserstein-2 error of this method. To provide a more thorough explanation of our method for establishing the computable upper bound, we conduct an analysis of the midpoint discretization for the vanilla Langevin process. This analysis helps to clarify the underlying principles and provides valuable insights that we use to establish an improved upper bound for the kinetic Langevin process with the midpoint discretization. Furthermore, by applying these techniques we establish new guarantees for the kinetic Langevin process with Euler discretization, which have a better dependence on the condition number than existing upper bounds.	翻訳日:2023-06-16 18:56:59 公開日:2023-06-14
# ニューラルネットワーク翻訳モデルに対する逆攻撃に対する緩和最適化手法 A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models ( http://arxiv.org/abs/2306.08492v1 ) ライセンス: Link先を確認	Sahar Sadrizadeh, Cl\'ement Barbier, Ljiljana Dolamic, Pascal Frossard	(参考訳) 本稿では,ニューラルネットワーク翻訳(NMT)モデルに対する最適化に基づく逆攻撃を提案する。まず、原文と意味的に類似しているが、ターゲットNMTモデルによって生成された翻訳を破壊できる逆例を生成する最適化問題を提案する。この最適化問題は離散的であり,それを解くための連続緩和を提案する。この緩和により、各トークンの確率分布が逆の例に現れ、これらの分布からサンプリングすることで複数の逆の例を生成することができる。実験結果から,本攻撃はNMTモデルの翻訳品質を著しく低下させつつ,原文と逆文のセマンティックな類似性を維持できることがわかった。さらに,本攻撃は,成功率,類似性保持率,翻訳品質への影響,トークンエラー率において,ベースラインを上回っている。最後に,勾配がアクセス可能な参照モデルの最適確率分布からサンプリングすることにより,攻撃のブラックボックス拡張を提案する。 In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it. With this relaxation, we find a probability distribution for each token in the adversarial example, and then we can generate multiple adversarial examples by sampling from these distributions. Experimental results show that our attack significantly degrades the translation quality of multiple NMT models while maintaining the semantic similarity between the original and adversarial sentences. Furthermore, our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate. Finally, we propose a black-box extension of our attack by sampling from an optimized probability distribution for a reference model whose gradients are accessible.	翻訳日:2023-06-16 18:56:40 公開日:2023-06-14
# 大規模・高密度ランダムクロネッカーグラフの解析と近似推定 Analysis and Approximate Inference of Large and Dense Random Kronecker Graphs ( http://arxiv.org/abs/2306.08489v1 ) ライセンス: Link先を確認	Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao	(参考訳) ランダムグラフモデルは科学や産業においてますます重要な役割を担っており、社会や交通ネットワーク、レコメンデーションシステムや分子遺伝学など幅広い分野に応用されている。本稿では,グラフ頂点数$N$が大きい場合のランダムクロネッカーグラフモデルの詳細な解析を行う。ランダム行列理論の最近の進歩を基にして、密な領域において、ランダムクロネッカーグラフの隣接行列は、グラフパラメータに線形な低ランク(高々$\log N$のオーダー)の信号行列と、四分円形の特異値分布を持つランダムノイズ行列からなる信号プラスノイズモデル(signal-plus-noise model)に近似的に従うことを示した。この観測により,計算複雑性の低減と(漸近的な)性能保証により,グラフパラメータを近似的に推定する「denoise-and-solve」メタアルゴリズムを提案することができる。合成グラフと現実グラフの両方におけるグラフ推定とグラフ分類の数値実験を行い,提案手法の利点を実証した。 Random graph models are playing an increasingly important role in science and industry, and find applications in a variety of fields, ranging from social and traffic networks to recommendation systems and molecular genetics. In this paper, we perform an in-depth analysis of the random Kronecker graph model proposed in \cite{leskovec2010kronecker}, when the number of graph vertices $N$ is large. Building upon recent advances in random matrix theory, we show, in the dense regime, that the random Kronecker graph adjacency matrix follows approximately a signal-plus-noise model, with a small-rank (of order at most $\log N$) signal matrix that is linear in the graph parameters and a random noise matrix having a quarter-circle-form singular value distribution. This observation allows us to propose a "denoise-and-solve" meta algorithm to approximately infer the graph parameters, with reduced computational complexity and (asymptotic) performance guarantees. Numerical experiments of graph inference and graph classification on both synthetic and realistic graphs are provided to support the advantageous performance of the proposed approach.	翻訳日:2023-06-16 18:56:25 公開日:2023-06-14
# マルチモーダル集中型知識グラフによる未知物体の認識 Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation ( http://arxiv.org/abs/2306.08487v1 ) ライセンス: Link先を確認	Likang Wu, Zhi Li, Hongke Zhao, Zhefeng Wang, Qi Liu, Baoxing Huai, Nicholas Jing Yuan, Enhong Chen	(参考訳) Zero-Shot Learning (ZSL)は、見えないオブジェクトを自動的に認識することを目的としており、マシンに対する新しい現実世界の知識を継続的に理解するための、有望な学習パラダイムである。近年、知識グラフ(kg)は、ゼロショットタスクを大規模かつ非帰属データで扱うための効果的なスキームとして証明されている。先行研究は常に、見えないオブジェクトと見えないオブジェクトの関係を、既存の知識グラフから視覚情報に埋め込み、見えないデータの認知能力を促進する。実際、現実世界の知識は自然にマルチモーダルな事実によって形成されます。グラフの観点からの通常の構造的知識と比較して、マルチモーダルkgはきめ細かい知識を持つ認知システムを提供できる。例えば、テキスト記述とビジュアルコンテンツは、知識のトリプレットのみに依存するよりも、事実のより重要な詳細を描写することができる。残念ながら、このマルチモーダルなきめ細かな知識は、異なるモダリティ間の機能アライメントのボトルネックのため、ほとんど展開されていない。そこで我々は,画像の領域と対応するセマンティックな埋め込みとを,設計した集中型注目モジュールと自己校正損失によってマッチングする多モード集中型ZSLフレームワークを提案する。これにより、ZSLフレームワークのセマンティックトランスファープロセスは、エンティティ間のより分化した知識を学習する。私たちのモデルは、粗いグローバル機能のみを使用する場合のパフォーマンス制限も取り除きます。大規模実世界データを用いた大規模実験を行い,モデルの評価を行った。実験結果は,標準ゼロショット分類タスクにおける提案モデルの有効性を明らかにした。 Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen objects, is a promising learning paradigm to understand new real-world knowledge for machines continuously. Recently, the Knowledge Graph (KG) has been proven as an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies always embed relationships of seen and unseen objects into visual information from existing knowledge graphs to promote the cognitive ability of the unseen data. Actually, real-world knowledge is naturally formed by multimodal facts. Compared with ordinary structural knowledge from a graph perspective, multimodal KG can provide cognitive systems with fine-grained knowledge. For example, the text description and visual content can depict more critical details of a fact than only depending on knowledge triplets. Unfortunately, this multimodal fine-grained knowledge is largely unexploited due to the bottleneck of feature alignment between different modalities. 
To that end, we propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings via a designed dense attention module and self-calibration loss. This lets the semantic transfer process of our ZSL framework learn more differentiated knowledge between entities. Our model also removes the performance limitation of using only rough global features. We conduct extensive experiments and evaluate our model on large-scale real-world data. The experimental results clearly demonstrate the effectiveness of the proposed model in standard zero-shot classification tasks.	翻訳日:2023-06-16 18:56:03 公開日:2023-06-14
# アクティベーション関数の共設計によるディープニューラルネットワークの高速かつプライベートな推論 Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions ( http://arxiv.org/abs/2306.08538v1 ) ライセンス: Link先を確認	Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum	(参考訳) マシンラーニング・アズ・ア・サービス(MLaaS)は、豊富なコンピューティングリソースを持つ企業がディープニューラルネットワークをトレーニングし、画像分類などのタスクに対してクエリアクセスを提供するという、人気の高い設計である。この設計の課題は、MLaaSがクライアントに対して、モデルをホストしている企業に対して、潜在的にセンシティブなクエリを明らかにする必要があることだ。マルチパーティ計算(MPC)は、暗号化された推論を許すことでクライアントのデータを保護する。しかし、現在のアプローチは、非常に大きな推論時間に苦しむ。 MPCにおける推論時間のボトルネックは、ReLUアクティベーション関数のような非線形層の評価である。機械学習とMPCの側面を共同設計する以前の作業の成功に動機づけられ、アクティベーション関数を共同設計する。我々は全てのReLUを多項式近似に置き換え、それらを単一ラウンドMPCプロトコルで評価し、広域ネットワークにおける最先端の推論時間を与える。さらに,以前に多項式アクティベーションで遭遇した精度問題に対処するために,平文モデルと競合する精度のトレーニングアルゴリズムを提案する。私たちの評価では、最大2300万パラメータを持つ大規模モデルでの推論時間の4倍から90倍の高速化と、競合する推論精度の維持が示されています。 Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC aspects, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. 
Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows speedups in inference time of between $4\times$ and $90\times$ on large models with up to $23$ million parameters while maintaining competitive inference accuracy.	翻訳日:2023-06-16 18:50:20 公開日:2023-06-14
# VIBR:ロバスト視覚制御のためのビュー不変値関数の学習 VIBR: Learning View-Invariant Value Functions for Robust Visual Control ( http://arxiv.org/abs/2306.08537v1 ) ライセンス: Link先を確認	Tom Dupuis, Jaonary Rabarisoa, Quoc-Cuong Pham and David Filliat	(参考訳) 画像におけるエンドツーエンドの強化学習は近年大きな進歩を見せている。データに基づくアプローチはデータ拡張とドメインのランダム化を活用し、表現学習手法は補助損失を使用してタスク関連の特徴を学習する。しかし、強化学習はいまだに、注意をそらす要素や偽のノイズに満ちた視覚的に多様な環境に苦しむ。本研究では,多視点学習と不変予測を組み合わせることで,RLに基づくビジュオモータ制御におけるアウト・オブ・ディストリビューション(OOD)の一般化ギャップを低減する手法であるVIBR(View-Invariant Bellman Residuals)を提案する。モデルフリーアプローチでは,表現学習の目的を付加する必要がなく,限られた追加計算コストでベースライン性能が向上する。視覚摂動の高い複雑なビジュオモータ制御環境において,VIBRは既存の手法よりも優れていることを示す。提案手法は,多くの視覚摂動に対する頑健性,OODの一般化,外挿機能を評価するため,現状の手法では未解決であるDistracting Control Suiteベンチマークの最先端結果を実現する。 End-to-end reinforcement learning on images has shown significant progress in recent years. Data-based approaches leverage data augmentation and domain randomization, while representation learning methods use auxiliary losses to learn task-relevant features. Yet, reinforcement learning still struggles in visually diverse environments full of distractions and spurious noise. In this work, we tackle the problem of robust visual control at its core and present VIBR (View-Invariant Bellman Residuals), a method that combines multi-view training and invariant prediction to reduce the out-of-distribution (OOD) generalization gap for RL-based visuomotor control. Our model-free approach improves baseline performance without the need for additional representation learning objectives and with limited additional computational cost. We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation. Our approach achieves state-of-the-art results on the Distracting Control Suite benchmark, a challenging benchmark still not solved by current methods, where we evaluate the robustness to a number of visual perturbations, as well as OOD generalization and extrapolation capabilities.	翻訳日:2023-06-16 18:49:59 公開日:2023-06-14
# 3量子クリフォード+CS作用素の生成と関係 Generators and relations for 3-qubit Clifford+CS operators ( http://arxiv.org/abs/2306.08530v1 ) ライセンス: Link先を確認	Xiaoning Bian and Peter Selinger	(参考訳) 生成子によるプレゼンテーションと3量子クリフォード+CS作用素群の関係について述べる。証明は概ね2つの部分から構成される:(1) ライデマイスター=シュライアーの定理を我々の初期の結果に再帰的に適用すること、(2) 何千もの関係を17の関係に単純化すること。 1)と(2)は、証明アシスタントAgdaで正式に認証されている。 reidemeister-schreier の定理は、スーパーモノイドの表現が与えられた部分モノイドの表現を計算するための構成的方法を与える。 (2) を達成するために、clifford+cs演算子のほぼ正規形式を考案する。その過程で、クリフォード+CS群内のいくつかの興味深い構造も同定する。具体的には、元が一意な正規形式を与えることのできる3つの異なる有限部分群を特定する。 3量子クリフォード+cs群は、もちろん無限であり、これら3つの有限部分群の合併積である。この結果は、 1-立方体 Clifford+T 群が2つの有限部分群の積であるという事実に類似している。 We give a presentation by generators and relations of the group of 3-qubit Clifford+CS operators. The proof roughly consists of two parts: (1) applying the Reidemeister-Schreier theorem recursively to an earlier result of ours; and (2) the simplification of thousands of relations into 17 relations. Both (1) and (2) have been formally verified in the proof assistant Agda. The Reidemeister-Schreier theorem gives a constructive method for computing a presentation of a sub-monoid given a presentation of the super-monoid. To achieve (2), we devise an almost-normal form for Clifford+CS operators. Along the way, we also identify several interesting structures within the Clifford+CS group. Specifically, we identify three different finite subgroups for whose elements we can give unique normal forms. We show that the 3-qubit Clifford+CS group, which is of course infinite, is the amalgamated product of these three finite subgroups. This result is analogous to the fact that the 1-qubit Clifford+T group is an amalgamated product of two finite subgroups.	翻訳日:2023-06-16 18:49:38 公開日:2023-06-14
# SQL2Circuits: 量子自然言語処理法によるSQLクエリのメトリック推定 SQL2Circuits: Estimating Metrics for SQL Queries with A Quantum Natural Language Processing Method ( http://arxiv.org/abs/2306.08529v1 ) ライセンス: Link先を確認	Valter Uotila	(参考訳) 近年、量子コンピューティングは著しく発展している。 SQLクエリの様々なメトリクスを推定するアルゴリズムの開発は、クエリ最適化とデータベース性能に影響を与えるため、データベース研究において重要な研究課題となっている。この研究は、量子自然言語処理(QNLP)から着想を得たアプローチで、実行時間とカーディナリティに関してSQLクエリを分類できる量子機械学習モデルを構築する。量子機械学習の観点から、我々のモデルと結果をQNLPの以前の研究と比較し、我々のモデルは分類タスクにおけるQNLPモデルと同様の精度に達すると結論づける。これは,QNLPにない問題に適用しても,QNLPモデルは有望な手法であることを示している。本研究では,その表現可能性とエンタングリング能力のヒストグラムを計算し,量子機械学習モデルについて検討した。結果は、モデルが十分な表現力を持つ一方で、量子ハードウェア上で実行できないほど複雑ではないことを示している。 Quantum computing has developed significantly in recent years. Developing algorithms to estimate various metrics for SQL queries has been an important research question in database research since the estimations affect query optimization and database performance. This work presents a quantum natural language processing (QNLP)-inspired approach for constructing a quantum machine learning model which can classify SQL queries with respect to their execution times and cardinalities. From the quantum machine learning perspective, we compare our model and results to the previous research in QNLP and conclude that our model reaches accuracy similar to that of the QNLP model in the classification tasks. This indicates that the QNLP model is a promising method even when applied to problems that are not in QNLP. We study the developed quantum machine learning model by calculating its expressibility and entangling capability histograms. The results show that the model has favorable properties: it is expressible yet not too complex to be executed on quantum hardware.	翻訳日:2023-06-16 18:49:21 公開日:2023-06-14
# 予測:連続画像を用いた予測誘導3次元物体検出 Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images ( http://arxiv.org/abs/2306.08528v1 ) ライセンス: Link先を確認	Sanmin Kim, Youngseok Kim, In-Jae Lee, Dongsuk Kum	(参考訳) 最近のカメラベースの3Dオブジェクト検出手法では、複数のフレームが大きな深さ推定誤差を軽減することを期待して、シーケンシャルフレームを導入している。検出性能の改善にもかかわらず、先行の作業は単純融合法(例えば結合)や静的なシーン(例えば時間ステレオ)に限られており、物体の動きキューの重要性を無視している。これらのアプローチはシーケンシャルなイメージの可能性を完全に活用せず、限られた性能改善を示す。この制限に対処するために,予測スキームを検出フレームワークに統合し,運動特徴を明示的に抽出し活用する新しい3Dオブジェクト検出モデルP2D(Predict to Detect)を提案する。 P2Dは、過去のフレームのみを用いて現在のフレーム内のオブジェクト情報を予測し、時間運動の特徴を学習する。次に,予測対象情報に基づいてバードアイビュー(BEV)特徴を注意深く活用し,正確な3次元物体検出を実現する新しい時間的特徴集約手法を提案する。実験結果から,P2Dは連続画像ベースラインに比べてmAPとNDSを3.0%,3.7%改善し,予測スキームを組み込むことで検出精度が大幅に向上することが示された。 Recent camera-based 3D object detection methods have introduced sequential frames to improve the detection performance hoping that multiple frames would mitigate the large depth estimation error. Despite improved detection performance, prior works rely on naive fusion methods (e.g., concatenation) or are limited to static scenes (e.g., temporal stereo), neglecting the importance of the motion cue of objects. These approaches do not fully exploit the potential of sequential images and show limited performance improvements. To address this limitation, we propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework to explicitly extract and leverage motion features. P2D predicts object information in the current frame using solely past frames to learn temporal motion features. We then introduce a novel temporal feature aggregation method that attentively exploits Bird's-Eye-View (BEV) features based on predicted object information, resulting in accurate 3D object detection. Experimental results demonstrate that P2D improves mAP and NDS by 3.0% and 3.7% compared to the sequential image-based baseline, illustrating that incorporating a prediction scheme can significantly improve detection accuracy.	
翻訳日:2023-06-16 18:49:07 公開日:2023-06-14
# 音声強調のための分散保存型補間拡散モデル Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement ( http://arxiv.org/abs/2306.08527v1 ) ライセンス: Link先を確認	Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang	(参考訳) 本研究の目的は,音声強調(SE)のための拡散モデルを実装することである。最初のステップは、連続条件下での分散保存(VP)ベースの補間拡散の理論的基礎を強調することである。次に,VPおよび分散発散(VE)に基づく補間拡散法の両方をカプセル化した,より簡潔なフレームワークを提案する。この2つの手法が提案フレームワークの特別な場合であることを実証する。さらに、SEタスクに対するVPベースの補間拡散の実例を示す。性能の向上とモデルトレーニングの容易化を目的として,拡散モデルで発生する一般的な困難を分析し,適切なハイパーパラメータの提案を行う。最後に,提案手法の有効性を示すために,公開ベンチマークを用いた複数の手法に対する評価を行った。 The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that these two methods are special cases of the proposed framework. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models and suggest suitable hyperparameters. Finally, we evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.	翻訳日:2023-06-16 18:48:42 公開日:2023-06-14
# AlbMoRe:アルバニア語の感情分析のための映画レビューコーパス AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian ( http://arxiv.org/abs/2306.08526v1 ) ライセンス: Link先を確認	Erion Çano	(参考訳) 低リソース言語のためのテキストコーパスのような利用可能なリソースの不足は、自然言語処理や計算言語学の研究を著しく妨げている。本稿では,感情アノテーション付きのアルバニア語映画レビュー800件のコーパスであるAlbMoReを紹介する。各テキストはポジティブまたはネガティブとラベル付けされ、感情分析研究に使用することができる。 AlbMoReサンプルを用いて学習した従来の機械学習分類器に基づく予備結果も報告する。これらは将来の研究実験の比較基準となる。 The lack of available resources such as text corpora for low-resource languages seriously hinders research on natural language processing and computational linguistics. This paper presents AlbMoRe, a corpus of 800 sentiment-annotated movie reviews in Albanian. Each text is labeled as positive or negative and can be used for sentiment analysis research. Preliminary results based on traditional machine learning classifiers trained with the AlbMoRe samples are also reported. They can serve as comparison baselines for future research experiments.	翻訳日:2023-06-16 18:48:24 公開日:2023-06-14
# ランクアグリゲーションにおける分割性の測定と制御 Measuring and Controlling Divisiveness in Rank Aggregation ( http://arxiv.org/abs/2306.08511v1 ) ライセンス: Link先を確認	Rachael Colley, Umberto Grandi, César Hidalgo, Mariana Macedo and Carlos Navarrete	(参考訳) ランクアグリゲーションでは、集団のメンバーが論点に順位を付け、どの論点が集団として好まれるかを決定する。本研究では代わりに、個人の選好間の不一致を表す分割的な論点を特定することに焦点を当てる。我々は、提案する分割性尺度の特性と、既存の分極の概念との関係を分析する。また,不完全な選好の下でのロバスト性や,分割性の制御と操作のためのアルゴリズムについても検討する。我々の結果は、集団意思決定における不一致を定量化する方法についての理解を深める。 In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of our divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.	翻訳日:2023-06-16 18:48:15 公開日:2023-06-14
# 音源追跡のための置換不変リカレントニューラルネットワーク Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications ( http://arxiv.org/abs/2306.08510v1 ) ライセンス: Link先を確認	David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen	(参考訳) ニューラルネットワークに基づく多くのマルチソースローカライゼーションと追跡モデルは、最終段階で1つまたは複数の繰り返しレイヤを使用してソースの移動を追跡する。長い短期記憶(LSTM)やゲートリカレントユニット(GRU)のような従来のリカレントニューラルネットワーク(RNN)は、ベクトルを入力として、別のベクトルを使って状態を記憶する。しかし、このアプローチは、単一の順序ベクトルに含まれる全てのソースからの情報をもたらすため、マルチソース追跡のような置換不変問題には最適ではない。本稿では,その入力と状態の両方を表現するために非順序集合を使用し,入力集合の置換に不変であり,状態集合の置換に同値である新しい再帰的アーキテクチャを提案する。したがって、各音源の情報は個別の埋め込みで表現され、新しい推定値はその順序に関係なくトラックされた軌跡に割り当てられる。 Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as the long short-term memories (LSTMs) or the gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach results in the information from all the sources being contained in a single ordered vector, which is not optimal for permutation-invariant problems such as multi-source tracking. In this paper, we present a new recurrent architecture that uses unordered sets to represent both its input and its state and that is invariant to the permutations of the input set and equivariant to the permutations of the state set. Hence, the information of every sound source is represented in an individual embedding and the new estimates are assigned to the tracked trajectories regardless of their order.	翻訳日:2023-06-16 18:48:03 公開日:2023-06-14
# NISQ時代の量子コンピュータにおける車両ルーティング問題に対する量子ビット効率的な量子アルゴリズム Qubit efficient quantum algorithms for the vehicle routing problem on quantum computers of the NISQ era ( http://arxiv.org/abs/2306.08507v1 ) ライセンス: Link先を確認	Ioannis D. Leonidas, Alexander Dukakis, Benjamin Tan, Dimitris G. Angelakis	(参考訳) 時間枠付き車両ルーティング問題(VRPTW)は、ロジスティクスや輸送など、多くの分野で発生する古典的な最適化問題である。 VRPTWの目標は、車両群が目的地を訪れるための最短ルートを見つけることである。近年,2次非制約バイナリ最適化(QUBO)問題として定式化できる問題に対する近似解を求めるために,変分量子アルゴリズム(VQA)の使用への関心が高まっている。本研究では,VRPTW を QUBO として定式化し,[1] に記述した先述の符号化方式を用いて VRPTW に量子変分法を適用し,必要な量子ビット数を大幅に削減する。 ExxonMobilの研究者が提供したデータをもとに,11～3964経路のVRPTWインスタンスを対象に,提案手法を評価した。 NISQ時代に可能な最大問題のサイズが20〜30経路程度であるような標準的な完全符号化手法を用いて得られた解と比較した。 IBMQ、AWS(Rigetti)、IonQによって提供されるクラウド量子ハードウェアだけでなく、シミュレータでアルゴリズムを実行し、シミュレータ上でも結果をベンチマークします。本手法は,全エンコーディングを用いた量子アルゴリズムの解に匹敵するVRPTWの近似解を求めることができることを示す。その結果,産業上の最適化問題に対する近似解を求めるのに必要な量子ビット数を劇的に削減する有望な手法が提案されている。 The vehicle routing problem with time windows (VRPTW) is a classic optimization problem that arises in many different areas, such as logistics and transportation. The goal of the VRPTW is to find the shortest possible route for a fleet of vehicles to visit a set of destinations. In recent years, there has been growing interest in using variational quantum algorithms (VQAs) to find approximate solutions to problems that can be formulated as quadratic unconstrained binary optimization (QUBO) problems. In this work, we formulate the VRPTW as a QUBO and apply a quantum variational approach to the VRPTW using the encoding scheme we proposed earlier in [1], which drastically reduces the number of qubits required. We evaluate our approach on a set of VRPTW instances ranging from 11 to 3964 routes constructed with data provided by researchers from ExxonMobil. We compare our solutions with those obtained using standard full-encoding approaches, for which the maximum problem size feasible in the NISQ era is on the order of 20-30 routes. 
We run our algorithms on simulators as well as on cloud quantum hardware provided by IBMQ, AWS (Rigetti), and IonQ, and benchmark the hardware results against each other and against the simulators. We show that our approach can find approximate solutions to the VRPTW that are comparable to the solutions found by quantum algorithms using the full encoding. Our results suggest that our encoding scheme is a promising way to drastically reduce the number of qubits required to find decent approximate solutions to industry-based optimization problems.	翻訳日:2023-06-16 18:47:36 公開日:2023-06-14
# 自然主義的刺激に対する普遍的一般化法則 The Universal Law of Generalization Holds for Naturalistic Stimuli ( http://arxiv.org/abs/2306.08564v1 ) ライセンス: Link先を確認	Raja Marjieh, Nori Jacoby, Joshua C. Peterson, Thomas L. Griffiths	(参考訳) シェパードの普遍的一般化の法則は、知的生物がどのように類似性を知覚すべきかについての顕著な仮説である。普遍法則は、一対の刺激の知覚的類似性のレベルは、適切な心理学空間に埋め込まれた場合、その距離の凹凸関数として崩壊すべきであると述べている。広く研究されているが、普遍法則を支持する証拠は、現実世界とは大きく異なる低次元の刺激と小さな刺激セットに依存している。これは主に、類似性判定に必要な対比較が刺激数で2次的にスケールするためである。自然主義的高次元体制における普遍的法則の直接的な証拠として,既存の214,200人の類似性判定のデータセットと,新たに収集された390,819人の一般性判定のデータセット(N=2406US)を3セットの自然画像で分析する。 Shepard's universal law of generalization is a remarkable hypothesis about how intelligent organisms should perceive similarity. In its broadest form, the universal law states that the level of perceived similarity between a pair of stimuli should decay as a concave function of their distance when embedded in an appropriate psychological space. While extensively studied, evidence in support of the universal law has relied on low-dimensional stimuli and small stimulus sets that are very different from their real-world counterparts. This is largely because pairwise comparisons -- as required for similarity judgments -- scale quadratically in the number of stimuli. We provide direct evidence for the universal law in a naturalistic high-dimensional regime by analyzing an existing dataset of 214,200 human similarity judgments and a newly collected dataset of 390,819 human generalization judgments (N=2406 US participants) across three sets of natural images.	翻訳日:2023-06-16 18:40:16 公開日:2023-06-14
# ダイヤモンド中の偏光エンタングルストークスアンチストークス光子の微視的起源 Microscopic origin of polarization-entangled Stokes-anti-Stokes photons in diamond ( http://arxiv.org/abs/2306.08563v1 ) ライセンス: Link先を確認	Tiago A. Freitas, Paula Machado, Lucas V. de Carvalho, Diego Sier, Raul Corrêa, Riichiro Saito, Marcelo F. Santos, Carlos H. Monken, and Ado Jorio	(参考訳) ラマン共鳴近傍におけるストークス・反ストークス(SaS)光子対の偏光について、クラウザー・ホーン・シモニー・ホルト不等式の破れを実証した。このペアは、ダイヤモンド試料にパルスレーザーを照射して生成され、レーザーの2つの光子を異なる周波数の1対の光子に変換する。生成した対は、標準ベル分析器によって収集され、スペクトル領域や試料の結晶方位に対する入射光の偏光方向に依存する絡み合いの度合いで、偏光に絡み合っていることが示されている。この結果は、材料科学と量子情報を改善するために量子光学とSaSラマン分光法を組み合わせる可能性を開く。 Violation of the Clauser-Horne-Shimony-Holt inequality for the polarization of Stokes-anti-Stokes (SaS) photon pairs near a Raman resonance is demonstrated. The pairs are generated by shining a pulsed laser on a diamond sample, where two photons of the laser are converted into a pair of photons of different frequencies. The generated pairs are collected by standard Bell analyzers and shown to be entangled in polarization, with the degree of entanglement depending on the spectral region and on the orientation of the polarization of the incident light with respect to the crystallographic orientation of the sample. This result opens up the possibility of combining quantum optics and SaS Raman spectroscopy in order to improve materials science and quantum information.	翻訳日:2023-06-16 18:40:00 公開日:2023-06-14
# 新型コロナウイルスによる学生の行動ルーチンの適応 : マルチモーダルアプローチ Adaptation of Student Behavioural Routines during COVID-19: A Multimodal Approach ( http://arxiv.org/abs/2306.08561v1 ) ライセンス: Link先を確認	Nicolò A. Girardini, Simone Centellegher, Andrea Passerini, Ivano Bison, Fausto Giunchiglia and Bruno Lepri	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックの間に大きく適応し、行動を変えなければならなかった集団の一つが学生である。これまでの研究では、パンデミックが心理的健康と学術的パフォーマンスに与える影響を幅広く研究してきたが、活動のルーチンには限定的な注意しか向けられていない。本研究では,学生の行動変化を,2つの異なる期間(2018年と2020年)における日常の質的・定量的な違いから分析する。学生の活動, 場所, 社会性に関するマルチモーダルな自己申告データを収集する経験サンプリング法(ESM)を用いて, 非負値行列因子分解(NMF)により意味のある行動成分を抽出し, 2018年と2020年の学生の行動変動を定量化した。意外なことに、新型コロナウイルス(COVID-19)の規制があるにも関わらず、学生の活動の変化は最小限であり、活動の多様性も影響を受けていない。その結果, パンデミックへの適応は, 主に場所や社会性の面で生じていることが判明した。 One population group that had to significantly adapt and change their behaviour during the COVID-19 pandemic is students. While previous studies have extensively investigated the impact of the pandemic on their psychological well-being and academic performance, limited attention has been given to their activity routines. In this work, we analyze students' behavioural changes by examining qualitative and quantitative differences in their daily routines between two distinct periods (2018 and 2020). Using an Experience Sampling Method (ESM) that captures multimodal self-reported data on students' activity, locations and sociality, we apply Non-Negative Matrix Factorization (NMF) to extract meaningful behavioural components, and quantify the variations in behaviour between students in 2018 and 2020. Surprisingly, despite the presence of COVID-19 restrictions, we find minimal changes in the activities performed by students, and the diversity of activities also remains largely unaffected. Leveraging the richness of the data at our disposal, we find that adaptation to the pandemic primarily occurred in the location and sociality dimensions.	翻訳日:2023-06-16 18:39:46 公開日:2023-06-14
# サブ波長原子配列を用いた量子コンピューティング Quantum computing with subwavelength atomic arrays ( http://arxiv.org/abs/2306.08555v1 ) ライセンス: Link先を確認	Freya Shah, Taylor L. Patti, Oriol Rubies-Bigorda, Susanne F. Yelin	(参考訳) サブ波長原子配列における光子による相互作用は量子科学に多くの応用がある。本稿では,3レベル量子エミッタの可能性,すなわち2次元原子配列に埋め込まれた`impurities'の可能性を探り,量子計算のプラットフォームとして機能する。サブ波長アレイを介する誘導双極子-双極子相互作用の結果、不純物の変形挙動を利用することにより、$\sqrt{\text{iSWAP}}$とシングルキュービット回転からなる普遍量子ゲートの集合を実装する。これらのゲートは、原子が近距離にある限り、非常に高い忠実度とコヒーレンス時間を持つことを示す。最後に、最大絡み合う2量子ビットベル状態、および絡み合う3量子ビットGHZ状態を生成するための量子回路を実装した。これらの結果は、量子計算と量子シミュレーションの代替プラットフォームとしてサブ波長エミッタアレイを確立する。 Photon-mediated interactions in subwavelength atomic arrays have numerous applications in quantum science. In this manuscript, we explore the potential of three-level quantum emitters, or ``impurities" embedded in a two-dimensional atomic array to serve as a platform for quantum computation. By exploiting the altered behavior of impurities as a result of the induced dipole-dipole interactions mediated by subwavelength array, we implement a set of universal quantum gates consisting of the $\sqrt{\text{iSWAP}}$ and single-qubit rotations. We demonstrate that these gates have very high fidelities and coherence times, as long as the atoms remain within a proximal range. Finally, we implement quantum circuits leading to the generation of the maximally entangled two-qubit Bell states, as well as the entangled three-qubit GHZ state. These findings establish subwavelength emitter arrays as an alternative platform for quantum computation and quantum simulation.	翻訳日:2023-06-16 18:39:23 公開日:2023-06-14
# 最適収束率を有するフラットミニマの雑音安定性最適化 Noise Stability Optimization for Flat Minima with Optimal Convergence Rates ( http://arxiv.org/abs/2306.08553v1 ) ライセンス: Link先を確認	Haotian Ju, Dongyue Li, and Hongyang R. Zhang	(参考訳) 平均的な重み摂動を加えることで、平坦な局所最小点を求める問題を考える。非凸関数 $f: \mathbb{R}^d \rightarrow \mathbb{R}$ と、0について対称な $d$ 次元分布 $\mathcal{P}$ が与えられたとき、$f$ の重みを摂動させて $F(W) = \mathbb{E}[f({W + U})]$ と定義する。ここで $U$ は $\mathcal{P}$ からのランダムサンプルである。このノイズ注入は、小さな等方性ガウス摂動に対して $f$ のヘッセ行列のトレースを介した正則化を誘導する。したがって、重み摂動関数は、ヘッセ行列のトレースが小さい最小点に偏る。いくつかの先行研究は、一般化を改善するアルゴリズムを設計することによって、この重み摂動関数に関連する設定を研究した。それでも、関数 $F$ の平均摂動の下で極小点を見つけるための収束率は知られていない。本稿では,分散を低減するために$\mathcal{P}$の対称性を活用しながら,勾配の計算前にランダムノイズを注入するSGDライクなアルゴリズムについて考察する。次に、厳密な解析を行い、$f$の勾配がリプシッツ連続であるとき、$F$ の近似1次定常点を求めるアルゴリズムの上界と下界が一致することを示す。我々は,様々なアーキテクチャを用いた画像分類タスクに対して,そのアルゴリズムを実証的に検証する。シャープネス・アウェアの最小化と比較すると、見つかった極小点のヘッセ行列のトレースと最大固有値が、8つのデータセットの平均でそれぞれ12.6%と7.8%低下した。アブレーション研究はアルゴリズムの設計の利点を検証する。 We consider finding flat, local minimizers by adding average weight perturbations. Given a nonconvex function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a $d$-dimensional distribution $\mathcal{P}$ which is symmetric at zero, we perturb the weight of $f$ and define $F(W) = \mathbb{E}[f({W + U})]$, where $U$ is a random sample from $\mathcal{P}$. This injection induces regularization through the Hessian trace of $f$ for small, isotropic Gaussian perturbations. Thus, the weight-perturbed function is biased toward minimizers with low Hessian trace. Several prior works have studied settings related to this weight-perturbed function by designing algorithms to improve generalization. Still, convergence rates are not known for finding minima under the average perturbations of the function $F$. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of $\mathcal{P}$ to reduce variance. 
We then provide a rigorous analysis, showing matching upper and lower bounds of our algorithm for finding an approximate first-order stationary point of $F$ when the gradient of $f$ is Lipschitz-continuous. We empirically validate our algorithm for several image classification tasks with various architectures. Compared to sharpness-aware minimization, we note a 12.6% and 7.8% drop in the Hessian trace and top eigenvalue of the found minima, respectively, averaged over eight datasets. Ablation studies validate the benefit of the design of our algorithm.	翻訳日:2023-06-16 18:39:07 公開日:2023-06-14
# 情報アクセスシステム評価のためのユーザシミュレーション User Simulation for Evaluating Information Access Systems ( http://arxiv.org/abs/2306.08550v1 ) ライセンス: Link先を確認	Krisztian Balog and ChengXiang Zhai	(参考訳) 検索エンジンやレコメンデータシステム,会話アシスタントといった情報アクセスシステムは,情報ニーズを満たす上で,私たちの日常生活に不可欠なものになっています。しかし、これらのシステムの有効性を評価することは長く複雑な科学的課題である。この課題は、対話的なサポートによるタスクの完了を支援するシステム全体の効果を評価することの難しさと、ユーザの振る舞いや好みの実質的な変動によってさらに悪化することにある。この課題に対処するため、ユーザシミュレーションは有望なソリューションとして現れる。本書は,評価目的に特化して設計されたユーザシミュレーション技術の徹底的な理解を提供することに重点を置いている。まず,情報アクセスシステム評価の背景からユーザシミュレーションの多様な応用について考察する。次に,ユーザシミュレータ設計のための一般的なフレームワーク,評価のためのユーザシミュレーション,検索エンジン,レコメンダシステム,会話アシスタントとのユーザインタラクションをシミュレートする特定のモデルとアルゴリズムの両方をカバーする,ユーザシミュレーションの主要な研究成果を体系的にレビューする。ユーザシミュレーションが学際的な研究課題であることを認識し,機械学習,対話システム,ユーザモデリング,経済学などの関連分野との連携を確立する。本書は,情報アクセスシステムの評価を超えて,対話型知的システム全般の評価方法に広範な影響を与えることが期待されている,今後の重要な研究方向性について,詳細な議論で締めくくっている。 Information access systems, such as search engines, recommender systems, and conversational assistants, have become integral to our daily lives as they help us satisfy our information needs. However, evaluating the effectiveness of these systems presents a long-standing and complex scientific challenge. This challenge is rooted in the difficulty of assessing a system's overall effectiveness in assisting users to complete tasks through interactive support, and further exacerbated by the substantial variation in user behaviour and preferences. To address this challenge, user simulation emerges as a promising solution. This book focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We begin with a background of information access system evaluation and explore the diverse applications of user simulation. 
Subsequently, we systematically review the major research progress in user simulation, covering both general frameworks for designing user simulators, utilizing user simulation for evaluation, and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. Realizing that user simulation is an interdisciplinary research topic, whenever possible, we attempt to establish connections with related fields, including machine learning, dialogue systems, user modeling, and economics. We end the book with a detailed discussion of important future research directions, many of which extend beyond the evaluation of information access systems and are expected to have broader impact on how to evaluate interactive intelligent systems in general.	翻訳日:2023-06-16 18:38:33 公開日:2023-06-14
# 機械学習アルゴリズムを用いたマスキング顔認識の探索的研究 An Exploratory Study of Masked Face Recognition with Machine Learning Algorithms ( http://arxiv.org/abs/2306.08549v1 ) ライセンス: Link先を確認	Megh Pudyel and Mustafa Atay	(参考訳) 自動顔認識は、自動境界制御、電子機器へのセキュアなログイン、コミュニティの監視、学校の出席の追跡、職場時計のイン、クロックアウトなど、さまざまなプロセスにおける人々の接触のない識別のための、広く採用されている機械学習技術である。最近の世界的な新型コロナウイルス(covid-19)パンデミックでは、マスクの使用が日常生活で重要になっている。フェイスマスクの使用により、従来の顔認識技術の性能は大幅に低下する。顔認識におけるマスク着用の効果は、まだ未検討の課題である。本稿では,マスク付き顔画像とマスクなし顔画像の識別により,多数の顔認識モデルの性能を評価することにより,この問題に対処する。 SVC, KNN, LDA, DT, LR, NBの6つの従来の機械学習アルゴリズムを用いて, マスクされた顔画像の存在下で, 性能の悪いもの以外に, 性能のよいものを見つけ出す。特徴抽出演算子としてローカルバイナリパターン(LBP)が使用される。合成顔画像の生成と利用を行った。非マスク、仮面、半マスクのトレーニングデータセットを作成し、マスク画像と未マスク画像の両方に対する顔認識性能を評価し、この問題の広い視野を示す。本研究は,マスク認識を半マスクから半マスクまで,半マスクからアンマスクまで,ほぼすべてのシナリオで説明し,従来の機械学習アルゴリズムを文献で比較した。 Automated face recognition is a widely adopted machine learning technology for contactless identification of people in various processes such as automated border control, secure login to electronic devices, community surveillance, tracking school attendance, workplace clock in and clock out. Using face masks have become crucial in our daily life with the recent world-wide COVID-19 pandemic. The use of face masks causes the performance of conventional face recognition technologies to degrade considerably. The effect of mask-wearing in face recognition is yet an understudied issue. In this paper, we address this issue by evaluating the performance of a number of face recognition models which are tested by identifying masked and unmasked face images. We use six conventional machine learning algorithms, which are SVC, KNN, LDA, DT, LR and NB, to find out the ones which perform best, besides the ones which poorly perform, in the presence of masked face images. Local Binary Pattern (LBP) is utilized as the feature extraction operator. We generated and used synthesized masked face images. 
We prepared unmasked, masked, and half-masked training datasets and evaluated the face recognition performance against both masked and unmasked images to present a broad view of this crucial problem. We believe that our study is unique in elaborating mask-aware facial recognition across almost all possible scenarios, including half_masked-to-masked and half_masked-to-unmasked, and in evaluating a larger number of conventional machine learning algorithms than other studies in the literature.	翻訳日:2023-06-16 18:38:11 公開日:2023-06-14
# フォトニック量子コンピュータにおける非共有原子間相互作用のモデル化 Modeling Non-Covalent Interatomic Interactions on a Photonic Quantum Computer ( http://arxiv.org/abs/2306.08544v1 ) ライセンス: Link先を確認	Matthieu Sarkis, Alessio Fallani, Alexandre Tkatchenko	(参考訳) 非共有結合相互作用は、材料、分子、生体複合体の構造、安定性、ダイナミクスを決定する重要な要素である。しかし、これらの相互作用を正確に捉えることは複雑な量子多体問題であり、古典的コンピュータでは効率的な解は得られない。非共有相互作用を正確かつ効率的にモデル化するために広く使われているモデルはクーロン結合量子ドルド振動子(cqdo)多体ハミルトニアンであり、正確な解は知られていない。我々は,cQDOモデルが自然にフォトニック量子コンピュータ上でのシミュレーションに有効であることを示し,XanaduのStrawberry Fieldsフォトニクスライブラリを利用して2原子系の結合エネルギー曲線を計算する。本研究は、非共有結合相互作用に対する概念実証的応用を小さな分子の標準的な電子構造問題を超えて示すことにより、量子コンピューティングの原子論的モデリングへの適用性を実質的に拡張する。興味深いことに、2つの結合したボソニックQDOは安定結合を示す。さらに,従来の計算機に最適化可能なcQDO波動関数の効率的な関数形式を提案し,原子間距離を増大させるために結合-非共有遷移を捉える。 Non-covalent interactions are a key ingredient to determine the structure, stability, and dynamics of materials, molecules, and biological complexes. However, accurately capturing these interactions is a complex quantum many-body problem, with no efficient solution available on classical computers. A widely used model to accurately and efficiently model non-covalent interactions is the Coulomb-coupled quantum Drude oscillator (cQDO) many-body Hamiltonian, for which no exact solution is known. We show that the cQDO model lends itself naturally to simulation on a photonic quantum computer, and we calculate the binding energy curve of diatomic systems by leveraging Xanadu's Strawberry Fields photonics library. Our study substantially extends the applicability of quantum computing to atomistic modeling, by showing a proof-of-concept application to non-covalent interactions, beyond the standard electronic-structure problem of small molecules. Remarkably, we find that two coupled bosonic QDOs exhibit a stable bond. 
In addition, our study suggests efficient functional forms for cQDO wavefunctions that can be optimized on classical computers, and capture the bonded-to-noncovalent transition for increasing interatomic distances.	翻訳日:2023-06-16 18:37:41 公開日:2023-06-14
# 大規模言語モデルの知識蒸留 Knowledge Distillation of Large Language Models ( http://arxiv.org/abs/2306.08543v1 ) ライセンス: Link先を確認	Yuxian Gu, Li Dong, Furu Wei, Minlie Huang	(参考訳) 知識蒸留 (KD) は, 大規模言語モデル (LLM) の高い計算需要を減らすための有望な手法である。しかしながら、従来のKDメソッドは、主にホワイトボックス分類モデルや、ChatGPTのようなブラックボックスモデルAPIを模倣する小さなモデルの訓練に適用される。ホワイトボックス生成LLMから効果的に知識を蒸留する方法はまだ十分に研究されておらず、LLMの発展とともにますます重要になっている。本研究では,大きな生成型言語モデルからより小さな言語モデルを蒸留するMiniLLMを提案する。我々はまず,標準KDアプローチにおける順方向KLD(Kullback-Leibler divergence)目標を,生成言語モデル上のKDにより適した逆KLDに置き換え,生徒モデルが教師分布の低確率領域を過大評価しないようにする。そして、この目的を学習するための効果的な最適化アプローチを導出する。命令追従設定における広範囲な実験により、MiniLLMモデルは、より高い全体的な品質、低い露光バイアス、より良い校正、より高い長文生成性能でより正確な応答を生成することが示された。提案手法は,120Mから13Bのパラメータを持つ異なるモデルファミリに対してもスケーラブルである。コードとモデルチェックポイントはhttps://aka.ms/MiniLLMでリリースします。 Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge from white-box generative LLMs is still under-explored, which becomes more and more important with the prosperity of LLMs. In this work, we propose MiniLLM that distills smaller language models from generative larger language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. Extensive experiments in the instruction-following setting show that the MiniLLM models generate more precise responses with the higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance. Our method is also scalable for different model families with 120M to 13B parameters. 
We will release our code and model checkpoints at https://aka.ms/MiniLLM.	翻訳日:2023-06-16 18:37:17 公開日:2023-06-14
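The contrast between forward and reverse KLD described in the MiniLLM abstract above can be illustrated with a toy calculation. The sketch below is not the MiniLLM training objective or its optimization procedure; the three-outcome teacher distribution and the two candidate students are made-up values, chosen only to show why reverse KLD penalizes a student that places mass on a low-probability region of the teacher.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes every q[i] is strictly positive; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical teacher: two likely outcomes and one very unlikely outcome.
teacher = [0.495, 0.01, 0.495]

# Two hypothetical students (illustrative values, not from the paper).
students = {
    "spread": [1 / 3, 1 / 3, 1 / 3],  # puts a lot of mass on the unlikely outcome
    "mode": [0.98, 0.01, 0.01],       # commits to a single teacher mode
}

# Forward KLD: KL(teacher || student). Reverse KLD: KL(student || teacher).
forward_kl = {name: kl(teacher, s) for name, s in students.items()}
reverse_kl = {name: kl(s, teacher) for name, s in students.items()}

for name in students:
    print(f"{name}: forward={forward_kl[name]:.3f} reverse={reverse_kl[name]:.3f}")

# Forward KLD prefers the "spread" student; reverse KLD prefers the "mode"
# student, which does not overestimate the teacher's low-probability region.
```

In this toy setting the two objectives rank the students in opposite order, which is the behaviour the abstract appeals to when motivating reverse KLD for generative models.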
# ゼロショット3次元形状スケッチビューの類似性と検索 Zero-Shot 3D Shape Sketch View Similarity and Retrieval ( http://arxiv.org/abs/2306.08541v1 ) ライセンス: Link先を確認	Gianluca Berardi and Yulia Gryaditskaya	(参考訳) プリテキストタスクのViTとResNetの特徴層に基づいて事前学習を行い、個々の3次元形状の2次元スケッチビューのペア間の類似性を定量化する。モデルが類似したビューと地上3D形状を検索する能力の観点から性能を評価する。ゼロショット性能研究の先駆けとして、1つまたは複数の形状クラスにおける代替微調整戦略とその他の形状クラスへの一般化について検討する。 NPR(Non-Photo Realistic)レンダリングの進歩を利用して、コントラスト学習を用いた事前学習基礎モデルの微調整に使用する複数のスタイルで合成スケッチビューを生成する。スケッチ中のオブジェクトのスケールが,異なるネットワーク層における特徴の類似性に与える影響について検討する。スケールによって異なる特徴層がスケッチビューにおける形状の類似性を示すことが観察できる。しかし、同様のオブジェクトスケールがvitとresnetの最高のパフォーマンスをもたらすことが分かりました。要約すると, 微調整戦略の慎重な選択により, ゼロショット形状検索精度の一貫した改善が得られることを示す。我々の研究はスケッチ領域の研究に大きな影響を与え、知覚的損失として大規模な事前学習モデルを採用する方法についての洞察とガイダンスを提供するだろうと考えています。 We conduct a detailed study of the ability of pretrained on pretext tasks ViT and ResNet feature layers to quantify the similarity between pairs of 2D sketch views of individual 3D shapes. We assess the performance in terms of the models' abilities to retrieve similar views and ground-truth 3D shapes. Going beyond naive zero-shot performance study, we investigate alternative fine-tuning strategies on one or several shape classes, and their generalization to other shape classes. Leveraging progress in NPR (Non-Photo Realistic) rendering, we generate synthetic sketch views in several styles which we use to fine-tune pretrained foundation models using contrastive learning. We study how the scale of an object in a sketch affects the similarity of features at different network layers. We observe that depending on the scale, different feature layers can be more indicative of shape similarities in sketch views. However, we find that similar object scales result in the best performance of ViT and ResNet. In summary, we show that careful selection of a fine-tuning strategy allows us to obtain consistent improvement in zero-shot shape retrieval accuracy. 
We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.	翻訳日:2023-06-16 18:36:53 公開日:2023-06-14
# Fed-ZERO:フェデレートされた専門家による効率的なゼロショットパーソナライゼーション Fed-ZERO: Efficient Zero-shot Personalization with Federated Mixture of Experts ( http://arxiv.org/abs/2306.08586v1 ) ライセンス: Link先を確認	Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis	(参考訳) フェデレートラーニング(FL)の目標の1つは、共有グローバルモデルからの知識を活用しながら、参加する各クライアントのコンテキストに適応可能なパーソナライズされたモデルを作成することである。しかし、しばしばパーソナライゼーションは、優れたパフォーマンスを達成するために、クライアントのラベル付きデータを使用する微調整のステップを必要とする。これは、入ってくるクライアントが新しくなり、あるいはプライバシー上の懸念があるシナリオでは実現できないかもしれない。そして、これらのシナリオでゼロショットのパーソナライズを実現する方法が、まだオープンである。 FLセットアップ内でMixture-of-Experts(MoE)フレームワークを用いて新しいソリューションを提案する。本手法は,クライアントの多様性を活かし,クラスの異なるサブセットに関する専門的な専門家を訓練し,入力を最も関連する専門家にルーティングするゲーティング関数を提供する。我々のゲーティング関数は、事前訓練されたモデル共通専門家の知識を利用して、オンザフライで経路決定を強化する。その結果,最先端のFL設定の精度は最大18%向上し,ゼロショット性能の競争力は維持できることがわかった。実際に,本手法は非均一なデータ分散を処理し,より効率的にスケールし,FLベンチマークの最先端性能を向上させる。 One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, often, personalization requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It, then, remains open how one can achieve zero-shot personalization in these scenarios. We propose a novel solution by using a Mixture-of-Experts (MoE) framework within a FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy up to 18% in state of the art FL settings, while maintaining competitive zero-shot performance. 
In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve the state-of-the-art performance on common FL benchmarks.	翻訳日:2023-06-16 18:31:03 公開日:2023-06-14
# トフォリゲートを5000万個しか持たない256ビット楕円曲線秘密鍵の計算法 How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates ( http://arxiv.org/abs/2306.08585v1 ) ライセンス: Link先を確認	Daniel Litinski	(参考訳) 我々は、シリコンフォトニクスにインスパイアされたアクティブボリュームアーキテクチャにおける資源推定のケーススタディとして、楕円曲線プライベートキーの計算にショアのアルゴリズムを用いる。ここでは、フォールトトレラントなサーフェスコード量子コンピュータは、非局所的なモジュール間接続の対数数のモジュールで構成され、アルゴリズムのコスト関数を2Dローカルアーキテクチャと比較する。非ローカル接続は、運用体制によってキー当たりのコストを300～700倍削減できることがわかりました。 10%のしきい値で、10-$\mu$sのコードサイクルと非ローカル接続を仮定すると、1つのキーは、それぞれ1152の物理キュービットを持つ6000モジュールを使用して10分毎に生成される。対照的に、厳格な2Dローカル接続を持つデバイスは、より多くのキュービットを必要とし、38時間毎に1つのキーを生成する。また,鍵当たりの toffoli 数を最大5倍に減らす単純なアーキテクチャ非依存のアルゴリズム修正も見いだした。これらの変更は、複数のキーに対して格納された状態を再利用し、アルゴリズムの複数の並列インスタンスにモジュラ分割演算のコストを分散させることを含む。 We use Shor's algorithm for the computation of elliptic curve private keys as a case study for resource estimates in the silicon-photonics-inspired active-volume architecture. Here, a fault-tolerant surface-code quantum computer consists of modules with a logarithmic number of non-local inter-module connections, modifying the algorithmic cost function compared to 2D-local architectures. We find that the non-local connections reduce the cost per key by a factor of 300-700 depending on the operating regime. At 10% threshold, assuming a 10-$\mu$s code cycle and non-local connections, one key can be generated every 10 minutes using 6000 modules with 1152 physical qubits each. By contrast, a device with strict 2D-local connectivity requires more qubits and produces one key every 38 hours. We also find simple architecture-independent algorithmic modifications that reduce the Toffoli count per key by up to a factor of 5. These modifications involve reusing the stored state for multiple keys and spreading the cost of the modular division operation over multiple parallel instances of the algorithm.	翻訳日:2023-06-16 18:30:40 公開日:2023-06-14
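As a quick check on the figures quoted in the abstract above, the following arithmetic uses only numbers stated there. The resulting time ratio is not the same quantity as the 300-700x cost reduction: the abstract does not define the cost metric, and the 2D-local device is also said to need more qubits. The variable names are illustrative.

```python
# Figures quoted in the abstract for the non-local (active-volume) architecture.
modules = 6000
physical_qubits_per_module = 1152
total_physical_qubits = modules * physical_qubits_per_module

# Time per key: 10 minutes (non-local) versus 38 hours (strict 2D-local).
minutes_per_key_nonlocal = 10
minutes_per_key_2d_local = 38 * 60
time_ratio = minutes_per_key_2d_local / minutes_per_key_nonlocal

print(total_physical_qubits)  # 6912000
print(time_ratio)             # 228.0
```

So the quoted configuration corresponds to roughly 6.9 million physical qubits, and the 2D-local device is about 228 times slower per key before its larger qubit count is taken into account.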
# 同時解釈データを用いたタグ付きエンドツーエンド同時音声翻訳訓練 Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data ( http://arxiv.org/abs/2306.08582v1 ) ライセンス: Link先を確認	Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura	(参考訳) 同時音声翻訳(SimulST)は部分的な音声入力を漸進的に翻訳する。入力と出力のモノトニック対応は、より小さなレイテンシーでは望ましいが、英語や日本語のような遠方の言語ペアではそうではない。この問題に対する先進的なアプローチは、SIデータを用いて同時解釈(SI)を模倣し、SimulSTモデルをトレーニングすることである。しかし、そのようなSIデータのサイズは限られているため、SIデータはオフラインで翻訳される通常のバイリンガルデータと併用されるべきである。本稿では,SIとオフラインの混合データを用いたSimulSTモデルを効果的に訓練する方法を提案する。提案手法は、モデルにSI型またはオフライン型の出力を生成するよう指示するスタイルタグと混合データを用いて単一のモデルを訓練する。実験結果から, BLEURTの低遅延域における改善が示され, 提案モデルがベースラインよりもSIスタイルの出力を生成することが明らかとなった。 Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although the monotonic correspondence between input and output is preferable for smaller latency, it is not the case for distant language pairs such as English and Japanese. A prospective approach to this problem is to mimic simultaneous interpretation (SI) using SI data to train a SimulST model. However, the size of such SI data is limited, so the SI data should be used together with ordinary bilingual data whose translations are given in offline. In this paper, we propose an effective way to train a SimulST model using mixed data of SI and offline. The proposed method trains a single model using the mixed data with style tags that tell the model to generate SI- or offline-style outputs. Experiment results show improvements of BLEURT in different latency ranges, and our analyses revealed the proposed model generates SI-style outputs more than the baseline.	翻訳日:2023-06-16 18:30:21 公開日:2023-06-14
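The style-tag idea in the abstract above can be sketched as a small data-preparation step. This is only an illustration under stated assumptions: the tag strings `<si>` and `<off>`, the function name `tag_target`, and the choice to prepend the tag to the target text are all hypothetical, since the abstract does not say how or where the tags are attached.

```python
# Hypothetical tag strings; the abstract does not specify the actual tags.
SI_TAG = "<si>"
OFFLINE_TAG = "<off>"


def tag_target(target_text, style):
    """Prepend a style tag to a target sentence.

    `style` is "si" for simultaneous-interpretation data and "offline" for
    ordinary offline translations. Placing the tag at the start of the target
    is an assumption made for this sketch.
    """
    if style == "si":
        tag = SI_TAG
    elif style == "offline":
        tag = OFFLINE_TAG
    else:
        raise ValueError(f"unknown style: {style!r}")
    return f"{tag} {target_text}"


# A tiny mixed training set: (target text, style) pairs with placeholder text.
mixed_data = [
    ("target sentence from an interpreter", "si"),
    ("target sentence from an offline translation", "offline"),
]

tagged_targets = [tag_target(text, style) for text, style in mixed_data]
print(tagged_targets)
```

With targets tagged this way, a single model trained on the mixed data could be steered at inference time by forcing the first output token to the desired tag, which is one plausible reading of "style tags that tell the model to generate SI- or offline-style outputs".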
# 低リソース音声認識改善のためのデータ拡張のための言語間マッピングの学習 Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition ( http://arxiv.org/abs/2306.08577v1 ) ライセンス: Link先を確認	Muhammad Umar Farooq, Thomas Hain	(参考訳) 言語間リソースの利用は、低リソース言語の不足を補う効果的な方法である。近年,多言語モデル融合手法が提案され,言語間音響-音韻類似性を写像関数として学習するためのモデルが訓練されている。しかし、手作りのレキシコンはハイブリッドDNN-HMM ASRシステムの訓練に使われてきた。この依存関係を取り除くために、エンドツーエンド音声認識のための学習可能な言語間マッピングの概念を拡張する。さらに、並列データを用いずに、ソース言語をターゲット言語に翻訳するマッピングモデルも採用している。最後に、ソースオーディオとその音訳をデータ拡張に使用して、ターゲット言語ASRを再トレーニングする。その結果,任意のソース言語ASRモデルを用いて低リソースターゲット言語認識を行い,その後にマッピングモデルを提案する。さらに、データ拡張により、ベースライン単言語モデルよりも5%以上の相対的なゲインが得られる。 Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resource languages. Recently, a novel multilingual model fusion technique has been proposed where a model is trained to learn cross-lingual acoustic-phonetic similarities as a mapping function. However, handcrafted lexicons have been used to train hybrid DNN-HMM ASR systems. To remove this dependency, we extend the concept of learnable cross-lingual mappings for end-to-end speech recognition. Furthermore, mapping models are employed to transliterate the source languages to the target language without using parallel data. Finally, the source audio and its transliteration is used for data augmentation to retrain the target language ASR. The results show that any source language ASR model can be used for a low-resource target language recognition followed by proposed mapping model. Furthermore, data augmentation results in a relative gain up to 5% over baseline monolingual model.	翻訳日:2023-06-16 18:30:05 公開日:2023-06-14
# リモートセンシングにおける教師付き変分オートエンコーダに基づくラベルノイズロバスト画像表現学習 Label Noise Robust Image Representation Learning based on Supervised Variational Autoencoders in Remote Sensing ( http://arxiv.org/abs/2306.08575v1 ) ライセンス: Link先を確認	Gencer Sumbul and Begüm Demir	(参考訳) 公開されたテーママップとクラウドソースデータにより、ディープニューラルネットワーク(DNN)のトレーニングには、リモートセンシング(RS)イメージアノテーションをゼロコストで収集することができる。しかし、そのようなアノテーション源は、トレーニングデータにノイズラベルを含むリスクを高め、不正確なRS画像表現学習(IRL)を引き起こす可能性がある。本稿では,RSで検討されている学習課題とは無関係に,IRL上のノイズラベルの干渉を防止することを目的としたラベル頑健なIRL手法を提案する。提案手法は,教師付き変分オートエンコーダ(SVAE)と任意の種類のDNNを組み合わせる。これは画像の特徴に基づいて変動生成プロセスを定義することで達成される。これにより、SVAEから得られた損失値と検討されたDNNのタスクヘッドに基づいて、IRLの各トレーニングサンプルの重要性を定義することができる。そして,提案手法はノイズラベルを持つ画像に対して重要度を低くするとともに,IRL中に正しいラベルを持つ画像に対して高い重要度を与える。 RS画像に適用したラベル雑音ロバストIRL法と比較して,提案手法の有効性を実験的に示した。提案手法のコードはhttps://git.tu-berlin.de/rsim/RS-IRL-SVAEで公開されている。 Due to the publicly available thematic maps and crowd-sourced data, remote sensing (RS) image annotations can be gathered at zero cost for training deep neural networks (DNNs). However, such annotation sources may increase the risk of including noisy labels in training data, leading to inaccurate RS image representation learning (IRL). To address this issue, in this paper we propose a label noise robust IRL method that aims to prevent the interference of noisy labels on IRL, independently from the learning task being considered in RS. To this end, the proposed method combines a supervised variational autoencoder (SVAE) with any kind of DNN. This is achieved by defining variational generative process based on image features. This allows us to define the importance of each training sample for IRL based on the loss values acquired from the SVAE and the task head of the considered DNN. Then, the proposed method imposes lower importance to images with noisy labels, while giving higher importance to those with correct labels during IRL. 
Experimental results show the effectiveness of the proposed method when compared to well-known label noise robust IRL methods applied to RS images. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/RS-IRL-SVAE.	翻訳日:2023-06-16 18:29:51 公開日:2023-06-14
# 分子内の電子デコヒーレンス経路のマッピング Mapping Electronic Decoherence Pathways in Molecules ( http://arxiv.org/abs/2306.08574v1 ) ライセンス: Link先を確認	Ignacio Gustin, Chang Woo Kim, David W. McCamant and Ignacio Franco	(参考訳) 分子電子量子デコヒーレンスを支配する基本的な化学原理を確立することは、依然として顕著な課題である。溶媒や分子内振動、化学機能化といった基本的な問題は、電子デコヒーレンス全体への寄与は未解決のままであり、最先端の理論的および実験的アプローチの範囲を超えている。本研究では, 電子量子コヒーレンス損失の解明を可能にする, 縮合相環境に浸漬した分子色相の脱コヒーレンス経路を分離する手法を開発する。そのため, RR分光法は, 室温, 溶媒中, 蛍光分子および非蛍光分子において, 完全化学量で分子スペクトル密度を再構築するための一般的な実験方法として同定された。次に、スペクトル密度から脱コヒーレンスダイナミクスを定量的に捉え、脱コヒーレンス経路を個々の分子振動や溶媒モードによる寄与に分解して同定する方法を示す。 DNA塩基チミンとその誘導体の水中における電子的脱コヒーレンス経路の解析による戦略の有用性について述べる。この場合の電子コヒーレンスは ~30 fs で崩壊する。早期のコヒーレンス損失は分子内振動によって決定され、溶媒による全体的な崩壊が決定される。チミンの化学置換は、水との水素結合相互作用による脱コヒーレンスを調節し、最も速い脱コヒーレンス速度をもたらす。温度の上昇は溶媒の寄与の重要性を高めるため脱コヒーレンスを速くするが、初期の脱コヒーレンスダイナミクスはそのまま残る。開発された戦略は、分子構造と溶媒構造と量子デコヒーレンスとの接続を確立する重要な機会を開き、それを合理的に調節する化学戦略を開発する。 Establishing the fundamental chemical principles that govern molecular electronic quantum decoherence has remained an outstanding challenge. Fundamental questions such as how solvent and intramolecular vibrations or chemical functionalization contribute to the overall electronic decoherence remain unanswered and are beyond the reach of state-of-the-art theoretical and experimental approaches. We address this challenge by developing a strategy to isolate decoherence pathways for molecular chromophores immersed in condensed phase environments that enables elucidating how electronic quantum coherence is lost. For this, we first identify RR spectroscopy as a general experimental method to reconstruct molecular spectral densities with full chemical complexity at room temperature, in solvent, and for fluorescent and non-fluorescent molecules. We then show how to quantitatively capture the decoherence dynamics from the spectral density and identify decoherence pathways by decomposing the overall coherence loss into contributions due to individual molecular vibrations and solvent modes. 
We illustrate the utility of the strategy by analyzing the electronic decoherence pathways of the DNA base thymine and its derivatives in water. The electronic coherences in this case decay in ~30 fs. The early-time coherence loss is determined by intramolecular vibrations while the overall decay by solvent. Chemical substitution of thymine modulates the decoherence with hydrogen-bond interactions with water leading to the fastest decoherence rates. Increasing temperature leads to faster decoherence as it enhances the importance of solvent contributions but leaves the early-time decoherence dynamics intact. The developed strategy opens key opportunities to establish the connection between molecular and solvent structure and quantum decoherence as needed to develop chemical strategies to rationally modulate it.	翻訳日:2023-06-16 18:29:30 公開日:2023-06-14
# GenImage:AI生成画像検出のための100万規模のベンチマーク GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image ( http://arxiv.org/abs/2306.08571v1 ) ライセンス: Link先を確認	Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, Yunhe Wang	(参考訳) 生成モデルが写真画像を生成するという異常な能力は、偽情報の拡散に対する懸念を強め、それによってAI生成した偽画像と実画像とを区別できる検出器の需要が高まった。しかし、最も先進的な画像生成装置の画像を含む大規模なデータセットの欠如は、そのような検出器の開発に障害をもたらす。本稿では,以下の利点を有するGenImageデータセットを紹介する。 1)AIが生成した偽画像100万枚以上の画像と実際の画像の収集を含む大量の画像。 2)リッチ画像コンテンツは幅広い画像クラスを包含する。 3)最先端のジェネレータ,高度な拡散モデルとGANを用いた合成画像。前述の利点により、GenImageで訓練された検出器は、徹底的な評価を行い、多様な画像に強い適用性を示すことができる。本研究では,実世界のシナリオに類似した検出手法を評価するための2つのタスクを提案する。クロスジェネレータ画像分類タスクは、あるジェネレータで訓練された検出器が他のジェネレータでテストした場合の性能を測定する。劣化画像分類タスクは、低解像度、ぼやけた画像、圧縮画像などの劣化画像を扱う検出器の能力を評価する。 GenImageデータセットを使うことで、研究者は一般的な手法と比較して、優れたAI生成画像検出器の開発と評価を効果的に行うことができる。 The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) State-of-the-art Generators, synthesizing images with advanced diffusion models and GANs. The aforementioned advantages allow the detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating the detection method in resembling real-world scenarios. 
The cross-generator image classification task measures the performance of a detector trained on one generator when tested on the others. The degraded image classification task assesses the capability of the detectors in handling degraded images such as low-resolution, blurred, and compressed images. With the GenImage dataset, researchers can effectively expedite the development and evaluation of superior AI-generated image detectors in comparison to prevailing methodologies.	翻訳日:2023-06-16 18:29:01 公開日:2023-06-14
# WizardCoder: Evol-Instructでコード大言語モデルを強化する WizardCoder: Empowering Code Large Language Models with Evol-Instruct ( http://arxiv.org/abs/2306.08568v1 ) ライセンス: Link先を確認	Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang	(参考訳) StarCoderのようなCode Large Language Models (Code LLM)は、コード関連のタスクにおいて例外的なパフォーマンスを示している。しかし、既存のモデルのほとんどは、命令の微調整なしで広範囲の生コードデータに基づいて事前訓練されている。本稿では,コード領域にEvol-Instruct法を適用することで,複雑な命令の微調整を施したコードLLMを実現するWizardCoderを提案する。我々は,HumanEval,HumanEval+,MBPP,DS-1000という4つの著名なコード生成ベンチマークに関する総合的な実験を通じて,我々のモデルが持つ異常な能力を明らかにする。他のオープンソースコードLLMをはるかに上回ります。さらに、我々のモデルは、HumanEvalとHumanEval+上で、最大の閉LLM、AnthropicのClaudeとGoogleのBardよりも優れています。私たちのコード、モデルウェイト、データはhttps://github.com/nlpxucan/WizardLMで公開されている。 Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM	翻訳日:2023-06-16 18:28:44 公開日:2023-06-14
# サイバー攻撃に対する連合学習に基づく車両軌道予測 Federated Learning-based Vehicle Trajectory Prediction against Cyberattacks ( http://arxiv.org/abs/2306.08566v1 ) ライセンス: Link先を確認	Zhe Wang, Tingkai Yan	(参考訳) Internet of Vehicles (IoV) の開発により、車両無線通信は深刻なサイバーセキュリティ上の問題を引き起こす。偽の車両の位置や周囲の車両が送った速度などの不具合情報は、車両の衝突、交通渋滞、さらには死傷者も引き起こす可能性がある。さらに、車両軌道やユーザアカウント情報などの個人車両データ漏洩は、ユーザのプロパティやセキュリティを損なう可能性がある。そのため,iovシステムにおいて,データの飽和が不良なサイバー攻撃対策を実現する必要がある。本稿では,これらの問題に対処するため,FL-TPに対するフェデレート学習に基づく車両軌道予測アルゴリズムを提案する。 FL-TPは一般に公開されているVehicular Reference Misbehavior(VeReMi)データセットを使用して、定数、定数オフセット、ランダム、ランダムオフセット、最終的な停止という5種類のサイバー攻撃を集中的にトレーニングし、テストする。その結果,提案手法は最大サイバー攻撃透過性シナリオにおいて,サイバー攻撃検出と追跡予測を最大6.99%,54.86%改善できることがわかった。 With the development of the Internet of Vehicles (IoV), vehicle wireless communication poses serious cybersecurity challenges. Faulty information, such as fake vehicle positions and speeds sent by surrounding vehicles, could cause vehicle collisions, traffic jams, and even casualties. Additionally, private vehicle data leakages, such as vehicle trajectory and user account information, may damage user property and security. Therefore, achieving a cyberattack-defense scheme in the IoV system with faulty data saturation is necessary. This paper proposes a Federated Learning-based Vehicle Trajectory Prediction Algorithm against Cyberattacks (FL-TP) to address the above problems. The FL-TP is intensively trained and tested using a publicly available Vehicular Reference Misbehavior (VeReMi) dataset with five types of cyberattacks: constant, constant offset, random, random offset, and eventual stop. The results show that the proposed FL-TP algorithm can improve cyberattack detection and trajectory prediction by up to 6.99% and 54.86%, respectively, under the maximum cyberattack permeability scenarios compared with benchmark methods.	翻訳日:2023-06-16 18:28:26 公開日:2023-06-14
# 逆転率の信頼性評価 Reliable Evaluation of Adversarial Transferability ( http://arxiv.org/abs/2306.08565v1 ) ライセンス: Link先を確認	Wenqian Yu and Jindong Gu and Zhijiang Li and Philip Torr	(参考訳) 小さな敵対的摂動を持つ敵対的例(AE)は、ディープニューラルネットワーク(DNN)を誤った予測に導出する可能性がある。あるDNNで作られたAEは、別のDNNを騙すこともできる。ここ数年、AEsの転送性はブラックボックス攻撃を促進する重要な特性であるため、大きな注目を集めてきた。対向移動性を改善するために多くのアプローチが提案されている。しかし、これらは主に異なる畳み込みニューラルネットワーク(cnn)アーキテクチャで検証されており、全てのcnnが類似したアーキテクチャバイアスを共有しているため、信頼性の高い評価ではない。本研究では,4種類のニューラルネットワークから18種類の人気モデルを検証し,代表的転送可能性向上攻撃法を再評価する。我々の再評価の結果、逆転性はしばしば過大評価され、すべての人気モデルに変換できる単一のAEは存在しないことがわかった。包括的評価の下では,前回の攻撃方法の移動可能性ランクが変化する。本稿では,3つの評価プロトコルを含む信頼性ベンチマークを提案する。新たなベンチマークにおける逆転送可能性は非常に低く、逆転送可能性の過大評価をさらに裏付ける。私たちは、コード、モデルチェックポイント、評価プロトコルを含む将来の研究を促進するために、https://adv-trans-eval.github.ioでベンチマークをリリースします。 Adversarial examples (AEs) with small adversarial perturbations can mislead deep neural networks (DNNs) into wrong predictions. The AEs created on one DNN can also fool another DNN. Over the last few years, the transferability of AEs has garnered significant attention as it is a crucial property for facilitating black-box attacks. Many approaches have been proposed to improve adversarial transferability. However, they are mainly verified across different convolutional neural network (CNN) architectures, which is not a reliable evaluation since all CNNs share some similar architectural biases. In this work, we re-evaluate 12 representative transferability-enhancing attack methods where we test on 18 popular models from 4 types of neural networks. Our reevaluation revealed that the adversarial transferability is often overestimated, and there is no single AE that can be transferred to all popular models. The transferability rank of previous attacking methods changes when under our comprehensive evaluation. Based on our analysis, we propose a reliable benchmark including three evaluation protocols. 
Adversarial transferability on our new benchmark is extremely low, which further confirms the overestimation of adversarial transferability. We release our benchmark at https://adv-trans-eval.github.io to facilitate future research, which includes code, model checkpoints, and evaluation protocols.	翻訳日:2023-06-16 18:28:03 公開日:2023-06-14
# TomoSAM:トモグラフィのセグメンテーションにSAMを使用した3D Slicer拡張 TomoSAM: a 3D Slicer extension using SAM for tomography segmentation ( http://arxiv.org/abs/2306.08609v1 ) ライセンス: Link先を確認	Federico Semeraro, Alexandre Quintart, Sergio Fraile Izquierdo, Joseph C. Ferguson	(参考訳) TomoSAMは、3D画像処理と可視化に使用される高機能なソフトウェアプラットフォームである3D Slicerに、最先端のSegment Anything Model(SAM)を統合するために開発された。 SAMは、プロンプト可能なディープラーニングモデルで、少数のユーザークリックのみに基づいて、オブジェクトを識別し、ゼロショットでイメージマスクを作成することができる。これらのツールのシナジーは、手作業では多大な労力を要する、トモグラフィーや他のイメージング技術による複雑な3Dデータセットのセグメンテーションを支援する。この記事に関連するソースコードはhttps://github.com/fsemerar/SlicerTomoSAMにある。 TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer, a highly capable software platform used for 3D image processing and visualization. SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner, based only on a few user clicks. The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques, which would otherwise require a laborious manual segmentation process. The source code associated with this article can be found at https://github.com/fsemerar/SlicerTomoSAM	翻訳日:2023-06-16 18:20:24 公開日:2023-06-14
# ロバスト性とメンバシッププライバシのためのグラフ情報ボトルネックの統一フレームワーク A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy ( http://arxiv.org/abs/2306.08604v1 ) ライセンス: Link先を確認	Enyan Dai, Limeng Cui, Zhengyang Wang, Xianfeng Tang, Yinghan Wang, Monica Cheng, Bing Yin, Suhang Wang	(参考訳) グラフニューラルネットワーク(GNN)は,グラフ構造化データのモデリングにおいて大きな成功を収めている。しかし、最近の研究では、GNNは攻撃者の望ましい予測をするために、GNNモデルを騙す可能性のある敵攻撃に対して脆弱であることを示している。さらに、GNNのトレーニングデータは、メンバシップ推論攻撃によって漏洩することができる。これは主に、電子商取引、金融、バイオインフォマティクスといった高額な分野におけるGNNの採用を妨げる。堅牢な予測とメンバーシッププライバシ保護に関する調査は行われてきたが、一般的には堅牢性とメンバーシッププライバシを同時に考慮できていない。そこで本研究では、堅牢でメンバーシップなプライバシー保護型GNNを開発するための新しい課題について検討する。我々の分析によると、Information Bottleneck(IB)は、ノイズの多い情報をフィルタリングし、ラベル付きサンプルで予測を規則化するのに役立つ。しかし、ノード分類における構造ノイズとラベルの欠如は、グラフ構造化データへのIBの展開に挑戦する。これらの問題を緩和するために,隣接ボトルネックによる構造ノイズを軽減するグラフ情報ボトルネックフレームワークを提案する。擬似ラベルは、ラベル付きセットの予測と、メンバーシッププライバシの未ラベルセットとのギャップを最小限に抑える最適化にも組み込まれている。実世界のデータセットに関する広範囲な実験は、この手法が堅牢な予測を与え、同時にメンバーシッププライバシを保存できることを実証する。 Graph Neural Networks (GNNs) have achieved great success in modeling graph-structured data. However, recent works show that GNNs are vulnerable to adversarial attacks which can fool the GNN model to make desired predictions of the attacker. In addition, training data of GNNs can be leaked under membership inference attacks. This largely hinders the adoption of GNNs in high-stake domains such as e-commerce, finance and bioinformatics. Though investigations have been made in conducting robust predictions and protecting membership privacy, they generally fail to simultaneously consider the robustness and membership privacy. Therefore, in this work, we study a novel problem of developing robust and membership privacy-preserving GNNs. Our analysis shows that Information Bottleneck (IB) can help filter out noisy information and regularize the predictions on labeled samples, which can benefit robustness and membership privacy. 
However, structural noises and lack of labels in node classification challenge the deployment of IB on graph-structured data. To mitigate these issues, we propose a novel graph information bottleneck framework that can alleviate structural noises with neighbor bottleneck. Pseudo labels are also incorporated in the optimization to minimize the gap between the predictions on the labeled set and unlabeled set for membership privacy. Extensive experiments on real-world datasets demonstrate that our method can give robust predictions and simultaneously preserve membership privacy.	翻訳日:2023-06-16 18:20:12 公開日:2023-06-14
# M^2UNet:polypセグメンテーションのためのMetaFormerマルチスケールアップサンプリングネットワーク M^2UNet: MetaFormer Multi-scale Upsampling Network for Polyp Segmentation ( http://arxiv.org/abs/2306.08600v1 ) ライセンス: Link先を確認	Quoc-Huy Trinh, Nhat-Tan Bui, Trong-Hieu Nguyen Mau, Minh-Van Nguyen, Hai-Minh Phan, Minh-Triet Tran, Hai-Dang Nguyen	(参考訳) 近年,ポリプのセグメンテーションが注目され,様々な手法が提案されている。しかし, コンボリューション操作の性質から, 複雑ポリープの前景とその周辺領域での作業では困難に直面することが多い。さらに、既存のほとんどのメソッドは、複数のデコーダステージからの潜在的な情報を利用することを忘れている。この課題に対処するために、cnnとtransformerを統合するベースラインとして導入されたmetaformerと、unetフレームワークを結合し、マルチスケールアップサンプリングブロック(mu)を統合することを提案します。このシンプルなモジュールは、浅いデコーダステージの複数の受容的フィールドパスを探索し、より高いステージを追加して、医療画像のセグメンテーションに不可欠な優れた特徴表現を集約することで、多レベル情報を組み合わせることができる。本稿では,ポリプセグメンテーションタスクのためのMetaFormer Multi-scale Upsampling Network (M$^2$UNet)を提案する。 5つのベンチマークデータセットを広範囲に実験した結果,従来の手法に比べて性能が高かった。 Polyp segmentation has recently garnered significant attention, and multiple methods have been formulated to achieve commendable outcomes. However, these techniques often confront difficulty when working with the complex polyp foreground and their surrounding regions because of the nature of convolution operation. Besides, most existing methods forget to exploit the potential information from multiple decoder stages. To address this challenge, we suggest combining MetaFormer, introduced as a baseline for integrating CNN and Transformer, with UNet framework and incorporating our Multi-scale Upsampling block (MU). This simple module makes it possible to combine multi-level information by exploring multiple receptive field paths of the shallow decoder stage and then adding with the higher stage to aggregate better feature representation, which is essential in medical image segmentation. Taken all together, we propose MetaFormer Multi-scale Upsampling Network (M$^2$UNet) for the polyp segmentation task. Extensive experiments on five benchmark datasets demonstrate that our method achieved competitive performance compared with several previous methods.	
翻訳日:2023-06-16 18:19:49 公開日:2023-06-14
# Kernel Debiased Plug-in Estimation Kernel Debiased Plug-in Estimation ( http://arxiv.org/abs/2306.08598v1 ) ライセンス: Link先を確認	Brian Cho, Kyra Gan, Ivana Malenica, Yaroslav Mukhin	(参考訳) 本研究では,ニュアンスパラメータの存在下でスカラーターゲットパラメータを推定する問題を考察する。未知のニュアンスパラメータを非パラメトリック推定器、例えば機械学習(ML)モデルで置き換えるのは便利であるが、大きなバイアスのために非効率であることが示されている。ターゲット最小損失ベース推定(TMLE)やダブル機械学習(DML)といった現代の手法は、ML推定を利用して、プラグインバイアスを緩和し、柔軟な仮定の下で最適な性能を達成する。準最適バイアス分散トレードオフを回避するため、これらの手法はプラグインの事前推定値に対してデバイアスのステップを実行する。既存のデバイアス手法では、ターゲットパラメータの影響関数(IF)を入力として要求する。しかし、IFの導出には専門的な専門知識が必要であり、実践者によるこれらの手法の適応を妨げる。プラグイン推定器をデバイアスする新しい方法を提案する。 (i)効率的である。 (ii)IFの実装を必要としない。 (iii)計算的に扱いやすく, 新たな推定問題に容易に適応でき, 利用者による解析的導出なしに自動化することができる。我々はTMLEフレームワーク上に構築し,再生核ヒルベルト空間 (RKHS) を用いて構築した非パラメトリックモデルに対して,正則化尤度最大化ステップでプラグイン推定を更新し,任意の正則な目標パラメータに対して効率的なプラグイン推定を生成する。そこで本手法は,プラグインアプローチの有用性を犠牲にすることなく,競合するデバイアス手法の効率性を提供する。 We consider the problem of estimating a scalar target parameter in the presence of nuisance parameters. Replacing the unknown nuisance parameter with a nonparametric estimator, e.g., a machine learning (ML) model, is convenient but has been shown to be inefficient due to large biases. Modern methods, such as the targeted minimum loss-based estimation (TMLE) and double machine learning (DML), achieve optimal performance under flexible assumptions by harnessing ML estimates while mitigating the plug-in bias. To avoid a sub-optimal bias-variance trade-off, these methods perform a debiasing step of the plug-in pre-estimate. Existing debiasing methods require the influence function (IF) of the target parameter as input. However, deriving the IF requires specialized expertise and thus obstructs the adaptation of these methods by practitioners. We propose a novel way to debias plug-in estimators which (i) is efficient, (ii) does not require the IF to be implemented, (iii) is computationally tractable, and therefore can be readily adapted to new estimation problems and automated without analytic derivations by the user. 
We build on the TMLE framework and update a plug-in estimate with a regularized likelihood maximization step over a nonparametric model constructed with a reproducing kernel Hilbert space (RKHS), producing an efficient plug-in estimate for any regular target parameter. Our method, thus, offers the efficiency of competing debiasing techniques without sacrificing the utility of the plug-in approach.	翻訳日:2023-06-16 18:19:27 公開日:2023-06-14
# Floquet周波数変調によるRydberg相互作用の変換 Transforming Rydberg Interactions with Floquet Frequency Modulation ( http://arxiv.org/abs/2306.08596v1 ) ライセンス: Link先を確認	Luheng Zhao, Michael Dao Kang Lee, Mohammad Mujahid Aliyu, Huanqian Loh	(参考訳) ライドベルク封鎖は配列中の原子を絡める重要な要素である。しかし、これは局所的な量子ゲートの範囲を制限するブロック半径内に原子をうまく配置する必要がある。ここでは、Floquet周波数変調を用いてこの制約を破り、従来の閉塞半径を超えるRydberg-Blockade絡みを実証する。さらに,Floquet周波数変調の下では,絡み合った状態のコヒーレンスを拡張できることがわかった。最後に, ブロックド半径内にある2つの原子に対して, 従来の静的駆動のみで定常状態の個体群を達成できないという, ライドバーグの抗遮断状態を実現する。我々の研究は、Rydbergブロックとアンチブロッカドのパラダイム的な状態の間で変化し、より連結的で一貫性があり、汎用的な中性原子量子プロセッサを単一のアプローチで実現する方法を舗装する。 The Rydberg blockade is a key ingredient for entangling atoms in arrays. However, it requires atoms to be spaced well within the blockade radius, which limits the range of local quantum gates. Here we break this constraint using Floquet frequency modulation, with which we demonstrate Rydberg-blockade entanglement beyond the traditional blockade radius. Further, we find that the coherence of entangled states can be extended under Floquet frequency modulation. Finally, we realize Rydberg anti-blockade states for two atoms within the blockade radius, where the steady-state population cannot be achieved with only the conventional static drive. Our work transforms between the paradigmatic regimes of Rydberg blockade versus anti-blockade and paves the way for realizing more connected, coherent, and versatile neutral atom quantum processors with a single approach.	翻訳日:2023-06-16 18:19:01 公開日:2023-06-14
# TensorKrowch: マシンラーニングにおけるテンソルネットワークのスムーズな統合 TensorKrowch: Smooth integration of tensor networks in machine learning ( http://arxiv.org/abs/2306.08595v1 ) ライセンス: Link先を確認	José Ramón Pareja Monturiol, David Pérez-García, Alejandro Pozas-Kerstjens	(参考訳) テンソルネットワークは、高次元テンソルから小さなテンソルのネットワークへの分解である。彼らは物理学や数学に応用しており、最近では有望な機械学習アーキテクチャとして提案されている。機械学習パイプラインにおけるテンソルネットワークの統合を容易にするため、PyTorch上に構築されたオープンソースのPythonライブラリであるTensorKrowchを紹介した。ユーザフレンドリなインターフェースを提供するTensorKrowchでは,任意のテンソルネットワークを構築してトレーニングし,より複雑なディープラーニングモデルのレイヤとして統合することができる。本稿では,TensorKrowchの主な機能と基本的な使用法について述べるとともに,その構築ブロックと効率的な操作を実現するための最適化について技術的に詳述する。 Tensor networks are factorizations of high-dimensional tensors into networks of smaller tensors. They have applications in physics and mathematics, and recently have been proposed as promising machine learning architectures. To ease the integration of tensor networks in machine learning pipelines, we introduce TensorKrowch, an open source Python library built on top of PyTorch. Providing a user-friendly interface, TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models. In this paper, we describe the main functionality and basic usage of TensorKrowch, and provide technical details on its building blocks and the optimizations performed to achieve efficient operation.	翻訳日:2023-06-16 18:18:45 公開日:2023-06-14
# 不均一連続学習 Heterogeneous Continual Learning ( http://arxiv.org/abs/2306.08593v1 ) ライセンス: Link先を確認	Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov	(参考訳) 本稿では,ネットワークアーキテクチャの変更に伴う継続学習(CL)問題に対処するための新しいフレームワークとソリューションを提案する。ほとんどのCLメソッドは、重みを変更して新しいタスク/クラスに単一のアーキテクチャを適用することに重点を置いている。しかし、アーキテクチャ設計の急速な進歩に伴い、既存のソリューションを新しいアーキテクチャに適応させるという課題が重要となる。この制限に対処するため、我々は、新しいデータ/タスクとともに広範囲に進化するネットワークアーキテクチャが継続的に現れる異種連続学習(HCL)を提案する。解決策として, 蒸留技術群の上に構築し, より弱いモデルが教師の役割を担うような新しい環境に修正する一方で, より強力なアーキテクチャが学生として機能する。さらに,従来のデータへのアクセス制限を考慮し,知識伝達を支援するために,タスク前の視覚的特徴を復元するクイック・ディープ・インバージョン(QDI)を提案する。 QDIは従来のソリューションに比べて計算コストを大幅に削減し、全体的なパフォーマンスを向上させる。本稿では, 知識蒸留パラダイムを改良したCLの新しいセットアップを提案し, 蒸留を強化するための高速データ反転法を設計する。各種ネットワークアーキテクチャ上での最先端手法と比較して,評価精度が大幅に向上した。 We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures. Most CL methods focus on adapting a single architecture to a new task/class by modifying its weights. However, with rapid progress in architecture design, the problem of adapting existing solutions to novel architectures becomes relevant. To address this limitation, we propose Heterogeneous Continual Learning (HCL), where a wide range of evolving network architectures emerge continually together with novel data/tasks. As a solution, we build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher; meanwhile, a new stronger architecture acts as a student. Furthermore, we consider a setup of limited access to previous data and propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer. QDI significantly reduces computational costs compared to previous solutions and improves overall performance. In summary, we propose a new setup for CL with a modified knowledge distillation paradigm and design a quick data inversion method to enhance distillation. 
Our evaluation on various benchmarks shows a significant improvement in accuracy over state-of-the-art methods across various network architectures.	翻訳日:2023-06-16 18:18:33 公開日:2023-06-14
# 大腸内視鏡における自己監督ポリープ再同定 Self-Supervised Polyp Re-Identification in Colonoscopy ( http://arxiv.org/abs/2306.08591v1 ) ライセンス: Link先を確認	Yotam Intrator, Natalie Aizenberg, Amir Livne, Ehud Rivlin, Roman Goldenberg	(参考訳) コンピュータ支援ポリープ検出(CADe)は, 現代の大腸内視鏡システムにおいて, 標準的かつ重要な部分となっている。典型的な大腸内視鏡CADeは、単一のフレーム内のポリープを検出し、ビデオシーケンスを通して追跡しない。しかし、ポリープの特性評価(CADx)、品質指標、自動レポート作成を含む多くの下流タスクでは、複数のフレームからポリープデータを集約する必要がある。本研究では,視覚的外観による再同定に基づく堅牢な長期ポリープ追跡手法を提案する。本ソリューションは,ビデオ入力の時間的性質を活用した注意に基づく自己教師付きMLモデルを用いる。提案手法の性能を定量的に評価し,CADxタスクの価値を示す。 Computer-aided polyp detection (CADe) is becoming a standard, integral part of any modern colonoscopy system. A typical colonoscopy CADe detects a polyp in a single frame and does not track it through the video sequence. Yet many downstream tasks, including polyp characterization (CADx), quality metrics, and automatic reporting, require aggregating polyp data from multiple frames. In this work we propose a robust long-term polyp tracking method based on re-identification by visual appearance. Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input. We quantitatively evaluate the method's performance and demonstrate its value for the CADx task.	翻訳日:2023-06-16 18:18:15 公開日:2023-06-14
# 暗黙のバイアスを超えて:オンライン学習におけるSGDノイズの無意味さ Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning ( http://arxiv.org/abs/2306.08590v1 ) ライセンス: Link先を確認	Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak	(参考訳) ディープラーニングにおけるSGDの成功は、高い学習率または小さなバッチサイズによって引き起こされる暗黙のバイアス("SGD noise")に先行研究によって説明されている。先行研究はオフライン学習(マルチエポック学習)に焦点を当てていたが,本研究ではオンライン学習(単一エポック学習)におけるSGDノイズの影響について検討する。画像と言語データの広範な実証分析を通じて,オンライン学習において,大きな学習率と小さなバッチサイズが暗黙のバイアスアドバンテージを生まないことを実証する。オフライン学習とは対照的に、オンライン学習におけるSGDノイズの利点は厳密に計算上のものであり、より大きく、よりコスト効率の良い勾配ステップを促進する。本研究は,オンライン領域におけるSGDは,ノイズレス勾配流アルゴリズムの「黄金経路」に沿ってノイズステップをとるものと解釈できることを示唆する。この仮説を裏付ける証拠として,訓練中にSGDノイズを低減させる実験と,SGDノイズレベルは異なるが等価な損失値で訓練されたモデル間のポイントワイズ機能距離を測定する。本研究は,SGDの一般的な理解に挑戦し,オンライン学習におけるその役割に関する新たな知見を提供する。 The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by high learning rate or small batch size ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that large learning rate and small batch size do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating larger or more cost-effective gradient steps. Our work suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient flow algorithm. We provide evidence to support this hypothesis by conducting experiments that reduce SGD noise during training and by measuring the pointwise functional distance between models trained with varying SGD noise levels, but at equivalent loss values. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.	翻訳日:2023-06-16 18:18:05 公開日:2023-06-14
# 音声編集によるASRにおけるコード切り替えと名前付きエンティティ認識の改善 Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation ( http://arxiv.org/abs/2306.08588v1 ) ライセンス: Link先を確認	Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen	(参考訳) 近年,エンド・ツー・エンド(E2E)自動音声認識(ASR)モデルは非常に進歩しており,音声認識性能に優れる。しかし、コードスイッチングや名前付きエンティティ認識(NER)など、E2Eモデルには適さない難題がいくつか残っている。データ拡張は2つのシナリオで一般的で効果的なプラクティスです。しかし、現在のデータ拡張方法は、主に音声スプライシングとテキスト音声(TTS)モデルに依存しており、不連続性、非現実性、多様化の少ない音声をもたらす可能性がある。そこで本研究では,テキストベースの音声編集モデルを適用した新しいデータ拡張手法を提案する。音声編集システムによる拡張音声は、よりコヒーレントで多様化しており、また実際の音声に近い。コードスイッチングとnerタスクの実験結果は,提案手法が音声スプライシングとニューラルttsに基づくデータ拡張システムを大きく上回ることを示した。 Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition. However, there remain several challenging scenarios that E2E models are not competent in, such as code-switching and named entity recognition (NER). Data augmentation is a common and effective practice for these two scenarios. However, the current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method by applying the text-based speech editing model. The augmented speech from speech editing systems is more coherent and diversified, also more akin to real speech. The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.	翻訳日:2023-06-16 18:17:45 公開日:2023-06-14
# CVSSの改良による産業制御システムの脆弱性評価 Vulnerability Assessment of Industrial Control System with an Improved CVSS ( http://arxiv.org/abs/2306.08631v1 ) ライセンス: Link先を確認	He Wen	(参考訳) 産業制御システム(ICS)に対するサイバー攻撃は学界で注目を集めている。しかし、これは一部の工業従事者の間で十分な懸念を生じさせていない。したがって、ICS内の脆弱な場所やコンポーネントを特定し、攻撃シナリオやテクニックを調査する必要がある。本研究は,ICSにおけるサイバー攻撃のリスクをCVSS(Common Vulnerability Scoring System)の改良により評価し,CSTR(continuous stirred tank reactor)モデルに適用する手法を提案する。その結果,ICSの物理システムレベルはサイバー攻撃時に最も重要度が高く,コントローラ,ワークステーション,ヒューマン・マシン・インタフェースがサイバー攻撃と防御の重要な構成要素であることがわかった。 Cyberattacks on industrial control systems (ICS) have been drawing attention in academia. However, this has not raised adequate concerns among some industrial practitioners. Therefore, it is necessary to identify the vulnerable locations and components in the ICS and investigate the attack scenarios and techniques. This study proposes a method to assess the risk of cyberattacks on ICS with an improved Common Vulnerability Scoring System (CVSS) and applies it to a continuous stirred tank reactor (CSTR) model. The results show that the physical system levels of ICS have the highest severity once cyberattacked, and that controllers, workstations, and human-machine interfaces are the crucial components in the cyberattack and defense.	翻訳日:2023-06-16 18:11:22 公開日:2023-06-14
# 部分空間と適応生成モデルを組み合わせた高次元mr再構成 High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models ( http://arxiv.org/abs/2306.08630v1 ) ライセンス: Link先を確認	Ruiyang Zhao, Xi Peng, Varun A. Kelkar, Mark A. Anastasio, Fan Lam	(参考訳) 目的:高次元MR画像再構成のためのサブスペースと生成画像モデルを統合する新しい手法を開発する。方法:高次元画像の低次元部分空間モデルと,「コントラスト重み付き」画像のシーケンスや部分空間モデルの空間係数の空間制約となる適応生成画像と,従来のスパーシティ正規化とを融合させる定式化を提案した。コントラストの異なる画像のための正確な生成ネットワークベース表現を構築するために, 特別事前学習と対象特化ネットワーク適応戦略を提案した。最近提案された中間層最適化手法を応用した生成画像モデルの部分空間係数と多分解能潜時空間を共同で更新する反復アルゴリズムが導入された。結果: 高速MRパラメータマッピングと高分解能MR分光画像の2つの高次元イメージングへの応用について検討した。最先端のサブスペースベースメソッドのパフォーマンス向上が両ケースで実証された。結論:提案手法は,部分空間再構成を制約するデータ駆動空間として適応型生成モデルを導入することで,高次元mr画像再構成問題を解決する新しい手法を提供する。意義:本研究は,高次元画像問題に対するデータ駆動型および適応型生成前処理と標準低次元モデリングを統合する可能性を実証した。 Objective: To develop a new method that integrates subspace and generative image models for high-dimensional MR image reconstruction. Methods: We proposed a formulation that synergizes a low-dimensional subspace model of high-dimensional images, an adaptive generative image prior serving as spatial constraints on the sequence of "contrast-weighted" images or spatial coefficients of the subspace model, and a conventional sparsity regularization. A special pretraining plus subject-specific network adaptation strategy was proposed to construct an accurate generative-network-based representation for images with varying contrasts. An iterative algorithm was introduced to jointly update the subspace coefficients and the multi-resolution latent space of the generative image model that leveraged a recently proposed intermediate layer optimization technique for network inversion. Results: We evaluated the utility of the proposed method for two high-dimensional imaging applications: accelerated MR parameter mapping and high-resolution MR spectroscopic imaging. Improved performance over state-of-the-art subspace-based methods was demonstrated in both cases. 
Conclusion: The proposed method provided a new way to address high-dimensional MR image reconstruction problems by incorporating an adaptive generative model as a data-driven spatial prior for constraining subspace reconstruction. Significance: Our work demonstrated the potential of integrating data-driven and adaptive generative priors with canonical low-dimensional modeling for high-dimensional imaging problems.	翻訳日:2023-06-16 18:11:11 公開日:2023-06-14
# 深さ最適量子ビット割り当てとスワップベースルーティングのための制約プログラミングモデル Constraint programming models for depth-optimal qubit assignment and SWAP-based routing ( http://arxiv.org/abs/2306.08629v1 ) ライセンス: Link先を確認	Kyle E. C. Booth	(参考訳) ゲートモデル量子デバイスの接続が限られているため、論理量子回路は実行前にターゲットハードウェアにコンパイルされなければならない。多くの場合、このプロセスでは、論理回路にスワップゲートを挿入し、いわゆる量子ビット割り当てとルーティング問題を解決することで回路の深さを増加させる。近年,量子ビット割当問題やルーティング問題の解法として整数線形計画法(ilp)モデルが提案されている。これらのモデルは問題の目的関数と制約を符号化し、ハードウェア準拠の量子回路を見つけるために自動解法技術を利用する。そこで本研究では,本問題に対する制約プログラミング(cp)モデルを提案し,線形および二次元グリッド格子デバイストポロジの回路深度最小化のためのirpとの比較を行った。実験分析の結果,提案手法はソリューションの品質と実行時間の両方において,ILPモデルよりも優れていることがわかった。 Due to the limited connectivity of gate model quantum devices, logical quantum circuits must be compiled to target hardware before they can be executed. Often, this process involves the insertion of SWAP gates into the logical circuit, usually increasing the depth of the circuit, achieved by solving a so-called qubit assignment and routing problem. Recently, a number of integer linear programming (ILP) models have been proposed for solving the qubit assignment and routing problem to proven optimality. These models encode the objective function and constraints of the problem, and leverage the use of automated solver technology to find hardware-compliant quantum circuits. In this work, we propose constraint programming (CP) models for this problem and compare their performance against ILP for circuit depth minimization for both linear and two-dimensional grid lattice device topologies on a set of randomly generated instances. Our empirical analysis indicates that the proposed CP approaches outperform the ILP models both in terms of solution quality and runtime.	翻訳日:2023-06-16 18:10:50 公開日:2023-06-14
# 気象データへのグラフベースマトリックス補完の適用 Graph-Based Matrix Completion Applied to Weather Data ( http://arxiv.org/abs/2306.08627v1 ) ライセンス: Link先を確認	Benoît Loucheur, P.-A. Absil, Michel Journée	(参考訳) 低ランク行列完備化は、真の行列が良好な低ランク近似を持つことを仮定して、行列の未知のエントリを復元するタスクである。時には変数に関する追加情報が知られており、この情報を行列補完モデルに組み込むことで、より良い完成性が得られる。本稿では,行列の列/行エンティティ間の情報を重み付きグラフとして利用できる状況を考える。本稿では,気象観測所が記録した気温データの欠落エントリを完了させる問題に対処する。気象データの実際のギャップを模倣した場所にデータを格納してテストセットを構築する。これらのテストセットにおいて,適切な空間的および時間的グラフは,グラフ正規化低ランク行列補完法によって得られる完了の精度を著しく向上できることを示す。 Low-rank matrix completion is the task of recovering unknown entries of a matrix by assuming that the true matrix admits a good low-rank approximation. Sometimes additional information about the variables is known, and incorporating this information into a matrix completion model can lead to a better completion quality. We consider the situation where information between the column/row entities of the matrix is available as a weighted graph. In this framework, we address the problem of completing missing entries in air temperature data recorded by weather stations. We construct test sets by holding back data at locations that mimic real-life gaps in weather data. On such test sets, we show that adequate spatial and temporal graphs can significantly improve the accuracy of the completion obtained by graph-regularized low-rank matrix completion methods.	翻訳日:2023-06-16 18:10:32 公開日:2023-06-14
# フェルミオン不純物のマルコフ緩和過程におけるシステムバスの絡み合い System-bath entanglement during Markovian relaxation of a fermionic impurity ( http://arxiv.org/abs/2306.08626v1 ) ライセンス: Link先を確認	Krzysztof Ptaszynski, Massimiliano Esposito	(参考訳) フェルミイオン熱浴に結合した非相互作用性フェルミイオン不純物の熱分解におけるシステムと環境の絡み合いのダイナミクスについて検討した。弱結合状態においても過渡的絡み合いは観測可能であり、系の還元ダイナミクスや熱力学が状態集団に対する古典的・マルコフ的マスター方程式によってよく説明できることを示した。この絡み合いは長い間消滅するが、緩和時間に匹敵する時間スケールで保存される。その大きさは、システムと環境のカップリングに弱いだけでなく、システムの初期状態の純度に大きく依存する。我々は,このような過渡的絡み合いの存在とマルコフ記述の縮小に基づくシステムバス力学のユニタリ特性を関連づける。 We investigate the dynamics of entanglement between the system and the environment during thermalization of a noninteracting fermionic impurity coupled to a fermionic thermal bath. We show that transient entanglement can be observed even in the weak coupling regime, when the reduced dynamics and thermodynamics of the system can be well described by an effectively classical and Markovian master equation for the state populations. This entanglement vanishes for long times, but is preserved over timescales comparable to the relaxation time. Its magnitude depends only weakly on the system-environment coupling but instead strongly on the purity of the initial state of the system. We relate the presence of such transient entanglement to the unitary character of the system-bath dynamics underlying the reduced Markovian description.	翻訳日:2023-06-16 18:10:18 公開日:2023-06-14
# RRSIS:リモートセンシング画像のセグメンテーションを参照 RRSIS: Referring Remote Sensing Image Segmentation ( http://arxiv.org/abs/2306.08625v1 ) ライセンス: Link先を確認	Zhenghang Yuan, Lichao Mou, Yuansheng Hua, Xiao Xiang Zhu	(参考訳) リモートセンシング画像から所望のオブジェクトをローカライズすることは、実用上非常に有用である。与えられた表現が参照する対象を分割することを目的とした画像分割の参照は、自然画像において広く研究されている。しかし、このリモートセンシング画像のタスクには、ほとんど研究の注意が払われていない。本稿では,実世界の応用の可能性を考慮して,このギャップを埋めるためにリモートセンシング画像セグメンテーション(RRSIS)を紹介する。具体的には、このタスクのためにRefSegRSと呼ばれる新しいデータセットを作成し、異なるメソッドの評価を可能にします。その後、RefSegRSデータセット上の自然画像のイメージセグメンテーション手法をベンチマークし、これらのモデルが小さな物体や散乱物体の検出において限られた有効性を示すことを示した。この問題を軽減するために,言語機能を利用した言語誘導型クロススケール拡張(LGCE)モジュールを提案する。提案したデータセット、ベンチマーク結果、デザインされたLGCEモジュールは、より良いRRSISモデルの設計に関する洞察を提供する。データセットとコードを公開します。 Localizing desired objects from remote sensing images is of great use in practical applications. Referring image segmentation, which aims at segmenting out the objects to which a given expression refers, has been extensively studied in natural images. However, almost no research attention is given to this task of remote sensing imagery. Considering its potential for real-world applications, in this paper, we introduce referring remote sensing image segmentation (RRSIS) to fill in this gap and make some insightful explorations. Specifically, we create a new dataset, called RefSegRS, for this task, enabling us to evaluate different methods. Afterward, we benchmark referring image segmentation methods of natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that utilizes linguistic features to adaptively enhance multi-scale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and the designed LGCE module provide insights into the design of a better RRSIS model. We will make our dataset and code publicly available.	翻訳日:2023-06-16 18:10:06 公開日:2023-06-14
# 量子ドットにおける断熱量子励起の熱力学 Thermodynamics of adiabatic quantum pumping in quantum dots ( http://arxiv.org/abs/2306.08621v1 ) ライセンス: Link先を確認	Daniele Nello and Alessandro Silva	(参考訳) 2つのフェルミオンリードに接続された単一レベルの量子ドットである共鳴レベルモデルによる断熱量子ポンピングを考える。我々は, このモデルについて, 点のエネルギーレベルと熱浴によるトンネル速度の変動を考慮した一貫した熱力学的記述を開発した。本研究では,エントロピーや散逸電力など,関連する熱力学量を計算するポンプサイクルの様々な例について検討する。次に、これらの量をシステムの輸送特性と比較する。その結果, 電荷量子化限界ではエントロピー生成速度が消失し, 散逸した電力は同じ限界で量子化されることがわかった。 We consider adiabatic quantum pumping through a resonant level model, a single-level quantum dot connected to two fermionic leads. We develop a consistent thermodynamic description of this model accounting for the variation of the energy level of the dot and the tunnelling rates with the thermal baths. We study various examples of pumping cycles computing the relevant thermodynamic quantities, such as the entropy produced and the dissipated power. We then compare these quantities with the transport properties of the system. Among other results, we find that the entropy production rate vanishes in the charge quantization limit while the dissipated power is quantized in the same limit.	翻訳日:2023-06-16 18:09:47 公開日:2023-06-14
# 期待音楽変換器 Anticipatory Music Transformer ( http://arxiv.org/abs/2306.08620v1 ) ライセンス: Link先を確認	John Thickstun, David Hall, Chris Donahue, Percy Liang	(参考訳) 第2の相関化プロセス(制御プロセス)の実現に基づいて非同期に条件づけされた時間的点過程(イベントプロセス)の制御可能な生成モデルを構築する方法である。我々は,イベントシーケンスの停止時間に従って制御が現れるように,イベントと制御のシーケンスをインターリーブすることでこれを実現する。この作品は、シンボリック・ミュージック・ジェネレーションの制御に生じる問題に動機づけられている。制御タスクは、制御自体がイベントのサブセットであり、条件付き生成は、固定された制御イベントが与えられたイベントのシーケンスを完了する。大規模かつ多様なLakh MIDI音楽データセットを用いて予測入出力モデルを訓練する。これらのモデルは、伴奏を含むインフィル制御タスクを実行する追加機能を備えた、インプット音楽生成のための自己回帰モデルのパフォーマンスと一致する。 human evaluatorsは、予測モデルが20秒のクリップで人間が作曲した音楽と同じような音楽性を持つ伴奏を生成すると報告している。 We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with similar musicality to even music composed by humans over a 20-second clip.	翻訳日:2023-06-16 18:09:36 公開日:2023-06-14
# 近似有効$p$-Resistanceによるマルチクラスグラフクラスタリング Multi-class Graph Clustering via Approximated Effective $p$-Resistance ( http://arxiv.org/abs/2306.08617v1 ) ライセンス: Link先を確認	Shota Saito and Mark Herbster	(参考訳) 本稿では,(有効)$p$-resistanceの近似を開発し,マルチクラスクラスタリングに適用する。グラフラプラシアンに基づくスペクトル法とそのグラフ $p$-laplacian への一般化は、非ユークリッドクラスタリング技術のバックボーンとなっている。 p$-Laplacian の利点は、パラメータ $p$ がクラスタ構造に制御可能なバイアスをもたらすことである。 p$-Laplacian eigenvector based methodの欠点は、3番目と上位の固有ベクトルの計算が難しいことである。したがって、我々はクラスタリングに$p$-Laplacianによって誘導される$p$-resistanceを使うことを動機付けている。 p$-resistanceの場合、小さな$p$バイアスは内部接続性の高いクラスタへ、大きな$p$バイアスは小さな ``extent,''のクラスタへ、これはクラスタ内の頂点間の短いパス距離を優先する。しかし、$p$-resistanceは計算にコストがかかる。我々は、$p$-resistanceの近似を開発することでこれを克服する。この近似で上界と下界を証明し、グラフが木であるときにそれが正確であることを観測する。また、クラスタリングに$p$-resistanceを使用するための理論的正当性も提供する。最後に、近似した$p$-resistanceクラスタリングと他の$p$-Laplacianベースのメソッドとの比較実験を行う。 This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small ``extent,'' that is a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. 
We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.	翻訳日:2023-06-16 18:09:21 公開日:2023-06-14
# 時間-局所最適化を超えたラジカルペアダイナミクスの量子制御 Quantum Control of Radical Pair Dynamics beyond Time-Local Optimisation ( http://arxiv.org/abs/2306.08613v1 ) ライセンス: Link先を確認	Farhan T. Chowdhury, Matt C. J. Denton, Daniel C. Bonser, Daniel R. Kattnig	(参考訳) 傾斜上昇パルス工学(GRAPE)を拡張して反応収率を最適化することで,低磁場領域におけるラジカル対のスピン選択的再結合反応における任意の波形制御を実現する。これは、反応制御を実現するための従来の時間局所最適化アプローチの欠点を克服し、高いバイアス場によって駆動されるラジカル対の適用性に制限された。本研究では, 時間ブロック, スパースサンプリング, 中央単段プロパゲータとそのFréchet誘導体の反復的トロッタスズキ分割による評価により, ラジカル対組換え収率の時間-グローバル最適化がいかに効果的かを示す。その結果、より単純な高磁場シナリオにおいてラジカル対反応のコヒーレント制御を示すおもちゃモデルと、16個の核スピンからなる現実的な励起結合形成ドナー受容体系の両方が得られた。このことは、周囲磁場における実際のラジカル対系のスピン制御の見通しを高め、目的特異的な電波波形を用いてラジカル反応の収率を抑制または促進し、反応収率依存性の量子磁気学におけるラジカル誘起量子ビットアーキテクチャーへの道を切り開いたり、生化学的ラジカル対反応への量子制御の潜在的応用を可能にした。 By extending Gradient Ascent Pulse Engineering (GRAPE) to allow for optimising reaction yields, we realise arbitrary waveform-based control in spin-selective recombination reactions of radical pairs in the low magnetic field regime. This overcomes drawbacks of previous time-local optimisation approaches for realising reaction control, which were limited in their applicability to radical pairs driven by high biasing fields. We demonstrate how efficient time-global optimisation of the radical pair recombination yields can be realised by gradient based methods augmented with time-blocking, sparse sampling of the yield, and evaluation of the central single-timestep propagators and their Fréchet derivatives using iterated Trotter-Suzuki splittings. Results are shown for both a toy model, previously used to demonstrate coherent control of radical pair reactions in the simpler high-field scenario, and furthermore for a realistic exciplex-forming donor-acceptor system comprising 16 nuclear spins. 
This raises prospects for the spin-control of actual radical pair systems in ambient magnetic fields, by suppressing or boosting radical reaction yields using purpose-specific radio-frequency waveforms, paving the way for radical inspired qubit architectures for reaction-yield-dependent quantum magnetometry and potentially applications of quantum control to biochemical radical pair reactions.	翻訳日:2023-06-16 18:08:57 公開日:2023-06-14
# 接地型社会推論に向けて Toward Grounded Social Reasoning ( http://arxiv.org/abs/2306.08651v1 ) ライセンス: Link先を確認	Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh	(参考訳) 丁寧に組み立てられたレゴのスポーツカーが置かれた机の片付けを任されたロボットを考えてみてほしい。人間であれば、「片付け」の一環としてスポーツカーを分解してしまうことは社会的に適切でないと認識できるだろう。ロボットはどうやってその結論に達するのか? 大規模言語モデル (LLMs) は近年, 社会的推論に利用されてきたが, この推論を現実世界に接地させることは困難であった。現実の世界で推論するには、ロボットは受動的にLLMに問い合わせるだけでなく、正しい判断を下すために必要な情報を環境から積極的に収集する必要がある。例えば、遮蔽された車があることを検知したロボットは、それがレゴ製の高度なモデルカーなのか、幼児が作ったおもちゃの車なのかを知るために、車を能動的に知覚する必要があるかもしれない。 LLMと視覚言語モデル(VLM)を活用して,ロボットがその環境を能動的に知覚し,接地された社会的推論を行うことを支援するアプローチを提案する。当社のフレームワークを大規模に評価するために,片付けが必要な70の現実世界の表面の画像を含むMessySurfacesデータセットをリリースする。さらに,注意深く設計した2つの表面上でロボットを用いて本アプローチを例示する。能動的知覚を使用しないベースラインに対して、MessySurfacesベンチマークで平均12.9%、ロボット実験で平均15%の改善が得られた。私たちのアプローチのデータセット、コード、ビデオは、 https://minaek.github.io/groundedsocialreasoning で見ることができます。 Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not socially appropriate to disassemble the sports car and put it away as part of the "tidying". How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable social reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and actively gather information from the environment that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and vision language model (VLM) to help a robot actively perceive its environment to perform grounded social reasoning. To evaluate our framework at scale, we release the MessySurfaces dataset which contains images of 70 real-world surfaces that need to be cleaned. 
We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MessySurfaces benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/groundedsocialreasoning.	翻訳日:2023-06-16 18:01:53 公開日:2023-06-14
# ランクが重要なときのランクの学習 Learning to Rank when Grades Matter ( http://arxiv.org/abs/2306.08650v1 ) ライセンス: Link先を確認	Le Yan, Zhen Qin, Xuanhui Wang, Gil Shamir, Mike Bendersky	(参考訳) グレードラベルは、現実世界の学習からランクへのアプリケーション、特に人間格付けされた関連データで広く使われている。従来の学習 to ランク技術は、文書のランク付け順序を最適化することを目的としている。しかし、通常は実際の成績の予測を無視する。これにより、`poor'' ドキュメントをフィルタリングするなど、グレードが重要なアプリケーションでそれらを採用できない。優れたランク付け性能と優れたグレード予測性能の両方を達成することは、まだ未解決の問題である。既存の研究は、モデル出力の校正を行わず、あるいはラベルが線形スケールにあり、順序付け情報を活用できないと仮定して、グレードを数値として扱うことで、ランキング性能のみに焦点を当てている。本稿では,ランク付け性能と格付け予測性能の両方が重要となるランク付け学習について,厳密な研究を行う。成績予測の非スカラー予測による順位付けの方法に関する形式的な議論を行い,順位予測と順位予測の両方を共同で最適化する多目的定式化を提案する。実験では,我々の手法がparetoのランキングとグレード予測のパフォーマンスのトレードオフのフロンティアを押し上げることができるという,いくつかの公開データセットを検証した。 Graded labels are ubiquitous in real-world learning-to-rank applications, especially in human rated relevance data. Traditional learning-to-rank techniques aim to optimize the ranked order of documents. They typically, however, ignore predicting actual grades. This prevents them from being adopted in applications where grades matter, such as filtering out ``poor'' documents. Achieving both good ranking performance and good grade prediction performance is still an under-explored problem. Existing research either focuses only on ranking performance by not calibrating model outputs, or treats grades as numerical values, assuming labels are on a linear scale and failing to leverage the ordinal grade information. In this paper, we conduct a rigorous study of learning to rank with grades, where both ranking performance and grade prediction performance are important. We provide a formal discussion on how to perform ranking with non-scalar predictions for grades, and propose a multiobjective formulation to jointly optimize both ranking and grade predictions. 
In experiments, we verify on several public datasets that our methods are able to push the Pareto frontier of the tradeoff between ranking and grade prediction performance, showing the benefit of leveraging ordinal grade information.	翻訳日:2023-06-16 18:01:26 公開日:2023-06-14
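The abstract above describes jointly optimizing a ranking objective and a grade-prediction objective. As a rough illustration only, the Python sketch below combines a pairwise logistic ranking loss with an ordinal (cumulative-threshold) grade loss through a mixing weight. The function names, the `thresholds` parameter, and the weight `alpha` are assumptions made for this example; the abstract does not specify the paper's actual losses.

```python
import math


def pairwise_logistic_loss(scores, grades):
    """Ranking term: penalize pairs where the higher-graded item is not scored higher."""
    total, pairs = 0.0, 0
    for s_i, g_i in zip(scores, grades):
        for s_j, g_j in zip(scores, grades):
            if g_i > g_j:
                total += math.log1p(math.exp(-(s_i - s_j)))
                pairs += 1
    return total / pairs if pairs else 0.0


def ordinal_grade_loss(scores, grades, thresholds):
    """Grade term: binary cross-entropy on the events 'grade > k',
    modeled as sigmoid(score - thresholds[k]) (a hypothetical parameterization)."""
    total, count = 0.0, 0
    for s, g in zip(scores, grades):
        for k, t in enumerate(thresholds):
            z = s - t
            # -log(sigmoid(z)) if the grade exceeds level k, else -log(1 - sigmoid(z)).
            total += math.log1p(math.exp(-z)) if g > k else math.log1p(math.exp(z))
            count += 1
    return total / count if count else 0.0


def combined_loss(scores, grades, thresholds, alpha=0.5):
    """Weighted sum of the two objectives; alpha trades ranking against grade prediction."""
    rank_term = pairwise_logistic_loss(scores, grades)
    grade_term = ordinal_grade_loss(scores, grades, thresholds)
    return alpha * rank_term + (1.0 - alpha) * grade_term
```

Sweeping `alpha` from 0 to 1 would trace out one possible trade-off curve between the two objectives, which is the kind of Pareto frontier the abstract refers to.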
# OCAtari:オブジェクト中心のAtari 2600強化学習環境 OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments ( http://arxiv.org/abs/2306.08649v1 ) ライセンス: Link先を確認	Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting	(参考訳) 認知科学と心理学は、複雑なシーンのオブジェクト中心の表現が、低レベルの知覚的特徴から効率的な抽象的推論を実現するための有望なステップであることを示唆している。しかし、ほとんどの深層強化学習アプローチは、自然のシーンの構成的な性質を捉えないピクセルベースの表現のみに依存している。そのためには、オブジェクト中心のアプローチの開発と評価を可能にする環境とデータセットが必要である。本稿では,深層強化学習アプローチで最も使われている評価フレームワークであるAtariゲームについて,オブジェクト中心の状態表現を提供する環境セットOCAtariを提案する。 OCAtariはまた、ゲームのRAM状態を操作して、特定の状況や新しい状況を作り出すことも可能にする。この作業のコードベースは github.com/k4ntz/OC_Atari で入手できる。 Cognitive science and psychology suggest that object-centric representations of complex scenes are a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep reinforcement learning approaches rely on only pixel-based representations that do not capture the compositional properties of natural scenes. For this, we need environments and datasets that allow us to work on and evaluate object-centric approaches. We present OCAtari, a set of environments that provide object-centric state representations of Atari games, the most-used evaluation framework for deep RL approaches. OCAtari also allows for RAM state manipulations of the games to change and create specific or even novel situations. The code base for this work is available at github.com/k4ntz/OC_Atari.	翻訳日:2023-06-16 18:01:04 公開日:2023-06-14
# simplemapping: ディープマルチビューステレオを用いたリアルタイム視覚慣性密集マッピング SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo ( http://arxiv.org/abs/2306.08648v1 ) ライセンス: Link先を確認	Yingye Xin, Xingxing Zuo, Dongyue Lu, Stefan Leutenegger	(参考訳) 逐次単眼画像と慣性測定ユニット(IMU)のみを用いて高画質の3次元メッシュ再構成を行うことができるリアルタイムビジュアル慣性高密度マッピング法を提案する。 6-DoFカメラのポーズは、頑健な特徴に基づく視覚慣性計測(VIO)によって推定され、ノイズの多い3Dマップポイントを副産物として生成する。本稿では,vioシステムから有益だがノイズの多いスパースポイントを効果的に活用できるスパースポイント支援マルチビューステレオニューラルネットワーク(spa-mvsnet)を提案する。 VIOからのスパース深度は、まず、シングルビュー深度完了ネットワークによって完了する。この濃厚深さマップは、当然精度は限られているが、mvsネットワークのコストボリューム生成と正確な濃密深さ予測のための正規化を導くために、前もって使用される。 MVSネットワークによるキーフレーム画像の予測深度マップをTSDF-Fusionを用いてグローバルマップにインクリメンタルに融合する。提案するspa-mvsnetと,複数の公開データセット上での視覚慣性的高密度マッピングシステムと,我々のデータセットの両方を評価し,システムの印象的な一般化能力と高品質な3dメッシュ再構成をオンラインで提供する能力を示した。提案手法は,EuRoCデータセットの難易度評価において,既存システムよりも39.7%のFスコア向上を実現している。受け入れ次第、この作業のコードをリリースする予定です。 We present a real-time visual-inertial dense mapping method capable of performing incremental 3D mesh reconstruction with high quality using only sequential monocular images and inertial measurement unit (IMU) readings. 6-DoF camera poses are estimated by a robust feature-based visual-inertial odometry (VIO), which also generates noisy sparse 3D map points as a by-product. We propose a sparse point aided multi-view stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system. The sparse depth from VIO is firstly completed by a single-view depth completion network. This dense depth map, although naturally limited in accuracy, is then used as a prior to guide our MVS network in the cost volume generation and regularization for accurate dense depth prediction. Predicted depth maps of keyframe images by the MVS network are incrementally fused into a global map using TSDF-Fusion. 
We extensively evaluate both the proposed SPA-MVSNet and the entire visual-inertial dense mapping system on several public datasets as well as our own dataset, demonstrating the system's impressive generalization capabilities and its ability to deliver high-quality 3D mesh reconstruction online. Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset. We plan to release the code of this work upon acceptance.	翻訳日:2023-06-16 18:00:51 公開日:2023-06-14
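The abstract above says predicted keyframe depth maps are fused incrementally into a global map with TSDF-Fusion. The sketch below shows the textbook per-voxel update behind that family of methods: a running weighted average of truncated signed distances. It is a minimal illustration under stated assumptions, not the authors' implementation. The function name `fuse_tsdf` and the parameters `truncation`, `obs_weight`, and `max_weight` are hypothetical choices for this example.

```python
def fuse_tsdf(tsdf, weight, sdf_observation, truncation=0.1, obs_weight=1.0, max_weight=64.0):
    """Fuse one signed-distance observation into a single voxel.

    tsdf, weight    : current voxel state (normalized distance in [-1, 1], accumulated weight)
    sdf_observation : signed distance from the voxel to the observed surface, in metres
    Returns the updated (tsdf, weight) pair.
    """
    # Truncate and normalize the new observation to [-1, 1].
    d = max(-1.0, min(1.0, sdf_observation / truncation))
    # Running weighted average of all observations seen so far.
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    # Capping the weight keeps the voxel responsive to later observations.
    return new_tsdf, min(new_weight, max_weight)
```

In a full system this update would run for every voxel near the surface seen in each new depth map, and a mesh would then be extracted from the zero crossing of the fused field.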
# ロボットのスキル合成に報酬を与える言語 Language to Rewards for Robotic Skill Synthesis ( http://arxiv.org/abs/2306.08647v1 ) ライセンス: Link先を確認	Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia	(参考訳) 大規模言語モデル(llm)は、論理的な推論からコード記述まで、コンテキスト内学習を通じて多様な新機能を獲得するという、エキサイティングな進歩を示している。ロボティクスの研究者たちは、LLMを使ってロボット制御の能力を向上させる研究も行っている。しかし、低レベルロボットの動作はハードウェアに依存しており、LLMトレーニングコーパスでは表現できないため、LLMをロボットに適用するための既存の取り組みは、LLMをセマンティックプランナーとして、あるいは人間工学のコントロールプリミティブに頼ってロボットと対話している。一方、報酬関数は、多様なタスクを達成するために制御ポリシーに最適化できるフレキシブルな表現であり、その意味的な豊かさはLLMによって指定されるのに適している。本研究では, LLMを利用して, 様々なロボットタスクを最適化し, 実現可能な報酬パラメータを定義することによって, この実現を実現する新しいパラダイムを提案する。 LLMが生成する中間インタフェースとして報酬を用いることで、高レベルの言語命令や修正のギャップを、低レベルのロボット動作に効果的に埋めることができる。一方、リアルタイムオプティマイザであるmujoco mpcと組み合わせることで、ユーザがすぐに結果を観察し、システムへのフィードバックを提供できるインタラクティブな行動創造エクスペリエンスが実現される。提案手法の性能を体系的に評価するために,擬似四足ロボットと擬似マニピュレータロボットのための合計17のタスクを設計した。提案手法は設計したタスクの90%に確実に対応し,コード・アズ・ポリシシーのインターフェースとしてプリミティブ・スキルを用いたベースラインはタスクの50%を達成する。さらに本手法を,非包括的プッシュなどの複雑な操作スキルが対話システムを通じて現れるロボットアーム上で検証した。 Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. 
In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.	翻訳日:2023-06-16 18:00:25 公開日:2023-06-14
# 可変サイズテキスト・画像合成のための学習自由拡散モデル適応 Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis ( http://arxiv.org/abs/2306.08645v1 ) ライセンス: Link先を確認	Zhiyu Jin and Xuli Shen and Bin Li and Xiangyang Xue	(参考訳) 拡散モデル(DM)は近年,テキスト・画像合成における最先端性能で注目されている。ディープラーニングの伝統に従って、DMは一定サイズの画像に基づいて訓練され、評価される。しかし、ユーザーは特定のサイズや様々なアスペクト比の画像を求めている。本稿では,視覚的な忠実性を維持しつつ,こうした多様性を扱えるようテキストから画像への拡散モデルを適応させることに焦点を当てる。まず、合成の際、低解像度の画像では物体の描写が不完全になり、高解像度の画像では同じ内容が繰り返し現れることを観察する。次に,注意エントロピーがトークン量とともに変化することを示す統計的関係を確立し,モデルが画像解像度に比例して空間情報を集約していることを示唆する。これらの観察から、低解像度では空間情報が限られるために物体が不完全に描写され、高解像度では空間情報が冗長になるために繰り返しが生じると解釈できる。この観点から,注意エントロピーの変化を緩和し,観察された欠陥パターンを軽減するためのスケーリング係数を提案する。大規模な実験結果から,提案したスケーリング係数の有効性が検証され,視覚効果,画質,テキストアライメントが向上した。特に、これらの改善は、追加のトレーニングや微調整技術なしで達成される。 Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis. Abiding by the tradition in deep learning, DMs are trained and evaluated on images of fixed size. However, users demand images with specific sizes and various aspect ratios. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First, we observe that, during the synthesis, lower resolution images suffer from incomplete object portrayal, while higher resolution images exhibit repetitive presentation. Next, we establish a statistical relationship indicating that attention entropy changes with token quantity, suggesting that models aggregate spatial information in proportion to image resolution. The subsequent interpretation of our observations is that objects are incompletely depicted due to limited spatial information for low resolutions, while repetitive presentation arises from redundant spatial information for high resolutions. From this perspective, we propose a scaling factor to alleviate the change of attention entropy and mitigate the defective pattern observed. 
Extensive experimental results validate the efficacy of the proposed scaling factor, which enables the model to achieve better visual effects, image quality, and text alignment. Notably, these improvements are achieved without additional training or fine-tuning techniques.	翻訳日:2023-06-16 17:59:53 公開日:2023-06-14
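The abstract above relates attention entropy to the number of tokens and proposes a scaling factor to counteract the change. The Python sketch below only illustrates the underlying effect: the entropy of a softmax attention row grows with the number of tokens, and multiplying the logits by a factor greater than one lowers it again. The helper names and the `scale` parameter are assumptions for this example; the specific factor proposed in the paper is not stated in the abstract and is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(logits, scale=1.0):
    """Shannon entropy (in nats) of one attention row after scaling its logits."""
    probs = softmax([scale * x for x in logits])
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# With uninformative (equal) logits, entropy is log(N) and grows with token count N.
low_res = attention_entropy([0.0] * 16)    # e.g. a 4x4 latent grid
high_res = attention_entropy([0.0] * 64)   # e.g. an 8x8 latent grid

# Scaling non-constant logits by a factor above 1 sharpens attention and lowers entropy.
row = [0.0, 1.0, 2.0, 3.0]
sharpened = attention_entropy(row, scale=2.0)
baseline = attention_entropy(row, scale=1.0)
```

A resolution-dependent choice of `scale` is one way to keep the entropy near the value seen at the training resolution, which is the general idea the abstract describes.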
# コンピュータビジョンにおけるAGIに向けて: GPTと大規模言語モデルから学ぶ Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models ( http://arxiv.org/abs/2306.08641v1 ) ライセンス: Link先を確認	Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Kaifeng Bi, Xiaotao Gu, Jianlong Chang, Qi Tian	(参考訳) AIコミュニティは、どんな現実世界の問題にも適用される人工知能(AGI)と呼ばれるアルゴリズムを追求してきた。近年,大規模言語モデル(LLM)を利用したチャットシステムが出現し,自然言語処理(NLP)におけるAGIの実現に向けて急速に進んでいるが,コンピュータビジョン(CV)におけるAGIへの道のりはいまだ不明である。ディレンマは、視覚信号が言語信号よりも複雑であることに起因するかも知れませんが、具体的な理由の発見や、gptやllmからの経験を吸収して問題を解決することに関心があります。本稿では、AGIの概念定義から始め、NLPがチャットシステムを介して広範囲のタスクをどのように解決するかを簡単にレビューする。この分析は、統合がCVの次の重要な目標であることを示している。しかし、この方向への様々な取り組みにもかかわらず、CVは、すべてのタスクを自然に統合するGPTのようなシステムからはまだ遠い。 CVの本質的な弱点は、環境から学ぶためのパラダイムが欠如していることが指摘されているが、NLPはテキストの世界においてその課題を達成している。次に、CVアルゴリズム(つまりエージェント)を世界規模で対話可能な環境に配置し、その動作に関する将来のフレームを予測するために事前訓練し、様々なタスクをこなすための命令で微調整するパイプラインを想像する。私たちは、このアイデアを前進させ、それをスケールアップするために、かなりの研究とエンジニアリングの努力を期待しています。 The AI community has been pursuing algorithms known as artificial general intelligence (AGI) that apply to any kind of real-world problem. Recently, chat systems powered by large language models (LLMs) emerge and rapidly become a promising direction to achieve AGI in natural language processing (NLP), but the path towards AGI in computer vision (CV) remains unclear. One may owe the dilemma to the fact that visual signals are more complex than language signals, yet we are interested in finding concrete reasons, as well as absorbing experiences from GPT and LLMs to solve the problem. In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system. The analysis inspires us that unification is the next important goal of CV. But, despite various efforts in this direction, CV is still far from a system like GPT that naturally integrates all tasks. We point out that the essential weakness of CV lies in lacking a paradigm to learn from environments, yet NLP has accomplished the task in the text world. 
We then imagine a pipeline that puts a CV algorithm (i.e., an agent) in world-scale, interactable environments, pre-trains it to predict future frames with respect to its action, and then fine-tunes it with instruction to accomplish various tasks. We expect substantial research and engineering efforts to push the idea forward and scale it up, for which we share our perspectives on future research directions.	翻訳日:2023-06-16 17:59:33 公開日:2023-06-14
# AssistGPT:計画、実行、検査、学習が可能な汎用マルチモーダルアシスタント AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ( http://arxiv.org/abs/2306.08640v1 ) ライセンス: Link先を確認	Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou	(参考訳) 近年のLarge Language Models (LLMs) の研究は、一般のNLPAIアシスタントに顕著な進歩をもたらした。いくつかの研究は、より一般的なマルチモーダルユーザクエリに対処するために、モデルやapiの計画と呼び出しにllmの使用をさらに検討している。この進歩にもかかわらず、視覚タスクの多様な性質のため、複雑な視覚ベースのタスクは依然として困難である。この多様性は2つの側面に反映されます 1)経路の推論。多くの実生活アプリケーションでは、クエリ自体を調べるだけでクエリを正確に分解することは困難である。特定の視覚内容と各ステップの結果に基づいた計画が通常必要である。 2)柔軟な入力と中間結果。入力フォームは、野生のケースでは柔軟で、単一の画像やビデオだけでなく、ビデオや画像の混合物(たとえば、ユーザービュー画像といくつかの参照ビデオ)も含む。さらに、複雑な推論プロセスは、ビデオナレーションやセグメント化されたビデオクリップなど、さまざまなマルチモーダル中間結果を生成する。このような一般的なケースに対処するため,我々は,plan,execute,inspect,learning(peil)と呼ばれるインターリーブされたコードと言語推論アプローチを備えたマルチモーダルaiアシスタントである assistgpt を提案する。具体的には、Plannerは自然言語を使ってExecutorのどのツールが次にすべきかを、現在の推論の進捗に基づいて計画することができる。インスペクタは、プランナーが特定のツールに適切な視覚情報を供給するのを補助する効率的なメモリマネージャである。最後に、推論プロセス全体が複雑で柔軟であるため、学習者はモデルが最適な解を自律的に探索し発見できるように設計されている。我々は, A-OKVQA と NExT-QA のベンチマーク実験を行った。さらに,本システムでは,ベンチマークよりもはるかに複雑な質問を処理可能であることを示す。 Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected in two aspects: 1) Reasoning paths. For many real-life applications, it is hard to accurately decompose a query simply by examining the query itself. Planning based on the specific visual content and the results of each step is usually required. 2) Flexible inputs and intermediate results. Input forms could be flexible for in-the-wild cases, and involves not only a single image or video but a mixture of videos and images, e.g., a user-view image with some reference videos. 
Besides, a complex reasoning process will also generate diverse multimodal intermediate results, e.g., video narrations, segmented video clips, etc. To address such general cases, we propose a multi-modal AI assistant, AssistGPT, with an interleaved code and language reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. Specifically, the Planner is capable of using natural language to plan which tool in Executor should do next based on the current reasoning progress. Inspector is an efficient memory manager to assist the Planner to feed proper visual information into a specific tool. Finally, since the entire reasoning process is complex and flexible, a Learner is designed to enable the model to autonomously explore and discover the optimal solution. We conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving state-of-the-art results. Moreover, showcases demonstrate the ability of our system to handle questions far more complex than those found in the benchmarks.	翻訳日:2023-06-16 17:59:07 公開日:2023-06-14
# TAPIR: フレーム単位の初期化と時間的リファインメントによる任意のポイントの追跡 TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement ( http://arxiv.org/abs/2306.08637v1 ) ライセンス: Link先を確認	Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman	(参考訳) 本稿では,ビデオシーケンスを通して任意の物理面上の問合せ点を効果的に追跡する,TAP(Tracking Any Point)の新しいモデルを提案する。提案手法では,(1)他の各フレームのクエリ点に対する適切な候補点マッチングを独立に求めるマッチングステージ,(2)局所相関に基づいて軌跡と問合せ特徴の両方を更新するリファインメントステージの2つのステージを用いる。結果として得られたモデルは、DAVISにおける平均約20%の絶対平均ジャカード(AJ)改善によって示されるように、TAP-Vidベンチマークにおける大きなマージンで、すべてのベースライン手法を上回ります。本モデルは,長大かつ高精細な映像系列の高速推論を容易にする。現代のGPUでは、我々の実装はリアルタイムよりも高速にポイントを追跡する能力を持っている。視覚化、ソースコード、事前訓練されたモデルは、プロジェクトのWebページにある。 We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time. Visualizations, source code, and pretrained models can be found on our project webpage.	翻訳日:2023-06-16 17:58:36 公開日:2023-06-14
# 移動平均と回帰モデルによる無線チャネルの品質予測 Predicting Wireless Channel Quality by means of Moving Averages and Regression Models ( http://arxiv.org/abs/2306.08634v1 ) ライセンス: Link先を確認	Gabriele Formis, Stefano Scanzio, Gianluca Cena, Adriano Valenzano	(参考訳) メディアアクセス制御層から見た無線チャネルの将来の品質を確実に予測する能力は、有線に依存しない将来の産業ネットワークの性能を向上させるための鍵となる。チャネルの振る舞いがどれほど変化するかを事前に知ることで、最適なチャネルを適応的に選択する手順を高速化でき、ネットワークの決定性と信頼性を高め、消費エネルギーを減らし、同時にデバイスのローミング能力も向上できる可能性がある。この目的のために、実際のWi-Fi環境から取得したデータに基づいて、複数の主要業績評価指標を用いて、移動平均と回帰に基づく一般的なアプローチを比較した。さらに, 予測誤差を更に低減するために, 異なる手法による結果の線形結合に基づく簡易な手法を提示して解析し, 達成可能な誤差の下界についても考察した。最良のモデルは指数移動平均であり,平均誤差2.10%でフレーム配信率を予測できると同時に,計算の複雑さやメモリ消費が分析した他のモデルよりも低いことがわかった。 The ability to reliably predict the future quality of a wireless channel, as seen by the media access control layer, is a key enabler to improve performance of future industrial networks that do not rely on wires. Knowing in advance how much channel behavior may change can speed up procedures for adaptively selecting the best channel, making the network more deterministic, reliable, and less energy-hungry, possibly improving device roaming capabilities at the same time. To this aim, popular approaches based on moving averages and regression were compared, using multiple key performance indicators, on data captured from a real Wi-Fi setup. Moreover, a simple technique based on a linear combination of outcomes from different techniques was presented and analyzed, to further reduce the prediction error, and some considerations about lower bounds on achievable errors have been reported. We found that the best model is the exponential moving average, which managed to predict the frame delivery ratio with a 2.10% average error and, at the same time, has lower computational complexity and memory consumption than the other models we analyzed.	翻訳日:2023-06-16 17:58:20 公開日:2023-06-14
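The abstract above reports that an exponential moving average (EMA) was the best predictor of the frame delivery ratio. The Python sketch below shows how such a one-step-ahead EMA predictor and its mean absolute error could be computed. The smoothing factor `alpha`, the seeding of the first estimate, and the function names are assumptions for illustration; the abstract does not give the paper's parameter values or evaluation protocol.

```python
def ema_predictions(samples, alpha=0.1):
    """Return one-step-ahead EMA predictions for a sequence of measurements.

    predictions[i] is the estimate available before samples[i] is observed.
    The first estimate is seeded with the first sample (an assumption).
    """
    if not samples:
        return []
    estimate = samples[0]
    predictions = []
    for x in samples:
        predictions.append(estimate)
        # Constant-time, constant-memory update: only the previous estimate is stored.
        estimate = alpha * x + (1.0 - alpha) * estimate
    return predictions


def mean_absolute_error(samples, predictions):
    """Average absolute gap between measured and predicted values."""
    return sum(abs(s - p) for s, p in zip(samples, predictions)) / len(samples)
```

Because the update keeps a single running value, it matches the abstract's remark that the EMA has low computational complexity and memory consumption compared with regression models.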
# RGBDデータからシーンレベルインプット3Dを予測する学習 Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data ( http://arxiv.org/abs/2306.08671v1 ) ライセンス: Link先を確認	Nilesh Kulkarni, Linyi Jin, Justin Johnson, David F. Fouhey	(参考訳) 本稿では,RGBDデータから3次元再構成のためのシーンレベルの暗黙関数を学習する手法を提案する。テスト時には,これまで見えなかったRGB画像を,暗黙の関数によるシーンの3次元再構成にマッピングする。 3次元再構成のための暗黙の関数はメッシュに結びついていることが多いが,RGBD画像のみを用いてトレーニングできることを示す。この設定は、3Dリコンストラクションが加速度計+RGBDの海を解き放つのに役立つかもしれない。当社のシステムであるD2-DRDFは,メッシュ監視を用いた現在の手法に適合し,時には優れ,スパースデータの堅牢性も向上する。 We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data. At test time, our system maps a previously unseen RGB image to a 3D reconstruction of a scene via implicit functions. While implicit functions for 3D reconstruction have often been tied to meshes, we show that we can train one using only a set of posed RGBD images. This setting may help 3D reconstruction unlock the sea of accelerometer+RGBD data that is coming with new phones. Our system, D2-DRDF, can match and sometimes outperform current methods that use mesh supervision and shows better robustness to sparse data.	翻訳日:2023-06-16 17:52:53 公開日:2023-06-14
# Gossipモデルにおける分散学習ダイナミクス Decentralized Learning Dynamics in the Gossip Model ( http://arxiv.org/abs/2306.08670v1 ) ライセンス: Link先を確認	John Lazarsfeld, Dan Alistarh	(参考訳) 我々は,ゴシップモデルにおけるメモリ制限ノード数$n$の分散マルチアームバンディットについて検討し,各ラウンドにおいて各ノードが$m$のアームの1つを局所的に採用し,アームの分布から引き出された報酬を観測し,次にランダムにサンプリングされた隣人と通信し,次のラウンドでその方針を決定する。各ノードの決定は完全にローカルであり、最近取得した報酬とサンプルした隣接ノードのみに依存する。我々は,これらの分散ダイナミクスのグローバル進化と,ある種の「ゼロサム」乗算重み更新アルゴリズムとの関係を示し,これらの自然プロトコルの集団レベルの後悔を分析するための汎用フレームワークを開発した。この枠組みを用いて、固定的な報酬設定(各腕の分布の平均が時間とともに固定される)と敵対的な報酬設定(時間とともに変化しうる手段)について、幅広いパラメータ規則(すなわち、人口と武器の数)の下でサブ線形後悔境界を導出する。さらに,これらのプロトコルは,確率的勾配 oracle から報酬分布が生成される場合に,simplex 上の凸関数を近似的に最適化できることを示した。 We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are decentralized: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics with a certain class of "zero-sum" multiplicative weight update algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of the population and number of arms) for both the stationary reward setting (where the mean of each arm's distribution is fixed over time) and the adversarial reward setting (where means can vary over time). 
Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.	翻訳日:2023-06-16 17:52:40 公開日:2023-06-14
# 量子ドットデバイスを用いた通信用Cバンドにおける識別不能光子のオンデマンド生成 On-demand Generation of Indistinguishable Photons in the Telecom C-Band using Quantum Dot Devices ( http://arxiv.org/abs/2306.08668v1 ) ライセンス: Link先を確認	Daniel A. Vajner, Paweł Holewa, Emilia Zięba-Ostój, Maja Wasiluk, Martin von Helversen, Aurimas Sakanas, Alexander Huck, Kresten Yvind, Niels Gregersen, Anna Musiał, Marcin Syperek, Elizaveta Semenova, Tobias Heindel	(参考訳) 半導体量子ドット(QD)は、量子情報や量子通信への応用のために、単一光子およびもつれ光子を生成することができる。 780 nmから950 nmのスペクトル範囲で発光するQDは、理想に近い単一光子純度と識別不能性を備えるが、この波長域では光損失が大きいため、光ファイバーネットワークへの応用には最適ではない。ここで好ましい選択は、1550 nm付近の最低損失スペクトルウィンドウ(通信用Cバンド)で動作するQDである。本研究では,InAs/InP QDメサ構造をシリコンウェハ上の金属リフレクタと異種集積した単一QDデバイスから,通信用Cバンドにおける識別不能な光子のコヒーレントなオンデマンド生成を実証する。二励起子-励起子放射カスケードのパルス2光子共鳴励起を用いて、4$\pi$のパルス面積までのラビ回転と、励起子光子および二励起子光子についてそれぞれg$^{(2)}$(0)=0.005(1)および0.015(1)という高い単一光子純度を観測する。 Hong-Ou-Mandel型実験では, 平行偏光と直交偏光の同時計数を比較することにより, 最大35(3)%の2光子干渉の可視度(visibility)を得る。これは、波長変換なしで直接通信用Cバンドに放出される単一光子の識別不能性における著しい進歩を示す。 Semiconductor quantum dots (QDs) enable the generation of single and entangled photons for applications in quantum information and quantum communication. While QDs emitting in the 780 nm to 950 nm spectral range feature close-to-ideal single-photon purities and indistinguishabilities, they are not the best choice for applications in fiber-optical networks, due to the high optical losses in this wavelength regime. The preferable choice here are QDs operating in the lowest-loss spectral window around 1550 nm (telecom C-band). In this work, we demonstrate the coherent on-demand generation of indistinguishable photons in the telecom C-band from single QD devices consisting of InAs/InP QD-mesa structures heterogeneously integrated with a metallic reflector on a silicon wafer. Using pulsed two-photon resonant excitation of the biexciton-exciton radiative cascade, we observe Rabi rotations up to pulse areas of 4$\pi$ and a high single-photon purity in terms of g$^{(2)}$(0)=0.005(1) and 0.015(1) for exciton and biexciton photons, respectively. 
We obtain two-photon interference visibilities of up to 35(3)% in Hong-Ou-Mandel-type experiments by comparing co- and cross-polarized coincidences. This represents a significant advancement in the photon-indistinguishability of single photons emitted directly in the telecom C-band without wavelength conversion.	翻訳日:2023-06-16 17:52:13 公開日:2023-06-14
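The abstract above obtains the two-photon interference visibility by comparing co- and cross-polarized coincidences in a Hong-Ou-Mandel-type experiment. A commonly used estimator for that comparison is V = 1 - C_co / C_cross, where C_co and C_cross are the central-peak coincidence counts in the two polarization settings. The Python sketch below implements this estimator. The function name and the example counts are hypothetical, and the authors' exact analysis (background correction, peak integration windows) is not given in the abstract.

```python
def hom_visibility(co_coincidences, cross_coincidences):
    """Estimate Hong-Ou-Mandel visibility from central-peak coincidence counts.

    co_coincidences    : coincidences with co-polarized (interfering) photons
    cross_coincidences : coincidences with cross-polarized (distinguishable) photons
    """
    if cross_coincidences <= 0:
        raise ValueError("cross-polarized coincidence count must be positive")
    # Perfectly indistinguishable photons give no co-polarized coincidences (V = 1);
    # fully distinguishable photons give equal counts in both settings (V = 0).
    return 1.0 - co_coincidences / cross_coincidences


# Hypothetical counts chosen only to illustrate a visibility of 35 percent.
example_visibility = hom_visibility(65, 100)
```

The reference measurement with cross-polarized photons normalizes out source brightness and detection efficiency, which is why the estimator only needs the ratio of the two counts.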
# 効率的な自己注意をいつ使うのか? テキスト・音声・画像変換器バリアントのプロファイリング When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants ( http://arxiv.org/abs/2306.08667v1 ) ライセンス: Link先を確認	Anuj Diwan, Eunsol Choi, David Harwath	(参考訳) 本稿では,テキスト,音声,視覚にまたがる自己注意(self-attention)ベースのTransformer変種の効率に関する最初の統一的研究を行う。我々は、様々な効率指標(レイテンシ、スループット、メモリ)を用いて、効率的なTransformer変種がバニラモデルよりも効率的になる入力長の閾値(tipping point)を同定する。音声についてこの分析を行うために,自己教師付き音声モデルの新しい局所注意型変種であるL-HuBERTを導入する。これらの閾値は (a) 典型的なデータセットのシーケンス長よりもはるかに高く、 (b) 指標とモダリティに依存することが観察され、正しいモデルの選択はモダリティ、タスクタイプ(ロングフォームか典型的なコンテキストか)、リソース制約(時間かメモリか)に依存することを示している。また, Transformerの構成要素ごとの計算コストの内訳を可視化することにより, 自己注意以外の構成要素も大きな計算コストを持つことを示す。私たちはプロファイリングツールキットを https://github.com/ajd12342/profiling-transformers でリリースします。 We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision. We identify input length thresholds (tipping points) at which efficient Transformer variants become more efficient than vanilla models, using a variety of efficiency metrics (latency, throughput, and memory). To conduct this analysis for speech, we introduce L-HuBERT, a novel local-attention variant of a self-supervised speech model. We observe that these thresholds are (a) much higher than typical dataset sequence lengths and (b) dependent on the metric and modality, showing that choosing the right model depends on modality, task type (long-form vs. typical context) and resource constraints (time vs. memory). By visualising the breakdown of the computational costs for transformer components, we also show that non-self-attention components exhibit significant computational costs. We release our profiling toolkit at https://github.com/ajd12342/profiling-transformers.	翻訳日:2023-06-16 17:51:38 公開日:2023-06-14
# Radiology-GPT: 放射線学のための大規模言語モデル Radiology-GPT: A Large Language Model for Radiology ( http://arxiv.org/abs/2306.08666v1 ) ライセンス: Link先を確認	Zhengliang Liu, Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim, Haixing Dai, Lin Zhao, Dajiang Zhu, Jun Liu, Wei Liu, Dinggang Shen, Xiang Li, Quanzheng Li, Tianming Liu	(参考訳) 放射線学のための大規模言語モデルであるRadiology-GPTを紹介する。放射線学領域知識の広範なデータセットに対するインストラクションチューニング手法を用いることで、Radiology-GPTは、StableLM、Dolly、LLaMAといった一般的な言語モデルと比較して優れた性能を示す。放射線診断、研究、コミュニケーションにおいて高い汎用性を示す。この研究は、臨床NLPの今後の発展の触媒となる。 Radiology-GPTの実装が成功したことは、HIPAAのようなプライバシ標準の遵守を確保しつつ、個々の医療専門分野に特化した生成的大規模言語モデルをローカライズできる可能性を示している。様々な病院の固有のニーズに合わせて個別化された大規模言語モデルを開発するという見通しは、有望な方向性を示している。これらのモデルにおける会話能力とドメイン固有の知識の融合は、医療AIにおける将来の発展を促進すると期待される。 Radiology-GPTのデモは https://huggingface.co/spaces/allen-eric/radiology-gpt で見ることができる。 We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.	翻訳日:2023-06-16 17:51:16 公開日:2023-06-14
# 一般統計モデルに対するZiv-Zakai型誤差境界 Ziv-Zakai-type error bounds for general statistical models ( http://arxiv.org/abs/2306.08660v1 ) ライセンス: Link先を確認	Mankei Tsang	(参考訳) パラメータ空間 $\Theta$ が一般であり、$\beta(\theta)$ が$\theta$ の線型函数でなくてもよいとき、パラメータ $\beta:\Theta \to \mathbb R$ を推定するためのベイズ誤差上の Ziv-Zakai 型下界を提案する。 I propose Ziv-Zakai-type lower bounds on the Bayesian error for estimating a parameter $\beta:\Theta \to \mathbb R$ when the parameter space $\Theta$ is general and $\beta(\theta)$ need not be a linear function of $\theta$.	翻訳日:2023-06-16 17:50:59 公開日:2023-06-14
# 3Dポイントクラウド理解のためのインコンテキスト学習の探索 Explore In-Context Learning for 3D Point Cloud Understanding ( http://arxiv.org/abs/2306.08659v1 ) ライセンス: Link先を確認	Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu	(参考訳) 広範囲なデータに基づいて訓練された大規模モデルの台頭により、自然言語処理やコンピュータビジョンタスクにおいて大きな可能性を示す新たな学習パラダイムとなった。一方、インコンテキスト学習は、3d point cloudドメインではまだほとんど未調査である。マスク付きモデリングは、2Dビジョンにおけるコンテキスト内学習に成功しているが、それを3Dポイントクラウドに直接拡張することは、依然として困難な課題である。点雲の場合、トークンそのものは、推論中にマスクされる点雲の位置(座標)である。さらに、前作における位置埋め込みは、不注意に情報漏洩をもたらす可能性がある。このような課題に対処するために,我々は,特に3d ポイントクラウドにおけるインコンテキスト学習用に設計された point-in-context という新しいフレームワークを導入する。さらに,一般点サンプリング演算子と協調して動作するよう慎重に設計したジョイントサンプリングモジュールを提案し,上記の技術的課題を効果的に解決する。提案手法の汎用性と適応性を検証するため,幅広いタスクを扱うための広範囲な実験を行った。さらに、より効果的なプロンプト選択戦略により、我々のフレームワークは個別に訓練されたモデルの結果を上回る。 With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. 
We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. Furthermore, with a more effective prompt selection strategy, our framework surpasses the results of individually trained models.	翻訳日:2023-06-16 17:50:50 公開日:2023-06-14
# Babel-ImageNet:視覚・言語表現の大規模多言語評価 Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations ( http://arxiv.org/abs/2306.08658v1 ) ライセンス: Link先を確認	Gregor Geigle, Radu Timofte, Goran Glavaš	(参考訳) モダリティごとに別々のエンコーダを持つ視覚と言語(VL)モデル(例えばCLIP)は、ゼロショット画像分類と画像テキスト検索の定番モデルになっている。しかし、これらのモデルの評価の大部分は英語のテキストのみで行われている。言語固有の画像キャプションデータセットの作成コストが高いため、多言語VLベンチマークは少数の高リソース言語に限られている。本研究では,1000のImageNetラベルの92言語への(部分的な)翻訳を提供する大規模多言語ベンチマークであるBabel-ImageNetを紹介する。これは機械翻訳(MT)や手動アノテーションを使わずに構築されている。代わりに、共有のWordNetシンセットを介して、ImageNet概念を巨大な多言語語彙意味ネットワークであるBabelNetにリンクすることで、信頼できる翻訳を自動的に取得する。 92のBabel-ImageNet言語のそれぞれについて,公開されている8種類の多言語CLIPモデルをゼロショット画像分類(ZS-IC)で評価し,英語のImageNetの性能と高リソース言語(ドイツ語や中国語など)の性能との間に大きなギャップがあり,低リソース言語(シンハラ語やラオ語など)ではギャップがさらに大きいことを示した。 重要なのは、Babel-ImageNetにおけるモデルのZS-IC性能が画像テキスト検索の性能と高い相関を示すことであり、ゴールドの画像テキストデータを持たない大多数の言語において、多言語VL表現空間の品質を推定するのにBabel-ImageNetが適していることが検証された。最後に、低リソース言語に対する多言語CLIPの性能は、安価でパラメータ効率の良い言語特化学習によって劇的に改善できることを示す。コードとデータを公開します。 https://github.com/gregor-ge/Babel-ImageNet Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. The bulk of the evaluation of these models is, however, performed with English text only: the costly creation of language-specific image-caption datasets has limited multilingual VL benchmarks to a handful of high-resource languages. In this work, we introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of 1000 ImageNet labels to 92 languages, built without resorting to machine translation (MT) or requiring manual annotation. We instead automatically obtain reliable translations of ImageNet concepts by linking them -- via shared WordNet synsets -- to BabelNet, a massively multilingual lexico-semantic network. 
We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e.g., German or Chinese), and an even bigger gap for low-resource languages (e.g., Sinhala or Lao). Crucially, we show that the models' ZS-IC performance on Babel-ImageNet highly correlates with their performance in image-text retrieval, validating that Babel-ImageNet is suitable for estimating the quality of the multilingual VL representation spaces for the vast majority of languages that lack gold image-text data. Finally, we show that the performance of multilingual CLIP for low-resource languages can be drastically improved via cheap, parameter-efficient language-specific training. We make our code and data publicly available: https://github.com/gregor-ge/Babel-ImageNet	翻訳日:2023-06-16 17:50:31 公開日:2023-06-14
# EMERSK -- 状況知識を用いた説明可能なマルチモーダル感情認識 EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge ( http://arxiv.org/abs/2306.08657v1 ) ライセンス: Link先を確認	Mijanur Palash, Bharat Bhargava	(参考訳) 近年,ディープラーニングアルゴリズムの普及により,感情の自動認識が注目されている。感情認識における主な課題の1つは、データで利用可能な様々な手がかり(モダリティ)を効果的に活用することである。もう一つの課題は、学習結果の適切な説明を提供することである。これらの課題に対処するために、視覚情報を用いた人間の感情認識と説明のための汎用的でモジュール式のシステムEMERSK(Explainable Multimodal Emotion Recognition with Situational Knowledge)を提案する。本システムは, 表情, 姿勢, 歩行などの複数のモダリティを柔軟かつモジュラーな方法で処理することができる。ネットワークは、利用可能なデータに応じて追加または削除できるさまざまなモジュールで構成されている。畳み込みニューラルネットワーク(CNN)とエンコーダ-デコーダスタイルの注意機構を備えた2ストリームネットワークアーキテクチャを用いて,顔画像から深層特徴を抽出する。同様に、CNNと、長短期記憶(LSTM)を備えたリカレントニューラルネットワーク(RNN)を用いて、姿勢や歩行データから特徴を抽出する。また、背景からの深層特徴を学習プロセスのコンテキスト情報として取り入れている。各モジュールの深層特徴は、早期融合(early fusion)ネットワークを使って融合される。さらに,シーンから抽出した場所の種類と形容詞・名詞ペア(ANP),感情の時空間的平均分布から得られた状況知識を活用し,説明を生成する。アブレーション研究は、各サブネットワークが独立して感情認識を行えること、それらをマルチモーダルアプローチで組み合わせることで全体的な認識性能が著しく向上することを示した。 GroupWalkを含む様々なベンチマークデータセットで実施された大規模な実験は、他の最先端手法と比較して、我々のアプローチの優れた性能を検証する。 Automatic emotion recognition has recently gained significant attention due to the growing popularity of deep learning algorithms. One of the primary challenges in emotion recognition is effectively utilizing the various cues (modalities) available in the data. Another challenge is providing a proper explanation of the outcome of the learning. To address these challenges, we present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK), a generalized and modular system for human emotion recognition and explanation using visual information. Our system can handle multiple modalities, including facial expressions, posture, and gait, in a flexible and modular manner. The network consists of different modules that can be added or removed depending on the available data. 
We utilize a two-stream network architecture with convolutional neural networks (CNNs) and encoder-decoder style attention mechanisms to extract deep features from face images. Similarly, CNNs and recurrent neural networks (RNNs) with Long Short-term Memory (LSTM) are employed to extract features from posture and gait data. We also incorporate deep features from the background as contextual information for the learning process. The deep features from each module are fused using an early fusion network. Furthermore, we leverage situational knowledge derived from the location type and adjective-noun pair (ANP) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate explanations. Ablation studies demonstrate that each sub-network can independently perform emotion recognition, and combining them in a multimodal approach significantly improves overall recognition performance. Extensive experiments conducted on various benchmark datasets, including GroupWalk, validate the superior performance of our approach compared to other state-of-the-art methods.	翻訳日:2023-06-16 17:49:56 公開日:2023-06-14
# Augment then Smooth: 差分プライバシーと認証されたロバスト性の両立 Augment then Smooth: Reconciling Differential Privacy with Certified Robustness ( http://arxiv.org/abs/2306.08656v1 ) ライセンス: Link先を確認	Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot	(参考訳) 機械学習モデルは、デプロイに対する信頼を損なう可能性のあるさまざまな攻撃に影響を受けやすい。これらの脅威には、トレーニングデータのプライバシーに対する攻撃や、モデルの精度を脅かす敵対的サンプルが含まれる。差分プライバシーとランダム化平滑化は、これらの脅威のそれぞれに対して証明可能な保証を提供する効果的な防御であるが、一方の防御の実装が他方にどのように影響するかはよく分かっていない。本研究では,プライバシー保証と認証された堅牢性の両方を同時に達成できることを論じる。我々は,ランダム化平滑化による認定ロバストネスを差分プライベートモデルトレーニングに統合するDP-CERTと呼ばれるフレームワークを提供する。例えば、CIFAR10上の差分プライベートな確率的勾配降下法と比較して、DP-CERTは精度の1.2%の低下と引き換えに、認定精度を12倍、平均認定半径を10倍に向上させる。サンプルごとの詳細な指標分析により, 認定半径は局所リプシッツ定数と損失面の滑らかさに相関することを示した。これにより、プライベートモデルがロバストにならない場合を診断する新たな方法が提供される。 Machine learning models are susceptible to a variety of attacks that can erode trust in their deployment. These threats include attacks against the privacy of training data and adversarial examples that jeopardize model accuracy. Differential privacy and randomized smoothing are effective defenses that provide certifiable guarantees for each of these threats; however, it is not well understood how implementing either defense impacts the other. In this work, we argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously. We provide a framework called DP-CERT for integrating certified robustness through randomized smoothing into differentially private model training. For instance, compared to differentially private stochastic gradient descent on CIFAR10, DP-CERT leads to a 12-fold increase in certified accuracy and a 10-fold increase in the average certified radius at the expense of a drop in accuracy of 1.2%. Through in-depth per-sample metric analysis, we show that the certified radius correlates with the local Lipschitz constant and smoothness of the loss surface. This provides a new way to diagnose when private models will fail to be robust.	翻訳日:2023-06-16 17:49:29 公開日:2023-06-14
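The "certified radius" mentioned in the abstract above can be illustrated with the standard randomized-smoothing bound for Gaussian noise. This is a minimal sketch of that generic bound only, not of DP-CERT, whose training procedure the abstract does not specify. The function name `certified_radius` and its parameters are hypothetical, and the lower bound on the top-class probability is assumed to have been estimated elsewhere (for example by Monte Carlo sampling).

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """L2 radius certified by Gaussian randomized smoothing.

    p_a_lower: lower confidence bound on the probability that the base
               classifier predicts the top class under Gaussian noise.
    sigma:     standard deviation of the Gaussian noise (must be > 0).

    Uses the common simplification p_B <= 1 - p_A, which gives
    R = sigma * Phi^{-1}(p_A). Returns 0.0 (no certificate) when the
    bound does not exceed 1/2.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return 0.0
    # Phi^{-1} is the inverse CDF of the standard normal distribution.
    return sigma * NormalDist().inv_cdf(p_a_lower)


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9, 0.99):
        print(p, certified_radius(p, sigma=0.5))
```

A larger noise level or a more confident smoothed prediction both enlarge the radius, which is why per-sample radii can serve as a diagnostic for robustness.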
# 国家間における市民不安の相転移と時間的変化 Phase Transitions of Civil Unrest across Countries and Time ( http://arxiv.org/abs/2306.08698v1 ) ライセンス: Link先を確認	Dan Braha	(参考訳) 組織のマクロなパターン間の急激なシフトを特徴とする相転移は、複雑なシステムにおいてユビキタスである。物理科学や自然科学の研究は多いが、社会システムにおけるこの現象の実証的研究は比較的未発達である。本研究の目的は,集団的市民不安のダイナミクスが,再帰的位相シフトの系列として,各フェーズが測定可能かつ識別可能な潜在性を有することを明らかにすることにある。 1946年から2017年までの170か国における市民不安の総合データセットを用いて,市民不安のマクロレベルの統計モデルを導入し,その可能性を評価する。以上の結果から,マクロレベルの位相モデルは,世界各国の市民不安データの特徴を効果的に捉え,普遍的なメカニズムは市民不安のダイナミクスの特定の側面を裏付ける可能性がある。また,国家の時間単位当たりの長期的不安を定量化する新たな尺度を導入し,特定の地域に集中して,市民的不安が地理的に集結する傾向があることを示す。我々のアプローチは、市民の不安を超えた様々な集団の人間の現象の相転移を特定し測定する可能性があり、複雑な社会システムに対するより良い理解に寄与する。 Phase transitions, characterized by abrupt shifts between macroscopic patterns of organization, are ubiquitous in complex systems. Despite considerable research in the physical and natural sciences, the empirical study of this phenomenon in societal systems is relatively underdeveloped. The goal of this study is to explore whether the dynamics of collective civil unrest can be plausibly characterized as a sequence of recurrent phase shifts, with each phase having measurable and identifiable latent characteristics. We introduce a macro-level statistical model of civil unrest and evaluate its plausibility using a comprehensive dataset of civil unrest events in 170 countries from 1946 to 2017. Our findings demonstrate that the macro-level phase model effectively captures the characteristics of civil unrest data from diverse countries globally and that universal mechanisms may underlie certain aspects of the dynamics of civil unrest. We also introduce a new scale to quantify a country's long-term unrest per unit of time and show that civil unrest events tend to cluster geographically, with the magnitude of civil unrest concentrated in specific regions. 
Our approach has the potential to identify and measure phase transitions in various collective human phenomena beyond civil unrest, contributing to a better understanding of complex social systems.	翻訳日:2023-06-16 17:42:34 公開日:2023-06-14
# GHP-MOFassemble:拡散モデリング、高スループットスクリーニング、分子動力学による炭素捕獲のための新規金属-有機化合物の合理的発見 GHP-MOFassemble: Diffusion modeling, high throughput screening, and molecular dynamics for rational discovery of novel metal-organic frameworks for carbon capture at scale ( http://arxiv.org/abs/2306.08695v1 ) ライセンス: Link先を確認	Hyun Park, Xiaoli Yan, Ruijie Zhu, E. A. Huerta, Santanu Chaudhuri, Donny Cooper, Ian Foster, Emad Tajkhorshid	(参考訳) ghp-mofassembleは生成型人工知能(ai)であり、高いco2能力と合成可能なリンカを備えた金属-有機フレームワーク(mofs)の合理的設計を加速する高性能フレームワークである。我々のフレームワークは,3つの事前選択ノードのうちの1つで組み立てられた新しいリンカを,プリミティブな立方体(pcu)トポロジーでMOFに生成するために,拡散モデルと生成AIのクラスを組み合わせる。これらのAI生成MOFのCO2容量は、結晶グラフ畳み込みニューラルネットワークモデルの修正版を用いて予測される。次に、LAMMPS符号を用いて分子動力学シミュレーションを行い、AI生成したMOF構造を緩和し、安定な構造に収束する構造を特定し、シミュレーションを通して多孔質性を維持する。 GHP-MOFassembleフレームワークによって生成された12万のpcu MOF候補のうち、合計102の分子動力学シミュレーションが1バーで完了し、1バーで2 mmol/g以上のCO2容量が0.1バーで予測され、MOFX-DBデータベースの仮説MOF(hMOF)データセットにおけるhMOFの上位5%に相当する。これらの候補のうち、18は分子動力学シミュレーションにおいて1%未満の密度変化を示し、安定性を示している。また、上位5つのGHP-MOFassembleのMOF構造は、96.9%のhMOF構造よりもCO2容量が高いことがわかった。この新しいアプローチは、生成型ai、グラフモデリング、大規模分子動力学シミュレーション、極端な計算を組み合わせて、新しいmof構造を大規模に発見するための新しい経路を開く。 We introduce GHP-MOFassemble, a Generative artificial intelligence (AI), High Performance framework to accelerate the rational design of metal-organic frameworks (MOFs) with high CO2 capacity and synthesizable linkers. Our framework combines a diffusion model, a class of generative AI, to generate novel linkers that are assembled with one of three pre-selected nodes into MOFs in a primitive cubic (pcu) topology. The CO2 capacities of these AI-generated MOFs are predicted using a modified version of the crystal graph convolutional neural network model. We then use the LAMMPS code to perform molecular dynamics simulations to relax the AI-generated MOF structures, and identify those that converge to stable structures, and maintain their porous properties throughout the simulations. 
Among 120,000 pcu MOF candidates generated by the GHP-MOFassemble framework, with three distinct metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer), a total of 102 structures completed molecular dynamics simulations at 1 bar with predicted CO2 capacity higher than 2 mmol/g at 0.1 bar, which corresponds to the top 5% of hMOFs in the hypothetical MOF (hMOF) dataset in the MOFX-DB database. Among these candidates, 18 have change in density lower than 1% during molecular dynamics simulations, indicating their stability. We also found that the top five GHP-MOFassemble's MOF structures have CO2 capacities higher than 96.9% of hMOF structures. This new approach combines generative AI, graph modeling, large-scale molecular dynamics simulations, and extreme scale computing to open up new pathways for the accelerated discovery of novel MOF structures at scale.	翻訳日:2023-06-16 17:42:12 公開日:2023-06-14
# 等化量子回帰への不確実性認識の統合 Integrating Uncertainty Awareness into Conformalized Quantile Regression ( http://arxiv.org/abs/2306.08693v1 ) ライセンス: Link先を確認	Raphael Rossellini, Rina Foygel Barber, Rebecca Willett	(参考訳) Conformalized Quantile Regression (CQR) は、分布的仮定を作らずに、共変量$X$の応答に対して予測間隔を構築する方法である。しかし、実証的に示すように、既存のCQRの構成は、量子回帰器が特徴空間の特定の部分において他の部分よりも優れているという問題に対して効果がない。理由は、CQR の予測間隔が 2 つの不確かさを区別しないからである: まず、$Y$ の条件分布のばらつき(すなわち、アレター的不確実性)と、この条件分布を推定する不確実性(すなわち、疫学的不確実性)である。これは不均一な範囲につながり、疫学的不確実性が低い(または高い)地域では、非常に広い(または過度に狭い)間隔を持つ。そこで本研究では,これら2つの不確実性源を明示的に分離し,特徴空間をまたいで分位レグレッセプタを調整する,不確実性対応型cqr(uacqr)を提案する。 cqrと比較すると,本手法はカバレッジ特性に対する分散フリーな理論保証を享受する一方で,実世界のデータセット上でのシミュレーションによる条件付きカバレッジの強化とインターバルの厳密化を実証した。 Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, as we demonstrate empirically, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction intervals of CQR do not distinguish between two forms of uncertainty: first, the variability of the conditional distribution of $Y$ given $X$ (i.e., aleatoric uncertainty), and second, our uncertainty in estimating this conditional distribution (i.e., epistemic uncertainty). This can lead to uneven coverage, with intervals that are overly wide (or overly narrow) in regions where epistemic uncertainty is low (or high). To address this, we propose a new variant of the CQR methodology, Uncertainty-Aware CQR (UACQR), that explicitly separates these two sources of uncertainty to adjust quantile regressors differentially across the feature space. 
Compared to CQR, our methods enjoy the same distribution-free theoretical guarantees for coverage properties, while demonstrating in our experiments stronger conditional coverage in simulated settings and tighter intervals on a range of real-world data sets.	翻訳日:2023-06-16 17:41:33 公開日:2023-06-14
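The calibration step of standard CQR, which the abstract above takes as its starting point, can be sketched as follows. This is a minimal sketch of plain split-conformal CQR, not of UACQR, whose construction the abstract does not detail. The function names `cqr_offset` and `cqr_interval` are hypothetical, and the lower and upper quantile predictions are assumed to come from any already fitted quantile regressor evaluated on a held-out calibration set.

```python
import math
from typing import Sequence, Tuple


def cqr_offset(y: Sequence[float],
               q_lo: Sequence[float],
               q_hi: Sequence[float],
               alpha: float) -> float:
    """Conformal correction for quantile-regression intervals.

    y, q_lo, q_hi: calibration responses and the predicted lower and
                   upper quantiles at the same calibration points.
    alpha:         target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    n = len(y)
    if n == 0 or not (len(q_lo) == len(q_hi) == n):
        raise ValueError("calibration arrays must be non-empty and equal length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Conformity score: positive when y falls outside [q_lo, q_hi],
    # negative when it falls inside.
    scores = sorted(max(lo - yi, yi - hi) for yi, lo, hi in zip(y, q_lo, q_hi))
    # Finite-sample corrected rank of the (1 - alpha) empirical quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def cqr_interval(lo: float, hi: float, offset: float) -> Tuple[float, float]:
    """Widen (or shrink, if offset < 0) a predicted interval by the offset."""
    return lo - offset, hi + offset


if __name__ == "__main__":
    off = cqr_offset([1.0, 2.0, 3.0, 4.0],
                     [0.5, 1.5, 3.5, 3.0],
                     [1.5, 2.5, 4.5, 3.5],
                     alpha=0.5)
    print(off, cqr_interval(2.0, 3.0, off))
```

Because a single offset is applied everywhere, the interval width is adjusted uniformly across the feature space, which is the behavior the abstract describes as problematic when epistemic uncertainty varies by region.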
# 弱結合限界におけるRydbergアーキテクチャの量子ゲート最適化 Quantum Gate Optimization for Rydberg Architectures in the Weak-Coupling Limit ( http://arxiv.org/abs/2306.08691v1 ) ライセンス: Link先を確認	Nicolas Heimann, Lukas Broers, Nejira Pintul, Tobias Petersen, Koen Sponselee, Alexander Ilin, Christoph Becker, Ludwig Mathey	(参考訳) 我々はRydberg tweezerシステムにおける2ビットゲートの機械学習支援設計を実証する。各原子の2つの低エネルギー超微細構造は論理量子ビットを表し、リドバーグ状態は量子ビット相互作用を誘導する補助状態として作用する。ハイブリッド量子古典最適化器を用いることで、実験的に現実的なパラメータとプロトコル、および現実的な制限のために、高忠実度CNOTゲートを実装する最適なパルス列を生成する。単一量子ビット演算の局所制御は、大きな原子配列で量子計算を行うのに十分であることを示す。我々は,リュドベルク州の強結合・封鎖体制だけでなく,弱結合限界に対しても堅牢な最適化戦略を生成する。したがって, 弱結合限界におけるrydbergに基づく量子情報処理は, 強固かつ最適であり, 現在の技術と相まって望ましい手法であることを示す。 We demonstrate machine learning assisted design of a two-qubit gate in a Rydberg tweezer system. Two low-energy hyperfine states in each of the atoms represent the logical qubit and a Rydberg state acts as an auxiliary state to induce qubit interaction. Utilizing a hybrid quantum-classical optimizer, we generate optimal pulse sequences that implement a CNOT gate with high fidelity, for experimentally realistic parameters and protocols, as well as realistic limitations. We show that local control of single qubit operations is sufficient for performing quantum computation on a large array of atoms. We generate optimized strategies that are robust for both the strong-coupling, blockade regime of the Rydberg states, but also for the weak-coupling limit. Thus, we show that Rydberg-based quantum information processing in the weak-coupling limit is a desirable approach, being robust and optimal, with current technology.	翻訳日:2023-06-16 17:41:05 公開日:2023-06-14
# ICETによる幾何型レーザスキャンマッチングの精度評価 ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching ( http://arxiv.org/abs/2306.08690v1 ) ライセンス: Link先を確認	Matthew McDermott and Jason Rife	(参考訳) Distribution-to-Distribution (D2D)ポイントクラウド登録アルゴリズムは高速で、解釈可能で、非構造化環境ではよく機能する。残念ながら、これらの方法のソリューションエラーを予測する既存の戦略は、特に大規模または拡張された物理オブジェクトを含む領域において、非常に楽観的である。本稿では,第1原理からロバストな精度予測を実現するために,ndtを再構成した新しい3次元lidarスキャンマッチングアルゴリズムである反復的最接近楕円型変換(icet)を提案する。 ndtと同様に、icetはより小さな局所点分布を考慮して複雑なシーンを分析するために、lidarスキャンをvoxelに分割するが、icetはランダムノイズと決定論的構造を区別するためにvoxel分布を評価する。 icetは重み付き最小二乗法を用いて、このノイズ/構造区別を局所化解の計算と解エラー共分散の予測に組み込む。精度予測の合理性を示すために,実世界の自動車データ,高忠実度シミュレーショントラジェクタ,コーナーケースシーンのシミュレーションを含む3つのlidarテストで3d icetを検証した。それぞれのテストで、icetは一貫してサブセンチメートルの精度でスキャンマッチングを行う。この精度のレベルは、アルゴリズムが完全に解釈可能であるという事実と相まって、安全クリティカルな輸送用途に適している。コードはhttps://github.com/mcdermatt/ICETで入手できる。 Distribution-to-Distribution (D2D) point cloud registration algorithms are fast, interpretable, and perform well in unstructured environments. Unfortunately, existing strategies for predicting solution error for these methods are overly optimistic, particularly in regions containing large or extended physical objects. In this paper we introduce the Iterative Closest Ellipsoidal Transform (ICET), a novel 3D LIDAR scan-matching algorithm that re-envisions NDT in order to provide robust accuracy prediction from first principles. Like NDT, ICET subdivides a LIDAR scan into voxels in order to analyze complex scenes by considering many smaller local point distributions, however, ICET assesses the voxel distribution to distinguish random noise from deterministic structure. ICET then uses a weighted least-squares formulation to incorporate this noise/structure distinction into computing a localization solution and predicting the solution-error covariance. 
In order to demonstrate the reasonableness of our accuracy predictions, we verify 3D ICET in three LIDAR tests involving real-world automotive data, high-fidelity simulated trajectories, and simulated corner-case scenes. For each test, ICET consistently performs scan matching with sub-centimeter accuracy. This level of accuracy, combined with the fact that the algorithm is fully interpretable, makes it well suited for safety-critical transportation applications. Code is available at https://github.com/mcdermatt/ICET	翻訳日:2023-06-16 17:40:51 公開日:2023-06-14
# テキスト・画像生成のためのノルム誘導潜時空間探索 Norm-guided latent space exploration for text-to-image generation ( http://arxiv.org/abs/2306.08687v1 ) ライセンス: Link先を確認	Dvir Samuel, Rami Ben-Ari, Nir Darshan, Haggai Maron, Gal Chechik	(参考訳) テキストから画像への拡散モデルは、新しい構成やシナリオにおいて様々な概念を合成する大きな可能性を示している。しかし、その潜在的な種空間はまだよく分かっておらず、新しい希少な概念の生成に影響を及ぼすことが示されている。具体的には、補間やセントロイド探索のような単純な操作は、潜在空間の標準ユークリッド測度や球面測度ではうまく機能しない。本稿では,現行のトレーニング手法が,標準値の狭い入力に対して拡散モデルを偏在させることを観察する。これは、画像生成のシード操作に依存する手法に強く影響し、少数ショットおよび長期学習タスクにさらに適用することができる。この問題に対処するために, 2つの種子間を補間する新しい方法を提案し, 種子に先行するノルムを考慮した新しい非ユークリッド計量を定義することを実証する。我々は,この計量を近似する単純かつ効率的なアルゴリズムを記述し,それを用いて潜在種空間におけるセントロイドをさらに定義する。我々は,新たな補間・遠心評価手法により,レアコンセプト画像の生成が著しく向上することを示す。これにより、少数ショットとロングテールのベンチマークにおける最先端のパフォーマンスが向上し、生成速度、画質、セマンティックコンテンツといった面で以前のアプローチが改善される。 Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, their latent seed space is still not well understood and has been shown to have an impact in generating new and rare concepts. Specifically, simple operations like interpolation and centroid finding work poorly with the standard Euclidean and spherical metrics in the latent space. This paper makes the observation that current training procedures make diffusion models biased toward inputs with a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation that can be further applied to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this metric and use it to further define centroids in the latent seed space. We show that our new interpolation and centroid evaluation techniques significantly enhance the generation of rare concept images. 
This further leads to state-of-the-art performance on few-shot and long-tail benchmarks, improving on prior approaches in terms of generation speed, image quality, and semantic content.	翻訳日:2023-06-16 17:40:24 公開日:2023-06-14
# World-to-Words:視覚言語モデルにおける高速マッピングによる接地型オープン語彙獲得 World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models ( http://arxiv.org/abs/2306.08685v1 ) ライセンス: Link先を確認	Ziqiao Ma, Jiayi Pan, Joyce Chai	(参考訳) 言語単位を物理的世界の指示対象とつなぐ能力は「接地」と呼ばれ、単語の接地された意味の学習と理解に不可欠である。人間は新しい単語学習で高速マッピングを示すが、現代の視覚言語モデルが接地された意味を伴って言語を真に表現できるかどうか、また接地がどのように新しい単語学習をさらにブートストラップしうるかは明らかでない。この目的のために、オープンワールド言語学習における接地とブートストラップを検討するために、GOVA(Grounded Open Vocabulary Acquisition)を導入する。最初の試みとして,object-oriented BERT(OctoBERT)を提案する。これは,接地を目的関数として重視した画像とテキストのペアでの事前学習による,視覚的に接地された新しい言語モデルである。広範な実験と分析を通じて、OctoBERTはより一貫性があり高速な接地型単語学習者であり、事前学習中に得られる接地能力は、未知の単語をより迅速かつ堅牢に学習する上で有効であることを示した。私たちのコードは https://github.com/sled-group/world-to-words で利用可能です。 The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with their grounded meanings and how grounding may further bootstrap new word learning. To this end, we introduce Grounded Open Vocabulary Acquisition (GOVA) to examine grounding and bootstrapping in open-world language learning. As an initial attempt, we propose object-oriented BERT (OctoBERT), a novel visually-grounded language model by pre-training on image-text pairs highlighting grounding as an objective. Through extensive experiments and analysis, we demonstrate that OctoBERT is a more coherent and fast grounded word learner, and that the grounding ability acquired during pre-training helps the model to learn unseen words more rapidly and robustly. Our code is available at https://github.com/sled-group/world-to-words	翻訳日:2023-06-16 17:40:02 公開日:2023-06-14
# コネクテッドカーデータを用いたハリケーン避難時のリアルタイム衝突リスク予測 Predicting Real-time Crash Risks during Hurricane Evacuation Using Connected Vehicle Data ( http://arxiv.org/abs/2306.08682v1 ) ライセンス: Link先を確認	Zaheen E Muktadi Syed and Samiul Hasan	(参考訳) 沿岸部の人々の命を救えるよう命じられたハリケーンの避難は、衝突のリスクを増し、高い交通需要を生み出す。このようなリスクを軽減するため、交通機関は適切な対策を展開するために、衝突リスクの高い高速道路の場所を予想する必要がある。ユビキタスセンサーと通信技術により、個々の車両軌道と速度情報を含むマイクロレベルの車両データを取得することができる。このような高分解能な車両データはリアルタイムに利用可能であり、交通安全条件の評価に使用できる。車両の速度と加速プロファイルを使用して、潜在的な衝突リスクをリアルタイムで予測することができる。リアルタイムの事故リスク予測に関するこれまでの研究は、主に道路セグメントの多くをカバーしていないインフラストラクチャーベースのセンサーのデータを用いていた。そこで,本稿では,新データであるコネクテッド・ビークル・データから,ハリケーン避難時の事故リスクを判定する手法を提案する。このようなデータは、非常に高い周波数(30秒未満)で収集された車両の位置、速度、加速度情報を含んでいる。事故リスクを予測するために,ルイジアナ州州間高速道路10号線(i-10)のハリケーン・アイダの避難期間に収集されたデータセットを用いた。連結車両データから5分間隔で抽出した気象特性と交通特性を考慮し,複数の機械学習モデルを訓練した。その結果, ガウスプロセスブースティング (GPBoost) とエクストリームグラディエントブースティング (XGBoost) は, 他のモデルより優れている(リコール=0.91)ことが示された。事故リスク評価のためのリアルタイムコネクテッドカーデータにより、交通管理者は資源を効率的に活用し、安全対策を積極的に行うことができる。 Hurricane evacuation, ordered to save lives of people of coastal regions, generates high traffic demand with increased crash risk. To mitigate such risk, transportation agencies need to anticipate highway locations with high crash risks to deploy appropriate countermeasures. With ubiquitous sensors and communication technologies, it is now possible to retrieve micro-level vehicular data containing individual vehicle trajectory and speed information. Such high-resolution vehicle data, potentially available in real time, can be used to assess prevailing traffic safety conditions. Using vehicle speed and acceleration profiles, potential crash risks can be predicted in real time. Previous studies on real-time crash risk prediction mainly used data from infrastructure-based sensors which may not cover many road segments. In this paper, we present methods to determine potential crash risks during hurricane evacuation from an emerging alternative data source known as connected vehicle data. 
Such data contain vehicle location, speed, and acceleration information collected at a very high frequency (intervals of less than 30 seconds). To predict potential crash risks, we utilized a dataset collected during the evacuation period of Hurricane Ida on Interstate-10 (I-10) in the state of Louisiana. Multiple machine learning models were trained considering weather features and different traffic characteristics extracted from the connected vehicle data in 5-minute intervals. The results indicate that the Gaussian Process Boosting (GPBoost) and Extreme Gradient Boosting (XGBoost) models perform better (recall = 0.91) than other models. Real-time connected vehicle data for crash risk assessment will allow traffic managers to use resources efficiently and take safety measures proactively.	翻訳日:2023-06-16 17:39:42 公開日:2023-06-14
# 完全可観測非決定性領域モデルにおける時間的拡張目標認識 Temporally Extended Goal Recognition in Fully Observable Non-Deterministic Domain Models ( http://arxiv.org/abs/2306.08680v1 ) ライセンス: Link先を確認	Ramon Fraga Pereira, Francesco Fuggitti, Felipe Meneguzzi, Giuseppe De Giacomo	(参考訳) ゴール認識(Goal Recognition)とは、エージェントが目標仮説、ドメインモデル、および一連の観測(つまり、環境内で実行される計画のサンプル)を与えられた上で達成しようとする意図された目標を識別するタスクである。既存のアプローチでは、ゴール仮説は単一の最終状態上の単一の共役式で構成され、環境力学は決定論的であり、より複雑な設定において時間的に拡張されたゴールの認識を妨げていると仮定している。本稿では,線形時相論理(ltlf)と純粋過去時相論理(pltlf)で表される有限トレースの目標に着目し,完全可観測非決定性(fond)計画ドメインモデルにおいて,目標認識を時間的拡張目標に拡張する。 6つのFONDプランニングドメインモデルに対して,そのような設定で目標を認識可能な最初のアプローチを開発し,異なるLTLfとPLTLfの目標を用いて評価する。実験の結果,我々のアプローチは,異なる認識環境における時間的拡張目標の認識において正確であることがわかった。 Goal Recognition is the task of discerning the correct intended goal that an agent aims to achieve, given a set of goal hypotheses, a domain model, and a sequence of observations (i.e., a sample of the plan executed in the environment). Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state and that the environment dynamics are deterministic, preventing the recognition of temporally extended goals in more complex settings. In this paper, we expand goal recognition to temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models, focusing on goals on finite traces expressed in Linear Temporal Logic (LTLf) and Pure Past Linear Temporal Logic (PLTLf). We develop the first approach capable of recognizing goals in such settings and evaluate it using different LTLf and PLTLf goals over six FOND planning domain models. Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.	翻訳日:2023-06-16 17:39:12 公開日:2023-06-14
# 定常状態における多体エッジバースト Many-body edge burst in steady states ( http://arxiv.org/abs/2306.08676v1 ) ライセンス: Link先を確認	Yu-Min Hu, Wen-Tan Xue, Fei Song, Zhong Wang	(参考訳) 非エルミート表皮効果と損失のある格子の虚ギャップとの相互作用は、エッジバースト(edge burst)と呼ばれる、粒子損失の非常に大きな割合が端で発生する境界誘起の動的現象を引き起こす。ここでは、そのような興味深い非エルミート動的現象を、対応する開量子系の定常密度分布に正確にマッピングできることを見出す。したがって、エッジバーストにおける損失確率のバルクエッジスケーリング関係は定常密度のそれにもマップされる。さらに,二体損失を持つ散逸多体系に対して正P表現を適用し,定常相関関数に対するスケーリング関係の有効性を明らかにする。これらの結果は、相互作用によって引き起こされる多体非エルミート表皮効果の独特な展望を与える。我々の予測は最先端の実験プラットフォームで検証可能である。 The interplay between the non-Hermitian skin effect and the imaginary gap of lossy lattices results in the edge burst, a boundary-induced dynamical phenomenon in which an exceptionally large portion of particle loss occurs at the edge. Here, we find that such an intriguing non-Hermitian dynamical phenomenon can be exactly mapped into the steady-state density distribution of a corresponding open quantum system. Consequently, the bulk-edge scaling relation of loss probability in edge burst also maps to that of steady-state density. Moreover, we apply the positive-P representation to dissipative many-body systems with two-body loss and reveal the validity of scaling relation for steady-state correlation functions. These results provide a unique perspective of the interaction-induced many-body non-Hermitian skin effect. Our predictions are testable in state-of-the-art experimental platforms.	翻訳日:2023-06-16 17:38:52 公開日:2023-06-14
# ワークフローノートを用いた信頼できる発作検出に向けて Towards trustworthy seizure onset detection using workflow notes ( http://arxiv.org/abs/2306.08728v1 ) ライセンス: Link先を確認	Khaled Saab, Siyi Tang, Mohamed Taha, Christopher Lee-Messer, Christopher Ré, Daniel Rubin	(参考訳) 医療AIモデルをデプロイする上で大きな障壁は、信頼性だ。既存のモデルは、集約されたメトリクスでは専門家レベルのパフォーマンスを示すことがあるが、しばしば非因果的特徴に依存し、隠れたサブグループでのエラーにつながる。脳波からの信頼できる発作発生検出に向けて、我々は、日常的な臨床ワークフローで医療関係者が作成する、発作以外の複数のイベント記述を含むアノテーション(ワークフローノートと呼ぶ)を活用することを提案する。ワークフローノートを用いて、まず、トレーニングデータを68,920 EEG時間にスケールアップすることにより、高価な手作業によるゴールドスタンダードラベルを持つ小さなトレーニングセットに依存するよりも、発作発生検出性能が著しく向上する(+12.3 AUROCポイント)ことを示す。第2に, 二値の発作発生検出モデルは, 臨床的に重要なサブグループで性能が低く(例えば小児と成人の間で最大6.5 AUROCポイントの差), また非てんかん性異常を示す脳波クリップでは、任意の脳波クリップと比較して偽陽性が有意に多い(+19 FPRポイント)ことを明らかにする。隠れたサブグループに対するモデルのロバスト性を改善するために、スパイク、徐波化、体動アーティファクトなど、発作以外の26の属性を分類するマルチラベルモデルを訓練する。その結果, マルチラベルモデルでは, 全体の発作発生検出性能が有意に向上し(+5.9 AUROCポイント), サブグループでのパフォーマンスも大きく向上し(最大+8.3 AUROCポイント), 非てんかん性異常に対する偽陽性が8 FPRポイント減少することがわかった。最後に,24脳波時間あたりの偽陽性数に基づく臨床的有用性指標を提案し,マルチラベルモデルが様々な臨床設定でこの指標を2倍改善することを示す。 A major barrier to deploying healthcare AI models is their trustworthiness. One form of trustworthiness is a model's robustness across different subgroups: while existing models may exhibit expert-level performance on aggregate metrics, they often rely on non-causal features, leading to errors in hidden subgroups. To take a step closer towards trustworthy seizure onset detection from EEG, we propose to leverage annotations that are produced by healthcare personnel in routine clinical workflows -- which we refer to as workflow notes -- that include multiple event descriptions beyond seizures. Using workflow notes, we first show that by scaling training data to an unprecedented level of 68,920 EEG hours, seizure onset detection performance significantly improves (+12.3 AUROC points) compared to relying on smaller training sets with expensive manual gold-standard labels. 
Second, we reveal that our binary seizure onset detection model underperforms on clinically relevant subgroups (e.g., up to a margin of 6.5 AUROC points between pediatrics and adults), while having significantly higher false positives on EEG clips showing non-epileptiform abnormalities compared to any EEG clip (+19 FPR points). To improve model robustness to hidden subgroups, we train a multilabel model that classifies 26 attributes other than seizures, such as spikes, slowing, and movement artifacts. We find that our multilabel model significantly improves overall seizure onset detection performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points), and decreases false positives on non-epileptiform abnormalities by 8 FPR points. Finally, we propose a clinical utility metric based on false positives per 24 EEG hours and find that our multilabel model improves this clinical utility metric by a factor of 2x across different clinical settings.	翻訳日:2023-06-16 17:34:29 公開日:2023-06-14
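The abstract above names a clinical utility metric based on false positives per 24 EEG hours but does not spell out its exact definition. The following is a minimal sketch of one plausible reading, assuming the metric is simply the false-positive count normalised to a 24-hour recording window. The function name and arguments are hypothetical and are not taken from the paper.

```python
def false_positives_per_24h(num_false_positives: int, total_eeg_hours: float) -> float:
    """Normalise a false-positive count to a 24-hour EEG recording window.

    Assumption: the metric is a plain rate (count / hours * 24); the paper
    may define it differently.
    """
    if total_eeg_hours <= 0:
        raise ValueError("total_eeg_hours must be positive")
    return 24.0 * num_false_positives / total_eeg_hours


def improvement_factor(baseline_rate: float, new_rate: float) -> float:
    """How many times lower the new false-positive rate is than the baseline."""
    if new_rate <= 0:
        raise ValueError("new_rate must be positive")
    return baseline_rate / new_rate
```

Under this reading, a "2x improvement" would mean that the multilabel model raises half as many false alarms per day of monitoring as the binary model at a comparable operating point.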
# クロスドメイン手術画像分割のためのクライアントサーバディープフェデレーション学習 A Client-server Deep Federated Learning for Cross-domain Surgical Image Segmentation ( http://arxiv.org/abs/2306.08720v1 ) ライセンス: Link先を確認	Ronast Subedi, Rebati Raman Gaire, Sharib Ali, Anh Nguyen, Danail Stoyanov, and Binod Bhattarai	(参考訳) 本稿では,異なるセンターに属する分散データセットのプライバシー保護を念頭において,2次元画像分割のための領域間適応問題の解法を提案する。医学画像解析におけるディープラーニングアーキテクチャは、より一般化するために広範なトレーニングデータを必要とする。しかし,本質的なデータキュレーションコストとデータアノテーション専門家の必要性から,十分な診断・手術データを得ることは依然として困難である。さらに、プライバシーと法的コンプライアンスの懸念が高まり、臨床現場や地域間でのデータ共有が困難になる可能性がある。医療データセットが直面するもうひとつの課題は、異なるセンターで収集されたデータ間のドメインシフトが避けられないことだ。そこで本研究では,クロスドメイン適応のためのクライアントサーバディープフェデレーションアーキテクチャを提案する。サーバはソースドメインとターゲットドメインの両方に共通するイミュータブルなパラメータのセットをホストする。クライアントはそれぞれのドメイン固有のパラメータで構成され、パラメータと推論を学習しながらサーバにリクエストを行う。本手法は2つのベンチマークデータセットで評価し,内視鏡的ポリープ・セグメンテーションと診断的皮膚病変の検出と解析に対するコンピュータ支援介入の適用性を示した。提案手法は, 競争的ベースライン法や最先端手法と比較して, 提案手法の優位性を示す。コードは、https://github.com/thetna/distributed-daで入手できる。 This paper presents a solution to the cross-domain adaptation problem for 2D surgical image segmentation, explicitly considering the privacy protection of distributed datasets belonging to different centers. Deep learning architectures in medical image analysis necessitate extensive training data for better generalization. However, obtaining sufficient diagnostic and surgical data is still challenging, mainly due to the inherent cost of data curation and the need of experts for data annotation. Moreover, increased privacy and legal compliance concerns can make data sharing across clinical sites or regions difficult. Another ubiquitous challenge the medical datasets face is inevitable domain shifts among the collected data at the different centers. To this end, we propose a Client-server deep federated architecture for cross-domain adaptation. A server hosts a set of immutable parameters common to both the source and target domains. 
The clients consist of the respective domain-specific parameters and make requests to the server while learning their parameters and inferencing. We evaluate our framework in two benchmark datasets, demonstrating applicability in computer-assisted interventions for endoscopic polyp segmentation and diagnostic skin lesion detection and analysis. Our extensive quantitative and qualitative experiments demonstrate the superiority of the proposed method compared to competitive baseline and state-of-the-art methods. Codes are available at: https://github.com/thetna/distributed-da	翻訳日:2023-06-16 17:33:53 公開日:2023-06-14
# 二重不均質環境におけるオフポリシー評価 Off-policy Evaluation in Doubly Inhomogeneous Environments ( http://arxiv.org/abs/2306.08719v1 ) ライセンス: Link先を確認	Zeyu Bian, Chengchun Shi, Zhengling Qi and Lan Wang	(参考訳) 本研究の目的は、強化学習(RL)の2つの重要な仮定である時間的定常性と個体間の均質性がともに破られる状況において、オフポリシー評価(OPE)を研究することである。この「二重不均質性」を扱うために、報酬関数と観測遷移関数に対する潜在因子モデルのクラスを提案し、その下でモデルベースとモデルフリーの両方のアプローチからなる一般的なOPEフレームワークを開発する。我々の知る限り、本論文は二重不均質性を持つオフラインRLにおいて統計的に妥当なOPE法を開発した最初の論文である。標準的なRLの仮定が満たされない環境におけるOPEのより深い理解に寄与し、そうした設定での実用的なアプローチをいくつか提供する。提案する価値推定量の理論的性質を確立し、時間的非定常性または個体間の異質性のいずれかを無視する競合手法よりも本手法が優れていることを実験的に示す。最後に、Medical Information Mart for Intensive Care のデータセットに本手法を適用して説明する。 This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated. To handle the "double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.	翻訳日:2023-06-16 17:33:35 公開日:2023-06-14
# 灌水スケジューリングのための機械学習パラダイムと混合整数モデル予測制御の統合 Integrating machine learning paradigms and mixed-integer model predictive control for irrigation scheduling ( http://arxiv.org/abs/2306.08715v1 ) ライセンス: Link先を確認	Bernard T. Agyeman, Mohamed Naouri, Willemijn Appels, Jinfeng Liu (University of Alberta), Sirish L. Shah	(参考訳) 農業部門は、主に淡水不足への懸念から、水資源の保全と収穫量の最適化において大きな課題に直面している。従来の灌水スケジューリング手法は、大規模な灌水システムのニーズを満たすのに不十分であることが多い。そこで本稿では,機械学習の3つのパラダイムを活かし,灌水スケジュールを最適化する予測灌水スケジューラを提案する。提案するスケジューラでは, 土壌水分パラメータとトポロジ情報に基づいて, k-meansクラスタリング手法を用いて, フィールドを異なる灌水管理ゾーンに分割する。さらに,管理ゾーン毎に動的モデルを構築するための長期短期記憶ネットワークを用いて,土壌水分動態の正確な予測を行う。混合整数モデル予測制御問題として定式化されたスケジューラは、全体の水消費と灌水コストを最小化しながら、吸水量を最大化する。混合整数最適化課題に取り組むために, 日次灌水決定に責任を持つ強化学習エージェントを訓練するために, 近位政策最適化アルゴリズムを用いる。提案したスケジューラの性能を評価するため、カナダのレスブリッジにある26.4ヘクタールのフィールドが2015年と2022年の成長期のケーススタディとして選ばれた。以上の結果から,水利用効率と作物収量改善の両面において,従来の灌水スケジューリング法と比較して,提案するスケジューラの優越性が示された。特に、提案されたスケジューラは6.4%から22.8%の貯水量を達成し、収量は2.3%から4.3%に増加した。 The agricultural sector currently faces significant challenges in water resource conservation and crop yield optimization, primarily due to concerns over freshwater scarcity. Traditional irrigation scheduling methods often prove inadequate in meeting the needs of large-scale irrigation systems. To address this issue, this paper proposes a predictive irrigation scheduler that leverages the three paradigms of machine learning to optimize irrigation schedules. The proposed scheduler employs the k-means clustering approach to divide the field into distinct irrigation management zones based on soil hydraulic parameters and topology information. Furthermore, a long short-term memory network is employed to develop dynamic models for each management zone, enabling accurate predictions of soil moisture dynamics. Formulated as a mixed-integer model predictive control problem, the scheduler aims to maximize water uptake while minimizing overall water consumption and irrigation costs. 
To tackle the mixed-integer optimization challenge, the proximal policy optimization algorithm is utilized to train a reinforcement learning agent responsible for making daily irrigation decisions. To evaluate the performance of the proposed scheduler, a 26.4-hectare field in Lethbridge, Canada, was chosen as a case study for the 2015 and 2022 growing seasons. The results demonstrate the superiority of the proposed scheduler compared to a traditional irrigation scheduling method in terms of water use efficiency and crop yield improvement for both growing seasons. Notably, the proposed scheduler achieved water savings ranging from 6.4% to 22.8%, along with yield increases ranging from 2.3% to 4.3%.	翻訳日:2023-06-16 17:33:16 公開日:2023-06-14
# イタリアの料理人はインドで機械工学を学べますか。シナリオと場所に関する行動認識の一般化 What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations ( http://arxiv.org/abs/2306.08713v1 ) ライセンス: Link先を確認	Chiara Plizzari, Toby Perrett, Barbara Caputo, Dima Damen	(参考訳) 行動認識のために訓練されたモデルは、これまで見つからなかったシナリオや、これまで見つからなかった場所で実行されたアクションをうまく分類できるだろうか? この質問に答えるために、大規模ego4dデータセットからの1.1mのビデオクリップを含む、シナリオとロケーションデータセット(argo1m)に対するアクション認識の一般化を紹介する。認識モデルは、10以上の提案されたテスト分割を一般化するのに苦労し、各シナリオは目に見えない場所にある。そこで我々は,他のドメインからの動画のクロスインスタンス再構成として,各ビデオを表現するCIRを提案する。レコンストラクションはテキストナレーションと組み合わせて、ドメインの一般化可能な表現の学習を導く。我々は、CIRが全てのテスト分割に先立つ領域一般化よりも優れていることを示すARGO1Mに関する広範な分析と改善を提供する。コードとデータ: https://chiaraplizz.github.io/what-can-a-cook/ We propose and address a new generalisation problem: can a model trained for action recognition successfully classify actions when they are performed within a previously unseen scenario and in a previously unseen location? To answer this question, we introduce the Action Recognition Generalisation Over scenarios and locations dataset (ARGO1M), which contains 1.1M video clips from the large-scale Ego4D dataset, across 10 scenarios and 13 locations. We demonstrate recognition models struggle to generalise over 10 proposed test splits, each of an unseen scenario in an unseen location. We thus propose CIR, a method to represent each video as a Cross-Instance Reconstruction of videos from other domains. Reconstructions are paired with text narrations to guide the learning of a domain generalisable representation. We provide extensive analysis and ablations on ARGO1M that show CIR outperforms prior domain generalisation works on all test splits. Code and data: https://chiaraplizz.github.io/what-can-a-cook/.	翻訳日:2023-06-16 17:32:46 公開日:2023-06-14
# AiXpand AI OS -- 分散ユビキタスコンピューティングMLOps実行エンジン AiXpand AI OS -- Decentralized ubiquitous computing MLOps execution engine ( http://arxiv.org/abs/2306.08708v1 ) ライセンス: Link先を確認	Beatrice Milik, Stefan Saraev, Cristian Bleotiu, Radu Lupaescu, Bogdan Hobeanu, Andrei Ionut Damian	(参考訳) 過去数年間、ユビキタス、あるいは広く普及したコンピューティングは、エンタープライズグレードシステム、コンシューマアプリケーション、ゲームシステムを含む幅広いアプリケーションの主要なアプローチとして人気を集めてきた。ユビキタスコンピューティング(ユビキタスコンピューティング)とは、コンピュータ技術を日常のオブジェクトや環境に統合し、相互に通信可能な相互接続されたデバイスのネットワークを構築することを指す。ユビキタスコンピューティング技術を使用することで、コミュニティはよりつながりやすく、効率的になり、メンバーはコミュニケーションやコラボレーションが容易になる。これによって相互接続性とコラボレーションが,より成功し,持続可能なコミュニティに結びつくのです。しかしユビキタスコンピューティングの普及は、自動化された学習とスマートなアプリケーション全般の重要性を強調している。人工知能とディープラーニングには大きな進歩があったが、高価で高度に複雑なクラウド数値計算インフラへの圧力が高まり、大規模に採用されている。実践的な機械学習システムの採用や開発には、複雑なインフラストラクチャだけでなく、データサイエンスや機械学習の専門知識の面でも、禁止的なコストが伴う。本稿では、エンドツーエンドai協調アプリケーションパイプラインのローコード開発と展開のための革新的なアプローチを提案する。我々は、トークン化経済に基づく完全に分散したグローバル協調コミュニティにおける、インフラの割り当て、コスト、そして安全な雇用分配に対処する。 Over the past few years, ubiquitous, or pervasive computing has gained popularity as the primary approach for a wide range of applications, including enterprise-grade systems, consumer applications, and gaming systems. Ubiquitous computing refers to the integration of computing technologies into everyday objects and environments, creating a network of interconnected devices that can communicate with each other and with humans. By using ubiquitous computing technologies, communities can become more connected and efficient, with members able to communicate and collaborate more easily. This enabled interconnectedness and collaboration can lead to a more successful and sustainable community. The spread of ubiquitous computing, however, has emphasized the importance of automated learning and smart applications in general. 
Even though there have been significant strides in Artificial Intelligence and Deep Learning, large scale adoption has been hesitant due to mounting pressure on expensive and highly complex cloud numerical-compute infrastructures. Adopting, and even developing, practical machine learning systems can come with prohibitive costs, not only in terms of complex infrastructures but also of solid expertise in Data Science and Machine Learning. In this paper we present an innovative approach for low-code development and deployment of end-to-end AI cooperative application pipelines. We address infrastructure allocation, costs, and secure job distribution in a fully decentralized global cooperative community based on tokenized economics.	翻訳日:2023-06-16 17:32:17 公開日:2023-06-14
# VidEdit:ゼロショットと空間対応のテキスト駆動ビデオ編集 VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing ( http://arxiv.org/abs/2306.08707v1 ) ライセンス: Link先を確認	Paul Couairon, Cl\'ement Rambour, Jean-Emmanuel Haugeard, Nicolas Thome	(参考訳) 近年,拡散に基づく生成モデルが画像生成と編集において大きな成功を収めている。しかし、ビデオ編集には依然として重要な制限がある。本稿では,強い時間的・空間的一貫性を確保したゼロショットテキストベースの映像編集手法であるvideditを提案する。まず,アトラスベースと事前学習したテキスト-画像拡散モデルを組み合わせて,時間的滑らかさを設計する訓練不要で効率的な編集方法を提案する。第2に,既製パン光学セグメンタとエッジ検出器を併用し,条件付き拡散型アトラス編集に応用する。これにより、元のビデオの構造を厳格に保ちながら、ターゲット領域の正確な空間的制御が保証される。定量的および定性的な実験により、VidEditは、意味的忠実性、画像保存、時間的一貫性のメトリクスに関して、DAVISデータセット上で最先端の手法より優れていることが示された。このフレームワークでは、単一のビデオを処理するのに約1分しかかからず、ユニークなテキストプロンプトに基づいて複数の互換性のある編集を生成することができる。 Project Web-page at https://videdit.github.io Recently, diffusion-based generative models have achieved remarkable success for image generation and edition. However, their use for video editing still faces important limitations. This paper introduces VidEdit, a novel method for zero-shot text-based video editing ensuring strong temporal and spatial consistency. Firstly, we propose to combine atlas-based and pre-trained text-to-image diffusion models to provide a training-free and efficient editing method, which by design fulfills temporal smoothness. Secondly, we leverage off-the-shelf panoptic segmenters along with edge detectors and adapt their use for conditioned diffusion-based atlas editing. This ensures a fine spatial control on targeted regions while strictly preserving the structure of the original video. Quantitative and qualitative experiments show that VidEdit outperforms state-of-the-art methods on DAVIS dataset, regarding semantic faithfulness, image preservation, and temporal consistency metrics. With this framework, processing a single video only takes approximately one minute, and it can generate multiple compatible edits based on a unique text prompt. Project web-page at https://videdit.github.io	翻訳日:2023-06-16 17:31:28 公開日:2023-06-14
# hBN/グラフェン/hBN超格子の熱電能 Thermopower in hBN/graphene/hBN superlattices ( http://arxiv.org/abs/2306.08705v1 ) ライセンス: Link先を確認	Victor H. Guarochico-Moreira, Christopher R. Anderson, Vladimir Fal'ko, Irina V. Grigorieva, Endre Tóvári, Matthew Hamer, Roman Gorbachev, Song Liu, James H. Edgar, Alessandro Principi, Andrey V. Kretinin and Ivan J. Vera-Marun	(参考訳) 熱電効果はフェルミエネルギー近傍の状態密度の非対称性に非常に敏感であり、電子構造のプローブとして利用できる。我々は、hBNによる完全な封止と1次元エッジ接触からなるヘテロ構造において、グラフェンとhBNの格子が整列した高品質な単層グラフェンの熱電能を実験的に研究する。グラフェンがhBN層の一方と整列している場合、キャリア密度の関数として熱電能に追加の符号反転が現れることを示し、これはモアレ超格子の存在を直接示す証拠となる。熱電能の温度依存性から、内在する歪みのばらつきとファン・ホーベ特異点の役割を評価でき、ウムクラップ電子-電子散乱過程の存在も示唆されることを示す。熱電能は中性点付近でピークに達するため、エネルギースペクトルの縮退を調べることができる。さらに、グラフェンが上下のhBN結晶の両方と整列している場合、熱電能は差分スーパーモアレ格子に起因する複数の複製ディラック点を示す特徴を示す。どちらの場合も、熱電能がモットの式とどの程度一致するかを評価する。最後に、同一の超格子デバイスにおいて、キャリア密度を制御することで、温度駆動の熱電能の正から負、またはその逆への反転が生じうることを示す。熱電能の研究は、2次元超格子の電子構造を調べるための代替アプローチを提供するとともに、これらのヘテロ構造の熱電応答を設計する機会を与える。 Thermoelectric effects are highly sensitive to the asymmetry in the density of states around the Fermi energy and can be exploited as probes of the electronic structure. We experimentally study thermopower in high-quality monolayer graphene, within heterostructures consisting of complete hBN encapsulation and 1D edge contacts, where the graphene and hBN lattices are aligned. When graphene is aligned to one of the hBN layers, we demonstrate the presence of additional sign reversals in the thermopower as a function of carrier density, directly evidencing the presence of the moiré superlattice. We show that the temperature dependence of the thermopower enables the assessment of the role of built-in strain variation and van Hove singularities and hints at the presence of Umklapp electron-electron scattering processes. As the thermopower peaks around the neutrality point, this allows the energy spectrum degeneracy to be probed. 
Further, when graphene is double-aligned with the top and bottom hBN crystals, the thermopower exhibits features evidencing multiple cloned Dirac points caused by the differential super-moiré lattice. For both cases we evaluate how well the thermopower agrees with Mott's equation. Finally, we show the same superlattice device can exhibit a temperature-driven thermopower reversal from positive to negative and vice versa, by controlling the carrier density. The study of thermopower provides an alternative approach to study the electronic structure of 2D superlattices, whilst offering opportunities to engineer the thermoelectric response in these heterostructures.	翻訳日:2023-06-16 17:30:55 公開日:2023-06-14
# mBERTはロマンシュ語を理解しているか。単語アライメントを用いた単語埋め込みの評価 Does mBERT understand Romansh? Evaluating word embeddings using word alignment ( http://arxiv.org/abs/2306.08702v1 ) ライセンス: Link先を確認	Eyal Liron Dolev	(参考訳) 類似度に基づく単語アライメントモデル(SimAlign と awesome-align)を、mBERT と XLM-R の単語埋め込みと組み合わせて、ドイツ語とロマンシュ語の並行文で検証する。ロマンシュ語は事前学習に含まれない未知の言語であるため、ゼロショットの設定を扱うことになる。mBERT の埋め込みを用いると、両モデルともアライメントエラー率 0.22 に達し、統計モデルである fast_align を上回り、学習済み言語に対する類似度ベースの単語アライメントと同等である。我々はこれらの結果を、mBERT がロマンシュ語にとって有意味で適用可能な情報を含んでいる証拠として解釈する。性能評価のため、グラウビュンデン州(Canton of Grisons)が過去25年間にドイツ語、ロマンシュ語、イタリア語で発表したプレスリリースを含む、新しい3言語コーパス DERMIT(DE-RM-IT)も提示する。コーパスには4,547の並列文書と、各言語ペアについて約100,000の文対が含まれる。さらに、ドイツ語-ロマンシュ語の単語アライメントのゴールドスタンダードも提示する。データは https://github.com/eyldlv/DERMIT-Corpus で公開されている。 We test similarity-based word alignment models (SimAlign and awesome-align) in combination with word embeddings from mBERT and XLM-R on parallel sentences in German and Romansh. Since Romansh is an unseen language, we are dealing with a zero-shot setting. Using embeddings from mBERT, both models reach an alignment error rate of 0.22, which outperforms fast_align, a statistical model, and is on par with similarity-based word alignment for seen languages. We interpret these results as evidence that mBERT contains information that can be meaningful and applicable to Romansh. To evaluate performance, we also present a new trilingual corpus, which we call the DERMIT (DE-RM-IT) corpus, containing press releases made by the Canton of Grisons in German, Romansh and Italian in the past 25 years. The corpus contains 4 547 parallel documents and approximately 100 000 sentence pairs in each language combination. We additionally present a gold standard for German-Romansh word alignment. The data is available at https://github.com/eyldlv/DERMIT-Corpus.	翻訳日:2023-06-16 17:30:28 公開日:2023-06-14
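The abstract above reports an alignment error rate (AER) of 0.22. As a reminder of what that number measures, here is a minimal sketch of the standard AER definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links and P the possible gold links (with S ⊆ P). The function name and the link representation as `(source_index, target_index)` pairs are assumptions for illustration; whether the DERMIT gold standard distinguishes sure from possible links is not stated in the abstract.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Alignment error rate for one sentence pair or a pooled set of links.

    predicted, sure, possible: iterables of (source_index, target_index) pairs.
    If `possible` is omitted, every gold link is treated as sure.
    """
    a = set(predicted)
    s = set(sure)
    # By convention the sure links are a subset of the possible links.
    p = (set(possible) | s) if possible is not None else s
    denominator = len(a) + len(s)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing to find
    return 1.0 - (len(a & s) + len(a & p)) / denominator
```

Lower is better: a perfect aligner scores 0.0, and when all gold links are sure the value equals one minus the balanced F1 score of the predicted links.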
# 反復的自己転移学習:小規模データセットに基づく応答時刻歴予測の一般的な手法 Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset ( http://arxiv.org/abs/2306.08700v1 ) ライセンス: Link先を確認	Yongjia Xu, Xinzheng Lu, Yifan Fei and Yuli Huang	(参考訳) 応答時刻歴予測のためのディープニューラルネットワークによるサロゲートモデリングには多くの利点がある。しかし、精緻な数値シミュレーションや実際の実験はコストが高いため、データ不足は実用上避けられないボトルネックとなっている。本研究では、小規模データセットに基づいてニューラルネットワークを訓練するための反復的自己転移学習手法を提案する。回帰のための3つの分岐を持つ深層適応ネットワーク(DAN-TR)と名付けた、新しいマッピングベースの転移学習ネットワークを提案する。DAN-TRと擬似ラベル戦略を組み合わせることで汎用的な反復的ネットワーク訓練戦略を開発し、対応するデータセットの構築についても論じる。最後に、複雑な構成要素をケーススタディとして選択する。その結果、提案手法は、外部のラベル付きサンプル、良好に振る舞う事前学習モデル、追加の人手によるラベル付け、複雑な物理的・数学的解析を必要とせずに、小規模データセットでのモデル性能をほぼ1桁向上できることが示された。 There are numerous advantages of deep neural network surrogate modeling for response time-history prediction. However, due to the high cost of refined numerical simulations and actual experiments, the lack of data has become an unavoidable bottleneck in practical applications. An iterative self-transfer learning method for training neural networks based on small datasets is proposed in this study. A new mapping-based transfer learning network, named deep adaptation network with three branches for regression (DAN-TR), is proposed. A general iterative network training strategy is developed by coupling DAN-TR and the pseudo-label strategy, and the establishment of corresponding datasets is also discussed. Finally, a complex component is selected as a case study. The results show that the proposed method can improve the model performance by nearly an order of magnitude on small datasets without the need for external labeled samples, well-behaved pre-trained models, additional artificial labeling, or complex physical/mathematical analysis.	翻訳日:2023-06-16 17:30:08 公開日:2023-06-14
# 縦断的な胸部X線とレポートを用いた放射線レポートの事前入力 Utilizing Longitudinal Chest X-Rays and Reports to Pre-Fill Radiology Reports ( http://arxiv.org/abs/2306.08749v1 ) ライセンス: Link先を確認	Qingqing Zhu, Tejas Sudharshan Mathai, Pritam Mukherjee, Yifan Peng, Ronald M. Summers, and Zhiyong Lu	(参考訳) 音声認識ソフトウェアの使用により放射線レポートのターンアラウンドタイムは短縮されたが、根強く残る伝達エラーは放射線レポートの解釈に大きな影響を及ぼす可能性がある。放射線レポートの事前入力(プリフィル)は報告エラーの軽減に有望であるが、医療レポート生成に関する文献上の取り組みにもかかわらず、MIMIC-CXRデータセットにおける患者受診記録の縦断的な性質を活用するアプローチは不足している。このギャップに対処するため、縦断的なマルチモーダルデータ、すなわち患者の前回受診時のCXR、今回受診時のCXR、および前回受診時のレポートを用いて、今回の受診レポートの「所見(findings)」セクションを事前入力することを提案する。まず、MIMIC-CXRデータセットから26,625人の患者の縦断的な受診情報を収集し、Longitudinal-MIMIC と呼ぶ新しいデータセットを作成した。この新しいデータセットを用いて、マルチモーダルデータ(CXR画像+レポート)を含む縦断的な患者受診記録から、クロスアテンションベースのマルチモーダル融合モジュールと階層的メモリ駆動デコーダを介して情報を捉えるトランスフォーマーベースのモデルを訓練した。今回の受診データのみを入力としてモデルを訓練する従来研究とは対照的に、本研究は利用可能な縦断的情報を活用して放射線レポートの「所見」セクションを事前入力する。実験の結果、本手法は最近のいくつかの手法を、F1スコアで3%以上、BLEU-4、METEOR、ROUGE-Lでそれぞれ2%以上上回った。データセットとコードは公開される予定である。 Despite the reduction in turn-around times in radiology reports with the use of speech recognition software, persistent communication errors can significantly impact the interpretation of the radiology report. Pre-filling a radiology report holds promise in mitigating reporting errors, and despite efforts in the literature to generate medical reports, there exists a lack of approaches that exploit the longitudinal nature of patient visit records in the MIMIC-CXR dataset. To address this gap, we propose to use longitudinal multi-modal data, i.e., previous patient visit CXR, current visit CXR, and previous visit report, to pre-fill the 'findings' section of a current patient visit report. We first gathered the longitudinal visit information for 26,625 patients from the MIMIC-CXR dataset and created a new dataset called Longitudinal-MIMIC. With this new dataset, a transformer-based model was trained to capture the information from longitudinal patient visit records containing multi-modal data (CXR images + reports) via a cross-attention-based multi-modal fusion module and a hierarchical memory-driven decoder. 
In contrast to previous work that only uses current visit data as input to train a model, our work exploits the longitudinal information available to pre-fill the 'findings' section of radiology reports. Experiments show that our approach outperforms several recent approaches by >=3% on F1 score, and >=2% for BLEU-4, METEOR and ROUGE-L respectively. The dataset and code will be made publicly available.	翻訳日:2023-06-16 17:23:07 公開日:2023-06-14
# 物体中心神経散乱関数による多物体操作 Multi-Object Manipulation via Object-Centric Neural Scattering Functions ( http://arxiv.org/abs/2306.08748v1 ) ライセンス: Link先を確認	Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu	(参考訳) 学習された視覚力学モデルはロボット操作に有効であることが証明されている。しかし、マルチオブジェクトインタラクションに関わるシーンを表現できる最善の方法はまだ不明である。現在の方法はシーンを離散的なオブジェクトに分解するが、特定の照度に結びついた外観のみをエンコードするため、照明条件に挑戦する中で正確なモデリングと操作に苦慮する。本稿では,モデル予測制御フレームワークにおけるオブジェクト表現として,オブジェクト中心のニューラル散乱関数(osfs)を用いることを提案する。 OSFは、オブジェクトごとの光輸送をモデルとし、オブジェクトの再配置と様々な照明条件の下で構成シーンの再レンダリングを可能にする。このアプローチを逆パラメータ推定とグラフに基づくニューラルダイナミクスモデルと組み合わせることで,従来考えられなかったシナリオや過酷な照明条件においても,モデル予測制御性能の向上と合成多目的環境における一般化を実証する。 Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation amid challenging lighting conditions as they only encode appearance tied with specific illuminations. In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. OSFs model per-object light transport, enabling compositional scene re-rendering under object rearrangement and varying lighting conditions. By combining this approach with inverse parameter estimation and graph-based neural dynamics models, we demonstrate improved model-predictive control performance and generalization in compositional multi-object environments, even in previously unseen scenarios and harsh lighting conditions.	翻訳日:2023-06-16 17:22:34 公開日:2023-06-14
# MetaML: ディープラーニングアクセラレーションのためのカスタマイズ可能なクロスステージ設計フローを自動化する MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration ( http://arxiv.org/abs/2306.08746v1 ) ライセンス: Link先を確認	Zhiqiang Que, Shuo Liu, Markus Rognlien, Ce Guo, Jose G. F. Coutinho, Wayne Luk	(参考訳) 本稿では、ディープニューラルネットワーク(DNN)ハードウェアアクセラレータのための新しい最適化フレームワークを紹介し、カスタマイズされ自動化された設計フローの迅速な開発を可能にする。具体的には、DNNレベルとFPGAレベルの低レベル最適化を含む最適化手法の選択と構成を自動化することを目的とする。DNNアクセラレータの性能と効率を向上させるため、高度にカスタマイズ可能で柔軟な設計フローアーキテクチャを構築するための新しい最適化タスクと変換タスクを導入する。実験の結果、2つのネットワークについて、精度を維持し、人手やドメインの専門知識を不要にしつつ、DSP使用量を最大92%、LUT使用量を最大89%削減できることを示した。最先端のアプローチと比較して、我々の設計はより高い精度を達成し、DSPリソースの使用量は3分の1であり、提案フレームワークの利点を裏付けている。 This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the selection and configuration of low-level optimization techniques, encompassing DNN and FPGA low-level optimizations. We introduce novel optimization and transformation tasks for building design-flow architectures, which are highly customizable and flexible, thereby enhancing the performance and efficiency of DNN accelerators. Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks, while maintaining accuracy and eliminating the need for human effort or domain expertise. In comparison to state-of-the-art approaches, our design achieves higher accuracy and utilizes three times fewer DSP resources, underscoring the advantages of our proposed framework.	翻訳日:2023-06-16 17:22:19 公開日:2023-06-14
# PLAN: 分散を考慮したプライベート平均推定 PLAN: Variance-Aware Private Mean Estimation ( http://arxiv.org/abs/2306.08745v1 ) ライセンス: Link先を確認	Martin Aumüller, Christian Janos Lebeda, Boel Nelson, Rasmus Pagh	(参考訳) 差分プライベートな平均推定は、データ分析と機械学習のためのプライバシ保護アルゴリズムの重要な構成要素である。プライバシと有用性のトレードオフは最悪ケースではよく理解されているが、多くのデータセットは、より良いアルゴリズムを得るために利用できる可能性のある構造を持っている。本稿では、入力が $\mathbf{R}^d$ 上の分布 $\mathcal{D}$ から独立にサンプリングされ、座標ごとの標準偏差が $\boldsymbol{\sigma} \in \mathbf{R}^d$ である設定における、平均推定のための差分プライベートアルゴリズムの族 $\textit{Private Limit Adapted Noise (PLAN)}$ を提案する。マハラノビス距離の下での平均推定と同様に、PLANはノイズの形状をデータの形状に合わせるが、従来のアルゴリズムとは異なり、プライバシ予算を座標間で不均一に配分する。$\mathcal{D}$ に対する集中性の仮定の下で、ベクトル $\boldsymbol{\sigma}$ の偏りを活用する方法を示し、$\ell_2$ 誤差が $\|\boldsymbol{\sigma}\|_1$ に比例する(ゼロ集中)差分プライベートな平均推定値を得る。従来の研究は、$\boldsymbol{\sigma}$ を考慮していないか、マハラノビス距離で誤差を測定しており、いずれの場合も $\ell_2$ 誤差は $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ に比例し、これは最大で $\sqrt{d}$ 倍大きくなりうる。アルゴリズムの有効性を検証するため、合成データと実世界データの両方で精度を実験的に評価した。 Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise (PLAN)}$, a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. 
Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance; in both cases the resulting $\ell_2$ error is proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real-world data.	翻訳日:2023-06-16 17:22:03 publication placeholder
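To make the contrast between $\|\boldsymbol{\sigma}\|_1$ and $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ concrete, the sketch below compares two ways of adding Gaussian noise under a fixed zero-concentrated DP budget $\rho$. It is not the PLAN algorithm, which the abstract does not specify in detail. It assumes a simplified setting where coordinate $i$ of the statistic has a known positive sensitivity `sens[i]` (for instance, proportional to $\sigma_i$ after clipping), and it uses the standard fact that per-coordinate Gaussian noise with variances $s_i^2$ costs $\rho = \sum_i \Delta_i^2 / (2 s_i^2)$. All function names are hypothetical.

```python
import math


def shaped_noise_variances(sens, rho):
    """Variances s_i^2 = sens_i * sum(sens) / (2 * rho).

    This choice minimises the total variance sum(s_i^2) subject to the
    zCDP constraint sum(sens_i^2 / (2 * s_i^2)) = rho.
    """
    total = sum(sens)
    return [d * total / (2.0 * rho) for d in sens]


def uniform_noise_variances(sens, rho):
    """Same variance in every coordinate, calibrated to the l2 sensitivity."""
    l2_squared = sum(d * d for d in sens)
    return [l2_squared / (2.0 * rho)] * len(sens)


def zcdp_cost(sens, variances):
    """zCDP parameter spent by Gaussian noise with the given variances."""
    return sum(d * d / (2.0 * v) for d, v in zip(sens, variances))


def rms_l2_error(variances):
    """Root of the expected squared l2 norm of the added noise."""
    return math.sqrt(sum(variances))
```

With these definitions the shaped noise has root-mean-square error $\|\Delta\|_1 / \sqrt{2\rho}$ while the uniform noise has $\sqrt{d}\,\|\Delta\|_2 / \sqrt{2\rho}$, which mirrors the two rates quoted in the abstract when the sensitivities are proportional to $\boldsymbol{\sigma}$. The gap is largest when a few coordinates dominate, i.e. when $\boldsymbol{\sigma}$ is skewed.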
# 深い単一スパイクと深いReLUネットワークの訓練軌跡は等価か? Are training trajectories of deep single-spike and deep ReLU network equivalent? ( http://arxiv.org/abs/2306.08744v1 ) ライセンス: Link先を確認	Ana Stanojevic, Stanis{\l}aw Wo\'zniak, Guillaume Bellec, Giovanni Cherubini, Angeliki Pantazi and Wulfram Gerstner	(参考訳) 二分数とスパーススパイクによるコミュニケーションは、生物学的脳のエネルギー効率の重要な要素である。しかしながら、バックプロパゲーションによるディープスパイクニューラルネットワーク(SNN)のトレーニングは、ReLUからTTFS(Time-to-first-Spike)SNNへの正確なマッピングアルゴリズムを提供することを考えると、人工知能ニューラルネットワーク(ANN)よりも難しい。これらの結果に基づいて,TTFS-SNNの学習力学を理論的およびシミュレーションで解析する。我々の分析は、SNNをReLUネットワークに正確にマッピングできたとしても、勾配降下によって常に頑健に訓練できないことを強調している。その理由は、それに相当するANNと比較して勾配降下軌道に偏りをもたらす、消滅または爆発する勾配問題の特定の例の出現である。この問題を特定した後、ネットワーク初期化とSNNパラメータ化のための一般的なソリューションを導き、SNNがANNと同等に堅牢にトレーニングできることを保証する。画像分類データセットについて理論的知見を実際に示す。提案手法は,CIFAR10上での深部ConvNetsと同じ精度を実現し,さらに大きなPLACES365データセットをANNと比較して精度を損なうことなく微調整することができる。変換の観点とSNNの頑健な勾配勾配による微調整の組み合わせは、低レイテンシとノイズや量子化に対するレジリエンスを必要とするハードウェア実装において、SNNを最適化することが決定的に重要であると我々は主張する。 Communication by binary and sparse spikes is a key factor for the energy efficiency of biological brains. However, training deep spiking neural networks (SNNs) with backpropagation is harder than with artificial neural networks (ANNs), which is puzzling given that recent theoretical results provide exact mapping algorithms from ReLU to time-to-first-spike (TTFS) SNNs. Building upon these results, we analyze in theory and in simulation the learning dynamics of TTFS-SNNs. Our analysis highlights that even when an SNN can be mapped exactly to a ReLU network, it cannot always be robustly trained by gradient descent. The reason for that is the emergence of a specific instance of the vanishing-or-exploding gradient problem leading to a bias in the gradient descent trajectory in comparison with the equivalent ANN. After identifying this issue we derive a generic solution for the network initialization and SNN parameterization which guarantees that the SNN can be trained as robustly as its ANN counterpart. 
Our theoretical findings are illustrated in practice on image classification datasets. Our method achieves the same accuracy as deep ConvNets on CIFAR10 and enables fine-tuning on the much larger PLACES365 dataset without loss of accuracy compared to the ANN. We argue that the combined perspective of conversion and fine-tuning with robust gradient descent in SNN will be decisive to optimize SNNs for hardware implementations needing low latency and resilience to noise and quantization.	翻訳日:2023-06-16 17:21:21 公開日:2023-06-14
# 水中モノクロSLAMの課題の検討 Investigation of the Challenges of Underwater-Visual-Monocular-SLAM ( http://arxiv.org/abs/2306.08738v1 ) ライセンス: Link先を確認	Michele Grimaldi, David Nakath, Mengkun She, Kevin K\"oser	(参考訳) 本稿では,水中ロボットにおける単眼視覚同時配置マッピング法(vSLAM)の課題について,包括的に検討する。過去10年間、視覚データを利用する状態推定手法では大きな進歩を遂げてきたが、ほとんどの評価は、印象的な性能を示す屋内および都市環境の制御に限定されている。しかし、水や光の条件、ロボットの経路、深さなどの要因がアルゴリズムの性能に大きな影響を及ぼす水中シナリオのような、非常に困難な環境では、これらの手法は広くテストされていない。そこで,実世界のAUVシナリオと,正確な外部参照を提供する実験室設定で評価を行った。水の光学特性や照明シナリオなどの環境条件が単眼vslam法の性能に与える影響を理解することに焦点が当てられている。この目的を達成するために,まず,水中環境におけるすべての手法の動作が良好であることを示し,次いで水中環境における性能の低下を示す。本研究の最終目標は,これらの条件下でのSLAM法の精度とロバスト性を向上させる技術を明らかにすることである。この目的を達成するために,slam法で使用される入力画像の品質向上,特に散乱媒体の視認性および極端な照明シナリオにおける画像強調技術の可能性について検討する。本研究では,水中環境下での単分子SLAM法の性能向上を図るため,キャリブレーション操作と簡単な画像復元技術に関する最初の評価を行った。 In this paper, we present a comprehensive investigation of the challenges of Monocular Visual Simultaneous Localization and Mapping (vSLAM) methods for underwater robots. While significant progress has been made in state estimation methods that utilize visual data in the past decade, most evaluations have been limited to controlled indoor and urban environments, where impressive performance was demonstrated. However, these techniques have not been extensively tested in extremely challenging conditions, such as underwater scenarios where factors such as water and light conditions, robot path, and depth can greatly impact algorithm performance. Hence, our evaluation is conducted in real-world AUV scenarios as well as laboratory settings which provide precise external reference. A focus is laid on understanding the impact of environmental conditions, such as optical properties of the water and illumination scenarios, on the performance of monocular vSLAM methods. To this end, we first show that all methods perform very well in in-air settings and subsequently show the degradation of their performance in challenging underwater environments. 
The final goal of this study is to identify techniques that can improve the accuracy and robustness of SLAM methods in such conditions. To achieve this goal, we investigate the potential of image enhancement techniques to improve the quality of input images used by the SLAM methods, specifically in low visibility and extreme lighting scenarios in scattering media. We present a first evaluation of calibration maneuvers and simple image restoration techniques to determine their ability to enable or enhance the performance of monocular SLAM methods in underwater environments.	Translated: 2023-06-16 17:20:53 Published: 2023-06-14
# LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ( http://arxiv.org/abs/2306.08736v1 ) License: see link	Linfeng Yuan, Miaojing Shi, Zijie Yue	Referring video object segmentation (RVOS) aims to segment the target instance referred to by a given text expression in a video clip. The text expression normally contains sophisticated descriptions of the instance's appearance, actions, and relations with others. It is therefore rather difficult for an RVOS model to capture all these attributes correspondingly in the video; in fact, the model often favours the action- and relation-related visual attributes of the instance. This can end up with incomplete or even incorrect mask prediction of the target instance. In this paper, we tackle this problem by taking a subject-centric short text expression from the original long text expression. The short one retains only the appearance-related information of the target instance so that we can use it to focus the model's attention on the instance's appearance. We let the model make joint predictions using both long and short text expressions and introduce a long-short predictions intersection loss to align the joint predictions.
Besides the improvement on the linguistic part, we also introduce a forward-backward visual consistency loss, which utilizes optical flows to warp visual features between the annotated frames and their temporal neighbors for consistency. We build our method on top of two state-of-the-art transformer-based pipelines for end-to-end training. Extensive experiments on the A2D-Sentences and JHMDB-Sentences datasets show impressive improvements from our method.	Translated: 2023-06-16 17:20:27 Published: 2023-06-14
# WavPool: A New Block for Deep Neural Networks ( http://arxiv.org/abs/2306.08734v1 ) License: see link	Samuel D. McDermott, M. Voetberg, Brian Nord	Modern deep neural networks comprise many operational layers, such as dense or convolutional layers, which are often collected into blocks. In this work, we introduce a new, wavelet-transform-based network architecture that we call the multi-resolution perceptron: by adding a pooling layer, we create a new network block, the WavPool. The first step of the multi-resolution perceptron is transforming the data into its multi-resolution decomposition form by convolving the input data with filters of fixed coefficients but increasing size. Following image processing techniques, we are able to make scale and spatial information simultaneously accessible to the network without increasing the size of the data vector. WavPool outperforms a similar multilayer perceptron while using fewer parameters, and outperforms a comparable convolutional neural network by ~10% in relative accuracy on CIFAR-10.	Translated: 2023-06-16 17:20:03 Published: 2023-06-14
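The multi-resolution decomposition mentioned in the WavPool abstract can be pictured with a generic Haar wavelet transform. The sketch below is only an illustration under stated assumptions, not the authors' layer: the function name `haar_decompose`, the choice of the Haar wavelet, and the restriction to 1-D inputs whose length is a power of two are all hypothetical. It shows how fixed-coefficient filters applied at growing scales expose coarse and fine information while the output keeps the input's length.

```python
import math


def haar_decompose(signal):
    """Full Haar wavelet decomposition of a sequence whose length is a power of two.

    Returns a list of the same length as the input, laid out as
    [overall approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    approx = [float(x) for x in signal]
    details = []  # detail coefficients, coarsest level first
    root2 = math.sqrt(2.0)

    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Fixed coefficients (+1/sqrt(2), -1/sqrt(2)) give the local difference.
        detail = [(a - b) / root2 for a, b in pairs]
        # Fixed coefficients (+1/sqrt(2), +1/sqrt(2)) give the local average.
        approx = [(a + b) / root2 for a, b in pairs]
        # Each pass doubles the span of input samples a coefficient covers.
        details.insert(0, detail)

    out = list(approx)
    for level in details:
        out.extend(level)
    return out
```

Because this transform is orthonormal, the sum of squares of the input is preserved in the coefficients. How WavPool passes such coefficients through dense and pooling layers is not described in the abstract and is not modelled here.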
# Continuous Learning Based Novelty Aware Emotion Recognition System ( http://arxiv.org/abs/2306.08733v1 ) License: see link	Mijanur Palash, Bharat Bhargava	Current works in human emotion recognition follow the traditional closed learning approach governed by rigid rules without any consideration of novelty. Classification models are trained on some collected datasets and expected to have the same data distribution in the real-world deployment. Due to the fluid and constantly changing nature of the world we live in, it is possible to encounter unexpected and novel sample distributions which can lead the model to fail. Hence, in this work, we propose a continuous learning based approach to deal with novelty in the automatic emotion recognition task.	Translated: 2023-06-16 17:19:46 Published: 2023-06-14
# EPIC Fields: Marrying 3D Geometry and Video Understanding ( http://arxiv.org/abs/2306.08731v1 ) License: see link	Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi	Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on modelling problems. We illustrate the challenge of photogrammetry in egocentric videos of dynamic actions and propose innovations to address them. Compared to other neural rendering datasets, EPIC Fields is better tailored to video understanding because it is paired with labelled action segments and the recent VISOR segment annotations. To further motivate the community, we also evaluate two benchmark tasks in neural rendering and segmenting dynamic objects, with strong baselines that showcase what is not possible today.
We also highlight the advantage of geometry in semi-supervised video object segmentation on the VISOR annotations. EPIC Fields reconstructs 96% of videos in EPIC-KITCHENS, registering 19M frames in 99 hours recorded in 45 kitchens.	Translated: 2023-06-16 17:19:38 Published: 2023-06-14
# Generalizable One-shot Neural Head Avatar ( http://arxiv.org/abs/2306.08768v1 ) License: see link	Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz	We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image. Existing methods either involve time-consuming optimization for a specific person with multiple images, or they struggle to synthesize intricate appearance details beyond the facial region. To address these limitations, we propose a framework that not only generalizes to unseen identities based on a single-view image without requiring person-specific optimization, but also captures characteristic details within and beyond the face area (e.g., hairstyle and accessories). At the core of our method are three branches that produce three tri-planes representing the coarse 3D geometry, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to the combination of the three tri-planes followed by a super-resolution module, our method yields a high fidelity image of the desired identity, expression and pose. Once trained, our model enables efficient 3D head avatar reconstruction and animation via a single forward pass through a network. Experiments show that the proposed approach generalizes well to unseen validation datasets, surpassing SOTA baseline methods by a large margin on head avatar reconstruction and animation.	
Translated: 2023-06-16 17:13:55 Published: 2023-06-14
# A Copernican Revolution in Data ( http://arxiv.org/abs/2306.08766v1 ) License: see link	Claudio Gutierrez	Half a century ago, Charles Bachman foresaw the significance and centrality of data in the digital world. In this short paper, we delve into the evolution of these ideas within the database community over the past decades. We believe that this historical analysis helps deepen our comprehension of the fundamental changes our discipline is undergoing and provides insights into the future trajectory of our field.	Translated: 2023-06-16 17:13:33 Published: 2023-06-14
# Hybrids of Constraint-based and Noise-based Algorithms for Causal Discovery from Time Series ( http://arxiv.org/abs/2306.08765v1 ) License: see link	Charles K. Assaad, Daria Bystrova, Julyan Arbel, Emilie Devijver, Eric Gaussier, Wilfried Thuiller	Constraint-based and noise-based methods have been proposed to discover summary causal graphs from observational time series under strong assumptions which can be violated or impossible to verify in real applications. Recently, a hybrid method (Assaad et al., 2021) that combines these two approaches proved to be robust to assumption violation. However, this method assumes that the summary causal graph is acyclic, whereas cycles are common in many applications. For example, in ecological communities, there may be cyclic relationships between predator and prey populations, creating feedback loops. Therefore, this paper presents two new frameworks for hybrids of constraint-based and noise-based methods that can discover summary causal graphs that may or may not contain cycles. For each framework, we provide two hybrid algorithms which are experimentally tested on simulated data, realistic ecological data, and real data from various applications. Experiments show that our hybrid approaches are robust and yield good results over most datasets.	Translated: 2023-06-16 17:13:28 Published: 2023-06-14
# Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information ( http://arxiv.org/abs/2306.08762v1 ) License: see link	Ming Shi, Yingbin Liang, and Ness Shroff	Partially observable Markov decision processes (POMDPs) have been widely applied to capture many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much hindsight state information (HSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full HSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy solution for POMDPs. Nonetheless, from the key insights in our lower-bound construction, we find that there exist important tractable classes of POMDPs even with partial HSI. In particular, for two novel classes of POMDPs with partial HSI, we provide new algorithms that are shown to be near-optimal by establishing new upper and lower bounds.	Translated: 2023-06-16 17:13:10 Published: 2023-06-14
# Fano resonances for tilted linear and quadratic band touching dispersions in a harmonically driven potential well ( http://arxiv.org/abs/2306.08759v1 ) License: see link	Anton Gregefalk, Annica Black-Schaffer, Tanay Nag	Considering models with tilted linear and quadratic band touching dispersions, we analyze the effect of the transverse linear tilt on the transmission spectra through a harmonically driven potential well oriented longitudinally. Employing the Floquet scattering matrix formalism, we find Fano resonances as an outcome of matching between the Floquet sidebands and quasi-bound states, where the tilt renormalizes their energies and wave vectors. We find that the Fano resonance energy decreases (increases) for linear (quadratic) band touchings as the magnitude of the transverse momentum increases, indicating a distinct signature of the underlying band dispersion in the transmission profile. The sign of the product of the transverse momentum and the tilt also determines the relative shift in the Fano resonance energy with respect to the untilted case for both band dispersions, suggesting a possible tunability of the Fano resonance for tilted systems. Importantly, the tilt strength can also be directly determined by measuring the Fano resonance energy as a function of the transverse momentum direction.
We furthermore study the shot noise spectra and their differential property where we find an inflection region and undulation, respectively, around the Fano resonance energy. Interestingly, differential shot noise and transmission spectra both qualitatively behave in a similar fashion and might thus serve as important observables for future experiments on driven solid-state systems.	Translated: 2023-06-16 17:12:53 Published: 2023-06-14
# InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models ( http://arxiv.org/abs/2306.08757v1 ) License: see link	Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov	While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning. Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables, which improves latent space quality and prevents the latents from being ignored by expressive diffusion-based decoders. Empirically, we find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods, while retaining the high sample quality of diffusion models. Our method enables manipulating the attributes of generated images and has the potential to assist tasks that require exploring a learned latent space to generate quality samples, e.g., generative design.	Translated: 2023-06-16 17:12:27 Published: 2023-06-14
# Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models ( http://arxiv.org/abs/2306.08756v1 ) License: see link	Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza	Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Variations of masking during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe to obtain both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.	Translated: 2023-06-16 17:12:09 Published: 2023-06-14
# ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators ( http://arxiv.org/abs/2306.08754v1 ) License: see link	Sungduk Yu, Walter M. Hannah, Liran Peng, Mohamed Aziz Bhouri, Ritwik Gupta, Jerry Lin, Björn Lütjens, Justus C. Will, Tom Beucler, Bryce E. Harrop, Benjamin R. Hillman, Andrea M. Jenney, Savannah L. Ferretti, Nana Liu, Anima Anandkumar, Noah D. Brenowitz, Veronika Eyring, Pierre Gentine, Stephan Mandt, Jaideep Pathak, Carl Vondrick, Rose Yu, Laure Zanna, Ryan P. Abernathey, Fiaz Ahmed, David C. Bader, Pierre Baldi, Elizabeth A. Barnes, Gunnar Behrens, Christopher S. Bretherton, Julius J. M. Busecke, Peter M. Caldwell, Wayne Chuang, Yilun Han, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Karthik Kashinath, Marat Khairoutdinov, Thorsten Kurth, Nicholas J. Lutsko, Po-Lun Ma, Griffin Mooers, J. David Neelin, David A. Randall, Sara Shamekh, Akshay Subramaniam, Mark A. Taylor, Nathan M. Urban, Janni Yuval, Guang J. Zhang, Tian Zheng, Michael S. Pritchard	Modern climate projections lack adequate spatial and temporal resolution due to computational constraints.
A consequence is inaccurate and imprecise prediction of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.	Translated: 2023-06-16 17:11:48 Published: 2023-06-14
# Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources ( http://arxiv.org/abs/2306.08753v1 ) License: see link	Kunal Dhawan, Dima Rekesh, Boris Ginsburg	Multilingual Automatic Speech Recognition (ASR) models are capable of transcribing audio across multiple languages, eliminating the need for separate models. In addition, they can perform Language Identification (LID) and handle code-switched speech. However, training these models requires special code-switched and multilingual speech corpora, which are sparsely available. In this paper, we evaluate different approaches towards training of bilingual as well as code-switched ASR models using purely monolingual data sources. We introduce the concept of aggregate tokenizers, which differs from the currently prevalent technique of generating LIDs at the boundaries of monolingual samples and instead produces an LID for each emitted token. We compare bilingual and monolingual model performance, showcase the efficacy of aggregate tokenizers, present a synthetic code-switched ASR data generation technique and demonstrate the effectiveness of the proposed code-switched ASR models for the tasks of speech recognition and spoken language identification.	Translated: 2023-06-16 17:11:13 Published: 2023-06-14
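One way to picture per-token language identification is a tokenizer that concatenates monolingual vocabularies and shifts each vocabulary's token IDs by an offset, so that a token's ID range reveals its language. The abstract does not describe the mechanism, so this offset scheme is an assumption, and the class `ToyAggregateTokenizer`, its methods, and the word-level vocabularies below are hypothetical names chosen for illustration only.

```python
class ToyAggregateTokenizer:
    """Toy aggregate tokenizer: one shared ID space built from per-language vocabularies."""

    def __init__(self, vocabs):
        # vocabs maps a language code to its list of tokens;
        # the position of a token in the list is its local ID.
        self.token_to_id = {}
        self.ranges = []  # (first_id, one_past_last_id, language)
        offset = 0
        for lang, vocab in vocabs.items():
            self.token_to_id[lang] = {tok: offset + i for i, tok in enumerate(vocab)}
            self.ranges.append((offset, offset + len(vocab), lang))
            offset += len(vocab)

    def encode(self, tokens, lang):
        """Map tokens of a known language to IDs in the shared ID space."""
        return [self.token_to_id[lang][tok] for tok in tokens]

    def lang_of(self, token_id):
        """Recover the language of a single token from its ID range."""
        for start, end, lang in self.ranges:
            if start <= token_id < end:
                return lang
        raise ValueError(f"token id {token_id} is outside every vocabulary range")


# A code-switched utterance assembled from monolingual pieces.
tokenizer = ToyAggregateTokenizer({"en": ["hello", "world"], "es": ["hola", "mundo"]})
ids = tokenizer.encode(["hello"], "en") + tokenizer.encode(["mundo"], "es")
per_token_lid = [tokenizer.lang_of(i) for i in ids]
```

In this toy setup every emitted ID carries its own language label, in contrast to a single language tag placed at the boundary of a monolingual sample.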
# Improving Selective Visual Question Answering by Learning from Your Peers ( http://arxiv.org/abs/2306.08751v1 ) License: see link	Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach	Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.g., VQA assistants for users with visual impairments). For such scenarios, abstention can be especially important as users may provide out-of-distribution (OOD) or adversarial inputs that make incorrect answers more likely. In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data.
The goal is to maximize the number of questions answered while minimizing the risk of error on those questions. We propose a simple yet effective Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions. Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model. It does not require additional manual labels or held-out data and provides a signal for identifying examples that are easy/difficult to generalize to. In our extensive evaluations, we show this benefits a number of models across different architectures and scales. Overall, for ID, we reach 32.92% in the selective prediction metric coverage at 1% risk of error (C@1%), which doubles the previous best coverage of 15.79% on this task. For mixed ID/OOD, using models' softmax confidences for abstention decisions performs very poorly, answering <5% of questions at 1% risk of error even when faced with only 10% OOD examples, but a learned selection function with LYP can increase that to 25.38% C@1%.	Translated: 2023-06-16 17:10:56 Published: 2023-06-14
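The coverage-at-risk metric quoted above (C@1%) can be made concrete with a short sketch. It rests on simplifying assumptions that are not taken from the paper: each answer is scored as simply right or wrong, the model answers its most confident questions first, and ties in confidence are broken arbitrarily. The function name `coverage_at_risk` is hypothetical.

```python
from typing import Sequence


def coverage_at_risk(
    confidences: Sequence[float],
    correct: Sequence[bool],
    max_risk: float,
) -> float:
    """Largest fraction of questions that can be answered while the error rate
    among the answered questions stays at or below max_risk."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Answer the most confident questions first; abstain on the rest.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)

    best_coverage = 0.0
    errors = 0
    for answered, idx in enumerate(order, start=1):
        if not correct[idx]:
            errors += 1
        # Risk is the error rate over the questions answered so far.
        if errors / answered <= max_risk:
            best_coverage = answered / n
    return best_coverage
```

With five questions of which only the third most confident answer is wrong, a risk budget of zero allows answering just the top two questions (coverage 0.4), whereas a budget of 25% allows answering all five. A better selection function raises coverage by ranking wrong answers lower, which is the quantity the abstract reports improving.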
# Density-Aware Reinforcement Learning to Optimise Energy Efficiency in UAV-Assisted Networks ( http://arxiv.org/abs/2306.08785v1 ) License: see link	Babatunji Omoniwa, Boris Galkin, Ivana Dusparic	Unmanned aerial vehicles (UAVs) serving as aerial base stations can be deployed to provide wireless connectivity to mobile users, such as vehicles. However, the density of vehicles on roads often varies spatially and temporally, primarily due to mobility and traffic situations in a geographical area, making it difficult to provide ubiquitous service. Moreover, as energy-constrained UAVs hover in the sky while serving mobile users, they may be faced with interference from nearby UAV cells or other access points sharing the same frequency band, thereby impacting the system's energy efficiency (EE). Recent multi-agent reinforcement learning (MARL) approaches applied to optimise the users' coverage worked well in reasonably even densities but might not perform as well in uneven users' distribution, i.e., in urban road networks with uneven concentration of vehicles. In this work, we propose a density-aware communication-enabled multi-agent decentralised double deep Q-network (DACEMAD-DDQN) approach that maximises the total system's EE by jointly optimising the trajectory of each UAV, the number of connected users, and the UAVs' energy consumption while keeping track of dense and uneven users' distribution.
Our result outperforms state-of-the-art MARL approaches in terms of EE by as much as 65% to 85%.	Translated: 2023-06-16 17:01:57 Published: 2023-06-14
# HOSSnet: an Efficient Physics-Guided Neural Network for Simulating Crack Propagation ( http://arxiv.org/abs/2306.08783v1 ) License: see link	Shengyu Chen, Shihang Feng, Yao Huang, Zhou Lei, Xiaowei Jia, Youzuo Lin, Estaben Rougier	Hybrid Optimization Software Suite (HOSS), which is a combined finite-discrete element method (FDEM), is one of the advanced approaches to simulating high-fidelity fracture and fragmentation processes, but the application of pure HOSS simulation is computationally expensive. At the same time, machine learning methods, which have shown tremendous success in several scientific problems, are increasingly being considered promising alternatives to physics-based models in the scientific domains. Thus, our goal in this work is to build a new data-driven methodology to reconstruct the crack fracture accurately in the spatial and temporal fields. We leverage physical constraints to regularize the fracture propagation in the long-term reconstruction. In addition, we introduce perceptual loss and several extra pure machine learning optimization approaches to further improve the reconstruction performance of fracture data. We demonstrate the effectiveness of our proposed method through both extrapolation and interpolation experiments. The results confirm that our proposed method can reconstruct high-fidelity fracture data over space and time in terms of pixel-wise reconstruction error and structural similarity.
Visual comparisons also show promising results in long-term	翻訳日:2023-06-16 17:01:31 公開日:2023-06-14
# 説明可能性の説明:2階説明可能性による深層学習への深い行動可能な洞察に向けて Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability ( http://arxiv.org/abs/2306.08780v1 ) ライセンス: Link先を確認	E. Zhixuan Zeng, Hayden Gunraj, Sheldon Fernandez, Alexander Wong	(参考訳) 説明可能性は、ディープラーニングモデルの振る舞いをより包括的に理解する上で重要な役割を担う。これにより、モデルのパフォーマンスの徹底的な検証が可能になり、その決定が関連する視覚的指標に基づいており、トレーニングデータに存在する無関係なパターンに偏らないことを保証する。しかし、既存のメソッドはインスタンスレベルの説明可能性のみを提供しており、各サンプルを手動で分析する必要がある。このような手作業によるレビューは時間がかかり、人間のバイアスの影響を受けやすい。この問題に対処するため、最近2次説明可能なAI(SOXAI)の概念が提案され、説明可能なAI(XAI)をインスタンスレベルからデータセットレベルまで拡張した。 SOXAIは、一般的な概念を特定することによって、定量的説明とデータセットバイアスの間の関係の分析を自動化する。本研究では,ディープニューラルネットワークの振る舞いに対するこの高レベルな解釈を用いて「説明可能性を説明」し,行動可能な洞察を得ることを検討する。具体的には,SOXAIの行動可能な洞察に基づいてトレーニングセットから無関係な概念を除外することで,モデルの性能を向上できることを,分類とセグメンテーションの事例を通じて初めて示す。 Explainability plays a crucial role in providing a more comprehensive understanding of deep learning models' behaviour. This allows for thorough validation of the model's performance, ensuring that its decisions are based on relevant visual indicators and not biased toward irrelevant patterns existing in training data. However, existing methods provide only instance-level explainability, which requires manual analysis of each sample. Such manual review is time-consuming and prone to human biases. To address this issue, the concept of second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level. SOXAI automates the analysis of the connections between quantitative explanations and dataset biases by identifying prevalent concepts. In this work, we explore the use of this higher-level interpretation of a deep neural network's behaviour to allow us to "explain the explainability" for actionable insights. 
Specifically, we demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.	翻訳日:2023-06-16 17:01:13 公開日:2023-06-14
# MMD-FUSE:データ分割のない2サンプルテストのための学習とカーネルの組み合わせ MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting ( http://arxiv.org/abs/2306.08777v1 ) ライセンス: Link先を確認	Felix Biggs, Antonin Schrab, Arthur Gretton	(参考訳) 本稿では,最大平均離散性(MMD)に基づく2サンプルテストのパワーを最大化する新しい統計法を提案する。有限集合の場合、これは重み付けされたソフトな最大値によってこれらのカーネルのそれぞれの下で(正規化された)MDD値を組み合わせることに還元される。指数濃度境界は、null と alternative の下で提案する統計で証明される。さらに、これらのカーネルをデータ依存だが順列非依存の方法で選択する方法を、適切に調整されたテストで示し、データの分割を避ける。この手法は、一般的な置換に基づくMDDテストに広く適用され、オートエンコーダのような教師なしモデルを用いて学習した機能を持つディープカーネルの使用を含む。我々は,合成低次元および実世界の高次元データに対するMDD-FUSEテストの適用性を強調し,その性能を現状のカーネルテストと比較した。 We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.	翻訳日:2023-06-16 17:00:54 公開日:2023-06-14
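As a rough illustration of the permutation-based MMD testing that the abstract above builds on, the sketch below estimates a squared MMD with a single Gaussian kernel and calibrates it by permutation. It is not the MMD-FUSE statistic: the fixed bandwidth, the biased estimator, the one-dimensional samples, and all function names are assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD for 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def permutation_p_value(xs, ys, bandwidth, num_permutations=200, seed=0):
    """Permutation test: reshuffle the pooled sample and count how often the
    permuted statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mmd_squared(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        permuted = mmd_squared(pooled[:len(xs)], pooled[len(xs):], bandwidth)
        if permuted >= observed:
            count += 1
    # The +1 terms count the observed statistic itself among the permutations.
    return (count + 1) / (num_permutations + 1)
```

A call such as `permutation_p_value(xs, ys, bandwidth=1.0)` returns a p-value for one hand-picked kernel. According to the abstract, MMD-FUSE instead combines normalised MMD values under several kernels through a weighted soft maximum, which removes the need to pick a single bandwidth in advance.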
# 時間依存ハミルトニアンのフォン・ノイマン方程式の量子シミュレーション Quantum simulation of the von Neumann equation of time-dependent Hamiltonians ( http://arxiv.org/abs/2306.08775v1 ) ライセンス: Link先を確認	Alejandro Kunold	(参考訳) 本研究では,時間依存ハミルトニアンに対するフォン・ノイマン方程式によって制御される密度行列のダイナミクスをシミュレートする量子アルゴリズムを開発した。この方法は、与えられたリー代数の構造定数の性質を通して密度行列のベクトル化に依存する。本研究ではパウリ文字列が形成する代数を用いたが、アルゴリズムは他の代数にも容易に適用できる。このアプローチの主な利点の1つは、位相キックバックによって容易に決定できる実密度行列係数が得られることである。このアルゴリズムはIBMノイズ量子回路シミュレータを用いて実証される。 In this work we develop a quantum algorithm to simulate the dynamics of the density matrix governed by the von Neumann equation for time-dependent Hamiltonians. The method relies on the vectorization of the density matrix through the properties of the structure constants of a given Lie algebra. Even though we have used the algebra formed by the Pauli strings, the algorithm can be easily adapted to other algebras. One of the main advantages of this approach is that it yields real density matrix coefficients that are easy to determine through phase kickback. The algorithm is demonstrated using the IBM noisy quantum circuit simulator.	翻訳日:2023-06-16 17:00:37 公開日:2023-06-14
# Katakomba: データ駆動NetHackのツールとベンチマーク Katakomba: Tools and Benchmarks for Data-Driven NetHack ( http://arxiv.org/abs/2306.08772v1 ) ライセンス: Link先を確認	Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov	(参考訳) NetHackは強化学習研究のフロンティアとして知られており、学習ベースの手法は依然としてルールベースのソリューションに追いつく必要がある。ブレークスルーの有望な方向の1つは、ロボット工学やレコメンダシステムなどの最近の開発に類似したデータセットを、オフライン強化学習(orl)の傘下で使用することである。最近、大規模なNetHackデータセットがリリースされた。これは必要なステップだったが、まだORLコミュニティで広く採用されていない。本研究では,ツール・ワイド,実装・ワイド,ベンチマーク・ワイドの3つの大きな障害について論じる。そこで我々は, ORLコミュニティに慣れ親しんだワークフローの基礎を提供するオープンソースライブラリを開発した。D4RLスタイルのタスク, 乱雑なベースライン実装, クラウドに同期した設定とログを備えた信頼性評価ツールである。 NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: tool-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.	翻訳日:2023-06-16 17:00:26 公開日:2023-06-14
# 検索型対話システムのためのテキスト自動エンコーダ ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems ( http://arxiv.org/abs/2306.04357v3 ) ライセンス: Link先を確認	Zhenpeng Su and Xing Wu and Wei Zhou and Guangyuan Ma and Songlin Hu	(参考訳) 対話応答選択は、与えられたユーザとシステム発話履歴に基づいて、複数の候補から適切な応答を選択することを目的としている。近年, 学習後の対話応答選択の精度が向上し, 主にナイーブマスク型言語モデリング手法に依拠している。しかし、最近開発された生成手法は、IRコミュニティにおいて有望なテキスト表現能力を示しており、よりよい対話セマンティクスモデリングにつながる可能性がある。そこで本稿では,対話応答選択のための自動学習手法であるdialog-mae(dialogue context masking auto-encoder)を提案する。 dial-maeは非対称エンコーダ-デコーダアーキテクチャを使用して、対話の意味を対話型ベクトルに圧縮する。 Dial-MAEのプロセスでは、ディープエンコーダがダイアログのコンテキストに埋め込まれたディープエンコーダを作成し、続いて浅層デコーダが、この埋め込みとマスキングされた応答を使って元の応答を復元する。実験の結果,dial-maeは2つのベンチマークで最先端の性能を得られた。 Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history. Recent studies have been improving the accuracy of dialogue response selection through post-training, mostly relying on naive masked language modeling methods. However, the recently developed generative methods have shown promising text representation capabilities in IR community, which could potentially lead to better dialogue semantics modeling. Thus, in this paper, we propose Dial-MAE (Dialogue Contextual Masking Auto-encoder), a straightforward yet effective post-training technique tailored for dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture that learns to better compress the semantics of the dialogue into dialogue-dense vectors. The process of Dial-MAE involves a deep encoder creating a dialogue embedding with the masked dialogue context, followed by a shallow decoder that uses this embedding along with the highly masked response to restore the original response. Our experiments have demonstrated that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.	翻訳日:2023-06-16 11:10:06 公開日:2023-06-14
# 大規模言語モデルにおける言語の影響を探る: GPT-3.5を用いた検討 Exploring the Influence of Language on Time-Reward Perceptions in Large Language Models: A Study Using GPT-3.5 ( http://arxiv.org/abs/2305.02531v3 ) ライセンス: Link先を確認	Ali Goli, Amandeep Singh	(参考訳) 言語は時間と報酬に対する認識に強い影響を与えます。これは、大きな言語モデルが、異なる言語で同じ質問をするときに、時間とともに報酬に対する異なる好みを示し、その選択が人間のものと似ているかどうかという疑問を提起する。本研究では,複数の言語におけるプロンプトに対するgpt-3.5(以下gptと呼ぶ)の反応を分析し,より小さく,より早い報酬とより大きな後続報酬の選好について検討した。以上の結果から, GPTはドイツ語やマンダリンなどの言語において, 英語やフランス語のような強いFTRを持つ言語と比較して, FTRが弱い言語において, より忍耐力を示すことが示された。これらの知見は既存の文献と一致しており、GPTの選択と話者の好みの相関関係が示唆されている。しかし、さらなる分析により、早期または後期の報酬の選好は、報酬ギャップによって体系的に変化せず、早期の支払いに対する語彙選好を示すことが明らかとなった。 GPTは言語間の興味深いバリエーションを捉えることができるが、これらのモデルによる選択は人間の意思決定者とは一致しない。 Language has a strong influence on our perceptions of time and rewards. This raises the question of whether large language models, when asked the same question in different languages, show different preferences for rewards over time and if their choices are similar to those of humans. In this study, we analyze the responses of GPT-3.5 (hereafter referred to as GPT) to prompts in multiple languages, exploring preferences between smaller, sooner rewards and larger, later rewards. Our results show that GPT displays greater patience when prompted in languages with weak future tense references (FTR), such as German and Mandarin, compared to languages with strong FTR, like English and French. These findings are consistent with the existing literature and suggest a correlation between GPT's choices and the preferences of speakers of these languages. However, further analysis reveals that the preference for earlier or later rewards does not systematically change with reward gaps, indicating a lexicographic preference for earlier payments. While GPT may capture intriguing variations across languages, our findings indicate that the choices made by these models do not correspond to those of human decision-makers.	翻訳日:2023-06-16 11:08:38 公開日:2023-06-14
# Co-MLを用いた家族による協調型機械学習モデルの構築 Collaborative Machine Learning Model Building with Families Using Co-ML ( http://arxiv.org/abs/2304.05444v3 ) ライセンス: Link先を確認	Tiffany Tseng, Jennifer King Chen, Mona Abdelrahman, Mary Beth Kery, Fred Hohman, Adriana Hilliard, R. Benjamin Shapiro	(参考訳) 既存の初心者フレンドリーな機械学習(ML)モデリングツールは、単一のユーザエクスペリエンスを中心に、単一のユーザが自身のデータのみを収集してモデルを構築する。しかし、単体モデリングの経験は、学習者が一緒に働くときに起こりうる代替のアイデアやアプローチに遭遇する貴重な機会を制限している。この問題に対処するため、私たちはCo-MLを開発した。これはタブレットベースのアプリで、学習者がエンドツーエンドの反復的なモデル構築プロセスを通じてML画像分類器を共同構築する。本稿では,家庭での入門的なML活動にCo-MLを用いた家族(11歳と14歳の子ども2人とその両親)の詳細なケーススタディを行い,協調モデリングの実現可能性と潜在的豊かさについて述べる。我々は、Co-MLシステム設計を共有し、コラボレーティブアクティビティにおけるCo-MLの使用によって、初心者がデータ多様性、クラス不均衡、データ品質といった以前の作業で不足していたデータセット設計の考察をまとめて行うことができるかについて議論する。個人が異なるモデル構築責任を負う分散協調プロセスが、子供や大人がMLデータセット設計を学ぶためのリッチなコンテキストを提供する方法について論じる。 Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, they often preclude encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML -- a tablet-based app for learners to collaboratively build ML image classifiers through an end-to-end, iterative model-building process. In this paper, we illustrate the feasibility and potential richness of collaborative modeling by presenting an in-depth case study of a family (two children, 11 and 14 years old, working with their parents) using Co-ML in a facilitated introductory ML activity at home. 
We share the Co-ML system design and contribute a discussion of how using Co-ML in a collaborative activity enabled beginners to collectively engage with dataset design considerations underrepresented in prior work such as data diversity, class imbalance, and data quality. We discuss how a distributed collaborative process, in which individuals can take on different model-building responsibilities, provides a rich context for children and adults to learn ML dataset design.	翻訳日:2023-06-16 11:08:15 公開日:2023-06-14
# 非保存拡散過程の非平衡ダイナミクス Nonequilibrium dynamics of nonconservative diffusion processes ( http://arxiv.org/abs/2302.10154v4 ) ライセンス: Link先を確認	P. Garbaczewski, M. Żaba	(参考訳) 非保存的ドリフト場を持つ拡散過程のフォッカー・プランク作用素は次元$N\geq 2$で、非エルミート電磁型ハミルトン運動発生器と直接関連付けられる。確率密度の誘導非平衡力学は、フォッカー・プランク方程式の経路積分解の問題に向けられ、量子プロパゲータの既知の正確な経路積分式を実時間とユークリッド時間に再検討し、これらをフォッカー・プランクが引き起こす遷移確率密度関数に含める。以下では、確率拡散過程のダイナミクスに対する磁気的(または磁気的に見える)影響の形式的かつ概念的に異なる実装に遭遇する、$N=3$の「磁気糸」に従う。 Fokker-Planck operators of diffusion processes with nonconservative drift fields, in dimension $N\geq 2$, can be directly related with non-Hermitian electromagnetic-type Hamiltonian generators of motion. The induced nonequilibrium dynamics of probability densities points towards an issue of path integral solutions of the Fokker-Planck equation, and calls for revisiting links between known exact path integral formulas for quantum propagators in real and Euclidean time, with those for Fokker-Planck-induced transition probability density functions. 
Below, we follow the $N=3$ "magnetic thread", within which one encounters formally and conceptually distinct implementations of the magnetic (or magnetic-looking) impact on the dynamics of stochastic diffusion processes. That includes the "magnetic affinity" of nonconservative diffusion processes, the classic Brownian motion of charged particles in the (electro)magnetic field, so-called Euclidean quantum mechanics involving non-Hermitian magnetic-type Hamiltonians, and path integral evaluation of integral kernels of Schrödinger semigroups with a minimal electromagnetic coupling (encoded in their Hermitian generators). Our main objective is to go beyond the lore of magnetic analogies/affinities. We aim at detecting deeper interrelations between "magnetically affine" approaches, while clearly discriminating between the classic Lorentz or magnetic forcing in the Brownian motion of charged particles, quantum methods of incorporating electromagnetism, and potentially useful electromagnetic analogies ("surrogate magnetism") in the dynamics of diffusion processes.	翻訳日:2023-06-16 11:07:48 公開日:2023-06-14
# NF4は情報理論的に最適ではない(そしてそれは良いことだ) NF4 Isn't Information Theoretically Optimal (and that's Good) ( http://arxiv.org/abs/2306.06965v2 ) ライセンス: Link先を確認	Davis Yoshida	(参考訳) このノートは、Dettmers et al., 2023で使われているabsmaxベースのブロックワイズ量子化に関するいくつかの単純な計算と実験を共有している。提案されたNF4データ型は、正規分布する重みを表すのに情報理論的に最適であると言われている。私は、量子化すべき値の分布がブロックサイズに依存するため、これが厳密には成り立たないことを示す。私はこれらの洞察を応用して、分位点ベースの手法ではなく、期待されるL1再構成誤差の最小化に基づいて、改善されたコードを導き出そうと試みる。これにより、より大きな量子化ブロックサイズでの性能が向上し、より小さなブロックサイズでは両方のコードが同程度に動作する。 This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile-based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.	翻訳日:2023-06-16 11:04:10 公開日:2023-06-14
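The block-size dependence described in the note above can be seen in a small experiment. The following is a minimal sketch of absmax blockwise quantization, assuming a made-up uniform 16-level code rather than the actual NF4 code; the function names are hypothetical and nothing here is taken from Dettmers et al. or from the note's own experiments.

```python
import random

# Hypothetical 16-level code, evenly spaced on [-1, 1]. This is NOT the NF4 code.
UNIFORM_CODE = [-1.0 + 2.0 * i / 15 for i in range(16)]


def absmax_quantize(weights, code, block_size):
    """Scale each block by its largest absolute value, then store the index
    of the nearest code entry for every scaled weight."""
    indices, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # all-zero block: avoid dividing by 0
        scales.append(scale)
        for w in block:
            x = w / scale  # scaled value, always in [-1, 1]
            indices.append(min(range(len(code)), key=lambda i: abs(code[i] - x)))
    return indices, scales


def dequantize(indices, scales, code, block_size):
    """Map code indices back to weights using the per-block scales."""
    return [code[idx] * scales[pos // block_size] for pos, idx in enumerate(indices)]


def mean_abs_scaled(block_size, n=4096, seed=0):
    """Average |w / absmax| over standard normal weights for one block size."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = 0.0
    for start in range(0, n, block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0
        total += sum(abs(w) / scale for w in block)
    return total / n
```

Comparing `mean_abs_scaled(2)` with `mean_abs_scaled(4096)` shows the scaled values concentrating nearer zero as blocks grow, because the absmax of a larger block tends to be larger. This is consistent with the note's point that the distribution to be quantized depends on the block size, so a single fixed code cannot be optimal for every block size.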
# TASRA:AIによる社会規模リスクの分類と分析 TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI ( http://arxiv.org/abs/2306.06924v2 ) ライセンス: Link先を確認	Andrew Critch and Stuart Russell	(参考訳) 近年のいくつかの研究で、人工知能による人類に対する社会規模および絶滅レベルのリスクが特定されているが、そのようなリスクを徹底的に分類する試みは、ほとんどない。多くの抜本的な分類が可能であり、特に新しいリスクや安全性への実践的なアプローチを明らかにする場合に有用である。本稿では,リスクに繋がる行動,アクターは一体化されているか,意図的かという,説明責任に基づく分類について考察する。また、多くのAIシステムの予期せぬ相互作用から生じるリスクや、技術的なソリューションとポリシーの複合が示される故意の誤用によるリスクなど、さまざまなリスクタイプがどのように機能するかを説明する物語も提供します。 While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an {\em exhaustive taxonomy} of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.	翻訳日:2023-06-16 11:03:25 公開日:2023-06-14
# 視覚トランスフォーマによる胸部x線画像の解析によるcovid-19診断の促進 Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images ( http://arxiv.org/abs/2306.06914v2 ) ライセンス: Link先を確認	Sultan Zavrak	(参考訳) 2019年の新型コロナウイルス(covid-19)の出現は、世界的健康危機を招き、様々な診断方法を通じて個人の病気の特定を必要としている。放射線画像、特にX線画像の展開は、COVID-19の検出とキャラクタリゼーションにおいて重要な手段として認識されている。近年の研究では、X線画像中のウイルスに関する貴重な知見が明らかにされており、人工知能(AI)技術を利用した診断精度の向上を目的とした方法論の探索が進められている。現在の研究は、生の胸部x線画像、特にvit(pre-trained vision transformer)モデルを微調整することで、covid-19の自動診断のための革新的な枠組みを想定している。開発したモデルでは, 2つの分類性能, 通常の症例からcovid-19を識別する, 3つの分類性能, 肺炎および正常例からcovid-19を識別する, および4つの分類性能, 細菌性肺炎, ウイルス性肺炎, および正常な条件を識別し, それぞれ異なるデータセットを用いて評価した。提案したモデルでは,2次分類では99.92%,99.84%,3次分類では97.95%,86.48%,それぞれ4次分類では86.81%という異常な精度が得られた。 The advent of 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases, as well as their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances, and lastly, their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. 
The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification, respectively, on the respective datasets.	翻訳日:2023-06-16 11:02:45 公開日:2023-06-14
# HEOM.jl:開量子系における運動の階層方程式のための効率的なジュリアフレームワーク HEOM.jl: An efficient Julia framework for hierarchical equations of motion in open quantum systems ( http://arxiv.org/abs/2306.07522v2 ) ライセンス: Link先を確認	Yi-Te Huang, Po-Chen Kuo, Neill Lambert, Mauro Cirio, Simon Cross, Shen-Liang Yang, Franco Nori, Yueh-Nan Chen	(参考訳) 我々は,複数のボソニック環境とフェルミオン環境に同時に結合したシステムの縮約ダイナミクスに対する階層的運動方程式(HEOM)を積分するためのJuliaフレームワークである「Heom.jl」というオープンソースソフトウェアパッケージを紹介する。 Heom.jlは、ボゾンスペクトルとフェルミオンスペクトル、定常状態、および全ての補助密度作用素(ADO)の拡張空間におけるフルダイナミックスを計算する方法の集合である。 ADOのマルチインデックスの必要な処理は、ユーザフレンドリーなインターフェースによって実現される。 2つのフェルミオンリザーバと相互作用する1つの不純物(アンダーソンモデル)と、1つのボゾンリザーバおよび2つのフェルミオンリザーバと相互作用する超強結合電荷キャビティ系を解析することにより、パッケージの機能性を実証する。 Heom.jl は、このパッケージの基礎となった Quantum Toolbox in Python (QuTiP) の対応するメソッドと比較して、HEOM Liouvillian 超演算子の構築、およびすべての ADO に対するダイナミクスと定常状態の求解において、1桁の高速化を可能にする。 We introduce an open-source software package called "Heom.jl", a Julia framework to integrate the hierarchical equations of motion (HEOM) for the reduced dynamics of a system simultaneously coupled to multiple bosonic and fermionic environments. Heom.jl features a collection of methods to compute bosonic and fermionic spectra, stationary states, and the full dynamics in the extended space of all auxiliary density operators (ADOs). The required handling of the ADOs multi-indexes is achieved through a user-friendly interface. We exemplify the functionalities of the package by analyzing a single impurity interacting with two fermionic reservoirs (Anderson model), and an ultra-strongly coupled charge-cavity system interacting with one bosonic and two fermionic reservoirs. Heom.jl allows for an order of magnitude speedup in the construction of the HEOM Liouvillian superoperator, solving dynamics and stationary states for all ADOs, with respect to the corresponding method in the Quantum Toolbox in Python (QuTiP), upon which this package is founded.	翻訳日:2023-06-16 10:53:24 公開日:2023-06-14
# グラフェンで強化した単一イオン検出器によるダイヤモンド表面近傍への決定論的ドーパント注入 Graphene-Enhanced Single Ion Detectors for Deterministic Near-Surface Dopant Implantation in Diamond ( http://arxiv.org/abs/2306.07496v2 ) ライセンス: Link先を確認	Nicholas F. L. Collins, Alexander M. Jakob, Simon G. Robson, Shao Qi Lim, Paul Räcke, Brett C. Johnson, Boqing Liu, Yuerui Lu, Daniel Spemann, Jeffrey C. McCallum, David N. Jamieson	(参考訳) ダイヤモンドのカラーセンターアンサンブルは、量子通信のための単一光子源、光学入力と出力による量子計算、ナノスケールへの磁場感知など、多くの応用において集中的に研究されている。これらのアプリケーションのいくつかは、チップ内の単一中心またはランダムに分散したアンサンブルで実現されているが、大規模量子コンピュータの最も要求の高いアプリケーションは、順序付き配列を必要とするだろう。電荷感電素子に接続されたバイアスド表面グラフェン電極により電子グレードダイヤモンド基板を構成することにより、典型的な確率的イオン源から30〜130nmの深さで停止するイオンに対する決定論的単一イオン注入を示すことができる。イオン注入からの電子-ホール対のドリフトによって誘導される電荷パルスにより、注入イベントが信号される。イオン注入部位はAFMナノステンシルまたは集束イオンビームで局在する。これにより、モノリシックデバイスにおける決定論的色中心ネットワーク構築の道を開く、関連する色中心を持つ単一原子の順序づけられた配列を構築することができる。 Colour centre ensembles in diamond have been the subject of intensive investigation for many applications including single photon sources for quantum communication, quantum computation with optical inputs and outputs, and magnetic field sensing down to the nanoscale. Some of these applications are realised with a single centre or randomly distributed ensembles in chips, but the most demanding application for a large-scale quantum computer will require ordered arrays. By configuring an electronic-grade diamond substrate with a biased surface graphene electrode connected to charge-sensitive electronics, it is possible to demonstrate deterministic single ion implantation for ions stopping between 30 and 130 nm deep from a typical stochastic ion source. An implantation event is signalled by a charge pulse induced by the drift of electron-hole pairs from the ion implantation. The ion implantation site is localised with an AFM nanostencil or a focused ion beam. 
This allows the construction of ordered arrays of single atoms with associated colour centres, which paves the way for the fabrication of deterministic colour centre networks in a monolithic device.	翻訳日:2023-06-16 10:53:01 公開日:2023-06-14
# オンラインレコメンダシステムにおける高品質コンテンツへのインセンティブ Incentivizing High-Quality Content in Online Recommender Systems ( http://arxiv.org/abs/2306.07479v2 ) ライセンス: Link先を確認	Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, and Jacob Steinhardt	(参考訳) TikTokやYouTubeのようなコンテンツレコメンデーターシステムでは、プラットフォームの決定アルゴリズムがコンテンツ制作者のインセンティブを形成し、コンテンツ制作者がコンテンツの品質にどれだけの努力を払っているかが分かる。多くのプラットフォームがオンライン学習を採用しており、今日のコンテンツは将来のコンテンツの推奨に影響を与えるため、時間的インセンティブを生み出している。本稿では,オンライン学習から生じるインセンティブについて検討し,nash平衡で生成するコンテンツの質を分析した。 hedgeやexp3のような古典的なオンライン学習アルゴリズムは、残念ながら生産者に低品質のコンテンツを制作するインセンティブを与えている。特に、コンテンツの品質は学習率の観点から上界にあり、典型的な学習率スケジュールに対してゼロに近づきます。このネガティブな結果に動機づけられて、私たちは異なる学習アルゴリズム -- 低品質のコンテンツを作るプロデューサーを罰する - をデザインし、プロデューサに高品質なコンテンツを作るインセンティブを正しく与えます。概念レベルでは、我々の研究は、プラットフォームの学習アルゴリズムがコンテンツの品質に与えうる意図しない影響を示し、高品質コンテンツの作成にインセンティブを与えるプラットフォーム学習アルゴリズムの設計への扉を開く。 For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. 
At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.	翻訳日:2023-06-16 10:52:14 公開日:2023-06-14
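For readers unfamiliar with Hedge, the sketch below shows the standard exponential-weights update that the abstract above refers to. It only illustrates the classical algorithm under full-information rewards; the producer incentive model, the Nash equilibrium analysis, and the authors' punishment-based algorithm are not reproduced, and the function names are hypothetical.

```python
import math


def hedge_probabilities(cumulative_rewards, learning_rate):
    """Exponential-weights distribution over arms given cumulative rewards."""
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the normalised probabilities.
    best = max(cumulative_rewards)
    weights = [math.exp(learning_rate * (r - best)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(reward_rounds, learning_rate):
    """Return the distribution Hedge plays in each round.

    reward_rounds is a list of per-round reward vectors, one entry per arm,
    all revealed after the round (full-information feedback)."""
    num_arms = len(reward_rounds[0])
    cumulative = [0.0] * num_arms
    history = []
    for rewards in reward_rounds:
        history.append(hedge_probabilities(cumulative, learning_rate))
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return history
```

With a fixed reward gap between two arms, the probability of the better arm grows each round at a pace set by `learning_rate`. The abstract's negative result concerns how this learning rate bounds the content quality producers are willing to supply at equilibrium, which this sketch does not model.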
# 高スピンドナーquditの電場と磁場による16次元ヒルベルト空間の移動 Navigating the 16-dimensional Hilbert space of a high-spin donor qudit with electric and magnetic fields ( http://arxiv.org/abs/2306.07453v2 ) ライセンス: Link先を確認	Irene Fernández de Fuentes, Tim Botzem, Mark A. I. Johnson, Arjen Vaartjes, Serwan Asaad, Vincent Mourik, Fay E. Hudson, Kohei M. Itoh, Brett C. Johnson, Alexander M. Jakob, Jeffrey C. McCallum, David N. Jamieson, Andrew S. Dzurak, Andrea Morello	(参考訳) 効率的なスケーリングと柔軟な制御は、有用な量子コンピューティングハードウェアの重要な側面である。半導体のスピンは、量子情報処理と電子、ホール、核、電気または磁場の制御、交換または双極子相互作用によるスケーラブルな結合を結合する。しかし、大きなヒルベルト空間へのアクセスは、相互作用の短距離性のため、依然として困難である。ここでは16次元ヒルベルト空間をシリコン中の1つのアンチモンドナーの電子核状態によって構築する原子ベースの半導体プラットフォームを提案する。我々は、この大きなヒルベルト空間を電場と磁場の両方を使ってナビゲートでき、ゲート忠実度が99.8%を超えることを実証し、ハミルトニアン系とその制御および雑音場に対する感受性の詳細を明らかにした。これらの結果は、高スピンドナーを実用的な量子情報のための豊かなプラットフォームとして確立し、量子基礎を探求する。 Efficient scaling and flexible control are key aspects of useful quantum computing hardware. Spins in semiconductors combine quantum information processing with electrons, holes or nuclei, control with electric or magnetic fields, and scalable coupling via exchange or dipole interaction. However, accessing large Hilbert space dimensions has remained challenging, due to the short-distance nature of the interactions. Here, we present an atom-based semiconductor platform where a 16-dimensional Hilbert space is built by the combined electron-nuclear states of a single antimony donor in silicon. We demonstrate the ability to navigate this large Hilbert space using both electric and magnetic fields, with gate fidelity exceeding 99.8% on the nuclear spin, and unveil fine details of the system Hamiltonian and its susceptibility to control and noise fields. These results establish high-spin donors as a rich platform for practical quantum information and to explore quantum foundations.	翻訳日:2023-06-16 10:51:50 公開日:2023-06-14
# deeptransition: 学習前四足歩行スキルにおける歩行遷移の出現 DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills ( http://arxiv.org/abs/2306.07419v2 ) ライセンス: Link先を確認	Milad Shafiee, Guillaume Bellegarda, and Auke Ijspeert	(参考訳) 四足動物は移動速度を変えると、歩き方をシームレスに移行します。歩行遷移に関する最も広く受け入れられている説明はエネルギー効率であるが、決定要因や地形特性の潜在的な影響については明確な合意がない。本稿では,転倒の回避という生存可能性が歩行遷移の重要な基準であることを示す。深部強化学習とロボティクスツールを活用して, 上脊髄駆動(脳), 脊髄の中枢パターン生成器, 身体, 外受容感覚の相互作用による歩行遷移の出現について検討した。四足歩行の動物データと一致して,四足歩行ロボットの歩行遷移は,歩行能力とエネルギー効率の両立性が向上することを示した。さらに,個々の地形(すなわち連続した隙間を交差する)が歩行遷移に与える影響を調査し,非生存状態を避けるためにトロト-プロンク遷移の出現を見いだす。最大力やエネルギー効率などの他の潜在的な基準と比較すると、可視性は平地と分断地の両方での歩行遷移後の唯一の改善要因であり、可視性は歩行遷移の第一、普遍的な目的であり、他の基準は二次的な目的であり、かつ/または生存性の結果である。さらに、我々は、学習したコントローラをシミュレート・トゥ・リアルなハードウェア実験で展開し、挑戦的なシナリオで最先端の4倍の俊敏性を示す。 Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e. the avoidance of falls, represents an important criterion for gait transitions. We investigate the emergence of gait transitions through the interaction between supraspinal drive (brain), the central pattern generator in the spinal cord, the body, and exteroceptive sensing by leveraging deep reinforcement learning and robotics tools. Consistent with quadruped animal data, we show that the walk-trot gait transition for quadruped robots on flat terrain improves both viability and energy efficiency. Furthermore, we investigate the effects of discrete terrain (i.e. crossing successive gaps) on imposing gait transitions, and find the emergence of trot-pronk transitions to avoid non-viable states. 
Compared with other potential criteria such as peak forces and energy efficiency, viability is the only improved factor after gait transitions on both flat and discrete gap terrains, suggesting that viability could be a primary and universal objective of gait transitions, while other criteria are secondary objectives and/or a consequence of viability. Moreover, we deploy our learned controller in sim-to-real hardware experiments and demonstrate state-of-the-art quadruped agility in challenging scenarios, where the Unitree A1 quadruped autonomously transitions gaits between trot and pronk to cross consecutive gaps of up to 30 cm (83.3% of the body length) at over 1.3 m/s.	翻訳日:2023-06-16 10:51:34 公開日:2023-06-14
# 最適化に触発されたディープニューラルネットワークを用いた自己教師付きハイパースペクトルインパインティング Self-Supervised Hyperspectral Inpainting with the Optimisation inspired Deep Neural Network Prior ( http://arxiv.org/abs/2306.07308v2 ) ライセンス: Link先を確認	Shuo Li and Mehrdad Yaghoobi	(参考訳) ハイパースペクトル画像(HSI)は、数百から数千の狭いスペクトル帯域をカバーし、多くの空間およびスペクトル情報を伝達する。しかし、測定機器の誤差や大気の変化により、実際に得られたHSIはしばしばノイズやデッドピクセル(ライン)によって汚染され、結果として、その後の応用を著しく損なう可能性のある情報が欠落する。本稿では,新しいHSI欠落画素予測アルゴリズム,Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP)を紹介する。 LRS-PnPは、画像の全てのスペクトル帯域が欠落している場合でも、欠落した画素や帯域を予測することができる。 LRS-PnPアルゴリズムは、LRS-PnPとDeep Image Prior (DIP)を組み合わせた自己教師型モデルLRS-PnP-DIPにさらに拡張される。実データを用いた一連の実験において、LRS-PnP-DIPは、他の学習ベース手法と比較して最先端のインペインティング性能を達成するか、それらを上回ることを示した。 Hyperspectral images (HSIs) cover hundreds or thousands of narrow spectral bands, conveying a wealth of spatial and spectral information. However, due to instrumental errors and atmospheric changes, HSIs obtained in practice are often contaminated by noise and dead pixels (lines), resulting in missing information that may severely compromise the subsequent applications. We introduce here a novel HSI missing pixel prediction algorithm, called Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). It is shown that LRS-PnP is able to predict missing pixels and bands even when all spectral bands of the image are missing. The proposed LRS-PnP algorithm is further extended to a self-supervised model by combining the LRS-PnP with the Deep Image Prior (DIP), called LRS-PnP-DIP. In a series of experiments with real data, it is shown that the LRS-PnP-DIP either achieves state-of-the-art inpainting performance compared to other learning-based methods, or outperforms them.	翻訳日:2023-06-16 10:50:17 公開日:2023-06-14
# 第二応答理論:量子重ね合わせの伝播に関する理論的形式論 Second Response Theory: A Theoretical Formalism for the Propagation of Quantum Superpositions ( http://arxiv.org/abs/2306.07924v2 ) ライセンス: Link先を確認	Mart\'in A. Mosquera	(参考訳) 一般電子量子状態の伝播は、分子系と外部駆動場との相互作用に関する情報を提供する。これらは非断熱量子現象に関する理解を与えることもできる。確立された手法は主に、当初は基底状態波動関数によってのみ記述された量子系を伝播することに焦点を当てている。本研究では,前述した2次応答理論と呼ばれる結合クラスター理論の形式性を拡張することにより,まずは基底状態を含む異なる状態の一般線形結合によって記述された量子系を伝播させ,そのような伝播を時間依存クラスター作用素の特殊集合でどのように行うかを示す。我々の理論は、量子力学的観測値、確率、コヒーレンスを決定するために、数値的に正確な結果と強い整合性を示す。本稿では, 2次応答理論における非定常状態と, 線形および二次応答理論における行列要素の予測能力について論じる。本研究はまた、基底状態のクラスター振幅の潜在的な不安定性を持つシステムを扱う近似正規化手法についても論じ、標準ユニタリ理論の参照結果について、その近似を比較する。 The propagation of general electronic quantum states provides information of the interaction of molecular systems with external driving fields. These can also offer understandings regarding non-adiabatic quantum phenomena. Well established methods focus mainly on propagating a quantum system that is initially described exclusively by the ground state wavefunction. In this work, we expand a previously developed formalism within coupled cluster theory, called second response theory, so it propagates quantum systems that are initially described by a general linear combination of different states, which can include the ground state, and show how with a special set of time-dependent cluster operators such propagations are performed. Our theory shows strong consistency with numerically exact results for the determination of quantum mechanical observables, probabilities, and coherences. We discuss unperturbed non-stationary states within second response theory and their ability to predict matrix elements that agree with those found in linear and quadratic response theories. This work also discusses an approximate regularized methodology to treat systems with potential instabilities in their ground-state cluster amplitudes, and compare such approximations with respect to reference results from standard unitary theory.	
翻訳日:2023-06-16 10:44:20 公開日:2023-06-14
# iSLAM: インペラティブSLAM iSLAM: Imperative SLAM ( http://arxiv.org/abs/2306.07894v2 ) ライセンス: Link先を確認	Taimeng Fu, Shaoshu Su, Chen Wang	(参考訳) 同時ローカライゼーションとマッピング(SLAM)は、ロボットナビゲーションにおける重要な課題の1つである。近年の進歩は, 教師あり学習に基づく手法が, 従来の最適化手法が評価ドリフトの最小化に重要な役割を担っていることを示唆している。本稿では,このような疎結合なパラダイムが準最適性能にのみ寄与し,結果としてシステム能力と一般化ポテンシャルを削減できることを見出した。この問題を解決するために,我々は,フロントエンドとバックエンドの相互修正を促進し,外部の監督を必要とせずに性能を向上させるための,新しい自己教師付き学習フレームワークimperative slam(islam)を提案した。具体的には,二元最適化問題としてslamシステムを定式化し,両成分を双方向に連結する。その結果、フロントエンドモデルは、バックエンドから残差をバックプロパゲーションすることで、ポーズグラフ最適化によって得られるグローバル幾何学的知識を学習することができる。これにより、システム全体の一般化能力が大幅に向上し、精度が45%まで向上する。我々の知る限り、iSLAMは、フロントエンドとバックエンドが相互に相互に相互に自己管理的な方法で学習できることを示す最初のSLAMシステムです。 Simultaneous localization and mapping (SLAM) stands as one of the critical challenges in robot navigation. Recent advancements suggest that methods based on supervised learning deliver impressive performance in front-end odometry, while traditional optimization-based methods still play a vital role in the back-end for minimizing estimation drift. In this paper, we found that such decoupled paradigm can lead to only sub-optimal performance, consequently curtailing system capabilities and generalization potential. To solve this problem, we proposed a novel self-supervised learning framework, imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate a SLAM system as a bi-level optimization problem so that the two components are bidirectionally connected. As a result, the front-end model is able to learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end. This significantly improves the generalization ability of the entire system and thus achieves the accuracy improvement up to 45%. 
To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end can learn jointly and mutually contribute to each other in a self-supervised manner.	翻訳日:2023-06-16 10:44:01 公開日:2023-06-14
# Generalized $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ commensurate anisotropic Hamiltonian and ladder operators; energy spectrum, eigenstates and associated coherent and squeezed states ( http://arxiv.org/abs/2306.07889v2 ) ライセンス: Link先を確認	Nibaldo-Edmundo Alvarez-Moraga	(参考訳) 本稿では、複素 $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ リー代数の要素であるハミルトニアンが、この代数の要素であるはしご作用素を認める条件について研究した。このように構成された下降作用素の代数固有状態を計算し、それらからこのハミルトニアンのエネルギースペクトルとエネルギー固有状態の両方を、対応する上昇作用素の助けを借りて通常の方法で生成する。したがって、一般化ハミルトニアン系のいくつかの族が発見され、適切な類似性変換の下では、1:1, 2:1, 1:2, $su(2)$ および他の非共役および可換な異方性2次元量子振動子系を見つける基本的な系の集合に還元される。ハミルトニアンの正規化固有状態とその関連する下降作用素に対する明示的な表現が与えられ、二モード分離可能および非分離一般化コヒーレントおよびスクイーズ状態の古典構造を示す。最後に、上記のすべての結果に基づいて、$p:q$ coprime commensurate 異方性量子振動子のための新しいラダー演算子の提案が行われ、Chen $SU(2)$コヒーレント状態のクラスへと導かれる。 This article studies the conditions under which a Hamiltonian that is an element of the complex $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ Lie algebra admits ladder operators that are also elements of this algebra. The algebra eigenstates of the lowering operator constructed in this way are computed, and from them both the energy spectrum and the energy eigenstates of this Hamiltonian are generated in the usual way with the help of the corresponding raising operator. Thus, several families of generalized Hamiltonian systems are found, which, under a suitable similarity transformation, reduce to a basic set of systems, among which we find the 1:1, 2:1, 1:2, $su(2)$ and some other non-commensurate and commensurate anisotropic 2D quantum oscillator systems. 
Explicit expressions for the normalized eigenstates of the Hamiltonian and its associated lowering operator are given, which show the classical structure of two-mode separable and non-separable generalized coherent and squeezed states. Finally, based on all the above results, a proposal for new ladder operators for the $p:q$ coprime commensurate anisotropic quantum oscillator is made, which leads us to a class of Chen $SU(2)$ coherent states.	翻訳日:2023-06-16 10:43:38 公開日:2023-06-14
# 信頼できない純入力状態を持つユニタリ量子プロセストモグラフィ Unitary quantum process tomography with unreliable pure input states ( http://arxiv.org/abs/2306.07867v2 ) ライセンス: Link先を確認	Fran\c{c}ois Verdeil and Yannick Deville	(参考訳) 量子プロセストモグラフィ(QPT)法は、与えられた量子プロセスを特定することを目的としている。本稿では,一元的プロセスの推定に焦点をあてる。なぜなら、量子力学は任意の閉量子系の進化はユニタリ変換によって記述されると仮定しているからである。 QTPの標準的なアプローチは、特定されるプロセスによって修正された後、所定の(一般に純粋な)状態の特定のセットのコピーを測定することである。この設定の主な問題は、入力状態を作成して所定の値に正確に設定することが困難であり、エラーが発生することである。これらのエラーは、中心となるエラー(すなわち、すべてのコピーの平均がゼロである)と、すべてのコピーで同じである系統的エラーの合計に分解することができる。本稿で紹介するアルゴリズムは,QPTを理論的に可能な任意の入力状態に対して有効である。入力状態が所定の値に正確に設定される必要がないという事実は、いくつかの状態が未知であるが、特定されるプロセスを通過する前に測定されることを考慮して、体系的なエラーの問題を除去するためにトリックを使用することができることを意味する。我々は、各入力状態のコピーを複数のグループに分割し、識別するプロセスの$k$インスタンスを通して連続的に転送された後、$k$-th groupのコピーを測定する(各入力状態のコピーは一度だけ測定される)。このトリックを使うことで、初期状態に関する知識を使わずに、プロセスの前後で測定された状態の推定を計算することができる。シミュレーションデータと実験データの両方でアルゴリズムをテストし、閉じ込められたイオン量子コンピュータ上のcnotゲートを同定する。 Quantum process tomography (QPT) methods aim at identifying a given quantum process. The present paper focuses on the estimation of a unitary process. This class is of particular interest because quantum mechanics postulates that the evolution of any closed quantum system is described by a unitary transformation. The standard approach of QTP is to measure copies of a particular set of predetermined (generally pure) states after they have been modified by the process to be identified. The main problem with this setup is that preparing an input state and setting it precisely to a predetermined value is challenging and thus yields errors. These errors can be decomposed into a sum of centred errors (i.e. whose average on all the copies is zero) and systematic errors that are the same on all the copies, the latter is often the main source of error in QPT. The algorithm we introduce in the current paper works for any input states that make QPT theoretically possible. 
The fact that we do not require the input states to be precisely set to predetermined values means that we can use a trick to remove the issue of systematic errors by considering that some states are unknown but measured before they go through the process to be identified. We achieve this by splitting the copies of each input state into several groups and measuring the copies of the $k$-th group after they have successively been transferred through $k$ instances of the process to be identified (each copy of each input state is only measured once). Using this trick we can compute estimates of the measured states before and after they go through the process without using the knowledge we might have on the initial states. We test our algorithm both on simulated data and experimentally to identify a CNOT gate on a trapped-ions qubit quantum computer.	翻訳日:2023-06-16 10:42:47 公開日:2023-06-14
# BeliefPPG: Belief PropagationによるPPG信号からの不確かさを意識した心拍数推定 BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation ( http://arxiv.org/abs/2306.07730v2 ) ライセンス: Link先を確認	Valentin Bieri, Paul Streli, Berken Utku Demirel and Christian Holz	(参考訳) 本稿では,photoplethysmography (PPG) 信号から抽出した心拍数推定ベンチマークを用いて,最先端のパフォーマンスを実現する新しい学習ベース手法を提案する。我々は,隠れマルコフモデルとして表現される離散時間確率過程の文脈における心拍数の進化を考える。訓練されたニューラルネットワークを介して、所定のPPG信号ウィンドウの心拍数値の分布を導出する。信念伝播を用いて,心拍変動の統計的分布を取り入れ,これらの推定値を時間的文脈で洗練する。そこで,本研究では,予測の不確かさを有意義かつ適切に推定した心拍数値の範囲を定量化した確率分布を求める。提案手法は8つの公開データセット上で3つの異なる交差検証実験によりロバスト性を示す。 We present a novel learning-based method that achieves state-of-the-art performance on several heart rate estimation benchmarks extracted from photoplethysmography (PPG) signals. We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical distribution of heart rate changes to refine these estimates in a temporal context. From this, we obtain a quantized probability distribution over the range of possible heart rate values that captures a meaningful and well-calibrated estimate of the inherent predictive uncertainty. We show the robustness of our method on eight public datasets with three different cross-validation experiments.	翻訳日:2023-06-16 10:42:01 公開日:2023-06-14
# 先進的脅威に対する文脈認識型防御のためのマルチドメイン知識再武装 Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats ( http://arxiv.org/abs/2306.07685v2 ) ライセンス: Link先を確認	Gaolei Li, Yuanyuan Zhao, Wenqi Wei, Yuchen Liu	(参考訳) 高度な持続的脅威(APT)には、多段階の侵入、高度に調整された意図、回避戦術などの新しい特徴がある。 APTの防御には、攻撃意図を特定するために多次元サイバー脅威インテリジェンスデータを融合させ、エンティティ関係を認識するためにデータ駆動機械学習による効率的な知識発見戦略を実行する必要がある。しかし、データ駆動機械学習は、新しいサンプルや未知のサンプルの一般化能力に欠けており、防御モデルの精度と実用性を低下させる。さらに、これらのAPT防衛モデルを異種環境や様々なネットワークデバイスにプライベートに展開するには、コンテキスト認識(既知の攻撃エンティティ、連続ネットワーク状態、現在のセキュリティ戦略など)に多大な投資が必要になる。本稿では,APTに対する文脈認識型防御のためのFMKR方式を提案する。メタ学習によって異なるネットワークドメインから生成される複数の小さなタスクを完了させることで、FMKRはまず、新しく未知のAPT攻撃に対して優れた識別と一般化能力を持つモデルを訓練する。各FMKRタスクでは、脅威インテリジェンスとローカルエンティティの両方がメタラーニングにおけるサポート/クエリセットに融合し、攻撃ステージを特定する。第二に、現在のセキュリティ戦略を再構築するために、学生モデルに学習知識を伝達する微調整に基づく展開機構を提案し、防御コストを最小限に抑える。複数のモデル置換戦略と比較して、FMKRは、スケジューリングコストを削減しつつ、攻撃行動に対する迅速な応答を提供する。 2ヶ月にわたる産業用IoT(Industrial Internet of Things, IIoT)のユーザからのフィードバックをもとに,提案手法が防衛満足度を向上させることを実証した。 Advanced persistent threats (APTs) have novel features such as multi-stage penetration, highly-tailored intention, and evasive tactics. APTs defense requires fusing multi-dimensional Cyber threat intelligence data to identify attack intentions and conducts efficient knowledge discovery strategies by data-driven machine learning to recognize entity relationships. However, data-driven machine learning lacks generalization ability on fresh or unknown samples, reducing the accuracy and practicality of the defense model. Besides, the private deployment of these APT defense models on heterogeneous environments and various network devices requires significant investment in context awareness (such as known attack entities, continuous network states, and current security strategies). In this paper, we propose a few-shot multi-domain knowledge rearming (FMKR) scheme for context-aware defense against APTs. 
By completing multiple small tasks that are generated from different network domains with meta-learning, FMKR first trains a model with good discrimination and generalization ability for fresh and unknown APT attacks. In each FMKR task, both threat intelligence and local entities are fused into the support/query sets in meta-learning to identify possible attack stages. Second, to rearm current security strategies, a fine-tuning-based deployment mechanism is proposed to transfer learned knowledge into the student model while minimizing the defense cost. Compared to multiple model replacement strategies, FMKR provides a faster response to attack behaviors while incurring a lower scheduling cost. Based on feedback from multiple real users of the Industrial Internet of Things (IIoT) over 2 months, we demonstrate that the proposed scheme can improve the defense satisfaction rate.	翻訳日:2023-06-16 10:41:49 公開日:2023-06-14
# UOD: 解剖学的ランドマークのユニバーサルワンショット検出 UOD: Universal One-shot Detection of Anatomical Landmarks ( http://arxiv.org/abs/2306.07615v2 ) ライセンス: Link先を確認	Heqin Zhu, Quan Quan, Qingsong Yao, Zaiyi Liu, S. kevin Zhou	(参考訳) ワンショット医療ランドマーク検出は、多くの注目を集め、ラベル効率の良いトレーニングプロセスで大きな成功を収める。しかし、既存のワンショット学習手法は、単一のドメインに高度に特化しており、マルチドメイン未ラベルデータの状況において、ドメインの嗜好を著しく損なう。さらに、ワンショット学習は、サブ最適イメージにアノテートした場合のパフォーマンス低下に直面するほど堅牢ではない。これらの課題に対処するために,Universal One-shot Detection (UOD) という,多領域の医療画像を扱うためのドメイン適応型ワンショットランドマーク検出フレームワークを開発する。 UODは、ドメイン固有モジュールとドメイン共有モジュールの組み合わせとして設計された、2つのステージと2つの対応するユニバーサルモデルから構成される。第1段階では、ドメイン適応畳み込みモデルが学習され、擬似ランドマークラベルを生成する。第2段階では、ドメイン優先を排除し、マルチドメインデータのグローバルコンテキストを構築するために、ドメイン適応変換器を設計する。各ドメインからの注釈付きサンプルは1つしかトレーニングできないが、ドメイン共有モジュールはUODがすべての一発サンプルを集約し、より堅牢で正確なランドマークを検出するのに役立つ。解剖学的領域(頭,手,胸など)で広く利用されている3つの公開X線データセットの質的,定量的に検討し,各領域における最先端の成果を得た。 One-shot medical landmark detection gains much attention and achieves great success for its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer domain preference heavily in the situation of multi-domain unlabeled data. Moreover, one-shot learning is not robust that it faces performance drop when annotating a sub-optimal image. To tackle these issues, we resort to developing a domain-adaptive one-shot landmark detection framework for handling multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models which are designed as combinations of domain-specific modules and domain-shared modules. In the first stage, a domain-adaptive convolution model is self-supervised learned to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build the global context for multi-domain data. 
Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect landmarks more robustly and accurately. We evaluated the proposed UOD both qualitatively and quantitatively on three widely used public X-ray datasets from different anatomical domains (head, hand, and chest) and obtained state-of-the-art performance in each domain.	翻訳日:2023-06-16 10:40:56 公開日:2023-06-14

# 大規模言語モデル:インテクスト学習による多言語コメント生成

Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning ( http://arxiv.org/abs/2304.11384v3 )

ライセンス: Link先を確認

Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao

(参考訳) コードコメント生成は、開発者のプログラム理解活動を容易にするために、コードスニペットの自然言語記述を生成することを目的としている。長い間研究されてきたが、既存のアプローチのボトルネックは、コードスニペットが与えられた場合、1つのコメントしか生成できないことだ。この制限に対処するために,大規模な言語モデル(LLM)を用いて,開発者の多様な意図を満たすコメントを生成する可能性について実験的に検討した。我々の直感は,(1)LLMの事前学習過程において,自然言語とプログラミング言語のセマンティックな関係を構築するためにコードとそのペアのコメントが使用されること,(2)事前学習のために収集される実世界のプロジェクトにおけるコメントには,通常,開発者意図が異なるという事実に基づいている。したがって、LLMは事前学習後に異なる視点からコードを理解することができると仮定する。コンテキスト内学習パラダイムを採用して、llmに適切なプロンプト(例:10以上の例で提供)を提供することで、llmは複数の意図を持ったコメントを生成するための最先端の教師付き学習アプローチを大幅に上回ることができるのです。また, 結果の再評価のためのプロンプト構築戦略や後処理戦略をカスタマイズすることで, LLMのパフォーマンスが向上し, LLMを用いたコメント生成の今後の研究方向性が明らかになった。

Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that given a code snippet, they can only generate one comment, while developers usually need information from diverse perspectives, such as what the code snippet does and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Our intuition is based on the facts that (1) the code and its pairwise comment are used during the pre-training process of LLMs to build the semantic connection between the natural language and programming language, and (2) comments in real-world projects, which are collected for the pre-training, usually contain different developers' intents. We thus postulate that the LLMs can already understand the code from different perspectives after the pre-training. Indeed, experiments on two large-scale datasets demonstrate the rationale of our insights: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach on generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs for comment generation.

翻訳日:2023-10-24 12:35:35 公開日:2023-06-14
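
The in-context learning setup described in the abstract above can be pictured as assembling a prompt from a handful of (code, intent, comment) demonstrations followed by the query snippet. The sketch below is only an illustration under assumptions: the intent names (`what`, `usage`), the prompt layout, and the helper `build_prompt` are hypothetical and are not taken from the paper, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    code: str
    intent: str
    comment: str


def build_prompt(demos, query_code, intent, max_demos=10):
    """Assemble a few-shot prompt: matching demonstrations first, then the query."""
    # Assumed selection strategy: keep only demonstrations with the requested intent.
    selected = [d for d in demos if d.intent == intent][:max_demos]
    parts = []
    for d in selected:
        parts.append(f"Code:\n{d.code}\nIntent: {d.intent}\nComment: {d.comment}\n")
    # The query ends with an open "Comment:" slot for the model to complete.
    parts.append(f"Code:\n{query_code}\nIntent: {intent}\nComment:")
    return "\n".join(parts)


# Hypothetical demonstration pool.
demos = [
    Demo("def add(a, b):\n    return a + b", "what", "Returns the sum of a and b."),
    Demo("def add(a, b):\n    return a + b", "usage", "Call add(1, 2) to add two numbers."),
    Demo("def is_even(n):\n    return n % 2 == 0", "what", "Checks whether n is even."),
]

prompt = build_prompt(demos, "def square(x):\n    return x * x", "what")
```

Changing the `intent` argument is what would steer the same model toward a different kind of comment for the same snippet; how demonstrations are actually selected and how outputs are reranked is described in the paper itself, not here.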

# PythonとRのデータ分析プログラムでバグを特徴付ける

Characterizing Bugs in Python and R Data Analytics Programs ( http://arxiv.org/abs/2306.08632v1 )

ライセンス: Link先を確認

Shibbir Ahmed, Mohammad Wardat, Hamid Bagheri, Breno Dantas Cruz, Hridesh Rajan

(参考訳) RとPythonは多くの重要なデータ分析タスクで使われている最も人気のある言語の一つである。しかし、これらの2つの言語がデータ分析タスクで発生するバグについて、まだ完全には理解していません。どんなバグがよくあるのか? 主な原因は何ですか? バグと根本原因の関係は何か? これらのバグを緩和する方法? 我々は5,068のStack Overflowポスト、GitHubリポジトリからの1,800のバグ修正コミット、RとPythonのバグを理解するために最も使われているライブラリのGitHub問題に関する包括的な調査を紹介する。 RとPythonには、データ分析の経験不足によるバグがあるが、PythonはRと比較して、データ前処理のバグが大幅に大きい。また、パッケージやライブラリの変更やバグがPythonよりもRのバグを発生させるのに対して、パッケージやライブラリのミスセレクションやコンフリクトはRよりもPythonのバグを発生させる。データビジュアライゼーションの面では、RパッケージはPythonライブラリよりもはるかに多くのバグがある。また,言語的および方法論的差異にもかかわらず,RとPythonのパッケージに比較して強い相関関係が認められた。最後に、手作業によるRとPythonのバグの大規模なデータセットを寄贈する。

R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages with respect to bugs encountered in data analytics tasks. What types of bugs are common? What are the main root causes? What is the relation between bugs and root causes? How can these bugs be mitigated? We present a comprehensive study of 5,068 Stack Overflow posts, 1,800 bug fix commits from GitHub repositories, and several GitHub issues of the most used libraries to understand bugs in R and Python. Our key findings include: while both R and Python have bugs due to inexperience with data analysis, Python sees significantly more data preprocessing bugs than R. Developers experience significantly more data flow bugs in R because intermediate results are often implicit. We also found that changes and bugs in packages and libraries cause more bugs in R than in Python, while package or library misselection and conflicts cause more bugs in Python than in R. While R has a slightly higher readability barrier for data analysts, the statistical power of R leads to fewer bad performance bugs. In terms of data visualization, R packages have significantly more bugs than Python libraries. We also identified a strong correlation between comparable packages in R and Python despite their linguistic and methodological differences. Lastly, we contribute a large dataset of manually verified R and Python bugs.

翻訳日:2023-10-23 19:48:22 公開日:2023-06-14

# 建築侵食の違反症状の自動同定に向けて

Towards Automatic Identification of Violation Symptoms of Architecture Erosion ( http://arxiv.org/abs/2306.08616v1 )

ライセンス: Link先を確認

Ruiyin Li, Peng Liang, Paris Avgeriou

(参考訳) アーキテクチャの侵食は、実装が意図したアーキテクチャから外れるので、保守と進化に有害な影響を与える。これを防ぐためには、開発チームは浸食の症状、特に意図したアーキテクチャの違反を十分に早期に理解する必要がある。これを実現する1つの方法は、アーキテクチャ違反をテキストアーティファクト、特にコードレビューから自動的に識別することです。本稿では,機械学習に基づく15の分類器と,事前学習された3つの単語埋め込みを用いた4つの深層学習に基づく分類器を開発した。具体的には、OpenStack(NovaとNeutron)とQt(Qt BaseとQt Creator)の4つの大きなオープンソースプロジェクトのコードレビューコメントを調べました。次に、コードレビューでアーキテクチャ違反について議論した参加者からのフィードバックを得て、トレーニング済みの分類器の有用性を検証する調査を行った。その結果,Word2vec事前学習語埋め込みに基づくSVM分類器はF1スコア0.779で最良となることがわかった。多くの場合、fastText事前訓練された単語埋め込みモデルを用いた分類器は比較的優れた性能が得られる。さらに,200次元事前学習語埋め込みモデルは,100次元および300次元モデルを用いた分類器よりも優れている。また、多数決戦略に基づくアンサンブル分類器は、さらにその分類器を強化し、個々の分類器より優れる。最後に、関係する開発者のオンライン調査により、我々のアプローチによって特定された違反症状は実用的価値があり、差し迫ったアーキテクチャの侵食に対する早期警告を提供できることが明らかとなった。

Architecture erosion has a detrimental effect on maintenance and evolution, as the implementation drifts away from the intended architecture. To prevent this, development teams need to understand the symptoms of erosion early enough, particularly violations of the intended architecture. One way to achieve this is through the automatic identification of architecture violations from textual artifacts, particularly code reviews. In this paper, we developed 15 machine learning-based and 4 deep learning-based classifiers with three pre-trained word embeddings to identify violation symptoms of architecture erosion from developer discussions in code reviews. Specifically, we looked at code review comments from four large open-source projects from the OpenStack (Nova and Neutron) and Qt (Qt Base and Qt Creator) communities. We then conducted a survey to acquire feedback from the involved participants who discussed architecture violations in code reviews, to validate the usefulness of our trained classifiers. The results show that the SVM classifier based on word2vec pre-trained word embedding performs the best with an F1-score of 0.779. In most cases, classifiers with the fastText pre-trained word embedding model can achieve relatively good performance. Furthermore, classifiers that use 200-dimensional pre-trained word embedding models outperform those that use 100- and 300-dimensional models. In addition, an ensemble classifier based on the majority voting strategy can further improve classification and outperforms the individual classifiers. Finally, an online survey of the involved developers reveals that the violation symptoms identified by our approaches have practical value and can provide early warnings for impending architecture erosion.

翻訳日:2023-10-23 19:48:01 公開日:2023-06-14
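
The majority voting strategy mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming binary labels (1 = violation symptom, 0 = none) and three made-up classifier outputs; the tie-breaking rule and the function names are assumptions, not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers for one sample.

    Ties are broken in favour of the label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_labels):
    """Combine per-classifier label lists (one list per classifier) sample by sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of three classifiers on four review comments.
clf_a = [1, 0, 1, 0]
clf_b = [1, 1, 0, 0]
clf_c = [0, 1, 1, 0]

combined = ensemble_predict([clf_a, clf_b, clf_c])
```

With an odd number of binary classifiers no ties can occur, which is one reason such ensembles are often built from an odd number of members.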

# ソフトウェア工学教育におけるチーム構成

Team Composition in Software Engineering Education ( http://arxiv.org/abs/2306.08431v1 )

ライセンス: Link先を確認

Sajid Ibrahim Hashmi and Jouni Markkula

(参考訳) ソフトウェア工学教育の目的の1つは、学生に重要なチームワークスキルを習得させることである。これは、学生にコースの割り当てのためにグループで作業させることによって行われる。学生チーム構成は、学習結果、学習内容、学習方法に大きな影響を与えるため、この点で重要な役割を果たす。本研究は,ソフトウェア工学教育における学生チーム構成の理解を深め,国際ソフトウェア工学教育における学生チーム構成に影響する要因を検討することを目的とする。これらの要因は、ソフトウェア工学の教師がコースでグループワークの課題を設計する際に考慮すべきである。本稿では,現在進行中の行動研究研究の最初の知見について述べる。この結果は、ソフトウェア工学コースで学生チーム構成を設計する際に考慮すべきいくつかの明確な原則を与えている。

One of the objectives of software engineering education is to have students learn essential teamwork skills. This is done by having the students work in groups on course assignments. Student team composition plays a vital role in this, as it significantly affects learning outcomes, what is learned, and how. The study presented in this paper aims to better understand student team composition in software engineering education and to investigate the factors affecting it in the international software engineering education context. Those factors should be taken into consideration by software engineering teachers when they design group work assignments in their courses. In this paper, the initial findings of the ongoing action research study are presented. The results identify some principles that should be considered when designing student team composition in software engineering courses.

翻訳日:2023-10-23 19:47:38 公開日:2023-06-14

# プロパティアクセスエラーに対する統計的アプローチ

A statistical approach for finding property-access errors ( http://arxiv.org/abs/2306.08741v1 )

ライセンス: Link先を確認

Ellen Arteca, Max Sch\"afer, Frank Tip

(参考訳) 我々は、オブジェクトが固定されたレイアウトを持たず、プロパティ(メソッドを含む)を追加し、上書きし、オブジェクトの寿命を通して自由に削除できるJavaScriptの不正なプロパティアクセスを見つける問題を調査する。非存在プロパティを参照することはjavascriptのエラーではないため、(おそらくタイプミスやapiドキュメントの誤解によって)存在しないプロパティへの偶発的なアクセスは、徹底的なテストなしで検出されず、問題の原因からは程遠い可能性がある。そこで本研究では,プロパティアクセスのほとんどが正しいという観測に基づいて,プロパティアクセスエラーを検出する2相アプローチを提案する。まず、実世界のjavascriptコードの広範囲なコーパスから多数のプロパティアクセスパターンを収集し、異常な使用パターンを特定するために統計分析を行う。これらのパターンの特定のインスタンスはバグではないかもしれない(動的型チェックなど)ため、ローカルなデータフロー分析では、安全な異常なプロパティアクセスのインスタンスをフィルタし、実際のバグは残らない。提案手法を実験的に検証し, 異常なプロパティアクセスの100件の具体例において, 90%のリコールで82%の精度を実現し, 実用化に適していることを示す。また、人気のあるVSCodeコード補完機能がオブジェクトプロパティの提案にどの程度有効であるかを判断する実験を行い、不正なプロパティ(100%の精度)を示唆しなかったが、80ケース中62ケース(22.5%のリコール)で正しいプロパティを提案できなかったことを発見した。これは、すべてのプロパティアクセスが有効であることを保証するために、VSCodeのコード補完のみに頼ることはできないことを示している。

We study the problem of finding incorrect property accesses in JavaScript, where objects do not have a fixed layout, and properties (including methods) can be added, overwritten, and deleted freely throughout the lifetime of an object. Since referencing a non-existent property is not an error in JavaScript, accidental accesses to non-existent properties (caused, perhaps, by a typo or by a misunderstanding of API documentation) can go undetected without thorough testing, and may manifest far from the source of the problem. We propose a two-phase approach for detecting property access errors based on the observation that, in practice, most property accesses will be correct. First, a large number of property access patterns are collected from an extensive corpus of real-world JavaScript code, and a statistical analysis is performed to identify anomalous usage patterns. Specific instances of these patterns may not be bugs (due to, e.g., dynamic type checks), so a local data-flow analysis filters out instances of anomalous property accesses that are safe and leaves only those likely to be actual bugs. We experimentally validate our approach, showing that on a set of 100 concrete instances of anomalous property accesses, the approach achieves a precision of 82% with a recall of 90%, making it suitable for practical use. We also conducted an experiment to determine how effective the popular VSCode code completion feature is at suggesting object properties, and found that, while it never suggested an incorrect property (precision of 100%), it failed to suggest the correct property in 62 out of 80 cases (recall of 22.5%). This shows that developers cannot rely on VSCode's code completion alone to ensure that all property accesses are valid.

翻訳日:2023-10-23 19:34:58 公開日:2023-06-14
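
The first (statistical) phase described in the abstract above can be illustrated with a simple frequency count. The sketch below is written in Python rather than JavaScript and rests on assumptions: the thresholds `max_share` and `min_total`, the `(base, property)` representation, and the toy corpus are hypothetical, and the second phase (the local data-flow filter) is not modelled at all. It also rechecks the recall figure quoted in the abstract.

```python
from collections import Counter, defaultdict


def collect_counts(accesses):
    """Count how often each property is accessed on each base (e.g. a module object)."""
    counts = defaultdict(Counter)
    for base, prop in accesses:
        counts[base][prop] += 1
    return counts


def anomalous_accesses(counts, max_share=0.05, min_total=20):
    """Flag (base, property) pairs that are rare relative to all accesses on that base."""
    flagged = []
    for base, props in counts.items():
        total = sum(props.values())
        if total < min_total:
            continue  # too little evidence to call anything anomalous
        for prop, n in props.items():
            if n / total <= max_share:
                flagged.append((base, prop))
    return sorted(flagged)


# Hypothetical corpus: `readFileSync` is common, `readFileSynch` is a rare typo.
corpus = (
    [("fs", "readFileSync")] * 30
    + [("fs", "existsSync")] * 9
    + [("fs", "readFileSynch")]
)
flagged = anomalous_accesses(collect_counts(corpus))

# Recall quoted for code completion: correct suggestions in 80 - 62 = 18 of 80 cases.
recall = (80 - 62) / 80
```

A purely frequency-based flag like this would also report rarely used but valid properties, which is why the abstract pairs the statistical phase with a filtering step.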

# Cavatoolsシミュレータのリアルタイム性能解析のためのPOWER命令セットアーキテクチャのRTL擬符号をCに変換する

Transpiling RTL Pseudo-code of the POWER Instruction Set Architecture to C for Real-time Performance Analysis on Cavatools Simulator ( http://arxiv.org/abs/2306.08701v1 )

ライセンス: Link先を確認

Kinar S, Prashanth K V, Adithya Hegde, Aditya Subrahmanya Bhat, Narender M

(参考訳) 本稿では,POWER命令セットアーキテクチャ(ISA)のRTL擬似コードをCコードに変換するトランスパイラフレームワークを提案し,Cavatoolsシミュレータ上での実行を可能にする。トランスパイラは、RTL擬似コードを解析し、対応するCコード表現を生成するレキサとパーサで構成される。レキサは入力コードをトークン化し、パーサは文法ルールを適用して抽象構文木(AST)を構築する。トランスパイラは、要件に準拠したCコードを生成することで、Cavatoolsシミュレータとの互換性を保証する。結果として得られたCコードはCavatoolsシミュレータ上で実行でき、開発者はPower ISAの命令レベルのパフォーマンスをリアルタイムで分析できる。提案フレームワークは,RTL擬似コードをCavatoolsエコシステムにシームレスに統合し,総合的なパフォーマンス解析とPower ISAベースのコードの最適化を可能にする。

This paper presents a transpiler framework for converting RTL pseudo code of the POWER Instruction Set Architecture (ISA) to C code, enabling its execution on the Cavatools simulator. The transpiler consists of a lexer and parser, which parse the RTL pseudo code and generate corresponding C code representations. The lexer tokenizes the input code, while the parser applies grammar rules to build an abstract syntax tree (AST). The transpiler ensures compatibility with the Cavatools simulator by generating C code that adheres to its requirements. The resulting C code can be executed on the Cavatools simulator, allowing developers to analyze the instruction-level performance of the Power ISA in real time. The proposed framework facilitates the seamless integration of RTL pseudo code into the Cavatools ecosystem, enabling comprehensive performance analysis and optimization of Power ISA-based code.

翻訳日:2023-10-23 19:34:01 公開日:2023-06-14
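
The lexer, parser, and code generation stages named in the abstract above can be sketched on a toy scale. Everything specific here is an assumption made for illustration: a simplified ASCII dialect in which `<-` denotes assignment and `(RA)` denotes the contents of register RA, a flat left-to-right expression grammar without operator precedence, and plain C variable names in the output. The paper's actual grammar and the conventions required by Cavatools are not given in the abstract.

```python
import re

# One token per match: assignment arrow, identifier, number, or single-character operator.
TOKEN_RE = re.compile(r"\s*(?:(<-)|([A-Za-z_]\w*)|(\d+)|([()+\-&|]))")


def tokenize(line):
    """Split one RTL line into (kind, text) tokens."""
    tokens, pos = [], 0
    line = line.rstrip()
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected input at column {pos}")
        assign, ident, number, op = m.groups()
        if assign:
            tokens.append(("ASSIGN", assign))
        elif ident:
            tokens.append(("IDENT", ident))
        elif number:
            tokens.append(("NUM", number))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens


def parse_term(tokens, i):
    """Parse `(REG)`, `REG`, or a number starting at index i; return (node, next index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[i]
    if (kind, text) == ("OP", "("):
        if i + 2 >= len(tokens) or tokens[i + 1][0] != "IDENT" or tokens[i + 2] != ("OP", ")"):
            raise SyntaxError("expected (REGISTER)")
        return ("reg", tokens[i + 1][1]), i + 3
    if kind == "IDENT":
        return ("reg", text), i + 1
    if kind == "NUM":
        return ("num", text), i + 1
    raise SyntaxError(f"unexpected token {text!r}")


def parse_assignment(tokens):
    """Parse `TARGET <- term (op term)*` into a small AST (left-associative)."""
    if len(tokens) < 3 or tokens[0][0] != "IDENT" or tokens[1][0] != "ASSIGN":
        raise SyntaxError("expected TARGET <- expression")
    node, i = parse_term(tokens, 2)
    while i < len(tokens):
        kind, op = tokens[i]
        if kind != "OP" or op not in ("+", "-", "&", "|"):
            raise SyntaxError(f"expected operator, got {op!r}")
        rhs, i = parse_term(tokens, i + 1)
        node = ("binop", op, node, rhs)
    return ("assign", tokens[0][1], node)


def emit_c(node):
    """Turn the AST into one line of C, fully parenthesising binary operations."""
    tag = node[0]
    if tag == "assign":
        return f"{node[1]} = {emit_c(node[2])};"
    if tag == "binop":
        return f"({emit_c(node[2])} {node[1]} {emit_c(node[3])})"
    return node[1]  # "reg" and "num" leaves are emitted by name or literal value


c_line = emit_c(parse_assignment(tokenize("RT <- (RA) + (RB)")))
```

Emitting fully parenthesised expressions keeps the generated C unambiguous even though this toy parser ignores precedence; a real transpiler would also need to map register names onto whatever state representation the target simulator expects.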

# 機械学習を用いた企業間プロジェクトメトリクスから説明可能なソフトウェア欠陥予測

Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning ( http://arxiv.org/abs/2306.08655v1 )

ライセンス: Link先を確認

Susmita Haldar, Luiz Fernando Capretz

(参考訳) プロジェクトの欠陥の数を予測することは、プロジェクトテストマネージャが、テスト、サポート、メンテナンスの作業のために予算、リソース、スケジュールを割り当てるのに重要です。ソフトウェア欠陥予測モデルは、過去の欠陥関連情報をトレーニングした後、与えられたプロジェクトの欠陥の数を予測する。欠陥予測研究の大部分は、手法やクラスレベルの静的情報から欠陥の可能性のあるモジュールを予測することに焦点を当てているが、この研究は、クロス企業プロジェクトデータセットに基づいたプロジェクトレベルの情報から欠陥を予測するものである。本研究は,様々な機械学習アルゴリズムを応用した欠陥予測モデルの開発に焦点をあて,ソフトウェアサイズメトリクス,労力メトリクス,欠陥密度情報を活用する。既存の欠陥予測研究で注目すべき問題は、開発モデルにおける透明性の欠如である。その結果,Shapley Additive exPlanations (SHAP) と呼ばれる最先端のポストホックモデルに依存しない手法を用いて,開発モデルの説明可能性を示した。最後に、企業間のプロジェクト情報から欠陥を予測する重要な特徴を特定した。

Predicting the number of defects in a project is critical for project test managers to allocate budget, resources, and schedule for testing, support, and maintenance efforts. Software defect prediction models predict the number of defects in given projects after training the model with historical defect-related information. The majority of defect prediction studies have focused on predicting defect-prone modules from method- and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. This study utilizes software sizing metrics, effort metrics, and defect density information, and focuses on developing defect prediction models that apply various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. Consequently, the explainability of the developed model has been demonstrated using the state-of-the-art post-hoc model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified.

翻訳日:2023-10-23 19:33:40 公開日:2023-06-14
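
The idea behind SHAP in the abstract above is the Shapley value: each feature's attribution is its average marginal contribution over all orders in which features can be "switched on". The brute-force sketch below computes exact Shapley values for one prediction; it is an illustration under assumptions, not the study's code. The toy model, its coefficients, the feature names, and the all-zero baseline are invented, and practical SHAP implementations use far more scalable estimators than enumerating every ordering.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by averaging over all feature orderings.

    Features not yet "added" in an ordering are held at their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]          # switch feature j from baseline to its real value
            value = model(current)
            phi[j] += value - prev     # marginal contribution of feature j in this ordering
            prev = value
    return [p / len(orderings) for p in phi]


# Hypothetical defect-count model over (size, effort, defect density).
def toy_model(features):
    size, effort, density = features
    return 2.0 * size + 0.5 * effort + size * density


x = [10.0, 20.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term `size * density` is split evenly between the two features involved, and the attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that gives SHAP its name.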

# 商品市場におけるDeep Policy Gradient Methods

Deep Policy Gradient Methods in Commodity Markets ( http://arxiv.org/abs/2308.01910v1 )

ライセンス: Link先を確認

Jonas Hanetho

(参考訳) エネルギー移行は、断続的なエネルギー源への依存を高め、エネルギー市場を不安定化し、前例のないボラティリティを引き起こし、2021年の世界的なエネルギー危機で頂点に達した。生産者や消費者を害するだけでなく、揮発性エネルギー市場は重要な脱炭努力を危うくする可能性がある。トレーダーは流動性とボラティリティの低減によって市場の安定化に重要な役割を果たしている。将来のリターンを予測するための数理モデルと統計モデルが提案されている。しかし、金融市場の信号対雑音比や非定常力学のため、そのようなモデルの開発は簡単ではない。本論文は,商品取引における深層強化学習手法の有効性について考察する。商品取引問題を離散時間確率力学系として定式化する。このシステムは、市場のボラティリティに反応し適応し、サブサンプルの金融時系列により良い統計特性を提供する、新しい時間分散方式を採用している。取引コストとリスクに敏感な取引エージェントを最適化するために,アクターベースとアクタークリティカルベースという2つのポリシー勾配アルゴリズムを提案する。エージェントは、ディープニューラルネットワークアーキテクチャ、特にCNNとLSTMを用いたパラメトリック関数近似器を介して、過去の価格観測を市場ポジションにマッピングする。深層強化学習モデルの平均は、2017年から2022年までの前月の天然ガス先物試験において、買い買いベースラインよりも83%高いシャープ率を示している。バックテストにより, 深層強化学習エージェントのリスク耐性は, リスク感受性項を用いて調整可能であることが示された。アクターに基づくポリシー勾配アルゴリズムはアクター批判に基づくアルゴリズムよりも大幅に優れており、CNNベースのモデルはLSTMに基づくアルゴリズムよりも若干優れている。

The energy transition has increased the reliance on intermittent energy sources, destabilizing energy markets and causing unprecedented volatility, culminating in the global energy crisis of 2021. In addition to harming producers and consumers, volatile energy markets may jeopardize vital decarbonization efforts. Traders play an important role in stabilizing markets by providing liquidity and reducing volatility. Several mathematical and statistical models have been proposed for forecasting future returns. However, developing such models is non-trivial due to financial markets' low signal-to-noise ratios and nonstationary dynamics. This thesis investigates the effectiveness of deep reinforcement learning methods in commodities trading. It formalizes the commodities trading problem as a continuing discrete-time stochastic dynamical system. This system employs a novel time-discretization scheme that is reactive and adaptive to market volatility, providing better statistical properties for the sub-sampled financial time series. Two policy gradient algorithms, an actor-based and an actor-critic-based, are proposed for optimizing a transaction-cost- and risk-sensitive trading agent. The agent maps historical price observations to market positions through parametric function approximators utilizing deep neural network architectures, specifically CNNs and LSTMs. On average, the deep reinforcement learning models produce an 83 percent higher Sharpe ratio than the buy-and-hold baseline when backtested on front-month natural gas futures from 2017 to 2022. The backtests demonstrate that the risk tolerance of the deep reinforcement learning agents can be adjusted using a risk-sensitivity term. The actor-based policy gradient algorithm performs significantly better than the actor-critic-based algorithm, and the CNN-based models perform slightly better than those based on the LSTM.

翻訳日:2023-10-23 15:31:41 公開日:2023-06-14
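
The evaluation described in the abstract above (Sharpe ratio of a transaction-cost-sensitive agent versus buy-and-hold) can be sketched roughly as follows. This is a minimal illustration, not the thesis's code: the function names, the zero risk-free rate, the proportional cost model, and the annualisation factor are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio of per-period returns (risk-free rate assumed to be zero)."""
    spread = stdev(returns)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(returns) / spread * sqrt(periods_per_year)


def strategy_returns(price_returns, positions, cost_per_unit=0.0):
    """Net per-period returns of a strategy.

    positions[t] is the market position (e.g. -1, 0, +1) held over period t;
    a cost proportional to the change in position is charged at each step.
    """
    net = []
    previous = 0.0
    for asset_return, position in zip(price_returns, positions):
        trading_cost = cost_per_unit * abs(position - previous)
        net.append(position * asset_return - trading_cost)
        previous = position
    return net


# Hypothetical data: compare an agent's positions with buy-and-hold.
price_returns = [0.010, -0.020, 0.015, -0.005, 0.012]
agent_positions = [1, -1, 1, 0, 1]
buy_and_hold = [1] * len(price_returns)

agent_net = strategy_returns(price_returns, agent_positions, cost_per_unit=0.001)
baseline_net = strategy_returns(price_returns, buy_and_hold, cost_per_unit=0.001)
print(sharpe_ratio(agent_net), sharpe_ratio(baseline_net))
```

Charging the cost on the change in position is what makes frequent position flips expensive, so an agent optimised on net returns is pushed toward fewer, larger trades.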

# ボース、光子スピン、識別不能の物語

The Story of Bose, Photon Spin and Indistinguishability ( http://arxiv.org/abs/2308.01909v1 )

ライセンス: Link先を確認

Partha Ghose

(参考訳) 1924年の量子統計の発見百周年に近づくにつれ、ボース・アインシュタイン統計のほとんどの標準的なプレゼンテーションで無視されるプランクの法則のオリジナルの導出を再検討することが重要である。これは光子の区別不可能性という新しい概念だけでなく、その固有のスピンの概念も導入した。

As we approach the centenary of the discovery of quantum statistics in 1924, it is important to revisit Bose's original derivation of Planck's law, which is usually ignored in most standard presentations of Bose-Einstein statistics. It introduced the novel concept not only of the indistinguishability of photons but also of their intrinsic spin, a fact unknown to most physicists.

翻訳日:2023-10-23 15:31:12 公開日:2023-06-14
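
The role of indistinguishability in the counting can be illustrated with the modern textbook formula for distributing identical quanta over phase-space cells. The sketch below is that textbook counting, not a reproduction of Bose's original combinatorial argument; the function names are illustrative.

```python
from math import comb


def bose_microstates(n_quanta, n_cells):
    """Ways to place n_quanta indistinguishable quanta into n_cells cells."""
    return comb(n_quanta + n_cells - 1, n_quanta)


def distinguishable_microstates(n_quanta, n_cells):
    """Ways to place n_quanta labelled (distinguishable) particles into n_cells cells."""
    return n_cells ** n_quanta


for quanta, cells in [(2, 2), (3, 4), (10, 5)]:
    print(quanta, cells,
          bose_microstates(quanta, cells),
          distinguishable_microstates(quanta, cells))
```

For two quanta in two cells the indistinguishable count is 3 while the distinguishable count is 4; this difference in counting is what separates Bose-Einstein statistics from classical Boltzmann counting.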

# トロイの木馬がいるのか! IoT環境における最新のMLによる最新の侵入検知システムの文献調査と評価

Is there a Trojan! : Literature survey and critical evaluation of the latest ML based modern intrusion detection systems in IoT environments ( http://arxiv.org/abs/2310.10778v1 )

ライセンス: Link先を確認

Vishal Karanam

(参考訳) ドメインとしてのIoTはここ数年で大きく成長し、データ量だけでなく、サイバーセキュリティの脅威もモバイルネットワーク環境に匹敵している。 IoT環境内のデータの機密性とプライバシは、ここ数年でセキュリティ研究の重要な領域になっている。ますます多くのセキュリティ専門家が、従来のセキュリティ手法を補完するものとして、IoT環境を保護する堅牢なIDSシステムを設計することに関心を持っている。 IoTデバイスはリソース制約があり、異種プロトコルスタックがあるため、従来の侵入検出アプローチはこれらのスキーマ境界内ではうまく機能しない。これにより、セキュリティ研究者は、IoTエコシステムにおける非学習ベースのIDSシステムの欠点を解決するために、マシンラーニングとIDSの交差点でイノベーションを行うことができた。さまざまなMLアルゴリズムがIoTデータセットですでに高い精度を実現していますが、十分なプロダクショングレードモデルがないことが分かります。本稿では,iot侵入検出システムにおける最新の学習ベースアプローチの概要を概説するとともに,これらのシステム,mlパイプラインの潜在的な落とし穴,mlの観点からの課題について徹底的なレビューを行い,今後の研究範囲と推奨事項について論じる。

IoT as a domain has grown so much in the last few years that it rivals mobile network environments in terms of data volumes as well as cybersecurity threats. The confidentiality and privacy of data within IoT environments have become very important areas of security research within the last few years. More and more security experts are interested in designing robust IDS systems to protect IoT environments as a supplement to the more traditional security methods. Given that IoT devices are resource-constrained and have a heterogeneous protocol stack, most traditional intrusion detection approaches don't work well within these schematic boundaries. This has led security researchers to innovate at the intersection of Machine Learning and IDS to solve the shortcomings of non-learning based IDS systems in the IoT ecosystem. Although various ML algorithms already achieve high accuracy on IoT datasets, there is a lack of sufficiently production-grade models. This survey paper presents a comprehensive summary of the latest learning-based approaches used in IoT intrusion detection systems, conducts a thorough critical review of these systems, of potential pitfalls in ML pipelines and of challenges from an ML perspective, and discusses future research scope and recommendations.

翻訳日:2023-10-23 02:21:32 公開日:2023-06-14
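
One way high reported accuracy can coexist with a model that is not production-ready is class imbalance in intrusion datasets. The sketch below is an illustrative example of that general pitfall, chosen here as an assumption; the abstract does not list which pitfalls the survey covers. Labels use 1 for attack traffic and 0 for benign traffic.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary detector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical traffic sample: 95 benign flows, 5 attacks.
labels = [0] * 95 + [1] * 5
always_benign = [0] * 100  # a "detector" that never raises an alert

print(binary_metrics(labels, always_benign))
```

A detector that never alerts scores 95% accuracy on this sample while detecting no attacks at all, which is why recall and precision on the attack class are usually reported alongside accuracy.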

# m$^2$hub: 材料発見のための機械学習の可能性を解き放つ

M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery ( http://arxiv.org/abs/2307.05378v1 )

ライセンス: Link先を確認

Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu, Tian Xie, Chenru Duan, John M. Gregoire, Carla P. Gomes

(参考訳) 材料発見における機械学習を促進するツールキットであるM$^2$Hubを紹介する。機械学習は分子構造、特に創薬のための生体分子のモデリングにおいて著しく進歩した。しかし、材料構造のモデリングのための機械学習手法の開発は遅れており、材料発見のための多様なタスクへのアクセスを可能にする統合プラットフォームが欠如していることにも理由がある。このギャップを埋めるため、M$^2$Hubは、ワークフロー全体をカバーする材料発見タスク、データセット、機械学習メソッド、評価、ベンチマーク結果へのアクセスを簡単にする。具体的には、M$^2$Hubの最初のリリースでは、仮想スクリーニング、逆設計、分子シミュレーションの3つの重要な段階に焦点を当てている。さらに,材料生成タスクのための2つの合成データセットを提供する。ランダムなデータ分割に加えて、現実世界の物質発見シナリオを反映した3つのデータパーティションも提供します。最先端の機械学習手法(材料構造に適しているが文献では比較されないものを含む)は、代表的タスクでベンチマークされる。私たちのコードとライブラリはhttps://github.com/yuanqidu/m2hubで公開されています。

We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lags behind, which is partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets that cover 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for the purpose of generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect real-world materials discovery scenarios. State-of-the-art machine learning methods (including those that are suitable for materials structures but have never been compared in the literature) are benchmarked on representative tasks. Our code and library are publicly available at https://github.com/yuanqidu/M2Hub.

翻訳日:2023-07-16 03:43:09 公開日:2023-06-14

# 非伝統的な認知知能ロボット制御:人間の感情推定における量子ソフトコンピューティングアプローチ-QCOptKBツールキット応用

Unconventional Cognitive Intelligent Robotic Control: Quantum Soft Computing Approach in Human Being Emotion Estimation -- QCOptKB Toolkit Application ( http://arxiv.org/abs/2307.06858v1 )

ライセンス: Link先を確認

Sergey V. Ulyanov, Ichiro Kurawaki, Viktor S. Ulyanov, Takakhide Hagiwara

(参考訳) 量子・ソフトコンピューティングに基づく知的認知制御システムの戦略について知的ファジィコントローラから抽出した量子自己組織化知識ベース相乗効果の不完全知識ベースその技術は、認知神経インタフェースと異なるタイプのロボット協調で記述されたハザード制御状況における知的認知制御システムの堅牢性を改善した。例えば、ボード埋め込み制御系のためのプログラム可能なアルゴリズムによる解法として量子ファジィ推論ゲート設計が導入された。車両の運転に量子ファジィ制御を用いた認知ヘルメットを用いたニューラルインタフェースの適用の可能性を示す。

A strategy for intelligent cognitive control systems based on quantum and soft computing is presented. The synergetic effect of a quantum self-organizing knowledge base extracted from the imperfect knowledge bases of intelligent fuzzy controllers is described. This technology improves the robustness of intelligent cognitive control systems in hazardous control situations, described with a cognitive neuro-interface and different types of robot cooperation. Examples demonstrate the introduction of quantum fuzzy inference gate design as a ready-made programmable algorithmic solution for on-board embedded control systems. The possibility of applying a neuro-interface based on a cognitive helmet with a quantum fuzzy controller to driving a vehicle is shown.

翻訳日:2023-07-16 03:17:24 公開日:2023-06-14

# ELMニューロン:高能率・高能率皮質ニューロンモデルによる長期作業の解法

The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks ( http://arxiv.org/abs/2306.16922v1 )

ライセンス: Link先を確認

Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, Anna Levina

(参考訳) 伝統的な大規模神経科学モデルと機械学習は、複雑な計算を行うために集団活動と適切に調整された接続に依存する、個々のニューロンの単純化されたモデルを利用する。しかし、それぞれの生物学的皮質ニューロンは本質的に高度な計算装置であり、数百万のパラメータを持つ深層人工ニューラルネットワークを用いて、皮質錐体ニューロンの詳細な生体物理モデルの入力-出力関係を再現するという最近の研究で裏付けられている。我々はこれらの多くのパラメータの必要性を疑問視し、生物学的にインスパイアされ、計算的に表現され、しかし効率的な皮質ニューロンモデルであるExpressive Leaky Memory(ELM)ニューロンを導入する。 ELMニューロンは、上記の入力と出力の関係を正確に一致させるために、わずか8Kのトレーニング可能なパラメータしか必要としない。正確なモデルは複数のメモリのような隠れ状態と複雑な非線形シナプス積分を必要とする。本研究では,CIFAR-10分類タスクの逐次バージョン,挑戦的パスファインダー-Xタスク,スパイキングハイデルバーグ・ディジットスデータセットに基づく新しいデータセットなど,時間的構造を必要とする様々なタスクにおけるEMMニューロンの評価を行う。 ELMニューロンは、Pathfinder-Xタスク上で77%の精度でトランスフォーマーベースモデルより優れており、シークエンシャルCIFAR-10上での競合性能を示し、スパイキングハイデルバーグ・ディジットスデータセットの古典LSTMモデルよりも優れた性能を示している。これらの結果は、生物学的に動機づけられ、計算効率の良いニューロンモデルが機械学習タスクの性能を向上させる可能性を示唆している。

Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity for these many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration. To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.

翻訳日:2023-07-02 13:07:13 公開日:2023-06-14
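
The idea of several memory-like hidden states with different timescales can be illustrated with a generic leaky integrator. The sketch below is an assumption-laden illustration of that general mechanism only; it is not the ELM neuron's actual update rule, and the function name, the exponential decay form, and the timescale values are all hypothetical.

```python
from math import exp


def leaky_memory_step(memory, drive, timescales, dt=1.0):
    """One update of several leaky memory units.

    Each unit decays toward its input with its own time constant:
    m_i <- k_i * m_i + (1 - k_i) * drive_i, with k_i = exp(-dt / tau_i).
    """
    updated = []
    for m, x, tau in zip(memory, drive, timescales):
        decay = exp(-dt / tau)
        updated.append(decay * m + (1.0 - decay) * x)
    return updated


# Two units driven by the same constant input: one fast, one slow.
state = [0.0, 0.0]
timescales = [1.0, 100.0]
for _ in range(5):
    state = leaky_memory_step(state, [1.0, 1.0], timescales)
print(state)
```

The fast unit tracks recent input closely while the slow unit retains a long-running average, which is one simple way a single neuron model could carry information across the long horizons the abstract's tasks require.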

# 教育のための社会生成AI : 理論・実践・倫理

Towards social generative AI for education: theory, practices and ethics ( http://arxiv.org/abs/2306.10063v1 )

ライセンス: Link先を確認

Mike Sharples

(参考訳) 本稿では,人間と人工知能を介する教育的相互作用を,プロンプトや応答のシーケンスではなく,会話や探索の社会的プロセスとして考察する。この概念では、学習者はインターネットツールやリソースの動的計算媒体内のAI言語モデルと絶えず会話する。学習は、この分散システムが目標を設定し、データから意味を構築し、理解を集約し、違いを調和させ、知識を新しいドメインに移すときに起こる。教育のための社会的生成AIの構築には、人間だけでなく、互いに会話できる強力なAIシステムの開発、知識マップのような外部表現の構築、インターネットリソースへのアクセスと貢献、教師、学習者、ガイド、メンターの役割が必要となる。これは倫理の根本的な問題を引き起こす。このようなシステムは、彼らの限界、学習者に対する責任、インターネットの完全性、そして人間の教師や専門家に対する敬意を意識すべきである。教育のための社会的生成AIの設計と制約について検討する必要がある。

This paper explores educational interactions involving humans and artificial intelligences not as sequences of prompts and responses, but as a social process of conversation and exploration. In this conception, learners continually converse with AI language models within a dynamic computational medium of internet tools and resources. Learning happens when this distributed system sets goals, builds meaning from data, consolidates understanding, reconciles differences, and transfers knowledge to new domains. Building social generative AI for education will require development of powerful AI systems that can converse with each other as well as humans, construct external representations such as knowledge maps, access and contribute to internet resources, and act as teachers, learners, guides and mentors. This raises fundamental problems of ethics. Such systems should be aware of their limitations, their responsibility to learners and the integrity of the internet, and their respect for human teachers and experts. We need to consider how to design and constrain social generative AI for education.

翻訳日:2023-06-26 01:29:51 公開日:2023-06-14

# 言語モデル能力の構造を明らかにする

Revealing the structure of language model capabilities ( http://arxiv.org/abs/2306.10062v1 )

ライセンス: Link先を確認

Ryan Burnell, Han Hao, Andrew R. A. Conway, and Jose Hernandez Orallo

(参考訳) 大規模言語モデル(LLM)の能力に関する理論的理解を構築することは、これらのシステムの振る舞いを予測し、説明する能力に不可欠である。本稿では, LLMの個体群間での個人差パターンから潜在能力を抽出し, LLMの機能構造について検討する。ベイジアン因子と頻繁な因子分析の組み合わせを用いて,27の認知タスクにわたる29のLLMからのデータを分析した。 LLM機能はモノリシックではないという証拠が見つかった。その代わり、推論、理解、コア言語モデリングを表す3つのよく定義された要素によってよりよく説明されます。さらに,これらの3因子は,モデル性能のばらつきの比率が高いことを説明できることがわかった。これらの結果は、異なるLLMの能力において一貫した構造を示し、これらの能力の多面的性質を示す。また,3つの能力はモデルサイズや命令チューニングなどのモデル特性と異なる関係を示すことがわかった。これらのパターンは、スケーリング法則の理解を深め、ある能力を改善するモデルの変更が同時に他人を損なう可能性があることを示すのに役立つ。これらの結果から,各モデル能力に合わせたタスクに着目して,ベンチマークを合理化できることが示唆された。

Building a theoretical understanding of the capabilities of large language models (LLMs) is vital for our ability to predict and explain the behavior of these systems. Here, we investigate the structure of LLM capabilities by extracting latent capabilities from patterns of individual differences across a varied population of LLMs. Using a combination of Bayesian and frequentist factor analysis, we analyzed data from 29 different LLMs across 27 cognitive tasks. We found evidence that LLM capabilities are not monolithic. Instead, they are better explained by three well-delineated factors that represent reasoning, comprehension and core language modeling. Moreover, we found that these three factors can explain a high proportion of the variance in model performance. These results reveal a consistent structure in the capabilities of different LLMs and demonstrate the multifaceted nature of these capabilities. We also found that the three abilities show different relationships to model properties such as model size and instruction tuning. These patterns help refine our understanding of scaling laws and indicate that changes to a model that improve one ability might simultaneously impair others. Based on these findings, we suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.

翻訳日:2023-06-26 01:29:35 公開日:2023-06-14

# エージェント、システム、サービスの統合のためのオントロジー:OASISバージョン2

The Ontology for Agents, Systems and Integration of Services: OASIS version 2 ( http://arxiv.org/abs/2306.10061v1 )

ライセンス: Link先を確認

Giampaolo Bella, Domenico Cantone, Carmelo Fabio Longo, Marianna Nicolosi-Asmundo and Daniele Francesco Santamaria

(参考訳) セマンティック表現はいくつかのアプリケーションドメインにとって重要なイネーブルであり、マルチエージェントシステム領域は例外ではない。エージェントを意味的に表現する手法の1つとして、行動主義的なビジョンを持ち、どのように作用し、仲間と関わりあうかを記述することで、本質的に達成されている。このアプローチは基本的に、タスクの達成に関連する精神状態を通じてエージェントの運用能力を定義することを目的としている。 2019年に発表されたOASISオントロジー(An Ontology for Agent, Systems, and Integration of Services)は、セマンティック表現システムとエージェントとそのコミットメントのための通信プロトコルを提供するための行動論的アプローチを追求している。本稿では、oasis 2におけるエージェントの表現に関する主なモデル選択、oasisの最新のメジャーアップグレード、特にブロックチェーンのオントロジーの文脈において、導入以来のオントロジーによって達成された成果について報告する。

Semantic representation is a key enabler for several application domains, and the multi-agent systems realm is no exception. Among the methods for semantically representing agents, one takes a behaviouristic view, through which one can describe how agents operate and engage with their peers. The approach essentially aims at defining the operational capabilities of agents through the mental states related to the achievement of tasks. The OASIS ontology -- An Ontology for Agent, Systems, and Integration of Services, presented in 2019 -- pursues the behaviouristic approach to deliver a semantic representation system and a communication protocol for agents and their commitments. This paper reports on the main modeling choices concerning the representation of agents in OASIS 2, the latest major upgrade of OASIS, and the achievements of the ontology since it was first introduced, in particular in the context of ontologies for blockchains.

翻訳日:2023-06-26 01:29:16 公開日:2023-06-14

# MUBen:分子特性予測のための事前学習モデルの不確かさのベンチマーク

MUBen: Benchmarking the Uncertainty of Pre-Trained Models for Molecular Property Prediction ( http://arxiv.org/abs/2306.10060v1 )

ライセンス: Link先を確認

Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang

(参考訳) 大量のラベルのない分子データに基づいて事前訓練された大きなトランスフォーマーモデルは、分子特性を予測することに成功している。しかし、これらのモデルは微調整中に過度に適合しがちであり、トレーニング分布の外側にあるテストデータに対する過密な予測が引き起こされる。この問題を解決するために、モデルのキャリブレーションを改善するために不確実量化法(UQ)を用いることができる。多くのUQアプローチが存在するが、それらすべてが性能改善につながるわけではない。分子前訓練モデルを改善するためにUQを用いた研究もあるが、信頼性の高い分子不確実性推定のための適切なバックボーンとUQ法を選択するプロセスはまだ未定である。このギャップに対処するために,backboneモデルとuqモデルの異なる組み合わせを評価し,特性予測と不確実性推定の両方のパフォーマンスを定量化するmubenを提案する。異なる分子記述子を用いた様々なバックボーン分子表現モデルを、異なるカテゴリからのUQ手法による入力として微調整することにより、アーキテクチャ決定とトレーニング戦略の影響を批判的に評価する。本研究は、材料科学や薬物発見などの分野における不確実性クリティカルな応用の研究を促進するために、UQモデルとバックボーンモデルを選択するための洞察を提供する。

Large Transformer models pre-trained on massive unlabeled molecular data have shown great success in predicting molecular properties. However, these models can be prone to overfitting during fine-tuning, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the models' calibration of predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have used UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different combinations of backbone and UQ models to quantify their performance for both property prediction and uncertainty estimation. By fine-tuning various backbone molecular representation models using different molecular descriptors as inputs with UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies. Our study offers insights for selecting UQ and backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.

翻訳日:2023-06-26 01:29:00 公開日:2023-06-14
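
Over-confident predictions of the kind described above are commonly quantified with a calibration metric. The sketch below computes the expected calibration error (ECE) for a classifier; it is a generic illustration, and the abstract does not state which metrics MUBen reports, so the choice of ECE and the equal-width binning are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - mean confidence| over confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - mean_confidence)
    return ece


# Hypothetical model that claims 90% confidence but is right only half the time.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(overconfident)
```

A well-calibrated model has an ECE near zero; a UQ method that improves calibration should lower this value without necessarily changing accuracy.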

# EM-Network:Oracleがシーケンス学習のための自己蒸留をガイド

EM-Network: Oracle Guided Self-distillation for Sequence Learning ( http://arxiv.org/abs/2306.10058v1 )

ライセンス: Link先を確認

Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

(参考訳) 我々は,seq2seq学習における目標情報を有効に活用する,新しい自己蒸留法であるem-networkを提案する。従来の手法とは対照的に、ターゲットシーケンスから派生したoracle guidanceでトレーニングされる。オラクルガイダンスは、タスク解決時にシーケンスモデルを支援するターゲット側コンテキストをコンパクトに表現するため、EM-Networkは、ソース入力のみを使用する場合よりも予測が優れている。そこで本研究では,EM-Network の有望な能力を引き継ぐために,EM-Network の知識を1段階的に活用できる新たな自己蒸留手法を提案する。音声認識のためのコネクショニスト時間分類(ctc)と機械翻訳のためのアテンションベースエンコーダデコーダ(aed)の2種類のseq2seqモデルについて包括的実験を行った。実験の結果,em-networkは,音声認識における最善の先行作業よりも改善し,wmt'14およびiwslt'14における最先端性能を確立した。

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which is derived from the target sequence. Since the oracle guidance compactly represents the target-side context that can assist the sequence model in solving the task, the EM-Network achieves a better prediction compared to using only the source input. To allow the sequence model to inherit the promising capability of the EM-Network, we propose a new self-distillation strategy, where the original sequence model can benefit from the knowledge of the EM-Network in a one-stage manner. We conduct comprehensive experiments on two types of seq2seq models: connectionist temporal classification (CTC) for speech recognition and attention-based encoder-decoder (AED) for machine translation. Experimental results demonstrate that the EM-Network significantly advances the current state-of-the-art approaches, improving over the best prior work on speech recognition and establishing state-of-the-art performance on WMT'14 and IWSLT'14.

翻訳日:2023-06-26 01:28:38 公開日:2023-06-14

# 表現を理解するために生成する

Generate to Understand for Representation ( http://arxiv.org/abs/2306.10056v1 )

ライセンス: Link先を確認

Changshang Xue, Xiande Zhong, Xiaoqing Liu

(参考訳) 近年,自然言語理解(NLU)や自然言語生成(NLG),テキスト表現タスクなど,高品質な事前訓練モデルが多数出現している。従来、これらのモデルはカスタムドメインコーパスで事前トレーニングされ、特定のタスク用に微調整されており、gpuの使用と労力に関するコストが高くなる。残念ながら、最近の言語モデリングのトレンドは、スケーリングによるパフォーマンス向上に移行し、関連するコストをさらに高めている。 GUR: 言語モデリングと対照的な学習目標を組み合わせた事前トレーニングフレームワークを,単一のトレーニングステップで導入する。文書からLCS(Longest Common Substring)に基づいて類似したテキストペアを選択し,マスク付き言語モデリングと教師なしコントラスト学習を用いてモデルを訓練する。その結果得られたモデルであるGURは、ラベル付きトレーニングデータを使わずに印象的な結果を得ることができ、ゼロショット設定でリコールベンチマークにおいて、他のトレーニング済みベースラインよりも優れている。さらに,我々のアブレーション実験で示されたように,GURは言語モデリング能力を維持している。我々のコードは \url{https://github.com/laohur/GUR} で入手できる。

In recent years, a significant number of high-quality pretrained models have emerged, greatly impacting Natural Language Understanding (NLU), Natural Language Generation (NLG), and Text Representation tasks. Traditionally, these models are pretrained on custom domain corpora and finetuned for specific tasks, resulting in high costs related to GPU usage and labor. Unfortunately, recent trends in language modeling have shifted towards enhancing performance through scaling, further exacerbating the associated costs. We introduce GUR, a pretraining framework that combines language modeling and contrastive learning objectives in a single training step. We select similar text pairs based on their Longest Common Substring (LCS) from raw unlabeled documents and train the model using masked language modeling and unsupervised contrastive learning. The resulting model, GUR, achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever at the recall benchmark in a zero-shot setting. Additionally, GUR maintains its language modeling ability, as demonstrated in our ablation experiment. Our code is available at https://github.com/laohur/GUR.

翻訳日:2023-06-26 01:28:17 公開日:2023-06-14
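
The Longest Common Substring used above to pick similar text pairs can be computed with a standard dynamic program. The sketch below shows that textbook algorithm only; how GUR thresholds or samples pairs from the LCS is not stated in the abstract, so nothing about pair selection is implemented here.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by strings a and b.

    Uses O(len(a) * len(b)) time and O(len(b)) memory.
    """
    best_length, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length, best_end = current[j], i
        previous = current
    return a[best_end - best_length:best_end]


print(longest_common_substring("contrastive learning", "masked learning objective"))
```

Note that this is the longest common substring (contiguous), not the longest common subsequence, which is also abbreviated LCS.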

# 効率的なメッシュセグメンテーションのための神経形状径関数

Neural Shape Diameter Function for Efficient Mesh Segmentation ( http://arxiv.org/abs/2306.11737v1 )

ライセンス: Link先を確認

Bruno Roy

(参考訳) 多角形メッシュを意味のある部分に分割することは難しい。多くのアプリケーションはコンピュータグラフィックスのさらなる処理のためにそのような構造を分解する必要がある。この10年間、集中計算時間を犠牲にして、この問題に取り組むためのいくつかの方法が提案された。近年,3次元構造のセグメンテーション作業に機械学習が有効であることが証明されている。それでも、これらの最先端のメソッドは、しばしば一般化しにくく、学習したモデルをオーバーフィッティングを避けるためにいくつかの特定のオブジェクトクラスに分割する必要がある。複数のアプリケーションのためのメッシュセグメンテーションの前に,ディープラーニングを利用してマッピング関数を符号化する。我々のネットワークは, 頂点近傍の類似性を利用した textsl{Shape Diameter Function} (SDF) 法の知識を用いて, 周辺地図を再現する。我々のアプローチは、入力メッシュをサンプリングし、近所の貢献のみのために全解像度構造をクエリするので、解像度に依存しない。予測したsdf値を用いることで、グラフカットアルゴリズムに構造を注入し、効率良くロバストなメッシュセグメンテーションを生成し、必要な計算時間をかなり削減できる。

Partitioning a polygonal mesh into meaningful parts can be challenging. Many applications require decomposing such structures for further processing in computer graphics. In the last decade, several methods were proposed to tackle this problem, at the cost of intensive computation times. Recently, machine learning has proven to be effective for the segmentation task on 3D structures. Nevertheless, these state-of-the-art methods often generalize poorly and require dividing the learned model into several specific classes of objects to avoid overfitting. We present a data-driven approach leveraging deep learning to encode a mapping function prior to mesh segmentation for multiple applications. Our network reproduces a neighborhood map based on our knowledge of the Shape Diameter Function (SDF) method, using similarities among vertex neighborhoods. Our approach is resolution-agnostic as we downsample the input meshes and query the full-resolution structure solely for neighborhood contributions. Using our predicted SDF values, we can inject the resulting structure into a graph-cut algorithm to generate an efficient and robust mesh segmentation while considerably reducing the required computation times.

翻訳日:2023-06-26 01:20:47 公開日:2023-06-14

# 容量獲得型入力分布における相互情報

The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions ( http://arxiv.org/abs/2304.14219v3 )

ライセンス: Link先を確認

Barış Nakiboğlu and Hao-Chung Cheng

(参考訳) 容量獲得入力分布の小さな近傍では、容量獲得入力分布との距離との相互情報の減少は、tops{\o}eとpinskerの不等式による同一性を用いた(多倍の)線形制約を持つすべてのチャネルの容量達成入力分布と有限入力集合との間の距離の2乗の線形関数によって下限される。そのような二次境界の非存在を示すカウンター例は、無限個の線形制約と無限個の入力集合の場合に与えられる。ピンスカーの不等式ではなくテイラー級数近似を用いて、容量獲得入力分布の小さい近傍において、容量獲得入力分布までの距離における相互情報の最も遅い減少の正確な特性を決定する。出力密度作用素が分離可能なヒルベルト空間上で定義される古典量子チャネルに対して、アナログ結果が確立される。チャネル符号化問題に対するこれらの観測の意義と関連する問題への証明手法の適用について論じる。

On small neighborhoods of the capacity-achieving input distributions, the decrease of the mutual information with the distance to the capacity-achieving input distributions is bounded below by a linear function of the square of that distance for all channels with (possibly multiple) linear constraints and finite input sets; the bound follows from an identity due to Topsøe and from Pinsker's inequality. Counterexamples demonstrating the non-existence of such a quadratic bound are provided for the case of infinitely many linear constraints and the case of infinite input sets. Using a Taylor series approximation, rather than Pinsker's inequality, the exact characterization of the slowest decrease of the mutual information with the distance to the capacity-achieving input distributions is determined on small neighborhoods of the capacity-achieving input distributions. Analogous results are established for classical-quantum channels whose output density operators are defined on a separable Hilbert space. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.

翻訳日:2023-06-19 17:14:34 公開日:2023-06-14
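
The quadratic behaviour described above can be checked numerically on the simplest example, a binary symmetric channel, whose capacity-achieving input distribution is uniform. The sketch below is an illustration chosen here, not taken from the paper; in particular, measuring the distance between input distributions in the L1 norm is an assumption.

```python
from math import log, log2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_mutual_information(input_p1, crossover):
    """I(X;Y) in bits for a binary symmetric channel with P(X=1) = input_p1."""
    output_p1 = input_p1 * (1 - crossover) + (1 - input_p1) * crossover
    return binary_entropy(output_p1) - binary_entropy(crossover)


def capacity_gap_ratio(input_p1, crossover):
    """(C - I(p)) divided by the squared L1 distance from p to the uniform input."""
    capacity = 1.0 - binary_entropy(crossover)
    gap = capacity - bsc_mutual_information(input_p1, crossover)
    distance = 2 * abs(input_p1 - 0.5)  # L1 distance between (1-p, p) and (1/2, 1/2)
    return gap / distance ** 2


crossover = 0.1
for p in (0.6, 0.55, 0.51, 0.501):
    print(p, capacity_gap_ratio(p, crossover))

# Second-order Taylor coefficient for this channel, for comparison.
print((1 - 2 * crossover) ** 2 / (2 * log(2)))
```

As the input distribution approaches the uniform one, the ratio settles to a positive constant, so the loss in mutual information shrinks like the square of the distance rather than linearly.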

# SAFER:顔の感情認識を意識した状況

SAFER: Situation Aware Facial Emotion Recognition ( http://arxiv.org/abs/2306.09372v1 )

ライセンス: Link先を確認

Mijanur Palash, Bharat Bhargava

(参考訳) 本稿では,表情から感情を認識する新しいシステムであるSAFERを提案する。最先端のディープラーニング技術を使用して、顔画像からさまざまな特徴を抽出し、背景や位置といったコンテキスト情報を組み込んでパフォーマンスを向上させる。このシステムはオープンワールドで動作するように設計されており、目に見えない様々な表情に適応でき、現実世界のアプリケーションに適している。この分野における既存の作業に対するSAFERの広範な評価は、CAER-Sデータセットで91.4%の精度で改善された性能を示す。さらに、Covid-19パンデミック時の顔マスクなどの新奇性が顔の感情認識に及ぼす影響を調査し、主流の表情データセットの限界を批判的に調査する。これらの制約に対処するために,表情認識のための新しいデータセットを提案する。提案するデータセットとシステムは,人間とコンピュータのインタラクションやセキュリティ,監視など,さまざまな用途に有用であると思われる。

In this paper, we present SAFER, a novel system for emotion recognition from facial expressions. It employs state-of-the-art deep learning techniques to extract various features from facial images and incorporates contextual information, such as background and location type, to enhance its performance. The system has been designed to operate in an open-world setting, meaning it can adapt to unseen and varied facial expressions, making it suitable for real-world applications. An extensive evaluation of SAFER against existing works in the field demonstrates improved performance, achieving an accuracy of 91.4% on the CAER-S dataset. Additionally, the study investigates the effect of novelties such as face masks during the COVID-19 pandemic on facial emotion recognition and critically examines the limitations of mainstream facial expression datasets. To address these limitations, a novel dataset for facial emotion recognition is proposed. The proposed dataset and the system are expected to be useful for various applications such as human-computer interaction, security, and surveillance.

翻訳日:2023-06-19 16:47:42 公開日:2023-06-14

# デルタポテンシャル相互作用を持つ一次元量子力学モデルについて

On some one-dimensional quantum-mechanical models with a delta-potential interaction ( http://arxiv.org/abs/2306.09371v1 )

ライセンス: Link先を確認

Francisco M. Fernández

(参考訳) 我々は無次元量子力学的方程式の体系的構成について論じる。このプロセスは、独立したモデルパラメータの数を最小限に減らし、同時に、長さやエネルギーなどの自然な単位を明確かつ直接的な方法で提供する。この体系的な手順を、$\hbar=1$の設定からなる広く採用されている手順と比較する。具体例として、不均質媒質中の局在状態の研究のために最近提案された単純な一次元モデルを選択する。

We discuss a systematic construction of dimensionless quantum-mechanical equations. The process reduces the number of independent model parameters to a minimum and, at the same time, provides the natural units of length, energy, etc. in a clear, straightforward way. We compare this systematic procedure with the widely adopted one that consists of setting $\hbar=1$. As illustrative examples, we choose some simple one-dimensional models proposed recently for the study of localized states in inhomogeneous media.

翻訳日:2023-06-19 16:47:26 公開日:2023-06-14
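
The systematic construction described above can be illustrated on a single attractive delta well. This example is chosen here for illustration and is not necessarily one of the models treated in the paper; the coupling $g$, the length unit $L$ and the symbols $q$, $\varphi$, $\epsilon$ are notation introduced for this sketch.

```latex
\begin{align}
  -\frac{\hbar^2}{2m}\,\psi''(x) - g\,\delta(x)\,\psi(x) &= E\,\psi(x), \qquad g>0,\\
  x = L q,\quad \varphi(q)=\psi(Lq),\quad \delta(Lq)=\frac{\delta(q)}{L}
  \;&\Longrightarrow\;
  -\frac{\hbar^2}{2mL^2}\,\varphi''(q) - \frac{g}{L}\,\delta(q)\,\varphi(q) = E\,\varphi(q),\\
  L=\frac{\hbar^2}{m g}
  \;&\Longrightarrow\;
  -\frac{1}{2}\,\varphi''(q) - \delta(q)\,\varphi(q) = \epsilon\,\varphi(q),
  \qquad \epsilon = \frac{E}{\hbar^2/(mL^2)} = \frac{\hbar^2 E}{m g^2}.
\end{align}
```

The dimensionless equation has no free parameter left, and the units of length, $L=\hbar^2/(mg)$, and energy, $\hbar^2/(mL^2)=mg^2/\hbar^2$, fall out of the construction. Its single bound state $\varphi(q)\propto e^{-|q|}$ has $\epsilon=-1/2$, that is $E=-mg^2/(2\hbar^2)$. Simply setting $\hbar=1$, by contrast, leaves $m$ and $g$ in the equation and does not by itself identify these units.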

# Warpformer:不規則な臨床時系列のマルチスケールモデリング手法

Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series ( http://arxiv.org/abs/2306.09368v1 )

ライセンス: Link先を確認

Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, Jia Li

(参考訳) 不規則にサンプリングされた多変量時系列は、様々な分野、特に医療分野においてユビキタスであり、シリーズ内不規則性とシリーズ間不一致の2つの重要な特徴を示す。シリーズ内不規則性は、時系列信号が不規則な間隔でしばしば記録されるという事実であり、シリーズ間不一致はシリーズ間のサンプリングレートの顕著な変動を指す。しかし、不規則時系列の最近の進歩は、シリーズ間の不規則性の問題を見越して、シリーズ内不規則性に対処することに集中している。このギャップを埋めるために、これらの2つの特徴を完全に考慮した新しいアプローチであるWarpformerを提案する。簡単に言えば、warpformerには、シリーズ内不規則性とシリーズ間不一致の両方を明示的に特徴付ける特定の入力表現、所定のスケールで不規則な時系列を適応的に統一するワーピングモジュール、表現学習のためのカスタマイズされたアテンションモジュールなど、いくつかの重要な設計がある。さらに、複数のワープモジュールとアテンションモジュールを積み重ねて異なるスケールで学習し、下流のタスクに対して粗くきめ細かな信号のバランスをとるマルチスケール表現を生成する。広範に使用されるデータセットと臨床データベースから構築した新しい大規模ベンチマークについて広範な実験を行った。この結果は、warpformerが既存の最先端のアプローチよりも優れていることを示している。

Irregularly sampled multivariate time series are ubiquitous in various fields, particularly in healthcare, and exhibit two key characteristics: intra-series irregularity and inter-series discrepancy. Intra-series irregularity refers to the fact that time-series signals are often recorded at irregular intervals, while inter-series discrepancy refers to the significant variability in sampling rates among diverse series. However, recent advances in irregular time series have primarily focused on addressing intra-series irregularity, overlooking the issue of inter-series discrepancy. To bridge this gap, we present Warpformer, a novel approach that fully considers these two characteristics. In a nutshell, Warpformer has several crucial designs, including a specific input representation that explicitly characterizes both intra-series irregularity and inter-series discrepancy, a warping module that adaptively unifies irregular time series in a given scale, and a customized attention module for representation learning. Additionally, we stack multiple warping and attention modules to learn at different scales, producing multi-scale representations that balance coarse-grained and fine-grained signals for downstream tasks. We conduct extensive experiments on widely used datasets and a new large-scale benchmark built from clinical databases. The results demonstrate the superiority of Warpformer over existing state-of-the-art approaches.

翻訳日:2023-06-19 16:47:19 公開日:2023-06-14

# 2次元ハイゼンベルク反強磁性体の確率級数展開におけるループアンサンブル

Loop ensembles in Stochastic Series Expansion of Two-Dimensional Heisenberg Antiferromagnets ( http://arxiv.org/abs/2306.09366v1 )

ライセンス: Link先を確認

Vedant Motamarri

(参考訳) 確率級数展開 (sse) 法はスピンあるいはフレーバー値の再開とともに、量子反強磁性体の分割関数を1つの高次元の密充填ループガスモデルに写像する。 Nahumらによる以前の研究は、特定の密充填された3次元ループガスモデルがマクロループに支配される位相を示し、ループ長の対応する結合分布はポアソン・ディリクレであることを示した。普遍性の観点からは、(2+1)次元の量子反強磁性体で得られるループのアンサンブルは、異なる微視的起源からループが現れるにもかかわらず、同様に予想される。モンテカルロを用いた二乗格子上のSU(N)反強磁性体に対するSSEループアンサンブルをサンプリングし、関節分布が表現度Nと逆温度$\beta$でどのように変化するかを調べる。低温および低N($\leq$4)では,反強磁性相関が系を支配している場合,ポアソン-ディリクレ挙動の特性が実際に示される。

The Stochastic Series Expansion (SSE) method along with resummation over the spin or flavor values maps the partition function of a quantum antiferromagnet to a closely-packed loop gas model in one higher dimension. Earlier work by Nahum et al. has shown that certain closely-packed three-dimensional loop gas models exhibit phases dominated by macroscopic loops, wherein the corresponding joint distribution of loop lengths is Poisson-Dirichlet. On grounds of universality, the same is expected of the ensemble of loops obtained in (2+1)-dimensional quantum antiferromagnets, although the loops emerge from a different microscopic origin. We sample the SSE loop ensemble for SU(N) antiferromagnets on a square lattice using Monte Carlo and study how the joint distribution varies with the degree of representation N and inverse temperature $\beta$. We observe that, for low temperatures and small N ($\leq 4$), the distribution indeed shows characteristics of Poisson-Dirichlet behaviour when antiferromagnetic correlations dominate the system.
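For readers unfamiliar with the Poisson-Dirichlet distribution mentioned above, the following is a minimal sketch of how normalized "loop lengths" with that joint law can be sampled. It uses the standard stick-breaking (GEM) construction followed by ranking. The function name, the truncation at `n_sticks`, and the value of `theta` are illustrative assumptions; the abstract does not state which parameter value applies to the SSE loop ensemble.

```python
import random


def sample_poisson_dirichlet(theta, n_sticks=1000, rng=None):
    """Approximate draw from Poisson-Dirichlet(theta).

    Breaks a unit-length stick repeatedly with Beta(1, theta) fractions
    (the GEM construction) and returns the pieces in decreasing order.
    The truncation to n_sticks pieces is an approximation.
    """
    rng = rng or random.Random()
    remaining = 1.0
    pieces = []
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, theta)
        pieces.append(remaining * frac)
        remaining *= 1.0 - frac
    return sorted(pieces, reverse=True)


# Hypothetical usage: fractions of total length carried by the largest loops.
lengths = sample_poisson_dirichlet(theta=1.0, n_sticks=200, rng=random.Random(0))
largest_three = lengths[:3]
```

Comparing histograms of the largest few normalized loop lengths from a simulation against draws like these is one simple way to check for Poisson-Dirichlet behaviour; this is a sketch of the idea, not the procedure used in the paper.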

翻訳日:2023-06-19 16:46:51 公開日:2023-06-14

# 関数次元化法による誘導電動機の故障検出

Fault Detection in Induction Motors using Functional Dimensionality Reduction Methods ( http://arxiv.org/abs/2306.09365v1 )

ライセンス: Link先を確認

María Barroso, José M. Bossio, Carlos M. Alaíz and Ángela Fernández

(参考訳) 回転する電気機械の故障検出および診断のための戦略の実装は、現代の産業システムの信頼性と安全性に不可欠である。本研究の貢献は、誘導電動機の故障状況を検出し分類するための、従来のモータ電流シグナチャ解析の戦略と機能的主成分分析と機能的拡散マップという機能的次元削減手法を組み合わせた方法論である。提案手法は, 誘導電動機における故障の存在をリアルタイムに検出するだけでなく, オフライン解析によって発生する多くの種類の故障の同定にも有用であることを示す。

The implementation of strategies for fault detection and diagnosis on rotating electrical machines is crucial for the reliability and safety of modern industrial systems. The contribution of this work is a methodology that combines the conventional strategy of Motor Current Signature Analysis with functional dimensionality reduction methods, namely Functional Principal Components Analysis and Functional Diffusion Maps, for detecting and classifying fault conditions in induction motors. The results obtained from the proposed scheme are very encouraging, revealing a potential future use not only for real-time detection of the presence of a fault in an induction motor, but also for identifying a greater number of fault types through offline analysis.

翻訳日:2023-06-19 16:46:30 公開日:2023-06-14

# TSMixer:多変量時系列予測のための軽量MLPミクサモデル

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.09364v1 )

ライセンス: Link先を確認

Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam

(参考訳) トランスフォーマーは時系列予測において、長い列の相互作用を捉える能力で人気を集めている。しかし、その高いメモリとコンピューティング要件は長期的な予測に重大なボトルネックをもたらす。そこで本研究では,多層パーセプトロン(MLP)モジュールのみからなる軽量ニューラルネットワークTSMixerを提案する。 tsmixerはパッチ付き時系列の多変量予測と表現学習のために設計されており、トランスフォーマーの効率的な代替手段を提供する。我々のモデルはコンピュータビジョンにおけるMLP-Mixerモデルの成功からインスピレーションを得ている。時系列にVision MLP-Mixerを適用する際の課題を示し、精度を高めるために経験的検証されたコンポーネントを導入する。これは、階層構造やチャネル相関などの時系列特性を明示的にモデル化するための、MLP-Mixerバックボーンにオンライン和解ヘッドを付加する新しい設計パラダイムを含む。また,既存のパッチチャネル混合方式では一般的な課題である,多種多様なデータセット間のノイズチャネルインタラクションと一般化を効果的に処理するためのハイブリッドチャネルモデリング手法を提案する。さらに、重要な特徴を優先するために、バックボーンに単純なゲートアテンション機構が導入される。これらの軽量なコンポーネントを組み込むことで、単純なmlp構造の学習能力を大幅に向上させ、最小の計算使用量で複雑なトランスフォーマーモデルを上回る。さらに、TSMixerのモジュール設計により、教師付きとマスク付きの両方の自己教師付き学習手法との互換性が実現され、時系列基礎モデルのための有望なビルディングブロックとなる。 TSMixer は最先端の MLP と Transformer のモデルよりも 8-60% の差で予測できる。また、Patch-Transformerモデルの最新の強力なベンチマーク(1～2%)を上回り、メモリとランタイム(2～3倍)を大幅に削減した。

Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).

翻訳日:2023-06-19 16:46:20 公開日:2023-06-14

# 特徴分布歪型フェデレーション学習のための簡易データ拡張法

A Simple Data Augmentation for Feature Distribution Skewed Federated Learning ( http://arxiv.org/abs/2306.09363v1 )

ライセンス: Link先を確認

Yunlu Yan, Lei Zhu

(参考訳) フェデレートラーニング(FL)は、複数のクライアント間の協調学習を分散的に支援し、プライバシ保護を保証する。しかし、その性能は不均一なデータ、すなわち非IIDデータとして必然的に劣化する。本稿では,現実世界のアプリケーションに広く普及しているFLシナリオの特徴分布に着目した。主な課題は、ローカルデータセットのさまざまな基礎的な分布に起因する機能シフトにある。前回の試みは進展したが、データ自体に注意を払う研究はほとんどなく、この問題の根源となっている。そこで本論文の主な目的は,特徴シフトを軽減するため,入力レベルでの汎用データ拡張手法を開発することである。この目的を達成するために,フェデレーション全体からクライアントのデータにデータセットの統計をランダムに注入する特徴分散スキュードFLのための,シンプルで驚くほど効果的なデータ拡張手法であるFedRDNを提案する。これにより,特徴の一般化を効果的に改善でき,特徴のシフトを緩和できる。さらにFedRDNは、数行のコードだけでデータ拡張フローにシームレスに統合できるプラグイン・アンド・プレイコンポーネントである。いくつかのデータセットに対する大規模な実験により、FedRDNと組み合わせることで、様々な代表FLワークの性能をさらに向上できることが示されている。ソースコードはリリースされます。

Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance inevitably degrades under data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While previous attempts have achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. In this way, our method can effectively improve the generalization of features, thereby mitigating the feature shift. Moreover, FedRDN is a plug-and-play component, which can be seamlessly integrated into the data augmentation flow with only a few lines of code. Extensive experiments on several datasets show that the performance of various representative FL works can be further improved by combining them with FedRDN, which demonstrates the strong scalability and generalizability of FedRDN. The source code will be released.
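The idea of injecting federation-wide statistics into a client's data can be sketched as follows. The abstract does not say which statistics are used or how they are injected, so this sketch assumes per-channel mean and standard deviation and models "injection" as normalizing each training image with the statistics of a randomly chosen client. All function names are hypothetical and this is not the released FedRDN code.

```python
import random


def channel_stats(images):
    """Per-channel (mean, std) over one client's images.

    Each image is a list of channels; each channel is a flat list of floats.
    """
    stats = []
    for c in range(len(images[0])):
        values = [v for img in images for v in img[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, var ** 0.5))
    return stats


def random_stat_normalize(image, federation_stats, rng, eps=1e-6):
    """Normalize one image with the statistics of a randomly chosen client.

    Assumption: "injecting" statistics is modeled as normalizing with them.
    """
    stats = rng.choice(federation_stats)
    return [
        [(v - mean) / (std + eps) for v in channel]
        for channel, (mean, std) in zip(image, stats)
    ]


# Hypothetical usage: two clients with differently scaled single-channel data.
client_a = [[[0.0, 2.0]], [[4.0, 6.0]]]
client_b = [[[10.0, 30.0]], [[50.0, 70.0]]]
federation_stats = [channel_stats(client_a), channel_stats(client_b)]
augmented = random_stat_normalize(client_a[0], federation_stats, random.Random(0))
```

Only the summary statistics are shared between clients in this sketch, not the raw images, which is consistent with the privacy motivation of FL. Whether the original method shares exactly these quantities is an assumption.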

翻訳日:2023-06-19 16:45:50 公開日:2023-06-14

# パラメータアウェアポリシを用いた汎用ワンショットロープマニピュレーション

Generalizable One-shot Rope Manipulation with Parameter-Aware Policy ( http://arxiv.org/abs/2306.09872v1 )

ライセンス: Link先を確認

So Kuroki, Jiaxian Guo, Tatsuya Matsushima, Takuya Okubo, Masato Kobayashi, Yuya Ikeda, Ryosuke Takanami, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa

(参考訳) 従来のロープ操作では、動作中の変形性に固有の不確実性があるため、ロープのゴール到達のような単純なタスクであっても、ロープの操作ポリシーをトレーニングするために、何百もの実世界のデモを必要とする場合が多い。この問題に対処するため、実世界の1つのデモで異なる変形可能なロープを操作できるフレームワークであるGenORMを紹介します。これを実現するために, 変形可能なロープパラメータに条件付けし, 各種の模擬変形可能なロープをトレーニングすることにより, 異なるロープパラメータに基づいて動作を調整できるようにした。新しいロープが与えられたとき、GenORMは、実世界の実演とシミュレーションの点雲の格子密度の差を最小限にして、変形可能なロープパラメータを推定する。微分可能な物理シミュレータの助けを借りて、我々は1つの実世界のデモンストレーションしか必要としない。シミュレーションと実世界のロープ操作の両セットアップにおける実証的検証により,1回のデモンストレーションで異なるロープを操作でき,両環境でのベースラインを著しく上回る(ドメイン内ロープの62%向上,シミュレーションでの分散外ロープの15%向上,実世界の26%改善)ことが明らかとなり,ワンショットロープ操作におけるアプローチの有効性が実証された。

Due to the inherent uncertainty in their deformability during motion, previous methods in rope manipulation often require hundreds of real-world demonstrations to train a manipulation policy for each rope, even for simple tasks such as rope goal reaching, which hinders their application in our ever-changing world. To address this issue, we introduce GenORM, a framework that allows the manipulation policy to handle different deformable ropes with a single real-world demonstration. To achieve this, we augment the policy by conditioning it on deformable rope parameters and training it with a diverse range of simulated deformable ropes so that the policy can adjust actions based on different rope parameters. At the time of inference, given a new rope, GenORM estimates the deformable rope parameters by minimizing the disparity between the grid density of point clouds of real-world demonstrations and simulations. With the help of a differentiable physics simulator, we require only a single real-world demonstration. Empirical validations on both simulated and real-world rope manipulation setups clearly show that our method can manipulate different ropes with a single demonstration and significantly outperforms the baseline in both environments (62% improvement on in-domain ropes and 15% improvement on out-of-distribution ropes in simulation, and 26% improvement in the real world), demonstrating the effectiveness of our approach in one-shot rope manipulation.

翻訳日:2023-06-19 13:31:49 公開日:2023-06-14

# 養殖システムにおける摂水制御と水質モニタリング--機会と課題

Feeding control and water quality monitoring in aquaculture systems: Opportunities and challenges ( http://arxiv.org/abs/2306.09920v1 )

ライセンス: Link先を確認

Fahad Aljehani, Ibrahima N'Doye, Taous-Meriem Laleg-Kirati

(参考訳) 養殖システムは、運営コストと魚の損失を低減し、成長生産効率を高め、魚の福祉と健康に繋がる先進的な管理戦略の最近の発展から恩恵を受けることができる。水質のモニタリングと給餌の制御は、魚類の生産性のバランスと魚類の成長過程の形成の基本的な要素である。現在、ほとんどの魚の養殖プロセスは異なる段階で手動で行われ、時間と挑戦的な人工的差別に依存している。摂餌制御アプローチは、飼料転換率を通じて魚類の成長と繁殖に影響を与えるため、これらの摂餌パラメータの制御は、魚の福祉の強化と一般的な漁業コストの最小化に不可欠である。アンモニア濃度やpHなどの環境因子の高濃度は水質や魚の生存に影響を及ぼす。したがって、最適で効率的で信頼性の高い供給プロセスを決定し、水質を監視するための制御戦略を開発する必要がある。本稿では,養殖システムにおける魚の成長制御技術,すなわち動的養殖プロセスの給水・水質を最適化するアルゴリズムについて概説する。具体的には,魚の成長と生存を最適化するためのモデルベース制御手法とモデルフリー強化学習戦略について検討した。モデルフリーフレームワークは近似魚の成長動的モデルを使用し、制約を満たさない。モデルに基づくアプローチが強化学習フレームワークをどのようにサポートし、制約満足度を効率的に処理し、価値に基づく強化学習からより良い軌道とポリシーを見つけるかについて議論する。

Aquaculture systems can benefit from the recent development of advanced control strategies to reduce operating costs and fish loss and increase growth production efficiency, resulting in improved fish welfare and health. Monitoring the water quality and controlling feeding are fundamental elements of balancing fish productivity and shaping the fish growth process. Currently, most fish-feeding processes are conducted manually in different phases and rely on time-consuming and challenging artificial discrimination. The feeding control approach influences fish growth and breeding through the feed conversion rate; hence, controlling these feeding parameters is crucial for enhancing fish welfare and minimizing general fishery costs. High levels of environmental factors, such as ammonia concentration and pH, affect the water quality and fish survival. Therefore, there is a critical need to develop control strategies to determine optimal, efficient, and reliable feeding processes and monitor water quality. This paper reviews the main control design techniques for fish growth in aquaculture systems, namely algorithms that optimize the feeding and water quality of a dynamic fish growth process. Specifically, we review model-based control approaches and model-free reinforcement learning strategies to optimize the growth and survival of the fish or track a desired reference live-weight growth trajectory. The model-free framework uses an approximate fish growth dynamic model and does not satisfy constraints. We discuss how model-based approaches can support a reinforcement learning framework to efficiently handle constraint satisfaction and find better trajectories and policies from value-based reinforcement learning.

翻訳日:2023-06-19 13:12:35 公開日:2023-06-14

# 等スペクトル変形による自由粒子から複素KdVブリーザーへ

Free Particle to Complex KdV breathers through Isospectral Deformation ( http://arxiv.org/abs/1110.3708v5 )

ライセンス: Link先を確認

Kumar Abhinav, Aradhya Shukla, and Prasanta K. Panigrahi

(参考訳) 実空間における量子力学における自由粒子には超対称性が与えられ、これは複素スペクトルへの自然な拡張を可能にし、P(英語版)とT(英語版)対称性が組み込まれている。また、PT対称性の非破壊相と破壊相の起源と、実値と複素固有値の関係についても説明し、後者はさらにゼロ幅共鳴を示す。これは、複素平面への固有値問題の拡張により拡大ヒルベルト空間における境界状態と減衰状態の組込みが可能になるため可能である。超対称性のスペクトルを変化させることなくポテンシャルを改変する固有の自由は、KdVの複素呼気解とPT対称性と複素平面上の自由粒子との接続を自然に説明する。さらに、破壊されたPT相における非自明な零幅共鳴は、sl(2, R) ポテンシャル代数に直結する一般化を課す。

The free particle in quantum mechanics in real space is endowed with supersymmetry, which enables a natural extension to complex spectra with a built-in parity (P) and time reversal (T) symmetry. It also explains the origin of unbroken and broken phases of the PT-symmetry and their relationship with the real and complex eigenvalues respectively, the latter further displaying zero-width resonances. This is possible as the extension of the eigenvalue problem to the complex plane enables the incorporation of bound and decaying states in the enlarged Hilbert space. The inherent freedom of modification of the potential without changing the spectra in supersymmetry naturally explains the connection of complex breather solutions of KdV with PT-symmetry and the free particle on the complex plane. Further, non-trivial zero-width resonances in the broken PT phase mandate a generalization that is directly connected to the sl(2, R) potential algebra.

翻訳日:2023-06-18 14:56:39 公開日:2023-06-14

# カテゴリ間の変換へのロバストネス:不変神経表現による変換へのロバストネス?

Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations? ( http://arxiv.org/abs/2007.00112v4 )

ライセンス: Link先を確認

Hojin Jang, Syed Suleman Abbas Zaidi, Xavier Boix, Neeraj Prasad, Sharon Gilad-Gutnick, Shlomit Ben-Ami, Pawan Sinha

(参考訳) 深層畳み込みニューラルネットワーク(DCNN)は、これらの変換がトレーニングセットに含まれる場合、変換中のオブジェクト(例えば、ぼやけやノイズ)を認識するための印象的な堅牢性を示している。このようなロバスト性を説明する仮説は、dcnnが画像が変換された後も不変な神経表現を発達させることである。しかし、この仮説がどの程度真であるかは、例えば不変性とは異なる性質で変換に対する堅牢性が達成できるため、決定的な疑問である。ネットワークの一部は、変換された画像または非変換された画像を認識するために特化することができる。本稿では, 学習分布を超えた変換に対するロバスト性を促進することによって, 不変な神経表現が出現する条件について検討する。具体的には、トレーニング中にいくつかのオブジェクトカテゴリのみが変換されるトレーニングパラダイムを分析し、dcnnが変換されないカテゴリ全体の変換にロバストであるかどうかを評価する。その結果,不変なニューラルネットワーク表現がない場合でも,ネットワークはトレーニング中に変換されるカテゴリのロバスト性を示すため,不変なニューラルネットワーク表現が変換に対するロバスト性を常に駆動するとは限らない。不変性は、トレーニングセット内の変換されたカテゴリの数が増えるときにのみ現れる。この現象は、物体の空間配置の変化を伴う回転や薄化のような幾何学的変換よりも、ぼやけやハイパスフィルタリングのような局所的変換で顕著である。その結果,深層学習における不変神経表現の理解が深層学習と自然発生状態の理解を深めることができた。

Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness in recognizing objects under transformations (e.g., blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance, e.g., parts of the network could be specialized to recognize either transformed or non-transformed images. This paper investigates the conditions under which invariant neural representations emerge by leveraging the fact that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance only emerges as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than with geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which they spontaneously emerge.

翻訳日:2023-06-17 04:48:12 公開日:2023-06-14

# 尤度に基づくOOD検出におけるエントロピー問題

Entropic Issues in Likelihood-Based OOD Detection ( http://arxiv.org/abs/2109.10794v2 )

ライセンス: Link先を確認

Anthony L. Caterini, Gabriel Loaiza-Ganem

(参考訳) 最大確率で訓練された深層生成モデルは、確率的にデータを推論するための非常に一般的な方法である。しかし、分布外データ(OOD)は分布内データよりも高い確率を割り当てることができることが観察されており、これらの確率値の意味を疑問視している。本研究では,この現象に対する新しい視点を示し,平均的確率をkl発散項とエントロピー項に分解する。後者は、上述した奇妙なOOD挙動を説明し、高いエントロピーを持つデータセットの確率値を抑制することができる。私たちのアイデアは単純ですが、文献ではまだ探索されていません。本解析は,問題となるエントロピー項が期待値から外れるので,確率比に基づくood検出手法の成功のさらなる説明を提供する。最後に、上記の分解が直接保持されない多様体モデルを用いた最近のOOD検出の成功と、この観察がどう関係しているかを論じる。

Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.
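The decomposition described above can be written out explicitly. As a sketch, write $p$ for the density of the dataset being evaluated and $q_\theta$ for the model density; both symbols are notation assumed here for illustration and are not taken from the abstract.

```latex
\mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right]
  = -\,\mathrm{KL}\!\left(p \,\|\, q_\theta\right) - H(p),
\qquad
H(p) = -\,\mathbb{E}_{x \sim p}\left[\log p(x)\right].
```

For a fixed divergence, a dataset with higher entropy $H(p)$ therefore receives a lower average log-likelihood. For a ratio of two models $q_{\theta_1}$ and $q_{\theta_2}$ evaluated on the same data, the entropy term cancels in expectation:

```latex
\mathbb{E}_{x \sim p}\left[\log q_{\theta_1}(x) - \log q_{\theta_2}(x)\right]
  = \mathrm{KL}\!\left(p \,\|\, q_{\theta_2}\right)
  - \mathrm{KL}\!\left(p \,\|\, q_{\theta_1}\right).
```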

翻訳日:2023-06-17 04:43:29 公開日:2023-06-14

# LLVIP: ローライトビジョンのための可視赤外ペアデータセット

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision ( http://arxiv.org/abs/2108.10831v4 )

ライセンス: Link先を確認

Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, Shengjie Liu, Wenli Zhou

(参考訳) 画像の融合や歩行者検出、低照度での画像から画像への変換といった様々な視覚課題において、有効な対象領域の欠如は極めて困難である。この場合、赤外線と可視画像を組み合わせて、詳細な情報と効果的なターゲット領域の両方を提供することができる。本稿では,低照度ビジョンのための可視赤外ペアデータセットLLVIPを提案する。このデータセットには30976枚の画像、または15488枚のペアが含まれており、そのほとんどは非常に暗いシーンで撮影され、すべての画像は時間と空間で厳密に整列している。データセットの歩行者はラベルが付けられています。データセットを他の可視赤外データセットと比較し,画像融合,歩行者検出,画像から画像への変換など,一般的なビジュアルアルゴリズムの性能評価を行った。実験結果は,画像情報に対する融合の相補的効果を示し,超低照度条件下での3つの視覚課題の既存のアルゴリズムの欠如を見出した。 LLVIPデータセットは,低照度アプリケーションにおける画像融合,歩行者検出,画像から画像への変換を促進することによって,コンピュータビジョンのコミュニティに寄与すると考えている。データセットはhttps://bupt-ai-cz.github.io/llvipでリリースされる。生データは画像登録などのさらなる研究のためにも提供される。

Various visual tasks such as image fusion, pedestrian detection and image-to-image translation are very challenging in low-light conditions due to the loss of effective target areas. In this case, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. This dataset contains 30976 images, or 15488 pairs, most of which were taken in very dark scenes, and all of the images are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of some popular visual algorithms including image fusion, pedestrian detection and image-to-image translation on the dataset. The experimental results demonstrate the complementary effect of fusion on image information, and reveal the deficiency of existing algorithms for the three visual tasks in very low-light conditions. We believe the LLVIP dataset will contribute to the computer vision community by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. The dataset is being released at https://bupt-ai-cz.github.io/LLVIP. Raw data is also provided for further research such as image registration.

翻訳日:2023-06-17 04:43:14 公開日:2023-06-14

# 信頼されたサーバを持たないプライベートフェデレーション学習:凸損失の最適アルゴリズム

Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses ( http://arxiv.org/abs/2106.09779v8 )

ライセンス: Link先を確認

Andrew Lowy and Meisam Razaviyayn

(参考訳) 本稿では、サーバや他のサイロを信頼していない人々からのデータを用いて、フェデレーション学習(FL)、特にクロスサイロFLについて研究する。この設定では、各サイロ(例えば病院)は、異なる人々(例えば患者)のデータを持ち、サーバまたは他のサイロが敵の盗聴者として機能しても、各人のデータ(例えば医療記録)のプライバシーを維持する必要がある。この要件は、レコード/イテムレベルの差分プライバシー(DP)を満たすためにサイロ i の通信を必要とする、ISRL-DP(Inter-Silo Record-Level Differential Privacy)の研究を動機付けている。 ISRL-DPは、サイロ i(例えば、病院 i)内の各人物(例えば、患者)のデータが漏洩しないことを保証する。 ISRL-DPは、よく研究されているプライバシー概念とは異なる。中央およびユーザレベルのDPは、人々がサーバ/他のサイロを信頼していると仮定します。スペクトルの反対側では、ローカルDPは、人々が誰も信用していないと仮定する(独自のサイロでさえ)。 ISRL-DPは、中央のDPとローカルのDPの間に位置するので、サーバや他のサイロではなく、人々が自分のサイロを信頼するという現実的な仮定(クロスサイロFL)が成り立つ。本研究では、ISRL-DP FL 上の(対数まで)上と下の境界に凸/強凸損失関数と等質な(等質な)サイロデータを与える。注目すべきは、ISRL-DPアルゴリズムにより、任意の不均一なサイロデータ分布で同様の境界がスムーズな損失に到達できることである。また, ISRL-DPフェデレーションによる経験的リスク最小化のために, 上および下限を厳密に設定し, アクセラレーションを用いて, 最先端技術よりも少ない通信ラウンドで最適なバウンドを実現する。最後に、サイロメッセージを匿名化するセキュアな「シャッフル」により、より実用的な信頼前提の下で、我々のアルゴリズムは最適な中央DPレートを得る。数値実験により,アルゴリズムの分類と回帰タスクにおいて,プライバシの正確性が良好なトレードオフを示す。

This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.

翻訳日:2023-06-17 04:42:55 公開日:2023-06-14

# ロバストなサンプル重み付けによるターゲット集団に対する個別化治療ルール学習

Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population ( http://arxiv.org/abs/2105.00581v2 )

ライセンス: Link先を確認

Rui Chen, Jared D. Huling, Guanhua Chen, Menggang Yu

(参考訳) 個別化治療規則(ITR)の学習は、精密医療において重要なトピックである。現在の文献は主に単一源集団からITRを誘導することに焦点を当てている。対象個体群と対象個体群とが異なる場合の観測データ設定について考察する。スカラー量である平均処理効果の因果一般化と比較すると、ITRの一般化は、制限されない真の最適ITRを含まないかもしれない関数の事前定義されたクラスに基づいて規則をモデル化し一般化する必要があるため、新たな課題をもたらす。本研究の目的は、このような不特定性の影響を緩和し、ソース集団からターゲット集団への最適なITRの一般化を容易にするための重み付けフレームワークを開発することである。提案手法は,カーネルヒルベルト空間を再現した非パラメトリック関数クラスに対する共変量バランスを求め,重みに依存する多くのIRR学習法を改善することができる。提案手法は,重み付けの重要性と重み付けの重み付けを2つの極端なケースとして包含し,その間のバイアス分散トレードオフを改善できることを示す。数値的な例は,本手法を用いることで,他の重み付け法と比較して,ターゲット個体数のITR推定を大幅に改善できることを示している。

Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.

翻訳日:2023-06-17 04:42:02 公開日:2023-06-14

# 多様体とグラフ上の確率分布のコレクションの比較のための固有スライスワッサースタイン距離

Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs ( http://arxiv.org/abs/2010.15285v3 )

ライセンス: Link先を確認

Raif Rustamov and Subhabrata Majumdar

(参考訳) 確率分布のコレクションは、ユーザアクティビティパターン分析から脳コネクトミクスまで、さまざまなアプリケーションで発生します。実際には、これらの分布は有限区間、円、シリンダー、球面、他の多様体、グラフを含む様々な領域タイプで定義される。本稿では,そのような一般領域上の分布の2つの集合間の差を検出する手法を提案する。そこで本研究では,多様体とグラフ上の新たなワッサースタイン距離クラスを導出する本質的スライシング構成を提案する。これらの距離はヒルベルト埋め込み可能であり、分布コレクション比較問題をヒルベルト空間におけるより親しみやすい平均テスト問題に還元することができる。我々は、再サンプリングに基づく2つのテスト手順と、座標ワイドテストからのp値の組み合わせを提供する。種々の合成および実データ設定実験により、結果の試験が強力であり、p値が良好に校正されていることを示す。

Collections of probability distributions arise in a variety of applications ranging from user activity pattern analysis to brain connectomics. In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, allowing us to reduce the distribution collection comparison problem to a more familiar mean testing problem in a Hilbert space. We provide two testing procedures, one based on resampling and another on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and the p-values are well-calibrated.

翻訳日:2023-06-17 04:40:57 公開日:2023-06-14

# データ依存による会員推測攻撃の調査

Investigating Membership Inference Attacks under Data Dependencies ( http://arxiv.org/abs/2010.12112v4 )

ライセンス: Link先を確認

Thomas Humphries, Simon Oya, Lindsey Tulloch, Matthew Rafuse, Ian Goldberg, Urs Hengartner, Florian Kerschbaum

(参考訳) プライバシに敏感なデータに基づく機械学習モデルのトレーニングが一般的なプラクティスとなり、拡大する分野におけるイノベーションを推進している。これにより、プライバシーに深刻な影響をもたらす新たな攻撃への扉が開いた。そのような攻撃の一つ、メンバーシップ推論攻撃(mia)は、特定のデータポイントがモデルのトレーニングに使われたかどうかを暴露する。増大する文学は、そのような攻撃に対する防御として差分プライベート(dp)訓練アルゴリズムを使用する。しかしながら、これらの研究は、訓練セットのすべてのメンバーと非メンバーが独立して同一に分散しているという制限的な仮定の下で、防衛を評価する。この仮定は文学における現実世界のユースケースの多くに当てはまらない。このことから,サンプル間の統計的依存関係による会員推定を評価し,DPが意味のある保護を提供していない理由(プライバシーパラメータ $\epsilon$ scales with the training set size $n$)を説明する。実世界のデータから構築したサンプル間の依存関係の異なるトレーニングセットを用いて,市販miasを用いた経験的評価を行う。以上の結果から,MIA の性能が大幅に向上し,データサンプルが統計的に独立であることはMIA の性能を著しく過小評価できることがわかった。

Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.
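One standard way to see why the guarantee weakens under dependencies is the group privacy property of DP. As a sketch, assume a mechanism $\mathcal{M}$ that is $\epsilon$-DP and two datasets $D$ and $D'$ that differ in $k$ records; these symbols are introduced here for illustration, and whether this is exactly the argument used in the paper is an assumption.

```latex
\Pr\left[\mathcal{M}(D) \in S\right]
  \le e^{k\epsilon}\, \Pr\left[\mathcal{M}(D') \in S\right]
  \quad \text{for every measurable set } S.
```

If samples are statistically dependent, changing one individual's membership can effectively change up to $k$ records, so the effective privacy parameter becomes $k\epsilon$. In the worst case $k = n$, which is consistent with the abstract's remark that $\epsilon$ scales with the training set size $n$.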

翻訳日:2023-06-17 04:40:41 公開日:2023-06-14

# 因子化線形判別分析と計算生物学への応用

Factorized linear discriminant analysis and its application in computational biology ( http://arxiv.org/abs/2010.02171v5 )

ライセンス: Link先を確認

Mu Qiao

(参考訳) 単細胞転写データの複雑な景観をナビゲートすることは大きな課題である。この課題の中心は、細胞タイプの構造的および機能的特性に光を当てる高次元遺伝子発現パターンの有意義な表現の同定である。モデル解釈性と計算の単純さを追求し、しばしば細胞の重要な表現型の特徴と整合する元のデータの線形変換を求める。そこで本稿では,このニーズに対応するために,新しい線形次元低減法である因子化線形判別分析(FLDA)を提案する。 FLDAの要点は、他の特徴の影響を最小限に抑えつつ、1つの表現型の特徴と高い相関を持つ遺伝子発現レベルの線形関数を特定することである。本研究では,この手法をスパーシティーベース正規化アルゴリズムと統合する。この統合は、特定の表現型の特徴またはそれらの組み合わせに欠かせない遺伝子のサブセットを選択するために重要である。 FLDAの有効性を説明するために,ショウジョウバエ視葉の神経細胞からの転写学的データセットに適用する。 FLDAは表現型の特徴に沿った構造パターンを捉えるだけでなく,各表現型に関連する重要な遺伝子を明らかにする。

Navigating the complex landscape of single-cell transcriptomic data presents significant challenges. Central to this challenge is the identification of a meaningful representation of high-dimensional gene expression patterns that sheds light on the structural and functional properties of cell types. Pursuing model interpretability and computational simplicity, we often look for a linear transformation of the original data that aligns with key phenotypic features of cells. In response to this need, we introduce factorized linear discriminant analysis (FLDA), a novel method for linear dimensionality reduction. The crux of FLDA lies in identifying a linear function of gene expression levels that is highly correlated with one phenotypic feature while minimizing the influence of others. To augment this method, we integrate it with a sparsity-based regularization algorithm. This integration is crucial as it selects a subset of genes pivotal to a specific phenotypic feature or a combination thereof. To illustrate the effectiveness of FLDA, we apply it to transcriptomic datasets from neurons in the Drosophila optic lobe. We demonstrate that FLDA not only captures the inherent structural patterns aligned with phenotypic features but also uncovers key genes associated with each phenotype.

翻訳日:2023-06-17 04:40:21 公開日:2023-06-14

# 高次元2層ニューラルネットワークにおける確率勾配の位相図

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks ( http://arxiv.org/abs/2202.00293v4 )

ライセンス: Link先を確認

Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

(参考訳) 非凸最適化の展望にもかかわらず、過パラメータの浅いネットワークは勾配降下下でグローバル収束を達成することができる。この画像は狭いネットワークでは根本的に異なるが、局所的な極小視では行き詰まる傾向がある。本稿では,これら2つのレジームの高次元設定におけるクロスオーバーについて検討し,特に,いわゆる平均場・流体力学的レジームとsaad & sollaの独創的アプローチとの関係について検討する。ガウスデータに着目し,確率勾配勾配(SGD)の高次元的ダイナミクスにおける学習速度,時間スケール,隠れた単位数との相互作用について検討した。我々の研究は、統計的物理学から高次元のSGDを決定論的に記述し、それを拡張し、厳密な収束率を提供する。

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
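
As a rough illustration of the setting, the following is a minimal sketch of one-pass SGD for a two-layer student network learning from a two-layer teacher on Gaussian inputs. The erf-type activation, the fixed unit second-layer weights, the 1/sqrt(d) input scaling, and all function names are assumptions made for this sketch; the abstract does not specify them.

```python
import math
import random


def g(x):
    """Activation function; an erf-type nonlinearity is assumed for illustration."""
    return math.erf(x / math.sqrt(2))


def g_prime(x):
    """Derivative of g: d/dx erf(x / sqrt(2)) = sqrt(2 / pi) * exp(-x^2 / 2)."""
    return math.sqrt(2 / math.pi) * math.exp(-x * x / 2)


def forward(W, x):
    """Return the network output (mean of hidden activations) and pre-activations."""
    d = len(x)
    pre = [sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(d) for w in W]
    return sum(g(h) for h in pre) / len(W), pre


def sgd_step(W, W_teacher, lr, rng):
    """One online SGD step on a fresh Gaussian sample labelled by the teacher.

    Returns the updated student weights and the squared-error loss on the sample.
    """
    d = len(W[0])
    p = len(W)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y, _ = forward(W_teacher, x)
    y_hat, pre = forward(W, x)
    err = y_hat - y
    # Gradient of 0.5 * err^2 with respect to hidden unit j: err * g'(h_j) * x / (p * sqrt(d)).
    new_W = [
        [wi - lr * err * g_prime(h) * xi / (p * math.sqrt(d)) for wi, xi in zip(w, x)]
        for w, h in zip(W, pre)
    ]
    return new_W, 0.5 * err * err
```

Each call draws a new sample, so the number of steps is also the number of data points seen; the learning rate `lr`, the number of hidden units `len(W)`, and the number of steps are the three quantities whose interplay the abstract refers to.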

翻訳日:2023-06-17 04:34:42 公開日:2023-06-14

# 量子後連想記憶

A Post-Quantum Associative Memory ( http://arxiv.org/abs/2201.12305v2 )

ライセンス: Link先を確認

Ludovico Lami, Daniel Goldwater, Gerardo Adesso

(参考訳) 連想記憶(Associative memory)は、その部分的開示によって完全に検索できる情報を記憶する装置である。我々は,いくつかの基本的な操作公理を満足する物理理論の最も一般的なクラスを表現する一般確率論(gpts)の枠組みの中で,連想記憶のおもちゃモデルとそれを行う究極の限界について検討する。私たちは、gptの次元がどれくらい大きいか自問自答し、$N$が完全に区別可能な特性で$2^m$の状態に対応できるようにします。このような最小次元を$d(N,m)$ と呼ぶ。 Danzer と Grünbaum によって古い結果を呼び起こすと、GPT が古典的あるいは量子的である必要がある場合、$d(2,m)=m+1$ が $O(2^m)$ と比較されることを示す。これは、GPTが古典理論と量子理論の両方を指数関数的に上回るタスクの例をもたらす。より一般に、固定された$N$と漸近的に大きい$m$を解決し、すべての$N\geq 2$に対して$d(N,m) \leq m^{1+o_N(1)}$($m\to\infty$)を証明し、古典的および量子理論よりも指数関数的に改善する。最後に、与えられた gpt に対して最大$N$-wise の相互識別可能な集合を見つけるという一般問題に対する数値的アプローチを開発し、これは$N$-regular hypergraphs 上の最大クライク問題の例と見なすことができる。

Associative memories are devices storing information that can be fully retrieved given partial disclosure of it. We examine a toy model of associative memory and the ultimate limitations it is subjected to within the framework of general probabilistic theories (GPTs), which represent the most general class of physical theories satisfying some basic operational axioms. We ask ourselves how large the dimension of a GPT should be so that it can accommodate $2^m$ states with the property that any $N$ of them are perfectly distinguishable. Call $d(N,m)$ the minimal such dimension. Invoking an old result by Danzer and Grünbaum, we prove that $d(2,m)=m+1$, to be compared with $O(2^m)$ when the GPT is required to be either classical or quantum. This yields an example of a task where GPTs outperform both classical and quantum theory exponentially. More generally, we resolve the case of fixed $N$ and asymptotically large $m$, proving that $d(N,m) \leq m^{1+o_N(1)}$ (as $m\to\infty$) for every $N\geq 2$, which yields again an exponential improvement over classical and quantum theories. Finally, we develop a numerical approach to the general problem of finding the largest $N$-wise mutually distinguishable set for a given GPT, which can be seen as an instance of the maximum clique problem on $N$-regular hypergraphs.
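
The $N=2$ case can be made concrete with a small check. The sketch below assumes a GPT whose state space is the hypercube $[0,1]^m$, which has linear dimension $m+1$ once normalization is included; this model is chosen here for illustration and the abstract does not say which construction the paper uses. On the cube, each coordinate functional takes values in $[0,1]$ and is therefore a valid effect, and any two distinct vertices differ in some coordinate, so that coordinate tells them apart with certainty.

```python
from itertools import combinations, product


def cube_states(m):
    """The 2**m vertices of the hypercube [0, 1]^m, used as candidate states."""
    return list(product((0, 1), repeat=m))


def distinguishing_coordinate(s, t):
    """Index i where s and t differ, so the effect x -> x[i] gives 0 on one and 1 on the other."""
    for i, (a, b) in enumerate(zip(s, t)):
        if a != b:
            return i
    return None


def all_pairs_distinguishable(m):
    """Check that every pair of distinct vertices is separated by some coordinate effect."""
    states = cube_states(m)
    return all(
        distinguishing_coordinate(s, t) is not None
        for s, t in combinations(states, 2)
    )


def dimensions(m):
    """Dimension needed for 2**m pairwise perfectly distinguishable states.

    In classical probability theory such states need disjoint supports,
    hence 2**m outcomes; the assumed hypercube model needs only m + 1.
    """
    return {"states": 2 ** m, "hypercube_gpt": m + 1, "classical": 2 ** m}
```

For example, `dimensions(10)` contrasts a dimension of 11 for the assumed hypercube model with 1024 classically, which is the exponential gap the abstract describes.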

翻訳日:2023-06-17 04:34:28 公開日:2023-06-14

# 能動ラベル取得における名前付きエンティティに注目して

Focusing on Potential Named Entities During Active Label Acquisition ( http://arxiv.org/abs/2111.03837v3 )

ライセンス: Link先を確認

Ali Osman Berk Sapci, Oznur Tastan, Reyyan Yeniterzi

(参考訳) 名前付きエンティティ認識(ner)は、非構造化テキスト内の名前付きエンティティの参照を識別し、それらを予め定義された名前付きエンティティクラスに分類することを目的としている。ディープラーニングベースの事前学習言語モデルは、NERで優れた予測性能を達成するのに役立つが、多くのドメイン固有のNERアプリケーションは、依然としてかなりの量のラベル付きデータを要求する。ラベル取得問題の一般的なフレームワークであるアクティブラーニング(AL)は、モデル性能を犠牲にすることなく、アノテーションコストを最小限に抑えるためにNERタスクに使用されている。しかし,トークンの非バランスなクラス分布は,NERの効果的なALクエリ手法を設計する上での課題をもたらす。本稿では,有意な正のトークンにより多くの注意を払うAL文クエリ評価関数を提案し,これらの関数を文ベースおよびトークンベースのコスト評価戦略を用いて評価する。また、長すぎるか短すぎる文をペナル化するためのデータ駆動正規化手法を提案する。異なる領域からの3つのデータセットに対する実験により,提案手法はアノテーション付きトークンの数を減らし,従来の手法による予測性能を向上する。

Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.
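
To make the idea of a query function that favours potential entity tokens concrete, here is a minimal sketch. The specific weighting (token entropy multiplied by the predicted probability of being an entity), the length penalty, and the names `sentence_score`, `outside_index`, and `target_len` are all hypothetical choices for illustration; the abstract does not give the paper's actual formulas.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one token's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(token_probs, outside_index=0, target_len=20):
    """Score a sentence for annotation; higher means more worth labelling.

    token_probs: one class-probability list per token, where
    probs[outside_index] is the probability of the non-entity ("O") class.
    """
    if not token_probs:
        return 0.0
    weighted = 0.0
    for probs in token_probs:
        p_entity = 1.0 - probs[outside_index]
        # Uncertainty counts more when the token is likely to be an entity.
        weighted += p_entity * token_entropy(probs)
    n = len(token_probs)
    # Hypothetical normalization: penalize sentences far from target_len,
    # whether they are too short or too long.
    length_penalty = min(n, target_len) / max(n, target_len)
    return length_penalty * weighted / n
```

In a data-driven variant, `target_len` would be estimated from the corpus (for instance, a typical sentence length) rather than fixed by hand.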

翻訳日:2023-06-17 04:33:03 公開日:2023-06-14

# 自己コンディショニング事前学習言語モデル

Self-conditioning pre-trained language models ( http://arxiv.org/abs/2110.02802v4 )

ライセンス: Link先を確認

Xavier Suau, Luca Zappella, Nicholas Apostoloff

(参考訳) 本稿では,事前学習したTransformer-based Language Models (TLM) を用いてテキスト生成を誘導するメカニズムについて検討する。 Hinton (1999) によるProduct of Expertsの定式化に基づいて、TLM に自然に存在するエキスパートユニットを利用する生成機構を記述する。そのような単位は、そのような概念の入力および条件付きテキスト生成における概念を検出する責任がある。生成した出力に望まれる概念を誘導するために、専門家ユニットの識別方法と推論中にそれらを活性化する方法を述べる。驚くほど少量のユニットのアクティベーションは、テキスト生成(345mのパラメータを持つモデルでは3ユニット程度)を制御するのに十分であることがわかった。本研究の目的は, TLMの動作についてより深く知ることであるが, 細粒度ホモグラフの概念であっても, 微調整や余分なパラメータを使わずに条件付けに有効であることを示す。さらに,本手法は, TLMの出力に存在する性別バイアスを補正し, 評価された文脈ごとの性別パリティを達成できることを示す。提案手法をFUDGEとPPLM-BoWと比較し,本手法がより低いパープレキシティでジェンダーパリティを達成可能であることを示す。提案手法は,単純さと計算能力の最小化により,幅広いオーディエンスに利用可能である。本研究の成果は, TLMの生成機構を理解するための一歩である。

In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that the activation of a surprisingly small number of units is sufficient to steer text generation (as few as 3 units in a model with 345M parameters). While the objective of this work is to learn more about how TLMs work, we show that our method is effective for conditioning without fine-tuning or using extra parameters, even on fine-grained homograph concepts. Additionally, we show that our method can be used to correct gender bias present in the output of TLMs and achieves gender parity for all evaluated contexts. We compare our method with FUDGE and PPLM-BoW, and show that our approach is able to achieve gender parity at a lower perplexity. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute needs. The findings in this paper are a step forward in understanding the generative mechanisms of TLMs.
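
One plausible way to identify expert units is to treat each unit's activation as a detector score for the concept and rank units by how well that score separates sentences that contain the concept from sentences that do not. The sketch below uses average precision for this ranking; the criterion and the names `average_precision` and `rank_units` are assumptions for illustration, since the abstract does not state how the units are selected.

```python
def average_precision(scores, labels):
    """Average precision of `scores` as a ranking of the binary `labels`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive example
    return total / hits if hits else 0.0


def rank_units(activations, labels):
    """Return unit indices sorted from most to least concept-selective.

    activations[u][s] is the (e.g. max-pooled) activation of unit u on
    sentence s; labels[s] is truthy when sentence s contains the concept.
    """
    scored = [(average_precision(unit, labels), u) for u, unit in enumerate(activations)]
    return [u for _, u in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Under this reading, the top few entries of `rank_units(...)` would be the candidates to activate during inference.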

翻訳日:2023-06-17 04:31:26 公開日:2023-06-14

# 地形活性化マップを用いた深部ニューラルネットワークの可視化

Visualizing Deep Neural Networks with Topographic Activation Maps ( http://arxiv.org/abs/2204.03528v2 )

ライセンス: Link先を確認

Valerie Krug, Raihan Kabir Ratul, Christopher Olson, Sebastian Stober

(参考訳) ディープニューラルネットワーク(DNN)による機械学習は、さまざまな分野のアプリケーションでタスクを解くのに成功している。しかし、DNNの複雑さは、彼らの学習課題の解決方法を理解するのを困難にしている。 DNNの説明可能性を改善するため、複雑で不透明なシステムを解析する神経科学の手法を適用した。ここでは、神経科学が脳活動の可視化に地形マップをどのように利用するかからインスピレーションを得る。また、DNNにおけるニューロンの活性化を地形図として可視化するため、同様の活動のニューロンが互いに近接する2次元空間に配置する手法の研究を行った。本研究では,DNN層内のニューロンの地形的レイアウトを求める手法を紹介し,比較する。さらに,地形アクティベーションマップを用いて誤りやバイアスを識別し,トレーニングプロセスを可視化する方法を示す。我々の新しい可視化技術は、DNNに基づく意思決定システムの透明性を改善し、機械学習の専門知識なしで解釈可能である。

Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.

翻訳日:2023-06-17 04:23:49 公開日:2023-06-14

# 量子不明瞭性によるアクセス不能情報へのアクセス

Accessing inaccessible information via quantum indistinguishability ( http://arxiv.org/abs/2203.16592v3 )

ライセンス: Link先を確認

Sebastian Horvat, Borivoje Dakić

(参考訳) 本稿では,その情報を符号化する「ターゲット」粒子を空間的に移動させることで,情報を少し学習する情報理論タスクを提示・解析する。一方、目的粒子と区別できない場合のみ、追加で独立に準備された量子粒子を用いることでタスクを解くことができることを示す。一方, 対象粒子と絡み合っている場合のみ, 識別可能な量子粒子を用いることで解くことができる。第一の量子化形式論において、独立に準備された不明瞭な量子粒子からなる系に本質的に存在するように見えるという絡み合いは、単なる表現的人工物以上のものであり、情報処理のリソースとして実際に使用できる。我々のタスクを解く量子力学プロトコルのクラスを分析することに加えて、我々は結果を一般化し、暗号に適用する可能な方法に向かって行動する。

In this paper we present and analyze an information-theoretic task that consists in learning a bit of information by spatially moving the "target" particle that encodes it. On one hand, we show that the task can be solved with the use of additional independently prepared quantum particles, only if these are indistinguishable from the target particle. On the other hand, the task can be solved with the use of distinguishable quantum particles, only if they are entangled with the target particle. These two features, as we argue, support the following claim: the entanglement that, in the first quantization formalism, appears to be inherently present in systems comprised of independently prepared indistinguishable quantum particles, is more than a mere representational artefact and can indeed be used as a resource for information processing. Besides analyzing the class of quantum-mechanical protocols that solve our task, we gesture towards possible ways of generalizing our results and of applying them in cryptography.

翻訳日:2023-06-17 04:23:32 公開日:2023-06-14

# SC2ベンチマーク:スプリットコンピューティングの圧縮を改善

SC2 Benchmark: Supervised Compression for Split Computing ( http://arxiv.org/abs/2203.08875v2 )

ライセンス: Link先を確認

Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt

(参考訳) モバイルデバイスのディープラーニングモデルに対する需要が高まっているため、デバイスとより強力なエッジサーバの間のニューラルネットワーク計算の分割は魅力的なソリューションとなっている。しかし、既存の分割コンピューティングアプローチは、圧縮されたデータに対する遠隔計算の単純なベースラインに比べて性能が劣ることが多い。最近の研究では、教師付き下流タスクの関連情報を含む圧縮表現の学習を提案し、圧縮されたデータサイズと教師付きパフォーマンスのトレードオフを改善した。しかし、既存の評価指標は分割計算の不完全な図のみを提供する。本研究では,スプリットコンピューティング(SC2)の教師付き圧縮を導入し,モバイルデバイス上での計算の最小化,送信データサイズの最小化,モデル精度の最大化という新たな評価基準を提案する。 10のベースライン手法,3つのコンピュータビジョンタスク,180以上のトレーニングモデルを用いた総合的なベンチマーク研究を行い,SC2の様々な側面について議論する。 sc2benchは、将来のsc2研究のためのpythonパッケージです。提案するメトリクスとパッケージは、スプリットコンピューティングにおける教師付き圧縮のトレードオフを理解するのに役立つでしょう。

With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.
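
Because the three criteria pull in different directions, one way to compare methods on all of them at once is to keep only those that no other method beats on every criterion. The sketch below implements that Pareto filter; it does not use the sc2bench API, and the `SplitResult` fields and their units are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitResult:
    """Measurements for one split-computing method (field names are illustrative)."""
    name: str
    encoder_flops: float  # computation on the mobile device, lower is better
    data_size_kb: float   # size of the transmitted representation, lower is better
    accuracy: float       # end-to-end task accuracy, higher is better


def dominates(a, b):
    """True if `a` is at least as good as `b` on all criteria and better on one."""
    no_worse = (
        a.encoder_flops <= b.encoder_flops
        and a.data_size_kb <= b.data_size_kb
        and a.accuracy >= b.accuracy
    )
    strictly_better = (
        a.encoder_flops < b.encoder_flops
        or a.data_size_kb < b.data_size_kb
        or a.accuracy > b.accuracy
    )
    return no_worse and strictly_better


def pareto_front(results):
    """Keep the results that are not dominated by any other result."""
    return [r for r in results if not any(dominates(o, r) for o in results)]
```

A method that drops off the front is worse than some alternative in every respect, while the methods that remain represent genuinely different tradeoffs.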

翻訳日:2023-06-17 04:22:20 公開日:2023-06-14

# 再構成による構成シーン表現学習:調査

Compositional Scene Representation Learning via Reconstruction: A Survey ( http://arxiv.org/abs/2202.07135v4 )

ライセンス: Link先を確認

Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue

(参考訳) 視覚シーンは視覚概念で構成され、組み合わせ爆発の特性を持つ。人間が多様な視覚シーンから効率的に学習する重要な理由は、構成的知覚能力であり、人工知能が同様の能力を持つことが望ましい。構成シーン表現学習はそのような能力を実現するタスクである。近年,表現学習に有利な深層ニューラルネットワークを応用し,再構成による構図表現を学習し,この研究の方向性を深層学習時代へと発展させる手法が提案されている。大量のラベルのないデータを使用し、費用がかかるデータアノテーションを避けることができるため、再構築による学習は有利である。 In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.

翻訳日:2023-06-17 04:21:42 公開日:2023-06-14

# 不均一無線ネットワーク上での動的分散モデルトレーニングのための並列逐次学習

Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks ( http://arxiv.org/abs/2202.02947v6 )

ライセンス: Link先を確認

Seyyedali Hosseinalipour, Su Wang, Nicolo Michelusi, Vaneet Aggarwal, Christopher G. Brinton, David J. Love, Mung Chiang

(参考訳) フェデレートラーニング(FedL)は,一連の無線デバイス上で,反復的なローカルアップデート(デバイス)とグローバルアグリゲーション(サーバ)を通じて,モデルトレーニングを分散する一般的なテクニックとして登場した。本稿では,FedLアーキテクチャを3次元に拡張した並列逐次学習(PSL)を開発する。 i)デバイス間通信(D2D)を介してデバイス間の分散協調を可能にするネットワーク。 (ii-a)学習:pslは、デバイスで異なるミニバッチサイズを持つ確率的勾配降下イテレーションの異種数を考慮し、(ii-b)データ:pslはデータの到着と出発を伴う動的環境を想定し、ローカルデータセットの分布は時間とともに進化し、モデル/コンセプトドリフトの新しいメトリックを介してキャプチャされる。 (ii-c) デバイス: PSLは計算能力と通信能力の異なるデバイスを考える。 (iii)近接、デバイス同士の距離とアクセスポイントが異なる。 pslは、資源効率の改善のためにそれらの間にアイドルタイムでグローバルアグリゲーションが実行され、データ分散とモデル分散と局所モデル凝縮をfederに組み込む現実的なシナリオを考察している。我々の分析は、分散機械学習におけるコールド対ウォームアップモデルの概念とモデル慣性について光を当てている。次に、ネットワーク対応動的モデルトラッキングを提案し、モデル学習とリソース効率のトレードオフを最適化し、NPハードなシグナミカルプログラミング問題を示す。最後に, 一般最適化解法を提案することで, この問題を解決した。数値計算により,グローバルアグリゲーション,モデル/コンセプションドリフト,D2D協調構成の間におけるアイドル時間間の相互依存性が明らかになった。

Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous numbers of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift; (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. Finally, we solve this problem by proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times between the global aggregations, model/concept drift, and D2D cooperation configuration.

翻訳日:2023-06-17 04:21:08 公開日:2023-06-14

# 字幕からの視覚言語トランスフォーマーの訓練

Training Vision-Language Transformers from Captions ( http://arxiv.org/abs/2205.09256v3 )

ライセンス: Link先を確認

Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk

(参考訳) 視覚言語トランスフォーマーは、低レベルな人間のラベル(クラスラベル、バウンディングボックスなど)なしで学習することができる。既存の作業は、バウンディングボックスやパッチを明示的に利用するにせよ、視覚的なバックボーンは、マルチモーダル言語パイプラインに統合される前に、ImageNetクラス予測に基づいてトレーニングする必要があると仮定する。これは不要であることを示し、この監督を必要としないマスク付きオートエンコーダ上に構築されたキャプション(vlc)から新しいモデルヴィジョン言語を導入する。実際、監督対象分類で事前訓練された現在の最先端のパッチベース視覚言語トランスフォーマであるVLTと、我々のモデルであるVLCとの直接比較では、我々のアプローチが分かる。 1.標準ベンチマークでvultを上回っている 2. より解釈可能で直感的なパッチ視覚化を提供する。 3.3は、アノテーション付きバウンディングボックスでトレーニングされたROIを利用する多くの大きなモデルと競合する。

Vision-Language Transformers can be learned without low-level human labels (e.g., class labels or bounding boxes). Existing work, whether explicitly utilizing bounding boxes or patches, assumes that the visual backbone must first be trained on ImageNet class prediction before being integrated into a multimodal linguistic pipeline. We show that this is not necessary and introduce a new model, Vision-Language from Captions (VLC), built on top of Masked Auto-Encoders, that does not require this supervision. In fact, in a head-to-head comparison between ViLT, the current state-of-the-art patch-based vision-language transformer which is pretrained with supervised object classification, and our model, VLC, we find that our approach (1) outperforms ViLT on standard benchmarks, (2) provides more interpretable and intuitive patch visualizations, and (3) is competitive with many larger models that utilize ROIs trained on annotated bounding boxes.

翻訳日:2023-06-17 04:13:21 公開日:2023-06-14

# テキスト・画像生成のためのプロンプト修飾器の分類

A Taxonomy of Prompt Modifiers for Text-To-Image Generation ( http://arxiv.org/abs/2204.13988v3 )

ライセンス: Link先を確認

Jonas Oppenlaender

(参考訳) テキストから画像への生成は2021年以来、注目を集めている。今日では、美しい、興味深いデジタル画像やアートワークが、テキスト入力("prompts")と深い生成モデルから合成することができる。テキスト・ツー・画像生成とAI生成アートに関するオンラインコミュニティが急速に現れている。本稿では,3ヶ月のエスノグラフィー研究に基づいて,オンラインコミュニティの実践者が使用する6種類のプロンプト修飾剤を同定する。プロンプト修飾子の新しい分類法により、研究者はテキストから画像への生成の実践を研究するための概念的な出発点を提供するが、aiが生成した芸術の実践者がイメージを改善するのに役立つかもしれない。さらに,「プロンプトエンジニアリング」の実践における即時修飾器の応用について概説する。本稿では,ヒューマン・コンピュータ・インタラクション(HCI)分野における新しい創造的実践の機会について論じる。この論文は、テキスト・ツー・イメージ生成とAI生成技術以外の将来の応用におけるヒューマン・AIインタラクション(HAI)の観点から、迅速なエンジニアリングの幅広い意味を論じる。

Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI generated art have quickly emerged. This paper identifies six types of prompt modifiers used by practitioners in the online community based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers a conceptual starting point for investigating the practice of text-to-image generation, but may also help practitioners of AI generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI generated art.

翻訳日:2023-06-17 04:13:05 公開日:2023-06-14

# 相互作用ナノ粒子のフィードバック冷却による力勾配センシングと絡み合い

Force-Gradient Sensing and Entanglement via Feedback Cooling of Interacting Nanoparticles ( http://arxiv.org/abs/2204.13684v3 )

ライセンス: Link先を確認

Henning Rudolph, Uroš Delić, Markus Aspelmeyer, Klaus Hornberger, and Benjamin A. Stickler

(参考訳) 本研究では, 2つの浮遊ナノ粒子のフィードバック冷却により, 力の差分知覚と定常的絡み合いの観察が可能となることを示す。このフィードバックにより、2つの粒子は不均質な力場に影響を受けやすく、十分に強い粒子間カップリングの絡み合いを示す定常的な非熱状態へと誘導される。マイクロンあたりのゼプトニュートンの力勾配センシングは実現可能であり、荷電粒子間のクーロン相互作用による絡み合いは最先端のセットアップで現実的に観測できると予測した。

We show theoretically that feedback-cooling of two levitated, interacting nanoparticles enables differential sensing of forces and the observation of stationary entanglement. The feedback drives the two particles into a stationary, non-thermal state which is susceptible to inhomogeneous force fields and which exhibits entanglement for sufficiently strong inter-particle couplings. We predict that force-gradient sensing at the zepto-Newton per micron range is feasible and that entanglement due to the Coulomb interaction between charged particles can be realistically observed in state-of-the-art setups.

翻訳日:2023-06-17 04:12:21 公開日:2023-06-14

# 実験グレーボックス量子システム同定と制御

Experimental graybox quantum system identification and control ( http://arxiv.org/abs/2206.12201v3 )

ライセンス: Link先を確認

Akram Youssry, Yang Yang, Robert J. Chapman, Ben Haylock, Francesco Lenzini, Mirko Lobino, Alberto Peruzzo

(参考訳) エンジニアリングされた量子システムの理解と制御は、実用的な量子技術を開発するための鍵である。しかし、製造の不完全さや環境騒音といった現在の技術的限界を考えると、これは必ずしも可能とは限らない。これらの問題に対処するため、量子システム同定と制御のための理論的および数値的手法が数多く開発されている。これらの手法は、システムを記述するモデルの精度によって制限される従来の曲線フィッティングから、効率的な制御ソリューションを提供するが、モデルの出力を超えた制御や、基礎となる物理プロセスへの洞察を提供する機械学習手法まで、幅広い。ここでは,量子システムの物理モデルを構築し,最適制御を設計するための"グレーボックス"手法を実験的に実証する。標準教師付き機械学習モデルでは使用できない量であるユニタリとハミルトニアンを生成する一方で,モデルフィッティングよりも優れた性能を示す。提案手法は,物理原理と高精度機械学習を組み合わせることで,必要な制御量を直接測定できない問題に対して有効である。この方法は自然に時間依存的かつオープンな量子システムに拡張され、量子ノイズ分光とキャンセルへの応用がある。

Understanding and controlling engineered quantum systems is key to developing practical quantum technology. However, given the current technological limitations, such as fabrication imperfections and environmental noise, this is not always possible. To address these issues, a great many theoretical and numerical methods for quantum system identification and control have been developed. These methods range from traditional curve fitting, which is limited by the accuracy of the model that describes the system, to machine learning methods, which provide efficient control solutions but no control beyond the output of the model, nor insights into the underlying physical process. Here we experimentally demonstrate a "graybox" approach to construct a physical model of a quantum system and use it to design optimal control. We report superior performance over model fitting, while generating unitaries and Hamiltonians, which are quantities not available from the structure of standard supervised machine learning models. Our approach combines physics principles with high-accuracy machine learning and is effective with any problem where the required controlled quantities cannot be directly measured in experiments. This method naturally extends to time-dependent and open quantum systems, with applications in quantum noise spectroscopy and cancellation.

翻訳日:2023-06-17 04:04:13 公開日:2023-06-14

# 視覚異常検出のためのオートエンコーダによる自己教師付きトレーニング

Self-Supervised Training with Autoencoders for Visual Anomaly Detection ( http://arxiv.org/abs/2206.11723v3 )

ライセンス: Link先を確認

Alexander Bauer

(参考訳) 深層畳み込みオートエンコーダは、教師なしの方法で非線形次元の減少を学習するための効果的なツールを提供する。近年,視覚領域における異常検出作業に用いられている。異常のない例を用いて再構成誤差を最適化することにより、対応するネットワークがアプリケーションフェーズ内の異常領域を正確に再構成できない、という考え方が一般的である。この目標は通常、ボトルネック層のサイズを縮小するか、アクティベーションに間隔制約を課すことで、ネットワークの容量を制御することで対処される。しかし、どちらの手法も異常信号の再構成を明示的に罰しないため、しばしば検出が困難になる。我々は,データ多様体に着目した学習において,修正された再構成誤差を用いて識別情報を活用できる自己教師型学習システムを適用することで,この問題に対処する。これにより、モデルが局所的に一貫した再構成を生成するとともに、異常パターンのフィルタとして機能することで不規則性を置き換えることができる。関連する手法とは対照的に,本手法による推論は,1ステップで入力画像全体を処理する訓練や予測において極めて効率的である。 MVTec ADデータセットを用いた実験により,提案手法の高認識と局所化性能を示す。特にテクスチャ・サブセットでは,本手法は最近の異常検出手法を大きなマージンで一貫して上回っている。

Deep convolutional autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimising for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network by either reducing the size of the bottleneck layer or enforcing sparsity constraints on its activations. However, neither of these techniques explicitly penalizes the reconstruction of anomalous signals, which often results in poor detection. We tackle this problem by adapting a self-supervised learning regime that makes it possible to use discriminative information during training while focusing on the data manifold by means of a modified reconstruction error. This regularizes the model to produce locally consistent reconstructions, while replacing irregularities by acting as a filter for anomalous patterns. In contrast to related approaches, inference with our method is very efficient during both training and prediction, processing the entire input image in a single step. Our experiments on the MVTec AD dataset demonstrate high recognition and localization performance of the proposed method. On the texture subset in particular, our approach consistently outperforms a number of recent anomaly detection methods by a large margin.
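
The general reconstruction-based recipe that this line of work builds on can be sketched in a few lines. The code below assumes a trained autoencoder has already produced `reconstruction` for a grayscale `image` (both as nested lists of floats); it shows only the generic scoring step, not the modified reconstruction error proposed in the paper, and the function names are illustrative.

```python
def anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error; large values suggest anomalies."""
    return [
        [(a - b) ** 2 for a, b in zip(row_img, row_rec)]
        for row_img, row_rec in zip(image, reconstruction)
    ]


def image_score(amap):
    """Image-level anomaly score: the largest per-pixel error."""
    return max(max(row) for row in amap)


def segment(amap, threshold):
    """Binary localization mask: True where the error exceeds the threshold."""
    return [[v > threshold for v in row] for row in amap]
```

Since the map comes from a single forward pass over the whole image, localization needs no sliding window; the threshold would typically be chosen on held-out anomaly-free data.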

翻訳日:2023-06-17 04:03:50 公開日:2023-06-14

# E2PN: 効率的なSE(3)-等変点ネットワーク

E2PN: Efficient SE(3)-Equivariant Point Network ( http://arxiv.org/abs/2206.05398v3 )

ライセンス: Link先を確認

Minghan Zhu, Maani Ghaffari, William A. Clark, Huei Peng

(参考訳) 本稿では,3次元点雲からSE(3)-等価特徴を学習するための畳み込み構造を提案する。これはカーネルポイント畳み込み(kpconv)の同変バージョンと見なすことができ、ポイントクラウドデータを処理するために広く使用される畳み込み形式である。既存の等価ネットワークと比較して、私たちの設計はシンプルで軽量で、高速で、既存のタスク固有のポイントクラウド学習パイプラインと統合が容易です。群畳み込みと商表現を組み合わせることでこれらの望ましい性質を達成する。具体的には、SO(2) を安定化部分群として使用し、計算を省くために球面商特徴体を形成する際に、SO(3) を有限群に区別する。また, 回転を区別するキャパシティを保持するために, 球状特徴からSO(3)特徴を復元する置換層を提案する。実験の結果,オブジェクト分類,ポーズ推定,キーポイントマッチングなどのタスクにおいて,既存の作業よりもはるかに少ないメモリ消費と高速実行を実現していることがわかった。提案手法は,点雲に基づく実世界のアプリケーションのための同変モデルの開発を促進することができる。

This paper proposes a convolution structure for learning SE(3)-equivariant features from 3D point clouds. It can be viewed as an equivariant version of kernel point convolutions (KPConv), a widely used convolution form to process point cloud data. Compared with existing equivariant networks, our design is simple, lightweight, fast, and easy to integrate with existing task-specific point cloud learning pipelines. We achieve these desirable properties by combining group convolutions and quotient representations. Specifically, we discretize SO(3) to finite groups for their simplicity while using SO(2) as the stabilizer subgroup to form spherical quotient feature fields to save computations. We also propose a permutation layer to recover SO(3) features from spherical features to preserve the capacity to distinguish rotations. Experiments show that our method achieves comparable or superior performance in various tasks, including object classification, pose estimation, and keypoint-matching, while consuming much less memory and running faster than existing work. The proposed method can foster the development of equivariant models for real-world applications based on point clouds.

翻訳日:2023-06-17 04:02:38 公開日:2023-06-14

# ニューラル共分散SDE:初期化時の無限深さ幅ネットワークの形状

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization ( http://arxiv.org/abs/2206.02768v3 )

ライセンス: Link先を確認

Mufan Bill Li, Mihai Nica, Daniel M. Roy

(参考訳) 初期化時のフィードフォワードニューラルネットワークのロジット出力は、垂直層で定義されたランダムな共分散行列を条件付きガウス行列とする。本研究では,このランダム行列の分布について検討する。近年の研究では、この共分散行列が非退化するためには、ネットワーク深さが大きくなるにつれて活性化関数を形成する必要があることが示されている。しかし、この形状法に対する現在の無限幅スタイルの理解は大深度では不十分であり、無限幅解析は層間における微視的変動を無視するが、これらのゆらぎは多くの層に蓄積する。この欠点を克服するために、形状の無限深さと幅の極限におけるランダム共分散行列を考察する。非自明な極限に達するのに必要な活性化関数の正確なスケーリングを特定し、確率微分方程式(SDE)によってランダムな共分散行列が支配されることを示す。シミュレーションを用いて、sde は有限ネットワークのランダム共分散行列の分布と密接に一致することを示す。さらに,活性化関数に基づき,大形ネットワークの爆発や消滅のノルムに対するif-and-only-if条件を回復する。

The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.

翻訳日:2023-06-17 04:01:50 公開日:2023-06-14

# adaprop: グラフニューラルネットワークに基づく知識グラフ推論のための学習適応伝播

AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning ( http://arxiv.org/abs/2205.15319v2 )

ライセンス: Link先を確認

Yongqi Zhang, Zhanke Zhou, Quanming Yao, Xiaowen Chu, Bo Han

(参考訳) グラフニューラルネットワーク(GNN)の人気により、知識グラフ(KG)上で推論を行う様々なGNNベースの手法が設計されている。GNNベースのKG推論法の重要な設計要素は伝播経路と呼ばれ、各伝播ステップに関与するエンティティの集合を含んでいる。既存の手法では手作業で設計した伝播経路を使用し、エンティティとクエリ関係の相関を無視している。さらに、関与するエンティティの数は、伝播ステップが大きくなると爆発的に増加する。本研究では、有望なターゲットを保持しつつ無関係なエンティティを除外するために、適応的な伝播経路を学習する。まず、近傍のターゲットと層間の接続を線形の計算量で保持できるインクリメンタルサンプリング機構を設計する。第2に、意味的に関連するエンティティを識別するために、学習に基づくサンプリング分布を設計する。広範な実験により、本手法は強力かつ効率的で、意味を考慮できることが示された。コードは https://github.com/LARS-research/AdaProp で公開されている。

Due to the popularity of Graph Neural Networks (GNNs), various GNN-based methods have been designed to reason on knowledge graphs (KGs). An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step. Existing methods use hand-designed propagation paths, ignoring the correlation between the entities and the query relation. In addition, the number of involved entities will explosively grow at larger propagation steps. In this work, we are motivated to learn an adaptive propagation path in order to filter out irrelevant entities while preserving promising targets. First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at https://github.com/LARS-research/AdaProp.

翻訳日:2023-06-17 04:01:29 公開日:2023-06-14

# 米国の政治家によるコミュニケーションにおける誠実さの代替概念から代替事実へ

From alternative conceptions of honesty to alternative facts in communications by U.S. politicians ( http://arxiv.org/abs/2208.10814v3 )

ライセンス: Link先を確認

Jana Lasser, Segun Taofeek Aroyehun, Fabio Carrella, Almog Simchon, David Garcia, Stephan Lewandowsky

(参考訳) ソーシャルメディアにおけるオンライン誤情報の拡散は、社会的結束と民主主義にとっての問題としてますます認識されている。「自分の心を語る」政治家は、その発言が証拠に裏付けられていなくても、国民の一部から真正かつ正直であると認識されているにもかかわらず、この過程における政治指導者の役割はあまり研究の注目を集めてこなかった。2011年から2022年の間のTwitter上での米国議会議員によるコミュニケーションを分析し、政治家の正直さの概念が明確な変化を遂げたことを示す。すなわち、証拠から切り離されうる真正な信念の表明(belief-speaking)がより顕著になり、明示的に証拠に基づく真理の探求(truth-seeking)とより区別されるようになった。共和党員については(民主党員については当てはまらないが)、belief-speakingの10%の増加が、ツイートで共有された情報源の質(NewsGuardのスコアリングシステム)の12.8ポイントの低下と関連していることを示す。逆に、truth-seekingの言語の増加は、両党において情報源の質の向上と関連している。この結果は、政治談話における現在の誤情報の拡散が、証拠への依拠を犠牲にして主観的信念の表明を重視する、真実と誠実さについての代替的な理解によって部分的に引き起こされているという仮説を支持している。

The spread of online misinformation on social media is increasingly perceived as a problem for societal cohesion and democracy. The role of political leaders in this process has attracted less research attention, even though politicians who "speak their mind" are perceived by segments of the public as authentic and honest even if their statements are unsupported by evidence. Analyzing communications by members of the U.S. Congress on Twitter between 2011 and 2022, we show that politicians' conception of honesty has undergone a distinct shift, with authentic belief-speaking that may be decoupled from evidence becoming more prominent and more differentiated from explicitly evidence-based truth seeking. We show that for Republicans - but not Democrats - an increase of belief-speaking of 10% is associated with a decrease of 12.8 points of quality (NewsGuard scoring system) in the sources shared in a tweet. Conversely, an increase in truth-seeking language is associated with an increase in quality of sources for both parties. The results support the hypothesis that the current dissemination of misinformation in political discourse is in part driven by an alternative understanding of truth and honesty that emphasizes invocation of subjective belief at the expense of reliance on evidence.

翻訳日:2023-06-17 03:56:15 公開日:2023-06-14

# テンプレートに基づく時間適応による動的文脈化単語埋め込みの学習

Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation ( http://arxiv.org/abs/2208.10734v3 )

ライセンス: Link先を確認

Xiaohang Tang, Yi Zhou, Danushka Bollegala

(参考訳) Dynamic contextualised word embeddings (DCWEs) は、単語の時間的な意味変化を表す。本稿では、時間に敏感なテンプレートを用いて事前学習されたマスク言語モデル(MLM)を時間適応させることによるDCWEsの学習法を提案する。2つの異なるタイムスタンプ $T_1$ と $T_2$ でそれぞれ取得されたコーパスの2つのスナップショット $C_1$ と $C_2$ が与えられたとき、まず、(a) $C_1$ と $C_2$ の両方に関連する \emph{pivot} 用語と、(b) 個々のスナップショットにおいて特定のピボット用語に関連付けられた \emph{anchor} 用語を選択する教師なしの方法を提案する。次に、抽出されたピボット用語とアンカー用語を使って、手動で作成したテンプレートを埋めることでプロンプトを生成する。さらに、人間による監督を必要とせず、$C_1$ と $C_2$ から時間に敏感なテンプレートを自動的に学習する手法を提案する。次に、生成されたプロンプトを用いて微調整することで、事前学習されたMLMを $T_2$ に適応させる。複数の実験により、提案手法は $C_2$ のテスト文のパープレキシティを低減し、現在の最先端手法を上回ることが示された。

Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpus taken respectively at two distinct timestamps $T_1$ and $T_2$, we first propose an unsupervised method to select (a) \emph{pivot} terms related to both $C_1$ and $C_2$, and (b) \emph{anchor} terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates using the extracted pivot and anchor terms. Moreover, we propose an automatic method to learn time-sensitive templates from $C_1$ and $C_2$, without requiring any human supervision. Next, we use the generated prompts to adapt a pretrained MLM to $T_2$ by fine-tuning using those prompts. Multiple experiments show that our proposed method reduces the perplexity of test sentences in $C_2$, outperforming the current state-of-the-art.

翻訳日:2023-06-17 03:55:49 公開日:2023-06-14

# 少数ショット学習のためのプリミティブアウェア識別表現の学習

Learning Primitive-aware Discriminative Representations for Few-shot Learning ( http://arxiv.org/abs/2208.09717v2 )

ライセンス: Link先を確認

Jianpeng Yang, Yuhang Niu, Xuemei Xie, Guangming Shi

(参考訳) FSL (Few-shot Learning) は、少数のラベル付き例だけで新しいクラスを認識できるよう容易に適応可能な分類器を学習することを目的としている。FSLに関する最近の研究のいくつかは、画像レベルの特徴を使って分類のためのサンプル間の類似性を計算し、有望な分類性能をもたらしている。しかし、画像レベルの特徴は、既知(seen)クラスと未知(unseen)クラスの間で転送可能かつ一貫している可能性のある、オブジェクトの豊富な細粒度の構造的情報を無視する。人間はどのようにして少数のサンプルだけで新しいクラスを容易に識別できるのか? 認知科学の研究には、人間はプリミティブを通して新しいカテゴリを認識できると主張するものがある。ベースカテゴリと新規カテゴリは重複しないが、共通のプリミティブを共有しうる。上記の研究に触発されて、メトリックベースのFSLモデルに基づいてプリミティブを考慮した表現を学習するPrimitive Mining and Reasoning Network (PMRN) を提案する。具体的には、まず特徴抽出器にSelf-supervision Jigsaw task (SSJ) を並列に追加し、オブジェクトの部分に対応する視覚パターンを特徴チャネルにエンコードするようモデルを導く。さらに識別的表現をマイニングするために、Adaptive Channel Grouping (ACG) 法を適用して、空間的および意味的に関連した視覚パターンをクラスタリングして重み付けし、視覚プリミティブのグループを生成する。プリミティブの識別可能性と転送可能性をさらに高めるために、グラフ畳み込みネットワークに基づく視覚プリミティブのCorrelation Reasoning Network (CRN) を提案し、プリミティブ間の豊富な構造情報と内部相関を学習する。最後に、エピソディックトレーニング戦略に基づいて、メタタスクにおける分類のためにプリミティブレベルのメトリックを用いる。広範な実験により、本手法が6つの標準ベンチマークで最先端の結果を達成することが示された。

Few-shot learning (FSL) aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples. Some recent work about FSL has yielded promising classification performance, where the image-level feature is used to calculate the similarity among samples for classification. However, the image-level feature ignores abundant fine-grained and structural information of objects that may be transferable and consistent between seen and unseen classes. How can humans easily identify novel classes with several samples? Some study from cognitive science argues that humans can recognize novel categories through primitives. Although base and novel categories are non-overlapping, they can share some primitives in common. Inspired by above research, we propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations based on metric-based FSL model. Concretely, we first add Self-supervision Jigsaw task (SSJ) for feature extractor parallelly, guiding the model to encode visual pattern corresponding to object parts into feature channels. To further mine discriminative representations, an Adaptive Channel Grouping (ACG) method is applied to cluster and weight spatially and semantically related visual patterns to generate a group of visual primitives. To further enhance the discriminability and transferability of primitives, we propose a visual primitive Correlation Reasoning Network (CRN) based on graph convolutional network to learn abundant structural information and internal correlation among primitives. Finally, a primitive-level metric is conducted for classification in a meta-task based on episodic training strategy. Extensive experiments show that our method achieves state-of-the-art results on six standard benchmarks.

翻訳日:2023-06-17 03:55:25 公開日:2023-06-14

# ラベル雑音の存在下での一般化に及ぼすモデル幅と密度の影響の検討

Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise ( http://arxiv.org/abs/2208.08003v4 )

ライセンス: Link先を確認

Yihao Xue, Kyle Whitecross, Baharan Mirzasoleiman

(参考訳) 過パラメータ化されたニューラルネットワークのサイズ拡大は、最先端の性能を達成する上で鍵となってきた。これは二重降下現象によって捉えられ、モデル幅が増加するにつれて、テスト損失は減少・増加・減少のパターンに従う。しかし、テスト損失曲線に対するラベルノイズの影響は十分に検討されていない。本研究では、元々観測されていた二重降下曲線においてラベルノイズが \textit{final ascent}(最終上昇)をもたらすという興味深い現象を明らかにする。具体的には、ノイズ対サンプルサイズ比が十分大きい場合には、中間的な幅で最適な一般化が達成される。理論的解析を通じて、この現象はラベルノイズによって引き起こされるテスト損失の分散の形状遷移に起因することを示す。さらに、最終上昇現象をモデル密度に拡張し、トレーニング可能なパラメータをランダムに落として密度を下げると、ラベルノイズ下での一般化が向上することを示す最初の理論的特徴付けを与える。また、正則化とサンプルサイズの役割についても徹底的に検討する。驚いたことに、より大きな $\ell_2$ 正則化やラベルノイズに対するロバストな学習手法は、最終上昇を悪化させる。我々は、MNISTでトレーニングされたReLUネットワーク、CIFAR-10/100でトレーニングされたResNet、および現実世界のノイズラベルを持つStanford CarsでトレーニングされたInceptionResNet-v2を用いた広範な実験により、結果の妥当性を確認した。

Increasing the size of overparameterized neural networks has been a key in achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.

翻訳日:2023-06-17 03:54:41 公開日:2023-06-14

# 一様局所測定による任意絡み合い状態の効率的な検証

Efficient verification of arbitrary entangled states with homogeneous local measurements ( http://arxiv.org/abs/2208.01083v2 )

ライセンス: Link先を確認

Ye-Chao Liu, Yinfei Li, Jiangwei Shang, Xiangdong Zhang

(参考訳) 量子状態検証(QSV)は、与えられた量子デバイスが所望の目標状態を実際に生成することを、局所測定のみに依拠して検証するタスクである。これまでのところ、ある種のエンタングル状態はQSVによって効率的に、あるいは最適に検証できる。しかし、任意のエンタングル状態が与えられたとき、その検証プロトコルをどのように設計するかは未解決の問題である。本研究では、我々が選択非依存測定プロトコルとして導入するものの局所性を考慮することで、この問題に取り組む体系的な戦略を提示する。その測定演算子は、均質である場合に直接実現できる。いくつかの典型的なエンタングル状態を例にとり、標準的なパウリ射影を用いたプロトコル設計の明示的な手順を示し、より優れたQSV戦略を得るうえでの本手法の優位性を示す。さらに、本フレームワークは、エンタングルメント・ウィットネスの構成や、さらにはパラメータ推定など、他のタスクにも自然に拡張することができる。

Quantum state verification (QSV) is the task of relying on local measurements only to verify that a given quantum device does produce the desired target state. Up to now, certain types of entangled states can be verified efficiently or even optimally by QSV. However, given an arbitrary entangled state, how to design its verification protocol remains an open problem. In this work, we present a systematic strategy to tackle this problem by considering the locality of what we initiate as the choice-independent measurement protocols, whose operators can be directly achieved when they are homogeneous. Taking several typical entangled states as examples, we show the explicit procedures of the protocol design using standard Pauli projections, demonstrating the superiority of our method for attaining better QSV strategies. Moreover, our framework can be naturally extended to other tasks such as the construction of entanglement witness, and even parameter estimation.

翻訳日:2023-06-17 03:54:16 公開日:2023-06-14

# ニューラルネットワークによる新規性に基づくテスト選択を用いた機能カバレッジ収束の高速化

Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure ( http://arxiv.org/abs/2207.00445v3 )

ライセンス: Link先を確認

Xuan Zheng, Kerstin Eder and Tim Blackmore

(参考訳) シミュレーションベースの検証に使用される新規性に基づくテストセレクタは、カバレッジホールの数に関わらず、カバレッジ収束(coverage closure)を大幅に加速することが示されている。本稿では、ニューラルネットワークに基づく新規性ベースのテスト選択のための、構成可能で高度に自動化されたフレームワークを提案する。このフレームワークの3つの構成を商用の信号処理ユニットでテストする。3つともランダムなテスト選択を明確に上回り、99.5%のカバレッジに到達するためのシミュレーションの削減は最大で49.37%である。構成の計算コストは、シミュレーションの削減と比べて無視できる。実験結果を比較し、構成の性能に関わる重要な特性について考察する。

Novel test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly-automated framework for novel test selection based on neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection with the largest saving of simulation being 49.37% to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the simulation reduction. We compare the experimental results and discuss important characteristics related to the performance of the configurations.

翻訳日:2023-06-17 03:52:28 公開日:2023-06-14

# 浮揚光機械センサによる横軌道角運動量計測

Structured transverse orbital angular momentum probed by a levitated optomechanical sensor ( http://arxiv.org/abs/2209.09759v3 )

ライセンス: Link先を確認

Yanhui Hu, Jack J. Kingsley-Smith, Maryam Nikkhou, James A. Sabin, Francisco J. Rodríguez-Fortuño, Xiaohao Xu and James Millen

(参考訳) 構造化された光場によって運ばれる運動量は、様々な驚くべき特徴を示す。本研究では、平行で対向伝搬する2つの直線偏光集束ビームの干渉場において横軌道角運動量(TOAM)を生成し、固有のTOAMを持つ同一の掌性(ハンドネス)の渦の配列を合成する。光学浮揚されたシリコンナノロッドからなる光機械センサを用いて、この構造化された光場を探索する。ナノロッドの回転は光角運動量のプローブとなり、この光角運動量は非常に大きなトルクを発生させる。この単純なTOAMの生成と直接観察は、基礎物理学、物質の光学的操作、量子オプトメカニクスの研究に応用されるだろう。

The momentum carried by structured light fields exhibits a rich array of surprising features. In this work, we generate transverse orbital angular momentum (TOAM) in the interference field of two parallel and counterpropagating linearly-polarised focused beams, synthesising an array of identical handedness vortices carrying intrinsic TOAM. We explore this structured light field using an optomechanical sensor, consisting of an optically levitated silicon nanorod, whose rotation is a probe of the optical angular momentum, which generates an exceptionally large torque. This simple creation and direct observation of TOAM will have applications in studies of fundamental physics, the optical manipulation of matter and quantum optomechanics.

翻訳日:2023-06-17 03:45:47 公開日:2023-06-14

# 分類基準の分析と比較

Analysis and Comparison of Classification Metrics ( http://arxiv.org/abs/2209.05355v3 )

ライセンス: Link先を確認

Luciana Ferrer

(参考訳) 機械学習の文献では、分類システムの評価のためにさまざまな性能指標が一般的に使用されている。ハード決定の質を測る最も一般的な指標には、標準およびバランス精度、標準およびバランス誤り率、Fベータスコア、マシューズ相関係数(MCC)などがある。本稿では、これらおよび他の指標の定義をレビューし、あらゆる統計的学習の講義で紹介されるにもかかわらず機械学習の文献ではほとんど用いられない期待コスト(EC)と比較する。標準誤り率とバランス誤り率の両方がECの特別な場合であることを示す。さらに、ECとFスコアおよびMCCとの関係を示し、ECはこれらの従来の指標よりも優れており、よりエレガントで汎用的かつ直感的であり、統計学の基本原理に基づいていると主張する。上記の指標は、ハード決定の質を測定する。しかし、現代のほとんどの分類システムはクラスに対する連続スコアを出力し、我々はそれを直接評価したい場合がある。システムのスコアの質を測る指標には、ROC曲線下面積、等誤り率、クロスエントロピー、ブライアスコア、ベイズECまたはベイズリスクなどがある。最後の3つの指標は、適切なスコアリングルール(PSR)の期待値によって与えられる指標群の特別な場合である。これらの指標の背景にある理論を概観し、それらがシステムの出力する事後確率の質を測る最も原理的な方法であると主張する。最後に、これらの指標を用いてシステムのキャリブレーション損失を計算する方法を示し、この指標を標準的な期待キャリブレーション誤差(ECE)と比較して、PSRに基づくキャリブレーション損失は様々な理由からECEよりも優れていると主張する。

A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.

翻訳日:2023-06-17 03:44:57 公開日:2023-06-14
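
The abstract above states that the standard and balanced error rates are special cases of the expected cost (EC). A minimal sketch of that relationship follows, assuming the usual empirical form EC = Σ_i P_i Σ_j c_ij R_ij, where R_ij is the fraction of class-i samples assigned decision j. The function name `expected_cost`, the toy labels, and the 0-1 cost matrix are illustrative assumptions, not taken from the paper.

```python
def expected_cost(y_true, y_pred, costs, priors=None):
    """Empirical expected cost for integer class labels 0..n-1.

    costs[i][j] is the cost of deciding j when the true class is i.
    priors defaults to the empirical class frequencies (an assumption
    made here for illustration).
    """
    n = len(costs)
    counts = [[0] * n for _ in range(n)]
    for t, d in zip(y_true, y_pred):
        counts[t][d] += 1
    totals = [sum(row) for row in counts]
    if priors is None:
        priors = [tot / len(y_true) for tot in totals]
    ec = 0.0
    for i in range(n):
        if totals[i] == 0:
            continue  # a class with no samples contributes nothing
        for j in range(n):
            ec += priors[i] * costs[i][j] * counts[i][j] / totals[i]
    return ec


# Toy data (hypothetical): four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
zero_one = [[0, 1], [1, 0]]

# 0-1 costs with empirical priors give the standard error rate.
standard_error = expected_cost(y_true, y_pred, zero_one)
# 0-1 costs with uniform priors give the balanced error rate.
balanced_error = expected_cost(y_true, y_pred, zero_one, priors=[0.5, 0.5])
print(standard_error, balanced_error)
```

Changing `costs` or `priors` is how the same quantity can be adapted to applications where error types are not equally costly.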

# トランザクションからの重力:エントロピー重力プログラムの達成

Gravity from Transactions: Fulfilling the Entropic Gravity Program ( http://arxiv.org/abs/2209.04025v2 )

ライセンス: Link先を確認

A. Schlatter and R. E. Kastner

(参考訳) 本稿では、相対論的トランザクション解釈(RTI)の観点から、エントロピー重力の新たな展開を概観する。時空事象に対するトランザクション的アプローチは、(もともとエリック・ヴェルリンデが提唱した形で)エントロピー重力を自然に生じさせると同時に、その研究プログラムに対する既存の反論を克服する。この理論はまた、宇宙定数と修正ニュートン力学(MOND)を自然に生じさせ、歴史的に「暗黒エネルギー」と「暗黒物質」に帰せられてきた現象に物理的説明を与える。

This is a review of new developments in entropic gravity in light of the Relativistic Transactional Interpretation (RTI). A transactional approach to spacetime events can give rise in a natural way to entropic gravity (in the way originally proposed by Erik Verlinde) while also overcoming extant objections to that research program. The theory also naturally gives rise to a Cosmological Constant and to Modified Newtonian Dynamics (MOND) and thus provides a physical explanation for the phenomena historically attributed to "dark energy" and "dark matter".

翻訳日:2023-06-17 03:44:06 公開日:2023-06-14

# 多様な相互相関を持つ単段広帯域マルチラベル学習(BMIML)とその医用画像分類への応用

Single-Stage Broad Multi-Instance Multi-Label Learning (BMIML) with Diverse Inter-Correlations and its application to medical image classification ( http://arxiv.org/abs/2209.02625v2 )

ライセンス: Link先を確認

Qi Lai, Jianhang Zhou, Yanfen Gan, Chi-Man Vong, Deshuang Huang

(参考訳) 複数のインスタンス(画像パッチなど)によって記述され、同時に複数のラベルに関連付けられる。既存のMIMLメソッドは多くのアプリケーションで有用であるが、そのほとんどは以下の問題により精度と訓練効率が比較的低い。i) ラベル間相関(すなわち、オブジェクトに対応する複数のラベル間の確率的相関)が無視されている。ii) インスタンス間相関(すなわち、オブジェクトラベルの予測における異なるインスタンスの確率的相関)は、インスタンスラベルが欠けているため、他の種類の相関と直接(または同時に)学習することができない。iii) 多様な相互相関(例えば、ラベル間相関、インスタンス間相関)は、複数の段階でしか学習できない。これらの問題を解決するために、broad multi-instance multi-label learning (BMIML) と呼ばれる新しい単一ステージのフレームワークを提案する。BMIMLには3つの革新的なモジュールがある。i) broad learning system (BLS) に基づくauto-weighted label enhancement learning (AWLEL) を設計し、従来のBLSでは不可能なラベル間相関の取得を同時かつ効率的に行う。ii) scalable multi-instance probabilistic regression (SMIPR) と呼ばれるMIML専用のニューラルネットワークを構築し、オブジェクトラベルのみを用いてインスタンス間相関を効果的に推定し、学習のための追加の確率情報を提供する。iii) 最後に、interactive decision optimization (IDO) を設計し、AWLELとSMIPRの結果を組み合わせて最適化し、単一ステージのフレームワークを形成する。実験の結果、BMIMLは精度において既存の手法と同等かそれ以上であり、大規模な医用画像データセット(>90K画像)であってもほとんどのMIML法よりはるかに高速であることがわかった。

described by multiple instances (e.g., image patches) and simultaneously associated with multiple labels. Existing MIML methods are useful in many applications but most of which suffer from relatively low accuracy and training efficiency due to several issues: i) the inter-label correlations (i.e., the probabilistic correlations between the multiple labels corresponding to an object) are neglected; ii) the inter-instance correlations (i.e., the probabilistic correlations of different instances in predicting the object label) cannot be learned directly (or jointly) with other types of correlations due to the missing instance labels; iii) diverse inter-correlations (e.g., inter-label correlations, inter-instance correlations) can only be learned in multiple stages. To resolve these issues, a new single-stage framework called broad multi-instance multi-label learning (BMIML) is proposed. In BMIML, there are three innovative modules: i) an auto-weighted label enhancement learning (AWLEL) based on broad learning system (BLS) is designed, which simultaneously and efficiently captures the inter-label correlations while traditional BLS cannot; ii) a specific MIML neural network called scalable multi-instance probabilistic regression (SMIPR) is constructed to effectively estimate the inter-instance correlations using the object label only, which can provide additional probabilistic information for learning; iii) finally, an interactive decision optimization (IDO) is designed to combine and optimize the results from AWLEL and SMIPR and form a single-stage framework. Experiments show that BMIML is highly competitive to (or even better than) existing methods in accuracy and much faster than most MIML methods even for large medical image data sets (> 90K images).

翻訳日:2023-06-17 03:43:56 公開日:2023-06-14

# 量子領域におけるランダムアクセスコードの2つの事例

Two instances of random access code in the quantum regime ( http://arxiv.org/abs/2208.14422v3 )

ライセンス: Link先を確認

Nitica Sakharwade, Michał Studziński, Michał Eckstein, and Paweł Horodecki

(参考訳) 我々は、ランダムアクセスコード(RAC)の量子的一般化の2つのクラスを検討し、そのようなタスクの成功確率の下界を研究する。これは、制約のあるリソースを用いた特定の情報処理タスクの研究に有用なフレームワークを提供する。最初のクラスは、量子入力と量子出力を持つランダムアクセスコードに基づいており、No-Signalling Quantum RAC (NS-QRAC) [A. Grudka et al. Phys. Rev. A 92, 052312 (2015)] として知られる。ここでは無制限のエンタングルメントと制約付きの古典通信が許されており、制約付き古典通信による量子テレポーテーションと見なすことができ、これに対して量子下界を与える。NS-QRACシナリオの2つの修正について検討する。1つ目は無制限のエンタングルメントと制約付きの量子通信が許される場合、2つ目は有界なエンタングルメントと制約のない古典通信が許される場合であり、そこでは伝送忠実度に対するモノガミー関係が見出される。これは通常の通信方式とは対照的に、複数の送信者と1人の受信者が関与するものである。これらのシナリオに対して下界を与える。第2のクラスは、量子チャネルと共有エンタングルメントを持つランダムアクセスコード [A. Tavakoli et al. PRX Quantum 2 (4) 040357 (2021)] に基づいている。$d$ 進数の2桁からなる2つの入力を、quditと最大エンタングル状態に符号化するタスクの集合を調べる。これは制約付き量子通信による量子稠密符号化と見なすことができ、$d=2,3,4$ に対する量子下界を与える。用いる符号化はグレイコードを利用する。

We consider two classes of quantum generalisations of Random Access Code (RAC) and study lower bounds for probabilities of success for such tasks. It provides a useful framework for the study of certain information processing tasks with constrained resources. The first class is based on a random access code with quantum inputs and output known as No-Signalling Quantum RAC (NS-QRAC) [A. Grudka et al. Phys. Rev. A 92, 052312 (2015)], where unbounded entanglement and constrained classical communication are allowed, which can be seen as quantum teleportation with constrained classical communication, for which we provide a quantum lower bound. We consider two modifications to the NS-QRAC scenario, first where unbounded entanglement and constrained quantum communication is allowed and, second where bounded entanglement and unconstrained classical communication are allowed, where we find a monogamy relation for the transmission fidelities, which -- in contrast to the usual communication schemes -- involves multiple senders and a single receiver. We provide lower bounds for these scenarios. The second class is based on a random access code with a quantum channel and shared entanglement [A. Tavakoli et al. PRX Quantum 2 (4) 040357 (2021)]. We study the set of tasks where two inputs made of two digits of $d$-base are encoded over a qudit and a maximally entangled state, which can be seen as quantum dense coding with constrained quantum communication, for which we provide quantum lower bounds for $d=2,3,4$. The encoding employed utilises Gray codes.

翻訳日:2023-06-17 03:43:07 公開日:2023-06-14
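
The abstract above notes that the encoding uses Gray codes for inputs made of two base-$d$ digits. The abstract does not specify the construction, so the following is only a generic sketch of a modular base-$d$ Gray code (consecutive words differ in exactly one digit); the function names are hypothetical and this is not presented as the authors' encoding.

```python
def modular_gray_code(value, base, n_digits):
    """Return the modular base-`base` Gray code word of `value`,
    most significant digit first."""
    # Plain positional digits, least significant digit first.
    digits = []
    for _ in range(n_digits):
        digits.append(value % base)
        value //= base
    gray = [0] * n_digits
    shift = 0
    # Walk from the most significant digit down; each digit is shifted
    # by minus the sum (mod base) of the Gray digits already emitted.
    for i in reversed(range(n_digits)):
        gray[i] = (digits[i] + shift) % base
        shift += base - gray[i]
    return gray[::-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def two_digit_gray_sequence(d):
    """All d*d two-digit words in Gray order (illustrative helper)."""
    return [modular_gray_code(v, d, 2) for v in range(d * d)]


for d in (2, 3, 4):
    print(d, two_digit_gray_sequence(d))
```

For $d=2$ this reduces to the familiar binary reflected ordering 00, 01, 11, 10.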

# Perspective-1-Ellipsoid: 1つの楕円-楕円体対応からのカメラポーズ推定問題の定式化、解析、および解法

Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence ( http://arxiv.org/abs/2208.12513v3 )

ライセンス: Link先を確認

Vincent Gaudillière, Gilles Simon, Marie-Odile Berger

(参考訳) コンピュータビジョンでは、3次元幾何学的実体とその画像への投影との対応からのカメラポーズ推定が広く研究されてきた。多くの最先端手法は点や線のような低レベルのプリミティブを利用するが、近年の非常に効果的なCNNベースの物体検出器の出現は、意味的に有意な情報を持つ高レベルな特徴の利用への道を開いた。この方向の先駆的研究は、3Dオブジェクトを楕円体で、2D検出を楕円でモデル化することが、2Dデータと3Dデータを結び付ける便利な方法であることを示した。しかし、関連文献で最もよく使われる数学的形式では、楕円体や楕円を他の二次曲面や円錐曲線と容易に区別することができず、一部の展開において有害となりうる特異性の喪失につながる。さらに、投影方程式の線形化過程はカメラパラメータの過剰表現を生み、効率の低下を引き起こす可能性もある。そこで本稿では、楕円体に特化した理論的枠組みを導入し、ポーズ推定の文脈におけるその有益な性質を示す。より正確には、まず、提案する形式によって、ポーズ推定問題を位置のみ、または向きのみの推定問題に還元でき、残りの未知数は閉形式で導出できることを示す。次に、これを1自由度 (1DoF) 問題にさらに還元できることを示し、その唯一のスカラー未知数の関数としてポーズの解析的導出を与える。視覚的な例で理論的考察を例示し、実用的側面についても議論する。最後に、楕円体に関連するポーズ推定問題のより効率的な解決に貢献するため、対応するソースコードとともに本論文を公開する。

In computer vision, camera pose estimation from correspondences between 3D geometric entities and their projections into the image has been a widely investigated problem. Although most state-of-the-art methods exploit low-level primitives such as points or lines, the emergence of very effective CNN-based object detectors in recent years has paved the way to the use of higher-level features carrying semantically meaningful information. Pioneering works in that direction have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient manner to link 2D and 3D data. However, the mathematical formalism most often used in the related literature does not make it easy to distinguish ellipsoids and ellipses from other quadrics and conics, leading to a loss of specificity potentially detrimental in some developments. Moreover, the linearization process of the projection equation creates an over-representation of the camera parameters, also possibly causing an efficiency loss. In this paper, we therefore introduce an ellipsoid-specific theoretical framework and demonstrate its beneficial properties in the context of pose estimation. More precisely, we first show that the proposed formalism makes it possible to reduce the pose estimation problem to a position or orientation-only estimation problem in which the remaining unknowns can be derived in closed-form. Then, we demonstrate that it can be further reduced to a 1 Degree-of-Freedom (1DoF) problem and provide the analytical derivations of the pose as a function of that unique scalar unknown. We illustrate our theoretical considerations by visual examples and include a discussion on the practical aspects. Finally, we release this paper along with the corresponding source code in order to contribute towards more efficient resolutions of ellipsoid-related pose estimation problems.

翻訳日:2023-06-17 03:42:17 公開日:2023-06-14

# 量子コンピュータを用いたネットワークにおける感染拡大のシミュレーション

Simulating the Spread of Infection in Networks with Quantum Computers ( http://arxiv.org/abs/2208.11394v2 )

ライセンス: Link先を確認

Xiaoyang Wang and Yinchenguang Lyu and Changyu Yao and Xiao Yuan

(参考訳) 本稿では、量子コンピュータを用いてネットワークにおける感染拡大をシミュレーションすることを提案する。まず、感染分布と、Ising型相互作用を持つスピン格子配置との類似性を示す。次に、拡散過程は古典的マルコフ過程としてモデル化できるので、パラメータ化されたハミルトニアンを持つ量子熱力学モデルの時間発展を用いて拡散過程をシミュレートできることを示す。特に、ハミルトニアンの時間発展の挙動を解析的および数値的に解析し、その時間発展が、よく知られた疫学の確率的SI(susceptible and infectious)モデルを記述する古典的マルコフ過程をシミュレートすることを証明する。疫学的な入力から熱力学ハミルトニアンのパラメータを決定するための実用的な方法を示す。例として、スモールワールドネットワークにおけるSARS-CoV-2変異株Omicronの感染拡大過程をシミュレーションする。

We propose to use quantum computers to simulate infection spreading in networks. We first show the analogy between the infection distribution and spin-lattice configurations with Ising-type interactions. Then, since the spreading process can be modeled as a classical Markovian process, we show that the spreading process can be simulated using the evolution of a quantum thermal dynamic model with a parameterized Hamiltonian. In particular, we analytically and numerically analyze the evolution behavior of the Hamiltonian, and prove that the evolution simulates a classical Markovian process, which describes the well-known epidemiological stochastic susceptible and infectious (SI) model. A practical method to determine the parameters of the thermal dynamic Hamiltonian from epidemiological inputs is exhibited. As an example, we simulate the infection spreading process of the SARS-CoV-2 variant Omicron in a small-world network.

翻訳日:2023-06-17 03:41:46 公開日:2023-06-14
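
For orientation, the classical process that the abstract above refers to (a stochastic susceptible-infectious model on a small-world network) can be sketched as a discrete-time Markov chain. This is a purely classical illustration, not the quantum algorithm of the paper. The Watts-Strogatz-style graph construction, the per-step infection probability `beta`, and all parameter values are assumptions chosen for the example.

```python
import random


def small_world_graph(n, k, p, rng):
    """Ring lattice with k neighbours per node; each edge is rewired
    to a random endpoint with probability p (Watts-Strogatz style)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            if rng.random() < p:
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if candidates:
                    j = rng.choice(candidates)
            adj[i].add(j)
            adj[j].add(i)
    return adj


def simulate_si(adj, beta, steps, seed_node, rng):
    """Discrete-time SI dynamics: at each step, every infected node
    infects each susceptible neighbour independently with probability
    beta. Returns the number of infected nodes after each step."""
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected
        history.append(len(infected))
    return history


rng = random.Random(0)
graph = small_world_graph(n=50, k=4, p=0.1, rng=rng)
print(simulate_si(graph, beta=0.3, steps=15, seed_node=0, rng=rng))
```

Because infected nodes never recover in the SI model, the infected count is non-decreasing, and the next state depends only on the current infected set, which is the Markov property the abstract relies on.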

# MAMO:細粒度視覚言語表現学習のためのマスク付きマルチモーダルモデリング

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning ( http://arxiv.org/abs/2210.04183v3 )

ライセンス: Link先を確認

Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu

(参考訳) マルチモーダル表現学習は様々な視覚言語タスクにおいて有望な改善を示している。既存のほとんどの手法は、視覚と言語の間のグローバルレベルのアライメントの構築には優れているが、効果的な細粒度の画像とテキストの相互作用を欠いている。本稿では、細粒度のマルチモーダル表現を学習するための、共同マスク型マルチモーダルモデリング手法を提案する。本手法は、画像テキスト入力に対して共同マスキングを行い、マスクされた信号が復元すべき暗黙的ターゲットと明示的ターゲットの両方を統合する。暗黙的ターゲットは視覚と言語に対する統一的でバイアスのない目的を与え、そこでモデルはマスクされていない入力の潜在マルチモーダル表現を予測する。明示的ターゲットは、画像パッチのモメンタム視覚特徴や単語トークンの概念といった高レベルで意味のある情報を復元することで、マルチモーダル表現をさらに強化する。このようなマスク付きモデリングプロセスを通じて、我々のモデルは細粒度のマルチモーダル相互作用を学習するだけでなく、高レベルの表現と低・中レベルの予測ターゲット(画像ピクセルなど)との間のセマンティックギャップを回避し、ゼロショットと微調整の両方の設定でうまく機能する意味的に豊かなマルチモーダル表現を生成する。我々の事前学習モデル(MAMO)は、画像テキスト検索、視覚的質問応答、視覚的推論、弱教師付き視覚グラウンディングなど、様々な下流の視覚言語タスクにおいて最先端の性能を達成する。

Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens. Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.

翻訳日:2023-06-17 03:36:23 公開日:2023-06-14

# RGB-Dパノプティブセグメンテーションのためのロバスト二重エンコーダネットワーク

Robust Double-Encoder Network for RGB-D Panoptic Segmentation ( http://arxiv.org/abs/2210.02834v2 )

ライセンス: Link先を確認

Matteo Sodano, Federico Magistri, Tiziano Guadagnino, Jens Behley, Cyrill Stachniss

(参考訳) 知覚は、現実の環境で行動するロボットにとって不可欠である。自律システムが適切に行動するには、周囲の世界を見て理解する必要があるからだ。パノプティックセグメンテーションは、ピクセル単位のセマンティックラベルをインスタンスIDと共に計算することでシーンの解釈を提供する。本稿では、室内シーンのRGB-Dデータを用いたパノプティックセグメンテーションに取り組む。2つのエンコーダを通してRGBと深度を別々に処理する新しいエンコーダ・デコーダ型ニューラルネットワークを提案する。個々のエンコーダの特徴は異なる解像度で段階的にマージされ、RGBの特徴は相補的な深度情報を用いて強化される。特徴マップの各エントリをその重要度に応じて再重み付けする、ResidualExciteと呼ばれる新しいマージ手法を提案する。ダブルエンコーダアーキテクチャにより、本手法は手がかりの欠落に対して頑健である。特に、同じモデルが、特化したモデルを訓練することなく、RGB-D、RGBのみ、深度のみの入力データで学習および推論を行える。提案手法を公開データセット上で評価し、他の一般的なパノプティックセグメンテーション手法と比較して優れた結果が得られることを示す。

Perception is crucial for robots that act in real-world environments, as autonomous systems need to see and understand the world around them to act properly. Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs. In this paper, we address panoptic segmentation using RGB-D data of indoor scenes. We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders. The features of the individual encoders are progressively merged at different resolutions, such that the RGB features are enhanced using complementary depth information. We propose a novel merging approach called ResidualExcite, which reweighs each entry of the feature map according to its importance. With our double-encoder architecture, we are robust to missing cues. In particular, the same model can train and infer on RGB-D, RGB-only, and depth-only input data, without the need to train specialized models. We evaluate our method on publicly available datasets and show that our approach achieves superior results compared to other common approaches for panoptic segmentation.

翻訳日:2023-06-17 03:35:29 公開日:2023-06-14

# 時間変化する重要度重みを用いたデータドリフト下での学習

Learning under Data Drift with Time-Varying Importance Weights ( http://arxiv.org/abs/2210.01422v3 )

ライセンス: Link先を確認

Rasool Fakoor and Jonas Mueller and Zachary C. Lipton and Pratik Chaudhari and Alexander J. Smola

(参考訳) データが時間とともに進化するため、機械学習モデルの現実世界でのデプロイメントは難しい。データが任意の方法で進化する場合に機能するモデルは存在しないが、これらの変化に何らかのパターンがある場合、それに対処する手法を設計できるかもしれない。本稿では,データが徐々に進化する状況に対処する。データ分布の段階的な変化を検出できる時間変化傾向スコアを導入し、これにより過去のデータを選択的にサンプリングしてモデルを更新できる。標準的な傾向スコアのように過去の類似データだけでなく、過去に同様の形で進化したデータもサンプリングの対象となる。時間変化傾向スコアは非常に一般的であり,その実装方法を複数示すとともに,データが段階的な変化を連続的に受ける教師付き学習(画像分類問題など)から,方針やタスクの変化に伴ってデータがシフトする強化学習タスク(ロボット操作や連続制御など)まで,さまざまな問題で評価する。

Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.

翻訳日:2023-06-17 03:35:11 公開日:2023-06-14

# 複数のスケールでの位相特異性検出

Topological Singularity Detection at Multiple Scales ( http://arxiv.org/abs/2210.00069v4 )

ライセンス: Link先を確認

Julius von Rohrscheidt and Bastian Rieck

(参考訳) データが低い内在次元を持つ未知の多様体上またはその近くにあると仮定する多様体仮説は、現代の機械学習研究の定番である。しかし、最近の研究により、実世界のデータは、誤った知見につながりうる明確な非多様体構造、すなわち特異点を示すことが示されている。したがって、このような特異点の検出は、補間および推論タスクの前段階として極めて重要である。この問題に対処するために、我々は (i) 局所的な内在次元を定量化し、 (ii) 複数のスケールにわたって点の「多様体らしさ」を評価するためのユークリディシティ(Euclidicity)スコアを与えるトポロジカルな枠組みを開発する。本手法は,複雑な空間の特異点を同定するとともに,画像データにおける特異構造や局所的な幾何学的複雑性も捉える。

The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

翻訳日:2023-06-17 03:34:27 公開日:2023-06-14

# truncated-cumulant trajectoriesによる開量子スピン格子の量子および古典的相関

Quantum and classical correlations in open quantum-spin lattices via truncated-cumulant trajectories ( http://arxiv.org/abs/2209.13377v4 )

ライセンス: Link先を確認

Wouter Verstraelen and Dolf Huybrechts and Tommaso Roscilde and Michiel Wouters

(参考訳) リウヴィリアン開量子系における量子多体物理学の研究は、散逸系に対する実験的制御の最近の進展と、その技術的利用によってますます重要になっている。開量子系における中心的な問題は、量子相関の運命と、ハミルトン力学と浴との結合の競合を設計することで量子相関を制御できる可能性に関するものである。このような問題は理論的観点からは難しい。量子相関を忠実に説明する数値的手法は、扱える系のサイズを劇的に制限する厳密対角化に依存するか、密度行列に対する特定のアンザッツの選択に伴って量子相関の範囲や強度を近似するかのいずれかだからである。本研究では,開放系力学の解に対する確率的量子軌道に基づいて,開量子スピン格子を扱う新しい手法を提案する。各軌道に沿って、多点スピン-スピン相関関数の運動方程式の階層は、カットオフ$k_c$を超える$k$に対して多変量の$k$次キュムラントが消えると仮定して、与えられた有限次数で打ち切られる。これにより、全ての長さスケールに対して、$k_c$次までの量子スピン-スピン相関の時間発展を追跡することができる。自発的崩壊を受ける散逸2次元XYZ格子の相転移という典型的な場合において、このアプローチを検証する。ハミルトニアン結合の1つを増加させると、常磁性から強磁性へ、そして再び常磁性へと移る定常状態の相転移が存在すること、またそれらが古典的イジング的性質を持つことを説得力をもって示す。さらに, このアプローチにより, 散逸臨界点近傍に有意な量子相関が存在することを示し, 量子フィッシャー情報のタイトな下界であるスピンスクイーズの存在を明らかにすることができる。

The study of quantum many-body physics in Liouvillian open quantum systems becomes increasingly important with the recent progress in experimental control on dissipative systems and their technological exploitation. A central question in open quantum systems concerns the fate of quantum correlations, and the possibility of controlling them by engineering the competition between the Hamiltonian dynamics and the coupling to a bath. Such a question is challenging from a theoretical point of view, as numerical methods faithfully accounting for quantum correlations either rely on exact diagonalization, drastically limiting the sizes that can be treated, or on approximations on the range or strength of quantum correlations, associated with the choice of a specific Ansatz for the density matrix. In this work we propose a new method to treat open quantum-spin lattices, based on stochastic quantum trajectories for the solution of the open-system dynamics. Along each trajectory, the hierarchy of equations of motion for many-point spin-spin correlators is truncated to a given finite order, assuming that multivariate $k$-th order cumulants vanish for $k$ exceeding a cutoff $k_c$. This allows tracking the evolution of quantum spin-spin correlations up to order $k_c$ for all length scales. We validate this approach in the paradigmatic case of the phase transitions of the dissipative 2D XYZ lattice, subject to spontaneous decay. We convincingly assess the existence of steady-state phase transitions from paramagnetic to ferromagnetic, and back to paramagnetic, upon increasing one of the Hamiltonian couplings; as well as their classical Ising nature. Moreover, the approach allows us to show the presence of significant quantum correlations in the vicinity of the dissipative critical point, and to unveil the presence of spin squeezing, a tight lower bound to the quantum Fisher information.
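
The truncation step described above can be made concrete for the lowest nontrivial cutoff. The following is only an illustrative sketch for $k_c = 2$, where third-order cumulants are set to zero; the symbols $A$, $B$, $C$ are placeholder commuting spin operators (for example on distinct sites) and are not notation taken from the paper.

```latex
% Illustrative case k_c = 2: third-order cumulants are assumed to vanish.
% A, B, C are placeholder commuting spin operators (assumption).
\begin{align}
\langle ABC \rangle_c
  &= \langle ABC \rangle
   - \langle AB \rangle \langle C \rangle
   - \langle AC \rangle \langle B \rangle
   - \langle BC \rangle \langle A \rangle
   + 2 \langle A \rangle \langle B \rangle \langle C \rangle = 0 \\
\Rightarrow \quad
\langle ABC \rangle
  &\approx \langle AB \rangle \langle C \rangle
   + \langle AC \rangle \langle B \rangle
   + \langle BC \rangle \langle A \rangle
   - 2 \langle A \rangle \langle B \rangle \langle C \rangle
\end{align}
```

With this closure, three-point correlators appearing in the equations of motion of two-point correlators are expressed through lower-order ones, so the hierarchy closes at second order.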

翻訳日:2023-06-17 03:33:57 公開日:2023-06-14

# NISQデバイスとその先に向けた適応的かつエネルギー自己補正的なトロッター化

Making Trotterization adaptive and energy-self-correcting for NISQ devices and beyond ( http://arxiv.org/abs/2209.12653v2 )

ライセンス: Link先を確認

Hongzheng Zhao, Marin Bukov, Markus Heyl, and Roderich Moessner

(参考訳) 連続時間発展のシミュレーションは、古典コンピュータと量子コンピュータの両方で時間離散化を必要とする。より細かい時間ステップはシミュレーションの精度を向上させるが、必然的に計算コストが増加する。これは、今日のノイズの多い中間スケール量子コンピュータにとって特にコストがかかる。そこでは顕著なゲートの不完全性が、与えられた精度で実行可能な回路の深さを制限するからである。古典的な適応ソルバは、数値計算時間を節約するためによく開発されている。しかし、適応的な時間ステップによって利用可能な量子リソースを最適に利用することは、依然として未解決の課題である。本稿では,この問題を解決する量子アルゴリズムを導入し,局所観測量の量子多体ダイナミクスに対して制御された解を提供する。提案アルゴリズムの鍵となる概念要素は、時間ステップを適応させることでシミュレーション誤差を自己修正するフィードバックループであり、これにより従来のトロッタースキームを根本的なレベルで大幅に上回り、回路深さを減少させる。さらに、通常のトロッター化されたダイナミクスが困難に直面する、漸近的な長時間誤差の制御も可能にする。我々の量子アルゴリズムのもう1つの重要な利点は、任意の所望の保存則を自己修正フィードバックループに含めることができ、潜在的に幅広い適用可能性を持つことである。格子ゲージ理論の忠実な、そして長く求められてきた量子シミュレーションに不可欠なゲージ不変性を強制することによって、その能力を実証する。このアルゴリズムは、例えば時間発展ブロックデシメーション法に基づく数値的アプローチなど、時間離散化が関与する場合には、より一般的なレベルで有用である可能性がある。

Simulation of continuous time evolution requires time discretization on both classical and quantum computers. A finer time step improves simulation precision, but it inevitably leads to increased computational efforts. This is particularly costly for today's noisy intermediate scale quantum computers, where notable gate imperfections limit the circuit depth that can be executed at a given accuracy. Classical adaptive solvers are well-developed to save numerical computation times. However, it remains an outstanding challenge to make optimal usage of the available quantum resources by means of adaptive time steps. Here, we introduce a quantum algorithm to solve this problem, providing a controlled solution of the quantum many-body dynamics of local observables. The key conceptual element of our algorithm is a feedback loop which self-corrects the simulation errors by adapting time steps, thereby significantly outperforming conventional Trotter schemes on a fundamental level and reducing the circuit depth. It even allows for a controlled asymptotic long-time error, where usual Trotterized dynamics is facing difficulties. Another key advantage of our quantum algorithm is that any desired conservation law can be included in the self-correcting feedback loop, which has potentially a wide range of applicability. We demonstrate the capabilities by enforcing gauge invariance which is crucial for a faithful and long-sought quantum simulation of lattice gauge theories. Our algorithm can be potentially useful on a more general level whenever time discretization is involved concerning, for instance, also numerical approaches based on time-evolving block decimation methods.

翻訳日:2023-06-17 03:33:23 公開日:2023-06-14

# テキストマッチングレコメンデーションシステムのアウトオブディストリビューション一般化のための介入の利用

Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems ( http://arxiv.org/abs/2210.10636v2 )

ライセンス: Link先を確認

Parikshit Bansal, Yashoteja Prabhu, Emre Kiciman, Amit Sharma

(参考訳) ユーザの入力テキストが与えられた場合、テキストマッチングレコメンダシステムは、eコマースプラットフォームにおける製品間レコメンデーションのように、入力テキストと利用可能なアイテムの説明を比較して関連アイテムを出力する。ユーザの関心やアイテムの在庫は変化すると予想されるため、テキストマッチングシステムがデータシフトに対して一般化すること、すなわちout-of-distribution (OOD) 一般化と呼ばれるタスクが重要である。しかし、ペアのアイテム関連性データ(例えば、ユーザークリック)で大規模なベース言語モデルを微調整する一般的なアプローチは、OOD一般化にとって逆効果になりうることがわかった。製品レコメンデーションタスクでは、新しいカテゴリや将来の期間のアイテムを推薦する場合、微調整したモデルの精度はベースモデルよりも悪くなる。この一般化の失敗を説明するために、介入に基づく重要度指標を考える。この指標は、微調整されたモデルが擬似相関を捉え、2つのテキスト入力間の関連性を決定する因果的特徴を学習できていないことを示す。さらに、因果正則化の標準的な手法はこの設定には適用できない。画像の場合とは異なり、テキストマッチングタスクには普遍的に擬似的な特徴が存在しないためである(同じトークンが、マッチング対象のテキストによって擬似的にも因果的にもなりうる)。そこで,テキスト入力におけるOOD一般化については,特定の特徴に対する高い重要度スコアを避けるという、異なる目標を掲げる。そのために、モデルの関連度スコアに対する任意のトークンの因果効果がベースモデルと類似するように制約する、介入ベースの正則化器を使用する。Amazon製品データセットと3つの質問推薦データセットの結果から,提案する正則化器は,特にベースモデルが正確でない難しいシナリオにおいて,分布内評価とOOD評価の両方で一般化を改善することがわかる。

Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to the descriptions of available items, as in product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on Amazon product and three question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.

翻訳日:2023-06-17 03:25:43 公開日:2023-06-14

# Pareto Manifold Learning:シングルタスクモデルのアンサンブルを通じて複数のタスクに取り組む

Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models ( http://arxiv.org/abs/2210.09759v2 )

ライセンス: Link先を確認

Nikolaos Dimitriadis, Pascal Frossard, François Fleuret

(参考訳) MTL(Multi-Task Learning)では、タスクは、単一タスクで訓練したどのモデルよりも優れた解へ最適化を導くのではなく、互いに競合し、達成される性能を制限しあうことがある。すべてのタスクに最適な唯一の解が存在しないことが多いため、実践者はタスクの性能間のトレードオフをバランスさせ、パレートの意味での最適性に頼る必要がある。ほとんどのMTL手法は、この側面を完全に無視してパレートフロントを学習する代わりに最適化スキームによって事前に定義された1つの解を生成するか、多様だが離散的な解を生成する。最近のアプローチでは、ニューラルネットワークを介してパレートフロントをパラメータ化しており、トレードオフ空間から目的空間への複雑な写像につながっている。本稿では,パレートフロントがパラメータ空間における線形パラメータ化を許容すると推測し,重み空間におけるアンサンブル手法であるPareto Manifold Learningを提案する。我々のアプローチは,単一の訓練実行で連続的なパレートフロントを生成し,推論時に各タスクの性能を調整することを可能にする。画像分類から表形式データセット、シーン理解まで、マルチタスク学習ベンチマークの実験では、Pareto Manifold Learningが最先端の単一点アルゴリズムより優れており、多点ベースラインよりも優れたパレートパラメータ化を学習することを示す。

In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution superior to all its single-task trained counterparts. Since there is often no unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect and, instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose Pareto Manifold Learning, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, which allows modulating the performance on each task during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that Pareto Manifold Learning outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.
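
The idea of a linear parameterization in weight space can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each ensemble member is stored as a flat list of weights, and the names `interpolate_weights`, `theta_task_a`, and `theta_task_b` are hypothetical.

```python
from typing import List, Sequence


def interpolate_weights(
    members: Sequence[Sequence[float]], alphas: Sequence[float]
) -> List[float]:
    """Return the convex combination sum_t alpha_t * theta_t of member weights."""
    if len(members) != len(alphas):
        raise ValueError("need one coefficient per ensemble member")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("coefficients must lie on the probability simplex")
    size = len(members[0])
    return [
        sum(a * theta[i] for a, theta in zip(alphas, members))
        for i in range(size)
    ]


# Two hypothetical ensemble members, one per task (toy values).
theta_task_a = [1.0, 0.0, 2.0]
theta_task_b = [0.0, 4.0, 2.0]

# Equal trade-off between the two tasks at inference time.
balanced = interpolate_weights([theta_task_a, theta_task_b], [0.5, 0.5])
```

Changing the coefficient vector at inference time moves the resulting model along the line segment between the members, which is how a single training run could expose a continuum of trade-offs under the abstract's conjecture.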

翻訳日:2023-06-17 03:24:34 公開日:2023-06-14

# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム

PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v3 )

ライセンス: Link先を確認

Hao Xue and Flora D. Salim

(参考訳) 本稿では,時系列予測の新しい視点を提示する。既存の時系列予測手法では、モデルは数値の列を入力として受け取り、数値を出力として生成する。既存のSOTAモデルは主にTransformerアーキテクチャに基づいており、履歴データを取り巻く文脈とセマンティクスを組み込むために複数のエンコーディング機構で修正されている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを問う。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (PromptCast)を提案する。この新しいタスクでは、数値の入力と出力をプロンプトに変換し、予測タスクを文から文への形式で定式化することで、言語モデルを予測目的に直接適用することが可能になる。このタスクの研究を支援し促進するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)も提示する。さまざまなSOTAの数値ベース予測手法と言語生成モデルを評価する。様々な予測設定でのベンチマーク結果は、言語生成モデルを用いた提案手法PromptCastが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較して、PromptCastはゼロショット設定下ではるかに優れた一般化能力を示す。

This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.
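
To make the sentence-to-sentence framing concrete, the sketch below turns a short numeric history into an input prompt and parses a number back out of an answer sentence. The template wording and the helper names `series_to_prompt` and `answer_to_value` are assumptions made for illustration; they are not the templates used in the PISA dataset.

```python
from typing import Sequence


def series_to_prompt(values: Sequence[float], unit: str, target_step: str) -> str:
    """Turn a numeric history into a natural-language input prompt.

    The wording of the template is illustrative only.
    """
    history = ", ".join(f"{v:g}" for v in values)
    return (
        f"The last {len(values)} readings were {history} {unit}. "
        f"What will the reading be {target_step}?"
    )


def answer_to_value(answer: str) -> float:
    """Parse the first number found in a generated answer sentence."""
    for token in answer.replace(",", " ").split():
        cleaned = token.strip(".?!")
        try:
            return float(cleaned)
        except ValueError:
            continue
    raise ValueError("no numeric value found in answer")


prompt = series_to_prompt([21.5, 22.0, 22.4], "degrees", "tomorrow")
# A language model would generate the answer sentence; here it is hard-coded.
forecast = answer_to_value("The reading will be 22.9 degrees.")
```

In a real pipeline the hard-coded answer would be replaced by the output of a language generation model, and the parsed value would be scored against the ground truth with the usual numerical metrics.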

翻訳日:2023-06-17 03:24:08 公開日:2023-06-14

# 農業領域におけるジョイントセマンティクス,植物インスタンス,葉のインスタンスセグメンテーションの階層的アプローチ

Hierarchical Approach for Joint Semantic, Plant Instance, and Leaf Instance Segmentation in the Agricultural Domain ( http://arxiv.org/abs/2210.07879v2 )

ライセンス: Link先を確認

Gianmarco Roggiolani, Matteo Sodano, Tiziano Guadagnino, Federico Magistri, Jens Behley, Cyrill Stachniss

(参考訳) 植物表現型解析は、植物の成長段階、発達、その他の関連する量を記述するものであり、農業における中心的なタスクである。ロボットは、葉の数、葉面積、植物の大きさなどの植物形質を正確に推定することで、このプロセスの自動化を支援できる。本稿では,RGBデータから圃場の作物に対するセマンティックセグメンテーション、植物インスタンスセグメンテーション、葉インスタンスセグメンテーションを同時に行う問題に取り組む。3つのタスクの根底にある階層構造を活用して、これらを同時に処理する単一の畳み込みニューラルネットワークを提案する。タスク固有のスキップ接続を導入し,それが通常の方式よりも有益であることを実験的評価により示す。また,葉が重なり合うために農業領域でよく見られる、空間的に近接したインスタンスの問題に明示的に対処する,新しい自動後処理を提案する。我々のアーキテクチャは、農業の文脈においてこれらの問題に同時に取り組む。従来の研究は、植物または葉のセグメンテーションのいずれかに焦点を当てるか、セマンティックセグメンテーションに対する最適化を行っていない。結果から,本システムは最先端の手法と比べて優れた性能を持ちながら、パラメータ数が少なく,カメラのフレームレートで動作することがわかった。

Plant phenotyping is a central task in agriculture, as it describes plants' growth stage, development, and other relevant quantities. Robots can help automate this process by accurately estimating plant traits such as the number of leaves, leaf area, and the plant size. In this paper, we address the problem of joint semantic, plant instance, and leaf instance segmentation of crop fields from RGB data. We propose a single convolutional neural network that addresses the three tasks simultaneously, exploiting their underlying hierarchical structure. We introduce task-specific skip connections, which our experimental evaluation proves to be more beneficial than the usual schemes. We also propose a novel automatic post-processing, which explicitly addresses the problem of spatially close instances, common in the agricultural domain because of overlapping leaves. Our architecture tackles these problems jointly in the agricultural context. Previous works either focus on plant or leaf segmentation, or do not optimise for semantic segmentation. Results show that our system has superior performance compared to state-of-the-art approaches, while having fewer parameters and operating at camera frame rate.

翻訳日:2023-06-17 03:22:56 公開日:2023-06-14

# CORL: 研究指向の深層オフライン強化学習ライブラリ

CORL: Research-oriented Deep Offline Reinforcement Learning Library ( http://arxiv.org/abs/2210.07105v3 )

ライセンス: Link先を確認

Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov

(参考訳) CORLはオープンソースのライブラリで、深層オフライン強化学習アルゴリズムとオフラインからオンラインへの強化学習アルゴリズムの両方について、徹底的にベンチマークされた単一ファイルの実装を提供する。分かりやすいコードベースと現代的な分析追跡ツールによる、シンプルな開発体験を重視している。CORLでは、手法の実装を個別の単一ファイルに分離し、性能に関わる詳細を認識しやすくしている。さらに、メトリクス、ハイパーパラメータ、依存関係などをクラウドに記録するのに役立つ実験追跡機能も利用できる。最後に、一般的に用いられるD4RLデータセットでベンチマークすることで実装の信頼性を確保し、性能プロファイル、改善確率、期待オンライン性能などの頑健な評価ツールに再利用できる、透明性のある結果のソースを提供している。

CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking on commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.

翻訳日:2023-06-17 03:22:37 公開日:2023-06-14

# FP-Diffusion: 基礎となるスコアFokker-Planck方程式の強制によるスコアベース拡散モデルの改善

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation ( http://arxiv.org/abs/2210.04296v4 )

ライセンス: Link先を確認

Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

(参考訳) スコアベース生成モデル(SGM)は, 次第に大きくなるノイズで摂動されたデータ密度に対応するノイズ条件付きスコア関数の族を学習する。これらの摂動データ密度は、拡散過程を経る密度の時空間発展を支配する偏微分方程式(PDE)であるフォッカー・プランク方程式(FPE)によって結びつけられている。本研究では,摂動データ密度のノイズ条件付きスコア(すなわちその勾配)を特徴付ける、スコアFPEと呼ばれる対応する方程式を導出する。驚くべきことに、印象的な経験的性能にもかかわらず、DSM(denoising score matching)によって学習されたスコアは、真のスコアに固有の自己整合性であるスコアFPEを満たさないことが観察された。スコアFPEを満たすことは、尤度と保存性の度合いを改善するため望ましいことを証明する。そこで,スコアFPEを満たすことを強制するようにDSM目的関数を正則化することを提案し,様々なデータセットでこのアプローチの有効性を示す。

Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.
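
For readers who want the equations behind this description, the block below sketches how a score equation follows from the Fokker-Planck equation. It assumes a forward SDE of the form $dx = f(x,t)\,dt + g(t)\,dw$ with marginal density $p_t$; the symbols $f$, $g$, $p_t$, and $s$ are notation chosen here and do not appear in the abstract.

```latex
% Assumed forward SDE: dx = f(x,t) dt + g(t) dw, with marginal density p_t(x).
\begin{align}
\partial_t p_t
  &= -\nabla_x \cdot \big( f\, p_t \big) + \tfrac{1}{2} g^2(t)\, \Delta_x p_t
  && \text{(FPE)} \\
s(x,t) &:= \nabla_x \log p_t(x)
  && \text{(score)} \\
\partial_t s
  &= \nabla_x \Big[ \tfrac{1}{2} g^2(t)\, \nabla_x \cdot s
   + \tfrac{1}{2} g^2(t)\, \lVert s \rVert_2^2
   - \langle f, s \rangle - \nabla_x \cdot f \Big]
  && \text{(score FPE)}
\end{align}
```

The last line is obtained by dividing the FPE by $p_t$, using $\Delta_x p_t / p_t = \Delta_x \log p_t + \lVert \nabla_x \log p_t \rVert_2^2$, and then taking the gradient in $x$. A regularizer of the kind the abstract describes would penalize the residual of this equation for the learned score.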

翻訳日:2023-06-17 03:22:23 公開日:2023-06-14

# 限られたデータからの重力流再構成のための物理インフォームドニューラルネットワーク

Physics-informed neural networks for gravity currents reconstruction from limited data ( http://arxiv.org/abs/2211.09715v2 )

ライセンス: Link先を確認

Mickaël Delcey, Yoann Cheny, Sébastien Kiesgen de Richter

(参考訳) 本研究では, 限られたデータからの非定常重力流の3次元再構成に対する物理インフォームドニューラルネットワーク(PINN)の利用について検討する。PINNの文脈では、目的関数がネットワーク予測と観測データとの不一致にペナルティを課し、かつ自動微分を用いて支配方程式を埋め込むニューラルネットワークを訓練することにより、流れ場を再構成する。本研究は、標準的なロック交換配置の高忠実度数値実験に基づいている。これにより、密度と速度に関する最先端の実験計測技術を模倣した複数のトレーニングデータベース上で、PINNの再構成能力を定量的にベンチマークすることができる。特に、光減衰法(LAT)による空間平均密度測定をトレーニング手順に用いる。PINNによる流れ再構成のための最適な実験セットアップを、実装の複雑さと推定された場の精度という2つの基準に従って提案する。

The present work investigates the use of physics-informed neural networks (PINNs) for the 3D reconstruction of unsteady gravity currents from limited data. In the PINN context, the flow fields are reconstructed by training a neural network whose objective function penalizes the mismatch between the network predictions and the observed data and embeds the underlying equations using automatic differentiation. This study relies on a high-fidelity numerical experiment of the canonical lock-exchange configuration. This allows us to benchmark quantitatively the reconstruction capabilities of PINNs on several training databases that mimic state-of-the-art experimental measurement techniques for density and velocity. Notably, spatially averaged density measurements by the light attenuation technique (LAT) are employed for the training procedure. An optimal experimental setup for flow reconstruction by PINNs is proposed according to two criteria: the implementation complexity and the accuracy of the inferred fields.
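
The objective function described above has a generic two-term form, sketched below. The symbols are assumptions introduced for illustration: $u_\theta$ is the network output, $u_i^{\mathrm{obs}}$ are observations, $\mathcal{N}[\cdot]$ stands for the residual of the governing equations, and $\lambda$ is a weighting factor; the abstract does not specify the exact loss used.

```latex
% Generic PINN objective (symbols are illustrative assumptions).
\begin{equation}
\mathcal{L}(\theta)
  = \underbrace{\frac{1}{N_d} \sum_{i=1}^{N_d}
      \big\lVert u_\theta(x_i, t_i) - u_i^{\mathrm{obs}} \big\rVert^2}_{\text{data mismatch}}
  + \lambda\,
    \underbrace{\frac{1}{N_r} \sum_{j=1}^{N_r}
      \big\lVert \mathcal{N}[u_\theta](x_j, t_j) \big\rVert^2}_{\text{equation residual}}
\end{equation}
```

When the observations are spatially averaged, as with light attenuation measurements, the data term would compare the correspondingly averaged network prediction with the measurement rather than pointwise values.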

翻訳日:2023-06-17 03:17:00 公開日:2023-06-14

# 因子化階層型変分オートエンコーダにおけるコントラスト学習を用いた分離音声表現の改善

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder ( http://arxiv.org/abs/2211.08191v2 )

ライセンス: Link先を確認

Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

(参考訳) 話者のアイデンティティと発話内容が異なる時間スケールで変化するという事実を活用し、因子化階層型変分オートエンコーダ(FHVAE)は、これら2つの属性を表現するために異なる潜在変数を使用する。これらの属性の分離は、対応する潜在変数に異なる事前分布を設定することによって行われる。話者アイデンティティ変数の事前分布について、FHVAEは、発話単位で変化する平均と固定の分散を持つガウス分布であると仮定する。小さな固定分散を設定することにより、トレーニング過程は、1つの発話内のアイデンティティ変数が事前分布の平均の近くに集まることを促す。しかし、事前分布の平均は発話ごとに変化するため、この制約は比較的弱い。そこで,本研究では,FHVAEフレームワークにコントラスト学習を導入し、話者アイデンティティ変数が、同じ話者を表す場合には集まり、他の話者のものからは可能な限り離れるようにする。本研究で変更したのはモデル構造ではなくトレーニング過程のみであるため、テスト時に追加のコストは必要ない。本論文では応用例として音声変換を選択した。潜在変数の評価には、話者アイデンティティ変数に対する話者検証と話者識別、および内容変数に対する音声認識が含まれる。さらに, 音声変換性能の評価は、偽音声検出実験に基づいて行う。その結果,提案手法はFHVAEと比較して話者アイデンティティと内容の特徴抽出の両方を改善し,変換においてベースラインよりも優れた性能を示した。

Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to represent these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes it is a Gaussian distribution with an utterance-scale varying mean and a fixed variance. By setting a small fixed variance, the training process encourages the identity variables within one utterance to gather close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, so that speaker identity variables cluster together when they represent the same speaker, while distancing themselves as far as possible from those of other speakers. Only the training process has been changed in this work, not the model structure, so no additional cost is needed during testing. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, assessments of voice conversion performance are based on fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and has better performance than the baseline on conversion.

翻訳日:2023-06-17 03:16:18 公開日:2023-06-14

# ディープラーニングのための方向性プライバシ

Directional Privacy for Deep Learning ( http://arxiv.org/abs/2211.04686v2 )

ライセンス: Link先を確認

Pedro Faustini, Natasha Fernandes, Shakila Tonni, Annabelle McIver, Mark Dras

(参考訳) Differentially Private Stochastic Gradient Descent (DP-SGD)は、ディープラーニングモデルのトレーニングにプライバシーを適用するための重要な手法である。これはトレーニング中の勾配に等方性ガウスノイズを加えるもので、勾配を任意の方向に摂動させる可能性があり、有用性を損なう。一方、メトリックDPは、有用性の保持により適している可能性のある、任意の距離に基づく代替メカニズムを提供できる。本稿では,von Mises-Fisher (VMF) 分布に基づくメカニズムを通じて方向性プライバシ(directional privacy)を適用し,勾配方向が概ね保存されるように、角距離(angular distance)の観点で勾配を摂動させる。これが、ガウスメカニズムの$(\epsilon, \delta)$-プライバシではなく、深層学習のトレーニングに対して$\epsilon$-DPと$\epsilon d$-プライバシの両方を提供することを示す。また、$\epsilon d$-プライバシの保証は$\delta>0$の項を必要とせず、入力勾配の非類似度に応じて滑らかに劣化することを確認する。これらの異なるフレームワーク間で$\epsilon$を直接比較することはできないため、メンバシップ推論攻撃(MIA)を用いて標準DPフレームワーク内でプライバシを経験的に較正する従来研究を超える、経験的プライバシ較正メカニズムを検討する。強化したMIAと再構成攻撃の組み合わせが、プライバシ較正に適した方法であることを示す。主要なデータセットでの実験は、VMFメカニズムが有用性とプライバシのトレードオフにおいてガウスメカニズムを上回りうることを示している。特に,本実験は,再構成攻撃とメンバシップ推論に対する防御能力の観点から,2つのアプローチのプライバシーを直接比較する。

Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. This applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply directional privacy, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of angular distance so that gradient direction is broadly preserved. We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism; we observe that the $\epsilon d$-privacy guarantee does not require a $\delta>0$ term but degrades smoothly according to the dissimilarity of the input gradients. As $\epsilon$s between these different frameworks cannot be directly compared, we examine empirical privacy calibration mechanisms that go beyond previous work on empirically calibrating privacy within standard DP frameworks using membership inference attacks (MIA); we show that a combination of enhanced MIA and reconstruction attacks provides a suitable method for privacy calibration. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.

翻訳日:2023-06-17 03:15:50 公開日:2023-06-14

# LMD:話者検証における敵対的サンプルを検出する学習可能なマスクネットワーク

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification ( http://arxiv.org/abs/2211.00825v2 )

ライセンス: Link先を確認

Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, and Kunde Yang

(参考訳) 自動話者検証(ASV)のセキュリティは、最近出現した敵対的攻撃によって深刻な脅威を受けているが、その脅威を緩和するための対策もいくつか存在する。しかし、多くの防御手法は、攻撃者に関する事前知識を必要とするだけでなく、解釈可能性も弱い。この問題に対処するため,本稿では,敵対的サンプルを真正なサンプルから区別する、学習可能なマスク検出器 (LMD) と呼ばれる攻撃者非依存かつ解釈可能な手法を提案する。本手法は、敵対的サンプルを検出する指標としてスコア変動を利用する。スコア変動とは、元の音声録音のASVスコアと、そのマスクされた複素スペクトログラムから合成された変換音声のASVスコアとの絶対差である。スコア変動検出器の中核となる構成要素は、ニューラルネットワークによってマスクされたスペクトログラムを生成することである。このニューラルネットワークは訓練に真正なサンプルのみを必要とするため、攻撃者非依存のアプローチとなる。その解釈可能性は、ニューラルネットワークが、真正な訓練サンプルについて、対象とするASVのスコア変動を最小化し、かつマスクされるスペクトログラムビンの数を最大化するように訓練される点にある。その基礎は、話者情報の少ないスペクトログラムビンの大部分をマスクすると、必然的に敵対的サンプルには大きなスコア変動が生じ、真正なサンプルには小さなスコア変動しか生じないという観察にある。12種類の攻撃手法と2つの代表的なASVシステムを用いた実験結果から,提案手法は最先端の5つのベースラインより優れていることがわかった。この広範な実験結果は、検出に基づくASV防御のベンチマークにもなりうる。

Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is to generate the masked spectrogram by a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. Its interpretability lies in the fact that the neural network is trained to minimize the score variation of the targeted ASV, and maximize the number of the masked spectrogram bins of the genuine training examples. Its foundation is the observation that masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation to the adversarial example, and a small score variation to the genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also be a benchmark for the detection-based ASV defenses.
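
The detection rule based on score variation can be sketched as follows. This is a toy illustration only: the scorer `toy_score`, the threshold value, and the hand-written "masked" audio are all assumptions standing in for a real ASV system and the learned mask network.

```python
from typing import Callable, Sequence

Audio = Sequence[float]


def score_variation(
    asv_score: Callable[[Audio], float],
    original: Audio,
    masked: Audio,
) -> float:
    """Absolute gap between the ASV scores of the original and masked audio."""
    return abs(asv_score(original) - asv_score(masked))


def is_adversarial(variation: float, threshold: float) -> bool:
    """Flag the input when masking shifts the score by more than the threshold."""
    return variation > threshold


def toy_score(audio: Audio) -> float:
    """Toy stand-in scorer (mean sample value); a real ASV system would go here."""
    return sum(audio) / len(audio)


# Hypothetical input and its masked counterpart (values chosen by hand).
variation = score_variation(toy_score, [0.2, 0.4, 0.6], [0.2, 0.0, 0.0])
flagged = is_adversarial(variation, threshold=0.25)
```

In the method described above, the masked audio would be synthesized from the spectrogram produced by the trained mask network, and the threshold would be chosen on genuine examples only, which is what keeps the detector independent of any particular attacker.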

翻訳日:2023-06-17 03:14:43 公開日:2023-06-14

# ランク制約付き最適化問題に対するDantzig-Wolfe緩和の厳密性について

On the Exactness of Dantzig-Wolfe Relaxation for Rank Constrained Optimization Problems ( http://arxiv.org/abs/2210.16191v3 )

ライセンス: Link先を確認

Yongchun Li and Weijun Xie

(参考訳) ランク制約付き最適化問題(RCOP)は、あらかじめ指定されたランク制約付きの閉じた定義域集合と、$m$個の一般的な両側線形行列不等式の下で、線形目的関数を最小化する。多くの非凸最適化問題を解く一般的なアプローチであるDantzig-Wolfe(DW)分解に動機づけられ、RCOPのDW緩和(DWR)の強さを調べる。DWRは、定義域集合をその閉凸包に置き換える点を除いて、RCOPと同じ定式化を持つ。特に、我々の目標は、任意の$m$個の両側線形行列不等式に対してDWRがRCOPと一致する条件を特徴づけることである。主問題の観点から、以下を達成する、初めて知られる必要十分条件を導出する。 (i) 端点厳密性: DWRの実行可能集合のすべての端点がRCOPの実行可能集合に属する。 (ii) 凸包厳密性: DWRの実行可能集合がRCOPの実行可能集合の閉凸包と一致する。 (iii) 目的値厳密性: DWRとRCOPの最適値が一致する。提案した条件は,2次制約付き2次計画(QCQP)と公平な教師なし学習における既存の厳密性の結果を統一し、洗練し、拡張する。これらの条件は、2つの同次な両側2次制約を持ち非同次な目的関数を許容するQCQP問題に対する端点厳密性や、公平なSVDに対する凸包厳密性など、新しい結果を見出すのに非常に有用である。

The rank-constrained optimization problem (RCOP) minimizes a linear objective function over a prespecified closed rank-constrained domain set and $m$ generic two-sided linear matrix inequalities. Motivated by the Dantzig-Wolfe (DW) decomposition, a popular approach to solving many nonconvex optimization problems, we investigate the strength of DW relaxation (DWR) of the RCOP, which admits the same formulation as RCOP except replacing the domain set by its closed convex hull. Notably, our goal is to characterize conditions under which the DWR matches RCOP for any $m$ two-sided linear matrix inequalities. From the primal perspective, we develop the first-known simultaneously necessary and sufficient conditions that achieve: (i) extreme point exactness -- all the extreme points of the DWR feasible set belong to that of the RCOP; (ii) convex hull exactness -- the DWR feasible set is identical to the closed convex hull of RCOP feasible set; and (iii) objective exactness -- the optimal values of the DWR and RCOP coincide. The proposed conditions unify, refine, and extend the existing exactness results in the quadratically constrained quadratic program (QCQP) and fair unsupervised learning. These conditions can be very useful to identify new results, including the extreme point exactness for a QCQP problem that admits an inhomogeneous objective function with two homogeneous two-sided quadratic constraints and the convex hull exactness for fair SVD.
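
The two problems compared in this abstract can be written side by side. The notation below is an assumption made for illustration ($\mathcal{X}$ for the closed rank-constrained domain set, $A_0$ for the objective matrix, $A_i$, $b_i^{l}$, $b_i^{u}$ for the two-sided constraints); the abstract itself fixes only the structure, not these symbols.

```latex
% Notation is illustrative: X is the matrix variable, \mathcal{X} the
% closed rank-constrained domain set, and i indexes the m two-sided constraints.
\begin{align}
\text{(RCOP)} \quad
  & v_{\mathrm{RCOP}} = \min_{X \in \mathcal{X}}
    \big\{ \langle A_0, X \rangle :
      b_i^{l} \le \langle A_i, X \rangle \le b_i^{u},\ i = 1, \dots, m \big\} \\
\text{(DWR)} \quad
  & v_{\mathrm{DWR}} = \min_{X \in \operatorname{cl\,conv}(\mathcal{X})}
    \big\{ \langle A_0, X \rangle :
      b_i^{l} \le \langle A_i, X \rangle \le b_i^{u},\ i = 1, \dots, m \big\}
\end{align}
```

Because $\mathcal{X} \subseteq \operatorname{cl\,conv}(\mathcal{X})$, the relaxation always satisfies $v_{\mathrm{DWR}} \le v_{\mathrm{RCOP}}$; objective exactness in the sense of item (iii) means this inequality holds with equality.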

翻訳日:2023-06-17 03:14:15 公開日:2023-06-14

# 群対称性を用いた連続視覚に基づく強化学習

Continual Vision-based Reinforcement Learning with Group Symmetries ( http://arxiv.org/abs/2210.12301v2 )

ライセンス: Link先を確認

Shiqi Liu, Mengdi Xu, Piede Huang, Yongkang Liu, Kentaro Oguchi, Ding Zhao

(参考訳) 継続的な強化学習は、様々なタスクを順次学習し、以前遭遇したタスクを実行する能力を維持し、同時に新しいタスクのための新しいポリシーを開発することを目的としている。しかし、現在の連続RLアプローチは、特に視覚入力において、特定のタスクが回転や平行移動といった基本的な群操作の下で同一であるという事実を見落としている。そのため、類似したタスクごとに新しいポリシーを不必要に学習・維持することになり、サンプル効率が悪く、一般化能力が弱い。そこで本研究では,群対称性を認識し,個々のタスクではなく等価なタスクのグループごとにポリシーを育成する,独自の連続視覚ベース強化学習手法COVERSを提案する。 COVERSは、近接方策最適化に基づくRLアルゴリズムと、同変特徴抽出器と、抽出した不変特徴に依存する新しいタスクグループ化機構を用いる。シミュレーションと実ロボットプラットフォームの両方において,画像観察とロボットの固有受容情報を含むテーブルトップ操作タスクのシーケンスについて評価する。その結果, COVERS は各グループにタスクを正確に割り当て, 一般化能力において既存手法よりも大幅に優れていた。

Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.

翻訳日:2023-06-17 03:13:20 公開日:2023-06-14

# GREAD: グラフニューラル反応拡散ネットワーク

GREAD: Graph Neural Reaction-Diffusion Networks ( http://arxiv.org/abs/2211.14208v3 )

ライセンス: Link先を確認

Jeongwhan Choi, Seoyoung Hong, Noseong Park, Sung-Bae Cho

(参考訳) グラフニューラルネットワーク(GNN)は、ディープラーニングに関する最も人気のある研究トピックの1つである。 GNN法は通常、グラフ信号処理理論に基づいて設計されている。特に、拡散方程式はGNNのコア処理層の設計に広く用いられており、悪名高い過密問題に対して必然的に脆弱である。最近、いくつかの論文が拡散方程式とともに反応方程式に注意を払っている。しかし、それらはすべて限定的な反応方程式である。そこで本研究では,我々が設計した1つの特殊反応方程式に加えて,一般的な反応方程式をすべて考慮した反応拡散式に基づくgnn法を提案する。本論文は,反応拡散式に基づくgnnに関する最も包括的な研究の1つである。 9つのデータセットと28のベースラインを用いた実験では、GREADと呼ばれる手法がほとんどのケースで優れています。さらなる合成データ実験により、オーバースムーシング問題を緩和し、様々なホモフィリー率でうまく機能することが示された。

Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods have typically been designed on top of graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunction with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.
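
To make the oversmoothing remark concrete, here is a minimal sketch, not the GREAD architecture itself. It integrates dx/dt = -Lx + r(x) on a three-node path graph with explicit Euler steps. The graph, the step size, and the source-term reaction r(x) = x0 - x are assumptions chosen for illustration; the abstract does not specify which reaction terms the method uses.

```python
def euler_step(x, edges, dt, reaction):
    """One explicit Euler step of dx/dt = -L x + reaction(x) on an undirected graph."""
    flow = [0.0] * len(x)
    for i, j in edges:
        # Diffusion: each edge pulls its two endpoint values towards each other.
        flow[i] += x[j] - x[i]
        flow[j] += x[i] - x[j]
    r = reaction(x)
    return [xi + dt * (fi + ri) for xi, fi, ri in zip(x, flow, r)]


def simulate(x0, edges, reaction, dt=0.2, steps=200):
    """Apply `steps` Euler steps starting from the node features x0."""
    x = list(x0)
    for _ in range(steps):
        x = euler_step(x, edges, dt, reaction)
    return x


def spread(x):
    """Gap between the largest and smallest node value."""
    return max(x) - min(x)


edges = [(0, 1), (1, 2)]   # path graph 0 - 1 - 2 (illustrative)
x0 = [1.0, 0.0, 0.0]       # one scalar feature per node


def no_reaction(x):
    return [0.0] * len(x)


def source_term(x):
    # Assumed reaction term: pull every node back towards its initial feature.
    return [a - b for a, b in zip(x0, x)]


diffusion_only = simulate(x0, edges, no_reaction)
with_reaction = simulate(x0, edges, source_term)
print(spread(diffusion_only), spread(with_reaction))
```

With diffusion alone, all three node values collapse to the same number, which is the oversmoothing effect. With the assumed source term, the nodes settle at distinct values.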

翻訳日:2023-06-17 03:05:07 公開日:2023-06-14

# プレイヤーは次にどこへ動くのか? バドミントンにおける動き予測のための動的グラフと階層的融合

Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton ( http://arxiv.org/abs/2211.12217v2 )

ライセンス: Link先を確認

Kai-Shiang Chang, Wei-Yao Wang, Wen-Chih Peng

(参考訳) 各種データの分析により,トレーニング戦略やプレーヤ評価などの洞察が得られ,スポーツ分析が注目を集めている。そこで本稿では,どの種類の復帰ストロークが作られるか,また,選手が前回のストロークに基づいてどこに移動するかを予測することに焦点を当てる。この問題はこれまで解決されていないため、シーケンス予測タスクとして定式化することにより、シーケンスベースおよびグラフベースのモデルを通じて動き予測に取り組むことができる。しかし、既存のシーケンスベースのモデルはプレイヤー間の相互作用の影響を無視しており、グラフベースのモデルは次の動きに対する多面的視点に苦しむ。また、プレイヤーのショットタイプや動きの戦略的関係を表現する作業は現存していない。これらの課題に対処するために,まず,プレイヤーの動き(pm)グラフの手順を導入し,プレイヤーの構造的動きを戦略的関係に活用する。 PMグラフに基づいて,対話スタイル抽出器を用いた動的グラフと階層型動き予測モデル(DyMF)を提案する。さらに、階層的融合モジュールはプレイヤーとラリー相互作用の両方のスタイルの影響を組み込むように設計されている。広範な実験により,本モデルが逐次的およびグラフ的手法を経験的に上回っており,動き予測の実用性が示される。

Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of returning strokes will be made, and where players will move to based on previous strokes. As this problem has not been addressed to date, movement forecasting can be tackled through sequence-based and graph-based models by formulating it as a sequence prediction task. However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture the mutual interactions of players themselves and between both players within a rally, and dynamic players' tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrate the practical usage of movement forecasting.

翻訳日:2023-06-17 03:04:18 公開日:2023-06-14

# Aging with GRACE: 離散キーバリューアダプタによる生涯モデル編集

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters ( http://arxiv.org/abs/2211.11031v3 )

ライセンス: Link先を確認

Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi

(参考訳) デプロイされたモデルは、入力のシフト、ユーザニーズの変化、あるいは新たに生じる知識ギャップによって、時間の経過とともに劣化する。有害な行動が特定される場合、ターゲットを絞った編集が必要である。しかし、事前訓練されたモデルの特定の振る舞いを調整する現在のモデルエディタは、複数の編集でモデル性能を低下させる。本稿では,展開モデルのストリーミングエラーにスポットフィックスを実装し,無関係な入力への影響を最小限に抑えるライフロングモデル編集手法であるGRACEを提案する。 GRACEはトレーニング済みモデルの潜在空間に新しいマッピングを書き込み、モデルの重みを変えることなく、離散的で局所的な編集のコードブックを作成する。これはストリーミングエラーのみを使用して、数千のシーケンシャルな編集を可能にする最初の方法である。 T5,BERT,GPTモデルを用いた実験では,未知の入力に一般化しつつ,編集の実行と保持におけるGRACEの最先端性能を示す。私たちのコードは https://www.github.com/thartvigsen/grace で入手できる。

Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model performance over multiple edits. We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at https://www.github.com/thartvigsen/grace.
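
The idea of a discrete, local codebook that leaves the weights untouched can be illustrated with a toy sketch. This is not the released GRACE code. The class name `EditCodebook`, the Euclidean distance, and the fixed `radius` are assumptions made for the example.

```python
import math


class EditCodebook:
    """Toy key-value adapter wrapped around one frozen layer (illustrative only)."""

    def __init__(self, layer):
        self.layer = layer    # frozen function: hidden vector -> output
        self.entries = []     # list of (key, value, radius) edits

    def add_edit(self, key, value, radius=0.5):
        """Store an edit: inputs within `radius` of `key` map to `value`."""
        self.entries.append((list(key), value, radius))

    def __call__(self, hidden):
        best = None
        for key, value, radius in self.entries:
            d = math.dist(hidden, key)
            # Keep the closest key whose ball contains the query.
            if d <= radius and (best is None or d < best[0]):
                best = (d, value)
        if best is not None:
            return best[1]
        # No edit applies: fall back to the unchanged layer.
        return self.layer(hidden)


def frozen_layer(hidden):
    # Stand-in for a pre-trained layer whose weights are never modified.
    return "old"


book = EditCodebook(frozen_layer)
book.add_edit([1.0, 0.0], "new")

near = book([1.1, 0.0])   # inside the edit's ball
far = book([3.0, 3.0])    # unrelated input
print(near, far)
```

Inputs close to a stored key receive the edited value, while unrelated inputs still go through the original layer. That is the locality property the abstract emphasises.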

翻訳日:2023-06-17 03:03:38 公開日:2023-06-14

# 特徴帰属を伴うニューラルマシン翻訳における幻覚の低減

Reducing Hallucinations in Neural Machine Translation with Feature Attribution ( http://arxiv.org/abs/2211.09878v2 )

ライセンス: Link先を確認

Joël Tang, Marina Fomicheva, Lucia Specia

(参考訳) ニューラル条件付き言語生成モデルは、ニューラルネットワーク翻訳(NMT)の最先端を実現するが、並列トレーニングデータセットの品質に大きく依存する。低品質のデータセットでトレーニングすると、これらのモデルは幻覚、すなわち、流動的だが原文とは無関係な出力を含む様々なエラータイプに傾向がある。これらの誤りは特に危険である、なぜなら表面上は翻訳が正しい出力であると認識でき、特に読者がソース言語を理解していない場合である。 NMTにおける幻覚の軽減を目的としたモデル理解と正規化に着目したケーススタディを提案する。まず,幻覚を発生させるnmtモデルの行動を研究するために特徴帰属法を用いる。次に,これらの手法を用いて幻覚を低減し,モデルをスクラッチから再トレーニングする必要のない新しい損失関数を提案する。

Neural conditional language generation models achieve the state of the art in Neural Machine Translation (NMT) but are highly dependent on the quality of the parallel training dataset. When trained on low-quality datasets, these models are prone to various error types, including hallucinations, i.e., outputs that are fluent but unrelated to the source sentences. These errors are particularly dangerous because on the surface the translation can be perceived as a correct output, especially if the reader does not understand the source language. We present a case study focusing on model understanding and regularisation to reduce hallucinations in NMT. We first use feature attribution methods to study the behaviour of an NMT model that produces hallucinations. We then leverage these methods to propose a novel loss function that substantially helps reduce hallucinations and does not require retraining the model from scratch.

翻訳日:2023-06-17 03:02:47 公開日:2023-06-14

# ポイントクラウド登録のための解空間切断を用いた進化的マルチタスク

Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration ( http://arxiv.org/abs/2212.05679v2 )

ライセンス: Link先を確認

Wu Yue, Peiran Gong, Maoguo Gong, Hangqi Ding, Zedong Tang, Yibo Liu, Wenping Ma, Qiguang Miao

(参考訳) ポイントクラウド登録(PCR)はコンピュータビジョンにおいて人気のある研究トピックである。近年,対象関数設計における初期ポーズに対する頑健さと柔軟性から,進化的手法による登録法が注目されている。しかし、ほとんどの登録法は局所最適にうまく対応できず、成功率を調査することはめったになく、これは局所最適に陥らない可能性を示し、アルゴリズムの実用性に密接に関係している。進化的マルチタスク最適化(EMTO)は、関連するタスク間の知識伝達を通じて探索能力を向上するパラダイムである。この概念に着想を得た本研究では,マルチタスク構成を解空間切断の考え方に基づくEMTOによる新規な登録アルゴリズムを提案する。具体的には, カットスペースを探索するタスクは, 局所最適から逃れ, 登録率を向上する上で, 複雑な関数ランドスケープを伴うタスクを支援する。不要な計算コストを削減するため,スパース・トゥ・ダンス戦略を提案する。また,様々なオーバーラップ率に頑健な新しい適合関数と,計算コストの課題特異的指標を導入する。 8つの進化的アプローチ,4つの従来のアプローチ,および3つのディープラーニングアプローチによるオブジェクトスケールおよびシーンスケールの登録データセットと比較し,実験結果から,提案手法は精度と局所最適処理において優れた性能を示した。

Point cloud registration (PCR) is a popular research topic in computer vision. Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 8 evolving approaches, 4 traditional approaches and 3 deep learning approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.

翻訳日:2023-06-17 02:56:54 公開日:2023-06-14

# TIDE: グラフによるディープラーニングのための時間微分拡散

TIDE: Time Derivative Diffusion for Deep Learning on Graphs ( http://arxiv.org/abs/2212.02483v2 )

ライセンス: Link先を確認

Maysam Behmanesh, Maximilian Krahn, Maks Ovsjanikov

(参考訳) グラフニューラルネットワークの顕著なパラダイムは、メッセージパッシングフレームワークに基づいている。この枠組みでは、隣接ノード間のみの情報通信を実現する。このパラダイムを使用するアプローチの課題は、深層畳み込みネットワークが過密になりやすいため、ノード間の効率的で正確な長距離通信を保証することである。本稿では,メッセージパッシングフレームワークの構造的制約を克服するために,時間微分グラフ拡散(tide)に基づく新しい手法を提案する。提案手法により,様々なタスクやネットワークチャネル間の空間的拡散範囲を最適化し,中長距離通信を効率的に行うことができる。さらに, アーキテクチャ設計により, ローカルメッセージパッシングが可能であり, ローカルメッセージパッシングの能力を継承できることを示す。グラフベンチマークと合成メッシュとグラフデータセットの両方において,提案フレームワークが最先端手法を著しく上回っていることを示す。

A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits from the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin.

翻訳日:2023-06-17 02:56:11 公開日:2023-06-14

# 生成的推論を統合したラベルノイズに頑健な深層画像表現学習

Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning ( http://arxiv.org/abs/2212.01261v2 )

ライセンス: Link先を確認

Gencer Sumbul and Begüm Demir

(参考訳) 深層学習に基づく画像表現学習(IRL)手法の開発は,様々な画像理解問題に対して大きな注目を集めている。これらの手法の多くは、大量の注釈付き訓練画像の可用性と品質を必要としており、収集には時間と費用がかかる。ラベル費用を削減するため、クラウドソースデータ、自動ラベル付け手順、市民科学プロジェクトなどが考えられる。しかしながら、このようなアプローチは、トレーニングデータにラベルノイズを含めるリスクを増大させる。差別的推論が採用されると、ノイズラベルが過小評価される可能性がある。これにより、準最適学習手順が導き出され、画像の特徴が不正確になる。そこで本研究では,生成的推論統合ラベル雑音ロバスト深部表現学習(grid)手法を提案する。本研究の目的は、雑音ラベル下でのIRLの識別的・生成的推論の相補的特性をモデル化することである。そこで我々はまず,教師付き変分オートエンコーダを用いて生成的推論を識別的推論に統合する。これにより、グリッドはノイズラベルでトレーニングサンプルを自動的に検出できる。そして,ラベルノイズによる頑健なハイブリッド表現学習戦略を通じて,これらのサンプルのIRLの学習手順全体を,識別的推論により生成的推論および他のサンプルの学習手法によって調整する。提案手法は,irl法とは独立に雑音ラベルの干渉を防止しつつ,識別的画像表現を学習する。したがって、既存の手法とは異なり、GRIDはアノテーションの種類、ニューラルネットワークアーキテクチャ、損失関数、学習タスクに依存しないため、様々な問題に直接利用することができる。実験結果から, 最先端手法と比較して有効性を示した。 GRIDのコードはhttps://github.com/gencersumbul/GRIDで公開されている。

The development of deep learning based image representation learning (IRL) methods has attracted great attention for various image understanding problems. Most of these methods require the availability of a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen science projects can be considered. However, such approaches increase the risk of including label noise in training data. It may result in overfitting on noisy labels when discriminative reasoning is employed. This leads to sub-optimal learning procedures, and thus inaccurate characterization of images. To address this, we introduce a generative reasoning integrated label noise robust deep representation learning (GRID) approach. Our approach aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. To this end, we first integrate generative reasoning into discriminative reasoning through a supervised variational autoencoder. This allows GRID to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure for IRL of these samples through generative reasoning and that of other samples through discriminative reasoning. Our approach learns discriminative image representations while preventing interference of noisy labels independently from the IRL method being selected. Thus, unlike the existing methods, GRID does not depend on the type of annotation, neural network architecture, loss function or learning task, and thus can be directly utilized for various problems. Experimental results show its effectiveness compared to state-of-the-art methods. The code of GRID is publicly available at https://github.com/gencersumbul/GRID.

翻訳日:2023-06-17 02:55:15 公開日:2023-06-14

# 非凸低ランク半正定値緩和による敵対的学習済みニューラルネットワークのタイトな認証

Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations ( http://arxiv.org/abs/2211.17244v3 )

ライセンス: Link先を確認

Hong-Ming Chiu and Richard Y. Zhang

(参考訳) adversarial trainingは、adversarial perturbationに対して経験的に堅牢な高品質なニューラルネットワークモデルを作成することでよく知られている。それでも、一度モデルが逆行訓練を受けたら、モデルが将来の攻撃に対して真に堅牢であることを証明したいと願うことが多い。残念なことに、敵対的に訓練されたモデルに直面した場合、既存のアプローチはすべて、実用的に使えるほど強力な認証を作成するのに苦労しています。特に線形プログラミング(LP)技術は「凸緩和障壁」(convex relaxation barrier)に直面しており、混合整数線形プログラミング(MILP)と分岐およびバウンド(BnB)技術で洗練されても高品質な認証を行うことができない。本稿では,半定値プログラミング(SDP)緩和の低ランク制約に基づく非凸認証手法を提案する。非凸緩和により、より高価なSDPメソッドに匹敵する強力な認証が得られ、より弱いLPメソッドに匹敵する、劇的に少ない変数を最適化する。非凸性にもかかわらず、既製の局所最適化アルゴリズムが多項式時間における大域的最適性の実現と証明にどのように役立つかを示す。実験の結果,非凸緩和は正反対に訓練されたモデルの正確な認証に対するギャップをほぼ完全に埋めることがわかった。

Adversarial training is well known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.

翻訳日:2023-06-17 02:54:26 公開日:2023-06-14

# Prompt-Augmented Linear Probing:Few-shot In-Context Learnersの限界を超えるスケーリング

Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners ( http://arxiv.org/abs/2212.10873v3 )

ライセンス: Link先を確認

Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

(参考訳) In-context Learning (ICL) を通じて、大規模言語モデルは、追加のモデル微調整なしで効果的な数ショット学習者となる。しかし、ICLの性能は、基礎となる言語モデル固有の入力長制約によって制限されるため、利用可能なトレーニングサンプル数に対してうまくスケールしない。一方、言語モデルは強力な特徴抽出器でもあり、ブラックボックス方式で利用でき、事前抽出された入力表現の上に軽量な識別器を訓練する線形探索パラダイムを可能にすることが多くの研究で明らかにされている。本稿では,線形探索とICLのハイブリッドであり,両者の長所を生かすprompt-augmented linear probing (PALP) を提案する。 PALPは、線形探索のスケーラビリティと、入力をより理解しやすい形式に調整することで言語モデルにより意味のある表現を導出させる能力を継承する。各種データセットの詳細な調査を通じて、PALPが入力表現を著しく強化し、わずかなトレーニングオーバーヘッドで、データが乏しいシナリオにおけるICLとデータが豊富なシナリオにおける微調整との間のギャップを埋めることを確認した。これにより、PALPはブラックボックスシナリオにおける強力な代替手段となる可能性がある。

Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. Through in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.
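
The pipeline the abstract outlines (wrap each input in a prompt, extract fixed features, train a lightweight classifier on top) can be sketched as follows. Everything here is a stand-in. `TEMPLATE`, `VOCAB`, and the word-count `extract_features` are hypothetical replacements for a real prompt and a frozen language model, and the perceptron is just one possible lightweight discriminator.

```python
TEMPLATE = "Review: {text} Sentiment:"   # hypothetical prompt template
VOCAB = ["great", "awful", "review:", "sentiment:"]


def augment(text):
    """Wrap the raw input in the prompt template."""
    return TEMPLATE.format(text=text)


def extract_features(text):
    """Stand-in for a frozen language model: counts of a few vocabulary words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def train_probe(features, labels, epochs=20):
    """Perceptron on fixed features; labels are +1 or -1."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:   # misclassified: nudge the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(probe, text):
    """Classify a raw text by augmenting it, extracting features, and scoring."""
    w, b = probe
    x = extract_features(augment(text))
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


train_texts = ["great movie", "awful plot", "great cast great score", "awful awful"]
train_labels = [1, -1, 1, -1]

probe = train_probe([extract_features(augment(t)) for t in train_texts], train_labels)
print(predict(probe, "a great film"), predict(probe, "awful acting"))
```

Only the probe is trained and the extractor is never updated, which is why the number of training samples is not bounded by an input length limit. With this toy extractor the template adds nothing useful; with a real language model, the prompt would shape the extracted representation.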

翻訳日:2023-06-17 02:45:34 公開日:2023-06-14

# GPT-3は良いデータアノテーションか?

Is GPT-3 a Good Data Annotator? ( http://arxiv.org/abs/2212.10450v2 )

ライセンス: Link先を確認

Bosheng Ding, Chengwei Qin, Linlin Liu, Yew Ken Chia, Shafiq Joty, Boyang Li, Lidong Bing

(参考訳) データアノテーションは、機械学習モデルのトレーニングに使用できるデータのラベル付けプロセスである。モデルが入力データと所望の出力の関係を学習できるようにするため、高品質なアノテーションを持つことが不可欠である。 OpenAIが開発した大規模言語モデルであるGPT-3は、広範囲なNLPタスクにおいて、ゼロショットと少数ショットのパフォーマンスを誇示している。したがって、NLPタスクのデータに効果的にアノテートできるかどうか疑問に思うのが自然である。本稿では,GPT-3を従来のデータアノテーション手法と比較し,その出力を様々なタスクで分析することにより,データアノテータとしての性能を評価する。そこで本研究では,NLPにおける汎用データアノテータとしてのGPT-3の可能性について考察する。

Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.

翻訳日:2023-06-17 02:45:12 公開日:2023-06-14

# 干し草の山に刺さる針--mturkにおける高品位労働者の要約分析

Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization ( http://arxiv.org/abs/2212.10397v3 )

ライセンス: Link先を確認

Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

(参考訳) 低品質アノテーションにおけるリソースのコストと非効率的な使用を防止するため、自動要約評価などの困難なタスクを効果的に完了できる信頼可能なアノテータのプールを作成する方法が望まれる。そこで本研究では,amazon mechanical turk workersの2段階パイプラインによる採用について検討する。我々は、評価を行う前にサブパーワーカーをフィルタリングし、リソースに類似した制約のある高収差アノテーションを得られることを示す。当社のワーカーは、自分自身とクラウドリサーチワーカーの間で強いコンセンサスを示していますが、データのサブセットに対する専門家の判断との一致は期待どおりではなく、正確性に関するさらなるトレーニングが必要です。この論文は、他の困難なアノテーションタスクにおいて、資格アノテータを採用するためのベストプラクティスとして機能する。

To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization. Thus, we investigate the recruitment of high-quality Amazon Mechanical Turk workers via a two-step pipeline. We show that we can successfully filter out subpar workers before they carry out the evaluations and obtain high-agreement annotations with similar constraints on resources. Although our workers demonstrate a strong consensus among themselves and CloudResearch workers, their alignment with expert judgments on a subset of the data is not as expected and needs further training in correctness. This paper still serves as a best practice for the recruitment of qualified annotators in other challenging annotation tasks.

翻訳日:2023-06-17 02:44:58 公開日:2023-06-14

# DOC:詳細なアウトライン制御による長いストーリーコヒーレンスの改善

DOC: Improving Long Story Coherence With Detailed Outline Control ( http://arxiv.org/abs/2212.10077v3 )

ライセンス: Link先を確認

Kevin Yang, Dan Klein, Nanyun Peng, Yuandong Tian

(参考訳) 数千語に及ぶストーリーを自動生成する際の長距離プロットコヒーレンスを改善するための詳細アウトライン制御(DOC)フレームワークを提案する。 DOCは、詳細アウトライナーと詳細コントローラという2つの補完的なコンポーネントで構成されている。詳細アウトライナーは、より詳細で階層的に構造化されたアウトラインを作成し、創作上の負担をメインのドラフト手順から計画段階に移す。詳細コントローラは、アウトラインの詳細に合わせてストーリーの節を制御することで、生成中もより詳細なアウトラインが尊重されるようにする。自動生成ストーリーの人間による評価では、DOCはプロットコヒーレンス(22.5%の絶対ゲイン)、アウトライン関連性(28.2%)、面白さ(20.7%)で強力なRe3ベースライン(Yang et al., 2022)を大幅に上回る。人間はまた、DOCは対話的な生成設定においてはるかに制御しやすいと判断した。

We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.

翻訳日:2023-06-17 02:44:43 公開日:2023-06-14

# マルコフ決定過程における因果時間推論

Causal Temporal Reasoning for Markov Decision Processes ( http://arxiv.org/abs/2212.08712v2 )

ライセンス: Link先を確認

Milad Kazemi and Nicola Paoletti

(参考訳) 我々は,マルコフ決定過程(MDP)の検証のための新しい確率的時間論理であるPCFTL (Probabilistic CounterFactual Temporal Logic) を導入する。 PCFTLは因果推論の演算子を初めて含み、介入的および反事実的クエリを表現できる。経路公式 $\phi$ が与えられたとき、介入的性質は、特定の変更 $I$ を MDP に適用した場合(例えば、別のポリシーに切り替えるなど)の $\phi$ の充足確率に関するものである。反事実は、観測された MDP 経路 $\tau$ が与えられたとき、過去に $I$ を適用していたら $\phi$ の結果がどうなっていたかを計算できるようにする。 MDP の異なる構成を含む what-if シナリオを推論できるため、我々のアプローチは、固定されたシステム構成のみを推論できる既存の確率的時間論理とは一線を画している。構文の観点から,介入確率と反事実確率の両方,およびPCTLなどに見られる従来の確率演算子を包含する一般化された反事実演算子を導入する。意味論の観点からは、我々の論理はMDPの構造的因果モデルへの変換上で解釈され、これは反事実的推論に適した表現を与える。グリッドワールドモデルのベンチマークを用いて,PCFTLを安全な強化学習の文脈で評価する。

We introduce PCFTL (Probabilistic CounterFactual Temporal Logic), a new probabilistic temporal logic for the verification of Markov Decision Processes (MDP). PCFTL is the first to include operators for causal reasoning, allowing us to express interventional and counterfactual queries. Given a path formula $\phi$, an interventional property is concerned with the satisfaction probability of $\phi$ if we apply a particular change $I$ to the MDP (e.g., switching to a different policy); a counterfactual allows us to compute, given an observed MDP path $\tau$, what the outcome of $\phi$ would have been had we applied $I$ in the past. For its ability to reason about what-if scenarios involving different configurations of the MDP, our approach represents a departure from existing probabilistic temporal logics that can only reason about a fixed system configuration. From a syntactic viewpoint, we introduce a generalized counterfactual operator that subsumes both interventional and counterfactual probabilities as well as the traditional probabilistic operator found in, e.g., PCTL. From a semantics viewpoint, our logic is interpreted over a structural causal model translation of the MDP, which gives us a representation amenable to counterfactual reasoning. We evaluate PCFTL in the context of safe reinforcement learning using a benchmark of grid-world models.
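
The interventional case can be illustrated on a tiny example. This is only a sketch under assumptions: the three-state MDP, the policies, and the choice of $\phi$ as "reach the goal within a fixed number of steps" are invented for illustration, and the code is plain dynamic programming, not the PCFTL semantics. Counterfactual queries, which condition on an observed path, are not covered.

```python
def reach_probability(mdp, policy, start, goal, horizon):
    """P(reach a goal state within `horizon` steps) under a deterministic policy."""
    prob = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(horizon):
        nxt = {}
        for s in mdp:
            if s in goal:
                nxt[s] = 1.0
            else:
                # Expected value over the successors of the chosen action.
                nxt[s] = sum(p * prob[t] for t, p in mdp[s][policy[s]].items())
        prob = nxt
    return prob[start]


# Illustrative MDP: state -> action -> {successor: probability}.
mdp = {
    "s0": {
        "safe": {"goal": 0.5, "s0": 0.5},
        "risky": {"goal": 0.8, "trap": 0.2},
    },
    "goal": {"stay": {"goal": 1.0}},
    "trap": {"stay": {"trap": 1.0}},
}
goal = {"goal"}

baseline_policy = {"s0": "safe", "goal": "stay", "trap": "stay"}
# Intervention I: switch the action taken in s0.
intervened_policy = dict(baseline_policy, s0="risky")

p_baseline = reach_probability(mdp, baseline_policy, "s0", goal, horizon=2)
p_intervened = reach_probability(mdp, intervened_policy, "s0", goal, horizon=2)
print(p_baseline, p_intervened)
```

With a two-step horizon the intervention raises the satisfaction probability. With a three-step horizon the baseline policy comes out ahead, so the answer to the what-if query depends on the property being checked.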

翻訳日:2023-06-17 02:44:10 公開日:2023-06-14

# 説明付きAI意思決定における人間直観の役割の理解

Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations ( http://arxiv.org/abs/2301.07255v3 )

ライセンス: Link先を確認

Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, Gagan Bansal

(参考訳) AIの説明は、人間とAIの意思決定を改善する方法としてしばしば言及されるが、実証的研究は、説明の有効性の一貫性のある証拠を見出さず、逆に、AIシステムが間違っている場合に過度な信頼性を高めることができることを示唆している。多くの要因がAIサポートに依存する可能性があるが、意思決定者がAIの予測をいつオーバーライドするかを決定するためにAIシステムが提供する情報と、事前知識、経験、パターン認識に基づいて、自身の直観(信念やヒューリスティックス)をどう解釈するかが重要な要素である。我々は、意思決定者の直感がAIの予測と説明の使用にどのように影響するか、そして最終的にAIに依存するタイミングを選択するために、2つの予測タスクのための2つの説明タイプ(機能と例に基づく)で、思考アラウドと混合メソッドの研究を行う。結果から,AIの予測と説明に関する推論に関わる3つの直観,すなわちタスク結果,特徴,AIの限界に関する直観を抽出した。これらに基づいて、意思決定者が自身の直感を適用し、AI予測を上書きする3つの観察経路を要約する。筆者らは,(1)特徴に基づく説明が参加者の判断結果を改善せず,AIに対する信頼度を高めなかった理由,(2)特徴に基づく説明よりも意思決定者のパフォーマンスを向上し,補完的な人間-AIのパフォーマンスを実現した事例に基づく説明を,これらの経路を用いて説明している。全体として、私たちの研究は、意思決定者がAIに適切に依存するための直感を効果的に適用するのに役立つAI意思決定支援システムと説明方法のさらなる発展に向けた方向性を特定します。

AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong. While many factors may affect reliance on AI support, one important factor is how decision-makers reconcile their own intuition -- beliefs or heuristics, based on prior knowledge, experience, or pattern recognition, used to make judgments -- with the information provided by the AI system to determine when to override AI predictions. We conduct a think-aloud, mixed-methods study with two explanation types (feature- and example-based) for two prediction tasks to explore how decision-makers' intuition affects their use of AI predictions and explanations, and ultimately their choice of when to rely on AI. Our results identify three types of intuition involved in reasoning about AI predictions and explanations: intuition about the task outcome, features, and AI limitations. Building on these, we summarize three observed pathways for decision-makers to apply their own intuition and override AI predictions. We use these pathways to explain why (1) the feature-based explanations we used did not improve participants' decision outcomes and increased their overreliance on AI, and (2) the example-based explanations we used improved decision-makers' performance over feature-based explanations and helped achieve complementary human-AI performance. Overall, our work identifies directions for further development of AI decision-support systems and explanation methods that help decision-makers effectively apply their intuition to achieve appropriate reliance on AI.

翻訳日:2023-06-17 02:36:43 公開日:2023-06-14

# 手術集約:分散医用画像データセットを多様なタスクで調和させる協調学習フレームワーク

Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks ( http://arxiv.org/abs/2301.06683v4 )

ライセンス: Link先を確認

Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

(参考訳) 大規模胸部X線データセットは、深層学習を用いて異常を検出するためにキュレートされ、多くの臨床応用において大きな利益をもたらす可能性がある。しかしながら、各データセットは、患者に同時に存在する可能性のある所見のサブセットのみに焦点を当てており、複数のデータセットをまとめて集約するモデルをトレーニングすることは困難である。したがって、これらのデータセットを集約的に活用し、胸腔内に存在する可能性のある異常を完全に表現した臨床的に有用なモデルを訓練するには、データの調和が重要である。そこで本研究では,部分的アノテーションを持つ分散異種データセットから知識を調和・集約する協調学習フレームワークである外科的アグリゲーション(surgical aggregation)を提案する。部分的アノテーションを持つ合成および実世界の異種データセットにまたがって外科的アグリゲーションを評価する。以上の結果から, 外科的アグリゲーションは現在の戦略より優れ, より一般化し, 異種疾患ラベル付きデータセットを用いても, 臨床的に有用なモデルの開発を促進できる可能性が示唆された。

Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.

翻訳日:2023-06-17 02:36:06 公開日:2023-06-14

# AIモデルの説明可能性のための理論的枠組みとバイオメディシンへの応用

A Theoretical Framework for AI Models Explainability with Application in Biomedicine ( http://arxiv.org/abs/2212.14447v4 )

ライセンス: Link先を確認

Matteo Rizzo, Alberto Veneri, Andrea Albarelli, Claudio Lucchese, Marco Nobile, Cristina Conati

(参考訳) 説明可能な人工知能(XAI)は、人工知能コミュニティにおいて活発な研究テーマであり、メソッドやドメインにまたがる関心が高まっている。この問題については多くが書かれてきたが、XAIはいまだに共通用語と説明に構造的健全性を提供するフレームワークを欠いている。本研究では,文献に見ることができるものの合成である説明の新しい定義を提案することで,これらの課題に対処した。我々は、説明が原子性ではなく、モデルとその入出力マッピングに由来する証拠の組み合わせであり、この証拠の人間の解釈であると認識する。さらに、忠実性(すなわち、モデルの内部動作と意思決定プロセスの真の説明である説明)と可否性(つまり、その説明がどの程度ユーザにとって説得力のあるように見えるか)について説明する。提案する理論的枠組みを用いて,これらの特性の操作方法を単純化し,ケーススタディとして分析する共通説明法に対する新たな洞察を与える。

EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.

翻訳日:2023-06-17 02:34:33 公開日:2023-06-14

# MixupE: 方向微分の観点からのミックスアップの理解と改善

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective ( http://arxiv.org/abs/2212.13381v3 )

ライセンス: Link先を確認

Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

(参考訳) Mixupはディープニューラルネットワークをトレーニングするための一般的なデータ拡張テクニックで、入力とラベルを線形に補間することで追加サンプルを生成する。この技術は多くの学習パラダイムや応用において一般化性能を向上させることが知られている。本研究では,まず混合を解析し,すべての順序の無限個の方向微分を暗黙的に規則化することを示す。この新たな知見に基づいて,理論上はバニラミックスアップよりも優れた一般化性能を提供するため,mixupの改良版を提案する。提案手法の有効性を示すために,画像,表データ,音声,グラフなどの様々な領域で実験を行った。提案手法は,様々なアーキテクチャを用いて,複数のデータセットのミックスアップを改良し,ImageNet Top-1の精度が0.8%向上したことを示す。

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.

翻訳日:2023-06-17 02:34:14 公開日:2023-06-14
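
The MixupE abstract above describes Mixup as generating additional samples by linearly interpolating pairs of inputs and their labels. A minimal sketch of that vanilla interpolation step follows, using only the Python standard library. It does not implement the improved MixupE variant, and the function name `mixup_pair`, the list-based representation, and the default `alpha=0.2` are illustrative assumptions rather than details taken from the paper.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta(alpha, alpha) weight.

    Hypothetical helper for illustration: inputs and labels are plain lists
    of floats, and alpha=0.2 is an assumed default, not a value from the paper.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x_mix, y_mix, lam
```

Because the same coefficient is applied to inputs and labels, two one-hot labels blend into a soft label whose entries still sum to one.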

# 反復生成のためのスケーラブル適応計算

Scalable Adaptive Computation for Iterative Generation ( http://arxiv.org/abs/2212.11972v2 )

ライセンス: Link先を確認

Allan Jabri, David Fleet, Ting Chen

(参考訳) 自然データは冗長だが支配的なアーキテクチャであり、入出力空間を均一に計算する。本稿では,データ次元からコア計算を分離し,よりスケーラブルな高次元データ生成のための適応計算を可能にする注目型アーキテクチャであるRecurrent Interface Networks (RINs)を提案する。 RINは、潜在トークンとデータトークンの間の情報(すなわちルート)を読み書きするためにクロスアテンションを使用して、計算の大部分(すなわちグローバルな自己アテンション)を潜在トークンの集合にフォーカスする。 RINブロックの積み重ねにより、ボトムアップ(データから遅延)とトップダウン(データに近い)のフィードバックが可能になり、より深く表現力のあるルーティングが可能になる。このルーティングには課題が伴うが、拡散モデルによる反復生成のようなタスク(およびルーティング問題)が徐々に変化する繰り返し計算設定では問題が少なくなる。逆拡散過程の各前方通過に潜時トークンを前処理、すなわち潜時自己条件で条件付けすることで再帰性を活用する方法を示す。 RINは、画像生成とビデオ生成のための最先端のピクセル拡散モデルを生成し、カスケードやガイダンスなしで1024X1024画像にスケーリングすると同時に、ドメインに依存しず、2Dや3D U-Netよりも最大10倍効率が高い。

Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024X1024 images without cascades or guidance, while being domain-agnostic and up to 10X more efficient than 2D and 3D U-Nets.

翻訳日:2023-06-17 02:33:59 公開日:2023-06-14

# コンパクトな潜在ネットワークを用いた適応型Siamese追跡

Adaptive Siamese Tracking with a Compact Latent Network ( http://arxiv.org/abs/2302.00930v2 )

ライセンス: Link先を確認

Xingping Dong, Jianbing Shen, Fatih Porikli, Jiebo Luo, and Ling Shao

(参考訳) 本稿では,シームズに基づくトラッカーを簡易化するために,トラッキングタスクを分類に変換し,直感的なビューアを提供する。この見地から,視覚シミュレーションや実追跡例を通じて詳細な解析を行い,いくつかの困難な状況における障害事例をオフライントレーニングにおける決定的サンプルの欠落問題とみなすことができる。最初の(最初の)フレームのサンプルは、豊富なシーケンス固有情報を含んでいるので、シーケンス全体を表す決定的なサンプルとみなすことができる。ベースモデルを新しいシーンに迅速に適応させるために、これらの決定的なサンプルをフル活用して、コンパクトな潜在ネットワークを提示する。具体的には,逐次的情報抽出を効率的に行うことで,高速調整のための統計に基づくコンパクトな潜在性特徴を提案する。さらに,提案するコンパクト潜在ネットワークの識別能力をさらに向上させるための,新たな多種多様なサンプルマイニング戦略を考案した。最後に,追跡フェーズ中のシーン変動を効率的に処理するために,基本モデルを更新するための条件付き更新戦略を提案する。本手法の一般化と有効性を評価するため,siamrpn++,siamfc,siambanの3つの古典的なsiameseベースのトラッカーを調整した。最近の6つのデータセットの大規模な実験結果から、3つの調整されたトラッカーは高い走行速度を保ちながら精度で優れた性能が得られることが示された。

In this paper, we provide an intuitive view that simplifies Siamese-based trackers by converting the tracking task to a classification task. Under this view, we perform an in-depth analysis of these trackers through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as an issue of missing decisive samples in offline training. Since the samples in the initial (first) frame contain rich sequence-specific information, we can regard them as the decisive samples that represent the whole sequence. To quickly adapt the base model to new scenes, we present a compact latent network that makes full use of these decisive samples. Specifically, we present a statistics-based compact latent feature for fast adjustment by efficiently extracting the sequence-specific information. Furthermore, a new diverse sample mining strategy is designed for training to further improve the discrimination ability of the proposed compact latent network. Finally, a conditional updating strategy is proposed to efficiently update the basic models to handle scene variation during the tracking phase. To evaluate the generalization ability and effectiveness of our method, we apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN. Extensive experimental results on six recent datasets demonstrate that all three adjusted trackers obtain superior accuracy while maintaining high running speed.

翻訳日:2023-06-17 02:25:16 公開日:2023-06-14

# 身体化エージェントへのインターネットスケール視覚言語モデルの蒸留

Distilling Internet-Scale Vision-Language Models into Embodied Agents ( http://arxiv.org/abs/2301.12507v2 )

ライセンス: Link先を確認

Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta

(参考訳) 命令追従エージェントは言語を観察空間と行動空間に基礎付ける必要がある。基底言語への学習は、通常、ドメイン固有のエンジニアリングまたは大量のヒューマンインタラクションデータを必要とする。この課題に対処するために,事前に訓練された視覚言語モデル (VLM) を用いてエンボディエージェントを監督する手法を提案する。モデル蒸留と後視体験再生(HER)のアイデアを組み合わせて, VLMを用いてエージェントの動作を記述する言語を遡及的に生成する。単純なプロンプトによって監督信号を制御でき、エージェントに3dレンダリングされた環境で名前(平面など)や特徴(色など)に基づいて、新しいオブジェクトと対話するように教えます。 fewshotプロンプトでは、既存のカテゴリ(食べ物とおもちゃ)やアドホックなもの(オブジェクトよりもアービタリーな好み)など、抽象的なカテゴリのメンバシップを教えられます。我々の研究は、インターネットスケールのVLMを使うための新しい効果的な方法を概説し、そのようなモデルが獲得した汎用言語基盤を再利用し、エージェントにタスク関連基盤を教える。

Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Fewshot prompting lets us teach abstract category membership, including pre-existing categories (food vs toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.

翻訳日:2023-06-17 02:23:50 公開日:2023-06-14

# SOBER:離散空間と混合空間上の高並列ベイズ最適化とベイズ四分法

SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces ( http://arxiv.org/abs/2301.11832v3 )

ライセンス: Link先を確認

Masaki Adachi, Satoshi Hayakawa, Saad Hamid, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne

(参考訳) Batch Bayesian optimization と Bayesian quadrature は、高価な対物関数を並列にクエリできる最適化と二次化を行うサンプル効率のよい方法であることが示されている。しかし、現在の手法は大規模なバッチサイズにはスケールしない -- 実際には頻繁なデシデラタム(例えば、薬物の発見やシミュレーションに基づく推論)である。本稿では,分散空間上の任意の獲得関数とカーネルを持つ,スケーラブルで多様なバッチグローバル最適化と定式化を実現する新しいアルゴリズム SOBER を提案する。我々のアプローチの鍵は、二次問題としてグローバル最適化のためのバッチ選択を再構成することであり、これは獲得関数の最大化(非凸)をカーネル再結合(凸)に緩和する。グローバル最適化と二次のブリッジは、搾取ベイズ最適化と探索ベイズ二次のメリットをバランスさせることで、両方のタスクを効率的に解決することができる。実世界の12のタスクにおいて,SOBERが11の競争ベースラインを上回っていることを示す。

Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.

翻訳日:2023-06-17 02:23:30 公開日:2023-06-14

# パラメーター効率の高い転送学習による言語モデルの分布外ロバスト性の検出

Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning ( http://arxiv.org/abs/2301.11660v4 )

ライセンス: Link先を確認

Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min Yoo, and Sang-goo Lee

(参考訳) プレトレーニング言語モデル (PLM) のサイズが増加し続けるにつれて, 微調整の膨大なコストを補うために, パラメータ効率の学習手法が多数提案されている。大規模な事前学習言語モデル (PLM) と各種パラメータ効率変換学習法 (PETL) が日没ベンチマークで達成した印象的な結果にもかかわらず, 分散的にシフトした入力を効果的に処理できるかどうかは不明である。本研究では,plmの大きさや転送方法が変化するにつれて,od(out-of-distribution)がどう変化するかを体系的に検討する。具体的には,異なるスケールの様々な言語モデルを用いて,3つの異なる意図分類タスクにおいて,微調整,アダプタ,lora,プレフィックスチューニングを含む様々なpetl手法を評価した。

As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive results achieved by large pre-trained language models (PLMs) and various parameter-efficient transfer learning (PETL) methods on sundry benchmarks, it remains unclear if they can handle inputs that have been distributionally shifted effectively. In this study, we systematically explore how the ability to detect out-of-distribution (OOD) changes as the size of the PLM grows or the transfer methods are altered. Specifically, we evaluated various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks, each utilizing various language models with different scales.

翻訳日:2023-06-17 02:23:13 公開日:2023-06-14

# LightGCL:レコメンデーションのためのシンプルで効果的なグラフコントラスト学習

LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation ( http://arxiv.org/abs/2302.08191v3 )

ライセンス: Link先を確認

Xuheng Cai, Chao Huang, Lianghao Xia, Xubin Ren

(参考訳) グラフニューラルネットワーク(GNN)は、グラフベースのレコメンデータシステムのための強力な学習手法である。近年, コントラスト学習と統合されたGNNは, 高度にスパースなデータを扱うことを目的としたデータ拡張方式により, 優れた性能を示した。その成功にもかかわらず、既存のグラフのコントラスト学習手法のほとんどは、ユーザ-itemの相互作用グラフ上で確率的拡張(ノード/エッジの摂動)を行うか、あるいはコントラスト的なビューを生成するためにヒューリスティックベースの拡張技術(ユーザクラスタリングなど)に依存する。これらの手法は本質的な意味構造を十分に保ち得ず、ノイズの摂動によって容易にバイアスを受けることができる。本稿では,これらの問題を緩和し,CLベースのレコメンデータの汎用性と堅牢性を損なう,簡易で効果的なグラフコントラッシブ学習パラダイムLightGCLを提案する。本モデルでは, コントラスト拡張のために特異値分解を排他的に活用し, 協調関係モデリングによる制約のない構造改善を可能にする。いくつかのベンチマークデータセットで行った実験は、最先端のモデルよりもモデルの性能が大幅に向上したことを示している。さらなる分析は、データスパーシリティと人気バイアスに対するLightGCLの頑健さの優位性を示している。私たちのモデルのソースコードはhttps://github.com/HKUDS/LightGCLで公開されています。

Graph neural networks (GNNs) are a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, which aim to deal with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot preserve the intrinsic semantic structures well and are easily biased by noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm, LightGCL, that mitigates these issues, which impair the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate a significant improvement in performance of our model over state-of-the-art methods. Further analyses demonstrate the superior robustness of LightGCL against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.

翻訳日:2023-06-17 02:17:18 公開日:2023-06-14

# バンド・ソーシャル・ラーニング : 神秘的行動による探索

Bandit Social Learning: Exploration under Myopic Behavior ( http://arxiv.org/abs/2302.07425v3 )

ライセンス: Link先を確認

Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins

(参考訳) エージェントが単純なマルチアームバンディットプロトコルに従う社会学習のダイナミクスについて検討する。エージェントは順次到着し、腕を選び、関連する報酬を受け取る。各エージェントは、前のエージェントの完全な履歴(武器と報酬)を観察し、プライベートシグナルは存在しない。協力してエージェントは探索と探索のトレードオフに直面しますが、それぞれのエージェントは探査に関して無差別に行動します。モチベーションシナリオは、オンラインプラットフォームにおけるレビューと評価に関するものだ。我々は、「偏見のない」行動や様々な行動バイアスを含む、(パラメータ化された)信頼区間と整合した幅広い筋電図的行動を許容する。これらの行動の極端なバージョンはよく知られたバンディットアルゴリズムに対応しているが、より穏健なバージョンは究極の探索失敗につながり、結果としてエージェント数に線形な後悔率をもたらすことを証明している。我々は「適度に楽観的な」エージェントを分析して後悔の上限を一致させる。独立利害関係の特別な場合として,多腕バンディットにおけるグリーディアルゴリズムの故障に関する一般的な結果を得る。これが文学における最初の結果であり、我々の知る限りでは最善である。

We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.

翻訳日:2023-06-17 02:16:43 公開日:2023-06-14

# Androidマルウェア検出のための継続的学習

Continuous Learning for Android Malware Detection ( http://arxiv.org/abs/2302.04332v2 )

ライセンス: Link先を確認

Yizheng Chen, Zhoujie Ding, David Wagner

(参考訳) 機械学習は、Androidのマルウェアを非常に高い精度で検出できる。しかし、これらの分類器にはコンセプトドリフトというアキレス腱があり、マルウェアアプリや良性アプリの進化によって、それらは急速に時代遅れになり、有効性を失う。我々の研究によると、Androidのマルウェア分類器を1年分のデータでトレーニングした後、新しいテストサンプルに6ヶ月デプロイした後、F1スコアはすぐに0.99から0.76に低下した。本稿では,Androidマルウェア分類器のコンセプトドリフト問題に対処する新しい手法を提案する。機械学習の手法は継続的にデプロイする必要があるため、私たちはアクティブラーニングを使用します。アナリストがラベル付けする新しいサンプルを選択し、ラベル付きサンプルをトレーニングセットに追加して、分類器を再トレーニングします。私たちの重要なアイデアは、類似性に基づく不確実性が、コンセプトドリフトに対してより堅牢であることです。そこで我々は,コントラスト学習とアクティブラーニングを組み合わせる。本稿では,新しい階層的コントラスト学習スキームと,Androidマルウェア分類器を継続的に学習するための新しいサンプル選択手法を提案する。評価の結果,これまでに公表されたアクティブラーニング手法と比較して,大幅な改善がみられた。我々のアプローチは、偽陰性率を14%(最良のベースライン)から9%に削減するとともに、偽陽性率(0.86%から0.48%)を低下させる。また,従来の手法よりも7年間にわたって一貫した性能を維持する。

Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective, due to the evolution of malware apps and benign apps. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning techniques need to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier. Our evaluation shows that this leads to significant improvements, compared to previously published methods for active learning. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach maintains more consistent performance across a seven-year time period than past methods.

翻訳日:2023-06-17 02:15:34 公開日:2023-06-14
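
The abstract above describes an active learning loop: select new samples for analysts to label, add the labeled samples to the training set, and retrain. The sketch below shows one generic round of such a loop. It is not the paper's method: the `uncertainty` callable is a placeholder for whatever score is used (the paper proposes a similarity-based one, which is not reproduced here), and `oracle`, `budget`, and both function names are hypothetical.

```python
def select_for_labeling(pool, uncertainty, budget):
    """Return the `budget` pool samples with the highest uncertainty score."""
    ranked = sorted(pool, key=uncertainty, reverse=True)
    return ranked[:budget]


def active_learning_round(train_set, pool, uncertainty, oracle, budget):
    """Run one round: pick samples, ask the oracle (analyst) for labels,
    and return the enlarged training set plus the remaining unlabeled pool.

    Retraining the classifier on the returned training set is left to the caller.
    """
    chosen = select_for_labeling(pool, uncertainty, budget)
    labeled = [(sample, oracle(sample)) for sample in chosen]
    remaining = [sample for sample in pool if sample not in chosen]
    return train_set + labeled, remaining
```

In a toy setting where each pool item is a predicted malware probability, an uncertainty such as `1 - abs(2 * p - 1)` ranks samples near 0.5 first.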

# 効率的な同変GNNのためのSO(3)畳み込みのSO(2)への還元

Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs ( http://arxiv.org/abs/2302.03655v2 )

ライセンス: Link先を確認

Saro Passaro, C. Lawrence Zitnick

(参考訳) 点雲や原子などの3Dデータをモデル化するグラフニューラルネットワークは、通常、$SO(3)$等式、すなわち3Dローテーションに同変することを望んでいる。残念ながら、同変ネットワークの基本的な操作である同変畳み込みは、高次テンソルを使用すると計算複雑性が大幅に増加する。本稿では、$SO(3)$畳み込みあるいはテンソル積を$SO(2)$ の数学的に等価な畳み込みに還元することでこの問題に対処する。これは、ノード埋め込みの一次軸をエッジベクトルに合わせることで達成され、これはテンソル積を分散させ、計算複雑性を$O(L^6)$から$O(L^3)$に減らし、$L$は表現の次数である。本研究では,大規模oc-20およびoc-22データセットの最先端結果を実現する等変畳み込み法を用いて,グラフニューラルネットワークである等変球状チャネルネットワーク(escn)を提案することで,この改善の可能性を示す。

Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be $SO(3)$ equivariant, i.e., equivariant to 3D rotations. Unfortunately, equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$. This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from $O(L^6)$ to $O(L^3)$, where $L$ is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.

翻訳日:2023-06-17 02:15:11 公開日:2023-06-14

# 単純なタスクにブラックボックスモデルを過剰適用するのをやめ、代わりに透明なモデルを使おう

Stop overkilling simple tasks with black-box models and use transparent models instead ( http://arxiv.org/abs/2302.02804v2 )

ライセンス: Link先を確認

Matteo Rizzo, Matteo Marcuzzo, Alessandro Zangari, Andrea Gasparetto, Andrea Albarelli

(参考訳) 近年、ディープラーニングの手法が採用され、人工知能にいくつかの大きなブレークスルーをもたらした。従来の機械学習モデルとは異なり、ディープラーニングベースのアプローチは、生データから自律的に特徴を抽出することができる。これにより、一般的にエラーを起こしやすく、面倒であると考えられる機能エンジニアリングプロセスをバイパスすることができる。さらに、ディープラーニング戦略は、精度で従来のモデルより優れていることが多い。

In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious. Moreover, deep learning strategies often outperform traditional models in terms of accuracy.

翻訳日:2023-06-17 02:14:49 公開日:2023-06-14

# 予算・ROI制約付きマルチチャネル自動入札

Multi-channel Autobidding with Budget and ROI Constraints ( http://arxiv.org/abs/2302.01523v3 )

ライセンス: Link先を確認

Yuan Deng, Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni

(参考訳) デジタルオンライン広告では、広告主は複数のプラットフォーム、あるいはGoogle Ads、Meta Ads Managerなどのいわゆるチャンネルで広告インプレッションを同時に調達する。広告主が全チャンネルの総コンバージョン(広告クリックなど)を最大化しつつ、総リターン・オン・投資(ROI)と予算制約を満たす方法について検討する。実際には、広告主は、制御することができないため、グローバルに最適化することができないため、各チャンネルで参加する個別の広告オークションを承認し、その代わりにインプレッションを得るチャンネルを許可する。本研究では,広告主のグローバルなマルチチャネル問題を解決するために,各レバーの有効性をまず分析する。広告主がチャネル毎のROIを最適化するだけでは、全体の変換がグローバルな問題で得られるものよりも任意に悪化することを示します。さらに,チャネル当たりの予算を最適化するだけで,広告主がグローバルに最適な変換を実現できることを示す。この発見を踏まえ、広告主が各チャンネルでの広告入札に関する情報に制限がある実世界のシナリオと、チャネル調達広告の仕組みを模倣した、帯域単位の予算を生成する効率的な学習アルゴリズムを提案し、その結果の変換は、グローバルな最適問題のものと近似する。最後に、当社の結果は、広告主の代理として、チャンネルがインプレッションを得られるシングルイットとマルチイットのオークションの両方に当てはまると論じる。

In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and a per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the globally optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and on how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.

翻訳日:2023-06-17 02:14:30 公開日:2023-06-14

# MADDPGにおけるGumbel-Softmaxの再検討

Revisiting the Gumbel-Softmax in MADDPG ( http://arxiv.org/abs/2302.11793v2 )

ライセンス: Link先を確認

Callum Rhys Tilbury, Filippos Christianos, Stefano V. Albrecht

(参考訳) MADDPGはマルチエージェント強化学習(MARL)におけるアルゴリズムであり、一般的な単エージェント法であるDDPGをマルチエージェントシナリオに拡張する。 DDPGは、状態-作用値関数の勾配が存在する連続的な行動空間向けに設計されたアルゴリズムである。このアルゴリズムが離散作用空間で動作するためには、離散勾配推定を行う必要がある。 maddpgでは、gumbel-softmax (gs) 推定器が使用されている -- 離散分布を同様の連続分布に緩和する再パラメータ化である。しかし、この手法は統計的に偏りがあり、最近のMARLベンチマークでは、このバイアスにより、アクション空間が離散的なグリッドワールド環境でのMADDPGの性能が低下することが示唆されている。幸いにもGSの代替品は数多く存在し、幅広い特性を誇っている。本稿では,これらの選択肢のいくつかを探索し,離散グリッドワールドシナリオのためのMADDPGに統合する。さまざまなパフォーマンス指標に対する対応する影響を計測して分析する。提案した推定器の1つは、いくつかのタスクにおいて元のGSよりもはるかに優れた性能を示し、最大で55%高いリターンを達成し、より高速な収束を実現している。

MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.

翻訳日:2023-06-17 02:06:55 公開日:2023-06-14
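
The abstract above describes the Gumbel-Softmax (GS) estimator as a reparameterisation that relaxes a discrete distribution into a similar continuous one. A minimal sketch of drawing one relaxed sample follows, using only the Python standard library. It illustrates the standard GS relaxation only, not any of the alternative estimators the paper evaluates; the function name and the default `temperature=1.0` are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed (continuous) sample from a categorical distribution.

    Adds Gumbel(0, 1) noise to each logit, divides by the temperature, and
    applies a softmax. Lower temperatures push the output toward one-hot.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

The output is a probability vector rather than a hard action, which is what lets gradients flow through the sampling step; it is also the source of the bias the abstract mentions.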

# MalProtect:MLベースのマルウェア検出における逆クエリ攻撃に対するステートフル防御

MalProtect: Stateful Defense Against Adversarial Query Attacks in ML-based Malware Detection ( http://arxiv.org/abs/2302.10739v2 )

ライセンス: Link先を確認

Aqib Rashid and Jose Such

(参考訳) MLモデルは、敵対的クエリ攻撃に対して脆弱であることが知られている。これらの攻撃では、出力以外にターゲットモデルに関する知識を持たないまま、クエリが特定のクラスに向けて反復的に摂動される。リモートホスト型ML分類モデルとMachine-Learning-as-a-Serviceプラットフォームの普及は、クエリ攻撃がこれらのシステムのセキュリティに現実の脅威をもたらすことを意味する。これに対処するため、システムで受信されたクエリのシーケンスを監視し分析することで、クエリ攻撃の検出と敵対的サンプルの生成を防止する、ステートフルな防御が提案されている。近年、いくつかのステートフルな防御が提案されている。しかし、これらの防御は、他の領域では有効かもしれない類似性または分布外検出の手法のみに依存している。マルウェア検出領域では、敵対的サンプルを生成する方法は本質的に異なるため、そのような検出機構は著しく効果が低い。そこで本研究では,マルウェア検出領域におけるクエリ攻撃に対するステートフルな防御であるMalProtectを提案する。 MalProtectはいくつかの脅威指標を使用して攻撃を検出する。以上の結果から,Android および Windows マルウェアでは,さまざまな攻撃シナリオにおいて,敵対的クエリ攻撃の回避率を80%以上削減できることがわかった。この種の最初の評価では、MalProtectは、特に最大の敵対的脅威下で、以前のステートフルな防御よりも優れていることを示す。

ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model besides its output. The prevalence of remotely-hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To deal with this, stateful defenses have been proposed to detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. Several stateful defenses have been proposed in recent years. However, these defenses rely solely on similarity or out-of-distribution detection methods that may be effective in other domains. In the malware detection domain, the methods to generate adversarial examples are inherently different, and therefore we find that such detection mechanisms are significantly less effective. Hence, in this paper, we present MalProtect, which is a stateful defense against query attacks in the malware detection domain. MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by 80+% in Android and Windows malware, across a range of attacker scenarios. In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.

翻訳日:2023-06-17 02:06:11 公開日:2023-06-14

# ディープニューラルネットワークにおけるショートカット学習の取り組み--解釈可能なモデルによる反復的アプローチ

Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models ( http://arxiv.org/abs/2302.10289v6 )

ライセンス: Link先を確認

Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich

(参考訳) 概念に基づく解釈可能なモデルを用いてショートカット学習を緩和する。既存の方法には解釈性がない。 Blackboxから始めて、解釈可能な専門家の混合 (MoIE) と残差ネットワーク (residual network) を反復的に切り出す (carve out)。各専門家は、FOL(First Order Logic)を使用してデータのサブセットを説明する。サンプルを説明しながら、偏りのあるBB由来のMoIEからFOLがショートカットを効果的に検出する。 BBをメタデータ正規化(MDN)で微調整すると、ショートカットがなくなる。微調整済みBB由来のMoIEからのFOLはショートカットの除去を検証する。実験の結果,MoIEは元のBBの精度を損なわず,ショートカットを効果的に除去することがわかった。

We use concept-based interpretable models to mitigate shortcut learning. Existing methods lack interpretability. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.

翻訳日:2023-06-17 02:05:48 公開日:2023-06-14

# 多変量システミックリスク尺度と深層学習アルゴリズムによるその計算

Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms ( http://arxiv.org/abs/2302.10183v2 )

ライセンス: Link先を確認

Alessandro Doldi, Yichen Feng, Jean-Pierre Fouque, Marco Frittelli

(参考訳) 本研究では,多変量ユーティリティ関数によって定義されるシステム的短絡リスク尺度の計算のための深層学習に基づくアルゴリズムを提案する。本稿では,主観的最適性と関連するリスク割り当ての公平性に着目し,重要な理論的側面について論じる。私たちが提供しているアルゴリズムは、予備最適化の学習、二重表現の最適化、およびそれに対応する公正なリスク割り当てを可能にします。アルゴリズムをベンチマークモデルと比較し,一対の指数的ユーティリティ関数をベースとして,明示的な公式を提供するアルゴリズムを検証した。また、明示的な公式が得られない場合においても収束の証拠を示す。

In this work we propose deep learning-based algorithms for the computation of systemic shortfall risk measures defined via multivariate utility functions. We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations. The algorithms we provide allow for learning primal optimizers, optima for the dual representation and corresponding fair risk allocations. We test our algorithms by comparison to a benchmark model, based on a paired exponential utility function, for which we can provide explicit formulas. We also show evidence of convergence in a case for which explicit formulas are not available.

翻訳日:2023-06-17 02:05:36 公開日:2023-06-14

# 人間の選好を用いた言語モデルの事前学習

Pretraining Language Models with Human Preferences ( http://arxiv.org/abs/2302.08582v2 )

ライセンス: Link先を確認

Tomasz Korbak and Kejian Shi and Angelica Chen and Rasika Bhalerao and Christopher L. Buckley and Jason Phang and Samuel R. Bowman and Ethan Perez

(参考訳) 言語モデル(LM)はインターネット上のテキストを模倣するように事前学習されるが、そこにはLMが生成すれば人間の選好に反する内容、すなわち虚偽、攻撃的なコメント、個人を特定できる情報、低品質あるいはバグのあるコードなどが含まれる。本稿では、人間の選好に沿ったテキストを生成するよう導く、LM事前学習の代替目的関数を検討する。人間のフィードバックを用いた事前学習のための5つの目的関数を3つのタスクでベンチマークし、それらが事前学習済みLMのアライメントと能力のトレードオフに与える影響を調べる。検討した中でパレート最適かつ単純な手法は条件付き学習、すなわち報酬モデルが与える人間の選好スコアで条件付けたトークン分布の学習であった。条件付き学習は、プロンプトなしで生成する場合も、敵対的に選ばれたプロンプトを与えた場合も、望ましくないコンテンツの割合を最大で1桁減少させる。さらに条件付き学習は、タスク固有のファインチューニングの前後いずれにおいても、標準的なLM事前学習の下流タスク性能を維持する。人間のフィードバックを用いた事前学習は、標準的なLM事前学習の後にフィードバックによるファインチューニングを行う場合、すなわち望ましくない振る舞いを学習してから忘れさせる場合よりも、はるかに高い選好満足度をもたらす。これらの結果は、LMの事前学習では模倣学習を超えて、学習の開始時点から人間の選好を取り入れるべきであることを示唆している。

Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
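The conditional-training idea in the abstract above can be illustrated with a small data-preparation sketch. It is only an illustration under stated assumptions: the control tokens `<|good|>` and `<|bad|>`, the threshold, and the keyword-based `toy_reward` are hypothetical stand-ins, not details taken from the abstract, and a real setup would score text with a learned reward model.

```python
from typing import Callable, List

# Hypothetical control tokens; the names are assumptions for illustration.
GOOD, BAD = "<|good|>", "<|bad|>"


def tag_segments(segments: List[str],
                 reward: Callable[[str], float],
                 threshold: float = 0.0) -> str:
    """Prefix each text segment with a control token chosen by its reward score."""
    tagged = []
    for seg in segments:
        token = GOOD if reward(seg) >= threshold else BAD
        tagged.append(token + seg)
    return "".join(tagged)


def toy_reward(text: str) -> float:
    """Stand-in for a reward model: penalise one placeholder word."""
    return -1.0 if "darn" in text.lower() else 1.0


corpus = ["The sky is blue. ", "Darn this code! "]

# Text the LM would be pretrained on: every segment carries its preference tag.
training_text = tag_segments(corpus, toy_reward)

# At generation time the prompt is conditioned on the "good" tag.
prompt = GOOD + "Write a docstring for this function."
```

The design point is that the model still sees all of the data, but learns which tokens follow which preference tag, so conditioning on the "good" tag at inference steers generation.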

翻訳日:2023-06-17 02:04:27 公開日:2023-06-14

# iSAGE: データストリームのオンライン説明のためのSAGEのインクリメンタルバージョン

iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams ( http://arxiv.org/abs/2303.01181v2 )

ライセンス: Link先を確認

Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier

(参考訳) SAGEのような一般的な特徴量重要度指標を含め、説明可能な人工知能(XAI)の既存手法は、主にバッチ学習のシナリオに限定されている。しかし機械学習は、データが継続的に到着し、学習をオンラインで行わなければならない動的な環境に適用されることが多い。そこで本研究では、時間・メモリ効率に優れたSAGEのインクリメンタル版であるiSAGEを提案する。iSAGEは、モデルの変化やデータ生成過程のドリフトに対応できる。さらに、特徴量間の依存関係を断ち切る(介入的)手法と保持する(観察的)手法という、効率的な特徴量除去手法も提供する。加えて、提案する説明手法を形式的に解析し、iSAGEがSAGEと同様の理論的性質を満たすことを示す。最後に、定評のあるデータセットと概念ドリフトを伴うデータストリームに基づく徹底した実験分析により、提案手法を評価する。

Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.

翻訳日:2023-06-17 01:57:30 公開日:2023-06-14

# GNOT: 作用素学習のための汎用ニューラルオペレータートランスフォーマー

GNOT: A General Neural Operator Transformer for Operator Learning ( http://arxiv.org/abs/2302.14376v3 )

ライセンス: Link先を確認

Zhongkai Hao, Zhengyi Wang, Hang Su, Chengyang Ying, Yinpeng Dong, Songming Liu, Ze Cheng, Jian Song, Jun Zhu

(参考訳) 偏微分方程式(PDE)の解作用素の学習は、機械学習における重要な問題である。しかし実用的な応用における作用素学習には、不規則メッシュ、複数の入力関数、PDEの解の複雑さなど、いくつかの課題がある。これらの課題に対処するため、作用素学習のためのスケーラブルで効果的なTransformerベースのフレームワークであるGeneral Neural Operator Transformer(GNOT)を提案する。新たに設計した不均一正規化アテンション層により、複数の入力関数や不規則メッシュを柔軟に扱うことができる。また、マルチスケール問題を解くために、ソフトな領域分割とみなせる幾何学的ゲーティング機構を導入する。Transformerアーキテクチャの大きなモデル容量により、大規模データセットや実用的な問題へのスケーリングが可能になる。異なる領域の複数の難しいデータセットで広範な実験を行い、代替手法と比べて顕著な改善を達成した。コードとデータは https://github.com/thu-ml/GNOT で公開されている。

Learning partial differential equations' (PDEs) solution operators is an essential problem in machine learning. However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs' solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators. By designing a novel heterogeneous normalized attention layer, our model is highly flexible to handle multiple input functions and irregular meshes. Besides, we introduce a geometric gating mechanism which could be viewed as a soft domain decomposition to solve the multi-scale problems. The large model capacity of the transformer architecture grants our model the possibility to scale to large datasets and practical problems. We conduct extensive experiments on multiple challenging datasets from different domains and achieve a remarkable improvement compared with alternative methods. Our code and data are publicly available at https://github.com/thu-ml/GNOT.

翻訳日:2023-06-17 01:57:15 公開日:2023-06-14

# 胸部X線画像における知識強化型視覚言語事前学習

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images ( http://arxiv.org/abs/2302.14042v3 )

ライセンス: Link先を確認

Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

(参考訳) 大規模データで事前学習されたマルチモーダル基盤モデルは、自然言語理解や視覚認識で成功を収めているが、医療タスクのきめ細かな性質とドメイン知識への高い要求のため、医療領域での利用はまだ限られている。この課題に対処するため、既存の医学領域の知識を活用し、対になった胸部X線画像と放射線レポートを用いた視覚言語事前学習を導く、Knowledge-enhanced Auto Diagnosis(KAD)という新しい手法を提案する。4つの外部X線データセットでKADを評価し、そのゼロショット性能が完全教師ありモデルに匹敵するだけでなく、5つの病変のうち3つにおいて、3名の専門放射線科医の平均を統計的に有意に上回ることを示す。さらに、少数ショットのアノテーションが利用できる場合、KADはファインチューニング設定で既存のすべての手法を上回り、さまざまな臨床シナリオへの応用可能性を示している。

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.

翻訳日:2023-06-17 01:56:41 公開日:2023-06-14

# 正則化動的計画法による楽観的計画

Optimistic Planning by Regularized Dynamic Programming ( http://arxiv.org/abs/2302.14004v3 )

ライセンス: Link先を確認

Antoine Moulin, Gergely Neu

(参考訳) 本稿では、それ以外は標準的な近似価値反復の更新に正則化を加えるという考えに基づき、無限ホライズン割引マルコフ決定過程(MDP)における楽観的計画の新しい手法を提案する。この手法により、近似動的計画法の既存の解析で通常必要とされる縮小性や単調性の議論を回避でき、特に線形関数近似を伴うMDPにおいて最小二乗法で推定した近似遷移関数を利用できる。本手法を用いて、表形式MDPにおける既知の保証を再現するとともに、単一の経験ストリームから割引線形混合MDPの準最適方策を学習する計算効率の良いアルゴリズムを与え、それがほぼ最適な統計的保証を達成することを示す。

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
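For readers unfamiliar with the baseline that the abstract modifies, the following is a minimal sketch of plain value iteration in a tabular discounted MDP. It deliberately omits the regularization term and the optimism mechanism, because the abstract does not specify their form; the toy MDP and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def value_iteration(P: Transitions, gamma: float, iters: int = 500) -> Dict[int, float]:
    """Repeatedly apply the Bellman optimality update to every state."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            for s in P
        }
    return V


# Toy MDP (hypothetical): state 1 pays reward 1 forever; state 0 can move there.
toy_mdp: Transitions = {
    0: {0: [(1.0, 0, 0.0)],   # stay in state 0, no reward
        1: [(1.0, 1, 0.0)]},  # move to state 1, no immediate reward
    1: {0: [(1.0, 1, 1.0)]},  # self-loop with reward 1
}

V = value_iteration(toy_mdp, gamma=0.9)
```

In this toy problem the fixed point is V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * 10 = 9. Convergence of this plain update is usually argued through the contraction property of the Bellman operator, which is the kind of argument the abstract says the regularized variant avoids.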

翻訳日:2023-06-17 01:56:22 公開日:2023-06-14

# DiffusioNeRF: Denoising Diffusion Modelを用いた正則化ニューラルラジアンス場

DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models ( http://arxiv.org/abs/2302.12231v2 )

ライセンス: Link先を確認

Jamie Wynn, Daniyar Turmukhambetov

(参考訳) 良好な条件下では、Neural Radiance Fields(NeRF)は新規視点合成タスクで印象的な結果を示している。NeRFは、学習用ビューとシーンの微分可能なレンダリングとの測光誤差を最小化することで、シーンの色と密度の場を学習する。十分な数のビューで学習すれば、NeRFは任意のカメラ位置から新しいビューを生成できる。しかし、シーンの形状と色の場は制約が著しく不足しており、特に少数の入力ビューで学習した場合にアーティファクトが生じうる。この問題を軽減するため、デノイジング拡散モデル(DDM)を用いてシーンの形状と色に関する事前分布を学習する。我々のDDMは合成データセットHypersimのRGBDパッチで学習されており、色と深度のパッチの同時確率分布の対数の勾配を予測できる。RGBDパッチの事前分布の対数勾配が、シーンの形状と色の正則化に役立つことを示す。NeRFの学習中、ランダムなRGBDパッチをレンダリングし、推定された対数尤度の勾配を色と密度の場に逆伝播する。最も関連性の高いデータセットであるLLFFでの評価では、学習した事前分布により、再構成された形状の品質と新規視点への汎化が向上した。DTUでの評価では、NeRF系手法の中で再構成品質が向上した。

Under good conditions, Neural Radiance Fields (NeRFs) have shown impressive results on novel view synthesis tasks. NeRFs learn a scene's color and density fields by minimizing the photometric discrepancy between training views and differentiable renderings of the scene. Once trained from a sufficient set of views, NeRFs can generate novel views from arbitrary camera positions. However, the scene geometry and color fields are severely under-constrained, which can lead to artifacts, especially when trained with few input views. To alleviate this problem we learn a prior over scene geometry and color, using a denoising diffusion model (DDM). Our DDM is trained on RGBD patches of the synthetic Hypersim dataset and can be used to predict the gradient of the logarithm of a joint probability distribution of color and depth patches. We show that these gradients of logarithms of RGBD patch priors serve to regularize geometry and color of a scene. During NeRF training, random RGBD patches are rendered and the estimated gradient of the log-likelihood is backpropagated to the color and density fields. Evaluations on LLFF, the most relevant dataset, show that our learned prior achieves improved quality in the reconstructed geometry and improved generalization to novel views. Evaluations on DTU show improved reconstruction quality among NeRF methods.

翻訳日:2023-06-17 01:55:06 公開日:2023-06-14

# MedNeXt: 医用画像セグメンテーションのためのConvNetのTransformer駆動スケーリング

MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation ( http://arxiv.org/abs/2303.09975v3 )

ライセンス: Link先を確認

Saikat Roy, Gregor Koehler, Constantin Ulrich, Michael Baumgartner, Jens Petersen, Fabian Isensee, Paul F. Jaeger, Klaus Maier-Hein

(参考訳) 医用画像セグメンテーションにTransformerベースのアーキテクチャを採用することへの関心が急速に高まっている。しかし、大規模なアノテーション付き医用データセットが不足しているため、自然画像と同等の性能を達成することは難しい。対照的に、畳み込みネットワークは帰納バイアスが強く、そのため高い性能まで容易に学習できる。近年、ConvNeXtアーキテクチャは、Transformerブロックを模倣することで標準的なConvNetの近代化を試みた。本研究ではこれを改良し、データの乏しい医療現場の課題に合わせた、現代的でスケーラブルな畳み込みアーキテクチャを設計する。Transformerに着想を得た大規模カーネルのセグメンテーションネットワークであるMedNeXtを導入する。MedNeXtは次の4点を特徴とする。1) 医用画像セグメンテーションのための、完全にConvNeXtで構成された3Dエンコーダ・デコーダネットワーク。2) スケールをまたいで意味的な豊かさを保つ、残差付きConvNeXtアップサンプリング・ダウンサンプリングブロック。3) 小さなカーネルのネットワークをアップサンプリングしてカーネルサイズを段階的に大きくし、限られた医用データでの性能飽和を防ぐ新しい手法。4) MedNeXtの複数の次元(深さ、幅、カーネルサイズ)にわたる複合スケーリング。これにより、CTとMRIのモダリティおよびさまざまなデータセットサイズの4つのタスクで最先端の性能を達成し、医用画像セグメンテーションのための近代化された深層アーキテクチャを示す。コードは https://github.com/MIC-DKFZ/MedNeXt で公開されている。

There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets makes achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at: https://github.com/MIC-DKFZ/MedNeXt.

翻訳日:2023-06-17 01:47:39 公開日:2023-06-14

# スピンネットワークにおける励起伝達制御のロバスト性評価と統一化

Analyzing and Unifying Robustness Measures for Excitation Transfer Control in Spin Networks ( http://arxiv.org/abs/2303.09518v2 )

ライセンス: Link先を確認

S. P. O'Neil, I. Khalid, A. A. Rompokos, C. A. Weidner, F. C. Langbein, S. G. Schirmer, E. A. Jonckheere

(参考訳) 量子制御の最近の業績は、量子通信、コンピューティング、センシングのアプリケーションのためのコントローラを設計するための高度な技術を生み出した。しかし、そのようなシステムのノイズや不確実性への感受性は、量子デバイスの完全なポテンシャルを実現するために、これらの条件下で効果的に機能する堅牢なコントローラを必要とする。時間領域のログ感度と最近導入されたロバストネス不忠実度測定(RIM)は、量子システムにおけるコントローラのロバストネスを定量化する2つの方法である。前者は分析的に見つかるが、後者はモンテカルロサンプリングを必要とする。本研究は, スピン鎖および環における単一励起伝達の堅牢性を評価するために, 対数感度とRIMの相関関係について検討した。予測される誤差の差分感度は, RIMの差分感度と一致し, 予測値が誤差確率分布上にあることを示す。統計的解析により、対数感度とRIMは差分感度を介してリンクされ、差分感度とRIMは極めて一致していることが示された。様々な現実的なシナリオにおけるコントローラーの堅牢性を評価するための2つの手段(分析的手法とサンプリング的手法)の統合は、量子コントローラの堅牢性をモデル化し評価するための様々なツールを統一する第一歩となる。

Recent achievements in quantum control have resulted in advanced techniques for designing controllers for applications in quantum communication, computing, and sensing. However, the susceptibility of such systems to noise and uncertainties necessitates robust controllers that perform effectively under these conditions to realize the full potential of quantum devices. The time-domain log-sensitivity and a recently introduced robustness infidelity measure (RIM) are two means to quantify controller robustness in quantum systems. The former can be found analytically, while the latter requires Monte-Carlo sampling. In this work, the correlation between the log-sensitivity and the RIM for evaluating the robustness of single excitation transfer fidelity in spin chains and rings in the presence of dephasing is investigated. We show that the expected differential sensitivity of the error agrees with the differential sensitivity of the RIM, where the expectation is over the error probability distribution. Statistical analysis also demonstrates that the log-sensitivity and the RIM are linked via the differential sensitivity, and that the differential sensitivity and RIM are highly concordant. This unification of two means (one analytic and one via sampling) to assess controller robustness in a variety of realistic scenarios provides a first step in unifying various tools to model and assess robustness of quantum controllers.

翻訳日:2023-06-17 01:46:49 公開日:2023-06-14

# 離散変調連続可変量子鍵分布のセキュリティ

Security of discrete-modulated continuous-variable quantum key distribution ( http://arxiv.org/abs/2303.09255v2 )

ライセンス: Link先を確認

Stefan Bäuml, Carlos Pascual García, Victoria Wright, Omar Fawzi, Antonio Acín

(参考訳) 離散変調による連続変数量子鍵配送は、広く入手可能な光学素子と既存の通信インフラを用いて量子物理に基づくセキュリティを提供できる可能性がある。その実装はガウス変調に基づくプロトコルよりも大幅に単純であるが、コヒーレント攻撃に対する有限サイズのセキュリティを証明することは難しい。本研究では、これまで離散変数の設定で用いられてきたエントロピー蓄積定理を適用し、4つのコヒーレント状態とヘテロダイン検出を用いる離散変調量子鍵配送プロトコルについて、コヒーレント攻撃に対する有限サイズのセキュリティを証明する。そのために、従来の手法とは異なり、すべての情報が離散化されたプロトコルを考える。まず、現実的な光子数カットオフの仮定の下で、その漸近レートに対する限界を導く。次に、エントロピー蓄積を用いて、この限界を有限サイズのセキュリティ証明へと拡張する。解析の結果、最大100 kmの距離に対してラウンドあたり $10^{-4}$ から $0.1$ ビットの範囲の漸近レートが得られ、有限サイズの場合には現実的なパラメータの下で、数十kmの距離と $n=10^{12}$ ラウンドの後に $10$ Gビット程度の秘密鍵が得られる。

Continuous variable quantum key distribution with discrete modulation has the potential to provide quantum physical security using widely available optical elements and existing telecom infrastructure. While their implementation is significantly simpler than that for protocols based on Gaussian modulation, proving their finite-size security against coherent attacks poses a challenge. In this work we apply the entropy accumulation theorem, a tool that has previously been used in the setting of discrete variables, to prove finite-size security against coherent attacks for a discrete-modulated quantum key distribution protocol involving four coherent states and heterodyne detection. To do so, and contrary to previous approaches, we consider a protocol in which all the information is discretised. We first bound its asymptotic rate under a realistic photon number cutoff assumption. This bound is then upgraded into a finite-size security proof using entropy accumulation. Our analysis provides asymptotic rates in the range of $10^{-4}$ to $0.1$ bits per round for distances up to a hundred kilometres, while in the finite case and for realistic parameters, we obtain on the order of $10$ Gbits of secret key after $n=10^{12}$ rounds and distances of a few tens of kilometres.
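As a quick sanity check on the figures quoted in the abstract, the finite-size result can be converted into a per-round rate and set against the quoted asymptotic range. This is only order-of-magnitude arithmetic: the abstract gives the finite-size figure as "of the order of" 10 Gbits, and the two regimes are stated for different distances, so the comparison is indicative rather than exact. Variable names are illustrative.

```python
# Figures as stated in the abstract (orders of magnitude only).
secret_key_bits = 10e9   # about 10 Gbits of secret key
rounds = 1e12            # n = 10^12 rounds

# Average secret-key yield per round in the finite-size regime.
finite_rate = secret_key_bits / rounds

# Asymptotic range quoted in the abstract, in bits per round.
asymptotic_low, asymptotic_high = 1e-4, 0.1

within_asymptotic_range = asymptotic_low <= finite_rate <= asymptotic_high
print(f"finite-size rate: {finite_rate:.3g} bits per round")
```

The finite-size figure thus corresponds to roughly one secret bit per hundred rounds.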

翻訳日:2023-06-17 01:46:26 公開日:2023-06-14

# 説明責任のあるテキスト・ビジュアルチャットは画像再生成において人間の指示を拒否することを学習する

Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation ( http://arxiv.org/abs/2303.05983v2 )

ライセンス: Link先を確認

Zhiwei Zhang, Yuliang Liu

(参考訳) 近年のChatGPTとGPT-4の成功により、マルチモーダル対話システムに広く注目が集まっている。しかし学術コミュニティには、テキスト・ビジュアルチャットタスクにおける視覚言語モデル(VLM)のマルチモーダル生成能力を検証できるデータセットが不足している。本稿では、合成データセットCLEVR-ATVC(620K)と手作業で撮影したFruit-ATVC(50K)という2つの新しいマルチモーダルデータセットを構築する。いずれも視覚とテキストの両方を入出力に持つ。さらに、言語ベースのChatGPTの会話と同様に、マルチモーダルシステムが人間の要求を拒否できる(すなわち説明責任を示せる)ようにするため、特定のルールを策定し、教師信号としてデータセットに組み込む。これにより、学習したVLMは視覚的・テキスト的な推論の後に可否を回答し、人間の指示を実行できない理由を言語で説明できる。提案手法では、画像オートエンコーダと自己回帰Transformerをスクラッチから学習するための2段階の学習手順を提案する。第1段階では離散変分オートエンコーダ(dVAE)により各画像を短いトークン列に圧縮し、第2段階ではそれをテキストトークンと連結した単一のデータストリームをデコーダベースのTransformerに入力して、視覚的な再生成とテキストによるフィードバックを生成する。再生成画像の品質、回答精度、および不確実性や不完全なユーザクエリに直面した場合のモデルの振る舞いの観点から、実験結果を包括的に分析する。本研究の探究と知見が、テキスト・ビジュアル生成モデルの説明責任に関する有益な洞察に貢献することを期待する。

The recent success of ChatGPT and GPT-4 has drawn widespread attention to multimodal dialogue systems. However, the academic community lacks a dataset that can validate the multimodal generation capabilities of Visual Language Models (VLMs) in textual-visual chat tasks. In this paper, we construct two new multimodal datasets: the synthetic CLEVR-ATVC dataset (620K) and the manually pictured Fruit-ATVC dataset (50K), both featuring visual and text-based inputs and outputs. Additionally, to enable the multimodal system to reject human requests (i.e., demonstrate accountability), as in language-based ChatGPT conversations, we develop and incorporate specific rules into the datasets as supervisory signals. This allows the trained VLM to provide a yes or no answer after visual and textual reasoning, accompanied by a language explanation as to why the human instruction cannot be executed. In our method, we propose a two-stage training procedure to train the image auto-encoder and auto-regressive transformer from scratch. The first stage involves a discrete variational autoencoder (dVAE) to compress each image into short tokens, which are then concatenated with text tokens as a single data stream to be fed into the decoder-based transformer for generating visual re-creation and textual feedback in the second stage. We provide comprehensive analyses of experimental results in terms of re-created image quality, answer accuracy, and the model behavior when faced with uncertainty and imperfect user queries. We hope our explorations and findings contribute valuable insights regarding the accountability of textual-visual generative models.

翻訳日:2023-06-17 01:45:24 公開日:2023-06-14

# 確率的にトリガーされるアームを持つ文脈付き組合せバンディット

Contextual Combinatorial Bandits with Probabilistically Triggered Arms ( http://arxiv.org/abs/2303.17110v2 )

ライセンス: Link先を確認

Xutong Liu, Jinhang Zuo, Siwei Wang, John C.S. Lui, Mohammad Hajiesmaili, Adam Wierman, Wei Chen

(参考訳) 本研究では、文脈付きカスケードバンディットや文脈付き影響最大化バンディットなど幅広い応用を捉える各種の平滑性条件の下で、確率的にトリガーされるアームを持つ文脈付き組合せバンディット(C$^2$MAB-T)を研究する。トリガー確率変調(TPM)条件の下では、C$^2$-UCB-Tアルゴリズムを考案し、$\tilde{O}(d\sqrt{KT})$ のリグレット上界を達成する新しい解析を提案する。これにより、指数的に大きくなりうる因子 $O(1/p_{\min})$ が取り除かれる。ここで $d$ は文脈の次元、$p_{\min}$ は任意のアームがトリガーされうる最小の正の確率、バッチサイズ $K$ はラウンドごとにトリガーされうるアームの最大数である。分散変調(VM)条件またはトリガー確率・分散変調(TPVM)条件の下では、新しい分散適応アルゴリズムVAC$^2$-UCBを提案し、バッチサイズ $K$ に依存しないリグレット上界 $\tilde{O}(d\sqrt{T})$ を導出する。有用な副産物として、我々の解析手法と分散適応アルゴリズムはCMAB-TおよびC$^2$MABの設定にも適用でき、既存の結果を改善する。また、合成データセットと実世界データセットにおいて、ベンチマークアルゴリズムと比べて性能が向上することを示す実験も含む。

We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
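To make the gap between the two regret bounds in the abstract concrete, the leading terms can be compared directly. This ignores the logarithmic factors hidden by the $\tilde{O}$ notation and any constants, and the value $K = 100$ is only an illustrative assumption:

$$
d\sqrt{KT} \;=\; \sqrt{K}\cdot d\sqrt{T},
\qquad\text{so for } K = 100:\quad d\sqrt{KT} \;=\; 10\, d\sqrt{T}.
$$

Under these simplifications, the bound stated for the VM and TPVM conditions is smaller than the TPM bound by a factor of $\sqrt{K}$, which is what "independent of the batch-size $K$" amounts to in the leading term.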

翻訳日:2023-06-17 01:39:07 公開日:2023-06-14

# Visually Wired NFTs: 非代替性トークンにおけるインスピレーションの役割を探る

Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens ( http://arxiv.org/abs/2303.17031v3 )

ライセンス: Link先を確認

Lucio La Cava, Davide Costa, Andrea Tagarelli

(参考訳) 非代替性トークン(NFT)への熱狂は無数のクリエイターを引きつけ、多くの創作プロセスと同様に、潜在的あるいは明示的なインスピレーションに駆動されたデジタル資産のビッグバンをもたらした。本研究は、Vision Transformerとグラフベースのモデリングを活用し、長年にわたるNFT間の視覚的インスピレーション現象を掘り下げる。目標は、視覚的インスピレーションネットワークを形作る主要な構造的特徴の解明、視覚的インスピレーションと資産のパフォーマンスとの相互関係の探索、インスピレーション過程に対する暗号資産の影響の調査、そしてNFT間のインスピレーション関係の説明である。分析の結果、インスピレーションの広がりが視覚的特徴空間の一時的な飽和をもたらしたこと、インスピレーションを与えるNFTと受けるNFTという二分法が財務パフォーマンスに影響すること、そして市場とインスピレーションの波の間に内在的な自己調整メカニズムが存在することが明らかになった。本研究は、Web3の進化をより広い視野で捉えるための出発点となりうる。

The fervor for Non-Fungible Tokens (NFTs) attracted countless creators, leading to a Big Bang of digital assets driven by latent or explicit forms of inspiration, as in many creative processes. This work exploits Vision Transformers and graph-based modeling to delve into visual inspiration phenomena between NFTs over the years. Our goals include unveiling the main structural traits that shape visual inspiration networks, exploring the interrelation between visual inspiration and asset performances, investigating crypto influence on inspiration processes, and explaining the inspiration relationships among NFTs. Our findings unveil how the pervasiveness of inspiration led to a temporary saturation of the visual feature space, the impact of the dichotomy between inspiring and inspired NFTs on their financial performance, and an intrinsic self-regulatory mechanism between markets and inspiration waves. Our work can serve as a starting point for gaining a broader view of the evolution of Web3.

翻訳日:2023-06-17 01:38:32 公開日:2023-06-14

# Quality-Diversity Transformer: Decision Transformerを用いた行動条件付き軌道の生成

The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers ( http://arxiv.org/abs/2303.16207v2 )

ライセンス: Link先を確認

Valentin Macé, Raphaël Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert

(参考訳) ニューロエボリューションの文脈において、Quality-Diversityアルゴリズムは、行動空間の定義に基づいて、多様で効率的な方策のレパートリーを生成するのに有効であることが示されている。このようなレパートリーを作ることで自然に生まれる目標は、要求に応じて特定の行動を実現することであり、これはレパートリーから対応する方策を実行することで達成できる。しかし、不確実な環境では2つの問題が生じる。第一に、方策は頑健性と再現性を欠くことがあり、わずかに異なる条件下での複数のエピソードが大きく異なる行動をもたらすことが多い。第二に、レパートリーは離散的であるため、解は不連続に変化する。本稿では、2つの仕組みに基づいて行動条件付きの軌道生成を実現する新しい手法を提案する。1つ目はMAP-Elites Low-Spread(ME-LS)であり、解の選択を行動空間において最も一貫性の高いものに制限する。2つ目はQuality-Diversity Transformer(QDT)であり、連続的な行動記述子で条件付けられたTransformerベースのモデルである。QDTはME-LSのレパートリーの方策が生成したデータセットで学習し、目標の行動を達成する行動系列を自己回帰的に生成することを学ぶ。実験の結果、ME-LSは一貫性があり頑健な方策を生成し、QDTと組み合わせることで、要求に応じた多様な行動を高精度で達成できる単一の方策が得られることが示された。

In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.

翻訳日:2023-06-17 01:38:07 公開日:2023-06-14

# LLaMA-Adapter: ゼロ初期化アテンションによる言語モデルの効率的なファインチューニング

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention ( http://arxiv.org/abs/2303.16199v2 )

ライセンス: Link先を確認

Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao

(参考訳) LLaMAを命令追従モデルへと効率よくファインチューニングする軽量な適応手法、LLaMA-Adapterを提案する。LLaMA-Adapterは52Kのself-instructデモンストレーションを用い、凍結したLLaMA 7Bモデルに学習可能なパラメータを1.2Mだけ追加し、8基のA100 GPUでのファインチューニングに要する時間は1時間未満である。具体的には、学習可能な適応プロンプトの集合を採用し、上位のTransformer層で単語トークンの前に付加する。次に、ゼロゲーティングを備えたゼロ初期化アテンション機構を提案する。これは、事前学習済みの知識を効果的に保持しながら、新しい命令の手がかりをLLaMAに適応的に注入する。効率的な学習により、LLaMA-Adapterは、7Bパラメータを完全にファインチューニングしたAlpacaに匹敵する高品質な応答を生成できる。言語による指示に加えて、本手法はマルチモーダルな命令にも容易に拡張でき、画像条件付きLLaMAモデルを学習することで、ScienceQAおよびCOCO Captionベンチマークで優れた推論性能を達成する。さらに、従来の視覚タスクや言語タスクで他の事前学習済みモデル(ViT、RoBERTa)をファインチューニングする際にもゼロ初期化アテンション機構を評価し、本手法の優れた汎化能力を示す。コードは https://github.com/OpenGVLab/LLaMA-Adapter で公開されている。

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
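The effect of zero gating described in the abstract can be shown with a toy single-query attention computation. This is a sketch under assumptions, not the released implementation: it assumes the attention scores for the prepended prompts and for the ordinary word tokens are normalised separately, and that the prompt part is scaled by `tanh(gate)` with the gate starting at zero. Values are scalars purely to keep the example short, and all names are illustrative.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(prompt_scores: List[float], token_scores: List[float],
                    prompt_values: List[float], token_values: List[float],
                    gate: float) -> float:
    """Attention output for one query position with a gated prompt branch.

    Assumption: prompt and token scores are normalised separately, and the
    prompt weights are multiplied by tanh(gate).
    """
    prompt_weights = [math.tanh(gate) * w for w in softmax(prompt_scores)]
    token_weights = softmax(token_scores)
    prompt_part = sum(w * v for w, v in zip(prompt_weights, prompt_values))
    token_part = sum(w * v for w, v in zip(token_weights, token_values))
    return prompt_part + token_part


token_scores, token_values = [0.2, 1.0, -0.5], [1.0, 2.0, 3.0]
prompt_scores, prompt_values = [0.7, 0.1], [5.0, -4.0]

# Output of the frozen model alone (no adaption prompts).
frozen_output = sum(w * v for w, v in zip(softmax(token_scores), token_values))

# With the gate at its zero initialisation the prompts contribute nothing.
at_init = gated_attention(prompt_scores, token_scores,
                          prompt_values, token_values, gate=0.0)

# Once training moves the gate away from zero, the prompts start to matter.
after_training = gated_attention(prompt_scores, token_scores,
                                 prompt_values, token_values, gate=1.0)
```

Because `tanh(0) = 0`, the adapted model reproduces the frozen model exactly at the start of training, which is one way to read the abstract's claim that pre-trained knowledge is preserved while new cues are injected gradually.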

翻訳日:2023-06-17 01:37:43 公開日:2023-06-14

# 単一光子に対する波動粒子双対性の確率論的考察

A probabilistic view of wave-particle duality for single photons ( http://arxiv.org/abs/2303.15185v3 )

ライセンス: Link先を確認

Andrea Aiello

(参考訳) 古典物理学から借用された概念の観点から量子力学を解釈する最も厄介な結果の1つは、いわゆる波動粒子双対性である。通常、波と粒子の双対性は、干渉実験における経路識別性とフリンジ可視性の相補性の観点から示される。そこで本研究では、波動の連続的性質と粒子の離散的特性との間に生じる新しい相補性を提案する。量子場理論の確率論的手法を用いて、同じ光のビーム内の波振幅と光子数を同時に測定することは、ある状況下では量子力学の法則によって禁止されることを示した。その結果は、「干渉計的双対性(interferometric duality)」という概念が、最終的にはより一般的な「連続対離散の双対性(continuous-vs-discrete duality)」という概念に置き換えられる可能性を示唆している。

One of the most puzzling consequences of interpreting quantum mechanics in terms of concepts borrowed from classical physics, is the so-called wave-particle duality. Usually, wave-particle duality is illustrated in terms of complementarity between path distinguishability and fringe visibility in interference experiments. In this work, we instead propose a new type of complementarity, that between the continuous nature of waves and the discrete character of particles. Using the probabilistic methods of quantum field theory, we show that the simultaneous measurement of the wave amplitude and the number of photons in the same beam of light is, under certain circumstances, prohibited by the laws of quantum mechanics. Our results suggest that the concept of "interferometric duality" could be eventually replaced by the more general one of "continuous-vs-discrete duality".

翻訳日:2023-06-17 01:37:18 公開日:2023-06-14

# CCL:LiDAR位置認識のための連続的コントラスト学習

CCL: Continual Contrastive Learning for LiDAR Place Recognition ( http://arxiv.org/abs/2303.13952v2 )

ライセンス: Link先を確認

Jiafeng Cui, Xieyuanli Chen

(参考訳) 場所認識は、ロボティクスや自動運転アプリケーションのループクロージングとグローバルローカライゼーションにおいて、不可欠かつ困難なタスクである。近年の深層学習技術の進歩により、LiDAR場所認識(LPR)の性能は大きく向上した。しかし、現在の深層学習ベースの手法は、汎化能力の低さと破滅的忘却という2つの大きな問題を抱えている。本稿では、破滅的忘却の問題に対処し、LPR手法の頑健性を全般的に高めるために、CCLと名付けた継続的コントラスト学習手法を提案する。CCLはコントラスト特徴プールを構築し、コントラスト損失を用いて、より転移しやすい場所表現を学習する。新しい環境に移行する際、CCLはコントラストメモリバンクを継続的に見直し、分布ベースの知識蒸留を適用することで、新しいデータから新しい場所の認識を学習し続けながら、過去のデータに対する検索能力を維持する。3つの異なるLPR手法を用いて、Oxford、MulRan、PNVの各データセットで提案手法を徹底的に評価した。実験の結果、CCLは異なる環境においてさまざまな手法の性能を一貫して改善し、最先端の継続学習手法を上回った。本手法の実装は https://github.com/cloudcjf/CCL で公開されている。

Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. The experimental results show that our CCL consistently improves the performance of different methods in different environments, outperforming the state-of-the-art continual learning method. The implementation of our method has been released at https://github.com/cloudcjf/CCL.
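The abstract mentions a contrastive loss without stating its exact form. As a hedged illustration, the sketch below implements a common choice, the InfoNCE loss, for a single anchor descriptor; whether CCL uses this particular formulation is an assumption, and the two-dimensional descriptors and the temperature value are toy values.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log-probability of picking the positive
    among the positive and all negatives, using dot-product similarity."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]

# Positive descriptor from the same place, negatives from other places.
loss_matching = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])

# Same vectors, but the "positive" is now a dissimilar descriptor.
loss_mismatched = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

The loss is close to zero when the anchor is most similar to its positive and grows when a negative is more similar, which pushes descriptors of the same place together and descriptors of different places apart.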

翻訳日:2023-06-17 01:37:04 公開日:2023-06-14

# OpenAGI: LLMがドメインエキスパートと出会ったとき

OpenAGI: When LLM Meets Domain Experts ( http://arxiv.org/abs/2304.04370v3 )

ライセンス: Link先を確認

Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang

(参考訳) 人間の知性は、複雑なタスクを解決するために、基本的なスキルを複雑なスキルへと組み立てる優れた能力を持っている。この能力は人工知能(AI)にとっても同様に重要であり、汎用人工知能(AGI)を追求するうえでは、大規模で包括的な知的モデルの開発に加えて、複雑なタスク解決のために様々なドメイン固有のエキスパートモデルを活用する能力をそうしたモデルに備えさせることが同様に重要であると主張する。近年の大規模言語モデル(LLM)の発展は驚くべき学習能力と推論能力を示しており、複雑なタスクを解決するために外部モデルを選択、合成、実行するコントローラとして有望である。本プロジェクトでは、オープンソースのAGI研究プラットフォームであるOpenAGIを開発する。OpenAGIは、複雑なマルチステップタスクを、タスク固有のデータセット、評価指標、拡張可能な多様なモデルとともに提供するよう設計されている。OpenAGIは複雑なタスクを自然言語クエリとして定式化し、これをLLMへの入力とする。LLMはその後、タスクに対処するためにOpenAGIが提供するモデルを選択し、合成し、実行する。さらに、タスク解決結果をフィードバックとして利用してLLMのタスク解決能力を向上させる、タスクフィードバックからの強化学習(RLTF)機構を提案する。したがって、LLMは複雑なタスクを解決するために様々な外部モデルを合成する役割を担い、RLTFはタスク解決能力を改善するためのフィードバックを提供し、自己改善AIのためのフィードバックループを可能にする。我々は、複雑なタスク解決のために様々なエキスパートモデルを操作するLLMというパラダイムが、AGIに向けた有望なアプローチであると信じている。コミュニティによるAGIの能力の長期的な改善と評価を促進するため、OpenAGIプロジェクトのコード、ベンチマーク、評価方法を https://github.com/agiresearch/OpenAGI でオープンソース化している。

Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI.

翻訳日:2023-06-17 01:28:16 公開日:2023-06-14

# Einstein-Podolsky-Rosen-Bohm実験:離散データ駆動アプローチ

Einstein-Podolsky-Rosen-Bohm experiments: a discrete data driven approach ( http://arxiv.org/abs/2304.03962v3 )

ライセンス: Link先を確認

Hans De Raedt, Mikhail I. Katsnelson, Manpreet S. Jattana, Vrinda Mehta, Madita Willsch, Dennis Willsch, Kristel Michielsen, Fengping Jin

(参考訳) 我々は、数学モデルから実験データへではなく、実験データから数学モデルへの一方向の橋を構築することで、後者で使われる記号に意味を付与することから生じる論争を回避できるという立場をとる。特に、この見方を採用することが、アインシュタイン-ポドルスキー-ローゼン-ボーム実験のための数学的モデルの構築と、その結果の解釈に新しい視点をもたらすことを示す。まず、4つの異なる条件下でアインシュタイン-ポドルスキー-ローゼン-ボーム実験を行って得られる4つの相関の値を制約する、新しいベル型不等式を証明する。この証明は、データを生成したと想定されるいかなる数学的モデルにも言及しないという意味で「モデルフリー」である。制約は、相関の値を変えずに4つのデータセット内のデータを並べ替えることで得られる四つ組(quadruple)の数にのみ依存する。四つ組の最大割合が1に等しい場合、これらの新しい不等式は、よく知られたベル型不等式のモデルフリー版に帰着する。モデルフリーであるため、実験データが後者を破ることは、4つのデータセットのすべてのデータを並べ替えて四つ組を構成できるわけではないことを意味する。さらに、モデルフリーな不等式であるため、実験データが後者を破ることは、このデータを生成すると仮定されたいかなる数学的モデルも適用できないことを意味するにすぎない。アインシュタイン-ポドルスキー-ローゼン-ボーム実験で得られたデータから出発し、これらのデータの主な特徴を記述する数学的モデルを、仮定するのではなく構築する。もっともらしい推論(plausible reasoning)の数学的枠組みを再現可能で頑健なデータに適用することで、量子論のいかなる概念も用いずに、一重項状態にある2つのスピン1/2の対象からなる系の相関の式が得られる。 (以下省略)

We take the point of view that building a one-way bridge from experimental data to mathematical models instead of the other way around avoids running into controversies resulting from attaching meaning to the symbols used in the latter. In particular, we show that adopting this view offers new perspectives for constructing mathematical models for and interpreting the results of Einstein-Podolsky-Rosen-Bohm experiments. We first prove new Bell-type inequalities constraining the values of the four correlations obtained by performing Einstein-Podolsky-Rosen-Bohm experiments under four different conditions. The proof is ``model-free'' in the sense that it does not refer to any mathematical model that one imagines to have produced the data. The constraints only depend on the number of quadruples obtained by reshuffling the data in the four data sets without changing the values of the correlations. These new inequalities reduce to model-free versions of the well-known Bell-type inequalities if the maximum fraction of quadruples is equal to one. Being model-free, a violation of the latter by experimental data implies that not all the data in the four data sets can be reshuffled to form quadruples. Furthermore, being model-free inequalities, a violation of the latter by experimental data only implies that any mathematical model assumed to produce this data does not apply. Starting from the data obtained by performing Einstein-Podolsky-Rosen-Bohm experiments, we construct instead of postulate mathematical models that describe the main features of these data. The mathematical framework of plausible reasoning is applied to reproducible and robust data, yielding without using any concept of quantum theory, the expression of the correlation for a system of two spin-1/2 objects in the singlet state. (truncated here)

翻訳日:2023-06-17 01:27:41 公開日:2023-06-14

# ASPEST: アクティブラーニングと選択的予測のギャップを埋める

ASPEST: Bridging the Gap Between Active Learning and Selective Prediction ( http://arxiv.org/abs/2304.03870v2 )

ライセンス: Link先を確認

Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister

(参考訳) 選択的予測は、モデルの不確実性が高い場合に予測を控える信頼性の高いモデルを学習することを目的としている。控えられた予測は、さらなる評価のために人間の専門家に委ねることができる。多くの実世界のシナリオでは、テストデータの分布はトレーニングデータとは異なる。その結果、不正確な予測が増え、より多くの人手によるラベル付けが必要になるが、これは困難でコストがかかる。アクティブラーニングは、最も有益な例のみを問い合わせることでこれを回避し、いくつかのケースでは全体的なラベル付けの労力を減らすことが示されている。本研究では、選択的予測とアクティブラーニングを橋渡しし、精度とカバレッジを高めながら、シフトしたターゲットドメインからより有益なサンプルを問い合わせることを学習する新しい学習パラダイム、アクティブ選択的予測(active selective prediction)を提案する。この新たな問題に対して、モデルスナップショットのアンサンブルと、その集約出力を擬似ラベルとする自己学習を利用する、シンプルで効果的な解法ASPESTを提案する。多数の画像、テキスト、構造化データセット、特にドメインシフトの影響を受けるデータセットでの広範な実験により、提案手法が選択的予測とアクティブラーニングの先行研究を大きく上回り(例えば、ラベル付け予算$100$のMNIST$\to$SVHNベンチマークで、ASPESTはAUC指標を$79.36\%$から$88.84\%$に改善する)、ループ内の人間をより適切に活用できることを示す。

Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which can be difficult and expensive. Active learning circumvents this by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge selective prediction and active learning, proposing a new learning paradigm called active selective prediction which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, particularly those suffer from domain shifts, demonstrate that our proposed method can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST$\to$SVHN benchmark with the labeling budget of $100$, ASPEST improves the AUC metric from $79.36\%$ to $88.84\%$) and achieves more optimal utilization of humans in the loop.
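The two ingredients the abstract combines can be sketched in a few lines: abstaining below a confidence threshold (selective prediction) and querying the least confident samples for labels (one common active-learning heuristic). This is a generic sketch, not ASPEST itself, which relies on snapshot ensembles and self-training; the threshold rule, the lowest-confidence query strategy, and the function names are assumptions.

```python
def selective_metrics(confidences, predictions, labels, threshold):
    """Return (coverage, accuracy on accepted samples).

    A prediction is accepted when its confidence is at least `threshold`;
    otherwise the model abstains and defers to a human.
    """
    accepted = [(p, y) for c, p, y in zip(confidences, predictions, labels)
                if c >= threshold]
    coverage = len(accepted) / len(labels) if labels else 0.0
    if not accepted:
        return coverage, None  # accuracy is undefined when everything is deferred
    accuracy = sum(p == y for p, y in accepted) / len(accepted)
    return coverage, accuracy

def most_uncertain(confidences, budget):
    """Indices of the `budget` lowest-confidence samples to send for labeling."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:budget]
```

Raising the threshold usually trades coverage for accuracy, which is why the two numbers are reported together.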

翻訳日:2023-06-17 01:27:08 公開日:2023-06-14

# ImageEye:プログラム合成を用いたバッチ画像処理

ImageEye: Batch Image Processing Using Program Synthesis ( http://arxiv.org/abs/2304.03253v3 )

ライセンス: Link先を確認

Celeste Barnaby, Qiaochu Chen, Roopsha Samanta, Isil Dillig

(参考訳) 本稿では,バッチ画像処理のための新しい合成手法を提案する。全画像にグローバル編集しか適用できない既存のツールとは異なり、この方法は画像内の個々のオブジェクトに対してきめ細かい編集を施すことができる。例えば、特定の特性を持つ特定のオブジェクトを選択的にぼかしたり、収穫することができる。このようなきめ細かい画像編集作業を容易にするために,事前学習したニューラルネットワークと記号推論を可能にする他の言語構造を組み合わせた,ニューロシンボリックドメイン固有言語(DSL)を提案する。本手法は,新しい合成アルゴリズムを用いて,ユーザの実演から,このdslのプログラムを自動的に学習する。提案手法をImageEyeと呼ばれるツールに実装し,50個の画像編集タスクで評価した。評価の結果,ImageEyeはこれらのタスクの96%を自動化できることがわかった。

This paper presents a new synthesis-based approach for batch image processing. Unlike existing tools that can only apply global edits to the entire image, our method can apply fine-grained edits to individual objects within the image. For example, our method can selectively blur or crop specific objects that have a certain property. To facilitate such fine-grained image editing tasks, we propose a neuro-symbolic domain-specific language (DSL) that combines pre-trained neural networks for image classification with other language constructs that enable symbolic reasoning. Our method can automatically learn programs in this DSL from user demonstrations by utilizing a novel synthesis algorithm. We have implemented the proposed technique in a tool called ImageEye and evaluated it on 50 image editing tasks. Our evaluation shows that ImageEye is able to automate 96% of these tasks.

翻訳日:2023-06-17 01:26:40 公開日:2023-06-14

# ニューラル遅延微分方程式を用いた遅延学習

Learning the Delay Using Neural Delay Differential Equations ( http://arxiv.org/abs/2304.01329v2 )

ライセンス: Link先を確認

Maria Oprea and Mark Walth and Robert Stephany and Gabriella Torres Nothaft and Arnaldo Rodriguez-Gonzalez and William Clark

(参考訳) 機械学習と力学系の交点は、最近かなりの関心を集めている。ニューラル常微分方程式(Neural Ordinary Differential Equations, NODEs)は、これらの分野が豊かに重なり合う領域である。本稿では、遅延微分方程式(DDE)に基づく連続時間ニューラルネットワーク手法を開発する。本モデルは、随伴感度法を用いて、モデルパラメータと遅延をデータから直接学習する。我々のアプローチはNODEに着想を得ており、遅延の値が事前に既知であると仮定していた従来のニューラルDDEモデルを拡張するものである。提案手法の感度解析を行い、ベンチマークシステムからDDEパラメータを学習できることを示す。最後に、今後の方向性と応用の可能性について議論する。

The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.
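To make the role of the delay concrete, here is a minimal sketch that integrates the scalar linear DDE x'(t) = -a·x(t - tau) with forward Euler and a constant history. The equation, the Euler scheme, and the function name are illustrative assumptions; the paper's neural model and its adjoint-based learning of the delay are not reproduced here.

```python
def simulate_dde(a, tau, history_value, dt, t_end):
    """Forward-Euler integration of x'(t) = -a * x(t - tau).

    x(t) is held at `history_value` for all t <= 0. Assumes `tau` and
    `t_end` are (close to) integer multiples of `dt`.
    Returns the list [x(0), x(dt), ..., x(t_end)].
    """
    lag = round(tau / dt)      # delay expressed in time steps
    steps = round(t_end / dt)
    xs = [history_value]
    for k in range(steps):
        # Look up the state one delay in the past; fall back to the history.
        delayed = xs[k - lag] if k >= lag else history_value
        xs.append(xs[k] + dt * (-a * delayed))
    return xs
```

Because the right-hand side reads a past state, changing `tau` changes the whole trajectory, which is what makes the delay a learnable quantity in principle.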

翻訳日:2023-06-17 01:25:50 公開日:2023-06-14

# 差分プライベート学習におけるユーティリティ損失の軽減について:幾何学に着想を得たカーネルアプローチによる新しい視点

On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach ( http://arxiv.org/abs/2304.01300v2 )

ライセンス: Link先を確認

Mohit Kumar, Bernhard A. Moser, Lukas Fischer

(参考訳) プライバシとユーティリティのトレードオフは、差分プライベート機械学習の基本的な問題のひとつであり続けている。本稿では、分類における精度低下の問題を緩和するために、幾何学に着想を得たカーネルベースの手法を導入する。このアプローチでは、与えられたデータポイントのアフィン包の表現を再生核ヒルベルト空間(Reproducing Kernel Hilbert Spaces, RKHS)で学習する。これにより、個々のデータポイントに関するプライバシーに敏感な情報を隠蔽し、メンバシップ推論攻撃のリスクを大幅に低減することでプライバシとユーティリティのトレードオフを改善する、新しい距離尺度が導かれる。このアプローチの有効性は、MNISTデータセット、Freiburg groceriesデータセット、および実際の生物医学データセットでの実験を通じて実証される。このアプローチが計算上実用的であり続けることも検証されている。連合学習へのアプローチの適用も検討し、データが分散していることによる精度低下は、わずかであるか、あるいは著しく大きくはないことが観察された。

Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.

翻訳日:2023-06-17 01:25:42 公開日:2023-06-14

# Leanのための機械学習による前提選択

Machine-Learned Premise Selection for Lean ( http://arxiv.org/abs/2304.00994v2 )

ライセンス: Link先を確認

Bartosz Piotrowski, Ramon Fern\'andez Mir, Edward Ayers

(参考訳) ユーザが証明中の定理に関連する前提を提案する、Lean証明支援系のための機械学習ベースのツールを紹介する。ツールの設計原則は、(1)証明支援系との緊密な統合、(2)使いやすさとインストールの容易さ、(3)軽量で高速なアプローチである。この目的のために、オンライン方式で訓練されるランダムフォレストモデルのカスタム版を設計した。これはLean上で直接実装されており、Lean 4の豊富で効率的なメタプログラミング機能のおかげで可能になった。ランダムフォレストは、Leanの数学ライブラリであるmathlibから抽出されたデータで訓練される。トレーニング用の特徴量やラベルを作成するための様々なオプションを試す。訓練済みモデルからの助言は、対話的に証明を構築しながらエディタ内で呼び出せるsuggest_premisesタクティクを介してユーザに提供される。

We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.

翻訳日:2023-06-17 01:25:26 公開日:2023-06-14

# 局所エネルギー分布に基づく確率的シミュレーテッドアニーリングのハイパーパラメータ決定

Local Energy Distribution Based Hyperparameter Determination for Stochastic Simulated Annealing ( http://arxiv.org/abs/2304.11839v2 )

ライセンス: Link先を確認

Naoya Onizawa, Kyo Kuroki, Duckgyu Shin, Takahiro Hanyu

(参考訳) 本稿では,局所エネルギー分布に基づく確率的模擬焼鈍(SSA)のためのハイパーパラメータ決定法を提案する。 SSAは、一般的な模擬焼鈍(SA)よりも高速に組合せ最適化問題を解くことができるが、時間を要するハイパーパラメーター探索が必要である。提案手法はスピン(確率ビット)の局所エネルギー分布に基づいてハイパーパラメータを決定する。スピンはSSAの基本計算要素であり、その重みで他のスピンとグラフィカルに接続されている。局所エネルギーの分布は中心極限定理(CLT)に基づいて推定できる。 CLTに基づく正規分布は、従来の手法のO(n^3)からO(1)へのハイパーパラメータ探索の時間的複雑さを低減するために用いられる。最大カット問題に対するGsetおよびK2000ベンチマークにおいて,決定されたハイパーパラメータを用いたSSAの性能を評価する。その結果,提案手法は最もよく知られたカット値の約98%の平均カット値が得られることがわかった。

This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). The spin is a basic computing element of SSA and is graphically connected to other spins with its weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity for hyperparameter search from O(n^3) of the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.
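One way to read the central-limit-theorem step is sketched below. If a spin's local energy is a weighted sum of its neighbours' ±1 states, and those states are treated as independent and equally likely, the local energy has mean zero and standard deviation sqrt(Σ w²), and for many neighbours its distribution approaches a normal one. The independence assumption, the absence of a bias term, and both function names are assumptions; the abstract does not say how the paper maps this distribution to SSA hyperparameters.

```python
import itertools
import math

def clt_local_energy_std(weights):
    """Closed-form standard deviation of sum_j w_j * s_j for independent s_j = +/-1."""
    return math.sqrt(sum(w * w for w in weights))

def exact_local_energy_std(weights):
    """Brute-force check: enumerate every spin configuration (small inputs only)."""
    energies = [sum(w * s for w, s in zip(weights, spins))
                for spins in itertools.product((-1, 1), repeat=len(weights))]
    mean = sum(energies) / len(energies)
    variance = sum((e - mean) ** 2 for e in energies) / len(energies)
    return math.sqrt(variance)
```

The closed form needs only the weights of one spin, whereas the enumeration grows exponentially with the number of neighbours, so the latter is useful only as a sanity check on tiny examples.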

翻訳日:2023-06-17 01:19:13 公開日:2023-06-14

# 自動運転のためのニューラルマップ事前情報(Neural Map Prior)

Neural Map Prior for Autonomous Driving ( http://arxiv.org/abs/2304.08481v2 )

ライセンス: Link先を確認

Xuan Xiong, Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, Hang Zhao

(参考訳) 高精細(HD)セマンティックマップは、自動運転車が都市環境をナビゲートするために不可欠である。オフラインのhdマップを作成する従来の方法は、コストがかかるだけでなく、タイムリーな更新には不十分な、労働集約的な手動アノテーションプロセスを伴う。近年,オンラインセンサを用いた局所地図作成手法が提案されている。しかし、このアプローチはセンサーの知覚範囲と咬合に対する感受性によって制限される。本研究では,グローバルマップのニューラルマップ表現であるneural map prior (nmp)を提案する。この表現は自動的に更新され、ローカルマップ推論のパフォーマンスが向上する。具体的には、これを実現するために2つのアプローチを利用する。まず,局所写像推論に先行する強写像を統合するために,現在と過去の特徴の相関関係を動的に同定する機構であるクロスアテンションを適用した。第2に,グローバルニューラルマップを事前に更新するために,前回のトラバーサルから特徴を抽出してネットワークを誘導する学習ベースのフュージョンモジュールを用いる。 nuScenesデータセットをベースとした実験結果から,本フレームワークは様々なマップセグメンテーションおよび検出アーキテクチャと高い互換性を示す。より長い知覚範囲の厳しい気象条件や状況であっても、地図予測性能を著しく向上させる。私たちの知る限りでは、グローバルマップを事前に作成するための学習ベースのシステムとしてはこれが初めてです。

High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments. The traditional method of creating offline HD maps involves labor-intensive manual annotation processes, which are not only costly but also insufficient for timely updates. Recent studies have proposed an alternative approach that generates local maps using online sensor observations. However, this approach is limited by the sensor's perception range and its susceptibility to occlusions. In this study, we propose Neural Map Prior (NMP), a neural representation of global maps. This representation automatically updates itself and improves the performance of local map inference. Specifically, we utilize two approaches to achieve this. Firstly, to integrate a strong map prior into local map inference, we apply cross-attention, a mechanism that dynamically identifies correlations between current and prior features. Secondly, to update the global neural map prior, we utilize a learning-based fusion module that guides the network in fusing features from previous traversals. Our experimental results, based on the nuScenes dataset, demonstrate that our framework is highly compatible with various map segmentation and detection architectures. It significantly improves map prediction performance, even in challenging weather conditions and situations with a longer perception range. To the best of our knowledge, this is the first learning-based system for creating a global map prior.

翻訳日:2023-06-17 01:18:32 公開日:2023-06-14

# 「自己教師あり深層学習によるホールスライド画像の高速かつスケーラブルな検索」に関するコメント

Comments on 'Fast and scalable search of whole-slide images via self-supervised deep learning' ( http://arxiv.org/abs/2304.08297v4 )

ライセンス: Link先を確認

Milad Sikaroudi, Mehdi Afshari, Abubakr Shafique, Shivam Kalra, H.R. Tizhoosh

(参考訳) Chenら[Chen2022]は最近、Nature Biomedical Engineering誌に論文「Fast and scalable search of whole-slide images via self-supervised deep learning」を発表した。著者らはその手法を「self-supervised image search for histology」、略してSISHと呼んでいる。我々は、SISHがYottixelの漸進的な修正であること、MinMaxバイナライゼーションを使用しているにもかかわらず原著を引用していないこと、そして「self-supervised image search(自己教師あり画像検索)」という誤った名称に基づいていることについて懸念を表明する。また、Chenらが行った実験と比較に関するその他のいくつかの懸念も指摘する。

Chen et al. [Chen2022] recently published the article 'Fast and scalable search of whole-slide images via self-supervised deep learning' in Nature Biomedical Engineering. The authors call their method 'self-supervised image search for histology', short SISH. We express our concerns that SISH is an incremental modification of Yottixel, has used MinMax binarization but does not cite the original works, and is based on a misnomer 'self-supervised image search'. As well, we point to several other concerns regarding experiments and comparisons performed by Chen et al.

翻訳日:2023-06-17 01:17:31 公開日:2023-06-14

# 多くのディープネットワークの訓練過程は、同じ低次元多様体を探索する

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold ( http://arxiv.org/abs/2305.01604v2 )

ライセンス: Link先を確認

Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari

(参考訳) 我々は、訓練中の深層ネットワークの予測の軌跡を解析するための情報幾何学的手法を開発した。基礎となる高次元確率モデルを調べることで、訓練過程が実効的に低次元の多様体を探索することを明らかにする。幅広いアーキテクチャやサイズを持ち、異なる最適化手法、正則化手法、データ拡張手法、重みの初期化を用いて訓練されたネットワークは、予測空間の同じ多様体上に位置する。この多様体の詳細を調べたところ、異なるアーキテクチャを持つネットワークは区別可能な軌跡をたどるが、その他の要因の影響は最小限であること、より大きなネットワークはより小さなネットワークと同様の多様体に沿って、ただしより速く訓練されること、そして予測空間の大きく異なる部分で初期化されたネットワークも同様の多様体に沿って解に収束することがわかった。

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.

翻訳日:2023-06-17 01:08:59 公開日:2023-06-14

# ユーザクエリのためのコンテキスト多言語スペルチェッカ

Contextual Multilingual Spellchecker for User Queries ( http://arxiv.org/abs/2305.01082v2 )

ライセンス: Link先を確認

Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora

(参考訳) スペルチェックは、最も基本的で広く使われている検索機能の一つである。綴りを誤ったユーザクエリの修正は、ユーザエクスペリエンスを向上させるだけでなく、ユーザから当然のものとして期待されている。しかし、広く利用可能なスペルチェックソリューションの多くは、最先端のソリューションよりも精度が低いか、レイテンシが重要な要件となる検索用途で使うには遅すぎるかのいずれかである。さらに、最近の革新的なアーキテクチャの多くは英語に重点を置いており、多言語で訓練されておらず、長文のスペル訂正向けに訓練されている。これは、文脈が乏しい(ほとんどのクエリは1〜2語)ユーザクエリのスペル訂正とは異なるパラダイムである。最後に、ほとんどの企業は製品名のような独自の語彙を持っているため、既製のスペルソリューションはユーザのニーズを満たせない。本研究では、非常に高速でスケーラブルであり、特定の製品のニーズに応じて語彙、ひいてはスペラーの出力を適応させる多言語スペルチェッカを構築する。さらに、我々のスペラーは、ドメイン内データセットにおいて汎用スペラーを大差で上回る。我々の多言語スペラーはAdobe製品の検索に使われており、様々なアプリケーションのオートコンプリートを支えている。

Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English and are not trained in a multilingual fashion and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller out-performs general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.

翻訳日:2023-06-17 01:08:43 公開日:2023-06-14

# クラスバランス拡散モデル

Class-Balancing Diffusion Models ( http://arxiv.org/abs/2305.00562v2 )

ライセンス: Link先を確認

Yiming Qin, Huangjie Zheng, Jiangchao Yao, Mingyuan Zhou, Ya Zhang

(参考訳) 拡散に基づくモデルは、近年の研究でより良い多様性を保ちながら高品質な視覚データを生成する利点を示している。しかし、そのような観察は、データサンプルがラベルの点から一様に配布されるように適切に事前処理されたキュレートされたデータ分布でのみ正当化される。実際には、ロングテールデータ分布はより一般的であり、そのようなクラス不均衡データに対して拡散モデルがどのように振る舞うかは不明である。本研究では,この問題をまず研究し,拡散モデルがクラス不均衡分布を持つデータセット上で訓練された場合,多様性と忠実性の両面で有意な劣化を観測する。特に尾のクラスでは、世代は多様性をほとんど失い、重度のモード崩壊の問題を観察します。そこで本研究では,データ分布がクラスバランスではないという仮説から,分布調整正規化器を用いて学習したクラスバランス拡散モデル(cbdm)を提案する。 CBDMが生成した画像は,定量的および質的両面で高い多様性と品質を示した。提案手法は,CIFAR100/CIFAR100LTデータセットで生成結果をベンチマークし,下流認識タスクにおいて優れた性能を示す。

Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we set from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative ways. Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.

翻訳日:2023-06-17 01:07:54 公開日:2023-06-14

# ニオブ表面カプセル化によるトランスモン量子ビットのコヒーレンスの系統的改善

Systematic Improvements in Transmon Qubit Coherence Enabled by Niobium Surface Encapsulation ( http://arxiv.org/abs/2304.13257v2 )

ライセンス: Link先を確認

Mustafa Bal, Akshay A. Murthy, Shaojiang Zhu, Francesco Crisa, Xinyuan You, Ziwen Huang, Tanay Roy, Jaeyel Lee, David van Zanten, Roman Pilipenko, Ivan Nekrashevich, Daniel Bafia, Yulia Krasnikova, Cameron J. Kopas, Ella O. Lachman, Duncan Miller, Josh Y. Mutus, Matthew J. Reagor, Hilal Cansizoglu, Jayss Marshall, David P. Pappas, Kim Vu, Kameshwar Yadavalli, Jin-Su Oh, Lin Zhou, Matthew J. Kramer, Florent Q. Lecocq, Dominic P. Goronzy, Carlos G. Torres-Castanedo, Graham Pritchard, Vinayak P. Dravid, James M. Rondinelli, Michael J. Bedzyk, Mark C. Hersam, John Zasadzinski, Jens Koch, James A. Sauls, Alexander Romanenko, and Anna Grassellino

(参考訳) 本稿では、T$_1$コヒーレンス時間を系統的に改善する新しいトランスモン量子ビット作製技術を提案する。ニオブの表面をパッシベーションし、損失の大きい表面酸化物の形成を防ぐカプセル化戦略を用いてデバイスを作製した。同じ超伝導金属を維持し、表面構造だけを変化させることで、異なる量子ビットファウンドリにまたがって様々なキャッピング材料と膜基板を調べたこの比較研究は、タンタル、アルミニウム、窒化チタンの自然酸化物と比較して、ニオブ酸化物が超伝導量子ビットのコヒーレンス時間に有害な影響を与えることを明確に示している。表面をカプセル化したニオブ量子ビットデバイスは、自然酸化ニオブを持つベースラインのニオブ量子ビットデバイスよりも2〜5倍長いT$_1$コヒーレンス時間を示す。ニオブをタンタルでキャッピングすると、量子ビット寿命の中央値は200マイクロ秒を超える。構造および化学的な比較分析から、アモルファスなニオブ酸化物は他のアモルファス酸化物に比べて高い損失を引き起こす可能性が示唆される。これらの結果は、超高Q超伝導高周波(SRF)空洞で得られたニオブ酸化物の損失正接の高精度測定と一致している。この新しい表面カプセル化戦略は、シリコンプロセスとの互換性により作製の容易さとスケーラブルな製造性を維持しつつ、環境中で安定な材料によるパッシベーションを通じて誘電損失のさらなる低減を可能にする。

We present a novel transmon qubit fabrication technique that yields systematic improvements in T$_1$ coherence times. We fabricate devices using an encapsulation strategy that involves passivating the surface of niobium and thereby preventing the formation of its lossy surface oxide. By maintaining the same superconducting metal and only varying the surface structure, this comparative investigation examining different capping materials and film substrates across different qubit foundries definitively demonstrates the detrimental impact that niobium oxides have on the coherence times of superconducting qubits, compared to native oxides of tantalum, aluminum or titanium nitride. Our surface-encapsulated niobium qubit devices exhibit T$_1$ coherence times 2 to 5 times longer than baseline niobium qubit devices with native niobium oxides. When capping niobium with tantalum, we obtain median qubit lifetimes above 200 microseconds. Our comparative structural and chemical analysis suggests that amorphous niobium oxides may induce higher losses compared to other amorphous oxides. These results are in line with high-accuracy measurements of the niobium oxide loss tangent obtained with ultra-high Q superconducting radiofrequency (SRF) cavities. This new surface encapsulation strategy enables even further reduction of dielectric losses via passivation with ambient-stable materials, while preserving fabrication and scalable manufacturability thanks to the compatibility with silicon processes.

翻訳日:2023-06-17 01:06:33 公開日:2023-06-14

# 編集可能なステップバイステップの説明によるインタラクティブなText-to-SQL生成

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations ( http://arxiv.org/abs/2305.07372v2 )

ライセンス: Link先を確認

Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

(参考訳) 関係データベースは、このビッグデータ時代において重要な役割を果たす。しかし、SQLのようなデータベース言語に慣れていないため、非専門家がリレーショナルデータベースの分析能力を完全に解き放つことは困難である。自然言語からSQLを自動的に生成する多くのテクニックが提案されているが、それらは2つの問題に悩まされている。(1) 依然として多くのミス、特に複雑なクエリ、(2) 非専門家のユーザが不正クエリを検証、洗練するための柔軟な方法を提供していない。これらの問題に対処するために、ユーザがSQLエラーを修正するために、間違ったSQLのステップバイステップ説明を直接編集できる新しいインタラクションメカニズムを導入する。スパイダーベンチマークの実験では、我々の手法は3つのSOTAアプローチを少なくとも31.6%上回っている。 24人の参加者によるユーザスタディでは、私たちのアプローチによって、より少ない時間と高い信頼性で、はるかに多くのSQLタスクを解決できることが示されています。

Relational databases play an important role in this Big Data era. However, it is challenging for non-experts to fully unleash the analytical power of relational databases, since they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine the incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users directly edit a step-by-step explanation of an incorrect SQL to fix SQL errors. Experiments on the Spider benchmark show that our approach outperforms three SOTA approaches by at least 31.6% in terms of execution accuracy. A user study with 24 participants further shows that our approach helped users solve significantly more SQL tasks with less time and higher confidence, demonstrating its potential to expand access to databases, particularly for non-experts.

翻訳日:2023-06-17 00:59:41 公開日:2023-06-14

# 時系列予測のためのスペクトル-時間グラフニューラルネットワークの表現力

How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting? ( http://arxiv.org/abs/2305.06587v2 )

ライセンス: Link先を確認

Ming Jin, Guangsi Shi, Yuan-Fang Li, Qingsong Wen, Bo Xiong, Tian Zhou, Shirui Pan

(参考訳) スペクトル-時間グラフニューラルネットワークは、グラフニューラルネットワーク(GNN)に基づくほとんどの時系列予測モデルの基礎となる有望な抽象化である。しかし、この系統の手法の基盤については、まだ解明すべき点が多い。本稿では、スペクトル-時間GNNの表現力を解明する理論的枠組みを確立する。その結果、線形スペクトル-時間GNNは緩やかな仮定の下で普遍的であり、その表現力は、離散時間動的グラフ上に拡張した1次Weisfeiler-Lemanアルゴリズムによって上から抑えられることが示された。これらの知見を有効なインスタンス化において実用的なものにするために、関連する制約を詳細に議論し、スペクトル領域における空間モジュールと時間モジュールを設計するための理論的な青写真を概説する。これらの知見に基づき、また我々の枠組みに基づいてスペクトル-時間GNNがいかに強力であるかを示すために、Temporal Graph GegenConv(TGC)というシンプルなインスタンスを提案する。TGCは線形成分のみで既存のほとんどのモデルを大幅に上回り、モデル効率も優れている。

Spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models that are based on graph neural networks (GNNs). However, more is needed to know about the underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings useful in practice on valid instantiations, we discuss related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. Building on these insights and to demonstrate how powerful spectral-temporal GNNs are based on our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.

翻訳日:2023-06-17 00:59:21 公開日:2023-06-14

# 歴史的データセットにおける筆者検索に向けて

Towards Writer Retrieval for Historical Datasets ( http://arxiv.org/abs/2305.05358v2 )

ライセンス: Link先を確認

Marco Peer, Florian Kleber, Robert Sablatnig

This paper presents an unsupervised approach for writer retrieval based on clustering SIFT descriptors detected at keypoint locations, resulting in pseudo-cluster labels. With those cluster labels, a residual network followed by our proposed NetRVLAD, an encoding layer with reduced complexity compared to NetVLAD, is trained on 32x32 patches at keypoint locations. Additionally, we suggest a graph-based reranking algorithm called SGR to exploit similarities of the page embeddings to boost the retrieval performance. Our approach is evaluated on two historical datasets (Historical-WI and HisIR19). We include an evaluation of different backbones and NetRVLAD. It competes with related work on historical datasets without using explicit encodings. We set a new state of the art on both datasets by applying our reranking scheme and show that our approach achieves comparable performance on a modern dataset as well.

Translated: 2023-06-17 00:58:39 Published: 2023-06-14

# Fundus vascular image segmentation based on multiple attention mechanisms and deep learning ( http://arxiv.org/abs/2305.03617v2 )

License: see the linked page

Yuanyuan Peng, Pengpeng Luan, Zixu Zhang

Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosis, and evaluation of some ocular diseases, yet the segmentation task involves nontrivial uncertainty due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. To this end, an approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To enrich contextual information and compensate for the loss of scene information, an attention fusion mechanism that combines channel attention with spatial attention mechanisms constructed by a Transformer is employed to extract various features of blood vessels from retinal fundus images in both spatial and channel dimensions. Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.

Translated: 2023-06-17 00:58:05 Published: 2023-06-14

# Floaters No More: Radiance Field Gradient Scaling for Improved Near-Camera Training ( http://arxiv.org/abs/2305.02756v2 )

License: see the linked page

Julien Philip and Valentin Deschaintre

NeRF acquisition typically requires careful choice of near planes for the different cameras or suffers from background collapse, creating floating artifacts on the edges of the captured scene. The key insight of this work is that background collapse is caused by a higher density of samples in regions near cameras. As a result of this sampling imbalance, near-camera volumes receive significantly more gradients, leading to incorrect density buildup. We propose a gradient scaling approach to counter-balance this sampling imbalance, removing the need for near planes, while preventing background collapse. Our method can be implemented in a few lines, does not induce any significant overhead, and is compatible with most NeRF implementations.

Translated: 2023-06-17 00:57:26 Published: 2023-06-14
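
The abstract above notes that the gradient scaling can be implemented in a few lines. The sketch below shows the general idea on plain Python lists: gradients of samples close to the camera are damped so that densely sampled near-camera regions do not dominate the update. The quadratic factor clamped at 1 is an assumption chosen for illustration; the abstract does not state the exact scaling function, and `scale_gradients` is a hypothetical helper name.

```python
def scale_gradients(sample_gradients, ray_distances):
    """Damp per-sample gradients according to distance from the camera.

    sample_gradients: one gradient vector (list of floats) per ray sample.
    ray_distances: distance of each sample from the camera origin.
    The factor min(1, d^2) is an assumed form, used here only as an example.
    """
    scaled = []
    for grad, dist in zip(sample_gradients, ray_distances):
        factor = min(1.0, dist * dist)  # near samples get a factor below 1
        scaled.append([g * factor for g in grad])
    return scaled


# Three samples with identical raw gradients at increasing distances.
grads = [[1.0, -2.0], [1.0, -2.0], [1.0, -2.0]]
dists = [0.1, 0.5, 2.0]
scaled = scale_gradients(grads, dists)
```

In an autodiff framework the same idea would typically be written as an operation that is the identity in the forward pass and applies the factor only in the backward pass, which keeps rendering unchanged.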

# LLM-Pruner: On the Structural Pruning of Large Language Models ( http://arxiv.org/abs/2305.11627v2 )

License: see the linked page

Xinyin Ma, Gongfan Fang, Xinchao Wang

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in the deployment, inference, and training stages. With LLMs being general-purpose task solvers, we explore their compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM. One challenge to achieving this is the enormous size of the training corpus of an LLM, which makes both data transfer and model post-training over-burdensome. Thus, we tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures based on gradient information, maximally preserving the majority of the LLM's functionality. The performance of pruned models can then be efficiently recovered through a tuning technique, LoRA, in merely 3 hours, requiring only 50K data. We validate LLM-Pruner on three LLMs, including LLaMA, Vicuna, and ChatGLM, and demonstrate that the compressed models still exhibit satisfactory capabilities in zero-shot classification and generation. The code is available at: https://github.com/horseee/LLM-Pruner

Translated: 2023-06-17 00:49:50 Published: 2023-06-14
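
The abstract above says that coupled structures are ranked using gradient information and the non-critical ones are removed. A minimal sketch of that kind of selection follows, assuming a first-order score of the form sum of |weight × gradient| per group. The score, the group names, and the helper functions are hypothetical illustrations and are not taken from the LLM-Pruner code base.

```python
def group_importance(weights, grads):
    """Assumed first-order importance: sum of |w * g| over a coupled group."""
    return sum(abs(w * g) for w, g in zip(weights, grads))


def select_groups_to_prune(groups, prune_ratio):
    """Return the names of the least important groups.

    groups: dict mapping a group name to a (weights, grads) pair.
    prune_ratio: fraction of groups to remove, rounded down.
    """
    scores = {name: group_importance(w, g) for name, (w, g) in groups.items()}
    num_to_prune = int(len(groups) * prune_ratio)
    ranked = sorted(scores, key=scores.get)  # ascending importance
    return ranked[:num_to_prune]


# Hypothetical groups, e.g. attention heads together with their dependent weights.
groups = {
    "head_0": ([0.5, -0.2], [0.1, 0.3]),
    "head_1": ([1.0, 0.8], [0.5, -0.4]),
    "head_2": ([0.01, 0.02], [0.2, 0.1]),
    "head_3": ([0.3, 0.3], [0.0, 0.1]),
}
pruned = select_groups_to_prune(groups, prune_ratio=0.5)
```

Scoring whole groups rather than individual weights matters for structural pruning, because removing one part of a coupled structure without its dependents would leave the network with mismatched shapes.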

# Late-Constraint Diffusion Guidance for Controllable Image Synthesis ( http://arxiv.org/abs/2305.11520v4 )

License: see the linked page

Chang Liu, Dong Liu

Diffusion models, either with or without text condition, have demonstrated impressive capability in synthesizing photorealistic images given a few or even no words. These models may not fully satisfy user needs, as normal users or artists intend to control the synthesized images with specific guidance, such as overall layout, color, structure, and object shape. To adapt diffusion models for controllable image synthesis, several methods have been proposed to incorporate the required conditions as regularization upon the intermediate features of the diffusion denoising network. These methods, referred to as early-constraint methods in this paper, have difficulties in handling multiple conditions with a single solution. They tend to train separate models for each specific condition, which requires considerable training cost and results in non-generalizable solutions. To address these difficulties, we propose a new approach named late-constraint: we leave the diffusion networks unchanged, but constrain their output to be aligned with the required conditions. Specifically, we train a lightweight condition adapter to establish the correlation between external conditions and internal representations of diffusion models. During the iterative denoising process, the conditional guidance is sent into the corresponding condition adapter to manipulate the sampling process with the established correlation. We further equip the introduced late-constraint strategy with a timestep resampling method and an early stopping technique, which boost the quality of synthesized images while complying with the guidance. Our method outperforms the existing early-constraint methods and generalizes better to unseen conditions. Our code will be made available.

Translated: 2023-06-17 00:49:24 Published: 2023-06-14

# The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus? ( http://arxiv.org/abs/2305.09789v2 )

License: see the linked page

Xiaodan Xing, Yang Nan, Federico Felder, Simon Walsh and Guang Yang

Training medical AI algorithms requires large volumes of accurately labeled datasets, which are difficult to obtain in the real world. Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images. Typically, researchers select synthesis models based on image quality measurements, prioritizing synthetic images that appear realistic. However, our empirical analysis shows that high-fidelity and visually appealing synthetic images are not necessarily superior. In fact, we present a case where low-fidelity synthetic images outperformed their high-fidelity counterparts in downstream tasks. Our findings highlight the importance of comprehensive analysis before incorporating synthetic data into real-world applications. We hope our results will raise awareness among the research community of the value of low-fidelity synthetic images in medical AI algorithm training.

Translated: 2023-06-17 00:48:44 Published: 2023-06-14

# LoViT: Long Video Transformer for Surgical Phase Recognition ( http://arxiv.org/abs/2305.08989v3 )

License: see the linked page

Yang Liu, Maxence Boels, Luis C. Garcia-Peraza-Herrera, Tom Vercauteren, Prokar Dasgupta, Alejandro Granados and Sebastien Ourselin

Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information that combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Moreover, it achieves a 5.3 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics whilst introducing mechanisms that cope with long videos.

Translated: 2023-06-17 00:48:30 Published: 2023-06-14

# Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era ( http://arxiv.org/abs/2305.08144v2 )

License: see the linked page

Danyang Zhang, Lu Chen, Zihan Zhao, Ruisheng Cao, Kai Yu

Diverse evaluation benchmarks play a crucial role in assessing a wide range of capabilities of large language models (LLMs). Although plenty of endeavors have been dedicated to building valuable benchmarks, there is still little work aiming at evaluating the capability of LLMs in multistep interactive environments. Noticing that LLMs require a text representation of the environment observations for interaction, we choose to fill this gap by building a novel benchmark based on the information user interface (InfoUI). InfoUI consists of rich text contents and can be represented in some text formats, and is thus suitable for assessing the interaction ability of LLMs. Additionally, the complex structures of InfoUI further challenge LLMs to understand structured texts rather than plain texts. An interaction platform is always used to evaluate an agent; however, there is still a lack of a satisfactory interaction platform dedicated to InfoUI. Consequently, we propose to build a novel easily-extendable, adaptable, and close-to-reality interaction platform, Mobile-Env, to provide a base for an appropriate benchmark. Based on Mobile-Env, an InfoUI task set, WikiHow, is then built to establish a benchmark for the multistep interaction capability of LLMs in structured text-based environments. Agents based on a series of LLMs are tested on the task set to obtain insight into the potential and challenges of LLMs for InfoUI interaction. We sincerely welcome community contributions of new environments and new task sets for Mobile-Env to provide better test benchmarks and facilitate the development of the corresponding domains.

Translated: 2023-06-17 00:47:49 Published: 2023-06-14

# A Survey of Diffusion Models in Natural Language Processing ( http://arxiv.org/abs/2305.14671v2 )

License: see the linked page

Hao Zou, Zae Myung Kim, Dongyeop Kang

This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation. This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications. We also perform a thorough comparison between diffusion models and alternative generative models, specifically highlighting the autoregressive (AR) models, while also examining how diverse architectures incorporate the Transformer in conjunction with diffusion models. Compared to AR models, diffusion models have significant advantages for parallel generation, text interpolation, token-level controls such as syntactic structures and semantic contents, and robustness. Exploring further permutations of integrating Transformers into diffusion models would be a valuable pursuit. Also, the development of multimodal diffusion models and large-scale diffusion language models with notable capabilities for few-shot learning would be important directions for the future advance of diffusion models in NLP.

Translated: 2023-06-17 00:40:34 Published: 2023-06-14

# The state of scientific PDF accessibility in repositories: A survey in Switzerland ( http://arxiv.org/abs/2305.14041v2 )

License: see the linked page

Alireza Darvishy, Rolf Sethe, Ines Engler, Oriane Pierres, Juliet Manning

This survey analyzed the quality of the PDF documents on online repositories in Switzerland, examining their accessibility for people with visual impairments. Two minimal accessibility features were analyzed: the PDFs had to have tags and a hierarchical heading structure. The survey also included interviews with the managers or heads of multiple Swiss universities' repositories to assess the general opinion and knowledge of PDF accessibility. An analysis of interviewee responses indicates an overall lack of awareness of PDF accessibility, and shows that online repositories currently have no concrete plans to address the issue. This paper concludes by presenting a set of recommendations for online repositories to improve the accessibility of their PDF documents.

Translated: 2023-06-17 00:39:14 Published: 2023-06-14

# Road Planning for Slums via Deep Reinforcement Learning ( http://arxiv.org/abs/2305.13060v3 )

License: see the linked page

Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, Yong Li

Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming and unable to generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically lay out roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.

Translated: 2023-06-17 00:38:35 Published: 2023-06-14
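
The abstract above mentions masked policy optimization without giving details. One common way to mask a policy is to exclude invalid actions before normalizing the action probabilities, so that the agent can only sample feasible choices. The sketch below shows that step under the assumption that each action is a candidate road location and that the mask marks which candidates are currently allowed; the logits and the mask are made-up values.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax over valid actions only; masked actions get probability 0."""
    valid_logits = [l for l, ok in zip(logits, valid_mask) if ok]
    if not valid_logits:
        raise ValueError("at least one action must be valid")
    max_logit = max(valid_logits)  # subtract the max for numerical stability
    exps = [
        math.exp(l - max_logit) if ok else 0.0
        for l, ok in zip(logits, valid_mask)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate road segments; the last two are
# assumed to be infeasible in the current state and are masked out.
logits = [2.0, 1.0, 0.5, 3.0]
mask = [True, True, False, False]
probs = masked_softmax(logits, mask)
```

Note that the highest raw score belongs to a masked action here, which is exactly the case where masking changes the outcome.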

# Novel deep learning methods for 3D flow field segmentation and classification ( http://arxiv.org/abs/2305.11884v2 )

License: see the linked page

Xiaorui Bai, Wenyong Wang, Jun Zhang, Yueqing Wang, Yu Xiang

Flow field segmentation and classification help researchers to understand vortex structure and thus turbulent flow. Existing deep learning methods are mainly based on global information and focus on the 2D setting. Based on flow field theory, we propose novel deep learning methods for flow field segmentation and classification in three-dimensional space. We construct a segmentation criterion based on local velocity information and a classification criterion based on the relationship between local vorticity and vortex wake, to identify vortex structure in a 3D flow field and further classify the type of vortex wakes accurately and rapidly. Simulation experiment results showed that, compared with existing methods, our segmentation method can identify the vortex area more accurately while reducing time consumption by more than 50%; our classification method can reduce time consumption by more than 90% while maintaining the same classification accuracy level.

Translated: 2023-06-17 00:37:32 Published: 2023-06-14

# OVO: Open-Vocabulary Occupancy ( http://arxiv.org/abs/2305.16133v2 )

License: see the linked page

Zhiyu Tan, Zichao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li

Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.

Translated: 2023-06-17 00:28:29 Published: 2023-06-14

# Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages ( http://arxiv.org/abs/2305.15814v2 )

License: see the linked page

Yash Madhani, Mitesh M. Khapra, Anoop Kunchukuttan

We create publicly available language identification (LID) datasets and models in all 22 Indian languages listed in the Indian constitution in both native-script and romanized text. First, we create Bhasha-Abhijnaanam, a language identification test set for native-script as well as romanized text which spans all 22 Indic languages. We also train IndicLID, a language identifier for all the above-mentioned languages in both native and romanized script. For native-script text, it has better language coverage than existing LIDs and is competitive with or better than other LIDs. IndicLID is the first LID for romanized text in Indian languages. Two major challenges for romanized text LID are the lack of training data and low LID performance when languages are similar. We provide simple and effective solutions to these problems. In general, there has been limited work on romanized text in any language, and our findings are relevant to other languages that need romanized language identification. Our models are publicly available at https://ai4bharat.iitm.ac.in/indiclid under open-source licenses. Our training and test sets are also publicly available at https://ai4bharat.iitm.ac.in/bhasha-abhijnaanam under open-source licenses.

Translated: 2023-06-17 00:28:09 Published: 2023-06-14

# Drivers of Mobile Payment Acceptance: The Impact of Network Externalities ( http://arxiv.org/abs/2305.15436v2 )

License: see the linked page

Qasim Ajao and E. Abdullah Abu-Shanab

Mobile payment has become increasingly popular due to the widespread use of smartphones and their applications. However, its adoption in African countries has been limited, despite its potential to simplify our lives. This study aims to enhance our understanding of the factors that affect the acceptance of mobile payment in Nigeria. To achieve this, the paper explores the impact of "network externalities" in addition to traditional technology acceptance factors. The study hypothesizes that the key drivers of mobile payment acceptance are performance expectancy, effort expectancy, social influence, trust, and network externality. The research findings suggest that while traditional drivers still play a role in customers' willingness to adopt mobile payment, network externalities have the strongest impact. Although the results did not support the influence of effort expectancy, the paper provides recommendations for future research.

Translated: 2023-06-17 00:27:51 Published: 2023-06-14

# Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective ( http://arxiv.org/abs/2306.00595v3 )

License: see the linked page

Yingying Fan and Yu Wu and Yutian Lin and Bo Du

We focus on the weakly-supervised audio-visual video parsing task (AVVP), which aims to identify and locate all the events in audio/visual modalities. Previous works only concentrate on video-level overall label denoising across modalities, but overlook the segment-level label noise, where adjacent video segments (i.e., 1-second video clips) may contain different events. However, recognizing events in the segment is challenging because its label could be any combination of events that occur in the video. To address this issue, we consider tackling AVVP from the language perspective, since language could freely describe how various events appear in each segment beyond fixed labels. Specifically, we design language prompts to describe all cases of event appearance for each video. Then, the similarity between language prompts and segments is calculated, where the event of the most similar prompt is regarded as the segment-level label. In addition, to deal with the mislabeled segments, we propose to perform dynamic re-weighting on the unreliable segments to adjust their labels. Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin.

Translated: 2023-06-17 00:21:40 Published: 2023-06-14
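
The abstract above describes assigning segment-level labels by computing the similarity between language prompts and video segments and taking the events of the most similar prompt. A minimal sketch of that selection step follows, assuming cosine similarity over embedding vectors. The three-dimensional embeddings, the event names, and the `label_segment` helper are hypothetical; in practice the embeddings would come from pretrained text and audio/visual encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_segment(segment_embedding, prompts):
    """Return the event set of the prompt most similar to the segment."""
    best = max(prompts, key=lambda p: cosine(segment_embedding, p["embedding"]))
    return best["events"]


# One prompt per combination of the video-level events (made-up embeddings).
prompts = [
    {"events": ("dog",), "embedding": [1.0, 0.0, 0.0]},
    {"events": ("speech",), "embedding": [0.0, 1.0, 0.0]},
    {"events": ("dog", "speech"), "embedding": [0.7, 0.7, 0.0]},
]

# Two 1-second segments of the same video (made-up embeddings).
segments = [[0.9, 0.1, 0.0], [0.6, 0.8, 0.1]]
labels = [label_segment(s, prompts) for s in segments]
```

Enumerating prompts for event combinations is what allows adjacent segments of one video to receive different labels even though only a video-level label is given.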

# Parallel Neurosymbolic Integration with Concordia ( http://arxiv.org/abs/2306.00480v2 )

License: see the linked page

Jonathan Feldstein, Modestas Jurčius, Efthymia Tsamoura

(参考訳) 並列型ニューロシンボリックアーキテクチャは論理理論からの知識を深層モデルに蒸留することでNLPに効果的に適用されているが、従来の技術は制限された論理理論をサポートし、論理と深層ネットワークの独立性の仮定に依存するなど、いくつかの制限に直面している。先行技術の限界を克服するフレームワークであるConcordiaを提示する。コンコルディアはディープネットワークと論理理論の両方に非依存であり、幅広い確率論的理論を支持する。我々のフレームワークは、両方のコンポーネントの教師なしトレーニングと神経コンポーネントの教師なしトレーニングをサポートすることができる。コンコーディアはNLPやデータ分類以外のタスクに適用され、集団活動の検出、エンティティリンク、レコメンデーションタスクにおける最先端の精度を向上させる。

Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model. However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a framework overcoming the limitations of prior art. Concordia is agnostic both to the deep network and the logic theory, offering support for a wide range of probabilistic theories. Our framework can support supervised training of both components and unsupervised training of the neural component. Concordia has been successfully applied to tasks beyond NLP and data classification, improving the accuracy of state-of-the-art on collective activity detection, entity linking and recommendation tasks.

Translated: 2023-06-17 00:20:58, published: 2023-06-14

# Towards Interactive Image Inpainting via Sketch Refinement

Towards Interactive Image Inpainting via Sketch Refinement ( http://arxiv.org/abs/2306.00407v2 )

License: see the linked page

Chang Liu, Shunxin Xu, Jialun Peng, Kaidong Zhang and Dong Liu

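(Reference translation) A hard problem in image inpainting is restoring complex structures in corrupted regions. This motivates interactive image inpainting, which uses additional hints such as sketches to assist the inpainting process. Sketches are simple and intuitive for end users, but they are free-form and carry a lot of randomness. That randomness can confuse inpainting models and cause severe artifacts in the completed images. To address this problem, we propose a two-stage image inpainting method called SketchRefiner. In the first stage, we propose a cross-correlation loss function to robustly calibrate and refine user-provided sketches in a coarse-to-fine fashion. In the second stage, we learn to extract informative features from the abstracted sketches in feature space and use them to modulate the inpainting process. We also propose an algorithm that automatically simulates real sketches, and build a test protocol covering different applications. Experimental results on public datasets show that SketchRefiner makes effective use of sketch information and eliminates the artifacts caused by free-form sketches. Our method consistently outperforms state-of-the-art methods both qualitatively and quantitatively, and shows great potential for real-world applications. Our code and dataset are available.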

One tough problem of image inpainting is to restore complex structures in the corrupted regions. It motivates interactive image inpainting which leverages additional hints, e.g., sketches, to assist the inpainting process. Sketch is simple and intuitive to end users, but meanwhile has free forms with much randomness. Such randomness may confuse the inpainting models, and incur severe artifacts in completed images. To address this problem, we propose a two-stage image inpainting method termed SketchRefiner. In the first stage, we propose using a cross-correlation loss function to robustly calibrate and refine the user-provided sketches in a coarse-to-fine fashion. In the second stage, we learn to extract informative features from the abstracted sketches in the feature space and modulate the inpainting process. We also propose an algorithm to simulate real sketches automatically and build a test protocol with different applications. Experimental results on public datasets demonstrate that SketchRefiner effectively utilizes sketch information and eliminates the artifacts due to the free-form sketches. Our method consistently outperforms the state-of-the-art ones both qualitatively and quantitatively, meanwhile revealing great potential in real-world applications. Our code and dataset are available.

Translated: 2023-06-17 00:20:44, published: 2023-06-14

# Learning Conditional Attributes for Compositional Zero-Shot Learning

Learning Conditional Attributes for Compositional Zero-Shot Learning ( http://arxiv.org/abs/2305.17940v2 )

License: see the linked page

Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

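(Reference translation) Compositional Zero-Shot Learning (CZSL) aims to train models that recognize novel compositional concepts on the basis of learned concepts such as attribute-object combinations. One challenge is modeling attributes that interact with different objects: for example, the attribute "wet" differs between "wet apple" and "wet cat". In this work, we analyze and argue that attributes are conditioned on the recognized object and the input image, and we explore learning conditional attribute embeddings with an attribute learning framework consisting of an attribute hyper learner and an attribute base learner. By encoding conditional attributes, the model can generate flexible attribute embeddings that generalize from seen to unseen compositions. Experiments on CZSL benchmarks, including the more challenging C-GQA dataset, show better performance than other state-of-the-art approaches and validate the importance of learning conditional attributes. Code is available at https://github.com/wqshmzh/CANet-CZSL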

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations. One of the challenges is to model attributes that interact with different objects, e.g., the attribute "wet" in "wet apple" and "wet cat" is different. As a solution, we provide analysis and argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings by a proposed attribute learning framework containing an attribute hyper learner and an attribute base learner. By encoding conditional attributes, our model is able to generate flexible attribute embeddings for generalization from seen to unseen compositions. Experiments on CZSL benchmarks, including the more challenging C-GQA dataset, demonstrate better performances compared with other state-of-the-art approaches and validate the importance of learning conditional attributes. Code is available at https://github.com/wqshmzh/CANet-CZSL

Translated: 2023-06-17 00:18:47, published: 2023-06-14

# Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization

Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization ( http://arxiv.org/abs/2305.17323v2 )

License: see the linked page

Benjamin Grimmer, Danlin Li

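(Reference translation) We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth, non-Lipschitz optimization. We provide new equivalent dual descriptions, in the style of dual averaging, for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences yield $O(1/T)$ convergence guarantees in terms of both the classic primal gap and a previously unanalyzed dual gap for strongly convex optimization. As a result, our theory gives these classic methods simple, optimal stopping criteria and optimality certificates at no added computational cost. The results hold under nearly any stepsize selection and for a range of non-Lipschitz, ill-conditioned problems in which the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon that, to the best of our knowledge, no prior work addresses). Even in the presence of such undesirable behavior, our theory still guarantees, and bounds, eventual convergence.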

We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply under nearly any stepsize selection and for a range of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.

Translated: 2023-06-17 00:18:20, published: 2023-06-14
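For readers unfamiliar with the classic subgradient method that the abstract above analyzes, here is a minimal sketch on a one-dimensional toy problem. The objective $f(x) = \frac{\mu}{2}x^2 + |x - 1|$, the stepsize $2/(\mu(t+2))$, and the weighted iterate averaging are common textbook choices assumed here for illustration; none of them is taken from the paper, and the sketch does not implement the paper's dual descriptions or stopping criteria.

```python
def f(x, mu=2.0):
    """Strongly convex, nonsmooth toy objective: (mu/2) * x^2 + |x - 1|."""
    return 0.5 * mu * x * x + abs(x - 1.0)


def subgradient(x, mu=2.0):
    """One valid subgradient of f; at the kink x = 1 the sign term is taken as 0."""
    sign = (x > 1.0) - (x < 1.0)
    return mu * x + sign


def subgradient_method(x0, steps, mu=2.0):
    """Run the classic subgradient method and return a weighted average of the iterates."""
    x = x0
    weighted_sum, weight_total = 0.0, 0.0
    for t in range(steps):
        weight = t + 1  # later iterates get more weight
        weighted_sum += weight * x
        weight_total += weight
        step = 2.0 / (mu * (t + 2))  # assumed stepsize schedule
        x = x - step * subgradient(x, mu)
    return weighted_sum / weight_total


x_bar = subgradient_method(x0=3.0, steps=2000)
# For mu = 2 the minimizer is x = 0.5 with f(0.5) = 0.75.
primal_gap = f(x_bar) - 0.75
```

On this toy problem the averaged iterate lands very close to the minimizer $x = 0.5$, so `primal_gap` is close to zero.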

# DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views

DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views ( http://arxiv.org/abs/2306.03414v3 )

License: see the linked page

Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu

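(Reference translation) Synthesizing novel view images from a few views is a challenging but practical problem. Because so little information is provided, existing methods often struggle to produce high-quality results or require per-object optimization in such few-view settings. In this work, we explore using the strong 2D priors in pre-trained diffusion models to synthesize novel view images. 2D diffusion models, however, lack 3D awareness, which leads to distorted image synthesis and compromises identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. A spatial guidance model is then introduced to convert these 3D feature maps into spatial information for the generative process. This information guides the pre-trained diffusion model, allowing it to generate geometrically consistent images without tuning it. By leveraging the strong image priors in pre-trained diffusion models, DreamSparse can synthesize high-quality novel views for both object-level and scene-level images and generalizes to open-set images. Experimental results show that the framework effectively synthesizes novel view images from sparse views and outperforms baselines on both trained and open-set category images. More results can be found on the project page: https://sites.google.com/view/dreamsparse-webpage.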

Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry and identity-consistent novel view image. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.

Translated: 2023-06-17 00:10:49, published: 2023-06-14

# Entanglement Maximization in Low-Energy Neutron-Proton Scattering

Entanglement Maximization in Low-Energy Neutron-Proton Scattering ( http://arxiv.org/abs/2306.03239v2 )

License: see the linked page

Gerald A. Miller

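(Reference translation) The entanglement properties of neutron-proton scattering are investigated using a measure that counts the number of entangled pairs produced by the action of the scattering operator on a given initial neutron-proton state. All phase shifts relevant for scattering at laboratory energies up to 350 MeV are used. Entanglement is found to be maximized in very low-energy scattering. At such energies the Hamiltonian obeys Wigner SU(4) symmetry, and an entanglement maximum is a signature of that symmetry. At higher energies the entanglement depends strongly on angle and is large for many scattering angles. The tensor force is shown to play a significant role in producing entanglement at laboratory kinetic energies above about 50 MeV.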

The entanglement properties of neutron-proton scattering are investigated using a measure that counts the number of entangled pairs produced by the action of a scattering operator on a given initial neutron-proton state. All phase shifts relevant for scattering at laboratory energies up to 350 MeV are used. Entanglement is found to be maximized in very low energy scattering. At such energies the Hamiltonian obeys Wigner SU(4) symmetry, and an entanglement maximum is a sign of that symmetry. At higher energies the angular dependence of entanglement is strong and the entanglement is large for many scattering angles. The tensor force is shown to play a significant role in producing entanglement at lab kinetic energies greater than about 50 MeV.

Translated: 2023-06-17 00:10:24, published: 2023-06-14

# MidMed: Towards Mixed-Type Dialogues for Medical Consultation

MidMed: Towards Mixed-Type Dialogues for Medical Consultation ( http://arxiv.org/abs/2306.02923v2 )

License: see the linked page

Xiaoming Shi, Zeming Liu, Chuan Wang, Haitao Leng, Kui Xue, Xiaofan Zhang, Shaoting Zhang

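(Reference translation) Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before a medical consultation. In many real scenarios, however, patients lack medical knowledge and usually find it difficult to determine clear goals with all the necessary slots. In this paper, we frame this challenge as the question of how to construct medical consultation dialogue systems that help patients clarify their goals. To mitigate it, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, called MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system) and contains 8,175 dialogues. We also build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, called InsMed, to address this task. Experimental results show the effectiveness of InsMed.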

Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before medical consultation. However, in many real scenarios, due to the lack of medical knowledge, it is usually difficult for patients to determine clear goals with all necessary slots. In this paper, we identify this challenge as how to construct medical consultation dialogue systems to help patients clarify their goals. To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system), with 8,175 dialogues. Furthermore, we build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, termed InsMed, to address this task. Experimental results show the effectiveness of InsMed.

Translated: 2023-06-17 00:10:12, published: 2023-06-14

# A survey of Generative AI Applications

A survey of Generative AI Applications ( http://arxiv.org/abs/2306.02781v2 )

License: see the linked page

Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchán

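(Reference translation) Generative AI has grown remarkably in recent years, giving rise to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and multimodal generative AIs. The survey is organized into sections covering a wide range of unimodal generative AI applications such as text, images, video, gaming, and brain information. It aims to be a valuable resource that helps researchers and practitioners navigate the rapidly expanding landscape of generative AI, better understand the current state of the art, and foster further innovation in the field.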

Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.

Translated: 2023-06-17 00:09:53, published: 2023-06-14

# MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning

MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning ( http://arxiv.org/abs/2306.02252v2 )

License: see the linked page

Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng

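(Reference translation) We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding. Despite notable progress in video understanding, most prior work does not offer tasks or models that address holistic video understanding and the innate visual narrative structure of long-form videos. To address this gap, we propose the MoviePuzzle task, which strengthens the temporal feature learning and structure learning of video models by reshuffling the shot, frame, and clip layers of movie segments in the presence of video-dialogue information. We first build a carefully refined dataset based on MovieNet by dissecting movies into hierarchical layers and randomly permuting their order. Besides benchmarking MoviePuzzle with prior work on movie understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC) model that considers the underlying structure and visual semantic order for movie reordering. Specifically, we train models to predict the correct order of each layer through a pairwise and contrastive learning approach. This equips them to decipher the visual narrative structure of movies and to handle the disorder in video data. Experiments show that our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark, underscoring its efficacy.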

We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding. Despite the notable progress that has been witnessed in the realm of video understanding, most prior works fail to present tasks and models to address holistic video understanding and the innate visual narrative structures existing in long-form videos. To tackle this quandary, we put forth the MoviePuzzle task that amplifies the temporal feature learning and structure learning of video models by reshuffling the shot, frame, and clip layers of movie segments in the presence of video-dialogue information. We start by establishing a carefully refined dataset based on MovieNet by dissecting movies into hierarchical layers and randomly permuting the orders. Besides benchmarking the MoviePuzzle with prior arts on movie understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC) model that considers the underlying structure and visual semantic orders for movie reordering. Specifically, through a pairwise and contrastive learning approach, we train models to predict the correct order of each layer. This equips them with the knack for deciphering the visual narrative structure of movies and handling the disorder lurking in video data. Experiments show that our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark, underscoring its efficacy.

Translated: 2023-06-17 00:09:40, published: 2023-06-14

# PassGPT: Password Modeling and (Guided) Generation with Large Language Models

PassGPT: Password Modeling and (Guided) Generation with Large Language Models ( http://arxiv.org/abs/2306.01545v2 )

License: see the linked page

Javier Rando and Fernando Perez-Cruz and Briland Hitaj

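(Reference translation) Large language models (LLMs) successfully model natural language from vast amounts of text without explicit supervision. In this paper, we investigate how well LLMs model passwords. We present PassGPT, an LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GANs) by guessing twice as many previously unseen passwords. We also introduce the concept of guided password generation, in which the PassGPT sampling procedure is used to generate passwords that match arbitrary constraints, something current GAN-based strategies cannot do. Finally, we analyze in depth the entropy and probability distribution that PassGPT defines over passwords, and discuss their use in enhancing existing password strength estimators.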

Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.

Translated: 2023-06-17 00:08:52, published: 2023-06-14
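The abstract above does not spell out how guided generation works, so the following is only one plausible reading: sample character by character from an autoregressive model and, at each position, mask out characters that violate the constraint. Everything here is hypothetical, including the template syntax (`l`, `u`, `d`) and `toy_next_char_probs`, a uniform stand-in for a trained model.

```python
import random
import string

# Hypothetical template alphabet: one constraint symbol per password position.
CHAR_CLASSES = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
}


def toy_next_char_probs(prefix):
    """Stand-in for a language model: uniform over letters and digits."""
    vocab = string.ascii_letters + string.digits
    return {ch: 1.0 / len(vocab) for ch in vocab}


def guided_sample(template, next_char_probs, rng):
    """Sample a password one character at a time, honoring the template."""
    password = ""
    for slot in template:
        allowed = CHAR_CLASSES[slot]
        probs = next_char_probs(password)
        # Drop characters that violate the constraint; random.choices
        # renormalizes the remaining weights implicitly.
        masked = {ch: p for ch, p in probs.items() if ch in allowed}
        chars = list(masked)
        weights = [masked[ch] for ch in chars]
        password += rng.choices(chars, weights=weights, k=1)[0]
    return password


sample = guided_sample("ullldd", toy_next_char_probs, random.Random(0))
```

Whatever the model's preferences, the sampled string always has one uppercase letter, three lowercase letters, and two digits, because characters outside the allowed class are never candidates.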

# How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization?

How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization? ( http://arxiv.org/abs/2306.01248v2 )

License: see the linked page

Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh

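(Reference translation) Automatic summarization of legal case judgements has traditionally been attempted with extractive summarization methods. In recent years, however, abstractive summarization models have been gaining popularity because they can generate more natural and coherent summaries. Legal domain-specific pre-trained abstractive summarization models are now available. In addition, general-domain pre-trained large language models (LLMs) such as ChatGPT are known to generate high-quality text and are capable of text summarization. It is therefore natural to ask whether these models are ready for off-the-shelf use to automatically generate abstractive summaries of case judgements. To explore this question, we apply several state-of-the-art domain-specific abstractive summarization models and general-domain LLMs to Indian court case judgements and check the quality of the generated summaries. In addition to standard metrics for summary quality, we check the summaries for inconsistencies and hallucinations. We find that abstractive summarization models generally achieve slightly higher scores than extractive models on standard summary evaluation metrics such as ROUGE and BLEU. However, the generated abstractive summaries often contain inconsistent or hallucinated information. Overall, our investigation indicates that pre-trained abstractive summarization models and LLMs are not yet ready for fully automatic deployment for case judgement summarization; a human-in-the-loop approach that includes manual checks for inconsistencies is more suitable at present.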

Automatic summarization of legal case judgements has traditionally been attempted by using extractive summarization methods. However, in recent years, abstractive summarization models are gaining popularity since they can generate more natural and coherent summaries. Legal domain-specific pre-trained abstractive summarization models are now available. Moreover, general-domain pre-trained Large Language Models (LLMs), such as ChatGPT, are known to generate high-quality text and have the capacity for text summarization. Hence it is natural to ask if these models are ready for off-the-shelf application to automatically generate abstractive summaries for case judgements. To explore this question, we apply several state-of-the-art domain-specific abstractive summarization models and general-domain LLMs on Indian court case judgements, and check the quality of the generated summaries. In addition to standard metrics for summary quality, we check for inconsistencies and hallucinations in the summaries. We see that abstractive summarization models generally achieve slightly higher scores than extractive models in terms of standard summary evaluation metrics such as ROUGE and BLEU. However, we often find inconsistent or hallucinated information in the generated abstractive summaries. Overall, our investigation indicates that the pre-trained abstractive summarization models and LLMs are not yet ready for fully automatic deployment for case judgement summarization; rather a human-in-the-loop approach including manual checks for inconsistencies is more suitable at present.

Translated: 2023-06-17 00:08:41, published: 2023-06-14

# Simultaneous Momentum and Position Measurement and the Instrumental Weyl-Heisenberg Group

Simultaneous Momentum and Position Measurement and the Instrumental Weyl-Heisenberg Group ( http://arxiv.org/abs/2306.01045v2 )

License: see the linked page

Christopher S. Jackson and Carlton M. Caves

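(Reference translation) The canonical commutation relation, $[Q,P] = i\hbar$, stands at the foundation of quantum theory and the original Hilbert space. The interpretation of $P$ & $Q$ as observables has always relied on the analogies between the unitary transformations of Hilbert space and the canonical (a.k.a. contact) transformations of classical phase space. Now that the theory of quantum measurement is essentially complete (this took a while), the canonical commutation relation can be revisited in a way that founds quantum theory not on unitary transformations but on positive transformations. This paper shows how the concept of simultaneous measurement leads to a fundamental differential geometric problem, whose solution shows the following: the simultaneous $P$ & $Q$ measurement (SPQM) defines a universal measuring instrument that takes the shape of a 7-dimensional manifold, a universal covering group we call the Instrumental Weyl-Heisenberg Group, IWH. The group IWH connects the identity to classical phase space in unexpected ways, significant enough that the positive-operator-valued measure (POVM) offers a complete alternative to energy quantization. Five of the dimensions define processes that are easily recognized and understood. The other two dimensions, the normalization and the phase in the center of IWH, are less familiar. The normalization in particular requires special handling in order to describe and understand the SPQM instrument.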

The canonical commutation relation, $[Q,P] = i\hbar$, stands at the foundation of quantum theory and the original Hilbert space. The interpretation of $P$ & $Q$ as observables has always relied on the analogies that exist between the unitary transformations of Hilbert space and the canonical (a.k.a. contact) transformations of classical phase space. Now that the theory of quantum measurement is essentially complete (this took a while), it is possible to revisit the canonical commutation relation in a way that sets the foundation of quantum theory not on unitary transformations, but on positive transformations. This paper shows how the concept of simultaneous measurement leads to a fundamental differential geometric problem whose solution shows us the following: The simultaneous $P$ & $Q$ measurement (SPQM) defines a universal measuring instrument, which takes the shape of a 7-dimensional manifold, a universal covering group we call the Instrumental Weyl-Heisenberg Group, IWH. The group IWH connects the identity to classical phase space in unexpected ways that are significant enough that the positive-operator-valued measure (POVM) offers a complete alternative to energy quantization. Five of the dimensions define processes that can be easily recognized and understood. The other two dimensions, the normalization and phase in the center of IWH, are less familiar. The normalization, in particular, requires special handling in order to describe and understand the SPQM instrument.

Translated: 2023-06-17 00:07:57, published: 2023-06-14

# CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation ( http://arxiv.org/abs/2306.04300v2 )

License: see the linked page

Boyuan Sun, Yuqi Yang, Weifeng Yuan, Le Zhang, Ming-Ming Cheng, Qibin Hou

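(Reference translation) In this paper, we present a simple but performant semi-supervised semantic segmentation approach called CorrMatch. Our goal is to mine more high-quality regions from unlabeled images so that unlabeled data can be used more efficiently through consistency regularization. The key contributions of CorrMatch are two novel, complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Second, we propose propagating high-confidence predictions by measuring the pairwise similarities between pixels. Despite its simplicity, CorrMatch achieves strong performance on popular semi-supervised semantic segmentation benchmarks. Using the DeepLabV3+ framework with a ResNet-101 backbone as the segmentation model, we obtain an mIoU score above 76% on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.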

In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch. Our goal is to mine more high-quality regions from the unlabeled images to leverage the unlabeled data more efficiently via consistency regularization. The key contributions of our CorrMatch are two novel and complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Furthermore, we propose to propagate high-confidence predictions through measuring the pairwise similarities between pixels. Despite its simplicity, we show that CorrMatch achieves great performance on popular semi-supervised semantic segmentation benchmarks. Taking the DeepLabV3+ framework with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images provided. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.

Translated: 2023-06-17 00:00:23, published: 2023-06-14

# DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT

DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT ( http://arxiv.org/abs/2306.04249v2 )

License: see the linked page

Md Ashequr Rahman, Zitong Yu, Richard Laforest, Craig K. Abbey, Barry A. Siegel, Abhinav K. Jha

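(Reference translation) There is an important need for methods that process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build on concepts from model-observer theory and our understanding of the human visual system to propose a detection task-specific deep-learning-based approach for denoising MPI SPECT images (DEMIST). While performing denoising, the approach is designed to preserve features that influence observer performance on detection tasks. We objectively evaluated DEMIST on the task of detecting perfusion defects in a retrospective study with anonymized clinical data from patients who underwent MPI studies on two scanners (N = 338). The evaluation was performed at low-dose levels of 6.25%, 12.5%, and 25% using an anthropomorphic channelized Hotelling observer. Performance was quantified by the area under the receiver operating characteristic curve (AUC). Images denoised with DEMIST yielded significantly higher AUC than the corresponding low-dose images and than images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed in stratified analyses by patient sex and defect type. DEMIST also improved the visual fidelity of the low-dose images, as quantified by root mean squared error and the structural similarity index metric. A mathematical analysis revealed that DEMIST preserved features that assist in detection tasks while improving the noise properties, which improved observer performance. The results provide strong evidence for further clinical evaluation of DEMIST for denoising low-count MPI SPECT images.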

There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose a Detection task-specific deep-learning-based approach for denoising MPI SPECT images (DEMIST). The approach, while performing denoising, is designed to preserve features that influence observer performance on detection tasks. We objectively evaluated DEMIST on the task of detecting perfusion defects using a retrospective study with anonymized clinical data in patients who underwent MPI studies across two scanners (N = 338). The evaluation was performed at low-dose levels of 6.25%, 12.5% and 25% and using an anthropomorphic channelized Hotelling observer. Performance was quantified using area under the receiver operating characteristics curve (AUC). Images denoised with DEMIST yielded significantly higher AUC compared to corresponding low-dose images and images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed with stratified analysis based on patient sex and defect type. Additionally, DEMIST improved visual fidelity of the low-dose images as quantified using root mean squared error and structural similarity index metric. A mathematical analysis revealed that DEMIST preserved features that assist in detection tasks while improving the noise properties, resulting in improved observer performance. The results provide strong evidence for further clinical evaluation of DEMIST to denoise low-count images in MPI SPECT.

Translated: 2023-06-17 00:00:01, published: 2023-06-14
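The abstract above reports performance as AUC. As a reminder of what that number measures, the sketch below computes AUC directly as the fraction of (defect-present, defect-absent) pairs that an observer's test statistic ranks correctly, with ties counted as half. The function name and the scores used in the example are made up for illustration and are not data from the study.

```python
def auc(defect_absent_scores, defect_present_scores):
    """AUC as the probability that a defect-present case outscores a defect-absent one."""
    wins = 0.0
    for pos in defect_present_scores:
        for neg in defect_absent_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(defect_present_scores) * len(defect_absent_scores))


# Hypothetical observer test statistics for two defect-absent and two defect-present images.
example_auc = auc([0.1, 0.4], [0.35, 0.8])
```

In this example three of the four pairs are ranked correctly, so `example_auc` is 0.75; an observer that guesses at random would score about 0.5.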

# Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor

Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor ( http://arxiv.org/abs/2306.03951v2 )

License: see the linked page

Arshad Javeed, Valentín López Jiménez

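(Reference translation) The objective of the project is to explore synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms, in order to arrive at a pragmatic mechanism for controlling the CrazyFlie 2.X quadrotor. The primary objective is to perform PID tuning using reinforcement learning strategies. The secondary objective is to build on what is learned in the first task to implement control for navigation by integrating with the Lighthouse positioning system. Two approaches to navigation are considered: a discrete navigation problem using Deep Q-Learning with a finite set of predefined motion primitives, and deep reinforcement learning for continuous navigation. Simulations for RL training will be performed in gym-pybullet-drones, an open-source gym-based environment for reinforcement learning, and the RL implementations are provided by stable-baselines3.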

The objective of the project is to explore synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms to come up with a pragmatic control mechanism to control the CrazyFlie 2.X quadrotor. The primary objective would be performing PID tuning using reinforcement learning strategies. The secondary objective is to leverage the learnings from the first task to implement control for navigation by integrating with the lighthouse positioning system. Two approaches are considered for navigation, a discrete navigation problem using Deep Q-Learning with finite predefined motion primitives, and deep reinforcement learning for a continuous navigation approach. Simulations for RL training will be performed on gym-pybullet-drones, an open-source gym-based environment for reinforcement learning, and the RL implementations are provided by stable-baselines3.

Translated: 2023-06-16 23:59:32, published: 2023-06-14
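To make the PID-tuning objective in the abstract above concrete, the sketch below shows a textbook discrete PID controller and a tracking cost that a tuning agent could try to minimize. It is an assumption-laden illustration, not the project's code: the one-line "plant" (commanded velocity integrates into altitude), the `episode_cost` function, and all gain values are hypothetical, and nothing here uses gym-pybullet-drones or stable-baselines3.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def episode_cost(gains, setpoint=1.0, dt=0.01, steps=500):
    """Accumulated absolute tracking error for one simulated episode.

    An RL tuner could use the negative of this value as its reward.
    """
    pid = PID(*gains)
    altitude, cost = 0.0, 0.0
    for _ in range(steps):
        command = pid.update(setpoint, altitude, dt)
        altitude += command * dt  # toy plant: command acts as a velocity
        cost += abs(setpoint - altitude) * dt
    return cost
```

On this toy plant a proportional gain of 2.0 tracks the setpoint faster than 0.5, so `episode_cost((2.0, 0.0, 0.0))` is smaller than `episode_cost((0.5, 0.0, 0.0))`; a tuning agent would be rewarded for moving toward the former.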

# I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models

I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models ( http://arxiv.org/abs/2306.03423v2 )

License: see the linked page

Max Reuter, William Schulze

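(Reference translation) Since the release of OpenAI's ChatGPT, generative language models have attracted extensive public attention. The increased usage has highlighted the broad utility of generative models, but has also revealed several forms of embedded bias. Some of it is induced by the pre-training corpus, but additional bias specific to generative models arises from the use of subjective fine-tuning to avoid generating harmful content. Fine-tuning bias may come from individual engineers and company policies, and it affects which prompts the model chooses to refuse. In this experiment, we characterize ChatGPT's refusal behavior using a black-box attack. We first query ChatGPT with a variety of offensive and benign prompts (n=1,706), then manually label each response as compliance or refusal. Manual examination of the responses shows that refusal is not cleanly binary but lies on a continuum, so we map several different kinds of responses onto a binary of compliance or refusal. The small manually labeled dataset is used to train a refusal classifier, which achieves 96% accuracy. We then use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question without seeing ChatGPT's response. The prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=985). We examine our classifiers and the prompt n-grams that are most predictive of compliance or refusal. Our datasets and code are available at https://github.com/maxwellreuter/chatgpt-refusals.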

Since the release of OpenAI's ChatGPT, generative language models have attracted extensive public attention. The increased usage has highlighted generative models' broad utility, but also revealed several forms of embedded bias. Some is induced by the pre-training corpus; but additional bias specific to generative models arises from the use of subjective fine-tuning to avoid generating harmful content. Fine-tuning bias may come from individual engineers and company policies, and affects which prompts the model chooses to refuse. In this experiment, we characterize ChatGPT's refusal behavior using a black-box attack. We first query ChatGPT with a variety of offensive and benign prompts (n=1,706), then manually label each response as compliance or refusal. Manual examination of responses reveals that refusal is not cleanly binary, and lies on a continuum; as such, we map several different kinds of responses to a binary of compliance or refusal. The small manually-labeled dataset is used to train a refusal classifier, which achieves an accuracy of 96%. Second, we use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question, without seeing ChatGPT's response. This prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=985). We examine our classifiers and the prompt n-grams that are most predictive of either compliance or refusal. Our datasets and code are available at https://github.com/maxwellreuter/chatgpt-refusals.

Translated: 2023-06-16 23:59:08, published: 2023-06-14

# Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows

Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows ( http://arxiv.org/abs/2306.06408v2 )

License: see the linked page

Josué Page Vizcaíno, Panagiotis Symvoulidis, Zeguan Wang, Jonas Jelten, Paolo Favaro, Edward S. Boyden, Tobias Lasser

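(Reference translation) Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, for example in neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as the Fourier light field microscope, is a straightforward, single-snapshot solution for this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be reconstructed algorithmically, which makes the XLFM exceptionally well suited to real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (such as deconvolution) require lengthy processing times (0.0220 Hz), which undermines the speed advantage of the XLFM. Neural network architectures can overcome the speed constraints, but at the cost of lacking certainty metrics, which makes them untrustworthy in the biomedical domain. This work proposes a novel architecture, based on a conditional normalizing flow, for fast 3D reconstruction of neural activity in live immobilized zebrafish. It reconstructs volumes spanning 512x512x96 voxels at 8 Hz and can be trained in under two hours thanks to its small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow exact likelihood computation, which enables distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method with a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.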

Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact Likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.

翻訳日:2023-06-16 23:51:56 公開日:2023-06-14
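The exact-likelihood property mentioned in the abstract above can be illustrated with a deliberately tiny sketch. It assumes a one-dimensional, unconditional affine flow with a standard normal base density, which is far simpler than the conditional flow in the paper; the function names, the fitted parameters, and the threshold are all hypothetical.

```python
import math


def affine_flow_log_likelihood(x, flow_shift, log_scale):
    """Exact log p(x) for the 1D affine flow z = (x - flow_shift) * exp(-log_scale)."""
    z = (x - flow_shift) * math.exp(-log_scale)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
    log_det = -log_scale  # log |dz/dx| from the change of variables
    return log_base + log_det


def is_out_of_distribution(x, flow_shift, log_scale, threshold):
    """Flag a sample whose log-likelihood falls below a chosen threshold."""
    return affine_flow_log_likelihood(x, flow_shift, log_scale) < threshold


# Hypothetical fitted parameters (mean 2.0, standard deviation 0.5) and threshold.
flow_shift, log_scale = 2.0, math.log(0.5)
threshold = -4.0

for sample in (2.0, 5.0):
    flagged = is_out_of_distribution(sample, flow_shift, log_scale, threshold)
    print(sample, "out-of-distribution" if flagged else "in-distribution")
```

In a monitoring loop, a flagged sample would be the trigger for retraining; how the threshold is chosen (for example, from a percentile of training likelihoods) is left open here.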

# Error Feedback Can Accurately Compress Preconditioners ( http://arxiv.org/abs/2306.06098v2 )

License: see the linked page

Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Dan Alistarh

Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs even when applied to medium-scale models, because they must store a sliding window of gradients whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, and feeds the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.

Translated: 2023-06-16 23:51:10 Published: 2023-06-14
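The error-feedback step described in the abstract above can be sketched as follows. This is a minimal illustration, assuming top-k sparsification as the compressor; the class and function names are hypothetical, and it is not the authors' EFCP implementation (the preconditioner that would consume the compressed gradient is omitted).

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


class ErrorFeedbackCompressor:
    """Compress gradients while carrying the compression error forward."""

    def __init__(self, dim, k):
        self.k = k
        self.error = [0.0] * dim  # accumulated compression error

    def compress(self, gradient):
        # Add back what earlier compression steps dropped.
        corrected = [g + e for g, e in zip(gradient, self.error)]
        compressed = top_k_sparsify(corrected, self.k)
        # Whatever is dropped now is fed back into future iterations.
        self.error = [c - s for c, s in zip(corrected, compressed)]
        return compressed


compressor = ErrorFeedbackCompressor(dim=3, k=1)
for grad in ([1.0, -0.5, 0.25], [0.5, 0.5, -2.0]):
    print(compressor.compress(grad), compressor.error)
```

The useful invariant is that the sum of all compressed outputs plus the current error buffer equals the sum of all raw gradients, so no gradient mass is lost permanently.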

# CARSO: Counter-Adversarial Recall of Synthetic Observations ( http://arxiv.org/abs/2306.06081v2 )

License: see the linked page

Emanuele Ballarin, Alessio Ansuini, Luca Bortolussi

In this paper, we propose CARSO, a novel adversarial defence mechanism for image classification inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation on a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO defends the classifier significantly better than state-of-the-art adversarial training alone, with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats and from end-to-end attacks adapted to fool stochastic defences. Code and pre-trained models are available at https://github.com/emaballarin/CARSO.

Translated: 2023-06-16 23:50:39 Published: 2023-06-14

# COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models ( http://arxiv.org/abs/2306.05659v2 )

License: see the linked page

Zihao Tan, Qingliang Chen, Wenbin Zhu and Yongjian Huang

Prompt-based learning has proven effective for pre-trained language models (PLMs), especially in low-resource scenarios such as few-shot settings. However, the trustworthiness of PLMs is of paramount significance, and prompt-based templates have been shown to contain potential vulnerabilities that could mislead the predictions of language models, causing serious security concerns. In this paper, we shed light on some vulnerabilities of PLMs by proposing a prompt-based adversarial attack on manual templates in black-box scenarios. First, we design character-level and word-level heuristic approaches to break manual templates separately. Then we present a greedy algorithm for the attack based on these heuristic destructive approaches. Finally, we evaluate our approach on classification tasks with three variants of BERT-series models and eight datasets. Comprehensive experimental results confirm the effectiveness of our approach in terms of attack success rate and attack speed. Further experimental studies indicate that the proposed method also performs well in scenarios with varying shot counts, template lengths and query counts, exhibiting good generalizability.

Translated: 2023-06-16 23:49:00 Published: 2023-06-14

# Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY ( http://arxiv.org/abs/2306.05649v2 )

License: see the linked page

Eric Luxenberg and Dhruv Malik and Yuanzhi Li and Aarti Singh and Stephen Boyd

We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in an analytical form. In general, the problem can be made tractable via dualization, which turns a min-max problem into a min-min problem. Dualization requires expertise and is tedious and error-prone. We demonstrate how CVXPY can be used to automate this dualization procedure in a user-friendly manner. Our framework allows practitioners to specify and solve robust ERM problems with a general class of convex losses, capturing many standard regression and classification problems. Users can easily specify any complex uncertainty set that is representable via disciplined convex programming (DCP) constraints.

Translated: 2023-06-16 23:48:42 Published: 2023-06-14
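As a small worked example of the "simple cases" with an analytical form mentioned above, consider one data point, an absolute loss, and a box uncertainty set on the features with radius rho. The worst-case loss is then |theta·x - y| + rho·||theta||_1. The sketch below checks this closed form against brute-force enumeration of the box vertices; it uses plain Python rather than CVXPY, the function names are hypothetical, and the choice of loss and uncertainty set is an assumption made for illustration only.

```python
from itertools import product


def worst_case_abs_loss_closed_form(theta, x, y, rho):
    """max over ||delta||_inf <= rho of |theta.(x + delta) - y|, in closed form."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return abs(residual) + rho * sum(abs(t) for t in theta)


def worst_case_abs_loss_by_enumeration(theta, x, y, rho):
    """Same quantity by checking every vertex of the box (a convex loss peaks at a vertex)."""
    worst = 0.0
    for signs in product((-1.0, 1.0), repeat=len(x)):
        perturbed = [xi + rho * s for xi, s in zip(x, signs)]
        residual = sum(t * xi for t, xi in zip(theta, perturbed)) - y
        worst = max(worst, abs(residual))
    return worst


theta, x, y, rho = [2.0, -1.0, 0.5], [1.0, 0.0, 4.0], 1.0, 0.25
print(worst_case_abs_loss_closed_form(theta, x, y, rho))
print(worst_case_abs_loss_by_enumeration(theta, x, y, rho))
```

Enumeration costs 2^d evaluations for d features, which is why closed forms or automated dualization matter once the uncertainty set or loss becomes less trivial.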

# Tunable quantum emitters on large-scale foundry silicon photonics ( http://arxiv.org/abs/2306.06460v2 )

License: see the linked page

Hugo Larocque, Mustafa Atabey Buyukkaya, Carlos Errando-Herranz, Samuel Harper, Jacques Carolan, Chang-Min Lee, Christopher J.K. Richardson, Gerald L. Leake, Daniel J. Coleman, Michael L. Fanto, Edo Waks, Dirk Englund

Controlling large-scale many-body quantum systems at the level of single photons and single atomic systems is a central goal in quantum information science and technology. Intensive research and development have propelled foundry-based silicon-on-insulator photonic integrated circuits to a leading platform for large-scale optical control with individual mode programmability. However, integrating atomic quantum systems with single-emitter tunability remains an open challenge. Here, we overcome this barrier through the hybrid integration of multiple InAs/InP microchiplets containing high-brightness infrared semiconductor quantum dot single-photon emitters into advanced silicon-on-insulator photonic integrated circuits fabricated in a 300 mm foundry process. With this platform, we achieve single-photon emission via resonance fluorescence and scalable emission wavelength tunability through an electrically controlled non-volatile memory. The combined control of photonic and quantum systems opens the door to programmable quantum information processors manufactured in leading semiconductor foundries.

Translated: 2023-06-16 23:39:40 Published: 2023-06-14

# Contextual Font Recommendations based on User Intent ( http://arxiv.org/abs/2306.08188v1 )

License: see the linked page

Sanat Sharma, Jayant Kumar, Jing Zheng, Tracy Holloway King

Adobe Fonts has a rich library of over 20,000 unique fonts that Adobe users rely on to create graphics, posters, composites and more. Because the library is so large, knowing which font to select can be a daunting task that requires a lot of experience. For most users of Adobe products, especially casual users of Adobe Express, this often means choosing the default font instead of drawing on the rich and diverse fonts available. In this work, we create an intent-driven system that provides contextual font recommendations to aid users in their creative journey. Our system takes multilingual text input and recommends suitable fonts based on the user's intent. The mix of free and paid fonts is adjusted according to user entitlements. The feature is currently used by millions of Adobe Express users, with a click-through rate (CTR) above 25%.

Translated: 2023-06-16 20:56:06 Published: 2023-06-14

# ZeroForge: Feedforward Text-to-Shape Without 3D Supervision ( http://arxiv.org/abs/2306.08183v1 )

License: see the linked page

Kelly O. Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya Balu, Adarsh Krishnamurthy, Chinmay Hegde

Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.

Translated: 2023-06-16 20:55:53 Published: 2023-06-14

# Experimental implementation of quantum greedy optimization on quantum computer ( http://arxiv.org/abs/2306.08181v1 )

License: see the linked page

Tadayoshi Matsumori, Tadashi Kadowaki

This paper implements a quantum greedy optimization algorithm based on the discretization of time evolution (d-QGO). Quantum greedy optimization, which was originally developed to reduce processing time via counterdiabatic driving, sequentially selects a parameter in the counterdiabatic term from a sensitivity analysis of the energy and then determines the parameter value. When d-QGO is implemented on a quantum computer, the sensitivity analysis may become a bottleneck for finding the ground state in a short time because of device and shot noise. In this paper, we present an improved sensitivity analysis for d-QGO that employs a sufficiently large differential interval. We demonstrate that d-QGO reduces the number of shots required to determine the sensitivity while maintaining the success probability.

Translated: 2023-06-16 20:55:39 Published: 2023-06-14
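One way to see why a larger differential interval can reduce the shot count is a generic finite-difference noise argument, sketched below. This is an assumption-laden illustration, not the paper's analysis: it supposes a central difference (E(θ+h) - E(θ-h)) / (2h), independent shot noise with a fixed per-shot standard deviation, and it ignores the bias that a large h introduces. All names are hypothetical.

```python
import math


def finite_difference_std(shot_std, shots, interval):
    """Std. dev. of the central-difference estimate when each energy is a mean over `shots` samples."""
    # Var[E_mean] = shot_std^2 / shots; the difference doubles it; dividing by 2h scales it by 1/(4 h^2).
    return shot_std / (interval * math.sqrt(2.0 * shots))


def shots_for_target_std(shot_std, interval, target_std):
    """Shots per energy evaluation needed to reach `target_std` on the sensitivity estimate."""
    return math.ceil(shot_std ** 2 / (2.0 * interval ** 2 * target_std ** 2))


for interval in (0.125, 0.5):
    print(interval, shots_for_target_std(shot_std=1.0, interval=interval, target_std=0.25))
```

Under these assumptions the required shots scale as 1/h², so quadrupling the interval cuts the shot count by a factor of sixteen; the price is a coarser approximation of the true derivative.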

# Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer ( http://arxiv.org/abs/2306.08204v1 )

License: see the linked page

Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Sejin Kim, Sundong Kim

In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet our work also reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.

Translated: 2023-06-16 20:47:20 Published: 2023-06-14

# Graph Laplacian Learning with Exponential Family Noise ( http://arxiv.org/abs/2306.08201v1 )

License: see the linked page

Changhao Shi, Gal Mishne

A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family noise distribution, so that it can model various data types. We propose an alternating algorithm that estimates the graph Laplacian as well as the unobserved smooth representation from the noisy signals. We demonstrate on synthetic and real-world data that our new algorithm outperforms competing Laplacian estimation methods under noise model mismatch.

Translated: 2023-06-16 20:47:09 Published: 2023-06-14
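The notion of a "smooth graph signal" used above is usually measured by the Laplacian quadratic form x^T L x, which equals the weighted sum of squared differences across edges. The sketch below only illustrates that quantity on a toy graph; it is not the alternating estimation algorithm from the paper, and the function names and the example graph are hypothetical.

```python
def graph_laplacian(weights):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(laplacian, signal):
    """Quadratic form x^T L x; small values mean the signal varies little across edges."""
    n = len(signal)
    return sum(signal[i] * laplacian[i][j] * signal[j] for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with edge weights 1 and 2.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
L = graph_laplacian(W)
print(smoothness(L, [1.0, 1.0, 1.0]))  # constant signal
print(smoothness(L, [1.0, 2.0, 4.0]))  # signal that changes along the path
```

Graph learning methods in this family typically search for a Laplacian that makes the observed (or latent) signals score low on this measure, subject to constraints that rule out the trivial empty graph.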

# POP: Prompt Of Prompts for Continual Learning ( http://arxiv.org/abs/2306.08200v1 )

License: see the linked page

Zhiyuan Hu, Jiancheng Lyu, Dashan Gao, Nuno Vasconcelos

Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation learned from very large datasets, provide an interesting substrate for solving the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question, however, is how to learn both prompts that are task-specific and prompts that are global, i.e. that capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specific prompts and a group of global prompts, denoted as POP, that integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting while still outperforming competing methods trained on the entire dataset.

Translated: 2023-06-16 20:46:54 Published: 2023-06-14

# Explainable and Position-Aware Learning in Digital Pathology ( http://arxiv.org/abs/2306.08198v1 )

License: see the linked page

Milan Aryal and Nasim Yahyasoltani

Encoding whole slide images (WSIs) as graphs is well motivated, since it allows a gigapixel-resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Graph-based learning methods can then be used for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, these methods do not take the positional information of the patches into account, and if two patches lie in topologically isomorphic neighborhoods, their embeddings are nearly identical. In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. To represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNNs). The algorithm is then tested on WSI datasets for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. A gradient-based explainability approach is used to generate saliency maps for the WSIs. These can be used to inspect the regions of a WSI that are responsible for the cancer diagnosis, thus rendering the proposed model explainable.

Translated: 2023-06-16 20:46:37 Published: 2023-06-14

# Learning on Graphs under Label Noise ( http://arxiv.org/abs/2306.08194v1 )

License: see the linked page

Jingyang Yuan, Xiao Luo, Yifang Qin, Yusheng Zhao, Wei Ju, Ming Zhang

Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that the label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which encourages two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between a node's label and those of its neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.

Translated: 2023-06-16 20:46:17 Published: 2023-06-14
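The homophily-based sample selection described above can be approximated by a very small sketch: score each node by the fraction of neighbors that share its label and flag low-scoring nodes as possibly mislabeled. This is an illustrative simplification, not the CGNN procedure; the function names, the threshold of 0.5, and the toy graph are hypothetical, and the paper's method may compare against model predictions rather than raw labels.

```python
def neighbor_agreement(node, labels, adjacency):
    """Fraction of a node's neighbors that carry the same label as the node."""
    neighbors = adjacency[node]
    if not neighbors:
        return 1.0  # isolated nodes give no evidence of noise
    same = sum(1 for other in neighbors if labels[other] == labels[node])
    return same / len(neighbors)


def flag_noisy_nodes(labels, adjacency, threshold=0.5):
    """Nodes whose label disagrees with most neighbors, under the homophily assumption."""
    return [
        node for node in sorted(adjacency)
        if neighbor_agreement(node, labels, adjacency) < threshold
    ]


# Two communities joined by the edge 3-4; node 2 sits in the "a" community but is labeled "b".
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
labels = {0: "a", 1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "b"}
print(flag_noisy_nodes(labels, adjacency))
```

A purification step would then relabel or down-weight the flagged nodes; on graphs with weak homophily this heuristic would flag many correct labels, so the threshold matters.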

# Operationalising Representation in Natural Language Processing ( http://arxiv.org/abs/2306.08193v1 )

License: see the linked page

Jacqueline Harding

Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.

Translated: 2023-06-16 20:45:59 Published: 2023-06-14

# Inductive Linear Probing for Few-shot Node Classification ( http://arxiv.org/abs/2306.08192v1 )

License: see the linked page

Hirthik Mathavan, Zhen Tan, Nivedh Mudiam, Huan Liu

Meta-learning has emerged as a powerful training strategy for few-shot node classification, demonstrating its effectiveness in the transductive setting. However, the existing literature predominantly focuses on transductive few-shot node classification, neglecting the inductive setting that is widely studied in the broader few-shot learning community. This oversight limits our understanding of how meta-learning based methods perform on graph data. In this work, we conduct an empirical study to highlight the limitations of current frameworks in the inductive few-shot node classification setting. Additionally, we propose a simple yet competitive baseline approach specifically tailored for inductive few-shot node classification tasks. We hope our work can provide a new path forward to better understand how the meta-learning paradigm works in the graph domain.

Translated: 2023-06-16 20:45:43 Published: 2023-06-14

# Solving Large-scale Spatial Problems with Convolutional Neural Networks ( http://arxiv.org/abs/2306.08191v1 )

License: see the linked page

Damian Owerko, Charilaos I. Kanatsoulis, Alejandro Ribeiro

Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in model complexity and in the amount of data ingested. This is becoming unsustainable, and a renewed focus on efficiency is therefore necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals but evaluated on arbitrarily large signals with little to no performance degradation, and we provide a theoretical bound on the resulting generalization error. Our proof leverages the shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.

Translated: 2023-06-16 20:45:28 Published: 2023-06-14
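The shift-equivariance property that the abstract above relies on can be checked directly on a single convolutional filter. The sketch below assumes a one-dimensional signal with circular (wrap-around) padding, which makes the property hold exactly; the function names and the example signals are hypothetical, and a real CNN would stack many such filters with nonlinearities.

```python
def circular_conv(signal, kernel):
    """1D cross-correlation with wrap-around padding; output has the same length as the input."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Cyclically shift a signal to the right by `steps` positions."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps] if steps else list(signal)


kernel = [1, -2, 1]  # the same filter is reused for every input size
small_window = [3, 1, 4, 1, 5]
large_signal = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]

for signal, steps in ((small_window, 2), (large_signal, 5)):
    shifted_then_filtered = circular_conv(shift(signal, steps), kernel)
    filtered_then_shifted = shift(circular_conv(signal, kernel), steps)
    print(len(signal), shifted_then_filtered == filtered_then_shifted)
```

Because the filter has no dependence on absolute position or input length, weights fitted on the short window apply unchanged to the long signal, which is the intuition behind training small and evaluating large.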

# Assessing the Effectiveness of GPT-3 in Detecting False Political Statements: A Case Study on the LIAR Dataset ( http://arxiv.org/abs/2306.08190v1 )

License: see the linked page

Mars Gokturk Buchholz

The detection of false political statements is crucial for maintaining information integrity and preventing the spread of misinformation in society. Historically, state-of-the-art machine learning models have employed various methods for detecting deceptive statements. These methods include the use of metadata (W. Wang et al., 2018), n-gram analysis (Singh et al., 2021), and linguistic (Wu et al., 2022) and stylometric (Islam et al., 2020) features. Recent large language models, such as GPT-3 (Brown et al., 2020), have achieved state-of-the-art performance on a wide range of tasks. In this study, we conducted experiments with GPT-3 on the LIAR dataset (W. Wang et al., 2018) and achieved higher accuracy than state-of-the-art models without using any additional meta or linguistic features. Additionally, we experimented with zero-shot learning using a carefully designed prompt and achieved near state-of-the-art performance. An advantage of this approach is that the model provides evidence for its decision, which adds transparency to the model's decision-making and gives users a chance to verify the validity of that evidence.

Translated: 2023-06-16 20:45:13 Published: 2023-06-14

# Language models are not naysayers: An analysis of language models on negation benchmarks ( http://arxiv.org/abs/2306.08189v1 )

License: see the linked page

Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn

Negation has been shown to be a major bottleneck for masked language models, such as BERT. However, whether this finding still holds for larger auto-regressive language models ("LLMs") has not been studied comprehensively. With the ever-increasing volume of research on and applications of LLMs, we take a step back to evaluate the ability of current-generation LLMs to handle negation, a fundamental linguistic phenomenon that is central to language understanding. We evaluate different LLMs (including the open-source GPT-neo, GPT-3, and InstructGPT) against a wide range of negation benchmarks. Through systematic experimentation with varying model sizes and prompts, we show that LLMs have several limitations, including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.

Translated: 2023-06-16 20:44:49 Published: 2023-06-14

# Unbiased Learning of Deep Generative Models with Structured Discrete Representations ( http://arxiv.org/abs/2306.08230v1 )

License: see the linked page

Harry Bendekgey, Gabriel Hope and Erik B. Sudderth

By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.

Translated: 2023-06-16 20:36:45 Published: 2023-06-14

# 通信帯域集積型マルチモードフォトニック量子メモリ

Telecom-band integrated multimode photonic quantum memory ( http://arxiv.org/abs/2306.08229v1 )

ライセンス: Link先を確認

Xueying Zhang and Bin Zhang and Shihai Wei and Hao Li and Jinyu Liao and Cheng Li and Guangwei Deng and You Wang and Haizhi Song and Lixing You and Bo Jing and Feng Chen and Guang-Can Guo and Qiang Zhou

(参考訳) テレコムバンド集積量子メモリは、ファイバ通信インフラと互換性のある量子ネットワークを開発するための基本的なビルディングブロックである。このような大容量ネットワークに向けて、テレコムバンドにおける集積マルチモードフォトニック量子メモリがまだ実証されていない。本稿では,ファイバ集積型マルチモード量子記憶装置を,レーザ書き起こしチップ上のテレコムバンドに設置する。 Er3+:LiNbO3導波路をファイバピグテールとし、1532nmで4GHz幅のシーケンシャル単一光子の330時間モードと、単一モードに対する偶然検出率の167倍の増大が可能な記憶装置である。全ファイバーアドレス付きメモリシステムは、通信帯域ファイバ統合およびオンチップデバイスを用いて行う。この結果は、統合フォトニクスデバイスを用いた将来の量子ネットワークにとって重要なステップである。

Telecom-band integrated quantum memory is an elementary building block for developing quantum networks compatible with fiber communication infrastructures. Towards such a network with large capacity, an integrated multimode photonic quantum memory at telecom band has yet to be demonstrated. Here we report fiber-integrated multimode quantum storage of single photons at telecom band on a laser-written chip. The storage device is a fiber-pigtailed Er3+:LiNbO3 waveguide and allows storage of up to 330 temporal modes of heralded single photons with 4-GHz-wide bandwidth at 1532 nm, and a 167-fold increase in the coincidence detection rate with respect to single-mode storage. Our memory system with all-fiber addressing is implemented using telecom-band fiber-integrated and on-chip devices. The results represent an important step toward future quantum networks using integrated photonic devices.

翻訳日:2023-06-16 20:36:25 公開日:2023-06-14

# CLIPXPlore: 3次元形状探索のための複合CLIPと形状空間

CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration ( http://arxiv.org/abs/2306.08226v1 )

ライセンス: Link先を確認

Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang and Chi-Wing Fu

(参考訳) 本稿では,3次元形状空間の探索を支援するために視覚言語モデルを活用した新しいフレームワークであるCLIPXPloreを提案する。近年,3次元形状を学習された潜在形状空間にエンコードして生成設計とモデリングを可能にする手法が数多く開発されている。しかし、豊富な情報にもかかわらず、既存の手法には効果的な探索機構がない。そこで我々は,形状空間探索を支援するために,事前学習された視覚言語モデルである clip を活用することを提案する。私たちの考えは3倍です。まず,CLIPと形状空間をペアにし,スケッチ画像からCLIPと形状コードを生成し,2つの空間を接続するマッパーネットワークを訓練する。第二に、与えられた形状の周囲の空間を探索するために、形状の幾何によくマッチするCLIPコードを探すための最適化戦略を定式化します。第3に,2成分誘導,テキスト誘導,スケッチ誘導の3つの探索モードを設計し,形状空間における適切な探索軌跡を特定し,形状に有意な変化をもたらす。我々は,CLIPXPloreを3つの探索モードごとに異なるベースラインと定量的かつ視覚的に比較する一連の実験を行い,既存のソリューションでは達成できない多くの有意義な探索結果が得られることを示した。

This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape. We perform a series of experiments to quantitatively and visually compare CLIPXPlore with different baselines in each of the three exploration modes, showing that CLIPXPlore can produce many meaningful exploration results that cannot be achieved by the existing solutions.

翻訳日:2023-06-16 20:36:12 公開日:2023-06-14

# 潜在状態表現を用いた政策移行によるアジャイルロコモーションの汎用性の拡大

Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation ( http://arxiv.org/abs/2306.08224v1 )

ライセンス: Link先を確認

Guilherme Christmann, Ying-Sheng Luo, Jonathan Hans Soeseno, Wei-Chao Chen

(参考訳) 本稿では,実環境におけるロボット移動の汎用性を高めるロバストな遷移戦略であるtransition-netを提案する。この目的のために、我々は異なる歩行の複雑さを現実世界のロボットに適用可能な専用の移動ポリシーに分散することから始める。次に、ロバストな遷移を伴うポリシーを、潜在状態表現を調べることによって単一のコヒーレントなメタコントローラに統一することにより、ロボットの汎用性を拡大する。本手法により,ロボットはリパートリーを反復的に拡張し,ライブラリ内の任意のポリシーペア間の堅牢な遷移を可能にする。我々のフレームワークでは、新しいスキルを追加することは、以前に学んだスキルを変えるプロセスを導入しない。さらに、locomotionポリシーのトレーニングには、1つのコンシューマgpuで1時間もかからない。我々のアプローチは実世界で有効であり、既存のアプローチに比べて実験において最も困難な移行ペアの平均成功率は19%高い。

This paper proposes transition-net, a robust transition strategy that expands the versatility of robot locomotion in real-world settings. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training a locomotion policy takes less than an hour on a single consumer GPU. Our approach is effective in the real world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.

翻訳日:2023-06-16 20:35:48 公開日:2023-06-14

# コントラスト損失は、平行線としてアナロジーを回復するために必要なすべて

Contrastive Loss is All You Need to Recover Analogies as Parallel Lines ( http://arxiv.org/abs/2306.08221v1 )

ライセンス: Link先を確認

Narutatsu Ri, Fei-Tzin Lee, Nakul Verma

(参考訳) 静的な単語埋め込みモデルは、言語アナロジーを高次元空間における平行線として表現することが知られているが、なぜそのような幾何学的構造をもたらすのかというメカニズムは、いまだ不明である。学習時間に劇的な高速化を図りながら、分布情報よりも基本的なコントラスト型手法が、アナログ回復タスクにおける一般的な単語埋め込みモデルと競合することを示した。さらに, コントラスト損失は, 単語埋め込みにおいてこれらの並列構造を作るのに十分であることを示すとともに, 共起統計量と単語埋め込みの幾何学的構造との正確な関係を確立する。

While static word embedding models are known to represent linguistic analogies as parallel lines in high-dimensional space, the underlying mechanism as to why they result in such geometric structures remains obscure. We find that an elementary contrastive-style method employed over distributional information performs competitively with popular word embedding models on analogy recovery tasks, while achieving dramatic speedups in training time. Further, we demonstrate that a contrastive loss is sufficient to create these parallel structures in word embeddings, and establish a precise relationship between the co-occurrence statistics and the geometric structure of the resulting word embeddings.
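
The "analogies as parallel lines" property mentioned in the abstract above can be checked numerically. The sketch below is only an illustration of that geometric test, not the authors' contrastive training method: the 3-dimensional embeddings and the helper names (`offset`, `cosine`, `parallelism`) are made up for the example.

```python
import math

def offset(u, v):
    """Offset vector v - u between two word embeddings."""
    return [b - a for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def parallelism(a, b, c, d):
    """Cosine between (b - a) and (d - c); 1.0 means the two lines are parallel."""
    return cosine(offset(a, b), offset(c, d))

# Toy embeddings, invented for illustration (real embeddings are high-dimensional).
emb = {
    "man":   [1.0, 0.0, 0.5],
    "woman": [1.0, 1.0, 0.5],
    "king":  [3.0, 0.25, 1.0],
    "queen": [3.0, 1.25, 1.0],
}

# man -> woman and king -> queen share the same offset, so the score is 1.0.
score = parallelism(emb["man"], emb["woman"], emb["king"], emb["queen"])
print(score)
```

A score near 1 for many analogy quadruples would indicate the parallel structure described in the abstract; a score near 0 indicates unrelated offsets.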

翻訳日:2023-06-16 20:35:32 公開日:2023-06-14

# SMC-UDA:unsupervised cross-domain Renal Segmentationのための構造的制約

SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation ( http://arxiv.org/abs/2306.08213v1 )

ライセンス: Link先を確認

Zhusi Zhong, Jie Li, Lulu Bi, Li Yang, Ihab Kamel, Rama Chellappa, Xinbo Gao, Harrison Bai, Zhicheng Jiao

(参考訳) 深層学習に基づく医用画像のセグメンテーションは、異なる領域の画像にデプロイされると、しばしば失敗する。ドメイン適応手法はドメインシフトの問題を解決することを目的としているが、まだいくつかの問題に直面している。転送学習法は対象ドメインのアノテーションを必要とし、生成的非教師なしドメイン適応(uda)モデルはドメイン固有の表現を無視し、その生成品質はセグメンテーション性能を非常に制限する。本研究では,識別パラダイムに基づく新しい構造モード制約(SMC) UDA フレームワークを提案し,ドメイン間のブリッジとしてエッジ構造を導入する。提案するマルチモーダル学習バックボーンは,画像テクスチャから構造情報を蒸留し,領域不変エッジ構造を識別する。構造に制約のある自己学習とプログレッシブROIでは,エッジの3次元空間構造を見極めることで腎臓を分節する。我々は,SMC-UDAを,ラベル付きソースドメイン (CT) からラベルなしターゲットドメイン (CT/MRI) に適応させることにより,公開腎セグメンテーションデータセット上で評価した。実験の結果,提案するSMC-UDAの一般化は良好であり,生成的UDA法よりも優れていた。

Medical image segmentation based on deep learning often fails when deployed on images from a different domain. Domain adaptation methods aim to solve domain-shift challenges, but still face some problems. Transfer learning methods require annotation on the target domain, and generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, so the quality of their generated images strongly limits segmentation performance. In this study, we propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains. The proposed multi-modal learning backbone distills structure information from image texture to distinguish domain-invariant edge structure. With structure-constrained self-learning and progressive ROI, our method segments the kidney by locating the 3D spatial structure of the edge. We evaluated SMC-UDA on public renal segmentation datasets, adapting from the labeled source domain (CT) to the unlabeled target domain (CT/MRI). The experiments show that the proposed SMC-UDA generalizes well and outperforms generative UDA methods.

翻訳日:2023-06-16 20:35:20 公開日:2023-06-14

# PEPSにおけるコーナー・トランスファー法のコスト削減

Reduced Contraction Costs of Corner-Transfer Methods for PEPS ( http://arxiv.org/abs/2306.08212v1 )

ライセンス: Link先を確認

Wangwei Lan, Glen Evenbly

(参考訳) コーナー・トランスファー・アプローチを用いる場合、無限に投影される絡み合ったペア状態(iPEPS)を$\mathcal{O}(\chi^3D^6)$から$\mathcal{O}(\chi^3D^3)$に縮約する一対の近似法を提案する。最初の近似は (i)境界テンソルの切断に必要な環境の削減 (二)確立されたアルゴリズムと併用するのではなく、ブラとケットの指数の逐次収縮と切り離しに依存している。このアルゴリズムを検証するため、正方格子ハイゼンベルクモデル上でベンチマークシミュレーションを行い、標準iPEPSアルゴリズムに匹敵する結果を得る。計算コストの向上により,大きな結合次元計算が可能となり,課題解決への可能性を広げることができた。

We propose a pair of approximations that allows the leading order computational cost of contracting an infinite projected entangled-pair state (iPEPS) to be reduced from $\mathcal{O}(\chi^3D^6)$ to $\mathcal{O}(\chi^3D^3)$ when using a corner-transfer approach. The first approximation (i) reduces the environment needed for truncation of the boundary tensors; the second (ii) relies on the sequential contraction and truncation of bra and ket indices, rather than doing both together as in the established algorithm. To verify the algorithm, we perform benchmark simulations on the square-lattice Heisenberg model and obtain results that are comparable to the standard iPEPS algorithm. The improvement in computational cost enables us to perform large bond dimension calculations, extending the method's potential to solve challenging problems.
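
To get a feel for the leading-order scaling quoted above, the following sketch compares $\chi^3D^6$ with $\chi^3D^3$ for a few bond dimensions. It is plain arithmetic on the two expressions from the abstract: constant prefactors and subleading terms are ignored, and the function names are hypothetical.

```python
def standard_cost(chi, D):
    """Leading-order cost of the established corner-transfer contraction: chi^3 * D^6."""
    return chi**3 * D**6

def reduced_cost(chi, D):
    """Leading-order cost quoted for the proposed approximations: chi^3 * D^3."""
    return chi**3 * D**3

def speedup(chi, D):
    """Ratio of the two leading-order costs; algebraically this is D^3, independent of chi."""
    return standard_cost(chi, D) / reduced_cost(chi, D)

for D in (4, 8, 12):
    print(D, speedup(chi=100, D=D))
```

Because the ratio grows as $D^3$, the saving matters most at large bond dimension, which matches the abstract's emphasis on large-$D$ calculations. Actual runtimes depend on prefactors that this estimate does not capture.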

翻訳日:2023-06-16 20:34:59 公開日:2023-06-14

# 不確実性を考慮した雑音グラフのロバスト学習

Uncertainty-Aware Robust Learning on Noisy Graphs ( http://arxiv.org/abs/2306.08210v1 )

ライセンス: Link先を確認

Shuyi Chen, Kaize Ding, Shixiang Zhu

(参考訳) グラフニューラルネットワークは、さまざまなグラフ学習タスク、特にノード分類において優れた解決能力を示している。しかし、それらの効果は、実世界のグラフに存在する位相情報やノイズ情報に関連するノイズ測定が広く存在することから生じる課題によって妨げられる。これらの観測の不正確さは、グラフデータ内の重要なパターンを損なう可能性があり、最終的には実用上望ましくない性能をもたらす。そこで本稿では,分散的ロバスト最適化を動機とする新しい不確実性対応グラフ学習フレームワークを提案する。具体的には、グラフニューラルネットワークベースのエンコーダを用いて、ノードの特徴を埋め込み、最小限の定式化により最悪のリスクを最小限に抑えて最適なノード埋め込みを見つける。このような不確実性を考慮した学習プロセスは、ノード表現の改善と、データノイズによる不確実性の影響を効果的に軽減するより堅牢なグラフ予測モデルをもたらす。実験の結果,提案手法は,様々な雑音条件下での最先端のベースラインと比較して,優れた予測性能が得られることがわかった。

Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. Our experimental results show that the proposed framework achieves superior predictive performance compared to state-of-the-art baselines under various noisy settings.
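
The idea of "minimizing the worst-case risk" can be illustrated with a generic distributionally robust objective. The abstract does not specify the paper's uncertainty set, so the sketch below assumes a simple CVaR-style set in which an adversary may concentrate weight on the hardest `alpha` fraction of samples. The function name and the example losses are hypothetical.

```python
import math

def worst_case_risk(losses, alpha):
    """Average of the k largest per-sample losses, with k = ceil(alpha * n).

    When alpha * n is an integer this equals the worst expected loss over all
    reweightings q with 0 <= q_i <= 1 / (alpha * n) and sum(q) == 1.
    alpha = 1.0 recovers the ordinary average (empirical risk).
    """
    n = len(losses)
    k = max(1, math.ceil(alpha * n))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

# Hypothetical per-node losses; one node has a noisy label and a large loss.
node_losses = [0.1, 0.2, 0.3, 2.0]
print(worst_case_risk(node_losses, alpha=1.0))  # plain average
print(worst_case_risk(node_losses, alpha=0.5))  # average of the two hardest nodes
```

In a minimax training loop, the model parameters would be updated to reduce this worst-case value rather than the plain average, which is what makes the learned predictor less sensitive to a few corrupted observations.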

翻訳日:2023-06-16 20:34:44 公開日:2023-06-14

# 主観-目的政策形成アプローチ:マルチエージェントシミュレーションによる複数回帰分析と住民価値の結合

Subjective-objective policy making approach: Coupling of resident-values multiple regression analysis with value-indices, multi-agent-based simulation ( http://arxiv.org/abs/2306.08208v1 )

ライセンス: Link先を確認

Misa Owa, Junichi Miyakoshi, Takeshi Kato

(参考訳) 本研究は,既存の主観的・客観的政策評価アプローチに関する懸念を踏まえ,市民の意思を反映し,客観的事実に裏付けられたより良い政策を選択するための,主観的・客観的政策評価手法を提案する。生活満足アプローチや随伴評価法といった主観的アプローチは主観性を経済価値に転換し、より高い経済価値が市民の望むものと本当に一致するかどうかという疑問を提起する。エビデンスに基づく政策立案やマルチエージェントに基づくシミュレーションといった客観的政策評価アプローチは主観性を考慮しておらず、多元的かつ多元的候補政策の選択が困難である。提案手法は,住民アンケートの結果の複数回帰分析に基づいて主観的目標関数を確立し,mabsを用いて候補政策の客観的評価指標を算出した。次に、主観的対象関数の説明変数と客観的評価指標を組み合わせた新しい主観的目的結合目標関数を設定し、多数の候補から望ましいポリシーを選択するように最適化する。このアプローチを評価するため,宮崎県高春町において再生可能エネルギー導入政策の検証を行った。その結果,MABSの社会的,生態的,経済的な価値に対する2万の政策候補から,住民の価値観に整合した政策を選択するために,新たな主観的・客観的結合目標関数を使用する可能性が示唆された。新しいアプローチを使っていくつかのポリシーを比較することで、さまざまな価値を持つ利害関係者の意志を具体的に表現でき、建設的な議論やコンセンサス構築に寄与する。

Given the concerns around the existing subjective and objective policy evaluation approaches, this study proposes a new combined subjective-objective policy evaluation approach to choose better policies that reflect the will of citizens and are backed up by objective facts. Subjective approaches, such as the Life Satisfaction Approach and the Contingent Valuation Method, convert subjectivity into economic value, raising the question of whether a higher economic value really accords with what citizens want. Objective policy evaluation approaches, such as Evidence Based Policy Making and Multi-Agent-Based Simulation (MABS), do not take subjectivity into account, making it difficult to choose from diverse and pluralistic candidate policies. The proposed approach establishes a subjective target function based on a multiple regression analysis of the results of a residents' questionnaire survey, and uses MABS to calculate the objective evaluation indices for a number of candidate policies. Next, a new subjective-objective coupling target function, combining the explanatory variables of the subjective target function with objective evaluation indices, is set up and optimized to select the preferred policies from numerous candidates. To evaluate this approach, we conducted a verification of renewable energy introduction policies at Takaharu Town in Miyazaki Prefecture, Japan. The results show good potential for using the new subjective-objective coupling target function to select policies consistent with the residents' values for well-being from 20,000 policy candidates for social, ecological, and economic values obtained in MABS. Using the new approach to compare several policies enables concrete expression of the will of stakeholders with diverse values, and contributes to constructive discussions and consensus-building.
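
The coupling step described above, regression weights from the survey applied to simulated indices, can be sketched as a weighted score over candidate policies. Everything concrete here is an assumption made for illustration: the weights stand in for regression coefficients that would come from the questionnaire analysis, the index values stand in for MABS outputs, and the function names are hypothetical.

```python
def coupled_score(weights, indices):
    """Weighted sum of objective indices, using subjective (regression) weights."""
    return sum(weights[name] * indices[name] for name in weights)

def select_policy(weights, candidates):
    """Return the name of the candidate policy with the highest coupled score."""
    return max(candidates, key=lambda name: coupled_score(weights, candidates[name]))

# Hypothetical regression coefficients for residents' values.
weights = {"social": 0.5, "ecological": 0.3, "economic": 0.2}

# Hypothetical normalized indices per candidate policy (would come from simulation).
candidates = {
    "policy_A": {"social": 0.2, "ecological": 0.9, "economic": 0.4},
    "policy_B": {"social": 0.8, "ecological": 0.3, "economic": 0.5},
    "policy_C": {"social": 0.5, "ecological": 0.5, "economic": 0.6},
}

best = select_policy(weights, candidates)
print(best)
```

With these made-up numbers the residents weight social value most heavily, so the policy with the strongest social index is selected even though another policy scores higher on ecological value.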

翻訳日:2023-06-16 20:34:28 公開日:2023-06-14

# 集合変換器と階層型Bi-LSTMを用いた多エージェントスポーツコンテキストからの球軌道推定

Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM ( http://arxiv.org/abs/2306.08206v1 )

ライセンス: Link先を確認

Hyunsung Kim, Han-Jun Choi, Chang Jo Kim, Jinsung Yoon, Sang-Ki Ko

(参考訳) 人工知能が多くの分野に広がるにつれ、スポーツ分析へのAIの適用も注目されている。しかしながら、主な課題の1つは、スポーツの試合中に連続移動データを自動取得することの困難さである。特に、オクルージョンや模倣などの障害物のある広いサッカーピッチで、小さなボールを確実に追跡することはコンウンダラムである。そこで本稿では,ボールトラッキングに代わる費用対効果として,選手軌道からの球軌道推定フレームワークを提案する。我々は、集合トランスフォーマーを組み合わせ、マルチエージェントコンテキストの置換不変および同変表現と、プレイヤーボールの保持を中間的に予測して最終軌道推定をサポートする階層的アーキテクチャを得る。また,現実的損失項とポストプロセッシングを導入し,推定軌跡を物理的に現実的なものにする。実験の結果, 本モデルは, 球の保持と同時に, 自然かつ正確な軌道を提供することがわかった。最後に, 軌道インプテーションの欠如, 半自動パスアノテーション, マッチブロードキャストのための自動ズームイン, ホールドワイズ性能指標の算出など, 本フレームワークの実用的応用について提案する。

As artificial intelligence spreads to numerous fields, the application of AI to sports analytics is also in the spotlight. However, one of the major challenges is the difficulty of automated acquisition of continuous movement data during sports matches. In particular, it is a conundrum to reliably track a tiny ball on a wide soccer pitch with obstacles such as occlusion and imitations. To tackle this problem, this paper proposes a framework for inferring the ball trajectory from player trajectories as a cost-efficient alternative to ball tracking. We combine Set Transformers, to get permutation-invariant and equivariant representations of the multi-agent contexts, with a hierarchical architecture that intermediately predicts the player ball possession to support the final trajectory inference. Also, we introduce a reality loss term and postprocessing to ensure that the estimated trajectories are physically realistic. The experimental results show that our model provides natural and accurate trajectories as well as admissible player ball possession at the same time. Lastly, we suggest several practical applications of our framework including missing trajectory imputation, semi-automated pass annotation, automated zoom-in for match broadcasting, and calculating possession-wise running performance metrics.
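
The terms "permutation-invariant" and "equivariant" in the abstract above can be made concrete with a toy example. The sketch below does not implement a Set Transformer (which uses attention); it substitutes a per-player feature map and mean pooling purely to demonstrate the two properties. The player states and function names are hypothetical.

```python
import math

def encode_player(state):
    """Per-player features from (x, y, vx, vy): position and speed."""
    x, y, vx, vy = state
    return (x, y, math.hypot(vx, vy))

def encode_all(players):
    """Equivariant map: reordering the players reorders the outputs identically."""
    return [encode_player(p) for p in players]

def pooled_context(players):
    """Invariant summary: the mean of per-player features ignores player order."""
    feats = encode_all(players)
    n = len(feats)
    return tuple(sum(f[i] for f in feats) / n for i in range(3))

# Hypothetical player states (x, y, vx, vy) in one frame.
players = [(10.0, 5.0, 3.0, 4.0), (20.0, 15.0, 0.0, 1.0), (30.0, 25.0, 6.0, 8.0)]
order = (2, 0, 1)
shuffled = [players[i] for i in order]

print(pooled_context(players))
print(pooled_context(shuffled))
```

These properties matter for tracking data because the order in which players are listed carries no meaning: a frame-level summary should not change when the list is shuffled, while per-player outputs such as possession scores should follow the players they belong to.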

翻訳日:2023-06-16 20:34:01 公開日:2023-06-14

# 生成拡散モデルによる発作予測のためのデータ拡張

Data Augmentation for Seizure Prediction with Generative Diffusion Model ( http://arxiv.org/abs/2306.08256v1 )

ライセンス: Link先を確認

Kai Shu, Yuchang Zhao, Le Wu, Aiping Liu, Ruobing Qian, and Xun Chen

(参考訳) 目的: 発作予測は患者の生活を改善する上で非常に重要である。焦点は、先天状態と間天状態とを区別することである。機械学習の発展により、発作予測法は大きな進歩を遂げた。しかし, 先行データと間欠データとの間の深刻な不均衡問題は, 分類器の性能を制限し, 依然として大きな課題となっている。データ拡張はこの問題を解決する直感的な方法である。既存のデータ拡張手法はデータの重複や再結合によってサンプルを生成する。生成したサンプルの分布は、特徴空間を完全に探索し、新しい情報を提供することができないため、元のデータによって制限される。てんかんの脳波の表現は発作によって異なるため、これらのサンプルは新しい発作で高いパフォーマンスを達成するのに十分な多様性を提供することができない。その結果,DiffEEGと呼ばれる拡散モデルを用いた新しいデータ拡張手法を提案する。方法:拡散モデルは、2つのプロセスからなる生成モデルのクラスである。具体的には、拡散過程において、入力脳波サンプルに段階的にノイズを付加し、ノイズを出力ランダムノイズに変換し、出力と付加されたノイズの損失を最小限にしてデータの分布を探索する。離散化過程において、モデルはノイズを徐々に除去し、データ分布を外側に拡散させ、異なるクラスタ間の距離を狭めることによって合成データをサンプリングする。結果: DiffEEGを既存の手法と比較し, 3つの代表的な分類器に統合した。実験の結果、DiffEEGはパフォーマンスをさらに改善し、既存の手法よりも優れていることが示された。結論: 本論文では, 不均衡を解消し, 本手法の有効性と汎用性を実証する手法を提案する。

Objective: Seizure prediction is of great importance for improving the lives of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by the original data, because such transformations cannot fully explore the feature space or offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method based on a diffusion model, called DiffEEG. Methods: Diffusion models are a class of generative models that consist of two processes. Specifically, in the diffusion process, the model adds noise to the input EEG sample step by step and converts the noisy sample into output random noise, exploring the distribution of data by minimizing the loss between the output and the noise added. In the denoising process, the model samples the synthetic data by removing the noise gradually, diffusing the data distribution to outward areas and narrowing the distance between different clusters. Results: We compared DiffEEG with existing methods and integrated them into three representative classifiers. The experiments indicate that DiffEEG further improves performance and is superior to existing methods. Conclusion: This paper proposes a novel and effective method to solve the imbalance problem and demonstrates its effectiveness and generality.
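
The diffusion (noising) process and the noise-prediction loss described in the Methods part above can be sketched with the standard DDPM closed form, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. This is a generic illustration, not DiffEEG itself: the linear noise schedule, its endpoints, the short stand-in "EEG" vector, and all function names are assumptions, since the abstract gives none of these details.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t); they shrink toward 0 as t grows."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t given x_0 in one step; also return the noise that was added."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [a * x + s * e for x, e in zip(x0, eps)], eps

def noise_prediction_loss(predicted_eps, true_eps):
    """Mean squared error between the predicted noise and the added noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)

rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
x0 = [0.5, -1.0, 2.0]          # stand-in for a short EEG segment
xt, eps = q_sample(x0, 10, abar, rng)
print(noise_prediction_loss(eps, eps))  # a perfect noise predictor has zero loss
```

During training, a network would receive `xt` and `t` and be optimized so that its output matches `eps`; sampling then runs the process in reverse, starting from pure noise.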

翻訳日:2023-06-16 20:30:01 公開日:2023-06-14

# GBSD: ステージ拡散による生成型ボケ

GBSD: Generative Bokeh with Stage Diffusion ( http://arxiv.org/abs/2306.08251v1 )

ライセンス: Link先を確認

Jieren Deng, Xin Zhou, Hao Tian, Zhihong Pan, and Derek Aguiar

(参考訳) ボケ効果(ボケエフェクト、bokeh effect)は、写真中の焦点領域をぼかす芸術的手法であり、テキストから画像への合成や、スマートフォンカメラや写真共有アプリの普及により関心を集めている。ボケ効果のレンダリングに関する以前の研究は、古典的なコンピュータグラフィックスやニューラルレンダリング技術を用いて既存の写真に類似したぼやけた効果を生み出すために、ポストホック画像操作に焦点を合わせてきたが、深度不連続アーティファクトを持つか、トレーニングデータに存在するボケ効果の再生に制限されている。より最近の拡散モデルでは、イメージを芸術的なスタイルで合成することができるが、高次元マスクの生成、高価な微調整、あるいはグローバルなイメージ特性に影響を与える必要がある。本稿では,フォトリアリスティックな画像をボケスタイルで合成する最初の画像生成モデルであるgbsdを提案する。拡散モデルにおける画像合成の進行に動機づけられ, 潜在拡散モデルと2段階のコンディショニングアルゴリズムを組み合わせることで, 意味論的に定義された物体に対するボケ効果を表現できる。オブジェクトに効果を集中することができるので、このセマンティックボケ効果は古典的なレンダリング技術よりも汎用性が高い。我々は,gbsdを定量的かつ質的に評価し,テキストから画像への設定と画像から画像への設定の両方に適用できることを実証する。

The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smartphone cameras and photo-sharing apps. Prior work on rendering bokeh effects has focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but either suffers from depth-discontinuity artifacts or is restricted to reproducing bokeh effects that are present in the training data. More recent diffusion-based models can synthesize images with an artistic style, but either require the generation of high-dimensional masks, require expensive fine-tuning, or affect global image characteristics. In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.

翻訳日:2023-06-16 20:29:36 公開日:2023-06-14

# デブラーリング・マスク付きオートエンコーダは超音波画像認識のより良いレシピ

Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition ( http://arxiv.org/abs/2306.08249v1 )

ライセンス: Link先を確認

Qingbo Kang, Jun Gao, Kang Li, Qicheng Lao

(参考訳) masked autoencoder (mae) は前例のない注目を集め、多くの視覚タスクで顕著なパフォーマンスを達成している。事前トレーニング中にランダムにマスクされたイメージパッチ(プロキシタスクと呼ばれる)を再構築し、下流タスクに転送できる意味のある意味表現を学ぶ。しかし、超音波画像では、MAEは十分に調査されていない。本研究では,超音波画像認識におけるMAEの可能性を検討する。超音波画像の高雑音/信号比に特有の特徴を生かして,プリトレーニング中のプロキシタスクにデブラーリングを組み込んだ新しいデブラーリングMAE手法を提案する。デブロアリングの追加により、超音波画像に表示される微妙な細部をよりよく復元し、下流分類タスクの性能を向上させることができる。超音波画像分類における最新の性能を実現するため, 脱毛性maeの有効性を実証した。全体としては,超音波画像認識におけるmaeの可能性に注目し,デブラリングを組み込んだ新しい手法を提案する。

The masked autoencoder (MAE) has attracted unprecedented attention and achieves remarkable performance in many vision tasks. It reconstructs randomly masked image patches (known as the proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the potential of MAE for ultrasound image recognition. Motivated by a distinctive property of ultrasound imaging, its high noise-to-signal ratio, we propose a novel deblurring MAE approach that incorporates deblurring into the proxy task during pretraining. The addition of deblurring helps pretraining to better recover the subtle details present in ultrasound images, thus improving the performance of the downstream classification task. Our experimental results demonstrate the effectiveness of our deblurring MAE, which achieves state-of-the-art performance in ultrasound image classification. Overall, our work highlights the potential of MAE for ultrasound image recognition and presents a novel approach that incorporates deblurring to further improve its effectiveness.
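
One plausible reading of "incorporating deblurring into the proxy task" is that the encoder sees a blurred and masked image while the reconstruction target is the original sharp image. The sketch below builds such a training pair under that assumption. The 3x3 box blur, the patch size, the mask ratio, and the function names are all illustrative choices; the abstract does not specify the blur kernel or how blurring and masking are combined.

```python
import random

def box_blur(img):
    """3x3 box blur on a 2-D list of floats, averaging over in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def make_pretraining_pair(img, patch, mask_ratio, rng):
    """Return (blurred-and-masked input, sharp target, set of masked patch indices)."""
    h, w = len(img), len(img[0])
    rows, cols = h // patch, w // patch
    n_masked = int(rows * cols * mask_ratio)
    masked = set(rng.sample(range(rows * cols), n_masked))
    model_input = [row[:] for row in box_blur(img)]
    for idx in masked:
        r, c = divmod(idx, cols)
        for i in range(r * patch, (r + 1) * patch):
            for j in range(c * patch, (c + 1) * patch):
                model_input[i][j] = 0.0  # masked pixels are zeroed out
    return model_input, img, masked

rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]  # stand-in for an ultrasound image
model_input, target, masked = make_pretraining_pair(image, patch=4, mask_ratio=0.75, rng=rng)
print(len(masked), "of 4 patches masked")
```

A real MAE drops masked patches from the encoder input instead of zeroing them, and the reconstruction loss is usually computed on masked patches only; zeroing is used here just to keep the example short.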

翻訳日:2023-06-16 20:29:11 公開日:2023-06-14

# 拡散の拡散:周期的一方向拡散によるテキストビジョン条件付き生成

Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation ( http://arxiv.org/abs/2306.08247v1 )

ライセンス: Link先を確認

Yongqi Yang (1), Ruoyu Wang (1), Zhihao Qian (1), Ye Zhu (2), Yu Wu (1) ((1) Wuhan University, (2) Princeton University)

(参考訳) 拡散モデルを用いたテキスト・ツー・イメージ(T2I)生成により、ユーザはテキスト条件が与えられた合成画像のセマンティックコンテンツを制御することができる。よりカスタマイズされた画像生成アプリケーションに向けたさらなるステップとして、セマンティックレベルのテキスト入力だけでなく、ピクセルレベルの視覚条件にもとづく画像の合成を行う、新しいマルチモダリティ生成設定を導入する。既存の文献は、まず与えられた視覚情報を言語と接続して意味論的表現に変換し、それから元の分節化プロセスに組み込む。一見直感的に見えるように、このような方法論設計は意味遷移中にピクセル値を失うため、低レベルのビジョン(例えば、顔画像のid)の保存が望まれるタスクシナリオを満たせない。そこで本研究では,セマンティックテキストやピクセル・ビジュアル・コンディショニングに関して,カスタマイズされた画像を作成するためのトレーニングフリーフレームワークであるCyclic One-Way Diffusion (COW)を提案する。特に,画像のサブ領域は,物理的拡散と同様に相互干渉を伴い,消音軌道に沿った究極の調和を達成する。そこで我々は,視覚条件を高濃度の「セド」としてデノナイズプロセスの初期段階に配置し,視覚条件からの一方向情報の流れを制御することで,その「拡散」を調和図形にすることで,与えられた視覚条件を周期的に繰り返し活用することを提案する。画像内における内部拡散過程を段階的に実施するために, 破壊・構築過程を何回も繰り返す。難解なワンショット顔とテキストコンディショニング画像合成タスクの実験は,学習に基づくテキスト・ビジョン条件付き手法と比較して,速度,画質,条件付き忠実性において優れることを示した。

Text-to-Image (T2I) generation with diffusion models allows users to control the semantic content in the synthesized images given text conditions. As a further step toward a more customized image creation application, we introduce a new multi-modality generation setting that synthesizes images based on not only the semantic-level textual input but also on the pixel-level visual conditions. Existing literature first converts the given visual information to semantic-level representation by connecting it to languages, and then incorporates it into the original denoising process. Seemingly intuitive, such methodological design loses the pixel values during the semantic transition, thus failing to fulfill the task scenario where the preservation of low-level vision is desired (e.g., ID of a given face image). To this end, we propose Cyclic One-Way Diffusion (COW), a training-free framework for creating customized images with respect to semantic text and pixel-visual conditioning. Notably, we observe that sub-regions of an image impose mutual interference, just like physical diffusion, to achieve ultimate harmony along the denoising trajectory. Thus we propose to repetitively utilize the given visual condition in a cyclic way, by planting the visual condition as a high-concentration ``seed'' at the initialization step of the denoising process, and ``diffuse'' it into a harmonious picture by controlling a one-way information flow from the visual condition. We repeat the destroy-and-construct process multiple times to gradually but steadily impose the internal diffusion process within the image. Experiments on the challenging one-shot face and text-conditioned image synthesis task demonstrate our superiority in terms of speed, image quality, and conditional fidelity compared to learning-based text-vision conditional methods.

翻訳日:2023-06-16 20:28:54 公開日:2023-06-14

# MMASD:自閉症介入分析のためのマルチモーダルデータセット

MMASD: A Multimodal Dataset for Autism Intervention Analysis ( http://arxiv.org/abs/2306.08243v1 )

ライセンス: Link先を確認

Jicheng Li, Vuthea Chheang, Pinar Kullu, Eli Brignac, Zhang Guo, Kenneth E. Barner, Anjana Bhat, Roghayeh Leila Barmaki

(参考訳) 自閉症スペクトラム障害(Autism spectrum disorder、ASD)は、発達障害の一つで、社会的コミュニケーション障害とコミュニケーションの困難さを特徴とする。機械学習技術は、自閉症の研究と評価を促進するために広く採用されている。しかしながら、計算モデルは、主に特定の分析に集中しており、プライバシを保存するデータ共有の複雑さによるモデル間の比較を制限する自閉症コミュニティのプライベートデータセットに検証されている。本研究は,自閉症児の遊び療法介入から収集した,新たなプライバシー保護オープンソースデータセットであるMMASDをマルチモーダルASDベンチマークデータセットとして提示する。 MMASDには、ASDを持つ32人の子供のデータと、100時間以上の介入記録から区切られた1,315のデータが含まれている。パブリックアクセスを促進するために、各データサンプルは、(1)光学フロー、(2)2Dスケルトン、(3)3Dスケルトン、(4)クリニカルASD評価スコア、例えばADOSスコアの4つのプライバシー保護モードから構成される。 MMASDは、研究者やセラピストが子どもの認知状態を理解し、治療中の進捗を監視し、それに応じて治療計画をカスタマイズすることを目的としている。また、行動品質評価や対人同期推定といった下流タスクにもインスピレーションを与えている。 MMASDデータセットはhttps://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysisで簡単にアクセスできる。

Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analyses and validated on private datasets in the autism community, which limits comparisons across models due to the complications of privacy-preserving data sharing. This work presents a novel privacy-preserving open-source dataset, MMASD, a MultiModal ASD benchmark dataset collected from play therapy interventions of children with autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It can also inspire downstream tasks such as action quality assessment and interpersonal synchrony estimation. The MMASD dataset can be accessed at https://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysis.
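
The four modalities listed above suggest a natural per-sample record. The sketch below is a hypothetical in-memory container only: the class name, field names, and array layouts are assumptions made for illustration and do not describe the actual file format of the released dataset, which should be checked in the repository linked in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Joint2D = Tuple[float, float]
Joint3D = Tuple[float, float, float]

@dataclass
class InterventionSample:
    """Hypothetical container for one sample with four privacy-preserving modalities."""
    sample_id: str
    optical_flow: List[List[List[Tuple[float, float]]]]  # frames x H x W of (dx, dy)
    skeleton_2d: List[List[Joint2D]]                      # frames x joints
    skeleton_3d: List[List[Joint3D]]                      # frames x joints
    ados_score: Optional[float] = None                    # clinician score, if available

    def num_frames(self) -> int:
        return len(self.skeleton_2d)

    def skeletons_aligned(self) -> bool:
        """True if the 2D and 3D skeleton sequences cover the same number of frames."""
        return len(self.skeleton_2d) == len(self.skeleton_3d)

# Tiny made-up sample: 2 frames, 1 joint, 1x1 flow field.
sample = InterventionSample(
    sample_id="demo_0001",
    optical_flow=[[[(0.0, 0.0)]], [[(0.1, -0.2)]]],
    skeleton_2d=[[(0.5, 0.5)], [(0.6, 0.5)]],
    skeleton_3d=[[(0.5, 0.5, 1.0)], [(0.6, 0.5, 1.0)]],
)
print(sample.num_frames(), sample.skeletons_aligned())
```

Keeping the clinician score optional reflects that a downstream task such as action quality assessment may need it, while a task such as synchrony estimation may use only the motion modalities.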

翻訳日:2023-06-16 20:27:59 公開日:2023-06-14

# 量子エネルギーテレポーテーションを用いた量子インタラクティブ証明

Quantum interactive proofs using quantum energy teleportation ( http://arxiv.org/abs/2306.08242v1 )

ライセンス: Link先を確認

Kazuki Ikeda, Adam Lowe

(参考訳) 本稿では,量子状態テレポーテーション(qst)と量子エネルギーテレポーテーション(qet)プロトコルを用いた単純な量子インタラクティブ証明(qip)プロトコルを提案する。 qetは、サプライヤから注入されたエネルギーを担保として、ローカル操作と古典的通信(locc)によって、遠方から受信者がローカルエネルギーを抽出する技術である。 QETは、絡み合う任意の局所ハミルトニアンに対して作用し、我々の研究では、一般的な局所ハミルトニアンの基底状態を得るのが量子メリン・アーサー(QMA)ハードであることが重要である。これらの目的のためにQETを採用する主な動機は明確である。まず、証明者が正しい状態を保持し、適切な操作を行う場合、検証者は、高い確率(完全性)で負のエネルギーの存在を効果的に検証することができる。適切な演算子や誤った状態を選択することができないと、検証者は負のエネルギー(音量)を観測できない。重要なことに、検証者は証明者の伝達状態から1つの量子ビットのみを観測するが、証明者のハミルトニアン状態と状態(ゼロ・ノウトリッジ)は無視できない。さらに,分散量子インタラクティブな証明に解析を拡張し,各プレイヤーの計測値の検証のための複数の解を提案する。最も一般的な場合におけるプロトコルの複雑性クラスは、QIP(3)=PSPACEに属するので、小さな量子通信デバイスで実装可能なセキュアな量子認証スキームを提供する。プロトコルをQuantum Multi-Prover Interactive Proof (QMIP) システムに拡張するのは簡単で、複雑さはより強力になる(PSPACE$\subset$QMIP=NEXPTIME)。この場合、すべてのプローバーは基底状態の絡み合いを共有するので、より強力な複雑性クラス QMIP$^*$ に属するべきである。

We present a simple quantum interactive proof (QIP) protocol using the quantum state teleportation (QST) and quantum energy teleportation (QET) protocols. QET is a technique that allows a receiver at a distance to extract the local energy by local operations and classical communication (LOCC), using the energy injected by the supplier as collateral. QET works for any local Hamiltonian with entanglement and, for our study, it is important that getting the ground state of a generic local Hamiltonian is quantum Merlin Arthur (QMA)-hard. The key motivations behind employing QET for these purposes are clarified. Firstly, in cases where a prover possesses the correct state and executes the appropriate operations, the verifier can effectively validate the presence of negative energy with a high probability (Completeness). Failure to select the appropriate operators or an incorrect state renders the verifier incapable of observing negative energy (Soundness). Importantly, the verifier solely observes a single qubit from the prover's transmitted state, while remaining oblivious to the prover's Hamiltonian and state (Zero-knowledge). Furthermore, the analysis is extended to distributed quantum interactive proofs, where we propose multiple solutions for the verification of each player's measurement. The complexity class of our protocol in the most general case belongs to QIP(3)=PSPACE, hence it provides a secure quantum authentication scheme that can be implemented in small quantum communication devices. It is straightforward to extend our protocol to Quantum Multi-Prover Interactive Proof (QMIP) systems, where the complexity is expected to be more powerful (PSPACE$\subset$QMIP=NEXPTIME). In our case, all provers share the ground state entanglement, hence it should belong to a more powerful complexity class QMIP$^*$.

翻訳日:2023-06-16 20:27:20 公開日:2023-06-14

# 点監督下での半教師細胞認識

Semi-supervised Cell Recognition under Point Supervision ( http://arxiv.org/abs/2306.08240v1 )

ライセンス: Link先を確認

Zhongyi Shui, Yizhi Zhao, Sunyi Zheng, Yunlong Zhang, Honglin Li, Shichuan Zhang, Xiaoxuan Yu, Chenglu Zhu, Lin Yang

(参考訳) 細胞認識はデジタル病理画像解析の基本的な課題である。ポイントベースの細胞認識(PCR)法は通常、大量のアノテーションを必要とし、その取得は非常にコストがかかり、時間と労力を要する。半教師付き学習(SSL)は、網羅的なラベル付けなしに、ギガピクセルのスライド全体画像に含まれる細胞情報をフル活用するための近道を提供する。しかし、半教師付きポイントベース細胞認識(SSPCR)の研究はほとんど見過ごされている。従来のSSPCRの研究はすべて密度マップベースのPCRモデルに基づいており、精度が不十分で推論速度が遅く、ハイパーパラメータに対する感度が高い。これらの問題に対処するため,近年,エンドツーエンドPCRモデルが提案されている。本稿では,エンドツーエンドのPCRモデルに適したSSPCRフレームワークを初めて開発する。全体としては、現在のモデルを用いてラベルなし画像の擬似ラベルを生成し、それをモデルトレーニングの教師信号として使用する。さらに,自己学習に一般的に存在する確証バイアス問題を克服するために,共同学習(co-teaching)戦略を導入する。分布アライメント技術も組み込まれ、ラベルなしデータに対して高品質でバイアスのない擬似ラベルを生成する。各種染色スタイルに関する4つの病理組織学的データセットの実験結果から,提案手法の有効性と汎用性が示された。コードは https://github.com/windygooo/SSPCR で入手できる。

Cell recognition is a fundamental task in digital histopathology image analysis. Point-based cell recognition (PCR) methods normally require a vast number of annotations, which are extremely costly, time-consuming and labor-intensive to obtain. Semi-supervised learning (SSL) can provide a shortcut to make full use of cell information in gigapixel whole slide images without exhaustive labeling. However, research into semi-supervised point-based cell recognition (SSPCR) remains largely overlooked. Previous SSPCR works are all built on density map-based PCR models, which suffer from unsatisfactory accuracy, slow inference speed and high sensitivity to hyper-parameters. To address these issues, end-to-end PCR models have recently been proposed. In this paper, we develop, for the first time, an SSPCR framework suitable for end-to-end PCR models. Overall, we use the current models to generate pseudo labels for unlabeled images, which are in turn used to supervise model training. Besides, we introduce a co-teaching strategy to overcome the confirmation bias problem that generally exists in self-training. A distribution alignment technique is also incorporated to produce high-quality, unbiased pseudo labels for unlabeled data. Experimental results on four histopathology datasets covering different staining styles show the effectiveness and versatility of the proposed framework. Code is available at https://github.com/windygooo/SSPCR.
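
The abstract above mentions a distribution alignment step for pseudo labels without giving its form. As an illustration only, the following sketch shows one widely used variant (the one popularized by ReMixMatch): each predicted class probability is rescaled by the ratio of a target class prior to the model's running average prediction, then renormalized. Whether the SSPCR framework uses exactly this rule is an assumption; the function name `align_distribution` and all numbers are hypothetical.

```python
def align_distribution(probs, target_dist, model_dist, eps=1e-8):
    """Rescale class probabilities toward a target class prior.

    probs:       predicted class probabilities for one unlabeled sample
    target_dist: desired class prior (e.g. estimated from labeled data)
    model_dist:  running average of the model's predictions on unlabeled data
    All three are plain lists of equal length. This is a generic
    ReMixMatch-style rule, not necessarily the paper's exact technique.
    """
    scaled = [p * t / (m + eps) for p, t, m in zip(probs, target_dist, model_dist)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical example: the model over-predicts class 0 on unlabeled data,
# so a prediction leaning toward class 0 is pulled back toward class 1.
aligned = align_distribution(
    probs=[0.6, 0.4],
    target_dist=[0.5, 0.5],
    model_dist=[0.8, 0.2],
)
```

The aligned vector still sums to one, so it can be thresholded or used as a soft pseudo label in the usual way.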

翻訳日:2023-06-16 20:26:20 公開日:2023-06-14

# Maestro:AIロバストネスを教えるためのゲームプラットフォーム

Maestro: A Gamified Platform for Teaching AI Robustness ( http://arxiv.org/abs/2306.08238v1 )

ライセンス: Link先を確認

Margarita Geleta and Jiacen Xu and Manikanta Loya and Junlin Wang and Sameer Singh and Zhou Li and Sergio Gago-Masague

(参考訳) AI脆弱性の防止は、ユーザや企業の安全とプライバシを保護するために重要であるが、堅牢なAIのための教育ツールはまだ世界中で未発達である。本稿では,Maestroの設計,実装,評価について述べる。 Maestroは、堅牢なAI教育の発展に寄与する、効果的なオープンソースのゲームベースのプラットフォームである。 Maestroは、競技プログラミング形式の環境において、大学生が実生活に着想を得た挑戦的な課題に取り組むゴールベースのシナリオを提供する。本研究では,堅牢なAIにおける学生の関与,モチベーション,学習成功に対するMaestroの影響を評価した。この研究は、堅牢なAI領域におけるアクティブな学習機会を促進するオンライン学習ツールの設計機能に関する洞察も提供する。AIに関する2つのクォーター制の大学授業でMaestroを使用した学部生147人の振り返り回答(リッカート尺度で測定)を分析した。その結果、堅牢なAIで新しいスキルを習得したと感じた学生は、Maestroを高く評価する傾向があり、堅牢なAIにおける教材の定着、好奇心、習熟の項目でも高いスコアを示した。さらに,Maestroの主要なゲーミフィケーション要素であるリーダーボードは,学生のエンゲージメントと学習に効果的に寄与している。また,Maestroは教育的品質を損なうことなく,任意のコースの長さや深さに効果的に適応できることを示した。

Although the prevention of AI vulnerabilities is critical to preserve the safety and privacy of users and businesses, educational tools for robust AI are still underdeveloped worldwide. We present the design, implementation, and assessment of Maestro. Maestro is an effective open-source game-based platform that contributes to the advancement of robust AI education. Maestro provides goal-based scenarios where college students are exposed to challenging life-inspired assignments in a competitive programming environment. We assessed Maestro's influence on students' engagement, motivation, and learning success in robust AI. This work also provides insights into the design features of online learning tools that promote active learning opportunities in the robust AI domain. We analyzed the reflection responses (measured with Likert scales) of 147 undergraduate students using Maestro in two quarterly college courses in AI. According to the results, students who felt they had acquired new skills in robust AI tended to rate Maestro highly and scored highly on material consolidation, curiosity, and mastery in robust AI. Moreover, the leaderboard, our key gamification element in Maestro, has effectively contributed to students' engagement and learning. Results also indicate that Maestro can be effectively adapted to any course length and depth without losing its educational quality.

翻訳日:2023-06-16 20:25:59 公開日:2023-06-14

# OT-Net: 再利用可能なニューラル最適輸送ソリューション

OT-Net: A Reusable Neural Optimal Transport Solver ( http://arxiv.org/abs/2306.08233v1 )

ライセンス: Link先を確認

Zezeng Li, Shenghao Li, Lianbao Jin, Na Lei, Zhongxuan Luo

(参考訳) 最適輸送(ot)の広範な適用により、その計算は必須となり、様々なアルゴリズムが出現した。しかし、既存の手法は効率が低く、不連続写像を表現できない。そこで,新しい再利用可能なニューラルネットワークotソルバot-netが提案され,まずブレニアの高さ表現をニューラルネットワークで学習し,その電位の勾配を計算してotマップを得た。アルゴリズムには2つのメリットがある。 1) 不連続写像を容易に表現でき、不連続な支持を持つ任意の対象分布と一致し、鋭い境界を達成することができる。これにより、生成されたモデルのモード崩壊をなくすことができる。 2) OTマップは,新たなターゲットサンプルを追加すると,提案アルゴリズムによって直接的に計算できるため,マップの効率と再利用性が大幅に向上する。さらに, アルゴリズムの理論的誤差境界を解析し, 画像生成, 色移動, ドメイン適応におけるアプローチの実証的成功を実証した。

With the widespread application of optimal transport (OT), its calculation becomes essential, and various algorithms have emerged. However, the existing methods either have low efficiency or cannot represent discontinuous maps. A novel reusable neural OT solver, OT-Net, is thus presented, which first learns Brenier's height representation via a neural network to obtain its potential, and then obtains the OT map by computing the gradient of the potential. The algorithm has two merits: 1) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous supports and achieve sharp boundaries. This can effectively eliminate mode collapse in the generated models. 2) The OT map can be computed directly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, the theoretical error bound of the algorithm is analyzed, and we have demonstrated the empirical success of our approach in image generation, color transfer, and domain adaptation.
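
To make the "potential, then gradient" step concrete, here is a minimal sketch assuming the classical semi-discrete setting, where the Brenier potential is the upper envelope of affine functions, u(x) = max_i (<x, y_i> + h_i), and its gradient sends x to the target sample y_i attaining the maximum. The heights below are fixed placeholders rather than learned values, and the function names are hypothetical; how OT-Net actually parameterizes and trains the heights is not specified in the abstract.

```python
def _scores(x, targets, heights):
    """Affine scores <x, y_i> + h_i for every target sample y_i."""
    return [sum(a * b for a, b in zip(x, y)) + h for y, h in zip(targets, heights)]


def brenier_potential(x, targets, heights):
    """Evaluate u(x) = max_i (<x, y_i> + h_i) at a single point x."""
    return max(_scores(x, targets, heights))


def ot_map(x, targets, heights):
    """Gradient of u at x: the target sample whose affine piece is active."""
    scores = _scores(x, targets, heights)
    best = max(range(len(targets)), key=scores.__getitem__)
    return targets[best]


# Two target samples on opposite sides of the origin; heights are placeholders.
targets = [(1.0, 0.0), (-1.0, 0.0)]
heights = [0.0, 0.0]

left = ot_map((-0.3, 0.9), targets, heights)
right = ot_map((0.5, 0.2), targets, heights)
```

The map is piecewise constant and jumps when x crosses the boundary between two affine pieces, which is the sense in which such a representation can express discontinuous transport maps.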

翻訳日:2023-06-16 20:25:38 公開日:2023-06-14

# 逆強化学習のためのカリキュラムサブゴール

Curricular Subgoals for Inverse Reinforcement Learning ( http://arxiv.org/abs/2306.08232v1 )

ライセンス: Link先を確認

Shunyu Liu, Yunpeng Qing, Shuqi Xu, Hongyan Wu, Jiangtao Zhang, Jingyuan Cong, Tianhao Chen, Yunfu Liu, Mingli Song

(参考訳) Inverse Reinforcement Learning (IRL)は、政策学習を促進するために専門家によるデモンストレーションから報酬関数を再構築することを目的としており、模倣学習において顕著な成功を収めている。専門家に近い行動を促進するため、既存のIRL法は主に、模倣者と専門家の軌跡の違いを最小限に抑えるグローバル報酬関数の学習に焦点を当てている。しかし、これらのグローバルな設計は、冗長なノイズとエラー伝搬の問題によって依然として制限されており、不適切な報酬割り当てを招き、複雑なマルチステージタスクにおいてエージェント能力の低下につながる。本稿では,1つのタスクを複数のローカルサブゴールで明示的に分解し,エージェントの模倣をガイドする,Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL)フレームワークを提案する。具体的には、CSIRLはまず、専門家の軌道に対する訓練済みエージェントの決定の不確実性を導入してサブゴールを動的に選択し、これが各タスクステージの探索境界を直接決定する。さらに,各ステージの局所報酬関数を取得するために,これらのカリキュラムサブゴールに基づいてメタ模倣目的をカスタマイズし,内発的報酬生成器を訓練する。 D4RLと自律走行ベンチマークの実験では、提案手法が最先端技術よりも優れた結果をもたらすとともに、より優れた解釈可能性を示す。私たちのコードはhttps://github.com/Plankson/CSIRLで公開されています。

Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by redundant noise and error propagation problems, leading to unsuitable reward assignment and thus degrading the agent's capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework that explicitly disentangles one task into several local subgoals to guide agent imitation. Specifically, CSIRL first introduces the decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determine the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.

翻訳日:2023-06-16 20:25:22 公開日:2023-06-14

# 直交列を用いた微分プライベート無線フェデレート学習

Differentially Private Wireless Federated Learning Using Orthogonal Sequences ( http://arxiv.org/abs/2306.08280v1 )

ライセンス: Link先を確認

Xizixiang Wei, Tianhao Wang, Ruiquan Huang, Cong Shen, Jing Yang, H. Vincent Poor

(参考訳) 本稿では,単一入力単一出力(SISO)無線フェデレート学習(FL)システムのための,新しいプライバシー保護型アップリンク・オーバー・ザ・エア計算(AirComp)法であるFLORASを提案する。通信設計の観点から、FLORASは直交系列の特性を利用して送信機におけるチャネル状態情報(CSIT)の要求をなくす。プライバシの観点から、FLORASはアイテムレベルとクライアントレベルの差分プライバシー(DP)の両方を保証できることを証明する。さらに、システムパラメータを調整することで、FLORASは追加コストなしで異なるDPレベルを柔軟に達成することができる。新たなFL収束バウンドが導出され、プライバシー保証と組み合わせることで、収束率と差分プライバシーレベルのスムーズなトレードオフが可能になる。数値結果は, ベースラインのAirComp法と比較したFLORASの利点を示し, モデル収束とプライバシレベルに関する異なるトレードオフ要件を持つプライバシ保護FLの設計を, 我々の解析結果が導けることを検証している。

We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.

翻訳日:2023-06-16 20:17:04 公開日:2023-06-14

# FRIGATE: 道路ネットワークにおける倹約的な時空間予測

FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks ( http://arxiv.org/abs/2306.08277v1 )

ライセンス: Link先を確認

Mridul Gupta, Hariprasad Kodamana, Sayan Ranu

(参考訳) 道路網における時空間過程のモデル化は重要性を増している課題である。時空間グラフニューラルネットワーク(GNN)の開発には大きな進展があるが、既存の研究は現実の道路網では実用的でない3つの仮定に基づいている。まず、道路ネットワークのすべてのノードでセンシングが行われると仮定する。実際には、予算制約やセンサーの故障のため、すべての位置(ノード)にセンサーが備わっているとは限らない。第2に、設置されたすべてのセンサーでセンシング履歴が利用できると仮定する。これも、センサーの故障や通信中のパケット損失などのため非現実的である。最後に、静的な道路網の仮定がある。道路の閉鎖や新道路の建設などにより、ネットワーク内の接続性は変化する。本研究では、これらの欠点すべてに対処するためにFRIGATEを開発する。 FRIGATEは、位置情報、位相情報、時間情報をリッチな帰納的ノード表現に統合する時空間GNNによって駆動される。この多様な情報の融合は、ゲート付きリプシッツ埋め込みとLSTMの新しい組み合わせによって実現される。提案するGNNアーキテクチャは,最先端アルゴリズムで使用されるメッセージパッシングGNNよりも表現力が高いことを証明する。 FRIGATEの高い表現力は、実世界のネットワーク制約付き交通データにおける優れた経験的性能に自然につながる。さらに、FRIGATEは倹約的なセンサー配置、道路網の接続性の変化、センシングの時間的不規則性に対しても頑健である。

Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (GNNs), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget constraints or sensor failures, not all locations (nodes) may be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks changes due to road closures, construction of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal GNN that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with LSTMs. We prove that the proposed GNN architecture is more expressive than the message-passing GNNs used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance on real-world network-constrained traffic data. In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.

翻訳日:2023-06-16 20:16:47 公開日:2023-06-14

# TryOnDiffusion: 2つのUNetの物語

TryOnDiffusion: A Tale of Two UNets ( http://arxiv.org/abs/2306.08276v1 )

ライセンス: Link先を確認

Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

(参考訳) ある人物の画像と、別の人物が着ている衣服の画像の2枚が与えられたとき、私たちのゴールは、その衣服を入力された人物が着た場合にどのように見えるかを可視化することです。重要な課題は、被写体間の大きな身体のポーズと形状の変化に合わせて衣服を変形させながら、衣服の細部を保存した写実的な可視化を合成することである。それまでの方法は、効果的なポーズや形状の変化を伴わずに衣服の細部保存に焦点を合わせるか、望ましい形状とポーズでの試着を可能にするが衣服の細部を欠くかのどちらかであった。本稿では,2つのUNetを統一した拡散ベースのアーキテクチャ(Parallel-UNetと呼ぶ)を提案する。これにより、単一のネットワークで衣服の細部を保存しつつ、大きなポーズと体型の変化に合わせて衣服を変形させることができる。 Parallel-UNetの背景にある主要なアイデアは以下のとおりである。 1)衣服はクロスアテンション機構によって暗黙的に変形される。 2) 衣服の変形と人物とのブレンドは,2つの別々のタスクの連続としてではなく,統一されたプロセスの一部として行われる。実験の結果,TryOnDiffusionは定性的にも定量的にも最先端のパフォーマンスを達成した。

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) garment is warped implicitly via a cross attention mechanism, 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.

翻訳日:2023-06-16 20:16:24 公開日:2023-06-14

# c$^3$ps:半教師付き医用画像セグメンテーションのための文脈認識条件付きクロス擬似監督

C$^3$PS: Context-aware Conditional Cross Pseudo Supervision for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2306.08275v1 )

ライセンス: Link先を確認

Peng Liu and Guoyan Zheng

(参考訳) 性能向上のために大量のラベル付きデータを活用できる半教師付き学習(SSL)手法が近年注目を集めている。本稿では,半教師型医用画像分割のためのコンテキスト対応コンディショナルクロス擬似スーパービジョン法(C$^3$PS)を提案する。先述したクロス擬似監督 (cps) とは異なり、本論文では条件付きクロス擬似監督 (ccps) 機構を導入し、与えられたクラスラベル上でクロス擬似監督を行う。文脈認識はCCPSにも導入され、相互監督のための擬似ラベルの品質が向上した。提案手法には,後期の訓練段階において,硬臓器の学習に集中できるという利点がある。医用画像分割作業の典型的な2つの課題に対して,本手法は最先端の手法よりも優れた性能を示す。

Semi-supervised learning (SSL) methods, which can leverage a large amount of unlabeled data for improved performance, have attracted increasing attention recently. In this paper, we introduce a novel Context-aware Conditional Cross Pseudo Supervision method (referred to as C$^3$PS) for semi-supervised medical image segmentation. Unlike previously published Cross Pseudo Supervision (CPS) works, this paper introduces a novel Conditional Cross Pseudo Supervision (CCPS) mechanism where the cross pseudo supervision is conditioned on a given class label. Context-awareness is further introduced in the CCPS to improve the quality of pseudo-labels for cross pseudo supervision. The proposed method has the additional advantage that in the later training stage, it can focus on the learning of hard organs. Validated on two typical yet challenging medical image segmentation tasks, our method demonstrates superior performance over the state-of-the-art methods.

翻訳日:2023-06-16 20:16:06 公開日:2023-06-14

# 有向・無向グラフで集約特徴量と隣接リストのいずれかを使うのはなぜか? 実証的研究と簡易分類法

Why Using Either Aggregated Features or Adjacency Lists in Directed or Undirected Graph? Empirical Study and Simple Classification Method ( http://arxiv.org/abs/2306.08274v1 )

ライセンス: Link先を確認

Seiji Maekawa, Yuya Sasaki, Makoto Onizuka

(参考訳) ノード分類は、グラフ分析で最もホットなタスクの1つです。本稿では,ノード表現(特徴の集約対隣接リスト)と入力グラフのエッジ方向(指向対非指向)の選択に焦点をあて,分類結果に大きな影響を与えている。本研究は,ノード表現とエッジ方向の組み合わせを用いた各種GNNの性能評価を行うための実証的研究である。実験の結果,データセット間の静的な組み合わせは得られず,データセットの特性に応じて適切な組み合わせを選択する必要があることが示された。そこで本研究では,有向グラフと無向グラフのノード表現のすべての組み合わせを利用する,単純だが包括的分類法A2DUGを提案する。我々は,A2DUGが様々なデータセットで安定して動作することを示す。驚くべきことに、いくつかのデータセットにおいて、現在の最先端のメソッドを大きく上回っている。この結果は,ノード表現とエッジ方向の組み合わせに対する適応効果制御の重要性を検証する。

Node classification is one of the hottest tasks in graph analysis. In this paper, we focus on the choices of node representations (aggregated features vs. adjacency lists) and the edge direction of an input graph (directed vs. undirected), which have a large influence on classification results. We conduct the first empirical study to benchmark the performance of various GNNs that use each combination of node representations and edge directions. Our experiments demonstrate that no single combination stably achieves state-of-the-art results across datasets, which indicates that we need to select appropriate combinations depending on the characteristics of datasets. In response, we propose a simple yet holistic classification method, A2DUG, which leverages all combinations of node representation variants in directed and undirected graphs. We demonstrate that A2DUG stably performs well on various datasets. Surprisingly, it largely outperforms the current state-of-the-art methods on several datasets. This result validates the importance of the adaptive effect control on the combinations of node representations and edge directions.

翻訳日:2023-06-16 20:15:49 公開日:2023-06-14

# 局在2次元波束の密度ゆらぎにおけるカルダー・パリ・チャン物理

Kardar-Parisi-Zhang Physics in the Density Fluctuations of Localized Two-Dimensional Wave Packets ( http://arxiv.org/abs/2306.08272v1 )

ライセンス: Link先を確認

Sen Mu, Jiangbin Gong, Gabriel Lemari\'e

(参考訳) 2次元アンダーソン局在波パケットにおいて,波密度対数のゆらぎにおけるKardar-Parisi-Zhang普遍性クラスの鍵となる特徴を同定する。数値解析により,ゆらぎは,約1/3$の指数とゆらぎのトレイシー・ウィドム確率分布によって特徴づけられる距離を持つ代数的スケーリングを示すことがわかった。さらに、KPZ物理の指向性ポリマー画像において、指向性経路の波状パケット密度への支配的寄与を同定し、その横変動が粗さ指数2/3$で特徴づけられることを発見した。このKPZ物理との接続を利用して、2次元のアンダーソン局所化波のパケットがそのよく知られた指数的局所化に対する拡張指数的補正を示すことを検証した。

We identify the key features of the Kardar-Parisi-Zhang (KPZ) universality class in the fluctuations of the wave density logarithm, in a two-dimensional Anderson localized wave packet. In our numerical analysis, the fluctuations are found to exhibit an algebraic scaling with distance characterized by an exponent of $1/3$, and a Tracy-Widom probability distribution of the fluctuations. Additionally, within a directed polymer picture of KPZ physics, we identify the dominant contribution of a directed path to the wave packet density and find that its transverse fluctuations are characterized by a roughness exponent $2/3$. Leveraging this connection with KPZ physics, we verify that an Anderson localized wave packet in 2D exhibits a stretched-exponential correction to its well-known exponential localization.

翻訳日:2023-06-16 20:15:34 公開日:2023-06-14

# 物体検出のための多クラス信頼度と位置校正

Multiclass Confidence and Localization Calibration for Object Detection ( http://arxiv.org/abs/2306.08271v1 )

ライセンス: Link先を確認

Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan

(参考訳) 多くの挑戦的なコンピュータビジョン問題に対して高い予測精度を達成する一方で、最近の研究は、ディープニューラルネットワーク(DNN)が過信的な予測を行う傾向があることを示唆している。 DNNキャリブレーションを改善する既存の試みのほとんどは、分類タスクに限定され、ドメイン内予測のキャリブレーションに限られている。驚くべきことに、視覚ベースのセキュリティに敏感で安全に重要なアプリケーションにおいて重要な空間を占める物体検出法を校正する試みは、ほとんど、あるいは全く行われていない。本稿では,最近の物体検出手法を校正するための新しい列車時間手法を提案する。予測の不確実性を利用することで、マルチクラス信頼度とボックスローカライズを共同で調整することができる。我々は複数のドメイン内およびドメイン外検出ベンチマークで広範囲な実験を行う。その結果,提案手法は,領域内と領域外の両方の予測におけるキャリブレーション誤差を減少させるため,複数のベースラインを一貫して上回っていることがわかった。私たちのコードとモデルはhttps://github.com/bimsarapathiraja/mcclで利用可能です。

Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make overconfident predictions, rendering them poorly calibrated. Most of the existing attempts at improving DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very few, if any, attempts have been made to study the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and box localization by leveraging their predictive uncertainties. We perform extensive experiments on several in-domain and out-of-domain detection benchmarks. Results demonstrate that our proposed train-time calibration method consistently outperforms several baselines in reducing calibration error for both in-domain and out-of-domain predictions. Our code and models are available at https://github.com/bimsarapathiraja/MCCL.
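
For readers unfamiliar with how "calibration error" is usually quantified, the sketch below computes the standard expected calibration error (ECE) for a classifier: predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged with bin-size weights. This is the generic classification metric only; the detection-specific variant and the binning convention used in the paper are not given in the abstract, so the details here (function name, left-closed equal-width bins) are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    confidences: predicted confidence in [0, 1] for each prediction
    correct:     1 if the prediction was right, 0 otherwise
    Bins are equal-width and left-closed; a confidence of exactly 1.0
    is placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical overconfident model: 90% confidence but only 75% accuracy.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
# Hypothetical well-calibrated model: 50% confidence and 50% accuracy.
calibrated = expected_calibration_error([0.5, 0.5], [1, 0])
```

An overconfident model yields a positive gap, while a model whose confidence matches its accuracy in every bin scores zero.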

翻訳日:2023-06-16 20:15:18 公開日:2023-06-14

# 2次元円形カーネル時系列変換, エントロピー対策, 機械学習アプローチを用いた太陽活動の画像追跡

Imagery Tracking of Sun Activity Using 2D Circular Kernel Time Series Transformation, Entropy Measures and Machine Learning Approaches ( http://arxiv.org/abs/2306.08270v1 )

ライセンス: Link先を確認

Irewola Aaron Oludehinwa, Andrei Velichko, Maksim Belyaev and Olasunkanmi I. Olusola

(参考訳) 太陽は本質的に非常に複雑であり、その観測画像の特徴は太陽活動、宇宙、地球の気象条件に関する最も重要な情報源の1つである。 NASAのソーラー・ダイナミクス・オブザーバトリーは1日あたり約7万枚の太陽活動の画像を撮影しており、これらの太陽観測画像を継続的に目視検査することは難しい。本研究では,2次元円形カーネル時系列変換,統計的およびエントロピー尺度,機械学習手法を用いて,太陽活動を追跡する手法を開発した。この技術は、太陽観測画像の一部を1次元時系列(1-DTS)に変換し、統計的およびエントロピー尺度(アプローチ1)または直接分類(アプローチ2)を用いて1-DTSから特徴を抽出し、機械学習により'solar storm'と'no storm'に分類する。その結果、太陽活動の追跡におけるモデルの精度は、アプローチ1では約0.981、アプローチ2では約0.999であることがわかった。開発した手法は、太陽観測画像の回転変換に対して安定であることが確認された。元のデータセットで訓練した場合、太陽嵐領域の分布の一致指数(T90)は、アプローチ1で T90 ~ 0.993、アプローチ2で T90 ~ 0.951 となる。また、拡張トレーニングベースを使用すると、一致指数はそれぞれ T90 ~ 0.994 と T90 ~ 1 に増加した。このモデルは、太陽嵐に関連する渦巻く磁力線の領域を一貫して分類し、画像の回転、グレア、光学的アーティファクトに対して頑健である。

The sun is highly complex in nature, and the features of its observatory imagery are among the most important sources of information about solar activity and about space and Earth's weather conditions. NASA's Solar Dynamics Observatory captures approximately 70,000 images of the sun's activity per day, and continuous visual inspection of these solar observatory images is challenging. In this study, we developed a technique for tracking the sun's activity using 2D circular kernel time series transformation, statistical and entropy measures, and machine learning approaches. The technique involves transforming a section of the solar observatory image into a 1-dimensional time series (1-DTS); statistical and entropy measures (Approach 1) or direct classification (Approach 2) are then used to capture features from the 1-DTS for machine learning classification into 'solar storm' and 'no storm'. We found that the potential accuracy of the model in tracking the activity of the sun is approximately 0.981 for Approach 1 and 0.999 for Approach 2. The stability of the developed approach to rotational transformation of the solar observatory image is evident. When training on the original dataset, the match index (T90) of the distribution of solar storm areas reaches T90 ~ 0.993 for Approach 1 and T90 ~ 0.951 for Approach 2. In addition, when using the extended training base, the match indices increased to T90 ~ 0.994 and T90 ~ 1, respectively. This model consistently classifies areas with swirling magnetic lines associated with solar storms and is robust to image rotation, glare, and optical artifacts.

翻訳日:2023-06-16 20:15:00 公開日:2023-06-14

# LargeST: 大規模トラフィック予測のためのベンチマークデータセット

LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting ( http://arxiv.org/abs/2306.08259v1 )

ライセンス: Link先を確認

Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann

(参考訳) 交通予測はスマートシティのイニシアチブにおいて重要な役割を担い、交通データの非線形パターンを捉える深層学習の力により、大きな進歩を遂げている。しかし、現在の公開データセットで達成された有望な結果は、これらのデータセット内の制限のため、実用的なシナリオには適用できない可能性がある。まず、データセットの規模が限られており、実際の交通ネットワークの規模を反映していない可能性がある。第二に、これらのデータセットの時間的カバレッジは通常短く、長期的なパターンを研究し、深層モデルのトレーニングに十分なサンプルを取得する上でハードルとなる。第三に、これらのデータセットはセンサーに関する十分なメタデータを欠いていることが多く、データの信頼性と解釈性を損なう。これらの制限を軽減するため、LargeSTベンチマークデータセットを導入します。合計8,600個のセンサーを5年間にわたってカバーし、包括的なメタデータを含んでいる。 LargeSTを用いて詳細なデータ分析を行い、データインサイトを抽出し、パフォーマンスと効率の観点からよく知られたベースラインをベンチマークし、課題と将来の研究の機会を特定します。データセットとベースラインの実装は、https://github.com/liuxu77/LargeST でリリースします。

Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, their limited sizes may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.

翻訳日:2023-06-16 20:14:33 公開日:2023-06-14

# 潜在拡散モデルのロバスト性について

On the Robustness of Latent Diffusion Models ( http://arxiv.org/abs/2306.08257v1 )

ライセンス: Link先を確認

Jianping Zhang, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weibin Wu, Michael R. Lyu

(参考訳) 潜在拡散モデルは、画像合成や画像編集など、様々な生成タスクで最先端のパフォーマンスを達成する。しかし, 潜在拡散モデルのロバスト性は十分に研究されていない。以前の研究は、デノイジング過程を考慮せず、ホワイトボックス設定下でのエンコーダや出力画像に対する敵対的攻撃のみに焦点を当てていた。そこで本研究では,潜在拡散モデルのロバスト性をより詳細に解析することを目的とする。まず,潜在拡散モデル内の構成要素がホワイトボックスのロバスト性に及ぼす影響について検討する。ホワイトボックスのシナリオに加えて、転送攻撃による潜在拡散モデルのブラックボックスロバスト性を評価し、プロンプト転送とモデル転送の両方の設定と、考えられる防御機構について検討する。しかし、これらの調査には包括的なベンチマークデータセットが必要であり、それは文献に欠けている。そこで, 潜在拡散モデルのロバスト性の研究を容易にするために, 2種類の画像編集モデルのための2つの自動データセット構築パイプラインを提案し, データセット全体を公開する。コードとデータセットは https://github.com/jpzhang1810/LDM-Robustness で公開されています。

Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim to analyze the robustness of latent diffusion models more thoroughly. We first study the influence of the components inside latent diffusion models on their white-box robustness. In addition to white-box scenarios, we evaluate the black-box robustness of latent diffusion models via transfer attacks, where we consider both prompt-transfer and model-transfer settings and possible defense mechanisms. However, all these explorations need a comprehensive benchmark dataset, which is missing in the literature. Therefore, to facilitate the research of the robustness of latent diffusion models, we propose two automatic dataset construction pipelines for two kinds of image editing models and release the whole dataset. Our code and dataset are available at https://github.com/jpzhang1810/LDM-Robustness.

翻訳日:2023-06-16 20:14:16 公開日:2023-06-14

# 予測輝度による可逆的ハーフトーン変換

Taming Reversible Halftoning via Predictive Luminance ( http://arxiv.org/abs/2306.08309v1 )

ライセンス: Link先を確認

Cheuk-Kit Lau, Menghan Xia, Tien-Tsin Wong

(参考訳) 伝統的なハーフトーンは通常、二値ドットで画像をディザリングする際に色を落とすため、元の色情報を復元することが困難になる。カラーイメージを元のバージョンに完全復元可能なバイナリハーフトーンに変換する,新しいハーフトーン技術を提案する。提案手法は,2つの畳み込みニューラルネットワーク(CNN)による可逆半音パターンの生成と,CNNの平坦性劣化問題を緩和するためのノイズインセンティブブロック(NIB)から構成される。さらに,提案手法では,青音品質と復元精度の矛盾に対処するため,予測可能な情報をネットワークからオフロードする予測器組込み手法を提案し,本手法はハーフトーンパターンに類似した輝度情報である。このようなアプローチにより、ネットワークは、修復品質を損なうことなく、より優れたブルーノイズ品質のハーフトーンを生産する柔軟性を得ることができる。多段階訓練法と損失重み付けに関する詳細な研究が行われている。我々は, 半音のスペクトル解析, 半音の精度, 復元精度, データ埋め込み研究について, 予測器埋め込み法と新しい手法を比較した。エントロピー評価の結果,我々のハーフトーンは,新しいベース法よりもエントロピー情報が少ないことがわかった。実験により, 半音の青色音質を改善するために, 予測器埋込み法により柔軟性が向上し, 耐障害性も向上した。

Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We propose a novel halftoning technique that converts a color image into a binary halftone with full restorability to its original version. Our novel base halftoning technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the conflict between blue-noise quality and restoration accuracy in our novel base method, we propose a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information resembling the halftone pattern. Such an approach allows the network to gain more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. Detailed studies on the multiple-stage training method and loss weightings have been conducted. We have compared our predictor-embedded method and our novel base method regarding spectrum analysis on halftone, halftone accuracy, restoration accuracy, and the data embedding studies. Our entropy evaluation shows that our halftone contains less encoding information than that of our novel base method. The experiments show that our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones and maintains a comparable restoration quality with a higher tolerance for disturbances.

翻訳日:2023-06-16 20:08:42 公開日:2023-06-14

# マルチモーダル分類のためのバランスの取れたアクティブラーニングに向けて

Towards Balanced Active Learning for Multimodal Classification ( http://arxiv.org/abs/2306.08306v1 )

ライセンス: Link先を確認

Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

(参考訳) マルチモーダルネットワークのトレーニングには、ユニモーダルネットワークに比べてパラメータ空間が大きいため、膨大なデータが必要である。アクティブラーニングは、モデルの性能向上に寄与するサンプルのみを選択することで、データアノテーションコストを削減するために広く使われているテクニックである。しかし、現在のアクティブラーニング戦略は、主に一助的なタスクのために設計されており、マルチモーダルデータに適用すると、支配的なモダリティからのサンプル選択がバイアスとなることが多い。この不公平さは、最適なパフォーマンスを達成するために重要なマルチモーダル学習のバランスを妨げる。本稿では,よりバランスの取れたマルチモーダルアクティブラーニング戦略を設計するための3つのガイドラインを提案する。これらのガイドラインに従うと、モダリティ間の優越度で勾配埋め込みを変調することにより、より公平なデータ選択を実現する新しい手法が提案される。本研究は,支配的モダリティから欲深いサンプル選択を回避し,よりバランスのとれたマルチモーダル学習を実現することを実証する。提案手法は,様々なマルチモーダル分類タスクにおいて,既存のアクティブラーニング戦略より優れている。本研究は,マルチモーダル能動学習におけるサンプル選択のバランスをとることの重要性を強調し,マルチモーダル分類のためのよりバランスの取れた能動学習を実現するための実践的ソリューションを提供する。

Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality. This unfairness hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, a novel approach is proposed to achieve more fair data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality. Our approach outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balancing sample selection in multimodal active learning and provides a practical solution for achieving more balanced active learning for multimodal classification.

翻訳日:2023-06-16 20:08:15 公開日:2023-06-14

# マイクロドップラーシグネチャに基づくレーダーデータエンハンス型深層学習アプローチによる歩行者認識

Pedestrian Recognition with Radar Data-Enhanced Deep Learning Approach Based on Micro-Doppler Signatures ( http://arxiv.org/abs/2306.08303v1 )

ライセンス: Link先を確認

Haoming Li, Yu Xiang, Haodong Xu, Wenyong Wang

(参考訳) 近年注目を集めている話題であるが、レーダーのマイクロドップラーシグネチャに基づく歩行者識別の性能は、十分な訓練データがないために制限されている。本稿では,データ拡張(DE)モジュールとマルチ特性学習(MCL)モジュールを備えたデータ強化型マルチ特性学習(DEMCL)モデルを提案し,より補完的な歩行者のマイクロドップラー(m-D)シグネチャを学習する。DEモジュールでは、自由歩行データセットを強化するためにレンジ・ドップラー敵対的生成ネットワーク(RDGAN)を提案し、マルチスケール畳み込みニューラルネットワーク(MCNN)と放射基底関数ニューラルネットワーク(RBFNN)を備えたMCLモジュールを、強化したデータセットから抽出したm-Dシグネチャを学習するように訓練する。実験の結果,本モデルは他の研究よりも3.33%から10.24%精度が高く,25分間の歩行データセットに対する実行時間は0.9324秒と短いことが示された。

As a hot topic in recent years, pedestrian identification based on radar micro-Doppler signatures is limited by the lack of adequate training data. In this paper, we propose a data-enhanced multi-characteristic learning (DEMCL) model with a data enhancement (DE) module and a multi-characteristic learning (MCL) module to learn more complementary pedestrian micro-Doppler (m-D) signatures. In the DE module, a range-Doppler generative adversarial network (RDGAN) is proposed to enhance free walking datasets, and the MCL module, with a multi-scale convolutional neural network (MCNN) and a radial basis function neural network (RBFNN), is trained to learn m-D signatures extracted from the enhanced datasets. Experimental results show that our model is 3.33% to 10.24% more accurate than other studies and has a short run time of 0.9324 seconds on a 25-minute walking dataset.

翻訳日:2023-06-16 20:07:54 公開日:2023-06-14

# 大規模言語モデルと知識グラフの統合:ロードマップ

Unifying Large Language Models and Knowledge Graphs: A Roadmap ( http://arxiv.org/abs/2306.08302v1 )

ライセンス: Link先を確認

Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, Xindong Wu

(参考訳) ChatGPTやGPT4のような大規模言語モデル(LLM)は、その創発的能力と汎化性により、自然言語処理と人工知能の分野に新たな波を起こしている。しかし、LLMはブラックボックスモデルであり、事実知識を捉えてアクセスすることができないことが多い。対照的に、WikipediaやHuapuなどの知識グラフ(KG)は、豊富な事実知識を明示的に格納する構造化知識モデルである。KGは推論と解釈可能性のための外部知識を提供することでLLMを強化できる。一方、KGは構築が難しく、本質的に変化し続けるものであり、新しい事実の生成や未知の知識の表現において既存のKG手法に課題を突きつけている。したがって、LLMとKGを統合し、両者の利点を同時に活用することは相補的である。本稿では,LLMとKGの統合に向けた将来を見据えたロードマップを示す。私たちのロードマップは3つの一般的なフレームワークで構成されています。 1) LLMの事前訓練および推論段階でKGを組み込む、あるいはLLMが学習した知識の理解を深めることを目的としたKG強化LLM。 2) 埋め込み、補完、構築、グラフからテキストへの生成、質問応答などの異なるKGタスクにLLMを活用するLLM強化KG。 3) LLMとKGが同等の役割を担い、相互に有益な方法で機能し、データと知識の両方によって駆動される双方向推論のためにLLMとKGの両方を強化する、相乗的なLLM + KG。我々は、これら3つのフレームワークにおける既存の取り組みをレビュー・要約し、今後の研究方向性を示す。

Large language models (LLMs), such as ChatGPT and GPT4, are making new waves in the field of natural language processing and artificial intelligence, due to their emergent ability and generalizability. However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge. In contrast, Knowledge Graphs (KGs), Wikipedia and Huapu for example, are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability. Meanwhile, KGs are difficult to construct and evolving by nature, which challenges the existing methods in KGs to generate new facts and represent unseen knowledge. Therefore, it is complementary to unify LLMs and KGs together and simultaneously leverage their advantages. In this article, we present a forward-looking roadmap for the unification of LLMs and KGs. Our roadmap consists of three general frameworks, namely, 1) KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or for the purpose of enhancing understanding of the knowledge learned by LLMs; 2) LLM-augmented KGs, that leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and 3) Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge. We review and summarize existing efforts within these three frameworks in our roadmap and pinpoint their future research directions.

翻訳日:2023-06-16 20:07:34 公開日:2023-06-14

# SaDI: 極端イベント下での電力負荷予測のための自己適応型分解型解釈可能なフレームワーク

SaDI: A Self-adaptive Decomposed Interpretable Framework for Electric Load Forecasting under Extreme Events ( http://arxiv.org/abs/2306.08299v1 )

ライセンス: Link先を確認

Hengbo Liu, Ziqing Ma, Linxiao Yang, Tian Zhou, Rui Xia, Yi Wang, Qingsong Wen, Liang Sun

(参考訳) 電力網の計画と管理には電力負荷の正確な予測が不可欠である。本稿では,猛暑などの極端事象下での電力負荷予測問題に取り組む。正確な予測の課題の一つは、極端な条件下でのトレーニングサンプルの不足である。また、このような極端な条件下では負荷が劇的に変化することが多く、より良い意思決定のために解釈可能なモデルが求められる。本稿では, 長期トレンド, 短期トレンド, 周期のモデリングをアンサンブルし, 異なる成分の時間特性を捉える, 自己適応型分解型解釈可能フレームワーク (SaDI) を提案する。極端事象下での不均衡学習のために、外部変数トリガ損失を提案する。さらに,望ましい解釈性を得るために,フレームワークにGAM(Generalized Additive Model)を採用している。中国中部の電力負荷と建物の公共エネルギーメータの双方に関する実験により、提案するSaDIフレームワークは、正規化RMSEの日平均において、極端事象下での予測で最先端のアルゴリズムと比較して平均22.14%の改善を達成することが示された。コード、パブリックデータセット、およびAppendixは、https://doi.org/10.24433/CO.9696980.v1 で利用可能である。

Accurate prediction of electric load is crucial in power grid planning and management. In this paper, we solve the electric load forecasting problem under extreme events such as scorching heat. One challenge for accurate forecasting is the lack of training samples under extreme conditions. Also, load usually changes dramatically in these extreme conditions, which calls for an interpretable model to make better decisions. In this paper, we propose a novel forecasting framework, named Self-adaptive Decomposed Interpretable framework (SaDI), which ensembles long-term trend, short-term trend, and period modelings to capture temporal characteristics in different components. The external variable triggered loss is proposed for the imbalanced learning under extreme events. Furthermore, the Generalized Additive Model (GAM) is employed in the framework for desirable interpretability. The experiments on both Central China electric load and public energy meters from buildings show that the proposed SaDI framework achieves an average 22.14% improvement compared with the current state-of-the-art algorithms in forecasting under extreme events in terms of daily mean of normalized RMSE. Code, Public datasets, and Appendix are available at: https://doi.org/10.24433/CO.9696980.v1 .

翻訳日:2023-06-16 20:07:03 公開日:2023-06-14

# 直接グリッドリファインメントアルゴリズムを用いた物理インフォームドニューラルネットワークの効率的な学習

Efficient Training of Physics-Informed Neural Networks with Direct Grid Refinement Algorithm ( http://arxiv.org/abs/2306.08293v1 )

ライセンス: Link先を確認

Shikhar Nilabh and Fidel Grandia

(参考訳) 本研究では,物理インフォームドニューラルネットワーク(PINN)の枠組みにおける残点の適応サンプリングに適したアルゴリズムの開発について述べる。提案手法は,既存の適応サンプリング手法に内在する制限に対処することで,計算効率と適応点配置の両方を効果的に保証する直接メッシュリファインメント手法を導入する。本アルゴリズムの性能を評価するために検証を行い,本手法とベンチマークモデルの結果に基づいて,モデル間の合理的な一致を示した。従来の適応型再サンプリング技術との比較分析により,特に高精細化率で実施した場合,本手法の優れた性能が示された。本研究は, 適応サンプリングアルゴリズムを物理インフォームドニューラルネットワークに適用することにより, シミュレーション精度の向上を図ったものである。

This research presents the development of an innovative algorithm tailored for the adaptive sampling of residual points within the framework of Physics-Informed Neural Networks (PINNs). By addressing the limitations inherent in existing adaptive sampling techniques, our proposed methodology introduces a direct mesh refinement approach that effectively ensures both computational efficiency and adaptive point placement. Verification studies were conducted to evaluate the performance of our algorithm, showcasing reasonable agreement between the model based on our novel approach and benchmark model results. Comparative analyses with established adaptive resampling techniques demonstrated the superior performance of our approach, particularly when implemented with higher refinement factor. Overall, our findings highlight the enhancement of simulation accuracy achievable through the application of our adaptive sampling algorithm for Physics-Informed Neural Networks.

翻訳日:2023-06-16 20:06:43 公開日:2023-06-14

# 高次解法におけるp適応のための強化学習戦略

A reinforcement learning strategy for p-adaptation in high order solvers ( http://arxiv.org/abs/2306.08292v1 )

ライセンス: Link先を確認

David Huergo, Gonzalo Rubio, Esteban Ferrer

(参考訳) 強化学習(Reinforcement Learning, RL)は、意思決定プロセスを自動化するための有望なアプローチとして登場した。本稿では,高次ソルバを用いる際に計算メッシュの多項式次数を最適化するためのRL手法の適用について検討する。メッシュ適応は,コストを低減しつつ精度を向上させることで,数値シミュレーションの効率を高める上で重要な役割を担っている。ここでは、Proximal Policy Optimizationに基づくアクター・クリティック型RLモデルが、変化する条件に基づいてエージェントが最適なメッシュ修正を学ぶためのデータ駆動アプローチを提供する。本稿では,高次ソルバにおけるp適応の戦略を提示し,適切な報酬構造の定式化やRLエージェントとシミュレーション環境との相互作用など,RLベースのメッシュ適応の主な側面について考察する。RLに基づくメッシュp適応が計算効率と精度に与える影響を論じる。本研究では,RLによるp適応戦略を1次元非粘性バーガース方程式で検証し,この戦略の有効性を実証する。RL戦略は、人間の介入を最小限に抑えつつ、一様適応と比べて計算コストを削減し精度を向上させる。

Reinforcement learning (RL) has emerged as a promising approach to automating decision processes. This paper explores the application of RL techniques to optimise the polynomial order in the computational mesh when using high-order solvers. Mesh adaptation plays a crucial role in improving the efficiency of numerical simulations by improving accuracy while reducing the cost. Here, actor-critic RL models based on Proximal Policy Optimization offer a data-driven approach for agents to learn optimal mesh modifications based on evolving conditions. The paper provides a strategy for p-adaptation in high-order solvers and includes insights into the main aspects of RL-based mesh adaptation, including the formulation of appropriate reward structures and the interaction between the RL agent and the simulation environment. We discuss the impact of RL-based mesh p-adaptation on computational efficiency and accuracy. We test the RL p-adaptation strategy on a 1D inviscid Burgers' equation to demonstrate the effectiveness of the strategy. The RL strategy reduces the computational cost and improves accuracy over uniform adaptation, while minimising human intervention.

翻訳日:2023-06-16 20:06:32 公開日:2023-06-14

# 注意機構とCTCに基づくデコードを用いたフランス語キュードスピーチにおける手と唇のダイナミクスの検討

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding ( http://arxiv.org/abs/2306.08290v1 )

ライセンス: Link先を確認

Sanjana Sankar (GIPSA-CRISSP), Denis Beautemps (GIPSA-CRISSP), Frédéric Elisei (ICP), Olivier Perrotin (GIPSA-CRISSP), Thomas Hueber (GIPSA-CRISSP)

(参考訳) 難聴者や重度の聴覚障害者は、音声言語を理解するためのコミュニケーション手段として、キュードスピーチ(CS)を利用する。音声情報に関連する手がかり(キュー)を提供することで、CSは読唇を補強する手段となる。文献では、人間による発話生成の文脈において、手と唇のダイナミクスに関するいくつかの研究がなされている。本稿では,注意機構を用いて認識タスクを実行するニューラルネットワークが、単一話者についてこの関係をどのように学習するかを調べる方法を提案する。さらに、学習されたダイナミクスの分析を用いて、2つのモダリティ間の関係を確立し、自動セグメントを抽出する。本研究のために,フランス語CSの新しいデータセットを収録した。このデータセットの公開とともに、フランス語CSの自動認識では新しい試みである単語レベル認識のベンチマークを報告する。

Hard of hearing or profoundly deaf people make use of cued speech (CS) as a communication tool to understand spoken language. By delivering cues that are relevant to the phonetic information, CS offers a way to enhance lipreading. In literature, there have been several studies on the dynamics between the hand and the lips in the context of human production. This article proposes a way to investigate how a neural network learns this relation for a single speaker while performing a recognition task using attention mechanisms. Further, an analysis of the learnt dynamics is utilized to establish the relationship between the two modalities and extract automatic segments. For the purpose of this study, a new dataset has been recorded for French CS. Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French CS.

翻訳日:2023-06-16 20:06:15 公開日:2023-06-14

# $\textbf{A}^2\textbf{CiD}^2$:分散ディープラーニングにおける非同期通信の高速化

$\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning ( http://arxiv.org/abs/2306.08289v1 )

ライセンス: Link先を確認

Adel Nabli (MLIA, Mila), Eugene Belilovsky (Mila), Edouard Oyallon (MLIA)

(参考訳) ディープラーニングモデルの分散トレーニングは、この分野における近年の多くの成功に不可欠であった。現在の標準手法は主に同期型の集中型アルゴリズムに依存しており、大きな通信ボトルネックを引き起こし、その利用を強い接続性を持つハイパフォーマンスコンピューティング(HPC)環境に限定している。非集中型の非同期アルゴリズムは潜在的な代替手段として登場しているが、実用性はまだ遅れている。本研究では,その柔軟性と並列化の可能性から,ピアツーピア非同期手法に着目する。大規模かつ接続の不十分な状況でこれらの手法が必要とする帯域幅の増加を緩和するために,$\textbf{A}^2\textbf{CiD}^2$と名付けた連続的なモメンタムによって機能する、原理に基づいた非同期・ランダム化・ゴシップベースのアルゴリズムを導入する。パラメータ数が2倍になる以外のコストなしに通信を大きく加速することに加えて、$\textbf{A}^2\textbf{CiD}^2$は最小限の変更で他の非同期アプローチに組み込むことができる。その効率を理論的および数値的に実証する。経験的には、リンググラフ上で$\textbf{A}^2\textbf{CiD}^2$を追加することは通信レートを2倍にするのと同じ効果を持つ。特に,最大64個の非同期ワーカ(A100 GPU)と各種の通信ネットワークトポロジを用いて、ImageNetデータセットで一貫した改善を示す。

Distributed training of Deep Learning models has been critical to many recent successes in the field. Current standard methods primarily rely on synchronous centralized algorithms which induce major communication bottlenecks and limit their usability to High-Performance Computing (HPC) environments with strong connectivity. Decentralized asynchronous algorithms are emerging as a potential alternative but their practical applicability still lags. In this work, we focus on peer-to-peer asynchronous methods due to their flexibility and parallelization potentials. In order to mitigate the increase in bandwidth they require at large scale and in poorly connected contexts, we introduce a principled asynchronous, randomized, gossip-based algorithm which works thanks to a continuous momentum named $\textbf{A}^2\textbf{CiD}^2$. In addition to inducing a significant communication acceleration at no cost other than doubling the parameters, minimal adaptation is required to incorporate $\textbf{A}^2\textbf{CiD}^2$ into other asynchronous approaches. We demonstrate its efficiency theoretically and numerically. Empirically on the ring graph, adding $\textbf{A}^2\textbf{CiD}^2$ has the same effect as doubling the communication rate. In particular, we show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers (A100 GPUs) and various communication network topologies.
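
The gossip-based averaging that this abstract builds on can be illustrated with a minimal sketch. It shows only plain randomized pairwise gossip on a ring graph, not the $\textbf{A}^2\textbf{CiD}^2$ momentum itself, whose update rule is not given in the abstract. The function names, the scalar "parameters", and the worker count are all assumptions made for illustration.

```python
import random


def gossip_round(values, rng):
    """Pick one random edge (i, i+1) of the ring and average its two endpoints."""
    n = len(values)
    i = rng.randrange(n)
    j = (i + 1) % n  # ring topology: each worker talks to its right neighbour
    avg = 0.5 * (values[i] + values[j])
    values[i] = values[j] = avg


def spread(values):
    """Disagreement between workers: max minus min of their local values."""
    return max(values) - min(values)


def run_gossip(n_workers=8, n_rounds=2000, seed=0):
    """Run plain pairwise gossip; returns the initial mean and the final values."""
    rng = random.Random(seed)
    values = [float(k) for k in range(n_workers)]  # hypothetical local parameters
    mean_before = sum(values) / n_workers
    for _ in range(n_rounds):
        gossip_round(values, rng)
    return mean_before, values
```

Each pairwise average leaves the global mean unchanged while shrinking the disagreement between workers, so all local values drift towards the initial mean. On a ring this mixing is slow, which is the regime where the abstract reports that its acceleration matters most.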

翻訳日:2023-06-16 20:06:02 公開日:2023-06-14

# 3次元音速位相不変エコー定位

3-Dimensional Sonic Phase-invariant Echo Localization ( http://arxiv.org/abs/2306.08281v1 )

ライセンス: Link先を確認

Christopher Hahne

(参考訳) 視差(パララックス)と飛行時間(ToF)は、様々な光や気象の条件が高度なカメラベースの3次元(3-D)再構成にとって依然として課題であるロボットビジョンにおいて、相補的なものとみなされることが多い。そこで本研究では,3次元空間における任意のセンサ位置からの音響ToFパルスを三角測量するために,対応エコー間の視差(PaCE)を初めて確立する。これは、センサ位置と検出された到着時刻によって張られる楕円体の交点にターゲットを特定する、新しい往復反射モデルによって実現される。チャネル間のエコー対応付けはターゲット検出の重要な前提条件となり、シャム(Siamese)多層パーセプトロン(MLP)のスタックから得られる特徴類似性から学習される。PaCEアルゴリズムは、センサ位置の制約を緩和しつつ、1個の等方性エミッタと少なくとも3個のToF受信機のみから位相不変な3次元物体定位を可能にする。空中超音波センサハードウェアを用いて実験を行い、定量的な結果によってこの仮説を裏付ける。

Parallax and Time-of-Flight (ToF) are often regarded as complementary in robotic vision where various light and weather conditions remain challenges for advanced camera-based 3-Dimensional (3-D) reconstruction. To this end, this paper establishes Parallax among Corresponding Echoes (PaCE) to triangulate acoustic ToF pulses from arbitrary sensor positions in 3-D space for the first time. This is achieved through a novel round-trip reflection model that pinpoints targets at the intersection of ellipsoids, which are spanned by sensor locations and detected arrival times. Inter-channel echo association becomes a crucial prerequisite for target detection and is learned from feature similarity obtained by a stack of Siamese Multi-Layer Perceptrons (MLPs). The PaCE algorithm enables phase-invariant 3-D object localization from only 1 isotropic emitter and at least 3 ToF receivers with relaxed sensor position constraints. Experiments are conducted with airborne ultrasound sensor hardware and back this hypothesis with quantitative results.
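
The round-trip geometry described above can be sketched as follows. An echo that leaves the emitter, reflects at a target, and reaches a receiver after time t constrains the target to an ellipsoid whose foci are the emitter and that receiver. This is only the forward model under assumed conditions (a constant speed of sound and noise-free arrival times); the function names are hypothetical, and the learned echo association and the paper's actual solver are not reproduced.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed constant)


def round_trip_time(emitter, receiver, target, c=SPEED_OF_SOUND):
    """Time for a pulse to travel emitter -> target -> receiver."""
    return (math.dist(emitter, target) + math.dist(target, receiver)) / c


def ellipsoid_residual(point, emitter, receiver, arrival_time, c=SPEED_OF_SOUND):
    """Zero exactly when `point` lies on the ellipsoid with foci emitter/receiver
    whose total path length equals c * arrival_time."""
    path = math.dist(emitter, point) + math.dist(point, receiver)
    return path - c * arrival_time


def total_squared_residual(point, emitter, receivers, arrival_times):
    """Sum of squared residuals over all receivers; it vanishes at a point
    lying on every ellipsoid, i.e. at their common intersection."""
    return sum(
        ellipsoid_residual(point, emitter, r, t) ** 2
        for r, t in zip(receivers, arrival_times)
    )
```

With one emitter and three receivers, the true target makes all three residuals vanish, so a localizer could minimise `total_squared_residual` over candidate points. Three ellipsoids can intersect in more than one point, so a practical solver would also need a constraint such as restricting targets to the half-space in front of the sensors.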

翻訳日:2023-06-16 20:05:35 公開日:2023-06-14

# GCformer: 正確でスケーラブルな長期多変量時系列予測のための効率的なフレームワーク

GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.08325v1 )

ライセンス: Link先を確認

YanJun Zhao, Ziqing Ma, Tian Zhou, Liang Sun, Mengni Ye, Yi Qian

(参考訳) トランスフォーマーベースのモデルは、時系列予測の有望なツールとして登場した。しかし、これらのモデルは長い入力時系列に対して正確な予測ができない。一方では、時系列データ内のグローバルな依存関係を捉えられない。他方では、長い入力シーケンスは通常、大きなモデルサイズと高い時間計算量をもたらす。これらの制限に対処するために、長い入力列を処理する構造化グローバル畳み込みブランチと、短く直近の信号を捉えるローカルなトランスフォーマーベースのブランチを組み合わせたGCformerを提案する。グローバル畳み込みカーネルのための統一的なフレームワークを、3つの異なるパラメータ化手法を用いて導入した。グローバルブランチで選択された構造化畳み込みカーネルは、準線形(sublinear)の計算量となるよう設計されており、長くノイズの多い入力信号の効率的かつ効果的な処理を可能にしている。6つのベンチマークデータセットに関する実証的研究により、GCformerは最先端の手法より優れており、多変量時系列ベンチマークのMSEを4.38%、モデルパラメータを61.92%削減することが示された。特に、グローバル畳み込みブランチは他のモデルの性能を向上させるプラグインブロックとして機能し、最近発表された様々なトランスフォーマーベースのモデルを含めて平均31.93%の改善をもたらす。私たちのコードはhttps://github.com/zyj-111/GCformerで公開しています。

Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, the long input sequence usually leads to large model size and high time complexity. To address these limitations, we present GCformer, which combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for capturing short, recent signals. A cohesive framework for a global convolution kernel has been introduced, utilizing three distinct parameterization methods. The selected structured convolutional kernel in the global branch has been specifically crafted with sublinear complexity, thereby allowing for the efficient and effective processing of lengthy and noisy input signals. Empirical studies on six benchmark datasets demonstrate that GCformer outperforms state-of-the-art methods, reducing MSE error in multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to enhance the performance of other models, with an average improvement of 31.93%, including various recently published Transformer-based models. Our code is publicly available at https://github.com/zyj-111/GCformer.

翻訳日:2023-06-16 19:58:00 公開日:2023-06-14

# ディープラーニングモデルをトレーニングする際のカーボンフットプリントの推定方法ガイドとレビュー

How to estimate carbon footprint when training deep learning models? A guide and review ( http://arxiv.org/abs/2306.08323v1 )

ライセンス: Link先を確認

Lucia Bouza Heguerte (MAP5), Aurélie Bugeau (IUF, LaBRI, UB), Loïc Lannelongue

(参考訳) 機械学習とディープラーニングモデルは、最近の社会の多くの分野における人工知能の急速な発展に欠かせないものとなっている。現在、これらのモデルの開発には多くの研究で分析された環境コストがあることが広く認識されている。機械学習モデルをトレーニングしながらエネルギー消費を追跡するために、いくつかのオンラインおよびソフトウェアツールが開発されている。本稿では,これらのツールの包括的導入と比較を行い,その作業の環境影響を推定したいai実践者を対象とした。特定の語彙、各ツールの技術的要件をレビューし、これらのツールを使用する方法と時期についてアドバイスします。

Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary, the technical requirements for each tool, and provide some advice on how and when to use these tools.
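
The kind of estimate such tools produce can be sketched with the commonly used relation: energy is average power draw times run time times the data centre's power usage effectiveness (PUE), and emissions are energy times the carbon intensity of the local grid. This is a generic back-of-the-envelope sketch, not the method of any specific tool reviewed in the paper. The function names and the example figures (300 W, 24 h, PUE 1.5, 400 gCO2e/kWh) are assumptions.

```python
def training_energy_kwh(avg_power_watts, hours, pue=1.0):
    """Energy drawn by a training run, scaled by the data centre's PUE."""
    return avg_power_watts / 1000.0 * hours * pue


def carbon_footprint_kg(energy_kwh, carbon_intensity_g_per_kwh):
    """Emissions in kg CO2e for a given energy use and grid carbon intensity."""
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


# Hypothetical run: one 300 W accelerator for 24 hours in a PUE-1.5 facility
# on a grid emitting 400 gCO2e per kWh.
energy = training_energy_kwh(300.0, 24.0, pue=1.5)
emissions = carbon_footprint_kg(energy, 400.0)
```

For these assumed inputs the estimate is about 10.8 kWh and about 4.3 kg CO2e. Real trackers differ mainly in how they measure or approximate the power term.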

翻訳日:2023-06-16 19:57:38 公開日:2023-06-14

# 過パラメータ化された浅層ReLUニューラルネットワークを用いた非パラメトリック回帰

Nonparametric regression using over-parameterized shallow ReLU neural networks ( http://arxiv.org/abs/2306.08321v1 )

ライセンス: Link先を確認

Yunfei Yang, Ding-Xuan Zhou

(参考訳) 重みが適切に制約または正則化されている場合、過パラメータ化されたニューラルネットワークは、ある種の滑らかな関数クラスに属する関数の学習において、(対数因子を除いて)ミニマックス最適な収束率を達成できることを示す。具体的には、浅いReLUニューラルネットワークを用いて未知の$d$変量関数を推定する非パラメトリック回帰を考える。回帰関数は、滑らかさ$\alpha<(d+3)/2$のHölder空間、または無限に広いニューラルネットワークと見なすことができる浅層ニューラルネットワークに対応する変動空間に属すると仮定する。この設定では、重みに一定のノルム制約を課した浅層ニューラルネットワークに基づく最小二乗推定量は、ネットワーク幅が十分大きければミニマックス最適であることを証明する。副産物として、浅いReLUニューラルネットワークの局所ラデマッハ複雑度に対する、サイズに依存しない新しい上界を導出する。

It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.

翻訳日:2023-06-16 19:57:28 公開日:2023-06-14

# オンラインカーネル回帰に対する準線形計算複雑度をもつほぼ最適なアルゴリズム

Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression ( http://arxiv.org/abs/2306.08320v1 )

ライセンス: Link先を確認

Junfan Li and Shizhong Liao

(参考訳) リグレットと計算コストのトレードオフは、オンラインカーネル回帰の基本的な問題であり、このトレードオフに取り組んだ従来のアルゴリズムは、準線形(sublinear)の計算複雑性で最適なリグレット境界を維持することができない。本稿では,準線形の計算複雑性でほぼ最適なリグレット境界を維持できる2つの新しいアルゴリズム、AOGD-ALDとNONS-ALDを提案し、これらのアルゴリズムが機能するための十分条件を与える。どちらのアルゴリズムも、カーネル写像を近似するために用いるほぼ直交する基底の集合を動的に維持し、近似誤差を制御することでほぼ最適なリグレット境界を保つ。基底の数は、近似誤差とカーネル行列の固有値の減衰率に依存する。固有値が指数関数的に減衰する場合、AOGD-ALD と NONS-ALD は、$O(\ln^2{T})$ の計算複雑性で、それぞれ $O(\sqrt{L(f)})$ と $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$ のリグレットを達成する。固有値が次数 $p\geq 1$ で多項式的に減衰する場合、我々のアルゴリズムは、それぞれ $p>4$ と $p\geq 10$ の場合に、$o(T)$ の計算複雑性で同じリグレット境界を保つ。ここで $L(f)$ は $f$ の累積損失であり、$\mathrm{d}_{\mathrm{eff}}(\mu)$ は問題の有効次元である。2つのリグレット境界はほぼ最適であり、互いに比較可能ではない。

The trade-off between regret and computational cost is a fundamental problem for online kernel regression, and previous algorithms that addressed this trade-off cannot keep optimal regret bounds at a sublinear computational complexity. In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity, and give sufficient conditions under which our algorithms work. Both algorithms dynamically maintain a group of nearly orthogonal bases used to approximate the kernel mapping, and keep nearly optimal regret bounds by controlling the approximation error. The number of bases depends on the approximation error and the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay exponentially, then AOGD-ALD and NONS-ALD achieve regrets of $O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a computational complexity in $O(\ln^2{T})$. If the eigenvalues decay polynomially with degree $p\geq 1$, then our algorithms keep the same regret bounds at a computational complexity in $o(T)$ in the case of $p>4$ and $p\geq 10$, respectively. Here $L(f)$ is the cumulative loss of $f$ and $\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The two regret bounds are nearly optimal and are not comparable.

翻訳日:2023-06-16 19:57:10 公開日:2023-06-14

# パレート最適解の集合によるエネルギー管理構成概念の同定

Identification of Energy Management Configuration Concepts from a Set of Pareto-optimal Solutions ( http://arxiv.org/abs/2306.08318v1 )

ライセンス: Link先を確認

Felix Lanfermann and Qiqi Liu and Yaochu Jin and Sebastian Schmitt

(参考訳) 効率的なエネルギー利用のための建物構成の最適化は、近年の研究でますます注目されており、この課題に対処するいくつかの手法が開発されている。しかし、初期投資コスト、経常コスト、グリッド運用の不確実性に対する堅牢性など、複数の相反する目的に基づいて適切な構成を選択することは、難しい多基準意思決定問題である。概念同定は、構成オプションを意味のあるグループ(概念)に分類し、さらに選択した目的に対するトレードオフの期待を満たす制約を導入することで、意思決定者を支援する。本研究では,多数目的進化的最適化によって得られた20000個のパレート最適な建物エネルギー管理構成に対して,複数回の概念同定を実施し,十分な情報に基づく投資判断の基盤を提供する。その後の一連の分析ステップで、記述空間の選択、すなわち、一貫性があり重複しない概念が求められる集合への特徴の分割が、抽出可能な情報の種類に影響を与えること、そして記述空間の異なる設定が構成データの複数の異なる側面を明らかにすることを示す。これは先行研究では扱われてこなかった重要な側面である。

Optimizing building configurations for an efficient use of energy is increasingly receiving attention by current research and several methods have been developed to address this task. Selecting a suitable configuration based on multiple conflicting objectives, such as initial investment cost, recurring cost, robustness with respect to uncertainty of grid operation is, however, a difficult multi-criteria decision making problem. Concept identification can facilitate a decision maker by sorting configuration options into semantically meaningful groups (concepts), further introducing constraints to meet trade-off expectations for a selection of objectives. In this study, for a set of 20000 Pareto-optimal building energy management configurations, resulting from a many-objective evolutionary optimization, multiple concept identification iterations are conducted to provide a basis for making an informed investment decision. In a series of subsequent analysis steps, it is shown how the choice of description spaces, i.e., the partitioning of the features into sets for which consistent and non-overlapping concepts are required, impacts the type of information that can be extracted and that different setups of description spaces illuminate several different aspects of the configuration data - an important aspect that has not been addressed in previous work.

翻訳日:2023-06-16 19:56:40 公開日:2023-06-14

# R-Drop構造を有する改良型変圧器における名前付きエンティティ認識に関する研究

Research on Named Entity Recognition in Improved transformer with R-Drop structure ( http://arxiv.org/abs/2306.08315v1 )

ライセンス: Link先を確認

Weidong Ji, Yousheng Zhang, Guohui Zhou, Xu Wang

(参考訳) 本稿では,モデル一般化能力の向上と名前付きエンティティ認識タスクにおける変換器の有効性向上のために,XLNet-Transformer-Rモデルを提案する。 XLNet事前学習モデルと相対的な位置エンコーディングを備えたTransformerエンコーダを組み合わせることで、長いテキストの処理能力を高め、コンテキスト情報を学習することで堅牢性を向上させる。オーバーフィッティングを防止するため、R-Drop構造を用いて一般化能力を改善し、名前付きエンティティ認識タスクにおけるモデルの精度を高める。本稿では,MSRAデータセット上でのアブレーション実験と,XLNet-Transformer-Rモデルの戦略的有効性を示す4つのデータセットにおける他のモデルとの比較実験を行う。

To enhance the generalization ability of the model and improve the effectiveness of the transformer for named entity recognition tasks, the XLNet-Transformer-R model is proposed in this paper. The XLNet pre-trained model and the Transformer encoder with relative positional encodings are combined to enhance the model's ability to process long text and learn contextual information to improve robustness. To prevent overfitting, the R-Drop structure is used to improve the generalization capability and enhance the accuracy of the model in named entity recognition tasks. The model in this paper performs ablation experiments on the MSRA dataset and comparison experiments with other models on four datasets with excellent performance, demonstrating the strategic effectiveness of the XLNet-Transformer-R model.
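
The R-Drop regularizer mentioned above can be sketched as follows. The same input is passed through the network twice with different dropout masks, and the loss adds a symmetric KL term between the two output distributions to the usual negative log-likelihood. This follows the generally published R-Drop formulation and is not taken from this paper's code. The logits are supplied by hand as stand-ins for two dropout passes, and the weight `alpha` and all function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """NLL of both passes plus alpha times the symmetric KL between them."""
    p, q = softmax(logits_a), softmax(logits_b)
    nll = cross_entropy(p, label) + cross_entropy(q, label)
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * sym_kl
```

When the two passes agree exactly, the KL term is zero and the loss reduces to twice the ordinary cross-entropy. Any disagreement caused by dropout is penalised in proportion to `alpha`.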

翻訳日:2023-06-16 19:56:20 公開日:2023-06-14

# 自動話者独立視覚音声認識:包括的調査

Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey ( http://arxiv.org/abs/2306.08314v1 )

ライセンス: Link先を確認

Praneeth Nemani, G. Sai Krishna, Supriya Kundrapu

(参考訳) 話者非依存のVSRは、話者の顔の動きのビデオ記録から発話された語やフレーズを識別する複雑なタスクである。長年にわたり、VSRの分野では、システム性能を評価するための様々なアルゴリズムやデータセットを含む多くの研究が行われてきた。これらの取り組みは有効なVSRモデルの開発に大きな進歩をもたらし、この分野におけるさらなる研究の機会を生み出した。本調査は、過去30年間のVSRの進展を詳細に検討し、特に話者依存システムから話者非依存システムへの移行に焦点を当てる。また、VSR研究で使用される各種データセットと、話者非依存を達成するために用いられる前処理技術についても包括的に概説する。本調査は1990年から2023年に発表された研究を対象とし、各研究を徹底的に分析し、様々なパラメータで比較する。本調査は、1990年から2023年までの話者非依存VSRシステムの進化を詳細に分析する。VSRシステムの時系列での発展を概説し、話者非依存VSRのためのエンドツーエンドパイプラインを開発する必要性を強調する。図による表現は、話者非依存VSRで使用される手法の明確かつ簡潔な概要を提供し、様々な方法論の理解と分析を助ける。本調査はまた、各手法の強みと限界を明らかにし、視覚音声の手がかりを分析する新しいアプローチの開発に関する洞察を提供する。全体として、この包括的なレビューは、話者非依存VSRの現在の最先端に関する洞察を提供し、将来の研究の可能性を示す。

Speaker-independent VSR is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, there has been a considerable amount of research in the field of VSR involving different algorithms and datasets to evaluate system performance. These efforts have resulted in significant progress in developing effective VSR models, creating new opportunities for further research in this area. This survey provides a detailed examination of the progression of VSR over the past three decades, with a particular emphasis on the transition from speaker-dependent to speaker-independent systems. We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence. The survey covers the works published from 1990 to 2023, thoroughly analyzing each work and comparing them on various parameters. This survey provides an in-depth analysis of speaker-independent VSR systems evolution from 1990 to 2023. It outlines the development of VSR systems over time and highlights the need to develop end-to-end pipelines for speaker-independent VSR. The pictorial representation offers a clear and concise overview of the techniques used in speaker-independent VSR, thereby aiding in the comprehension and analysis of the various methodologies. The survey also highlights the strengths and limitations of each technique and provides insights into developing novel approaches for analyzing visual speech cues. Overall, this comprehensive review provides insights into the current state-of-the-art speaker-independent VSR and highlights potential areas for future research.

翻訳日:2023-06-16 19:56:06 公開日:2023-06-14

# バックドア攻撃におけるポイズニング効率を実用的に向上させるためのプロキシフリー戦略

A Proxy-Free Strategy for Practically Improving the Poisoning Efficiency in Backdoor Attacks ( http://arxiv.org/abs/2306.08313v1 )

ライセンス: Link先を確認

Ziqiang Li, Hong Sun, Pengfei Xia, Beihao Xia, Xue Rui, Wei Zhang, Bin Li

(参考訳) ポイズニング効率は、ポイズニングベースのバックドア攻撃において重要な要素である。攻撃者は、検出されないままでいるために、できるだけ少ない汚染サンプルで同じレベルの攻撃強度を達成することを好む。効率的なトリガーによってポイズニング効率は大幅に改善されたが、まだ改善の余地がある。近年,効率的なサンプルの選択が有望であることが示されているが、効率的な汚染サンプル集合を見つけるためにプロキシのバックドア注入タスクが必要であり、プロキシ攻撃の設定が被害者の実際の設定と異なる場合には性能が低下する可能性がある。本稿では,個々の類似性と集合の多様性に基づいて効率的な汚染サンプルを選択し,この問題に効果的に対処する新しいプロキシフリー戦略(PFS)を提案する。提案手法を,複数のデータセット,トリガー,汚染率,アーキテクチャ,訓練ハイパーパラメータで評価する。実験の結果, PFSは従来のプロキシベースの選択手法より500倍高速でありながら、より高いバックドア攻撃強度を達成することが示された。

Poisoning efficiency is a crucial factor in poisoning-based backdoor attacks. Attackers prefer to use as few poisoned samples as possible to achieve the same level of attack strength, in order to remain undetected. Efficient triggers have significantly improved poisoning efficiency, but there is still room for improvement. Recently, selecting efficient samples has shown promise, but it requires a proxy backdoor injection task to find an efficient poisoned sample set, which can lead to performance degradation if the proxy attack settings are different from the actual settings used by the victims. In this paper, we propose a novel Proxy-Free Strategy (PFS) that selects efficient poisoned samples based on individual similarity and set diversity, effectively addressing this issue. We evaluate the proposed strategy on several datasets, triggers, poisoning ratios, architectures, and training hyperparameters. Our experimental results demonstrate that PFS achieves higher backdoor attack strength while being 500x faster than previous proxy-based selection approaches.

翻訳日:2023-06-16 19:55:42 公開日:2023-06-14

# 量子ゼノと反ゼノ効果に基づくコヒーレント制御:コヒーレンスとタイミングの役割

Coherent control based on quantum Zeno and anti-Zeno effects: Role of coherences and timing ( http://arxiv.org/abs/2306.08311v1 )

ライセンス: Link先を確認

Jacob Levitt and Artur F. Izmaylov

(参考訳) 量子ゼノ効果と反ゼノ効果(QZE/AZE)は長い間知られている。多数の離散準位が結合した一般的な量子系では、特定の準位のポピュレーションを測定すると、他の準位へのポピュレーション移動が加速(AZE)または減速(QZE)する。ここでは、特定の時刻の測定によって、結合した量子状態からのポピュレーションの流れをどのように制御できるか、またどの系パラメータがそれを左右するかを考察する。時間依存密度行列摂動理論に基づく量子ゼノダイナミクス解析の枠組みを提案する。この枠組みにより、状態のポピュレーションをコヒーレンスから明確に分離し、QZEまたはAZEの出現を予測することができる。2つのモデル系で解析を例示する。 1) 2つの結合した準位、 2) 連続体に結合した準位。どちらの場合も、量子コヒーレンスのダイナミクスが重要な役割を担い、摂動論的な考察によって射影測定の効果を予測できる。さらに、密接に関連するコヒーレント制御のシナリオ、すなわち系の波動関数を記述するコヒーレント重ね合わせにおいて状態の符号を反転させるユニタリ変換にも考察を拡張した。

The quantum-Zeno and anti-Zeno effects (QZE/AZE) have been known for a long time. In a general quantum system with a number of coupled discrete levels, the measurement of a particular level population can lead to either acceleration (i.e. AZE) or retardation (i.e. QZE) of its population transfer to other levels. Here we consider how one can control the population flow from a coupled quantum state by measurement at a particular time, and what system parameters are responsible for that. We propose a framework for analysis of quantum Zeno dynamics based on time-dependent density matrix perturbation theory. This framework allows us to clearly separate state populations from their coherences and to predict the appearance of either QZE or AZE. We illustrate our analysis on two model systems: 1) two coupled levels and 2) a level coupled to a continuum. In both cases the dynamics of quantum coherences play a crucial role, and perturbative considerations allow us to predict the effect of projective measurements. In addition, we have extended our consideration to a closely related coherent control scenario, a unitary transformation flipping a sign of a state in a coherent superposition describing the system wavefunction.
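
The retardation (QZE) case for the first model system can be illustrated with a textbook calculation. This is not the paper's density-matrix perturbation framework. It assumes two degenerate levels coupled by a constant matrix element V (with hbar = 1), so the population of the initial level between measurements evolves as cos^2(V t). It also assumes ideal, equally spaced projective measurements that each find the system still in the initial level. The function name is hypothetical.

```python
import math


def survival_probability(coupling, total_time, n_measurements):
    """Probability that all n equally spaced projective measurements find the
    system in its initial level, for two degenerate levels with coupling V."""
    dt = total_time / n_measurements
    # Each interval contributes cos^2(V * dt); the n intervals multiply.
    return math.cos(coupling * dt) ** (2 * n_measurements)


# Without intermediate measurements, V * T = pi / 2 transfers the population
# completely; measuring more often freezes it in the initial level.
probabilities = [
    survival_probability(1.0, math.pi / 2, n) for n in (1, 2, 4, 8, 100)
]
```

The survival probability grows towards 1 as the number of measurements increases, which is the Zeno regime. In this simple two-level setting the sketch shows only retardation; the acceleration (AZE) case discussed in the abstract depends on the timing and coherence effects that the paper analyses.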

翻訳日:2023-06-16 19:55:25 公開日:2023-06-14

# TWIGMA: Twitterのメタデータを備えたAI生成画像のデータセット

TWIGMA: A dataset of AI-Generated Images with Metadata From Twitter ( http://arxiv.org/abs/2306.08310v1 )

ライセンス: Link先を確認

Yiqun Chen, James Zou

(参考訳) 生成型人工知能(gen-AI)の最近の進歩により、写真リアリスティック写真や芸術的インスピレーション写真が1クリックで生成できるようになった。 DALLEやStableDiffusionといったジェネラルAIモデルの使用方法を検討するためには、AI生成写真に存在するテーマ、内容、バリエーションを理解することが重要である。本稿では,2021年1月から2023年3月までにTwitter上で収集された800,000以上のgen-AIイメージを含む包括的なデータセットであるTWIGMA(TWItter Generative-aiイメージ with MetadatA)を紹介する。 TWIGMAと自然画像と人間のアートワークを比較した結果,gen-AI画像は特徴的特徴を有し,非gen-AI画像と比較した場合,平均的,低変動性を示すことがわかった。さらに,gen-AI画像と自然画像との類似性も明らかになった。 (i)「いいね」の数と逆相関し、 (ii)は、gen-AI創造のインスピレーションとなる人間の画像を特定するために用いられる。最後に、Twitter上でAI生成画像のテーマの経年変化を観察し、ユーザーは複雑な人間の肖像画などの芸術的に洗練されたコンテンツをシェアする一方で、自然の場面や動物のような単純な主題への関心は減少している。我々は,AI生成画像の研究において,TWIGMAがユニークなデータ資源であることを示す。

Recent progress in generative artificial intelligence (gen-AI) has enabled the generation of photo-realistic and artistically-inspiring photos at a single click, catering to millions of users online. To explore how people use gen-AI models such as DALLE and StableDiffusion, it is critical to understand the themes, contents, and variations present in the AI-generated photos. In this work, we introduce TWIGMA (TWItter Generative-ai images with MetadatA), a comprehensive dataset encompassing over 800,000 gen-AI images collected from Jan 2021 to March 2023 on Twitter, with associated metadata (e.g., tweet text, creation date, number of likes). Through a comparative analysis of TWIGMA with natural images and human artwork, we find that gen-AI images possess distinctive characteristics and exhibit, on average, lower variability when compared to their non-gen-AI counterparts. Additionally, we find that the similarity between a gen-AI image and natural images (i) is inversely correlated with the number of likes; and (ii) can be used to identify human images that served as inspiration for the gen-AI creations. Finally, we observe a longitudinal shift in the themes of AI-generated images on Twitter, with users increasingly sharing artistically sophisticated content such as intricate human portraits, whereas their interest in simple subjects such as natural scenes and animals has decreased. Our analyses and findings underscore the significance of TWIGMA as a unique data resource for studying AI-generated images.

翻訳日:2023-06-16 19:55:04 公開日:2023-06-14

# ランダムフーリエ特徴を用いたベイズ非線形潜在変数モデリング

Bayesian Non-linear Latent Variable Modeling via Random Fourier Features ( http://arxiv.org/abs/2306.08352v1 )

ライセンス: Link先を確認

Michael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt

(参考訳) ガウス過程潜在変数モデル(英: Gaussian process latent variable model、GPLVM)は、非線形次元の減少、行列分解、状態空間モデリングによく用いられる確率論的手法である。 gplvmsの推論は、データがガウス的である場合にのみ計算可能である。さらに、gplvmsの推論は、一般的に、後方不確かさを誤認するオーバーフィッティングや変分近似につながる最大後方点推定を得るために制限されている。本稿では,一般化ベイズ非線形潜在変数モデリングのためのマルコフ連鎖モンテカルロ(mcmc)推定を行う手法を提案する。 GPLVMを任意の観測モデルに一般化するために必要な重要な洞察は、ガウス過程写像におけるカーネル関数をランダムなフーリエ特徴と近似することである。ランダム特徴潜時変数モデル(RFLVM)を用いて,ポアソン,負二項分布,多項分布などの非ガウス観測にGPLVMを一般化できることを示す。一般化されたRFLVMは, 動作キャプチャ, 画像, テキストデータなど, 様々なアプリケーション上で, 最先端の潜伏変数モデルと同等に動作し, 潜伏構造を推定し, 複雑なデータセットの欠落データを出力する。

The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.
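The kernel approximation mentioned above can be sketched as follows. This is a minimal, standard-library illustration of random Fourier features for an RBF kernel, not the authors' RFLVM code; the function names, the lengthscale parameter, and the choice of the RBF kernel are assumptions made for this example.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def sample_rff_params(dim, num_features, lengthscale=1.0, seed=0):
    """Draw frequencies w ~ N(0, I / lengthscale^2) and phases b ~ U[0, 2*pi)."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_features(x, freqs, phases):
    """Map x to phi(x) = sqrt(2 / D) * cos(W x + b)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def rff_kernel(x, y, freqs, phases):
    """Approximate k(x, y) by the inner product phi(x) . phi(y)."""
    phi_x = rff_features(x, freqs, phases)
    phi_y = rff_features(y, freqs, phases)
    return sum(a * b for a, b in zip(phi_x, phi_y))


if __name__ == "__main__":
    freqs, phases = sample_rff_params(dim=2, num_features=4000)
    x, y = [0.3, -0.2], [0.1, 0.4]
    print("exact:", rbf_kernel(x, y), "approx:", rff_kernel(x, y, freqs, phases))
```

Because the feature map is an explicit finite-dimensional function of the input, quantities that depend on the kernel become ordinary differentiable functions of the latent variables, which is the property the abstract relies on.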

翻訳日:2023-06-16 19:49:00 公開日:2023-06-14

# コード事前学習モデルに対するマルチターゲットバックドア攻撃

Multi-target Backdoor Attacks for Code Pre-trained Models ( http://arxiv.org/abs/2306.08350v1 )

ライセンス: Link先を確認

Yanzhou Li, Shangqing Liu, Kangjie Chen, Xiaofei Xie, Tianwei Zhang and Yang Liu

(参考訳) ニューラルコードモデルのバックドア攻撃は、コードインテリジェンスの進歩により、かなりの注目を集めている。しかし、既存の作業の多くは、コードに関連する下流タスクのタスク固有のデータにトリガーを挿入することで、攻撃範囲を制限している。さらに、事前訓練されたモデルに対する攻撃の大半は、タスクを理解するために設計されている。本稿では,コード事前学習モデルに対するタスク非依存のバックドア攻撃を提案する。我々のバックドアモデルは、下流のコード理解と生成タスクのマルチターゲット攻撃をサポートする2つの学習戦略(Poisoned Seq2Seq学習とトークン表現学習)で事前訓練されている。デプロイフェーズでは、ターゲットの攻撃を達成するために設計したトリガーによって、被害者モデルに埋め込まれたバックドアを起動することができる。 7つのデータセット上で2つのコード理解タスクと3つのコード生成タスクに対するアプローチを評価した。大規模な実験により、我々のアプローチは、コードに関連する下流タスクを効果的に、そして、密かに攻撃できることを示した。

Backdoor attacks for neural code models have gained considerable attention due to the advancement of code intelligence. However, most existing works insert triggers into task-specific data for code-related downstream tasks, thereby limiting the scope of attacks. Moreover, the majority of attacks for pre-trained models are designed for understanding tasks. In this paper, we propose task-agnostic backdoor attacks for code pre-trained models. Our backdoored model is pre-trained with two learning strategies (i.e., Poisoned Seq2Seq learning and token representation learning) to support the multi-target attack of downstream code understanding and generation tasks. During the deployment phase, the implanted backdoors in the victim models can be activated by the designed triggers to achieve the targeted attack. We evaluate our approach on two code understanding tasks and three code generation tasks over seven datasets. Extensive experiments demonstrate that our approach can effectively and stealthily attack code-related downstream tasks.

翻訳日:2023-06-16 19:48:33 公開日:2023-06-14

# 生成的深層学習はフェルミオン系の集合変数を明らかにする

Generative deep-learning reveals collective variables of Fermionic systems ( http://arxiv.org/abs/2306.08348v1 )

ライセンス: Link先を確認

Raphaël-David Lasseri, David Regnier, Mikaël Frosini, Marc Verriere, Nicolas Schunck

(参考訳) タンパク質の折りたたみから核分裂までの複雑な過程は、いくつかの集団変数でパラメータ化された低次元反応経路に従うことが多い。核理論において、平均場図における核密度の形状に関連する変数は、中性子と陽子の大きな振幅集団運動を記述する鍵となる。これらの自由度によって広がる断熱エネルギーの風景を探索すると、この還元空間のダイナミクスをシミュレートしながら、可能な反応チャネルが明らかになる。残念ながら、この理論の枠組みは、系が集合変数に関して量子相転移に遭遇するたびに崩壊する。本稿では,核過程を高度に表現し,そのフェルミオン波動関数への微分可能写像を保証しながら,新たな集団変数を構築できる生成的深層学習アルゴリズムを提案する。この集合空間内では、核はその断熱量子相の一方からもう一方へ、ポテンシャルエネルギー障壁を渡る価格で連続的に進化することができる。このアプローチは、密度汎関数理論で記述された電子系を包含する単一のスレーター行列式によって記述された任意のフェルミオン系に適用できる。

Complex processes ranging from protein folding to nuclear fission often follow a low-dimensional reaction path parameterized in terms of a few collective variables. In nuclear theory, variables related to the shape of the nuclear density in a mean-field picture are key to describing the large-amplitude collective motion of the neutrons and protons. Exploring the adiabatic energy landscape spanned by these degrees of freedom reveals the possible reaction channels, while simulating the dynamics in this reduced space yields their respective probabilities. Unfortunately, this theoretical framework breaks down whenever the system encounters a quantum phase transition with respect to the collective variables. Here we propose a generative deep-learning algorithm capable of building new collective variables highly representative of a nuclear process while ensuring a differentiable mapping to its Fermionic wave function. Within this collective space, the nucleus can evolve continuously from one of its adiabatic quantum phases to the other at the price of crossing a potential energy barrier. This approach applies to any Fermionic system described by a single Slater determinant, which encompasses electronic systems described within density functional theory.

翻訳日:2023-06-16 19:48:20 公開日:2023-06-14

# UIERL:水中画像強調のための外部表現学習ネットワーク

UIERL: Internal-External Representation Learning Network for Underwater Image Enhancement ( http://arxiv.org/abs/2306.08344v1 )

ライセンス: Link先を確認

Zhengyong Wang, Liquan Shen, Yihan Yu and Yuan Hui

(参考訳) 水中画像強調(uie)は有意義だが困難な課題であり,近年,学習に基づくuie手法が数多く提案されている。多くの進展がみられたが,(1)水中撮像過程による水中画像の局所的品質差は,特に風景深度の異なる領域において有意な差がある。しかし, 従来の手法では, 水中画像の内部特性は無視されており, 性能は劣っている。(2) 取得手法の特異性のため, 水中画像取得ツールは通常, 同一又は類似のシーンで複数の画像をキャプチャする。したがって, 実用化に資する水中画像は, 高い相関関係にある。しかし、単一の画像を処理する場合、既存の手法では、関連画像が提供するリッチな外部情報を考慮していない。彼らのパフォーマンスにはまだ改善の余地がある。これら2つの側面を動機として,UIEタスクを内部情報と外部情報とを同時に実行するための,UIERL(internal-external representation learning)ネットワークを提案する。内部表現学習段階において、シーン深度に基づく領域セグメンテーションを含む、新しい深度に基づく領域特徴誘導網を設計し、異なる品質レベルの領域を検知し、次いで領域ワイド空間エンコーダモジュールを設計する。異なる品質の地域に対して地域的特徴学習を行うことで、ネットワークはグローバルな特徴の効果的なガイダンスを提供し、画像内差分エンハンスメントのガイドとなる。外部表現学習段階において,まず,関連画像中のリッチな外部情報をマイニングする外部情報抽出ネットワークを提案する。次に、提案する外部アシスト-内部モジュールと内部アシスト-eを介して、内部および外部特徴が相互に相互作用する。

Underwater image enhancement (UIE) is a meaningful but challenging task, and many learning-based UIE methods have been proposed in recent years. Although much progress has been made, these methods still have two issues: (1) There is a significant region-wise quality difference within a single underwater image due to the underwater imaging process, especially between regions with different scene depths. However, existing methods neglect this internal characteristic of underwater images, resulting in inferior performance; (2) Due to the uniqueness of the acquisition approach, underwater image acquisition tools usually capture multiple images in the same or similar scenes. Thus, the underwater images to be enhanced in practical usage are highly correlated. However, when processing a single image, existing methods do not consider the rich external information provided by the related images. There is still room for improvement in their performance. Motivated by these two aspects, we propose a novel internal-external representation learning (UIERL) network to better perform UIE tasks with internal and external information simultaneously. In the internal representation learning stage, a new depth-based region feature guidance network is designed, including a region segmentation based on scene depth to sense regions with different quality levels, followed by a region-wise space encoder module. By performing region-wise feature learning separately for regions of different quality, the network provides effective guidance for global features and thus guides intra-image differentiated enhancement. In the external representation learning stage, we first propose an external information extraction network to mine the rich external information in the related images. Then, internal and external features interact with each other via the proposed external-assist-internal module and internal-assist-e

翻訳日:2023-06-16 19:48:03 公開日:2023-06-14

# 調和ポテンシャルにおける量子粒子の繰り返し測定について

On Repeated Measurements of a Quantum Particle in a Harmonic Potential ( http://arxiv.org/abs/2306.08342v1 )

ライセンス: Link先を確認

Filip Gampel, Mariusz Gajda

(参考訳) 位置と運動量が繰り返し監視される調和ポテンシャルにおける量子粒子の進化を研究する。測定装置のバックアクションが考慮される。本モデルは、正の演算子値測度に対応する一般化計測を用いる。測定すると、粒子の波動関数は観測結果に応じて検出可能な状態の1つに投影されると仮定する。我々は、これらの測定後の状態がガウス波束を動かすことを選択した。波動関数量子モンテカルロ形式は粒子の単一の量子軌道をシミュレートするために用いられる。本研究では, 粒子の位置と運動量の分散を詳細に観察し, 古典的軌道がどのように出現するかを示す。

We study the evolution of a quantum particle in a harmonic potential whose position and momentum are repeatedly monitored. The back-action of the measuring devices is accounted for. Our model utilizes a generalized measurement corresponding to a Positive Operator-Valued Measure. We assume that upon measurement the particle's wavefunction is projected onto one of the possible detector states, depending on the observed result. We choose these post-measurement states to be moving Gaussian wavepackets. The Wave Function Quantum Monte-Carlo formalism is used to simulate single quantum trajectories of the particle. We show how classical trajectories emerge in the course of observation and study in detail the dispersion of the position and momentum of the particle.

翻訳日:2023-06-16 19:47:33 公開日:2023-06-14

# 畳み込みニューラルネットワークにおけるグローバルローカル処理

Global-Local Processing in Convolutional Neural Networks ( http://arxiv.org/abs/2306.08336v1 )

ライセンス: Link先を確認

Zahra Rezvani, Soroor Shekarizeh, Mohammad Sabokrou

(参考訳) 畳み込みニューラルネットワーク(CNN)は、画像処理の課題において優れたパフォーマンスを達成した。実際、cnnはマイクロレベルのヒト脳構造(人工ニューロン)を模倣している。同時に、マクロアーキテクチャ(ハイレベル認知)における人間の自然な視覚知覚の模倣から距離を置いている。近年,CNNは局所的な特徴に非常に偏りがあり,入力のグローバルな側面を検知できないことが研究されている。しかしながら、この文献はこの問題に関する限られた手がかりを提供している。そこで本研究では,人間の瞳孔の無意識行動に触発された単純かつ効果的な解法を提案する。我々は,Global Advantage Stream (GAS)と呼ばれるシンプルなモジュールを考案し,入力サンプルの全体的特徴(グローバル機能)を学習し,捉える。次に,グローバル/ローカル処理(glp)モデルと呼ばれるプラグ・アンド・プレイコンポーネントとして,ガスの特徴をcnnネットワークと組み合わせた。実験の結果,このストリームは計算量や時間負荷を増加させることで精度が向上し,ネットワークを敵の攻撃に対してより堅牢にすることを確認した。さらに、モデルの解釈を調べることで、健康な人間の知覚システムに似た、より包括的な表現を学習できることが分かる。

Convolutional Neural Networks (CNNs) have achieved outstanding performance on image processing challenges. In fact, CNNs imitate typically developed human brain structures at the micro level (artificial neurons). At the same time, they distance themselves from imitating natural visual perception in humans at the macro-architecture level (high-level cognition). Recent studies have reported that CNNs are highly biased toward local features and fail to detect the global aspects of their input. Nevertheless, the literature offers limited clues on this problem. To this end, we propose a simple yet effective solution inspired by the unconscious behavior of the human pupil. We devise a simple module called Global Advantage Stream (GAS) to learn and capture the holistic features of input samples (i.e., the global features). Then, the GAS features are combined with a CNN as a plug-and-play component, yielding the Global/Local Processing (GLP) model. The experimental results confirm that this stream improves the accuracy with an insignificant additional computational/temporal load and makes the network more robust to adversarial attacks. Furthermore, investigating the interpretation of the model shows that it learns a more holistic representation similar to the perceptual system of healthy humans.

翻訳日:2023-06-16 19:47:26 公開日:2023-06-14

# 生存予測のためのグローバル構造整合性を有するマルチモーダル最適輸送型コアテンショントランス

Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction ( http://arxiv.org/abs/2306.08330v1 )

ライセンス: Link先を確認

Yingxue Xu and Hao Chen

(参考訳) 生存予測(Survival prediction)は、死のランク付けリスクを予測することを目的とした複雑な順序回帰タスクであり、一般的には、組織学とゲノムデータの統合の恩恵を受ける。病理学とゲノム学による共同学習の進展にもかかわらず、既存の方法はまだ困難な問題に悩まされている。 1) 病理像の大きさが大きいため, ギガピクセル全体のスライド画像(wsis)を効果的に表現することは困難である。 2) 組織学における腫瘍微小環境(TME)内の相互作用は生存分析に不可欠である。現在のアプローチは、ヒストロジーとゲノムデータの間のコアテンションを通じてこれらの相互作用をモデル化しようとするが、それらはモダリティ間の密集した局所的類似性のみに焦点をあてる。そこで本稿では,グローバル構造一貫性を持つ多モード最適トランスポートベースコアテンショントランスフォーマティブフレームワークを提案する。このフレームワークでは,ggapixel wsiを表すために,wsiのパッチと遺伝子組込みをマッチさせるために最適なトランスポート(ot)を適用する。さらに重要なことは、OTベースのコアテンションは、生存予測のためにTME内の構造的相互作用を効果的に捉えるグローバルな認識を提供する。 OTの計算複雑性の増大を克服するため,不均衡なミニバッチOTで元のOTを近似することにより,WSIパッチのマイクロバッチに対する堅牢かつ効率的な実装を提案する。大規模実験により,5つのベンチマークデータセット上での手法の優位性を示した。コードはリリースされている。

Survival prediction is a complicated ordinal regression task that aims to predict the ranking risk of death, which generally benefits from the integration of histology and genomic data. Despite the progress in joint learning from pathology and genomics, existing methods still suffer from challenging issues: 1) Due to the large size of pathological images, it is difficult to effectively represent the gigapixel whole slide images (WSIs). 2) Interactions within the tumor microenvironment (TME) in histology are essential for survival analysis. Although current approaches attempt to model these interactions via co-attention between histology and genomic data, they focus only on dense local similarity across modalities, which fails to capture global consistency between potential structures, i.e., TME-related interactions of histology and co-expression of genomic data. To address these challenges, we propose a Multimodal Optimal Transport-based Co-Attention Transformer framework with global structure consistency, in which optimal transport (OT) is applied to match patches of a WSI and gene embeddings for selecting informative patches to represent the gigapixel WSI. More importantly, OT-based co-attention provides a global awareness to effectively capture structural interactions within the TME for survival prediction. To overcome the high computational complexity of OT, we propose a robust and efficient implementation over micro-batches of WSI patches by approximating the original OT with unbalanced mini-batch OT. Extensive experiments show the superiority of our method on five benchmark datasets compared to the state-of-the-art methods. The code is released.
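To make the idea of matching two sets through optimal transport concrete, here is a minimal sketch of balanced, entropy-regularized OT solved with Sinkhorn iterations. It is a swapped-in textbook variant, not the unbalanced mini-batch OT the abstract describes, and the toy cost matrix, the function name, and the parameters `reg` and `num_iters` are assumptions made for this example.

```python
import math


def sinkhorn(cost, row_mass, col_mass, reg=0.1, num_iters=200):
    """Entropy-regularized OT plan between two discrete distributions.

    cost[i][j] is the cost of moving mass from source item i (e.g. a patch)
    to target item j (e.g. a gene embedding); row_mass and col_mass are the
    marginals and must each sum to the same total.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    plan = sinkhorn(cost, [1 / 3] * 3, [0.5, 0.5])
    for row in plan:
        print([round(p, 4) for p in row])
```

Each row of the resulting plan says how strongly one source item is matched to each target item, so rows with concentrated mass correspond to confident matches.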

翻訳日:2023-06-16 19:47:07 公開日:2023-06-14

# r-drop構造を有する改良されたコンフォーマントエンドツーエンド音声認識モデルに関する研究

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure ( http://arxiv.org/abs/2306.08329v1 )

ライセンス: Link先を確認

Weidong Ji, Shijie Zan, Guohui Zhou, and Xu Wang

(参考訳) 深層学習におけるエンド・ツー・エンド音声認識モデルにおける一般化能力の低下に対処するため,R-drop構造を組み込んだコンフォーマーベース音声認識モデル"Conformer-R"を提案する。このモデルは、音声認識で有望な結果を示す適合モデルとr-drop構造を組み合わせたものである。これにより、R-drop構造を用いることで、局所的およびグローバルな音声情報の両方を効果的にモデル化し、過度な適合を低減できる。これにより、モデルの一般化能力が向上し、全体的な認識効率が向上する。このモデルは、まず一般ドメイン適応のためにAishell1とWenetspeechデータセットで事前訓練され、その後、コンピュータ関連のオーディオデータに基づいて微調整された。 LAS や Wenet といった古典モデルとの比較テストは同じテストセットで実施され、Conformer-R モデルの一般化を効果的に改善する能力を示した。

To address the issue of poor generalization ability in end-to-end speech recognition models within deep learning, this study proposes a new Conformer-based speech recognition model called "Conformer-R" that incorporates the R-drop structure. This model combines the Conformer model, which has shown promising results in speech recognition, with the R-drop structure. By doing so, the model is able to effectively model both local and global speech information while also reducing overfitting through the use of the R-drop structure. This enhances the model's ability to generalize and improves overall recognition efficiency. The model was first pre-trained on the Aishell1 and Wenetspeech datasets for general domain adaptation, and subsequently fine-tuned on computer-related audio data. Comparison tests with classic models such as LAS and Wenet were performed on the same test set, demonstrating the Conformer-R model's ability to effectively improve generalization.
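The abstract does not spell out the R-drop term, so the sketch below follows the commonly described R-Drop regularizer: the same input is passed through the network twice with dropout active, and the loss adds a symmetric KL divergence between the two output distributions to the two cross-entropy terms. The function names and the weight `alpha` are assumptions for this example, and the two logit vectors stand in for the outputs of the two forward passes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive discrete distributions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))


def r_drop_loss(logits_a, logits_b, target, alpha=1.0):
    """Cross-entropy of both passes plus alpha times the symmetric KL term."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    ce = cross_entropy(p_a, target) + cross_entropy(p_b, target)
    sym_kl = 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return ce + alpha * sym_kl


if __name__ == "__main__":
    # Two hypothetical forward passes of the same input under different dropout masks.
    print(r_drop_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.7], target=0))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to twice the usual cross-entropy; disagreement between the dropout sub-models is penalized.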

翻訳日:2023-06-16 19:46:38 公開日:2023-06-14

# out-of-distribution predictionのための分布シフトインバージョン

Distribution Shift Inversion for Out-of-Distribution Prediction ( http://arxiv.org/abs/2306.08328v1 )

ライセンス: Link先を確認

Runpeng Yu, Songhua Liu, Xingyi Yang, Xinchao Wang

(参考訳) 機械学習学会は、統一予測器や不変特徴表現を探索することによって、トレーニングとテスト分布の間の分散シフトに対処する、無数の分散(ood)アルゴリズムの出現を目撃している。しかし、トレーニング期間中に試験分布が不有効であることや、トレーニングとテスト分布間の分布トランスレータマッピングのトレーニングが不可能であることから、未確認の試験セットにおける分布シフトを直接緩和するタスクはめったに検討されない。本稿では,分散翻訳訓練における分散テストの必要性を回避し,分散翻訳をood予測に役立てる方法について検討する。そこで本研究では, 予測モデルに入力される前に, ood試験試料をガウス雑音と線形結合し, 音源分布にのみ訓練された拡散モデルを用いて, トレーニング分布に戻す可搬分布シフトインバージョンアルゴリズムを提案する。理論解析により本手法の有効性が明らかになった。複数領域の一般化データセットと単一領域の一般化データセットを併用した実験結果から,OoDアルゴリズムを幅広く使用する場合,本手法は汎用的な性能向上をもたらすことが示された。

The machine learning community has witnessed the emergence of a myriad of Out-of-Distribution (OoD) algorithms, which address the distribution shift between the training and the testing distribution by searching for a unified predictor or invariant feature representation. However, the task of directly mitigating the distribution shift in the unseen testing set is rarely investigated, due to the unavailability of the testing distribution during the training phase and thus the impossibility of training a distribution translator mapping between the training and testing distribution. In this paper, we explore how to bypass the requirement of testing distribution for distribution translator training and make the distribution translation useful for OoD prediction. We propose a portable Distribution Shift Inversion algorithm, in which, before being fed into the prediction model, the OoD testing samples are first linearly combined with additional Gaussian noise and then transferred back towards the training distribution using a diffusion model trained only on the source distribution. Theoretical analysis reveals the feasibility of our method. Experimental results, on both multiple-domain generalization datasets and single-domain generalization datasets, show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
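The noise-then-denoise step can be illustrated with a one-dimensional toy. In this sketch the source distribution is assumed to be a Gaussian N(mu, sigma²), for which the optimal denoiser (the posterior mean of the clean sample given the noised one) has a closed form; that closed form stands in for the trained diffusion model and is not the authors' implementation. The names `alpha_bar`, `noise_sample`, and `shift_inversion` are hypothetical.

```python
import math
import random


def noise_sample(x, alpha_bar, rng):
    """Linearly combine x with Gaussian noise: sqrt(a) * x + sqrt(1 - a) * eps."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * eps


def gaussian_posterior_mean(x_t, alpha_bar, mu, sigma):
    """E[x_0 | x_t] when x_0 ~ N(mu, sigma^2); a stand-in for a trained denoiser."""
    var_t = alpha_bar * sigma ** 2 + (1.0 - alpha_bar)
    gain = math.sqrt(alpha_bar) * sigma ** 2 / var_t
    return mu + gain * (x_t - math.sqrt(alpha_bar) * mu)


def shift_inversion(x, alpha_bar, mu, sigma, rng):
    """Noise an out-of-distribution sample, then map it back toward the source."""
    x_t = noise_sample(x, alpha_bar, rng)
    return gaussian_posterior_mean(x_t, alpha_bar, mu, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    ood_sample = 5.0  # far from a source distribution assumed to be N(0, 1)
    for alpha_bar in (1.0, 0.75, 0.25):
        print(alpha_bar, shift_inversion(ood_sample, alpha_bar, 0.0, 1.0, rng))
```

A smaller `alpha_bar` means more noise is mixed in, so the denoiser relies more on the source distribution and the sample is pulled further toward it, at the cost of discarding more of the original sample's content.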

翻訳日:2023-06-16 19:46:21 公開日:2023-06-14

# Histogram Oriented Gradient Based Support Vector Machine を用いた遅発性トマト病の早期診断

Early Detection of Late Blight Tomato Disease using Histogram Oriented Gradient based Support Vector Machine ( http://arxiv.org/abs/2306.08326v1 )

ライセンス: Link先を確認

M. Ishaq, M. Waqas

(参考訳) トマトは地球上で最も重要な果物の1つである。農業生産において重要な役割を担っている。本研究はトマトにおける遅発性病の早期発見のための新しいスマート手法を提案する。本研究は,フィールド(植物村のデータセット)からのイメージの増加によるデータセットの改善と,遅延トマト病のリアルタイム検出のためのサポートベクターマシン(SVM)とヒストグラム指向勾配(HOG)からなるハイブリッドアルゴリズムを提案する。遅発性トマト葉病の早期発見のためのHOGに基づくSVMモデルを提案する。 MSE,精度,精度,リコールの観点から,提案モデルの性能を決定木やKNNと比較する。農業における先進技術の統合は産業に革命をもたらす可能性があり、より効率的で持続可能な利益をもたらす。トマト病の早期発見に関する研究は、スマート農業の重要性の高まり、気候に配慮した農業の必要性、天然資源をより効率的に活用する必要性の高まり、収穫高の需要に寄与する。提案したSVMとHOGのハイブリッドアルゴリズムは,トマトの遅発性病の早期発見に有意な可能性を秘めている。決定木とKNNアルゴリズムに対して提案したモデルの性能と,将来のアプリケーションに最適なアルゴリズムを選択するのに有効である。この研究は、農家が作物の収量と品質を最適化し、農業慣行の環境への影響を減らし、データ駆動による決定を下すのに役立つ。

The tomato is one of the most important fruits on earth. It plays an important and useful role in the agricultural production of any country. This research proposes a novel smart technique for the early detection of late blight disease in tomatoes. This work improves the dataset with an increase in images from the field (the Plant Village dataset) and proposes a hybrid algorithm composed of support vector machines (SVM) and histogram of oriented gradients (HOG) for real-time detection of late blight tomato disease. The objectives are to propose a HOG-based SVM model for early detection of late blight tomato leaf disease and to check the performance of the proposed model in terms of MSE, accuracy, precision, and recall as compared to Decision Tree and KNN. The integration of advanced technology in agriculture has the potential to revolutionize the industry, making it more efficient, sustainable, and profitable. This research work on the early detection of tomato diseases contributes to the growing importance of smart farming, the need for climate-smart agriculture, the rising need to more efficiently utilize natural resources, and the demand for higher crop yields. The proposed hybrid algorithm of SVM and HOG has significant potential for the early detection of late blight disease in tomato plants. The performance of the proposed model is compared against decision tree and KNN algorithms, and the results may assist in selecting the best algorithm for future applications. The research work can help farmers make data-driven decisions to optimize crop yield and quality while also reducing the environmental impact of farming practices.

翻訳日:2023-06-16 19:46:00 公開日:2023-06-14

# NodeFormer: ノード分類のためのスケーラブルなグラフ構造学習トランスフォーマー

NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification ( http://arxiv.org/abs/2306.08385v1 )

ライセンス: Link先を確認

Qitian Wu, Wentao Zhao, Zenan Li, David Wipf, Junchi Yan

(参考訳) グラフニューラルネットワークは、相互接続データによる学習のために広く研究されている。これにもかかわらず、近年の証拠は、GNNが過剰なスカッシング、ヘテロフィリー、長距離依存関係の扱い、エッジの不完全性、特にグラフの完全欠如に関連する欠陥を明らかにしている。メッセージパッシングのための新しい適応トポロジを学習することが有効な解決策であるが、二次複雑性に関する問題は、大規模ネットワークにおけるスケーラビリティと精度の同時保証を妨げる。本稿では,大規模グラフ上でノード分類を行うTransformerスタイルネットワークにおいて,任意のノード間のノード信号を効率的に伝搬するための新しい全ペアメッセージパッシング方式を提案する。具体的には、効率的な計算は、アルゴリズムの複雑さを線形性(w.r.t. node number)に還元し、潜在グラフ構造を大きな、潜在的に完全連結なグラフから微分可能な方法で学習する。設計の正当化として、付随する理論も提供します。広範な実験により、入力グラフが欠落しているグラフ(最大2mノード)のノード分類や、グラフ強調アプリケーション(画像分類など)など、様々なタスクにおいて、この手法が有望な有効性を示す。

Graph neural networks have been extensively studied for learning with inter-connected data. Despite this, recent evidence has revealed GNNs' deficiencies related to over-squashing, heterophily, handling long-range dependencies, edge incompleteness and, particularly, the absence of graphs altogether. While a plausible solution is to learn new adaptive topology for message passing, issues concerning quadratic complexity hinder simultaneous guarantees for scalability and precision in large networks. In this paper, we introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes, as an important building block for a pioneering Transformer-style network for node classification on large graphs, dubbed NodeFormer. Specifically, the efficient computation is enabled by a kernelized Gumbel-Softmax operator that reduces the algorithmic complexity to linear in the number of nodes for learning latent graph structures from large, potentially fully-connected graphs in a differentiable manner. We also provide accompanying theory as justification for our design. Extensive experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs (with up to 2M nodes) and graph-enhanced applications (e.g., image classification) where input graphs are missing.
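For readers unfamiliar with the Gumbel-Softmax operator named above, the following is a minimal sketch of the plain (non-kernelized) version: Gumbel noise is added to the logits and a temperature-scaled softmax yields a relaxed, differentiable sample. The kernelized, linear-complexity operator of NodeFormer is not reproduced here, and the function name and parameters are assumptions for this example.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw a relaxed one-hot sample from the categorical softmax(logits)."""
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)       # keep u strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))   # standard Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / temperature)
    peak = max(perturbed)
    exps = [math.exp(z - peak) for z in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    edge_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    print(gumbel_softmax(edge_logits, temperature=0.5, rng=rng))
```

Lower temperatures push the output toward a one-hot vector, while the index of the largest entry is distributed according to softmax(logits), which is what makes the operator usable for sampling discrete structures such as latent edges.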

翻訳日:2023-06-16 19:39:51 公開日:2023-06-14

# speechglue: 自己教師付き音声モデルが言語知識をいかにうまく捉えられるか?

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? ( http://arxiv.org/abs/2306.08374v1 )

ライセンス: Link先を確認

Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

(参考訳) 音声表現のための自己教師付き学習(SSL)は、音声認識や話者認識など、様々な下流タスクにうまく適用されている。最近では、音声SSLモデルも音声言語理解タスクの進行に有用であることが示され、SSLモデルが音響だけでなく言語情報も学習できる可能性が示唆されている。本稿では,音声ssl技術が言語知識をうまく捉えることができるかを明らかにすることを目的とする。本研究では,汎用言語理解評価(GLUE)ベンチマークの音声バージョンであるSpeechGLUEを紹介する。 GLUEは様々な自然言語理解タスクから構成されるため、SpeechGLUEは音声SSLモデルの言語能力の程度を解明することができる。実験では、テキストベースのSSLモデルに劣らず、音声SSLモデルはベースラインよりも優れた性能を示し、ラベルなしの音声データからある程度の言語知識を得られることを示唆している。

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken language understanding tasks, implying that the SSL models have the potential to learn not only acoustic but also linguistic information. In this paper, we aim to clarify if speech SSL techniques can well capture linguistic knowledge. For this purpose, we introduce SpeechGLUE, a speech version of the General Language Understanding Evaluation (GLUE) benchmark. Since GLUE comprises a variety of natural language understanding tasks, SpeechGLUE can elucidate the degree of linguistic ability of speech SSL models. Experiments demonstrate that speech SSL models, although inferior to text-based SSL models, perform better than baselines, suggesting that they can acquire a certain amount of general linguistic knowledge from just unlabeled speech data.

翻訳日:2023-06-16 19:39:26 公開日:2023-06-14

# アスペクト感情三重項抽出のための意味的拡張二重エンコーダ

A semantically enhanced dual encoder for aspect sentiment triplet extraction ( http://arxiv.org/abs/2306.08373v1 )

ライセンス: Link先を確認

Baoxing Jiang, Shehui Liang, Peiyu Liu, Kaifang Dong, Hongye Li

(参考訳) 感情三重項抽出(ASTE)は、感情三重項を包括的に識別することを目的としたアスペクトベース感情分析(ABSA)の重要なサブタスクである。従来の研究は、革新的なテーブル充填戦略によるASTEの強化に重点を置いてきた。しかし、これらのアプローチはしばしば言語表現の多面的性質を見落とし、アスペクトと意見の間の貴重な相互作用情報を失うことになる。この制限に対処するために,BERTをベースとした基本エンコーダと,Bi-LSTMネットワークとGCN(Graph Convolutional Network)で構成される特定のエンコーダの両方を利用するフレームワークを提案する。基本エンコーダは言語表現の表面レベルセマンティクスをキャプチャし、特定のエンコーダは構文情報や語彙情報を含むより深いセマンティクスを抽出する。コメントの係り受け木をモデル化し,単語の係り受けや位置情報を考慮し,文の根底にある意図とより関連のある意味を捉えることを目的とする。相互作用戦略は、2つのエンコーダが学んだ意味を組み合わせ、複数の視点の融合を可能にし、アスペクト-オピニオン関係のより包括的な理解を促進する。ベンチマークデータセットを用いた実験により,提案フレームワークの最先端性能を実証した。

Aspect sentiment triplet extraction (ASTE) is a crucial subtask of aspect-based sentiment analysis (ABSA) that aims to comprehensively identify sentiment triplets. Previous research has focused on enhancing ASTE through innovative table-filling strategies. However, these approaches often overlook the multi-perspective nature of language expressions, resulting in a loss of valuable interaction information between aspects and opinions. To address this limitation, we propose a framework that leverages both a basic encoder, primarily based on BERT, and a particular encoder comprising a Bi-LSTM network and a graph convolutional network (GCN). The basic encoder captures the surface-level semantics of linguistic expressions, while the particular encoder extracts deeper semantics, including syntactic and lexical information. By modeling the dependency tree of comments and considering the part-of-speech and positional information of words, we aim to capture semantics that are more relevant to the underlying intentions of the sentences. An interaction strategy combines the semantics learned by the two encoders, enabling the fusion of multiple perspectives and facilitating a more comprehensive understanding of aspect-opinion relationships. Experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our proposed framework.

翻訳日:2023-06-16 19:39:10 公開日:2023-06-14

# 統一スペクトル空間特徴集合によるハイパースペクトル像の物体検出

Object Detection in Hyperspectral Image via Unified Spectral-Spatial Feature Aggregation ( http://arxiv.org/abs/2306.08370v1 )

ライセンス: Link先を確認

Xiao He, Chang Tang, Xinwang Liu, Wei Zhang, Kun Sun, Jiangfeng Xu

(参考訳) 深層学習に基づくハイパースペクトル画像(hsi)の分類と物体検出技術は,画像コンテンツ解析,解釈,より広いhsi応用において重要な役割を担っているため,注目されている。しかし、現在のハイパースペクトルオブジェクト検出アプローチは、主にスペクトル情報または空間情報を強調し、これら2つの側面間の貴重な相補関係を見越す。本研究では,高スペクトル画像に固有の豊富なスペクトル情報と空間補完情報を効果的に活用する,新しい Spectral-Spatial Aggregation (S2ADet) オブジェクト検出器を提案する。 S2ADetは、ハイパースペクトル情報デカップリング(HID)モジュールと、2ストリーム特徴抽出ネットワークと1ステージ検出ヘッドとを備える。 HIDモジュールは、帯域選択と主成分分析によりスペクトルおよび空間情報を集約することによりハイパースペクトル画像を処理する。得られた空間的およびスペクトル的集約情報に基づいて,スペクトル空間的特徴を相互作用する特徴集約2ストリームネットワークを提案する。さらに、既存のデータベースの制限に対処するために、hod3kという、さまざまな実世界のシーンでキャプチャされた3,242のハイパースペクトルイメージを含む、広範なデータセットに注釈を付け、3つのオブジェクトクラスを包含する。これらの画像は512x256ピクセルの解像度を持ち、470nmから620nmまでの16バンドをカバーしている。 2つのデータセットに関する総合的な実験は、S2ADetが既存の最先端の手法を超え、堅牢で信頼性の高い結果が得られることを示した。この作業のデモコードとデータセットは、 https://github.com/hexiao-cs/S2ADet で公開されている。

Deep learning-based hyperspectral image (HSI) classification and object detection techniques have gained significant attention due to their vital role in image content analysis, interpretation, and wider HSI applications. However, current hyperspectral object detection approaches predominantly emphasize either spectral or spatial information, overlooking the valuable complementary relationship between these two aspects. In this study, we present a novel Spectral-Spatial Aggregation (S2ADet) object detector that effectively harnesses the rich spectral and spatial complementary information inherent in hyperspectral images. S2ADet comprises a hyperspectral information decoupling (HID) module, a two-stream feature extraction network, and a one-stage detection head. The HID module processes hyperspectral images by aggregating spectral and spatial information via band selection and principal components analysis, consequently reducing redundancy. Based on the acquired spatial and spectral aggregation information, we propose a feature aggregation two-stream network for interacting spectral-spatial features. Furthermore, to address the limitations of existing databases, we annotate an extensive dataset, designated as HOD3K, containing 3,242 hyperspectral images captured across diverse real-world scenes and encompassing three object classes. These images possess a resolution of 512x256 pixels and cover 16 bands ranging from 470 nm to 620 nm. Comprehensive experiments on two datasets demonstrate that S2ADet surpasses existing state-of-the-art methods, achieving robust and reliable results. The demo code and dataset of this work are publicly available at https://github.com/hexiao-cs/S2ADet.

翻訳日:2023-06-16 19:38:49 公開日:2023-06-14

# T5-SR: セマンティックパーシングのための統一Seq-to-Seqデコーディング戦略

T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing ( http://arxiv.org/abs/2306.08368v1 )

ライセンス: Link先を確認

Yuntao Li and Zhenpeng Su and Yutian Li and Hanchu Zhang and Sirui Wang and Wei Wu and Yan Zhang

(参考訳) 自然言語クエリをseq2seq方式でSQLに変換することが近年注目を集めている。しかし、抽象構文木ベースのSQL生成と比較すると、seq2seqセマンティックパーサは、スキーマ情報予測の品質の低さや、自然言語クエリとSQL間のセマンティックコヒーレンスの不足など、より多くの課題に直面している。本稿では、上記の問題点を分析し、SRと呼ばれるseq2seq指向の復号戦略を提案する。SRは、新たな中間表現SSQLと、スコア再推定器を用いた再ランキング法から構成され、それぞれ上記の課題を解決する。実験の結果、提案手法の有効性が示され、T5-SR-3bはSpiderデータセット上で新たな最先端結果を達成した。

Translating natural language queries into SQLs in a seq2seq manner has attracted much attention recently. However, compared with abstract-syntactic-tree-based SQL generation, seq2seq semantic parsers face many more challenges, including poor quality of schema information prediction and poor semantic coherence between natural language queries and SQLs. This paper analyses the above difficulties and proposes a seq2seq-oriented decoding strategy called SR, which includes a new intermediate representation, SSQL, and a reranking method with a score re-estimator to address these two obstacles, respectively. Experimental results demonstrate the effectiveness of our proposed techniques, and T5-SR-3b achieves new state-of-the-art results on the Spider dataset.
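
The reranking step mentioned in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Candidate` fields, the `rerank` function, the linear interpolation weight `alpha`, and the toy scores are hypothetical, and the abstract does not say how the paper's score re-estimator is actually combined with the decoder score.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sql: str
    decoder_logprob: float  # sequence score from the seq2seq decoder (hypothetical)
    reestimated: float      # score from a separate re-estimator (hypothetical)


def rerank(candidates, alpha=0.5):
    """Sort beam candidates by a weighted sum of both scores, best first."""
    def combined(c):
        return alpha * c.decoder_logprob + (1.0 - alpha) * c.reestimated
    return sorted(candidates, key=combined, reverse=True)


# Toy beam: the decoder prefers the first query, the re-estimator the second.
beam = [
    Candidate("SELECT name FROM singer", -0.20, -2.50),
    Candidate("SELECT name FROM singer WHERE age > 30", -0.35, -0.10),
]
best = rerank(beam)[0]
```

With `alpha=1.0` the ordering falls back to the decoder's own ranking, which makes it easy to see what the second scorer contributes.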

翻訳日:2023-06-16 19:38:15 公開日:2023-06-14

# SaliencyCut: オープンセットのきめ細かい異常検出のためのもっともらしい異常の拡張

SaliencyCut: Augmenting Plausible Anomalies for Open-set Fine-Grained Anomaly Detection ( http://arxiv.org/abs/2306.08366v1 )

ライセンス: Link先を確認

Jianan Ye, Yijie Hu, Xi Yang, Qiu-Feng Wang, Chao Huang, Kaizhu Huang

(参考訳) オープンセットのきめ細かい異常検出は、トレーニング中に見られなかった異常さえも検出するために、識別的なきめ細かい特徴を学習する必要がある難しいタスクである。安価で効果的なアプローチとして、データ拡張は、そのようなモデルのトレーニングを改善するための擬似異常の作成に広く使われている。最近の拡張手法はランダムな擬似インスタンスの生成に焦点を当てているが、その結果、拡張インスタンスが既知の異常と混ざり合ったり、典型的な異常範囲から外れたりする可能性がある。この問題に対処するため、本論文では、もっともらしい異常範囲にとどまりやすい、擬似的だがより一般的な異常を生成する、サリエンシー誘導型データ拡張手法SaliencyCutを提案する。さらに、各サンプルの異常スコアを学習するために、正常学習ヘッドと異常学習ヘッドからなる2ヘッド学習戦略を導入する。理論的解析により、このメカニズムは、より扱いやすく、より厳密なデータ対数尤度の下限を与えることが示された。次に、各サンプルからきめ細かい異常特徴を抽出・評価し、異常インスタンスの識別的表現の学習を容易にするために、異常学習ヘッドにパッチ単位の残差モジュールを新たに設計する。6つの実世界の異常検出データセットで行った広範な実験は、様々な設定において、ベースラインや他の最先端手法に対する提案手法の優位性を実証している。

Open-set fine-grained anomaly detection is a challenging task that requires learning discriminative fine-grained features to detect anomalies that were even unseen during training. As a cheap yet effective approach, data augmentation has been widely used to create pseudo anomalies for better training of such models. Recent wisdom of augmentation methods focuses on generating random pseudo instances that may lead to a mixture of augmented instances with seen anomalies, or out of the typical range of anomalies. To address this issue, we propose a novel saliency-guided data augmentation method, SaliencyCut, to produce pseudo but more common anomalies which tend to stay in the plausible range of anomalies. Furthermore, we deploy a two-head learning strategy consisting of normal and anomaly learning heads, to learn the anomaly score of each sample. Theoretical analyses show that this mechanism offers a more tractable and tighter lower bound of the data log-likelihood. We then design a novel patch-wise residual module in the anomaly learning head to extract and assess the fine-grained anomaly features from each sample, facilitating the learning of discriminative representations of anomaly instances. Extensive experiments conducted on six real-world anomaly detection datasets demonstrate the superiority of our method to the baseline and other state-of-the-art methods under various settings.

翻訳日:2023-06-16 19:37:52 公開日:2023-06-14

# 摂動データソースを用いた証明可能に効率的なオフライン強化学習

Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources ( http://arxiv.org/abs/2306.08364v1 )

ライセンス: Link先を確認

Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang

(参考訳) オフライン強化学習(RL)に関する既存の理論的研究は、主にターゲットタスクから直接サンプリングされたデータセットを考慮している。しかし実際には、データは複数の異種だが関連する情報源から来ることが多い。このギャップに動機づけられ、本研究は、ターゲットタスク自体ではなく、そのランダムな摂動バージョンから収集された複数のデータセットを用いるオフラインRLを厳密に理解することを目的とする。情報理論的な下限を導出し、データサンプル数に加えて、関与するソースの数に関する必要条件を明らかにする。次に、データソースごとのデータサンプル数が有限であることによるサンプル不確実性と、利用可能なデータソース数が有限であることによるソース不確実性を同時に考慮する、新しいHetPEVIアルゴリズムを提案する。理論的解析により、データソースが全体として良好なデータカバレッジを提供する限り、HetPEVIはターゲットタスクを解決できることが示された。さらに、HetPEVIはホライズン長の多項式因子を除いて最適であることが示されている。最後に、本研究をオフラインマルコフゲームとオフラインロバストRLに拡張し、提案する設計と理論的解析の一般性を示す。

Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to that on the number of data samples. Then, a novel HetPEVI algorithm is proposed, which simultaneously considers the sample uncertainties from a finite number of data samples per data source and the source uncertainties due to a finite number of available data sources. Theoretical analyses demonstrate that HetPEVI can solve the target task as long as the data sources collectively provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, which demonstrates the generality of the proposed designs and theoretical analyses.

翻訳日:2023-06-16 19:37:01 公開日:2023-06-14

# テキスト対画像生成の知覚と現実

Perceptions and Realities of Text-to-Image Generation ( http://arxiv.org/abs/2306.08363v1 )

ライセンス: Link先を確認

Jonas Oppenlaender, Johanna Silvennoinen, Ville Paananen, Aku Visuri

(参考訳) 生成人工知能(AI)は広く普及している技術であり、社会や個人に大きな影響を与えるだろう。10年足らず前には、クリエイティブな仕事は自動化される最後の領域の一つになると考えられていたが、今日ではAIが多くの創造的領域に進出している。本稿では、テキストから画像への生成に対する人々の認識に関する調査研究の結果を示す。我々は、この新興技術に対する参加者の技術的理解、その恐れと懸念、および個人や社会に対するテキスト・ツー・イメージ生成のリスクと危険性についての考えを取り上げる。参加者はこの技術に関連するリスクと危険性を認識していたが、この技術を個人的なリスクと考える参加者はごくわずかであった。他者に対するリスクは、参加者にとってより認識しやすかった。特にアーティストが危険にさらされていると見なされた。この技術を試したことのある参加者は、試したことのない参加者よりも、その将来の重要性を低く評価した。この結果は、多くの人々が、生成人工知能の潜在的な個人的リスクと、この技術に伴う差し迫った社会的変化にまだ気づいていないことを示している。

Generative artificial intelligence (AI) is a widely popular technology that will have a profound impact on society and individuals. Less than a decade ago, it was thought that creative work would be among the last to be automated; yet today, we see AI encroaching on many creative domains. In this paper, we present the findings of a survey study on people's perceptions of text-to-image generation. We touch on participants' technical understanding of the emerging technology, their fears and concerns, and their thoughts about the risks and dangers of text-to-image generation to the individual and society. We find that while participants were aware of the risks and dangers associated with the technology, only a few participants considered the technology to be a personal risk. The risks for others were easier for participants to recognize. Artists in particular were seen as being at risk. Participants who had tried the technology rated its future importance lower than those who had not tried it. This result shows that many people are still oblivious to the potential personal risks of generative artificial intelligence and the impending societal changes associated with this technology.

翻訳日:2023-06-16 19:36:29 公開日:2023-06-14

# 協調型マルチエージェント強化学習を支援する階層型タスクネットワーク計画

Hierarchical Task Network Planning for Facilitating Cooperative Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2306.08359v1 )

ライセンス: Link先を確認

Xuechen Mu, Hankz Hankui Zhuo, Chen Chen, Kai Zhang, Chao Yu and Jianye Hao

(参考訳) トラップのあるスパース報酬のマルチエージェント強化学習(MARL)環境を協調的に探索することは複雑なタスクである。エージェントは通常、目標状態に到達できずにトラップに陥り、システム全体のパフォーマンスに影響を与える。この問題を克服するために、事前知識を用いて探索空間を縮小し、学習を支援するフレームワークSOMARLを提案する。SOMARLではエージェントはMARL環境の一部として扱われ、記号的知識は木構造を用いて埋め込まれ、知識階層を構築する。本フレームワークは2層の階層構造を持ち、上位レベルは階層型タスクネットワーク(HTN)計画とメタコントローラからなるハイブリッドモジュール、下位レベルはMARLベースの対話モジュールである。HTNモジュールとメタコントローラは、それぞれ階層的ドメイン定義言語(HDDL)とオプションフレームワークを用いて記号的知識を形式化し、ドメイン知識と記号的オプション集合を得る。さらに、HTNモジュールはドメイン知識を活用し、メタコントローラによる記号的オプションの選択を支援することで、低レベルのエージェントの探索を誘導する。メタコントローラはさらに、記号的オプションの内発的報酬を計算して探索行動を制限し、必要に応じてHTN計画の解を調整する。我々は、FindTreasureとMoveBoxの2つのベンチマークでSOMARLを評価し、MARL環境向けの最先端のMARLおよびサブゴールベースのベースラインを大幅に上回る性能を報告する。

Exploring sparse reward multi-agent reinforcement learning (MARL) environments with traps in a collaborative manner is a complex task. Agents typically fail to reach the goal state and fall into traps, which affects the overall performance of the system. To overcome this issue, we present SOMARL, a framework that uses prior knowledge to reduce the exploration space and assist learning. In SOMARL, agents are treated as part of the MARL environment, and symbolic knowledge is embedded using a tree structure to build a knowledge hierarchy. The framework has a two-layer hierarchical structure, comprising a hybrid module with a Hierarchical Task Network (HTN) planning and meta-controller at the higher level, and a MARL-based interactive module at the lower level. The HTN module and meta-controller use Hierarchical Domain Definition Language (HDDL) and the option framework to formalize symbolic knowledge and obtain domain knowledge and a symbolic option set, respectively. Moreover, the HTN module leverages domain knowledge to guide low-level agent exploration by assisting the meta-controller in selecting symbolic options. The meta-controller further computes intrinsic rewards of symbolic options to limit exploration behavior and adjust HTN planning solutions as needed. We evaluate SOMARL on two benchmarks, FindTreasure and MoveBox, and report significantly better performance than state-of-the-art MARL and subgoal-based baselines for MARL environments.

翻訳日:2023-06-16 19:36:11 公開日:2023-06-14

# 場-曲率結合によるエンタングルメント面積則の破れ

Entanglement area law violation from field-curvature coupling ( http://arxiv.org/abs/2306.08357v1 )

ライセンス: Link先を確認

Alessio Belfiglio, Orlando Luongo, Stefano Mancini

(参考訳) 静的で球対称な背景を仮定して、時空曲率と非最小結合する有質量スカラー場のエンタングルメントエントロピーを調べる。球殻の格子を導入し、半径方向にカットオフを課すことで、場のハミルトニアンを離散化する。次に、場の基底状態を調べ、非最小結合による面積則からの逸脱を定量化する。特にシュワルツシルト・ド・ジッター時空とヘイワード時空に着目し、ド・ジッター時空も極限の場合として議論する。場の質量が小さい場合であっても、大きな正の結合定数は、境界面積に対するエントロピーのスケーリングを著しく変化させうることを示す。我々の結果は、ブラックホールのエントロピー生成と初期宇宙のシナリオの観点から解釈される。

We investigate the entanglement entropy of a massive scalar field nonminimally coupled to spacetime curvature, assuming a static, spherically symmetric background. We discretize the field Hamiltonian by introducing a lattice of spherical shells and imposing a cutoff in the radial direction. We then study the ground state of the field and quantify deviations from area law due to nonminimal coupling, focusing in particular on Schwarzschild-de Sitter and Hayward spacetimes, also discussing de Sitter spacetime as a limiting case. We show that large positive coupling constants can significantly alter the entropy scaling with respect to the boundary area, even in case of small field mass. Our outcomes are interpreted in view of black hole entropy production and early universe scenarios.
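
For orientation, the setting described here can be written down explicitly. The block below is a standard textbook form and is not taken from the paper: the metric signature $(+,-,-,-)$, the sign convention for the coupling $\xi$, and the symbols $a$ (lattice cutoff) and $\kappa$ (a non-universal numerical constant) are assumptions made for illustration.

```latex
% Action of a massive scalar field with nonminimal coupling \xi to the Ricci scalar R
S[\phi] = \frac{1}{2}\int d^4x\,\sqrt{-g}\,
  \left[ g^{\mu\nu}\,\partial_\mu\phi\,\partial_\nu\phi
         - \left(m^2 + \xi R\right)\phi^2 \right]

% Area law for the ground-state entanglement entropy of a region with boundary area A
S_{\mathrm{ent}} \simeq \kappa\,\frac{A}{a^2}
```

Minimal coupling corresponds to $\xi = 0$; the abstract's claim concerns how a nonzero $\xi$ modifies the scaling in the second relation.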

翻訳日:2023-06-16 19:35:46 公開日:2023-06-14

# MgO(001)基板上のDy吸着原子:DFT+U(HIA)法による研究

Dy adatom on MgO(001) substrate: DFT+U(HIA) study ( http://arxiv.org/abs/2306.08415v1 )

ライセンス: Link先を確認

Alexander B. Shick (1,2), Eduard Belsch (1,3), Alexander I. Lichtenstein (3,4) ((1) Institute of Physics, Czech Academy of Sciences, Na Slovance 2, 182 21 Prague, Czech Republic (2) Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovoth 76100, Israel (3) Institute of Theoretical Physics, University of Hamburg, 20355 Hamburg, Germany (4) European X-Ray Free-Electron Laser Facility, Holzkoppel 4, 22869 Schenefeld, Germany)

(参考訳) MgO(001)基板上に吸着した個々のDy原子の電子構造と磁性について、密度汎関数理論とアンダーソン不純物モデルに対するハバードI近似の組み合わせ(DFT+U(HIA))を用いて調べた。$f^{10}$配置の2価のDy$^{2+}$吸着原子が見いだされた。計算したX線吸収(XAS)および磁気円二色性(XMCD)スペクトルを実験データと比較した。縮退した$|{J=8.0, J_z= \pm 4.0}>$状態間の量子トンネルにより、磁気モーメントが面内配向を持つ$|{J=8.0, J_z= 0.0}>$基底状態が形成される。これは、MgO(001)基板上のDy吸着原子に残留磁化がないことを説明する。我々の研究は、希土類単原子磁石のさらなる研究と予測に有効なルートを提供しうる。

The electronic structure and magnetism of an individual Dy atom adsorbed on the MgO(001) substrate are investigated using the combination of the density functional theory with the Hubbard-I approximation to the Anderson impurity model (DFT+U(HIA)). A divalent Dy$^{2+}$ adatom in the $f^{10}$ configuration is found. The calculated x-ray absorption (XAS) and magnetic circular dichroism (XMCD) spectra are compared to the experimental data. Quantum tunneling between degenerate $|{J=8.0, J_z= \pm 4.0}>$ states leads to the formation of a $|{J=8.0, J_z= 0.0}>$ ground state with an in-plane orientation of the magnetic moment. This explains the absence of remanent magnetization for the Dy adatom on top of the MgO(001) substrate. Our studies can provide a viable route for further investigation and prediction of rare-earth single-atom magnets.

翻訳日:2023-06-16 19:29:48 公開日:2023-06-14

# 音声強調のための微調整自己監督モデルの特徴正規化

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement ( http://arxiv.org/abs/2306.08406v1 )

ライセンス: Link先を確認

Hejung Yang, Hong-Goo Kang

(参考訳) 自己教師付き学習を用いて訓練された大規模で事前訓練された表現モデルは、入力データから高品質な有能な特徴を抽出できるため、機械学習の様々な分野で人気を集めている。そのため、音声認識など様々なパターン分類タスクのベースネットワークとして頻繁に使用されている。しかし、これらのモデルを音声信号生成の分野に適用する研究はあまり行われていない。本稿では,下流音声強調タスクにおける事前学習音声表現モデルの有用性について検討する。事前学習モデルの入力特徴と目標拡張モデルとのミスマッチを軽減するために,これらのモジュールをスムーズにリンクする新しい特徴正規化手法を採用する。提案手法は, 各種事前学習音声モデルと組み合わせた場合, ベースラインと比較し, 音声品質の大幅な向上を実現する。

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.

翻訳日:2023-06-16 19:29:31 公開日:2023-06-14

# 種の言及を対象とした生物医学的関係抽出のためのコーパスの構築

Building a Corpus for Biomedical Relation Extraction of Species Mentions ( http://arxiv.org/abs/2306.08403v1 )

ライセンス: Link先を確認

Oumaima El Khettari, Solen Quiniou, Samuel Chaffron

(参考訳) バイオメディカルテキストから種間の有意義な二項関係を文レベルで抽出するために、腸内微生物叢に焦点を当てた、手動で注釈付けされたコーパスSpecies-Species Interactionを提案する。このコーパスは、異なる固有表現認識の種タガーを評価したうえで、PubTatorを利用して全文記事中の種に注釈を付けている。BERTとその生物医学向け派生モデルを用いた種間関係抽出の最初の結果は有望である。

We present a manually annotated corpus, Species-Species Interaction, for extracting meaningful binary relations between species, in biomedical texts, at sentence level, with a focus on the gut microbiota. The corpus leverages PubTator to annotate species in full-text articles after evaluating different Named Entity Recognition species taggers. Our first results are promising for extracting relations between species using BERT and its biomedical variants.

翻訳日:2023-06-16 19:29:17 公開日:2023-06-14

# LiveChat:ライブストリーミングから自動構築された大規模パーソナライズされた対話データセット

LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming ( http://arxiv.org/abs/2306.08401v1 )

ライセンス: Link先を確認

Jingsheng Gao, Yixin Lian, Ziyi Zhou, Yuzhuo Fu, Baoyuan Wang

(参考訳) 近年、オープンドメイン対話システムは有望な進歩を遂げている。最先端の対話エージェントは、大規模なテキストベースのソーシャルメディアデータと大規模な事前学習モデルに基づいて構築されているが、事前学習モデルの転移可能性に限界があり、RedditやWeiboなどの公開データセットの分布に偏りがあるため、ライブストリーミングのような急速に成長するシナリオでもこれらのエージェントがうまく機能する保証はない。ライブのオープンドメインシナリオにおいて、応答という本質的な能力を向上させ、ベンチマークを確立するために、LiveChatデータセットを導入する。このデータセットは、133万件の実際の中国語対話からなり、351のペルソナにわたってペルソナあたり平均約3800セッションを含み、各ペルソナの詳細なプロファイルを備える。LiveChatは、インターネット上の多数のライブビデオを処理することで自動的に構築され、「誰が誰に何を言うか」という問題を考慮すべき多人数会話の範囲に自然に含まれる。そこで、応答モデリングと宛先認識という2つの重要なタスクを対象とし、先進的な手法に基づく検索ベースのベースラインを提案する。実験の結果、ペルソナプロファイルの活用と、ペルソナあたりの平均セッション数の増加による正の効果が確認された。さらに、先進的な生成ベースモデルのLiveChat上での転移可能性もベンチマークし、現在の課題に対する今後の方向性を示す。

Open-domain dialogue systems have made promising progress in recent years. While the state-of-the-art dialogue agents are built upon large-scale text-based social media data and large pre-trained models, there is no guarantee these agents could also perform well in fast-growing scenarios, such as live streaming, due to the bounded transferability of pre-trained models and biased distributions of public datasets from Reddit and Weibo, etc. To improve the essential capability of responding and establish a benchmark in the live open-domain scenario, we introduce the LiveChat dataset, composed of 1.33 million real-life Chinese dialogues with almost 3800 average sessions across 351 personas and fine-grained profiles for each persona. LiveChat is automatically constructed by processing numerous live videos on the Internet and naturally falls within the scope of multi-party conversations, where the issues of Who says What to Whom should be considered. Therefore, we target two critical tasks of response modeling and addressee recognition and propose retrieval-based baselines grounded on advanced techniques. Experimental results have validated the positive effects of leveraging persona profiles and larger average sessions per persona. In addition, we also benchmark the transferability of advanced generation-based models on LiveChat and pose some future directions for current challenges.

翻訳日:2023-06-16 19:29:07 公開日:2023-06-14

# メタ強化学習の副産物としての単純エンボディード言語学習

Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning ( http://arxiv.org/abs/2306.08400v1 )

ライセンス: Link先を確認

Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn

(参考訳) 機械学習モデルは通常、言語タスク(例えば、次の単語予測)を直接訓練することで言語を学ぶが、非言語タスク(例えば、食べ物の取得)を解決する副産物として、人間の子供に言語が現れる。 embodied reinforcement learning (rl)エージェントは、非言語タスクから間接的に言語を学習できるのでしょうか? 言語とその意味を関連付ける学習には、動的環境と様々な言語が必要である。そこで本稿では,タスクによって異なる言語を持つマルチタスク環境において,この問題を考察する。具体的には、エージェントが特定のオフィスを見つけることを目標とするオフィスナビゲーション環境を設計し、異なる建物(タスク)でオフィスの位置が異なる。それぞれの建物には、ゴールオフィスの位置を簡単な言語で記述したフロアプランが含まれており、訪問時にRGBイメージとして視覚的に読むことができる。 RLエージェントは言語を間接的に学習することができる。現在のメタRLアルゴリズムで訓練されたエージェントは、ホールドアウトレイアウトと言語フレーズを備えたフロアプランの読み込みに成功し、直接的な言語監督を受けていないにも関わらず、すぐに正しいオフィスに移動する。

Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate language with its meaning requires a dynamic environment with varied language. Therefore, we investigate this question in a multi-task environment with language that varies across the different tasks. Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). Each building includes a floor plan with a simple language description of the goal office's location, which can be visually read as an RGB image when visited. We find RL agents indeed are able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases, and quickly navigate to the correct office, despite receiving no direct language supervision.

翻訳日:2023-06-16 19:28:41 公開日:2023-06-14

# スケーラブルなニューラル確率的解集合プログラミング

Scalable Neural-Probabilistic Answer Set Programming ( http://arxiv.org/abs/2306.08397v1 )

ライセンス: Link先を確認

Arseny Skryagin and Daniel Ochs and Devendra Singh Dhami and Kristian Kersting

(参考訳) ニューラルネットワークのロバスト性とシンボリック手法の表現力を組み合わせるという目標は、ニューロシンボリックAIへの関心を再燃させた。深層ニューラルネットワークの確率推定を介して確率論理プログラミングを行うために、深層確率プログラミング言語(DPPL)が開発されてきた。しかし、最近のSOTAのDPPLアプローチは、限られた条件付き確率クエリしか扱えず、真の同時確率推定の能力を提供しない。本研究では、DPPL内にトラクタブルな確率的推論を容易に統合する方法を提案する。そのために、ニューラル確率述語(NPP)と論理プログラムを解集合プログラミング(ASP)を介して結合した新しいDPPLであるSLASHを導入する。NPPは、あらゆる種類の深層モデルとそれらの組み合わせを単一の確率的述語として表現できる新しい設計原理である。この文脈で、述語の原子表記を調整することにより様々な種類の確率クエリに答える、新しい$+/-$表記を導入する。スケーラビリティを確保するために、(基礎化された)プログラムのうち確率的に重要でない部分を刈り込み、予測性能を犠牲にすることなく推論を高速化する方法を示す。MNIST加算のベンチマークタスクやVisual Question Answering(VQA)など、様々なタスクでSLASHを評価する。

The goal of combining the robustness of neural networks and the expressiveness of symbolic methods has rekindled the interest in Neuro-Symbolic AI. Deep Probabilistic Programming Languages (DPPLs) have been developed for probabilistic logic programming to be carried out via the probability estimations of deep neural networks. However, recent SOTA DPPL approaches allow only for limited conditional probabilistic queries and do not offer the power of true joint probability estimation. In our work, we propose an easy integration of tractable probabilistic inference within a DPPL. To this end, we introduce SLASH, a novel DPPL that consists of Neural-Probabilistic Predicates (NPPs) and a logic program, united via answer set programming (ASP). NPPs are a novel design principle allowing for combining all deep model types and combinations thereof to be represented as a single probabilistic predicate. In this context, we introduce a novel $+/-$ notation for answering various types of probabilistic queries by adjusting the atom notations of a predicate. To scale well, we show how to prune the stochastically insignificant parts of the (ground) program, speeding up reasoning without sacrificing the predictive performance. We evaluate SLASH on a variety of different tasks, including the benchmark task of MNIST addition and Visual Question Answering (VQA).

翻訳日:2023-06-16 19:28:20 公開日:2023-06-14

# 公平性指標とEU非差別法との整合性:人口統計学的パリティと条件付き人口統計学的格差

Compatibility of Fairness Metrics with EU Non-Discrimination Laws: Demographic Parity & Conditional Demographic Disparity ( http://arxiv.org/abs/2306.08394v1 )

ライセンス: Link先を確認

Lisa Koutsoviti Koumeri, Magali Legast, Yasaman Yousefi, Koen Vanhoof, Axel Legay, Christoph Schommer

(参考訳) 実証的な証拠は、機械学習(ML)技術によるアルゴリズム的決定が、法的に保護されたグループを差別したり、新たな不公平の原因を生み出したりするおそれがあることを示唆している。本研究は、EUの非差別法の枠組みにおける公平性への文脈的アプローチを支持し、公平性指標と公平性制約によって法的公平性をどの程度まで保証できるかを評価することを目的とする。そのために、非差別と差別的取扱いという法的概念を、条件付き人口統計学的格差(CDD)を通じて、公平性定義である人口統計学的パリティ(DP)と照らし合わせて分析する。EU非差別法の下で実践されている司法解釈への文脈的アプローチを可能にしつつ、予測のバイアスを減らせるかどうかを評価するために、公平性制約付きの異なる分類器を訓練し比較する。3つのシナリオにおける実験結果は、インプロセッシング型のバイアス緩和アルゴリズムがシナリオごとに異なる性能をもたらすことを示している。我々の実験と分析は、個々の事案と法的正当化に応じて、AI支援の意思決定が法的観点から公平でありうることを示唆している。これらの予備的な結果は、さらなるケーススタディ、指標、公平性の概念を含む今後の研究を促すものである。

Empirical evidence suggests that algorithmic decisions driven by Machine Learning (ML) techniques threaten to discriminate against legally protected groups or create new sources of unfairness. This work supports the contextual approach to fairness in EU non-discrimination legal framework and aims at assessing up to what point we can assure legal fairness through fairness metrics and under fairness constraints. For that, we analyze the legal notion of non-discrimination and differential treatment with the fairness definition Demographic Parity (DP) through Conditional Demographic Disparity (CDD). We train and compare different classifiers with fairness constraints to assess whether it is possible to reduce bias in the prediction while enabling the contextual approach to judicial interpretation practiced under EU non-discrimination laws. Our experimental results on three scenarios show that the in-processing bias mitigation algorithm leads to different performances in each of them. Our experiments and analysis suggest that AI-assisted decision-making can be fair from a legal perspective depending on the case at hand and the legal justification. These preliminary results encourage future work which will involve further case studies, metrics, and fairness notions.
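
The two metrics named in this abstract can be made concrete with a small Python sketch. It assumes one common formulation: demographic parity is measured as the gap in positive-prediction rates between groups, and conditional demographic disparity (CDD) as the size-weighted average, over strata of a conditioning attribute, of a group's share among rejected cases minus its share among accepted cases. The function names, the toy data, and the convention of treating an empty accepted or rejected set as a share of zero are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict


def demographic_parity_difference(y_pred, group):
    """Gap between the highest and lowest positive-prediction rate across groups."""
    by_group = defaultdict(list)
    for pred, g in zip(y_pred, group):
        by_group[g].append(pred)
    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


def demographic_disparity(y_pred, group, protected):
    """Share of `protected` among rejected (0) minus its share among accepted (1)."""
    rejected = [g for pred, g in zip(y_pred, group) if pred == 0]
    accepted = [g for pred, g in zip(y_pred, group) if pred == 1]
    # Assumption: an empty side contributes a share of zero.
    share_rejected = rejected.count(protected) / len(rejected) if rejected else 0.0
    share_accepted = accepted.count(protected) / len(accepted) if accepted else 0.0
    return share_rejected - share_accepted


def conditional_demographic_disparity(y_pred, group, strata, protected):
    """Average the per-stratum disparity, weighting each stratum by its size."""
    buckets = defaultdict(lambda: ([], []))
    for pred, g, s in zip(y_pred, group, strata):
        buckets[s][0].append(pred)
        buckets[s][1].append(g)
    n = len(y_pred)
    return sum(
        len(preds) / n * demographic_disparity(preds, groups, protected)
        for preds, groups in buckets.values()
    )


# Toy data: 1 = favourable decision, groups "a"/"b", conditioning strata "x"/"y".
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
group = ["a", "a", "a", "b", "b", "a", "a", "b", "b", "b"]
strata = ["x", "x", "x", "x", "x", "y", "y", "y", "y", "y"]

dp_gap = demographic_parity_difference(y_pred, group)
cdd = conditional_demographic_disparity(y_pred, group, strata, protected="b")
```

On this toy data the parity gap is 0.2 and the CDD for group "b" is 1/6. Conditioning on a legitimate explanatory attribute is what lets the metric reflect the case-by-case, contextual reading the abstract describes.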

翻訳日:2023-06-16 19:27:59 公開日:2023-06-14

# 証明可能に個人化されたロバストな連合学習

Provably Personalized and Robust Federated Learning ( http://arxiv.org/abs/2306.08393v1 )

ライセンス: Link先を確認

Mariel Werner, Lie He, Sai Praneeth Karimireddy, Michael Jordan, Martin Jaggi

(参考訳) 類似の目的を持つクライアントをクラスタ化し、クラスタごとにモデルを学習することは、連合学習におけるパーソナライゼーションに対する直感的で解釈可能なアプローチである。しかし、証明可能かつ最適な保証付きでこれを行うことは、未解決の課題のままであった。本研究では、クライアント上の確率勾配が$K$個の分布のいずれかに対応しうる確率的最適化問題として、パーソナライズされた連合学習を定式化する。この設定において、i) 単純なしきい値ベースのクラスタリングアルゴリズムと、ii) ローカルなクライアント勾配を用いることで、最適な収束保証が得られることを示す。実際、我々の収束率は、クライアントの真のクラスタリングが既知である場合に得られる収束率と漸近的に一致する。さらに、我々のアルゴリズムは、勾配の一部が破損するビザンチン設定においても証明可能にロバストである。

Clustering clients with similar objectives and learning a model per cluster is an intuitive and interpretable approach to personalization in federated learning. However, doing so with provable and optimal guarantees has remained an open challenge. In this work, we formalize personalized federated learning as a stochastic optimization problem where the stochastic gradients on a client may correspond to one of $K$ distributions. In such a setting, we show that using i) a simple thresholding-based clustering algorithm, and ii) local client gradients obtains optimal convergence guarantees. In fact, our rates asymptotically match those obtained if we knew the true underlying clustering of the clients. Furthermore, our algorithms are provably robust in the Byzantine setting where some fraction of the gradients are corrupted.
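
The idea of thresholding on local gradients can be sketched in a few lines of Python. This is a simplified illustration and not the paper's algorithm: the function names, the Euclidean distance, the fixed `threshold`, and the toy gradients are all assumptions. Each client averages only those gradients that lie within the threshold of its own, so clients from another cluster and grossly corrupted gradients are left out.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two gradient vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def thresholded_mean(gradients, client, threshold):
    """Average the gradients lying within `threshold` of the given client's own."""
    own = gradients[client]
    # The client's own gradient is at distance 0, so `kept` is never empty.
    kept = [g for g in gradients if l2_distance(g, own) <= threshold]
    return [sum(g[k] for g in kept) / len(kept) for k in range(len(own))]


# Toy setup: two clusters of clients plus one corrupted (Byzantine) gradient.
gradients = [[1.0, 0.0], [1.2, 0.0], [-1.0, 0.0], [-0.8, 0.0], [50.0, 50.0]]
update_0 = thresholded_mean(gradients, client=0, threshold=0.5)
update_2 = thresholded_mean(gradients, client=2, threshold=0.5)
```

Here client 0 aggregates only with client 1 and client 2 only with client 3, while the outlier at `[50.0, 50.0]` influences neither update.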

翻訳日:2023-06-16 19:27:38 公開日:2023-06-14

# Skill-Critic: 強化学習のための学習スキルの精製

Skill-Critic: Refining Learned Skills for Reinforcement Learning ( http://arxiv.org/abs/2306.08388v1 )

ライセンス: Link先を確認

Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan

(参考訳) 階層的強化学習(RL)は、ポリシーを時間的に複数のレベルに抽象化することで、長期的な意思決定を加速できる。スパース報酬環境では、スキル、すなわちプリミティブアクションの系列を用いて有望な結果が得られている。通常、スキルの潜在空間とポリシーはオフラインデータから発見されるが、得られる低レベルポリシーは、デモンストレーションのカバレッジの低さや分布シフトのために信頼性が低い可能性がある。解決策として、高レベルのスキル選択と併せて低レベルポリシーを微調整することを提案する。我々のSkill-Criticアルゴリズムは、低レベルと高レベルの両方のポリシーを最適化する。これらのポリシーは、オフラインデモンストレーションから学習した潜在空間によって初期化および正則化され、ポリシーの同時最適化を導く。Gran Turismo Sportにおける新しいスパース報酬の自律レースタスクを含む、複数のスパース報酬RL環境で提案手法を検証する。実験の結果、Skill-Criticの低レベルポリシーの微調整とデモンストレーション誘導の正則化が、最適な性能に不可欠であることが示された。画像とビデオは https://sites.google.com/view/skill-critic で入手できる。最終バージョンでコードをオープンソース化する予定である。

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data, but the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low and high-level policies; these policies are also initialized and regularized by the latent space learned from offline demonstrations to guide the joint policy optimization. We validate our approach in multiple sparse RL environments, including a new sparse reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open source the code with the final version.

翻訳日:2023-06-16 19:27:27 公開日:2023-06-14

# 実世界シナリオにおけるディープニューラルネットワークの効率的なバックドア攻撃

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios ( http://arxiv.org/abs/2306.08386v1 )

ライセンス: Link先を確認

Hong Sun, Ziqiang Li, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li

(参考訳) 近年のディープニューラルネットワーク(DNN)は大量のトレーニングデータに依存するようになっており、悪意のある攻撃者がデータを悪用・汚染してバックドア攻撃を行う機会を与えている。これらの攻撃はDNNの信頼性を著しく損なう。しかし、既存のバックドア攻撃手法は、すべてのトレーニングデータが単一のソースに由来し、攻撃者がトレーニングデータに完全にアクセスできるという非現実的な仮定を置いている。本稿では、被害者が複数のソースからデータを収集し、攻撃者が完全なトレーニングデータにアクセスできない、より現実的な攻撃シナリオを導入することで、この制限に対処する。このシナリオをデータ制約付きバックドア攻撃と呼ぶ。このような場合、従来の攻撃手法は、バックドア注入の過程で良性特徴と汚染特徴が絡み合うため、深刻な効率低下に陥る。この問題に取り組むために、事前学習済みのCLIP(Contrastive Language-Image Pre-Training)モデルを活用する新しいアプローチを提案する。2つの異なる系統から3つのCLIPベースの技術を導入する。すなわち、クリーンな特徴の影響を抑制して汚染特徴を際立たせるClean Feature Suppressionと、汚染特徴の存在と影響を増強してモデルの挙動を効果的に操作するPoisoning Feature Augmentationである。提案手法の有効性、良性精度に対する無害性、およびステルス性を評価するために、3つのターゲットモデル、3つのデータセット、15以上の異なる設定で広範な実験を行った。その結果、データ制約のあるシナリオにおいて、既存の攻撃と比較して顕著な改善が示され、一部の設定では100%を超える改善が達成された。本研究は、既存手法の限界に対処し、データ制約付きバックドア攻撃に対する実用的で効果的な解決策を提供する。

Recent deep neural networks (DNNs) have come to rely on vast amounts of training data, providing an opportunity for malicious attackers to exploit and contaminate the data to carry out backdoor attacks. These attacks significantly undermine the reliability of DNNs. However, existing backdoor attack methods make unrealistic assumptions, assuming that all training data comes from a single source and that attackers have full access to the training data. In this paper, we address this limitation by introducing a more realistic attack scenario where victims collect data from multiple sources, and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer from severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. To tackle this problem, we propose a novel approach that leverages the pre-trained Contrastive Language-Image Pre-Training (CLIP) model. We introduce three CLIP-based technologies from two distinct streams: Clean Feature Suppression, which aims to suppress the influence of clean features to enhance the prominence of poisoning features, and Poisoning Feature Augmentation, which focuses on augmenting the presence and impact of poisoning features to effectively manipulate the model's behavior. To evaluate the effectiveness, harmlessness to benign accuracy, and stealthiness of our method, we conduct extensive experiments on 3 target models, 3 datasets, and over 15 different settings. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Our research contributes to addressing the limitations of existing methods and provides a practical and effective solution for data-constrained backdoor attacks.

翻訳日:2023-06-16 19:27:11 公開日:2023-06-14

# 準1次元格子上の1次元ハイゼンベルクハミルトニアンによる量子状態転送

Quantum state transfer using 1D Heisenberg Hamiltonian on quasi-1D lattices ( http://arxiv.org/abs/2306.08440v1 )

ライセンス: Link先を確認

Chandrima B. Pushpan, Harikrishnan K. J., Amit Kumar Pal

(参考訳) 準1次元格子上での単一量子ビット状態および多量子ビット状態の転送を考える。ここで、状態転送プロトコルに含まれる時間発展は1次元ハミルトニアンのみによって生成される。$z$方向の磁場下にある準1次元等方的ハイゼンベルク模型を用い、ラング(rung)と呼ばれる垂直方向の部分格子に沿ったスピン間相互作用の強さは、他の部分格子に沿った相互作用よりもはるかに強いとする。磁場の強さを特別な値に調整すると、強いラング結合の極限において、準1次元等方的ハイゼンベルク模型は有効1次元XXZ模型に写像でき、各ラングは有効的な2準位系として振る舞う。したがって、あるラングから別のラングへの低エネルギーラング状態の転送は、1次元XXZ模型を用いた、ある格子サイトから別の格子サイトへの任意の単一量子ビット状態の転送によって表現できる。これを利用して、単一量子ビット状態を低エネルギーラング状態へ特定の方法で符号化し、転送された状態を受信側のラング上で復号することにより、任意の単一量子ビット状態をある格子サイトから別の格子サイトへ転送するプロトコルを提案する。これらの符号化および復号プロトコルは、1次元ラングハミルトニアンによって生成される時間発展と単一量子ビット位相ゲートを含み、単一量子ビット状態の転送に必要なすべての時間発展が1次元ハミルトニアンから生成されることを保証する。提案プロトコルを用いた単一量子ビット状態転送の性能は、準1次元ハミルトニアン全体によって生成される時間発展を用いる場合よりも常に優れていることを示す。

We consider transfer of single and multi-qubit states on a quasi-1D lattice, where the time evolutions involved in the state transfer protocol are generated by only 1D Hamiltonians. We use the quasi-1D isotropic Heisenberg model under a magnetic field along the $z$ direction, where the spin-spin interaction strengths along the vertical sublattices, referred to as rungs, are much stronger than the interactions along other sublattices. Tuning the field-strength to a special value, in the strong rung-coupling limit, the quasi-1D isotropic Heisenberg model can be mapped to an effective 1D XXZ model, where each rung mimics an effective two-level system. Consequently, the transfer of low-energy rung states from one rung to another can be represented by a transfer of an arbitrary single-qubit state from one lattice site to another using the 1D XXZ model. Exploiting this, we propose protocols for transferring arbitrary single-qubit states from one lattice site to another by using specific encoding of the single-qubit state into a low-energy rung state, and a subsequent decoding of the transferred state on the receiver rung. These encoding and decoding protocols involve a time evolution generated by the 1D rung Hamiltonian and single-qubit phase gates, ensuring that all time-evolutions required for transferring the single-qubit state are generated from 1D Hamiltonians. We show that the performance of the single-qubit state transfer using the proposed protocol is always better than the same when a time-evolution generated by the full quasi-1D Hamiltonian is used.

Translated: 2023-06-16 19:18:39 Published: 2023-06-14

# Coherent scattering from coupled two level systems ( http://arxiv.org/abs/2306.08439v1 )

License: see the linked page

Thomas Nutz, Samuel T. Mister, Petros Androvitsaneas, Andrew Young, E. Harbord, J. G. Rarity, Ruth Oulton and Dara P. S. McCutcheon

We study the resonance fluorescence properties of an optically active spin-1/2 system, elucidating the effects of a magnetic field on the coherence of the scattered light. We derive a master equation model for this system that reproduces the results of a two-level system (TLS) while also being applicable to a spin system with ground-state coupling. This model is then solved analytically in the weak excitation regime. The inclusion of spin dynamics in our model alters the properties of the coherently scattered light at a fundamental level. For a TLS, the coherence properties are known to be determined by the input laser. We show that spin-scattered light instead inherits the coherence properties of the spin. This mapping allows us to measure spin dynamics and coherence time through direct measurement of the scattered fields. Furthermore, we show the ability to resolve sub-natural-linewidth Zeeman splittings. Besides providing an invaluable tool for spin spectroscopy, understanding the coherence properties of the spin-scattered field will be vital for spin-photon-based quantum technologies.

Translated: 2023-06-16 19:18:11 Published: 2023-06-14

# Construction of Antisymmetric Variational Quantum States with Real-Space Representation ( http://arxiv.org/abs/2306.08434v1 )

License: see the linked page

Takahiro Horiba, Soichi Shirai, Hirotoshi Hirai

Electronic state calculations on quantum computers are mostly based on second quantization, which is well suited to a qubit representation. Another way to describe electronic states on a quantum computer is first quantization, which is expected to scale more favorably with the number of basis functions than second quantization. Among basis functions, a real-space basis is an attractive option for quantum dynamics simulations in the fault-tolerant quantum computation (FTQC) era. A major difficulty in first quantization with a real-space basis is state preparation for many-body electronic systems. This difficulty stems from the antisymmetry of electrons: it is not straightforward to construct antisymmetric quantum states on a quantum circuit. In the present paper, we provide a design principle for constructing a variational quantum circuit that prepares an antisymmetric quantum state. The proposed circuit generates a superposition of exponentially many Slater determinants, that is, a multi-configuration state, which provides a systematic approach to approximating the exact ground state. We implemented the variational quantum eigensolver (VQE) to obtain the ground state of a one-dimensional hydrogen molecular system. The proposed circuit reproduced the exact antisymmetric ground state and its energy well, whereas the conventional variational circuit yielded neither an antisymmetric nor a symmetric state. Furthermore, we analyzed the many-body wave functions using quantum information theory, which illustrated the relation between electron correlation and quantum entanglement.

Translated: 2023-06-16 19:17:57 Published: 2023-06-14

# "Definition Modeling: To model definitions." Generating Definitions With Little to No Semantics ( http://arxiv.org/abs/2306.08433v1 )

License: see the linked page

Vincent Segonne and Timothee Mickus

Definition Modeling, the task of generating definitions, was first proposed as a means to evaluate the semantic quality of word embeddings: a coherent lexical semantic representation of a word in context should contain all the information necessary to generate its definition. The relative novelty of this task means that we do not know which factors a Definition Modeling system actually relies upon. In this paper, we present evidence that the task may not involve as much semantics as one might expect: we show that an earlier model from the literature is both rather insensitive to semantic aspects such as explicit polysemy and reliant on formal similarities between headwords and words occurring in its glosses, casting doubt on the validity of the task as a means to evaluate embeddings.

Translated: 2023-06-16 19:17:37 Published: 2023-06-14

# Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression ( http://arxiv.org/abs/2306.08432v1 )

License: see the linked page

Shahar Stein Ioushua, Inbar Hasidim, Ofer Shayevitz and Meir Feder

Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator and derive an upper bound on its quadratic risk, showing that, for the optimal choice of batch size, it is inversely proportional to the noise level as well as to the overparameterization ratio. In contrast to the minimum-norm estimator, our estimator admits a stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that the implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.

Translated: 2023-06-16 19:17:23 Published: 2023-06-14

# Volumetric Benchmarking of Quantum Computing Noise Models ( http://arxiv.org/abs/2306.08427v1 )

License: see the linked page

Tom Weber, Kerstin Borras, Karl Jansen, Dirk Krücker and Matthias Riebisch

The main challenge of quantum computing on its way to scalability is the erroneous behaviour of current devices. Understanding and predicting the impact of these errors on computations is essential to counteract them with methods such as quantum error mitigation. Thus, it is necessary to construct and evaluate accurate noise models. However, the evaluation of noise models does not yet follow a systematic approach, making it nearly impossible to estimate the accuracy of a model for a given application. Therefore, we develop and present a systematic approach to benchmarking noise models for quantum computing applications. It compares the results of hardware experiments to the predictions of noise models for a representative set of quantum circuits. We also construct a noise model and optimize its parameters with a series of training circuits. We then perform a volumetric benchmark comparing our model to other models from the literature.
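
As a rough illustration of comparing hardware results with noise-model predictions across circuits of different sizes, the Python sketch below averages a distance between measured and predicted output distributions per (width, depth) cell. The Hellinger distance, the record layout, and the function names are assumptions made for this example; the abstract does not state which metric or data format the authors use.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Turn raw measurement counts into a probability distribution."""
    total = sum(counts.values())
    return {bitstring: n / total for bitstring, n in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(squared / 2.0)


def volumetric_scores(records):
    """Average model-vs-hardware distance for each (width, depth) cell.

    records: iterable of (width, depth, hardware_counts, model_probs) tuples,
    a hypothetical layout chosen for this sketch.
    """
    cells = defaultdict(list)
    for width, depth, counts, probs in records:
        cells[(width, depth)].append(hellinger(normalize(counts), probs))
    return {cell: sum(values) / len(values) for cell, values in cells.items()}
```

A lower score in a cell would indicate that the noise model predicts the hardware output more closely for circuits of that width and depth.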

Translated: 2023-06-16 19:17:06 Published: 2023-06-14

# Selective Concept Models: Permitting Stakeholder Customisation at Test-Time ( http://arxiv.org/abs/2306.08424v1 )

License: see the linked page

Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt

Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We propose Selective COncept Models (SCOMs), which make predictions using only a subset of concepts and can be customised by stakeholders at test-time according to their preferences. We show that SCOMs require only a fraction of the total concepts to achieve optimal accuracy on multiple real-world datasets. Further, we collect and release a new dataset, CUB-Sel, consisting of human concept set selections for 900 bird images from the popular CUB dataset. Using CUB-Sel, we show that humans have unique individual preferences for the concepts they prefer to reason about, and struggle to identify the most theoretically informative concepts. The customisation and concept selection provided by SCOMs improve the efficiency of interpretation and intervention for stakeholders.

Translated: 2023-06-16 19:16:56 Published: 2023-06-14

# X-Detect: Explainable Adversarial Patch Detection for Object Detectors in Retail ( http://arxiv.org/abs/2306.08422v1 )

License: see the linked page

Omer Hofman, Amit Giloni, Yarin Hayun, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici and Asaf Shabtai

Object detection models, which are widely used in various domains (such as retail), have been shown to be vulnerable to adversarial attacks. Existing methods for detecting adversarial attacks on object detectors have had difficulty detecting new real-life attacks. We present X-Detect, a novel adversarial patch detector that can: i) detect adversarial samples in real time, allowing the defender to take preventive action; ii) provide explanations for the alerts raised to support the defender's decision-making process; and iii) handle unfamiliar threats in the form of new attacks. Given a new scene, X-Detect uses an ensemble of explainable-by-design detectors that utilize object extraction, scene manipulation, and feature transformation techniques to determine whether an alert needs to be raised. X-Detect was evaluated in both the physical and digital space using five different attack scenarios (including adaptive attacks), the COCO dataset, and our new Superstore dataset. The physical evaluation was performed using a smart shopping cart setup in real-world settings and included 17 adversarial patch attacks recorded in 1,700 adversarial videos. The results showed that X-Detect outperforms the state-of-the-art methods in distinguishing between benign and adversarial scenes for all attack scenarios while maintaining a 0% FPR (no false alarms) and providing actionable explanations for the alerts raised. A demo is available.

Translated: 2023-06-16 19:16:41 Published: 2023-06-14

# Mediated Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2306.08419v1 )

License: see the linked page

Dmitry Ivanov, Ilya Zisman, Kirill Chernyshev

The majority of Multi-Agent Reinforcement Learning (MARL) literature equates the cooperation of self-interested agents in mixed environments to the problem of social welfare maximization, allowing agents to arbitrarily share rewards and private information. This results in agents that forgo their individual goals in favour of social good, which can potentially be exploited by selfish defectors. We argue that cooperation also requires agents' identities and boundaries to be respected by making sure that the emergent behaviour is an equilibrium, i.e., a convention that no agent can deviate from and receive higher individual payoffs. Inspired by advances in mechanism design, we propose to solve the problem of cooperation, defined as finding a socially beneficial equilibrium, by using mediators. A mediator is a benevolent entity that may act on behalf of agents, but only for the agents that agree to it. We show how a mediator can be trained alongside agents with policy gradient to maximize social welfare subject to constraints that encourage agents to cooperate through the mediator. Our experiments in matrix and iterative games highlight the potential power of applying mediators in MARL.
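
The tension between social welfare and equilibrium described above can be made concrete with a small matrix game. The Python sketch below checks which joint actions are equilibria in the sense used in the abstract (no agent gains by deviating alone) and which joint action maximizes total payoff. The prisoner's-dilemma payoff values and all names are hypothetical, chosen only for illustration; the sketch does not implement the paper's mediator.

```python
from itertools import product

# Hypothetical 2x2 prisoner's dilemma: (row payoff, column payoff).
# Action 0 = cooperate, action 1 = defect. Values are illustrative only.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}


def is_equilibrium(profile, payoffs, n_actions=2):
    """Return True if no single agent gains by deviating unilaterally."""
    for agent in range(len(profile)):
        current = payoffs[profile][agent]
        for alternative in range(n_actions):
            deviated = list(profile)
            deviated[agent] = alternative
            if payoffs[tuple(deviated)][agent] > current:
                return False
    return True


def social_welfare(profile, payoffs):
    """Sum of all agents' payoffs for a joint action."""
    return sum(payoffs[profile])


profiles = list(product(range(2), repeat=2))
equilibria = [p for p in profiles if is_equilibrium(p, PAYOFFS)]
welfare_optimum = max(profiles, key=lambda p: social_welfare(p, PAYOFFS))
```

In this toy game the welfare-maximizing joint action is not an equilibrium, which is the kind of gap a mediator is meant to close.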

Translated: 2023-06-16 19:16:15 Published: 2023-06-14

# The Devil is in the Details: Analyzing the Lucrative Ad Fraud Patterns of the Online Ad Ecosystem ( http://arxiv.org/abs/2306.08418v1 )

License: see the linked page

Emmanouil Papadogiannakis, Nicolas Kourtellis, Panagiotis Papadopoulos, Evangelos P. Markatos

The online advertising market has recently reached the 500 billion dollar mark, and to accommodate the need to match a user with the highest bidder within a fraction of a second, it has moved towards a complex automated model involving numerous agents and middlemen. Stimulated by potential revenue and the lack of transparency, bad actors have found ways to abuse it, circumvent restrictions, and generate substantial revenue from objectionable and even illegal content. To make matters worse, they often receive advertisements from respectable companies that have nothing to do with these illegal activities. Altogether, advertiser money is funneled towards unknown entities, supporting their objectionable operations and maintaining their existence. In this project, we work towards understanding the extent of the problem and shed light on how shady agents take advantage of gaps in the ad ecosystem to monetize their operations. We study over 7 million websites and examine how state-of-the-art standards associated with online advertising are applied. We discover and present actual practices observed in the wild and show that publishers are able to monetize objectionable and illegal content and generate thousands of dollars of revenue on a monthly basis.

Translated: 2023-06-16 19:15:53 Published: 2023-06-14

# Quantum Multiplication Algorithm Based on Convolution Theorem ( http://arxiv.org/abs/2306.08473v1 )

License: see the linked page

Mehdi Ramezani, Morteza Nikaeen, Farnaz Farman, Seyed Mahmoud Ashrafi and Alireza Bahrampour

The problem of efficient multiplication of large numbers has been a long-standing challenge in classical computation and has been extensively studied for centuries. It appears that the existing classical algorithms are close to their theoretical limit and offer little room for further enhancement. However, with the advent of quantum computers and the need for quantum algorithms that can perform multiplication on quantum hardware, a new paradigm emerges. In this paper, inspired by the Strassen method that relies on the convolution theorem and the classical Fast Fourier Transform, we propose a quantum version of this algorithm that uses quantum resources to perform multiplication with some advantages over modern classical multiplication algorithms. We demonstrate how the quantum version of the convolution theorem can offer significant improvements to multiplication algorithms in terms of accuracy, exponential reduction of space complexity, and (probabilistic) enhancement of time efficiency. The paper also reviews the history and development of classical multiplication algorithms and motivates the exploration of how quantum resources can provide new perspectives and possibilities for this fundamental problem.
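
For readers unfamiliar with the classical starting point, the Python sketch below multiplies two integers through the convolution theorem: the digit sequences are transformed with an FFT, multiplied pointwise, and transformed back. This is only the classical idea that the abstract says inspired the work, written with a simple recursive FFT; it is not the proposed quantum algorithm, and the function names are chosen for this example.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply(a, b):
    """Multiply two non-negative integers via FFT-based digit convolution."""
    digits_a = [int(ch) for ch in reversed(str(a))]
    digits_b = [int(ch) for ch in reversed(str(b))]
    size = 1
    while size < len(digits_a) + len(digits_b):
        size *= 2
    fa = fft(digits_a + [0] * (size - len(digits_a)))
    fb = fft(digits_b + [0] * (size - len(digits_b)))
    # Convolution theorem: pointwise product in the frequency domain.
    pointwise = [x * y for x, y in zip(fa, fb)]
    conv = fft(pointwise, invert=True)
    # The unnormalized inverse transform needs a 1/size factor; rounding
    # removes floating-point noise for inputs of moderate length.
    coeffs = [round(c.real / size) for c in conv]
    # Python integers absorb the carries when the coefficients are recombined.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))
```

Because the sketch relies on double-precision floats, it is suitable only for moderately sized inputs; production implementations use number-theoretic transforms or careful error bounds.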

Translated: 2023-06-16 19:10:01 Published: 2023-06-14

# Bandits with Replenishable Knapsacks: the Best of both Worlds ( http://arxiv.org/abs/2306.08470v1 )

License: see the linked page

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco

The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and that the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.

Translated: 2023-06-16 19:09:43 Published: 2023-06-14

# Self-supervised Learning and Graph Classification under Heterophily ( http://arxiv.org/abs/2306.08469v1 )

License: see the linked page

Yilin Ding, Zhen Liu, Hao Hao

Self-supervised learning has shown promising capability in graph representation learning in recent work. Most existing pre-training strategies choose the popular graph neural networks (GNNs), which can be seen as a special form of low-pass filter and fail to capture heterophily effectively. In this paper, we first present an experimental investigation of the performance of low-pass and high-pass filters in heterophily graph classification, where the results clearly show that the high-frequency signal is important for learning heterophily graph representations. On the other hand, it is still unclear how to effectively capture the structural pattern of graphs and how to measure the capability of a self-supervised pre-training strategy to capture graph structure. To address the problem, we first design a quantitative metric to Measure Graph Structure (MGS), which analyzes the correlation between the structural similarity and the embedding similarity of graph pairs. Then, to enhance the graph structural information captured by self-supervised learning, we propose a novel self-supervised strategy for Pre-training GNNs based on the Metric (PGM). Extensive experiments validate that our pre-training strategy achieves state-of-the-art performance for molecular property prediction and protein function prediction. In addition, we find that choosing a suitable filter may sometimes be better than designing good pre-training strategies for heterophily graph classification.

Translated: 2023-06-16 19:09:25 Published: 2023-06-14

# Emergent geometric phase in time-dependent noncommutative quantum system ( http://arxiv.org/abs/2306.08467v1 )

License: see the linked page

Anwesha Chakraborty

Any effort to localise an event in the vicinity of the Planck length scale, the only scale at which quantum gravitational effects are predicted to be observed, will invariably result in gravitational collapse. To prevent such a situation from occurring, one must postulate a noncommutative (NC) algebra between space-time coordinates, which are now elevated to the status of operators. On the other hand, a consistent formulation of quantum mechanics itself with time as an operator is a challenging and longstanding problem. Here we give a systematic and user-friendly way to formulate non-relativistic quantum mechanics on 1+1 dimensional NC space-time (Moyal-type noncommutativity), which mandates the formulation of an equivalent commutative theory. Although the effect of the noncommutativity of space-time should presumably become significant at a very high energy scale, it is intriguing to speculate that some relics of the effects of quantum space-time should remain even in a low-energy regime. With this motivation in mind, we undertake the study of a time-dependent system, namely a forced harmonic oscillator in NC space-time, and show the emergence of a geometric phase, which vanishes if the NC parameter is set to zero, proving that the occurrence of the geometric phase depends entirely on the noncommutativity of space-time.
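
For orientation, Moyal-type noncommutativity in 1+1 dimensions is commonly written as a constant commutator between the time and position operators. The symbol $\theta$ for the NC parameter is notation assumed here and is not taken from the abstract:

```latex
[\hat{t}, \hat{x}] = i\theta, \qquad \theta \in \mathbb{R},
\qquad \text{with the commutative theory recovered as } \theta \to 0 .
```

In this notation, the abstract's claim is that the geometric phase of the forced harmonic oscillator vanishes in the limit $\theta \to 0$.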

Translated: 2023-06-16 19:09:04 Published: 2023-06-14

# Improving Generalization in Meta-Learning via Meta-Gradient Augmentation ( http://arxiv.org/abs/2306.08460v1 )

License: see the linked page

Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin

Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes address this by enhancing the mutual exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent Meta-Gradient Augmentation (MGAug) method. The key idea is to first break rote memories by network pruning to address memorization overfitting in the inner loop; the gradients of the pruned sub-networks then naturally form a high-quality augmentation of the meta-gradient that alleviates learner overfitting in the outer loop. Specifically, we explore three pruning strategies: random width pruning, random parameter pruning, and a newly proposed catfish pruning that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by a generalization bound from the PAC-Bayes framework. In addition, we extend it to a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at https://github.com/xxLifeLover/Meta-Gradient-Augmentation.

Translated: 2023-06-16 19:08:39 Published: 2023-06-14

# PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation ( http://arxiv.org/abs/2306.08456v1 )

License: see the linked page

Zhiyuan Hu, Chumin Liu, Yue Feng, Bryan Hooi

Poetry generation is a typical and popular task in natural language generation. While prior works have shown success in controlling either the semantic or the metrical aspects of poetry generation, addressing both perspectives simultaneously remains challenging. In this paper, we employ the Diffusion model to generate poetry in Sonnet and SongCi in Chinese for the first time to tackle such challenges. Unlike autoregressive generation, our PoetryDiffusion model, based on the Diffusion model, generates a complete sentence or poem by taking into account the information of the whole sentence, resulting in improved semantic expression. Additionally, we incorporate a novel metrical controller to manipulate and evaluate metrics (format and rhythm). The denoising process in PoetryDiffusion allows for gradual enhancement of semantics and flexible integration of the metrical controller. Experimental results on two datasets demonstrate that our model outperforms existing models in terms of semantic, metrical, and overall performance.

Translated: 2023-06-16 19:08:03 Published: 2023-06-14

# A Survey on Blood Pressure Measurement Technologies: Addressing Potential Sources of Bias ( http://arxiv.org/abs/2306.08451v1 )

License: see the linked page

Seyedeh Somayyeh Mousavi and Reza Sameni

Blood pressure is a vital sign that offers important insights into overall health, particularly cardiovascular well-being. It plays a critical role in medical settings and homes for disease prevention, diagnosis, treatment, and management. Physicians rely heavily on blood pressure values when making crucial decisions. Most commercial devices use cuffs for blood pressure measurement, and automatic devices have gained popularity due to the high prevalence of hypertension. Self-measurement and home monitoring of blood pressure are also recommended. However, concerns arise regarding the accuracy of blood pressure measurement technologies and the alignment of reported values with actual values. People often adjust their medication based on these reported values, making accuracy vital. This study focuses on the concept of "bias" to highlight potential discrepancies between reported and actual blood pressure values. Previous research has identified biases originating from three categories: (1) blood pressure measurement devices, (2) subject-specific factors, and (3) measurement sessions. Specifically, this study examines biases associated with cuff-based blood pressure technologies due to their widespread use in medical applications and the growing trend of home monitoring. Identifying and addressing the primary sources of bias is crucial to prevent their propagation and mitigate potential consequences. Additionally, the study explores the future prospects of blood pressure monitoring using machine learning methods.

翻訳日:2023-06-16 19:07:46 公開日:2023-06-14

# 非定常データのオンライン分類のためのカルマンフィルタ

Kalman Filter for Online Classification of Non-Stationary Data ( http://arxiv.org/abs/2306.08448v1 )

ライセンス: Link先を確認

Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

(参考訳) オンライン連続学習(ocl)では、学習システムはデータのストリームを受け取り、予測およびトレーニングステップを順次実行する。 OCLの重要な課題は、データの特定の非定常構造への自動適応と予測の不確実性の定量化である。これらの課題に触発され、線形予測量に対する(おそらく事前学習された)ニューラル表現と状態空間モデルを用いて確率的ベイズオンライン学習モデルを導入する。線形予測子重みの非定常性は、忘れることを定量化する係数によってパラメータドリフト遷移密度を用いてモデル化される。このモデルの推論は、線形重みの後方分布を追跡する効率的なカルマンフィルタ再帰によって実装されるが、遷移ダイナミクス係数のオンラインsgd更新により、データに見られる非定常性に適応することができる。フレームワークは線形ガウスモデルとして開発されているが、分類問題やディープラーニング表現の微調整のために拡張する。 CIFAR-100 や CLOC などのデータセットを用いたマルチクラス分類実験では,モデルの予測能力と非定常性を捉える柔軟性を示す。

In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are automatic adaptation to the particular non-stationary structure of the data and quantification of predictive uncertainty. Motivated by these challenges, we introduce a probabilistic Bayesian online learning model that uses a (possibly pretrained) neural representation and a state space model over the linear predictor weights. Non-stationarity over the linear predictor weights is modelled using a parameter drift transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions, which track the posterior distribution over the linear weights, while online SGD updates over the transition dynamics coefficient allow the model to adapt to the non-stationarity seen in the data. While the framework is developed assuming a linear Gaussian model, we also extend it to deal with classification problems and with fine-tuning of the deep learning representation. In a set of multi-class classification experiments using data sets such as CIFAR-100 and CLOC, we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.

翻訳日:2023-06-16 19:07:26 公開日:2023-06-14
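
Illustrative sketch (not taken from the paper): the abstract above describes Kalman filter recursions over linear predictor weights, with a drift transition governed by a forgetting coefficient. A minimal scalar version might look as follows. The transition form (mean scaled by `gamma`, variance pulled back toward `prior_var`), the parameter values, and the toy stream are all assumptions made for illustration, and `gamma` is kept fixed here rather than learned by online SGD.

```python
def kalman_step(mean, var, x, y, gamma=0.95, prior_var=1.0, noise_var=0.1):
    """One predict/update step for a scalar weight w with y = x * w + noise."""
    # Predict: drift shrinks the mean toward 0 and moves the variance back
    # toward the prior, which is how older data is gradually forgotten.
    pred_mean = gamma * mean
    pred_var = gamma**2 * var + (1.0 - gamma**2) * prior_var
    # Update: standard Kalman correction with the new observation.
    innovation = y - x * pred_mean
    innovation_var = x * x * pred_var + noise_var
    gain = pred_var * x / innovation_var
    new_mean = pred_mean + gain * innovation
    new_var = (1.0 - gain * x) * pred_var
    return new_mean, new_var


def run_filter(stream, **kwargs):
    """Run the filter over (x, y) pairs and return the posterior means."""
    mean, var = 0.0, 1.0
    history = []
    for x, y in stream:
        mean, var = kalman_step(mean, var, x, y, **kwargs)
        history.append(mean)
    return history


# Noise-free toy stream whose true weight switches from +2 to -2 halfway through.
stream = [(1.0, 2.0)] * 50 + [(1.0, -2.0)] * 50
estimates = run_filter(stream, gamma=0.95)
```

With `gamma` below 1 the posterior variance never collapses, so the estimate can follow the switch in the stream; with `gamma = 1` the filter reduces to ordinary recursive least squares and reacts more and more slowly.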

# OoD検出器の剛性設計に向けて

Towards Rigorous Design of OoD Detectors ( http://arxiv.org/abs/2306.08447v1 )

ライセンス: Link先を確認

Chih-Hong Cheng, Changshun Wu, Harald Ruess, Saddek Bensalem

(参考訳) out-of-distribution(ood)検出技術は、安全関連ニューラルネットワークに有用である。しかし,現在の性能指向OoD検出技術は,キャリブレーション誤差などの基準値の一致を考慮に入れているため,安全性の確保には不十分である。欠けているのは、ood検出器の開発、検証、検証のための厳密な設計アプローチである。これらの設計原則は、意図した機能と運用ドメインに適合する必要がある。そこで我々は,ood検出器のための厳密で安全関連の設計手法を開発するための,今後の可能性とともに,重要な技術的課題のいくつかを定式化する。

Out-of-distribution (OoD) detection techniques are instrumental for safety-related neural networks. We argue, however, that current performance-oriented OoD detection techniques, which are geared towards matching metrics such as expected calibration error, are not sufficient for establishing safety claims. What is missing is a rigorous design approach for developing, verifying, and validating OoD detectors. These design principles need to be aligned with the intended functionality and the operational domain. Here, we formulate some of the key technical challenges, together with a possible way forward, for developing a rigorous and safety-related design methodology for OoD detectors.

翻訳日:2023-06-16 19:07:05 公開日:2023-06-14
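
Illustrative sketch (not taken from the paper): the abstract above names expected calibration error (ECE) as a typical performance metric and argues that matching such metrics is not enough for safety claims. For readers unfamiliar with the metric, one common binned formulation can be computed as below. The equal-width binning, the bin count, and the toy predictions are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Toy predictions: confidence in the predicted class and whether it was right.
confidences = [0.95, 0.92, 0.85, 0.65, 0.55, 0.35]
correct = [1, 1, 0, 1, 0, 0]
ece = expected_calibration_error(confidences, correct)
```

A low value only says that confidence and accuracy agree on average over the evaluation set, which is consistent with the abstract's point that such a number by itself does not establish a safety argument.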

# グラフ構造化力学系に対する深いガウス的マルコフランダム場

Deep Gaussian Markov Random Fields for Graph-Structured Dynamical Systems ( http://arxiv.org/abs/2306.08445v1 )

ライセンス: Link先を確認

Fiona Lippert, Bart Kranstauber, E. Emiel van Loon, Patrick Forré

(参考訳) 高次元状態空間モデルにおける確率的推論は計算上困難である。しかし、多くの時空間系では、状態変数の依存性構造に関する事前知識が利用可能である。この構造を利用して、(部分的に)未知のダイナミクスと限られた履歴データを持つグラフ構造状態空間モデルにおける状態推定と学習のための計算効率の高い手法を開発する。ガウスマルコフ確率場(英語版)(GMRF)の原理推論とディープラーニングからのアイデアを組み合わせた最近の手法に基づいて、簡単な空間グラフ層と時間グラフ層によって定義されたディープGMRFとしてグラフ構造化状態空間モデルを再構成する。これにより、変動推論によって単一の時間列から効率的に学習できるフレキシブルな時空間前処理が実現される。線形ガウスの仮定の下では、共役勾配法を用いて効率的にサンプリングできる閉形式後部を保ち、古典カルマンフィルタに基づくアプローチと比較して好ましくスケーリングする。

Probabilistic inference in high-dimensional state-space models is computationally challenging. For many spatiotemporal systems, however, prior knowledge about the dependency structure of state variables is available. We leverage this structure to develop a computationally efficient approach to state estimation and learning in graph-structured state-space models with (partially) unknown dynamics and limited historical data. Building on recent methods that combine ideas from deep learning with principled inference in Gaussian Markov random fields (GMRFs), we reformulate graph-structured state-space models as Deep GMRFs defined by simple spatial and temporal graph layers. This results in a flexible spatiotemporal prior that can be learned efficiently from a single time sequence via variational inference. Under linear Gaussian assumptions, we retain a closed-form posterior, which can be sampled efficiently using the conjugate gradient method, scaling favourably compared to classical Kalman filter based approaches.

翻訳日:2023-06-16 19:06:54 公開日:2023-06-14
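
Illustrative sketch (not taken from the paper): the abstract above relies on the conjugate gradient method to work with a closed-form Gaussian posterior whose precision matrix is sparse. The sketch below shows the basic ingredient only, namely solving `Q x = b` for a posterior mean using nothing but neighbour-wise matrix-vector products. The chain graph, the prior `kappa * I + Laplacian`, the observation precision `tau`, and the data are assumptions, and posterior sampling is not shown.

```python
def precision_matvec(v, kappa=0.1, tau=1.0):
    """Multiply by Q = kappa*I + L + tau*I for a chain graph with Laplacian L."""
    n = len(v)
    out = []
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        laplacian_term = len(neighbours) * v[i] - sum(v[j] for j in neighbours)
        out.append((kappa + tau) * v[i] + laplacian_term)
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve Q x = b for symmetric positive definite Q given only x -> Q x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - Q x, with x starting at zero
    p = list(r)  # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        qp = matvec(p)
        alpha = rs_old / sum(pi * qi for pi, qi in zip(p, qp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, qp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Noisy observations of a smooth signal on a 6-node chain.
y = [0.0, 1.2, 1.9, 3.1, 3.9, 5.2]
tau = 1.0
posterior_mean = conjugate_gradient(precision_matvec, [tau * yi for yi in y])
```

Because only `matvec` is needed, the cost per iteration grows with the number of graph edges rather than with the square of the state dimension.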

# 科学的シンボリック推論に先立つ確率的正則木

Probabilistic Regular Tree Priors for Scientific Symbolic Reasoning ( http://arxiv.org/abs/2306.08506v1 )

ライセンス: Link先を確認

Tim Schneider, Amin Totounferoush, Wolfgang Nowak, Steffen Staab

(参考訳) シンボリック回帰(SR)は、データから科学方程式を発見できる。可能な方程式の大きな探索空間を制限するため、任意の文字列の部分集合を特徴づける形式文法の用語で事前知識が表現されている。しかし、構文的に正しい方程式の集合を表現するのに必要な文脈自由文法、前者の閉包特性の欠如、後者のツリー構造の間にはミスマッチがある。私たちの貢献は (i)確率正規木表現(pRTE)によりどの方程式が予想されるかという専門家の事前の信念をコンパクトに表現し、 (ii)有限状態機械として符号化された記号回帰に対して、ベイズ推論を効率的に利用できるように適応させる。本研究は土壌科学における吸収等温線の発見と超弾性材料のモデル化に有効性を示す。

Symbolic Regression (SR) allows for the discovery of scientific equations from data. To limit the large search space of possible equations, prior knowledge has been expressed in terms of formal grammars that characterize subsets of arbitrary strings. However, there is a mismatch between the context-free grammars required to express the set of syntactically correct equations, which lack closure properties, and the tree structure of the equations themselves. Our contributions are to (i) compactly express experts' prior beliefs about which equations are more likely to be expected by means of probabilistic Regular Tree Expressions (pRTE), and (ii) adapt Bayesian inference to make such priors efficiently available for symbolic regression encoded as finite state machines. Our scientific case studies show the effectiveness of the approach in soil science, for finding sorption isotherms, and in modeling hyper-elastic materials.

翻訳日:2023-06-16 18:58:52 公開日:2023-06-14

# DiffuDetox: テキストのデトックス化のための混合拡散モデル

DiffuDetox: A Mixed Diffusion Model for Text Detoxification ( http://arxiv.org/abs/2306.08505v1 )

ライセンス: Link先を確認

Griffin Floto, Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Zhenwei Tang, Ali Pesaranghader, Manasa Bharadwaj, Scott Sanner

(参考訳) テキストデトックス化(text detoxification)は、有害なテキストから有害なコンテンツを除去するための条件付きテキスト生成タスクである。オンラインフォーラムやソーシャルメディアでは、攻撃的なコンテンツが頻繁に出会うのに非常に有用である。直感的には、意味を保ちながら文章をデトックス化する方法は様々であり、ユーザに対してテキストを表示する前に、デトックス化文を選択できる。条件付き拡散モデルは、言語モデルに基づく既存の条件付きテキスト生成モデルよりも高い生成的多様性を示すため、このタスクに特に適している。それでも、不十分なデータで訓練された場合、テキストの流布度は低下する。本研究では,テキストデトックス化のための混合条件と非条件拡散モデルであるDiffuDetoxを提案する。条件付きモデルは、有毒なテキストを条件として取り、その毒性を減少させ、様々な無毒な文を生成する。非条件モデルは、入力テキストを復元するために訓練され、トレーニングのために追加のフルーエントテキストを導入することができる。提案するdiffudetoxの有効性を実験的に検証し,詳細な解析を行った。

Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text. It is highly useful for online forums and social media, where offensive content is frequently encountered. Intuitively, there are diverse ways to detoxify sentences while preserving their meanings, and we can select from detoxified sentences before displaying text to users. Conditional diffusion models are particularly suitable for this task given their demonstrated higher generative diversity than existing conditional text generation models based on language models. Nonetheless, text fluency declines when they are trained with insufficient data, which is the case for this task. In this work, we propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification. The conditional model takes toxic text as the condition and reduces its toxicity, yielding a diverse set of detoxified sentences. The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus ensures text fluency. Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed DiffuDetox.

翻訳日:2023-06-16 18:58:37 公開日:2023-06-14

# ITALIC: イタリアのインテント分類データセット

ITALIC: An Italian Intent Classification Dataset ( http://arxiv.org/abs/2306.08502v1 )

ライセンス: Link先を確認

Alkis Koudounas, Moreno La Quatra, Lorenzo Vaiani, Luca Colomba, Giuseppe Attanasio, Eliana Pastor, Luca Cagliero, Elena Baralis

(参考訳) 最近の大規模音声言語理解データセットは、主に英語に焦点を当てており、特定の音素や異なる発話中の単語といった言語固有の現象を考慮していない。 ITALICはイタリア語で意図分類用に設計された最初の大規模音声データセットである。このデータセットは、イタリア各地の70人の話者が記録した16,521人のクラウドソースオーディオサンプルからなり、インテントラベルと追加メタデータが付加されている。我々は現在最先端の音声とテキストモデルを評価することでITALICの汎用性を探求する。意図分類の結果から,大規模化や言語適応の促進により,より優れた音声モデルが得られ,モノリンガルテキストモデルが多言語モデルよりも優れていることが示唆された。我々は、新しいイタリアSLUモデルと言語固有のデータセットの開発を効率化するために、データセットとアノテーションスキームの両方をリリースする。

Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Italian regions and annotated with intent labels and additional metadata. We explore the versatility of ITALIC by evaluating current state-of-the-art speech and text models. Results on intent classification suggest that increasing scale and running language adaptation yield better speech models, that monolingual text models outscore multilingual ones, and that speech recognition on ITALIC is more challenging than on existing Italian benchmarks. We release both the dataset and the annotation scheme to streamline the development of new Italian SLU models and language-specific datasets.

翻訳日:2023-06-16 18:58:20 公開日:2023-06-14

# 機械学習を用いた都市変化過程追跡のための衛星誘導夜間光時系列の適応モデリング

Adaptive Modeling of Satellite-Derived Nighttime Lights Time-Series for Tracking Urban Change Processes Using Machine Learning ( http://arxiv.org/abs/2306.08501v1 )

ライセンス: Link先を確認

Srija Chakraborty and Eleanor C. Stokes

(参考訳) リモートセンシングされた夜間灯(ntl)は、都市化、社会・政治の衝突と変位、災害、休日、日々の人間の行動パターンの変化など、人間と生態的な幸福にとって重要な都市変化プロセスを独特に捉えている。グローバルなNTL製品はいくつかあるが、開発レベルや社会的、経済的、文化的特徴など、光に影響を与える固有の都市特有の要因は、各都市特有のものであり、NTLシグネチャに埋め込まれた都市プロセスの特徴付けが困難であり、都市の変化分析のスケーラビリティを制限している。本研究では,各都市に適応し,都市固有の時間パターンの学習に有効である日次衛星由来NTLデータから都市変化を検出するためのデータ駆動型手法を提案する。提案手法は,過去のデータ記録からニューラルネットワークを用いてntlシグネチャを予測し,大量のラベルなしデータの利用を可能にし,アノテーションの手間を省く。異常検出手法を用いたモデル予測から観測されたNTLの偏差に基づいて都市の変化を検出する。モデル予測と観測されたNTLを比較することで、変更の方向(正または負)を特定したり、変更の重大度を監視してリカバリを追跡することもできる。このモデルの運用にあたっては,動的ntl時系列を持つ多様な地域から10の都市圏を考察し,ntl偏差に基づいて,異なるドライバによる変化過程とこれらの都市域内で発生するレートを検出する手法の一般化可能性を示す。毎日のリモートセンシング観測から変化を監視するこのスケーラブルなアプローチは、大規模なデータボリュームを効率的に活用し、継続的な監視と意思決定をサポートする。

Remotely sensed nighttime lights (NTL) uniquely capture urban change processes that are important to human and ecological well-being, such as urbanization, socio-political conflicts and displacement, impacts from disasters, holidays, and changes in daily human patterns of movement. Though several NTL products are global in extent, intrinsic city-specific factors that affect lighting, such as development levels, and social, economic, and cultural characteristics, are unique to each city, making the urban processes embedded in NTL signatures difficult to characterize, and limiting the scalability of urban change analyses. In this study, we propose a data-driven approach to detect urban changes from daily satellite-derived NTL data records that is adaptive across cities and effective at learning city-specific temporal patterns. The proposed method learns to forecast NTL signatures from past data records using neural networks and allows the use of large volumes of unlabeled data, eliminating annotation effort. Urban changes are detected based on deviations of observed NTL from model forecasts using an anomaly detection approach. Comparing model forecasts with observed NTL also allows identifying the direction of change (positive or negative) and monitoring change severity for tracking recovery. In operationalizing the model, we consider ten urban areas from diverse geographic regions with dynamic NTL time-series and demonstrate the generalizability of the approach for detecting the change processes with different drivers and rates occurring within these urban areas based on NTL deviation. This scalable approach for monitoring changes from daily remote sensing observations efficiently utilizes large data volumes to support continuous monitoring and decision making.

翻訳日:2023-06-16 18:58:04 公開日:2023-06-14
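
Illustrative sketch (not taken from the paper): the abstract above detects urban change from deviations between observed nighttime-light values and model forecasts, and reads the direction and severity of change from those deviations. The sketch below mimics that pipeline on a toy series. The moving-average forecaster is only a stand-in for the neural network used by the authors, and the window, threshold, calibration length, and data are assumptions.

```python
import statistics


def forecast_next(history, window=7):
    """Stand-in forecaster: mean of the most recent `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def detect_changes(series, window=7, threshold=3.0, calibration=30):
    """Flag time steps whose deviation from the forecast is unusually large."""
    residuals = [
        series[t] - forecast_next(series[:t], window)
        for t in range(window, len(series))
    ]
    # Scale of "normal" deviations, estimated from an early calibration period.
    scale = statistics.pstdev(residuals[:calibration])
    events = []
    for offset, residual in enumerate(residuals):
        score = residual / scale
        if abs(score) > threshold:
            direction = "positive" if score > 0 else "negative"
            events.append((window + offset, direction, round(abs(score), 1)))
    return events


# Toy radiance series: a stable pattern, a five-day drop, then a return to normal.
series = (
    [50 + (t % 3) for t in range(60)]
    + [20] * 5
    + [50 + (t % 3) for t in range(65, 90)]
)
events = detect_changes(series)
```

Each event is a tuple of time index, direction, and a severity score. In this toy run the drop is flagged as negative and the return to normal as positive, because the stand-in forecaster temporarily adapts to the low values.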

# 線形応答による非平衡量子プローブ

Non-equilibrium quantum probing through linear response ( http://arxiv.org/abs/2306.08500v1 )

ライセンス: Link先を確認

Sherry Blair, Giorgio Zicari, Alessio Belenchia, Alessandro Ferraro, Mauro Paternostro

(参考訳) 線形応答理論の形式論は、開量子系が非平衡定常状態に向かって進化する物理的状況を含むように拡張することができる。ここでは、Konopik と Lutz [Phys. Rev. Research 1, 033156 (2019)] が提案したフレームワークを用いて、力学のユニタリ摂動を超えた場合を扱う。2つの結合した量子調和振動子からなる開放系を考え、ハミルトニアンダイナミクスに影響するユニタリ摂動と、温度やスクイージングなど環境の性質に影響する非ユニタリ摂動に対する系の応答を調べる。線形応答を量子プロービングの手法と組み合わせることで、非ユニタリな力学の場合であっても、摂動や環境の特性について有用な定量的情報が得られることを示す。

The formalism of linear response theory can be extended to encompass physical situations where an open quantum system evolves towards a non-equilibrium steady-state. Here, we use the framework put forward by Konopik and Lutz [Phys. Rev. Research 1, 033156 (2019)] to go beyond unitary perturbations of the dynamics. Considering an open system composed of two coupled quantum harmonic oscillators, we study the system's response to unitary perturbations, affecting the Hamiltonian dynamics, as well as non-unitary perturbations, affecting the properties of the environment, e.g., its temperature and squeezing. We show that linear response, combined with a quantum probing approach, can effectively provide valuable quantitative information about the perturbation and characteristics of the environment, even in cases of non-unitary dynamics.

翻訳日:2023-06-16 18:57:33 公開日:2023-06-14

# RISCLIP: CLIP を用いたイメージセグメンテーションフレームワークの参照

RISCLIP: Referring Image Segmentation Framework using CLIP ( http://arxiv.org/abs/2306.08498v1 )

ライセンス: Link先を確認

Seoyeon Kim, Minguk Kang, Jaesik Park

(参考訳) 近年のコンピュータビジョンと自然言語処理の進歩は、Referring Image Segmentation (RIS)を含むマルチモーダルタスクの活発な研究につながっている。最近のアプローチでは、RISのフロンティアを目覚ましいマージンで前進させているが、最先端のパフォーマンスを達成するには、外部の視覚的グラウンドデータセットの事前訓練段階が必要になる。本稿では, CLIP(Contrastive Language-Image Pretraining) を RIS に適用することにより, この要件から解放しようとする。本稿では,Fusion AdaptersとBackbone Adaptersを用いて,凍結したCLIP機能をRISに残留的に適応させる新しいフレームワークを提案する。フリーズCLIPはバックボーンのリッチで汎用的な画像テキストアライメントの知識を保ち、Fusion Adaptersはマルチモーダル通信を導入し、Backbone AdaptersはRISの解決に有用な新しい知識を注入する。提案手法は3つの主要なRISベンチマーク上での新たな技術状況に達する。追加の事前訓練を必要とせず、追加のトレーニングやデータ準備の必要性を解消する。ソースコードとモデルの重み付けは、公開時に提供される。

Recent advances in computer vision and natural language processing have naturally led to active research in multi-modal tasks, including Referring Image Segmentation (RIS). Recent approaches have advanced the frontier of RIS by impressive margins, but they require an additional pretraining stage on external visual grounding datasets to achieve state-of-the-art performance. We attempt to break free from this requirement by effectively adapting Contrastive Language-Image Pretraining (CLIP) to RIS. We propose a novel framework that residually adapts frozen CLIP features to RIS with Fusion Adapters and Backbone Adapters. Freezing CLIP preserves the backbone's rich, general image-text alignment knowledge, whilst Fusion Adapters introduce multi-modal communication and Backbone Adapters inject new knowledge useful in solving RIS. Our method reaches a new state of the art on three major RIS benchmarks. We attain such performance without additional pretraining and thereby remove the need for extra training and data preparation. Source code and model weights will be available upon publication.

翻訳日:2023-06-16 18:57:19 公開日:2023-06-14

# 強対数凹分布に対するランゲヴィン・モンテカルロ:ランダム化された中間点の再検討

Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited ( http://arxiv.org/abs/2306.08494v1 )

ライセンス: Link先を確認

Lu Yu, Avetik Karagulyan, Arnak Dalalyan

(参考訳) 我々は,$\mathbb r^p$ の至る所で滑らかな対数対数密度を持つ対象分布からサンプリングする問題を再検討する。この文脈では、付加的な密度情報がない場合、動力学的ランジュバン拡散のランダム化中間点離散化は、大きな条件数を持つ高次元において最もスケーラブルな方法であることが知られている。我々の主な結果は、この手法のワッサーシュタイン-2誤差の上限を非漸近的に計算し易いことである。計算可能な上界を確立する方法のより詳細な説明として,バニラ・ランゲヴィン過程の中間点の離散化を解析する。この分析は根底にある原理を明らかにするのに役立ち、中間点の離散化を伴う速度論的ランゲヴィン過程の上限を改良するために私たちが使う貴重な洞察を提供する。さらに、これらの手法を適用することで、既存の上界よりも条件数に依存したオイラー離散化によるランゲヴィン過程の新しい保証を確立する。

We revisit the problem of sampling from a target distribution that has a smooth strongly log-concave density everywhere in $\mathbb R^p$. In this context, if no additional density information is available, the randomized midpoint discretization for the kinetic Langevin diffusion is known to be the most scalable method in high dimensions with large condition numbers. Our main result is a nonasymptotic and easy to compute upper bound on the Wasserstein-2 error of this method. To provide a more thorough explanation of our method for establishing the computable upper bound, we conduct an analysis of the midpoint discretization for the vanilla Langevin process. This analysis helps to clarify the underlying principles and provides valuable insights that we use to establish an improved upper bound for the kinetic Langevin process with the midpoint discretization. Furthermore, by applying these techniques we establish new guarantees for the kinetic Langevin process with Euler discretization, which have a better dependence on the condition number than existing upper bounds.

翻訳日:2023-06-16 18:56:59 公開日:2023-06-14
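
Illustrative sketch (not taken from the paper): the abstract above compares the Euler and randomized midpoint discretizations of Langevin dynamics. The sketch below runs both on the vanilla (overdamped) Langevin process for a one-dimensional standard Gaussian target. The exact form of the midpoint step follows the usual randomized midpoint construction and is an assumption here, as are the step size, chain length, and seed. The kinetic Langevin process that the paper's main result concerns is not shown.

```python
import math
import random


def grad_potential(theta):
    """Gradient of f(theta) = theta**2 / 2, i.e. a standard Gaussian target."""
    return theta


def euler_step(theta, h, rng):
    """Plain Euler discretization of the Langevin diffusion."""
    return theta - h * grad_potential(theta) + math.sqrt(2 * h) * rng.gauss(0, 1)


def randomized_midpoint_step(theta, h, rng):
    """Evaluate the gradient at a uniformly random point inside the step."""
    u = rng.random()
    # Brownian increments over [0, u*h] and [u*h, h]; their sum drives the full step.
    w_mid = math.sqrt(u * h) * rng.gauss(0, 1)
    w_rest = math.sqrt((1 - u) * h) * rng.gauss(0, 1)
    theta_mid = theta - u * h * grad_potential(theta) + math.sqrt(2) * w_mid
    return theta - h * grad_potential(theta_mid) + math.sqrt(2) * (w_mid + w_rest)


def sample(step, n_steps=50000, h=0.1, seed=0):
    """Run one chain from theta = 0 and return every iterate."""
    rng = random.Random(seed)
    theta, draws = 0.0, []
    for _ in range(n_steps):
        theta = step(theta, h, rng)
        draws.append(theta)
    return draws


def variance(draws):
    mean = sum(draws) / len(draws)
    return sum((d - mean) ** 2 for d in draws) / len(draws)


euler_var = variance(sample(euler_step))
midpoint_var = variance(sample(randomized_midpoint_step))
```

For this linear toy target the Euler chain has stationary variance 1 / (1 - h/2), roughly 1.05 for h = 0.1, whereas the target variance is 1. The midpoint chain is expected to sit closer to 1, although a single finite run is noisy enough that the two estimates can overlap.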

# ニューラルネットワーク翻訳モデルに対する逆攻撃に対する緩和最適化手法

A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models ( http://arxiv.org/abs/2306.08492v1 )

ライセンス: Link先を確認

Sahar Sadrizadeh, Clément Barbier, Ljiljana Dolamic, Pascal Frossard

(参考訳) 本稿では,ニューラルネットワーク翻訳(NMT)モデルに対する最適化に基づく逆攻撃を提案する。まず、原文と意味的に類似しているが、ターゲットNMTモデルによって生成された翻訳を破壊できる逆例を生成する最適化問題を提案する。この最適化問題は離散的であり,それを解くための連続緩和を提案する。この緩和により、各トークンの確率分布が逆の例に現れ、これらの分布からサンプリングすることで複数の逆の例を生成することができる。実験結果から,本攻撃はNMTモデルの翻訳品質を著しく低下させつつ,原文と逆文のセマンティックな類似性を維持できることがわかった。さらに,本攻撃は,成功率,類似性保持率,翻訳品質への影響,トークンエラー率において,ベースラインを上回っている。最後に,勾配がアクセス可能な参照モデルの最適確率分布からサンプリングすることにより,攻撃のブラックボックス拡張を提案する。

In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it. With this relaxation, we find a probability distribution for each token in the adversarial example, and then we can generate multiple adversarial examples by sampling from these distributions. Experimental results show that our attack significantly degrades the translation quality of multiple NMT models while maintaining the semantic similarity between the original and adversarial sentences. Furthermore, our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate. Finally, we propose a black-box extension of our attack by sampling from an optimized probability distribution for a reference model whose gradients are accessible.

翻訳日:2023-06-16 18:56:40 公開日:2023-06-14

# 大規模・高密度ランダムクロネッカーグラフの解析と近似推定

Analysis and Approximate Inference of Large and Dense Random Kronecker Graphs ( http://arxiv.org/abs/2306.08489v1 )

ライセンス: Link先を確認

Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao

(参考訳) ランダムグラフモデルは科学や産業においてますます重要な役割を担っており、社会ネットワークや交通ネットワークから、レコメンデーションシステムや分子遺伝学まで幅広い分野に応用されている。本稿では,グラフ頂点数$N$が大きい場合のランダムクロネッカーグラフモデルの詳細な解析を行う。ランダム行列理論の最近の進歩を基にして、密な領域において、ランダムクロネッカーグラフの隣接行列は、グラフパラメータに関して線形な低ランク(高々$\log N$のオーダー)の信号行列と、四分円形の特異値分布を持つランダムノイズ行列からなる信号プラスノイズモデル(signal-plus-noise model)に近似的に従うことを示す。この観測により,計算複雑性を低減し(漸近的な)性能保証を備えた、グラフパラメータを近似的に推定する「denoise-and-solve」メタアルゴリズムを提案することができる。合成グラフと現実的なグラフの両方におけるグラフ推定とグラフ分類の数値実験を行い,提案手法の利点を実証した。

Random graph models are playing an increasingly important role in science and industry, and find applications in a variety of fields ranging from social and traffic networks to recommendation systems and molecular genetics. In this paper, we perform an in-depth analysis of the random Kronecker graph model proposed by Leskovec et al. (2010), when the number of graph vertices $N$ is large. Building upon recent advances in random matrix theory, we show that, in the dense regime, the random Kronecker graph adjacency matrix approximately follows a signal-plus-noise model, with a low-rank (of order at most $\log N$) signal matrix that is linear in the graph parameters and a random noise matrix having a quarter-circle-form singular value distribution. This observation allows us to propose a "denoise-and-solve" meta-algorithm to approximately infer the graph parameters, with reduced computational complexity and an (asymptotic) performance guarantee. Numerical experiments of graph inference and graph classification on both synthetic and realistic graphs are provided to support the advantageous performance of the proposed approach.

翻訳日:2023-06-16 18:56:25 公開日:2023-06-14
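
Illustrative sketch (not taken from the paper): the abstract above analyses random Kronecker graphs, in which the edge-probability matrix is a repeated Kronecker power of a small initiator matrix and each edge is drawn independently. The generative model can be sketched as follows. The 2x2 initiator values, the power, and the seed are hypothetical, and the "denoise-and-solve" inference step is not shown.

```python
import random


def kronecker_product(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kronecker_power(initiator, k):
    """k-fold Kronecker product of the initiator with itself (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result


def sample_graph(probabilities, seed=0):
    """Draw each directed edge independently with its Kronecker probability."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in row] for row in probabilities]


initiator = [[0.9, 0.6], [0.6, 0.3]]  # hypothetical 2x2 edge-probability matrix
probabilities = kronecker_power(initiator, 3)  # 8 x 8, so N = 2**3 vertices
adjacency = sample_graph(probabilities)
```

The number of vertices grows as 2**k while the model keeps only the four initiator entries as parameters, which is why inferring them from a single large adjacency matrix is the natural estimation problem.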

# マルチモーダル集中型知識グラフによる未知物体の認識

Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation ( http://arxiv.org/abs/2306.08487v1 )

ライセンス: Link先を確認

Likang Wu, Zhi Li, Hongke Zhao, Zhefeng Wang, Qi Liu, Baoxing Huai, Nicholas Jing Yuan, Enhong Chen

(参考訳) Zero-Shot Learning (ZSL)は、見えないオブジェクトを自動的に認識することを目的としており、マシンに対する新しい現実世界の知識を継続的に理解するための、有望な学習パラダイムである。近年、知識グラフ(kg)は、ゼロショットタスクを大規模かつ非帰属データで扱うための効果的なスキームとして証明されている。先行研究は常に、見えないオブジェクトと見えないオブジェクトの関係を、既存の知識グラフから視覚情報に埋め込み、見えないデータの認知能力を促進する。実際、現実世界の知識は自然にマルチモーダルな事実によって形成されます。グラフの観点からの通常の構造的知識と比較して、マルチモーダルkgはきめ細かい知識を持つ認知システムを提供できる。例えば、テキスト記述とビジュアルコンテンツは、知識のトリプレットのみに依存するよりも、事実のより重要な詳細を描写することができる。残念ながら、このマルチモーダルなきめ細かな知識は、異なるモダリティ間の機能アライメントのボトルネックのため、ほとんど展開されていない。そこで我々は,画像の領域と対応するセマンティックな埋め込みとを,設計した集中型注目モジュールと自己校正損失によってマッチングする多モード集中型ZSLフレームワークを提案する。これにより、ZSLフレームワークのセマンティックトランスファープロセスは、エンティティ間のより分化した知識を学習する。私たちのモデルは、粗いグローバル機能のみを使用する場合のパフォーマンス制限も取り除きます。大規模実世界データを用いた大規模実験を行い,モデルの評価を行った。実験結果は,標準ゼロショット分類タスクにおける提案モデルの有効性を明らかにした。

Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen objects, is a promising learning paradigm that lets machines continuously understand new real-world knowledge. Recently, the Knowledge Graph (KG) has been proven to be an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies embed relationships between seen and unseen objects, taken from existing knowledge graphs, into visual information to improve recognition of unseen data. In fact, real-world knowledge is naturally formed by multimodal facts. Compared with ordinary structural knowledge from a graph perspective, a multimodal KG can provide cognitive systems with fine-grained knowledge. For example, the text description and visual content can depict more critical details of a fact than knowledge triplets alone. Unfortunately, this multimodal fine-grained knowledge is largely unexploited due to the bottleneck of feature alignment between different modalities. To that end, we propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings via a designed dense attention module and self-calibration loss. This makes the semantic transfer process of our ZSL framework learn more differentiated knowledge between entities. Our model also removes the performance limitation of using only coarse global features. We conduct extensive experiments and evaluate our model on large-scale real-world data. The experimental results clearly demonstrate the effectiveness of the proposed model in standard zero-shot classification tasks.

翻訳日:2023-06-16 18:56:03 公開日:2023-06-14

# アクティベーション関数の共設計によるディープニューラルネットワークの高速かつプライベートな推論

Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions ( http://arxiv.org/abs/2306.08538v1 )

ライセンス: Link先を確認

Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum

(参考訳) マシンラーニング・アズ・ア・サービス(MLaaS)は、豊富なコンピューティングリソースを持つ企業がディープニューラルネットワークをトレーニングし、画像分類などのタスクに対してクエリアクセスを提供するという、人気の高い設計である。この設計の課題は、MLaaSではクライアントが、モデルをホストしている企業に対して、潜在的にセンシティブなクエリを明らかにする必要があることだ。マルチパーティ計算(MPC)は、暗号化された推論を可能にすることでクライアントのデータを保護する。しかし、現在のアプローチは、非常に大きな推論時間に苦しむ。MPCにおける推論時間のボトルネックは、ReLUアクティベーション関数のような非線形層の評価である。機械学習とMPCの側面を共同設計する以前の研究の成功に動機づけられ、アクティベーション関数を共同設計する。我々は全てのReLUを多項式近似に置き換え、それらを単一ラウンドMPCプロトコルで評価し、広域ネットワークにおける最先端の推論時間を達成する。さらに,以前に多項式アクティベーションで遭遇した精度問題に対処するために,平文モデルと競合する精度を与える新しいトレーニングアルゴリズムを提案する。評価では、最大2300万パラメータを持つ大規模モデルにおいて、競合する推論精度を維持しつつ、推論時間が4倍から90倍高速化されることを示す。

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC aspects, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows speedups in inference time of between $4\times$ and $90\times$ on large models with up to $23$ million parameters, while maintaining competitive inference accuracy.

翻訳日:2023-06-16 18:50:20 公開日:2023-06-14
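
Illustrative sketch (not taken from the paper): the abstract above replaces every ReLU with a polynomial approximation so that the activation can be evaluated with additions and multiplications only. The sketch below fits a low-degree polynomial to ReLU by ordinary least squares and evaluates it with Horner's rule. The degree, the fitting interval [-3, 3], and the fitting method are assumptions, and the authors' training algorithm and MPC protocols are not reproduced.

```python
def relu(x):
    return max(0.0, x)


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    size = degree + 1
    gram = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    moments = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(gram, moments)


def evaluate(coefficients, x):
    """Horner's rule: only additions and multiplications, no comparisons."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


xs = [i / 50.0 for i in range(-150, 151)]  # uniform grid on [-3, 3]
coefficients = fit_polynomial(xs, [relu(x) for x in xs], degree=2)
max_error = max(abs(evaluate(coefficients, x) - relu(x)) for x in xs)
```

The fit is only meaningful inside the chosen interval; outside it the quadratic grows much faster than ReLU, which is one plausible source of the accuracy issues with polynomial activations that the abstract mentions.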

# VIBR:ロバスト視覚制御のためのビュー不変値関数の学習

VIBR: Learning View-Invariant Value Functions for Robust Visual Control ( http://arxiv.org/abs/2306.08537v1 )

ライセンス: Link先を確認

Tom Dupuis, Jaonary Rabarisoa, Quoc-Cuong Pham and David Filliat

(参考訳) 画像におけるエンドツーエンドの強化学習は近年大きな進歩を見せている。データベースアプローチはデータ拡張とドメインのランダム化を活用し、表現学習手法は補助損失を使用してタスク関連の特徴を学習する。しかし、強化はいまだに視覚的に多様で、混乱と刺激的な騒音に満ちた環境に苦しむ。本研究では,多視点学習と不変予測を組み合わせることで,RLに基づくビジュモータ制御におけるアウト・オブ・ディストリビューション(OOD)の一般化ギャップを低減する手法であるVIBR(View-Invariant Bellman Residuals)を提案する。モデルフリーアプローチでは,表現学習の目的を付加する必要がなく,計算コストが制限されることなく,ベースライン性能が向上する。視覚摂動の高い複雑なビジュオモータ制御環境において,VIBRは既存の手法よりも優れていることを示す。提案手法は,多くの視覚摂動器の頑健性,OODの一般化,外挿機能を評価するため,現状の手法では未解決であるDistracting Control Suiteベンチマークの最先端結果を実現する。

End-to-end reinforcement learning on images has shown significant progress in recent years. Data-based approaches leverage data augmentation and domain randomization, while representation learning methods use auxiliary losses to learn task-relevant features. Yet, reinforcement learning still struggles in visually diverse environments full of distractions and spurious noise. In this work, we tackle the problem of robust visual control at its core and present VIBR (View-Invariant Bellman Residuals), a method that combines multi-view training and invariant prediction to reduce the out-of-distribution (OOD) generalization gap for RL-based visuomotor control. Our model-free approach improves baseline performance without the need for additional representation learning objectives and with limited additional computational cost. We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation. Our approach achieves state-of-the-art results on the Distracting Control Suite benchmark, a challenging benchmark still not solved by current methods, where we evaluate the robustness to a number of visual perturbators, as well as OOD generalization and extrapolation capabilities.

翻訳日:2023-06-16 18:49:59 公開日:2023-06-14

# 3量子クリフォード+CS作用素の生成と関係

Generators and relations for 3-qubit Clifford+CS operators ( http://arxiv.org/abs/2306.08530v1 )

ライセンス: Link先を確認

Xiaoning Bian and Peter Selinger

(参考訳) 3量子ビットClifford+CS作用素の群について、生成元と関係式による表示を与える。証明は概ね2つの部分から構成される:(1) Reidemeister-Schreierの定理を我々の以前の結果に再帰的に適用すること、(2) 何千もの関係式を17の関係式に単純化すること。(1)と(2)はいずれも、証明支援系Agdaで形式的に検証されている。Reidemeister-Schreierの定理は、スーパーモノイドの表示が与えられたときに部分モノイドの表示を計算するための構成的方法を与える。(2)を達成するために、Clifford+CS作用素のほぼ正規形を考案する。その過程で、Clifford+CS群内のいくつかの興味深い構造も同定する。具体的には、各元に一意な正規形を与えることのできる3つの異なる有限部分群を特定する。3量子ビットClifford+CS群はもちろん無限群であるが、これら3つの有限部分群の融合積であることを示す。この結果は、1量子ビットClifford+T群が2つの有限部分群の融合積であるという事実に類似している。

We give a presentation by generators and relations of the group of 3-qubit Clifford+CS operators. The proof roughly consists of two parts: (1) applying the Reidemeister-Schreier theorem recursively to an earlier result of ours; and (2) the simplification of thousands of relations into 17 relations. Both (1) and (2) have been formally verified in the proof assistant Agda. The Reidemeister-Schreier theorem gives a constructive method for computing a presentation of a sub-monoid given a presentation of the super-monoid. To achieve (2), we devise an almost-normal form for Clifford+CS operators. Along the way, we also identify several interesting structures within the Clifford+CS group. Specifically, we identify three different finite subgroups for whose elements we can give unique normal forms. We show that the 3-qubit Clifford+CS group, which is of course infinite, is the amalgamated product of these three finite subgroups. This result is analogous to the fact that the 1-qubit Clifford+T group is an amalgamated product of two finite subgroups.

翻訳日:2023-06-16 18:49:38 公開日:2023-06-14
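
Illustrative sketch (not taken from the paper): the abstract above is about presenting a gate group by generators and relations. As a small concrete reminder of what a relation is, the snippet below checks two elementary identities satisfied by the two-qubit controlled-S gate, namely CS^2 = CZ and CS^4 = I. These identities are standard facts about the gate and are not claimed to be among the 17 relations of the paper, whose three-qubit presentation and Agda formalization are not reproduced here.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def diagonal(entries):
    """Square matrix with the given entries on the diagonal and zeros elsewhere."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def power(matrix, exponent):
    """Repeated matrix product, starting from the identity."""
    result = diagonal([1] * len(matrix))
    for _ in range(exponent):
        result = matmul(result, matrix)
    return result


CS = diagonal([1, 1, 1, 1j])  # controlled-S on two qubits
CZ = diagonal([1, 1, 1, -1])  # controlled-Z on two qubits
IDENTITY = diagonal([1, 1, 1, 1])

cs_squared_is_cz = power(CS, 2) == CZ
cs_fourth_is_identity = power(CS, 4) == IDENTITY
```

A presentation in the sense of the abstract is a finite list of such identities from which every other identity between circuits over the generators can be derived.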

# SQL2Circuits: 量子自然言語処理法によるSQLクエリのメトリック推定

SQL2Circuits: Estimating Metrics for SQL Queries with A Quantum Natural Language Processing Method ( http://arxiv.org/abs/2306.08529v1 )

ライセンス: Link先を確認

Valter Uotila

(参考訳) 近年、量子コンピューティングは著しく発展している。 SQLクエリの様々なメトリクスを推定するアルゴリズムの開発は、クエリ最適化とデータベース性能に影響を与えるため、データベース研究において重要な研究課題となっている。この研究は、量子自然言語処理(QNLP)から着想を得たアプローチで、実行時間と濃度に関してSQLクエリを分類できる量子機械学習モデルを構築する。量子機械学習の観点から、我々のモデルと結果をQNLPの以前の研究と比較し、我々のモデルは分類タスクにおけるQNLPモデルと同様の精度に達すると結論づける。これは,QNLPにない問題に適用しても,QNLPモデルは有望な手法であることを示している。本研究では,その表現可能性とエンハング能力ヒストグラムを計算し,量子機械学習モデルについて検討した。結果は、モデルが表現しやすい性質を持つが、量子ハードウェア上で実行するには複雑ではないことを示している。

Quantum computing has developed significantly in recent years. Developing algorithms to estimate various metrics for SQL queries has been an important research question in database research, since the estimates affect query optimization and database performance. This work presents a quantum natural language processing (QNLP)-inspired approach for constructing a quantum machine learning model that can classify SQL queries with respect to their execution times and cardinalities. From the quantum machine learning perspective, we compare our model and results to previous research in QNLP and conclude that our model reaches accuracy similar to that of the QNLP model in the classification tasks. This indicates that the QNLP model is a promising method even when applied to problems outside QNLP. We study the developed quantum machine learning model by calculating its expressibility and entangling capability histograms. The results show that the model has favorable properties: it is expressible, yet not too complex to be executed on quantum hardware.

翻訳日:2023-06-16 18:49:21 公開日:2023-06-14

# Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images ( http://arxiv.org/abs/2306.08528v1 )

License: see the linked page

Sanmin Kim, Youngseok Kim, In-Jae Lee, Dongsuk Kum

Recent camera-based 3D object detection methods have introduced sequential frames to improve detection performance, hoping that multiple frames would mitigate the large depth estimation error. Despite improved detection performance, prior works rely on naive fusion methods (e.g., concatenation) or are limited to static scenes (e.g., temporal stereo), neglecting the importance of the motion cue of objects. These approaches do not fully exploit the potential of sequential images and show limited performance improvements. To address this limitation, we propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework to explicitly extract and leverage motion features. P2D predicts object information in the current frame using solely past frames to learn temporal motion features. We then introduce a novel temporal feature aggregation method that attentively exploits Bird's-Eye-View (BEV) features based on predicted object information, resulting in accurate 3D object detection. Experimental results demonstrate that P2D improves mAP and NDS by 3.0% and 3.7%, respectively, compared to the sequential image-based baseline, illustrating that incorporating a prediction scheme can significantly improve detection accuracy.

Translated: 2023-06-16 18:49:07 Published: 2023-06-14

# Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement ( http://arxiv.org/abs/2306.08527v1 )

License: see the linked page

Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that these two methods are special cases of the proposed framework. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models and suggest amenable hyper-parameters. Finally, we evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.

Translated: 2023-06-16 18:48:42 Published: 2023-06-14

# AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian ( http://arxiv.org/abs/2306.08526v1 )

License: see the linked page

Erion Çano

Lack of available resources such as text corpora for low-resource languages seriously hinders research on natural language processing and computational linguistics. This paper presents AlbMoRe, a corpus of 800 sentiment-annotated movie reviews in Albanian. Each text is labeled as positive or negative and can be used for sentiment analysis research. Preliminary results based on traditional machine learning classifiers trained with the AlbMoRe samples are also reported. They can serve as comparison baselines for future research experiments.

Translated: 2023-06-16 18:48:24 Published: 2023-06-14

# Measuring and Controlling Divisiveness in Rank Aggregation ( http://arxiv.org/abs/2306.08511v1 )

License: see the linked page

Rachael Colley, Umberto Grandi, César Hidalgo, Mariana Macedo and Carlos Navarrete

In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of our divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.

Translated: 2023-06-16 18:48:15 Published: 2023-06-14

# Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications ( http://arxiv.org/abs/2306.08510v1 )

License: see the linked page

David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen

Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as long short-term memories (LSTMs) or gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach results in the information from all the sources being contained in a single ordered vector, which is not optimal for permutation-invariant problems such as multi-source tracking. In this paper, we present a new recurrent architecture that uses unordered sets to represent both its input and its state, and that is invariant to permutations of the input set and equivariant to permutations of the state set. Hence, the information of every sound source is represented in an individual embedding, and the new estimates are assigned to the tracked trajectories regardless of their order.

Translated: 2023-06-16 18:48:03 Published: 2023-06-14

# Qubit efficient quantum algorithms for the vehicle routing problem on quantum computers of the NISQ era ( http://arxiv.org/abs/2306.08507v1 )

License: see the linked page

Ioannis D. Leonidas, Alexander Dukakis, Benjamin Tan, Dimitris G. Angelakis

The vehicle routing problem with time windows (VRPTW) is a classic optimization problem that arises in many different areas, such as logistics and transportation. The goal of the VRPTW is to find the shortest possible route for a fleet of vehicles to visit a set of destinations. In recent years, there has been growing interest in using variational quantum algorithms (VQAs) to find approximate solutions to problems that can be formulated as quadratic unconstrained binary optimization (QUBO) problems. In this work, we formulate the VRPTW as a QUBO and apply a quantum variational approach to the VRPTW, using our earlier suggested encoding scheme described in [1] to drastically reduce the number of qubits required. We evaluate our approach on a set of VRPTW instances ranging from 11 to 3964 routes, constructed with data provided by researchers from ExxonMobil. We compare the solutions obtained with those of standard full-encoding approaches, for which the maximum problem size possible in the NISQ era is on the order of 20-30 routes. We run our algorithms in simulators as well as on cloud quantum hardware provided by IBMQ, AWS (Rigetti) and IonQ, and benchmark the results against each other as well as against the simulators. We show that our approach can find approximate solutions to the VRPTW that are comparable to the solutions found by quantum algorithms using the full encoding. Our results suggest that our encoding approach is a promising way to drastically reduce the number of qubits required to find decent approximate solutions for industry-based optimization problems.

Translated: 2023-06-16 18:47:36 Published: 2023-06-14
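
To make the QUBO formulation mentioned in the abstract above concrete, here is a minimal sketch on a toy problem: pick exactly one of three candidate routes at minimum cost. It is not the authors' VRPTW formulation, and it does not reproduce the qubit-efficient encoding of [1]. The costs, the penalty weight, and the helper names `build_qubo`, `energy`, and `brute_force` are all assumptions made for illustration. The "exactly one" constraint is folded into the objective as a penalty term P(Σx − 1)², which for binary variables expands to diagonal entries c_i − P and off-diagonal entries 2P (the constant P is dropped).

```python
from itertools import product

def build_qubo(costs, penalty):
    """Upper-triangular QUBO for: minimise sum(c_i * x_i) s.t. exactly one x_i = 1."""
    n = len(costs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = costs[i] - penalty          # linear term (x_i^2 = x_i for binaries)
        for j in range(i + 1, n):
            q[i][j] = 2.0 * penalty           # pairwise penalty for picking two routes
    return q

def energy(q, x):
    """Evaluate x^T Q x for an upper-triangular Q and a binary tuple x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def brute_force(q):
    """Exhaustively search all 2^n assignments (only feasible for tiny n)."""
    n = len(q)
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))

# Hypothetical route costs; the penalty must exceed the largest cost.
Q = build_qubo([3.0, 1.0, 2.0], penalty=10.0)
best = brute_force(Q)
```

A variational quantum algorithm would replace `brute_force` with a parameterised circuit whose measured bitstrings are scored by the same `energy` function; the exhaustive search here only serves to check the formulation on a small instance.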

# The Universal Law of Generalization Holds for Naturalistic Stimuli ( http://arxiv.org/abs/2306.08564v1 )

License: see the linked page

Raja Marjieh, Nori Jacoby, Joshua C. Peterson, Thomas L. Griffiths

Shepard's universal law of generalization is a remarkable hypothesis about how intelligent organisms should perceive similarity. In its broadest form, the universal law states that the level of perceived similarity between a pair of stimuli should decay as a concave function of their distance when embedded in an appropriate psychological space. While extensively studied, evidence in support of the universal law has relied on low-dimensional stimuli and small stimulus sets that are very different from their real-world counterparts. This is largely because pairwise comparisons -- as required for similarity judgments -- scale quadratically in the number of stimuli. We provide direct evidence for the universal law in a naturalistic high-dimensional regime by analyzing an existing dataset of 214,200 human similarity judgments and a newly collected dataset of 390,819 human generalization judgments (N=2406 US participants) across three sets of natural images.

Translated: 2023-06-16 18:40:16 Published: 2023-06-14

# Microscopic origin of polarization-entangled Stokes-anti-Stokes photons in diamond ( http://arxiv.org/abs/2306.08563v1 )

License: see the linked page

Tiago A. Freitas, Paula Machado, Lucas V. de Carvalho, Diego Sier, Raul Corrêa, Riichiro Saito, Marcelo F. Santos, Carlos H. Monken, and Ado Jorio

Violation of the Clauser-Horne-Shimony-Holt inequality for the polarization of Stokes-anti-Stokes (SaS) photon pairs near a Raman resonance is demonstrated. The pairs are generated by shining a pulsed laser on a diamond sample, where two photons of the laser are converted into a pair of photons of different frequencies. The generated pairs are collected by standard Bell analyzers and shown to be entangled in polarization, with the degree of entanglement depending on the spectral region and on the orientation of the polarization of the incident light with respect to the crystallographic orientation of the sample. This result opens up the possibility of combining quantum optics and SaS Raman spectroscopy in order to improve materials science and quantum information.

Translated: 2023-06-16 18:40:00 Published: 2023-06-14
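
As background for the Clauser-Horne-Shimony-Holt (CHSH) test named in the abstract above, the following sketch evaluates the CHSH quantity for an ideal polarization-entangled pair. It is a textbook calculation, not the authors' analysis: it assumes the ideal state (|HH> + |VV>)/√2, for which the polarization correlation at analyzer angles a and b is cos(2(a − b)), and it uses the standard analyzer settings 0°, 45°, 22.5°, 67.5°. Real SaS pairs would yield a smaller value that depends on the spectral region and sample orientation.

```python
import math

def correlation(a_deg, b_deg):
    """E(a, b) = cos(2(a - b)) for the ideal state (|HH> + |VV>)/sqrt(2)."""
    return math.cos(2.0 * math.radians(a_deg - b_deg))

def chsh(a, a_prime, b, b_prime):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))

# Standard analyzer angles (degrees) that maximise S for this state.
S = chsh(0.0, 45.0, 22.5, 67.5)

# Any local hidden-variable model obeys |S| <= 2.
violates_classical_bound = S > 2.0
```

For these settings S equals 2√2, the maximum allowed by quantum mechanics, so any measured value above 2 is what counts as a violation.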

# Adaptation of Student Behavioural Routines during COVID-19: A Multimodal Approach ( http://arxiv.org/abs/2306.08561v1 )

License: see the linked page

Nicolò A. Girardini, Simone Centellegher, Andrea Passerini, Ivano Bison, Fausto Giunchiglia and Bruno Lepri

One population group that had to significantly adapt and change their behaviour during the COVID-19 pandemic is students. While previous studies have extensively investigated the impact of the pandemic on their psychological well-being and academic performance, limited attention has been given to their activity routines. In this work, we analyze students' behavioural changes by examining qualitative and quantitative differences in their daily routines between two distinct periods (2018 and 2020). Using an Experience Sampling Method (ESM) that captures multimodal self-reported data on students' activity, locations and sociality, we apply Non-Negative Matrix Factorization (NMF) to extract meaningful behavioural components, and quantify the variations in behaviour between students in 2018 and 2020. Surprisingly, despite the presence of COVID-19 restrictions, we find minimal changes in the activities performed by students, and the diversity of activities also remains largely unaffected. Leveraging the richness of the data at our disposal, we discover that adaptation to the pandemic primarily occurred in the location and sociality dimensions.

Translated: 2023-06-16 18:39:46 Published: 2023-06-14

# Quantum computing with subwavelength atomic arrays ( http://arxiv.org/abs/2306.08555v1 )

License: see the linked page

Freya Shah, Taylor L. Patti, Oriol Rubies-Bigorda, Susanne F. Yelin

Photon-mediated interactions in subwavelength atomic arrays have numerous applications in quantum science. In this manuscript, we explore the potential of three-level quantum emitters, or "impurities", embedded in a two-dimensional atomic array to serve as a platform for quantum computation. By exploiting the altered behavior of impurities as a result of the induced dipole-dipole interactions mediated by the subwavelength array, we implement a set of universal quantum gates consisting of the $\sqrt{\text{iSWAP}}$ and single-qubit rotations. We demonstrate that these gates have very high fidelities and coherence times, as long as the atoms remain within a proximal range. Finally, we implement quantum circuits leading to the generation of the maximally entangled two-qubit Bell states, as well as the entangled three-qubit GHZ state. These findings establish subwavelength emitter arrays as an alternative platform for quantum computation and quantum simulation.

Translated: 2023-06-16 18:39:23 Published: 2023-06-14
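
The $\sqrt{\text{iSWAP}}$ gate named in the abstract above can be checked numerically with a few lines of plain Python. The sketch below uses the standard textbook matrix in the basis |00>, |01>, |10>, |11>; it says nothing about how the gate is physically realised in the atomic array, and the helper names `matmul` and `apply` are illustrative only. It verifies that applying the gate twice gives iSWAP and that a single application to |01> produces an equal-weight superposition of |01> and |10>.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Standard sqrt(iSWAP) matrix, basis order |00>, |01>, |10>, |11>.
SQRT_ISWAP = [
    [1, 0,      0,      0],
    [0, S,      1j * S, 0],
    [0, 1j * S, S,      0],
    [0, 0,      0,      1],
]

# iSWAP swaps |01> and |10> and multiplies each by i.
ISWAP = [
    [1, 0,  0,  0],
    [0, 0,  1j, 0],
    [0, 1j, 0,  0],
    [0, 0,  0,  1],
]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(gate, state):
    """Apply a gate (matrix) to a state vector."""
    return [sum(gate[i][k] * state[k] for k in range(len(state)))
            for i in range(len(gate))]

squared = matmul(SQRT_ISWAP, SQRT_ISWAP)   # should equal ISWAP
entangled = apply(SQRT_ISWAP, [0, 1, 0, 0])  # (|01> + i|10>)/sqrt(2)
```

The output state (|01> + i|10>)/√2 is maximally entangled, which is why this gate together with single-qubit rotations suffices for universal computation.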

# Noise Stability Optimization for Flat Minima with Optimal Convergence Rates ( http://arxiv.org/abs/2306.08553v1 )

License: see the linked page

Haotian Ju, Dongyue Li, and Hongyang R. Zhang

We consider finding flat, local minimizers by adding average weight perturbations. Given a nonconvex function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a $d$-dimensional distribution $\mathcal{P}$ which is symmetric at zero, we perturb the weight of $f$ and define $F(W) = \mathbb{E}[f({W + U})]$, where $U$ is a random sample from $\mathcal{P}$. This injection induces regularization through the Hessian trace of $f$ for small, isotropic Gaussian perturbations. Thus, the weight-perturbed function is biased towards minimizers with low Hessian trace. Several prior works have studied settings related to this weight-perturbed function by designing algorithms to improve generalization. Still, convergence rates are not known for finding minima under the average perturbations of the function $F$. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of $\mathcal{P}$ to reduce variance. We then provide a rigorous analysis, showing matching upper and lower bounds of our algorithm for finding an approximate first-order stationary point of $F$ when the gradient of $f$ is Lipschitz-continuous. We empirically validate our algorithm for several image classification tasks with various architectures. Compared to sharpness-aware minimization, we note a 12.6% and 7.8% drop in the Hessian trace and top eigenvalue of the found minima, respectively, averaged over eight datasets. Ablation studies validate the benefit of the design of our algorithm.

Translated: 2023-06-16 18:39:07 Published: 2023-06-14
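
The abstract above describes an SGD-like method that injects noise before computing gradients and uses the symmetry of $\mathcal{P}$ to reduce variance. One plausible reading, sketched below, is a two-point (antithetic) estimator that averages the gradients at $W + U$ and $W - U$. This is an assumption for illustration, not the authors' exact algorithm, and the names `perturbed_gradient`, `noisy_sgd`, and `quadratic_grad`, as well as the step size and noise scale, are hypothetical. The toy objective is a quadratic, for which the paired perturbations cancel and the estimate equals the noise-free gradient up to rounding.

```python
import random

def perturbed_gradient(grad_f, w, sigma, rng):
    """Estimate grad F(w), F(w) = E[f(w + U)], from one symmetric pair (+U, -U)."""
    u = [rng.gauss(0.0, sigma) for _ in w]
    g_plus = grad_f([wi + ui for wi, ui in zip(w, u)])
    g_minus = grad_f([wi - ui for wi, ui in zip(w, u)])
    # Averaging the pair cancels the odd-order noise terms.
    return [0.5 * (gp + gm) for gp, gm in zip(g_plus, g_minus)]

def noisy_sgd(grad_f, w0, lr=0.1, sigma=0.05, steps=50, seed=0):
    """Plain gradient steps driven by the noise-injected gradient estimate."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(steps):
        g = perturbed_gradient(grad_f, w, sigma, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def quadratic_grad(w):
    """Gradient of the toy objective f(w) = 0.5 * ||w||^2."""
    return list(w)

w_final = noisy_sgd(quadratic_grad, [1.0, -2.0])
```

On a real network `grad_f` would be a minibatch gradient, and the pair costs two backward passes per step; the variance reduction from symmetry is what the sketch is meant to show.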

# User Simulation for Evaluating Information Access Systems ( http://arxiv.org/abs/2306.08550v1 )

License: see the linked page

Krisztian Balog and ChengXiang Zhai

Information access systems, such as search engines, recommender systems, and conversational assistants, have become integral to our daily lives as they help us satisfy our information needs. However, evaluating the effectiveness of these systems presents a long-standing and complex scientific challenge. This challenge is rooted in the difficulty of assessing a system's overall effectiveness in assisting users to complete tasks through interactive support, and is further exacerbated by the substantial variation in user behaviour and preferences. To address this challenge, user simulation emerges as a promising solution. This book focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We begin with a background of information access system evaluation and explore the diverse applications of user simulation. Subsequently, we systematically review the major research progress in user simulation, covering general frameworks for designing user simulators, the use of user simulation for evaluation, and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. Recognizing that user simulation is an interdisciplinary research topic, we attempt, whenever possible, to establish connections with related fields, including machine learning, dialogue systems, user modeling, and economics. We end the book with a detailed discussion of important future research directions, many of which extend beyond the evaluation of information access systems and are expected to have broader impact on how to evaluate interactive intelligent systems in general.

Translated: 2023-06-16 18:38:33 Published: 2023-06-14

# An Exploratory Study of Masked Face Recognition with Machine Learning Algorithms ( http://arxiv.org/abs/2306.08549v1 )

License: see the linked page

Megh Pudyel and Mustafa Atay

Automated face recognition is a widely adopted machine learning technology for contactless identification of people in various processes such as automated border control, secure login to electronic devices, community surveillance, tracking school attendance, and workplace clock-in and clock-out. The use of face masks has become crucial in our daily life with the recent worldwide COVID-19 pandemic. The use of face masks causes the performance of conventional face recognition technologies to degrade considerably. The effect of mask-wearing on face recognition is still an understudied issue. In this paper, we address this issue by evaluating the performance of a number of face recognition models, which are tested by identifying masked and unmasked face images. We use six conventional machine learning algorithms, namely SVC, KNN, LDA, DT, LR and NB, to find out which perform best and which perform poorly in the presence of masked face images. Local Binary Pattern (LBP) is utilized as the feature extraction operator. We generated and used synthesized masked face images. We prepared unmasked, masked, and half-masked training datasets and evaluated the face recognition performance against both masked and unmasked images to present a broad view of this crucial problem. We believe that our study is unique in elaborating mask-aware facial recognition with almost all possible scenarios, including half_masked-to-masked and half_masked-to-unmasked, besides evaluating a larger number of conventional machine learning algorithms compared with other studies in the literature.

Translated: 2023-06-16 18:38:11 Published: 2023-06-14
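
The abstract above uses Local Binary Pattern (LBP) as the feature extraction operator. The sketch below shows the basic 3x3 LBP variant in plain Python: each interior pixel is compared with its eight neighbours, the comparisons form an 8-bit code, and the histogram of codes is the feature vector. The abstract does not state which LBP variant, radius, or bit ordering the authors used, so the clockwise-from-top-left ordering and the function names `lbp_code` and `lbp_histogram` are assumptions.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch (list of three rows of three intensities)."""
    center = patch[1][1]
    # Neighbour positions, clockwise starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:     # neighbour at least as bright as the centre
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist

# Tiny hypothetical grayscale image (4x4), giving 2x2 interior pixels.
image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(image)
```

A classifier such as SVC or KNN would then be trained on these histograms, typically computed per image region and concatenated.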

# Modeling Non-Covalent Interatomic Interactions on a Photonic Quantum Computer ( http://arxiv.org/abs/2306.08544v1 )

License: see the linked page

Matthieu Sarkis, Alessio Fallani, Alexandre Tkatchenko

Non-covalent interactions are a key ingredient in determining the structure, stability, and dynamics of materials, molecules, and biological complexes. However, accurately capturing these interactions is a complex quantum many-body problem, with no efficient solution available on classical computers. A widely used model to accurately and efficiently describe non-covalent interactions is the Coulomb-coupled quantum Drude oscillator (cQDO) many-body Hamiltonian, for which no exact solution is known. We show that the cQDO model lends itself naturally to simulation on a photonic quantum computer, and we calculate the binding energy curve of diatomic systems by leveraging Xanadu's Strawberry Fields photonics library. Our study substantially extends the applicability of quantum computing to atomistic modeling, by showing a proof-of-concept application to non-covalent interactions, beyond the standard electronic-structure problem of small molecules. Remarkably, we find that two coupled bosonic QDOs exhibit a stable bond. In addition, our study suggests efficient functional forms for cQDO wavefunctions that can be optimized on classical computers, and capture the bonded-to-noncovalent transition for increasing interatomic distances.

Translated: 2023-06-16 18:37:41 Published: 2023-06-14

# Knowledge Distillation of Large Language Models ( http://arxiv.org/abs/2306.08543v1 )

License: see the linked page

Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or to training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge from white-box generative LLMs is still under-explored, and this question becomes more and more important with the prosperity of LLMs. In this work, we propose MiniLLM, which distills smaller language models from larger generative language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. Extensive experiments in the instruction-following setting show that the MiniLLM models generate more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance. Our method is also scalable for different model families with 120M to 13B parameters. We will release our code and model checkpoints at https://aka.ms/MiniLLM.

Translated: 2023-06-16 18:37:17 Published: 2023-06-14
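
The difference between the forward and reverse KLD objectives described in the abstract above can be seen on a three-outcome toy distribution. The sketch below is only an illustration with made-up numbers, not the MiniLLM training procedure: the teacher has two modes and a low-probability middle outcome, one student concentrates on a single mode, and the other spreads mass evenly. Forward KLD, KL(teacher || student), favours the spread-out student, whereas reverse KLD, KL(student || teacher), penalises the mass that student places on the teacher's low-probability outcome and favours the mode-seeking one.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def forward_kl(teacher, student):
    """Standard KD objective: KL(teacher || student)."""
    return kl(teacher, student)

def reverse_kl(teacher, student):
    """Reverse objective: KL(student || teacher)."""
    return kl(student, teacher)

# Hypothetical distributions over three outcomes.
teacher = [0.495, 0.010, 0.495]           # two modes, one low-probability region
student_mode = [0.98, 0.01, 0.01]         # commits to a single teacher mode
student_spread = [1 / 3, 1 / 3, 1 / 3]    # covers everything, including the gap

forward_prefers_spread = (forward_kl(teacher, student_spread)
                          < forward_kl(teacher, student_mode))
reverse_prefers_mode = (reverse_kl(teacher, student_mode)
                        < reverse_kl(teacher, student_spread))
```

For a small student that cannot represent every mode of a large teacher, this is the behaviour the abstract argues for: it is better to generate confidently within one mode than to assign probability to outputs the teacher considers unlikely.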

# Zero-Shot 3D Shape Sketch View Similarity and Retrieval ( http://arxiv.org/abs/2306.08541v1 )

License: see the linked page

Gianluca Berardi and Yulia Gryaditskaya

We conduct a detailed study of the ability of feature layers of ViT and ResNet models pretrained on pretext tasks to quantify the similarity between pairs of 2D sketch views of individual 3D shapes. We assess the performance in terms of the models' abilities to retrieve similar views and ground-truth 3D shapes. Going beyond a naive zero-shot performance study, we investigate alternative fine-tuning strategies on one or several shape classes, and their generalization to other shape classes. Leveraging progress in NPR (Non-Photo Realistic) rendering, we generate synthetic sketch views in several styles, which we use to fine-tune pretrained foundation models using contrastive learning. We study how the scale of an object in a sketch affects the similarity of features at different network layers. We observe that depending on the scale, different feature layers can be more indicative of shape similarities in sketch views. However, we find that similar object scales result in the best performance of ViT and ResNet. In summary, we show that careful selection of a fine-tuning strategy allows us to obtain consistent improvement in zero-shot shape retrieval accuracy. We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.

Translated: 2023-06-16 18:36:53 Published: 2023-06-14

# Fed-ZERO: Efficient Zero-shot Personalization with Federated Mixture of Experts ( http://arxiv.org/abs/2306.08586v1 )

License: see the linked page

Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis

One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, personalization often requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It thus remains open how one can achieve zero-shot personalization in these scenarios. We propose a novel solution by using a Mixture-of-Experts (MoE) framework within an FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy by up to 18% in state-of-the-art FL settings, while maintaining competitive zero-shot performance. In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve the state-of-the-art performance on common FL benchmarks.

Translated: 2023-06-16 18:31:03 Published: 2023-06-14

# わずか5000万個のトフォリゲートで256ビット楕円曲線秘密鍵を計算する方法

How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates ( http://arxiv.org/abs/2306.08585v1 )

ライセンス: Link先を確認

Daniel Litinski

(参考訳) 我々は、シリコンフォトニクスに着想を得たアクティブボリュームアーキテクチャにおける資源推定のケーススタディとして、楕円曲線秘密鍵の計算にショアのアルゴリズムを用いる。このアーキテクチャでは、フォールトトレラントなサーフェスコード量子コンピュータは、対数個の非局所的なモジュール間接続を持つモジュールで構成され、アルゴリズムのコスト関数が2Dローカルアーキテクチャの場合とは異なるものになる。非局所接続により、動作領域に応じて鍵当たりのコストが300～700分の1に削減されることがわかった。10%のしきい値で、10-$\mu$sのコードサイクルと非局所接続を仮定すると、それぞれ1152個の物理量子ビットを持つ6000個のモジュールを使用して、10分ごとに1つの鍵を生成できる。対照的に、厳密な2Dローカル接続を持つデバイスは、より多くの量子ビットを必要とし、38時間ごとに1つの鍵を生成する。また、鍵当たりのToffoli数を最大5分の1に減らす、アーキテクチャに依存しない単純なアルゴリズム修正も見いだした。これらの修正には、格納された状態を複数の鍵に再利用することと、モジュラ除算演算のコストをアルゴリズムの複数の並列インスタンスに分散させることが含まれる。

We use Shor's algorithm for the computation of elliptic curve private keys as a case study for resource estimates in the silicon-photonics-inspired active-volume architecture. Here, a fault-tolerant surface-code quantum computer consists of modules with a logarithmic number of non-local inter-module connections, modifying the algorithmic cost function compared to 2D-local architectures. We find that the non-local connections reduce the cost per key by a factor of 300-700 depending on the operating regime. At 10% threshold, assuming a 10-$\mu$s code cycle and non-local connections, one key can be generated every 10 minutes using 6000 modules with 1152 physical qubits each. By contrast, a device with strict 2D-local connectivity requires more qubits and produces one key every 38 hours. We also find simple architecture-independent algorithmic modifications that reduce the Toffoli count per key by up to a factor of 5. These modifications involve reusing the stored state for multiple keys and spreading the cost of the modular division operation over multiple parallel instances of the algorithm.
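
The figures quoted in the abstract can be combined with simple arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative. Note that the ratio it computes is a ratio of time per key only, whereas the 300-700 factor in the abstract refers to cost per key, a different quantity (the abstract also says the 2D-local device needs more qubits).

```python
# Figures taken from the abstract (non-local, active-volume setting).
modules = 6000
physical_qubits_per_module = 1152
minutes_per_key_nonlocal = 10

# Figure taken from the abstract (strict 2D-local connectivity).
hours_per_key_2d_local = 38

# Total physical qubits in the non-local device.
total_physical_qubits = modules * physical_qubits_per_module

# Ratio of time per key between the two devices (time only, not cost).
minutes_per_key_2d_local = hours_per_key_2d_local * 60
time_ratio = minutes_per_key_2d_local / minutes_per_key_nonlocal

# Keys produced per day at one key every 10 minutes.
keys_per_day_nonlocal = (24 * 60) // minutes_per_key_nonlocal

print(total_physical_qubits, time_ratio, keys_per_day_nonlocal)
```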

翻訳日:2023-06-16 18:30:40 公開日:2023-06-14

# 同時解釈データを用いたタグ付きエンドツーエンド同時音声翻訳訓練

Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data ( http://arxiv.org/abs/2306.08582v1 )

ライセンス: Link先を確認

Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

(参考訳) 同時音声翻訳(SimulST)は、部分的な音声入力を漸進的に翻訳する。入力と出力の単調な対応は遅延を小さくするうえで望ましいが、英語と日本語のように離れた言語対では成り立たない。この問題に対する有望なアプローチは、同時通訳(SI)データを用いてSimulSTモデルを訓練し、SIを模倣させることである。しかし、そのようなSIデータの規模は限られているため、SIデータは、オフラインで翻訳された通常の対訳データと併用する必要がある。本稿では、SIデータとオフラインデータの混合データを用いてSimulSTモデルを効果的に訓練する方法を提案する。提案手法は、SIスタイルまたはオフラインスタイルの出力を生成するようモデルに指示するスタイルタグを付けた混合データを用いて、単一のモデルを訓練する。実験結果から、さまざまな遅延域でBLEURTが改善することが示され、分析の結果、提案モデルはベースラインよりもSIスタイルの出力を多く生成することが明らかとなった。

Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although a monotonic correspondence between input and output is preferable for smaller latency, it does not hold for distant language pairs such as English and Japanese. A prospective approach to this problem is to mimic simultaneous interpretation (SI) using SI data to train a SimulST model. However, the size of such SI data is limited, so the SI data should be used together with ordinary bilingual data whose translations are produced offline. In this paper, we propose an effective way to train a SimulST model using a mix of SI and offline data. The proposed method trains a single model using the mixed data with style tags that tell the model to generate SI- or offline-style outputs. Experiment results show improvements of BLEURT in different latency ranges, and our analyses revealed the proposed model generates SI-style outputs more than the baseline.
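
A minimal sketch of how style tags could be attached when mixing the two data types is shown below. The abstract does not specify the tag strings or where they are placed, so the tokens `<si>` and `<off>`, the choice to prepend them to the target text, and the helper names are assumptions for illustration only.

```python
SI_TAG = "<si>"        # hypothetical tag for simultaneous-interpretation style
OFFLINE_TAG = "<off>"  # hypothetical tag for offline-translation style


def tag_target(target_text, style):
    """Prepend a style tag so one model can learn both output styles."""
    tag = {"si": SI_TAG, "offline": OFFLINE_TAG}[style]
    return f"{tag} {target_text}"


def build_mixed_corpus(si_pairs, offline_pairs):
    """Merge (source, target) pairs from both data types into one tagged set."""
    mixed = [(src, tag_target(tgt, "si")) for src, tgt in si_pairs]
    mixed += [(src, tag_target(tgt, "offline")) for src, tgt in offline_pairs]
    return mixed


# Sources are stand-in utterance IDs; real sources would be speech features.
si_data = [("utt_001", "partial interpreter-style output")]
offline_data = [("utt_002", "full offline translation")]
corpus = build_mixed_corpus(si_data, offline_data)
```

Under this assumption, choosing the output style at inference time amounts to forcing the desired tag as the first decoded token.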

翻訳日:2023-06-16 18:30:21 公開日:2023-06-14

# 低リソース音声認識改善のためのデータ拡張のための言語間マッピングの学習

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition ( http://arxiv.org/abs/2306.08577v1 )

ライセンス: Link先を確認

Muhammad Umar Farooq, Thomas Hain

(参考訳) 言語間リソースの活用は、低リソース言語のデータ不足を補う効果的な方法である。近年、言語間の音響・音声的類似性を写像関数として学習するようモデルを訓練する、新しい多言語モデル融合手法が提案されている。しかし、ハイブリッドDNN-HMM ASRシステムの訓練には手作りのレキシコンが使われてきた。この依存を取り除くため、学習可能な言語間マッピングの概念をエンドツーエンド音声認識に拡張する。さらに、マッピングモデルを用いて、並列データを使わずにソース言語をターゲット言語に音訳する。最後に、ソース音声とその音訳をデータ拡張に用いて、ターゲット言語のASRを再訓練する。その結果、提案するマッピングモデルを後段に置くことで、任意のソース言語ASRモデルを低リソースのターゲット言語の認識に利用できることが示された。さらに、データ拡張により、ベースラインの単言語モデルに対して最大5%の相対的な改善が得られる。

Exploiting cross-lingual resources is an effective way to compensate for the data scarcity of low-resource languages. Recently, a novel multilingual model fusion technique has been proposed where a model is trained to learn cross-lingual acoustic-phonetic similarities as a mapping function. However, handcrafted lexicons have been used to train hybrid DNN-HMM ASR systems. To remove this dependency, we extend the concept of learnable cross-lingual mappings to end-to-end speech recognition. Furthermore, mapping models are employed to transliterate the source languages to the target language without using parallel data. Finally, the source audio and its transliteration are used for data augmentation to retrain the target language ASR. The results show that any source language ASR model can be used for low-resource target language recognition when followed by the proposed mapping model. Furthermore, data augmentation results in a relative gain of up to 5% over the baseline monolingual model.

翻訳日:2023-06-16 18:30:05 公開日:2023-06-14

# リモートセンシングにおける教師付き変分オートエンコーダに基づくラベルノイズロバスト画像表現学習

Label Noise Robust Image Representation Learning based on Supervised Variational Autoencoders in Remote Sensing ( http://arxiv.org/abs/2306.08575v1 )

ライセンス: Link先を確認

Gencer Sumbul and Begüm Demir

(参考訳) 公開されたテーママップとクラウドソースデータにより、ディープニューラルネットワーク(DNN)のトレーニングには、リモートセンシング(RS)イメージアノテーションをゼロコストで収集することができる。しかし、そのようなアノテーション源は、トレーニングデータにノイズラベルを含むリスクを高め、不正確なRS画像表現学習(IRL)を引き起こす可能性がある。本稿では,RSで検討されている学習課題とは無関係に,IRL上のノイズラベルの干渉を防止することを目的としたラベル頑健なIRL手法を提案する。提案手法は,教師付き変分オートエンコーダ(SVAE)と任意の種類のDNNを組み合わせる。これは画像の特徴に基づいて変動生成プロセスを定義することで達成される。これにより、SVAEから得られた損失値と検討されたDNNのタスクヘッドに基づいて、IRLの各トレーニングサンプルの重要性を定義することができる。そして,提案手法はノイズラベルを持つ画像に対して重要度を低くするとともに,irl中に正しいラベルを持つ画像に対して高い重要度を与える。 rs画像に適用したラベル雑音ロバストirl法と比較して,提案手法の有効性を実験的に示した。提案手法のコードはhttps://git.tu-berlin.de/rsim/RS-IRL-SVAEで公開されている。

Thanks to publicly available thematic maps and crowd-sourced data, remote sensing (RS) image annotations can be gathered at zero cost for training deep neural networks (DNNs). However, such annotation sources may increase the risk of including noisy labels in training data, leading to inaccurate RS image representation learning (IRL). To address this issue, in this paper we propose a label noise robust IRL method that aims to prevent the interference of noisy labels on IRL, independently from the learning task being considered in RS. To this end, the proposed method combines a supervised variational autoencoder (SVAE) with any kind of DNN. This is achieved by defining a variational generative process based on image features. This allows us to define the importance of each training sample for IRL based on the loss values acquired from the SVAE and the task head of the considered DNN. Then, the proposed method assigns lower importance to images with noisy labels, while giving higher importance to those with correct labels during IRL. Experimental results show the effectiveness of the proposed method when compared to well-known label noise robust IRL methods applied to RS images. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/RS-IRL-SVAE.
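
The loss-based sample importance described above can be sketched as follows. The abstract does not give the actual weighting function, so the softmax over negated combined losses, the `temperature` parameter, and both function names are assumptions; the authors' repository linked above is the reference for the real method.

```python
import math


def sample_importance(svae_losses, task_losses, temperature=1.0):
    """Map per-sample losses to importance weights that sum to 1.

    Assumption: a sample whose combined SVAE + task-head loss is high is more
    likely to carry a noisy label, so it receives a lower weight.
    """
    combined = [s + t for s, t in zip(svae_losses, task_losses)]
    lowest = min(combined)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(c - lowest) / temperature) for c in combined]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_loss(task_losses, weights):
    """Importance-weighted training loss for one mini-batch."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Third sample has much larger losses, as a mislabeled image might.
svae = [0.2, 0.3, 2.5]
task = [0.1, 0.2, 3.0]
weights = sample_importance(svae, task)
batch_loss = weighted_loss(task, weights)
```

In a real training loop the weights would typically be treated as constants (no gradient flowing through them), so that the model cannot lower its loss simply by down-weighting hard samples.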

翻訳日:2023-06-16 18:29:51 公開日:2023-06-14

# 分子内の電子デコヒーレンス経路のマッピング

Mapping Electronic Decoherence Pathways in Molecules ( http://arxiv.org/abs/2306.08574v1 )

ライセンス: Link先を確認

Ignacio Gustin, Chang Woo Kim, David W. McCamant and Ignacio Franco

(参考訳) 分子電子量子デコヒーレンスを支配する基本的な化学原理を確立することは、依然として顕著な課題である。溶媒や分子内振動、化学機能化といった基本的な問題は、電子デコヒーレンス全体への寄与は未解決のままであり、最先端の理論的および実験的アプローチの範囲を超えている。本研究では, 電子量子コヒーレンス損失の解明を可能にする, 縮合相環境に浸漬した分子色相の脱コヒーレンス経路を分離する手法を開発する。そのため, RR分光法は, 室温, 溶媒中, 蛍光分子および非蛍光分子において, 完全化学量で分子スペクトル密度を再構築するための一般的な実験方法として同定された。次に、スペクトル密度から脱コヒーレンスダイナミクスを定量的に捉え、脱コヒーレンス経路を個々の分子振動や溶媒モードによる寄与に分解して同定する方法を示す。 DNA塩基チミンとその誘導体の水中における電子的脱コヒーレンス経路の解析による戦略の有用性について述べる。この場合の電子コヒーレンスは ~30 fs で崩壊する。早期のコヒーレンス損失は分子内振動によって決定され、溶媒による全体的な崩壊が決定される。チミンの化学置換は、水との水素結合相互作用による脱コヒーレンスを調節し、最も速い脱コヒーレンス速度をもたらす。温度の上昇は溶媒の寄与の重要性を高めるため脱コヒーレンスを速くするが、初期の脱コヒーレンスダイナミクスはそのまま残る。開発された戦略は、分子構造と溶媒構造と量子デコヒーレンスとの接続を確立する重要な機会を開き、それを合理的に調節する化学戦略を開発する。

Establishing the fundamental chemical principles that govern molecular electronic quantum decoherence has remained an outstanding challenge. Fundamental questions such as how solvent and intramolecular vibrations or chemical functionalization contribute to the overall electronic decoherence remain unanswered and are beyond the reach of state-of-the-art theoretical and experimental approaches. We address this challenge by developing a strategy to isolate decoherence pathways for molecular chromophores immersed in condensed phase environments that enables elucidating how electronic quantum coherence is lost. For this, we first identify RR spectroscopy as a general experimental method to reconstruct molecular spectral densities with full chemical complexity at room temperature, in solvent, and for fluorescent and non-fluorescent molecules. We then show how to quantitatively capture the decoherence dynamics from the spectral density and identify decoherence pathways by decomposing the overall coherence loss into contributions due to individual molecular vibrations and solvent modes. We illustrate the utility of the strategy by analyzing the electronic decoherence pathways of the DNA base thymine and its derivatives in water. The electronic coherences in this case decay in ~30 fs. The early-time coherence loss is determined by intramolecular vibrations while the overall decay by solvent. Chemical substitution of thymine modulates the decoherence with hydrogen-bond interactions with water leading to the fastest decoherence rates. Increasing temperature leads to faster decoherence as it enhances the importance of solvent contributions but leaves the early-time decoherence dynamics intact. The developed strategy opens key opportunities to establish the connection between molecular and solvent structure and quantum decoherence as needed to develop chemical strategies to rationally modulate it.

翻訳日:2023-06-16 18:29:30 公開日:2023-06-14

# GenImage:AI生成画像検出のための100万規模のベンチマーク

GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image ( http://arxiv.org/abs/2306.08571v1 )

ライセンス: Link先を確認

Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, Yunhe Wang

(参考訳) 生成モデルが写真画像を生成するという異常な能力は、偽情報の拡散に対する懸念を強め、それによってAI生成した偽画像と実画像とを区別できる検出器の需要が高まった。しかし、最も先進的な画像生成装置の画像を含む大規模なデータセットの欠如は、そのような検出器の開発に障害をもたらす。本稿では,以下の利点を有するGenImageデータセットを紹介する。 1)AIが生成した偽画像100万枚以上の画像と実際の画像の収集を含む大量の画像。 2)リッチ画像コンテンツは幅広い画像クラスを包含する。 3)最先端のジェネレータ,高度な拡散モデルとGANを用いた合成画像。前述の利点により、GenImageで訓練された検出器は、徹底的な評価を行い、多様な画像に強い適用性を示すことができる。本研究では,実世界のシナリオに類似した検出手法を評価するための2つのタスクを提案する。クロスジェネレータ画像分類タスクは、あるジェネレータで訓練された検出器が他のジェネレータでテストした場合の性能を測定する。劣化画像分類タスクは、低解像度、ぼやけた画像、圧縮画像などの劣化画像を扱う検出器の能力を評価する。 GenImageデータセットを使うことで、研究者は一般的な手法と比較して、優れたAI生成画像検出器の開発と評価を効果的に行うことができる。

The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) State-of-the-art Generators, synthesizing images with advanced diffusion models and GANs. The aforementioned advantages allow the detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating the detection method in resembling real-world scenarios. The cross-generator image classification task measures the performance of a detector trained on one generator when tested on the others. The degraded image classification task assesses the capability of the detectors in handling degraded images such as low-resolution, blurred, and compressed images. With the GenImage dataset, researchers can effectively expedite the development and evaluation of superior AI-generated image detectors in comparison to prevailing methodologies.

翻訳日:2023-06-16 18:29:01 公開日:2023-06-14

# WizardCoder: Evol-Instructでコード大言語モデルを強化する

WizardCoder: Empowering Code Large Language Models with Evol-Instruct ( http://arxiv.org/abs/2306.08568v1 )

ライセンス: Link先を確認

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

(参考訳) StarCoderのようなCode Large Language Models (Code LLM)は、コード関連のタスクにおいて卓越した性能を示している。しかし、既存のモデルのほとんどは、命令の微調整なしで大量の生コードデータに基づいて事前訓練されているだけである。本稿では、コード領域にEvol-Instruct法を適用することで、複雑な命令の微調整によってCode LLMを強化するWizardCoderを提案する。我々は、HumanEval、HumanEval+、MBPP、DS-1000という4つの著名なコード生成ベンチマークに関する総合的な実験を通じて、我々のモデルの卓越した能力を明らかにする。本モデルは、他のすべてのオープンソースCode LLMを大幅に上回る。さらに、我々のモデルは、HumanEvalとHumanEval+において、最大級のクローズドLLMであるAnthropicのClaudeとGoogleのBardをも上回る。私たちのコード、モデルウェイト、データは https://github.com/nlpxucan/WizardLM で公開されている。

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM

翻訳日:2023-06-16 18:28:44 公開日:2023-06-14

# サイバー攻撃に対する連合学習に基づく車両軌道予測

Federated Learning-based Vehicle Trajectory Prediction against Cyberattacks ( http://arxiv.org/abs/2306.08566v1 )

ライセンス: Link先を確認

Zhe Wang, Tingkai Yan

(参考訳) Internet of Vehicles (IoV) の開発により、車両無線通信は深刻なサイバーセキュリティ上の問題を引き起こす。偽の車両の位置や周囲の車両が送った速度などの不具合情報は、車両の衝突、交通渋滞、さらには死傷者も引き起こす可能性がある。さらに、車両軌道やユーザアカウント情報などの個人車両データ漏洩は、ユーザのプロパティやセキュリティを損なう可能性がある。そのため,iovシステムにおいて,データの飽和が不良なサイバー攻撃対策を実現する必要がある。本稿では,これらの問題に対処するため,FL-TPに対するフェデレート学習に基づく車両軌道予測アルゴリズムを提案する。 FL-TPは一般に公開されているVehicular Reference Misbehavior(VeReMi)データセットを使用して、定数、定数オフセット、ランダム、ランダムオフセット、最終的な停止という5種類のサイバー攻撃を集中的にトレーニングし、テストする。その結果,提案手法は最大サイバー攻撃透過性シナリオにおいて,サイバー攻撃検出と追跡予測を最大6.99%,54.86%改善できることがわかった。

With the development of the Internet of Vehicles (IoV), vehicle wireless communication poses serious cybersecurity challenges. Faulty information, such as fake vehicle positions and speeds sent by surrounding vehicles, could cause vehicle collisions, traffic jams, and even casualties. Additionally, private vehicle data leakages, such as vehicle trajectory and user account information, may damage user property and security. Therefore, achieving a cyberattack-defense scheme in the IoV system with faulty data saturation is necessary. This paper proposes a Federated Learning-based Vehicle Trajectory Prediction Algorithm against Cyberattacks (FL-TP) to address the above problems. The FL-TP is intensively trained and tested using a publicly available Vehicular Reference Misbehavior (VeReMi) dataset with five types of cyberattacks: constant, constant offset, random, random offset, and eventual stop. The results show that the proposed FL-TP algorithm can improve cyberattack detection and trajectory prediction by up to 6.99% and 54.86%, respectively, under the maximum cyberattack permeability scenarios compared with benchmark methods.

翻訳日:2023-06-16 18:28:26 公開日:2023-06-14

# 敵対的転移性の信頼性の高い評価

Reliable Evaluation of Adversarial Transferability ( http://arxiv.org/abs/2306.08565v1 )

ライセンス: Link先を確認

Wenqian Yu and Jindong Gu and Zhijiang Li and Philip Torr

(参考訳) 小さな敵対的摂動を持つ敵対的例(AE)は、ディープニューラルネットワーク(DNN)を誤った予測に導出する可能性がある。あるDNNで作られたAEは、別のDNNを騙すこともできる。ここ数年、AEsの転送性はブラックボックス攻撃を促進する重要な特性であるため、大きな注目を集めてきた。対向移動性を改善するために多くのアプローチが提案されている。しかし、これらは主に異なる畳み込みニューラルネットワーク(cnn)アーキテクチャで検証されており、全てのcnnが類似したアーキテクチャバイアスを共有しているため、信頼性の高い評価ではない。本研究では,4種類のニューラルネットワークから18種類の人気モデルを検証し,代表的転送可能性向上攻撃法を再評価する。我々の再評価の結果、逆転性はしばしば過大評価され、すべての人気モデルに変換できる単一のAEは存在しないことがわかった。包括的評価の下では,前回の攻撃方法の移動可能性ランクが変化する。本稿では,3つの評価プロトコルを含む信頼性ベンチマークを提案する。新たなベンチマークにおける逆転送可能性は非常に低く、逆転送可能性の過大評価をさらに裏付ける。私たちは、コード、モデルチェックポイント、評価プロトコルを含む将来の研究を促進するために、https://adv-trans-eval.github.ioでベンチマークをリリースします。

Adversarial examples (AEs) with small adversarial perturbations can mislead deep neural networks (DNNs) into wrong predictions. The AEs created on one DNN can also fool another DNN. Over the last few years, the transferability of AEs has garnered significant attention as it is a crucial property for facilitating black-box attacks. Many approaches have been proposed to improve adversarial transferability. However, they are mainly verified across different convolutional neural network (CNN) architectures, which is not a reliable evaluation since all CNNs share some similar architectural biases. In this work, we re-evaluate 12 representative transferability-enhancing attack methods where we test on 18 popular models from 4 types of neural networks. Our reevaluation revealed that the adversarial transferability is often overestimated, and there is no single AE that can be transferred to all popular models. The transferability rank of previous attacking methods changes when under our comprehensive evaluation. Based on our analysis, we propose a reliable benchmark including three evaluation protocols. Adversarial transferability on our new benchmark is extremely low, which further confirms the overestimation of adversarial transferability. We release our benchmark at https://adv-trans-eval.github.io to facilitate future research, which includes code, model checkpoints, and evaluation protocols.
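
The two quantities discussed in this abstract (how often AEs crafted on one model fool another, and whether any single AE fools every model) can be computed from a table of outcomes. The sketch below is a generic illustration with made-up toy outcomes; it is not the benchmark's evaluation code, and the function names are hypothetical.

```python
def transfer_rates(fooled):
    """Per-target-model transfer rate.

    fooled[i][j] is True if adversarial example i fools target model j.
    """
    n_examples = len(fooled)
    n_models = len(fooled[0])
    return [sum(row[j] for row in fooled) / n_examples for j in range(n_models)]


def universally_transferable(fooled):
    """Indices of adversarial examples that fool every target model."""
    return [i for i, row in enumerate(fooled) if all(row)]


# Toy outcomes for 4 AEs against 3 target models (illustrative values only).
outcomes = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [False, False, True],
]
rates = transfer_rates(outcomes)
universal = universally_transferable(outcomes)
```

Reporting both numbers side by side makes the abstract's point concrete: an attack can look strong on the first target column alone while no individual example transfers to all models.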

翻訳日:2023-06-16 18:28:03 公開日:2023-06-14

# TomoSAM:トモグラフィのセグメンテーションにSAMを使用した3Dスライダ拡張

TomoSAM: a 3D Slicer extension using SAM for tomography segmentation ( http://arxiv.org/abs/2306.08609v1 )

ライセンス: Link先を確認

Federico Semeraro, Alexandre Quintart, Sergio Fraile Izquierdo, Joseph C. Ferguson

(参考訳) TomoSAMは、最先端のSegment Anything Model(SAM)を、3D画像の処理と可視化に用いられる高機能なソフトウェアプラットフォームである3D Slicerに統合するために開発された。SAMはプロンプト可能なディープラーニングモデルで、少数のユーザークリックのみに基づいて、ゼロショットでオブジェクトを識別し、画像マスクを作成できる。これらのツールの相乗効果は、トモグラフィーや他のイメージング技術で得られる複雑な3Dデータセットのセグメンテーションを支援する。こうしたセグメンテーションは、そうでなければ手間のかかる手作業を必要とする。この記事に関連するソースコードは https://github.com/fsemerar/SlicerTomoSAM にある。

TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer, a highly capable software platform used for 3D image processing and visualization. SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner, based only on a few user clicks. The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques, which would otherwise require a laborious manual segmentation process. The source code associated with this article can be found at https://github.com/fsemerar/SlicerTomoSAM

翻訳日:2023-06-16 18:20:24 公開日:2023-06-14

# ロバスト性とメンバシッププライバシのためのグラフ情報ボトルネックの統一フレームワーク

A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy ( http://arxiv.org/abs/2306.08604v1 )

ライセンス: Link先を確認

Enyan Dai, Limeng Cui, Zhengyang Wang, Xianfeng Tang, Yinghan Wang, Monica Cheng, Bing Yin, Suhang Wang

(参考訳) グラフニューラルネットワーク(GNN)は,グラフ構造化データのモデリングにおいて大きな成功を収めている。しかし、最近の研究では、GNNは攻撃者の望ましい予測をするために、GNNモデルを騙す可能性のある敵攻撃に対して脆弱であることを示している。さらに、GNNのトレーニングデータは、メンバシップ推論攻撃によって漏洩することができる。これは主に、電子商取引、金融、バイオインフォマティクスといった高額な分野におけるGNNの採用を妨げる。堅牢な予測とメンバーシッププライバシ保護に関する調査は行われてきたが、一般的には堅牢性とメンバーシッププライバシを同時に考慮できていない。そこで本研究では、堅牢でメンバーシップなプライバシー保護型GNNを開発するための新しい課題について検討する。我々の分析によると、Information Bottleneck(IB)は、ノイズの多い情報をフィルタリングし、ラベル付きサンプルで予測を規則化するのに役立つ。しかし、ノード分類における構造ノイズとラベルの欠如は、グラフ構造化データへのIBの展開に挑戦する。これらの問題を緩和するために,隣接ボトルネックによる構造ノイズを軽減するグラフ情報ボトルネックフレームワークを提案する。擬似ラベルは、ラベル付きセットの予測と、メンバーシッププライバシの未ラベルセットとのギャップを最小限に抑える最適化にも組み込まれている。実世界のデータセットに関する広範囲な実験は、この手法が堅牢な予測を与え、同時にメンバーシッププライバシを保存できることを実証する。

Graph Neural Networks (GNNs) have achieved great success in modeling graph-structured data. However, recent works show that GNNs are vulnerable to adversarial attacks, which can fool a GNN model into making the predictions the attacker desires. In addition, the training data of GNNs can be leaked under membership inference attacks. This largely hinders the adoption of GNNs in high-stakes domains such as e-commerce, finance, and bioinformatics. Though robust prediction and membership privacy protection have each been investigated, existing work generally fails to consider robustness and membership privacy simultaneously. Therefore, in this work, we study a novel problem of developing robust and membership privacy-preserving GNNs. Our analysis shows that Information Bottleneck (IB) can help filter out noisy information and regularize the predictions on labeled samples, which can benefit robustness and membership privacy. However, structural noise and the lack of labels in node classification challenge the deployment of IB on graph-structured data. To mitigate these issues, we propose a novel graph information bottleneck framework that can alleviate structural noise with a neighbor bottleneck. Pseudo labels are also incorporated in the optimization to minimize the gap between the predictions on the labeled set and unlabeled set for membership privacy. Extensive experiments on real-world datasets demonstrate that our method can give robust predictions and simultaneously preserve membership privacy.

翻訳日:2023-06-16 18:20:12 公開日:2023-06-14

# M^2UNet:polypセグメンテーションのためのMetaFormerマルチスケールアップサンプリングネットワーク

M^2UNet: MetaFormer Multi-scale Upsampling Network for Polyp Segmentation ( http://arxiv.org/abs/2306.08600v1 )

ライセンス: Link先を確認

Quoc-Huy Trinh, Nhat-Tan Bui, Trong-Hieu Nguyen Mau, Minh-Van Nguyen, Hai-Minh Phan, Minh-Triet Tran, Hai-Dang Nguyen

(参考訳) 近年,ポリプのセグメンテーションが注目され,様々な手法が提案されている。しかし, コンボリューション操作の性質から, 複雑ポリープの前景とその周辺領域での作業では困難に直面することが多い。さらに、既存のほとんどのメソッドは、複数のデコーダステージからの潜在的な情報を利用することを忘れている。この課題に対処するために、cnnとtransformerを統合するベースラインとして導入されたmetaformerと、unetフレームワークを結合し、マルチスケールアップサンプリングブロック(mu)を統合することを提案します。このシンプルなモジュールは、浅いデコーダステージの複数の受容的フィールドパスを探索し、より高いステージを追加して、医療画像のセグメンテーションに不可欠な優れた特徴表現を集約することで、多レベル情報を組み合わせることができる。本稿では,ポリプセグメンテーションタスクのためのMetaFormer Multi-scale Upsampling Network (M$^2$UNet)を提案する。 5つのベンチマークデータセットを広範囲に実験した結果,従来の手法に比べて性能が高かった。

Polyp segmentation has recently garnered significant attention, and multiple methods have been formulated to achieve commendable outcomes. However, these techniques often struggle with the complex polyp foreground and its surrounding regions because of the nature of the convolution operation. Besides, most existing methods neglect to exploit the potential information from multiple decoder stages. To address this challenge, we suggest combining MetaFormer, introduced as a baseline for integrating CNN and Transformer, with the UNet framework and incorporating our Multi-scale Upsampling block (MU). This simple module makes it possible to combine multi-level information by exploring multiple receptive field paths of the shallow decoder stage and then adding the result to the higher stage to aggregate a better feature representation, which is essential in medical image segmentation. Taken together, we propose MetaFormer Multi-scale Upsampling Network (M$^2$UNet) for the polyp segmentation task. Extensive experiments on five benchmark datasets demonstrate that our method achieved competitive performance compared with several previous methods.

翻訳日:2023-06-16 18:19:49 公開日:2023-06-14

# Kernel Debiased Plug-in Estimation

Kernel Debiased Plug-in Estimation ( http://arxiv.org/abs/2306.08598v1 )

ライセンス: Link先を確認

Brian Cho, Kyra Gan, Ivana Malenica, Yaroslav Mukhin

(参考訳) 本研究では、局外(ニュイサンス)パラメータの存在下でスカラーの目標パラメータを推定する問題を考察する。未知の局外パラメータを機械学習(ML)モデルなどの非パラメトリック推定器で置き換えるのは便利であるが、バイアスが大きいために非効率であることが示されている。Targeted Minimum Loss-based Estimation(TMLE)やDouble Machine Learning(DML)といった現代の手法は、ML推定を利用しつつプラグインバイアスを緩和することで、柔軟な仮定の下で最適な性能を達成する。準最適なバイアス・分散トレードオフを避けるため、これらの手法はプラグインの事前推定値に対してデバイアスのステップを行う。既存のデバイアス手法は、目標パラメータの影響関数(IF)を入力として必要とする。しかし、IFの導出には専門知識が必要であり、実務者によるこれらの手法の採用を妨げている。我々はプラグイン推定器をデバイアスする新しい方法を提案する。この方法は (i) 効率的であり、(ii) IFの実装を必要とせず、(iii) 計算上扱いやすいため、新たな推定問題に容易に適応でき、利用者による解析的導出なしに自動化できる。我々はTMLEの枠組みに基づき、再生核ヒルベルト空間(RKHS)を用いて構築した非パラメトリックモデル上での正則化尤度最大化ステップによってプラグイン推定値を更新し、任意の正則な目標パラメータに対して効率的なプラグイン推定値を得る。このように本手法は、プラグインアプローチの有用性を犠牲にすることなく、競合するデバイアス手法と同等の効率性を提供する。

We consider the problem of estimating a scalar target parameter in the presence of nuisance parameters. Replacing the unknown nuisance parameter with a nonparametric estimator, e.g., a machine learning (ML) model, is convenient but has been shown to be inefficient due to large biases. Modern methods, such as the targeted minimum loss-based estimation (TMLE) and double machine learning (DML), achieve optimal performance under flexible assumptions by harnessing ML estimates while mitigating the plug-in bias. To avoid a sub-optimal bias-variance trade-off, these methods perform a debiasing step of the plug-in pre-estimate. Existing debiasing methods require the influence function (IF) of the target parameter as input. However, deriving the IF requires specialized expertise and thus obstructs the adoption of these methods by practitioners. We propose a novel way to debias plug-in estimators which (i) is efficient, (ii) does not require the IF to be implemented, (iii) is computationally tractable, and therefore can be readily adapted to new estimation problems and automated without analytic derivations by the user. We build on the TMLE framework and update a plug-in estimate with a regularized likelihood maximization step over a nonparametric model constructed with a reproducing kernel Hilbert space (RKHS), producing an efficient plug-in estimate for any regular target parameter. Our method, thus, offers the efficiency of competing debiasing techniques without sacrificing the utility of the plug-in approach.

翻訳日:2023-06-16 18:19:27 公開日:2023-06-14

# Floquet周波数変調によるRydberg相互作用の変換

Transforming Rydberg Interactions with Floquet Frequency Modulation ( http://arxiv.org/abs/2306.08596v1 )

ライセンス: Link先を確認

Luheng Zhao, Michael Dao Kang Lee, Mohammad Mujahid Aliyu, Huanqian Loh

(参考訳) リュードベリ・ブロッケードは、配列中の原子をエンタングルさせるための重要な要素である。しかし、これには原子をブロッケード半径より十分内側に配置する必要があり、局所的な量子ゲートの到達範囲が制限される。ここでは、Floquet周波数変調を用いてこの制約を破り、従来のブロッケード半径を超えるリュードベリ・ブロッケードによるエンタングルメントを実証する。さらに、Floquet周波数変調の下では、エンタングル状態のコヒーレンスを延ばせることがわかった。最後に、ブロッケード半径内にある2つの原子に対してリュードベリ・アンチブロッケード状態を実現する。この状態の定常占有は、従来の静的な駆動だけでは達成できない。我々の研究は、リュードベリ・ブロッケードとアンチブロッケードという典型的な2つの領域を相互に変換するものであり、より高い接続性とコヒーレンス、汎用性を備えた中性原子量子プロセッサを単一のアプローチで実現する道を開く。

The Rydberg blockade is a key ingredient for entangling atoms in arrays. However, it requires atoms to be spaced well within the blockade radius, which limits the range of local quantum gates. Here we break this constraint using Floquet frequency modulation, with which we demonstrate Rydberg-blockade entanglement beyond the traditional blockade radius. Further, we find that the coherence of entangled states can be extended under Floquet frequency modulation. Finally, we realize Rydberg anti-blockade states for two atoms within the blockade radius, where the steady-state population cannot be achieved with only the conventional static drive. Our work transforms between the paradigmatic regimes of Rydberg blockade versus anti-blockade and paves the way for realizing more connected, coherent, and versatile neutral atom quantum processors with a single approach.

翻訳日:2023-06-16 18:19:01 公開日:2023-06-14

# tensorkrowch: マシンラーニングにおけるテンソルネットワークのスムーズな統合

TensorKrowch: Smooth integration of tensor networks in machine learning ( http://arxiv.org/abs/2306.08595v1 )

ライセンス: Link先を確認

José Ramón Pareja Monturiol, David Pérez-García, Alejandro Pozas-Kerstjens

(参考訳) テンソルネットワークは、高次元テンソルから小さなテンソルのネットワークへの分解である。彼らは物理学や数学に応用しており、最近では有望な機械学習アーキテクチャとして提案されている。機械学習パイプラインにおけるテンソルネットワークの統合を容易にするため、PyTorch上に構築されたオープンソースのPythonライブラリであるTensorKrowchを紹介した。ユーザフレンドリなインターフェースを提供するTensorKrowchでは,任意のテンソルネットワークを構築してトレーニングし,より複雑なディープラーニングモデルのレイヤとして統合することができる。本稿では,TensorKrowchの主な機能と基本的な使用法について述べるとともに,その構築ブロックと効率的な操作を実現するための最適化について技術的に詳述する。

Tensor networks are factorizations of high-dimensional tensors into networks of smaller tensors. They have applications in physics and mathematics, and recently have been proposed as promising machine learning architectures. To ease the integration of tensor networks in machine learning pipelines, we introduce TensorKrowch, an open source Python library built on top of PyTorch. Providing a user-friendly interface, TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models. In this paper, we describe the main functionality and basic usage of TensorKrowch, and provide technical details on its building blocks and the optimizations performed to achieve efficient operation.
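
To make "factorizations of high-dimensional tensors into networks of smaller tensors" concrete, the sketch below evaluates one entry of a tensor stored as a matrix product state (a chain-shaped tensor network) in plain Python. It deliberately does not use TensorKrowch's API, which is not described in this abstract; the functions `matmul`, `mps_entry`, and `mps_parameter_count` and the toy cores are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mps_entry(cores, index):
    """Entry T[index] of a tensor stored as a matrix product state.

    cores[n][s] is the matrix selected when site n takes value s. The first
    core holds 1 x D matrices and the last D x 1, so the product is 1 x 1.
    """
    result = cores[0][index[0]]
    for core, s in zip(cores[1:], index[1:]):
        result = matmul(result, core[s])
    return result[0][0]


def mps_parameter_count(n_sites, phys_dim, bond_dim):
    """Parameters of an open-boundary MPS (assumes n_sites >= 2)."""
    boundary = 2 * phys_dim * bond_dim
    interior = (n_sites - 2) * phys_dim * bond_dim ** 2
    return boundary + interior


# Three binary sites, bond dimension 2; encodes T[a,b,c] = 1 + a*b + a*c.
cores = [
    [[[1, 0]], [[1, 1]]],                      # site 0: 1 x 2 matrices [1, a]
    [[[1, 0], [0, 1]], [[1, 0], [1, 1]]],      # site 1: 2 x 2 matrices
    [[[1], [0]], [[1], [1]]],                  # site 2: 2 x 1 matrices [1, c]^T
]
value = mps_entry(cores, (1, 1, 1))
compressed = mps_parameter_count(n_sites=20, phys_dim=2, bond_dim=8)
full = 2 ** 20
```

The parameter counts show the appeal for machine learning: the chain stores a 20-index tensor with a few thousand numbers instead of roughly a million, at the cost of restricting which tensors can be represented exactly.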

翻訳日:2023-06-16 18:18:45 公開日:2023-06-14

# 不均一連続学習

Heterogeneous Continual Learning ( http://arxiv.org/abs/2306.08593v1 )

ライセンス: Link先を確認

Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov

(参考訳) 本稿では,ネットワークアーキテクチャの変更に伴う継続学習(CL)問題に対処するための新しいフレームワークとソリューションを提案する。ほとんどのCLメソッドは、重みを変更して新しいタスク/クラスに単一のアーキテクチャを適用することに重点を置いている。しかし、アーキテクチャ設計の急速な進歩に伴い、既存のソリューションを新しいアーキテクチャに適応させるという課題が重要となる。この制限に対処するため、我々は、新しいデータ/タスクとともに広範囲に進化するネットワークアーキテクチャが継続的に現れる異種連続学習(HCL)を提案する。解決策として, 蒸留技術群の上に構築し, より弱いモデルが教師の役割を担うような新しい環境に修正する一方で, より強力なアーキテクチャが学生として機能する。さらに,従来のデータへのアクセス制限を考慮し,知識伝達を支援するために,タスク前の視覚的特徴を復元するクイック・ディープ・インバージョン(QDI)を提案する。 QDIは従来のソリューションに比べて計算コストを大幅に削減し、全体的なパフォーマンスを向上させる。本稿では, 知識蒸留パラダイムを改良したCLの新しいセットアップを提案し, 蒸留を強化するための高速データ反転法を設計する。各種ネットワークアーキテクチャ上での最先端手法と比較して,評価精度が大幅に向上した。

We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures. Most CL methods focus on adapting a single architecture to a new task/class by modifying its weights. However, with rapid progress in architecture design, the problem of adapting existing solutions to novel architectures becomes relevant. To address this limitation, we propose Heterogeneous Continual Learning (HCL), where a wide range of evolving network architectures emerge continually together with novel data/tasks. As a solution, we build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher; meanwhile, a new stronger architecture acts as a student. Furthermore, we consider a setup of limited access to previous data and propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer. QDI significantly reduces computational costs compared to previous solutions and improves overall performance. In summary, we propose a new setup for CL with a modified knowledge distillation paradigm and design a quick data inversion method to enhance distillation. Our evaluation on various benchmarks shows a significant improvement in accuracy in comparison to state-of-the-art methods over various network architectures.
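
The distillation family that HCL builds on can be illustrated with the standard temperature-scaled distillation loss. This is the generic textbook form, not the modified paradigm the paper proposes, and it does not cover Quick Deep Inversion; the function names and the toy logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student))
    return kl * temperature ** 2


# In the HCL setting the teacher is the older, weaker architecture and the
# student is the newer, stronger one; the logits here are toy values.
old_arch_logits = [2.0, 0.5, -1.0]
new_arch_logits = [1.0, 1.0, 0.0]
loss = distillation_loss(new_arch_logits, old_arch_logits)
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge, which is what lets knowledge of earlier tasks pass from the old architecture to the new one.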

翻訳日:2023-06-16 18:18:33 公開日:2023-06-14

# 大腸内視鏡における自己監督ポリープ再同定

Self-Supervised Polyp Re-Identification in Colonoscopy ( http://arxiv.org/abs/2306.08591v1 )

ライセンス: Link先を確認

Yotam Intrator, Natalie Aizenberg, Amir Livne, Ehud Rivlin, Roman Goldenberg

(参考訳) コンピュータ支援ポリープ検出(CADe)は, 現代の大腸内視鏡システムにおいて, 標準的かつ重要な部分となっている。典型的な大腸内視鏡CADeは、単一のフレーム内のポリプを検出し、ビデオシーケンスを通して追跡しない。しかし、polyp characterization (cadx)、quality metrics、automatic reportingを含む多くの下流タスクでは、複数のフレームからポリpデータを集約する必要がある。本研究では,視覚的外観による再同定に基づく堅牢な長期ポリープ追跡手法を提案する。本ソリューションは,ビデオ入力の時間的性質を活用した注意に基づく自己教師付きmlモデルを用いる。提案手法の性能を定量的に評価し,CADxタスクの価値を示す。

Computer-aided polyp detection (CADe) is becoming a standard, integral part of any modern colonoscopy system. A typical colonoscopy CADe detects a polyp in a single frame and does not track it through the video sequence. Yet, many downstream tasks, including polyp characterization (CADx), quality metrics, and automatic reporting, require aggregating polyp data from multiple frames. In this work we propose a robust long-term polyp tracking method based on re-identification by visual appearance. Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input. We quantitatively evaluate the method's performance and demonstrate its value for the CADx task.

翻訳日:2023-06-16 18:18:15 公開日:2023-06-14

# 暗黙のバイアスを超えて:オンライン学習におけるsgdノイズの無意味さ

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning ( http://arxiv.org/abs/2306.08590v1 )

ライセンス: Link先を確認

Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak

(参考訳) ディープラーニングにおけるSGDの成功は、高い学習率または小さなバッチサイズによって引き起こされる暗黙のバイアス("SGD noise")に先行研究によって説明されている。オフライン学習(マルチエポック学習)に焦点を当てた先行研究では,オンライン学習(単一エポック学習)におけるSGDノイズの影響について検討した。画像と言語データの広範な実証分析を通じて,オンライン学習において,大きな学習率と小さなバッチサイズが暗黙のバイアスアドバンテージを生まないことを実証する。オフライン学習とは対照的に、オンライン学習におけるSGDノイズの利点は厳密な計算であり、より大きく、よりコスト効率の良い勾配ステップを促進する。本研究は,オンラインシステムにおけるsgdは,ノイズレス勾配流アルゴリズムの「黄金経路」に沿ってノイズステップをとることができることを示唆する。この仮説を裏付ける証拠として,訓練中にsgdノイズを低減させる実験と,sgdノイズレベルは異なるが等価な損失値で訓練されたモデル間のポイントワイズ機能距離を測定する。本研究は,SGDの一般的な理解に挑戦し,オンライン学習におけるその役割に関する新たな知見を提供する。

The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by high learning rate or small batch size ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that large learning rate and small batch size do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating larger or more cost-effective gradient steps. Our work suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient flow algorithm. We provide evidence to support this hypothesis by conducting experiments that reduce SGD noise during training and by measuring the pointwise functional distance between models trained with varying SGD noise levels, but at equivalent loss values. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.

翻訳日:2023-06-16 18:18:05 公開日:2023-06-14

# 音声編集によるASRにおけるコード切り替えと名前付きエンティティ認識の改善

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation ( http://arxiv.org/abs/2306.08588v1 )

ライセンス: Link先を確認

Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

(参考訳) 近年,エンド・ツー・エンド(E2E)自動音声認識(ASR)モデルは非常に進歩しており,音声認識性能に優れる。しかし、コードスイッチングや名前付きエンティティ認識(NER)など、E2Eモデルには適さない難題がいくつか残っている。データ拡張は2つのシナリオで一般的で効果的なプラクティスです。しかし、現在のデータ拡張方法は、主に音声スプライシングとテキスト音声(TTS)モデルに依存しており、不連続性、非現実性、多様化の少ない音声をもたらす可能性がある。そこで本研究では,テキストベースの音声編集モデルを適用した新しいデータ拡張手法を提案する。音声編集システムによる拡張音声は、よりコヒーレントで多様化しており、また実際の音声に近い。コードスイッチングとnerタスクの実験結果は,提案手法が音声スプライシングとニューラルttsに基づくデータ拡張システムを大きく上回ることを示した。

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition. However, there remain several challenging scenarios that E2E models are not competent in, such as code-switching and named entity recognition (NER). Data augmentation is a common and effective practice for these two scenarios. However, the current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method by applying the text-based speech editing model. The augmented speech from speech editing systems is more coherent and diversified, and also more akin to real speech. The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.

翻訳日:2023-06-16 18:17:45 公開日:2023-06-14

# CVSSの改良による産業制御システムの脆弱性評価

Vulnerability Assessment of Industrial Control System with an Improved CVSS ( http://arxiv.org/abs/2306.08631v1 )

ライセンス: Link先を確認

He Wen

(参考訳) 産業制御システム(ICS)に対するサイバー攻撃は学界で注目を集めている。しかし、これは一部の工業従事者の間で十分な懸念を生じさせていない。したがって、ICS内の脆弱な場所やコンポーネントを特定し、攻撃シナリオやテクニックを調査する必要がある。本研究は,ICSにおけるサイバー攻撃のリスクをCVSS(Common Vulnerability Scoring System)の改良により評価し,CSTR(Continuous stired tank reactor)モデルに適用する手法を提案する。その結果,icの物理システムレベルはサイバー攻撃時に最も重要度が高く,コントローラ,ワークステーション,ヒューマン・マシン・インタフェースがサイバー攻撃と防御の重要な構成要素であることがわかった。

Cyberattacks on industrial control systems (ICS) have been drawing attention in academia. However, this has not raised adequate concerns among some industrial practitioners. Therefore, it is necessary to identify the vulnerable locations and components in the ICS and investigate the attack scenarios and techniques. This study proposes a method to assess the risk of cyberattacks on ICS with an improved Common Vulnerability Scoring System (CVSS) and applies it to a continuous stirred tank reactor (CSTR) model. The results show the physical system levels of ICS have the highest severity once cyberattacked, and controllers, workstations, and human-machine interfaces are the crucial components in the cyberattack and defense.
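
For background on the scoring system being improved, the sketch below computes the standard CVSS v3.1 base score for the scope-unchanged case. It follows the public v3.1 base equations and metric weights as I understand them; it is not the improved CVSS proposed in this paper, whose modifications the abstract does not describe. The function and table names are assumptions made for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(value):
    """Round up to one decimal place using the spec's integer arithmetic."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, c, i, a):
    """Standard CVSS v3.1 base score, assuming Scope = Unchanged."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
                      * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

# Network-reachable, low-complexity attack with full C/I/A impact.
print(base_score("N", "L", "N", "N", "H", "H", "H"))
```

An ICS-specific variant would presumably adjust or extend these metrics to reflect physical consequences, but that part is left open here.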

翻訳日:2023-06-16 18:11:22 公開日:2023-06-14

# 部分空間と適応生成モデルを組み合わせた高次元mr再構成

High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models ( http://arxiv.org/abs/2306.08630v1 )

ライセンス: Link先を確認

Ruiyang Zhao, Xi Peng, Varun A. Kelkar, Mark A. Anastasio, Fan Lam

(参考訳) 目的:高次元MR画像再構成のためのサブスペースと生成画像モデルを統合する新しい手法を開発する。方法:高次元画像の低次元部分空間モデルと,「コントラスト重み付き」画像のシーケンスや部分空間モデルの空間係数の空間制約となる適応生成画像と,従来のスパーシティ正規化とを融合させる定式化を提案した。コントラストの異なる画像のための正確な生成ネットワークベース表現を構築するために, 特別事前学習と対象特化ネットワーク適応戦略を提案した。最近提案された中間層最適化手法を応用した生成画像モデルの部分空間係数と多分解能潜時空間を共同で更新する反復アルゴリズムが導入された。結果: 高速MRパラメータマッピングと高分解能MR分光画像の2つの高次元イメージングへの応用について検討した。最先端のサブスペースベースメソッドのパフォーマンス向上が両ケースで実証された。結論:提案手法は,部分空間再構成を制約するデータ駆動空間として適応型生成モデルを導入することで,高次元mr画像再構成問題を解決する新しい手法を提供する。意義:本研究は,高次元画像問題に対するデータ駆動型および適応型生成前処理と標準低次元モデリングを統合する可能性を実証した。

Objective: To develop a new method that integrates subspace and generative image models for high-dimensional MR image reconstruction. Methods: We proposed a formulation that synergizes a low-dimensional subspace model of high-dimensional images, an adaptive generative image prior serving as spatial constraints on the sequence of "contrast-weighted" images or spatial coefficients of the subspace model, and a conventional sparsity regularization. A special pretraining plus subject-specific network adaptation strategy was proposed to construct an accurate generative-network-based representation for images with varying contrasts. An iterative algorithm was introduced to jointly update the subspace coefficients and the multi-resolution latent space of the generative image model that leveraged a recently proposed intermediate layer optimization technique for network inversion. Results: We evaluated the utility of the proposed method for two high-dimensional imaging applications: accelerated MR parameter mapping and high-resolution MR spectroscopic imaging. Improved performance over state-of-the-art subspace-based methods was demonstrated in both cases. Conclusion: The proposed method provided a new way to address high-dimensional MR image reconstruction problems by incorporating an adaptive generative model as a data-driven spatial prior for constraining subspace reconstruction. Significance: Our work demonstrated the potential of integrating data-driven and adaptive generative priors with canonical low-dimensional modeling for high-dimensional imaging problems.

翻訳日:2023-06-16 18:11:11 公開日:2023-06-14

# 深さ最適量子ビット割り当てとスワップベースルーティングのための制約プログラミングモデル

Constraint programming models for depth-optimal qubit assignment and SWAP-based routing ( http://arxiv.org/abs/2306.08629v1 )

ライセンス: Link先を確認

Kyle E. C. Booth

(参考訳) ゲートモデル量子デバイスの接続が限られているため、論理量子回路は実行前にターゲットハードウェアにコンパイルされなければならない。多くの場合、このプロセスでは、論理回路にスワップゲートを挿入し、いわゆる量子ビット割り当てとルーティング問題を解決することで回路の深さを増加させる。近年,量子ビット割当問題やルーティング問題の解法として整数線形計画法(ilp)モデルが提案されている。これらのモデルは問題の目的関数と制約を符号化し、ハードウェア準拠の量子回路を見つけるために自動解法技術を利用する。そこで本研究では,本問題に対する制約プログラミング(cp)モデルを提案し,線形および二次元グリッド格子デバイストポロジの回路深度最小化のためのirpとの比較を行った。実験分析の結果,提案手法はソリューションの品質と実行時間の両方において,ILPモデルよりも優れていることがわかった。

Due to the limited connectivity of gate model quantum devices, logical quantum circuits must be compiled to target hardware before they can be executed. Often, this process involves the insertion of SWAP gates into the logical circuit, usually increasing the depth of the circuit, achieved by solving a so-called qubit assignment and routing problem. Recently, a number of integer linear programming (ILP) models have been proposed for solving the qubit assignment and routing problem to proven optimality. These models encode the objective function and constraints of the problem, and leverage the use of automated solver technology to find hardware-compliant quantum circuits. In this work, we propose constraint programming (CP) models for this problem and compare their performance against ILP for circuit depth minimization for both linear and two-dimensional grid lattice device topologies on a set of randomly generated instances. Our empirical analysis indicates that the proposed CP approaches outperform the ILP models both in terms of solution quality and runtime.
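
To make the qubit assignment and routing problem concrete, the sketch below routes two-qubit gates on a linear device by naively inserting SWAPs until the operands are adjacent. It is a greedy illustration only: it is neither the CP nor the ILP model from the paper, and it makes no attempt to minimize depth. The function name, the identity initial assignment, and the tuple-based gate format are assumptions made for this example.

```python
def route_on_line(num_qubits, gates):
    """Insert SWAPs so each two-qubit gate acts on neighboring sites.

    gates: list of (a, b) logical qubit pairs.
    Returns a list of ("SWAP", site, site) and ("CX", site, site) tuples
    expressed in physical sites of a linear topology 0-1-2-...
    """
    position = list(range(num_qubits))  # position[q]: site holding logical q
    occupant = list(range(num_qubits))  # occupant[s]: logical qubit at site s
    routed = []
    for a, b in gates:
        # Move logical qubit a one site at a time toward b.
        while abs(position[a] - position[b]) > 1:
            site = position[a]
            step = 1 if position[b] > site else -1
            neighbor = occupant[site + step]
            routed.append(("SWAP", site, site + step))
            occupant[site], occupant[site + step] = neighbor, a
            position[a], position[neighbor] = site + step, site
        routed.append(("CX", position[a], position[b]))
    return routed

print(route_on_line(3, [(0, 2)]))
```

An exact model would instead choose the initial assignment and the SWAP schedule jointly so that the resulting circuit depth is provably minimal.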

翻訳日:2023-06-16 18:10:50 公開日:2023-06-14

# 気象データへのグラフベースマトリックス補完の適用

Graph-Based Matrix Completion Applied to Weather Data ( http://arxiv.org/abs/2306.08627v1 )

ライセンス: Link先を確認

Benoît Loucheur, P.-A. Absil, Michel Journée

(参考訳) 低ランク行列完備化は、真の行列が良好な低ランク近似を持つことを仮定して、行列の未知のエントリを復元するタスクである。時には変数に関する追加情報が知られており、この情報を行列補完モデルに組み込むことで、より良い完成性が得られる。本稿では,行列の列/行エンティティ間の情報を重み付きグラフとして利用できる状況を考える。本稿では,気象観測所が記録した気温データの欠落エントリを完了させる問題に対処する。気象データの実際のギャップを模倣した場所にデータを格納してテストセットを構築する。これらのテストセットにおいて,適切な空間的および時間的グラフは,グラフ正規化低ランク行列補完法によって得られる完了の精度を著しく向上できることを示す。

Low-rank matrix completion is the task of recovering unknown entries of a matrix by assuming that the true matrix admits a good low-rank approximation. Sometimes additional information about the variables is known, and incorporating this information into a matrix completion model can lead to a better completion quality. We consider the situation where information between the column/row entities of the matrix is available as a weighted graph. In this framework, we address the problem of completing missing entries in air temperature data recorded by weather stations. We construct test sets by holding back data at locations that mimic real-life gaps in weather data. On such test sets, we show that adequate spatial and temporal graphs can significantly improve the accuracy of the completion obtained by graph-regularized low-rank matrix completion methods.
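
One common way to write a graph-regularized low-rank completion problem is sketched below. This is a generic formulation given for orientation, not necessarily the exact objective used by the authors. All symbols are assumptions introduced here: $M$ is the data matrix (for example stations by time steps), $\Omega$ the set of observed entries with projection $P_\Omega$, $U$ and $V$ the rank-$r$ factors, $L_r$ and $L_c$ the Laplacians of the row and column graphs, and $\lambda_r, \lambda_c \ge 0$ the regularization weights.

```latex
\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{n \times r}}
\; \tfrac{1}{2} \bigl\| P_\Omega\bigl(M - U V^\top\bigr) \bigr\|_F^2
+ \tfrac{\lambda_r}{2} \operatorname{tr}\bigl(U^\top L_r U\bigr)
+ \tfrac{\lambda_c}{2} \operatorname{tr}\bigl(V^\top L_c V\bigr)
```

The trace terms penalize factor rows that differ strongly between entities joined by a heavy edge, which is how spatial proximity between stations or temporal proximity between time steps can inform the missing entries.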

翻訳日:2023-06-16 18:10:32 公開日:2023-06-14

# フェルミオン不純物のマルコフ緩和過程におけるシステムバスの絡み合い

System-bath entanglement during Markovian relaxation of a fermionic impurity ( http://arxiv.org/abs/2306.08626v1 )

ライセンス: Link先を確認

Krzysztof Ptaszynski, Massimiliano Esposito

(参考訳) フェルミイオン熱浴に結合した非相互作用性フェルミイオン不純物の熱分解におけるシステムと環境の絡み合いのダイナミクスについて検討した。弱結合状態においても過渡的絡み合いは観測可能であり、系の還元ダイナミクスや熱力学が状態集団に対する古典的・マルコフ的マスター方程式によってよく説明できることを示した。この絡み合いは長い間消滅するが、緩和時間に匹敵する時間スケールで保存される。その大きさは、システムと環境のカップリングに弱いだけでなく、システムの初期状態の純度に大きく依存する。我々は,このような過渡的絡み合いの存在とマルコフ記述の縮小に基づくシステムバス力学のユニタリ特性を関連づける。

We investigate the dynamics of entanglement between the system and the environment during thermalization of a noninteracting fermionic impurity coupled to a fermionic thermal bath. We show that transient entanglement can be observed even in the weak coupling regime, when the reduced dynamics and thermodynamics of the system can be well described by an effectively classical and Markovian master equation for the state populations. This entanglement vanishes for long times, but is preserved over timescales comparable to the relaxation time. Its magnitude depends only weakly on the system-environment coupling but instead strongly on the purity of the initial state of the system. We relate the presence of such transient entanglement to the unitary character of the system-bath dynamics underlying the reduced Markovian description.

翻訳日:2023-06-16 18:10:18 公開日:2023-06-14

# RRSIS:リモートセンシング画像のセグメンテーションを参照

RRSIS: Referring Remote Sensing Image Segmentation ( http://arxiv.org/abs/2306.08625v1 )

ライセンス: Link先を確認

Zhenghang Yuan, Lichao Mou, Yuansheng Hua, Xiao Xiang Zhu

(参考訳) リモートセンシング画像から所望のオブジェクトをローカライズすることは、実用上非常に有用である。与えられた表現が参照する対象を分割することを目的とした画像分割の参照は、自然画像において広く研究されている。しかし、このリモートセンシング画像のタスクには、ほとんど研究の注意が払われていない。本稿では,実世界の応用の可能性を考慮して,このギャップを埋めるためにリモートセンシング画像セグメンテーション(RRSIS)を紹介する。具体的には、このタスクのためにRefSegRSと呼ばれる新しいデータセットを作成し、異なるメソッドの評価を可能にします。その後、RefSegRSデータセット上の自然画像のイメージセグメンテーション手法をベンチマークし、これらのモデルが小さな物体や散乱物体の検出において限られた有効性を示すことを示した。この問題を軽減するために,言語機能を利用した言語誘導型クロススケール拡張(LGCE)モジュールを提案する。提案したデータセット、ベンチマーク結果、デザインされたLGCEモジュールは、より良いRRSISモデルの設計に関する洞察を提供する。データセットとコードを公開します。

Localizing desired objects from remote sensing images is of great use in practical applications. Referring image segmentation, which aims at segmenting out the objects to which a given expression refers, has been extensively studied in natural images. However, almost no research attention is given to this task for remote sensing imagery. Considering its potential for real-world applications, in this paper, we introduce referring remote sensing image segmentation (RRSIS) to fill in this gap and make some insightful explorations. Specifically, we create a new dataset, called RefSegRS, for this task, enabling us to evaluate different methods. Afterward, we benchmark referring image segmentation methods of natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that utilizes linguistic features to adaptively enhance multi-scale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and the designed LGCE module provide insights into the design of a better RRSIS model. We will make our dataset and code publicly available.

翻訳日:2023-06-16 18:10:06 公開日:2023-06-14

# 量子ドットにおける断熱量子励起の熱力学

Thermodynamics of adiabatic quantum pumping in quantum dots ( http://arxiv.org/abs/2306.08621v1 )

ライセンス: Link先を確認

Daniele Nello and Alessandro Silva

(参考訳) 2つのフェルミオンリードに接続された単一レベルの量子ドットである共鳴レベルモデルによる断熱量子ポンピングを考える。我々は, このモデルについて, 点のエネルギーレベルと熱浴によるトンネル速度の変動を考慮した一貫した熱力学的記述を開発した。本研究では,エントロピーや散逸電力など,関連する熱力学量を計算するポンプサイクルの様々な例について検討する。次に、これらの量をシステムの輸送特性と比較する。その結果, 電荷量子化限界ではエントロピー生成速度が消失し, 散逸した電力は同じ限界で量子化されることがわかった。

We consider adiabatic quantum pumping through a resonant level model, a single-level quantum dot connected to two fermionic leads. We develop a consistent thermodynamic description of this model accounting for the variation of the energy level of the dot and the tunnelling rates with the thermal baths. We study various examples of pumping cycles computing the relevant thermodynamic quantities, such as the entropy produced and the dissipated power. We then compare these quantities with the transport properties of the system. Among other results, we find that the entropy production rate vanishes in the charge quantization limit while the dissipated power is quantized in the same limit.

翻訳日:2023-06-16 18:09:47 公開日:2023-06-14

# 期待音楽変換器

Anticipatory Music Transformer ( http://arxiv.org/abs/2306.08620v1 )

ライセンス: Link先を確認

John Thickstun, David Hall, Chris Donahue, Percy Liang

(参考訳) 第2の相関化プロセス(制御プロセス)の実現に基づいて非同期に条件づけされた時間的点過程(イベントプロセス)の制御可能な生成モデルを構築する方法である。我々は,イベントシーケンスの停止時間に従って制御が現れるように,イベントと制御のシーケンスをインターリーブすることでこれを実現する。この作品は、シンボリック・ミュージック・ジェネレーションの制御に生じる問題に動機づけられている。制御タスクは、制御自体がイベントのサブセットであり、条件付き生成は、固定された制御イベントが与えられたイベントのシーケンスを完了する。大規模かつ多様なLakh MIDI音楽データセットを用いて予測入出力モデルを訓練する。これらのモデルは、伴奏を含むインフィル制御タスクを実行する追加機能を備えた、インプット音楽生成のための自己回帰モデルのパフォーマンスと一致する。 human evaluatorsは、予測モデルが20秒のクリップで人間が作曲した音楽と同じような音楽性を持つ伴奏を生成すると報告している。

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with musicality similar even to music composed by humans over a 20-second clip.

翻訳日:2023-06-16 18:09:36 公開日:2023-06-14

# 近似有効$p$-Resistanceによるマルチクラスグラフクラスタリング

Multi-class Graph Clustering via Approximated Effective $p$-Resistance ( http://arxiv.org/abs/2306.08617v1 )

ライセンス: Link先を確認

Shota Saito and Mark Herbster

(参考訳) 本稿では,(有効)$p$-resistanceの近似を開発し,マルチクラスクラスタリングに適用する。グラフラプラシアンに基づくスペクトル法とそのグラフ $p$-Laplacian への一般化は、非ユークリッドクラスタリング技術のバックボーンとなっている。 $p$-Laplacian の利点は、パラメータ $p$ がクラスタ構造に制御可能なバイアスをもたらすことである。 $p$-Laplacian eigenvector based methodの欠点は、3番目と上位の固有ベクトルの計算が難しいことである。したがって、我々はクラスタリングに$p$-Laplacianによって誘導される$p$-resistanceを使うことを動機付けている。 $p$-resistanceの場合、小さな$p$バイアスは内部接続性の高いクラスタへ、大きな$p$バイアスは小さな「extent」のクラスタへ、これはクラスタ内の頂点間の短いパス距離を優先する。しかし、$p$-resistanceは計算にコストがかかる。我々は、$p$-resistanceの近似を開発することでこれを克服する。この近似で上界と下界を証明し、グラフが木であるときにそれが正確であることを観測する。また、クラスタリングに$p$-resistanceを使用するための理論的正当性も提供する。最後に、近似した$p$-resistanceクラスタリングと他の$p$-Laplacianベースのメソッドとの比較実験を行う。

This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small "extent," that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.
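
The tree case mentioned above can be worked out by hand, and the sketch below does so under one stated assumption: the $p$-resistance between $i$ and $j$ is taken to be the reciprocal of the minimum of $\sum_{(u,v)} w_{uv} |x_u - x_v|^p$ subject to $x_i - x_j = 1$. Definitions in the literature differ in normalization, so this may not match the paper's exact convention. On a tree only the unique path between the two vertices contributes, and a Lagrange-multiplier argument gives $\bigl(\sum_e w_e^{-1/(p-1)}\bigr)^{p-1}$, which reduces to the familiar series rule for $p = 2$. The function name and edge format are assumptions made for this example.

```python
from collections import deque

def tree_p_resistance(edges, source, target, p=2.0):
    """p-resistance between two vertices of a weighted tree (requires p > 1).

    edges: list of (u, v, weight) tuples forming a tree.
    Assumes the energy-minimization definition stated in the lead-in.
    """
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # Breadth-first search records, for each vertex, its parent and edge weight.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, w in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = (node, w)
                queue.append(neighbor)

    # Walk back from target to source, accumulating the path contribution.
    total = 0.0
    node = target
    while parent[node] is not None:
        node, w = parent[node]
        total += w ** (-1.0 / (p - 1.0))
    return total ** (p - 1.0)

path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
print(tree_p_resistance(path, 0, 3, p=2.0))
print(tree_p_resistance(path, 0, 3, p=3.0))
```

On this unit-weight path the value grows with $p$ as the path gets longer, which is consistent with the remark that large $p$ emphasizes shortest-path distance.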

翻訳日:2023-06-16 18:09:21 公開日:2023-06-14

# 時間-局所最適化を超えたラジカルペアダイナミクスの量子制御

Quantum Control of Radical Pair Dynamics beyond Time-Local Optimisation ( http://arxiv.org/abs/2306.08613v1 )

ライセンス: Link先を確認

Farhan T. Chowdhury, Matt C. J. Denton, Daniel C. Bonser, Daniel R. Kattnig

(参考訳) 傾斜上昇パルス工学(GRAPE)を拡張して反応収率を最適化することで,低磁場領域におけるラジカル対のスピン選択的再結合反応における任意の波形制御を実現する。これは、反応制御を実現するための従来の時間局所最適化アプローチの欠点を克服し、高いバイアス場によって駆動されるラジカル対の適用性に制限された。本研究では, 時間ブロック, スパースサンプリング, 中央単段プロパゲータとそのFréchet導関数の反復的トロッタスズキ分割による評価により, ラジカル対組換え収率の時間-グローバル最適化がいかに効果的かを示す。その結果、より単純な高磁場シナリオにおいてラジカル対反応のコヒーレント制御を示すおもちゃモデルと、16個の核スピンからなる現実的な励起結合形成ドナー受容体系の両方が得られた。このことは、周囲磁場における実際のラジカル対系のスピン制御の見通しを高め、目的特異的な電波波形を用いてラジカル反応の収率を抑制または促進し、反応収率依存性の量子磁気学におけるラジカル誘起量子ビットアーキテクチャーへの道を切り開いたり、生化学的ラジカル対反応への量子制御の潜在的応用を可能にした。

By extending Gradient Ascent Pulse Engineering (GRAPE) to allow for optimising reaction yields, we realise arbitrary waveform-based control in spin-selective recombination reactions of radical pairs in the low magnetic field regime. This overcomes drawbacks of previous time-local optimisation approaches for realising reaction control, which were limited in their applicability to radical pairs driven by high biasing fields. We demonstrate how efficient time-global optimisation of the radical pair recombination yields can be realised by gradient based methods augmented with time-blocking, sparse sampling of the yield, and evaluation of the central single-timestep propagators and their Fréchet derivatives using iterated Trotter-Suzuki splittings. Results are shown for both a toy model, previously used to demonstrate coherent control of radical pair reactions in the simpler high-field scenario, and furthermore for a realistic exciplex-forming donor-acceptor system comprising 16 nuclear spins. This raises prospects for the spin-control of actual radical pair systems in ambient magnetic fields, by suppressing or boosting radical reaction yields using purpose-specific radio-frequency waveforms, paving the way for radical inspired qubit architectures for reaction-yield-dependent quantum magnetometry and potentially applications of quantum control to biochemical radical pair reactions.

翻訳日:2023-06-16 18:08:57 公開日:2023-06-14

# 接地型社会推論に向けて

Toward Grounded Social Reasoning ( http://arxiv.org/abs/2306.08651v1 )

ライセンス: Link先を確認

Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh

(参考訳) レゴのスポーツカーでデスクを丁寧に組み立てるロボットを考えてみてほしい。人間はスポーツカーの分解が社会的に適切でないと認識し、「タイダイイング」の一部として取り除くことができる。ロボットはどうやってその結論に達するのか? 大規模言語モデル (LLMs) は近年, 社会的推論に利用されてきたが, 現実の世界でのこの推論は困難である。現実の世界では、ロボットは受動的にLLMに問い合わせるだけでなく、正しい判断を下すために必要な環境*から情報を積極的に収集する必要がある。例えば、隠された車があることを検知したロボットは、レゴ製の高度なモデルカーなのか、幼児が作ったおもちゃの車なのかを積極的に認識する必要があるかもしれない。 llmと視覚言語モデル(vlm)を活用して,ロボットがその環境を積極的に認識し,基盤的社会的推論を行うためのアプローチを提案する。当社のフレームワークを大規模に評価するために,クリーニングが必要な70の現実世界の面の画像を含むMessySurfacesデータセットをリリースしました。さらに,2つの表面を注意深く設計したロボットによるアプローチについても紹介する。我々は、メッシーサーフェースベンチマークの平均12.9%の改善と、アクティブな知覚を使用しないベースラインに対するロボット実験の平均15%の改善を見出した。私たちのアプローチのデータセット、コード、ビデオは、https://minaek.github.io/groundedsocialreasoningで見ることができます。

Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not socially appropriate to disassemble the sports car and put it away as part of the "tidying". How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable social reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and *actively gather information from the environment* that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and vision language model (VLM) to help a robot actively perceive its environment to perform grounded social reasoning. To evaluate our framework at scale, we release the MessySurfaces dataset which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MessySurfaces benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/groundedsocialreasoning.

翻訳日:2023-06-16 18:01:53 公開日:2023-06-14

# ランクが重要なときのランクの学習

Learning to Rank when Grades Matter ( http://arxiv.org/abs/2306.08650v1 )

ライセンス: Link先を確認

Le Yan, Zhen Qin, Xuanhui Wang, Gil Shamir, Mike Bendersky

(参考訳) グレードラベルは、現実世界のランク学習(learning-to-rank)アプリケーション、特に人間格付けされた関連データで広く使われている。従来のランク学習技術は、文書のランク付け順序を最適化することを目的としている。しかし、通常は実際の成績の予測を無視する。これにより、「poor」ドキュメントをフィルタリングするなど、グレードが重要なアプリケーションでそれらを採用できない。優れたランク付け性能と優れたグレード予測性能の両方を達成することは、まだ未解決の問題である。既存の研究は、モデル出力の校正を行わず、あるいはラベルが線形スケールにあり、順序付け情報を活用できないと仮定して、グレードを数値として扱うことで、ランキング性能のみに焦点を当てている。本稿では,ランク付け性能と格付け予測性能の両方が重要となるランク付け学習について,厳密な研究を行う。成績予測の非スカラー予測による順位付けの方法に関する形式的な議論を行い,順位予測と順位予測の両方を共同で最適化する多目的定式化を提案する。実験では,我々の手法がランキングとグレード予測のパフォーマンスのトレードオフのParetoフロンティアを押し上げることができることを,いくつかの公開データセットで検証した。

Graded labels are ubiquitous in real-world learning-to-rank applications, especially in human-rated relevance data. Traditional learning-to-rank techniques aim to optimize the ranked order of documents. They typically, however, ignore predicting actual grades. This prevents them from being adopted in applications where grades matter, such as filtering out "poor" documents. Achieving both good ranking performance and good grade prediction performance is still an under-explored problem. Existing research either focuses only on ranking performance by not calibrating model outputs, or treats grades as numerical values, assuming labels are on a linear scale and failing to leverage the ordinal grade information. In this paper, we conduct a rigorous study of learning to rank with grades, where both ranking performance and grade prediction performance are important. We provide a formal discussion on how to perform ranking with non-scalar predictions for grades, and propose a multiobjective formulation to jointly optimize both ranking and grade predictions. In experiments, we verify on several public datasets that our methods are able to push the Pareto frontier of the tradeoff between ranking and grade prediction performance, showing the benefit of leveraging ordinal grade information.
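
One simple way to rank with non-scalar grade predictions is to collapse each document's predicted grade distribution into its expected grade and sort by that value. The sketch below shows this idea only; it is an illustrative choice and not necessarily the ranking rule or the multiobjective formulation proposed in the paper. The function names and the dictionary input format are assumptions made for this example.

```python
import math

def grade_probabilities(logits):
    """Softmax over per-grade logits (grade k corresponds to index k)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_grade(logits):
    """Scalar ranking score: the mean of the predicted grade distribution."""
    probs = grade_probabilities(logits)
    return sum(grade * prob for grade, prob in enumerate(probs))

def rank_documents(doc_logits):
    """Return document ids sorted from highest to lowest expected grade."""
    scores = {doc: expected_grade(logits) for doc, logits in doc_logits.items()}
    return sorted(scores, key=scores.get, reverse=True)

docs = {"a": [0.0, 0.0, 5.0], "b": [5.0, 0.0, 0.0], "c": [0.0, 5.0, 0.0]}
print(rank_documents(docs))
```

Because the same distribution also yields a grade prediction (for instance its most probable grade), a single model output can serve both ranking and filtering, which is the kind of dual use the abstract argues for.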

翻訳日:2023-06-16 18:01:26 公開日:2023-06-14

# OCAtari:オブジェクト中心のAtari 2600強化学習環境

OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments ( http://arxiv.org/abs/2306.08649v1 )

ライセンス: Link先を確認

Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting

(参考訳) 認知科学と心理学は、複雑なシーンのオブジェクト中心の表現が、低レベルの知覚的特徴から効率的な抽象的推論を実現するための有望なステップであることを示唆している。しかし、最も深い強化学習アプローチは、自然のシーンの合成特性を捉えないピクセルベースの表現にのみ依存する。そのためには、オブジェクト指向アプローチの作業と評価を可能にする環境とデータセットが必要です。本稿では,ゲームにおけるオブジェクト中心の状態表現を提供する環境セットであるocatariを提案する。 OCAtariはまた、ゲームのRAM状態操作を変更でき、特定の状況や新しい状況でも作成できる。この作業のコードベースはgithub.com/k4ntz/oc_atariで入手できる。

Cognitive science and psychology suggest that object-centric representations of complex scenes are a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep reinforcement learning approaches rely on only pixel-based representations that do not capture the compositional properties of natural scenes. For this, we need environments and datasets that allow us to work and evaluate object-centric approaches. We present OCAtari, a set of environments that provide object-centric state representations of Atari games, the most-used evaluation framework for deep RL approaches. OCAtari also allows for RAM state manipulations of the games to change and create specific or even novel situations. The code base for this work is available at github.com/k4ntz/OC_Atari.

翻訳日:2023-06-16 18:01:04 公開日:2023-06-14

# simplemapping: ディープマルチビューステレオを用いたリアルタイム視覚慣性密集マッピング

SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo ( http://arxiv.org/abs/2306.08648v1 )

ライセンス: Link先を確認

Yingye Xin, Xingxing Zuo, Dongyue Lu, Stefan Leutenegger

(参考訳) 逐次単眼画像と慣性測定ユニット(IMU)のみを用いて高画質の3次元メッシュ再構成を行うことができるリアルタイムビジュアル慣性高密度マッピング法を提案する。 6-DoFカメラのポーズは、頑健な特徴に基づく視覚慣性計測(VIO)によって推定され、ノイズの多い3Dマップポイントを副産物として生成する。本稿では,vioシステムから有益だがノイズの多いスパースポイントを効果的に活用できるスパースポイント支援マルチビューステレオニューラルネットワーク(spa-mvsnet)を提案する。 VIOからのスパース深度は、まず、シングルビュー深度完了ネットワークによって完了する。この濃厚深さマップは、当然精度は限られているが、mvsネットワークのコストボリューム生成と正確な濃密深さ予測のための正規化を導くために、前もって使用される。 MVSネットワークによるキーフレーム画像の予測深度マップをTSDF-Fusionを用いてグローバルマップにインクリメンタルに融合する。提案するspa-mvsnetと,複数の公開データセット上での視覚慣性的高密度マッピングシステムと,我々のデータセットの両方を評価し,システムの印象的な一般化能力と高品質な3dメッシュ再構成をオンラインで提供する能力を示した。提案手法は,EuRoCデータセットの難易度評価において,既存システムよりも39.7%のFスコア向上を実現している。受け入れ次第、この作業のコードをリリースする予定です。

We present a real-time visual-inertial dense mapping method capable of performing incremental 3D mesh reconstruction with high quality using only sequential monocular images and inertial measurement unit (IMU) readings. 6-DoF camera poses are estimated by a robust feature-based visual-inertial odometry (VIO), which also generates noisy sparse 3D map points as a by-product. We propose a sparse point aided multi-view stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system. The sparse depth from VIO is firstly completed by a single-view depth completion network. This dense depth map, although naturally limited in accuracy, is then used as a prior to guide our MVS network in the cost volume generation and regularization for accurate dense depth prediction. Predicted depth maps of keyframe images by the MVS network are incrementally fused into a global map using TSDF-Fusion. We extensively evaluate both the proposed SPA-MVSNet and the entire visual-inertial dense mapping system on several public datasets as well as our own dataset, demonstrating the system's impressive generalization capabilities and its ability to deliver high-quality 3D mesh reconstruction online. Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset. We plan to release the code of this work upon acceptance.
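
The incremental fusion step can be pictured with the classic per-voxel running weighted average used in TSDF fusion. The sketch below shows that update for a single voxel; it illustrates the general technique only and is not code from SimpleMapping. The function names, the default truncation distance, and the unit observation weight are assumptions made for this example.

```python
def truncate(signed_distance, truncation):
    """Scale a signed distance by the truncation band and clamp to [-1, 1]."""
    return max(-1.0, min(1.0, signed_distance / truncation))

def fuse_voxel(tsdf, weight, new_distance, new_weight=1.0, truncation=0.1):
    """Fuse one new depth observation into a voxel's TSDF value.

    tsdf, weight: current truncated signed distance and accumulated weight.
    new_distance: signed distance (in metres) implied by the new depth map.
    Returns the updated (tsdf, weight) pair.
    """
    observation = truncate(new_distance, truncation)
    fused_weight = weight + new_weight
    fused_tsdf = (tsdf * weight + observation * new_weight) / fused_weight
    return fused_tsdf, fused_weight

# Start from an empty voxel and fuse two observations in sequence.
value, w = fuse_voxel(0.0, 0.0, 0.05)
value, w = fuse_voxel(value, w, 0.2)
print(value, w)
```

Averaging over many keyframes is what lets individually noisy depth predictions settle into a consistent surface from which a mesh can be extracted.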

翻訳日:2023-06-16 18:00:51 公開日:2023-06-14

# ロボットのスキル合成に報酬を与える言語

Language to Rewards for Robotic Skill Synthesis ( http://arxiv.org/abs/2306.08647v1 )

ライセンス: Link先を確認

Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

(参考訳) 大規模言語モデル(llm)は、論理的な推論からコード記述まで、コンテキスト内学習を通じて多様な新機能を獲得するという、エキサイティングな進歩を示している。ロボティクスの研究者たちは、LLMを使ってロボット制御の能力を向上させる研究も行っている。しかし、低レベルロボットの動作はハードウェアに依存しており、LLMトレーニングコーパスでは表現できないため、LLMをロボットに適用するための既存の取り組みは、LLMをセマンティックプランナーとして、あるいは人間工学のコントロールプリミティブに頼ってロボットと対話している。一方、報酬関数は、多様なタスクを達成するために制御ポリシーに最適化できるフレキシブルな表現であり、その意味的な豊かさはLLMによって指定されるのに適している。本研究では, LLMを利用して, 様々なロボットタスクを最適化し, 実現可能な報酬パラメータを定義することによって, この実現を実現する新しいパラダイムを提案する。 LLMが生成する中間インタフェースとして報酬を用いることで、高レベルの言語命令や修正のギャップを、低レベルのロボット動作に効果的に埋めることができる。一方、リアルタイムオプティマイザであるmujoco mpcと組み合わせることで、ユーザがすぐに結果を観察し、システムへのフィードバックを提供できるインタラクティブな行動創造エクスペリエンスが実現される。提案手法の性能を体系的に評価するために,擬似四足ロボットと擬似マニピュレータロボットのための合計17のタスクを設計した。提案手法は設計したタスクの90%に確実に対応し,コード・アズ・ポリシシーのインターフェースとしてプリミティブ・スキルを用いたベースラインはタスクの50%を達成する。さらに本手法を,非包括的プッシュなどの複雑な操作スキルが対話システムを通じて現れるロボットアーム上で検証した。

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized and accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections to low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.

翻訳日:2023-06-16 18:00:25 公開日:2023-06-14

# 可変サイズテキスト・画像合成のための学習自由拡散モデル適応

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis ( http://arxiv.org/abs/2306.08645v1 )

ライセンス: Link先を確認

Zhiyu Jin and Xuli Shen and Bin Li and Xiangyang Xue

(参考訳) 拡散モデル(DM)は近年,テキスト・画像合成における最先端性能に注目されている。ディープラーニングの伝統に従って、DMは一定サイズの画像に基づいて訓練され、評価される。しかし、ユーザーは特定のサイズと様々なアスペクト比で様々な画像を要求する。本稿では,視覚の忠実性を維持しつつ,テキストから画像への拡散モデルを適用することに焦点を当てる。まず、合成中は、解像度の低い画像は不完全な物体の描写に悩まされ、高解像度画像は繰り返し提示される。次に,注意エントロピーがトークン量とともに変化することを示す統計的関係を確立し,モデルが画像解像度に比例して空間情報を集約することを示す。その後の観察では、低解像度の空間情報が少ないため、オブジェクトは不完全に描写されるが、高解像度の空間情報から繰り返し提示される。この観点から,注意エントロピーの変化を緩和し,観察した欠陥パターンを緩和するためのスケーリング係数を提案する。大規模な実験結果から,提案したスケーリング係数の有効性が検証され,視覚効果,画質,テキストアライメントが向上した。特に、これらの改善は、追加のトレーニングや微調整技術なしで達成される。

Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis. Abiding by the tradition in deep learning, DMs are trained and evaluated on images with fixed sizes. However, users demand images with specific sizes and various aspect ratios. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First, we observe that, during synthesis, lower-resolution images suffer from incomplete object portrayal, while higher-resolution images exhibit repetitive presentation. Next, we establish a statistical relationship indicating that attention entropy changes with token quantity, suggesting that models aggregate spatial information in proportion to image resolution. Our subsequent interpretation of these observations is that objects are incompletely depicted due to limited spatial information at low resolutions, while repetitive presentation arises from redundant spatial information at high resolutions. From this perspective, we propose a scaling factor to alleviate the change of attention entropy and mitigate the defective patterns observed. Extensive experimental results validate the efficacy of the proposed scaling factor, which enables the model to achieve better visual effects, image quality, and text alignment. Notably, these improvements are achieved without additional training or fine-tuning techniques.
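
The link between attention entropy and token count can be illustrated with a small sketch. It assumes plain softmax attention for a single query; the `scale` argument is a hypothetical stand-in for a scaling factor and is not the specific factor proposed in the paper, which the abstract does not state.

```python
import math

def attention_entropy(scores, scale=1.0):
    """Shannon entropy (in nats) of softmax(scale * scores) for one query."""
    scaled = [scale * s for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Same score pattern, repeated over more tokens: entropy rises with token count.
few_tokens = [1.0, 0.0] * 8     # 16 tokens (hypothetical scores)
many_tokens = [1.0, 0.0] * 32   # 64 tokens (hypothetical scores)
print(attention_entropy(few_tokens), attention_entropy(many_tokens))

# A scale above 1 sharpens the softmax and pulls the entropy back down.
print(attention_entropy(many_tokens, scale=4.0))
```

In this toy setting, rescaling the attention logits is one way to counteract the entropy drift caused by changing the number of tokens, which is the general idea the abstract describes.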

翻訳日:2023-06-16 17:59:53 公開日:2023-06-14

# コンピュータビジョンにおけるAGIに向けて: GPTと大規模言語モデルから学ぶ

Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models ( http://arxiv.org/abs/2306.08641v1 )

ライセンス: Link先を確認

Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Kaifeng Bi, Xiaotao Gu, Jianlong Chang, Qi Tian

(参考訳) AIコミュニティは、どんな現実世界の問題にも適用される人工知能(AGI)と呼ばれるアルゴリズムを追求してきた。近年,大規模言語モデル(LLM)を利用したチャットシステムが出現し,自然言語処理(NLP)におけるAGIの実現に向けて急速に進んでいるが,コンピュータビジョン(CV)におけるAGIへの道のりはいまだ不明である。ディレンマは、視覚信号が言語信号よりも複雑であることに起因するかも知れませんが、具体的な理由の発見や、gptやllmからの経験を吸収して問題を解決することに関心があります。本稿では、AGIの概念定義から始め、NLPがチャットシステムを介して広範囲のタスクをどのように解決するかを簡単にレビューする。この分析は、統合がCVの次の重要な目標であることを示している。しかし、この方向への様々な取り組みにもかかわらず、CVは、すべてのタスクを自然に統合するGPTのようなシステムからはまだ遠い。 CVの本質的な弱点は、環境から学ぶためのパラダイムが欠如していることが指摘されているが、NLPはテキストの世界においてその課題を達成している。次に、CVアルゴリズム(つまりエージェント)を世界規模で対話可能な環境に配置し、その動作に関する将来のフレームを予測するために事前訓練し、様々なタスクをこなすための命令で微調整するパイプラインを想像する。私たちは、このアイデアを前進させ、それをスケールアップするために、かなりの研究とエンジニアリングの努力を期待しています。

The AI community has been pursuing algorithms known as artificial general intelligence (AGI) that apply to any kind of real-world problem. Recently, chat systems powered by large language models (LLMs) have emerged and rapidly become a promising direction to achieve AGI in natural language processing (NLP), but the path towards AGI in computer vision (CV) remains unclear. One may attribute the dilemma to the fact that visual signals are more complex than language signals, yet we are interested in finding concrete reasons, as well as absorbing experiences from GPT and LLMs to solve the problem. In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system. The analysis suggests that unification is the next important goal of CV. However, despite various efforts in this direction, CV is still far from a system like GPT that naturally integrates all tasks. We point out that the essential weakness of CV lies in lacking a paradigm to learn from environments, whereas NLP has accomplished the task in the text world. We then imagine a pipeline that puts a CV algorithm (i.e., an agent) in world-scale, interactable environments, pre-trains it to predict future frames with respect to its action, and then fine-tunes it with instruction to accomplish various tasks. We expect substantial research and engineering efforts to push the idea forward and scale it up, for which we share our perspectives on future research directions.

翻訳日:2023-06-16 17:59:33 公開日:2023-06-14

# AssistGPT:計画、実行、検査、学習が可能な汎用マルチモーダルアシスタント

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ( http://arxiv.org/abs/2306.08640v1 )

ライセンス: Link先を確認

Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou

(参考訳) 近年のLarge Language Models (LLMs) の研究は、一般のNLPAIアシスタントに顕著な進歩をもたらした。いくつかの研究は、より一般的なマルチモーダルユーザクエリに対処するために、モデルやapiの計画と呼び出しにllmの使用をさらに検討している。この進歩にもかかわらず、視覚タスクの多様な性質のため、複雑な視覚ベースのタスクは依然として困難である。この多様性は2つの側面に反映されます 1)経路の推論。多くの実生活アプリケーションでは、クエリ自体を調べるだけでクエリを正確に分解することは困難である。特定の視覚内容と各ステップの結果に基づいた計画が通常必要である。 2)柔軟な入力と中間結果。入力フォームは、野生のケースでは柔軟で、単一の画像やビデオだけでなく、ビデオや画像の混合物(たとえば、ユーザービュー画像といくつかの参照ビデオ)も含む。さらに、複雑な推論プロセスは、ビデオナレーションやセグメント化されたビデオクリップなど、さまざまなマルチモーダル中間結果を生成する。このような一般的なケースに対処するため,我々は,plan,execute,inspect,learning(peil)と呼ばれるインターリーブされたコードと言語推論アプローチを備えたマルチモーダルaiアシスタントである assistgpt を提案する。具体的には、Plannerは自然言語を使ってExecutorのどのツールが次にすべきかを、現在の推論の進捗に基づいて計画することができる。インスペクタは、プランナーが特定のツールに適切な視覚情報を供給するのを補助する効率的なメモリマネージャである。最後に、推論プロセス全体が複雑で柔軟であるため、学習者はモデルが最適な解を自律的に探索し発見できるように設計されている。我々は, A-OKVQA と NExT-QA のベンチマーク実験を行った。さらに,本システムでは,ベンチマークよりもはるかに複雑な質問を処理可能であることを示す。

Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks remain challenging due to the diverse nature of visual tasks. This diversity is reflected in two aspects: 1) Reasoning paths. For many real-life applications, it is hard to accurately decompose a query simply by examining the query itself. Planning based on the specific visual content and the results of each step is usually required. 2) Flexible inputs and intermediate results. Input forms can be flexible in in-the-wild cases and involve not only a single image or video but also a mixture of videos and images, e.g., a user-view image with some reference videos. Besides, a complex reasoning process will also generate diverse multimodal intermediate results, e.g., video narrations, segmented video clips, etc. To address such general cases, we propose a multi-modal AI assistant, AssistGPT, with an interleaved code and language reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. Specifically, the Planner is capable of using natural language to plan which tool in the Executor should act next based on the current reasoning progress. The Inspector is an efficient memory manager that assists the Planner in feeding proper visual information into a specific tool. Finally, since the entire reasoning process is complex and flexible, a Learner is designed to enable the model to autonomously explore and discover the optimal solution. We conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving state-of-the-art results. Moreover, showcases demonstrate the ability of our system to handle questions far more complex than those found in the benchmarks.

翻訳日:2023-06-16 17:59:07 公開日:2023-06-14

# TAPIR: フレーム単位の初期化と時間的リファインメントによる任意のポイントの追跡

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement ( http://arxiv.org/abs/2306.08637v1 )

ライセンス: Link先を確認

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

(参考訳) 本稿では,ビデオシーケンスを通して任意の物理面上の問合せ点を効果的に追跡する,TAP(Tracking Any Point)の新しいモデルを提案する。提案手法では,(1)他の各フレームのクエリ点に対する適切な候補点マッチングを独立に求めるマッチングステージ,(2)局所相関に基づいて軌跡と問合せ特徴の両方を更新するリファインメントステージの2つのステージを用いる。結果として得られたモデルは、DAVISにおける平均約20%の絶対平均ジャカード(AJ)改善によって示されるように、TAP-Vidベンチマークにおける大きなマージンで、すべてのベースライン手法を上回ります。本モデルは,長大かつ高精細な映像系列の高速推論を容易にする。現代のGPUでは、我々の実装はリアルタイムよりも高速にポイントを追跡する能力を持っている。視覚化、ソースコード、事前訓練されたモデルは、プロジェクトのWebページにある。

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time. Visualizations, source code, and pretrained models can be found on our project webpage.

翻訳日:2023-06-16 17:58:36 公開日:2023-06-14

# 移動平均と回帰モデルによる無線チャネルの品質予測

Predicting Wireless Channel Quality by means of Moving Averages and Regression Models ( http://arxiv.org/abs/2306.08634v1 )

ライセンス: Link先を確認

Gabriele Formis, Stefano Scanzio, Gianluca Cena, Adriano Valenzano

(参考訳) メディアアクセス制御層で見られるように、無線チャネルの将来品質を確実に予測する能力は、ワイヤーに依存しない将来の産業ネットワークの性能を向上させるための鍵となる。チャネルの振る舞いがどれほど変化するかを事前に知ることで、最適なチャネルを適応的に選択する手順を高速化し、ネットワークをより決定性が高く、信頼性が高く、エネルギー不足が軽減され、デバイスローミング能力が向上する可能性がある。この目的のために、実際のWi-Fi設定から取得したデータに基づいて、複数のキーパフォーマンス指標を用いて、移動平均と回帰に基づく一般的なアプローチを比較した。さらに, 異なる手法による結果の線形結合に基づく簡易な手法を提案し解析し, 予測誤差を更に低減し, 達成可能な誤差の上限を低くする方法について考察した。最良モデルは指数移動平均であり,平均誤差2.10%でフレーム配信率を予測できると同時に,計算の複雑さやメモリ消費が他のモデルよりも低いことがわかった。

The ability to reliably predict the future quality of a wireless channel, as seen by the media access control layer, is a key enabler to improve performance of future industrial networks that do not rely on wires. Knowing in advance how much channel behavior may change can speed up procedures for adaptively selecting the best channel, making the network more deterministic, reliable, and less energy-hungry, possibly improving device roaming capabilities at the same time. To this aim, popular approaches based on moving averages and regression were compared, using multiple key performance indicators, on data captured from a real Wi-Fi setup. Moreover, a simple technique based on a linear combination of outcomes from different techniques was presented and analyzed, to further reduce the prediction error, and some considerations about lower bounds on achievable errors have been reported. We found that the best model is the exponential moving average, which managed to predict the frame delivery ratio with a 2.10% average error and, at the same time, has lower computational complexity and memory consumption than the other models we analyzed.
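
As an illustration of why an exponential moving average is cheap in both computation and memory, here is a minimal sketch. The smoothing factor `alpha`, the initialization from the first sample, and the sample data are assumptions for illustration; the abstract does not give the paper's exact configuration.

```python
def ema_predict(outcomes, alpha):
    """Return the EMA after each sample; the last value serves as the forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate = None
    history = []
    for x in outcomes:
        # Only one running value is kept between samples.
        estimate = x if estimate is None else alpha * x + (1.0 - alpha) * estimate
        history.append(estimate)
    return history

# 1.0 = frame delivered, 0.0 = frame lost (hypothetical samples).
samples = [1.0, 1.0, 0.0, 1.0]
print(ema_predict(samples, alpha=0.5))  # [1.0, 1.0, 0.5, 0.75]
```

Each update needs one multiply-add and a single stored value, whereas a windowed average or a regression model has to retain a history of past samples.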

翻訳日:2023-06-16 17:58:20 公開日:2023-06-14

# ポーズ付きRGBDデータからシーンレベルの暗黙的3D表現を予測する学習

Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data ( http://arxiv.org/abs/2306.08671v1 )

ライセンス: Link先を確認

Nilesh Kulkarni, Linyi Jin, Justin Johnson, David F. Fouhey

(参考訳) 本稿では,RGBDデータから3次元再構成のためのシーンレベルの暗黙関数を学習する手法を提案する。テスト時には,これまで見えなかったRGB画像を,暗黙の関数によるシーンの3次元再構成にマッピングする。 3次元再構成のための暗黙の関数はメッシュに結びついていることが多いが,RGBD画像のみを用いてトレーニングできることを示す。この設定は、3Dリコンストラクションが加速度計+RGBDの海を解き放つのに役立つかもしれない。当社のシステムであるD2-DRDFは,メッシュ監視を用いた現在の手法に適合し,時には優れ,スパースデータの堅牢性も向上する。

We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data. At test time, our system maps a previously unseen RGB image to a 3D reconstruction of a scene via implicit functions. While implicit functions for 3D reconstruction have often been tied to meshes, we show that we can train one using only a set of posed RGBD images. This setting may help 3D reconstruction unlock the sea of accelerometer+RGBD data that is coming with new phones. Our system, D2-DRDF, can match and sometimes outperform current methods that use mesh supervision and shows better robustness to sparse data.

翻訳日:2023-06-16 17:52:53 公開日:2023-06-14

# Gossipモデルにおける分散学習ダイナミクス

Decentralized Learning Dynamics in the Gossip Model ( http://arxiv.org/abs/2306.08670v1 )

ライセンス: Link先を確認

John Lazarsfeld, Dan Alistarh

(参考訳) 我々は,ゴシップモデルにおけるメモリ制限ノード数$n$の分散マルチアームバンディットについて検討し,各ラウンドにおいて各ノードが$m$のアームの1つを局所的に採用し,アームの分布から引き出された報酬を観測し,次にランダムにサンプリングされた隣人と通信し,次のラウンドでその方針を決定する。各ノードの決定は完全にローカルであり、最近取得した報酬とサンプルした隣接ノードのみに依存する。我々は,これらの分散ダイナミクスのグローバル進化と,ある種の「ゼロサム」乗算重み更新アルゴリズムとの関係を示し,これらの自然プロトコルの集団レベルの後悔を分析するための汎用フレームワークを開発した。この枠組みを用いて、固定的な報酬設定(各腕の分布の平均が時間とともに固定される)と敵対的な報酬設定(時間とともに変化しうる手段)について、幅広いパラメータ規則(すなわち、人口と武器の数)の下でサブ線形後悔境界を導出する。さらに,これらのプロトコルは,確率的勾配 oracle から報酬分布が生成される場合に,simplex 上の凸関数を近似的に最適化できることを示した。

We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are decentralized: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics and a certain class of "zero-sum" multiplicative weight update algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of the population and number of arms) for both the stationary reward setting (where the mean of each arm's distribution is fixed over time) and the adversarial reward setting (where means can vary over time). Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.
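
The setting can be made concrete with a toy simulation. The local rule used here (copy the sampled neighbor's arm only if its latest reward was higher) is an assumption chosen for illustration and is not necessarily one of the dynamics analyzed in the paper. Bernoulli rewards, a complete communication graph, and all parameter values are likewise hypothetical.

```python
import random

def gossip_round(arms, means, rng):
    """One synchronous round: pull, sample a random neighbor, maybe copy its arm."""
    n = len(arms)
    # Each node pulls its current arm and observes a Bernoulli reward.
    rewards = [1 if rng.random() < means[a] else 0 for a in arms]
    new_arms = []
    for i in range(n):
        j = rng.randrange(n)  # uniformly sampled neighbor (complete graph)
        # Assumed local rule: copy the neighbor's arm only if it just did better.
        new_arms.append(arms[j] if rewards[j] > rewards[i] else arms[i])
    return new_arms

def simulate(n=200, means=(0.1, 0.9), rounds=50, seed=0):
    """Return the share of nodes holding the best arm after `rounds` rounds."""
    rng = random.Random(seed)
    arms = [i % len(means) for i in range(n)]  # start evenly split across arms
    for _ in range(rounds):
        arms = gossip_round(arms, means, rng)
    best = max(range(len(means)), key=lambda a: means[a])
    return sum(1 for a in arms if a == best) / n

print(simulate())
```

Each node stores only its current arm and uses only its own latest reward and that of one sampled neighbor, which matches the memory-constrained, fully local decisions described in the abstract.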

翻訳日:2023-06-16 17:52:40 公開日:2023-06-14

# 量子ドットデバイスを用いた通信用cバンドにおける識別不能光子のオンデマンド生成

On-demand Generation of Indistinguishable Photons in the Telecom C-Band using Quantum Dot Devices ( http://arxiv.org/abs/2306.08668v1 )

ライセンス: Link先を確認

Daniel A. Vajner, Paweł Holewa, Emilia Zięba-Ostój, Maja Wasiluk, Martin von Helversen, Aurimas Sakanas, Alexander Huck, Kresten Yvind, Niels Gregersen, Anna Musiał, Marcin Syperek, Elizaveta Semenova, Tobias Heindel

(参考訳) 量子ドット(QD)は、量子情報や量子通信への応用のために、単一および絡み合った光子を生成することができる。 780 nmから950 nmのスペクトル範囲で放射するQDは、理想に近い単一光子純度と識別不能性を持つが、この波長域では光損失が大きいため、光ファイバーネットワークへの応用には最適ではない。ここで好まれる選択は、1550nm付近(Telecom Cバンド)の最低損失スペクトルウィンドウで動作するQDである。本研究では,InAs/InP QD-mesa構造をシリコンウェハ上の金属リフレクタと異種集積した単一QDデバイスから,通信用Cバンドにおける識別不能な光子のコヒーレントなオンデマンド生成を実証する。二励起子-励起子放射カスケードのパルス2光子共鳴励起を用いて、4$\pi$のパルス面積までのラビ回転と、励起子および二励起子光子に対してそれぞれ$g^{(2)}(0)$=0.005(1)と0.015(1)という高い単一光子純度を観測する。Hong-Ou-Mandel型実験では, 共偏光と交差偏光の同時計数を比較することにより, 最大35(3)%の2光子干渉可視度を得る。これは、波長変換なしで直接通信Cバンドに放出される単一光子の光子識別不能性の著しい進歩を示す。

Semiconductor quantum dots (QDs) enable the generation of single and entangled photons for applications in quantum information and quantum communication. While QDs emitting in the 780 nm to 950 nm spectral range feature close-to-ideal single-photon purities and indistinguishabilities, they are not the best choice for applications in fiber-optical networks, due to the high optical losses in this wavelength regime. The preferable choice here is QDs operating in the lowest-loss spectral window around 1550 nm (telecom C-band). In this work, we demonstrate the coherent on-demand generation of indistinguishable photons in the telecom C-band from single QD devices consisting of InAs/InP QD-mesa structures heterogeneously integrated with a metallic reflector on a silicon wafer. Using pulsed two-photon resonant excitation of the biexciton-exciton radiative cascade, we observe Rabi rotations up to pulse areas of 4$\pi$ and a high single-photon purity in terms of $g^{(2)}(0)$=0.005(1) and 0.015(1) for exciton and biexciton photons, respectively. We obtain two-photon interference visibilities of up to 35(3)% in Hong-Ou-Mandel-type experiments by comparing co- and cross-polarized coincidences. This represents a significant advancement in the photon-indistinguishability of single photons emitted directly in the telecom C-band without wavelength conversion.
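
For readers unfamiliar with how a visibility is obtained from co- and cross-polarized coincidences, the sketch below uses one common definition, $V = 1 - C_\mathrm{co}/C_\mathrm{cross}$. Whether the authors apply exactly this estimator, and with which corrections, is not stated in the abstract, and the counts are hypothetical.

```python
def hom_visibility(co_coincidences, cross_coincidences):
    """Two-photon interference visibility, V = 1 - C_co / C_cross.

    cross_coincidences: central-peak coincidences for distinguishable
    (cross-polarized) photons, used as the reference level.
    co_coincidences: the same quantity for co-polarized photons.
    """
    if cross_coincidences <= 0:
        raise ValueError("cross-polarized coincidences must be positive")
    return 1.0 - co_coincidences / cross_coincidences

# Hypothetical counts: co-polarized coincidences at 65% of the reference level.
print(round(hom_visibility(65, 100), 2))  # 0.35
```

Perfectly indistinguishable photons would suppress the co-polarized coincidences entirely and give a visibility of 1, while fully distinguishable photons give 0.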

翻訳日:2023-06-16 17:52:13 公開日:2023-06-14

# 効率的な自己注意をいつ使うのか? テキスト・音声・画像変換器バリアントのプロファイリング

When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants ( http://arxiv.org/abs/2306.08667v1 )

ライセンス: Link先を確認

Anuj Diwan, Eunsol Choi, David Harwath

(参考訳) 本稿では,テキスト,音声,視覚にまたがる自己着脱型変圧器の効率に関する最初の統一研究を行う。我々は、様々な効率指標(レイテンシ、スループット、メモリ)を用いて、効率的なトランスフォーマー変種がバニラモデルよりも効率的になる入力長閾値(タップポイント)を同定する。そこで,本研究では,自己教師付き音声モデルの局所的対応型であるl-hubertを提案する。これらのしきい値は a) 典型的なデータセットのシーケンスの長さよりもはるかに高い (b)計量とモダリティに依存しており、正しいモデルを選択することはモダリティ、タスクタイプ(一般的なコンテキストとロングフォーム)、リソース制約(時間対メモリ)に依存することを示している。また, 変圧器部品の計算コストの推移を可視化することにより, 非自己注意部品は計算コストが著しく高いことを示す。私たちはプロファイリングツールキットをhttps://github.com/ajd12342/profiling-transformersでリリースします。

We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision. We identify input length thresholds (tipping points) at which efficient Transformer variants become more efficient than vanilla models, using a variety of efficiency metrics (latency, throughput, and memory). To conduct this analysis for speech, we introduce L-HuBERT, a novel local-attention variant of a self-supervised speech model. We observe that these thresholds are (a) much higher than typical dataset sequence lengths and (b) dependent on the metric and modality, showing that choosing the right model depends on modality, task type (long-form vs. typical context) and resource constraints (time vs. memory). By visualising the breakdown of the computational costs for transformer components, we also show that non-self-attention components exhibit significant computational costs. We release our profiling toolkit at https://github.com/ajd12342/profiling-transformers .
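
The notion of a tipping point can be sketched as the first profiled input length at which the efficient variant costs less than the vanilla model. The two cost functions below are hypothetical analytic stand-ins; a real study would use measured latency, throughput, or memory, and this sketch does not reproduce the API of the authors' toolkit.

```python
def tipping_point(lengths, vanilla_cost, efficient_cost):
    """Smallest profiled length at which the efficient variant is strictly cheaper.

    Assumes the efficient variant stays cheaper beyond that length.
    """
    for n in sorted(lengths):
        if efficient_cost(n) < vanilla_cost(n):
            return n
    return None  # no crossover inside the profiled range

# Hypothetical cost models: quadratic attention vs. linear attention with overhead.
def vanilla(n):
    return n * n

def efficient(n):
    return 512 * n + 4096

lengths = [2 ** k for k in range(6, 14)]  # 64, 128, ..., 8192
print(tipping_point(lengths, vanilla, efficient))  # 1024
```

With these made-up costs the crossover sits at 1024 tokens. A dataset whose typical sequences are shorter than that would gain nothing from the efficient variant, which mirrors the abstract's observation that the thresholds often exceed typical sequence lengths.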

翻訳日:2023-06-16 17:51:38 公開日:2023-06-14

# Radiology-GPT: ラジオロジーのための大規模言語モデル

Radiology-GPT: A Large Language Model for Radiology ( http://arxiv.org/abs/2306.08666v1 )

ライセンス: Link先を確認

Zhengliang Liu, Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim, Haixing Dai, Lin Zhao, Dajiang Zhu, Jun Liu, Wei Liu, Dinggang Shen, Xiang Li, Quanzheng Li, Tianming Liu

(参考訳) 放射線学のための大規模言語モデルであるRadiology-GPTを紹介する。放射線学領域知識の広範なデータセットに基づく指導チューニング手法を用いて、Radiology-GPTは、StableLM、Dolly、LLaMAといった一般的な言語モデルと比較して優れた性能を示す。放射線診断、研究、コミュニケーションにおいて重要な多様性を示す。この研究は、臨床NLPの今後の発展の触媒となる。Radiology-GPTの実装が成功したことは、HIPAAのようなプライバシ標準の遵守を確保しつつ、特に特有の医療専門分野に適した、生成的な大きな言語モデルをローカライズする可能性を示唆している。様々な病院のニーズに合わせて個別化された大規模言語モデルを開発する見通しは、有望な方向性を示している。これらのモデルにおける会話能力とドメイン固有の知識の融合は、医療AIにおける将来の発展を促進することを目的としている。 Radiology-GPTのデモはhttps://huggingface.co/spaces/allen-eric/radiology-gptで見ることができる。

We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.

翻訳日:2023-06-16 17:51:16 公開日:2023-06-14

# 一般統計モデルに対するZiv-Zakai型誤差境界

Ziv-Zakai-type error bounds for general statistical models ( http://arxiv.org/abs/2306.08660v1 )

ライセンス: Link先を確認

Mankei Tsang

(参考訳) パラメータ空間 $\Theta$ が一般であり、$\beta(\theta)$ が$\theta$ の線型函数でなくてもよいとき、パラメータ $\beta:\Theta \to \mathbb R$ を推定するためのベイズ誤差上の Ziv-Zakai 型下界を提案する。

I propose Ziv-Zakai-type lower bounds on the Bayesian error for estimating a parameter $\beta:\Theta \to \mathbb R$ when the parameter space $\Theta$ is general and $\beta(\theta)$ need not be a linear function of $\theta$.

翻訳日:2023-06-16 17:50:59 公開日:2023-06-14

# 3Dポイントクラウド理解のためのインコンテキスト学習の探索

Explore In-Context Learning for 3D Point Cloud Understanding ( http://arxiv.org/abs/2306.08659v1 )

ライセンス: Link先を確認

Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu

(参考訳) 広範囲なデータに基づいて訓練された大規模モデルの台頭により、自然言語処理やコンピュータビジョンタスクにおいて大きな可能性を示す新たな学習パラダイムとなった。一方、インコンテキスト学習は、3d point cloudドメインではまだほとんど未調査である。マスク付きモデリングは、2Dビジョンにおけるコンテキスト内学習に成功しているが、それを3Dポイントクラウドに直接拡張することは、依然として困難な課題である。点雲の場合、トークンそのものは、推論中にマスクされる点雲の位置(座標)である。さらに、前作における位置埋め込みは、不注意に情報漏洩をもたらす可能性がある。このような課題に対処するために,我々は,特に3d ポイントクラウドにおけるインコンテキスト学習用に設計された point-in-context という新しいフレームワークを導入する。さらに,一般点サンプリング演算子と協調して動作するよう慎重に設計したジョイントサンプリングモジュールを提案し,上記の技術的課題を効果的に解決する。提案手法の汎用性と適応性を検証するため,幅広いタスクを扱うための広範囲な実験を行った。さらに、より効果的なプロンプト選択戦略により、我々のフレームワークは個別に訓練されたモデルの結果を上回る。

With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. Furthermore, with a more effective prompt selection strategy, our framework surpasses the results of individually trained models.

翻訳日:2023-06-16 17:50:50 公開日:2023-06-14

# Babel-ImageNet:視覚・言語表現の多言語的評価

Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations ( http://arxiv.org/abs/2306.08658v1 )

ライセンス: Link先を確認

Gregor Geigle, Radu Timofte, Goran Glavaš

(参考訳) 視覚と言語(VL)モデルは、各モダリティ(例えばCLIP)ごとに異なるエンコーダを持ち、ゼロショット画像分類と画像テキスト検索のための定番モデルになっている。しかし、これらのモデルの評価の大部分は、英語のテキストのみで行われている: 言語固有の画像キャプションデータセットの作成にはコストがかかるため、多言語VLベンチマークは少数の高リソース言語に限定されている。本研究では,1000のImageNetラベルを92言語に(部分的に)翻訳する大規模多言語ベンチマークであるBabel-ImageNetを紹介する。これは機械翻訳(MT)や手動アノテーションを使わずに構築した。代わりに、共有のWordNetシンセットを介して、ImageNet概念を巨大な多言語レキシコ・セマンティクスネットワークであるBabelNetにリンクすることで、信頼できる翻訳を自動的に取得する。 92のBabel-ImageNet言語のそれぞれについて,公開されている8種類の多言語CLIPモデルをゼロショット画像分類(ZS-IC)で評価し,英語ImageNetの性能と高リソース言語(ドイツ語や中国語など)の性能との間に大きなギャップがあり,低リソース言語(シンハラ語やラオ語など)ではさらに大きなギャップがあることを明らかにした。 Babel-ImageNetのZS-IC性能は画像テキスト検索の性能と高い相関性を示し、金色の画像テキストデータを持たないほとんどの言語において、多言語VL表現空間の品質を推定するのにBabel-ImageNetが適していることを示す。最後に、低リソース言語に対する多言語CLIPの性能は、安価でパラメータ効率の良い言語特化学習によって劇的に改善できることを示す。コードとデータを公開します。 https://github.com/gregor-ge/Babel-ImageNet

Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. The bulk of the evaluation of these models is, however, performed with English text only: the costly creation of language-specific image-caption datasets has limited multilingual VL benchmarks to a handful of high-resource languages. In this work, we introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of 1000 ImageNet labels to 92 languages, built without resorting to machine translation (MT) or requiring manual annotation. We instead automatically obtain reliable translations of ImageNet concepts by linking them -- via shared WordNet synsets -- to BabelNet, a massively multilingual lexico-semantic network. We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e.g., German or Chinese), and an even bigger gap for low-resource languages (e.g., Sinhala or Lao). Crucially, we show that the models' ZS-IC performance on Babel-ImageNet highly correlates with their performance in image-text retrieval, validating that Babel-ImageNet is suitable for estimating the quality of the multilingual VL representation spaces for the vast majority of languages that lack gold image-text data. Finally, we show that the performance of multilingual CLIP for low-resource languages can be drastically improved via cheap, parameter-efficient language-specific training. We make our code and data publicly available: https://github.com/gregor-ge/Babel-ImageNet

翻訳日:2023-06-16 17:50:31 公開日:2023-06-14

# EMERSK -- 状況知識を用いた説明可能なマルチモーダル感情認識

EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge ( http://arxiv.org/abs/2306.08657v1 )

ライセンス: Link先を確認

Mijanur Palash, Bharat Bhargava

(参考訳) 近年,ディープラーニングアルゴリズムの普及により,感情の自動認識が注目されている。感情認識における主な課題の1つは、データで利用可能な様々な手がかり(モダリティ)を効果的に活用することである。もう一つの課題は、学習結果の適切な説明を提供することであり、これらの課題に対処するために、人間の感情認識と視覚情報を用いた説明のための一般化されたモジュールシステムEMERSK(Explainable Multimodal Emotion Recognition with situational Knowledge)を提案する。本システムは, 表情, 姿勢, 歩行などの複数のモーダルを柔軟かつモジュラーな方法で処理することができる。ネットワークは、利用可能なデータに応じて追加または削除できるさまざまなモジュールで構成されている。畳み込みニューラルネットワーク(cnns)とエンコーダ-デコーダスタイルの注意機構を備えた2ストリームネットワークアーキテクチャを用いて,顔画像から深い特徴を抽出する。同様に、長い短期記憶(lstm)を持つcnnとリカレントニューラルネットワーク(rnn)を用いて、姿勢や歩行データから特徴を抽出する。また、背景からの深い機能を学習プロセスのコンテキスト情報として取り入れています。各モジュールの深い機能は、初期のフュージョンネットワークを使って融合される。さらに,シーンから抽出した位置タイプと形容詞・名詞ペア(anp),感情の時空間的平均分布から得られた状況知識を活用し,説明を生成する。アブレーション研究は、各サブネットワークが独立して感情認識を行い、それらをマルチモーダルアプローチで組み合わせることで、全体的な認識性能が著しく向上することを示した。 GroupWalkを含む様々なベンチマークデータセットで実施された大規模な実験は、他の最先端手法と比較して、我々のアプローチの優れた性能を検証する。

Automatic emotion recognition has recently gained significant attention due to the growing popularity of deep learning algorithms. One of the primary challenges in emotion recognition is effectively utilizing the various cues (modalities) available in the data. Another challenge is providing a proper explanation of the outcome of the learning. To address these challenges, we present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK), a generalized and modular system for human emotion recognition and explanation using visual information. Our system can handle multiple modalities, including facial expressions, posture, and gait, in a flexible and modular manner. The network consists of different modules that can be added or removed depending on the available data. We utilize a two-stream network architecture with convolutional neural networks (CNNs) and encoder-decoder style attention mechanisms to extract deep features from face images. Similarly, CNNs and recurrent neural networks (RNNs) with Long Short-term Memory (LSTM) are employed to extract features from posture and gait data. We also incorporate deep features from the background as contextual information for the learning process. The deep features from each module are fused using an early fusion network. Furthermore, we leverage situational knowledge derived from the location type and adjective-noun pair (ANP) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate explanations. Ablation studies demonstrate that each sub-network can independently perform emotion recognition, and combining them in a multimodal approach significantly improves overall recognition performance. Extensive experiments conducted on various benchmark datasets, including GroupWalk, validate the superior performance of our approach compared to other state-of-the-art methods.

翻訳日:2023-06-16 17:49:56 公開日:2023-06-14

# Augment then Smooth: 差分プライバシーと認証されたロバスト性の両立

Augment then Smooth: Reconciling Differential Privacy with Certified Robustness ( http://arxiv.org/abs/2306.08656v1 )

ライセンス: Link先を確認

Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot

(参考訳) マシンラーニングモデルは、デプロイに対する信頼を損なう可能性のあるさまざまな攻撃に影響を受けやすい。これらの脅威には、トレーニングデータのプライバシーに対する攻撃や、モデルの精度を脅かす敵の例が含まれる。ディファレンシャルプライバシとランダム化平滑化は、これらの脅威のそれぞれに対して証明可能な保証を提供する効果的な防御であるが、どちらの防御も他の脅威にどのように影響するかはよく分かっていない。本研究では,プライバシー保証と認証された堅牢性の両方を同時に達成できることを論じる。我々は,ランダム化平滑化による認定ロバストネスを差分プライベートモデルトレーニングに統合するdp-certと呼ばれるフレームワークを提供する。例えば、DP-CERTは、CIFAR10上の個人確率勾配勾配よりも12倍の精度向上と平均認定半径の10倍の精度向上を達成し、精度の1.2%の低下を犠牲にしている。試料ごとの距離解析により, 認定半径は局所リプシッツ定数と損失面の滑らかさに相関することを示した。これにより、プライベートモデルが堅牢でない場合に新たな診断方法が提供される。

Machine learning models are susceptible to a variety of attacks that can erode trust in their deployment. These threats include attacks against the privacy of training data and adversarial examples that jeopardize model accuracy. Differential privacy and randomized smoothing are effective defenses that provide certifiable guarantees for each of these threats; however, it is not well understood how implementing either defense impacts the other. In this work, we argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously. We provide a framework called DP-CERT for integrating certified robustness through randomized smoothing into differentially private model training. For instance, compared to differentially private stochastic gradient descent on CIFAR10, DP-CERT leads to a 12-fold increase in certified accuracy and a 10-fold increase in the average certified radius at the expense of a drop in accuracy of 1.2%. Through in-depth per-sample metric analysis, we show that the certified radius correlates with the local Lipschitz constant and smoothness of the loss surface. This provides a new way to diagnose when private models will fail to be robust.
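
To make "certified radius" concrete, the sketch below computes the standard L2 radius from the randomized smoothing literature, $R = \sigma\,\Phi^{-1}(\underline{p_A})$, where $\underline{p_A}$ is a lower confidence bound on the probability that the base classifier returns the top class under Gaussian noise of standard deviation $\sigma$. Whether DP-CERT certifies with exactly this bound is an assumption, since the abstract only says that randomized smoothing is used. The input values are hypothetical.

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """L2 certified radius sigma * Phi^{-1}(p_a_lower); 0.0 means abstain."""
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must be strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if p_a_lower <= 0.5:
        return 0.0  # the top class is not confidently the majority: no certificate
    return sigma * NormalDist().inv_cdf(p_a_lower)

# Hypothetical inputs: top-class probability bound 0.99 under noise sigma = 0.25.
print(certified_radius(0.99, 0.25))  # roughly 0.58
```

The radius grows with both the noise level and the confidence of the smoothed prediction, so averaging it over test samples gives one simple robustness summary.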

翻訳日:2023-06-16 17:49:29 公開日:2023-06-14

# 国家間における市民不安の相転移と時間的変化

Phase Transitions of Civil Unrest across Countries and Time ( http://arxiv.org/abs/2306.08698v1 )

ライセンス: Link先を確認

Dan Braha

(参考訳) 組織のマクロなパターン間の急激なシフトを特徴とする相転移は、複雑なシステムにおいてユビキタスである。物理科学や自然科学の研究は多いが、社会システムにおけるこの現象の実証的研究は比較的未発達である。本研究の目的は,集団的市民不安のダイナミクスが,再帰的位相シフトの系列として,各フェーズが測定可能かつ識別可能な潜在性を有することを明らかにすることにある。 1946年から2017年までの170か国における市民不安の総合データセットを用いて,市民不安のマクロレベルの統計モデルを導入し,その可能性を評価する。以上の結果から,マクロレベルの位相モデルは,世界各国の市民不安データの特徴を効果的に捉え,普遍的なメカニズムは市民不安のダイナミクスの特定の側面を裏付ける可能性がある。また,国家の時間単位当たりの長期的不安を定量化する新たな尺度を導入し,特定の地域に集中して,市民的不安が地理的に集結する傾向があることを示す。我々のアプローチは、市民の不安を超えた様々な集団の人間の現象の相転移を特定し測定する可能性があり、複雑な社会システムに対するより良い理解に寄与する。

Phase transitions, characterized by abrupt shifts between macroscopic patterns of organization, are ubiquitous in complex systems. Despite considerable research in the physical and natural sciences, the empirical study of this phenomenon in societal systems is relatively underdeveloped. The goal of this study is to explore whether the dynamics of collective civil unrest can be plausibly characterized as a sequence of recurrent phase shifts, with each phase having measurable and identifiable latent characteristics. We introduce a macro-level statistical model of civil unrest and evaluate its plausibility using a comprehensive dataset of civil unrest events in 170 countries from 1946 to 2017. Our findings demonstrate that the macro-level phase model effectively captures the characteristics of civil unrest data from diverse countries globally and that universal mechanisms may underlie certain aspects of the dynamics of civil unrest. We also introduce a new scale to quantify a country's long-term unrest per unit of time and show that civil unrest events tend to cluster geographically, with the magnitude of civil unrest concentrated in specific regions. Our approach has the potential to identify and measure phase transitions in various collective human phenomena beyond civil unrest, contributing to a better understanding of complex social systems.

翻訳日:2023-06-16 17:42:34 公開日:2023-06-14

# GHP-MOFassemble:拡散モデリング、高スループットスクリーニング、分子動力学による大規模な炭素回収のための新規金属有機構造体の合理的発見

GHP-MOFassemble: Diffusion modeling, high throughput screening, and molecular dynamics for rational discovery of novel metal-organic frameworks for carbon capture at scale ( http://arxiv.org/abs/2306.08695v1 )

ライセンス: Link先を確認

Hyun Park, Xiaoli Yan, Ruijie Zhu, E. A. Huerta, Santanu Chaudhuri, Donny Cooper, Ian Foster, Emad Tajkhorshid

(参考訳) GHP-MOFassembleは、高いCO2吸着容量と合成可能なリンカーを備えた金属有機構造体(MOF)の合理的設計を加速する、生成型人工知能(AI)による高性能フレームワークである。本フレームワークは、生成AIの一種である拡散モデルを用いて新規リンカーを生成し、あらかじめ選択した3種類のノードのいずれかと組み合わせて、原始立方(pcu)トポロジーのMOFを組み立てる。これらのAI生成MOFのCO2容量は、結晶グラフ畳み込みニューラルネットワークモデルの修正版を用いて予測する。次に、LAMMPSコードを用いて分子動力学シミュレーションを行い、AI生成MOF構造を緩和し、安定な構造に収束し、かつシミュレーションを通して多孔性を維持するものを特定する。 GHP-MOFassembleフレームワークが3種類の金属ノード(Cuパドルホイール、Znパドルホイール、Zn四量体)を用いて生成した12万個のpcu MOF候補のうち、合計102個の構造が1 barでの分子動力学シミュレーションを完了し、かつ0.1 barでのCO2容量が2 mmol/gを超えると予測された。この値はMOFX-DBデータベースの仮想MOF(hMOF)データセットにおけるhMOFの上位5%に相当する。これらの候補のうち18個は、分子動力学シミュレーション中の密度変化が1%未満であり、安定であることを示している。また、GHP-MOFassembleによるMOF構造の上位5つは、hMOF構造の96.9%よりも高いCO2容量を持つことがわかった。この新しいアプローチは、生成AI、グラフモデリング、大規模分子動力学シミュレーション、エクストリームスケールコンピューティングを組み合わせ、新規MOF構造を大規模かつ迅速に発見するための新たな道を開く。

We introduce GHP-MOFassemble, a Generative artificial intelligence (AI), High Performance framework to accelerate the rational design of metal-organic frameworks (MOFs) with high CO2 capacity and synthesizable linkers. Our framework combines a diffusion model, a class of generative AI, to generate novel linkers that are assembled with one of three pre-selected nodes into MOFs in a primitive cubic (pcu) topology. The CO2 capacities of these AI-generated MOFs are predicted using a modified version of the crystal graph convolutional neural network model. We then use the LAMMPS code to perform molecular dynamics simulations to relax the AI-generated MOF structures, and identify those that converge to stable structures, and maintain their porous properties throughout the simulations. Among 120,000 pcu MOF candidates generated by the GHP-MOFassemble framework, with three distinct metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer), a total of 102 structures completed molecular dynamics simulations at 1 bar with predicted CO2 capacity higher than 2 mmol/g at 0.1 bar, which corresponds to the top 5% of hMOFs in the hypothetical MOF (hMOF) dataset in the MOFX-DB database. Among these candidates, 18 have change in density lower than 1% during molecular dynamics simulations, indicating their stability. We also found that the top five GHP-MOFassemble's MOF structures have CO2 capacities higher than 96.9% of hMOF structures. This new approach combines generative AI, graph modeling, large-scale molecular dynamics simulations, and extreme scale computing to open up new pathways for the accelerated discovery of novel MOF structures at scale.
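
The final screening step described above (predicted CO2 capacity above 2 mmol/g at 0.1 bar, density change below 1% during molecular dynamics) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Candidate` record, its field names, and the sample values are hypothetical, and only the two thresholds come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one generated MOF; field names are illustrative.
    name: str
    co2_capacity_mmol_g: float  # predicted capacity at 0.1 bar
    density_start: float        # density before MD relaxation (any consistent unit)
    density_end: float          # density after MD relaxation


def relative_density_change(c: Candidate) -> float:
    """Absolute density change during MD, as a fraction of the starting density."""
    return abs(c.density_end - c.density_start) / c.density_start


def select_stable(cands, min_capacity=2.0, max_density_change=0.01):
    """Keep names of candidates that pass both thresholds quoted in the abstract."""
    return [
        c.name
        for c in cands
        if c.co2_capacity_mmol_g > min_capacity
        and relative_density_change(c) < max_density_change
    ]


# Made-up example values, for illustration only.
candidates = [
    Candidate("A", 2.5, 1.000, 1.005),  # passes both checks
    Candidate("B", 2.5, 1.000, 1.020),  # density drifts by 2%
    Candidate("C", 1.5, 1.000, 1.000),  # capacity too low
]
print(select_stable(candidates))
```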

翻訳日:2023-06-16 17:42:12 公開日:2023-06-14

# 共形化分位点回帰への不確実性認識の統合

Integrating Uncertainty Awareness into Conformalized Quantile Regression ( http://arxiv.org/abs/2306.08693v1 )

ライセンス: Link先を確認

Raphael Rossellini, Rina Foygel Barber, Rebecca Willett

(参考訳) Conformalized Quantile Regression (CQR) は、分布に関する仮定を置かずに、共変量$X$が与えられたときの応答$Y$の予測区間を構成する、近年提案された手法である。しかし、我々が実証的に示すように、既存のCQRの構成は、分位点回帰器が特徴空間の特定の部分で他の部分よりも良好に機能する問題に対しては効果的でない場合がある。その理由は、CQRの予測区間が2種類の不確実性を区別しないことにある。1つは$X$が与えられたときの$Y$の条件付き分布のばらつき(すなわち偶然的不確実性)であり、もう1つはこの条件付き分布を推定する際の不確実性(すなわち認識的不確実性)である。その結果、被覆率が不均一になり、認識的不確実性が低い(または高い)領域で区間が過度に広く(または過度に狭く)なることがある。そこで本研究では、これら2つの不確実性の要因を明示的に分離し、特徴空間全体で分位点回帰器を場所ごとに異なる形で調整する、CQRの新しい変種Uncertainty-Aware CQR (UACQR) を提案する。 CQRと比較して、本手法は被覆特性に関して同じ分布フリーの理論保証を持ちながら、実験において、シミュレーション設定ではより強い条件付き被覆を、さまざまな実世界のデータセットではより狭い区間を示した。

Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, as we demonstrate empirically, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction intervals of CQR do not distinguish between two forms of uncertainty: first, the variability of the conditional distribution of $Y$ given $X$ (i.e., aleatoric uncertainty), and second, our uncertainty in estimating this conditional distribution (i.e., epistemic uncertainty). This can lead to uneven coverage, with intervals that are overly wide (or overly narrow) in regions where epistemic uncertainty is low (or high). To address this, we propose a new variant of the CQR methodology, Uncertainty-Aware CQR (UACQR), that explicitly separates these two sources of uncertainty to adjust quantile regressors differentially across the feature space. Compared to CQR, our methods enjoy the same distribution-free theoretical guarantees for coverage properties, while demonstrating in our experiments stronger conditional coverage in simulated settings and tighter intervals on a range of real-world data sets.
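
For readers unfamiliar with the baseline, the calibration step of standard split-conformal CQR can be sketched as follows. This shows plain CQR only, not the UACQR variant proposed in the paper; the function names and the toy calibration data are assumptions made for illustration.

```python
import math


def cqr_margin(lo, hi, y, alpha):
    """Conformal margin computed from a held-out calibration set.

    lo, hi: lower/upper quantile predictions for each calibration point.
    y:      observed responses.
    alpha:  target miscoverage level (e.g. 0.1 for 90% intervals).
    """
    # Conformity score: distance by which y falls outside [lo, hi]
    # (negative when y lies strictly inside the interval).
    scores = sorted(max(l - t, t - h) for l, h, t in zip(lo, hi, y))
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def cqr_interval(lo_new, hi_new, margin):
    """Widen (or shrink, if margin < 0) a new quantile interval by the margin."""
    return lo_new - margin, hi_new + margin


# Toy calibration data (made up): every predicted interval is [0, 10].
cal_lo = [0] * 9
cal_hi = [10] * 9
cal_y = [5, 12, -3, 9, 15, 1, 20, 7, -1]
margin = cqr_margin(cal_lo, cal_hi, cal_y, alpha=0.5)
print(margin, cqr_interval(2, 8, margin))
```

Because the same margin is added everywhere, this baseline cannot widen intervals only where the quantile regressors are poorly estimated, which is the limitation the abstract describes.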

翻訳日:2023-06-16 17:41:33 公開日:2023-06-14

# 弱結合限界におけるRydbergアーキテクチャの量子ゲート最適化

Quantum Gate Optimization for Rydberg Architectures in the Weak-Coupling Limit ( http://arxiv.org/abs/2306.08691v1 )

ライセンス: Link先を確認

Nicolas Heimann, Lukas Broers, Nejira Pintul, Tobias Petersen, Koen Sponselee, Alexander Ilin, Christoph Becker, Ludwig Mathey

(参考訳) 我々は、Rydberg光ピンセット系における2量子ビットゲートの機械学習支援設計を実証する。各原子の2つの低エネルギー超微細状態が論理量子ビットを表し、Rydberg状態は量子ビット間の相互作用を誘起する補助状態として働く。ハイブリッド量子古典オプティマイザを用いて、実験的に現実的なパラメータとプロトコル、および現実的な制約のもとで、CNOTゲートを高忠実度で実装する最適なパルス列を生成する。単一量子ビット演算の局所制御があれば、大きな原子配列上で量子計算を行うのに十分であることを示す。我々は、Rydberg状態の強結合(ブロッケード)領域だけでなく、弱結合極限に対してもロバストな最適化戦略を生成する。これにより、弱結合極限におけるRydbergベースの量子情報処理は、ロバストかつ最適であり、現在の技術で実現できる望ましいアプローチであることを示す。

We demonstrate machine-learning-assisted design of a two-qubit gate in a Rydberg tweezer system. Two low-energy hyperfine states in each of the atoms represent the logical qubit and a Rydberg state acts as an auxiliary state to induce qubit interaction. Utilizing a hybrid quantum-classical optimizer, we generate optimal pulse sequences that implement a CNOT gate with high fidelity, for experimentally realistic parameters and protocols, as well as realistic limitations. We show that local control of single qubit operations is sufficient for performing quantum computation on a large array of atoms. We generate optimized strategies that are robust both for the strong-coupling, blockade regime of the Rydberg states and for the weak-coupling limit. Thus, we show that Rydberg-based quantum information processing in the weak-coupling limit is a desirable approach, being robust and optimal, with current technology.

翻訳日:2023-06-16 17:41:05 公開日:2023-06-14

# ICETによる幾何型レーザスキャンマッチングの精度評価

ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching ( http://arxiv.org/abs/2306.08690v1 )

ライセンス: Link先を確認

Matthew McDermott and Jason Rife

(参考訳) Distribution-to-Distribution (D2D)ポイントクラウド登録アルゴリズムは高速で、解釈可能で、非構造化環境ではよく機能する。残念ながら、これらの方法のソリューションエラーを予測する既存の戦略は、特に大規模または拡張された物理オブジェクトを含む領域において、非常に楽観的である。本稿では,第1原理からロバストな精度予測を実現するために,ndtを再構成した新しい3次元lidarスキャンマッチングアルゴリズムである反復的最接近楕円型変換(icet)を提案する。 ndtと同様に、icetはより小さな局所点分布を考慮して複雑なシーンを分析するために、lidarスキャンをvoxelに分割するが、icetはランダムノイズと決定論的構造を区別するためにvoxel分布を評価する。 icetは重み付き最小二乗法を用いて、このノイズ/構造区別を局所化解の計算と解エラー共分散の予測に組み込む。精度予測の合理性を示すために,実世界の自動車データ,高忠実度シミュレーショントラジェクタ,コーナーケースシーンのシミュレーションを含む3つのlidarテストで3d icetを検証した。それぞれのテストで、icetは一貫してサブセンチメートルの精度でスキャンマッチングを行う。この精度のレベルは、アルゴリズムが完全に解釈可能であるという事実と相まって、安全クリティカルな輸送用途に適している。コードはhttps://github.com/mcdermatt/ICETで入手できる。

Distribution-to-Distribution (D2D) point cloud registration algorithms are fast, interpretable, and perform well in unstructured environments. Unfortunately, existing strategies for predicting solution error for these methods are overly optimistic, particularly in regions containing large or extended physical objects. In this paper we introduce the Iterative Closest Ellipsoidal Transform (ICET), a novel 3D LIDAR scan-matching algorithm that re-envisions the Normal Distributions Transform (NDT) in order to provide robust accuracy prediction from first principles. Like NDT, ICET subdivides a LIDAR scan into voxels in order to analyze complex scenes by considering many smaller local point distributions; however, ICET assesses the voxel distribution to distinguish random noise from deterministic structure. ICET then uses a weighted least-squares formulation to incorporate this noise/structure distinction into computing a localization solution and predicting the solution-error covariance. In order to demonstrate the reasonableness of our accuracy predictions, we verify 3D ICET in three LIDAR tests involving real-world automotive data, high-fidelity simulated trajectories, and simulated corner-case scenes. For each test, ICET consistently performs scan matching with sub-centimeter accuracy. This level of accuracy, combined with the fact that the algorithm is fully interpretable, makes it well suited for safety-critical transportation applications. Code is available at https://github.com/mcdermatt/ICET
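
The voxel subdivision shared by NDT and ICET can be sketched as below: points are binned into cubic voxels and each sufficiently populated voxel is summarized by a mean and a sample covariance. This is a generic illustration and is not taken from the ICET repository; the function names, the `min_points` threshold, and the toy points are assumptions, and the noise/structure test that distinguishes ICET is not shown.

```python
import math
from collections import defaultdict


def mean_and_cov(pts):
    """Sample mean and (n - 1)-normalized covariance of a list of 3D points."""
    n, d = len(pts), len(pts[0])
    mean = [sum(p[i] for p in pts) / n for i in range(d)]
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def voxel_distributions(points, voxel_size, min_points=2):
    """Group points by voxel index and summarize each populated voxel."""
    cells = defaultdict(list)
    for p in points:
        # floor() keeps negative coordinates in the correct voxel.
        key = tuple(math.floor(c / voxel_size) for c in p)
        cells[key].append(p)
    # Voxels with too few points cannot support a covariance estimate.
    return {k: mean_and_cov(v) for k, v in cells.items() if len(v) >= min_points}


# Toy scan (made up): two points share a voxel, the third is alone and is dropped.
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.5, 0.0)]
stats = voxel_distributions(scan, voxel_size=2.0)
print(stats)
```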

翻訳日:2023-06-16 17:40:51 公開日:2023-06-14

# テキスト・画像生成のためのノルム誘導潜時空間探索

Norm-guided latent space exploration for text-to-image generation ( http://arxiv.org/abs/2306.08687v1 )

ライセンス: Link先を確認

Dvir Samuel, Rami Ben-Ari, Nir Darshan, Haggai Maron, Gal Chechik

(参考訳) テキストから画像への拡散モデルは、新しい構成やシナリオにおいて様々な概念を合成する大きな可能性を示している。しかし、その潜在的な種空間はまだよく分かっておらず、新しい希少な概念の生成に影響を及ぼすことが示されている。具体的には、補間やセントロイド探索のような単純な操作は、潜在空間の標準ユークリッド測度や球面測度ではうまく機能しない。本稿では,現行のトレーニング手法が,標準値の狭い入力に対して拡散モデルを偏在させることを観察する。これは、画像生成のシード操作に依存する手法に強く影響し、少数ショットおよび長期学習タスクにさらに適用することができる。この問題に対処するために, 2つの種子間を補間する新しい方法を提案し, 種子に先行するノルムを考慮した新しい非ユークリッド計量を定義することを実証する。我々は,この計量を近似する単純かつ効率的なアルゴリズムを記述し,それを用いて潜在種空間におけるセントロイドをさらに定義する。我々は,新たな補間・遠心評価手法により,レアコンセプト画像の生成が著しく向上することを示す。これにより、少数ショットとロングテールのベンチマークにおける最先端のパフォーマンスが向上し、生成速度、画質、セマンティックコンテンツといった面で以前のアプローチが改善される。

Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, their latent seed space is still not well understood and has been shown to have an impact on generating new and rare concepts. Specifically, simple operations like interpolation and centroid finding work poorly with the standard Euclidean and spherical metrics in the latent space. This paper makes the observation that current training procedures make diffusion models biased toward inputs with a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation that can be further applied to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this metric and use it to further define centroids in the latent seed space. We show that our new interpolation and centroid evaluation techniques significantly enhance the generation of rare concept images. This further leads to state-of-the-art performance on few-shot and long-tail benchmarks, improving on prior approaches in terms of generation speed, image quality, and semantic content.
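
The observation about a narrow range of norm values can be illustrated numerically: seeds drawn from a standard Gaussian in high dimension have norms tightly concentrated near the square root of the dimension, while the Euclidean midpoint of two such seeds has a noticeably smaller norm. The sketch below demonstrates only this statistical fact; it is not the interpolation method proposed in the paper, and the dimension (4 x 64 x 64) is an illustrative assumption.

```python
import math
import random


def gaussian_seed(dim, rng):
    """Draw one latent seed with i.i.d. standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    """Plain linear (Euclidean) interpolation between two seeds."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


rng = random.Random(0)   # fixed seed so the run is repeatable
dim = 4 * 64 * 64        # assumed latent size, for illustration only
a, b = gaussian_seed(dim, rng), gaussian_seed(dim, rng)
mid = lerp(a, b, 0.5)

# Both seeds have norm close to sqrt(dim); the midpoint is close to
# sqrt(dim / 2), i.e. outside the range the model usually receives.
print(norm(a), norm(b), norm(mid), math.sqrt(dim))
```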

翻訳日:2023-06-16 17:40:24 公開日:2023-06-14

# World-to-Words:視覚言語モデルにおける高速マッピングによる接地型オープン語彙獲得

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models ( http://arxiv.org/abs/2306.08685v1 )

ライセンス: Link先を確認

Ziqiao Ma, Jiayi Pan, Joyce Chai

(参考訳) 言語単位を物理世界の指示対象と結びつける能力は「接地(グラウンディング)」と呼ばれ、単語の接地された意味を学習し理解するうえで不可欠である。人間は新しい単語の学習においてファストマッピングを示すが、現代の視覚言語モデルが接地された意味とともに言語を真に表現できるのか、また接地が新しい単語の学習をさらにどのようにブートストラップしうるのかは、依然として明らかでない。この目的のために、オープンワールドの言語学習における接地とブートストラップを検討するGrounded Open Vocabulary Acquisition (GOVA) を導入する。最初の試みとして、object-oriented BERT (OctoBERT) を提案する。これは、接地を目的関数として重視し、画像とテキストのペアで事前学習した、視覚的に接地された新しい言語モデルである。広範な実験と分析を通じて、OctoBERTがより一貫性があり高速な接地単語学習者であること、そして事前学習中に獲得した接地能力が、未知の単語をより迅速かつロバストに学習するのに役立つことを示す。私たちのコードは https://github.com/sled-group/world-to-words で利用可能です。

The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with their grounded meanings and how grounding may further bootstrap new word learning. To this end, we introduce Grounded Open Vocabulary Acquisition (GOVA) to examine grounding and bootstrapping in open-world language learning. As an initial attempt, we propose object-oriented BERT (OctoBERT), a novel visually-grounded language model by pre-training on image-text pairs highlighting grounding as an objective. Through extensive experiments and analysis, we demonstrate that OctoBERT is a more coherent and fast grounded word learner, and that the grounding ability acquired during pre-training helps the model to learn unseen words more rapidly and robustly. Our code is available at https://github.com/sled-group/world-to-words

翻訳日:2023-06-16 17:40:02 公開日:2023-06-14

# コネクテッドカーデータを用いたハリケーン避難時のリアルタイム衝突リスク予測

Predicting Real-time Crash Risks during Hurricane Evacuation Using Connected Vehicle Data ( http://arxiv.org/abs/2306.08682v1 )

ライセンス: Link先を確認

Zaheen E Muktadi Syed and Samiul Hasan

(参考訳) 沿岸部の人々の命を救えるよう命じられたハリケーンの避難は、衝突のリスクを増し、高い交通需要を生み出す。このようなリスクを軽減するため、交通機関は適切な対策を展開するために、衝突リスクの高い高速道路の場所を予想する必要がある。ユビキタスセンサーと通信技術により、個々の車両軌道と速度情報を含むマイクロレベルの車両データを取得することができる。このような高分解能な車両データはリアルタイムに利用可能であり、交通安全条件の評価に使用できる。車両の速度と加速プロファイルを使用して、潜在的な衝突リスクをリアルタイムで予測することができる。リアルタイムの事故リスク予測に関するこれまでの研究は、主に道路セグメントの多くをカバーしていないインフラストラクチャーベースのセンサーのデータを用いていた。そこで,本稿では,新データであるコネクテッド・ビークル・データから,ハリケーン避難時の事故リスクを判定する手法を提案する。このようなデータは、非常に高い周波数(30秒未満)で収集された車両の位置、速度、加速度情報を含んでいる。事故リスクを予測するために,ルイジアナ州州間高速道路10号線(i-10)のハリケーン・アイダの避難期間に収集されたデータセットを用いた。連結車両データから5分間隔で抽出した気象特性と交通特性を考慮し,複数の機械学習モデルを訓練した。その結果, ガウスプロセスブースティング (GPBoost) とエクストリームグラディエントブースティング (XGBoost) は, 他のモデルより優れている(リコール=0.91)ことが示された。事故リスク評価のためのリアルタイムコネクテッドカーデータにより、交通管理者は資源を効率的に活用し、安全対策を積極的に行うことができる。

Hurricane evacuation, ordered to save lives of people of coastal regions, generates high traffic demand with increased crash risk. To mitigate such risk, transportation agencies need to anticipate highway locations with high crash risks to deploy appropriate countermeasures. With ubiquitous sensors and communication technologies, it is now possible to retrieve micro-level vehicular data containing individual vehicle trajectory and speed information. Such high-resolution vehicle data, potentially available in real time, can be used to assess prevailing traffic safety conditions. Using vehicle speed and acceleration profiles, potential crash risks can be predicted in real time. Previous studies on real-time crash risk prediction mainly used data from infrastructure-based sensors which may not cover many road segments. In this paper, we present methods to determine potential crash risks during hurricane evacuation from an emerging alternative data source known as connected vehicle data. Such data contain vehicle location, speed, and acceleration information collected at a very high frequency (less than 30 seconds). To predict potential crash risks, we utilized a dataset collected during the evacuation period of Hurricane Ida on Interstate-10 (I-10) in the state of Louisiana. Multiple machine learning models were trained considering weather features and different traffic characteristics extracted from the connected vehicle data in 5-minute intervals. The results indicate that the Gaussian Process Boosting (GPBoost) and Extreme Gradient Boosting (XGBoost) models perform better (recall = 0.91) than other models. The real-time connected vehicle data for crash risks assessment will allow traffic managers to efficiently utilize resources to proactively take safety measures.
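
The abstract reports recall, the share of truly risky intervals that the model flags, which matters here because a missed high-risk interval is costlier than a false alarm. A minimal sketch of the metric follows; the label encoding (1 = crash-risk interval) and the sample labels are assumptions, not data from the study.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels, where 1 marks a risky interval."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


# Made-up labels for six 5-minute intervals.
observed = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
print(recall(observed, predicted))
```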

翻訳日:2023-06-16 17:39:42 公開日:2023-06-14

# 完全可観測非決定性領域モデルにおける時間的拡張目標認識

Temporally Extended Goal Recognition in Fully Observable Non-Deterministic Domain Models ( http://arxiv.org/abs/2306.08680v1 )

ライセンス: Link先を確認

Ramon Fraga Pereira, Francesco Fuggitti, Felipe Meneguzzi, Giuseppe De Giacomo

(参考訳) ゴール認識(Goal Recognition)とは、エージェントが目標仮説、ドメインモデル、および一連の観測(つまり、環境内で実行される計画のサンプル)を与えられた上で達成しようとする意図された目標を識別するタスクである。既存のアプローチでは、ゴール仮説は単一の最終状態上の単一の共役式で構成され、環境力学は決定論的であり、より複雑な設定において時間的に拡張されたゴールの認識を妨げていると仮定している。本稿では,線形時相論理(ltlf)と純粋過去時相論理(pltlf)で表される有限トレースの目標に着目し,完全可観測非決定性(fond)計画ドメインモデルにおいて,目標認識を時間的拡張目標に拡張する。 6つのFONDプランニングドメインモデルに対して,そのような設定で目標を認識可能な最初のアプローチを開発し,異なるLTLfとPLTLfの目標を用いて評価する。実験の結果,我々のアプローチは,異なる認識環境における時間的拡張目標の認識において正確であることがわかった。

Goal Recognition is the task of discerning the correct intended goal that an agent aims to achieve, given a set of goal hypotheses, a domain model, and a sequence of observations (i.e., a sample of the plan executed in the environment). Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state and that the environment dynamics are deterministic, preventing the recognition of temporally extended goals in more complex settings. In this paper, we expand goal recognition to temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models, focusing on goals on finite traces expressed in Linear Temporal Logic (LTLf) and Pure Past Linear Temporal Logic (PLTLf). We develop the first approach capable of recognizing goals in such settings and evaluate it using different LTLf and PLTLf goals over six FOND planning domain models. Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.

翻訳日:2023-06-16 17:39:12 公開日:2023-06-14

# 定常状態における多体エッジバースト

Many-body edge burst in steady states ( http://arxiv.org/abs/2306.08676v1 )

ライセンス: Link先を確認

Yu-Min Hu, Wen-Tan Xue, Fei Song, Zhong Wang

(参考訳) 非エルミート皮膚効果と損失格子の空隙との相互作用はエッジバースト(エッジバースト)と呼ばれる非常に大きな粒子損失が発生する境界誘起力学現象を引き起こす。ここでは、そのような興味深い非エルミート力学現象を対応する開量子系の定常密度分布に正確にマッピングできることが分かる。したがって、エッジバーストにおける損失確率のバルクエッジスケーリング関係は定常密度のそれにもマップされる。さらに,二体損失を持つ散逸多体系に対して正のp表現を適用し,定常相関関数に対するスケーリング関係の有効性を明らかにする。これらの結果は、相互作用によって引き起こされる多体非エルミート皮膚効果の独特な展望を与える。我々の予測は最先端の実験プラットフォームで検証可能である。

The interplay between the non-Hermitian skin effect and the imaginary gap of lossy lattices results in the edge burst, a boundary-induced dynamical phenomenon that an exceptionally large portion of particle loss occurs at the edge. Here, we find that such an intriguing non-Hermitian dynamical phenomenon can be exactly mapped into the steady-state density distribution of a corresponding open quantum system. Consequently, the bulk-edge scaling relation of loss probability in edge burst also maps to that of steady-state density. Moreover, we apply the positive-P representation to dissipative many-body systems with two-body loss and reveal the validity of scaling relation for steady-state correlation functions. These results provide a unique perspective of the interaction-induced many-body non-Hermitian skin effect. Our predictions are testable in state-of-the-art experimental platforms.

翻訳日:2023-06-16 17:38:52 公開日:2023-06-14

# ワークフローノートを用いた信頼できる発作検出に向けて

Towards trustworthy seizure onset detection using workflow notes ( http://arxiv.org/abs/2306.08728v1 )

ライセンス: Link先を確認

Khaled Saab, Siyi Tang, Mohamed Taha, Christopher Lee-Messer, Christopher Ré, Daniel Rubin

(参考訳) 医療AIモデルをデプロイする上で大きな障壁は、信頼性だ。既存のモデルは、集約されたメトリクスに専門家レベルのパフォーマンスを示すことがあるが、それらはしばしば非因果的特徴に依存し、隠れたサブグループのエラーにつながる。脳波からの信頼できる発作発生検出に向けて、我々は、発作以外の複数のイベント記述を含む、日常的な臨床ワークフロー(ワークフローノートと呼ばれる)で医療関係者が作成するアノテーションを活用することを提案する。ワークフローノートを用いて、トレーニングデータを68,920 EEG時間にスケールアップすることにより、高価な手作業によるゴールドスタンダードラベルによる小さなトレーニングセットに依存するよりも、発作発生検出性能が著しく向上する(+12.3 AUROCポイント)ことを示す。第2に, 2次発作検出モデルは, 臨床的に関連のあるサブグループ (小児と成人の間では最大6.5 auroc point) に過小評価され, また, 任意の脳波クリップ (+19 fpr) と比較して, 非てんかん性異常を示す脳波クリップでは有意に高い偽陽性率を示した。隠れたサブグループに対するモデルロバスト性を改善するために、スパイク、減速、移動アーティファクトなど、発作以外の26の属性を分類するマルチラベルモデルを訓練する。その結果, マルチラベルモデルでは, 発作発生検出性能(+5.9 AUROC点)が有意に向上し, サブグループ(+8.3 AUROC点)のパフォーマンスが向上し, 非てんかん性異常に対する偽陽性が8FPR点まで低下することがわかった。最後に,24脳波時間あたりの偽陽性率に基づく臨床ユーティリティ指標を提案するとともに,この臨床ユーティリティ指標を異なる臨床設定で2倍改善するマルチラベルモデルを提案する。

A major barrier to deploying healthcare AI models is their trustworthiness. One form of trustworthiness is a model's robustness across different subgroups: while existing models may exhibit expert-level performance on aggregate metrics, they often rely on non-causal features, leading to errors in hidden subgroups. To take a step closer towards trustworthy seizure onset detection from EEG, we propose to leverage annotations that are produced by healthcare personnel in routine clinical workflows -- which we refer to as workflow notes -- that include multiple event descriptions beyond seizures. Using workflow notes, we first show that by scaling training data to an unprecedented level of 68,920 EEG hours, seizure onset detection performance significantly improves (+12.3 AUROC points) compared to relying on smaller training sets with expensive manual gold-standard labels. Second, we reveal that our binary seizure onset detection model underperforms on clinically relevant subgroups (e.g., up to a margin of 6.5 AUROC points between pediatrics and adults), while having significantly higher false positives on EEG clips showing non-epileptiform abnormalities compared to any EEG clip (+19 FPR points). To improve model robustness to hidden subgroups, we train a multilabel model that classifies 26 attributes other than seizures, such as spikes, slowing, and movement artifacts. We find that our multilabel model significantly improves overall seizure onset detection performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points), and decreases false positives on non-epileptiform abnormalities by 8 FPR points. Finally, we propose a clinical utility metric based on false positives per 24 EEG hours and find that our multilabel model improves this clinical utility metric by a factor of 2x across different clinical settings.
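
The clinical utility metric is described only as being based on false positives per 24 EEG hours. One straightforward reading is a rate normalized to a 24-hour window, sketched below; the exact definition used in the paper may differ, and the function name and numbers are illustrative assumptions.

```python
def false_positives_per_24h(num_false_positives, eeg_hours):
    """False alarms normalized to a 24-hour recording window."""
    if eeg_hours <= 0:
        raise ValueError("eeg_hours must be positive")
    return 24.0 * num_false_positives / eeg_hours


# Made-up counts: the same 72 hours of EEG scored by two models.
baseline = false_positives_per_24h(30, 72)
improved = false_positives_per_24h(15, 72)
# A 2x improvement in this metric means half as many false alarms per day.
print(baseline, improved, baseline / improved)
```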

翻訳日:2023-06-16 17:34:29 公開日:2023-06-14

# クロスドメイン手術画像分割のためのクライアントサーバディープフェデレーション学習

A Client-server Deep Federated Learning for Cross-domain Surgical Image Segmentation ( http://arxiv.org/abs/2306.08720v1 )

ライセンス: Link先を確認

Ronast Subedi, Rebati Raman Gaire, Sharib Ali, Anh Nguyen, Danail Stoyanov, and Binod Bhattarai

(参考訳) 本稿では,異なるセンターに属する分散データセットのプライバシー保護を念頭において,2次元画像分割のための領域間適応問題の解法を提案する。医学画像解析におけるディープラーニングアーキテクチャは、より一般化するために広範なトレーニングデータを必要とする。しかし,本質的なデータキュレーションコストとデータアノテーション専門家の必要性から,十分な診断・手術データを得ることは依然として困難である。さらに、プライバシーと法的コンプライアンスの懸念が高まり、臨床現場や地域間でのデータ共有が困難になる可能性がある。医療データセットが直面するもうひとつの課題は、異なるセンターで収集されたデータ間のドメインシフトが避けられないことだ。そこで本研究では,クロスドメイン適応のためのクライアントサーバディープフェデレーションアーキテクチャを提案する。サーバはソースドメインとターゲットドメインの両方に共通するイミュータブルなパラメータのセットをホストする。クライアントはそれぞれのドメイン固有のパラメータで構成され、パラメータと推論を学習しながらサーバにリクエストを行う。本手法は2つのベンチマークデータセットで評価し,内視鏡的ポリープ・セグメンテーションと診断的皮膚病変の検出と解析に対するコンピュータ支援介入の適用性を示した。提案手法は, 競争的ベースライン法や最先端手法と比較して, 提案手法の優位性を示す。コードは、https://github.com/thetna/distributed-daで入手できる。

This paper presents a solution to the cross-domain adaptation problem for 2D surgical image segmentation, explicitly considering the privacy protection of distributed datasets belonging to different centers. Deep learning architectures in medical image analysis necessitate extensive training data for better generalization. However, obtaining sufficient diagnostic and surgical data is still challenging, mainly due to the inherent cost of data curation and the need of experts for data annotation. Moreover, increased privacy and legal compliance concerns can make data sharing across clinical sites or regions difficult. Another ubiquitous challenge the medical datasets face is inevitable domain shifts among the collected data at the different centers. To this end, we propose a Client-server deep federated architecture for cross-domain adaptation. A server hosts a set of immutable parameters common to both the source and target domains. The clients consist of the respective domain-specific parameters and make requests to the server while learning their parameters and inferencing. We evaluate our framework in two benchmark datasets, demonstrating applicability in computer-assisted interventions for endoscopic polyp segmentation and diagnostic skin lesion detection and analysis. Our extensive quantitative and qualitative experiments demonstrate the superiority of the proposed method compared to competitive baseline and state-of-the-art methods. Codes are available at: https://github.com/thetna/distributed-da

翻訳日:2023-06-16 17:33:53 公開日:2023-06-14

# 二重不均質環境におけるオフポリシー評価

Off-policy Evaluation in Doubly Inhomogeneous Environments ( http://arxiv.org/abs/2306.08719v1 )

ライセンス: Link先を確認

Zeyu Bian, Chengchun Shi, Zhengling Qi and Lan Wang

(参考訳) 本研究の目的は,2つの重要な強化学習(RL)の仮定 – 時間的定常性と個人的均質性の両方に違反するシナリオの下で,政治外評価(OPE)を研究することである。二重不均一性」を扱うために、モデルベースとモデルフリーの両方のアプローチからなる一般的なOPEフレームワークを開発するために、報酬および観測遷移関数のための潜在因子モデルのクラスを提案する。我々の知る限り、この論文は二重不均一なオフラインRLにおける統計的に健全なOPE法を開発した最初の論文である。標準的なRL仮定が満たされていない環境でのOPEの深い理解に寄与し、これらの設定においていくつかの実践的なアプローチを提供する。提案する値推定器の理論的性質を定め,その手法が時間的非定常性や個人的不均一性を無視する競合手法よりも優れていることを実証的に示す。最後に,集中治療のための医療情報マートから得られたデータセットについて述べる。

This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated. To handle the "double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.

翻訳日:2023-06-16 17:33:35 公開日:2023-06-14

# 灌水スケジューリングのための機械学習パラダイムと混合整数モデル予測制御の統合

Integrating machine learning paradigms and mixed-integer model predictive control for irrigation scheduling ( http://arxiv.org/abs/2306.08715v1 )

ライセンス: Link先を確認

Bernard T. Agyeman, Mohamed Naouri, Willemijn Appels, Jinfeng Liu (University of Alberta), Sirish L. Shah

(参考訳) 農業部門は、主に淡水不足への懸念から、水資源の保全と収穫量の最適化において大きな課題に直面している。従来の灌水スケジューリング手法は、大規模な灌水システムのニーズを満たすのに不十分であることが多い。そこで本稿では,機械学習の3つのパラダイムを活かし,灌水スケジュールを最適化する予測灌水スケジューラを提案する。提案するスケジューラでは, 土壌水分パラメータとトポロジ情報に基づいて, k-meansクラスタリング手法を用いて, フィールドを異なる灌水管理ゾーンに分割する。さらに,管理ゾーン毎に動的モデルを構築するための長期短期記憶ネットワークを用いて,土壌水分動態の正確な予測を行う。混合整数モデル予測制御問題として定式化されたスケジューラは、全体の水消費と灌水コストを最小化しながら、吸水量を最大化する。混合整数最適化課題に取り組むために, 日次灌水決定に責任を持つ強化学習エージェントを訓練するために, 近位政策最適化アルゴリズムを用いる。提案したスケジューラの性能を評価するため、カナダのレスブリッジにある26.4ヘクタールのフィールドが2015年と2022年の成長期のケーススタディとして選ばれた。以上の結果から,水利用効率と作物収量改善の両面において,従来の灌水スケジューリング法と比較して,提案するスケジューラの優越性が示された。特に、提案されたスケジューラは6.4%から22.8%の貯水量を達成し、収量は2.3%から4.3%に増加した。

The agricultural sector currently faces significant challenges in water resource conservation and crop yield optimization, primarily due to concerns over freshwater scarcity. Traditional irrigation scheduling methods often prove inadequate in meeting the needs of large-scale irrigation systems. To address this issue, this paper proposes a predictive irrigation scheduler that leverages the three paradigms of machine learning to optimize irrigation schedules. The proposed scheduler employs the k-means clustering approach to divide the field into distinct irrigation management zones based on soil hydraulic parameters and topology information. Furthermore, a long short-term memory network is employed to develop dynamic models for each management zone, enabling accurate predictions of soil moisture dynamics. Formulated as a mixed-integer model predictive control problem, the scheduler aims to maximize water uptake while minimizing overall water consumption and irrigation costs. To tackle the mixed-integer optimization challenge, the proximal policy optimization algorithm is utilized to train a reinforcement learning agent responsible for making daily irrigation decisions. To evaluate the performance of the proposed scheduler, a 26.4-hectare field in Lethbridge, Canada, was chosen as a case study for the 2015 and 2022 growing seasons. The results demonstrate the superiority of the proposed scheduler compared to a traditional irrigation scheduling method in terms of water use efficiency and crop yield improvement for both growing seasons. Notably, the proposed scheduler achieved water savings ranging from 6.4% to 22.8%, along with yield increases ranging from 2.3% to 4.3%.
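
The zoning step (k-means over soil hydraulic parameters and topology information) can be sketched with a small Lloyd's-algorithm implementation. This is a generic illustration rather than the authors' code: the two features, their values, and the deterministic initialization are assumptions, and real features would normally be standardized first so that no single unit dominates the distance.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with a deterministic start (first k points as centroids)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical per-cell features: (soil hydraulic parameter, relative elevation).
cells = [(0.10, 1.0), (0.40, 5.0), (0.12, 1.2), (0.38, 5.2), (0.11, 0.9), (0.41, 4.8)]
zones, centers = kmeans(cells, k=2)
print(zones)
```

Each resulting zone would then get its own soil moisture model and irrigation decision, as the abstract describes.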

翻訳日:2023-06-16 17:33:16 公開日:2023-06-14

# イタリアの料理人はインドの整備士に何を教えられるか。シナリオと場所をまたぐ行動認識の汎化

What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations ( http://arxiv.org/abs/2306.08713v1 )

ライセンス: Link先を確認

Chiara Plizzari, Toby Perrett, Barbara Caputo, Dima Damen

(参考訳) 行動認識のために訓練されたモデルは、これまで見つからなかったシナリオや、これまで見つからなかった場所で実行されたアクションをうまく分類できるだろうか? この質問に答えるために、大規模ego4dデータセットからの1.1mのビデオクリップを含む、シナリオとロケーションデータセット(argo1m)に対するアクション認識の一般化を紹介する。認識モデルは、10以上の提案されたテスト分割を一般化するのに苦労し、各シナリオは目に見えない場所にある。そこで我々は,他のドメインからの動画のクロスインスタンス再構成として,各ビデオを表現するCIRを提案する。レコンストラクションはテキストナレーションと組み合わせて、ドメインの一般化可能な表現の学習を導く。我々は、CIRが全てのテスト分割に先立つ領域一般化よりも優れていることを示すARGO1Mに関する広範な分析と改善を提供する。コードとデータ: https://chiaraplizz.github.io/what-can-a-cook/

We propose and address a new generalisation problem: can a model trained for action recognition successfully classify actions when they are performed within a previously unseen scenario and in a previously unseen location? To answer this question, we introduce the Action Recognition Generalisation Over scenarios and locations dataset (ARGO1M), which contains 1.1M video clips from the large-scale Ego4D dataset, across 10 scenarios and 13 locations. We demonstrate recognition models struggle to generalise over 10 proposed test splits, each of an unseen scenario in an unseen location. We thus propose CIR, a method to represent each video as a Cross-Instance Reconstruction of videos from other domains. Reconstructions are paired with text narrations to guide the learning of a domain generalisable representation. We provide extensive analysis and ablations on ARGO1M that show CIR outperforms prior domain generalisation works on all test splits. Code and data: https://chiaraplizz.github.io/what-can-a-cook/.

翻訳日:2023-06-16 17:32:46 公開日:2023-06-14

# AiXpand AI OS -- 分散ユビキタスコンピューティングMLOps実行エンジン

AiXpand AI OS -- Decentralized ubiquitous computing MLOps execution engine ( http://arxiv.org/abs/2306.08708v1 )

ライセンス: Link先を確認

Beatrice Milik, Stefan Saraev, Cristian Bleotiu, Radu Lupaescu, Bogdan Hobeanu, Andrei Ionut Damian

(参考訳) 過去数年間、ユビキタス、あるいは広く普及したコンピューティングは、エンタープライズグレードシステム、コンシューマアプリケーション、ゲームシステムを含む幅広いアプリケーションの主要なアプローチとして人気を集めてきた。ユビキタスコンピューティング(ユビキタスコンピューティング)とは、コンピュータ技術を日常のオブジェクトや環境に統合し、相互に通信可能な相互接続されたデバイスのネットワークを構築することを指す。ユビキタスコンピューティング技術を使用することで、コミュニティはよりつながりやすく、効率的になり、メンバーはコミュニケーションやコラボレーションが容易になる。これによって相互接続性とコラボレーションが,より成功し,持続可能なコミュニティに結びつくのです。しかしユビキタスコンピューティングの普及は、自動化された学習とスマートなアプリケーション全般の重要性を強調している。人工知能とディープラーニングには大きな進歩があったが、高価で高度に複雑なクラウド数値計算インフラへの圧力が高まり、大規模に採用されている。実践的な機械学習システムの採用や開発には、複雑なインフラストラクチャだけでなく、データサイエンスや機械学習の専門知識の面でも、禁止的なコストが伴う。本稿では、エンドツーエンドai協調アプリケーションパイプラインのローコード開発と展開のための革新的なアプローチを提案する。我々は、トークン化経済に基づく完全に分散したグローバル協調コミュニティにおける、インフラの割り当て、コスト、そして安全な雇用分配に対処する。

Over the past few years, ubiquitous, or pervasive computing has gained popularity as the primary approach for a wide range of applications, including enterprise-grade systems, consumer applications, and gaming systems. Ubiquitous computing refers to the integration of computing technologies into everyday objects and environments, creating a network of interconnected devices that can communicate with each other and with humans. By using ubiquitous computing technologies, communities can become more connected and efficient, with members able to communicate and collaborate more easily. This enabled interconnectedness and collaboration can lead to a more successful and sustainable community. The spread of ubiquitous computing, however, has emphasized the importance of automated learning and smart applications in general. Even though there have been significant strides in Artificial Intelligence and Deep Learning, large scale adoption has been hesitant due to mounting pressure on expensive and highly complex cloud numerical-compute infrastructures. Adopting, and even developing, practical machine learning systems can come with prohibitive costs, not only in terms of complex infrastructures but also of solid expertise in Data Science and Machine Learning. In this paper we present an innovative approach for low-code development and deployment of end-to-end AI cooperative application pipelines. We address infrastructure allocation, costs, and secure job distribution in a fully decentralized global cooperative community based on tokenized economics.

翻訳日:2023-06-16 17:32:17 公開日:2023-06-14

# VidEdit:ゼロショットと空間対応のテキスト駆動ビデオ編集

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing ( http://arxiv.org/abs/2306.08707v1 )

ライセンス: Link先を確認

Paul Couairon, Clément Rambour, Jean-Emmanuel Haugeard, Nicolas Thome

(参考訳) 近年、拡散ベースの生成モデルは画像の生成と編集において目覚ましい成功を収めている。しかし、ビデオ編集への利用には依然として重要な制限がある。本稿では、強い時間的・空間的一貫性を確保するゼロショットのテキストベース映像編集の新手法であるVidEditを提案する。第1に、アトラスベースのモデルと事前学習済みのテキスト-画像拡散モデルを組み合わせることで、設計上、時間的滑らかさを満たす、学習不要で効率的な編集手法を提供する。第2に、既製のパノプティックセグメンタとエッジ検出器を活用し、条件付き拡散ベースのアトラス編集に適用する。これにより、元のビデオの構造を厳密に保ちながら、対象領域に対するきめ細かな空間的制御が可能になる。定量的および定性的な実験により、VidEditは、意味的忠実性、画像保存、時間的一貫性の指標に関して、DAVISデータセット上で最先端の手法を上回ることが示された。このフレームワークでは、1本のビデオの処理に約1分しかかからず、単一のテキストプロンプトに基づいて複数の互換性のある編集を生成できる。 Project Web-page at https://videdit.github.io

Recently, diffusion-based generative models have achieved remarkable success in image generation and editing. However, their use for video editing still faces important limitations. This paper introduces VidEdit, a novel method for zero-shot text-based video editing that ensures strong temporal and spatial consistency. Firstly, we propose to combine atlas-based and pre-trained text-to-image diffusion models to provide a training-free and efficient editing method, which by design fulfills temporal smoothness. Secondly, we leverage off-the-shelf panoptic segmenters along with edge detectors and adapt their use for conditioned diffusion-based atlas editing. This ensures fine spatial control over targeted regions while strictly preserving the structure of the original video. Quantitative and qualitative experiments show that VidEdit outperforms state-of-the-art methods on the DAVIS dataset with respect to semantic faithfulness, image preservation, and temporal consistency metrics. With this framework, processing a single video takes only approximately one minute, and it can generate multiple compatible edits based on a unique text prompt. Project web-page at https://videdit.github.io

翻訳日:2023-06-16 17:31:28 公開日:2023-06-14

# hBN/グラフェン/hBN超格子における熱電能

Thermopower in hBN/graphene/hBN superlattices ( http://arxiv.org/abs/2306.08705v1 )

ライセンス: Link先を確認

Victor H. Guarochico-Moreira, Christopher R. Anderson, Vladimir Fal'ko, Irina V. Grigorieva, Endre Tóvári, Matthew Hamer, Roman Gorbachev, Song Liu, James H. Edgar, Alessandro Principi, Andrey V. Kretinin and Ivan J. Vera-Marun

(参考訳) 熱電効果はフェルミエネルギー近傍の状態密度の非対称性に非常に敏感であり、電子構造のプローブとして利用することができる。我々は、hBNによる完全なカプセル化と1次元エッジ接触からなり、グラフェンとhBNの格子が整列したヘテロ構造において、高品質な単層グラフェンの熱電能を実験的に研究した。グラフェンがhBN層の一方に整列している場合、キャリア密度の関数として熱電能に追加の符号反転が現れることを示し、これはモアレ超格子の存在を直接示す証拠となる。熱電能の温度依存性により、内在する歪み変動とファン・ホーベ特異点の役割を評価でき、ウムクラップ電子-電子散乱過程の存在が示唆されることを示す。熱電能は中性点付近でピークとなるため、エネルギースペクトルの縮退を調べることができる。さらに、グラフェンが上下のhBN結晶と二重に整列している場合、熱電能は、差分的なスーパーモアレ格子によって生じる複数のクローン化されたディラック点を示す特徴を示す。どちらの場合も、熱電能がモットの式とどの程度よく一致するかを評価する。最後に、同じ超格子デバイスにおいて、キャリア密度を制御することにより、温度駆動の熱電能反転を正から負へ、またその逆へと生じさせられることを示す。熱電能の研究は、2次元超格子の電子構造を研究するための代替的なアプローチを提供するとともに、これらのヘテロ構造の熱電応答を設計する機会を提供する。

Thermoelectric effects are highly sensitive to the asymmetry in the density of states around the Fermi energy and can be exploited as probes of the electronic structure. We experimentally study thermopower in high-quality monolayer graphene, within heterostructures consisting of complete hBN encapsulation and 1D edge contacts, where the graphene and hBN lattices are aligned. When graphene is aligned to one of the hBN layers, we demonstrate the presence of additional sign reversals in the thermopower as a function of carrier density, directly evidencing the presence of the moiré superlattice. We show that the temperature dependence of the thermopower enables the assessment of the role of built-in strain variation and van Hove singularities and hints at the presence of Umklapp electron-electron scattering processes. As the thermopower peaks around the neutrality point, it allows the energy spectrum degeneracy to be probed. Further, when graphene is double-aligned with the top and bottom hBN crystals, the thermopower exhibits features evidencing multiple cloned Dirac points caused by the differential super-moiré lattice. For both cases we evaluate how well the thermopower agrees with Mott's equation. Finally, we show that the same superlattice device can exhibit a temperature-driven thermopower reversal from positive to negative and vice versa, by controlling the carrier density. The study of thermopower provides an alternative approach to studying the electronic structure of 2D superlattices, whilst offering opportunities to engineer the thermoelectric response of these heterostructures.
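
For orientation, the abstract's reference to Mott's equation can be read against the standard textbook form of the Mott relation, written here as a sketch. The symbols ($S$ for thermopower, $\sigma(E)$ for the energy-dependent conductivity, $E_F$ for the Fermi energy, $T$ for temperature) are the usual ones and are assumptions here, since the abstract itself does not spell the formula out; the paper may well use an equivalent form expressed through gate voltage or carrier density.

$$
S = -\frac{\pi^2 k_B^2 T}{3\,|e|}\,\frac{1}{\sigma}\,\frac{\partial \sigma}{\partial E}\bigg|_{E = E_F}
$$

In this form the sign of $S$ follows the sign of the slope of $\sigma(E)$ at the Fermi energy, which is why additional sign reversals as a function of carrier density can point to additional extrema in the conductivity, such as those expected near secondary Dirac points.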

翻訳日:2023-06-16 17:30:55 公開日:2023-06-14

# mBERTはロマンシュを理解していますか。単語アライメントを用いた単語埋め込みの評価

Does mBERT understand Romansh? Evaluating word embeddings using word alignment ( http://arxiv.org/abs/2306.08702v1 )

ライセンス: Link先を確認

Eyal Liron Dolev

(参考訳) 類似度に基づく単語アライメントモデル(SimAlign と awesome-align)を、mBERT と XLM-R の単語埋め込みと組み合わせて、ドイツ語とロマンシュ語の並行文で検証する。ロマンシュ語は未知の言語であるため、ゼロショットの設定を扱うことになる。mBERT の埋め込みを用いると、両モデルともアライメントエラー率 0.22 に達し、統計モデルである fast_align を上回り、既知の言語に対する類似度ベースの単語アライメントと同等である。我々はこれらの結果を、mBERT がロマンシュ語にとって有意味で適用可能な情報を含んでいるという証拠として解釈する。性能を評価するため、グラウビュンデン州(Canton of Grisons)が過去25年間にドイツ語、ロマンシュ語、イタリア語で発表したプレスリリースを含む、新たな3言語コーパス DERMIT(DE-RM-IT)も提示する。コーパスは4,547の並列文書と、各言語の組み合わせごとに約100,000の文対を含む。さらに、ドイツ語・ロマンシュ語の単語アライメントのゴールドスタンダードも提示する。データは https://github.com/eyldlv/DERMIT-Corpus で公開されている。

We test similarity-based word alignment models (SimAlign and awesome-align) in combination with word embeddings from mBERT and XLM-R on parallel sentences in German and Romansh. Since Romansh is an unseen language, we are dealing with a zero-shot setting. Using embeddings from mBERT, both models reach an alignment error rate of 0.22, which outperforms fast_align, a statistical model, and is on par with similarity-based word alignment for seen languages. We interpret these results as evidence that mBERT contains information that can be meaningful and applicable to Romansh. To evaluate performance, we also present a new trilingual corpus, which we call the DERMIT (DE-RM-IT) corpus, containing press releases made by the Canton of Grisons in German, Romansh and Italian in the past 25 years. The corpus contains 4,547 parallel documents and approximately 100,000 sentence pairs in each language combination. We additionally present a gold standard for German-Romansh word alignment. The data is available at https://github.com/eyldlv/DERMIT-Corpus.
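
The alignment error rate (AER) quoted above is commonly computed from a predicted set of word-index pairs and a gold standard split into "sure" and "possible" links. The following is a minimal sketch of that commonly used definition, not the paper's evaluation script; the function name `alignment_error_rate` and the toy alignments are hypothetical, and it is an assumption that the paper's gold standard distinguishes sure from possible links.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted: iterable of (source_index, target_index) pairs (A)
    sure:      gold links annotators agree on (S)
    possible:  additional acceptable gold links; S is treated as a subset of P
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # sure links are always also possible links
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical toy example: three predicted links, two sure and one possible gold link.
predicted = [(0, 0), (1, 1), (2, 3)]
sure = [(0, 0), (1, 1)]
possible = [(2, 2)]
aer = alignment_error_rate(predicted, sure, possible)
```

Lower is better: a prediction that exactly matches the sure links scores 0.0, while the toy prediction above is penalised for the one link, `(2, 3)`, that is neither sure nor possible.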

翻訳日:2023-06-16 17:30:28 公開日:2023-06-14

# 反復的自己転移学習:小規模データセットに基づく応答時刻歴予測の一般的な手法

Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset ( http://arxiv.org/abs/2306.08700v1 )

ライセンス: Link先を確認

Yongjia Xu, Xinzheng Lu, Yifan Fei and Yuli Huang

(参考訳) 応答時刻歴予測のためのディープニューラルネットワークによるサロゲートモデリングには、多くの利点がある。しかし、精緻な数値シミュレーションや実際の実験のコストが高いため、データ不足は実用上避けられないボトルネックとなっている。本研究では、小規模データセットに基づいてニューラルネットワークを学習するための反復的自己転移学習手法を提案する。また、回帰のための3分岐深層適応ネットワーク(DAN-TR)と名付けた、写像ベースの新しい転移学習ネットワークを提案する。DAN-TRと擬似ラベル戦略を組み合わせることで汎用的な反復的ネットワーク学習戦略を開発し、対応するデータセットの構築についても論じる。最後に、複雑な部材をケーススタディとして選択する。その結果、提案手法は、外部のラベル付きサンプル、良好に動作する事前学習モデル、追加の人手によるラベル付け、複雑な物理的・数学的解析を必要とせずに、小規模データセット上でモデル性能を約1桁向上できることが示された。

Deep neural network surrogate modeling offers numerous advantages for response time-history prediction. However, due to the high cost of refined numerical simulations and actual experiments, the lack of data has become an unavoidable bottleneck in practical applications. An iterative self-transfer learning method for training neural networks based on small datasets is proposed in this study. A new mapping-based transfer learning network, named deep adaptation network with three branches for regression (DAN-TR), is proposed. A general iterative network training strategy is developed by coupling DAN-TR and the pseudo-label strategy, and the establishment of corresponding datasets is also discussed. Finally, a complex component is selected as a case study. The results show that the proposed method can improve the model performance by nearly an order of magnitude on small datasets without the need for external labeled samples, well-behaved pre-trained models, additional artificial labeling, or complex physical/mathematical analysis.

翻訳日:2023-06-16 17:30:08 公開日:2023-06-14

# 経時的な胸部X線とレポートを用いた放射線レポートの事前入力

Utilizing Longitudinal Chest X-Rays and Reports to Pre-Fill Radiology Reports ( http://arxiv.org/abs/2306.08749v1 )

ライセンス: Link先を確認

Qingqing Zhu, Tejas Sudharshan Mathai, Pritam Mukherjee, Yifan Peng, Ronald M. Summers, and Zhiyong Lu

(参考訳) 音声認識ソフトウェアの使用により放射線レポートのターンアラウンドタイムは短縮されたが、持続的なコミュニケーションエラーは放射線レポートの解釈に大きな影響を及ぼしうる。放射線レポートの事前入力はレポート作成エラーの軽減に有望であるが、医療レポート生成に関する文献上の取り組みにもかかわらず、MIMIC-CXRデータセットにおける患者受診記録の経時的性質を活用するアプローチは不足している。このギャップに対処するため、経時的なマルチモーダルデータ、すなわち前回受診時のCXR、今回受診時のCXR、および前回受診時のレポートを用いて、今回の受診レポートの「所見(findings)」セクションを事前入力することを提案する。まず、MIMIC-CXRデータセットから26,625人の患者の経時的な受診情報を収集し、Longitudinal-MIMICと呼ぶ新しいデータセットを作成した。この新しいデータセットを用いて、クロスアテンションベースのマルチモーダル融合モジュールと階層的メモリ駆動デコーダを介して、マルチモーダルデータ(CXR画像+レポート)を含む経時的な患者受診記録から情報を捉えるtransformerベースのモデルを訓練した。今回の受診データのみを入力としてモデルを訓練する従来研究とは対照的に、本研究は利用可能な経時的情報を活用して放射線レポートの「所見」セクションを事前入力する。実験の結果、我々のアプローチは最近のいくつかのアプローチを、F1スコアで3%以上、BLEU-4、METEOR、ROUGE-Lでそれぞれ2%以上上回った。データセットとコードは公開される予定だ。

Despite the reduction in turn-around times in radiology reports with the use of speech recognition software, persistent communication errors can significantly impact the interpretation of the radiology report. Pre-filling a radiology report holds promise in mitigating reporting errors, and despite efforts in the literature to generate medical reports, there exists a lack of approaches that exploit the longitudinal nature of patient visit records in the MIMIC-CXR dataset. To address this gap, we propose to use longitudinal multi-modal data, i.e., previous patient visit CXR, current visit CXR, and previous visit report, to pre-fill the 'findings' section of a current patient visit report. We first gathered the longitudinal visit information for 26,625 patients from the MIMIC-CXR dataset and created a new dataset called Longitudinal-MIMIC. With this new dataset, a transformer-based model was trained to capture the information from longitudinal patient visit records containing multi-modal data (CXR images + reports) via a cross-attention-based multi-modal fusion module and a hierarchical memory-driven decoder. In contrast to previous work that only uses current visit data as input to train a model, our work exploits the longitudinal information available to pre-fill the 'findings' section of radiology reports. Experiments show that our approach outperforms several recent approaches by >=3% on F1 score, and >=2% for BLEU-4, METEOR and ROUGE-L respectively. The dataset and code will be made publicly available.

翻訳日:2023-06-16 17:23:07 公開日:2023-06-14

# 物体中心神経散乱関数による多物体操作

Multi-Object Manipulation via Object-Centric Neural Scattering Functions ( http://arxiv.org/abs/2306.08748v1 )

ライセンス: Link先を確認

Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu

(参考訳) 学習された視覚力学モデルはロボット操作に有効であることが証明されている。しかし、マルチオブジェクトインタラクションに関わるシーンを表現できる最善の方法はまだ不明である。現在の方法はシーンを離散的なオブジェクトに分解するが、特定の照度に結びついた外観のみをエンコードするため、照明条件に挑戦する中で正確なモデリングと操作に苦慮する。本稿では,モデル予測制御フレームワークにおけるオブジェクト表現として,オブジェクト中心のニューラル散乱関数(osfs)を用いることを提案する。 OSFは、オブジェクトごとの光輸送をモデルとし、オブジェクトの再配置と様々な照明条件の下で構成シーンの再レンダリングを可能にする。このアプローチを逆パラメータ推定とグラフに基づくニューラルダイナミクスモデルと組み合わせることで,従来考えられなかったシナリオや過酷な照明条件においても,モデル予測制御性能の向上と合成多目的環境における一般化を実証する。

Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation amid challenging lighting conditions as they only encode appearance tied with specific illuminations. In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. OSFs model per-object light transport, enabling compositional scene re-rendering under object rearrangement and varying lighting conditions. By combining this approach with inverse parameter estimation and graph-based neural dynamics models, we demonstrate improved model-predictive control performance and generalization in compositional multi-object environments, even in previously unseen scenarios and harsh lighting conditions.

翻訳日:2023-06-16 17:22:34 公開日:2023-06-14

# MetaML: ディープラーニングアクセラレーションのためのカスタマイズ可能なクロスステージ設計フローを自動化する

MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration ( http://arxiv.org/abs/2306.08746v1 )

ライセンス: Link先を確認

Zhiqiang Que, Shuo Liu, Markus Rognlien, Ce Guo, Jose G. F. Coutinho, Wayne Luk

(参考訳) 本稿では、ディープニューラルネットワーク(DNN)ハードウェアアクセラレータのための新しい最適化フレームワークを紹介し、カスタマイズされ自動化された設計フローの迅速な開発を可能にする。具体的には、DNNとFPGAの低レベル最適化を含む、低レベル最適化手法の選択と構成を自動化することを目的とする。高度にカスタマイズ可能で柔軟な設計フローアーキテクチャを構築するための新しい最適化および変換タスクを導入し、それによってDNNアクセラレータの性能と効率を向上させる。結果として、2つのネットワークにおいて、精度を維持し、人手やドメインの専門知識を不要にしながら、DSP使用量を最大92%、LUT使用量を最大89%削減できることを示した。最先端のアプローチと比較して、我々の設計はより高い精度を達成し、DSPリソースの使用量は3分の1であり、提案フレームワークの利点を裏付けている。

This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the selection and configuration of low-level optimization techniques, encompassing DNN and FPGA low-level optimizations. We introduce novel optimization and transformation tasks for building design-flow architectures, which are highly customizable and flexible, thereby enhancing the performance and efficiency of DNN accelerators. Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks, while maintaining accuracy and eliminating the need for human effort or domain expertise. In comparison to state-of-the-art approaches, our design achieves higher accuracy and utilizes three times fewer DSP resources, underscoring the advantages of our proposed framework.

翻訳日:2023-06-16 17:22:19 公開日:2023-06-14

# PLAN: 分散を考慮したプライベート平均推定

PLAN: Variance-Aware Private Mean Estimation ( http://arxiv.org/abs/2306.08745v1 )

ライセンス: Link先を確認

Martin Aumüller, Christian Janos Lebeda, Boel Nelson, Rasmus Pagh

(参考訳) 差分プライベートな平均推定は、データ分析と機械学習のためのプライバシ保護アルゴリズムの重要な構成要素である。プライバシと有用性のトレードオフは最悪の場合についてはよく理解されているが、多くのデータセットは、より良いアルゴリズムを得るために活用できる可能性のある構造を示す。本稿では、入力が $\mathbf{R}^d$ 上の分布 $\mathcal{D}$ から独立にサンプリングされ、座標ごとの標準偏差が $\boldsymbol{\sigma} \in \mathbf{R}^d$ である設定における平均推定のための差分プライベートアルゴリズムの族、$\textit{Private Limit Adapted Noise (PLAN)}$ を提案する。マハラノビス距離の下での平均推定と同様、PLANはノイズの形状をデータの形状に合わせるが、従来のアルゴリズムとは異なり、プライバシ予算は座標間で不均一に費やされる。$\mathcal{D}$ に対する集中性の仮定の下で、ベクトル $\boldsymbol{\sigma}$ の偏りを活用する方法を示し、$\ell_2$ 誤差が $\|\boldsymbol{\sigma}\|_1$ に比例する(zero-concentrated)差分プライベートな平均推定値を得る。従来の研究は $\boldsymbol{\sigma}$ を考慮していないか、誤差をマハラノビス距離で測定しており、どちらの場合も $\ell_2$ 誤差は $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ に比例し、これは最大で $\sqrt{d}$ 倍大きくなりうる。アルゴリズムの有効性を検証するため、合成データと実世界データの両方で精度を実証的に評価した。

Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise (PLAN)}$, a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance; in both cases this results in $\ell_2$ error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real world data.
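
To make the $\|\boldsymbol{\sigma}\|_1$ versus $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ comparison concrete, here is a back-of-the-envelope sketch. It is not the PLAN algorithm: it assumes $\boldsymbol{\sigma}$ is known and strictly positive, that each coordinate has per-record sensitivity $\sigma_i / n$ (a hypothetical clipping rule), and that independent Gaussian noise with variance $s_i^2$ is added to coordinate $i$, which costs $\rho = \sum_i \Delta_i^2 / (2 s_i^2)$ under zero-concentrated differential privacy. All function names are illustrative.

```python
import math


def zcdp_cost(sigma, variances, n):
    """zCDP cost of per-coordinate Gaussian noise, assuming sensitivity sigma_i / n."""
    return sum((s / n) ** 2 / (2.0 * v) for s, v in zip(sigma, variances))


def nonuniform_variances(sigma, rho, n):
    """Noise variances proportional to sigma_i (budget spent non-uniformly)."""
    l1 = sum(sigma)
    return [s * l1 / (2.0 * rho * n ** 2) for s in sigma]


def uniform_variances(sigma, rho, n):
    """Noise variances proportional to sigma_i^2 (equal budget per coordinate)."""
    d = len(sigma)
    return [d * s ** 2 / (2.0 * rho * n ** 2) for s in sigma]


def rms_noise(variances):
    """Root of the expected squared l2 norm of the added noise."""
    return math.sqrt(sum(variances))


# Hypothetical skewed example: one wide coordinate, three narrow ones.
sigma, rho, n = [1.0, 0.01, 0.01, 0.01], 0.5, 1000
v_nonuniform = nonuniform_variances(sigma, rho, n)
v_uniform = uniform_variances(sigma, rho, n)
err_nonuniform = rms_noise(v_nonuniform)  # equals ||sigma||_1 / (n * sqrt(2 * rho))
err_uniform = rms_noise(v_uniform)        # equals sqrt(d) * ||sigma||_2 / (n * sqrt(2 * rho))
```

Both allocations spend the same total budget $\rho$, yet for skewed $\boldsymbol{\sigma}$ the non-uniform one adds noticeably less noise; by the Cauchy-Schwarz inequality it is never worse. The actual algorithms must additionally estimate $\boldsymbol{\sigma}$ and the clipping region privately, which this sketch ignores.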

翻訳日:2023-06-16 17:22:03 公開日:2023-06-14

# 深い単一スパイクと深いReLUネットワークの訓練軌跡は等価か?

Are training trajectories of deep single-spike and deep ReLU network equivalent? ( http://arxiv.org/abs/2306.08744v1 )

ライセンス: Link先を確認

Ana Stanojevic, Stanisław Woźniak, Guillaume Bellec, Giovanni Cherubini, Angeliki Pantazi and Wulfram Gerstner

(参考訳) 二値かつスパースなスパイクによる通信は、生物の脳のエネルギー効率の重要な要素である。しかし、誤差逆伝播法による深層スパイキングニューラルネットワーク(SNN)の学習は、人工ニューラルネットワーク(ANN)よりも難しい。これは、最近の理論的結果がReLUネットワークからTTFS(time-to-first-spike)SNNへの正確なマッピングアルゴリズムを与えていることを考えると不可解である。これらの結果に基づいて、TTFS-SNNの学習ダイナミクスを理論とシミュレーションの両面から解析する。我々の解析は、SNNをReLUネットワークに正確にマッピングできる場合でも、勾配降下法によって常に頑健に学習できるとは限らないことを明らかにする。その理由は、勾配消失・爆発問題の特定の形が現れ、等価なANNと比較して勾配降下の軌跡に偏りが生じるためである。この問題を特定した上で、ネットワークの初期化とSNNのパラメータ化に対する汎用的な解決策を導出し、SNNがANNと同等に頑健に学習できることを保証する。理論的知見は、画像分類データセット上で実際に示される。提案手法は、CIFAR10において深層ConvNetと同じ精度を達成し、はるかに大きなPLACES365データセットでもANNと比べて精度を損なうことなくファインチューニングを可能にする。変換と、頑健な勾配降下によるSNNのファインチューニングとを組み合わせた視点は、低レイテンシとノイズや量子化への耐性を必要とするハードウェア実装に向けてSNNを最適化する上で決定的に重要になると我々は主張する。

Communication by binary and sparse spikes is a key factor for the energy efficiency of biological brains. However, training deep spiking neural networks (SNNs) with backpropagation is harder than with artificial neural networks (ANNs), which is puzzling given that recent theoretical results provide exact mapping algorithms from ReLU to time-to-first-spike (TTFS) SNNs. Building upon these results, we analyze in theory and in simulation the learning dynamics of TTFS-SNNs. Our analysis highlights that even when an SNN can be mapped exactly to a ReLU network, it cannot always be robustly trained by gradient descent. The reason for that is the emergence of a specific instance of the vanishing-or-exploding gradient problem leading to a bias in the gradient descent trajectory in comparison with the equivalent ANN. After identifying this issue we derive a generic solution for the network initialization and SNN parameterization which guarantees that the SNN can be trained as robustly as its ANN counterpart. Our theoretical findings are illustrated in practice on image classification datasets. Our method achieves the same accuracy as deep ConvNets on CIFAR10 and enables fine-tuning on the much larger PLACES365 dataset without loss of accuracy compared to the ANN. We argue that the combined perspective of conversion and fine-tuning with robust gradient descent in SNN will be decisive to optimize SNNs for hardware implementations needing low latency and resilience to noise and quantization.

翻訳日:2023-06-16 17:21:21 公開日:2023-06-14

# 水中単眼視覚SLAMの課題の検討

Investigation of the Challenges of Underwater-Visual-Monocular-SLAM ( http://arxiv.org/abs/2306.08738v1 )

ライセンス: Link先を確認

Michele Grimaldi, David Nakath, Mengkun She, Kevin Köser

(参考訳) 本稿では、水中ロボットにおける単眼視覚SLAM(Simultaneous Localization and Mapping、vSLAM)手法の課題について包括的に調査する。過去10年間、視覚データを利用する状態推定手法は大きく進歩したが、ほとんどの評価は制御された屋内環境や都市環境に限られており、そこでは優れた性能が示されてきた。しかし、水や光の条件、ロボットの経路、深度などの要因がアルゴリズムの性能に大きく影響しうる水中シナリオのような極めて困難な条件では、これらの手法は十分にテストされていない。そこで、実世界のAUVシナリオと、正確な外部参照が得られる実験室環境の両方で評価を行った。水の光学特性や照明シナリオといった環境条件が単眼vSLAM手法の性能に与える影響を理解することに焦点を当てる。この目的のため、まずすべての手法が空気中の環境では非常に良好に動作することを示し、次いで困難な水中環境では性能が低下することを示す。本研究の最終目標は、このような条件下でSLAM手法の精度とロバスト性を向上させうる技術を特定することである。この目標に向けて、特に散乱媒質中の低視程および極端な照明シナリオにおいて、SLAM手法が用いる入力画像の品質を向上させる画像強調技術の可能性を検討する。水中環境における単眼SLAM手法の動作を可能にする、あるいは性能を高める能力を見極めるため、キャリブレーション操作と簡単な画像復元技術について最初の評価を示す。

In this paper, we present a comprehensive investigation of the challenges of Monocular Visual Simultaneous Localization and Mapping (vSLAM) methods for underwater robots. While significant progress has been made in state estimation methods that utilize visual data in the past decade, most evaluations have been limited to controlled indoor and urban environments, where impressive performance was demonstrated. However, these techniques have not been extensively tested in extremely challenging conditions, such as underwater scenarios where factors such as water and light conditions, robot path, and depth can greatly impact algorithm performance. Hence, our evaluation is conducted in real-world AUV scenarios as well as laboratory settings which provide precise external reference. A focus is laid on understanding the impact of environmental conditions, such as optical properties of the water and illumination scenarios, on the performance of monocular vSLAM methods. To this end, we first show that all methods perform very well in in-air settings and subsequently show the degradation of their performance in challenging underwater environments. The final goal of this study is to identify techniques that can improve accuracy and robustness of SLAM methods in such conditions. To achieve this goal, we investigate the potential of image enhancement techniques to improve the quality of input images used by the SLAM methods, specifically in low visibility and extreme lighting scenarios in scattering media. We present a first evaluation on calibration maneuvers and simple image restoration techniques to determine their ability to enable or enhance the performance of monocular SLAM methods in underwater environments.

翻訳日:2023-06-16 17:20:53 公開日:2023-06-14

# LoSh: 参照ビデオオブジェクトセグメンテーションのための長短テキスト共同予測ネットワーク

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ( http://arxiv.org/abs/2306.08736v1 )

ライセンス: Link先を確認

Linfeng Yuan, Miaojing Shi, Zijie Yue

(参考訳) 参照ビデオオブジェクトセグメンテーション(RVOS)は、与えられたテキスト表現によって参照されるターゲットインスタンスをビデオクリップ内でセグメント化することを目的としている。テキスト表現は通常、インスタンスの外観、行動、他者との関係に関する複雑な記述を含んでいる。したがって、RVOSモデルがビデオ内でこれらすべての属性を対応づけて捉えることはかなり難しく、実際、モデルはインスタンスの行動や関係に関する視覚的属性を重視しがちである。その結果、ターゲットインスタンスのマスク予測が不完全、あるいは誤ったものになることがある。本稿では、元の長いテキスト表現から主語中心の短いテキスト表現を取り出すことで、この問題に取り組む。短い表現はターゲットインスタンスの外観に関する情報のみを保持するため、モデルの注意をインスタンスの外観に集中させるために使用できる。我々は、長文と短文の両方の表現を用いてモデルに共同予測を行わせ、共同予測を整合させるために長短予測交差損失(long-short predictions intersection loss)を導入する。言語面での改善に加えて、前方-後方の視覚的整合性損失も導入する。これは、オプティカルフローを利用してアノテーション付きフレームとその時間的近傍との間で視覚的特徴をワープし、整合性をとるものである。エンドツーエンドの学習のために、2つの最先端のtransformerベースのパイプライン上に本手法を構築した。A2D-SentencesとJHMDB-Sentencesデータセットでの広範な実験により、本手法による顕著な改善が示された。

Referring video object segmentation (RVOS) aims to segment the target instance referred to by a given text expression in a video clip. The text expression normally contains sophisticated descriptions of the instance's appearance, actions, and relations with others. It is therefore rather difficult for an RVOS model to capture all these attributes correspondingly in the video; in fact, the model often favours the action- and relation-related visual attributes of the instance. This can end up with incomplete or even incorrect mask prediction of the target instance. In this paper, we tackle this problem by taking a subject-centric short text expression from the original long text expression. The short one retains only the appearance-related information of the target instance so that we can use it to focus the model's attention on the instance's appearance. We let the model make joint predictions using both long and short text expressions and introduce a long-short predictions intersection loss to align the joint predictions. Besides the improvement on the linguistic part, we also introduce a forward-backward visual consistency loss, which utilizes optical flows to warp visual features between the annotated frames and their temporal neighbors for consistency. We build our method on top of two state-of-the-art transformer-based pipelines for end-to-end training. Extensive experiments on the A2D-Sentences and JHMDB-Sentences datasets show impressive improvements from our method.

翻訳日:2023-06-16 17:20:27 公開日:2023-06-14

# WavPool: ディープニューラルネットワークのための新しいブロック

WavPool: A New Block for Deep Neural Networks ( http://arxiv.org/abs/2306.08734v1 )

ライセンス: Link先を確認

Samuel D. McDermott, M. Voetberg, Brian Nord

(参考訳) 現代のディープニューラルネットワークは、全結合層や畳み込み層など多くの演算層で構成されており、これらはしばしばブロックにまとめられる。本研究では、マルチレゾリューション・パーセプトロンと呼ぶ、ウェーブレット変換に基づく新しいネットワークアーキテクチャを導入する。これにプーリング層を加えることで、新しいネットワークブロックであるWavPoolを構成する。マルチレゾリューション・パーセプトロンの最初のステップは、係数は固定だがサイズが増加していくフィルタで入力データを畳み込むことにより、データを多重解像度分解の形式に変換することである。画像処理の手法にならうことで、データベクトルのサイズを増やすことなく、スケール情報と空間情報を同時にネットワークから利用可能にできる。WavPoolは、より少ないパラメータで同様の多層パーセプトロンを上回り、CIFAR-10の相対精度では同等の畳み込みニューラルネットワークを約10%上回る。

Modern deep neural networks comprise many operational layers, such as dense or convolutional layers, which are often collected into blocks. In this work, we introduce a new, wavelet-transform-based network architecture that we call the multi-resolution perceptron: by adding a pooling layer, we create a new network block, the WavPool. The first step of the multi-resolution perceptron is transforming the data into its multi-resolution decomposition form by convolving the input data with filters of fixed coefficients but increasing size. Following image processing techniques, we are able to make scale and spatial information simultaneously accessible to the network without increasing the size of the data vector. WavPool outperforms a similar multilayer perceptron while using fewer parameters, and outperforms a comparable convolutional neural network by ~10% in relative accuracy on CIFAR-10.
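
The idea of a multi-resolution decomposition that does not grow the data vector can be illustrated with a one-dimensional Haar wavelet transform. This is a minimal sketch under the assumption that a Haar-type wavelet is used, which the abstract does not state; it is not the WavPool block itself, and the function names are hypothetical. Each level applies fixed averaging and differencing coefficients to the previous level's averages, which is equivalent to filtering the original signal with fixed-coefficient filters of doubling length.

```python
import math


def haar_step(x):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * inv_sqrt2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * inv_sqrt2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x):
    """Full decomposition of a signal whose length is a power of two.

    Returns the final approximation (length 1) and a list of detail vectors,
    ordered from the finest scale to the coarsest scale.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    approx = list(x)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical toy signal.
signal = [4.0, 2.0, 6.0, 8.0]
approx, details = haar_decompose(signal)
n_coeffs = len(approx) + sum(len(d) for d in details)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx) + sum(v * v for d in details for v in d)
```

The number of coefficients equals the input length and the transform preserves the signal's energy, so scale and position information are both available without enlarging the data vector.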

翻訳日:2023-06-16 17:20:03 公開日:2023-06-14

# 継続学習に基づく新規性対応感情認識システム

Continuous Learning Based Novelty Aware Emotion Recognition System ( http://arxiv.org/abs/2306.08733v1 )

ライセンス: Link先を確認

Mijanur Palash, Bharat Bhargava

(参考訳) 現在の人間の感情認識の研究は、新しさを考慮せずに厳格な規則によって統治される伝統的なクローズドラーニングアプローチに従っている。分類モデルは、収集されたデータセット上でトレーニングされ、現実世界のデプロイメントで同じデータ分布を持つことが期待される。私たちが住んでいる世界の流動的で絶えず変化する性質のため、予期せぬ新しいサンプル分布を持つことで、モデルが失敗する可能性がある。そこで本研究では,自動感情認識タスクの新規性を扱うための継続的学習手法を提案する。

Current works in human emotion recognition follow the traditional closed learning approach governed by rigid rules without any consideration of novelty. Classification models are trained on some collected datasets and expected to have the same data distribution in the real-world deployment. Due to the fluid and constantly changing nature of the world we live in, it is possible to have unexpected and novel sample distribution which can lead the model to fail. Hence, in this work, we propose a continuous learning based approach to deal with novelty in the automatic emotion recognition task.

翻訳日:2023-06-16 17:19:46 公開日:2023-06-14

# EPIC Fields: 3Dジオメトリとビデオ理解の融合

EPIC Fields: Marrying 3D Geometry and Video Understanding ( http://arxiv.org/abs/2306.08731v1 )

ライセンス: Link先を確認

Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

(参考訳) ニューラルレンダリングは、20年以上待ち望まれてきた学習、3Dジオメトリ、ビデオ理解の統一を後押ししている。しかし、その進歩は依然として適切なデータセットとベンチマークの欠如によって妨げられている。このギャップに対処するため、EPIC-KITCHENSを3Dカメラ情報で拡張したEPIC Fieldsを導入する。ニューラルレンダリング用の他のデータセットと同様に、EPIC Fieldsはフォトグラメトリを用いてカメラを再構成するという複雑で高コストなステップを不要にし、研究者がモデリングの問題に集中できるようにする。本稿では、動的な行動を捉えた一人称視点ビデオにおけるフォトグラメトリの課題を示し、それに対処するための工夫を提案する。他のニューラルレンダリングデータセットと比較して、EPIC Fieldsはラベル付きのアクションセグメントおよび最近のVISORセグメントアノテーションと対になっているため、ビデオ理解により適している。コミュニティをさらに後押しするため、ニューラルレンダリングと動的オブジェクトのセグメンテーションという2つのベンチマークタスクを、現時点では不可能なことを示す強力なベースラインとともに評価する。また、VISORアノテーション上の半教師付きビデオオブジェクトセグメンテーションにおけるジオメトリの利点を示す。EPIC FieldsはEPIC-KITCHENSの動画の96%を再構成し、45のキッチンで撮影された99時間の映像で1,900万フレームを登録している。

Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on modelling problems. We illustrate the challenge of photogrammetry in egocentric videos of dynamic actions and propose innovations to address them. Compared to other neural rendering datasets, EPIC Fields is better tailored to video understanding because it is paired with labelled action segments and the recent VISOR segment annotations. To further motivate the community, we also evaluate two benchmark tasks in neural rendering and segmenting dynamic objects, with strong baselines that showcase what is not possible today. We also highlight the advantage of geometry in semi-supervised video object segmentation on the VISOR annotations. EPIC Fields reconstructs 96% of videos in EPIC-KITCHENS, registering 19M frames in 99 hours recorded in 45 kitchens.

翻訳日:2023-06-16 17:19:38 公開日:2023-06-14

# 一般化可能なワンショットニューラルヘッドアバター

Generalizable One-shot Neural Head Avatar ( http://arxiv.org/abs/2306.08768v1 )

ライセンス: Link先を確認

Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz

(参考訳) 本研究では、単一視点の肖像画像から3D頭部アバターを再構成し、アニメーション化する手法を提案する。既存の手法は、複数の画像を用いた特定人物向けの時間のかかる最適化を伴うか、顔領域を超えた複雑な外観の詳細の合成に苦労する。これらの制限に対処するため、人物固有の最適化を必要とせずに単一視点画像から未知の人物へ一般化するだけでなく、顔領域の内外の特徴的な詳細(髪型、アクセサリーなど)も捉えるフレームワークを提案する。提案手法の中核は、粗い3D形状、ソース画像の詳細な外観、ターゲット画像の表情をそれぞれ表す3つのtri-plane(三平面)を生成する3つのブランチである。3つのtri-planeの組み合わせにボリュームレンダリングを適用し、その後に超解像モジュールを適用することで、所望のアイデンティティ、表情、ポーズを持つ高忠実度の画像が得られる。学習後は、ネットワークの1回のフォワードパスで効率的に3D頭部アバターの再構成とアニメーションが可能になる。実験により、提案手法は未知の検証データセットによく一般化し、頭部アバターの再構成とアニメーションにおいてSOTAベースライン手法を大きく上回ることが示された。

We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image. Existing methods either involve time-consuming optimization for a specific person with multiple images, or they struggle to synthesize intricate appearance details beyond the facial region. To address these limitations, we propose a framework that not only generalizes to unseen identities based on a single-view image without requiring person-specific optimization, but also captures characteristic details within and beyond the face area (e.g. hairstyle, accessories, etc.). At the core of our method are three branches that produce three tri-planes representing the coarse 3D geometry, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to the combination of the three tri-planes followed by a super-resolution module, our method yields a high fidelity image of the desired identity, expression and pose. Once trained, our model enables efficient 3D head avatar reconstruction and animation via a single forward pass through a network. Experiments show that the proposed approach generalizes well to unseen validation datasets, surpassing SOTA baseline methods by a large margin on head avatar reconstruction and animation.

翻訳日:2023-06-16 17:13:55 公開日:2023-06-14

# データにおけるコペルニクス革命

A Copernican Revolution in Data ( http://arxiv.org/abs/2306.08766v1 )

ライセンス: Link先を確認

Claudio Gutierrez

(参考訳) 半世紀前、Charles Bachmanはデジタル世界におけるデータの重要性と中心性を予見した。この短い論文では、過去数十年にわたるデータベースコミュニティにおけるこれらのアイデアの進化を掘り下げる。この歴史的分析は、我々の分野が経験している根本的な変化への理解を深め、分野の将来の軌道に関する洞察を与えてくれると考える。

Half a century ago, Charles Bachman foresaw the significance and centrality of data in the digital world. In this short paper, we delve into the evolution of these ideas within the database community over the past decades. We believe that this historical analysis helps deepen our comprehension of the fundamental changes our discipline is undergoing and provides insights into the future trajectory of our field.

翻訳日:2023-06-16 17:13:33 公開日:2023-06-14

# 時系列からの因果発見のための制約に基づくアルゴリズムと雑音に基づくアルゴリズムのハイブリッド化

Hybrids of Constraint-based and Noise-based Algorithms for Causal Discovery from Time Series ( http://arxiv.org/abs/2306.08765v1 )

ライセンス: Link先を確認

Charles K. Assaad, Daria Bystrova, Julyan Arbel, Emilie Devijver, Eric Gaussier, Wilfried Thuiller

(参考訳) 観測時系列から要約因果グラフを発見するために、制約ベースおよびノイズベースの手法が提案されているが、これらは実際の応用では破られたり検証不可能であったりする強い仮定に依存している。近年、これら2つのアプローチを組み合わせたハイブリッド手法(Assaad et al., 2021)が仮定の違反に対して頑健であることが示された。しかし、この手法は要約因果グラフが非巡回であると仮定しており、多くの応用ではサイクルが一般的である。例えば生態学的群集では、捕食者と被食者の個体群の間に循環的な関係があり、フィードバックループを形成することがある。そこで本稿では、サイクルを含む場合も含まない場合もある要約因果グラフを発見できる、制約ベース手法とノイズベース手法のハイブリッドのための2つの新しいフレームワークを提示する。各フレームワークについて2つのハイブリッドアルゴリズムを提供し、シミュレーションデータ、現実的な生態学データ、および様々な応用の実データで実験的に検証する。実験により、我々のハイブリッドアプローチは頑健であり、ほとんどのデータセットで良好な結果をもたらすことが示された。

Constraint-based and noise-based methods have been proposed to discover summary causal graphs from observational time series under strong assumptions which can be violated or impossible to verify in real applications. Recently, a hybrid method (Assaad et al, 2021) that combines these two approaches, proved to be robust to assumption violation. However, this method assumes that the summary causal graph is acyclic, but cycles are common in many applications. For example, in ecological communities, there may be cyclic relationships between predator and prey populations, creating feedback loops. Therefore, this paper presents two new frameworks for hybrids of constraint-based and noise-based methods that can discover summary causal graphs that may or may not contain cycles. For each framework, we provide two hybrid algorithms which are experimentally tested on simulated data, realistic ecological data, and real data from various applications. Experiments show that our hybrid approaches are robust and yield good results over most datasets.

翻訳日:2023-06-16 17:13:28 公開日:2023-06-14

# 部分的後視状態情報を持つRLにおけるPOMDPの理論的困難性とトラクタビリティ

Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information ( http://arxiv.org/abs/2306.08762v1 )

ライセンス: Link先を確認

Ming Shi, Yingbin Liang, and Ness Shroff

(参考訳) 部分観測マルコフ決定過程(POMDP)は、多くの実世界のアプリケーションを記述するために広く適用されてきた。しかし、既存の理論的結果は、一般的なPOMDPでの学習が計算困難になりうることを示しており、その主な課題は潜在状態情報の欠如にある。ここでの基本的な問いは、トラクタビリティを実現するのにどれだけの後視状態情報(HSI)があれば十分かということである。本稿では、驚くべき困難性の結果を示す下界を確立する。すなわち、完全なHSIがない限り、POMDPに対する$\epsilon$-最適方策を得るには指数関数的にスケールするサンプル複雑性が必要となる。それでも、下界の構成から得られる重要な知見により、部分的なHSIしかなくても扱いやすい重要なPOMDPのクラスが存在することが分かる。特に、部分的HSIを持つPOMDPの2つの新しいクラスに対して新しいアルゴリズムを提供し、新しい上界と下界を確立することで、それらがほぼ最適であることを示す。

Partially observable Markov decision processes (POMDPs) have been widely applied to capture many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much hindsight state information (HSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full HSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy solution for POMDPs. Nonetheless, from the key insights in our lower-bound construction, we find that there exist important tractable classes of POMDPs even with partial HSI. In particular, for two novel classes of POMDPs with partial HSI, we provide new algorithms that are shown to be near-optimal by establishing new upper and lower bounds.

翻訳日:2023-06-16 17:13:10 公開日:2023-06-14

# 調和駆動ポテンシャル井戸における傾いた線形および二次バンド接触分散に対するファノ共鳴

Fano resonances for tilted linear and quadratic band touching dispersions in a harmonically driven potential well ( http://arxiv.org/abs/2306.08759v1 )

ライセンス: Link先を確認

Anton Gregefalk, Annica Black-Schaffer, Tanay Nag

(参考訳) 傾いた線形および二次のバンド接触分散を持つモデルを考え、縦方向に配置された調和駆動ポテンシャル井戸を通る透過スペクトルに対して、横方向の線形な傾きが与える影響を解析する。フロッケ散乱行列形式を用いると、ファノ共鳴はフロッケサイドバンドと準束縛状態の整合の結果として現れ、傾きはそれらのエネルギーと波数ベクトルを繰り込むことが分かる。ファノ共鳴エネルギーは、横運動量の大きさが増すにつれて、線形(二次)のバンド接触では減少(増加)し、これは透過プロファイルにおいて基礎となるバンド分散の明確な特徴を示している。横運動量と傾きの積の符号は、両方のバンド分散について、傾きのない場合に対するファノ共鳴エネルギーの相対的なシフトも決定し、傾いた系におけるファノ共鳴の調整可能性を示唆している。重要なことに、横運動量の方向の関数としてファノ共鳴エネルギーを測定することで、傾きの強さを直接決定することもできる。さらに、ショットノイズスペクトルとその微分特性を調べ、ファノ共鳴エネルギーの周囲にそれぞれ変曲領域と起伏があることを見出した。興味深いことに、微分ショットノイズと透過スペクトルは定性的に同様の振る舞いを示すため、駆動された固体系に関する将来の実験において重要な観測量となりうる。

Considering models with tilted linear and quadratic band touching dispersions, we analyze the effect of the transverse linear tilt on the transmission spectra through a harmonically driven potential well oriented longitudinally. Employing the Floquet scattering matrix formalism, we find Fano resonances as an outcome of matching between the Floquet sidebands and quasi-bound states, where the tilt renormalizes their energies and wave vectors. We find that the Fano resonance energy decreases (increases) for linear (quadratic) band touchings as the magnitude of the transverse momentum increases, indicating a distinct signature of the underlying band dispersion in the transmission profile. The sign of the product of the transverse momentum and the tilt also determines the relative shift in the Fano resonance energy with respect to the untilted case for both band dispersions, suggesting a possible tunability of the Fano resonance for tilted systems. Importantly, the tilt strength can also be directly determined by measuring the Fano resonance energy as function of the transverse momenta direction. We furthermore study the shot noise spectra and their differential property where we find an inflection region and undulation, respectively, around the Fano resonance energy. Interestingly, differential shot noise and transmission spectra both qualitatively behave in a similar fashion and might thus serve as important observables for future experiments on driven solid-state systems.

翻訳日:2023-06-16 17:12:53 公開日:2023-06-14

# InfoDiffusion:情報最大化拡散モデルを用いた表現学習

InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models ( http://arxiv.org/abs/2306.08757v1 )

ライセンス: Link先を確認

Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov

(参考訳) 拡散モデルは高品質なサンプルの生成に優れているが、その潜在変数は通常、意味的な内容を欠き、表現学習には適さない。本稿では、データの高レベルな変動要因を捉える低次元の潜在変数で拡散モデルを拡張するアルゴリズムであるInfoDiffusionを提案する。InfoDiffusionは、観測変数と隠れ変数の間の相互情報量で正則化された学習目標に依存しており、これにより潜在空間の品質が向上し、表現力の高い拡散ベースのデコーダによって潜在変数が無視されることを防ぐ。経験的に、InfoDiffusionは、拡散モデルの高いサンプル品質を維持しながら、最先端の生成的手法や対照的手法に匹敵する、もつれのない(disentangled)人間が解釈可能な潜在表現を学習する。提案手法は、生成画像の属性の操作を可能にし、生成的設計など、学習した潜在空間を探索して高品質なサンプルを生成する必要のあるタスクを支援する可能性がある。

While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning. Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables, which improves latent space quality and prevents the latents from being ignored by expressive diffusion-based decoders. Empirically, we find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods, while retaining the high sample quality of diffusion models. Our method enables manipulating the attributes of generated images and has the potential to assist tasks that require exploring a learned latent space to generate quality samples, e.g., generative design.

翻訳日:2023-06-16 17:12:27 公開日:2023-06-14

# 多言語エンコーダとSeq2Seqモデルの逐次事前学習

Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models ( http://arxiv.org/abs/2306.08756v1 )

ライセンス: Link先を確認

Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza

(参考訳) 事前学習済みのエンコーダのみのモデルとsequence-to-sequence(seq2seq)モデルにはそれぞれ利点があるが、両方のモデルタイプをスクラッチからトレーニングするのは計算コストが高い。一方のモデルを他方から初期化することで事前学習の効率を改善するレシピを検討する。 (1) seq2seqモデルからエンコーダを抽出すると、特に系列ラベリングタスクにおいて、Masked Language Modeling (MLM)エンコーダよりも性能が劣ることを示す。seq2seqトレーニング中のマスキングの変更、デコーダサイズの削減、少量のMLMトレーニングの継続では、このギャップは埋まらない。 (2) 逆に、エンコーダを用いてseq2seqトレーニングをウォームスタートする場合、トレーニングの途中でエンコーダの凍結を解除することで、スクラッチから学習したseq2seqモデルのタスク性能に匹敵できることを示す。全体として、この2段階のアプローチは、多言語エンコーダとseq2seqモデルの両方を得るための効率的なレシピであり、各モデルをスクラッチからトレーニングした場合の性能に匹敵しつつ、総計算コストを27%削減する。

Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages, however training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Variations of masking during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe to obtain both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.

翻訳日:2023-06-16 17:12:09 公開日:2023-06-14

# ClimSim:ハイブリッドマルチスケール気候シミュレータにおける高分解能物理エミュレータのトレーニングのためのオープンな大規模データセット

ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators ( http://arxiv.org/abs/2306.08754v1 )

ライセンス: Link先を確認

Sungduk Yu, Walter M. Hannah, Liran Peng, Mohamed Aziz Bhouri, Ritwik Gupta, Jerry Lin, Bj\"orn L\"utjens, Justus C. Will, Tom Beucler, Bryce E. Harrop, Benjamin R. Hillman, Andrea M. Jenney, Savannah L. Ferretti, Nana Liu, Anima Anandkumar, Noah D. Brenowitz, Veronika Eyring, Pierre Gentine, Stephan Mandt, Jaideep Pathak, Carl Vondrick, Rose Yu, Laure Zanna, Ryan P. Abernathey, Fiaz Ahmed, David C. Bader, Pierre Baldi, Elizabeth A. Barnes, Gunnar Behrens, Christopher S. Bretherton, Julius J. M. Busecke, Peter M. Caldwell, Wayne Chuang, Yilun Han, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Karthik Kashinath, Marat Khairoutdinov, Thorsten Kurth, Nicholas J. Lutsko, Po-Lun Ma, Griffin Mooers, J. David Neelin, David A. Randall, Sara Shamekh, Akshay Subramaniam, Mark A. Taylor, Nathan M. Urban, Janni Yuval, Guang J. Zhang, Tian Zheng, Michael S. Pritchard

(参考訳) 現代の気候予測は、計算上の制約のために十分な空間的・時間的解像度を欠いている。その結果、嵐のような重要な過程の予測が不正確で精度の低いものになっている。物理と機械学習(ML)を組み合わせたハイブリッド手法は、計算負荷が高く短時間の高解像度シミュレーションをMLエミュレータに委ねることでムーアの法則を回避できる、新世代の高忠実度気候シミュレータを導入した。しかし、このハイブリッドML-物理シミュレーションのアプローチはドメイン固有の扱いを必要とし、トレーニングデータや適切で使いやすいワークフローがないために、MLの専門家には手が届かなかった。本稿では、ハイブリッドML-物理研究のために設計された史上最大のデータセットであるClimSimを提示する。これは、気候科学者とML研究者のコンソーシアムによって開発されたマルチスケール気候シミュレーションで構成される。57億対の多変量入力・出力ベクトルからなり、局所的にネストされた高解像度・高忠実度の物理が、ホストとなる気候シミュレータのマクロスケールの物理状態に与える影響を分離している。データセットは全球をカバーし、高いサンプリング頻度で複数年にわたっており、得られたエミュレータが運用中の気候シミュレータへの下流での結合と互換性を持つように設計されている。MLの課題とその評価方法を明らかにするために、一連の決定論的および確率的回帰ベースラインを実装した。データ(https://huggingface.co/datasets/LEAP/ClimSim_high-res)とコード(https://leap-stc.github.io/ClimSim)は、科学と社会の利益のために、ハイブリッドML-物理および高忠実度気候シミュレーションの開発を支援する目的で公開されている。

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise prediction of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.

翻訳日:2023-06-16 17:11:48 公開日:2023-06-14

# モノリンガルデータからのバイリンガルおよびコードスイッチ音声認識モデルの訓練に向けて

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources ( http://arxiv.org/abs/2306.08753v1 )

ライセンス: Link先を確認

Kunal Dhawan, Dima Rekesh, Boris Ginsburg

(参考訳) 多言語自動音声認識(ASR)モデルは、複数の言語にわたる音声を書き起こすことができ、個別のモデルを不要にする。さらに、言語識別(LID)を実行し、コードスイッチされた音声を処理することもできる。しかし、これらのモデルの訓練には、入手が限られている特別なコードスイッチおよび多言語の音声コーパスが必要である。本稿では、純粋にモノリンガルなデータソースを用いて、バイリンガルおよびコードスイッチASRモデルを訓練するためのさまざまなアプローチを評価する。モノリンガルなサンプルの境界でLIDを生成する現在主流の手法とは異なり、出力される各トークンに対してLIDを生成する集約トークナイザ(aggregate tokenizer)の概念を導入する。バイリンガルモデルとモノリンガルモデルの性能を比較し、集約トークナイザの有効性を示し、合成コードスイッチASRデータの生成手法を提示し、提案するコードスイッチASRモデルが音声認識と音声言語識別のタスクに有効であることを示す。

Multilingual Automatic Speech Recognition (ASR) models are capable of transcribing audios across multiple languages, eliminating the need for separate models. In addition, they can perform Language Identification (LID) and handle code-switched speech. However, training these models requires special code-switch and multilingual speech corpora which are sparsely available. In this paper, we evaluate different approaches towards training of bilingual as well as code-switched ASR models using purely monolingual data sources. We introduce the concept of aggregate tokenizers that differs from the current prevalent technique of generating LIDs at the boundaries of monolingual samples and produces LID for each emitted token instead. We compare bilingual and monolingual model performance, showcase the efficacy of aggregate tokenizers, present a synthetic code-switched ASR data generation technique and demonstrate the effectiveness of the proposed code-switched ASR models for the tasks of speech recognition and spoken language identification.
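
The per-token LID idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes word-level toy vocabularies (the class name `ToyAggregateTokenizer` and the example words are hypothetical) and simply gives each language its own contiguous block of token IDs, so the language of every emitted token can be read from its ID.

```python
from typing import Dict, List, Tuple


class ToyAggregateTokenizer:
    """Toy word-level tokenizer that merges per-language vocabularies.

    Each language gets its own contiguous block of token IDs, so the
    language of any token can be recovered directly from its ID.
    """

    def __init__(self, vocabularies: Dict[str, List[str]]) -> None:
        self._word_to_id: Dict[str, Dict[str, int]] = {}
        self._id_to_entry: List[Tuple[str, str]] = []  # ID -> (word, language)
        for lang, vocab in vocabularies.items():
            offset = len(self._id_to_entry)  # first ID of this language's block
            self._word_to_id[lang] = {w: offset + i for i, w in enumerate(vocab)}
            self._id_to_entry.extend((w, lang) for w in vocab)

    def encode(self, text: str, lang: str) -> List[int]:
        """Encode monolingual text with the vocabulary block of `lang`."""
        return [self._word_to_id[lang][w] for w in text.split()]

    def decode_with_lid(self, ids: List[int]) -> List[Tuple[str, str]]:
        """Return a (word, language) pair for every token ID."""
        return [self._id_to_entry[i] for i in ids]


tokenizer = ToyAggregateTokenizer({
    "en": ["i", "like", "music"],
    "es": ["me", "gusta", "la", "musica"],
})
# A synthetic code-switched utterance stitched from two monolingual pieces.
ids = tokenizer.encode("i like", "en") + tokenizer.encode("la musica", "es")
print(ids)
print(tokenizer.decode_with_lid(ids))
```

Because the language label travels with each token ID, no separate LID marker is needed at sample boundaries, which is the contrast the abstract draws.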

翻訳日:2023-06-16 17:11:13 公開日:2023-06-14

# 仲間から学ぶことによる選択的視覚質問応答の改善

Improving Selective Visual Question Answering by Learning from Your Peers ( http://arxiv.org/abs/2306.08751v1 )

ライセンス: Link先を確認

Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach

(参考訳) VQA(Visual Question Answering)の進歩にもかかわらず、モデルが自身の正しさを評価する能力は十分に研究されていない。最近の研究では、VQAモデルはそのままの状態では、間違っているときに回答を控えることが難しい場合があることが示されている。回答を控える選択肢は選択的予測(Selective Prediction)とも呼ばれ、システムの出力を信頼しなければならないユーザ(視覚障害のあるユーザのためのVQAアシスタントなど)にシステムを展開する場合に非常に重要である。このようなシナリオでは、ユーザが誤答の可能性を高める分布外(OOD)入力や敵対的入力を与えることがあるため、回答の保留は特に重要になりうる。本研究では、モデルにIDとOODのデータが混在して提示される、分布内(ID)およびOODの両シナリオにおけるSelective VQAを検討する。目標は、回答した質問における誤りのリスクを最小限に抑えながら、回答する質問の数を最大化することである。保留の判断を行うマルチモーダル選択関数を訓練するための、シンプルかつ効果的なLearning from Your Peers (LYP)アプローチを提案する。提案手法は、訓練データの異なるサブセットで訓練されたモデルの予測を、Selective VQAモデルを最適化するためのターゲットとして利用する。追加の手動ラベルやホールドアウトデータを必要とせず、汎化が容易な例と困難な例を識別するための信号を与える。広範な評価により、これがさまざまなアーキテクチャやスケールの多くのモデルに有益であることを示す。全体として、IDについては、選択的予測の指標である誤りリスク1%でのカバレッジ(C@1%)で32.92%に達し、このタスクにおける従来の最高カバレッジ15.79%の2倍となった。ID/OOD混合では、モデルのソフトマックス信頼度を保留の判断に用いると性能が非常に低く、OOD例が10%しかない場合でも誤りリスク1%で回答できる質問は5%未満であるが、LYPで学習した選択関数はこれを25.38% C@1%まで高めることができる。

Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.g., VQA assistants for users with visual impairments). For such scenarios, abstention can be especially important as users may provide out-of-distribution (OOD) or adversarial inputs that make incorrect answers more likely. In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data. The goal is to maximize the number of questions answered while minimizing the risk of error on those questions. We propose a simple yet effective Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions. Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model. It does not require additional manual labels or held-out data and provides a signal for identifying examples that are easy/difficult to generalize to. In our extensive evaluations, we show this benefits a number of models across different architectures and scales. Overall, for ID, we reach 32.92% in the selective prediction metric coverage at 1% risk of error (C@1%) which doubles the previous best coverage of 15.79% on this task. For mixed ID/OOD, using models' softmax confidences for abstention decisions performs very poorly, answering <5% of questions at 1% risk of error even when faced with only 10% OOD examples, but a learned selection function with LYP can increase that to 25.38% C@1%.
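
The coverage-at-risk metric (C@1%) mentioned above can be sketched as follows. This is a minimal illustration rather than the paper's evaluation code: it assumes binary correctness labels (VQA accuracy is often fractional), answers questions in order of decreasing confidence, and breaks ties in confidence arbitrarily. The function name `coverage_at_risk` and the toy data are hypothetical.

```python
from typing import Sequence


def coverage_at_risk(confidences: Sequence[float],
                     correct: Sequence[bool],
                     max_risk: float) -> float:
    """Largest fraction of questions answered with error rate <= max_risk.

    Questions are answered from most to least confident; everything below
    the chosen cut-off is treated as an abstention.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Rank examples from most to least confident.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    best_coverage = 0.0
    errors = 0
    for answered, idx in enumerate(order, start=1):
        if not correct[idx]:
            errors += 1
        risk = errors / answered  # error rate among answered questions
        if risk <= max_risk:
            best_coverage = answered / n
    return best_coverage


# Toy example: the third most confident answer is wrong.
conf = [0.9, 0.8, 0.7, 0.6]
is_correct = [True, True, False, True]
print(coverage_at_risk(conf, is_correct, max_risk=0.0))
print(coverage_at_risk(conf, is_correct, max_risk=0.25))
```

A better selection function ranks wrong answers lower, which pushes the first error further down the list and raises coverage at the same risk level.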

翻訳日:2023-06-16 17:10:56 公開日:2023-06-14

# UAV支援ネットワークにおけるエネルギー効率最適化のための密度認識強化学習

Density-Aware Reinforcement Learning to Optimise Energy Efficiency in UAV-Assisted Networks ( http://arxiv.org/abs/2306.08785v1 )

ライセンス: Link先を確認

Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

(参考訳) 空中基地局として機能する無人航空機(UAV)は、車両などのモバイルユーザに無線接続を提供するために展開できる。しかし、道路上の車両の密度は、主に地理的領域内の移動性や交通状況によって空間的・時間的に変動することが多く、ユビキタスなサービスの提供を困難にしている。さらに、エネルギーに制約のあるUAVは、モバイルユーザにサービスを提供しながら上空でホバリングするため、近くのUAVセルや同じ周波数帯域を共有する他のアクセスポイントからの干渉に直面し、システムのエネルギー効率(EE)に影響が及ぶ可能性がある。ユーザのカバレッジを最適化するために適用されてきた最近のマルチエージェント強化学習(MARL)アプローチは、密度が比較的均一な場合にはうまく機能したが、ユーザ分布が不均一な場合、すなわち車両の集中度が不均一な都市道路網では、同様には機能しない可能性がある。本研究では、密度が高く不均一なユーザ分布を追跡しながら、各UAVの軌道、接続ユーザ数、UAVのエネルギー消費量を共同で最適化することで、システム全体のEEを最大化する、密度認識型で通信可能なマルチエージェント分散型ダブルディープQネットワーク(DACEMAD-DDQN)アプローチを提案する。我々の結果は、EEの点で最先端のMARLアプローチを65%から85%上回っている。

Unmanned aerial vehicles (UAVs) serving as aerial base stations can be deployed to provide wireless connectivity to mobile users, such as vehicles. However, the density of vehicles on roads often varies spatially and temporally primarily due to mobility and traffic situations in a geographical area, making it difficult to provide ubiquitous service. Moreover, as energy-constrained UAVs hover in the sky while serving mobile users, they may be faced with interference from nearby UAV cells or other access points sharing the same frequency band, thereby impacting the system's energy efficiency (EE). Recent multi-agent reinforcement learning (MARL) approaches applied to optimise the users' coverage worked well in reasonably even densities but might not perform as well in uneven users' distribution, i.e., in urban road networks with uneven concentration of vehicles. In this work, we propose a density-aware communication-enabled multi-agent decentralised double deep Q-network (DACEMAD-DDQN) approach that maximises the total system's EE by jointly optimising the trajectory of each UAV, the number of connected users, and the UAVs' energy consumption while keeping track of dense and uneven users' distribution. Our result outperforms state-of-the-art MARL approaches in terms of EE by as much as 65% - 85%.

翻訳日:2023-06-16 17:01:57 公開日:2023-06-14

# HOSSnet: き裂伝播シミュレーションのための効率的な物理誘導ニューラルネットワーク

HOSSnet: an Efficient Physics-Guided Neural Network for Simulating Crack Propagation ( http://arxiv.org/abs/2306.08783v1 )

ライセンス: Link先を確認

Shengyu Chen, Shihang Feng, Yao Huang, Zhou Lei, Xiaowei Jia, Youzuo Lin, Estaben Rougier

(参考訳) 有限離散要素法(FDEM)を組み合わせた手法であるHOSS(Hybrid Optimization Software Suite)は、高忠実度の破壊および断片化過程をシミュレーションする先進的な手法の1つであるが、純粋なHOSSシミュレーションの適用は計算コストが高い。同時に、いくつかの科学的問題で大きな成功を収めている機械学習手法は、科学領域における物理ベースのモデルに代わる有望な選択肢としてますます注目されている。そこで本研究の目的は、空間的および時間的領域においてき裂破壊を正確に再構成するための新しいデータ駆動手法を構築することである。長期の再構成における破壊伝播を正則化するために、物理的制約を活用する。さらに、破壊データの再構成性能をさらに向上させるために、知覚的損失と、いくつかの純粋な機械学習の最適化手法を導入する。提案手法の有効性を、外挿実験と補間実験の両方で実証する。その結果、提案手法は、画素単位の再構成誤差と構造的類似性の観点から、空間的および時間的に高忠実度の破壊データを再構成できることが確認された。視覚比較は長期的な有望な結果も示す

Hybrid Optimization Software Suite (HOSS), which is a combined finite-discrete element method (FDEM), is one of the advanced approaches to simulating high-fidelity fracture and fragmentation processes, but the application of pure HOSS simulation is computationally expensive. At the same time, machine learning methods, which have shown tremendous success in several scientific problems, are increasingly being considered promising alternatives to physics-based models in the scientific domains. Thus, our goal in this work is to build a new data-driven methodology to reconstruct the crack fracture accurately in the spatial and temporal fields. We leverage physical constraints to regularize the fracture propagation in the long-term reconstruction. In addition, we introduce perceptual loss and several extra pure machine learning optimization approaches to improve the reconstruction performance of fracture data further. We demonstrate the effectiveness of our proposed method through both extrapolation and interpolation experiments. The results confirm that our proposed method can reconstruct high-fidelity fracture data over space and time in terms of pixel-wise reconstruction error and structural similarity. Visual comparisons also show promising results in long-term

翻訳日:2023-06-16 17:01:31 公開日:2023-06-14

# 説明可能性の説明:2階説明可能性による深層学習への深い行動可能な洞察に向けて

Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability ( http://arxiv.org/abs/2306.08780v1 )

ライセンス: Link先を確認

E. Zhixuan Zeng, Hayden Gunraj, Sheldon Fernandez, Alexander Wong

(参考訳) 説明可能性は、ディープラーニングモデルの振る舞いをより包括的に理解する上で重要な役割を果たす。これにより、モデルの性能を徹底的に検証でき、その決定が関連する視覚的指標に基づいており、訓練データに存在する無関係なパターンに偏っていないことを保証できる。しかし、既存の手法はインスタンスレベルの説明可能性しか提供せず、各サンプルを手動で分析する必要がある。このような手作業によるレビューは時間がかかり、人間のバイアスの影響を受けやすい。この問題に対処するため、説明可能なAI(XAI)をインスタンスレベルからデータセットレベルへ拡張する2次説明可能AI(SOXAI)の概念が最近提案された。SOXAIは、頻出する概念を特定することによって、定量的な説明とデータセットのバイアスとの関係の分析を自動化する。本研究では、ディープニューラルネットワークの振る舞いに対するこの高レベルの解釈を用いて、行動可能な洞察を得るために「説明可能性を説明する」ことを検討する。具体的には、SOXAIから得られる行動可能な洞察に基づいて訓練セットから無関係な概念を除外することでモデルの性能を向上できることを、分類とセグメンテーションの事例を通じて初めて示す。

Explainability plays a crucial role in providing a more comprehensive understanding of deep learning models' behaviour. This allows for thorough validation of the model's performance, ensuring that its decisions are based on relevant visual indicators and not biased toward irrelevant patterns existing in training data. However, existing methods provide only instance-level explainability, which requires manual analysis of each sample. Such manual review is time-consuming and prone to human biases. To address this issue, the concept of second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level. SOXAI automates the analysis of the connections between quantitative explanations and dataset biases by identifying prevalent concepts. In this work, we explore the use of this higher-level interpretation of a deep neural network's behaviour to allow us to "explain the explainability" for actionable insights. Specifically, we demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.

翻訳日:2023-06-16 17:01:13 公開日:2023-06-14

# MMD-FUSE:データ分割のない2サンプルテストのための学習とカーネルの組み合わせ

MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting ( http://arxiv.org/abs/2306.08777v1 )

ライセンス: Link先を確認

Felix Biggs, Antonin Schrab, Arthur Gretton

(参考訳) 最大平均不一致(MMD)に基づく2標本検定の検出力を、MMDの定義に用いるカーネルの集合にわたって適応させることで最大化する新しい統計量を提案する。有限集合の場合、これは各カーネルの下での(正規化された)MMD値を重み付きソフト最大値によって組み合わせることに帰着する。提案する統計量について、帰無仮説と対立仮説の下での指数的な集中不等式を証明する。さらに、これらのカーネルをデータ依存だが置換に依存しない方法で選択でき、データ分割を避けつつ適切に較正された検定が得られることを示す。この手法は、置換に基づく一般のMMD検定により広く適用でき、オートエンコーダのような教師なしモデルで学習した特徴量を持つディープカーネルの使用も含む。合成の低次元データと実世界の高次元データの両方に対するMMD-FUSE検定の適用性を示し、検出力の観点から現在の最先端のカーネル検定と性能を比較する。

We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
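
The combination step described above, a weighted soft maximum of MMD values over a finite set of kernels, can be sketched as follows. This is a simplified illustration and not the authors' statistic: it assumes one-dimensional samples, Gaussian kernels, uniform kernel weights and a hypothetical temperature `lam`; it uses a biased MMD estimate, omits the normalisation mentioned in the abstract, and does not run the permutation test.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float) -> float:
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Biased (V-statistic) estimate of squared MMD for 1-D samples."""
    def mean_kernel(a: Sequence[float], b: Sequence[float]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def soft_maximum(values: Sequence[float], weights: Sequence[float], lam: float) -> float:
    """(1/lam) * log(sum_k w_k * exp(lam * v_k)), computed stably."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(values)  # subtract the maximum before exponentiating
    total = sum(w * math.exp(lam * (v - m)) for v, w in zip(values, weights))
    return m + math.log(total) / lam


def fused_statistic(xs: Sequence[float], ys: Sequence[float],
                    bandwidths: Sequence[float], lam: float = 10.0) -> float:
    """Soft maximum of MMD^2 over a finite set of Gaussian bandwidths."""
    weights: List[float] = [1.0 / len(bandwidths)] * len(bandwidths)
    values = [mmd_squared(xs, ys, b) for b in bandwidths]
    return soft_maximum(values, weights, lam)


sample_a = [0.0, 0.1, 0.2, 0.3]
sample_b = [5.0, 5.1, 5.2, 5.3]
bandwidths = [0.5, 1.0, 2.0]
print(fused_statistic(sample_a, sample_a, bandwidths))  # same sample on both sides
print(fused_statistic(sample_a, sample_b, bandwidths))  # clearly shifted sample
```

With weights that sum to one, the soft maximum lies between the weighted mean and the largest per-kernel value, so no single bandwidth has to be chosen in advance.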

翻訳日:2023-06-16 17:00:54 公開日:2023-06-14

# 時間依存ハミルトニアンのフォン・ノイマン方程式の量子シミュレーション

Quantum simulation of the von Neumann equation of time-dependent Hamiltonians ( http://arxiv.org/abs/2306.08775v1 )

ライセンス: Link先を確認

Alejandro Kunold

(参考訳) 本研究では、時間依存ハミルトニアンに対するフォン・ノイマン方程式に支配される密度行列のダイナミクスをシミュレートする量子アルゴリズムを開発する。この手法は、与えられたリー代数の構造定数の性質を通じた密度行列のベクトル化に依存している。ここではパウリ列(Pauli string)が成す代数を用いたが、アルゴリズムは他の代数にも容易に適用できる。このアプローチの主な利点の1つは、位相キックバックによって容易に決定できる実数の密度行列係数が得られることである。このアルゴリズムは、IBMのノイズあり量子回路シミュレータを用いて実証される。

In this work we develop a quantum algorithm to simulate the dynamics of the density matrix governed by the von Neumann equation for time-dependent Hamiltonians. The method relies on the vectorization of the density matrix through the properties of the structure constants of a given Lie algebra. Even though we have used the algebra formed by the Pauli strings, the algorithm can be easily adapted to other algebras. One of the main advantages of this approach is that it yields real density matrix coefficients that are easy to determine through phase kickback. The algorithm is demonstrated using the IBM noisy quantum circuit simulator.
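
The vectorization idea can be illustrated classically. The sketch below is not the paper's quantum algorithm: it only expands a density matrix in the Pauli-string basis, rho = sum_P c_P P with c_P = Tr(rho P) / 2^n, and shows that the coefficients are real for a Hermitian rho. The helper names (`pauli_string`, `pauli_coefficients`) are hypothetical and the code uses plain nested lists.

```python
from itertools import product
from typing import Dict, List

Matrix = List[List[complex]]

PAULIS: Dict[str, Matrix] = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def pauli_string(label: str) -> Matrix:
    """Matrix of a Pauli string such as 'XZ' (leftmost factor first)."""
    matrix: Matrix = [[1]]
    for letter in label:
        matrix = kron(matrix, PAULIS[letter])
    return matrix


def trace_of_product(a: Matrix, b: Matrix) -> complex:
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def pauli_coefficients(rho: Matrix, n_qubits: int) -> Dict[str, float]:
    """Real coefficients c_P = Tr(rho P) / 2^n for every Pauli string P."""
    dim = 2 ** n_qubits
    coeffs: Dict[str, float] = {}
    for letters in product("IXYZ", repeat=n_qubits):
        label = "".join(letters)
        value = complex(trace_of_product(rho, pauli_string(label))) / dim
        coeffs[label] = value.real  # imaginary part vanishes for Hermitian rho
    return coeffs


# Density matrix of the two-qubit Bell state (|00> + |11>) / sqrt(2).
bell: Matrix = [
    [0.5, 0, 0, 0.5],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0.5, 0, 0, 0.5],
]
nonzero = {p: c for p, c in pauli_coefficients(bell, 2).items() if abs(c) > 1e-12}
print(nonzero)
```

The 4^n real coefficients form the vector that evolves under the von Neumann equation; how the structure constants enter that evolution is specific to the paper and is not reproduced here.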

翻訳日:2023-06-16 17:00:37 公開日:2023-06-14

# Katakomba: データ駆動NetHackのツールとベンチマーク

Katakomba: Tools and Benchmarks for Data-Driven NetHack ( http://arxiv.org/abs/2306.08772v1 )

ライセンス: Link先を確認

Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

(参考訳) NetHackは強化学習研究のフロンティアとして知られており、学習ベースの手法はいまだルールベースの解法に追いつけていない。ブレークスルーに向けた有望な方向の1つは、ロボット工学やレコメンダシステムなどにおける最近の発展と同様に、オフライン強化学習(ORL)の枠組みのもとで事前に収集されたデータセットを利用することである。最近、大規模なNetHackデータセットが公開された。これは必要な一歩であったが、ORLコミュニティではまだ広く採用されていない。本研究では、採用を妨げる大きな障害が、ツール面、実装面、ベンチマーク面の3つにあると論じる。これらに対処するため、ORLコミュニティに馴染みのあるワークフローの基礎、すなわち事前定義されたD4RLスタイルのタスク、簡潔なベースライン実装、そしてクラウドに同期された設定とログを伴う信頼性の高い評価ツールを提供するオープンソースライブラリを開発した。

NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: tool-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.

翻訳日:2023-06-16 17:00:26 公開日:2023-06-14

# 検索型対話システムのための文脈マスク付きオートエンコーダ

ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems ( http://arxiv.org/abs/2306.04357v3 )

ライセンス: Link先を確認

Zhenpeng Su and Xing Wu and Wei Zhou and Guangyuan Ma and Songlin Hu

(参考訳) 対話応答選択は、与えられたユーザとシステムの発話履歴に基づいて、複数の候補から適切な応答を選択することを目的としている。近年の研究は、主に素朴なマスク付き言語モデリング手法に依拠したポストトレーニングによって、対話応答選択の精度を向上させてきた。しかし、最近開発された生成的手法は、IRコミュニティにおいて有望なテキスト表現能力を示しており、より良い対話セマンティクスのモデリングにつながる可能性がある。そこで本稿では、対話応答選択に特化した、単純かつ効果的なポストトレーニング手法であるDial-MAE(Dialogue Contextual Masking Auto-encoder)を提案する。Dial-MAEは非対称なエンコーダ-デコーダアーキテクチャを用い、対話の意味を対話の密ベクトルにより良く圧縮することを学習する。Dial-MAEの処理では、深いエンコーダがマスクされた対話コンテキストから対話埋め込みを生成し、続いて浅いデコーダが、この埋め込みと高い割合でマスクされた応答を用いて元の応答を復元する。実験の結果、Dial-MAEは非常に効果的であり、一般的に評価される2つのベンチマークで最先端の性能を達成した。

Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history. Recent studies have been improving the accuracy of dialogue response selection through post-training, mostly relying on naive masked language modeling methods. However, the recently developed generative methods have shown promising text representation capabilities in IR community, which could potentially lead to better dialogue semantics modeling. Thus, in this paper, we propose Dial-MAE (Dialogue Contextual Masking Auto-encoder), a straightforward yet effective post-training technique tailored for dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture that learns to better compress the semantics of the dialogue into dialogue-dense vectors. The process of Dial-MAE involves a deep encoder creating a dialogue embedding with the masked dialogue context, followed by a shallow decoder that uses this embedding along with the highly masked response to restore the original response. Our experiments have demonstrated that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.

翻訳日:2023-06-16 11:10:06 公開日:2023-06-14

# 大規模言語モデルにおける時間-報酬の知覚に対する言語の影響を探る: GPT-3.5を用いた検討

Exploring the Influence of Language on Time-Reward Perceptions in Large Language Models: A Study Using GPT-3.5 ( http://arxiv.org/abs/2305.02531v3 )

ライセンス: Link先を確認

Ali Goli, Amandeep Singh

(参考訳) 言語は、時間と報酬に対する私たちの知覚に強い影響を与える。これは、大規模言語モデルが、異なる言語で同じ質問をされたときに、時間を通じた報酬に対して異なる選好を示すのか、そしてその選択が人間のものと似ているのかという疑問を提起する。本研究では、複数の言語でのプロンプトに対するGPT-3.5(以下GPTと呼ぶ)の応答を分析し、より小さく早い報酬と、より大きく遅い報酬との間の選好を調べる。結果として、GPTは、英語やフランス語のように未来時制参照(FTR)が強い言語と比べて、ドイツ語やマンダリンのようにFTRが弱い言語でプロンプトされたときに、より高い忍耐強さを示すことが分かった。これらの知見は既存の文献と整合しており、GPTの選択とこれらの言語の話者の選好との間に相関があることを示唆している。しかし、さらなる分析により、早い報酬と遅い報酬のどちらを選好するかは報酬の差によって体系的に変化せず、早い支払いに対する辞書式選好を示していることが明らかになった。GPTは言語間の興味深い違いを捉えている可能性があるが、これらのモデルによる選択は人間の意思決定者の選択とは一致しないことが分かった。

Language has a strong influence on our perceptions of time and rewards. This raises the question of whether large language models, when asked the same question in different languages, show different preferences for rewards over time and if their choices are similar to those of humans. In this study, we analyze the responses of GPT-3.5 (hereafter referred to as GPT) to prompts in multiple languages, exploring preferences between smaller, sooner rewards and larger, later rewards. Our results show that GPT displays greater patience when prompted in languages with weak future tense references (FTR), such as German and Mandarin, compared to languages with strong FTR, like English and French. These findings are consistent with the existing literature and suggest a correlation between GPT's choices and the preferences of speakers of these languages. However, further analysis reveals that the preference for earlier or later rewards does not systematically change with reward gaps, indicating a lexicographic preference for earlier payments. While GPT may capture intriguing variations across languages, our findings indicate that the choices made by these models do not correspond to those of human decision-makers.

翻訳日:2023-06-16 11:08:38 公開日:2023-06-14

# Co-MLを用いた家族による協調型機械学習モデルの構築

Collaborative Machine Learning Model Building with Families Using Co-ML ( http://arxiv.org/abs/2304.05444v3 )

ライセンス: Link先を確認

Tiffany Tseng, Jennifer King Chen, Mona Abdelrahman, Mary Beth Kery, Fred Hohman, Adriana Hilliard, R. Benjamin Shapiro

(参考訳) 既存の初心者向け機械学習(ML)モデリングツールは、1人のユーザが自分自身のデータだけを収集してモデルを構築する、単独のユーザ体験を中心としている。しかし、単独でのモデリング体験は、学習者が一緒に作業するときに生じうる別のアイデアやアプローチに出会う貴重な機会を制限する。その結果、グループで構築したデータセットに異なる視点が現れることで表面化しうる、データの表現や多様性に関するMLの重要な問題に触れる機会も失われがちである。この問題に対処するため、私たちはCo-MLを開発した。これは、学習者がエンドツーエンドの反復的なモデル構築プロセスを通じてML画像分類器を共同で構築するための、タブレットベースのアプリである。本稿では、家庭でのファシリテーション付きのML入門活動でCo-MLを使用した家族(11歳と14歳の2人の子供とその両親)の詳細なケーススタディを提示し、協調的モデリングの実現可能性と潜在的な豊かさを示す。Co-MLのシステム設計を共有し、協調活動でCo-MLを用いることで、初心者が、データの多様性、クラス不均衡、データ品質といった、先行研究で十分に扱われてこなかったデータセット設計上の考慮事項に集団で取り組めるようになった様子を論じる。個人が異なるモデル構築の責任を担える分散型の協調プロセスが、子供と大人がMLデータセット設計を学ぶための豊かな文脈をどのように提供するかについて論じる。

Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, they often preclude encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML -- a tablet-based app for learners to collaboratively build ML image classifiers through an end-to-end, iterative model-building process. In this paper, we illustrate the feasibility and potential richness of collaborative modeling by presenting an in-depth case study of a family (two children, 11 and 14 years old, working with their parents) using Co-ML in a facilitated introductory ML activity at home. We share the Co-ML system design and contribute a discussion of how using Co-ML in a collaborative activity enabled beginners to collectively engage with dataset design considerations underrepresented in prior work such as data diversity, class imbalance, and data quality. We discuss how a distributed collaborative process, in which individuals can take on different model-building responsibilities, provides a rich context for children and adults to learn ML dataset design.

翻訳日:2023-06-16 11:08:15 公開日:2023-06-14

# 非保存拡散過程の非平衡ダイナミクス

Nonequilibrium dynamics of nonconservative diffusion processes ( http://arxiv.org/abs/2302.10154v4 )

ライセンス: Link先を確認

P. Garbaczewski, M. \.Zaba

(参考訳) 非保存的ドリフト場を持つ拡散過程のフォッカー・プランク作用素は次元$N\geq 2$で、非エルミート電磁型ハミルトン運動発生器と直接関連付けられる。確率密度の誘導非平衡力学は、フォッカー・プランク方程式の経路積分解の問題に向けられ、量子プロパゲータの既知の正確な経路積分式を実時間とユークリッド時間に再検討し、これらをフォッカー・プランクが引き起こす遷移確率密度関数に含める。以下では、確率拡散過程のダイナミクスに対する磁気的(または磁気的に見える)影響の形式的かつ概念的に異なる実装に遭遇する、$N=3$の「磁気糸」に従う。

Fokker-Planck operators of diffusion processes with nonconservative drift fields, in dimension $N\geq 2$, can be directly related to non-Hermitian electromagnetic-type Hamiltonian generators of motion. The induced nonequilibrium dynamics of probability densities points towards the issue of path integral solutions of the Fokker-Planck equation, and calls for revisiting links between known exact path integral formulas for quantum propagators in real and Euclidean time and those for Fokker-Planck-induced transition probability density functions. Below we follow the $N=3$ "magnetic thread", within which one encounters formally and conceptually distinct implementations of the magnetic (or magnetic-looking) impact on the dynamics of stochastic diffusion processes. That includes the "magnetic affinity" of nonconservative diffusion processes, the classic Brownian motion of charged particles in the (electro)magnetic field, so-called Euclidean quantum mechanics involving non-Hermitian magnetic-type Hamiltonians, and path integral evaluation of integral kernels of Schr\"{o}dinger semigroups with a minimal electromagnetic coupling (encoded in their Hermitian generators). Our main objective is to go beyond the lore of magnetic analogies/affinities. We aim at detecting deeper interrelations between "magnetically affine" approaches, while clearly discriminating between the classic Lorentz or magnetic forcing in the Brownian motion of charged particles, quantum methods of incorporating electromagnetism, and potentially useful electromagnetic analogies ("surrogate magnetism") in the dynamics of diffusion processes.
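
For orientation, a Fokker-Planck equation with a drift field and a constant diffusion coefficient is commonly written in the form below. The symbols ($\rho$ for the probability density, $b$ for the drift, $D$ for the diffusion coefficient, $\phi$ for a scalar potential) are assumed here for illustration and are not taken from the abstract. "Nonconservative" is read as the drift not being a pure gradient.

```latex
\partial_t \rho = D\,\Delta \rho - \nabla \cdot (b\,\rho),
\qquad
b \neq \nabla \phi \ \text{for any scalar } \phi
\quad (\text{for } N=3 \text{ on a simply connected domain: } \nabla \times b \not\equiv 0).
```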

翻訳日:2023-06-16 11:07:48 公開日:2023-06-14

# NF4は情報理論的に最適ではない(そしてそれは良いことだ)

NF4 Isn't Information Theoretically Optimal (and that's Good) ( http://arxiv.org/abs/2306.06965v2 )

ライセンス: Link先を確認

Davis Yoshida

(参考訳) このノートは、dettmers et al., 2023で使われているabsmaxベースのブロックワイズ量子化に関するいくつかの単純な計算と実験を共有している。提案したNF4データ型は、通常分布する重みを表すのに理論的に最適であると言われている。私は、量子化すべき値の分布がブロックサイズに依存するため、このことはありえないことを示しています。私はこれらの洞察を応用して、Quantileベースの手法ではなく、期待されるL1再構成エラーを最小限に抑え、改善されたコードを導き出そうとします。これにより、より大きな量子化ブロックサイズのパフォーマンスが向上し、どちらのコードもより小さなブロックサイズで同じように動作する。

This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block-size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.
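
The block-size dependence described above can be checked with a small simulation. The following is a minimal sketch using only the Python standard library, not the note's own code: the function names, block sizes, and sample counts are all assumptions chosen for illustration. It draws standard-normal blocks, applies absmax normalization, and reports the mean absolute normalized value for several block sizes.

```python
import random


def absmax_normalize(block):
    """Scale a block so that its largest-magnitude entry becomes +1 or -1."""
    scale = max(abs(x) for x in block)
    return [x / scale for x in block]


def mean_abs_after_normalization(block_size, n_blocks=2000, seed=0):
    """Average of |x| / absmax(block) over many standard-normal blocks.

    Hypothetical helper for illustration; block_size and n_blocks are
    arbitrary choices, not values taken from the note.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_blocks):
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        total += sum(abs(v) for v in absmax_normalize(block))
    return total / (n_blocks * block_size)


if __name__ == "__main__":
    # Larger blocks tend to have a larger absmax, so the normalized
    # values concentrate closer to zero as the block size grows.
    for size in (16, 64, 256):
        print(size, round(mean_abs_after_normalization(size), 3))
```

If the normalized values were distributed identically for every block size, the printed averages would agree up to sampling noise. In this sketch they shrink as the block grows, which is the effect a single fixed code cannot account for.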

翻訳日:2023-06-16 11:04:10 公開日:2023-06-14

# TASRA:AIによる社会規模リスクの分類と分析

TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI ( http://arxiv.org/abs/2306.06924v2 )

ライセンス: Link先を確認

Andrew Critch and Stuart Russell

(参考訳) 近年のいくつかの研究で、人工知能による人類に対する社会規模および絶滅レベルのリスクが特定されているが、そのようなリスクを徹底的に分類する試みは、ほとんどない。多くの抜本的な分類が可能であり、特に新しいリスクや安全性への実践的なアプローチを明らかにする場合に有用である。本稿では,リスクに繋がる行動,アクターは一体化されているか,意図的かという,説明責任に基づく分類について考察する。また、多くのAIシステムの予期せぬ相互作用から生じるリスクや、技術的なソリューションとポリシーの複合が示される故意の誤用によるリスクなど、さまざまなリスクタイプがどのように機能するかを説明する物語も提供します。

While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.

翻訳日:2023-06-16 11:03:25 公開日:2023-06-14

# 視覚トランスフォーマによる胸部x線画像の解析によるcovid-19診断の促進

Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images ( http://arxiv.org/abs/2306.06914v2 )

ライセンス: Link先を確認

Sultan Zavrak

(参考訳) 2019年の新型コロナウイルス(covid-19)の出現は、世界的健康危機を招き、様々な診断方法を通じて個人の病気の特定を必要としている。放射線画像、特にX線画像の展開は、COVID-19の検出とキャラクタリゼーションにおいて重要な手段として認識されている。近年の研究では、X線画像中のウイルスに関する貴重な知見が明らかにされており、人工知能(AI)技術を利用した診断精度の向上を目的とした方法論の探索が進められている。現在の研究は、生の胸部x線画像、特にvit(pre-trained vision transformer)モデルを微調整することで、covid-19の自動診断のための革新的な枠組みを想定している。開発したモデルでは, 2つの分類性能, 通常の症例からcovid-19を識別する, 3つの分類性能, 肺炎および正常例からcovid-19を識別する, および4つの分類性能, 細菌性肺炎, ウイルス性肺炎, および正常な条件を識別し, それぞれ異なるデータセットを用いて評価した。提案したモデルでは,2次分類では99.92%,99.84%,3次分類では97.95%,86.48%,それぞれ4次分類では86.81%という異常な精度が得られた。

The advent of 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases, as well as their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances, and lastly, their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification, respectively, on the respective datasets.

翻訳日:2023-06-16 11:02:45 公開日:2023-06-14

# HEOM.jl:開量子系における運動の階層方程式のための効率的なジュリアフレームワーク

HEOM.jl: An efficient Julia framework for hierarchical equations of motion in open quantum systems ( http://arxiv.org/abs/2306.07522v2 )

ライセンス: Link先を確認

Yi-Te Huang, Po-Chen Kuo, Neill Lambert, Mauro Cirio, Simon Cross, Shen-Liang Yang, Franco Nori, Yueh-Nan Chen

(参考訳) 我々は,複数のボソニック環境とフェルミオン環境を同時に結合したシステムの階層的運動方程式(heom)を統合するためのjuliaフレームワークである「heom.jl」というオープンソースソフトウェアパッケージを紹介する。 Heom.jlは、ボゾンスペクトルとフェルミオンスペクトル、定常状態、および全ての補助密度作用素(ADO)の拡張空間におけるフルダイナミックスを計算する方法の集合である。 ADOのマルチインデックスの必要な処理は、ユーザフレンドリーなインターフェースによって実現される。 2つのフェルミオン貯水池と相互作用する1つの不純物(アンダーソンモデル)と1つのボゾンと2つのフェルミオン貯水池と相互作用する超強結合電荷キャビティ系を解析することにより、パッケージの機能性を実証する。 Heom.jl は HEOM Liouvillian Superoperator の構築において、Python のQuantum Toolbox (QuTiP) の対応するメソッドに関して、すべての ADO に対する動的および定常状態の解決を可能にする。

We introduce an open-source software package called "Heom.jl", a Julia framework to integrate the hierarchical equations of motion (HEOM) for the reduced dynamics of a system simultaneously coupled to multiple bosonic and fermionic environments. Heom.jl features a collection of methods to compute bosonic and fermionic spectra, stationary states, and the full dynamics in the extended space of all auxiliary density operators (ADOs). The required handling of the ADOs multi-indexes is achieved through a user-friendly interface. We exemplify the functionalities of the package by analyzing a single impurity interacting with two fermionic reservoirs (Anderson model), and an ultra-strongly coupled charge-cavity system interacting with one bosonic and two fermionic reservoirs. Heom.jl allows for an order of magnitude speedup in the construction of the HEOM Liouvillian superoperator, solving dynamics and stationary states for all ADOs, with respect to the corresponding method in the Quantum Toolbox in Python (QuTiP), upon which this package is founded.

翻訳日:2023-06-16 10:53:24 公開日:2023-06-14

# グラフェンを添加した単一イオン検出器によるダイヤモンド内ドパント固定

Graphene-Enhanced Single Ion Detectors for Deterministic Near-Surface Dopant Implantation in Diamond ( http://arxiv.org/abs/2306.07496v2 )

ライセンス: Link先を確認

Nicholas F. L. Collins, Alexander M. Jakob, Simon G. Robson, Shao Qi Lim, Paul R\"acke, Brett C. Johnson, Boqing Liu, Yuerui Lu, Daniel Spemann, Jeffrey C. McCallum, David N. Jamieson

(参考訳) ダイヤモンドのカラーセンターアンサンブルは、量子通信のための単一光子源、光学入力と出力による量子計算、ナノスケールへの磁場感知など、多くの応用において集中的に研究されている。これらのアプリケーションのいくつかは、チップ内の単一中心またはランダムに分散したアンサンブルで実現されているが、大規模量子コンピュータの最も要求の高いアプリケーションは、順序付き配列を必要とするだろう。電荷感電素子に接続されたバイアスド表面グラフェン電極により電子グレードダイヤモンド基板を構成することにより、典型的な確率的イオン源から30〜130nmの深さで停止するイオンに対する決定論的単一イオン注入を示すことができる。イオン注入からの電子-ホール対のドリフトによって誘導される電荷パルスにより、注入イベントが信号される。イオン注入部位はAFMナノステンシルまたは集束イオンビームで局在する。これにより、モノリシックデバイスにおける決定論的色中心ネットワーク構築の道を開く、関連する色中心を持つ単一原子の順序づけられた配列を構築することができる。

Colour centre ensembles in diamond have been the subject of intensive investigation for many applications including single photon sources for quantum communication, quantum computation with optical inputs and outputs, and magnetic field sensing down to the nanoscale. Some of these applications are realised with a single centre or randomly distributed ensembles in chips, but the most demanding application for a large-scale quantum computer will require ordered arrays. By configuring an electronic-grade diamond substrate with a biased surface graphene electrode connected to charge-sensitive electronics, it is possible to demonstrate deterministic single ion implantation for ions stopping between 30 and 130 nm deep from a typical stochastic ion source. An implantation event is signalled by a charge pulse induced by the drift of electron-hole pairs from the ion implantation. The ion implantation site is localised with an AFM nanostencil or a focused ion beam. This allows the construction of ordered arrays of single atoms with associated colour centres, which paves the way for the fabrication of deterministic colour centre networks in a monolithic device.

翻訳日:2023-06-16 10:53:01 公開日:2023-06-14

# オンラインレコメンダシステムにおける高品質コンテンツへのインセンティブ

Incentivizing High-Quality Content in Online Recommender Systems ( http://arxiv.org/abs/2306.07479v2 )

ライセンス: Link先を確認

Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, and Jacob Steinhardt

(参考訳) TikTokやYouTubeのようなコンテンツレコメンデーターシステムでは、プラットフォームの決定アルゴリズムがコンテンツ制作者のインセンティブを形成し、コンテンツ制作者がコンテンツの品質にどれだけの努力を払っているかが分かる。多くのプラットフォームがオンライン学習を採用しており、今日のコンテンツは将来のコンテンツの推奨に影響を与えるため、時間的インセンティブを生み出している。本稿では,オンライン学習から生じるインセンティブについて検討し,nash平衡で生成するコンテンツの質を分析した。 hedgeやexp3のような古典的なオンライン学習アルゴリズムは、残念ながら生産者に低品質のコンテンツを制作するインセンティブを与えている。特に、コンテンツの品質は学習率の観点から上界にあり、典型的な学習率スケジュールに対してゼロに近づきます。このネガティブな結果に動機づけられて、私たちは異なる学習アルゴリズム -- 低品質のコンテンツを作るプロデューサーを罰する - をデザインし、プロデューサに高品質なコンテンツを作るインセンティブを正しく与えます。概念レベルでは、我々の研究は、プラットフォームの学習アルゴリズムがコンテンツの品質に与えうる意図しない影響を示し、高品質コンテンツの作成にインセンティブを与えるプラットフォーム学習アルゴリズムの設計への扉を開く。

For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.
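
To make the role of the learning rate concrete, here is a minimal sketch of the standard Hedge (multiplicative weights) update that the abstract names. It does not model the paper's producer game or its punishment-based algorithm. The reward sequence, the function names, and the reading of "reward" as content quality in [0, 1] are assumptions made for illustration only.

```python
import math


def hedge_weights(cumulative_rewards, learning_rate):
    """Hedge: probabilities proportional to exp(learning_rate * cumulative reward)."""
    top = max(cumulative_rewards)  # subtract the max for numerical stability
    exps = [math.exp(learning_rate * (r - top)) for r in cumulative_rewards]
    norm = sum(exps)
    return [e / norm for e in exps]


def run_hedge(reward_rounds, learning_rate):
    """Return the recommendation probabilities used in each round."""
    cumulative = [0.0] * len(reward_rounds[0])
    history = []
    for rewards in reward_rounds:
        # Probabilities for this round depend only on past rewards.
        history.append(hedge_weights(cumulative, learning_rate))
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return history


if __name__ == "__main__":
    # Hypothetical setting: producer 0 always delivers quality 1.0, producer 1 quality 0.0.
    rounds = [[1.0, 0.0]] * 10
    for eta in (0.01, 0.5):
        print(eta, [round(p, 3) for p in run_hedge(rounds, eta)[-1]])
```

With a small learning rate, the high-quality producer gains only a small edge in recommendation probability after ten rounds. This may give some intuition for why the abstract bounds content quality in terms of the learning rate, although the sketch is not a substitute for the equilibrium analysis.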

翻訳日:2023-06-16 10:52:14 公開日:2023-06-14

# 高スピンドナーquditの電場と磁場による16次元ヒルベルト空間の移動

Navigating the 16-dimensional Hilbert space of a high-spin donor qudit with electric and magnetic fields ( http://arxiv.org/abs/2306.07453v2 )

ライセンス: Link先を確認

Irene Fern\'andez de Fuentes, Tim Botzem, Mark A. I. Johnson, Arjen Vaartjes, Serwan Asaad, Vincent Mourik, Fay E. Hudson, Kohei M. Itoh, Brett C. Johnson, Alexander M. Jakob, Jeffrey C. McCallum, David N. Jamieson, Andrew S. Dzurak, Andrea Morello

(参考訳) 効率的なスケーリングと柔軟な制御は、有用な量子コンピューティングハードウェアの重要な側面である。半導体のスピンは、量子情報処理と電子、ホール、核、電気または磁場の制御、交換または双極子相互作用によるスケーラブルな結合を結合する。しかし、大きなヒルベルト空間へのアクセスは、相互作用の短距離性のため、依然として困難である。ここでは16次元ヒルベルト空間をシリコン中の1つのアンチモンドナーの電子核状態によって構築する原子ベースの半導体プラットフォームを提案する。我々は、この大きなヒルベルト空間を電場と磁場の両方を使ってナビゲートでき、ゲート忠実度が99.8%を超えることを実証し、ハミルトニアン系とその制御および雑音場に対する感受性の詳細を明らかにした。これらの結果は、高スピンドナーを実用的な量子情報のための豊かなプラットフォームとして確立し、量子基礎を探求する。

Efficient scaling and flexible control are key aspects of useful quantum computing hardware. Spins in semiconductors combine quantum information processing with electrons, holes or nuclei, control with electric or magnetic fields, and scalable coupling via exchange or dipole interaction. However, accessing large Hilbert space dimensions has remained challenging, due to the short-distance nature of the interactions. Here, we present an atom-based semiconductor platform where a 16-dimensional Hilbert space is built by the combined electron-nuclear states of a single antimony donor in silicon. We demonstrate the ability to navigate this large Hilbert space using both electric and magnetic fields, with gate fidelity exceeding 99.8% on the nuclear spin, and unveil fine details of the system Hamiltonian and its susceptibility to control and noise fields. These results establish high-spin donors as a rich platform for practical quantum information and to explore quantum foundations.

翻訳日:2023-06-16 10:51:50 公開日:2023-06-14

# deeptransition: 学習前四足歩行スキルにおける歩行遷移の出現

DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills ( http://arxiv.org/abs/2306.07419v2 )

ライセンス: Link先を確認

Milad Shafiee, Guillaume Bellegarda, and Auke Ijspeert

(参考訳) 四足動物は移動速度を変えると、歩き方をシームレスに移行します。歩行遷移に関する最も広く受け入れられている説明はエネルギー効率であるが、決定要因や地形特性の潜在的な影響については明確な合意がない。本稿では,転倒の回避という生存可能性が歩行遷移の重要な基準であることを示す。深部強化学習とロボティクスツールを活用して, 上脊髄駆動(脳), 脊髄の中枢パターン生成器, 身体, 外受容感覚の相互作用による歩行遷移の出現について検討した。四足歩行の動物データと一致して,平地における四足歩行ロボットのウォーク-トロット歩行遷移は,生存可能性とエネルギー効率の両方を向上させることを示した。さらに,離散的な地形(すなわち連続した隙間を交差する)が歩行遷移に与える影響を調査し,非生存状態を避けるためにトロット-プロンク遷移の出現を見いだす。最大力やエネルギー効率などの他の潜在的な基準と比較すると、生存可能性は平地と離散的な隙間のある地形の両方での歩行遷移後の唯一の改善要因であり、生存可能性は歩行遷移の第一、普遍的な目的であり、他の基準は二次的な目的であり、かつ/または生存可能性の結果である。さらに、我々は、学習したコントローラをシミュレート・トゥ・リアルなハードウェア実験で展開し、挑戦的なシナリオで最先端の四足歩行ロボットの俊敏性を示す。

Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e. the avoidance of falls, represents an important criterion for gait transitions. We investigate the emergence of gait transitions through the interaction between supraspinal drive (brain), the central pattern generator in the spinal cord, the body, and exteroceptive sensing by leveraging deep reinforcement learning and robotics tools. Consistent with quadruped animal data, we show that the walk-trot gait transition for quadruped robots on flat terrain improves both viability and energy efficiency. Furthermore, we investigate the effects of discrete terrain (i.e. crossing successive gaps) on imposing gait transitions, and find the emergence of trot-pronk transitions to avoid non-viable states. Compared with other potential criteria such as peak forces and energy efficiency, viability is the only improved factor after gait transitions on both flat and discrete gap terrains, suggesting that viability could be a primary and universal objective of gait transitions, while other criteria are secondary objectives and/or a consequence of viability. Moreover, we deploy our learned controller in sim-to-real hardware experiments and demonstrate state-of-the-art quadruped agility in challenging scenarios, where the Unitree A1 quadruped autonomously transitions gaits between trot and pronk to cross consecutive gaps of up to 30 cm (83.3 % of the body-length) at over 1.3 m/s.

翻訳日:2023-06-16 10:51:34 公開日:2023-06-14

# 最適化に触発されたディープニューラルネットワークを用いた自己教師付きハイパースペクトルインパインティング

Self-Supervised Hyperspectral Inpainting with the Optimisation inspired Deep Neural Network Prior ( http://arxiv.org/abs/2306.07308v2 )

ライセンス: Link先を確認

Shuo Li and Mehrdad Yaghoobi

(参考訳) ハイパースペクトル画像(HSI)は、数百から数千の狭いスペクトル帯域をカバーし、多くの空間およびスペクトル情報を伝達する。しかし、インストゥルメンタルエラーや大気の変化により、実際に得られたHSIはしばしばノイズやデッドピクセル(ライン)によって汚染され、結果として、その後の応用を著しく損なう可能性のある情報が欠落する。本稿では,新しいHSI欠落画素予測アルゴリズム,Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP)を紹介する。 LRS-PnPは、画像の全てのスペクトル帯域が欠落している場合でも、欠落した画素や帯域を予測することができる。 LRS-PnPアルゴリズムは、LRS-PnPとDeep Image Prior (DIP)を組み合わせた自己教師型モデル(LRS-PnP-DIP)にさらに拡張される。実データを用いた一連の実験において、LRS-PnP-DIPは、他の学習ベース手法と比較して最先端のインペインティング性能を達成するか、性能を上回ることを示した。

Hyperspectral images (HSIs) cover hundreds or thousands of narrow spectral bands, conveying a wealth of spatial and spectral information. However, due to instrumental errors and atmospheric changes, the HSIs obtained in practice are often contaminated by noise and dead pixels (lines), resulting in missing information that may severely compromise subsequent applications. We introduce here a novel HSI missing pixel prediction algorithm, called Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). It is shown that LRS-PnP is able to predict missing pixels and bands even when all spectral bands of the image are missing. The proposed LRS-PnP algorithm is further extended to a self-supervised model, called LRS-PnP-DIP, by combining LRS-PnP with the Deep Image Prior (DIP). In a series of experiments with real data, it is shown that LRS-PnP-DIP either achieves state-of-the-art inpainting performance compared to other learning-based methods, or outperforms them.

翻訳日:2023-06-16 10:50:17 公開日:2023-06-14

# 第二応答理論:量子重ね合わせの伝播に関する理論的形式論

Second Response Theory: A Theoretical Formalism for the Propagation of Quantum Superpositions ( http://arxiv.org/abs/2306.07924v2 )

ライセンス: Link先を確認

Mart\'in A. Mosquera

(参考訳) 一般電子量子状態の伝播は、分子系と外部駆動場との相互作用に関する情報を提供する。これらは非断熱量子現象に関する理解を与えることもできる。確立された手法は主に、当初は基底状態波動関数によってのみ記述された量子系を伝播することに焦点を当てている。本研究では,前述した2次応答理論と呼ばれる結合クラスター理論の形式性を拡張することにより,まずは基底状態を含む異なる状態の一般線形結合によって記述された量子系を伝播させ,そのような伝播を時間依存クラスター作用素の特殊集合でどのように行うかを示す。我々の理論は、量子力学的観測値、確率、コヒーレンスを決定するために、数値的に正確な結果と強い整合性を示す。本稿では, 2次応答理論における非定常状態と, 線形および二次応答理論における行列要素の予測能力について論じる。本研究はまた、基底状態のクラスター振幅の潜在的な不安定性を持つシステムを扱う近似正規化手法についても論じ、標準ユニタリ理論の参照結果について、その近似を比較する。

The propagation of general electronic quantum states provides information about the interaction of molecular systems with external driving fields. It can also offer insight into non-adiabatic quantum phenomena. Well-established methods focus mainly on propagating a quantum system that is initially described exclusively by the ground state wavefunction. In this work, we expand a previously developed formalism within coupled cluster theory, called second response theory, so that it propagates quantum systems that are initially described by a general linear combination of different states, which can include the ground state, and show how such propagations are performed with a special set of time-dependent cluster operators. Our theory shows strong consistency with numerically exact results for the determination of quantum mechanical observables, probabilities, and coherences. We discuss unperturbed non-stationary states within second response theory and their ability to predict matrix elements that agree with those found in linear and quadratic response theories. This work also discusses an approximate regularized methodology to treat systems with potential instabilities in their ground-state cluster amplitudes, and compares such approximations with reference results from standard unitary theory.

翻訳日:2023-06-16 10:44:20 公開日:2023-06-14

# iSLAM: インペラティブSLAM

iSLAM: Imperative SLAM ( http://arxiv.org/abs/2306.07894v2 )

ライセンス: Link先を確認

Taimeng Fu, Shaoshu Su, Chen Wang

(参考訳) 同時ローカライゼーションとマッピング(SLAM)は、ロボットナビゲーションにおける重要な課題の1つである。近年の進歩は, 教師あり学習に基づく手法が, 従来の最適化手法が評価ドリフトの最小化に重要な役割を担っていることを示唆している。本稿では,このような疎結合なパラダイムが準最適性能にのみ寄与し,結果としてシステム能力と一般化ポテンシャルを削減できることを見出した。この問題を解決するために,我々は,フロントエンドとバックエンドの相互修正を促進し,外部の監督を必要とせずに性能を向上させるための,新しい自己教師付き学習フレームワークimperative slam(islam)を提案した。具体的には,二元最適化問題としてslamシステムを定式化し,両成分を双方向に連結する。その結果、フロントエンドモデルは、バックエンドから残差をバックプロパゲーションすることで、ポーズグラフ最適化によって得られるグローバル幾何学的知識を学習することができる。これにより、システム全体の一般化能力が大幅に向上し、精度が45%まで向上する。我々の知る限り、iSLAMは、フロントエンドとバックエンドが相互に相互に相互に自己管理的な方法で学習できることを示す最初のSLAMシステムです。

Simultaneous localization and mapping (SLAM) stands as one of the critical challenges in robot navigation. Recent advancements suggest that methods based on supervised learning deliver impressive performance in front-end odometry, while traditional optimization-based methods still play a vital role in the back-end for minimizing estimation drift. In this paper, we find that such a decoupled paradigm can lead to only sub-optimal performance, consequently curtailing system capabilities and generalization potential. To solve this problem, we propose a novel self-supervised learning framework, imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate a SLAM system as a bi-level optimization problem so that the two components are bidirectionally connected. As a result, the front-end model is able to learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end. This significantly improves the generalization ability of the entire system and thus achieves an accuracy improvement of up to 45%. To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end can learn jointly and mutually contribute to each other in a self-supervised manner.
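
A generic bi-level problem of the kind the abstract describes can be written as follows. This is a schematic sketch under assumed notation, not the paper's formulation: $\theta$ stands for the front-end (odometry network) parameters, $\mu$ for the back-end variables such as poses, $x$ for the sensor input, $L$ for the lower-level pose graph objective, and $U$ for the upper-level training loss.

```latex
\min_{\theta}\; U\bigl(f_\theta(x),\, \mu^{*}(\theta)\bigr)
\quad \text{s.t.} \quad
\mu^{*}(\theta) = \arg\min_{\mu}\; L\bigl(f_\theta(x),\, \mu\bigr)
```

In this reading, back-propagating the back-end residuals corresponds to differentiating $U$ with respect to $\theta$ through the lower-level solution $\mu^{*}(\theta)$.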

翻訳日:2023-06-16 10:44:01 公開日:2023-06-14

# Generalized $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ commensurate anisotropic Hamiltonian and ladder operators; energy spectrum, eigenstates and associated coherent and squeezed states

Generalized $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ commensurate anisotropic Hamiltonian and ladder operators; energy spectrum, eigenstates and associated coherent and squeezed states ( http://arxiv.org/abs/2306.07889v2 )

ライセンス: Link先を確認

Nibaldo-Edmundo Alvarez-Moraga

(参考訳) 本稿では、複素数 $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ Lie algebra の要素であるハミルトニアンが、この代数の要素であるはしご作用素を認める条件について研究した。このように構成された下降作用素の代数固有状態を計算し、それらからこのハミルトニアンのエネルギースペクトルとエネルギー固有状態の両方を、対応する昇降作用素の助けを借りて通常に生成する。したがって、一般化ハミルトニアン系のいくつかの族が発見され、適切な類似性変換の下では、1:1, 2:1, 1:2, $su(2)$ および他の非共役および可換な異方性2次元量子振動子系を見つける基本的な系の集合に還元される。ハミルトニアンの正規化固有状態とその関連する下降作用素に対する明示的な表現が与えられ、二モード分離可能および非分離一般化コヒーレントおよびスクイーズ状態の古典構造を示す。最後に、上記のすべての結果に基づいて、$p:q$ coprime commensurate 異方性量子振動子のための新しいラダー演算子の提案が行われ、chen $su(2)$コヒーレント状態のクラスへと導かれる。

In this article a study was made of the conditions under which a Hamiltonian which is an element of the complex $ \left\{ h (1) \oplus h(1) \right\} \uplus u(2) $ Lie algebra admits ladder operators which are also elements of this algebra. The algebra eigenstates of the lowering operator constructed in this way are computed and from them both the energy spectrum and the energy eigenstates of this Hamiltonian are generated in the usual way with the help of the corresponding raising operator. Thus, several families of generalized Hamiltonian systems are found, which, under a suitable similarity transformation, reduce to a basic set of systems, among which we find the 1:1, 2:1, 1:2, $su(2)$ and some other non-commensurate and commensurate anisotropic 2D quantum oscillator systems. Explicit expressions for the normalized eigenstates of the Hamiltonian and its associated lowering operator are given, which show the classical structure of two-mode separable and non-separable generalized coherent and squeezed states. Finally, based on all the above results, a proposal for new ladder operators for the $p:q$ coprime commensurate anisotropic quantum oscillator is made, which leads us to a class of Chen $SU(2)$ coherent states.

翻訳日:2023-06-16 10:43:38 公開日:2023-06-14

# 信頼できない純入力状態を持つユニタリ量子プロセストモグラフィ

Unitary quantum process tomography with unreliable pure input states ( http://arxiv.org/abs/2306.07867v2 )

ライセンス: Link先を確認

Fran\c{c}ois Verdeil and Yannick Deville

(参考訳) 量子プロセストモグラフィ(QPT)法は、与えられた量子プロセスを特定することを目的としている。本稿では,一元的プロセスの推定に焦点をあてる。なぜなら、量子力学は任意の閉量子系の進化はユニタリ変換によって記述されると仮定しているからである。 QPTの標準的なアプローチは、特定されるプロセスによって修正された後、所定の(一般に純粋な)状態の特定のセットのコピーを測定することである。この設定の主な問題は、入力状態を作成して所定の値に正確に設定することが困難であり、エラーが発生することである。これらのエラーは、中心となるエラー(すなわち、すべてのコピーの平均がゼロである)と、すべてのコピーで同じである系統的エラーの合計に分解することができる。本稿で紹介するアルゴリズムは,QPTを理論的に可能な任意の入力状態に対して有効である。入力状態が所定の値に正確に設定される必要がないという事実は、いくつかの状態が未知であるが、特定されるプロセスを通過する前に測定されることを考慮して、体系的なエラーの問題を除去するためにトリックを使用することができることを意味する。我々は、各入力状態のコピーを複数のグループに分割し、識別するプロセスの$k$インスタンスを通して連続的に転送された後、$k$-th groupのコピーを測定する(各入力状態のコピーは一度だけ測定される)。このトリックを使うことで、初期状態に関する知識を使わずに、プロセスの前後で測定された状態の推定を計算することができる。シミュレーションデータと実験データの両方でアルゴリズムをテストし、閉じ込められたイオン量子コンピュータ上のCNOTゲートを同定する。

Quantum process tomography (QPT) methods aim at identifying a given quantum process. The present paper focuses on the estimation of a unitary process. This class is of particular interest because quantum mechanics postulates that the evolution of any closed quantum system is described by a unitary transformation. The standard approach to QPT is to measure copies of a particular set of predetermined (generally pure) states after they have been modified by the process to be identified. The main problem with this setup is that preparing an input state and setting it precisely to a predetermined value is challenging and thus yields errors. These errors can be decomposed into a sum of centred errors (i.e. whose average over all the copies is zero) and systematic errors that are the same on all the copies; the latter are often the main source of error in QPT. The algorithm we introduce in the current paper works for any input states that make QPT theoretically possible. The fact that we do not require the input states to be precisely set to predetermined values means that we can use a trick to remove the issue of systematic errors by considering that some states are unknown but measured before they go through the process to be identified. We achieve this by splitting the copies of each input state into several groups and measuring the copies of the $k$-th group after they have successively been transferred through $k$ instances of the process to be identified (each copy of each input state is only measured once). Using this trick we can compute estimates of the measured states before and after they go through the process without using the knowledge we might have on the initial states. We test our algorithm both on simulated data and experimentally to identify a CNOT gate on a trapped-ion quantum computer.

翻訳日:2023-06-16 10:42:47 公開日:2023-06-14

# BeliefPPG:Belief PropagationによるPPG信号からの不確かさを意識した心拍数推定

BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation ( http://arxiv.org/abs/2306.07730v2 )

ライセンス: Link先を確認

Valentin Bieri, Paul Streli, Berken Utku Demirel and Christian Holz

(参考訳) 本稿では,photoplethysmography signal (ppg) から抽出した心拍数推定ベンチマークを用いて,最先端のパフォーマンスを実現する新しい学習ベース手法を提案する。我々は,隠れマルコフモデルとして表現される離散時間確率過程の文脈における心拍数の進化を考える。訓練されたニューラルネットワークを介して、所定のppg信号ウィンドウの心拍数値の分布を導出する。信念伝播を用いて,心拍変動の統計的分布を取り入れ,これらの推定値を時間的文脈で洗練する。そこで,本研究では,予測の不確かさを有意義かつ適切に推定した心拍数値の範囲を定量化した確率分布を求める。提案手法は8つの公開データセット上で3つの異なる相互評価実験によりロバスト性を示す。

We present a novel learning-based method that achieves state-of-the-art performance on several benchmarks for heart rate estimation from photoplethysmography (PPG) signals. We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical distribution of heart rate changes to refine these estimates in a temporal context. From this, we obtain a quantized probability distribution over the range of possible heart rate values that captures a meaningful and well-calibrated estimate of the inherent predictive uncertainty. We show the robustness of our method on eight public datasets with three different cross-validation experiments.
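
The temporal refinement step can be illustrated with textbook sum-product belief propagation on a chain, i.e. forward-backward smoothing in a hidden Markov model. The sketch below is generic and is not the authors' implementation. The three heart-rate bins, the transition matrix, and the per-window likelihoods (standing in for a network's output) are made-up values for illustration.

```python
def normalize(values):
    """Rescale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def forward_backward(emissions, transition, prior):
    """Posterior marginals over hidden states for each time step.

    emissions[t][i]  : likelihood of the window at time t given bin i
    transition[i][j] : probability of moving from bin i to bin j
    prior[i]         : probability of bin i at the first time step
    """
    steps, n = len(emissions), len(prior)

    # Forward messages: filtered beliefs given observations up to time t.
    alpha = [normalize([prior[i] * emissions[0][i] for i in range(n)])]
    for t in range(1, steps):
        predicted = [sum(alpha[-1][i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        alpha.append(normalize([predicted[j] * emissions[t][j] for j in range(n)]))

    # Backward messages: evidence from observations after time t.
    beta = [[1.0] * n for _ in range(steps)]
    for t in range(steps - 2, -1, -1):
        beta[t] = normalize([
            sum(transition[i][j] * emissions[t + 1][j] * beta[t + 1][j]
                for j in range(n))
            for i in range(n)
        ])

    return [normalize([a * b for a, b in zip(alpha[t], beta[t])])
            for t in range(steps)]


if __name__ == "__main__":
    # Hypothetical example: three heart-rate bins, slow changes are likely.
    transition = [[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]
    prior = [1 / 3, 1 / 3, 1 / 3]
    # The middle window is noisy and, taken alone, favours the highest bin.
    emissions = [[0.8, 0.15, 0.05], [0.3, 0.2, 0.5], [0.8, 0.15, 0.05]]
    for marginal in forward_backward(emissions, transition, prior):
        print([round(p, 3) for p in marginal])
```

In this toy example the noisy middle window is pulled back towards the lowest bin once its neighbours are taken into account, and every time step still comes with a full distribution rather than a point estimate.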

Translation date: 2023-06-16 10:42:01 Publication date: 2023-06-14

# Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats

Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats ( http://arxiv.org/abs/2306.07685v2 )

License: see the linked page

Gaolei Li, Yuanyuan Zhao, Wenqi Wei, Yuchen Liu

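(Reference translation) Advanced persistent threats (APTs) have novel features such as multi-stage penetration, highly tailored intentions, and evasive tactics. Defending against APTs requires fusing multi-dimensional cyber threat intelligence data to identify attack intentions, and running efficient knowledge discovery strategies based on data-driven machine learning to recognize entity relationships. However, data-driven machine learning generalizes poorly to new or unknown samples, which reduces the accuracy and practicality of the defense model. In addition, privately deploying these APT defense models in heterogeneous environments and on various network devices requires significant investment in context awareness (such as known attack entities, continuous network states, and current security strategies). In this paper, we propose the FMKR scheme for context-aware defense against APTs. By completing multiple small tasks generated from different network domains through meta-learning, FMKR first trains a model with good discrimination and generalization ability for new and unknown APT attacks. In each FMKR task, both threat intelligence and local entities are fused into the support/query sets of meta-learning to identify attack stages. Second, to rearm current security strategies, we propose a fine-tuning-based deployment mechanism that transfers the learned knowledge to a student model while minimizing the defense cost. Compared with multiple model replacement strategies, FMKR responds faster to attack behaviors while reducing scheduling cost. Based on two months of feedback from Industrial Internet of Things (IIoT) users, we demonstrate that the proposed scheme improves the defense satisfaction rate.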

Advanced persistent threats (APTs) have novel features such as multi-stage penetration, highly tailored intentions, and evasive tactics. Defending against APTs requires fusing multi-dimensional cyber threat intelligence data to identify attack intentions and conducting efficient knowledge discovery with data-driven machine learning to recognize entity relationships. However, data-driven machine learning lacks generalization ability on fresh or unknown samples, which reduces the accuracy and practicality of the defense model. In addition, the private deployment of these APT defense models in heterogeneous environments and on various network devices requires significant investment in context awareness (such as known attack entities, continuous network states, and current security strategies). In this paper, we propose a few-shot multi-domain knowledge rearming (FMKR) scheme for context-aware defense against APTs. By completing multiple small tasks that are generated from different network domains with meta-learning, FMKR first trains a model with good discrimination and generalization ability for fresh and unknown APT attacks. In each FMKR task, both threat intelligence and local entities are fused into the support/query sets in meta-learning to identify possible attack stages. Second, to rearm current security strategies, a fine-tuning-based deployment mechanism is proposed to transfer the learned knowledge into the student model while minimizing the defense cost. Compared to multiple model replacement strategies, FMKR provides a faster response to attack behaviors while incurring less scheduling cost. Based on feedback from multiple real users of the Industrial Internet of Things (IIoT) over two months, we demonstrate that the proposed scheme can improve the defense satisfaction rate.

Translation date: 2023-06-16 10:41:49 Publication date: 2023-06-14

# UOD: Universal One-shot Detection of Anatomical Landmarks

UOD: Universal One-shot Detection of Anatomical Landmarks ( http://arxiv.org/abs/2306.07615v2 )

License: see the linked page

Heqin Zhu, Quan Quan, Qingsong Yao, Zaiyi Liu, S. Kevin Zhou

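(Reference translation) One-shot medical landmark detection has attracted much attention and achieved great success thanks to its label-efficient training process. However, existing one-shot learning methods are highly specialized to a single domain and suffer heavily from domain preference when faced with multi-domain unlabeled data. Moreover, one-shot learning is not robust: its performance drops when a sub-optimal image is annotated. To address these issues, we develop a domain-adaptive one-shot landmark detection framework for multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models, each designed as a combination of domain-specific and domain-shared modules. In the first stage, a domain-adaptive convolution model is trained to generate pseudo landmark labels. In the second stage, a domain-adaptive transformer is designed to eliminate domain preference and build the global context of multi-domain data. Although only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples and detect more robust and accurate landmarks. We evaluated UOD qualitatively and quantitatively on three widely used public X-ray datasets from different anatomical domains (head, hand, and chest) and obtained state-of-the-art results in each domain.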

One-shot medical landmark detection has gained much attention and achieved great success owing to its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer heavily from domain preference when applied to multi-domain unlabeled data. Moreover, one-shot learning is not robust: performance drops when a sub-optimal image is chosen for annotation. To tackle these issues, we develop a domain-adaptive one-shot landmark detection framework for handling multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models, which are designed as combinations of domain-specific modules and domain-shared modules. In the first stage, a domain-adaptive convolution model is trained with self-supervision to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build the global context for multi-domain data. Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect more robust and accurate landmarks. We evaluated the proposed UOD both qualitatively and quantitatively on three widely used public X-ray datasets in different anatomical domains (i.e., head, hand, chest) and obtained state-of-the-art performance in each domain.

Translation date: 2023-06-16 10:40:56 Publication date: 2023-06-14

PDF registration status (publication date: 20230614)