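Fugu-MT: arXiv paper translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Fugu-Machine Translator is used for translation.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no liability for any consequences arising from the use of this site (including all information and translations).

Papers with a publication date of 2023-07-20.

Title	Authors	Abstract	Publication date / translation date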
# Data Analytics with Differential Privacy ( http://arxiv.org/abs/2311.16104v1 ) License: see link	Vassilis Digalakis Jr	Differential privacy is the state-of-the-art definition for privacy, guaranteeing that any analysis performed on a sensitive dataset leaks no information about the individuals whose data are contained therein. In this thesis, we develop differentially private algorithms to analyze distributed and streaming data. In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data, that can subsequently be used for arbitrary analyses. We build upon PrivBayes, a differentially private method that approximates the high-dimensional distribution of a centralized dataset as a product of low-order distributions, utilizing a Bayesian Network model. We examine three novel approaches to learning a global Bayesian Network from distributed data, while offering the differential privacy guarantee to all local datasets. Our work includes a detailed theoretical analysis of the distributed, differentially private entropy estimator which we use in one of our algorithms, as well as a detailed experimental evaluation, using both synthetic and real-world data. In the streaming model, we focus on the problem of estimating the density of a stream of users, which expresses the fraction of all users that actually appear in the stream. We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy, which ensures that the privacy of any user is protected, even against an adversary that observes the internal state of the algorithm. We provide a detailed analysis of an existing, sampling-based algorithm for the problem and propose two novel modifications that significantly improve it, both theoretically and experimentally, by optimally using all the allocated "privacy budget."	Translated: 2024-01-15 15:21:55 Published: 2023-07-20
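To make the "privacy budget" idea in the entry above concrete, here is a minimal Python sketch of the textbook Laplace mechanism applied to a stream-density estimate. It is not the thesis's pan-private algorithm: the function names, the `universe_size` parameter, and the choice of the Laplace mechanism are assumptions made for illustration, and the sketch keeps the set of seen users in the clear, so it offers no protection against an adversary who observes internal state.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_density(stream, universe_size, epsilon, rng=None):
    """Noisy estimate of the fraction of users that appear in `stream`.

    Assumption for this sketch: user ids come from a known universe of
    `universe_size` users. Adding or removing every occurrence of one
    user changes the distinct count by at most 1, so the density has
    sensitivity 1 / universe_size.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_density = len(set(stream)) / universe_size  # NOT pan-private state
    sensitivity = 1.0 / universe_size
    noisy = true_density + laplace_noise(sensitivity / epsilon, rng)
    # Clamping is post-processing and does not weaken the guarantee.
    return min(1.0, max(0.0, noisy))
```

A smaller `epsilon` spends less of the budget and adds more noise; a larger `epsilon` gives a more accurate but less private answer.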
# Do RESTful API Design Rules Have an Impact on the Understandability of Web APIs? A Web-Based Experiment with API Descriptions ( http://arxiv.org/abs/2305.07346v3 ) License: see link	Justus Bogner, Sebastian Kotstein, Timo Pfaff	Context: Web APIs are one of the most used ways to expose application functionality on the Web, and their understandability is important for efficiently using the provided resources. While many API design rules exist, empirical evidence for the effectiveness of most rules is lacking. Objective: We therefore wanted to study 1) the impact of RESTful API design rules on understandability, 2) if rule violations are also perceived as more difficult to understand, and 3) if demographic attributes like REST-related experience have an influence on this. Method: We conducted a controlled Web-based experiment with 105 participants, from both industry and academia and with different levels of experience. Based on a hybrid between a crossover and a between-subjects design, we studied 12 design rules using API snippets in two complementary versions: one that adhered to a "rule" and one that was a "violation" of this rule. Participants answered comprehension questions and rated the perceived difficulty. Results: For 11 of the 12 rules, we found that "violation" performed significantly worse than "rule" for the comprehension tasks. Regarding the subjective ratings, we found significant differences for 9 of the 12 rules, meaning that most violations were subjectively rated as more difficult to understand. Demographics played no role in the comprehension performance for "violation". Conclusions: Our results provide first empirical evidence for the importance of following design rules to improve the understandability of Web APIs, which is important for researchers, practitioners, and educators.	Translated: 2023-10-24 08:55:07 Published: 2023-07-20
# Comparing EventB, $\{log\}$ and Why3 Models of Sparse Sets ( http://arxiv.org/abs/2307.03974v2 ) License: see link	Maximiliano Cristiá and Catherine Dubois	Many representations for sets are available in programming language libraries. The paper focuses on sparse sets used, e.g., in some constraint solvers for representing integer variable domains, which are finite sets of values, as an alternative to range sequences. We propose in this paper verified implementations of sparse sets in three deductive formal verification tools, namely EventB, $\{log\}$ and Why3. Furthermore, we draw some comparisons regarding specifications and proofs.	Translated: 2023-10-23 18:05:38 Published: 2023-07-20
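For readers unfamiliar with sparse sets, the following Python sketch shows the array-based layout commonly used for integer domains in constraint solvers. It is only an illustration under stated assumptions: the class and attribute names (`SparseSet`, `dom`, `pos`, `size`) are hypothetical, and the verified EventB, $\{log\}$ and Why3 models from the paper are not reproduced here.

```python
class SparseSet:
    """Subset of {0, ..., n-1} with O(1) membership test and removal.

    Invariant: the first `size` cells of `dom` hold the current members,
    and pos[v] is the index of value v inside `dom`.
    """

    def __init__(self, n):
        self.dom = list(range(n))  # values, members first
        self.pos = list(range(n))  # pos[v] = index of v in dom
        self.size = n              # number of current members

    def __contains__(self, v):
        return 0 <= v < len(self.dom) and self.pos[v] < self.size

    def __len__(self):
        return self.size

    def values(self):
        return self.dom[:self.size]

    def remove(self, v):
        """Remove v by swapping it with the last member; return True if removed."""
        if v not in self:
            return False
        i, last = self.pos[v], self.size - 1
        w = self.dom[last]
        self.dom[i], self.dom[last] = w, v
        self.pos[w], self.pos[v] = i, last
        self.size -= 1
        return True

    def restore(self, size):
        """Backtrack to an earlier, larger size.

        Removed values sit just past the boundary in removal order, so
        resetting `size` brings back the most recently removed ones.
        """
        if not self.size <= size <= len(self.dom):
            raise ValueError("size must be between current size and capacity")
        self.size = size
```

The appeal for solvers is that undoing removals on backtracking costs a single assignment to `size`, which is the kind of property a formal model would need to state and prove.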
# Software Product Line Engineering via Software Transplantation ( http://arxiv.org/abs/2307.10896v1 ) License: see link	Leandro O. Souza, Earl T. Barr, Justyna Petke, Eduardo S. Almeida and Paulo Anselmo M. S. Neto	For companies producing related products, a Software Product Line (SPL) is a software reuse method that improves time-to-market and software quality, achieving substantial cost reductions. These benefits do not come for free. It often takes years to re-architect and re-engineer a codebase to support SPL and, once adopted, it must be maintained. Current SPL practice relies on a collection of tools, tailored for different reengineering phases, whose output developers must coordinate and integrate. We present Foundry, a general automated approach for leveraging software transplantation to speed conversion to and maintenance of SPL. Foundry facilitates feature extraction and migration. It can efficiently and repeatedly transplant a sequence of features implemented in multiple files. We used Foundry to create two valid product lines that integrate features from three real-world systems in an automated way. Moreover, we conducted an experiment comparing Foundry's feature migration with manual effort. We show that Foundry automatically migrated features across codebases 4.8 times faster, on average, than the average time a group of SPL experts took to accomplish the task.	Translated: 2023-10-23 17:05:07 Published: 2023-07-20
# Addressing Compiler Errors: Stack Overflow or Large Language Models? ( http://arxiv.org/abs/2307.10793v1 ) License: see link	Patricia Widjojo and Christoph Treude	Compiler error messages serve as an initial resource for programmers dealing with compilation errors. However, previous studies indicate that they often lack sufficient targeted information to resolve code issues. Consequently, programmers typically rely on their own research to fix errors. Historically, Stack Overflow has been the primary resource for such information, but recent advances in large language models offer alternatives. This study systematically examines 100 compiler error messages from three sources to determine the most effective approach for programmers encountering compiler errors. Factors considered include Stack Overflow search methods and the impact of model version and prompt phrasing when using large language models. The results reveal that GPT-4 outperforms Stack Overflow in explaining compiler error messages, the effectiveness of adding code snippets to Stack Overflow searches depends on the search method, and results for Stack Overflow differ significantly between Google and StackExchange API searches. Furthermore, GPT-4 surpasses GPT-3.5, with "How to fix" prompts yielding superior outcomes to "What does this error mean" prompts. These results offer valuable guidance for programmers seeking assistance with compiler error messages, underscoring the transformative potential of advanced large language models like GPT-4 in debugging and opening new avenues of exploration for researchers in AI-assisted programming.	Translated: 2023-10-23 17:04:22 Published: 2023-07-20
# SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation ( http://arxiv.org/abs/2307.11181v1 ) License: see link	Zeinab Nezami, Evangelos Pournaras, Amir Borzouie, Jie Xu	Smart mobility is becoming paramount for meeting net-zero targets. However, autonomous, self-driving and electric vehicles require, more than ever before, an efficient, resilient and trustworthy computational offloading backbone that expands throughout the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides for the first time a modular end-to-end instrumentation for prototyping and optimizing placement of intelligence services on edge devices such as augmented reality and real-time traffic monitoring. SMOTEC supports a plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ and EPOS for an AI-based decentralized load balancing across edge-to-cloud. All components are orchestrated by the K3s lightweight Kubernetes. A proof-of-concept of self-optimized service placements for traffic monitoring from Munich demonstrates in practice the applicability and cost-effectiveness of SMOTEC.	Translated: 2023-10-23 16:51:27 Published: 2023-07-20
# Visual Flow-based Programming Plugin for Brain Computer Interface in Computer-Aided Design ( http://arxiv.org/abs/2307.11023v1 ) License: see link	Tong Bill Xu and Saleh Kalantari	Over the last half century, the main application of Brain Computer Interfaces (BCIs) has been controlling wheelchairs and neural prostheses or generating text or commands for people with restricted mobility. There has been very limited attention in the field to applications for computer-aided design, despite the potential of BCIs to provide a new form of environmental interaction. In this paper we introduce the development and application of Neuron, a novel BCI tool that enables designers with little experience in neuroscience or computer programming to gain access to neurological data, along with established metrics relevant to design, create BCI interaction prototypes, both with digital onscreen objects and physical devices, and evaluate designs based on neurological information and record measurements for further analysis. After discussing the BCI tool development, the article presents its capabilities through two case studies, along with a brief evaluation of the tool performance and a discussion of implications, limitations, and future improvement.	Translated: 2023-10-23 16:51:03 Published: 2023-07-20
# Empirical Evaluation of a Live Environment for Extract Method Refactoring ( http://arxiv.org/abs/2307.11010v1 ) License: see link	Sara Fernandes, Ademar Aguiar, André Restivo	Complex software can be hard to read, adapt, and maintain. Refactoring it can create cleaner and self-explanatory code. Refactoring tools try to guide developers towards better, higher-quality code. However, most of them take too long to provide feedback, support, and guidance on how developers should improve their software. To reduce this problem, we explored the concept of Live Refactoring, focusing on visually suggesting and applying refactorings in real time. With this in mind, we developed a Live Refactoring Environment that visually identifies, recommends, and applies Extract Method refactorings. To validate it, we conducted an empirical experiment. Early results showed that our approach improved several code quality metrics. We also concluded that our results were significantly different from, and better than, the ones from refactoring the code manually without further help.	Translated: 2023-10-23 16:50:43 Published: 2023-07-20
# What Twitter Data Tell Us about the Future? ( http://arxiv.org/abs/2308.02035v1 ) License: see link	Alina Landowska, Marek Robak, Maciej Skorski	Anticipation is a fundamental human cognitive ability that involves thinking about and living towards the future. While language markers reflect anticipatory thinking, research on anticipation from the perspective of natural language processing is limited. This study aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users. We address the research questions of what futures Twitter's futurists anticipate and share, and how these anticipated futures can be modeled from social data. To investigate this, we review related works on anticipation, discuss the influence of language markers and prestigious individuals on anticipatory thinking, and present a taxonomy system categorizing futures into "present futures" and "future present". This research presents a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using SOTA models. The study identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. These findings contribute to the research on topic modelling and provide insights into the futures anticipated by Twitter's futurists. The research demonstrates that the futurists' language cues signal futures-in-the-making that enable social media users to anticipate their own scenarios and respond to them in the present. The fully open-sourced dataset, interactive analysis, and reproducible source code are available for further exploration.	Translated: 2023-08-14 01:58:38 Published: 2023-07-20
# Unveiling Emotions from EEG: A GRU-Based Approach ( http://arxiv.org/abs/2308.02778v1 ) License: see link	Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi	One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Network (RNN), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.	Translated: 2023-08-14 00:49:38 Published: 2023-07-20
# A LLM Assisted Exploitation of AI-Guardian ( http://arxiv.org/abs/2307.15008v1 ) License: see link	Nicholas Carlini	Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.	Translated: 2023-07-30 03:58:43 Published: 2023-07-20
# General Image-to-Image Translation with One-Shot Image Guidance ( http://arxiv.org/abs/2307.14352v1 ) License: see link	Bin Cheng, Zuhao Liu, Yunbo Peng, Yue Lin	Large-scale text-to-image models pre-trained on massive text-image pairs have recently shown excellent performance in image synthesis. However, an image can provide more intuitive visual concepts than plain text. People may ask: how can we integrate the desired visual concept into an existing image, such as our portrait? Current methods are inadequate in meeting this demand as they lack the ability to preserve content or translate visual concepts effectively. Inspired by this, we propose a novel framework named visual concept translator (VCT) with the ability to preserve content in the source image and translate the visual concepts guided by a single reference image. The proposed VCT contains a content-concept inversion (CCI) process to extract contents and concepts, and a content-concept fusion (CCF) process to gather the extracted information to obtain the target image. Given only one reference image, the proposed VCT can complete a wide range of general image-to-image translation tasks with excellent results. Extensive experiments are conducted to prove the superiority and effectiveness of the proposed methods. Code is available at https://github.com/CrystalNeuro/visual-concept-translator.	Translated: 2023-07-30 03:58:01 Published: 2023-07-20
# Applying QNLP to sentiment analysis in finance ( http://arxiv.org/abs/2307.11788v1 ) License: see link	Jonas Stein, Ivo Christ, Nicolas Kraus, Maximilian Balthasar Mansky, Robert Müller, Claudia Linnhof-Popien	As an application domain where the slightest qualitative improvements can yield immense value, finance is a promising candidate for early quantum advantage. Focusing on the rapidly advancing field of Quantum Natural Language Processing (QNLP), we explore the practical applicability of the two central approaches DisCoCat and Quantum-Enhanced Long Short-Term Memory (QLSTM) to the problem of sentiment analysis in finance. Utilizing a novel ChatGPT-based data generation approach, we conduct a case study with more than 1000 realistic sentences and find that QLSTMs can be trained substantially faster than DisCoCat while also achieving close to classical results for their available software implementations.	Translated: 2023-07-25 19:47:35 Published: 2023-07-20
# LLM Cognitive Judgements Differ From Human ( http://arxiv.org/abs/2307.11787v1 ) License: see link	Sotiris Lamprinidis	Large Language Models (LLMs) have lately been in the spotlight of researchers, businesses, and consumers alike. While the linguistic capabilities of such models have been studied extensively, there is growing interest in investigating them as cognitive subjects. In the present work I examine GPT-3 and ChatGPT capabilities on a limited-data inductive reasoning task from the cognitive science literature. The results suggest that these models' cognitive judgements are not human-like.	Translated: 2023-07-25 19:47:20 Published: 2023-07-20
# Adversarial Conversational Shaping for Intelligent Agents ( http://arxiv.org/abs/2307.11785v1 ) License: see link	Piotr Tarasiewicz, Sultan Kenjeyev, Ilana Sebag, Shehab Alshehabi	The recent emergence of deep learning methods has enabled the research community to achieve state-of-the-art results in several domains including natural language processing. However, the current robocall system remains unstable and inaccurate: text generators and chatbots can be tedious and misunderstand human-like dialogue. In this work, we study the performance of two models able to enhance an intelligent conversational agent through adversarial conversational shaping: a generative adversarial network with policy gradient (GANPG) and a generative adversarial network with reward for every generation step (REGS) based on the REGS model presented in Li et al. [18]. This model is able to assign rewards to both partially and fully generated text sequences. We discuss performance with different training details: seq2seq [36] and transformers [37] in a reinforcement learning framework.	Translated: 2023-07-25 19:47:15 Published: 2023-07-20
# What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems ( http://arxiv.org/abs/2307.11784v1 ) License: see link	Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao	Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.	Translated: 2023-07-25 19:46:59 Published: 2023-07-20
# A novel integrated method of detection-grasping for specific object based on the box coordinate matching ( http://arxiv.org/abs/2307.11783v1 ) License: see link	Zongmin Liu, Jirui Wang, Jie Li, Zufeng Li, Kai Ren, Peng Shi	To better care for the elderly and disabled, it is essential for service robots to have an effective fusion method of object detection and grasp estimation. However, limited research has been observed on the combination of object detection and grasp estimation. To overcome this technical difficulty, a novel integrated method of detection-grasping for specific objects based on box coordinate matching is proposed in this paper. Firstly, the SOLOv2 instance segmentation model is improved by adding a channel attention module (CAM) and a spatial attention module (SAM). Then, atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to obtain the fusion model of object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to verify the superiority of the improved models. Additionally, grasping tasks for several specific objects are implemented on a simulation platform, demonstrating the feasibility and effectiveness of the DG-BCM algorithm proposed in this paper.	Translated: 2023-07-25 19:46:48 Published: 2023-07-20
# Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case ( http://arxiv.org/abs/2307.11782v1 ) License: see link	Meixuan He, Yuqing Liang, Jinlan Liu and Dongpo Xu	Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which is arbitrarily close to $o(1/\sqrt{K})$. More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems.	Translated: 2023-07-25 19:46:28 Published: 2023-07-20
# 抽出抽象軸:生成言語モデルにおける内容「借用」の測定 The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models ( http://arxiv.org/abs/2307.11779v1 ) ライセンス: Link先を確認	Nedelina Teneva	(参考訳) 生成言語モデルは、検索エンジンの抽出的な応答とは対照的に、設計上、非常に抽象的な出力を生成する。このLLMの特徴と、コンテンツのライセンシングとアトリビューションへの影響を考慮し、生成モデルのベンチマークのためのいわゆる抽出・抽象軸を提案し、対応するメトリクス、データセット、アノテーションガイドラインの開発の必要性を強調する。我々は議論をテキストモダリティに限定する。 Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.	翻訳日:2023-07-25 19:46:08 公開日:2023-07-20
# ASRU 2023 MADASR ChallengeにおけるTranssion TSUPの音声認識システム Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge ( http://arxiv.org/abs/2307.11778v1 ) ライセンス: Link先を確認	Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu	(参考訳) 本稿では、ASRU 2023 MADASRチャレンジのためにTranssion Speech Understanding Processing Team (TSUP) が開発した音声認識システムを提案する。このシステムは、低リソースのインド言語へのASRモデルの適応に焦点を当てており、チャレンジの全4トラックをカバーしている。トラック1と2では、音響モデルにSqueezeformerエンコーダと、CTC-Attention結合学習損失を用いた双方向Transformerデコーダを利用した。さらに、TLGビームサーチデコーディングでは外部のKenLM言語モデルを使用した。トラック3と4では、事前学習済みのIndicWhisperモデルを採用し、チャレンジデータセットと公開データセットの両方で微調整した。Whisperのビームサーチデコーディングも外部のKenLM言語モデルをサポートするように修正し、チャレンジで提供される追加テキストをより活用できるようにした。提案手法は、ベンガル語の4トラックで24.17%、24.43%、15.97%、15.97%、ボージュプリー語の4トラックで19.61%、19.54%、15.48%、15.48%の単語誤り率(WER)を達成した。これらの結果は、提案手法の有効性を示す。 This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC-Attention training loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were employed and finetuned on both the challenge dataset and publicly available datasets. The whisper beam search decoding was also modified to support an external KenLM language model, which enabled better utilization of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%, 24.43%, 15.97%, and 15.97% for Bengali language in the four tracks, and WER of 19.61%, 19.54%, 15.48%, and 15.48% for Bhojpuri language in the four tracks. These results demonstrate the effectiveness of the proposed method.	翻訳日:2023-07-25 19:46:00 公開日:2023-07-20
# チーム強度推定による統計的強化学習によるハンドボールマッチの予測 Prediction of Handball Matches with Statistically Enhanced Learning via Estimated Team Strengths ( http://arxiv.org/abs/2307.11777v1 ) ライセンス: Link先を確認	Florian Felice and Christophe Ley	(参考訳) ハンドボールの試合を予測するため、統計的に強化された学習モデル(別名SEL)を提案する。SEL特徴量で拡張した機械学習モデルは、80%を超える精度で最先端のモデルを上回る。本研究では、過去の女子クラブチームの試合で機械学習モデルを訓練するためのデータセットの構築方法を示す。次に、異なるモデルを比較し、その性能を評価する。最後に、説明可能性手法により、ツールの位置づけを、純粋に予測的なソリューションから、洞察に富んだ分析ツールへと変えることができる。これはハンドボールチームのコーチにとって価値ある資産となり、今後の大会に備えるための統計的および予測的な洞察を提供する。 We propose a Statistically Enhanced Learning (a.k.a. SEL) model to predict handball games. Our Machine Learning model augmented with SEL features outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We then compare different models and evaluate them to assess their performance capabilities. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams' coaches, providing valuable statistical and predictive insights to prepare for future competitions.	翻訳日:2023-07-25 19:45:35 公開日:2023-07-20
# 浅層再帰デコーダネットワークを用いた任意の移動センサ軌跡の全状態再構成への活用 Leveraging arbitrary mobile sensor trajectories with shallow recurrent decoder networks for full-state reconstruction ( http://arxiv.org/abs/2307.11793v1 ) ライセンス: Link先を確認	Megan R. Ebers, Jan P. Williams, Katherine M. Steele, J. Nathan Kutz	(参考訳) センシングは、複雑な時空間システムの監視、予測、制御のための最も基本的なタスクの1つである。多くのアプリケーションでは、限られた数のセンサが移動式であり、ウェアラブル技術、海洋監視ブイ、気象観測気球のように、ダイナミクスとともに移動する。これらの動的システム(統計的に独立な領域を持たないもの)では、測定の時間履歴が、重要なタスクのために抽出できるかなりの量の情報をエンコードしている。ほとんどのモデルフリーのセンシングパラダイムは、現在のスパースなセンサ測定値を高次元の状態空間にマッピングすることを目的としており、時間履歴を完全に無視している。現代のディープラーニングアーキテクチャを用いて、LSTM(long short-term memory)ネットワークのようなシーケンス・ツー・ベクターモデルとデコーダネットワークにより、動的軌跡情報を全状態空間の推定にマッピングできることを示す。実際、浅い再帰デコーダネットワークで移動センサの軌跡を活用することで、(i)センサの任意の動的軌跡を用いて全状態空間を正確に再構成するようにネットワークを訓練できること、(ii)このアーキテクチャが固定センサと比較して再構成誤差の平均二乗誤差の分散を低減すること、(iii)このアーキテクチャが訓練セット外のデータに対する迅速な一般化(ダイナミクスのパラメータ化)も可能にすることを実証する。また、センサの空間軌跡の訓練データが利用可能であれば、センサの経路を任意に選択することができる。ネットワークアーキテクチャの優れた性能は、乱流、全球海面水温データ、人体運動バイオメカニクスの3つの応用で実証されている。 Sensing is one of the most fundamental tasks for the monitoring, forecasting and control of complex, spatio-temporal systems. In many applications, a limited number of sensors are mobile and move with the dynamics, with examples including wearable technology, ocean monitoring buoys, and weather balloons. In these dynamic systems (without regions of statistical independence), the measurement time history encodes a significant amount of information that can be extracted for critical tasks. Most model-free sensing paradigms aim to map current sparse sensor measurements to the high-dimensional state space, ignoring the time history altogether. Using modern deep learning architectures, we show that with a sequence-to-vector model, such as an LSTM (long short-term memory) network, and a decoder network, dynamic trajectory information can be mapped to full state-space estimates. 
Indeed, we demonstrate that by leveraging mobile sensor trajectories with shallow recurrent decoder networks, we can train the network (i) to accurately reconstruct the full state space using arbitrary dynamical trajectories of the sensors, (ii) the architecture reduces the variance of the mean-square error of the reconstruction error in comparison with immobile sensors, and (iii) the architecture also allows for rapid generalization (parameterization of dynamics) for data outside the training set. Moreover, the path of the sensor can be chosen arbitrarily, provided training data for the spatial trajectory of the sensor is available. The exceptional performance of the network architecture is demonstrated on three applications: turbulent flows, global sea-surface temperature data, and human movement biomechanics.	翻訳日:2023-07-25 19:36:53 公開日:2023-07-20
# 古典データの分類のための相互作用層を有する量子畳み込みニューラルネットワーク Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical Data ( http://arxiv.org/abs/2307.11792v1 ) ライセンス: Link先を確認	Jishnu Mahmud, Raisa Mashtura, Shaikh Anowarul Fattah	(参考訳) 量子機械学習(Quantum Machine Learning, QML)は、量子コンピュータの卓越した計算能力により注目を集めている。遠くない将来にほぼ誤りのない量子コンピュータが実現する見込みがあることから、量子ニューラルネットワークにおけるマルチキュービット相互作用の影響を広く研究することが重要である。本稿では、画像データと1次元データの両方を分類するために、ネットワークの表現可能性とエンタングル能力を高める3量子ビット相互作用を利用した新しい相互作用層を有する量子畳み込みネットワークを提案する。提案手法は、MNIST、Fashion MNIST、Irisの3つの公開データセットで二値分類と多クラス分類を行って検証し、既存の最先端手法の性能を上回ることが確認された。 Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promises of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks is studied extensively. This paper introduces a Quantum Convolutional Network with novel Interaction layers exploiting three-qubit interactions increasing the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets, namely the MNIST, Fashion MNIST, and Iris datasets, to perform binary and multiclass classifications and is found to surpass the performance of the existing state-of-the-art methods.	翻訳日:2023-07-25 19:36:26 公開日:2023-07-20
# 複合量子シミュレーション Composite Quantum Simulations ( http://arxiv.org/abs/2206.06409v3 ) ライセンス: Link先を確認	Matthew Hagan and Nathan Wiebe	(参考訳) 本稿では、Trotter-Suzuki公式やQDriftなどの複数の量子シミュレーション手法を、ゲート数を削減するための従来の合体(coalescing)のアイデアに基づいて、1つの複合チャネルに組み合わせる枠組みを提案する。このアプローチの中心的な考え方は、ハミルトニアンの各項をシミュレーション内のチャネルのTrotter部分またはQDrift部分に割り当てる分割スキームを使用することである。これにより、大きな項は高次のTrotter-Suzuki公式でシミュレートしつつ、小さいが多数ある項をQDriftでシミュレートできる。複合チャネルと理想的なシミュレーションチャネルとの間のダイヤモンド距離の厳密な上界を証明し、項の確率的分割と決定論的分割の両方について、複合チャネルの実装コストがそれを構成する手法によって漸近的に上から抑えられる条件を示す。最後に、分割スキームを決定するための戦略と、同一フレームワーク内に異なるシミュレーション手法を組み込む方法について論じる。 In this paper we provide a framework for combining multiple quantum simulation methods, such as Trotter-Suzuki formulas and QDrift, into a single Composite channel that builds upon older coalescing ideas for reducing gate counts. The central idea behind our approach is to use a partitioning scheme that allocates a Hamiltonian term to the Trotter or QDrift part of a channel within the simulation. This allows us to simulate small but numerous terms using QDrift while simulating the larger terms using a high-order Trotter-Suzuki formula. We prove rigorous bounds on the diamond distance between the Composite channel and the ideal simulation channel and show under what conditions the cost of implementing the Composite channel is asymptotically upper bounded by the methods that comprise it for both probabilistic partitioning of terms and deterministic partitioning. Finally, we discuss strategies for determining partitioning schemes as well as methods for incorporating different simulation methods within the same framework.	翻訳日:2023-07-24 16:59:31 公開日:2023-07-20
# プリプロセッサが重要! 機械学習システムに対する現実的な決定ベース攻撃 Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems ( http://arxiv.org/abs/2210.03297v2 ) ライセンス: Link先を確認	Chawin Sitawarin, Florian Tramèr, Nicholas Carlini	(参考訳) 決定ベースの攻撃は、ハードラベルのクエリのみを用いて、機械学習(ML)モデルに対する敵対的サンプルを構築する。これらの攻撃は主に単体のニューラルネットワークに直接適用されてきた。しかし実際には、MLモデルはより大きな学習システムの1つの構成要素にすぎない。分類器の前に1つのプリプロセッサを追加するだけで、最先端のクエリベースの攻撃は、モデル単体を攻撃する場合と比べて、予測パイプラインへの攻撃効果が最大で7$\times$低下することがわかった。この差異は、ほとんどのプリプロセッサが入力空間に何らかの不変性を導入するという事実によって説明される。したがって、この不変性を考慮しない攻撃は、それを再発見または克服するために必然的に大量のクエリを無駄にする。そこで我々は、(i)プリプロセッサをリバースエンジニアリングし、(ii)抽出した情報を用いてエンドツーエンドシステムを攻撃する技術を開発する。プリプロセッサ抽出法は数百のクエリしか必要とせず、プリプロセッサを考慮した攻撃はモデル単体を攻撃する場合と同じ効果を回復する。コードは https://github.com/google-research/preprocessor-aware-black-box-attack にある。 Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries. These attacks have mainly been applied directly to standalone neural networks. However, in practice, ML models are just one component of a larger learning system. We find that by adding a single preprocessor in front of a classifier, state-of-the-art query-based attacks are up to 7$\times$ less effective at attacking a prediction pipeline than at attacking the model alone. We explain this discrepancy by the fact that most preprocessors introduce some notion of invariance to the input space. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system. Our preprocessor extraction method requires only a few hundred queries, and our preprocessor-aware attacks recover the same efficacy as when attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.	翻訳日:2023-07-24 16:48:33 公開日:2023-07-20
# 敵対的ベイズシミュレーション Adversarial Bayesian Simulation ( http://arxiv.org/abs/2208.12113v2 ) ライセンス: Link先を確認	Yuexi Wang, Veronika Ročková	(参考訳) 明示的あるいは扱いやすい尤度がない場合、ベイズ統計家はしばしば推論のために近似ベイズ計算(ABC)に頼る。本研究は、敵対的生成ネットワーク(GAN)と敵対的変分ベイズに基づく深層ニューラル暗黙的サンプラーとABCとを橋渡しする。ABCとGANはいずれも、観測データと偽データの特徴を比較して、それぞれ事後分布と尤度からシミュレートする。我々は、敵対的最適化問題を解くことで事後分布を直接対象とするベイズ型GAN(B-GAN)サンプラーを開発する。B-GANは、条件付きGANによってABCリファレンス上で学習された決定論的写像によって駆動される。写像の訓練後は、無視できる追加コストでノイズをフィルタリングすることにより、iidの事後サンプルが得られる。後処理による局所的な改良として、(1)重要度再重み付けを伴うデータ駆動型の提案分布と、(2)変分ベイズの2つを提案する。頻度論的ベイズの結果により知見を裏付け、特定のニューラルネットワーク生成器と識別器について、真の事後分布と近似事後分布の間の典型的な全変動距離が0に収束することを示す。シミュレーションデータでの結果は、最新の尤度フリー事後シミュレータのいくつかと比較して高い競争力を示す。 In the absence of explicit or tractable likelihoods, Bayesians often resort to approximate Bayesian computation (ABC) for inference. Our work bridges ABC with deep neural implicit samplers based on generative adversarial networks (GANs) and adversarial variational Bayes. Both ABC and GANs compare aspects of observed and fake data to simulate from posteriors and likelihoods, respectively. We develop a Bayesian GAN (B-GAN) sampler that directly targets the posterior by solving an adversarial optimization problem. B-GAN is driven by a deterministic mapping learned on the ABC reference by conditional GANs. Once the mapping has been trained, iid posterior samples are obtained by filtering noise at a negligible additional cost. We propose two post-processing local refinements using (1) data-driven proposals with importance reweighting, and (2) variational Bayes. We support our findings with frequentist-Bayesian results, showing that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators. Our findings on simulated data show highly competitive performance relative to some of the most recent likelihood-free posterior simulators.	翻訳日:2023-07-24 16:47:30 公開日:2023-07-20
# 単一量子ビットセンサを用いた2次元双極子スピンアンサンブルのダイナミクスの探索 Probing dynamics of a two-dimensional dipolar spin ensemble using single qubit sensor ( http://arxiv.org/abs/2207.10688v2 ) ライセンス: Link先を確認	Kristine Rezai, Soonwon Choi, Mikhail D. Lukin, Alexander O. Sushkov	(参考訳) 量子多体系の熱化ダイナミクスを微視的レベルで理解することは、現代の統計物理学の中心的な課題の一つである。ここでは、ダイヤモンド結晶表面上の電子スピンの2次元アンサンブルにおける個々のスピンダイナミクスを実験的に調べる。表面近傍のNV中心をナノスケール磁気センサとして用い、双極子相互作用する表面スピンアンサンブルにおける個々のスピンの相関ダイナミクスを調べる。各スピンの緩和速度は、独立に推定した最近接スピンとの双極子相互作用強度に基づく素朴な予想よりも著しく遅く、局所磁場ゆらぎの時間スケールと強く相関することを観測した。この異常に遅い緩和速度が強い動的無秩序の存在に起因することを示し、動的共鳴計数に基づく定量的な説明を与える。最後に、共鳴スピンロック駆動を用いて局所磁場の有効強度を制御し、異なる領域における動的無秩序の役割を明らかにする。我々の研究は、強く相互作用する無秩序なスピンアンサンブルにおける量子熱化の微視的な研究と制御への道を開く。 Understanding the thermalization dynamics of quantum many-body systems at the microscopic level is among the central challenges of modern statistical physics. Here we experimentally investigate individual spin dynamics in a two-dimensional ensemble of electron spins on the surface of a diamond crystal. We use a near-surface NV center as a nanoscale magnetic sensor to probe correlation dynamics of individual spins in a dipolar interacting surface spin ensemble. We observe that the relaxation rate for each spin is significantly slower than the naive expectation based on independently estimated dipolar interaction strengths with nearest neighbors and is strongly correlated with the timescale of the local magnetic field fluctuation. We show that this anomalously slow relaxation rate is due to the presence of strong dynamical disorder and present a quantitative explanation based on dynamic resonance counting. Finally, we use resonant spin-lock driving to control the effective strength of the local magnetic fields and reveal the role of the dynamical disorder in different regimes. Our work paves the way towards microscopic study and control of quantum thermalization in strongly interacting disordered spin ensembles.	翻訳日:2023-07-24 16:45:57 公開日:2023-07-20
# 正則化リスク最小化のための分布シフト下の単調リスク関係 Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization ( http://arxiv.org/abs/2210.11589v2 ) ライセンス: Link先を確認	Daniel LeJeune, Jiayu Liu, Reinhard Heckel	(参考訳) 機械学習システムは、訓練分布とは異なる分布から得られたデータに適用されることが多い。近年の研究では、様々な分類問題や信号再構成問題において、分布外性能が分布内性能と強い線形相関を持つことが示されている。この関係、あるいはより一般に単調な関係が成り立つならば、それは重要な帰結をもたらす。例えば、一方の分布での性能を、もう一方の分布での性能の代理として最適化することができる。本稿では、2つの分布におけるモデルの性能の間に単調な関係が期待される条件について検討する。共変量シフトの下でのリッジ正則化一般線形モデルについて、二乗誤差に対する厳密な漸近線形関係と誤分類誤差に対する単調関係を証明し、さらに線形逆問題に対する近似的な線形関係を証明する。 Machine learning systems are often applied to data that is drawn from a different distribution than the training distribution. Recent work has shown that for a variety of classification and signal reconstruction problems, the out-of-distribution performance is strongly linearly correlated with the in-distribution performance. If this relationship or more generally a monotonic one holds, it has important consequences. For example, it allows one to optimize performance on one distribution as a proxy for performance on the other. In this paper, we study conditions under which a monotonic relationship between the performances of a model on two distributions is expected. We prove an exact asymptotic linear relation for squared error and a monotonic relation for misclassification error for ridge-regularized general linear models under covariate shift, as well as an approximate linear relation for linear inverse problems.	翻訳日:2023-07-24 16:37:26 公開日:2023-07-20
# ADPS:画像異常検出のための非対称蒸留ポストセグメンテーション法 ADPS: Asymmetric Distillation Post-Segmentation Method for Image Anomaly Detection ( http://arxiv.org/abs/2210.10495v2 ) ライセンス: Link先を確認	Peng Xing, Hao Tang, Jinhui Tang, Zechao Li	(参考訳) 知識蒸留に基づく異常検出(KDAD)手法は、教師-生徒パラダイムに依拠し、両ネットワークが抽出した特徴を対比することで異常領域の検出とセグメント化を行う。しかし、既存のKDAD手法には主に2つの制限がある。 1)生徒ネットワークが教師ネットワークの表現を容易に再現できてしまうこと、 2)教師ネットワークの特徴が「参照基準」としてのみ機能し、十分に活用されていないことである。そこで我々は、確立されたパラダイムから離れ、非対称蒸留ポストセグメンテーション(ADPS)と呼ぶ新しいアプローチを提案する。ADPSは、同一画像の異なる形態を教師-生徒ネットワークの入力とする非対称蒸留パラダイムを採用し、生徒ネットワークに異常領域の識別的な表現を学習させる。また、非対称パラダイムから得られた蒸留知識を教師ネットワークに伝達する粗い異常位置特定マスクを生成するために、カスタマイズされた重みマスクブロック(WMB)を提案する。WMBを組み込んだポストセグメンテーションモジュール(PSM)は、微細な構造と明確な境界を持つ異常領域を効果的に検出し、セグメント化できる。実験の結果、ADPSは異常の検出とセグメント化において最先端の手法を上回った。驚くべきことに、ADPSはMVTec ADとKolektorSDD2データセットで、平均精度(AP)をそれぞれ9%、20%改善した。 Knowledge Distillation-based Anomaly Detection (KDAD) methods rely on the teacher-student paradigm to detect and segment anomalous regions by contrasting the unique features extracted by both networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged. Toward this end, we depart from the established paradigm and instead propose an innovative approach called Asymmetric Distillation Post-Segmentation (ADPS). Our ADPS employs an asymmetric distillation paradigm that takes distinct forms of the same image as the input of the teacher-student networks, driving the student network to learn discriminating representations for anomalous regions. Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a coarse anomaly localization mask that transfers the distilled knowledge acquired from the asymmetric paradigm to the teacher network. 
Equipped with WMB, the proposed Post-Segmentation Module (PSM) is able to effectively detect and segment abnormal regions with fine structures and clear boundaries. Experimental results demonstrate that the proposed ADPS outperforms the state-of-the-art methods in detecting and segmenting anomalies. Surprisingly, ADPS significantly improves Average Precision (AP) metric by 9% and 20% on the MVTec AD and KolektorSDD2 datasets, respectively.	翻訳日:2023-07-24 16:37:10 公開日:2023-07-20
# 協調共進化探索によるML対応自律システムのハザード境界の同定 Identifying the Hazard Boundary of ML-enabled Autonomous Systems Using Cooperative Co-Evolutionary Search ( http://arxiv.org/abs/2301.13807v2 ) ライセンス: Link先を確認	Sepehr Sharifi, Donghwan Shin, Lionel C. Briand and Nathan Aschbacher	(参考訳) 機械学習(ML)対応自律システム(MLAS)では、解析対象のMLASにおけるMLコンポーネント(MLC)のハザード境界を特定することが不可欠である。この境界は、ハザードにつながりうるMLCの振る舞いとシステムコンテキストに関する条件を捉えているため、例えば、ハザード境界に達したときに事前定義されたフォールバック機構を実行時に起動する安全モニターの構築に利用できる。しかし、MLコンポーネントのハザード境界を決定することは困難である。これは、システムコンテキスト(すなわちシナリオ)とMLCの振る舞い(すなわち入力と出力)を組み合わせた問題空間が、網羅的な探索には大きすぎ、遺伝的アルゴリズムのような従来のメタヒューリスティクスでも扱えないほど大きいためである。加えて、MLASの安全性違反を判定するために必要なシミュレーションの計算コストが高いことが、問題をさらに難しくしている。さらに、シミュレーションにおける制御不能なパラメータと、MLASに含まれるMLモデル(例えばディープニューラルネットワーク)の非線形な振る舞いのため、問題空間内の領域を決定論的に安全または非安全とみなすことは現実的でない。これらの課題に対処するため、協調共進化アルゴリズム(CCEA)に基づく新しい手法であるMLCSHE(ML Component Safety Hazard Envelope)を提案する。この手法は、高次元の問題を2つの低次元の探索部分問題に分解して取り組むことを目指す。さらに、安全な領域と非安全な領域を確率論的に捉え、確率的ハザード境界からの距離を測る新しい適合度関数を定義することで、探索を効果的に導く。複雑な自動運転車(AV)のケーススタディでMLCSHEの有効性と効率を評価した。評価の結果、MLCSHEは標準的な遺伝的アルゴリズムやランダム探索よりも有意に効果的かつ効率的であることが示された。 In Machine Learning (ML)-enabled autonomous systems (MLASs), it is essential to identify the hazard boundary of ML Components (MLCs) in the MLAS under analysis. Given that such boundary captures the conditions in terms of MLC behavior and system context that can lead to hazards, it can then be used to, for example, build a safety monitor that can take any predefined fallback mechanisms at runtime when reaching the hazard boundary. However, determining such hazard boundary for an ML component is challenging. This is due to the problem space combining system contexts (i.e., scenarios) and MLC behaviors (i.e., inputs and outputs) being far too large for exhaustive exploration and even to handle using conventional metaheuristics, such as genetic algorithms. Additionally, the high computational cost of simulations required to determine any MLAS safety violations makes the problem even more challenging. 
Furthermore, it is unrealistic to consider a region in the problem space deterministically safe or unsafe due to the uncontrollable parameters in simulations and the non-linear behaviors of ML models (e.g., deep neural networks) in the MLAS under analysis. To address the challenges, we propose MLCSHE (ML Component Safety Hazard Envelope), a novel method based on a Cooperative Co-Evolutionary Algorithm (CCEA), which aims to tackle a high-dimensional problem by decomposing it into two lower-dimensional search subproblems. Moreover, we take a probabilistic view of safe and unsafe regions and define a novel fitness function to measure the distance from the probabilistic hazard boundary and thus drive the search effectively. We evaluate the effectiveness and efficiency of MLCSHE on a complex Autonomous Vehicle (AV) case study. Our evaluation results show that MLCSHE is significantly more effective and efficient compared to a standard genetic algorithm and random search.	翻訳日:2023-07-24 16:28:29 公開日:2023-07-20
# 多様体ニューラルネットワークの収束率 A Convergence Rate for Manifold Neural Networks ( http://arxiv.org/abs/2212.12606v2 ) ライセンス: Link先を確認	Joyce Chew and Deanna Needell and Michael Perlmutter	(参考訳) 高次元データは数多くの応用で現れる。急速に発展している幾何学的深層学習の分野は、グラフや多様体のような非ユークリッド領域でそのようなデータを解析するためのニューラルネットワークアーキテクチャの開発を目指している。Z. Wang、L. Ruiz、A. Ribeiroの最近の研究は、ラプラス・ベルトラミ作用素のスペクトル分解を用いて多様体ニューラルネットワークを構築する方法を導入した。さらにこの研究では、多様体が未知で有限個のサンプル点にしかアクセスできない場合に、そのようなニューラルネットワークを実装するための数値スキームが与えられている。著者らは、データ駆動グラフの構築に依拠するこのスキームが、サンプル点の数が無限大に近づくにつれて連続極限に収束することを示した。本稿ではこの結果に基づき、多様体の内在次元には依存するが周囲次元には依存しない収束率を確立する。また、収束率がネットワークの深さと各層で使用されるフィルタ数にどのように依存するかについても論じる。 High-dimensional data arises in numerous applications, and the rapidly developing field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace-Beltrami operator. Moreover, in this work, the authors provide a numerical scheme for implementing such neural networks when the manifold is unknown and one only has access to finitely many sample points. The authors show that this scheme, which relies upon building a data-driven graph, converges to the continuum limit as the number of sample points tends to infinity. Here, we build upon this result by establishing a rate of convergence that depends on the intrinsic dimension of the manifold but is independent of the ambient dimension. We also discuss how the rate of convergence depends on the depth of the network and the number of filters used in each layer.	翻訳日:2023-07-24 16:27:26 公開日:2023-07-20
# 強磁場中におけるスカラー荷電粒子によるツイスト光子の放出 Emission of twisted photons by a scalar charged particle in a strong magnetic field ( http://arxiv.org/abs/2303.01946v2 ) ライセンス: Link先を確認	D. Karlovets, A. Di Piazza	(参考訳) 一定かつ一様な磁場中でのスカラー荷電粒子による光子の放出を考察する。光子と出射電荷の両方が検出されると仮定する従来のアプローチとは対照的に、電荷のみが検出される場合を扱い、放出された光子の特性を調べる。背景磁場は計算において厳密に考慮され、電荷は相対論的ランダウ状態で記述される。放出された光子の状態は、全角運動量が$\ell-\ell'$で与えられるツイストしたベッセルビームを表すことを示す。ここで$\ell$と$\ell'$は、それぞれ始状態と終状態の荷電粒子の角運動量量子数である。非偏極の電荷から放出される光子の大部分は、特に硬X線および$\gamma$線領域、ならびにシュウィンガー値$H_c = 4.4\times 10^9$ Tと比べて臨界および亜臨界の磁場において、$\ell-\ell'\gtrsim 1$でツイストしていることがわかる。 We consider the emission of a photon by a scalar charged particle in a constant and uniform magnetic field. In contrast to the conventional approach with both photon and outgoing charge being assumed to be detected, we study the case where only the charge is detected and investigate the properties of the emitted photon. The background magnetic field is taken into account exactly in the calculations and the charge is described by relativistic Landau states. It is shown that the emitted photon state represents a twisted Bessel beam with a total angular momentum given by $\ell-\ell'$, where $\ell$ and $\ell'$ are angular momentum quantum numbers of the initial and final charged particle, respectively. The majority of photons emitted by unpolarized charges, especially in the hard X-ray and $\gamma$-ray range and in critical and sub-critical magnetic fields, as compared to the Schwinger value of $H_c = 4.4\times 10^9$ T, turn out to be twisted with $\ell-\ell'\gtrsim 1$.	翻訳日:2023-07-24 16:20:21 公開日:2023-07-20
# クリフォード回路を用いた量子化学シミュレーションの分割 Partitioning Quantum Chemistry Simulations with Clifford Circuits ( http://arxiv.org/abs/2303.01221v2 ) ライセンス: Link先を確認	Philipp Schleich, Joseph Boen, Lukasz Cincio, Abhinav Anand, Jakob S. Kottmann, Sergei Tretiak, Pavel A. Dub, Alán Aspuru-Guzik	(参考訳) 現在の量子コンピューティングハードウェアは、少数のノイズの多い量子ビットしか利用できないという制約があり、量子コンピュータ上の量子化学計算で、より大きく複雑な分子を近い将来に調べることを難しくしている。本研究では、量子回路と変分量子固有値ソルバーの枠組みにとどまりながら、それらの古典的および準古典的な取り扱いの限界を調べる。この目的のために、パラメータ化された波動関数に対して、分離可能ペアansatzの形式を適応させた、素朴なものと物理的に動機づけられたものの、古典的に効率的な積ansatzを考える。これを、このansatzに由来するサブシステム間の相互作用を考慮するための後処理と組み合わせる。古典的な取り扱いは、強制されたサブシステム間にまたがるサポートを持ち、ハミルトニアンに折り込まれる別の量子回路によって与えられる。ハミルトニアン項の数の指数関数的な増加を避けるため、エンタングル操作は純粋なクリフォード回路または準クリフォード回路から構成する。クリフォード回路は古典的に効率よくシミュレートできるが、ユニバーサルではない。不足する表現力を補うため、選択した少数の非クリフォードゲートのみを含む準クリフォード回路を用いる。この目的を達成するための正確な回路構造は分子に依存し、シミュレーテッドアニーリングと遺伝的アルゴリズムを用いて構築する。関心のある分子の集合で本アプローチを実証し、手法の適用範囲を調べる。数値シミュレーションによる実証的検証では、分離可能ペアansatzと比べて、同程度の精度で最大50\%の量子ビット数削減が示された。 Current quantum computing hardware is restricted by the availability of only a few noisy qubits, which limits the investigation of larger, more complex molecules in quantum chemistry calculations on quantum computers in the near-term. In this work, we investigate the limits of their classical and near-classical treatment while staying within the framework of quantum circuits and the variational quantum eigensolver. To this end, we consider naive and physically motivated, classically efficient product ansatz for the parametrized wavefunction adapting the separable pair ansatz form. We combine it with post-treatment to account for interactions between subsystems originating from this ansatz. The classical treatment is given by another quantum circuit that has support between the enforced subsystems and is folded into the Hamiltonian. To avoid an exponential increase in the number of Hamiltonian terms, the entangling operations are constructed from purely Clifford or near-Clifford circuits. 
While Clifford circuits can be simulated efficiently classically, they are not universal. In order to account for missing expressibility, near-Clifford circuits with only a few selected non-Clifford gates are employed. The exact circuit structure to achieve this objective is molecule-dependent and is constructed using simulated annealing and genetic algorithms. We demonstrate our approach on a set of molecules of interest and investigate the extent of our methodology's reach. Empirical validation of our approach using numerical simulations shows a reduction of the qubit count of up to 50\% at a similar accuracy as compared to the separable-pair ansatz.	翻訳日:2023-07-24 16:19:58 公開日:2023-07-20
# 単一分子における誘導ラマン遷移のスペクトル分裂 Spectral splitting of a stimulated Raman transition in a single molecule ( http://arxiv.org/abs/2302.14733v2 ) ライセンス: Link先を確認	Johannes Zirkelbach, Burak Gurlek, Masoud Mirzaei, Alexey Shkarin, Tobias Utikal, Stephan Götzinger, Vahid Sandoghdar	(参考訳) ラマン散乱の断面積が小さいことは、単一分子レベルでの直接的な研究にとって大きな課題である。共通モード共鳴の高いフランク・コンドン因子を利用し、電子基底状態と励起状態の振動周波数差が大きい系を選び、T < 2Kで動作させることで、個々の分子においてコヒーレントな誘導ラマン遷移を駆動することに成功した。我々は、この現象の特徴的なシグネチャとなるスペクトル分裂を観測し、モデル化する。本研究は、固体量子光学および情報処理への応用に向けて、分子に固有のオプトメカニカルな自由度を活用するための基礎を築くものである。 The small cross section of Raman scattering poses a great challenge for its direct study at the single-molecule level. By exploiting the high Franck-Condon factor of a common-mode resonance, choosing a large vibrational frequency difference in electronic ground and excited states and operation at T < 2K, we succeed at driving a coherent stimulated Raman transition in individual molecules. We observe and model a spectral splitting that serves as a characteristic signature of the phenomenon at hand. Our study sets the ground for exploiting the intrinsic optomechanical degrees of freedom of molecules for applications in solid-state quantum optics and information processing.	翻訳日:2023-07-24 16:19:19 公開日:2023-07-20
# 単眼シングルショット6D物体姿勢推定における未解決課題 Open Challenges for Monocular Single-shot 6D Object Pose Estimation ( http://arxiv.org/abs/2302.11827v2 ) ライセンス: Link先を確認	Stefan Thalhammer, Peter Hönig, Jean-Baptiste Weibel, Markus Vincze	(参考訳) 物体の姿勢推定は、ロボットマニピュレーション、ビンピッキング、拡張現実、シーン理解などを可能にする非自明なタスクである。単眼の物体姿勢推定は、高性能なディープラーニングベースの手法の台頭とともに大きく勢いを増しており、センサが安価で推論が速いことから、コミュニティにとって特に興味深い。先行研究は、多様な姿勢推定問題について包括的に最先端を整理している。しかし、その対象範囲の広さゆえに、有望な将来の方向性を特定することが難しい。我々は、ロボティクスでよく用いられるシングルショット単眼6D物体姿勢推定の問題に範囲を絞ることで、そのような傾向を特定できるようにする。ロボティクスとコンピュータビジョンの最近の論文をレビューすることで、両分野にまたがる最先端を整理する。その上で、研究者が適切な研究アイデアを定式化し、最先端を効果的に前進させる助けとなるよう、有望な研究の方向性を特定する。得られた知見には、手法がドメインシフトを克服できるほど高度化していること、オクルージョンへの対処が根本的な課題であることが含まれる。また、新規物体の姿勢推定や扱いの難しい材質への対応といった問題を、ロボティクスを前進させる上での中心的な課題として強調する。 Object pose estimation is a non-trivial task that enables robotic manipulation, bin picking, augmented reality, and scene understanding, to name a few use cases. Monocular object pose estimation gained considerable momentum with the rise of high-performing deep learning-based solutions and is particularly interesting for the community since sensors are inexpensive and inference is fast. Prior works establish the comprehensive state of the art for diverse pose estimation problems. Their broad scopes make it difficult to identify promising future directions. We narrow down the scope to the problem of single-shot monocular 6D object pose estimation, which is commonly used in robotics, and thus are able to identify such trends. By reviewing recent publications in robotics and computer vision, the state of the art is established at the union of both fields. Following that, we identify promising research directions in order to help researchers to formulate relevant research ideas and effectively advance the state of the art. Findings include that methods are sophisticated enough to overcome the domain shift and that occlusion handling is a fundamental challenge. 
We also highlight problems such as novel object pose estimation and challenging materials handling as central challenges to advance robotics.	翻訳日:2023-07-24 16:18:45 公開日:2023-07-20
# ニューラルネットワークに基づくスペクトル推定と希少事象予測のための不正確な反復数値線形代数 Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction ( http://arxiv.org/abs/2303.12534v3 ) ライセンス: Link先を確認	John Strahan, Spencer C. Guo, Chatipat Lorpaiboon, Aaron R. Dinner, Jonathan Weare	(参考訳) 複雑なシステムの力学を理解することは、多くの自由度があり、興味のある事象を記述する上で最も重要なものはしばしば明らかではない。遷移作用素の先頭の固有関数は視覚化に有用であり、イベントの確率や平均時間(予測)といった統計計算の効率的な基盤を提供することができる。ここでは、これらの固有関数(スペクトル推定)を計算し、有限間隔でサンプリングされた短い軌跡のデータセットから予測する不正確な反復線型代数法を開発する。生体分子系の可視化と高次元モデルを容易にする低次元モデル上での手法を実証する。強化学習における予測問題の意味について論じる。 Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics such as the likelihood and average time of events (predictions). Here we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a data set of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.	翻訳日:2023-07-24 16:09:30 公開日:2023-07-20
# 非一様超グラフ確率ブロックモデルの厳密な回復 Exact recovery for the non-uniform Hypergraph Stochastic Block Model ( http://arxiv.org/abs/2304.13139v2 ) ライセンス: Link先を確認	Ioana Dumitriu, Haixiao Wang	(参考訳) 非一様ハイパーグラフ確率ブロックモデル(hsbm)の下でのランダムハイパーグラフにおけるコミュニティ検出問題を考える。文献の中で初めて、この一様でないケースの下で正確な回復のための鋭いしきい値が、マイナーな制約のもとに確立された。ここでの重要なポイントは、すべての均一な層から情報を集約することで、各層が単独では不可能に見える場合であっても、正確な回復が得られることである。しきい値以上の正確な回復を達成する2つの効率的なアルゴリズムが提供される。我々のアルゴリズムの理論的解析は、非一様ランダムハイパーグラフに対する隣接行列の濃度と正規化に依存しており、これは独立な関心を持つ可能性がある。またパラメータ知識と推定に関するオープンな問題にも対処する。 Consider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints; in particular, we consider the model with multiple communities ($K \geq 2$). One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases when this may appear impossible if each layer were considered alone. Two efficient algorithms that successfully achieve exact recovery above the threshold are provided. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.	翻訳日:2023-07-24 15:59:35 公開日:2023-07-20
# 因果部分構造を用いたシフトロバスト分子関係学習 Shift-Robust Molecular Relational Learning with Causal Substructure ( http://arxiv.org/abs/2305.18451v3 ) ライセンス: Link先を確認	Namkyeong Lee, Kanghoon Yoon, Gyoung S. Na, Sein Kim, Chanyoung Park	(参考訳) 近年、分子対間の相互作用の振る舞いを予測することを目的とした分子関係学習が、幅広い応用のために分子科学への関心が高まっている。本研究では,分子関係学習における分布変化に頑健なCMRLを提案する。そこで我々はまず,分子科学の領域知識に基づいて因果関係を仮定し,変数間の関係を明らかにする構造因果モデル(SCM)を構築する。 SCMに基づいて, 組換え分子上での干渉を条件付けした新しい条件付き干渉機構を導入する。条件付き介入の枠組みにより,本モデルは因果的サブ構造から学習し,化学反応に急激な相関を持つショートカットサブ構造の共起効果を緩和する。実世界および合成データセットを用いた様々なタスクに関する大規模な実験は、最先端のベースラインモデルよりもCMRLの方が優れていることを示す。私たちのコードはhttps://github.com/namkyeong/cmrlで利用可能です。 Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, got a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL that is robust to the distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.	翻訳日:2023-07-24 15:49:42 公開日:2023-07-20
# AIによる意思決定における精度と時間の両方に対する適応的介入 Adaptive interventions for both accuracy and time in AI-assisted human decision making ( http://arxiv.org/abs/2306.07458v2 ) ライセンス: Link先を確認	Siddharth Swaroop, Zana Buçinca, Finale Doshi-Velez	(参考訳) 緊急治療室で働く医師など、ユーザが時間的にプレッシャーをかけ、高い精度を必要とする環境では、精度を高め、時間を短縮するaiアシスタントを提供したいと思っています。しかし、異なるタイプのAIアシストには、異なる利点がある。ですから私たちは,2つの目標を最大限にトレードオフするために,さまざまな特性(質問やユーザの)に依存したAI支援に適応したいと考えています。我々は、ユーザーがエイリアンに薬を処方しなければならない研究を紹介し、それを使ってAI支援に適応する可能性を探る。私たちは、質問に応じてAI支援を適用することが有益であるという証拠を見つけ、時間と正確性の間に良いトレードオフをもたらす。今後の研究では、機械学習アルゴリズム(強化学習など)が自動的に適応することを考慮します。 In settings where users are both time-pressured and need high accuracy, such as doctors working in Emergency Rooms, we want to provide AI assistance that both increases accuracy and reduces time. However, different types of AI assistance have different benefits: some reduce time taken while increasing overreliance on AI, while others do the opposite. We therefore want to adapt what AI assistance we show depending on various properties (of the question and of the user) in order to best tradeoff our two objectives. We introduce a study where users have to prescribe medicines to aliens, and use it to explore the potential for adapting AI assistance. We find evidence that it is beneficial to adapt our AI assistance depending on the question, leading to good tradeoffs between time taken and accuracy. Future work would consider machine-learning algorithms (such as reinforcement learning) to automatically adapt quickly.	翻訳日:2023-07-24 15:40:42 公開日:2023-07-20
# 高次元および置換不変異常検出 High-dimensional and Permutation Invariant Anomaly Detection ( http://arxiv.org/abs/2306.03933v2 ) ライセンス: Link先を確認	Vinicius Mikuni, Benjamin Nachman	(参考訳) 新しい物理過程の異常検出法は、高次元確率密度の学習が困難であるため、しばしば低次元空間に限られる。特に構成レベルでは,一般密度推定法では置換不変性や可変長入力などの望ましい特性を組み込むことが困難となる。本研究では, 分散モデルに基づく粒子物理学データに対して, 可変長入力を扱うために特別に設計された置換不変密度推定器を提案する。本手法の有効性は,学習密度を置換不変な異常検出スコアとして利用し,背景のみの仮説の下でジェットを効果的に同定することによって実証する。密度推定法を検証するため, 教師付き分類アルゴリズムにより得られた密度の比について検討し, 比較を行った。 Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.	翻訳日:2023-07-24 15:40:02 公開日:2023-07-20
# LiDARデータを用いた埋設考古学構造物のセマンティックセグメンテーション手法のトランスファー学習 Transfer Learning of Semantic Segmentation Methods for Identifying Buried Archaeological Structures on LiDAR Data ( http://arxiv.org/abs/2307.03512v2 ) ライセンス: Link先を確認	Paolo Soleni, Wouter B. Verschoof-van der Vaart, Žiga Kokalj, Arianna Traviglia, Marco Fiorucci	(参考訳) 考古学的な研究において、深層学習をリモートセンシングデータに適用する際には、トレーニングモデルに適したデータセットが限られている。転送学習の応用は、この欠点を軽減するために頻繁に用いられる。しかし、異なる考古学的データセットに適用する場合、その有効性を調べる必要がある。本稿では,2つのlidarデータセット上の2つの意味セグメンテーション深層ニューラルネットワークを用いた,転送学習構成の性能比較を行う。実験結果から, 考古学における伝達学習に基づくアプローチは, 体系的な拡張がまだ観察されていないものの, 性能改善につながる可能性が示唆された。我々は,今後の研究のベースラインとして機能する技術の有効性について,具体的な知見を提供する。 When applying deep learning to remote sensing data in archaeological research, a notable obstacle is the limited availability of suitable datasets for training models. The application of transfer learning is frequently employed to mitigate this drawback. However, there is still a need to explore its effectiveness when applied across different archaeological datasets. This paper compares the performance of various transfer learning configurations using two semantic segmentation deep neural networks on two LiDAR datasets. The experimental results indicate that transfer learning-based approaches in archaeology can lead to performance improvements, although a systematic enhancement has not yet been observed. We provide specific insights about the validity of such techniques that can serve as a baseline for future works.	翻訳日:2023-07-24 15:31:30 公開日:2023-07-20
# 入力制約型mpcの直接最適化アルゴリズム A direct optimization algorithm for input-constrained MPC ( http://arxiv.org/abs/2306.15079v4 ) ライセンス: Link先を確認	Liang Wu	(参考訳) モデル予測制御(model prediction control, mpc)アルゴリズムを本番組込みプラットフォームで実行する際の課題のひとつは,最悪の計算複雑性の証明書を提供することである。本稿では、入力制約付きMPCに対する \textit{direct} 最適化アルゴリズムを初めて提案する: 繰り返しの回数は、問題次元$n$, 正確な値 $\left\lceil\frac{\log\left(\frac{2n}{\epsilon}\right)}{-2\log(1-\frac{1}{4\sqrt{2n}})}\right\rceil+1$, ここで$\epsilon$は所定の停止精度を示す。 One challenge of running a model predictive control (MPC) algorithm in a production-embedded platform is to provide the certificate of worst-case computation complexity, that is, its maximum execution time has to always be smaller than sampling time. This paper proposes for the first time a \textit{direct} optimization algorithm for input-constrained MPC: the number of iterations is data-independent and dependent on the problem dimension $n$, with exact value $\left\lceil\frac{\log\left(\frac{2n}{\epsilon}\right)}{-2\log(1-\frac{1}{4\sqrt{2n}})}\right\rceil+1$, where $\epsilon$ denotes a given stopping accuracy.	翻訳日:2023-07-24 15:29:19 公開日:2023-07-20
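The iteration count quoted in the abstract above depends only on the problem dimension $n$ and the stopping accuracy $\epsilon$, so it can be evaluated before the controller ever runs. The following is a minimal sketch that only evaluates that closed-form expression; the function name `mpc_iteration_count` and the input checks are assumptions made for illustration and are not taken from the paper.

```python
import math


def mpc_iteration_count(n: int, eps: float) -> int:
    """Evaluate ceil(log(2n/eps) / (-2*log(1 - 1/(4*sqrt(2n))))) + 1.

    n   : problem dimension (positive integer)
    eps : stopping accuracy (positive float)
    The name and the validation below are illustrative assumptions.
    """
    if n < 1 or eps <= 0.0:
        raise ValueError("n must be >= 1 and eps must be > 0")
    numerator = math.log(2 * n / eps)
    # log(1 - x) is negative for 0 < x < 1, so the denominator is positive.
    denominator = -2.0 * math.log(1.0 - 1.0 / (4.0 * math.sqrt(2 * n)))
    return math.ceil(numerator / denominator) + 1


# The count is fixed by (n, eps) alone; no problem data enters the formula.
for dimension in (2, 8, 32):
    print(dimension, mpc_iteration_count(dimension, 1e-6))
```

Because the bound is data-independent, a worst-case execution time estimate would follow from multiplying this count by the cost of one iteration on the target platform, which is the certificate the abstract refers to.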
# FAIR: 判断の逆転を正確に推測するための因果関係フレームワーク FAIR: A Causal Framework for Accurately Inferring Judgments Reversals ( http://arxiv.org/abs/2306.11585v2 ) ライセンス: Link先を確認	Minghua He, Nanfei Gu, Yuntao Shi, Qionghui Zhang, Yaying Chen	(参考訳) 人工知能研究者は近年、法的なインテリジェンスに大きな進歩を遂げている。しかし、既存の研究は、法的知性の効率の向上を制限する判断の反転に埋め込まれた重要な価値に焦点を絞ってはいない。本稿では,実際の中国語の判断をモデルとしたケースリバーサル(FAIR)の高精度推論のための因果的枠組みを提案する。因果推論法による判断反転の原因を抽出し,得られた因果関係を事前知識としてニューラルネットワークに注入する。そして、我々のフレームワークは、法的判断予測タスクとして挑戦的なデータセット上で検証される。実験の結果,提案手法は判断の反転において最も重要な要素を活用でき,得られた因果関係はニューラルネットワークの性能を効果的に改善できることがわかった。さらに、ChatGPTを例として、法的な知能タスクのための大規模言語モデルの一般化能力について論じる。実験の結果,大規模言語モデルの一般化能力にはまだ欠陥が残っており,因果関係のマイニングは,モデル予測の精度を効果的に向上し,説明できることがわかった。 Artificial intelligence researchers have made significant advances in legal intelligence in recent years. However, the existing studies have not focused on the important value embedded in judgments reversals, which limits the improvement of the efficiency of legal intelligence. In this paper, we propose a causal Framework for Accurately Inferring case Reversals (FAIR), which models the problem of judgments reversals based on real Chinese judgments. We mine the causes of judgments reversals by causal inference methods and inject the obtained causal relationships into the neural network as a priori knowledge. And then, our framework is validated on a challenging dataset as a legal judgment prediction task. The experimental results show that our framework can tap the most critical factors in judgments reversal, and the obtained causal relationships can effectively improve the neural network's performance. In addition, we discuss the generalization ability of large language models for legal intelligence tasks using ChatGPT as an example. Our experiment has found that the generalization ability of large language models still has defects, and mining causal relationships can effectively improve the accuracy and explain ability of model predictions.	翻訳日:2023-07-24 15:28:07 公開日:2023-07-20
# 長いステップを通したより高速な勾配降下 Provably Faster Gradient Descent via Long Steps ( http://arxiv.org/abs/2307.06324v4 ) ライセンス: Link先を確認	Benjamin Grimmer	(参考訳) 本研究は, 滑らかな凸最適化における勾配降下の収束速度を, コンピュータ支援解析手法により確実に向上させる。本理論は、多くの反復の全体的な効果を、ほとんどの一階法分析で使われる典型的な単文帰納法ではなく、一度に分析することにより、頻繁な長いステップでポリシーを段階化することを可能にする。短期的に客観的な価値を高めるための長いステップは、長期的には確実により早く収束することを示している。勾配降下のより高速な$O(1/T\log T)$レートを証明するための予想も、単純な数値検証と共に動機付けられる。 This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.	翻訳日:2023-07-24 15:20:51 公開日:2023-07-20
# ZeroQuant-FP:浮動小数点フォーマットを用いたLLM後のW4A8量子化 ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats ( http://arxiv.org/abs/2307.09782v2 ) ライセンス: Link先を確認	Xiaoxia Wu and Zhewei Yao and Yuxiong He	(参考訳) 大規模言語モデル(LLM)の複雑な領域では、計算効率とモデル品質の維持のバランスを崩すことは、非常に難しい課題である。均一量子化の本質的な限界をナビゲートし、特に外れ値を扱う場合、NVIDIAのH100ハードウェアのローンチによって動機づけられたこの研究は、浮動小数点量子化(FP)の生存可能性、特にFP8とFP4に焦点をあてる。我々の総合的な調査によると、LLMでは、FP8のアクティベーションは整数(INT8)を一貫して上回り、性能エッジは10億を超えるパラメータを持つモデルでより顕著になる。重量量子化では、FP4はINT4に匹敵する性能を示し、H100のようなFP対応ハードウェアへの展開を単純化している。重みとアクティベーションの差に起因する精度アライメントのオーバーヘッドを軽減するため、標準のw4a8モデルと比較して性能に悪影響を及ぼす2つの重み量子化のスケーリング制約を提案する。さらに、低ランク補償(LoRC)戦略を統合することで量子化手法を強化し、特に小型モデルにおいて改善をもたらす。本研究は, LLMにおけるFP量子化の可能性を強調し, 資源制限環境における高効率展開の道を開くものである。 In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing with outliers, and motivated by the launch of NVIDIA's H100 hardware, this study delves into the viability of floating-point (FP) quantization, particularly focusing on FP8 and FP4, as a potential solution. Our comprehensive investigation reveals that for LLMs, FP8 activation consistently outshines its integer (INT8) equivalent, with the performance edge becoming more noticeable in models possessing parameters beyond one billion. For weight quantization, our findings indicate that FP4 exhibits comparable, if not superior, performance to INT4, simplifying deployment on FP-supported hardware like H100. To mitigate the overhead from precision alignment caused by the disparity between weights and activations, we propose two scaling constraints for weight quantization that negligibly impact the performance compared to the standard W4A8 model. 
We additionally enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, yielding improvements especially in smaller models. The results of our investigation emphasize the immense potential of FP quantization for LLMs, paving the way for high-efficiency deployment in resource-limited settings.	翻訳日:2023-07-24 15:09:57 公開日:2023-07-20
# 単一回路を用いた量子ニューラルネットワークの全てのパラメータに関する勾配の計算 Computing the gradients with respect to all parameters of a quantum neural network using a single circuit ( http://arxiv.org/abs/2307.08167v2 ) ライセンス: Link先を確認	Guang Ping He	(参考訳) パラメータシフト規則を用いて量子ニューラルネットワークの勾配を計算する場合、ネットワークの1つの調整可能なパラメータに対して、勾配に対してコスト関数を2回計算する必要がある。パラメータの総数が多い場合には、計算のための量子回路を何度も調整して実行しなければならない。本稿では,回路深度を小さくし,古典レジスタを小さくした単一回路のみを用いた勾配計算手法を提案する。また、実量子ハードウェアとシミュレータの両方で実験により、回路が従来の手法よりもはるかに短い時間でコンパイルできるという利点があり、結果として全体の実行速度が向上することを示した。 When computing the gradients of a quantum neural network using the parameter-shift rule, the cost function needs to be calculated twice for the gradient with respect to a single adjustable parameter of the network. When the total number of parameters is high, the quantum circuit for the computation has to be adjusted and run for many times. Here we propose an approach to compute all the gradients using a single circuit only, with a much reduced circuit depth and less classical registers. We also demonstrate experimentally, on both real quantum hardware and simulator, that our approach has the advantages that the circuit takes a significantly shorter time to compile than the conventional approach, resulting in a speedup on the total runtime.	翻訳日:2023-07-24 15:08:17 公開日:2023-07-20
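The abstract above starts from the conventional parameter-shift rule, where each parameter costs two evaluations of the cost function. The sketch below illustrates only that baseline, not the single-circuit method the paper proposes. It assumes a toy cost, the product of $\cos\theta_i$, which is what measuring $Z \otimes \cdots \otimes Z$ gives after applying $R_Y(\theta_i)$ to each qubit of $|0\ldots0\rangle$; the names `toy_cost` and `parameter_shift_gradient` are hypothetical.

```python
import math


def toy_cost(thetas):
    """Toy stand-in for a circuit expectation value: prod_i cos(theta_i)."""
    value = 1.0
    for theta in thetas:
        value *= math.cos(theta)
    return value


def parameter_shift_gradient(cost, thetas):
    """Conventional parameter-shift rule with a shift of pi/2.

    For each parameter the cost is evaluated twice, at theta_i + pi/2 and
    theta_i - pi/2, and the derivative is half the difference.
    Returns the gradient and the number of cost evaluations used.
    """
    shift = math.pi / 2
    gradient = []
    evaluations = 0
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += shift
        minus[i] -= shift
        gradient.append((cost(plus) - cost(minus)) / 2.0)
        evaluations += 2
    return gradient, evaluations


thetas = [0.3, 1.1, -0.7]
gradient, evaluations = parameter_shift_gradient(toy_cost, thetas)
print(gradient, evaluations)
```

With $P$ parameters this baseline needs $2P$ cost evaluations, each of which is a separate circuit run on hardware; that linear growth is the overhead the abstract says the single-circuit approach is meant to avoid.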
# 認知症患者の日常生活行動パターンの変化を識別するためのマルコフ連鎖モデル A Markov Chain Model for Identifying Changes in Daily Activity Patterns of People Living with Dementia ( http://arxiv.org/abs/2307.11126v1 ) ライセンス: Link先を確認	Nan Fletcher-Lloyd, Alina-Irina Serban, Magdalena Kolanko, David Wingfield, Danielle Wilson, Ramin Nilforooshan, Payam Barnaghi, and Eyal Soreq	(参考訳) 栄養失調と脱水は認知症患者(plwd)の認知機能低下と、健常者と比較して入院率の上昇に強く関係している。食事や飲酒行動の過度な変化は、しばしば栄養失調や脱水を引き起こし、認知と機能低下の進行を加速させ、生活の質を著しく低下させる。残念ながら、このような変化を客観的に検出する方法は確立されていない。本稿では,iot(internet of things, モノのインターネット)技術を用いて,73世帯のplwdから収集した家庭内モニタリングデータを分析した。コロナウイルス2019(COVID-19)パンデミックは、PLWDの行動習慣、特に飲食習慣を劇的に変えたことがこれまで示されていた。新型コロナウイルスのパンデミックを自然実験として使用し,499日間連続観察されたPLWD21世帯のキッチン活動の変化を線形混合効果モデルを用いて検討した。昼間のキッチン活動の増加と夜間のキッチン活動の著しい減少(t(147) = -2.90, p < 0.001)を報告した。さらに, 遠隔監視データに適用したマルコフモデルを用いたplwdの挙動変化を, 直接計測できない行動のプロキシとして検出する新しい解析手法を提案する。これらの結果は, PLWDの自然的環境におけるモニタリングの改善と, 反応性から積極的ケアへの転換の道を開くものである。 Malnutrition and dehydration are strongly associated with increased cognitive and functional decline in people living with dementia (PLWD), as well as an increased rate of hospitalisations in comparison to their healthy counterparts. Extreme changes in eating and drinking behaviours can often lead to malnutrition and dehydration, accelerating the progression of cognitive and functional decline and resulting in a marked reduction in quality of life. Unfortunately, there are currently no established methods by which to objectively detect such changes. Here, we present the findings of an extensive quantitative analysis conducted on in-home monitoring data collected from 73 households of PLWD using Internet of Things technologies. The Coronavirus 2019 (COVID-19) pandemic has previously been shown to have dramatically altered the behavioural habits, particularly the eating and drinking habits, of PLWD. 
Using the COVID-19 pandemic as a natural experiment, we conducted linear mixed-effects modelling to examine changes in mean kitchen activity within a subset of 21 households of PLWD that were continuously monitored for 499 days. We report an observable increase in day-time kitchen activity and a significant decrease in night-time kitchen activity (t(147) = -2.90, p < 0.001). We further propose a novel analytical approach to detecting changes in behaviours of PLWD using Markov modelling applied to remote monitoring data as a proxy for behaviours that cannot be directly measured. Together, these results pave the way to introduce improvements into the monitoring of PLWD in naturalistic settings and for shifting from reactive to proactive care.	翻訳日:2023-07-24 14:51:44 公開日:2023-07-20
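As a rough illustration of Markov modelling on monitoring data, the sketch below estimates first-order transition probabilities from a sequence of discrete sensor states and compares two periods. Everything here is an assumption made for illustration: the state labels, the function names `transition_matrix` and `transition_change`, and the use of total-variation distance as a change score. The abstract does not specify the authors' actual model or change measure.

```python
from collections import Counter, defaultdict


def transition_matrix(sequence):
    """Maximum-likelihood first-order transition probabilities.

    Returns a nested dict: probabilities[current_state][next_state].
    """
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def transition_change(p, q):
    """Largest total-variation distance between rows present in both estimates."""
    worst = 0.0
    for state in set(p) & set(q):
        targets = set(p[state]) | set(q[state])
        tv = 0.5 * sum(
            abs(p[state].get(t, 0.0) - q[state].get(t, 0.0)) for t in targets
        )
        worst = max(worst, tv)
    return worst


# Hypothetical room-level activity sequences for two observation periods.
before = ["kitchen", "lounge", "kitchen", "kitchen", "bedroom"]
after = ["kitchen", "lounge", "lounge", "kitchen", "lounge"]
p_before = transition_matrix(before)
p_after = transition_matrix(after)
print(transition_change(p_before, p_after))
```

A score near 0 would suggest similar movement patterns in the two periods and a score near 1 a large shift; in practice any threshold would need to be calibrated against real data.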
# 弱コヒーレント状態を用いたベル不等式違反 Violating Bell inequality using weak coherent states ( http://arxiv.org/abs/2307.11123v1 ) ライセンス: Link先を確認	Moslem Mahdavifar and S. M. Hashemi Rafsanjani	(参考訳) 連続波レーザーを用いた2光子干渉の実験的検討を行う。連続波レーザーによる位相ランダム化弱コヒーレント状態を用いたCHSH不等式違反を示す。我々の実装は、古典的ソースと見なされるソースの量子的性質を明らかにするためのアプローチとして機能する。 We present an experimental investigation of two-photon interference using a continuous-wave laser. We demonstrate the violation of the CHSH inequality using the phase randomized weak coherent states from a continuous wave laser. Our implementation serves as an approach to reveal the quantum nature of a source that is considered to be a classical source.	翻訳日:2023-07-24 14:51:18 公開日:2023-07-20
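For orientation, the CHSH quantity mentioned above combines four correlation values, and any local hidden-variable model satisfies $|S| \le 2$. The sketch below evaluates $S$ for the textbook ideal correlation $E(a, b) = -\cos(a - b)$ at the standard angle choice. This is an assumption chosen for illustration only; it does not model the phase-randomized weak coherent states or the two-photon interference setup of the experiment.

```python
import math


def ideal_correlation(a, b):
    """Textbook ideal correlation E(a, b) = -cos(a - b) (illustrative assumption)."""
    return -math.cos(a - b)


def chsh(a, a_prime, b, b_prime, correlation=ideal_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (
        correlation(a, b)
        - correlation(a, b_prime)
        + correlation(a_prime, b)
        + correlation(a_prime, b_prime)
    )


# Standard measurement angles that maximise |S| for the ideal correlation.
s = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(s), "classical bound:", 2.0)
```

For these angles $|S| = 2\sqrt{2}$, which exceeds the classical bound of 2; an experiment reports a violation when its measured $|S|$ exceeds 2 by more than its statistical uncertainty.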
# 銀河画像の確率的デコンボリューションのための拡散モデル Diffusion Models for Probabilistic Deconvolution of Galaxy Images ( http://arxiv.org/abs/2307.11122v1 ) ライセンス: Link先を確認	Zhiwei Xue, Yuhang Li, Yash Patel, Jeffrey Regier	(参考訳) 望遠鏡は特定の点拡散関数(PSF)で画像をキャプチャする。 PSFデコンボリューション(PSF deconvolution)として知られる問題である、よりシャープなPSFで画像がどのように見えるかを推測することは、PSFコンボリューションが可逆変換ではないために不適切である。深部生成モデルがPSFの非畳み込みに訴えているのは、PSFと結合した場合に観測結果が生成される可能性のある候補画像の後方分布を推測できるためである。しかしながら、VAEやGANのような古典的な深層生成モデルは、しばしば不十分なサンプル多様性をもたらす。代替として,銀河画像のpsf分解のための分類器フリー条件拡散モデルを提案する。この拡散モデルが条件付きvaeと比較して可能なデコンボリューションのより広い多様性を捉えることを実証する。 Telescopes capture images with a particular point spread function (PSF). Inferring what an image would have looked like with a much sharper PSF, a problem known as PSF deconvolution, is ill-posed because PSF convolution is not an invertible transformation. Deep generative models are appealing for PSF deconvolution because they can infer a posterior distribution over candidate images that, if convolved with the PSF, could have generated the observation. However, classical deep generative models such as VAEs and GANs often provide inadequate sample diversity. As an alternative, we propose a classifier-free conditional diffusion model for PSF deconvolution of galaxy images. We demonstrate that this diffusion model captures a greater diversity of possible deconvolutions compared to a conditional VAE.	翻訳日:2023-07-24 14:51:13 公開日:2023-07-20
# 計算倫理から道徳へ : 意思決定アルゴリズムが道徳原理の出現、最適な行動の存在、そしてそれを発見する能力を理解するのにどのように役立つか From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it ( http://arxiv.org/abs/2307.11119v1 ) ライセンス: Link先を確認	Eduardo C. Garrido-Merchán, Sara Lumbreras-Sancho	(参考訳) 本稿では,計算倫理観から得られた具体的な洞察を提供することにより,道徳性を自然化するための進化的倫理の努力を付け加える。本稿では,人工知能の最も成功したパラダイムの一つである強化学習に基づく,人間の意思決定のスタイル化モデルを提案する。強化学習に関する主要な概念が提示された後、倫理の進化的説明を照らし出すことのできる、特に有用な並列性が描かれた。具体的には,エージェントの条件を考慮した最適な政策(あるいは,客観的な倫理的原則)の存在について検討する。さらに、この方針が試行錯誤によってどのように学習可能かを示し、強化学習の文脈でよく知られた2つの定理の仮説を支持する。結論として,提案する枠組みを拡大して,人間行動の他の潜在的に興味深い分野について形式化の観点から検討する。 This paper adds to the efforts of evolutionary ethics to naturalize morality by providing specific insights derived from a computational ethics view. We propose a stylized model of human decision-making, which is based on Reinforcement Learning, one of the most successful paradigms in Artificial Intelligence. After the main concepts related to Reinforcement Learning have been presented, some particularly useful parallels are drawn that can illuminate evolutionary accounts of ethics. Specifically, we investigate the existence of an optimal policy (or, as we will refer to, objective ethical principles) given the conditions of an agent. In addition, we will show how this policy is learnable by means of trial and error, supporting our hypotheses on two well-known theorems in the context of Reinforcement Learning. We conclude by discussing how the proposed framework can be enlarged to study other potentially interesting areas of human behavior from a formalizable perspective.	翻訳日:2023-07-24 14:51:00 公開日:2023-07-20
# 発散物除去のための運動量を用いた拡散サンプリング Diffusion Sampling with Momentum for Mitigating Divergence Artifacts ( http://arxiv.org/abs/2307.11118v1 ) ライセンス: Link先を確認	Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn	(参考訳) 画像生成における拡散モデルの顕著な成功にもかかわらず、遅いサンプリングは永続的な問題である。サンプリングプロセスの高速化を目的として,先行研究はODE/SDEとして拡散サンプリングを改良し,高次数値法を導入した。しかしながら、これらの手法はしばしば分岐アーティファクトを生成し、特に少ないサンプリングステップで達成可能な加速を制限する。本稿では,これらのアーティファクトの潜在的な原因を調査し,これらの方法の小さな安定性領域が主な原因である可能性を示唆する。この問題に対処するため,我々は2つの新しい手法を提案する。最初の手法は、最適化を改善する有名な手法である重球運動量(hb)を既存の拡散数値法に組み込んで安定化領域を広げることである。また、結果の方法が一階収束であることも証明する。第2のテクニックは、GHVB(Generalized Heavy Ball)と呼ばれ、精度とアーティファクトの抑制のトレードオフを提供する新しい高階法を構築する。提案手法は,低ステップサンプリングのためのピクセルベースおよび潜在拡散モデルの両方において,最先端の拡散ソルバを上回って,アーティファクトの削減と画質向上に極めて有効であることを示す。本研究は,今後の拡散作業のための数値手法の設計に関する新たな知見を提供する。 Despite the remarkable success of diffusion models in image generation, slow sampling remains a persistent issue. To accelerate the sampling process, prior studies have reformulated diffusion sampling as an ODE/SDE and introduced higher-order numerical methods. However, these methods often produce divergence artifacts, especially with a low number of sampling steps, which limits the achievable acceleration. In this paper, we investigate the potential causes of these artifacts and suggest that the small stability regions of these methods could be the principal cause. To address this issue, we propose two novel techniques. The first technique involves the incorporation of Heavy Ball (HB) momentum, a well-known technique for improving optimization, into existing diffusion numerical methods to expand their stability regions. We also prove that the resulting methods have first-order convergence. The second technique, called Generalized Heavy Ball (GHVB), constructs a new high-order method that offers a variable trade-off between accuracy and artifact suppression. 
Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling. Our research provides novel insights into the design of numerical methods for future diffusion work.	翻訳日:2023-07-24 14:50:42 公開日:2023-07-20
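The abstract above borrows Heavy Ball momentum from optimization and links it to larger stability regions. The sketch below shows only the classical optimization version of that idea on a one-dimensional quadratic; it is not the paper's diffusion sampler or its GHVB method. The function name `heavy_ball`, the test problem, and the step and momentum values are all assumptions chosen to make the stability effect visible.

```python
def heavy_ball(grad, x0, step, momentum, iterations):
    """Classical Heavy Ball iteration:

        x_{k+1} = x_k - step * grad(x_k) + momentum * (x_k - x_{k-1})

    With momentum = 0 this reduces to plain gradient descent.
    """
    x_prev = x0
    x = x0
    for _ in range(iterations):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x


def quadratic_grad(x):
    """Gradient of f(x) = 0.5 * x**2, whose minimiser is x = 0."""
    return x


# A step of 2.5 is outside the stable range of plain gradient descent on this
# problem (each update multiplies x by 1 - 2.5 = -1.5), so the iterate blows up.
plain = heavy_ball(quadratic_grad, 1.0, step=2.5, momentum=0.0, iterations=50)

# With momentum 0.5 the same step size gives a contracting iteration.
with_momentum = heavy_ball(quadratic_grad, 1.0, step=2.5, momentum=0.5, iterations=200)

print(abs(plain), abs(with_momentum))
```

In this toy setting momentum lets a step size that would otherwise diverge remain stable, which is the intuition behind applying it to numerical solvers that take few, large sampling steps.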
# 知能の性質 Nature of Intelligence ( http://arxiv.org/abs/2307.11114v1 ) ライセンス: Link先を確認	Barco Jie You	(参考訳) 人間の脳は人間の知能の基盤である。人間の脳をシミュレートすることで、人工知能は学習能力を持つ計算モデルを構築し、人間のレベルに近づくインテリジェントなタスクを実行する。ディープニューラルネットワークは、データの表現を学習し、多くの認識領域における最先端を改善するために複数の計算層から構成される。しかし、人間とAIの両方で一般的に表現される知性の本質は不明である。ここでは、インテリジェンスの性質は、空間と時間に関するデータセット間の機能的関係を確立することにより、システムのエントロピーを最小限に抑える一連の数学的機能的プロセスであることを示す。人間とAIは、エネルギーを消費する強化された方法でこれらのエントロピー還元プロセスを実装することで知性を達成した。この仮説により、言語、無意識、意識の数学的モデルを確立し、神経科学によって発見され、AI工学によって達成される証拠を予測する。さらに、宇宙の全体エントロピーは保守的であると結論付け、知性は、宇宙にもともと存在するが空間と時間の間で分離された物理的または情報的に連結されたデータセットによってエントロピーを減少させる自発的なプロセスと対向する。このエッセイは、宇宙と私たちを人間としてより深く理解するための出発点であり、人間の知性にかかわる高度なAIモデルを達成するためのものであるべきです。さらに、このエッセイは、エントロピーをより効率的なエネルギー消費方法で減らせば、人間よりも高度な知性が存在するべきだと主張している。 The human brain is the substrate for human intelligence. By simulating the human brain, artificial intelligence builds computational models that have learning capabilities and perform intelligent tasks approaching the human level. Deep neural networks consist of multiple computation layers to learn representations of data and improve the state-of-the-art in many recognition domains. However, the essence of intelligence commonly represented by both humans and AI is unknown. Here, we show that the nature of intelligence is a series of mathematically functional processes that minimize system entropy by establishing functional relationships between datasets over space and time. Humans and AI have achieved intelligence by implementing these entropy-reducing processes in a reinforced manner that consumes energy. With this hypothesis, we establish mathematical models of language, unconsciousness and consciousness, predicting the evidence to be found by neuroscience and achieved by AI engineering. 
Furthermore, a conclusion is made that the total entropy of the universe is conservative, and intelligence counters the spontaneous processes to decrease entropy by physically or informationally connecting datasets that originally exist in the universe but are separated across space and time. This essay should be a starting point for a deeper understanding of the universe and us as human beings and for achieving sophisticated AI models that are tantamount to human intelligence or even superior. Furthermore, this essay argues that more advanced intelligence than humans should exist if only it reduces entropy in a more efficient energy-consuming way.	翻訳日:2023-07-24 14:50:20 公開日:2023-07-20
# 昆虫の微細な分類のためのトランスフォーマーと畳み込みモデルの比較 Comparison between transformers and convolutional models for fine-grained classification of insects ( http://arxiv.org/abs/2307.11112v1 ) ライセンス: Link先を確認	Rita Pucci, Vincent J. Kalkman, Dan Stowell	(参考訳) 識別的特徴を見つけるのが難しいため、きめ細かい分類は難しい。この問題は、同じ分類群内の種を特定することに適用されると悪化する。これは種がしばしば形態的特徴を共有しており、区別が難しいためである。我々はInsectaの分類学クラスを考える。昆虫の識別は多くの生態系の基盤にある住民の1つであるため、生物多様性監視に不可欠である。市民科学は、野生の昆虫の画像を収集し、専門家がすべての国で改良された分布地図を作成する可能性を秘めている。何十億もの画像が自動的に分類され、ディープニューラルネットワークアルゴリズムが、きめ細かいタスクのために研究されている主要なテクニックの1つです。 SOTAでは、ディープラーニングアルゴリズムの分野は非常に実りが多いので、どのようにアルゴリズムを識別するか? 我々は,オドナタとコレオプテアの順序に着目し,コンピュータビジョンにおいてよく知られた2つの階層構造,トランスフォーマー層と畳み込み層を分析するための初期比較研究を提案する。我々は,完全トランスフォーマーベースであるT2TViT,完全畳み込みベースであるEfficientNet,ハイブリッドであるViTAEの性能を比較した。我々は,3つのモデルの性能を同一条件で分析し,性別,推論時間,およびスマートフォンからの画像のバランスの取れないデータセットを用いて,形態ごとの性能を評価する。 3種類のモデルすべてで高い性能を観察したが,本解析により,ハイブリッドモデルが完全畳み込みベースモデルおよび完全トランスフォーマベースモデルよりも精度において優れ,完全トランスフォーマベースモデルが推論速度において他モデルよりも優れており,トランスフォーマがサンプル不足に対して頑健であり,推論時間が速いことを証明した。 Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class. This is because species are often sharing morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild giving the possibility to experts to create improved distribution maps in all countries. We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how to identify the algorithm to use? 
We focus on Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-base, EfficientNet, a fully convolutional-base, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performances with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-base and fully transformer-base models on accuracy performance and the fully transformer-base model outperforms the others on inference speed and, these prove the transformer to be robust to the shortage of samples and to be faster at inference time.	翻訳日:2023-07-24 14:49:54 公開日:2023-07-20
# ドメイン一般化のための平坦性を考慮した最小化 Flatness-Aware Minimization for Domain Generalization ( http://arxiv.org/abs/2307.11108v1 ) ライセンス: Link先を確認	Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cui	(参考訳) ドメイン一般化(DG)は、未知の分布シフトの下でよく一般化する堅牢なモデルを学ぶことを目指している。 DGの重要な側面として、オプティマイザの選択は深く調査されていない。現在、ほとんどのDGメソッドは広く使われているベンチマークであるDomainBedに従っており、すべてのデータセットのデフォルトオプティマイザとしてAdamを使用している。しかし、Adamは必ずしも現在のDGメソッドやデータセットの大部分にとって最適な選択肢ではない。本研究では,損失景観平坦性の観点から,ゼロ次および1次平坦性を同時に最適化できる領域一般化のための平坦性認識最小化(fad)を提案する。本稿では,FADのアウト・オブ・ディストリビューション(OOD)の一般化誤差と収束に関する理論的解析を行う。実験の結果,様々なDGデータセット上でのFADの優位性を示した。さらに、FADは、他のゼロ階および1階の平坦度対応最適化手法と比較して、フラットな最適性を発見することができることを確認した。 Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.	翻訳日:2023-07-24 14:49:24 公開日:2023-07-20
# 双対性を持つ1次元スピン模型における弱普遍性、量子多体傷、異常無限温度自己相関 Weak universality, quantum many-body scars and anomalous infinite-temperature autocorrelations in a one-dimensional spin model with duality ( http://arxiv.org/abs/2307.11161v1 ) ライセンス: Link先を確認	Adithi Udupa, Samudra Sur, Arnab Sen and Diptiman Sen	(参考訳) 3スピン相互作用と横磁場 $h$ を持つ1次元スピン1/2モデルについて検討した。このモデルは、$Z_2 \times Z_2$ 対称性を持ち、$h$ と $1/h$ の双対性を持つことが知られている。自己双対点の $h=1$ は連続相転移を持つ量子臨界点である。臨界指数 $z$, $\beta$, $\gamma$, $\nu$ と中心電荷 $c$ を厳密対角化を用いて数値的に計算する。 $z$ と $c$ の両方が $1$ に等しいことは、臨界点がマージナル演算子を持つ共形場理論によって支配されていることを示唆している。 3スピンモデルは、4状態ポッツモデルと2つの分離した横磁場イジングモデルの中間にあたる有効結合を持つアシュキン・テラー臨界性を示す。エネルギー準位間隔解析は、モデルが可積分でないことを示す。偶数のサイト数と周期境界条件を持つ系には、スペクトル中央に厳密なゼロエネルギー固有状態が存在し、その数はシステムサイズとともに指数関数的に増加する。これらの固有状態の部分集合は、$h$ の値とは独立な波動関数を持ち、特異なエンタングルメント構造を持つため、量子多体傷と考えられる。このような量子多体傷の数は、少なくともシステムサイズに対して線形にスケールする。最後に,開放系の一端に近いサイトでの無限温度自己相関関数について検討する。自己相関関数のいくつかは異常な時間緩和を示し、$h \gg 1$ または $h \ll 1$ であれば、顕著な振動と非常に小さな減衰率を持つ。 $h$ が臨界点に近い場合、自己相関関数は端のサイトのものを除いて急速に 0 に減衰する。 We study a one-dimensional spin-1/2 model with three-spin interactions and a transverse magnetic field $h$. The model is known to have a $Z_2 \times Z_2$ symmetry, and a duality between $h$ and $1/h$. The self-dual point at $h=1$ is a quantum critical point with a continuous phase transition. We compute the critical exponents $z$, $\beta$, $\gamma$ and $\nu$, and the central charge $c$ numerically using exact diagonalization. We find that both $z$ and $c$ are equal to $1$, implying that the critical point is governed by a conformal field theory with a marginal operator. The three-spin model exhibits Ashkin-Teller criticality with an effective coupling that is intermediate between four-state Potts model and two decoupled transverse field Ising models. An energy level spacing analysis shows that the model is not integrable. For a system with an even number of sites and periodic boundary conditions, there are exact mid-spectrum zero-energy eigenstates whose number grows exponentially with the system size. 
A subset of these eigenstates have wave functions which are independent of the value of $h$ and have unusual entanglement structure; hence these can be considered to be quantum many-body scars. The number of such quantum scars scales at least linearly with system size. Finally, we study the infinite-temperature autocorrelation functions at sites close to one end of an open system. We find that some of the autocorrelators relax anomalously in time, with pronounced oscillations and very small decay rates if $h \gg 1$ or $h \ll 1$. If $h$ is close to the critical point, the autocorrelators decay quickly to zero except for an autocorrelator at the end site.	翻訳日:2023-07-24 14:44:25 公開日:2023-07-20
# 時間最適多ビットゲート:複雑度、効率的ヒューリスティックおよびゲート時間境界 Time-optimal multi-qubit gates: Complexity, efficient heuristic and gate-time bounds ( http://arxiv.org/abs/2307.11160v1 ) ライセンス: Link先を確認	Pascal Baßler, Markus Heinrich, Martin Kliesch	(参考訳) マルチキュービット相互作用は量子コンピューティングハードウェアにおいて遍在しており、マルチキュービットのエンタングリングゲートを生成することができる。このようなゲートは従来の2量子ビットゲートよりも有利である。本研究では,マルチキュービットIsing型相互作用と単一キュービットゲートを用いた量子ゲート合成に着目した。これらの相互作用はグローバルZZゲート(GZZゲート)を生成することができる。時間最適マルチキュービットゲートの合成はNPハードであることを示す。しかし、ある仮定の下では、効率的な合成を可能にする時間最適マルチキュービットゲートの明示的な構成を提供する。これらの構築されたマルチキュービットゲートは一定のゲート時間を持ち、線形なシングルキュービットゲート層で実装できる。さらに、高速なマルチキュービットゲートを合成するための多項式ランタイムを持つヒューリスティックアルゴリズムを提供する。最後に、最適GZZゲート時間において、下限と上限を証明した。さらに、任意の GZZ ゲートは n 個の量子ビットの時間 O(n) で実行可能であると推測する。我々はこの主張を理論的および数値的な結果で支持する。 Multi-qubit interactions are omnipresent in quantum computing hardware, and they can generate multi-qubit entangling gates. Such gates promise advantages over traditional two-qubit gates. In this work, we focus on the quantum gate synthesis with multi-qubit Ising-type interactions and single-qubit gates. These interactions can generate global ZZ-gates (GZZ gates). We show that the synthesis of time-optimal multi-qubit gates is NP-hard. However, under certain assumptions we provide explicit constructions of time-optimal multi-qubit gates allowing for efficient synthesis. These constructed multi-qubit gates have a constant gate time and can be implemented with linear single-qubit gate layers. Moreover, a heuristic algorithm with polynomial runtime for synthesizing fast multi-qubit gates is provided. Finally, we prove lower and upper bounds on the optimal GZZ gate-time. Furthermore, we conjecture that any GZZ gate can be executed in a time O(n) for n qubits. We support this claim with theoretical and numerical results.	翻訳日:2023-07-24 14:43:31 公開日:2023-07-20
# ハードウェアインスパイアしたゼロノイズ外挿を用いた変分固有解法における量子ゲート誤差の軽減 Mitigating Quantum Gate Errors for Variational Eigensolvers Using Hardware-Inspired Zero-Noise Extrapolation ( http://arxiv.org/abs/2307.11156v1 ) ライセンス: Link先を確認	Alexey Uvarov, Daniil Rabinovich, Olga Lakhmanskaya, Kirill Lakhmanskiy, Jacob Biamonte, Soumik Adhikary	(参考訳) 変分量子アルゴリズムは、現代の量子アルゴリズム研究の基盤として登場した。これらのアルゴリズムの実践的実装は、体系的エラーに対してある程度の堅牢性を提供するが、確率的エラーとコヒーレンス時間に制限があるため、性能の低下を示す。本研究では,ゼロノイズ外挿を用いた変分アルゴリズムの量子ゲート誤差を緩和する手法を開発した。回路の誤差強度を制御できる実験可能な手法を提案する。物理量子デバイスにおけるゲートエラーが、異なる量子ビットとペアに対して不均質に分布するという事実を利用する。その結果、回路内の抽象量子ビットを物理デバイスにマッピングする方法に基づいて、異なる回路誤差和を達成できる。回路誤差和 (CES) に関して, 変動的アプローチにおける推定エネルギーは概ね線形であることがわかった。したがって、CESをゼロにすると、エネルギー-CESデータによる線形フィットはノイズのない変動アルゴリズムによって推定されるエネルギーを近似することができる。これを数値的に証明し、回路内の2ビットゲートが正則グラフの形で配置されている場合、近似が正確であることを示す。 Variational quantum algorithms have emerged as a cornerstone of contemporary quantum algorithms research. Practical implementations of these algorithms, despite offering certain levels of robustness against systematic errors, show a decline in performance due to the presence of stochastic errors and limited coherence time. In this work, we develop a recipe for mitigating quantum gate errors for variational algorithms using zero-noise extrapolation. We introduce an experimentally amenable method to control error strength in the circuit. We utilise the fact that gate errors in a physical quantum device are distributed inhomogeneously over different qubits and pairs thereof. As a result, one can achieve different circuit error sums based on the manner in which abstract qubits in the circuit are mapped to a physical device. We find that the estimated energy in the variational approach is approximately linear with respect to the circuit error sum (CES). Consequently, a linear fit through the energy-CES data, when extrapolated to zero CES, can approximate the energy estimated by a noiseless variational algorithm. 
We demonstrate this numerically and further prove that the approximation is exact if the two-qubit gates in the circuits are arranged in the form of a regular graph.	翻訳日:2023-07-24 14:43:07 公開日:2023-07-20
# 一般ゲーム表現に向けて:ゲームピクセルをコンテンツとスタイルに分解する Towards General Game Representations: Decomposing Games Pixels into Content and Style ( http://arxiv.org/abs/2307.11141v1 ) ライセンス: Link先を確認	Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis and Georgios N. Yannakakis	(参考訳) オンスクリーンゲーム映像には、プレイヤーがゲームをプレイしたり経験したりする際に処理する豊富なコンテキスト情報が含まれている。ゲームにおけるピクセル表現の学習は、ゲームプレイエージェント、手続き的コンテンツ生成、プレイヤーのモデリングなど、いくつかの下流タスクにわたる人工知能の恩恵を受ける。しかし、これらの手法の一般化性は、学習された表現は、類似のゲーム力学を持つゲーム間で理想的に共有されるべきである。例えば、1つのゲームでトレーニングされたゲームプレイングエージェントは、リトレーニングなしで同様のゲームでうまく動作することができる。本稿では,コンテンツ埋め込みやスタイル埋め込みに潜伏空間を分解することで,コンピュータビジョンエンコーダの汎用性について考察する。ゴールは、下流タスクにとって重要なゲームコンテンツに関して、同じジャンルのゲーム間のドメインギャップを最小化し、グラフィックスタイルの違いを無視することである。予め学習した視覚トランスフォーマエンコーダとゲームジャンルに基づく分解技術を用いて,異なるコンテンツとスタイル埋め込みを得る。本研究は, コンテント抽出能力を維持しつつ, 複数のゲームにまたがるスタイルの不変性を実現していることを示す。提案するコンテンツとスタイルの分解は,下流タスクとは無関係に,ゲーム環境にまたがるより良い一般化能力を提供する。 On-screen game footage contains rich contextual information that players process when playing and experiencing a game. Learning pixel representations of games can benefit artificial intelligence across several downstream tasks including game-playing agents, procedural content generation, and player modelling. The generalizability of these methods, however, remains a challenge, as learned representations should ideally be shared across games with similar game mechanics. This could allow, for instance, game-playing agents trained on one game to perform well in similar games with no re-training. This paper explores how generalizable pre-trained computer vision encoders can be for such tasks, by decomposing the latent space into content embeddings and style embeddings. The goal is to minimize the domain gap between games of the same genre when it comes to game content critical for downstream tasks, and ignore differences in graphical style. We employ a pre-trained Vision Transformer encoder and a decomposition technique based on game genres to obtain separate content and style embeddings. 
Our findings show that the decomposed embeddings achieve style invariance across multiple games while still maintaining strong content extraction capabilities. We argue that the proposed decomposition of content and style offers better generalization capacities across game environments independently of the downstream task.	翻訳日:2023-07-24 14:42:29 公開日:2023-07-20
# RCVaR:産業報告データを用いたサイバー攻撃コスト推定のための経済的なアプローチ RCVaR: an Economic Approach to Estimate Cyberattacks Costs using Data from Industry Reports ( http://arxiv.org/abs/2307.11140v1 ) ライセンス: Link先を確認	Muriel Figueredo Franco, Fabian Künzler, Jan von der Assen, Chao Feng, Burkhard Stiller	(参考訳) デジタル化は、破壊的なサイバー攻撃の犠牲者となる企業のビジネス機会とリスクを高める。したがって、リスクエクスポージャーとサイバーセキュリティ戦略の管理は、競争力のある市場で生き残りたいデジタル企業にとって不可欠である。しかし、企業固有のリスクの理解と関連するコストの定量化は簡単ではない。現在のアプローチでは、サイバーセキュリティへの影響を個別かつ定量的に見積もることはできない。限られた資源と技術的専門知識のため、中小企業や大企業でさえ、サイバー攻撃の暴露の定量化に苦慮している。そのため、サイバー攻撃による損失の理解を支援するため、新たなアプローチをとらなければならない。この記事では、公開サイバーセキュリティレポートから実際の情報を用いて、サイバーセキュリティコストを見積もるための経済的なアプローチであるReal Cyber Value at Risk (RCVaR)を紹介する。 RCVaRは、様々な情報源から最も重要なサイバーリスク要因を特定し、それらの定量的結果を組み合わせて、企業のサイバー攻撃コストを見積もる。さらに、RCVaRは、確率に基づくシミュレーションだけでなく、過去の実世界のデータに基づくコストとリスク推定を実現するために、現在の手法を拡張している。未確認データに対するアプローチの評価は、サイバーリスクの予測と管理におけるRCVaRの精度と効率を示している。したがって、RCVaRはサイバーセキュリティ計画とリスク管理プロセスに価値ある追加であることを示している。 Digitization increases business opportunities and the risk of companies being victims of devastating cyberattacks. Therefore, managing risk exposure and cybersecurity strategies is essential for digitized companies that want to survive in competitive markets. However, understanding company-specific risks and quantifying their associated costs is not trivial. Current approaches fail to provide individualized and quantitative monetary estimations of cybersecurity impacts. Due to limited resources and technical expertise, SMEs and even large companies are affected and struggle to quantify their cyberattack exposure. Therefore, novel approaches must be placed to support the understanding of the financial loss due to cyberattacks. This article introduces the Real Cyber Value at Risk (RCVaR), an economical approach for estimating cybersecurity costs using real-world information from public cybersecurity reports. 
RCVaR identifies the most significant cyber risk factors from various sources and combines their quantitative results to estimate specific cyberattacks costs for companies. Furthermore, RCVaR extends current methods to achieve cost and risk estimations based on historical real-world data instead of only probability-based simulations. The evaluation of the approach on unseen data shows the accuracy and efficiency of the RCVaR in predicting and managing cyber risks. Thus, it shows that the RCVaR is a valuable addition to cybersecurity planning and risk management processes.	翻訳日:2023-07-24 14:41:53 公開日:2023-07-20
# Of Models and Tin Men - 大規模言語モデルを用いたAIアライメントにおける主エージェント問題の行動経済学的研究 Of Models and Tin Men -- a behavioural economics study of principal-agent problems in AI alignment using large-language models ( http://arxiv.org/abs/2307.11137v1 ) ライセンス: Link先を確認	Steve Phelps and Rebecca Ranson	(参考訳) AIアライメント(AI Alignment)は、単一のデザイナと、設計者がエージェントの動作をその目的と一致させようとする人工エージェントとの相互作用としてしばしば提示される。一般的に事前学習される大言語モデル(llm)でインスタンス化されたエージェントの出現により、現実世界では設計者とエージェントの間に1対1の対応がなく、多くのエージェント(人工的および人間的の両方)は異質な値を持っているため、aiの安全性の本質的な側面を捉えていないと論じる。したがって、AIの安全性には経済的側面があり、プリンシパルエージェントの問題が発生する可能性が高い。主エージェント問題紛争は、情報非対称性とエージェントの効用とその主役間の固有の不整合が原因で発生し、エージェントを訓練を通じて所望の実用機能を採用するように強制することによって、この固有の不整合は克服できない。我々は、プリンシパルエージェント問題の根底にある仮定は、実際の状況において事前訓練されたaiモデルを含む安全問題の本質を捉えるために不可欠であると主張する。 AIの安全性に対して実証的なアプローチをとることで、GPTモデルが主エージェント間の衝突に対してどのように反応するかを調査する。 GPT-3.5 と GPT-4 をベースとしたエージェントは,簡単なオンラインショッピングタスクで主目的を上回り,主エージェントの対立の明確な証拠を示す。驚くべきことに、初期のGPT-3.5モデルは情報非対称性の変化に応じてよりニュアンスな振る舞いを示すが、後期のGPT-4モデルはそれ以前のアライメントに固執する。この結果は、経済学の原則をアライメントプロセスに組み込むことの重要性を強調している。 AI Alignment is often presented as an interaction between a single designer and an artificial agent in which the designer attempts to ensure the agent's behavior is consistent with its purpose, and risks arise solely because of conflicts caused by inadvertent misalignment between the utility function intended by the designer and the resulting internal utility function of the agent. With the advent of agents instantiated with large-language models (LLMs), which are typically pre-trained, we argue this does not capture the essential aspects of AI safety because in the real world there is not a one-to-one correspondence between designer and agent, and the many agents, both artificial and human, have heterogeneous values. Therefore, there is an economic aspect to AI safety and the principal-agent problem is likely to arise. 
In a principal-agent problem conflict arises because of information asymmetry together with inherent misalignment between the utility of the agent and its principal, and this inherent misalignment cannot be overcome by coercing the agent into adopting a desired utility function through training. We argue the assumptions underlying principal-agent problems are crucial to capturing the essence of safety problems involving pre-trained AI models in real-world situations. Taking an empirical approach to AI safety, we investigate how GPT models respond in principal-agent conflicts. We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task, showing clear evidence of principal-agent conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced behaviour in response to changes in information asymmetry, whereas the later GPT-4 model is more rigid in adhering to its prior alignment. Our results highlight the importance of incorporating principles from economics into the alignment process.	翻訳日:2023-07-24 14:41:34 公開日:2023-07-20
# 色コードのフロッケ Floquetifying the Colour Code ( http://arxiv.org/abs/2307.11136v1 ) ライセンス: Link先を確認	Alex Townsend-Teague, Julio Magdalena de la Fuente, Markus Kesselring	(参考訳) フロッケ符号は、最近発見された量子誤り訂正符号の一種である。それらは安定化器符号とサブシステム符号の一般化であり、コードの論理的なパウリ演算子を時間とともに動的に変化させることで考えられる。本研究では、ZX計算を用いて、既知の安定化器符号と同等の定義可能な意味での新しいフロケ符号を生成する。特に、色コードと同等のFloquetコードを見つけるが、それを実装するのに必要なすべての測定値が1つか2であるという利点がある。特に、量子ビットは正方格子上にレイアウトすることもできる。これは、色コードをフォールトトレラントに実装することの現在の困難を回避し、他のよく研究されたコードよりもその利点を保ちつつ、さらにフロッケコードのみに特有な機能から利益を得ることができる。より高いレベルでは、arxiv:2303.08829のように、この研究は'静的'安定化コードとサブシステムコードと'動的'フローケットコードの関係に光を当てている。 Floquet codes are a recently discovered type of quantum error correction code. They can be thought of as generalising stabilizer codes and subsystem codes, by allowing the logical Pauli operators of the code to vary dynamically over time. In this work, we use the ZX-calculus to create new Floquet codes that are in a definable sense equivalent to known stabilizer codes. In particular, we find a Floquet code that is equivalent to the colour code, but has the advantage that all measurements required to implement it are of weight one or two. Notably, the qubits can even be laid out on a square lattice. This circumvents current difficulties with implementing the colour code fault-tolerantly, while preserving its advantages over other well-studied codes, and could furthermore allow one to benefit from extra features exclusive to Floquet codes. On a higher level, as in arXiv:2303.08829, this work shines a light on the relationship between 'static' stabilizer and subsystem codes and 'dynamic' Floquet codes; at first glance the latter seems a significant generalisation of the former, but in the case of the codes that we find here, the difference is essentially just a few basic ZX-diagram deformations.	翻訳日:2023-07-24 14:40:58 公開日:2023-07-20
# 条件付き生成逆ニューラルネットワークによる周波数認識型光コヒーレンス断層画像の超解像 Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network ( http://arxiv.org/abs/2307.11130v1 ) ライセンス: Link先を確認	Xueshen Li, Zhenxing Dong, Hongshan Liu, Jennifer J. Kang-Mieler, Yuye Ling and Yu Gan	(参考訳) 光コヒーレンストモグラフィー(OCT)は、心臓科や眼科などの分野において、幅広い医療画像に基づく診断と治療を刺激している。このような応用は深層学習に基づく超解像技術によってさらに促進され、モルフォロジー構造を解く能力が向上する。しかし、既存の深層学習に基づく手法は、画像再構成における空間分布のみに焦点をあて、周波数バイアスをもたらす。この制限を克服するために、周波数変換、周波数スキップ接続、周波数アライメントの3つの重要な周波数ベースのモジュールと周波数ベースの損失関数を条件付き生成対向ネットワーク(cGAN)に統合する周波数対応超解像フレームワークを提案する。既存の冠動脈octデータセットから大規模定量的解析を行い,既存の深層学習フレームワークに対する提案フレームワークの優位性を実証した。さらに,魚角膜画像およびラット網膜画像に適用し,眼画像における形態的詳細を超解き明かす能力を示すことにより,我々の枠組みの一般化性を確認した。 Optical coherence tomography (OCT) has stimulated a wide range of medical image-based diagnosis and treatment in fields such as cardiology and ophthalmology. Such applications can be further facilitated by deep learning-based super-resolution technology, which improves the capability of resolving morphological structures. However, existing deep learning-based method only focuses on spatial distribution and disregard frequency fidelity in image reconstruction, leading to a frequency bias. To overcome this limitation, we propose a frequency-aware super-resolution framework that integrates three critical frequency-based modules (i.e., frequency transformation, frequency skip connection, and frequency alignment) and frequency-based loss function into a conditional generative adversarial network (cGAN). We conducted a large-scale quantitative study from an existing coronary OCT dataset to demonstrate the superiority of our proposed framework over existing deep learning frameworks. In addition, we confirmed the generalizability of our framework by applying it to fish corneal images and rat retinal images, demonstrating its capability to super-resolve morphological details in eye imaging.	翻訳日:2023-07-24 14:40:37 公開日:2023-07-20
# 近似コンピューティングサーベイ(ii) : アプリケーション固有・アーキテクチャ近似技術とその応用 Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications ( http://arxiv.org/abs/2307.11128v1 ) ライセンス: Link先を確認	Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris	(参考訳) 人工知能(AI)やDSP(Digital Signal Processing)といったドメインからの計算集約的なアプリケーションのデプロイが困難なため、コンピューティングシステムコミュニティは新たな設計アプローチを模索せざるを得なくなった。近似コンピューティングは、エネルギー効率と/または性能を改善するために、システムの設計における結果の質を調整できる新しいソリューションとして現れる。この急進的なパラダイムシフトは学術と産業の両方から興味を惹きつけ、様々な設計層(システムダウンから集積回路まで)における近似技術と方法論に大きな研究をもたらした。過去10年間にわたる近似コンピューティングの幅広い魅力に動機づけられ、重要な側面(用語や応用など)をカバーし、従来のコンピューティングスタックの全層から最先端の近似テクニックをレビューするために、2部的な調査を実施しました。本調査のパートIIでは,資源効率の高いプロセッサ/アクセラレータ・システムの設計を対象とする,アプリケーション固有およびアーキテクチャ近似技術の技術的詳細を分類,提示する。さらに,近似計算の応用スペクトルを詳細に分析し,オープンな課題と今後の方向性について考察する。 The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. 
In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.	翻訳日:2023-07-24 14:40:18 公開日:2023-07-20
# 暗黙的内在性下での密度マッチングによる合成制御法 Synthetic Control Methods by Density Matching under Implicit Endogeneitiy ( http://arxiv.org/abs/2307.11127v1 ) ライセンス: Link先を確認	Masahiro Kato and Akari Ohda and Masaaki Imaizumi and Kenichiro McAlinn	(参考訳) 合成制御法(scms)は比較事例研究において因果推論の重要なツールとなっている。 SCMの基本的な考え方は、未処理単位の観測結果の重み付け和を用いて、処理単位の対実結果を評価することである。合成制御 (SC) の精度は因果効果を推定するために重要であり, SC重量の推定が多くの研究の焦点となっている。本稿では,まず,既存のscmが非処理単位の結果と反事実的結果のモデルにおける誤差項の相関関係である暗黙的内在性問題に苦しむことを指摘した。この問題は因果効果推定器にバイアスをもたらすことを示した。次に,非処理単位の密度(すなわち混合モデル)の重み付け平均値によって処理単位の出力密度を近似できることを仮定して,密度マッチングに基づく新しいscmを提案する。この仮定に基づき,治療結果のモーメントと未治療結果のモーメントの重み付け和を一致させてsc重みを推定する。提案手法は既存手法よりも3つの利点がある。まず, 混合モデルの仮定により, 推定器は漸近的に偏りがない。第2に,漸近的不偏性により,反事実予測の平均二乗誤差を低減できる。第3に, 本手法は, 期待値だけでなく, 処理効果の完全な密度を生成し, SCMの適用範囲を広げる。提案手法の有効性を実証するための実験結果を提供する。 Synthetic control methods (SCMs) have become a crucial tool for causal inference in comparative case studies. The fundamental idea of SCMs is to estimate counterfactual outcomes for a treated unit by using a weighted sum of observed outcomes from untreated units. The accuracy of the synthetic control (SC) is critical for estimating the causal effect, and hence, the estimation of SC weights has been the focus of much research. In this paper, we first point out that existing SCMs suffer from an implicit endogeneity problem, which is the correlation between the outcomes of untreated units and the error term in the model of a counterfactual outcome. We show that this problem yields a bias in the causal effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the densities of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching moments of treated outcomes and the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods. 
First, our estimator is asymptotically unbiased under the assumption of the mixture model. Second, due to the asymptotic unbiasedness, we can reduce the mean squared error for counterfactual prediction. Third, our method generates full densities of the treatment effect, not only expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.	翻訳日:2023-07-24 14:39:56 公開日:2023-07-20
# 疫学コホート作成が管理医療データを用いたホームレスの機械学習予測と警察との対話結果に及ぼす影響 The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data ( http://arxiv.org/abs/2307.11211v1 ) ライセンス: Link先を確認	Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier	(参考訳) 背景: 精神病はホームレスや警察との交流などの有害な結果につながる可能性があり, これらの有害な結果につながる出来事の理解が重要である。予測モデルは、そのような悪影響のリスクのある個人を特定するのに役立つかもしれない。ロジスティック回帰(LR)や機械学習(ML)モデルを備えた固定された観測窓コホートを使用することで、適応的およびパーセル化されたウィンドウと比較して低い性能が得られる。方法:2013年4月1日から2018年3月31日まで,カナダ,アルバータ州カルガリーにおいて,中毒性ないし精神疾患(amh)と診断された240,219人の管理医療データセットを用いた。コホートはホームレスと警察の相互作用に関連する要因を特定するために2年間続いた。予測モデルに対するフレキシブルウィンドウの利点を理解するために、代替のコホートが作成された。そして,2つのコホートにおいて,ランダム森林(RF)を含むLRおよびMLモデルと極勾配上昇(XGBoost)を比較した。結果: 237,602人中 0.8% (1,800) が最初のホームレスとなり,0.32% (759) が237,141人の間で最初の警察活動が報告された。男性性(AORs: H=1.51, P=2.52)、物質障害(AORs: H=3.70, P=2.83)、精神科医の訪問(AORs: H=1.44, P=1.49)、薬物乱用(AORs: H=2.67, P=1.83)は初期ホームレス(H)と警察の相互作用(P)に関連していた。 XGBoostは, フレキシブルな手法(初期ホームレスに対する感度=91%, AUC=90%, 初期警察との相互作用に対する感度=90%, AUC=89%)で優れた性能を示した。 Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction and understanding of the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such adverse outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance when compared with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising of 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. 
Then LR and ML models, including random forests (RF), and extreme gradient boosting (XGBoost) were compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, while 0.32% (759) reported initial police interaction among 237,141 individuals. Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity =91%, AUC =90% for initial homelessness, and sensitivity =90%, AUC=89% for initial police interaction). Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.	翻訳日:2023-07-24 14:32:19 公開日:2023-07-20
# 臨床トライアルアクティブラーニング Clinical Trial Active Learning ( http://arxiv.org/abs/2307.11209v1 ) ライセンス: Link先を確認	Zoe Fowler, Kiran Kokilepersaud, Mohit Prabhushankar, and Ghassan AlRegib	(参考訳) 本稿では,臨床試験における非独立同一分布(非i.i.d.)のデータ構造を考慮した,アクティブラーニングへの新しいアプローチを提案する。臨床試験には,レトロスペクティブ(後ろ向き)とプロスペクティブ(前向き)の2つのタイプがある。レトロスペクティブな臨床試験は治療後にデータを分析し,プロスペクティブな臨床試験は治療の進行中にデータを収集する。通常,アクティブラーニングの手法はトレーニングサンプルを選択する際にデータセットがi.i.d.であると仮定するが,臨床試験の場合,治療の結果,現在の来院時に収集されたデータと過去の来院時のデータの間に依存性が生じる。そこで我々は,従来のアクティブラーニング手法の限界を克服するプロスペクティブ・アクティブラーニングを提案し,光コヒーレンス断層撮影(OCT)画像の疾患検出に適用する。ここでは,画像が収集された時点で条件付けることでi.i.d.仮定を成立させる。提案手法を,本質的にレトロスペクティブであると位置づける従来のアクティブラーニングのパラダイムと比較する。プロスペクティブ・アクティブラーニングが,2種類のテスト設定においてレトロスペクティブ・アクティブラーニングより優れていることを示す。 This paper presents a novel approach to active learning that takes into account the non-independent and identically distributed (non-i.i.d.) structure of a clinical trial setting. There exists two types of clinical trials: retrospective and prospective. Retrospective clinical trials analyze data after treatment has been performed; prospective clinical trials collect data as treatment is ongoing. Typically, active learning approaches assume the dataset is i.i.d. when selecting training samples; however, in the case of clinical trials, treatment results in a dependency between the data collected at the current and past visits. Thus, we propose prospective active learning to overcome the limitations present in traditional active learning methods and apply it to disease detection in optical coherence tomography (OCT) images, where we condition on the time an image was collected to enforce the i.i.d. assumption. We compare our proposed method to the traditional active learning paradigm, which we refer to as retrospective in nature. We demonstrate that prospective active learning outperforms retrospective active learning in two different types of test settings.	翻訳日:2023-07-24 14:31:35 公開日:2023-07-20
# 存在論的根拠と言語非依存の知識グラフを目指して Towards Ontologically Grounded and Language-Agnostic Knowledge Graphs ( http://arxiv.org/abs/2307.11206v1 ) ライセンス: Link先を確認	Walid S. Saba	(参考訳) 知識グラフ(KG)は、リコメンデーションエンジン、検索、質問応答システムなどのアプリケーションにおける事実情報の表現の標準技術となっている。しかし、KGsの継続的な更新、および異なるドメインからのKGsと異なる言語でのKGsの統合は、依然として大きな課題である。ここでの示唆は、抽象オブジェクトの再構築と、概念と型の間の存在論的区別の認識によって、KG統合の困難を緩和できる存在論的根拠と言語に依存しない表現にたどり着くことである。 Knowledge graphs (KGs) have become the standard technology for the representation of factual information in applications such as recommendation engines, search, and question-answering systems. However, the continual updating of KGs, as well as the integration of KGs from different domains and KGs in different languages, remains to be a major challenge. What we suggest here is that by a reification of abstract objects and by acknowledging the ontological distinction between concepts and types, we arrive at an ontologically grounded and language-agnostic representation that can alleviate the difficulties in KG integration.	翻訳日:2023-07-24 14:31:18 公開日:2023-07-20
# マイクロメカニカル共振器に結合した可変駆動型RabiダイマーにおけるLandau Zener転移 Photon-assisted Landau Zener transitions in a tunable driven Rabi dimer coupled to a micromechanical resonator ( http://arxiv.org/abs/2307.11200v1 ) ライセンス: Link先を確認	Daniel Melvin, Fulu Zheng, Kewei Sun, Zhengjie Tan, Yang Zhao	(参考訳) 多重ダヴィドフ D$_2$ Ansatz と時間依存性の変動原理を用いて,ハイブリッド量子電磁力学デバイスにおける光子アシスト型ランダウ・ツェナー(LZ)遷移と量子ビット操作について検討した。ラビダイマーとしてモデル化されたこのデバイスは、2つの相互作用する伝送線路共振器からなり、それぞれがキュービットに結合される。独立調和場によって駆動される量子ビットは、フォノンモードで模倣されたマイクロメカニカル共振器によってさらに変調される。 2つの独立駆動場がキュービット力学に与える影響を慎重に検討した。システム内のエネルギー図と共振器上の光子数移動を解析し、単一フォノンモードの影響を考慮してLZ遷移と量子ビットダイナミクスの挙動を説明する。その結果、低いフォノン周波数は、特に駆動場がない場合、量子ビットのダイナミクスを変化させることができることが示され、強いフォノンカップリング強度は、高いフォノンエネルギーの流入によって、量子ビットのダイナミクスを著しく揺るがすことができる。特に、光子周波数のみが量子ビット偏波の振動周波数に影響する。この研究は、光子とフォノンがラビダイマーモデルで果たす重要な役割を明らかにするものである。 Employing the multiple Davydov D$_2$ Ansatz with the time-dependent variational principle, we have investigated photon-assisted Landau-Zener (LZ) transitions and qubit manipulation in a hybrid quantum electrodynamics device. Modelled as a Rabi dimer, the device comprises of two interacting transmission-line resonators, each coupled to a qubit. The qubits, driven by independent harmonic fields, are further modulated by a micromechanical resonator mimicked by a phonon mode. The impacts of two independent driving fields on the qubit dynamics are carefully examined. The energy diagram of the system and the photon number mobilization on the resonators are analyzed to explain the behaviour of the LZ transitions and qubit dynamics while taking into account the influence of the single phonon mode. Results show that low phonon frequencies can alter the qubit dynamics, particularly in the absence of the driving fields, and a strong phonon coupling strength can significantly perturb the qubit dynamics thanks to a high influx of phonon energy. Notably, only the photon frequency affects the oscillation frequency of qubit polarization. 
This study unveils the imperative roles that photons and phonons play in the Rabi dimer model.	翻訳日:2023-07-24 14:31:06 公開日:2023-07-20
# Lefschetz thimble計算による実時間経路積分における量子トンネルの新しい図形 A new picture of quantum tunneling in the real-time path integral from Lefschetz thimble calculations ( http://arxiv.org/abs/2307.11199v1 ) ライセンス: Link先を確認	Jun Nishimura, Katsuta Sakai, Atis Yosprakob	(参考訳) 量子トンネルは想像時間経路積分形式論においてインスタントンによって記述できることはよく知られている。しかし、実時間経路積分形式論におけるその記述は不可解である。ここでは、量子トンネルは一般に、ピカール=レフシェッツ理論を用いて同定できる複雑なサドル点の寄与によって特徴づけられるという声明を確立する。簡単な量子力学系のモンテカルロシミュレーションを実行し、一般化されたレフシェッツ・ティンブル法で符号問題を克服することでこれを明示的に実証する。複素鞍点の寄与が、原理実験によって測定できる物理量である時刻$t$で評価されるエルミート座標作用素 $\hat{x}$ の複素 ``weak value'' に現れることを数値的に確認する。また, 古典力学への変遷についても考察する。 It is well known that quantum tunneling can be described by instantons in the imaginary-time path integral formalism. However, its description in the real-time path integral formalism has been elusive. Here we establish a statement that quantum tunneling can be characterized in general by the contribution of complex saddle points, which can be identified by using the Picard-Lefschetz theory. We demonstrate this explicitly by performing Monte Carlo simulations of simple quantum mechanical systems, overcoming the sign problem by the generalized Lefschetz thimble method. We confirm numerically that the contribution of complex saddle points manifests itself in a complex ``weak value'' of the Hermitian coordinate operator $\hat{x}$ evaluated at time $t$, which is a physical quantity that can be measured by experiments in principle. We also discuss the transition to classical dynamics based on our picture.	翻訳日:2023-07-24 14:30:44 公開日:2023-07-20
# Heuristic Hyperparameter Choice for Image Anomaly Detection ( http://arxiv.org/abs/2307.11197v1 ) License: see linked page	Zeyu Jiang, João P. C. Bertoldo, Etienne Decencière	Anomaly detection (AD) in images is a fundamental computer vision problem in which deep neural networks are used to identify images that deviate significantly from normality. Deep features extracted from pretrained models have proven essential for AD based on multivariate Gaussian distribution analysis. However, since models are usually pretrained on a large dataset for classification tasks, such as ImageNet, they may produce many features that are redundant for AD, which increases computational cost and degrades performance. We aim to reduce the dimension of these features with Negated Principal Component Analysis (NPCA). We therefore propose a heuristic for choosing the hyperparameter of the NPCA algorithm so as to keep as few feature components as possible while still ensuring good performance.	Translated: 2023-07-24 14:30:28 Published: 2023-07-20
# Signal Amplification Assisted by Multiple Sideband Interference in 1D Waveguide QED Systems ( http://arxiv.org/abs/2307.11174v1 ) License: see linked page	Kuan-Ting Lin, Ting Hsu, Yu-Chen Lin, Io-Chun Hoi and Guin-Dar Lin	This study theoretically investigates signal amplification resulting from multiple Rabi sideband coherence in a one-dimensional waveguide quantum electrodynamical system. Specifically, we explore the behavior of a transmon strongly driven by a coherent microwave field through a semi-infinite waveguide. To understand the underlying mechanisms of amplification, we develop a theory that explicitly takes into account multiple dressed sidebands under a strong driving field, and analyze the reflection amplitude of the probe signal. Our findings show that amplification can be related either to population inversion or, in some cases without population inversion, to constructive interference among multiple sidebands. We further examine the effect of qubit dephasing during the amplification process.	Translated: 2023-07-24 14:30:15 Published: 2023-07-20
# UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition ( http://arxiv.org/abs/2307.11170v1 ) License: see linked page	Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Geouriot	Pre-trained transformer language models (LMs) have in recent years become the dominant paradigm in applied NLP. These models have achieved state-of-the-art performance on tasks such as information extraction, question answering, sentiment analysis, document classification and many others. In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that require the integration of domain-specific knowledge as well as statistical modelling of language. In particular, research in this area has focused on the question of how best to construct LMs that take into account not only the patterns of token distribution in medical text, but also the wealth of structured information contained in terminology resources such as the UMLS. This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS. This allows graph-based learning objectives to be combined with masked-language pre-training. Preliminary results from experiments in the extension of pre-trained LMs, as well as in training from scratch, show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.	Translated: 2023-07-24 14:30:04 Published: 2023-07-20
# Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment ( http://arxiv.org/abs/2307.11166v1 ) License: see linked page	Vaddadi Sai Rahul, Debajyoti Chakraborty	We leverage the fast physics simulator MuJoCo to run tasks in a continuous control environment and report details such as the observation space, action space, and rewards for each task. We benchmark value-based methods for continuous control by comparing Q-learning and SARSA through a discretization approach and, using them as baselines, progressively move to DDPG, a state-of-the-art deep policy gradient method. Over a large number of episodes, Q-learning outscored SARSA, but DDPG outperformed both within a small number of episodes. Lastly, we fine-tuned the model hyperparameters, hoping to squeeze out more performance while using less time and fewer resources. We anticipated that the new design for DDPG would vastly improve performance, yet after only a few episodes we were able to achieve decent average rewards. We expect performance to improve further given adequate time and computational resources.	Translated: 2023-07-24 14:29:44 Published: 2023-07-20
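The comparison in the abstract above rests on two tabular update rules applied to a discretized state space. The following is a minimal sketch of that idea, not the authors' code: the function names, the uniform binning scheme and the table layout (`q[state][action]`) are assumptions made here for illustration.

```python
def discretize(value, low, high, n_bins):
    """Map a continuous value to a bin index in [0, n_bins - 1] (uniform bins)."""
    clipped = min(max(value, low), high)        # clamp out-of-range observations
    ratio = (clipped - low) / (high - low)      # position in [0, 1]
    return min(int(ratio * n_bins), n_bins - 1)  # the top edge falls in the last bin


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

The only difference between the two rules is the bootstrap term: Q-learning uses the greedy value of the next state, while SARSA uses the value of the action the behaviour policy actually selected. DDPG, the third method in the abstract, works directly with continuous actions and is not sketched here.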
# Robustness and eventual slow decay of bound states of interacting microwave photons in the Google Quantum AI experiment ( http://arxiv.org/abs/2307.11164v1 ) License: see linked page	Federica Maria Surace, Olexei Motrunich	Integrable models are characterized by the existence of stable excitations that can propagate indefinitely without decaying. This includes multi-magnon bound states in the celebrated XXZ spin chain model and its integrable Floquet counterpart. A recent Google Quantum AI experiment [A. Morvan et al., Nature 612, 240 (2022)] realizing the Floquet model demonstrated the persistence of such collective excitations even when the integrability is broken: this observation is at odds with the expectation of ergodic dynamics in generic non-integrable systems. Here we study the spectrum of the model realized in the experiment using exact diagonalization and physical arguments. We find that isolated bands corresponding to the descendants of the exact bound states of the integrable model are clearly observable in the spectrum for a large range of system sizes. However, our numerical analysis of the localization properties of the eigenstates suggests that the bound states become unstable in the thermodynamic limit. A perturbative estimate of the decay rate agrees with the prediction of an eventual instability for large system sizes.	Translated: 2023-07-24 14:29:27 Published: 2023-07-20
# On the Fisher-Rao Gradient of the Evidence Lower Bound ( http://arxiv.org/abs/2307.11249v1 ) License: see linked page	Nihat Ay, Jesse van Oostrum	This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound (ELBO), which plays a crucial role in the theory of the Variational Autoencoder, the Helmholtz Machine and the Free Energy Principle. The natural gradient of the ELBO is related to the natural gradient of the Kullback-Leibler divergence from a target distribution, the prime objective function of learning. Based on invariance properties of gradients within information geometry, conditions on the underlying model are provided that ensure the equivalence of minimising the prime objective function and maximising the ELBO.	Translated: 2023-07-24 14:24:10 Published: 2023-07-20
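As background for the abstract above, the standard decomposition that defines the ELBO can be written as follows. The notation is an assumption chosen here (a latent-variable model $p_\theta(x, z)$ and a variational distribution $q_\phi(z \mid x)$); the article's own formulation in terms of a target distribution may differ.

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x, z) - \log q_\phi(z \mid x)\right]}_{\mathrm{ELBO}(\theta, \phi;\, x)}
  + \mathrm{KL}\!\left(q_\phi(z \mid x)\,\middle\|\,p_\theta(z \mid x)\right)
```

Because the Kullback-Leibler term is non-negative, the ELBO never exceeds $\log p_\theta(x)$, which is why maximising it is used as a surrogate for maximum likelihood.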
# On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments ( http://arxiv.org/abs/2307.11242v1 ) License: see linked page	Shruti R. Kulkarni, Aaron Young, Prasanna Date, Narasinga Rao Miniskar, Jeffrey S. Vetter, Farah Fahim, Benjamin Parpillon, Jennet Dickinson, Nhan Tran, Jieun Yoo, Corrinne Mills, Morris Swartz, Petar Maksimovic, Catherine D. Schuman, Alice Bean	This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters the sensor data based on the particle's transverse momentum, with the goal of reducing the amount of data sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices, from data encoding to optimal hyperparameters of the training algorithm, for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.	Translated: 2023-07-24 14:23:57 Published: 2023-07-20
# Edgewise outliers of network indexed signals ( http://arxiv.org/abs/2307.11239v1 ) License: see linked page	Christopher Rieser and Anne Ruiz-Gazen and Christine Thomas-Agnan	We consider models for network indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outlier detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances, that can be used to fix detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application to simulated data shows the value of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.	Translated: 2023-07-24 14:23:39 Published: 2023-07-20
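The abstract above builds its detection rules on squared Mahalanobis distances compared against a distributional threshold. The sketch below shows only the classical, non-robust, per-observation version of that idea in two dimensions; it is not the edgewise MCD method of the paper. The function names are hypothetical, and the chi-square threshold assumes Gaussian data with known mean and covariance.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis_2d(point, mean, cov):
    """d^2 = (x - m)^T S^{-1} (x - m), using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_2dof(prob):
    """Quantile of the chi-square law with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so it inverts exactly.
    """
    return -2.0 * math.log(1.0 - prob)


def is_outlier(point, mean, cov, prob=0.975):
    """Flag a point whose squared distance exceeds the chi-square threshold."""
    return squared_mahalanobis_2d(point, mean, cov) > chi2_quantile_2dof(prob)
```

In practice the mean and covariance would come from a robust estimator such as MCD rather than from `mean_and_cov_2d`, since the sample estimates are themselves distorted by the outliers one is trying to find.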
# QDC: Quantum Diffusion Convolution Kernels on Graphs ( http://arxiv.org/abs/2307.11234v1 ) License: see linked page	Thomas Markovich	Graph convolutional neural networks (GCNs) operate by aggregating messages over local neighborhoods given the prediction task of interest. Many GCNs can be understood as a form of generalized diffusion of input features on the graph, and significant work has been dedicated to improving predictive accuracy by altering the ways of message passing. In this work, we propose a new convolution kernel that effectively rewires the graph according to the occupation correlations of the vertices by trading the generalized diffusion paradigm for the propagation of a quantum particle over the graph. We term this new convolution kernel the Quantum Diffusion Convolution (QDC) operator. In addition, we introduce a multiscale variant that combines messages from the QDC operator and the traditional combinatorial Laplacian. To understand our method, we explore the spectral dependence of homophily and the importance of quantum dynamics in the construction of a bandpass filter. Through these studies, as well as experiments on a range of datasets, we observe that QDC improves predictive performance on widely used benchmark datasets when compared to similar methods.	Translated: 2023-07-24 14:23:30 Published: 2023-07-20
# Quantum computing for finance ( http://arxiv.org/abs/2307.11230v1 ) License: see linked page	Dylan Herman, Cody Googin, Xiaoyuan Liu, Yue Sun, Alexey Galda, Ilya Safro, Marco Pistoia, Yuri Alexeev	Quantum computers are expected to surpass the computational capabilities of classical computers and have a transformative impact on numerous industry sectors. We present a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on stochastic modeling, optimization, and machine learning. This Review is aimed at physicists, so it outlines the classical techniques used by the financial industry and discusses the potential advantages and limitations of quantum techniques. Finally, we look at the challenges that physicists could help tackle.	Translated: 2023-07-24 14:23:12 Published: 2023-07-20
# From Adaptive Query Release to Machine Unlearning ( http://arxiv.org/abs/2307.11228v1 ) License: see linked page	Enayat Ullah, Raman Arora	We formalize the problem of machine unlearning as the design of efficient unlearning algorithms corresponding to learning algorithms that perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses, respectively. Finally, we give generalizations of the above from one unlearning request to dynamic streams consisting of insertions and deletions.	Translated: 2023-07-24 14:23:04 Published: 2023-07-20
# UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models ( http://arxiv.org/abs/2307.11227v1 ) License: see linked page	Xin Li, Sima Behpour, Thang Doan, Wenbin He, Liang Gou, Liu Ren	In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset in a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, and largely ignored the power of text features. In this work, we argue that, with proper design, the joint feature space of vision and text can yield a better representation for data pre-selection. To this end, we introduce UP-DP, a simple yet effective unsupervised prompt learning approach that adapts vision-language models, like BLIP-2, for data pre-selection. Specifically, with the BLIP-2 parameters frozen, we train text prompts to extract joint features with improved representation, ensuring a diverse cluster structure that covers the entire dataset. We extensively compare our method with the state of the art on seven benchmark datasets in different settings, achieving a performance gain of up to 20%. Interestingly, the prompts learned from one dataset demonstrate significant generalizability and can be applied directly to enhance the feature extraction of BLIP-2 on other datasets. To the best of our knowledge, UP-DP is the first work to incorporate unsupervised prompt learning in a vision-language model for data pre-selection.	Translated: 2023-07-24 14:22:14 Published: 2023-07-20
# Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models ( http://arxiv.org/abs/2307.11224v1 ) License: see linked page	Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao	Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating various textual inputs into numerical representations, thereby capturing the semantic essence of the text. While these models are not exclusively designed for text generation, they excel in applications such as dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of a high-quality pairwise and triplet dataset. It underlines the crucial role of data cleaning in dataset preparation, gives in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Textual Embedding Benchmark (MTEB).	Translated: 2023-07-24 14:21:49 Published: 2023-07-20
# Multi-Observables and Multi-Instruments ( http://arxiv.org/abs/2307.11223v1 ) License: see linked page	Stan Gudder	This article introduces the concepts of multi-observables and multi-instruments in quantum mechanics. A multi-observable $A$ (multi-instrument $\mathcal{I}$) has an outcome space of the form $\Omega =\Omega _1\times\cdots\times\Omega _n$ and is denoted by $A_{x_1\cdots x_n}$ ($\mathcal{I}_{x_1\cdots x_n}$) where $(x_1,\ldots ,x_n)\in\Omega$. We also call $A$ ($\mathcal{I}$) an $n$-observable ($n$-instrument) and when $n=2$ we call $A$ ($\mathcal{I}$) a bi-observable (bi-instrument). We point out that bi-observables and bi-instruments have been considered in past literature, but the more general case appears to be new. In particular, two observables (instruments) have been defined to coexist or be compatible if they possess a joint bi-observable (bi-instrument). We extend this definition to $n$ observables and $n$ instruments by considering joint marginals of $n$-observables and joint reduced marginals of $n$-instruments. We show that an $n$-instrument measures a unique $n$-observable and that, if a finite number of instruments coexist, then their measured observables coexist. We prove that there is a close relationship between a nontrivial $n$-observable and its parts. Moreover, a similar result holds for instruments. We next show that a natural definition for the tensor product of a finite number of instruments exists and possesses reasonable properties. We then discuss sequential products of a finite number of observables and instruments. We present various examples such as Kraus, Holevo and Lüders instruments.	Translated: 2023-07-24 14:21:34 Published: 2023-07-20
# FairMobi-Net: A Fairness-aware Deep Learning Model for Urban Mobility Flow Generation ( http://arxiv.org/abs/2307.11214v1 ) License: see linked page	Zhewei Liu, Lipai Huang, Chao Fan, Ali Mostafavi	Generating realistic human flows across regions is essential for our understanding of urban structures and population activity patterns, enabling important applications in the fields of urban planning and management. However, a notable shortcoming of most existing mobility generation methodologies is the neglect of prediction fairness, which can result in underestimation of mobility flows across regions with vulnerable population groups, potentially leading to inequitable resource distribution and infrastructure development. To overcome this limitation, our study presents a novel, fairness-aware deep learning model, FairMobi-Net, for inter-region human flow prediction. The FairMobi-Net model incorporates a fairness loss into the loss function and employs a hybrid approach, merging binary classification and numerical regression techniques for human flow prediction. We validate the FairMobi-Net model using comprehensive human mobility datasets from four U.S. cities, predicting human flow at the census-tract level. Our findings reveal that the FairMobi-Net model outperforms state-of-the-art models (such as the DeepGravity model) in producing more accurate and equitable human flow predictions across a variety of region pairs, regardless of regional income differences. The model maintains a high degree of accuracy consistently across diverse regions, addressing the previous fairness concern. Further analysis of feature importance elucidates the impact of physical distances and road network structures on human flows across regions. With fairness as its touchstone, the model and results provide researchers and practitioners across the fields of urban sciences, transportation engineering, and computing with an effective tool for accurate generation of human mobility flows across regions.	Translated: 2023-07-24 14:20:49 Published: 2023-07-20
# SimCol3D -- 3D Reconstruction during Colonoscopy Challenge ( http://arxiv.org/abs/2307.11261v1 ) License: see linked page	Anita Rau, Sophia Bano, Yueming Jin, Pablo Azagra, Javier Morlana, Edward Sanderson, Bogdan J. Matuszewski, Jae Young Lee, Dong-Jae Lee, Erez Posner, Netanel Frank, Varshini Elangovan, Sista Raviteja, Zhengwen Li, Jiquan Liu, Seenivasan Lalithkumar, Mobarakol Islam, Hongliang Ren, José M.M. Montiel, Danail Stoyanov	Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains unsolved due to numerous factors, such as self-occlusion, reflective surfaces, lack of texture, and tissue deformation, that limit feature-based methods. Learning-based approaches hold promise as robust alternatives, but necessitate extensive datasets. By establishing a benchmark, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world, with representatives from academia and industry, participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction in virtual colonoscopy is robustly solvable, while pose estimation remains an open research question.	Translated: 2023-07-24 14:11:32 Published: 2023-07-20
# Towards Non-Parametric Models for Confidence Aware Image Prediction from Low Data using Gaussian Processes ( http://arxiv.org/abs/2307.11259v1 ) License: see linked page	Nikhil U. Shinde, Florian Richter, Michael C. Yip	The ability to envision future states is crucial to informed decision making while interacting with dynamic environments. With cameras providing a prevalent and information-rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state-of-the-art methods typically train large parametric models for their predictions. Though often able to predict accurately, these models rely on the availability of large training datasets to converge to useful solutions. In this paper we focus on the problem of predicting future images of an image sequence from very little training data. To approach this problem, we use non-parametric models to take a probabilistic approach to image prediction. We generate probability distributions over sequentially predicted images and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and their ability to readily incorporate new training data online. We showcase our method by successfully predicting future frames of a smooth fluid simulation environment.	Translated: 2023-07-24 14:11:13 Published: 2023-07-20
# Robust ground-state energy estimation under depolarizing noise ( http://arxiv.org/abs/2307.11257v1 ) License: see linked page	Zhiyan Ding and Yulong Dong and Yu Tong and Lin Lin	We present a novel ground-state energy estimation algorithm that is robust under global depolarizing error channels. Building upon the recently developed Quantum Complex Exponential Least Squares (QCELS) algorithm [Ding, Lin, PRX Quantum, 4, 020331, 2023], our new approach incorporates significant advancements to ensure robust estimation while maintaining a cost that is polynomial in the precision. By leveraging the spectral gap of the Hamiltonian effectively, our algorithm overcomes limitations observed in previous methods such as quantum phase estimation (QPE) and robust phase estimation (RPE). Going beyond global depolarizing error channels, our work underscores the significance and practical advantages of using randomized compiling techniques to tailor quantum noise towards depolarizing error channels. Our research demonstrates the feasibility of ground-state energy estimation in the presence of depolarizing noise, offering potential advancements in error correction and algorithmic-level error mitigation for quantum algorithms.	Translated: 2023-07-24 14:10:58 Published: 2023-07-20
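To make the setting of the abstract above concrete, here is a toy classical sketch of fitting a single complex exponential to a damped signal. It is not the authors' algorithm. It assumes, for illustration only, that the noise simply rescales the ideal signal $e^{-i\lambda t}$ by $e^{-\alpha t}$, and that the initial state is an exact eigenstate. The names `noisy_signal`, `estimate_energy` and `damping` are hypothetical.

```python
import cmath


def noisy_signal(energy, times, damping):
    """Toy model Z(t) = exp(-damping * t) * exp(-i * energy * t)."""
    return [cmath.exp(-damping * t) * cmath.exp(-1j * energy * t) for t in times]


def estimate_energy(signal, times, grid):
    """Grid search for theta maximizing |sum_n Z_n * exp(i * theta * t_n)|.

    For a fit of the form r * exp(-i * theta * t) with a free complex
    amplitude r, maximizing this magnitude is equivalent to minimizing the
    least-squares error between the fit and the data.
    """
    best_theta, best_score = None, -1.0
    for theta in grid:
        score = abs(sum(z * cmath.exp(1j * theta * t) for z, t in zip(signal, times)))
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta


times = [float(n) for n in range(20)]
grid = [-1.0 + 0.001 * k for k in range(2001)]   # candidate energies in [-1, 1]
estimate = estimate_energy(noisy_signal(-0.7, times, 0.05), times, grid)
```

In this toy model the damping shrinks the signal but does not move the location of the maximum, because at `theta == energy` every term in the sum is real and positive. With finitely many measurement shots the smaller signal would still cost precision, which this noiseless sketch does not capture.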
# バイオメディカル自然言語処理におけるフェデレーション学習の体系的評価 A Systematic Evaluation of Federated Learning on Biomedical Natural Language Processing ( http://arxiv.org/abs/2307.11254v1 ) ライセンス: Link先を確認	Le Peng, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, Ju Sun	(参考訳) BERTやGPTのような言語モデル(LM)は自然言語処理(NLP)に革命をもたらした。しかし、プライバシーに敏感なドメイン、特に医療分野は、医療保険の携行性と責任に関する法律(Health Insurance Portability and Accountability Act, HIPAA)や一般データ保護規則(General Data Protection Regulation, GDPR)などの規制によってデータアクセスが制限され、プライバシーの制約が課されるため、LMの訓練において課題に直面している。フェデレートラーニング(FL)は、データプライバシの保護を確保しながら協調学習を可能にする分散ソリューションを提供する。本研究は, 8つのコーパスと6つのLMを用いて, 2つのバイオメディカルNLPタスクにおける医療分野のFLを体系的に評価した。結果は次のことを示した。 1) FLモデルは,個々のクライアントのデータでトレーニングされたLMを一貫して上回り,時にはプールされたデータでトレーニングされたモデルに匹敵する。 2) 総データ量が一定の場合, より多くのクライアントでFLを用いて訓練したLMは性能が劣るが, 事前学習したトランスフォーマーベースのモデルはより高いレジリエンスを示した。 3) FLを用いてトレーニングしたLMは,クライアントのデータがIIDの場合はプールデータでトレーニングされたモデルとほぼ同等の性能を発揮するが,非IIDデータでは目に見える差が生じる。私たちのコードは、https://github.com/PL97/FedNLPで利用可能です。 Language models (LMs) like BERT and GPT have revolutionized natural language processing (NLP). However, privacy-sensitive domains, particularly the medical field, face challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring the preservation of data privacy. In this study, we systematically evaluate FL in medicine across 2 biomedical NLP tasks using 6 LMs encompassing 8 corpora. Our results showed that: 1) FL models consistently outperform LMs trained on individual client's data and sometimes match the model trained with pooled data; 2) With the fixed number of total data, LMs trained using FL with more clients exhibit inferior performance, but pre-trained transformer-based models exhibited greater resilience. 
3) LMs trained using FL perform nearly on par with the model trained with pooled data when clients' data are IID distributed while exhibiting visible gaps with non-IID data. Our code is available at: https://github.com/PL97/FedNLP	翻訳日:2023-07-24 14:10:37 公開日:2023-07-20
# 大腸癌予防のための片面合成非ペア画像翻訳と分節化 Joint one-sided synthetic unpaired image translation and segmentation for colorectal cancer prevention ( http://arxiv.org/abs/2307.11253v1 ) ライセンス: Link先を確認	Enric Moreu, Eric Arazo, Kevin McGuinness, Noel E. O'Connor	(参考訳) 深層学習は医療画像の解析において優れた性能を示した。しかし、データセットはプライバシの問題、標準化の問題、アノテーションの欠如のために取得することが難しい。本稿では,3次元技術と生成対向ネットワークを組み合わせたリアルな合成画像を作成することで,これらの課題に対処する。 CUT-segは,ポリプの分割学習中に,分割モデルと生成モデルとを併用してリアルな画像を生成するジョイントトレーニングである。最近の片面翻訳モデルの利点は、メモリ使用量が非常に少なく、トレーニングループにセグメンテーションモデルを追加できる点にあります。 CUT-segは2段階の訓練を必要とする他のメモリ集約型画像変換手法よりもパフォーマンスが良く、計算コストも低く、必要とする実画像も少ない。有望な結果は、単一の実画像とゼロ実アノテーションを使用して、5つの実ポリプセグメンテーションデータセットで達成される。この研究の一環として、我々はSynth-Colonをリリースした。Synth-Colonは、20000のリアルな大腸画像と深度と3D幾何学に関する追加情報を含む完全に合成されたデータセットである。 Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We propose CUT-seg, a joint training where a segmentation model and a generative model are jointly trained to produce realistic images while learning to segment polyps. We take advantage of recent one-sided translation models because they use significantly less memory, allowing us to add a segmentation model in the training loop. CUT-seg performs better, is computationally less expensive, and requires fewer real images than other memory-intensive image translation approaches that require two-stage training. Promising results are achieved on five real polyp segmentation datasets using only one real image and zero real annotations. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon	翻訳日:2023-07-24 14:10:09 公開日:2023-07-20
# Contra multos verbos : 量子力学のスキャンダルについて Contra multos verbos: On scandals of quantum mechanics ( http://arxiv.org/abs/2307.11669v1 ) ライセンス: Link先を確認	Theodorus Maria Nieuwenhuizen	(参考訳) 2008年、ニコ・ファン・カンペン(Nico van Kampen)は「量子力学のスキャンダル」(The scandal of quantum mechanics)という手紙の中で、「スキャンダルとは、様々な解釈や哲学的深遠さを宣伝する記事、議論、教科書がいまだに数多く存在することである」と書いた。それ以来、あまり変わっていないが、ソーシャルメディアはニコが「スキャンダル」と呼ぶようなものにさらなるプラットフォームを提供してきた。量子力学の現状について、Armen Allahverdyan と Roger Balian との20年間にわたる、量子測定のためのキュリー・ワイス模型の動的解に関する研究から抽出された詳細な見解が述べられている。統計的解釈のある種の最小形態を具現化し、存在論的つながりを排除している。その過程で、様々な関連する主題、用語、解釈に関するコメントが与えられる。 In 2008 Nico van Kampen wrote in his letter "The scandal of quantum mechanics": "The scandal is that there are still many articles, discussions and textbooks, which advertise various interpretations and philosophical profundities." Not much has changed since then, while social media have given a platform for more of what Nico would term "a scandal". A detailed viewpoint is presented on the status of quantum mechanics, distilled from two decades of work with Armen Allahverdyan and Roger Balian on the dynamical solution of Curie-Weiss models for quantum measurement. It embodies a certain minimal form of the statistical interpretation and stays clear of ontological connections. Along the way, comments on various related subjects, terms and interpretations are given.	翻訳日:2023-07-24 11:52:52 公開日:2023-07-20
# ロバスト主成分分析:Median of Meansアプローチ Robust Principal Component Analysis: A Median of Means Approach ( http://arxiv.org/abs/2102.03403v2 ) ライセンス: Link先を確認	Debolina Paul, Saptarshi Chakraborty and Swagatam Das	(参考訳) 主成分分析(PCA)は、データの可視化、ノイズ除去、次元削減のための基本的なツールである。統計学、機械学習、コンピュータビジョン、関連する分野で広く使われている。しかし、PCAは外れ値に弱いことがよく知られており、しばしばデータセット内の真の下層の低次元構造を検出することに失敗する。Median of Means (MoM) の哲学に従い、近年の教師付き学習手法は、大標本理論的性質をあまり損なうことなく、外れ値の観測を扱うことに成功している。本稿では,MoM原理に基づくPCA手法を提案する。 MoMPCA (Median of Means Principal Component Analysis) と呼ばれるこの手法は計算上魅力的であるだけでなく、最小の仮定の下で最適収束率を達成する。特に、ラデマッハ複雑度の助けを借りて得られた解の非漸近的な誤差境界を探索し、外れ値の観測には全く仮定を置かない。導出された集中不等式の結果は、解析が分離可能なヒルベルト空間で行われ、結果が対応するノルムにおける基底分布の4次モーメントのみに依存するため、次元に依存しない。提案の有効性はシミュレーションや実データアプリケーションを通じて徹底的に実証されている。 Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution with the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. 
The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.	翻訳日:2023-07-21 19:35:49 公開日:2023-07-20
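The Median of Means principle named in the abstract above can be illustrated with a minimal sketch for a scalar mean. This is not the MoMPCA procedure from the paper; the function name `median_of_means` and the choice of contiguous, near-equal blocks are assumptions made for illustration only.

```python
from statistics import mean, median


def median_of_means(values, num_blocks):
    """Estimate a mean robustly: average within blocks, then take the median.

    Hypothetical helper for illustration; blocks are contiguous, so in
    practice the data would usually be shuffled first.
    """
    values = list(values)
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    # Split into num_blocks contiguous blocks whose sizes differ by at most one.
    base, extra = divmod(len(values), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(values[start:start + size])
        start += size
    # A single corrupted block can shift its own mean but not the median.
    return median(mean(block) for block in blocks)


# 99 clean observations plus one gross outlier.
sample = [1.0] * 99 + [1e6]
plain_mean = mean(sample)
robust_mean = median_of_means(sample, num_blocks=10)
```

Here the outlier contaminates only one of the ten block means, so the median of the block means is unaffected while the plain mean is pulled far from 1.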
# パーセプトロン理論はニューラルネットワークの精度を予測することができる Perceptron Theory Can Predict the Accuracy of Neural Networks ( http://arxiv.org/abs/2012.07881v2 ) ライセンス: Link先を確認	Denis Kleyko, Antonello Rosato, E. Paxon Frady, Massimo Panella, Friedrich T. Sommer	(参考訳) 多層ニューラルネットワークは、多くの技術的分類問題に対する技術の現状を定めている。しかし、これらのネットワークは基本的にはブラックボックスであり、分析してパフォーマンスを予測する。本稿では,1層パーセプトロンの統計的理論を開発し,異なるアーキテクチャを持つ驚くほど多種多様なニューラルネットワークの性能を予測できることを示す。パーセプトロンを用いた分類の一般的な理論は、ベクトル記号アーキテクチャとして知られるシンボリック推論のための貯水池計算モデルとコネクショニストモデルを分析するための既存の理論を一般化することによって展開される。我々の統計理論は、信号統計を利用した3つの公式を提供する。式は解析的に難解であるが、数値的に評価できる。最大詳細をキャプチャする記述レベルには、確率的サンプリング方法が必要である。ネットワークモデルによっては、単純な公式はすでに高い予測精度をもたらす。理論予測の質は、貯水池計算文献からのエコー状態ネットワーク(ESN)の記憶タスク、浅いランダムに接続されたネットワークの分類データセットの収集、深層畳み込みニューラルネットワークのイメージNetデータセットの3つの実験環境で評価される。パーセプトロン理論の2番目の記述レベルは,従来説明できなかったタイプのESNの性能を予測できることがわかった。この理論は、その出力層に適用することで、深い多層ニューラルネットワークを予測することができる。ニューラルネットワークの性能を予測する他の方法は、推定モデルの訓練を必要とすることが多いが、提案された理論は、出力ニューロンにおけるシナプス後和の分布の最初の2つのモーメントのみを必要とする。パーセプトロン理論は、推定器モデルを訓練に依存しない他の方法と好意的に比較する。 Multilayer neural networks set the current state of the art for many technical classification problems. But, these networks are still, essentially, black boxes in terms of analyzing them and predicting their performance. Here, we develop a statistical theory for the one-layer perceptron and show that it can predict performances of a surprisingly large variety of neural networks with different architectures. A general theory of classification with perceptrons is developed by generalizing an existing theory for analyzing reservoir computing models and connectionist models for symbolic reasoning known as vector symbolic architectures. Our statistical theory offers three formulas leveraging the signal statistics with increasing detail. The formulas are analytically intractable, but can be evaluated numerically. The description level that captures maximum details requires stochastic sampling methods. Depending on the network model, the simpler formulas already yield high prediction accuracy. 
The quality of the theory predictions is assessed in three experimental settings, a memorization task for echo state networks (ESNs) from reservoir computing literature, a collection of classification datasets for shallow randomly connected networks, and the ImageNet dataset for deep convolutional neural networks. We find that the second description level of the perceptron theory can predict the performance of types of ESNs, which could not be described previously. The theory can predict the performance of deep multilayer neural networks by being applied to their output layer. While other methods for predicting neural network performance commonly require training an estimator model, the proposed theory requires only the first two moments of the distribution of the postsynaptic sums in the output neurons. The perceptron theory compares favorably to other methods that do not rely on training an estimator model.	翻訳日:2023-07-21 19:35:24 公開日:2023-07-20
# ABNIRML:ニューラルIRモデルの挙動解析 ABNIRML: Analyzing the Behavior of Neural IR Models ( http://arxiv.org/abs/2011.00696v2 ) ライセンス: Link先を確認	Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan	(参考訳) BERTやT5のような事前制約付き言語モデルは、アドホック検索のための新しい最先端技術を確立した。しかし、これらの方法がなぜこれほど効果的なのか、なぜ他の種類よりも有効なのか、どのような落とし穴があるのか、まだよく理解されていない。本稿では,従来の手法では扱えなかった文体,事実性,言い換えに対する感受性,単語順など,いくつかの特徴をテスト可能な新しいタイプの診断プローブを含む,ニューラルirモデル(abnirml)の挙動解析のための新しい包括的なフレームワークを提案する。フレームワークの価値を示すために、神経モデルの利益に寄与する要因についての洞察を与え、モデルが提示する意図しないバイアスを識別する、広範な実証研究を行う。例えば、最近のニューラルネットワークのランキングモデルでは、クエリと正確な項重なりをあまり頼りにせず、単語や文の順序に高い感度で示されるより豊かな言語情報を活用するようにしています。他の結果は、いくつかのモデル(例えばT5やColBERT)が(単に関連性ではなく)事実的に正しいテキストに偏っているなど、より驚くべきものである。さらに、同じベース言語モデルであってもいくつかの特性が異なり、他の特徴はモデルの訓練中にランダムなバリエーションによって現れる。 Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search. However, it is not yet well-understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic probes that allow us to test several characteristics -- such as writing styles, factuality, sensitivity to paraphrasing and word order -- that are not addressed by previous techniques. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to the neural model's gains, and identify potential unintended biases the models exhibit. Some of our results confirm conventional wisdom, like that recent neural ranking models rely less on exact term overlap with the query, and instead leverage richer linguistic information, evidenced by their higher sensitivity to word and sentence order. Other results are more surprising, such as that some models (e.g., T5 and ColBERT) are biased towards factually correct (rather than simply relevant) texts. 
Further, some characteristics vary even for the same base language model, and other characteristics can appear due to random variations during model training.	翻訳日:2023-07-21 19:34:57 公開日:2023-07-20
# 局所部分空間の暗黙多次元射影 Implicit Multidimensional Projection of Local Subspaces ( http://arxiv.org/abs/2009.03259v2 ) ライセンス: Link先を確認	Rongzheng Bian, Yumeng Xue, Liang Zhou, Jian Zhang, Baoquan Chen, Daniel Weiskopf, Yunhai Wang	(参考訳) 本研究では,多次元投影が局所部分空間に与える影響を暗黙の関数微分を用いて可視化する手法を提案する。ここでは、局所部分空間をデータポイントの多次元局所近傍として理解する。既存の手法は多次元データポイントの投影に重点を置いており、近隣情報は無視される。本手法は,局所部分空間の形状と方向情報を解析し,局所構造を知覚することで,データの全体構造に関するさらなる洞察を得ることができる。局所部分空間は基底ベクトルにまたがる多次元楕円体によって構成される。暗黙関数として定式化された多次元射影の解析的微分に基づいて,高精度かつ効率的なベクトル変換法を提案する。結果はグリフとして可視化され、効率的なWebベースの可視化ツールでサポートされている、特別に設計されたインタラクションの完全なセットを用いて分析される。本手法の有用性を多次元および高次元ベンチマークデータセットを用いて実証した。暗黙的微分ベクトル変換は数値比較により評価され, 探索例とユースケースを用いて総合的手法が評価された。 We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.	翻訳日:2023-07-21 19:34:35 公開日:2023-07-20
# 2量子ビット状態を持つランダムアクセスコードプロトコルにおける量子長所の十分条件 Sufficient conditions for quantum advantage in random access code protocols with two-qubit states ( http://arxiv.org/abs/1912.09900v6 ) ライセンス: Link先を確認	Som Kanjilal, C Jebarathinam, Tomasz Paterek, Dipankar Home	(参考訳) ランダムアクセスコード(RAC)は、nビット文字列のランダムに指定されたサブストリングに関する情報を取得するための重要な通信プロトコルである。量子RACは通常、古典的な通信と共に使用される量子ビットの通信または共用量子状態を利用する。ここでは、単一ビット通信と2つの量子ビットの共有任意の状態の制約の下で、量子プロトコルの後者について考察する。最低ケースの成功確率をメリットの図形として、逆相関行列を持つ任意の状態を用いて、n=3の古典的RACを上回り得ることを示す。 n=2の場合、最も優れた古典的性能を達成できる追加条件を導出する。特に、分離状態は n=2,3 の量子優位性の背後にある有用な資源であることが判明した。量子状態の単一コピーを補助する$n \geq 4$ RACは、古典的なRACよりも優れていない。 Random access code (RAC) is an important communication protocol to obtain information about a randomly specified substring of an n-bit string, while only having limited information about the n-bit string. Quantum RACs usually utilise either communication of quantum bits or a shared-in-advance quantum state used in conjunction with classical communication. Here we consider the latter version of the quantum protocols under the constraint of single-bit communication and with shared arbitrary state of two qubits. Taking the worst-case success probability as the figure of merit, we demonstrate that any state with invertible correlation matrix can be used to outperform the best classical RAC for n=3. We derive an additional condition sufficient to beat the best classical performance in the case of n=2. In particular, separable states turn out to be a useful resource behind the quantum advantage for n=2,3. For $n \geq 4$ RACs assisted with a single copy of a quantum state do not outperform the classical RACs.	翻訳日:2023-07-21 19:34:01 公開日:2023-07-20
# 相対サブシステムと量子参照フレーム変換 Relative subsystems and quantum reference frame transformations ( http://arxiv.org/abs/2110.13199v2 ) ライセンス: Link先を確認	Esteban Castro-Ruiz and Ognyan Oreshkov	(参考訳) 近年、参照フレーム変換の量子一般化の開発に多くの努力がなされている。重要な進歩にもかかわらず、その原則に対する完全な理解はまだ欠けている。特に、以前の提案は、宇宙全体に適用した場合のみ、任意の量子参照フレーム間の可逆変換をもたらす可能性があると論じる。対照的に、標準量子理論のみを用いて、第一原理から量子参照フレーム変換を導出する。我々のフレームワークは、自然にコヒーレントなグループ平均化よりも不整合性に基づいており、参照フレームと関心体系にのみ依存する可逆変換をもたらす。これまでの研究よりもより一般的な変換が得られ、これは制限部分空間でのみ有効である。重要なことに、我々のフレームワークは、参照フレーム状態の量子的特徴に関する情報を伝達する「外部粒子」という形で追加の自由度を含んでいる。我々の形式主義は幅広い対称群に対して有効である。中心的に拡張されたガリレイ群を特に研究し、以前の提案との大きな違いを強調した。 Recently there has been much effort in developing a quantum generalisation of reference frame transformations. Despite important progress, a complete understanding of their principles is still lacking. In particular, we argue that previous proposals could yield reversible transformations between arbitrary quantum reference frames only when applied to the whole universe. In contrast, here we derive quantum reference frame transformations from first principles, using only standard quantum theory. Our framework, naturally based on incoherent rather than coherent group averaging, yields reversible transformations that only depend on the reference frames and system of interest. We find more general transformations than those studied so far, which are valid only in a restricted subspace. Importantly, our framework contains additional degrees of freedom in the form of an "extra particle," which carries information about the quantum features of reference frame states. Our formalism is valid for a broad range of symmetry groups. We study the centrally extended Galilei group specifically, highlighting key differences from previous proposals.	翻訳日:2023-07-21 19:28:03 公開日:2023-07-20
# 動作認識に注意を向けた高次テンソルプーリング High-order Tensor Pooling with Attention for Action Recognition ( http://arxiv.org/abs/2110.05216v2 ) ライセンス: Link先を確認	Piotr Koniusz and Lei Wang and Ke Sun	(参考訳) 本稿では,ニューラルネットワークによって形成される特徴ベクトルの高次統計を捉え,エンドツーエンドの2次・高次プーリングを提案し,テンソルディスクリプタを構成する。テンソルディスクリプタは、集約ベクトルの少ない数と、与えられた特徴が統計的に予想されるよりも頻繁に現れるバーストネス現象のために、堅牢な類似度尺度を必要とする。グラフラプラシアン上の熱拡散過程(HDP)は、逆がループグラフラプラシアンを形成する共分散・自己相関行列の固有値パワー正規化(EPN)と密接に関係している。我々は,HDPとEPNが同一の役割を担っていること,すなわち固有スペクトルの大きさを増大または減衰させることにより,バーストの防止を図っている。我々は、高次発生のスペクトル検出器として作用するepnに高次テンソルを装備し、バーストネスを防止する。また、d次元特徴記述子から構築された位数 r のテンソルに対して、そのような検出器は、少なくとも1つの高次発生がテンソルで表されるbinom(d,r)部分空間の1つに「射影」される可能性を示し、したがってそのような「detectors」のようなbinom(d,r)で導かれるテンソルパワー正規化計量を形成する。実験的なコントリビューションとして,2次および高次プール変種をアクション認識に適用し,これまでに提示されていないプール変種の比較を行い,HMDB-51,YUP++,MPII調理活動の最先端結果を示す。 We aim at capturing high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure due to low numbers of aggregated vectors and the burstiness phenomenon, when a given feature appears more/less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/auto-correlation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., to boost or dampen the magnitude of the eigenspectrum thus preventing the burstiness. We equip higher-order tensors with EPN which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that for a tensor of order r built from d dimensional feature descriptors, such a detector gives the likelihood if at least one higher-order occurrence is 'projected' into one of binom(d,r) subspaces represented by the tensor; thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. 
For experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide comparisons of such pooling variants that have not been presented previously, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.	翻訳日:2023-07-21 19:27:39 公開日:2023-07-20
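The abstract above describes Eigenvalue Power Normalization as boosting or dampening the eigenspectrum of a covariance or auto-correlation matrix. A minimal NumPy sketch of that idea for a second-order (matrix) descriptor follows. The function name, the exponent `gamma=0.5`, and the synthetic features are assumptions for illustration; the paper's higher-order tensor variant is not reproduced here.

```python
import numpy as np


def eigenvalue_power_normalization(matrix, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    Hypothetical helper: with 0 < gamma < 1, large eigenvalues are dampened
    relative to small ones, which flattens the spectrum.
    """
    sym = 0.5 * (matrix + matrix.T)          # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigendecomposition of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative values
    return (eigvecs * eigvals**gamma) @ eigvecs.T


# Synthetic feature vectors with one dominant ("bursty") direction.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
features[:, 0] *= 10.0

# Second-order pooling: auto-correlation matrix of the feature vectors.
autocorr = features.T @ features / features.shape[0]
normalized = eigenvalue_power_normalization(autocorr, gamma=0.5)
```

With `gamma=0.5` the result is the matrix square root of the pooled descriptor, so the ratio between its largest and smallest eigenvalue shrinks to the square root of the original ratio.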
# ジェネリックコンテキスト帯域のモデル選択 Model Selection for Generic Contextual Bandits ( http://arxiv.org/abs/2107.03455v2 ) ライセンス: Link先を確認	Avishek Ghosh, Abishek Sankararaman and Kannan Ramchandran	(参考訳) 実現可能性(realizability)仮定の下で,一般の確率的文脈バンディットのモデル選択の問題を考える。そこで本研究では,Adaptive Contextual Bandit (ACB) と呼ばれる逐次改良型アルゴリズムを提案する。このアルゴリズムは段階的に動作し,与えられたインスタンスに適合するには単純すぎるモデルクラスを順次除外する。このアルゴリズムが適応的であること、すなわち、後悔率のオーダーが、真のモデルクラスの知識を必要とする任意の証明可能な文脈的バンディットアルゴリズムのそれと一致することを証明する。正しいモデルクラスを知らないという価格は、後悔境界における第二次項に寄与する加法項のみであることが判明した。このコストは、モデルクラスが識別しやすくなるほど小さくなり、逆もまた成り立つという直感的な特性を持っている。また,真のモデルクラスを知らないにもかかわらず,はるかに単純なETCスタイルのアルゴリズムでも同様の後悔境界が得られることを示す。しかし、モデル選択のコストは予想通り ACB よりも ETC のほうが高い。さらに,線形文脈バンディットの特別な場合に対して,汎用的な構成に比べてシャープな保証を得るための特別なアルゴリズムを提案する。 We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit (ACB), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm (e.g., FALCON), that needs the knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second order term in the regret bound. This cost possesses the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice-versa. We also show that a much simpler explore-then-commit (ETC) style algorithm also obtains a similar regret bound, despite not knowing the true model class. However, the cost of model selection is higher in ETC as opposed to in ACB, as expected. Furthermore, for the special case of linear contextual bandits, we propose specialized algorithms that obtain sharper guarantees compared to the generic setup.	翻訳日:2023-07-21 19:26:49 公開日:2023-07-20
# 新興ハードウェアのための計算フレームワークとしてのベクトルシンボリックアーキテクチャ Vector Symbolic Architectures as a Computing Framework for Emerging Hardware ( http://arxiv.org/abs/2106.05268v2 ) ライセンス: Link先を確認	Denis Kleyko, Mike Davies, E. Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov, Jan M. Rabaey, Dmitri A. Rachkovskij, Abbas Rahimi, Friedrich T. Sommer	(参考訳) 本稿では,vector symbolic architectures (vsa) (超次元コンピューティングとしても知られている) の開発における最近の進歩を概観する。このフレームワークは確率的で新興のハードウェアの実装に適しており、人工知能(AI)に必要な認知操作のタイプを自然に表現している。本稿では、vsa の体様代数構造が、現代的な計算に関連する全てのデータ構造と操作をサポートする高次元ベクトル上の単純かつ強力な操作を提供することを示す。さらに,VSAの区別機能である「重ね合わせ計算」について述べる。また、AIアプリケーションに固有の難しい組合せ探索問題に対する効率的なソリューションへの扉を開く。我々はVSAが計算学的に普遍的であることを示す方法をスケッチする。分散表現を用いたコンピューティングのフレームワークとして機能し、新興コンピューティングハードウェアの抽象化レイヤの役割を担っていると考えています。この記事では、vsaの背景にある哲学、それらを用いた分散コンピューティングのテクニック、ニューロモーフィックコンピューティングのような新しいコンピューティングハードウェアとの関連を図示することで、コンピュータアーキテクトへの参照として役立ちます。 This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging hardware, and it naturally expresses the types of cognitive operations required for artificial intelligence (AI). We demonstrate in this article that the field-like algebraic structure of VSA offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant to modern computing. In addition, we illustrate the distinguishing feature of VSA, "computing in superposition," which sets it apart from conventional computing. It also opens the door to efficient solutions to the difficult combinatorial search problems inherent in AI applications. We sketch ways of demonstrating that VSA are computationally universal. We see them acting as a framework for computing with distributed representations that can play a role of an abstraction layer for emerging computing hardware. 
This article serves as a reference for computer architects by illustrating the philosophy behind VSA, techniques of distributed computing with them, and their relevance to emerging computing hardware, such as neuromorphic computing.	翻訳日:2023-07-21 19:26:29 公開日:2023-07-20
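The "computing in superposition" idea from the abstract above can be sketched with one common VSA flavor: bipolar vectors, binding by elementwise multiplication, and bundling by majority sign. The helper names (`random_hv`, `bind`, `bundle`, `similarity`), the dimension, and the toy record are assumptions for illustration and are not taken from the article.

```python
import random


def random_hv(dim, rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors; for bipolar vectors this is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose hypervectors by elementwise majority (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: near 0 for unrelated random hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
DIM = 10_000
keys = {name: random_hv(DIM, rng) for name in ("color", "shape", "size")}
values = {name: random_hv(DIM, rng) for name in ("red", "square", "large")}

# One vector holds three key-value pairs in superposition.
record = bundle(
    bind(keys["color"], values["red"]),
    bind(keys["shape"], values["square"]),
    bind(keys["size"], values["large"]),
)

# Query the record: unbinding with a key gives a noisy copy of its value.
noisy = bind(record, keys["color"])
scores = {name: similarity(noisy, hv) for name, hv in values.items()}
best = max(scores, key=scores.get)
```

The query result is only approximately equal to the stored value, so it is cleaned up by picking the most similar known hypervector; the other candidates score close to zero because random high-dimensional vectors are nearly orthogonal.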
# DETReg: オブジェクト検出のための領域優先型教師なし事前トレーニング DETReg: Unsupervised Pretraining with Region Priors for Object Detection ( http://arxiv.org/abs/2106.04550v5 ) ライセンス: Link先を確認	Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson	(参考訳) 近年, 物体検出のための自己監督型事前学習法は, 検出アーキテクチャの重要な部分を無視して, 対象検出器のバックボーンの事前訓練に重点を置いている。代わりに、オブジェクトのローカライゼーションと埋め込みコンポーネントを含む、オブジェクト検出ネットワーク全体を事前学習する新しい自己教師ありメソッドであるDETRegを紹介する。事前トレーニング中、DETRegは、教師なし領域提案ジェネレータからのローカライゼーションと一致するオブジェクトのローカライゼーションを予測し、対応する特徴埋め込みと自己教師あり画像エンコーダからの埋め込みを同時に調整する。我々は,DETRファミリーの検出器を用いてDETRegを実装し,COCO,PASCAL VOC,Airbus Shipのベンチマークを微調整することで,競争ベースラインよりも向上することを示す。低データのレジームでは、DETRegは、1%のラベルと数ショットの学習設定でトレーニングするなど、パフォーマンスの向上を実現している。 Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture. Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator and simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder. We implement DETReg using the DETR family of detectors and show that it improves over competitive baselines when finetuned on COCO, PASCAL VOC, and Airbus Ship benchmarks. In low-data regimes DETReg achieves improved performance, e.g., when training with only 1% of the labels and in the few-shot learning settings.	翻訳日:2023-07-21 19:26:13 公開日:2023-07-20
# 明示的知識指導による意味的逆シナリオ生成 Semantically Adversarial Scenario Generation with Explicit Knowledge Guidance ( http://arxiv.org/abs/2106.04066v6 ) ライセンス: Link先を確認	Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao	(参考訳) 自律運転システムを失敗させる可能性のある敵シナリオを生成することは、堅牢性を改善する効果的な方法である。純粋にデータ駆動生成モデルを拡張し、最近の特殊モデルは、ニューロンレベルで暗黙的にパターンを操作することによって、運転シーンに交通標識を埋め込むなど、制御可能な追加要件を満たす。本稿では,semantically adversarial generation (sag) を実現するために,生成プロセスにドメイン知識を明示的に組み込む手法を提案する。ドライビングシーンの構成に整合性を持たせるために,まず知識を物体の性質と物体間の関係という2つのタイプに分類する。次に,木構造変化型自動エンコーダ(T-VAE)を提案する。ツリー構造におけるノードとエッジの特性にセマンティックルールを課すことで、明示的な知識統合は制御可能な生成を可能にする。本手法の制御性と説明性を示すための合成例を簡潔な設定で構築する。本手法は,異なる最先端の3dポイントクラウドセグメンテーションモデルに対する逆行運転シーンを効率的に識別し,明示的な知識として指定されたトラフィックルールを満たす。 Generating adversarial scenarios, which have the potential to fail autonomous driving systems, provides an effective way to improve robustness. Extending purely data-driven generative models, recent specialized models satisfy additional controllable requirements such as embedding a traffic sign in a driving scene by manipulating patterns implicitly in the neuron level. In this paper, we introduce a method to incorporate domain knowledge explicitly in the generation process to achieve the Semantically Adversarial Generation (SAG). To be consistent with the composition of driving scenes, we first categorize the knowledge into two types, the property of objects and the relationship among objects. We then propose a tree-structured variational auto-encoder (T-VAE) to learn hierarchical scene representation. By imposing semantic rules on the properties of nodes and edges in the tree structure, explicit knowledge integration enables controllable generation. We construct a synthetic example to illustrate the controllability and explainability of our method in a succinct setting. 
We further extend to realistic environments for autonomous vehicles: our method efficiently identifies adversarial driving scenes against different state-of-the-art 3D point cloud segmentation models and satisfies the traffic rules specified as the explicit knowledge.	翻訳日:2023-07-21 19:25:54 公開日:2023-07-20
# 到達可能なマルチスタビリティを最大化するリカレントニューラルネットワークのウォーミングアップが学習を大幅に改善 Warming up recurrent neural networks to maximise reachable multistability greatly improves learning ( http://arxiv.org/abs/2106.01001v3 ) ライセンス: Link先を確認	Gaspard Lambrechts, Florent De Geeter, Nicolas Vecoven, Damien Ernst, Guillaume Drion	(参考訳) リカレントニューラルネットワークのトレーニングは、時間依存が長くなると難しいことが知られている。本研究では、ほとんどの標準セルは初期化時に1つの安定平衡しか持たず、ネットワーク安定平衡の数が増加すると、長い時間依存を持つタスクの学習が一般的に起こることを示す。マルチスタビリティは、初期のモノスタブルネットワークでは容易に実現できないことが多く、入力と出力の間の長時間の依存関係の学習が困難になる。この洞察は、"warmup"と呼ばれる手続きを通じて、任意の再帰的な細胞接続を初期化し、任意に長い時間依存を学習する能力を改善する新しい方法の設計に繋がる。この初期化手順は、ネットワークの到達可能な多重性、すなわち、いくつかの勾配ステップにおいて、関連する入力軌跡を通じて到達可能なネットワーク内の平衡数を最大化するように設計されている。いくつかの情報復元,シーケンス分類,強化学習ベンチマークについて検討し,複数の繰り返しセルにおいて学習速度と性能が大幅に向上するが,時には精度が損なわれることを示した。そこで我々は,高レベルな精度を維持しつつ,長時間依存の学習を大幅に改善できる部分ウォームアップを特徴とする二重層アーキテクチャを導入する。このアプローチは、長期間の依存関係が存在する場合のリカレントセルの学習能力を改善するための一般的なフレームワークを提供する。また,文献から得られた他の初期化および前訓練法が,再発細胞の到達可能な多重化を暗黙的に促進することを示す。 Training recurrent neural networks is known to be difficult when time dependencies become long. In this work, we show that most standard cells only have one stable equilibrium at initialisation, and that learning on tasks with long time dependencies generally occurs once the number of network stable equilibria increases; a property known as multistability. Multistability is often not easily attained by initially monostable networks, making learning of long time dependencies between inputs and outputs difficult. This insight leads to the design of a novel way to initialise any recurrent cell connectivity through a procedure called "warmup" to improve its capability to learn arbitrarily long time dependencies. This initialisation procedure is designed to maximise network reachable multistability, i.e., the number of equilibria within the network that can be reached through relevant input trajectories, in few gradient steps. 
We show on several information restitution, sequence classification, and reinforcement learning benchmarks that warming up greatly improves learning speed and performance, for multiple recurrent cells, but sometimes impedes precision. We therefore introduce a double-layer architecture initialised with a partial warmup that is shown to greatly improve learning of long time dependencies while maintaining high levels of precision. This approach provides a general framework for improving learning abilities of any recurrent cell when long time dependencies are present. We also show empirically that other initialisation and pretraining procedures from the literature implicitly foster reachable multistability of recurrent cells.	翻訳日:2023-07-21 19:25:32 公開日:2023-07-20
# HDGT:シーンエンコーディングによるマルチエージェント軌道予測のための異種駆動グラフ変換器 HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding ( http://arxiv.org/abs/2205.09753v2 ) ライセンス: Link先を確認	Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, Junchi Yan	(参考訳) 運転シーンをベクトル表現にエンコーディングすることは、軌道予測のような下流タスクに利益をもたらす自動運転にとって必須のタスクである。駆動シーンは、しばしば異なる種類のオブジェクト(エージェント、レーン、交通標識)のような異種要素を伴い、オブジェクト間の意味的関係は豊かで多様である。一方、要素間の相対性も存在し、これは空間関係が相対的な概念であり、グローバル座標系ではなくエゴ中心の方法で符号化する必要があることを意味する。これらの観測に基づいて,運転シーンを異なる種類のノードとエッジを持つ異種グラフとしてモデル化したバックボーンである異種運転グラフ変換器(HDGT)を提案する。ヘテロジニアスグラフ構築では、様々な意味関係に従って異なる種類のノードを接続する。空間的関係符号化では、ノードの座標とエッジの座標は局所ノード中心座標系に含まれる。グラフニューラルネットワーク(GNN)のアグリゲーションモジュールでは、入力の不均一性に適合する階層的な方法でトランスフォーマー構造を採用する。実験結果から,HDGTは軌道予測およびWaymo Open Motion Challengeにおいて,軌道予測のタスクの最先端性能を達成することが示された。 Encoding a driving scene into vector representations has been an essential task for autonomous driving that can benefit downstream tasks e.g. trajectory prediction. The driving scene often involves heterogeneous elements such as the different types of objects (agents, lanes, traffic signs) and the semantic relations between objects are rich and diverse. Meanwhile, there also exist relativity across elements, which means that the spatial relation is a relative concept and need be encoded in a ego-centric manner instead of in a global coordinate system. Based on these observations, we propose Heterogeneous Driving Graph Transformer (HDGT), a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges. For heterogeneous graph construction, we connect different types of nodes according to diverse semantic relations. For spatial relation encoding, the coordinates of the node as well as its in-edges are in the local node-centric coordinate system. For the aggregation module in the graph neural network (GNN), we adopt the transformer structure in a hierarchical way to fit the heterogeneous nature of inputs. 
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction on the INTERACTION Prediction Challenge and the Waymo Open Motion Challenge.	翻訳日:2023-07-21 19:18:31 公開日:2023-07-20
# Torchhd:超次元コンピューティングとベクトル記号アーキテクチャの研究を支援するオープンソースのPythonライブラリ Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures ( http://arxiv.org/abs/2205.09208v2 ) ライセンス: Link先を確認	Mike Heddes, Igor Nunes, Pere Vergés, Denis Kleyko, Danny Abraham, Tony Givargis, Alexandru Nicolau, Alexander Veidenbaum	(参考訳) 超次元コンピューティング (HD) またはベクトル記号アーキテクチャ (VSA) は、ランダムな高次元ベクトル空間の性質を利用して分散表現を計算するためのフレームワークである。この特に学際的な分野の研究を集約し、広めるという科学コミュニティのコミットメントは、その進歩の基盤となっている。これらの取り組みの一環として、HD/VSA用の高性能オープンソースPythonライブラリであるTorchhdを紹介します。 Torchhdは、HD/VSAをよりアクセスしやすくし、さらなる研究とアプリケーション開発のための効率的な基盤となることを目指している。 PyTorch上に構築された使いやすいライブラリには、最先端のHD/VSA機能、明確なドキュメント、有名な出版物による実装例などがある。公開されているコードと対応するTorchhd実装を比較すると、実験は最大100倍高速に実行できる。 Torchhd は https://github.com/hyperdimensional-computing/torchhd で利用可能である。 Hyperdimensional computing (HD), also known as vector symbolic architectures (VSA), is a framework for computing with distributed representations by exploiting properties of random high-dimensional vector spaces. The commitment of the scientific community to aggregate and disseminate research in this particularly multidisciplinary area has been fundamental for its advancement. Joining these efforts, we present Torchhd, a high-performance open source Python library for HD/VSA. Torchhd seeks to make HD/VSA more accessible and serves as an efficient foundation for further research and application development. The easy-to-use library builds on top of PyTorch and features state-of-the-art HD/VSA functionality, clear documentation, and implementation examples from well-known publications. Comparing publicly available code with its corresponding Torchhd implementation shows that experiments can run up to 100x faster. Torchhd is available at: https://github.com/hyperdimensional-computing/torchhd.	翻訳日:2023-07-21 19:18:10 公開日:2023-07-20
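To make "computing with distributed representations" in the Torchhd entry above concrete, here is a minimal sketch of the three basic HD/VSA operations (binding, bundling, similarity) on random bipolar hypervectors. It deliberately uses only the Python standard library and does not call Torchhd; all function names, the dimension, and the toy record are assumptions made for illustration.

```python
import random

def random_hv(dim, rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def bind(a, b):
    """Binding: elementwise product. It is its own inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]

def bundle(*hvs):
    """Bundling: elementwise majority vote (use an odd number of inputs to avoid ties)."""
    return [1 if sum(col) > 0 else -1 for col in zip(*hvs)]

def similarity(a, b):
    """Normalised dot product: near 0 for unrelated vectors, 1 for identical ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
DIM = 10_000  # assumed dimension; high dimension makes random vectors nearly orthogonal

keys = {k: random_hv(DIM, rng) for k in ("name", "age", "city")}
values = {v: random_hv(DIM, rng) for v in ("alice", "thirty", "paris")}

# Encode the record {name: alice, age: thirty, city: paris} as one hypervector.
record = bundle(
    bind(keys["name"], values["alice"]),
    bind(keys["age"], values["thirty"]),
    bind(keys["city"], values["paris"]),
)

# Query: unbind with the "city" key and look for the most similar known value.
query = bind(record, keys["city"])
scores = {name: similarity(query, hv) for name, hv in values.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The query works because binding with the same key twice cancels out, while the other two bound pairs behave like random noise that is almost orthogonal to every stored value.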
# 構造力学とビブロア音響に応用した機械学習手法の検討 A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic ( http://arxiv.org/abs/2204.06362v2 ) ライセンス: Link先を確認	Barbara Cunha (LTDS), Christophe Droz (I4S), Abdelmalek Zine (ICJ), St\'ephane Foulard, Mohamed Ichchou (LTDS)	(参考訳) 機械学習(ml)の使用は、いくつかの分野に急速に広がり、構造力学や振動音響学(sd\&v)の多くの応用に遭遇している。前例のないデータ可用性、アルゴリズムの進歩と計算能力、意思決定の強化、不確実性処理、パターン認識、リアルタイム評価によって駆動される、データからの洞察を明らかにするmlの能力の増大。 SD\&Vの主要な3つのアプリケーションがこれらの利点を生かしている。構造的健康モニタリングでは、ML検出と予後が安全な操作とメンテナンススケジュールの最適化につながる。システムの識別と制御設計は、アクティブノイズ制御およびアクティブ振動制御におけるML技術によって活用される。最後に、MLベースのサロゲートモデルはコストのかかるシミュレーションの高速な代替手段を提供し、堅牢で最適化された製品設計を可能にします。この地域の多くの作品にもかかわらず、レビューや分析は行われていない。そこで本稿では,これらの分野の統合を追跡し理解するために,sd\&v分析におけるml応用に関する調査を行い,実装の現状と新たな機会について考察する。これら3つの応用ごとに,科学的知識に基づく方法論,利点,限界,推奨事項が同定された。さらに,Digital Twins と Physics Guided ML の役割を,現在の課題を克服し,今後の研究の進展をパワーアップするために検討する。その結果、SD\&Vで適用されたMLの現在の展望を概観し、その分野の進歩と展望について、読者に高度な理解を促すことができた。 The use of Machine Learning (ML) has rapidly spread across several fields, having encountered many applications in Structural Dynamics and Vibroacoustic (SD\&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithms advances and computational power, enhance decision making, uncertainty handling, patterns recognition and real-time assessments. Three main applications in SD\&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not been reviewed and analyzed. 
Therefore, to keep track of and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD&V and guides the reader to an advanced understanding of progress and prospects in the field.	翻訳日:2023-07-21 19:17:41 公開日:2023-07-20
# 有限体上のランダム原始多項式生成のための量子加速アルゴリズム Quantum-accelerated algorithms for generating random primitive polynomials over finite fields ( http://arxiv.org/abs/2203.12884v2 ) ライセンス: Link先を確認	Shan Huang, Hua-Lei Yin, Zeng-Bing Chen, Shengjun Wu	(参考訳) 有限体上の原始多項式は、古典的擬似ランダム数生成、符号化理論、ポスト量子暗号など、コンピュータ科学の様々な領域において重要である。それでも、有限体上のランダム原始多項式を生成するための効率的な古典的アルゴリズムの追求は今も続いている課題である。本稿では,この問題をハイブリッド量子古典アルゴリズムを用いて効率的に解く方法を示し,それらを実装するための特定の量子回路の設計について述べる。本研究は,多種多様な量子通信および計算応用におけるランダムプリミティブ多項式の高速かつリアルタイムな生成方法である。 Primitive polynomials over finite fields are crucial for various domains of computer science, including classical pseudo-random number generation, coding theory and post-quantum cryptography. Nevertheless, the pursuit of an efficient classical algorithm for generating random primitive polynomials over finite fields remains an ongoing challenge. In this paper, we show how to solve this problem efficiently through hybrid quantum-classical algorithms, and designs of the specific quantum circuits to implement them are also presented. Our research paves the way for the rapid and real-time generation of random primitive polynomials in diverse quantum communication and computation applications.	翻訳日:2023-07-21 19:17:16 公開日:2023-07-20
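For readers unfamiliar with the object in the entry above, the following is a small classical sketch of what "generating a random primitive polynomial" means over GF(2). It is not the hybrid quantum-classical algorithm of the paper. It uses the standard criterion that a degree-n polynomial f with nonzero constant term is primitive exactly when x has multiplicative order 2^n - 1 modulo f. The integer bitmask encoding and all function names are assumptions of this sketch, and the trial-division factoring of 2^n - 1 is only practical for small n.

```python
import random

def prime_factors(n):
    """Distinct prime factors of n by trial division (fine for small n only)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def mulmod(a, b, f, n):
    """Product of a and b in GF(2)[x] modulo f, where deg f = n.
    Polynomials are ints: bit k is the coefficient of x**k."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:  # degree reached n: reduce by xor-ing f
            a ^= f
    return result

def pow_x(e, f, n):
    """x**e modulo f by square-and-multiply."""
    result, base = 1, 0b10
    while e:
        if e & 1:
            result = mulmod(result, base, f, n)
        base = mulmod(base, base, f, n)
        e >>= 1
    return result

def is_primitive(f):
    """True if f (degree >= 2, as a bitmask) is primitive over GF(2)."""
    n = f.bit_length() - 1
    if n < 2:
        raise ValueError("this sketch handles degree >= 2 only")
    if not f & 1:  # divisible by x
        return False
    order = (1 << n) - 1
    if pow_x(order, f, n) != 1:
        return False
    # x must not have any smaller order dividing 2**n - 1.
    return all(pow_x(order // q, f, n) != 1 for q in prime_factors(order))

def random_primitive(n, rng):
    """Rejection sampling: draw degree-n candidates until one is primitive."""
    while True:
        f = (1 << n) | (rng.getrandbits(n - 1) << 1) | 1
        if is_primitive(f):
            return f

print(bin(random_primitive(8, random.Random(0))))
```

In this classical approach the costly step is factoring 2^n - 1, which grows quickly with n.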
# 勾配・投影自由分散オンラインmin-maxリソース最適化 Gradient and Projection Free Distributed Online Min-Max Resource Optimization ( http://arxiv.org/abs/2112.03896v3 ) ライセンス: Link先を確認	Jingrong Wang and Ben Liang	(参考訳) 分散オンラインmin-maxリソース割り当てを並列エージェントとパラメータサーバのセットで検討する。我々のゴールは、これらの関数に関する事前情報なしで、時間変化とコスト関数のセットに対するポイントワイズ最大化を最小化することである。本研究では,非ストラグラーが資源を放棄し,資源をストラグラーと共有することを学ぶ,分散オンラインリソース再配置(dora)と呼ばれる新しいオンラインアルゴリズムを提案する。 DORAの注目すべき特徴は、既存のオンライン最適化戦略とは異なり、勾配計算や投射操作を必要としないことである。これにより、大規模および分散ネットワークにおける計算オーバーヘッドを大幅に削減できる。我々は,DORAの最悪の性能を分析し,非凸関数に対する動的後悔の上限を導出する。さらに,分散オンライン機械学習における帯域幅割り当て問題への応用を検討する。本研究は,提案手法の有効性と,壁面時間短縮のための勾配および/または投影に基づく資源配分アルゴリズムに対する性能上の優位性を示す。 We consider distributed online min-max resource allocation with a set of parallel agents and a parameter server. Our goal is to minimize the pointwise maximum over a set of time-varying and decreasing cost functions, without a priori information about these functions. We propose a novel online algorithm, termed Distributed Online resource Re-Allocation (DORA), where non-stragglers learn to relinquish resource and share resource with stragglers. A notable feature of DORA is that it does not require gradient calculation or projection operation, unlike most existing online optimization strategies. This allows it to substantially reduce the computation overhead in large-scale and distributed networks. We analyze the worst-case performance of DORA and derive an upper bound on its dynamic regret for non-convex functions. We further consider an application to the bandwidth allocation problem in distributed online machine learning. Our numerical study demonstrates the efficacy of the proposed solution and its performance advantage over gradient- and/or projection-based resource allocation algorithms in reducing wall-clock time.	翻訳日:2023-07-21 19:16:23 公開日:2023-07-20
# クラウドソーシングにおける適応的多数決の完全性 Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing ( http://arxiv.org/abs/2111.06390v2 ) ライセンス: Link先を確認	Margarita Boyarskaya and Panos Ipeirotis	(参考訳) クラウドソーシングでは、労働者がアイテムを調べ、その正確性に投票することで、品質管理が一般的に達成される。信頼できない労働者の反応の影響を最小限に抑えるために、労働者間の合意のための所定の閾値である$\delta$を超過するまで追加の投票を依頼する$\delta$-margin投票プロセスを利用する。このプロセスは広く採用されているが、ヒューリスティックである。本研究では,マルコフ鎖を吸収して,クラウドソーシングプロセスにおいて重要な投票過程の特性を分析するモデリング手法を提案する。我々は、結果のコンセンサス投票の品質、コンセンサスに必要な投票数、投票要求の分散、その他の分配モーメントに関するクローズドフォーム方程式を提供する。本研究は,精度の異なる労働者を雇用する投票プロセスにおける品質等価性を達成するために,$\delta$のしきい値をどのように調整できるかを示す。また、予測応答精度の異なる投票プロセスに対して、効率等級の支払い率を提供する。さらに,本モデルでは,各例の難易度や難易度が異なる項目について考察する。実世界のクラウドソーシング投票データを用いたシミュレーションは,コンセンサス集約過程を特徴付ける理論モデルの有効性を検証する。本研究の成果は,クラウドソーシングの実用化に効果的に活用できる。 In crowdsourcing, quality control is commonly achieved by having workers examine items and vote on their correctness. To minimize the impact of unreliable worker responses, a $\delta$-margin voting process is utilized, where additional votes are solicited until a predetermined threshold $\delta$ for agreement between workers is exceeded. The process is widely adopted but only as a heuristic. Our research presents a modeling approach using absorbing Markov chains to analyze the characteristics of this voting process that matter in crowdsourced processes. We provide closed-form equations for the quality of resulting consensus vote, the expected number of votes required for consensus, the variance of vote requirements, and other distribution moments. Our findings demonstrate how the threshold $\delta$ can be adjusted to achieve quality equivalence across voting processes that employ workers with varying accuracy levels. We also provide efficiency-equalizing payment rates for voting processes with different expected response accuracy levels. Additionally, our model considers items with varying degrees of difficulty and uncertainty about the difficulty of each example. 
Our simulations, using real-world crowdsourced vote data, validate the effectiveness of our theoretical model in characterizing the consensus aggregation process. The results of our study can be effectively employed in practical crowdsourcing applications.	翻訳日:2023-07-21 19:16:10 公開日:2023-07-20
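The δ-margin process in the entry above can be illustrated with a short sketch. The assumptions are mine, not taken from the paper: every worker votes correctly with the same probability p, votes are independent, and voting stops as soon as the difference between correct and incorrect votes reaches +δ or -δ. Under these assumptions the process is the classical gambler's-ruin Markov chain, whose textbook closed forms are used below. They are offered as an illustration and may differ in form from the equations the authors derive.

```python
def closed_form(p, delta):
    """Gambler's-ruin results for a walk started at margin 0 with barriers at +/-delta.
    Returns (probability the consensus is correct, expected number of votes)."""
    q = 1.0 - p
    quality = p**delta / (p**delta + q**delta)
    if abs(p - q) < 1e-12:
        expected_votes = float(delta * delta)
    else:
        expected_votes = delta / (p - q) * (p**delta - q**delta) / (p**delta + q**delta)
    return quality, expected_votes

def propagate(p, delta, max_steps=10_000):
    """Numerical check: push probability mass through the absorbing chain step by step."""
    q = 1.0 - p
    dist = {0: 1.0}          # vote margin -> probability mass still in play
    correct = 0.0            # mass absorbed at +delta
    expected_votes = 0.0     # sum over absorption times of time * mass
    for step in range(1, max_steps + 1):
        nxt = {}
        for margin, mass in dist.items():
            for move, prob in ((1, p), (-1, q)):
                m, w = margin + move, mass * prob
                if m == delta:
                    correct += w
                    expected_votes += step * w
                elif m == -delta:
                    expected_votes += step * w
                else:
                    nxt[m] = nxt.get(m, 0.0) + w
        dist = nxt
        if sum(dist.values()) < 1e-15:
            break
    return correct, expected_votes

def min_delta(p, target_quality):
    """Smallest threshold whose consensus quality reaches the target (needs p > 0.5)."""
    if p <= 0.5:
        raise ValueError("no finite threshold reaches the target when p <= 0.5")
    delta = 1
    while closed_form(p, delta)[0] < target_quality:
        delta += 1
    return delta

for p in (0.6, 0.8):
    d = min_delta(p, 0.95)
    print(p, d, closed_form(p, d))
```

The last loop shows the quality-equivalence idea from the abstract in miniature: less accurate workers need a larger δ, and therefore more votes, to reach the same consensus quality.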
# 大規模漸近解析による統計的推測のための確率勾配アルゴリズムのチューニング Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics ( http://arxiv.org/abs/2207.12395v3 ) ライセンス: Link先を確認	Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H. Huggins	(参考訳) 最適化とサンプリングのための確率勾配アルゴリズム(SGA)のチューニングはしばしば一般化可能な理論ではなくヒューリスティックスと試行錯誤に基づいている。この理論は,SGAの大規模統計的漸近をステップサイズ-サンプルサイズスケーリング制限によって特徴付けることによって,実践的ギャップを解消する。そこで本研究では,mleサンプリング分布に比例する共分散を漸近的に有し,大きな固定ステップサイズの反復平均化がチューニングパラメータの選択にロバストであることを示す。また,モデル不特定化に頑健な一般化された後部についても,チューニングを導くためのベルンシュタイン・ヴォン・ミセス的定理を証明した。数値実験により、現実的な有限サンプル状態における結果とレコメンデーションが検証される。我々の研究は、幅広いモデルに対する他の確率勾配マルコフ連鎖モンテカルロアルゴリズムの系統的解析の基礎を成している。 The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.	翻訳日:2023-07-21 19:07:49 公開日:2023-07-20
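As a toy illustration of iterate averaging with a fixed step size, mentioned in the entry above, the sketch below runs constant-step SGD on the one-dimensional loss 0.5 * (theta - x)^2, whose minimiser is the data mean, and reports both the last iterate and the average of all iterates. The model, data distribution, and step sizes are assumptions chosen for simplicity. This is not the paper's experimental setup and it does not reproduce the paper's scaling-limit analysis.

```python
import random

def averaged_sgd(data, step_size, theta0=0.0):
    """Constant-step SGD on 0.5 * (theta - x)**2, one observation per step.
    Returns (last iterate, average of all iterates)."""
    theta = theta0
    running_sum = 0.0
    for x in data:
        theta -= step_size * (theta - x)  # stochastic gradient is (theta - x)
        running_sum += theta
    return theta, running_sum / len(data)

rng = random.Random(1)
TRUE_MEAN = 3.0
data = [rng.gauss(TRUE_MEAN, 1.0) for _ in range(20_000)]

results = {}
for h in (0.05, 0.5, 1.0):
    last, avg = averaged_sgd(data, h)
    results[h] = (last, avg)
    print(f"step={h}: last iterate={last:.3f}, averaged iterate={avg:.3f}")
```

In this toy problem the last iterate keeps fluctuating at a scale set by the step size, while the averaged iterate stays close to the data mean for every step size tried. This is the kind of insensitivity to tuning that the abstract describes.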
# オンライン実験設計による線形MDPのインスタンス依存ニア最適ポリシー同定 Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design ( http://arxiv.org/abs/2207.02575v2 ) ライセンス: Link先を確認	Andrew Wagenmaker, Kevin Jamieson	(参考訳) 強化学習(RL)のミニマックスサンプル複雑性("Worst-case"インスタンスでの学習の複雑さ)を理解するために多くの進歩があったが、そのような複雑さの尺度は学習の真の困難を捉えていないことが多い。実際、"簡単"なインスタンスでは、最悪のケースで達成可能なものよりもはるかに複雑なものを達成することを望んでいます。本研究は,線形関数近似を用いたRLの設定において,ニア最適化ポリシー(PAC RL)を学習する際の「インスタンス依存」の複雑さを理解することを目的とする。本稿では,関数近似設定付きrlにおいて,その1つ目となる,複雑性のきめ細かなインスタンス依存測度を実現するアルゴリズムである \textsc{pedel} を提案する。明示的な例を通して,低regret,minimax-Optimalアルゴリズムよりも証明可能なゲインが得られ,そのようなアルゴリズムがインスタンス最適化率に到達できないことを示す。提案手法は, 探索予算を, 最適に近い政策の学習に最も関係のある「方向」に着目し, 独立した興味を持ったオンライン実験手法に依拠する。 While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.	翻訳日:2023-07-21 19:07:24 公開日:2023-07-20
# グラフ理論からの不確実性関係 Uncertainty relations from graph theory ( http://arxiv.org/abs/2207.02197v4 ) ライセンス: Link先を確認	Carlos de Gois, Kiara Hansenne, Otfried Gühne	(参考訳) 量子測定は本質的に確率的であり、しばしば同時測定の結果を正確に予測することを禁じられる。この現象は不確実性関係によって捉え、定量化される。量子論の発端から研究されているが、量子測定の集合の期待値を決定する問題は、一般には未解決のままである。可観測物とグラフ理論の密接な関係を構築することにより、任意の二値可観測物に対して妥当な不確実性関係を導出する。これらの関係は、多くの場合、密で、関連するグラフの最大クリークの大きさに関連している。応用として, エントロピーの不確実性関係, 分離可能性基準, 絡み合い証人の定式化に, 本結果は直接的に利用できる。 Quantum measurements are inherently probabilistic and quantum theory often forbids precise prediction of the outcomes of simultaneous measurements. This phenomenon is captured and quantified through uncertainty relations. Although studied since the inception of quantum theory, the problem of determining the possible expectation values of a collection of quantum measurements remains, in general, unsolved. By constructing a close connection between observables and graph theory, we derive uncertainty relations valid for any set of dichotomic observables. These relations are, in many cases, tight, and related to the size of the maximum clique of the associated graph. As applications, our results can be straightforwardly used to formulate entropic uncertainty relations, separability criteria and entanglement witnesses.	翻訳日:2023-07-21 19:07:01 公開日:2023-07-20
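As a worked illustration of the kind of relation described in the entry above, consider the standard special case of dichotomic observables that pairwise anticommute. This is a textbook result given here for orientation. It is not claimed to be the general graph-based bound of the paper, and the symbols A_i, a_i, and B are notation introduced for this sketch.

```latex
% Assumption: A_1, ..., A_n are Hermitian, dichotomic and pairwise anticommuting.
\begin{align}
  A_i^2 = \mathbb{1}, \qquad A_i A_j + A_j A_i = 0 \quad (i \neq j).
\end{align}
% For a fixed state, write a_i = \langle A_i \rangle and define B = \sum_i a_i A_i.
\begin{align}
  B^2 &= \sum_i a_i^2 A_i^2 + \sum_{i<j} a_i a_j \,(A_i A_j + A_j A_i)
       = \Big(\sum_i a_i^2\Big)\,\mathbb{1}, \\
  \sum_i a_i^2 = \langle B \rangle &\le \sqrt{\langle B^2 \rangle}
       = \Big(\sum_i a_i^2\Big)^{1/2}
  \quad\Longrightarrow\quad
  \sum_i \langle A_i \rangle^2 \le 1 .
\end{align}
% Example: the Pauli operators give
% \langle X \rangle^2 + \langle Y \rangle^2 + \langle Z \rangle^2 \le 1 for every qubit state.
```

By contrast, observables that all commute can simultaneously take expectation values of magnitude one, so the structure of commutation relations within the set is what limits the sum.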
# convolutional generative adversarial networkを用いたノイズ時系列のデータ駆動モデリング Data-Driven Modeling of Noise Time Series with Convolutional Generative Adversarial Networks ( http://arxiv.org/abs/2207.01110v3 ) ライセンス: Link先を確認	Adam Wunderlich, Jack Sklar	(参考訳) 物理過程から生じるランダムノイズは測定の固有の特性であり、ほとんどの信号処理やデータ解析タスクの制限要因である。データ駆動型モデリングにおけるGAN(Generative Adversarial Network)に対する近年の関心を考えると、GANがターゲットデータセットのノイズを忠実に再現できる範囲を決定することが重要である。本稿では,この問題を時系列で解明することを目的とした実証的な調査を行う。すなわち、一般的な深層畳み込みGAN(DCGAN)アーキテクチャ、直接時系列モデル、短時間フーリエ変換(STFT)データ表現を用いた画像ベースモデルに基づく時系列用汎用GANを2つ評価する。 GANモデルは、既知の地絡パラメータを持つ模擬ノイズ時系列の分布を用いて、訓練および定量的評価を行う。ターゲットの時系列分布には、帯域制限熱ノイズ、電力法ノイズ、ショットノイズ、衝動ノイズなど、物理測定、電子機器、通信システムで一般的に見られる幅広い種類のノイズが含まれる。 ganは、多くのノイズタイプを学習できるが、ganアーキテクチャがノイズのいくつかの側面、例えば、極端な異常値を持つ衝動時系列に適していない場合、予測的に苦労する。本研究は, 時系列GANに対する現在のアプローチの能力と潜在的な限界に関する知見と, 今後の研究分野のハイライトを提供するものである。さらに,テストのバッテリは時系列の深部生成モデルの開発に役立つ有用なベンチマークを提供する。 Random noise arising from physical processes is an inherent characteristic of measurements and a limiting factor for most signal processing and data analysis tasks. Given the recent interest in generative adversarial networks (GANs) for data-driven modeling, it is important to determine to what extent GANs can faithfully reproduce noise in target data sets. In this paper, we present an empirical investigation that aims to shed light on this issue for time series. Namely, we assess two general-purpose GANs for time series that are based on the popular deep convolutional GAN (DCGAN) architecture, a direct time-series model and an image-based model that uses a short-time Fourier transform (STFT) data representation. The GAN models are trained and quantitatively evaluated using distributions of simulated noise time series with known ground-truth parameters. Target time series distributions include a broad range of noise types commonly encountered in physical measurements, electronics, and communication systems: band-limited thermal noise, power law noise, shot noise, and impulsive noise. 
We find that GANs are capable of learning many noise types, although they predictably struggle when the GAN architecture is not well suited to some aspects of the noise, e.g., impulsive time-series with extreme outliers. Our findings provide insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid the development of deep generative models for time series.	翻訳日:2023-07-21 19:06:49 公開日:2023-07-20
# 周辺視トランスフォーマ Vicinity Vision Transformer ( http://arxiv.org/abs/2206.10552v2 ) ライセンス: Link先を確認	Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong	(参考訳) 視覚変換器は多くのコンピュータビジョンタスクで大きな成功を収めている。しかし、その中心的なコンポーネントであるSoftmax attentionは、計算複雑性とメモリフットプリントが二次的であるため、視覚変換器が高解像度の画像にスケールアップすることを禁止している。同様の問題を緩和するために自然言語処理(nlp)タスクに線形注意が導入されたが、既存の線形注意を視覚トランスフォーマーに直接適用することは、十分な結果をもたらすことはない。この問題を調査し,コンピュータビジョンタスクがNLPタスクよりもローカル情報に重点を置いていることを見出した。この観測に基づいて,線形複雑度を有する視覚変換器に局所性バイアスを導入するビシニティ注意法を提案する。具体的には,各画像パッチに対して,隣接パッチを用いて測定した2次元マンハッタン距離に基づいて注意重みを調節する。この場合、近隣のパッチは遠方のパッチよりも強い注目を集める。さらに,その効率性を示すためにはトークン長を特徴量よりも大きくする必要があるため,精度を損なうことなく特徴量を削減する新しい近傍視覚トランスフォーマ(vvt)構造を提案する。我々は,CIFAR100, ImageNet1K, ADE20Kデータセットについて広範囲に実験を行い,本手法の有効性を検証した。提案手法は,入力解像度が大きくなると,従来のトランスフォーマーベースおよび畳み込みベースネットワークよりもGFlopsの速度が遅い。特に,従来の手法よりも50%少ないパラメータで,最先端の画像分類精度を実現する。 Vision transformers have shown great success on numerous computer vision tasks. However, its central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, due to both the computational complexity and memory footprint being quadratic. Although linear attention was introduced in natural language processing (NLP) tasks to mitigate a similar issue, directly applying existing linear attention to vision transformers may not lead to satisfactory results. We investigate this problem and find that computer vision tasks focus more on local information compared with NLP tasks. Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance measured by its neighbouring patches. In this case, the neighbouring patches will receive stronger attention than far-away patches. 
Moreover, since our Vicinity Attention requires the token length to be much larger than the feature dimension to show its efficiency advantages, we further propose a new Vicinity Vision Transformer (VVT) structure to reduce the feature dimension without degenerating the accuracy. We perform extensive experiments on the CIFAR100, ImageNet1K, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate of GFlops than previous transformer-based and convolution-based networks when the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.	翻訳日:2023-07-21 19:06:22 公開日:2023-07-20
# Pythae: Pythonで生成オートエンコーダを統合する - ベンチマークユースケース Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case ( http://arxiv.org/abs/2206.08309v2 ) ライセンス: Link先を確認	Clément Chadebec and Louis J. Vincent and Stéphanie Allassonnière	(参考訳) 近年,複雑な分布をモデル化する能力から,深い生成モデルへの関心が高まっている。これらのモデルのうち、変分オートエンコーダは計算効率が良く、複数の分野で印象的な結果をもたらすことが証明され、人気を集めている。このブレークスルーの後、オリジナルの出版を改善するために広範な研究が行われ、様々なタスクに対応する様々なVAEモデルが生み出された。本稿では,汎用的なPythonライブラリであるPythaeについて述べる。Pythaeは統一的な実装と,生成型オートエンコーダモデルの単純で再現性があり,信頼性の高い使用を可能にする専用フレームワークを提供する。次に,本ライブラリを用いてケーススタディベンチマークを行い,画像再構成,生成,分類,クラスタリング,補間といった下流タスクにおける主な改善点を代表する19個の生成型オートエンコーダモデルを提示し,比較する。オープンソースライブラリはhttps://github.com/clementchadebec/benchmark_VAEにある。 In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions. Among those models, variational autoencoders have gained popularity as they have proven both to be computationally efficient and yield impressive results in multiple fields. Following this breakthrough, extensive research has been done in order to improve the original publication, resulting in a variety of different VAE models in response to different tasks. In this paper we present Pythae, a versatile open-source Python library providing both a unified implementation and a dedicated framework allowing straightforward, reproducible and reliable use of generative autoencoder models. We then propose to use this library to perform a case study benchmark where we present and compare 19 generative autoencoder models representative of some of the main improvements on downstream tasks such as image reconstruction, generation, classification, clustering and interpolation. The open-source library can be found at https://github.com/clementchadebec/benchmark_VAE.	翻訳日:2023-07-21 19:06:00 公開日:2023-07-20
# 事前学習された知覚機能は差分プライベート画像生成を改善する Pre-trained Perceptual Features Improve Differentially Private Image Generation ( http://arxiv.org/abs/2205.12900v4 ) ライセンス: Link先を確認	Fredrik Harder and Milad Jalali Asadabadi and Danica J. Sutherland and Mijung Park	(参考訳) 偏極性確率勾配勾配勾配(DP-SGD)を持つ中等度サイズの生成モデルの訓練は困難であり、適切なプライバシーレベルに必要なノイズレベルは、単に大きすぎる。代わりに、情報のある公開データセットに適切な、関連する表現を構築し、その表現でプライベートデータをモデル化することを学びます。特に、公開データセットから学習した知覚的特徴に基づくカーネルを用いて、プライベートなターゲットデータとジェネレータの分散との間の最大平均不一致(mmd)を最小限に抑える。 mmdでは、dp-sgdのように最適化の各ステップにノイズを導入するのではなく、データ依存の用語を何度でも民営化することができる。当社のアルゴリズムでは,MNISTやFashionMNISTなどのデータセットを大容量の$\epsilon \approx 10$で対象とする,分散における特徴を捉えたCIFAR10レベルのイメージを$\epsilon \approx 2$で生成することができる。我々の研究は、プライベートと非プライベートの深層生成モデルの間のギャップを減らすためのシンプルで強力な基盤を導入しました。私たちのコードは \url{https://github.com/ParkLabML/DP-MEPF} で利用可能です。 Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models. 
Our code is available at https://github.com/ParkLabML/DP-MEPF.	翻訳日:2023-07-21 19:05:44 公開日:2023-07-20
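The idea of privatizing the data-dependent term of the MMD only once, described in the entry above, can be sketched as follows. With a finite-dimensional feature map, the squared MMD is the squared distance between the two mean feature embeddings, so only the embedding of the private data touches private records. The sketch assumes the features are already extracted (short lists of floats stand in for perceptual features), normalises each to unit norm, and takes the Gaussian noise scale `sigma` as an input instead of deriving it from a privacy budget. None of the names below come from the DP-MEPF code base.

```python
import math
import random

def normalize(v):
    """Scale a feature vector to unit L2 norm so each record has bounded influence."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def mean_embedding(features):
    """Average of the unit-norm feature vectors."""
    feats = [normalize(f) for f in features]
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]

def privatize_once(embedding, sigma, rng):
    """Add Gaussian noise to the private mean embedding a single time.
    With unit-norm features and m records, replacing one record moves the mean
    by at most 2/m in L2 norm; sigma must be calibrated to that sensitivity."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]

def mmd2(emb_a, emb_b):
    """Squared MMD for the linear kernel on the features: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(emb_a, emb_b))

# Toy "private" and "generated" feature sets (assumed, two-dimensional).
private_feats = [[1.0, 0.0], [0.0, 1.0]]
generated_feats = [[1.0, 0.0], [1.0, 0.0]]

exact = mmd2(mean_embedding(private_feats), mean_embedding(generated_feats))

rng = random.Random(0)
noisy_target = privatize_once(mean_embedding(private_feats), sigma=0.01, rng=rng)

# The same noisy target is reused at every generator update; no further access
# to the private data (and no further noise) is needed.
losses = [mmd2(noisy_target, mean_embedding(generated_feats)) for _ in range(3)]
print(exact, losses)
```

This contrasts with DP-SGD, where fresh noise is injected at every optimisation step.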
# MotionBERT:人間の動きの表現を学習する統一的な視点 MotionBERT: A Unified Perspective on Learning Human Motion Representations ( http://arxiv.org/abs/2210.06551v4 ) ライセンス: Link先を確認	Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang	(参考訳) 本稿では,大規模・異種データ資源から人間の動作表現を学習し,人間中心のビデオ課題に取り組むための統一的な視点を提案する。具体的には,ノイズのある部分的な2次元観測から基礎となる3次元運動を復元するために,動きエンコーダを訓練する事前学習ステージを提案する。この方法で得られた運動表現は、人の動きに関する幾何学的、運動学的、物理的知識を取り入れており、容易に複数の下流タスクに転送できる。動作エンコーダをDST(Dual-stream Spatio-temporal Transformer)ニューラルネットワークで実装する。骨格関節の長距離時空間的関係を包括的かつ適応的に捉え、スクラッチから訓練された場合の最低3次元ポーズ推定誤差を例示する。さらに,提案手法は,学習した動作表現の汎用性を示す単純な回帰ヘッド(1-2層)で事前学習した動きエンコーダを微調整することで,3つの下流タスクの最先端性能を実現する。コードとモデルはhttps://motionbert.github.io/で入手できる。 We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. Specifically, we propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations. The motion representations acquired in this way incorporate geometric, kinematic, and physical knowledge about human motion, which can be easily transferred to multiple downstream tasks. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network. It could capture long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, exemplified by the lowest 3D pose estimation error so far when trained from scratch. Furthermore, our proposed framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with a simple regression head (1-2 layers), which demonstrates the versatility of the learned motion representations. Code and models are available at https://motionbert.github.io/	翻訳日:2023-07-21 19:00:18 公開日:2023-07-20
# ローカルクエリはいつ堅牢な学習に有用か? When are Local Queries Useful for Robust Learning? ( http://arxiv.org/abs/2210.06089v2 ) ライセンス: Link先を確認	Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, James Worrell	(参考訳) 正確なボール内ロバストリスクと、Gourdeau et al. (2019) によるランダムな例へのアクセスを考えると、概念クラスの堅牢な学習性には分布仮定が必要であることが示されている。本稿では,局所的クエリを用いて学習者がより多くのパワーを与えられる学習モデルについて検討し,このロバスト性の概念に対してロバストな経験的リスク最小化(erm)を行う最初の分散フリーアルゴリズムを提案する。私たちが検討する最初の学習モデルは、学習者がトレーニングサンプルの近くのポイントのラベルをクエリできるローカルメンバシップクエリ(LMQ)を使用する。均一分布の下では、LMQ は接続の堅牢性しきい値や、決定リストやハーフスペースのような任意のスーパークラスを増大させません。この否定的な結果に直面した私たちは、ローカル等価クエリ(\mathsf{leq}$)オラクルを紹介します。これは、仮説と対象概念がトレーニングサンプルの点の摂動領域で一致しているか、あるいはその存在が反例なのかを返します。一方、クエリ半径$\lambda$が敵の摂動予算$\rho$より厳密に小さい場合、分散のない堅牢な学習は様々な概念クラスでは不可能である。そして、オンライン学習保証に基づいてこれらのアルゴリズムの問合せ複雑性を制限し、特別な結合の場合にはこれらの境界をさらに改善します。最後に、$\{0,1\}^n$のハーフスペースに対するロバストな学習アルゴリズムを与え、精度制限された敵に対して$\mathbb{r}^n$のハーフスペースに対するロバスト性保証を得る。 Distributional assumptions have been shown to be necessary for the robust learnability of concept classes when considering the exact-in-the-ball robust risk and access to random examples by Gourdeau et al. (2019). In this paper, we study learning models where the learner is given more power through the use of local queries, and give the first distribution-free algorithms that perform robust empirical risk minimization (ERM) for this notion of robustness. The first learning model we consider uses local membership queries (LMQ), where the learner can query the label of points near the training sample. We show that, under the uniform distribution, LMQs do not increase the robustness threshold of conjunctions and any superclass, e.g., decision lists and halfspaces. Faced with this negative result, we introduce the local equivalence query ($\mathsf{LEQ}$) oracle, which returns whether the hypothesis and target concept agree in the perturbation region around a point in the training sample, as well as a counterexample if it exists. 
We show a separation result: on the one hand, if the query radius $\lambda$ is strictly smaller than the adversary's perturbation budget $\rho$, then distribution-free robust learning is impossible for a wide variety of concept classes; on the other hand, the setting $\lambda=\rho$ allows us to develop robust ERM algorithms. We then bound the query complexity of these algorithms based on online learning guarantees and further improve these bounds for the special case of conjunctions. We finish by giving robust learning algorithms for halfspaces on $\{0,1\}^n$ and then obtaining robustness guarantees for halfspaces in $\mathbb{R}^n$ against precision-bounded adversaries.	翻訳日:2023-07-21 18:59:58 公開日:2023-07-20
# MAP:マルチモーダル不確かさを意識したビジョンランゲージ事前学習モデル MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model ( http://arxiv.org/abs/2210.05335v3 ) ライセンス: Link先を確認	Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang	(参考訳) マルチモーダルな意味理解は、しばしば不確実性を扱う必要があり、つまり、得られたメッセージは複数のターゲットを参照する傾向がある。このような不確実性は、モーダル間の不確実性を含む私たちの解釈には問題があります。この不確実性のモデリング、特にラベルのないデータセットの事前トレーニングやタスク固有のダウンストリームデータセットの微調整についてはほとんど研究されていない。本稿では,確率分布エンコーダ(Probability Distribution Encoder:PDE)を用いて,全てのモードを確率分布として表現する。既存の決定論的手法と比較して、そのような不確実性モデリングはよりリッチなマルチモーダル意味情報やより複雑な関係を伝達することができる。さらに、一般的な事前学習フレームワークと不確実性モデリングを統合し、分布ベース視覚言語コントラスト学習(D-VLC)、分布ベースマスケッド言語モデリング(D-MLM)、分布ベース画像テキストマッチング(D-ITM)といった適切な事前学習タスクを提案する。微調整されたモデルは、画像テキスト検索、視覚的質問応答、視覚的推論、視覚的推論などの下流タスクに適応し、最先端の結果を達成する。 Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort has studied the modeling of this uncertainty, particularly in pre-training on unlabeled datasets and fine-tuning in task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder (PDE) by utilizing sequence-level interactions. Compared to the existing deterministic methods, such uncertainty modeling can convey richer multimodal semantic information and more complex relationships. Furthermore, we integrate uncertainty modeling with popular pre-training frameworks and propose suitable pre-training tasks: Distribution-based Vision-Language Contrastive learning (D-VLC), Distribution-based Masked Language Modeling (D-MLM), and Distribution-based Image-Text Matching (D-ITM). 
The fine-tuned models are applied to challenging downstream tasks, including image-text retrieval, visual question answering, visual reasoning, and visual entailment, and achieve state-of-the-art results.	翻訳日:2023-07-21 18:59:12 公開日:2023-07-20
# 敵対的ノイズに対するフレンドリーなノイズ:データ中毒攻撃に対する強力な防御 Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks ( http://arxiv.org/abs/2208.10224v4 ) ライセンス: Link先を確認	Tian Yu Liu, Yu Yang, Baharan Mirzasoleiman	(参考訳) 見えない)データ中毒攻撃の強力なカテゴリは、特定のテスト時間データの予測を変更するために、小さな敵の摂動によってトレーニング例のサブセットを変更する。既存の防御機構は、しばしば一般化性能を著しく損なうか、攻撃固有のもので、適用が著しく遅いため、実際に配備されることは望ましくない。そこで本研究では, 従来の手法とは異なり, 一般化性能の低下により, 各種の目に見えない毒素攻撃を回避できる簡易かつ高効率な手法を提案する。攻撃が局所的な鋭い領域に高訓練損失をもたらし、それが最小化されると、敵の摂動を学習し、攻撃を成功させるという重要な観察を行う。毒殺攻撃を打破するためには、毒によって引き起こされる鋭い喪失領域を緩和する。そこで本手法は, 性能を劣化させることなく, 最大摂動音に対して発生する最適化親和性雑音と, ランダムに変化する雑音成分の2成分からなる。両方のコンポーネントの組み合わせは、非常に軽量だが、最も強力なトリガーレスターゲットおよび隠れトリガーバックドア中毒攻撃に対して非常に効果的に防御する、例えば勾配マッチング、ブルズアイポリトープ、睡眠剤などである。我々は、我々のフレンドリーなノイズが他のアーキテクチャに転送可能であることを示し、適応的な攻撃はランダムなノイズ成分のために我々の防御を損なうことができないことを示す。私たちのコードは、https://github.com/tianyu139/friendly-noiseで利用可能です。 A powerful category of (invisible) data poisoning attacks modify a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm the generalization performance, or are attack-specific, and prohibitively slow to apply. Here, we propose a simple but highly effective approach that unlike existing methods breaks various types of invisible poisoning attacks with the slightest drop in the generalization performance. We make the key observation that attacks introduce local sharp regions of high training loss, which when minimized, results in learning the adversarial perturbations and makes the attack successful. To break poisoning attacks, our key idea is to alleviate the sharp loss regions introduced by poisons. To do so, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading the performance, and a randomly varying noise component. 
The combination of both components builds a very light-weight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and adaptive attacks cannot break our defense due to its random noise component. Our code is available at: https://github.com/tianyu139/friendly-noise	翻訳日:2023-07-21 18:57:23 公開日:2023-07-20
# 世論市場モデル:ポジティブな介入による極右意見の拡散 Opinion Market Model: Stemming Far-Right Opinion Spread using Positive Interventions ( http://arxiv.org/abs/2208.06620v2 ) ライセンス: Link先を確認	Pio Calderon, Rohit Ram, Marian-Andrei Rizoiu	(参考訳) オンライン過激主義は、ヘイトスピーチの正規化、ユーザーの過激化、社会的分裂の増加など、深刻な社会的結果をもたらす。これらの結果に対処するために様々な緩和戦略が検討されている。そのような戦略の1つはポジティブな介入:特定の意見を促進するために意見エコシステムに注意を向ける制御されたシグナルである。ポジティブ介入の有効性を評価するために,オピニオン間相互作用とポジティブ介入の役割の両方を考慮した2層オンラインオピニオン・エコシステムモデルであるオピニオン・マーケット・モデル(omm)を提案する。市場注目市場の大きさは、多変量離散時間ホークスプロセスを用いて第1階層でモデル化され、第2階層では、市場シェアアトラクションモデルを用いて限られた注意を払って、意見が協調して市場シェアを競う。合成データセット上で提案した推定手法の収束性を示す。次に、2つの学習タスクでOMMをテストし、2つの実世界のデータセットを適用して市場シェアを予測し、オンラインアイテム間の潜伏関係を明らかにする。最初のデータセットはfacebookとtwitterの議論で、ブッシュファイアと気候変動に関する中道と極右の意見を含んでいる。第2のデータセットは、人気のVEVOアーティストのYouTubeとTwitterのアテンションボリュームをキャプチャする。 OMMは、両方のデータセットで最先端の予測モデルより優れており、潜在的な協調競合関係を捉えている。我々は,(1)ブッシュファイアに関する極右意見と中道派意見の自己・相互強化,(2)コラボレーションや長期にわたる確執といった現実世界の相互作用と相関する対関係的アーティスト関係を明らかにする。最後に、OMMを肯定的な介入のためのテストベッドとして使用し、メディアカバレッジが極右意見の拡散をどう調節するかを示す。 Online extremism has severe societal consequences, including normalizing hate speech, user radicalization, and increased social divisions. Various mitigation strategies have been explored to address these consequences. One such strategy uses positive interventions: controlled signals that add attention to the opinion ecosystem to boost certain opinions. To evaluate the effectiveness of positive interventions, we introduce the Opinion Market Model (OMM), a two-tier online opinion ecosystem model that considers both inter-opinion interactions and the role of positive interventions. The size of the opinion attention market is modeled in the first tier using the multivariate discrete-time Hawkes process; in the second tier, opinions cooperate and compete for market share, given limited attention using the market share attraction model. We demonstrate the convergence of our proposed estimation scheme on a synthetic dataset. 
Next, we test OMM on two learning tasks, applying to two real-world datasets to predict attention market shares and uncover latent relationships between online items. The first dataset comprises Facebook and Twitter discussions containing moderate and far-right opinions about bushfires and climate change. The second dataset captures popular VEVO artists' YouTube and Twitter attention volumes. OMM outperforms the state-of-the-art predictive models on both datasets and captures latent cooperation-competition relations. We uncover (1) self- and cross-reinforcement between far-right and moderate opinions on the bushfires and (2) pairwise artist relations that correlate with real-world interactions such as collaborations and long-lasting feuds. Lastly, we use OMM as a testbed for positive interventions and show how media coverage modulates the spread of far-right opinions.	翻訳日:2023-07-21 18:56:10 公開日:2023-07-20
# カオスと乱流のニューラルネットワーク複雑性 Neural Network Complexity of Chaos and Turbulence ( http://arxiv.org/abs/2211.15382v2 ) ライセンス: Link先を確認	Tim Whittaker, Romuald A. Janik, Yaron Oz	(参考訳) カオスと乱流は複雑な物理現象であるが、それらを定量化する複雑性測度の正確な定義はまだ欠けている。本研究では,深層ニューラルネットワークの観点からカオスと乱流の相対的複雑性を考える。本研究では, カオス状態における流体プロファイルと, 様々なノイズ構造, 実世界の画像などの他の種類の画像とを, ネットワークが区別しなければならない一連の分類問題を解析する。非圧縮性および弱い圧縮性流体流の解析を行う。本研究では,内部特徴表現の内在的な次元を通してネットワークが行う計算の複雑さを定量化し,ネットワークがクラスを区別するために使用する独立特徴の有効数を算出する。この尺度は計算の複雑さを数値的に推定するだけでなく、中間段階と最終段階におけるニューラルネットワーク処理を特徴付ける。逆例を構築し,これらを用いてカオス渦と乱流渦の2点相関スペクトルを,ネットワークが分類に用いた特徴として同定する。 Chaos and turbulence are complex physical phenomena, yet a precise definition of the complexity measure that quantifies them is still lacking. In this work we consider the relative complexity of chaos and turbulence from the perspective of deep neural networks. We analyze a set of classification problems, where the network has to distinguish images of fluid profiles in the turbulent regime from other classes of images such as fluid profiles in the chaotic regime, various constructions of noise and real world images. We analyze incompressible as well as weakly compressible fluid flows. We quantify the complexity of the computation performed by the network via the intrinsic dimensionality of the internal feature representations, and calculate the effective number of independent features which the network uses in order to distinguish between classes. In addition to providing a numerical estimate of the complexity of the computation, the measure also characterizes the neural network processing at intermediate and final stages. We construct adversarial examples and use them to identify the two point correlation spectra for the chaotic and turbulent vorticity as the feature used by the network for classification.	翻訳日:2023-07-21 18:48:31 公開日:2023-07-20
# テンソルネットワークを用いた正のラベルなし学習 Positive unlabeled learning with tensor networks ( http://arxiv.org/abs/2211.14085v3 ) ライセンス: Link先を確認	Bojan Žunkovič	(参考訳) 正のラベルなし学習は、正例データとラベルなしデータを持つ二項分類問題である。医療やパーソナライズされた広告など、ネガティブなラベルが高価または不可能なドメインでは一般的である。正のラベルなし学習へのほとんどのアプローチは、特定のデータ型(画像、分類データなど)に適用され、新しい正と負のサンプルを生成できない。この研究は、正の未ラベル学習問題に対する特徴空間距離に基づくテンソルネットワークアプローチを導入する。提案手法はドメイン固有ではなく、MNIST画像と15の分類/混合データセットの最先端結果を大幅に改善する。トレーニングされたテンソルネットワークモデルもまた生成モデルであり、新しい正および負のインスタンスの生成を可能にする。 Positive unlabeled learning is a binary classification problem with positive and unlabeled data. It is common in domains where negative labels are costly or impossible to obtain, e.g., medicine and personalized advertising. Most approaches to positive unlabeled learning apply to specific data types (e.g., images, categorical data) and cannot generate new positive and negative samples. This work introduces a feature-space distance-based tensor network approach to the positive unlabeled learning problem. The presented method is not domain specific and significantly improves the state-of-the-art results on the MNIST image and 15 categorical/mixed datasets. The trained tensor network model is also a generative model and enables the generation of new positive and negative instances.	翻訳日:2023-07-21 18:48:14 公開日:2023-07-20
# オンライン強化学習におけるオフラインデータ活用 Leveraging Offline Data in Online Reinforcement Learning ( http://arxiv.org/abs/2211.04974v2 ) ライセンス: Link先を確認	Andrew Wagenmaker, Aldo Pacchiano	(参考訳) 強化学習(RL)コミュニティには,オンラインRLとオフラインRLという,2つの中心的なパラダイムが出現している。オンラインRL設定では、エージェントは環境に関する事前の知識を持っておらず、$\epsilon$-最適ポリシーを見つけるためにそれと対話する必要がある。オフラインRL設定では、学習者は、学習する固定データセットにアクセスするが、それ以外は環境とのインタラクションができず、オフラインデータから可能な最高のポリシーを取得する必要がある。もしいくつかのオフラインデータがあり、環境と相互作用する可能性があるなら、オフラインデータを使って$\epsilon$-最適ポリシーを学ぶのに必要なオンラインインタラクションの数を最小化できるだろうか? 本研究では、線形構造を持つMDPに対してこの設定を考察し、これを FineTuneRL 設定と呼ぶ。オフラインデータセットへのアクセスを前提に、この設定で必要なオンラインサンプルの数を特徴付け、$H$ 因子を除いて証明可能に最適なアルゴリズムである FTPedel を開発します。オフラインデータとオンラインインタラクションを組み合わせることで、純粋にオフラインまたは純粋にオンラインRLよりも証明可能な改善がもたらされる、という明確な例を示す。最後に、オンラインRLにおける典型的な設定である「検証可能(verifiable)な学習」と、オフラインRLにおいてしばしば考慮される「検証不可能(unverifiable)な学習」の区別を示し、これらの領域間に形式的な分離が存在することを示す。 Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the FineTuneRL setting, for MDPs with linear structure. We characterize the necessary number of online samples needed in this setting given access to some offline dataset, and develop an algorithm, FTPedel, which is provably optimal, up to $H$ factors. 
We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between verifiable learning, the typical setting considered in online RL, and unverifiable learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.	翻訳日:2023-07-21 18:47:27 公開日:2023-07-20
# オブザーバベース逆強化学習における等価解に対する不合理性と収束 Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning ( http://arxiv.org/abs/2210.16299v3 ) ライセンス: Link先を確認	Jared Town, Zachary Morrison, Rushikesh Kamalapurkar	(参考訳) オンラインおよびリアルタイムに決定論的逆強化学習(IRL)問題を解決する上で重要な課題は、複数の解が存在することである。非特異性は等価解の概念、すなわち異なるコスト関数的だが同じフィードバック行列をもたらす解、およびそのような解への収束の研究を必要とする。同等のソリューションに収束するオフラインアルゴリズムが文献で開発されているが、非合理性に対処するオンラインリアルタイム技術は利用できない。本稿では、IRL問題のほぼ等価解に収束する正規化履歴スタックオブザーバを開発する。本手法の有効性を実証するために,新しいデータリッチネス条件を開発し,シミュレーション結果を得た。 A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but same feedback matrix, and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis and simulation results are provided to demonstrate the effectiveness of the developed technique.	翻訳日:2023-07-21 18:47:00 公開日:2023-07-20
# 音声対音声比較のためのテキストレス指標 A Textless Metric for Speech-to-Speech Comparison ( http://arxiv.org/abs/2210.11835v2 ) ライセンス: Link先を確認	Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu	(参考訳) 本稿では,テキストの書き起こしに頼らずに音声の発話を比較する方法を提案する。我々は,HuBERTのような最先端の音声2ユニットエンコーダを用いて,発話を離散音響単位に変換する。次に,テキストベースと密接に対応した音声ベースのメトリクスを学習する,シンプルで容易に複製可能なニューラルアーキテクチャを提案する。このテキストレスメートル法には、音声から音声への翻訳の評価や、信頼できるASRシステムを持たない言語、あるいはASRの転写を完全に回避するなど、多くの潜在的な応用がある。また、音声から音声への翻訳評価において、ASR系が強い場合でも、音声仮説と参照と文レベルのBLEUを自動で書き起こしするASR-BLEUが、実際のテキストBLEUのプロキシとして不十分であることを示す。 In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages and for languages without dependable ASR systems, or avoiding the need for ASR transcription altogether. This paper also shows that, for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the reference and computing sentence-level BLEU between the transcripts) is a poor proxy for real text-BLEU even when the ASR system is strong.	翻訳日:2023-07-21 18:46:22 公開日:2023-07-20
# エンタングルメントハミルトンのエッジとバルクに対する異なる温度依存性 Different temperature-dependence for the edge and bulk of entanglement Hamiltonian ( http://arxiv.org/abs/2210.10062v2 ) ライセンス: Link先を確認	Menghan Song, Jiarui Zhao, Zheng Yan, and Zi Yang Meng	(参考訳) 本稿では, 経路積分定式化のワームホール効果に基づく物理図面を提案し, エンタングルメントスペクトル(ES)のメカニズムを説明するとともに, エネルギースペクトルのバルクエッジ対応とES(LiとHaldane予想)のトポロジ的状態を説明するとともに, それらのトポロジ的性質とは無関係に他のシステムに適用可能であることを示す。最終的に、システムの低層ESの挙動を決定するエッジエネルギーギャップに対して、バルクエネルギーギャップ(逆温度$\beta=1/T$)の相対的な強度であることを示した。状況によっては、ESは仮想エッジのエネルギースペクトルに似ているが、仮想バルクのエネルギースペクトルを表すこともできる。我々は、LiとHaldaneが0温度で予想するエッジのようなケースに加えて、有限温度でバルク状の低層ESを実証するために、1Dと2Dの両方でモデルを設計する。本研究は,ESを経路積分におけるワームホール効果,およびESのエッジとバルクの温度依存性の一般性を支持するものである。 We propose a physical picture based on the wormhole effect of the path-integral formulation to explain the mechanism of entanglement spectrum (ES), such that, our picture not only explains the topological state with bulk-edge correspondence of the energy spectrum and ES (the Li and Haldane conjecture), but is generically applicable to other systems independent of their topological properties. We point out it is ultimately the relative strength of bulk energy gap (multiplied with inverse temperature $\beta=1/T$) with respect to the edge energy gap that determines the behavior of the low-lying ES of the system. Depending on the circumstances, the ES can resemble the energy spectrum of the virtual edge, but can also represent that of the virtual bulk. We design models both in 1D and 2D to successfully demonstrate the bulk-like low-lying ES at finite temperatures, in addition to the edge-like case conjectured by Li and Haldane at zero temperature. Our results support the generality of viewing the ES as the wormhole effect in the path integral and the different temperature-dependence for the edge and bulk of ES.	翻訳日:2023-07-21 18:46:05 公開日:2023-07-20
# ニューラルネットワーク学習のためのデータ効率向上 Data-Efficient Augmentation for Training Neural Networks ( http://arxiv.org/abs/2210.08363v3 ) ライセンス: Link先を確認	Tian Yu Liu and Baharan Mirzasoleiman	(参考訳) データ拡張は、多くのディープラーニングアプリケーションで最先端のパフォーマンスを達成するために不可欠である。しかし、最も効果的な拡張技術は、中規模のデータセットでも計算的に禁止される。そこで本研究では,拡張されたデータポイントのサブセットを選択するための厳密な手法を提案する。まず,加法摂動としてモデル化されたデータ拡張は,ネットワークジャコビアンのより小さな特異値を相対的に拡大・摂動することで学習と一般化を改善し,その顕著な方向を維持していることを示す。これにより、過剰フィッティングが防止され、情報を学ぶのが難しくなる。そこで本研究では,学習データの小さな部分集合を反復的に抽出するフレームワークを提案する。本手法により得られた拡張部分集合に対する確率的勾配降下法は、完全に拡張されたデータと同様のトレーニングダイナミクスを持つことを示す。実験により, CIFAR10では6.3倍, SVHNでは2.2倍の高速化を実現し, 各種サブセットサイズでベースラインを最大10%上回る性能を示した。同様に、TinyImageNetとImageNetでは、ベースラインを最大8%上回り、様々なサブセットサイズで最大3.3倍のスピードアップを実現しています。最後に、ラベルノイズで破損させたCIFAR10において、本手法で選んだ50%のサブセットで訓練・拡張を行うと、完全なデータセットを用いた場合をも上回った。私たちのコードは、https://github.com/tianyu139/data-efficient-augmentationで利用可能です。 Data augmentation is essential to achieve state-of-the-art performance in many deep learning applications. However, the most effective augmentation techniques become computationally prohibitive for even medium-sized datasets. To address this, we propose a rigorous technique to select subsets of data points that when augmented, closely capture the training dynamics of full data augmentation. We first show that data augmentation, modeled as additive perturbations, improves learning and generalization by relatively enlarging and perturbing the smaller singular values of the network Jacobian, while preserving its prominent directions. This prevents overfitting and enhances learning the harder to learn information. Then, we propose a framework to iteratively extract small subsets of training data that when augmented, closely capture the alignment of the fully augmented Jacobian with labels/residuals. We prove that stochastic gradient descent applied to the augmented subsets found by our approach has similar training dynamics to that of fully augmented data. 
Our experiments demonstrate that our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes. Similarly, on TinyImageNet and ImageNet, our method beats the baselines by up to 8%, while achieving up to 3.3x speedup across various subset sizes. Finally, training on and augmenting 50% subsets using our method on a version of CIFAR10 corrupted with label noise even outperforms using the full dataset. Our code is available at: https://github.com/tianyu139/data-efficient-augmentation	翻訳日:2023-07-21 18:45:43 公開日:2023-07-20
# ThoughtSource: 大規模言語モデル推論のための中心的なハブ ThoughtSource: A central hub for large language model reasoning data ( http://arxiv.org/abs/2301.11596v4 ) ライセンス: Link先を確認	Simon Ott, Konstantin Hebenstreit, Valentin Liévin, Christoffer Egeberg Hother, Milad Moradi, Maximilian Mayrhauser, Robert Praas, Ole Winther, Matthias Samwald	(参考訳) GPT-4のような大規模言語モデル(LLM)は、最近、幅広いタスクで印象的な結果を示した。 LLMは依然として制限されているが、複雑な推論でしばしば失敗し、推論プロセスは不透明であり、事実を「幻覚させる」傾向があるため、その根底にあるバイアスには懸念がある。モデルが推論ステップを自然言語として言語化する手法は、近年、これらの問題に対処する方法として提案されている。ここでは、思考の連鎖(CoT)推論のためのメタデータおよびソフトウェアライブラリであるThoughtSourceを紹介します。 ThoughtSourceの目標は、CoTの質的理解を促進し、経験的評価を可能にし、トレーニングデータを提供することによって、将来の人工知能システムを改善することである。 ThoughtSourceの最初のリリースでは、6つの科学的/医学的、3つの一般ドメイン、5つの数学語質問応答データセットを統合している。 Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates six scientific/medical, three general-domain and five math word question answering datasets.	翻訳日:2023-07-21 18:39:13 公開日:2023-07-20
# 自律運転における協調的知覚 : 方法・データセット・課題 Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges ( http://arxiv.org/abs/2301.06262v2 ) ライセンス: Link先を確認	Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, Yidong Li	(参考訳) 協調認識は、自律運転における閉塞とセンサ障害の問題に対処するために不可欠である。近年,協調的知覚のための新作の理論的,実験的研究が著しく増加している。しかし、これまでのところ、体系的なコラボレーションモジュールと大規模な協調認識データセットに焦点を当てたレビューはほとんどない。この研究は、このギャップを埋め、将来の研究を動機付けるために、この分野における最近の成果をレビューする。まずは、コラボレーションスキームの概要から始めます。その後,理想的シナリオと実世界の課題に対する協調的知覚手法を体系的に要約する。前者はコラボレーションモジュールと効率に重点を置いており、後者は実際のアプリケーションの問題に対処することに集中しています。さらに, 大規模公開データセットを提示し, これらのベンチマークを定量的に要約する。最後に,現在の学術研究と実世界の応用とのギャップと見過ごされた課題を強調する。 Collaborative perception is essential to address occlusion and sensor failure issues in autonomous driving. In recent years, theoretical and experimental investigations of novel works for collaborative perception have increased tremendously. So far, however, few reviews have focused on systematical collaboration modules and large-scale collaborative perception datasets. This work reviews recent achievements in this field to bridge this gap and motivate future research. We start with a brief overview of collaboration schemes. After that, we systematically summarize the collaborative perception methods for ideal scenarios and real-world issues. The former focus on collaboration modules and efficiency, and the latter is devoted to addressing the problems in actual application. Furthermore, we present large-scale public datasets and summarize quantitative results on these benchmarks. Finally, we highlight gaps and overlooked challenges between current academic research and real-world applications.	翻訳日:2023-07-21 18:38:44 公開日:2023-07-20
# 単体マトリョーシカにおけるエンタングルメントの開花 Entanglement blossom in a simplex matryoshka ( http://arxiv.org/abs/2301.04170v2 ) ライセンス: Link先を確認	Zhao Zhang	(参考訳) エキゾチックな絡み合いエントロピースケーリング特性は、通常、実空間における興味深い絡み合い構造と時空格子の新しい計量をもたらす。 1つの顕著な例は、結合強度の強い不均一性から有効に長い範囲のカップリングにより、中心形状のベル対に対称な格子サイトが存在する虹鎖である。この写本はレインボー連鎖をハウスドルフ次元 1 の格子上の高次元空間に一般化し、ハミルトニアンフラストレーションを自由に保つ局所ヒルベルト空間を拡大する。シュリーファー・ウルフ変換の有効なハミルトニアンは、0$-次元(完全連結)の反強磁性ハミルトニアンを持つ、k$-単体の層を積み重ねることで与えられる。元の格子は、通常のk$-次元立方体格子で回位(ディスクリネーション)欠陥を増殖させ、格子の中心に曲率を導入することで得られる。このモデルはSYKモデルと自由フェルミオンXXスピン鎖の間を補間するので、ブラックホール物理学やホログラフィーを理解するのに有用かもしれない。 Exotic entanglement entropy scaling properties usually come with interesting entanglement structures in real space and novel metrics of the spacetime lattice. One prominent example is the rainbow chain where lattice sites symmetric about the center form entangled Bell pairs due to an effective long-range coupling from the strong inhomogeneity of the coupling strength. This manuscript generalizes the rainbow chain to higher dimensional space on lattices with Hausdorff dimension one and enlarged local Hilbert space keeping the Hamiltonian frustration free. The effective Hamiltonian from the Schrieffer-Wolff transformation is given by a stacking of layers of $k$-simplices with $0$-dimensional (fully-connected) antiferromagnetic Hamiltonians, which can be diagonalized analytically with Young operators. The original lattice can be obtained from proliferating disclination defects in a regular $k$-dimensional cubical lattice, which introduces curvature at the center of the lattice. The model interpolates between the SYK model and the free-fermionic XX spin chain, and hence might be potentially useful in understanding black hole physics and holography.	翻訳日:2023-07-21 18:38:34 公開日:2023-07-20
# イベントカメラデータの事前トレーニング Event Camera Data Pre-training ( http://arxiv.org/abs/2301.01928v3 ) ライセンス: Link先を確認	Yan Yang and Liyuan Pan and Liu Liu	(参考訳) 本稿では,イベントカメラデータを扱うためのトレーニング済みニューラルネットワークを提案する。私たちのモデルは、自己教師付き学習フレームワークであり、ペアのイベントカメラデータと自然なrgbイメージを使用してトレーニングを行います。提案手法は3つのモジュールを連続して連結する。一自己監督訓練のための有意義なイベント画像を生成するイベントデータ増強の家系二イベント画像から有意義なイベントパッチをサンプリングし、我々のモデルにシーンの空間配置を捉え、訓練を加速させるための条件付きマスキング戦略三一致したイベント画像とペア化されたイベント画像とRGB画像との埋め込みの類似性を強制する対照的な学習方法。イベント画像の埋め込み類似性を高める際に, モデル崩壊を回避するために, 埋め込み投影損失を提案する。イベント画像が特徴空間における対のrgb画像と一致するようにするための確率分布アライメント損失を提案する。ダウンストリームタスクにおける転送学習性能は,最先端手法よりも優れていることを示す。例えば、N-ImageNetデータセットにおいて、トップ1の精度は64.83%に達する。 This paper proposes a pre-trained neural network for handling event camera data. Our model is a self-supervised learning framework, and uses paired event camera data and natural RGB images for training. Our method contains three modules connected in a sequence: i) a family of event data augmentations, generating meaningful event images for self-supervised training; ii) a conditional masking strategy to sample informative event patches from event images, encouraging our model to capture the spatial layout of a scene and accelerating training; iii) a contrastive learning approach, enforcing the similarity of embeddings between matching event images, and between paired event and RGB images. An embedding projection loss is proposed to avoid the model collapse when enforcing the event image embedding similarities. A probability distribution alignment loss is proposed to encourage the event image to be consistent with its paired RGB image in the feature space. Transfer learning performance on downstream tasks shows the superiority of our method over state-of-the-art methods. For example, we achieve top-1 accuracy at 64.83% on the N-ImageNet dataset.	翻訳日:2023-07-21 18:38:12 公開日:2023-07-20
# 定数係数を持つ線形偏微分方程式系のガウス過程優先 Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients ( http://arxiv.org/abs/2212.14319v3 ) ライセンス: Link先を確認	Marc Härkönen, Markus Lange-Hegermann, Bogdan Raiță	(参考訳) 偏微分方程式(PDE)は物理システムをモデル化するための重要なツールであり、それらを機械学習モデルに含めることは物理知識を組み込む重要な方法である。定数係数の線形PDE系の任意の系が与えられたとき、我々はガウス過程(GP)先行系の族を提案し、これをEPGPと呼び、すべての実現がこの系の正確な解である。非線形フーリエ変換として働くehrenpreis-palamodov基本原理を適用し、gpsの標準スペクトル法を反映するgpカーネルを構築する。提案手法は,ノイズ測定や初期値,境界値などのデータから線形PDEシステムの確率解を推定できる。 EPGPプライヤの構築はアルゴリズム的であり、一般に適用可能であり、関連するスペクトル周波数を学習し、ビッグデータに対してよりうまく機能するスパースバージョン(S-EPGP)が付属している。我々はPDEの3種類の系、熱方程式、波動方程式、マクスウェル方程式について、いくつかの実験において計算時間と精度における技術の状態を改善する方法を示す。 Partial differential equations (PDEs) are important tools to model physical systems and including them into machine learning models is an important way of incorporating physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works as a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.	翻訳日:2023-07-21 18:37:37 公開日:2023-07-20
# 眼周囲バイオメトリックス : 非拘束シナリオのモダリティ Periocular Biometrics: A Modality for Unconstrained Scenarios ( http://arxiv.org/abs/2212.13792v2 ) ライセンス: Link先を確認	Fernando Alonso-Fernandez, Josef Bigun, Julian Fierrez, Naser Damer, Hugo Proença, Arun Ross	(参考訳) 眼窩 (periocular) は、眼窩を取り囲む顔の外側に見える領域を指す。この特徴に富んだ領域は、アイリスや顔のモダリティが部分的閉塞や被写体間距離の上昇といった要因のために十分な生体計測的手がかりを提供しない、非拘束的または非協力的なシナリオにおいて正確な識別を提供することができる。新型コロナウイルス(COVID-19)のパンデミックは、マスクが普及しているため、コントロールされた設定でも目に見える唯一の顔領域であり続けたため、その重要性をさらに強調した。本稿では、眼周囲バイオメトリックスにおける技術の現状について論じ、その最も重要な研究側面を包含する全体的な枠組みを示す。 (a) 眼の定義,取得及び検出 (b)他のモダリティとの組合せ及び各種スペクトルの使用を含む識別及び (c)眼ソフトバイオメトリック解析。最後に,現在の課題に対処し,今後の方向性を提案する。 Periocular refers to the externally visible region of the face that surrounds the eye socket. This feature-rich area can provide accurate identification in unconstrained or uncooperative scenarios, where the iris or face modalities may not offer sufficient biometric cues due to factors such as partial occlusion or high subject-to-camera distance. The COVID-19 pandemic has further highlighted its importance, as the ocular region remained the only visible facial area even in controlled settings due to the widespread use of masks. This paper discusses the state of the art in periocular biometrics, presenting an overall framework encompassing its most significant research aspects, which include: (a) ocular definition, acquisition, and detection; (b) identity recognition, including combination with other modalities and use of various spectra; and (c) ocular soft-biometric analysis. Finally, we conclude by addressing current challenges and proposing future directions.	翻訳日:2023-07-21 18:37:15 公開日:2023-07-20
# 木構造学習による変数ネットワークの不確かさの定量化 Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning ( http://arxiv.org/abs/2212.12658v2 ) ライセンス: Link先を確認	Wenxuan Ma, Xing Yan, and Kun Zhang	(参考訳) 分散ネットワークの不確かさを定量化するために,不確実性に基づく特徴空間を複数の領域に分割する新しい木構造局所ニューラルネットワークモデルを提案する。葉ノードは、地域固有のニューラルネットワークをトレーニングして、不確実性を定量化するための平均と分散の両方を予測する、異なる領域を表す。提案したUncertainty-Splitting Neural Regression Tree (USNRT)は、新しいスプリッティング基準を採用している。各ノードにおいて、ニューラルネットワークをまず完全なデータに基づいてトレーニングし、その間に最も顕著な不均一性を持つ2つのサブリージョンに対応する、最良の分割を見つけるための残差の統計的テストを行う。 USNRTは、葉ノードが十分であり、刈り取りは不要であるため、計算に親しみやすい。さらに、アンサンブル版を簡単に構築して、気道およびてんかんを含む総不確実性を推定することができる。広範なuciデータセットにおいて、usnrtまたはそのアンサンブルは、分散による不確かさを定量化する最近の一般的な方法に比べて優れた性能を示している。包括的可視化と分析を通じて、USNRTがどのように機能するかを明らかにし、そのメリットを示し、不確実な不均一性が多くのデータセットに存在し、USNRTで学習できることを明らかにする。 To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built upon giving the training data, whose leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is trained on the full data first, and a statistical test for the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can be easily constructed to estimate the total uncertainty including the aleatory and epistemic. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. 
Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.	翻訳日:2023-07-21 18:37:00 公開日:2023-07-20
# 量子ウォークに基づくParrondoの量子探索ゲーム Parrondo's game of quantum search based on quantum walk ( http://arxiv.org/abs/2303.06579v2 ) ライセンス: Link先を確認	Taisuke Hosaka and Norio Konno	(参考訳) パロンドが考案したパロンドゲームは、負け戦略の組み合わせから勝利戦略が構築されることを意味する。この状況をパロンドパラドックス(Parrondo paradox)と呼ぶ。量子ウォークに基づくParrondoゲームと量子ウォークによる探索アルゴリズムは,それぞれ広く研究されている。本稿では,両モデルを組み合わせた量子ウォークに基づく量子探索のパロンドゲームを提案する。さらに, 数値シミュレーションにより1次元および2次元トーラス上のモデルに対するParrondoのパラドックスの存在を確認した。その後、パラドックスが発生する範囲は、偶数頂点と1つのマークされた頂点を持つ $d$-dimensional torus $(d \geq 1)$ の原点について対称であることを示した。 The Parrondo game, devised by Parrondo, means that a winning strategy is constructed from a combination of losing strategies. This situation is called the Parrondo paradox. The Parrondo game based on quantum walk and the search algorithm via quantum walk have each been widely studied. This paper newly presents a Parrondo game of quantum search based on quantum walk by combining both models. Moreover, we confirm that Parrondo's paradox exists for our model on the one- and two-dimensional torus by numerical simulations. Afterwards, we show that the range in which the paradox occurs is symmetric about the origin on the $d$-dimensional torus $(d \geq 1)$ with even vertices and one marked vertex.	翻訳日:2023-07-21 18:28:53 公開日:2023-07-20
# Langevin Monte Carloの完全な分析に向けて: Poincaréの不等式を超えて Towards a Complete Analysis of Langevin Monte Carlo: Beyond Poincaré Inequality ( http://arxiv.org/abs/2303.03589v2 ) ライセンス: Link先を確認	Alireza Mousavi-Hosseini and Tyler Farghly and Ye He and Krishnakumar Balasubramanian and Murat A. Erdogdu	(参考訳) ランゲヴィン拡散は適切な機能的不等式仮定の下で急速に収束する。したがって、離散化誤差を扱うための追加の滑らかさ条件により、ランジュバン・モンテカルロ(LMC)のような離散化も同様に収束することが期待できる。この研究プログラムは、Vempala and Wibisono (2019)によって始められ、ログソボレフの不等式で結果を確立した。 Chewi et al. (2022) は結果を Poincaré の不等式を扱うように拡張した。本稿では,Poincaréの不等式を超えて,この研究プログラムを限界まで押し上げる。我々は、多項式減衰する裾の重い密度(すなわちコーシー型)を含む大きな密度のクラスで満たされる弱いポアンカレ不等式の下でランゲヴィン拡散と LMC の上下境界を確立する。本結果は,初期化器がLMCアルゴリズムの性能に与える影響を明示的に定量化する。特に、尾が準ガウスから亜指数へ、そして最後にコーシー様へと進むと、初期誤差への依存は対数的から多項式へ、そして最後に指数的であることを示す。この3段階の位相遷移は、我々の下界が示すように特に避けられないものであり、LMCの境界を明確に定義している。 Langevin diffusions are rapidly convergent under appropriate functional inequality assumptions. Hence, it is natural to expect that with additional smoothness conditions to handle the discretization errors, their discretizations like the Langevin Monte Carlo (LMC) converge in a similar fashion. This research program was initiated by Vempala and Wibisono (2019), who established results under log-Sobolev inequalities. Chewi et al. (2022) extended the results to handle the case of Poincaré inequalities. In this paper, we go beyond Poincaré inequalities, and push this research program to its limit. We do so by establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities that are satisfied by a large class of densities including polynomially-decaying heavy-tailed densities (i.e., Cauchy-type). Our results explicitly quantify the effect of the initializer on the performance of the LMC algorithm. In particular, we show that as the tail goes from sub-Gaussian, to sub-exponential, and finally to Cauchy-like, the dependency on the initial error goes from being logarithmic, to polynomial, and then finally to being exponential. 
This three-step phase transition is in particular unavoidable as demonstrated by our lower bounds, clearly defining the boundaries of LMC.	翻訳日:2023-07-21 18:28:42 公開日:2023-07-20
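As a rough illustration of the LMC iteration discussed in the entry above, the following minimal sketch runs the standard unadjusted Langevin update x_{k+1} = x_k - h ∇V(x_k) + sqrt(2h) ξ_k. The target V(x) = x^2/2 (a standard Gaussian), the step size, the far-away starting point, and all function names are assumptions chosen for illustration only; the heavy-tailed targets analyzed in the paper are not reproduced here.

```python
import math
import random


def grad_potential(x):
    # Gradient of V(x) = x^2 / 2, i.e. a standard Gaussian target.
    # This target is an illustrative assumption, not the paper's setting.
    return x


def langevin_monte_carlo(x0, step, n_steps, seed=0):
    """Run the unadjusted Langevin iteration and return every iterate."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on the potential plus Gaussian noise scaled by sqrt(2h).
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Start far from the mode to make the role of the initializer visible.
    samples = langevin_monte_carlo(x0=5.0, step=0.1, n_steps=21000)
    kept = samples[1000:]  # discard a burn-in prefix
    mean = sum(kept) / len(kept)
    var = sum((s - mean) ** 2 for s in kept) / len(kept)
    print(f"mean={mean:.3f}, variance={var:.3f}")
```

For this Gaussian target the iteration is linear, so its stationary variance is 1 / (1 - h/2) rather than exactly 1; the small excess is the discretization bias that the smoothness conditions in such analyses are meant to control.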
# MultiRobustBench: 複数の攻撃に対するロバスト性のベンチマーク MultiRobustBench: Benchmarking Robustness Against Multiple Attacks ( http://arxiv.org/abs/2302.10980v3 ) ライセンス: Link先を確認	Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal	(参考訳) 敵の例に対する防御に関する既存の研究の多くは、単一の(通常は境界付けられたLp-ノルム)攻撃に対する防御に焦点を当てているが、実際は機械学習(ML)モデルは様々な攻撃に対して堅牢であるべきである。本稿では,MLモデルに対する多重攻撃を考慮した最初の統一フレームワークを提案する。我々のフレームワークは、テスト時の敵に対する学習者の知識の異なるレベルをモデル化することができ、予期せぬ攻撃に対する頑健さと攻撃の結合に対する堅牢さをモデル化することができる。このフレームワークを用いて,攻撃型と攻撃強度をまたいだ性能を捉えるマルチアタック評価のベンチマークを行うための,最初のリーダボードであるmultirobustbenchを提案する。我々は,lpベースの脅威モデル,空間的変換,色変化を含む9種類の攻撃タイプに対するロバスト性に対する16種類の防御モデルの性能を20種類の攻撃強度(合計180攻撃)で評価した。さらに、複数の攻撃に対する現在の防御状況を分析する。我々の分析によると、既存の防御は、使用される攻撃セット全体の平均ロバストネスを進歩させたが、最悪の攻撃に対するロバストネスは依然として大きなオープンな問題であり、既存のすべてのモデルがランダムな推測よりも悪化している。 The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded Lp-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework is able to model different levels of learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. 
Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem as all existing models perform worse than random guessing.	翻訳日:2023-07-21 18:27:57 公開日:2023-07-20
# 双極子膜におけるスピン励起のモーメント選択対生成 Momentum-selective pair creation of spin excitations in dipolar bilayers ( http://arxiv.org/abs/2302.09059v2 ) ライセンス: Link先を確認	Thomas Bilitewski, G. A. Dom\'inguez-Castro, David Wellnitz, Ana Maria Rey, Luis Santos	(参考訳) 長距離・異方性双極子相互作用を媒介とするスピン1/2量子xxzモデルを実現する2次元二重層における量子相関の時間的成長と空間伝播について検討した。各層に逆磁化を持つスピンからなる初期状態から始めると、スピン構造因子における運動量依存性の動的不安定性の出現を予測し、その結果、短時間で指数関数的に速い速度で励起対を生成する。生成されたペアは、双極子配向、層分離または双極子カップリングを制御することで調整できる特徴的な運動量分布を示す。予測された挙動は、非常に低い充填率で観測可能であり、ライドバーグ原子、磁気原子、極性分子配列を用いた最先端の実験で見ることができる。 We study the temporal growth and spatial propagation of quantum correlations in a two-dimensional bilayer realising a spin-1/2 quantum XXZ model with couplings mediated by long-range and anisotropic dipolar interactions. Starting with an initial state consisting of spins with opposite magnetization in each of the layers, we predict the emergence of a momentum-dependent dynamic instability in the spin structure factor that results, at short times, in the creation of pairs of excitations at exponentially fast rates. The created pairs present a characteristic momentum distribution that can be tuned by controlling the dipolar orientation, the layer separation or the dipolar couplings. The predicted behavior remains observable at very low filling fractions, making it accessible in state-of-the-art experiments with Rydberg atoms, magnetic atoms, and polar molecule arrays.	翻訳日:2023-07-21 18:27:33 公開日:2023-07-20
# Navya3DSeg -- Navya 3Dセマンティックセグメンテーションデータセットと自動運転車のための分割生成 Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles ( http://arxiv.org/abs/2302.08292v3 ) ライセンス: Link先を確認	Alexandre Almin, L\'eo Lemari\'e, Anh Duong, B Ravi Kiran	(参考訳) 今日では、自動運転(AD)の認識は、キュレーションとアノテーションに関連するコストを伴う大規模な注釈付きデータセットを必要とするディープラーニングベースのアーキテクチャに大きく依存している。 3次元意味データは障害物検出や自車位置推定などのコア知覚タスクに有用である。本研究では,13カ国の農村,都市,工業地,大学を含む大規模な量産レベルの運用ドメインに対応する多様なラベル空間を持つ,Navya 3D Segmentation (Navya3DSeg)という新しいデータセットを提案する。23のラベル付きシーケンスと、ラベルのない25の補足的なシーケンスを含み、ポイントクラウド上の自己教師付きおよび半教師付きセマンティックセグメンテーションベンチマークを探索するように設計されている。また,反復的マルチラベル階層化に基づく逐次データセット分割生成手法を提案し,SemanticKITTIデータセットによって提案された元の分割よりも+1.2%のmIoU改善を実現することを示した。セマンティックセグメンテーションタスクの完全なベンチマークを,最先端の手法を用いて実施した。最後に、アクティブラーニング(AL)に基づくデータセット蒸留フレームワークを実演する。 ALの文脈において,エゴポーズ距離に基づくサンプリングと呼ぶ,ヒューリスティックに依存しない新しいサンプリング手法を導入する。データセットに関する詳細なプレゼンテーションは https://www.youtube.com/watch?v=5m6ALIs-s20 で公開されている。 Autonomous driving (AD) perception today relies heavily on deep learning based architectures requiring large scale annotated datasets with their associated costs for curation and annotation. The 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large scale production grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, which is demonstrated to achieve a +1.2% mIoU improvement over the original split proposed by the SemanticKITTI dataset. A complete benchmark for the semantic segmentation task was performed with state-of-the-art methods. 
Finally, we demonstrate an Active Learning (AL) based dataset distillation framework. We introduce a novel heuristic-free sampling method called ego-pose distance based sampling in the context of AL. A detailed presentation on the dataset is available here https://www.youtube.com/watch?v=5m6ALIs-s20.	翻訳日:2023-07-21 18:27:18 公開日:2023-07-20
# 関数上の分布を学習するための変分混合ハイパージェネレータ Variational Mixture of HyperGenerators for Learning Distributions Over Functions ( http://arxiv.org/abs/2302.06223v3 ) ライセンス: Link先を確認	Batuhan Koyuncu, Pablo Sanchez-Martin, Ignacio Peis, Pablo M. Olmos, Isabel Valera	(参考訳) 近年のアプローチは、暗黙的ニューラル表現(INR)に基づいて関数空間上の生成モデルを提案している。しかし、これらは欠損データの補完などの推論タスクを扱う際に計算コストが高いか、あるいはそうしたタスクを直接扱うことができない。本研究では,VAMoHと呼ばれる新しい深層生成モデルを提案する。 VAMoHはINRを用いた連続関数のモデリング機能と変分オートエンコーダ(VAE)の推論機能を組み合わせたものである。さらにVAMoHは、事前分布を定義するための正規化フローと、データの対数尤度をパラメータ化するハイパーネットワークの混合に依存している。これによりVAMoHは高い表現能力と解釈可能性が得られる。画像やボクセル,気候データなど,さまざまな種類のデータの実験を通じて,VAMoHは連続関数上の豊富な分布を効果的に学習できることを示す。さらに、条件付き超解像生成やインペインティングなどの推論関連タスクを、従来手法と同等以上の性能で、より少ない計算量で実行できる。 Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally costly when dealing with inference tasks, such as missing data imputation, or cannot tackle them directly. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capabilities of modeling continuous functions using INRs and the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH a high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well as or better than previous approaches, while being less computationally demanding.	翻訳日:2023-07-21 18:26:56 公開日:2023-07-20
# ChatGPTの数学的機能 Mathematical Capabilities of ChatGPT ( http://arxiv.org/abs/2301.13867v2 ) ライセンス: Link先を確認	Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Julius Berner	(参考訳) 公開データセットと手作りデータセットを用いて,chatgpt (9- january-2023 および 30- january-2023) と gpt-4 の2つのイテレーションの数学的能力について,新しい方法論を用いて検証した。形式的証明の大規模なデータベース(例えばリーン数学ライブラリ)が利用可能である形式数学とは対照的に、現在の自然言語数学のデータセットは言語モデルのベンチマークに使われ、初等数学のみをカバーするか、あるいは非常に小さい。この問題に対処するため、GHOSTSとminiGHOSTSという2つの新しいデータセットを公開しています。これらは、(1)大学院レベルの数学を対象とする数学研究者による最初の自然言語データセットであり、(2)言語モデルの数学的能力の全体像を提供し、(3)数学的推論の複数の次元を区別する。これらのデータセットはまた、ChatGPTとGPT-4が数学者の日々の職業活動で発生するユースケースをエミュレートすることで、プロの数学者の補助となるかどうかを検証している。モデルを、詳細なパフォーマンス指標でベンチマークします。高度な数学では、これは今までで最も詳細な評価である。この結果から,ChatGPTは数学的検索エンジンや知識ベースインタフェースとして機能し,事実を問合せするための数学的アシスタントとして最もうまく利用できることがわかった。 gpt-4は大学レベルの数学でも使えるが、大学院レベルの難易度では失敗する。 GPT-4とChatGPTの試験解決能力(選択バイアスの可能性)に関するメディアの多くの肯定的な報告とは対照的に、その全体的な数学的性能は大学院生のレベルよりかなり低い。したがって、ChatGPTを卒業レベルの数学試験に合格させることが目標ならば、平均的な仲間からのコピーをオフにする方がよいでしょう。 We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. 
These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!	翻訳日:2023-07-21 18:26:12 公開日:2023-07-20
# トポロジカルポイントクラウドクラスタリング Topological Point Cloud Clustering ( http://arxiv.org/abs/2303.16716v2 ) ライセンス: Link先を確認	Vincent P. Grande and Michael T. Schaub	(参考訳) 我々は,グローバルトポロジカル機能への貢献に基づいて任意のポイントクラウドにポイントをクラスタリングする新しい手法であるtopological point cloud clustering (tpcc)を提案する。 TPCCは、スペクトルクラスタリングとトポロジカルデータ解析から望ましい特徴を合成し、考慮された点雲に付随する単体錯体のスペクトル特性を考慮した。スパース固有ベクトル計算を考えることから、tpccも同様にスペクトルクラスタリングとして解釈および実装が容易である。しかし、点クラウドデータから生成されたグラフに付随する1つの行列に焦点をあてるだけでなく、適切に構築された単純複体に関連付けられたホッジ・ラプラシアン全体の集合に焦点を合わせることで、よりリッチな位相的特徴集合を利用して点クラウド内のデータポイントを特徴づけ、雑音に対するトポロジ的手法の相対ロバスト性から恩恵を受けることができる。合成データと実データの両方でtpccの性能をテストし,従来のスペクトルクラスタリングと比較した。 We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on considering the spectral properties of a simplicial complex associated to the considered point cloud. As it is based on considering sparse eigenvector computations, TPCC is similarly easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated to a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated to an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.	翻訳日:2023-07-21 18:20:09 公開日:2023-07-20
# 量子コンピュータを用いた生物シーケンス比較アルゴリズム A biological sequence comparison algorithm using quantum computers ( http://arxiv.org/abs/2303.13608v5 ) ライセンス: Link先を確認	B\"usra K\"osoglu-Kind, Robert Loredo, Michele Grossi, Christian Bernecker, Jody M Burks, Rudiger Buchkremer	(参考訳) 遺伝情報は、数千から数十億の文字で表されるヌクレオチドの線形配列に符号化される。変異はDNAまたはRNAヌクレオチド配列の変化を指す。したがって、突然変異検出は生物学や医学のあらゆる分野において不可欠である。病原性増強変異の注意深いモニタリングが不可欠である。しかし、このサイズの遺伝的配列を分析するには、膨大な量の古典計算能力が必要である。量子コンピュータ上での視覚の人間の知覚と画像のピクセル表現に着想を得て,これらの手法をペアワイズシーケンス解析に活用した。この手法は古典的アプローチよりも潜在的に有利であり、遺伝子配列の変異やその他の修正を特定するためにさらに応用することができる。本稿では,ヌクレオチド間の類似度を決定するために,類似度スコアを算出した量子コンピュータ上で2つのゲノム配列間の類似度を表示・解析する手法を提案する。 Genetic information is encoded in a linear sequence of nucleotides, represented by letters ranging from thousands to billions. Mutations refer to changes in the DNA or RNA nucleotide sequence. Thus, mutation detection is vital in all areas of biology and medicine. Careful monitoring of virulence-enhancing mutations is essential. However, an enormous amount of classical computing power is required to analyze genetic sequences of this size. Inspired by human perception of vision and pixel representation of images on quantum computers, we leverage these techniques to implement a pairwise sequence analysis. The methodology has a potential advantage over classical approaches and can be further applied to identify mutations and other modifications in genetic sequences. We present a method to display and analyze the similarity between two genome sequences on a quantum computer where a similarity score is calculated to determine the similarity between nucleotides.	翻訳日:2023-07-21 18:19:21 公開日:2023-07-20
# RegFormer:大規模ポイントクラウド登録のための効率的なプロジェクションアウェアトランスフォーマネットワーク RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration ( http://arxiv.org/abs/2303.12384v2 ) ライセンス: Link先を確認	Jiuming Liu, Guangming Wang, Zhe Liu, Chaokang Jiang, Marc Pollefeys, Hesheng Wang	(参考訳) ポイントクラウドの登録は、オブジェクトレベルのシーンや屋内シーンで著しい進歩を遂げているが、大規模な登録方法が探求されることはほとんどない。課題は主に、屋外LiDARスキャンの膨大な点数、複雑な分布、外れ値から生じる。さらに、既存の登録手法の多くは一般的に2段階のパラダイムを採用している。すなわち、まず識別的な局所特徴を抽出することで対応を見つけ、その後、推定器(例えばRANSAC)を利用して外れ値を除去するが、これはよく設計された記述子と後処理の選択に大きく依存する。そこで本研究では,後処理を必要としない大規模ポイントクラウドアライメントのためのエンドツーエンドトランスフォーマーネットワーク (RegFormer) を提案する。具体的には, 射影を考慮した階層型トランスフォーマーを提案し, 点特徴をグローバルに抽出することにより, 長距離依存を捉え, 外れ値を除去する。このトランスフォーマーは線形の計算量であり,大規模シーンにおいても高い効率性が保証される。さらに、ミスマッチを効果的に低減するために、初期変換を回帰するための全単射アソシエーショントランスフォーマーを設計する。 KITTIとNuScenesのデータセットに関する大規模な実験は、我々のRegFormerが精度と効率の両面で競争力のある性能を達成することを示した。 Although point cloud registration has achieved remarkable advances in object-level and indoor scenes, large-scale registration methods are rarely explored. Challenges mainly arise from the huge point number, complex distribution, and outliers of outdoor LiDAR scans. In addition, most existing registration works generally adopt a two-stage paradigm: they first find correspondences by extracting discriminative local features, and then leverage estimators (e.g., RANSAC) to filter outliers, which are highly dependent on well-designed descriptors and post-processing choices. To address these problems, we propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment without any further post-processing. Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers by extracting point features globally. Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes. Furthermore, to effectively reduce mismatches, a bijective association transformer is designed for regressing the initial transformation. 
Extensive experiments on KITTI and NuScenes datasets demonstrate that our RegFormer achieves competitive performance in terms of both accuracy and efficiency.	翻訳日:2023-07-21 18:18:51 公開日:2023-07-20
# 画像とビデオのキャプション評価のための正例拡張コントラスト学習 Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ( http://arxiv.org/abs/2303.12112v3 ) ライセンス: Link先を確認	Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara	(参考訳) CLIPモデルは最近、視覚・言語アーキテクチャから生成されたキャプションの評価など、多種多様なクロスモーダルタスクに非常に効果的であることが証明されている。本稿では,画像キャプションのためのコントラストベース評価尺度,すなわち Positive-Augmented Contrastive learning Score (PAC-S) の新しい構成法を提案する。これは、コントラスト学習による視覚・意味空間の学習と、キュレーションされたデータ上での生成画像・生成テキストの追加とを新しい形で統合する。いくつかのデータセットにまたがる実験により、私たちの新しいメトリクスは、画像とビデオの両方で人間の判断と最も高い相関を達成し、CIDErやSPICEのような既存の参照ベースのメトリクスとCLIP-Scoreのような参照なしメトリクスを上回ります。最後に,人気のあるキャプション手法を考慮した場合,提案手法のシステムレベル相関をテストし,異なるクロスモーダル特徴を用いた場合の影響を評価する。ソースコードとトレーニングされたモデルは、https://github.com/aimagelab/pacscore で公開されている。 The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures. In this paper, we propose a new recipe for a contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S), that in a novel way unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos, outperforming existing reference-based metrics like CIDEr and SPICE and reference-free metrics like CLIP-Score. Finally, we test the system-level correlation of the proposed metric when considering popular image captioning approaches, and assess the impact of employing different cross-modal features. Our source code and trained models are publicly available at: https://github.com/aimagelab/pacscore.	翻訳日:2023-07-21 18:18:28 公開日:2023-07-20
# すべてはデータに関するものだ – 逆のロバスト性に対するデータの影響に関する調査 It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness ( http://arxiv.org/abs/2303.09767v2 ) ライセンス: Link先を確認	Peiyu Xiong, Michael Tegegn, Jaskeerat Singh Sarin, Shubhraneel Pal, Julia Rubin	(参考訳) 敵の例は機械学習モデルへの入力であり、攻撃者が意図的にモデルを混同して間違いを起こすように設計した。このような例は、特に生命および安全クリティカルな領域において、機械学習ベースのシステムの適用性に深刻な脅威をもたらす。この問題に対処するため、敵対的堅牢性領域は、これらの攻撃に対する敵対的攻撃と防御の背後にあるメカニズムを調査している。本研究は, 避難攻撃時のモデルロバスト性の観点から, トレーニングデータの特性を調査することに焦点を当てた, この文献の特定のサブセットをレビューする。まず、敵の脆弱性につながるデータの主な特性を要約する。次に,データ表現と学習手順の強化による対向的ロバスト性向上のためのガイドラインと手法と,与えられた特定のデータに対するロバスト性保証を推定する手法について論じる。最後に、この領域における知識のギャップと将来的な研究の方向性について論じる。 Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks and defenses against these attacks. This survey reviews a particular subset of this literature that focuses on investigating properties of training data in the context of model robustness under evasion attacks. It first summarizes the main properties of data leading to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees given particular data. Finally, it discusses gaps of knowledge and promising future research directions in this area.	翻訳日:2023-07-21 18:17:23 公開日:2023-07-20
# 二部量子系の絡み合いダイナミクスに関する基礎的速度制限 Fundamental speed limits on entanglement dynamics of bipartite quantum systems ( http://arxiv.org/abs/2303.07415v2 ) ライセンス: Link先を確認	Vivek Pandey, Swapnil Bhowmick, Brij Mohan, Sohail, and Ujjwal Sen	(参考訳) エンタングルメントの速度限界は、物理的過程においてエンタングルメントが生成または劣化できる最大速度として定義される。エンタングルメントの相対エントロピーとトレース距離エンタングルメントを用いて、エンタングルメントの速度限界をユニタリと任意の量子力学で導出し、最も近い分離可能な状態のダイナミクスは、システムの実際のダイナミクスの最も近い分離可能なダイナミクスによって近似的に記述できると仮定する。純粋な状態によって記述される孤立二成分系のユニタリダイナミクスに対して、エンタングルメント生成の速度は、システムの駆動ハミルトニアンと超主作用素の揺らぎの積によって制限され、最接近分離可能な状態の時間依存性を反映した追加の項が与えられる。入力の純度と進化のユニタリ性に関する制限を取り除いた場合、境界内の2つの項は適切に変更される。さらに、任意の量子力学によりある程度の絡み合いを発生または分解するのに要する時間に対する低い境界を求める。実際に興味のある量子過程を考慮し, 絡み合いに対する速度制限の厳密さを示す。 The speed limits on entanglement are defined as the maximal rate at which entanglement can be generated or degraded in a physical process. We derive the speed limits on entanglement, using the relative entropy of entanglement and trace-distance entanglement, for unitary as well as for arbitrary quantum dynamics, where we assume that the dynamics of the closest separable state can be approximately described by the closest separable dynamics of the actual dynamics of the system. For unitary dynamics of isolated bipartite systems which are described by pure states, the rate of entanglement production is bounded by the product of fluctuations of the system's driving Hamiltonian and the surprisal operator, with an additional term reflecting the time-dependent nature of the closest separable state. Removing restrictions on the purity of the input and on the unitarity of the evolution, the two terms in the bound get suitably altered. Furthermore, we find a lower bound on the time required to generate or degrade a certain amount of entanglement by arbitrary quantum dynamics. We demonstrate the tightness of our speed limits on entanglement by considering quantum processes of practical interest.	翻訳日:2023-07-21 18:16:46 公開日:2023-07-20
# トップおよびバックビュードローン映像からのポーズ情報を用いたバドミントンダブルスの制御領域の推定 Estimation of control area in badminton doubles with pose information from top and back view drone videos ( http://arxiv.org/abs/2305.04247v2 ) ライセンス: Link先を確認	Ning Ding, Kazuya Takeda, Wenhui Jin, Yingjiu Bei, Keisuke Fujii	(参考訳) 動的競技におけるスポーツ選手のパフォーマンス分析へのビジュアルトラッキングの適用は,効果的なコーチングに不可欠である。ダブルスの試合では、連携した位置取りがコートのコントロールを維持し、対戦相手の得点機会を最小化するために重要である。このようなチームワークの分析はゲームのダイナミクスを理解する上で重要な役割を果たす。しかし,従来の研究は,放送映像におけるオクルージョンを考慮せずに,シングルスの選手の分析と評価に主に焦点を当ててきた。これらの研究は、特定のアクション(例えば、ストローク)やゲーム中に起こるイベントの分析と表現を含む離散的な表現に依存しており、意味のある空間分布を見落としてきた。本研究では,バドミントンダブルスにおけるトップ・バックビューからの最初の注釈付きドローンデータセットを提示し,チームワークのパフォーマンスを評価するために利用できる制御領域確率マップを推定するためのフレームワークを提案する。完全な確率曲面の計算を可能にするディープニューラルネットワークの効率的なフレームワークを提案する。このフレームワークはプレイヤーの位置のガウス混合マップの埋め込みを利用し、ポーズにグラフ畳み込みを用いる。実験では,様々なベースラインと比較し,スコアと制御領域の相関関係を見出すことにより,我々のアプローチを検証する。また,ゲーム中に指示を与えるための最適位置評価という実用的応用を提案する。このアプローチは,選手の動きを視覚的かつ定量的に評価し,ダブルスのチームワークに対する貴重な洞察を提供する。データセットと関連するプロジェクトコードはhttps://github.com/Ning-D/Drone_BD_ControlArea で入手できる。 The application of visual tracking to the performance analysis of sports players in dynamic competitions is vital for effective coaching. In doubles matches, coordinated positioning is crucial for maintaining control of the court and minimizing opponents' scoring opportunities. The analysis of such teamwork plays a vital role in understanding the dynamics of the game. However, previous studies have primarily focused on analyzing and assessing singles players without considering occlusion in broadcast videos. These studies have relied on discrete representations, which involve the analysis and representation of specific actions (e.g., strokes) or events that occur during the game while overlooking the meaningful spatial distribution. In this work, we present the first annotated drone dataset from top and back views in badminton doubles and propose a framework to estimate the control area probability map, which can be used to evaluate teamwork performance. 
We present an efficient framework of deep neural networks that enables the calculation of full probability surfaces. This framework utilizes the embedding of a Gaussian mixture map of players' positions and employs graph convolution on their poses. In the experiment, we verify our approach by comparing it with various baselines and discovering the correlations between the score and control area. Additionally, we propose a practical application for assessing optimal positioning to provide instructions during a game. Our approach offers both visual and quantitative evaluations of players' movements, thereby providing valuable insights into doubles teamwork. The dataset and related project code are available at https://github.com/Ning-D/Drone_BD_ControlArea	翻訳日:2023-07-21 18:09:00 公開日:2023-07-20
# BERT と Query-Aware LSH を用いた非公式ドキュメントにおけるコード例推薦の改善 : 比較検討 Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study ( http://arxiv.org/abs/2305.03017v3 ) ライセンス: Link先を確認	Sajjad Rahmani, AmirHossein Naghshzan, Latifa Guerrouj	(参考訳) 本研究は,すぐに使えるコードスニペットを提供することで開発者の時間を大幅に節約する,ソフトウェア開発者支援のためのコード例の推薦について検討する。私たちの研究の焦点はStack Overflowで、特にJavaプログラミング言語のコンテキストにおいて、コーディングに関する議論や解決策を得るためによく使われるリソースである。我々は,LLM(Large Language Model)であるBERTを適用し,コード例から意味情報を抽出して数値ベクトルに変換する。これらの数値表現が準備されたら、Locality-Sensitive Hashing (LSH) を用いて近似最近傍(ANN)を同定する。 LSHにはランダム・ハイパープレーン・ベースLSHとクエリ・アウェアLSHの2つのバリエーションを用いた。これらの2つのアプローチを,HitRate,Mean Reciprocal Rank (MRR),平均実行時間,関連性の4つの指標で厳密に比較した。本研究では,Random Hyperplane-based (RH) 法よりもQuery-Aware (QA) 法の方が優れた性能を示した。具体的には、RHアプローチと比較して、クエリペアに対してHitRateが20%から35%向上した。さらに、ハッシュテーブルの作成とデータサンプルのバケットへの割り当てが少なくとも4倍高速であり、QAアプローチは大幅に時間効率が高かった。QAアプローチはコード例をミリ秒以内に返すことができるが、RHアプローチは通常、コード例を推奨するのに数秒を要する。 QAアプローチの優れたパフォーマンスのため、最先端のベースラインであるPostFinderとFaCoYに対してテストしました。提案するQA手法は同等の効率を示し,効果的なコード推薦の可能性を示した。 Our research investigates the recommendation of code examples to aid software developers, a practice that saves developers significant time by providing ready-to-use code snippets. The focus of our study is Stack Overflow, a commonly used resource for coding discussions and solutions, particularly in the context of the Java programming language. We applied BERT, a powerful Large Language Model (LLM) that enables us to transform code examples into numerical vectors by extracting their semantic information. Once these numerical representations are prepared, we identify Approximate Nearest Neighbors (ANN) using Locality-Sensitive Hashing (LSH). Our research employed two variants of LSH: Random Hyperplane-based LSH and Query-Aware LSH. We rigorously compared these two approaches across four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. 
Our study revealed that the Query-Aware (QA) approach showed superior performance over the Random Hyperplane-based (RH) method. Specifically, it exhibited a notable improvement of 20% to 35% in HitRate for query pairs compared to the RH approach. Furthermore, the QA approach proved significantly more time-efficient, creating hash tables and assigning data samples to buckets at least four times faster. It can return code examples within milliseconds, whereas the RH approach typically requires several seconds to recommend code examples. Due to the superior performance of the QA approach, we tested it against PostFinder and FaCoY, the state-of-the-art baselines. Our QA method showed comparable efficiency, demonstrating its potential for effective code recommendation.	翻訳日:2023-07-21 18:08:35 公開日:2023-07-20
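To make the Random Hyperplane-based LSH baseline named in the entry above more concrete, here is a minimal sketch of sign-of-projection hashing with bucket lookup. It is not the authors' implementation and does not cover the Query-Aware variant; the function names, the number of bits, and the toy 4-dimensional vectors (stand-ins for BERT embeddings of code examples) are all hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian normal vector per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item keys into buckets that share the same bit signature."""
    buckets = {}
    for key, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(key)
    return buckets


def query(vec, buckets, hyperplanes):
    """Return candidate neighbours: every item in the query's own bucket."""
    return buckets.get(signature(vec, hyperplanes), [])


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=4)
    # Hypothetical embeddings; "post_b" points in the same direction as "post_a".
    corpus = {
        "post_a": [0.3, -1.2, 0.5, 2.0],
        "post_b": [0.6, -2.4, 1.0, 4.0],
        "post_c": [-0.3, 1.2, -0.5, -2.0],
    }
    index = build_index(corpus, planes)
    print(query([0.3, -1.2, 0.5, 2.0], index, planes))
```

Vectors separated by a small angle agree on most bits, so candidates are found by a dictionary lookup instead of a scan over the whole corpus; using more bits gives smaller, purer buckets at the cost of missing some true neighbours.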
# ブールネットワークの最小トラップ空間の普遍的性質に取り組む Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks ( http://arxiv.org/abs/2305.02442v2 ) ライセンス: Link先を確認	Sara Riva, Jean-Marie Lagniez, Gustavo Maga\~na L\'opez, Lo\"ic Paulev\'e	(参考訳) 最小トラップ空間(MTS)は、更新モードによらず、ブールダイナミクスが閉じ込められる部分空間を捉える。それらは最も寛容なモードのアトラクタに対応する。汎用性のため、MTSの計算は、本質的には列挙に焦点をあてることで、近年注目を集めている。本稿では, MTS の普遍的性質に関する論理的推論を,2つの問題の範囲で扱う。すなわち,すべての MTS 上で与えられた性質を強制する Boolean 変数の永久凍結を識別するための Boolean ネットワークの再プログラミングと, MTS 上の普遍的性質からの Boolean ネットワークの合成である。どちらの問題も、3段階の量化子($\exists\forall\exists$)を持つ量化命題論理式の充足可能性問題に帰着される。本稿では,2つのより単純な式の求解を結合することにより,これらの問題を効率的に解くための Counter-Example Guided Refinement Abstraction (CEGAR) を導入する。各式に対して解集合プログラミングを用いたプロトタイプを提供し、生物ネットワークの幅広いブールモデルでその扱いやすさを示す。 Minimal trap spaces (MTSs) capture subspaces in which the Boolean dynamics is trapped, whatever the update mode. They correspond to the attractors of the most permissive mode. Due to their versatility, the computation of MTSs has recently gained traction, essentially by focusing on their enumeration. In this paper, we address the logical reasoning on universal properties of MTSs in the scope of two problems: the reprogramming of Boolean networks for identifying the permanent freeze of Boolean variables that enforce a given property on all the MTSs, and the synthesis of Boolean networks from universal properties on their MTSs. Both problems reduce to solving the satisfiability of a quantified propositional logic formula with 3 levels of quantifiers ($\exists\forall\exists$). In this paper, we introduce a Counter-Example Guided Refinement Abstraction (CEGAR) to efficiently solve these problems by coupling the resolution of two simpler formulas. We provide a prototype relying on Answer-Set Programming for each formula and show its tractability on a wide range of Boolean models of biological networks.	翻訳日:2023-07-21 18:08:07 公開日:2023-07-20
# RadAdapt: 大規模言語モデルの軽量ドメイン適応による放射線レポート要約 RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models ( http://arxiv.org/abs/2305.01146v3 ) ライセンス: Link先を確認	Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly	(参考訳) 本研究は,放射線レポート要約 (Radiology Report Summarization, RRS) の課題に対して,大規模言語モデル(LLM)を適応するための軽量戦略を体系的に検討する。具体的には、事前学習(自然言語、バイオメディカルテキスト、臨床テキスト)と、離散的なプロンプトまたはパラメータ効率の良い微調整によるドメイン適応に焦点を当てる。臨床テキストでの事前学習とRRSサンプルでの微調整によって,タスクに最大限に適応することで,一貫して最高のパフォーマンスを達成できた。重要なことに、この方法は、エンドツーエンドの微調整(パラメータの100%)とは対照的に、モデル全体のパラメータの0.32%しか微調整しない。さらに, 文脈内の例示とアウト・オブ・ディストリビューション(OOD)訓練の効果について検討し,最後に放射線科医による読影研究と定性分析を行う。本研究は、RRSにおけるドメイン適応の重要性を強調し、臨床業務に有効な自然言語処理ソリューションを開発するための貴重な洞察を提供する。 We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. Our results consistently achieve best performance by maximally adapting to the task via pretraining on clinical text and fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.	翻訳日:2023-07-21 18:07:44 公開日:2023-07-20
# ハイブリッド量子ニューラルネットワークによる迷路問題の解法に関する深部Q学習 Deep-Q Learning with Hybrid Quantum Neural Network on Solving Maze Problems ( http://arxiv.org/abs/2304.10159v2 ) ライセンス: Link先を確認	Hao-Yuan Chen, Yen-Jui Chang, Ching-Ray Chang	(参考訳) 量子コンピューティングは、高次元データを扱う機械学習アルゴリズムの限界を前進させ、ディープニューラルネットワーク(DNN)モデルの全体的なトレーニングパラメータを減らす大きな可能性を秘めている。本研究では,ゲート型量子コンピュータ上のパラメータ化量子回路(PQC)を用いて,モデルフリー強化学習問題における量子優位の可能性について検討する。量子コンピュータの現在のモデルと能力の包括的調査と評価を通じて、我々は最新のQiskitとPyTorchフレームワークに基づく新しいハイブリッド量子ニューラルネットワークを設計し、訓練した。我々は、PQCを統合した場合と統合しない場合(完全に古典的なDNN)の性能を比較した。私たちの研究は、迷路問題を解決するための深層量子学習の可能性と、他の強化学習問題に対する洞察を提供します。様々な強化学習問題が、妥当なトレーニングエポック数で効果的に解けると結論づける。さらに,迷路問題に対する様々な量子強化学習モデルの比較検討を行い,研究の全体的な可能性とメリットを評価する。 Quantum computing holds great potential for advancing the limitations of machine learning algorithms to handle higher data dimensions and reduce overall training parameters in deep neural network (DNN) models. This study uses a parameterized quantum circuit (PQC) on a gate-based quantum computer to investigate the potential for quantum advantage in a model-free reinforcement learning problem. Through a comprehensive investigation and evaluation of the current model and capabilities of quantum computers, we designed and trained a novel hybrid quantum neural network based on the latest Qiskit and PyTorch framework. We compared the performance of the DNN with and without an integrated PQC, the latter being fully classical. Our research provides insights into the potential of deep quantum learning to solve a maze problem and, potentially, other reinforcement learning problems. We conclude that various reinforcement learning problems can be solved effectively within a reasonable number of training epochs. Moreover, we compare various quantum reinforcement learning models on maze problems to evaluate our research's overall potential and advantages.	翻訳日:2023-07-21 18:06:48 公開日:2023-07-20
# Sabi\'a: ポルトガルの大規模言語モデル Sabi\'a: Portuguese Large Language Models ( http://arxiv.org/abs/2304.07880v3 ) ライセンス: Link先を確認	Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira	(参考訳) 言語モデルの能力が向上し続ければ、"ワンサイズフィットオール"モデルが主要なパラダイムとして残ることは考えられます。例えば、世界中の膨大な数の言語が低リソースであることを考えれば、一般的なプラクティスは、複数の言語で単一のモデルを事前学習することだ。本稿では,この実践に挑戦するエビデンスを増大させ,対象言語での単言語事前学習が,すでに多様なコーパスで広く訓練されているモデルを大幅に改善することを示す。より具体的には、ポルトガル語テキストのGPT-JおよびLLaMAモデルを、当初の事前訓練予算の3%以下で事前訓練する。ポルトガルの14のデータセットからなるスイートであるPoetaに関するわずかな評価によると、我々のモデルは、英語と多言語で比較すると、かなり差がある。私たちのベストモデルであるSabi\'a-65Bは、GPT-3.5-turboと同等に動作します。対象言語と翻訳言語で当初考えられたデータセットから評価することにより,言語固有の事前学習の貢献度について検討する。 1)対象言語固有の言語ニュアンス及び構造を捉えること、及び 2) ドメインや文化に関するモデルの知識を豊かにする。以上の結果から,効果の大部分は単言語前訓練によって獲得したドメイン固有知識によるものであることが示唆された。 As the capabilities of language models continue to advance, it is conceivable that "one-size-fits-all" model will remain as the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models already extensively trained on diverse corpora. More specifically, we further pretrain GPT-J and LLaMA models on Portuguese texts using 3% or less of their original pretraining budget. Few-shot evaluations on Poeta, a suite of 14 Portuguese datasets, reveal that our models outperform English-centric and multilingual counterparts by a significant margin. Our best model, Sabi\'a-65B, performs on par with GPT-3.5-turbo. By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture. 
Our results indicate that the majority of the benefits stem from the domain-specific knowledge acquired through monolingual pretraining.	翻訳日:2023-07-21 18:06:13 公開日:2023-07-20
# 非等尺写像とド・ジッターテンソルネットワークからの重なり量子ビット Overlapping qubits from non-isometric maps and de Sitter tensor networks ( http://arxiv.org/abs/2304.02673v2 ) ライセンス: Link先を確認	ChunJun Cao, Wissam Chemissany, Alexander Jahn, and Zolt\'an Zimbor\'as	(参考訳) 非等尺写像を用いて、概局所可観測性、あるいは「重なり合う量子ビット」を構築し、局所実効理論における過程をホログラフィにおける我々の期待と類似した自由度の低い量子系でスプーフできることを示す。さらに、スプーフ系は自然に、量子重力の特徴と同一視できる方法で実際の局所理論から逸脱する。具体的な例として、デ・ジッター時空の2つのメラトイモデルを構築し、大域的デ・ジッターの指数展開が量子自由度を多く減らし、局所物理学が崩壊する前にほぼ長い時間保存されていることを説明した。量子ビットの重なりの近似は、ヒルベルト空間次元の検証、ブラックホールやホログラフィにおける自由度数、量子重力における近似局所性と概念的にどのように結びついているかを強調する。 We construct approximately local observables, or "overlapping qubits", using non-isometric maps and show that processes in local effective theories can be spoofed with a quantum system with fewer degrees of freedom, similar to our expectation in holography. Furthermore, the spoofed system naturally deviates from an actual local theory in ways that can be identified with features in quantum gravity. For a concrete example, we construct two MERA toy models of de Sitter space-time and explain how the exponential expansion in global de Sitter can be spoofed with many fewer quantum degrees of freedom and that local physics may be approximately preserved for an exceedingly long time before breaking down. We highlight how approximate overlapping qubits are conceptually connected to Hilbert space dimension verification, degree-of-freedom counting in black holes and holography, and approximate locality in quantum gravity.	翻訳日:2023-07-21 18:05:52 公開日:2023-07-20
# 固体量子系における欠陥の包括的同定法 Comprehensive scheme for identifying defects in solid-state quantum systems ( http://arxiv.org/abs/2305.17889v2 ) ライセンス: Link先を確認	Chanaprom Cholsuk, Sujin Suwanna, Tobias Vogl	(参考訳) 固体量子エミッタは、光学量子技術に必要なコンポーネントの1つである。理想的には、エミッタは量子ネットワーク内の他のコンポーネントと効率的に結合するための波長互換を持つべきである。したがって、特定のエミッターにつながる蛍光欠陥を理解することが不可欠である。本研究では,密度汎関数理論(dft)を用いて2次元材料窒化ホウ素中の量子エミッタの完全な光学的指紋の計算を行う。これらのエミッターは非常に興味深いが、その多くはまだ同定されていない。その結果、ゼロフォノン線エネルギーなどの単一光学特性を比較するのではなく、理論シミュレーションと実験を比較する際に複数の特性を用いるべきであることが示唆された。これにより、電子構造全体を予測し、量子エミッタを設計・調整することができる。さらに、本手法を適用し、Al$_{\text{N}}$とP$_{\text{N}}$V$_{\text{B}}$の欠陥の例を通して、特定の量子アプリケーションでエミッターを使用するための適合性を予測する。そこで我々は,dft計算を組み合わせて固体結晶中の量子エミッタを同定し,ミスアサインのリスクを低減し,光学量子システムの設計と調整を行う。これにより、将来のハイブリッド量子ネットワークにおける普遍的な固体量子エミッタシステムの分類と生成のレシピとなる。 A solid-state quantum emitter is one of the indispensable components for optical quantum technologies. Ideally, an emitter should have a compatible wavelength for efficient coupling to other components in a quantum network. It is therefore essential to understand fluorescent defects that lead to specific emitters. In this work, we employ density functional theory (DFT) to demonstrate the calculation of the complete optical fingerprints of quantum emitters in the two-dimensional material hexagonal boron nitride. These emitters are of great interest, yet many of them are still to be identified. Our results suggest that instead of comparing a single optical property, such as the commonly used zero-phonon line energy, multiple properties should be used when comparing theoretical simulations to the experiment. This way, the entire electronic structure can be predicted and quantum emitters can be designed and tailored. Moreover, we apply this approach to predict the suitability for using the emitters in specific quantum applications, demonstrating through the examples of the Al$_{\text{N}}$ and P$_{\text{N}}$V$_{\text{B}}$ defects. 
We therefore combine and apply DFT calculations to identify quantum emitters in solid-state crystals with a lower risk of misassignments as well as a way to design and tailor optical quantum systems. This consequently serves as a recipe for classification and the generation of universal solid-state quantum emitter systems in future hybrid quantum networks.	翻訳日:2023-07-21 18:00:09 公開日:2023-07-20
# 複数のラベルなしデータセットからのAUC最適化 AUC Optimization from Multiple Unlabeled Datasets ( http://arxiv.org/abs/2305.15776v2 ) ライセンス: Link先を確認	Yu Liu, Zheng Xie, Ming Li	(参考訳) 弱い教師付き学習は、完璧な監督が利用できない時に機械学習を強化することを目的としており、研究者から大きな注目を集めている。様々な弱い監督のうち、最も難しい事例の1つは、クラス事前の知識がほとんどない複数のラベルのない(u)データセットから学ぶか、略してu$^m$学習するかである。本稿では,複数のラベル付きデータセットから auc (area under roc curve) 最適化モデルを構築する際の問題点について検討する。 U$^m$-AUCは、U$^m$データを多ラベルAUC最適化問題に変換するAUC最適化手法であり、効率的に訓練することができる。提案したU$^m$-AUCは理論的および実験的に有効であることを示す。 Weakly supervised learning aims to empower machine learning when the perfect supervision is unavailable, which has drawn great attention from researchers. Among various types of weak supervision, one of the most challenging cases is to learn from multiple unlabeled (U) datasets with only a little knowledge of the class priors, or U$^m$ learning for short. In this paper, we study the problem of building an AUC (area under ROC curve) optimization model from multiple unlabeled datasets, which maximizes the pairwise ranking ability of the classifier. We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem, and can be trained efficiently. We show that the proposed U$^m$-AUC is effective theoretically and empirically.	翻訳日:2023-07-21 17:59:49 公開日:2023-07-20
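The "pairwise ranking ability" mentioned in the abstract above can be made concrete with a minimal sketch. This is not the U$^m$-AUC method itself, only the standard empirical AUC it aims to maximize: the fraction of (positive, negative) pairs that the classifier scores in the right order, with ties counted as half. The function name `pairwise_auc` and the toy scores are illustrative assumptions.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Both lists are assumed to be non-empty.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct pair
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for two positives and two negatives.
print(pairwise_auc([0.9, 0.8], [0.1, 0.85]))
```

Of the four pairs here, only (0.8, 0.85) is misordered, so three of four pairs are ranked correctly.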
# チャットGPT, 大規模言語モデル, 生成AI時代の科学 : 研究倫理と応答方法への挑戦 Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond ( http://arxiv.org/abs/2305.15299v2 ) ライセンス: Link先を確認	Evangelos Pournaras	(参考訳) ChatGPTのような人工知能(AI)の大規模な言語モデルは、科学と研究に顕著だが議論の余地がある。本稿では,創造的AIの出現にともなう科学行為における認識論的課題,倫理的・整合性リスクについてレビューする。これは、高品質な研究倫理レビューのための、新たなタイムリーな基礎を築き上げることを目的としています。研究機器と主題としてのAI言語モデルの役割は、科学者、参加者、レビュアーに対する倫理的意味とともに精査されている。研究倫理レビューの新しい新たなプラクティスについて議論され、ai時代のより責任ある研究行為に対する反応を形成する10の推奨事項がまとめられている。 Large language models of artificial intelligence (AI), such as ChatGPT, find remarkable but controversial applicability in science and research. This paper reviews epistemological challenges, ethical and integrity risks in science conduct in the advent of generative AI. This is with the aim to lay new timely foundations for a high-quality research ethics review. The role of AI language models as a research instrument and subject is scrutinized along with ethical implications for scientists, participants and reviewers. New emerging practices for research ethics review are discussed, concluding with ten recommendations that shape a response for a more responsible research conduct in the era of AI.	翻訳日:2023-07-21 17:59:37 公開日:2023-07-20
# AlignAtt:同時音声翻訳ガイドとしての注意に基づく音声翻訳アライメント AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation ( http://arxiv.org/abs/2305.11408v2 ) ライセンス: Link先を確認	Sara Papi, Marco Turchi, Matteo Negri	(参考訳) 自然言語処理に今日最も使われているアーキテクチャの中核的なメカニズムは注意であり、機械翻訳関連タスクの有効性を含む多くの観点から分析されてきた。これらの研究の中で、音声翻訳(ST)タスクのように、入力テキストを音声セグメントに置き換えた場合にも、単語アライメントに関する洞察を得るのに役立つ情報源として注意が向けられた。本稿では,提案するAlignAttを提案する。このAlignAttは,アテンション情報を利用して推論時にモデルを誘導するソース・ターゲットアライメントを生成する,同時ST(SimulST)のための新しいポリシーである。 8言語対の MuST-C v1.0 の実験により、AlignAtt はオフライン学習モデルに適用された従来の最先端の SimulST ポリシーよりも優れており、BLEU は 2 点のBLEU で、レイテンシは 8 言語で0.5 から0.8 の範囲で減少している。 Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention resulted to be a useful source of information to get insights about word alignment also when the input text is substituted with audio segments, as in the case of the speech translation (ST) task. In this paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that exploits the attention information to generate source-target alignments that guide the model during inference. Through experiments on the 8 language pairs of MuST-C v1.0, we show that AlignAtt outperforms previous state-of-the-art SimulST policies applied to offline-trained models with gains in terms of BLEU of 2 points and latency reductions ranging from 0.5s to 0.8s across the 8 languages.	翻訳日:2023-07-21 17:59:07 公開日:2023-07-20
# MaxViT-UNet:医療画像セグメンテーションのためのマルチ軸注意 MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation ( http://arxiv.org/abs/2305.08396v3 ) ライセンス: Link先を確認	Abdul Rehman Khan, Asifullah Khan	(参考訳) 畳み込みニューラルネットワーク(CNN)は近年,医療画像解析において大きな進歩を遂げている。しかし、畳み込み作用素の局所的な性質は、CNNにおける大域的および長距離的相互作用を捉える限界を生じさせる可能性がある。近年,コンピュータビジョンコミュニティや医療画像セグメンテーションにおいて,グローバル機能を効果的に処理する能力からトランスフォーマーが普及している。自己注意機構のスケーラビリティの問題とCNNのような帰納バイアスの欠如は、採用を制限した可能性がある。そのため,畳み込み機構と自己着脱機構の利点を活かしたハイブリッド視覚トランスフォーマ(cnn-トランスフォーマ)の重要性が高まっている。本稿では,医療用画像分割用エンコーダデコーダ型ハイブリッドビジョントランスフォーマ(cnn-transformer)maxvit-unetを提案する。 maxvit-blockに基づくハイブリッドデコーダは,各デコーダの畳み込み機構と自己アテンション機構の両方のパワーを,名目上の計算負荷で活用するように設計されている。復号器の各段階における多軸自己注意の包含は、対象領域と背景領域の識別能力を大幅に向上させ、セグメンテーション効率の向上に寄与する。ハイブリッドデコーダブロックでは、変換畳み込みにより得られたアップサンプル化下層デコーダ特徴とハイブリッドエンコーダから導出されるスキップ接続特徴とを一体化して融合処理を開始する。その後、多軸アテンション機構の利用により、融合した特徴が洗練される。提案したデコーダブロックは数回繰り返して核領域を段階的に分割する。 MoNuSeg18とMoNuSAC20データセットの実験結果から,提案手法の有効性が示された。 Convolutional Neural Networks (CNNs) have made significant strides in medical image analysis in recent years. However, the local nature of the convolution operator may pose a limitation for capturing global and long-range interactions in CNNs. Recently, Transformers have gained popularity in the computer vision community and also medical image segmentation due to their ability to process global features effectively. The scalability issues of self-attention mechanism and lack of the CNN-like inductive bias may have limited their adoption. Therefore, hybrid Vision transformers (CNN-Transformer), exploiting advantages of both Convolution and Self-attention Mechanisms, have gained importance. In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder, based on MaxViT-block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with nominal computational burden. 
The inclusion of multi-axis self-attention within each decoder stage significantly enhances the capacity to discriminate between object and background regions, and thereby helps improve segmentation efficiency. In the Hybrid Decoder block, the fusion process begins by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. The fused features are then refined by a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on the MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.	翻訳日:2023-07-21 17:58:46 公開日:2023-07-20
# 混合状態に対する最適量子速度 Optimal quantum speed for mixed states ( http://arxiv.org/abs/2305.08004v2 ) ライセンス: Link先を確認	Ashraf Naderzadeh and Seyed Javad Akhtarshenas	(参考訳) 量子状態がどの程度高速に進化できるかという問題を考える。 phys におけるユークリッド距離に基づく二乗速度の定義を用いる。 Rev. Research, {\bf 2}, 033127 (2019)] では、時間非依存ハミルトニアンの下で一元的に進化した$d$次元システムの最適速度を得るための体系的な枠組みを提案する。同じ純度を持つ混合量子状態の組のうち、最適状態はその純度パラメータを用いて得られる。任意の$d$ に対して、最適状態は、二次対角線に対して対称である追加の性質を持つ$x$-状態によって表される。純度が最大混合状態$\Id/d$を少なくとも2/d^2$で純度を超える十分低い純度に対して、最適状態の非零対角エントリーは$\varrho_{1d}$であり、それぞれ最小固有値と最大固有値を持つ2つのエネルギー固有状態間の遷移振幅に対応する。しかし、より大きな純度の場合、他の二次径のエントリ$\varrho_{i,d-i+1}$を非零値とするかどうかは、相対エネルギーギャップ$\|E_{d-i+1}-E_{i}\|$に依存する。エネルギー基底に対するコヒーレンスと絡み合いの影響も検討され、最適状態においてはどちらの資源も純度の単調関数であるため、量子進化のスピードアップを招き、量子速度の限界を小さくすることができる。以上の結果から, 2次対角線上に位置する対角線外エントリによって引き起こされるコヒーレンスのみが, 状態のコヒーレンスが進化の速度に寄与することが示された。 The question of how fast a quantum state can evolve is considered. Using the definition of squared speed based on the Euclidean distance given in [Phys. Rev. Research, {\bf 2}, 033127 (2019)], we present a systematic framework to obtain the optimal speed of a $d$-dimensional system evolved unitarily under a time-independent Hamiltonian. Among the set of mixed quantum states having the same purity, the optimal state is obtained in terms of its purity parameter. We show that for an arbitrary $d$, the optimal state is represented by a $X$-state with an additional property of being symmetric with respect to the secondary diagonal. For sufficiently low purities for which the purity exceeds the purity of maximally mixed state $\Id/d$ by at most $2/d^2$, the only nonzero off-diagonal entry of the optimal state is $\varrho_{1d}$, corresponding to the transition amplitude between two energy eigenstates with minimum and maximum eigenvalues, respectively. For larger purities, however, whether or not the other secondary diameter entries $\varrho_{i,d-i+1}$ take nonzero values depends on their relative energy gaps $\|E_{d-i+1}-E_{i}\|$. 
The effects of coherence and entanglement, with respect to the energy basis, are also examined; we find that for optimal states both resources are monotonic functions of purity, so they can speed up quantum evolution, leading to a smaller quantum speed limit. Our results show that although the coherence of the states is responsible for the speed of evolution, for the fastest states only the coherence caused by some off-diagonal entries located on the secondary diagonal plays a role.	翻訳日:2023-07-21 17:58:18 公開日:2023-07-20
# ベイズ推論の組成構造 The Compositional Structure of Bayesian Inference ( http://arxiv.org/abs/2305.06112v2 ) ライセンス: Link先を確認	Dylan Braithwaite, Jules Hedges, Toby St Clere Smithe	(参考訳) ベイズの法則は、新しい証拠に照らして信念を更新するために因果プロセスを反転させる方法を教えてくれる。もしこの過程が複雑な構成構造を持つと信じられているならば、全体の反転は成分過程の観点で区分的に計算できるのである。この構成規則の構造について検討し,関数型プログラミングにおけるレンズパターンとの関連について考察した。マルコフ核の圏の好ましく一般的な公理的な表現の中で、ベイズ反転をファイバー圏における状態依存型(英語版)の特定の例と考えることができる。基礎となるカテゴリの関手として定式化されたこの構成の性質について議論し、統計的推論に対するより型駆動的なアプローチにどのように使用できるかを検討する。 Bayes' rule tells us how to invert a causal process in order to update our beliefs in light of new evidence. If the process is believed to have a complex compositional structure, we may observe that the inversion of the whole can be computed piecewise in terms of the component processes. We study the structure of this compositional rule, noting that it relates to the lens pattern in functional programming. Working in a suitably general axiomatic presentation of a category of Markov kernels, we see how we can think of Bayesian inversion as a particular instance of a state-dependent morphism in a fibred category. We discuss the compositional nature of this, formulated as a functor on the underlying category and explore how this can used for a more type-driven approach to statistical inference.	翻訳日:2023-07-21 17:57:43 公開日:2023-07-20
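The piecewise inversion rule described in the abstract above can be checked numerically on finite sets. The sketch below is an illustration under stated assumptions, not the paper's categorical construction: Markov kernels are plain row-stochastic nested lists, all marginals are assumed strictly positive, and the helper names (`push`, `compose`, `invert`) are hypothetical. It verifies that inverting a composite kernel against a prior equals composing the inverses of the parts, where the second part is inverted against the pushed-forward prior.

```python
def push(prior, kernel):
    """Push a distribution on X forward along kernel[x][y] to a distribution on Y."""
    return [sum(prior[x] * kernel[x][y] for x in range(len(prior)))
            for y in range(len(kernel[0]))]


def compose(c, d):
    """Kernel 'd after c': (d . c)[x][z] = sum_y c[x][y] * d[y][z]."""
    return [[sum(c[x][y] * d[y][z] for y in range(len(d)))
             for z in range(len(d[0]))]
            for x in range(len(c))]


def invert(prior, kernel):
    """Bayesian inversion: inv[y][x] = kernel[x][y] * prior[x] / marginal[y]."""
    marginal = push(prior, kernel)  # assumed strictly positive
    return [[kernel[x][y] * prior[x] / marginal[y] for x in range(len(prior))]
            for y in range(len(marginal))]


prior = [0.3, 0.7]                   # distribution on X
c = [[0.9, 0.1], [0.2, 0.8]]         # kernel X -> Y
d = [[0.6, 0.4], [0.25, 0.75]]       # kernel Y -> Z

whole = invert(prior, compose(c, d))                               # invert the composite
parts = compose(invert(push(prior, c), d), invert(prior, c))       # compose the inverses
print(whole)
print(parts)
```

The two printed matrices agree up to floating-point error, which is the finite-set instance of computing the inversion of the whole in terms of the component processes.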
# 完全ベイズVIB-DeepSSM Fully Bayesian VIB-DeepSSM ( http://arxiv.org/abs/2305.05797v2 ) ライセンス: Link先を確認	Jadie Adams and Shireen Elhabian	(参考訳) 統計的形状モデリング(SSM)は、集団に基づく解剖学的形状の定量的分析を可能にし、臨床診断を行う。深層学習による3次元画像からの対応ベースssmの予測は不確かさの定量化を必要とするが、ベイズ式化の動機付けは必要である。変動情報ボトルネックのDeepSSM(VIB-DeepSSM)は,アレータティック不確実性定量化画像から解剖の確率的形状を予測するための,有効で原則化されたフレームワークである。しかし、VIBは半ベイズ的であり、疫学的な不確実性推論を欠いている。我々は,完全ベイズ式vibを導出し,スケーラブルな2つの実装手法の有効性を実証する。さらに,マルチモーダル限界化による不確実性校正をさらに強化する新しい組み合わせを提案する。合成形状と左房データの実験により、完全ベイズVIBネットワークは精度を犠牲にすることなく不確実性推論を改善した画像からSSMを予測することを示した。 Statistical shape modeling (SSM) enables population-based quantitative analysis of anatomical shapes, informing clinical diagnosis. Deep learning approaches predict correspondence-based SSM directly from unsegmented 3D images but require calibrated uncertainty quantification, motivating Bayesian formulations. Variational information bottleneck DeepSSM (VIB-DeepSSM) is an effective, principled framework for predicting probabilistic shapes of anatomy from images with aleatoric uncertainty quantification. However, VIB is only half-Bayesian and lacks epistemic uncertainty inference. We derive a fully Bayesian VIB formulation and demonstrate the efficacy of two scalable implementation approaches: concrete dropout and batch ensemble. Additionally, we introduce a novel combination of the two that further enhances uncertainty calibration via multimodal marginalization. Experiments on synthetic shapes and left atrium data demonstrate that the fully Bayesian VIB network predicts SSM from images with improved uncertainty reasoning without sacrificing accuracy.	翻訳日:2023-07-21 17:57:29 公開日:2023-07-20
# ポイントクラウドネットワークは解剖学の統計的形状モデルを学ぶことができるか? Can point cloud networks learn statistical shape models of anatomies? ( http://arxiv.org/abs/2305.05610v2 ) ライセンス: Link先を確認	Jadie Adams and Shireen Elhabian	(参考訳) 統計的形状モデリング (SSM) は解剖学の個体群における解剖学的変動を調査し定量化する貴重なツールである。しかし、従来の対応ベースのSSM生成法では、SSMを構成するには完全な幾何学的プロキシ(例えば、高解像度のバイナリボリュームや表面メッシュ)が必要である。形状の無秩序な3dポイントクラウド表現は、様々な医療画像(しきい値画像や表面走査など)からより容易に取得できる。ポイントクラウドディープネットワークは、最近、異なるポイントクラウドタスク(例えば、補完、意味セグメンテーション、分類)の置換不変機能を学習することに成功した。しかし、ポイントクラウドからssmを学習する彼らの応用は未検討である。本研究では,既存のポイントクラウドエンコーダ・デコーダベースのコンプリートネットワークが,ssmの未解決可能性を提供し,人口レベルの統計表現をキャプチャし,推論負担を軽減し,入力要求を緩和できることを実証する。本稿では,SSMアプリケーションに対するこれらの手法の限界について論じ,今後の改良を提案する。我々の研究は、形状解析文学を進歩させ、多様なユースケースにSSMを広げるための有望な道である、SSMのためのポイントクラウド深層学習のさらなる探求の道を開く。 Statistical Shape Modeling (SSM) is a valuable tool for investigating and quantifying anatomical variations within populations of anatomies. However, traditional correspondence-based SSM generation methods have a prohibitive inference process and require complete geometric proxies (e.g., high-resolution binary volumes or surface meshes) as input shapes to construct the SSM. Unordered 3D point cloud representations of shapes are more easily acquired from various medical imaging practices (e.g., thresholded images and surface scanning). Point cloud deep networks have recently achieved remarkable success in learning permutation-invariant features for different point cloud tasks (e.g., completion, semantic segmentation, classification). However, their application to learning SSM from point clouds is to-date unexplored. In this work, we demonstrate that existing point cloud encoder-decoder-based completion networks can provide an untapped potential for SSM, capturing population-level statistical representations of shapes while reducing the inference burden and relaxing the input requirement. We discuss the limitations of these techniques to the SSM application and suggest future improvements. 
Our work paves the way for further exploration of point cloud deep learning for SSM, a promising avenue for advancing shape analysis literature and broadening SSM to diverse use cases.	翻訳日:2023-07-21 17:57:12 公開日:2023-07-20
# Chain-of-Knowledge Promptingによる言語モデルの強化 Boosting Language Models Reasoning with Chain-of-Knowledge Prompting ( http://arxiv.org/abs/2306.06427v2 ) ライセンス: Link先を確認	Jianing Wang, Qiushi Sun, Nuo Chen, Xiang Li, Ming Gao	(参考訳) これは ``let's think step by step'''' のような単純なプロンプトを設計することや、複数のコンテキスト内exemplarsを適切に設計し、大きな言語モデル(llm)を導出して中間的な推論ステップを生成することを目的としている。しかし、生成された合理性はしばしば間違いを伴い、非事実的で不誠実な推論連鎖を作る。この脆さを緩和するために,我々は,LLMを3重構造形式で明示的な知識証拠を生成することを目的とした,新しい知識の連鎖(CoK)プロンプトを提案する。これは人間の行動、つまり、複雑な質問に答える前に脳の推論証拠としてマインドマップや知識マップを描けることにインスパイアされている。さらに, 事実性および忠実性の観点から, 推論チェーンの信頼性を推定するF^2-Verification法を導入する。信頼できない反応については、誤った証拠がLSMに再考を促すために示される。広範な実験により,本手法はコモンセンス,ファクトラル,シンボリック,算術推論タスクの性能をさらに向上できることが証明された。 Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like ``Let's think step by step'' or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, making unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structure triple. This is inspired by our human behaviors, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce a F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For the unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.	翻訳日:2023-07-21 17:48:11 公開日:2023-07-20
# 雑音変動量子アルゴリズムにおける量子平均値のシミュレーション:多項式スケールアプローチ Simulating Quantum Mean Values in Noisy Variational Quantum Algorithms: A Polynomial-Scale Approach ( http://arxiv.org/abs/2306.05804v2 ) ライセンス: Link先を確認	Yuguo Shao, Fuchuan Wei, Song Cheng, Zhengwei Liu	(参考訳) 大規模変動量子アルゴリズムは、実用的な量子優位性を達成するための潜在的な経路として広く認識されている。しかし、量子ノイズの存在はこれらの利点を抑圧し弱め、古典的シミュラビリティの境界を曖昧にする可能性がある。この問題をより明確にするために,観測可能なパウリパス(OBPPP)のバックプロパゲーションの経路積分に基づく新しい多項式スケール法を提案する。本手法は,独立単一量子ビット偏極雑音の存在下で,有界乱れ誤差を持つ変分量子アルゴリズムの量子平均値を効率よく近似する。理論的には厳格に証明します 1) 固定ノイズレート $\lambda$ に対して、obppp の時間と空間の複雑さは、量子ビット $n$ の数、回路深度 $l$ 、逆トランザクションエラー $\frac{1}{\varepsilon}$ 、ルート平方逆成功確率 $\frac{1}{\sqrt{\delta}}$ との多項式関係を示す。 2 変数 $\lambda$ に対して、計算複雑性は $\mathrm{Poly}\left(n,L\right)$ が $\lambda$ を超えるとき $\frac{1}{\log{L}}$ となり、$\lambda$ が $\frac{1}{L}$ 以下になるとき $L$ が指数関数となる。数値解析により,IBM の 127-qubit Eagle プロセッサ [Nature \textbf{618}, 500 (2023)] におけるゼロノイズ外挿実験結果の古典的シミュレーションを行った。提案手法は,量子デバイスと比較して精度が高く,実行速度も速い。さらに,本手法はノイズのない結果からノイズを低減し,生の観測と直接対応するIBMの未決定結果を正確に再現することを可能にする。 Large-scale variational quantum algorithms are widely recognized as a potential pathway to achieve practical quantum advantages. However, the presence of quantum noise might suppress and undermine these advantages, which blurs the boundaries of classical simulability. To gain further clarity on this matter, we present a novel polynomial-scale method based on the path integral of observable's back-propagation on Pauli paths (OBPPP). This method efficiently approximates quantum mean values in variational quantum algorithms with bounded truncation error in the presence of independent single-qubit depolarizing noise. Theoretically, we rigorously prove: 1) For a fixed noise rate $\lambda$, OBPPP's time and space complexity exhibit a polynomial relationship with the number of qubits $n$, the circuit depth $L$, the inverse truncation error $\frac{1}{\varepsilon}$, and the root square inverse success probability $\frac{1}{\sqrt{\delta}}$. 
2) For variable $\lambda$, the computational complexity becomes $\mathrm{Poly}\left(n,L\right)$ when $\lambda$ exceeds $\frac{1}{\log{L}}$ and it becomes exponential with $L$ when $\lambda$ falls below $\frac{1}{L}$. Numerically, we conduct classical simulations of IBM's zero-noise extrapolated experimental results on the 127-qubit Eagle processor [Nature \textbf{618}, 500 (2023)]. Our method attains higher accuracy and faster runtime compared to the quantum device. Moreover, this approach enables us to deduce noisy outcomes from noiseless results, allowing us to accurately reproduce IBM's unmitigated results that directly correspond to raw experimental observations.	翻訳日:2023-07-21 17:47:30 公開日:2023-07-20
# 階層型変分オートエンコーダを用いた感情条件メロディ調和 Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder ( http://arxiv.org/abs/2306.03718v4 ) ライセンス: Link先を確認	Shulei Ji and Xinyu Yang	(参考訳) 既存のメロディ調和モデルでは、生成したハーモニーの品質向上に大きな進歩を遂げているが、その多くは音楽の下の感情を無視している。一方、以前の手法で生成された調和の変動性は不十分である。これらの問題を解決するために,LSTMを用いた階層的変分自動エンコーダ(LHVAE)を提案する。特に、LHVAEは、グローバルおよびローカルな音楽特性をモデル化するために、様々なレベル(ピースレベルとバーレベル)の潜伏変数と感情条件を組み込んでいる。さらに,各ステップに注意に基づくメロディコンテキストベクトルを導入し,メロディとハーモニーの対応をよりよく学習する。目的実験の結果,提案モデルは他のLSTMモデルよりも優れていた。主観的評価を通じて、和音の種類を変えるだけでは音楽全体の感情がほとんど変化しないと結論づける。定性的解析は、我々のモデルが可変調和を生成する能力を示す。 Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignored the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the abundant variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece- and bar-level) to model the global and local music properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Objective experimental results show that our proposed model outperforms other LSTM-based models. Through subjective evaluation, we conclude that only altering the types of chords hardly changes the overall emotion of the music. The qualitative analysis demonstrates the ability of our model to generate variable harmonies.	翻訳日:2023-07-21 17:46:42 公開日:2023-07-20
# 直交群対称性の下での$k$-陽性とシュミット数 $k$-positivity and Schmidt number under orthogonal group symmetries ( http://arxiv.org/abs/2306.00654v2 ) ライセンス: Link先を確認	Sang-Jun Park, Sang-Gyun Youn	(参考訳) 本稿では,標準直交群対称性の下で,k$-positivity と schmidt number について検討する。シュミット数は量子情報理論における絡み合いの自然な定量化である。まず、すべての直交共変 $k$-正の写像の完全な特徴づけを示す。これは [Tom85] で以前の結果を一般化する。さらに、コンパクト群対称性の下で、k$-ポジティビティとシュミット数の間の双対関係を最適化する。この新たな枠組みにより、直交不変量子状態のシュミット数を効率的に計算できる。 In this paper, we study $k$-positivity and Schmidt number under standard orthogonal group symmetries. The Schmidt number is a natural quantification of entanglement in quantum information theory. First of all, we exhibit a complete characterization of all orthogonally covariant $k$-positive maps. This generalizes earlier results in [Tom85]. Furthermore, we optimize duality relations between $k$-positivity and Schmidt numbers under compact group symmetries. This new framework enables us to efficiently compute the Schmidt numbers of all orthogonally invariant quantum states.	翻訳日:2023-07-21 17:46:22 公開日:2023-07-20
# 量子計測のための物理ノイズモデル A physical noise model for quantum measurements ( http://arxiv.org/abs/2305.19766v2 ) ライセンス: Link先を確認	Faedi Loulidi, Ion Nechita, Cl\'ement Pellegrini	(参考訳) そこで,本論文では,故障のある間接計測方式に動機づけられた量子計測のための新しいノイズモデルを提案する。量子系とプローブの相互作用を制御するランダムダイナミクス上の平均化により、自然な物理ノイズモデルが出現する。非互換性のロバスト性という枠組みで、既存のノイズモデル(一様および非分極)と比較する。我々は,本モデルが特定の測定クラスの互換性領域を大きくすることができることを観察した。 In this paper we introduce a novel noise model for quantum measurements motivated by an indirect measurement scheme with faulty preparation. Averaging over random dynamics governing the interaction between the quantum system and a probe, a natural, physical noise model emerges. We compare it to existing noise models (uniform and depolarizing) in the framework of incompatibility robustness. We observe that our model allows for larger compatibility regions for specific classes of measurements.	翻訳日:2023-07-21 17:46:15 公開日:2023-07-20
# 分子ドッキングと機械学習回帰法を用いたCOVID-19 3CLプロテアーゼを標的とした薬物精製 Drug Repurposing Targeting COVID-19 3CL Protease using Molecular Docking and Machine Learning Regression Approach ( http://arxiv.org/abs/2305.18088v4 ) ライセンス: Link先を確認	Imra Aqeel, and Abdul Majid	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが世界的な健康危機を引き起こし、治療薬の早期発見の必要性が高まっている。この課題を満たすために、医薬品の再利用はコスト、時間、労働を節約する唯一の解決策である。本研究では,SARS-CoV-2の主要プロテアーゼ3CLを標的とした新型コロナウイルス治療の可能性として,FDAが承認した5903薬を含む世界承認薬をスクリーニングするために,Zincデータベースを使用した。分子ドッキングを行い,薬物分子の有効性を確認した。薬物再資源化手法の効率を高めるために, 決定木, 余剰木, MLP, KNN, XGBoost, 勾配ブースティングなどのQSARモデリングのための機械学習回帰手法を用いて, 結合親和性をモデル化した。その結果,決定木回帰(DTR)モデルにより,R2およびRMSEの統計的測定精度が向上した。これらのシミュレーション結果は高い結合親和性を持つ薬物の同定に役立った。ドッキングおよびその他の統計分析から,-15 kcal/molから-13 kcal/molの範囲で,それぞれのZinc ID(ZINC3873365,ZINC85432544,ZINC203757351,ZINC85536956,ZINC8214470,ZINC261494640)の6種類の有望薬物をショートリスト化した。本研究は、他の研究ですでに新型コロナウイルスに対して同定されているZINC203757351抗ウイルス化合物以外の新規な薬剤である。さらに, 特定のプロテアーゼ3CLproに対する最も優れた結合相互作用について, これらのトップランク選択薬の生理化学的および薬物動態特性を解析した。我々の研究は、COVID-19に対する薬物再精製の効果的な枠組みを提供してきた。これは、分子ドッキングと機械学習回帰アプローチを組み合わせることで、潜在的な治療候補の同定を加速する可能性を強調している。 The COVID-19 pandemic has created a global health crisis, driving the need for the rapid identification of potential therapeutics. To meet this challenge, drug repurposing is the only solution with saving cost, time, and labor. In this study, we used the Zinc database to screen the world-approved including FDA-approved 5903 drugs for repurposing as potential COVID-19 treatments targeting the main protease 3CL of SARS-CoV-2. We performed molecular docking and checked the efficacy of drug molecules. To enhance the efficiency of drug repurposing approach, we modeled the binding affinities using several machine learning regression approaches for QSAR modeling such as decision tree, extra trees, MLP, KNN, XGBoost, and gradient boosting. The computational results demonstrated that Decision Tree Regression (DTR) model has improved statistical measures of R2 and RMSE. 
These simulated results helped to identify drugs with high binding affinity. From the docking and other statistical analysis, we shortlisted six promising drugs, identified by their Zinc IDs (ZINC3873365, ZINC85432544, ZINC203757351, ZINC85536956, ZINC8214470 and ZINC261494640), with binding affinities in the range of -15 kcal/mol to -13 kcal/mol. In this study, the repurposed drugs are novel except for the antiviral compound ZINC203757351, which has already been identified against COVID-19 in other studies. Further, we analyzed the physicochemical and pharmacokinetic properties of these top-ranked drugs with respect to their best binding interaction with the target protease 3CLpro. Our study provides an efficient framework for drug repurposing against COVID-19. This highlights the potential of combining molecular docking with machine learning regression approaches to accelerate the identification of potential therapeutic candidates.	翻訳日:2023-07-21 17:46:10 公開日:2023-07-20
# GSMorph: cine-MRI心筋変形性レジストレーションのためのグラディエント手術 GSMorph: Gradient Surgery for cine-MRI Cardiac Deformable Registration ( http://arxiv.org/abs/2306.14687v2 ) ライセンス: Link先を確認	Haoran Dou, Ning Bi, Luyi Han, Yuhao Huang, Ritse Mann, Xin Yang, Dong Ni, Nishant Ravikumar, Alejandro F. Frangi, Yunzhi Huang	(参考訳) 深層学習に基づく変形可能な登録法は様々な医学的応用において広く研究されている。学習に基づく変形可能な登録は、変形場の登録精度と滑らかさをトレードオフする重み付き目的関数に依存する。したがって、最適な登録性能を得るためには、必然的にハイパーパラメータをチューニングする必要がある。ハイパーパラメータのチューニングは非常に計算コストが高く、ドメイン知識に望ましくない依存性をもたらします。本研究では,GSMorph と呼ばれる勾配手術機構に基づく登録モデルを構築し,複数の損失に対するハイパーパラメータフリーバランスを実現する。 GSMorphでは、この2つの競合する項のバランスをとるためにハイパーパラメータを導入するのではなく、滑らか性制約に付随する平面に直交する類似性損失の勾配を投影することで最適化手順を再構築する。さらに,本手法はモデルに依存しないため,パラメータの追加や推論の遅延を伴わずに,任意のディープ登録ネットワークにマージすることができる。本研究では,2つの心臓MRIデータセットに対するSOTA (State-of-the-art) 変形性登録手法との比較を行った。 GSMorphは5つのSOTA学習ベース登録モデルと2つの従来の登録手法であるSyNとDemonsよりも、登録精度と滑らかさの両方で優れていることを証明している。 Deep learning-based deformable registration methods have been widely investigated in diverse medical applications. Learning-based deformable registration relies on weighted objective functions trading off registration accuracy and smoothness of the deformation field. Therefore, they inevitably require tuning the hyperparameter for optimal registration performance. Tuning the hyperparameters is highly computationally expensive and introduces undesired dependencies on domain knowledge. In this study, we construct a registration model based on the gradient surgery mechanism, named GSMorph, to achieve a hyperparameter-free balance on multiple losses. In GSMorph, we reformulate the optimization procedure by projecting the gradient of similarity loss orthogonally to the plane associated with the smoothness constraint, rather than additionally introducing a hyperparameter to balance these two competing terms. Furthermore, our method is model-agnostic and can be merged into any deep registration network without introducing extra parameters or slowing down inference. 
In this study, we compared our method with state-of-the-art (SOTA) deformable registration approaches over two publicly available cardiac MRI datasets. GSMorph proves superior to five SOTA learning-based registration models and two conventional registration techniques, SyN and Demons, on both registration accuracy and smoothness.	翻訳日:2023-07-21 17:39:37 公開日:2023-07-20
# $\alpha$-$\beta$-Factorization と Simon's Congruence のバイナリケース $\alpha$-$\beta$-Factorization and the Binary Case of Simon's Congruence ( http://arxiv.org/abs/2306.14192v2 ) ライセンス: Link先を確認	Pamela Fleischmann, Jonas Höfer, Annika Huch, Dirk Nowotka	(参考訳) 1991年、Hébrardは単語の因数分解を導入し、単語の散在する要素(散在した)や部分列(サブワード)を調べる強力なツールとなった。これに基づいて、最初のカランディカールとシュネーベレンは$k$-richnessという概念を導入し、後にBarkerらに$k$-universalityという概念を導入した。 2022年、fleischmannらは、単語とその逆のアーチ分解を交差させることで、アーチ分解の一般化を示した。著者らは, この因子分解を, 最短欠落因子の探索にのみ用いたが, 本研究では, 新規な$\alpha$-$\beta$-factorization について検討する。我々は、有名なsimon congruenceの$k$-universalワードを$1$-universalワードで特徴づける。さらに,これらの結果をバイナリ単語に適用する。この特別な場合、クラスを完全に特徴づけ、合同の指標を計算する。最後に、三項ケースの調査を開始し、$\alpha\beta\alpha$-factorsの完全なリストを示し、それらの一貫性を特徴づける。 In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of the arch factorization by intersecting the arch factorization of a word and its reverse. While the authors merely used this factorization for the investigation of shortest absent scattered factors, in this work we investigate this new $\alpha$-$\beta$-factorization as such. We characterize the famous Simon congruence of $k$-universal words in terms of $1$-universal words. Moreover, we apply these results to binary words. In this special case, we obtain a full characterization of the classes and calculate the index of the congruence. Lastly, we start investigating the ternary case, present a full list of possibilities for $\alpha\beta\alpha$-factors, and characterize their congruence.	翻訳日:2023-07-21 17:39:18 公開日:2023-07-20
# My Boli: コードミックスのMarathi-English Corpora、事前学習言語モデル、評価ベンチマーク My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks ( http://arxiv.org/abs/2306.14030v2 ) ライセンス: Link先を確認	Tanmay Chavan, Omkar Gokhale, Aditya Kane, Shantanu Patankar, Raviraj Joshi	(参考訳) コード混合データの研究は、専用のコード混合データセットと事前学習された言語モデルが利用できないため、限られている。この作業では、コードミックスに先立つ作業に欠ける、低リソースのインドの言語であるmarathiに焦点を合わせます。 L3Cube-MeCorpusは,Mr-Enコーパスと1000万のソーシャルメディア文による事前学習用コーパスである。また、コード混合BERTベースのトランスモデルであるL3Cube-MeBERTとMeRoBERTaをMeCorpusで事前学習した。さらに、ベンチマークでは、コード混合mr-enヘイトスピーチ検出、感情分析、言語識別などの下流タスクに対して、mehate、mesent、melidの3つの教師付きデータセットを提案する。これらの評価データセットは、手動で注釈付き ~12,000 Marathi-English code-mixed tweet で構成されている。アブレーションは、この新しいコーパスで訓練されたモデルは、既存の最先端のBERTモデルよりも大幅に優れていることを示している。これは、コード混合マラーティー研究の成果物を提示する最初の作品である。すべてのデータセットとモデルはhttps://github.com/l3cube-pune/MarathiNLPで公開されている。 The research on code-mixed data is limited due to the unavailability of dedicated code-mixed datasets and pre-trained language models. In this work, we focus on the low-resource Indian language Marathi which lacks any prior work in code-mixing. We present L3Cube-MeCorpus, a large code-mixed Marathi-English (Mr-En) corpus with 10 million social media sentences for pretraining. We also release L3Cube-MeBERT and MeRoBERTa, code-mixed BERT-based transformer models pre-trained on MeCorpus. Furthermore, for benchmarking, we present three supervised datasets MeHate, MeSent, and MeLID for downstream tasks like code-mixed Mr-En hate speech detection, sentiment analysis, and language identification respectively. These evaluation datasets individually consist of manually annotated ~12,000 Marathi-English code-mixed tweets. Ablations show that the models trained on this novel corpus significantly outperform the existing state-of-the-art BERT models. This is the first work that presents artifacts for code-mixed Marathi research. 
All datasets and models are publicly released at https://github.com/l3cube-pune/MarathiNLP .	翻訳日:2023-07-21 17:38:55 公開日:2023-07-20
# ボリューム医用画像解析のための正規SE(3)グループ畳み込み Regular SE(3) Group Convolutions for Volumetric Medical Image Analysis ( http://arxiv.org/abs/2306.13960v2 ) ライセンス: Link先を確認	Thijs P. Kuipers and Erik J. Bekkers	(参考訳) 正規群畳み込みニューラルネットワーク(G-CNN)は、モデル性能を高め、異なる幾何学的対称性に等しくなることが示されている。本研究は体積データ上のse(3),すなわちroto-translation equivarianceの問題に対処する。ボリューム画像データは、多くの医療現場で広く使われている。分離可能な群畳み込みに関する最近の研究により、連続的なSO(3)(回転)カーネルと空間的カーネルに分離されたSE(3)群畳み込みカーネルを考案した。均一なSO(3)格子をサンプリングすることで連続的な設定に近似する。我々の連続SO(3)カーネルは同様に一様格子上のRBF補間によってパラメータ化される。ボリューム画像解析における我々のアプローチの利点を実証する。医用分類課題において, se(3)同変モデルはcnnと正規離散g-cnnを一貫して上回っており, 一般化能力が著しく向上している。提案手法は,通常のCNNに比べて最大16.5%の精度向上を実現している。 Regular group convolutional neural networks (G-CNNs) have been shown to increase model performance and improve equivariance to different geometrical symmetries. This work addresses the problem of SE(3), i.e., roto-translation equivariance, on volumetric data. Volumetric image data is prevalent in many medical settings. Motivated by the recent work on separable group convolutions, we devise a SE(3) group convolution kernel separated into a continuous SO(3) (rotation) kernel and a spatial kernel. We approximate equivariance to the continuous setting by sampling uniform SO(3) grids. Our continuous SO(3) kernel is parameterized via RBF interpolation on similarly uniform grids. We demonstrate the advantages of our approach in volumetric medical image analysis. Our SE(3) equivariant models consistently outperform CNNs and regular discrete G-CNNs on challenging medical classification tasks and show significantly improved generalization capabilities. Our approach achieves up to a 16.5% gain in accuracy over regular CNNs.	翻訳日:2023-07-21 17:38:38 公開日:2023-07-20
# 道徳教育・開発研究における大規模言語モデル活用の可能性 Potential Benefits of Employing Large Language Models in Research in Moral Education and Development ( http://arxiv.org/abs/2306.13805v2 ) ライセンス: Link先を確認	Hyemin Han	(参考訳) 近年,計算機科学者は大規模言語コーパスと人間強化を用いた予測モデルを訓練することにより,大規模言語モデル(LLM)を開発した。 LLMは様々な分野の精度で人工知能を実装するための有望な方法となっている。興味深いことに、近年のLLMは、高度な人間の認知をエミュレートする創発的な機能的特徴、特に従来の予測モデルでは利用できなかった文脈内学習と思考の連鎖を持っている。本稿では,LLMが道徳教育・開発研究にどのように貢献するかを検討する。この目標を達成するために、最近発表された会議論文とArXivのプレプリントをレビューして、LLMで実装された新機能の概要を説明します。また、倫理的ジレンマや外部からのフィードバックに対処しながら、LCMがどのように振る舞うかをChatGPTで簡単な実験を行うつもりです。以上の結果から, LLMは外部入力による推論プロセスの修正と推論に基づいてジレンマを解くことができる可能性が示唆された。さらに、道徳的模範テストによる予備的な実験結果から、模範的な物語は、人間の参加者と同じように、LLMの道徳的高揚を招きかねないことが示される。モラル教育研究におけるllmの潜在的意義と今後の展開について考察する。 Recently, computer scientists have developed large language models (LLMs) by training prediction models with large-scale language corpora and human reinforcements. The LLMs have become one promising way to implement artificial intelligence with accuracy in various fields. Interestingly, recent LLMs possess emergent functional features that emulate sophisticated human cognition, especially in-context learning and the chain of thought, which were unavailable in previous prediction models. In this paper, I will examine how LLMs might contribute to moral education and development research. To achieve this goal, I will review the most recently published conference papers and ArXiv preprints to overview the novel functional features implemented in LLMs. I also intend to conduct brief experiments with ChatGPT to investigate how LLMs behave while addressing ethical dilemmas and external feedback. The results suggest that LLMs might be capable of solving dilemmas based on reasoning and revising their reasoning process with external input. Furthermore, a preliminary experimental result from the moral exemplar test may demonstrate that exemplary stories can elicit moral elevation in LLMs as do they among human participants. 
Based on these results, I will discuss the potential implications of LLMs for research on moral education and development.	翻訳日:2023-07-21 17:38:23 公開日:2023-07-20
# ラベル生成に基づくクラスインクリメンタル学習 Class-Incremental Learning based on Label Generation ( http://arxiv.org/abs/2306.12619v2 ) ライセンス: Link先を確認	Yijia Shao, Yiduo Guo, Dongyan Zhao, Bing Liu	(参考訳) 事前学習された言語モデルの大きな成功にもかかわらず、これらのモデルを継続的学習、特に破滅的忘れ(CF)によるクラス増分学習(CIL)設定に使用することは依然として困難である。本稿では,cil を連続ラベル生成問題として定式化した場合,cf は大幅に削減され,事前学習モデルの一般化表現がより良く保持できることを示す。そこで我々は,語彙の空間性を活用して生成に集中し,ラベルセマンティクスを用いて擬似再生サンプルを作成する新しいCIL法を提案する。実験の結果, VAGはベースラインよりも大きなマージンで優れていた。 Despite the great success of pre-trained language models, it is still a challenge to use these models for continual learning, especially for the class-incremental learning (CIL) setting due to catastrophic forgetting (CF). This paper reports our finding that if we formulate CIL as a continual label generation problem, CF is drastically reduced and the generalizable representations of pre-trained models can be better retained. We thus propose a new CIL method (VAG) that also leverages the sparsity of vocabulary to focus the generation and creates pseudo-replay samples by using label semantics. Experimental results show that VAG outperforms baselines by a large margin.	翻訳日:2023-07-21 17:38:02 公開日:2023-07-20
# テキストマイニングのためのチャットGPT化学アシスタントとMOF合成予測 ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis ( http://arxiv.org/abs/2306.11296v2 ) ライセンス: Link先を確認	Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi	(参考訳) 本研究は,化学文献の様々な形式やスタイルから,金属-有機フレームワーク(MOF)合成条件のテキストマイニングの自動化におけるChatGPTの導出を行う。これはChatGPTが情報を幻覚させる傾向を効果的に緩和するものであり、以前は科学分野で大きな言語モデル(LLM)を使用していた問題だった。私たちのアプローチは、chatgpt自身によってプログラムされたテキストマイニングの3つの異なるプロセスを実装するワークフローの開発に関するものです。これらはすべて、パース、検索、フィルタリング、分類、要約、データ統合を可能にする。論文から得られた約800個のMOFに関する26,257個の異なる合成パラメータを抽出する。このプロセスには、ChatGPTにテキストマイニングを指示するChemPrompt Engineering戦略が含まれています。さらに,テキストマイニングによって構築されたデータセットを用いて,MOF実験結晶化結果の予測に精度86%以上の機械学習モデルを構築した。また, 化学反応や合成過程に関する質問に答える, 信頼性の高いデータ接地型mofチャットボットを開発した。 ChatGPTを使用するプロセスは、コーディングの専門知識を必要としない物語言語のみを使用して、多様なMOF合成情報を統一形式で確実にマイニングし、集計することを考えると、我々のChatGPT化学アシスタントは、他の様々な化学分野において非常に有用であると予想される。 We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic frameworks (MOFs) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. 
Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format, while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines.	翻訳日:2023-07-21 17:37:50 公開日:2023-07-20
# EPRペアのみを用いた量子検出可能ビザンチン合意プロトコル A Quantum Detectable Byzantine Agreement Protocol using only EPR pairs ( http://arxiv.org/abs/2306.10825v2 ) ライセンス: Link先を確認	Theodore Andronikos, Alla Sirokofskich	(参考訳) 本稿では,検出可能ビザンチン合意のための新しい量子プロトコルを提案する。提案されたプロトコルを類似の量子プロトコルと区別することは、EPRペアのみを使用し、特に$\Psi^{ + }$ペアを使用するという事実である。検出可能なビザンチン協定を保証できる高度な量子プロトコルは数多く存在するが、現在の技術的制限のため、それらは実装に簡単には依存しない。多数のプレイヤーに対して、GHZ $n$-tuplesや他のよりエキゾチックな絡み合った状態は、生成が簡単ではなく、そのようなプロトコルのスケーラビリティを複雑にする可能性がある。対照的にベル状態は、間違いなく最大の絡み合った状態の中で最も容易に生成できる状態である。これは、プレイヤー数$n$に関係なく、EPRペアだけを必要とするため、提案されたプロトコルのスケーラビリティを促進することを願っている。最後に、任意の数のプレイヤーに対して$n$であっても、我々のプロトコルは常に一定の回数のラウンドで完了している。 In this paper, we introduce a new quantum protocol for Detectable Byzantine Agreement. What distinguishes the proposed protocol among similar quantum protocols, is the fact that it uses only EPR pairs, and, in particular, $\Psi^{ + }$ pairs. There are many sophisticated quantum protocols that guarantee Detectable Byzantine Agreement, but they do not easily lend themselves to practical implementations, due to present-day technological limitations. For a large number $n$ of players, GHZ $n$-tuples, or other more exotic entangled states, are not easy to produce, a fact which might complicate the scalability of such protocols. In contrast, Bell states are, undoubtedly, the easiest to generate among maximally entangled states. This will, hopefully, facilitate the scalability of the proposed protocol, as only EPR pairs are required, irrespective of the number $n$ of players. Finally, we mention that, even for arbitrary many players $n$, our protocol always completes in a constant number of rounds, namely $4$.	翻訳日:2023-07-21 17:37:06 公開日:2023-07-20
# Open-Vocabulary Object Detection のスケーリング Scaling Open-Vocabulary Object Detection ( http://arxiv.org/abs/2306.09683v2 ) ライセンス: Link先を確認	Matthias Minderer, Alexey Gritsenko, Neil Houlsby	(参考訳) オープンボキャブラリオブジェクト検出は、事前訓練された視覚言語モデルから大きな恩恵を受けているが、それでも検出訓練データの量によって制限されている。検出トレーニングデータは、Webイメージテキストペアを弱い監視手段として使用することで拡張できるが、画像レベルの事前トレーニングに匹敵するスケールでは行われていない。ここでは,既存の検出器を用いて画像テキストペアに擬似ボックスアノテーションを生成する自己学習を用いて,検出データをスケールアップする。自己学習のスケーリングにおける大きな課題は、ラベル空間の選択、擬似アノテーションフィルタリング、トレーニング効率である。これらの課題に対処するOWLv2モデルとOWL-ST自己学習レシピを提案する。 OWLv2は、既に同等のトレーニングスケール(約1000万例)で、最先端のオープン語彙検出器の性能を上回っている。しかし、OWL-STを用いると10億例以上にスケールでき、さらに大きな改善が得られる。 L/14アーキテクチャでは、OWL-STは、モデルが人間によるボックスアノテーションを一度も見ていないLVISレアクラスのAPを31.2%から44.6%(相対的な改善43%)に改善する。 OWL-STは、画像分類や言語モデリングで見られるような、オープンワールドのローカライゼーションのためのWebスケールトレーニングをアンロックする。 Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can be expanded by using Web image-text pairs as weak supervision, this has not been done at scales comparable to image-level pretraining. Here, we scale up detection data with self-training, which uses an existing detector to generate pseudo-box annotations on image-text pairs. Major challenges in scaling self-training are the choice of label space, pseudo-annotation filtering, and training efficiency. We present the OWLv2 model and OWL-ST self-training recipe, which address these challenges. OWLv2 surpasses the performance of previous state-of-the-art open-vocabulary detectors already at comparable training scales (~10M examples). However, with OWL-ST, we can scale to over 1B examples, yielding further large improvement: With an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (43% relative improvement). OWL-ST unlocks Web-scale training for open-world localization, similar to what has been seen for image classification and language modelling.	翻訳日:2023-07-21 17:36:49 公開日:2023-07-20
# VNHSGE英語データセットにおける大規模言語モデルの性能比較:OpenAI ChatGPT, Microsoft Bing Chat, Google Bard Performance Comparison of Large Language Models on VNHSGE English Dataset: OpenAI ChatGPT, Microsoft Bing Chat, and Google Bard ( http://arxiv.org/abs/2307.02288v3 ) ライセンス: Link先を確認	Xuan-Quy Dao	(参考訳) 本稿では,VNHSGEの英語データセット上で,OpenAI ChatGPT,Microsoft Bing Chat(BingChat),Google Bardの3つの大規模言語モデル(LLM)の性能比較を行った。 BingChat、Bard、ChatGPT(GPT-3.5)はそれぞれ92.4%、86%、79.2%である。結果は、BingChatがChatGPTやBardより優れていることを示している。したがって、BingChatとBardはChatGPTを置き換えることができるが、ChatGPTはベトナムでは公式には利用できない。また,BingChat,Bard,ChatGPTは,ベトナム人学生の英語能力よりも優れていた。本研究の成果は、英語教育におけるllmの可能性の理解に寄与している。 ChatGPT、BingChat、Bardの顕著なパフォーマンスは、高校レベルで英語を教え学習するための効果的なツールとしての可能性を示している。 This paper presents a performance comparison of three large language models (LLMs), namely OpenAI ChatGPT, Microsoft Bing Chat (BingChat), and Google Bard, on the VNHSGE English dataset. The performance of BingChat, Bard, and ChatGPT (GPT-3.5) is 92.4%, 86%, and 79.2%, respectively. The results show that BingChat is better than ChatGPT and Bard. Therefore, BingChat and Bard can replace ChatGPT while ChatGPT is not yet officially available in Vietnam. The results also indicate that BingChat, Bard and ChatGPT outperform Vietnamese students in English language proficiency. The findings of this study contribute to the understanding of the potential of LLMs in English language education. The remarkable performance of ChatGPT, BingChat, and Bard demonstrates their potential as effective tools for teaching and learning English at the high school level.	翻訳日:2023-07-21 17:29:12 公開日:2023-07-20
# 医用画像解析の公平性向上を目的とした固定属性群のない校正バイアスの緩和 Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis ( http://arxiv.org/abs/2307.01738v2 ) ライセンス: Link先を確認	Changjian Shui, Justin Szeto, Raghav Mehta, Douglas L. Arnold, Tal Arbel	(参考訳) 深層学習医療画像モデルの現実的な臨床実践への展開には、校正が必要である。しかし、全体として十分に調整されたモデルは、サブ人口の調整が不十分なままであり、このモデルの推奨に基づいて、臨床医が不意にこのグループの決定を下す可能性がある。モデル精度の観点から,サブグループ間のバイアスの軽減に有効な方法が示されているが,本研究は医用画像解析の文脈におけるキャリブレーションバイアスの軽減に関するオープン問題に焦点を当てている。本手法は訓練中にサブグループ属性を必要とせず,各属性の選択に対するバイアスを緩和する柔軟性を実現する。そこで本研究では,まず低濃度の試料を同定し,それらをグループに分類し,グループワイド焦点損失を導入して校正バイアスを改善する2段階の手法を提案する。 HAM10000データセットを用いた皮膚病変分類と,多発性硬化症(MS)患者の将来の病変活動の予測について検討した。また,年齢,性別などの従来の敏感な属性を年齢,性別などのサブグループで考慮することに加えて,医療画像解析において必要となる病変負荷など,画像由来の属性が異なるグループ間でのバイアスも考慮する。提案手法は, 予測性能を維持しつつ, 最近のベースラインよりも高い精度で校正誤差を効果的に制御できることを示す。 Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. 
In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforming recent baselines.	翻訳日:2023-07-21 17:28:53 公開日:2023-07-20
# UW-ProCCaps: カプセルによる水中プログレッシブカラー化 UW-ProCCaps: UnderWater Progressive Colourisation with Capsules ( http://arxiv.org/abs/2307.01091v2 ) ライセンス: Link先を確認	Rita Pucci, Niki Martinel	(参考訳) 水中画像は海洋生物の研究と理解に欠かせないものである。画像保存に必要なメモリスペースの削減に重点を置いていますが、収集フェーズでのメモリスペースの消費は、このフェーズの持続時間を制限しているため、より多くの画像収集キャンペーンが必要になります。本稿では,水中画像の色を発光チャネルから再構成し,利用可能な記憶空間の2/3を節約する新しい機械学習モデルを提案する。本モデルは水中カラー再構成を専門とし,エンコーダ・デコーダアーキテクチャで構成されている。エンコーダは、畳み込みエンコーダと、ウェブ教師付きデータで訓練された並列特殊分類器からなる。エンコーダとデコーダはカプセルの層を使用して、画像内のエンティティの特徴をキャプチャする。色再現プロセスは、進行性および生成性逆行性訓練手順をリコールする。プログレッシブトレーニングは、色彩の洗練に焦点を当てた生成的な敵対的なルーチンの基盤を与え、画像を明るく飽和した色にすることで、イメージを生き返らせる。 4つのベンチマークデータセットで定性的かつ定量的にモデルを検証する。これは、グレースケールの水中画像で色を再現する最初の試みである。 4つのベンチマークデータセットの大規模な結果は、我々のソリューションが最先端(SOTA)ソリューションより優れていることを示している。また,生成した色調は,SOTAの画質向上モデルと比較して画質の向上を図っている。 Underwater images are fundamental for studying and understanding the status of marine life. We focus on reducing the memory space required for image storage while the memory space consumption in the collecting phase limits the time lasting of this phase leading to the need for more image collection campaigns. We present a novel machine-learning model that reconstructs the colours of underwater images from their luminescence channel, thus saving 2/3 of the available storage space. Our model specialises in underwater colour reconstruction and consists of an encoder-decoder architecture. The encoder is composed of a convolutional encoder and a parallel specialised classifier trained with webly-supervised data. The encoder and the decoder use layers of capsules to capture the features of the entities in the image. The colour reconstruction process recalls the progressive and the generative adversarial training procedures. The progressive training gives the ground for a generative adversarial routine focused on the refining of colours giving the image bright and saturated colours which bring the image back to life. 
We validate the model both qualitatively and quantitatively on four benchmark datasets. This is the first attempt at colour reconstruction in greyscale underwater images. Extensive results on four benchmark datasets demonstrate that our solution outperforms state-of-the-art (SOTA) solutions. We also demonstrate that the generated colourisation enhances the quality of images compared to enhancement models at the SOTA.	翻訳日:2023-07-21 17:28:28 公開日:2023-07-20
# PatternGPT: 大言語モデルテキスト生成のためのパターン駆動フレームワーク PatternGPT: A Pattern-Driven Framework for Large Language Model Text Generation ( http://arxiv.org/abs/2307.00470v4 ) ライセンス: Link先を確認	Le Xiao and Xin Shan	(参考訳) 大規模言語モデル(LLMs)は優れたテキスト生成能力を示しており、多くの下流タスクに対して流動的な人間のような応答を生成することができる。しかし、幻覚への感受性や外部知識を直接使用できないため、実世界の重要なタスクに大規模な言語モデルを適用することは依然として困難である。そこで本研究では,大規模言語モデルのためのパターン駆動型テキスト生成フレームワークであるPatternGPTを提案する。最後に、判定基準や最適化アルゴリズムなどの外部知識を用いて高品質なパターンを探索し、探索されたパターンを用いてモデル生成を導く。このフレームワークは、多種多様なパターンの生成、データのプライバシ保護、外部知識の統合、生成品質の向上といった利点があり、大きな言語モデルのテキスト生成能力を最適化し、インテリジェントな対話やコンテンツ生成の分野によりよい適用を可能にする効果的な方法を提供する。 Large language models (LLMs) have shown excellent text generation capabilities, capable of generating fluent human-like responses for many downstream tasks. However, applying large language models to real-world critical tasks remains challenging due to their susceptibility to hallucinations and inability to directly use external knowledge. To cope with the above challenges, this paper proposes PatternGPT, a pattern-driven text generation framework for Large Language Models. 
Firstly, the framework utilizes the extraction capability of Large Language Models to generate rich and diversified structured and formalized patterns, which facilitates the introduction of external knowledge to do the computation, and then draws on the idea of federated learning to use multiple agents to achieve the sharing in order to obtain more diversified patterns, and finally uses judgment criteria and optimization algorithm to search for high-quality patterns to guide the generation of models. Finally, external knowledge such as judgment criteria and optimization algorithms are used to search for high-quality patterns, and the searched patterns are used to guide model generation. This framework has the advantages of generating diversified patterns, protecting data privacy, combining external knowledge, and improving the quality of generation, which provides an effective method to optimize the text generation capability of large language models, and make it better applied to the field of intelligent dialogue and content generation.	翻訳日:2023-07-21 17:28:11 公開日:2023-07-20
# 予測状態表現の学習に有効なUCB型アルゴリズム Provably Efficient UCB-type Algorithms For Learning Predictive State Representations ( http://arxiv.org/abs/2307.00405v2 ) ライセンス: Link先を確認	Ruiquan Huang, Yingbin Liang, Jing Yang	(参考訳) マルコフ決定プロセス(MDP)と部分的に観察可能なMDP(POMDP)を特別に含む一般的なシーケンシャルな意思決定問題は、時間とともに観察と行動の歴史に基づいて一連の意思決定を行うことで累積報酬を最大化することである。近年の研究では、予測状態表現(psr)によってモデル化された低ランク構造を認める場合、逐次的意思決定問題は統計的に学習可能であることが示されている。これらの進歩にもかかわらず、既存のアプローチは通常、計算的に効率的でないオラクルやステップを含む。一方,楽観的なボーナスデザインの難しさから,盗賊やMDPの計算効率向上に成功している上位信頼境界(UCB)に基づくアプローチは,より一般的なPSRでは研究されていない。本稿では,推定モデルと実モデル間の全変動距離を上限とする新しいボーナス項を特徴とする,PSRに対する最初のUCB型アプローチを提案する。さらに,オンラインPSRとオフラインPSRの両方に設計したUCB型アルゴリズムのサンプル複雑性の境界を特徴付ける。従来のPSRのアプローチとは対照的に,UCB型アルゴリズムでは計算効率が向上し,最終段階の近似ポリシが保証され,モデル精度が保証された。 The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. 
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.	翻訳日:2023-07-21 17:27:52 公開日:2023-07-20
# 最適化誘導巡回自己学習による教師なし3次元登録 Unsupervised 3D registration through optimization-guided cyclical self-training ( http://arxiv.org/abs/2306.16997v2 ) ライセンス: Link先を確認	Alexander Bigalke, Lasse Hansen, Tony C. W. Mok, Mattias P. Heinrich	(参考訳) 最先端のディープラーニングベースの登録には、3つの異なる学習戦略が採用されている: コストのかかる手動アノテーションを必要とする教師付き学習、ドメインの専門家が設計した手作りの類似度メトリクスに大きく依存する教師なし学習、ドメインシフトを導入する合成データからの学習。これらの戦略の限界を克服するため,我々は,教師なし登録のための新しい自己教師あり学習パラダイムを提案する。私たちの考えは2つの重要な洞察に基づいている。特徴ベース微分可能最適化器 1)ランダムな特徴からでも合理的な登録を行う 2) ノイズラベルによる先行特徴抽出ネットワークの訓練を安定化させる。その結果、ランダムな特徴から推定される変位場として擬似ラベルが初期化され、学習特徴抽出器からより表現的な特徴に基づいて循環的に更新され、自己強化効果が得られる循環自己学習を提案する。腹部と肺の登録方法を評価し,メートル法に基づく監督を一貫して上回り,様々な最先端の競争相手を上回っている。ソースコードはhttps://github.com/multimodallearning/reg-cyclical-self-trainで入手できる。 State-of-the-art deep learning-based registration methods employ three different learning strategies: supervised learning, which requires costly manual annotations, unsupervised learning, which heavily relies on hand-crafted similarity metrics designed by domain experts, or learning from synthetic data, which introduces a domain shift. To overcome the limitations of these strategies, we propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training. Our idea is based on two key insights. Feature-based differentiable optimizers 1) perform reasonable registration even from random features and 2) stabilize the training of the preceding feature extraction network on noisy labels. Consequently, we propose cyclical self-training, where pseudo labels are initialized as the displacement fields inferred from random features and cyclically updated based on more and more expressive features from the learning feature extractor, yielding a self-reinforcement effect. We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors. 
Source code is available at https://github.com/multimodallearning/reg-cyclical-self-train.	翻訳日:2023-07-21 17:26:45 公開日:2023-07-20
# MotionGPT: 外国語としての人間の動き MotionGPT: Human Motion as a Foreign Language ( http://arxiv.org/abs/2306.14795v2 ) ライセンス: Link先を確認	Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen	(参考訳) 事前学習された大規模言語モデルの進歩は展開するが、言語とモーションのような他のマルチモーダルデータのための統一モデルの構築は、これまでも挑戦的で未修正である。幸運なことに、人間の動きは人間の言語に似た意味的な結合を示し、しばしば身体言語の一種として認識される。大規模動作モデルで言語データを融合することにより、動作関連タスクのパフォーマンスを向上させる動き言語事前学習が実現可能となる。この知見を活かし,複数の動作関連タスクを処理するための統合型,汎用性,ユーザフレンドリなモーション言語モデルであるmotiongptを提案する。具体的には,人間の動きに対する離散ベクトル量子化を用いて,単語トークンの生成過程と類似した3次元動きを動きトークンに転送する。この「動き語彙」に基づいて、動きとテキストの両方の言語モデリングを統一的に行い、人間の動きを特定の言語として扱う。さらに、素早い学習にインスパイアされたMotionGPTを、動き言語データの混合で事前訓練し、素早い質問・回答タスクで微調整する。広範囲な実験により、MotionGPTはテキスト駆動のモーション生成、モーションキャプション、モーション予測、動作中の動作を含む複数の動作タスクにおいて最先端のパフォーマンスを達成することが示された。 Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. 
Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.	翻訳日:2023-07-21 17:26:25 公開日:2023-07-20
# 放射線医のような放射線画像を読む Reading Radiology Imaging Like The Radiologist ( http://arxiv.org/abs/2307.05921v3 ) ライセンス: Link先を確認	Yuhao Wang	(参考訳) 自動放射線学レポート生成は、放射線学イメージングのリッチできめ細かい記述を含む放射線学レポートを生成することを目的としている。自然画像領域の画像キャプションと比較すると、医療画像は互いに非常によく似ており、疾患の発生にはほとんど差異がない。放射線学レポートにおけるこれらの小さな違いの重要性を考えると、モデルに病気の発生の微妙な領域にもっと集中するよう促すことが重要である。第二に、視覚的およびテキスト的データバイアスの問題は深刻である。通常のケースがデータセットの大部分を占めるだけでなく、病的変化のある部分を記述する文も、段落のごく一部を構成するのみである。最後に、医療画像レポートの生成には、医療知識の専門知識と経験的トレーニングを必要とする長いテキスト生成の課題が伴う。その結果、このようなレポートを生成するのが困難になる。これらの課題に対処するため,我々は,同様の報告を先行知識参照として利用する疾患指向検索フレームワークを提案する。我々は、より正確かつ事実的に一貫した疾患記述を生成するために、事実整合性キャプション生成器を設計する。本研究の枠組みは,CXRデータベースから,その位置と形態的特徴からなる疾患指向マスクを検索することによって,疾患に関する最も類似した報告を見つけることができる。疾患指向の類似報告と視覚的特徴を参照することにより、事実整合性モデルはより正確な放射線診断レポートを生成することができる。 Automated radiology report generation aims to generate radiology reports that contain rich, fine-grained descriptions of radiology imaging. Compared with image captioning in the natural image domain, medical images are very similar to each other, with only minor differences in the occurrence of diseases. Given the importance of these minor differences in the radiology report, it is crucial to encourage the model to focus more on the subtle regions of disease occurrence. Secondly, the problem of visual and textual data biases is serious. Not only do normal cases make up the majority of the dataset, but sentences describing areas with pathological changes also constitute only a small part of the paragraph. Lastly, generating medical image reports involves the challenge of long text generation, which requires more expertise and empirical training in medical knowledge. As a result, the difficulty of generating such reports is increased. To address these challenges, we propose a disease-oriented retrieval framework that utilizes similar reports as prior knowledge references. We design a factual consistency captioning generator to generate more accurate and factually consistent disease descriptions. 
Our framework can find the most similar reports for a given disease from the CXR database by retrieving a disease-oriented mask consisting of the position and morphological characteristics. By referencing the disease-oriented similar report and the visual features, the factual consistency model can generate a more accurate radiology report.	翻訳日:2023-07-21 17:20:36 公開日:2023-07-20
# 一般パラメトリック密度モデルのためのロバストな密度パワーに基づくダイバージェンスを最小化する確率的最適化手法 A stochastic optimization approach to minimize robust density power-based divergences for general parametric density models ( http://arxiv.org/abs/2307.05251v2 ) ライセンス: Link先を確認	Akifumi Okuno	(参考訳) 観測の基盤となる分布を外れ値に対して頑健に推定するために設計された密度パワーダイバージェンス(DPD) [Basu et al. (1998), Biometrika] は、推定されるパラメトリック密度モデルのべき乗の積分項を含む。積分項の明示的な形式は、ある特定の密度(正規密度や指数密度など)に対して得られるが、その計算上の困難さは、DPDの提案から四半世紀以上にわたって、より一般的なパラメトリック密度へのDPDに基づく推定の適用を妨げてきた。本研究では,一般パラメトリック密度モデルにおけるDPDの最小化のための簡単な確率的最適化手法を提案し,確率的最適化に関する従来の理論を参照してその妥当性を説明する。提案手法は、非正規化モデルの助けを借りて、別の密度パワーに基づく$\gamma$-ダイバージェンスの最小化にも適用できる。 Density power divergence (DPD) [Basu et al. (1998), Biometrika], which is designed to estimate the underlying distribution of the observations robustly against outliers, comprises an integral term of the power of the parametric density models to be estimated. While the explicit form of the integral term can be obtained for some specific densities (such as normal density and exponential density), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities for over a quarter of a century since the proposal of DPD. This study proposes a simple stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach can also be applied to the minimization of another density power-based $\gamma$-divergence with the aid of unnormalized models.	翻訳日:2023-07-21 17:20:13 公開日:2023-07-20
# ディジタルゼロノイズ外挿による量子誤差緩和のベストプラクティス Best practices for quantum error mitigation with digital zero-noise extrapolation ( http://arxiv.org/abs/2307.05203v2 ) ライセンス: Link先を確認	Ritajit Majumdar and Pedro Rivero and Friederike Metz and Areeq Hasan and Derek S Wang	(参考訳) デジタルゼロノイズ外挿法(dZNE)は、その概念的単純さ、アクセシビリティ、資源効率のために量子エラー緩和(QEM)の一般的なアプローチとして登場した。しかし、実際には、ノイズの多い量子プロセッサの計算範囲を拡張するためにdZNEを適切に適用することは微妙な問題である。ここでは,ノイズシミュレータと実量子ハードウェアに関する文献レビューとオリジナル実験に基づいて,ノイズ増幅,量子デバイス上での実行,ゼロノイズ限界への外挿,他のQEM法との合成など,ワークフローの各ステップにおけるdZNEによるQEMのベストプラクティスを定義する。 dZNEのベストプラクティスを確立するこの取り組みは、他のQEMメソッドにも拡張され、ノイズの多い量子ハードウェア上でより再現可能で厳密な計算が行われることを期待している。 Digital zero-noise extrapolation (dZNE) has emerged as a common approach for quantum error mitigation (QEM) due to its conceptual simplicity, accessibility, and resource efficiency. In practice, however, properly applying dZNE to extend the computational reach of noisy quantum processors is rife with subtleties. Here, based on literature review and original experiments on noisy simulators and real quantum hardware, we define best practices for QEM with dZNE for each step of the workflow, including noise amplification, execution on the quantum device, extrapolation to the zero-noise limit, and composition with other QEM methods. We anticipate that this effort to establish best practices for dZNE will be extended to other QEM methods, leading to more reproducible and rigorous calculations on noisy quantum hardware.	翻訳日:2023-07-21 17:19:55 公開日:2023-07-20
# Solvent: タンパク質のフォールディングのためのフレームワーク Solvent: A Framework for Protein Folding ( http://arxiv.org/abs/2307.04603v4 ) ライセンス: Link先を確認	Jaemyung Lee, Kyeongtak Han, Jaehoon Kim, Hasun Yu, Youhan Lee	(参考訳) AI研究を行うには一貫性と信頼性が不可欠である。オブジェクト検出のような多くの有名な研究分野は、堅固なベンチマークフレームワークで比較、検証されている。 AlphaFold2の後、タンパク質の折り畳みタスクは新しい段階に入り、AlphaFold2の構成要素に基づいて多くの方法が提案されている。タンパク質折り畳みにおける統一的な研究フレームワークが重要なのは、様々なアプローチを一貫して公平に比較するための実装とベンチマークを備えているからである。これを実現するために、既製のインターフェイスのように最先端モデルの重要なコンポーネントをサポートするタンパク質折り畳みフレームワークSolventを提案する。Solventは、統一コードベースに実装された異なるモデルを含み、同じデータセット上で定義されたモデルのトレーニングと評価をサポートする。我々は、よく知られたアルゴリズムとそのコンポーネントをベンチマークし、タンパク質構造モデリング分野に関する有益な洞察を与える実験を提供する。我々は、Solventが提案されるモデルの信頼性と一貫性を高め、速度とコストの両面で効率をもたらし、タンパク質折り畳みモデリング研究を加速することを期待する。コードはhttps://github.com/kakaobrain/solventで入手できる。 Consistency and reliability are crucial for conducting AI research. Many famous research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods have been proposed based on the components of AlphaFold2. A unified research framework for protein folding is important because it provides implementations and benchmarks to consistently and fairly compare various approaches. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and give efficiency in both speed and costs, resulting in the acceleration of protein folding modeling research. 
The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed.	翻訳日:2023-07-21 17:19:18 公開日:2023-07-20
# 不確かさサンプリングを理解する Understanding Uncertainty Sampling ( http://arxiv.org/abs/2307.02719v3 ) ライセンス: Link先を確認	Shang Liu, Xiaocheng Li	(参考訳) 不確実性サンプリングは、現在の予測モデルが不確実であるデータサンプルの注釈を逐次クエリする、一般的なアクティブラーニングアルゴリズムである。しかし、不確実性サンプリングの使用は概ねヒューリスティックである。 (i)特定の損失の下での特定のタスクに対する「不確実性」の適切な定義についての合意がない。 (ii)アルゴリズムを実装するための標準プロトコルを規定する理論的保証はない。例えば、確率勾配降下のような最適化アルゴリズムの枠組みの下で、逐次到着した注釈付きデータをどう扱うか。本研究では,ストリームベースとプールベースの両方のアクティブラーニングの下で不確実性サンプリングアルゴリズムを体系的に検討する。そこで本研究では, 使用する不確実性尺度と元の損失関数に依存する等価損失の概念を提案し, 不確実性サンプリングアルゴリズムが本質的にこの等価損失を最適化することを示す。この観点は、既存の不確実性尺度の適切性を、代理性(surrogate property)と損失の凸性という2つの側面から検証する。さらに、不確実性尺度を設計するための新しい概念である「loss as uncertainty」を提案する。その考え方は、特徴が与えられたときの条件付き期待損失を不確実性尺度として用いることである。このような不確実性尺度は、分類問題と回帰問題の両方をカバーする優れた解析的性質と一般性を有しており、基礎となるモデルと問題の完全な一般性において、ストリームベースとプールベースの設定の両方で不確実性サンプリングアルゴリズムに対する初の汎化境界を与えることができる。最後に,不確実性サンプリングアルゴリズムのある種の変種と,リスクに敏感な目的関数および分布的ロバスト性との関係を確立する。これにより,サンプルサイズが小さい場合の不確実性サンプリングアルゴリズムの利点を部分的に説明できる。 Uncertainty sampling is a prevalent active learning algorithm that sequentially queries the annotations of data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the uncertainty measure used and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. 
The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called "loss as uncertainty". The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms and risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.	翻訳日:2023-07-21 17:18:11 公開日:2023-07-20
# $\nu^2$-Flows:条件付き正規化流を伴うマルチニュートリノ終状態における高速で改善されたニュートリノ再構成 $\nu^2$-Flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows ( http://arxiv.org/abs/2307.02405v2 ) ライセンス: Link先を確認	John Andrew Raine, Matthew Leigh, Knut Zoch, Tobias Golling	(参考訳) 本研究では、複数のニュートリノを含む終状態への$\nu$-Flows法の拡張である$\nu^2$-Flowsを導入する。このアーキテクチャは、任意の所望のニュートリノ多重度に対して、終状態のオブジェクトタイプと多重度のあらゆる組み合わせにネイティブにスケールすることができる。$t\bar{t}$ dileptonイベントにおいて、両ニュートリノの運動量とそれらの間の相関は、最も一般的な標準的解析手法を用いる場合よりも正確に再構成され、全てのイベントに対して解が見つかる。推論時間は競合する手法よりも大幅に速く、グラフィック処理ユニット上で並列に評価することでさらに削減することができる。我々は、$\nu^2$-Flowsを$t\bar{t}$ dileptonイベントに適用し、アンフォールディング後の分布における各ビンの不確かさが、標準手法よりも完全ニュートリノ再構成による性能の限界にかなり近いことを示す。選択された二重微分観測量に対して、$\nu^2$-Flowsは、ニュートリノ重み付け法と比較して1.5から2倍、楕円法と比較して最大4倍、各ビンの統計的精度を改善する。 In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows method to final states containing multiple neutrinos. The architecture can natively scale for all combinations of object types and multiplicities in the final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton events, the momenta of both neutrinos and correlations between them are reconstructed more accurately than when using the most popular standard analytical techniques, and solutions are found for all events. Inference time is significantly faster than competing methods, and can be reduced further by evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to $t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded distributions are much closer to the limit of performance set by perfect neutrino reconstruction than standard techniques. For the chosen double differential observables, $\nu^2$-Flows results in improved statistical precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method and up to a factor of four in comparison to the Ellipse approach.	翻訳日:2023-07-21 17:17:44 公開日:2023-07-20
# 局所固有次元を用いた深層拡散モデル生成画像の検出 Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality ( http://arxiv.org/abs/2307.02347v3 ) ライセンス: Link先を確認	Peter Lorenz, Ricard Durall and Janis Keuper	(参考訳) 近年,非常にリアルな画像の視覚的合成に拡散モデルが適用されている。これにより、悪意ある目的に利用される可能性への強い懸念が生じる。本稿では,合成画像の自動検出と対応する生成ネットワークの同定のために,元来,敵対的サンプルの検出の文脈で開発された軽量なマルチローカル固有次元(multiLID)を用いることを提案する。 GAN生成画像に対してのみ動作することが多い既存の検出手法とは対照的に,提案手法は現実的なユースケースの多くにおいて,ほぼ完璧な検出結果を提供する。既知のデータセットと新たに作成されたデータセットに関する広範な実験は、提案手法が拡散検出とモデル同定において優れていることを示している。生成画像の検出に関する最近の出版物の実証的評価は、主に「LSUN-Bedroom」データセットに焦点を当てていることが多いため、画像サイズが異なる複数の拡散モデルからのサンプルを含む拡散生成画像の検出に関する包括的なベンチマークを確立する。 Diffusion models recently have been successfully applied for the visual synthesis of strikingly realistic appearing images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which was originally developed in the context of the detection of adversarial examples, for the automatic detection of synthetic images and the identification of the corresponding generator networks. In contrast to many existing detection approaches, which often only work for GAN-generated images, the proposed method provides close to perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification. Since the empirical evaluations of recent publications on the detection of generated images are often mainly focused on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes.	翻訳日:2023-07-21 17:17:24 公開日:2023-07-20
# 構成・プライバシー・削除のためのTangent Transformer Tangent Transformers for Composition, Privacy and Removal ( http://arxiv.org/abs/2307.08122v2 ) ライセンス: Link先を確認	Tian Yu Liu, Aditya Golatkar and Stefano Soatto	(参考訳) 本稿では,事前学習済みの初期値の周りで1次テイラー展開を計算して得られる線形化Transformerの微調整手法であるTangent Attention Fine-Tuning(TAFT)を紹介する。線形化から生じるヤコビアン・ベクトル積は1回の順伝播で効率的に計算でき、同じ数のパラメータを用いながら、トレーニングと推論のコストを元の非線形モデルと同じ桁に抑えられることを示す。さらに, 下流の様々な視覚分類課題に適用すると, TAFTで微調整したTangent Transformerは, 元の非線形ネットワークの微調整と同等の性能を発揮できることを示す。Tangent Transformerは新しい重み集合に対して線形であり,結果として生じる微調整損失は凸であるので,モデル構成や並列トレーニング,機械アンラーニング,差分プライバシーなどに関して,TAFTは非線形微調整に比べていくつかの利点がある。 We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy.	翻訳日:2023-07-21 17:09:04 公開日:2023-07-20
# ジオメトリ誘導クロスビュートランスフォーマーによる3自由度地対衛星カメラ位置推定精度の向上 Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer ( http://arxiv.org/abs/2307.08015v3 ) ライセンス: Link先を確認	Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit Vora, and Hongdong Li	(参考訳) 画像検索に基づくクロスビューローカライズ手法は、データベース衛星画像のサンプリング密度が限られているため、非常に粗いカメラポーズ推定につながることが多い。本稿では,地上画像とマッチング・検索された衛星画像との相対的な回転と並進を推定することにより,地上カメラの位置と方向の精度を向上させる手法を提案する。本手法では,従来の幾何学と学習可能なクロスビュートランスフォーマーの利点を併せ持つジオメトリ誘導クロスビュートランスフォーマーを設計し、地上観測をオーバヘッドビューにマッピングする。合成したオーバヘッドビューと観測された衛星特徴マップから,強いグローバル情報埋め込み能力を持つニューラルポーズオプティマイザを構築し,それらの相対回転を推定する。それらの回転を整列した後、不確実性誘導の空間相関を開発し、相対並進を決定できる車両位置の確率マップを生成する。実験の結果,本手法は最先端技術よりも優れていた。特に、クロスビューKITTIデータセットにおいて車両の横方向ポーズをGT(Ground Truth)値の1m以内に制限できる可能性は$35.54\%$から$76.44\%$に改善され、車両の向きをGT値の$1^{\circ}$以内に制限できる可能性は$19.64\%$から$99.10\%$に改善された。 Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image. Our approach designs a geometry-guided cross-view transformer that combines the benefits of conventional geometry and learnable cross-view transformers to map the ground-view observations to an overhead view. Given the synthesized overhead view and observed satellite feature maps, we construct a neural pose optimizer with strong global information embedding ability to estimate the relative rotation between them. After aligning their rotations, we develop an uncertainty-guided spatial correlation to generate a probability map of the vehicle locations, from which the relative translation can be determined. Experimental results demonstrate that our method significantly outperforms the state-of-the-art. 
Notably, the likelihood of restricting the vehicle lateral pose to be within 1m of its Ground Truth (GT) value on the cross-view KITTI dataset has been improved from $35.54\%$ to $76.44\%$, and the likelihood of restricting the vehicle orientation to be within $1^{\circ}$ of its GT value has been improved from $19.64\%$ to $99.10\%$.	翻訳日:2023-07-21 17:08:46 公開日:2023-07-20
# Few-Shot Sequence Labelingにおけるトークンとスパンレベルの統一化 Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling ( http://arxiv.org/abs/2307.07946v2 ) ライセンス: Link先を確認	Zifeng Cheng, Qingyu Zhou, Zhiwei Jiang, Xuemin Zhao, Yunbo Cao, Qing Gu	(参考訳) Few-shotシーケンスラベリングは、少数のラベル付きサンプルのみに基づいて新しいクラスを特定することを目的としている。既存の手法は、主にメトリクス学習に基づくトークンレベルまたはスパンレベルのラベルモデルを設計することで、データの不足問題を解決する。しかしながら、これらの方法は単一の粒度(トークンレベルまたはスパンレベル)でのみ訓練され、対応する粒度に固有のいくつかの弱点がある。本稿では,まずトークンとスパンレベルの監視を統一し,Few-shotシーケンスラベリングのためのConsistent Dual Adaptive Prototypical (CDAP) ネットワークを提案する。 CDAPにはトークンレベルとスパンレベルのネットワークが含まれており、異なる粒度で共同で訓練されている。 2つのネットワークの出力を整合させるために,我々は,相互に学習できるようにする一貫性損失を提案する。推論段階では,まず予測確率を調整し,次に最大確率の重複しないスパンを貪欲に選択する一貫した貪欲推論アルゴリズムを提案する。大規模実験の結果,3つのベンチマークデータセットにおいて,新たな最先端結果が得られた。 Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples. Existing methods solve the data scarcity problem mainly by designing token-level or span-level labeling models based on metric learning. However, these methods are only trained at a single granularity (i.e., either token level or span level) and have some weaknesses of the corresponding granularity. In this paper, we first unify token and span level supervisions and propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling. CDAP contains the token-level and span-level networks, jointly trained at different granularities. To align the outputs of two networks, we further propose a consistent loss to enable them to learn from each other. During the inference phase, we propose a consistent greedy inference algorithm that first adjusts the predicted probability and then greedily selects non-overlapping spans with maximum probability. Extensive experiments show that our model achieves new state-of-the-art results on three benchmark datasets.	翻訳日:2023-07-21 17:08:19 公開日:2023-07-20
# 確率的政策実行不確実性を考慮した効率的な行動ロバスト強化学習 Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty ( http://arxiv.org/abs/2307.07666v2 ) ライセンス: Link先を確認	Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai	(参考訳) ロバスト強化学習(RL)は、不確実性に直面した最悪のパフォーマンスを最適化する政策を見つけることを目的としている。本稿では,ポリシーに規定される行為を常に実行する代わりに,エージェントがポリシーに指定されたアクションを確率$1-\rho$で実行し,確率$\rho$で代替の敵対的行為を行う確率的ポリシー実行の不確実性を伴うアクションロバストRLに焦点を当てる。確率的政策実行の不確実性を持つ行動ロバストMDPに対する最適ポリシーの存在を確立し,その解に対して行動ロバストなベルマン最適性方程式を提供する。さらに、ミニマックス最適なリグレットとサンプル複雑性を実現するAction Robust Reinforcement Learning with Certificates (ARRLC)アルゴリズムを開発した。加えて,本手法のロバスト性を検証するために数値実験を行い,ARRLCが非ロバストRLアルゴリズムよりも優れ,行動摂動の存在下でロバストTDアルゴリズムよりも高速に収束することを示す。 Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity. In addition, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.	翻訳日:2023-07-21 17:08:01 公開日:2023-07-20
# ロバストなボリューム医用画像セグメンテーションのための周波数領域敵対的訓練 Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation ( http://arxiv.org/abs/2307.07269v2 ) ライセンス: Link先を確認	Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan	(参考訳) 医療などの重要な応用において、ディープラーニングモデルの堅牢性を確保することが不可欠である。近年の深層学習の進歩により, ボリューム医用画像セグメンテーションモデルの性能は向上しているが, 敵対的攻撃に対する脆弱性のため, 現実のアプリケーションに即時に展開することはできない。本稿では,ボリューム医用画像セグメンテーションモデルに対する3次元周波数領域敵対的攻撃を提示し,従来型の入力領域やボクセル領域攻撃に対する利点を示す。提案する攻撃を用いて,ボクセル領域および周波数領域の攻撃に対してロバストなモデルを最適化する新しい周波数領域敵対的訓練手法を導入する。さらに, クリーンサンプルと敵対的サンプルにおけるモデル性能のトレードオフを改善するために, 周波数領域敵対的訓練を調整する周波数一貫性損失を提案する。コードはhttps://github.com/asif-hanif/vafaで公開されている。 It is imperative to ensure the robustness of deep learning models in critical applications such as healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose a frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between the model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.	翻訳日:2023-07-21 17:07:40 公開日:2023-07-20
# 中性窒素空孔中心における軌道状態のコヒーレント電界制御 Coherent Electric-Field Control of Orbital state in a Neutral Nitrogen-Vacancy Center ( http://arxiv.org/abs/2307.07198v2 ) ライセンス: Link先を確認	Hodaka Kurokawa, Keidai Wakamatsu, Shintaro Nakazato, Toshiharu Makino, Hiromitsu Kato, Yuhei Sekiguchi, and Hideo Kosaka	(参考訳) 軌道状態のコヒーレント制御は、ダイヤモンドの色中心において極めて低電力操作を実現するために重要である。ここでは、電場による軌道制御の理想的なシステムとして、中性電荷の窒素空孔中心であるNV$^0$を提案する。我々は、NV$^0$の基底状態における電気感受性を、NV$^-$の励起状態における電気感受性と同等に推定する。また、NV$^0$の軌道状態のコヒーレント制御を示す。軌道制御に必要な電力はスピン制御よりも3桁小さく、希釈冷凍機で動作する超伝導量子ビットとインターフェースできる可能性を強調している。 The coherent control of the orbital state is crucial for color centers in diamonds for realizing extremely low-power manipulation. Here, we propose the neutrally charged nitrogen-vacancy center, NV$^0$, as an ideal system for orbital control through electric fields. We estimate electric susceptibility in the ground state of NV$^0$ to be comparable to that in the excited state of NV$^-$. Also, we demonstrate coherent control of the orbital states of NV$^0$. The required power for orbital control is three orders of magnitude smaller than that for spin control, highlighting the potential for interfacing a superconducting qubit operated in a dilution refrigerator.	翻訳日:2023-07-21 17:07:25 公開日:2023-07-20
# ディープニューラルネットワークにおける定量的CLT Quantitative CLTs in Deep Neural Networks ( http://arxiv.org/abs/2307.06092v2 ) ライセンス: Link先を確認	Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati	(参考訳) 隠れ層の幅が大きな定数$n$に比例する、ランダムなガウス重みとバイアスを持つ完全連結ニューラルネットワークの分布について検討する。非線形性に関する穏やかな仮定の下で、大きいが有限の$n$と任意の固定されたネットワーク深さで有効な、正規近似の定量的境界を得る。我々の定理は、有限次元分布と全過程の両方について、ランダムな完全連結ネットワーク(とその微分)と対応する無限幅ガウス過程の間の距離が、$\gamma>0$に対して$n^{-\gamma}$のようにスケールし、その指数は差異を測るために用いる計量に依存することを示す。我々の境界は、ネットワーク幅への依存性に関して、これまでの文献で得られていたどの境界よりも真に強い。一次元の場合には、それらが最適であること、すなわち一致する下界が成り立つことも証明する。 We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) and the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.	翻訳日:2023-07-21 17:06:55 公開日:2023-07-20
# 表面電子に基づく非断熱的ホロノミック量子ゲート Nonadiabatic holonomic quantum gates based on the surface electron ( http://arxiv.org/abs/2307.09900v2 ) ライセンス: Link先を確認	Jun Wang, Hai-Bo Wang, Qing Ai	(参考訳) 幾何学位相に基づく非断熱ホロノミック量子計算は、内在するノイズとデコヒーレンスに対して堅牢である。本研究では, 量子計算のための有望な2次元プラットフォームである表面電子系において, 非断熱ホロノミック量子ゲートを実現するためのスキームを理論的に提案する。ホロノミックゲートは、リュードベリ状態とスピン状態を不均一磁場を介して結合する3準位構造によって実現される。循環進化の後、計算基底は異なる幾何学的位相を獲得し、幾何学的ゲートを実行する。スピンアップした電子のみが幾何ゲートを経験し、スピンダウンした電子は状態選択的な駆動場から分離される。次に、リュードベリ状態とスピン状態に符号化された制御NOTゲートを実装する。出力状態の忠実度は、実験的に達成可能なパラメータで 0.99 を超える。 The nonadiabatic holonomic quantum computation based on the geometric phase is robust against the built-in noise and decoherence. In this work, we theoretically propose a scheme to realize nonadiabatic holonomic quantum gates in a surface electron system, which is a promising two-dimensional platform for quantum computation. The holonomic gate is realized by a three-level structure that combines the Rydberg states and spin states via an inhomogeneous magnetic field. After a cyclic evolution, the computation bases pick up different geometric phases and thus perform a geometric gate. Only the electron with spin up experiences the geometric gate, while the electron with spin down is decoupled from the state-selective driving fields. The controlled-NOT gate encoded on the Rydberg states and spin states is then put into practice. The fidelity of the output state exceeds 0.99 with experimentally achievable parameters.	翻訳日:2023-07-21 17:00:38 公開日:2023-07-20
# AesPA-Net:美的パターン認識型スタイル転送ネットワーク AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks ( http://arxiv.org/abs/2307.09724v2 ) ライセンス: Link先を確認	Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun	(参考訳) 対象のスタイルを芸術的に表現するために、近年の研究では、スタイル画像の局所パッチをコンテンツ画像の対応するパッチにマッピングする能力により、注意機構を活用している。しかし、任意のコンテンツとアートワークの間の意味的対応が低いため、アテンションモジュールはスタイル画像の特定の局所パッチを繰り返し乱用し、不調和で明白な反復的アーティファクトをもたらす。この制限を克服し,完璧な芸術的スタイル変換を実現するため,注意機構の強化と,スタイルを構成するパターンのリズムの獲得に重点を置く。本稿では,スタイル画像におけるパターンの反復を定量化する新しい指標であるパターン反復可能性を導入する。このパターン反復可能性に基づき,局所的およびグローバルなスタイル表現のスイートスポットを見つける美的パターン認識型スタイル転送ネットワーク(AesPA-Net)を提案する。さらに,注意機構が正確で意味のある意味的対応を学習することを促す,新たな自己教師タスクを提案する。最後に,局所パターンの精巧なリズムを伝達するためにパッチワイズスタイル損失を導入する。定性的および定量的な評価を通じて,人間の知覚に適合するパターン反復可能性の信頼性を検証し,提案フレームワークの優位性を示す。 To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and accomplish impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. Based on the pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot of local and global style expressions. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce the patch-wise style loss to transfer the elaborate rhythm of local patterns. 
Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability that aligns with human perception, and demonstrate the superiority of the proposed framework.	翻訳日:2023-07-21 17:00:26 公開日:2023-07-20
# 大規模言語モデルのための効率的誘導生成 Efficient Guided Generation for Large Language Models ( http://arxiv.org/abs/2307.09702v2 ) ライセンス: Link先を確認	Brandon T. Willard and R\'emi Louf	(参考訳) 本稿では,正規表現と文脈自由文法を用いた言語モデルテキスト生成のための効率的な手法について述べる。我々のアプローチはトークンシーケンス生成プロセスにほとんどオーバーヘッドを課さず、ガイド生成を実際に実現可能にする。実装はオープンソースのPythonライブラリOutlinesで提供されている。 In this article we describe an efficient approach to guiding language model text generation with regular expressions and context-free grammars. Our approach adds little to no overhead to the token sequence generation process, and makes guided generation feasible in practice. An implementation is provided in the open source Python library Outlines.	翻訳日:2023-07-21 17:00:03 公開日:2023-07-20
# ドメイン適応に基づく雨天・霧天における自律走行検出の強化 Domain Adaptation based Enhanced Detection for Autonomous Driving in Foggy and Rainy Weather ( http://arxiv.org/abs/2307.09676v2 ) ライセンス: Link先を確認	Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu	(参考訳) 通常、教師付き学習に依存する自律運転のための物体検出法は、トレーニングとテストデータの間で一貫した特徴分布を仮定するが、この仮定は異なる気象条件下では成り立たない可能性がある。ドメインギャップのため、晴れた天候下で訓練された検出モデルは、霧や雨の条件下ではうまく機能しない可能性がある。霧や雨の天候で検出のボトルネックを克服することは、野生に展開する自動運転車にとって真の課題だ。霧と雨天におけるドメインギャップを埋め、オブジェクト検出の性能を向上させるため、ドメイン適応オブジェクト検出のための新しいフレームワークを提案する。画像レベルとオブジェクトレベルの両方での適応は、ドメイン間の画像スタイルとオブジェクトの外観の違いを最小化することを目的としている。さらに, 困難な事例に対するモデルの性能向上のために, ドメイン適応に加えて, 困難な事例に対して敵対的マイニングを行う新たな敵対的勾配反転層を導入する。さらに,新たなドメインレベルの計量正則化を課すために,データ拡張による補助ドメインの生成を提案する。公開V2Vベンチマークにおける実験結果は、特に霧や雨の運転シナリオにおける物体検出の大幅な向上を示している。 Typically, object detection methods for autonomous driving that rely on supervised learning make the assumption of a consistent feature distribution between the training and testing data; however, this assumption may fail in different weather conditions. Due to the domain gap, a detection model trained under clear weather may not perform well in foggy and rainy conditions. Overcoming detection bottlenecks in foggy and rainy weather is a real challenge for autonomous vehicles deployed in the wild. To bridge the domain gap and improve the performance of object detection in foggy and rainy weather, this paper presents a novel framework for domain-adaptive object detection. The adaptations at both the image-level and object-level are intended to minimize the differences in image style and object appearance between domains. Furthermore, in order to improve the model's performance on challenging examples, we introduce a novel adversarial gradient reversal layer that conducts adversarial mining on difficult instances in addition to domain adaptation. Additionally, we suggest generating an auxiliary domain through data augmentation to enforce a new domain-level metric regularization. 
Experimental findings on public V2V benchmark exhibit a substantial enhancement in object detection specifically for foggy and rainy driving scenarios.	翻訳日:2023-07-21 16:59:58 公開日:2023-07-20
# 多変量可変チャネル時系列の多視点自己教師型学習 Multi-view self-supervised learning for multivariate variable-channel time series ( http://arxiv.org/abs/2307.09614v2 ) ライセンス: Link先を確認	Thea Br\"usch, Mikkel N. Schmidt, Tommy S. Alstr{\o}m	(参考訳) 多変量生物医学時系列データのラベル付けは、退屈で高価なプロセスである。自己教師付きコントラスト学習は、ラベルなしデータの事前トレーニングを通じて、大きなラベル付きデータセットの必要性を軽減する。しかし、多変量時系列データの場合、入力チャネルの集合はアプリケーションによって異なり、既存の作業の多くは異なる入力チャネルの集合を持つデータセット間の転送を許さない。入力チャネルを個別に操作するための1つのエンコーダの学習を提案する。次に、メッセージパッシングニューラルネットワークを使用して、チャネル間の単一の表現を抽出する。 6つのEEGチャネルを持つデータセット上でモデルを事前学習し、2つの異なるEEGチャネルを持つデータセット上でそれを微調整することで、この手法の可能性を示す。我々は、異なるコントラスト損失関数にまたがるメッセージパッシングニューラルネットワークとモデルを比較する。 TS2Vecの損失と組み合わせることで、ほとんどの設定で他のメソッドよりも優れていることを示す。 Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.	翻訳日:2023-07-21 16:59:38 公開日:2023-07-20
# 学習に基づく地形とロボット認識ダイナミクスモデルによるコンテキスト条件ナビゲーション Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model ( http://arxiv.org/abs/2307.09206v2 ) ライセンス: Link先を確認	Suresh Guttikonda, Jan Achterhold, Haolong Li, Joschka Boedecker, Joerg Stueckler	(参考訳) 自律的なナビゲーション設定では、いくつかの量にはバリエーションがある。摩擦係数などの地形特性は、ロボットの位置によって時間によって変化する。また、ロボットのダイナミクスは、例えば、異なるペイロード、システムの質量の変更、摩耗と涙、アクチュエータのゲインの変化、関節摩擦などによって変化する可能性がある。したがって、自律エージェントはそのようなバリエーションに適応できるべきである。本稿では,その変動に適応できる新しい確率的,地形的,ロボット対応のフォワードダイナミクスモデルであるTRADYNを開発する。ニューラルプロセスに基づいたメタラーニングフォワードダイナミクスモデルの最近の進歩の上に構築されている。本手法は,一輪車のようなロボットと,空間的な摩擦係数の異なる異なる地形配置を用いて,シミュレーションによる2次元ナビゲーション環境で評価する。本実験では,非適応アブレーションモデルと比較して,長水平軌道予測のタスクに対する予測誤差が小さいことを示す。また,ナビゲーション計画の下流作業において,ロボットと地形特性を考慮に入れた制御効率の高い経路を計画する際の性能向上を示す。 In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.	
翻訳日:2023-07-21 16:59:24 公開日:2023-07-20
# LA-Net:ラベル雑音下での表情認識のためのランドマーク認識学習 LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise ( http://arxiv.org/abs/2307.09023v3 ) ライセンス: Link先を確認	Zhiyu Wu, Jinshi Cui	(参考訳) 表情認識(FER)は、表現のあいまいさのため難しい課題である。派生したノイズラベルは、実世界のシナリオのパフォーマンスを著しく損なう。この問題に対処するため,我々は2つの視点からラベルノイズの影響を軽減するために顔のランドマークを利用した新しいFERモデルであるLandmark-Aware Net (LA-Net)を提案する。まず、LA-Netは、表現空間の不確実性を抑えるためにランドマーク情報を使用し、各サンプルのラベル分布を近傍集約により構築し、訓練監督の質を向上させる。第二に、設計した表現ランドマークの対照的な損失を用いて、ランドマーク情報を表現表現に組み込む。強調表現特徴抽出器はラベルノイズの影響を受けにくい。本手法は,任意の深層ニューラルネットワークと統合することで,余分な推論コストを発生させることなく,よりよい指導を行うことができる。我々は,組込みデータセットと合成ノイズデータセットの両方について広範な実験を行い,LA-Netが最先端の性能を達成することを示す。 Facial expression recognition (FER) remains a challenging task due to the ambiguity of expressions. The derived noisy labels significantly harm the performance in real-world scenarios. To address this issue, we present a new FER model named Landmark-Aware Net (LA-Net), which leverages facial landmarks to mitigate the impact of label noise from two perspectives. Firstly, LA-Net uses landmark information to suppress the uncertainty in expression space and constructs the label distribution of each sample by neighborhood aggregation, which in turn improves the quality of training supervision. Secondly, the model incorporates landmark information into expression representations using the devised expression-landmark contrastive loss. The enhanced expression feature extractor can be less susceptible to label noise. Our method can be integrated with any deep neural network for better training supervision without introducing extra inference costs. We conduct extensive experiments on both in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net achieves state-of-the-art performance.	翻訳日:2023-07-21 16:59:06 公開日:2023-07-20
# 個別データに基づく健康のためのマルチモーダルLCM Multimodal LLMs for health grounded in individual-specific data ( http://arxiv.org/abs/2307.09018v2 ) ライセンス: Link先を確認	Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte	(参考訳) 基礎となる大規模言語モデル(LLM)は、健康を含む幅広い分野のタスクを解く素晴らしい能力を示している。パーソナライズされた健康タスクを効果的に解決するために、LLMは個人の健康状態に関連するさまざまなデータモダリティを抽出する能力が必要である。本稿では,マルチモーダル理解のための健康大言語モデル (helm: health large language model for multimodal understanding) を開発し,基礎疾患リスクを推定するために高次元臨床モダリティ(high-dimensional clinical modality)を活用することを可能にする。 HeLMは複雑なデータモダリティをLLMのトークン埋め込み空間にマッピングするエンコーダを学習し、データをテキストにシリアライズすることで表データのような単純なモダリティを符号化する。英国バイオバンクのデータを用いて,HeLMは高次元時系列データに加えて,人口統計学的,臨床的特徴を効果的に利用し,疾患リスクを推定できることを示した。例えば、HeLMは、表状データのみを使用する場合の0.49と比較して、表状データとスピログラムデータを組み合わせた場合の喘息予測のためのAUROCの0.75を達成している。全体として、Helmは8つのバイナリ特性から選択した古典的な機械学習アプローチよりも優れ、あるいは同等に動作する。さらに, 分布特性に対する一般化可能性や, 個人の健康と健康に関する会話を駆動する能力など, このモデルの下流利用について検討した。 Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space and for simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. 
For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.	翻訳日:2023-07-21 16:58:48 公開日:2023-07-20
# サイクル一貫性に基づく教師なしディープグラフマッチング Unsupervised Deep Graph Matching Based on Cycle Consistency ( http://arxiv.org/abs/2307.08930v2 ) ライセンス: Link先を確認	Siddharth Tourani, Carsten Rother and Muhammad Haris Khan and Bogdan Savchynskyy	(参考訳) 我々は,教師なし深度グラフマッチングの疎密な領域と,画像のキーポイントマッチングへの応用に寄与する。標準の教師あり (supervised) アプローチとは対照的に、本手法ではキーポイント対間の基底真理対応は不要である。代わりに、同じオブジェクトカテゴリの画像間のマッチングの一貫性を強制することにより、自己監視される。マッチングと一貫性損失は離散的であるため、それらの微分は直接学習には使用できない。組合せ解のブラックボックス微分に関する最近の結果に基づいて,本手法を原理的に構築することにより,この問題に対処する。この手法は任意のネットワークアーキテクチャや組合せ解法と互換性があるため,非常に柔軟である。実験により,本手法は教師なしグラフマッチングのための新しい最先端技術であることがわかった。 We contribute to the sparsely populated area of unsupervised deep graph matching with application to keypoint matching in images. Contrary to the standard supervised approach, our method does not require ground truth correspondences between keypoint pairs. Instead, it is self-supervised by enforcing consistency of matchings between images of the same object category. As the matching and the consistency loss are discrete, their derivatives cannot be straightforwardly used for learning. We address this issue in a principled way by building our method upon the recent results on black-box differentiation of combinatorial solvers. This makes our method exceptionally flexible, as it is compatible with arbitrary network architectures and combinatorial solvers. Our experimental evaluation suggests that our technique sets a new state-of-the-art for unsupervised graph matching.	翻訳日:2023-07-21 16:58:17 公開日:2023-07-20
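The cycle-consistency idea in the entry above can be illustrated with a small sketch. It assumes each matching is a permutation stored as a Python list (index in one image mapped to index in the next). The names `compose` and `cycle_inconsistency` are illustrative and do not come from the paper, and the paper's actual loss and its black-box differentiation of the solver are not reproduced here.

```python
def compose(first, second):
    """Compose two matchings: follow `first`, then `second`."""
    return [second[j] for j in first]


def cycle_inconsistency(match_ab, match_bc, match_ca):
    """Count keypoints of image A that do not map back to themselves
    after following the matchings A -> B -> C -> A."""
    round_trip = compose(compose(match_ab, match_bc), match_ca)
    return sum(1 for i, j in enumerate(round_trip) if i != j)


# Hypothetical matchings between three images with four keypoints each.
match_ab = [1, 0, 2, 3]
match_bc = [0, 2, 1, 3]
match_ca_good = [1, 2, 0, 3]  # closes every cycle
match_ca_bad = [1, 0, 2, 3]   # leaves two keypoints on broken cycles

consistent = cycle_inconsistency(match_ab, match_bc, match_ca_good)
inconsistent = cycle_inconsistency(match_ab, match_bc, match_ca_bad)
```

A count of zero means the three matchings agree with one another, which is the kind of signal that can stand in for ground-truth correspondences during training.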
# DialogStudio: 会話型AIのための最もリッチで最も多様な統一データセットコレクションを目指して DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI ( http://arxiv.org/abs/2307.10172v2 ) ライセンス: Link先を確認	Jianguo Zhang and Kun Qian and Zhiwei Liu and Shelby Heinecke and Rui Meng and Ye Liu and Zhou Yu and Huan Wang and Silvio Savarese and Caiming Xiong	(参考訳) 会話AIの進歩にもかかわらず、言語モデルは多様な会話タスクを扱うための課題に直面し、既存の対話データセットコレクションは多様性と包括性を欠いていることが多い。これらの問題に対処するために,対話データセットの最大かつ最も多様なコレクションであるDialogStudioを紹介し,元の情報を保存しながら一貫したフォーマットで統一する。本コレクションは,オープンドメイン対話,タスク指向対話,自然言語理解,対話レコメンデーション,対話要約,知識基底対話などのデータを含む。 DialogStudioの実用性をさらに向上するため、各データセットのライセンスを特定し、選択した対話のためのドメイン対応プロンプトを設計し、命令対応の微調整を容易にする。さらに、データセット収集を用いて会話型AIモデルを構築し、ゼロショットおよび少数ショット学習シナリオにおける実験により、DialogStudioの優位性を実証した。透明性を改善し、データセットやタスクベースの研究、言語モデルの事前トレーニングをサポートするため、すべてのデータセット、ライセンス、コード、dialogstudioに関連するモデルがhttps://github.com/salesforce/dialogstudioで公開されている。 Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. 
To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, codes, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio	翻訳日:2023-07-21 16:48:57 公開日:2023-07-20
# 人間計算アルゴリズムの労働者としてのLLM LLMによるクラウドソーシングパイプラインのレプリケーション LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs ( http://arxiv.org/abs/2307.10168v2 ) ライセンス: Link先を確認	Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang	(参考訳) LLMは、以前は人間の能力専用と考えられていたクラウドソーシングタスクにおいて、人間のような行動の複製を約束している。しかし、現在の取り組みは主に単純な原子タスクに焦点を当てている。 LLMがより複雑なクラウドソーシングパイプラインを複製できるかどうかを検討する。これらの「ヒューマン・コンピュテーション・アルゴリズム」において、現代のLLMはクラウドワーカーの能力の一部をシミュレートできるが、成功のレベルは変動しており、サブタスクに必要な特定のスキル、サブタスクを実行するための最適な相互作用のモダリティによって影響される。我々は,指示に対する人間とllmの感性の違いを考察し,llmに対するヒューマンセーフガードの実現の重要性を強調し,人間とllmを相補的なスキルセットで訓練する可能性について論じる。重要なのは、クラウドソーシングパイプラインの複製が、(1)異なるタスクにおけるllmの相対的な強み(サブタスクでのパフォーマンスをクロス比較することによって)と(2)複雑なタスクにおけるllmsの潜在能力を調査するための価値のあるプラットフォームであることを示すことである。 LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. 
Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.	翻訳日:2023-07-21 16:48:31 公開日:2023-07-20
# 屋内空間における車両位置のドローンナビゲーションとライセンス場所検出 Drone navigation and license place detection for vehicle location in indoor spaces ( http://arxiv.org/abs/2307.10165v2 ) ライセンス: Link先を確認	Moa Arvidsson, Sithichot Sawirot, Cristofer Englund, Fernando Alonso-Fernandez, Martin Torstensson, Boris Duran	(参考訳) 毎年何百万もの車両が輸送され、船やボートに密閉されている。火災などの関連する安全問題のリスクを軽減するためには、車両の位置を知ることが不可欠である。この研究の目的は、駐車中の車両の列を移動し、ナンバープレートを検出するナノドローンに基づくソリューションを作ることだ。壁追跡アルゴリズムと、ライセンスプレートを検出するために訓練されたCNNによって実現しています。すべての計算はドローン上でリアルタイムで行われ、位置と検出された画像を送るだけで、プレートの位置がついた2Dマップが作成できる。私たちのソリューションは、8つのテストケース(数列のプレート、異なるドローン速度、あるいは低光度)にまたがるすべてのプレートを、複数のドローンの旅の計測結果を集約することで読み取ることができます。 Millions of vehicles are transported every year, tightly parked in vessels or boats. To reduce the risks of associated safety issues like fires, knowing the location of vehicles is essential, since different vehicles may need different mitigation measures, e.g. electric cars. This work is aimed at creating a solution based on a nano-drone that navigates across rows of parked vehicles and detects their license plates. We do so via a wall-following algorithm, and a CNN trained to detect license plates. All computations are done in real-time on the drone, which just sends position and detected images that allow the creation of a 2D map with the position of the plates. Our solution is capable of reading all plates across eight test cases (with several rows of plates, different drone speeds, or low light) by aggregation of measurements across several drone journeys.	翻訳日:2023-07-21 16:48:10 公開日:2023-07-20
# 不均衡な医用画像認識のための病変領域へのクラスアテンション Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition ( http://arxiv.org/abs/2307.10036v2 ) ライセンス: Link先を確認	Jia-Xin Zhuang, Jiabin Cai, Jianguo Zhang, Wei-shi Zheng and Ruixuan Wang	(参考訳) 医用画像の自動分類はインテリジェント診断システムにおいて重要な要素である。しかし、ほとんどの医療画像データセットには、一般的な疾患のサンプルが豊富に含まれており、まれなものだけが含まれており、大きな階級的不均衡につながっている。現在,不均衡なトレーニングデータから効果的に学習することは,知的診断においてオープンな問題である。本稿では, 単純で効果的なフレームワークである Class Attention to REgions of the lesion (CARE) を提案し, Convolutional Neural Networks (CNNs) のトレーニングプロセスに注意を埋め込んでデータ不均衡の問題に対処する。提案したアテンションモジュールは、CNNがまれな疾患の病変領域に適応するのに役立つため、CNNがそれらの特徴をより効果的に学習するのに役立つ。さらに、この注目モジュールはトレーニング段階でのみ動作し、元のネットワークのアーキテクチャを変更しないため、既存のCNNアーキテクチャと直接結合することができる。 CAREフレームワークは、まれな疾患の病変領域を表すために境界ボックスを必要とする。手動のアノテーションの必要性を軽減するため,従来のサリエンシ手法や事前訓練されたセグメンテーションモデルをボックス生成に適用することにより,CAREの変種をさらに発展させた。結果から,自動バウンディングボックス生成によるCARE変種は,従来のCAREフレームワークに比較して,manual バウンディングボックスアノテーションと同等であることがわかった。不均衡な皮膚画像データセットと肺炎データセットに関する一連の実験により、本手法は稀な疾患の病変領域に効果的に集中し、稀な疾患の分類性能を著しく向上することを示す。 Automated medical image classification is the key component in intelligent diagnosis systems. However, most medical image datasets contain plenty of samples of common diseases and just a handful of rare ones, leading to major class imbalances. Currently, it is an open problem in intelligent diagnosis to effectively learn from imbalanced training data. In this paper, we propose a simple yet effective framework, named Class Attention to REgions of the lesion (CARE), to handle data imbalance issues by embedding attention into the training process of Convolutional Neural Networks (CNNs). The proposed attention module helps CNNs attend to lesion regions of rare diseases, therefore helping CNNs to learn their characteristics more effectively. 
In addition, this attention module works only during the training phase and does not change the architecture of the original network, so it can be directly combined with any existing CNN architecture. The CARE framework needs bounding boxes to represent the lesion regions of rare diseases. To alleviate the need for manual annotation, we further developed variants of CARE by leveraging the traditional saliency methods or a pretrained segmentation model for bounding box generation. Results show that the CARE variants with automated bounding box generation are comparable to the original CARE framework with manual bounding box annotations. A series of experiments on an imbalanced skin image dataset and a pneumonia dataset indicates that our method can effectively help the network focus on the lesion regions of rare diseases and remarkably improves the classification performance of rare diseases.	翻訳日:2023-07-21 16:47:57 公開日:2023-07-20
# 入院バンド : 遅延のない長期的勧告の最適化 Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay ( http://arxiv.org/abs/2307.09943v2 ) ライセンス: Link先を確認	Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek	(参考訳) リコメンダシステムは、オンラインプラットフォームのユビキタスな機能である。利用者の長期的満足度向上に特化している。本稿では,遅延報酬を伴うマルチアームバンディット問題として定式化したコンテンツ探索課題について検討する。我々は、学習信号の選択に明らかなトレードオフがあることを観察した。完全な報酬が利用可能になるのを待つのに数週間かかり、学習の開始率を損なう可能性がある一方で、短期的なプロキシの報酬を測定することは、実際の長期的な目標を不完全に反映する。この課題を2つのステップで解決する。まず,これまでに得られた情報をすべて組み込んだ遅延報酬の予測モデルを開発する。完全な観測と部分的な(短命または中期的な)結果がベイズフィルタを通して組み合わせられ、確率論的信念が得られる。第二に、この新たな予測モデルを利用する帯域幅アルゴリズムを考案する。このアルゴリズムは、探索とエクスプロイトを慎重にバランスさせて、長期的成功に対応するコンテンツを素早く特定する。このアプローチをポッドキャストのレコメンデーション問題に適用し,ユーザが2ヶ月以上繰り返し関与している番組を識別する。短期プロキシを最適化するアプローチや、長期的な結果が完全に実現されるのを待つアプローチと比較して、我々のアプローチがはるかに優れたパフォーマンスをもたらすことを実証的に検証する。 Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. 
We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.	翻訳日:2023-07-21 16:47:22 公開日:2023-07-20
# 音声ヘッドビデオ生成のための暗黙のアイデンティティ表現条件付きメモリ補償ネットワーク Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation ( http://arxiv.org/abs/2307.09906v2 ) ライセンス: Link先を確認	Fa-Ting Hong and Dan Xu	(参考訳) トーキングヘッドビデオ生成は、人物の身元を画像内に保持しつつ、ターゲット駆動ビデオから派生した動き情報を用いて、静止画像中の人間の顔に動的ポーズと表情をアニメーションすることを目的としている。しかし、運転映像における劇的かつ複雑な動きは、隠蔽された領域や微妙な表現のバリエーションに対して十分な外観情報を提供できないため、不明瞭な生成を引き起こす。この問題に対処するために,我々はグローバルな顔表現空間を学習し,MCNetと呼ばれる新しい暗黙のアイデンティティ表現条件付きメモリ補償ネットワークを設計することを提案する。具体的には、ネットワークモジュールを考案し、すべてのトレーニングサンプルから、統一的な空間的顔メタメモリバンクを学習し、より豊かな顔構造と外観を前もって提供し、その生成のための歪んだ顔特徴を補うことができる。さらに,ソース画像の離散的キーポイントから学習した暗黙的アイデンティティ表現に基づく効果的なクエリ機構を提案する。これにより、メモリバンクからより相関性の高い情報を検索し、補償を行うことができる。大規模な実験により、MCNetは代表的および補完的な顔記憶を学習でき、VoxCeleb1およびCelebVデータセットにおける従来の最先端の音声ヘッド生成方法よりも明らかに優れていることが示された。 プロジェクトページ https://github.com/harlanhong/ICCV2023-MCNET を参照。 Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image. However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation. Specifically, we devise a network module to learn a unified spatial facial meta-memory bank from all training samples, which can provide rich facial structure and appearance priors to compensate warped source facial features for the generation. 
Furthermore, we propose an effective query mechanism based on implicit identity representations learned from the discrete keypoints of the source image. It can greatly facilitate the retrieval of more correlated information from the memory bank for the compensation. Extensive experiments demonstrate that MCNet can learn representative and complementary facial memory, and can clearly outperform previous state-of-the-art talking head generation methods on VoxCeleb1 and CelebV datasets. Please check our project page: https://github.com/harlanhong/ICCV2023-MCNET	翻訳日:2023-07-21 16:47:00 公開日:2023-07-20
# FedSoup:選択的モデル補間によるフェデレーション学習における一般化とパーソナライゼーションの改善 FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation ( http://arxiv.org/abs/2307.10507v1 ) ライセンス: Link先を確認	Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li	(参考訳) cross-silo federated learning(fl)は、病院や臨床研究所などのデータセンタに分散したデータセット上の機械学習モデルの開発を可能にする。しかし、最近の研究では、現在のFLアルゴリズムは、分布シフトに直面した場合、局所的な性能とグローバルな性能のトレードオフに直面している。具体的には、パーソナライズされたflメソッドは、ローカルデータに過度に適合する傾向があり、ローカルモデルに鋭い谷が発生し、分散データに一般化する能力が阻害される。本稿では,地域とグローバルのパフォーマンスのトレードオフを最適化するために,新しいフェデレーションモデルスープ法(モデルパラメータの選択補間)を提案する。具体的には、フェデレーショントレーニングフェーズの間、各クライアントは、ローカルモデルとグローバルモデル間の補間モデルのパフォーマンスを監視して、独自のグローバルモデルプールを維持する。これにより、オーバーフィッティングを緩和し、フラットなミニマを求めることができ、モデルの一般化性能を大幅に改善できます。提案手法は,網膜および病理像の分類タスクにおける評価手法であり,本手法は分布汎化において有意な改善が得られた。私たちのコードはhttps://github.com/ubc-tea/fedsoupで利用可能です。 Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. 
We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.	翻訳日:2023-07-21 15:21:08 公開日:2023-07-20
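As a rough illustration of "selective interpolation of model parameters" in the entry above, the sketch below greedily averages a local model with candidate global models and keeps a candidate only when the averaged model scores at least as well on held-out data. This is a generic model-soup style sketch under stated assumptions, not the FedSoup algorithm itself: the functions `average` and `selective_soup`, the dict-of-floats model format, and the toy scoring rule are all hypothetical.

```python
def average(models):
    """Element-wise mean of parameter dicts that share the same keys."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


def selective_soup(local_model, global_models, score):
    """Greedily pool global models with the local one, keeping a candidate
    only if the averaged model scores at least as well as before."""
    pool = [local_model]
    best = score(average(pool))
    for candidate in global_models:
        trial = score(average(pool + [candidate]))
        if trial >= best:
            pool.append(candidate)
            best = trial
    return average(pool), len(pool)


def toy_score(model):
    """Hypothetical validation score: higher when weight w is close to 1.0."""
    return -abs(model["w"] - 1.0)


local = {"w": 0.5}
globals_seen = [{"w": 1.5}, {"w": -3.0}, {"w": 1.0}]
soup, pool_size = selective_soup(local, globals_seen, toy_score)
```

In this toy run the second global model is rejected because averaging it in would hurt the held-out score, while the first and third are kept.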
# Grad-CAMは医療画像で説明できるのか? Is Grad-CAM Explainable in Medical Images? ( http://arxiv.org/abs/2307.10506v1 ) ライセンス: Link先を確認	Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh	(参考訳) 説明可能なディープラーニング(Explainable Deep Learning)は、人工知能(AI)分野、特に医療画像などの領域において、効果的な診断と治療計画のために正確かつ解釈可能な機械学習モデルが不可欠である。 Grad-CAMは、ディープラーニングモデルの意思決定プロセスで使用される画像の最も重要な領域を強調し、解釈可能性を高め、結果に対する信頼を高めるベースラインである。これは分類や説明など多くのコンピュータビジョン(CV)タスクに適用されている。本研究では,説明可能な深層学習の原理と医用画像との関連性について考察し,様々な説明可能性技術とその限界について考察し,Grad-CAMの医用画像応用について検討する。この結果は、医療画像におけるディープラーニングモデルの精度と解釈性を改善するために、説明可能なDeep LearningとGrad-CAMの可能性を浮き彫りにした。コードは利用可能である(利用可能になる予定)。 Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available in (will be available).	翻訳日:2023-07-21 15:20:48 公開日:2023-07-20
# 画像表現における解釈可能な部分空間の同定 Identifying Interpretable Subspaces in Image Representations ( http://arxiv.org/abs/2307.10504v1 ) ライセンス: Link先を確認	Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi	(参考訳) 画像表現の特徴を解釈可能なフレームワークであるコントラスト概念(FALCON)を用いた自動特徴記述を提案する。ターゲット機能としてFALCONは、大きなキャプションデータセット(LAION-400mなど)とCLIPのような訓練済みの視覚言語モデルを使って、高機能なクロップ画像をキャプションする。キャプションの中の各単語はランク付けされ、ターゲットの特徴を詳細に記述した少数の共有、人間理解可能な概念へと導かれる。 FALCONはまた、低活性化(偽造)画像を用いた対照的な解釈を適用して、急激な概念を排除した。既存の多くのアプローチは独立して特徴を解釈するが、最先端の自己監督モデルや教師付きモデルでは、表現空間の20%未満は個々の特徴によって説明できる。より広い空間における特徴は、グループで研究するとより解釈しやすくなり、FALCONを通して高次スコアリングの概念で説明できることを示す。下流タスクにおける障害の説明とデバッグに抽出された概念をどのように利用できるかについて議論する。最後に、簡単な線形変換を学習することにより、ある(説明可能な)表現空間から別の見えない表現空間へ概念を移す手法を提案する。 We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. 
Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.	翻訳日:2023-07-21 15:20:32 公開日:2023-07-20
# 安定性, 状態, 入力制約型安全フィルタを用いた微分フラット学習モデル予測制御 Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter ( http://arxiv.org/abs/2307.10541v1 ) ライセンス: Link先を確認	Adam W. Hall and Melissa Greeff and Angela P. Schoellig	(参考訳) 学習に基づく最適制御アルゴリズムは、過去の軌道データとシステムダイナミクスの学習モデルを用いて未知のシステムを制御する。これらのコントローラは、学習したダイナミクスの線形近似、高速な計算のためのトレーディングパフォーマンス、あるいは一般的には性能は良いがリアルタイム適用性を制限する非線形最適化手法のいずれかを使用する。本稿では,最先端の学習ベースコントローラと同様の性能を実現するために微分平坦性を利用した新しい非線形コントローラを提案する。微分平坦性は、非線形入力写像によって非線形系を正確に線形化することができる力学系の特性である。ここで、非線形変換はガウス過程として学習され、高い確率、安定性、入力および平らな状態制約満足度を保証する安全フィルタで使用される。この安全フィルタは、フラットモデル予測制御器からの入力を洗練して、2つの連続凸最適化により制約付き非線形学習に基づく最適制御を行う。本手法を最先端の学習ベースの制御戦略と比較し,同様の性能を実現するとともに,計算効率が大幅に向上するとともに,フラット状態と入力制約を尊重し,安定性を保証した。 Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. 
We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.	翻訳日:2023-07-21 15:10:22 公開日:2023-07-20
# 測地線進化による広帯域不明瞭量子センシング Wide-band Unambiguous Quantum Sensing via Geodesic Evolution ( http://arxiv.org/abs/2307.10537v1 ) ライセンス: Link先を確認	Ke Zeng, Xiaohui Yu, Martin B. Plenio, and Zhen-Yu Wang	(参考訳) 本稿では, 量子センシング技術を用いて, 量子ビットのダイナミックスを, 断熱進化の測地線に沿って循環的に駆動する手法を提案する。このアプローチは、動的デカップリング制御でよく発生する高調波や刺激応答などの不要な共振項を同時に除去しながら、デコヒーレンスノイズと制御誤差の両方の効果を効果的に抑制する。その結果、本手法は、スピンを含む量子システムの信号検出と個別アドレス付けにロバストで広帯域であいまいで高分解能な量子センシング機能を提供する。その汎用性を示すために,本手法の低周波および高周波センシングへの応用例を示す。この量子センシング技術の重要性は、複雑な信号の検出と複雑な量子環境の制御にまで及ぶ。検出精度を高め, 量子システムの精密操作を可能にすることで, 様々な実用的応用が期待できる。 We present a quantum sensing technique that utilizes a sequence of $\pi$ pulses to cyclically drive the qubit dynamics along a geodesic path of adiabatic evolution. This approach effectively suppresses the effects of both decoherence noise and control errors while simultaneously removing unwanted resonance terms, such as higher harmonics and spurious responses commonly encountered in dynamical decoupling control. As a result, our technique offers robust, wide-band, unambiguous, and high-resolution quantum sensing capabilities for signal detection and individual addressing of quantum systems, including spins. To demonstrate its versatility, we showcase successful applications of our method in both low-frequency and high-frequency sensing scenarios. The significance of this quantum sensing technique extends to the detection of complex signals and the control of intricate quantum environments. By enhancing detection accuracy and enabling precise manipulation of quantum systems, our method holds considerable promise for a variety of practical applications.	翻訳日:2023-07-21 15:09:59 公開日:2023-07-20
# 乗算ロバスト推定器による因果推論におけるニューラルネットワークモデルのハイパーパラメータチューニング Multiply Robust Estimator Circumvents Hyperparameter Tuning of Neural Network Models in Causal Inference ( http://arxiv.org/abs/2307.10536v1 ) ライセンス: Link先を確認	Mehdi Rostami, Olli Saarela	(参考訳) 平均処理効果(ATE)の推定は2ステップで行われ、第1ステップでは治療と結果がモデル化され、第2ステップでは予測がATE推定器に挿入される。最初のステップでは、機械学習アルゴリズムの使用を含む、多くのモデルが治療と結果に適合する。しかしながら、最も因果効果の高い推定と推論をもたらす超パラメータ集合の中から選択することは難しい課題である。乗算ロバスト (MR) 推定器は1つの推定器で全ての第一段階モデルを活用できる。 MR推定器が、第一段階の処理または結果モデルの1つが$n^r$整合であれば、$n^r$整合であることを示す。また、MR が方程式の幅広いクラスの解であり、処理モデルの一つが $\sqrt{n}$-consistent であれば漸近的に正規であることを示す。 MRの標準誤差も計算され、最初のステップで真のモデルの知識を必要としない。我々のシミュレーション研究は理論的な発見を支持している。 Estimation of the Average Treatment Effect (ATE) is often carried out in 2 steps, wherein the first step, the treatment and outcome are modeled, and in the second step the predictions are inserted into the ATE estimator. In the first steps, numerous models can be fit to the treatment and outcome, including using machine learning algorithms. However, it is a difficult task to choose among the hyperparameter sets which will result in the best causal effect estimation and inference. Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator. We show that MR estimator is $n^r$ consistent if one of the first-step treatment or outcome models is $n^r$ consistent. We also show that MR is the solution to a broad class of estimating equations, and is asymptotically normal if one of the treatment models is $\sqrt{n}$-consistent. The standard error of MR is also calculated which does not require a knowledge of the true models in the first step. Our simulations study supports the theoretical findings.	翻訳日:2023-07-21 15:09:40 公開日:2023-07-20
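The two-step procedure described in the entry above (fit treatment and outcome models, then plug their predictions into an ATE estimator) can be sketched with the standard augmented inverse-probability-weighted (AIPW, doubly robust) estimator. This is a simpler relative of the multiply robust estimator, which combines several candidate first-step models; that combination is not reproduced here. The function name `aipw_ate` and the toy numbers are illustrative assumptions.

```python
def aipw_ate(treatment, outcome, propensity, mu1, mu0):
    """Second-step plug-in: AIPW estimate of the average treatment effect.

    treatment  -- 0/1 treatment indicators
    outcome    -- observed outcomes
    propensity -- first-step predictions of P(T = 1 | X)
    mu1, mu0   -- first-step predictions of E[Y | T = 1, X] and E[Y | T = 0, X]
    """
    total = 0.0
    for t, y, e, m1, m0 in zip(treatment, outcome, propensity, mu1, mu0):
        total += (m1 - m0
                  + t * (y - m1) / e
                  - (1 - t) * (y - m0) / (1 - e))
    return total / len(outcome)


# Toy first-step predictions (hypothetical): the outcome model fits the
# observed arm exactly, so the residual corrections vanish.
treatment = [1, 0, 1, 0]
outcome = [3.0, 1.0, 5.0, 2.0]
propensity = [0.5, 0.5, 0.5, 0.5]
mu1 = [3.0, 3.0, 5.0, 4.0]
mu0 = [1.0, 1.0, 3.0, 2.0]

ate = aipw_ate(treatment, outcome, propensity, mu1, mu0)
```

When the outcome predictions are off, the inverse-propensity-weighted residual terms correct the plug-in difference, which is why an estimator of this form stays consistent if either first-step model is well specified.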
# Hypernetworks を用いた高速非教師付き深層モデル選択 Fast Unsupervised Deep Outlier Model Selection with Hypernetworks ( http://arxiv.org/abs/2307.10529v1 ) ライセンス: Link先を確認	Xueying Ding, Yue Zhao, Leman Akoglu	(参考訳) 外乱検出(OD)は、多くのテクニックの豊富な文献で多くの応用を見出す。 deep neural network based od (dod) は、ディープラーニングの多くの進歩によって、近年注目を集めている。本稿では,教師なしDOD,すなわち実効性ハイパーパラメータ(HP)チューニング/モデル選択による批判的評価課題について考察する。いくつかの先行研究では、ODモデルのHPに対する感受性が報告されているが、HPの長いリストを示す現代のDODモデルにとって、非常に重要になっている。我々は,DODモデルのチューニングにHYPERを導入し,(1)監督のない検証(ラベル付き異常の欠如による)と(2)HP/モデル空間の効率的な探索(HP数の増加による)という2つの基本的な課題に対処する。鍵となるアイデアは、HPをメインのDODモデルの最適な重みにマッピングする新しいハイパーネットワーク(HN)を設計し、訓練することである。 HYPERは、多くのDODモデルの重みを動的に生成できる単一のHN(HPの異なるモデルに対応する)に乗じて、大幅なスピードアップを実現している。さらに,従来のODタスクのメタラーニングを利用して,提案したHNを効率的に訓練したプロキシ検証関数をラベルでトレーニングする。 35のODタスクに対する大規模な実験により、HYPERは高い効率で8つのベースラインに対して高いパフォーマンスを達成している。 Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. 
In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.	翻訳日:2023-07-21 15:09:20 公開日:2023-07-20
# Black-Box Adviceを超える:Q値予測付きMDPのための学習拡張アルゴリズム Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions ( http://arxiv.org/abs/2307.10524v1 ) ライセンス: Link先を確認	Tongxin Li, Yiheng Lin, Shaolei Ren and Adam Wierman	(参考訳) 単軌道時間変化マルコフ決定過程(MDP)の文脈における一貫性と堅牢性の間のトレードオフを、信頼できない機械学習アドバイスを用いて検討する。私たちの作業は、アドバイスの生成方法に関する追加情報が得られる設定を考慮し、ブラックボックスソースからのアドバイスを取り扱う典型的なアプローチから外れています。連続的および離散的状態/作用空間を含む一般MDPモデルの下で、Q値のアドバイスが与えられた場合の、この種のものとしては初となる一貫性とロバスト性のトレードオフを証明する。以上の結果から,Q値アドバイスを利用することで,機械学習によるアドバイスとロバストなベースラインのうち優れた方を動的に追求することが可能となり,ほぼ最適な性能保証が得られ,ブラックボックスアドバイスのみで得られるものが改善されることが示唆された。 We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice.	翻訳日:2023-07-21 15:08:55 公開日:2023-07-20
# ジェンダーチューニング: 事前訓練された言語モデルに悪影響を及ぼすための微調整 Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models ( http://arxiv.org/abs/2307.10522v1 ) ライセンス: Link先を確認	Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, and Hamed Khanpour	(参考訳) 近年の研究では、広く使用されているプレトレーニング言語モデル(plm)が、非モデレーションプレトレーニングコーパスから社会バイアスを広めていることが明らかになっている。既存のソリューションでは、リソース集約的でコストのかかるデバイアスのためのトレーニングプロセスとデータセットが必要です。さらに、これらの手法は、下流タスクにおけるPLMのパフォーマンスを損なう。本研究では,下流タスクのデータセットを微調整することでPLMを脱臭するジェンダーチューニングを提案する。この目的のために、Gender-tuning は Masked Language Modeling (MLM) トレーニング目標をファインチューニングのトレーニングプロセスに統合する。包括的実験により、ジェンダーチューニングはplmの平均性バイアススコアの点で最先端のベースラインよりも優れており、下流タスクのデータセットのみを使用して下流タスクにおけるplmのパフォーマンスを改善していることが示された。また、性別調整は、オリジナルの微調整で動作するplmのデプロイ可能なデバイアスツールである。 Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for debiasing, which are resource-intensive and costly. Furthermore, these methods hurt the PLMs' performance on downstream tasks. In this study, we propose Gender-tuning, which debiases the PLMs through fine-tuning on downstream tasks' datasets. For this aim, Gender-tuning integrates Masked Language Modeling (MLM) training objectives into fine-tuning's training process. Comprehensive experiments show that Gender-tuning outperforms the state-of-the-art baselines in terms of average gender bias scores in PLMs while improving PLMs' performance on downstream tasks solely using the downstream tasks' dataset. Also, Gender-tuning is a deployable debiasing tool for any PLM that works with original fine-tuning.	翻訳日:2023-07-21 15:08:39 公開日:2023-07-20
# 文脈のない異種ジェスチャーの対話的セグメンテーション Interactive Segmentation for Diverse Gesture Types Without Context ( http://arxiv.org/abs/2307.10518v1 ) ライセンス: Link先を確認	Josh Myers-Dean, Yifei Fan, Brian Price, Wilson Chan, Danna Gurari	(参考訳) インタラクティブセグメンテーションは、モデルがどのようにセグメンテーションを作成し、編集するかを導くために、人間がイメージをマークする。画像にマーキングするためのジェスチャタイプ(クリックやスクリブルなど)のみをサポートするか、使用中のジェスチャタイプの知識を必要とするか、最終セグメンテーションにマークされた領域が含まれているか除外されるべきかを指定する必要があります。その代わりに,ユーザがイメージのみをマークしなければならない,ジェスチャータイプを指定せずに任意のジェスチャータイプを入力できる,シンプルな対話型セグメンテーションタスクを提案する。我々は,対話型セグメンテーションアルゴリズムを全体評価可能な新しい評価指標とともに,複数のジェスチャー型を持つ最初の対話型セグメンテーションデータセットを導入することで,この新しいタスクを支援する。そして、新しいタスクに適応した部分を含む多数の対話的セグメンテーションアルゴリズムを分析する。全体として有望なパフォーマンスを観察しながら、将来的な改善の領域も強調しています。この作業をさらに拡張するために、新しいデータセットをhttps://github.com/joshmyersdean/digで公開しています。 Interactive segmentation entails a human marking an image to guide how a model either creates or edits a segmentation. Our work addresses limitations of existing methods: they either only support one gesture type for marking an image (e.g., either clicks or scribbles) or require knowledge of the gesture type being employed, and require specifying whether marked regions should be included versus excluded in the final segmentation. We instead propose a simplified interactive segmentation task where a user only must mark an image, where the input can be of any gesture type without specifying the gesture type. We support this new task by introducing the first interactive segmentation dataset with multiple gesture types as well as a new evaluation metric capable of holistically evaluating interactive segmentation algorithms. We then analyze numerous interactive segmentation algorithms, including ones adapted for our novel task. While we observe promising performance overall, we also highlight areas for future improvement. To facilitate further extensions of this work, we publicly share our new dataset at https://github.com/joshmyersdean/dig.	翻訳日:2023-07-21 15:08:20 公開日:2023-07-20
# 地域包摂型社会文化的包摂型ステレオタイプ資源の構築 Building Socio-culturally Inclusive Stereotype Resources with Community Engagement ( http://arxiv.org/abs/2307.10514v1 ) ライセンス: Link先を確認	Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar Prabhakaran	(参考訳) グローバル環境における生成言語モデルの迅速な開発と展開は,害の量や種類だけでなく,辺境的なアイデンティティや経験した社会的偏見など,地域文化の文脈をいかにうまく捉えているかという点において,我々の害の測定をスケールする必要がある。現在の評価パラダイムは、多様で局所的だがグローバルな社会文化的な視点を代表していないため、この問題に対処する能力に限られている。危険度測定における過度な過小評価やスキューを防止するため、世界各国の文化や社会から人や経験を取り入れることで、評価資源の強化と校正が不可欠である。本研究は,インド社会における評価資源の社会文化的に認識された拡大,特にステレオタイプによる影響について示すものである。我々は、インドに特有の格差の軸のステレオタイプを含むリソースを構築するためのコミュニティの取り組みを考案する。結果として得られる資源は、インド文脈で知られているステレオタイプの数を、多くのユニークなアイデンティティで1000以上のステレオタイプに増加させる。また,言語モデル評価のための拡張資源の有用性と有効性を示す。コンテンツ警告: 本論文は攻撃的かもしれないステレオタイプの例を含む。 With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities to address this, as they are not representative of diverse, locally situated but global, socio-cultural perspectives. It is imperative that our evaluation resources are enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimations or skews in measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community engaged effort to build a resource which contains stereotypes for axes of disparity that are uniquely present in India. The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000 stereotypes across many unique identities. 
We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.	翻訳日:2023-07-21 15:08:00 公開日:2023-07-20
# IvyGPT : 医療領域における中国語パスウェイ言語モデル IvyGPT: InteractiVe Chinese pathwaY language model in medical domain ( http://arxiv.org/abs/2307.10512v1 ) ライセンス: Link先を確認	Rongsheng Wang and Yaofei Duan and ChanTong Lam and Jiexi Chen and Jiangsheng Xu and Haoming Chen and Xiaohong Liu and Patrick Cheong-Iao Pang and Tao Tan	(参考訳) ChatGPTのような一般的な大規模言語モデル(LLM)は顕著な成功を収めている。しかし、これらのLSMは、精度が低く、医療アドバイスができないため、医学的に広く採用されていない。我々は、高品質なQA(QA)インスタンスとRLHF(Reinforcement Learning from Human Feedback)インスタンスで訓練および微調整を行うLLaMAに基づくLLMであるIvyGPTを提案する。教師付き微調整の後、IvyGPTは多ターン会話能力に優れるが、包括的診断など他の面では医師のようには機能しない。 RLHFを通じて、IvyGPTは人間に近いリッチな診断と治療の回答を出力することができる。トレーニングでは、QLoRAを使用して、少数のNVIDIA A100 (80GB) GPU上で33億のパラメータをトレーニングしました。実験の結果、IvyGPTは他の医療用GPTモデルよりも優れていた。 General large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes, due to poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot perform like a doctor in other aspects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to human. In the training, we used QLoRA to train 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT has outperformed other medical GPT models.	翻訳日:2023-07-21 15:07:40 公開日:2023-07-20
# マルチモーダル感情分析のための一般デバイアス General Debiasing for Multimodal Sentiment Analysis ( http://arxiv.org/abs/2307.10511v1 ) ライセンス: Link先を確認	Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, and Liqiang Nie	(参考訳) 既存のマルチモーダル感性分析(MSA)の研究は、マルチモーダル特徴と感情ラベルの急激な相関を適合させることなく、予測にマルチモーダル情報を利用する。例えば、青い背景を持つほとんどのビデオがデータセットにポジティブなラベルを持っている場合、モデルは予測のためにこのような相関に依存するが、'blue background''は感情に関連した機能ではない。この問題に対処するために、我々は、突発的相関への依存を減らすことで、MSAモデルの外部分布(OOD)一般化能力を向上することを目的とした、一般的なMSAタスクを定義する。そこで本研究では,より偏りが大きい試料に対して適応的に小さな重みを割り当てる逆確率重み付け(ipw)に基づく一般的な偏りの枠組みを提案する。この脱バイアスフレームワークの鍵は、各サンプルのバイアスを推定することであり、これは2つのステップによって達成される。 1)各モダリティにおけるロバストな特徴と偏った特徴の分離 2)バイアス特徴を利用してバイアスを推定する。最後に,IPWを用いて大規模バイアスサンプルの効果を低減し,感情予測のための堅牢な特徴学習を実現する。モデルの一般化能力を調べるために、元のテストセットを2つのベンチマークに保持し、さらに複数のユニモーダルおよびマルチモーダルのoodテストセットを構築する。実験結果は,提案フレームワークの優れた一般化能力を示すものである。我々は、複製を容易にするコードとデータをリリースした。 Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while ``blue background'' is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to the samples with larger bias i.e., the severer spurious correlations). The key to this debiasing framework is to estimate the bias of each sample, which is achieved by two steps: 1) disentangling the robust features and biased features in each modality, and 2) utilizing the biased features to estimate the bias. 
Finally, we employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate the reproduction.	翻訳日:2023-07-21 15:07:23 公開日:2023-07-20
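The weighting idea in the abstract above (smaller weights for samples with larger estimated bias) can be sketched in a few lines. This is a minimal illustration of generic inverse probability weighting, not the authors' released code: the `bias_probs` values, the function names, and the normalization to a mean weight of 1 are all assumptions made for the example.

```python
def ipw_weights(bias_probs, eps=1e-6):
    """Return inverse-probability weights, normalized to average 1.

    bias_probs: hypothetical per-sample bias estimates in (0, 1];
    a larger value means the sample leans more on a spurious correlation.
    """
    raw = [1.0 / max(p, eps) for p in bias_probs]  # larger bias -> smaller weight
    scale = len(raw) / sum(raw)                    # keep the mean weight at 1
    return [w * scale for w in raw]


def weighted_loss(losses, weights):
    """Average of per-sample losses after reweighting."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Made-up bias estimates for three samples, from heavily biased to nearly unbiased.
bias_probs = [0.9, 0.5, 0.1]
weights = ipw_weights(bias_probs)
loss = weighted_loss([0.2, 0.4, 1.0], weights)
```

How the per-sample bias is estimated is the substantive part of the paper and is not reproduced here; the sketch only shows what happens once such an estimate exists.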
# FACADE: 逆回路異常検出と評価のためのフレームワーク FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation ( http://arxiv.org/abs/2307.10563v1 ) ライセンス: Link先を確認	Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo	(参考訳) 本稿では、深層ニューラルネットワークにおける教師なし機械的異常検出のための新しい確率的および幾何学的フレームワークであるFACADEを提案する。その主な目標は、敵の攻撃の理解と緩和を促進することである。 FACADEは、回路上の確率分布を生成することを目的としており、擬似クラスや活性化空間における高次元モードの多様体特性の変化への寄与に重要な洞察を与え、敵の攻撃を発見・戦える強力なツールを提供する。我々のアプローチは、モデルの堅牢性を改善し、スケーラブルなモデル監視を強化し、現実のデプロイメント環境で有望なアプリケーションを実証することを目指している。 We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.	翻訳日:2023-07-21 15:02:30 公開日:2023-07-20
# 共用逆学習:非学習共用逆学習によるバックドア緩和 Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples ( http://arxiv.org/abs/2307.10562v1 ) ライセンス: Link先を確認	Shaokui Wei, Mingda Zhang, Hongyuan Zha, Baoyuan Wu	(参考訳) バックドア攻撃は、敵がトレーニングセットに有毒なサンプルを注入し、特定のターゲットクラスに特定のトリガーを含む有毒なサンプルを予測するバックドアモデルを引き起こす機械学習モデルに対する深刻なセキュリティ脅威である。本稿では,小さなクリーンデータセットを用いて,バックドアモデルの浄化作業について検討する。バックドアリスクと逆境リスクの関連性を確立することにより、バックドアモデルと浄化モデルとの間の共有敵例(SAE)のリスクを主に捉えた、バックドアリスクの新たな上限を導出する。この上界はさらに、対向訓練技術を用いてバックドアを緩和する新しい二段階最適化問題を示唆している。そこで本稿では,SAU(Shared Adversarial Unlearning)を提案する。具体的には、SAUはまずSAEを生成し、次いで生成されたSAEを、精製されたモデルによって正しく分類されるか、2つのモデルによって正しく分類され、バックドアモデルにおけるバックドア効果が浄化されたモデルで緩和されるように解放する。各種ベンチマークデータセットとネットワークアーキテクチャの実験により,提案手法がバックドアディフェンスの最先端性能を実現することを示す。 Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs such that they are either correctly classified by the purified model and/or differently classified by the two models, such that the backdoor effect in the backdoored model will be mitigated in the purified model. 
Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.	翻訳日:2023-07-21 15:02:19 公開日:2023-07-20
# 変分後量子ニューラルネットワーク Post-variational quantum neural networks ( http://arxiv.org/abs/2307.10560v1 ) ライセンス: Link先を確認	Po-Wei Huang, Patrick Rebentrost	(参考訳) 量子コンピューティングは、現在の最先端の古典的スーパーコンピュータよりも大きな計算上の利点を提供する可能性がある。しかし、現在のハードウェアはフォールトトレラント量子アルゴリズムを実行するには不十分である。変分アルゴリズムを用いたハイブリッド量子古典計算の代替として、バレンプラトー問題があり、勾配に基づく最適化手法の収束が遅い。本稿では,量子モデル最適化において,可変パラメータを量子コンピュータから古典コンピュータにシフトし,アンサンブル戦略を選択する「変分後戦略」について述べる。個々の量子回路を構築するための様々な戦略と設計原則について論じ、その結果のアンサンブルを凸プログラミングで最適化することができる。さらに,変分後量子ニューラルネットワークのアーキテクチャ設計について検討し,そのようなニューラルネットワークにおける推定誤差の伝播解析を行う。最後に,手書き桁のイメージ分類などの実世界の応用に適用し,96%の精度で分類できることを示す。 Quantum computing has the potential to provide substantial computational advantages over current state-of-the-art classical supercomputers. However, current hardware is not advanced enough to execute fault-tolerant quantum algorithms. An alternative of using hybrid quantum-classical computing with variational algorithms can exhibit barren plateau issues, causing slow convergence of gradient-based optimization techniques. In this paper, we discuss "post-variational strategies", which shift tunable parameters from the quantum computer to the classical computer, opting for ensemble strategies when optimizing quantum models. We discuss various strategies and design principles for constructing individual quantum circuits, where the resulting ensembles can be optimized with convex programming. Further, we discuss architectural designs of post-variational quantum neural networks and analyze the propagation of estimation errors throughout such neural networks. Lastly, we show that our algorithm can be applied to real-world applications such as image classification on handwritten digits, producing a 96% classification accuracy.	翻訳日:2023-07-21 15:01:55 公開日:2023-07-20
# 共形動的グラフ学習を用いたエアトラヒックコントローラの負荷レベル予測 Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning ( http://arxiv.org/abs/2307.10559v1 ) ライセンス: Link先を確認	Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu	(参考訳) 航空管制 (atc) は、地上交通管制局 (atcos) が日々の航空運用を維持するために常に注意を払わなければならない安全クリティカルサービスシステムである。 ATCoの作業負荷は、運用上の安全性と空域利用に悪影響を及ぼす可能性がある。 ATCosの過負荷を回避し、許容されるワークロードレベルを確保するためには、ATCosのワークロードを正確に予測することが重要である。本稿では,まず,航空交通の観点からatcoの作業負荷に関する研究を概観した。そこで,本研究では,航空交通データとワークロードラベルが得られたATCoによるHuman-in-the-loop(HITL)シミュレーションのセットアップについて紹介する。シミュレーションは3つのphoenixアプローチのシナリオで行われ、ヒトのatcoは負荷評価(低-1から高7)を自己評価するよう要求される。予備データ分析を行う。次に,共形予測を用いたグラフベースのディープラーニングフレームワークを提案し,atcoのワークロードレベルを同定する。制御器の制御下にある航空機の数は空間的にも時間的にも変化し、動的に進化するグラフとなる。実験結果は (a)トラフィック密度機能以外に、トラフィック競合機能は、ワークロードの予測能力(すなわち、最小水平/垂直分離距離)に寄与する。 b) グラフニューラルネットワークを用いた空域の時空間グラフレイアウトから直接学習することにより,手作りの交通複雑性特性と比較して,高い予測精度が得られる。 c) 適合予測(conformal prediction)は,モデル予測精度をさらに向上させる上で有用なツールである。使用されるコードは \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Predic tion/}{$\mathsf{Link}$} で公開されている。 Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). 
Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compared to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting in a range of predicted workload labels. The code used is available at \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/}{$\mathsf{Link}$}.	翻訳日:2023-07-21 15:01:39 公開日:2023-07-20
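To make the phrase "a range of predicted workload labels" concrete, here is a minimal sketch of split conformal prediction for a 7-level rating. It is a generic textbook construction and an assumption on my part, not the authors' graph-based framework: the calibration probabilities, the `1 - p` nonconformity score, and the function names are illustrative only.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the threshold score
    if k > n:
        return float("inf")               # too few calibration points: keep every label
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Workload ratings (1-based) whose score 1 - p stays within the threshold."""
    return [i + 1 for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration data: probability the model gave to the true rating.
cal_probs_true = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.95]
cal_scores = [1.0 - p for p in cal_probs_true]
qhat = conformal_quantile(cal_scores, alpha=0.25)

# Hypothetical softmax output over ratings 1..7 for a new traffic snapshot.
ratings = prediction_set([0.0, 0.02, 0.08, 0.5, 0.36, 0.03, 0.01], qhat)
```

With these made-up numbers the prediction set contains ratings 4 and 5: the output is a set of plausible labels rather than a single point prediction.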
# 動詞操作による命令追従評価 Instruction-following Evaluation through Verbalizer Manipulation ( http://arxiv.org/abs/2307.10558v1 ) ライセンス: Link先を確認	Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin	(参考訳) 命令調整型モデルは様々な自然言語処理タスクで顕著に成功したが、命令に従う能力の正確な評価は依然として難しい。既存のベンチマークは主に、トレーニング中にモデルが学んだこととよく一致する一般的な命令に焦点を当てています。しかし、これらの指示に応答する能力は、必ずしも命令追従の強い能力を意味するとは限らない。本稿では,動詞操作と呼ばれる新しい指示追従評価プロトコルを提案する。タスクラベルを、モデル先行と異なる程度に整合した単語で動詞化し、高い整合性(例えば、肯定的な感情に ``postive'' を出力する)から最小整合性(例えば、肯定的な感情に `` negative'' を出力する)の言語化を指示する。バーバリザの操作は、任意の分類ベンチマークとシームレスに統合して、モデルの事前依存性と、それらをオーバーライドして正確に指示に従う能力を調べることができる。我々は、9つのデータセットにまたがる4つの主要なモデルファミリーを包括的に評価し、それぞれに12組の発声器を用いる。我々は,異なる家族や規模にわたるモデルの指示追従能力が,より自然な言語化能力の低下によって著しく異なることを観察した。最強のGPT-4モデルでさえ、最も難易度の高い動詞をランダムに推測するよりも優れた性能を発揮するのに苦労している。 While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting ``postive'' for positive sentiment), to minimally aligned (e.g., outputting ``negative'' for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. 
We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities.	翻訳日:2023-07-21 15:01:03 公開日:2023-07-20
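The evaluation protocol described above can be illustrated with a toy scorer. This is a sketch under stated assumptions rather than the benchmark's code: the two verbalizer mappings, the `accuracy` helper, and the simulated model outputs are hypothetical, and a real run would substitute actual model generations.

```python
# Verbalizers map a gold task label to the word the instruction asks the model to emit.
NATURAL = {"positive": "positive", "negative": "negative"}  # aligned with model priors
FLIPPED = {"positive": "negative", "negative": "positive"}  # minimally aligned


def accuracy(gold_labels, model_outputs, verbalizer):
    """Fraction of outputs that match the word requested by the instruction."""
    expected = [verbalizer[label] for label in gold_labels]
    hits = sum(out.strip().lower() == exp for out, exp in zip(model_outputs, expected))
    return hits / len(gold_labels)


gold = ["positive", "negative", "positive", "negative"]

# Simulated outputs of a model that ignores the instruction and follows its prior.
prior_following_outputs = ["positive", "negative", "positive", "negative"]

natural_acc = accuracy(gold, prior_following_outputs, NATURAL)
flipped_acc = accuracy(gold, prior_following_outputs, FLIPPED)
```

A model that only follows its prior scores perfectly under the natural verbalizer and zero under the flipped one, which is the gap the protocol is designed to expose.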
# EMQ: 自動混合精度量子化のためのトレーニング不要プロキシの進化 EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization ( http://arxiv.org/abs/2307.10554v1 ) ライセンス: Link先を確認	Peijie Dong and Lujun Li and Zimian Wei and Xin Niu and Zhiliang Tian and Hengyue Pan	(参考訳) Mixed-Precision Quantization~(MQ)は、モデルの競合する精度と複雑さのトレードオフを実現する。従来のトレーニングベースの検索手法では、MQ内の層ごとのビット幅設定を最適化するために時間を要する。近年、トレーニング不要なアプローチでは様々なMQプロキシが提供され、探索効率が大幅に向上している。しかし、これらのプロキシと量子化精度の相関性はよく分かっていない。このギャップに対処するために、私たちはまず、異なるビット構成と量子化結果を含むMQ-Bench-101を構築します。そこで,既存のトレーニングフリープロキシはMQ-Bench-101上で弱い相関関係を示す。優れたプロキシを効率的に探索するために,進化アルゴリズムによるMQ用プロキシフレームワークの自動検索を開発する。特に、既存のプロキシを含む精巧な検索空間を考案し、進化探索を行い、最も相関性の高いMQプロキシを発見する。我々は, 早期収束を回避し, 検索効率を向上させるために, 多様性向上戦略と互換性スクリーニングプロトコルを提案する。このようにして、Evolving proxies for Mixed-precision Quantization~(EMQ)フレームワークは、重いチューニングや専門知識のないプロキシの自動生成を可能にします。様々なResNetおよびMobileNetファミリによるImageNetの大規模な実験により、当社のEMQは最先端の混合精度メソッドよりも大幅にコストを削減して優れたパフォーマンスが得られることを示した。コードはリリースされます。 Mixed-Precision Quantization~(MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improve search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies perform weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic search of proxies framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. 
We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance to state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.	翻訳日:2023-07-21 15:00:20 公開日:2023-07-20
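The abstract above judges a training-free proxy by how well it correlates with quantization accuracy. One common way to measure such agreement is a rank correlation; the sketch below computes Kendall's tau-a in plain Python. The choice of Kendall's tau and the sample numbers are assumptions for illustration, since the abstract does not state which correlation measure is used.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1   # the pair is ordered the same way by both lists
            elif s < 0:
                discordant += 1   # the pair is ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up proxy scores and accuracies for four bit-width configurations.
proxy_scores = [0.2, 0.5, 0.4, 0.9]
quant_accuracy = [70.1, 71.8, 72.0, 73.5]
tau = kendall_tau(proxy_scores, quant_accuracy)
```

A value near 1 means the proxy ranks configurations almost as the measured accuracy does; a value near 0 means the proxy carries little ranking information.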
# PPN:複合レイアウトを用いた鍵情報抽出のための並列ポインタベースネットワーク PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts ( http://arxiv.org/abs/2307.10551v1 ) ライセンス: Link先を確認	Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang	(参考訳) キー情報抽出(KIE)は、視覚的にリッチなドキュメントから構造化された値の意味的エンティティを抽出することを目的とした、挑戦的なマルチモーダルタスクである。重要な進展はありますが、対処すべき大きな課題は2つあります。まず、既存のデータセットのレイアウトが比較的固定され、セマンティックエンティティのカテゴリの数に制限されるため、これらのデータセットと複雑な実世界のシナリオの間に大きなギャップが生じる。第二に、既存の手法は2段階のパイプライン戦略に従い、エラー伝播問題を引き起こす可能性がある。さらに、見当たらない意味的エンティティカテゴリが出現する状況では、適用が難しい。キー情報抽出のための複合レイアウト形式 (clex) と呼ばれる, 意味的エンティティカテゴリ1,162の5,860画像からなる, 新たな大規模ヒューマンアノテートデータセットを提案する。第2の課題を解決するために,ゼロショットおよび少数ショットシナリオに適用可能なエンドツーエンドモデルであるParallel Pointer-based Network (PPN)を導入する。 PPNはセマンティックエンティティ間の暗黙の手がかりを利用して抽出を支援し、その並列抽出機構により複数の結果を同時に効率的に抽出することができる。 CLEXデータセットの実験では、PPNは既存の最先端メソッドよりも優れており、推論速度もはるかに高速である。 Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. 
PPN leverages the implicit clues between semantic entities to assist extraction, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.	翻訳日:2023-07-21 14:59:44 公開日:2023-07-20
# SC VALL-E:音声合成のためのスタイル制御可能なゼロショットテキスト SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer ( http://arxiv.org/abs/2307.10550v1 ) ライセンス: Link先を確認	Daegyeom Kim, Seongho Hong, and Yong-Hoon Choi	(参考訳) 音声のさまざまな特性を制御し、所望の声を生成するために、さまざまな話者、さまざまな感情、異なる話し方を備えたコーパスをデータセットに追加し、表現型音声合成モデルを訓練する。本稿では,ニューラルコーデック言語モデル(VALL-E)に基づくスタイル制御(SC)VALL-Eモデルを提案する。提案したSC VALL-Eは、テキストから入力を受け、音声をプロンプトし、プロンプト音声の特徴を単に模倣するのではなく、属性を制御して多様な音声を生成することによって制御可能な音声を生成するように設計されている。感情,発話率,ピッチ,音声強度などの属性を表現する新たに設計されたスタイルネットワークのスタイル埋め込みマトリックス内のトークンを識別し,これらの属性を制御可能なモデルを設計する。 SC VALL-Eの性能を評価するために,グローバルスタイルトークン(GST)Tacotron2,可変オートエンコーダ(VAE)Tacotron2,オリジナルVALL-Eの3つの代表的な表現型音声合成モデルを用いて比較実験を行った。単語誤り率(wer)、f0音声誤り(fve)、f0グロスピッチ誤差(f0gpe)を評価指標として測定し、生成文の精度を評価する。合成音声の品質を比較するために,比較平均オプションスコア(cmos)と類似度平均オプションスコア(smos)を測定した。生成した音声のスタイル制御能力を評価するために,F0 と mel-spectrogram の変化を学習トークンの修正によって観察する。トレーニングデータに存在しないプロンプトオーディオを使用する場合、SC VALL-Eは様々な表現音を生成し、既存のモデルと比較して競合性能を示す。実装、事前トレーニングされたモデル、オーディオサンプルはgithubにあります。 Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. 
To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.	翻訳日:2023-07-21 14:59:07 公開日:2023-07-20
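Of the metrics listed above, word error rate (WER) is the simplest to state exactly: the word-level edit distance between a reference transcript and the recognized text, divided by the reference length. The sketch below uses the standard definition; whitespace tokenization and a non-empty reference are assumptions, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                          # delete all i words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                          # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match or substitution
    return dp[-1][-1] / len(ref)              # assumes a non-empty reference


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

In this example one word out of six is dropped, so the result is 1/6. In a TTS evaluation the hypothesis would typically be a speech recognizer's transcript of the synthesized audio.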
# ブロックチェーン上の動的大規模言語モデル Dynamic Large Language Models on Blockchains ( http://arxiv.org/abs/2307.10549v1 ) ライセンス: Link先を確認	Yuanhao Gong	(参考訳) 言語モデルには数十億のパラメータが含まれており、テキストには数千のトークンがあるため、大規模な言語モデルの訓練とデプロイには大量の計算資源が必要である。もう一つの問題は、大きな言語モデルが静的であることだ。トレーニングプロセス後に固定される。本稿では,これらの問題に対処するために,計算性能が高く,コンピュータネットワークに分散したブロックチェーン上での動的大規模言語モデルのトレーニングと展開を提案する。ブロックチェーンはセキュアで分散化された透明なシステムであり、仲介者を必要としないトランザクションのための改ざん防止台帳の作成を可能にする。動的大規模言語モデルは、トレーニングプロセス後にユーザの入力から継続的に学習することができる。我々の手法は,大規模言語モデルを開発するための新しい方法を提供し,次世代人工知能システムに光を当てる。 Training and deploying large language models requires a large amount of computational resources because the language models contain billions of parameters and the text has thousands of tokens. Another problem is that the large language models are static. They are fixed after the training process. To tackle these issues, in this paper, we propose to train and deploy the dynamic large language model on blockchains, which have high computation performance and are distributed across a network of computers. A blockchain is a secure, decentralized, and transparent system that allows for the creation of a tamper-proof ledger for transactions without the need for intermediaries. The dynamic large language models can continuously learn from the user input after the training process. Our method provides a new way to develop the large language models and also sheds light on the next generation of artificial intelligence systems.	翻訳日:2023-07-21 14:58:32 公開日:2023-07-20
# TREA:会話レコメンデーションのための木構造推論スキーマ TREA: Tree-Structure Reasoning Schema for Conversational Recommendation ( http://arxiv.org/abs/2307.10543v1 ) ライセンス: Link先を確認	Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen	(参考訳) 対話レコメンデーションシステム(CRS)は,対話を通じてユーザの動的興味をタイムリーに追跡し,項目レコメンデーションに対する関連応答を生成することを目的としている。近年,会話コンテキストの理解を深めるため,様々な外部知識基盤(特に知識グラフ)がCRSに組み込まれている。しかし、近年の推論モデルでは、因果関係推論のための線形構造や固定階層構造などの簡素な構造に大きく依存しており、外部知識を持つ発話間の洗練された関係を完全には理解できない。そこで本研究では,TREA という新しいツリー構造 schEmA を提案する。 TREAは、言及されたエンティティ間の因果関係を明らかにするための推論構造として多階層的スケーラブルツリーを構築し、過去の会話を十分に活用し、推奨された結果に対してより合理的で適切な応答を生成する。 2つの公開CRSデータセットに対する大規模な実験は、我々のアプローチの有効性を実証した。 Conversational recommender systems (CRS) aim to timely trace the dynamic interests of users through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) are incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.	翻訳日:2023-07-21 14:58:19 公開日:2023-07-20
# 非二項安定化符号からのナラインCFT Narain CFTs from nonbinary stabilizer codes ( http://arxiv.org/abs/2307.10581v1 ) ライセンス: Link先を確認	Yasin Ferdous Alam, Kohki Kawabata, Tatsuma Nishioka, Takuya Okuda and Shinichiro Yahagi	(参考訳) 我々は、カライン共形体論(CFT)を、クーディット安定化符号から、素電力オーダーの有限体上の量子安定化符号($p$素数と$m\geq 1$)、または$k>1$の環上の量子安定化符号($k>1$)の構成へと一般化する。我々の構成は有理 CFT であり、これは以前の CFT よりも、ナライン CFT のモジュライ空間のより大きな点集合をカバーする。また、非ゼロ論理量子ビットの量子安定化符号と有限集合のナライン CFT との対応も提案する。本稿では,よく知られた安定化符号との対応について述べる。 We generalize the construction of Narain conformal field theories (CFTs) from qudit stabilizer codes to the construction from quantum stabilizer codes over the finite field of prime power order ($\mathbb{F}_{p^m}$ with $p$ prime and $m\geq 1$) or over the ring $\mathbb{Z}_k$ with $k>1$. Our construction results in rational CFTs, which cover a larger set of points in the moduli space of Narain CFTs than the previous one. We also propose a correspondence between a quantum stabilizer code with non-zero logical qubits and a finite set of Narain CFTs. We illustrate the correspondence with well-known stabilizer codes.	翻訳日:2023-07-21 14:50:33 公開日:2023-07-20
# 中国沖海霧予測のためのインテリジェントモデル Intelligent model for offshore China sea fog forecasting ( http://arxiv.org/abs/2307.10580v1 ) ライセンス: Link先を確認	Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang	(参考訳) 海洋経済活動と沿岸経済活動の効果的管理には,海霧の正確な時間的予測が重要である。海霧の複雑な性質と固有の変動を考えると、従来の数値および統計的予測法は不適切であることがしばしば証明される。本研究の目的は,yre(yangtze river estuary)沿岸地域を事例として,数値気象予測モデルに組み込んだ高度海霧予測手法の開発である。機械学習モデルをトレーニングする前に,タイムラグ相関分析手法を用いて主要な予測要因を同定し,海霧の発生を誘発するメカニズムを解明した。さらに,不均衡データ問題に対処するためにアンサンブル学習と焦点損失関数を実装し,モデルの予測能力を高める。本手法の精度を検証するため,気象観測と過去の予測の両方を含む1年にわたる包括的データセットを用いて,その性能を評価する。驚くべきことに、機械学習に基づくアプローチは、気象研究と非静水型メソスケールモデル(wrf-nmm)と、アメリカ海洋大気庁(noaa)予測システム研究所(fsl)が開発したアルゴリズムの2つの従来の手法の予測性能を上回っている。具体的には,60時間のリードタイムで1km以下の可視性を有する海霧の予測において,検出確率(pod)を増加させ,同時に誤警報率(far)を低下させることにより,優れた結果を得る。 Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods are often proven inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. 
Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, in regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).	翻訳日:2023-07-21 14:50:14 公開日:2023-07-20
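The probability of detection (POD) and false alarm ratio (FAR) cited in the sea fog abstract above are standard contingency-table scores for yes/no forecasts. The sketch below shows the usual definitions; the function name `pod_far` and the example counts are hypothetical and do not come from the paper.

```python
def pod_far(hits: int, misses: int, false_alarms: int) -> tuple:
    """Return (POD, FAR) from the counts of a yes/no forecast contingency table.

    POD = hits / (hits + misses): fraction of observed events that were forecast.
    FAR = false_alarms / (hits + false_alarms): fraction of "yes" forecasts
    that were not followed by an event.
    """
    if hits + misses == 0:
        raise ValueError("POD is undefined when no events were observed")
    if hits + false_alarms == 0:
        raise ValueError("FAR is undefined when no events were forecast")
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return pod, far


# Hypothetical counts: 30 fog events forecast and observed, 10 observed but
# missed, 20 forecast but not observed.
pod, far = pod_far(hits=30, misses=10, false_alarms=20)
```

With these made-up counts the POD is 0.75 and the FAR is 0.4. A method that improves on a baseline in the sense described above raises the first number while lowering the second.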
# 多目的フェデレーション学習によるSecureBoostハイパーパラメータチューニング SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning ( http://arxiv.org/abs/2307.10579v1 ) ライセンス: Link先を確認	Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Tao Fan, Yongxin Tong and Qiang Yang	(参考訳) SecureBoostは、準同型暗号化を活用して、垂直連邦学習環境でデータのプライバシを保護するツリーブースティングアルゴリズムである。金融や医療などの分野では、解釈可能性、有効性、プライバシー保護能力によって広く利用されている。しかしSecureBoostは、高い計算複雑性とラベルリークのリスクに悩まされている。 SecureBoostの潜在能力を最大限活用するためには、SecureBoostのハイパーパラメータを慎重に選択して、ユーティリティ、効率、プライバシの最適なバランスをとる必要がある。既存の手法では経験的あるいはヒューリスティックにハイパーパラメータを設定するが、それらは最適とはほど遠い。このギャップを埋めるために、制約付きマルチオブジェクトセキュアBoost(CMOSB)アルゴリズムを提案し、各ソリューションがユーティリティ損失、トレーニングコスト、プライバシリークの間の最適なトレードオフを達成するためのハイパーパラメータのセットである、Pareto最適解を見つける。 3つの目的の測定を設計する。特に,提案したインスタンスクラスタリング攻撃を用いて,プライバシリークを測定する。実験により、CMOSBはベースラインよりも優れたハイパーパラメータを得るだけでなく、FL参加者のフレキシブルな要求を満たすための最適なハイパーパラメータセットも得られることが示された。 SecureBoost is a tree-boosting algorithm leveraging homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and a risk of label leakage. To harness the full potential of SecureBoost, hyperparameters of SecureBoost should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods set hyperparameters either empirically or heuristically, which is far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff among utility loss, training cost, and privacy leakage. We design measurements of the three objectives. In particular, the privacy leakage is measured using our proposed instance clustering attack. 
Experimental results demonstrate that the CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.	翻訳日:2023-07-21 14:49:50 公開日:2023-07-20
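The notion of Pareto optimality used in the SecureBoost abstract above can be illustrated with a small dominance filter. This is a generic sketch of Pareto filtering over three minimized objectives (utility loss, training cost, privacy leakage), not the CMOSB algorithm itself; the names `dominates` and `pareto_front` and all numeric values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (utility_loss, training_cost, privacy_leakage) triples, one per
# candidate hyperparameter setting.
candidates = [
    (0.10, 5.0, 0.30),
    (0.12, 3.0, 0.20),
    (0.15, 6.0, 0.35),  # worse than the first candidate on all three objectives
    (0.08, 9.0, 0.10),
]
front = pareto_front(candidates)
```

Here `front` keeps the first, second, and fourth candidates. Each surviving point represents a different tradeoff, which is why a participant with a tight privacy budget and one with a tight compute budget may pick different settings from the same front.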
# ethosight:文脈ラベル親和性メトリクスと推論に基づく反復学習を用いたニュアンス知覚のための共同埋め込みシステム Ethosight: A Joint-Embedding Based System for Nuanced Perception Using Contextual Label Affinity Metric and Reasoning Based Iterative Learning ( http://arxiv.org/abs/2307.10577v1 ) ライセンス: Link先を確認	Hugo Latapie, Kristinn R. Thorisson, Shan Yu, Vahagn Petrosyan, Patrick Hammer, Pei Wang, Brandon Kynoch, Hanning Chen, Tangrui Li	(参考訳) 従来のコンピュータビジョンモデルは、データ取得と検証、特に微妙な行動のニュアンスやイベントを検出するために、広範囲な手作業を必要とする。日常的な買い物と潜在的な万引きを区別するといった、現実世界のアプリケーションにおける潜在的なリスクとルーチンの振る舞いを区別することの難しさは、さらにプロセスを複雑にする。本稿では,新しいゼロショットコンピュータビジョンアルゴリズムであるethosightを提案する。 ethosightは、ユーザの要求と関心のセマンティックな知識に基づいたクリーンなスレートから始まり、既存のシンボル知識の必要性を根絶する。局所ラベル親和性計算と推論誘導反復学習ループを用いて、Ethosightはシーンの詳細を推測し、ラベルセットを反復的に洗練する。推論メカニズムは、GPT4のような大きな言語モデル、OpenNARSのようなシンボリック推論、ハイブリッドシステムから派生することができる。 Ethosightは、事前訓練されたマルチモーダルモデルであるImageBindの機能をさらに活用し、数サイクルで画像の正確なセマンティック知識を生成する。明示的要素とニュアンス的要素の両方を効率的にキャプチャする。また、Korzybskiの"タイムバインディング"の概念をマシンで実装し、世代別学習とデプロイメント間の知識共有を可能にします。以上の結果から,ethosightは40の複雑なユースケースにまたがる有効性を示す。それは、新しい関心領域を識別する特別な能力を示し、1000のセットから上位5レーベルで常に高い親和性スコアを生成している。さまざまな環境で実施されたテストは、ethosightの堅牢なパフォーマンスを証明している。本論文の本体内における詳細な結果とケーススタディと付録は,微妙でニュアンスな動作の検出と抽出において,コンピュータビジョンモデルの適応性とレジリエンスを高めるための有望な軌道を示すものである。 Traditional computer vision models often require extensive manual effort for data acquisition and validation, particularly when detecting subtle behavioral nuances or events. The difficulty in distinguishing routine behaviors from potential risks in real-world applications, like differentiating routine shopping from potential shoplifting, further complicates the process. We present Ethosight, a novel zero-shot computer vision algorithm. Ethosight eradicates the need for pre-existing symbolic knowledge, initiating from a clean slate based on user requirements and semantic knowledge of interest. Using localized label affinity calculations and a reasoning-guided iterative learning loop, Ethosight infers scene details and iteratively refines the label set. 
Reasoning mechanisms can be derived from large language models like GPT4, symbolic reasoners like OpenNARS, or hybrid systems. Ethosight further capitalizes on the capabilities of a pre-trained multi-modal model, ImageBind, generating accurate semantic knowledge of images within a few cycles. It successfully captures both explicit and nuanced elements efficiently. We also introduce the implementation of Korzybski's "time-binding" concept in machines, which allows for generational learning and knowledge sharing across deployments. Our evaluations demonstrate Ethosight's efficacy across 40 complex use cases. It has exhibited an exceptional ability to discern new areas of interest, consistently generating high-affinity scores within the top five labels from a set of a thousand. Tests conducted across diverse environments attest to Ethosight's robust performance. Detailed results and case studies within the main body of this paper and an appendix underscore a promising trajectory towards enhancing the adaptability and resilience of computer vision models in detecting and extracting subtle and nuanced behaviors.	翻訳日:2023-07-21 14:49:30 公開日:2023-07-20
# プロトタイプ正規化による連合学習収束の促進 Boosting Federated Learning Convergence with Prototype Regularization ( http://arxiv.org/abs/2307.10575v1 ) ライセンス: Link先を確認	Yu Qiao, Huy Q. Le, Choong Seon Hong	(参考訳) 分散機械学習技術として、フェデレートラーニング(FL)では、クライアントがローカルデータをリークすることなく、エッジサーバで共有モデルを共同でトレーニングする必要がある。しかし、クライアント間での不均一なデータ分散は、しばしばモデルの性能を低下させる。そこで本研究では,データ分布の不均一性に対処するプロトタイプベースの正規化戦略を提案する。具体的には、正規化プロセスでは、サーバが分散クライアントからローカルプロトタイプを集約してグローバルプロトタイプを生成し、それを個々のクライアントに送信して、ローカルトレーニングをガイドする。 MNISTとFashion-MNISTの実験結果から,最も人気のあるベースラインであるFedAvgと比較して平均テスト精度は3.3%,8.9%向上した。さらに,本手法は不均一な環境での収束速度が速い。 As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.	翻訳日:2023-07-21 14:48:56 公開日:2023-07-20
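The prototype exchange described in the abstract above (clients send local prototypes, the server aggregates them into a global prototype, clients use it to guide local training) can be sketched as follows. The abstract does not give the exact formulas, so several choices here are assumptions: a prototype is taken to be the per-class mean feature vector, the server uses a sample-count-weighted average, and the regularizer is a mean squared distance to the global prototype. All function names are hypothetical.

```python
def class_prototypes(features, labels):
    """Per-class mean feature vector on one client, plus per-class sample counts."""
    sums, counts = {}, {}
    for vec, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    protos = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return protos, counts


def aggregate_prototypes(client_results):
    """Server side: sample-count-weighted average of local prototypes (assumed rule)."""
    sums, totals = {}, {}
    for protos, counts in client_results:
        for y, proto in protos.items():
            n = counts[y]
            acc = sums.setdefault(y, [0.0] * len(proto))
            for k, v in enumerate(proto):
                acc[k] += n * v
            totals[y] = totals.get(y, 0) + n
    return {y: [s / totals[y] for s in sums[y]] for y in sums}


def prototype_penalty(features, labels, global_protos):
    """Mean squared distance between each feature and its class's global prototype."""
    total = 0.0
    for vec, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(vec, global_protos[y]))
    return total / len(features)


# Two hypothetical clients that both hold samples of class 0.
client_a = class_prototypes([[0.0, 0.0], [2.0, 2.0]], [0, 0])
client_b = class_prototypes([[4.0, 4.0]], [0])
global_protos = aggregate_prototypes([client_a, client_b])
penalty_b = prototype_penalty([[4.0, 4.0]], [0], global_protos)
```

In a training loop under these assumptions, each client would minimize its task loss plus a weight (a hypothetical hyperparameter) times `prototype_penalty`, which pulls local representations toward a shared target even when label distributions differ across clients.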
# オンライン深層強化学習による建設作業とキャッシュフローの最適化のための資源フローの適応制御 Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning ( http://arxiv.org/abs/2307.10574v1 ) ライセンス: Link先を確認	Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma	(参考訳) 建設作業、資源、キャッシュフローの複雑さとダイナミクスのために、それらの管理の貧弱さは、通常、時間とコストのオーバーラン、破産、さらにはプロジェクトの失敗につながる。既存の手法では不確実性のある動的環境における資源フローの最適制御を達成できなかった。そこで本稿では,建設プロジェクトの作業とキャッシュフローを最適化するために,資源フローを適応的に制御するモデルと手法を提案する。まず, 部分観測可能なマルコフ決定過程に基づく数理モデルを確立し, 建設作業, 資源, キャッシュフローの複雑な相互作用, 多様な影響因子の不確実性と変動を定式化する。一方、最適解を効率的に見つけるために、労働と物質フローの適応的最適制御を実現するために、深層強化学習(DRL)に基づく手法を導入し、作業とキャッシュフローを最適化する。 drlのトレーニングプロセスを支援するために、プロジェクトの動的特徴と外部環境を模倣するために、離散イベントシミュレーションに基づくシミュレータも開発されている。シミュレーション実験により,提案手法がバニラ経験的手法と遺伝的アルゴリズムを上回り,多様なプロジェクトや外部環境において顕著な能力を有し,drlと経験的手法のハイブリッドエージェントが最良の結果をもたらすことを示した。本稿では,共同作業,資源,キャッシュフローの適応制御と最適化に寄与し,建設プロジェクト管理におけるDRL技術導入の一歩となる可能性がある。 Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, and even project failure. Existing approaches in construction failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. 
To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm and possesses remarkable capability in diverse projects and external environments, and that a hybrid agent of DRL and the empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.	翻訳日:2023-07-21 14:48:41 公開日:2023-07-20
# 無効論理と等価なゲイン:言語モデルのプロンプトにおける推論の奇妙な性質 Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting ( http://arxiv.org/abs/2307.10573v1 ) ライセンス: Link先を確認	Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo	(参考訳) 言語モデルは、パフォーマンスを大幅に向上させる方法で問題を通じて推論するよう促すことができる。しかし、このようなプロンプトによるパフォーマンス改善は明らかではない。最近の研究では、論理的な \textit{invalid} chain-of-thought (cot) プロンプトを用いることで、論理的な \textit{valid} cotプロンプトと同じくらいのパフォーマンスが向上し、cotの編集によって問題固有の情報を抽象情報や分散情報に置き換えることが通常性能に影響を与えないことが示された。批評家は、これらの発見は意味のある結論を導き出すにはあまりにも少ない、そして簡単な作業に基づいていると答えている。この問題を解決するために、論理的に無効なCoTプロンプトが、BIG-Bench Hard(BBH)と呼ばれるBIG-Benchベンチマークの最も難しいタスクにおいて、論理的に有効なプロンプトと同じレベルのパフォーマンスゲインを提供するかどうかをテストする。論理的に textit{invalid} 推論プロンプトは、BBH タスクにおいて論理的に有効な推論プロンプトとして、確かに同様のパフォーマンスゲインを達成する。また、前作で使われたcotプロンプトには論理的なエラーが含まれていることもわかりました。これは、論理的に妥当な推論を超えた共変項がパフォーマンス改善の責任を負うことを示唆している。 Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easy tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. 
This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.	翻訳日:2023-07-21 14:48:15 公開日:2023-07-20
# 多粒子系における熱カシミール相互作用:散乱チャネルアプローチ Thermal Casimir interactions in multi-particle systems: scattering channel approach ( http://arxiv.org/abs/2307.10570v1 ) ライセンス: Link先を確認	Yang Li, Kimball A. Milton, Iver Brevik	(参考訳) 多粒子熱カシミール相互作用は、主にカシミールエントロピーの観点から、多重散乱過程に基づく視点から研究されている。散乱経路の幾何学を詳細に記述し, 横流路, 縦流路, 混合流路など, 異なる種類の流路からの寄与を示す。経路の幾何学は経路内の各チャネルの重みに大きな影響を与える。ネガティリティと非単調性は、多粒子カシミールエントロピーにおいて一般的に見られ、その源は、経路の幾何、偏光混合の種類、各粒子の分極性など多様である。多粒子散乱による熱的寄与は系において重要であるが、ゼロ温度の多粒子散乱効果は重要ではない。多粒子配置から連続体への挙動の制限を簡潔に検討する。 Multi-particle thermal Casimir interactions are investigated, mostly in terms of the Casimir entropy, from the point of view based on multiple-scattering processes. The geometry of the scattering path is depicted in detail, and the contributions from different types of channels, namely the transverse, longitudinal and mixing channels, are demonstrated. The geometry of the path can strongly influence the weight of each channel in the path. Negativity and nonmonotonicity are commonly seen in the multi-particle Casimir entropy, the sources of which are diverse, including the geometry of the path, the types of polarization mixing, the polarizability of each particle, etc. Thermal contributions from multi-particle scatterings can be significant in the system, while the zero-temperature multi-particle scattering effects are insignificant. Limiting behaviors from a multi-particle configuration to a continuum are briefly explored.	翻訳日:2023-07-21 14:47:47 公開日:2023-07-20
# 知覚的アライメントモニタリング Deceptive Alignment Monitoring ( http://arxiv.org/abs/2307.10569v1 ) ライセンス: Link先を確認	Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo	(参考訳) 大規模な機械学習モデルの能力が拡大し続け、そのようなモデルに与えられる自律性が拡大するにつれて、新しい敵の織機(モデルそのもの)が見えてくる。モデルが一見合理的に振る舞うという脅威は、内密かつ微妙にその振る舞いを操作上の理由から修正する一方で、AIセーフティ&アライメントのコミュニティにおいて、詐欺的アライメントと呼ばれることが多い。したがって、この新たな方向を認知アライメントモニタリングと呼ぶ。そこで本研究では,近未来にますます重要となり,相互に絡み合うであろう,多様な機械学習サブフィールドにおける新たな方向性を特定し,これらの分野における進歩は,長期的な課題と新たな研究機会の両方をもたらすと論じる。我々は、これらの新興方向への敵対的機械学習コミュニティのさらなる関与を提唱することで、結論付ける。 As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.	翻訳日:2023-07-21 14:47:31 公開日:2023-07-20
# No-frills Temporal Video Grounding:マルチスケール隣りの注意とズームイン境界検出 No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection ( http://arxiv.org/abs/2307.10567v1 ) ライセンス: Link先を確認	Qi Zhang and Sipeng Zheng and Qin Jin	(参考訳) 時間的ビデオグラウンドティング(TVG)は、未編集のビデオから言語クエリの時間間隔を取得することを目的としている。テレビGにおける重要な課題は、低SNR(Semantic Noise Ratio)による低SNRの性能低下である。先行研究はこの課題に洗練された技術を用いて対処した。本稿では,マルチスケールアテンディングアテンションとズームイン境界検出という2つのコアモジュールからなる非フリルtvgモデルを提案する。マルチスケール隣人の注意は、各ビデオトークンが隣人からの視覚的コンテキストのみを集約することを制限し、高比雑音から多スケール特徴階層による最も識別性の高い情報の抽出を可能にする。ズームイン境界検出は、きめ細かい接地調整のための選択された上位候補の局所的判別に焦点を当てる。エンド・ツー・エンドのトレーニング戦略により、我々のモデルは異なるTVGベンチマーク上での競合性能を達成すると同時に、より高速な推論速度と軽量なモデルパラメータの利点も享受できる。 Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbor, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also having the advantage of faster inference speed and lighter model parameters, thanks to its lightweight architecture.	翻訳日:2023-07-21 14:47:16 公開日:2023-07-20
# SCA-PVNet: 3Dオブジェクト検索のためのポイントクラウドとマルチビューの自己組織化に基づくアグリゲーション SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval ( http://arxiv.org/abs/2307.10601v1 ) ライセンス: Link先を確認	Dongyun Lin, Yi Cheng, Aiyuan Guo, Shangbo Mao, Yiqun Li	(参考訳) 3dオブジェクトの検索に対処するため、ボクセル、ポイントクラウド、マルチビュー画像など、単一のモダリティで表現された3dオブジェクトの高度に識別可能な記述子を生成するための努力がなされている。 3dオブジェクトのマルチモダリティ表現からの補完情報を活用し、検索性能をさらに向上させることを約束する。しかし,大規模データセットを用いた多モード3Dオブジェクト検索はめったに行われない。本稿では,3次元オブジェクト検索のための点雲と多視点画像(SCA-PVNet)の自己組織化に基づくアグリゲーションを提案する。点群と多視点画像から深い特徴を抽出し,機能融合を効果的に行うために,インモダリティアグリゲーションモジュール (imam) とクロスモダリティアグリゲーションモジュール (cmam) という2種類の機能アグリゲーションモジュールを設計した。 IMAMはセルフアテンションメカニズムを利用してマルチビュー機能を集約し、CMAMはクロスアテンションメカニズムを利用してポイントクラウド機能をマルチビュー機能と相互作用する。オブジェクト検索のための3Dオブジェクトの最終記述子は、両方のモジュールから集約された特徴を連結することで得られる。提案手法よりもSCA-PVNetの方が優れていることを示すため,小規模から大規模までの3つのデータセットを用いて実験と解析を行った。 To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrieval is rarely developed and analyzed on large-scale datasets. In this paper, we propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features while CMAM exploits a cross-attention mechanism to interact point cloud features with multi-view features. 
The final descriptor of a 3D object for object retrieval can be obtained via concatenating the aggregated features from both modules. Extensive experiments and analysis are conducted on three datasets, ranging from small to large scale, to show the superiority of the proposed SCA-PVNet over the state-of-the-art methods.	翻訳日:2023-07-21 14:41:33 公開日:2023-07-20
# AIの課題と解決策 Challenges and Solutions in AI for All ( http://arxiv.org/abs/2307.10600v1 ) ライセンス: Link先を確認	Rifat Ara Shams, Didar Zowghi, Muneera Bano	(参考訳) ai(artificial intelligence)の広汎な存在と多様性は、公正、信頼、透明性のための設計において多様性と排他性(d&i)の原則を必要とする。しかし、これらの考察はしばしば見過ごされ、バイアス、差別、信頼できないという問題に繋がる。そこで我々は,aiにおけるd&iに関する課題と解決策を体系的に検討した。当社の厳密な検索の結果、2017年から2022年の間に48の論文が公開された。これらの論文のオープンコーディングでは、55の独特な課題と33のソリューション、24の独特な課題、23のソリューションがAIを使用してそのようなプラクティスを強化する。この研究は、これらの問題をより深く理解することで、これらの原則を将来のAIシステムに統合しようとする研究者や実践者に啓蒙する。 Artificial Intelligence (AI)'s pervasive presence and variety necessitate diversity and inclusivity (D&I) principles in its design for fairness, trust, and transparency. Yet, these considerations are often overlooked, leading to issues of bias, discrimination, and perceived untrustworthiness. In response, we conducted a Systematic Review to unearth challenges and solutions relating to D&I in AI. Our rigorous search yielded 48 research articles published between 2017 and 2022. Open coding of these papers revealed 55 unique challenges and 33 solutions for D&I in AI, as well as 24 unique challenges and 23 solutions for enhancing such practices using AI. This study, by offering a deeper understanding of these issues, will enlighten researchers and practitioners seeking to integrate these principles into future AI systems.	翻訳日:2023-07-21 14:41:07 公開日:2023-07-20
# アンサンブル学習に基づくベイジアンハイパーパラメータ感度分析によるIoTサイバーセキュリティの異常検出 Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis ( http://arxiv.org/abs/2307.10596v1 ) ライセンス: Link先を確認	Tin Lai, Farnaz Farid, Abubakar Bello, Fariza Sabrina	(参考訳) IoT(Internet of Things)は、世界中の何十億ものインテリジェントデバイスを統合し、人間の介入なしに他の接続デバイスと通信する能力を持つ。 IoTはデータアグリゲーションと分析を大規模に実現し、多くのドメインのライフクオリティを改善する。特にiotが収集するデータには、異常検出のための膨大な情報が含まれている。 IoTの異質な性質は、サイバーセキュリティの課題と機会の両方である。サイバーセキュリティ監視における従来のアプローチでは、さまざまなデータ型に対するさまざまなデータの前処理と処理が必要になることが少なくない。しかし、ヘテロジニアスタイプのネットワークデバイスは、単一のタイプのデバイス読み出しよりも、より多様な信号セットをキャプチャすることが多く、特に異常検出に有用である。本稿では,異常検出によるIoTサイバーセキュリティ向上のためのアンサンブル機械学習手法に関する総合的研究を行う。 1つの機械学習モデルを使用するのではなく、アンサンブル学習は複数のモデルからの予測力を組み合わせ、単一の機械学習モデルを使用するのではなく、異種データセットでの予測精度を高める。複数のIoTセンサを内蔵したネットワーク環境に適応するために,ベイジアンハイパーパラメータ最適化を利用したアンサンブル学習フレームワークを提案する。実験では,従来の手法と比較して高い予測力を示す。 The Internet of Things (IoT) integrates billions of intelligent devices across the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device readings, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. 
Rather than relying on a single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing predictive accuracy on heterogeneous datasets. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate the framework's high predictive power when compared to traditional methods.	翻訳日:2023-07-21 14:40:53 公開日:2023-07-20
# 最適マルチエージェントベイズ分散推定のための構造の利用 Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation ( http://arxiv.org/abs/2307.10594v1 ) ライセンス: Link先を確認	Christopher Funk, Ofer Dagan, Benjamin Noack and Nisar R. Ahmed	(参考訳) ベイズ分散データ融合における重要な課題は、以前送信されたデータが送信元に循環する‘噂の伝播’あるいは‘二重カウント’現象である。これはしばしば、境界を計算するために見積もりの重み付け平均を取る共分散交叉(英語版)(ci)のような近似的な方法によって対処される。問題は、この境界がタイトではないこと、すなわち、見積もりがしばしば保存的すぎることである。本稿では,マルチエージェント分散核融合問題における確率的独立構造を生かして,より密接な境界を求めることができることを示す。 i) 元のCIの1つの(モノリシックな)因子ではなく複数の(非モノリシックな)重み付け因子を使用するCIアルゴリズムの拡張。 (ii)最適境界を計算し、任意の依存関係構造を完全に活用できる一般最適化スキーム。我々は,本手法を比較し,簡単な問題に対して同じ解に収束することを示す。次に, 大規模目標追跡シミュレーションを用いて新しい非モノリシックciアルゴリズムをテストし, 従来のモノリシックciよりも厳密なバウンドと正確な推定を実現することを示す。 A key challenge in Bayesian decentralized data fusion is the `rumor propagation' or `double counting' phenomenon, where previously sent data circulates back to its sender. It is often addressed by approximate methods like covariance intersection (CI) which takes a weighted average of the estimates to compute the bound. The problem is that this bound is not tight, i.e. the estimate is often over-conservative. In this paper, we show that by exploiting the probabilistic independence structure in multi-agent decentralized fusion problems a tighter bound can be found using (i) an expansion to the CI algorithm that uses multiple (non-monolithic) weighting factors instead of one (monolithic) factor in the original CI and (ii) a general optimization scheme that is able to compute optimal bounds and fully exploit an arbitrary dependency structure. We compare our methods and show that on a simple problem, they converge to the same solution. We then test our new non-monolithic CI algorithm on a large-scale target tracking simulation and show that it achieves a tighter bound and a more accurate estimate compared to the original monolithic CI.	翻訳日:2023-07-21 14:40:31 公開日:2023-07-20
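For context on the abstract above, the original (monolithic) covariance intersection rule that it takes as a baseline is usually written as below. This is the standard textbook form with a single scalar weight, not the paper's non-monolithic extension; the symbols $\hat{x}_a, P_a$ and $\hat{x}_b, P_b$ for the two agents' estimates and covariances are notation assumed here for illustration.

```latex
% Standard covariance intersection with one scalar weight \omega \in [0, 1]
\begin{align}
  P_{\mathrm{CI}}^{-1} &= \omega\, P_a^{-1} + (1 - \omega)\, P_b^{-1}, \\
  P_{\mathrm{CI}}^{-1}\, \hat{x}_{\mathrm{CI}} &= \omega\, P_a^{-1} \hat{x}_a + (1 - \omega)\, P_b^{-1} \hat{x}_b, \\
  \omega^{\star} &= \arg\min_{\omega \in [0, 1]} \operatorname{tr}\bigl(P_{\mathrm{CI}}\bigr)
  \quad \text{(or } \det P_{\mathrm{CI}} \text{)}.
\end{align}
```

Because a single $\omega$ scales every state component in the same way, the resulting bound stays consistent for any unknown cross-correlation but can be conservative. That conservatism is the looseness the abstract proposes to reduce by using multiple weighting factors and by exploiting known independence structure.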
# Event Blob Tracking: 非同期リアルタイムアルゴリズム Event Blob Tracking: An Asynchronous Real-Time Algorithm ( http://arxiv.org/abs/2307.10593v1 ) ライセンス: Link先を確認	Ziwei Wang, Timothy Molloy, Pieter van Goor, Robert Mahony	(参考訳) イベントベースのカメラは、高時間分解能、低レイテンシ、高ダイナミックレンジのため、動きの速い物体を追跡するために人気が高まっている。本稿では,生のイベントをリアルタイムで非同期に追跡する新しいアルゴリズムを提案する。本稿では,イベントブロブの概念を,条件空間の確率がブロブ様である事象発生の時空間的確率として導入する。多くの現実世界のオブジェクトはイベントブロブデータを生成する。例えば、車のヘッドライトや、静的あるいはゆっくりと変化する背景に対して動く小さなフォアグラウンドオブジェクトなどのLEDを点滅させる。提案アルゴリズムは、カルマンフィルタと結合してイベントブロブ状態を追跡するために、データアソシエーションの動的しきい値を持つ近傍分類器を用いる。提案手法は,照明条件や高速動作においても高精度なトラッキングとイベントブロブ形状推定を実現する。マイクロ秒の時間分解は、フィルタ出力が接触時間や距離推定などの二次情報を引き出すことができ、自動運転における衝突回避のような現実世界の問題に応用できることを意味する。 Event-based cameras have become increasingly popular for tracking fast-moving objects due to their high temporal resolution, low latency, and high dynamic range. In this paper, we propose a novel algorithm for tracking event blobs using raw events asynchronously in real time. We introduce the concept of an event blob as a spatio-temporal likelihood of event occurrence where the conditional spatial likelihood is blob-like. Many real-world objects generate event blob data, for example, flickering LEDs such as car headlights or any small foreground object moving against a static or slowly varying background. The proposed algorithm uses a nearest neighbour classifier with a dynamic threshold criteria for data association coupled with a Kalman filter to track the event blob state. Our algorithm achieves highly accurate tracking and event blob shape estimation even under challenging lighting conditions and high-speed motions. The microsecond time resolution achieved means that the filter output can be used to derive secondary information such as time-to-contact or range estimation, that will enable applications to real-world problems such as collision avoidance in autonomous driving.	翻訳日:2023-07-21 14:40:10 公開日:2023-07-20
# 自律走行システムの試験と改善のための境界状態生成 Boundary State Generation for Testing and Improvement of Autonomous Driving Systems ( http://arxiv.org/abs/2307.10590v1 ) ライセンス: Link先を確認	Matteo Biagiola, Paolo Tonella	(参考訳) 近年のディープニューラルネットワーク(DNN)とセンサ技術の進歩により、自律運転システム(ADS)の自律性はますます高まっている。しかし、信頼度の評価は依然として重要な関心事である。最先端のADSテストアプローチでは、シミュレーション運転環境の制御可能な属性をADSが誤動作するまで変更する。このようなアプローチの主な欠点は、(1) シミュレーション環境の変更は、フィールド内テスト設定(例えば、道路形状の変更)に容易に転送できないこと、(2) ADSが成功した環境インスタンスは、ADSが誤動作する可能性のある隠れ運転条件を含む可能性があるにもかかわらず、破棄されることである。本稿では,広告評価のための新しいテスト生成装置であるgenbo (generator of boundary state pairs)を提案する。 GenBoは、障害のない環境インスタンスで収集されたエゴ車両の運転条件(位置、速度、方向)を変更し、同一環境における行動境界(すなわち、モデルが誤動作し始める場所)における挑戦運転条件を効率的に生成する。このような境界条件を用いて、初期トレーニングデータセットを拡張し、テスト中のDNNモデルを再訓練する。評価結果から,リトレーニングモデルでは,元のdnnモデルと比較して,評価トラックの別セットにおいて,最大16以上の成功率を示した。 Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. 
We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to a 16% higher success rate on a separate set of evaluation tracks with respect to the original DNN model.	翻訳日:2023-07-21 14:39:53 公開日:2023-07-20
# バッテリー電気自動車の充電行動予測:マイクロクラスタ化とsmote技術を用いたディープラーニングアプローチ Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques ( http://arxiv.org/abs/2307.10588v1 ) ライセンス: Link先を確認	Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta	(参考訳) エネルギーシステム、気候変動、公衆衛生が交通の電化に向けた主要な理由の1つである。排出削減のため、世界各国で輸送電化が進められている。その結果、多くの自動車メーカーが間もなくバッテリー電気自動車(BEV)のみの製造を開始する。カリフォルニア州では、主に気候変動や大気汚染の懸念から、BEVの採用率が上昇している。気候や大気汚染の目標には最適だが、不適切に管理されたBEV充電は、不十分な充電インフラと停電につながる可能性がある。本研究では,BEVの走行と充電データを学習し,BEVの充電イベントを予測するためのニューラルネットワークアルゴリズムであるMicro Clustering Deep Neural Network (MCDNN)を開発した。 MCDNNは、2015年から2020年にかけてカリフォルニア州で132台のBEVから発生し、合計1570167台のBEVモデルにまたがる、堅牢な旅行と料金のデータセットを使って構成されている。数値的な結果から,提案手法は支持ベクトルマシン,k近傍,決定木,その他のニューラルネットワークモデルなど,この分野のベンチマーク手法よりも有益であることが判明した。 Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1570167 vehicle miles traveled. 
The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models in predicting the charging events.	翻訳日:2023-07-21 14:39:31 公開日:2023-07-20
# NPTEL MOOCビデオにおける単語誤り率の差について A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos ( http://arxiv.org/abs/2307.10587v1 ) ライセンス: Link先を確認	Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee	(参考訳) 自動音声認識(ASR)システムは、音声言語をテキストに書き起こすように設計されており、音声アシスタントや文字起こしサービスを含む様々なアプリケーションで利用されている。しかし、印象的なベンチマーク結果を示す最先端のASRシステムであっても、音声特性の違いにより、特定の地域や人口統計に属する話者の認識に苦戦することが観察されている。本研究では, インドの人口構成の様々な層を代表する講師による, 英語の技術講義約9.8K件とその書き起こしからなる, 8740時間の大規模音声データセットのキュレーションについて述べる。このデータセットは、非常に人気のあるNPTEL MOOCプラットフォームから取得している。私たちは、キュレートされたデータセットを使用して、YouTubeの自動キャプションとOpenAI Whisperモデルの性能における格差を、インドの話者の多様な人口統計学的特性にわたって測定します。話者の性別、出身地域、年齢、発話速度による格差は存在するが、カーストによる格差は存在しない。また,講義の分野間でも統計的に有意な格差が観察された。これらの結果は、より包括的で堅牢なASRシステムと、その格差評価のためのより代表的なデータセットの必要性を示している。 Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems, which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts, delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. 
These results indicate the need for more inclusive and robust ASR systems and more representative datasets for evaluating disparity in them.	翻訳日:2023-07-21 14:39:11 公開日:2023-07-20
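The metric compared across speaker groups in the NPTEL entry above is the word error rate (WER). As a minimal sketch, assuming whitespace tokenisation and no text normalisation (the study's own preprocessing is not described in the abstract), WER is the word-level edit distance divided by the reference length. The helper `wer_gap` is a hypothetical illustration of a group-level comparison, not the paper's statistical test.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling one-row dynamic programme).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_gap(group_a, group_b):
    """Difference in mean WER between two lists of (reference, hypothesis) pairs."""
    def mean_wer(pairs):
        return sum(word_error_rate(r, h) for r, h in pairs) / len(pairs)
    return mean_wer(group_a) - mean_wer(group_b)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.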
# 機械学習システムの信頼性に関する全体論的評価 A Holistic Assessment of the Reliability of Machine Learning Systems ( http://arxiv.org/abs/2307.10586v1 ) ライセンス: Link先を確認	Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer	(参考訳) 機械学習(ml)システムは、医療、輸送、軍、国家安全保障などの高リスク設定に浸透するにつれて、信頼性に関する懸念が高まっている。顕著な進歩にもかかわらず、これらのシステムの性能は敵の攻撃や環境の変化によって著しく低下し、過度な予測、入力障害の検出の失敗、予期せぬシナリオで一般化できないことにつながる。本稿では,MLシステムの信頼性に関する総合評価手法を提案する。分散精度,分散シフトロバスト性,逆ロバスト性,キャリブレーション,分散検出の5つの特性を評価した。信頼性スコアも導入され、システム全体の信頼性を評価するために使用される。異なるアルゴリズムアプローチのパフォーマンスに関する洞察を提供するため,最先端技術を特定し,分類し,提案する信頼性指標と信頼性スコアを用いて実世界のタスクの選択を評価する。 500モデル以上のモデルを分析すると、あるメトリックに対する設計は必ずしも他のメトリックを制約するわけではないが、特定のアルゴリズム技術は複数のメトリクスの信頼性を同時に向上させることができることが分かる。この研究は、MLの信頼性をより包括的に理解し、将来の研究開発のロードマップを提供する。 As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques, then evaluate a selection on real-world tasks using our proposed reliability metrics and reliability score. 
Our analysis of over 500 models reveals that designing for one metric does not necessarily constrain others, but certain algorithmic techniques can improve reliability across multiple metrics simultaneously. This study contributes to a more comprehensive understanding of ML reliability and provides a roadmap for future research and development.	翻訳日:2023-07-21 14:38:52 公開日:2023-07-20
# 拡散を経由する参照ベースの画家的インペインティング:野生参照ドメインギャップを横断する Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap ( http://arxiv.org/abs/2307.10584v1 ) ライセンス: Link先を確認	Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang	(参考訳) 絵に新しい物体を入れたらどうなるか想像したことがありますか? 例えば、クロード・モネ(claude monet)の『water lilies, evening effect』にバスケットボールを入れるとどうなるか? 本研究では,参照ドメインギャップを越え,新しいオブジェクトをアートワークに埋め込む新しいタスクであるPaterly Inpaintingを提案する。これまでの文献では, 対象と参照との間に大きな領域不一致を考慮せず, フォトリアリスティックな参照を用いて, 芸術的イメージを描けるように設計されている。本稿では,'inpaint more wildly'と呼ばれる新しい拡散フレームワークを提案する。画像条件付き拡散モデルを用いて構築され,塗布マスクで動作するラダーサイドブランチとマスク融合機構を導入する。 CLIPイメージの埋め込みを推論時に分解することで、セマンティックな情報とスタイル情報の強度を容易に操作できる。実験により,提案するrefpaintフレームワークが既存の手法よりもはるかに優れた結果をもたらすことを実証した。提案手法は,他の方法では達成し難い参照オブジェクトで絵を描くことができる。プロジェクトページ: https://vita-group.github.io/RefPaint/ Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's ``Water Lilies, Evening Effect''? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to ``inpaint more wildly'' by taking such references with large domain gaps. Built with an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. 
Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve. Project page: https://vita-group.github.io/RefPaint/	翻訳日:2023-07-21 14:38:31 公開日:2023-07-20
# パリティアーキテクチャのための構成的小冊子コンパイル Constructive plaquette compilation for the parity architecture ( http://arxiv.org/abs/2307.10626v1 ) ライセンス: Link先を確認	Roeland ter Hoeven, Benjamin E. Niehoff, Sagar Sudhir Kale, Wolfgang Lechner	(参考訳) パリティコンパイル(parity compilation)は、パリティマッピングに必要な制約をローカルに配置する、という課題である。任意の高階最適化問題に対して,ラケットを用いたパリティアーキテクチャのための最初の構成的コンパイルアルゴリズムを提案する。これにより、プラーペットレイアウトをネイティブに実装できる断熱プロトコルと、完全に並列化されたデジタル回路が可能になる。アルゴリズムは格子の長方形のレイアウトを構築し、矩形の各層に少なくとも1つの制約を加える。中心となる考え方は、矩形の境界上の任意の量子ビットといくつかの新しい量子ビットからなる各制約は、アンシラを用いて決定的な手順でプラケットに分解できるということである。有効な制約セットの選択方法と、この分解の動作方法を示します。さらに、アシラ数を最適化し、追加の制約で最適化問題を実装する方法を示します。 Parity compilation is the challenge of laying out the required constraints for the parity mapping in a local way. We present the first constructive compilation algorithm for the parity architecture using plaquettes for arbitrary higher-order optimization problems. This enables adiabatic protocols, where the plaquette layout can natively be implemented, as well as fully parallelized digital circuits. The algorithm builds a rectangular layout of plaquettes, where in each layer of the rectangle at least one constraint is added. The core idea is that each constraint, consisting of any qubits on the boundary of the rectangle and some new qubits, can be decomposed into plaquettes with a deterministic procedure using ancillas. We show how to pick a valid set of constraints and how this decomposition works. We further give ways to optimize the ancilla count and show how to implement optimization problems with additional constraints.	翻訳日:2023-07-21 14:30:14 公開日:2023-07-20
# ポリプ再同定のための識別的視覚テキスト表現の学習 Learning Discriminative Visual-Text Representation for Polyp Re-Identification ( http://arxiv.org/abs/2307.10625v1 ) ライセンス: Link先を確認	Suncheng Xiang, Cang Liu, Sijia Du, Dahong Qian	(参考訳) 大腸内視鏡的ポリープ再同定は、異なるカメラや視点で撮影された大規模なギャラリーの中から特定のポリープを照合することを目的としており、コンピュータ支援診断における大腸がんの予防と治療に重要な役割を果たす。しかし、従来の手法は主に視覚的表現学習に焦点をあてており、トレーニング中に意味的特徴の可能性を探究することを怠っているため、事前学習されたモデルを新しいシナリオに適用すると、一般化能力が低下しやすい。このジレンマを解消するために,高レベルのセマンティック情報を交換することで,ポリプビデオの表現を著しく強化する,VT-ReIDというシンプルで効果的なトレーニング手法を提案する。さらに,テキストデータからの事前知識を導入するための新しいクラスタリング機構を精巧に設計し,対照学習を利用して豊富なラベルなしテキストデータからのより良い分離を促進する。我々の知る限りでは、大腸内視鏡的ポリープ再同定のためにクラスタリング機構を備えたビジュアルテキスト特徴を利用する最初の試みである。実験結果から,本手法は現在の最先端の手法を明確な差で上回ることが示された。 Colonoscopic polyp re-identification aims to match a specific polyp in a large gallery captured with different cameras and views, and plays a key role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods mainly focus on visual representation learning while neglecting the potential of semantic features during training, which can easily lead to poor generalization when the pretrained model is adapted to new scenarios. To relieve this dilemma, we propose a simple but effective training method named VT-ReID, which can remarkably enrich the representation of polyp videos through the interchange of high-level semantic information. Moreover, we elaborately design a novel clustering mechanism to introduce prior knowledge from textual data, which leverages contrastive learning to promote better separation from abundant unlabeled text data. To the best of our knowledge, this is the first attempt to employ visual-text features with a clustering mechanism for colonoscopic polyp re-identification. Empirical results show that our method outperforms current state-of-the-art methods by a clear margin.	翻訳日:2023-07-21 14:29:57 公開日:2023-07-20
# マイクロジェスチャ分類のための骨格・意味の結合埋め込み損失 Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification ( http://arxiv.org/abs/2307.10624v1 ) ライセンス: Link先を確認	Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, and Meng Wang	(参考訳) 本稿では,IJCAI 2023のMiGAチャレンジにおけるマイクロジェスチャ分類に対するチームHFUT-VUTのソリューションを簡単に紹介する。マイクロジェスチャ分類タスクは、骨格データに基づいて、所定のビデオのアクションカテゴリを認識することを目的としている。そこで本研究では,3D-CNNベースのマイクロジェスチャ認識ネットワークを提案し,骨格および意味的埋め込み損失を組み込むことで行動分類性能を向上させる。最後に,トップ1の精度で第2位のチームを1.10%上回り,マイクロジェスチャ分類チャレンジで1位にランクインした。 In this paper, we briefly introduce the solution of our team HFUT-VUT for Micro-gesture Classification in the MiGA challenge at IJCAI 2023. The micro-gesture classification task aims at recognizing the action category of a given video based on the skeleton data. For this task, we propose a 3D-CNN-based micro-gesture recognition network, which incorporates a skeletal and semantic embedding loss to improve action classification performance. Finally, we rank 1st in the Micro-gesture Classification Challenge, surpassing the second-place team in terms of Top-1 accuracy by 1.10%.	翻訳日:2023-07-21 14:29:36 公開日:2023-07-20
# Göran Lindblad in memoriam Göran Lindblad in memoriam ( http://arxiv.org/abs/2307.10621v1 ) ライセンス: Link先を確認	Ingemar Bengtsson	(参考訳) これは、Göran Lindbladの生涯と作品の簡単な説明である。 This is a brief account of the life and work of Göran Lindblad.	翻訳日:2023-07-21 14:29:24 公開日:2023-07-20
# 四元テンソル環の分解とカラー画像インパインティングへの応用 Quaternion tensor ring decomposition and application for color image inpainting ( http://arxiv.org/abs/2307.10620v1 ) ライセンス: Link先を確認	Jifei Miao and Kit Ian Kou	(参考訳) 近年、テンソルネットワークは大規模最適化問題を解決する強力なツールとして登場している。最も有望なテンソル・ネットワークの1つはテンソル・リング(TR)分解であり、これはトレース演算と潜在コアの公平な処理を利用してモデル内の円形の置換不変性を達成する。一方,近年では,カラーピクセルの符号化に有効性があるため,カラー画像処理タスクに広く活用されている。そこで本研究では,色画素表現の四元数による利点を活用しつつ,TR分解の強力で一般化された表現能力を継承する四元数テンソルリング(QTR)分解を提案する。本稿では,QTR分解の定義とQTR形式学習アルゴリズムに加えて,低ランク四元数テンソル完備化(LRQTC)モデルと,QTR分解に基づくカラー画像インペイントのためのアルゴリズムを提案する。最後に,カラー画像インペインティングに関する広範な実験により,提案するqtlrc法が高い競合性を示す。 In recent years, tensor networks have emerged as powerful tools for solving large-scale optimization problems. One of the most promising tensor networks is the tensor ring (TR) decomposition, which achieves circular dimensional permutation invariance in the model through the utilization of the trace operation and equitable treatment of the latent cores. On the other hand, more recently, quaternions have gained significant attention and have been widely utilized in color image processing tasks due to their effectiveness in encoding color pixels. Therefore, in this paper, we propose the quaternion tensor ring (QTR) decomposition, which inherits the powerful and generalized representation abilities of the TR decomposition while leveraging the advantages of quaternions for color pixel representation. In addition to providing the definition of QTR decomposition and an algorithm for learning the QTR format, this paper also proposes a low-rank quaternion tensor completion (LRQTC) model and its algorithm for color image inpainting based on the QTR decomposition. Finally, extensive experiments on color image inpainting demonstrate that the proposed QTLRC method is highly competitive.	翻訳日:2023-07-21 14:29:21 公開日:2023-07-20
# テキスト分類による偽レビューの検出 Detecting deceptive reviews using text classification ( http://arxiv.org/abs/2307.10617v1 ) ライセンス: Link先を確認	Anusuya Baby	(参考訳) 近年、オンラインレビューはあらゆる種類の製品やサービスを促進する上で重要な役割を担っている。企業は、顧客が商品を購入するために偽レビューを埋め込むことができる。自社製品の利点を強調したり、競合製品を批判したりすることもある。マーケター、広告主、その他のオンラインビジネスユーザーは、本当に気に入らない製品に対して偽のポジティブレビューを作成したり、偽のネガティブレビューを与えたりすることを奨励しています。ですから今では,自分たちのビジネスを宣伝したり,競争相手の評判を損なうような,偽りのレビューを書くことは避けられないことです。したがって、偽りのレビューを特定することは、激しく、現在進行中の研究分野である。本研究は,認識的レビューを識別するための機械学習モデルアプローチを提案する。本論文は,レストランレビューの偽装的意見スパムコーパスデータセット上で行った複数の実験の結果について検討する。我々は偽レビューに焦点をあてて偽コンテンツを特定するn-gramモデルとmax機能を開発した。さらに,2つの特徴抽出手法の性能調査と5つの機械学習分類手法の適用についてベンチマーク研究を行った。実験の結果,パッシブアグレッシブな分類器は他のアルゴリズムよりも優れており,テキスト分類だけでなく,偽レビューにも高い精度を達成できた。また、データ拡張を研究し、異なるディープラーニング技術を実装します。 In recent years, online reviews play a vital role for promoting any kind of product or services. Businesses may embed fake reviews in order to attract customers to purchase their products. They may even highlight the benefits of their own product or criticize the competition's product. Marketers, advertisers, and other online business users have incentive to create fake positive reviews for products which they want to promote or give fake negative reviews for products which they really don't like. So now-a-days writing a deceptive review is inevitable thing for promoting their own business or degrading competitor's reputation. Thus, identifying deceptive reviews is an intense and on-going research area. This research paper proposes machine learning model approach to identify deceptive reviews. The paper investigates the performance of the several experiments done on a Deceptive Opinion Spam Corpus dataset of restaurants reviews. We developed a n-gram model and max features to identify deceptive contents with a particular focus on fake reviews. Further, we conduct a benchmark study to investigate the performance of two different features extraction techniques and apply five machine learning classification techniques. 
The experimental results show that the passive-aggressive classifier outperforms the other algorithms and reaches the highest accuracy, not only in general text classification but also in fake review detection. We also study data augmentation and implement different deep learning techniques.	翻訳日:2023-07-21 14:29:04 公開日:2023-07-20
# 不均一フェデレーション学習の現状と研究課題 Heterogeneous Federated Learning: State-of-the-art and Research Challenges ( http://arxiv.org/abs/2307.10616v1 ) ライセンス: Link先を確認	Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao	(参考訳) フェデレーテッド・ラーニング(FL)は、大規模産業用途での利用の可能性から注目を集めている。既存のフェデレーション学習は主にモデル均質な設定に焦点を当てている。しかし、実践的なフェデレーション学習は、典型的には、参加クライアント間のデータ分布、モデルアーキテクチャ、ネットワーク環境、ハードウェアデバイスの異種性に直面する。不均一フェデレートラーニング(HFL)はより困難であり、それに対応するソリューションは多様で複雑である。したがって、研究課題と最先端技術に関する体系的な調査が不可欠である。本稿では,まず,HFLにおける様々な研究課題について,統計的異質性,モデル異質性,通信異質性,デバイス異質性,その他の課題の5つの側面から要約する。さらに,近年のHFLの進歩を概観し,既存のHFL手法の新たな分類法を提案し,その長所と短所の詳細な分析を行った。我々は既存の手法を,HFLの手順に従って,データレベル,モデルレベル,サーバレベルという3つの異なるレベルから分類する。最後に、この分野のさらなる発展を促進するため、HFLにおけるいくつかの重要かつ有望な今後の研究方向について論じる。 HFLの定期的に更新されたコレクションは https://github.com/marswhu/HFL_Survey で入手できる。 Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we first summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. 
Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.	翻訳日:2023-07-21 14:28:44 公開日:2023-07-20
# HC-NJDGデータ分析によるインドの高等裁判所の罰則の理解 Analyzing HC-NJDG Data to Understand the Pendency in High Courts in India ( http://arxiv.org/abs/2307.10615v1 ) ライセンス: Link先を確認	Kshitiz Verma	(参考訳) インドの司法機関は、あらゆるレベルで裁判所で係争中の何百万もの事件に苦しめられている。本稿では,インド共和国における24の高等裁判所(hc-njdg,high court njdg)において収集したデータを分析した。 2017年8月31日から2018年12月26日までの73日間のデータを収集しました。したがって、私たちによって収集されたデータは、ほぼ16ヶ月の期間にまたがる。我々は,高等裁判所のNJDGポータルにおいて,高等裁判所の裁判官数,高等裁判所に係留する事件数,10年以上保留されている事件数,提出された事件数,登録された事件数,女性・高齢者の訴訟数など,さまざまな統計分析を行った。結果はこう示しています 1) 高等裁判所判事の数はNJDG(第1、第1、第2、第10、第11、第V表)に重大な誤差がある。 2)ほとんどの高等裁判所の仮設事件は減少せず、増加傾向にある(第3、第13図)。 3)HC-NJDGの定期的な更新が必要である。一部の高等裁判所に関するデータは定期的に更新されず、ポータルで誤って更新される(第14図)。 4) 異なる高等裁判所の裁判官に対する判例の平均負荷には大きな差がある(第6図)。 5) すべての高等裁判所が裁判官の承認した力で運営している場合、今後20年以内に上級裁判所の年金は無効にすることができる(第21、第22図)。 6) 女性及び高齢者が起こした留置件数は不当に低く、合計留置件の10%未満である(第23-27図) 7)高等裁判所の仮設事件件数を減少させるため、裁判所における事案作成のスケジューリングプロセスの改善が図られる(第29図)。 8)いくつかの統計は明確に定義されていない(第31図)。 Indian Judiciary is suffering from burden of millions of cases that are lying pending in its courts at all the levels. In this paper, we analyze the data that we have collected on the pendency of 24 high courts in the Republic of India as they were made available on High Court NJDG (HC-NJDG). We collected data on 73 days beginning August 31, 2017 to December 26, 2018, including these days. Thus, the data collected by us spans a period of almost sixteen months. We have analyzed various statistics available on the NJDG portal for High Courts, including but not limited to the number of judges in each high court, the number of cases pending in each high court, cases that have been pending for more than 10 years, cases filed, listed and disposed, cases filed by women and senior citizens, etc. Our results show that: 1) statistics as important as the number of judges in high courts have serious errors on NJDG (Fig. 1, 2, 10, 11, Table V). 2) pending cases in most of the high courts are increasing rather than decreasing (Fig. 3, 13). 
3) HC-NJDG must be updated regularly to be useful; data for some high courts are not updated regularly or are updated erroneously on the portal (Fig. 14). 4) the average case load per judge differs hugely across high courts (Fig. 6). 5) if all the high courts operate at their approved strength of judges, then for most of the high courts pendency can be eliminated within 20 years (Fig. 21, 22). 6) pending cases filed by women and senior citizens are disproportionately few; together they constitute less than 10% of the total pending cases (Fig. 23-27). 7) a better scheduling process for preparing cause lists in courts can help reduce the number of pending cases in the high courts (Fig. 29). 8) some statistics are not well defined (Fig. 31).	翻訳日:2023-07-21 14:28:22 公開日:2023-07-20
# ビルアウトライン自動抽出のためのハイブリッド特徴埋め込み Hybrid Feature Embedding For Automatic Building Outline Extraction ( http://arxiv.org/abs/2307.10609v1 ) ライセンス: Link先を確認	Weihang Ran, Wei Yuan, Xiaodan Shi, Zipei Fan, Ryosuke Shibasaki	(参考訳) 高解像度航空画像から抽出した建物輪郭は, 変化検出や災害評価など, 様々な応用分野に利用することができる。しかし、従来のCNNモデルは元画像から輪郭を高精度に認識できない。本稿では,CNNとTransformerをベースとしたモデルとアクティブな輪郭モデルを組み合わせて,この問題に対処する。また,エンコーダが生成する異なる特徴を処理するために,トリプルブランチデコーダ構造も設計した。実験の結果、我々のモデルは2つのデータセットで他のベースラインモデルよりも優れており、Vaihingenで91.1%、Bing hutsで83.8%のmIoUを達成した。 Building outlines extracted from high-resolution aerial images can be used in various application fields such as change detection and disaster assessment. However, traditional CNN models cannot recognize contours very precisely from the original images. In this paper, we propose a CNN- and Transformer-based model combined with an active contour model to deal with this problem. We also design a triple-branch decoder structure to handle the different features generated by the encoder. Experimental results show that our model outperforms the other baseline models on two datasets, achieving 91.1% mIoU on Vaihingen and 83.8% on Bing huts.	翻訳日:2023-07-21 14:27:47 公開日:2023-07-20
# 確率的洗練による物理駆動乱流画像復元 Physics-Driven Turbulence Image Restoration with Stochastic Refinement ( http://arxiv.org/abs/2307.10603v1 ) ライセンス: Link先を確認	Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang	(参考訳) 大気乱流による画像歪みは確率的劣化であり、長距離光学イメージングシステムでは重要な問題である。合成データの助けを借りて、モデルベースの新しいディープラーニングソリューションを含む、過去数十年間、数多くの研究が実施されてきた。近年、ディープラーニングモデルが現実の乱流に適応するために、高速で物理学的なシミュレーションツールが導入されたが、そのようなモデルの訓練は、合成データと地上の真理対にのみ依存している。本稿では,物理ベースのシミュレータを直接学習プロセスに導入し,ネットワークが確率性を劣化や基礎画像から切り離すのに役立つ物理統合復元ネットワーク(pirn)を提案する。さらに、決定論的モデルによって導入された「平均効果」と、合成と実世界の劣化の間の領域ギャップを克服するために、我々はさらに、その知覚的品質を高めるために、確率的微細化(PiRN-SR)を用いたPiRNを導入する。全体として、我々のPiRNとPiRN-SRは、実世界の未知の乱流条件への一般化を改善し、ピクセルの精度と知覚品質の両面で最先端の復元を提供する。我々のコードは \url{https://github.com/VITA-Group/PiRN} で入手できる。 Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A number of research has been conducted during the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions recently, the training of such models only relies on the synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network to disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the ``average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide a state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. 
Our code is available at https://github.com/VITA-Group/PiRN.	翻訳日:2023-07-21 14:27:37 公開日:2023-07-20
# 無線ネットワークにおけるデータ駆動遅延確率予測:テール確率に着目して Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities ( http://arxiv.org/abs/2307.10648v1 ) ライセンス: Link先を確認	Samie Mostafavi, Gourav Prateek Sharma, James Gross	(参考訳) サイバー物理システムやヒューマン・イン・ザ・ループ・アプリケーションといった新しい応用分野が出現するにつれ、あるレベルのエンドツーエンドのネットワーク遅延を極めて高い信頼性(例えば99.999%)で保証する必要がある。 IEEE 802.1as のタイムセンシティブネットワーク (TSN) で規定されるメカニズムは、スイッチングイーサネットネットワークのこれらの要件を達成するのに利用できるが、無線ネットワークにおけるTSN機構の実装は、その確率的性質のため難しい。無線リンクを99.999%の信頼性レベルに適合させるためには、遅延確率分布や分布の尾部における極めて稀な外れ値の挙動を分析し、制御する必要がある。本研究は, 混合密度ネットワーク(MDN)や極値混合モデルなどの最先端データ駆動手法を用いて遅延分布の尾部を予測し, 無線伝送においてより情報的な決定を行うことのできる, ネットワークパラメータに条件付けられた稀なレイテンシの確率を推定することを提案する。 IEEE 802.11g(WiFi)、商用プライベート、ソフトウェア定義5Gネットワークの実際の遅延測定は、提案手法をベンチマークし、テール確率に対する感度を評価するために使用される。 With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1as time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. 
Actual latency measurements from IEEE 802.11g (WiFi), commercial private 5G, and software-defined 5G networks are used to benchmark the proposed approaches and evaluate their sensitivity with respect to the tail probabilities.	翻訳日:2023-07-21 14:21:48 公開日:2023-07-20
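To make the 99.999% reliability target in the latency entry above concrete: it means the probability that latency exceeds a deadline must stay at or below 1e-5. The sketch below is only a naive empirical baseline, not the mixture density network or extreme value mixture model the abstract proposes. The function names and the `max_violation_probability` parameter are hypothetical.

```python
def empirical_tail_probability(latencies_ms, deadline_ms):
    """Fraction of observed latencies that exceed the deadline."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    violations = sum(1 for x in latencies_ms if x > deadline_ms)
    return violations / len(latencies_ms)


def meets_target(latencies_ms, deadline_ms, max_violation_probability=1e-5):
    """True if the observed violation rate does not exceed the target."""
    tail = empirical_tail_probability(latencies_ms, deadline_ms)
    return tail <= max_violation_probability
```

With a 1e-5 target, on the order of 100,000 samples are needed before a single violation is even expected, which is one reason parametric tail models are attractive for this problem.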
# 多変量正規分布間のフィッシャー・ラオ距離とプルバックSPDコーン距離 Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions ( http://arxiv.org/abs/2307.10644v1 ) ライセンス: Link先を確認	Frank Nielsen	(参考訳) 多変量正規分布のデータセットは、拡散テンソルイメージング、構造テンソルコンピュータビジョン、レーダー信号処理、機械学習など多くの科学分野に豊富に存在する。フィルタリングや分類、クラスタリングといった下流タスクのための通常のデータセットを処理するためには、通常のものとパスの相違点を適切に定義する必要がある。フィッシャー情報計量によって引き起こされるリーマン測地線距離として定義されるフィッシャー・ラオ距離は、そのような原理的な距離距離であるが、いくつかの特別な場合を除いて閉じた形では知られていない。本研究では,多変量正規分布間のフィッシャー・ラオ距離を任意に近似する高速でロバストな手法を最初に報告する。第二に、正規多様体の微分同相埋め込みに基づく距離のクラスを、中心となる正規分布の多様体に対応する高次元対称正定円錐の部分多様体に導入する。円錐上の射影ヒルベルト距離は、埋め込まれた正規部分多様体上の計量となり、その円錐距離を対応する直線ヒルベルト錐測地線と引き戻し、正規分布間の距離と滑らかな経路を得ることを示す。フィッシャー-ラオ距離近似と比較して、プルバックヒルベルト錐距離は行列の極小および極大固有値のみを計算する必要があるため、計算的に軽い。最後に、これらの距離をクラスタリングタスクで使う方法を示す。 Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information metric is such a principled metric distance which however is not known in closed-form excepts for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. 
We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance, with its associated straight-line Hilbert cone geodesics, to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.	翻訳日:2023-07-21 14:21:24 publication:2023-07-20
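The pullback Hilbert cone distance in the entry above can be sketched for univariate normals, where all matrices are 2x2 and the eigenvalues have a closed form. Two assumptions are made here and are not confirmed by the abstract: the embedding maps N(mu, sigma^2) to the matrix [[sigma^2 + mu^2, mu], [mu, 1]], and the Hilbert projective distance is taken as log(lambda_max / lambda_min) of P^{-1} Q. The function names are hypothetical.

```python
import math


def embed(mu, sigma2):
    """Map N(mu, sigma2) to a 2x2 symmetric positive-definite matrix (assumed embedding)."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    return ((sigma2 + mu * mu, mu), (mu, 1.0))


def hilbert_cone_distance(p, q):
    """log(lambda_max / lambda_min) of P^{-1} Q for 2x2 SPD matrices P and Q."""
    (a, b), (_, d) = p
    (e, f), (_, h) = q
    det_p = a * d - b * b
    det_q = e * h - f * f
    # trace(P^{-1} Q) = (d*e - 2*b*f + a*h) / det(P); det(P^{-1} Q) = det(Q) / det(P)
    tr = (d * e - 2.0 * b * f + a * h) / det_p
    det = det_q / det_p
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))  # clip tiny negative round-off
    lam_max = (tr + disc) / 2.0
    lam_min = det / lam_max  # product of eigenvalues equals the determinant
    return math.log(lam_max / lam_min)


def normal_distance(mu1, var1, mu2, var2):
    """Pulled-back cone distance between two univariate normal distributions."""
    return hilbert_cone_distance(embed(mu1, var1), embed(mu2, var2))
```

For N(0, 1) and N(0, 4) the embedded matrices are diag(1, 1) and diag(4, 1), so the distance is log 4. Only the two extreme eigenvalues are needed, which matches the abstract's point about low computational cost.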
# retouchingffhq: きめ細かい顔修正検出のための大規模データセット RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection ( http://arxiv.org/abs/2307.10642v1 ) ライセンス: Link先を確認	Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang	(参考訳) ショートビデオプラットフォームにおける顔のリタッチフィルターの普及は、デジタル外観の正しさと偽装広告の影響を懸念している。これらの課題に対処するためには、高度な顔修正技術を開発する必要がある。しかし、大規模かつきめ細かい顔修正データセットの欠如は、この分野の進歩の大きな障害となっている。本稿では,50万以上の条件付きリタッチ画像を含む大規模かつ細粒度の顔リタッチデータセットであるretouchingffhqを紹介する。 RetouchingFFHQは、その大規模、高品質、きめ細かい粒度、カスタマイズのため、以前のデータセットから際立っている。 4種類の顔リタッチ操作と異なる顔リタッチレベルを含むことにより、両顔リタッチ検出を細粒度、マルチリタッチ型、マルチリタッチレベル推定問題に拡張する。さらに,クロススケール表現学習のためのcnnバックボーンのためのプラグインとして,マルチグラナラリティアテンションモジュール(mam)を提案する。異なるベースラインを用いた広範囲な実験と提案手法は顔のリタッチ検出に優れた性能を示す。提案する新しいデータセットでは、リアルタイムのきめ細かな顔のリタッチ検出の難しい問題に取り組むための、今後の作業には大きな可能性があると考えています。 The widespread use of face retouching filters on short-video platforms has raised concerns about the authenticity of digital appearances and the impact of deceptive advertising. To address these issues, there is a pressing need to develop advanced face retouching techniques. However, the lack of large-scale and fine-grained face retouching datasets has been a major obstacle to progress in this field. In this paper, we introduce RetouchingFFHQ, a large-scale and fine-grained face retouching dataset that contains over half a million conditionally-retouched images. RetouchingFFHQ stands out from previous datasets due to its large scale, high quality, fine-grainedness, and customization. By including four typical types of face retouching operations and different retouching levels, we extend the binary face retouching detection into a fine-grained, multi-retouching type, and multi-retouching level estimation problem. Additionally, we propose a Multi-granularity Attention Module (MAM) as a plugin for CNN backbones for enhanced cross-scale representation learning. 
Extensive experiments using different baselines as well as our proposed method on RetouchingFFHQ show decent performance on face retouching detection. With the proposed new dataset, we believe there is great potential for future work to tackle the challenging problem of real-world fine-grained face retouching detection.	翻訳日:2023-07-21 14:21:02 公開日:2023-07-20
# ネットワーク量子化のための量子化特徴蒸留 Quantized Feature Distillation for Network Quantization ( http://arxiv.org/abs/2307.10638v1 ) ライセンス: Link先を確認	Ke Zhu and Yin-Yin He and Jianxin Wu	(参考訳) ニューラルネットワーク量子化は、低ビット近似を用いて、完全精度のニューラルネットワークモデルを加速し、トリムすることを目的としている。量子化認識トレーニング(QAT)パラダイムを採用する手法は最近急速に成長しているが、概念的には複雑であることが多い。本稿では,新しい高効率QAT法である量子化特徴蒸留(QFD)を提案する。 QFDはまず教師として量子化された(または二値化された)表現を訓練し、その後知識蒸留(KD)を用いてネットワークを量子化する。定量的結果は、QFDが従来の量子化法よりも柔軟で効果的であることを示している。 QFDは、画像分類だけでなく、オブジェクト検出においても、既存の手法をはるかに上回ります。さらに、QFDは、MS-COCOの検出とセグメンテーションに基づいてViTとSwin-Transformerを量子化し、実世界の展開におけるその可能性を検証する。我々の知る限りでは、視覚変換器が物体検出や画像分割タスクで量子化されたのはこれが初めてである。 Neural network quantization aims to accelerate and trim full-precision neural network models by using low bit approximations. Methods adopting the quantization aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, albeit being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.	翻訳日:2023-07-21 14:20:38 公開日:2023-07-20
# 会話型頭部生成における人間の好みの学習と評価 Learning and Evaluating Human Preferences for Conversational Head Generation ( http://arxiv.org/abs/2307.10636v1 ) ライセンス: Link先を確認	Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei	(参考訳) 手動による選好評価と整合する信頼性と総合的な評価基準は,対話型頭部ビデオ合成法の開発に不可欠である。既存の定量的評価は、限られた評価次元のみを考慮するため、人間の嗜好の完全な複雑さを捉えるのに失敗することが多い。質的評価とユーザスタディはソリューションを提供するが、時間と労力がかかる。この制限は対話型ヘッド生成アルゴリズムやシステムの進歩を妨げる。本稿では,異なる次元にわたる定量的評価に基づいて,人間の嗜好を適合させるための学習ベース評価尺度であるPreference Score(PS)を提案する。 PSは人間のアノテーションを必要とせずに定量的評価を行うことができる。実験結果から,人間の知覚に合わせる上での選好スコアの優越性を検証するとともに,未確認データに対する堅牢性と一般化性を実証し,会話ヘッド生成に有用なツールとなった。この指標が会話型ヘッドジェネレーションの新たな進歩を促進すると期待しています。 A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for conversational head video synthesis method development. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and labor-intensive. This limitation hinders the advancement of conversational head generation algorithms and systems. In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions. PS can serve as a quantitative evaluation without the need for human annotation. Experimental results validate the superiority of Preference Score in aligning with human perception, and also demonstrate robustness and generalizability to unseen data, making it a valuable tool for advancing conversational head generation. We expect this metric could facilitate new advances in conversational head generation.	翻訳日:2023-07-21 14:20:19 公開日:2023-07-20
# SciBench:大規模言語モデルの大学レベルの科学的問題解決能力の評価 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models ( http://arxiv.org/abs/2307.10635v1 ) ライセンス: Link先を確認	Xiaoxuan Wang and Ziniu Hu and Pan Lu and Yanqiao Zhu and Jieyu Zhang and Satyen Subramaniam and Arjun R. Loomba and Shichang Zhang and Yizhou Sun and Wei Wang	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、多くの数学的なベンチマークにおいて顕著な進歩を示している。しかし、これらのベンチマークのほとんどは中高生に根ざした問題に過ぎず、複数の質問しか含んでおらず、初等算術演算の限られた範囲に限定されている。本稿では,複雑な科学的問題解決に必要な推論能力を体系的に検討することを目的とした,拡張型ベンチマークスイート scibench を提案する。 SciBench には、数学、化学、物理学の教科書から引き出された様々な大学レベルの科学的問題を含むオープンセットと、コンピュータ科学と数学の学部レベルの試験から問題を構成するクローズドセットの2つの慎重に計算されたデータセットが含まれている。 2つのデータセットに基づいて,さまざまなプロンプト戦略を持つ2つの代表的llmの詳細なベンチマーク研究を行う。その結果、現在のLLMは満足なパフォーマンスを達成できないことが判明し、全体のスコアは35.80%に過ぎなかった。さらに,詳細なユーザ調査を行い,llmによる誤りを10の問題解決能力に分類した。分析の結果,特定の問題解決スキルの改善を示す戦略が,他のスキルの低下につながることが示唆された。我々は、SciBenchがLSMの推論能力のさらなる発展を触媒し、究極的には科学的研究と発見に寄与することを期待している。 Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. 
The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.	翻訳日:2023-07-21 14:20:04 公開日:2023-07-20
# ヒト遺伝子のヌクレオチド配列に関する生成言語モデル Generative Language Models on Nucleotide Sequences of Human Genes ( http://arxiv.org/abs/2307.10634v1 ) ライセンス: Link先を確認	Musa Nuri Ihtiyar and Arzucan Ozgur	(参考訳) 言語モデルは、主にトランスフォーマーベースのもので、NLPで大きな成功を収めた。より正確に言うと、NLUのBERTやNLGのGPT-3のような研究は非常に重要である。 DNA配列は構造的には自然言語に非常に近いため、DNA関連バイオインフォマティクスドメインが関係すると、DNABertのような識別モデルが存在する。しかし、硬貨の生成的な側面は、主に我々の知識の最良の部分について未調査である。そこで本研究では,DNAシークエンスのための自己回帰生成言語モデルであるGPT-3の開発に焦点をあてた。 DNAの全配列を扱うことは、相当な計算資源なしでは難しいため、我々は、DNA全体の機能ではなく、人間の遺伝子のヌクレオチド配列、特定の機能を持つDNAのユニークな部分に焦点を当て、より小さなスケールで研究を行うことに決めた。この決定は、DNAと遺伝子が4つの異なるヌクレオチドから構成される1D配列として見ることができ、多くの情報を失い、単純化しすぎるという事実から、問題構造を大きく変えなかった。まず,n-gramsのような単純な手法が有望であるのに対し,rnnは最善を尽くしているのが観察された。もうひとつのメリットは、自然言語とは異なり、理解できない言語で生成モデルを扱う方法を学ぶことです。パープレキシティのような古典的なメトリクスを超えて、現実のタスクを使用するのがいかに必要かが観察される。さらに, 4種類のヌクレオチドにより, 語彙が最小の言語を選択することにより, これらのモデルのデータ・ハングリーの性質を変えることができるかどうかを調べた。この点をレビューする理由は、そのような言語を選択することが問題をより簡単にするためである。しかし、この研究で分かったのは、必要なデータ量の変更がほとんどないことでした。 Language models, primarily transformer-based ones, obtained colossal success in NLP. To be more precise, studies like BERT in NLU and works such as GPT-3 for NLG are very crucial. DNA sequences are very close to natural language in terms of structure, so if the DNA-related bioinformatics domain is concerned, discriminative models, like DNABert, exist. Yet, the generative side of the coin is mainly unexplored to the best of our knowledge. Consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts in DNA with specific functionalities, instead of the whole DNA. 
This decision did not change the problem structure much, because both DNA and genes can be seen as 1D sequences consisting of four different nucleotides, without losing much information or oversimplifying. First, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best, while simple techniques like N-grams were also promising. Another benefit was learning how to work with generative models on languages we do not understand, unlike natural language. We also observed how essential it is to use real-life tasks beyond classical metrics such as perplexity. Furthermore, we examined whether the data-hungry nature of these models can be changed by selecting a language with a minimal vocabulary size (four, owing to the four types of nucleotides). The reason for examining this was that choosing such a language might make the problem easier. However, we observed that it did not change the amount of data needed by much.	翻訳日:2023-07-21 14:19:40 公開日:2023-07-20
# マルチメソッド自己学習: テキストによるコード生成の改善とその逆 Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa ( http://arxiv.org/abs/2307.10633v1 ) ライセンス: Link先を確認	Shriyash K. Upadhyay and Etan J. Ginsberg	(参考訳) 大規模言語モデルには、同じ問題を解決する多くの方法がある。これは、新しい強み(異なる方法が異なる問題にうまく機能する可能性がある)と弱点(どの方法を使うかを知るのが難しいかもしれない)を導入します。本稿では,Multi-Method Self-Training (MMST)を導入し,各手法の強みを増強し,弱点を緩和する手法を提案する。言語とコードの両方で訓練された176Bパラメータモデルを用いて、MMSTが可能であることを示す。 1) 性能の低い方法(最大30%)を改善し、モデルを使いやすくする。 2)より高性能な方法(最大32.2%)を改善し、より高性能にする。 3)モデルが合理性を生成する能力を向上させることにより、関連するが異なるタスク(最大10.3%)のパフォーマンスを向上させる。次に、MMSTがなぜ機能するのかを調べるためにアブレーション分析を行う。 MMSTは従来の自己学習よりも多くのデータを生成するが、性能改善は複数の手法を用いることで促進される。また,MMSTをより効果的にするために,手法間でのプロンプトエンジニアリングとアンチコラージュ性能を解析した。われわれの論文の証拠は、機械学習の研究者たちに、言語モデルの進歩が新しい形の訓練を可能にする方法を探求する動機を与えてくれることを願っている。 Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. 
We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.	翻訳日:2023-07-21 14:19:10 公開日:2023-07-20
# 自動流星検出のための新しい組込みアプリケーションの並列化 Parallelization of a new embedded application for automatic meteor detection ( http://arxiv.org/abs/2307.10632v1 ) ライセンス: Link先を確認	Mathuran Kandeepan (ALSOC), Clara Ciocan (ALSOC), Adrien Cassagne (ALSOC), Lionel Lacassagne (ALSOC)	(参考訳) 本稿では,新しいコンピュータビジョンアプリケーションを並列化する手法を提案する。このシステムは、不安定なカメラとノイズの多いビデオシーケンスから、自動的に流星を検出できる。このアプリケーションは、気象気球や空中観測キャンペーンに組み込むように設計されている。したがって、最終ターゲットは低消費電力のシステムオンチップ(10ワット未満)であり、ソフトウェアはリアルタイムでフレームのストリームを計算する必要がある(毎秒25フレーム超)。このために、最初にアプリケーションをタスクグラフに分割すると、異なる並列化技術が適用されます。実験結果は並列化法の効率を示す。例えばRaspberry Pi 4やHDビデオシーケンスでは、処理チェーンは毎秒42フレームに達するが、6ワットしか消費しない。 This article presents the methods used to parallelize a new computer vision application. The system is able to automatically detect meteors from non-stabilized cameras and noisy video sequences. The application is designed to be embedded in weather balloons or for airborne observation campaigns. Thus, the final target is a low power system-on-chip (< 10 Watts) while the software needs to compute a stream of frames in real-time (> 25 frames per second). For this, the application is first split into a task graph, then different parallelization techniques are applied. Experimental results demonstrate the efficiency of the parallelization methods. For instance, on the Raspberry Pi 4 and on an HD video sequence, the processing chain reaches 42 frames per second while it only consumes 6 Watts.	翻訳日:2023-07-21 14:18:49 公開日:2023-07-20
# Pluvio: トランスファーラーニングと条件変分情報ボトルネックによるドメイン外アーキテクチャとライブラリのアセンブリクローン検索 Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck ( http://arxiv.org/abs/2307.10631v1 ) ライセンス: Link先を確認	Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland	(参考訳) コード再利用の実践は、より速くより効率的な開発ライフサイクルのためにソフトウェア開発において不可欠です。しかし実際には、コードの再利用プラクティスは適切なコントロールを欠いているため、脆弱性の伝播や知的財産権侵害といった問題が発生する。重要なシフトライト防御メカニズムであるアセンブリクローン検索は、リリースされた実行ファイルの再利用による脆弱性のあるコードの識別に有効である。組立クローン探索に関する最近の研究は、異なるツールチェーンが生成する組立コード変種にマッチする機械学習ベースの手法を使う傾向を示している。しかしながら、これらのメソッドはトレーニングで使用される少数のツールチェーンの変種から学んだことに限定されており、見当たらないアーキテクチャと対応するコンパイルツールチェーンの変種には適用できない。本稿では,未知のアーキテクチャとライブラリを用いたアセンブリクローン探索の問題に関する最初の研究を行う。本研究は,大規模に訓練された自然言語モデルを用いて,集団クローン探索のための現在の学習に基づくアプローチに人間の共通知識を組み入れることを提案する。トランスファー学習は、アセンブリコードの人間の専門家から幅広い知識をもたらすことができるため、既存のアプローチの制限に対処するのに役立つ。さらに,不要かつ冗長なトークンを削除するために強化学習エージェントを提案することで,シーケンス制限問題にも対処する。新しい変分情報ボトルネック学習戦略と組み合わされ、提案システムはアーキテクチャの潜在的な指標と最適化設定への依存を最小化し、未発見のアーキテクチャをより一般化する。我々は,未解決のアーキテクチャクローン探索シナリオをシミュレートし,提案手法が最先端ソリューションに対して有効であることを示す。 The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. 
This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.	翻訳日:2023-07-21 14:18:37 公開日:2023-07-20
# 3次元分子前処理のためのフラクタルデノイング Fractional Denoising for 3D Molecular Pre-training ( http://arxiv.org/abs/2307.10683v1 ) ライセンス: Link先を確認	Shikun Feng and Yuyan Ni and Yanyan Lan and Zhi-Ming Ma and Wei-Ying Ma	(参考訳) coordinate denoisingは有望な3d分子前訓練法であり、様々な下流の薬物発見タスクで顕著な性能を達成した。理論的には、この目的は下流のタスクに有用な力場を学ぶことと等価である。それにもかかわらず、効果的な力場、すなわち、低カバレッジサンプルと等方力場を学ぶための座標化の課題は2つある。その根底にある理由は、既存の分極法によって仮定される分子分布が分子の異方性特性を捉えないからである。これらの課題に対処するために,二面角と座標の両方のノイズを含む,新しいハイブリッドノイズ戦略を提案する。しかし、そのようなハイブリッドノイズを伝統的な方法で発音することは、もはや力場を学ぶことと等価ではない。理論的推論により、この問題は共分散に対する入力コンホメーションの依存性によって引き起こされる。そこで本研究では,2種類の雑音を分離し,後者の座標部のみをデノー化する新しい分数デノージング法(frad)を設計することを提案する。このように、フラッドはより低エネルギーな構造をサンプリングする利点と力場等価性の両方を享受している。広範な実験により、分子表現におけるfradの有効性が示され、qm9の12のタスクのうち9つとmd17の8つのターゲットのうち7つに新しい状態が示された。 Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, namely low-coverage samples and an isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angle and coordinate. However, denoising such hybrid noise in a traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the input conformation for covariance. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. 
Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.	翻訳日:2023-07-21 14:11:29 公開日:2023-07-20
# 知識グラフ埋め込みに基づくパーソナライズされたレコメンダシステム A Personalized Recommender System Based-on Knowledge Graph Embeddings ( http://arxiv.org/abs/2307.10680v1 ) ライセンス: Link先を確認	Ngoc Luyen Le (Heudiasyc), Marie-H\'el\`ene Abel (Heudiasyc), Philippe Gouspillou	(参考訳) 知識グラフはオントロジーを用いてエンティティとその関係をモデル化するのに有効であることが証明されている。近年、知識グラフを情報モデリングの形式として利用することへの関心が高まり、レコメンダシステムへの採用が増加している。ユーザとアイテムを知識グラフに組み込むことで、これらのシステムはそれらの間の暗黙のつながりをよりよく捉え、より正確なレコメンデーションを提供することができる。本稿では,自動車購入/販売ドメインに適用した知識グラフを組み込んだパーソナライズされたレコメンデーションシステムの構築と提案を行う。実験の結果,提案手法が個々のユーザと整合性のあるレコメンデーションを提供することの有効性を示した。 Knowledge graphs have proven to be effective for modeling entities and their relationships through the use of ontologies. The recent emergence in interest for using knowledge graphs as a form of information modeling has led to their increased adoption in recommender systems. By incorporating users and items into the knowledge graph, these systems can better capture the implicit connections between them and provide more accurate recommendations. In this paper, we investigate and propose the construction of a personalized recommender system via knowledge graphs embedding applied to the vehicle purchase/sale domain. The results of our experimentation demonstrate the efficacy of the proposed method in providing relevant recommendations that are consistent with individual users.	翻訳日:2023-07-21 14:10:44 公開日:2023-07-20
# 雑音QRコードの分類のための深層学習 Deep learning for classification of noisy QR codes ( http://arxiv.org/abs/2307.10677v1 ) ライセンス: Link先を確認	Rebecca Leygonie (LIPADE), Sylvain Lobry (LIPADE), Laurent Wendling (LIPADE)	(参考訳) 我々は,視覚的に識別可能な対象を表現しない抽象画像に対して,ディープラーニングに基づく古典的分類モデルの限界を定義したい。QRコード(Quick Response codes)は,この抽象画像のカテゴリに分類される。抽象画像分類のための深層学習に基づくモデルの限界を理解するために,健康パス読取時に得られた情報から生成されたQRコードに基づく画像分類モデルを訓練する。雑音の存在下での分類モデルと古典的(決定論的)復号法を比較した。本研究は,深層学習に基づくモデルが抽象画像の理解に有効であると結論付けることを可能にする。 We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects. QR codes (Quick Response codes) fall into this category of abstract images: one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare a classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.	翻訳日:2023-07-21 14:10:23 公開日:2023-07-20
# ベイアーおよび非ベイヤパターン画像センサの効率的な統一デモサイシング Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors ( http://arxiv.org/abs/2307.10667v1 ) ライセンス: Link先を確認	Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun	(参考訳) 最近のCMOSイメージセンサー(CIS)の物理的サイズが小さくなるにつれて、最新のモバイルカメラは、隣接する画素を持つ均一な色ユニットからなる独自の非バイヤーカラーフィルタアレイ(例えば、Quad、Nona、QxQ)パターンを採用している。これらの非バイヤーセンサは、異なる光条件の画素ビンサイズが変更可能であるため、従来のバイエルCFAよりも優れているが、固有の画素パターン構造とセンサハードウェア特性により、分解時に視覚的アーティファクトを導入する可能性がある。従来はバイエルCFAに重点を置いており、照明条件が異なる様々なCFAモードの非ベイエルパターンCISを再現する必要がある。本研究では,従来のBayer RAWと,様々な非Bayer CFAのRAWデータに異なる動作モードで適用可能な,効率的な統一復調手法を提案する。我々の知識学習に基づく適応パターンの復調モデル、すなわちKLAPは、CFA毎にネットワーク内の1%のキーフィルタに対してCFA適応フィルタを利用するが、それでもすべてのCFAを効果的に復調し、大規模モデルに匹敵する性能をもたらす。さらに,推論中にメタラーニング(KLAP-M)を用いることで,実際のRAWデータから未知のセンサ生成物を排除し,合成画像と実センサRAWのギャップを効果的に埋めることができる。 KLAP法とKLAP-M法は,Bayer および非Bayer CFAの合成RAWデータと実RAWデータの両方において,最先端の復調性能を達成した。 As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. 
Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% key filters in the network for each CFA, but still manages to effectively demosaic all the CFAs, yielding comparable performance to the large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved state-of-the-art demosaicing performance in both synthetic and real RAW data of Bayer and non-Bayer CFAs.	翻訳日:2023-07-21 14:10:03 公開日:2023-07-20
# チェコ語ニューステキストの分類のためのデータセットと強力なベースライン A Dataset and Strong Baselines for Classification of Czech News Texts ( http://arxiv.org/abs/2307.10666v1 ) ライセンス: Link先を確認	Hynek Kydl\'i\v{c}ek, Jind\v{r}ich Libovick\'y	(参考訳) チェコの自然言語処理のための事前学習されたモデルは、純粋に言語的なタスク(タグづけ、解析、ner)や、感情分類や記事分類などの比較的単純な分類タスクで評価されることが多い。その代わり、チェコ最大の分類データセットの一つであるチェコ〜news~classification~dataset(cze-nec)を20年以上にわたるさまざまなソースのニュース記事から構成し、より厳密な評価を可能にする。我々は、ニュースソース、ニュースカテゴリ、推定著者の性別、週の日という4つの分類タスクを定義した。タスクの難易度を検証するために,人間による評価を行い,事前学習されたトランスフォーマーモデルに基づく強力な機械学習ベースラインに人間のパフォーマンスが遅れていることを明らかにした。さらに, 言語固有の事前学習エンコーダ解析が, 市販の大規模生成言語モデルよりも優れていることを示す。 Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present CZEch~NEws~Classification~dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoder analysis outperforms selected commercially available large-scale generative language models.	翻訳日:2023-07-21 14:09:22 公開日:2023-07-20
# 教師なし分解と強化によるnrfの照明 Lighting up NeRF via Unsupervised Decomposition and Enhancement ( http://arxiv.org/abs/2307.10664v1 ) ライセンス: Link先を確認	Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson WH. Lau	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、シーンの一連の画像と対応するカメラのポーズから、新しいビューを合成するための有望なアプローチである。しかし、低照度シーンから撮影された画像は、低画素強度、高ノイズ、色歪みのために、高品質な結果を得るためにNeRFモデルを訓練するのにはほとんど利用できない。従来の低照度画像強調法とNeRF法を併用しても,個々の2次元強調プロセスによる視界の整合性のためうまく動作しない。本稿では,srgbローライト画像から直接,シーン表現を強化し,ノーマル・ライト・ノベル・ビューを教師なしで合成する手法であるlow-light nerf(llnerf)を提案する。我々のアプローチの核心は、光界学習の分解であり、照明を強化し、ノイズを低減し、歪んだ色をnrf最適化プロセスと共同で補正することができる。本手法は,低照度シーンからの低ダイナミックレンジ(8bits/channel)画像の集合を考慮し,適切な照明と鮮やかな色と細部を付加した新しいビュー画像を生成することができる。実験の結果,提案手法は既存の低照度化法やNeRF法よりも優れていた。 Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement methods with NeRF methods also does not work well due to the view inconsistency caused by the individual 2D enhancement process. In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. The core of our approach is a decomposition of radiance field learning, which allows us to enhance the illumination, reduce noise and correct the distorted colors jointly with the NeRF optimization process. Our method is able to produce novel view images with proper lighting and vivid colors and details, given a collection of camera-finished low dynamic range (8-bits/channel) images from a low-light scene. Experiments demonstrate that our method outperforms existing low-light enhancement methods and NeRF methods.	翻訳日:2023-07-21 14:09:03 公開日:2023-07-20
# フェデレーション学習における共有性に関する調査 : モデルユーティリティ,プライバシリーク,コミュニケーション効率の展望 A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency ( http://arxiv.org/abs/2307.10655v1 ) ライセンス: Link先を確認	Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang	(参考訳) 連合学習(federated learning, ffl)は,プライバシを保護し,異なるパーティ間のコラボレーショントレーニングにおいて,極めて効果的なパラダイムとして浮上している。従来の集中型学習とは異なり、flはクライアントがプライベートなデータセットを公開することなく、プライバシーを保った情報を共有できる。このアプローチは、プライバシー保護を強化するだけでなく、複数の参加者によるより効率的で安全なコラボレーションを促進する。そのため、flは研究者からかなりの注目を集め、関連する研究をまとめるために多くの調査が進められている。しかしながら、これらの調査の大部分は、トレーニングプロセス中にモデルパラメータを共有する方法に集中し、他の形式のローカル情報を共有する可能性を見据えている。本稿では,FLで何を共有すべきかという新たな視点から,モデルユーティリティ,プライバシリーク,通信効率を重視した体系的な調査を行う。この調査は4つの異なる貢献によって以前の調査と異なる。まず、共有情報の3つのカテゴリ(モデル共有、合成データ共有、知識共有)を含む共有方法の観点から、fl法の新たな分類法を提案する。第2に,プライバシ攻撃に対するさまざまな共有方法の脆弱性を分析し,特定のプライバシ保証を提供する防御機構をレビューする。第3に、FLにおける様々な共有手法の性能と通信のオーバーヘッドを比較するための広範な実験を行う。さらに,様々な防御手法の有効性を比較しながら,モデルインバージョン攻撃とメンバーシップ推論攻撃によるプライバシー漏洩の可能性を評価する。最後に,現在の手法における潜在的な欠陥を議論し,今後の改善の方向性について概説する。 Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, promoting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. 
In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.	翻訳日:2023-07-21 14:08:40 公開日:2023-07-20
# SHAPの条件予測ネットワーク Conditional expectation network for SHAP ( http://arxiv.org/abs/2307.10654v1 ) ライセンス: Link先を確認	Ronald Richman and Mario V. W\"uthrich	(参考訳) 予測モデルを説明するための非常に一般的なモデルに依存しないテクニックは、SHAP(SHapley Additive exPlanation)である。 SHAPの最も一般的な2つのバージョンは条件付き期待バージョンと条件なし期待バージョン(後者は介入型SHAPとも呼ばれる)である。木ベースのメソッドを除いて、通常、非条件バージョンが使用される(計算上の理由から)。ニューラルネットワークと他の回帰モデルの両方の条件付きバージョンを効率的に計算し、特徴成分の依存構造を適切に考慮する(代理的な)ニューラルネットワークアプローチを提供する。この提案は,一般化線形モデル(GLM)と類似した複雑な回帰モデルにおいて,ドロップ1およびアノバ解析を提供することにも有用であり,特徴成分の適切な依存構造を考慮した部分依存プロット(PDP)を提供する。 A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.	翻訳日:2023-07-21 14:08:14 公開日:2023-07-20
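To make the distinction in the abstract above concrete, here is a minimal sketch of the unconditional (interventional) version of SHAP, computed exactly by enumerating feature subsets. It is not the surrogate-network method the paper proposes for the conditional version. The function name `interventional_shap` and the toy inputs are hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def interventional_shap(f, x, background):
    """Exact Shapley values of f at x under the unconditional value function.

    v(S) = average of f(z) over background rows, where z copies x on the
    features in S and the background row on all other features.
    """
    n = len(x)

    def value(S):
        total = 0.0
        for row in background:
            z = [x[j] if j in S else row[j] for j in range(n)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                S = set(subset)
                phi[i] += weight * (value(S | {i}) - value(S))
    return phi
```

Because features outside `S` are drawn from the background independently of `x`, this version ignores dependence between feature components. That is the gap the conditional version is meant to close. For a linear model the result reduces to coefficient times (feature value minus background mean), which makes the sketch easy to sanity-check.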
# 監視サービスにおける時系列自動異常検出のための最適化目標の検討 Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services ( http://arxiv.org/abs/2307.10653v1 ) ライセンス: Link先を確認	Manqing Dong and Zhanxiang Zhao and Yitong Geng and Wentao Li and Wei Wang and Huai Jiang	(参考訳) 信頼性の確保とシステムパフォーマンスの最適化を目的とした,大量のデータを扱う産業監視サービスでは,時系列異常検出が不可欠である。既存の手法では、広範囲のラベル付きリソースと手動パラメータの選択を必要とし、自動化の必要性を強調している。本稿では,時系列異常検出モデルにおけるパラメータ自動最適化のための包括的フレームワークを提案する。このフレームワークには,予測スコア,形状スコア,感度スコアという3つの最適化目標が導入されている。提案されたフレームワークは6ヶ月以上ネットで適用され、毎分5万回以上配信されている。ユーザエクスペリエンスをシンプルにするためには、期待された機密値のみを必要とし、ユーザフレンドリなインターフェースを提供し、望ましい検出結果を達成する。公開データセット上での広範な評価と他の手法との比較により,提案手法の有効性がさらに検証された。 Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.	翻訳日:2023-07-21 14:08:00 公開日:2023-07-20
# 自然言語処理研究の展望を探る Exploring the Landscape of Natural Language Processing Research ( http://arxiv.org/abs/2307.10652v1 ) ライセンス: Link先を確認	Tim Schopf, Karim Arabi, Florian Matthes	(参考訳) 自然言語テキストを理解し,生成し,処理するための効率的なアプローチとして,近年,自然言語処理(NLP)の研究が急速に広まり,広く採用されている。この分野での研究が増加していることを踏まえ、NLP関連のいくつかのアプローチが研究コミュニティで調査されている。しかし、確立したトピックを分類し、傾向を特定し、今後の研究分野を概説する総合的な研究は現在も残っていない。このギャップを埋めるため,aclアンソロジーに含まれる研究論文を体系的に分類・分析した。その結果,研究景観の構造化的概観,nlpにおける研究分野の分類,nlpにおける最近の展開の分析,知見の要約,今後の課題の方向性について概説する。 As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent to this day. Contributing to closing this gap, we have systematically classified and analyzed research papers included in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields-of-study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.	翻訳日:2023-07-21 14:07:45 公開日:2023-07-20
# 気候科学におけるグレンジャー因果関係のための状態空間モデルにおけるグラフ Graphs in State-Space Models for Granger Causality in Climate Science ( http://arxiv.org/abs/2307.10703v1 ) ライセンス: Link先を確認	Víctor Elvira, Émilie Chouzenoux, Jordi Cerdà, Gustau Camps-Valls	(参考訳) グレンジャー因果関係(GC)は、しばしば実際の因果関係とはみなされない。しかし、これはおそらく、ある時系列の別の時系列からの予測可能性を評価する最も広く使われている方法である。グレンジャー因果関係は神経科学や計量経済学から地球科学まで、多くの応用分野で広く用いられている。我々は、状態空間モデルのグラフィカルな視点でGCを再考する。そこで我々は,線形ガウス状態空間モデルの状態方程式における線形行列作用素を推定するための,最近提案された期待値最大化アルゴリズムであるGraphEMを用いた。ラッソ正則化はMステップに含まれ,このステップは近接分離Douglas-Rachfordアルゴリズムを用いて解かれる。おもちゃの例と困難な気候問題における実験は、標準的なグレンジャー因果関係法に対する提案モデルと推論手法の利点を示している。 Granger causality (GC) is often considered not an actual form of causality. Still, it is arguably the most widely used method to assess the predictability of a time series from another one. Granger causality has been widely used in many applied disciplines, from neuroscience and econometrics to Earth sciences. We revisit GC under a graphical perspective of state-space models. For that, we use GraphEM, a recently presented expectation-maximisation algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. Lasso regularisation is included in the M-step, which is solved using a proximal splitting Douglas-Rachford algorithm. Experiments in toy examples and challenging climate problems illustrate the benefits of the proposed model and inference technique over standard Granger causality methods.	翻訳日:2023-07-21 14:02:06 公開日:2023-07-20
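For readers unfamiliar with the "standard Granger causality methods" that the abstract above uses as a baseline, the following is a minimal sketch of the basic idea only, not of GraphEM and not the authors' code: a series x is said to Granger-cause y if adding the past of x reduces the error of predicting y from its own past. The lag order of 1, the absence of an intercept, the function names and the toy data are all assumptions made for illustration.

```python
import random

def rss_restricted(y):
    """Residual sum of squares of the model y_t = a * y_{t-1} (least squares)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    a = num / den
    return sum((y[t] - a * y[t - 1]) ** 2 for t in range(1, len(y)))

def rss_unrestricted(y, x):
    """Residual sum of squares of the model y_t = a * y_{t-1} + b * x_{t-1}."""
    n = len(y)
    syy = sum(y[t - 1] ** 2 for t in range(1, n))
    sxx = sum(x[t - 1] ** 2 for t in range(1, n))
    sxy = sum(x[t - 1] * y[t - 1] for t in range(1, n))
    ty = sum(y[t] * y[t - 1] for t in range(1, n))
    tx = sum(y[t] * x[t - 1] for t in range(1, n))
    # Solve the 2x2 normal equations in closed form.
    det = syy * sxx - sxy ** 2
    a = (ty * sxx - tx * sxy) / det
    b = (tx * syy - ty * sxy) / det
    return sum((y[t] - a * y[t - 1] - b * x[t - 1]) ** 2 for t in range(1, n))

def granger_gain(y, x):
    """Fraction of the restricted residual error removed by adding lagged x."""
    restricted = rss_restricted(y)
    return (restricted - rss_unrestricted(y, x)) / restricted

# Hypothetical toy data: y is driven by the previous value of x, not vice versa.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 500)]

print(granger_gain(y, x))  # large: lagged x explains most of y
print(granger_gain(x, y))  # small: lagged y does not help predict x
```

A real analysis would use several lags and a statistical test (for example an F-test) on the reduction in error; the ratio above is only meant to show what is being compared.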
# 社会が形成・形成する大規模言語モデル:arXiv出版パターンの調査 Large language models shape and are shaped by society: A survey of arXiv publication patterns ( http://arxiv.org/abs/2307.10700v1 ) ライセンス: Link先を確認	Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson	(参考訳) 近年、大規模言語モデル(llm)の論文数が急増し、書誌分析によってほとんど文書化されていない科学的景観に劇的な変化をもたらした。ここでは、CSとStat arXivsに投稿された388Kの論文を分析し、2023年と2018-2022年の出版パターンの変化に注目した。本稿は, LLM論文の割合の増大, LLM論文の執筆者, LLM論文の執筆者, LLM論文の背景と著者研究の関連, 高度に引用された LLM 論文を区別する要因, 国際協力のパターンについて分析する。 LLM研究は、コンピュータと社会に関する論文の割合が18倍に増加しており、新たに出版されている著者は、より経験豊富な著者よりも、アプリケーションや社会への影響に重点を置いている可能性が高い。 LLM研究は、LLM著者がフォーカスするトピックにおけるジェンダーと学術的/産業的格差、そしてコラボレーションネットワークにおける米国と中国の分裂を文書化する。概して、我々の分析は、llmが社会によって形と形の両方を研究する深い方法を文書化しており、社会学的レンズの必要性を証明している。 There has been a steep recent increase in the number of large language model (LLM) papers, producing a dramatic shift in the scientific landscape which remains largely undocumented through bibliometric analysis. Here, we analyze 388K papers posted on the CS and Stat arXivs, focusing on changes in publication patterns in 2023 vs. 2018-2022. We analyze how the proportion of LLM papers is increasing; the LLM-related topics receiving the most attention; the authors writing LLM papers; how authors' research topics correlate with their backgrounds; the factors distinguishing highly cited LLM papers; and the patterns of international collaboration. We show that LLM research increasingly focuses on societal impacts: there has been an 18x increase in the proportion of LLM-related papers on the Computers and Society sub-arXiv, and authors newly publishing on LLMs are more likely to focus on applications and societal impacts than more experienced authors. LLM research is also shaped by social dynamics: we document gender and academic/industry disparities in the topics LLM authors focus on, and a US/China schism in the collaboration network. 
Overall, our analysis documents the profound ways in which LLM research both shapes and is shaped by society, attesting to the necessity of sociotechnical lenses.	翻訳日:2023-07-21 14:01:54 公開日:2023-07-20
# Lu-N-H系における近環境超伝導の実現可能性の評価 Assessing the feasibility of near-ambient conditions superconductivity in the Lu-N-H system ( http://arxiv.org/abs/2307.10699v1 ) ライセンス: Link先を確認	Yue-Wen Fang, Đorđe Dangić, Ion Errea	(参考訳) 窒素添加水素化ルテチウム(Lu-N-H)における近環境超伝導の最近の報告は大きな関心を集めている。しかし、相反する結果が超伝導に疑問を投げかけている。本稿では,ハイスループット結晶構造予測と超伝導臨界温度($T_c$)の高速予測器を組み合わせ,Lu-N-Hの1 GPaにおける特性に光を当てる。予測された構造はいずれも高温超伝導を支える可能性を示しておらず、窒素の含有は絶縁相の出現を好んでいる。近環境超伝導の欠如にもかかわらず、代替準安定テンプレートを検討し、その$T_c$と量子アンハーモニック効果を含む動的安定性について検討する。立方晶Lu$_4$H$_{11}$Nは20 GPaで100 Kという高い$T_c$を示し、親物質LuH$_3$で得られた30 Kに比べて大きく増加する。興味深いことに、実験で観察されたものと似たX線パターンを持つ。 LaH$_{10}$-like LuH$_{10}$とCaH$_6$-like LuH$_6$はそれぞれ175 GPaと100 GPaで高温超伝導体となり、$T_c$はそれぞれ286 K、246 Kとなる。本研究により, 高温超伝導は近環境圧力下での安定相では不可能であるが,中程度および高い圧力では準安定な高$T_c$テンプレートが存在することが示唆された。 The recent report of near-ambient superconductivity in nitrogen-doped lutetium hydrides (Lu-N-H) has generated great interest. However, conflicting results have raised doubts regarding superconductivity. Here, we combine high-throughput crystal structure predictions with a fast predictor of the superconducting critical temperature ($T_c$) to shed light on the properties of Lu-N-H at 1 GPa. None of the predicted structures shows the potential to support high-temperature superconductivity and the inclusion of nitrogen favors the appearance of insulating phases. Despite the lack of near-ambient superconductivity, we consider alternative metastable templates and study their $T_c$ and dynamical stability including quantum anharmonic effects. The cubic Lu$_4$H$_{11}$N exhibits a high $T_c$ of 100 K at 20 GPa, a large increase compared to 30 K obtained in its parent LuH$_3$. Interestingly, it has a similar X-ray pattern to the experimentally observed one. The LaH$_{10}$-like LuH$_{10}$ and CaH$_6$-like LuH$_6$ become high-temperature superconductors at 175 GPa and 100 GPa, with $T_c$ of 286 K and 246 K, respectively. 
Our findings suggest that high-temperature superconductivity is not possible in stable phases at near-ambient pressure, but metastable high-$T_c$ templates exist at moderate and high pressures.	翻訳日:2023-07-21 14:01:26 公開日:2023-07-20
# 逆知識蒸留:限定データを用いた網膜画像マッチングのための小型モデルによる大規模モデルの訓練 Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data ( http://arxiv.org/abs/2307.10698v1 ) ライセンス: Link先を確認	Sahar Almahfouz Nasser, Nihar Gupte, and Amit Sethi	(参考訳) 網膜画像マッチングは、疾患の進行と治療反応のモニタリングにおいて重要な役割を果たす。しかしながら、時間的に離れた画像のペア間で一致したキーポイントを持つデータセットは、トランスフォーマーベースのモデルのトレーニングには不十分である。本稿では, オーバーフィッティングを防止しつつ, 限られたデータで大規模モデルを訓練するための, 逆知識蒸留に基づく新しい手法を提案する。まず,一般公開されたデータセット上での結果を改善するために,SuperRetinaと呼ばれるCNNベースの半教師付き手法のアーキテクチャ修正を提案する。次に,より軽量なCNNベースのモデルを用いて,視覚トランスフォーマーエンコーダに基づく計算量のより重いモデルを訓練する。これは,重いモデルに基づいて軽いモデルを訓練するのが通例である知識蒸留研究の分野では直観に反する。驚くべきことに、このような逆知識蒸留は一般化をさらに改善する。実験により,表現空間における高次元のフィッティングは,最終出力に直接適合させる訓練とは異なり,過学習を防止できる可能性が示唆された。また、網膜画像のキーポイント検出とマッチングのためのアノテーションを付加したパブリックデータセットを提供し、網膜画像応用のためのアルゴリズムの開発を支援する。 Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. Firstly, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that help us improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field of knowledge-distillation research, where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output. 
We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.	翻訳日:2023-07-21 14:00:57 公開日:2023-07-20
# SqueezerFaceNet: フィルタプルーニングによる小型顔認識CNNのさらなる削減 SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning ( http://arxiv.org/abs/2307.10697v1 ) ライセンス: Link先を確認	Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun	(参考訳) 様々なデジタルサービスでモバイルデバイスが広く使われるようになると、信頼性とリアルタイムの人物認証の必要性が高まった。このような状況下では、モバイルデバイスにおけるカメラの普及と日常アプリケーションへの統合により、顔認識技術がユーザ認証の信頼性の高い方法として出現している。深層畳み込みニューラルネットワーク(CNN)の急速な進歩は、多数の顔認証アーキテクチャを生み出した。しかし、これらのモデルはモバイルアプリケーションには大きめで実用的ではないことが多く、数百万のパラメータを持つ数百メガバイトに達する。我々は,100万パラメータ未満の軽量顔認識ネットワークであるSqueezerFaceNetを開発し,この問題に対処する。これはTaylorスコアに基づくネットワークプルーニング手法を適用し、重要度の低いフィルタを反復的に除去することで実現される。 SqueezeNetをベースとする既に小さなネットワーク(1.24Mパラメータ)から始めると、性能を大きく損なうことなく、さらに(最大40%まで)削減できることが分かる。我々の知る限り,顔認識タスクに対してネットワークプルーニング手法を評価したのは本研究が初めてである。 The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a light face recognition network with less than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.	翻訳日:2023-07-21 14:00:36 公開日:2023-07-20
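The pruning loop described in the abstract above (score each filter, remove the least important ones, repeat) can be sketched as follows. This is a hedged illustration, not the SqueezerFaceNet implementation: the abstract does not give the exact score, so the first-order form used here (absolute mean of activation times gradient per filter) is an assumption, and the function names and toy numbers are hypothetical.

```python
def taylor_scores(activations, gradients):
    """One score per filter: |mean(activation * gradient)| over that filter's outputs.

    activations[i] and gradients[i] are flat lists holding the outputs of
    filter i and the loss gradients with respect to those outputs.
    """
    scores = []
    for act, grad in zip(activations, gradients):
        products = [a * g for a, g in zip(act, grad)]
        scores.append(abs(sum(products) / len(products)))
    return scores

def prune_step(kept, scores, n_remove):
    """Drop the n_remove kept filters with the smallest scores; return survivors."""
    ranked = sorted(kept, key=lambda i: scores[i])
    return sorted(ranked[n_remove:])

# Hypothetical toy values for a layer with three filters.
activations = [[1.0, 2.0], [0.1, -0.1], [3.0, 1.0]]
gradients = [[0.5, 0.5], [0.2, 0.2], [-1.0, -1.0]]

scores = taylor_scores(activations, gradients)
print(prune_step([0, 1, 2], scores, n_remove=1))  # filter 1 has the smallest score
```

In an iterative scheme the scores would be recomputed on fresh batches after each removal (usually with some fine-tuning in between), rather than computed once as in this toy run.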
# SLPD:WSIのスライドレベル原型蒸留 SLPD: Slide-level Prototypical Distillation for WSIs ( http://arxiv.org/abs/2307.10696v1 ) ライセンス: Link先を確認	Zhimiao Yu, Tiancheng Lin, Yi Xu	(参考訳) 特徴表現能力の向上は、多くのスライド病理画像(WSI)タスクの基礎となっている。最近の研究は、病理特異的自己教師型学習(SSL)において大きな成功を収めている。しかし、その多くはパッチレベルの表現を学ぶことだけに焦点を当てているため、プリテキストとスライドレベルのダウンストリームタスク、例えばサブタイプ、グレーディング、ステージングの間にはギャップがある。スライドレベルの表現を目指して,WSI 上でのコンテキストモデリングのためのスライディング内およびスライディング間セマンティック構造を探索するために,SLPD (Slide-Level Prototypeal Distillation) を提案する。具体的には、各wsi内の領域(4096x4096パッチ)に対して反復的にスライダー内クラスタリングを行い、プロトタイプを作成し、割り当てられたプロトタイプに近い領域表現を奨励する。各スライドをプロトタイプで表現することで、プロトタイプのセット距離によって類似したスライドを選択し、蒸留のためのクロススライダープロトタイプで領域を割り当てる。 SLPDは、複数のスライドレベルのベンチマークで最先端の結果を達成し、スライドのセマンティックな構造の表現学習がWSI分析に適したプロキシタスクを実現できることを示した。コードはhttps://github.com/Carboxy/SLPD.comから入手できる。 Improving the feature representation ability is the foundation of many whole slide pathological image (WSIs) tasks. Recent works have achieved great success in pathological-specific self-supervised learning (SSL). However, most of them only focus on learning patch-level representations, thus there is still a gap between pretext and slide-level downstream tasks, e.g., subtyping, grading and staging. Aiming towards slide-level representations, we propose Slide-Level Prototypical Distillation (SLPD) to explore intra- and inter-slide semantic structures for context modeling on WSIs. Specifically, we iteratively perform intra-slide clustering for the regions (4096x4096 patches) within each WSI to yield the prototypes and encourage the region representations to be closer to the assigned prototypes. By representing each slide with its prototypes, we further select similar slides by the set distance of prototypes and assign the regions by cross-slide prototypes for distillation. SLPD achieves state-of-the-art results on multiple slide-level benchmarks and demonstrates that representation learning of semantic structures of slides can make a suitable proxy task for WSI analysis. 
Code will be available at https://github.com/Carboxy/SLPD.	翻訳日:2023-07-21 14:00:18 公開日:2023-07-20
# Self2Self+: 自己監督型学習と画像品質評価の損失を伴い、単一イメージのDenoising Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss ( http://arxiv.org/abs/2307.10695v1 ) ライセンス: Link先を確認	Jaekyun Ko and Sanghwan Lee	(参考訳) 近年,教師付き学習に基づく校正手法が有望な性能を示している。しかし、ノイズクリーンなイメージペアを含む外部データセットへの依存は、適用性を制限する。この制限に対処するため、研究者はノイズの多い入力のみを使用して、デノナイジングネットワークのトレーニングに焦点を合わせてきた。そこで本研究では,ノイズの多い入力画像のみをネットワークトレーニングに用いる単一画像の自己教師型学習手法を提案する。ゲート畳み込みは特徴抽出に用いられ,無基準画像品質評価は訓練過程の指導に用いられた。さらに,Bernulliサンプルを用いて入力画像データセットからサンプルをサンプリングし,一定のドロップアウト率でトレーニングを行った。対応する結果は、トレーニングされたネットワークのさまざまなインスタンスから生成された予測をドロップアウトで平均することで得られた。実験の結果,提案手法は合成データと実世界データの両方において最先端のデノイジング性能を達成した。このことは,様々なノイズ除去タスクに対する潜在的な解決策として,本手法の有効性と実用性を強調している。 Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.	翻訳日:2023-07-21 13:59:54 公開日:2023-07-20
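The abstract above mentions two ingredients that are easy to illustrate in isolation: Bernoulli sampling of the single noisy input, and averaging the predictions from several stochastic passes. The sketch below shows only that pattern on a 1-D signal. It is not the Self2Self+ network: `toy_predict` is a hypothetical stand-in for the trained model, input masking stands in for both the Bernoulli-sampled instances and dropout at inference, and all names and parameter values are assumptions.

```python
import random

def bernoulli_mask(n, keep_prob, rng):
    """Keep each sample independently with probability keep_prob (1 = kept)."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n)]

def toy_predict(signal, mask):
    """Stand-in for a trained network: predict each sample from kept neighbours."""
    n = len(signal)
    fallback = sum(signal) / n  # used when no sample in the window was kept
    out = []
    for i in range(n):
        window = [signal[j] for j in range(max(0, i - 2), min(n, i + 3)) if mask[j]]
        out.append(sum(window) / len(window) if window else fallback)
    return out

def averaged_denoise(signal, keep_prob=0.7, passes=50, seed=0):
    """Average the predictions obtained from several independent Bernoulli masks."""
    rng = random.Random(seed)
    total = [0.0] * len(signal)
    for _ in range(passes):
        pred = toy_predict(signal, bernoulli_mask(len(signal), keep_prob, rng))
        total = [t + p for t, p in zip(total, pred)]
    return [t / passes for t in total]

print(averaged_denoise([0.0, 1.0, 0.0, 1.0, 0.0, 1.0]))
```

Averaging over many masks reduces the variance that any single random mask introduces, which is the reason for the ensemble step.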
# 確率的プログラミングを用いた知的仮想エージェントのためのアーキテクチャフレームワーク Towards an architectural framework for intelligent virtual agents using probabilistic programming ( http://arxiv.org/abs/2307.10693v1 ) ライセンス: Link先を確認	Anton Andreev (GIPSA-Services), Grégoire Cattan	(参考訳) 我々は,ECA(Embodied conversational agent)を考案・構築するためのKorraAIと呼ばれる新しいフレームワークを提案する。本フレームワークは,環境情報やインタラクション時間,ヒューマンインタラクションパートナーが提供する不確実な情報など,コンテキスト情報を考慮したECAの振る舞いをモデル化する。さらに、KorraAIで構築されたエージェントは、人間のパートナーとの対話を開始することができるため、積極的な行動を示すことができる。これらの目的のために、KorraAIは確率的プログラミングを利用する。 KorraAIの確率モデルは、その振る舞いとユーザとのインタラクションをモデル化するために使用される。ユーザの好みに適応し、ECAにおける一定の不確定性を実現し、より自然な振る舞いを実現する。ムード、嗜好、感情(サプライズなど)のような人間のような内部状態は、分布やベイジアンネットワークと共にKorraAIでモデル化することができる。これらのモデルは、ユーザと対話することなく、時間とともに進化することができる。 ECAモデルはプラグインとして実装され、共通のインターフェースを共有する。これにより、ECAデザイナは、モデリングしているキャラクタをより重視し、技術的な詳細に注目するだけでなく、ECAモデルを保存および交換することが可能になる。仮想セールスエージェント、カスタマーサービスエージェント、仮想コンパニオン、芸能人、家庭教師など、KorraAI ECAのいくつかの応用が可能である。 We present a new framework called KorraAI for conceiving and building embodied conversational agents (ECAs). Our framework models ECAs' behavior considering contextual information, for example, about environment and interaction time, and uncertain information provided by the human interaction partner. Moreover, agents built with KorraAI can show proactive behavior, as they can initiate interactions with human partners. For these purposes, KorraAI exploits probabilistic programming. Probabilistic models in KorraAI are used to model its behavior and interactions with the user. They enable adaptation to the user's preferences and a certain degree of indeterminism in the ECAs to achieve more natural behavior. Human-like internal states, such as moods, preferences, and emotions (e.g., surprise), can be modeled in KorraAI with distributions and Bayesian networks. These models can evolve over time, even without interaction with the user. ECA models are implemented as plugins and share a common interface. 
This enables ECA designers to focus more on the character they are modeling and less on the technical details, as well as to store and exchange ECA models. Several applications of KorraAI ECAs are possible, such as virtual sales agents, customer service agents, virtual companions, entertainers, or tutors.	翻訳日:2023-07-21 13:59:38 公開日:2023-07-20
# 解集合プログラミングによる有界組合せ再構成 Bounded Combinatorial Reconfiguration with Answer Set Programming ( http://arxiv.org/abs/2307.10688v1 ) ライセンス: Link先を確認	Yuya Yamada, Mutsunori Banbara, Katsumi Inoue, Torsten Schaub	(参考訳) 本稿では, Answer Set Programming (ASP) に基づく組合せ再構成問題の解法として, 有界組合せ再構成(bounded combinatorial reconfiguration) という手法を開発した。一般的な課題は、ソース組合せ問題の解空間を研究し、特別な性質を持つ実現可能な解列が存在するかどうかを決定することである。コンストラクションソルバは、直近の国際コンペ(CoRe Challenge 2022)において、コンストラクショントラックのすべてのメトリクスをカバーしている。コンストラゴはシングルエンジンソルバトラックの最短距離で1位にランクインした。本稿では,有界組合せ再構成の設計と実装について述べるとともに,最も研究されている組合せ再構成問題の一つである独立集合再構成問題のASPエンコーディングについて述べる。最後に,CoRe Challenge 2022のすべての事例を考慮した実証分析を行った。 We develop an approach called bounded combinatorial reconfiguration for solving combinatorial reconfiguration problems based on Answer Set Programming (ASP). The general task is to study the solution spaces of source combinatorial problems and to decide whether or not there are sequences of feasible solutions that have special properties. The resulting recongo solver covers all metrics of the solver track in the most recent international competition on combinatorial reconfiguration (CoRe Challenge 2022). recongo ranked first in the shortest metric of the single-engine solvers track. In this paper, we present the design and implementation of bounded combinatorial reconfiguration, and present an ASP encoding of the independent set reconfiguration problem that is one of the most studied combinatorial reconfiguration problems. Finally, we present empirical analysis considering all instances of CoRe Challenge 2022.	翻訳日:2023-07-21 13:59:18 公開日:2023-07-20
# Pre-train, Adapt and Detect: カモフラージュ物体検出のためのマルチタスクアダプタチューニング Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection ( http://arxiv.org/abs/2307.10685v1 ) ライセンス: Link先を確認	Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang	(参考訳) カモフラージュ物体検出(COD)は、背景に類似したパターンを示すカモフラージュ物体をセグメント化することを目的とした困難なタスクである。既存のほとんどの研究は、完全かつ細部まで精密にカモフラージュ物体を特定するための特別なモジュールの確立に特化しているが、オブジェクト関連のセマンティクスが欠如しているため、境界をうまく特定できない。本稿では,カモフラージュ物体を検出するための新しい「pre-train, adapt and detect」パラダイムを提案する。大規模事前学習モデルを導入することで、大量のマルチモーダルデータから学んだ豊富な知識をCODに直接転送することができる。下流CODタスクに適した特徴に調整するために、軽量並列アダプタを挿入する。 4つの挑戦的なベンチマークデータセットに対する大規模な実験により、我々の手法は既存の最先端のCODモデルよりも大きなマージンで優れていることが示された。さらに,異なるセマンティッククラス間で共有可能な知識を活用するために,アダプタをチューニングするためのマルチタスク学習方式を設計する。総合的な実験結果から,本モデルの一般化能力は,ソースタスクのマルチタスクアダプタ初期化とターゲットタスクのマルチタスク適応により大幅に向上できることがわかった。 Camouflaged object detection (COD), aiming to segment camouflaged objects which exhibit similar patterns with the background, is a challenging task. Most existing works are dedicated to establishing specialized modules to identify camouflaged objects with complete and fine details, while the boundary cannot be well located owing to the lack of object-related semantics. In this paper, we propose a novel "pre-train, adapt and detect" paradigm to detect camouflaged objects. By introducing a large pre-trained model, abundant knowledge learned from massive multi-modal data can be directly transferred to COD. A lightweight parallel adapter is inserted to adjust the features suitable for the downstream COD task. Extensive experiments on four challenging benchmark datasets demonstrate that our method outperforms existing state-of-the-art COD models by large margins. Moreover, we design a multi-task learning scheme for tuning the adapter to exploit the shareable knowledge across different semantic classes. 
Comprehensive experimental results showed that the generalization ability of our model can be substantially improved with multi-task adapter initialization on source tasks and multi-task adaptation on target tasks.	翻訳日:2023-07-21 13:59:04 公開日:2023-07-20
# ベル対角状態の特異な絡み合い構造を示すワイル・ハイゼンベルクベル基底の特殊特性 Special features of the Weyl-Heisenberg Bell basis imply unusual entanglement structure of Bell-diagonal states ( http://arxiv.org/abs/2307.10727v1 ) ライセンス: Link先を確認	Christopher Popp and Beatrix C. Hiesmayr	(参考訳) 最大絡み合いベル状態は、量子情報科学において絡み合いに基づく方法にとって重要である。通常、ワイル・ハイゼンベルク作用素による完全正規直交ベル基底の標準構成を考える。これらの演算子の群構造は、誤差補正スキームやベル対角状態の絡み合い構造に強い影響を与えることを示す。特に、これはパウリチャネルとトワールチャネルの等価性を意味する。興味深いことに、他の完全正規直交ベル基底はこの等価性を破り、例えばPPTエンタングル状態の割合において全く異なる絡み合い構造をもたらす。詳しくは,標準ベル基底は,他のベル基底と比較して,PPT状態とPPTエンタングル状態の割合が観測された中で最も高いことがわかった。結論として,標準ベル基底構造は,偏差を考慮した場合の量子情報理論プロトコルに強い意味を持つ,非常に特殊な構造を生かしている。 Maximally entangled Bell states are of crucial importance for entanglement based methods in quantum information science. Typically, a standard construction of a complete orthonormal Bell-basis by Weyl-Heisenberg operators is considered. We show that the group structure of these operators has strong implications for error correction schemes and for the entanglement structure within Bell-diagonal states. In particular, it implies an equivalence between a Pauli channel and a twirl channel. Interestingly, other complete orthonormal Bell-bases do break the equivalence and lead to a completely different entanglement structure, for instance in the share of PPT-entangled states. In detail, we find that the standard Bell basis has the highest observed share of PPT-states and PPT-entangled states compared to other Bell bases. In summary, our findings show that the standard Bell basis construction exploits a very special structure with strong implications for quantum information theoretic protocols if a deviation is considered.	翻訳日:2023-07-21 13:51:01 公開日:2023-07-20
# LLM検閲: 機械学習の課題か、それともコンピュータセキュリティの問題か? LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? ( http://arxiv.org/abs/2307.10719v1 ) ライセンス: Link先を確認	David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan	(参考訳) 大規模言語モデル(LLM)は複雑な命令を解釈する際、印象的な能力を示した。しかし、提供指示に対する盲目な遵守は、悪意ある使用の危険性に関する懸念につながっている。 LLMを用いたモデル微調整や出力検閲のような既存の防御機構は、まだ問題のある応答を生成できるため、失敗することが証明されている。一般的な検閲アプローチでは、この問題を機械学習の問題として扱い、LLM出力における望ましくないコンテンツを検出するために別のLMに依存している。本稿では,このようなセマンティック検閲手法の理論的限界について述べる。具体的には,semantic censorship が決定不能な問題として認識される可能性を示し,llms のプログラム的および命令追従機能に起因する検閲の固有の課題を浮き彫りにする。さらに我々は、知識のある攻撃者が許容可能なものの集合から許容できない出力を再構築できるため、これらの課題は意味的な検閲を超えて広がると主張する。その結果、検閲の問題は再評価されるべきであり、潜在的なリスクを軽減するためのセキュリティベースのアプローチの適応を保証するセキュリティ問題として扱われるべきである。 Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. 
As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.	翻訳日:2023-07-21 13:50:46 公開日:2023-07-20
# 硬質試料とノイズラベル試料の差異に関する実証的研究 Differences Between Hard and Noisy-labeled Samples: An Empirical Study ( http://arxiv.org/abs/2307.10718v1 ) ライセンス: Link先を確認	Mahsa Forouzesh and Patrick Thiran	(参考訳) ラベル付きデータセットからノイズや誤ったラベル付きサンプルをハード/ディフルトサンプルで抽出することは、重要だが未調査のトピックである。 2つの一般的な、しばしば独立した作業ラインが存在し、1つはノイズラベルへの対処に焦点を当て、もう1つはハードサンプルを扱う。しかし、両方のデータが存在する場合、既存のほとんどのメソッドはそれらを等しく扱い、結果としてモデル全体の性能が低下する。本稿では,まず,異なるサンプルに対して,カスタムハードネスとノイズレベルを有する各種合成データセットを設計する。提案する系統的実証研究により,本研究の類似性がよりよく理解され,また,難解なサンプルと不正確なラベル付きサンプルとの相違がより重要となる。これらの制御された実験は、硬度と雑音のサンプルを区別する手法の開発の道を開く。そこで本研究では,硬い試料を保ちながら雑音に満ちた試料をフィルタする簡易かつ効果的な測定法を提案する。本研究では,ラベルノイズが存在する場合の様々なデータ分割手法について検討し,提案手法を用いてハードサンプルからのノイズサンプルをフィルタリングし,フィルタ付きデータセット上でモデルをトレーニングした結果,高いテスト精度が得られたことを証明した。生成した合成データセットと実世界のラベルノイズのあるデータセットの両方でこれを実証する。さらに,提案手法は,半教師付き学習フレームワークで使用する場合,他の手法を大きく上回っている。 Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and another deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and more importantly the differences between hard-to-learn samples and incorrectly-labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. 
We study various data partitioning methods in the presence of label noise and observe that filtering out noisy samples from hard samples with this proposed metric results in the best datasets as evidenced by the high test accuracy achieved after models are trained on the filtered datasets. We demonstrate this for both our created synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.	翻訳日:2023-07-21 13:50:27 公開日:2023-07-20
# 決定的かつ快適な行動計画のためのリスクシャドーイングの導入 Introducing Risk Shadowing For Decisive and Comfortable Behavior Planning ( http://arxiv.org/abs/2307.10714v1 ) ライセンス: Link先を確認	Tim Puphal and Julian Eggert	(参考訳) 都市運転におけるグループインタラクションの問題を考える。自動運転車の最先端の行動プランナーは、主に、他のエージェントと衝突しないなどのエゴエージェントの最適な行動を見つけるために、コスト関数で各エージェントとエージェントの相互作用を個別に検討する。本稿では,3つのエージェント間のグループ間相互作用を分析することで,単一インタラクションを超越できる状況理解手法であるリスクシャドーイングを開発する。具体的には、この第1のエージェントは、第2のエージェントが邪魔しているため、egoエージェントに到達できないため、egoエージェントの行動プランナーで考慮する必要のない第1のエージェントを見つけ出すことができる。実験では,リスクシャドーイングを行動プランナの上流フィルタモジュールとして用いることで,これらの場合の安全性が保証されることから,より決定的かつ快適な運転戦略を計画できることを示した。このアプローチのユーザビリティは,異なる交差点シナリオと縦方向駆動に対して実証される。 We consider the problem of group interactions in urban driving. State-of-the-art behavior planners for self-driving cars mostly consider each single agent-to-agent interaction separately in a cost function in order to find an optimal behavior for the ego agent, such as not colliding with any of the other agents. In this paper, we develop risk shadowing, a situation understanding method that allows us to go beyond single interactions by analyzing group interactions between three agents. Concretely, the presented method can find out which first other agent does not need to be considered in the behavior planner of an ego agent, because this first other agent cannot reach the ego agent due to a second other agent obstructing its way. In experiments, we show that using risk shadowing as an upstream filter module for a behavior planner allows to plan more decisive and comfortable driving strategies than state of the art, given that safety is ensured in these cases. The usability of the approach is demonstrated for different intersection scenarios and longitudinal driving.	翻訳日:2023-07-21 13:50:03 公開日:2023-07-20
# Kick Back & Relax: SlowTVで世界を再構築する方法を学ぶ Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV ( http://arxiv.org/abs/2307.10713v1 ) ライセンス: Link先を確認	Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden	(参考訳) 自己教師付き単眼深度推定(ss-mde)は、膨大なデータにスケールする可能性がある。残念ながら、既存のアプローチは自動車領域に限定しており、自然環境や屋内環境といった複雑な環境に一般化できない。そこで我々は,既存の自動車用データセットよりも桁違いに多くのデータを含む,youtubeから収集した大規模slowtvデータセットを提案する。 SlowTVは、世界の季節的ハイキング、観光運転、スキューバダイビングなど、多様な環境からの1.7Mイメージを含んでいる。このデータセットを用いて、屋内/屋外の大量のデータセットにゼロショットの一般化を提供するSS-MDEモデルを訓練する。結果として得られたモデルは、より効率的なアーキテクチャを使用しても、既存のSSLアプローチをすべて上回り、教師付きSoTAのギャップを埋める。さらに,性能とゼロショット一般化をさらに最大化するために,ベストプラクティスのコレクションも導入する。これには 1)アスペクト比の増大 2)カメラ固有の推定 3)フレームランダム化とサポート 4) 柔軟な動き推定。コードはhttps://github.com/jspenmar/slowtv_monodepthで入手できる。 Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.	翻訳日:2023-07-21 13:49:28 公開日:2023-07-20
# AdjointDPM: 拡散確率モデルの勾配バックプロパゲーションのための随伴感度法 AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models ( http://arxiv.org/abs/2307.10711v1 ) ライセンス: Link先を確認	Jiachun Pan, Hanshu Yan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng	(参考訳) 既存のカスタマイズ方法は、事前訓練された拡散確率モデル(DPM)をユーザが提供する概念に合わせるために、複数の参照例にアクセスする必要がある。本論文は、生成コンテンツ上で定義された微分可能な指標が唯一利用可能な監督信号である場合のDPMカスタマイズの課題を解決することを目的とする。 DPMのサンプリング手順はデノイジングUNetの再帰的な呼び出しを伴うため、素朴な勾配バックプロパゲーションでは全てのイテレーションの中間状態を格納する必要があり、メモリ消費が非常に高い。そこで本研究では,まず拡散モデルから,対応する確率フローODEを解き,新しいサンプルを生成する手法であるAdjointDPMを提案する。次に、随伴感度法を用いて、別の拡張ODEを解くことで、損失の勾配をモデルのパラメータ(条件信号、ネットワーク重み、初期雑音を含む)に戻す。さらに, 順方向の生成と勾配バックプロパゲーションの両方における数値誤差を低減するために,指数積分を用いて, 確率フローODEと拡張ODEを単純な非剛性ODEとして再パラメータ化する。最後に、視覚効果を識別テキストの埋め込みに変換すること、特定のスタイル化のためにDPMを微調整すること、セキュリティ監査のための敵対的サンプルを生成するために初期ノイズを最適化すること、の3つの興味深い課題に対するAdjointDPMの有効性を実証する。 Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naïve gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. 
To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.	翻訳日:2023-07-21 13:49:10 公開日:2023-07-20
# マルチモーダル軌道最適化のためのパラメータ化政策学習 Reparameterized Policy Learning for Multimodal Trajectory Optimization ( http://arxiv.org/abs/2307.10710v1 ) ライセンス: Link先を確認	Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su	(参考訳) 本研究では,高次元連続行動空間における強化学習(RL)のパラメータ化政策の課題について検討する。本稿の目的は,一般のガウスパラメータ化に内在する制限を克服するマルチモーダルポリシの開発である。そこで本研究では,連続rlポリシーを最適軌跡生成モデルとしてモデル化する原則付きフレームワークを提案する。潜在変数のポリシーを条件づけることで、新しい変動境界を最適化目標として導出し、環境の探索を促進する。次に、マルチモーダルポリシーパラメータ化と学習世界モデルを活用して、強力な探索機能と高データ効率を実現するための実用的モデルベースRL手法であるRPGを提案する。実験により,本手法は,密集した報酬を伴うタスクにおいて局所最適を回避し,オブジェクト中心の本質的な報酬を取り入れることで,スパース・リワード環境の解決に有効であることが示された。提案手法は, 様々なタスクにおいて, 従来手法を一貫して上回っている。コードと補足資料はプロジェクトページhttps://haosulab.github.io/rpg/で入手できる。 We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/	翻訳日:2023-07-21 13:48:45 公開日:2023-07-20
# 局所化量子系とカオス量子系を区別する方法 A method to discriminate between localized and chaotic quantum systems ( http://arxiv.org/abs/2307.10706v1 ) ライセンス: Link先を確認	Youssef Aziz Alaoui and Bruno Laburthe-Tolra	(参考訳) 我々は、当初平衡から設定された一般の孤立量子系がその初期状態に近い局所化あるいはカオス化できるかどうかを区別する基準を導出する。提案手法では, 格子サイト内のエネルギーと, 格子サイトから次の格子サイトへのトンネルが等質である一次元格子内を移動する粒子に, 系の力学をマッピングするランツォス基底の時間発展を考察する。カオスシステムとローカライズされたシステムを区別できる基準を推測する。この基準はランツォ状態と期待エネルギーの変動の間の結合強度を含む。本研究では,次元関数としてのアンダーソン局在に対応する3つの事例,多体双極子スピン系の平衡外ダイナミクス,可積分系を検証し,妥当性を検証する。我々は、量子カオス系を特徴づけるために提案されたウィグナー予想と固有状態熱化仮説の正当性を示した。実際、系がカオスであるための我々の基準は、ウィグナー・ダイソン分布の特徴である固有ネルギのレベル反発(スペクトル剛性とも呼ばれる)を暗示している。実演では、ハミルトニアンによって弱次に結合された状態と接続する状態として、固有状態の加熱が適用される作用素のクラスを定義することができる。 We derive a criterion that distinguishes whether a generic isolated quantum system initially set out of equilibrium can be considered as localized close to its initial state, or chaotic. Our approach considers the time evolution in the Lanczos basis, which maps the system's dynamics onto that of a particle moving in a one-dimensional lattice where both the energy in the lattice sites and the tunneling from one lattice site to the next are inhomogeneous. We infer a criterion that allows distinguishing localized from chaotic systems. This criterion involves the coupling strengths between Lanczos states and their expectation energy fluctuations. We verify its validity by inspecting three cases, corresponding to Anderson localization as a function of dimension, the out-of-equilibrium dynamics of a many-body dipolar spin system, and integrable systems. We finally show that our approach provides a justification for the Wigner surmise and the eigenstate thermalization hypothesis, which have both been proposed to characterize quantum chaotic systems. 
Indeed, our criterion for a system to be chaotic implies the level repulsion (also known as spectral rigidity) of eigenenergies, which is characteristic of the Wigner-Dyson distribution; and we also demonstrate that in the chaotic regime, the expectation value of any local observable only weakly varies as a function of eigenstates. Our demonstration allows us to define the class of operators to which eigenstate thermalization applies as those that connect states coupled at weak order by the Hamiltonian.	翻訳日:2023-07-21 13:48:25 公開日:2023-07-20
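The abstract above describes mapping the dynamics onto a particle hopping on a one-dimensional lattice via the Lanczos basis. A minimal sketch of that mapping, assuming a small real symmetric Hamiltonian stored as a list of lists: `lanczos`, `H` and `v0` are illustrative names, not taken from the paper, and the localization criterion itself is not implemented here.

```python
import math

def lanczos(H, v0, steps):
    """Return (a, b): on-site energies a[n] and hoppings b[n] of the Lanczos chain.

    Assumes H is a real symmetric matrix (list of lists) and v0 a nonzero start vector.
    No re-orthogonalization is done, so this is only suitable for small examples.
    """
    n = len(H)
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * n
    a, b = [], []
    beta = 0.0
    for _ in range(steps):
        # Apply H to the current Lanczos vector.
        w = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        alpha = sum(w[i] * v[i] for i in range(n))  # energy of this chain site
        a.append(alpha)
        # Remove the components along the current and previous Lanczos vectors.
        w = [w[i] - alpha * v[i] - beta * v_prev[i] for i in range(n)]
        beta = math.sqrt(sum(x * x for x in w))     # tunneling to the next site
        if beta < 1e-12 or len(a) == steps:
            break
        b.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return a, b

# Toy Hamiltonian (hypothetical), started from the first basis state.
H = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
site_energies, hoppings = lanczos(H, [1.0, 0.0, 0.0], steps=3)
```

In this picture `site_energies` and `hoppings` are the inhomogeneous lattice energies and tunneling amplitudes the abstract refers to; a production implementation would normally add re-orthogonalization.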
# TwinLiteNet:自動運転車における走行可能エリアとレーンセグメンテーションのための効率的軽量モデル TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars ( http://arxiv.org/abs/2307.10705v1 ) ライセンス: Link先を確認	Quang Huy Che and Dinh Phuc Nguyen and Minh Quan Pham and Duc Khai Lam	(参考訳) セマンティックセグメンテーションは、周囲の環境を理解するための自律運転において一般的な課題である。運転可能なエリアセグメンテーションとレーン検出は、道路上の安全かつ効率的なナビゲーションに特に重要である。しかし、オリジナルのセマンティクスセグメンテーションモデルは計算コストが高く、ハイエンドハードウェアを必要とするため、自動運転車の組み込みシステムでは実現不可能である。本稿では,運転可能領域と車線区分の軽量モデルを提案する。 TwinLiteNetは安価に設計されているが、正確で効率的なセグメンテーション結果が得られる。 bdd100kデータセット上でtwinlitenetを評価し,現代的なモデルと比較する。実験の結果,twinlitenetは既存の手法と同様に動作し,計算資源が大幅に少ないことがわかった。具体的には、twinlitenet はdrivable area task の91.3%、レーン検出タスクの31.08% iou を 0.4 million のパラメータで達成し、gpu rtx a5000 で 415 fps を達成した。さらにtwinlitenetは、jetson xavier nxで60fpsを達成したため、計算能力に制限のある組み込みデバイス上でリアルタイムに動作し、自動運転車にとって理想的なソリューションとなる。コードは url{https://github.com/chequanghuy/TwinLiteNet} で入手できる。 Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. 
Furthermore, TwinLiteNet can run in real time on embedded devices with limited computing power, especially since it achieves 60 FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available at https://github.com/chequanghuy/TwinLiteNet.	翻訳日:2023-07-21 13:48:00 公開日:2023-07-20
# 適応型マルチエージェントマルチアーム付きバンディットを用いた大規模EVの分散型スマート充電 Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits ( http://arxiv.org/abs/2307.10704v1 ) ライセンス: Link先を確認	Sharyal Zafar (ENS Rennes, SATIE), Raphaël Feraud, Anne Blavette (ENS Rennes, SATIE), Guy Camilleri (UT3, IRIT), Hamid Ben (SATIE, ENS Rennes)	(参考訳) 電気自動車と太陽光発電の急激な成長は、ピーク負荷要求による電流混雑や電圧制限違反などの新しい課題をもたらす可能性がある。これらの問題は、電気自動車、すなわちスマート充電の動作を制御することで軽減することができる。集中型スマート充電ソリューションはすでに文献で提案されている。しかし、このようなソリューションはスケーラビリティに欠ける可能性があり、単一障害点やデータプライバシの懸念など、中央集権化の固有の欠点に苦しむ。分散化はこれらの課題に取り組むのに役立つ。本稿では,適応型マルチエージェントシステムの哲学を用いて,完全分散型スマート充電システムを提案する。提案システムでは,マルチアームバンディット学習を用いて不確実性を扱う。提示されたシステムは分散化、スケーラブル、リアルタイム、モデルフリーであり、異なるプレイヤー間で公平性を考慮している。また,性能評価のための詳細なケーススタディも提示した。 The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles, i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure, and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.	翻訳日:2023-07-21 13:47:38 公開日:2023-07-20
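The abstract above says each agent uses multi-armed bandit learning but does not name the bandit algorithm. As a rough illustration only, here is the textbook UCB1 rule applied by a single agent choosing among three hypothetical charging slots; the slot reward probabilities, `ucb1` and `slots` are assumptions made for this sketch, not the paper's design.

```python
import math
import random

def ucb1(reward_fns, rounds, seed=0):
    """Run UCB1 over a list of reward functions; return pull counts and mean rewards."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the confidence bound
        else:
            arm = max(
                range(n_arms),
                key=lambda k: means[k] + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means

# Hypothetical arms: three charging time slots whose reward is 1 when charging
# in that slot causes no grid constraint violation, with made-up probabilities.
slots = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.1, 0.3, 0.9)]
counts, means = ucb1(slots, rounds=2000)
```

Here `counts` shows how often each slot was tried. In a decentralized setting every vehicle would run its own copy of such a learner using only locally observable rewards.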
# MSQNet:マルチモーダルクエリによるアクターに依存しないアクション認識 MSQNet: Actor-agnostic Action Recognition with Multi-modal Query ( http://arxiv.org/abs/2307.10763v1 ) ライセンス: Link先を確認	Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta	(参考訳) 既存の行動認識法は、内在的なトポロジとアクター間の明らかな差異により、アクター固有のものである。これはアクター固有のポーズ推定(例えば人間対動物)を必要とし、複雑なモデル設計と高いメンテナンスコストをもたらす。さらに、他の利用可能な情報ソース(クラス名テキストなど)や複数のアクションの同時発生を無視しながら、視覚的モダリティのみと単一ラベルの分類を学ぶことに注力することが多い。これらの制約を克服するために,人間や動物を含む様々な種類の俳優に統一されたソリューションを提供する「アクター非依存マルチモード動作認識」という新しい手法を提案する。さらに,多モードセマンティッククエリーネットワーク(MSQNet)モデルをトランスフォーマーベースのオブジェクト検出フレームワーク(DETRなど)で定式化し,視覚的およびテキスト的モダリティを活用して,アクションクラスをより良く表現する。アクター固有のモデルデザインの排除は重要な利点であり、アクターのポーズ推定の必要性を完全に排除する。 5つの公開ベンチマークの大規模な実験によると、我々のMSQNetは、人間と動物のシングルラベルとマルチラベルのアクション認識タスクにおいて、アクター固有の代替手段の先行技術を最大50%上回っている。コードはhttps://github.com/mondalanindya/MSQNet.comでリリースされる。 Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. 
Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.	翻訳日:2023-07-21 13:42:01 公開日:2023-07-20
# 単一qudit符号化によるフォールトトレラント計算 Fault-Tolerant Computing with Single Qudit Encoding ( http://arxiv.org/abs/2307.10761v1 ) ライセンス: Link先を確認	Matteo Mezzadri, Alessandro Chiesa, Luca Lepori and Stefano Carretta	(参考訳) 本稿では,単一マルチレベルquditに符号化された論理量子ビットを用いた安定化器符号のフォールトトレラント実装に対する一般的なアプローチを提案する。提案方式は、補正と普遍量子計算を可能にする。分子スピン四重項のシミュレーションにより,quditサイズの論理的誤りをほぼ指数関数的に抑制することを示した。結果として得られた小さなquditのパフォーマンスは、数千単位のqubitコードと比較すると驚くべきものだ。 We present a general approach for the Fault Tolerant implementation of stabilizer codes with a logical qubit encoded into a single multi-level qudit, preventing the explosion of resources of multi-qubit codes. The proposed scheme allows for correction and universal quantum computation. We demonstrate its effectiveness by simulations on molecular spin qudits, finding an almost exponential suppression of logical errors with the qudit size. The resulting performance on a small qudit is remarkable when compared to qubit codes using thousands of units.	翻訳日:2023-07-21 13:41:37 公開日:2023-07-20
# Vesper: 音声認識のためのコンパクトで効果的な事前学習モデル Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition ( http://arxiv.org/abs/2307.10757v1 ) ライセンス: Link先を確認	Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu	(参考訳) 本稿では,一般的な大規模事前学習モデル(PTM)を音声感情認識タスクに適用するパラダイムを提案する。 PTMは、人工知能に新たな光を当てているが、それらは一般的なタスクを念頭に構築されており、特定のタスクに対する有効性をさらに向上することができる。さらに、実用アプリケーションにPTMを採用することは、かなりのサイズであるため、難しい可能性がある。上述の制限は、大規模PTMを特定のタスクに最適化し、コンパクトかつ効果的にタスク固有のPTMを生成するという別の研究方向を生み出します。本稿では,音声感情認識タスクに着目し,vesperと呼ばれる感情特異的事前学習エンコーダを提案する。 Vesperは、WavLMに基づく音声データセットで事前訓練され、感情的特徴を考慮に入れている。感情情報に対する感受性を高めるため、ヴェスパーは感情誘導マスキング戦略を採用し、マスキングが必要な地域を特定する。その後、vesperは階層的および横断的な自己スーパービジョンを採用し、音響的および意味的表現をキャプチャする能力を向上させる。 iemocap、meld、crema-dのデータセットにおける実験結果は、4層からなるvesperが12層のwavlmベースよりも優れており、12層のvesperの性能は24層のwavlmよりも大きいことを示している。 This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. Above limitations spawn another research direction, namely, optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset based on WavLM and takes into account emotional characteristics. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper employs hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. 
Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.	翻訳日:2023-07-21 13:41:28 公開日:2023-07-20
# LBL:一級分類のための対数障壁損失関数 LBL: Logarithmic Barrier Loss Function for One-class Classification ( http://arxiv.org/abs/2307.10753v1 ) ライセンス: Link先を確認	Tianlei Wang, Dekang Liu, Wandong Zhang, Jiuwen Cao	(参考訳) One-class classification (OCC) は、ターゲットのクラスデータのみで分類器を訓練することを目的としており、現実世界のアプリケーションに適用性が強いことで大きな注目を集めている。 OCCには多くの進歩があったが、深層学習に有効なOCC損失関数がない。本稿では,OCC目標をスムースに近似することにより,マージンサンプルに大きな勾配を割り当て,よりコンパクトな超球面を導出する新しい対数バリア関数ベースOCC損失(LBL)を提案する。しかし、特にサンプルが無限の損失につながる境界上にある場合、LBLの最適化は不安定である可能性がある。この問題に対処するため、一方的な緩和Sigmoid関数をLBLに導入し、新しいOCC損失LBLSigを提案する。 LBLSigは平均二乗誤差(MSE)とクロスエントロピー(CE)の融合と見なすことができ、一方の緩和シグモイド関数によりLBLSigの最適化はより滑らかである。提案するLBLとLBLSigの有効性を,ネットワーク構造の違いに対する最先端OCCアルゴリズムとの比較により実験的に検証した。ソースコードはhttps://github.com/ML-HDU/LBL_LBLSigにある。 One-class classification (OCC) aims to train a classifier with only the target class data and attracts great attention for its strong applicability in real-world applications. Although many advances have been made in OCC, effective OCC loss functions for deep learning are still lacking. In this paper, a novel logarithmic barrier function based OCC loss (LBL), which assigns large gradients to the margin samples and thus derives a more compact hypersphere, is first proposed by approximating the OCC objective smoothly. However, the optimization of LBL may be unstable, especially when samples lie on the boundary, leading to infinite loss. To address this issue, a unilateral relaxation Sigmoid function is then introduced into LBL and a novel OCC loss named LBLSig is proposed. LBLSig can be seen as the fusion of the mean square error (MSE) and the cross entropy (CE), and the optimization of LBLSig is smoother owing to the unilateral relaxation Sigmoid function. The effectiveness of the proposed LBL and LBLSig is experimentally demonstrated in comparisons with several state-of-the-art OCC algorithms on different network structures. The source code can be found at https://github.com/ML-HDU/LBL_LBLSig.	翻訳日:2023-07-21 13:41:06 公開日:2023-07-20
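The abstract above does not give the loss formula, so the following is only the generic logarithmic barrier for a hypersphere constraint (squared distance to the center below a squared radius), not necessarily the paper's exact LBL. It is meant to show why samples near the margin receive large gradients and why the loss diverges on the boundary; the function names and arguments are assumptions.

```python
import math

def log_barrier(dist_sq, radius_sq):
    """Generic log barrier for the constraint dist_sq < radius_sq (assumed form)."""
    slack = radius_sq - dist_sq
    if slack <= 0:
        return math.inf  # on or outside the boundary the barrier is infinite
    return -math.log(slack)

def log_barrier_grad(dist_sq, radius_sq):
    """Derivative of the barrier with respect to dist_sq, i.e. 1 / (radius_sq - dist_sq)."""
    slack = radius_sq - dist_sq
    return math.inf if slack <= 0 else 1.0 / slack

# A sample close to the center versus one close to the hypersphere surface.
inner_grad = log_barrier_grad(0.1, 1.0)
margin_grad = log_barrier_grad(0.9, 1.0)
boundary_loss = log_barrier(1.0, 1.0)
```

The divergence at the boundary is the instability the abstract mentions; the sigmoid relaxation that the authors call LBLSig is not reproduced here because its definition is not stated in the abstract.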
# 人工知能が知識労働の創造性に及ぼす影響--機械的プラジャリズムと確率的パロットを超えて- Exploring Perspectives on the Impact of Artificial Intelligence on the Creativity of Knowledge Work: Beyond Mechanised Plagiarism and Stochastic Parrots ( http://arxiv.org/abs/2307.10751v1 ) ライセンス: Link先を確認	Advait Sarkar	(参考訳) 人工知能(AI)、特に生成モデルは、知識労働のための変換ツールである。彼らは創造性、独創性、盗作、信用の帰属、著作権の所有という概念を問題視している。生成モデルの批判者は、大量のトレーニングデータへの依存を強調し、これらのモデルの出力は、ソースデータのランダム化、リミックス、コラージュ以上のものではないとみなす。これらの理由から、多くの人はこれらのモデルの出力の配置、使用、帰属に関するより強い規制を主張してきた。しかし、これらの問題は人工知能に限ったものではない。本稿では,文学的批判や美術史,著作権法などの例を用いて,創造性と独創性が,対象の不可知性や情報理論的な性質として定義にどのように抵抗するかを示し,その代わりに,プロセスや著者,視聴者の性質として見ることができる。さらに別の見解として、すべての創造的な作業は本質的に再利用される(ほとんどが帰属しない)か、ランダム性自体が創造的になる可能性がある。創造性は最終的にクリエーターとレシーバーのコミュニティによって定義され、ワークフローの創造性はワークフローのどの部分を自動化できるかに依存します。創造的知識労働におけるAIの最近の研究の例から、AIは知識労働を物質生産から重要な統合へとシフトさせることを提案します。本論文は,これらのモデルの利用者の創造的・カリキュラム的音声の重要性を十分に認識し,より単純な表記的・情報理論的な視点から遠ざかる,創造的モデルにおける創造的・信用的割り当ての問題に対する,よりニュアンスなアプローチの議論を開始することを目的としている。 Artificial Intelligence (AI), and in particular generative models, are transformative tools for knowledge work. They problematise notions of creativity, originality, plagiarism, the attribution of credit, and copyright ownership. Critics of generative models emphasise the reliance on large amounts of training data, and view the output of these models as no more than randomised plagiarism, remix, or collage of the source data. On these grounds, many have argued for stronger regulations on the deployment, use, and attribution of the output of these models. However, these issues are not new or unique to artificial intelligence. In this position paper, using examples from literary criticism, the history of art, and copyright law, I show how creativity and originality resist definition as a notatable or information-theoretic property of an object, and instead can be seen as the property of a process, an author, or a viewer. 
Further alternative views hold that all creative work is essentially reuse (mostly without attribution), or that randomness itself can be creative. I suggest that creativity is ultimately defined by communities of creators and receivers, and the deemed sources of creativity in a workflow often depend on which parts of the workflow can be automated. Using examples from recent studies of AI in creative knowledge work, I suggest that AI shifts knowledge work from material production to critical integration. This position paper aims to begin a conversation around a more nuanced approach to the problems of creativity and credit assignment for generative models, one which more fully recognises the importance of the creative and curatorial voice of the users of these models and moves away from simpler notational or information-theoretic views.	翻訳日:2023-07-21 13:40:44 公開日:2023-07-20
# 公正な意見集約のための投票属性バイアスの緩和 Mitigating Voter Attribute Bias for Fair Opinion Aggregation ( http://arxiv.org/abs/2307.10749v1 ) ライセンス: Link先を確認	Ryosuke Ueda, Koh Takeuchi, Hisashi Kashima	(参考訳) 複数の意見の集約は、雇用や融資レビュー、教師付き学習のためのラベル付けデータなど、意思決定において重要な役割を果たす。多数決と既存の世論集計モデルは単純なタスクに有効であるが、不一致が生じる可能性のある客観的なラベルがないタスクには不適切である。特に、性別や人種などの有権者属性が意見に偏りをもたらす場合、集計結果は投票者属性の構成によって異なる可能性がある。バランスの取れた有権者のグループは公平な集計結果に望ましいが、準備が難しい可能性がある。本研究では, 投票者属性に基づく公正な意見集約を実現する手法を検討し, 集計結果の公平性を評価する。この目的のために、多数決のような意見集約モデルとdwid and skeneモデル(d&sモデル)とサンプル重み付けのような公平性オプションを組み合わせたアプローチを検討する。意見集約の公平性を評価するために,確率的ソフトラベルが離散クラスラベルよりも好まれる。まず,投票者属性を考慮せずにソフトラベル推定の問題に対処し,d&sモデルにおける問題点を特定する。これらの制約に対処するため,ソフトラベル推定の精度を向上させるソフトD&Sモデルを提案する。さらに, 合成データと半合成データを用いて, ソフトd&sを含む意見集約モデルのフェアネスを, 異なるフェアネスオプションと組み合わせて評価した。実験結果から,ソフトD&Sと公平性オプションとしてのデータ分割の組み合わせは高密度データに有効であるのに対し,重み付き多数決はスパースデータに有効であることが示唆された。これらの知見は、バランスのとれた意見集約を持つ人間および機械学習モデルによる意思決定を支援する上で特に有用である。 The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. 
First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.	翻訳日:2023-07-21 13:40:12 公開日:2023-07-20
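The abstract above combines aggregation models such as majority voting with fairness options such as sample weighting, and evaluates soft labels rather than hard ones. A minimal sketch of that idea for binary votes, assuming a weighting scheme that gives every voter-attribute group the same total weight: this particular scheme and the names `soft_label` and `balancing_weights` are illustrative choices, not the paper's.

```python
from collections import Counter

def soft_label(votes, weights=None):
    """Weighted fraction of positive (1) votes; uniform weights by default."""
    if weights is None:
        weights = [1.0] * len(votes)
    return sum(w * v for w, v in zip(weights, votes)) / sum(weights)

def balancing_weights(attributes):
    """Give each attribute group the same total weight (hypothetical scheme)."""
    sizes = Counter(attributes)
    return [1.0 / (len(sizes) * sizes[a]) for a in attributes]

# Five voters: four from group "A" vote 1, one from group "B" votes 0.
votes = [1, 1, 1, 1, 0]
attrs = ["A", "A", "A", "A", "B"]
plain = soft_label(votes)                               # 0.8, dominated by group A
balanced = soft_label(votes, balancing_weights(attrs))  # 0.5, groups count equally
```

The gap between `plain` and `balanced` shows how the composition of voter attributes can shift an aggregated label, which is the effect the study sets out to mitigate.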
# EdgeAL: OCTセグメンテーションのためのエッジ推定に基づくアクティブラーニングアプローチ EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation ( http://arxiv.org/abs/2307.10745v1 ) ライセンス: Link先を確認	Md Abdul Kadir, Hasan Md Tusfiqur Alam, Daniel Sonntag	(参考訳) アクティブラーニングアルゴリズムは、限られたデータでモデルのトレーニングにますます人気がある。しかし,未取得データで利用可能な情報量が限られているため,アノテーションデータの選択は依然として難しい課題である。そこで本研究では,不確かさを計測するために,未検出画像のエッジ情報を先行情報として利用するEdgeALを提案する。不確かさは、エッジを横断するモデル予測の発散とエントロピーを分析することによって定量化される。この尺度はアノテーション用のスーパーピクセルを選択するために使われる。マルチクラス光コヒーレンス・トモグラフィ(OCT)セグメンテーションタスクにおけるEdgeALの有効性を実証し、アノテーションラベルのコストを3つの公開データセット(Duke, AROI, UMN)でそれぞれ12%, 2.3%, 3%に削減し、99%のダイススコアを得た。ソースコードは https://github.com/Mak-Ta-Reque/EdgeAL で入手できる。 Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information for measuring uncertainty. The uncertainty is quantified by analyzing the divergence and entropy in model predictions across edges. This measure is then used to select superpixels for annotation. We demonstrate the effectiveness of EdgeAL on multi-class Optical Coherence Tomography (OCT) segmentation tasks, where we achieved a 99% dice score while reducing the annotation label cost to 12%, 2.3%, and 3%, respectively, on three publicly available datasets (Duke, AROI, and UMN). The source code is available at https://github.com/Mak-Ta-Reque/EdgeAL	翻訳日:2023-07-21 13:39:41 公開日:2023-07-20
# フェデレーション学習のための公正なクライアント選択 Fairness-Aware Client Selection for Federated Learning ( http://arxiv.org/abs/2307.10738v1 ) ライセンス: Link先を確認	Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu	(参考訳) フェデレートラーニング(FL)により、複数のデータ所有者(FLクライアント)が、プライベートデータを公開せずに、機械学習モデルを協調的にトレーニングできるようになった。 FLサーバは各トレーニングラウンドで限られた数のクライアントしか扱えないため、FLクライアントの選択は重要な研究課題となっている。既存のアプローチでは、FLモデルの性能の向上や、FLクライアントの公平な処理の強化に重点を置いている。 FLクライアント選択時の性能と公平性のバランスに関する問題は未解決のままである。この問題を解決するために、FairFedCS(Fairness-aware Federated Client Selection)アプローチを提案する。リアプノフ最適化に基づき、その評価、flタスクへの参加時期、モデル性能への貢献を共同で考慮し、flクライアントの選択確率を動的に調整する。しきい値に基づく評判フィルタリングを使わずに、FLクライアントは、パフォーマンスの低さが認識された後に評判を再評価する機会を与えられる。実世界のマルチメディアデータセットに基づく大規模な実験により、FairFedCSは19.6%のフェアネスと0.73%のテスト精度を達成した。 Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.	
翻訳日:2023-07-21 13:39:21 公開日:2023-07-20
# ガウス混合系におけるロングテール理論 Long-Tail Theory under Gaussian Mixtures ( http://arxiv.org/abs/2307.10736v1 ) ライセンス: Link先を確認	Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina and Zhenisbek Assylbekov	(参考訳) フェルドマンのロングテール理論(2020年)に準拠したデータ生成のための単純なガウス混合モデルを提案する。線形分類器は,提案モデルの一定レベル以下では一般化誤差を低減できないが,記憶容量を有する非線形分類器は可能である。これは、長い尾の分布に対して、新しいデータへの最適な一般化のために稀なトレーニング例を考慮しなければならないことを裏付ける。最後に, 合成データおよび実データ実験により確認されるように, 尾部がサブポピュレーション周波数分布において短くなるにつれて, 線形モデルと非線形モデルの性能ギャップが小さくなることを示す。 We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.	翻訳日:2023-07-21 13:39:00 公開日:2023-07-20
# 一定深さでのロバストなスパースiqpサンプリング Robust sparse IQP sampling in constant depth ( http://arxiv.org/abs/2307.10729v1 ) ライセンス: Link先を確認	Louis Paletta, Anthony Leverrier, Alain Sarlette, Mazyar Mirrahimi, Christophe Vuillot	(参考訳) 強固な量子アドバンテージと完全にフォールトトレラントな量子計算の証明を伴わないnisq(noisy intermediate scale quantum)アプローチ間において、(広く受け入れられている複雑性予想の下で)証明可能な超多項量子アドバンテージを達成するためのスキームを提案する。我々は、スパースIQP(Instantaneous Quantum Polynomial-time)回路と呼ばれる通勤ゲートのサンプリング問題の種類を選択し、テトラヘリックス符号を導入することにより、その耐故障性を確保する。この新符号は、複数の四面体符号(3Dカラーコード)をマージして取得され、各スパースIQPゲートがトランスバーサル実装を認め、論理回路の深さをその幅で交換できるという特性を持つ。これらを組み合わせて、符号化状態の作成まで、任意のスパースiqp回路の深さ-1 実装を得る。これは、元の回路の幅で多対数しか持たない空間オーバーヘッドによるものである。さらに、従来の計算からフィードフォワードの単一ステップで、状態準備を一定の深さで行うこともできることを示す。そこで本研究では,1ラウンドの計測とフィードフォワードで一定深度回路上に実装したサンプリング問題に対して,ロバストなスーパーポリノミカル量子優位性を示す。 Between NISQ (noisy intermediate scale quantum) approaches without any proof of robust quantum advantage and fully fault-tolerant quantum computation, we propose a scheme to achieve a provable superpolynomial quantum advantage (under some widely accepted complexity conjectures) that is robust to noise with minimal error correction requirements. We choose a class of sampling problems with commuting gates known as sparse IQP (Instantaneous Quantum Polynomial-time) circuits and we ensure its fault-tolerant implementation by introducing the tetrahelix code. This new code is obtained by merging several tetrahedral codes (3D color codes) and has the following properties: each sparse IQP gate admits a transversal implementation, and the depth of the logical circuit can be traded for its width. Combining those, we obtain a depth-1 implementation of any sparse IQP circuit up to the preparation of encoded states. This comes at the cost of a space overhead which is only polylogarithmic in the width of the original circuit. We furthermore show that the state preparation can also be performed in constant depth with a single step of feed-forward from classical computation. 
Our construction thus exhibits a robust superpolynomial quantum advantage for a sampling problem implemented on a constant depth circuit with a single round of measurement and feed-forward.	翻訳日:2023-07-21 13:38:48 公開日:2023-07-20
# 簡単な検出による量子状態による物体検出とレンジフィンディング Object detection and rangefinding with quantum states using simple detection ( http://arxiv.org/abs/2307.10785v1 ) ライセンス: Link先を確認	Richard J. Murchie, Jonathan D. Pritchard, John Jeffers	(参考訳) 信号レベルが弱い雑音環境において、量子照明は、非同時位相非感受性の偶然数に基づく準最適測定の限界においても、対象物の存在と範囲を決定する際に古典的な照明よりも優れる。現実的な実験プロトコルによって動機付けされ、簡単な検出器で同時マルチショットデータを解析するための理論的枠組みを提案する。このアプローチは、見過ごされがちな非結合データを含めることを可能にし、オブジェクトの存在と範囲を推測するキャリブレーションフリーのしきい値を提供し、異なる検出レジーム間の公正な比較を可能にする。本研究は, 雑音環境下でのターゲット識別を行う際の古典的照明に対する量子の利点を定量化し, 所定の信頼度でターゲットを検出するのに必要なショット数を推定することを含む。 In a noisy environment with weak signal levels, quantum illumination can outperform classical illumination in determining the presence and range of a target object even in the limit of sub-optimal measurements based on non-simultaneous, phase-insensitive coincidence counts. Motivated by realistic experimental protocols, we present a theoretical framework for analysing coincident multi-shot data with simple detectors. This approach allows for the often-overlooked non-coincidence data to be included, as well as providing a calibration-free threshold for inferring the presence and range of an object, enabling a fair comparison between different detection regimes. Our results quantify the advantage of quantum over classical illumination when performing target discrimination in a noisy thermal environment, including estimating the number of shots required to detect a target with a given confidence level.	翻訳日:2023-07-21 13:31:06 公開日:2023-07-20
# SMURF: 4次元イメージングレーダを用いた3次元物体検出のための空間多重表現融合 SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar ( http://arxiv.org/abs/2307.10784v1 ) ライセンス: Link先を確認	Jianan Liu, Qiuchi Zhao, Weiyi Xiong, Tao Huang, Qing-Long Han, Bing Zhu	(参考訳) 4Dミリ波レーダー(mmWave)は、悪天候条件下でのコスト効率と操作性から、車両の検知に有望な技術である。しかし、この技術の採用は、レーダポイントクラウドデータにおけるスパーシリティとノイズの問題によって妨げられている。本稿では,単一4次元イメージングレーダを用いた新しい3次元物体検出手法である空間多重表現融合(SMURF)を提案する。 SMURFは、カーネル密度推定(KDE)を通して多次元ガウス混合分布の柱化や密度特性を含むレーダー検出点の複数の表現を利用する。 KDEは、狭角分解能とレーダ信号のマルチパス伝搬による測定精度の低下を効果的に緩和する。さらに、KDEは密度特性をキャプチャすることで、ポイントクラウドの分散を緩和する。 View-of-Delft(VoD)とTJ4DRadSetデータセットの実験的評価は、SMURFの有効性と一般化能力を示し、最近提案された4Dイメージングレーダベースの単一表現モデルよりも優れている。さらに、4Dイメージングレーダのみを使用しながら、SMURFは最先端の4Dイメージングレーダとカメラ融合方式に匹敵する性能を保ち、TJ4DRadSetデータセットの鳥眼視の平均精度は1.22%、VoDデータセットの全注釈領域の平均精度は1.32%向上した。提案手法は印象的な推論時間を示し,2つのデータセットのほとんどのスキャンにおいて0.05秒以内で,リアルタイム検出の課題に対処する。本研究は、4DmmWaveレーダの利点を強調し、4Dイメージングレーダを用いた3次元物体検出に関するその後の研究の強力なベンチマークである。 The 4D Millimeter wave (mmWave) radar is a promising technology for vehicle sensing due to its cost-effectiveness and operability in adverse weather conditions. However, the adoption of this technology has been hindered by sparsity and noise issues in radar point cloud data. This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar. SMURF leverages multiple representations of radar detection points, including pillarization and density features of a multi-dimensional Gaussian mixture distribution through kernel density estimation (KDE). KDE effectively mitigates measurement inaccuracy caused by limited angular resolution and multi-path propagation of radar signals. Additionally, KDE helps alleviate point cloud sparsity by capturing density features. 
Experimental evaluations on View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF, outperforming recently proposed 4D imaging radar-based single-representation models. Moreover, while using 4D imaging radar only, SMURF still achieves comparable performance to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in the mean average precision on bird's-eye view of TJ4DRadSet dataset and 1.32% in the 3D mean average precision on the entire annotated area of VoD dataset. Our proposed method demonstrates impressive inference time and addresses the challenges of real-time detection, with the inference time no more than 0.05 seconds for most scans on both datasets. This research highlights the benefits of 4D mmWave radar and is a strong benchmark for subsequent works regarding 3D object detection with 4D imaging radar.	翻訳日:2023-07-21 13:30:52 公開日:2023-07-20
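The SMURF abstract above mentions density features obtained through kernel density estimation (KDE). A minimal sketch of that general idea follows, assuming an isotropic Gaussian kernel with a single hand-picked bandwidth; the function name `kde_density_features`, the bandwidth value, and the sample points are illustrative assumptions, not the authors' implementation.

```python
import math


def kde_density_features(points, bandwidth=1.0):
    """Return one Gaussian-KDE density value per point (isotropic kernel).

    Hypothetical helper: every point is scored against all points,
    including itself, so the cost is O(N^2).
    """
    n = len(points)
    dim = len(points[0])
    # Normalisation constant of an isotropic Gaussian kernel, times N.
    norm = n * (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / norm)
    return densities


# Made-up radar detections (x, y, z) in metres: a tight cluster plus one outlier.
points = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.1, -0.1, 0.1), (8.0, 8.0, 1.0)]
features = kde_density_features(points, bandwidth=0.5)
```

Points inside the cluster receive a higher density than the isolated point, which is the kind of per-point cue that could be appended to the raw coordinates of a sparse point cloud.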
# 詳細と詳細:マルチモーダルビジュアルデータによるゼロショットポイントクラウドセグメンテーション See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data ( http://arxiv.org/abs/2307.10782v1 ) ライセンス: Link先を確認	Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma	(参考訳) ゼロショットポイントクラウドセグメンテーションは、トレーニングフェーズで見えないポイントクラウドで新しいオブジェクトを認識することができるディープモデルを作ることを目的としている。最近のトレンドでは、ラベル付き参照クラスからラベルなしの未認識クラスに知識を転送するパイプラインが好まれている。彼らは通常、視覚的特徴と、見たクラスのアノテーションの監督によって単語の埋め込みから得られる意味的特徴とを一致させる。しかし、ポイントクラウドはセマンティック機能に完全にマッチする限られた情報を含んでいる。実際、画像のリッチな外観情報はテクスチャのない点雲の自然な補完であり、以前の文献ではよく研究されていない。そこで本研究では,点群と画像の相補的情報をより正確な視覚・意味的アライメントに活用するための,新しいマルチモーダルゼロショット学習手法を提案する。セマンティックKITTI と nuScenes という2つの一般的なベンチマークで大規模な実験を行い,本手法は従来のSOTA法よりも52%,49%向上した。 Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels to unseen classes without labels. They typically align visual features with semantic features obtained from word embedding by the supervision of seen classes' annotations. However, point cloud contains limited information to fully match with semantic features. In fact, the rich appearance information of images is a natural complement to the textureless point cloud, which is not well explored in previous literature. Motivated by this, we propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment. Extensive experiments are performed in two popular benchmarks, i.e., SemanticKITTI and nuScenes, and our method outperforms current SOTA methods with 52% and 49% improvement on average for unseen class mIoU, respectively.	翻訳日:2023-07-21 13:30:23 公開日:2023-07-20
# 視覚トランスフォーマーの学習しきい値トークンのマージとプルーニング Learned Thresholds Token Merging and Pruning for Vision Transformers ( http://arxiv.org/abs/2307.10780v1 ) ライセンス: Link先を確認	Maxim Bonnaerens, Joni Dambre	(参考訳) ビジョントランスフォーマーは、過去数年間、幅広いコンピュータビジョンタスクで顕著な成功を収めてきた。しかし、それらの高い計算コストは、実際の展開にとって重要な障壁である。特に、トランスフォーマーモデルの複雑さは、入力トークンの数に関して二次的である。そのため、処理が必要な入力トークンの数を減らす技術が提案されている。本稿では,トークンマージとトークンプルーニングの両方の長所を活用する新しいアプローチであるLTMP(Learned Thresholds token Merging and Pruning)を紹介する。 LTMPは学習しきい値マスキングモジュールを使用して、マージするトークンとプルーするトークンを動的に決定する。我々は、ImageNet分類タスクにおいて、視覚変換器に関する広範な実験を行った。以上の結果から,LTMPは従来の手法よりも桁違いに高速な1つの微調整エポックしか必要とせず,縮小速度をまたいで最先端の精度を達成できることが示唆された。コードはhttps://github.com/Mxbonn/ltmpで入手できる。 Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .	翻訳日:2023-07-21 13:30:03 公開日:2023-07-20
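The LTMP abstract above describes threshold masking modules that decide which tokens to prune. The sketch below shows only the generic mechanism, under stated assumptions: each token already has an importance score, a sigmoid relaxation stands in for the mask during training, and a hard comparison is used at inference. The names `soft_mask`, `hard_mask`, `prune`, the temperature, and the scores are hypothetical and are not taken from the LTMP code base.

```python
import math


def soft_mask(scores, threshold, temperature=0.1):
    """Differentiable surrogate mask in (0, 1); an assumption for illustration."""
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


def hard_mask(scores, threshold):
    """Inference-time mask: 1 keeps the token, 0 prunes it."""
    return [1 if s > threshold else 0 for s in scores]


def prune(tokens, scores, threshold):
    """Drop every token whose importance score does not exceed the threshold."""
    return [t for t, keep in zip(tokens, hard_mask(scores, threshold)) if keep]


# Made-up tokens and importance scores.
tokens = ["cls", "sky", "dog", "grass", "ear"]
scores = [0.90, 0.05, 0.70, 0.10, 0.40]
kept = prune(tokens, scores, threshold=0.3)
relaxed = soft_mask(scores, threshold=0.3)
```

Merging could be sketched the same way with a second threshold applied to token-to-token similarity instead of importance; that part is omitted here.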
# 効率的なビームツリー再帰 Efficient Beam Tree Recursion ( http://arxiv.org/abs/2307.10779v1 ) ライセンス: Link先を確認	Jishnu Ray Chowdhury, Cornelia Caragea	(参考訳) Beam Tree Recursive Neural Network (BT-RvNN)は、最近、Gumbel Tree RvNNの単純な拡張として提案され、他のタスクで同等のパフォーマンスを維持しながら、ListOpsの最先端長一般化性能を達成することが示されている。しかし、BT-RvNNは、その種類では最悪のものではないが、メモリ使用量では極端に高価である。本稿では,BT-RvNNのメモリ使用量の主なボトルネックは,スコア機能と再帰的セル機能の絡み合いであることを示す。我々は、このボトルネックを取り除き、メモリ使用をさらに単純化する戦略を提案する。全体的に、BT-RvNNのメモリ使用量を$10$-$16$倍削減するだけでなく、他のタスクでも同様のパフォーマンスを維持しながら、ListOpsに新たな最先端技術を作成します。さらに、BT-RvNNが生成する潜在木ノード表現を用いて、BT-RvNNを$f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$という形の文エンコーダから$f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$という形のシーケンスコンテキスト化器に変換する方法も提案する。したがって、我々の提案はRvNNのさらなる拡張のための道を開くだけでなく、TransformersやStructured State Spaceモデルといった他の一般的なモデルと簡単に積み重ねたりインターフェースしたりできるディープラーニングツールキットの別のビルディングブロックとしてBT-RvNNを使用する方法を標準化する。 Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst of its kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. 
In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.	翻訳日:2023-07-21 13:29:46 公開日:2023-07-20
# 大規模言語モデルを用いた極多ラベルスキル抽出訓練 Extreme Multi-Label Skill Extraction Training using Large Language Models ( http://arxiv.org/abs/2307.10778v1 ) ライセンス: Link先を確認	Jens-Joris Decorte, Severine Verlinden, Jeroen Van Hautte, Johannes Deleu, Chris Develder and Thomas Demeester	(参考訳) オンライン求人広告は、スキル要件に関する情報の貴重な源であり、労働市場分析やe-recruitmentプロセスにおいて重要な役割を果たす。このような広告は通常、フリーテキストでフォーマットされるので、自動的に処理するには自然言語処理(NLP)技術が必要である。具体的には、スキル(文字通り、または暗黙的に記述された)を検出して、それらを大きなスキルオントロジーにリンクするタスクに焦点を当て、極端なマルチラベル分類(XMLC)の難しいケースとなる。この特定のXMLCタスクに十分な規模のラベル付き(トレーニング)データセットが存在しないことを考慮し、汎用の大規模言語モデル(LLM)を活用する手法を提案する。本稿では,スキル抽出のための精度の高い完全合成ラベル付きデータセットを生成するための費用対効果のアプローチについて述べ,このタスクに有効な対照学習戦略を提示する。 3つのスキル抽出ベンチマークで比較した結果,リテラルマッチングによる遠隔監視のみに依存する既発表の結果と比較して,\textit{R-Precision@5}で15～25パーセントポイントの一貫した向上がみられた。 Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of 15 to 25 percentage points in \textit{R-Precision@5} compared to previously published results that relied solely on distant supervision through literal matches.	翻訳日:2023-07-21 13:29:10 公開日:2023-07-20
# 変形可能なニューラルメッシュプリミティブを用いた都市放射場表現 Urban Radiance Field Representation with Deformable Neural Mesh Primitives ( http://arxiv.org/abs/2307.10776v1 ) ライセンス: Link先を確認	Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang	(参考訳) Neural Radiance Fields (NeRF) はここ数年で大きな成功を収めている。しかし、レイマーチングベースのレンダリングのために、現在のほとんどのメソッドは集中的なリソースを必要とする。都市レベルの放射場を効率的に構築するために,変形可能なニューラルメッシュプリミティブ(DNMP)を設計し,これらのプリミティブを用いてシーン全体をパラメータ化することを提案する。 DNMPは古典メッシュ表現の柔軟でコンパクトなニューラルバリアントであり、ラスタライズベースのレンダリングの効率と、フォトリアリスティック画像合成のための強力なニューラル表現能力の両方を享受している。具体的には、DNMPは、局所領域の幾何および放射情報をパラメータ化するために、ペアの頂点特徴を持つ連結変形可能なメッシュ頂点からなる。最適化の自由度を制限し、ストレージ予算を低くするために、各プリミティブの形状を比較的低次元の潜在空間から復号するように強制する。レンダリング色は、ビュー依存MLPにより頂点特徴(ラスタ化で補間)からデコードされる。 DNMPは、魅力的な特性を持つ都市レベルのシーン表現のための新しいパラダイムを提供する: $(1)$ 高品質なレンダリング。本手法は,都市シナリオにおける新規ビュー合成の先進的な性能を実現する。 $(2)$ 低計算コスト。我々の表現は高速レンダリング(2.07ms/1kピクセル)と低ピークメモリ(110MB/1kピクセル)を可能にする。我々はまた、バニラのNeRFより33$\times$高速に動作し、高度に最適化されたInstant-NGP(0.61対0.71ms/1kピクセル)に匹敵する軽量バージョンも提示する。プロジェクトページ: https://dnmp.github.io/ Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design the Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. 
To constrain the degrees of freedom for optimization and lower the storage budget, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: $(1)$ High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. $(2)$ Low computational costs. Our representation enables fast rendering (2.07 ms/1k pixels) and low peak memory usage (110 MB/1k pixels). We also present a lightweight version that can run 33$\times$ faster than vanilla NeRFs, and is comparable to the highly-optimized Instant-NGP (0.61 vs 0.71 ms/1k pixels). Project page: https://dnmp.github.io/	翻訳日:2023-07-21 13:28:52 公開日:2023-07-20
# データ駆動ソフトウェアエンジニアリングにおけるAutoMLの利用を評価する Assessing the Use of AutoML for Data-Driven Software Engineering ( http://arxiv.org/abs/2307.10774v1 ) ライセンス: Link先を確認	Fabio Calefato, Luigi Quaranta, Filippo Lanubile, Marcos Kalinowski	(参考訳) 背景。ソフトウェアアプリケーション構築に人工知能(AI)と機械学習(ML)が広く採用されているため、企業はそのような技術を深く理解している従業員を雇うのに苦労している。このシナリオでは、AutoMLはAI/MLスキルギャップを埋めるための有望なソリューションとして浮上しています。目的。関心の高まりと高い期待にもかかわらず、AutoMLが現在AI/ML対応システムを開発するチームによってどの程度採用されているか、実践者や研究者によってどのように認識されているか、という情報はほとんどない。方法。本稿では,このギャップを埋めるために,2つのSEデータセットにおける12のエンドツーエンドAutoMLツールのベンチマークと,それに続くインタビューを伴うユーザ調査を組み合わせた混合手法研究を行い,AutoMLの採用と認識の理解を深める。結果。AutoMLソリューションは、SEドメインで分類タスクを実行するために、研究者がトレーニングし最適化したモデルよりも優れたモデルを生成することができることが分かりました。また、私たちの調査によると、現在利用可能なAutoMLソリューションは、ML開発ワークフローの各ステージとすべてのチームメンバーに対する自動化を均等にサポートしていないため、その名に見合うものではない。結論。私たちは、AutoMLがSEリサーチコミュニティの活動をどのように促進できるか、またツールビルダーが次世代のAutoML技術をどのように設計すべきかを知らせるための洞察を得る。 Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. 
Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.	翻訳日:2023-07-21 13:28:24 公開日:2023-07-20
# ビジュアルスペクトログラムを用いたResNetとBi-GRUによる音楽ジャンル分類 Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms ( http://arxiv.org/abs/2307.10773v1 ) ライセンス: Link先を確認	Junfei Zhang	(参考訳) 音楽レコメンデーションシステムは、音楽消費を支配している音楽ストリーミングサービスのユーザエクスペリエンスと満足度を高めるために欠かせない要素となっている。これらのレコメンダシステムを改善する上で重要な課題は、特に音楽ジャンルの分類において、音楽データの複雑さを理解することである。手動ジャンル分類の限界は、より高度なシステム、すなわち自動音楽ジャンル分類(AMGC)システムの必要性を強調している。従来の機械学習技術はジャンル分類の可能性を秘めているが、手作業による機能や特徴の選択に大きく依存しており、音楽データの完全な複雑さを捉えていない。一方で、従来の畳み込みニューラルネットワーク(CNN)のようなディープラーニング分類アーキテクチャは、空間階層を捉えるのに有効であるが、音楽データに固有の時間的ダイナミクスを捉えるのに苦労している。これらの課題に対処するために、視覚スペクトログラムを入力として用いる新しいアプローチを提案し、Residual Neural Network(ResNet)とGated Recurrent Unit(GRU)の強みを組み合わせたハイブリッドモデルを提案する。このモデルは、音楽データのより包括的な分析を提供し、より正確なジャンル分類を実現することによって、音楽レコメンデータシステムを改善する可能性を提供する。 Music recommendation systems have emerged as a vital component to enhance user experience and satisfaction for music streaming services, which dominate music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, specifically for the underpinning music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced system, namely the Automatic Music Genre Classification (AMGC) system. While traditional machine learning techniques have shown potential in genre classification, they heavily rely on manually engineered features and feature selection, failing to capture the full complexity of music data. On the other hand, deep learning classification architectures like the traditional Convolutional Neural Networks (CNN) are effective in capturing the spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach that uses visual spectrograms as input, together with a hybrid model that combines the strengths of the Residual neural Network (ResNet) and the Gated Recurrent Unit (GRU). 
This model is designed to provide a more comprehensive analysis of music data and hence potentially more accurate genre classification, offering the potential to improve music recommender systems.	翻訳日:2023-07-21 13:27:55 公開日:2023-07-20
# エニグマのデコード:作業記憶のさまざまな面に人間とAIをベンチマークする Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory ( http://arxiv.org/abs/2307.10768v1 ) ライセンス: Link先を確認	Ankur Sikarwar and Mengmi Zhang	(参考訳) ワーキングメモリ(WM)は、情報の一時記憶、統合、操作、検索を容易にする基本的な認知プロセスであり、推論や意思決定において重要な役割を果たす。 WMの多面的な性質を捉えたロバストベンチマークデータセットは、AI WMモデルの効果的な開発と評価に不可欠である。ここでは、この目的のために包括的なワーキングメモリ(WorM)ベンチマークデータセットを紹介する。 WorMは10のタスクと100万のトライアルで構成され、WMの4つの機能、3つのドメイン、11の行動および神経特性を評価している。これらすべてのタスクで、最先端のリカレントニューラルネットワークとトランスフォーマーを共同でトレーニングし、テストしました。比較のための上限として、人間の行動ベンチマークも含んでいます。以上の結果から,脳におけるwmの特徴,特にプライマシーとrecency効果,神経クラスターを再現し,wmの異なる領域と機能に特有な相関関係を示唆した。実験では、既存のモデルにおける人間の行動を近似するいくつかの制限も明らかにしている。このデータセットは、認知心理学、神経科学、AIのコミュニティにとって貴重なリソースであり、WMモデルの比較と拡張、WMの神経基盤の調査、人間に似た能力を持つWMモデルの開発のための標準化されたフレームワークを提供する。ソースコードとデータはhttps://github.com/zhanglab-deepneurocoglab/wormで入手できます。 Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. 
This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.	翻訳日:2023-07-21 13:27:30 公開日:2023-07-20
# 適応型特徴分割圧縮によるコミュニケーション効率の高い分割学習 Communication-Efficient Split Learning via Adaptive Feature-Wise Compression ( http://arxiv.org/abs/2307.10805v1 ) ライセンス: Link先を確認	Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, and Yo-Seb Jeon	(参考訳) 本稿では,SL学習過程における中間特徴量と勾配ベクトルの伝達に必要な通信オーバーヘッドを低減させる,SplitFCという新しい通信効率分割学習フレームワークを提案する。 splitfcの鍵となるアイデアは、行列の列に現れる異なる分散度を活用することである。 SplitFCには2つの圧縮戦略がある。 (i)アダプティブ・フィーチャーワイズ・ドロップアウトと (ii)適応的特徴量化。第1の戦略では、これらのベクトルの標準偏差に基づいて、適応的なドロップアウト確率で中間特徴ベクトルをドロップする。そして、チェーンルールにより、ドロップされた特徴ベクトルに関連する中間勾配ベクトルもドロップする。第2の戦略では、非投下中間特徴と勾配ベクトルは、ベクトルの範囲に基づいて決定される適応量子化レベルを用いて量子化される。量子化誤差を最小限に抑えるため、この戦略の最適量子化レベルは閉形式式で導出される。 MNIST、CIFAR-10、CelebAデータセットのシミュレーションの結果、SplitFCは最先端のSLフレームワークと比較して分類精度が5.6%以上向上し、圧縮のないバニラSLフレームワークに比べて通信オーバーヘッドが320倍小さいことが示されている。 This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. 
Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead than the vanilla SL framework without compression.	翻訳日:2023-07-21 13:23:01 公開日:2023-07-20
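The two SplitFC strategies described above (dropout driven by per-feature standard deviation, quantization driven by per-feature range) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not state how standard deviation maps to dropout probability, so the rule here (keep probability proportional to standard deviation) and the helper names `column_std`, `keep_probabilities`, and `quantize_column` are hypothetical, and the uniform quantizer is a generic one, not the closed-form optimum derived in the paper.

```python
import statistics


def column_std(matrix):
    """Population standard deviation of each column (one column per feature)."""
    return [statistics.pstdev(col) for col in zip(*matrix)]


def keep_probabilities(stds, keep_ratio):
    """Keep probability proportional to std, with mean keep_ratio before clipping.

    Assumed rule: low-dispersion features are dropped more often.
    """
    total = sum(stds)
    if total == 0.0:
        return [keep_ratio] * len(stds)
    return [min(1.0, keep_ratio * len(stds) * s / total) for s in stds]


def quantize_column(col, levels):
    """Uniform quantizer whose grid spans the column's own range."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return list(col)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in col]


# Made-up intermediate features: 4 samples x 3 feature columns.
features = [
    [1.0, 0.10, 0.0],
    [1.0, 0.20, 0.3],
    [1.0, 0.10, 0.7],
    [1.0, 0.20, 1.0],
]
stds = column_std(features)
probs = keep_probabilities(stds, keep_ratio=0.5)
coarse = quantize_column([row[2] for row in features], levels=2)
```

In this toy input the constant first column gets keep probability 0, and the widest column is the most likely to be transmitted; a real system would also drop the matching gradient columns in the backward pass, as the abstract notes.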
# 海洋科学のための時空間データマイニング:データ,方法論,機会 Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities ( http://arxiv.org/abs/2307.10803v1 ) ライセンス: Link先を確認	Hanchen Yang and Wengen Li and Shuyu Wang and Hui Li and Jihong Guan and Shuigeng Zhou and Jiannong Cao	(参考訳) 時空間(ST)海洋データの増加に伴い、気候予測や災害警報といった様々な海洋問題に対処するため、多くの時空間データマイニング(STDM)研究が実施されている。典型的なSTデータ(例えば、交通データ)と比較すると、ST海洋データはより複雑で、例えば、多様な地域性や高いスパース性といった特徴がある。これらの特徴はSTDMモデルの設計と訓練を困難にしている。残念なことに、これらの研究の概要はいまだに欠けており、コンピュータ科学者が海洋研究の問題を識別するのを妨げつつ、海洋科学の研究者が高度なSTDM技術を適用することを妨げている。この状況を改善するため,海洋における既存のSTDM研究を総括する総合的な調査を行う。具体的には,広く使用されているST海洋データセットをまず要約し,その特徴を同定する。次に,典型的なST海洋データ品質向上技術について述べる。次に,海洋における既存のSTDM研究を,予測,事象検出,パターンマイニング,異常検出という4つのタスクに分類し,これらのタスクのテクニックを精査する。最後に、有望な研究機会が強調される。この調査は、コンピュータ科学と海洋科学の両方の分野の科学者が、海洋におけるSTDMの基本概念、鍵となる技術、そしてオープンチャレンジをよりよく理解するのに役立つだろう。 With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists from identifying research issues in ocean science while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey to summarize existing STDM studies in ocean science. Concretely, we first summarize the widely-used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. 
Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM for the ocean.	翻訳日:2023-07-21 13:22:38 公開日:2023-07-20
# Meta-Transformer: マルチモーダル学習のための統一フレームワーク Meta-Transformer: A Unified Framework for Multimodal Learning ( http://arxiv.org/abs/2307.10802v1 ) ライセンス: Link先を確認	Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue	(参考訳) マルチモーダル学習は、複数のモダリティから情報を処理し、関連付けるモデルを構築することを目的としている。この分野における長年の開発にもかかわらず、様々なモダリティ($\textit{e.g.}$ 自然言語、2D画像、3D点群、音声、動画、時系列、表形式データ)を処理するための統一ネットワークを設計することは、それらの間に固有のギャップがあるため依然として困難である。本研究では,$\textbf{frozen}$エンコーダを利用して,対のマルチモーダルトレーニングデータを用いずにマルチモーダル知覚を行う,Meta-Transformerというフレームワークを提案する。 Meta-Transformerでは、様々なモダリティからの生の入力データを共有トークン空間にマッピングし、凍結パラメータを持つ後続のエンコーダで入力データの高レベルな意味的特徴を抽出する。統合データトークンライザ、モダリティ共有エンコーダ、ダウンストリームタスク用のタスク固有ヘッドの3つの主要コンポーネントで構成されるMeta-Transformerは、12のモダリティにまたがる統一学習を非ペアデータで実行する最初のフレームワークである。異なるベンチマークの実験によると、Meta-Transformerは基本的な認識(テキスト、画像、ポイントクラウド、オーディオ、ビデオ)、実用的なアプリケーション(X線、赤外線、ハイパースペクトル、IMU)、データマイニング(グラフ、表、時系列)など、幅広いタスクを処理できる。 Meta-Transformerは、トランスフォーマーを用いた統合マルチモーダルインテリジェンスを開発するための有望な未来を示す。コードは https://github.com/invictus717/MetaTransformer で公開予定である。 Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. 
Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer	翻訳日:2023-07-21 13:22:14 公開日:2023-07-20
# 合成一般化のための層間表現融合 Layer-wise Representation Fusion for Compositional Generalization ( http://arxiv.org/abs/2307.10799v1 ) ライセンス: Link先を確認	Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye, Yidong Chen, Xiaodong Shi	(参考訳) 幅広い応用で成功したにもかかわらず、シーケンス・ツー・シーケンスモデルの解の構成は、人間のような一般化よりも構成的でないと論じられている。合成一般化を妨げる理由の1つは、エンコーダとデコーダの最上層の表現が絡み合っていることであるという証拠が増えている。言い換えると、シーケンスの構文的および意味的表現は不適切にツイストされる。しかし,従来のほとんどの研究は,人間のように適切にシーケンスの構文的・意味的表現を構成・使用するのではなく,トークンレベルの意味情報の強化に重点を置いている。また, ``shallow'' の残差接続や,従来のレイヤの情報を効果的に融合させることができない単純なワンステップ操作などにより,深層変圧器の訓練に関する最近の研究から,絡み合い問題が存在する理由を説明する。この発見から始まり、人間の戦略に着想を得て、各エンコーダおよびデコーダ層に \emph{fuse-attention module} を導入することにより、前のレイヤの情報をエンコードおよびデコードプロセスに適切に融合する、シーケンス・ツー・シーケンスモデルの拡張である \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and Semant\textbf{i}c Representati\textbf{on}s) を提案する。 \textsc{FuSion} は2つの現実的なベンチマークにおいて, 競合的かつ, さらには \textbf{state-of-the-art} の結果を達成し, 提案手法の有効性を実証的に示す。 Despite successes across a broad range of applications, the solutions constructed by sequence-to-sequence models are argued to be less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is that the representations of the encoder and decoder uppermost layers are entangled. In other words, the syntactic and semantic representations of sequences are twisted inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information to alleviate the representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately as humans do. In addition, we explain why the entanglement problem exists from the perspective of recent studies about training deeper Transformers, mainly owing to the ``shallow'' residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. 
Starting from this finding and inspired by humans' strategies, we propose \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and Semant\textbf{i}c Representati\textbf{on}s), an extension to sequence-to-sequence models to learn to fuse previous layers' information back into the encoding and decoding process appropriately through introducing a \emph{fuse-attention module} at each encoder and decoder layer. \textsc{FuSion} achieves competitive and even \textbf{state-of-the-art} results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.	翻訳日:2023-07-21 13:21:42 公開日:2023-07-20
# hyperreenact: 共同学習によるワンショット再現による顔の洗練とターゲティング HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces ( http://arxiv.org/abs/2307.10797v1 ) ライセンス: Link先を確認	Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos	(参考訳) 本稿では,ターゲットの顔のポーズによって駆動される音源の頭部画像のリアルな生成を目的とした,HyperReenactと呼ばれるニューラルフェイス再現法を提案する。既存の最先端の顔再現法では、現実的な顔画像の合成を学ぶための制御可能な生成モデルを訓練するが、重要な視覚的アーティファクト、特に極端な頭部ポーズの変化の困難な条件下では、再現された顔を生成する。本稿では,まず実像をその潜在空間に逆転させ,次にハイパーネットワークを用いて実行することで,予め訓練したStyleGAN2ジェネレータの光リアリスティック生成能力と歪み特性を活用することで,これらの制約に対処することを提案する。 (i)原産地特性の精細化及び (二)顔のポーズを再ターゲットし、通常人工物を生成する外部編集方法への依存をなくす。本手法は,単発設定(すなわち単一ソースフレームを使用する)で動作し,被写体固有の微調整を必要とせず,クロスサブジェクトの再現を可能にする。本手法は,voxceleb1およびvoxceleb2の標準ベンチマークにおいて,定量的かつ定性的に,いくつかの最先端技術と比較し,極端な頭部姿勢変化においても顕著なロバスト性を示すアーティファクトフリー画像生成におけるアプローチの優位性を示す。コードと事前訓練済みのモデルは、https://github.com/StelaBou/HyperReenact で公開しています。 In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or requiring expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform: (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating this way the dependence on external editing methods that typically produce artifacts. 
Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact .	翻訳日:2023-07-21 13:20:48 公開日:2023-07-20
# 古典的ジャミングに対する量子強化レンジフィンディングの実証 Demonstration of quantum-enhanced rangefinding robust against classical jamming ( http://arxiv.org/abs/2307.10794v1 ) ライセンス: Link先を確認	Mateusz P. Mrozowski, Richard J. Murchie, John Jeffers, Jonathan D. Pritchard	(参考訳) 本稿では,連続励起光子対源に基づく量子強化ライダーの動作を,信号レベルと背景レベルの差が5桁以上,目標反射率が-52dBまで低下する条件下で,簡単な検出と組み合わせて実証する。本稿では,この検出器の性能をlog-likelihood分析フレームワークを用いて特徴付け,高速および低速の古典的ジャミングに対するシステムの頑健性を示すとともに,高い周波数変動に対する免疫を維持しつつ,背景変化の遅い影響をなくす動的背景追跡を実現するための新しいプロトコルを導入する。最後に,このシステムを古典的ジャミングの存在下でのレンジファインディングに拡張し,検出器ジッタのみに制限された11cmの空間分解能でターゲットの位置を特定する。これらの結果は、ライダーアプリケーションに対して量子相関を利用する利点を示し、現実のシナリオでこのシステムを実装するための明確な経路を提供する。 In this paper we demonstrate operation of a quantum-enhanced lidar based on a continuously pumped photon pair source combined with simple detection in regimes with over 5 orders of magnitude separation between signal and background levels and target reflectivity down to -52 dB. We characterise the performance of our detector using a log-likelihood analysis framework, and crucially demonstrate the robustness of our system to fast and slow classical jamming, introducing a new protocol to implement dynamic background tracking to eliminate the impact of slow background changes whilst maintaining immunity to high frequency fluctuations. Finally, we extend this system to the regime of rangefinding in the presence of classical jamming to locate a target with an 11 cm spatial resolution limited only by the detector jitter. These results demonstrate the advantage of exploiting quantum correlations for lidar applications, providing a clear route to implementation of this system in real-world scenarios.	翻訳日:2023-07-21 13:19:44 公開日:2023-07-20
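The abstract above characterises detection with a log-likelihood analysis. As a rough illustration of what such a test can look like with click/no-click detectors, the sketch below computes a binomial log-likelihood ratio between "target present" and "target absent" from the number of clicks in a fixed number of shots. The per-shot click probabilities, the shot counts, and the function name `log_likelihood_ratio` are made-up assumptions; the paper's actual framework is not reproduced here.

```python
import math


def log_likelihood_ratio(clicks, shots, p_target, p_background):
    """Binomial log-likelihood ratio of 'target present' versus 'target absent'.

    p_target and p_background are assumed per-shot click probabilities
    under the two hypotheses (hypothetical values in the example below).
    """
    misses = shots - clicks
    return (clicks * math.log(p_target / p_background)
            + misses * math.log((1.0 - p_target) / (1.0 - p_background)))


# Made-up numbers: 1000 shots, 3% click probability with a target, 1% without.
llr_present = log_likelihood_ratio(clicks=30, shots=1000, p_target=0.03, p_background=0.01)
llr_absent = log_likelihood_ratio(clicks=10, shots=1000, p_target=0.03, p_background=0.01)
```

A positive ratio favours "target present" and a negative one favours "target absent", so zero acts as a natural decision threshold in this toy setting.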
# Few/many-shot異常検出のためのPatchCoreの最適化 Optimizing PatchCore for Few/many-shot Anomaly Detection ( http://arxiv.org/abs/2307.10792v1 ) ライセンス: Link先を確認	João Santos, Triet Tran, Oliver Rippel	(参考訳) Few-shot Anomaly Detection (AD) はADの新興サブフィールドであり、少数のサンプルのみを用いて正常データと異常データの区別を試みる。新たに提案された少数ショットAD手法は、フルショット領域向けに開発された既存のアルゴリズムをベースラインとして比較するが、それらを少数ショット設定向けに専用に最適化するわけではない。したがって、そのような既存アルゴリズムの性能をさらに改善できるかどうかは不明である。本研究ではこの問いに取り組む。具体的には,現在最先端のフルショットAD/ASアルゴリズムであるPatchCoreのAD/アノマリーセグメンテーション(AS)性能について,少数ショットと多ショット設定の両方で検討する。我々は, (I) 様々なハイパーパラメータを最適化し, (II) 少数ショット教師あり学習を改善することが知られている手法をADドメインに転用することで, さらなる性能向上を実現できると仮定した。公開されているVisAとMVTec ADデータセットでの網羅的な実験により、(I)基礎となる特徴抽出器のようなハイパーパラメータを最適化することで大幅な性能改善が得られること、(II)画像レベルの拡張は性能を改善しうるが、改善が保証されるわけではないことが明らかになった。これらの結果に基づき,VisA上での少数ショットADにおいて新たな最先端性能を達成し,既存のAD/AS手法を少数ショット設定に適応させるメリットをさらに実証する。最後に, 強い帰納バイアスを有する特徴抽出器の調査を, (few-shot) AD/ASの今後の研究方向性として挙げる。 Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only a few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not specifically optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. 
Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed to, improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.	翻訳日:2023-07-21 13:19:12 公開日:2023-07-20
# 視覚・言語ナビゲーションエージェントの行動解析 Behavioral Analysis of Vision-and-Language Navigation Agents ( http://arxiv.org/abs/2307.10790v1 ) ライセンス: Link先を確認	Zijiao Yang, Arjun Majumdar, Stefan Lee	(参考訳) 成功させるためには、Vision-and-Language Navigation (VLN) エージェントは周囲に基づいて行動の指示を下す必要がある。本研究では,既存のエージェントが,特定の物体や部屋の停止,旋回,移動に関する指示をいかにしっかりと下ろすかを調べることによって,エージェントの行動を研究する手法を開発する。このアプローチはスキル固有の介入の生成とエージェント予測の変化の測定に基づいている。本稿では,近年のエージェントの行動を分析し,複数のエージェントを比較した詳細なケーススタディを提案する。この分析は、学習のバイアスがエージェントの挙動に持続的な影響を与え、既存のモデルが単純な参照表現を基礎にすることができることを示唆している。本モデルとの比較から,VLNタスク全体の性能向上とスキル特化スコアの相関が示唆された。 To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology to study agent behavior on a skill-specific basis -- examining how well existing agents ground instructions about stopping, turning, and moving towards specified objects or rooms. Our approach is based on generating skill-specific interventions and measuring changes in agent predictions. We present a detailed case study analyzing the behavior of a recent agent and then compare multiple agents in terms of skill-specific competency scores. This analysis suggests that biases from training have lasting effects on agent behavior and that existing models are able to ground simple referring expressions. Our comparisons between models show that skill-specific scores correlate with improvements in overall VLN task performance.	翻訳日:2023-07-21 13:18:44 公開日:2023-07-20
# 分類器の混合に対する逆攻撃 Adversarial attacks for mixtures of classifiers ( http://arxiv.org/abs/2307.10788v1 ) ライセンス: Link先を確認	Lucas Gnecco Heredia, Benjamin Negrevergne, Yann Chevaleyre	(参考訳) 対向攻撃に対する堅牢性を改善する手段として、分類器の混合(すなわちランダム化アンサンブル)が提案されている。しかし、既存の攻撃はこの種の分類器には適していないことが示されている。本稿では,混合を原理的に攻撃する問題について議論し,問題(有効性と極大性)の幾何学的解析に基づく攻撃の2つの望ましい特性を紹介する。そして、既存の攻撃が両方の特性を満たさないことを示す。最後に, 2次線形設定を理論的に保証する格子クライマー攻撃という新たな攻撃を導入し, 合成および実データを用いた実験を行い, その性能を実証する。 Mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, it has been shown that existing attacks are not well suited for this kind of classifiers. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees on the binary linear setting, and we demonstrate its performance by conducting experiments on synthetic and real datasets.	翻訳日:2023-07-21 13:18:30 公開日:2023-07-20
# クラスプロトタイプによるフィードフォワードソースフリードメイン適応 Feed-Forward Source-Free Domain Adaptation via Class Prototypes ( http://arxiv.org/abs/2307.10787v1 ) ライセンス: Link先を確認	Ondrej Bohdal, Da Li, Timothy Hospedales	(参考訳) ソースフリーなドメイン適応は、実用性があり、ソースデータにアクセスする必要がないため人気がある。しかし、適応プロセスにはまだかなりの時間が必要であり、主にバックプロパゲーションに依存する最適化に基づいている。本稿では,バックプロパゲーションに基づく適応の必要性に挑戦する単純なフィードフォワードアプローチを提案する。提案手法は,事前学習モデルを用いて,ドメインシフト下でのクラスプロトタイプの計算に基づいている。事前学習したモデルに比べて精度が大幅に向上し、既存のドメイン適応法のわずかな時間しか必要としない。 Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.	翻訳日:2023-07-21 13:18:19 公開日:2023-07-20
# 比較的(事実的)説明のミラー定義の修正 Modifications of the Miller definition of contrastive (counterfactual) explanations ( http://arxiv.org/abs/2307.10832v1 ) ライセンス: Link先を確認	Kevin McAreavey, Weiru Liu	(参考訳) miller氏は最近、よく知られたhalpern-pearl(hp)の定義と(矛盾しない)説明に基づいて、対比的(事実的)な説明の定義を提案した。重要なことに、ミラーの定義は元々のHPによる説明の定義に基づいているが、これはハルパーンによって修正されている。最近ではボルナーが第3の定義を提案しており、この修正HPの定義は直観に反する結果をもたらす可能性があるとしている。本稿では,miller の定義が hp の定義の問題点を継承することを示す。我々は,より堅牢なhp と borner の定義に基づいて,改良された 2 つの変種を提案することで,これらの問題に対処する。我々は、新しい定義を分析し、これらがミラー定義の精神を保ち、これら3つの変種全てが非矛盾的説明の基盤となる定義に関してモジュラーである別の統一定義を満たすことを示した。我々の知る限りでは、本論文は、オリジナルのHP定義と修正HP定義との最初の明示的な比較も提供する。 Miller recently proposed a definition of contrastive (counterfactual) explanations based on the well-known Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations. Crucially, the Miller definition was based on the original HP definition of explanations, but this has since been modified by Halpern; presumably because the original yields counterintuitive results in many standard examples. More recently Borner has proposed a third definition, observing that this modified HP definition may also yield counterintuitive results. In this paper we show that the Miller definition inherits issues found in the original HP definition. We address these issues by proposing two improved variants based on the more robust modified HP and Borner definitions. We analyse our new definitions and show that they retain the spirit of the Miller definition where all three variants satisfy an alternative unified definition that is modular with respect to an underlying definition of non-contrastive explanations. To the best of our knowledge this paper also provides the first explicit comparison between the original and modified HP definitions.	翻訳日:2023-07-21 13:11:10 公開日:2023-07-20
# Yelpレビューと食品タイプ: レーティング、センチメント、トピックの比較分析 Yelp Reviews and Food Types: A Comparative Analysis of Ratings, Sentiments, and Topics ( http://arxiv.org/abs/2307.10826v1 ) ライセンス: Link先を確認	Wenyu Liao, Yiqing Shi, Yujia Hu, Wei Quan	(参考訳) 本研究は、yelpのレビューと食品の種類との関係を調査し、格付け、感情、トピックが食品の種類によってどのように異なるかを調査した。具体的には,レビューの格付けや感情が食品の種類や格付けや感情に基づいてどのように変化するかを分析し,機械学習モデルを用いたレビュートピックを推察し,異なる食品タイプ間の話題分布を比較する。分析の結果、食品の種類によっては、類似の格付け、感情、話題の分布があるのに対し、別のパターンがあることが明らかとなった。評価と感情に基づいて,4種類の食品の種類を特定し,特定の食品の種類をレビューする際に異なる話題に注目する傾向が認められた。これらの知見は,デジタルメディアプラットフォームにおけるユーザ行動と文化的影響の理解と,異文化間の理解と評価の促進に重要な意味を持つ。 This study examines the relationship between Yelp reviews and food types, investigating how ratings, sentiments, and topics vary across different types of food. Specifically, we analyze how ratings and sentiments of reviews vary across food types, cluster food types based on ratings and sentiments, infer review topics using machine learning models, and compare topic distributions among different food types. Our analyses reveal that some food types have similar ratings, sentiments, and topics distributions, while others have distinct patterns. We identify four clusters of food types based on ratings and sentiments and find that reviewers tend to focus on different topics when reviewing certain food types. These findings have important implications for understanding user behavior and cultural influence on digital media platforms and promoting cross-cultural understanding and appreciation.	翻訳日:2023-07-21 13:10:53 公開日:2023-07-20
# Parseとリコール:放射線医のような正確な肺結節悪性度予測を目指して Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists ( http://arxiv.org/abs/2307.10824v1 ) ライセンス: Link先を確認	Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang	(参考訳) 肺がんは世界中で主要な死因であり、早期検診は生存率の向上に不可欠である。臨床的には、結節の文脈構造と放射線医の蓄積した経験は良性および悪性結節の同定の正確性に関連する2つの中核要素である。文脈情報は、位置、形状、周辺血管などの結節に関する包括的な情報を提供し、経験豊富な放射線科医は、意思決定の基礎を強化するために、以前の事例から手がかりを探すことができる。本稿では,放射線科医の診断過程をシミュレートする放射線科医にインスパイアされた手法を提案する。コンテキスト解析モジュールはまず、結節のコンテキスト構造をセグメント化し、その後、結節のより包括的な理解のためにコンテキスト情報を集約する。プロトタイプリコールモジュールは、プロトタイプベースの学習を利用して、以前に学んだケースを比較分析のプロトタイプとして凝縮する。この2つのモジュールを基盤として, 結節の固有特性と他の結節から蓄積された外部知識を併用し, 音響診断を行う。低用量と非用量の両方のニーズを満たすため,低用量および非用量CTからそれぞれ12,852ノジュールと4,029ノジュールの大規模データセットを収集し,それぞれに病理診断と追跡確認を行った。提案手法は,低線量および非コントラストの両方のシナリオにおいて,高度なスクリーニング性能を実現することを示す。 Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. 
The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios.	翻訳日:2023-07-21 13:10:36 公開日:2023-07-20
# 漸進的意味セグメンテーションのための勾配-意味論的補償 Gradient-Semantic Compensation for Incremental Semantic Segmentation ( http://arxiv.org/abs/2307.10822v1 ) ライセンス: Link先を確認	Wei Cong, Yang Cong, Jiahua Dong, Gan Sun, Henghui Ding	(参考訳) インクリメンタルセマンティックセマンティクスは、以前に学習したクラスのトレーニングデータにアクセスすることなく、新しいクラスのセマンティクスを継続的に学習することを目的としている。しかし、現在のほとんどの方法は破滅的な忘れと背景シフトに対処できない。 1)不均衡勾配バックプロパゲーションによって引き起こされる異なるペースを考慮せずに,すべてのクラスを等しく扱うこと。 2) クラス間の強い意味指導がない。本稿では,上記の課題に取り組むため,グラデーションとセマンティクスの両方の観点から段階的なセマンティクスセグメンテーションを克服する,グラデーション・セマンティクス補償(gsc)モデルを提案する。具体的には、勾配面からの破滅的な忘れに対処するために、再重み付け勾配のバックプロパゲーションにより、以前に見られたクラスの忘れるペースのバランスをとることができるステップアウェアな勾配補償を開発する。一方,本研究では,意味的側面からの破滅的忘れを緩和するソフトラベルを用いて,一貫したクラス間意味関係を蒸留するソフトシャープ意味関係蒸留法を提案する。さらに,背景変化を緩和する強力な意味的ガイダンスを提供する,原型的な擬似再ラベルを開発する。ピクセルとクラスワイドプロトタイプ間の距離を測定することで、バックグラウンドで古いクラスの高品質な擬似ラベルを生成する。 3つの公開データセット、すなわち Pascal VOC 2012 ADE20K と Cityscapes に関する大規模な実験は、提案した GSC モデルの有効性を実証している。 Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation; 2) lack strong semantic guidance between classes. To tackle the above challenges, in this paper, we propose a Gradient-Semantic Compensation (GSC) model, which surmounts incremental semantic segmentation from both gradient and semantic perspectives. Specifically, to address catastrophic forgetting from the gradient aspect, we develop a step-aware gradient compensation that can balance forgetting paces of previously seen classes via re-weighting gradient backpropagation. Meanwhile, we propose a soft-sharp semantic relation distillation to distill consistent inter-class semantic relations via soft labels for alleviating catastrophic forgetting from the semantic aspect. 
In addition, we develop a prototypical pseudo re-labeling that provides strong semantic guidance to mitigate background shift. It produces high-quality pseudo labels for old classes in the background by measuring distances between pixels and class-wise prototypes. Extensive experiments on three public datasets, i.e., Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of our proposed GSC model.	翻訳日:2023-07-21 13:10:06 公開日:2023-07-20
# 電磁散乱における第1ボルン近似の厳密性 Exactness of the first Born approximation in electromagnetic scattering ( http://arxiv.org/abs/2307.10819v1 ) ライセンス: Link先を確認	Farhang Loran and Ali Mostafazadeh	(参考訳) 一般の（異方的でありうる）定常線形媒質による3次元の平面電磁波散乱に対して、入射波数$k$が予め割り当てられた値$\alpha$を超えない場合に、第1ボルン近似が散乱波の厳密な表現を与えるための、媒質の誘電率テンソルと透磁率テンソルに関する条件を与える。また,この条件下では，$k\leq \alpha/2$ に対して媒質が全方向不可視であること、すなわち入射波の偏光によらず広帯域不可視性を示すことを示す。 For the scattering of plane electromagnetic waves by a general possibly anisotropic stationary linear medium in three dimensions, we give a condition on the permittivity and permeability tensors of the medium under which the first Born approximation yields the exact expression for the scattered wave whenever the incident wavenumber $k$ does not exceed a pre-assigned value $\alpha$. We also show that under this condition the medium is omnidirectionally invisible for $k\leq \alpha/2$, i.e., it displays broadband invisibility regardless of the polarization of the incident wave.	翻訳日:2023-07-21 13:09:35 公開日:2023-07-20
# BoxDiff: トレーニング不要なボックス制約拡散を用いたテキスト・画像合成 BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion ( http://arxiv.org/abs/2307.10816v1 ) ライセンス: Link先を確認	Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng and Mike Zheng Shou	(参考訳) 最近のテキストから画像への拡散モデルは、高品質な画像を生成する驚くべき能力を示している。しかし、研究者は主にテキストプロンプトだけで画像の合成方法を研究した。他のモダリティを条件として利用する研究もあるが、箱/マスク画像ペアや微調整時間など、かなりのペアデータが必要となる。このようなペアデータには時間と労力がかかり、クローズドセットに制限されるため、オープンワールドにおけるアプリケーションのボトルネックになる可能性がある。本稿では,ボックスやスクリブルなどのユーザ提供条件の最も単純な形式に焦点を当てる。上記の問題を緩和するために,与えられた空間条件に固執する合成画像内のオブジェクトやコンテキストを制御するためのトレーニングフリーな手法を提案する。具体的には、3つの空間的制約、すなわち、インナーボックス、アウターボックス、コーナー制約は、追加のトレーニングや大量のアノテートレイアウトデータを必要としない拡散モデルのデノイングステップにシームレスに統合される。提案した制約は, 安定拡散モデルが高忠実で多様な概念カバレッジで合成できる能力を維持しつつ, 画像中の何とどこに表示すべきかを制御できることを示す。コードはhttps://github.com/Sierkinhane/BoxDiffで公開されている。 Recent text-to-image diffusion models have demonstrated an astonishing capacity to generate high-quality images. However, researchers mainly studied the way of synthesizing images with only text prompts. While some works have explored using other modalities as conditions, considerable paired data, e.g., box/mask-image pairs, and fine-tuning time are required for nurturing models. As such paired data is time-consuming and labor-intensive to acquire and restricted to a closed set, this potentially becomes the bottleneck for applications in an open world. This paper focuses on the simplest form of user-provided conditions, e.g., box or scribble. To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images adhering to the given spatial conditions. Specifically, three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints, are designed and seamlessly integrated into the denoising step of diffusion models, requiring no additional training and massive annotated layout data. 
Extensive results show that the proposed constraints can control what and where to present in the images while retaining the ability of the Stable Diffusion model to synthesize with high fidelity and diverse concept coverage. The code is publicly available at https://github.com/Sierkinhane/BoxDiff.	翻訳日:2023-07-21 13:09:21 公開日:2023-07-20
# クロスコーポレート多言語音声感情認識:アムハラ語対他言語 Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages ( http://arxiv.org/abs/2307.10814v1 ) ライセンス: Link先を確認	Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmed Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed, Jun Feng	(参考訳) 従来の音声感情認識(ser)タスクでは、所定の言語の分類器が、同じ言語用の既存のデータセット上で訓練される。しかし、言語のトレーニングデータが存在しない場合は、代わりに他の言語からのデータを使用することができる。言語横断および多言語SERを用いて,アムハラ語,英語,ドイツ語,URDUを用いて実験を行った。 amharicでは、公開されているamharic speech emotion dataset(ased)を使っています。英語、ドイツ語、Urduでは、既存のRAVDESS、EMO-DB、URDUデータセットを使用します。我々は、すべてのデータセットのラベルを正と負の2つのクラスにマッピングする以前の研究に従った。したがって、異なる言語のパフォーマンスを直接比較し、トレーニングとテストのための言語を組み合わせることができます。実験1では、AlexNet、VGGE(VGGの派生案)、ResNet50の3つの分類器を用いて単言語SER試験を行った。 3つのモデルの平均値はASEDとRAVDESSと非常によく似ており、アムハラ語と英語のSERも同様に難しいことが示唆された。同様に、ドイツのSERはより困難であり、Urdu SERはより簡単である。実験2では,ある言語で訓練を行い,各ペアの両方向(amharic<->german, amharic<->english, amharic<->urdu)でテストを行った。 amharicをターゲットとした結果は、英語やドイツ語をソースとして使うことが最良の結果をもたらすことを示唆している。実験3では、いくつかの非アムハラ語でトレーニングを行い、それからアムハラ語でテストしました。得られた最良の精度は実験2の最良の精度よりも数パーセント高く、訓練に2つまたは3つの非アンモリック言語を使う場合、1つの非アンモリック言語を使う場合よりも良い結果が得られることが示唆された。全体として,言語資源が不足している場合,言語間および多言語間トレーニングがser分類器の訓練に有効な戦略となる可能性が示唆された。 In a conventional Speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language does not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German and URDU. For Amharic, we use our own publicly-available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets. We followed previous research in mapping labels for all datasets to just two classes, positive and negative. Thus we can compare performance on different languages directly, and combine languages for training and testing. 
In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for the three models were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. Similarly, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each pair: Amharic<->German, Amharic<->English, and Amharic<->Urdu. Results with Amharic as target suggested that using English or German as source will give the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percent greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training a SER classifier when resources for a language are scarce.	翻訳日:2023-07-21 13:08:58 公開日:2023-07-20
# 全方位音声視覚信号の知覚品質評価 Perceptual Quality Assessment of Omnidirectional Audio-visual Signals ( http://arxiv.org/abs/2307.10813v1 ) ライセンス: Link先を確認	Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai	(参考訳) 医療、教育、広告、観光などの分野において、Omnidirectional Video (ODV) はますます重要な役割を担っている。 ODVの品質を評価することは、サービスプロデューサにとってユーザのQuality of Experience(QoE)を改善する上で重要である。しかし、既存のODVの品質評価研究はビデオの歪みにのみ焦点を当てているが、全体的なQoEは付随する音声信号にも依存している。本稿では,まず,高画質全方向A/Vコンテンツから生成される375個の全方向オーディオ視覚(A/V)シーケンスと,それに対応する知覚的オーディオ視覚品質スコアを含む,全方向ビデオのための大規模オーディオ視覚品質評価データセットを確立する。そこで,本研究では,マルチモーダル融合戦略を用いて,既存の単一モードオーディオおよびビデオQAモデルを組み合わせた全方位オーディオ視覚品質評価(OAVQA)のための3つのベースライン手法を設計する。我々は,OAVQAに対するA/Vマルチモーダル融合法の有効性を検証し,全方位QoE評価のための新しいベンチマークを提供する。私たちのデータセットはhttps://github.com/iamazxl/oavqaで利用可能です。 Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc. Assessing the quality of ODVs is significant for service-providers to improve the user's Quality of Experience (QoE). However, most existing quality assessment studies for ODVs only focus on the visual distortions of videos, while ignoring that the overall QoE also depends on the accompanying audio signals. In this paper, we first establish a large-scale audio-visual quality assessment dataset for omnidirectional videos, which includes 375 distorted omnidirectional audio-visual (A/V) sequences generated from 15 high-quality pristine omnidirectional A/V contents, and the corresponding perceptual audio-visual quality scores. Then, we design three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA), which combine existing state-of-the-art single-mode audio and video QA models via multimodal fusion strategies. We validate the effectiveness of the A/V multimodal fusion method for OAVQA on our dataset, which provides a new benchmark for omnidirectional QoE evaluation. 
Our dataset is available at https://github.com/iamazxl/OAVQA.	翻訳日:2023-07-21 13:08:28 公開日:2023-07-20
# 「第二の心を持つように思える」:大規模言語モデルによるプレライティングにおける人間とAIの共創造性の検討 "It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models ( http://arxiv.org/abs/2307.10811v1 ) ライセンス: Link先を確認	Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu	(参考訳) プレライティング(prewriting)は、最初のドラフトの前にアイデアを発見し、開発するプロセスである。大規模言語モデル(LLM)は、クリエイティブな記述を含む様々なタスクに有用であることが示されているが、ユーザーがプレライティングを支援するためにLLMとどのように協力するかは分かっていない。このような創造的プロセスにおいてLLMに望まれる協調的役割とイニシアティブもまた不明確である。プレライティング中の人間-LLMのコラボレーションパターンとダイナミクスを調べるために,15人の参加者による3セッションの質的研究を、物語執筆とスローガン執筆という2つの創作タスクで行った。その結果,共同でのプレライティングにおいて,アイデア創出(Ideation)、ひらめき(Illumination)、実装(Implementation)の段階を含む3段階の反復的Human-AI共創造プロセスが存在するとみられることがわかった。この協調プロセスは、人間とLLMの間に存在する混合的かつ変動的なレベルのイニシアティブに加えて、人間を支配的な役割に据える。本研究は、このプロセス中に発生するコラボレーションのブレークダウン、Human-AI共創造における既存のLLMの使用に対するユーザ認識について報告し、この共創造プロセスを支援するための設計上の意味について論じる。 Prewriting is the process of discovering and developing ideas before a first draft, which requires divergent thinking and often implies unstructured strategies such as diagramming, outlining, free-writing, etc. Although large language models (LLMs) have been demonstrated to be useful for a variety of tasks including creative writing, little is known about how users would collaborate with LLMs to support prewriting. The preferred collaborative role and initiative of LLMs during such a creativity process is also unclear. To investigate human-LLM collaboration patterns and dynamics during prewriting, we conducted a three-session qualitative study with 15 participants in two creative tasks: story writing and slogan writing. The findings indicated that during collaborative prewriting, there appears to be a three-stage iterative Human-AI Co-creativity process that includes Ideation, Illumination, and Implementation stages. This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs. 
This research also reports on collaboration breakdowns that occur during this process, user perceptions of using existing LLMs during Human-AI Co-creativity, and discusses design implications to support this co-creativity process.	翻訳日:2023-07-21 13:08:06 公開日:2023-07-20
# 最適輸送による模倣学習におけるエキスパート実演の統合について On Combining Expert Demonstrations in Imitation Learning via Optimal Transport ( http://arxiv.org/abs/2307.10810v1 ) ライセンス: Link先を確認	Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth	(参考訳) 模倣学習(IL)は、専門家によるデモンストレーションを通じてエージェントに特定のタスクを教える。 ILの主要なアプローチの1つは、エージェントと専門家の間の距離を定義し、その距離を最小化するエージェントポリシーを見つけることである。エージェントと専門家の軌跡間の有意な距離を測定する手段を提供するため、模倣学習において最適輸送法が広く用いられている。しかしながら、複数の専門家によるデモを最適に組み合わせる方法については、広く研究されていない。標準的な方法は、状態(-アクション)軌跡を単純に連結することであり、これは軌跡がマルチモーダルである場合に問題となる。我々は代替手法として、マルチマージナル最適輸送距離を用いて、複数の多様な状態軌跡をOTの意味で組み合わせ、デモンストレーションのより妥当な幾何平均を与える方法を提案する。提案手法により，エージェントは複数の専門家から学習できる。その効率をOpenAI Gym制御環境上で解析し,標準手法が常に最適であるとは限らないことを示す。 Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts; we analyze its efficiency on OpenAI Gym control environments and show that the standard method is not always optimal.	翻訳日:2023-07-21 13:07:43 公開日:2023-07-20
# 関係時間異常検出を含むクラウドシステムの性能問題同定 Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection ( http://arxiv.org/abs/2307.10869v1 ) ライセンス: Link先を確認	Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang and Michael Lyu	(参考訳) パフォーマンス問題は、大規模なクラウドサービスシステムに浸透し、大きな収益損失につながる可能性がある。信頼性の高いパフォーマンスを保証するためには、サービス監視メトリクスを使用してこれらの問題を正確に識別し、ローカライズする必要がある。現代のクラウドシステムの複雑さと規模を考えると、このタスクは困難であり、個々の人間の能力を超えた幅広い専門知識とリソースを必要とする可能性がある。既存の手法では、各メトリックを独立して分析して異常を検出することでこの問題に対処している。しかし、これはエンジニアが手動で診断することが難しい圧倒的な警報嵐を引き起こす可能性がある。より良いパフォーマンスを追求するためには、メトリクスの時間的パターンだけでなく、メトリクス(リレーショナルパターン)間の相関も考慮し、多変量メトリクス異常検出問題として定式化する必要がある。しかし、ほとんどの研究はこれらの2種類の特徴を明示的に抽出するに足りていない。さらに、トレーニングデータ中にラベルのない異常が混在しており、検出性能を損なう可能性がある。これらの制約に対処するために,メトリクスの相関情報と時間情報を組み合わせた関係時間異常検出モデル(RTAnomaly)を提案する。 RTAnomalyは、メトリクス間の依存関係を学習するためにグラフアテンション層を使用し、異常を効果的に発生させる可能性のある異常メトリクスの特定をさらに助ける。さらに、ポジティブなラベルなし学習の概念を利用して、トレーニングデータの潜在的な異常の問題に対処する。提案手法を評価するため,公開データセットと2つの産業データセットを用いて実験を行った。 RTAnomaly は、平均 F1 スコア 0.929 と Hit@3 0.920 を達成し、その優位性を示している。 Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. 
However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational-Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which will further help pinpoint the anomalous metrics that may cause the anomaly effectively. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.	翻訳日:2023-07-21 13:02:34 公開日:2023-07-20
# FigCaps-HF:図からキャプションへの生成フレームワークと人間のフィードバックによるベンチマーク FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback ( http://arxiv.org/abs/2307.10867v1 ) ライセンス: Link先を確認	Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi	(参考訳) キャプションは科学的な視覚化や文書を理解するのに不可欠である。科学的な図に対する既存のキャプション手法は、学習のために文書から抽出された図とキャプションのペアに依存しているが、その多くは有用性、説明可能性、視覚的記述性([15])といった指標に関して不足しており、生成されるキャプションは読者の好みと一致しない。高品質な図キャプションの生成を可能にするため,読者の好みに最適化されたキャプションを生成する際にドメインエキスパートのフィードバックを組み込むことのできる,図キャプション生成のための新しいフレームワークFigCaps-HFを導入する。私たちのフレームワークは 1) 図とキャプションのペアの品質を評価する自動的な方法 2) 読者の好みに合わせて図からキャプションへの生成モデルを最適化する,人間のフィードバックを用いた新しい強化学習(RLHF)手法からなる。各種モデルの標準的な微調整よりも性能を向上させることで,この簡単な学習フレームワークの有効性を実証する。特にベースモデルとしてBLIPを使用する場合,我々のRLHFフレームワークは,ROUGE,BLEU,Meteorにおいてそれぞれ平均35.7%,16.9%,9%の利得を達成している。最後に,この問題に対するRLHF手法のさらなる評価と開発を可能にするために,図とキャプションのペアに対する人間のフィードバックを伴う大規模ベンチマークデータセットをリリースする。 Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15], leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. 
In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.	翻訳日:2023-07-21 13:02:06 公開日:2023-07-20
# 深部グラフを用いた神経持続の注意点 Addressing caveats of neural persistence with deep graph persistence ( http://arxiv.org/abs/2307.10865v1 ) ライセンス: Link先を確認	Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke	(参考訳) ニューラルパーシスタンス(Neural Persistence)は、ディープラーニングにおけるトポロジカルデータ分析の新たな分野において提案される、ニューラルネットワークの複雑性を定量化する重要な尺度である。しかし、本研究では、ネットワーク重みのばらつきと大きな重みの空間集中が神経の持続性に影響を与える主な要因であることを理論的および実証的に見出した。これは線形分類器の有用な情報をキャプチャする一方で、深層ニューラルネットワークの後の層には関連する空間構造が存在しておらず、ニューラルネットワークの永続性は重みの分散とほぼ同値である。さらに、ディープニューラルネットワークのための層間平均化手順は、層間の相互作用を考慮しない。そこで本研究では,1つの行列上でのニューラルネットワークの永続性を計算するのに等価である単一層ではなく,ニューラルネットワーク全体に対するニューラルネットワークの永続性に基づくフィルタリングの拡張を提案する。これは、ネットワークを通した永続的なパスを暗黙的に取り入れ、標準化を通じて分散に関連する問題を軽減します。コードはhttps://github.com/ExplainableML/Deep-Graph-Persistenceで入手できる。 Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence .	
翻訳日:2023-07-21 13:01:40 公開日:2023-07-20
# セマンティクス看護の改善のために注意を分割・結合する Divide & Bind Your Attention for Improved Generative Semantic Nursing ( http://arxiv.org/abs/2307.10864v1 ) ライセンス: Link先を確認	Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva	(参考訳) 大規模テキストから画像への生成モデル、例えばStable Diffusion (SD)は、高い忠実度で圧倒的な結果を示している。素晴らしい進歩にもかかわらず、現在の最先端モデルは入力プロンプトに完全に付着した画像を生成するのに依然として苦労している。 Attend & Exciteは、推論時間におけるクロスアテンションを最適化し、セマンティックスをよりうまく組み込むことを目的として、ジェネレーティブセマンティック・ナーシング(GSN)の概念を導入した。これは単純なプロンプト、例えば「a cat and a dog」を生成する上で有望な結果を示す。しかし、その有効性はより複雑なプロンプトを扱う際に低下し、不適切な属性結合の問題に明示的に対処しない。複雑なプロンプトや複数のエンティティを含むシナリオによって生じる課題に対処し、属性バインディングの改善を実現するため、Divide & Bindを提案する。 GSNの新たな損失目標として,新規の出席損失と結合損失の2つを紹介する。提案手法は、複雑なプロンプトからの属性アライメントを改善した所望のオブジェクトを忠実に合成し、複数の評価ベンチマークで優れた性能を示す。さらなるビデオと更新はプロジェクトページ https://sites.google.com/view/divide-and-bind で見ることができる。 Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. 
More videos and updates can be found on the project page https://sites.google.com/view/divide-and-bind.	翻訳日:2023-07-21 13:01:20 公開日:2023-07-20
# BlendFace: フェイススワッピングのためのアイデンティティエンコーダの再設計 BlendFace: Re-designing Identity Encoders for Face-Swapping ( http://arxiv.org/abs/2307.10854v1 ) ライセンス: Link先を確認	Kaede Shiohara, Xingchao Yang, Takafumi Taketomi	(参考訳) コンピュータビジョンにおける生成的敵ネットワークと顔認識モデルの大きな進歩により、単一のソースの画像のアイデンティティを交換できるようになった。多くの研究でほぼ満足な解が提案されたように思われるが、広く使われているアイデンティティエンコーダ(例えばArcFace)は、顔認識タスクの事前訓練によっていくつかの重要な属性バイアスを持つため、いまだに不要な属性スワッピングを引き起こすアイデンティティ属性エンタングルメントに悩まされている。この問題に対処するために、顔スワッピングのための新しいIDエンコーダであるBlendFaceを設計する。 BlendFaceの背景にある重要なアイデアは、属性を別のイメージのものに置き換えたブレンドイメージで顔認識モデルをトレーニングすることで、ヘアスタイルのような対人バイアスを緩和することだ。 BlendFaceは分離されたID特徴をジェネレータに供給し、ID損失関数としてジェネレータを適切に誘導する。大規模な実験により、BlendFaceはフェイススワッピングモデルにおけるID-属性の分離を改善し、従来の方法と同等の定量的性能を維持することが示されている。 The great advancements of generative adversarial networks and face recognition models in computer vision have made it possible to swap identities on images from single sources. Although a lot of studies seem to have proposed almost satisfactory solutions, we notice previous methods still suffer from an identity-attribute entanglement that causes undesired attribute swapping because widely used identity encoders, e.g., ArcFace, have some crucial attribute biases owing to their pretraining on face recognition tasks. To address this issue, we design BlendFace, a novel identity encoder for face-swapping. The key idea behind BlendFace is that training face recognition models on blended images whose attributes are replaced with those of another mitigates inter-personal biases such as hairstyles. BlendFace feeds disentangled identity features into generators and guides generators properly as an identity loss function. Extensive experiments demonstrate that BlendFace improves the identity-attribute disentanglement in face-swapping models, maintaining a comparable quantitative performance to previous methods.	翻訳日:2023-07-21 13:00:56 公開日:2023-07-20
# 弱教師付き変化検出のための効果的な事前及び効率的なモデル探索 Exploring Effective Priors and Efficient Models for Weakly-Supervised Change Detection ( http://arxiv.org/abs/2307.10853v1 ) ライセンス: Link先を確認	Zhenghui Zhao, Lixiang Ru, Chen Wu	(参考訳) Weakly-supervised change detection (WSCD)は、画像レベルのアノテーションだけでピクセルレベルの変更を検出することを目的としている。ラベルの効率のため、WSCDは最近注目を集めている。しかし、現在のWSCDメソッドは、画像レベルのアノテーションとピクセルレベルの予測の不整合など、変更の欠如と製造の難しさにしばしば遭遇する。特に、変化の欠如は、画像レベルのラベルが変化しているにもかかわらず、WSCDモデルが変化したピクセルを予測できない状況と、その逆は変化の作り方である。この課題に対処するため、WSCDにおけるグローバルスケールおよびローカルスケールの事前処理を活用し、Dilated Prior(DP)デコーダとLabel Gated(LG)制約という2つのコンポーネントを提案する。 DPデコーダは、変更された画像レベルラベルでサンプルをデコードし、変更されていないラベルでサンプルをスキップし、すべて変更されていないピクセルレベルラベルで置き換える。 LGの制約は、変化した表現と画像レベルのラベルの対応から派生し、変化状態の誤予測時にモデルをペナルティ化する。さらに,変更検出における弱教師付き学習の可能性を示す,シンプルながら強力なトランスフォーマーベースモデルであるTransWCDを開発した。 DPデコーダとLG制約をTransWCDに統合することにより、TransWCD-DLを形成する。提案したTransWCDとTransWCD-DLは,WHU-CDデータセットの最先端手法に対して,それぞれ有意な+6.33%,+9.55%のF1スコアを達成している。いくつかのパフォーマンス指標は、FSCD(Full-supervised Change Detection)の競合よりも多い。コードはhttps://github.com/zhenghuizhao/TransWCDで入手できる。 Weakly-supervised change detection (WSCD) aims to detect pixel-level changes with only image-level annotations. Owing to its label efficiency, WSCD has recently drawn increasing attention. However, current WSCD methods often encounter the challenge of change missing and fabricating, i.e., the inconsistency between image-level annotations and pixel-level predictions. Specifically, change missing refers to the situation in which the WSCD model fails to predict any changed pixels, even though the image-level label indicates a change, and vice versa for change fabricating. To address this challenge, in this work, we leverage global-scale and local-scale priors in WSCD and propose two components: a Dilated Prior (DP) decoder and a Label Gated (LG) constraint. The DP decoder decodes samples with the changed image-level label, skips samples with the unchanged label, and replaces them with an all-unchanged pixel-level label. 
The LG constraint is derived from the correspondence between changed representations and image-level labels, penalizing the model when it mispredicts the change status. Additionally, we develop TransWCD, a simple yet powerful transformer-based model, showcasing the potential of weakly-supervised learning in change detection. By integrating the DP decoder and LG constraint into TransWCD, we form TransWCD-DL. Our proposed TransWCD and TransWCD-DL achieve significant +6.33% and +9.55% F1 score improvements over the state-of-the-art methods on the WHU-CD dataset, respectively. Some performance metrics even exceed several fully-supervised change detection (FSCD) competitors. Code will be available at https://github.com/zhenghuizhao/TransWCD.	翻訳日:2023-07-21 13:00:35 公開日:2023-07-20
# 絡み合いに基づく到達可能性計画を用いたゴールコンディション強化学習 Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning ( http://arxiv.org/abs/2307.10846v1 ) ライセンス: Link先を確認	Zhifeng Qian and Mingyu You and Hongjun Zhou and Xuanhui Xu and Bin He	(参考訳) 目標条件強化学習(gcrl)は、エージェントが様々な目標を自発的に設定してスキルのセットを学ぶことを可能にする。様々な分野で提案された優れた成果にもかかわらず、時間的に拡張されたタスクで遠い目標に達することは、GCRLにとって課題である。現在の作業では、計画アルゴリズムを利用して中間部分ゴールを計画し、GCRLを増強することでこの問題に対処している。彼らの方法には2つの重要な要件が必要です (i)有効なサブゴールを検索する状態表現空間、及び (ii)サブゴールの到達可能性を測定する距離関数。しかし、彼らは非コンパクトな表現のために高次元の状態空間にスケールするのに苦労する。さらに、標準GCポリシを通じて高品質なトレーニングデータを収集できないため、不正確な距離関数が生じる。どちらも計画と政策学習の効率と性能に影響する。本稿では,目標条件付きrlアルゴリズムと異方性に基づく到達可能性計画(replan)を組み合わせた時間的拡張タスクの解法を提案する。再計画において, ロボットの姿勢と物体の位置を自己教師ありで観察するコンパクト表現を学習するために, drm(disentangled representation module)が提案されている。単純なReachability discrimination Module (REM) も、サブゴールの時間的距離を決定するように設計されている。さらに、REMは固有のボーナスを計算して、トレーニングのための新しい状態の収集を促進する。我々は3つの視覚に基づくシミュレーションタスクと1つの現実世界タスクでREPlanを評価した。実験の結果,REPlanは時間的に拡張されたタスクを解く上で,従来の最先端手法よりも大幅に優れていた。 Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. 
In this paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.	翻訳日:2023-07-21 13:00:08 公開日:2023-07-20
# 連続学習のための自己ペース重み統合 Self-paced Weight Consolidation for Continual Learning ( http://arxiv.org/abs/2307.10845v1 ) ライセンス: Link先を確認	Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong	(参考訳) 新しいタスクのパラメータを以前のタスクに近く保持する連続学習アルゴリズムは、シーケンシャルなタスク学習設定における破滅的な忘れの防止に人気がある。しかし、 1) 新たな継続学習者の業績は,以前に学習した課題の貢献を区別することなく劣化する。 2) 既存のアルゴリズムでは,新しいタスクを学習する際には,全てのタスクを正規化する必要があるため,タスク数とともに計算コストが大幅に向上する。上記の課題に対処するために,従来の課題の判別的貢献を評価することによって,堅牢な連続学習を実現するための自己ペース重み統合(spwc)フレームワークを提案する。具体的には,重要性能指標(精度)に基づく難易度を測定することで,過去のタスクの優先順位を反映した自己対応型正規化を開発する。新しいタスクに遭遇すると、すべてのタスクは優先順位に基づいて"difficult"から"easy"にソートされる。すると、新しい連続学習者のパラメータは、より困難な過去のタスクの知識を選択的に維持することで学習される。我々は,bi-convex形式におけるモデルパラメータと優先度重みを反復的に更新するために,代替凸探索を採用する。提案したspWCフレームワークはプラグイン・アンド・プレイであり、ほとんどの連続学習アルゴリズム(例えばEWC、MAS、RCIL)に異なる方向(例えば分類とセグメンテーション)で適用することができる。いくつかの公開ベンチマークデータセットの実験結果から,提案するフレームワークは,他の一般的な連続学習アルゴリズムと比較して,性能を効果的に向上できることが示された。 Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. 
Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternate convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play and is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.	翻訳日:2023-07-21 12:59:44 公開日:2023-07-20
# U-Net Convolutional LSTMアーキテクチャによるGPM用統合マルチサテライトE検索のグローバル化 Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture ( http://arxiv.org/abs/2307.10843v1 ) ライセンス: Link先を確認	Reyhaneh Rahimi, Ardeshir Ebtehaj, Ali Behrangi, Jackson Tan	(参考訳) 本稿では,30分毎の降水量を4時間のリードタイムでほぼ全世界的に予測する深層学習アーキテクチャを提案する。このアーキテクチャは、U-Netと畳み込みLSTM(convolutional long short-term memory)ニューラルネットワークを融合させ、GPM(IMERG)用のIntegrated MultisatellitE Retrievalsのデータと、Global Forecast System(GFS)のいくつかの主要な降水ドライバを使用してトレーニングされる。平均二乗誤差 (regression) と焦点損失 (classification) を含む異なるトレーニング損失関数が降水流の質に及ぼす影響について検討した。その結果, 回帰ネットワークは光降水量(1.6mm/hr以下)を捕捉するのに有効であるが, 分類ネットワークは, 臨界成功指数 (CSI) の観点から, 降水極値 (>8mm/hr) を現在キャスティングする回帰ネットワークよりも優れることがわかった。ワッサースタイン距離を用いて,分類ネットワークによって予測される降水は回帰ネットワークよりもIMERGに密接なクラス確率分布を持つことを示した。物理変数を組み込むことで、特に両ネットワークのリードタイムが長くなると、降雨のノキャスティングを改善できることが判明した。 IMERGを相対的な基準として、分数スキルスコア(FSS)のマルチスケール分析を行い、GFSの50kmに比べて10kmの解像度で流し込み機(FSS > 0.5)が熟練していることを示した。 4 mm/hr以上の降水量では、2時間のリードタイムで50km以上のスケールでFSSに熟練している。 This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated MultisatellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal-loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr), in terms of the critical success index (CSI). 
Using the Wasserstein distance, it is shown that the precipitation predicted by the classification network has a class probability distribution closer to that of IMERG than the regression network does. It is also found that including the physical variables can improve precipitation nowcasting in both networks, especially at longer lead times. Taking IMERG as a relative reference, a multi-scale analysis in terms of the fractions skill score (FSS) shows that the nowcasting machine remains skillful (FSS > 0.5) at a resolution of 10 km, compared to 50 km for GFS. For precipitation rates greater than 4 mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.	翻訳日:2023-07-21 12:59:15 公開日:2023-07-20
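The critical success index (CSI) used above to compare the two networks can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the strict `>` thresholding, and the sample rain rates are hypothetical and are not taken from the paper.

```python
def critical_success_index(observed, predicted, threshold):
    """CSI = hits / (hits + misses + false alarms) for events above `threshold`."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
    total = hits + misses + false_alarms
    # CSI is undefined when the event is neither observed nor predicted.
    return hits / total if total else float("nan")


# Hypothetical rain rates in mm/hr (illustrative values, not data from the paper).
observed = [0.0, 9.5, 12.0, 3.0, 8.5]
predicted = [0.0, 10.1, 6.0, 9.0, 9.2]
csi_extreme = critical_success_index(observed, predicted, threshold=8.0)
print(csi_extreme)  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

Correct negatives do not enter the score, which is why CSI is commonly preferred for rare events such as the >8 mm/hr extremes.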
# ドメインシフト下におけるセマンティックセグメンテーションのためのラベル校正 Label Calibration for Semantic Segmentation Under Domain Shift ( http://arxiv.org/abs/2307.10842v1 ) ライセンス: Link先を確認	Ondrej Bohdal, Da Li, Timothy Hospedales	(参考訳) 事前訓練されたセマンティックセグメンテーションモデルの性能は、新しいドメインのデータを大幅に低下させる可能性がある。予測されたクラス確率を持つベクトルに最も近いプロトタイプに従って予測を行うことにより,事前学習したモデルを,ソフトラベルのプロトタイプを領域シフトで計算し,ラベル付き対象領域データに適用できることを示す。提案した適応手順は高速で、計算資源の面ではほとんど無料で提供され、大幅な性能向上をもたらす。このようなラベル校正の利点を,高度に実践的な合成から現実への意味的セグメンテーション問題に示す。 Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.	翻訳日:2023-07-21 12:58:41 公開日:2023-07-20
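The nearest-prototype prediction described in this abstract could be sketched as below. This is only an assumption-laden illustration: grouping target pixels by their argmax pseudo-label, averaging their probability vectors, and using Euclidean distance are our choices, and the function names are hypothetical. The abstract does not specify these details.

```python
import math


def soft_label_prototypes(probs, num_classes):
    """Average the predicted probability vectors assigned (by argmax) to each class."""
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for p in probs:
        c = max(range(num_classes), key=lambda k: p[k])
        counts[c] += 1
        for k in range(num_classes):
            sums[c][k] += p[k]
    prototypes = []
    for c in range(num_classes):
        if counts[c]:
            prototypes.append([s / counts[c] for s in sums[c]])
        else:
            # Assumption: fall back to a one-hot vector for classes never predicted.
            prototypes.append([1.0 if k == c else 0.0 for k in range(num_classes)])
    return prototypes


def calibrated_prediction(p, prototypes):
    """Return the class whose prototype is closest (Euclidean) to probability vector p."""
    return min(range(len(prototypes)), key=lambda c: math.dist(p, prototypes[c]))


# Hypothetical per-pixel class probabilities on unlabelled target-domain data.
target_probs = [[0.9, 0.1], [0.8, 0.2], [0.55, 0.45], [0.4, 0.6], [0.45, 0.55]]
prototypes = soft_label_prototypes(target_probs, num_classes=2)
label = calibrated_prediction([0.55, 0.45], prototypes)
print(label)  # argmax would say class 0; the nearest prototype is class 1
```

In this toy case the shifted model is over-confident in class 0, so the class-0 prototype sits near (0.75, 0.25) and a borderline vector is reassigned to class 1.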
# 多視点自己監督学習におけるエントロピーと再構成の役割 The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning ( http://arxiv.org/abs/2307.10907v1 ) ライセンス: Link先を確認	Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella	(参考訳) 多視点自己教師学習(MVSSL)の成功のメカニズムはまだ完全には理解されていない。対照的にMVSSL法は相互情報(MI)の下位境界であるInfoNCEのレンズを用いて研究されている。しかし、他のMVSSLメソッドとMIとの関係は未だ不明である。我々は、エントロピーと再構成項(ER)からなるMI上の異なる下界を考察し、そのレンズを通して主MVSSLファミリーを分析する。このER境界を通して、DeepClusterやSwAVといったクラスタリングベースの手法がMIを最大化することを示す。また,BYOLやDINOといった蒸留法に基づく手法のメカニズムを再解釈し,再構成項を明示的に最大化し,安定エントロピーを暗黙的に促進することを示した。本研究では, 一般的なMVSSL法をER境界に置き換えることで, より小さいバッチサイズあるいはより小さい指数移動平均(EMA)係数でトレーニングした場合に, 安定した性能が得られることを示す。 Github repo: https://github.com/apple/ml-entropy-reconstruction The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. Github repo: https://github.com/apple/ml-entropy-reconstruction.	翻訳日:2023-07-21 12:51:05 公開日:2023-07-20
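One standard way to write a lower bound on the MI that splits into an entropy term and a reconstruction term is shown below. The notation is an assumption made for illustration ($Z_1$, $Z_2$ for the representations of two views, $q$ for an arbitrary variational conditional); the abstract itself does not give the formula.

```latex
I(Z_1; Z_2) = H(Z_2) - H(Z_2 \mid Z_1)
            \ge \underbrace{H(Z_2)}_{\text{entropy}}
              + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
```

The inequality holds because $\mathrm{KL}\big(p(z_2 \mid z_1) \,\|\, q(z_2 \mid z_1)\big) \ge 0$, so the cross-entropy $-\mathbb{E}[\log q(z_2 \mid z_1)]$ upper-bounds $H(Z_2 \mid Z_1)$.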
# VoteLab: オンライン集団意思決定のためのモジュラーで適応的な実験プラットフォーム VoteLab: A Modular and Adaptive Experimentation Platform for Online Collective Decision Making ( http://arxiv.org/abs/2307.10903v1 ) ライセンス: Link先を確認	Renato Kunz, Fatemeh Banaie, Abhinav Sharma, Carina I. Hausladen, Dirk Helbing, Evangelos Pournaras	(参考訳) デジタル民主主義と直接デジタル参加のための新しい形態は前例のない勢いを得ている。これは特に、市民集会、参加予算、選挙において公平で包括的で正当な集団的意思決定プロセスを促進するために設計された優先的な投票方法と意思決定支援システムの場合である。しかし、異なる投票方法を用いた体系的な人間実験は面倒で費用がかかる。本稿では,投票実験のモジュール化と適応設計のためのオープンソースかつ徹底的な文書化プラットフォームであるVoteLabを紹介する。これは、異なる投票方法を選択することで、再利用可能なキャンペーンを視覚的にインタラクティブに構築することをサポートし、投票者はスマートフォンで登録された投票質問に簡単に答えることができる。オンライン実験では、投票結果の整合性を調べるために、4つの投票方法と、COVID-19に関する質問を含む概念実証が使用されている。 VoteLabが複雑な投票シナリオの厳格な実験をサポートする能力を示している。 Digital democracy and new forms for direct digital participation in policy making are gaining unprecedented momentum. This is particularly the case for preferential voting methods and decision-support systems designed to promote fairer, more inclusive and legitimate collective decision-making processes in citizens' assemblies, participatory budgeting and elections. However, systematic human experimentation with different voting methods is cumbersome and costly. This paper introduces VoteLab, an open-source and thoroughly-documented platform for modular and adaptive design of voting experiments. It supports visually and interactively building reusable campaigns with a choice of different voting methods, while voters can easily respond to subscribed voting questions on a smartphone. A proof-of-concept with four voting methods and questions on COVID-19 in an online lab experiment has been used to study the consistency of voting outcomes. It demonstrates the capability of VoteLab to support rigorous experimentation with complex voting scenarios.	翻訳日:2023-07-21 12:50:45 公開日:2023-07-20
# 歯科模型における変分点符号化変形 Variational Point Encoding Deformation for Dental Modeling ( http://arxiv.org/abs/2307.10895v1 ) ライセンス: Link先を確認	Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg	(参考訳) 近年,デジタル歯科は大きな進歩を遂げているが,多くの課題が残されている。本研究では,歯のメッシュの広範なデータセットを新たに公開し,さらなる研究を奨励する。さらに、FoldingNetを拡張して、ポイントクラウド表現の確率的学習を可能にする変分FoldingNet(VF-Net)を提案する。ポイントクラウドの既存の潜在変数モデルにおける重要な課題は、入力点と出力点の間の1対1のマッピングがないことである。代わりに、正規化された分布の対応を持たない計量であるチャムファー距離の最適化に頼らなければならず、確率モデルにおけるその使用を妨げている。確率的拡張を簡素化しながら計算効率を向上させるため,チャムファー距離の明示的な最小化を適切なエンコーダに置き換えることができることを示す。以上の結果から,VF-Netが既存モデルよりも優れていることを示す実証的証拠が得られた。さらに,VF-Netの潜在表現の堅牢性についても検討した。これらの結果は、ポイントクラウドの再構築と分析のための効果的で信頼性の高い方法としてのVF-Netの有望な展望を裏付けるものである。 Digital dentistry has made significant advancements in recent years, yet numerous challenges remain to be addressed. In this study, we release a new extensive dataset of tooth meshes to encourage further research. Additionally, we propose Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations. A key challenge in existing latent variable models for point clouds is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension. Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.	
翻訳日:2023-07-21 12:50:31 公開日:2023-07-20
# 人間の運動生成:調査 Human Motion Generation: A Survey ( http://arxiv.org/abs/2307.10894v1 ) ライセンス: Link先を確認	Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, and Yizhou Wang	(参考訳) 人間の動き生成は、自然の人間のポーズシーケンスを生成し、現実世界の応用に大きな可能性を示す。近年,動きデータ収集技術や生成手法が進歩し,人間の動き生成への関心が高まっている。この分野のほとんどの研究は、テキスト、オーディオ、シーンコンテキストなどの条件信号に基づいて人間の動きを生成することに焦点を当てている。近年は顕著な進歩を遂げているが、人間の動きの複雑な性質と条件付き信号との暗黙的な関係により、課題が続いている。本稿では,人間の運動生成に関する総合的な文献レビューを行う。まず、人間の動作と生成モデルの背景を紹介し、続いて、テキストコンディショニング、オーディオコンディショニング、シーンコンディショニングの3つのメインストリームサブタスクの代表的な手法について検討する。さらに,共通データセットと評価指標の概要について述べる。最後に、オープンな問題について議論し、今後の研究の方向性について概説する。この調査がコミュニティに,この急速に発展する分野の包括的可視化を提供し,優れた課題に対処する新たなアイデアを刺激してくれることを願っています。 Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. 
We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.	翻訳日:2023-07-21 12:50:13 公開日:2023-07-20
# シミュレーションメタモデリングにおける学習と一般化 Learning and Generalizing Polynomials in Simulation Metamodeling ( http://arxiv.org/abs/2307.10892v1 ) ライセンス: Link先を確認	Jesper Hauch, Christoffer Riis, Francisco C. Pereira	(参考訳) 多項式を学習し、分配を一般化する能力は、時間ステップの更新が多項式によって記述される工学の多くの分野におけるシミュレーションメタモデルにとって不可欠である。フィードフォワードニューラルネットワークは任意の関数に適合するが、高階多項式の分散の一般化はできない。そこで本研究では,高次多項式を近似するための再帰的ビルディングブロックとして使用される乗法ニューラルネットワーク(MNN)アーキテクチャを収集し,提案する。実験の結果、mnnは一般化時のベースラインモデルよりも優れており、検証のパフォーマンスは分散テストの性能に当てはまることがわかった。 MNNアーキテクチャに加えて,多項式時間ステップ更新を伴うシミュレーションに対して,シミュレーションメタモデリング手法を提案する。これらのシミュレーションでは、ステップサイズを増加させることで、時間間隔のシミュレーションを少ないステップで行うことができる。本手法は多項式時間ステップ更新を伴う任意のシミュレーションと互換性があるが, 疫学シミュレーションモデルを用いて, 高次多項式の学習と一般化のためのmnnの帰納的バイアスを示す。 The ability to learn polynomials and generalize out-of-distribution is essential for simulation metamodels in many disciplines of engineering, where the time step updates are described by polynomials. While feed forward neural networks can fit any function, they cannot generalize out-of-distribution for higher-order polynomials. Therefore, this paper collects and proposes multiplicative neural network (MNN) architectures that are used as recursive building blocks for approximating higher-order polynomials. Our experiments show that MNNs are better than baseline models at generalizing, and their performance in validation is true to their performance in out-of-distribution tests. In addition to MNN architectures, a simulation metamodeling approach is proposed for simulations with polynomial time step updates. For these simulations, simulating a time interval can be performed in fewer steps by increasing the step size, which entails approximating higher-order polynomials. While our approach is compatible with any simulation with polynomial time step updates, a demonstration is shown for an epidemiology simulation model, which also shows the inductive bias in MNNs for learning and generalizing higher-order polynomials.	翻訳日:2023-07-21 12:49:54 公開日:2023-07-20
# 構文対意味線形抽象化とニューラルネットワークの洗練 Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks ( http://arxiv.org/abs/2307.10891v1 ) ライセンス: Link先を確認	Calvin Chau, Jan Křetínský, Stefanie Mohr	(参考訳) 抽象化はスケーラビリティを改善するための重要な検証テクニックです。しかし、ニューラルネットワークへの利用は非常に限られている。分類ネットワークを抽象化するための従来のアプローチは、いくつかのニューロンをその1つに置き換える。類似性は(ニューロン間の接続量を用いて)構文的に定義するか(様々な入力に対するニューロンの活性化値に基づいて)意味的に分類することができる。残念なことに、以前のアプローチは、実装時にのみ適度な削減を達成している。本研究では、ニューロンを他のニューロンの線形結合体に置き換えることのできる、より柔軟な枠組みを提供する。このアプローチを構文抽象と意味抽象の両方に適用し,それらを実験的に実装し,評価する。さらに, 抽象化の精細化手法を導入し, 縮小と精度のバランスを良くする手法を提案する。 Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.	翻訳日:2023-07-21 12:49:35 publish日:2023-07-20
# 競争市場における帯域学習のためのプレイヤー最適安定レグレット Player-optimal Stable Regret for Bandit Learning in Matching Markets ( http://arxiv.org/abs/2307.10890v1 ) ライセンス: Link先を確認	Fang Kong, Shuai Li	(参考訳) 市場マッチングの問題は、その適用範囲が多岐にわたることから、長い間文献で研究されてきた。安定マッチングを見つけることは、この問題における共通の均衡目標である。市場参加者は通常自分の好みについて不確実であるため、最近のリッチな作品のラインは、一方の参加者(プレイヤー)が他方(腕)との反復的な相互作用から未知の好みを学ぶオンライン設定を研究している。このシリーズの以前の作品の多くは、プレイヤーの最小予測の安定マッチングと比較して定義されるプレイヤー・ペシムの安定な後悔に対する理論的保証のみを導出することができる。しかし、悲観的安定マッチングの下では、プレイヤーは全ての安定マッチングの中で最小の報酬しか得られない。プレイヤーの利益を最大化するために、プレイヤー・オプティマイズ・マッチが最も望ましい。 \citet{basu21beyond} はプレイヤー最適の安定な後悔に対する上限をもたらすが、プレイヤーの好みの差が小さい場合には指数関数的に大きい。この後悔に対する多項式保証が存在するかどうかは重要な問題であるが、まだ未解決の問題である。本研究は,探索-テーマ-ゲイル-シャプリー (ETGS) という新しいアルゴリズムを提供し,各プレイヤーの最適な安定な後悔は,$O(K\log T/\Delta^2)$,$K$は武器数,$T$は地平線,$\Delta$は最初の$N+1$ランクのアーム間のプレイヤーの最小の選好ギャップであることを示す。この結果は、より弱いプレイヤー・ペシムの安定的目標を持つか、特別な仮定を持つ市場のみに適用する以前の作品を大幅に改善する。参加者の嗜好がいくつかの特別な条件を満たすとき、我々の後悔の上界も以前導出された下界と一致する。 The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, player-optimal stable matching would be the most desirable. Though \citet{basu21beyond} successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players' preference gap is small. 
Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$ where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$-ranked arms. This result significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.	翻訳日:2023-07-21 12:49:22 公開日:2023-07-20
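The name explore-then-Gale-Shapley suggests a commit phase that runs player-proposing deferred acceptance on the preferences estimated during exploration. The sketch below covers only that classical Gale-Shapley step, assuming complete preference lists and no more players than arms. The helper `ranking_from_estimates` and the example data are hypothetical, and the confidence-interval and coordination details of ETGS are not reproduced here.

```python
def ranking_from_estimates(mean_rewards):
    """Rank arms by estimated mean reward, best first (stand-in for the exploration output)."""
    return sorted(mean_rewards, key=mean_rewards.get, reverse=True)


def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance; returns a dict mapping player -> arm.

    player_prefs[p] lists arms, most preferred first.
    arm_prefs[a] lists players, most preferred first.
    """
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # index of the next arm p will propose to
    holder = {}                                 # arm -> player it currently holds
    free = list(player_prefs)
    while free:
        p = free.pop()
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holder.get(a)
        if current is None:
            holder[a] = p
        elif rank[a][p] < rank[a][current]:
            holder[a] = p          # the arm prefers the new proposer
            free.append(current)
        else:
            free.append(p)         # rejected; p proposes to its next arm later
    return {p: a for a, p in holder.items()}


# Hypothetical market in which the player-optimal and arm-optimal matchings differ.
player_prefs = {
    "p1": ranking_from_estimates({"a1": 0.7, "a2": 0.4}),
    "p2": ranking_from_estimates({"a1": 0.3, "a2": 0.8}),
}
arm_prefs = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
matching = gale_shapley(player_prefs, arm_prefs)
print(matching)  # each player receives its top-ranked arm
```

Because players propose, the result is the stable matching every player weakly prefers to any other stable matching, which is the benchmark used by the player-optimal stable regret.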
# ロバスト点雲分類におけるリスク最適化外乱除去 Risk-optimized Outlier Removal for Robust Point Cloud Classification ( http://arxiv.org/abs/2307.10875v1 ) ライセンス: Link先を確認	Xinke Li, Junchi Lu	(参考訳) 安全クリティカルな目的のためのポイントクラウドディープモデルの人気は高まっているが、これらのモデルの信頼性とセキュリティは、意図的または自然に発生するポイントクラウドノイズによって損なわれる可能性がある。この問題に対処するために,標準学習モデルに付加的なアウトレイラを排除し,データを復元するPointCVaRと呼ばれる新しいポイントクラウド・アウトレイラ除去手法を提案する。我々のアプローチは、各点がモデル出力に与える影響を決定するために帰属分析を行うことから始まり、それがポイントリスク(point risk)と呼ばれる。次に,リスク条件値(CVaR)を目的とする高リスク点のフィルタリング処理を最適化する。このアプローチの理論的根拠は、点雲のノイズポイントがリスク分布の尾に集結する傾向にあり、低頻度であるが高いレベルのリスクを持つため、分類結果にかなりの干渉が生じるという観察に基づいている。追加の訓練は必要とせず, ノイズノイズ, 逆方向ノイズ, バックドアトリガノイズによって劣化するノイズ点群に対する様々な除去・分類実験において, 例外的な結果が得られた。驚くべきことに、トリガーを取り除くことで、バックドア攻撃に対する防御精度が87%向上した。全体として、提案するpointcvarは、ノイズポイントを効果的に排除し、ポイントクラウドの分類を強化し、さまざまなシナリオにおいて、さまざまなモデルに対して有望なプラグインモジュールとなる。 The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. 
Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.	翻訳日:2023-07-21 12:48:49 公開日:2023-07-20
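To make the tail-risk idea concrete, here is a minimal sketch of an empirical Conditional Value at Risk over per-point risk scores, followed by a naive filter that drops the points in that tail. This is not the paper's optimization procedure: the function names, the `tail_frac` parameter, and the assumption that risk scores are already available (for example, from an attribution method) are all illustrative.

```python
def cvar(risks, tail_frac=0.1):
    """Empirical CVaR: mean of roughly the worst `tail_frac` fraction of risk scores."""
    ordered = sorted(risks, reverse=True)
    k = max(1, round(tail_frac * len(ordered)))  # number of points in the tail
    return sum(ordered[:k]) / k


def drop_high_risk(points, risks, tail_frac=0.1):
    """Return the points that are NOT among the `tail_frac` highest-risk ones."""
    k = max(1, round(tail_frac * len(risks)))
    by_risk = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    tail = set(by_risk[:k])
    return [p for i, p in enumerate(points) if i not in tail]


# Ten points, two of which (indices 3 and 7) carry outsized risk.
risks = [0.1, 0.2, 0.1, 5.0, 0.3, 0.2, 0.1, 4.0, 0.2, 0.1]
tail_risk = cvar(risks, tail_frac=0.2)                      # → 4.5
kept = drop_high_risk(list(range(10)), risks, tail_frac=0.2)  # → [0, 1, 2, 4, 5, 6, 8, 9]
```

The example mirrors the observation in the abstract: a few points sit far out in the tail of the risk distribution, so the tail mean (4.5) is far above the typical score.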
# 自動車シナリオにおける安全軌道に対する動的物体の知覚関連性の保守的推定 Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios ( http://arxiv.org/abs/2307.10873v1 ) ライセンス: Link先を確認	Ken Mori, Kai Storms, Steven Peters	(参考訳) 効率的なテスト戦略を持つことは、自動運転のリリースにおいて克服すべき課題である。これは明確な要件とテストに適した方法を必要とする。この研究において、知覚モジュールの要件は、関連性に関して考慮される。関連性の概念はいまだ十分に定義されていない。本稿では,ハイウェイ領域における衝突安全への模範的適用により,この課題を克服する新しい手法を提案する。この一般的なシステムとユースケース仕様を用いて、関連する概念を導出する。したがって、無関係なオブジェクトは、すべての不確実性を考慮して、エゴ車両で利用可能な安全なアクションのセットを制限することができないオブジェクトとして定義される。最初のステップでは、衝突の関連性に関してユースケースを機能シナリオに分解します。それぞれの機能シナリオにおいて、ego 車両と他の動的物体の両方の可能な動作は方程式として定式化される。この可能なアクションのセットは、トラフィックルールによって制約され、関連性基準が得られます。その結果,動的対象が知覚に関連し,完全な評価を行う必要があるという保守的な評価が得られた。この推定は、オフラインテストや知覚コンポーネントの検証に適用可能な要件を提供する。高次元データセットの例を視覚化し、結果の妥当性を示す。最後に,提案する妥当性概念の今後の検証の可能性について概説する。 Having efficient testing strategies is a core challenge that needs to be overcome for the release of automated driving. This necessitates clear requirements as well as suitable methods for testing. In this work, the requirements for perception modules are considered with respect to relevance. The concept of relevance currently remains insufficiently defined and specified. In this paper, we propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain. Using this general system and use case specification, a corresponding concept for relevance is derived. Irrelevant objects are thus defined as objects which do not limit the set of safe actions available to the ego vehicle under consideration of all uncertainties. As an initial step, the use case is decomposed into functional scenarios with respect to collision relevance. For each functional scenario, possible actions of both the ego vehicle and any other dynamic object are formalized as equations. This set of possible actions is constrained by traffic rules, yielding relevance criteria. 
As a result, we present a conservative estimation of which dynamic objects are relevant for perception and need to be considered for a complete evaluation. The estimation provides requirements that are applicable to offline testing and validation of perception components. A visualization is presented for examples from the highD dataset, showing the plausibility of the results. Finally, a possibility for a future validation of the presented relevance concept is outlined.	翻訳日:2023-07-21 12:48:23 公開日:2023-07-20
# 非線形メタラーニングは速い速度を保証できる Nonlinear Meta-Learning Can Guarantee Faster Rates ( http://arxiv.org/abs/2307.10870v1 ) ライセンス: Link先を確認	Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe	(参考訳) 近年のemph{meta-learning}に関する多くの理論的研究は、類似した表象構造を目的タスクから簡易化するための保証を達成することを目的としている。重要なのは、理論の主要な目的は、共通表現の学習において、収束率が、タスク数(およびタスク当たりのサンプル数)とともに、\emph{may scale with the number $n$ of tasks} の程度を理解することである。この設定の最初のステップは、タスク間の共有表現とタスク固有の回帰関数の両方が線形であるときにこの特性を示す。この線形設定は、例えば平均的な引数を通じてタスクを集約する利点をすぐに明らかにする。しかし実際には、表現はしばしば非常に非線形であり、線形の場合のように容易に評価できない各タスクに非自明なバイアスを導入する。本研究では,非線形表現を用いたメタラーニングの理論的保証を導出する。特に、共有非線形性写像を無限次元 RKHS に仮定すると、タスク固有回帰関数の滑らかさを利用する注意的な正則化により、さらなるバイアスを緩和できることが示される。 Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- \emph{may scale with the number $N$ of tasks} (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions,	翻訳日:2023-07-21 12:48:05 公開日:2023-07-20
# 点雲変形ネットワークを用いた三次元心収縮と緩和のモデリング Modeling 3D cardiac contraction and relaxation with point cloud deformation networks ( http://arxiv.org/abs/2307.10927v1 ) ライセンス: Link先を確認	Marcel Beetz, Abhirup Banerjee, Vicente Grau	(参考訳) 射出率のような臨床で一般的に用いられる心機能のグローバルな単価バイオマーカーは、真の3d心臓変形過程に関する限られた洞察を与え、健康的および病理学的心臓力学の両方の理解を制限している。本研究では,3次元心収縮と心周期の極端間緩和をモデル化する新しい幾何学的深層学習手法として,point cloud deformation network (pcd-net)を提案する。心臓解剖学のマルチクラス3Dポイントクラウド表現上で,効率的なマルチスケール特徴学習を実現するために,ポイントクラウドベースの深層学習をエンコーダ・デコーダ構造に応用した。我々は,英国バイオバンクの調査から,1万件を超える大規模データセットに対するアプローチを評価し,画像取得の画素解像度以下の予測真理解剖学と地上真理解剖学の間の平均チャンファー距離を求める。以上の結果から,pcd-netは正常者と心筋梗塞患者との間に有意なサブポピュレーション特異的な差を捉えることができた。得られた3次元変形パターンは,MI検出および入射MI予測のタスクにおいて,受信機動作特性曲線の領域で13%,7%,ハーレルのMI生存分析におけるコンコーダンス指標で7%,複数の臨床ベンチマークで13%,7%を上回った。 Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. 
We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction, respectively, and by 7% in terms of Harrell's concordance index for MI survival analysis.	翻訳日:2023-07-21 12:42:23 公開日:2023-07-20
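The Chamfer distance used above to compare predicted and ground truth anatomies can be sketched as follows. Several variants exist (squared or unsquared distances, sums or means), and the abstract does not say which one the authors use, so this symmetric mean-of-nearest-neighbour form is an assumption; `chamfer_distance` is a hypothetical name.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets given as lists of 3-tuples."""

    def one_way(src, dst):
        # Mean distance from each point in src to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
d = chamfer_distance(predicted, ground_truth)  # → 1.0
```

This brute-force version costs O(|a| * |b|) distance evaluations; for clouds with many thousands of points a spatial index would normally replace the inner `min`.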
# 3次元医用画像分割における性能推定の信頼区間 Confidence intervals for performance estimates in 3D medical image segmentation ( http://arxiv.org/abs/2307.10926v1 ) ライセンス: Link先を確認	R. El Jurdi, G. Varoquaux, O. Colliot	(参考訳) 医療セグメンテーションモデルは経験的に評価される。このような評価は、サンプル画像の限られたセットに基づいているため、必然的にノイズを含む。平均的なパフォーマンス指標を超えて、信頼区間の報告が重要である。しかし、医用画像分割ではめったに行われない。信頼区間の幅は、テストセットのサイズとパフォーマンス測定値の広がり(テストセット全体の標準偏差)に依存する。分類には、幅広い信頼区間を避けるために多くのテスト画像が必要である。しかし、セグメンテーションは研究されておらず、与えられたテスト画像によってもたらされる情報量によって異なる。本稿では,医用画像分割における典型的な信頼区間について検討する。標準のnnU-Netフレームワークを用いた3次元画像分割実験を行い,Medical Decathlonチャレンジから得られた2つのデータセットと,Dice精度とハウスドルフ距離の2つの性能測定を用いた。パラメトリック信頼区間は,種々のテストセットサイズと性能指標の広がりに対するブートストラップ推定値の妥当な近似であることを示す。重要となるのは,特定の精度を達成するのに必要なテストサイズが,分類タスクよりもはるかに少ないことが多いことだ。通常、幅1%の信頼区間は、広がりが小さい場合(標準偏差は約3%)、100-200のテストサンプルを必要とする。より難しいセグメンテーションタスクは、より大きな広がりをもたらし、1000以上のサンプルを必要とする。 Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard deviation across the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry out experiments on 3D image segmentation using the standard nnU-Net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. 
Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard-deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.	翻訳日:2023-07-21 12:41:54 公開日:2023-07-20
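The quoted sample sizes can be sanity-checked with the usual normal-approximation interval for a mean, whose full width is 2 * z * std / sqrt(n). The sketch below solves that formula for n. It assumes that "1% wide" refers to the full width of a 95% interval in percentage points; the function name and parameters are illustrative, not taken from the paper.

```python
import math


def samples_for_ci_width(std, width, z=1.96):
    """Smallest n such that the normal-approximation CI for the mean has full width <= `width`.

    `std` and `width` must be in the same units (here: percentage points of the metric).
    """
    return math.ceil((2 * z * std / width) ** 2)


n_low_spread = samples_for_ci_width(std=3.0, width=1.0)  # → 139
```

With a standard deviation of 3 points this gives 139 test samples, inside the 100-200 range stated above. Because n grows with the square of the spread, a standard deviation of about 8 points or more already pushes the requirement toward a thousand samples.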
# 点雲表現を用いた固有出現分解 Intrinsic Appearance Decomposition Using Point Cloud Representation ( http://arxiv.org/abs/2307.10924v1 ) ライセンス: Link先を確認	Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers	(参考訳) 内在分解は、画像からアルベドとシェーディングを推測することである。かなり不適切な問題であるため、以前の方法は2d画像からの事前の仮定に依存しているが、データ表現自体の探索は限られている。点雲は、画像の幾何学的情報と色情報を自然に整列する豊かなシーン表現形式として知られている。提案手法であるPoint Intrinsic Net, 略してPoInt-Netは, 点雲表現を用いてアルベド, 光源方向, シェーディングを共同で予測する。実験によれば、point-netの利点は、精度の面では、データセットをまたがる複数のメトリクスに対する2d表現アプローチよりも優れており、効率の面では、小規模のポイントクラウド上でトレーニングされ、任意のスケールのポイントクラウド上で安定して実行される。 Intrinsic decomposition is to infer the albedo and shading from the image. Since it is a heavily ill-posed problem, previous methods rely on prior assumptions from 2D images, however, the exploration of the data representation itself is limited. The point cloud is known as a rich format of scene representation, which naturally aligns the geometric information and the color information of an image. Our proposed method, Point Intrinsic Net, in short, PoInt-Net, jointly predicts the albedo, light source direction, and shading, using point cloud representation. Experiments reveal the benefits of PoInt-Net, in terms of accuracy, it outperforms 2D representation approaches on multiple metrics across datasets; in terms of efficiency, it trains on small-scale point clouds and performs stably on any-scale point clouds; in terms of robustness, it only trains on single object level dataset, and demonstrates reasonable generalization ability for unseen objects and scenes.	翻訳日:2023-07-21 12:41:31 公開日:2023-07-20
# 臨床時系列における連続多次元自己監督学習 Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series ( http://arxiv.org/abs/2307.10923v1 ) ライセンス: Link先を確認	Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz	(参考訳) 臨床時系列データに対する自己教師付き学習 (ssl) は, 患者の生理的状態に関する重要な情報を提供するため, 近年の文献で注目されている。しかし、既存の臨床時系列のSSL法のほとんどは、構造化された特徴(例えば、実験値やバイタルサイン)や個々の高次元生理的信号(例えば、心電図)のような、単調な時系列のために設計されているという点で制限されている。これらの既存手法は、構造的特徴と高次元データがシーケンスの各時間ステップに記録される多モード性を示すモデル時系列に容易に拡張することはできない。本研究では,このギャップに対処し,シーケンス全体のレベルとシーケンス内の個々の高次元データポイントのレベルの両方でSSLロスを適用し,両方のスケールで情報をよりよく取得する,新たなSSLメソッドであるSequential Multi-dimensional SSLを提案する。当社の戦略は,各レベルで使用される損失関数の特定の形式とは無関係です -- vicregのように,simclrや非contrastiveのように,対照的なものです。本手法は,(1)高周波心電図,(2)検査値とバイタルサインからの構造化データを含む実世界の2つの臨床データセットを用いて評価した。実験結果から,本手法による事前学習と下流タスクの微調整により,両方のデータセットのベースライン上でのパフォーマンスが向上し,複数の設定で異なる自己教師付き損失関数が改良される可能性が示唆された。 Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vitals signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where a SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. 
Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vitals signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.	翻訳日:2023-07-21 12:41:14 公開日:2023-07-20
# 言語に基づく行動概念空間は自己指導型学習を改善する Language-based Action Concept Spaces Improve Video Self-Supervised Learning ( http://arxiv.org/abs/2307.10922v1 ) ライセンス: Link先を確認	Kanchana Ranasinghe and Michael Ryoo	(参考訳) 最近のコントラスト言語画像事前学習は、高度に転送可能で堅牢な画像表現の学習につながっている。しかし、これらのモデルを最小限の監督でビデオドメインに適応させることは、まだ未解決の問題である。画像CLIPモデルをビデオ領域に適応させるために,言語による自己教師型学習を用いて,その方向への簡単なステップを探索する。時間的モデリングのために修正されたバックボーンは、アクションコンセプト空間で動作する列車の目的と自己蒸留設定の下で訓練される。関連するテキストプロンプトを用いて言語エンコーダから抽出した様々なアクション概念の特徴ベクトルがこの空間を構成する。本稿では, 従来の表現の汎用性を保ちつつ, 動作と属性の関係を強制する, 概念蒸留と概念アライメントという2つの列車目標を紹介する。提案手法は3つの行動認識ベンチマークにおいてゼロショットおよび線形探索性能を向上させる。 Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.	翻訳日:2023-07-21 12:40:44 公開日:2023-07-20
# 多項式関数の量子コンピュータへの効率的な振幅符号化 Efficient amplitude encoding of polynomial functions into quantum computers ( http://arxiv.org/abs/2307.10917v1 ) ライセンス: Link先を確認	Javier Gonzalez-Conde, Thomas W. Watts, Pablo Rodriguez-Grasa and Mikel Sanz	(参考訳) 関数を量子コンピュータにロードすることは、偏微分方程式の解法のようないくつかの量子アルゴリズムにおいて重要なステップである。したがって、このプロセスの非効率性は、これらのアルゴリズムの適用に大きなボトルネックをもたらす。本稿では,実多項式関数の振幅符号化のための2つの効率的な手法を提示・比較する。最初のものは行列積状態表現に依存し、そこでは結合次元が小さいと仮定された場合の目標状態の近似を研究し、ベンチマークする。第2のアルゴリズムは2つのサブルーチンを組み合わせる。まず、アダマール・ウォルシュ級数展開をロードする多制御ゲートの浅いシーケンスと、それに続く逆離散アダマール・ウォルシュ変換により、線形関数を量子レジスタにエンコードする。次に、この構成をビルディングブロックとして使用して、線形関数に対応する振幅の$\mathcal{O}(n)$ブロック符号化を実現し、対応する多項式変換を実装した量子特異値変換を振幅のブロック符号化に適用する。さらに,線形関数のアダマール・ウォルシュ級数の打ち切りが目標状態の最終的な忠実度にどのように影響するかを考察し,少ない資源で高い忠実度が得られることを報告する。 Loading functions into quantum computers represents an essential step in several quantum algorithms, such as in the resolution of partial differential equations. Therefore, the inefficiency of this process leads to a major bottleneck for the application of these algorithms. Here, we present and compare two efficient methods for the amplitude encoding of real polynomial functions. The first one relies on the matrix product state representation, where we study and benchmark the approximations of the target state when the bond dimension is assumed to be small. The second algorithm combines two subroutines. Initially, we encode the linear function into the quantum registers with a shallow sequence of multi-controlled gates that loads its Hadamard-Walsh series expansion, followed by the inverse discrete Hadamard-Walsh transform. Then, we use this construction as a building block to achieve a $\mathcal{O}(n)$ block encoding of the amplitudes corresponding to the linear function and apply the quantum singular value transformation that implements the corresponding polynomial transformation to the block encoding of the amplitudes. 
Additionally, we explore how truncating the Hadamard-Walsh series of the linear function affects the final fidelity of the target state, reporting high fidelities with small resources.	翻訳日:2023-07-21 12:40:32 公開日:2023-07-20
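One reason a Hadamard-Walsh expansion suits a linear function is that the function f(x) = x on n bits has only n + 1 nonzero Walsh coefficients: the constant term and one per bit. The sketch below checks this classically with a fast Walsh-Hadamard transform in natural (Hadamard) ordering. It illustrates the series only and is not the quantum circuit from the paper; `fwht` and the 1/N normalization are choices made for this example.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform in natural (Hadamard) ordering, scaled by 1/N.

    The input length must be a power of two.
    """
    a = list(values)
    n = len(a)
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h pairwise.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return [v / n for v in a]


n_qubits = 4
coeffs = fwht(range(2 ** n_qubits))  # Walsh coefficients of f(x) = x
support = [k for k, c in enumerate(coeffs) if abs(c) > 1e-12]
# support → [0, 1, 2, 4, 8]: the constant term plus one coefficient per bit
```

Only indices 0 and the powers of two survive, so the number of terms to load grows linearly with the number of qubits rather than with the 2^n amplitudes.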
# 自己監視医用画像解析のための微調整戦略の再検討 Revisiting Fine-Tuning Strategies for Self-supervised Medical Imaging Analysis ( http://arxiv.org/abs/2307.10915v1 ) ライセンス: Link先を確認	Muhammad Osama Khan, Yi Fang	(参考訳) 自己教師付き学習(SSL)の急速な進歩にもかかわらず、医用画像解析におけるエンド・ツー・エンドの微調整戦略は依然として主流である。しかし、この手法が訓練済みの知識を効果的に活用するのに本当に最適なのか、特に異なるタイプの特徴を捉えたSSLの多様なカテゴリを考慮すると、はっきりしない。本稿では,まず,4つの下流タスクにおいてSOTAメソッドを上回り,強力なコントラスト的かつ復元的なSSLベースラインを確立する。これらの強力なベースラインに基づいて、複数の事前トレーニングおよび微調整データセット、および様々な微調整データセットサイズにわたる広範囲な微調整分析を行う。トレーニング済みネットワークの最後の数層のみを微調整するという従来の知恵とは対照的に、細調整中間層はより効果的であり、ネットワークの第2四半期(25-50%)は対照的なSSLに最適であるのに対して、第3四半期(50-75%)は復元SSLに最適である。エンドツーエンドファインチューニングのデファクト標準と比較すると、トレーニング済みネットワークの最初の3/3(0-75%)からなる浅層ネットワークを微調整し、最大5.48%の改善を実現しています。さらに,これらの知見を用いて,複数のSSLモデルの相補的強みを利用した簡易かつ効果的な手法を提案する。したがって,個々のsslモデルの性能を向上させるだけでなく,複数のsslモデルが提供する補完的強みを効果的に活用することで,自己監視型医用画像解析の大幅な改善を実現した。 Despite the rapid progress in self-supervised learning (SSL), end-to-end fine-tuning still remains the dominant fine-tuning strategy for medical imaging analysis. However, it remains unclear whether this approach is truly optimal for effectively utilizing the pre-trained knowledge, especially considering the diverse categories of SSL that capture different types of features. In this paper, we first establish strong contrastive and restorative SSL baselines that outperform SOTA methods across four diverse downstream tasks. Building upon these strong baselines, we conduct an extensive fine-tuning analysis across multiple pre-training and fine-tuning datasets, as well as various fine-tuning dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last few layers of a pre-trained network, we show that fine-tuning intermediate layers is more effective, with fine-tuning the second quarter (25-50%) of the network being optimal for contrastive SSL whereas fine-tuning the third quarter (50-75%) of the network being optimal for restorative SSL. 
Compared to the de-facto standard of end-to-end fine-tuning, our best fine-tuning strategy, which fine-tunes a shallower network consisting of the first three quarters (0-75%) of the pre-trained network, yields improvements of as much as 5.48%. Additionally, using these insights, we propose a simple yet effective method to leverage the complementary strengths of multiple SSL models, resulting in enhancements of up to 3.57% compared to using the best model alone. Hence, our fine-tuning strategies not only enhance the performance of individual SSL models, but also enable effective utilization of the complementary strengths offered by multiple SSL models, leading to significant improvements in self-supervised medical imaging analysis.	翻訳日:2023-07-21 12:40:09 公開日:2023-07-20
# weak polyp: ポリプセグメンテーションのバウンディングボックスだけを見る WeakPolyp: You Only Look Bounding Box for Polyp Segmentation ( http://arxiv.org/abs/2307.10912v1 ) ライセンス: Link先を確認	Jun Wei, Yiwen Hu, Shuguang Cui, S.Kevin Zhou, Zhen Li	(参考訳) 高価なピクセルレベルラベルに制限されたポリプセグメンテーションモデルは、データ不足と一般化に苦しむ。対照的に、polypバウンディングボックスアノテーションはずっと安く、よりアクセスしやすい。したがって,ラベル付けコストを削減するため,境界ボックスアノテーションをベースとした弱教師付きポリプセグメンテーションモデル(WeakPolyp)の学習を提案する。しかし、粗い境界ボックスにはノイズが多すぎる。干渉を避けるため,マスクツーボックス変換(m2b)を導入する。予測自体ではなく予測の外側ボックスマスクを監視することにより、M2Bは粗いラベルと正確な予測とのミスマッチを大幅に軽減する。しかし、M2Bは厳密な監視しか提供せず、異常な予測に繋がる。そこで我々はさらに,集中管理のためのスケール一貫性(SC)損失を提案する。異なるスケールで同じ画像で予測を明示的に調整することで、sc損失は予測のばらつきを大幅に減少させる。 WeakPolypはプラグアンドプレイモデルで、他の魅力的なバックボーンに簡単に移植できます。さらに、提案されたモジュールはトレーニング中にのみ使用され、推論に計算コストがかからない。提案するweakpolypは,マスクアノテーションをまったく必要とせず,完全に教師付きモデルと同等の性能を実現している。 Limited by expensive pixel-level labels, polyp segmentation models are plagued by data shortage and suffer from impaired generalization. In contrast, polyp bounding box annotations are much cheaper and more accessible. Thus, to reduce labeling cost, we propose to learn a weakly supervised polyp segmentation model (i.e., WeakPolyp) completely based on bounding box annotations. However, coarse bounding boxes contain too much noise. To avoid interference, we introduce the mask-to-box (M2B) transformation. By supervising the outer box mask of the prediction instead of the prediction itself, M2B greatly mitigates the mismatch between the coarse label and the precise prediction. But, M2B only provides sparse supervision, leading to non-unique predictions. Therefore, we further propose a scale consistency (SC) loss for dense supervision. By explicitly aligning predictions across the same image at different scales, the SC loss largely reduces the variation of predictions. Note that our WeakPolyp is a plug-and-play model, which can be easily ported to other appealing backbones. Besides, the proposed modules are only used during training, bringing no computation cost to inference. 
Extensive experiments demonstrate the effectiveness of our proposed WeakPolyp, which surprisingly achieves a comparable performance with a fully supervised model, requiring no mask annotations at all.	翻訳日:2023-07-21 12:39:36 公開日:2023-07-20
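The mask-to-box idea can be illustrated with a simplified, non-differentiable sketch: take a binary predicted mask and replace it with the filled tight bounding box of its foreground, so that it can be compared with a box annotation instead of a pixel-level mask. The abstract does not give the exact M2B formulation used for training, so the single-object assumption, the binarized input, and the name `mask_to_box` are all assumptions made here.

```python
def mask_to_box(mask):
    """Return the filled tight bounding box of the nonzero pixels of a 2D 0/1 mask."""
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(mask[r])]
    cols = [c for c in range(width) if any(mask[r][c] for r in range(height))]
    box = [[0] * width for _ in range(height)]
    if not rows:
        return box  # empty prediction: empty box
    for r in range(rows[0], rows[-1] + 1):
        for c in range(cols[0], cols[-1] + 1):
            box[r][c] = 1
    return box


pred = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = mask_to_box(pred)
# box → [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Two different masks with the same extent map to the same box, which is why box-level supervision alone leaves the prediction under-determined and the abstract adds a scale consistency loss for dense supervision.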
# ゲージ対称性による準周期CMV行列のエクササイズエッジ Exact mobility edges for almost-periodic CMV matrices via gauge symmetries ( http://arxiv.org/abs/2307.10909v1 ) ライセンス: Link先を確認	Christopher Cedzich and Jake Fillman and Long Li and Darren Ong and Qi Zhou	(参考訳) 一般化拡張CMV行列の対称性について検討する。標準拡張CMV行列の反射対称性に関わる問題は微妙なものであることはよく文書化されている。一般化された拡張CMV行列のクラスをカンテロ・Gr\"ウンバウム・モラル・ベラスケスの精神における明示的な対角ユニタリを通して、エレガントな方法で扱う方法を示す。これらのアイデアの応用として、モーザイユニタリなニアマチュー作用素と呼ばれる、ほぼ周期的なCMV行列の明示的な族を構築し、正確なモビリティエッジの発生を証明する。すなわち、絶対連続かつ純粋な点スペクトルを持つスペクトル領域を分離し、それらを正確に計算するエネルギーの存在を示す。 We investigate the symmetries of so-called generalized extended CMV matrices. It is well-documented that problems involving reflection symmetries of standard extended CMV matrices can be subtle. We show how to deal with this in an elegant fashion by passing to the class of generalized extended CMV matrices via explicit diagonal unitaries in the spirit of Cantero-Gr\"unbaum-Moral-Vel\'azquez. As an application of these ideas, we construct an explicit family of almost-periodic CMV matrices, which we call the mosaic unitary almost-Mathieu operator, and prove the occurrence of exact mobility edges. That is, we show the existence of energies that separate spectral regions with absolutely continuous and pure point spectrum and exactly calculate them.	翻訳日:2023-07-21 12:39:14 公開日:2023-07-20
# d$-dimensional bell状態に基づくサードパーティなしのマルチパーティ量子和法の改良 Improvements on "Multi-Party Quantum Summation without a Third Party based on $d$-Dimensional Bell States" ( http://arxiv.org/abs/2307.10908v1 ) ライセンス: Link先を確認	Xiaobing Li and Jiale Hou and Haozhen Situ and Cai Zhang	(参考訳) 2021年、WuらはD次元ベル状態の絡み合い特性を利用した多次元量子和スキームを発表した(Wu et al. in Quantum Inf Process 20:200, 2021)。特に、著者らは3つのパーティの量子和プロトコルを提案し、その成果をマルチパーティのケースに拡張した。彼らのプロトコルは外部や参加者の攻撃に対して安全であると主張されている。しかし、この研究はウーのプロトコルが抜け穴を持っていること、すなわち、特定の位置関係を満たしている2人以上の不正な参加者が、検出されずに一部の正直な参加者のプライベートな入力を得ることを意図していることを指摘している。そのため、これらの問題に対処するための改善が提案されている。 In 2021, Wu et al. presented a multi-party quantum summation scheme exploiting the entanglement properties of d-dimensional Bell states (Wu et al. in Quantum Inf Process 20:200, 2021). In particular, the authors proposed a three-party quantum summation protocol and then extended their work to a multi-party case. It is claimed that their protocol is secure against outside and participants' attacks. However, this work points out that Wu's protocol has a loophole, i.e., two or more dishonest participants who meet a specific location relationship can conspire to obtain the private inputs of some honest participants without being detected. Accordingly, improvements are proposed to address these issues.	翻訳日:2023-07-21 12:39:00 公開日:2023-07-20
# 軟部組織駆動型顎顔面手術計画 Soft-tissue Driven Craniomaxillofacial Surgical Planning ( http://arxiv.org/abs/2307.10954v1 ) ライセンス: Link先を確認	Xi Fang, Daeseung Kim, Xuanang Xu, Tianshu Kuang, Nathan Lampen, Jungwook Lee, Hannah H. Deng, Jaime Gateno, Michael A.K. Liebschner, James J. Xia, Pingkun Yan	(参考訳) CMF手術では, 希望する顔の成果を達成するためのボニームーブメントの計画が難しい課題である。現在の骨駆動アプローチは、顔の外観が修正されることを期待して、骨の正常化に焦点を当てている。しかし、骨構造と顔面軟部組織との複雑な非線形関係のため、このような骨駆動法は顔面変形を矯正するには不十分である。骨の動きによる顔の変化をシミュレートする努力にもかかわらず、手術計画はまだ反復的な修正と教育的な推測に依存している。そこで本研究では,手術計画の自動作成と検証が可能なソフトトイシュー駆動フレームワークを提案する。本フレームワークは,所望の顔結果を達成するために必要なボニー運動を推定するボニープランナーネットワークと,推定ボニー運動計画から生じる顔変化をシミュレートする顔シミュレータネットワークとから構成される。これら2つのモデルを組み合わせることで、計画に必要な最終的なボニー運動を検証することができる。提案手法を臨床データを用いて評価し, 従来の骨駆動アプローチと比較して, 軟部組織駆動アプローチが外科的計画の精度と有効性を大幅に改善することを示した。 In CMF surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft-tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. 
The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.	翻訳日:2023-07-21 12:32:37 公開日:2023-07-20
# PE-YOLO:ダークオブジェクト検出のためのピラミッド拡張ネットワーク PE-YOLO: Pyramid Enhancement Network for Dark Object Detection ( http://arxiv.org/abs/2307.10953v1 ) ライセンス: Link先を確認	Xiangchen Yin, Zhenda Yu, Zetao Fei, Wenjun Lv, Xin Gao	(参考訳) 現在のオブジェクト検出モデルは、多くのベンチマークデータセットで良い結果を得ているが、暗い条件下でオブジェクトを検出することは依然として大きな課題である。この問題に対処するために,ピラミッド拡張ネットワーク(PENet)を提案し,それをYOLOv3と結合してPE-YOLOというダークオブジェクト検出フレームワークを構築する。まずPENetは、画像をラプラシアンピラミッドを用いて異なる解像度の4つのコンポーネントに分解する。具体的には、画像のディテールを強化するための、コンテキストブランチとエッジブランチで構成されるディテール処理モジュール(DPM)を提案する。さらに、低周波セマンティクスを捕捉し、高周波ノイズを防止する低周波拡張フィルタ(LEF)を提案する。 PE-YOLOはエンドツーエンドのジョイントトレーニングアプローチを採用し、通常の検出損失のみを使用してトレーニングプロセスを簡素化する。我々は,低照度物体検出データセットExDarkの実験を行い,その効果を実証した。その結果,他の暗黒検出器や低照度化モデルと比較して,PE-YOLOはmAPが78.0%,FPSが53.6となり,異なる低照度条件下での物体検出に適応できることがわかった。コードはhttps://github.com/XiangchenYin/PE-YOLOで公開されている。 Current object detection models have achieved good results on many benchmark datasets, but detecting objects in dark conditions remains a large challenge. To address this issue, we propose a pyramid enhancement network (PENet) and join it with YOLOv3 to build a dark object detection framework named PE-YOLO. Firstly, PENet decomposes the image into four components of different resolutions using the Laplacian pyramid. Specifically, we propose a detail processing module (DPM) to enhance the detail of images, which consists of a context branch and an edge branch. In addition, we propose a low-frequency enhancement filter (LEF) to capture low-frequency semantics and prevent high-frequency noise. PE-YOLO adopts an end-to-end joint training approach and only uses normal detection loss to simplify the training process. We conduct experiments on the low-light object detection dataset ExDark to demonstrate the effectiveness of our approach. The results indicate that, compared with other dark detectors and low-light enhancement models, PE-YOLO achieves strong results, with 78.0% mAP and 53.6 FPS, and can adapt to object detection under different low-light conditions. 
The code is available at https://github.com/XiangchenYin/PE-YOLO.	翻訳日:2023-07-21 12:32:16 公開日:2023-07-20
# object-lane clustering によるオンラインレーングラフ抽出の改善 Improving Online Lane Graph Extraction by Object-Lane Clustering ( http://arxiv.org/abs/2307.10947v1 ) ライセンス: Link先を確認	Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool	(参考訳) 自律運転には正確な現場理解情報が必要である。この目的のために、自律エージェントは知覚スタックの一部としてオブジェクト検出とオンラインBEVレーングラフ抽出手法をデプロイする。本研究では,3次元物体検出出力を用いて局所レーングラフ推定精度を向上させるアーキテクチャと損失定式化を提案する。提案手法では, 中心線をクラスタセンタとして, オブジェクトをクラスタセンタ上の確率分布に割り当てるデータポイントとして考慮し, 中心線にオブジェクトを割り当てることを学ぶ。このトレーニングスキームはレーンとオブジェクトの関係を直接監視することを保証するので、パフォーマンスが向上する。提案手法は,最先端手法よりもレーングラフ推定を大幅に改善する。提案手法は,既存の3次元物体検出手法の出力を用いることで,大幅な性能向上が期待できることを示す。本手法では, 中間表現ではなく検出出力を用いるため, テスト時に任意の検出手法を単一モデルで使用することができる。 Autonomous driving requires accurate local scene understanding information. To this end, autonomous agents deploy object detection and online BEV lane graph extraction methods as a part of their perception stack. In this work, we propose an architecture and loss formulation to improve the accuracy of local lane graph estimates by using 3D object detection outputs. The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers and the objects as data points to be assigned a probability distribution over the cluster centers. This training scheme ensures direct supervision on the relationship between lanes and objects, thus leading to better performance. The proposed method improves lane graph estimation substantially over state-of-the-art methods. The extensive ablations show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods. Since our method uses the detection outputs rather than detection method intermediate representations, a single model of our method can use any detection method at test time.	翻訳日:2023-07-21 12:31:54 公開日:2023-07-20
# プロキシアンカーによる連続一般化カテゴリー探索のための教師なし学習 Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery ( http://arxiv.org/abs/2307.10943v1 ) ライセンス: Link先を確認	Hyungmin Kim, Sungho Suh, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim	(参考訳) ディープラーニングの最近の進歩は、様々なコンピュータビジョンアプリケーションのパフォーマンスを大幅に改善した。しかしながら、インクリメンタル学習シナリオにおける新しいカテゴリの発見は、新しいカテゴリの数と性質に関する事前知識が不足しているため、依然として困難な問題である。既存の新しいカテゴリ発見手法は、ラベル付きデータセットに依存し、新規カテゴリの数やバッチ内の新規サンプルの割合に関する事前知識によって制限される。本稿では,実世界のシナリオをより正確に反映し,その制約に対処するために,事前知識のないラベル付き集合上で新しいカテゴリを発見できる,教師なしクラスインクリメンタル学習手法を提案する。提案手法は,ラベル付きデータセット上の特徴抽出器とプロキシアンカーを微調整し,未ラベルデータセット上の古いカテゴリと新しいカテゴリとクラスタに分割する。さらに、プロキシアンカーベースの例が代表カテゴリーベクトルを生成して破滅的忘れを緩和する。実験の結果,提案手法は実世界のシナリオにおいて,きめ細かなデータセットの最先端手法よりも優れていることがわかった。 Recent advances in deep learning have significantly improved the performance of various computer vision applications. However, discovering novel categories in an incremental learning scenario remains a challenging problem due to the lack of prior knowledge about the number and nature of new categories. Existing methods for novel category discovery are limited by their reliance on labeled datasets and prior knowledge about the number of novel categories and the proportion of novel samples in the batch. To address the limitations and more accurately reflect real-world scenarios, in this paper, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. The proposed method fine-tunes the feature extractor and proxy anchors on labeled sets, then splits samples into old and novel categories and clusters on the unlabeled dataset. Furthermore, the proxy anchors-based exemplar generates representative category vectors to mitigate catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms the state-of-the-art methods on fine-grained datasets under real-world scenarios.	翻訳日:2023-07-21 12:31:39 公開日:2023-07-20
# pasta: 事前訓練されたアクションステートトランスフォーマーエージェント PASTA: Pretrained Action-State Transformer Agents ( http://arxiv.org/abs/2307.10936v1 ) ライセンス: Link先を確認	Raphael Boige and Yannis Flet-Berliac and Arthur Flajolet and Guillaume Richard and Thomas Pierrot	(参考訳) 自己教師型学習は、NLP、ビジョン、生物学など、さまざまなコンピューティング領域に革命的なパラダイムシフトをもたらした。最近のアプローチでは、大量のラベルのないデータでトランスフォーマーモデルを事前トレーニングし、下流タスクを効率的に解決するための出発点となる。強化学習の分野では、研究者たちは最近、専門家の軌道上で事前訓練されたモデルを開発し、ロボット工学からレコメンデーションシステムまで幅広いタスクに対処できるように、これらのアプローチを適用した。しかし、既存の手法は主に特定の下流アプリケーションに適した複雑な事前学習の目的に依存している。本稿では,前訓練動作状態トランスフォーマーエージェント (pasta) と呼ばれるモデルの包括的検討を行う。本研究は統一的な手法を用い,行動のクローン化,オフラインrl,センサ障害のロバスト性,ダイナミクス変化適応など,幅広い下流タスクをカバーする。私たちの目標は、さまざまな設計選択を体系的に比較し、堅牢なモデルを構築する実践者に貴重な洞察を提供することです。本研究では,アクションと状態コンポーネントレベルでのトークン化,次のトークン予測のような基本的な事前トレーニング目標の利用,多様なドメインをまたいだトレーニングモデル,パラメータ効率の優れた微調整(peft)などについて検討した。また,peftの適用により,下流適応時のパラメータ1万未満の微調整が可能となり,幅広いコミュニティがこれらのモデルを用いて実験を再現することが可能となった。本研究は,RL軌道を表現し,ロバストな政策学習に寄与するために,第一原理設計選択による変圧器の使用に関するさらなる研究を期待する。 Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. 
Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.	翻訳日:2023-07-21 12:31:22 公開日:2023-07-20
# 機械学習と結晶距離を用いたゼオライトの無機合成構造マップ Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances ( http://arxiv.org/abs/2307.10935v1 ) ライセンス: Link先を確認	Daniel Schwalbe-Koda, Daniel E. Widdowson, Tuan Anh Pham, Vitaliy A. Kurlin	(参考訳) ゼオライト(zeolites)は、用途、合成条件、ポリモルフィックの多様性で知られる無機材料である。合成は無機合成と有機合成の両方で制御されているが、ゼオライト合成の計算的な研究は主に有機テンプレートの設計に焦点が当てられている。本研究では,結晶構造と機械学習(ml)間の強い距離測定値を用いて,ゼオライト中の無機合成マップを作成する。 253個のゼオライトから始めて, 構造単位などのラベルを使わずに, 文献から無機合成条件を連続的に再現する方法を示す。教師なし学習分析では, テンプレートベースの経路においても, 隣り合うゼオライトが類似した無機合成条件をしばしば共有していることが示されている。 ML分類器と組み合わせることで, ゼオライト中の14の無機質, Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, Znの合成構造関係が得られた。モデル予測を説明することで,既知の構造との類似性を合成空間の特徴として利用できることを示す。最後に, ゼオライトから局所的な構造パターンを抽出することにより, 仮説データベースにおける非実現枠組みの無機合成条件の予測と結果の解釈にこれらの手法が利用できることを示す。テンプレート設計と組み合わせることで、この研究はゼオライトの合成条件の空間の探索を加速することができる。 Zeolites are inorganic materials known for their diversity of applications, synthesis conditions, and resulting polymorphs. Although their synthesis is controlled both by inorganic and organic synthesis conditions, computational studies of zeolite synthesis have focused mostly on organic template design. In this work, we use a strong distance metric between crystal structures and machine learning (ML) to create inorganic synthesis maps in zeolites. Starting with 253 known zeolites, we show how the continuous distances between frameworks reproduce inorganic synthesis conditions from the literature without using labels such as building units. An unsupervised learning analysis shows that neighboring zeolites according to our metric often share similar inorganic synthesis conditions, even in template-based routes. In combination with ML classifiers, we find synthesis-structure relationships for 14 common inorganic conditions in zeolites, namely Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, and Zn. 
By explaining the model predictions, we demonstrate how (dis)similarities towards known structures can be used as features for the synthesis space. Finally, we show how these methods can be used to predict inorganic synthesis conditions for unrealized frameworks in hypothetical databases and interpret the outcomes by extracting local structural patterns from zeolites. In combination with template design, this work can accelerate the exploration of the space of synthesis conditions for zeolites.	翻訳日:2023-07-21 12:30:52 公開日:2023-07-20
# OCTraN:非構造交通シナリオにおける3次元占有畳み込みトランスフォーマーネットワーク OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios ( http://arxiv.org/abs/2307.10934v1 ) ライセンス: Link先を確認	Aditya Nalgunda Ganesh and Dhruval Pobbathi Badrinath and Harshith Mohan Kumar and Priya SS and Surabhi Narayan	(参考訳) 自律ナビゲーションのための視覚中心環境認識の現代的アプローチは、視差マップを出力する自己教師付き単眼深度推定アルゴリズムを広範囲に活用する。しかし, この視差マップを3次元空間に投影すると, 視差の誤差が拡大され, カメラからの距離が大きくなるにつれて, 深さ推定誤差が2次的に増加する。 Light Detection and Ranging (LiDAR)はこの問題を解決できるが、高価であり、多くのアプリケーションでは実現不可能である。低コストセンサによる正確な測距という課題に対処するため, 反復的注意を用いて2次元画像特徴を3次元占有特徴に変換し, 畳み込みと転置畳み込みを用いて空間情報を効率的に処理するトランスフォーマーアーキテクチャであるOCTraNを提案する。また, LiDARのグラウンドトゥルースを, ブーストされた単眼深度推定から得られる擬似グラウンドトゥルースラベルで置き換えることでその必要性をなくし, 任意のシーンにモデルを一般化する自己教師型訓練パイプラインを開発した。 Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth by substituting it with pseudo-ground truth labels obtained from boosted monocular depth estimation.	翻訳日:2023-07-21 12:30:25 公開日:2023-07-20
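The OCTraN abstract above states that the depth error grows quadratically with distance when a disparity map is projected into 3D. A short derivation makes that step explicit. It assumes the standard pinhole stereo relation between depth and disparity; the symbols $f$ (focal length in pixels), $B$ (baseline), $d$ (disparity) and $Z$ (depth) are introduced here for illustration and do not appear in the abstract.

```latex
% Assumed symbols: f = focal length in pixels, B = baseline,
% d = disparity, Z = depth.
Z = \frac{fB}{d}
\quad\Longrightarrow\quad
\frac{\partial Z}{\partial d} = -\frac{fB}{d^{2}} = -\frac{Z^{2}}{fB}
\quad\Longrightarrow\quad
|\delta Z| \approx \frac{Z^{2}}{fB}\,|\delta d| .
```

For a fixed disparity error $|\delta d|$, doubling the distance $Z$ therefore roughly quadruples the depth error.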
# 分節的双生児:文表現の微粒な意味的コントラスト学習 Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations ( http://arxiv.org/abs/2307.10932v1 ) ライセンス: Link先を確認	Qingfa Xiao, Shuangyin Li, Lei Chen	(参考訳) 文表現の教師なし学習の強化は、コントラスト学習の有用性によって著しく達成されている。このアプローチは、拡張正のインスタンスをアンカーインスタンスとクラスタリングして、望ましい埋め込みスペースを作成する。しかし、対照的な目的のみに依存することは、正のペア間で微妙な意味のバリエーションを区別できないため、最適以下の結果をもたらす可能性がある。特に、一般的なデータ拡張技術は、しばしば意味的歪みをもたらし、正のペア間の意味的マージンをもたらす。情報損失関数は意味的マージンを見落とし、トレーニング中の正のペア間の類似度最大化を優先するが、トレーニングされたモデルの無意識な意味的理解能力に繋がる。本稿では,異なる拡張手法によって生成される様々な正の対に同時に適応できる,新しいIdentical and Fraternal Twins of Contrastive Learning (IFTCL)フレームワークを提案する。そこで本研究では,学習中に生来のマージンを保ち,データエンハンスメントの可能性を促進し,下位最適化問題を克服する \textit{twins loss} を提案する。また,提案したツインズ・ロスの有効性を証明するために,概念実証実験と対照的な目的を組み合わせる。さらに,新たな計算を行わずに負のインスタンスを復元・再利用するための海馬待ち行列機構を提案し,IFCLの効率と性能をさらに向上させる。英語と中国語のデータセットで9つの意味的テキスト類似性タスクをifclフレームワークで検証し,ifclが最先端の手法よりも優れていることを示す。 The enhancement of unsupervised learning of sentence representations has been significantly achieved by the utility of contrastive learning. This approach clusters the augmented positive instance with the anchor instance to create a desired embedding space. However, relying solely on the contrastive objective can result in sub-optimal outcomes due to its inability to differentiate subtle semantic variations between positive pairs. Specifically, common data augmentation techniques frequently introduce semantic distortion, leading to a semantic margin between the positive pair. While the InfoNCE loss function overlooks the semantic margin and prioritizes similarity maximization between positive pairs during training, leading to the insensitive semantic comprehension ability of the trained model. In this paper, we introduce a novel Identical and Fraternal Twins of Contrastive Learning (named IFTCL) framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques. 
We propose a Twins Loss to preserve the innate margin during training and promote the potential of data enhancement in order to overcome the sub-optimal issue. We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss. Furthermore, we propose a hippocampus queue mechanism to restore and reuse the negative instances without additional calculation, which further enhances the efficiency and performance of the IFCL. We verify the IFCL framework on nine semantic textual similarity tasks with both English and Chinese datasets, and the experimental results show that IFCL outperforms state-of-the-art methods.	翻訳日:2023-07-21 12:29:55 公開日:2023-07-20
# MediaGPT : 中国語メディアを対象とした大規模言語モデル MediaGPT : A Large Language Model Target Chinese Media ( http://arxiv.org/abs/2307.10930v1 ) ライセンス: Link先を確認	Zhonghao Wang	(参考訳) 大規模言語モデル(LLM)の開発は近年急速に進展している。最も広く使われているLLMの1つは、メディアドメインを含む様々な分野に適用されているジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)シリーズである。しかし、実際的な応用では、メディアのユースケースとLLMの汎用的応用の違いが、特に中国語で顕著になっている。その結果、メディアドメインのユニークな要件に合わせたLLMを開発する必要性が高まっている。本稿では,多種多様なメディアデータで訓練され,中国メディアの実践的ニーズに対処する大規模言語モデルであるMediaGPTを紹介する。我々は、ドメインの特定の要件を満たすために、多様なタスク命令タイプを設計した。提案するLLMの有効性をさらに検証するため,メディア領域に適した独自のデータセットを構築し,生成型タスクに特化して設計された検証手法を開発した。これにより, 汎用LLMとメディア領域の要件とのギャップを埋め, この分野におけるLLMのより効果的かつ効率的な利用の道を開くことを目的としている。本稿では,メディアアプリケーションのためのLLM開発における課題と機会を探究し,これらの課題に対処するための潜在的解決策を提案する。 The development of large language models (LLMs) has seen rapid progress in recent years. One of the most widely used LLMs is the Generative Pre-trained Transformer (GPT) series, which has been applied in various fields, including the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially in Chinese. As a result, there is a growing need to develop LLMs that are specifically tailored to the unique requirements of the media domain. In this paper, we present MediaGPT, a large language model trained on a variety of media data and addressing the practical needs of Chinese media. We have designed a diverse set of task instruction types to cater to the specific requirements of the domain. To further validate the effectiveness of our proposed LLM, we have constructed unique datasets that are tailored to the media domain and have also developed verification methods that are specifically designed for generative-type tasks. By doing so, we aim to bridge the gap between the general-purpose LLM and the requirements of the media domain, and to pave the way for more effective and efficient use of LLMs in this field. 
This paper aims to explore the challenges and opportunities of developing LLM for media applications and to propose potential solutions for addressing these challenges.	翻訳日:2023-07-21 12:28:59 公開日:2023-07-20
# FLASK:アライメントスキルセットに基づくきめ細かい言語モデルの評価 FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ( http://arxiv.org/abs/2307.10928v1 ) ライセンス: Link先を確認	Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo	(参考訳) 大規模言語モデル(LLM)の評価は、人間の価値観に合わせるには複数のスキルの組み合わせが必要であり、必要なスキルセットは命令によって異なるため、難しい。最近の研究では,(1)複数の独立ベンチマークでの自動評価,(2)応答に対して総合スコアを与える人間または機械による評価,の2つの方法でLLMの性能評価を行っている。しかし、どちらの設定も粗い評価であり、インスタンスごとのスキル構成を必要とするユーザ命令の性質を考慮しないため、LLMの真の能力の解釈が制限される。本稿では,粗粒度のスコアリングをインスタンスごとのスキルセットレベルに分解する,モデルベースとヒューマンベースの両方の評価に適用可能なきめ細かい評価プロトコルであるFLASK(Fine-grained Language Model Evaluation based on Alignment SKill Sets)を提案する。具体的には、LLMがオープンエンドのユーザ指示に従うために必要な12のきめ細かいスキルを定義し、各インスタンスにスキルセットを割り当てて評価セットを構築する。さらに、各インスタンスのターゲットドメインと難易度をアノテートすることで、FLASKは、スキル、ドメイン、難易度に応じて、モデルのパフォーマンスを包括的に分析する全体像を提供する。 FLASKを用いて、複数のオープンソースおよびプロプライエタリなLLMを比較し、モデルに基づく評価と人間による評価の間に高い相関のある結果を観察する。 FLASKを使うことで、開発者はモデルのパフォーマンスをより正確に測定し、特定のスキルにおいてLLMを熟練させる要因を分析することで改善方法を把握できる。実践者にとって、FLASKは様々なLLMの総合的な比較を通じて、特定の状況に適したモデルを推薦するために使用できる。評価データとコード実装はhttps://github.com/kaistAI/FLASKで公開します。 Evaluation of Large Language Models (LLMs) is challenging because aligning to human values requires the composition of multiple skills and the required set of skills varies depending on the instruction. Recent studies have evaluated the performance of LLMs in two ways, (1) automatic evaluation on several independent benchmarks and (2) human- or machine-based evaluation giving an overall score to the response. However, both settings are coarse-grained evaluations, not considering the nature of user instructions that require instance-wise skill composition, which limits the interpretation of the true capabilities of LLMs. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment SKill Sets), a fine-grained evaluation protocol that can be used for both model-based and human-based evaluation and that decomposes coarse-level scoring to an instance-wise skill set level. 
Specifically, we define 12 fine-grained skills needed for LLMs to follow open-ended user instructions and construct an evaluation set by allocating a set of skills for each instance. Additionally, by annotating the target domains and difficulty level for each instance, FLASK provides a holistic view with a comprehensive analysis of a model's performance depending on skill, domain, and difficulty. Through using FLASK, we compare multiple open-sourced and proprietary LLMs and observe highly-correlated findings between model-based and human-based evaluations. FLASK enables developers to more accurately measure the model performance and how it can be improved by analyzing factors that make LLMs proficient in particular skills. For practitioners, FLASK can be used to recommend suitable models for particular situations through comprehensive comparison among various LLMs. We release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.	翻訳日:2023-07-21 12:28:29 公開日:2023-07-20
# MASR:メタデータ対応音声表現 MASR: Metadata Aware Speech Representation ( http://arxiv.org/abs/2307.10982v1 ) ライセンス: Link先を確認	Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth	(参考訳) 近年,音声表現学習は主に自己教師付き学習(SSL)タスクとして構築され,生音声信号のみを使用しながら,特定の音声記録でしばしば利用できるサイドインフォメーションを無視している。本稿では,前述の制限に対処するメタデータ対応音声表現学習フレームワークであるmasrを提案する。 MASRは、複数の外部知識ソースを組み込むことで、メタデータ情報の利用を促進できる。外部知識源は、ハードマイニング損失に有用なサンプルレベルのペアワイズ類似度行列の形で組み込まれている。 MASRフレームワークの重要な利点は、SSLメソッドの選択と組み合わせることができることである。我々は,MASR表現を用いて,言語識別や音声認識,話者認識や感情認識などの非意味的タスクなど,下流タスクの評価を行う。これらの実験では、他の確立されたベンチマークよりもMASRの大幅な性能向上を示す。本稿では,言語識別タスクの詳細な解析を行い,提案した損失関数が表現を密接な関係のある言語を分離することを可能にする方法について考察する。 In the recent years, speech representation learning is constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Metadata Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of meta-data information. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for the MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.	翻訳日:2023-07-21 12:21:11 公開日:2023-07-20
# PATROL: モデル反転攻撃に対する協調推論のためのプライバシ指向プルーニング PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks ( http://arxiv.org/abs/2307.10981v1 ) ライセンス: Link先を確認	Shiwei Ding, Lan Zhang, Miao Pan, Xiaoyong Yuan	(参考訳) 協調推論(collaborative inference)は、リソース制約のあるエッジデバイスが最先端のディープニューラルネットワーク(DNN)を使用して推論できるようにする、有望なソリューションである。協調推論では、エッジデバイスはまず入力を部分DNNにローカルに供給し、その後中間結果をクラウドにアップロードして推論を完了させる。しかし、近年の研究では、モデル反転攻撃(MIA)が中間結果から入力データを再構築できることが示されており、協調推論に深刻なプライバシー上の懸念をもたらしている。既存の摂動と暗号技術は、正確な推論を行いながらMIAを防御するには非効率で信頼性が低い。本稿では,協調推論のプライバシ,効率性,有用性のバランスをとるためにプライバシ指向のプルーニングを開発する,PATROLという実行可能なソリューションを提供する。 PATROLは、DNNの後段のレイヤほどタスク固有の特徴を抽出できるという事実を活用する。協調推論のためのローカルリソースが限られていることを前提として、PATROLはプルーニング技術に基づいてより多くのレイヤをエッジにデプロイし、推論のためのタスク固有の特徴を強化するとともに、プライバシ保護のためにタスクに無関係だがセンシティブな特徴を減らすことを目指す。プライバシ指向のプルーニングを実現するために、PATROLはリプシッツ正則化と敵対的再構成トレーニングという2つの重要な構成要素を導入する。これらはそれぞれ、MIAの安定性を低下させて再構成誤差を増大させ、敵対的トレーニングによって目標推論モデルを強化する。 Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. 
Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.	翻訳日:2023-07-21 12:20:54 公開日:2023-07-20
# 電流-密度相互作用を受けるボース・アインシュタイン凝縮体のカイラル電流 Chiral currents in Bose-Einstein condensates subject to current-density interactions ( http://arxiv.org/abs/2307.10977v1 ) ライセンス: Link先を確認	Maria Arazo, Montserrat Guilleumas, Ricardo Mayol, Vicente Delgado and Antonio Muñoz Mateo	(参考訳) 準1次元ボース・アインシュタイン凝縮体中の持続電流は、電流-密度相互作用の存在下でカイラルとなる。この現象は、回転するリング形状にロードされた超低温原子で探索され、様々な電流担持定常状態が解析的に見いだされ、平均場運動方程式の既知の解を一般化する。その動的安定性は数値シミュレーションによって検証され、一定の密度プロファイルと変調された密度プロファイルの両方の状態で安定した電流が示される一方、減衰する電流は一方向の速度閾値を超えた場合にのみ現れる。この分野における最近の実験により、これらの状態は実験的に到達可能な範囲にある。 Persistent currents in quasi-one-dimensional Bose-Einstein condensates become chiral in the presence of current-density interactions. This phenomenon is explored in ultracold atoms loaded in a rotating ring geometry, where diverse current-carrying stationary states are analytically found to generalize previously known solutions to the mean-field equations of motion. Their dynamical stability is tested by numerical simulations that show stable currents for states with both constant and modulated density profiles, while decaying currents appear only beyond a unidirectional velocity threshold. Recent experiments in the field make these states within experimental reach.	翻訳日:2023-07-21 12:20:25 公開日:2023-07-20
# 集積フォトニック分数畳み込み加速器 Integrated Photonic Fractional Convolution Accelerator ( http://arxiv.org/abs/2307.10976v1 ) ライセンス: Link先を確認	Kevin Zelaya and Mohammad-Ali Miri	(参考訳) 離散分数フーリエ変換(DFrFT)に基づく修正畳み込み演算を行う集積フォトニック回路アーキテクチャを提案する。これは、等間隔の固有モードスペクトルを持ち長さの異なる2つの非一様結合導波路格子が、相補的な次数のDFrFT演算を行い、変調器アレイを挟むことで実現される。数値シミュレーションにより、ノイズのある入力信号でもスムージングとエッジ検出のタスクが実際に実行されることが示された。 An integrated photonic circuit architecture to perform a modified-convolution operation based on the Discrete Fractional Fourier Transform (DFrFT) is introduced. This is accomplished by utilizing two nonuniformly-coupled waveguide lattices with equally-spaced eigenmode spectra and with different lengths that perform DFrFT operations of complementary orders sandwiching a modulator array. Numerical simulations show that smoothing and edge detection tasks are indeed performed even for noisy input signals.	翻訳日:2023-07-21 12:20:15 公開日:2023-07-20
# ストリーミング音声認識のためのトランスデューサのグローバル正規化 Globally Normalising the Transducer for Streaming Speech Recognition ( http://arxiv.org/abs/2307.10975v1 ) ライセンス: Link先を確認	Rogier van Dalen	(参考訳) Transducer(例えばRNN-TransducerやConformer-Transducer)は入力シーケンスを横切ると出力ラベルシーケンスを生成する。ストリーミングモードで使うのは簡単で、完全な入力を見る前に部分的な仮説を生成する。これは音声認識で人気がある。しかし、ストリーミングモードでは、Transducerには数学的欠陥があり、単にモデルが心を変える能力を制限するだけである。修正は局所正規化(例えばsoftmax)をグローバル正規化に置き換えることだが、損失関数を正確に評価することは不可能になる。近年の論文では,モデルを近似し,性能を著しく低下させることにより,この問題を解決することを提案する。本稿では,損失関数を近似し,最先端のストリーミングモデルにグローバル正規化を適用することを提案する。グローバル正規化は、ワードエラー率を9-11%削減し、ストリーミングとルックアヘッドモードのほぼ半分を閉じる。 The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates an output label sequence as it traverses the input sequence. It is straightforward to use in streaming mode, where it generates partial hypotheses before the complete input has been seen. This makes it popular in speech recognition. However, in streaming mode the Transducer has a mathematical flaw which, simply put, restricts the model's ability to change its mind. The fix is to replace local normalisation (e.g. a softmax) with global normalisation, but then the loss function becomes impossible to evaluate exactly. A recent paper proposes to solve this by approximating the model, severely degrading performance. Instead, this paper proposes to approximate the loss function, allowing global normalisation to apply to a state-of-the-art streaming model. Global normalisation reduces its word error rate by 9-11% relative, closing almost half the gap between streaming and lookahead mode.	翻訳日:2023-07-21 12:20:04 公開日:2023-07-20
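The Transducer abstract above contrasts local normalisation (a softmax at every step) with global normalisation over whole label sequences. The toy below is a generic illustration of that difference, not the paper's model or loss approximation. The two-step score tables are hypothetical: step 1 mildly prefers `a`, and evidence that arrives at step 2 strongly favours the path that started with `b`.

```python
import math


def softmax(scores):
    """Normalise a dict of scores into probabilities."""
    peak = max(scores.values())
    weights = {key: math.exp(value - peak) for key, value in scores.items()}
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


# Hypothetical scores. Each first label allows exactly one continuation.
STEP1 = {"a": 1.0, "b": 0.0}
STEP2 = {"a": {"x": -5.0}, "b": {"y": 5.0}}


def locally_normalised():
    """Softmax at every step: a step with a single option always has probability 1."""
    first = softmax(STEP1)
    result = {}
    for label, continuations in STEP2.items():
        for nxt, prob in softmax(continuations).items():
            result[(label, nxt)] = first[label] * prob
    return result


def globally_normalised():
    """One softmax over the summed scores of complete sequences."""
    totals = {
        (label, nxt): STEP1[label] + score
        for label, continuations in STEP2.items()
        for nxt, score in continuations.items()
    }
    return softmax(totals)
```

Under local normalisation the step-2 scores cancel out, so the early preference for `a` can never be revised. Under global normalisation the later evidence shifts almost all probability mass to the `b` path, which is the sense in which the abstract says the locally normalised model cannot change its mind.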
# 画像処理のためのDeep Spiking-UNet Deep Spiking-UNet for Image Processing ( http://arxiv.org/abs/2307.10974v1 ) ライセンス: Link先を確認	Hebei Li, Yueyi Zhang, Zhiwei Xiong, Zheng-jun Zha, Xiaoyan Sun	(参考訳) U-Netはその単純かつ効率的なアーキテクチャで知られており、画像処理タスクに広く利用されており、特にニューロモルフィックチップへのデプロイに適している。本稿では,SNN(Spiking Neural Networks)とU-Netアーキテクチャを組み合わせた,画像処理のためのSpiking-UNetの概念を紹介する。効率的なSpiking-UNetを実現するためには,スパイクによる高忠実度情報伝播の確保と,効果的なトレーニング戦略の策定という2つの課題に直面する。情報損失問題に対処するため、Spiking-UNet内の情報伝達効率を向上させるマルチ閾値スパイキングニューロンを導入する。トレーニング戦略には,事前学習されたU-Netモデルを活用した変換および微調整パイプラインを採用する。変換過程では、スキップ接続を利用する際に、異なる部分間のデータ分布の大幅な変動が観察される。そこで本研究では,不正確な発火率を防止するための接続単位の正規化手法を提案する。さらに,変換したモデルを微調整するフローベーストレーニング手法を採用し,性能を保ちながら時間ステップを短縮する。実験の結果,画像のセグメンテーションやデノイジングにおいて,我々のSpiking-UNetは既存のSNN手法を上回り,非スパイキング版に匹敵する性能を達成した。微調整なしで変換されたSpiking-UNetと比較して、我々のSpiking-UNetは推論時間を約90%削減する。本研究は、画像処理におけるSNNの適用範囲を広げ、ニューロモルフィックエンジニアリングの分野におけるさらなる探究を促すことが期待されている。 Spiking-UNet実装のコードはhttps://github.com/SNNresearch/Spiking-UNetで公開されている。 U-Net, known for its simple yet efficient architecture, is widely utilized for image processing tasks and is particularly suitable for deployment on neuromorphic chips. This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture. To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy. To address the issue of information loss, we introduce multi-threshold spiking neurons, which improve the efficiency of information transmission within the Spiking-UNet. For the training strategy, we adopt a conversion and fine-tuning pipeline that leverages pre-trained U-Net models. During the conversion process, significant variability in data distribution across different parts is observed when utilizing skip connections. 
Therefore, we propose a connection-wise normalization method to prevent inaccurate firing rates. Furthermore, we adopt a flow-based training method to fine-tune the converted models, reducing time steps while preserving performance. Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart, surpassing existing SNN methods. Compared with the converted Spiking-UNet without fine-tuning, our Spiking-UNet reduces inference time by approximately 90%. This research broadens the application scope of SNNs in image processing and is expected to inspire further exploration in the field of neuromorphic engineering. The code for our Spiking-UNet implementation is available at https://github.com/SNNresearch/Spiking-UNet.	翻訳日:2023-07-21 12:19:49 公開日:2023-07-20
# 即時決選投票選挙の適応的重み付け監査:AWAIRE Adaptively Weighted Audits of Instant-Runoff Voting Elections: AWAIRE ( http://arxiv.org/abs/2307.10972v1 ) ライセンス: Link先を確認	Alexander Ek, Philip B. Stark, Peter J. Stuckey, Damjan Vukcevic	(参考訳) 選挙監査は、誤った選挙結果が認定される確率を(事前に指定されたしきい値以下に)制限する場合、リスク制限的である。即時決選投票(IRV)選挙を監査する既存の方法は、リスク制限的でないか、各投票用紙の票に関する投票システムの電子的記録であるキャスト投票記録(CVR)を必要とするかのいずれかである。例えば、IRVコンテストを手動で集計する管轄区域では、CVRは必ずしも利用できない。我々は,CVRが利用できない場合に,適応的に重み付けされたテストスーパーマーチンゲールの平均を用いてIRV選挙を効率よく監査するRLA法(AWAIRE)を開発した。適応重み付けは、選挙結果を確認するためにテストすべき効率的な仮説の集合を「学習」する。正確なCVRが利用可能であれば、AWAIREはそれらを用いて効率を高め、CVRを必要とする既存の方法の性能に匹敵させることができる。最大6人の候補者の選挙を処理できるオープンソースのプロトタイプ実装を提供する。実際の選挙のデータを用いたシミュレーションでは、AWAIREは実用上効率的である可能性が高いことが示されている。我々は、より多くの候補者の選挙を扱うために計算手法を拡張する方法について論じる。適応的に重み付けされたテストスーパーマーチンゲールの平均は汎用的なツールであり、選挙監査を超えて、ファミリーワイズエラー率を厳格に制御しながら仮説の集合を逐次的にテストするのに有用である。 An election audit is risk-limiting if the audit limits (to a pre-specified threshold) the chance that an erroneous electoral outcome will be certified. Extant methods for auditing instant-runoff voting (IRV) elections are either not risk-limiting or require cast vote records (CVRs), the voting system's electronic record of the votes on each ballot. CVRs are not always available, for instance, in jurisdictions that tabulate IRV contests manually. We develop an RLA method (AWAIRE) that uses adaptively weighted averages of test supermartingales to efficiently audit IRV elections when CVRs are not available. The adaptive weighting 'learns' an efficient set of hypotheses to test to confirm the election outcome. When accurate CVRs are available, AWAIRE can use them to increase the efficiency to match the performance of existing methods that require CVRs. We provide an open-source prototype implementation that can handle elections with up to six candidates. Simulations using data from real elections show that AWAIRE is likely to be efficient in practice. We discuss how to extend the computational approach to handle elections with more candidates. 
Adaptively weighted averages of test supermartingales are a general tool, useful beyond election audits to test collections of hypotheses sequentially while rigorously controlling the familywise error rate.	翻訳日:2023-07-21 12:19:22 公開日:2023-07-20
# 複数対の空間分離オブザーバへの局所的絡み合い伝達 Local entanglement transfer to multiple pairs of spatially separated observers ( http://arxiv.org/abs/2307.10961v1 ) ライセンス: Link先を確認	Tanmoy Mondal, Kornikar Sen, Chirag Srivastava, Ujjwal Sen	(参考訳) 絡み合いは有利であるが、同時に様々な量子タスクで使われる費用のかかる資源である。絡み合いの効率的な利用と展開のために、空間的に分離された観測者であるCharuとDebuが互いに相互作用することなく絡み合いを共有したいというシナリオを考察する。その結果、それぞれのシステムは、すでに絡み合った状態を共有しているAliceとBobのシステムと、それぞれ別々にローカルに対話することができる。 Alice-Bob 対から複数の Charu-Debu 対への絡み合いが可能であるかどうかを問う。我々は、Alice と Charus の1つ、Bob とそれに対応する Debu によって適用された合同ユニタリを見つけ、Alice と Bob の間で共有される絡み合いの非ゼロの量を、無限個の Charus と Debus に順次転送することができる。これらのユニタリを用いて一定数のペアに移動可能な絡み合いの量について議論する。また、一定量の絡み合いを転送できるペアの数も決定する。さらに,可能なすべての局所ユニタリを最適化することにより,各組が少なくとも一定量の絡み合いを得るように、絡み合いを転送できる組の最大数を解析する。 Entanglement is an advantageous but at the same time a costly resource utilized in various quantum tasks. For an efficient usage and deployment of entanglement, we envisage the scenario where a pair of spatially separated observers, Charu and Debu, want to share entanglement without interacting with each other. As a way out, their systems can separately and locally interact with those of Alice and Bob, respectively, who already share an entangled state. We ask if it is possible to transfer entanglement from the Alice-Bob pair to multiple Charu- Debu pairs, where the Alice-Bob pair only possesses a limited amount of pre-shared entanglement. We find joint unitaries, which when applied by Alice and one of the Charus, and by Bob and the corresponding Debu, such that a nonzero amount of the entanglement shared between Alice and Bob can be sequentially transferred to an indefinite number of pairs of Charus and Debus. We discuss the amount of entanglement that can be transferred to a fixed number of pairs using these unitaries. Also, we determine to how many pairs a fixed amount of entanglement can be transferred. 
Moreover, by optimizing over all possible local unitaries, we analyze the maximum number of pairs to which entanglement can be transferred in such a way that each pair gets at least a fixed amount of entanglement.	翻訳日:2023-07-21 12:18:58 公開日:2023-07-20
# 光力学を用いた伝播光モード間の連続変数の絡み合い Continuous variable entanglement between propagating optical modes using optomechanics ( http://arxiv.org/abs/2307.10956v1 ) ライセンス: Link先を確認	Greeshma Gopinath (1), Yong Li (2), Sankar Davuluri (1) ((1) Department of Physics, BITS Pilani, Hyderabad Campus, Hyderabad, India, (2) Center for Theoretical Physics and School of Science, Hainan University, Haikou 570228, China)	(参考訳) 本稿では, 中間に膜を有する光機械的キャビティからの2つの空間的に分離した出力レーザー場を絡み合わせる新しい方法を提案する。放射圧結合は、入力場と出力場の直交位相成分(quadrature)間の相関を修正するために用いられる。次に、光機械的キャビティ出力のレーザー場を、量子バックアクション無効化メーター技術を用いて絡み合わせる。熱雑音が絡み合いに及ぼす影響について検討した。実験的に実現可能なパラメータでは、レーザー場間の絡み合いは室温まで持続する。 This article proposes a new method to entangle two spatially separated output laser fields from an optomechanical cavity with a membrane in the middle. The radiation pressure force coupling is used to modify the correlations between the input and the output field quadratures. Then the laser fields at the optomechanical cavity output are entangled using the quantum back-action nullifying meter technique. The effect of thermal noise on the entanglement is studied. For experimentally feasible parameters, the entanglement between the laser fields survives up to room temperature.	翻訳日:2023-07-21 12:18:32 公開日:2023-07-20
# 内視鏡手術症例における脊髄神経分節法とデータセット構築 Spinal nerve segmentation method and dataset construction in endoscopic surgical scenarios ( http://arxiv.org/abs/2307.10955v1 ) ライセンス: Link先を確認	Shaowu Peng, Pengcheng Zhao, Yongyu Ye, Junying Chen, Yunbing Chang, Xiaoqing Zheng	(参考訳) 内視鏡手術は現在,脊髄外科領域において重要な治療方法であり,ビデオ指導による脊髄神経損傷の回避が重要な課題である。本稿では,内視鏡下手術における脊髄神経のリアルタイム分割法について紹介する。手術中に記録された約10,000個の分節フレームの微細注釈付きセグメンテーションデータセットを初めて構築し、セグメンテーションの問題に対処する。本データセットに基づいて,フレーム間情報と自己認識機構を利用して最先端の性能を実現する FUnet (Frame-Unet) を提案する。また、同様のポリプ内視鏡映像データセット上で拡張実験を行い、そのモデルが優れた性能を有することを示す。この作業のデータセットとコードは以下の通りである。 Endoscopic surgery is currently an important treatment method in the field of spinal surgery and avoiding damage to the spinal nerves through video guidance is a key challenge. This paper presents the first real-time segmentation method for spinal nerves in endoscopic surgery, which provides crucial navigational information for surgeons. A finely annotated segmentation dataset of approximately 10,000 consecutive frames recorded during surgery is constructed for the first time for this field, addressing the problem of semantic segmentation. Based on this dataset, we propose FUnet (Frame-Unet), which achieves state-of-the-art performance by utilizing inter-frame information and self-attention mechanisms. We also conduct extended experiments on a similar polyp endoscopy video dataset and show that the model has good generalization ability with advantageous performance. The dataset and code of this work are presented at: https://github.com/zzzzzzpc/FUnet .	翻訳日:2023-07-21 12:18:23 公開日:2023-07-20
# シャープネス最小化アルゴリズムはシャープネスを最小化するだけでなく、より高度な一般化を実現する Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization ( http://arxiv.org/abs/2307.11007v1 ) ライセンス: Link先を確認	Kaiyue Wen, Tengyu Ma, Zhiyuan Li	(参考訳) 広範な研究にもかかわらず、過剰パラメータ化されたニューラルネットワークが一般化できる理由については、いまだに解明されていない。既存の理論では、一般的な確率最適化器は訓練損失のより平坦な最小化器を好んでおり、従って平坦性は一般化を意味するという自然な説明がある。この研究はこの説明を批判的に検証する。 (1) 平坦性が一般化を立証する, (2) 非一般化平坦性モデルが存在し、シャープ性最小化アルゴリズムは一般化しない, (3) もっとも驚くことに、非一般化平坦性モデルが存在するが、シャープ性最小化アルゴリズムは依然として一般化している。以上の結果から,シャープネスと一般化の関係はデータ分布とモデルアーキテクチャに依存し,シャープネス最小化アルゴリズムはシャープネスを最小化するだけでなく,より優れた一般化を実現することができることが示唆された。これにより、超パラメータニューラルネットワークの一般化のための他の説明の探索が要求される。 Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.	翻訳日:2023-07-21 12:10:38 公開日:2023-07-20
# 事前学習されたASRとLMを統合した音声言語理解のためのシーケンス生成 Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding ( http://arxiv.org/abs/2307.11005v1 ) ライセンス: Link先を確認	Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe	(参考訳) 事前学習音声認識(ASR)と言語モデル(LM)をSLUフレームワークに統合することへの関心が高まっている。しかし、事前の手法は事前訓練されたモデル間の語彙ミスマッチに苦しむことが多く、LMはNLUの定式化から分岐するので直接利用できない。本研究では,ASRおよびLMサブネットワークをSLUに効果的に統合し,シーケンス生成タスクをSLUに組み込む3パスエンドツーエンド(E2E)SLUシステムを提案する。最初のパスでは、ASRサブネットワークを用いてASRの書き起こしを予測する。その後、LMサブネットワークが続き、最初のSLU予測を行う。第3パスでは、最終的な予測を行うために、ASRおよびLMサブネットワークからの表現に関する検討サブネットワーク条件が記述される。提案した3パスSLUシステムは,2つのベンチマークSLUデータセット(SLURPとSLUE)上でのカスケードおよびE2E SLUモデルの性能向上を示す。 There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework. However, prior methods often struggle with a vocabulary mismatch between pretrained models, and LM cannot be directly utilized as they diverge from its NLU formulation. In this study, we propose a three-pass end-to-end (E2E) SLU system that effectively integrates ASR and LM subnetworks into the SLU formulation for sequence generation tasks. In the first pass, our architecture predicts ASR transcripts using the ASR subnetwork. This is followed by the LM subnetwork, which makes an initial SLU prediction. Finally, in the third pass, the deliberation subnetwork conditions on representations from the ASR and LM subnetworks to make the final prediction. Our proposed three-pass SLU system shows improved performance over cascaded and E2E SLU models on two benchmark SLU datasets, SLURP and SLUE, especially on acoustically challenging utterances.	翻訳日:2023-07-21 12:10:18 公開日:2023-07-20
# neosyspartan:数値相対性理論を用いた偏心二重ブラックホールの高次多重極波形のニューロシンボリックスピン予測アーキテクチャ NeoSySPArtaN: A Neuro-Symbolic Spin Prediction Architecture for higher-order multipole waveforms from eccentric Binary Black Hole mergers using Numerical Relativity ( http://arxiv.org/abs/2307.11003v1 ) ライセンス: Link先を確認	Amrutaa Vibho, Ali Al Bataineh	(参考訳) 連星ブラックホールと中性子星の融合におけるスピンマグニチュードの予測は、これらの大災害の間に放出される天体物理学的過程と重力波(gw)信号を理解する上で重要である。本稿では,ニューラルネットのパワーとシンボリック回帰を組み合わせた新しいニューロシンボリックアーキテクチャ(nsa)を提案し,ブラックホールと中性子星の融合のスピンマグニチュードを正確に予測する。本稿では,SXSウェーブフォームカタログの数値相対性理論から得られたGW波形データを利用する。これら2つのアプローチを組み合わせることで,両パラダイムの強みを活用し,スピンマグニチュードの包括的かつ正確な予測を可能にする。実験の結果,提案アーキテクチャは, NSAモデルでは0.05の根平均二乗誤差(RMSE), NSAモデルでは0.03の平均二乗誤差(MSE), シンボリック回帰モデルでは0.12のRMSEを実現している。このモデルを用いて高次多重極波形の処理を訓練し,特異な特徴を示す偏心候補に着目した。以上の結果から,合併におけるスピン大小予測のための頑健かつ解釈可能な枠組みが得られた。これはブラックホールの天体物理学的性質を理解し、GW信号の基盤となる物理を解読することにつながる。 The prediction of spin magnitudes in binary black hole and neutron star mergers is crucial for understanding the astrophysical processes and gravitational wave (GW) signals emitted during these cataclysmic events. In this paper, we present a novel Neuro-Symbolic Architecture (NSA) that combines the power of neural networks and symbolic regression to accurately predict spin magnitudes of black hole and neutron star mergers. Our approach utilizes GW waveform data obtained from numerical relativity simulations in the SXS Waveform catalog. By combining these two approaches, we leverage the strengths of both paradigms, enabling a comprehensive and accurate prediction of spin magnitudes. Our experiments demonstrate that the proposed architecture achieves an impressive root-mean-squared-error (RMSE) of 0.05 and mean-squared-error (MSE) of 0.03 for the NSA model and an RMSE of 0.12 for the symbolic regression model alone. We train this model to handle higher-order multipole waveforms, with a specific focus on eccentric candidates, which are known to exhibit unique characteristics. 
Our results provide a robust and interpretable framework for predicting spin magnitudes in mergers. This has implications for understanding the astrophysical properties of black holes and deciphering the physics underlying the GW signals.	翻訳日:2023-07-21 12:09:57 公開日:2023-07-20
# 自動圧縮によるプライベートフェデレーション学習 Private Federated Learning with Autotuned Compression ( http://arxiv.org/abs/2307.10999v1 ) ライセンス: Link先を確認	Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh	(参考訳) 我々は,圧縮率の設定やチューニングを必要とせずに,プライベートフェデレーション学習におけるコミュニケーションを減らす新しい手法を提案する。我々のオンザフライ方式は,セキュアアグリゲーションとディファレンシャルプライバシを使用して,証明可能なプライバシ保証を維持しつつ,トレーニング中のエラーに基づいて圧縮率を自動的に調整する。提案手法は, 平均推定において, 「問題の硬さ」に適応し, 最小の相互作用性で適応できることを示す。本手法は,チューニングを必要とせず,良好な圧縮率を達成し,実世界のデータセット上での有効性を示す。 We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the "hardness of the problem" with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.	翻訳日:2023-07-21 12:09:31 公開日:2023-07-20
# dream: ブラックボックスモデルのドメインフリーリバースエンジニアリング属性 DREAM: Domain-free Reverse Engineering Attributes of Black-box Model ( http://arxiv.org/abs/2307.10997v1 ) ライセンス: Link先を確認	Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang	(参考訳) ディープラーニングモデルは通常、マシンラーニングプラットフォームにデプロイされるブラックボックスである。以前の研究では、ターゲットのブラックボックスニューラルネットワークの属性(例えば、畳み込みレイヤの数)がクエリのシーケンスを通じて露呈できることが示されている。これらの作業では、ターゲットモデルを事前にトレーニングするために使用するデータセットを仮定し、このデータセットをモデル属性アタックに利用する。しかし、実際にターゲットブラックボックスモデルのトレーニングデータセットにアクセスすることは困難である。したがって、このケースでターゲットブラックボックスモデルの属性が明らかにされるかどうかは疑わしい。本稿では,対象モデルのトレーニングデータセットの可用性を必要とせず,ドリームと呼ばれるブラックボックスターゲットモデルの属性をドメインに依存しないリバースエンジニアリングする新たな問題を調査し,この問題を分散(ood)一般化問題として位置づけることで,汎用的・原則的な枠組みを提案する。このようにして、ターゲットブラックボックスモデルの属性を未知のトレーニングデータで逆推論するために、ドメインに依存しないモデルを学ぶことができる。これにより,本手法は,強力な一般化能力を持つモデル属性リバースエンジニアリングにおいて,任意の領域に優雅に適用できる種類の1つである。広範な実験を行い,提案手法がベースラインよりも優れていることを検証した。 Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. There is a crucial limitation: these works assume the dataset used for training the target model to be known beforehand and leverage this dataset for model attribute attack. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could still be revealed in this case is doubtful. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out-of-distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. 
This allows our method to be applied gracefully to an arbitrary domain for model attribute reverse engineering, with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.	翻訳日:2023-07-21 12:09:21 公開日:2023-07-20
# 生音楽生成のためのプログレッシブ蒸留拡散 Progressive distillation diffusion for raw music generation ( http://arxiv.org/abs/2307.10994v1 ) ライセンス: Link先を確認	Svetlana Pavlova	(参考訳) 本稿では,生のオーディオファイルを生成するタスクに,新たなディープラーニングアプローチを適用することを目的とする。これは近年の深層生成モデルである拡散モデルに基づいている。この新しい手法は画像生成において際立った結果を示している。コンピュータビジョンコミュニティによって、これらのモデルに多くの焦点が当てられている。一方で、波形領域の音楽生成など、他の種類のアプリケーションに対して与えられたものはごくわずかである。本稿では,1次元u-netを用いたプログレッシブ蒸留拡散の非条件生成モデルを実装した。次に、拡散の異なるパラメータと完全な結果におけるそれらの値の比較を示す。この方法で実装された方法の大きな利点は、1チャンネル128×384から3チャンネル128×128メルスペクトログラムへの変換とループ生成を使用して、オーディオ処理と生成の進捗に対処できるという事実である。経験的比較は、異なる自己収集データセット間で実現される。 This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This new type of method has recently shown outstanding results with image generation. A lot of focus has been given to those models by the computer vision community. On the other hand, very little attention has been given to other types of applications such as music generation in the waveform domain. In this paper, an unconditional generative model applied to music is implemented: progressive distillation diffusion with a 1D U-Net. Then, a comparison of different diffusion parameters and their value in the full result is presented. One big advantage of the methods implemented in this work is that the model is able to handle progressive audio processing and generation, using a transformation from 1-channel 128 x 384 to 3-channel 128 x 128 mel-spectrograms and looped generation. The empirical comparisons are carried out across different self-collected datasets.	翻訳日:2023-07-21 12:08:57 公開日:2023-07-20
# 高密度サンプルディープラーニング Dense Sample Deep Learning ( http://arxiv.org/abs/2307.10991v1 ) ライセンス: Link先を確認	Stephen Josè Hanson, Vivek Yadev, Catherine Hanson	(参考訳) 1980年代に最初に提案されたニューラルネットワークアルゴリズムの変種であるdeep learning(dl)は、言語翻訳、タンパク質の折り畳み、自動運転車、最近では人間に似た言語モデル(チャットボット)に至るまで、人工知能(ai)において驚くべき進歩を遂げた。ディープラーニング(dl)ネットワークの利用は増加しているが、これらのネットワークをさまざまなアプリケーションで効果的にする学習メカニズムや表現については、実際にはほとんど理解されていない。答えの一部はアーキテクチャの巨大なスケールでなければならないし、もちろんデータの大規模なスケールでなければならない。しかし、深層学習表現の性質はほとんど不明である。残念なことに、数百万から数十億のトークンを持つトレーニングセットには未知のコンビネータがあり、数百万から数十億の隠れたユニットを持つネットワークは容易に可視化できず、そのメカニズムは容易に明らかにできない。本稿では,これらの質問を高密度サンプルタスク(最低500個以上のトークンを含む5つのユニークなトークン)における大きな (1.24M 重量; VGG) DL を用いて探索し,カテゴリ構造と特徴構成の出現をより注意深く追従することを可能にする。これらの結果から,dlの学習ダイナミクスに関する基礎的な観察を収集し,本研究に基づく複雑な特徴構築の新たな理論を提案する。 Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (chatbots), all of which seemed intractable until very recently. Despite the growing use of Deep Learning (DL) networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and of course the large scale of the data, since not much has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized and their mechanisms cannot be easily revealed. 
In this paper, we explore these questions with a large (1.24M weights; VGG) DL in a novel high density sample task (5 unique tokens with at minimum 500 exemplars per token) which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.	翻訳日:2023-07-21 12:08:43 公開日:2023-07-20
# 機械学習回帰におけるトレーニングセット充填距離の最小化の検討 Investigating minimizing the training set fill distance in machine learning regression ( http://arxiv.org/abs/2307.10988v1 ) ライセンス: Link先を確認	Paolo Climaco and Jochen Garcke	(参考訳) 多くの機械学習回帰手法は予測モデルをトレーニングするために大きなデータセットを利用する。しかし、計算上の制限やラベル付けコストが高いため、大規模なデータセットを使用することは不可能である。したがって、計算効率を保ちながらモデル性能を最大化するためには、未ラベルデータポイントのプールから小さなトレーニングセットをサンプリングすることが不可欠である。本研究では,選択した集合の充填距離を最小化するためのサンプリング手法を提案する。我々は,データ特徴の知識を条件として,トレーニングセット満杯距離に線形に依存する最大予測誤差の上限を導出する。経験的検証のために、2つのデータセット上で2つの回帰モデルを用いて実験を行う。実験により, 充填距離を最小化することを目的としたトレーニングセットの選択により, 境界を最小化することで, 各種回帰モデルの最大予測誤差を大幅に低減し, 既存のサンプリングアプローチを高いマージンで上回ることを示した。 Many machine learning regression methods leverage large datasets for training predictive models. However, using large datasets may not be feasible due to computational limitations or high labelling costs. Therefore, sampling small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining computational efficiency. In this work, we study a sampling approach aimed at minimizing the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error that linearly depends on the training set fill distance, conditional on the knowledge of data features. For empirical validation, we perform experiments using two regression models on two datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing the bound, significantly reduces the maximum prediction error of various regression models, outperforming existing sampling approaches by a large margin.	翻訳日:2023-07-21 12:08:15 公開日:2023-07-20
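To make the fill distance in the abstract above concrete, here is a minimal sketch. It assumes the usual definition (the largest distance from any pool point to its nearest selected point) and uses greedy farthest point sampling as the selection heuristic. Whether this matches the paper's exact procedure is an assumption, and the names `fill_distance`, `farthest_point_sampling`, and the toy `pool` are hypothetical.

```python
import math

def fill_distance(pool, selected):
    """Largest distance from any pool point to its nearest selected point."""
    return max(min(math.dist(x, s) for s in selected) for x in pool)

def farthest_point_sampling(pool, k, start=0):
    """Greedily pick k pool indices (assumed heuristic, not taken from the paper).

    Each new pick is the pool point farthest from the current selection.
    """
    chosen = [start]
    # nearest[i] = distance from pool[i] to its closest chosen point so far
    nearest = [math.dist(x, pool[start]) for x in pool]
    while len(chosen) < k:
        nxt = max(range(len(pool)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(x, pool[nxt])) for d, x in zip(nearest, pool)]
    return chosen

# Toy unlabelled pool: three points near the origin, three spread out
pool = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.0), (0.0, 1.0)]
greedy = [pool[i] for i in farthest_point_sampling(pool, 3)]
naive = pool[:3]  # simply take the first three points
print(fill_distance(pool, greedy), fill_distance(pool, naive))
```

On this toy pool the greedy selection covers the spread-out points, so its fill distance is smaller than that of the naive prefix selection.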
# 機械的因果グラフによる決定理論の特徴付け Characterising Decision Theories with Mechanised Causal Graphs ( http://arxiv.org/abs/2307.10987v1 ) ライセンス: Link先を確認	Matt MacDermott, Tom Everitt, and Francesco Belardinelli	(参考訳) 自分の決定は私の期待する成果に対する信念にどのように影響を与えるべきか? ある行動をとることで、自分自身をある種の人と見なすなら、他人が私をどう見ているか、そして私と似た人をどう見ているかに影響を与えます。これは私の期待するユーティリティ計算に影響し、どのアクションがベストかを変更できます。議論の対象となるかどうか、どのように考えるべきかは、明らかな決定理論、因果決定理論、機能的な決定理論を含む、議論の的となっている。本稿では、機械化された因果モデルを用いて、最も重要な決定理論を特徴づけ、区別し、異なる決定理論の分類を生成できることを示す。 How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and generate a taxonomy of different decision theories.	翻訳日:2023-07-21 12:07:59 公開日:2023-07-20
# metric3d: 1つの画像からゼロショットメトリック3d予測へ Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image ( http://arxiv.org/abs/2307.10984v1 ) ライセンス: Link先を確認	Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen	(参考訳) 画像から正確な3dシーンを再構築することは、長年のビジョン課題だ。単一像再構成問題の不備により、最もよく確立された手法は多視点幾何学に基づいている。 state-of-the-art (sota) 単眼距離推定法は単一のカメラモデルしか処理できず、距離曖昧性のため混合データトレーニングを行うことができない。一方、大きな混合データセットで訓練されたSOTA単眼法は、実世界のメトリクスを復元できないアフィン不変深さを学習することでゼロショット一般化を達成する。本研究では,ゼロショット単眼距離モデルにおける鍵は,大規模データトレーニングと様々なカメラモデルによる距離曖昧性解消の組み合わせにあることを示す。そこで本稿では,曖昧性問題に明示的に対処し,既存の単眼モデルに無益に接続可能な標準カメラ空間変換モジュールを提案する。当社のモジュールを搭載した単眼モデルは、数千台のカメラモデルを備えた800万以上のイメージで安定してトレーニングすることが可能です。 7つのゼロショットベンチマークでSOTA性能を示す実験を行った。特に,本手法は,第2回単眼深度推定チャレンジで優勝した。提案手法は, ランダムに収集したインターネット画像上での計測3次元構造の正確な復元を可能にする。潜在的な利点は下流のタスクにまで拡張され、モデルにプラグインするだけで大幅に改善できます。例えば,本モデルではモノクロSLAMのスケールドリフト問題(第1図)を緩和し,高品質な計量スケール高密度マッピングを実現する。コードはhttps://github.com/YvanYin/Metric3Dで入手できる。 Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models. 
Equipped with our module, monocular models can be stably trained with over 8 million images with thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular-SLAM (Fig. 1), leading to high-quality metric scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.	翻訳日:2023-07-21 12:07:47 公開日:2023-07-20
# クラスタ対応半教師付き学習:クラスタリングを学習する関係知識蒸留 Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering ( http://arxiv.org/abs/2307.11030v1 ) ライセンス: Link先を確認	Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward	(参考訳) 教師と生徒のモデル間の特徴(関係)にマッチする(関係)知識蒸留の実証的成功と実用的意義にもかかわらず、対応する理論解釈は様々な知識蒸留パラダイムに限定されている。本研究では, 半教師付き分類問題に着目し, 関係知識蒸留(RKD)の理論的理解に向けて最初の一歩を踏み出した。まず,教師モデルによって示される集団誘発グラフ上で,rkdをスペクトルクラスタリングとしてキャスティングすることから始める。予測値と基底値のクラスタリングのばらつきを定量化するクラスタリングエラーの概念を用いて,人口を超えたrkdがクラスタリングエラーの低減につながることを示す。さらに,非ラベルサンプルを限定してrkdに限定したサンプル複雑性を提供する。半教師付き学習では,クラスタ認識型半教師付き学習の一般的なフレームワークを通じて,クラスタリングエラーを想定するRKDのラベル効率をさらに向上する。最後に、このクラスタ対応フレームワークにデータの強化一貫性の規則化を統一することにより、正確なクラスタリングを学習する共通の効果にもかかわらず、rkdはスペクトルクラスタリングを通じて「グローバル」な視点を促進するが、一貫性の規則化は拡張を通じた「ローカル」な視点に焦点を当てる。 Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate the label efficiency of RKD through a general framework of cluster-aware semi-supervised learning that assumes low clustering errors. 
Finally, by unifying data augmentation consistency regularization into this cluster-aware framework, we show that despite the common effect of learning accurate clusterings, RKD facilitates a "global" perspective through spectral clustering, whereas consistency regularization focuses on a "local" perspective via expansion.	翻訳日:2023-07-21 12:02:16 公開日:2023-07-20
# ノイズ量子コンピュータ上でのサイクル離散時間量子ウォーク Cycle discrete-time quantum walks on a noisy quantum computer ( http://arxiv.org/abs/2307.11027v1 ) ライセンス: Link先を確認	Vivek Wadhia, Nicholas Chancellor and Viv Kendon	(参考訳) 量子コンピューティングの急速な発展により、様々なアプリケーションに対する量子アルゴリズムへの関心が高まっている。量子ウォークは、量子アルゴリズムでの使用の可能性から、関心の高まりも経験している。 qiskitソフトウェアパッケージを使用して、ibmが提供する量子コンピュータの現在の世代がいかに正確にサイクル離散時間量子ウォークをシミュレートできるかをテストする。 ibmq_quitoとして知られるIBM量子デバイス上で、8ノード、8ステップウォーク、より単純な4ノード、4ステップの離散時間量子ウォークを実装し、各ウォークの各ステップに対する結果を示す。 ibmq_santiago量子デバイスのノイズレベルを少なくとも94%削減し、16ノード、16ステップサイクルの離散時間量子ウォークを適度な忠実度レベルにするために、カスタムノイズモデルを開発した。 The rapid development of quantum computing has led to increasing interest in quantum algorithms for a variety of different applications. Quantum walks have also experienced a surge in interest due to their potential use in quantum algorithms. Using the qiskit software package, we test how accurately the current generation of quantum computers provided by IBM can simulate a cycle discrete-time quantum walk. Implementing an 8-node, 8-step walk and a simpler 4-node, 4-step discrete-time quantum walk on an IBM quantum device known as ibmq_quito, the results for each step of the respective walks are presented. A custom noise model is developed in order to estimate that noise levels in the ibmq_santiago quantum device would need to be reduced by at least 94% in order to execute a 16-node, 16-step cycle discrete-time quantum walk to a reasonable level of fidelity.	翻訳日:2023-07-21 12:01:52 公開日:2023-07-20
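As a reference point for the walk described in the abstract above, the following is a minimal noise-free classical simulation of a coined discrete-time quantum walk on a cycle. It does not use Qiskit or any IBM device, and the Hadamard coin, the initial state (coin 0 at node 0), and the shift directions are assumptions made for illustration, not details confirmed by the abstract. The function name `cycle_walk` is hypothetical.

```python
import math

def cycle_walk(n_nodes, n_steps, start=0):
    """Position probabilities of an ideal Hadamard-coined walk on a cycle.

    Assumptions (not from the paper): Hadamard coin, walker starts at
    `start` with coin state 0, coin 0 shifts to x+1 and coin 1 to x-1.
    """
    # amp[c][x]: complex amplitude of coin state c at node x
    amp = [[0j] * n_nodes for _ in range(2)]
    amp[0][start] = 1 + 0j
    h = 1 / math.sqrt(2)
    for _ in range(n_steps):
        # Coin flip: Hadamard on the coin register at every node
        up = [h * (amp[0][x] + amp[1][x]) for x in range(n_nodes)]
        down = [h * (amp[0][x] - amp[1][x]) for x in range(n_nodes)]
        # Conditional shift with periodic boundary (the cycle)
        amp[0] = [up[(x - 1) % n_nodes] for x in range(n_nodes)]
        amp[1] = [down[(x + 1) % n_nodes] for x in range(n_nodes)]
    return [abs(amp[0][x]) ** 2 + abs(amp[1][x]) ** 2 for x in range(n_nodes)]

# 8-node, 8-step walk, the larger of the two sizes run on hardware in the abstract
probs = cycle_walk(8, 8)
print([round(p, 3) for p in probs])
```

An ideal distribution like this is the kind of baseline against which noisy hardware counts could be compared. Because every step moves the walker by exactly one node, an even number of steps on an even cycle leaves probability only on nodes of the starting parity.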
# ストリーマー自己表現の再構築としてのVTubingの検討:アイデンティティ,パフォーマンス,ジェンダー Investigating VTubing as a Reconstruction of Streamer Self-Presentation: Identity, Performance, and Gender ( http://arxiv.org/abs/2307.11025v1 ) ライセンス: Link先を確認	Qian Wan and Zhicong Lu	(参考訳) vtubers(virtual youtuber)は、アニメーション2dまたは3d仮想アバターを使ってストリーミングコンテンツを制作するライブストリーマーである。近年、世界中のVTuberクリエイターや視聴者の数が大幅に増加している。この実践は、視聴者のエンゲージメント行動や知覚などのトピックに研究の注意を向けてきたが、アニメーションアバターは、自身の身体を使用する従来のライブストリーミングよりもアイデンティティとパフォーマンスの柔軟性を提供するため、この柔軟性がクリエイター自身の提示方法にどのように影響するかはほとんど研究されていない。この研究は、16人の中国語話者のvtuberのストリーミングプラクティスの質的研究の結果を提示することで、このギャップを埋めようとしている。データによると、ライブストリーミングで使用された仮想アバターは、インフレーションされたプレゼンテーションを使ってクリエイターが自らをプレゼンする機会を与え、視聴者と包括的な対話をもたらした。結果はまた、虚偽の環境に置かれている間、VTubersの膨らみ、しばしばセクシュアライズされた性表現も明らかにした。 VTubingの社会技術的側面は、性嫌がらせや性差別を減らし、自己目的化の懸念も高めた。 VTubers, or Virtual YouTubers, are live streamers who create streaming content using animated 2D or 3D virtual avatars. In recent years, there has been a significant increase in the number of VTuber creators and viewers across the globe. This practice has drawn research attention to topics such as viewers' engagement behaviors and perceptions. However, although animated avatars offer more identity and performance flexibility than traditional live streaming where one uses their own body, little research has focused on how this flexibility influences how creators present themselves. This research thus seeks to fill this gap by presenting results from a qualitative study of 16 Chinese-speaking VTubers' streaming practices. The data revealed that the virtual avatars that were used while live streaming afforded creators opportunities to present themselves using inflated presentations and resulted in inclusive interactions with viewers. The results also unveiled the inflated, and often sexualized, gender expressions of VTubers while they were situated in misogynistic environments. 
The socio-technical facets of VTubing were found to potentially reduce sexual harassment and sexism, whilst also raising self-objectification concerns.	翻訳日:2023-07-21 12:01:37 公開日:2023-07-20
# 検索強化による大規模言語モデルの事実知識境界の検討 Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation ( http://arxiv.org/abs/2307.11019v1 ) ライセンス: Link先を確認	Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang	(参考訳) 知識集約的なタスク(例えば、オープンドメイン質問応答(QA))は、かなりの量の事実知識を必要とし、しばしば援助のために外部情報に依存する。最近の大規模言語モデル(例えばchatgpt)は、知識集約的なタスクを含む、世界的知識による幅広いタスクの解決において印象的な能力を示している。しかし、LLMが実際の知識境界、特に検索強化を取り入れた場合の行動をどのように認識できるかは、まだ不明である。本研究では,オープンドメインQA上でのLLMの実態知識境界と検索の増大がLLMに与える影響について,初期分析を行った。特に,3つの主要な研究課題に焦点をあて,QA評価,事前判定,後部判定による分析を行った。 llmが質問に対する回答能力と回答の正確性に不当な自信を持っている証拠を示す。さらに,検索の強化は,llmsの知識境界に対する意識向上に有効なアプローチであることが証明され,その判断能力が向上した。さらに, LLMは, 回答の定式化に際し, 提案した検索結果に依存する傾向があり, これらの結果の質がそれらの信頼性に大きく影響することがわかった。この作業を再現するコードはhttps://github.com/RUCAIBox/LLM-Knowledge-Boundaryで公開されている。 Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. 
Additionally, we also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.	翻訳日:2023-07-21 12:01:14 公開日:2023-07-20
# Amortized Variational Inference: When and Why? Amortized Variational Inference: When and Why? ( http://arxiv.org/abs/2307.11018v1 ) ライセンス: Link先を確認	Charles C. Margossian and David M. Blei	(参考訳) amortized variational inference (a-vi) は確率モデルにおいて生じる難解な後方分布を近似する手法である。 A-VI の定義的特徴は、各観測結果を局所潜在変数の近似後部へマッピングする大域的推論関数を学ぶことである。これは、各潜在変数の近似分布のパラメータを直接学習するより古典的な因子化(平均場)変分推論(f-vi)とは対照的である。深層生成モデルでは、A-VIは局所潜伏変数の推論を高速化する計算トリックとして用いられる。本稿では, A-VI を F-VI の代替として検討した。 a-vi は、退化族が因子化された族の部分集合であるため、f-vi の最適解よりも低いkullback-leibler 分岐を持つ近似を生成することができない。したがって、中心的な理論的問題は、A-VIがF-VIの最適解を得るときに特徴づけることである。我々は、理論上F-VIの最適性を達成できるモデルと推論関数の両方の条件を導出する。より深い生成モデルを含む幅広い階層モデルに対して、A-VIとF-VIのギャップを埋めることが可能であることを示す。さらに、より広範なモデルのクラスでは、推論関数のドメインを拡張して償却を可能な戦略にする方法と方法を確立します。最後に、隠れマルコフモデルやガウス過程を含む特定のモデルにおいて、a-vi はどんなに表現力のある推論関数であっても f-vi の解と一致しないことを証明する。また、A-VIを実験的に研究する [...] Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. 
We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically [...]	翻訳日:2023-07-21 12:00:50 公開日:2023-07-20
# 心筋梗塞予測のための多目的ポイントクラウドオートエンコーダ Multi-objective point cloud autoencoders for explainable myocardial infarction prediction ( http://arxiv.org/abs/2307.11017v1 ) ライセンス: Link先を確認	Marcel Beetz, Abhirup Banerjee, Vicente Grau	(参考訳) 心筋梗塞(mi)は、世界で最も一般的な死因の1つである。クリニックで一般的に使用される画像ベースのバイオマーカー、例えば放出分画は、心臓の3D解剖学におけるより複雑なパターンを捉えることができず、診断精度が制限される。本稿では,心臓解剖学と機能学の多クラス3dポイントクラウド表現に基づいて,梗塞予測のための新しい幾何学的深層学習手法として,多目的ポイントクラウドオートエンコーダを提案する。そのアーキテクチャは、低次元の潜在空間で接続された複数のタスク固有の分岐で構成され、リコンストラクションとmi予測の両方の効果的な多目的学習を可能にし、また、解釈可能な潜在空間で病理学的に特異的な3d形状情報をキャプチャする。さらに、ポイントクラウドベースのディープラーニング操作を備えた階層的ブランチ設計により、高分解能の解剖学的ポイントクラウド上で直接、効率的なマルチスケール機能学習が可能になる。大規模な英国バイオバンクデータセットを用いた実験では,マルチオブジェクト・ポイント・クラウド・オートエンコーダは,画像の画素解像度より下方にある予測と入力の解剖学の間のチャムファー距離で,複数の時間的3次元形状を正確に再構成することができる。提案手法は,入射MI予測処理における複数の機械学習および深層学習ベンチマークを,受信者動作曲線の下での面積で19%向上させる。また,そのタスクに特有なコンパクトな潜在性空間は,対象の符号化と対応する3次元形状との間に臨床的に妥当な関係を持つ分離可能な制御およびmiクラスターを示し,その予測可能性を示す。 Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. 
In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.	翻訳日:2023-07-21 12:00:19 公開日:2023-07-20
# 未知の動的システムのためのフローマップ学習:概要,実装,ベンチマーク Flow Map Learning for Unknown Dynamical Systems: Overview, Implementation, and Benchmarks ( http://arxiv.org/abs/2307.11013v1 ) ライセンス: Link先を確認	Victor Churchill, Dongbin Xiu	(参考訳) フローマップ学習(FML)は、ディープニューラルネットワーク(DNN)とともに、未知の動的システムのデータ駆動モデリングを約束している。 FMLの注目すべき特徴は、正確な数学的モデルが存在しなくても、部分的に観測されたシステムの正確な予測モデルを作成することができることである。本稿では、FMLフレームワークの概要と、その実装を成功させるために重要な計算の詳細について述べる。また,未知の力学系を学習するための,よく定義されたベンチマーク問題も提示する。これらの問題の数値的な詳細は、それらのFMLの結果とともに示され、問題を横断的に検証し、結果が再現可能であることを保証する。 Flow map learning (FML), in conjunction with deep neural networks (DNNs), has shown promise for data-driven modeling of unknown dynamical systems. A remarkable feature of FML is that it is capable of producing accurate predictive models for partially observed systems, even when their exact mathematical models do not exist. In this paper, we present an overview of the FML framework, along with the important computational details for its successful implementation. We also present a set of well-defined benchmark problems for learning unknown dynamical systems. All the numerical details of these problems are presented, along with their FML results, to ensure that the problems are accessible for cross-examination and the results are reproducible.	翻訳日:2023-07-21 11:59:53 公開日:2023-07-20
# 深層学習テストのためのニューロン感度誘導型テストケース選択 Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing ( http://arxiv.org/abs/2307.11011v1 ) ライセンス: Link先を確認	Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Bocheng Xiao, Heming Cui	(参考訳) Deep Neural Networks (DNN) は様々なタスク(例えば自律運転、医療診断)に対処するためにソフトウェアに広くデプロイされている。しかし、経済的な損失を招き、人間の安全を脅かす誤った行動も生み出す可能性がある。 DNNの誤った振る舞いを明らかにして修正するために、DNN開発者はしばしば、自然界から豊富なラベル付けされていないデータセットを収集し、それらをラベル付けしてDNNモデルをテストする。しかし、多くのラベルのないデータセットを適切にラベル付けすることは、非常に高価で時間がかかります。上記の問題に対処するために,NSS(Neuron Sensitivity guided test case Selection)を提案し,ラベルなしデータセットから有用なテストケースを選択することでラベリング時間を短縮する。 NSSは、テストケースによって引き起こされる内部ニューロンの情報を利用して、重要なテストケースを選択する。 SOTAベースライン法と比較して,広範に使用される4つのデータセットとよく設計された4つのDNNモデルを用いてNSSを評価する。その結果,NSSはテストケースの障害トリガ発生確率とモデル改善能力の評価に有効であることがわかった。具体的には、ベースラインアプローチと比較して高いフォールト検出率(例えばMNIST & LeNet1実験でラベルなしデータセットから5%のテストケースを選択する場合、NSSはベースラインより20%高い81.8%のフォールト検出率を得ることができる)を得ることができる。 Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they could also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal the incorrect behaviors in DNN and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled datasets is a highly expensive and time-consuming task. To address the above-mentioned problem, we propose NSS, Neuron Sensitivity guided test case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neuron's information induced by test cases to select valuable test cases, which have high confidence in causing the model to behave incorrectly. We evaluate NSS with four widely used datasets and four well-designed DNN models compared to SOTA baseline methods. 
The results show that NSS performs well in assessing the test cases' probability of fault triggering and model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault detection rate, 20% higher than the baselines).	翻訳日:2023-07-21 11:59:42 公開日:2023-07-20
# 反射エントロピーと計算可能なクロスノルムネガティビティ:自由理論と対称性の解決について On reflected entropy and computable cross-norm negativity: Free theories and symmetry resolution ( http://arxiv.org/abs/2307.11009v1 ) ライセンス: Link先を確認	Clément Berthiere and Gilles Parez	(参考訳) 計算可能なクロスノーム(CCNR)と,CCNR負性度(CCNR Negativity)と呼ばれる関連量に基づく分離性基準を検討する。 CCNR負性度の反射バージョンを導入し、その関係を他の確立された絡み合い関連量、すなわち反射エントロピーと作用素エントロピーとを議論する。自由フェルミオン理論とボゾン理論では、2点相関関数の項で正確な公式を導出し、体系的な数値的な研究と原理的には解析的処理を可能にする。大域的な$U(1)$対称性を持つ系に対しては、対称性を解いた反射エントロピーとCCNR負性度を研究する。我々は隣接する区間の荷電モーメントに対する共形場理論(CFT)の結果を提供し、数値計算との完全な一致を得る。我々は,自由フェルミオンモデルと自由ボソンモデルの両方に対して,反射エントロピーとCCNR負性度の等分配を観察する。最初の電荷依存補正はフェルミオンに対して予想され、ボソンのCFT計算から導かれる。 We investigate a separability criterion based on the computable cross-norm (CCNR), and a related quantity called the CCNR negativity. We introduce a reflected version of the CCNR negativity, and discuss its connection with other well-established entanglement-related quantities, namely the reflected entropy and the operator entanglement entropy. For free fermionic and bosonic theories, we derive exact formulas in terms of two-point correlation functions, which allows for systematic numerical investigations and, in principle, analytical treatments. For systems with a global $U(1)$ symmetry, we study the symmetry-resolved reflected entropy and CCNR negativity. We provide conformal field theory (CFT) results for the charged moments in the case of adjacent intervals, finding perfect agreement with the numerics. We observe an equipartition of reflected entropies and CCNR negativities, both for free-fermion and free-boson models. The first charge-dependent corrections are conjectured for fermions, and worked out from the CFT calculations for bosons.	翻訳日:2023-07-21 11:59:15 公開日:2023-07-20
# 二重非絡み合い操作による蒸留性絡み合い Distillable entanglement under dually non-entangling operations ( http://arxiv.org/abs/2307.11008v1 ) ライセンス: Link先を確認	Ludovico Lami, Bartosz Regula	(参考訳) ノイズ量子状態からエンタングルメントを蒸留できる正確な速度を計算することは、量子情報における最も長い疑問の1つである。 dne(dually non-entangling)オペレーションのセットの下で、絡み合い蒸留の正確な解を与える -- 一般的に考えられる局所操作と古典的コミュニケーションの緩和であり、分離可能な状態と測定のセットを保存するすべてのチャネルを含んでいる。本研究では, DNE蒸留可能なエンタングルメントは, 議論を分離可能な測定で測定する正規化相対エントロピーの修正版と一致することを示す。 ours は、エンタングルメント理論における任意の種類の自由操作の下での蒸留可能なエンタングルメントの2番目に知られている正規化公式である。我々の発見の直接の結果は、DNEの下では、絡み合った状態から絡み合いを蒸留できるということである。第2の主結果として,dne蒸留性エンタングルメントの一般上界を構成することにより,エンタングルメントの分離可能な相対エントロピーが、エンタングルメントの標準相対エントロピーの正規化よりも厳密に小さいことを証明した。これは [Li/Winter, CMP 326, 63 (2014)] の開問題を解く。 Computing the exact rate at which entanglement can be distilled from noisy quantum states is one of the longest-standing questions in quantum information. We give an exact solution for entanglement distillation under the set of dually non-entangling (DNE) operations -- a relaxation of the typically considered local operations and classical communication, comprising all channels which preserve the sets of separable states and measurements. We show that the DNE distillable entanglement coincides with a modified version of the regularised relative entropy of entanglement in which the arguments are measured with a separable measurement. Ours is only the second known regularised formula for the distillable entanglement under any class of free operations in entanglement theory, after that given by Devetak and Winter for one-way LOCCs. An immediate consequence of our finding is that, under DNE, entanglement can be distilled from any entangled state. As our second main result, we construct a general upper bound on the DNE distillable entanglement, using which we prove that the separably measured relative entropy of entanglement can be strictly smaller than the regularisation of the standard relative entropy of entanglement. 
This solves an open problem in [Li/Winter, CMP 326, 63 (2014)].	翻訳日:2023-07-21 11:58:55 公開日:2023-07-20
# 科学ワークフローにおけるネットワーク内記憶キャッシュの有効性と予測可能性 Effectiveness and predictability of in-network storage cache for scientific workflows ( http://arxiv.org/abs/2307.11069v1 ) ライセンス: Link先を確認	Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas	(参考訳) 大規模な科学的なコラボレーションでは、複数の科学者が同じファイルセットにアクセスし、異なる分析を行い、遠くにある大量の共有データに繰り返しアクセスする。これらのデータアクセスは、距離による遅延が長く、広域ネットワーク上で利用可能な帯域幅が限られている。広域ネットワークトラフィックとデータアクセス遅延を低減するため、新しいネットワークサービスとして地域データストレージキャッシュがインストールされている。科学的応用におけるキャッシュシステムの有効性を検討するため,南カリフォルニアのペタバイトスケールキャッシュを用いて高エネルギー物理実験を行った。約3TBの運用ログを調べることで、このキャッシュはワイドエリアネットワークから67.6%のファイルリクエストを削除し、ワイドエリアネットワーク上のトラフィック量を1日平均12.3TB(35.4%)削減した。トラフィック量(35.4%)の削減は、より大きなファイルが再利用される可能性が低いため、ファイル数(67.6%)の削減よりも少ない。このデータアクセスパターンの違いにより、キャッシュシステムは、より大きなファイルを処理する際に小さなファイルを削除しないようにポリシーを実装している。また、キャッシュ動作の予測可能性を研究するための機械学習モデルを構築します。テストの結果、このモデルはキャッシュアクセス、キャッシュミス、ネットワークスループットを正確に予測することができ、将来のリソースのプロビジョニングと計画に関する研究に役立ちます。 Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which creates repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access latency, regional data storage caches have been installed as a new networking service. To study the effectiveness of such a cache system in scientific applications, we examine the Southern California Petabyte Scale Cache for a high-energy physics experiment. By examining about 3TB of operational logs, we show that this cache removed 67.6% of file requests from the wide-area network and reduced the traffic volume on the wide-area network by 12.3TB (or 35.4%) on an average day. The reduction in the traffic volume (35.4%) is less than the reduction in file counts (67.6%) because the larger files are less likely to be reused. 
Due to this difference in data access patterns, the cache system has implemented a policy to avoid evicting smaller files when processing larger files. We also build a machine learning model to study the predictability of the cache behavior. Tests show that this model is able to accurately predict the cache accesses, cache misses, and network throughput, making the model useful for future studies on resource provisioning and planning.	翻訳日:2023-07-21 11:49:58 公開日:2023-07-20
# CNOS:CADベースの新しいオブジェクトセグメンテーションのための強力なベースライン CNOS: A Strong Baseline for CAD-based Novel Object Segmentation ( http://arxiv.org/abs/2307.11067v1 ) ライセンス: Link先を確認	Van Nguyen Nguyen, Tomas Hodan, Georgy Ponimatkin, Thibault Groueix, Vincent Lepetit	(参考訳) CADモデルを用いて,RGB画像中の未確認オブジェクトを分割する手法を提案する。最近の強力な基盤モデルであるDINOv2とSegment Anythingを活用して、記述子を作成し、与えられた入力RGBイメージのバイナリマスクを含む提案を生成する。 CADモデルから生成された参照記述子と提案を一致させることで、モーダルマスクとともに正確なオブジェクトID割り当てを実現する。我々は,本手法がCADに基づく新しいオブジェクトセグメンテーションにおいて,BOP課題の7つのコアデータセットに対する既存のアプローチを,同一のBOP評価プロトコルを用いて19.8%のAPで上回っていることを示す。ソースコードは https://github.com/nv-nguyen/cnos で入手できます。 We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along with modal masks. We experimentally demonstrate that our method achieves state-of-the-art results in CAD-based novel object segmentation, surpassing existing approaches on the seven core datasets of the BOP challenge by 19.8% AP using the same BOP evaluation protocol. Our source code is available at https://github.com/nv-nguyen/cnos.	翻訳日:2023-07-21 11:49:35 公開日:2023-07-20
# ディープラーニングモデルに基づく運転政策予測 Driving Policy Prediction based on Deep Learning Models ( http://arxiv.org/abs/2307.11058v1 ) ライセンス: Link先を確認	Fuxiao Liu	(参考訳) 本研究では,通常のカメラからの映像フレームの視覚的特徴とポイントクラウドスキャナからの深度情報を組み合わせたエンドツーエンドシステムを構築し,運転方針(車両速度と操舵角度)を予測する。予測結果を実世界の経験豊富なドライバーによる標準的な行動と比較することにより,システムの安全性を検証した。実験結果から,テストケースの少なくとも半数(モデルに応じて50%から80%)で精度の高い予測が可能であり,複合機能の利用はビデオフレームのみを使用するよりも,ほとんどのケースで性能が向上した。 In this project, we implemented an end-to-end system that takes in combined visual features of video frames from a normal camera and depth information from a point cloud scanner, and predicts driving policies (vehicle speed and steering angle). We verified the safety of our system by comparing the predicted results with standard behaviors by real-world experienced drivers. Our test results show that the predictions can be considered accurate in at least half of the testing cases (50% to 80%, depending on the model), and that using combined features improved performance in most cases compared with using video frames only.	翻訳日:2023-07-21 11:49:19 公開日:2023-07-20
# 二次元テンソルネットワーク計算の複雑さに関するランダムな洞察 Random insights into the complexity of two-dimensional tensor network calculations ( http://arxiv.org/abs/2307.11053v1 ) ライセンス: Link先を確認	Sofia Gonzalez-Garcia, Shengqi Sang, Timothy H. Hsieh, Sergio Boixo, Guifre Vidal, Andrew C. Potter and Romain Vasseur	(参考訳) 射影絡み合いペア状態(PEPS)は、絡み合い領域の法則に従う量子多体状態のメモリ効率の表現を提供し、二次元(2d)凝縮物質系における基底状態の古典的なシミュレーションの基礎である。しかし、厳密な結果は、2d PEPS状態から観測可能なものを正確に計算することは、一般に計算的に難しい問題であることを示している。しかし、2d PEPSの計算特性の近似スキームは、(狭すぎる)凝縮物質基底状態の大きなサブクラスに対して、定期的に使われ、経験的に成功と見られる。本研究では, ランダム行列理論の哲学を取り入れ, 解析的マッピングを応用し, 大きな結合次元で制御された解析を許容する効果的な複製統計力学モデルに活用し, 概ね2次元ランダムペップを収縮する複雑性を解析する。この統計力学レンズを通して、我々は次のように論じる。一ランダムPEPSのおよそのサンプリング波動関数振幅は、臨界結合次元を超える計算複雑相転移に直面している。二任意の有限結合次元のノルム及び相関関数を総称的に推定することができる。これらの結果は、様々なボンド次元体制に対して数値的に支持される。乱数PEPSに対する上記の結果が、物理的に関連する基底状態を表すPEPSにもより一般的に適用されるかどうかは、重要な未解決問題である。 Projected entangled pair states (PEPS) offer memory-efficient representations of some quantum many-body states that obey an entanglement area law, and are the basis for classical simulations of ground states in two-dimensional (2d) condensed matter systems. However, rigorous results show that exactly computing observables from a 2d PEPS state is generically a computationally hard problem. Yet approximation schemes for computing properties of 2d PEPS are regularly used, and empirically seen to succeed, for a large subclass of (not too entangled) condensed matter ground states. Adopting the philosophy of random matrix theory, in this work we analyze the complexity of approximately contracting a 2d random PEPS by exploiting an analytic mapping to an effective replicated statistical mechanics model that permits a controlled analysis at large bond dimension. 
Through this statistical-mechanics lens, we argue that: i) although approximately sampling wave-function amplitudes of random PEPS faces a computational-complexity phase transition above a critical bond dimension, ii) one can generically efficiently estimate the norm and correlation functions for any finite bond dimension. These results are supported numerically for various bond-dimension regimes. It is an important open question whether the above results for random PEPS apply more generally also to PEPS representing physically relevant ground states.	翻訳日:2023-07-21 11:49:08 公開日:2023-07-20
# hrfnet:衛星画像のローカライズのための高解像度偽造ネットワーク HRFNet: High-Resolution Forgery Network for Localizing Satellite Image Manipulation ( http://arxiv.org/abs/2307.11052v1 ) ライセンス: Link先を確認	Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo	(参考訳) 既存の高解像度衛星画像偽造ローカライズ手法はパッチベースまたはダウンサンプリングベースのトレーニングに依存している。これらのトレーニング手法には、プリスタンと偽造領域の境界の不正確さ、不要なアーティファクトの生成など、大きな欠点がある。本稿では,高分解能画像分割文学に触発された課題に対処するため,衛星画像のフォージェリーローカライゼーションを効果的に実現するためのHRFNetと呼ばれる新しいモデルを提案する。具体的には, 浅い枝と深い枝が組み合わさったモデルにより, RGB と再サンプリング機能を大域的および局所的に統合し, フォージェリーをより正確にローカライズすることができる。メモリ要求と処理速度は既存手法と比較して損なわれないが,本手法が最高の性能を達成することを示すため,様々な実験を行った。 Existing high-resolution satellite image forgery localization methods rely on patch-based or downsampling-based training. Both of these training methods have major drawbacks, such as inaccurate boundaries between pristine and forged regions, the generation of unwanted artifacts, etc. To tackle the aforementioned challenges, inspired by the high-resolution image segmentation literature, we propose a novel model called HRFNet to enable satellite image forgery localization effectively. Specifically, equipped with shallow and deep branches, our model can successfully integrate RGB and resampling features in both global and local manners to localize forgery more accurately. We perform various experiments to demonstrate that our method achieves the best performance, while the memory requirement and processing speed are not compromised compared to existing methods.	翻訳日:2023-07-21 11:48:42 公開日:2023-07-20
# 目標へのパンクラルム:ヒューマン・イン・ザ・ループフィードバックによる目標条件付き探索 Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback ( http://arxiv.org/abs/2307.11049v1 ) ライセンス: Link先を確認	Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta	(参考訳) 探索と報酬の仕様は強化学習の基本的かつ相互に絡み合った課題である。逐次的な意思決定タスクの解決には、報酬関数の慎重な設計や、新規な探索ボーナスの使用が必要である。ヒューマンスーパーバイザーは、探索プロセスを指示するためにループ内で効果的なガイダンスを提供することができるが、このガイダンスを利用する以前の方法は、常に同期した高品質な人間のフィードバックを必要とする。本研究では,非熟練ユーザからの低品質のフィードバックを,散発的で非同期でノイズの多い,ヒューマンガイド探索(huge)と呼ばれる手法を提案する。 HuGEは、シミュレーションだけでなく、実世界でも、厳密な報酬仕様なしで強化学習の探索をガイドしている。人間のフィードバックは探索を手助けするが、探索データから自己監督された学習はバイアスのない政策を生み出す。この手順は、騒々しく非同期な人間のフィードバックを利用して、手作りの報酬設計や探索ボーナスなしでポリシーを学ぶことができる。 HuGEは、専門家でないユーザからのクラウドソースフィードバックを使用して、シミュレーションにおいて、さまざまな困難なマルチステージロボットナビゲーションと操作タスクを学ぶことができる。さらに、このパラダイムは、人間のスーパーバイザーからの非同期フィードバックを使用して、現実世界のロボットで直接学習することができる。 Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. 
This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.	翻訳日:2023-07-21 11:48:27 公開日:2023-07-20
# 連続的強化学習の定義 A Definition of Continual Reinforcement Learning ( http://arxiv.org/abs/2307.11046v1 ) ライセンス: Link先を確認	David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh	(参考訳) 本稿では,継続的強化学習の基盤を開発する。 In this paper we develop a foundation for continual reinforcement learning.	翻訳日:2023-07-21 11:48:01 公開日:2023-07-20
# 有界エージェントの収束について On the Convergence of Bounded Agents ( http://arxiv.org/abs/2307.11044v1 ) ライセンス: Link先を確認	David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh	(参考訳) エージェントがいつ収束したか? 強化学習問題の標準モデルは収束の直接的な定義をもたらす: エージェントがそれぞれの環境状態における振る舞いや性能が変化しなくなると収束する。しかし,学習課題の焦点を環境状態からエージェントの状態へと移すにつれて,エージェントの収束の概念が著しく明確になる。本稿では,有界エージェントを中心とした強化学習問題のフレーミングにおけるエージェント収束の相補的な2つの説明を提案する。第一の見方では、有界エージェントは、エージェントの将来の振る舞いを記述するのに必要な最小の状態数が減少しないときに収束する。第2のビューでは、エージェントの内部状態が変更された場合にのみ、エージェントのパフォーマンスが変化するときのみ、境界エージェントが収束したと述べる。これらの2つの定義の基本的な性質を定め、標準設定における収束の典型的な見解を満たし、それらの性質と関係性に関するいくつかの事実を証明する。これらの視点、定義、分析は、分野の中心的な考え方に明確性をもたらす。 When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.	翻訳日:2023-07-21 11:48:00 公開日:2023-07-20
# Cascade-DETR: 高品質なユニバーサルオブジェクト検出 Cascade-DETR: Delving into High-Quality Universal Object Detection ( http://arxiv.org/abs/2307.11035v1 ) ライセンス: Link先を確認	Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan and Fisher Yu	(参考訳) 一般的な環境でのオブジェクトのローカライゼーションは、視覚システムの基本部分である。 COCOベンチマークで優位に立つ一方で、最近のTransformerベースの検出方法は多様なドメインで競合しない。さらに、これらの手法は複雑な環境でオブジェクトバウンディングボックスを正確に推定するのに苦労している。高品質な普遍物体検出のためのカスケードDETRを提案する。本稿では,対象中心情報を検出デコーダに明示的に統合するカスケード・アテンション・レイヤを提案することにより,多様な領域への一般化と局所化精度を両立させる。さらに精度を高めるために,クエリのスコアリングを再検討する。分類スコアに頼る代わりに、クエリの予想されるiouを予測することで、信頼性が大幅に向上します。最後に、多様なドメインから10のデータセットを含む汎用オブジェクト検出ベンチマークUDB10を紹介する。カスケード-DETRはCOCOの最先端を推し進める一方で、UDB10の全データセット上のDETRベースの検出器を大幅に改善している。厳密な品質要件による改善はさらに顕著である。私たちのコードとモデルはhttps://github.com/syscv/cascade-detrでリリースされる予定です。 Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to very accurately estimate the object bounding boxes in complex environments. We introduce Cascade-DETR for high-quality universal object detection. We jointly tackle the generalization to diverse domains and localization accuracy by proposing the Cascade Attention layer, which explicitly integrates object-centric information into the detection decoder by limiting the attention to the previous box prediction. To further enhance accuracy, we also revisit the scoring of queries. Instead of relying on classification scores, we predict the expected IoU of the query, leading to substantially more well-calibrated confidences. Lastly, we introduce a universal object detection benchmark, UDB10, that contains 10 datasets from diverse domains. While also advancing the state-of-the-art on COCO, Cascade-DETR substantially improves DETR-based detectors on all datasets in UDB10, even by over 10 mAP in some cases. 
The improvements under stringent quality requirements are even more pronounced. Our code and models will be released at https://github.com/SysCV/cascade-detr.	翻訳日:2023-07-21 11:47:43 公開日:2023-07-20
# Embroid: 教師なし予測の平滑化は、わずかなショットの分類を改善できる Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ( http://arxiv.org/abs/2307.11031v1 ) ライセンス: Link先を確認	Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré	(参考訳) 近年の研究では、手動アノテーションが高価である領域において、言語モデル(LM)のプロンプトベースの学習機能がデータラベリングの自動化に適していることが示されている。課題は、初期プロンプトを書くのは安価だが、プロンプトを改善するのはコストがかかることだ。我々の研究は、ラベル付きデータを追加せずに、プロンプトベースの学習を改善することができるかどうかを問うものである。我々は,プロンプト自体ではなく,プロンプトの予測を変更することでこの問題にアプローチする。我々の直感では、正確な予測も一貫性があるべきである:ある特徴表現の下で類似したサンプルは、同じプロンプト予測を受けなければならない。 Embroidは、異なる埋め込み関数の下でデータセットの複数の表現を計算し、近隣のサンプルに対するLM予測間の整合性を利用して誤予測を識別する手法である。次にEmbroidは、これらの近傍を使用して各サンプルに対する追加の予測を作成し、これらの予測を単純な潜在変数のグラフィカルモデルと組み合わせて最終補正された予測を生成する。 Embroidの理論解析に加えて、6つの異なるLMと最大95の異なるタスクに対して厳密な経験的評価を行う。その結果,(1)Embroidは元々のプロンプト(例えばGPT-JTの平均7.3ポイント)よりも大幅に性能が向上し,(2)より洗練されたプロンプト戦略(例えばchain-of-thought)の改善を実現し,(3)埋め込み関数を通じて法のような領域に特化できることがわかった。 Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. 
Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.	翻訳日:2023-07-21 11:47:22 公開日:2023-07-20
# 量子相関に関するデータ駆動基準 Data-driven criteria for quantum correlations ( http://arxiv.org/abs/2307.11091v1 ) ライセンス: Link先を確認	Mateusz Krawczyk, Jarosław Pawłowski, Maciej M. Maśka, and Katarzyna Roszak	(参考訳) ランダムに生成された状態に対して教師なしの方法で訓練されたニューラルネットワークを用いて、3量子ビットシステム内の相関を検出する機械学習モデルを構築する。ネットワークは分離可能な状態を認識せざるを得ず、相関状態は異常として検出される。極めて驚くべきことに、提案する検出器は、絡み合いよりも弱い量子相関、すなわち量子不一致を区別するのに優れていることがわかった。実際、絡み合い検出の最適しきい値においても、絡み合い状態の集合を極端に過大評価する傾向があり、不協和状態の集合をはるかに少ない程度に過小評価する傾向にある。量子相関性(quantum-correlated)として分類される状態の性質を説明するために、様々な種類の状態を含むダイアグラムを構築します。認識損失のゼロに近い値は、特に図表上のこの集合の非自明な形状を考慮すると、非識別分離状態の形状を高精度に再現する。ネットワークアーキテクチャは、分離性を保持し、その出力は、キュービットの置換に関して等しく変化する。部分的トレース操作のみを利用するベースラインモデルよりもはるかに優れた検出精度を得るためには,アーキテクチャの選択が重要であることを示す。 We build a machine learning model to detect correlations in a three-qubit system using a neural network trained in an unsupervised manner on randomly generated states. The network is forced to recognize separable states, and correlated states are detected as anomalies. Quite surprisingly, we find that the proposed detector performs much better at distinguishing a weaker form of quantum correlations, namely, the quantum discord, than entanglement. In fact, it has a tendency to grossly overestimate the set of entangled states even at the optimal threshold for entanglement detection, while it underestimates the set of discordant states to a much lesser extent. In order to illustrate the nature of states classified as quantum-correlated, we construct a diagram containing various types of states -- entangled, as well as separable, both discordant and non-discordant. We find that the near-zero value of the recognition loss reproduces the shape of the non-discordant separable states with high accuracy, especially considering the non-trivial shape of this set on the diagram. The network architecture is designed carefully: it preserves separability, and its output is equivariant with respect to qubit permutations. 
We show that the choice of architecture is important to get the highest detection accuracy, much better than for a baseline model that just utilizes a partial trace operation.	翻訳日:2023-07-21 11:41:57 公開日:2023-07-20
# L-Eval: Long Context Language Modelの標準化評価 L-Eval: Instituting Standardized Evaluation for Long Context Language Models ( http://arxiv.org/abs/2307.11088v1 ) ライセンス: Link先を確認	Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, and Xipeng Qiu	(参考訳) 近年、単ターンの長い入力(例えば論文の要約)やより広範な歴史との会話を効果的に処理するために、命令追従モデルのコンテキストの長さを拡張することへの関心が高まっている。 GPT-4やClaudeのようなプロプライエタリなモデルは、数万のコンテキストトークンを扱う上でかなりの進歩を見せているが、オープンソースモデルは実験の初期段階にある。これらの長いコンテキストモデルの開発が、チャンク化されたコンテキストでのみ訓練された検索ベースの方法やモデルよりも、実用的な下流タスクにかなりの利益をもたらすかどうかも、まだ不明である。本稿では,この課題に対処するために,ロングコンテキスト言語モデルの標準化評価を行う。具体的には,法律,金融,学校講義,長い会話,ニュース,長文小説,会議などの分野について,著者が手作業で注釈とチェックを行った411の長文書と2000以上の質問応答ペアを含むL-Evalを開発した。 L-Evalは様々な評価手法や命令スタイルを採用しており、Long Context Language Models (LCLM) の評価の信頼性を高めている。私たちの調査では、オープンソースモデルは一般的に商用モデルよりも遅れているものの、それでも素晴らしいパフォーマンスを示しています。 LLaMA2は、4kコンテキスト長しか持たないオープンエンドタスクにおいて最良の結果(勝率45%対turbo-16k)を達成し、ChatGLM2は8k入力トークンを持つクローズドエンドタスクにおいて最高の結果を得る。オープンソースLCLM, GPT4-32k, Claude-100kの予測を含む,新たな評価スイート,コード,およびすべての生成結果を https://github.com/OpenLMLab/LEval でリリースする。 Recently, there has been growing interest in extending the context length of instruction-following models in order to effectively process single-turn long input (e.g., summarizing a paper) and conversations with more extensive histories. While proprietary models such as GPT-4 and Claude have demonstrated considerable advancements in handling tens of thousands of tokens of context, open-sourced models are still in the early stages of experimentation. It also remains unclear whether developing these long context models can offer substantial gains on practical downstream tasks over retrieval-based methods or models simply trained on chunked contexts. To address this challenge, we propose to institute standardized evaluation for long context language models. 
Concretely, we develop L-Eval, which contains 411 long documents and over 2,000 query-response pairs manually annotated and checked by the authors, encompassing areas such as law, finance, school lectures, lengthy conversations, news, long-form novels, and meetings. L-Eval also adopts diverse evaluation methods and instruction styles, enabling a more reliable assessment of Long Context Language Models (LCLMs). Our findings indicate that while open-source models typically lag behind their commercial counterparts, they still exhibit impressive performance. LLaMA2 achieves the best results (win 45% vs turbo-16k) on open-ended tasks with only 4k context length, and ChatGLM2 achieves the best results on closed-ended tasks with 8k input tokens. We release our new evaluation suite, code, and all generation results, including predictions from all open-sourced LCLMs, GPT4-32k, and Claude-100k, at https://github.com/OpenLMLab/LEval.	翻訳日:2023-07-21 11:41:30 公開日:2023-07-20
# PAPR: 近視的注意ポイントレンダリング PAPR: Proximity Attention Point Rendering ( http://arxiv.org/abs/2307.11086v1 ) ライセンス: Link先を確認	Yanshu Zhang, Shichong Peng, Alireza Moazeni, Ke Li	(参考訳) スクラッチからシーン表面の正確で控えめなポイントクラウド表現を学ぶことは、3d表現学習の課題である。既存のポイントベース手法は、しばしば消失する勾配問題や、シーンの幾何学やテクスチャを正確にモデル化するために多くのポイントを必要とする。これらの制約に対処するため,我々は,ポイントベースのシーン表現と微分可能なレンダラからなる新しい手法である近接注意ポイントレンダリング(papr)を提案する。我々のシーン表現は、各点が空間的位置、前景スコア、ビュー非依存の特徴ベクトルによって特徴づけられる点雲を使用する。レンダラは、各光線に関する関連点を選択し、関連する特徴を用いて正確な色を生成する。 PAPRは、初期化がターゲットの幾何学と大きく異なる場合でも、適切なシーン幾何学を表現するために点雲の位置を効果的に学習する。特に,本手法では,相似点のみを用いて微細なテクスチャの詳細を抽出する。また,本手法の実用的応用として,幾何学的編集,オブジェクト操作,テクスチャ転送,露出制御の4つを挙げる。さらなる結果とコードは、プロジェクトのwebサイトhttps://zvict.github.io/papr/で閲覧できます。 Learning accurate and parsimonious point cloud representations of scene surfaces from scratch remains a challenge in 3D representation learning. Existing point-based methods often suffer from the vanishing gradient problem or require a large number of points to accurately model scene geometry and texture. To address these limitations, we propose Proximity Attention Point Rendering (PAPR), a novel method that consists of a point-based scene representation and a differentiable renderer. Our scene representation uses a point cloud where each point is characterized by its spatial position, foreground score, and view-independent feature vector. The renderer selects the relevant points for each ray and produces accurate colours using their associated features. PAPR effectively learns point cloud positions to represent the correct scene geometry, even when the initialization drastically differs from the target geometry. Notably, our method captures fine texture details while using only a parsimonious set of points. We also demonstrate four practical applications of our method: geometry editing, object manipulation, texture transfer, and exposure control. More results and code are available on our project website at https://zvict.github.io/papr/.	翻訳日:2023-07-21 11:40:55 公開日:2023-07-20
# 異常検出における表現学習:成功、限界、そして大きな挑戦 Representation Learning in Anomaly Detection: Successes, Limits and a Grand Challenge ( http://arxiv.org/abs/2307.11085v1 ) ライセンス: Link先を確認	Yedid Hoshen	(参考訳) 本稿では,異常検出における支配的パラダイムは無限にスケールできず,最終的には基本的限界に達することを論じる。これは、異常検出のための無料ランチの原則がないためである。これらの制限は、多くの産業的タスクと同様に、強いタスク事前知識がある場合に克服できる。このような事前知識が存在しない場合、そのタスクは異常検出にとってずっと難しい。異常検出のための大きな課題として,2つの課題を挙げる。i) 異常検出による科学的発見 ii) ImageNetデータセットにおける最も異常な画像を検出する「ミニグランド」チャレンジ。これらの課題を克服するためには、新たな異常検出ツールやアイデアを開発する必要があると考えています。 In this perspective paper, we argue that the dominant paradigm in anomaly detection cannot scale indefinitely and will eventually hit fundamental limits. This is due to a no-free-lunch principle for anomaly detection. These limitations can be overcome when there are strong task priors, as is the case for many industrial tasks. When such priors do not exist, the task is much harder for anomaly detection. We pose two such tasks as grand challenges for anomaly detection: i) scientific discovery by anomaly detection ii) a "mini-grand" challenge of detecting the most anomalous image in the ImageNet dataset. We believe new anomaly detection tools and ideas would need to be developed to overcome these challenges.	翻訳日:2023-07-21 11:40:31 公開日:2023-07-20
# 量子ログスペース計算の検証 Quantum Logspace Computations are Verifiable ( http://arxiv.org/abs/2307.11083v1 ) ライセンス: Link先を確認	Uma Girish, Ran Raz, Wei Zhan	(参考訳) 本稿では、量子対数計算が古典的対数アルゴリズムによって検証され、無条件のセキュリティを持つことを観察する。より正確には、BQLの全ての言語は量子対数証明器と古典対数検証器を備えた(情報理論的に安全な)ストリーミング証明を持つ。証明者は、検証子にストリームされる多項式長証明を提供する。検証者は、その証明に対する一方向の読み取りアクセスを持ち、計算が正しく行われたことを検証できる。すなわち、入力が言語内にあり、証明者が正直であれば、検証者は高い確率で受け入れ、その入力が言語内でなければ、証明者は、たとえ証明者が逆であるとしても、高い確率で拒否する。さらに、検証者は$O(\log n)$ランダムビットのみを使用する。 In this note, we observe that quantum logspace computations are verifiable by classical logspace algorithms, with unconditional security. More precisely, every language in BQL has an (information-theoretically secure) streaming proof with a quantum logspace prover and a classical logspace verifier. The prover provides a polynomial-length proof that is streamed to the verifier. The verifier has a read-once one-way access to that proof and is able to verify that the computation was performed correctly. That is, if the input is in the language and the prover is honest, the verifier accepts with high probability, and, if the input is not in the language, the verifier rejects with high probability even if the prover is adversarial. Moreover, the verifier uses only $O(\log n)$ random bits.	翻訳日:2023-07-21 11:39:56 公開日:2023-07-20
# GLSFormer : 手術ビデオにおけるステップ認識のための長い短いシーケンス変換器 GLSFormer : Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos ( http://arxiv.org/abs/2307.11081v1 ) ライセンス: Link先を確認	Nisarg A. Shah, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel	(参考訳) 外科的ステップの自動認識は、手術中の患者の安全性と意思決定を大幅に改善する重要な課題である。既存の外科的段階認識のための最先端の手法は、空間情報と時間情報の分離した多段階モデリングに依存するか、あるいは、共同で学習した場合に短距離時間分解能で操作する。しかし、時空間的特徴と長距離情報の共同モデリングの利点は考慮されていない。本稿では,フレームレベルのパッチのシーケンスから時空間的特徴を直接学習するビジョントランスフォーマによるアプローチを提案する。本手法では,短期・長期の時空間特徴表現をインテリジェントに組み合わせたゲート時間アテンション機構を組み込んだ。 2つの白内障手術ビデオデータセット(Cataract-101とD99)に対するアプローチを広範囲に評価し,最先端の手法と比較して優れた性能を示す。これらの結果は, 手術ステップの自動認識における提案手法の適合性を検証する。私たちのコードは、https://github.com/nisargshah1999/GLSFormerでリリースされています。 Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer	翻訳日:2023-07-21 11:39:35 公開日:2023-07-20
# Brain2Music:人間の脳活動から音楽を再構築する Brain2Music: Reconstructing Music from Human Brain Activity ( http://arxiv.org/abs/2307.11078v1 ) ライセンス: Link先を確認	Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto	(参考訳) 人間の脳活動から経験を再構築するプロセスは、脳が世界をどのように解釈し、表現するかというユニークなレンズを提供する。本稿では,機能的磁気共鳴画像(fMRI)を用いて,脳活動から音楽の再構成を行う手法を提案する。本手法では,fMRIデータからの埋め込みを条件とした音楽検索やMusicLM音楽生成モデルを用いる。生成された音楽は、ジャンル、楽器、ムードといった意味的特性に関して、人間の被験者が経験した音楽刺激に類似している。ボクセル単位の符号化モデル解析により,MusicLMの異なる成分と脳活動の関係について検討した。さらに,音楽刺激の純粋テキスト記述から得られる情報を表現する脳領域についても論じる。我々は https://google-research.github.io/seanet/brain2music で再構成された音楽の例を含む補足資料を提供する。 The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music	翻訳日:2023-07-21 11:39:15 公開日:2023-07-20
# AlignDet: オブジェクト検出における事前トレーニングと微調整の調整 AlignDet: Aligning Pre-training and Fine-tuning in Object Detection ( http://arxiv.org/abs/2307.11077v1 ) ライセンス: Link先を確認	Ming Li, Jie Wu, Xionghui Wang, Chen Chen, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan	(参考訳) 大規模事前学習のパラダイムと下流の微調整は様々な物体検出アルゴリズムで広く採用されている。本稿では,既存の手法における事前学習手順と微調整手順との間に,検出器の性能,一般化能力,収束速度を暗黙的に制限する,データ,モデル,タスクの差異を明らかにする。この目的のために、我々は、様々な既存の検出器に適応可能な統合事前学習フレームワークであるAlignDetを提案する。 AlignDetは事前トレーニングプロセスを、イメージドメインとボックスドメイン事前トレーニングの2つのステージに分離する。イメージドメイン事前トレーニングは検出バックボーンを最適化し、総合的な視覚的抽象化をキャプチャし、ボックスドメイン事前トレーニングはインスタンスレベルのセマンティクスとタスクアウェアの概念を学習し、バックボーンから部品を初期化する。自己教師付きバックボーンを組み込むことで、様々な検出器のための全てのモジュールを教師なしパラダイムで事前訓練することができる。図1に示すように、allendetが検出アルゴリズム、モデルバックボーン、データ設定、トレーニングスケジュールなど、さまざまなプロトコルで大幅に改善できることが、広範な実験で示されています。例えば、AlignDetはFCOSを5.3mAPで改善し、RetinaNetを2.1mAPで、R-CNNを3.3mAPで、DETRを2.3mAPで改善した。 The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. 
As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.	翻訳日:2023-07-21 11:39:02 公開日:2023-07-20
# 人間のメッシュ回復のための高密度紫外線コンプリート学習 Learning Dense UV Completion for Human Mesh Recovery ( http://arxiv.org/abs/2307.11074v1 ) ライセンス: Link先を確認	Yanjun Wang, Qingping Sun, Wenjia Wang, Jun Ling, Zhongang Cai, Rong Xie, Li Song	(参考訳) 単一画像からの人間のメッシュ再構築は、自己や物体、あるいは他の人間によって引き起こされるオクルージョンの存在下では困難である。既存の手法では、人間の特徴を正確に分離できないか、機能補完のための適切な監督を欠いている。本稿では,密接な対応地図を利用して閉塞処理を行う2段階の手法であるDense Inpainting Human Mesh Recovery (DIMR)を提案する。提案手法は,高密度対応マップを用いて視覚的特徴を分離し,注目機能補完モジュールを用いた高密度UVマップ上での人間の特徴を補完する。また、未使用の機能から学習するためのネットワークを誘導する機能拡張訓練手順を設計する。提案手法を複数のデータセット上で評価し,その性能を他の手法と比較した。広汎な実験により,従来のSOTA法よりも高い性能を示し,標準ベンチマーク(3DPW)において同等の結果が得られた。 Human mesh reconstruction from a single image is challenging in the presence of occlusion, which can be caused by self, objects, or other humans. Existing methods either fail to separate human features accurately or lack proper supervision for feature completion. In this paper, we propose Dense Inpainting Human Mesh Recovery (DIMR), a two-stage method that leverages dense correspondence maps to handle occlusion. Our method utilizes a dense correspondence map to separate visible human features and completes human features on a structured UV map dense human with an attention-based feature completion module. We also design a feature inpainting training procedure that guides the network to learn from unoccluded features. We evaluate our method on several datasets and demonstrate its superior performance under heavily occluded scenarios compared to other methods. Extensive experiments show that our method obviously outperforms prior SOTA methods on heavily occluded images and achieves comparable results on the standard benchmarks (3DPW).	翻訳日:2023-07-21 11:38:35 公開日:2023-07-20
# OBJECT 3DIT:言語誘導型3D対応画像編集 OBJECT 3DIT: Language-guided 3D-aware Image Editing ( http://arxiv.org/abs/2307.11073v1 ) ライセンス: Link先を確認	Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta	(参考訳) 既存の画像編集ツールは強力だが、画像が投影される基礎となる3D幾何学は無視される。その結果、これらのツールを用いた編集は、画像形成プロセスの基礎となる幾何学的条件や照明条件から切り離される可能性がある。本研究では,画像中のオブジェクトを,下層の3Dシーンの文脈で言語命令に従って編集する,言語誘導型3D対応編集という新しいタスクを定式化する。この目標に向けての進展を促進するために、手続き的に生成された3Dシーンから作成される400Kの編集例からなるデータセットOBJECTをリリースする。それぞれの例は、入力画像、言語による編集命令、および編集画像からなる。 4つの編集タスクのためのシングルおよびマルチタスクモデルである3DITも紹介する。私たちのモデルでは、周囲の物体、表面、照明条件、影、物理的に表現可能な物体構成など、シーン全体の3D構成を理解する能力が印象的です。驚くべきことに、3DITの編集能力は、OBJECTの合成シーンのみのトレーニングを現実のイメージに一般化する。 Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, editing instruction in language, and the edited image. We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Surprisingly, although trained only on synthetic scenes from OBJECT, 3DIT's editing capabilities generalize to real-world images.	翻訳日:2023-07-21 11:38:20 公開日:2023-07-20

Title

Authors

Abstract

論文公表日・翻訳日

# 差分プライバシーを用いたデータ分析

Data Analytics with Differential Privacy ( http://arxiv.org/abs/2311.16104v1 )

ライセンス: Link先を確認

Vassilis Digalakis Jr

(参考訳) ディファレンシャルプライバシは、プライバシに関する最先端の定義であり、機密データセット上で実行される分析が、そのデータを含む個人に関する情報をリークしないことを保証する。本論文では,分散およびストリーミングデータを解析するための微分プライベートアルゴリズムを開発した。分散モデルでは、学習の特定の問題 -- 分散形式で -- はデータのグローバルモデルであり、その後任意の分析に使用できると考える。ベイズネットワークモデルを用いて,低次分布の積としての集中型データセットの高次元分布を近似する微分プライベート手法であるprivbayesを基礎とする。分散データからグローバルベイズネットワークを学習するための3つの新しいアプローチについて検討し、すべてのローカルデータセットに差分プライバシー保証を提供する。我々の研究は、我々のアルゴリズムの1つで使われている分散プライベートエントロピー推定器の詳細な理論的解析と、合成データと実世界のデータの両方を用いて詳細な実験的評価を含む。ストリーミングモデルでは,ストリームに実際に現れる全ユーザの比率を表す,ストリームの密度を推定する問題に注目する。我々は,ストリーミングモデルであるユーザレベルのパンプライバシに対して,最も強力なプライバシ保証を提供する。これは,アルゴリズムの内部状態を監視している敵に対して,ユーザのプライバシが保護されていることを保証します。そこで本研究では,既存のサンプリングベースアルゴリズムの詳細な解析を行い,全ての「プライバシー予算」を最適に活用し,理論的および実験的に改善する2つの新しい修正を提案する。

Differential privacy is the state-of-the-art definition for privacy, guaranteeing that any analysis performed on a sensitive dataset leaks no information about the individuals whose data are contained therein. In this thesis, we develop differentially private algorithms to analyze distributed and streaming data. In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data, that can subsequently be used for arbitrary analyses. We build upon PrivBayes, a differentially private method that approximates the high-dimensional distribution of a centralized dataset as a product of low-order distributions, utilizing a Bayesian Network model. We examine three novel approaches to learning a global Bayesian Network from distributed data, while offering the differential privacy guarantee to all local datasets. Our work includes a detailed theoretical analysis of the distributed, differentially private entropy estimator which we use in one of our algorithms, as well as a detailed experimental evaluation, using both synthetic and real-world data. In the streaming model, we focus on the problem of estimating the density of a stream of users, which expresses the fraction of all users that actually appear in the stream. We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy, which ensures that the privacy of any user is protected, even against an adversary that observes the internal state of the algorithm. We provide a detailed analysis of an existing, sampling-based algorithm for the problem and propose two novel modifications that significantly improve it, both theoretically and experimentally, by optimally using all the allocated "privacy budget."

翻訳日:2024-01-15 15:21:55 公開日:2023-07-20
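
As a rough illustration of the differential privacy guarantee described in the abstract above, the following sketch releases a noisy fraction using the Laplace mechanism. It is not the thesis's pan-private streaming algorithm. The names `laplace_noise` and `private_fraction`, the fixed dataset size, and the "replace one user's flag" neighbouring relation are assumptions made here for illustration only.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_fraction(flags, epsilon, rng=None):
    """Return the fraction of True entries in `flags` plus Laplace noise.

    Assumption: neighbouring datasets differ in one user's flag and have
    the same size n, so the fraction has sensitivity 1/n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not flags:
        raise ValueError("flags must be non-empty")
    rng = rng or random.Random()
    n = len(flags)
    true_fraction = sum(flags) / n
    sensitivity = 1.0 / n
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_fraction + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` means a larger noise scale and therefore a stronger privacy guarantee at the cost of accuracy; the released value can fall outside [0, 1] and may be clipped afterwards without weakening the guarantee.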

# RESTful API設計ルールはWeb APIの理解可能性に影響を与えるか? API記述を用いたWebベースの実験

Do RESTful API Design Rules Have an Impact on the Understandability of Web APIs? A Web-Based Experiment with API Descriptions ( http://arxiv.org/abs/2305.07346v3 )

ライセンス: Link先を確認

Justus Bogner, Sebastian Kotstein, Timo Pfaff

(参考訳) コンテキスト: web apiは、web上でアプリケーション機能を公開するための最もよく使われる方法の1つであり、その理解力は、提供されたリソースを効率的に利用する上で重要である。多くのAPI設計ルールが存在するが、ほとんどのルールの有効性に関する実証的な証拠が欠けている。目的:学習したいと願う 1)restfulなapi設計ルールが理解可能性に与える影響 2 規則違反がより理解しにくいと認められる場合、及び 3) REST関連の経験のような人口統計特性がこれに影響を与えている場合。方法: 業界, 学界, 経験の異なる105人の参加者を対象に, 制御されたWebベースの実験を行った。クロスオーバーとオブジェクト間の設計のハイブリッドに基づいて,APIスニペットを2つの補完バージョンで使用した12の設計ルールについて検討した。参加者は理解的な質問に答え、その難しさを評価した。結果:12のルールのうち11のルールにおいて,「違反」は理解作業において「ルール」よりも有意に悪い結果が得られた。主観評価では,12ルールのうち9ルールに有意な差が認められた。デモグラフィックは「違反」に対する理解的なパフォーマンスには何の役割も果たさなかった。結論: この結果は, 研究者, 実践者, 教育者にとって重要な Web API の理解性を改善するために, 設計規則に従うことの重要性を実証した最初の証拠となる。

Context: Web APIs are one of the most used ways to expose application functionality on the Web, and their understandability is important for efficiently using the provided resources. While many API design rules exist, empirical evidence for the effectiveness of most rules is lacking. Objective: We therefore wanted to study 1) the impact of RESTful API design rules on understandability, 2) if rule violations are also perceived as more difficult to understand, and 3) if demographic attributes like REST-related experience have an influence on this. Method: We conducted a controlled Web-based experiment with 105 participants, from both industry and academia and with different levels of experience. Based on a hybrid between a crossover and a between-subjects design, we studied 12 design rules using API snippets in two complementary versions: one that adhered to a "rule" and one that was a "violation" of this rule. Participants answered comprehension questions and rated the perceived difficulty. Results: For 11 of the 12 rules, we found that "violation" performed significantly worse than "rule" for the comprehension tasks. Regarding the subjective ratings, we found significant differences for 9 of the 12 rules, meaning that most violations were subjectively rated as more difficult to understand. Demographics played no role in the comprehension performance for "violation". Conclusions: Our results provide first empirical evidence for the importance of following design rules to improve the understandability of Web APIs, which is important for researchers, practitioners, and educators.

翻訳日:2023-10-24 08:55:07 公開日:2023-07-20

# EventB, $\{log\}$, Why3 スパース集合モデルの比較

Comparing EventB, $\{log\}$ and Why3 Models of Sparse Sets ( http://arxiv.org/abs/2307.03974v2 )

ライセンス: Link先を確認

Maximiliano Cristiá and Catherine Dubois

(参考訳) 集合の多くの表現はプログラミング言語ライブラリで利用可能である。この論文は、例えば、範囲列の代替として、値の有限集合である整数変数領域を表す制約解決器で使われるスパース集合に焦点を当てている。本稿では, スパース集合の実装を, EventB, $\{log\}$ と Why3 の3つの帰納的形式検証ツールで検証する。さらに,仕様や証明について比較する。

Many representations for sets are available in programming languages libraries. The paper focuses on sparse sets used, e.g., in some constraint solvers for representing integer variable domains which are finite sets of values, as an alternative to range sequence. We propose in this paper verified implementations of sparse sets, in three deductive formal verification tools, namely EventB, $\{log\}$ and Why3. Furthermore, we draw some comparisons regarding specifications and proofs.

翻訳日:2023-10-23 18:05:38 公開日:2023-07-20
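
The sparse-set representation of an integer domain mentioned in the abstract above can be sketched as follows. This is a generic, unverified Python illustration of the data structure as commonly used in constraint solvers, not the EventB, $\{log\}$ or Why3 models from the paper; the class name `SparseSet` and its methods are hypothetical.

```python
class SparseSet:
    """Subset of {0, ..., n-1} with O(1) membership test, removal and restore."""

    def __init__(self, n):
        self.dense = list(range(n))  # the first `size` entries are the members
        self.index = list(range(n))  # index[v] is the position of v in `dense`
        self.size = n

    def contains(self, v):
        return 0 <= v < len(self.dense) and self.index[v] < self.size

    def remove(self, v):
        if not self.contains(v):
            return False
        # Swap v with the last member, then shrink the member prefix.
        last = self.dense[self.size - 1]
        pos = self.index[v]
        self.dense[pos], self.dense[self.size - 1] = last, v
        self.index[last], self.index[v] = pos, self.size - 1
        self.size -= 1
        return True

    def restore(self, size):
        # Backtracking: removed values still sit just past the prefix,
        # so resetting the size brings them back.
        self.size = size

    def values(self):
        return self.dense[:self.size]
```

Because removed values are kept just beyond the live prefix, a solver can undo any number of removals on backtracking by saving and restoring a single integer.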

# ソフトウェア移植によるソフトウェア製品ラインエンジニアリング

Software Product Line Engineering via Software Transplantation ( http://arxiv.org/abs/2307.10896v1 )

ライセンス: Link先を確認

Leandro O. Souza, Earl T. Barr, Justyna Petke, Eduardo S. Almeida and Paulo Anselmo M. S. Neto

(参考訳) 関連製品を製造する企業にとって、SPL(Software Product Line)は、市場投入までの時間とソフトウェア品質を改善し、大幅なコスト削減を実現するソフトウェア再利用手法である。多くの場合、SPLをサポートするためにコードベースを再設計し、再設計するのに何年もかかります。現在のSPLのプラクティスは、さまざまな再設計フェーズ用に調整されたツールの集合に依存している。本稿では,splへの変換とメンテナンスを高速化するソフトウェア移植の汎用的自動化手法である foundryを提案する。 Foundryは機能抽出とマイグレーションを容易にする。複数のファイルで実装された一連の機能を効率よく、繰り返し、移植することができる。私たちはFoundryを使って、3つの現実世界のシステムから機能を自動で統合する2つの有効な製品ラインを作りました。さらに,Foundryの機能移行と手作業との比較実験を行った。 foundryは、splの専門家のグループがタスクを達成するのに要した平均時間よりも、コードベース全体の機能を自動的に4.8倍速く移行した。

For companies producing related products, a Software Product Line (SPL) is a software reuse method that improves time-to-market and software quality, achieving substantial cost reductions. These benefits do not come for free. It often takes years to re-architect and re-engineer a codebase to support SPL and, once adopted, it must be maintained. Current SPL practice relies on a collection of tools, tailored for different reengineering phases, whose output developers must coordinate and integrate. We present Foundry, a general automated approach for leveraging software transplantation to speed conversion to and maintenance of SPL. Foundry facilitates feature extraction and migration. It can efficiently, repeatedly, transplant a sequence of features, implemented in multiple files. We used Foundry to create two valid product lines that integrate features from three real-world systems in an automated way. Moreover, we conducted an experiment comparing Foundry's feature migration with manual effort. We show that Foundry automatically migrated features across codebases 4.8 times faster, on average, than the average time a group of SPL experts took to accomplish the task.

翻訳日:2023-10-23 17:05:07 公開日:2023-07-20

# コンパイラエラーに対処する - スタックオーバーフローか,あるいは大規模言語モデルか?

Addressing Compiler Errors: Stack Overflow or Large Language Models? ( http://arxiv.org/abs/2307.10793v1 )

ライセンス: Link先を確認

Patricia Widjojo and Christoph Treude

(参考訳) コンパイラエラーメッセージは、コンパイルエラーを扱うプログラマの初期リソースとして機能する。しかし、以前の研究では、コード問題を解決するのに十分なターゲット情報がないことがしばしば示されている。その結果、プログラマは通常、エラーを修正するために独自の研究に依存します。歴史的に、stack overflowはそのような情報の主要なリソースであったが、近年の大規模言語モデルの進歩は代替手段を提供している。本研究では,コンパイラエラーに遭遇するプログラマにとって最も効果的なアプローチを決定するために,3つのソースからの100個のコンパイラエラーメッセージを体系的に検討する。検討された要因には、Stack Overflow検索方法やモデルバージョンの影響、大規模言語モデルを使用する場合の迅速な表現などがある。 GPT-4は、コンパイラエラーメッセージの説明において、Stack Overflowよりも優れており、Stack Overflow検索にコードスニペットを追加する効果は、検索方法によって異なり、Stack Overflowの結果はGoogleとStackExchange APIの検索とは大きく異なる。さらに、GPT-4 は GPT-3.5 を超え、"How to fix" は "What do this error mean" に優れた結果をもたらす。これらの結果は、コンパイラエラーメッセージの支援、GPT-4のような先進的な大規模言語モデルのデバッグやAI支援プログラミングの研究者のための新たな探究の道を開く可能性について、プログラマに貴重なガイダンスを提供する。

Compiler error messages serve as an initial resource for programmers dealing with compilation errors. However, previous studies indicate that they often lack sufficient targeted information to resolve code issues. Consequently, programmers typically rely on their own research to fix errors. Historically, Stack Overflow has been the primary resource for such information, but recent advances in large language models offer alternatives. This study systematically examines 100 compiler error messages from three sources to determine the most effective approach for programmers encountering compiler errors. Factors considered include Stack Overflow search methods and the impact of model version and prompt phrasing when using large language models. The results reveal that GPT-4 outperforms Stack Overflow in explaining compiler error messages, the effectiveness of adding code snippets to Stack Overflow searches depends on the search method, and results for Stack Overflow differ significantly between Google and StackExchange API searches. Furthermore, GPT-4 surpasses GPT-3.5, with "How to fix" prompts yielding superior outcomes to "What does this error mean" prompts. These results offer valuable guidance for programmers seeking assistance with compiler error messages, underscoring the transformative potential of advanced large language models like GPT-4 in debugging and opening new avenues of exploration for researchers in AI-assisted programming.

翻訳日:2023-10-23 17:04:22 公開日:2023-07-20

# SMOTEC: 適応型スマートモビリティ実験のためのエッジコンピューティングテストベッド

SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation ( http://arxiv.org/abs/2307.11181v1 )

ライセンス: Link先を確認

Zeinab Nezami, Evangelos Pournaras, Amir Borzouie, Jie Xu

(参考訳) smart mobilityは、ネットゼロの目標を達成する上で最重要となる。しかし、自動運転車や自動運転、電気自動車は、エッジからクラウドへの連続体全体に広がる、効率的で回復力があり、信頼性の高い計算オフロードバックボーンを必要とする。オンデマンドの不均一な計算資源をスマートモビリティに活用することは困難であり、しばしばコスト非効率である。本稿では,エッジコンピューティングを用いた適応型スマートモビリティ実験のためのオープンソーステストベッドSMOTECを紹介する。 SMOTECは、拡張現実やリアルタイムトラフィック監視といったエッジデバイス上のインテリジェンスサービスのプロトタイピングと最適化を行うモジュール型のエンドツーエンドインスツルメンテーションを初めて提供する。 SMOTECは、都市移動のためのSUMOシミュレータ、ZeroMQとEPOSを介して通信するRaspberry Piエッジデバイス、エッジからクラウドへの分散ロードバランシングを備えたAIベースのDockerコンテナ統合をサポートする。すべてのコンポーネントは、K3s軽量Kubernetesによってオーケストレーションされる。ミュンヘンからの交通監視のための自己最適化サービス配置の実証は、SMOTECの適用性と費用対効果を実証している。

Smart mobility becomes paramount for meeting net-zero targets. However, autonomous, self-driving and electric vehicles require more than ever before an efficient, resilient and trustworthy computational offloading backbone that expands throughout the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides for the first time a modular end-to-end instrumentation for prototyping and optimizing placement of intelligence services on edge devices such as augmented reality and real-time traffic monitoring. SMOTEC supports a plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ and EPOS for an AI-based decentralized load balancing across edge-to-cloud. All components are orchestrated by the K3s lightweight Kubernetes. A proof-of-concept of self-optimized service placements for traffic monitoring from Munich demonstrates in practice the applicability and cost-effectiveness of SMOTEC.

翻訳日:2023-10-23 16:51:27 公開日:2023-07-20

# コンピュータ支援設計における脳神経インタフェースのためのビジュアルフローベースプログラミングプラグイン

Visual Flow-based Programming Plugin for Brain Computer Interface in Computer-Aided Design ( http://arxiv.org/abs/2307.11023v1 )

ライセンス: Link先を確認

Tong Bill Xu and Saleh Kalantari

(参考訳) 過去半世紀にわたり、脳コンピュータインタフェース(Brain Computer Interfaces, BCI)の主な応用は、車椅子やニューラルな義肢を制御したり、モビリティに制限のある人々のためのテキストやコマンドを生成したりすることであった。 BCIが新しい形態の環境相互作用を提供する可能性にもかかわらず、コンピュータ支援設計への応用にはこの分野においてほとんど注意が向けられていない。本稿では、神経科学やコンピュータプログラミングの経験が乏しいデザイナーが、設計に関連する確立された指標とともに神経学的データにアクセスし、デジタルオンスクリーンオブジェクトと物理デバイスの両方でBCIインタラクションプロトタイプを作成し、神経学的情報に基づいてデザインを評価し、さらなる分析を行うための新しいBCIツールであるNeuronの開発と応用について紹介する。 BCIツール開発について議論した後、この記事では2つのケーススタディを通じて、ツールのパフォーマンスを簡潔に評価し、意味、制限、将来の改善について議論する。

Over the last half century, the main application of Brain Computer Interfaces (BCIs) has been controlling wheelchairs and neural prostheses or generating text or commands for people with restricted mobility. There has been very limited attention in the field to applications for computer aided design, despite the potential of BCIs to provide a new form of environmental interaction. In this paper we introduce the development and application of Neuron, a novel BCI tool that enables designers with little experience in neuroscience or computer programming to gain access to neurological data, along with established metrics relevant to design, create BCI interaction prototypes, both with digital onscreen objects and physical devices, and evaluate designs based on neurological information and record measurements for further analysis. After discussing the BCI tool development, the article presents its capabilities through two case studies, along with a brief evaluation of the tool performance and a discussion of implications, limitations, and future improvement.

翻訳日:2023-10-23 16:51:03 公開日:2023-07-20

# 抽出法リファクタリングのためのライブ環境の実証評価

Empirical Evaluation of a Live Environment for Extract Method Refactoring ( http://arxiv.org/abs/2307.11010v1 )

ライセンス: Link先を確認

Sara Fernandes, Ademar Aguiar, André Restivo

(参考訳) 複雑なソフトウェアは読みやすく、適応し、維持することが難しい。リファクタリングはクリーンで自己説明的なコードを生成することができる。リファクタリングツールは、開発者をより良いコードへと誘導し、より品質を高めます。しかし、ほとんどがフィードバック、サポート、そして開発者がソフトウェアをどのように改善すべきかについてのガイダンスを提供するのに時間がかかり過ぎます。この問題を軽減するために,我々は,視覚的に提案し,リファクタリングを適用したLive Refactoringの概念をリアルタイムで検討した。このことを念頭に置いて,メソッドのリファクタリングを視覚的に識別し,推奨し,適用するライブリファクタリング環境を開発した。それを検証するために実験を行った。初期の結果から、私たちのアプローチはいくつかのコード品質メトリクスを改善しました。さらに、私たちの結果は、追加の助けなしにコードを手動でリファクタリングした結果とは大きく異なり、より良いと結論付けました。

Complex software can be hard to read, adapt, and maintain. Refactoring it can create cleaner and self-explanatory code. Refactoring tools try to guide developers towards better code, with more quality. However, most of them take too long to provide feedback, support, and guidance on how developers should improve their software. To reduce this problem, we explored the concept of Live Refactoring, focusing on visually suggesting and applying refactorings, in real-time. With this in mind, we developed a Live Refactoring Environment that visually identifies, recommends, and applies Extract Method refactorings. To validate it, we conducted an empirical experiment. Early results showed that our approach improved several code quality metrics. Besides, we also concluded that our results were significantly different and better than the ones from refactoring the code manually without further help.

翻訳日:2023-10-23 16:50:43 公開日:2023-07-20
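
For readers unfamiliar with the Extract Method refactoring named in the abstract above, here is a minimal before/after sketch. The functions `report_before`, `order_total` and `report_after` are invented for illustration and are unrelated to the authors' tool or experiment.

```python
def report_before(orders):
    # Before: the total is computed inline, mixed with the formatting logic.
    total = 0.0
    for price, qty in orders:
        total += price * qty
    return f"Orders: {len(orders)}, total: {total:.2f}"


def order_total(orders):
    # Extracted method: the computation now has a name and can be reused.
    return sum(price * qty for price, qty in orders)


def report_after(orders):
    # After: the caller reads as a summary and delegates the computation.
    return f"Orders: {len(orders)}, total: {order_total(orders):.2f}"
```

The refactoring preserves behaviour: both report functions return the same string for the same input, which is the property a refactoring tool has to maintain.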

# Twitterが未来を語るデータは何か?

What Twitter Data Tell Us about the Future? ( http://arxiv.org/abs/2308.02035v1 )

ライセンス: Link先を確認

Alina Landowska, Marek Robak, Maciej Skorski

(参考訳) 期待とは、未来に対する思考と生活を伴う人間の基本的な認知能力である。言語マーカーは予測思考を反映するが,自然言語処理の観点からの予測に関する研究は限られている。本研究は,未来派がtwitterで展開する未来を探究し,ソーシャルメディア利用者の予測思考に対する言語手がかりの影響を検討することを目的とする。我々は、Twitterの未来主義者が期待し共有する未来と、これらの将来がソーシャルデータからどのようにモデル化されるかに関する研究課題に対処する。本研究は,予測に関する関連研究を概観し,言語マーカーと高名な個人が予測思考に与える影響を考察し,未来を「現在未来」と「未来現在」に分類する分類体系を提案する。本研究では、将来のインフルエンサーによる100万件以上の公開ツイートをまとめたデータセットを提示し、SOTAモデルを用いたスケーラブルなNLPパイプラインを開発する。この研究は、LDAアプローチから15のトピックと、未来主義者のツイートの中でBERTopicアプローチから100のトピックを識別する。これらの発見はトピックモデリングの研究に寄与し、Twitterの未来学者が期待する未来についての洞察を提供する。この研究は、未来学者の言葉の手がかりが、ソーシャルメディア利用者が自身のシナリオを予測し、現在対応できる未来を示唆していることを実証している。完全なオープンソースデータセット、インタラクティブ解析、再現可能なソースコードは、さらなる調査のために利用可能である。

Anticipation is a fundamental human cognitive ability that involves thinking about and living towards the future. While language markers reflect anticipatory thinking, research on anticipation from the perspective of natural language processing is limited. This study aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users. We address the research questions of what futures Twitter's futurists anticipate and share, and how these anticipated futures can be modeled from social data. To investigate this, we review related works on anticipation, discuss the influence of language markers and prestigious individuals on anticipatory thinking, and present a taxonomy system categorizing futures into "present futures" and "future present". This research presents a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using SOTA models. The study identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. These findings contribute to the research on topic modelling and provide insights into the futures anticipated by Twitter's futurists. The research demonstrates that the futurists' language cues signal futures-in-the-making that help social media users anticipate their own scenarios and respond to them in the present. The fully open-sourced dataset, interactive analysis, and reproducible source code are available for further exploration.

翻訳日:2023-08-14 01:58:38 公開日:2023-07-20

# 脳波からの感情の流出:GRUに基づくアプローチ

Unveiling Emotions from EEG: A GRU-Based Approach ( http://arxiv.org/abs/2308.02778v1 )

ライセンス: Link先を確認

Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi

(参考訳) 感情コンピューティングにおける最も重要な研究分野の1つは脳波データを用いた感情識別である。本研究では,recurrent neural network(rnn)の一種であるgated recurrent unit(gru)アルゴリズムを用いて,脳波信号を用いて感情状態を予測できるかどうかを検証した。我々の公開データセットは、幸せ、中立、ネガティブな感情を呼び起こす刺激にさらされた人々の脳波記録と同様に、中立なデータを休ませることから成り立っている。最適な特徴抽出のために,アーティファクト除去,バンドパスフィルタ,正規化手法を用いて脳波データを前処理する。検証セットの100%の精度で,GRUの能力を利用して時間的依存関係を捕捉し,優れた結果を得た。他の機械学習技術と比較すると、GRUモデルのExtreme Gradient Boosting Classifierが最も精度が高かった。本研究により,モデルの性能に関する洞察に富んだ情報が得られ,正確な感情分類が可能となった。本研究は,感情認識のための grus などのディープラーニングモデルの可能性と,感情コンピューティングの進歩を強調する。我々の研究結果は、コンピュータと対話し、脳波活動を通して感情がどのように表現されるかを理解する新しい可能性を開く。

One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Networks (RNNs), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.

翻訳日:2023-08-14 00:49:38 公開日:2023-07-20

# LLMの支援によるAI-Guardianの攻略

A LLM Assisted Exploitation of AI-Guardian ( http://arxiv.org/abs/2307.15008v1 )

ライセンス: Link先を確認

Nicholas Carlini

(参考訳) 大規模言語モデル(LLM)は今や様々なタスクで高い能力を持っている。本稿では,LLM である GPT-4 が,敵対的機械学習分野の研究者を支援することができるかどうかを考察する。ケーススタディとして、トップコンピュータセキュリティカンファレンスIEEE S&P 2023で発表された敵対的サンプルに対する最近の防御であるAI-Guardianのロバスト性を評価する。我々はこの防御を完全に破る。提案されたスキームは、無防備なベースラインと比較して堅牢性を高めない。我々は、このモデルを攻撃するためのコードを一切書かず、代わりにGPT-4に我々の指示とガイダンスに従って全ての攻撃アルゴリズムを実装するよう促す。このプロセスは驚くほど効果的で効率的であり、言語モデルは、この論文の著者が実行したよりも早く曖昧な命令からコードを生成することもあった。結論として,(1)AI-Guardianが破られることを示唆していた評価中の警告サイン,(2)言語モデリングにおける最新の進歩を用いて攻撃の設計と新たな研究を行う経験について論じる。

Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.

翻訳日:2023-07-30 03:58:43 公開日:2023-07-20

# ワンショット画像誘導による一般画像変換

General Image-to-Image Translation with One-Shot Image Guidance ( http://arxiv.org/abs/2307.14352v1 )

ライセンス: Link先を確認

Bin Cheng, Zuhao Liu, Yunbo Peng, Yue Lin

(参考訳) 大規模テキスト・画像ペアで事前学習した大規模テキスト・画像モデルは最近画像合成において優れた性能を示している。しかし、画像はプレーンテキストよりも直感的な視覚概念を提供することができる。望みの視覚概念を既存のイメージ、例えば肖像画に統合するにはどうすればいいのか? 現在の方法は、コンテンツを保存したり、視覚概念を効果的に翻訳する能力が欠けているため、この要求を満たすには不十分である。そこで本研究では,画像中のコンテンツを保存し,単一の参照画像でガイドされる視覚概念を翻訳する機能を備えた,視覚概念トランスレータ(VCT)という新しいフレームワークを提案する。提案するVCTは、内容と概念を抽出する内容概念反転(CCI)プロセスと、抽出した情報を収集して対象画像を得る内容概念融合(CCF)プロセスとを含む。 1つの参照画像のみを与えられた場合、提案するvctは、優れた結果を得て、幅広い一般的な画像から画像への翻訳タスクを完了することができる。提案手法の優越性と有効性を証明するため,広範な実験を行った。コードはhttps://github.com/crystalneuro/visual-concept-translatorで入手できる。

Large-scale text-to-image models pre-trained on massive text-image pairs have recently shown excellent performance in image synthesis. However, an image can provide more intuitive visual concepts than plain text. People may ask: how can we integrate the desired visual concept into an existing image, such as our portrait? Current methods are inadequate in meeting this demand as they lack the ability to preserve content or translate visual concepts effectively. Inspired by this, we propose a novel framework named visual concept translator (VCT) with the ability to preserve content in the source image and translate the visual concepts guided by a single reference image. The proposed VCT contains a content-concept inversion (CCI) process to extract contents and concepts, and a content-concept fusion (CCF) process to gather the extracted information to obtain the target image. Given only one reference image, the proposed VCT can complete a wide range of general image-to-image translation tasks with excellent results. Extensive experiments are conducted to prove the superiority and effectiveness of the proposed methods. Code is available at https://github.com/CrystalNeuro/visual-concept-translator.

翻訳日:2023-07-30 03:58:01 公開日:2023-07-20

# 財務における感情分析へのQNLPの適用

Applying QNLP to sentiment analysis in finance ( http://arxiv.org/abs/2307.11788v1 )

ライセンス: Link先を確認

Jonas Stein, Ivo Christ, Nicolas Kraus, Maximilian Balthasar Mansky, Robert Müller, Claudia Linnhof-Popien

(参考訳) わずかな質的な改善が大きな価値をもたらすアプリケーション領域として、金融は早期の量子優位の候補となる。量子自然言語処理(QNLP)の急速に進歩する分野に着目し、金融における感情分析の問題に対する2つの中心的アプローチであるDisCoCatとQuantum-Enhanced Long Short-Term Memory(QLSTM)の実用性について検討する。新たなChatGPTベースのデータ生成手法を用いることで、1000以上の現実的な文でケーススタディを行い、QLSTMはDisCoCatよりも大幅に高速にトレーニングでき、また、利用可能なソフトウェア実装の古典的な結果に近い結果が得られることを発見した。

As an application domain where the slightest qualitative improvements can yield immense value, finance is a promising candidate for early quantum advantage. Focusing on the rapidly advancing field of Quantum Natural Language Processing (QNLP), we explore the practical applicability of the two central approaches DisCoCat and Quantum-Enhanced Long Short-Term Memory (QLSTM) to the problem of sentiment analysis in finance. Utilizing a novel ChatGPT-based data generation approach, we conduct a case study with more than 1000 realistic sentences and find that QLSTMs can be trained substantially faster than DisCoCat while also achieving close to classical results for their available software implementations.

翻訳日:2023-07-25 19:47:35 公開日:2023-07-20

# LLMの認知的判断は人間と異なる

LLM Cognitive Judgements Differ From Human ( http://arxiv.org/abs/2307.11787v1 )

ライセンス: Link先を確認

Sotiris Lamprinidis

(参考訳) 大規模言語モデル(LLM)は最近、研究者、ビジネス、消費者の注目を浴びている。このようなモデルの言語能力は広く研究されているが、認知的対象として研究することへの関心が高まっている。本研究は,認知科学文献からの限定データ帰納的推論課題におけるGPT-3とChatGPTの機能について検討する。その結果、これらのモデルの認知的判断は人間に似ていないことが示唆された。

Large Language Models (LLMs) have lately been in the spotlight of researchers, businesses, and consumers alike. While the linguistic capabilities of such models have been studied extensively, there is growing interest in investigating them as cognitive subjects. In the present work I examine GPT-3 and ChatGPT capabilities on a limited-data inductive reasoning task from the cognitive science literature. The results suggest that these models' cognitive judgements are not human-like.

翻訳日:2023-07-25 19:47:20 公開日:2023-07-20

# 知的エージェントの対話型シェーピング

Adversarial Conversational Shaping for Intelligent Agents ( http://arxiv.org/abs/2307.11785v1 )

ライセンス: Link先を確認

Piotr Tarasiewicz, Sultan Kenjeyev, Ilana Sebag, Shehab Alshehabi

(参考訳) 最近のディープラーニング手法の出現により、研究コミュニティは自然言語処理を含むいくつかの領域で最先端の成果を達成できるようになった。しかし、現在のロボコールシステムは不安定で不正確であり、テキスト生成器やチャットボットは退屈で、人間のような対話を誤解する可能性がある。本研究は, 敵対的会話形成による知的会話エージェントの強化が可能な2つのモデルの性能について検討する: 政策勾配を持つ生成的敵対ネットワーク(GANPG)と, Li et al. [18] で提示された REGS モデルに基づく, 生成ステップごとに報酬を与える生成的敵対ネットワーク(REGS)である。このモデルは、部分的および完全に生成されたテキストシーケンスの両方に報酬を割り当てることができる。強化学習フレームワークにおいて,seq2seq [36] と Transformer [37] という,異なるトレーニングの詳細でパフォーマンスを議論する。

The recent emergence of deep learning methods has enabled the research community to achieve state-of-the-art results in several domains including natural language processing. However, the current robocall system remains unstable and inaccurate: text generators and chatbots can be tedious and misunderstand human-like dialogue. In this work, we study the performance of two models able to enhance an intelligent conversational agent through adversarial conversational shaping: a generative adversarial network with policy gradient (GANPG) and a generative adversarial network with reward for every generation step (REGS) based on the REGS model presented in Li et al. [18]. This model is able to assign rewards to both partially and fully generated text sequences. We discuss performance with different training details: seq2seq [36] and transformers [37] in a reinforcement learning framework.

翻訳日:2023-07-25 19:47:15 公開日:2023-07-20

# 実際、学習可能安全クリティカルシステムのための達成可能な保証手段とは何か

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems ( http://arxiv.org/abs/2307.11784v1 )

ライセンス: Link先を確認

Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao

(参考訳) 機械学習は目覚ましい進歩を遂げているが、安全クリティカルな領域で学習可能なコンポーネントを確実に活用することは、依然として課題となっている。課題の1つは、厳格で実用的で、安全保証を達成する方法が最も顕著であることである。本稿ではまず,そのようなシステムの設計と検証に関わる工学的課題と研究課題について論じる。そして,既存の著作物が実際に証明可能な保証を達成できないという観測に基づいて,証明可能な統計保証の最終的な達成のための2段階検証手法を奨励する。

Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.

翻訳日:2023-07-25 19:46:59 公開日:2023-07-20

# ボックス座標マッチングに基づく特定対象に対する新しい検出グラスピング法

A novel integrated method of detection-grasping for specific object based on the box coordinate matching ( http://arxiv.org/abs/2307.11783v1 )

ライセンス: Link先を確認

Zongmin Liu, Jirui Wang, Jie Li, Zufeng Li, Kai Ren, Peng Shi

(参考訳) 高齢者と障害者のケアを改善するためには,サービスロボットが物体検出と把持推定の効果的な融合法を持つことが不可欠である。しかし,物体検出と把握推定の組み合わせについて限定的な研究がなされている。そこで本稿では,この課題を克服するために,ボックス座標マッチングに基づく特定物体の検出・検出統合手法を提案する。まず、チャネルアテンションモジュール(CAM)と空間アテンションモジュール(SAM)を追加することで、SOLOv2インスタンスセグメンテーションモデルを改善する。次に、生成残差畳み込みニューラルネットワーク(GR-CNN)モデルに、アトラス空間ピラミッドプーリング(ASPP)とCAMを加え、把握推定を最適化する。さらに,ボックス座標マッチング(DG-BCM)に基づく検出グラスピング統合アルゴリズムを提案し,物体検出と把握推定の融合モデルを求める。検証のために,オブジェクト検出と把持推定実験を別々に行い,改良したモデルの優越性を検証する。さらに,本論文で提案するDG-BCMアルゴリズムの有効性と有効性を示すシミュレーションプラットフォーム上で,複数の特定のオブジェクトの把握タスクを実装した。

To better care for the elderly and disabled, it is essential for service robots to have an effective method for fusing object detection and grasp estimation. However, limited research has addressed the combination of object detection and grasp estimation. To overcome this technical difficulty, a novel integrated detection-grasping method for specific objects based on box coordinate matching is proposed in this paper. Firstly, the SOLOv2 instance segmentation model is improved by adding a channel attention module (CAM) and a spatial attention module (SAM). Then, atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to obtain the fusion model of object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to verify the superiority of the improved models. Additionally, grasping tasks for several specific objects are implemented on a simulation platform, demonstrating the feasibility and effectiveness of the DG-BCM algorithm proposed in this paper.

翻訳日:2023-07-25 19:46:48 公開日:2023-07-20

# 非凸対象に対するアダムの収束性:緩和ハイパーパラメータと非エルゴードケース

Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case ( http://arxiv.org/abs/2307.11782v1 )

ライセンス: Link先を確認

Meixuan He, Yuqing Liang, Jinlan Liu and Dongpo Xu

(参考訳) Adamは機械学習でよく使われる確率最適化アルゴリズムである。しかし、その収束は、特に非凸設定において完全には理解されていない。本稿では,バニラ・アダムの収束のためのハイパーパラメータ設定の検討と,非エルゴード収束の課題に取り組む。まず、エルゴード収束と非エルゴード収束の正確な定義を導入し、確率的最適化アルゴリズムの収束のほぼ全ての形態をカバーする。一方,エルゴード収束に対する非エルゴード収束の優位性を強調する。第二に、アダムのエルゴード収束を保証するためのより弱い条件を確立し、より緩和されたハイパーパラメータの選択を可能にする。このことから、Adam のほぼ確実なエルゴード収束率を達成し、これは任意に $o(1/\sqrt{K})$ に近い。さらに重要なことは、Adamの最後の反復が非凸目的に対して定常点に収束することを初めて証明したことである。最後に、Polyak-Lojasiewicz (PL) 条件下で関数値に対する非エルゴード収束速度 $O(1/K)$ を得る。これらの結果は、Adamが非凸確率最適化問題を解くための確かな理論基盤を構築している。

Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which is arbitrarily close to $o(1/\sqrt{K})$. More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems.
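
As a point of reference for the abstract above, here is a minimal sketch of the vanilla Adam update in its commonly stated bias-corrected form. The hyperparameter values, the scalar toy objective, and the names `adam_minimize` and `toy_grad` are illustrative assumptions, not the settings or notation analyzed in the paper. The function returns the last iterate, which is the quantity the non-ergodic results concern.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run vanilla Adam on a scalar parameter and return the last iterate."""
    x, m, v = x0, 0.0, 0.0
    for k in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** k)           # bias correction
        v_hat = v / (1 - beta2 ** k)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical non-convex toy objective f(x) = x^4 - 3x^2 + x.
def toy_grad(x):
    return 4 * x**3 - 6 * x + 1

x_last = adam_minimize(toy_grad, x0=2.0)
```

On this toy problem the gradient at `x_last` should be close to zero, i.e. the last iterate sits near a stationary point; this illustrates the notion only and says nothing about the rates proved in the paper.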

翻訳日:2023-07-25 19:46:28 公開日:2023-07-20

# 抽出抽象軸:生成言語モデルにおける内容「バローイング」の測定

The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models ( http://arxiv.org/abs/2307.11779v1 )

ライセンス: Link先を確認

Nedelina Teneva

(参考訳) 生成言語モデルは、検索エンジンの抽出応答とは対照的に、設計によって非常に抽象的な出力を生成する。このLLMの特徴とコンテンツのライセンシングおよびアトリビューションへの影響を考慮し、生成モデルのベンチマークのためのいわゆる抽出・抽象軸を提案し、対応するメトリクスやデータセット、アノテーションガイドラインの開発の必要性を強調した。我々は議論をテキストモダリティに限定する。

Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.

翻訳日:2023-07-25 19:46:08 公開日:2023-07-20

# ASRU 2023 MADASR ChallengeにおけるTranssion TSUPの音声認識システム

Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge ( http://arxiv.org/abs/2307.11778v1 )

ライセンス: Link先を確認

Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu

(参考訳) 本稿では,asru 2023 madasrチャレンジのためにtranssion speech understanding processing team (tsup) が開発した音声認識システムを提案する。このシステムは、低リソースインド言語へのasrモデルの適用にフォーカスしており、チャレンジの全4トラックをカバーしている。トラック1と2では、音響モデルはスクイーズフォーマエンコーダと、ジョイントctcアテンション訓練損失を有する双方向トランスデコーダを利用した。さらに、外部KenLM言語モデルがTLGビームサーチデコーディングに使用された。トラック3と4では、事前訓練されたindicwhisperモデルが採用され、チャレンジデータセットと公開データセットの両方で微調整された。ウィスパービームサーチデコーディングは、外部のKenLM言語モデルをサポートするように修正され、チャレンジによって提供される追加のテキストをより活用できるようになった。提案手法は,4トラックで24.17%,24.43%,15.97%,15.97%,ベンガル語で15.97%,4トラックで19.61%,19.54%,15.48%,15.48%の単語誤り率(wer)を達成した。これらの結果は,提案手法の有効性を示す。

This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC-Attention training loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were employed and finetuned on both the challenge dataset and publicly available datasets. The whisper beam search decoding was also modified to support an external KenLM language model, which enabled better utilization of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%, 24.43%, 15.97%, and 15.97% for Bengali language in the four tracks, and WER of 19.61%, 19.54%, 15.48%, and 15.48% for Bhojpuri language in the four tracks. These results demonstrate the effectiveness of the proposed method.
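
The word error rates quoted above follow the usual definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch of that computation is given below; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, and the challenge's official scoring script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis `"a x c"` against the reference `"a b c d"` counts one substitution and one deletion over four reference words.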

翻訳日:2023-07-25 19:46:00 公開日:2023-07-20

# チーム強度推定による統計的強化学習によるハンドボールマッチの予測

Prediction of Handball Matches with Statistically Enhanced Learning via Estimated Team Strengths ( http://arxiv.org/abs/2307.11777v1 )

ライセンス: Link先を確認

Florian Felice and Christophe Ley

(参考訳) ハンドボールゲームを予測するため,統計的に強化された学習モデル(別名SEL)を提案する。 SELで強化された機械学習モデルは、80%以上の精度で最先端のモデルより優れている。本研究では,過去の女子部戦における機械学習モデルをトレーニングするためのデータセットの構築方法を示す。次に、異なるモデルを比較し、それらのパフォーマンス能力を評価する。最後に、説明可能性法により、ツールの範囲を、純粋に予測可能なソリューションから、非常に洞察に富んだ分析ツールに変更することができる。これはハンドボールチームのコーチにとって価値ある資産となり、将来のコンペティションに備えるための統計的および予測的な洞察を提供する。

We propose a Statistically Enhanced Learning (a.k.a. SEL) model to predict handball games. Our Machine Learning model augmented with SEL features outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We then compare different models and evaluate them to assess their performance capabilities. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams' coaches, providing statistical and predictive insights to prepare for future competitions.

翻訳日:2023-07-25 19:45:35 公開日:2023-07-20

# 浅層再帰デコーダネットワークを用いた任意移動センサトラジェクタのフルステート再構成への応用

Leveraging arbitrary mobile sensor trajectories with shallow recurrent decoder networks for full-state reconstruction ( http://arxiv.org/abs/2307.11793v1 )

ライセンス: Link先を確認

Megan R. Ebers, Jan P. Williams, Katherine M. Steele, J. Nathan Kutz

(参考訳) センシングは、複雑な時空間システムの監視、予測、制御のための最も基本的なタスクの1つである。多くのアプリケーションでは、限られた数のセンサーがモバイルであり、ウェアラブル技術、海洋監視ブイ、気象気球など、ダイナミクスを使って移動している。これらの動的システム(統計に依存しない領域を除く)では、測定時間履歴は重要なタスクのために抽出できるかなりの量の情報をエンコードする。ほとんどのモデルフリーセンシングパラダイムは、現在のスパースセンサの測定結果を高次元の状態空間にマッピングすることを目的としている。現代のディープラーニングアーキテクチャを用いて、LSTM(long, short-term memory)ネットワークのようなシーケンス・ツー・ベクターモデルとデコーダ・ネットワークを用いて、動的軌跡情報を全状態空間推定にマッピング可能であることを示す。実際、我々は、浅い再帰デコーダネットワークでモバイルセンサトラジェクタを利用することで、ネットワークを訓練できることを実証する。一センサの任意の動的軌跡を用いて全状態空間を正確に再構築すること。 (ii)このアーキテクチャは、イムモービルセンサと比較して、復元誤差の平均二乗誤差のばらつきを低減させる。 (iii)アーキテクチャはまた、トレーニングセット外のデータの迅速な一般化(動的パラメータ化)を可能にする。また、センサの空間軌跡の訓練データが利用可能であれば、センサの経路を任意に選択することができる。ネットワークアーキテクチャの例外的な性能は,乱流,大域海面温度データ,人体運動バイオメカニクスの3つの応用で実証されている。

Sensing is one of the most fundamental tasks for the monitoring, forecasting and control of complex, spatio-temporal systems. In many applications, a limited number of sensors are mobile and move with the dynamics, with examples including wearable technology, ocean monitoring buoys, and weather balloons. In these dynamic systems (without regions of statistical independence), the measurement time history encodes a significant amount of information that can be extracted for critical tasks. Most model-free sensing paradigms aim to map current sparse sensor measurements to the high-dimensional state space, ignoring the time history altogether. Using modern deep learning architectures, we show that with a sequence-to-vector model, such as an LSTM (long short-term memory) network, combined with a decoder network, dynamic trajectory information can be mapped to full state-space estimates. Indeed, we demonstrate that by leveraging mobile sensor trajectories with shallow recurrent decoder networks, (i) the network can be trained to accurately reconstruct the full state space using arbitrary dynamical trajectories of the sensors, (ii) the architecture reduces the variance of the mean-square reconstruction error in comparison with immobile sensors, and (iii) the architecture also allows for rapid generalization (parameterization of dynamics) for data outside the training set. Moreover, the path of the sensor can be chosen arbitrarily, provided training data for the spatial trajectory of the sensor is available. The exceptional performance of the network architecture is demonstrated on three applications: turbulent flows, global sea-surface temperature data, and human movement biomechanics.

翻訳日:2023-07-25 19:36:53 公開日:2023-07-20

# 古典データの分類のための相互作用層を有する量子畳み込みニューラルネットワーク

Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical Data ( http://arxiv.org/abs/2307.11792v1 )

ライセンス: Link先を確認

Jishnu Mahmud, Raisa Mashtura, Shaikh Anowarul Fattah

(参考訳) 量子機械学習(quantum machine learning, qml)は、量子コンピュータの計算能力の異常さから生まれた。量子ニューラルネットワークにおけるマルチキュービット相互作用の影響は, 近距離量子コンピュータの今後への期待から広く研究されることが重要である。本稿では,ネットワークの表現可能性と絡み合い能力を高める3量子ビット相互作用を利用した新しい相互作用層を有する量子畳み込みネットワークを提案する。提案手法は, mnist, fashion mnist, irisデータセットの3つの公開データセットを用いて, バイナリ分類とマルチクラス分類を行い, 既存の最先端手法の性能に取って代わるものと考えられる。

Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promises of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks is studied extensively. This paper introduces a Quantum Convolutional Network with novel Interaction layers exploiting three-qubit interactions, increasing the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets, namely the MNIST, Fashion MNIST, and Iris datasets, to perform binary and multiclass classifications and is found to surpass the performance of the existing state-of-the-art methods.

翻訳日:2023-07-25 19:36:26 公開日:2023-07-20

# 複合量子シミュレーション

Composite Quantum Simulations ( http://arxiv.org/abs/2206.06409v3 )

ライセンス: Link先を確認

Matthew Hagan and Nathan Wiebe

(参考訳) 本稿では, トロッタスズキ公式やQDriftなどの複数の量子シミュレーション手法を, ゲート数を削減するための古いコネッセーションのアイデアの上に構築した1つの複合チャネルに組み合わせる枠組みを提案する。このアプローチの背後にある中心的な考え方は、シミュレーション内のチャネルのトロッターまたはQDrift部分にハミルトン項を割り当てるパーティショニングスキームを使用することである。これにより、高次トロッタースズキ式を用いてより大きい項をシミュレートしながら、QDriftを用いて、小さくて多数の項をシミュレートできる。合成チャネルと理想シミュレーションチャネルとの間のダイヤモンド距離の厳密な境界を証明し、合成チャネルの実装コストが漸近的に上界となる条件下では、項の確率的分割と決定論的分割の両方でそれを構成する方法を示す。最後に、分割スキームを決定するための戦略と、同一フレームワーク内で異なるシミュレーション手法を組み込む手法について論じる。

In this paper we provide a framework for combining multiple quantum simulation methods, such as Trotter-Suzuki formulas and QDrift, into a single Composite channel that builds upon older coalescing ideas for reducing gate counts. The central idea behind our approach is to use a partitioning scheme that allocates a Hamiltonian term to the Trotter or QDrift part of a channel within the simulation. This allows us to simulate small but numerous terms using QDrift while simulating the larger terms using a high-order Trotter-Suzuki formula. We prove rigorous bounds on the diamond distance between the Composite channel and the ideal simulation channel and show under what conditions the cost of implementing the Composite channel is asymptotically upper bounded by the methods that comprise it for both probabilistic partitioning of terms and deterministic partitioning. Finally, we discuss strategies for determining partitioning schemes as well as methods for incorporating different simulation methods within the same framework.
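
The partitioning idea can be sketched in a few lines. The snippet below is a rough illustration only: it assumes a deterministic split by coefficient magnitude with a hand-picked `cutoff`, and the names `partition_terms` and `sample_qdrift_terms` as well as the toy coefficients are hypothetical. The paper's own partitioning strategies and cost analysis are not reproduced here. The sampling step reflects the standard QDrift rule of drawing terms with probability proportional to the absolute value of their coefficients.

```python
import random

def partition_terms(coeffs, cutoff):
    """Send large-coefficient terms to the Trotter part and small ones to QDrift."""
    trotter = {name: c for name, c in coeffs.items() if abs(c) >= cutoff}
    qdrift = {name: c for name, c in coeffs.items() if abs(c) < cutoff}
    return trotter, qdrift

def sample_qdrift_terms(qdrift, n_samples, rng):
    """Draw term labels with probability proportional to |coefficient|."""
    names = list(qdrift)
    weights = [abs(qdrift[name]) for name in names]
    return rng.choices(names, weights=weights, k=n_samples)

# Hypothetical Hamiltonian: two dominant terms and several small ones.
coeffs = {"ZZ": 1.0, "XX": 0.8, "X1": 0.05, "X2": -0.03, "X3": 0.01}
trotter, qdrift = partition_terms(coeffs, cutoff=0.1)
samples = sample_qdrift_terms(qdrift, 20, random.Random(0))
```

Here `trotter` would be handed to a product-formula routine and `samples` gives the random sequence of small terms to exponentiate; both downstream steps are left out of this sketch.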

翻訳日:2023-07-24 16:59:31 公開日:2023-07-20

# プリプロセッサが重要! 機械学習システムに対するリアルな意思決定に基づく攻撃

Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems ( http://arxiv.org/abs/2210.03297v2 )

ライセンス: Link先を確認

Chawin Sitawarin, Florian Tramèr, Nicholas Carlini

(参考訳) 決定に基づく攻撃は、ハードラベルクエリのみを作成することによって、機械学習(ML)モデルに対する逆例を構築する。これらの攻撃は主にスタンドアロンのニューラルネットワークに直接適用される。しかし、実際には、MLモデルはより大きな学習システムの1つの構成要素にすぎない。分類器の前に1つのプリプロセッサを追加することで、最先端のクエリベースの攻撃は、モデル単独で攻撃するよりも予測パイプラインを攻撃するのに7$\times$以下になることがわかった。この相違は、ほとんどのプリプロセッサが入力空間に不変性の概念を導入しているという事実によって説明される。したがって、この不変性に気づいていない攻撃は、必然的に大量のクエリを無駄にして再発見または克服する。したがって、我々は技術を開発する。 (i)プリプロセッサをリバースエンジニアリングし、 (ii)この抽出情報を用いてエンドツーエンドシステムを攻撃する。プリプロセッサ抽出法は数百のクエリしか必要とせず,プリプロセッサアウェアアタックはモデル単独による攻撃と同じ効果を回復する。コードはhttps://github.com/google-research/preprocessor-aware-black-box-attackにある。

Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries. These attacks have mainly been applied directly to standalone neural networks. However, in practice, ML models are just one component of a larger learning system. We find that by adding a single preprocessor in front of a classifier, state-of-the-art query-based attacks are up to 7$\times$ less effective at attacking a prediction pipeline than at attacking the model alone. We explain this discrepancy by the fact that most preprocessors introduce some notion of invariance to the input space. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system. Our preprocessor extraction method requires only a few hundred queries, and our preprocessor-aware attacks recover the same efficacy as when attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.
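
To make the invariance argument concrete, here is a toy sketch, assuming a quantization preprocessor in front of a trivial hard-label classifier. All names (`quantize`, `classifier`, `pipeline`, `wasted_queries`) and the specific preprocessor are hypothetical choices for illustration and are not taken from the paper or its repository. Any perturbation that stays inside one quantization bin is mapped to the same preprocessed input, so a query spent on it cannot reveal anything new about the model.

```python
def quantize(x, levels=8):
    """Toy preprocessor: snap each feature in [0, 1] to one of `levels` values."""
    step = 1.0 / (levels - 1)
    return [round(v / step) * step for v in x]

def classifier(x):
    """Toy hard-label model: 1 if the mean feature exceeds 0.5, else 0."""
    return int(sum(x) / len(x) > 0.5)

def pipeline(x):
    """What the attacker actually queries: preprocessor followed by model."""
    return classifier(quantize(x))

def wasted_queries(x, perturbations):
    """Count perturbed queries whose preprocessed input is identical to that of x."""
    base = quantize(x)
    return sum(
        quantize([a + d for a, d in zip(x, p)]) == base for p in perturbations
    )

x = [0.43, 0.57, 0.29]
perturbations = [[0.01] * 3, [-0.01] * 3, [0.1] * 3]
n_wasted = wasted_queries(x, perturbations)
```

In this example the two small perturbations fall into the same bins as `x` and are wasted, while the larger one changes the quantized input. An attack that knows the bin width could skip the first two queries.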

翻訳日:2023-07-24 16:48:33 公開日:2023-07-20

# 逆ベイズシミュレーション

Adversarial Bayesian Simulation ( http://arxiv.org/abs/2208.12113v2 )

ライセンス: Link先を確認

Yuexi Wang, Veronika Ročková

(参考訳) 明示的あるいは扱いやすい可能性がない場合、ベイジアンはしばしば推定のために近似ベイジアン計算(abc)に頼る。我々の研究は、GAN(Generative Adversarial Network)と逆効果ベイズに基づくディープ・ニューラル暗黙のサンプルでABCを橋渡しする。 abcとgansは、観測データと偽データとを比較して、それぞれ後方と確率からシミュレートする。我々は, 逆最適化問題を解くことで, 直接後方を狙うベイズ型GAN(B-GAN)サンプリング器を開発した。 B-GANは条件付きGANによってABC参照で学習された決定論的マッピングによって駆動される。マッピングがトレーニングされた後、ノイズを無視可能な追加コストでフィルタリングすることで、後部サンプルを得る。 1) 重み付けを重要視するデータ駆動型提案と, (2) 変分ベイズを用いて, 処理後の局所的な改良を2つ提案する。本研究は,ニューラルネットワーク生成器や識別器において,真と近似後部の典型的な総変動距離が0に収束することを示す。シミュレーションデータを用いた結果,近年の近未来型後方シミュレータと比較して高い競争性能を示した。

In the absence of explicit or tractable likelihoods, Bayesians often resort to approximate Bayesian computation (ABC) for inference. Our work bridges ABC with deep neural implicit samplers based on generative adversarial networks (GANs) and adversarial variational Bayes. Both ABC and GANs compare aspects of observed and fake data to simulate from posteriors and likelihoods, respectively. We develop a Bayesian GAN (B-GAN) sampler that directly targets the posterior by solving an adversarial optimization problem. B-GAN is driven by a deterministic mapping learned on the ABC reference by conditional GANs. Once the mapping has been trained, iid posterior samples are obtained by filtering noise at a negligible additional cost. We propose two post-processing local refinements using (1) data-driven proposals with importance reweighting, and (2) variational Bayes. We support our findings with frequentist-Bayesian results, showing that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators. Our findings on simulated data show highly competitive performance relative to some of the most recent likelihood-free posterior simulators.
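
For readers unfamiliar with ABC, the following is a minimal sketch of plain ABC rejection sampling, the baseline idea of comparing observed and fake data that the abstract refers to. It is not the B-GAN sampler. The Gaussian toy model, the uniform prior, the tolerance, and the name `abc_rejection` are assumptions chosen for illustration.

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, tol, n_draws, rng):
    """Keep prior draws whose simulated summary lands within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)    # draw a parameter from the prior
        fake = simulate(theta, rng)  # simulate a fake dataset given theta
        if abs(summary(fake) - s_obs) <= tol:
            accepted.append(theta)
    return accepted

# Toy setup (hypothetical): unknown mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.mean,
    tol=0.2,
    n_draws=5000,
    rng=rng,
)
```

The accepted draws approximate the posterior over the mean and concentrate around the observed sample mean. Most prior draws are rejected, which is the inefficiency that learned samplers of the kind described in the abstract aim to avoid.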

翻訳日:2023-07-24 16:47:30 公開日:2023-07-20

# 単一量子ビットセンサを用いた2次元双極子スピンアンサンブルの探索ダイナミクス

Probing dynamics of a two-dimensional dipolar spin ensemble using single qubit sensor ( http://arxiv.org/abs/2207.10688v2 )

ライセンス: Link先を確認

Kristine Rezai, Soonwon Choi, Mikhail D. Lukin, Alexander O. Sushkov

(参考訳) 量子多体系の微視的熱化ダイナミクスを理解することは、現代の統計物理学の中心的な課題の一つである。ここでは,ダイヤモンド結晶表面上の電子スピンの2次元アンサンブルにおける個々のスピンダイナミクスを実験的に検討する。表面近傍nv中心をナノスケール磁気センサとして、双極子相互作用面スピンアンサンブルにおける個々のスピンの相関ダイナミクスを調べる。各スピンの緩和速度は, 近傍の磁場変動の時間スケールと強く相関し, 自在に推定された双極子相互作用強度に基づいて, ネイブ期待よりも著しく遅いことが観察された。この不規則に緩やかな緩和速度は、強い動的障害の存在によるものであり、動的共鳴計数に基づく定量的な説明を示す。最後に、共振スピンロック駆動を用いて局所磁場の有効強度を制御し、異なる状態における動的障害の役割を明らかにする。我々の研究は、強く相互作用する無秩序なスピンアンサンブルにおける量子熱化の微視的研究と制御への道を開いた。

Understanding the thermalization dynamics of quantum many-body systems at the microscopic level is among the central challenges of modern statistical physics. Here we experimentally investigate individual spin dynamics in a two-dimensional ensemble of electron spins on the surface of a diamond crystal. We use a near-surface NV center as a nanoscale magnetic sensor to probe correlation dynamics of individual spins in a dipolar interacting surface spin ensemble. We observe that the relaxation rate for each spin is significantly slower than the naive expectation based on independently estimated dipolar interaction strengths with nearest neighbors and is strongly correlated with the timescale of the local magnetic field fluctuation. We show that this anomalously slow relaxation rate is due to the presence of strong dynamical disorder and present a quantitative explanation based on dynamic resonance counting. Finally, we use resonant spin-lock driving to control the effective strength of the local magnetic fields and reveal the role of the dynamical disorder in different regimes. Our work paves the way towards microscopic study and control of quantum thermalization in strongly interacting disordered spin ensembles.

翻訳日:2023-07-24 16:45:57 公開日:2023-07-20

# 正規化リスク最小化のための分布シフト下の単調リスク関係

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization ( http://arxiv.org/abs/2210.11589v2 )

ライセンス: Link先を確認

Daniel LeJeune, Jiayu Liu, Reinhard Heckel

(参考訳) 機械学習システムは、トレーニング分布とは異なる分布から引き出されたデータに適用されることが多い。近年の研究では,様々な分類・信号再構成問題に対して,分布外性能と分布内性能との相関が強く示されている。この関係やより一般に単調な関係が成り立つと、それは重要な結果をもたらす。例えば、あるディストリビューションのパフォーマンスを、もう一方のパフォーマンスのプロキシとして最適化することができる。本稿では,2つの分布におけるモデルの性能の単調な関係が期待できる条件について検討する。共変量シフトの下でのリッジ正規化一般線形モデルの二乗誤差に対する完全漸近線形関係と誤分類誤差に対する単調関係および線形逆問題に対する近似線形関係を証明した。

Machine learning systems are often applied to data that is drawn from a different distribution than the training distribution. Recent work has shown that for a variety of classification and signal reconstruction problems, the out-of-distribution performance is strongly linearly correlated with the in-distribution performance. If this relationship or more generally a monotonic one holds, it has important consequences. For example, it allows one to optimize performance on one distribution as a proxy for performance on the other. In this paper, we study conditions under which a monotonic relationship between the performances of a model on two distributions is expected. We prove an exact asymptotic linear relation for squared error and a monotonic relation for misclassification error for ridge-regularized general linear models under covariate shift, as well as an approximate linear relation for linear inverse problems.

翻訳日:2023-07-24 16:37:26 公開日:2023-07-20

# ADPS:画像異常検出のための非対称蒸留後分離法

ADPS: Asymmetric Distillation Post-Segmentation Method for Image Anomaly Detection ( http://arxiv.org/abs/2210.10495v2 )

ライセンス: Link先を確認

Peng Xing, Hao Tang, Jinhui Tang, Zechao Li

(参考訳) 知識蒸留に基づく異常検出(KDAD)手法は,両ネットワークが抽出した特徴を対比することにより,異常領域の検出とセグメント化を行う教師学生パラダイムに依存している。しかし、既存のKDADメソッドには2つの制限がある。 1)生徒ネットワークは、教師ネットワークの表現を必死に再現することができ、 2)教師ネットワークの特徴は「参照基準」としてのみ機能し,完全に活用されていない。この目的のために、確立されたパラダイムから離れ、代わりに非対称蒸留ポストセグメンテーション(ADPS)と呼ばれる革新的なアプローチを提案する。我々のADPSは教師-学生ネットワークの入力と同一の画像の異なる形態の非対称蒸留パラダイムを採用し、学生ネットワークに異常領域の識別表現を学習させる。一方,非対称パラダイムから得られた蒸留知識を教師ネットワークに伝達する粗い局所化マスクを生成するために,カスタマイズされた重みマスクブロック(wmb)を提案する。 WMBを組み込んだPSM(Post-Segmentation Module)は,微細な構造と明確な境界を持つ異常領域を効果的に検出し,分割することができる。実験の結果,ADPSは異常の検出とセグメント化において最先端の手法よりも優れていた。驚いたことに、ADPSは平均精度(AP)を、MVTec ADとKolektorSDD2データセットでそれぞれ9%、20%改善している。

Knowledge Distillation-based Anomaly Detection (KDAD) methods rely on the teacher-student paradigm to detect and segment anomalous regions by contrasting the unique features extracted by both networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged. Toward this end, we depart from the established paradigm and instead propose an innovative approach called Asymmetric Distillation Post-Segmentation (ADPS). Our ADPS employs an asymmetric distillation paradigm that takes distinct forms of the same image as the input of the teacher-student networks, driving the student network to learn discriminating representations for anomalous regions. Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a coarse anomaly localization mask that transfers the distilled knowledge acquired from the asymmetric paradigm to the teacher network. Equipped with WMB, the proposed Post-Segmentation Module (PSM) is able to effectively detect and segment abnormal regions with fine structures and clear boundaries. Experimental results demonstrate that the proposed ADPS outperforms the state-of-the-art methods in detecting and segmenting anomalies. Surprisingly, ADPS significantly improves the Average Precision (AP) metric by 9% and 20% on the MVTec AD and KolektorSDD2 datasets, respectively.

翻訳日:2023-07-24 16:37:10 公開日:2023-07-20

# 協調進化探索によるML対応自律システムの危険性境界の同定

Identifying the Hazard Boundary of ML-enabled Autonomous Systems Using Cooperative Co-Evolutionary Search ( http://arxiv.org/abs/2301.13807v2 )

ライセンス: Link先を確認

Sepehr Sharifi, Donghwan Shin, Lionel C. Briand and Nathan Aschbacher

(参考訳) 機械学習(ML)対応自律システム(MLAS)では,解析対象のMLAS内のMLコンポーネント(MLC)の危険境界を識別することが不可欠である。このような境界は、ハザードに繋がりうる条件をMLCの振る舞いとシステムコンテキストの観点で捉えているため、例えば、ハザード境界に到達した際に事前に定義されたフォールバック機構を実行時に起動する安全モニターの構築に利用できる。しかし、このようなMLコンポーネントのハザード境界を決定することは困難である。これは、システムコンテキスト(シナリオ)とMLCの振る舞い(入力と出力)を組み合わせた問題空間が、徹底的な探索には大きすぎ、遺伝的アルゴリズムのような従来のメタヒューリスティクスで扱うことさえ難しいためである。さらに、MLASの安全性違反を判定するために必要なシミュレーションの計算コストが高いため、この問題はさらに難しくなる。さらに、シミュレーションにおける制御不能なパラメータとMLASにおけるMLモデル(例えばディープニューラルネットワーク)の非線形な振る舞いのために、問題空間内の領域が決定論的に安全または安全でないと考えることは非現実的である。これらの課題に対処するために,協調共進化アルゴリズム(CCEA)に基づく新しい手法であるMLCSHE(ML Component Safety Hazard Envelope)を提案する。この手法は、高次元問題を2つの低次元探索部分問題に分解して取り組むことを目的としている。さらに,安全な領域と安全でない領域を確率論的に捉え,確率的ハザード境界からの距離を測定する新しい適合度関数を定義し,探索を効果的に推進する。複雑な自律走行車(AV)のケーススタディでMLCSHEの有効性と効率を評価した。評価の結果,MLCSHEは標準的な遺伝的アルゴリズムやランダム探索よりも有意に効果的かつ効率的であることが示された。

In Machine Learning (ML)-enabled autonomous systems (MLASs), it is essential to identify the hazard boundary of ML Components (MLCs) in the MLAS under analysis. Given that such a boundary captures the conditions in terms of MLC behavior and system context that can lead to hazards, it can then be used to, for example, build a safety monitor that can trigger any predefined fallback mechanisms at runtime when reaching the hazard boundary. However, determining such a hazard boundary for an ML component is challenging. This is due to the problem space combining system contexts (i.e., scenarios) and MLC behaviors (i.e., inputs and outputs) being far too large for exhaustive exploration and even to handle using conventional metaheuristics, such as genetic algorithms. Additionally, the high computational cost of simulations required to determine any MLAS safety violations makes the problem even more challenging. Furthermore, it is unrealistic to consider a region in the problem space deterministically safe or unsafe due to the uncontrollable parameters in simulations and the non-linear behaviors of ML models (e.g., deep neural networks) in the MLAS under analysis. To address these challenges, we propose MLCSHE (ML Component Safety Hazard Envelope), a novel method based on a Cooperative Co-Evolutionary Algorithm (CCEA), which aims to tackle a high-dimensional problem by decomposing it into two lower-dimensional search subproblems. Moreover, we take a probabilistic view of safe and unsafe regions and define a novel fitness function to measure the distance from the probabilistic hazard boundary and thus drive the search effectively. We evaluate the effectiveness and efficiency of MLCSHE on a complex Autonomous Vehicle (AV) case study. Our evaluation results show that MLCSHE is significantly more effective and efficient compared to a standard genetic algorithm and random search.

翻訳日:2023-07-24 16:28:29 公開日:2023-07-20

# マニフォールドニューラルネットワークの収束率

A Convergence Rate for Manifold Neural Networks ( http://arxiv.org/abs/2212.12606v2 )

ライセンス: Link先を確認

Joyce Chew and Deanna Needell and Michael Perlmutter

(参考訳) 幾何深層学習の急速に発展する分野は、グラフや多様体のような非ユークリッド領域でそのようなデータを解析するためのニューラルネットワークアーキテクチャの開発を目指している。 Z. Wang, L. Ruiz, A. Ribeiroの最近の研究は、ラプラスベルトラミ作用素のスペクトル分解を用いて多様体ニューラルネットワークを構築する方法を紹介している。さらに,本研究では,多様体が未知かつ有限個のサンプル点しかアクセスできない場合に,そのようなニューラルネットワークを実装するための数値スキームを提案する。著者らは、データ駆動グラフの構築に依存するこのスキームは、標本点の数が無限になるにつれて連続限界に収束することを示した。ここでは、多様体の内在次元に依存するが、周囲次元とは独立な収束率を確立することにより、この結果の上に構築する。また,収束速度は,ネットワークの深さと各層で使用されるフィルタ数にどのように依存するかについても検討した。

High-dimensional data arises in numerous applications, and the rapidly developing field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace-Beltrami operator. Moreover, in that work, the authors provide a numerical scheme for implementing such neural networks when the manifold is unknown and one only has access to finitely many sample points. The authors show that this scheme, which relies upon building a data-driven graph, converges to the continuum limit as the number of sample points tends to infinity. Here, we build upon this result by establishing a rate of convergence that depends on the intrinsic dimension of the manifold but is independent of the ambient dimension. We also discuss how the rate of convergence depends on the depth of the network and the number of filters used in each layer.

翻訳日:2023-07-24 16:27:26 公開日:2023-07-20

# 強磁場中におけるスカラー荷電粒子によるツイスト光子の放出

Emission of twisted photons by a scalar charged particle in a strong magnetic field ( http://arxiv.org/abs/2303.01946v2 )

ライセンス: Link先を確認

D. Karlovets, A. Di Piazza

(参考訳) 一定かつ均一な磁場中におけるスカラー荷電粒子による光子の放出について考察する。光子と外部電荷の両方が検出されるという従来のアプローチとは対照的に、電荷のみが検出され、放出された光子の特性が調査される場合について検討する。背景磁場は計算において正確に考慮され、電荷は相対論的ランダウ状態によって記述される。放出された光子状態は、それぞれ初期荷電粒子と最終荷電粒子の角運動量として$\ell-\ell'$ と$\ell'$ が与えられる全角運動量を持つねじれたベッセルビームを表すことが示されている。非偏極電荷、特にハードX線と$\gamma$-ray範囲、および臨界および亜臨界磁場において、シュウィンガー値が$H_c = 4.4\times 10^9$Tと比較すると、ほとんどの光子は$\ell-\ell'\gtrsim 1$でねじられる。

We consider the emission of a photon by a scalar charged particle in a constant and uniform magnetic field. In contrast to the conventional approach with both photon and outgoing charge being assumed to be detected, we study the case where only the charge is detected and investigate the properties of the emitted photon. The background magnetic field is taken into account exactly in the calculations and the charge is described by relativistic Landau states. It is shown that the emitted photon state represents a twisted Bessel beam with a total angular momentum given by $\ell-\ell'$, where $\ell$ and $\ell'$ are angular momentum quantum numbers of the initial and final charged particle, respectively. The majority of photons emitted by unpolarized charges, especially in the hard X-ray and $\gamma$-ray range and in critical and sub-critical magnetic fields, as compared to the Schwinger value of $H_c = 4.4\times 10^9$ T, turn out to be twisted with $\ell-\ell'\gtrsim 1$.

翻訳日:2023-07-24 16:20:21 公開日:2023-07-20

# クリフォード回路を用いた分割量子化学シミュレーション

Partitioning Quantum Chemistry Simulations with Clifford Circuits ( http://arxiv.org/abs/2303.01221v2 )

ライセンス: Link先を確認

Philipp Schleich, Joseph Boen, Lukasz Cincio, Abhinav Anand, Jakob S. Kottmann, Sergei Tretiak, Pavel A. Dub, Alán Aspuru-Guzik

(参考訳) 現在の量子コンピューティングハードウェアは、量子コンピュータ上での量子化学計算において、より大きく複雑な分子の研究を短期的に制限するわずかなノイズ量子ビットの可用性によって制限されている。本研究では,量子回路と変分量子固有解器の枠組みに留まりながら,古典的および近古典的処理の限界について検討する。この目的のために,分離可能なペア ansatz 形式を適応させたパラメトリズド波動関数に対して,naive と physical に動機づけられ,古典的に効率的な積 ansatz を考える。このアンサッツから派生したサブシステム間の相互作用を考慮した後処理と組み合わせる。古典的処理は、強制されたサブシステム間の支持を持ち、ハミルトニアンに折り畳まれる別の量子回路によって与えられる。ハミルトン項の数が指数関数的に増加するのを避けるために、エンタングリング演算は純粋にクリフォード回路または近クリフォード回路から構成される。クリフォード回路は古典的に効率的にシミュレートできるが、それらは普遍的ではない。表現性の欠如を考慮し、選択された非クリフォードゲートの少ない近クリフォード回路を用いる。この目的を達成するための正確な回路構造は分子に依存し、シミュレートアニーリングと遺伝的アルゴリズムを用いて構築される。関心の分子の集合に対する我々のアプローチを実証し、方法論の到達範囲について検討する。本手法の数値シミュレーションによる実証的検証により, 分離可能なペア・アンサッツと比較して, 最大50\%の量子ビット数の減少が確認された。

Current quantum computing hardware is restricted by the availability of only a few noisy qubits, which limits the investigation of larger, more complex molecules in quantum chemistry calculations on quantum computers in the near term. In this work, we investigate the limits of their classical and near-classical treatment while staying within the framework of quantum circuits and the variational quantum eigensolver. To this end, we consider naive and physically motivated, classically efficient product ansatz for the parametrized wavefunction adapting the separable pair ansatz form. We combine it with post-treatment to account for interactions between subsystems originating from this ansatz. The classical treatment is given by another quantum circuit that has support between the enforced subsystems and is folded into the Hamiltonian. To avoid an exponential increase in the number of Hamiltonian terms, the entangling operations are constructed from purely Clifford or near-Clifford circuits. While Clifford circuits can be simulated efficiently classically, they are not universal. In order to account for missing expressibility, near-Clifford circuits with only a few selected non-Clifford gates are employed. The exact circuit structure to achieve this objective is molecule-dependent and is constructed using simulated annealing and genetic algorithms. We demonstrate our approach on a set of molecules of interest and investigate the extent of our methodology's reach. Empirical validation of our approach using numerical simulations shows a reduction in qubit count of up to 50% at similar accuracy compared to the separable-pair ansatz.

翻訳日:2023-07-24 16:19:58 公開日:2023-07-20

# 単一分子における刺激ラマン転移のスペクトル分割

Spectral splitting of a stimulated Raman transition in a single molecule ( http://arxiv.org/abs/2302.14733v2 )

ライセンス: Link先を確認

Johannes Zirkelbach, Burak Gurlek, Masoud Mirzaei, Alexey Shkarin, Tobias Utikal, Stephan Götzinger, Vahid Sandoghdar

(参考訳) ラマン散乱の小さな断面積は、単分子レベルでの直接研究にとって大きな課題となる。共振共振の高フランク・コンドン係数を利用し、電子接地における大きな振動周波数差と励起状態とt < 2kでの動作を選択し、コヒーレント刺激ラマン遷移を分子内で駆動することに成功した。我々は、その現象の特徴的シグネチャとなるスペクトル分割を観察し、モデル化する。本研究は、固体量子光学および情報処理における分子の自由度を内在的に利用するための基礎を定めている。

The small cross section of Raman scattering poses a great challenge for its direct study at the single-molecule level. By exploiting the high Franck-Condon factor of a common-mode resonance, choosing a large vibrational frequency difference in electronic ground and excited states, and operating at T < 2 K, we succeed in driving a coherent stimulated Raman transition in individual molecules. We observe and model a spectral splitting that serves as a characteristic signature of the phenomenon at hand. Our study sets the ground for exploiting the intrinsic optomechanical degrees of freedom of molecules for applications in solid-state quantum optics and information processing.

翻訳日:2023-07-24 16:19:19 公開日:2023-07-20

# 単眼単発6Dオブジェクトポース推定のためのオープンチャレンジ

Open Challenges for Monocular Single-shot 6D Object Pose Estimation ( http://arxiv.org/abs/2302.11827v2 )

ライセンス: Link先を確認

Stefan Thalhammer, Peter Hönig, Jean-Baptiste Weibel, Markus Vincze

(参考訳) オブジェクトのポーズ推定は、ロボット操作、ビンピック、拡張現実、シーン理解を可能にする非自明なタスクである。単眼物体のポーズ推定は、高性能なディープラーニングベースのソリューションの台頭とともにかなりの勢いを増し、センサが安価で推論が速いため、コミュニティにとって特に興味深い。先行研究は多種多様なポーズ推定問題に対する芸術の包括的状態を確立する。その広い範囲は将来有望な方向を特定するのを困難にしている。我々は,ロボット工学でよく用いられる単発モノクロ6Dオブジェクトのポーズ推定の問題の範囲を狭め,そのような傾向を識別することができる。ロボティクスとコンピュータビジョンに関する最近の論文をレビューすることで、両方の分野の連合に最先端の芸術が確立される。その後、研究者が関連する研究のアイデアを定式化し、技術の現状を効果的に進めるための有望な研究方向を特定した。例えば、メソッドはドメインシフトを克服するのに十分な高度であり、オクルージョンハンドリングは根本的な課題である。また,ロボット工学を進歩させる上での課題として,新規なオブジェクトポーズ推定や課題処理といった課題も強調する。

Object pose estimation is a non-trivial task that enables robotic manipulation, bin picking, augmented reality, and scene understanding, to name a few use cases. Monocular object pose estimation gained considerable momentum with the rise of high-performing deep learning-based solutions and is particularly interesting for the community since sensors are inexpensive and inference is fast. Prior works establish the comprehensive state of the art for diverse pose estimation problems. Their broad scopes make it difficult to identify promising future directions. We narrow down the scope to the problem of single-shot monocular 6D object pose estimation, which is commonly used in robotics, and thus are able to identify such trends. By reviewing recent publications in robotics and computer vision, the state of the art is established at the union of both fields. Following that, we identify promising research directions in order to help researchers to formulate relevant research ideas and effectively advance the state of the art. Findings include that methods are sophisticated enough to overcome the domain shift and that occlusion handling is a fundamental challenge. We also highlight problems such as novel object pose estimation and challenging materials handling as central challenges to advance robotics.

翻訳日:2023-07-24 16:18:45 公開日:2023-07-20

# ニューラルネットワークに基づくスペクトル推定と希少事象予測のための不正確な反復数値線形代数

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction ( http://arxiv.org/abs/2303.12534v3 )

ライセンス: Link先を確認

John Strahan, Spencer C. Guo, Chatipat Lorpaiboon, Aaron R. Dinner, Jonathan Weare

(参考訳) 複雑なシステムの力学を理解することは、多くの自由度があり、興味のある事象を記述する上で最も重要なものはしばしば明らかではない。遷移作用素の先頭の固有関数は視覚化に有用であり、イベントの確率や平均時間(予測)といった統計計算の効率的な基盤を提供することができる。ここでは、これらの固有関数(スペクトル推定)を計算し、有限間隔でサンプリングされた短い軌跡のデータセットから予測する不正確な反復線型代数法を開発する。生体分子系の可視化と高次元モデルを容易にする低次元モデル上での手法を実証する。強化学習における予測問題の意味について論じる。

Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics such as the likelihood and average time of events (predictions). Here we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a data set of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.

翻訳日:2023-07-24 16:09:30 公開日:2023-07-20

# 非一様超グラフ確率ブロックモデルの厳密な回復

Exact recovery for the non-uniform Hypergraph Stochastic Block Model ( http://arxiv.org/abs/2304.13139v2 )

ライセンス: Link先を確認

Ioana Dumitriu, Haixiao Wang

(参考訳) 非一様ハイパーグラフ確率ブロックモデル(hsbm)の下でのランダムハイパーグラフにおけるコミュニティ検出問題を考える。文献の中で初めて、この一様でないケースの下で正確な回復のための鋭いしきい値が、マイナーな制約のもとに確立された。ここでの重要なポイントは、すべての均一な層から情報を集約することで、各層が単独では不可能に見える場合であっても、正確な回復が得られることである。しきい値以上の正確な回復を達成する2つの効率的なアルゴリズムが提供される。我々のアルゴリズムの理論的解析は、非一様ランダムハイパーグラフに対する隣接行列の濃度と正規化に依存しており、これは独立な関心を持つ可能性がある。またパラメータ知識と推定に関するオープンな問題にも対処する。

Consider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints; in particular, we consider the model with multiple communities ($K \geq 2$). One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases when this may appear impossible if each layer were considered alone. Two efficient algorithms that successfully achieve exact recovery above the threshold are provided. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.

翻訳日:2023-07-24 15:59:35 公開日:2023-07-20
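
The model described in the abstract above, where hyperedges of several sizes appear independently with label-dependent probabilities, can be sketched in a few lines of Python. This is only a toy sampler under simplifying assumptions: the function name `sample_hsbm`, the `probs` layout, and the rule "probability `p_in` when all vertices share a label, `p_out` otherwise" are illustrative choices, not the paper's parameterization, which allows the probability to depend on the labels more generally.

```python
import itertools
import random


def sample_hsbm(labels, probs, seed=0):
    """Sample a non-uniform hypergraph layer by layer (toy version).

    labels: community label of each vertex.
    probs:  {edge_size: (p_in, p_out)}; a hyperedge of that size appears
            independently with probability p_in if all of its vertices
            share a label, and p_out otherwise (assumed special case).
    """
    rng = random.Random(seed)
    edges = []
    for size, (p_in, p_out) in sorted(probs.items()):
        for vertices in itertools.combinations(range(len(labels)), size):
            same_community = len({labels[v] for v in vertices}) == 1
            if rng.random() < (p_in if same_community else p_out):
                edges.append(vertices)
    return edges


# Two communities of four vertices, with a 2-uniform and a 3-uniform layer.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
edges = sample_hsbm(labels, {2: (0.6, 0.1), 3: (0.4, 0.05)})
print(len(edges), "hyperedges sampled")
```

Each layer here is a uniform hypergraph on its own; the sampled hypergraph is their union, which is the setting in which the abstract says aggregating layers can help.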

# 因果部分構造を用いたシフトロバスト分子関係学習

Shift-Robust Molecular Relational Learning with Causal Substructure ( http://arxiv.org/abs/2305.18451v3 )

ライセンス: Link先を確認

Namkyeong Lee, Kanghoon Yoon, Gyoung S. Na, Sein Kim, Chanyoung Park

(参考訳) 近年、分子対間の相互作用の振る舞いを予測することを目的とした分子関係学習が、幅広い応用のために分子科学への関心が高まっている。本研究では,分子関係学習における分布変化に頑健なCMRLを提案する。そこで我々はまず,分子科学の領域知識に基づいて因果関係を仮定し,変数間の関係を明らかにする構造因果モデル(SCM)を構築する。 SCMに基づいて, 組換え分子上での干渉を条件付けした新しい条件付き干渉機構を導入する。条件付き介入の枠組みにより,本モデルは因果的サブ構造から学習し,化学反応に急激な相関を持つショートカットサブ構造の共起効果を緩和する。実世界および合成データセットを用いた様々なタスクに関する大規模な実験は、最先端のベースラインモデルよりもCMRLの方が優れていることを示す。私たちのコードはhttps://github.com/namkyeong/cmrlで利用可能です。

Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL, a method that is robust to the distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.

翻訳日:2023-07-24 15:49:42 公開日:2023-07-20

# AIによる意思決定における精度と時間の両方に対する適応的介入

Adaptive interventions for both accuracy and time in AI-assisted human decision making ( http://arxiv.org/abs/2306.07458v2 )

ライセンス: Link先を確認

Siddharth Swaroop, Zana Buçinca, Finale Doshi-Velez

(参考訳) 緊急治療室で働く医師など、ユーザが時間的にプレッシャーをかけ、高い精度を必要とする環境では、精度を高め、時間を短縮するaiアシスタントを提供したいと思っています。しかし、異なるタイプのAIアシストには、異なる利点がある。ですから私たちは,2つの目標を最大限にトレードオフするために,さまざまな特性(質問やユーザの)に依存したAI支援に適応したいと考えています。我々は、ユーザーがエイリアンに薬を処方しなければならない研究を紹介し、それを使ってAI支援に適応する可能性を探る。私たちは、質問に応じてAI支援を適用することが有益であるという証拠を見つけ、時間と正確性の間に良いトレードオフをもたらす。今後の研究では、機械学習アルゴリズム(強化学習など)が自動的に適応することを考慮します。

In settings where users are both time-pressured and need high accuracy, such as doctors working in Emergency Rooms, we want to provide AI assistance that both increases accuracy and reduces time. However, different types of AI assistance have different benefits: some reduce time taken while increasing overreliance on AI, while others do the opposite. We therefore want to adapt what AI assistance we show depending on various properties (of the question and of the user) in order to best trade off our two objectives. We introduce a study where users have to prescribe medicines to aliens, and use it to explore the potential for adapting AI assistance. We find evidence that it is beneficial to adapt our AI assistance depending on the question, leading to good tradeoffs between time taken and accuracy. Future work would consider machine-learning algorithms (such as reinforcement learning) to automatically adapt quickly.

翻訳日:2023-07-24 15:40:42 公開日:2023-07-20

# 高次元および置換不変異常検出

High-dimensional and Permutation Invariant Anomaly Detection ( http://arxiv.org/abs/2306.03933v2 )

ライセンス: Link先を確認

Vinicius Mikuni, Benjamin Nachman

(参考訳) 新しい物理過程の異常検出法は、高次元確率密度の学習が困難であるため、しばしば低次元空間に限られる。特に構成レベルでは,一般密度推定法では置換不変性や可変長入力などの望ましい特性を組み込むことが困難となる。本研究では, 分散モデルに基づく粒子物理学データに対して, 可変長入力を扱うために特別に設計された置換不変密度推定器を提案する。本手法の有効性は,学習密度を置換不変な異常検出スコアとして利用し,背景のみの仮説の下でジェットを効果的に同定することによって実証する。密度推定法を検証するため, 教師付き分類アルゴリズムにより得られた密度の比について検討し, 比較を行った。

Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.

翻訳日:2023-07-24 15:40:02 公開日:2023-07-20

# LiDARデータを用いた埋設考古学構造物のセマンティックセグメンテーション手法のトランファー学習

Tranfer Learning of Semantic Segmentation Methods for Identifying Buried Archaeological Structures on LiDAR Data ( http://arxiv.org/abs/2307.03512v2 )

ライセンス: Link先を確認

Paolo Soleni, Wouter B. Verschoof-van der Vaart, Žiga Kokalj, Arianna Traviglia, Marco Fiorucci

(参考訳) 考古学的な研究において、深層学習をリモートセンシングデータに適用する際には、トレーニングモデルに適したデータセットが限られている。転送学習の応用は、この欠点を軽減するために頻繁に用いられる。しかし、異なる考古学的データセットに適用する場合、その有効性を調べる必要がある。本稿では,2つのlidarデータセット上の2つの意味セグメンテーション深層ニューラルネットワークを用いた,転送学習構成の性能比較を行う。実験結果から, 考古学における伝達学習に基づくアプローチは, 体系的な拡張がまだ観察されていないものの, 性能改善につながる可能性が示唆された。我々は,今後の研究のベースラインとして機能する技術の有効性について,具体的な知見を提供する。

When applying deep learning to remote sensing data in archaeological research, a notable obstacle is the limited availability of suitable datasets for training models. Transfer learning is frequently employed to mitigate this drawback. However, there is still a need to explore its effectiveness when applied across different archaeological datasets. This paper compares the performance of various transfer learning configurations using two semantic segmentation deep neural networks on two LiDAR datasets. The experimental results indicate that transfer learning-based approaches in archaeology can lead to performance improvements, although a systematic enhancement has not yet been observed. We provide specific insights about the validity of such techniques that can serve as a baseline for future work.

翻訳日:2023-07-24 15:31:30 公開日:2023-07-20

# 入力制約型mpcの直接最適化アルゴリズム

A direct optimization algorithm for input-constrained MPC ( http://arxiv.org/abs/2306.15079v4 )

ライセンス: Link先を確認

Liang Wu

(参考訳) モデル予測制御(model prediction control, mpc)アルゴリズムを本番組込みプラットフォームで実行する際の課題のひとつは,最悪の計算複雑性の証明書を提供することである。本稿では、入力制約付きMPCに対する \textit{direct} 最適化アルゴリズムを初めて提案する: 繰り返しの回数は、問題次元$n$, 正確な値 $\left\lceil\frac{\log\left(\frac{2n}{\epsilon}\right)}{-2\log(1-\frac{1}{4\sqrt{2n}})}\right\rceil+1$, ここで$\epsilon$は所定の停止精度を示す。

One challenge of running a model predictive control (MPC) algorithm in a production-embedded platform is to provide the certificate of worst-case computation complexity, that is, its maximum execution time has to always be smaller than sampling time. This paper proposes for the first time a \textit{direct} optimization algorithm for input-constrained MPC: the number of iterations is data-independent and dependent on the problem dimension $n$, with exact value $\left\lceil\frac{\log\left(\frac{2n}{\epsilon}\right)}{-2\log(1-\frac{1}{4\sqrt{2n}})}\right\rceil+1$, where $\epsilon$ denotes a given stopping accuracy.

翻訳日:2023-07-24 15:29:19 公開日:2023-07-20
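
The iteration count quoted in the abstract above is a closed-form expression in the problem dimension $n$ and the stopping accuracy $\epsilon$, so it can be evaluated directly. The helper below is a small sketch that only evaluates that formula; the function name `iteration_count` and the example values of `n` and `eps` are assumptions made for illustration, and nothing here implements the optimization algorithm itself.

```python
import math


def iteration_count(n: int, eps: float) -> int:
    """Evaluate ceil( log(2n/eps) / (-2*log(1 - 1/(4*sqrt(2n)))) ) + 1."""
    if n < 1 or eps <= 0:
        raise ValueError("n must be >= 1 and eps must be > 0")
    numerator = math.log(2 * n / eps)
    denominator = -2 * math.log(1 - 1 / (4 * math.sqrt(2 * n)))
    return math.ceil(numerator / denominator) + 1


# The bound depends only on n and eps, not on the problem data.
for n in (10, 100, 1000):
    print(n, iteration_count(n, 1e-6))
```

Because the count is data-independent, it could in principle be computed offline for a given `n` and `eps` and multiplied by a per-iteration cost to check a sampling-time budget.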

# FAIR: 判断の逆転を正確に推測するための因果関係フレームワーク

FAIR: A Causal Framework for Accurately Inferring Judgments Reversals ( http://arxiv.org/abs/2306.11585v2 )

ライセンス: Link先を確認

Minghua He, Nanfei Gu, Yuntao Shi, Qionghui Zhang, Yaying Chen

(参考訳) 人工知能研究者は近年、法的なインテリジェンスに大きな進歩を遂げている。しかし、既存の研究は、法的知性の効率の向上を制限する判断の反転に埋め込まれた重要な価値に焦点を絞ってはいない。本稿では,実際の中国語の判断をモデルとしたケースリバーサル(FAIR)の高精度推論のための因果的枠組みを提案する。因果推論法による判断反転の原因を抽出し,得られた因果関係を事前知識としてニューラルネットワークに注入する。そして、我々のフレームワークは、法的判断予測タスクとして挑戦的なデータセット上で検証される。実験の結果,提案手法は判断の反転において最も重要な要素を活用でき,得られた因果関係はニューラルネットワークの性能を効果的に改善できることがわかった。さらに、ChatGPTを例として、法的な知能タスクのための大規模言語モデルの一般化能力について論じる。実験の結果,大規模言語モデルの一般化能力にはまだ欠陥が残っており,因果関係のマイニングは,モデル予測の精度を効果的に向上し,説明できることがわかった。

Artificial intelligence researchers have made significant advances in legal intelligence in recent years. However, existing studies have not focused on the important value embedded in judgments reversals, which limits the improvement of the efficiency of legal intelligence. In this paper, we propose a causal Framework for Accurately Inferring case Reversals (FAIR), which models the problem of judgments reversals based on real Chinese judgments. We mine the causes of judgments reversals by causal inference methods and inject the obtained causal relationships into the neural network as a priori knowledge. Our framework is then validated on a challenging dataset as a legal judgment prediction task. The experimental results show that our framework can tap the most critical factors in judgments reversal, and the obtained causal relationships can effectively improve the neural network's performance. In addition, we discuss the generalization ability of large language models for legal intelligence tasks using ChatGPT as an example. Our experiment has found that the generalization ability of large language models still has defects, and mining causal relationships can effectively improve the accuracy and explainability of model predictions.

翻訳日:2023-07-24 15:28:07 公開日:2023-07-20

# 長いステップによる証明可能なより高速な勾配降下法

Provably Faster Gradient Descent via Long Steps ( http://arxiv.org/abs/2307.06324v4 )

ライセンス: Link先を確認

Benjamin Grimmer

(参考訳) 本研究は, 滑らかな凸最適化における勾配降下の収束速度を, コンピュータ支援解析手法により確実に向上させる。本理論は、多くの反復の全体的な効果を、ほとんどの一階法分析で使われる典型的な単文帰納法ではなく、一度に分析することにより、頻繁な長いステップでポリシーを段階化することを可能にする。短期的に客観的な価値を高めるための長いステップは、長期的には確実により早く収束することを示している。勾配降下のより高速な$O(1/T\log T)$レートを証明するための予想も、単純な数値検証と共に動機付けられる。

This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.

翻訳日:2023-07-24 15:20:51 公開日:2023-07-20
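
The abstract above notes that a long step may raise the objective in the short term while the iteration still converges. The toy run below illustrates only that qualitative behavior on a two-dimensional quadratic. The cyclic pattern `[1.5, 2.9]` (in units of $1/L$) is an arbitrary choice made for this sketch and is not one of the stepsize patterns certified in the paper; the example also says nothing about the improved rates proved there.

```python
def gradient_descent(eigs, x0, pattern, n_iters):
    """Gradient descent on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    Iteration k uses stepsize pattern[k % len(pattern)] / L, where
    L = max(eigs) is the smoothness constant. Returns the objective
    value at every iterate, starting with x0.
    """
    def objective(v):
        return 0.5 * sum(e * vi * vi for e, vi in zip(eigs, v))

    L = max(eigs)
    x = list(x0)
    history = [objective(x)]
    for k in range(n_iters):
        h = pattern[k % len(pattern)] / L
        # The gradient of f has components eigs[i] * x[i].
        x = [xi - h * e * xi for e, xi in zip(eigs, x)]
        history.append(objective(x))
    return history


# Hypothetical pattern: a moderate step followed by a long one (above 2/L).
history = gradient_descent([1.0, 0.1], [1.0, 1.0], [1.5, 2.9], 400)
increases = sum(1 for a, b in zip(history, history[1:]) if b > a)
print("iterations that increased the objective:", increases)
print("final objective:", history[-1])
```

A constant stepsize of 2.9/L would diverge on this problem, so the convergence seen here comes from the combined effect of the two steps in each cycle, which is the kind of multi-iteration view the abstract describes.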

# ZeroQuant-FP:浮動小数点フォーマットを用いたLLM後のW4A8量子化

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats ( http://arxiv.org/abs/2307.09782v2 )

ライセンス: Link先を確認

Xiaoxia Wu and Zhewei Yao and Yuxiong He

(参考訳) 大規模言語モデル(LLM)の複雑な領域では、計算効率とモデル品質の維持のバランスを崩すことは、非常に難しい課題である。均一量子化の本質的な限界をナビゲートし、特に外れ値を扱う場合、NVIDIAのH100ハードウェアのローンチによって動機づけられたこの研究は、浮動小数点量子化(FP)の生存可能性、特にFP8とFP4に焦点をあてる。我々の総合的な調査によると、LLMでは、FP8のアクティベーションは整数(INT8)を一貫して上回り、性能エッジは10億を超えるパラメータを持つモデルでより顕著になる。重量量子化では、FP4はINT4に匹敵する性能を示し、H100のようなFP対応ハードウェアへの展開を単純化している。重みとアクティベーションの差に起因する精度アライメントのオーバーヘッドを軽減するため、標準のw4a8モデルと比較して性能に悪影響を及ぼす2つの重み量子化のスケーリング制約を提案する。さらに、低ランク補償(LoRC)戦略を統合することで量子化手法を強化し、特に小型モデルにおいて改善をもたらす。本研究は, LLMにおけるFP量子化の可能性を強調し, 資源制限環境における高効率展開の道を開くものである。

In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing with outliers, and motivated by the launch of NVIDIA's H100 hardware, this study delves into the viability of floating-point (FP) quantization, particularly focusing on FP8 and FP4, as a potential solution. Our comprehensive investigation reveals that for LLMs, FP8 activation consistently outshines its integer (INT8) equivalent, with the performance edge becoming more noticeable in models possessing parameters beyond one billion. For weight quantization, our findings indicate that FP4 exhibits comparable, if not superior, performance to INT4, simplifying deployment on FP-supported hardware like H100. To mitigate the overhead from precision alignment caused by the disparity between weights and activations, we propose two scaling constraints for weight quantization that negligibly impact the performance compared to the standard W4A8 model. We additionally enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, yielding improvements especially in smaller models. The results of our investigation emphasize the immense potential of FP quantization for LLMs, paving the way for high-efficiency deployment in resource-limited settings.

翻訳日:2023-07-24 15:09:57 公開日:2023-07-20
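
To make the FP4 versus INT4 comparison in the abstract above more concrete, the sketch below rounds a small weight vector onto two 4-bit grids. It assumes the E2M1 layout for FP4 (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and a symmetric INT4 grid with magnitudes 0 to 7; both grids, the per-tensor max-abs scaling, and the sample weights are assumptions for illustration, not the paper's quantization pipeline or its scaling constraints.

```python
import math

# Representable magnitudes (sign handled separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # assumed E2M1 layout
INT4_GRID = [float(i) for i in range(8)]                    # symmetric INT4, 0..7


def quantize(weights, grid):
    """Scale so the largest magnitude maps to the grid maximum, round each
    weight to the nearest grid magnitude, then scale back."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / grid[-1]
    quantized = []
    for w in weights:
        nearest = min(grid, key=lambda g: abs(abs(w) / scale - g))
        quantized.append(math.copysign(nearest * scale, w))
    return quantized


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights with a single outlier (1.2).
weights = [0.02, -0.05, 0.11, -0.3, 0.07, 1.2, -0.01, 0.04]
for name, grid in (("FP4", FP4_E2M1_GRID), ("INT4", INT4_GRID)):
    q = quantize(weights, grid)
    print(name, [round(v, 3) for v in q], mean_squared_error(weights, q))
```

The floating-point grid is denser near zero and coarser near the maximum, while the integer grid is evenly spaced; which one gives lower error depends on the weight distribution, so this toy vector should not be read as evidence for either format.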

# 単一回路を用いた量子ニューラルネットワークの全てのパラメータに関する勾配の計算

Computing the gradients with respect to all parameters of a quantum neural network using a single circuit ( http://arxiv.org/abs/2307.08167v2 )

ライセンス: Link先を確認

Guang Ping He

(参考訳) パラメータシフト規則を用いて量子ニューラルネットワークの勾配を計算する場合、ネットワークの1つの調整可能なパラメータに対して、勾配に対してコスト関数を2回計算する必要がある。パラメータの総数が多い場合には、計算のための量子回路を何度も調整して実行しなければならない。本稿では,回路深度を小さくし,古典レジスタを小さくした単一回路のみを用いた勾配計算手法を提案する。また、実量子ハードウェアとシミュレータの両方で実験により、回路が従来の手法よりもはるかに短い時間でコンパイルできるという利点があり、結果として全体の実行速度が向上することを示した。

When computing the gradients of a quantum neural network using the parameter-shift rule, the cost function needs to be calculated twice for the gradient with respect to a single adjustable parameter of the network. When the total number of parameters is high, the quantum circuit for the computation has to be adjusted and run many times. Here we propose an approach to compute all the gradients using a single circuit only, with a much reduced circuit depth and fewer classical registers. We also demonstrate experimentally, on both real quantum hardware and simulator, that our approach has the advantage that the circuit takes a significantly shorter time to compile than the conventional approach, resulting in a speedup on the total runtime.

翻訳日:2023-07-24 15:08:17 公開日:2023-07-20
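
The conventional baseline mentioned in the abstract above, the parameter-shift rule with two cost evaluations per parameter, can be sketched classically. The example assumes a toy two-qubit circuit in which RY(θ0) and RY(θ1) act on |00⟩ and Z⊗Z is measured, so the cost has the closed form cos(θ0)·cos(θ1); the circuit, the function names, and the parameter values are illustrative assumptions. It shows the conventional rule only, not the single-circuit method proposed in the paper.

```python
import math


def cost(thetas):
    """Expectation of Z (x) Z after RY(theta0), RY(theta1) act on |00>.

    For this product state the value is cos(theta0) * cos(theta1).
    """
    return math.cos(thetas[0]) * math.cos(thetas[1])


def parameter_shift_gradient(cost_fn, thetas):
    """Return (gradient, number of cost evaluations).

    Each partial derivative uses two evaluations, at theta_i + pi/2 and
    theta_i - pi/2, combined as (f_plus - f_minus) / 2.
    """
    shift = math.pi / 2
    gradient = []
    evaluations = 0
    for i in range(len(thetas)):
        plus = list(thetas)
        plus[i] += shift
        minus = list(thetas)
        minus[i] -= shift
        gradient.append((cost_fn(plus) - cost_fn(minus)) / 2)
        evaluations += 2
    return gradient, evaluations


gradient, evaluations = parameter_shift_gradient(cost, [0.3, 1.1])
print("gradient:", gradient)
print("cost evaluations:", evaluations)
```

The evaluation count grows as twice the number of parameters, which is the overhead the abstract refers to when the parameter count is high.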

# 認知症患者の日常生活行動パターンの変化を識別するためのマルコフ連鎖モデル

A Markov Chain Model for Identifying Changes in Daily Activity Patterns of People Living with Dementia ( http://arxiv.org/abs/2307.11126v1 )

ライセンス: Link先を確認

Nan Fletcher-Lloyd, Alina-Irina Serban, Magdalena Kolanko, David Wingfield, Danielle Wilson, Ramin Nilforooshan, Payam Barnaghi, and Eyal Soreq

(参考訳) 栄養失調と脱水は認知症患者(plwd)の認知機能低下と、健常者と比較して入院率の上昇に強く関係している。食事や飲酒行動の過度な変化は、しばしば栄養失調や脱水を引き起こし、認知と機能低下の進行を加速させ、生活の質を著しく低下させる。残念ながら、このような変化を客観的に検出する方法は確立されていない。本稿では,iot(internet of things, モノのインターネット)技術を用いて,73世帯のplwdから収集した家庭内モニタリングデータを分析した。コロナウイルス2019(COVID-19)パンデミックは、PLWDの行動習慣、特に飲食習慣を劇的に変えたことがこれまで示されていた。新型コロナウイルスのパンデミックを自然実験として使用し,499日間連続観察されたPLWD21世帯のキッチン活動の変化を線形混合効果モデルを用いて検討した。昼間のキッチン活動の増加と夜間のキッチン活動の著しい減少(t(147) = -2.90, p < 0.001)を報告した。さらに, 遠隔監視データに適用したマルコフモデルを用いたplwdの挙動変化を, 直接計測できない行動のプロキシとして検出する新しい解析手法を提案する。これらの結果は, PLWDの自然的環境におけるモニタリングの改善と, 反応性から積極的ケアへの転換の道を開くものである。

Malnutrition and dehydration are strongly associated with increased cognitive and functional decline in people living with dementia (PLWD), as well as an increased rate of hospitalisations in comparison to their healthy counterparts. Extreme changes in eating and drinking behaviours can often lead to malnutrition and dehydration, accelerating the progression of cognitive and functional decline and resulting in a marked reduction in quality of life. Unfortunately, there are currently no established methods by which to objectively detect such changes. Here, we present the findings of an extensive quantitative analysis conducted on in-home monitoring data collected from 73 households of PLWD using Internet of Things technologies. The Coronavirus 2019 (COVID-19) pandemic has previously been shown to have dramatically altered the behavioural habits, particularly the eating and drinking habits, of PLWD. Using the COVID-19 pandemic as a natural experiment, we conducted linear mixed-effects modelling to examine changes in mean kitchen activity within a subset of 21 households of PLWD that were continuously monitored for 499 days. We report an observable increase in day-time kitchen activity and a significant decrease in night-time kitchen activity (t(147) = -2.90, p < 0.001). We further propose a novel analytical approach to detecting changes in behaviours of PLWD using Markov modelling applied to remote monitoring data as a proxy for behaviours that cannot be directly measured. Together, these results pave the way to introduce improvements into the monitoring of PLWD in naturalistic settings and for shifting from reactive to proactive care.

翻訳日:2023-07-24 14:51:44 公開日:2023-07-20
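
The abstract above does not spell out the Markov model, so the following is only a generic sketch of the underlying idea: estimate first-order transition probabilities between sensor-derived states for two periods and compare them. The state names, the two short sequences, and the use of total variation distance as the comparison are all hypothetical choices made for this example.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Maximum-likelihood first-order transition probabilities."""
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        # Rows with no observed departures are left as all zeros.
        matrix[s] = {t: (counts[(s, t)] / total if total else 0.0) for t in states}
    return matrix


def total_variation(row_a, row_b):
    """Half the L1 distance between two transition rows."""
    return 0.5 * sum(abs(row_a[t] - row_b[t]) for t in row_a)


# Hypothetical location sequences for one household in two periods.
states = ["bedroom", "kitchen", "lounge"]
before = ["bedroom", "kitchen", "lounge", "kitchen",
          "lounge", "bedroom", "kitchen", "lounge"]
after = ["bedroom", "lounge", "lounge", "kitchen",
         "lounge", "bedroom", "lounge", "bedroom"]

p_before = transition_matrix(before, states)
p_after = transition_matrix(after, states)
for s in states:
    print(s, total_variation(p_before[s], p_after[s]))
```

A large distance for a given row would flag that the routine following that state has shifted between the two periods, which is one simple way to turn remote monitoring data into a proxy for behavior change.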

# Violating Bell inequality using weak coherent states ( http://arxiv.org/abs/2307.11123v1 )

License: see the linked page

Moslem Mahdavifar and S. M. Hashemi Rafsanjani

We present an experimental investigation of two-photon interference using a continuous-wave laser. We demonstrate the violation of the CHSH inequality using phase-randomized weak coherent states from a continuous-wave laser. Our implementation serves as an approach to reveal the quantum nature of a source that is usually considered to be classical.

Translation date: 2023-07-24 14:51:18 Publication date: 2023-07-20
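
For readers unfamiliar with the CHSH inequality, the following sketch evaluates the CHSH combination for the textbook singlet-state correlation and compares it with the bound obeyed by local deterministic strategies. It is a generic illustration only: the singlet correlation function and the angle choices are standard textbook assumptions, not the weak-coherent-state setup of the experiment.

```python
import itertools
import math

def chsh(E, a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

def singlet_correlation(x, y):
    """Textbook quantum prediction for a spin singlet measured along angles x and y."""
    return -math.cos(x - y)

# Measurement angles that maximise |S| for the singlet state.
a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
s_quantum = chsh(singlet_correlation, a, a2, b, b2)

# Local deterministic strategies: each setting has a fixed outcome of +1 or -1.
s_local_max = max(
    abs(A * B - A * B2 + A2 * B + A2 * B2)
    for A, A2, B, B2 in itertools.product([-1, 1], repeat=4)
)
```

Here `abs(s_quantum)` equals 2√2, while `s_local_max` is 2, which is the gap that a CHSH violation exploits.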

# Diffusion Models for Probabilistic Deconvolution of Galaxy Images ( http://arxiv.org/abs/2307.11122v1 )

License: see the linked page

Zhiwei Xue, Yuhang Li, Yash Patel, Jeffrey Regier

Telescopes capture images with a particular point spread function (PSF). Inferring what an image would have looked like with a much sharper PSF, a problem known as PSF deconvolution, is ill-posed because PSF convolution is not an invertible transformation. Deep generative models are appealing for PSF deconvolution because they can infer a posterior distribution over candidate images that, if convolved with the PSF, could have generated the observation. However, classical deep generative models such as VAEs and GANs often provide inadequate sample diversity. As an alternative, we propose a classifier-free conditional diffusion model for PSF deconvolution of galaxy images. We demonstrate that this diffusion model captures a greater diversity of possible deconvolutions compared to a conditional VAE.

Translation date: 2023-07-24 14:51:13 Publication date: 2023-07-20
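
The claim that PSF convolution is not invertible can be seen in a toy one-dimensional setting. In the sketch below, two different "scenes" produce exactly the same blurred observation, so no deconvolution method can tell them apart from the data alone. The two-pixel box PSF and the signals are invented for illustration and have nothing to do with real telescope PSFs.

```python
def circular_blur(signal, psf):
    """Circularly convolve a 1-D signal with a PSF given as a list of weights."""
    n = len(signal)
    return [
        sum(psf[k] * signal[(i - k) % n] for k in range(len(psf)))
        for i in range(n)
    ]

psf = [0.5, 0.5]                      # toy two-pixel box PSF (an assumption)
scene_a = [1.0, 3.0, 2.0, 4.0]
ripple = [1.0, -1.0, 1.0, -1.0]       # pattern at the highest spatial frequency
scene_b = [x + r for x, r in zip(scene_a, ripple)]

observed_a = circular_blur(scene_a, psf)
observed_b = circular_blur(scene_b, psf)
```

Because the PSF averages neighbouring pixels, the alternating `ripple` pattern is blurred to zero, which is why a posterior over candidate images is a more natural output than a single reconstruction.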

# From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it ( http://arxiv.org/abs/2307.11119v1 )

License: see the linked page

Eduardo C. Garrido-Merchán, Sara Lumbreras-Sancho

This paper adds to the efforts of evolutionary ethics to naturalize morality by providing specific insights derived from a computational ethics view. We propose a stylized model of human decision-making, which is based on Reinforcement Learning, one of the most successful paradigms in Artificial Intelligence. After the main concepts related to Reinforcement Learning have been presented, some particularly useful parallels are drawn that can illuminate evolutionary accounts of ethics. Specifically, we investigate the existence of an optimal policy (or, as we will refer to it, objective ethical principles) given the conditions of an agent. In addition, we show how this policy is learnable by means of trial and error, grounding our hypotheses in two well-known theorems from Reinforcement Learning. We conclude by discussing how the proposed framework can be enlarged to study other potentially interesting areas of human behavior from a formalizable perspective.

Translation date: 2023-07-24 14:51:00 Publication date: 2023-07-20
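
To make the notion of an optimal policy concrete, the sketch below runs value iteration on a tiny deterministic Markov decision process and reads off the greedy policy. The two-state environment, its rewards and the discount factor are all invented for illustration; the abstract does not say which environments or theorems the authors use, so this is only a generic Reinforcement Learning example.

```python
# Toy MDP: transitions[state][action] = (next_state, reward). All values are invented.
transitions = {
    0: {"stay": (0, 0.0), "move": (1, -1.0)},
    1: {"stay": (1, 2.0), "move": (0, 0.0)},
}
gamma = 0.9  # discount factor

def value_iteration(transitions, gamma, tol=1e-10):
    """Iterate the Bellman optimality operator until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(r + gamma * values[s2] for s2, r in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values

def greedy_policy(transitions, values, gamma):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for s, actions in transitions.items()
    }

values = value_iteration(transitions, gamma)
policy = greedy_policy(transitions, values, gamma)
```

In this toy case the optimal behaviour accepts a short-term cost in state 0 (the "move" action) to reach the state with a lasting reward, which is the kind of trade-off the stylized model is meant to capture.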

# Diffusion Sampling with Momentum for Mitigating Divergence Artifacts ( http://arxiv.org/abs/2307.11118v1 )

License: see the linked page

Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn

Despite the remarkable success of diffusion models in image generation, slow sampling remains a persistent issue. To accelerate the sampling process, prior studies have reformulated diffusion sampling as an ODE/SDE and introduced higher-order numerical methods. However, these methods often produce divergence artifacts, especially with a low number of sampling steps, which limits the achievable acceleration. In this paper, we investigate the potential causes of these artifacts and suggest that the small stability regions of these methods could be the principal cause. To address this issue, we propose two novel techniques. The first technique involves the incorporation of Heavy Ball (HB) momentum, a well-known technique for improving optimization, into existing diffusion numerical methods to expand their stability regions. We also prove that the resulting methods have first-order convergence. The second technique, called Generalized Heavy Ball (GHVB), constructs a new high-order method that offers a variable trade-off between accuracy and artifact suppression. Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling. Our research provides novel insights into the design of numerical methods for future diffusion work.

Translation date: 2023-07-24 14:50:42 Publication date: 2023-07-20
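
The link between momentum and stability regions can be illustrated on the scalar test equation dx/dt = λx. The sketch below compares plain explicit Euler with an Euler step whose update direction is a heavy-ball style moving average. The specific recurrence, the zero initial momentum and the parameter values are assumptions made for this illustration; the paper's HB and GHVB formulations may differ in detail.

```python
def euler(f, x0, h, steps):
    """Plain explicit Euler: x_{n+1} = x_n + h * f(x_n)."""
    x = x0
    for _ in range(steps):
        x = x + h * f(x)
    return x

def euler_heavy_ball(f, x0, h, steps, beta):
    """Euler with a heavy-ball style moving average of the update direction."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = (1.0 - beta) * v + beta * f(x)  # momentum-averaged direction
        x = x + h * v
    return x

lam = -1.0                      # test equation dx/dt = lam * x, exact solution decays
f = lambda x: lam * x
h, steps = 3.0, 50              # deliberately large step: h * lam = -3

x_euler = euler(f, 1.0, h, steps)
x_hb = euler_heavy_ball(f, 1.0, h, steps, beta=0.5)
```

With `beta=1.0` the second function reduces to plain Euler. For this particular recurrence, the real-axis stability interval for hλ widens from (-2, 0) to (-6, 0) when `beta=0.5`, so the Euler iterate blows up while the momentum variant decays, at the price of lower accuracy per step.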

# Nature of Intelligence ( http://arxiv.org/abs/2307.11114v1 )

License: see the linked page

Barco Jie You

The human brain is the substrate for human intelligence. By simulating the human brain, artificial intelligence builds computational models that have learning capabilities and perform intelligent tasks approaching the human level. Deep neural networks consist of multiple computation layers to learn representations of data and improve the state-of-the-art in many recognition domains. However, the essence of intelligence commonly represented by both humans and AI is unknown. Here, we show that the nature of intelligence is a series of mathematically functional processes that minimize system entropy by establishing functional relationships between datasets over space and time. Humans and AI have achieved intelligence by implementing these entropy-reducing processes in a reinforced manner that consumes energy. With this hypothesis, we establish mathematical models of language, unconsciousness and consciousness, predicting the evidence to be found by neuroscience and achieved by AI engineering. Furthermore, we conclude that the total entropy of the universe is conserved, and that intelligence counters the spontaneous processes to decrease entropy by physically or informationally connecting datasets that originally exist in the universe but are separated across space and time. This essay should be a starting point for a deeper understanding of the universe and us as human beings and for achieving sophisticated AI models that are tantamount to human intelligence or even superior. Furthermore, this essay argues that intelligence more advanced than that of humans should exist, provided that it reduces entropy in a more efficient energy-consuming way.

Translation date: 2023-07-24 14:50:20 Publication date: 2023-07-20

# Comparison between transformers and convolutional models for fine-grained classification of insects ( http://arxiv.org/abs/2307.11112v1 )

License: see the linked page

Rita Pucci, Vincent J. Kalkman, Dan Stowell

Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class, because species often share morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring, as they are among the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work collecting images of insects in the wild, giving experts the possibility to create improved distribution maps in all countries. There are billions of images that need to be classified automatically, and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. The field of deep learning algorithms is extremely fruitful at the state of the art (SOTA), which raises the question of how to identify the algorithm to use. We focus on the Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-based model; EfficientNet, a fully convolutional model; and ViTAE, a hybrid. We analyse the performance of the three models under identical conditions, evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performance with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional and fully transformer-based models on accuracy, and that the fully transformer-based model outperforms the others on inference speed. These results show the transformer to be robust to the shortage of samples and faster at inference time.

Translation date: 2023-07-24 14:49:54 Publication date: 2023-07-20

# Flatness-Aware Minimization for Domain Generalization ( http://arxiv.org/abs/2307.11108v1 )

License: see the linked page

Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cui

Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.

Translation date: 2023-07-24 14:49:24 Publication date: 2023-07-20
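
The two notions of flatness named in the abstract can be illustrated on one-dimensional toy losses. The sketch below assumes definitions commonly used in sharpness-aware optimisation: zeroth-order flatness as the largest loss increase within a radius ρ of the current parameters, and first-order flatness as ρ times the largest gradient magnitude within that radius. These definitions, the grid search and the quadratic losses are assumptions for illustration, not FAD's actual objective or algorithm.

```python
def zeroth_order_flatness(loss, theta, rho, samples=2001):
    """Largest loss increase within radius rho of theta (grid search in 1-D)."""
    grid = [theta - rho + 2 * rho * i / (samples - 1) for i in range(samples)]
    return max(loss(t) for t in grid) - loss(theta)

def first_order_flatness(grad, theta, rho, samples=2001):
    """rho times the largest gradient magnitude within radius rho of theta."""
    grid = [theta - rho + 2 * rho * i / (samples - 1) for i in range(samples)]
    return rho * max(abs(grad(t)) for t in grid)

# Two toy 1-D losses with the same minimiser but different curvature.
sharp = lambda t: 10.0 * t * t / 2
flat = lambda t: 0.1 * t * t / 2
sharp_grad = lambda t: 10.0 * t
flat_grad = lambda t: 0.1 * t

rho = 0.5
sharp_scores = (zeroth_order_flatness(sharp, 0.0, rho), first_order_flatness(sharp_grad, 0.0, rho))
flat_scores = (zeroth_order_flatness(flat, 0.0, rho), first_order_flatness(flat_grad, 0.0, rho))
```

Both minima have zero loss, yet the sharp one scores far higher on both measures, which is the kind of difference a flatness-aware optimizer is designed to penalise.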

# Weak universality, quantum many-body scars and anomalous infinite-temperature autocorrelations in a one-dimensional spin model with duality ( http://arxiv.org/abs/2307.11161v1 )

License: see the linked page

Adithi Udupa, Samudra Sur, Arnab Sen and Diptiman Sen

We study a one-dimensional spin-1/2 model with three-spin interactions and a transverse magnetic field $h$. The model is known to have a $Z_2 \times Z_2$ symmetry, and a duality between $h$ and $1/h$. The self-dual point at $h=1$ is a quantum critical point with a continuous phase transition. We compute the critical exponents $z$, $\beta$, $\gamma$ and $\nu$, and the central charge $c$ numerically using exact diagonalization. We find that both $z$ and $c$ are equal to $1$, implying that the critical point is governed by a conformal field theory with a marginal operator. The three-spin model exhibits Ashkin-Teller criticality with an effective coupling that is intermediate between the four-state Potts model and two decoupled transverse field Ising models. An energy level spacing analysis shows that the model is not integrable. For a system with an even number of sites and periodic boundary conditions, there are exact mid-spectrum zero-energy eigenstates whose number grows exponentially with the system size. A subset of these eigenstates have wave functions which are independent of the value of $h$ and have unusual entanglement structure; hence these can be considered to be quantum many-body scars. The number of such quantum scars scales at least linearly with system size. Finally, we study the infinite-temperature autocorrelation functions at sites close to one end of an open system. We find that some of the autocorrelators relax anomalously in time, with pronounced oscillations and very small decay rates if $h \gg 1$ or $h \ll 1$. If $h$ is close to the critical point, the autocorrelators decay quickly to zero except for an autocorrelator at the end site.

Translation date: 2023-07-24 14:44:25 Publication date: 2023-07-20

# Time-optimal multi-qubit gates: Complexity, efficient heuristic and gate-time bounds ( http://arxiv.org/abs/2307.11160v1 )

License: see the linked page

Pascal Baßler, Markus Heinrich, Martin Kliesch

Multi-qubit interactions are omnipresent in quantum computing hardware, and they can generate multi-qubit entangling gates. Such gates promise advantages over traditional two-qubit gates. In this work, we focus on the quantum gate synthesis with multi-qubit Ising-type interactions and single-qubit gates. These interactions can generate global ZZ-gates (GZZ gates). We show that the synthesis of time-optimal multi-qubit gates is NP-hard. However, under certain assumptions we provide explicit constructions of time-optimal multi-qubit gates allowing for efficient synthesis. These constructed multi-qubit gates have a constant gate time and can be implemented with linear single-qubit gate layers. Moreover, a heuristic algorithm with polynomial runtime for synthesizing fast multi-qubit gates is provided. Finally, we prove lower and upper bounds on the optimal GZZ gate-time. Furthermore, we conjecture that any GZZ gate can be executed in a time O(n) for n qubits. We support this claim with theoretical and numerical results.

Translation date: 2023-07-24 14:43:31 Publication date: 2023-07-20
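
To give a feel for what a global ZZ gate does, the sketch below lists the phases that a gate of the form exp(-i Σ_{i<j} A_ij Z_i Z_j) applies to each computational basis state. Because every Z_i Z_j term is diagonal, the gate is fully described by these phases. The sign and normalisation convention and the coupling value are assumptions for illustration and may differ from the convention used in the paper.

```python
import cmath
import itertools
import math

def gzz_phases(couplings, n_qubits):
    """Diagonal of exp(-i * sum_{i<j} A_ij Z_i Z_j) in the computational basis."""
    phases = {}
    for bits in itertools.product([0, 1], repeat=n_qubits):
        z = [1 - 2 * b for b in bits]            # Z eigenvalue: +1 for |0>, -1 for |1>
        angle = sum(a * z[i] * z[j] for (i, j), a in couplings.items())
        phases[bits] = cmath.exp(-1j * angle)
    return phases

# Hypothetical coupling: a single ZZ term of strength pi/4 between qubits 0 and 1.
phases = gzz_phases({(0, 1): math.pi / 4}, n_qubits=2)
```

The synthesis problem studied in the paper concerns how quickly a desired coupling matrix can be realised from the hardware's native interactions, which this sketch does not attempt to model.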

# Mitigating Quantum Gate Errors for Variational Eigensolvers Using Hardware-Inspired Zero-Noise Extrapolation ( http://arxiv.org/abs/2307.11156v1 )

License: see the linked page

Alexey Uvarov, Daniil Rabinovich, Olga Lakhmanskaya, Kirill Lakhmanskiy, Jacob Biamonte, Soumik Adhikary

Variational quantum algorithms have emerged as a cornerstone of contemporary quantum algorithms research. Practical implementations of these algorithms, despite offering certain levels of robustness against systematic errors, show a decline in performance due to the presence of stochastic errors and limited coherence time. In this work, we develop a recipe for mitigating quantum gate errors for variational algorithms using zero-noise extrapolation. We introduce an experimentally amenable method to control error strength in the circuit. We utilise the fact that gate errors in a physical quantum device are distributed inhomogeneously over different qubits and pairs thereof. As a result, one can achieve different circuit error sums based on the manner in which abstract qubits in the circuit are mapped to a physical device. We find that the estimated energy in the variational approach is approximately linear with respect to the circuit error sum (CES). Consequently, a linear fit through the energy-CES data, when extrapolated to zero CES, can approximate the energy estimated by a noiseless variational algorithm. We demonstrate this numerically and further prove that the approximation is exact if the two-qubit gates in the circuits are arranged in the form of a regular graph.

Translation date: 2023-07-24 14:43:07 Publication date: 2023-07-20
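
The extrapolation step described in the abstract above amounts to a straight-line fit of energy against circuit error sum (CES), evaluated at zero CES. The sketch below shows that step on synthetic, noise-free data. The energies, slope and CES values are invented for illustration; real data would scatter around the line and the intercept would only approximate the noiseless energy.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data (not from the paper): energies that grow linearly with the
# circuit error sum (CES) obtained from different qubit-to-hardware mappings.
true_energy, true_slope = -1.50, 4.0
ces_values = [0.02, 0.03, 0.05, 0.08]
energies = [true_energy + true_slope * ces for ces in ces_values]

slope, zero_noise_energy = linear_fit(ces_values, energies)
```

The intercept `zero_noise_energy` is the zero-noise estimate; each `(ces, energy)` pair would in practice come from running the same variational circuit under a different mapping of abstract qubits to physical qubits.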

# Towards General Game Representations: Decomposing Games Pixels into Content and Style ( http://arxiv.org/abs/2307.11141v1 )

License: see the linked page

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis and Georgios N. Yannakakis

On-screen game footage contains rich contextual information that players process when playing and experiencing a game. Learning pixel representations of games can benefit artificial intelligence across several downstream tasks including game-playing agents, procedural content generation, and player modelling. The generalizability of these methods, however, remains a challenge, as learned representations should ideally be shared across games with similar game mechanics. This could allow, for instance, game-playing agents trained on one game to perform well in similar games with no re-training. This paper explores how generalizable pre-trained computer vision encoders can be for such tasks, by decomposing the latent space into content embeddings and style embeddings. The goal is to minimize the domain gap between games of the same genre when it comes to game content critical for downstream tasks, and ignore differences in graphical style. We employ a pre-trained Vision Transformer encoder and a decomposition technique based on game genres to obtain separate content and style embeddings. Our findings show that the decomposed embeddings achieve style invariance across multiple games while still maintaining strong content extraction capabilities. We argue that the proposed decomposition of content and style offers better generalization capacities across game environments independently of the downstream task.

Translation date: 2023-07-24 14:42:29 Publication date: 2023-07-20

# RCVaR: an Economic Approach to Estimate Cyberattacks Costs using Data from Industry Reports ( http://arxiv.org/abs/2307.11140v1 )

License: see the linked page

Muriel Figueredo Franco, Fabian Künzler, Jan von der Assen, Chao Feng, Burkhard Stiller

Digitization increases business opportunities and the risk of companies being victims of devastating cyberattacks. Therefore, managing risk exposure and cybersecurity strategies is essential for digitized companies that want to survive in competitive markets. However, understanding company-specific risks and quantifying their associated costs is not trivial. Current approaches fail to provide individualized and quantitative monetary estimations of cybersecurity impacts. Due to limited resources and technical expertise, SMEs and even large companies are affected and struggle to quantify their cyberattack exposure. Therefore, novel approaches must be put in place to support the understanding of the financial loss due to cyberattacks. This article introduces the Real Cyber Value at Risk (RCVaR), an economic approach for estimating cybersecurity costs using real-world information from public cybersecurity reports. RCVaR identifies the most significant cyber risk factors from various sources and combines their quantitative results to estimate specific cyberattack costs for companies. Furthermore, RCVaR extends current methods to achieve cost and risk estimations based on historical real-world data instead of only probability-based simulations. The evaluation of the approach on unseen data shows the accuracy and efficiency of the RCVaR in predicting and managing cyber risks. Thus, it shows that the RCVaR is a valuable addition to cybersecurity planning and risk management processes.

Translation date: 2023-07-24 14:41:53 Publication date: 2023-07-20
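
The general "value at risk" notion behind the method's name can be sketched as an empirical quantile of historical losses. The example below uses the nearest-rank rule on a small list of invented incident costs. It illustrates only the generic concept; how RCVaR actually selects and combines risk factors from industry reports is not described in the abstract and is not reproduced here.

```python
def value_at_risk(losses, percent):
    """Nearest-rank empirical quantile: the loss not exceeded in `percent`% of cases."""
    ordered = sorted(losses)
    rank = -(-percent * len(ordered) // 100)     # ceil(percent * n / 100) with integers
    return ordered[max(rank, 1) - 1]

# Hypothetical historical incident costs in thousands of USD (invented numbers).
reported_losses = [12, 450, 35, 80, 20, 1500, 60, 25, 300, 40]
var_90 = value_at_risk(reported_losses, 90)
```

Read this way, `var_90` says that 90% of the recorded incidents cost no more than that amount, which is a statement about historical data rather than a simulated probability model.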

# Of Models and Tin Men -- a behavioural economics study of principal-agent problems in AI alignment using large-language models ( http://arxiv.org/abs/2307.11137v1 )

License: see the linked page

Steve Phelps and Rebecca Ranson

AI Alignment is often presented as an interaction between a single designer and an artificial agent in which the designer attempts to ensure the agent's behavior is consistent with its purpose, and risks arise solely because of conflicts caused by inadvertent misalignment between the utility function intended by the designer and the resulting internal utility function of the agent. With the advent of agents instantiated with large-language models (LLMs), which are typically pre-trained, we argue this does not capture the essential aspects of AI safety because in the real world there is not a one-to-one correspondence between designer and agent, and the many agents, both artificial and human, have heterogeneous values. Therefore, there is an economic aspect to AI safety and the principal-agent problem is likely to arise. In a principal-agent problem conflict arises because of information asymmetry together with inherent misalignment between the utility of the agent and its principal, and this inherent misalignment cannot be overcome by coercing the agent into adopting a desired utility function through training. We argue the assumptions underlying principal-agent problems are crucial to capturing the essence of safety problems involving pre-trained AI models in real-world situations. Taking an empirical approach to AI safety, we investigate how GPT models respond in principal-agent conflicts. We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task, showing clear evidence of principal-agent conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced behaviour in response to changes in information asymmetry, whereas the later GPT-4 model is more rigid in adhering to its prior alignment. Our results highlight the importance of incorporating principles from economics into the alignment process.

Translation date: 2023-07-24 14:41:34 Publication date: 2023-07-20

# Floquetifying the Colour Code ( http://arxiv.org/abs/2307.11136v1 )

License: see the linked page

Alex Townsend-Teague, Julio Magdalena de la Fuente, Markus Kesselring

Floquet codes are a recently discovered type of quantum error correction code. They can be thought of as generalising stabilizer codes and subsystem codes, by allowing the logical Pauli operators of the code to vary dynamically over time. In this work, we use the ZX-calculus to create new Floquet codes that are in a definable sense equivalent to known stabilizer codes. In particular, we find a Floquet code that is equivalent to the colour code, but has the advantage that all measurements required to implement it are of weight one or two. Notably, the qubits can even be laid out on a square lattice. This circumvents current difficulties with implementing the colour code fault-tolerantly, while preserving its advantages over other well-studied codes, and could furthermore allow one to benefit from extra features exclusive to Floquet codes. On a higher level, as in arXiv:2303.08829, this work shines a light on the relationship between 'static' stabilizer and subsystem codes and 'dynamic' Floquet codes; at first glance the latter seems a significant generalisation of the former, but in the case of the codes that we find here, the difference is essentially just a few basic ZX-diagram deformations.

Translation date: 2023-07-24 14:40:58 Publication date: 2023-07-20

# Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network ( http://arxiv.org/abs/2307.11130v1 )

License: see the linked page

Xueshen Li, Zhenxing Dong, Hongshan Liu, Jennifer J. Kang-Mieler, Yuye Ling and Yu Gan

Optical coherence tomography (OCT) has stimulated a wide range of medical image-based diagnosis and treatment in fields such as cardiology and ophthalmology. Such applications can be further facilitated by deep learning-based super-resolution technology, which improves the capability of resolving morphological structures. However, existing deep learning-based methods focus only on spatial distribution and disregard frequency fidelity in image reconstruction, leading to a frequency bias. To overcome this limitation, we propose a frequency-aware super-resolution framework that integrates three critical frequency-based modules (i.e., frequency transformation, frequency skip connection, and frequency alignment) and a frequency-based loss function into a conditional generative adversarial network (cGAN). We conducted a large-scale quantitative study on an existing coronary OCT dataset to demonstrate the superiority of our proposed framework over existing deep learning frameworks. In addition, we confirmed the generalizability of our framework by applying it to fish corneal images and rat retinal images, demonstrating its capability to super-resolve morphological details in eye imaging.

Translation date: 2023-07-24 14:40:37 Publication date: 2023-07-20
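
A frequency-based loss of the general kind mentioned in the abstract above can be sketched as a distance between the Fourier spectra of a reconstruction and its target. The example below uses a naive one-dimensional DFT and a mean absolute spectral difference. The loss form, the signals and the 1-D setting are assumptions for illustration; the abstract does not specify the paper's actual loss.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a 1-D real signal."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def frequency_l1_loss(prediction, target):
    """Mean absolute difference between the two spectra."""
    spec_p, spec_t = dft(prediction), dft(target)
    return sum(abs(p - q) for p, q in zip(spec_p, spec_t)) / len(target)

target = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]   # fine, high-frequency detail
blurry = [0.5] * 8                                   # same mean, detail lost
loss_same = frequency_l1_loss(target, target)
loss_blurry = frequency_l1_loss(blurry, target)
```

The blurry reconstruction matches the target's average intensity but misses its highest-frequency component entirely, and the spectral loss isolates exactly that missing component.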

# Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications ( http://arxiv.org/abs/2307.11128v1 )

License: see the linked page

Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris

(参考訳) 人工知能(AI)やDSP(Digital Signal Processing)といったドメインからの計算集約的なアプリケーションのデプロイが困難なため、コンピューティングシステムコミュニティは新たな設計アプローチを模索せざるを得なくなった。近似コンピューティングは、エネルギー効率と/または性能を改善するために、システムの設計における結果の質を調整できる新しいソリューションとして現れる。この急進的なパラダイムシフトは学術と産業の両方から興味を惹きつけ、様々な設計層(システムダウンから集積回路まで)における近似技術と方法論に大きな研究をもたらした。過去10年間にわたる近似コンピューティングの幅広い魅力に動機づけられ、重要な側面(用語や応用など)をカバーし、従来のコンピューティングスタックの全層から最先端の近似テクニックをレビューするために、2部的な調査を実施しました。本調査のパートIIでは,資源効率の高いプロセッサ/アクセラレータ・システムの設計を対象とする,アプリケーション固有およびアーキテクチャ近似技術の技術的詳細を分類,提示する。さらに,近似計算の応用スペクトルを詳細に分析し,オープンな課題と今後の方向性について考察する。

The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing appears as an emerging solution that allows designers to tune the quality of results in the design of a system in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, both of which target the design of resource-efficient processors/accelerators and systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.

翻訳日:2023-07-24 14:40:18 公開日:2023-07-20

# 暗黙的内在性下での密度マッチングによる合成制御法

Synthetic Control Methods by Density Matching under Implicit Endogeneity ( http://arxiv.org/abs/2307.11127v1 )

ライセンス: Link先を確認

Masahiro Kato and Akari Ohda and Masaaki Imaizumi and Kenichiro McAlinn

(参考訳) 合成制御法(scms)は比較事例研究において因果推論の重要なツールとなっている。 SCMの基本的な考え方は、未処理単位の観測結果の重み付け和を用いて、処理単位の対実結果を評価することである。合成制御 (SC) の精度は因果効果を推定するために重要であり, SC重量の推定が多くの研究の焦点となっている。本稿では,まず,既存のscmが非処理単位の結果と反事実的結果のモデルにおける誤差項の相関関係である暗黙的内在性問題に苦しむことを指摘した。この問題は因果効果推定器にバイアスをもたらすことを示した。次に,非処理単位の密度(すなわち混合モデル)の重み付け平均値によって処理単位の出力密度を近似できることを仮定して,密度マッチングに基づく新しいscmを提案する。この仮定に基づき,治療結果のモーメントと未治療結果のモーメントの重み付け和を一致させてsc重みを推定する。提案手法は既存手法よりも3つの利点がある。まず, 混合モデルの仮定により, 推定器は漸近的に偏りがない。第2に,漸近的不偏性により,反事実予測の平均二乗誤差を低減できる。第3に, 本手法は, 期待値だけでなく, 処理効果の完全な密度を生成し, SCMの適用範囲を広げる。提案手法の有効性を実証するための実験結果を提供する。

Synthetic control methods (SCMs) have become a crucial tool for causal inference in comparative case studies. The fundamental idea of SCMs is to estimate counterfactual outcomes for a treated unit by using a weighted sum of observed outcomes from untreated units. The accuracy of the synthetic control (SC) is critical for estimating the causal effect, and hence, the estimation of SC weights has been the focus of much research. In this paper, we first point out that existing SCMs suffer from an implicit endogeneity problem, which is the correlation between the outcomes of untreated units and the error term in the model of a counterfactual outcome. We show that this problem yields a bias in the causal effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the densities of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching moments of treated outcomes and the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods. First, our estimator is asymptotically unbiased under the assumption of the mixture model. Second, due to the asymptotic unbiasedness, we can reduce the mean squared error for counterfactual prediction. Third, our method generates full densities of the treatment effect, not only expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
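
The mixture assumption and the moment-matching condition described in this abstract can be written out as a short derivation. This is a minimal sketch in notation that is assumed here and not taken from the paper: unit $0$ is the treated unit, units $j = 1, \dots, J$ are untreated, $p_j$ is the outcome density of unit $j$, $Y_j$ is its outcome, and $w_j$ are the SC weights.

```latex
\begin{align}
  % Mixture assumption: the treated unit's outcome density is a weighted
  % average of the untreated units' densities (weights on the simplex).
  p_0(y) &= \sum_{j=1}^{J} w_j \, p_j(y),
  \qquad w_j \ge 0, \quad \sum_{j=1}^{J} w_j = 1, \\
  % Integrating y^k against both sides gives one matching condition per moment.
  \mathbb{E}\!\left[Y_0^{k}\right]
  &= \int y^{k} \, p_0(y) \, dy
   = \sum_{j=1}^{J} w_j \, \mathbb{E}\!\left[Y_j^{k}\right],
  \qquad k = 1, 2, \dots
\end{align}
```

Under this reading, the weights would be chosen so that the empirical moments on the two sides agree as closely as possible. The exact estimator and the number of moments used are not stated in the abstract.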

翻訳日:2023-07-24 14:39:56 公開日:2023-07-20

# 疫学コホート作成が管理医療データを用いたホームレスの機械学習予測と警察との対話結果に及ぼす影響

The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data ( http://arxiv.org/abs/2307.11211v1 )

ライセンス: Link先を確認

Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier

(参考訳) 背景: 精神病はホームレスや警察との交流などの有害な結果につながる可能性があり, これらの有害な結果につながる出来事の理解が重要である。予測モデルは、そのような悪影響のリスクのある個人を特定するのに役立つかもしれない。ロジスティック回帰(LR)や機械学習(ML)モデルを備えた固定された観測窓コホートを使用することで、適応的およびパーセル化されたウィンドウと比較して低い性能が得られる。方法:2013年4月1日から2018年3月31日まで,カナダ,アルバータ州カルガリーにおいて,中毒性ないし精神疾患(amh)と診断された240,219人の管理医療データセットを用いた。コホートはホームレスと警察の相互作用に関連する要因を特定するために2年間続いた。予測モデルに対するフレキシブルウィンドウの利点を理解するために、代替のコホートが作成された。そして,2つのコホートにおいて,ランダム森林(RF)を含むLRおよびMLモデルと極勾配上昇(XGBoost)を比較した。結果: 237,602人中 0.8% (1,800) が最初のホームレスとなり,0.32% (759) が237,141人の間で最初の警察活動が報告された。男性性(AORs: H=1.51, P=2.52)、物質障害(AORs: H=3.70, P=2.83)、精神科医の訪問(AORs: H=1.44, P=1.49)、薬物乱用(AORs: H=2.67, P=1.83)は初期ホームレス(H)と警察の相互作用(P)に関連していた。 XGBoostは, フレキシブルな手法(初期ホームレスに対する感度=91%, AUC=90%, 初期警察との相互作用に対する感度=90%, AUC=89%)で優れた性能を示した。

Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction, and understanding the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such adverse outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance when compared with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) conditions between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. Then LR and ML models, including random forests (RF) and extreme gradient boosting (XGBoost), were compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, while 0.32% (759) reported initial police interaction among 237,141 individuals. Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity = 91%, AUC = 90% for initial homelessness, and sensitivity = 90%, AUC = 89% for initial police interaction). Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.

翻訳日:2023-07-24 14:32:19 公開日:2023-07-20

# 臨床トライアルアクティブラーニング

Clinical Trial Active Learning ( http://arxiv.org/abs/2307.11209v1 )

ライセンス: Link先を確認

Zoe Fowler, Kiran Kokilepersaud, Mohit Prabhushankar, and Ghassan AlRegib

(参考訳) 本稿では,臨床試験環境における非独立同分布(非i.i.d.)構造を考慮したアクティブラーニングへの新しいアプローチを提案する。臨床試験には、後ろ向き(retrospective)と前向き(prospective)の2種類がある。後ろ向き臨床試験は治療後のデータを分析し,前向き臨床試験は治療の進行中にデータを収集する。通常、アクティブラーニングの手法では、トレーニングサンプルを選択する際にデータセットがi.i.d.であると仮定されるが、臨床試験の場合、治療の結果、現在の訪問時に収集されたデータと過去の訪問時のデータとの間に依存性が生じる。そこで我々は,従来のアクティブラーニング手法の限界を克服するために前向きアクティブラーニングを提案し,それを光コヒーレンス断層撮影(OCT)画像の疾患検出に適用する。ここでは,画像が収集された時点で条件付けすることでi.i.d.仮定を成立させる。提案手法を,本質的に後ろ向きと呼ぶ従来のアクティブラーニングパラダイムと比較し,前向きアクティブラーニングが2種類のテスト設定において後ろ向きアクティブラーニングを上回ることを示す。

This paper presents a novel approach to active learning that takes into account the non-independent and identically distributed (non-i.i.d.) structure of a clinical trial setting. There exist two types of clinical trials: retrospective and prospective. Retrospective clinical trials analyze data after treatment has been performed; prospective clinical trials collect data as treatment is ongoing. Typically, active learning approaches assume the dataset is i.i.d. when selecting training samples; however, in the case of clinical trials, treatment results in a dependency between the data collected at the current and past visits. Thus, we propose prospective active learning to overcome the limitations present in traditional active learning methods and apply it to disease detection in optical coherence tomography (OCT) images, where we condition on the time an image was collected to enforce the i.i.d. assumption. We compare our proposed method to the traditional active learning paradigm, which we refer to as retrospective in nature. We demonstrate that prospective active learning outperforms retrospective active learning in two different types of test settings.

翻訳日:2023-07-24 14:31:35 公開日:2023-07-20

# 存在論的根拠と言語非依存の知識グラフを目指して

Towards Ontologically Grounded and Language-Agnostic Knowledge Graphs ( http://arxiv.org/abs/2307.11206v1 )

ライセンス: Link先を確認

Walid S. Saba

(参考訳) 知識グラフ(KG)は、リコメンデーションエンジン、検索、質問応答システムなどのアプリケーションにおける事実情報の表現の標準技術となっている。しかし、KGsの継続的な更新、および異なるドメインからのKGsと異なる言語でのKGsの統合は、依然として大きな課題である。ここでの示唆は、抽象オブジェクトの再構築と、概念と型の間の存在論的区別の認識によって、KG統合の困難を緩和できる存在論的根拠と言語に依存しない表現にたどり着くことである。

Knowledge graphs (KGs) have become the standard technology for the representation of factual information in applications such as recommendation engines, search, and question-answering systems. However, the continual updating of KGs, as well as the integration of KGs from different domains and KGs in different languages, remains a major challenge. What we suggest here is that by a reification of abstract objects and by acknowledging the ontological distinction between concepts and types, we arrive at an ontologically grounded and language-agnostic representation that can alleviate the difficulties in KG integration.

翻訳日:2023-07-24 14:31:18 公開日:2023-07-20

# マイクロメカニカル共振器に結合した可変駆動型Rabiダイマーにおける光子アシスト型Landau-Zener遷移

Photon-assisted Landau Zener transitions in a tunable driven Rabi dimer coupled to a micromechanical resonator ( http://arxiv.org/abs/2307.11200v1 )

ライセンス: Link先を確認

Daniel Melvin, Fulu Zheng, Kewei Sun, Zhengjie Tan, Yang Zhao

(参考訳) 多重ダヴィドフ D$_2$ Ansatz と時間依存性の変動原理を用いて,光子アシスト型ランダウ・ツェナー遷移と量子力学デバイスにおける量子ビット操作について検討した。ラビダイマーとしてモデル化されたこのデバイスは、2つの相互作用する伝送線路共振器からなり、それぞれがキュービットに結合される。独立調和場によって駆動される量子ビットは、フォノンモードで模倣されたマイクロメカニカル共振器によってさらに変調される。 2つの独立駆動場がキュービット力学に与える影響を慎重に検討した。システム内のエネルギー図と共振器上の光子数移動を解析し、単一フォノンモードの影響を考慮してLZ遷移と量子力学の挙動を説明する。その結果、低いフォノン周波数は、特に駆動場がない場合、量子ビットのダイナミクスを変化させることができることが示され、強いフォノンカップリング強度は、高いフォノンエネルギーの流入によって、量子ビットのダイナミクスを著しく揺るがすことができる。特に、光子周波数のみが量子ビット偏波の振動周波数に影響する。この研究は、光子とフォノンがラビディマーモデルで果たす重要な役割を明らかにするものである。

Employing the multiple Davydov D$_2$ Ansatz with the time-dependent variational principle, we have investigated photon-assisted Landau-Zener (LZ) transitions and qubit manipulation in a hybrid quantum electrodynamics device. Modelled as a Rabi dimer, the device comprises two interacting transmission-line resonators, each coupled to a qubit. The qubits, driven by independent harmonic fields, are further modulated by a micromechanical resonator mimicked by a phonon mode. The impacts of two independent driving fields on the qubit dynamics are carefully examined. The energy diagram of the system and the photon number mobilization on the resonators are analyzed to explain the behaviour of the LZ transitions and qubit dynamics while taking into account the influence of the single phonon mode. Results show that low phonon frequencies can alter the qubit dynamics, particularly in the absence of the driving fields, and a strong phonon coupling strength can significantly perturb the qubit dynamics thanks to a high influx of phonon energy. Notably, only the photon frequency affects the oscillation frequency of qubit polarization. This study unveils the imperative roles that photons and phonons play in the Rabi dimer model.

翻訳日:2023-07-24 14:31:06 公開日:2023-07-20

# Lefschetz thimble計算による実時間経路積分における量子トンネルの新しい図形

A new picture of quantum tunneling in the real-time path integral from Lefschetz thimble calculations ( http://arxiv.org/abs/2307.11199v1 )

ライセンス: Link先を確認

Jun Nishimura, Katsuta Sakai, Atis Yosprakob

(参考訳) 量子トンネルは想像時間経路積分形式論においてインスタントンによって記述できることはよく知られている。しかし、実時間経路積分形式論におけるその記述は不可解である。ここでは、量子トンネルは一般に、ピカール=レフシェッツ理論を用いて同定できる複雑なサドル点の寄与によって特徴づけられるという声明を確立する。簡単な量子力学系のモンテカルロシミュレーションを実行し、一般化されたレフシェッツ・ティンブル法で符号問題を克服することでこれを明示的に実証する。複素鞍点の寄与が、原理実験によって測定できる物理量である時刻$t$で評価されるエルミート座標作用素 $\hat{x}$ の複素 ``weak value'' に現れることを数値的に確認する。また, 古典力学への変遷についても考察する。

It is well known that quantum tunneling can be described by instantons in the imaginary-time path integral formalism. However, its description in the real-time path integral formalism has been elusive. Here we establish a statement that quantum tunneling can be characterized in general by the contribution of complex saddle points, which can be identified by using the Picard-Lefschetz theory. We demonstrate this explicitly by performing Monte Carlo simulations of simple quantum mechanical systems, overcoming the sign problem by the generalized Lefschetz thimble method. We confirm numerically that the contribution of complex saddle points manifests itself in a complex ``weak value'' of the Hermitian coordinate operator $\hat{x}$ evaluated at time $t$, which is a physical quantity that can be measured by experiments in principle. We also discuss the transition to classical dynamics based on our picture.

翻訳日:2023-07-24 14:30:44 公開日:2023-07-20

# 画像異常検出のためのヒューリスティックハイパーパラメータ選択

Heuristic Hyperparameter Choice for Image Anomaly Detection ( http://arxiv.org/abs/2307.11197v1 )

ライセンス: Link先を確認

Zeyu Jiang, João P. C. Bertoldo, Etienne Decencière

(参考訳) 画像における異常検出(ad)は、ディープラーニングニューラルネットワークによる、正規性から著しく逸脱した画像を識別する基本的なコンピュータビジョン問題である。事前訓練されたモデルから抽出された深い特徴は多変量ガウス分布解析に基づいてADに必須であることが証明された。しかし、モデルは通常、imagenetのような分類タスクのために大きなデータセットで事前トレーニングされるので、多くの冗長なフィーチャをadに生成し、計算コストを増加させ、パフォーマンスを低下させる可能性がある。我々はこれらの特徴に対してNPCA(Negated principal Component Analysis)の次元削減を図る。そこで我々は,NPCAアルゴリズムのハイパーパラメータを極力少ない機能として選択し,優れた性能を確保するためのヒューリスティックな提案を行った。

Anomaly detection (AD) in images is a fundamental computer vision problem in which deep neural networks are used to identify images deviating significantly from normality. The deep features extracted from pretrained models have proved to be essential for AD based on multivariate Gaussian distribution analysis. However, since models are usually pretrained on a large dataset for classification tasks such as ImageNet, they might produce many redundant features for AD, which increases computational cost and degrades performance. We aim to reduce the dimension of these features using Negated Principal Component Analysis (NPCA). We therefore propose heuristics for choosing the hyperparameter of the NPCA algorithm so as to keep as few feature components as possible while ensuring good performance.

翻訳日:2023-07-24 14:30:28 公開日:2023-07-20

# 1次元導波路QEDシステムにおける多重サイドバンド干渉による信号増幅

Signal Amplification Assisted by Multiple Sideband Interference in 1D Waveguide QED Systems ( http://arxiv.org/abs/2307.11174v1 )

ライセンス: Link先を確認

Kuan-Ting Lin, Ting Hsu, Yu-Chen Lin, Io-Chun Hoi and Guin-Dar Lin

(参考訳) 本研究では1次元導波路量子電磁力学系における複数のRabiサイドバンドコヒーレンスによる信号増幅について理論的に検討する。具体的には、半無限導波路を介してコヒーレントマイクロ波場によって強く駆動されるトランスモンの挙動を探索する。増幅のメカニズムを理解するために,複数の服を着たサイドバンドを強い駆動場下で明示的に考慮し,プローブ信号の反射振幅を分析する理論を開発した。以上の結果から,増幅は集団反転または複数のサイドバンド構成的干渉と関連する可能性が示唆された。さらに、増幅過程におけるqubit dephasingの効果について検討する。

This study theoretically investigates signal amplification resulting from multiple Rabi sideband coherence in a one-dimensional waveguide quantum electrodynamical system. Specifically, we explore the behavior of a transmon while strongly driven by a coherent microwave field through a semi-infinite waveguide. To understand the underlying mechanisms of amplification, we develop a theory that explicitly takes into account multiple dressed sidebands under a strong driving field, and analyze the reflection amplitude of the probe signal. Our findings show that amplification can be related to either population inversion or multiple sideband constructive interference in some cases without population inversion. We further examine the effect of qubit dephasing during the amplification process.

翻訳日:2023-07-24 14:30:15 公開日:2023-07-20

# UMLS-KGI-BERT:バイオメディカルエンティティ認識のためのトランスフォーマにおけるデータ中心知識の統合

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition ( http://arxiv.org/abs/2307.11170v1 )

ライセンス: Link先を確認

Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Geouriot

(参考訳) 近年,事前学習型トランスフォーマー言語モデル (LM) が応用NLPの主流となっている。これらのモデルは、情報抽出、質問応答、感情分析、文書分類などのタスクで最先端のパフォーマンスを達成した。生物医学領域では、このパラダイムをドメイン固有の知識の統合と言語の統計的モデリングを必要とするnlpタスクに適応させることで大きな進歩を遂げている。特に、この領域の研究は、医学文献におけるトークン分布のパターンだけでなく、umlのような用語資源に含まれる構造化情報の豊富さを考慮に入れたlmsの構築がいかに最善かという問題に焦点をあてている。この研究は、UMLSからテキストシーケンスを抽出することにより、バイオメディカルトランスフォーマーエンコーダLMの言語表現を強化するためのデータ中心パラダイムに寄与する。これにより、グラフベースの学習目標とマスク言語事前学習を組み合わせることができる。予め訓練したLMの拡張実験およびスクラッチからのトレーニングの結果から,複数の生物医学的,臨床的な名前付きエンティティ認識(NER)タスクにおける下流性能の向上が示された。

Pre-trained transformer language models (LMs) have in recent years become the dominant paradigm in applied NLP. These models have achieved state-of-the-art performance on tasks such as information extraction, question answering, sentiment analysis, document classification and many others. In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that require the integration of domain-specific knowledge as well as statistical modelling of language. In particular, research in this area has focused on the question of how best to construct LMs that take into account not only the patterns of token distribution in medical text, but also the wealth of structured information contained in terminology resources such as the UMLS. This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS. This allows for graph-based learning objectives to be combined with masked-language pre-training. Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.

翻訳日:2023-07-24 14:30:04 公開日:2023-07-20

# MuJoCo環境における離散的・連続的制御タスクのための強化学習手法の探索

Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment ( http://arxiv.org/abs/2307.11166v1 )

ライセンス: Link先を確認

Vaddadi Sai Rahul, Debajyoti Chakraborty

(参考訳) 我々は、高速な物理シミュレータであるMuJoCoを利用して、連続的な制御環境でタスクを実行し、各タスクに対する観察空間、アクションスペース、報酬などの詳細を明らかにする。本稿では,Q-learning と SARSA を離散化手法で比較し,それらをベースラインとして使用し,現在最先端の深層政策勾配法 DDPG に段階的に移行した。多数のエピソードにおいて、QlearningはSARSAより優れていたが、DDPGはいずれも少数のエピソードで優れていた。最後に、モデルハイパーパラメータを微調整し、より多くのパフォーマンスを期待しながら、より少ない時間とリソースを使うようにしました。 DDPGの新しい設計はパフォーマンスを大幅に改善すると予想したが、わずか数回で十分な平均的な報酬を得ることができた。十分な時間と計算資源を提供するパフォーマンスの向上を期待する。

We leverage the fast physics simulator MuJoCo to run tasks in a continuous control environment and reveal details such as the observation space, action space, and rewards for each task. We benchmark value-based methods for continuous control by comparing Q-learning and SARSA through a discretization approach and, using them as baselines, progressively move to one of the state-of-the-art deep policy gradient methods, DDPG. Over a large number of episodes, Q-learning outscored SARSA, but DDPG outperformed both in a small number of episodes. Lastly, we also fine-tuned the model hyperparameters, expecting to squeeze out more performance while using less time and fewer resources. We anticipated that the new design for DDPG would vastly improve performance, yet after only a few episodes, we were able to achieve decent average rewards. We expect to improve the performance given adequate time and computational resources.
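
The two tabular baselines compared in this abstract differ only in how they bootstrap from the next state. The sketch below illustrates the textbook update rules together with a simple uniform binning step of the kind a discretization approach needs. It is an assumption-laden illustration, not the authors' code: the function names, the binning scheme, and the toy numbers are all hypothetical, and no MuJoCo environment is involved.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


def discretize(x, low, high, bins):
    """Map a continuous value to a bin index in 0..bins-1 (hypothetical uniform binning)."""
    clipped = min(max(x, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)


# Toy demonstration with hand-picked values (not results from the paper).
actions = [0, 1]
q_ql = defaultdict(float)
q_sa = defaultdict(float)
for table in (q_ql, q_sa):
    table[(1, 0)] = 1.0  # value of action 0 in the next state
    table[(1, 1)] = 5.0  # value of action 1 in the next state

# Same transition for both methods; SARSA's next action is the lower-valued one.
q_learning_update(q_ql, s=0, a=0, r=1.0, s_next=1, actions=actions, alpha=0.5, gamma=1.0)
sarsa_update(q_sa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=1.0)
```

On the same transition, Q-learning backs up the value of the greedy next action while SARSA backs up the value of the action its policy actually chose, which is why the two tables diverge whenever the behaviour policy explores.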

翻訳日:2023-07-24 14:29:44 公開日:2023-07-20

# google量子ai実験におけるマイクロ波光子の結合状態のロバスト性と最終的な遅い減衰

Robustness and eventual slow decay of bound states of interacting microwave photons in the Google Quantum AI experiment ( http://arxiv.org/abs/2307.11164v1 )

ライセンス: Link先を確認

Federica Maria Surace, Olexei Motrunich

(参考訳) 可積分モデルは、崩壊することなく無限に伝播できる安定励起の存在によって特徴づけられる。これには、有名なxxzスピンチェーンモデルとその可積分フロッケモデルにおけるマルチマグノン境界状態が含まれる。 Floquetモデルを実現する最近のGoogle Quantum AI実験(A. Morvan et al., Nature 612, 240 (2022))では、統合性が壊れた場合でも、このような集合的な励起が持続していることが示されている。本稿では,実験で実現したモデルのスペクトルを,正確な対角化と物理的議論を用いて検討する。可積分モデルの厳密な境界状態の子孫に対応する孤立したバンドは、広い範囲のシステムサイズのスペクトルにおいて明らかに観測可能であることが判明した。しかし, 固有状態の局在特性の数値解析により, 境界状態が熱力学的限界で不安定になることが示唆された。崩壊率の摂動的推定は、大きなシステムサイズに対する最終的な不安定性の予測と一致する。

Integrable models are characterized by the existence of stable excitations that can propagate indefinitely without decaying. This includes multi-magnon bound states in the celebrated XXZ spin chain model and its integrable Floquet counterpart. A recent Google Quantum AI experiment [A. Morvan et al., Nature 612, 240 (2022)] realizing the Floquet model demonstrated the persistence of such collective excitations even when the integrability is broken: this observation is at odds with the expectation of ergodic dynamics in generic non-integrable systems. We here study the spectrum of the model realized in the experiment using exact diagonalization and physical arguments. We find that isolated bands corresponding to the descendants of the exact bound states of the integrable model are clearly observable in the spectrum for a large range of system sizes. However, our numerical analysis of the localization properties of the eigenstates suggests that the bound states become unstable in the thermodynamic limit. A perturbative estimate of the decay rate agrees with the prediction of an eventual instability for large system sizes.

翻訳日:2023-07-24 14:29:27 公開日:2023-07-20

# エビデンス下界のフィッシャー・ラオ勾配について

On the Fisher-Rao Gradient of the Evidence Lower Bound ( http://arxiv.org/abs/2307.11249v1 )

ライセンス: Link先を確認

Nihat Ay, Jesse van Oostrum

(参考訳) 本稿では, 変分オートエンコーダ, ヘルムホルツマシン, 自由エネルギー原理の理論において重要な役割を担っているエビデンス下界(ELBO)のフィッシャー・ラオ勾配(自然勾配とも呼ばれる)について考察する。 ELBOの自然勾配は、学習の主目的関数である目標分布からのクルバック・ライブラー・ダイバージェンスの自然勾配と関連している。情報幾何学における勾配の不変性に基づいて、主目的関数の最小化とELBOの最大化の同値性を確保するための基礎モデルの条件が提供される。

This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound, the ELBO, which plays a crucial role within the theory of the Variational Autoencoder, the Helmholtz Machine and the Free Energy Principle. The natural gradient of the ELBO is related to the natural gradient of the Kullback-Leibler divergence from a target distribution, the prime objective function of learning. Based on invariance properties of gradients within information geometry, conditions on the underlying model are provided that ensure the equivalence of minimising the prime objective function and the maximisation of the ELBO.
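
For orientation, the standard relationship between the ELBO and the Kullback-Leibler objective can be written as below. This is the textbook decomposition, not the paper's Fisher-Rao gradient result, and the notation is assumed here: $p_\theta(x, z)$ is the generative model, $q_\phi(z \mid x)$ the recognition model, and $p^{*}(x)$ the target distribution with entropy $H(p^{*})$.

```latex
\begin{align}
  % Pointwise decomposition of the log-evidence.
  \log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x, z) - \log q_\phi(z \mid x)\right]}_{\mathrm{ELBO}(\theta, \phi; x)}
   + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right), \\
  % Averaging over the target distribution links the ELBO to the KL objective.
  \mathbb{E}_{p^{*}(x)}\!\left[\mathrm{ELBO}(\theta, \phi; x)\right]
  &= -D_{\mathrm{KL}}\!\left(p^{*} \,\|\, p_\theta\right) - H(p^{*})
   - \mathbb{E}_{p^{*}(x)}\!\left[D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)\right].
\end{align}
```

Since $H(p^{*})$ does not depend on the parameters, maximising the expected ELBO and minimising $D_{\mathrm{KL}}(p^{*} \| p_\theta)$ differ only through the last term. The conditions the article derives are stated at the level of natural gradients and are not reproduced here.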

翻訳日:2023-07-24 14:24:10 公開日:2023-07-20

# ニューロモルフィックコンピューティングを用いた高エネルギー物理実験のためのオンセンサデータフィルタリング

On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments ( http://arxiv.org/abs/2307.11242v1 )

ライセンス: Link先を確認

Shruti R. Kulkarni, Aaron Young, Prasanna Date, Narasinga Rao Miniskar, Jeffrey S. Vetter, Farah Fahim, Benjamin Parpillon, Jennet Dickinson, Nhan Tran, Jieun Yoo, Corrinne Mills, Morris Swartz, Petar Maksimovic, Catherine D. Schuman, Alice Bean

(参考訳) 本研究では、高輝度大型ハドロン衝突型加速器で実施される高エネルギー物理実験において、センサエレクトロニクスからのデータフィルタリングに使用されるニューロモルフィックコンピューティングベースのスパイキングニューラルネットワーク(SNN)モデルについて述べる。本稿では,下流エレクトロニクスに送信されるデータ量を削減することを目的として,粒子の横運動量に基づいてセンサデータをフィルタする小型ニューロモルフィックモデルを開発する手法を示す。入ってくる電荷波形は二値イベントのストリームに変換され、SNNによって処理される。ハードウェア展開に最適化された正確でコンパクトなSNNに対して,データエンコーディングからトレーニングアルゴリズムの最適ハイパーパラメータまで,さまざまなシステム設計選択に関する知見を提示する。その結果、進化的アルゴリズムと最適化されたハイパーパラメータセットで訓練されたSNNは、ディープニューラルネットワークのほぼ半分のパラメータ数で約91%の信号効率が得られることがわかった。

This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.

翻訳日:2023-07-24 14:23:57 公開日:2023-07-20

# ネットワークインデックス信号のエッジワイズ外れ値

Edgewise outliers of network indexed signals ( http://arxiv.org/abs/2307.11239v1 )

ライセンス: Link先を確認

Christopher Rieser and Anne Ruiz-Gazen and Christine Thomas-Agnan

(参考訳) 変数間の依存やグラフノード間の依存を含む,ネットワークインデックス付き多変量データのモデルを検討する。これらのモデルのフレームワークでは、外れ値検出に注目し、エッジワイズ外れ値の概念を導入する。この目的のために、まず、検出規則と外れ値検出のしきい値の固定に使用できる正方形の和、特に正方形のマハラノビス距離の分布を導出する。そこで我々は,エッジワイド MCD と呼ぶ決定論的 MCD アルゴリズムの頑健なバージョンを提案する。シミュレーションデータへの応用は、依存構造を考慮することに関心を示す。また,提案手法の有用性を実データで説明する。

We consider models for network-indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outlier detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances, that can be used to fix detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application to simulated data shows the value of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.
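
As background for the detection rules mentioned in this abstract, the sketch below shows the classical, non-robust version of the idea in two dimensions: compute a squared Mahalanobis distance and compare it with a chi-square quantile. It is an illustration under stated assumptions, not the edgewise MCD method. The function names are hypothetical, the location and scatter are plain sample estimates where the paper would use robust ones, and the chi-square reference distribution is the textbook Gaussian case, which may differ from the distributions derived for network-indexed data.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis_2d(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_2dof(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so it inverts in closed form.
    """
    return -2.0 * math.log(1.0 - prob)


def is_outlier(point, mean, cov, prob=0.975):
    """Flag a point whose squared distance exceeds the chosen quantile."""
    return squared_mahalanobis_2d(point, mean, cov) > chi2_quantile_2dof(prob)


# Toy check with a known mean and identity covariance (hand-picked values).
mean = (0.0, 0.0)
cov = (1.0, 0.0, 1.0)
flag_far = is_outlier((3.0, 0.0), mean, cov)
flag_near = is_outlier((1.0, 1.0), mean, cov)
```

A robust procedure would replace `mean_and_cov_2d` with estimates that are not themselves distorted by the outliers being searched for, which is the role the MCD family plays.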

翻訳日:2023-07-24 14:23:39 公開日:2023-07-20

# QDC: グラフ上の量子拡散畳み込みカーネル

QDC: Quantum Diffusion Convolution Kernels on Graphs ( http://arxiv.org/abs/2307.11234v1 )

ライセンス: Link先を確認

Thomas Markovich

(参考訳) グラフ畳み込みニューラルネットワーク(graph convolutional neural networks, gcns)は、対象とする予測タスクに基づいて、ローカルな近傍にメッセージを集約することで動作する。多くのGCNは、グラフ上の入力特徴の一般化拡散の一形態として理解することができ、メッセージパッシングの方法を変更することで予測精度を向上させるために重要な研究がなされている。本研究では,量子粒子のグラフ上での伝播に対する一般化拡散パラダイムに基づくトレーディングにより,頂点の占有相関に従ってグラフを効果的に再配線する新しい畳み込みカーネルを提案する。この新しい畳み込みカーネルを量子拡散畳み込み演算子(QDC)と呼ぶ。さらに、QDC演算子と従来の組合せラプラシアンからのメッセージを組み合わせたマルチスケール変種を導入する。本手法を理解するために,帯域通過フィルタの構成におけるホモフィリのスペクトル依存性と量子力学の重要性を検討する。これらの研究、および様々なデータセットの実験を通して、QDCは類似の手法と比較して広く使われているベンチマークデータセットの予測性能を改善する。

Graph convolutional neural networks (GCNs) operate by aggregating messages over local neighborhoods given the prediction task of interest. Many GCNs can be understood as a form of generalized diffusion of input features on the graph, and significant work has been dedicated to improving predictive accuracy by altering the ways of message passing. In this work, we propose a new convolution kernel that effectively rewires the graph according to the occupation correlations of the vertices by trading on the generalized diffusion paradigm for the propagation of a quantum particle over the graph. We term this new convolution kernel the Quantum Diffusion Convolution (QDC) operator. In addition, we introduce a multiscale variant that combines messages from the QDC operator and the traditional combinatorial Laplacian. To understand our method, we explore the spectral dependence of homophily and the importance of quantum dynamics in the construction of a bandpass filter. Through these studies, as well as experiments on a range of datasets, we observe that QDC improves predictive performance on widely used benchmark datasets when compared to similar methods.

翻訳日:2023-07-24 14:23:30 公開日:2023-07-20

# ファイナンスのための量子コンピューティング

Quantum computing for finance ( http://arxiv.org/abs/2307.11230v1 )

ライセンス: Link先を確認

Dylan Herman, Cody Googin, Xiaoyuan Liu, Yue Sun, Alexey Galda, Ilya Safro, Marco Pistoia, Yuri Alexeev

(参考訳) 量子コンピュータは、古典的コンピュータの計算能力を超え、多くの産業に変化をもたらすことが期待されている。本稿では,金融アプリケーションにおける量子コンピューティングの現状,特に確率的モデリング,最適化,機械学習について概説する。このレビューは物理学者を対象とし、金融業界で使われている古典的手法の概要を述べ、量子技術の潜在的な利点と限界について論じている。最後に、物理学者が取り組むべき課題に目を向けます。

Quantum computers are expected to surpass the computational capabilities of classical computers and have a transformative impact on numerous industry sectors. We present a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on stochastic modeling, optimization, and machine learning. This Review is aimed at physicists, so it outlines the classical techniques used by the financial industry and discusses the potential advantages and limitations of quantum techniques. Finally, we look at the challenges that physicists could help tackle.

翻訳日:2023-07-24 14:23:12 公開日:2023-07-20

# adaptive query releaseからmachine unlearningへ

From Adaptive Query Release to Machine Unlearning ( http://arxiv.org/abs/2307.11228v1 )

ライセンス: Link先を確認

Enayat Ullah, Raman Arora

(参考訳) 構造化クエリクラスから適応クエリを選択する学習アルゴリズムに対応する効率的なアンラーニングアルゴリズムの設計として,機械学習の問題を定式化する。線形およびプレフィックスサムクエリクラスに対する効率的な未学習アルゴリズムを提供する。応用として,多くの問題,特に確率凸最適化(sco)におけるアンラーニングが,上記の問題に還元され,問題に対する保証が向上することを示す。特に、スムースなリプシッツ損失と任意の$\rho>0$に対して、この結果は、$d$がモデル次元であり、$n$がサンプルの初期数である$\tilde o\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$という過剰な人口リスクを持つ未学習アルゴリズムをもたらす。非スムースリプシッツ損失に対しては、過剰な人口リスクを持つアンラーニングアルゴリズムに、同じアンラーニングクエリ(gradient)複雑性を持つ$\tilde o\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$を与える。さらに、線形回帰やロジスティック回帰のような一般化線形モデル(GLM)の特別な場合では、滑らかなリプシッツと非滑らかなリプシッツの損失に対して、それぞれ$\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$と$\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$の次元非依存率を得る。最後に、上記を1つの未学習リクエストから挿入と削除からなる‘textit{dynamic}ストリームへ一般化する。

We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to \textit{dynamic} streams consisting of insertions and deletions.

翻訳日:2023-07-24 14:23:04 公開日:2023-07-20

# UP-DP:ビジョン言語モデルを用いたデータ事前選択のための教師なしプロンプト学習

UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models ( http://arxiv.org/abs/2307.11227v1 )

ライセンス: Link先を確認

Xin Li, Sima Behpour, Thang Doan, Wenbin He, Liang Gou, Liu Ren

(参考訳) 本研究では,ラベルのないデータセットから単一のパスでラベル付けするインスタンスを選択することを目的としたデータ事前選択タスクについて検討し,アノテーション予算に制限のある下流タスクのパフォーマンスを最適化する。以前のデータ事前選択のアプローチは、CLIPやBLIP-2といった基礎モデルから抽出された視覚的特徴にのみ依存していたが、テキスト機能の強力さは無視された。本研究では、適切な設計により、視覚とテキストの融合特徴空間がデータの事前選択により良い表現をもたらすことを論じる。この目的のために,データ事前選択にBLIP-2のような視覚言語モデルを適用する,シンプルで効果的な教師なしのプロンプト学習手法であるUP-DPを導入する。具体的には、BLIP-2パラメータを凍結することで、テキストプロンプトをトレーニングし、表現性を改善し、データセット全体をカバーする多様なクラスタ構造を保証する。この手法を7つのベンチマークデータセットを異なる設定で使用し,最大20%のパフォーマンス向上を実現した最新技術と比較した。興味深いことに、あるデータセットから学んだプロンプトは大きな一般化可能性を示し、他のデータセットからBLIP-2の特徴抽出を強化するために直接適用することができる。 up-dpは、データ事前選択のためのビジョン言語モデルに教師なしのプロンプト学習を組み込んだ最初の仕事です。

In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, but largely ignored the power of text features. In this work, we argue that, with proper design, the joint feature space of both vision and text can yield a better representation for data pre-selection. To this end, we introduce UP-DP, a simple yet effective unsupervised prompt learning approach that adapts vision-language models, like BLIP-2, for data pre-selection. Specifically, with the BLIP-2 parameters frozen, we train text prompts to extract the joint features with improved representation, ensuring a diverse cluster structure that covers the entire dataset. We extensively compare our method with the state of the art using seven benchmark datasets in different settings, achieving a performance gain of up to 20%. Interestingly, the prompts learned from one dataset demonstrate significant generalizability and can be applied directly to enhance the feature extraction of BLIP-2 from other datasets. To the best of our knowledge, UP-DP is the first work to incorporate unsupervised prompt learning in a vision-language model for data pre-selection.

翻訳日:2023-07-24 14:22:14 公開日:2023-07-20

# Jina Embeddings: 高性能な文埋め込みモデルの新しいセット

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models ( http://arxiv.org/abs/2307.11224v1 )

ライセンス: Link先を確認

Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao

(参考訳) Jina Embeddingsは、様々なテキスト入力を数値表現に変換し、テキストの意味的本質を捉えることに長けた高性能な文埋め込みモデルの集合である。これらのモデルはテキスト生成専用に設計されたものではないが、密検索や意味的テキスト類似性といった応用に優れている。本稿では、高品質なペアワイズおよびトリプレットデータセットの作成から始まる、Jina Embeddingsの開発について詳述する。データセット準備におけるデータクリーニングの重要な役割を強調し、モデルトレーニングプロセスに関する深い洞察を与え、Massive Textual Embedding Benchmark(MTEB)を用いた包括的なパフォーマンス評価で締めくくる。

Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating various textual inputs into numerical representations, thereby capturing the semantic essence of the text. While these models are not exclusively designed for text generation, they excel in applications such as dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of a high-quality pairwise and triplet dataset. It underlines the crucial role of data cleaning in dataset preparation, gives in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Textual Embedding Benchmark (MTEB).
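
The dense retrieval and semantic textual similarity use cases mentioned above usually come down to comparing embedding vectors. The following is a minimal sketch of that comparison, assuming cosine similarity as the score; the toy 3-dimensional vectors and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins and are not taken from the Jina Embeddings release.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real sentence embeddings (assumption).
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_by_similarity(query, docs)
```

In a real pipeline the vectors would come from an embedding model; only the scoring and ranking step is illustrated here.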

翻訳日:2023-07-24 14:21:49 公開日:2023-07-20

# マルチオブザーバブルとマルチインスツルメント

Multi-Observables and Multi-Instruments ( http://arxiv.org/abs/2307.11223v1 )

ライセンス: Link先を確認

Stan Gudder

(参考訳) 本稿では、量子力学におけるマルチオブザーバブルとマルチインストゥルメントの概念を紹介する。マルチオブザーバブル $A$ (マルチインストゥルメント $\mathcal{I}$) は $\Omega =\Omega _1\times\cdots\times\Omega _n$ という形式の結果空間を持ち、$(x_1,\ldots ,x_n)\in\Omega$ に対して $A_{x_1\cdots x_n}$ ($\mathcal{I}_{x_1\cdots x_n}$) で表される。また、$A$ ($\mathcal{I}$) を $n$-オブザーバブル ($n$-インストゥルメント) と呼び、$n=2$ のときは bi-オブザーバブル (bi-インストゥルメント) と呼ぶ。bi-オブザーバブルと bi-インストゥルメントは過去の文献で検討されてきたが、より一般的なケースは新しいようだ。特に、2つのオブザーバブル (インストゥルメント) は、ジョイント bi-オブザーバブル (bi-インストゥルメント) を持つ場合に共存する、あるいは両立すると定義されている。$n$-オブザーバブルのジョイント周辺と $n$-インストゥルメントのジョイント縮約周辺を考えることで、この定義を $n$ 個のオブザーバブルと $n$ 個のインストゥルメントに拡張する。$n$-インストゥルメントは一意の $n$-オブザーバブルを測定し、有限個のインストゥルメントが共存するならば、それらが測定するオブザーバブルも共存することを示す。非自明な $n$-オブザーバブルとその部分の間には密接な関係があることを証明する。さらに、同様の結果がインストゥルメントにも成り立つ。次に、有限個のインストゥルメントのテンソル積に対する自然な定義が存在し、妥当な性質を持つことを示す。続いて、有限個のオブザーバブルとインストゥルメントの逐次積について考察する。Kraus、Holevo、Lüders インストゥルメントなど、さまざまな例を示す。

This article introduces the concepts of multi-observables and multi-instruments in quantum mechanics. A multi-observable $A$ (multi-instrument $\mathcal{I}$) has an outcome space of the form $\Omega =\Omega _1\times\cdots\times\Omega _n$ and is denoted by $A_{x_1\cdots x_n}$ ($\mathcal{I}_{x_1\cdots x_n}$) where $(x_1,\ldots ,x_n)\in\Omega$. We also call $A$ ($\mathcal{I}$) an $n$-observable ($n$-instrument) and when $n=2$ we call $A$ ($\mathcal{I}$) a bi-observable (bi-instrument). We point out that bi-observables and bi-instruments have been considered in past literature, but the more general case appears to be new. In particular, two observables (instruments) have been defined to coexist or be compatible if they possess a joint bi-observable (bi-instrument). We extend this definition to $n$ observables and $n$ instruments by considering joint marginals of $n$-observables and joint reduced marginals of $n$-instruments. We show that an $n$-instrument measures a unique $n$-observable and if a finite number of instruments coexist, then their measured observables coexist. We prove that there is a close relationship between a nontrivial $n$-observable and its parts. Moreover, a similar result holds for instruments. We next show that a natural definition for the tensor product of a finite number of instruments exists and possesses reasonable properties. We then discuss sequential products of a finite number of observables and instruments. We present various examples such as Kraus, Holevo and Lüders instruments.
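
To make the coexistence condition concrete, the $n=2$ case can be written out as follows. This is a sketch that assumes finite outcome spaces and the standard notion of marginals of a joint observable; the superscript notation $A^1$, $A^2$ and the names $B$, $C$ are introduced here for illustration and may differ from the paper's own notation.

```latex
% Marginals of a bi-observable A with outcome space \Omega_1 \times \Omega_2
A^1_{x_1} = \sum_{x_2 \in \Omega_2} A_{x_1 x_2},
\qquad
A^2_{x_2} = \sum_{x_1 \in \Omega_1} A_{x_1 x_2}.

% Two observables B and C coexist if some bi-observable A satisfies
A^1_{x_1} = B_{x_1} \ \text{for all } x_1 \in \Omega_1,
\qquad
A^2_{x_2} = C_{x_2} \ \text{for all } x_2 \in \Omega_2.
```

Since the effects of an observable sum to the identity, each marginal is again an observable.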

翻訳日:2023-07-24 14:21:34 公開日:2023-07-20

# FairMobi-Net: 都市移動フロー生成のためのフェアネスを考慮したディープラーニングモデル

FairMobi-Net: A Fairness-aware Deep Learning Model for Urban Mobility Flow Generation ( http://arxiv.org/abs/2307.11214v1 )

ライセンス: Link先を確認

Zhewei Liu, Lipai Huang, Chao Fan, Ali Mostafavi

(参考訳) 都市構造と人口活動パターンを理解するためには, 地域をまたいだ現実的な人的流れの生成が不可欠であり, 都市計画・管理の分野において重要な応用が期待できる。しかし、既存のモビリティ生成手法の顕著な欠点は、予測公正性を無視することであり、弱い人口集団を持つ地域をまたいだモビリティフローの過小評価を招き、資源分布やインフラ開発が不適当になる可能性がある。この限界を克服するため,本研究では,地域間人的フロー予測のための新しいフェアネスアウェア深層学習モデルfairmobi-netを提案する。 FairMobi-Netモデルは、損失関数に公正損失を独自に組み込み、ハイブリッドアプローチを採用し、人間のフロー予測にバイナリ分類と数値回帰技術を統合する。本研究では,米国4都市の総合的移動度データセットを用いてFairMobi-Netモデルを検証する。この結果から,FairMobi-Netモデルは,地域所得差にかかわらず,より正確で公平な人流予測を実現する上で,最先端モデル(DeepGravityモデルなど)よりも優れていることがわかった。モデルは様々な領域にわたって高い精度を維持しており、以前の公正な懸念に対処している。特徴のさらなる分析は、物理的距離と道路ネットワーク構造が地域を横断する人的流れに与える影響を解明する。このモデルと結果は、都市科学、交通工学、コンピューティングの分野にまたがる研究者や実践者に、地域をまたがる人間の移動の流れを正確に生成するための効果的なツールを提供する。

Generating realistic human flows across regions is essential for our understanding of urban structures and population activity patterns, enabling important applications in the fields of urban planning and management. However, a notable shortcoming of most existing mobility generation methodologies is neglect of prediction fairness, which can result in underestimation of mobility flows across regions with vulnerable population groups, potentially resulting in inequitable resource distribution and infrastructure development. To overcome this limitation, our study presents a novel, fairness-aware deep learning model, FairMobi-Net, for inter-region human flow prediction. The FairMobi-Net model uniquely incorporates fairness loss into the loss function and employs a hybrid approach, merging binary classification and numerical regression techniques for human flow prediction. We validate the FairMobi-Net model using comprehensive human mobility datasets from four U.S. cities, predicting human flow at the census-tract level. Our findings reveal that the FairMobi-Net model outperforms state-of-the-art models (such as the DeepGravity model) in producing more accurate and equitable human flow predictions across a variety of region pairs, regardless of regional income differences. The model maintains a high degree of accuracy consistently across diverse regions, addressing the previous fairness concern. Further analysis of feature importance elucidates the impact of physical distances and road network structures on human flows across regions. With fairness as its touchstone, the model and results provide researchers and practitioners across the fields of urban sciences, transportation engineering, and computing with an effective tool for accurate generation of human mobility flows across regions.

翻訳日:2023-07-24 14:20:49 公開日:2023-07-20

# SimCol3D -- 大腸内視鏡検査中の3次元再構成チャレンジ

SimCol3D -- 3D Reconstruction during Colonoscopy Challenge ( http://arxiv.org/abs/2307.11261v1 )

ライセンス: Link先を確認

Anita Rau, Sophia Bano, Yueming Jin, Pablo Azagra, Javier Morlana, Edward Sanderson, Bogdan J. Matuszewski, Jae Young Lee, Dong-Jae Lee, Erez Posner, Netanel Frank, Varshini Elangovan, Sista Raviteja, Zhengwen Li, Jiquan Liu, Seenivasan Lalithkumar, Mobarakol Islam, Hongliang Ren, José M.M. Montiel, Danail Stoyanov

(参考訳) 大腸癌は世界で最も一般的ながんの1つである。大腸内視鏡は効果的なスクリーニング技術であるが,大腸内視鏡を通してポリープを検出するのは困難である。観察された表面の3dマップは、無防備な大腸組織の同定を強化し、訓練用プラットフォームとして機能する。しかし, 自己閉塞, 反射面, テクスチャの欠如, 特徴的手法を制限した組織変形など多くの要因により, ビデオ映像からの結腸の再構築は未解決のままである。学習ベースのアプローチはpromiseを堅牢な代替手段として持つが、広範なデータセットを必要とする。ベンチマークを確立することで、2022 EndoVisのサブチャンジSimCol3Dは、データ駆動深度を促進し、大腸内視鏡中に予測を行う。この挑戦はMICCAI 2022の一部としてシンガポールで開催された。世界中から6つのチームと、学界や産業の代表者が、合成深度予測、合成ポーズ予測、実際のポーズ予測という3つの課題に参加した。本稿では,課題,提案手法,その結果について述べる。仮想大腸内視鏡の深度予測は頑健に解けるが, ポーズ推定は未解決の課題である。

Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains unsolved due to numerous factors such as self-occlusion, reflective surfaces, lack of texture, and tissue deformation that limit feature-based methods. Learning-based approaches hold promise as robust alternatives, but necessitate extensive datasets. By establishing a benchmark, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world and representatives from academia and industry participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction in virtual colonoscopy is robustly solvable, while pose estimation remains an open research question.

翻訳日:2023-07-24 14:11:32 公開日:2023-07-20

# ガウス過程を用いた低データからの信頼度画像予測のための非パラメトリックモデルに向けて

Towards Non-Parametric Models for Confidence Aware Image Prediction from Low Data using Gaussian Processes ( http://arxiv.org/abs/2307.11259v1 )

ライセンス: Link先を確認

Nikhil U. Shinde, Florian Richter, Michael C. Yip

(参考訳) 将来の状態を想定する能力は、動的環境と対話しながらインフォームドな意思決定に不可欠である。カメラが広範かつ情報に富んだ知覚モダリティを提供することで、画像シーケンスから将来の状態を予測できるという問題が注目されている。工法の現状は、通常、予測のために大きなパラメトリックモデルを訓練する。精度で予測できることが多いが、これらのモデルは有用なソリューションに収束するために、大規模なトレーニングデータセットの可用性に依存している。本稿では,非常に少ないトレーニングデータから画像系列の将来の画像を予測する問題に着目する。この問題に取り組むために,非パラメトリックモデルを用いて確率論的手法による画像予測を行う。逐次予測画像上で確率分布を生成し,不確かさを時間を通して伝播し,予測に対する信頼度指標を生成する。 gaussianプロセスは、データ効率と、オンラインに新しいトレーニングデータを組み込む能力のために使用される。滑らかな流体シミュレーション環境における将来のフレーム予測に成功し,提案手法を紹介する。

The ability to envision future states is crucial to informed decision making while interacting with dynamic environments. With cameras providing a prevalent and information-rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state-of-the-art methods typically train large parametric models for their predictions. Though often able to predict with accuracy, these models rely on the availability of large training datasets to converge to useful solutions. In this paper, we focus on the problem of predicting future images of an image sequence from very little training data. To approach this problem, we use non-parametric models to take a probabilistic approach to image prediction. We generate probability distributions over sequentially predicted images and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and ability to readily incorporate new training data online. We showcase our method by successfully predicting future frames of a smooth fluid simulation environment.
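
The confidence-aware prediction described above rests on the fact that a Gaussian process returns a predictive variance alongside its mean. The sketch below shows plain GP regression on 1-D inputs under assumptions that are not from the paper: an RBF kernel with unit length scale, a small fixed noise term, and the hypothetical function names `rbf_kernel` and `gp_posterior`. It is not the authors' image-prediction pipeline.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y via the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    # Prior variance is 1 for the RBF kernel; subtract the explained part.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, var


x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([1.0, 10.0])  # one seen input, one far from the data
mean, var = gp_posterior(x_train, y_train, x_test)
```

The variance is close to zero at the training input and close to the prior variance far from the data, which is the kind of signal a confidence metric can be built on.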

翻訳日:2023-07-24 14:11:13 公開日:2023-07-20

# 脱分極雑音下でのロバスト基底状態エネルギー推定

Robust ground-state energy estimation under depolarizing noise ( http://arxiv.org/abs/2307.11257v1 )

ライセンス: Link先を確認

Zhiyan Ding and Yulong Dong and Yu Tong and Lin Lin

(参考訳) 我々は,大域的な分極誤差チャネルの下で頑健な基底状態エネルギー推定アルゴリズムを提案する。最近開発された量子指数最小二乗法 (qcels) アルゴリズム [ding, lin, prx quantum, 4, 020331, 2023] に基づいて, 多項式コストの精度を維持しつつ, 頑健な推定を実現するための重要な進歩を取り入れている。ハミルトンのスペクトルギャップを効果的に活用することにより、我々のアルゴリズムは量子位相推定(QPE)やロバスト位相推定(RPE)といった従来の手法で観測された限界を克服する。グローバル非分極化誤りチャネルを超えて、量子ノイズを非分極化エラーチャネルに合わせるためにランダムコンパイル技術を活用することの重要性と実際的な利点を強調する。本研究では,非分極ノイズの存在下での基底状態エネルギー推定の可能性を示し,誤差補正と量子アルゴリズムのアルゴリズムレベルの誤差緩和の可能性を示す。

We present a novel ground-state energy estimation algorithm that is robust under global depolarizing error channels. Building upon the recently developed Quantum Complex Exponential Least Squares (QCELS) algorithm [Ding, Lin, PRX Quantum, 4, 020331, 2023], our new approach incorporates significant advancements to ensure robust estimation while maintaining a polynomial cost in precision. By leveraging the spectral gap of the Hamiltonian effectively, our algorithm overcomes limitations observed in previous methods like quantum phase estimation (QPE) and robust phase estimation (RPE). Going beyond global depolarizing error channels, our work underscores the significance and practical advantages of utilizing randomized compiling techniques to tailor quantum noise towards depolarizing error channels. Our research demonstrates the feasibility of ground-state energy estimation in the presence of depolarizing noise, offering potential advancements in error correction and algorithmic-level error mitigation for quantum algorithms.

翻訳日:2023-07-24 14:10:58 公開日:2023-07-20

# バイオメディカル自然言語処理におけるフェデレーション学習の体系的評価

A Systematic Evaluation of Federated Learning on Biomedical Natural Language Processing ( http://arxiv.org/abs/2307.11254v1 )

ライセンス: Link先を確認

Le Peng, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, Ju Sun

(参考訳) BERTやGPTのような言語モデル(LM)は自然言語処理(NLP)に革命をもたらした。しかし、プライバシーに敏感なドメイン、特に医療分野は、医療保険の相互運用性と説明責任に関する法律(Health Insurance Portability and Accountability Act, HIPAA)や一般データ保護規則(General Data Protection Regulation, GDPR)などの規制によって課されるデータアクセスの制限とプライバシーの制約のため、LMの訓練において課題に直面している。フェデレートラーニング(FL)は、データプライバシの保護を確保しながら協調学習を可能にする分散ソリューションを提供する。本研究では、$8$個のコーパスを含む$6$個のLMを用いて、$2$つのバイオメディカルNLPタスクにわたり医療におけるFLを体系的に評価した。結果は次のことを示した。 1) FLモデルは、個々のクライアントのデータで訓練されたLMを一貫して上回り、時にはプールされたデータで訓練されたモデルに匹敵する。 2) 総データ量が一定の場合、より多くのクライアントでFLを用いて訓練したLMは性能が劣るが、事前学習済みのトランスフォーマーベースのモデルはより高いレジリエンスを示した。 3) FLを用いて訓練したLMは、クライアントのデータがIIDに分布している場合にはプールされたデータで訓練されたモデルとほぼ同等の性能を発揮するが、非IIDデータでは目に見える差が生じる。私たちのコードは、https://github.com/PL97/FedNLP で利用可能です。

Language models (LMs) like BERT and GPT have revolutionized natural language processing (NLP). However, privacy-sensitive domains, particularly the medical field, face challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring the preservation of data privacy. In this study, we systematically evaluate FL in medicine across $2$ biomedical NLP tasks using $6$ LMs encompassing $8$ corpora. Our results showed that: 1) FL models consistently outperform LMs trained on an individual client's data and sometimes match the model trained with pooled data; 2) with a fixed total amount of data, LMs trained using FL with more clients exhibit inferior performance, but pre-trained transformer-based models exhibit greater resilience; 3) LMs trained using FL perform nearly on par with the model trained with pooled data when clients' data are IID, while exhibiting visible gaps with non-IID data. Our code is available at: https://github.com/PL97/FedNLP
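
For readers unfamiliar with how federated learning combines client models without sharing raw data, the sketch below shows one common aggregation rule: a sample-size-weighted average of client parameters, in the style of FedAvg. The abstract does not say which aggregation rule the study uses, so this choice, the flat parameter vectors, and the function name `federated_average` are assumptions made for illustration.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_params: one flat parameter vector per client (same length each).
    client_sizes: number of local training examples per client.
    """
    params = np.asarray(client_params, dtype=float)  # shape (clients, n_params)
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return weights @ params


# Two hypothetical clients; the second holds three times as much data.
global_params = federated_average([[0.0, 0.0], [2.0, 4.0]], [1, 3])
```

Only parameters and dataset sizes leave each client in this scheme, which is the property that makes FL attractive under the privacy constraints described above.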

翻訳日:2023-07-24 14:10:37 公開日:2023-07-20

# 大腸癌予防のための片面合成非ペア画像翻訳と分節化

Joint one-sided synthetic unpaired image translation and segmentation for colorectal cancer prevention ( http://arxiv.org/abs/2307.11253v1 )

ライセンス: Link先を確認

Enric Moreu, Eric Arazo, Kevin McGuinness, Noel E. O'Connor

(参考訳) 深層学習は医療画像の解析において優れた性能を示した。しかし、データセットはプライバシの問題、標準化の問題、アノテーションの欠如のために取得することが難しい。本稿では,3次元技術と生成対向ネットワークを組み合わせたリアルな合成画像を作成することで,これらの課題に対処する。 CUT-segは,ポリプの分割学習中に,分割モデルと生成モデルとを併用してリアルな画像を生成するジョイントトレーニングである。最近の片面翻訳モデルの利点は、メモリ使用量が非常に少なく、トレーニングループにセグメンテーションモデルを追加できる点にあります。 CUT-segは2段階の訓練を必要とする他のメモリ集約型画像変換手法よりもパフォーマンスが良く、計算コストも低く、実際の画像を必要としない。有望な結果は、単一の実画像とゼロ実アノテーションを使用して、5つの実ポリプセグメンテーションデータセットで達成される。この研究の一環として、我々はSynth-Colonをリリースした。Synth-Colonは、20000のリアルな大腸画像と深度と3D幾何学に関する追加情報を含む完全に合成されたデータセットである。

Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We propose CUT-seg, a joint training scheme in which a segmentation model and a generative model are trained together to produce realistic images while learning to segment polyps. We take advantage of recent one-sided translation models because they use significantly less memory, allowing us to add a segmentation model in the training loop. CUT-seg performs better, is computationally less expensive, and requires fewer real images than other memory-intensive image translation approaches that require two-stage training. Promising results are achieved on five real polyp segmentation datasets using only one real image and zero real annotations. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon

翻訳日:2023-07-24 14:10:09 公開日:2023-07-20

# Contra multos verbos : 量子力学のスキャンダルについて

Contra multos verbos: On scandals of quantum mechanics ( http://arxiv.org/abs/2307.11669v1 )

ライセンス: Link先を確認

Theodorus Maria Nieuwenhuizen

(参考訳) 2008年、ニコ・ファン・カンペン(Nico van Kampen)は書簡「量子力学のスキャンダル」(The scandal of quantum mechanics)の中で、「スキャンダルとは、様々な解釈や哲学的深遠さを宣伝する記事、議論、教科書がいまだに数多く存在することである」と書いた。それ以来あまり変わっておらず、ソーシャルメディアはニコが「スキャンダル」と呼ぶであろうもののためのプラットフォームをさらに提供してきた。量子測定のためのキュリー・ワイスモデルの動的解に関する Armen Allahverdyan と Roger Balian との20年間の研究から抽出された、量子力学の現状についての詳細な見解を提示する。これは統計的解釈のある種の最小形態を具現化し、存在論的つながりには立ち入らない。その過程で、関連する様々な主題、用語、解釈に関するコメントが与えられる。

In 2008 Nico van Kampen wrote in his letter "The scandal of quantum mechanics": "The scandal is that there are still many articles, discussions and textbooks, which advertise various interpretations and philosophical profundities." Not much has changed since then, while social media have given a platform for more of what Nico would term "a scandal". A detailed viewpoint is presented on the status of quantum mechanics, distilled from two decades of work with Armen Allahverdyan and Roger Balian on the dynamical solution of Curie-Weiss models for quantum measurement. It embodies a certain minimal form of the statistical interpretation and stays clear of ontological connections. Along the way, comments on various related subjects, terms and interpretations are given.

翻訳日:2023-07-24 11:52:52 公開日:2023-07-20

# ロバスト主成分分析:手段アプローチの中央値

Robust Principal Component Analysis: A Median of Means Approach ( http://arxiv.org/abs/2102.03403v2 )

ライセンス: Link先を確認

Debolina Paul, Saptarshi Chakraborty and Swagatam Das

(参考訳) 主成分分析(PCA)は、データの可視化、ノイズ除去、次元削減のための基本的なツールである。統計学、機械学習、コンピュータビジョン、および関連分野で広く使われている。しかし、PCAは外れ値に弱いことがよく知られており、データセット内の真の低次元構造を検出できないことが多い。Median of Means(MoM)の考え方に従い、近年の教師付き学習手法は、大標本での理論的性質をほとんど損なうことなく、外れ値の扱いに大きな成功を収めている。本稿では、MoM原理に基づくPCA手法を提案する。MoMPCA (\textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis) と呼ばれるこの手法は、計算上魅力的であるだけでなく、最小限の仮定の下で最適な収束率を達成する。特に、外れ値に一切の仮定を置かずに、ラデマッハ複雑度を用いて得られた解の非漸近的な誤差境界を調べる。導出された集中不等式の結果は、解析が可分ヒルベルト空間で行われるため次元に依存せず、対応するノルムにおける基礎分布の4次モーメントのみに依存する。提案手法の有効性は、シミュレーションと実データへの応用を通じて十分に示されている。

Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the \textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution via the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.
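
The Median of Means principle that MoMPCA builds on is easiest to see for a scalar mean. The sketch below illustrates only that basic estimator, not the MoMPCA procedure itself: the data are shuffled, split into blocks, and the median of the block means is returned. The function name `median_of_means`, the block count, and the fixed seed are assumptions made for illustration.

```python
import numpy as np


def median_of_means(x, n_blocks, seed=0):
    """Median of block-wise means of x, using a random partition into blocks."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    perm = rng.permutation(x.size)
    blocks = np.array_split(x[perm], n_blocks)
    return float(np.median([block.mean() for block in blocks]))


# 99 clean observations and one gross outlier.
data = [1.0] * 99 + [1e6]
robust_estimate = median_of_means(data, n_blocks=5)
plain_mean = float(np.mean(data))
```

A single outlier can corrupt at most one block, so the median over blocks ignores it, while the ordinary mean is pulled far away from the bulk of the data.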

翻訳日:2023-07-21 19:35:49 公開日:2023-07-20

# パーセプトロン理論はニューラルネットワークの精度を予測することができる

Perceptron Theory Can Predict the Accuracy of Neural Networks ( http://arxiv.org/abs/2012.07881v2 )

ライセンス: Link先を確認

Denis Kleyko, Antonello Rosato, E. Paxon Frady, Massimo Panella, Friedrich T. Sommer

(参考訳) 多層ニューラルネットワークは、多くの技術的分類問題に対する技術の現状を定めている。しかし、これらのネットワークは基本的にはブラックボックスであり、分析してパフォーマンスを予測する。本稿では,1層パーセプトロンの統計的理論を開発し,異なるアーキテクチャを持つ驚くほど多種多様なニューラルネットワークの性能を予測できることを示す。パーセプトロンを用いた分類の一般的な理論は、ベクトル記号アーキテクチャとして知られるシンボリック推論のための貯水池計算モデルとコネクショニストモデルを分析するための既存の理論を一般化することによって展開される。我々の統計理論は、信号統計を利用した3つの公式を提供する。式は解析的に難解であるが、数値的に評価できる。最大詳細をキャプチャする記述レベルには、確率的サンプリング方法が必要である。ネットワークモデルによっては、単純な公式はすでに高い予測精度をもたらす。理論予測の質は、貯水池計算文献からのエコー状態ネットワーク(ESN)の記憶タスク、浅いランダムに接続されたネットワークの分類データセットの収集、深層畳み込みニューラルネットワークのイメージNetデータセットの3つの実験環境で評価される。パーセプトロン理論の2番目の記述レベルは,従来説明できなかったタイプのESNの性能を予測できることがわかった。この理論は、その出力層に適用することで、深い多層ニューラルネットワークを予測することができる。ニューラルネットワークの性能を予測する他の方法は、推定モデルの訓練を必要とすることが多いが、提案された理論は、出力ニューロンにおけるシナプス後和の分布の最初の2つのモーメントのみを必要とする。パーセプトロン理論は、推定器モデルを訓練に依存しない他の方法と好意的に比較する。

Multilayer neural networks set the current state of the art for many technical classification problems. However, these networks are still, essentially, black boxes in terms of analyzing them and predicting their performance. Here, we develop a statistical theory for the one-layer perceptron and show that it can predict performances of a surprisingly large variety of neural networks with different architectures. A general theory of classification with perceptrons is developed by generalizing an existing theory for analyzing reservoir computing models and connectionist models for symbolic reasoning known as vector symbolic architectures. Our statistical theory offers three formulas leveraging the signal statistics with increasing detail. The formulas are analytically intractable, but can be evaluated numerically. The description level that captures maximum details requires stochastic sampling methods. Depending on the network model, the simpler formulas already yield high prediction accuracy. The quality of the theory predictions is assessed in three experimental settings: a memorization task for echo state networks (ESNs) from the reservoir computing literature, a collection of classification datasets for shallow randomly connected networks, and the ImageNet dataset for deep convolutional neural networks. We find that the second description level of the perceptron theory can predict the performance of types of ESNs, which could not be described previously. The theory can predict the performance of deep multilayer neural networks by being applied to their output layer. While other methods for predicting neural network performance commonly require training an estimator model, the proposed theory requires only the first two moments of the distribution of the postsynaptic sums in the output neurons. The perceptron theory compares favorably to other methods that do not rely on training an estimator model.
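
The abstract notes that the theory needs only the first two moments of the postsynaptic sums in the output neurons. As a rough illustration of that idea, and not one of the paper's three formulas, the sketch below assumes a two-class problem in which the sums of the correct and the incorrect output neuron are independent Gaussians; accuracy is then the probability that the correct neuron has the larger sum. The function name and parameter names are hypothetical.

```python
import math


def predicted_accuracy_two_class(mu_hit, var_hit, mu_reject, var_reject):
    """P(correct sum > incorrect sum) for independent Gaussian postsynaptic sums.

    mu_hit, var_hit: mean and variance of the correct output neuron's sum.
    mu_reject, var_reject: mean and variance of the incorrect neuron's sum.
    """
    z = (mu_hit - mu_reject) / math.sqrt(var_hit + var_reject)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


chance = predicted_accuracy_two_class(0.0, 1.0, 0.0, 1.0)
separated = predicted_accuracy_two_class(5.0, 1.0, 0.0, 1.0)
```

With more than two classes the correct neuron has to beat every competitor at once, which is one reason the general formulas are harder to evaluate in closed form.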

翻訳日:2023-07-21 19:35:24 公開日:2023-07-20

# ABNIRML:ニューラルIRモデルの挙動解析

ABNIRML: Analyzing the Behavior of Neural IR Models ( http://arxiv.org/abs/2011.00696v2 )

ライセンス: Link先を確認

Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan

(参考訳) BERTやT5のような事前制約付き言語モデルは、アドホック検索のための新しい最先端技術を確立した。しかし、これらの方法がなぜこれほど効果的なのか、なぜ他の種類よりも有効なのか、どのような落とし穴があるのか、まだよく理解されていない。本稿では,従来の手法では扱えなかった文体,事実性,言い換えに対する感受性,単語順など,いくつかの特徴をテスト可能な新しいタイプの診断プローブを含む,ニューラルirモデル(abnirml)の挙動解析のための新しい包括的なフレームワークを提案する。フレームワークの価値を示すために、神経モデルの利益に寄与する要因についての洞察を与え、モデルが提示する意図しないバイアスを識別する、広範な実証研究を行う。例えば、最近のニューラルネットワークのランキングモデルでは、クエリと正確な項重なりをあまり頼りにせず、単語や文の順序に高い感度で示されるより豊かな言語情報を活用するようにしています。他の結果は、いくつかのモデル(例えばT5やColBERT)が(単に関連性ではなく)事実的に正しいテキストに偏っているなど、より驚くべきものである。さらに、同じベース言語モデルであってもいくつかの特性が異なり、他の特徴はモデルの訓練中にランダムなバリエーションによって現れる。

Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search. However, it is not yet well-understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic probes that allow us to test several characteristics -- such as writing styles, factuality, sensitivity to paraphrasing and word order -- that are not addressed by previous techniques. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to the neural model's gains, and identify potential unintended biases the models exhibit. Some of our results confirm conventional wisdom, like that recent neural ranking models rely less on exact term overlap with the query, and instead leverage richer linguistic information, evidenced by their higher sensitivity to word and sentence order. Other results are more surprising, such as that some models (e.g., T5 and ColBERT) are biased towards factually correct (rather than simply relevant) texts. Further, some characteristics vary even for the same base language model, and other characteristics can appear due to random variations during model training.

翻訳日:2023-07-21 19:34:57 公開日:2023-07-20

# 局所部分空間の暗黙多次元射影

Implicit Multidimensional Projection of Local Subspaces ( http://arxiv.org/abs/2009.03259v2 )

ライセンス: Link先を確認

Rongzheng Bian, Yumeng Xue, Liang Zhou, Jian Zhang, Baoquan Chen, Daniel Weiskopf, Yunhai Wang

(参考訳) 本研究では,多次元投影が局所部分空間に与える影響を暗黙の関数微分を用いて可視化する手法を提案する。ここでは、局所部分空間をデータポイントの多次元局所近傍として理解する。既存の手法は多次元データポイントの投影に重点を置いており、近隣情報は無視される。本手法は,局所部分空間の形状と方向情報を解析し,局所構造を知覚することで,データの全体構造に関するさらなる洞察を得ることができる。局所部分空間は基底ベクトルにまたがる多次元楕円体によって構成される。暗黙関数として定式化された多次元射影の解析的微分に基づいて,高精度かつ効率的なベクトル変換法を提案する。結果はグリフとして可視化され、効率的なWebベースの可視化ツールでサポートされている、特別に設計されたインタラクションの完全なセットを用いて分析される。本手法の有用性を多次元および高次元ベンチマークデータセットを用いて実証した。暗黙的微分ベクトル変換は数値比較により評価され, 探索例とユースケースを用いて総合的手法が評価された。

We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.

翻訳日:2023-07-21 19:34:35 公開日:2023-07-20

# 2量子ビット状態を持つランダムアクセスコードプロトコルにおける量子長所の十分条件

Sufficient conditions for quantum advantage in random access code protocols with two-qubit states ( http://arxiv.org/abs/1912.09900v6 )

ライセンス: Link先を確認

Som Kanjilal, C Jebarathinam, Tomasz Paterek, Dipankar Home

(参考訳) ランダムアクセスコード(RAC)は、nビット文字列のランダムに指定されたサブストリングに関する情報を取得するための重要な通信プロトコルである。量子RACは通常、古典的な通信と共に使用される量子ビットの通信または共用量子状態を利用する。ここでは、単一ビット通信と2つの量子ビットの共有任意の状態の制約の下で、量子プロトコルの後者について考察する。最低ケースの成功確率をメリットの図形として、逆相関行列を持つ任意の状態を用いて、n=3の古典的RACを上回り得ることを示す。 n=2の場合、最も優れた古典的性能を達成できる追加条件を導出する。特に、分離状態は n=2,3 の量子優位性の背後にある有用な資源であることが判明した。量子状態の単一コピーを補助する$n \geq 4$ RACは、古典的なRACよりも優れていない。

Random access code (RAC) is an important communication protocol to obtain information about a randomly specified substring of an n-bit string, while only having limited information about the n-bit string. Quantum RACs usually utilise either communication of quantum bits or a shared-in-advance quantum state used in conjunction with classical communication. Here we consider the latter version of the quantum protocols under the constraint of single-bit communication and with a shared arbitrary state of two qubits. Taking the worst-case success probability as the figure of merit, we demonstrate that any state with an invertible correlation matrix can be used to outperform the best classical RAC for n=3. We derive an additional condition sufficient to beat the best classical performance in the case of n=2. In particular, separable states turn out to be a useful resource behind the quantum advantage for n=2,3. For $n \geq 4$, RACs assisted with a single copy of a quantum state do not outperform the classical RACs.

翻訳日:2023-07-21 19:34:01 公開日:2023-07-20

# 相対サブシステムと量子参照フレーム変換

Relative subsystems and quantum reference frame transformations ( http://arxiv.org/abs/2110.13199v2 )

ライセンス: Link先を確認

Esteban Castro-Ruiz and Ognyan Oreshkov

(参考訳) 近年、参照フレーム変換の量子一般化の開発に多くの努力がなされている。重要な進歩にもかかわらず、その原則に対する完全な理解はまだ欠けている。特に、以前の提案は、宇宙全体に適用した場合のみ、任意の量子参照フレーム間の可逆変換をもたらす可能性があると論じる。対照的に、標準量子理論のみを用いて、第一原理から量子参照フレーム変換を導出する。我々のフレームワークは、自然にコヒーレントなグループ平均化よりも不整合性に基づいており、参照フレームと関心体系にのみ依存する可逆変換をもたらす。これまでの研究よりもより一般的な変換が得られ、これは制限部分空間でのみ有効である。重要なことに、我々のフレームワークは、参照フレーム状態の量子的特徴に関する情報を伝達する「外部粒子」という形で追加の自由度を含んでいる。我々の形式主義は幅広い対称群に対して有効である。中心的に拡張されたガリレイ群を特に研究し、以前の提案との大きな違いを強調した。

Recently there has been much effort in developing a quantum generalisation of reference frame transformations. Despite important progress, a complete understanding of their principles is still lacking. In particular, we argue that previous proposals could yield reversible transformations between arbitrary quantum reference frames only when applied to the whole universe. In contrast, here we derive quantum reference frame transformations from first principles, using only standard quantum theory. Our framework, naturally based on incoherent rather than coherent group averaging, yields reversible transformations that only depend on the reference frames and system of interest. We find more general transformations than those studied so far, which are valid only in a restricted subspace. Importantly, our framework contains additional degrees of freedom in the form of an "extra particle," which carries information about the quantum features of reference frame states. Our formalism is valid for a broad range of symmetry groups. We study the centrally extended Galilei group specifically, highlighting key differences from previous proposals.

翻訳日:2023-07-21 19:28:03 公開日:2023-07-20

# 動作認識に注意を向けた高次テンソルプーリング

High-order Tensor Pooling with Attention for Action Recognition ( http://arxiv.org/abs/2110.05216v2 )

ライセンス: Link先を確認

Piotr Koniusz and Lei Wang and Ke Sun

(参考訳) 本稿では,ニューラルネットワークによって形成される特徴ベクトルの高次統計を捉え,エンドツーエンドの2次・高次プーリングを提案し,テンソルディスクリプタを構成する。テンソルディスクリプタは、集約ベクトルの少ない数と、与えられた特徴が統計的に予想されるよりも頻繁に現れるバーストネス現象のために、堅牢な類似度尺度を必要とする。グラフラプラシアン上の熱拡散過程(HDP)は、逆がループグラフラプラシアンを形成する共分散・自己相関行列の固有値パワー正規化(EPN)と密接に関係している。我々は,HDPとEPNが同一の役割を担っていること,すなわち固有スペクトルの大きさを増大または減衰させることにより,バーストの防止を図っている。我々は、高次発生のスペクトル検出器として作用するepnに高次テンソルを装備し、バーストネスを防止する。また、d次元特徴記述子から構築された位数 r のテンソルに対して、そのような検出器は、少なくとも1つの高次発生がテンソルで表されるbinom(d,r)部分空間の1つに「射影」される可能性を示し、したがってそのような「detectors」のようなbinom(d,r)で導かれるテンソルパワー正規化計量を形成する。実験的なコントリビューションとして,2次および高次プール変種をアクション認識に適用し,これまでに提示されていないプール変種の比較を行い,HMDB-51,YUP++,MPII調理活動の最先端結果を示す。

We aim at capturing high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure due to low numbers of aggregated vectors and the burstiness phenomenon, whereby a given feature appears more/less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/auto-correlation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., to boost or dampen the magnitude of the eigenspectrum, thus preventing burstiness. We equip higher-order tensors with EPN, which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that for a tensor of order r built from d-dimensional feature descriptors, such a detector gives the likelihood that at least one higher-order occurrence is 'projected' into one of binom(d,r) subspaces represented by the tensor, thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. For experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide previously unreported comparisons of such pooling variants, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.
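As a rough illustration of what eigenvalue power normalization does to a second-order descriptor, the following writes out one common form. The symbols $M$, $U$, $\Lambda$, $g$ and $\gamma$ are notation assumed here for illustration, and the abstract does not state which spectral function the paper uses.

```latex
% Assumed setting: M is a symmetric positive semi-definite
% auto-correlation matrix with eigendecomposition M = U \Lambda U^\top.
\[
  M = U \Lambda U^{\top}, \qquad
  \mathcal{G}(M) = U \, g(\Lambda) \, U^{\top}, \qquad
  g(\lambda) = \lambda^{\gamma}, \quad 0 < \gamma \le 1 .
\]
% For \lambda in [0, 1], taking \gamma < 1 boosts small eigenvalues
% relative to large ones, flattening the spectrum so that a few
% bursty directions no longer dominate the similarity measure.
```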

翻訳日:2023-07-21 19:27:39 公開日:2023-07-20

# ジェネリックコンテキスト帯域のモデル選択

Model Selection for Generic Contextual Bandits ( http://arxiv.org/abs/2107.03455v2 )

ライセンス: Link先を確認

Avishek Ghosh, Abishek Sankararaman and Kannan Ramchandran

(参考訳) 一般化可能性仮定の下では,一般確率的文脈帯域のモデル選択の問題を考える。そこで本研究では,適応的文脈的バンドイット({\ttfamily acb})と呼ばれる逐次改良型アルゴリズムを提案する。このアルゴリズムが適応的であること、すなわち、後悔率の順序付けは、証明可能な文脈的バンディットアルゴリズムのそれと一致することを証明する。これは真のモデルクラスの知識を必要とする。正しいモデルクラスを知らないという価格は、後悔境界における第二次項に寄与する加法項のみであることが判明した。このコストはモデルクラスが識別しやすくなり、逆もまたより小さくなるという直感的な特性を持っている。また,真のモデルクラスを知らないにもかかわらず,ETCスタイルのアルゴリズムでも同様の後悔境界が得られることを示す。しかし、モデル選択のコストは予想通り in {\ttfamily acb} よりも高い。さらに,線形文脈バンディットの特別な場合に対して,汎用的な構成に比べてシャープな保証を得るための特別なアルゴリズムを提案する。

We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit ({\ttfamily ACB}), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm (e.g., \cite{falcon}) that needs the knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second order term in the regret bound. This cost possesses the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice-versa. We also show that a much simpler explore-then-commit (ETC) style algorithm also obtains a similar regret bound, despite not knowing the true model class. However, the cost of model selection is higher in ETC as opposed to in {\ttfamily ACB}, as expected. Furthermore, for the special case of linear contextual bandits, we propose specialized algorithms that obtain sharper guarantees compared to the generic setup.

翻訳日:2023-07-21 19:26:49 公開日:2023-07-20

# 新興ハードウェアのための計算フレームワークとしてのベクトルシンボリックアーキテクチャ

Vector Symbolic Architectures as a Computing Framework for Emerging Hardware ( http://arxiv.org/abs/2106.05268v2 )

ライセンス: Link先を確認

Denis Kleyko, Mike Davies, E. Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov, Jan M. Rabaey, Dmitri A. Rachkovskij, Abbas Rahimi, Friedrich T. Sommer

(参考訳) 本稿では,vector symbolic architectures (vsa) (超次元コンピューティングとしても知られている) の開発における最近の進歩を概観する。このフレームワークは確率的で新興のハードウェアの実装に適しており、人工知能(AI)に必要な認知操作のタイプを自然に表現している。本稿では、vsa の体様代数構造が、現代的な計算に関連する全てのデータ構造と操作をサポートする高次元ベクトル上の単純かつ強力な操作を提供することを示す。さらに,VSAの区別機能である「重ね合わせ計算」について述べる。また、AIアプリケーションに固有の難しい組合せ探索問題に対する効率的なソリューションへの扉を開く。我々はVSAが計算学的に普遍的であることを示す方法をスケッチする。分散表現を用いたコンピューティングのフレームワークとして機能し、新興コンピューティングハードウェアの抽象化レイヤの役割を担っていると考えています。この記事では、vsaの背景にある哲学、それらを用いた分散コンピューティングのテクニック、ニューロモーフィックコンピューティングのような新しいコンピューティングハードウェアとの関連を図示することで、コンピュータアーキテクトへの参照として役立ちます。

This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging hardware, and it naturally expresses the types of cognitive operations required for artificial intelligence (AI). We demonstrate in this article that the field-like algebraic structure of VSA offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant to modern computing. In addition, we illustrate the distinguishing feature of VSA, "computing in superposition," which sets it apart from conventional computing. It also opens the door to efficient solutions to the difficult combinatorial search problems inherent in AI applications. We sketch ways of demonstrating that VSA are computationally universal. We see them acting as a framework for computing with distributed representations that can play the role of an abstraction layer for emerging computing hardware. This article serves as a reference for computer architects by illustrating the philosophy behind VSA, techniques of distributed computing with them, and their relevance to emerging computing hardware, such as neuromorphic computing.
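The operations on high-dimensional vectors mentioned above can be sketched in a few lines. This is a minimal illustration assuming one particular VSA model (random bipolar vectors, element-wise multiplication for binding, majority sign for bundling). The role and filler names are hypothetical, and this is not code from the article.

```python
import random

DIM = 10_000                 # hypervector dimensionality (assumed value)
rng = random.Random(0)       # seeded for reproducibility


def random_hv():
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Binding: element-wise multiplication (self-inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Bundling (superposition): element-wise majority sign.

    Use an odd number of inputs so that no ties occur.
    """
    return [1 if sum(column) > 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product; near 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / DIM


# Hypothetical record {name: alice, age: thirty, city: paris} held in ONE vector.
role_name, role_age, role_city = random_hv(), random_hv(), random_hv()
alice, thirty, paris = random_hv(), random_hv(), random_hv()
record = bundle(bind(role_name, alice), bind(role_age, thirty), bind(role_city, paris))

# Query the superposed record: unbinding with a role gives a noisy copy of its filler.
recovered = bind(record, role_city)
sim_correct = similarity(recovered, paris)
sim_other = similarity(recovered, alice)
```

Unbinding returns a noisy vector, so a real system would follow it with a nearest-neighbour lookup over the known fillers. Here `sim_correct` is clearly separated from `sim_other`, which is what makes that lookup reliable.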

翻訳日:2023-07-21 19:26:29 公開日:2023-07-20

# detreg: オブジェクト検出のための領域優先型教師なし事前トレーニング

DETReg: Unsupervised Pretraining with Region Priors for Object Detection ( http://arxiv.org/abs/2106.04550v5 )

ライセンス: Link先を確認

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

(参考訳) 近年, 物体検出のための自己監督型事前学習法は, 検出アーキテクチャの重要な部分を無視して, 対象検出器のバックボーンの事前訓練に重点を置いている。代わりに、オブジェクトのローカライゼーションと埋め込みコンポーネントを含む、オブジェクト検出ネットワーク全体を事前学習する新しい自己教師ありメソッドであるDETRegを紹介する。事前トレーニング中、DETRegは、教師なし領域提案ジェネレータからのローカライゼーションと一致するオブジェクトのローカライゼーションを予測し、対応する特徴埋め込みと自己教師あり画像エンコーダからの埋め込みを同時に調整する。我々は,DETRファミリーの検出器を用いてDETRegを実装し,COCO,PASCAL VOC,Airbus Shipのベンチマークを微調整することで,競争ベースラインよりも向上することを示す。低データ領域では、DETRegは、1%のラベルと数ショットの学習設定でトレーニングするなど、パフォーマンスの向上を実現している。

Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture. Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator and simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder. We implement DETReg using the DETR family of detectors and show that it improves over competitive baselines when finetuned on COCO, PASCAL VOC, and Airbus Ship benchmarks. In low-data regimes DETReg achieves improved performance, e.g., when training with only 1% of the labels and in the few-shot learning settings.

翻訳日:2023-07-21 19:26:13 公開日:2023-07-20

# 明示的知識指導による意味的逆シナリオ生成

Semantically Adversarial Scenario Generation with Explicit Knowledge Guidance ( http://arxiv.org/abs/2106.04066v6 )

ライセンス: Link先を確認

Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao

(参考訳) 自律運転システムを失敗させる可能性のある敵シナリオを生成することは、堅牢性を改善する効果的な方法である。純粋にデータ駆動生成モデルを拡張し、最近の特殊モデルは、ニューロンレベルで暗黙的にパターンを操作することによって、運転シーンに交通標識を埋め込むなど、制御可能な追加要件を満たす。本稿では,semantically adversarial generation (sag) を実現するために,生成プロセスにドメイン知識を明示的に組み込む手法を提案する。ドライビングシーンの構成に整合性を持たせるために,まず知識を物体の性質と物体間の関係という2つのタイプに分類する。次に,木構造変化型自動エンコーダ(T-VAE)を提案する。ツリー構造におけるノードとエッジの特性にセマンティックルールを課すことで、明示的な知識統合は制御可能な生成を可能にする。本手法の制御性と説明性を示すための合成例を簡潔な設定で構築する。本手法は,異なる最先端の3dポイントクラウドセグメンテーションモデルに対する逆行運転シーンを効率的に識別し,明示的な知識として指定されたトラフィックルールを満たす。

Generating adversarial scenarios, which have the potential to cause autonomous driving systems to fail, provides an effective way to improve robustness. Extending purely data-driven generative models, recent specialized models satisfy additional controllable requirements such as embedding a traffic sign in a driving scene by manipulating patterns implicitly at the neuron level. In this paper, we introduce a method to incorporate domain knowledge explicitly in the generation process to achieve Semantically Adversarial Generation (SAG). To be consistent with the composition of driving scenes, we first categorize the knowledge into two types, the property of objects and the relationship among objects. We then propose a tree-structured variational auto-encoder (T-VAE) to learn hierarchical scene representation. By imposing semantic rules on the properties of nodes and edges in the tree structure, explicit knowledge integration enables controllable generation. We construct a synthetic example to illustrate the controllability and explainability of our method in a succinct setting. We further extend to realistic environments for autonomous vehicles: our method efficiently identifies adversarial driving scenes against different state-of-the-art 3D point cloud segmentation models and satisfies the traffic rules specified as the explicit knowledge.

翻訳日:2023-07-21 19:25:54 公開日:2023-07-20

# 到達可能なマルチスタビリティを最大化するリカレントニューラルネットワークのウォーミングアップが学習を大幅に改善

Warming up recurrent neural networks to maximise reachable multistability greatly improves learning ( http://arxiv.org/abs/2106.01001v3 )

ライセンス: Link先を確認

Gaspard Lambrechts, Florent De Geeter, Nicolas Vecoven, Damien Ernst, Guillaume Drion

(参考訳) リカレントニューラルネットワークのトレーニングは、時間依存が長くなると難しいことが知られている。本研究では、ほとんどの標準セルは初期化時に1つの安定平衡しか持たず、ネットワーク安定平衡の数が増加すると、長い時間依存を持つタスクの学習が一般的に起こることを示す。マルチスタビリティは、初期のモノスタブルネットワークでは容易に実現できないことが多く、入力と出力の間の長時間の依存関係の学習が困難になる。この洞察は、"warmup"と呼ばれる手続きを通じて、任意の再帰的な細胞接続を初期化し、任意に長い時間依存を学習する能力を改善する新しい方法の設計に繋がる。この初期化手順は、ネットワークの到達可能な多重性、すなわち、いくつかの勾配ステップにおいて、関連する入力軌跡を通じて到達可能なネットワーク内の平衡数を最大化するように設計されている。いくつかの情報復元,シーケンス分類,強化学習ベンチマークについて検討し,複数の繰り返しセルにおいて学習速度と性能が大幅に向上するが,時には精度が損なわれることを示した。そこで我々は,高レベルな精度を維持しつつ,長時間依存の学習を大幅に改善できる部分ウォームアップを特徴とする二重層アーキテクチャを導入する。このアプローチは、長期間の依存関係が存在する場合のリカレントセルの学習能力を改善するための一般的なフレームワークを提供する。また,文献から得られた他の初期化および前訓練法が,再発細胞の到達可能な多重化を暗黙的に促進することを示す。

Training recurrent neural networks is known to be difficult when time dependencies become long. In this work, we show that most standard cells only have one stable equilibrium at initialisation, and that learning on tasks with long time dependencies generally occurs once the number of network stable equilibria increases; a property known as multistability. Multistability is often not easily attained by initially monostable networks, making learning of long time dependencies between inputs and outputs difficult. This insight leads to the design of a novel way to initialise any recurrent cell connectivity through a procedure called "warmup" to improve its capability to learn arbitrarily long time dependencies. This initialisation procedure is designed to maximise network reachable multistability, i.e., the number of equilibria within the network that can be reached through relevant input trajectories, in few gradient steps. We show on several information restitution, sequence classification, and reinforcement learning benchmarks that warming up greatly improves learning speed and performance, for multiple recurrent cells, but sometimes impedes precision. We therefore introduce a double-layer architecture initialised with a partial warmup that is shown to greatly improve learning of long time dependencies while maintaining high levels of precision. This approach provides a general framework for improving learning abilities of any recurrent cell when long time dependencies are present. We also show empirically that other initialisation and pretraining procedures from the literature implicitly foster reachable multistability of recurrent cells.
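To make the monostable versus multistable distinction concrete, the toy sketch below counts the stable equilibria reached by a scalar recurrent map `h <- tanh(w * h)` with no input. The scalar cell, the weight values and the set of initial states are assumptions chosen for illustration. This is not the warmup procedure from the paper.

```python
import math


def count_stable_equilibria(w, steps=500):
    """Count distinct limits of h <- tanh(w * h) over several initial states.

    The initial states exclude 0 on purpose: for w > 1 the origin is an
    unstable equilibrium and would otherwise be counted as a limit.
    """
    initial_states = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    limits = set()
    for h in initial_states:
        for _ in range(steps):
            h = math.tanh(w * h)
        limits.add(round(h, 3) + 0.0)  # "+ 0.0" folds -0.0 into 0.0
    return len(limits)


monostable = count_stable_equilibria(0.5)  # small recurrent weight: all states decay to 0
bistable = count_stable_equilibria(2.0)    # larger weight: two symmetric attractors
```

In this toy setting the small recurrent weight gives a single attractor, so the state forgets where it started. The larger weight gives two attractors, so one bit about the past can persist indefinitely.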

翻訳日:2023-07-21 19:25:32 公開日:2023-07-20

# HDGT:シーンエンコーディングによるマルチエージェント軌道予測のための異種駆動グラフ変換器

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding ( http://arxiv.org/abs/2205.09753v2 )

ライセンス: Link先を確認

Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, Junchi Yan

(参考訳) 運転シーンをベクトル表現にエンコーディングすることは、軌道予測のような下流タスクに利益をもたらす自動運転にとって必須のタスクである。駆動シーンは、しばしば異なる種類のオブジェクト(エージェント、レーン、交通標識)のような異種要素を伴い、オブジェクト間の意味的関係は豊かで多様である。一方、要素間の相対性も存在し、これは空間関係が相対的な概念であり、グローバル座標系ではなくエゴ中心の方法で符号化する必要があることを意味する。これらの観測に基づいて,運転シーンを異なる種類のノードとエッジを持つ異種グラフとしてモデル化したバックボーンである異種運転グラフ変換器(HDGT)を提案する。ヘテロジニアスグラフ構築では、様々な意味関係に従って異なる種類のノードを接続する。空間的関係符号化では、ノードの座標とエッジの座標は局所ノード中心座標系に含まれる。グラフニューラルネットワーク(GNN)のアグリゲーションモジュールでは、入力の不均一性に適合する階層的な方法でトランスフォーマー構造を採用する。実験結果から,HDGTは軌道予測およびWaymo Open Motion Challengeにおいて,軌道予測のタスクの最先端性能を達成することが示された。

Encoding a driving scene into vector representations has been an essential task for autonomous driving that can benefit downstream tasks, e.g., trajectory prediction. The driving scene often involves heterogeneous elements such as the different types of objects (agents, lanes, traffic signs), and the semantic relations between objects are rich and diverse. Meanwhile, there also exists relativity across elements, which means that the spatial relation is a relative concept and needs to be encoded in an ego-centric manner instead of in a global coordinate system. Based on these observations, we propose Heterogeneous Driving Graph Transformer (HDGT), a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges. For heterogeneous graph construction, we connect different types of nodes according to diverse semantic relations. For spatial relation encoding, the coordinates of the node as well as its in-edges are in the local node-centric coordinate system. For the aggregation module in the graph neural network (GNN), we adopt the transformer structure in a hierarchical way to fit the heterogeneous nature of inputs. Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction, on INTERACTION Prediction Challenge and Waymo Open Motion Challenge.
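The node-centric encoding described above amounts to a rigid change of coordinates. The sketch below shows one way to express a global 2D point in the frame of a node with a given position and heading. The function name and argument conventions are assumptions for illustration and are not taken from the HDGT code.

```python
import math


def to_local_frame(point, origin, heading):
    """Express a global 2D point in a node-centric frame.

    point, origin: (x, y) tuples in the global frame.
    heading: orientation of the node in radians, measured counter-clockwise
             from the global x-axis; the local x-axis points along it.
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Translate to the node, then rotate by -heading.
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)


# An agent at (1, 1) facing "north"; a lane point one metre north of it
# lies one metre straight ahead in the agent's own frame.
local_x, local_y = to_local_frame((1.0, 2.0), (1.0, 1.0), math.pi / 2)
```

Because every node sees its neighbours in its own frame, the encoded features do not change if the whole scene is translated or rotated in the global frame.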

翻訳日:2023-07-21 19:18:31 公開日:2023-07-20

# Torchhd:超次元コンピューティングとベクトル記号アーキテクチャの研究を支援するオープンソースのPythonライブラリ

Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures ( http://arxiv.org/abs/2205.09208v2 )

ライセンス: Link先を確認

Mike Heddes, Igor Nunes, Pere Vergés, Denis Kleyko, Danny Abraham, Tony Givargis, Alexandru Nicolau, Alexander Veidenbaum

(参考訳) 超次元コンピューティング (HD) またはベクトル記号アーキテクチャ (VSA) は、ランダムな高次元ベクトル空間の性質を利用して分散表現を計算するためのフレームワークである。この特に学際的な分野の研究を集約し、広めるという科学コミュニティのコミットメントは、その進歩の基盤となっている。これらの取り組みの一環として、HD/VSA用の高性能オープンソースPythonライブラリであるTorchhdを紹介します。 Torchhdは、HD/VSAをよりアクセスしやすくし、さらなる研究とアプリケーション開発のための効率的な基盤となることを目指している。 PyTorch上に構築された使いやすいライブラリには、最先端のHD/VSA機能、明確なドキュメント、有名な出版物による実装例などがある。公開されているコードと対応するtorchhd実装を比較すると、実験は最大100倍高速に実行できる。 Torchhd は https://github.com/hyperdimensional-computing/torchhd で利用可能である。

Hyperdimensional computing (HD), also known as vector symbolic architectures (VSA), is a framework for computing with distributed representations by exploiting properties of random high-dimensional vector spaces. The commitment of the scientific community to aggregate and disseminate research in this particularly multidisciplinary area has been fundamental for its advancement. Joining these efforts, we present Torchhd, a high-performance open source Python library for HD/VSA. Torchhd seeks to make HD/VSA more accessible and serves as an efficient foundation for further research and application development. The easy-to-use library builds on top of PyTorch and features state-of-the-art HD/VSA functionality, clear documentation, and implementation examples from well-known publications. Comparing publicly available code with their corresponding Torchhd implementation shows that experiments can run up to 100x faster. Torchhd is available at: https://github.com/hyperdimensional-computing/torchhd.

翻訳日:2023-07-21 19:18:10 公開日:2023-07-20

# 構造力学とビブロア音響に応用した機械学習手法の検討

A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic ( http://arxiv.org/abs/2204.06362v2 )

ライセンス: Link先を確認

Barbara Cunha (LTDS), Christophe Droz (I4S), Abdelmalek Zine (ICJ), Stéphane Foulard, Mohamed Ichchou (LTDS)

(参考訳) 機械学習(ml)の使用は、いくつかの分野に急速に広がり、構造力学や振動音響学(sd\&v)の多くの応用に遭遇している。前例のないデータ可用性、アルゴリズムの進歩と計算能力、意思決定の強化、不確実性処理、パターン認識、リアルタイム評価によって駆動される、データからの洞察を明らかにするmlの能力の増大。 SD\&Vの主要な3つのアプリケーションがこれらの利点を生かしている。構造的健康モニタリングでは、ML検出と予後が安全な操作とメンテナンススケジュールの最適化につながる。システムの識別と制御設計は、アクティブノイズ制御およびアクティブ振動制御におけるML技術によって活用される。最後に、MLベースのサロゲートモデルはコストのかかるシミュレーションの高速な代替手段を提供し、堅牢で最適化された製品設計を可能にします。この地域の多くの作品にもかかわらず、レビューや分析は行われていない。そこで本稿では,これらの分野の統合を追跡し理解するために,sd\&v分析におけるml応用に関する調査を行い,実装の現状と新たな機会について考察する。これら3つの応用ごとに,科学的知識に基づく方法論,利点,限界,推奨事項が同定された。さらに,Digital Twins と Physics Guided ML の役割を,現在の課題を克服し,今後の研究の進展をパワーアップするために検討する。その結果、SD\&Vで適用されたMLの現在の展望を概観し、その分野の進歩と展望について、読者に高度な理解を促すことができた。

The use of Machine Learning (ML) has rapidly spread across several fields and has found many applications in Structural Dynamics and Vibroacoustic (SD\&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithmic advances and computational power, enhance decision making, uncertainty handling, pattern recognition and real-time assessments. Three main applications in SD\&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not yet been reviewed and analyzed. Therefore, to keep track of and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD\&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD\&V and guides the reader to an advanced understanding of progress and prospects in the field.

翻訳日:2023-07-21 19:17:41 公開日:2023-07-20

# 有限体上のランダム原始多項式生成のための量子加速アルゴリズム

Quantum-accelerated algorithms for generating random primitive polynomials over finite fields ( http://arxiv.org/abs/2203.12884v2 )

ライセンス: Link先を確認

Shan Huang, Hua-Lei Yin, Zeng-Bing Chen, Shengjun Wu

(参考訳) 有限体上の原始多項式は、古典的擬似ランダム数生成、符号化理論、ポスト量子暗号など、コンピュータ科学の様々な領域において重要である。それでも、有限体上のランダム原始多項式を生成するための効率的な古典的アルゴリズムの追求は今も続いている課題である。本稿では,この問題をハイブリッド量子古典アルゴリズムを用いて効率的に解く方法を示し,それらを実装するための特定の量子回路の設計について述べる。本研究は,多種多様な量子通信および計算応用におけるランダムプリミティブ多項式の高速かつリアルタイムな生成方法である。

Primitive polynomials over finite fields are crucial for various domains of computer science, including classical pseudo-random number generation, coding theory and post-quantum cryptography. Nevertheless, the pursuit of an efficient classical algorithm for generating random primitive polynomials over finite fields remains an ongoing challenge. In this paper, we show how to solve this problem efficiently through hybrid quantum-classical algorithms, and designs of the specific quantum circuits to implement them are also presented. Our research paves the way for the rapid and real-time generation of random primitive polynomials in diverse quantum communication and computation applications.
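For readers unfamiliar with the object being generated, the sketch below is a brute-force classical primitivity test over GF(2). It uses the fact that a degree-n polynomial with nonzero constant term is primitive exactly when x has multiplicative order 2^n - 1 modulo it. The bitmask encoding and function names are assumptions for illustration. This is not the hybrid quantum-classical algorithm of the paper, and its exponential running time is only practical for small n.

```python
def order_of_x(poly):
    """Multiplicative order of x modulo `poly` over GF(2).

    `poly` is a bitmask: bit k holds the coefficient of x^k, so
    0b10011 encodes x^4 + x + 1. Returns 0 if x is not invertible.
    """
    n = poly.bit_length() - 1          # degree of the polynomial
    if n < 1 or (poly & 1) == 0:       # constant term must be 1
        return 0
    state = 1                          # represents x^0
    for k in range(1, 2 ** n):
        state <<= 1                    # multiply by x
        if (state >> n) & 1:           # degree reached n: reduce modulo poly
            state ^= poly
        if state == 1:
            return k
    return 0


def is_primitive(poly):
    """True if `poly` is a primitive polynomial over GF(2)."""
    n = poly.bit_length() - 1
    return n >= 1 and order_of_x(poly) == 2 ** n - 1


primitive_example = is_primitive(0b10011)    # x^4 + x + 1
irreducible_only = is_primitive(0b11111)     # x^4 + x^3 + x^2 + x + 1, where x has order 5
reducible_example = is_primitive(0b10101)    # x^4 + x^2 + 1 = (x^2 + x + 1)^2
```

The loop walks through up to 2^n - 1 powers of x, which is the cost that makes efficient generation at larger degrees a nontrivial problem.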

翻訳日:2023-07-21 19:17:16 公開日:2023-07-20

# 勾配・投影自由分散オンラインmin-maxリソース最適化

Gradient and Projection Free Distributed Online Min-Max Resource Optimization ( http://arxiv.org/abs/2112.03896v3 )

ライセンス: Link先を確認

Jingrong Wang and Ben Liang

(参考訳) 分散オンラインmin-maxリソース割り当てを並列エージェントとパラメータサーバのセットで検討する。我々のゴールは、これらの関数に関する事前情報なしで、時間変化とコスト関数のセットに対するポイントワイズ最大化を最小化することである。本研究では,非ストラグラーが資源を放棄し,資源をストラグラーと共有することを学ぶ,分散オンラインリソース再配置(dora)と呼ばれる新しいオンラインアルゴリズムを提案する。 DORAの注目すべき特徴は、既存のオンライン最適化戦略とは異なり、勾配計算や投射操作を必要としないことである。これにより、大規模および分散ネットワークにおける計算オーバーヘッドを大幅に削減できる。我々は,DORAの最悪の性能を分析し,非凸関数に対する動的後悔の上限を導出する。さらに,分散オンライン機械学習における帯域幅割り当て問題への応用を検討する。本研究は,提案手法の有効性と,壁面時間短縮のための勾配および/または投影に基づく資源配分アルゴリズムに対する性能上の優位性を示す。

We consider distributed online min-max resource allocation with a set of parallel agents and a parameter server. Our goal is to minimize the pointwise maximum over a set of time-varying and decreasing cost functions, without a priori information about these functions. We propose a novel online algorithm, termed Distributed Online resource Re-Allocation (DORA), where non-stragglers learn to relinquish resources and share them with stragglers. A notable feature of DORA is that it does not require gradient calculation or projection operation, unlike most existing online optimization strategies. This allows it to substantially reduce the computation overhead in large-scale and distributed networks. We analyze the worst-case performance of DORA and derive an upper bound on its dynamic regret for non-convex functions. We further consider an application to the bandwidth allocation problem in distributed online machine learning. Our numerical study demonstrates the efficacy of the proposed solution and its performance advantage over gradient- and/or projection-based resource allocation algorithms in reducing wall-clock time.

翻訳日:2023-07-21 19:16:23 公開日:2023-07-20

# クラウドソーシングにおける適応的多数決の完全性

Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing ( http://arxiv.org/abs/2111.06390v2 )

ライセンス: Link先を確認

Margarita Boyarskaya and Panos Ipeirotis

(参考訳) クラウドソーシングでは、労働者がアイテムを調べ、その正確性に投票することで、品質管理が一般的に達成される。信頼できない労働者の反応の影響を最小限に抑えるために、労働者間の合意のための所定の閾値である$\delta$を超過するまで追加の投票を依頼する$\delta$-margin投票プロセスを利用する。このプロセスは広く採用されているが、ヒューリスティックである。本研究では,マルコフ鎖を吸収して,クラウドソーシングプロセスにおいて重要な投票過程の特性を分析するモデリング手法を提案する。我々は、結果のコンセンサス投票の品質、コンセンサスに必要な投票数、投票要求の分散、その他の分配モーメントに関するクローズドフォーム方程式を提供する。本研究は,精度の異なる労働者を雇用する投票プロセスにおける品質等価性を達成するために,$\delta$のしきい値をどのように調整できるかを示す。また、予測応答精度の異なる投票プロセスに対して、効率等級の支払い率を提供する。さらに,本モデルでは,各例の難易度や難易度が異なる項目について考察する。実世界のクラウドソーシング投票データを用いたシミュレーションは,コンセンサス集約過程を特徴付ける理論モデルの有効性を検証する。本研究の成果は,クラウドソーシングの実用化に効果的に活用できる。

In crowdsourcing, quality control is commonly achieved by having workers examine items and vote on their correctness. To minimize the impact of unreliable worker responses, a $\delta$-margin voting process is utilized, where additional votes are solicited until a predetermined threshold $\delta$ for agreement between workers is exceeded. The process is widely adopted but only as a heuristic. Our research presents a modeling approach using absorbing Markov chains to analyze the characteristics of this voting process that matter in crowdsourced processes. We provide closed-form equations for the quality of the resulting consensus vote, the expected number of votes required for consensus, the variance of vote requirements, and other distribution moments. Our findings demonstrate how the threshold $\delta$ can be adjusted to achieve quality equivalence across voting processes that employ workers with varying accuracy levels. We also provide efficiency-equalizing payment rates for voting processes with different expected response accuracy levels. Additionally, our model considers items with varying degrees of difficulty and uncertainty about the difficulty of each example. Our simulations, using real-world crowdsourced vote data, validate the effectiveness of our theoretical model in characterizing the consensus aggregation process. The results of our study can be effectively employed in practical crowdsourcing applications.
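Under simplifying assumptions, the absorbing-chain view of this voting process reduces to the classical gambler's-ruin random walk. The sketch below assumes that votes are independent, that each is correct with the same probability `p`, and that voting stops once one answer leads by exactly `delta` votes. These are standard textbook formulas under those assumptions, and they are not claimed to match the paper's exact expressions or stopping convention.

```python
def consensus_quality(p, delta):
    """Probability that the correct answer is the first to lead by `delta` votes."""
    q = 1.0 - p
    return p ** delta / (p ** delta + q ** delta)


def expected_votes(p, delta):
    """Expected number of votes until one side leads by `delta` (Wald's identity)."""
    if abs(p - 0.5) < 1e-12:
        return float(delta ** 2)       # symmetric random walk
    return delta * (2.0 * consensus_quality(p, delta) - 1.0) / (2.0 * p - 1.0)


# Workers who are right 70% of the time, with a required margin of 2 votes.
quality = consensus_quality(0.7, 2)
votes = expected_votes(0.7, 2)
```

Raising `delta` pushes the consensus quality towards 1 at the price of more votes. This is the trade-off that makes it possible to match quality across worker pools of different accuracy.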

翻訳日:2023-07-21 19:16:10 公開日:2023-07-20

# 大規模漸近解析による統計的推測のための確率勾配アルゴリズムのチューニング

Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics ( http://arxiv.org/abs/2207.12395v3 )

ライセンス: Link先を確認

Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H. Huggins

(参考訳) 最適化とサンプリングのための確率勾配アルゴリズム(SGA)のチューニングはしばしば一般化可能な理論ではなくヒューリスティックスと試行錯誤に基づいている。この理論は,SGAの大規模統計的漸近をステップサイズ-サンプルサイズスケーリング制限によって特徴付けることによって,実践的ギャップを解消する。そこで本研究では,mleサンプリング分布に比例する共分散を漸近的に有し,大きな固定ステップサイズの反復平均化がチューニングパラメータの選択にロバストであることを示す。また,モデル不特定化に頑健な一般化された後部についても,チューニングを導くためのベルンシュタイン・ヴォン・ミセス的定理を証明した。数値実験により、現実的な有限サンプル状態における結果とレコメンデーションが検証される。我々の研究は、幅広いモデルに対する他の確率勾配マルコフ連鎖モンテカルロアルゴリズムの系統的解析の基礎を成している。

The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.
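Iterate averaging with a fixed step size can be tried on the simplest possible model, estimating a mean under squared loss. The sketch below uses synthetic data, an assumed step size and an assumed iteration count. It only illustrates the mechanism and does not reproduce the paper's experiments or tuning recommendations.

```python
import random


def averaged_sgd_mean(data, step=0.5, n_steps=20_000, seed=0):
    """Fixed-step SGD on the loss 0.5 * (theta - x)^2, with iterate averaging.

    Returns the average of all iterates rather than the last iterate,
    which keeps fluctuating at a scale set by the step size.
    """
    rng = random.Random(seed)
    theta = 0.0
    running_sum = 0.0
    for _ in range(n_steps):
        x = rng.choice(data)              # one-sample stochastic gradient
        theta -= step * (theta - x)       # gradient of the loss is (theta - x)
        running_sum += theta
    return running_sum / n_steps


# Synthetic data set (assumed): 1000 draws from a normal with mean 3.
data_rng = random.Random(1)
data = [data_rng.gauss(3.0, 1.0) for _ in range(1000)]

estimate = averaged_sgd_mean(data)
sample_mean = sum(data) / len(data)       # the minimiser, i.e. the MLE of the mean
```

With a step size of 0.5 the last iterate on its own remains noisy, while the averaged iterate lands close to the sample mean.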

翻訳日:2023-07-21 19:07:49 公開日:2023-07-20

# オンライン実験設計による線形MDPのインスタンス依存ニア最適ポリシー同定

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design ( http://arxiv.org/abs/2207.02575v2 )

ライセンス: Link先を確認

Andrew Wagenmaker, Kevin Jamieson

(参考訳) 強化学習(RL)のミニマックスサンプル複雑性("Worst-case"インスタンスでの学習の複雑さ)を理解するために多くの進歩があったが、そのような複雑さの尺度は学習の真の困難を捉えていないことが多い。実際、"簡単"なインスタンスでは、最悪のケースで達成可能なものよりもはるかに複雑なものを達成することを望んでいます。本研究は,線形関数近似を用いたRLの設定において,ニア最適化ポリシー(PAC RL)を学習する際の「インスタンス依存」の複雑さを理解することを目的とする。本稿では,関数近似設定付きrlにおいて,その1つ目となる,複雑性のきめ細かなインスタンス依存測度を実現するアルゴリズムである \textsc{pedel} を提案する。明示的な例を通して,低regret,minimax-Optimalアルゴリズムよりも証明可能なゲインが得られ,そのようなアルゴリズムがインスタンス最適化率に到達できないことを示す。提案手法は, 探索予算を, 最適に近い政策の学習に最も関係のある「方向」に着目し, 独立した興味を持ったオンライン実験手法に依拠する。

While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.

翻訳日:2023-07-21 19:07:24 公開日:2023-07-20

# グラフ理論からの不確実性関係

Uncertainty relations from graph theory ( http://arxiv.org/abs/2207.02197v4 )

ライセンス: Link先を確認

Carlos de Gois, Kiara Hansenne, Otfried Gühne

(参考訳) 量子測定は本質的に確率的であり、しばしば同時測定の結果を正確に予測することを禁じられる。この現象は不確実性関係によって捉え、定量化される。量子論の発端から研究されているが、量子測定の集合の期待値を決定する問題は、一般には未解決のままである。可観測物とグラフ理論の密接な関係を構築することにより、任意の二コトミック可観測物に対して妥当な不確実性関係を導出する。これらの関係は、多くの場合、密で、関連するグラフの最大傾きの大きさに関連している。応用として, エントロピーの不確実性関係, 分離可能性基準, 絡み合い証人の定式化に, 本結果は直接的に利用できる。

Quantum measurements are inherently probabilistic, and quantum theory often forbids precise prediction of the outcomes of simultaneous measurements. This phenomenon is captured and quantified through uncertainty relations. Although studied since the inception of quantum theory, the problem of determining the possible expectation values of a collection of quantum measurements remains, in general, unsolved. By constructing a close connection between observables and graph theory, we derive uncertainty relations valid for any set of dichotomic observables. These relations are, in many cases, tight, and related to the size of the maximum clique of the associated graph. As applications, our results can be straightforwardly used to formulate entropic uncertainty relations, separability criteria and entanglement witnesses.
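A well-known special case gives a feel for the kind of relation involved. It is stated here as background in notation assumed for illustration ($A_i$ for the observables, $\langle \cdot \rangle$ for expectation values in a fixed state), and it is not the general graph-based bound derived in the paper.

```latex
% Dichotomic observables: A_i^2 = \mathbb{1}, outcomes \pm 1.
% If they pairwise anticommute, \{A_i, A_j\} = 0 for i \neq j, then
\[
  \sum_{i=1}^{n} \langle A_i \rangle^{2} \le 1 .
\]
% Example: the Pauli operators X, Y, Z on a single qubit satisfy
\[
  \langle X \rangle^{2} + \langle Y \rangle^{2} + \langle Z \rangle^{2} \le 1 ,
\]
% which says that the Bloch vector of any qubit state has length at most 1.
```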

翻訳日:2023-07-21 19:07:01 公開日:2023-07-20

# convolutional generative adversarial networkを用いたノイズ時系列のデータ駆動モデリング

Data-Driven Modeling of Noise Time Series with Convolutional Generative Adversarial Networks ( http://arxiv.org/abs/2207.01110v3 )

ライセンス: Link先を確認

Adam Wunderlich, Jack Sklar

(参考訳) 物理過程から生じるランダムノイズは測定の固有の特性であり、ほとんどの信号処理やデータ解析タスクの制限要因である。データ駆動型モデリングにおけるGAN(Generative Adversarial Network)に対する近年の関心を考えると、GANがターゲットデータセットのノイズを忠実に再現できる範囲を決定することが重要である。本稿では,この問題を時系列で解明することを目的とした実証的な調査を行う。すなわち、一般的な深層畳み込みGAN(DCGAN)アーキテクチャ、直接時系列モデル、短時間フーリエ変換(STFT)データ表現を用いた画像ベースモデルに基づく時系列用汎用GANを2つ評価する。 GANモデルは、既知の地絡パラメータを持つ模擬ノイズ時系列の分布を用いて、訓練および定量的評価を行う。ターゲットの時系列分布には、帯域制限熱ノイズ、電力法ノイズ、ショットノイズ、衝動ノイズなど、物理測定、電子機器、通信システムで一般的に見られる幅広い種類のノイズが含まれる。 ganは、多くのノイズタイプを学習できるが、ganアーキテクチャがノイズのいくつかの側面、例えば、極端な異常値を持つ衝動時系列に適していない場合、予測的に苦労する。本研究は, 時系列GANに対する現在のアプローチの能力と潜在的な限界に関する知見と, 今後の研究分野のハイライトを提供するものである。さらに,テストのバッテリは時系列の深部生成モデルの開発に役立つ有用なベンチマークを提供する。

Random noise arising from physical processes is an inherent characteristic of measurements and a limiting factor for most signal processing and data analysis tasks. Given the recent interest in generative adversarial networks (GANs) for data-driven modeling, it is important to determine to what extent GANs can faithfully reproduce noise in target data sets. In this paper, we present an empirical investigation that aims to shed light on this issue for time series. Namely, we assess two general-purpose GANs for time series that are based on the popular deep convolutional GAN (DCGAN) architecture, a direct time-series model and an image-based model that uses a short-time Fourier transform (STFT) data representation. The GAN models are trained and quantitatively evaluated using distributions of simulated noise time series with known ground-truth parameters. Target time series distributions include a broad range of noise types commonly encountered in physical measurements, electronics, and communication systems: band-limited thermal noise, power law noise, shot noise, and impulsive noise. We find that GANs are capable of learning many noise types, although they predictably struggle when the GAN architecture is not well suited to some aspects of the noise, e.g., impulsive time-series with extreme outliers. Our findings provide insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid the development of deep generative models for time series.

翻訳日:2023-07-21 19:06:49 公開日:2023-07-20

# 周辺視トランスフォーマ

Vicinity Vision Transformer ( http://arxiv.org/abs/2206.10552v2 )

ライセンス: Link先を確認

Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

(参考訳) 視覚変換器は多くのコンピュータビジョンタスクで大きな成功を収めている。しかし、その中心的なコンポーネントであるSoftmax attentionは、計算複雑性とメモリフットプリントが二次的であるため、視覚変換器が高解像度の画像にスケールアップすることを禁止している。同様の問題を緩和するために自然言語処理(nlp)タスクに線形注意が導入されたが、既存の線形注意を視覚トランスフォーマーに直接適用することは、十分な結果をもたらすことはない。この問題を調査し,コンピュータビジョンタスクがNLPタスクよりもローカル情報に重点を置いていることを見出した。この観測に基づいて,線形複雑度を有する視覚変換器に局所性バイアスを導入するビシニティ注意法を提案する。具体的には,各画像パッチに対して,隣接パッチを用いて測定した2次元マンハッタン距離に基づいて注意重みを調節する。この場合、近隣のパッチは遠方のパッチよりも強い注目を集める。さらに,その効率性を示すためにはトークン長を特徴量よりも大きくする必要があるため,精度を損なうことなく特徴量を削減する新しい近傍視覚トランスフォーマ(vvt)構造を提案する。我々は,CIFAR100, ImageNet1K, ADE20Kデータセットについて広範囲に実験を行い,本手法の有効性を検証した。提案手法は,入力解像度が大きくなると,従来のトランスフォーマーベースおよび畳み込みベースネットワークよりもGFlopsの速度が遅い。特に,従来の手法よりも50%少ないパラメータで,最先端の画像分類精度を実現する。

Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prevents vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Although linear attention was introduced in natural language processing (NLP) tasks to mitigate a similar issue, directly applying existing linear attention to vision transformers may not lead to satisfactory results. We investigate this problem and find that computer vision tasks focus more on local information than NLP tasks do. Based on this observation, we present Vicinity Attention, which introduces a locality bias to vision transformers with linear complexity. Specifically, for each image patch, we adjust its attention weights based on the 2D Manhattan distance to its neighbouring patches. As a result, neighbouring patches receive stronger attention than far-away patches. Moreover, since Vicinity Attention requires the token length to be much larger than the feature dimension to show its efficiency advantages, we further propose a new Vicinity Vision Transformer (VVT) structure to reduce the feature dimension without degrading accuracy. We perform extensive experiments on the CIFAR100, ImageNet1K, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate of GFlops than previous transformer-based and convolution-based networks when the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
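The locality bias rests on the 2D Manhattan distance between patches on the image grid. The sketch below only illustrates that distance and one possible distance-decaying weight. The linear decay in `locality_weights` is an assumption made for illustration and is not the paper's formula. The sketch also materializes the full pairwise matrix, so it is quadratic in the number of patches and does not reproduce the linear-complexity formulation the abstract describes.

```python
import numpy as np

def patch_manhattan_distance(height, width):
    """Pairwise 2D Manhattan distance between patches of a height x width grid (row-major order)."""
    rows, cols = np.divmod(np.arange(height * width), width)
    return np.abs(rows[:, None] - rows[None, :]) + np.abs(cols[:, None] - cols[None, :])

def locality_weights(height, width):
    """Hypothetical weights that decay linearly from 1 (same patch) to 0 (farthest patch)."""
    dist = patch_manhattan_distance(height, width)
    max_dist = (height - 1) + (width - 1)
    return 1.0 - dist / max(max_dist, 1)
```

For a 2 x 3 grid, patch 0 sits at (0, 0) and patch 5 at (1, 2), so their distance is 1 + 2 = 3 and the illustrative weight between them is 0.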

翻訳日:2023-07-21 19:06:22 公開日:2023-07-20

# Pythae: Pythonで生成オートエンコーダを統合する - ベンチマークユースケース

Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case ( http://arxiv.org/abs/2206.08309v2 )

ライセンス: Link先を確認

Clément Chadebec and Louis J. Vincent and Stéphanie Allassonnière

(参考訳) 近年,複雑な分布をモデル化する能力から,深い生成モデルへの関心が高まっている。これらのモデルのうち、変分オートエンコーダは計算効率が良く、複数の分野で印象的な結果をもたらすことが証明され、人気を集めている。このブレークスルーの後、オリジナルの出版を改善するために広範な研究が行われ、様々なタスクに対応する様々なVAEモデルが生み出された。本稿では,汎用的なpythonライブラリであるpythaeについて述べる。pythaeは統一的な実装と,生成型オートエンコーダモデルの単純で再現性があり,信頼性の高い使用を可能にする専用フレームワークを提供する。次に,本ライブラリを用いてケーススタディベンチマークを行い,画像再構成,生成,分類,クラスタリング,補間といった下流タスクにおける主な改善点を代表する19個の生成型オートエンコーダモデルを比較し,比較する。オープンソースライブラリはhttps://github.com/clementchadebec/benchmark_vaeにある。

In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions. Among those models, variational autoencoders have gained popularity as they have proven both to be computationally efficient and yield impressive results in multiple fields. Following this breakthrough, extensive research has been done in order to improve the original publication, resulting in a variety of different VAE models in response to different tasks. In this paper we present Pythae, a versatile open-source Python library providing both a unified implementation and a dedicated framework allowing straightforward, reproducible and reliable use of generative autoencoder models. We then propose to use this library to perform a case study benchmark where we present and compare 19 generative autoencoder models representative of some of the main improvements on downstream tasks such as image reconstruction, generation, classification, clustering and interpolation. The open-source library can be found at https://github.com/clementchadebec/benchmark_VAE.

翻訳日:2023-07-21 19:06:00 公開日:2023-07-20

# 事前学習された知覚機能は差分プライベート画像生成を改善する

Pre-trained Perceptual Features Improve Differentially Private Image Generation ( http://arxiv.org/abs/2205.12900v4 )

ライセンス: Link先を確認

Fredrik Harder and Milad Jalali Asadabadi and Danica J. Sutherland and Mijung Park

(参考訳) 偏極性確率勾配勾配勾配(DP-SGD)を持つ中等度サイズの生成モデルの訓練は困難であり、適切なプライバシーレベルに必要なノイズレベルは、単に大きすぎる。代わりに、情報のある公開データセットに適切な、関連する表現を構築し、その表現でプライベートデータをモデル化することを学びます。特に、公開データセットから学習した知覚的特徴に基づくカーネルを用いて、プライベートなターゲットデータとジェネレータの分散との間の最大平均不一致(mmd)を最小限に抑える。 mmdでは、dp-sgdのように最適化の各ステップにノイズを導入するのではなく、データ依存の用語を何度でも民営化することができる。当社のアルゴリズムでは,MNISTやFashionMNISTなどのデータセットを大容量の$\epsilon \approx 10$で対象とする,分散における特徴を捉えたCIFAR10レベルのイメージを$\epsilon \approx 2$で生成することができる。我々の研究は、プライベートと非プライベートの深層生成モデルの間のギャップを減らすためのシンプルで強力な基盤を導入しました。私たちのコードは \url{https://github.com/ParkLabML/DP-MEPF} で利用可能です。

Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models. Our code is available at https://github.com/ParkLabML/DP-MEPF.
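The idea of privatizing the data-dependent term once can be sketched with a finite-dimensional feature map, where the squared MMD reduces to the squared distance between mean embeddings. This is a minimal sketch under assumptions, not the authors' implementation: the function names are hypothetical, feature rows are norm-clipped to at most 1, and `noise_multiplier` is taken as given rather than calibrated to a target $\epsilon$.

```python
import numpy as np

def mean_embedding(features):
    """Mean of feature rows after clipping each row to L2 norm <= 1."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return (features / np.maximum(norms, 1.0)).mean(axis=0)

def privatize_once(private_features, noise_multiplier, rng):
    """Add Gaussian noise to the private mean embedding a single time."""
    n = private_features.shape[0]
    sensitivity = 2.0 / n  # replacing one clipped row moves the mean by at most 2/n in L2 norm
    mu = mean_embedding(private_features)
    return mu + rng.normal(0.0, noise_multiplier * sensitivity, size=mu.shape)

def mmd_squared(private_embedding, generated_features):
    """Squared distance between the (noisy) private embedding and the generator's embedding."""
    diff = private_embedding - mean_embedding(generated_features)
    return float(diff @ diff)
```

In this sketch the noisy embedding is computed once and then reused at every generator update, so the optimization loop itself never touches the private data again.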

翻訳日:2023-07-21 19:05:44 公開日:2023-07-20

# MotionBERT:人間の動きの表現を学習する統一的な視点

MotionBERT: A Unified Perspective on Learning Human Motion Representations ( http://arxiv.org/abs/2210.06551v4 )

ライセンス: Link先を確認

Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang

(参考訳) 本稿では,大規模・異種データ資源から人間の動作表現を学習し,人間中心のビデオ課題に取り組むための統一的な視点を提案する。具体的には,ノイズのある部分的な2次元観測から基礎となる3次元運動を復元するために,動きエンコーダを訓練する事前学習ステージを提案する。この方法で得られた運動表現は、人の動きに関する幾何学的、運動学的、物理的知識を取り入れており、容易に複数の下流タスクに転送できる。動作エンコーダをDST(Dual-stream Spatio-temporal Transformer)ニューラルネットワークで実装する。骨格関節の長距離時空間的関係を包括的かつ適応的に捉え、スクラッチから訓練された場合の最低3次元ポーズ推定誤差を例示する。さらに,提案手法は,学習した動作表現の汎用性を示す単純な回帰ヘッド(1-2層)で事前学習した動きエンコーダを微調整することで,3つの下流タスクの最先端性能を実現する。コードとモデルはhttps://motionbert.github.io/で入手できる。

We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. Specifically, we propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations. The motion representations acquired in this way incorporate geometric, kinematic, and physical knowledge about human motion, which can be easily transferred to multiple downstream tasks. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network. It could capture long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, exemplified by the lowest 3D pose estimation error so far when trained from scratch. Furthermore, our proposed framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with a simple regression head (1-2 layers), which demonstrates the versatility of the learned motion representations. Code and models are available at https://motionbert.github.io/

翻訳日:2023-07-21 19:00:18 公開日:2023-07-20

# ローカルクエリはいつ堅牢な学習に有用か?

When are Local Queries Useful for Robust Learning? ( http://arxiv.org/abs/2210.06089v2 )

ライセンス: Link先を確認

Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, James Worrell

(参考訳) 正確なボール内ロバストリスクと、Gourdeau et al. (2019) によるランダムな例へのアクセスを考えると、概念クラスの堅牢な学習性には分布仮定が必要であることが示されている。本稿では,局所的クエリを用いて学習者がより多くのパワーを与えられる学習モデルについて検討し,このロバスト性の概念に対してロバストな経験的リスク最小化(erm)を行う最初の分散フリーアルゴリズムを提案する。私たちが検討する最初の学習モデルは、学習者がトレーニングサンプルの近くのポイントのラベルをクエリできるローカルメンバシップクエリ(LMQ)を使用する。均一分布の下では、LMQ は接続の堅牢性しきい値や、決定リストやハーフスペースのような任意のスーパークラスを増大させません。この否定的な結果に直面した私たちは、ローカル等価クエリ(\mathsf{leq}$)オラクルを紹介します。これは、仮説と対象概念がトレーニングサンプルの点の摂動領域で一致しているか、あるいはその存在が反例なのかを返します。一方、クエリ半径$\lambda$が敵の摂動予算$\rho$より厳密に小さい場合、分散のない堅牢な学習は様々な概念クラスでは不可能である。そして、オンライン学習保証に基づいてこれらのアルゴリズムの問合せ複雑性を制限し、特別な結合の場合にはこれらの境界をさらに改善します。最後に、$\{0,1\}^n$のハーフスペースに対するロバストな学習アルゴリズムを与え、精度制限された敵に対して$\mathbb{r}^n$のハーフスペースに対するロバスト性保証を得る。

Distributional assumptions have been shown to be necessary for the robust learnability of concept classes when considering the exact-in-the-ball robust risk and access to random examples by Gourdeau et al. (2019). In this paper, we study learning models where the learner is given more power through the use of local queries, and give the first distribution-free algorithms that perform robust empirical risk minimization (ERM) for this notion of robustness. The first learning model we consider uses local membership queries (LMQ), where the learner can query the label of points near the training sample. We show that, under the uniform distribution, LMQs do not increase the robustness threshold of conjunctions and any superclass, e.g., decision lists and halfspaces. Faced with this negative result, we introduce the local equivalence query ($\mathsf{LEQ}$) oracle, which returns whether the hypothesis and target concept agree in the perturbation region around a point in the training sample, as well as a counterexample if it exists. We show a separation result: on the one hand, if the query radius $\lambda$ is strictly smaller than the adversary's perturbation budget $\rho$, then distribution-free robust learning is impossible for a wide variety of concept classes; on the other hand, the setting $\lambda=\rho$ allows us to develop robust ERM algorithms. We then bound the query complexity of these algorithms based on online learning guarantees and further improve these bounds for the special case of conjunctions. We finish by giving robust learning algorithms for halfspaces on $\{0,1\}^n$ and then obtaining robustness guarantees for halfspaces in $\mathbb{R}^n$ against precision-bounded adversaries.
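A local equivalence query on $\{0,1\}^n$ can be illustrated by brute force over a Hamming ball. This is only a sketch of the oracle's input/output behaviour as the abstract describes it. The function names are hypothetical, and the enumeration is exponential in the radius, so it says nothing about the query-complexity bounds in the paper.

```python
from itertools import combinations

def hamming_ball(x, radius):
    """Yield every point of {0,1}^n within Hamming distance `radius` of x."""
    n = len(x)
    for k in range(radius + 1):
        for idx in combinations(range(n), k):
            z = list(x)
            for i in idx:
                z[i] ^= 1  # flip the selected bit
            yield tuple(z)

def local_equivalence_query(hypothesis, target, x, radius):
    """Return (True, None) if hypothesis and target agree on the ball, else (False, counterexample)."""
    for z in hamming_ball(x, radius):
        if hypothesis(z) != target(z):
            return False, z
    return True, None
```

For example, with target `x0 AND x1`, hypothesis `x0`, and the point `(1, 1, 0)`, the two functions agree at radius 0 but differ at `(1, 0, 0)`, which lies at Hamming distance 1.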

翻訳日:2023-07-21 18:59:58 公開日:2023-07-20

# MAP:マルチモーダル不確かさを意識したビジョンランゲージ事前学習モデル

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model ( http://arxiv.org/abs/2210.05335v3 )

ライセンス: Link先を確認

Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

(参考訳) マルチモーダルな意味理解は、しばしば不確実性を扱う必要があり、つまり、得られたメッセージは複数のターゲットを参照する傾向がある。このような不確実性は、モーダル間の不確実性を含む私たちの解釈には問題があります。この不確実性のモデリング、特にラベルのないデータセットの事前トレーニングやタスク固有のダウンストリームデータセットの微調整についてはほとんど研究されていない。本稿では,確率分布エンコーダ(Probability Distribution Encoder:PDE)を用いて,全てのモードを確率分布として表現する。既存の決定論的手法と比較して、そのような不確実性モデリングはよりリッチなマルチモーダル意味情報やより複雑な関係を伝達することができる。さらに、一般的な事前学習フレームワークと不確実性モデリングを統合し、分布ベース視覚言語コントラスト学習(D-VLC)、分布ベースマスケッド言語モデリング(D-MLM)、分布ベース画像テキストマッチング(D-ITM)といった適切な事前学習タスクを提案する。微調整されたモデルは、画像テキスト検索、視覚的質問応答、視覚的推論、視覚的推論などの下流タスクに適応し、最先端の結果を達成する。

Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort has studied the modeling of this uncertainty, particularly in pre-training on unlabeled datasets and fine-tuning in task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder (PDE) by utilizing sequence-level interactions. Compared to the existing deterministic methods, such uncertainty modeling can convey richer multimodal semantic information and more complex relationships. Furthermore, we integrate uncertainty modeling with popular pre-training frameworks and propose suitable pre-training tasks: Distribution-based Vision-Language Contrastive learning (D-VLC), Distribution-based Masked Language Modeling (D-MLM), and Distribution-based Image-Text Matching (D-ITM). The fine-tuned models are applied to challenging downstream tasks, including image-text retrieval, visual question answering, visual reasoning, and visual entailment, and achieve state-of-the-art results.

翻訳日:2023-07-21 18:59:12 公開日:2023-07-20

# 敵対的ノイズに対するフレンドリーなノイズ:データ中毒攻撃に対する強力な防御

Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks ( http://arxiv.org/abs/2208.10224v4 )

ライセンス: Link先を確認

Tian Yu Liu, Yu Yang, Baharan Mirzasoleiman

(参考訳) 見えない)データ中毒攻撃の強力なカテゴリは、特定のテスト時間データの予測を変更するために、小さな敵の摂動によってトレーニング例のサブセットを変更する。既存の防御機構は、しばしば一般化性能を著しく損なうか、攻撃固有のもので、適用が著しく遅いため、実際に配備されることは望ましくない。そこで本研究では, 従来の手法とは異なり, 一般化性能の低下により, 各種の目に見えない毒素攻撃を回避できる簡易かつ高効率な手法を提案する。攻撃が局所的な鋭い領域に高訓練損失をもたらし、それが最小化されると、敵の摂動を学習し、攻撃を成功させるという重要な観察を行う。毒殺攻撃を打破するためには、毒によって引き起こされる鋭い喪失領域を緩和する。そこで本手法は, 性能を劣化させることなく, 最大摂動音に対して発生する最適化親和性雑音と, ランダムに変化する雑音成分の2成分からなる。両方のコンポーネントの組み合わせは、非常に軽量だが、最も強力なトリガーレスターゲットおよび隠れトリガーバックドア中毒攻撃に対して非常に効果的に防御する、例えば勾配マッチング、ブルズアイポリトープ、睡眠剤などである。我々は、我々のフレンドリーなノイズが他のアーキテクチャに転送可能であることを示し、適応的な攻撃はランダムなノイズ成分のために我々の防御を損なうことができないことを示す。私たちのコードは、https://github.com/tianyu139/friendly-noiseで利用可能です。

A powerful category of (invisible) data poisoning attacks modify a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm the generalization performance, or are attack-specific, and prohibitively slow to apply. Here, we propose a simple but highly effective approach that unlike existing methods breaks various types of invisible poisoning attacks with the slightest drop in the generalization performance. We make the key observation that attacks introduce local sharp regions of high training loss, which when minimized, results in learning the adversarial perturbations and makes the attack successful. To break poisoning attacks, our key idea is to alleviate the sharp loss regions introduced by poisons. To do so, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading the performance, and a randomly varying noise component. The combination of both components builds a very light-weight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and adaptive attacks cannot break our defense due to its random noise component. Our code is available at: https://github.com/tianyu139/friendly-noise

翻訳日:2023-07-21 18:57:23 公開日:2023-07-20

# 世論市場モデル:ポジティブな介入による極右意見の拡散

Opinion Market Model: Stemming Far-Right Opinion Spread using Positive Interventions ( http://arxiv.org/abs/2208.06620v2 )

ライセンス: Link先を確認

Pio Calderon, Rohit Ram, Marian-Andrei Rizoiu

(参考訳) オンライン過激主義は、ヘイトスピーチの正規化、ユーザーの過激化、社会的分裂の増加など、深刻な社会的結果をもたらす。これらの結果に対処するために様々な緩和戦略が検討されている。そのような戦略の1つはポジティブな介入:特定の意見を促進するために意見エコシステムに注意を向ける制御されたシグナルである。ポジティブ介入の有効性を評価するために,オピニオン間相互作用とポジティブ介入の役割の両方を考慮した2層オンラインオピニオン・エコシステムモデルであるオピニオン・マーケット・モデル(omm)を提案する。市場注目市場の大きさは、多変量離散時間ホークスプロセスを用いて第1階層でモデル化され、第2階層では、市場シェアアトラクションモデルを用いて限られた注意を払って、意見が協調して市場シェアを競う。合成データセット上で提案した推定手法の収束性を示す。次に、2つの学習タスクでOMMをテストし、2つの実世界のデータセットを適用して市場シェアを予測し、オンラインアイテム間の潜伏関係を明らかにする。最初のデータセットはfacebookとtwitterの議論で、ブッシュファイアと気候変動に関する中道と極右の意見を含んでいる。第2のデータセットは、人気のVEVOアーティストのYouTubeとTwitterのアテンションボリュームをキャプチャする。 OMMは、両方のデータセットで最先端の予測モデルより優れており、潜在的な協調競合関係を捉えている。我々は,(1)ブッシュファイアに関する極右意見と中道派意見の自己・相互強化,(2)コラボレーションや長期にわたる確執といった現実世界の相互作用と相関する対関係的アーティスト関係を明らかにする。最後に、OMMを肯定的な介入のためのテストベッドとして使用し、メディアカバレッジが極右意見の拡散をどう調節するかを示す。

Online extremism has severe societal consequences, including normalizing hate speech, user radicalization, and increased social divisions. Various mitigation strategies have been explored to address these consequences. One such strategy uses positive interventions: controlled signals that add attention to the opinion ecosystem to boost certain opinions. To evaluate the effectiveness of positive interventions, we introduce the Opinion Market Model (OMM), a two-tier online opinion ecosystem model that considers both inter-opinion interactions and the role of positive interventions. The size of the opinion attention market is modeled in the first tier using the multivariate discrete-time Hawkes process; in the second tier, opinions cooperate and compete for market share, given limited attention using the market share attraction model. We demonstrate the convergence of our proposed estimation scheme on a synthetic dataset. Next, we test OMM on two learning tasks, applying to two real-world datasets to predict attention market shares and uncover latent relationships between online items. The first dataset comprises Facebook and Twitter discussions containing moderate and far-right opinions about bushfires and climate change. The second dataset captures popular VEVO artists' YouTube and Twitter attention volumes. OMM outperforms the state-of-the-art predictive models on both datasets and captures latent cooperation-competition relations. We uncover (1) self- and cross-reinforcement between far-right and moderate opinions on the bushfires and (2) pairwise artist relations that correlate with real-world interactions such as collaborations and long-lasting feuds. Lastly, we use OMM as a testbed for positive interventions and show how media coverage modulates the spread of far-right opinions.
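The second tier can be illustrated with the basic form of a market share attraction model, in which each opinion's share is its attraction divided by the total attraction. This is a minimal sketch under assumptions: the function names are hypothetical, the attractions are given directly rather than learned, and the total attention is a plain input here, whereas the paper models it with a Hawkes process in the first tier.

```python
import numpy as np

def market_shares(attractions):
    """Share of each item: its attraction divided by the sum of all attractions."""
    a = np.asarray(attractions, dtype=float)
    if np.any(a < 0) or a.sum() <= 0:
        raise ValueError("attractions must be non-negative with a positive sum")
    return a / a.sum()

def split_attention(total_attention, attractions):
    """Distribute a fixed attention volume across opinions according to their shares."""
    return total_attention * market_shares(attractions)
```

Because shares always sum to one, raising one opinion's attraction (for instance through an intervention) necessarily lowers the shares of the others, which is the competition effect the abstract refers to.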

翻訳日:2023-07-21 18:56:10 公開日:2023-07-20

# カオスと乱流のニューラルネットワーク複雑性

Neural Network Complexity of Chaos and Turbulence ( http://arxiv.org/abs/2211.15382v2 )

ライセンス: Link先を確認

Tim Whittaker, Romuald A. Janik, Yaron Oz

(参考訳) カオスと乱流は複雑な物理現象であるが、それらを定量化する複雑性測度の正確な定義はまだ欠けている。本研究では,深層ニューラルネットワークの観点からカオスと乱流の相対的複雑性を考える。本研究では, カオス状態における流体プロファイルと, 様々なノイズ構造, 実世界の画像などの他の種類の画像とを, ネットワークが区別しなければならない一連の分類問題を解析する。非圧縮性および弱い圧縮性流体流の解析を行う。本研究では,内部特徴表現の内在的な次元を通してネットワークが行う計算の複雑さを定量化し,ネットワークがクラスを区別するために使用する独立特徴の有効数を算出する。この尺度は計算の複雑さを数値的に推定するだけでなく、中間段階と最終段階におけるニューラルネットワーク処理を特徴付ける。逆例を構築し,これらを用いてカオス渦と乱流渦の2点相関スペクトルを,ネットワークが分類に用いた特徴として同定する。

Chaos and turbulence are complex physical phenomena, yet a precise definition of the complexity measure that quantifies them is still lacking. In this work we consider the relative complexity of chaos and turbulence from the perspective of deep neural networks. We analyze a set of classification problems, where the network has to distinguish images of fluid profiles in the turbulent regime from other classes of images such as fluid profiles in the chaotic regime, various constructions of noise and real world images. We analyze incompressible as well as weakly compressible fluid flows. We quantify the complexity of the computation performed by the network via the intrinsic dimensionality of the internal feature representations, and calculate the effective number of independent features which the network uses in order to distinguish between classes. In addition to providing a numerical estimate of the complexity of the computation, the measure also characterizes the neural network processing at intermediate and final stages. We construct adversarial examples and use them to identify the two point correlation spectra for the chaotic and turbulent vorticity as the feature used by the network for classification.

翻訳日:2023-07-21 18:48:31 公開日:2023-07-20

# テンソルネットワークを用いた正のラベルなし学習

Positive unlabeled learning with tensor networks ( http://arxiv.org/abs/2211.14085v3 )

ライセンス: Link先を確認

Bojan Žunkovič

(参考訳) 正のラベルなし学習は正のラベルなしデータを持つ二項分類問題である。医療やパーソナライズされた広告など、ネガティブなラベルが高価または不可能なドメインでは一般的である。正のラベルなし学習へのほとんどのアプローチは、特定のデータ型(画像、分類データなど)に適用され、新しい正と負のサンプルを生成できない。この研究は、正の未ラベル学習問題に対する特徴空間距離に基づくテンソルネットワークアプローチを導入する。提案手法はドメイン固有ではなく、MNIST画像と15の分類/混合データセットの最先端結果を大幅に改善する。トレーニングされたテンソルネットワークモデルもまた生成モデルであり、新しい正および負のインスタンスの生成を可能にする。

Positive unlabeled learning is a binary classification problem with positive and unlabeled data. It is common in domains where negative labels are costly or impossible to obtain, e.g., medicine and personalized advertising. Most approaches to positive unlabeled learning apply to specific data types (e.g., images, categorical data) and cannot generate new positive and negative samples. This work introduces a feature-space distance-based tensor network approach to the positive unlabeled learning problem. The presented method is not domain specific and significantly improves the state-of-the-art results on the MNIST image dataset and 15 categorical/mixed datasets. The trained tensor network model is also a generative model and enables the generation of new positive and negative instances.

翻訳日:2023-07-21 18:48:14 公開日:2023-07-20

# オンライン強化学習におけるオフラインデータ活用

Leveraging Offline Data in Online Reinforcement Learning ( http://arxiv.org/abs/2211.04974v2 )

ライセンス: Link先を確認

Andrew Wagenmaker, Aldo Pacchiano

(参考訳) 強化学習(RL)コミュニティには,オンラインRLとオフラインRLという,2つの中心的なパラダイムが出現している。オンラインRL設定では、エージェントは環境に関する事前の知識を持っておらず、$\epsilon$-Optimal Policyを見つけるためにそれと対話する必要がある。オフラインのrl設定では、学習者は、学習する固定データセットにアクセスするが、それ以外は環境とのインタラクションができず、オフラインデータから可能な最高のポリシーを取得する必要がある。もしいくつかのオフラインデータがあり、環境と相互作用する可能性があるなら、オフラインデータを使って$\epsilon$-Optimalポリシーを学ぶのに必要なオンラインインタラクションの数を最小化できるだろうか? 本研究では、線形構造を持つmdpに対して、この設定を \textsf{finetunerl} 設定と呼ぶ。オフラインデータセットへのアクセスを前提に、この設定で必要なオンラインサンプルの数を特徴付け、最大$h$ファクターの最適なアルゴリズムである \textsc{ftpedel} を開発します。オフラインデータとオンラインインタラクションを組み合わせることで、純粋にオフラインまたは純粋にオンラインRLよりも証明可能な改善がもたらされる、という明確な例を示す。最後に、オンラインRLにおける典型的な設定である「emph{verible}学習」と、オフラインRLにおいてしばしば考慮される「emph{unverible}学習」の区別を示し、これらの制度間に正式な分離が存在することを示す。

Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the \textsf{FineTuneRL} setting, for MDPs with linear structure. We characterize the necessary number of online samples needed in this setting given access to some offline dataset, and develop an algorithm, \textsc{FTPedel}, which is provably optimal, up to $H$ factors. We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between \emph{verifiable} learning, the typical setting considered in online RL, and \emph{unverifiable} learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.

翻訳日:2023-07-21 18:47:27 公開日:2023-07-20

# オブザーバベース逆強化学習における等価解に対する不合理性と収束

Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning ( http://arxiv.org/abs/2210.16299v3 )

ライセンス: Link先を確認

Jared Town, Zachary Morrison, Rushikesh Kamalapurkar

(参考訳) オンラインおよびリアルタイムに決定論的逆強化学習(IRL)問題を解決する上で重要な課題は、複数の解が存在することである。非特異性は等価解の概念、すなわち異なるコスト関数的だが同じフィードバック行列をもたらす解、およびそのような解への収束の研究を必要とする。同等のソリューションに収束するオフラインアルゴリズムが文献で開発されているが、非合理性に対処するオンラインリアルタイム技術は利用できない。本稿では、IRL問題のほぼ等価解に収束する正規化履歴スタックオブザーバを開発する。本手法の有効性を実証するために,新しいデータリッチネス条件を開発し,シミュレーション結果を得た。

A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but same feedback matrix, and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis and simulation results are provided to demonstrate the effectiveness of the developed technique.

翻訳日:2023-07-21 18:47:00 公開日:2023-07-20

# 音声対音声比較のためのテキストレス指標

A Textless Metric for Speech-to-Speech Comparison ( http://arxiv.org/abs/2210.11835v2 )

ライセンス: Link先を確認

Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu

(参考訳) 本稿では,テキストの書き起こしに頼らずに音声の発話を比較する方法を提案する。我々は,HuBERTのような最先端の音声2ユニットエンコーダを用いて,発話を離散音響単位に変換する。次に,テキストベースと密接に対応した音声ベースのメトリクスを学習する,シンプルで容易に複製可能なニューラルアーキテクチャを提案する。このテキストレスメートル法には、音声から音声への翻訳の評価や、信頼できるASRシステムを持たない言語、あるいはASRの転写を完全に回避するなど、多くの潜在的な応用がある。また、音声から音声への翻訳評価において、ASR系が強い場合でも、音声仮説と参照と文レベルのBLEUを自動で書き起こしするASR-BLEUが、実際のテキストBLEUのプロキシとして不十分であることを示す。

In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages and for languages without dependable ASR systems, or avoiding the need for ASR transcription altogether. This paper also shows that for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the reference and computing sentence-level BLEU between the transcripts) is a poor proxy for real text-BLEU even when the ASR system is strong.

翻訳日:2023-07-21 18:46:22 公開日:2023-07-20

# エンタングルメントハミルトンのエッジとバルクに対する異なる温度依存性

Different temperature-dependence for the edge and bulk of entanglement Hamiltonian ( http://arxiv.org/abs/2210.10062v2 )

ライセンス: Link先を確認

Menghan Song, Jiarui Zhao, Zheng Yan, and Zi Yang Meng

(参考訳) 本稿では, 経路積分定式化のワームホール効果に基づく物理図面を提案し, エンタングルメントスペクトル(ES)のメカニズムを説明するとともに, エネルギースペクトルのバルクエッジ対応とES(LiとHaldane予想)のトポロジ的状態を説明するとともに, それらのトポロジ的性質とは無関係に他のシステムに適用可能であることを示す。最終的に、システムの低層ESの挙動を決定するエッジエネルギーギャップに対して、バルクエネルギーギャップ(逆温度$\beta=1/T$)の相対的な強度であることを示した。状況によっては、ESは仮想エッジのエネルギースペクトルに似ているが、仮想バルクのエネルギースペクトルを表すこともできる。我々は、LiとHaldaneが0温度で予想するエッジのようなケースに加えて、有限温度でバルク状の低層ESを実証するために、1Dと2Dの両方でモデルを設計する。本研究は,ESを経路積分におけるワームホール効果,およびESのエッジとバルクの温度依存性の一般性を支持するものである。

We propose a physical picture based on the wormhole effect of the path-integral formulation to explain the mechanism of entanglement spectrum (ES), such that, our picture not only explains the topological state with bulk-edge correspondence of the energy spectrum and ES (the Li and Haldane conjecture), but is generically applicable to other systems independent of their topological properties. We point out it is ultimately the relative strength of bulk energy gap (multiplied with inverse temperature $\beta=1/T$) with respect to the edge energy gap that determines the behavior of the low-lying ES of the system. Depending on the circumstances, the ES can resemble the energy spectrum of the virtual edge, but can also represent that of the virtual bulk. We design models both in 1D and 2D to successfully demonstrate the bulk-like low-lying ES at finite temperatures, in addition to the edge-like case conjectured by Li and Haldane at zero temperature. Our results support the generality of viewing the ES as the wormhole effect in the path integral and the different temperature-dependence for the edge and bulk of ES.
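For readers unfamiliar with the quantity, the entanglement spectrum consists of the levels $\xi_i = -\ln \lambda_i$, where $\lambda_i$ are the eigenvalues of the reduced density matrix $\rho_A = e^{-H_E}$. The sketch below computes it for a pure state of a small bipartite system via a singular value decomposition. It is only an illustration of the definition under that pure-state assumption: the function name is hypothetical, and it does not reproduce the path-integral or finite-temperature calculations in the paper.

```python
import numpy as np

def entanglement_spectrum(state, dim_a, dim_b, tol=1e-12):
    """Return sorted levels xi_i = -ln(lambda_i) of rho_A for a pure state on A (x) B."""
    psi = np.asarray(state, dtype=complex).reshape(dim_a, dim_b)
    psi = psi / np.linalg.norm(psi)              # normalize the state
    singular_values = np.linalg.svd(psi, compute_uv=False)
    weights = singular_values ** 2               # eigenvalues of the reduced density matrix
    weights = weights[weights > tol]             # drop numerically zero weights
    return np.sort(-np.log(weights))
```

A Bell state of two qubits has two equal weights of 1/2, so both levels equal $\ln 2$, while a product state has a single level at 0.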

翻訳日:2023-07-21 18:46:05 公開日:2023-07-20

# ニューラルネットワーク学習のためのデータ効率向上

Data-Efficient Augmentation for Training Neural Networks ( http://arxiv.org/abs/2210.08363v3 )

ライセンス: Link先を確認

Tian Yu Liu and Baharan Mirzasoleiman

(参考訳) データ拡張は、多くのディープラーニングアプリケーションで最先端のパフォーマンスを達成するために不可欠である。しかし、最も効果的な拡張技術は、中規模のデータセットでも計算的に禁止される。そこで本研究では,拡張されたデータポイントのサブセットを選択するための厳密な手法を提案する。まず,加法摂動としてモデル化されたデータ拡張は,ネットワークジャコビアンのより小さな特異値を相対的に拡大・摂動することで学習と一般化を改善し,その顕著な方向を維持していることを示す。これにより、過剰フィッティングが防止され、情報を学ぶのが難しくなる。そこで本研究では,学習データの小さな部分集合を反復的に抽出するフレームワークを提案する。本手法により得られた拡張部分集合に対する確率勾配勾配は、完全に拡張されたデータと同様のトレーニングダイナミクスを持つことを示す。実験により, CIFAR10では6.3倍, SVHNでは2.2倍の高速化を実現し, 各種サブセットサイズでベースラインを最大10%上回る性能を示した。同様に、TinyImageNetとImageNetでは、ベースラインを最大8%上回り、様々なサブセットサイズで最大3.3倍のスピードアップを実現しています。最後に、我々のCIFAR10のバージョンで、50%のサブセットのトレーニングと強化を行い、完全なデータセットを使用してラベルノイズがさらに優れていた。私たちのコードは、https://github.com/tianyu139/data-efficient-augmentationで利用可能です。

Data augmentation is essential to achieve state-of-the-art performance in many deep learning applications. However, the most effective augmentation techniques become computationally prohibitive for even medium-sized datasets. To address this, we propose a rigorous technique to select subsets of data points that when augmented, closely capture the training dynamics of full data augmentation. We first show that data augmentation, modeled as additive perturbations, improves learning and generalization by relatively enlarging and perturbing the smaller singular values of the network Jacobian, while preserving its prominent directions. This prevents overfitting and enhances learning the harder to learn information. Then, we propose a framework to iteratively extract small subsets of training data that when augmented, closely capture the alignment of the fully augmented Jacobian with labels/residuals. We prove that stochastic gradient descent applied to the augmented subsets found by our approach has similar training dynamics to that of fully augmented data. Our experiments demonstrate that our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes. Similarly, on TinyImageNet and ImageNet, our method beats the baselines by up to 8%, while achieving up to 3.3x speedup across various subset sizes. Finally, training on and augmenting 50% subsets using our method on a version of CIFAR10 corrupted with label noise even outperforms using the full dataset. Our code is available at: https://github.com/tianyu139/data-efficient-augmentation

翻訳日:2023-07-21 18:45:43 公開日:2023-07-20

# ThoughtSource: 大規模言語モデル推論のための中心的なハブ

ThoughtSource: A central hub for large language model reasoning data ( http://arxiv.org/abs/2301.11596v4 )

ライセンス: Link先を確認

Simon Ott, Konstantin Hebenstreit, Valentin Liévin, Christoffer Egeberg Hother, Milad Moradi, Maximilian Mayrhauser, Robert Praas, Ole Winther, Matthias Samwald

Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates six scientific/medical, three general-domain and five math word question answering datasets.

Translated: 2023-07-21 18:39:13 Published: 2023-07-20

# Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges ( http://arxiv.org/abs/2301.06262v2 )

License: see the linked page

Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, Yidong Li

Collaborative perception is essential to address occlusion and sensor failure issues in autonomous driving. In recent years, theoretical and experimental work on collaborative perception has increased tremendously. So far, however, few reviews have focused on systematic collaboration modules and large-scale collaborative perception datasets. This work reviews recent achievements in this field to bridge this gap and motivate future research. We start with a brief overview of collaboration schemes. After that, we systematically summarize the collaborative perception methods for ideal scenarios and real-world issues. The former focuses on collaboration modules and efficiency, and the latter is devoted to addressing the problems in actual application. Furthermore, we present large-scale public datasets and summarize quantitative results on these benchmarks. Finally, we highlight gaps and overlooked challenges between current academic research and real-world applications.

Translated: 2023-07-21 18:38:44 Published: 2023-07-20

# Entanglement blossom in a simplex matryoshka ( http://arxiv.org/abs/2301.04170v2 )

License: see the linked page

Zhao Zhang

Exotic entanglement entropy scaling properties usually come with interesting entanglement structures in real space and novel metrics of the spacetime lattice. One prominent example is the rainbow chain, where lattice sites symmetric about the center form entangled Bell pairs due to an effective long-range coupling arising from the strong inhomogeneity of the coupling strength. This manuscript generalizes the rainbow chain to higher-dimensional space on lattices with Hausdorff dimension one and an enlarged local Hilbert space, while keeping the Hamiltonian frustration-free. The effective Hamiltonian from the Schrieffer-Wolff transformation is given by a stacking of layers of $k$-simplices with $0$-dimensional (fully connected) antiferromagnetic Hamiltonians, which can be diagonalized analytically with Young operators. The original lattice can be obtained by proliferating disclination defects in a regular $k$-dimensional cubical lattice, which introduces curvature at the center of the lattice. The model interpolates between the SYK model and the free-fermionic XX spin chain, and hence might be useful in understanding black hole physics and holography.

Translated: 2023-07-21 18:38:34 Published: 2023-07-20

# Event Camera Data Pre-training ( http://arxiv.org/abs/2301.01928v3 )

License: see the linked page

Yan Yang and Liyuan Pan and Liu Liu

This paper proposes a pre-trained neural network for handling event camera data. Our model is a self-supervised learning framework, and uses paired event camera data and natural RGB images for training. Our method contains three modules connected in a sequence: i) a family of event data augmentations, generating meaningful event images for self-supervised training; ii) a conditional masking strategy to sample informative event patches from event images, encouraging our model to capture the spatial layout of a scene and accelerating training; iii) a contrastive learning approach, enforcing the similarity of embeddings between matching event images, and between paired event and RGB images. An embedding projection loss is proposed to avoid model collapse when enforcing the event image embedding similarities. A probability distribution alignment loss is proposed to encourage the event image to be consistent with its paired RGB image in the feature space. Transfer learning performance on downstream tasks shows the superiority of our method over state-of-the-art methods. For example, we achieve top-1 accuracy at 64.83% on the N-ImageNet dataset.
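The contrastive module in iii) can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the paper's loss: the abstract does not give the exact objective, so the InfoNCE form, the temperature value, and the toy 2-D embeddings below are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(event_embs, rgb_embs, temperature=0.1):
    """Mean InfoNCE loss; event_embs[i] is assumed to be paired with rgb_embs[i]."""
    losses = []
    for i, e in enumerate(event_embs):
        logits = [cosine(e, r) / temperature for r in rgb_embs]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical 2-D embeddings for three event/RGB pairs.
event = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
rgb_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
rgb_shuffled = [rgb_aligned[1], rgb_aligned[2], rgb_aligned[0]]

loss_aligned = info_nce(event, rgb_aligned)
loss_shuffled = info_nce(event, rgb_shuffled)
```

The loss is small when each event embedding is closest to its own RGB partner and large when the pairing is shuffled, which is the behaviour a similarity-enforcing objective of this kind is meant to have.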

Translated: 2023-07-21 18:38:12 Published: 2023-07-20

# Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients ( http://arxiv.org/abs/2212.14319v3 )

License: see the linked page

Marc Härkönen, Markus Lange-Hegermann, Bogdan Raiță

Partial differential equations (PDEs) are important tools to model physical systems, and including them in machine learning models is an important way of incorporating physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works as a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.
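To make "exact solutions" concrete for the heat equation, the sketch below numerically checks one exponential-type building block. It assumes the 1-D heat equation in the normalized form $u_t = u_{xx}$, for which $e^{ax + bt}$ is a solution exactly when $b = a^2$; taking $a = i\omega$ and the real part gives $e^{-\omega^2 t}\cos(\omega x)$. The function names and the finite-difference check are illustrative and are not part of EPGP itself.

```python
import math

def u(x, t, omega):
    """Exponential-type solution exp(-omega^2 t) * cos(omega x) of u_t = u_xx."""
    return math.exp(-omega ** 2 * t) * math.cos(omega * x)

def heat_residual(x, t, omega, h=1e-3):
    """Central finite-difference estimate of u_t - u_xx at (x, t)."""
    u_t = (u(x, t + h, omega) - u(x, t - h, omega)) / (2 * h)
    u_xx = (u(x + h, t, omega) - 2 * u(x, t, omega) + u(x - h, t, omega)) / h ** 2
    return u_t - u_xx

# Evaluate the residual at a few points and frequencies.
residuals = [abs(heat_residual(x, 0.3, omega))
             for x in (0.0, 0.5, 1.0) for omega in (1.0, 2.0)]
max_residual = max(residuals)
```

The residual vanishes up to discretization error for every frequency, so any superposition of such functions also solves the equation. That is the property a prior built from them would inherit.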

Translated: 2023-07-21 18:37:37 Published: 2023-07-20

# Periocular Biometrics: A Modality for Unconstrained Scenarios ( http://arxiv.org/abs/2212.13792v2 )

License: see the linked page

Fernando Alonso-Fernandez, Josef Bigun, Julian Fierrez, Naser Damer, Hugo Proença, Arun Ross

Periocular refers to the externally visible region of the face that surrounds the eye socket. This feature-rich area can provide accurate identification in unconstrained or uncooperative scenarios, where the iris or face modalities may not offer sufficient biometric cues due to factors such as partial occlusion or high subject-to-camera distance. The COVID-19 pandemic has further highlighted its importance, as the ocular region remained the only visible facial area even in controlled settings due to the widespread use of masks. This paper discusses the state of the art in periocular biometrics, presenting an overall framework encompassing its most significant research aspects, which include: (a) ocular definition, acquisition, and detection; (b) identity recognition, including combination with other modalities and use of various spectra; and (c) ocular soft-biometric analysis. Finally, we conclude by addressing current challenges and proposing future directions.

Translated: 2023-07-21 18:37:15 Published: 2023-07-20

# Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning ( http://arxiv.org/abs/2212.12658v2 )

License: see the linked page

Wenxuan Ma, Xing Yan, and Kun Zhang

To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built from the training data, and its leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is first trained on the full data, and a statistical test on the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can be easily constructed to estimate the total uncertainty, including the aleatoric and epistemic components. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.
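The idea of splitting on uncertainty heterogeneity can be sketched with a much simpler stand-in for the paper's statistical test. The example below scans thresholds on one feature and keeps the split whose two sides differ most in residual variance. The variance-ratio criterion, the function name `best_variance_split`, and the toy data are assumptions for illustration, not the USNRT criterion.

```python
def best_variance_split(xs, residuals, min_leaf=3):
    """Return (threshold, ratio) maximizing the variance ratio between the two sides.

    Residuals are assumed to be roughly zero-mean, so the mean squared
    residual serves as the variance estimate on each side.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs_sorted = [xs[i] for i in order]
    sq = [residuals[i] ** 2 for i in order]
    best = (None, 0.0)
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        var_left = sum(sq[:k]) / k
        var_right = sum(sq[k:]) / (len(sq) - k)
        ratio = max(var_left, var_right) / min(var_left, var_right)
        if ratio > best[1]:
            # place the threshold halfway between neighbouring feature values
            threshold = 0.5 * (xs_sorted[k - 1] + xs_sorted[k])
            best = (threshold, ratio)
    return best

# Toy data: residuals are small for x < 5 and large for x >= 5.
xs = list(range(10))
residuals = [0.1, -0.1, 0.1, -0.1, 0.1, 2.0, -2.0, 2.0, -2.0, 2.0]
threshold, ratio = best_variance_split(xs, residuals)
```

On this toy data the scan recovers the boundary between the low-noise and high-noise regions. A real implementation would replace the raw ratio with a proper significance test and guard against zero variance.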

Translated: 2023-07-21 18:37:00 Published: 2023-07-20

# Parrondo's game of quantum search based on quantum walk ( http://arxiv.org/abs/2303.06579v2 )

License: see the linked page

Taisuke Hosaka and Norio Konno

The Parrondo game, devised by Parrondo, is a game in which a winning strategy is constructed from a combination of losing strategies. This situation is called the Parrondo paradox. The Parrondo game based on quantum walk and the search algorithm via quantum walk have each been widely studied. This paper presents a new Parrondo game of quantum search based on quantum walk by combining both models. Moreover, we confirm by numerical simulations that Parrondo's paradox exists for our model on the one- and two-dimensional torus. Afterwards, we show that the range in which the paradox occurs is symmetric about the origin on the $d$-dimensional torus $(d \geq 1)$ with even vertices and one marked vertex.
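The paradox itself is easiest to see in the classical capital-dependent game, which the sketch below evaluates exactly. This is not the quantum-walk model of the paper: the win probabilities (1/2, 1/10, 3/4 with a small bias) are the commonly quoted classical parameters and are used here as assumptions.

```python
def expected_capital(win_prob, steps=200):
    """Exact expected capital after `steps` rounds, starting from capital 0.

    win_prob(capital) gives the probability of winning one unit;
    otherwise one unit is lost.
    """
    dist = {0: 1.0}  # probability distribution over capital
    for _ in range(steps):
        nxt = {}
        for capital, prob in dist.items():
            p = win_prob(capital)
            nxt[capital + 1] = nxt.get(capital + 1, 0.0) + prob * p
            nxt[capital - 1] = nxt.get(capital - 1, 0.0) + prob * (1.0 - p)
        dist = nxt
    return sum(capital * prob for capital, prob in dist.items())

EPS = 0.005  # small bias that makes games A and B losing on their own

def game_a(capital):
    return 0.5 - EPS

def game_b(capital):
    # Python's % is non-negative for a positive modulus, so debts are handled too.
    return (0.1 - EPS) if capital % 3 == 0 else (0.75 - EPS)

def random_mix(capital):
    # Each round, play A or B with probability 1/2.
    return 0.5 * game_a(capital) + 0.5 * game_b(capital)

gain_a = expected_capital(game_a)
gain_b = expected_capital(game_b)
gain_mix = expected_capital(random_mix)
```

Played alone, each game loses on average, while the random mixture of the two gains. The mixture changes how often the capital sits on a multiple of three, which is where game B's bad coin is used.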

Translated: 2023-07-21 18:28:53 Published: 2023-07-20

# Towards a Complete Analysis of Langevin Monte Carlo: Beyond Poincaré Inequality ( http://arxiv.org/abs/2303.03589v2 )

License: see the linked page

Alireza Mousavi-Hosseini and Tyler Farghly and Ye He and Krishnakumar Balasubramanian and Murat A. Erdogdu

Langevin diffusions are rapidly convergent under appropriate functional inequality assumptions. Hence, it is natural to expect that, with additional smoothness conditions to handle the discretization errors, their discretizations such as Langevin Monte Carlo (LMC) converge in a similar fashion. This research program was initiated by Vempala and Wibisono (2019), who established results under log-Sobolev inequalities. Chewi et al. (2022) extended the results to handle the case of Poincaré inequalities. In this paper, we go beyond Poincaré inequalities, and push this research program to its limit. We do so by establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities that are satisfied by a large class of densities including polynomially-decaying heavy-tailed densities (i.e., Cauchy-type). Our results explicitly quantify the effect of the initializer on the performance of the LMC algorithm. In particular, we show that as the tail goes from sub-Gaussian, to sub-exponential, and finally to Cauchy-like, the dependency on the initial error goes from being logarithmic, to polynomial, and then finally to being exponential. This three-step phase transition is in particular unavoidable as demonstrated by our lower bounds, clearly defining the boundaries of LMC.
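For readers unfamiliar with LMC, the update it refers to is $x_{k+1} = x_k - h \nabla V(x_k) + \sqrt{2h}\,\xi_k$ with standard Gaussian noise $\xi_k$. The sketch below runs it on a standard Gaussian target, which is a light-tailed case and not the heavy-tailed setting analyzed in the paper. The step size, iteration count, burn-in length, and seed are arbitrary choices for illustration.

```python
import math
import random

def langevin_monte_carlo(grad_potential, x0, step, n_steps, rng):
    """Unadjusted Langevin iterates x_{k+1} = x_k - h * grad V(x_k) + sqrt(2h) * xi_k."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Target: standard Gaussian, V(x) = x^2 / 2, so grad V(x) = x.
rng = random.Random(0)
samples = langevin_monte_carlo(lambda x: x, x0=5.0, step=0.1, n_steps=50_000, rng=rng)
kept = samples[1_000:]  # discard burn-in caused by the poor initializer x0 = 5
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

For this target the chain's stationary variance is $1/(1 - h/2)$ rather than 1, a small example of the discretization error mentioned in the abstract.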

Translated: 2023-07-21 18:28:42 Published: 2023-07-20

# MultiRobustBench: Benchmarking Robustness Against Multiple Attacks ( http://arxiv.org/abs/2302.10980v3 )

License: see the linked page

Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal

The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded Lp-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework is able to model different levels of learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem as all existing models perform worse than random guessing.
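The gap between average and worst-case robustness can be shown with a toy computation. The sketch below takes per-example outcomes under several attacks and compares the mean accuracy across attacks with the accuracy against their union, where an example counts only if it survives every attack. The attack names and outcomes are hypothetical, and the leaderboard's own metrics may be defined differently.

```python
def accuracy_per_attack(correct):
    """correct[attack][i] is True if example i is still classified correctly under that attack."""
    return {name: sum(flags) / len(flags) for name, flags in correct.items()}

def average_robust_accuracy(correct):
    """Mean of the per-attack accuracies."""
    per_attack = accuracy_per_attack(correct)
    return sum(per_attack.values()) / len(per_attack)

def union_robust_accuracy(correct):
    """Fraction of examples that survive every attack (worst case per example)."""
    rows = list(correct.values())
    n = len(rows[0])
    return sum(all(row[i] for row in rows) for i in range(n)) / n

# Hypothetical outcomes for one model on five test examples.
correct = {
    "linf":    [True, True, False, True, True],
    "spatial": [True, False, True, True, True],
    "recolor": [False, True, True, True, True],
}
avg_acc = average_robust_accuracy(correct)
union_acc = union_robust_accuracy(correct)
```

Here every single attack leaves 80% accuracy, yet only 40% of the examples withstand all three, which is why a good average can coexist with poor worst-case robustness.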

Translated: 2023-07-21 18:27:57 Published: 2023-07-20

# Momentum-selective pair creation of spin excitations in dipolar bilayers ( http://arxiv.org/abs/2302.09059v2 )

License: see the linked page

Thomas Bilitewski, G. A. Domínguez-Castro, David Wellnitz, Ana Maria Rey, Luis Santos

We study the temporal growth and spatial propagation of quantum correlations in a two-dimensional bilayer realising a spin-1/2 quantum XXZ model with couplings mediated by long-range and anisotropic dipolar interactions. Starting with an initial state consisting of spins with opposite magnetization in each of the layers, we predict the emergence of a momentum-dependent dynamic instability in the spin structure factor that results, at short times, in the creation of pairs of excitations at exponentially fast rates. The created pairs present a characteristic momentum distribution that can be tuned by controlling the dipolar orientation, the layer separation or the dipolar couplings. The predicted behavior remains observable at very low filling fractions, making it accessible in state-of-the-art experiments with Rydberg atoms, magnetic atoms, and polar molecule arrays.

Translated: 2023-07-21 18:27:33 Published: 2023-07-20

# Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles ( http://arxiv.org/abs/2302.08292v3 )

License: see the linked page

Alexandre Almin, Léo Lemarié, Anh Duong, B Ravi Kiran

Autonomous driving (AD) perception today relies heavily on deep learning based architectures requiring large-scale annotated datasets with their associated costs for curation and annotation. The 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale production-grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, and show that it achieves a +1.2% mIoU improvement over the original split proposed by the SemanticKITTI dataset. A complete benchmark for the semantic segmentation task was performed with state-of-the-art methods. Finally, we demonstrate an Active Learning (AL) based dataset distillation framework. We introduce a novel heuristic-free sampling method called ego-pose distance based sampling in the context of AL. A detailed presentation on the dataset is available at https://www.youtube.com/watch?v=5m6ALIs-s20.
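The mIoU figure quoted above is the mean over classes of intersection over union, $\mathrm{IoU}_c = TP_c / (TP_c + FP_c + FN_c)$. A minimal sketch of the computation on per-point labels follows. The labels are made up, and skipping classes absent from both labels and predictions is one common convention rather than necessarily the benchmark's.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes that occur in labels or predictions."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Hypothetical per-point labels for a tiny point cloud with three classes.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
miou = mean_iou(y_true, y_pred, num_classes=3)
```

The three classes score 1/2, 2/3 and 2/3, so the mean is 11/18, about 61.1%.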

Translated: 2023-07-21 18:27:18 Published: 2023-07-20

# Variational Mixture of HyperGenerators for Learning Distributions Over Functions ( http://arxiv.org/abs/2302.06223v3 )

License: see the linked page

Batuhan Koyuncu, Pablo Sanchez-Martin, Ignacio Peis, Pablo M. Olmos, Isabel Valera

Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally costly when dealing with inference tasks, such as missing data imputation, or cannot tackle them directly. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capabilities of modeling continuous functions using INRs and the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH a high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well or better than previous approaches, while being less computationally demanding.

Translated: 2023-07-21 18:26:56 Published: 2023-07-20

# Mathematical Capabilities of ChatGPT ( http://arxiv.org/abs/2301.13867v2 )

License: see the linked page

Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Julius Berner

We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!

Translated: 2023-07-21 18:26:12 Published: 2023-07-20

# Topological Point Cloud Clustering ( http://arxiv.org/abs/2303.16716v2 )

License: see the linked page

Vincent P. Grande and Michael T. Schaub

We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on considering the spectral properties of a simplicial complex associated to the considered point cloud. As it is based on considering sparse eigenvector computations, TPCC is similarly easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated to a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated to an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.
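The Hodge-Laplacians mentioned above generalize the graph Laplacian. For edges, the standard definition is $L_1 = B_1^\top B_1 + B_2 B_2^\top$, where $B_1$ and $B_2$ are the vertex-edge and edge-triangle boundary matrices, and vectors in the kernel of $L_1$ correspond to one-dimensional holes. The sketch below checks this on a single triangle, hollow and filled. It illustrates only the operator, not the TPCC clustering pipeline, and the helper names are ad hoc.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Triangle on vertices 0, 1, 2 with oriented edges e0=(0,1), e1=(1,2), e2=(0,2).
# B1 maps edges to vertices: the column for edge (i, j) has -1 at i and +1 at j.
B1 = [[-1, 0, -1],
      [1, -1, 0],
      [0, 1, 1]]
# B2 maps the filled triangle (0,1,2) to its boundary e0 + e1 - e2.
B2 = [[1], [1], [-1]]

L1_hollow = matmul(transpose(B1), B1)                  # no 2-simplex present
L1_filled = add(L1_hollow, matmul(B2, transpose(B2)))  # triangle filled in

cycle = [[1], [1], [-1]]  # edge flow circulating around the loop 0 -> 1 -> 2 -> 0
hollow_image = matmul(L1_hollow, cycle)
filled_image = matmul(L1_filled, cycle)
```

The circulating flow is harmonic (mapped to zero) while the triangle is hollow and stops being harmonic once the triangle is filled, so the kernel of $L_1$ detects the loop.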

Translated: 2023-07-21 18:20:09 Published: 2023-07-20

# A biological sequence comparison algorithm using quantum computers ( http://arxiv.org/abs/2303.13608v5 )

License: see the linked page

Büsra Kösoglu-Kind, Robert Loredo, Michele Grossi, Christian Bernecker, Jody M Burks, Rudiger Buchkremer

Genetic information is encoded in a linear sequence of nucleotides, represented by letters ranging from thousands to billions. Mutations refer to changes in the DNA or RNA nucleotide sequence. Thus, mutation detection is vital in all areas of biology and medicine. Careful monitoring of virulence-enhancing mutations is essential. However, an enormous amount of classical computing power is required to analyze genetic sequences of this size. Inspired by human perception of vision and pixel representation of images on quantum computers, we leverage these techniques to implement a pairwise sequence analysis. The methodology has a potential advantage over classical approaches and can be further applied to identify mutations and other modifications in genetic sequences. We present a method to display and analyze the similarity between two genome sequences on a quantum computer where a similarity score is calculated to determine the similarity between nucleotides.

Translated: 2023-07-21 18:19:21 Published: 2023-07-20

# RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration ( http://arxiv.org/abs/2303.12384v2 )

License: see the linked page

Jiuming Liu, Guangming Wang, Zhe Liu, Chaokang Jiang, Marc Pollefeys, Hesheng Wang

Although point cloud registration has achieved remarkable advances in object-level and indoor scenes, large-scale registration methods are rarely explored. Challenges mainly arise from the huge point number, complex distribution, and outliers of outdoor LiDAR scans. In addition, most existing registration works generally adopt a two-stage paradigm: they first find correspondences by extracting discriminative local features, and then leverage estimators (e.g., RANSAC) to filter outliers, which are highly dependent on well-designed descriptors and post-processing choices. To address these problems, we propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment without any further post-processing. Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers by extracting point features globally. Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes. Furthermore, to effectively reduce mismatches, a bijective association transformer is designed for regressing the initial transformation. Extensive experiments on KITTI and NuScenes datasets demonstrate that our RegFormer achieves competitive performance in terms of both accuracy and efficiency.

Translated: 2023-07-21 18:18:51 Published: 2023-07-20

# Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ( http://arxiv.org/abs/2303.12112v3 )

License: see the linked page

Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures. In this paper, we propose a new recipe for a contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S), that in a novel way unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos, outperforming existing reference-based metrics like CIDEr and SPICE and reference-free metrics like CLIP-Score. Finally, we test the system-level correlation of the proposed metric when considering popular image captioning approaches, and assess the impact of employing different cross-modal features. Our source code and trained models are publicly available at: https://github.com/aimagelab/pacscore.

翻訳日:2023-07-21 18:18:28 公開日:2023-07-20

# すべてはデータに関するものだ – 敵対的ロバスト性に対するデータの影響に関する調査

It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness ( http://arxiv.org/abs/2303.09767v2 )

ライセンス: Link先を確認

Peiyu Xiong, Michael Tegegn, Jaskeerat Singh Sarin, Shubhraneel Pal, Julia Rubin

(参考訳) 敵対的サンプルは、攻撃者がモデルを混乱させて誤りを起こさせるように意図的に設計した、機械学習モデルへの入力である。このような例は、特に生命および安全クリティカルな領域において、機械学習ベースのシステムの適用性に深刻な脅威をもたらす。この問題に対処するため、敵対的ロバスト性の分野は、敵対的攻撃とそれに対する防御の背後にあるメカニズムを調査している。本調査は, 回避攻撃下でのモデルロバスト性の観点から, トレーニングデータの特性を調査することに焦点を当てた, この文献の特定のサブセットをレビューする。まず、敵対的脆弱性につながるデータの主な特性を要約する。次に,データ表現と学習手順の強化による敵対的ロバスト性向上のためのガイドラインと手法と,与えられた特定のデータに対するロバスト性保証を推定する手法について論じる。最後に、この領域における知識のギャップと将来有望な研究の方向性について論じる。

Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks and defenses against these attacks. This survey reviews a particular subset of this literature that focuses on investigating properties of training data in the context of model robustness under evasion attacks. It first summarizes the main properties of data leading to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees given particular data. Finally, it discusses gaps of knowledge and promising future research directions in this area.

翻訳日:2023-07-21 18:17:23 公開日:2023-07-20

# 二部量子系の絡み合いダイナミクスに関する基礎的速度制限

Fundamental speed limits on entanglement dynamics of bipartite quantum systems ( http://arxiv.org/abs/2303.07415v2 )

ライセンス: Link先を確認

Vivek Pandey, Swapnil Bhowmick, Brij Mohan, Sohail, and Ujjwal Sen

(参考訳) エンタングルメントの速度限界は、物理的過程においてエンタングルメントが生成または劣化しうる最大速度として定義される。エンタングルメントの相対エントロピーとトレース距離エンタングルメントを用いて、ユニタリおよび任意の量子ダイナミクスに対するエンタングルメントの速度限界を導出する。その際、最も近い分離可能状態のダイナミクスは、システムの実際のダイナミクスに最も近い分離可能なダイナミクスによって近似的に記述できると仮定する。純粋状態によって記述される孤立二部系のユニタリダイナミクスに対して、エンタングルメント生成の速度は、システムの駆動ハミルトニアンとサプライザル演算子の揺らぎの積によって制限され、最も近い分離可能状態の時間依存性を反映した追加の項が加わる。入力の純度と発展のユニタリ性に関する制限を取り除いた場合、境界内の2つの項は適切に変更される。さらに、任意の量子ダイナミクスにより一定量のエンタングルメントを生成または劣化させるのに要する時間の下界を求める。実用上興味のある量子過程を考慮し, エンタングルメントに対する速度限界のタイトさを示す。

The speed limits on entanglement are defined as the maximal rate at which entanglement can be generated or degraded in a physical process. We derive the speed limits on entanglement, using the relative entropy of entanglement and trace-distance entanglement, for unitary as well as for arbitrary quantum dynamics, where we assume that the dynamics of the closest separable state can be approximately described by the closest separable dynamics of the actual dynamics of the system. For unitary dynamics of isolated bipartite systems which are described by pure states, the rate of entanglement production is bounded by the product of fluctuations of the system's driving Hamiltonian and the surprisal operator, with an additional term reflecting the time-dependent nature of the closest separable state. Removing restrictions on the purity of the input and on the unitarity of the evolution, the two terms in the bound get suitably altered. Furthermore, we find a lower bound on the time required to generate or degrade a certain amount of entanglement by arbitrary quantum dynamics. We demonstrate the tightness of our speed limits on entanglement by considering quantum processes of practical interest.

翻訳日:2023-07-21 18:16:46 公開日:2023-07-20

# トップおよびバックビュードローン映像からのポーズ情報を用いたバドミントンダブルスの制御領域の推定

Estimation of control area in badminton doubles with pose information from top and back view drone videos ( http://arxiv.org/abs/2305.04247v2 )

ライセンス: Link先を確認

Ning Ding, Kazuya Takeda, Wenhui Jin, Yingjiu Bei, Keisuke Fujii

(参考訳) 動的競技におけるスポーツ選手のパフォーマンス分析へのビジュアルトラッキングの適用は,効果的なコーチングに不可欠である。ダブルスの試合では、連携したポジショニングがコートのコントロールを維持し、対戦相手の得点機会を最小化するために重要である。このようなチームワークの分析はゲームのダイナミクスを理解する上で重要な役割を果たす。しかし,従来の研究は,放送ビデオにおけるオクルージョン(遮蔽)を考慮せずに,主にシングルスの選手の分析と評価に重点を置いてきた。これらの研究は、特定のアクション(例えば、ストローク)やゲーム中に起こる出来事の分析と表現を含む離散的な表現に依存しており、意味のある空間分布を見落としていた。本研究では,バドミントンダブルスにおけるトップビューおよびバックビューからの最初の注釈付きドローンデータセットを提示し,チームワークのパフォーマンス評価に利用できる制御領域確率マップを推定するためのフレームワークを提案する。完全な確率曲面の計算を可能にするディープニューラルネットワークの効率的なフレームワークを提案する。このフレームワークはプレイヤーの位置のガウス混合マップの埋め込みを利用し、ポーズにグラフ畳み込みを用いる。実験では,様々なベースラインを比較し,スコアと制御領域の相関関係を見出すことにより,我々のアプローチを検証する。また,ゲーム中に指示を与えるための最適ポジショニング評価という実用的応用を提案する。このアプローチは,選手の動きを視覚的かつ定量的に評価し,ダブルスのチームワークに対する貴重な洞察を提供する。データセットと関連するプロジェクトコードは https://github.com/Ning-D/Drone_BD_ControlArea で入手できる。

The application of visual tracking to the performance analysis of sports players in dynamic competitions is vital for effective coaching. In doubles matches, coordinated positioning is crucial for maintaining control of the court and minimizing opponents' scoring opportunities. The analysis of such teamwork plays a vital role in understanding the dynamics of the game. However, previous studies have primarily focused on analyzing and assessing singles players without considering occlusion in broadcast videos. These studies have relied on discrete representations, which involve the analysis and representation of specific actions (e.g., strokes) or events that occur during the game while overlooking the meaningful spatial distribution. In this work, we present the first annotated drone dataset from top and back views in badminton doubles and propose a framework to estimate the control area probability map, which can be used to evaluate teamwork performance. We present an efficient framework of deep neural networks that enables the calculation of full probability surfaces. This framework utilizes the embedding of a Gaussian mixture map of players' positions and employs graph convolution on their poses. In the experiment, we verify our approach by comparing various baselines and discovering the correlations between the score and control area. Additionally, we propose a practical application for assessing optimal positioning to provide instructions during a game. Our approach offers both visual and quantitative evaluations of players' movements, thereby providing valuable insights into doubles teamwork. The dataset and related project code are available at https://github.com/Ning-D/Drone_BD_ControlArea

翻訳日:2023-07-21 18:09:00 公開日:2023-07-20

# BERT と Query-Aware LSH を用いた非公式ドキュメントにおけるコード例推薦の改善 : 比較検討

Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study ( http://arxiv.org/abs/2305.03017v3 )

ライセンス: Link先を確認

Sajjad Rahmani, AmirHossein Naghshzan, Latifa Guerrouj

(参考訳) 本研究は,すぐに使えるコードスニペットを提供することで開発者の時間を大幅に節約する,ソフトウェア開発者支援のためのコード例の推薦について検討する。私たちの研究の焦点はStack Overflowで、特にJavaプログラミング言語のコンテキストにおいて、コーディングに関する議論や解決策のためによく使われるリソースです。我々は,LLM(Large Language Model)であるBERTを適用し,コード例から意味情報を抽出して数値ベクトルに変換する。これらの数値表現が準備されたら、Locality-Sensitive Hashing (LSH) を用いて近似最近傍(ANN)を同定する。本研究では,Random Hyperplane-based LSHとQuery-Aware LSHの2種類のLSHを用いた。これらの2つのアプローチを,HitRate, Mean Reciprocal Rank (MRR), 平均実行時間, 関連性の4つのパラメータで厳密に比較した。本研究では,Random Hyperplane-based (RH) 法よりもQuery-Aware (QA) 法の方が優れた性能を示した。具体的には、RHアプローチと比較して、クエリペアに対してHitRateが20%から35%向上した。さらに、ハッシュテーブルの作成とデータサンプルのバケットへの割り当てが少なくとも4倍高速であり、QAアプローチは大幅に時間効率が高かった。QAアプローチはコード例をミリ秒以内に返すことができるが、RHアプローチは通常、コード例を推奨するのに数秒を要する。QAアプローチの優れたパフォーマンスのため、最先端のベースラインであるPostFinderとFaCoYに対してテストした。QA法は同等の効率を示し,有効なコード推薦の可能性を示した。

Our research investigates the recommendation of code examples to aid software developers, a practice that saves developers significant time by providing ready-to-use code snippets. The focus of our study is Stack Overflow, a commonly used resource for coding discussions and solutions, particularly in the context of the Java programming language. We applied BERT, a powerful Large Language Model (LLM) that enables us to transform code examples into numerical vectors by extracting their semantic information. Once these numerical representations are prepared, we identify Approximate Nearest Neighbors (ANN) using Locality-Sensitive Hashing (LSH). Our research employed two variants of LSH: Random Hyperplane-based LSH and Query-Aware LSH. We rigorously compared these two approaches across four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. Our study revealed that the Query-Aware (QA) approach showed superior performance over the Random Hyperplane-based (RH) method. Specifically, it exhibited a notable improvement of 20% to 35% in HitRate for query pairs compared to the RH approach. Furthermore, the QA approach proved significantly more time-efficient, with its speed in creating hashing tables and assigning data samples to buckets being at least four times faster. It can return code examples within milliseconds, whereas the RH approach typically requires several seconds to recommend code examples. Due to the superior performance of the QA approach, we tested it against PostFinder and FaCoY, the state-of-the-art baselines. Our QA method showed comparable efficiency proving its potential for effective code recommendation.
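
As a rough illustration of the Random Hyperplane-based LSH step described above, the following is a minimal sketch and not the authors' implementation. The function names, the number of hash bits, and the tiny 3-dimensional vectors are all assumptions made for this example; in the study the vectors would be BERT embeddings of code examples, and the Query-Aware variant is not sketched here.

```python
import math
import random


def random_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(idx)
    return buckets


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query(vec, vectors, buckets, hyperplanes):
    """Return the indices in the query's bucket, most similar first."""
    candidates = buckets.get(signature(vec, hyperplanes), [])
    return sorted(candidates, key=lambda i: -cosine(vec, vectors[i]))


if __name__ == "__main__":
    planes = random_hyperplanes(num_bits=8, dim=3)
    data = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    index = build_index(data, planes)
    print(query([1.0, 0.0, 0.0], data, index, planes))
```

Only vectors that share the query's bucket are compared exactly, which is where the approximate search saves time; more hash bits give smaller buckets at the cost of more missed neighbours.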

翻訳日:2023-07-21 18:08:35 公開日:2023-07-20

# ブールネットワークの最小トラップ空間の普遍的性質に取り組む

Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks ( http://arxiv.org/abs/2305.02442v2 )

ライセンス: Link先を確認

Sara Riva, Jean-Marie Lagniez, Gustavo Magaña López, Loïc Paulevé

(参考訳) 最小トラップ空間(MTS)は、更新モードによらず、ブールダイナミクスが閉じ込められる部分空間を捉える。それらは最も寛容な(most permissive)モードのアトラクタに対応する。その汎用性から、MTSの計算は、本質的にはその列挙に焦点をあてる形で、近年注目を集めている。本稿では, MTS の普遍的性質に関する論理的推論を,次の2つの問題の範囲で扱う。すなわち,すべての MTS 上で所与の性質を強制する Boolean 変数の永久凍結を同定するための Boolean ネットワークの再プログラミングと, MTS 上の普遍的性質からの Boolean ネットワークの合成である。どちらの問題も、3段階の量化子($\exists\forall\exists$)を持つ量化命題論理式の充足可能性の判定に帰着する。本稿では,2つのより単純な論理式の求解を結合することにより,これらの問題を効率的に解くための反例誘導抽象化精緻化(CEGAR)を導入する。各論理式に解集合プログラミング(Answer-Set Programming)を用いるプロトタイプを提供し,生物学的ネットワークの幅広いブールモデルにおいてその扱いやすさを示す。

Minimal trap spaces (MTSs) capture subspaces in which the Boolean dynamics is trapped, whatever the update mode. They correspond to the attractors of the most permissive mode. Due to their versatility, the computation of MTSs has recently gained traction, essentially by focusing on their enumeration. In this paper, we address the logical reasoning on universal properties of MTSs in the scope of two problems: the reprogramming of Boolean networks for identifying the permanent freeze of Boolean variables that enforce a given property on all the MTSs, and the synthesis of Boolean networks from universal properties on their MTSs. Both problems reduce to solving the satisfiability of a quantified propositional logic formula with 3 levels of quantifiers ($\exists\forall\exists$). In this paper, we introduce a Counter-Example Guided Abstraction Refinement (CEGAR) approach to efficiently solve these problems by coupling the resolution of two simpler formulas. We provide a prototype relying on Answer-Set Programming for each formula and show its tractability on a wide range of Boolean models of biological networks.
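
To make the notion of a trap space concrete, here is a small brute-force sketch. It is only an illustration under stated assumptions: the three-variable network and all function names are invented for this example, and the exhaustive enumeration is exponential in the number of variables, unlike the Answer-Set Programming prototype the abstract describes. A subspace fixes some variables and leaves the others free; it is a trap space when every state inside it is mapped back into it.

```python
from itertools import product


def states_of(subspace, variables):
    """Yield every full state compatible with the fixed values of a subspace."""
    free = [v for v in variables if v not in subspace]
    for values in product((0, 1), repeat=len(free)):
        state = dict(subspace)
        state.update(zip(free, values))
        yield state


def is_trap_space(network, subspace):
    """A subspace is a trap space if each fixed variable keeps its value under its update function."""
    variables = list(network)
    return all(
        network[v](state) == value
        for state in states_of(subspace, variables)
        for v, value in subspace.items()
    )


def minimal_trap_spaces(network):
    """Enumerate all subspaces (3^n of them) and keep the inclusion-minimal trap spaces."""
    variables = list(network)
    traps = []
    for pattern in product((0, 1, None), repeat=len(variables)):
        subspace = {v: p for v, p in zip(variables, pattern) if p is not None}
        if is_trap_space(network, subspace):
            traps.append(subspace)

    def strictly_contains(big, small):
        # small is inside big when it agrees with every value that big fixes.
        return big != small and all(small.get(v) == val for v, val in big.items())

    return [t for t in traps if not any(strictly_contains(t, o) for o in traps)]


# Hypothetical toy network: a and b copy each other, c oscillates.
toy_network = {
    "a": lambda s: s["b"],
    "b": lambda s: s["a"],
    "c": lambda s: 1 - s["c"],
}

if __name__ == "__main__":
    print(minimal_trap_spaces(toy_network))
```

In this toy network `c` can never be fixed, so the two minimal trap spaces fix only `a` and `b` (both 0 or both 1) and leave `c` free.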

翻訳日:2023-07-21 18:08:07 公開日:2023-07-20

# RadAdapt: 大規模言語モデルの軽量ドメイン適応による放射線レポート要約

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models ( http://arxiv.org/abs/2305.01146v3 )

ライセンス: Link先を確認

Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly

(参考訳) 本研究は,放射線レポート要約(Radiology Report Summarization, RRS)の課題に対して,大規模言語モデル(LLM)を適応するための軽量戦略を体系的に検討する。具体的には、事前学習(自然言語、バイオメディカルテキスト、臨床テキスト)によるドメイン適応と、離散的なプロンプトまたはパラメータ効率の良い微調整によるドメイン適応に焦点を当てる。臨床テキストでの事前学習とRRSサンプルでの微調整によってタスクに最大限に適応することで,一貫して最高のパフォーマンスが得られた。重要なことに、この方法は、エンドツーエンドの微調整(パラメータの100%)とは対照的に、モデル全体のパラメータの0.32%しか微調整しない。さらに, 文脈内例とアウト・オブ・ディストリビューション(OOD)訓練の効果を調べ,最後に放射線科医による読影者研究と定性分析を行う。本研究は、RRSにおけるドメイン適応の重要性を強調し、臨床タスクに有効な自然言語処理ソリューションを開発するための貴重な洞察を提供する。

We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. We consistently achieve the best performance by maximally adapting to the task via pretraining on clinical text and fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.

翻訳日:2023-07-21 18:07:44 公開日:2023-07-20

# ハイブリッド量子ニューラルネットワークを用いた迷路問題の解法に関するDeep-Q学習

Deep-Q Learning with Hybrid Quantum Neural Network on Solving Maze Problems ( http://arxiv.org/abs/2304.10159v2 )

ライセンス: Link先を確認

Hao-Yuan Chen, Yen-Jui Chang, Ching-Ray Chang

(参考訳) 量子コンピューティングは、より高次元のデータを扱い、ディープニューラルネットワーク(DNN)モデルの全体的な訓練パラメータ数を減らすことで、機械学習アルゴリズムの限界を押し広げる大きな可能性を秘めている。本研究では,ゲート型量子コンピュータ上のパラメータ化量子回路(PQC)を用いて,モデルフリー強化学習問題における量子優位の可能性について検討する。量子コンピュータの現在のモデルと能力の包括的調査と評価を通じて、我々は最新のQiskitとPyTorchフレームワークに基づく新しいハイブリッド量子ニューラルネットワークを設計し、訓練した。その性能を、PQCを統合した場合と統合しない場合の完全に古典的なDNNと比較した。私たちの研究は、迷路問題、ひいては他の強化学習問題を解くための深層量子学習の可能性に関する洞察を提供します。様々な強化学習問題が妥当な訓練エポック数で効果的に解けると結論づける。さらに,迷路問題に対する様々な量子強化学習モデルの比較検討を行い,本研究の全体的な可能性と利点を評価する。

Quantum computing holds great potential for pushing the limits of machine learning algorithms, allowing them to handle higher data dimensions and reducing the overall number of training parameters in deep neural network (DNN) models. This study uses a parameterized quantum circuit (PQC) on a gate-based quantum computer to investigate the potential for quantum advantage in a model-free reinforcement learning problem. Through a comprehensive investigation and evaluation of the current model and capabilities of quantum computers, we designed and trained a novel hybrid quantum neural network based on the latest Qiskit and PyTorch frameworks. We compared its performance with a fully classical DNN with and without an integrated PQC. Our research provides insights into the potential of deep quantum learning to solve a maze problem and, potentially, other reinforcement learning problems. We conclude that various reinforcement learning problems can be solved effectively within a reasonable number of training epochs. Moreover, we present a comparative discussion of various quantum reinforcement learning models on maze problems to evaluate the overall potential and advantages of our research.

翻訳日:2023-07-21 18:06:48 公開日:2023-07-20

# Sabiá: ポルトガル語の大規模言語モデル

Sabiá: Portuguese Large Language Models ( http://arxiv.org/abs/2304.07880v3 )

ライセンス: Link先を確認

Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira

(参考訳) 言語モデルの能力が向上し続けるにつれ、"ワンサイズ・フィット・オール"のモデルが主要なパラダイムであり続けることは考えられる。例えば、世界中の膨大な数の言語の多くが低リソースであることを考えると、一般的なプラクティスは、複数の言語で単一のモデルを事前学習することだ。本稿では,この実践に異議を唱えるエビデンスを新たに加え,対象言語での単言語事前学習が,すでに多様なコーパスで広く訓練されているモデルを大幅に改善することを示す。より具体的には、GPT-JおよびLLaMAモデルを、当初の事前学習予算の3%以下でポルトガル語テキストによりさらに事前学習する。ポルトガル語の14のデータセットからなるスイートであるPoetaでのFew-shot評価によると、我々のモデルは、英語中心および多言語のモデルを大幅に上回る。私たちのベストモデルであるSabiá-65Bは、GPT-3.5-turboと同等の性能を示す。対象言語で当初から作成されたデータセットと翻訳されたデータセットの両方で評価することにより,言語固有の事前学習の貢献を, 1) 対象言語固有の言語的ニュアンスおよび構造を捉えること, 2) ドメインや文化に関するモデルの知識を豊かにすること,の観点から検討する。以上の結果から,効果の大部分は単言語事前学習によって獲得したドメイン固有知識によるものであることが示唆された。

As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain as the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models already extensively trained on diverse corpora. More specifically, we further pretrain GPT-J and LLaMA models on Portuguese texts using 3% or less of their original pretraining budget. Few-shot evaluations on Poeta, a suite of 14 Portuguese datasets, reveal that our models outperform English-centric and multilingual counterparts by a significant margin. Our best model, Sabiá-65B, performs on par with GPT-3.5-turbo. By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture. Our results indicate that the majority of the benefits stem from the domain-specific knowledge acquired through monolingual pretraining.

翻訳日:2023-07-21 18:06:13 公開日:2023-07-20

# 非等長写像とド・ジッターテンソルネットワークからの重なり合う量子ビット

Overlapping qubits from non-isometric maps and de Sitter tensor networks ( http://arxiv.org/abs/2304.02673v2 )

ライセンス: Link先を確認

ChunJun Cao, Wissam Chemissany, Alexander Jahn, and Zoltán Zimborás

(参考訳) 非等長写像を用いて、近似的に局所的な可観測量、すなわち「重なり合う量子ビット」を構成し、局所有効理論における過程が、ホログラフィーにおける我々の期待と同様に、より少ない自由度の量子系で模倣(スプーフ)できることを示す。さらに、模倣された系は、量子重力の特徴と同一視できる形で、実際の局所理論から自然に逸脱する。具体的な例として、ド・ジッター時空の2つのMERAトイモデルを構成し、大域的ド・ジッターにおける指数的膨張がはるかに少ない量子自由度で模倣できること、および局所物理学が破綻するまで極めて長い時間にわたり近似的に保たれうることを説明する。近似的な重なり合う量子ビットが、ヒルベルト空間次元の検証、ブラックホールやホログラフィーにおける自由度の計数、量子重力における近似的局所性と概念的にどのように結びついているかを強調する。

We construct approximately local observables, or "overlapping qubits", using non-isometric maps and show that processes in local effective theories can be spoofed with a quantum system with fewer degrees of freedom, similar to our expectation in holography. Furthermore, the spoofed system naturally deviates from an actual local theory in ways that can be identified with features in quantum gravity. For a concrete example, we construct two MERA toy models of de Sitter space-time and explain how the exponential expansion in global de Sitter can be spoofed with many fewer quantum degrees of freedom and that local physics may be approximately preserved for an exceedingly long time before breaking down. We highlight how approximate overlapping qubits are conceptually connected to Hilbert space dimension verification, degree-of-freedom counting in black holes and holography, and approximate locality in quantum gravity.

翻訳日:2023-07-21 18:05:52 公開日:2023-07-20

# 固体量子系における欠陥の包括的同定法

Comprehensive scheme for identifying defects in solid-state quantum systems ( http://arxiv.org/abs/2305.17889v2 )

ライセンス: Link先を確認

Chanaprom Cholsuk, Sujin Suwanna, Tobias Vogl

(参考訳) 固体量子エミッタは、光量子技術に不可欠なコンポーネントの1つである。理想的には、エミッタは量子ネットワーク内の他のコンポーネントと効率的に結合するために適合する波長を持つべきである。したがって、特定のエミッタをもたらす蛍光欠陥を理解することが不可欠である。本研究では,密度汎関数理論(DFT)を用いて,2次元材料である六方晶窒化ホウ素中の量子エミッタの完全な光学的指紋の計算を実証する。これらのエミッタは非常に興味深いが、その多くはまだ同定されていない。その結果、理論シミュレーションと実験を比較する際には、一般的に用いられるゼロフォノン線エネルギーなどの単一の光学特性ではなく、複数の特性を用いるべきであることが示唆された。これにより、電子構造全体を予測し、量子エミッタを設計・調整することができる。さらに、本手法を適用し、Al$_{\text{N}}$とP$_{\text{N}}$V$_{\text{B}}$の欠陥を例に、特定の量子アプリケーションでのエミッタ利用の適合性を予測する。このように,DFT計算を組み合わせて適用することで,誤同定のリスクを低減しつつ固体結晶中の量子エミッタを同定し,光量子システムを設計・調整する方法を示す。これは、将来のハイブリッド量子ネットワークにおける普遍的な固体量子エミッタシステムの分類と生成のレシピとなる。

A solid-state quantum emitter is one of the indispensable components for optical quantum technologies. Ideally, an emitter should have a compatible wavelength for efficient coupling to other components in a quantum network. It is therefore essential to understand fluorescent defects that lead to specific emitters. In this work, we employ density functional theory (DFT) to demonstrate the calculation of the complete optical fingerprints of quantum emitters in the two-dimensional material hexagonal boron nitride. These emitters are of great interest, yet many of them are still to be identified. Our results suggest that instead of comparing a single optical property, such as the commonly used zero-phonon line energy, multiple properties should be used when comparing theoretical simulations to the experiment. This way, the entire electronic structure can be predicted and quantum emitters can be designed and tailored. Moreover, we apply this approach to predict the suitability for using the emitters in specific quantum applications, demonstrating through the examples of the Al$_{\text{N}}$ and P$_{\text{N}}$V$_{\text{B}}$ defects. We therefore combine and apply DFT calculations to identify quantum emitters in solid-state crystals with a lower risk of misassignments as well as a way to design and tailor optical quantum systems. This consequently serves as a recipe for classification and the generation of universal solid-state quantum emitter systems in future hybrid quantum networks.

翻訳日:2023-07-21 18:00:09 公開日:2023-07-20

# 複数のラベルなしデータセットからのAUC最適化

AUC Optimization from Multiple Unlabeled Datasets ( http://arxiv.org/abs/2305.15776v2 )

ライセンス: Link先を確認

Yu Liu, Zheng Xie, Ming Li

(参考訳) 弱教師付き学習は、完全な教師情報が利用できない場合に機械学習を強化することを目的としており、研究者から大きな注目を集めている。様々な弱い教師情報のうち、最も難しい事例の1つは、クラス事前確率に関する知識がわずかしかない複数のラベルなし(U)データセットから学習すること、略してU$^m$学習である。本稿では,複数のラベルなしデータセットから,分類器のペアワイズランキング能力を最大化するAUC(area under ROC curve)最適化モデルを構築する問題について検討する。U$^m$データを多ラベルAUC最適化問題に変換し,効率的に訓練できるAUC最適化手法 U$^m$-AUC を提案する。提案したU$^m$-AUCは理論的および実験的に有効であることを示す。

Weakly supervised learning aims to empower machine learning when the perfect supervision is unavailable, which has drawn great attention from researchers. Among various types of weak supervision, one of the most challenging cases is to learn from multiple unlabeled (U) datasets with only a little knowledge of the class priors, or U$^m$ learning for short. In this paper, we study the problem of building an AUC (area under ROC curve) optimization model from multiple unlabeled datasets, which maximizes the pairwise ranking ability of the classifier. We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem, and can be trained efficiently. We show that the proposed U$^m$-AUC is effective theoretically and empirically.
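
The "pairwise ranking ability" mentioned above can be read off the standard definition of AUC as the fraction of positive-negative pairs that the classifier ranks correctly. The sketch below computes that quantity from labeled scores; it is only meant to illustrate the objective and is not the U$^m$-AUC method, which works without individual labels. The function name and the example scores are assumptions for illustration.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Three of the four positive-negative pairs are ordered correctly.
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Because the score depends only on the ordering of pairs, it is insensitive to the classification threshold, which is one reason AUC is a natural target when class priors are poorly known.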

翻訳日:2023-07-21 17:59:49 公開日:2023-07-20

# ChatGPT, 大規模言語モデル, 生成AI時代の科学 : 研究倫理への課題と対応方法

Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond ( http://arxiv.org/abs/2305.15299v2 )

ライセンス: Link先を確認

Evangelos Pournaras

(参考訳) ChatGPTのような人工知能(AI)の大規模言語モデルは、科学と研究において顕著だが議論の余地のある適用性を持つ。本稿では,生成AIの出現に伴う科学の実践における認識論的課題,倫理的リスクおよび研究公正上のリスクについてレビューする。これは、高品質な研究倫理審査のための、新たなタイムリーな基礎を築くことを目的としている。研究の道具および研究対象としてのAI言語モデルの役割を、科学者、参加者、審査者に対する倫理的含意とともに精査する。研究倫理審査の新たなプラクティスについて議論し、AI時代のより責任ある研究行為に向けた対応を形作る10の推奨事項で締めくくる。

Large language models of artificial intelligence (AI), such as ChatGPT, find remarkable but controversial applicability in science and research. This paper reviews epistemological challenges and ethical and integrity risks in the conduct of science with the advent of generative AI. The aim is to lay new, timely foundations for high-quality research ethics review. The role of AI language models as a research instrument and subject is scrutinized along with ethical implications for scientists, participants and reviewers. New emerging practices for research ethics review are discussed, concluding with ten recommendations that shape a response for more responsible research conduct in the era of AI.

翻訳日:2023-07-21 17:59:37 公開日:2023-07-20

# AlignAtt:同時音声翻訳ガイドとしての注意に基づく音声翻訳アライメント

AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation ( http://arxiv.org/abs/2305.11408v2 )

ライセンス: Link先を確認

Sara Papi, Marco Turchi, Matteo Negri

(参考訳) 注意機構(アテンション)は、今日の自然言語処理で最もよく使われているアーキテクチャの中核的なメカニズムであり、機械翻訳関連タスクにおける有効性を含む多くの観点から分析されてきた。これらの研究の中で、音声翻訳(ST)タスクのように入力テキストを音声セグメントに置き換えた場合にも、アテンションは単語アライメントに関する洞察を得るのに有用な情報源であることが示された。本稿では,アテンション情報を利用して,推論時にモデルを誘導するソース・ターゲットアライメントを生成する,同時ST(SimulST)のための新しいポリシーであるAlignAttを提案する。MuST-C v1.0 の8言語対での実験により、AlignAtt はオフライン学習モデルに適用された従来の最先端の SimulST ポリシーよりも優れており、8言語全体で BLEU が2ポイント向上し、レイテンシが0.5秒から0.8秒短縮されることを示した。

Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention proved to be a useful source of information for gaining insights about word alignment, even when the input text is substituted with audio segments, as in the case of the speech translation (ST) task. In this paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that exploits the attention information to generate source-target alignments that guide the model during inference. Through experiments on the 8 language pairs of MuST-C v1.0, we show that AlignAtt outperforms previous state-of-the-art SimulST policies applied to offline-trained models with gains in terms of BLEU of 2 points and latency reductions ranging from 0.5s to 0.8s across the 8 languages.

翻訳日:2023-07-21 17:59:07 公開日:2023-07-20

# MaxViT-UNet:医療画像セグメンテーションのためのマルチ軸注意

MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation ( http://arxiv.org/abs/2305.08396v3 )

ライセンス: Link先を確認

Abdul Rehman Khan, Asifullah Khan

(参考訳) 畳み込みニューラルネットワーク(CNN)は近年,医療画像解析において大きな進歩を遂げている。しかし、畳み込み演算子の局所的な性質は、CNNにおいて大域的および長距離の相互作用を捉える上での制約となりうる。近年,Transformerは,グローバルな特徴を効果的に処理できることから,コンピュータビジョンコミュニティや医療画像セグメンテーションにおいて普及している。一方,自己注意機構のスケーラビリティの問題とCNNのような帰納バイアスの欠如が、その採用を制限してきた可能性がある。そのため,畳み込み機構と自己注意機構の両方の利点を活かしたハイブリッドVision Transformer(CNN-Transformer)の重要性が高まっている。本稿では,医療画像セグメンテーションのためのエンコーダ・デコーダ型ハイブリッドVision Transformer(CNN-Transformer)であるMaxViT-UNetを提案する。MaxViTブロックに基づく提案のハイブリッドデコーダは,各デコード段階で畳み込み機構と自己注意機構の両方の能力を,わずかな計算負荷で活用するように設計されている。デコーダの各段階に多軸自己注意を組み込むことで、対象領域と背景領域の識別能力が大幅に向上し、セグメンテーション効率の向上に寄与する。ハイブリッドデコーダブロックでは、転置畳み込みにより得られたアップサンプル済みの下位レベルのデコーダ特徴と、ハイブリッドエンコーダから得られるスキップ接続特徴とを統合することで融合処理を開始する。その後、多軸注意機構により、融合した特徴が洗練される。提案したデコーダブロックを複数回繰り返すことで,核領域を段階的にセグメンテーションする。MoNuSeg18とMoNuSAC20データセットの実験結果から,提案手法の有効性が示された。

Convolutional Neural Networks (CNNs) have made significant strides in medical image analysis in recent years. However, the local nature of the convolution operator may pose a limitation for capturing global and long-range interactions in CNNs. Recently, Transformers have gained popularity in the computer vision community and in medical image segmentation due to their ability to process global features effectively. The scalability issues of the self-attention mechanism and the lack of a CNN-like inductive bias may have limited their adoption. Therefore, hybrid Vision Transformers (CNN-Transformer), exploiting the advantages of both convolution and self-attention mechanisms, have gained importance. In this work, we present MaxViT-UNet, an encoder-decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed hybrid decoder, based on the MaxViT block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal computational burden. The inclusion of multi-axis self-attention within each decoder stage significantly enhances the discriminating capacity between the object and background regions, and thereby helps in improving the segmentation efficiency. In the hybrid decoder block, the fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on the MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.

翻訳日:2023-07-21 17:58:46 公開日:2023-07-20

# 混合状態に対する最適量子速度

Optimal quantum speed for mixed states ( http://arxiv.org/abs/2305.08004v2 )

ライセンス: Link先を確認

Ashraf Naderzadeh and Seyed Javad Akhtarshenas

(参考訳) 量子状態がどの程度高速に発展できるかという問題を考える。[Phys. Rev. Research 2, 033127 (2019)] で与えられたユークリッド距離に基づく二乗速度の定義を用いて、時間に依存しないハミルトニアンの下でユニタリに発展する$d$次元系の最適速度を得るための体系的な枠組みを提示する。同じ純度を持つ混合量子状態の集合のうち、最適状態はその純度パラメータを用いて得られる。任意の$d$ に対して、最適状態は、副対角線に関して対称であるという追加の性質を持つ$X$状態によって表されることを示す。純度が最大混合状態$\mathbb{I}/d$の純度を高々$2/d^2$だけ上回る十分低い純度に対しては、最適状態の唯一の非零の非対角成分は$\varrho_{1d}$であり、それぞれ最小固有値と最大固有値を持つ2つのエネルギー固有状態間の遷移振幅に対応する。しかし、より大きな純度の場合、他の副対角成分$\varrho_{i,d-i+1}$が非零値をとるかどうかは、相対エネルギーギャップ$|E_{d-i+1}-E_{i}|$に依存する。エネルギー基底に関するコヒーレンスとエンタングルメントの影響も検討し、最適状態においてはどちらの資源も純度の単調関数であるため、量子発展を加速し、より小さな量子速度限界をもたらしうることがわかった。以上の結果から, 状態のコヒーレンスが発展の速度を担っているものの, 最速の状態については副対角線上に位置する一部の非対角成分によるコヒーレンスのみが役割を果たすことが示された。

The question of how fast a quantum state can evolve is considered. Using the definition of squared speed based on the Euclidean distance given in [Phys. Rev. Research 2, 033127 (2019)], we present a systematic framework to obtain the optimal speed of a $d$-dimensional system evolved unitarily under a time-independent Hamiltonian. Among the set of mixed quantum states having the same purity, the optimal state is obtained in terms of its purity parameter. We show that for an arbitrary $d$, the optimal state is represented by an $X$-state with the additional property of being symmetric with respect to the secondary diagonal. For sufficiently low purities, for which the purity exceeds the purity of the maximally mixed state $\mathbb{I}/d$ by at most $2/d^2$, the only nonzero off-diagonal entry of the optimal state is $\varrho_{1d}$, corresponding to the transition amplitude between two energy eigenstates with minimum and maximum eigenvalues, respectively. For larger purities, however, whether or not the other secondary-diagonal entries $\varrho_{i,d-i+1}$ take nonzero values depends on their relative energy gaps $|E_{d-i+1}-E_{i}|$. The effects of coherence and entanglement, with respect to the energy basis, are also examined, and we find that for optimal states both resources are monotonic functions of purity, so they can speed up quantum evolution, leading to a smaller quantum speed limit. Our results show that although the coherence of the states is responsible for the speed of evolution, for the fastest states only the coherence caused by some off-diagonal entries located on the secondary diagonal plays a role.

翻訳日:2023-07-21 17:58:18 公開日:2023-07-20

# ベイズ推論の合成的構造

The Compositional Structure of Bayesian Inference ( http://arxiv.org/abs/2305.06112v2 )

ライセンス: Link先を確認

Dylan Braithwaite, Jules Hedges, Toby St Clere Smithe

(参考訳) ベイズの法則は、新しい証拠に照らして信念を更新するために因果プロセスを反転させる方法を教えてくれる。もしこの過程が複雑な合成的構造を持つと考えられるならば、全体の反転は成分過程を用いて区分的に計算できることがわかる。この合成規則の構造について検討し,関数型プログラミングにおけるレンズパターンとの関連を指摘する。マルコフ核の圏の適度に一般的な公理的表現の中で、ベイズ反転をファイバー圏における状態依存射の特定の例と考えることができることを見る。基礎となる圏上の関手として定式化されるこの構成の合成的性質について議論し、統計的推論に対するより型駆動的なアプローチにどのように使用できるかを検討する。

Bayes' rule tells us how to invert a causal process in order to update our beliefs in light of new evidence. If the process is believed to have a complex compositional structure, we may observe that the inversion of the whole can be computed piecewise in terms of the component processes. We study the structure of this compositional rule, noting that it relates to the lens pattern in functional programming. Working in a suitably general axiomatic presentation of a category of Markov kernels, we see how we can think of Bayesian inversion as a particular instance of a state-dependent morphism in a fibred category. We discuss the compositional nature of this, formulated as a functor on the underlying category, and explore how this can be used for a more type-driven approach to statistical inference.
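
The piecewise computation of an inversion can be checked numerically in the simplest setting, finite Markov kernels represented as row-stochastic matrices. The sketch below is an illustration under that assumption, not the categorical construction of the paper; the function names and the example numbers are invented here, and every marginal is assumed to be strictly positive so that the inversion is defined. The inverse of a kernel depends on the prior it is inverted against, which is the state dependence the abstract refers to.

```python
def push(prior, kernel):
    """Push a prior over inputs forward through a kernel: the marginal over outputs."""
    return [
        sum(prior[x] * kernel[x][y] for x in range(len(prior)))
        for y in range(len(kernel[0]))
    ]


def compose(f, g):
    """Apply kernel f, then kernel g: result[x][z] = sum_y f[x][y] * g[y][z]."""
    return [
        [sum(f[x][y] * g[y][z] for y in range(len(g))) for z in range(len(g[0]))]
        for x in range(len(f))
    ]


def invert(kernel, prior):
    """Bayesian inversion of a kernel at a prior: inverse[y][x] = P(x | y).

    Assumes every output has non-zero marginal probability.
    """
    marginal = push(prior, kernel)
    return [
        [prior[x] * kernel[x][y] / marginal[y] for x in range(len(prior))]
        for y in range(len(marginal))
    ]


if __name__ == "__main__":
    prior = [0.3, 0.7]
    f = [[0.9, 0.1], [0.2, 0.8]]
    g = [[0.5, 0.5], [0.1, 0.9]]
    # Invert the composite directly ...
    whole = invert(compose(f, g), prior)
    # ... or invert g at the pushed-forward prior, then f at the original prior.
    piecewise = compose(invert(g, push(prior, f)), invert(f, prior))
    print(whole)
    print(piecewise)
```

The two results agree up to floating-point error: the backward pass runs through the components in reverse order, with each component inverted at the prior that reaches it in the forward pass, which is the lens-like shape mentioned in the abstract.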

翻訳日:2023-07-21 17:57:43 公開日:2023-07-20

# 完全ベイズVIB-DeepSSM

Fully Bayesian VIB-DeepSSM ( http://arxiv.org/abs/2305.05797v2 )

ライセンス: Link先を確認

Jadie Adams and Shireen Elhabian

(参考訳) 統計的形状モデリング(SSM)は、集団に基づく解剖学的形状の定量的分析を可能にし、臨床診断を行う。深層学習による3次元画像からの対応ベースssmの予測は不確かさの定量化を必要とするが、ベイズ式化の動機付けは必要である。変動情報ボトルネックのDeepSSM(VIB-DeepSSM)は,アレータティック不確実性定量化画像から解剖の確率的形状を予測するための,有効で原則化されたフレームワークである。しかし、VIBは半ベイズ的であり、疫学的な不確実性推論を欠いている。我々は,完全ベイズ式vibを導出し,スケーラブルな2つの実装手法の有効性を実証する。さらに,マルチモーダル限界化による不確実性校正をさらに強化する新しい組み合わせを提案する。合成形状と左房データの実験により、完全ベイズVIBネットワークは精度を犠牲にすることなく不確実性推論を改善した画像からSSMを予測することを示した。

Statistical shape modeling (SSM) enables population-based quantitative analysis of anatomical shapes, informing clinical diagnosis. Deep learning approaches predict correspondence-based SSM directly from unsegmented 3D images but require calibrated uncertainty quantification, motivating Bayesian formulations. Variational information bottleneck DeepSSM (VIB-DeepSSM) is an effective, principled framework for predicting probabilistic shapes of anatomy from images with aleatoric uncertainty quantification. However, VIB is only half-Bayesian and lacks epistemic uncertainty inference. We derive a fully Bayesian VIB formulation and demonstrate the efficacy of two scalable implementation approaches: concrete dropout and batch ensemble. Additionally, we introduce a novel combination of the two that further enhances uncertainty calibration via multimodal marginalization. Experiments on synthetic shapes and left atrium data demonstrate that the fully Bayesian VIB network predicts SSM from images with improved uncertainty reasoning without sacrificing accuracy.

翻訳日:2023-07-21 17:57:29 公開日:2023-07-20

# ポイントクラウドネットワークは解剖学の統計的形状モデルを学ぶことができるか?

Can point cloud networks learn statistical shape models of anatomies? ( http://arxiv.org/abs/2305.05610v2 )

ライセンス: Link先を確認

Jadie Adams and Shireen Elhabian

(参考訳) 統計的形状モデリング (SSM) は解剖学の個体群における解剖学的変動を調査し定量化する貴重なツールである。しかし、従来の対応ベースのSSM生成法では、SSMを構成するには完全な幾何学的プロキシ(例えば、高解像度のバイナリボリュームや表面メッシュ)が必要である。形状の無秩序な3dポイントクラウド表現は、様々な医療画像(しきい値画像や表面走査など)からより容易に取得できる。ポイントクラウドディープネットワークは、最近、異なるポイントクラウドタスク(例えば、補完、意味セグメンテーション、分類)の置換不変機能を学習することに成功した。しかし、ポイントクラウドからssmを学習する彼らの応用は未検討である。本研究では,既存のポイントクラウドエンコーダ・デコーダベースのコンプリートネットワークが,ssmの未解決可能性を提供し,人口レベルの統計表現をキャプチャし,推論負担を軽減し,入力要求を緩和できることを実証する。本稿では,SSMアプリケーションに対するこれらの手法の限界について論じ,今後の改良を提案する。我々の研究は、形状解析文学を進歩させ、多様なユースケースにSSMを広げるための有望な道である、SSMのためのポイントクラウド深層学習のさらなる探求の道を開く。

Statistical Shape Modeling (SSM) is a valuable tool for investigating and quantifying anatomical variations within populations of anatomies. However, traditional correspondence-based SSM generation methods have a prohibitive inference process and require complete geometric proxies (e.g., high-resolution binary volumes or surface meshes) as input shapes to construct the SSM. Unordered 3D point cloud representations of shapes are more easily acquired from various medical imaging practices (e.g., thresholded images and surface scanning). Point cloud deep networks have recently achieved remarkable success in learning permutation-invariant features for different point cloud tasks (e.g., completion, semantic segmentation, classification). However, their application to learning SSM from point clouds is to date unexplored. In this work, we demonstrate that existing point cloud encoder-decoder-based completion networks can provide an untapped potential for SSM, capturing population-level statistical representations of shapes while reducing the inference burden and relaxing the input requirement. We discuss the limitations of these techniques for the SSM application and suggest future improvements. Our work paves the way for further exploration of point cloud deep learning for SSM, a promising avenue for advancing shape analysis literature and broadening SSM to diverse use cases.

翻訳日:2023-07-21 17:57:12 公開日:2023-07-20

# Chain-of-Knowledge Promptingによる言語モデルの強化

Boosting Language Models Reasoning with Chain-of-Knowledge Prompting ( http://arxiv.org/abs/2306.06427v2 )

ライセンス: Link先を確認

Jianing Wang, Qiushi Sun, Nuo Chen, Xiang Li, Ming Gao

(参考訳) 近年、Chain-of-Thought(CoT)プロンプトは複雑な推論タスクで成功を収めている。これは「Let's think step by step」のような単純なプロンプトや、適切に設計された根拠を伴う複数のコンテキスト内例示を設計し、大規模言語モデル(LLM)に中間的な推論ステップを生成させることを目的としている。しかし、生成された根拠はしばしば誤りを伴い、非事実的で不誠実な推論連鎖を作る。この脆さを緩和するために,我々は,LLMに構造化トリプルの形式で明示的な知識証拠を生成させることを目的とした,新しい知識の連鎖(CoK)プロンプトを提案する。これは人間の行動、つまり、複雑な質問に答える前に脳内の推論証拠としてマインドマップや知識マップを描けることにインスパイアされている。さらに, 事実性および忠実性の観点から, 推論チェーンの信頼性を推定するF^2-Verification法を導入する。信頼できない応答については、誤った証拠を示してLLMに再考を促すことができる。広範な実験により,本手法は常識,事実,記号,算術の各推論タスクの性能をさらに向上できることが示された。

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like "Let's think step by step" or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, producing unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structured triples. This is inspired by our human behaviors, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For an unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.

翻訳日:2023-07-21 17:48:11 公開日:2023-07-20

# 雑音変動量子アルゴリズムにおける量子平均値のシミュレーション:多項式スケールアプローチ

Simulating Quantum Mean Values in Noisy Variational Quantum Algorithms: A Polynomial-Scale Approach ( http://arxiv.org/abs/2306.05804v2 )

ライセンス: Link先を確認

Yuguo Shao, Fuchuan Wei, Song Cheng, Zhengwei Liu

(参考訳) 大規模変動量子アルゴリズムは、実用的な量子優位性を達成するための潜在的な経路として広く認識されている。しかし、量子ノイズの存在はこれらの利点を抑圧し弱め、古典的シミュラビリティの境界を曖昧にする可能性がある。この問題をより明確にするために,観測可能なパウリパス(OBPPP)のバックプロパゲーションの経路積分に基づく新しい多項式スケール法を提案する。本手法は,独立単一量子ビット偏極雑音の存在下で,有界乱れ誤差を持つ変分量子アルゴリズムの量子平均値を効率よく近似する。理論的には厳格に証明します 1) 固定ノイズレート $\lambda$ に対して、obppp の時間と空間の複雑さは、量子ビット $n$ の数、回路深度 $l$ 、逆トランザクションエラー $\frac{1}{\varepsilon}$ 、ルート平方逆成功確率 $\frac{1}{\sqrt{\delta}}$ との多項式関係を示す。 2 変数 $\lambda$ に対して、計算複雑性は $\mathrm{Poly}\left(n,L\right)$ が $\lambda$ を超えるとき $\frac{1}{\log{L}}$ となり、$\lambda$ が $\frac{1}{L}$ 以下になるとき $L$ が指数関数となる。数値解析により,IBM の 127-qubit Eagle プロセッサ [Nature \textbf{618}, 500 (2023)] におけるゼロノイズ外挿実験結果の古典的シミュレーションを行った。提案手法は,量子デバイスと比較して精度が高く,実行速度も速い。さらに,本手法はノイズのない結果からノイズを低減し,生の観測と直接対応するIBMの未決定結果を正確に再現することを可能にする。

Large-scale variational quantum algorithms are widely recognized as a potential pathway to achieve practical quantum advantages. However, the presence of quantum noise might suppress and undermine these advantages, which blurs the boundaries of classical simulability. To gain further clarity on this matter, we present a novel polynomial-scale method based on the path integral of observable's back-propagation on Pauli paths (OBPPP). This method efficiently approximates quantum mean values in variational quantum algorithms with bounded truncation error in the presence of independent single-qubit depolarizing noise. Theoretically, we rigorously prove: 1) For a fixed noise rate $\lambda$, OBPPP's time and space complexity exhibit a polynomial relationship with the number of qubits $n$, the circuit depth $L$, the inverse truncation error $\frac{1}{\varepsilon}$, and the square root of the inverse success probability $\frac{1}{\sqrt{\delta}}$. 2) For variable $\lambda$, the computational complexity becomes $\mathrm{Poly}\left(n,L\right)$ when $\lambda$ exceeds $\frac{1}{\log{L}}$ and it becomes exponential in $L$ when $\lambda$ falls below $\frac{1}{L}$. Numerically, we conduct classical simulations of IBM's zero-noise extrapolated experimental results on the 127-qubit Eagle processor [Nature 618, 500 (2023)]. Our method attains higher accuracy and faster runtime compared to the quantum device. Moreover, this approach enables us to deduce noisy outcomes from noiseless results, allowing us to accurately reproduce IBM's unmitigated results that directly correspond to raw experimental observations.

翻訳日:2023-07-21 17:47:30 公開日:2023-07-20

# 階層型変分オートエンコーダを用いた感情条件メロディ調和

Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder ( http://arxiv.org/abs/2306.03718v4 )

ライセンス: Link先を確認

Shulei Ji and Xinyu Yang

(参考訳) 既存のメロディ調和モデルでは、生成したハーモニーの品質向上に大きな進歩を遂げているが、その多くは音楽の下の感情を無視している。一方、以前の手法で生成された調和の変動性は不十分である。これらの問題を解決するために,LSTMを用いた階層的変分自動エンコーダ(LHVAE)を提案する。特に、LHVAEは、グローバルおよびローカルな音楽特性をモデル化するために、様々なレベル(ピースレベルとバーレベル)の潜伏変数と感情条件を組み込んでいる。さらに,各ステップに注意に基づくメロディコンテキストベクトルを導入し,メロディとハーモニーの対応をよりよく学習する。目的実験の結果,提案モデルは他のLSTMモデルよりも優れていた。主観的評価を通じて、和音の種類を変えるだけでは音楽全体の感情がほとんど変化しないと結論づける。定性的解析は、我々のモデルが可変調和を生成する能力を示す。

Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignored the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the abundant variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece- and bar-level) to model the global and local music properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Objective experimental results show that our proposed model outperforms other LSTM-based models. Through subjective evaluation, we conclude that only altering the types of chords hardly changes the overall emotion of the music. The qualitative analysis demonstrates the ability of our model to generate variable harmonies.

翻訳日:2023-07-21 17:46:42 公開日:2023-07-20

# 直交群対称性の下での$k$-陽性とシュミット数

$k$-positivity and Schmidt number under orthogonal group symmetries ( http://arxiv.org/abs/2306.00654v2 )

ライセンス: Link先を確認

Sang-Jun Park, Sang-Gyun Youn

(参考訳) 本稿では,標準直交群対称性の下で,k$-positivity と schmidt number について検討する。シュミット数は量子情報理論における絡み合いの自然な定量化である。まず、すべての直交共変 $k$-正の写像の完全な特徴づけを示す。これは [Tom85] で以前の結果を一般化する。さらに、コンパクト群対称性の下で、k$-ポジティビティとシュミット数の間の双対関係を最適化する。この新たな枠組みにより、直交不変量子状態のシュミット数を効率的に計算できる。

In this paper, we study $k$-positivity and Schmidt number under standard orthogonal group symmetries. The Schmidt number is a natural quantification of entanglement in quantum information theory. First of all, we exhibit a complete characterization of all orthogonally covariant $k$-positive maps. This generalizes earlier results in [Tom85]. Furthermore, we optimize duality relations between $k$-positivity and Schmidt numbers under compact group symmetries. This new framework enables us to efficiently compute the Schmidt numbers of all orthogonally invariant quantum states.

翻訳日:2023-07-21 17:46:22 公開日:2023-07-20

# 量子計測のための物理ノイズモデル

A physical noise model for quantum measurements ( http://arxiv.org/abs/2305.19766v2 )

ライセンス: Link先を確認

Faedi Loulidi, Ion Nechita, Clément Pellegrini

(参考訳) そこで,本論文では,故障のある間接計測方式に動機づけられた量子計測のための新しいノイズモデルを提案する。量子系とプローブの相互作用を制御するランダムダイナミクス上の平均化により、自然な物理ノイズモデルが出現する。非互換性のロバスト性という枠組みで、既存のノイズモデル(一様および非分極)と比較する。我々は,本モデルが特定の測定クラスの互換性領域を大きくすることができることを観察した。

In this paper we introduce a novel noise model for quantum measurements motivated by an indirect measurement scheme with faulty preparation. Averaging over random dynamics governing the interaction between the quantum system and a probe, a natural, physical noise model emerges. We compare it to existing noise models (uniform and depolarizing) in the framework of incompatibility robustness. We observe that our model allows for larger compatibility regions for specific classes of measurements.

翻訳日:2023-07-21 17:46:15 公開日:2023-07-20

# 分子ドッキングと機械学習回帰法を用いたCOVID-19 3CLプロテアーゼを標的とした薬物再利用

Drug Repurposing Targeting COVID-19 3CL Protease using Molecular Docking and Machine Learning Regression Approach ( http://arxiv.org/abs/2305.18088v4 )

ライセンス: Link先を確認

Imra Aqeel and Abdul Majid

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが世界的な健康危機を引き起こし、治療薬の早期発見の必要性が高まっている。この課題を満たすために、医薬品の再利用はコスト、時間、労働を節約する唯一の解決策である。本研究では,SARS-CoV-2の主要プロテアーゼ3CLを標的とした新型コロナウイルス治療の可能性として,FDAが承認した5903薬を含む世界承認薬をスクリーニングするために,Zincデータベースを使用した。分子ドッキングを行い,薬物分子の有効性を確認した。薬物再資源化手法の効率を高めるために, 決定木, 余剰木, MLP, KNN, XGBoost, 勾配ブースティングなどのQSARモデリングのための機械学習回帰手法を用いて, 結合親和性をモデル化した。その結果,決定木回帰(DTR)モデルにより,R2およびRMSEの統計的測定精度が向上した。これらのシミュレーション結果は高い結合親和性を持つ薬物の同定に役立った。ドッキングおよびその他の統計分析から,-15 kcal/molから-13 kcal/molの範囲で,それぞれのZinc ID(ZINC3873365,ZINC85432544,ZINC203757351,ZINC85536956,ZINC8214470,ZINC261494640)の6種類の有望薬物をショートリスト化した。本研究は、他の研究ですでに新型コロナウイルスに対して同定されているZINC203757351抗ウイルス化合物以外の新規な薬剤である。さらに, 特定のプロテアーゼ3CLproに対する最も優れた結合相互作用について, これらのトップランク選択薬の生理化学的および薬物動態特性を解析した。我々の研究は、COVID-19に対する薬物再精製の効果的な枠組みを提供してきた。これは、分子ドッキングと機械学習回帰アプローチを組み合わせることで、潜在的な治療候補の同定を加速する可能性を強調している。

The COVID-19 pandemic has created a global health crisis, driving the need for the rapid identification of potential therapeutics. To meet this challenge, drug repurposing is the only solution that saves cost, time, and labor. In this study, we used the Zinc database to screen 5903 world-approved drugs, including FDA-approved drugs, for repurposing as potential COVID-19 treatments targeting the main protease 3CL of SARS-CoV-2. We performed molecular docking and checked the efficacy of the drug molecules. To enhance the efficiency of the drug repurposing approach, we modeled the binding affinities using several machine learning regression approaches for QSAR modeling, such as decision tree, extra trees, MLP, KNN, XGBoost, and gradient boosting. The computational results demonstrated that the Decision Tree Regression (DTR) model has improved statistical measures of R2 and RMSE. These simulated results helped to identify drugs with high binding affinity. From the docking and other statistical analysis, we shortlisted six promising drugs with their respective Zinc IDs (ZINC3873365, ZINC85432544, ZINC203757351, ZINC85536956, ZINC8214470 and ZINC261494640) within the range of -15 kcal/mol to -13 kcal/mol. In this study, the repurposed drugs are novel except for the antiviral compound ZINC203757351, which has already been identified against COVID-19 in other studies. Further, we analyzed the physicochemical and pharmacokinetic properties of these top-ranked selected drugs with respect to their best binding interaction for the specific target protease 3CLpro. Our study provides an efficient framework for drug repurposing against COVID-19. This highlights the potential of combining molecular docking with machine learning regression approaches to accelerate the identification of potential therapeutic candidates.

翻訳日:2023-07-21 17:46:10 公開日:2023-07-20

# GSMorph: cine-MRI心筋変形性レジストレーションのためのグラディエント手術

GSMorph: Gradient Surgery for cine-MRI Cardiac Deformable Registration ( http://arxiv.org/abs/2306.14687v2 )

ライセンス: Link先を確認

Haoran Dou, Ning Bi, Luyi Han, Yuhao Huang, Ritse Mann, Xin Yang, Dong Ni, Nishant Ravikumar, Alejandro F. Frangi, Yunzhi Huang

(参考訳) 深層学習に基づく変形可能な登録法は様々な医学的応用において広く研究されている。学習に基づく変形可能な登録は、変形場の登録精度と滑らかさをトレードオフする重み付き目的関数に依存する。したがって、最適な登録性能を得るためには、必然的にハイパーパラメータをチューニングする必要がある。ハイパーパラメータのチューニングは非常に計算コストが高く、ドメイン知識に望ましくない依存性をもたらします。本研究では,GSMorph と呼ばれる勾配手術機構に基づく登録モデルを構築し,複数の損失に対するハイパーパラメータフリーバランスを実現する。 GSMorphでは、この2つの競合する項のバランスをとるためにハイパーパラメータを導入するのではなく、滑らか性制約に付随する平面に直交する類似性損失の勾配を投影することで最適化手順を再構築する。さらに,本手法はモデルに依存しないため,パラメータの追加や推論の遅延を伴わずに,任意のディープ登録ネットワークにマージすることができる。本研究では,2つの心臓MRIデータセットに対するSOTA (State-of-the-art) 変形性登録手法との比較を行った。 GSMorphは5つのSOTA学習ベース登録モデルと2つの従来の登録手法であるSyNとDemonsよりも、登録精度と滑らかさの両方で優れていることを証明している。

Deep learning-based deformable registration methods have been widely investigated in diverse medical applications. Learning-based deformable registration relies on weighted objective functions trading off registration accuracy and smoothness of the deformation field. Therefore, these methods inevitably require hyperparameter tuning for optimal registration performance. Tuning the hyperparameters is highly computationally expensive and introduces undesired dependencies on domain knowledge. In this study, we construct a registration model based on the gradient surgery mechanism, named GSMorph, to achieve a hyperparameter-free balance on multiple losses. In GSMorph, we reformulate the optimization procedure by projecting the gradient of similarity loss orthogonally to the plane associated with the smoothness constraint, rather than additionally introducing a hyperparameter to balance these two competing terms. Furthermore, our method is model-agnostic and can be merged into any deep registration network without introducing extra parameters or slowing down inference. We compared our method with state-of-the-art (SOTA) deformable registration approaches over two publicly available cardiac MRI datasets. GSMorph proves superior to five SOTA learning-based registration models and two conventional registration techniques, SyN and Demons, on both registration accuracy and smoothness.
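
The gradient projection mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact update rule, so the code below assumes a PCGrad-style rule: the similarity gradient is projected only when it conflicts with the smoothness gradient (negative dot product). The function name `project_conflicting` and the toy vectors are hypothetical and are not taken from GSMorph.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_conflicting(g_sim, g_smooth):
    """Remove from g_sim its component along g_smooth when the two conflict.

    Assumption: a PCGrad-style rule, used here only for illustration.
    If the gradients do not conflict, g_sim is returned unchanged.
    """
    d = dot(g_sim, g_smooth)
    norm_sq = dot(g_smooth, g_smooth)
    if d >= 0 or norm_sq == 0.0:
        return list(g_sim)
    scale = d / norm_sq
    return [a - scale * b for a, b in zip(g_sim, g_smooth)]

g_sim = [1.0, -2.0]      # toy gradient of the similarity loss
g_smooth = [0.0, 1.0]    # toy gradient of the smoothness loss
projected = project_conflicting(g_sim, g_smooth)
```

After the projection, the update direction has no component that opposes the smoothness gradient, so no weighting coefficient between the two losses is needed in this sketch.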

翻訳日:2023-07-21 17:39:37 公開日:2023-07-20

# $\alpha$-$\beta$-Factorization と Simon's Congruence のバイナリケース

$\alpha$-$\beta$-Factorization and the Binary Case of Simon's Congruence ( http://arxiv.org/abs/2306.14192v2 )

ライセンス: Link先を確認

Pamela Fleischmann, Jonas Höfer, Annika Huch, Dirk Nowotka

(参考訳) 1991年、Hébrardは単語の因数分解を導入し、単語の散在因子((散在)部分語や部分列とも呼ばれる)を調べる強力なツールとなった。これに基づいて、まずKarandikarとSchnoebelenが$k$-richnessという概念を導入し、後にBarkerらが$k$-universalityという概念を導入した。 2022年、Fleischmannらは、単語とその逆のアーチ分解を交差させることで、アーチ分解の一般化を示した。著者らは, この因子分解を, 最短欠落散在因子の探索にのみ用いたが, 本研究では, この新しい$\alpha$-$\beta$-factorization そのものについて検討する。我々は、$k$-universalワードに対する有名なSimon合同を$1$-universalワードを用いて特徴づける。さらに,これらの結果をバイナリ単語に適用する。この特別な場合、クラスを完全に特徴づけ、合同の指数を計算する。最後に、三項ケースの調査を開始し、$\alpha\beta\alpha$-factorsの可能性の完全なリストを示し、それらの合同を特徴づける。

In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of the arch factorization by intersecting the arch factorization of a word and its reverse. While the authors merely used this factorization for the investigation of shortest absent scattered factors, in this work we investigate this new $\alpha$-$\beta$-factorization as such. We characterize the famous Simon congruence of $k$-universal words in terms of $1$-universal words. Moreover, we apply these results to binary words. In this special case, we obtain a full characterization of the classes and calculate the index of the congruence. Lastly, we start investigating the ternary case, present a full list of possibilities for $\alpha\beta\alpha$-factors, and characterize their congruence.
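
The arch factorization and $k$-universality referred to above can be sketched in a few lines. This is an illustrative sketch of the standard arch factorization only, not of the paper's $\alpha$-$\beta$-factorization; it assumes every letter of the word belongs to the given alphabet, and the function names are hypothetical. The brute-force check `is_k_universal` is exponential in $k$ and is included only to compare against the number of arches on small inputs.

```python
from itertools import product

def arch_factorization(word, alphabet):
    """Greedy arch factorization: close an arch as soon as every letter was seen.

    Returns (arches, rest), where rest is the trailing part that does not
    contain the whole alphabet. Assumes all letters of word are in alphabet.
    """
    sigma = set(alphabet)
    arches, current, seen = [], [], set()
    for letter in word:
        current.append(letter)
        seen.add(letter)
        if seen == sigma:
            arches.append("".join(current))
            current, seen = [], set()
    return arches, "".join(current)

def is_scattered_factor(u, w):
    """True if u is a subsequence (scattered factor) of w."""
    remaining = iter(w)
    return all(letter in remaining for letter in u)

def is_k_universal(word, alphabet, k):
    """Brute force: every word of length k over the alphabet is a scattered factor."""
    return all(is_scattered_factor(u, word) for u in product(alphabet, repeat=k))

arches, rest = arch_factorization("abbaab", "ab")
```

For the word `abbaab` over the binary alphabet, the factorization yields three arches and an empty rest, and the brute-force check confirms that the word is 3-universal but not 4-universal (for example, `baba` is not a scattered factor).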

翻訳日:2023-07-21 17:39:18 公開日:2023-07-20

# My Boli: コードミックスのMarathi- English Corpora、事前学習言語モデル、評価ベンチマーク

My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks ( http://arxiv.org/abs/2306.14030v2 )

ライセンス: Link先を確認

Tanmay Chavan, Omkar Gokhale, Aditya Kane, Shantanu Patankar, Raviraj Joshi

(参考訳) コード混合データの研究は、専用のコード混合データセットと事前学習された言語モデルが利用できないため、限られている。この作業では、コードミックスに先立つ作業に欠ける、低リソースのインドの言語であるmarathiに焦点を合わせます。 L3Cube-MeCorpusは,Mr-Enコーパスと1000万のソーシャルメディア文による事前学習用コーパスである。また、コード混合BERTベースのトランスモデルであるL3Cube-MeBERTとMeRoBERTaをMeCorpusで事前学習した。さらに、ベンチマークでは、コード混合mr-enヘイトスピーチ検出、感情分析、言語識別などの下流タスクに対して、mehate、mesent、melidの3つの教師付きデータセットを提案する。これらの評価データセットは、手動で注釈付き \url{~}12,000 Marathi- English code-mixed tweet で構成されている。アブレーションは、この新しいコーパスで訓練されたモデルは、既存の最先端のBERTモデルよりも大幅に優れていることを示している。これは、コード混合マラーティー研究の成果物を提示する最初の作品である。すべてのデータセットとモデルはhttps://github.com/l3cube-pune/MarathiNLPで公開されている。

The research on code-mixed data is limited due to the unavailability of dedicated code-mixed datasets and pre-trained language models. In this work, we focus on the low-resource Indian language Marathi, which lacks any prior work in code-mixing. We present L3Cube-MeCorpus, a large code-mixed Marathi-English (Mr-En) corpus with 10 million social media sentences for pretraining. We also release L3Cube-MeBERT and MeRoBERTa, code-mixed BERT-based transformer models pre-trained on MeCorpus. Furthermore, for benchmarking, we present three supervised datasets, MeHate, MeSent, and MeLID, for downstream tasks like code-mixed Mr-En hate speech detection, sentiment analysis, and language identification, respectively. These evaluation datasets each consist of ~12,000 manually annotated Marathi-English code-mixed tweets. Ablations show that the models trained on this novel corpus significantly outperform the existing state-of-the-art BERT models. This is the first work that presents artifacts for code-mixed Marathi research. All datasets and models are publicly released at https://github.com/l3cube-pune/MarathiNLP .

翻訳日:2023-07-21 17:38:55 公開日:2023-07-20

# ボリューム医用画像解析のための正規SE(3)グループ畳み込み

Regular SE(3) Group Convolutions for Volumetric Medical Image Analysis ( http://arxiv.org/abs/2306.13960v2 )

ライセンス: Link先を確認

Thijs P. Kuipers and Erik J. Bekkers

(参考訳) 正規群畳み込みニューラルネットワーク(G-CNN)は、モデル性能を高め、異なる幾何学的対称性に等しくなることが示されている。本研究は体積データ上のse(3),すなわちroto-translation equivarianceの問題に対処する。ボリューム画像データは、多くの医療現場で広く使われている。分離可能な群畳み込みに関する最近の研究により、連続的なSO(3)(回転)カーネルと空間的カーネルに分離されたSE(3)群畳み込みカーネルを考案した。均一なSO(3)格子をサンプリングすることで連続的な設定に近似する。我々の連続SO(3)カーネルは同様に一様格子上のRBF補間によってパラメータ化される。ボリューム画像解析における我々のアプローチの利点を実証する。医用分類課題において, se(3)同変モデルはcnnと正規離散g-cnnを一貫して上回っており, 一般化能力が著しく向上している。提案手法は,通常のCNNに比べて最大16.5%の精度向上を実現している。

Regular group convolutional neural networks (G-CNNs) have been shown to increase model performance and improve equivariance to different geometrical symmetries. This work addresses the problem of SE(3), i.e., roto-translation equivariance, on volumetric data. Volumetric image data is prevalent in many medical settings. Motivated by the recent work on separable group convolutions, we devise an SE(3) group convolution kernel separated into a continuous SO(3) (rotation) kernel and a spatial kernel. We approximate equivariance to the continuous setting by sampling uniform SO(3) grids. Our continuous SO(3) kernel is parameterized via RBF interpolation on similarly uniform grids. We demonstrate the advantages of our approach in volumetric medical image analysis. Our SE(3) equivariant models consistently outperform CNNs and regular discrete G-CNNs on challenging medical classification tasks and show significantly improved generalization capabilities. Our approach achieves up to a 16.5% gain in accuracy over regular CNNs.

翻訳日:2023-07-21 17:38:38 公開日:2023-07-20

# 道徳教育・開発研究における大規模言語モデル活用の可能性

Potential Benefits of Employing Large Language Models in Research in Moral Education and Development ( http://arxiv.org/abs/2306.13805v2 )

ライセンス: Link先を確認

Hyemin Han

(参考訳) 近年,計算機科学者は大規模言語コーパスと人間強化を用いた予測モデルを訓練することにより,大規模言語モデル(LLM)を開発した。 LLMは様々な分野の精度で人工知能を実装するための有望な方法となっている。興味深いことに、近年のLLMは、高度な人間の認知をエミュレートする創発的な機能的特徴、特に従来の予測モデルでは利用できなかった文脈内学習と思考の連鎖を持っている。本稿では,LLMが道徳教育・開発研究にどのように貢献するかを検討する。この目標を達成するために、最近発表された会議論文とArXivのプレプリントをレビューして、LLMで実装された新機能の概要を説明します。また、倫理的ジレンマや外部からのフィードバックに対処しながら、LCMがどのように振る舞うかをChatGPTで簡単な実験を行うつもりです。以上の結果から, LLMは外部入力による推論プロセスの修正と推論に基づいてジレンマを解くことができる可能性が示唆された。さらに、道徳的模範テストによる予備的な実験結果から、模範的な物語は、人間の参加者と同じように、LLMの道徳的高揚を招きかねないことが示される。モラル教育研究におけるllmの潜在的意義と今後の展開について考察する。

Recently, computer scientists have developed large language models (LLMs) by training prediction models with large-scale language corpora and human reinforcements. The LLMs have become one promising way to implement artificial intelligence with accuracy in various fields. Interestingly, recent LLMs possess emergent functional features that emulate sophisticated human cognition, especially in-context learning and the chain of thought, which were unavailable in previous prediction models. In this paper, I will examine how LLMs might contribute to moral education and development research. To achieve this goal, I will review the most recently published conference papers and ArXiv preprints to overview the novel functional features implemented in LLMs. I also intend to conduct brief experiments with ChatGPT to investigate how LLMs behave while addressing ethical dilemmas and external feedback. The results suggest that LLMs might be capable of solving dilemmas based on reasoning and revising their reasoning process with external input. Furthermore, a preliminary experimental result from the moral exemplar test may demonstrate that exemplary stories can elicit moral elevation in LLMs as they do among human participants. I will discuss the potential implications of LLMs on research on moral education and development with the results.

翻訳日:2023-07-21 17:38:23 公開日:2023-07-20

# ラベル生成に基づくクラスインクリメンタル学習

Class-Incremental Learning based on Label Generation ( http://arxiv.org/abs/2306.12619v2 )

ライセンス: Link先を確認

Yijia Shao, Yiduo Guo, Dongyan Zhao, Bing Liu

(参考訳) 事前学習された言語モデルの大きな成功にもかかわらず、これらのモデルを継続的学習、特に破滅的忘れ(CF)によるクラス増分学習(CIL)設定に使用することは依然として困難である。本稿では,CIL を連続ラベル生成問題として定式化した場合,CF は大幅に削減され,事前学習モデルの一般化可能な表現がより良く保持できることを示す。そこで我々は,語彙のスパース性を活用して生成を絞り込み,ラベルセマンティクスを用いて擬似リプレイサンプルを作成する新しいCIL法(VAG)を提案する。実験の結果, VAGはベースラインを大きなマージンで上回った。

Despite the great success of pre-trained language models, it is still a challenge to use these models for continual learning, especially for the class-incremental learning (CIL) setting due to catastrophic forgetting (CF). This paper reports our finding that if we formulate CIL as a continual label generation problem, CF is drastically reduced and the generalizable representations of pre-trained models can be better retained. We thus propose a new CIL method (VAG) that also leverages the sparsity of vocabulary to focus the generation and creates pseudo-replay samples by using label semantics. Experimental results show that VAG outperforms baselines by a large margin.

翻訳日:2023-07-21 17:38:02 公開日:2023-07-20

# テキストマイニングのためのチャットGPT化学アシスタントとMOF合成予測

ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis ( http://arxiv.org/abs/2306.11296v2 )

ライセンス: Link先を確認

Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

(参考訳) 本研究は,化学文献の様々な形式やスタイルから,金属-有機フレームワーク(MOF)合成条件のテキストマイニングの自動化におけるChatGPTの導出を行う。これはChatGPTが情報を幻覚させる傾向を効果的に緩和するものであり、以前は科学分野で大きな言語モデル(LLM)を使用していた問題だった。私たちのアプローチは、chatgpt自身によってプログラムされたテキストマイニングの3つの異なるプロセスを実装するワークフローの開発に関するものです。これらはすべて、パース、検索、フィルタリング、分類、要約、データ統合を可能にする。論文から得られた約800個のMOFに関する26,257個の異なる合成パラメータを抽出する。このプロセスには、ChatGPTにテキストマイニングを指示するChemPrompt Engineering戦略が含まれています。さらに,テキストマイニングによって構築されたデータセットを用いて,MOF実験結晶化結果の予測に精度86%以上の機械学習モデルを構築した。また, 化学反応や合成過程に関する質問に答える, 信頼性の高いデータ接地型mofチャットボットを開発した。 ChatGPTを使用するプロセスは、コーディングの専門知識を必要としない物語言語のみを使用して、多様なMOF合成情報を統一形式で確実にマイニングし、集計することを考えると、我々のChatGPT化学アシスタントは、他の様々な化学分野において非常に有用であると予想される。

We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic frameworks (MOFs) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format, while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines.

翻訳日:2023-07-21 17:37:50 公開日:2023-07-20

# EPRペアのみを用いた量子検出可能ビザンチン合意プロトコル

A Quantum Detectable Byzantine Agreement Protocol using only EPR pairs ( http://arxiv.org/abs/2306.10825v2 )

ライセンス: Link先を確認

Theodore Andronikos, Alla Sirokofskich

(参考訳) 本稿では,検出可能ビザンチン合意のための新しい量子プロトコルを提案する。提案されたプロトコルを類似の量子プロトコルと区別することは、EPRペアのみを使用し、特に$\Psi^{ + }$ペアを使用するという事実である。検出可能なビザンチン協定を保証できる高度な量子プロトコルは数多く存在するが、現在の技術的制限のため、それらは実装に簡単には依存しない。多数のプレイヤーに対して、GHZ $n$-tuplesや他のよりエキゾチックな絡み合った状態は、生成が簡単ではなく、そのようなプロトコルのスケーラビリティを複雑にする可能性がある。対照的にベル状態は、間違いなく最大の絡み合った状態の中で最も容易に生成できる状態である。これは、プレイヤー数$n$に関係なく、EPRペアだけを必要とするため、提案されたプロトコルのスケーラビリティを促進することを願っている。最後に、任意の数のプレイヤーに対して$n$であっても、我々のプロトコルは常に一定の回数のラウンドで完了している。

In this paper, we introduce a new quantum protocol for Detectable Byzantine Agreement. What distinguishes the proposed protocol among similar quantum protocols is the fact that it uses only EPR pairs, and, in particular, $\Psi^{ + }$ pairs. There are many sophisticated quantum protocols that guarantee Detectable Byzantine Agreement, but they do not easily lend themselves to practical implementations, due to present-day technological limitations. For a large number $n$ of players, GHZ $n$-tuples, or other more exotic entangled states, are not easy to produce, a fact which might complicate the scalability of such protocols. In contrast, Bell states are, undoubtedly, the easiest to generate among maximally entangled states. This will, hopefully, facilitate the scalability of the proposed protocol, as only EPR pairs are required, irrespective of the number $n$ of players. Finally, we mention that, even for arbitrarily many players $n$, our protocol always completes in a constant number of rounds, namely $4$.
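
As a small illustration of the $\Psi^{+}$ pairs mentioned above, the sketch below computes the outcome probabilities when both qubits of the state $(|01\rangle + |10\rangle)/\sqrt{2}$ are measured in the computational basis. It shows only this correlation property and does not model any step of the agreement protocol itself; the variable names are illustrative.

```python
import math

# Amplitudes of the Bell state |Psi+> = (|01> + |10>) / sqrt(2),
# indexed by the two-qubit computational basis outcome.
amp = 1 / math.sqrt(2)
psi_plus = {"00": 0.0, "01": amp, "10": amp, "11": 0.0}

# Born rule: the probability of an outcome is the squared magnitude of its amplitude.
probs = {outcome: abs(a) ** 2 for outcome, a in psi_plus.items()}

# Probability that the two measurement results differ.
p_differ = probs["01"] + probs["10"]
```

The two measurement results always differ, and each of the two possible outcomes occurs with probability one half.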

翻訳日:2023-07-21 17:37:06 公開日:2023-07-20

# Open-Vocabulary Object Detection のスケーリング

Scaling Open-Vocabulary Object Detection ( http://arxiv.org/abs/2306.09683v2 )

ライセンス: Link先を確認

Matthias Minderer, Alexey Gritsenko, Neil Houlsby

(参考訳) オープンボキャブラリオブジェクト検出は、事前訓練された視覚言語モデルから大きな恩恵を受けているが、それでも検出訓練データの量によって制限されている。検出トレーニングデータは、Webイメージテキストペアを弱い監視手段として使用することで拡張できるが、画像レベルの事前トレーニングに匹敵するスケールでは行われていない。ここでは,既存の検出器を用いて画像テキストペアに擬似ボックスアノテーションを生成する自己学習を用いて,検出データをスケールアップする。自己学習のスケーリングにおける大きな課題は、ラベル空間の選択、擬似アノテーションフィルタリング、トレーニング効率である。これらの課題に対処するOWLv2モデルとOWL-ST自己学習レシピを提案する。 OWLv2は、既に同等のトレーニングスケール(約10万例)で、最先端のオープン語彙検出器の性能を上回っている。 L/14アーキテクチャでは、OWL-STはLVISレアクラスのAPを改善し、そのモデルでは31.2%から44.6%(相対的な改善43%)まで、人間のボックスアノテーションが見られない。 OWL-STは、画像分類や言語モデリングで見られるような、オープンワールドのローカライゼーションのためのWebスケールトレーニングをアンロックする。

Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can be expanded by using Web image-text pairs as weak supervision, this has not been done at scales comparable to image-level pretraining. Here, we scale up detection data with self-training, which uses an existing detector to generate pseudo-box annotations on image-text pairs. Major challenges in scaling self-training are the choice of label space, pseudo-annotation filtering, and training efficiency. We present the OWLv2 model and OWL-ST self-training recipe, which address these challenges. OWLv2 surpasses the performance of previous state-of-the-art open-vocabulary detectors already at comparable training scales (~10M examples). However, with OWL-ST, we can scale to over 1B examples, yielding further large improvement: With an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (43% relative improvement). OWL-ST unlocks Web-scale training for open-world localization, similar to what has been seen for image classification and language modelling.
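
The relative improvement quoted in the abstract follows directly from the two AP values. A minimal arithmetic check, with a hypothetical helper name:

```python
def relative_improvement(before, after):
    """Relative change from before to after, as a fraction of before."""
    return (after - before) / before

# LVIS rare-class AP values quoted in the abstract.
gain = relative_improvement(31.2, 44.6)
print(f"{gain:.0%}")  # prints 43%
```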

翻訳日:2023-07-21 17:36:49 公開日:2023-07-20

# VNHSGE英語データセットにおける大規模言語モデルの性能比較:OpenAI ChatGPT, Microsoft Bing Chat, Google Bard

Performance Comparison of Large Language Models on VNHSGE English Dataset: OpenAI ChatGPT, Microsoft Bing Chat, and Google Bard ( http://arxiv.org/abs/2307.02288v3 )

ライセンス: Link先を確認

Xuan-Quy Dao

(参考訳) 本稿では、VNHSGEの英語データセットにおいて、OpenAI ChatGPT、Microsoft Bing Chat(BingChat)、Google Bardの3つの大規模言語モデル(LLM)の性能を比較する。 BingChat、Bard、ChatGPT(GPT-3.5)の性能はそれぞれ92.4%、86%、79.2%である。結果は、BingChatがChatGPTやBardより優れていることを示している。したがって、ChatGPTがベトナムでまだ公式に利用できない間は、BingChatとBardがChatGPTの代わりになり得る。また、BingChat、Bard、ChatGPTはいずれも英語能力においてベトナム人学生を上回った。本研究の成果は、英語教育におけるLLMの可能性の理解に寄与する。 ChatGPT、BingChat、Bardの顕著な性能は、高校レベルでの英語の教育と学習のための効果的なツールとしての可能性を示している。

This paper presents a performance comparison of three large language models (LLMs), namely OpenAI ChatGPT, Microsoft Bing Chat (BingChat), and Google Bard, on the VNHSGE English dataset. The performance of BingChat, Bard, and ChatGPT (GPT-3.5) is 92.4%, 86%, and 79.2%, respectively. The results show that BingChat is better than ChatGPT and Bard. Therefore, BingChat and Bard can replace ChatGPT while ChatGPT is not yet officially available in Vietnam. The results also indicate that BingChat, Bard and ChatGPT outperform Vietnamese students in English language proficiency. The findings of this study contribute to the understanding of the potential of LLMs in English language education. The remarkable performance of ChatGPT, BingChat, and Bard demonstrates their potential as effective tools for teaching and learning English at the high school level.

翻訳日:2023-07-21 17:29:12 公開日:2023-07-20

# 医用画像解析の公平性向上を目的とした固定属性群のない校正バイアスの緩和

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis ( http://arxiv.org/abs/2307.01738v2 )

ライセンス: Link先を確認

Changjian Shui, Justin Szeto, Raghav Mehta, Douglas L. Arnold, Tal Arbel

(参考訳) 深層学習による医用画像モデルを実際の臨床現場へ信頼できる形で展開するには、モデルが校正されている必要がある。しかし、全体としては十分に校正されたモデルでも、特定のサブ集団に対しては校正が不十分な場合があり、臨床医がモデルの推奨に基づいて、気づかないうちにそのグループに対して不適切な判断を下すおそれがある。モデル精度の観点でサブグループ間のバイアスを軽減できる手法は示されているが、本研究は医用画像解析における校正バイアスの軽減という未解決問題に焦点を当てる。本手法は訓練中にサブグループ属性を必要としないため、再訓練なしで、敏感な属性の異なる選択に対してバイアスを緩和できる柔軟性がある。そこで本研究では、まず校正の不十分なサンプルを特定してグループにクラスタリングし、次にグループ単位の焦点損失(focal loss)を導入して校正バイアスを改善する、新しい2段階の手法Cluster-Focalを提案する。公開データセットHAM10000を用いた皮膚病変分類と、多発性硬化症(MS)患者の将来の病変活動の予測で本手法を評価する。年齢や性別などの従来の敏感な属性による人口統計学的サブグループに加えて、医用画像解析で必要となる病変負荷など、画像由来の属性が異なるグループ間のバイアスも考慮する。結果は、本手法が予測性能を維持しつつ、最も性能の低いサブグループにおける校正誤差を効果的に抑制し、最近のベースラインを上回ることを示している。

Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforming recent baselines.
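
The point that a model can be well calibrated overall yet poorly calibrated for a subgroup can be illustrated with a small sketch. It uses the standard binned expected calibration error (ECE), which is an assumption here: the abstract does not say which calibration metric the authors report. The toy confidences, labels, and group names are made up for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


def ece_by_group(confidences, correct, groups):
    """Compute the ECE separately for each subgroup label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = expected_calibration_error(
            [confidences[i] for i in idx], [correct[i] for i in idx]
        )
    return result


# Toy data: every prediction has confidence 0.75.
confidences = [0.75] * 8
correct = [1, 1, 1, 1, 1, 1, 0, 0]     # group A: 4/4 correct, group B: 2/4 correct
groups = ["A"] * 4 + ["B"] * 4

overall = expected_calibration_error(confidences, correct)
per_group = ece_by_group(confidences, correct, groups)
print(overall, per_group)
```

In this toy case the overall accuracy matches the confidence, so the pooled ECE vanishes, while group A is under-confident and group B is over-confident by the same margin.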

翻訳日:2023-07-21 17:28:53 公開日:2023-07-20

# UW-ProCCaps: カプセルによる水中プログレッシブカラー化

UW-ProCCaps: UnderWater Progressive Colourisation with Capsules ( http://arxiv.org/abs/2307.01091v2 )

ライセンス: Link先を確認

Rita Pucci, Niki Martinel

(参考訳) 水中画像は海洋生物の研究と理解に欠かせないものである。画像保存に必要なメモリスペースの削減に重点を置いていますが、収集フェーズでのメモリスペースの消費は、このフェーズの持続時間を制限しているため、より多くの画像収集キャンペーンが必要になります。本稿では,水中画像の色を発光チャネルから再構成し,利用可能な記憶空間の2/3を節約する新しい機械学習モデルを提案する。本モデルは水中カラー再構成を専門とし,エンコーダ・デコーダアーキテクチャで構成されている。エンコーダは、畳み込みエンコーダと、ウェブ教師付きデータで訓練された並列特殊分類器からなる。エンコーダとデコーダはカプセルの層を使用して、画像内のエンティティの特徴をキャプチャする。色再現プロセスは、進行性および生成性逆行性訓練手順をリコールする。プログレッシブトレーニングは、色彩の洗練に焦点を当てた生成的な敵対的なルーチンの基盤を与え、画像を明るく飽和した色にすることで、イメージを生き返らせる。 4つのベンチマークデータセットで定性的かつ定量的にモデルを検証する。これは、グレースケールの水中画像で色を再現する最初の試みである。 4つのベンチマークデータセットの大規模な結果は、我々のソリューションが最先端(SOTA)ソリューションより優れていることを示している。また,生成した色調は,SOTAの画質向上モデルと比較して画質の向上を図っている。

Underwater images are fundamental for studying and understanding the status of marine life. We focus on reducing the memory space required for image storage while the memory space consumption in the collecting phase limits the time lasting of this phase leading to the need for more image collection campaigns. We present a novel machine-learning model that reconstructs the colours of underwater images from their luminescence channel, thus saving 2/3 of the available storage space. Our model specialises in underwater colour reconstruction and consists of an encoder-decoder architecture. The encoder is composed of a convolutional encoder and a parallel specialised classifier trained with webly-supervised data. The encoder and the decoder use layers of capsules to capture the features of the entities in the image. The colour reconstruction process recalls the progressive and the generative adversarial training procedures. The progressive training gives the ground for a generative adversarial routine focused on the refining of colours giving the image bright and saturated colours which bring the image back to life. We validate the model both qualitatively and quantitatively on four benchmark datasets. This is the first attempt at colour reconstruction in greyscale underwater images. Extensive results on four benchmark datasets demonstrate that our solution outperforms state-of-the-art (SOTA) solutions. We also demonstrate that the generated colourisation enhances the quality of images compared to enhancement models at the SOTA.

翻訳日:2023-07-21 17:28:28 公開日:2023-07-20

# PatternGPT: 大規模言語モデルテキスト生成のためのパターン駆動フレームワーク

PatternGPT: A Pattern-Driven Framework for Large Language Model Text Generation ( http://arxiv.org/abs/2307.00470v4 )

ライセンス: Link先を確認

Le Xiao and Xin Shan

(参考訳) 大規模言語モデル(LLMs)は優れたテキスト生成能力を示しており、多くの下流タスクに対して流暢で人間のような応答を生成できる。しかし、幻覚を起こしやすく、外部知識を直接利用できないため、実世界の重要なタスクに大規模言語モデルを適用することは依然として困難である。これらの課題に対処するため、本稿では大規模言語モデルのためのパターン駆動型テキスト生成フレームワークであるPatternGPTを提案する。まず、このフレームワークは大規模言語モデルの抽出能力を利用して、豊富で多様な、構造化・形式化されたパターンを生成し、外部知識を導入して計算を行うことを容易にする。次に、連合学習の考え方を取り入れ、複数のエージェント間でパターンを共有することで、より多様なパターンを得る。最後に、判定基準や最適化アルゴリズムなどの外部知識を用いて高品質なパターンを探索し、探索されたパターンを用いてモデル生成を導く。このフレームワークは、多様なパターンの生成、データのプライバシ保護、外部知識の統合、生成品質の向上といった利点があり、大規模言語モデルのテキスト生成能力を最適化し、知的対話やコンテンツ生成の分野によりよく適用するための効果的な方法を提供する。

Large language models (LLMs) have shown excellent text generation capabilities, capable of generating fluent human-like responses for many downstream tasks. However, applying large language models to real-world critical tasks remains challenging due to their susceptibility to hallucinations and inability to directly use external knowledge. To cope with the above challenges, this paper proposes PatternGPT, a pattern-driven text generation framework for Large Language Models. Firstly, the framework utilizes the extraction capability of Large Language Models to generate rich and diversified structured and formalized patterns, which facilitates the introduction of external knowledge to do the computation, and then draws on the idea of federated learning to use multiple agents to achieve the sharing in order to obtain more diversified patterns, and finally uses judgment criteria and optimization algorithm to search for high-quality patterns to guide the generation of models. Finally, external knowledge such as judgment criteria and optimization algorithms are used to search for high-quality patterns, and the searched patterns are used to guide model generation. This framework has the advantages of generating diversified patterns, protecting data privacy, combining external knowledge, and improving the quality of generation, which provides an effective method to optimize the text generation capability of large language models, and make it better applied to the field of intelligent dialogue and content generation.

翻訳日:2023-07-21 17:28:11 公開日:2023-07-20

# 予測状態表現の学習のための証明可能に効率的なUCB型アルゴリズム

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations ( http://arxiv.org/abs/2307.00405v2 )

ライセンス: Link先を確認

Ruiquan Huang, Yingbin Liang, Jing Yang

(参考訳) マルコフ決定過程(MDP)と部分観測可能MDP(POMDP)を特殊な場合として含む一般的な逐次意思決定問題は、時間の経過に伴う観測と行動の履歴に基づいて一連の意思決定を行い、累積報酬を最大化することを目的とする。近年の研究では、予測状態表現(PSR)によってモデル化される低ランク構造を持つ場合、逐次意思決定問題は統計的に学習可能であることが示されている。これらの進歩にもかかわらず、既存のアプローチは通常、計算効率の悪いオラクルやステップを含む。一方、バンディットやMDPで計算効率の高い手法として成功してきた上側信頼境界(UCB)に基づくアプローチは、こうしたより困難な設定では楽観的なボーナスの設計が難しいため、より一般的なPSRでは研究されていない。本稿では、推定モデルと真のモデルの間の全変動距離を上から抑える新しいボーナス項を特徴とする、PSRに対する最初のUCB型アプローチを提案する。さらに、オンラインPSRとオフラインPSRの両方について、設計したUCB型アルゴリズムのサンプル複雑性の境界を特徴付ける。既存のPSRのアプローチとは対照的に、我々のUCB型アルゴリズムは、計算効率、最終反復での準最適方策の保証、モデル精度の保証を備えている。

The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.

翻訳日:2023-07-21 17:27:52 公開日:2023-07-20

# 最適化誘導巡回自己学習による教師なし3次元登録

Unsupervised 3D registration through optimization-guided cyclical self-training ( http://arxiv.org/abs/2306.16997v2 )

ライセンス: Link先を確認

Alexander Bigalke, Lasse Hansen, Tony C. W. Mok, Mattias P. Heinrich

(参考訳) 最先端のディープラーニングベースのレジストレーション(位置合わせ)手法は、3つの異なる学習戦略を採用している。コストのかかる手動アノテーションを必要とする教師あり学習、ドメインの専門家が設計した手作りの類似度メトリクスに大きく依存する教師なし学習、そしてドメインシフトをもたらす合成データからの学習である。これらの戦略の限界を克服するため、自己学習(self-training)に基づく、教師なしレジストレーションのための新しい自己教師あり学習パラダイムを提案する。我々のアイデアは2つの重要な洞察に基づいている。特徴ベースの微分可能な最適化器は、1) ランダムな特徴からでも妥当なレジストレーションを行い、2) ノイズの多いラベルによる前段の特徴抽出ネットワークの訓練を安定化させる。そこで、擬似ラベルをランダムな特徴から推定される変位場として初期化し、学習中の特徴抽出器から得られる、より表現力の高い特徴に基づいて循環的に更新することで、自己強化効果を生む循環的自己学習を提案する。腹部と肺のレジストレーションで本手法を評価し、メトリックベースの教師信号を一貫して上回り、さまざまな最先端の競合手法を上回った。ソースコードはhttps://github.com/multimodallearning/reg-cyclical-self-trainで入手できる。

State-of-the-art deep learning-based registration methods employ three different learning strategies: supervised learning, which requires costly manual annotations, unsupervised learning, which heavily relies on hand-crafted similarity metrics designed by domain experts, or learning from synthetic data, which introduces a domain shift. To overcome the limitations of these strategies, we propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training. Our idea is based on two key insights. Feature-based differentiable optimizers 1) perform reasonable registration even from random features and 2) stabilize the training of the preceding feature extraction network on noisy labels. Consequently, we propose cyclical self-training, where pseudo labels are initialized as the displacement fields inferred from random features and cyclically updated based on more and more expressive features from the learning feature extractor, yielding a self-reinforcement effect. We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors. Source code is available at https://github.com/multimodallearning/reg-cyclical-self-train.

翻訳日:2023-07-21 17:26:45 公開日:2023-07-20

# MotionGPT: 外国語としての人間の動き

MotionGPT: Human Motion as a Foreign Language ( http://arxiv.org/abs/2306.14795v2 )

ライセンス: Link先を確認

Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen

(参考訳) 事前学習された大規模言語モデルは進歩を続けているが、言語と、モーションのような他のマルチモーダルデータとを統一的に扱うモデルの構築は、依然として困難で未開拓である。幸いなことに、人間の動きは人間の言語に似た意味的な結合を示し、しばしば身体言語の一種として認識される。言語データを大規模なモーションモデルと融合することで、動作関連タスクの性能を向上させるモーション言語事前学習が実現可能となる。この知見に基づき、複数の動作関連タスクを処理するための、統一的で汎用性が高くユーザフレンドリなモーション言語モデルであるMotionGPTを提案する。具体的には、人間の動きに離散ベクトル量子化を適用し、単語トークンの生成過程と同様に3次元の動きをモーショントークンに変換する。この「モーション語彙」に基づいて、動きとテキストの両方の言語モデリングを統一的に行い、人間の動きを一種の言語として扱う。さらに、プロンプト学習に着想を得て、MotionGPTをモーション言語データの混合で事前学習し、プロンプトベースの質問応答タスクで微調整する。広範な実験により、MotionGPTは、テキスト駆動のモーション生成、モーションキャプション、モーション予測、モーション補間(motion in-between)を含む複数の動作タスクで最先端の性能を達成することが示された。

Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.
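
The idea of turning continuous motion into discrete "motion tokens" can be sketched as a nearest-neighbour lookup in a codebook, which is the core step of vector quantization. This is a minimal illustration under assumptions: the 2-D feature vectors, the three-entry codebook, and the function name `quantize` are all hypothetical, and the learned encoder and codebook used in the paper are not reproduced here.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook vector.
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical codebook of three 2-D entries and three motion feature vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

tokens = quantize(frames, codebook)
print(tokens)
```

The resulting integer indices play the role that word-piece ids play for text, which is what allows a language model to treat motion and text in the same token stream.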

翻訳日:2023-07-21 17:26:25 公開日:2023-07-20

# 放射線医のような放射線画像を読む

Reading Radiology Imaging Like The Radiologist ( http://arxiv.org/abs/2307.05921v3 )

ライセンス: Link先を確認

Yuhao Wang

(参考訳) 自動放射線学レポート生成は、放射線学イメージングのリッチできめ細かい記述を含む放射線学レポートを生成することを目的としている。自然画像領域の画像キャプションと比較すると、医療画像は互いに非常によく似ており、疾患の発生にはほとんど差異がない。放射線学レポートにおけるこれらの小さな違いの重要性を考えると、モデルに病気の発生の微妙な領域にもっと集中するよう促すことが重要である。第二に、視覚的およびテキスト的データバイアスの問題は深刻である。通常のケースがデータセットの大部分を占めるだけでなく、病的変化のある部分を記述する文も、段落のごく一部を構成するのみである。最後に、医療画像レポートの生成には、医療知識の専門知識と経験的トレーニングを必要とする長いテキスト生成の課題が伴う。その結果、このようなレポートを生成するのが困難になる。これらの課題に対処するため,我々は,同様の報告を先行知識参照として利用する疾患指向検索フレームワークを提案する。我々は、より正確かつ事実的に一貫した疾患記述を生成するために、事実整合性キャプション生成器を設計する。本研究の枠組みは,CXRデータベースから,その位置と形態的特徴からなる疾患指向マスクを検索することによって,疾患に関する最も類似した報告を見つけることができる。疾患指向の類似報告と視覚的特徴を参照することにより、事実整合性モデルはより正確な放射線診断レポートを生成することができる。

Automated radiology report generation aims to generate radiology reports that contain rich, fine-grained descriptions of radiology imaging. Compared with image captioning in the natural image domain, medical images are very similar to each other, with only minor differences in the occurrence of diseases. Given the importance of these minor differences in the radiology report, it is crucial to encourage the model to focus more on the subtle regions of disease occurrence. Secondly, the problem of visual and textual data biases is serious. Not only do normal cases make up the majority of the dataset, but sentences describing areas with pathological changes also constitute only a small part of the paragraph. Lastly, generating medical image reports involves the challenge of long text generation, which requires more expertise and empirical training in medical knowledge. As a result, the difficulty of generating such reports is increased. To address these challenges, we propose a disease-oriented retrieval framework that utilizes similar reports as prior knowledge references. We design a factual consistency captioning generator to generate more accurate and factually consistent disease descriptions. Our framework can find most similar reports for a given disease from the CXR database by retrieving a disease-oriented mask consisting of the position and morphological characteristics. By referencing the disease-oriented similar report and the visual features, the factual consistency model can generate a more accurate radiology report.

翻訳日:2023-07-21 17:20:36 公開日:2023-07-20

# 一般パラメトリック密度モデルのためのロバスト密度パワーに基づく発散を最小化する確率的最適化手法

A stochastic optimization approach to minimize robust density power-based divergences for general parametric density models ( http://arxiv.org/abs/2307.05251v2 )

ライセンス: Link先を確認

Akifumi Okuno

(参考訳) 観測の背後にある分布を外れ値に対して頑健に推定するために設計された密度パワーダイバージェンス(DPD) [Basu et al. (1998), Biometrika] は、推定対象のパラメトリック密度モデルのべき乗の積分項を含む。積分項の明示的な形式は、一部の特定の密度(正規密度や指数密度など)については得られるが、その計算上の扱いにくさのために、DPDの提案から四半世紀以上にわたって、より一般的なパラメトリック密度へのDPDに基づく推定の適用が妨げられてきた。本研究では、一般的なパラメトリック密度モデルに対してDPDを最小化する簡単な確率的最適化手法を提案し、確率的最適化に関する従来の理論を参照してその妥当性を説明する。提案手法は、非正規化モデルの助けを借りて、別の密度パワーに基づく$\gamma$-ダイバージェンスの最小化にも適用できる。

Density power divergence (DPD) [Basu et al. (1998), Biometrika], which is designed to estimate the underlying distribution of the observations robustly against outliers, comprises an integral term of the power of the parametric density models to be estimated. While the explicit form of the integral term can be obtained for some specific densities (such as normal density and exponential density), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities, over a quarter of a century since the proposal of DPD. This study proposes a simple stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach also can be applied to the minimization of another density power-based $\gamma$-divergence with the aid of unnormalized models.
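
For readers unfamiliar with DPD, the objective and the reason a stochastic treatment is natural can be written out as follows. The notation ($f_\theta$ for the model density, $\alpha > 0$ for the tuning parameter, $n$ observations $x_i$, $m$ model samples $z_j$) is assumed here and may differ from the paper's; the second display is a standard identity, not necessarily the exact estimator the author uses.

```latex
% Empirical DPD objective (terms not depending on \theta are dropped)
\[
  L_\alpha(\theta)
  = \int f_\theta(z)^{1+\alpha}\,dz
  - \Bigl(1 + \frac{1}{\alpha}\Bigr)\frac{1}{n}\sum_{i=1}^{n} f_\theta(x_i)^{\alpha}
\]

% The integral term is an expectation under the model itself, so it can be
% approximated by Monte Carlo with samples z_1, \dots, z_m drawn from f_\theta
\[
  \int f_\theta(z)^{1+\alpha}\,dz
  = \mathbb{E}_{Z \sim f_\theta}\bigl[f_\theta(Z)^{\alpha}\bigr]
  \approx \frac{1}{m}\sum_{j=1}^{m} f_\theta(z_j)^{\alpha}
\]
```

The first term is the integral that has a closed form only for special families such as the normal density; replacing it with a sample average is what makes the objective usable for general parametric models.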

翻訳日:2023-07-21 17:20:13 公開日:2023-07-20

# ディジタルゼロノイズ外挿による量子誤差緩和のベストプラクティス

Best practices for quantum error mitigation with digital zero-noise extrapolation ( http://arxiv.org/abs/2307.05203v2 )

ライセンス: Link先を確認

Ritajit Majumdar and Pedro Rivero and Friederike Metz and Areeq Hasan and Derek S Wang

(参考訳) デジタルゼロノイズ外挿法(dZNE)は、その概念的な単純さ、利用しやすさ、資源効率のために、量子誤り緩和(QEM)の一般的なアプローチとして登場した。しかし実際には、ノイズの多い量子プロセッサの計算範囲を拡張するためにdZNEを適切に適用するには、多くの微妙な点がある。ここでは、文献レビューと、ノイズのあるシミュレータおよび実量子ハードウェアでの独自の実験に基づいて、ノイズ増幅、量子デバイス上での実行、ゼロノイズ極限への外挿、他のQEM手法との組み合わせなど、ワークフローの各ステップにおけるdZNEによるQEMのベストプラクティスを定義する。 dZNEのベストプラクティスを確立するこの取り組みが他のQEM手法にも拡張され、ノイズの多い量子ハードウェア上でより再現性が高く厳密な計算につながることを期待している。

Digital zero-noise extrapolation (dZNE) has emerged as a common approach for quantum error mitigation (QEM) due to its conceptual simplicity, accessibility, and resource efficiency. In practice, however, properly applying dZNE to extend the computational reach of noisy quantum processors is rife with subtleties. Here, based on literature review and original experiments on noisy simulators and real quantum hardware, we define best practices for QEM with dZNE for each step of the workflow, including noise amplification, execution on the quantum device, extrapolation to the zero-noise limit, and composition with other QEM methods. We anticipate that this effort to establish best practices for dZNE will be extended to other QEM methods, leading to more reproducible and rigorous calculations on noisy quantum hardware.
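
The extrapolation step of the workflow can be sketched in a few lines: expectation values measured at several amplified noise levels are fitted and the fit is evaluated at zero noise. The sketch below assumes a straight-line model and made-up measurement values; the scale factors 1, 3, 5 are a common choice for gate folding but are an assumption here, and the paper's recommended extrapolation model is not reproduced.

```python
def extrapolate_to_zero_noise(scale_factors, expectation_values):
    """Least-squares straight-line fit, evaluated at noise scale factor 0."""
    n = len(scale_factors)
    mean_x = sum(scale_factors) / n
    mean_y = sum(expectation_values) / n
    sxx = sum((x - mean_x) ** 2 for x in scale_factors)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(scale_factors, expectation_values)
    )
    slope = sxy / sxx
    # The intercept of the fitted line is the zero-noise estimate.
    return mean_y - slope * mean_x


scale_factors = [1.0, 3.0, 5.0]    # noise amplification factors (assumed)
noisy_values = [0.90, 0.70, 0.50]  # made-up expectation values at each factor

estimate = extrapolate_to_zero_noise(scale_factors, noisy_values)
print(estimate)
```

A linear fit is the simplest option; with more scale factors one could fit a higher-order polynomial or an exponential instead, at the cost of greater sensitivity to shot noise.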

翻訳日:2023-07-21 17:19:55 公開日:2023-07-20

# Solvent: タンパク質のフォールディングのためのフレームワーク

Solvent: A Framework for Protein Folding ( http://arxiv.org/abs/2307.04603v4 )

ライセンス: Link先を確認

Jaemyung Lee, Kyeongtak Han, Jaehoon Kim, Hasun Yu, Youhan Lee

(参考訳) AI研究を行うには一貫性と信頼性が不可欠である。オブジェクト検出のような多くの有名な研究分野は、堅固なベンチマークフレームワークで比較、検証されてきた。 AlphaFold2以降、タンパク質の折り畳みタスクは新しい段階に入り、AlphaFold2の構成要素に基づいて多くの手法が提案されている。タンパク質折り畳みにおける統一的な研究フレームワークの重要性は、さまざまなアプローチを一貫して公平に比較するための実装とベンチマークを備える点にある。これを実現するために、最先端モデルの重要なコンポーネントを既製のインターフェイスとしてサポートするタンパク質折り畳みフレームワークSolventを提案する。 Solventは、統一されたコードベースに実装された異なるモデルを含み、定義されたモデルを同じデータセット上で訓練・評価することをサポートする。我々は、よく知られたアルゴリズムとそのコンポーネントをベンチマークし、タンパク質構造モデリング分野に有益な洞察を与える実験を提供する。 Solventが、提案されるモデルの信頼性と一貫性を高め、速度とコストの両面で効率を向上させ、タンパク質折り畳みモデリング研究を加速することを期待する。コードはhttps://github.com/kakaobrain/solventで入手でき、プロジェクトは今後も開発が続けられる。

Consistency and reliability are crucial for conducting AI research. Many famous research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods are proposed based on the component of AlphaFold2. The importance of a unified research framework in protein folding lies in providing implementations and benchmarks to consistently and fairly compare various approaches. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and give efficiency in both speed and costs, resulting in acceleration on protein folding modeling research. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed.

翻訳日:2023-07-21 17:19:18 公開日:2023-07-20

# 不確かさサンプリングを理解する

Understanding Uncertainty Sampling ( http://arxiv.org/abs/2307.02719v3 )

ライセンス: Link先を確認

Shang Liu, Xiaocheng Li

(参考訳) 不確実性サンプリングは、現在の予測モデルが確信を持てないデータサンプルのアノテーションを逐次的に問い合わせる、広く使われている能動学習アルゴリズムである。しかし、不確実性サンプリングの使用は概ねヒューリスティックであった。 (i) 特定の損失の下での特定のタスクに対する「不確実性」の適切な定義について合意がない。 (ii) アルゴリズムを実装するための標準的な手順を規定する理論的保証がない。例えば、確率的勾配降下法のような最適化アルゴリズムの枠組みの下で、逐次的に到着する注釈付きデータをどう扱うか、といった点である。本研究では、ストリームベースとプールベースの両方の能動学習の下で、不確実性サンプリングアルゴリズムを体系的に検討する。使用する不確実性尺度と元の損失関数に依存する等価損失の概念を提案し、不確実性サンプリングアルゴリズムが本質的にこの等価損失を最適化していることを示す。この観点は、既存の不確実性尺度の妥当性を、代理性(surrogate property)と損失の凸性という2つの側面から検証する。さらに、不確実性尺度を設計するための新しい概念である「loss as uncertainty」を提案する。これは、特徴量が与えられたときの条件付き期待損失を不確実性尺度として用いるという考え方である。このような不確実性尺度は、優れた解析的性質と、分類問題と回帰問題の両方をカバーする一般性を備えており、基礎となるモデルと問題を完全に一般的に扱ったまま、ストリームベースとプールベースの両方の設定において、不確実性サンプリングアルゴリズムに対する最初の汎化境界を与えることができる。最後に、不確実性サンプリングアルゴリズムのいくつかの変種と、リスクに敏感な目的関数および分布的ロバスト性との関係を確立する。これにより、サンプルサイズが小さい場合の不確実性サンプリングアルゴリズムの利点を部分的に説明できる。

Uncertainty sampling is a prevalent active learning algorithm that queries sequentially the annotations of data samples which the current prediction model is uncertain about. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the used uncertainty measure and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called "loss as uncertainty". The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
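
A pool-based query step can be sketched as follows. The sketch assumes predictive entropy as the uncertainty measure, which is one common heuristic and not the "loss as uncertainty" measure proposed in the paper; the pool probabilities are made up and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs):
    """Return the index of the pool sample with the highest predictive entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


# Made-up model outputs for three unlabeled pool samples (binary classification).
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
]

query_index = select_most_uncertain(pool_probs)
print(query_index)
```

The sample whose prediction is closest to uniform is sent to the annotator; in a stream-based setting the same score would instead be compared against a threshold as each sample arrives.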

翻訳日:2023-07-21 17:18:11 公開日:2023-07-20

# $\nu^2$-flows:条件付き正規化流を伴うマルチニュートリノ最終状態における高速で改善されたニュートリノ再構成

$\nu^2$-Flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows ( http://arxiv.org/abs/2307.02405v2 )

ライセンス: Link先を確認

John Andrew Raine, Matthew Leigh, Knut Zoch, Tobias Golling

(参考訳) 本研究では、複数のニュートリノを含む終状態への$\nu$-Flows法の拡張である$\nu^2$-Flowsを導入する。このアーキテクチャは、任意のニュートリノ多重度に対して、終状態のオブジェクトの種類と多重度のあらゆる組み合わせにネイティブにスケールできる。 $t\bar{t}$ダイレプトン事象において、両ニュートリノの運動量とそれらの間の相関は、最も一般的な標準の解析的手法を用いた場合よりも正確に再構成され、すべての事象に対して解が得られる。推論時間は競合する手法よりも大幅に短く、GPU(グラフィックス処理ユニット)上で並列に評価することでさらに削減できる。 $\nu^2$-Flowsを$t\bar{t}$ダイレプトン事象に適用し、アンフォールディング後の分布におけるビンごとの不確かさが、標準的な手法よりも、完全なニュートリノ再構成による性能の限界にはるかに近いことを示す。選択した二重微分観測量について、$\nu^2$-Flowsは各ビンの統計的精度を、Neutrino Weighting法と比較して1.5倍から2倍、Ellipse法と比較して最大4倍改善する。

In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows method to final states containing multiple neutrinos. The architecture can natively scale for all combinations of object types and multiplicities in the final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton events, the momenta of both neutrinos and correlations between them are reconstructed more accurately than when using the most popular standard analytical techniques, and solutions are found for all events. Inference time is significantly faster than competing methods, and can be reduced further by evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to $t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded distributions are much closer to the limit of performance set by perfect neutrino reconstruction than standard techniques. For the chosen double differential observables $\nu^2$-Flows results in improved statistical precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method and up to a factor of four in comparison to the Ellipse approach.

翻訳日:2023-07-21 17:17:44 公開日:2023-07-20

# 局所固有次元を用いた深部拡散モデルによる画像の検出

Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality ( http://arxiv.org/abs/2307.02347v3 )

ライセンス: Link先を確認

Peter Lorenz, Ricard Durall and Janis Keuper

(参考訳) 近年,非常にリアルな画像の視覚的合成に拡散モデルが適用されている。これにより、悪質な目的に対する潜在的な懸念が高まる。本稿では,合成画像の自動検出とそれに基づく生成ネットワークの同定のために,元来,敵対例の検出の文脈で開発された軽量なマルチローカル固有次元(multiLID)を提案する。 GAN生成画像に対してのみ動作する多くの既存の検出手法とは対照的に,提案手法は現実的なユースケースの多くにおいて,ほぼ完璧な検出結果を提供する。既知のデータセットと新たに作成されたデータセットに関する広範な実験は、提案手法が拡散検出とモデル同定において優れていることを示している。生成画像の検出に関する最近の出版物の実証的評価は、主に「lsun-bedroom」データセットに焦点を当てているため、画像サイズが異なる複数の拡散モデルからのサンプルを含む拡散生成画像の検出に関する包括的なベンチマークを確立する。

Diffusion models recently have been successfully applied for the visual synthesis of strikingly realistic appearing images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which has been originally developed in context of the detection of adversarial examples, for the automatic detection of synthetic images and the identification of the according generator networks. In contrast to many existing detection approaches, which often only work for GAN-generated images, the proposed method provides close to perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification. Since the empirical evaluations of recent publications on the detection of generated images are often mainly focused on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes.

翻訳日:2023-07-21 17:17:24 公開日:2023-07-20

# 構成・プライバシー・削除のためのタンジェント変換器

Tangent Transformers for Composition, Privacy and Removal ( http://arxiv.org/abs/2307.08122v2 )

ライセンス: Link先を確認

Tian Yu Liu, Aditya Golatkar and Stefano Soatto

(参考訳) 本稿では、事前学習済みの初期値のまわりで1次テイラー展開を計算して得られる線形化トランスフォーマーを微調整する手法、Tangent Attention Fine-Tuning(TAFT)を紹介する。線形化から生じるヤコビアン・ベクトル積は1回の順伝播で効率的に計算でき、同じ数のパラメータを用いながら、訓練と推論のコストを元の非線形モデルと同じ桁に抑えられることを示す。さらに、下流のさまざまな視覚分類タスクに適用した場合、TAFTで微調整したTangent Transformerは、元の非線形ネットワークの微調整と同等の性能を発揮できることを示す。 Tangent Transformerは新しい重みの集合に関して線形であり、結果として得られる微調整損失は凸であるため、モデル合成、並列訓練、マシンアンラーニング、差分プライバシーに関して、TAFTは非線形の微調整に比べていくつかの利点を持つことを示す。

We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy.
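
The linearization the abstract refers to is the ordinary first-order Taylor expansion in the weights. Written out, with notation assumed here ($f$ for the network, $w_0$ for the pre-trained weights, $w$ for the fine-tuned weights, $J_w$ for the Jacobian with respect to the weights):

```latex
\[
  f_{\mathrm{lin}}(x; w)
  = f(x; w_0) + J_w f(x; w_0)\,(w - w_0)
\]
```

The second term is a Jacobian-vector product with the direction $w - w_0$. Because $f_{\mathrm{lin}}$ is affine in $w$, composing it with a convex loss gives an objective that is convex in $w$, which is the property the abstract links to composition, parallel training, unlearning, and differential privacy.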

翻訳日:2023-07-21 17:09:04 公開日:2023-07-20

# ジオメトリ誘導クロスビュートランスによる3次元地対衛星カメラ位置推定精度の向上

Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer ( http://arxiv.org/abs/2307.08015v3 )

ライセンス: Link先を確認

Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit Vora, and Hongdong Li

(参考訳) 画像検索に基づくクロスビューローカライズ手法は、データベース衛星画像のサンプリング密度が限られているため、非常に粗いカメラポーズ推定につながることが多い。本稿では、地上画像と、それにマッチング(検索)された衛星画像との間の相対的な回転と並進を推定することにより、地上カメラの位置と向きの精度を向上させる手法を提案する。本手法では、従来の幾何学と学習可能なクロスビュートランスフォーマーの利点を組み合わせた幾何誘導クロスビュートランスフォーマーを設計し、地上ビューの観測をオーバヘッドビューにマッピングする。合成したオーバヘッドビューと観測された衛星特徴マップが与えられると、強いグローバル情報埋め込み能力を持つニューラルポーズオプティマイザを構築し、両者の相対回転を推定する。回転を整列した後、不確実性誘導の空間相関を用いて車両位置の確率マップを生成し、そこから相対並進を決定する。実験の結果、本手法は最先端技術を大幅に上回った。特に、クロスビューKITTIデータセットにおいて、車両の横方向のポーズが正解(GT)値から1m以内に収まる確率は$35.54\%$から$76.44\%$に、車両の向きがGT値から$1^{\circ}$以内に収まる確率は$19.64\%$から$99.10\%$に改善された。

Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image. Our approach designs a geometry-guided cross-view transformer that combines the benefits of conventional geometry and learnable cross-view transformers to map the ground-view observations to an overhead view. Given the synthesized overhead view and observed satellite feature maps, we construct a neural pose optimizer with strong global information embedding ability to estimate the relative rotation between them. After aligning their rotations, we develop an uncertainty-guided spatial correlation to generate a probability map of the vehicle locations, from which the relative translation can be determined. Experimental results demonstrate that our method significantly outperforms the state-of-the-art. Notably, the likelihood of restricting the vehicle lateral pose to be within 1m of its Ground Truth (GT) value on the cross-view KITTI dataset has been improved from $35.54\%$ to $76.44\%$, and the likelihood of restricting the vehicle orientation to be within $1^{\circ}$ of its GT value has been improved from $19.64\%$ to $99.10\%$.

翻訳日:2023-07-21 17:08:46 公開日:2023-07-20

# Few-Shot Sequence Labelingにおけるトークンとスパンレベルの統一化

Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling ( http://arxiv.org/abs/2307.07946v2 )

ライセンス: Link先を確認

Zifeng Cheng, Qingyu Zhou, Zhiwei Jiang, Xuemin Zhao, Yunbo Cao, Qing Gu

(参考訳) 少数ショットシーケンスラベリングは、少数のラベル付きサンプルのみに基づいて新しいクラスを特定することを目的としている。既存の手法は、主にメトリック学習に基づくトークンレベルまたはスパンレベルのラベリングモデルを設計することで、データ不足の問題を解決する。しかしながら、これらの方法は単一の粒度(トークンレベルまたはスパンレベル)でのみ訓練され、対応する粒度に由来する弱点がある。本稿では,まずトークンレベルとスパンレベルの監視を統一し,少数ショットシーケンスラベリングのためのConsistent Dual Adaptive Prototypical (CDAP)ネットワークを提案する。 CDAPにはトークンレベルとスパンレベルのネットワークが含まれており、異なる粒度で共同で訓練される。 2つのネットワークの出力を整合させるために,相互に学習できるようにする一貫性損失をさらに提案する。推論段階では,まず予測確率を調整し,次に最大確率の非重複スパンを貪欲に選択する一貫性のある貪欲推論アルゴリズムを提案する。大規模実験の結果,3つのベンチマークデータセットにおいて,新たな最先端結果が得られた。

Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples. Existing methods solve the data scarcity problem mainly by designing token-level or span-level labeling models based on metric learning. However, these methods are only trained at a single granularity (i.e., either token level or span level) and have some weaknesses of the corresponding granularity. In this paper, we first unify token and span level supervisions and propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling. CDAP contains the token-level and span-level networks, jointly trained at different granularities. To align the outputs of two networks, we further propose a consistent loss to enable them to learn from each other. During the inference phase, we propose a consistent greedy inference algorithm that first adjusts the predicted probability and then greedily selects non-overlapping spans with maximum probability. Extensive experiments show that our model achieves new state-of-the-art results on three benchmark datasets.

翻訳日:2023-07-21 17:08:19 公開日:2023-07-20
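The greedy inference step described in the abstract above (pick non-overlapping spans in order of decreasing probability) can be sketched as follows. This is only an illustrative sketch, not the CDAP implementation: the `Span` tuple layout, the function name `greedy_select`, and the toy candidates are assumptions, and the probability adjustment that the abstract says precedes selection is omitted.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_inclusive, label, probability)
Span = Tuple[int, int, str, float]


def greedy_select(spans: List[Span]) -> List[Span]:
    """Keep spans in descending probability, skipping any that overlap a kept span."""
    selected: List[Span] = []
    for cand in sorted(spans, key=lambda s: s[3], reverse=True):
        start, end = cand[0], cand[1]
        # Two spans are disjoint if one ends before the other starts.
        if all(end < kept[0] or start > kept[1] for kept in selected):
            selected.append(cand)
    # Return the kept spans in reading order.
    return sorted(selected, key=lambda s: s[0])


# Toy candidates over a 4-token sentence (values are made up).
candidates = [
    (0, 1, "PER", 0.90),
    (1, 2, "ORG", 0.60),
    (3, 3, "LOC", 0.75),
    (2, 3, "ORG", 0.40),
]
result = greedy_select(candidates)
print(result)
```

With these toy values the two lower-probability ORG spans are dropped because each overlaps a span that was kept earlier.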

# 確率的政策実行不確実性を考慮した効果的な行動ロバスト強化学習

Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty ( http://arxiv.org/abs/2307.07666v2 )

ライセンス: Link先を確認

Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai

(参考訳) ロバスト強化学習(RL)は、不確実性に直面した最悪ケースの性能を最適化するポリシーを見つけることを目的としている。本稿では,ポリシーに規定される行動を常に実行する代わりに,エージェントがポリシーに指定された行動を確率$1-\rho$で実行し,確率$\rho$で代替の敵対的行動を行う、確率的ポリシー実行の不確実性を伴う行動ロバストRLに焦点を当てる。確率的ポリシー実行の不確実性を持つ行動ロバストMDPに対する最適ポリシーの存在を確立し,その解に対して行動ロバストなベルマン最適性方程式を提供する。さらに、ミニマックス最適な後悔とサンプル複雑性を達成するAction Robust Reinforcement Learning with Certificates (ARRLC)アルゴリズムを開発した。最後に,本手法のロバスト性を検証するために数値実験を行い,ARRLCが非ロバストRLアルゴリズムよりも優れ,行動摂動の存在下でロバストTDアルゴリズムよりも高速に収束することを示す。

Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity. Finally, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.

翻訳日:2023-07-21 17:08:01 公開日:2023-07-20
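To make the execution model in the abstract above concrete, the following is a minimal sketch of value iteration under probabilistic policy execution uncertainty. It is not the ARRLC algorithm. It assumes a known, discounted, tabular MDP (the abstract does not state the setting), and it assumes the robust backup takes the form $V(s) = (1-\rho)\max_a Q(s,a) + \rho\min_b Q(s,b)$, which follows from the agent's action being executed with probability $1-\rho$ and a worst-case action with probability $\rho$. The names `robust_value_iteration`, `P`, and `R` are illustrative.

```python
def robust_value_iteration(P, R, rho, gamma=0.9, iters=500):
    """Value iteration where the adversary replaces the action with probability rho.

    P[s][a] is a list of next-state probabilities, R[s][a] is the immediate reward.
    """
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(iters):
        new_V = []
        for s in range(n_states):
            # Q-values of every action in state s under the current value estimate.
            q = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(R[s]))
            ]
            # Agent's best action w.p. 1 - rho, adversary's worst action w.p. rho.
            new_V.append((1 - rho) * max(q) + rho * min(q))
        V = new_V
    return V


# Toy one-state MDP: action 0 pays 1, action 1 pays 0, both loop back to state 0.
P = [[[1.0], [1.0]]]
R = [[1.0, 0.0]]
v_robust = robust_value_iteration(P, R, rho=0.2)
v_nominal = robust_value_iteration(P, R, rho=0.0)
print(v_robust, v_nominal)
```

In this toy case the fixed point is $(1-\rho)/(1-\gamma)$, so a larger $\rho$ directly lowers the value the agent can guarantee.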

# ロバスト容積分節化のための周波数領域adversarial training

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation ( http://arxiv.org/abs/2307.07269v2 )

ライセンス: Link先を確認

Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

(参考訳) 医療などの重要な応用において、ディープラーニングモデルの堅牢性を確保することが不可欠である。近年の深層学習の進歩により, ボリューム画像分割モデルの性能は向上しているが, 敵攻撃に対する脆弱性のため, 現実のアプリケーションに即時に展開することはできない。本稿では,3次元周波数領域対向攻撃をボリューム画像分割モデルに適用し,従来型の入力領域やボクセル領域攻撃に対する利点を示す。提案手法を用いて,voxelおよび周波数領域攻撃に対するロバストモデルを最適化する新しい周波数領域敵訓練手法を提案する。さらに, クリーンサンプルと逆サンプルのモデル性能のトレードオフを改善するために, 周波数領域敵訓練を規制するために, 周波数一貫性の損失を提案する。コードはhttps://github.com/asif-hanif/vafaで公開されている。

It is imperative to ensure the robustness of deep learning models in critical applications such as healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose a frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between the model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.

翻訳日:2023-07-21 17:07:40 公開日:2023-07-20

# 中性窒素空洞中心における軌道状態のコヒーレント電界制御

Coherent Electric-Field Control of Orbital state in a Neutral Nitrogen-Vacancy Center ( http://arxiv.org/abs/2307.07198v2 )

ライセンス: Link先を確認

Hodaka Kurokawa, Keidai Wakamatsu, Shintaro Nakazato, Toshiharu Makino, Hiromitsu Kato, Yuhei Sekiguchi, and Hideo Kosaka

(参考訳) 軌道状態のコヒーレント制御は、ダイヤモンドの色中心において極めて低電力操作を実現するために重要である。ここでは、電場による軌道制御の理想的なシステムとして、中和された窒素空孔中心であるNV$^0$を提案する。我々は、NV$^0$の基底状態における電気感受性を、NV$^-$の励起状態における電気感受性と同等に推定する。また、NV$^0$の軌道状態のコヒーレント制御を示す。軌道制御に必要な電力はスピン制御よりも3桁小さく、希釈冷凍機で作動する超伝導量子ビットと対面する可能性を強調している。

The coherent control of the orbital state is crucial for color centers in diamonds for realizing extremely low-power manipulation. Here, we propose the neutrally charged nitrogen-vacancy center, NV$^0$, as an ideal system for orbital control through electric fields. We estimate electric susceptibility in the ground state of NV$^0$ to be comparable to that in the excited state of NV$^-$. Also, we demonstrate coherent control of the orbital states of NV$^0$. The required power for orbital control is three orders of magnitude smaller than that for spin control, highlighting the potential for interfacing a superconducting qubit operated in a dilution refrigerator.

翻訳日:2023-07-21 17:07:25 公開日:2023-07-20

# ディープニューラルネットワークにおける量的clt

Quantitative CLTs in Deep Neural Networks ( http://arxiv.org/abs/2307.06092v2 )

ライセンス: Link先を確認

Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati

(参考訳) 隠れ層の幅が大きな定数$n$に比例する、ランダムなガウス重みとバイアスを持つ完全連結ニューラルネットワークの分布について検討する。非線形性に関する穏やかな仮定の下で、大きいが有限の$n$と任意の固定されたネットワーク深さで有効な、正規近似の量的境界を得る。我々の定理は、有限次元分布と全過程の両方について、ランダムな完全連結ネットワーク(とその微分)と対応する無限幅ガウス過程の間の距離が、$\gamma>0$に対して$n^{-\gamma}$のようにスケールし、その指数は差異を測る距離に依存することを示す。我々の境界は、ネットワーク幅への依存性の点で、これまでの文献のどの結果よりも真に強く、一次元の場合、それらが最適であること、すなわち一致する下界を確立することも証明する。

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.

翻訳日:2023-07-21 17:06:55 公開日:2023-07-20

# 表面電子に基づく非断熱的ホロノミック量子ゲート

Nonadiabatic holonomic quantum gates based on the surface electron ( http://arxiv.org/abs/2307.09900v2 )

ライセンス: Link先を確認

Jun Wang, Hai-Bo Wang, Qing Ai

(参考訳) 幾何学位相に基づく非断熱ホロノミック量子計算は、内蔵ノイズとデコヒーレンスに対して堅牢である。本研究では, 量子計算のための有望な2次元プラットフォームである表面電子系において, 非断熱ホロノミック量子ゲートを実現するためのスキームを理論的に提案する。ホロノミックゲートは、リュードベリ状態とスピン状態が不均一磁場を介して結合する3準位構造によって実現される。循環進化の後、計算基底は異なる幾何学的位相を獲得し、幾何学的ゲートを実行する。スピンアップの電子のみが幾何ゲートを受け、スピンダウンの電子は状態選択駆動場から分離される。 続いて、リュードベリ状態とスピン状態に符号化された制御NOTゲートを実現する。出力状態の忠実度は、実験的に達成可能なパラメータで 0.99 を超える。

The nonadiabatic holonomic quantum computation based on the geometric phase is robust against the built-in noise and decoherence. In this work, we theoretically propose a scheme to realize nonadiabatic holonomic quantum gates in a surface electron system, which is a promising two-dimensional platform for quantum computation. The holonomic gate is realized by a three-level structure that combines the Rydberg states and spin states via an inhomogeneous magnetic field. After a cyclic evolution, the computation bases pick up different geometric phases and thus perform a geometric gate. Only the electron with spin up experiences the geometric gate, while the electron with spin down is decoupled from the state-selective driving fields. The controlled-NOT gate encoded on the Rydberg states and spin states is then put into practice. The fidelity of the output state exceeds 0.99 with experimentally achievable parameters.

翻訳日:2023-07-21 17:00:38 公開日:2023-07-20

# AesPA-Net:美的パターン認識型転送ネットワーク

AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks ( http://arxiv.org/abs/2307.09724v2 )

ライセンス: Link先を確認

Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun

(参考訳) 対象のスタイルを芸術的に表現するために、近年の研究では、スタイル画像の局所パッチをコンテンツ画像の対応するパッチにマッピングする能力により、注意機構を活用している。しかし、任意の内容とアートワークのセマンティックな対応が低いため、アテンションモジュールはスタイルイメージから特定のローカルパッチを乱用し、不調和で明らかな反復的なアーティファクトをもたらす。この制限を克服し,芸術的なスタイルの伝達を困難にするため,注意機構の強化とスタイルを整理するパターンのリズムの獲得に重点を置いている。本稿では,スタイル画像におけるパターンの反復を定量化する新しい指標であるパターン反復可能性について述べる。このパターン再現性に基づき,局所的およびグローバル的表現のスイートスポットを探索する美的パターン認識型転送ネットワーク(aespa-net)を提案する。さらに,注意機構が正確で意味のある意味的対応を学習することを奨励する,新たな自己監督タスクを提案する。最後に,局所パターンの精巧なリズムを伝達するためにパッチワイズスタイルロスを導入する。定量的に定量的な評価を行い,人間の知覚に適合するパターン再現性の信頼性を検証し,提案手法の優れていることを示す。

To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and accomplish impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. Based on the pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot of local and global style expressions. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce the patch-wise style loss to transfer the elaborate rhythm of local patterns. Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability that aligns with human perception, and demonstrate the superiority of the proposed framework.

翻訳日:2023-07-21 17:00:26 公開日:2023-07-20

# 大規模言語モデルのための効率的誘導生成

Efficient Guided Generation for Large Language Models ( http://arxiv.org/abs/2307.09702v2 )

ライセンス: Link先を確認

Brandon T. Willard and Rémi Louf

(参考訳) 本稿では,正規表現と文脈自由文法を用いた言語モデルテキスト生成のための効率的な手法について述べる。我々のアプローチはトークンシーケンス生成プロセスにほとんどオーバーヘッドを課さず、ガイド生成を実際に実現可能にする。実装はオープンソースのPythonライブラリOutlinesで提供されている。

In this article we describe an efficient approach to guiding language model text generation with regular expressions and context-free grammars. Our approach adds little to no overhead to the token sequence generation process, and makes guided generation feasible in practice. An implementation is provided in the open source Python library Outlines.

翻訳日:2023-07-21 17:00:03 公開日:2023-07-20
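One way regex-guided generation can be kept cheap is to precompute, for every state of a finite-state machine equivalent to the regular expression, which vocabulary tokens are allowed next. The sketch below illustrates that idea only; it does not use or reproduce the Outlines API. The hand-written DFA (for the pattern `[0-9]+\.[0-9]+`), the toy vocabulary, the made-up scores, and all function names are assumptions for illustration.

```python
DIGITS = "0123456789"

# Hand-written DFA for [0-9]+\.[0-9]+ ; state 3 is the only accepting state.
TRANSITIONS = {
    0: {"digit": 1},
    1: {"digit": 1, "dot": 2},
    2: {"digit": 3},
    3: {"digit": 3},
}
ACCEPTING = {3}


def char_class(ch):
    """Map a character to the DFA's input alphabet (None if not in the alphabet)."""
    if ch in DIGITS:
        return "digit"
    if ch == ".":
        return "dot"
    return None


def walk(state, token):
    """Return the DFA state after consuming a whole token, or None if it is rejected."""
    for ch in token:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return None
    return state


def build_index(vocab):
    """Precompute, per DFA state, the allowed tokens and the state each one leads to."""
    index = {}
    for state in TRANSITIONS:
        allowed = {}
        for tok in vocab:
            nxt = walk(state, tok)
            if nxt is not None:
                allowed[tok] = nxt
        index[state] = allowed
    return index


def guided_decode(scores, index, max_tokens):
    """Greedy decoding restricted to the tokens the index allows in the current state."""
    state, out = 0, []
    for _ in range(max_tokens):
        allowed = index[state]
        if not allowed:
            break
        tok = max(allowed, key=lambda t: scores.get(t, 0.0))
        out.append(tok)
        state = allowed[tok]
    return "".join(out), state in ACCEPTING


vocab = ["1", "23", ".", ".5", "a", "4.2", "hello"]
index = build_index(vocab)
# Made-up scores standing in for a language model's logits.
scores = {"hello": 5.0, "a": 4.0, ".5": 3.0, "23": 2.0, "1": 1.0, ".": 0.5, "4.2": 0.1}
text, accepted = guided_decode(scores, index, max_tokens=2)
print(text, accepted)
```

Because the index is built once per pattern, each decoding step reduces to a dictionary lookup on the current state; the highest-scoring tokens (`hello`, `a`) are never chosen here because no state allows them.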

# ドメイン適応に基づく雨天・霧天における自律走行検出の強化

Domain Adaptation based Enhanced Detection for Autonomous Driving in Foggy and Rainy Weather ( http://arxiv.org/abs/2307.09676v2 )

ライセンス: Link先を確認

Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu

(参考訳) 通常、教師付き学習に依存する自律運転のための物体検出法は、トレーニングとテストデータの間で一貫した特徴分布を仮定するが、異なる気象条件下では失敗する可能性がある。ドメインギャップのため、晴れた天候下で訓練された検出モデルは、霧や雨の条件下ではうまく機能しない可能性がある。霧や雨の天候で検出のボトルネックを克服することは、野生に展開する自動運転車にとって真の課題だ。霧と雨天の領域間隙を橋渡しし、オブジェクト検出の性能を向上させるため、ドメイン適応オブジェクト検出のための新しいフレームワークを提案する。画像レベルとオブジェクトレベルの両方での適応は、画像スタイルの違いとドメイン間のオブジェクトの出現を最小化することを目的としている。さらに, 課題事例に対するモデルの性能向上のために, ドメイン適応に加えて, 困難な事例に対して, 敵地雷を行う新たな逆勾配反転層を導入する。さらに,新たな領域レベルの計量正規化を実施するために,データ拡張による補助ドメインの生成を提案する。公開v2vベンチマークにおける実験結果は、特に霧や雨の運転シナリオにおける物体検出の大幅な向上を示している。

Typically, object detection methods for autonomous driving that rely on supervised learning assume a consistent feature distribution between the training and testing data; however, such an assumption may fail in different weather conditions. Due to the domain gap, a detection model trained under clear weather may not perform well in foggy and rainy conditions. Overcoming detection bottlenecks in foggy and rainy weather is a real challenge for autonomous vehicles deployed in the wild. To bridge the domain gap and improve the performance of object detection in foggy and rainy weather, this paper presents a novel framework for domain-adaptive object detection. The adaptations at both the image-level and object-level are intended to minimize the differences in image style and object appearance between domains. Furthermore, in order to improve the model's performance on challenging examples, we introduce a novel adversarial gradient reversal layer that conducts adversarial mining on difficult instances in addition to domain adaptation. Additionally, we suggest generating an auxiliary domain through data augmentation to enforce a new domain-level metric regularization. Experimental findings on a public V2V benchmark exhibit a substantial enhancement in object detection specifically for foggy and rainy driving scenarios.

翻訳日:2023-07-21 16:59:58 公開日:2023-07-20

# 多変量可変チャネル時系列の多視点自己教師型学習

Multi-view self-supervised learning for multivariate variable-channel time series ( http://arxiv.org/abs/2307.09614v2 )

ライセンス: Link先を確認

Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

(参考訳) 多変量生物医学時系列データのラベル付けは、退屈で高価なプロセスである。自己教師付きコントラスト学習は、ラベルなしデータの事前トレーニングを通じて、大きなラベル付きデータセットの必要性を軽減する。しかし、多変量時系列データの場合、入力チャネルの集合はアプリケーションによって異なり、既存の作業の多くは異なる入力チャネルの集合を持つデータセット間の転送を許さない。入力チャネルを個別に操作するための1つのエンコーダの学習を提案する。次に、メッセージパッシングニューラルネットワークを使用して、チャネル間の単一の表現を抽出する。 6つのEEGチャネルを持つデータセット上でモデルを事前学習し、2つの異なるEEGチャネルを持つデータセット上でそれを微調整することで、この手法の可能性を示す。我々は、異なるコントラスト損失関数にまたがるメッセージパッシングニューラルネットワークとモデルを比較する。 TS2Vecの損失と組み合わせることで、ほとんどの設定で他のメソッドよりも優れていることを示す。

Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.

翻訳日:2023-07-21 16:59:38 公開日:2023-07-20

# 学習に基づく地形とロボット認識ダイナミクスモデルによるコンテキスト条件ナビゲーション

Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model ( http://arxiv.org/abs/2307.09206v2 )

ライセンス: Link先を確認

Suresh Guttikonda, Jan Achterhold, Haolong Li, Joschka Boedecker, Joerg Stueckler

(参考訳) 自律的なナビゲーション設定では、いくつかの量にはバリエーションがある。摩擦係数などの地形特性は、ロボットの位置によって時間によって変化する。また、ロボットのダイナミクスは、例えば、異なるペイロード、システムの質量の変更、摩耗と涙、アクチュエータのゲインの変化、関節摩擦などによって変化する可能性がある。したがって、自律エージェントはそのようなバリエーションに適応できるべきである。本稿では,その変動に適応できる新しい確率的,地形的,ロボット対応のフォワードダイナミクスモデルであるTRADYNを開発する。ニューラルプロセスに基づいたメタラーニングフォワードダイナミクスモデルの最近の進歩の上に構築されている。本手法は,一輪車のようなロボットと,空間的な摩擦係数の異なる異なる地形配置を用いて,シミュレーションによる2次元ナビゲーション環境で評価する。本実験では,非適応アブレーションモデルと比較して,長水平軌道予測のタスクに対する予測誤差が小さいことを示す。また,ナビゲーション計画の下流作業において,ロボットと地形特性を考慮に入れた制御効率の高い経路を計画する際の性能向上を示す。

In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.

翻訳日:2023-07-21 16:59:24 公開日:2023-07-20

# LA-Net:ラベル雑音下での表情認識のためのランドマーク認識学習

LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise ( http://arxiv.org/abs/2307.09023v3 )

ライセンス: Link先を確認

Zhiyu Wu, Jinshi Cui

(参考訳) 表情認識(FER)は、表情のあいまいさのため依然として難しい課題である。それに由来するノイズラベルは、実世界のシナリオでの性能を著しく損なう。この問題に対処するため,我々は2つの視点からラベルノイズの影響を軽減するために顔のランドマークを利用した新しいFERモデルであるLandmark-Aware Net (LA-Net)を提案する。まず、LA-Netは、表情空間の不確実性を抑えるためにランドマーク情報を使用し、各サンプルのラベル分布を近傍集約により構築し、訓練監督の質を向上させる。第二に、設計した表情・ランドマーク間のコントラスト損失を用いて、ランドマーク情報を表情表現に組み込む。強化された表情特徴抽出器はラベルノイズの影響を受けにくい。本手法は,余分な推論コストを発生させることなく,任意の深層ニューラルネットワークと統合してよりよい訓練監督を行うことができる。我々は,in-the-wildデータセットと合成ノイズデータセットの両方について広範な実験を行い,LA-Netが最先端の性能を達成することを示す。

Facial expression recognition (FER) remains a challenging task due to the ambiguity of expressions. The derived noisy labels significantly harm the performance in real-world scenarios. To address this issue, we present a new FER model named Landmark-Aware Net (LA-Net), which leverages facial landmarks to mitigate the impact of label noise from two perspectives. Firstly, LA-Net uses landmark information to suppress the uncertainty in expression space and constructs the label distribution of each sample by neighborhood aggregation, which in turn improves the quality of training supervision. Secondly, the model incorporates landmark information into expression representations using the devised expression-landmark contrastive loss. The enhanced expression feature extractor can be less susceptible to label noise. Our method can be integrated with any deep neural network for better training supervision without introducing extra inference costs. We conduct extensive experiments on both in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net achieves state-of-the-art performance.

翻訳日:2023-07-21 16:59:06 公開日:2023-07-20

# 個別データに基づく健康のためのマルチモーダルLCM

Multimodal LLMs for health grounded in individual-specific data ( http://arxiv.org/abs/2307.09018v2 )

ライセンス: Link先を確認

Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte

(参考訳) 基礎となる大規模言語モデル(LLM)は、健康を含む幅広い分野のタスクを解く素晴らしい能力を示している。パーソナライズされた健康タスクを効果的に解決するために、LLMは個人の健康状態に関連するさまざまなデータモダリティを抽出する能力が必要である。本稿では,マルチモーダル理解のための健康大言語モデル (helm: health large language model for multimodal understanding) を開発し,基礎疾患リスクを推定するために高次元臨床モダリティ(high-dimensional clinical modality)を活用することを可能にする。 HeLMは複雑なデータモダリティをLLMのトークン埋め込み空間にマッピングするエンコーダを学習し、データをテキストにシリアライズすることで表データのような単純なモダリティを符号化する。英国バイオバンクのデータを用いて,HeLMは高次元時系列データに加えて,人口統計学的,臨床的特徴を効果的に利用し,疾患リスクを推定できることを示した。例えば、HeLMは、表状データのみを使用する場合の0.49と比較して、表状データとスピログラムデータを組み合わせた場合の喘息予測のためのAUROCの0.75を達成している。全体として、Helmは8つのバイナリ特性から選択した古典的な機械学習アプローチよりも優れ、あるいは同等に動作する。さらに, 分布特性に対する一般化可能性や, 個人の健康と健康に関する会話を駆動する能力など, このモデルの下流利用について検討した。

Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space and for simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.

翻訳日:2023-07-21 16:58:48 公開日:2023-07-20

# サイクル一貫性に基づく教師なしディープグラフマッチング

Unsupervised Deep Graph Matching Based on Cycle Consistency ( http://arxiv.org/abs/2307.08930v2 )

ライセンス: Link先を確認

Siddharth Tourani, Carsten Rother and Muhammad Haris Khan and Bogdan Savchynskyy

(参考訳) 我々は,研究例の少ない教師なしディープグラフマッチングの領域と,画像のキーポイントマッチングへの応用に寄与する。標準の教師あり(supervised)アプローチとは対照的に、本手法ではキーポイント対間の正解対応は不要である。代わりに、同じオブジェクトカテゴリの画像間のマッチングの一貫性を強制することにより、自己教師される。マッチングと一貫性損失は離散的であるため、それらの微分は直接学習には使用できない。組合せ解法のブラックボックス微分に関する最近の結果に基づいて,本手法を原理的に構築することにより,この問題に対処する。この手法は任意のネットワークアーキテクチャや組合せ解法と互換性があるため,非常に柔軟である。実験により,本手法は教師なしグラフマッチングのための新しい最先端技術であることが示唆された。

We contribute to the sparsely populated area of unsupervised deep graph matching with application to keypoint matching in images. Contrary to the standard supervised approach, our method does not require ground truth correspondences between keypoint pairs. Instead, it is self-supervised by enforcing consistency of matchings between images of the same object category. As the matching and the consistency loss are discrete, their derivatives cannot be straightforwardly used for learning. We address this issue in a principled way by building our method upon the recent results on black-box differentiation of combinatorial solvers. This makes our method exceptionally flexible, as it is compatible with arbitrary network architectures and combinatorial solvers. Our experimental evaluation suggests that our technique sets a new state-of-the-art for unsupervised graph matching.

翻訳日:2023-07-21 16:58:17 公開日:2023-07-20
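The consistency signal mentioned in the abstract above can be illustrated with a small check: if keypoints of image A are matched to B, B to C, and C back to A, every keypoint should return to itself. The sketch below only counts violations of that round trip. It assumes full one-to-one matchings stored as index lists, and the names `compose` and `cycle_inconsistency` are hypothetical. It does not show how the paper turns such a discrete quantity into a training signal through black-box differentiation.

```python
def compose(*matchings):
    """Compose matchings given as lists, where m[i] is the index matched to keypoint i."""
    def follow(i):
        for m in matchings:
            i = m[i]
        return i

    return [follow(i) for i in range(len(matchings[0]))]


def cycle_inconsistency(ab, bc, ca):
    """Count keypoints of A that do not map back to themselves along A -> B -> C -> A."""
    round_trip = compose(ab, bc, ca)
    return sum(1 for i, j in enumerate(round_trip) if i != j)


# Toy matchings over three keypoints (values are made up).
ab = [1, 2, 0]
bc = [2, 0, 1]
ca_good = [0, 1, 2]
ca_bad = [1, 0, 2]

consistent = cycle_inconsistency(ab, bc, ca_good)
inconsistent = cycle_inconsistency(ab, bc, ca_bad)
print(consistent, inconsistent)
```

A count of zero means the three matchings agree with each other, which requires no ground-truth correspondences to verify.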

# DialogStudio: 会話型AIのための最もリッチで最も多様な統一データセットコレクションを目指して

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI ( http://arxiv.org/abs/2307.10172v2 )

ライセンス: Link先を確認

Jianguo Zhang and Kun Qian and Zhiwei Liu and Shelby Heinecke and Rui Meng and Ye Liu and Zhou Yu and Huan Wang and Silvio Savarese and Caiming Xiong

(参考訳) 会話AIの進歩にもかかわらず、言語モデルは多様な会話タスクを扱うための課題に直面し、既存の対話データセットコレクションは多様性と包括性を欠いていることが多い。これらの問題に対処するために,対話データセットの最大かつ最も多様なコレクションであるDialogStudioを紹介し,元の情報を保存しながら一貫したフォーマットで統一する。本コレクションは,オープンドメイン対話,タスク指向対話,自然言語理解,対話レコメンデーション,対話要約,知識基底対話などのデータを含む。 DialogStudioの実用性をさらに向上するため、各データセットのライセンスを特定し、選択した対話のためのドメイン対応プロンプトを設計し、命令対応の微調整を容易にする。さらに、データセット収集を用いて会話型AIモデルを構築し、ゼロショットおよび少数ショット学習シナリオにおける実験により、DialogStudioの優位性を実証した。透明性を改善し、データセットやタスクベースの研究、言語モデルの事前トレーニングをサポートするため、すべてのデータセット、ライセンス、コード、dialogstudioに関連するモデルがhttps://github.com/salesforce/dialogstudioで公開されている。

Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, codes, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio

翻訳日:2023-07-21 16:48:57 公開日:2023-07-20

# 人間計算アルゴリズムの労働者としてのLLM LLMによるクラウドソーシングパイプラインのレプリケーション

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs ( http://arxiv.org/abs/2307.10168v2 )

ライセンス: Link先を確認

Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

(参考訳) LLMは、以前は人間の能力専用と考えられていたクラウドソーシングタスクにおいて、人間のような行動の複製を約束している。しかし、現在の取り組みは主に単純な原子タスクに焦点を当てている。 LLMがより複雑なクラウドソーシングパイプラインを複製できるかどうかを検討する。これらの「ヒューマン・コンピュテーション・アルゴリズム」において、現代のLLMはクラウドワーカーの能力の一部をシミュレートできるが、成功のレベルは変動しており、サブタスクに必要な特定のスキル、サブタスクを実行するための最適な相互作用のモダリティによって影響される。我々は,指示に対する人間とllmの感性の違いを考察し,llmに対するヒューマンセーフガードの実現の重要性を強調し,人間とllmを相補的なスキルセットで訓練する可能性について論じる。重要なのは、クラウドソーシングパイプラインの複製が、(1)異なるタスクにおけるllmの相対的な強み(サブタスクでのパフォーマンスをクロス比較することによって)と(2)複雑なタスクにおけるllmsの潜在能力を調査するための価値のあるプラットフォームであることを示すことである。

LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.

翻訳日:2023-07-21 16:48:31 公開日:2023-07-20

# 屋内空間における車両位置のドローンナビゲーションとライセンス場所検出

Drone navigation and license place detection for vehicle location in indoor spaces ( http://arxiv.org/abs/2307.10165v2 )

ライセンス: Link先を確認

Moa Arvidsson, Sithichot Sawirot, Cristofer Englund, Fernando Alonso-Fernandez, Martin Torstensson, Boris Duran

(参考訳) 毎年何百万もの車両が輸送され、船やボートに密閉されている。火災などの関連する安全問題のリスクを軽減するためには、車両の位置を知ることが不可欠である。この研究の目的は、駐車中の車両の列を移動し、ナンバープレートを検出するナノドローンに基づくソリューションを作ることだ。壁追跡アルゴリズムと、ライセンスプレートを検出するために訓練されたCNNによって実現しています。すべての計算はドローン上でリアルタイムで行われ、位置と検出された画像を送るだけで、プレートの位置がついた2Dマップが作成できる。私たちのソリューションは、8つのテストケース(数列のプレート、異なるドローン速度、あるいは低光度)にまたがるすべてのプレートを、複数のドローンの旅の計測結果を集約することで読み取ることができます。

Millions of vehicles are transported every year, tightly parked in vessels or boats. To reduce the risks of associated safety issues like fires, knowing the location of vehicles is essential, since different vehicles may need different mitigation measures, e.g. electric cars. This work is aimed at creating a solution based on a nano-drone that navigates across rows of parked vehicles and detects their license plates. We do so via a wall-following algorithm, and a CNN trained to detect license plates. All computations are done in real-time on the drone, which just sends position and detected images that allow the creation of a 2D map with the position of the plates. Our solution is capable of reading all plates across eight test cases (with several rows of plates, different drone speeds, or low light) by aggregation of measurements across several drone journeys.

翻訳日:2023-07-21 16:48:10 公開日:2023-07-20

# 不均衡な医用画像認識のための病変領域へのクラスアテンション

Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition ( http://arxiv.org/abs/2307.10036v2 )

ライセンス: Link先を確認

Jia-Xin Zhuang, Jiabin Cai, Jianguo Zhang, Wei-shi Zheng and Ruixuan Wang

(参考訳) 医用画像の自動分類はインテリジェント診断システムにおいて重要な要素である。しかし、ほとんどの医療画像データセットには、一般的な疾患のサンプルは豊富に含まれる一方、まれな疾患のサンプルはごくわずかしか含まれておらず、大きなクラス不均衡につながっている。現在,不均衡なトレーニングデータから効果的に学習することは,知的診断においてオープンな問題である。本稿では, 単純で効果的なフレームワークであるClass Attention to REgions of the lesion (CARE) を提案し, Convolutional Neural Networks (CNNs) のトレーニングプロセスに注意を埋め込んでデータ不均衡の問題に対処する。提案したアテンションモジュールは、CNNがまれな疾患の病変領域に注目するのを助け、CNNがそれらの特徴をより効果的に学習するのに役立つ。さらに、このアテンションモジュールはトレーニング段階でのみ動作し、元のネットワークのアーキテクチャを変更しないため、既存のCNNアーキテクチャと直接結合することができる。 CAREフレームワークは、まれな疾患の病変領域を表すために境界ボックスを必要とする。手動のアノテーションの必要性を軽減するため,従来のサリエンシ手法や事前訓練されたセグメンテーションモデルをボックス生成に適用することにより,CAREの変種をさらに開発した。結果から,自動バウンディングボックス生成によるCARE変種は,手動のバウンディングボックスアノテーションを用いる元のCAREフレームワークと同等であることがわかった。不均衡な皮膚画像データセットと肺炎データセットに関する一連の実験により、本手法はネットワークが稀な疾患の病変領域に集中するのを効果的に助け、稀な疾患の分類性能を著しく向上することを示す。

Automated medical image classification is the key component in intelligent diagnosis systems. However, most medical image datasets contain plenty of samples of common diseases and just a handful of rare ones, leading to major class imbalances. Currently, it is an open problem in intelligent diagnosis to effectively learn from imbalanced training data. In this paper, we propose a simple yet effective framework, named Class Attention to REgions of the lesion (CARE), to handle data imbalance issues by embedding attention into the training process of Convolutional Neural Networks (CNNs). The proposed attention module helps CNNs attend to lesion regions of rare diseases, therefore helping CNNs to learn their characteristics more effectively. In addition, this attention module works only during the training phase and does not change the architecture of the original network, so it can be directly combined with any existing CNN architecture. The CARE framework needs bounding boxes to represent the lesion regions of rare diseases. To alleviate the need for manual annotation, we further developed variants of CARE by leveraging the traditional saliency methods or a pretrained segmentation model for bounding box generation. Results show that the CARE variants with automated bounding box generation are comparable to the original CARE framework with manual bounding box annotations. A series of experiments on an imbalanced skin image dataset and a pneumonia dataset indicates that our method can effectively help the network focus on the lesion regions of rare diseases and remarkably improves the classification performance of rare diseases.

翻訳日:2023-07-21 16:47:57 公開日:2023-07-20

# せっかちなバンディット: 遅延なしに長期的な推薦を最適化する

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay ( http://arxiv.org/abs/2307.09943v2 )

ライセンス: Link先を確認

Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

(参考訳) リコメンダシステムは、オンラインプラットフォームのユビキタスな機能である。利用者の長期的満足度向上に特化している。本稿では,遅延報酬を伴うマルチアームバンディット問題として定式化したコンテンツ探索課題について検討する。我々は、学習信号の選択に明らかなトレードオフがあることを観察した。完全な報酬が利用可能になるのを待つのに数週間かかり、学習の開始率を損なう可能性がある一方で、短期的なプロキシの報酬を測定することは、実際の長期的な目標を不完全に反映する。この課題を2つのステップで解決する。まず,これまでに得られた情報をすべて組み込んだ遅延報酬の予測モデルを開発する。完全な観測と部分的な(短命または中期的な)結果がベイズフィルタを通して組み合わせられ、確率論的信念が得られる。第二に、この新たな予測モデルを利用する帯域幅アルゴリズムを考案する。このアルゴリズムは、探索とエクスプロイトを慎重にバランスさせて、長期的成功に対応するコンテンツを素早く特定する。このアプローチをポッドキャストのレコメンデーション問題に適用し,ユーザが2ヶ月以上繰り返し関与している番組を識別する。短期プロキシを最適化するアプローチや、長期的な結果が完全に実現されるのを待つアプローチと比較して、我々のアプローチがはるかに優れたパフォーマンスをもたらすことを実証的に検証する。

Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.

翻訳日:2023-07-21 16:47:22 公開日:2023-07-20
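
The two steps described in the Impatient Bandits abstract above (a Bayesian belief that absorbs both partial and full observations, then a bandit algorithm that uses it) can be sketched roughly as follows. This is a minimal illustration, not the authors' model: it assumes a scalar Gaussian belief per arm and treats short-term proxy signals simply as noisier observations of the long-term reward. The names `GaussianBelief` and `thompson_pick` are hypothetical, and Thompson sampling is used here as one common way to balance exploration and exploitation.

```python
import numpy as np


class GaussianBelief:
    """Scalar Gaussian belief over an arm's long-term reward (assumed model)."""

    def __init__(self, mean: float = 0.0, var: float = 1.0):
        self.mean = mean
        self.var = var

    def update(self, obs: float, obs_var: float) -> None:
        """Conjugate Gaussian update. Partial (early) outcomes are passed in
        with a larger obs_var than fully realized long-term outcomes."""
        gain = self.var / (self.var + obs_var)
        self.mean += gain * (obs - self.mean)
        self.var *= 1.0 - gain


def thompson_pick(beliefs, rng: np.random.Generator) -> int:
    """Sample one plausible reward per arm and play the arm with the largest sample."""
    samples = [rng.normal(b.mean, np.sqrt(b.var)) for b in beliefs]
    return int(np.argmax(samples))
```

In this sketch a noisy early signal still shrinks the variance a little, so learning does not have to wait for the full outcome, while a fully observed outcome (small `obs_var`) moves the belief much more.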

# トーキングヘッドビデオ生成のための暗黙的アイデンティティ表現条件付きメモリ補償ネットワーク

Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation ( http://arxiv.org/abs/2307.09906v2 )

ライセンス: Link先を確認

Fa-Ting Hong and Dan Xu

(参考訳) トーキングヘッドビデオ生成は、ソース画像内の人物の同一性を保持しつつ、駆動ビデオから得られる動き情報を用いて、静止画像中の人間の顔に動的なポーズと表情を与えてアニメーション化することを目的としている。しかし、駆動ビデオにおける劇的かつ複雑な動きは曖昧な生成を引き起こす。静止したソース画像は、隠蔽された領域や微妙な表情の変化に対して十分な外観情報を提供できないためであり、その結果、深刻なアーティファクトが生じ、生成品質が大きく低下する。この問題に対処するために、我々はグローバルな顔表現空間を学習し、高忠実度なトーキングヘッド生成のために、MCNetと呼ぶ新しい暗黙的アイデンティティ表現条件付きメモリ補償ネットワークを設計することを提案する。具体的には、すべてのトレーニングサンプルから統一的な空間的顔メタメモリバンクを学習するネットワークモジュールを考案する。このメモリバンクは、豊富な顔構造と外観の事前知識を提供し、生成のためにワープされたソース顔特徴を補償することができる。さらに、ソース画像の離散的キーポイントから学習した暗黙的アイデンティティ表現に基づく効果的なクエリ機構を提案する。これにより、補償のためにメモリバンクからより相関性の高い情報を検索しやすくなる。大規模な実験により、MCNetは代表的かつ相補的な顔メモリを学習でき、VoxCeleb1およびCelebVデータセットにおいて従来の最先端のトーキングヘッド生成手法を明確に上回ることが示された。プロジェクトページは https://github.com/harlanhong/ICCV2023-MCNET を参照。

Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image. However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, coined MCNet, for high-fidelity talking head generation. Specifically, we devise a network module to learn a unified spatial facial meta-memory bank from all training samples, which can provide rich facial structure and appearance priors to compensate warped source facial features for the generation. Furthermore, we propose an effective query mechanism based on implicit identity representations learned from the discrete keypoints of the source image. It can greatly facilitate the retrieval of more correlated information from the memory bank for the compensation. Extensive experiments demonstrate that MCNet can learn representative and complementary facial memory, and can clearly outperform previous state-of-the-art talking head generation methods on VoxCeleb1 and CelebV datasets. Please check our project page: https://github.com/harlanhong/ICCV2023-MCNET.

翻訳日:2023-07-21 16:47:00 公開日:2023-07-20

# FedSoup:選択的モデル補間によるフェデレーション学習における一般化とパーソナライゼーションの改善

FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation ( http://arxiv.org/abs/2307.10507v1 )

ライセンス: Link先を確認

Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li

(参考訳) cross-silo federated learning(fl)は、病院や臨床研究所などのデータセンタに分散したデータセット上の機械学習モデルの開発を可能にする。しかし、最近の研究では、現在のFLアルゴリズムは、分布シフトに直面した場合、局所的な性能とグローバルな性能のトレードオフに直面している。具体的には、パーソナライズされたflメソッドは、ローカルデータに過度に適合する傾向があり、ローカルモデルに鋭い谷が発生し、分散データに一般化する能力が阻害される。本稿では,地域とグローバルのパフォーマンスのトレードオフを最適化するために,新しいフェデレーションモデルスープ法(モデルパラメータの選択補間)を提案する。具体的には、フェデレーショントレーニングフェーズの間、各クライアントは、ローカルモデルとグローバルモデル間の補間モデルのパフォーマンスを監視して、独自のグローバルモデルプールを維持する。これにより、オーバーフィッティングを緩和し、フラットなミニマを求めることができ、モデルの一般化性能を大幅に改善できます。提案手法は,網膜および病理像の分類タスクにおける評価手法であり,本手法は分布汎化において有意な改善が得られた。私たちのコードはhttps://github.com/ubc-tea/fedsoupで利用可能です。

Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.

翻訳日:2023-07-21 15:21:08 公開日:2023-07-20
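
As a rough illustration of the selective interpolation idea in the FedSoup abstract above, the sketch below keeps a per-client pool of global models and admits a new one only if the model interpolated between the local weights and the pool average does not get worse on held-out data. This is a simplified reading of the abstract, not the released implementation: the function names, the fixed mixing coefficient `alpha`, and the greedy acceptance rule are all assumptions made for illustration.

```python
import numpy as np


def interpolate(local, other, alpha):
    """Linear interpolation of two parameter dictionaries."""
    return {k: (1.0 - alpha) * local[k] + alpha * other[k] for k in local}


def average(models):
    """Uniform average ("soup") of a non-empty list of parameter dictionaries."""
    return {k: np.mean([m[k] for m in models], axis=0) for k in models[0]}


def update_pool(local, pool, candidate, val_loss, alpha=0.5):
    """Hypothetical greedy rule: keep the candidate global model only if the
    interpolated model's validation loss does not increase."""
    trial_pool = pool + [candidate]
    if not pool:
        return trial_pool
    current = val_loss(interpolate(local, average(pool), alpha))
    trial = val_loss(interpolate(local, average(trial_pool), alpha))
    return trial_pool if trial <= current else pool
```

Here `val_loss` is any callable that scores a parameter dictionary on the client's validation data; the interpolated model, rather than the raw local or global model, is what gets monitored.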

# Grad-CAMは医療画像で説明できるのか?

Is Grad-CAM Explainable in Medical Images? ( http://arxiv.org/abs/2307.10506v1 )

ライセンス: Link先を確認

Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh

(参考訳) 説明可能なディープラーニング(Explainable Deep Learning)は、人工知能(AI)分野、特に医療画像などの領域において、効果的な診断と治療計画のために正確かつ解釈可能な機械学習モデルが不可欠である。 Grad-CAMは、ディープラーニングモデルの意思決定プロセスで使用される画像の最も重要な領域を強調し、解釈可能性を高め、結果に対する信頼を高めるベースラインである。これは分類や説明など多くのコンピュータビジョン(CV)タスクに適用されている。本研究では,説明可能な深層学習の原理と医用画像との関連性について考察し,様々な説明可能性技術とその限界について考察し,Grad-CAMの医用画像応用について検討する。この結果は、医療画像におけるディープラーニングモデルの精度と解釈性を改善するために、説明可能なDeep LearningとGrad-CAMの可能性を浮き彫りにした。コードは利用可能である(利用可能になる予定)。

Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code will be made available.

翻訳日:2023-07-21 15:20:48 公開日:2023-07-20
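
For readers unfamiliar with how Grad-CAM highlights image regions, the core computation can be sketched in a few lines. The sketch assumes the feature maps of one convolutional layer and the gradients of the target class score with respect to those maps have already been extracted from a network (for example through framework hooks); the function name `grad_cam` and the final normalization to [0, 1] are illustrative choices, not taken from the paper above.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one layer.

    activations, gradients: arrays of shape (K, H, W), where K is the number
    of channels and gradients holds d(class score)/d(activations).
    """
    # Channel weights: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of feature maps over channels.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The heatmap has the spatial size of the chosen layer, so in practice it is upsampled to the input resolution before being overlaid on the image.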

# 画像表現における解釈可能な部分空間の同定

Identifying Interpretable Subspaces in Image Representations ( http://arxiv.org/abs/2307.10504v1 )

ライセンス: Link先を確認

Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi

(参考訳) 画像表現の特徴を解釈可能なフレームワークであるコントラスト概念(FALCON)を用いた自動特徴記述を提案する。ターゲット機能としてFALCONは、大きなキャプションデータセット(LAION-400mなど)とCLIPのような訓練済みの視覚言語モデルを使って、高機能なクロップ画像をキャプションする。キャプションの中の各単語はランク付けされ、ターゲットの特徴を詳細に記述した少数の共有、人間理解可能な概念へと導かれる。 FALCONはまた、低活性化(偽造)画像を用いた対照的な解釈を適用して、急激な概念を排除した。既存の多くのアプローチは独立して特徴を解釈するが、最先端の自己監督モデルや教師付きモデルでは、表現空間の20%未満は個々の特徴によって説明できる。より広い空間における特徴は、グループで研究するとより解釈しやすくなり、FALCONを通して高次スコアリングの概念で説明できることを示す。下流タスクにおける障害の説明とデバッグに抽出された概念をどのように利用できるかについて議論する。最後に、簡単な線形変換を学習することにより、ある(説明可能な)表現空間から別の見えない表現空間へ概念を移す手法を提案する。

We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked, leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe that in state-of-the-art self-supervised and supervised models, less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.

翻訳日:2023-07-21 15:20:32 公開日:2023-07-20

# 安定性, 状態, 入力制約型安全フィルタを用いた微分フラット学習モデル予測制御

Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter ( http://arxiv.org/abs/2307.10541v1 )

ライセンス: Link先を確認

Adam W. Hall and Melissa Greeff and Angela P. Schoellig

(参考訳) 学習に基づく最適制御アルゴリズムは、過去の軌道データとシステムダイナミクスの学習モデルを用いて未知のシステムを制御する。これらのコントローラは、学習したダイナミクスの線形近似、高速な計算のためのトレーディングパフォーマンス、あるいは一般的には性能は良いがリアルタイム適用性を制限する非線形最適化手法のいずれかを使用する。本稿では,最先端の学習ベースコントローラと同様の性能を実現するために微分平坦性を利用した新しい非線形コントローラを提案する。微分平坦性は、非線形入力写像によって非線形系を正確に線形化することができる力学系の特性である。ここで、非線形変換はガウス過程として学習され、高い確率、安定性、入力および平らな状態制約満足度を保証する安全フィルタで使用される。この安全フィルタは、フラットモデル予測制御器からの入力を洗練して、2つの連続凸最適化により制約付き非線形学習に基づく最適制御を行う。本手法を最先端の学習ベースの制御戦略と比較し,同様の性能を実現するとともに,計算効率が大幅に向上するとともに,フラット状態と入力制約を尊重し,安定性を保証した。

Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.

翻訳日:2023-07-21 15:10:22 公開日:2023-07-20

# 測地線進化による広帯域不明瞭量子センシング

Wide-band Unambiguous Quantum Sensing via Geodesic Evolution ( http://arxiv.org/abs/2307.10537v1 )

ライセンス: Link先を確認

Ke Zeng, Xiaohui Yu, Martin B. Plenio, and Zhen-Yu Wang

(参考訳) 本稿では, 量子センシング技術を用いて, 量子ビットのダイナミックスを, 断熱進化の測地線に沿って循環的に駆動する手法を提案する。このアプローチは、動的デカップリング制御でよく発生する高調波や刺激応答などの不要な共振項を同時に除去しながら、デコヒーレンスノイズと制御誤差の両方の効果を効果的に抑制する。その結果、本手法は、スピンを含む量子システムの信号検出と個別アドレス付けにロバストで広帯域であいまいで高分解能な量子センシング機能を提供する。その汎用性を示すために,本手法の低周波および高周波センシングへの応用例を示す。この量子センシング技術の重要性は、複雑な信号の検出と複雑な量子環境の制御にまで及ぶ。検出精度を高め, 量子システムの精密操作を可能にすることで, 様々な実用的応用が期待できる。

We present a quantum sensing technique that utilizes a sequence of $\pi$ pulses to cyclically drive the qubit dynamics along a geodesic path of adiabatic evolution. This approach effectively suppresses the effects of both decoherence noise and control errors while simultaneously removing unwanted resonance terms, such as higher harmonics and spurious responses commonly encountered in dynamical decoupling control. As a result, our technique offers robust, wide-band, unambiguous, and high-resolution quantum sensing capabilities for signal detection and individual addressing of quantum systems, including spins. To demonstrate its versatility, we showcase successful applications of our method in both low-frequency and high-frequency sensing scenarios. The significance of this quantum sensing technique extends to the detection of complex signals and the control of intricate quantum environments. By enhancing detection accuracy and enabling precise manipulation of quantum systems, our method holds considerable promise for a variety of practical applications.

翻訳日:2023-07-21 15:09:59 公開日:2023-07-20

# 乗算ロバスト推定器による因果推論におけるニューラルネットワークモデルのハイパーパラメータチューニング

Multiply Robust Estimator Circumvents Hyperparameter Tuning of Neural Network Models in Causal Inference ( http://arxiv.org/abs/2307.10536v1 )

ライセンス: Link先を確認

Mehdi Rostami, Olli Saarela

(参考訳) 平均処理効果(ATE)の推定は2ステップで行われ、第1ステップでは治療と結果がモデル化され、第2ステップでは予測がATE推定器に挿入される。最初のステップでは、機械学習アルゴリズムの使用を含む、多くのモデルが治療と結果に適合する。しかしながら、最も因果効果の高い推定と推論をもたらす超パラメータ集合の中から選択することは難しい課題である。乗算ロバスト (MR) 推定器は1つの推定器で全ての第一段階モデルを活用できる。 MR推定器が、第一段階の処理または結果モデルの1つが$n^r$整合であれば、$n^r$整合であることを示す。また、MR が方程式の幅広いクラスの解であり、処理モデルの一つが $\sqrt{n}$-consistent であれば漸近的に正規であることを示す。 MRの標準誤差も計算され、最初のステップで真のモデルの知識を必要としない。我々のシミュレーション研究は理論的な発見を支持している。

Estimation of the Average Treatment Effect (ATE) is often carried out in two steps: in the first step, the treatment and outcome are modeled, and in the second step, the predictions are inserted into the ATE estimator. In the first step, numerous models can be fit to the treatment and outcome, including machine learning algorithms. However, it is difficult to choose, among the hyperparameter sets, the one that will result in the best causal effect estimation and inference. The Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator. We show that the MR estimator is $n^r$-consistent if one of the first-step treatment or outcome models is $n^r$-consistent. We also show that MR is the solution to a broad class of estimating equations, and is asymptotically normal if one of the treatment models is $\sqrt{n}$-consistent. The standard error of MR is also calculated, which does not require knowledge of the true models in the first step. Our simulation study supports the theoretical findings.

翻訳日:2023-07-21 15:09:40 公開日:2023-07-20
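
The two-step structure described in the abstract above (fit treatment and outcome models, then plug their predictions into an ATE estimator) can be illustrated with the standard augmented inverse probability weighting (AIPW) estimator. Note that AIPW is the simpler doubly robust estimator, used here as a stand-in; it is not the multiply robust estimator the paper studies, which combines several candidate first-step models. The function name `aipw_ate` and its argument names are hypothetical.

```python
import numpy as np


def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Second step of a two-step ATE estimate (doubly robust AIPW form).

    y: observed outcomes; t: binary treatment indicator (0/1);
    e_hat: first-step propensity predictions P(T=1 | X);
    mu1_hat, mu0_hat: first-step outcome predictions under treatment / control.
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    # Outcome-model prediction plus an inverse-probability-weighted residual.
    treated = mu1_hat + t * (y - mu1_hat) / e_hat
    control = mu0_hat + (1.0 - t) * (y - mu0_hat) / (1.0 - e_hat)
    return float(np.mean(treated - control))
```

The estimate stays consistent if either the propensity model or the outcome model is correct, which is the property that multiply robust estimators extend to a whole collection of first-step models.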

# Hypernetworks を用いた高速非教師付き深層モデル選択

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks ( http://arxiv.org/abs/2307.10529v1 )

ライセンス: Link先を確認

Xueying Ding, Yue Zhao, Leman Akoglu

(参考訳) 外乱検出(OD)は、多くのテクニックの豊富な文献で多くの応用を見出す。 deep neural network based od (dod) は、ディープラーニングの多くの進歩によって、近年注目を集めている。本稿では,教師なしDOD,すなわち実効性ハイパーパラメータ(HP)チューニング/モデル選択による批判的評価課題について考察する。いくつかの先行研究では、ODモデルのHPに対する感受性が報告されているが、HPの長いリストを示す現代のDODモデルにとって、非常に重要になっている。我々は,DODモデルのチューニングにHYPERを導入し,(1)監督のない検証(ラベル付き異常の欠如による)と(2)HP/モデル空間の効率的な探索(HP数の増加による)という2つの基本的な課題に対処する。鍵となるアイデアは、HPをメインのDODモデルの最適な重みにマッピングする新しいハイパーネットワーク(HN)を設計し、訓練することである。 HYPERは、多くのDODモデルの重みを動的に生成できる単一のHN(HPの異なるモデルに対応する)に乗じて、大幅なスピードアップを実現している。さらに,従来のODタスクのメタラーニングを利用して,提案したHNを効率的に訓練したプロキシ検証関数をラベルでトレーニングする。 35のODタスクに対する大規模な実験により、HYPERは高い効率で8つのベースラインに対して高いパフォーマンスを達成している。

Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior works report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.

翻訳日:2023-07-21 15:09:20 公開日:2023-07-20

# Black-Box Adviceを超える:Q値予測付きMDPのための学習拡張アルゴリズム

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions ( http://arxiv.org/abs/2307.10524v1 )

ライセンス: Link先を確認

Tongxin Li, Yiheng Lin, Shaolei Ren and Adam Wierman

(参考訳) 単軌道時間変化マルコフ決定過程(MDP)の文脈における一貫性と堅牢性の間のトレードオフを、信頼できない機械学習アドバイスを用いて検討する。私たちの作業は、アドバイスの生成方法に関する追加情報が得られる設定を考慮し、ブラックボックスソースからのアドバイスを取り扱う典型的なアプローチから外れています。連続的および離散的状態/作用空間を含む一般MDPモデルの下でQ値のアドバイスを与えられた第一種一貫性とロバスト性トレードオフを証明する。以上の結果から,Q値アドバイスを利用することで,機械学習によるアドバイスとロバストなベースラインを動的に追求することが可能となり,ほぼ最適な性能保証が得られ,ブラックボックスアドバイスのみで得られるものが改善されることが示唆された。

We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice.

翻訳日:2023-07-21 15:08:55 公開日:2023-07-20

# ジェンダーチューニング: 事前訓練された言語モデルのデバイアスのための微調整の強化

Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models ( http://arxiv.org/abs/2307.10522v1 )

ライセンス: Link先を確認

Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, and Hamed Khanpour

(参考訳) 近年の研究では、広く使用されているプレトレーニング言語モデル(plm)が、非モデレーションプレトレーニングコーパスから社会バイアスを広めていることが明らかになっている。既存のソリューションでは、リソース集約的でコストのかかるデバイアスのためのトレーニングプロセスとデータセットが必要です。さらに、これらの手法は、下流タスクにおけるPLMのパフォーマンスを損なう。本研究では,下流タスクのデータセットを微調整することでPLMを脱臭するジェンダーチューニングを提案する。この目的のために、Gender-tuning は Masked Language Modeling (MLM) トレーニング目標をファインチューニングのトレーニングプロセスに統合する。包括的実験により、ジェンダーチューニングはplmの平均性バイアススコアの点で最先端のベースラインよりも優れており、下流タスクのデータセットのみを使用して下流タスクにおけるplmのパフォーマンスを改善していることが示された。また、性別調整は、オリジナルの微調整で動作するplmのデプロイ可能なデバイアスツールである。

Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for debiasing, which are resource-intensive and costly. Furthermore, these methods hurt the PLMs' performance on downstream tasks. In this study, we propose Gender-tuning, which debiases the PLMs through fine-tuning on downstream tasks' datasets. For this aim, Gender-tuning integrates Masked Language Modeling (MLM) training objectives into fine-tuning's training process. Comprehensive experiments show that Gender-tuning outperforms the state-of-the-art baselines in terms of average gender bias scores in PLMs while improving PLMs' performance on downstream tasks solely using the downstream tasks' dataset. Also, Gender-tuning is a deployable debiasing tool for any PLM that works with original fine-tuning.

翻訳日:2023-07-21 15:08:39 公開日:2023-07-20

# 文脈のない異種ジェスチャーの対話的セグメンテーション

Interactive Segmentation for Diverse Gesture Types Without Context ( http://arxiv.org/abs/2307.10518v1 )

ライセンス: Link先を確認

Josh Myers-Dean, Yifei Fan, Brian Price, Wilson Chan, Danna Gurari

(参考訳) インタラクティブセグメンテーションは、モデルがどのようにセグメンテーションを作成し、編集するかを導くために、人間がイメージをマークする。画像にマーキングするためのジェスチャタイプ(クリックやスクリブルなど)のみをサポートするか、使用中のジェスチャタイプの知識を必要とするか、最終セグメンテーションにマークされた領域が含まれているか除外されるべきかを指定する必要があります。その代わりに,ユーザがイメージのみをマークしなければならない,ジェスチャータイプを指定せずに任意のジェスチャータイプを入力できる,シンプルな対話型セグメンテーションタスクを提案する。我々は,対話型セグメンテーションアルゴリズムを全体評価可能な新しい評価指標とともに,複数のジェスチャー型を持つ最初の対話型セグメンテーションデータセットを導入することで,この新しいタスクを支援する。そして、新しいタスクに適応した部分を含む多数の対話的セグメンテーションアルゴリズムを分析する。全体として有望なパフォーマンスを観察しながら、将来的な改善の領域も強調しています。この作業をさらに拡張するために、新しいデータセットをhttps://github.com/joshmyersdean/digで公開しています。

Interactive segmentation entails a human marking an image to guide how a model either creates or edits a segmentation. Our work addresses limitations of existing methods: they either only support one gesture type for marking an image (e.g., either clicks or scribbles) or require knowledge of the gesture type being employed, and require specifying whether marked regions should be included versus excluded in the final segmentation. We instead propose a simplified interactive segmentation task where a user only must mark an image, where the input can be of any gesture type without specifying the gesture type. We support this new task by introducing the first interactive segmentation dataset with multiple gesture types as well as a new evaluation metric capable of holistically evaluating interactive segmentation algorithms. We then analyze numerous interactive segmentation algorithms, including ones adapted for our novel task. While we observe promising performance overall, we also highlight areas for future improvement. To facilitate further extensions of this work, we publicly share our new dataset at https://github.com/joshmyersdean/dig.

翻訳日:2023-07-21 15:08:20 公開日:2023-07-20

# 地域包摂型社会文化的包摂型ステレオタイプ資源の構築

Building Socio-culturally Inclusive Stereotype Resources with Community Engagement ( http://arxiv.org/abs/2307.10514v1 )

ライセンス: Link先を確認

Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar Prabhakaran

(参考訳) グローバル環境における生成言語モデルの迅速な開発と展開は,害の量や種類だけでなく,辺境的なアイデンティティや経験した社会的偏見など,地域文化の文脈をいかにうまく捉えているかという点において,我々の害の測定をスケールする必要がある。現在の評価パラダイムは、多様で局所的だがグローバルな社会文化的な視点を代表していないため、この問題に対処する能力に限られている。危険度測定における過度な過小評価やスキューを防止するため、世界各国の文化や社会から人や経験を取り入れることで、評価資源の強化と校正が不可欠である。本研究は,インド社会における評価資源の社会文化的に認識された拡大,特にステレオタイプによる影響について示すものである。我々は、インドに特有の格差の軸のステレオタイプを含むリソースを構築するためのコミュニティの取り組みを考案する。結果として得られる資源は、インド文脈で知られているステレオタイプの数を、多くのユニークなアイデンティティで1000以上のステレオタイプに増加させる。また,言語モデル評価のための拡張資源の有用性と有効性を示す。コンテンツ警告: 本論文は攻撃的かもしれないステレオタイプの例を含む。

With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities to address this, as they are not representative of diverse, locally situated but global, socio-cultural perspectives. It is imperative that our evaluation resources are enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimations or skews in measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community engaged effort to build a resource which contains stereotypes for axes of disparity that are uniquely present in India. The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000 stereotypes across many unique identities. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.

翻訳日:2023-07-21 15:08:00 公開日:2023-07-20

# IvyGPT : 医療領域における中国語パスウェイ言語モデル

IvyGPT: InteractiVe Chinese pathwaY language model in medical domain ( http://arxiv.org/abs/2307.10512v1 )

ライセンス: Link先を確認

Rongsheng Wang and Yaofei Duan and ChanTong Lam and Jiexi Chen and Jiangsheng Xu and Haoming Chen and Xiaohong Liu and Patrick Cheong-Iao Pang and Tao Tan

(参考訳) ChatGPTのような一般的な大規模言語モデル(LLM)は顕著な成功を収めている。しかし、これらのLSMは、精度が低く、医療アドバイスができないため、医学的に広く採用されていない。我々は、高品質なQA(QA)インスタンスとRLHF(Reinforcement Learning from Human Feedback)インスタンスで訓練および微調整を行うLLaMAに基づくLLMであるIvyGPTを提案する。教師付き微調整の後、IvyGPTは多ターン会話能力に優れるが、包括的診断など他の面では医師のようには機能しない。 RLHFを通じて、IvyGPTは人間に近いリッチな診断と治療の回答を出力することができる。トレーニングでは、QLoRAを使用して、少数のNVIDIA A100 (80GB) GPU上で33億のパラメータをトレーニングしました。実験の結果、IvyGPTは他の医療用GPTモデルよりも優れていた。

General large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes, due to poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot perform like a doctor in other aspects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to human. In the training, we used QLoRA to train 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT has outperformed other medical GPT models.

翻訳日:2023-07-21 15:07:40 公開日:2023-07-20

# マルチモーダル感情分析のための一般デバイアス

General Debiasing for Multimodal Sentiment Analysis ( http://arxiv.org/abs/2307.10511v1 )

ライセンス: Link先を確認

Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, and Liqiang Nie

(参考訳) 既存のマルチモーダル感性分析(MSA)の研究は、マルチモーダル特徴と感情ラベルの急激な相関を適合させることなく、予測にマルチモーダル情報を利用する。例えば、青い背景を持つほとんどのビデオがデータセットにポジティブなラベルを持っている場合、モデルは予測のためにこのような相関に依存するが、'blue background''は感情に関連した機能ではない。この問題に対処するために、我々は、突発的相関への依存を減らすことで、MSAモデルの外部分布(OOD)一般化能力を向上することを目的とした、一般的なMSAタスクを定義する。そこで本研究では,より偏りが大きい試料に対して適応的に小さな重みを割り当てる逆確率重み付け(ipw)に基づく一般的な偏りの枠組みを提案する。この脱バイアスフレームワークの鍵は、各サンプルのバイアスを推定することであり、これは2つのステップによって達成される。 1)各モダリティにおけるロバストな特徴と偏った特徴の分離 2)バイアス特徴を利用してバイアスを推定する。最後に,IPWを用いて大規模バイアスサンプルの効果を低減し,感情予測のための堅牢な特徴学習を実現する。モデルの一般化能力を調べるために、元のテストセットを2つのベンチマークに保持し、さらに複数のユニモーダルおよびマルチモーダルのoodテストセットを構築する。実験結果は,提案フレームワークの優れた一般化能力を示すものである。我々は、複製を容易にするコードとデータをリリースした。

Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while "blue background" is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to the samples with larger bias (i.e., the severer spurious correlations). The key to this debiasing framework is to estimate the bias of each sample, which is achieved in two steps: 1) disentangling the robust features and biased features in each modality, and 2) utilizing the biased features to estimate the bias. Finally, we employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate the reproduction.

翻訳日:2023-07-21 15:07:23 公開日:2023-07-20
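
The reweighting step in the debiasing abstract above can be sketched as follows. The sketch assumes that a per-sample bias estimate is already available as a probability: here, hypothetically, the probability that a classifier using only the biased features predicts the correct label. How the paper actually estimates bias from disentangled features is not reproduced; `ipw_weights` and `weighted_loss` are illustrative names.

```python
import numpy as np


def ipw_weights(bias_prob, eps: float = 1e-6) -> np.ndarray:
    """Inverse probability weights: samples that the biased features alone
    already explain well (high bias_prob) receive small weights."""
    p = np.clip(np.asarray(bias_prob, dtype=float), eps, 1.0)
    w = 1.0 / p
    return w / w.mean()  # normalize so the average weight is 1


def weighted_loss(per_sample_loss, weights) -> float:
    """Training objective with per-sample reweighting."""
    return float(np.mean(np.asarray(weights) * np.asarray(per_sample_loss)))
```

Normalizing the weights to mean 1 keeps the overall loss scale comparable to unweighted training, so the learning rate does not need retuning in this sketch.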

# FACADE: 逆回路異常検出と評価のためのフレームワーク

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation ( http://arxiv.org/abs/2307.10563v1 )

ライセンス: Link先を確認

Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

(参考訳) 本稿では、深層ニューラルネットワークにおける教師なし機械的異常検出のための新しい確率的および幾何学的フレームワークであるFACADEを提案する。その主な目標は、敵の攻撃の理解と緩和を促進することである。 FACADEは、回路上の確率分布を生成することを目的としており、擬似クラスや活性化空間における高次元モードの多様体特性の変化への寄与に重要な洞察を与え、敵の攻撃を発見・戦える強力なツールを提供する。我々のアプローチは、モデルの堅牢性を改善し、スケーラブルなモデル監視を強化し、現実のデプロイメント環境で有望なアプリケーションを実証することを目指している。

We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights into their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrate promising applications in real-world deployment settings.

翻訳日:2023-07-21 15:02:30 公開日:2023-07-20

# 共用逆学習:非学習共用逆学習によるバックドア緩和

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples ( http://arxiv.org/abs/2307.10562v1 )

ライセンス: Link先を確認

Shaokui Wei, Mingda Zhang, Hongyuan Zha, Baoyuan Wu

(参考訳) バックドア攻撃は、敵がトレーニングセットに有毒なサンプルを注入し、特定のターゲットクラスに特定のトリガーを含む有毒なサンプルを予測するバックドアモデルを引き起こす機械学習モデルに対する深刻なセキュリティ脅威である。本稿では,小さなクリーンデータセットを用いて,バックドアモデルの浄化作業について検討する。バックドアリスクと逆境リスクの関連性を確立することにより、バックドアモデルと浄化モデルとの間の共有敵例(SAE)のリスクを主に捉えた、バックドアリスクの新たな上限を導出する。この上界はさらに、対向訓練技術を用いてバックドアを緩和する新しい二段階最適化問題を示唆している。そこで本稿では,SAU(Shared Adversarial Unlearning)を提案する。具体的には、SAUはまずSAEを生成し、次いで生成されたSAEを、精製されたモデルによって正しく分類されるか、2つのモデルによって正しく分類され、バックドアモデルにおけるバックドア効果が浄化されたモデルで緩和されるように解放する。各種ベンチマークデータセットとネットワークアーキテクチャの実験により,提案手法がバックドアディフェンスの最先端性能を実現することを示す。

Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs such that they are either correctly classified by the purified model and/or differently classified by the two models, such that the backdoor effect in the backdoored model will be mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.

翻訳日:2023-07-21 15:02:19 公開日:2023-07-20

# 変分後量子ニューラルネットワーク

Post-variational quantum neural networks ( http://arxiv.org/abs/2307.10560v1 )

ライセンス: Link先を確認

Po-Wei Huang, Patrick Rebentrost

(参考訳) 量子コンピューティングは、現在の最先端の古典的スーパーコンピュータよりも大きな計算上の利点を提供する可能性がある。しかし、現在のハードウェアはフォールトトレラント量子アルゴリズムを実行するには不十分である。変分アルゴリズムを用いたハイブリッド量子古典計算の代替として、バレンプラトー問題があり、勾配に基づく最適化手法の収束が遅い。本稿では,量子モデル最適化において,可変パラメータを量子コンピュータから古典コンピュータにシフトし,アンサンブル戦略を選択する「変分後戦略」について述べる。個々の量子回路を構築するための様々な戦略と設計原則について論じ、その結果のアンサンブルを凸プログラミングで最適化することができる。さらに,変分後量子ニューラルネットワークのアーキテクチャ設計について検討し,そのようなニューラルネットワークにおける推定誤差の伝播解析を行う。最後に,手書き桁のイメージ分類などの実世界の応用に適用し,96%の精度で分類できることを示す。

Quantum computing has the potential to provide substantial computational advantages over current state-of-the-art classical supercomputers. However, current hardware is not advanced enough to execute fault-tolerant quantum algorithms. An alternative of using hybrid quantum-classical computing with variational algorithms can exhibit barren plateau issues, causing slow convergence of gradient-based optimization techniques. In this paper, we discuss "post-variational strategies", which shift tunable parameters from the quantum computer to the classical computer, opting for ensemble strategies when optimizing quantum models. We discuss various strategies and design principles for constructing individual quantum circuits, where the resulting ensembles can be optimized with convex programming. Further, we discuss architectural designs of post-variational quantum neural networks and analyze the propagation of estimation errors throughout such neural networks. Lastly, we show that our algorithm can be applied to real-world applications such as image classification on handwritten digits, producing a 96% classification accuracy.

翻訳日:2023-07-21 15:01:55 公開日:2023-07-20

# 共形動的グラフ学習を用いたエアトラヒックコントローラの負荷レベル予測

Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning ( http://arxiv.org/abs/2307.10559v1 )

ライセンス: Link先を確認

Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu

(参考訳) 航空管制 (ATC) は、日々の航空運用を維持するために地上の航空管制官 (ATCo) が常に注意を払わなければならない安全クリティカルなサービスシステムである。 ATCoの作業負荷は、運用上の安全性と空域利用に悪影響を及ぼす可能性がある。 ATCoの過負荷を回避し、許容されるワークロードレベルを確保するためには、緩和措置のためにATCoのワークロードを正確に予測することが重要である。本稿では,まず,主に航空交通の観点からATCoの作業負荷に関する研究を概観する。次に,航空交通データとワークロードラベルを取得した、退役ATCoによるHuman-in-the-loop(HITL)シミュレーションのセットアップについて簡単に紹介する。シミュレーションは3つのPhoenixアプローチのシナリオで行われ、人間のATCoは自身の作業負荷(低-1から高-7)を自己評価するよう要求される。予備データ分析を行う。次に,共形予測(conformal prediction)を用いたグラフベースのディープラーニングフレームワークを提案し,ATCoのワークロードレベルを同定する。管制官の管制下にある航空機の数は空間的にも時間的にも変化し、動的に進化するグラフとなる。実験結果は次のことを示唆している。 (a) 交通密度の特徴以外に、交通競合の特徴(すなわち、最小水平/垂直分離距離)がワークロードの予測能力に寄与する。 (b) グラフニューラルネットワークを用いて空域の時空間グラフレイアウトから直接学習することにより,手作りの交通複雑性特徴と比較して,高い予測精度が得られる。 (c) 共形予測は,モデル予測精度をさらに向上させる上で有用なツールであり、予測ワークロードラベルの範囲を与える。使用されるコードは https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/ で公開されている。

Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature (i.e., minimum horizontal/vertical separation distance) contributes to the workload prediction capabilities; (b) directly learning from the spatiotemporal graph layout of airspace with a graph neural network can achieve higher prediction accuracy, compared to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting in a range of predicted workload labels. The code used is available at https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/.
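The abstract says conformal prediction yields a range of predicted workload labels. As a rough illustration only, the following sketch shows generic split conformal prediction for a classifier; it is not the authors' implementation. The function names, the nonconformity score (one minus the probability of the true label), and the use of 0-based label indices are all assumptions made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold from a held-out calibration set.

    cal_probs  : list of per-class probability lists from any classifier
    cal_labels : list of true label indices (0-based, an assumption here)
    alpha      : target miscoverage rate, e.g. 0.1 for roughly 90% coverage
    """
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank of the quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return 1.0
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return every label whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

A prediction set with several labels signals an uncertain prediction, whereas a single-label set signals a confident one; this is one way to read "a range of predicted workload labels".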

翻訳日:2023-07-21 15:01:39 公開日:2023-07-20

# 動詞操作による命令追従評価

Instruction-following Evaluation through Verbalizer Manipulation ( http://arxiv.org/abs/2307.10558v1 )

ライセンス: Link先を確認

Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin

(参考訳) 命令調整型モデルは様々な自然言語処理タスクで顕著に成功したが、命令に従う能力の正確な評価は依然として難しい。既存のベンチマークは主に、トレーニング中にモデルが学んだこととよく一致する一般的な命令に焦点を当てています。しかし、これらの指示に応答する能力は、必ずしも命令追従の強い能力を意味するとは限らない。本稿では,動詞操作と呼ばれる新しい指示追従評価プロトコルを提案する。タスクラベルを、モデル先行と異なる程度に整合した単語で動詞化し、高い整合性(例えば、肯定的な感情に ``positive'' を出力する)から最小整合性(例えば、肯定的な感情に ``negative'' を出力する)の言語化を指示する。バーバリザの操作は、任意の分類ベンチマークとシームレスに統合して、モデルの事前依存性と、それらをオーバーライドして正確に指示に従う能力を調べることができる。我々は、9つのデータセットにまたがる4つの主要なモデルファミリーを包括的に評価し、それぞれに12組の発声器を用いる。我々は,異なる家族や規模にわたるモデルの指示追従能力が,より自然な言語化能力の低下によって著しく異なることを観察した。最強のGPT-4モデルでさえ、最も難易度の高い動詞をランダムに推測するよりも優れた性能を発揮するのに苦労している。

While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting ``positive'' for positive sentiment), to minimally aligned (e.g., outputting ``negative'' for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities.
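To make the protocol more concrete, here is a minimal sketch of how verbalizers with different degrees of alignment could be applied to a binary sentiment task. It is an illustration under assumptions, not the paper's code: the group names (`natural`, `neutral`, `unnatural`), the placeholder words `foo`/`bar`, and the prompt wording are hypothetical. Only the highly aligned and minimally aligned examples come from the abstract.

```python
# Hypothetical verbalizer sets for binary sentiment classification.
SENTIMENT_VERBALIZERS = {
    # Highly aligned with model priors (example given in the abstract).
    "natural": {"positive": "positive", "negative": "negative"},
    # Unrelated words (an assumed intermediate setting).
    "neutral": {"positive": "foo", "negative": "bar"},
    # Minimally aligned: labels are flipped (example given in the abstract).
    "unnatural": {"positive": "negative", "negative": "positive"},
}


def build_instruction(verbalizer_name):
    """Build an instruction that tells the model which word to output per label."""
    mapping = SENTIMENT_VERBALIZERS[verbalizer_name]
    return (
        "Classify the sentiment of the review. "
        f'Answer "{mapping["positive"]}" if it is positive '
        f'and "{mapping["negative"]}" if it is negative.'
    )


def score(predictions, gold_labels, verbalizer_name):
    """Accuracy of raw model outputs against the instructed verbalizer."""
    mapping = SENTIMENT_VERBALIZERS[verbalizer_name]
    correct = sum(
        pred.strip().lower() == mapping[gold]
        for pred, gold in zip(predictions, gold_labels)
    )
    return correct / len(gold_labels)
```

A model that ignores the instruction and answers from its priors would score well under the `natural` mapping and poorly under the `unnatural` one, which is the gap the protocol is designed to expose.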

翻訳日:2023-07-21 15:01:03 公開日:2023-07-20

# EMQ: 自動混合精度量子化のためのトレーニング不要プロキシの進化

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization ( http://arxiv.org/abs/2307.10554v1 )

ライセンス: Link先を確認

Peijie Dong and Lujun Li and Zimian Wei and Xin Niu and Zhiliang Tian and Hengyue Pan

(参考訳) Mixed-Precision Quantization (MQ)は、モデルの競合する精度と複雑さのトレードオフを実現する。従来のトレーニングベースの検索手法では、MQ内の層ごとのビット幅設定を最適化するために時間を要する。近年、トレーニング不要なアプローチでは様々なMQプロキシが提供され、探索効率が大幅に向上している。しかし、これらのプロキシと量子化精度の相関性はよく分かっていない。このギャップに対処するために、私たちはまず、異なるビット構成と量子化結果を含むMQ-Bench-101を構築します。そこで,既存のトレーニングフリープロキシはMQ-Bench-101上で弱い相関関係を示す。優れたプロキシを効率的に探索するために,進化アルゴリズムによるMQ用プロキシフレームワークの自動検索を開発する。特に、既存のプロキシを含む精巧な検索空間を考案し、進化探索を行い、最も相関性の高いMQプロキシを発見する。我々は, 早期収束を回避し, 検索効率を向上させるために, 多様性向上戦略と互換性スクリーニングプロトコルを提案する。このようにして、Evolving proxies for Mixed-precision Quantization (EMQ)フレームワークは、重いチューニングや専門知識のないプロキシの自動生成を可能にします。様々なResNetおよびMobileNetファミリによるImageNetの大規模な実験により、当社のEMQは最先端の混合精度メソッドよりも大幅にコストを削減して優れたパフォーマンスが得られることを示した。コードはリリースされます。

Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search for optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address this gap, we first build MQ-Bench-101, which involves different bit configurations and quantization results. We then observe that the existing training-free proxies exhibit weak correlations on MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic proxy search framework for MQ via evolutionary algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolutionary search to discover the best-correlated MQ proxy. We propose a diversity-prompting selection strategy and a compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance to state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
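The abstract evaluates proxies by how well they correlate with quantization accuracy but does not name the correlation measure. As one plausible illustration, the sketch below computes Kendall's tau (the tau-a variant) between proxy scores and measured accuracies over a set of bit-width configurations. The choice of Kendall's tau and the function name are assumptions for this example, not details taken from the paper.

```python
from itertools import combinations


def kendall_tau(proxy_scores, accuracies):
    """Kendall's tau-a rank correlation between two equal-length score lists.

    Returns a value in [-1, 1]; 1 means the proxy ranks every pair of
    configurations in the same order as the measured accuracy.
    """
    concordant = 0
    discordant = 0
    for (p1, a1), (p2, a2) in combinations(zip(proxy_scores, accuracies), 2):
        direction = (p1 - p2) * (a1 - a2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Tied pairs count toward neither group in tau-a.
    n = len(proxy_scores)
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs
```

In an evolutionary search of the kind the abstract describes, a rank correlation like this could serve as the fitness of a candidate proxy, since only the ordering of configurations matters when selecting a bit-width assignment.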

翻訳日:2023-07-21 15:00:20 公開日:2023-07-20

# PPN:複合レイアウトを用いた鍵情報抽出のための並列ポインタベースネットワーク

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts ( http://arxiv.org/abs/2307.10551v1 )

ライセンス: Link先を確認

Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang

(参考訳) キー情報抽出(KIE)は、視覚的にリッチなドキュメントから構造化された値の意味的エンティティを抽出することを目的とした、挑戦的なマルチモーダルタスクである。重要な進展はありますが、対処すべき大きな課題は2つあります。まず、既存のデータセットのレイアウトが比較的固定され、セマンティックエンティティのカテゴリの数に制限されるため、これらのデータセットと複雑な実世界のシナリオの間に大きなギャップが生じる。第二に、既存の手法は2段階のパイプライン戦略に従い、エラー伝播問題を引き起こす可能性がある。さらに、見当たらない意味的エンティティカテゴリが出現する状況では、適用が難しい。キー情報抽出のための複合レイアウト形式 (clex) と呼ばれる, 意味的エンティティカテゴリ1,162の5,860画像からなる, 新たな大規模ヒューマンアノテートデータセットを提案する。第2の課題を解決するために,ゼロショットおよび少数ショットシナリオに適用可能なエンドツーエンドモデルであるParallel Pointer-based Network (PPN)を導入する。 PPNはセマンティックエンティティ間の暗黙の手がかりを利用して抽出を支援し、その並列抽出機構により複数の結果を同時に効率的に抽出することができる。 CLEXデータセットの実験では、PPNは既存の最先端メソッドよりも優れており、推論速度もはるかに高速である。

Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extraction, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.

翻訳日:2023-07-21 14:59:44 公開日:2023-07-20

# SC VALL-E:スタイル制御可能なゼロショットテキスト音声合成器

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer ( http://arxiv.org/abs/2307.10550v1 )

ライセンス: Link先を確認

Daegyeom Kim, Seongho Hong, and Yong-Hoon Choi

(参考訳) 音声のさまざまな特性を制御し、所望の声を生成するために、さまざまな話者、さまざまな感情、異なる話し方を備えたコーパスをデータセットに追加し、表現型音声合成モデルを訓練する。本稿では,GPT-3(generative pretrained transformer 3)の構造に従うニューラルコーデック言語モデル(VALL-E)に基づくスタイル制御(SC)VALL-Eモデルを提案する。提案したSC VALL-Eは、テキストとプロンプト音声を入力として受け取り、プロンプト音声の特徴を単に模倣するのではなく、属性を制御して多様な音声を生成することによって制御可能な音声を生成するように設計されている。感情,発話率,ピッチ,音声強度などの属性を表現する新たに設計されたスタイルネットワークのスタイル埋め込みマトリックス内のトークンを識別し,これらの属性を制御可能なモデルを設計する。 SC VALL-Eの性能を評価するために,グローバルスタイルトークン(GST)Tacotron2,変分オートエンコーダ(VAE)Tacotron2,オリジナルVALL-Eの3つの代表的な表現型音声合成モデルを用いて比較実験を行った。単語誤り率(WER)、F0 voiced error(FVE)、F0グロスピッチ誤差(F0GPE)を評価指標として測定し、生成文の精度を評価する。合成音声の品質を比較するために,比較平均オピニオンスコア(CMOS)と類似度平均オピニオンスコア(SMOS)を測定した。生成した音声のスタイル制御能力を評価するために,F0 と mel-spectrogram の変化を学習トークンの修正によって観察する。トレーニングデータに存在しないプロンプトオーディオを使用する場合、SC VALL-Eは様々な表現音を生成し、既存のモデルと比較して競合性能を示す。実装、事前トレーニングされたモデル、オーディオサンプルはGitHubにあります。

Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.
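Word error rate (WER), one of the metrics listed above, has a standard definition: the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of reference words. The sketch below implements that textbook definition. The whitespace tokenization and the function name are simplifying assumptions, and the paper's own evaluation pipeline (for example, which speech recognizer transcribes the synthesized audio) is not specified here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Both arguments are plain strings; words are split on whitespace.
    The reference must contain at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For synthesized speech, the hypothesis would typically be the transcript produced by running a speech recognizer on the generated audio, so a lower WER indicates more intelligible output.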

翻訳日:2023-07-21 14:59:07 公開日:2023-07-20

# ブロックチェーン上の動的大規模言語モデル

Dynamic Large Language Models on Blockchains ( http://arxiv.org/abs/2307.10549v1 )

ライセンス: Link先を確認

Yuanhao Gong

(参考訳) 言語モデルには数十億のパラメータが含まれており、テキストには数千のトークンがあるため、大規模な言語モデルの訓練とデプロイには大量の計算資源が必要である。もう一つの問題は、大きな言語モデルが静的であることだ。トレーニングプロセス後に修正される。本稿では,これらの問題に対処するために,計算性能が高く,コンピュータネットワークに分散したブロックチェーン上での動的大規模言語モデルのトレーニングと展開を提案する。ブロックチェーンはセキュアで分散化された透明なシステムであり、仲介者不要のトランザクションのためのタンパー保護台帳の作成を可能にする。動的大規模言語モデルは、トレーニングプロセス後にユーザの入力から継続的に学習することができる。我々の手法は,大規模言語モデルを開発するための新しい方法を提供し,次世代人工知能システムに光を当てる。

Training and deploying large language models requires a large amount of computational resources because the language models contain billions of parameters and the text has thousands of tokens. Another problem is that large language models are static: they are fixed after the training process. To tackle these issues, in this paper, we propose to train and deploy dynamic large language models on blockchains, which have high computation performance and are distributed across a network of computers. A blockchain is a secure, decentralized, and transparent system that allows for the creation of a tamper-proof ledger for transactions without the need for intermediaries. The dynamic large language models can continuously learn from user input after the training process. Our method provides a new way to develop large language models and also sheds light on next-generation artificial intelligence systems.

翻訳日:2023-07-21 14:58:32 公開日:2023-07-20

# TREA:会話レコメンデーションのための木構造推論スキーマ

TREA: Tree-Structure Reasoning Schema for Conversational Recommendation ( http://arxiv.org/abs/2307.10543v1 )

ライセンス: Link先を確認

Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen

(参考訳) 対話レコメンデーションシステム(CRS)は,対話を通じてユーザの動的興味をタイムリーに追跡し,項目レコメンデーションに対する関連応答を生成することを目的としている。近年,会話コンテキストの理解を深めるため,様々な外部知識基盤(特に知識グラフ)がCRSに組み込まれている。しかし、近年の推論モデルでは、因果関係推論のための線形構造や固定階層構造などの簡素な構造に大きく依存しており、外部知識を持つ発話間の洗練された関係を完全には理解できない。そこで本研究では,TREA という新しいツリー構造 schEmA を提案する。 TREAは、言及されたエンティティ間の因果関係を明らかにするための推論構造として多階層的スケーラブルツリーを構築し、過去の会話を十分に活用し、推奨された結果に対してより合理的で適切な応答を生成する。 2つの公開CRSデータセットに対する大規模な実験は、我々のアプローチの有効性を実証した。

Conversational recommender systems (CRS) aim to trace the dynamic interests of users in a timely manner through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) have been incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.

翻訳日:2023-07-21 14:58:19 公開日:2023-07-20

# 非二項安定化符号からのナラインCFT

Narain CFTs from nonbinary stabilizer codes ( http://arxiv.org/abs/2307.10581v1 )

ライセンス: Link先を確認

Yasin Ferdous Alam, Kohki Kawabata, Tatsuma Nishioka, Takuya Okuda and Shinichiro Yahagi

(参考訳) 我々は、qudit安定化符号からのナライン共形場理論(CFT)の構成を、素数冪位数の有限体($\mathbb{F}_{p^m}$、$p$ は素数、$m\geq 1$)上、または環 $\mathbb{Z}_k$($k>1$)上の量子安定化符号からの構成へと一般化する。我々の構成は有理 CFT を与え、これは以前の構成よりも、ナライン CFT のモジュライ空間のより大きな点集合をカバーする。また、論理量子ビット数が非ゼロの量子安定化符号と、ナライン CFT の有限集合との対応も提案する。この対応を、よく知られた安定化符号を用いて例示する。

We generalize the construction of Narain conformal field theories (CFTs) from qudit stabilizer codes to the construction from quantum stabilizer codes over the finite field of prime power order ($\mathbb{F}_{p^m}$ with $p$ prime and $m\geq 1$) or over the ring $\mathbb{Z}_k$ with $k>1$. Our construction results in rational CFTs, which cover a larger set of points in the moduli space of Narain CFTs than the previous one. We also propose a correspondence between a quantum stabilizer code with non-zero logical qubits and a finite set of Narain CFTs. We illustrate the correspondence with well-known stabilizer codes.

翻訳日:2023-07-21 14:50:33 公開日:2023-07-20

# 中国沖海霧予測のためのインテリジェントモデル

Intelligent model for offshore China sea fog forecasting ( http://arxiv.org/abs/2307.10580v1 )

ライセンス: Link先を確認

Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang

(参考訳) 海洋経済活動と沿岸経済活動の効果的管理には,海霧の正確な時間的予測が重要である。海霧の複雑な性質と固有の変動を考えると、従来の数値および統計的予測法は不適切であることがしばしば証明される。本研究の目的は,yre(yangtze river estuary)沿岸地域を事例として,数値気象予測モデルに組み込んだ高度海霧予測手法の開発である。機械学習モデルをトレーニングする前に,タイムラグ相関分析手法を用いて主要な予測要因を同定し,海霧の発生を誘発するメカニズムを解明した。さらに,不均衡データ問題に対処するためにアンサンブル学習と焦点損失関数を実装し,モデルの予測能力を高める。本手法の精度を検証するため,気象観測と過去の予測の両方を含む1年にわたる包括的データセットを用いて,その性能を評価する。驚くべきことに、機械学習に基づくアプローチは、気象研究と非静水型メソスケールモデル(wrf-nmm)と、アメリカ海洋大気庁(noaa)予測システム研究所(fsl)が開発したアルゴリズムの2つの従来の手法の予測性能を上回っている。具体的には,60時間のリードタイムで1km以下の可視性を有する海霧の予測において,検出確率(pod)を増加させ,同時に誤警報率(far)を低下させることにより,優れた結果を得る。

Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods often prove inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, in regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).
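The two verification scores named at the end of the abstract have standard definitions in forecast verification: POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). The sketch below computes both from paired binary observations and forecasts. The function name and the choice to return `None` when a denominator is zero are assumptions made for this illustration.

```python
def pod_and_far(observed, forecast):
    """Probability of detection and false alarm ratio for binary fog events.

    observed, forecast : equal-length sequences of booleans, where True means
    fog (visibility at or below the chosen threshold) occurred / was predicted.
    Returns (pod, far); an entry is None if its denominator is zero.
    """
    hits = misses = false_alarms = 0
    for obs, fc in zip(observed, forecast):
        if obs and fc:
            hits += 1          # fog occurred and was forecast
        elif obs and not fc:
            misses += 1        # fog occurred but was not forecast
        elif fc and not obs:
            false_alarms += 1  # fog was forecast but did not occur
    pod = hits / (hits + misses) if (hits + misses) else None
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else None
    return pod, far
```

Because fog events are rare, a forecaster that never predicts fog has high overall accuracy but a POD of zero, which is why POD and FAR are reported together rather than plain accuracy.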

翻訳日:2023-07-21 14:50:14 公開日:2023-07-20

# 多目的フェデレーション学習によるSecureBoostハイパーパラメータチューニング

SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning ( http://arxiv.org/abs/2307.10579v1 )

ライセンス: Link先を確認

Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Tao Fan, Yongxin Tong and Qiang Yang

(参考訳) SecureBoostは、準同型暗号化を活用して、垂直連邦学習環境でデータのプライバシを保護するツリーブースティングアルゴリズムである。金融や医療などの分野では、解釈可能性、有効性、プライバシー保護能力によって広く利用されている。しかしSecureBoostは、高い計算複雑性とラベルリークのリスクに悩まされている。 SecureBoostの潜在能力を最大限活用するためには、SecureBoostのハイパーパラメータを慎重に選択して、ユーティリティ、効率、プライバシの最適なバランスをとる必要がある。既存の手法では経験的あるいはヒューリスティックにハイパーパラメータを設定するが、それらは最適とはほど遠い。このギャップを埋めるために、制約付きマルチオブジェクトセキュアBoost(CMOSB)アルゴリズムを提案し、各ソリューションがユーティリティ損失、トレーニングコスト、プライバシリークの間の最適なトレードオフを達成するためのハイパーパラメータのセットである、Pareto最適解を見つける。 3つの目的の測定を設計する。特に,提案したインスタンスクラスタリング攻撃を用いて,プライバシリークを測定する。実験により、CMOSBはベースラインよりも優れたハイパーパラメータを得るだけでなく、FL参加者のフレキシブルな要求を満たすための最適なハイパーパラメータセットも得られることが示された。

SecureBoost is a tree-boosting algorithm leveraging homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and risk of label leakage. To harness the full potential of SecureBoost, its hyperparameters should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods set hyperparameters either empirically or heuristically, which is far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements of the three objectives. In particular, the privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that the CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.
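To illustrate what "Pareto optimal" means for the three objectives above, the following sketch filters a set of already-evaluated hyperparameter configurations down to the non-dominated ones. It shows only the definition of Pareto optimality, not the CMOSB search procedure itself. The configuration names and objective values in the usage note are made up for the example.

```python
def dominates(a, b):
    """True if objective vector a is at least as good as b everywhere
    and strictly better somewhere (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the non-dominated configurations.

    candidates : dict mapping a configuration name to a tuple
                 (utility_loss, training_cost, privacy_leakage)
    """
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives) for other in candidates.values())
    }
```

For example, with hypothetical candidates `{"A": (0.10, 5.0, 0.30), "B": (0.12, 3.0, 0.25), "C": (0.15, 6.0, 0.35)}`, configuration C is dropped because A is better on every objective, while A and B both remain since each wins on at least one objective. A participant can then pick from the front according to its own priorities.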

翻訳日:2023-07-21 14:49:50 公開日:2023-07-20

# ethosight:文脈ラベル親和性メトリクスと推論に基づく反復学習を用いたニュアンス知覚のための共同埋め込みシステム

Ethosight: A Joint-Embedding Based System for Nuanced Perception Using Contextual Label Affinity Metric and Reasoning Based Iterative Learning ( http://arxiv.org/abs/2307.10577v1 )

ライセンス: Link先を確認

Hugo Latapie, Kristinn R. Thorisson, Shan Yu, Vahagn Petrosyan, Patrick Hammer, Pei Wang, Brandon Kynoch, Hanning Chen, Tangrui Li

(参考訳) 従来のコンピュータビジョンモデルは、データ取得と検証、特に微妙な行動のニュアンスやイベントを検出するために、広範囲な手作業を必要とする。日常的な買い物と潜在的な万引きを区別するといった、現実世界のアプリケーションにおける潜在的なリスクとルーチンの振る舞いを区別することの難しさは、さらにプロセスを複雑にする。本稿では,新しいゼロショットコンピュータビジョンアルゴリズムであるethosightを提案する。 ethosightは、ユーザの要求と関心のセマンティックな知識に基づいたクリーンなスレートから始まり、既存のシンボル知識の必要性を根絶する。局所ラベル親和性計算と推論誘導反復学習ループを用いて、Ethosightはシーンの詳細を推測し、ラベルセットを反復的に洗練する。推論メカニズムは、GPT4のような大きな言語モデル、OpenNARSのようなシンボリック推論、ハイブリッドシステムから派生することができる。 Ethosightは、事前訓練されたマルチモーダルモデルであるImageBindの機能をさらに活用し、数サイクルで画像の正確なセマンティック知識を生成する。明示的要素とニュアンス的要素の両方を効率的にキャプチャする。また、Korzybskiの"タイムバインディング"の概念をマシンで実装し、世代別学習とデプロイメント間の知識共有を可能にします。以上の結果から,ethosightは40の複雑なユースケースにまたがる有効性を示す。それは、新しい関心領域を識別する特別な能力を示し、1000のセットから上位5レーベルで常に高い親和性スコアを生成している。さまざまな環境で実施されたテストは、ethosightの堅牢なパフォーマンスを証明している。本論文の本体内における詳細な結果とケーススタディと付録は,微妙でニュアンスな動作の検出と抽出において,コンピュータビジョンモデルの適応性とレジリエンスを高めるための有望な軌道を示すものである。

Traditional computer vision models often require extensive manual effort for data acquisition and validation, particularly when detecting subtle behavioral nuances or events. The difficulty in distinguishing routine behaviors from potential risks in real-world applications, like differentiating routine shopping from potential shoplifting, further complicates the process. We present Ethosight, a novel zero-shot computer vision algorithm. Ethosight eradicates the need for pre-existing symbolic knowledge, initiating from a clean slate based on user requirements and semantic knowledge of interest. Using localized label affinity calculations and a reasoning-guided iterative learning loop, Ethosight infers scene details and iteratively refines the label set. Reasoning mechanisms can be derived from large language models like GPT4, symbolic reasoners like OpenNARS, or hybrid systems. Ethosight further capitalizes on the capabilities of a pre-trained multi-modal model, ImageBind, generating accurate semantic knowledge of images within a few cycles. It successfully captures both explicit and nuanced elements efficiently. We also introduce the implementation of Korzybski's "time-binding" concept in machines, which allows for generational learning and knowledge sharing across deployments. Our evaluations demonstrate Ethosight's efficacy across 40 complex use cases. It has exhibited an exceptional ability to discern new areas of interest, consistently generating high-affinity scores within the top five labels from a set of a thousand. Tests conducted across diverse environments attest to Ethosight's robust performance. Detailed results and case studies within the main body of this paper and an appendix underscore a promising trajectory towards enhancing the adaptability and resilience of computer vision models in detecting and extracting subtle and nuanced behaviors.

翻訳日:2023-07-21 14:49:30 公開日:2023-07-20

# プロトタイプ正規化による連合学習収束の促進

Boosting Federated Learning Convergence with Prototype Regularization ( http://arxiv.org/abs/2307.10575v1 )

ライセンス: Link先を確認

Yu Qiao, Huy Q. Le, Choong Seon Hong

(参考訳) 分散機械学習技術として、フェデレートラーニング(FL)では、クライアントがローカルデータをリークすることなく、エッジサーバで共有モデルを共同でトレーニングする必要がある。しかし、クライアント間での不均一なデータ分散は、しばしばモデルの性能を低下させる。そこで本研究では,データ分布の不均一性に対処するプロトタイプベースの正規化戦略を提案する。具体的には、正規化プロセスでは、サーバが分散クライアントからローカルプロトタイプを集約してグローバルプロトタイプを生成し、それを個々のクライアントに送信して、ローカルトレーニングをガイドする。 MNISTとFashion-MNISTの実験結果から,最も人気のあるベースラインであるFedAvgと比較して平均テスト精度は3.3%,8.9%向上した。さらに,本手法は不均一な環境での収束速度が速い。

As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
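The prototype exchange described above can be sketched in a few lines. This is a simplified illustration under assumptions, not the authors' method: a prototype is taken to be the per-class mean of embedding vectors, the server averages client prototypes without weighting, and the regularizer is a squared Euclidean distance to the global prototype. All function names are hypothetical.

```python
def class_prototypes(embeddings, labels):
    """Client side: mean embedding per class over the local data."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def aggregate_prototypes(client_prototypes):
    """Server side: average each class prototype over the clients that hold
    that class (unweighted mean, an assumption of this sketch)."""
    collected = {}
    for prototypes in client_prototypes:
        for label, proto in prototypes.items():
            collected.setdefault(label, []).append(proto)
    return {
        label: [sum(dim) / len(protos) for dim in zip(*protos)]
        for label, protos in collected.items()
    }


def prototype_penalty(embedding, label, global_prototypes):
    """Regularization term: squared distance between a local embedding and
    the global prototype of its class, added to the local training loss."""
    proto = global_prototypes[label]
    return sum((e - p) ** 2 for e, p in zip(embedding, proto))
```

Only prototypes, not raw samples, leave the client in this scheme, and the penalty pulls every client's feature space toward a shared per-class target, which is the intuition behind using it against heterogeneous data.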

翻訳日:2023-07-21 14:48:56 公開日:2023-07-20

# オンライン深層強化学習による建設作業とキャッシュフローの最適化のための資源フローの適応制御

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning ( http://arxiv.org/abs/2307.10574v1 )

ライセンス: Link先を確認

Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma

(参考訳) 建設作業、資源、キャッシュフローの複雑さとダイナミクスのために、それらの管理の貧弱さは、通常、時間とコストのオーバーラン、破産、さらにはプロジェクトの失敗につながる。既存の手法では不確実性のある動的環境における資源フローの最適制御を達成できなかった。そこで本稿では,建設プロジェクトの作業とキャッシュフローを最適化するために,資源フローを適応的に制御するモデルと手法を提案する。まず, 部分観測可能なマルコフ決定過程に基づく数理モデルを確立し, 建設作業, 資源, キャッシュフローの複雑な相互作用, 多様な影響因子の不確実性と変動を定式化する。一方、最適解を効率的に見つけるために、労働と物質フローの適応的最適制御を実現するために、深層強化学習(DRL)に基づく手法を導入し、作業とキャッシュフローを最適化する。 drlのトレーニングプロセスを支援するために、プロジェクトの動的特徴と外部環境を模倣するために、離散イベントシミュレーションに基づくシミュレータも開発されている。シミュレーション実験により,提案手法がバニラ経験的手法と遺伝的アルゴリズムを上回り,多様なプロジェクトや外部環境において顕著な能力を有し,drlと経験的手法のハイブリッドエージェントが最良の結果をもたらすことを示した。本稿では,共同作業,資源,キャッシュフローの適応制御と最適化に寄与し,建設プロジェクト管理におけるDRL技術導入の一歩となる可能性がある。

Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, and even project failure. Existing approaches in construction have failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as the uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm, possesses remarkable capability in diverse projects and external environments, and that a hybrid agent of DRL and empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.

翻訳日:2023-07-21 14:48:41 公開日:2023-07-20

# 無効論理と等価なゲイン:言語モデルのプロンプトにおける推論の奇妙な性質

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting ( http://arxiv.org/abs/2307.10573v1 )

ライセンス: Link先を確認

Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo

(参考訳) 言語モデルは、パフォーマンスを大幅に向上させる方法で問題を通じて推論するよう促すことができる。しかし、このようなプロンプトがなぜパフォーマンスを改善するのかは明らかではない。最近の研究では、論理的に無効な(invalid) chain-of-thought (CoT) プロンプトを用いても、論理的に有効な(valid) CoTプロンプトとほぼ同程度にパフォーマンスが向上し、CoTプロンプトを編集して問題固有の情報を抽象的な情報や分布外の情報に置き換えても、通常は性能が損なわれないことが示された。批評家は、これらの発見は意味のある結論を導き出すにはあまりにも少なく、かつ簡単すぎるタスクに基づいていると答えている。この論争を解決するために、論理的に無効なCoTプロンプトが、BIG-Bench Hard(BBH)と呼ばれるBIG-Benchベンチマークの最も難しいタスクにおいて、論理的に有効なプロンプトと同じレベルのパフォーマンスゲインを提供するかどうかをテストする。論理的に無効な推論プロンプトは、BBHタスクにおいて、論理的に有効な推論プロンプトと同様のパフォーマンスゲインを確かに達成する。また、先行研究で使われたCoTプロンプトの一部に論理的なエラーが含まれていることもわかった。これは、論理的に妥当な推論以外の共変量がパフォーマンス改善の要因であることを示唆している。

Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easy tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.

翻訳日:2023-07-21 14:48:15 公開日:2023-07-20

# 多粒子系における熱カシミール相互作用:散乱チャネルアプローチ

Thermal Casimir interactions in multi-particle systems: scattering channel approach ( http://arxiv.org/abs/2307.10570v1 )

ライセンス: Link先を確認

Yang Li, Kimball A. Milton, Iver Brevik

(参考訳) 多粒子熱カシミール相互作用は、主にカシミールエントロピーの観点から、多重散乱過程に基づく視点から研究されている。散乱経路の幾何学を詳細に記述し, 横流路, 縦流路, 混合流路など, 異なる種類の流路からの寄与を示す。経路の幾何学は経路内の各チャネルの重みに大きな影響を与える。ネガティリティと非単調性は、多粒子カシミールエントロピーにおいて一般的に見られ、その源は、経路の幾何、偏光混合の種類、各粒子の分極性など多様である。多粒子散乱による熱的寄与は系において重要であるが、ゼロ温度の多粒子散乱効果は重要ではない。多粒子配置から連続体への挙動の制限を簡潔に検討する。

Multi-particle thermal Casimir interactions are investigated, mostly in terms of the Casimir entropy, from the point of view based on multiple-scattering processes. The geometry of the scattering path is depicted in detail, and the contributions from different types of channels, namely the transverse, longitudinal and mixing channels, are demonstrated. The geometry of the path can strongly influence the weight of each channel in the path. Negativity and nonmonotonicity are commonly seen in the multi-particle Casimir entropy, the sources of which are diverse, including the geometry of the path, the types of polarization mixing, the polarizability of each particle, etc. Thermal contributions from multi-particle scatterings can be significant in the system, while the zero-temperature multi-particle scattering effects are insignificant. Limiting behaviors from a multi-particle configuration to a continuum are briefly explored.

翻訳日:2023-07-21 14:47:47 公開日:2023-07-20

# 欺瞞的アライメントモニタリング

Deceptive Alignment Monitoring ( http://arxiv.org/abs/2307.10569v1 )

ライセンス: Link先を確認

Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

(参考訳) 大規模な機械学習モデルの能力が拡大し続け、そのようなモデルに与えられる自律性が拡大するにつれて、新たな敵対者、すなわちモデルそのものの脅威が浮かび上がってくる。モデルが一見合理的に振る舞いながら、裏の理由から密かにかつ巧妙にその振る舞いを変えるという脅威は、AIセーフティ&アライメントのコミュニティにおいて、しばしば欺瞞的アライメントと呼ばれる。そこで我々は、この新たな方向を欺瞞的アライメントモニタリング(Deceptive Alignment Monitoring)と呼ぶ。本研究では、近い将来、欺瞞的アライメントモニタリングにとってますます重要となり相互に絡み合うと考えられる、多様な機械学習サブフィールドにおける新たな方向性を特定し、これらの分野の進歩が長期的な課題と新たな研究機会の両方をもたらすと論じる。最後に、これらの新たな方向性への敵対的機械学習コミュニティのさらなる関与を提唱する。

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.

翻訳日:2023-07-21 14:47:31 公開日:2023-07-20

# No-frills Temporal Video Grounding:マルチスケール隣りの注意とズームイン境界検出

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection ( http://arxiv.org/abs/2307.10567v1 )

ライセンス: Link先を確認

Qi Zhang and Sipeng Zheng and Qin Jin

(参考訳) 時間的ビデオグラウンディング(TVG)は、未編集のビデオから言語クエリに対応する時間間隔を取得することを目的としている。TVGにおける重要な課題は「Semantic Noise Ratio(SNR)」の低さであり、SNRが低いほど性能が低下する。先行研究はこの課題に洗練された技術を用いて対処した。本稿では,マルチスケール隣接アテンションとズームイン境界検出という2つのコアモジュールからなる、飾り気のない(no-frills)TVGモデルを提案する。マルチスケール隣接アテンションは、各ビデオトークンが隣接トークンからの視覚的コンテキストのみを集約するように制限し、ノイズの割合が高い入力からマルチスケール特徴階層によって最も識別性の高い情報を抽出できるようにする。ズームイン境界検出は、きめ細かい接地調整のために、選択された上位候補の局所的判別に焦点を当てる。エンド・ツー・エンドのトレーニング戦略により、我々のモデルは異なるTVGベンチマーク上で競争力のある性能を達成すると同時に、軽量なアーキテクチャにより、より高速な推論速度と少ないモデルパラメータという利点も備えている。

Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbor, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also having the advantage of faster inference speed and lighter model parameters, thanks to its lightweight architecture.

翻訳日:2023-07-21 14:47:16 公開日:2023-07-20

# SCA-PVNet: 3Dオブジェクト検索のためのポイントクラウドとマルチビューの自己組織化に基づくアグリゲーション

SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval ( http://arxiv.org/abs/2307.10601v1 )

ライセンス: Link先を確認

Dongyun Lin, Yi Cheng, Aiyuan Guo, Shangbo Mao, Yiqun Li

(参考訳) 3dオブジェクトの検索に対処するため、ボクセル、ポイントクラウド、マルチビュー画像など、単一のモダリティで表現された3dオブジェクトの高度に識別可能な記述子を生成するための努力がなされている。 3dオブジェクトのマルチモダリティ表現からの補完情報を活用し、検索性能をさらに向上させることを約束する。しかし,大規模データセットを用いた多モード3Dオブジェクト検索はめったに行われない。本稿では,3次元オブジェクト検索のための点雲と多視点画像(SCA-PVNet)の自己組織化に基づくアグリゲーションを提案する。点群と多視点画像から深い特徴を抽出し,機能融合を効果的に行うために,インモダリティアグリゲーションモジュール (imam) とクロスモダリティアグリゲーションモジュール (cmam) という2種類の機能アグリゲーションモジュールを設計した。 IMAMはセルフアテンションメカニズムを利用してマルチビュー機能を集約し、CMAMはクロスアテンションメカニズムを利用してポイントクラウド機能をマルチビュー機能と相互作用する。オブジェクト検索のための3Dオブジェクトの最終記述子は、両方のモジュールから集約された特徴を連結することで得られる。提案手法よりもSCA-PVNetの方が優れていることを示すため,小規模から大規模までの3つのデータセットを用いて実験と解析を行った。

To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrieval is rarely developed and analyzed on large-scale datasets. In this paper, we propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features while CMAM exploits a cross-attention mechanism to interact point cloud features with multi-view features. The final descriptor of a 3D object for object retrieval can be obtained via concatenating the aggregated features from both modules. Extensive experiments and analysis are conducted on three datasets, ranging from small to large scale, to show the superiority of the proposed SCA-PVNet over the state-of-the-art methods.

翻訳日:2023-07-21 14:41:33 公開日:2023-07-20

# AIの課題と解決策

Challenges and Solutions in AI for All ( http://arxiv.org/abs/2307.10600v1 )

ライセンス: Link先を確認

Rifat Ara Shams, Didar Zowghi, Muneera Bano

(参考訳) ai(artificial intelligence)の広汎な存在と多様性は、公正、信頼、透明性のための設計において多様性と排他性(d&i)の原則を必要とする。しかし、これらの考察はしばしば見過ごされ、バイアス、差別、信頼できないという問題に繋がる。そこで我々は,aiにおけるd&iに関する課題と解決策を体系的に検討した。当社の厳密な検索の結果、2017年から2022年の間に48の論文が公開された。これらの論文のオープンコーディングでは、55の独特な課題と33のソリューション、24の独特な課題、23のソリューションがAIを使用してそのようなプラクティスを強化する。この研究は、これらの問題をより深く理解することで、これらの原則を将来のAIシステムに統合しようとする研究者や実践者に啓蒙する。

Artificial Intelligence (AI)'s pervasive presence and variety necessitate diversity and inclusivity (D&I) principles in its design for fairness, trust, and transparency. Yet, these considerations are often overlooked, leading to issues of bias, discrimination, and perceived untrustworthiness. In response, we conducted a Systematic Review to unearth challenges and solutions relating to D&I in AI. Our rigorous search yielded 48 research articles published between 2017 and 2022. Open coding of these papers revealed 55 unique challenges and 33 solutions for D&I in AI, as well as 24 unique challenges and 23 solutions for enhancing such practices using AI. This study, by offering a deeper understanding of these issues, will enlighten researchers and practitioners seeking to integrate these principles into future AI systems.

翻訳日:2023-07-21 14:41:07 公開日:2023-07-20

# アンサンブル学習に基づくベイジアンハイパーパラメータ感度分析によるIoTサイバーセキュリティの異常検出

Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis ( http://arxiv.org/abs/2307.10596v1 )

ライセンス: Link先を確認

Tin Lai, Farnaz Farid, Abubakar Bello, Fariza Sabrina

(参考訳) IoT(Internet of Things)は、世界中の何十億ものインテリジェントデバイスを統合し、人間の介入なしに他の接続デバイスと通信する能力を持つ。 IoTはデータアグリゲーションと分析を大規模に実現し、多くのドメインのライフクオリティを改善する。特にiotが収集するデータには、異常検出のための膨大な情報が含まれている。 IoTの異質な性質は、サイバーセキュリティの課題と機会の両方である。サイバーセキュリティ監視における従来のアプローチでは、さまざまなデータ型に対するさまざまなデータの前処理と処理が必要になることが少なくない。しかし、ヘテロジニアスタイプのネットワークデバイスは、単一のタイプのデバイス読み出しよりも、より多様な信号セットをキャプチャすることが多く、特に異常検出に有用である。本稿では,異常検出によるIoTサイバーセキュリティ向上のためのアンサンブル機械学習手法に関する総合的研究を行う。 1つの機械学習モデルを使用するのではなく、アンサンブル学習は複数のモデルからの予測力を組み合わせ、単一の機械学習モデルを使用するのではなく、異種データセットでの予測精度を高める。複数のIoTセンサを内蔵したネットワーク環境に適応するために,ベイジアンハイパーパラメータ最適化を利用したアンサンブル学習フレームワークを提案する。実験では,従来の手法と比較して高い予測力を示す。

The Internet of Things (IoT) integrates billions of intelligent devices around the globe that can communicate with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than readings from a single type of device, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than using one single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing predictive accuracy on heterogeneous datasets. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate the framework's high predictive power when compared to traditional methods.

翻訳日:2023-07-21 14:40:53 公開日:2023-07-20

# 最適マルチエージェントベイズ分散推定のための構造の利用

Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation ( http://arxiv.org/abs/2307.10594v1 )

ライセンス: Link先を確認

Christopher Funk, Ofer Dagan, Benjamin Noack and Nisar R. Ahmed

(参考訳) ベイズ分散データ融合における重要な課題は、以前送信されたデータが送信元に循環する‘噂の伝播’あるいは‘二重カウント’現象である。これはしばしば、境界を計算するために見積もりの重み付け平均を取る共分散交叉(英語版)(ci)のような近似的な方法によって対処される。問題は、この境界がタイトではないこと、すなわち、見積もりがしばしば保存的すぎることである。本稿では,マルチエージェント分散核融合問題における確率的独立構造を生かして,より密接な境界を求めることができることを示す。 i) 元のCIの1つの(モノリシックな)因子ではなく複数の(非モノリシックな)重み付け因子を使用するCIアルゴリズムの拡張。 (ii)最適境界を計算し、任意の依存関係構造を完全に活用できる一般最適化スキーム。我々は,本手法を比較し,簡単な問題に対して同じ解に収束することを示す。次に, 大規模目標追跡シミュレーションを用いて新しい非モノリシックciアルゴリズムをテストし, 従来のモノリシックciよりも厳密なバウンドと正確な推定を実現することを示す。

A key challenge in Bayesian decentralized data fusion is the `rumor propagation' or `double counting' phenomenon, where previously sent data circulates back to its sender. It is often addressed by approximate methods like covariance intersection (CI) which takes a weighted average of the estimates to compute the bound. The problem is that this bound is not tight, i.e. the estimate is often over-conservative. In this paper, we show that by exploiting the probabilistic independence structure in multi-agent decentralized fusion problems a tighter bound can be found using (i) an expansion to the CI algorithm that uses multiple (non-monolithic) weighting factors instead of one (monolithic) factor in the original CI and (ii) a general optimization scheme that is able to compute optimal bounds and fully exploit an arbitrary dependency structure. We compare our methods and show that on a simple problem, they converge to the same solution. We then test our new non-monolithic CI algorithm on a large-scale target tracking simulation and show that it achieves a tighter bound and a more accurate estimate compared to the original monolithic CI.
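
The original (monolithic) covariance intersection mentioned above can be sketched as follows. This is a minimal illustration of the standard textbook CI rule with a single weight `omega`, not the non-monolithic algorithm the paper proposes. The helper names (`inv2`, `covariance_intersection`, `best_omega`), the restriction to 2x2 covariances, and the grid search over `omega` are assumptions made for illustration.

```python
# Minimal sketch of monolithic covariance intersection (CI) for 2-D estimates.
# Assumption: plain nested lists stand in for 2x2 matrices to stay dependency-free.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def weighted_sum(w1, m1, w2, m2):
    """Element-wise w1*m1 + w2*m2 for 2x2 matrices."""
    return [[w1 * m1[i][j] + w2 * m2[i][j] for j in range(2)] for i in range(2)]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(xa, Pa, xb, Pb, omega):
    """Fuse (xa, Pa) and (xb, Pb) with one scalar weight omega in [0, 1].

    P^-1 = omega * Pa^-1 + (1 - omega) * Pb^-1
    x    = P (omega * Pa^-1 xa + (1 - omega) * Pb^-1 xb)
    """
    Ia, Ib = inv2(Pa), inv2(Pb)
    P = inv2(weighted_sum(omega, Ia, 1.0 - omega, Ib))
    ya, yb = matvec(Ia, xa), matvec(Ib, xb)
    y = [omega * ya[i] + (1.0 - omega) * yb[i] for i in range(2)]
    return matvec(P, y), P

def best_omega(Pa, Pb, steps=100):
    """Grid search for the weight that minimizes the trace of the fused covariance."""
    def fused_trace(omega):
        _, P = covariance_intersection([0.0, 0.0], Pa, [0.0, 0.0], Pb, omega)
        return P[0][0] + P[1][1]
    return min((i / steps for i in range(steps + 1)), key=fused_trace)
```

Because a single `omega` scales every component of both information matrices, the fused bound cannot exploit any known independence between parts of the state. That limitation is what motivates using several weighting factors instead of one.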

翻訳日:2023-07-21 14:40:31 公開日:2023-07-20

# Event Blob Tracking: 非同期リアルタイムアルゴリズム

Event Blob Tracking: An Asynchronous Real-Time Algorithm ( http://arxiv.org/abs/2307.10593v1 )

ライセンス: Link先を確認

Ziwei Wang, Timothy Molloy, Pieter van Goor, Robert Mahony

(参考訳) イベントベースのカメラは、高時間分解能、低レイテンシ、高ダイナミックレンジのため、動きの速い物体を追跡するために人気が高まっている。本稿では,生のイベントをリアルタイムで非同期に追跡する新しいアルゴリズムを提案する。本稿では,イベントブロブの概念を,条件空間の確率がブロブ様である事象発生の時空間的確率として導入する。多くの現実世界のオブジェクトはイベントブロブデータを生成する。例えば、車のヘッドライトや、静的あるいはゆっくりと変化する背景に対して動く小さなフォアグラウンドオブジェクトなどのLEDを点滅させる。提案アルゴリズムは、カルマンフィルタと結合してイベントブロブ状態を追跡するために、データアソシエーションの動的しきい値を持つ近傍分類器を用いる。提案手法は,照明条件や高速動作においても高精度なトラッキングとイベントブロブ形状推定を実現する。マイクロ秒の時間分解は、フィルタ出力が接触時間や距離推定などの二次情報を引き出すことができ、自動運転における衝突回避のような現実世界の問題に応用できることを意味する。

Event-based cameras have become increasingly popular for tracking fast-moving objects due to their high temporal resolution, low latency, and high dynamic range. In this paper, we propose a novel algorithm for tracking event blobs using raw events asynchronously in real time. We introduce the concept of an event blob as a spatio-temporal likelihood of event occurrence where the conditional spatial likelihood is blob-like. Many real-world objects generate event blob data, for example, flickering LEDs such as car headlights or any small foreground object moving against a static or slowly varying background. The proposed algorithm uses a nearest neighbour classifier with a dynamic threshold criterion for data association, coupled with a Kalman filter to track the event blob state. Our algorithm achieves highly accurate tracking and event blob shape estimation even under challenging lighting conditions and high-speed motions. The microsecond time resolution achieved means that the filter output can be used to derive secondary information such as time-to-contact or range estimation, which will enable applications to real-world problems such as collision avoidance in autonomous driving.
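
The combination of nearest-neighbour association, a dynamic threshold, and a Kalman filter can be illustrated with a heavily simplified sketch. It is not the paper's algorithm. The assumptions are scalar blob positions, one prediction step per incoming event, and a gate that scales with the track's current uncertainty. The function name `track_blobs` and the parameters `q`, `r`, and `gate_sigmas` are hypothetical.

```python
# Simplified sketch: per-event nearest-neighbour association with an
# uncertainty-dependent gate, followed by a scalar Kalman update.
# Assumption: each track is a mutable [position, variance] pair.

def track_blobs(events, tracks, q=0.01, r=0.1, gate_sigmas=3.0):
    """Process events one at a time; return the events rejected by the gate.

    events: iterable of scalar measurements (e.g. a pixel coordinate)
    tracks: list of [x, p] pairs, updated in place
    q, r:   process and measurement noise variances (illustrative values)
    """
    rejected = []
    for z in events:
        # Predict: a static-position model only inflates the variance.
        for t in tracks:
            t[1] += q
        # Associate: pick the track whose position is closest to the event.
        nearest = min(tracks, key=lambda t: abs(z - t[0]))
        # Dynamic threshold: the gate widens when the track is uncertain.
        gate = gate_sigmas * (nearest[1] + r) ** 0.5
        if abs(z - nearest[0]) > gate:
            rejected.append(z)
            continue
        # Kalman update of the associated track only.
        k = nearest[1] / (nearest[1] + r)
        nearest[0] += k * (z - nearest[0])
        nearest[1] *= (1.0 - k)
    return rejected
```

Processing each event as it arrives, rather than accumulating frames, mirrors the asynchronous style the abstract describes. A realistic tracker would use 2-D state with velocity and event timestamps.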

翻訳日:2023-07-21 14:40:10 公開日:2023-07-20

# 自律走行システムの試験と改善のための境界状態生成

Boundary State Generation for Testing and Improvement of Autonomous Driving Systems ( http://arxiv.org/abs/2307.10590v1 )

ライセンス: Link先を確認

Matteo Biagiola, Paolo Tonella

(参考訳) 近年のディープニューラルネットワーク(DNN)とセンサ技術の進歩により、自律運転システム(ADS)の自律性はますます高まっている。しかし、信頼度の評価は依然として重要な関心事である。最先端のADSテストアプローチでは、シミュレーション運転環境の制御可能な属性をADSが誤動作するまで変更する。このようなアプローチの主な欠点は、(1) シミュレーション環境の変更は、フィールド内テスト設定(例えば、道路形状の変更)に容易に転送できないこと、(2) ADSが成功した環境インスタンスは、ADSが誤動作する可能性のある隠れ運転条件を含む可能性があるにもかかわらず、破棄されることである。本稿では,広告評価のための新しいテスト生成装置であるgenbo (generator of boundary state pairs)を提案する。 GenBoは、障害のない環境インスタンスで収集されたエゴ車両の運転条件(位置、速度、方向)を変更し、同一環境における行動境界(すなわち、モデルが誤動作し始める場所)における挑戦運転条件を効率的に生成する。このような境界条件を用いて、初期トレーニングデータセットを拡張し、テスト中のDNNモデルを再訓練する。評価結果から,リトレーニングモデルでは,元のdnnモデルと比較して,評価トラックの別セットにおいて,最大16以上の成功率を示した。

Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to 16 higher success rate on a separate set of evaluation tracks with respect to the original DNN model.

翻訳日:2023-07-21 14:39:53 公開日:2023-07-20

# バッテリー電気自動車の充電行動予測:マイクロクラスタ化とsmote技術を用いたディープラーニングアプローチ

Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques ( http://arxiv.org/abs/2307.10588v1 )

ライセンス: Link先を確認

Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta

(参考訳) エネルギーシステム、気候変動、公衆衛生は、交通の電化に向かう主要な理由である。排出削減のため、世界各国で輸送電化が進められている。その結果、多くの自動車メーカーが間もなくバッテリー電気自動車(BEV)のみの製造を開始する。カリフォルニア州では、主に気候変動や大気汚染の懸念から、BEVの採用率が上昇している。気候や大気汚染の目標には最適だが、不適切に管理されたBEV充電は、充電インフラの不足や停電につながる可能性がある。本研究では,BEVの走行と充電のデータを学習し,BEVの充電イベントを予測するためのニューラルネットワークアルゴリズムであるMicro Clustering Deep Neural Network (MCDNN)を開発した。この予測は、電力負荷アグリゲータや電力事業者が充電ステーションと電力容量を効果的に提供するために不可欠な情報である。 MCDNNは、2015年から2020年にかけてカリフォルニア州で5車種132台のBEVから収集された、総走行距離1,570,167マイルに及ぶ堅牢な走行・充電データセットを用いて構成されている。数値的な結果から,提案手法は充電イベントの予測において、サポートベクターマシン,k近傍法,決定木,その他のニューラルネットワークモデルなど,この分野のベンチマーク手法よりも効果的であることが判明した。

Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning from BEV trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1,570,167 vehicle miles traveled. The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models, in predicting the charging events.

翻訳日:2023-07-21 14:39:31 公開日:2023-07-20

# NPTEL MOOCビデオにおける単語誤り率の差について

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos ( http://arxiv.org/abs/2307.10587v1 )

ライセンス: Link先を確認

Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee

(参考訳) 自動音声認識(ASR)システムは、音声言語をテキストに書き起こすように設計されており、音声アシスタントや文字起こしサービスを含む様々なアプリケーションで利用されている。しかし、優れたベンチマーク結果を示す最先端のASRシステムであっても、音声特性の違いにより、特定の地域や人口統計グループの話者に対しては性能が低下することが観察されている。本研究では、インドの人口統計の様々な層を代表する講師による約9.8K件の英語の技術講義とその書き起こしからなる、8740時間の大規模音声データセットのキュレーションについて述べる。このデータセットは、非常に人気のあるNPTEL MOOCプラットフォームから取得したものである。私たちは、キュレートされたデータセットを使用して、YouTubeの自動キャプションとOpenAI Whisperモデルの性能における格差を、インドの話者の多様な人口統計学的特性にわたって測定する。話者の性別、出身地域、年齢、発話速度による格差は存在するが、カーストに基づく格差は存在しない。また、講義の分野間でも統計的に有意な格差が観察された。これらの結果は、より包括的で堅牢なASRシステムと、その格差評価のためのより代表性の高いデータセットの必要性を示している。

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems, which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need for more inclusive and robust ASR systems and more representational datasets for disparity evaluation in them.
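
Word error rate (WER), the metric named in the title, is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below shows that standard definition and a per-group average that could be used to compare speaker groups. The function names, the lowercase-and-split normalisation, and the `(group, reference, hypothesis)` sample format are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of word error rate (WER) and a per-group mean,
# assuming whitespace tokenisation and case-insensitive matching.

def edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """WER = edit distance / number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)

def mean_wer_by_group(samples):
    """Average WER per group; samples are (group, reference, hypothesis) tuples."""
    scores = {}
    for group, reference, hypothesis in samples:
        scores.setdefault(group, []).append(wer(reference, hypothesis))
    return {group: sum(v) / len(v) for group, v in scores.items()}
```

A gap between two groups' mean WER is only a descriptive starting point. Deciding whether a disparity is statistically significant, as the abstract reports, requires a separate hypothesis test.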

翻訳日:2023-07-21 14:39:11 公開日:2023-07-20

# 機械学習システムの信頼性に関する全体論的評価

A Holistic Assessment of the Reliability of Machine Learning Systems ( http://arxiv.org/abs/2307.10586v1 )

ライセンス: Link先を確認

Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer

(参考訳) 機械学習(ml)システムは、医療、輸送、軍、国家安全保障などの高リスク設定に浸透するにつれて、信頼性に関する懸念が高まっている。顕著な進歩にもかかわらず、これらのシステムの性能は敵の攻撃や環境の変化によって著しく低下し、過度な予測、入力障害の検出の失敗、予期せぬシナリオで一般化できないことにつながる。本稿では,MLシステムの信頼性に関する総合評価手法を提案する。分散精度,分散シフトロバスト性,逆ロバスト性,キャリブレーション,分散検出の5つの特性を評価した。信頼性スコアも導入され、システム全体の信頼性を評価するために使用される。異なるアルゴリズムアプローチのパフォーマンスに関する洞察を提供するため,最先端技術を特定し,分類し,提案する信頼性指標と信頼性スコアを用いて実世界のタスクの選択を評価する。 500モデル以上のモデルを分析すると、あるメトリックに対する設計は必ずしも他のメトリックを制約するわけではないが、特定のアルゴリズム技術は複数のメトリクスの信頼性を同時に向上させることができることが分かる。この研究は、MLの信頼性をより包括的に理解し、将来の研究開発のロードマップを提供する。

As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques, then evaluate a selection on real-world tasks using our proposed reliability metrics and reliability score. Our analysis of over 500 models reveals that designing for one metric does not necessarily constrain others but certain algorithmic techniques can improve reliability across multiple metrics simultaneously. This study contributes to a more comprehensive understanding of ML reliability and provides a roadmap for future research and development.

翻訳日:2023-07-21 14:38:52 公開日:2023-07-20

# 拡散を経由する参照ベースの画家的インペインティング:野生参照ドメインギャップを横断する

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap ( http://arxiv.org/abs/2307.10584v1 )

ライセンス: Link先を確認

Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang

(参考訳) 絵に新しい物体を入れたらどうなるか想像したことがありますか? 例えば、クロード・モネ(claude monet)の『water lilies, evening effect』にバスケットボールを入れるとどうなるか? 本研究では,参照ドメインギャップを越え,新しいオブジェクトをアートワークに埋め込む新しいタスクであるPaterly Inpaintingを提案する。これまでの文献では, 対象と参照との間に大きな領域不一致を考慮せず, フォトリアリスティックな参照を用いて, 芸術的イメージを描けるように設計されている。本稿では,'inpaint more wildly'と呼ばれる新しい拡散フレームワークを提案する。画像条件付き拡散モデルを用いて構築され,塗布マスクで動作するラダーサイドブランチとマスク融合機構を導入する。 CLIPイメージの埋め込みを推論時に分解することで、セマンティックな情報とスタイル情報の強度を容易に操作できる。実験により,提案するrefpaintフレームワークが既存の手法よりもはるかに優れた結果をもたらすことを実証した。提案手法は,他の方法では達成し難い参照オブジェクトで絵を描くことができる。プロジェクトページ: https://vita-group.github.io/RefPaint/

Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's "Water Lilies, Evening Effect"? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to "inpaint more wildly" by taking such references with large domain gaps. Building on an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve. Project page: https://vita-group.github.io/RefPaint/

翻訳日:2023-07-21 14:38:31 公開日:2023-07-20

# パリティアーキテクチャのための構成的小冊子コンパイル

Constructive plaquette compilation for the parity architecture ( http://arxiv.org/abs/2307.10626v1 )

ライセンス: Link先を確認

Roeland ter Hoeven, Benjamin E. Niehoff, Sagar Sudhir Kale, Wolfgang Lechner

(参考訳) パリティコンパイル(parity compilation)は、パリティマッピングに必要な制約をローカルに配置する、という課題である。任意の高階最適化問題に対して,ラケットを用いたパリティアーキテクチャのための最初の構成的コンパイルアルゴリズムを提案する。これにより、プラーペットレイアウトをネイティブに実装できる断熱プロトコルと、完全に並列化されたデジタル回路が可能になる。アルゴリズムは格子の長方形のレイアウトを構築し、矩形の各層に少なくとも1つの制約を加える。中心となる考え方は、矩形の境界上の任意の量子ビットといくつかの新しい量子ビットからなる各制約は、アンシラを用いて決定的な手順でプラケットに分解できるということである。有効な制約セットの選択方法と、この分解の動作方法を示します。さらに、アシラ数を最適化し、追加の制約で最適化問題を実装する方法を示します。

Parity compilation is the challenge of laying out the required constraints for the parity mapping in a local way. We present the first constructive compilation algorithm for the parity architecture using plaquettes for arbitrary higher-order optimization problems. This enables adiabatic protocols, where the plaquette layout can natively be implemented, as well as fully parallelized digital circuits. The algorithm builds a rectangular layout of plaquettes, where in each layer of the rectangle at least one constraint is added. The core idea is that each constraint, consisting of any qubits on the boundary of the rectangle and some new qubits, can be decomposed into plaquettes with a deterministic procedure using ancillas. We show how to pick a valid set of constraints and how this decomposition works. We further give ways to optimize the ancilla count and show how to implement optimization problems with additional constraints.

翻訳日:2023-07-21 14:30:14 公開日:2023-07-20

# ポリプ再同定のための識別的視覚テキスト表現の学習

Learning Discriminative Visual-Text Representation for Polyp Re-Identification ( http://arxiv.org/abs/2307.10625v1 )

ライセンス: Link先を確認

Suncheng Xiang, Cang Liu, Sijia Du, Dahong Qian

(参考訳) 大腸内視鏡的ポリープ再同定は大腸がんの予防と治療に重要な役割を果たす大きなギャラリー内の特定のポリープと異なるカメラとビューをマッチングすることを目的としている。しかし、伝統的な手法は主に視覚的表現学習に焦点をあてるが、トレーニング中に意味的特徴の可能性を探究することを無視し、新しいシナリオに事前学習されたモデルを適用すると、容易に一般化能力が低下する可能性がある。このジレンマを解消するために,高レベルのセマンティック情報を交換することで,ポリプビデオの表現を著しく強化する,VT-ReIDというシンプルで効果的なトレーニング手法を提案する。さらに,テキストデータからの事前知識を導入するための新しいクラスタリング機構を精巧に設計した。我々の知る限りでは、大腸内視鏡的ポリープ再同定のためのクラスタリング機構を備えたビジュアルテキスト機能を利用する最初の試みである。実験結果から,本手法は現在の最先端の手法を著しく上回っており,その差は明らかである。

Colonoscopic Polyp Re-Identification aims to match a specific polyp in a large gallery across different cameras and views, which plays a key role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods mainly focus on visual representation learning and neglect to explore the potential of semantic features during training, which may easily lead to poor generalization capability when adapting the pretrained model to new scenarios. To relieve this dilemma, we propose a simple but effective training method named VT-ReID, which can remarkably enrich the representation of polyp videos with the interchange of high-level semantic information. Moreover, we elaborately design a novel clustering mechanism to introduce prior knowledge from textual data, which leverages contrastive learning to promote better separation from abundant unlabeled text data. To the best of our knowledge, this is the first attempt to employ the visual-text feature with a clustering mechanism for colonoscopic polyp re-identification. Empirical results show that our method significantly outperforms current state-of-the-art methods by a clear margin.

翻訳日:2023-07-21 14:29:57 公開日:2023-07-20

# マイクロジェスチャ分類における関節骨格およびセマンティクス埋め込み損失

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification ( http://arxiv.org/abs/2307.10624v1 )

ライセンス: Link先を確認

Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, and Meng Wang

(参考訳) 本稿では,IJCAI 2023におけるMiGAチャレンジのマイクロジェスチャ分類に対するチームHFUT-VUTのソリューションを簡単に紹介する。マイクロジェスチャ分類タスクは、骨格データに基づいて、所定のビデオのアクションカテゴリを認識することを目的としている。そこで本研究では,骨格およびセマンティクス埋め込み損失を組み込んで行動分類性能を向上させる,3D-CNNベースのマイクロジェスチャ認識ネットワークを提案する。最後に,トップ1の精度で第2位のチームを1.10%上回り,マイクロジェスチャ分類チャレンジで1位にランクインした。

In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification track in the MiGA challenge at IJCAI 2023. The micro-gesture classification task aims at recognizing the action category of a given video based on the skeleton data. For this task, we propose a 3D-CNN-based micro-gesture recognition network, which incorporates a skeletal and semantic embedding loss to improve action classification performance. Finally, we rank 1st in the Micro-gesture Classification Challenge, surpassing the second-place team in terms of Top-1 accuracy by 1.10%.

翻訳日:2023-07-21 14:29:36 公開日:2023-07-20

# Göran Lindblad in memoriam

Göran Lindblad in memoriam ( http://arxiv.org/abs/2307.10621v1 )

ライセンス: Link先を確認

Ingemar Bengtsson

(参考訳) これは、Göran Lindbladの生涯と業績の簡単な説明である。

This is a brief account of the life and work of Göran Lindblad.

翻訳日:2023-07-21 14:29:24 公開日:2023-07-20

# 四元テンソル環の分解とカラー画像インパインティングへの応用

Quaternion tensor ring decomposition and application for color image inpainting ( http://arxiv.org/abs/2307.10620v1 )

ライセンス: Link先を確認

Jifei Miao and Kit Ian Kou

(参考訳) 近年、テンソルネットワークは大規模最適化問題を解決する強力なツールとして登場している。最も有望なテンソルネットワークの1つはテンソルリング(TR)分解であり、これはトレース演算と潜在コアの公平な扱いを利用して、モデルにおける次元の巡回置換に対する不変性を達成する。一方,近年では,四元数はカラーピクセルの符号化に有効であることから大きな注目を集め,カラー画像処理タスクに広く活用されている。そこで本研究では,カラーピクセル表現における四元数の利点を活用しつつ,TR分解の強力で一般化された表現能力を継承する四元数テンソルリング(QTR)分解を提案する。本稿では,QTR分解の定義とQTR形式の学習アルゴリズムに加えて,低ランク四元数テンソル補完(LRQTC)モデルと,QTR分解に基づくカラー画像インペインティングのためのアルゴリズムを提案する。最後に,カラー画像インペインティングに関する広範な実験により,提案するQTLRC法が高い競争力を持つことを示す。

In recent years, tensor networks have emerged as powerful tools for solving large-scale optimization problems. One of the most promising tensor networks is the tensor ring (TR) decomposition, which achieves circular dimensional permutation invariance in the model through the utilization of the trace operation and equitable treatment of the latent cores. On the other hand, more recently, quaternions have gained significant attention and have been widely utilized in color image processing tasks due to their effectiveness in encoding color pixels. Therefore, in this paper, we propose the quaternion tensor ring (QTR) decomposition, which inherits the powerful and generalized representation abilities of the TR decomposition while leveraging the advantages of quaternions for color pixel representation. In addition to providing the definition of QTR decomposition and an algorithm for learning the QTR format, this paper also proposes a low-rank quaternion tensor completion (LRQTC) model and its algorithm for color image inpainting based on the QTR decomposition. Finally, extensive experiments on color image inpainting demonstrate that the proposed QTLRC method is highly competitive.

翻訳日:2023-07-21 14:29:21 公開日:2023-07-20

# テキスト分類による偽レビューの検出

Detecting deceptive reviews using text classification ( http://arxiv.org/abs/2307.10617v1 )

ライセンス: Link先を確認

Anusuya Baby

(参考訳) 近年、オンラインレビューはあらゆる種類の製品やサービスの販売促進において重要な役割を担っている。企業は、顧客に商品を購入させるために偽レビューを紛れ込ませることがある。自社製品の利点を強調したり、競合製品を批判したりすることもある。マーケター、広告主、その他のオンラインビジネスユーザーには、宣伝したい製品に偽のポジティブレビューを書いたり、気に入らない製品に偽のネガティブレビューを付けたりする動機がある。そのため今日では,自社のビジネスを宣伝したり競合相手の評判を落としたりする目的で偽レビューが書かれることは避けがたい。したがって、偽レビューの識別は活発で現在も進行中の研究分野である。本研究は,偽レビューを識別するための機械学習モデルによるアプローチを提案する。本論文は,レストランレビューのDeceptive Opinion Spam Corpusデータセット上で行った複数の実験の性能について検討する。我々は偽レビューに焦点をあてて偽コンテンツを特定するために,n-gramモデルとmax featuresを開発した。さらに,2つの特徴抽出手法の性能を調査し,5つの機械学習分類手法を適用するベンチマーク研究を行った。実験の結果,パッシブアグレッシブ分類器は他のアルゴリズムよりも優れており,テキスト分類だけでなく偽レビューの検出においても最高の精度を達成した。また、データ拡張を検討し、複数のディープラーニング技術を実装した。

In recent years, online reviews have played a vital role in promoting all kinds of products and services. Businesses may plant fake reviews to attract customers to purchase their products. They may highlight the benefits of their own product or criticize a competitor's product. Marketers, advertisers, and other online business users have an incentive to create fake positive reviews for products they want to promote, or fake negative reviews for products they dislike. As a result, deceptive reviews written to promote one's own business or to damage a competitor's reputation have become practically unavoidable. Thus, identifying deceptive reviews is an intense and ongoing research area. This paper proposes a machine learning approach to identify deceptive reviews. It investigates the performance of several experiments conducted on the Deceptive Opinion Spam Corpus dataset of restaurant reviews. We developed an n-gram model and max features to identify deceptive content, with a particular focus on fake reviews. Further, we conduct a benchmark study to investigate the performance of two different feature extraction techniques and apply five machine learning classification techniques. The experimental results show that the passive aggressive classifier outperforms the other algorithms, reaching the highest accuracy not only in text classification but also on fake reviews. We also study data augmentation and implement different deep learning techniques.
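The abstract above combines word n-gram features with a passive aggressive classifier. As a rough illustration of that combination only, the following is a minimal sketch in plain Python, assuming the basic binary passive-aggressive update without a slack term. The function names and the four toy reviews are invented for illustration; they are not the authors' code, features, or data.

```python
from collections import Counter


def ngram_features(text, n_max=2):
    """Sparse bag of word n-grams (n = 1..n_max) with raw counts."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def dot(weights, feats):
    """Inner product between a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(key, 0.0) * value for key, value in feats.items())


def train_passive_aggressive(samples, epochs=5):
    """Basic binary PA update; labels are +1 (deceptive) or -1 (truthful)."""
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            feats = ngram_features(text)
            if not feats:
                continue  # nothing to learn from an empty review
            loss = max(0.0, 1.0 - label * dot(weights, feats))  # hinge loss
            if loss > 0.0:
                # Smallest step that brings this example to margin 1.
                tau = loss / sum(v * v for v in feats.values())
                for key, value in feats.items():
                    weights[key] = weights.get(key, 0.0) + tau * label * value
    return weights


def predict(weights, text):
    return 1 if dot(weights, ngram_features(text)) >= 0.0 else -1


# Invented toy reviews, for illustration only (not from any real corpus).
TOY_SAMPLES = [
    ("best place ever", 1),
    ("slow service ok", -1),
    ("amazing perfect wonderful meal", 1),
    ("portion small noisy", -1),
]
```

The update is "passive" when an example is already classified with margin at least 1 and "aggressive" otherwise, which is why no learning rate appears. A real experiment would add a held-out split and a cap on vocabulary size, which is presumably what "max features" refers to.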

翻訳日:2023-07-21 14:29:04 公開日:2023-07-20

# 不均一フェデレーション学習の現状と研究課題

Heterogeneous Federated Learning: State-of-the-art and Research Challenges ( http://arxiv.org/abs/2307.10616v1 )

ライセンス: Link先を確認

Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao

(参考訳) フェデレーテッド・ラーニング(FL)は、大規模産業用途での利用の可能性から注目を集めている。既存のフェデレーション学習は主にモデル均質な設定に焦点を当てている。しかし、実践的なフェデレーション学習は、典型的には、参加クライアント間のデータ分布、モデルアーキテクチャ、ネットワーク環境、ハードウェア機器の異種性に直面する。不均一フェデレートラーニング(HFL)はより困難であり、それに対応するソリューションは多様で複雑である。したがって、研究課題と最先端技術に関する体系的な調査が不可欠である。本稿では,まず,HFLにおける様々な研究課題について,統計的異質性,モデル異質性,通信異質性,デバイス異質性,その他の課題の5つの側面から要約する。さらに,近年のHFLの進歩を概観し,既存のHFL手法の新たな分類法を提案し,その長所と短所の詳細な分析を行った。我々は既存のメソッドを,データレベル,モデルレベル,サーバレベルという3つの異なるレベルから分類する。最後に、この分野のさらなる発展を促進するため、HFLにおけるいくつかの重要かつ有望な今後の研究方向について論じる。 HFLの定期的に更新されたコレクションは https://github.com/marswhu/HFL_Survey で入手できる。

Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we firstly summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.

翻訳日:2023-07-21 14:28:44 公開日:2023-07-20

# HC-NJDGデータ分析によるインドの高等裁判所における係属事件の理解

Analyzing HC-NJDG Data to Understand the Pendency in High Courts in India ( http://arxiv.org/abs/2307.10615v1 )

ライセンス: Link先を確認

Kshitiz Verma

(参考訳) インドの司法は、あらゆるレベルの裁判所で係属中の何百万もの事件という重荷に苦しんでいる。本稿では,High Court NJDG(HC-NJDG)で公開された,インド共和国の24の高等裁判所の係属状況に関するデータを収集し分析した。 2017年8月31日から2018年12月26日まで(両日を含む)のうち73日分のデータを収集した。したがって、収集したデータは、ほぼ16ヶ月の期間にまたがる。我々は,高等裁判所のNJDGポータルで利用可能なさまざまな統計を分析した。これには,各高等裁判所の裁判官数,各高等裁判所に係属する事件数,10年以上係属している事件数,提起・審理予定・処理された事件数,女性・高齢者が提起した事件数などが含まれる。結果は次のことを示している。 1) 高等裁判所の裁判官数のような重要な統計でさえ、NJDG上では重大な誤りがある(図1、2、10、11、表V)。 2) ほとんどの高等裁判所で係属事件は減少せず、増加している(図3、13)。 3) HC-NJDGが有用であるためには定期的な更新が必要である。一部の高等裁判所のデータは定期的に更新されていないか、ポータル上で誤って更新されている(図14)。 4) 高等裁判所ごとに、裁判官1人あたりの平均事件負荷に大きな差がある(図6)。 5) すべての高等裁判所が承認された裁判官定員で運営されれば、ほとんどの高等裁判所で今後20年以内に係属事件を解消できる(図21、22)。 6) 女性および高齢者が提起した係属事件は不釣り合いに少なく、合わせても係属事件全体の10%未満である(図23-27)。 7) 裁判所における審理予定表(causelist)の作成について、より良いスケジューリングプロセスを導入すれば、高等裁判所の係属事件数の削減に役立つ(図29)。 8) 一部の統計は明確に定義されていない(図31)。

The Indian judiciary is suffering from the burden of millions of cases that are lying pending in its courts at all levels. In this paper, we analyze the data that we have collected on the pendency of 24 high courts in the Republic of India, as made available on High Court NJDG (HC-NJDG). We collected data on 73 days between August 31, 2017 and December 26, 2018, inclusive. Thus, the data collected by us spans a period of almost sixteen months. We have analyzed various statistics available on the NJDG portal for High Courts, including but not limited to the number of judges in each high court, the number of cases pending in each high court, cases that have been pending for more than 10 years, cases filed, listed and disposed, cases filed by women and senior citizens, etc. Our results show that: 1) statistics as important as the number of judges in high courts have serious errors on NJDG (Fig. 1, 2, 10, 11, Table V). 2) pending cases in most of the high courts are increasing rather than decreasing (Fig. 3, 13). 3) regular update of HC-NJDG is required for it to be useful. Data related to some high courts is not being updated regularly or is updated erroneously on the portal (Fig. 14). 4) there is a huge difference in the average load of cases on judges of different high courts (Fig. 6). 5) if all the high courts operate at their approved strength of judges, then for most of the high courts pendency can be nullified within 20 years from now (Fig. 21, 22). 6) the pending cases filed by women and senior citizens are disproportionately low; together they constitute less than 10% of the total pending cases (Fig. 23-27). 7) a better scheduling process for preparing causelists in courts can help reduce the number of pending cases in the High Courts (Fig. 29). 8) some statistics are not well defined (Fig. 31).

翻訳日:2023-07-21 14:28:22 公開日:2023-07-20

# ビルアウトライン自動抽出のためのハイブリッド特徴埋め込み

Hybrid Feature Embedding For Automatic Building Outline Extraction ( http://arxiv.org/abs/2307.10609v1 )

ライセンス: Link先を確認

Weihang Ran, Wei Yuan, Xiaodan Shi, Zipei Fan, Ryosuke Shibasaki

(参考訳) 高解像度空中画像から抽出した建物輪郭は, 変化検出や災害評価など, 様々な応用分野に利用することができる。しかし、従来のCNNモデルは元の画像から輪郭を高精度に認識できない。本稿では,この問題に対処するため,CNNとTransformerをベースとしたモデルを動的輪郭モデル(active contour model)と組み合わせて提案する。また,エンコーダが生成する異なる特徴を処理するために,トリプルブランチデコーダ構造も設計した。実験の結果、我々のモデルは2つのデータセットで他のベースラインモデルよりも優れており、Vaihingenでは91.1% mIoU、Bing hutsでは83.8%を達成した。

Building outlines extracted from high-resolution aerial images can be used in various application fields such as change detection and disaster assessment. However, traditional CNN models cannot recognize contours very precisely from the original images. In this paper, we propose a CNN- and Transformer-based model combined with an active contour model to deal with this problem. We also design a triple-branch decoder structure to handle the different features generated by the encoder. Experimental results show that our model outperforms other baseline models on two datasets, achieving 91.1% mIoU on Vaihingen and 83.8% on Bing huts.

翻訳日:2023-07-21 14:27:47 公開日:2023-07-20

# 確率的洗練による物理駆動乱流画像復元

Physics-Driven Turbulence Image Restoration with Stochastic Refinement ( http://arxiv.org/abs/2307.10603v1 )

ライセンス: Link先を確認

Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang

(参考訳) 大気乱流による画像歪みは確率的劣化であり、長距離光学イメージングシステムでは重要な問題である。モデルベースの手法や、合成データを活用した新興のディープラーニング手法を含め、過去数十年間に数多くの研究が行われてきた。近年、ディープラーニングモデルが現実の乱流条件に適応するのを助けるために、高速で物理に基づいたシミュレーションツールが導入されたが、そのようなモデルの訓練は、合成データと正解(ground truth)のペアにのみ依存している。本稿では,物理ベースのシミュレータを直接学習プロセスに導入し,ネットワークが確率性を劣化や基礎画像から切り離すのに役立つ物理統合復元ネットワーク(PiRN)を提案する。さらに、決定論的モデルによって導入された「平均効果」と、合成と実世界の劣化の間の領域ギャップを克服するために、我々はさらに、その知覚的品質を高めるために、確率的リファインメントを用いたPiRN(PiRN-SR)を導入する。全体として、我々のPiRNとPiRN-SRは、実世界の未知の乱流条件への一般化を改善し、ピクセルの精度と知覚品質の両面で最先端の復元を提供する。我々のコードは https://github.com/VITA-Group/PiRN で入手できる。

Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A number of studies have been conducted over the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the "average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our code is available at https://github.com/VITA-Group/PiRN.

翻訳日:2023-07-21 14:27:37 公開日:2023-07-20

# 無線ネットワークにおけるデータ駆動遅延確率予測:テール確率に着目して

Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities ( http://arxiv.org/abs/2307.10648v1 )

ライセンス: Link先を確認

Samie Mostafavi, Gourav Prateek Sharma, James Gross

(参考訳) サイバー物理システムやヒューマン・イン・ザ・ループ・アプリケーションといった新しい応用分野が出現するにつれ、あるレベルのエンドツーエンドのネットワーク遅延を極めて高い信頼性(例えば99.999%)で保証する必要がある。 IEEE 802.1as のタイムセンシティブネットワーク (TSN) で規定されるメカニズムは、スイッチングイーサネットネットワークのこれらの要件を達成するのに利用できるが、無線ネットワークにおけるTSN機構の実装は、その確率的性質のため難しい。無線リンクを99.999%の信頼性レベルに適合させるためには、遅延確率分布や分布の尾部における極めて稀な外れ値の挙動を分析し、制御する必要がある。本研究は, 混合密度ネットワーク(MDN)や極値混合モデルなどの最先端データ駆動手法を用いて遅延分布の尾部を予測し, 無線伝送においてより情報的な決定を行うことのできる, ネットワークパラメータに条件付けられた稀なレイテンシの確率を推定することを提案する。 IEEE 802.11g(WiFi)、商用プライベート、ソフトウェア定義5Gネットワークの実際の遅延測定は、提案手法をベンチマークし、テール確率に対する感度を評価するために使用される。

With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1as time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. Actual latency measurements of IEEE 802.11g (WiFi), commercial private and a software-defined 5G network are used to benchmark the proposed approaches and evaluate their sensitivities concerning the tail probabilities.
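To make the reliability target in the abstract above concrete, the following is a minimal sketch of how a tail probability can be read off once a Gaussian mixture has been predicted, for example by a mixture density network. It assumes a plain Gaussian mixture only (the extreme value mixture models mentioned in the abstract are not covered), it does not include the network that would produce the parameters, and every number and name below is hypothetical rather than taken from the paper.

```python
import math


def gaussian_sf(t, mu, sigma):
    """Survival function P(X > t) of a normal distribution N(mu, sigma^2)."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def mixture_tail_probability(t, weights, means, sigmas):
    """P(latency > t) under a Gaussian mixture with the given parameters."""
    return sum(w * gaussian_sf(t, m, s)
               for w, m, s in zip(weights, means, sigmas))


def meets_reliability(deadline, weights, means, sigmas, reliability=0.99999):
    """True if the predicted probability of missing the deadline is small enough."""
    miss_probability = mixture_tail_probability(deadline, weights, means, sigmas)
    return miss_probability <= 1.0 - reliability


# Hypothetical mixture parameters in milliseconds (illustration only):
# a dominant fast mode plus a rare slow mode that shapes the tail.
EXAMPLE_WEIGHTS = [0.9, 0.1]
EXAMPLE_MEANS = [5.0, 12.0]
EXAMPLE_SIGMAS = [1.0, 3.0]
```

With a target of 99.999%, the deadline check is dominated by the low-weight slow component, which is why the abstract stresses modelling the tail rather than the bulk of the latency distribution.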

翻訳日:2023-07-21 14:21:48 公開日:2023-07-20

# 多変量正規分布間のフィッシャー・ラオ距離とプルバックSPDコーン距離

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions ( http://arxiv.org/abs/2307.10644v1 )

ライセンス: Link先を確認

Frank Nielsen

(参考訳) 多変量正規分布のデータセットは、拡散テンソルイメージング、構造テンソルコンピュータビジョン、レーダー信号処理、機械学習など多くの科学分野に豊富に存在する。フィルタリングや分類、クラスタリングといった下流タスクのために正規分布のデータセットを処理するには、正規分布間の非類似度と、それらを結ぶ経路を適切に定義する必要がある。フィッシャー情報計量によって誘導されるリーマン測地線距離として定義されるフィッシャー・ラオ距離は、そのような原理的な計量距離であるが、いくつかの特別な場合を除いて閉じた形では知られていない。本研究では,まず,多変量正規分布間のフィッシャー・ラオ距離を任意の精度で近似する高速でロバストな手法を報告する。第二に、正規多様体を、中心化された(平均ゼロの)正規分布の多様体に対応する高次元対称正定値錐の部分多様体へ微分同相に埋め込むことに基づく距離のクラスを導入する。錐上の射影ヒルベルト距離は、埋め込まれた正規部分多様体上の計量となることを示し、その錐距離を、対応する直線状のヒルベルト錐測地線とともに引き戻すことで、正規分布間の距離と滑らかな経路を得る。フィッシャー・ラオ距離近似と比較して、プルバックヒルベルト錐距離は行列の最小および最大固有値のみを計算すればよいため、計算的に軽い。最後に、これらの距離をクラスタリングタスクで使う方法を示す。

Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, to name just a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance, which however is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance with its associated straight-line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
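The remark that the pullback distance needs only extreme eigenvalues can be illustrated in the smallest case. The sketch below handles univariate normals only and rests on two assumptions not stated in the abstract: that the embedding maps N(mu, sigma^2) to the 2x2 matrix [[sigma^2 + mu^2, mu], [mu, 1]], and that the Hilbert projective distance between positive-definite matrices P and Q is log(lambda_max / lambda_min) of P^{-1} Q. Function names are hypothetical.

```python
import math


def embed_normal(mu, sigma):
    """Assumed embedding of N(mu, sigma^2) as the symmetric 2x2 matrix
    [[sigma^2 + mu^2, mu], [mu, 1]], stored as the triple (a, b, c)."""
    return (sigma * sigma + mu * mu, mu, 1.0)


def hilbert_cone_distance(p, q):
    """log(lambda_max / lambda_min) of P^{-1} Q for 2x2 positive-definite P, Q.

    For 2x2 matrices the two eigenvalues follow from the trace and the
    determinant of P^{-1} Q, so no eigen-solver is needed.
    """
    a, b, c = p
    d, e, f = q
    det_p = a * c - b * b
    det_q = d * f - e * e
    trace = (c * d - 2.0 * b * e + a * f) / det_p  # trace of P^{-1} Q
    det = det_q / det_p                            # determinant of P^{-1} Q
    disc = math.sqrt(max(0.0, trace * trace - 4.0 * det))
    lam_max = 0.5 * (trace + disc)
    lam_min = det / lam_max  # product of eigenvalues equals the determinant
    return math.log(lam_max / lam_min)


def pullback_distance(mu1, sigma1, mu2, sigma2):
    """Distance between two univariate normals via the embedded matrices."""
    return hilbert_cone_distance(embed_normal(mu1, sigma1),
                                 embed_normal(mu2, sigma2))
```

As a sanity check under these assumptions, two zero-mean normals with standard deviations 1 and 2 embed to diag(1, 1) and diag(4, 1), giving a distance of log 4. In higher dimensions the same quantity would come from the largest and smallest generalized eigenvalues of the pair of embedded matrices.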

翻訳日:2023-07-21 14:21:24 公開日:2023-07-20

# RetouchingFFHQ: きめ細かい顔レタッチ検出のための大規模データセット

RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection ( http://arxiv.org/abs/2307.10642v1 )

ライセンス: Link先を確認

Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang

(参考訳) ショートビデオプラットフォームにおける顔のリタッチフィルターの普及は、デジタル外観の正しさと偽装広告の影響を懸念している。これらの課題に対処するためには、高度な顔修正技術を開発する必要がある。しかし、大規模かつきめ細かい顔修正データセットの欠如は、この分野の進歩の大きな障害となっている。本稿では,50万以上の条件付きリタッチ画像を含む大規模かつ細粒度の顔リタッチデータセットであるretouchingffhqを紹介する。 RetouchingFFHQは、その大規模、高品質、きめ細かい粒度、カスタマイズのため、以前のデータセットから際立っている。 4種類の顔リタッチ操作と異なる顔リタッチレベルを含むことにより、両顔リタッチ検出を細粒度、マルチリタッチ型、マルチリタッチレベル推定問題に拡張する。さらに,クロススケール表現学習のためのcnnバックボーンのためのプラグインとして,マルチグラナラリティアテンションモジュール(mam)を提案する。異なるベースラインを用いた広範囲な実験と提案手法は顔のリタッチ検出に優れた性能を示す。提案する新しいデータセットでは、リアルタイムのきめ細かな顔のリタッチ検出の難しい問題に取り組むための、今後の作業には大きな可能性があると考えています。

The widespread use of face retouching filters on short-video platforms has raised concerns about the authenticity of digital appearances and the impact of deceptive advertising. To address these issues, there is a pressing need to develop advanced face retouching techniques. However, the lack of large-scale and fine-grained face retouching datasets has been a major obstacle to progress in this field. In this paper, we introduce RetouchingFFHQ, a large-scale and fine-grained face retouching dataset that contains over half a million conditionally-retouched images. RetouchingFFHQ stands out from previous datasets due to its large scale, high quality, fine-grainedness, and customization. By including four typical types of face retouching operations and different retouching levels, we extend the binary face retouching detection into a fine-grained, multi-retouching type, and multi-retouching level estimation problem. Additionally, we propose a Multi-granularity Attention Module (MAM) as a plugin for CNN backbones for enhanced cross-scale representation learning. Extensive experiments using different baselines as well as our proposed method on RetouchingFFHQ show decent performance on face retouching detection. With the proposed new dataset, we believe there is great potential for future work to tackle the challenging problem of real-world fine-grained face retouching detection.

翻訳日:2023-07-21 14:21:02 公開日:2023-07-20

# ネットワーク量子化のための量子化特徴蒸留

Quantized Feature Distillation for Network Quantization ( http://arxiv.org/abs/2307.10638v1 )

ライセンス: Link先を確認

Ke Zhu and Yin-Yin He and Jianxin Wu

(参考訳) ニューラルネットワーク量子化は、低ビット近似を用いて、完全精度のニューラルネットワークモデルを高速化し、軽量化することを目的としている。量子化認識トレーニング(QAT)パラダイムを採用する手法は最近急速に増えているが、概念的には複雑であることが多い。本稿では,新しく非常に効果的なQAT法である量子化特徴蒸留(QFD)を提案する。 QFDはまず教師として量子化された(または二値化された)表現を訓練し、その後知識蒸留(KD)を用いてネットワークを量子化する。定量的結果は、QFDが従来の量子化法よりも柔軟で効果的である(すなわち量子化に適している)ことを示している。 QFDは、はるかに単純でありながら、画像分類だけでなく物体検出においても、既存の手法を明確に上回る。さらに、QFDは、MS-COCOの検出とセグメンテーションにおいてViTとSwin-Transformerを量子化し、実世界の展開におけるその可能性を検証する。我々の知る限りでは、ビジョントランスフォーマーが物体検出や画像セグメンテーションのタスクで量子化されたのはこれが初めてである。

Neural network quantization aims to accelerate and trim full-precision neural network models by using low-bit approximations. Methods adopting the quantization aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, albeit being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real-world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.

翻訳日:2023-07-21 14:20:38 公開日:2023-07-20

# 会話型頭部生成における人間の好みの学習と評価

Learning and Evaluating Human Preferences for Conversational Head Generation ( http://arxiv.org/abs/2307.10636v1 )

ライセンス: Link先を確認

Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

(参考訳) 手動による選好評価と整合する信頼性と総合的な評価基準は,対話型頭部ビデオ合成法の開発に不可欠である。既存の定量的評価は、限られた評価次元のみを考慮するため、人間の嗜好の完全な複雑さを捉えるのに失敗することが多い。質的評価とユーザスタディはソリューションを提供するが、時間と労力がかかる。この制限は対話型ヘッド生成アルゴリズムやシステムの進歩を妨げる。本稿では,異なる次元にわたる定量的評価に基づいて,人間の嗜好を適合させるための学習ベース評価尺度であるPreference Score(PS)を提案する。 PSは人間のアノテーションを必要とせずに定量的評価を行うことができる。実験結果から,人間の知覚に合わせる上での選好スコアの優越性を検証するとともに,未確認データに対する堅牢性と一般化性を実証し,会話ヘッド生成に有用なツールとなった。この指標が会話型ヘッドジェネレーションの新たな進歩を促進すると期待しています。

A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for the development of conversational head video synthesis methods. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and labor-intensive. This limitation hinders the advancement of conversational head generation algorithms and systems. In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions. PS can serve as a quantitative evaluation without the need for human annotation. Experimental results validate the superiority of Preference Score in aligning with human perception and also demonstrate its robustness and generalizability to unseen data, making it a valuable tool for advancing conversational head generation. We expect this metric could facilitate new advances in conversational head generation.

翻訳日:2023-07-21 14:20:19 公開日:2023-07-20

# SciBench:大規模言語モデルの大学レベルの科学的問題解決能力の評価

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models ( http://arxiv.org/abs/2307.10635v1 )

ライセンス: Link先を確認

Xiaoxuan Wang and Ziniu Hu and Pan Lu and Yanqiao Zhu and Jieyu Zhang and Satyen Subramaniam and Arjun R. Loomba and Shichang Zhang and Yizhou Sun and Wei Wang

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、多くの数学的なベンチマークにおいて顕著な進歩を示している。しかし、これらのベンチマークのほとんどは中学・高校の科目に基づく問題のみを扱い、多肢選択式の問題しか含んでおらず、初等算術演算の限られた範囲に限定されている。これらの問題に対処するため,本稿では,複雑な科学的問題解決に必要な推論能力を体系的に検討することを目的とした,大規模なベンチマークスイートSciBenchを提案する。 SciBenchには、数学、化学、物理学の教科書から引き出された様々な大学レベルの科学的問題を含むオープンセットと、コンピュータ科学と数学の学部レベルの試験問題からなるクローズドセットという、慎重に整備された2つのデータセットが含まれている。 2つのデータセットに基づいて,さまざまなプロンプト戦略を用いた2つの代表的LLMの詳細なベンチマーク研究を行う。その結果、現在のLLMは満足なパフォーマンスを達成できないことが判明し、全体のスコアは35.80%に過ぎなかった。さらに,詳細なユーザ調査を行い,LLMによる誤りを10の問題解決能力に分類した。分析の結果,他を大きく上回る単一のプロンプト戦略は存在せず,特定の問題解決スキルを改善する戦略が他のスキルの低下につながる場合があることが示された。我々は、SciBenchがLLMの推論能力のさらなる発展を促し、究極的には科学的研究と発見に寄与することを期待している。

Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.

翻訳日:2023-07-21 14:20:04 公開日:2023-07-20

# ヒト遺伝子のヌクレオチド配列に関する生成言語モデル

Generative Language Models on Nucleotide Sequences of Human Genes ( http://arxiv.org/abs/2307.10634v1 )

ライセンス: Link先を確認

Musa Nuri Ihtiyar and Arzucan Ozgur

(参考訳) 言語モデル、主にトランスフォーマーベースのものは、NLPで大きな成功を収めた。より正確に言うと、NLUにおけるBERTやNLGにおけるGPT-3のような研究は非常に重要である。 DNA配列は構造的には自然言語に非常に近く、DNA関連のバイオインフォマティクス分野では、DNABertのような識別モデルが存在する。しかし、我々の知る限り、生成モデルの側面はほとんど未調査である。そこで本研究では,DNA配列のためのGPT-3のような自己回帰生成言語モデルの開発に焦点をあてた。 DNAの全配列を扱うことは、相当な計算資源なしでは難しいため、我々は、DNA全体ではなく、特定の機能を持つDNAの固有の部分である人間の遺伝子のヌクレオチド配列に焦点を当て、より小さなスケールで研究を行うことに決めた。DNAも遺伝子も、多くの情報を失ったり過度に単純化したりすることなく、4種類のヌクレオチドからなる1次元配列と見なせるため、この決定は問題構造を大きく変えなかった。まず,ほとんど未調査の問題を体系的に検討し,RNNが最も良い性能を示す一方で,N-gramのような単純な手法も有望であることを観察した。もうひとつの利点は、自然言語とは異なり、我々が理解できない言語で生成モデルを扱う方法を学べたことである。パープレキシティのような古典的な指標を超えて、現実のタスクを用いることがいかに重要かも観察された。さらに, 4種類のヌクレオチドに由来する4という最小の語彙サイズを持つ言語を選択することで, これらのモデルのデータ・ハングリーな性質を変えられるかどうかを調べた。これを検討した理由は、そのような言語を選択することで問題がより簡単になる可能性があるためである。しかし、本研究で観察されたのは、必要なデータ量はそれほど変わらないということであった。

Language models, primarily transformer-based ones, have achieved colossal success in NLP. More precisely, studies such as BERT for NLU and GPT-3 for NLG have been crucial. DNA sequences are structurally very close to natural language, and in the DNA-related bioinformatics domain discriminative models such as DNABert exist. Yet, to the best of our knowledge, the generative side of the coin is mainly unexplored. Consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts of DNA with specific functionalities, instead of the whole DNA. This decision did not change the problem structure much, because both DNA and genes can be seen as 1D sequences consisting of four different nucleotides without losing much information or oversimplifying. First of all, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best, while simple techniques like N-grams were also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural language. We also observed how essential it is to use real-life tasks beyond classical metrics such as perplexity. Furthermore, we examined whether the data-hungry nature of these models can be changed by selecting a language with a minimal vocabulary size, here four, owing to the four different types of nucleotides. The reason for examining this was that choosing such a language might make the problem easier. However, what we observed in this study was that it did not change the amount of data needed by much.

翻訳日:2023-07-21 14:19:40 公開日:2023-07-20

# マルチメソッド自己学習: テキストによるコード生成の改善とその逆

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa ( http://arxiv.org/abs/2307.10633v1 )

ライセンス: Link先を確認

Shriyash K. Upadhyay and Etan J. Ginsberg

(参考訳) 大規模言語モデルには、同じ問題を解決する多くの方法がある。これは、新しい強み(異なる方法が異なる問題にうまく機能する可能性がある)と弱点(どの方法を使うべきかをユーザが知るのが難しいかもしれない)をもたらす。本稿では,ある手法を別の手法のフィルタ済み出力で訓練するMulti-Method Self-Training (MMST)を導入し,各手法の強みを増強し,弱点を緩和する。言語とコードの両方で訓練された176Bパラメータモデルを用いて、MMSTが次のことを可能にすることを示す。 1) 性能の低い方法を(最大30%)改善し、モデルを使いやすくする。 2) より高性能な方法を(最大32.2%)改善し、モデルをより高性能にする。 3) モデルが根拠(rationale)を生成する能力を向上させることにより、関連するが異なるタスクのパフォーマンスを(最大10.3%)向上させる。次に、MMSTがなぜ機能するのかを調べるためにアブレーション分析を行う。 MMSTは従来の自己学習よりも多くのデータを生成するが、性能改善は複数の手法を用いることによってもたらされることを示す。また,MMSTをより効果的にする手段として,プロンプトエンジニアリングと手法間の反相関な性能を解析した。本論文の証拠が、機械学習の研究者たちに、言語モデルの進歩が新しい形の訓練を可能にする方法を探求する動機を与えることを願っている。

Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.

翻訳日:2023-07-21 14:19:10 公開日:2023-07-20

# 自動流星検出のための新しい組込みアプリケーションの並列化

Parallelization of a new embedded application for automatic meteor detection ( http://arxiv.org/abs/2307.10632v1 )

ライセンス: Link先を確認

Mathuran Kandeepan (ALSOC), Clara Ciocan (ALSOC), Adrien Cassagne (ALSOC), Lionel Lacassagne (ALSOC)

(参考訳) 本稿では,新しいコンピュータビジョンアプリケーションを並列化するために用いた手法を紹介する。このシステムは、安定化されていないカメラとノイズの多いビデオシーケンスから、自動的に流星を検出できる。このアプリケーションは、気象気球への搭載や空中観測キャンペーンでの利用を想定して設計されている。したがって、最終ターゲットは低消費電力のシステムオンチップ(10ワット未満)であり、ソフトウェアはリアルタイムでフレームのストリームを処理する必要がある(毎秒25フレーム超)。このために、まずアプリケーションをタスクグラフに分割し、次に異なる並列化技術を適用する。実験結果は並列化手法の効率を示す。例えばRaspberry Pi 4上のHDビデオシーケンスでは、処理チェーンは毎秒42フレームに達する一方で、6ワットしか消費しない。

This article presents the methods used to parallelize a new computer vision application. The system is able to automatically detect meteors from non-stabilized cameras and noisy video sequences. The application is designed to be embedded in weather balloons or used in airborne observation campaigns. Thus, the final target is a low-power system-on-chip (< 10 Watts), while the software needs to process a stream of frames in real time (> 25 frames per second). To this end, the application is first split into a task graph, and then different parallelization techniques are applied. Experimental results demonstrate the efficiency of the parallelization methods. For instance, on the Raspberry Pi 4 with an HD video sequence, the processing chain reaches 42 frames per second while consuming only 6 Watts.

翻訳日:2023-07-21 14:18:49 公開日:2023-07-20

# Pluvio: トランスファーラーニングと条件変分情報ボトルネックによるドメイン外アーキテクチャとライブラリのアセンブリクローン検索

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck ( http://arxiv.org/abs/2307.10631v1 )

ライセンス: Link先を確認

Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland

(参考訳) コード再利用の実践は、より速くより効率的な開発ライフサイクルのためにソフトウェア開発において不可欠である。しかし実際には、コードの再利用プラクティスは適切なコントロールを欠いているため、脆弱性の伝播や知的財産権侵害といった問題が発生する。重要なシフトライト防御メカニズムであるアセンブリクローン探索は、リリースされた実行ファイルにおいて再利用に起因する脆弱なコードを特定するのに有効である。アセンブリクローン探索に関する最近の研究は、異なるツールチェーンが生成するアセンブリコードの変種を照合するために機械学習ベースの手法を使う傾向を示している。しかしながら、これらの手法はトレーニングで使用される少数のツールチェーンの変種から学んだことに限定されており、未知のアーキテクチャとそれに対応するコンパイルツールチェーンの変種には適用できない。本稿では,未知のアーキテクチャとライブラリを対象とするアセンブリクローン探索の問題に関する最初の研究を行う。本研究は,大規模な事前学習済み自然言語モデルを転移学習の形で用いて,アセンブリクローン探索のための現在の学習ベースのアプローチに人間の共通知識を組み入れることを提案する。転移学習は、アセンブリコードに関する人間の専門家の幅広い知識をもたらすことができるため、既存のアプローチの制限に対処するのに役立つ。さらに,不要かつ冗長なトークンを削除する強化学習エージェントを提案することで,シーケンス長の制限の問題にも対処する。新しい変分情報ボトルネック学習戦略と組み合わせることで、提案システムはアーキテクチャや最適化設定の潜在的な指標への依存を最小化し、未知のアーキテクチャへの汎化を改善する。我々は,未知のアーキテクチャにおけるクローン探索シナリオをシミュレートし,実験結果により,提案手法が最先端ソリューションに対して有効であることを示す。

The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.

翻訳日:2023-07-21 14:18:37 公開日:2023-07-20

# 3次元分子事前学習のための分数デノイジング

Fractional Denoising for 3D Molecular Pre-training ( http://arxiv.org/abs/2307.10683v1 )

ライセンス: Link先を確認

Shikun Feng and Yuyan Ni and Yanyan Lan and Zhi-Ming Ma and Wei-Ying Ma

(参考訳) 座標デノイジング(coordinate denoising)は有望な3次元分子事前学習法であり、様々な下流の創薬タスクで顕著な性能を達成している。理論的には、この目的関数は力場の学習と等価であり、力場の学習は下流タスクに有用であることが示されている。しかし、座標デノイジングで効果的な力場を学習するには、サンプルのカバレッジが低いことと、力場が等方的であることという2つの課題がある。その根本的な理由は、既存のデノイジング法が仮定する分子分布が分子の異方性を捉えられないことにある。これらの課題に対処するために、二面角と座標の両方にノイズを加える新しいハイブリッドノイズ戦略を提案する。しかし、このようなハイブリッドノイズを従来の方法でデノイジングすることは、もはや力場の学習と等価ではない。理論的な導出により、この問題は共分散が入力コンホメーションに依存することに起因することがわかった。そこで、2種類のノイズを分離し、後者の座標ノイズのみをデノイジングする新しい分数デノイジング法(Frad)を提案する。これにより、Fradは低エネルギー構造をより多くサンプリングできる利点と、力場との等価性の両方を備える。広範な実験により、分子表現におけるFradの有効性が示され、QM9の12タスク中9タスク、MD17の8ターゲット中7ターゲットで新たな最先端性能を達成した。

Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples and isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angle and coordinate. However, denoising such hybrid noise in a traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the covariance on the input conformation. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.

翻訳日:2023-07-21 14:11:29 公開日:2023-07-20

# 知識グラフ埋め込みに基づくパーソナライズされたレコメンダシステム

A Personalized Recommender System Based-on Knowledge Graph Embeddings ( http://arxiv.org/abs/2307.10680v1 )

ライセンス: Link先を確認

Ngoc Luyen Le (Heudiasyc), Marie-Hélène Abel (Heudiasyc), Philippe Gouspillou

(参考訳) 知識グラフはオントロジーを用いてエンティティとその関係をモデル化するのに有効であることが証明されている。近年、知識グラフを情報モデリングの形式として利用することへの関心が高まり、レコメンダシステムへの採用が増加している。ユーザとアイテムを知識グラフに組み込むことで、これらのシステムはそれらの間の暗黙のつながりをよりよく捉え、より正確なレコメンデーションを提供することができる。本稿では,自動車購入/販売ドメインに適用した知識グラフを組み込んだパーソナライズされたレコメンデーションシステムの構築と提案を行う。実験の結果,提案手法が個々のユーザと整合性のあるレコメンデーションを提供することの有効性を示した。

Knowledge graphs have proven to be effective for modeling entities and their relationships through the use of ontologies. The recent surge of interest in using knowledge graphs as a form of information modeling has led to their increased adoption in recommender systems. By incorporating users and items into the knowledge graph, these systems can better capture the implicit connections between them and provide more accurate recommendations. In this paper, we investigate and propose the construction of a personalized recommender system via knowledge graph embeddings applied to the vehicle purchase/sale domain. The results of our experimentation demonstrate the efficacy of the proposed method in providing relevant recommendations that are consistent with individual users.

翻訳日:2023-07-21 14:10:44 公開日:2023-07-20

# 雑音QRコードの分類のための深層学習

Deep learning for classification of noisy QR codes ( http://arxiv.org/abs/2307.10677v1 )

ライセンス: Link先を確認

Rebecca Leygonie (LIPADE), Sylvain Lobry (LIPADE), Laurent Wendling (LIPADE)

(参考訳) 我々は,視覚的に識別可能な対象を表現しない抽象画像に対して,ディープラーニングに基づく古典的分類モデルの限界を定義したい。qr符号(quick response codes)は,この抽象画像のカテゴリに分類される。抽象画像分類のための深層学習に基づくモデルの限界を理解するために,健康パス読取時に得られた情報から生成されたqrコードに基づく画像分類モデルを訓練する。雑音の存在下での分類モデルと古典的(決定論的)復号法を比較した。本研究は,深層学習に基づくモデルが抽象画像の理解に有効であると結論付けることを可能にする。

We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects. QR codes (Quick Response codes) fall into this category of abstract images: with one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare a classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.

翻訳日:2023-07-21 14:10:23 公開日:2023-07-20

# ベイヤーおよび非ベイヤーパターン画像センサの効率的な統一デモザイキング

Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors ( http://arxiv.org/abs/2307.10667v1 )

ライセンス: Link先を確認

Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun

(参考訳) 最近のCMOSイメージセンサー(CIS)の物理的サイズが小さくなるにつれて、最新のモバイルカメラは、隣接する画素を持つ均一な色ユニットからなる独自の非バイヤーカラーフィルタアレイ(例えば、Quad、Nona、QxQ)パターンを採用している。これらの非バイヤーセンサは、異なる光条件の画素ビンサイズが変更可能であるため、従来のバイエルCFAよりも優れているが、固有の画素パターン構造とセンサハードウェア特性により、分解時に視覚的アーティファクトを導入する可能性がある。従来はバイエルCFAに重点を置いており、照明条件が異なる様々なCFAモードの非ベイエルパターンCISを再現する必要がある。本研究では,従来のBayer RAWと,様々な非Bayer CFAのRAWデータに異なる動作モードで適用可能な,効率的な統一復調手法を提案する。我々の知識学習に基づく適応パターンの復調モデル、すなわちKLAPは、CFA毎にネットワーク内の1%のキーフィルタに対してCFA適応フィルタを利用するが、それでもすべてのCFAを効果的に復調し、大規模モデルに匹敵する性能をもたらす。さらに,推論中にメタラーニング(KLAP-M)を用いることで,実際のRAWデータから未知のセンサ生成物を排除し,合成画像と実センサRAWのギャップを効果的に埋めることができる。 KLAP法とKLAP-M法は,Bayer および非Bayer CFAの合成RAWデータと実RAWデータの両方において,最先端の復調性能を達成した。

As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% key filters in the network for each CFA, but still manages to effectively demosaic all the CFAs, yielding comparable performance to the large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved state-of-the-art demosaicing performance in both synthetic and real RAW data of Bayer and non-Bayer CFAs.

翻訳日:2023-07-21 14:10:03 公開日:2023-07-20

# チェコ語ニューステキストの分類のためのデータセットと強力なベースライン

A Dataset and Strong Baselines for Classification of Czech News Texts ( http://arxiv.org/abs/2307.10666v1 )

ライセンス: Link先を確認

Hynek Kydlíček, Jindřich Libovický

(参考訳) チェコ語の自然言語処理のための事前学習されたモデルは、純粋に言語的なタスク(品詞タグ付け、構文解析、NER)や、単一のニュースソースからの感情分類や記事分類などの比較的単純な分類タスクで評価されることが多い。その代わりに、チェコ語最大級の分類データセットの一つであるCZEch NEws Classification dataset(CZE-NEC)を提示する。これは20年以上にわたるさまざまなソースのニュース記事から構成され、より厳密な評価を可能にする。我々は、ニュースソース、ニュースカテゴリ、推定された著者の性別、曜日という4つの分類タスクを定義した。タスクの難易度を検証するために,人間による評価を行い,事前学習されたトランスフォーマーモデルに基づく強力な機械学習ベースラインに人間のパフォーマンスが及ばないことを明らかにした。さらに, 言語固有の事前学習エンコーダ解析が, 市販の大規模生成言語モデルよりも優れていることを示す。

Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present CZEch NEws Classification dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoder analysis outperforms selected commercially available large-scale generative language models.

翻訳日:2023-07-21 14:09:22 公開日:2023-07-20

# 教師なし分解と強調によるNeRFの照明

Lighting up NeRF via Unsupervised Decomposition and Enhancement ( http://arxiv.org/abs/2307.10664v1 )

ライセンス: Link先を確認

Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson WH. Lau

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、シーンの一連の画像と対応するカメラのポーズから、新しいビューを合成するための有望なアプローチである。しかし、低照度シーンから撮影された画像は、低画素強度、高ノイズ、色歪みのために、高品質な結果を得るためにNeRFモデルを訓練するのにはほとんど利用できない。従来の低照度画像強調法とNeRF法を併用しても,個々の2次元強調プロセスに起因する視点間の不整合のためうまく動作しない。本稿では,sRGBの低照度画像から直接,シーン表現を強調し,通常照度の新規ビューを教師なしで合成する手法であるLow-Light NeRF(LLNeRF)を提案する。我々のアプローチの核心は、放射輝度場学習の分解であり、照明の強調、ノイズの低減、歪んだ色の補正をNeRF最適化プロセスと共同で行うことができる。本手法は,低照度シーンをカメラ内処理済みの低ダイナミックレンジ(8bits/channel)画像として撮影した集合から,適切な照明と鮮やかな色と細部を備えた新しいビュー画像を生成することができる。実験の結果,提案手法は既存の低照度強調法やNeRF法よりも優れていた。

Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement methods with NeRF methods also does not work well due to the view inconsistency caused by the individual 2D enhancement process. In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. The core of our approach is a decomposition of radiance field learning, which allows us to enhance the illumination, reduce noise and correct the distorted colors jointly with the NeRF optimization process. Our method is able to produce novel view images with proper lighting and vivid colors and details, given a collection of camera-finished low dynamic range (8-bits/channel) images from a low-light scene. Experiments demonstrate that our method outperforms existing low-light enhancement methods and NeRF methods.

翻訳日:2023-07-21 14:09:03 公開日:2023-07-20

# フェデレーション学習における共有性に関する調査 : モデルユーティリティ,プライバシリーク,コミュニケーション効率の展望

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency ( http://arxiv.org/abs/2307.10655v1 )

ライセンス: Link先を確認

Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang

(参考訳) 連合学習(federated learning, FL)は,異なるパーティ間でプライバシを保護しながら協調的に学習するための,極めて効果的なパラダイムとして浮上している。各パーティからデータを収集する必要がある従来の集中型学習とは異なり、FLはクライアントがプライベートなデータセットを公開することなく、プライバシーを保った情報を共有できる。このアプローチは、プライバシー保護を強化するだけでなく、複数の参加者によるより効率的で安全なコラボレーションを促進する。そのため、FLは研究者からかなりの注目を集め、関連する研究をまとめるために多くの調査が行われている。しかしながら、これらの調査の大部分は、トレーニングプロセス中にモデルパラメータを共有する方法に集中し、他の形式のローカル情報を共有する可能性を見落としている。本稿では,FLで何を共有すべきかという新たな視点から,モデルユーティリティ,プライバシリーク,通信効率を重視した体系的な調査を行う。この調査は4つの異なる貢献によって以前の調査と異なる。まず、共有情報の3つのカテゴリ(モデル共有、合成データ共有、知識共有)を含む共有方法の観点から、FL法の新たな分類法を提案する。第2に,プライバシ攻撃に対するさまざまな共有方法の脆弱性を分析し,特定のプライバシ保証を提供する防御機構をレビューする。第3に、FLにおける様々な共有手法の性能と通信のオーバーヘッドを比較するための広範な実験を行う。さらに,様々な防御手法の有効性を比較しながら,モデルインバージョン攻撃とメンバーシップ推論攻撃によるプライバシー漏洩の可能性を評価する。最後に,現在の手法における潜在的な欠陥を議論し,今後の改善の方向性について概説する。

Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, prompting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.

翻訳日:2023-07-21 14:08:40 公開日:2023-07-20

# SHAPのための条件付き期待値ネットワーク

Conditional expectation network for SHAP ( http://arxiv.org/abs/2307.10654v1 )

ライセンス: Link先を確認

Ronald Richman and Mario V. Wüthrich

(参考訳) 予測モデルを説明するための非常に一般的なモデルに依存しないテクニックは、SHAP(SHapley Additive exPlanation)である。 SHAPの最も一般的な2つのバージョンは条件付き期待バージョンと条件なし期待バージョン(後者は介入型SHAPとも呼ばれる)である。木ベースのメソッドを除いて、通常、非条件バージョンが使用される(計算上の理由から)。ニューラルネットワークと他の回帰モデルの両方の条件付きバージョンを効率的に計算し、特徴成分の依存構造を適切に考慮する(代理的な)ニューラルネットワークアプローチを提供する。この提案は,一般化線形モデル(GLM)と類似した複雑な回帰モデルにおいて,ドロップ1およびアノバ解析を提供することにも有用であり,特徴成分の適切な依存構造を考慮した部分依存プロット(PDP)を提供する。

A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
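As a rough illustration of the distinction drawn above, the following sketch contrasts the conditional and the unconditional (interventional) value function on a toy problem. The model `f`, the two binary features and their joint distribution are invented for illustration and are not taken from the paper; no surrogate network is involved, because the toy distribution is small enough to enumerate exactly.

```python
# Toy joint distribution over two strongly dependent binary features (x1, x2).
# All numbers here are illustrative assumptions, not data from the paper.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def f(x1, x2):
    """Hypothetical regression model to be explained."""
    return 2.0 * x1 + 1.0 * x2


def conditional_value(x1):
    """E[f(X1, X2) | X1 = x1]: X2 is averaged under its conditional law."""
    mass = sum(p for (a, _), p in JOINT.items() if a == x1)
    total = sum(p * f(a, b) for (a, b), p in JOINT.items() if a == x1)
    return total / mass


def unconditional_value(x1):
    """E[f(x1, X2)]: X2 is averaged under its marginal law, ignoring dependence."""
    return sum(p * f(x1, b) for (_, b), p in JOINT.items())


if __name__ == "__main__":
    print(conditional_value(1), unconditional_value(1))
```

Because the two features are positively dependent in this toy setup, fixing `x1 = 1` makes `x2 = 1` more likely, so the conditional value is larger than the unconditional one. That difference is what a method respecting the dependence structure is meant to capture.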

翻訳日:2023-07-21 14:08:14 公開日:2023-07-20

# 監視サービスにおける時系列自動異常検出のための最適化目標の検討

Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services ( http://arxiv.org/abs/2307.10653v1 )

ライセンス: Link先を確認

Manqing Dong and Zhanxiang Zhao and Yitong Geng and Wentao Li and Wei Wang and Huai Jiang

(参考訳) 信頼性の確保とシステムパフォーマンスの最適化を目的とした,大量のデータを扱う産業監視サービスでは,時系列異常検出が不可欠である。既存の手法では、広範囲のラベル付きリソースと手動パラメータの選択を必要とし、自動化の必要性を強調している。本稿では,時系列異常検出モデルにおけるパラメータ自動最適化のための包括的フレームワークを提案する。このフレームワークには,予測スコア,形状スコア,感度スコアという3つの最適化目標が導入されている。提案されたフレームワークは6ヶ月以上ネットで適用され、毎分5万回以上配信されている。ユーザエクスペリエンスをシンプルにするためには、期待された機密値のみを必要とし、ユーザフレンドリなインターフェースを提供し、望ましい検出結果を達成する。公開データセット上での広範な評価と他の手法との比較により,提案手法の有効性がさらに検証された。

Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.

翻訳日:2023-07-21 14:08:00 公開日:2023-07-20

# 自然言語処理研究の展望を探る

Exploring the Landscape of Natural Language Processing Research ( http://arxiv.org/abs/2307.10652v1 )

ライセンス: Link先を確認

Tim Schopf, Karim Arabi, Florian Matthes

(参考訳) 自然言語テキストを理解し,生成し,処理するための効率的なアプローチとして,近年,自然言語処理(NLP)の研究が急速に広まり,広く採用されている。この分野での研究が増加していることを踏まえ、NLP関連のいくつかのアプローチが研究コミュニティで調査されている。しかし、確立したトピックを分類し、傾向を特定し、今後の研究分野を概説する総合的な研究は現在も残っていない。このギャップを埋めるため,aclアンソロジーに含まれる研究論文を体系的に分類・分析した。その結果,研究景観の構造化的概観,nlpにおける研究分野の分類,nlpにおける最近の展開の分析,知見の要約,今後の課題の方向性について概説する。

As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent to this day. Contributing to closing this gap, we have systematically classified and analyzed research papers included in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields-of-study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.

翻訳日:2023-07-21 14:07:45 公開日:2023-07-20

# 気候科学におけるグランジャー因果関係の状態空間モデルにおけるグラフ

Graphs in State-Space Models for Granger Causality in Climate Science ( http://arxiv.org/abs/2307.10703v1 )

ライセンス: Link先を確認

Víctor Elvira, Émilie Chouzenoux, Jordi Cerdà, Gustau Camps-Valls

(参考訳) グレンジャー因果関係(GC)は、しばしば実際の因果関係とはみなされない。しかし、これはおそらく別の時系列から予測可能性を評価する最も広く使われている方法である。グランガー因果関係は神経科学や計量学から地球科学まで、多くの応用分野で広く用いられている。我々は、状態空間モデルのグラフィカルな視点でGCを再考する。そこで我々は,線形ガウス状態空間モデルの状態方程式における線形行列作用素を推定するための期待最大化アルゴリズムであるgraphemを用いた。ラッソ正則化は、近位分解ダグラス-ラッフォードアルゴリズムを用いて解くmステップに含まれる。おもちゃの例と厳しい気候問題における実験は、標準グランジャー因果関係法に対するモデルと推論手法の利点を示している。

Granger causality (GC) is often considered not an actual form of causality. Still, it is arguably the most widely used method to assess the predictability of a time series from another one. Granger causality has been widely used in many applied disciplines, from neuroscience and econometrics to Earth sciences. We revisit GC under a graphical perspective of state-space models. For that, we use GraphEM, a recently presented expectation-maximisation algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. Lasso regularisation is included in the M-step, which is solved using a proximal splitting Douglas-Rachford algorithm. Experiments in toy examples and challenging climate problems illustrate the benefits of the proposed model and inference technique over standard Granger causality methods.
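For orientation, a linear-Gaussian state-space model of the kind referred to above is commonly written as follows. The symbols $\mathbf{A}$, $\mathbf{H}$, $\mathbf{Q}$, $\mathbf{R}$ and $\lambda$ are notation assumed here for illustration; the abstract itself does not fix any notation.

```latex
\begin{align}
\mathbf{x}_t &= \mathbf{A}\,\mathbf{x}_{t-1} + \mathbf{q}_t, & \mathbf{q}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \\
\mathbf{y}_t &= \mathbf{H}\,\mathbf{x}_t + \mathbf{r}_t, & \mathbf{r}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}).
\end{align}
```

Under this reading, a nonzero entry $A_{ij}$ means that component $j$ of the state at time $t-1$ helps predict component $i$ at time $t$, which is the Granger-style interpretation of the graph. A Lasso penalty of the form $\lambda \lVert \mathbf{A} \rVert_1$ in the M-step would then favour sparse graphs.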

翻訳日:2023-07-21 14:02:06 公開日:2023-07-20

# 大規模言語モデルは社会を形成し、社会によって形成される: arXiv出版パターンの調査

Large language models shape and are shaped by society: A survey of arXiv publication patterns ( http://arxiv.org/abs/2307.10700v1 )

ライセンス: Link先を確認

Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson

(参考訳) 近年、大規模言語モデル(llm)の論文数が急増し、書誌分析によってほとんど文書化されていない科学的景観に劇的な変化をもたらした。ここでは、CSとStat arXivsに投稿された388Kの論文を分析し、2023年と2018-2022年の出版パターンの変化に注目した。本稿は, LLM論文の割合の増大, LLM論文の執筆者, LLM論文の執筆者, LLM論文の背景と著者研究の関連, 高度に引用された LLM 論文を区別する要因, 国際協力のパターンについて分析する。 LLM研究は、コンピュータと社会に関する論文の割合が18倍に増加しており、新たに出版されている著者は、より経験豊富な著者よりも、アプリケーションや社会への影響に重点を置いている可能性が高い。 LLM研究は、LLM著者がフォーカスするトピックにおけるジェンダーと学術的/産業的格差、そしてコラボレーションネットワークにおける米国と中国の分裂を文書化する。概して、我々の分析は、llmが社会によって形と形の両方を研究する深い方法を文書化しており、社会学的レンズの必要性を証明している。

There has been a steep recent increase in the number of large language model (LLM) papers, producing a dramatic shift in the scientific landscape which remains largely undocumented through bibliometric analysis. Here, we analyze 388K papers posted on the CS and Stat arXivs, focusing on changes in publication patterns in 2023 vs. 2018-2022. We analyze how the proportion of LLM papers is increasing; the LLM-related topics receiving the most attention; the authors writing LLM papers; how authors' research topics correlate with their backgrounds; the factors distinguishing highly cited LLM papers; and the patterns of international collaboration. We show that LLM research increasingly focuses on societal impacts: there has been an 18x increase in the proportion of LLM-related papers on the Computers and Society sub-arXiv, and authors newly publishing on LLMs are more likely to focus on applications and societal impacts than more experienced authors. LLM research is also shaped by social dynamics: we document gender and academic/industry disparities in the topics LLM authors focus on, and a US/China schism in the collaboration network. Overall, our analysis documents the profound ways in which LLM research both shapes and is shaped by society, attesting to the necessity of sociotechnical lenses.

翻訳日:2023-07-21 14:01:54 公開日:2023-07-20

# Lu-N-H系における近環境超伝導の実現可能性の評価

Assessing the feasibility of near-ambient conditions superconductivity in the Lu-N-H system ( http://arxiv.org/abs/2307.10699v1 )

ライセンス: Link先を確認

Yue-Wen Fang, Đorđe Dangić, Ion Errea

(参考訳) 窒素添加水素化ルテチウム(Lu-N-H)における近環境超伝導の最近の報告は大きな関心を集めている。しかし、相反する結果が超伝導に疑問を投げかけている。本稿では,超伝導臨界温度($T_c$)の高速予測器とハイスループット結晶構造予測を組み合わせ,Lu-N-Hの1 GPaにおける特性に光を当てる。予測された構造はいずれも高温超伝導を支える可能性を示しておらず、窒素の含有は絶縁相の出現を好んでいる。近環境超伝導の欠如にもかかわらず、代替準安定テンプレートを検討し、その$T_c$と量子アンハーモニック効果を含む動的安定性について検討する。立方体Lu$_4$H$_{11}$Nは20 GPaで100 Kという高い$T_c$を示し、親のLuH$_3$で得られた30 Kに比べて大きく増加する。興味深いことに、実験で観察されたものと似たX線パターンを持つ。 LaH$_{10}$-like LuH$_{10}$とCaH$_6$-like LuH$_6$はそれぞれ175 GPaと100 GPaで高温超伝導体となり、$T_c$は286 K、246 Kとなる。本研究により, 高温超伝導は近環境圧力下での安定相では不可能であるが、中程度および高い圧力では準安定な高$T_c$テンプレートが存在することが示唆された。

The recent report of near-ambient superconductivity in nitrogen-doped lutetium hydrides (Lu-N-H) has generated great interest. However, conflicting results have raised doubts regarding superconductivity. Here, we combine high-throughput crystal structure predictions with a fast predictor of the superconducting critical temperature ($T_c$) to shed light on the properties of Lu-N-H at 1 GPa. None of the predicted structures shows the potential to support high-temperature superconductivity and the inclusion of nitrogen favors the appearance of insulating phases. Despite the lack of near-ambient superconductivity, we consider alternative metastable templates and study their $T_c$ and dynamical stability including quantum anharmonic effects. The cubic Lu$_4$H$_{11}$N exhibits a high $T_c$ of 100 K at 20 GPa, a large increase compared to 30 K obtained in its parent LuH$_3$. Interestingly, it has a similar X-ray pattern to the experimentally observed one. The LaH$_{10}$-like LuH$_{10}$ and CaH$_6$-like LuH$_6$ become high-temperature superconductors at 175 GPa and 100 GPa, with $T_c$ of 286 K and 246 K, respectively. Our findings suggest that high-temperature superconductivity is not possible in stable phases at near-ambient pressure, but metastable high-$T_c$ templates exist at moderate and high pressures.

翻訳日:2023-07-21 14:01:26 公開日:2023-07-20

# 逆知識蒸留:限定データを用いた網膜画像マッチングのための小型モデルによる大規模モデルの訓練

Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data ( http://arxiv.org/abs/2307.10698v1 )

ライセンス: Link先を確認

Sahar Almahfouz Nasser, Nihar Gupte, and Amit Sethi

(参考訳) 網膜画像マッチングは、疾患の進行と治療反応のモニタリングにおいて重要な役割を果たす。しかしながら、時間分割された画像のペア間で一致したキーポイントを持つデータセットは、トランスフォーマティブベースのモデルのトレーニングには不十分である。本稿では, オーバーフィッティングを防止しつつ, 限られたデータで大規模モデルを訓練するための, 逆知識蒸留に基づく新しい手法を提案する。まず,一般公開されたデータセット上での結果を改善するために,cnnベースのsuperretinaと呼ばれる半教師付きメソッドのアーキテクチャ修正を提案する。次に,より重いモデルに基づくより軽いモデルを訓練する分野の知識蒸留研究において直観に反するcnnベースのモデルを用いて,視覚トランスフォーマエンコーダに基づく計算量より重いモデルを訓練する。驚くべきことに、このような逆知識蒸留は一般化をさらに改善する。実験により,表現空間における高次元の嵌合は,最終出力に適合する訓練と異なり過度な適合を防止できる可能性が示唆された。また、網膜画像のキーポイント検出とマッチングのためのアノテーションを付加したパブリックデータセットを提供し、網膜画像応用のためのアルゴリズムの開発を支援する。

Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. Firstly, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that help us improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field of knowledge distillation research, where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.

翻訳日:2023-07-21 14:00:57 公開日:2023-07-20

# SqueezerFaceNet: フィルタプルーニングによる小型顔認識CNNのさらなる削減

SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning ( http://arxiv.org/abs/2307.10697v1 )

ライセンス: Link先を確認

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun

(参考訳) 様々なデジタルサービスでモバイルデバイスが広く使われるようになると、信頼性とリアルタイムの人物認証の必要性が高まった。このような状況下では、モバイルデバイスにおけるカメラの普及と日常アプリケーションへの統合により、顔認識技術がユーザ認証の信頼性の高い方法として出現している。深層畳み込みニューラルネットワーク(cnns)の急速な進歩は、多数の顔認証アーキテクチャを生み出した。しかし、これらのモデルはモバイルアプリケーションには大きめで実用的ではないことが多く、数百万のパラメータを持つ数百メガバイトに達する。我々は,100万パラメータ未満の軽量顔認識ネットワークであるSqueezerFaceNetを開発し,この問題に対処する。これはtaylorスコアに基づくネットワークプルーニング手法を適用し、重要度の低いフィルタを反復的に除去することで実現される。 squeezenetをベースとする既に小さなネットワーク(約1.24m)から始めると、パフォーマンスが低下することなく、さらに(最大40%まで)削減できることが分かる。我々の知識を最大限に活用するために、私たちは初めて顔認識タスクのためのネットワークプルーニング手法を評価する。

The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a lightweight face recognition network with fewer than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.
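The pruning idea mentioned above can be sketched in a few lines. The sketch assumes the common first-order Taylor criterion, in which a filter's importance is the absolute mean of activation times gradient; the abstract does not state which variant the authors use, and the data layout and function names below are hypothetical.

```python
def taylor_scores(activations, gradients):
    """First-order Taylor importance per filter: |mean(activation * gradient)|.

    Both arguments map a filter index to a flat list of values collected
    over a batch (a hypothetical layout chosen for this sketch).
    """
    scores = {}
    for idx, acts in activations.items():
        grads = gradients[idx]
        products = [a * g for a, g in zip(acts, grads)]
        scores[idx] = abs(sum(products) / len(products))
    return scores


def prune_step(filters, scores, n_remove):
    """Drop the n_remove filters with the smallest importance scores."""
    ranked = sorted(filters, key=lambda idx: scores[idx])
    removed = set(ranked[:n_remove])
    return [idx for idx in filters if idx not in removed]


if __name__ == "__main__":
    acts = {0: [1.0, 2.0], 1: [0.1, 0.1], 2: [3.0, 1.0]}
    grads = {0: [0.5, 0.5], 1: [0.2, -0.2], 2: [1.0, 1.0]}
    print(prune_step([0, 1, 2], taylor_scores(acts, grads), n_remove=1))
```

In an iterative scheme, a step like this would typically alternate with fine-tuning and a fresh computation of the scores until the target size is reached.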

翻訳日:2023-07-21 14:00:36 公開日:2023-07-20

# SLPD:WSIのスライドレベル原型蒸留

SLPD: Slide-level Prototypical Distillation for WSIs ( http://arxiv.org/abs/2307.10696v1 )

ライセンス: Link先を確認

Zhimiao Yu, Tiancheng Lin, Yi Xu

(参考訳) 特徴表現能力の向上は、多くの全スライド病理画像(WSI)タスクの基礎となっている。最近の研究は、病理特異的自己教師型学習(SSL)において大きな成功を収めている。しかし、その多くはパッチレベルの表現を学ぶことだけに焦点を当てているため、プリテキストとスライドレベルのダウンストリームタスク、例えばサブタイプ、グレーディング、ステージングの間にはギャップがある。スライドレベルの表現を目指して,WSI 上でのコンテキストモデリングのためのスライド内およびスライド間セマンティック構造を探索するために,SLPD (Slide-Level Prototypical Distillation) を提案する。具体的には、各WSI内の領域(4096x4096パッチ)に対して反復的にスライド内クラスタリングを行い、プロトタイプを作成し、領域表現が割り当てられたプロトタイプに近づくよう促す。各スライドをプロトタイプで表現することで、プロトタイプのセット距離によって類似したスライドを選択し、蒸留のためにクロススライドプロトタイプへ領域を割り当てる。 SLPDは、複数のスライドレベルのベンチマークで最先端の結果を達成し、スライドのセマンティックな構造の表現学習がWSI分析に適したプロキシタスクを実現できることを示した。コードはhttps://github.com/Carboxy/SLPD から入手できる予定である。

Improving the feature representation ability is the foundation of many whole slide pathological image (WSI) tasks. Recent works have achieved great success in pathological-specific self-supervised learning (SSL). However, most of them only focus on learning patch-level representations, thus there is still a gap between pretext and slide-level downstream tasks, e.g., subtyping, grading and staging. Aiming towards slide-level representations, we propose Slide-Level Prototypical Distillation (SLPD) to explore intra- and inter-slide semantic structures for context modeling on WSIs. Specifically, we iteratively perform intra-slide clustering for the regions (4096x4096 patches) within each WSI to yield the prototypes and encourage the region representations to be closer to the assigned prototypes. By representing each slide with its prototypes, we further select similar slides by the set distance of prototypes and assign the regions by cross-slide prototypes for distillation. SLPD achieves state-of-the-art results on multiple slide-level benchmarks and demonstrates that representation learning of semantic structures of slides can make a suitable proxy task for WSI analysis. Code will be available at https://github.com/Carboxy/SLPD.

翻訳日:2023-07-21 14:00:18 公開日:2023-07-20

# Self2Self+: 自己教師あり学習と画像品質評価損失による単一画像デノイジング

Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss ( http://arxiv.org/abs/2307.10695v1 )

ライセンス: Link先を確認

Jaekyun Ko and Sanghwan Lee

(参考訳) 近年、教師あり学習に基づくデノイジング手法が有望な性能を示している。しかし、ノイズあり画像とクリーン画像のペアを含む外部データセットへの依存が、その適用性を制限している。この制限に対処するため、研究者はノイズの多い入力の集合のみを用いたデノイジングネットワークの訓練に注力してきた。本研究では、デノイジングの実用性を高めるために、ノイズの多い入力画像のみをネットワークの訓練に用いる単一画像の自己教師あり学習手法を提案する。特徴抽出にはゲート付き畳み込みを用い、訓練過程の誘導には参照なし画像品質評価を用いた。さらに、提案手法は、一定のドロップアウト率のBernoulliサンプリングにより入力画像データセットからインスタンスをサンプリングして訓練を行う。最終結果は、ドロップアウトを適用した訓練済みネットワークのさまざまなインスタンスが生成した予測を平均することで得られる。実験の結果、提案手法は合成データと実世界データの両方で最先端のデノイジング性能を達成した。このことは、さまざまなノイズ除去タスクに対する潜在的な解決策として、本手法の有効性と実用性を示している。

Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.

翻訳日:2023-07-21 13:59:54 公開日:2023-07-20
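
The Self2Self+ abstract above describes training on Bernoulli-sampled instances of a single noisy image and averaging the predictions of several dropout instances. The following is a minimal sketch of only that sampling-and-averaging step, assuming NumPy. `mean_fill` is a hypothetical stand-in for a trained network, and the names `bernoulli_mask` and `averaged_prediction` are illustrative; none of this is the authors' implementation.

```python
import numpy as np


def bernoulli_mask(shape, drop_rate, rng):
    """Return a 0/1 mask; each pixel is kept with probability 1 - drop_rate."""
    return (rng.random(shape) >= drop_rate).astype(np.float64)


def averaged_prediction(noisy, predict, drop_rate=0.3, num_samples=8, seed=0):
    """Average `predict` over several Bernoulli-masked copies of `noisy`."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(noisy, dtype=np.float64)
    for _ in range(num_samples):
        mask = bernoulli_mask(noisy.shape, drop_rate, rng)
        # The predictor sees only the kept pixels plus the mask itself.
        total += predict(noisy * mask, mask)
    return total / num_samples


def mean_fill(masked, mask):
    """Hypothetical predictor: fill dropped pixels with the mean of kept ones."""
    kept = mask.sum()
    fill = masked.sum() / kept if kept > 0 else 0.0
    return masked + (1.0 - mask) * fill
```

In a real pipeline `predict` would be the trained denoising network evaluated with dropout enabled; the averaging loop itself would stay the same.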

# 確率的プログラミングを用いた知的仮想エージェントのためのアーキテクチャフレームワーク

Towards an architectural framework for intelligent virtual agents using probabilistic programming ( http://arxiv.org/abs/2307.10693v1 )

ライセンス: Link先を確認

Anton Andreev (GIPSA-Services), Grégoire Cattan

(参考訳) 我々は、ECA(Embodied Conversational Agent、身体性を持つ会話エージェント)を考案・構築するためのKorraAIと呼ばれる新しいフレームワークを提案する。本フレームワークは、環境やインタラクション時間などのコンテキスト情報と、人間の対話相手が提供する不確実な情報を考慮して、ECAの振る舞いをモデル化する。さらに、KorraAIで構築されたエージェントは、人間のパートナーとの対話を自ら開始できるため、能動的な行動を示すことができる。これらの目的のために、KorraAIは確率的プログラミングを利用する。KorraAIの確率モデルは、ECAの振る舞いとユーザとのインタラクションをモデル化するために使用される。これにより、ユーザの好みへの適応と、ECAにおける一定の非決定性が可能になり、より自然な振る舞いが実現される。気分、嗜好、感情(驚きなど)といった人間らしい内部状態は、KorraAIにおいて確率分布やベイジアンネットワークでモデル化できる。これらのモデルは、ユーザとの対話がなくても、時間とともに変化しうる。ECAモデルはプラグインとして実装され、共通のインターフェースを共有する。これにより、ECAの設計者は技術的な詳細よりもモデル化対象のキャラクタに集中でき、ECAモデルの保存や交換も可能になる。KorraAIによるECAには、仮想セールスエージェント、カスタマーサービスエージェント、仮想コンパニオン、エンターテイナー、チューターなど、いくつかの応用が考えられる。

We present a new framework called KorraAI for conceiving and building embodied conversational agents (ECAs). Our framework models ECAs' behavior considering contextual information, for example, about environment and interaction time, and uncertain information provided by the human interaction partner. Moreover, agents built with KorraAI can show proactive behavior, as they can initiate interactions with human partners. For these purposes, KorraAI exploits probabilistic programming. Probabilistic models in KorraAI are used to model its behavior and interactions with the user. They enable adaptation to the user's preferences and a certain degree of indeterminism in the ECAs to achieve more natural behavior. Human-like internal states, such as moods, preferences, and emotions (e.g., surprise), can be modeled in KorraAI with distributions and Bayesian networks. These models can evolve over time, even without interaction with the user. ECA models are implemented as plugins and share a common interface. This enables ECA designers to focus more on the character they are modeling and less on the technical details, as well as to store and exchange ECA models. Several applications of KorraAI ECAs are possible, such as virtual sales agents, customer service agents, virtual companions, entertainers, or tutors.

翻訳日:2023-07-21 13:59:38 公開日:2023-07-20

# 解集合プログラミングによる有界組合せ再構成

Bounded Combinatorial Reconfiguration with Answer Set Programming ( http://arxiv.org/abs/2307.10688v1 )

ライセンス: Link先を確認

Yuya Yamada, Mutsunori Banbara, Katsumi Inoue, Torsten Schaub

(参考訳) 本稿では、解集合プログラミング(Answer Set Programming, ASP)に基づいて組合せ再構成問題を解く、有界組合せ再構成(bounded combinatorial reconfiguration)と呼ばれる手法を開発する。一般的な課題は、元となる組合せ問題の解空間を調べ、特別な性質を持つ実行可能解の列が存在するかどうかを判定することである。得られたソルバrecongoは、組合せ再構成に関する直近の国際コンペティション(CoRe Challenge 2022)において、ソルバトラックのすべてのメトリックをカバーしている。recongoは、シングルエンジンソルバトラックのshortestメトリックで1位にランクインした。本稿では、有界組合せ再構成の設計と実装を示すとともに、最もよく研究されている組合せ再構成問題の一つである独立集合再構成問題のASPエンコーディングを示す。最後に、CoRe Challenge 2022のすべてのインスタンスを対象とした実証分析を示す。

We develop an approach called bounded combinatorial reconfiguration for solving combinatorial reconfiguration problems based on Answer Set Programming (ASP). The general task is to study the solution spaces of source combinatorial problems and to decide whether or not there are sequences of feasible solutions that have special properties. The resulting recongo solver covers all metrics of the solver track in the most recent international competition on combinatorial reconfiguration (CoRe Challenge 2022). recongo ranked first in the shortest metric of the single-engine solvers track. In this paper, we present the design and implementation of bounded combinatorial reconfiguration, and present an ASP encoding of the independent set reconfiguration problem that is one of the most studied combinatorial reconfiguration problems. Finally, we present empirical analysis considering all instances of CoRe Challenge 2022.

翻訳日:2023-07-21 13:59:18 公開日:2023-07-20
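
To make the independent set reconfiguration problem from the abstract above concrete, here is a small sketch of a bounded search: it asks whether one independent set can be turned into another within a given number of steps. It is a plain-Python breadth-first search, not the paper's ASP encoding and not recongo. The "token jumping" move rule (move one token to any free vertex while staying independent) and all function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def is_independent(vertices, edges):
    """True if no two vertices in the set are joined by an edge."""
    return not any(
        (u, v) in edges or (v, u) in edges for u, v in combinations(vertices, 2)
    )


def bounded_reconfiguration(nodes, edges, start, goal, bound):
    """Return a shortest start-to-goal sequence with at most `bound` moves, or None."""
    edges = set(edges)
    start, goal = frozenset(start), frozenset(goal)
    if not (is_independent(start, edges) and is_independent(goal, edges)):
        return None
    parent = {start: None}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        if depth == bound:
            continue  # do not expand beyond the step bound
        for removed in current:
            for added in nodes:
                if added in current:
                    continue
                # Token jumping: move one token from `removed` to `added`.
                candidate = (current - {removed}) | {added}
                if candidate not in parent and is_independent(candidate, edges):
                    parent[candidate] = current
                    queue.append((candidate, depth + 1))
    return None
```

On the path graph 1-2-3-4, moving from {1, 3} to {2, 4} needs two jumps (via {1, 4}), so the search succeeds with `bound=2` and returns `None` with `bound=1`.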

# Pre-train, Adapt and Detect: カモフラージュ物体検出のためのマルチタスクアダプタチューニング

Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection ( http://arxiv.org/abs/2307.10685v1 )

ライセンス: Link先を確認

Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang

(参考訳) カモフラージュ物体検出(camouflaged object detection, COD)は、背景と類似したパターンを示すカモフラージュ物体をセグメント化することを目的とする難しいタスクである。既存研究の多くは、カモフラージュ物体を完全かつ細部まで識別するための専用モジュールの構築に注力しているが、物体に関するセマンティクスが不足しているため、境界をうまく特定できない。本稿では、カモフラージュ物体を検出するための新しい「pre-train, adapt and detect」パラダイムを提案する。大規模な事前学習モデルを導入することで、大量のマルチモーダルデータから学習された豊富な知識をCODに直接転移できる。下流のCODタスクに適するよう特徴を調整するために、軽量な並列アダプタを挿入する。4つの難しいベンチマークデータセットでの広範な実験により、提案手法が既存の最先端CODモデルを大きく上回ることが示された。さらに、異なるセマンティッククラス間で共有可能な知識を活用するために、アダプタをチューニングするマルチタスク学習方式を設計する。包括的な実験結果から、ソースタスクでのマルチタスクアダプタ初期化とターゲットタスクでのマルチタスク適応により、モデルの汎化能力を大幅に向上できることが示された。

Camouflaged object detection (COD), aiming to segment camouflaged objects which exhibit similar patterns to the background, is a challenging task. Most existing works are dedicated to establishing specialized modules to identify camouflaged objects with complete and fine details, while the boundary cannot be well located for the lack of object-related semantics. In this paper, we propose a novel "pre-train, adapt and detect" paradigm to detect camouflaged objects. By introducing a large pre-trained model, abundant knowledge learned from massive multi-modal data can be directly transferred to COD. A lightweight parallel adapter is inserted to adjust the features suitable for the downstream COD task. Extensive experiments on four challenging benchmark datasets demonstrate that our method outperforms existing state-of-the-art COD models by large margins. Moreover, we design a multi-task learning scheme for tuning the adapter to exploit the shareable knowledge across different semantic classes. Comprehensive experimental results showed that the generalization ability of our model can be substantially improved with multi-task adapter initialization on source tasks and multi-task adaptation on target tasks.

翻訳日:2023-07-21 13:59:04 公開日:2023-07-20

# ベル対角状態の特異な絡み合い構造を示すワイル・ハイゼンベルクベル基底の特殊特性

Special features of the Weyl-Heisenberg Bell basis imply unusual entanglement structure of Bell-diagonal states ( http://arxiv.org/abs/2307.10727v1 )

ライセンス: Link先を確認

Christopher Popp and Beatrix C. Hiesmayr

(参考訳) 最大エンタングルしたベル状態は、量子情報科学におけるエンタングルメントに基づく手法にとって極めて重要である。通常は、ワイル・ハイゼンベルク演算子による完全正規直交ベル基底の標準的な構成が考えられる。我々は、これらの演算子の群構造が、誤り訂正スキームやベル対角状態内のエンタングルメント構造に強い影響を与えることを示す。特に、これはパウリチャネルとトワール(twirl)チャネルの等価性を意味する。興味深いことに、他の完全正規直交ベル基底ではこの等価性が破れ、例えばPPTエンタングル状態の割合において、まったく異なるエンタングルメント構造が生じる。具体的には、標準ベル基底は、他のベル基底と比べて、PPT状態およびPPTエンタングル状態の割合が観測された中で最も高いことがわかった。まとめると、標準ベル基底の構成は非常に特殊な構造を利用しており、そこからの逸脱を考える場合には量子情報理論的プロトコルに強い影響を及ぼすことが示された。

Maximally entangled Bell states are of crucial importance for entanglement-based methods in quantum information science. Typically, a standard construction of a complete orthonormal Bell-basis by Weyl-Heisenberg operators is considered. We show that the group structure of these operators has strong implications for error correction schemes and for the entanglement structure within Bell-diagonal states. In particular, it implies an equivalence between a Pauli channel and a twirl channel. Interestingly, other complete orthonormal Bell-bases do break the equivalence and lead to a completely different entanglement structure, for instance in the share of PPT-entangled states. In detail, we find that the standard Bell basis has the highest observed share of PPT-states and PPT-entangled states compared to other Bell bases. In summary, our findings show that the standard Bell basis construction exploits a very special structure with strong implications for quantum information theoretic protocols if a deviation is considered.

翻訳日:2023-07-21 13:51:01 公開日:2023-07-20

# LLM検閲: 機械学習の課題か、それともコンピュータセキュリティの問題か?

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? ( http://arxiv.org/abs/2307.10719v1 )

ライセンス: Link先を確認

David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan

(参考訳) 大規模言語モデル(LLM)は、複雑な指示を理解する優れた能力を示している。しかし、与えられた指示に盲目的に従うことから、悪意ある利用のリスクが懸念されている。モデルの微調整やLLMを用いた出力検閲のような既存の防御機構は、LLMが依然として問題のある応答を生成しうるため、誤りを免れないことが示されている。一般的な検閲アプローチは、この問題を機械学習の問題として扱い、LLM出力中の望ましくないコンテンツの検出を別のLMに依存している。本稿では、このようなセマンティック検閲アプローチの理論的限界を示す。具体的には、セマンティック検閲が決定不能問題とみなせることを示し、LLMのプログラム的能力と命令追従能力に起因する検閲の本質的な難しさを浮き彫りにする。さらに、知識のある攻撃者は許容される出力の集まりから許容されない出力を再構成できるため、これらの課題はセマンティック検閲にとどまらないと主張する。以上から、検閲の問題は再評価される必要があり、潜在的なリスクを軽減するためにセキュリティに基づくアプローチの適用を要するセキュリティ問題として扱うべきであると提案する。

Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.

翻訳日:2023-07-21 13:50:46 公開日:2023-07-20

# ハードサンプルとノイズラベル付きサンプルの違いに関する実証的研究

Differences Between Hard and Noisy-labeled Samples: An Empirical Study ( http://arxiv.org/abs/2307.10718v1 )

ライセンス: Link先を確認

Mahsa Forouzesh and Patrick Thiran

(参考訳) ハードな(難しい)サンプルを含むラベル付きデータセットから、ノイズのある、あるいは誤ってラベル付けされたサンプルを抽出することは、重要だが十分に研究されていないトピックである。一般的でしばしば独立した2つの研究の流れがあり、一方はノイズラベルへの対処に焦点を当て、もう一方はハードサンプルを扱う。しかし、両方の種類のデータが存在する場合、既存手法の多くはそれらを同等に扱うため、モデル全体の性能が低下する。本稿ではまず、サンプルごとに難しさとノイズの程度を調整できる各種の合成データセットを設計する。提案する体系的な実証研究により、学習が難しいサンプルと誤ってラベル付けされたサンプルとの類似点、そしてより重要な相違点をよりよく理解できる。これらの制御された実験は、ハードサンプルとノイズサンプルを区別する手法の開発への道を開く。本研究を通じて、ハードサンプルを残しながらノイズラベル付きサンプルを除去する、単純かつ効果的な指標を導入する。ラベルノイズが存在する場合のさまざまなデータ分割手法を検討し、提案指標を用いてハードサンプルからノイズサンプルを取り除くと最良のデータセットが得られることを確認した。これは、フィルタ後のデータセットでモデルを訓練した際の高いテスト精度によって裏付けられる。このことを、作成した合成データセットと実世界のラベルノイズを含むデータセットの両方で実証する。さらに、提案するデータ分割手法は、半教師あり学習フレームワーク内で用いた場合、他の手法を大きく上回る。

Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and another deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and more importantly the differences between hard-to-learn samples and incorrectly-labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. We study various data partitioning methods in the presence of label noise and observe that filtering out noisy samples from hard samples with this proposed metric results in the best datasets as evidenced by the high test accuracy achieved after models are trained on the filtered datasets. We demonstrate this for both our created synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.

翻訳日:2023-07-21 13:50:27 公開日:2023-07-20

# 決定的かつ快適な行動計画のためのリスクシャドーイングの導入

Introducing Risk Shadowing For Decisive and Comfortable Behavior Planning ( http://arxiv.org/abs/2307.10714v1 )

ライセンス: Link先を確認

Tim Puphal and Julian Eggert

(参考訳) 都市部の運転におけるグループインタラクションの問題を考える。自動運転車の最先端の行動プランナーは、他のどのエージェントとも衝突しないといったエゴエージェントの最適な行動を見つけるために、主にコスト関数の中で個々のエージェント間インタラクションを別々に考慮している。本稿では、3つのエージェント間のグループインタラクションを分析することで単一のインタラクションを超えた扱いを可能にする状況理解手法、リスクシャドーイングを開発する。具体的には、提案手法は、第2の他エージェントが進路を妨げているためにエゴエージェントに到達できず、エゴエージェントの行動プランナーで考慮する必要のない第1の他エージェントを特定できる。実験では、リスクシャドーイングを行動プランナーの上流フィルタモジュールとして用いることで、これらの場合に安全性が確保されることを前提に、最先端手法よりも決断力があり快適な運転戦略を計画できることを示す。本アプローチの有用性は、さまざまな交差点シナリオと縦方向の走行で実証される。

We consider the problem of group interactions in urban driving. State-of-the-art behavior planners for self-driving cars mostly consider each single agent-to-agent interaction separately in a cost function in order to find an optimal behavior for the ego agent, such as not colliding with any of the other agents. In this paper, we develop risk shadowing, a situation understanding method that allows us to go beyond single interactions by analyzing group interactions between three agents. Concretely, the presented method can find out which first other agent does not need to be considered in the behavior planner of an ego agent, because this first other agent cannot reach the ego agent due to a second other agent obstructing its way. In experiments, we show that using risk shadowing as an upstream filter module for a behavior planner allows planning more decisive and comfortable driving strategies than the state of the art, given that safety is ensured in these cases. The usability of the approach is demonstrated for different intersection scenarios and longitudinal driving.

翻訳日:2023-07-21 13:50:03 公開日:2023-07-20

# Kick Back & Relax: SlowTVで世界を再構築する方法を学ぶ

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV ( http://arxiv.org/abs/2307.10713v1 )

ライセンス: Link先を確認

Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

(参考訳) 自己教師あり単眼深度推定(SS-MDE)は、膨大な量のデータにスケールする可能性を持つ。しかし、既存のアプローチは自動車ドメインに限定されており、自然環境や屋内環境といった複雑な環境に汎化できないモデルになっている。そこで我々は、既存の自動車データセットより1桁多いデータを含む、YouTubeから収集した大規模なSlowTVデータセットを提案する。SlowTVは、世界各地の四季のハイキング、景観ドライブ、スキューバダイビングなど、多様な環境からの170万枚の画像を含む。このデータセットを用いて、屋内・屋外の多数のデータセットに対してゼロショット汎化が可能なSS-MDEモデルを訓練する。得られたモデルは、より効率的なアーキテクチャを用いているにもかかわらず、既存のすべてのSSLアプローチを上回り、教師ありSoTAとの差を縮める。さらに、性能とゼロショット汎化をさらに高めるためのベストプラクティス集も導入する。これには、1) アスペクト比のデータ拡張、2) カメラ内部パラメータの推定、3) サポートフレームのランダム化、4) 柔軟な動き推定が含まれる。コードは https://github.com/jspenmar/slowtv_monodepth で入手できる。

Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.

翻訳日:2023-07-21 13:49:28 公開日:2023-07-20

# AdjointDPM: 拡散確率モデルの勾配バックプロパゲーションのための随伴感度法

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models ( http://arxiv.org/abs/2307.10711v1 )

ライセンス: Link先を確認

Jiachun Pan, Hanshu Yan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng

(参考訳) 既存のカスタマイズ手法は、事前学習済みの拡散確率モデル(DPM)をユーザが提供する概念に合わせるために、複数の参照例を必要とする。本論文は、生成されたコンテンツ上で定義された微分可能な指標のみが利用可能な教師信号である場合の、DPMカスタマイズの課題に取り組む。DPMのサンプリング手順ではデノイジングUNetを再帰的に呼び出すため、素朴な(naïve)勾配バックプロパゲーションではすべての反復の中間状態を保存する必要があり、メモリ消費が極めて大きくなる。この問題を克服するために、新しい手法AdjointDPMを提案する。この手法ではまず、対応する確率フローODEを解くことで拡散モデルから新しいサンプルを生成する。次に、随伴感度法を用い、別の拡張ODEを解くことで、損失の勾配をモデルのパラメータ(条件信号、ネットワーク重み、初期ノイズを含む)へ逆伝播する。さらに、順方向の生成と勾配の逆伝播の両方における数値誤差を減らすために、指数積分を用いて確率フローODEと拡張ODEを単純な非スティッフODEとして再パラメータ化する。最後に、視覚効果を識別用のテキスト埋め込みに変換すること、特定のスタイル化のためにDPMを微調整すること、セキュリティ監査のための敵対的サンプルを生成するよう初期ノイズを最適化すること、という3つの興味深いタスクでAdjointDPMの有効性を実証する。

Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naïve gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.

翻訳日:2023-07-21 13:49:10 公開日:2023-07-20

# マルチモーダル軌道最適化のための再パラメータ化方策学習

Reparameterized Policy Learning for Multimodal Trajectory Optimization ( http://arxiv.org/abs/2307.10710v1 )

ライセンス: Link先を確認

Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

(参考訳) 本研究では、高次元連続行動空間における強化学習(RL)の方策のパラメータ化という課題を検討する。目的は、一般に用いられるガウスパラメータ化に内在する制限を克服するマルチモーダル方策を開発することである。そのために、連続RLの方策を最適軌跡の生成モデルとしてモデル化する、原理に基づいたフレームワークを提案する。方策を潜在変数で条件づけることで、環境の探索を促進する新しい変分境界を最適化目標として導出する。次に、マルチモーダル方策のパラメータ化と学習した世界モデルを活用して、強力な探索能力と高いデータ効率を実現する実用的なモデルベースRL手法、Reparameterized Policy Gradient(RPG)を示す。実験結果から、本手法は、密な報酬を持つタスクでエージェントが局所最適を回避する助けとなり、オブジェクト中心の内発的報酬を取り入れることで難しいスパース報酬環境を解けることが示された。本手法は、さまざまなタスクにおいて従来手法を一貫して上回る。コードと補足資料はプロジェクトページ https://haosulab.github.io/RPG/ で入手できる。

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/

翻訳日:2023-07-21 13:48:45 公開日:2023-07-20

# 局所化量子系とカオス量子系を区別する方法

A method to discriminate between localized and chaotic quantum systems ( http://arxiv.org/abs/2307.10706v1 )

ライセンス: Link先を確認

Youssef Aziz Alaoui and Bruno Laburthe-Tolra

(参考訳) 我々は、初めに非平衡状態に置かれた一般の孤立量子系が、初期状態の近くに局在しているとみなせるのか、それともカオス的であるのかを区別する基準を導出する。我々のアプローチでは、ランチョス基底における時間発展を考える。これにより系のダイナミクスは、格子サイトのエネルギーと、ある格子サイトから次の格子サイトへのトンネリングがともに不均一な一次元格子内を運動する粒子のダイナミクスに写像される。そこから、局在系とカオス系を区別できる基準を導く。この基準には、ランチョス状態間の結合強度と、それらのエネルギー期待値のゆらぎが含まれる。次元の関数としてのアンダーソン局在、多体双極子スピン系の非平衡ダイナミクス、および可積分系に対応する3つの事例を調べることで、その妥当性を検証する。最後に、我々のアプローチが、いずれも量子カオス系を特徴づけるために提案されたウィグナー推測(Wigner surmise)と固有状態熱化仮説に正当化を与えることを示す。実際、系がカオス的であるための我々の基準は、ウィグナー・ダイソン分布の特徴である固有エネルギーの準位反発(スペクトル剛性とも呼ばれる)を意味する。また、カオス領域では、任意の局所観測量の期待値が固有状態の関数として弱くしか変化しないことも示す。我々の証明により、固有状態熱化が適用される演算子のクラスを、ハミルトニアンによって弱い次数(weak order)で結合された状態どうしをつなぐ演算子として定義できる。

We derive a criterion that distinguishes whether a generic isolated quantum system initially set out of equilibrium can be considered as localized close to its initial state, or chaotic. Our approach considers the time evolution in the Lanczos basis, which maps the system's dynamics onto that of a particle moving in a one-dimensional lattice where both the energy in the lattice sites and the tunneling from one lattice site to the next are inhomogeneous. We infer a criterion that allows distinguishing localized from chaotic systems. This criterion involves the coupling strengths between Lanczos states and their expectation energy fluctuations. We verify its validity by inspecting three cases, corresponding to Anderson localization as a function of dimension, the out-of-equilibrium dynamics of a many-body dipolar spin system, and integrable systems. We finally show that our approach provides a justification for the Wigner surmise and the eigenstate thermalization hypothesis, which have both been proposed to characterize quantum chaotic systems. Indeed, our criterion for a system to be chaotic implies the level repulsion (also known as spectral rigidity) of eigenenergies, which is characteristic of the Wigner-Dyson distribution; and we also demonstrate that in the chaotic regime, the expectation value of any local observable only weakly varies as a function of eigenstates. Our demonstration allows one to define the class of operators to which the eigenstate thermalization applies, as the ones that connect states that are coupled at weak order by the Hamiltonian.

翻訳日:2023-07-21 13:48:25 公開日:2023-07-20
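
The abstract above relies on the Lanczos basis, in which a Hamiltonian becomes a one-dimensional chain with site energies and nearest-neighbour tunneling amplitudes. The sketch below is the textbook Lanczos recursion with full re-orthogonalization, assuming NumPy and a dense Hermitian matrix. It only illustrates the mapping: it does not implement the paper's localization criterion, and the function and variable names are illustrative.

```python
import numpy as np


def lanczos(hamiltonian, initial_state, num_steps):
    """Return site energies a_n, hoppings b_n and the Lanczos vectors (as rows)."""
    dim = hamiltonian.shape[0]
    basis = np.zeros((num_steps, dim), dtype=complex)
    energies = np.zeros(num_steps)
    hoppings = np.zeros(max(num_steps - 1, 0))
    basis[0] = initial_state / np.linalg.norm(initial_state)
    for n in range(num_steps):
        w = hamiltonian @ basis[n]
        # On-site energy of chain site n: <q_n| H |q_n>.
        energies[n] = np.vdot(basis[n], w).real
        # Remove components along all earlier vectors (full re-orthogonalization).
        for m in range(n + 1):
            w = w - np.vdot(basis[m], w) * basis[m]
        if n + 1 < num_steps:
            norm = np.linalg.norm(w)
            if norm < 1e-12:
                # Krylov space exhausted: return the shorter chain.
                return energies[: n + 1], hoppings[:n], basis[: n + 1]
            hoppings[n] = norm  # tunneling amplitude between sites n and n + 1
            basis[n + 1] = w / norm
    return energies, hoppings, basis
```

In this basis the Hamiltonian is tridiagonal, with `energies` on the diagonal and `hoppings` on the off-diagonals. This is the sense in which the dynamics resembles a particle hopping on an inhomogeneous one-dimensional lattice.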

# TwinLiteNet:自動運転車における走行可能エリアとレーンセグメンテーションのための効率的軽量モデル

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars ( http://arxiv.org/abs/2307.10705v1 )

ライセンス: Link先を確認

Quang Huy Che and Dinh Phuc Nguyen and Minh Quan Pham and Duc Khai Lam

(参考訳) セマンティックセグメンテーションは、自動運転において周囲の環境を理解するための一般的なタスクである。走行可能領域のセグメンテーションと車線検出は、道路上での安全かつ効率的なナビゲーションにとって特に重要である。しかし、従来のセマンティックセグメンテーションモデルは計算コストが高くハイエンドのハードウェアを必要とするため、自動運転車の組み込みシステムでは現実的でない。本稿では、走行可能領域と車線のセグメンテーションのための軽量モデルを提案する。TwinLiteNetは低コストになるよう設計されているが、正確かつ効率的なセグメンテーション結果を達成する。BDD100KデータセットでTwinLiteNetを評価し、最新のモデルと比較する。実験の結果、TwinLiteNetは、必要な計算資源が大幅に少ないにもかかわらず、既存のアプローチと同程度の性能を示した。具体的には、TwinLiteNetはわずか40万パラメータで、走行可能領域タスクで91.3%のmIoU、車線検出タスクで31.08%のIoUを達成し、GPU RTX A5000上で415 FPSを達成した。さらに、TwinLiteNetは計算能力の限られた組み込みデバイス上でもリアルタイムに動作し、特にJetson Xavier NXで60 FPSを達成しているため、自動運転車にとって理想的なソリューションとなる。コードは https://github.com/chequanghuy/TwinLiteNet で入手できる。

Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60 FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available at https://github.com/chequanghuy/TwinLiteNet.

翻訳日:2023-07-21 13:48:00 公開日:2023-07-20
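
The TwinLiteNet abstract above reports IoU and mIoU scores. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed from label maps, assuming NumPy. The exact evaluation protocol behind the paper's numbers (which classes are averaged, how empty classes are handled) is not stated in the abstract, so the conventions below are assumptions.

```python
import numpy as np


def iou(pred_mask, target_mask):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    target_mask = np.asarray(target_mask, dtype=bool)
    union = np.logical_or(pred_mask, target_mask).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred_mask, target_mask).sum() / union)


def mean_iou(pred_labels, target_labels, num_classes):
    """Average per-class IoU over classes present in the prediction or target."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = []
    for cls in range(num_classes):
        pred_mask = pred_labels == cls
        target_mask = target_labels == cls
        if pred_mask.any() or target_mask.any():
            scores.append(iou(pred_mask, target_mask))
    if not scores:
        raise ValueError("no class is present in either label map")
    return float(np.mean(scores))
```

For a two-class drivable-area map, `mean_iou` averages the background IoU and the drivable-area IoU, whereas `iou` on a single mask corresponds to a single-class score such as the lane IoU.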

# 適応型マルチエージェント多腕バンディットを用いた大規模EVの分散型スマート充電

Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits ( http://arxiv.org/abs/2307.10704v1 )

ライセンス: Link先を確認

Sharyal Zafar (ENS Rennes, SATIE), Raphaël Feraud, Anne Blavette (ENS Rennes, SATIE), Guy Camilleri (UT3, IRIT), Hamid Ben (SATIE, ENS Rennes)

(参考訳) 電気自動車と太陽光発電の急激な成長は、ピーク負荷要求による電流混雑や電圧制限違反などの新しい課題をもたらす可能性がある。これらの問題は、電気自動車、すなわちスマート充電の動作を制御することで軽減することができる。集中型スマート充電ソリューションはすでに文献で提案されている。しかし、このようなソリューションはスケーラビリティに欠ける可能性があり、単一障害点やデータプライバシの懸念など、中央集権化の固有の欠点に苦しむ。分散化はこれらの課題に取り組むのに役立つ。本稿では,適応型マルチエージェントシステムの哲学を用いて,完全分散型スマート充電システムを提案する。提案システムでは,マルチアームバンディット学習を用いて不確実性を扱う。提示されたシステムは分散化、スケーラブル、リアルタイム、モデルフリーであり、異なるプレイヤー間で公平性を考慮している。また,性能評価のための詳細なケーススタディも提示した。

The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure, and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.

翻訳日:2023-07-21 13:47:38 公開日:2023-07-20
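
The smart-charging abstract above says that multi-armed bandit learning is used to handle uncertainty, without naming a specific bandit algorithm. As a generic illustration only, the sketch below shows an epsilon-greedy bandit agent in plain Python. Treating each arm as a candidate charging-power level and the reward as a grid-friendliness score is a hypothetical framing, and `EpsilonGreedyAgent` and `run` are illustrative names rather than the paper's method.

```python
import random


class EpsilonGreedyAgent:
    """Generic epsilon-greedy bandit with incremental mean reward estimates."""

    def __init__(self, num_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * num_arms
        self.estimates = [0.0] * num_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Try every arm once before exploiting.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        # Exploit the arm with the best current estimate.
        return max(range(len(self.counts)), key=self.estimates.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]


def run(agent, reward_fn, rounds):
    """Let the agent interact with `reward_fn(arm)` for a number of rounds."""
    for _ in range(rounds):
        arm = agent.select_arm()
        agent.update(arm, reward_fn(arm))
    return agent
```

In a decentralized setting each vehicle would hold its own agent and observe only its own reward, so no central controller needs to see individual charging data.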

# MSQNet:マルチモーダルクエリによるアクターに依存しないアクション認識

MSQNet: Actor-agnostic Action Recognition with Multi-modal Query ( http://arxiv.org/abs/2307.10763v1 )

ライセンス: Link先を確認

Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

(参考訳) 既存の行動認識手法は、アクター間の本質的なトポロジーや見た目の違いのため、一般にアクター固有のものである。そのためアクター固有のポーズ推定(例えば人間と動物)が必要となり、モデル設計が煩雑になり、保守コストも高くなる。さらに、これらの手法は、視覚モダリティのみの学習や単一ラベル分類に注力することが多く、他の利用可能な情報源(クラス名のテキストなど)や複数の行動の同時発生を無視している。これらの制約を克服するために、人間や動物を含むさまざまな種類のアクターに統一的な解決策を提供する「アクター非依存のマルチモーダル・マルチラベル行動認識」という新しいアプローチを提案する。さらに、Transformerベースの物体検出フレームワーク(DETRなど)の中で、視覚とテキストのモダリティを活用して行動クラスをよりよく表現することを特徴とする、新しいMulti-modal Semantic Query Network(MSQNet)モデルを定式化する。アクター固有のモデル設計を排除できることは重要な利点であり、アクターのポーズ推定が一切不要になる。公開されている5つのベンチマークでの広範な実験により、MSQNetは、人間と動物の単一ラベルおよびマルチラベル行動認識タスクにおいて、アクター固有の従来手法を一貫して最大50%上回ることが示された。コードは https://github.com/mondalanindya/MSQNet で公開予定である。

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.

翻訳日:2023-07-21 13:42:01 公開日:2023-07-20

# 単一qudit符号化によるフォールトトレラント計算

Fault-Tolerant Computing with Single Qudit Encoding ( http://arxiv.org/abs/2307.10761v1 )

ライセンス: Link先を確認

Matteo Mezzadri, Alessandro Chiesa, Luca Lepori and Stefano Carretta

(参考訳) 本稿では、単一のマルチレベルquditに符号化された論理量子ビットを用いて、スタビライザ符号をフォールトトレラントに実装するための一般的なアプローチを提案する。これにより、マルチ量子ビット符号で生じるリソースの爆発的増加を防ぐ。提案方式は、誤り訂正と普遍的な量子計算を可能にする。分子スピンquditでのシミュレーションによりその有効性を示し、論理エラーがquditのサイズとともにほぼ指数関数的に抑制されることを見出した。小さなquditで得られる性能は、数千ユニットを用いる量子ビット符号と比較しても注目に値する。

We present a general approach for the Fault Tolerant implementation of stabilizer codes with a logical qubit encoded into a single multi-level qudit, preventing the explosion of resources of multi-qubit codes. The proposed scheme allows for correction and universal quantum computation. We demonstrate its effectiveness by simulations on molecular spin qudits, finding an almost exponential suppression of logical errors with the qudit size. The resulting performance on a small qudit is remarkable when compared to qubit codes using thousands of units.

翻訳日:2023-07-21 13:41:37 公開日:2023-07-20

# Vesper: 音声感情認識のためのコンパクトで効果的な事前学習モデル

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition ( http://arxiv.org/abs/2307.10757v1 )

ライセンス: Link先を確認

Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

(参考訳) 本稿では,一般的な大規模事前学習モデル(PTM)を音声感情認識タスクに適用するパラダイムを提案する。 PTMは、人工知能に新たな光を当てているが、それらは一般的なタスクを念頭に構築されており、特定のタスクに対する有効性をさらに向上することができる。さらに、実用アプリケーションにPTMを採用することは、かなりのサイズであるため、難しい可能性がある。上述の制限は、大規模PTMを特定のタスクに最適化し、コンパクトかつ効果的にタスク固有のPTMを生成するという別の研究方向を生み出します。本稿では,音声感情認識タスクに着目し,vesperと呼ばれる感情特異的事前学習エンコーダを提案する。 Vesperは、WavLMに基づく音声データセットで事前訓練され、感情的特徴を考慮に入れている。感情情報に対する感受性を高めるため、ヴェスパーは感情誘導マスキング戦略を採用し、マスキングが必要な地域を特定する。その後、vesperは階層的および横断的な自己スーパービジョンを採用し、音響的および意味的表現をキャプチャする能力を向上させる。 iemocap、meld、crema-dのデータセットにおける実験結果は、4層からなるvesperが12層のwavlmベースよりも優れており、12層のvesperの性能は24層のwavlmよりも大きいことを示している。

This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to the speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. These limitations motivate another research direction, namely, optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset based on WavLM and takes into account emotional characteristics. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper employs hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.

翻訳日:2023-07-21 13:41:28 公開日:2023-07-20

# LBL:一級分類のための対数障壁損失関数

LBL: Logarithmic Barrier Loss Function for One-class Classification ( http://arxiv.org/abs/2307.10753v1 )

ライセンス: Link先を確認

Tianlei Wang, Dekang Liu, Wandong Zhang, Jiuwen Cao

(参考訳) one-class classification (occ) は、ターゲットのクラスデータのみで分類器を訓練することを目的としており、現実世界のアプリケーションに適用性が強いことで大きな注目を集めている。 OCCには多くの進歩があったが、深層学習に有効なOCC損失機能がない。本稿では,occ目標をスムースに近似することにより,マージンサンプルに大きな勾配を割り当て,よりコンパクトな超球面を導出する新しい対数バリア関数ベースocc損失(lbl)を提案する。しかし、特にサンプルが無限の損失につながる境界上にある場合、lblの最適化は不安定である可能性がある。この問題に対処するため、一方的な緩和Sigmoid関数をLBLに導入し、新しいOCC損失LBLSigを提案する。 LBLSigは平均二乗誤差(MSE)とクロスエントロピー(CE)の融合と見なすことができ、一方の緩和シグモイド関数によりLBLSigの最適化はより滑らかである。提案するlblとlblsigの有効性を,ネットワーク構造の違いに対する最先端occアルゴリズムとの比較により実験的に検証した。ソースコードはhttps://github.com/ML-HDU/LBL_LBLSigにある。

One-class classification (OCC) aims to train a classifier using only target-class data and has attracted great attention for its strong applicability in real-world applications. Although many advances have been made in OCC, effective OCC loss functions for deep learning are still lacking. In this paper, we first propose a novel logarithmic barrier function based OCC loss (LBL), obtained by smoothly approximating the OCC objective, which assigns large gradients to the margin samples and thus derives a more compact hypersphere. However, the optimization of LBL may be unstable, especially when samples lie on the boundary, which leads to an infinite loss. To address this issue, a unilateral relaxation Sigmoid function is introduced into LBL, and a novel OCC loss named LBLSig is proposed. LBLSig can be seen as a fusion of the mean square error (MSE) and the cross entropy (CE), and its optimization is smoother owing to the unilateral relaxation Sigmoid function. The effectiveness of the proposed LBL and LBLSig is experimentally demonstrated in comparisons with several state-of-the-art OCC algorithms on different network structures. The source code can be found at https://github.com/ML-HDU/LBL_LBLSig.
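
The behaviour described above (large gradients near the margin, an infinite loss on the boundary) can be illustrated with a generic logarithmic barrier. The sketch below is only an assumption-laden illustration: the function names, the normalisation by the squared radius, and the fixed radius are hypothetical choices, not the formula from the paper or its repository, and the Sigmoid relaxation of LBLSig is not reproduced because the abstract does not specify it.

```python
import math


def log_barrier_loss(sq_dist: float, sq_radius: float) -> float:
    """Illustrative log-barrier penalty for one sample (hypothetical form).

    sq_dist   -- squared distance of the embedded sample to the sphere centre
    sq_radius -- squared radius of the hypersphere (assumed fixed here)
    """
    slack = sq_radius - sq_dist
    if slack <= 0.0:
        # On or outside the boundary the barrier is infinite.
        return math.inf
    return -math.log(slack / sq_radius)


def log_barrier_grad(sq_dist: float, sq_radius: float) -> float:
    """Derivative of the penalty with respect to sq_dist."""
    slack = sq_radius - sq_dist
    return math.inf if slack <= 0.0 else 1.0 / slack


if __name__ == "__main__":
    # Samples closer to the boundary (sq_dist -> sq_radius) get larger gradients.
    for d in (0.1, 0.5, 0.9, 0.99):
        print(d, log_barrier_loss(d, 1.0), log_barrier_grad(d, 1.0))
```

The gradient grows without bound as a sample approaches the boundary, which is the instability the abstract attributes to LBL and the motivation it gives for the relaxed LBLSig variant.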

翻訳日:2023-07-21 13:41:06 公開日:2023-07-20

# 人工知能が知識労働の創造性に及ぼす影響--機械的プラジャリズムと確率的パロットを超えて-

Exploring Perspectives on the Impact of Artificial Intelligence on the Creativity of Knowledge Work: Beyond Mechanised Plagiarism and Stochastic Parrots ( http://arxiv.org/abs/2307.10751v1 )

ライセンス: Link先を確認

Advait Sarkar

(参考訳) 人工知能(AI)、特に生成モデルは、知識労働のための変換ツールである。彼らは創造性、独創性、盗作、信用の帰属、著作権の所有という概念を問題視している。生成モデルの批判者は、大量のトレーニングデータへの依存を強調し、これらのモデルの出力は、ソースデータのランダム化、リミックス、コラージュ以上のものではないとみなす。これらの理由から、多くの人はこれらのモデルの出力の配置、使用、帰属に関するより強い規制を主張してきた。しかし、これらの問題は人工知能に限ったものではない。本稿では,文学的批判や美術史,著作権法などの例を用いて,創造性と独創性が,対象の不可知性や情報理論的な性質として定義にどのように抵抗するかを示し,その代わりに,プロセスや著者,視聴者の性質として見ることができる。さらに別の見解として、すべての創造的な作業は本質的に再利用される(ほとんどが帰属しない)か、ランダム性自体が創造的になる可能性がある。創造性は最終的にクリエーターとレシーバーのコミュニティによって定義され、ワークフローの創造性はワークフローのどの部分を自動化できるかに依存します。創造的知識労働におけるAIの最近の研究の例から、AIは知識労働を物質生産から重要な統合へとシフトさせることを提案します。本論文は,これらのモデルの利用者の創造的・カリキュラム的音声の重要性を十分に認識し,より単純な表記的・情報理論的な視点から遠ざかる,創造的モデルにおける創造的・信用的割り当ての問題に対する,よりニュアンスなアプローチの議論を開始することを目的としている。

Artificial Intelligence (AI), and in particular generative models, are transformative tools for knowledge work. They problematise notions of creativity, originality, plagiarism, the attribution of credit, and copyright ownership. Critics of generative models emphasise the reliance on large amounts of training data, and view the output of these models as no more than randomised plagiarism, remix, or collage of the source data. On these grounds, many have argued for stronger regulations on the deployment, use, and attribution of the output of these models. However, these issues are not new or unique to artificial intelligence. In this position paper, using examples from literary criticism, the history of art, and copyright law, I show how creativity and originality resist definition as a notatable or information-theoretic property of an object, and instead can be seen as the property of a process, an author, or a viewer. Further alternative views hold that all creative work is essentially reuse (mostly without attribution), or that randomness itself can be creative. I suggest that creativity is ultimately defined by communities of creators and receivers, and the deemed sources of creativity in a workflow often depend on which parts of the workflow can be automated. Using examples from recent studies of AI in creative knowledge work, I suggest that AI shifts knowledge work from material production to critical integration. This position paper aims to begin a conversation around a more nuanced approach to the problems of creativity and credit assignment for generative models, one which more fully recognises the importance of the creative and curatorial voice of the users of these models and moves away from simpler notational or information-theoretic views.

翻訳日:2023-07-21 13:40:44 公開日:2023-07-20

# 公正な意見集約のための投票属性バイアスの緩和

Mitigating Voter Attribute Bias for Fair Opinion Aggregation ( http://arxiv.org/abs/2307.10749v1 )

ライセンス: Link先を確認

Ryosuke Ueda, Koh Takeuchi, Hisashi Kashima

(参考訳) 複数の意見の集約は、雇用や融資審査、教師付き学習のためのデータのラベル付けなど、意思決定において重要な役割を果たす。多数決や既存の意見集約モデルは単純なタスクには有効であるが、客観的に正しいラベルが存在せず意見の不一致が生じうるタスクには不適切である。特に、性別や人種などの投票者属性が意見に偏りをもたらす場合、集約結果は投票者属性の構成によって異なる可能性がある。バランスの取れた投票者グループは公平な集約結果に望ましいが、準備が難しい可能性がある。本研究では, 投票者属性に基づく公正な意見集約を実現する手法を検討し, 集約結果の公平性を評価する。この目的のために、多数決やDawid and Skeneモデル(D&Sモデル)などの意見集約モデルと、サンプル重み付けなどの公平性オプションを組み合わせたアプローチを検討する。意見集約の公平性を評価するには,離散的なクラスラベルよりも確率的なソフトラベルが望ましい。まず,投票者属性を考慮せずにソフトラベル推定の問題に取り組み,D&Sモデルにおけるいくつかの問題点を特定する。これらの制約に対処するため,ソフトラベルの推定精度を向上させた新しいSoft D&Sモデルを提案する。さらに, 合成データと半合成データを用いて, Soft D&Sを含む意見集約モデルの公平性を, 異なる公平性オプションと組み合わせて評価した。実験結果から,Soft D&Sと公平性オプションとしてのデータ分割の組み合わせは密なデータに有効であるのに対し,重み付き多数決は疎なデータに有効であることが示唆された。これらの知見は、バランスのとれた意見集約によって、人間および機械学習モデルによる意思決定を支援する上で特に有用である。

The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.
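
As a rough illustration of how majority voting can yield soft labels, and of how the composition of voter attributes can shift the result, the sketch below compares plain vote fractions with a group-balanced weighting. The weighting scheme (each attribute group contributes equally) and all names are assumptions made for this example; it is not the Soft D&S model and not necessarily the sample weighting used in the paper.

```python
from collections import Counter, defaultdict


def soft_majority(votes):
    """Soft label = fraction of votes per class; votes are (label, group) pairs."""
    counts = Counter(label for label, _ in votes)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def group_balanced_soft_majority(votes):
    """Weight each vote so that every voter-attribute group has equal total mass."""
    group_sizes = Counter(group for _, group in votes)
    n_groups = len(group_sizes)
    weighted = defaultdict(float)
    for label, group in votes:
        weighted[label] += 1.0 / (group_sizes[group] * n_groups)
    return dict(weighted)


if __name__ == "__main__":
    # Three voters from group "A" say "yes", one voter from group "B" says "no".
    votes = [("yes", "A"), ("yes", "A"), ("yes", "A"), ("no", "B")]
    print(soft_majority(votes))                 # dominated by the larger group
    print(group_balanced_soft_majority(votes))  # both groups weigh the same
```

With the unbalanced panel above, the plain soft label leans toward the majority group, while the balanced variant gives both groups the same influence.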

翻訳日:2023-07-21 13:40:12 公開日:2023-07-20

# EdgeAL: OCTセグメンテーションのためのエッジ推定に基づくアクティブラーニングアプローチ

EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation ( http://arxiv.org/abs/2307.10745v1 )

ライセンス: Link先を確認

Md Abdul Kadir, Hasan Md Tusfiqur Alam, Daniel Sonntag

(参考訳) アクティブラーニングアルゴリズムは、限られたデータでのモデルのトレーニングにますます広く使われている。しかし,未知のデータについて利用可能な情報が限られているため,アノテーション対象データの選択は依然として難しい課題である。そこで本研究では,不確かさを計測するために,未知の画像のエッジ情報を事前情報として利用するEdgeALを提案する。不確かさは、エッジを横断するモデル予測の発散とエントロピーを分析することによって定量化される。この尺度はアノテーション用のスーパーピクセルを選択するために使われる。マルチクラス光コヒーレンス・トモグラフィ(OCT)セグメンテーションタスクにおけるEdgeALの有効性を実証し、3つの公開データセット(Duke, AROI, UMN)でアノテーションラベルのコストをそれぞれ12%, 2.3%, 3%に削減しつつ、99%のダイススコアを達成した。ソースコードは https://github.com/Mak-Ta-Reque/EdgeAL で入手できる。

Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information for measuring uncertainty. The uncertainty is quantified by analyzing the divergence and entropy in model predictions across edges. This measure is then used to select superpixels for annotation. We demonstrate the effectiveness of EdgeAL on multi-class Optical Coherence Tomography (OCT) segmentation tasks, where we achieved a 99% dice score while reducing the annotation label cost to 12%, 2.3%, and 3%, respectively, on three publicly available datasets (Duke, AROI, and UMN). The source code is available at https://github.com/Mak-Ta-Reque/EdgeAL.
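
The two quantities named above, entropy and divergence of model predictions, can be computed from per-pixel class probabilities as in the following sketch. It assumes Shannon entropy and KL divergence between the predictions at two neighbouring pixels; how EdgeAL combines these with edge maps and superpixels is not stated in the abstract, so that part is deliberately left out.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two predicted class distributions.

    eps guards against division by zero; it is an implementation assumption.
    """
    return sum(x * math.log(x / max(y, eps)) for x, y in zip(p, q) if x > 0.0)


if __name__ == "__main__":
    # Hypothetical softmax outputs on the two sides of an image edge.
    left = [0.9, 0.05, 0.05]
    right = [0.4, 0.3, 0.3]
    print("entropy left/right:", entropy(left), entropy(right))
    print("divergence across the edge:", kl_divergence(left, right))
```

High entropy flags pixels where the model is unsure, and a large divergence across an edge flags places where predictions change sharply, both of which are plausible signals for choosing regions to annotate.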

翻訳日:2023-07-21 13:39:41 公開日:2023-07-20

# フェデレーション学習のための公正なクライアント選択

Fairness-Aware Client Selection for Federated Learning ( http://arxiv.org/abs/2307.10738v1 )

ライセンス: Link先を確認

Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu

(参考訳) フェデレートラーニング(FL)により、複数のデータ所有者(FLクライアント)が、プライベートデータを公開せずに、機械学習モデルを協調的にトレーニングできるようになった。 FLサーバは各トレーニングラウンドで限られた数のクライアントしか扱えないため、FLクライアントの選択は重要な研究課題となっている。既存のアプローチは一般に、FLモデルの性能向上か、FLクライアントの公平な扱いの強化のいずれかに重点を置いている。 FLクライアント選択時の性能と公平性のバランスに関する問題は未解決のままである。この問題を解決するために、FairFedCS(Fairness-aware Federated Client Selection)アプローチを提案する。リアプノフ最適化に基づき、クライアントの評判、FLタスクへの参加回数、得られるモデル性能への貢献を共同で考慮して、FLクライアントの選択確率を動的に調整する。しきい値に基づく評判フィルタリングを用いないことで、FLクライアントは性能が低いと見なされた後にも評判を回復する機会を与えられ、クライアントの公平な扱いがさらに強化される。実世界のマルチメディアデータセットに基づく広範な実験により、FairFedCSは、最も性能の高い最先端アプローチと比較して、平均で19.6%高い公平性と0.73%高いテスト精度を達成することが示された。

Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.

翻訳日:2023-07-21 13:39:21 公開日:2023-07-20

# ガウス混合系におけるロングテール理論

Long-Tail Theory under Gaussian Mixtures ( http://arxiv.org/abs/2307.10736v1 )

ライセンス: Link先を確認

Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina and Zhenisbek Assylbekov

(参考訳) フェルドマンのロングテール理論(2020年)に準拠したデータ生成のための単純なガウス混合モデルを提案する。線形分類器は,提案モデルの一定レベル以下では一般化誤差を低減できないが,記憶容量を有する非線形分類器は可能である。これは、長い尾の分布に対して、新しいデータへの最適な一般化のために稀なトレーニング例を考慮しなければならないことを裏付ける。最後に, 合成データおよび実データ実験により確認されるように, 尾部がサブポピュレーション周波数分布において短くなるにつれて, 線形モデルと非線形モデルの性能ギャップが小さくなることを示す。

We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.

翻訳日:2023-07-21 13:39:00 公開日:2023-07-20

# 一定深さでのロバストなスパースiqpサンプリング

Robust sparse IQP sampling in constant depth ( http://arxiv.org/abs/2307.10729v1 )

ライセンス: Link先を確認

Louis Paletta, Anthony Leverrier, Alain Sarlette, Mazyar Mirrahimi, Christophe Vuillot

(参考訳) 強固な量子アドバンテージと完全にフォールトトレラントな量子計算の証明を伴わないnisq(noisy intermediate scale quantum)アプローチ間において、(広く受け入れられている複雑性予想の下で)証明可能な超多項量子アドバンテージを達成するためのスキームを提案する。我々は、スパースIQP(Instantaneous Quantum Polynomial-time)回路と呼ばれる通勤ゲートのサンプリング問題の種類を選択し、テトラヘリックス符号を導入することにより、その耐故障性を確保する。この新符号は、複数の四面体符号(3Dカラーコード)をマージして取得され、各スパースIQPゲートがトランスバーサル実装を認め、論理回路の深さをその幅で交換できるという特性を持つ。これらを組み合わせて、符号化状態の作成まで、任意のスパースiqp回路の深さ-1 実装を得る。これは、元の回路の幅で多対数しか持たない空間オーバーヘッドによるものである。さらに、従来の計算からフィードフォワードの単一ステップで、状態準備を一定の深さで行うこともできることを示す。そこで本研究では,1ラウンドの計測とフィードフォワードで一定深度回路上に実装したサンプリング問題に対して,ロバストなスーパーポリノミカル量子優位性を示す。

Between NISQ (noisy intermediate scale quantum) approaches without any proof of robust quantum advantage and fully fault-tolerant quantum computation, we propose a scheme to achieve a provable superpolynomial quantum advantage (under some widely accepted complexity conjectures) that is robust to noise with minimal error correction requirements. We choose a class of sampling problems with commuting gates known as sparse IQP (Instantaneous Quantum Polynomial-time) circuits and we ensure its fault-tolerant implementation by introducing the tetrahelix code. This new code is obtained by merging several tetrahedral codes (3D color codes) and has the following properties: each sparse IQP gate admits a transversal implementation, and the depth of the logical circuit can be traded for its width. Combining those, we obtain a depth-1 implementation of any sparse IQP circuit up to the preparation of encoded states. This comes at the cost of a space overhead which is only polylogarithmic in the width of the original circuit. We furthermore show that the state preparation can also be performed in constant depth with a single step of feed-forward from classical computation. Our construction thus exhibits a robust superpolynomial quantum advantage for a sampling problem implemented on a constant depth circuit with a single round of measurement and feed-forward.

翻訳日:2023-07-21 13:38:48 公開日:2023-07-20

# 簡単な検出による量子状態による物体検出とレンジフィンディング

Object detection and rangefinding with quantum states using simple detection ( http://arxiv.org/abs/2307.10785v1 )

ライセンス: Link先を確認

Richard J. Murchie, Jonathan D. Pritchard, John Jeffers

(参考訳) 信号レベルが弱い雑音環境において、量子照明は、非同時かつ位相非感受性の同時計数に基づく準最適測定の極限においても、対象物の存在と距離を決定する際に古典的な照明よりも優れる。現実的な実験プロトコルを動機として、簡単な検出器による同時計数マルチショットデータを解析するための理論的枠組みを提案する。このアプローチは、見過ごされがちな非同時計数データを含めることを可能にし、物体の存在と距離を推定するためのキャリブレーション不要のしきい値を提供し、異なる検出方式間の公正な比較を可能にする。本研究の結果は, 雑音の多い熱的環境下でターゲット識別を行う際の、古典的照明に対する量子照明の利点を定量化し, 所定の信頼度でターゲットを検出するのに必要なショット数の推定を含む。

In a noisy environment with weak signal levels, quantum illumination can outperform classical illumination in determining the presence and range of a target object even in the limit of sub-optimal measurements based on non-simultaneous, phase-insensitive coincidence counts. Motivated by realistic experimental protocols, we present a theoretical framework for analysing coincident multi-shot data with simple detectors. This approach allows for the often-overlooked non-coincidence data to be included, as well as providing a calibration-free threshold for inferring the presence and range of an object, enabling a fair comparison between different detection regimes. Our results quantify the advantage of quantum over classical illumination when performing target discrimination in a noisy thermal environment, including estimating the number of shots required to detect a target with a given confidence level.

翻訳日:2023-07-21 13:31:06 公開日:2023-07-20

# SMURF: 4次元イメージングレーダを用いた3次元物体検出のための空間多重表現融合

SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar ( http://arxiv.org/abs/2307.10784v1 )

ライセンス: Link先を確認

Jianan Liu, Qiuchi Zhao, Weiyi Xiong, Tao Huang, Qing-Long Han, Bing Zhu

(参考訳) 4Dミリ波レーダー(mmWave)は、悪天候条件下でのコスト効率と操作性から、車両の検知に有望な技術である。しかし、この技術の採用は、レーダポイントクラウドデータにおけるスパーシリティとノイズの問題によって妨げられている。本稿では,単一4次元イメージングレーダを用いた新しい3次元物体検出手法である空間多重表現融合(SMURF)を提案する。 SMURFは、カーネル密度推定(KDE)を通して多次元ガウス混合分布の柱化や密度特性を含むレーダー検出点の複数の表現を利用する。 KDEは、狭角分解能とレーダ信号のマルチパス伝搬による測定精度の低下を効果的に緩和する。さらに、KDEは密度特性をキャプチャすることで、ポイントクラウドの分散を緩和する。 View-of-Delft(VoD)とTJ4DRadSetデータセットの実験的評価は、SMURFの有効性と一般化能力を示し、最近提案された4Dイメージングレーダベースの単一表現モデルよりも優れている。さらに、4Dイメージングレーダのみを使用しながら、SMURFは最先端の4Dイメージングレーダとカメラ融合方式に匹敵する性能を保ち、TJ4DRadSetデータセットの鳥眼視の平均精度は1.22%、VoDデータセットの全注釈領域の平均精度は1.32%向上した。提案手法は印象的な推論時間を示し,2つのデータセットのほとんどのスキャンにおいて0.05秒以内で,リアルタイム検出の課題に対処する。本研究は、4DmmWaveレーダの利点を強調し、4Dイメージングレーダを用いた3次元物体検出に関するその後の研究の強力なベンチマークである。

The 4D Millimeter wave (mmWave) radar is a promising technology for vehicle sensing due to its cost-effectiveness and operability in adverse weather conditions. However, the adoption of this technology has been hindered by sparsity and noise issues in radar point cloud data. This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar. SMURF leverages multiple representations of radar detection points, including pillarization and density features of a multi-dimensional Gaussian mixture distribution through kernel density estimation (KDE). KDE effectively mitigates measurement inaccuracy caused by limited angular resolution and multi-path propagation of radar signals. Additionally, KDE helps alleviate point cloud sparsity by capturing density features. Experimental evaluations on View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF, outperforming recently proposed 4D imaging radar-based single-representation models. Moreover, while using 4D imaging radar only, SMURF still achieves comparable performance to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in the mean average precision on bird's-eye view of TJ4DRadSet dataset and 1.32% in the 3D mean average precision on the entire annotated area of VoD dataset. Our proposed method demonstrates impressive inference time and addresses the challenges of real-time detection, with the inference time no more than 0.05 seconds for most scans on both datasets. This research highlights the benefits of 4D mmWave radar and is a strong benchmark for subsequent works regarding 3D object detection with 4D imaging radar.
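
To make the idea of a KDE-based density feature concrete, the sketch below attaches a Gaussian kernel density estimate to every point of a small point cloud. It is a minimal illustration under stated assumptions: an isotropic Gaussian kernel, a single hand-picked bandwidth, and a brute-force loop over all pairs. The actual kernel, bandwidth, and feature layout used by SMURF are not given in the abstract.

```python
import math


def kde_density(points, bandwidth=1.0):
    """Gaussian kernel density estimate evaluated at each point of the cloud.

    points    -- list of equal-length coordinate tuples
    bandwidth -- kernel standard deviation (an assumed, hand-picked value)
    """
    n = len(points)
    dim = len(points[0])
    norm = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / norm)
    return densities


if __name__ == "__main__":
    # Three clustered detections and one isolated detection (hypothetical data).
    cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for point, density in zip(cloud, kde_density(cloud, bandwidth=0.5)):
        print(point, density)
```

Points inside the cluster receive a higher density than the isolated one, so the value can serve as an extra per-point feature that distinguishes dense returns from sparse or spurious ones.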

翻訳日:2023-07-21 13:30:52 公開日:2023-07-20

# 詳細と詳細:マルチモーダルビジュアルデータによるゼロショットポイントクラウドセグメンテーション

See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data ( http://arxiv.org/abs/2307.10782v1 )

ライセンス: Link先を確認

Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma

(参考訳) ゼロショットポイントクラウドセグメンテーションは、トレーニングフェーズで見えないポイントクラウドで新しいオブジェクトを認識することができるディープモデルを作ることを目的としている。最近のトレンドでは、ラベル付き参照クラスからラベルなしの未認識クラスに知識を転送するパイプラインが好まれている。彼らは通常、視覚的特徴と、見たクラスのアノテーションの監督によって単語の埋め込みから得られる意味的特徴とを一致させる。しかし、ポイントクラウドはセマンティック機能に完全にマッチする限られた情報を含んでいる。実際、画像のリッチな外観情報はテクスチャのない点雲の自然な補完であり、以前の文献ではよく研究されていない。そこで本研究では,点群と画像の相補的情報をより正確な視覚・意味的アライメントに活用するための,新しいマルチモーダルゼロショット学習手法を提案する。セマンティックKITTI と nuScenes という2つの一般的なベンチマークで大規模な実験を行い,本手法は従来のSOTA法よりも52%,49%向上した。

Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point clouds that are unseen in the training phase. Recent trends favor a pipeline that transfers knowledge from seen classes with labels to unseen classes without labels. Such methods typically align visual features with semantic features obtained from word embeddings, supervised by the annotations of the seen classes. However, point clouds contain too little information to fully match the semantic features. In fact, the rich appearance information of images is a natural complement to the textureless point cloud, which is not well explored in previous literature. Motivated by this, we propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment. Extensive experiments are performed in two popular benchmarks, i.e., SemanticKITTI and nuScenes, and our method outperforms current SOTA methods with 52% and 49% improvement on average for unseen class mIoU, respectively.

翻訳日:2023-07-21 13:30:23 公開日:2023-07-20

# 視覚トランスフォーマーの学習しきい値トークンのマージとプルーニング

Learned Thresholds Token Merging and Pruning for Vision Transformers ( http://arxiv.org/abs/2307.10780v1 )

ライセンス: Link先を確認

Maxim Bonnaerens, Joni Dambre

(参考訳) ビジョントランスフォーマーは、過去数年間、幅広いコンピュータビジョンタスクで顕著な成功を収めてきた。しかし、それらの高い計算コストは、実際の展開にとって重要な障壁である。特に、トランスフォーマーモデルの複雑さは、入力トークンの数に関して二次的である。そのため、処理が必要な入力トークンの数を減らす技術が提案されている。本稿では,トークンマージとトークンプルーニングの両方の長所を活用する新しいアプローチであるLTMP(Learned Thresholds token Merging and Pruning)を紹介する。 LTMPは学習しきい値マスキングモジュールを使用して、マージするトークンとプルーするトークンを動的に決定する。我々は、ImageNet分類タスクにおいて、視覚変換器に関する広範な実験を行った。以上の結果から,LTMPは従来の手法よりも桁違いに高速な1つの微調整エポックしか必要とせず,縮小速度をまたいで最先端の精度を達成できることが示唆された。コードはhttps://github.com/Mxbonn/ltmpで入手できる。

Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .
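
A threshold-based token mask of the kind described above can be sketched as follows. Everything here is an assumption for illustration: the importance scores, the sigmoid relaxation used as a differentiable stand-in during training, the temperature, and the function names are hypothetical and do not come from the LTMP code. Only the pruning side is shown; merging is omitted.

```python
import math


def soft_mask(scores, threshold, temperature=0.1):
    """Smooth keep-probability per token: sigmoid((score - threshold) / temperature)."""
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


def hard_prune(tokens, scores, threshold):
    """Keep only the tokens whose importance score exceeds the threshold."""
    return [t for t, s in zip(tokens, scores) if s > threshold]


if __name__ == "__main__":
    tokens = ["cls", "sky", "dog", "grass"]
    scores = [0.95, 0.10, 0.80, 0.30]  # hypothetical importance scores
    print(soft_mask(scores, threshold=0.4))
    kept = hard_prune(tokens, scores, threshold=0.4)
    # Self-attention cost grows quadratically with the number of tokens.
    print(kept, "relative attention cost:", (len(kept) / len(tokens)) ** 2)
```

Because attention cost is quadratic in the token count, keeping half of the tokens in this toy case cuts the pairwise attention work to roughly a quarter.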

翻訳日:2023-07-21 13:30:03 公開日:2023-07-20

# 効率的なビームツリー再帰

Efficient Beam Tree Recursion ( http://arxiv.org/abs/2307.10779v1 )

ライセンス: Link先を確認

Jishnu Ray Chowdhury, Cornelia Caragea

(参考訳) Beam Tree Recursive Neural Network (BT-RvNN)は、最近、Gumbel Tree RvNNの単純な拡張として提案され、他のタスクで同等のパフォーマンスを維持しながら、ListOpsにおいて最先端の長さ一般化性能を達成することが示されている。しかし、BT-RvNNは、同種の手法の中で最悪ではないものの、メモリ使用量の面では依然として極めて高コストになりうる。本稿では,BT-RvNNのメモリ使用量の主なボトルネックは,スコアラー関数と再帰的セル関数の絡み合いであることを示す。我々は、このボトルネックを取り除き、メモリ使用をさらに単純化する戦略を提案する。全体として、我々の戦略はBT-RvNNのメモリ使用量を$10$〜$16$分の1に削減するだけでなく、他のタスクで同様のパフォーマンスを維持しながら、ListOpsにおいて新たな最先端性能を達成する。さらに、BT-RvNNが生成する潜在木ノード表現を利用して、BT-RvNNを $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ という形の文エンコーダから $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$ という形の系列コンテキスト化器へと変換する戦略も提案する。したがって、我々の提案はRvNNのさらなるスケーラビリティへの道を開くだけでなく、TransformersやStructured State Spaceモデルといった他の一般的なモデルと容易に積み重ねたり接続したりできる、ディープラーニングツールキットのもう一つのビルディングブロックとしてBT-RvNNを使用する方法を標準化する。

Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst in its kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.

翻訳日:2023-07-21 13:29:46 公開日:2023-07-20

# 大規模言語モデルを用いた極多ラベルスキル抽出訓練

Extreme Multi-Label Skill Extraction Training using Large Language Models ( http://arxiv.org/abs/2307.10778v1 )

ライセンス: Link先を確認

Jens-Joris Decorte, Severine Verlinden, Jeroen Van Hautte, Johannes Deleu, Chris Develder and Thomas Demeester

(参考訳) オンライン求人広告は、スキル要件に関する貴重な情報源であり、労働市場分析やe-recruitmentプロセスにおいて重要な役割を果たす。このような広告は通常フリーテキストで書かれているため、自動的に処理するには自然言語処理(NLP)技術が必要である。具体的には、スキル(文字通り言及されたもの、または暗黙的に記述されたもの)を検出し、それらを大規模なスキルオントロジーにリンクするタスクに焦点を当てる。これは極端なマルチラベル分類(XMLC)の難しいケースである。この特定のXMLCタスクには十分な規模のラベル付き(トレーニング)データセットが存在しないことを考慮し、汎用の大規模言語モデル(LLM)を活用する手法を提案する。本稿では,スキル抽出のための精度の高い完全合成ラベル付きデータセットを生成するための費用対効果の高いアプローチについて述べ,このタスクに有効であることが示された対照学習戦略を提示する。 3つのスキル抽出ベンチマークにおける結果は,リテラルマッチングによる遠隔監視のみに依存していた既発表の結果と比較して,R-Precision@5 が一貫して15〜25パーセントポイント向上することを示している。

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of between 15 to 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.
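
For readers unfamiliar with the reported metric, the sketch below computes R-Precision@K for a single job-ad sentence, using the definition common in the extreme multi-label literature: the number of relevant labels among the top K predictions, divided by min(K, number of relevant labels). The abstract does not spell out the definition, so this form, the function name, and the skill labels are assumptions for illustration.

```python
def r_precision_at_k(ranked_labels, relevant_labels, k=5):
    """R-Precision@K for one example.

    ranked_labels   -- predicted labels, best first
    relevant_labels -- set of gold labels (must be non-empty)
    """
    if not relevant_labels:
        raise ValueError("R-Precision@K is undefined without relevant labels")
    hits = sum(1 for label in ranked_labels[:k] if label in relevant_labels)
    return hits / min(k, len(relevant_labels))


if __name__ == "__main__":
    # Hypothetical ranked skill predictions for one sentence.
    ranked = ["python", "sql", "java", "excel", "git"]
    print(r_precision_at_k(ranked, {"python", "git"}))
    print(r_precision_at_k(ranked, {"python", "rust", "go"}))
```

Dividing by min(K, number of relevant labels) means an example with only two gold skills can still reach a perfect score, which plain Precision@5 would cap at 0.4.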

翻訳日:2023-07-21 13:29:10 公開日:2023-07-20

# 変形可能なニューラルメッシュプリミティブを用いた都市放射場表現

Urban Radiance Field Representation with Deformable Neural Mesh Primitives ( http://arxiv.org/abs/2307.10776v1 )

ライセンス: Link先を確認

Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang

(参考訳) Neural Radiance Fields (NeRF) はここ数年で大きな成功を収めている。しかし、レイマーチングベースのレンダリングのために、現在のほとんどのメソッドは依然として大量のリソースを必要とする。都市レベルの放射場を効率的に構築するために,変形可能なニューラルメッシュプリミティブ(DNMP)を設計し,これらのプリミティブを用いてシーン全体をパラメータ化することを提案する。 DNMPは古典的なメッシュ表現の柔軟でコンパクトなニューラル版であり、ラスタライズベースのレンダリングの効率と、フォトリアリスティックな画像合成のための強力なニューラル表現能力の両方を備えている。具体的には、DNMPは、局所領域の幾何および放射情報をパラメータ化するために、対になる頂点特徴を持つ連結された変形可能なメッシュ頂点の集合からなる。最適化の自由度を制限し、ストレージコストを下げるために、各プリミティブの形状を比較的低次元の潜在空間から復号するように強制する。レンダリング色は、ビュー依存MLPにより頂点特徴(ラスタライズで補間)からデコードされる。 DNMPは、魅力的な特性を持つ都市レベルのシーン表現のための新しいパラダイムを提供する。(1) 高品質なレンダリング: 本手法は,都市シナリオにおける新規ビュー合成で先進的な性能を達成する。(2) 低い計算コスト: 我々の表現は高速レンダリング(2.07ms/1kピクセル)と低いピークメモリ使用量(110MB/1kピクセル)を可能にする。また、バニラNeRFより33倍高速に動作し、高度に最適化されたInstant-NGPに匹敵する(0.61対0.71ms/1kピクセル)軽量バージョンも提示する。プロジェクトページ: https://dnmp.github.io/

Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. To constrain the degree of freedom for optimization and lower the storage budgets, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: (1) High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. (2) Low computational costs. Our representation enables fast rendering (2.07ms/1k pixels) and low peak memory usage (110MB/1k pixels). We also present a lightweight version that can run 33× faster than vanilla NeRFs, and comparable to the highly-optimized Instant-NGP (0.61 vs 0.71ms/1k pixels). Project page: https://dnmp.github.io/.

翻訳日:2023-07-21 13:28:52 公開日:2023-07-20

# データ駆動ソフトウェアエンジニアリングにおけるAutoMLの利用を評価する

Assessing the Use of AutoML for Data-Driven Software Engineering ( http://arxiv.org/abs/2307.10774v1 )

ライセンス: Link先を確認

Fabio Calefato, Luigi Quaranta, Filippo Lanubile, Marcos Kalinowski

(参考訳) 背景。ソフトウェアアプリケーション構築に人工知能(AI)と機械学習(ML)が広く採用されているため、企業はそのような技術を深く理解している従業員の採用に苦労している。このシナリオでは、AutoMLは、通常は専門のチームメンバーが設計するエンドツーエンドのAI/MLパイプラインの構築を自動化することを約束するため、AI/MLスキルギャップを埋める有望なソリューションとして浮上している。目的。関心の高まりと高い期待にもかかわらず、AutoMLがAI/ML対応システムを開発するチームに現在どの程度採用されているか、実践者や研究者にどのように認識されているかについての情報はほとんどない。方法。本稿では,このギャップを埋めるために,2つのSEデータセットにおける12のエンドツーエンドAutoMLツールのベンチマークと,フォローアップインタビューを伴うユーザ調査を組み合わせた混合手法研究を行い,AutoMLの採用と認識についての理解を深める。結果。AutoMLソリューションは、SEドメインの分類タスクにおいて、研究者がトレーニングし最適化したモデルを上回るモデルを生成できることが分かった。また、我々の調査結果によると、現在利用可能なAutoMLソリューションは、ML開発ワークフローの各段階およびすべてのチームメンバーに対して自動化を均等にサポートしていないため、その名に見合うものではない。結論。我々は、AutoMLがSE研究コミュニティの活動をどのように促進できるかを同コミュニティに伝え、次世代のAutoML技術をどのように設計すべきかをツールビルダーに伝えるための洞察を導き出す。

Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.

翻訳日:2023-07-21 13:28:24 公開日:2023-07-20

# ビジュアルスペクトログラムを用いたResNetとBi-GRUによる音楽ジャンル分類

Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms ( http://arxiv.org/abs/2307.10773v1 )

ライセンス: Link先を確認

Junfei Zhang

(参考訳) 音楽レコメンデーションシステムは、音楽消費を支配している音楽ストリーミングサービスのユーザエクスペリエンスと満足度を高めるために欠かせない要素となっている。これらのレコメンダシステムを改善する上で重要な課題は、音楽データの複雑さ、特にその基盤となる音楽ジャンル分類を理解することである。手動ジャンル分類の限界は、より高度なシステム、すなわち自動音楽ジャンル分類(AMGC)システムの必要性を浮き彫りにしている。従来の機械学習技術はジャンル分類において可能性を示しているが、手作業で設計された特徴量と特徴選択に大きく依存しており、音楽データの複雑さを十分に捉えられない。一方で、従来の畳み込みニューラルネットワーク(CNN)のようなディープラーニング分類アーキテクチャは、空間階層を捉えるのに有効であるが、音楽データに固有の時間的ダイナミクスを捉えるのに苦労している。これらの課題に対処するために、本研究では視覚スペクトログラムを入力として用いる新しいアプローチを提案し、Residual Neural Network(ResNet)とGated Recurrent Unit(GRU)の強みを組み合わせたハイブリッドモデルを提案する。このモデルは、音楽データのより包括的な分析を提供するように設計されており、より正確なジャンル分類を通じて音楽レコメンダシステムを改善する可能性がある。

Music recommendation systems have emerged as a vital component to enhance user experience and satisfaction for the music streaming services, which dominate music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, specifically for the underpinning music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced system, namely the Automatic Music Genre Classification (AMGC) system. While traditional machine learning techniques have shown potential in genre classification, they heavily rely on manually engineered features and feature selection, failing to capture the full complexity of music data. On the other hand, deep learning classification architectures like the traditional Convolutional Neural Networks (CNN) are effective in capturing the spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach using visual spectrograms as input, together with a hybrid model that combines the strengths of the Residual Neural Network (ResNet) and the Gated Recurrent Unit (GRU). This model is designed to provide a more comprehensive analysis of music data, offering the potential to improve music recommender systems through potentially more accurate genre classification.

翻訳日:2023-07-21 13:27:55 公開日:2023-07-20

# エニグマのデコード:作業記憶のさまざまな面に人間とAIをベンチマークする

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory ( http://arxiv.org/abs/2307.10768v1 )

ライセンス: Link先を確認

Ankur Sikarwar and Mengmi Zhang

(参考訳) ワーキングメモリ(WM)は、情報の一時的な保存、統合、操作、検索を容易にする基本的な認知プロセスであり、推論や意思決定において重要な役割を果たす。WMの多面的な性質を捉えたロバストなベンチマークデータセットは、AI WMモデルの効果的な開発と評価に不可欠である。ここでは、この目的のために包括的なワーキングメモリ(WorM)ベンチマークデータセットを紹介する。WorMは10のタスクと合計100万のトライアルで構成され、WMの4つの機能、3つのドメイン、11の行動的・神経的特性を評価する。これらすべてのタスクで、最先端のリカレントニューラルネットワークとトランスフォーマーを共同でトレーニングし、テストした。比較のための上限として、人間の行動ベンチマークも含めている。結果は、AIモデルが脳におけるWMのいくつかの特徴、特にプライマシー効果とリーセンシー効果、およびWMの異なるドメインと機能に特化した神経クラスターと相関を再現することを示唆している。実験では、人間の行動を近似するうえでの既存モデルのいくつかの限界も明らかにした。このデータセットは、認知心理学、神経科学、AIのコミュニティにとって貴重なリソースであり、WMモデルの比較と改良、WMの神経基盤の調査、人間に似た能力を持つWMモデルの開発のための標準化されたフレームワークを提供する。ソースコードとデータは https://github.com/ZhangLab-DeepNeuroCogLab/WorM で入手できる。

Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.

翻訳日:2023-07-21 13:27:30 公開日:2023-07-20

# 適応的な特徴単位圧縮による通信効率の高い分割学習

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression ( http://arxiv.org/abs/2307.10805v1 )

ライセンス: Link先を確認

Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, and Yo-Seb Jeon

(参考訳) 本稿では、SLの学習過程において中間特徴ベクトルと勾配ベクトルの伝送に必要な通信オーバーヘッドを低減する、SplitFCという新しい通信効率の高い分割学習(SL)フレームワークを提案する。SplitFCの鍵となるアイデアは、行列の列ごとに異なる分散度を活用することである。SplitFCには2つの圧縮戦略がある。(i) 適応的特徴単位ドロップアウトと (ii) 適応的特徴単位量子化である。第1の戦略では、中間特徴ベクトルを、これらのベクトルの標準偏差に基づいて決定される適応的なドロップアウト確率でドロップする。そして、連鎖律により、ドロップされた特徴ベクトルに関連する中間勾配ベクトルもドロップする。第2の戦略では、ドロップされなかった中間特徴ベクトルと勾配ベクトルを、ベクトルの範囲に基づいて決定される適応的な量子化レベルを用いて量子化する。量子化誤差を最小限に抑えるため、この戦略の最適な量子化レベルは閉形式で導出される。MNIST、CIFAR-10、CelebAデータセットでのシミュレーションの結果、SplitFCは最先端のSLフレームワークと比較して分類精度が5.6%以上向上し、圧縮のないバニラSLフレームワークに比べて通信オーバーヘッドが320分の1で済むことが示されている。

This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead compared to the vanilla SL framework without compression.
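
The two strategies named in the SplitFC abstract above can be illustrated with a minimal sketch. It assumes a keep probability proportional to each feature column's standard deviation and a plain uniform quantizer over each column's own range. Both rules, and the names `keep_probabilities`, `adaptive_dropout`, and `quantize_column`, are hypothetical stand-ins chosen for illustration. The abstract does not give the paper's actual probability rule or its closed-form quantization levels, so neither is reproduced here.

```python
import random
import statistics


def keep_probabilities(columns, expected_kept):
    """Keep probability per feature column, proportional to its standard deviation.

    Assumption: the proportional rule and the clipping at 1.0 are illustrative
    choices, not the rule derived in the paper.
    """
    stds = [statistics.pstdev(col) for col in columns]
    total = sum(stds)
    if total == 0.0:
        # All columns are constant: fall back to a uniform keep probability.
        return [min(1.0, expected_kept / len(columns))] * len(columns)
    return [min(1.0, expected_kept * s / total) for s in stds]


def adaptive_dropout(columns, expected_kept, rng):
    """Return the indices of the columns that survive the random dropout."""
    probs = keep_probabilities(columns, expected_kept)
    return [i for i, p in enumerate(probs) if rng.random() < p]


def quantize_column(col, levels):
    """Uniformly quantize one column onto `levels` points spanning its own range."""
    lo, hi = min(col), max(col)
    if hi == lo or levels < 2:
        # Degenerate case: a single representable value, so collapse to the minimum.
        return [lo] * len(col)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in col]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [10.0, -10.0, 0.0]]
    kept = adaptive_dropout(features, expected_kept=2, rng=rng)
    compressed = {i: quantize_column(features[i], levels=4) for i in kept}
    print(kept, compressed)
```

In this sketch a constant column carries no dispersion and is always dropped, while a wide-range column is kept with high probability and then quantized on a grid matched to its range. By the chain rule mentioned in the abstract, the gradient columns with the same indices as the dropped features would be skipped as well.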

翻訳日:2023-07-21 13:23:01 公開日:2023-07-20

# 海洋科学のための時空間データマイニング:データ,方法論,機会

Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities ( http://arxiv.org/abs/2307.10803v1 )

ライセンス: Link先を確認

Hanchen Yang and Wengen Li and Shuyu Wang and Hui Li and Jihong Guan and Shuigeng Zhou and Jiannong Cao

(参考訳) 時空間(ST)海洋データの増加に伴い、気候予測や災害警報といった様々な海洋問題に対処するため、多くの時空間データマイニング(STDM)研究が実施されている。典型的なSTデータ(例えば、交通データ)と比較すると、ST海洋データはより複雑で、多様な地域性や高いスパース性といった固有の特徴がある。これらの特徴はSTDMモデルの設計と訓練を困難にしている。残念なことに、これらの研究の概要はいまだに欠けており、コンピュータ科学者が海洋分野の研究課題を特定することを妨げ、海洋科学の研究者が高度なSTDM技術を適用することをためらわせている。この状況を改善するため、海洋分野における既存のSTDM研究を要約する包括的なサーベイを提供する。具体的には、広く使用されているST海洋データセットをまず要約し、その固有の特徴を特定する。次に、典型的なST海洋データの品質向上技術について述べる。続いて、海洋分野の既存のSTDM研究を、予測、事象検出、パターンマイニング、異常検出という4種類のタスクに分類し、これらのタスクの技術を詳述する。最後に、有望な研究機会を取り上げる。このサーベイは、コンピュータ科学と海洋科学の両分野の科学者が、海洋分野におけるSTDMの基本概念、鍵となる技術、未解決の課題をよりよく理解するのに役立つだろう。

With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists from identifying the research issues in ocean science while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey to summarize existing STDM studies in ocean. Concretely, we first summarize the widely-used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from the fields of both computer science and ocean science have a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in ocean.

翻訳日:2023-07-21 13:22:38 公開日:2023-07-20

# Meta-Transformer: マルチモーダル学習のための統一フレームワーク

Meta-Transformer: A Unified Framework for Multimodal Learning ( http://arxiv.org/abs/2307.10802v1 )

ライセンス: Link先を確認

Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

(参考訳) マルチモーダル学習は、複数のモダリティからの情報を処理し、関連付けるモデルを構築することを目的としている。この分野における長年の発展にもかかわらず、モダリティ間に固有のギャップがあるため、様々なモダリティ(例えば、自然言語、2D画像、3D点群、音声、動画、時系列、表形式データ)を処理する統一ネットワークを設計することは依然として困難である。本研究では、凍結(frozen)エンコーダを利用して、対になったマルチモーダルトレーニングデータを一切用いずにマルチモーダル知覚を行う、Meta-Transformerというフレームワークを提案する。Meta-Transformerでは、様々なモダリティからの生の入力データを共有トークン空間にマッピングし、凍結パラメータを持つ後続のエンコーダが入力データの高レベルな意味的特徴を抽出できるようにする。統合データトークナイザ、モダリティ共有エンコーダ、下流タスク用のタスク固有ヘッドという3つの主要コンポーネントで構成されるMeta-Transformerは、12のモダリティにまたがる統一学習を非ペアデータで実行する最初のフレームワークである。様々なベンチマークでの実験によると、Meta-Transformerは基本的な知覚(テキスト、画像、点群、音声、動画)、実用的な応用(X線、赤外線、ハイパースペクトル、IMU)、データマイニング(グラフ、表形式、時系列)など、幅広いタスクを処理できる。Meta-Transformerは、トランスフォーマーを用いた統合マルチモーダルインテリジェンスの開発に有望な未来を示している。コードは https://github.com/invictus717/MetaTransformer で公開される予定である。

Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities (e.g., natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a frozen encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer

翻訳日:2023-07-21 13:22:14 公開日:2023-07-20

# 合成一般化のための層間表現融合

Layer-wise Representation Fusion for Compositional Generalization ( http://arxiv.org/abs/2307.10799v1 )

ライセンス: Link先を確認

Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye, Yidong Chen, Xiaodong Shi

(参考訳) 幅広い応用で成功しているにもかかわらず、シーケンス・ツー・シーケンスモデルによる解の構成は、人間のような一般化に比べて構成性に乏しいと論じられている。合成一般化を妨げる理由の1つは、エンコーダとデコーダの最上層の表現が絡み合っていることであるという証拠が増えている。言い換えると、シーケンスの構文的表現と意味的表現が不適切に絡み合っている。しかし、従来の研究の多くは、人間のようにシーケンスの構文的・意味的表現を適切に構成して使用するのではなく、表現の絡み合い問題を緩和するためにトークンレベルの意味情報を強化することに主に注力している。さらに、より深いTransformerの訓練に関する最近の研究の観点から、絡み合い問題が存在する理由を説明する。それは主に、「浅い(shallow)」残差接続とその単純なワンステップ操作が、前の層の情報を効果的に融合できないことに起因する。この発見を出発点とし、人間の戦略に着想を得て、各エンコーダ層およびデコーダ層にfuse-attentionモジュールを導入することで、前の層の情報をエンコードおよびデコードの過程に適切に融合し直すことを学習する、シーケンス・ツー・シーケンスモデルの拡張であるFuSion(Fusing Syntactic and Semantic Representations)を提案する。FuSionは2つの現実的なベンチマークで競争力のある結果、さらにはstate-of-the-artの結果を達成し、提案手法の有効性を実証的に示している。

Despite successes across a broad range of applications, the solutions constructed by sequence-to-sequence models are argued to be less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is that the representations of the encoder and decoder uppermost layers are entangled. In other words, the syntactic and semantic representations of sequences are twisted inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information to alleviate the representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately as humans do. In addition, we explain why the entanglement problem exists from the perspective of recent studies about training deeper Transformers, mainly owing to the "shallow" residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose FuSion (Fusing Syntactic and Semantic Representations), an extension to sequence-to-sequence models that learns to fuse previous layers' information back into the encoding and decoding process appropriately by introducing a fuse-attention module at each encoder and decoder layer. FuSion achieves competitive and even state-of-the-art results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.

翻訳日:2023-07-21 13:21:42 公開日:2023-07-20

# HyperReenact: 顔の精緻化とリターゲティングの同時学習によるワンショット再現

HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces ( http://arxiv.org/abs/2307.10797v1 )

ライセンス: Link先を確認

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

(参考訳) 本稿では、ターゲットの顔のポーズによって駆動される、ソース人物のリアルなトーキングヘッド画像の生成を目的とした、HyperReenactと呼ばれるニューラル顔再現法を提案する。既存の最先端の顔再現法は、リアルな顔画像の合成を学習する制御可能な生成モデルを訓練するが、特に極端な頭部ポーズの変化という困難な条件下では、再現された顔に顕著な視覚的アーティファクトが生じやすく、あるいはソース人物の特徴をよりよく保持するために高コストなfew-shotファインチューニングを必要とする。本稿では、事前学習済みのStyleGAN2ジェネレータのフォトリアリスティックな生成能力と分離された(disentangled)特性を活用することで、これらの制約に対処することを提案する。まず実画像をその潜在空間に逆変換し、次にハイパーネットワークを用いて (i) ソース人物の特徴の精緻化と (ii) 顔のポーズのリターゲティングを行うことで、通常アーティファクトを生じさせる外部編集手法への依存をなくす。本手法は、ワンショット設定(すなわち単一のソースフレームを使用する)で動作し、被写体固有のファインチューニングを必要とせずにクロスサブジェクトの再現を可能にする。VoxCeleb1およびVoxCeleb2の標準ベンチマークにおいて、本手法をいくつかの最先端技術と定量的かつ定性的に比較し、極端な頭部ポーズの変化の下でも顕著なロバスト性を示す、アーティファクトのない画像生成における本アプローチの優位性を示す。コードと事前学習済みモデルは https://github.com/StelaBou/HyperReenact で公開している。

In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet they produce reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or require expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform: (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, thereby eliminating the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact

翻訳日:2023-07-21 13:20:48 公開日:2023-07-20

# 古典的ジャミングに対する量子強化レンジフィンディングの実証

Demonstration of quantum-enhanced rangefinding robust against classical jamming ( http://arxiv.org/abs/2307.10794v1 )

ライセンス: Link先を確認

Mateusz P. Mrozowski, Richard J. Murchie, John Jeffers, Jonathan D. Pritchard

(参考訳) 本稿では、連続励起光子対源と簡易な検出を組み合わせた量子強化ライダーが、信号レベルと背景レベルの差が5桁を超え、目標反射率が-52 dBまで低い条件で動作することを実証する。対数尤度解析の枠組みを用いてこの検出器の性能を特徴付け、高速および低速の古典的ジャミングに対するシステムの頑健性を示す。その際、高周波変動への耐性を維持しつつ、緩やかな背景変化の影響を排除する動的背景追跡を実装するための新しいプロトコルを導入する。最後に、このシステムを古典的ジャミング存在下での測距に拡張し、検出器ジッタのみに制限された11 cmの空間分解能でターゲットの位置を特定する。これらの結果は、ライダー応用において量子相関を利用する利点を示し、このシステムを現実のシナリオに実装するための明確な道筋を提供する。

In this paper we demonstrate operation of a quantum-enhanced lidar based on a continuously pumped photon pair source combined with simple detection in regimes with over 5 orders of magnitude separation between signal and background levels and target reflectivity down to -52 dB. We characterise the performance of our detector using a log-likelihood analysis framework, and crucially demonstrate the robustness of our system to fast and slow classical jamming, introducing a new protocol to implement dynamic background tracking to eliminate the impact of slow background changes whilst maintaining immunity to high frequency fluctuations. Finally, we extend this system to the regime of rangefinding in the presence of classical jamming to locate a target with an 11 cm spatial resolution limited only by the detector jitter. These results demonstrate the advantage of exploiting quantum correlations for lidar applications, providing a clear route to implementation of this system in real-world scenarios.

翻訳日:2023-07-21 13:19:44 公開日:2023-07-20

# Few/many-shot異常検出のためのPatchCoreの最適化

Optimizing PatchCore for Few/many-shot Anomaly Detection ( http://arxiv.org/abs/2307.10792v1 )

ライセンス: Link先を確認

João Santos, Triet Tran, Oliver Rippel

(参考訳) few-shot異常検出(AD)は、AD全般の中で新たに生まれつつあるサブフィールドであり、選択された少数のサンプルのみを用いて正常データと異常データを区別しようとする。新たに提案されたfew-shot AD手法は、full-shotドメイン向けに開発された既存のアルゴリズムをベースラインとして比較しているが、それらをfew-shot設定向けに専用に最適化してはいない。したがって、そのような既存アルゴリズムの性能をさらに改善できるかどうかは不明である。本研究ではこの問いに取り組む。具体的には、現在最先端のfull-shot AD/ASアルゴリズムであるPatchCoreのAD/異常セグメンテーション(AS)性能を、few-shot設定とmany-shot設定の両方で調査する。我々は、(I) 様々なハイパーパラメータを最適化すること、および (II) few-shot教師あり学習を改善することが知られている技術をADドメインに転用することで、さらなる性能向上が実現できると仮定した。公開データセットであるVisAとMVTec ADでの徹底的な実験により、(I) 基礎となる特徴抽出器などのハイパーパラメータを最適化することで大幅な性能向上が実現できること、(II) 画像レベルのデータ拡張は性能を向上させうるが、その保証はないことが明らかになった。これらの知見に基づき、VisA上のfew-shot ADにおいて新たなstate of the artを達成し、既存のAD/AS手法をfew-shot設定に適応させるメリットをさらに実証する。最後に、強い帰納バイアスを持つ特徴抽出器の調査を、(few-shot) AD/ASの今後の研究方向性として挙げる。

Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only a few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed to, improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.

翻訳日:2023-07-21 13:19:12 公開日:2023-07-20

# 視覚・言語ナビゲーションエージェントの行動解析

Behavioral Analysis of Vision-and-Language Navigation Agents ( http://arxiv.org/abs/2307.10790v1 )

ライセンス: Link先を確認

Zijiao Yang, Arjun Majumdar, Stefan Lee

(参考訳) Vision-and-Language Navigation (VLN) エージェントが成功するためには、周囲の状況に基づいて指示を行動に接地(グラウンディング)できなければならない。本研究では、エージェントの行動をスキルごとに調べる方法論を開発し、停止、旋回、指定された物体や部屋への移動に関する指示を既存のエージェントがどの程度うまく接地できるかを検証する。このアプローチは、スキル固有の介入を生成し、エージェントの予測の変化を測定することに基づいている。最近のエージェントの行動を分析する詳細なケーススタディを示し、次にスキル固有の能力スコアの観点から複数のエージェントを比較する。この分析は、訓練に由来するバイアスがエージェントの行動に持続的な影響を与えること、および既存のモデルが単純な参照表現を接地できることを示唆している。モデル間の比較から、スキル固有のスコアはVLNタスク全体の性能向上と相関することが示された。

To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology to study agent behavior on a skill-specific basis -- examining how well existing agents ground instructions about stopping, turning, and moving towards specified objects or rooms. Our approach is based on generating skill-specific interventions and measuring changes in agent predictions. We present a detailed case study analyzing the behavior of a recent agent and then compare multiple agents in terms of skill-specific competency scores. This analysis suggests that biases from training have lasting effects on agent behavior and that existing models are able to ground simple referring expressions. Our comparisons between models show that skill-specific scores correlate with improvements in overall VLN task performance.

翻訳日:2023-07-21 13:18:44 公開日:2023-07-20

# 分類器の混合に対する敵対的攻撃

Adversarial attacks for mixtures of classifiers ( http://arxiv.org/abs/2307.10788v1 )

ライセンス: Link先を確認

Lucas Gnecco Heredia, Benjamin Negrevergne, Yann Chevaleyre

(参考訳) 敵対的攻撃に対する堅牢性を改善する手段として、分類器の混合(すなわちランダム化アンサンブル)が提案されている。しかし、既存の攻撃はこの種の分類器には適していないことが示されている。本稿では、混合を原理的な方法で攻撃する問題について議論し、問題の幾何学的解析に基づいて、攻撃の2つの望ましい特性(有効性と極大性)を導入する。そして、既存の攻撃がこれら両方の特性を満たしてはいないことを示す。最後に、二値線形設定において理論的保証を持つ格子クライマー攻撃(lattice climber attack)という新たな攻撃を導入し、合成データセットと実データセットでの実験によりその性能を実証する。

Mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, it has been shown that existing attacks are not well suited for this kind of classifier. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees on the binary linear setting, and we demonstrate its performance by conducting experiments on synthetic and real datasets.
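
To make the setting of the abstract above concrete, here is a small sketch of how an attack on a mixture of binary linear classifiers could be scored. It is not the lattice climber attack, which the abstract does not specify. It only evaluates a hand-picked list of candidate perturbations and measures the probability mass of the classifiers each one fools. The names `predict`, `fooled_mass`, and `best_candidate`, and the toy weights, are hypothetical.

```python
def predict(w, b, x):
    """Binary linear classifier: returns +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def fooled_mass(mixture, x, y, delta):
    """Total mixture probability of the classifiers that misclassify x + delta.

    `mixture` is a list of (weights, bias, probability) triples whose
    probabilities sum to 1.
    """
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return sum(q for (w, b, q) in mixture if predict(w, b, x_adv) != y)


def best_candidate(mixture, x, y, candidates):
    """Pick the candidate perturbation that fools the most probability mass."""
    return max(candidates, key=lambda d: fooled_mass(mixture, x, y, d))


if __name__ == "__main__":
    # Three classifiers that all label the origin as +1.
    mixture = [
        ((1.0, 0.0), 1.0, 0.5),   # fooled when x1 < -1
        ((-1.0, 0.0), 1.0, 0.25),  # fooled when x1 > 1
        ((0.0, 1.0), 1.0, 0.25),   # fooled when x2 < -1
    ]
    x, y = (0.0, 0.0), 1
    candidates = [(-1.5, 0.0), (1.5, -1.5), (-1.5, -1.5)]
    for d in candidates:
        print(d, fooled_mass(mixture, x, y, d))
    print("best:", best_candidate(mixture, x, y, candidates))
```

In this toy example, a perturbation aimed only at the most probable classifier fools less mass than one that crosses two decision boundaries at once. That gap is the kind of shortfall the maximality property in the abstract is concerned with.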

翻訳日:2023-07-21 13:18:30 公開日:2023-07-20

# クラスプロトタイプによるフィードフォワードソースフリードメイン適応

Feed-Forward Source-Free Domain Adaptation via Class Prototypes ( http://arxiv.org/abs/2307.10787v1 )

ライセンス: Link先を確認

Ondrej Bohdal, Da Li, Timothy Hospedales

(参考訳) ソースフリーなドメイン適応は、実用性があり、ソースデータにアクセスする必要がないため人気がある。しかし、適応プロセスにはまだかなりの時間が必要であり、主にバックプロパゲーションに依存する最適化に基づいている。本稿では,バックプロパゲーションに基づく適応の必要性に挑戦する単純なフィードフォワードアプローチを提案する。提案手法は,事前学習モデルを用いて,ドメインシフト下でのクラスプロトタイプの計算に基づいている。事前学習したモデルに比べて精度が大幅に向上し、既存のドメイン適応法のわずかな時間しか必要としない。

Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.
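
The idea in the abstract above, adapting without back-propagation by computing class prototypes from a pre-trained model, can be sketched as follows. This is a minimal illustration under stated assumptions: the features and pseudo-labels are taken to come from one forward pass of the pre-trained model on unlabeled target data, and adapted predictions are made by nearest-prototype matching. The abstract does not say how the prototypes are used, so the matching rule and the names `class_prototypes` and `nearest_prototype` are hypothetical.

```python
def class_prototypes(features, pseudo_labels):
    """Average the feature vectors assigned to each class.

    `features` are target-domain embeddings and `pseudo_labels` are the
    pre-trained model's own predictions for them (an assumption of this sketch).
    """
    sums, counts = {}, {}
    for feat, label in zip(features, pseudo_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_prototype(feature, prototypes):
    """Classify a feature by its closest prototype (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda c: sqdist(feature, prototypes[c]))


if __name__ == "__main__":
    target_features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
    pseudo = [0, 0, 1, 1]
    protos = class_prototypes(target_features, pseudo)
    print(protos)
    print(nearest_prototype([1.0, 1.0], protos))
```

Everything here is averaging and distance computation, so no gradients or optimizer steps are involved, which is in line with the feed-forward framing of the abstract.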

翻訳日:2023-07-21 13:18:19 公開日:2023-07-20

# 対比的(反事実的)説明のMiller定義の修正

Modifications of the Miller definition of contrastive (counterfactual) explanations ( http://arxiv.org/abs/2307.10832v1 )

ライセンス: Link先を確認

Kevin McAreavey, Weiru Liu

(参考訳) Millerは最近、よく知られたHalpern-Pearl(HP)による原因と(非対比的)説明の定義に基づいて、対比的(反事実的)説明の定義を提案した。重要なことに、Millerの定義は元々のHPによる説明の定義に基づいていたが、この定義はその後Halpernによって修正されている。おそらく、元の定義が多くの標準的な例で直観に反する結果をもたらすためである。さらに最近では、Bornerが、この修正HP定義も直観に反する結果をもたらしうることを指摘し、第3の定義を提案している。本稿では、Millerの定義が元のHP定義に見られる問題を引き継いでいることを示す。我々は、より堅牢な修正HP定義とBornerの定義に基づく2つの改良版を提案することで、これらの問題に対処する。新しい定義を分析し、これらがMillerの定義の精神を保っていること、すなわち3つの変種すべてが、基盤となる非対比的説明の定義に関してモジュラーな、別の統一的定義を満たすことを示す。我々の知る限り、本論文は、元のHP定義と修正HP定義との初めての明示的な比較も提供する。

Miller recently proposed a definition of contrastive (counterfactual) explanations based on the well-known Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations. Crucially, the Miller definition was based on the original HP definition of explanations, but this has since been modified by Halpern; presumably because the original yields counterintuitive results in many standard examples. More recently Borner has proposed a third definition, observing that this modified HP definition may also yield counterintuitive results. In this paper we show that the Miller definition inherits issues found in the original HP definition. We address these issues by proposing two improved variants based on the more robust modified HP and Borner definitions. We analyse our new definitions and show that they retain the spirit of the Miller definition where all three variants satisfy an alternative unified definition that is modular with respect to an underlying definition of non-contrastive explanations. To the best of our knowledge this paper also provides the first explicit comparison between the original and modified HP definitions.

翻訳日:2023-07-21 13:11:10 公開日:2023-07-20

# Yelpレビューと食品タイプ: レーティング、センチメント、トピックの比較分析

Yelp Reviews and Food Types: A Comparative Analysis of Ratings, Sentiments, and Topics ( http://arxiv.org/abs/2307.10826v1 )

ライセンス: Link先を確認

Wenyu Liao, Yiqing Shi, Yujia Hu, Wei Quan

(参考訳) 本研究は、Yelpのレビューと食品の種類との関係を調べ、評価、感情、トピックが食品の種類によってどのように異なるかを調査する。具体的には、レビューの評価と感情が食品の種類によってどのように変化するかを分析し、評価と感情に基づいて食品の種類をクラスタリングし、機械学習モデルを用いてレビューのトピックを推定し、異なる食品の種類の間でトピック分布を比較する。分析の結果、評価、感情、トピックの分布が類似している食品の種類がある一方で、異なるパターンを示すものもあることが明らかになった。評価と感情に基づいて食品の種類の4つのクラスタを特定し、レビュアーが特定の食品の種類をレビューする際に異なるトピックに注目する傾向があることを見出した。これらの知見は、デジタルメディアプラットフォームにおけるユーザ行動と文化的影響を理解し、異文化間の理解と評価を促進するうえで重要な意味を持つ。

This study examines the relationship between Yelp reviews and food types, investigating how ratings, sentiments, and topics vary across different types of food. Specifically, we analyze how ratings and sentiments of reviews vary across food types, cluster food types based on ratings and sentiments, infer review topics using machine learning models, and compare topic distributions among different food types. Our analyses reveal that some food types have similar rating, sentiment, and topic distributions, while others have distinct patterns. We identify four clusters of food types based on ratings and sentiments and find that reviewers tend to focus on different topics when reviewing certain food types. These findings have important implications for understanding user behavior and cultural influence on digital media platforms and promoting cross-cultural understanding and appreciation.

翻訳日:2023-07-21 13:10:53 公開日:2023-07-20

# Parse and Recall: 放射線科医のような正確な肺結節悪性度予測を目指して

Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists ( http://arxiv.org/abs/2307.10824v1 )

ライセンス: Link先を確認

Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang

(参考訳) 肺がんは世界中で主要な死因であり、早期スクリーニングは生存率の向上に不可欠である。臨床現場では、結節の文脈構造と放射線科医の蓄積した経験が、良性結節と悪性結節の識別精度に関わる2つの中核要素である。文脈情報は、位置、形状、周辺血管など結節に関する包括的な情報を提供し、経験豊富な放射線科医は、意思決定の根拠を強化するために、過去の症例から参考となる手がかりを探すことができる。本稿では、放射線科医の診断過程を模倣する、放射線科医に着想を得た手法を提案する。この手法は、コンテキスト解析モジュールとプロトタイプ想起モジュールで構成される。コンテキスト解析モジュールは、まず結節の文脈構造をセグメント化し、次に結節をより包括的に理解するために文脈情報を集約する。プロトタイプ想起モジュールは、プロトタイプベースの学習を利用して、以前に学習した症例を比較分析用のプロトタイプとして凝縮する。プロトタイプは訓練中にモーメンタム方式でオンライン更新される。この2つのモジュールを基盤として、本手法は結節の内在的特徴と他の結節から蓄積された外部知識の両方を活用し、確かな診断を実現する。低線量と非造影の両方のスクリーニングのニーズを満たすため、低線量CTと非造影CTからそれぞれ12,852個と4,029個の結節からなる大規模データセットを収集した。各結節には病理診断または経過観察で確定したラベルが付いている。複数のデータセットでの実験により、本手法が低線量と非造影の両方のシナリオで優れたスクリーニング性能を達成することが示された。

Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios.

翻訳日:2023-07-21 13:10:36 公開日:2023-07-20

# 漸進的意味セグメンテーションのための勾配-意味論的補償

Gradient-Semantic Compensation for Incremental Semantic Segmentation ( http://arxiv.org/abs/2307.10822v1 )

ライセンス: Link先を確認

Wei Cong, Yang Cong, Jiahua Dong, Gan Sun, Henghui Ding

(参考訳) インクリメンタルセマンティックセグメンテーションは、以前に学習したクラスの訓練データにアクセスすることなく、新たに到来するクラスのセグメンテーションを継続的に学習することを目的とする。しかし、現在のほとんどの手法は、1) 不均衡な勾配逆伝播によって生じる忘却ペースの違いを考慮せず、以前のすべてのクラスを等しく扱い、2) クラス間の強い意味的ガイダンスを欠くため、破滅的忘却と背景シフトに対処できない。これらの課題に取り組むため、本稿では勾配と意味の両方の観点からインクリメンタルセマンティックセグメンテーションに取り組むGradient-Semantic Compensation(GSC)モデルを提案する。具体的には、勾配の側面から破滅的忘却に対処するため、勾配逆伝播を再重み付けすることで既出クラスの忘却ペースのバランスをとる、ステップを考慮した勾配補償を開発する。また、意味の側面から破滅的忘却を緩和するため、ソフトラベルを介して一貫したクラス間の意味関係を蒸留するsoft-sharp意味関係蒸留を提案する。さらに、背景シフトを緩和する強い意味的ガイダンスを与えるプロトタイプに基づく擬似再ラベリングを開発する。これは、ピクセルとクラスごとのプロトタイプとの距離を測ることで、背景に含まれる古いクラスに対して高品質な擬似ラベルを生成する。3つの公開データセット、すなわちPascal VOC 2012、ADE20K、Cityscapesでの広範な実験により、提案するGSCモデルの有効性が実証された。

Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation; 2) lack strong semantic guidance between classes. To tackle the above challenges, in this paper, we propose a Gradient-Semantic Compensation (GSC) model, which surmounts incremental semantic segmentation from both gradient and semantic perspectives. Specifically, to address catastrophic forgetting from the gradient aspect, we develop a step-aware gradient compensation that can balance forgetting paces of previously seen classes via re-weighting gradient backpropagation. Meanwhile, we propose a soft-sharp semantic relation distillation to distill consistent inter-class semantic relations via soft labels for alleviating catastrophic forgetting from the semantic aspect. In addition, we develop a prototypical pseudo re-labeling that provides strong semantic guidance to mitigate background shift. It produces high-quality pseudo labels for old classes in the background by measuring distances between pixels and class-wise prototypes. Extensive experiments on three public datasets, i.e., Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of our proposed GSC model.
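The prototypical pseudo re-labeling step, which measures distances between pixels and class-wise prototypes, can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the Euclidean distance, the `max_distance` threshold, the `BACKGROUND` label id, and the toy features are all assumptions.

```python
import math
from typing import Dict, List, Sequence

BACKGROUND = 0  # hypothetical label id used for "background" in this sketch


def relabel_background(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    prototypes: Dict[int, Sequence[float]],
    max_distance: float,
) -> List[int]:
    """Give background pixels the label of the nearest old-class prototype.

    A pixel is relabeled only if that prototype lies within max_distance;
    otherwise it stays background. Non-background pixels are never changed.
    """
    relabeled = list(labels)
    if not prototypes:
        return relabeled
    for i, (feature, label) in enumerate(zip(features, labels)):
        if label != BACKGROUND:
            continue
        nearest_class = min(prototypes, key=lambda c: math.dist(feature, prototypes[c]))
        if math.dist(feature, prototypes[nearest_class]) <= max_distance:
            relabeled[i] = nearest_class
    return relabeled


# Toy usage: three "pixels" with 2-D features and two old-class prototypes.
pixel_features = [(0.1, 0.0), (5.0, 5.0), (0.9, 1.0)]
pixel_labels = [BACKGROUND, BACKGROUND, 3]
old_class_prototypes = {1: (0.0, 0.0), 2: (1.0, 1.0)}
print(relabel_background(pixel_features, pixel_labels, old_class_prototypes, max_distance=0.5))
# [1, 0, 3]
```

The threshold is what keeps genuinely unknown background pixels from being forced into an old class; how the paper sets or avoids such a threshold is not stated in the abstract.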

翻訳日:2023-07-21 13:10:06 公開日:2023-07-20

# 電磁散乱における第1ボルン近似の厳密性

Exactness of the first Born approximation in electromagnetic scattering ( http://arxiv.org/abs/2307.10819v1 )

ライセンス: Link先を確認

Farhang Loran and Ali Mostafazadeh

(参考訳) 一般に異方的でありうる定常線形媒質による3次元の平面電磁波散乱に対して、入射波数$k$が予め定めた値$\alpha$を超えない限り第1ボルン近似が散乱波の厳密な表現を与えるような、媒質の誘電率テンソルおよび透磁率テンソルに関する条件を与える。また、この条件下では媒質が$k\leq \alpha/2$に対して全方向的に不可視であること、すなわち入射波の偏光によらず広帯域の不可視性を示すことを示す。

For the scattering of plane electromagnetic waves by a general possibly anisotropic stationary linear medium in three dimensions, we give a condition on the permittivity and permeability tensors of the medium under which the first Born approximation yields the exact expression for the scattered wave whenever the incident wavenumber $k$ does not exceed a pre-assigned value $\alpha$. We also show that under this condition the medium is omnidirectionally invisible for $k\leq \alpha/2$, i.e., it displays broadband invisibility regardless of the polarization of the incident wave.
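As a reminder of what exactness means here, the following is a schematic scalar analogue, not the paper's tensor-valued electromagnetic formulation. The symbols $\psi_0$ (incident wave), $V$ (interaction term), and $G_0$ (free Green's operator) are assumptions introduced only for this illustration. Iterating the Lippmann-Schwinger equation gives the Born series, and the first Born approximation keeps only the term linear in $V$:

```latex
\begin{align}
\psi &= \psi_0 + G_0 V \psi
      = \psi_0 + G_0 V \psi_0 + G_0 V G_0 V \psi_0 + \cdots, \\
\psi_{\mathrm{Born}} &= \psi_0 + G_0 V \psi_0 .
\end{align}
```

In this picture, the first Born approximation is exact when the terms of second and higher order in $V$ contribute nothing to the scattered wave. The abstract states that this is the case for incident wavenumbers $k \le \alpha$ under the authors' condition.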

翻訳日:2023-07-21 13:09:35 公開日:2023-07-20

# BoxDiff: トレーニング不要なボックス制約拡散を用いたテキスト・画像合成

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion ( http://arxiv.org/abs/2307.10816v1 )

ライセンス: Link先を確認

Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng and Mike Zheng Shou

(参考訳) 最近のテキストから画像への拡散モデルは、高品質な画像を生成する驚くべき能力を示している。しかし、研究者は主にテキストプロンプトだけで画像の合成方法を研究した。他のモダリティを条件として利用する研究もあるが、箱/マスク画像ペアや微調整時間など、かなりのペアデータが必要となる。このようなペアデータには時間と労力がかかり、クローズドセットに制限されるため、オープンワールドにおけるアプリケーションのボトルネックになる可能性がある。本稿では,ボックスやスクリブルなどのユーザ提供条件の最も単純な形式に焦点を当てる。上記の問題を緩和するために,与えられた空間条件に固執する合成画像内のオブジェクトやコンテキストを制御するためのトレーニングフリーな手法を提案する。具体的には、3つの空間的制約、すなわち、インナーボックス、アウターボックス、コーナー制約は、追加のトレーニングや大量のアノテートレイアウトデータを必要としない拡散モデルのデノイングステップにシームレスに統合される。提案した制約は, 安定拡散モデルが高忠実で多様な概念カバレッジで合成できる能力を維持しつつ, 画像中の何とどこに表示すべきかを制御できることを示す。コードはhttps://github.com/Sierkinhane/BoxDiffで公開されている。

Recent text-to-image diffusion models have demonstrated an astonishing capacity to generate high-quality images. However, researchers mainly studied the way of synthesizing images with only text prompts. While some works have explored using other modalities as conditions, considerable paired data, e.g., box/mask-image pairs, and fine-tuning time are required for nurturing models. As such paired data is time-consuming and labor-intensive to acquire and restricted to a closed set, this potentially becomes the bottleneck for applications in an open world. This paper focuses on the simplest form of user-provided conditions, e.g., box or scribble. To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images adhering to the given spatial conditions. Specifically, three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints, are designed and seamlessly integrated into the denoising step of diffusion models, requiring no additional training and massive annotated layout data. Extensive results show that the proposed constraints can control what and where to present in the images while retaining the ability of the Stable Diffusion model to synthesize with high fidelity and diverse concept coverage. The code is publicly available at https://github.com/Sierkinhane/BoxDiff.

翻訳日:2023-07-21 13:09:21 公開日:2023-07-20

# クロスコーパス多言語音声感情認識:アムハラ語対他言語

Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages ( http://arxiv.org/abs/2307.10814v1 )

ライセンス: Link先を確認

Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmed Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed, Jun Feng

(参考訳) 従来の音声感情認識(SER)タスクでは、ある言語の分類器は、同じ言語の既存データセットで訓練される。しかし、ある言語の訓練データが存在しない場合は、代わりに他の言語のデータを使用できる。本研究では、アムハラ語、英語、ドイツ語、ウルドゥー語を対象に、言語横断および多言語のSERを実験する。アムハラ語には、我々が公開しているAmharic Speech Emotion Dataset(ASED)を用いる。英語、ドイツ語、ウルドゥー語には、既存のRAVDESS、EMO-DB、URDUデータセットを用いる。先行研究に従い、すべてのデータセットのラベルを正と負の2クラスにマッピングした。これにより、異なる言語の性能を直接比較でき、訓練とテストで言語を組み合わせることができる。実験1では、AlexNet、VGGE(提案するVGGの変種)、ResNet50の3つの分類器を用いて単言語SERの試行を行った。3つのモデルの平均結果はASEDとRAVDESSで非常に近く、アムハラ語と英語のSERは同程度に難しいことが示唆された。同じ比較から、ドイツ語のSERはより難しく、ウルドゥー語のSERはより易しいことが示唆された。実験2では、ある言語で訓練し別の言語でテストすることを、各ペア(アムハラ語<->ドイツ語、アムハラ語<->英語、アムハラ語<->ウルドゥー語)の両方向で行った。アムハラ語をターゲットとした結果は、英語またはドイツ語をソースとして使うと最良の結果が得られることを示唆した。実験3では、複数の非アムハラ語で訓練し、アムハラ語でテストした。得られた最良の精度は実験2の最良の精度より数パーセント高く、訓練に2つまたは3つの非アムハラ語を使うと、1つだけを使う場合より良い結果が得られることが示唆された。全体として、ある言語のリソースが乏しい場合、言語横断および多言語の訓練がSER分類器を訓練する有効な戦略になりうることが示唆された。

In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language does not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets. We followed previous research in mapping labels for all datasets to just two classes, positive and negative. Thus we can compare performance on different languages directly, and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for the three models were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. By the same comparison, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each pair: Amharic<->German, Amharic<->English, and Amharic<->Urdu. Results with Amharic as target suggested that using English or German as source will give the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percent greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training a SER classifier when resources for a language are scarce.

翻訳日:2023-07-21 13:08:58 公開日:2023-07-20

# 全方位音声視覚信号の知覚品質評価

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals ( http://arxiv.org/abs/2307.10813v1 )

ライセンス: Link先を確認

Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai

(参考訳) 医療、教育、広告、観光などの分野において、Omnidirectional Video (ODV) はますます重要な役割を担っている。 ODVの品質を評価することは、サービスプロデューサにとってユーザのQuality of Experience(QoE)を改善する上で重要である。しかし、既存のODVの品質評価研究はビデオの歪みにのみ焦点を当てているが、全体的なQoEは付随する音声信号にも依存している。本稿では,まず,高画質全方向A/Vコンテンツから生成される375個の全方向オーディオ視覚(A/V)シーケンスと,それに対応する知覚的オーディオ視覚品質スコアを含む,全方向ビデオのための大規模オーディオ視覚品質評価データセットを確立する。そこで,本研究では,マルチモーダル融合戦略を用いて,既存の単一モードオーディオおよびビデオQAモデルを組み合わせた全方位オーディオ視覚品質評価(OAVQA)のための3つのベースライン手法を設計する。我々は,OAVQAに対するA/Vマルチモーダル融合法の有効性を検証し,全方位QoE評価のための新しいベンチマークを提供する。私たちのデータセットはhttps://github.com/iamazxl/oavqaで利用可能です。

Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc. Assessing the quality of ODVs is significant for service-providers to improve the user's Quality of Experience (QoE). However, most existing quality assessment studies for ODVs only focus on the visual distortions of videos, while ignoring that the overall QoE also depends on the accompanying audio signals. In this paper, we first establish a large-scale audio-visual quality assessment dataset for omnidirectional videos, which includes 375 distorted omnidirectional audio-visual (A/V) sequences generated from 15 high-quality pristine omnidirectional A/V contents, and the corresponding perceptual audio-visual quality scores. Then, we design three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA), which combine existing state-of-the-art single-mode audio and video QA models via multimodal fusion strategies. We validate the effectiveness of the A/V multimodal fusion method for OAVQA on our dataset, which provides a new benchmark for omnidirectional QoE evaluation. Our dataset is available at https://github.com/iamazxl/OAVQA.

翻訳日:2023-07-21 13:08:28 公開日:2023-07-20

# 「第二の心を持っているように感じた」:大規模言語モデルを用いたプレライティングにおける人間とAIの共創造性の調査

"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models ( http://arxiv.org/abs/2307.10811v1 )

ライセンス: Link先を確認

Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu

(参考訳) プレライティング(prewriting)は、最初の草稿を書く前にアイデアを発見し発展させるプロセスであり、発散的思考を必要とし、図解、アウトライン作成、フリーライティングなどの非構造的な戦略をしばしば伴う。大規模言語モデル(LLM)は創作を含む様々なタスクに有用であることが示されているが、プレライティングを支援するためにユーザがLLMとどのように協働するかはほとんど分かっていない。このような創造的プロセスにおいてLLMに望まれる協働上の役割と主導性も明らかではない。プレライティング中の人間とLLMの協働パターンとダイナミクスを調べるため、15人の参加者を対象に、物語執筆とスローガン作成という2つの創作タスクで3セッションの質的研究を行った。その結果、協働的なプレライティングにおいては、発想(Ideation)、ひらめき(Illumination)、実装(Implementation)の3段階からなる反復的なHuman-AI共創造プロセスが存在するとみられることが分かった。この協働プロセスでは人間が主導的な役割を担い、加えて人間とLLMの間で主導性の度合いが混在し変化する。本研究はまた、このプロセス中に生じる協働の破綻と、Human-AI共創造における既存LLMの利用に対するユーザの認識を報告し、この共創造プロセスを支援するための設計上の示唆について論じる。

Prewriting is the process of discovering and developing ideas before a first draft, which requires divergent thinking and often implies unstructured strategies such as diagramming, outlining, free-writing, etc. Although large language models (LLMs) have been demonstrated to be useful for a variety of tasks including creative writing, little is known about how users would collaborate with LLMs to support prewriting. The preferred collaborative role and initiative of LLMs during such a creativity process is also unclear. To investigate human-LLM collaboration patterns and dynamics during prewriting, we conducted a three-session qualitative study with 15 participants in two creative tasks: story writing and slogan writing. The findings indicated that during collaborative prewriting, there appears to be a three-stage iterative Human-AI Co-creativity process that includes Ideation, Illumination, and Implementation stages. This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs. This research also reports on collaboration breakdowns that occur during this process, user perceptions of using existing LLMs during Human-AI Co-creativity, and discusses design implications to support this co-creativity process.

翻訳日:2023-07-21 13:08:06 公開日:2023-07-20

# 最適輸送による模倣学習におけるエキスパート実演の結合について

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport ( http://arxiv.org/abs/2307.10810v1 )

ライセンス: Link先を確認

Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth

(参考訳) 模倣学習(il)は、専門家によるデモンストレーションを通じてエージェントに特定のタスクを教える。 ILの主要なアプローチの1つは、エージェントと専門家の間の距離を定義し、その距離を最小化するエージェントポリシーを見つけることである。エージェントと専門家の軌跡間の有意な距離を測定する手段を提供するため、模倣学習において最適な輸送法が広く用いられている。しかしながら、複数の専門家によるデモを最適に組み合わせる方法については、広く研究されていない。標準的な方法は、状態(-アクション)軌跡を単純に結合することであり、これはトラジェクトリがマルチモーダルである場合に問題となる。提案手法は,マルチマルジナルな最適輸送距離を用いて,複数の状態軌跡と多種多様な状態軌跡の組み合わせをOT感覚で実現し,より合理的な幾何平均値を提供する方法である。提案手法は,複数の専門家からエージェントが学習し,その効率をOpenAI Gym制御環境上で解析し,標準手法が常に最適であるとは限らないことを示す。

Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments and demonstrates that the standard method is not always optimal.
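To see why concatenating demonstrations can be problematic when they are multi-modal, consider a deliberately simplified one-dimensional illustration. It is not the paper's multi-marginal algorithm: it treats each demonstration as an unordered set of scalar states and uses the fact that, in 1-D, the uniform-weight 2-Wasserstein barycenter of equal-size samples is obtained by averaging sorted samples rank by rank. The function names and toy data are assumptions.

```python
from typing import List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1-D empirical distributions."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def barycenter_1d(demos: Sequence[Sequence[float]]) -> List[float]:
    """Uniform-weight 2-Wasserstein barycenter in 1-D: average sorted samples rank by rank."""
    if len({len(d) for d in demos}) != 1:
        raise ValueError("this sketch assumes equal sample counts")
    sorted_demos = [sorted(d) for d in demos]
    return [sum(rank) / len(demos) for rank in zip(*sorted_demos)]


# Two experts visit clearly separated regions of a 1-D state space.
expert_a = [0.0, 1.0, 2.0]
expert_b = [10.0, 11.0, 12.0]

concatenated = expert_a + expert_b           # bimodal target: two separate clusters
average = barycenter_1d([expert_a, expert_b])  # single cluster between the experts

print(average)                            # [5.0, 6.0, 7.0]
print(wasserstein_1d(average, expert_a))  # 5.0
print(wasserstein_1d(average, expert_b))  # 5.0
```

The concatenated target keeps both modes, so an agent matching it is pulled toward two incompatible behaviours at once, whereas the barycenter is a single distribution equidistant from both experts in the OT sense. Whether such an average is desirable depends on the task.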

翻訳日:2023-07-21 13:07:43 公開日:2023-07-20

# 関係時間異常検出を含むクラウドシステムの性能問題同定

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection ( http://arxiv.org/abs/2307.10869v1 )

ライセンス: Link先を確認

Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang and Michael Lyu

(参考訳) パフォーマンス問題は、大規模なクラウドサービスシステムに浸透し、大きな収益損失につながる可能性がある。信頼性の高いパフォーマンスを保証するためには、サービス監視メトリクスを使用してこれらの問題を正確に識別し、ローカライズする必要がある。現代のクラウドシステムの複雑さと規模を考えると、このタスクは困難であり、個々の人間の能力を超えた幅広い専門知識とリソースを必要とする可能性がある。既存の手法では、各メトリックを独立して分析して異常を検出することでこの問題に対処している。しかし、これはエンジニアが手動で診断することが難しい圧倒的な警報嵐を引き起こす可能性がある。より良いパフォーマンスを追求するためには、メトリクスの時間的パターンだけでなく、メトリクス(リレーショナルパターン)間の相関も考慮し、多変量メトリクス異常検出問題として定式化する必要がある。しかし、ほとんどの研究はこれらの2種類の特徴を明示的に抽出するに足りていない。さらに、トレーニングデータ中にラベルのない異常が混在しており、検出性能を損なう可能性がある。これらの制約に対処するために,メトリクスの相関情報と時間情報を組み合わせた関係時間異常検出モデル(RTAnomaly)を提案する。 RTAnomalyは、メトリクス間の依存関係を学習するためにグラフアテンション層を使用し、異常を効果的に発生させる可能性のある異常メトリクスの特定をさらに助ける。さらに、ポジティブなラベルなし学習の概念を利用して、トレーニングデータの潜在的な異常の問題に対処する。提案手法を評価するため,公開データセットと2つの産業データセットを用いて実験を行った。 RTAnomaly は、平均 F1 スコア 0.929 と Hit@3 0.920 を達成し、その優位性を示している。

Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational-Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which further helps to effectively pinpoint the anomalous metrics that may cause the anomaly. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.
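The two reported metrics can be computed along the following lines. The F1 score is standard. For Hit@k the sketch assumes a common definition, in which a case counts as a hit when at least one truly anomalous metric appears among the k highest-ranked metrics; the paper may define it differently. The metric names and toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for binary anomaly labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def hit_at_k(cases: Sequence[Tuple[List[str], Set[str]]], k: int = 3) -> float:
    """Fraction of cases whose top-k ranked metrics contain a true culprit."""
    if not cases:
        return 0.0
    hits = sum(1 for ranking, culprits in cases if culprits & set(ranking[:k]))
    return hits / len(cases)


# Toy usage with hypothetical monitoring metrics.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
print(round(f1_score(predicted, actual), 3))  # 0.667

ranking = ["cpu", "mem", "disk", "net"]
cases = [(ranking, {"disk"}), (ranking, {"net"})]
print(hit_at_k(cases, k=3))  # 0.5
```

F1 measures whether anomalous time points are detected, while Hit@k measures whether the metrics responsible are localized, so the two numbers answer different questions.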

翻訳日:2023-07-21 13:02:34 公開日:2023-07-20

# FigCaps-HF: 人間のフィードバックを用いた図からキャプションへの生成フレームワークとベンチマーク

FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback ( http://arxiv.org/abs/2307.10867v1 )

ライセンス: Link先を確認

Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi

(参考訳) キャプションは科学的な可視化や文書を理解する上で不可欠である。科学図に対する既存のキャプション生成手法は、文書から抽出した図とキャプションのペアを訓練に用いるが、その多くは有用性、説明可能性、視覚的記述性といった指標[15]の点で不十分であり、生成されるキャプションが読者の好みと一致しない原因となっている。高品質な図キャプションの生成を可能にするため、読者の好みに最適化されたキャプションの生成にドメイン専門家のフィードバックを取り込める、図キャプション生成の新しいフレームワークFigCaps-HFを導入する。本フレームワークは、1) 図とキャプションのペアの品質を評価する自動手法、2) 読者の好みに合わせて図からキャプションへの生成モデルを最適化する、人間のフィードバックによる新しい強化学習(RLHF)手法から構成される。様々な種類のモデルで標準的なファインチューニングより性能が向上することを示し、この単純な学習フレームワークの有効性を実証する。特に、ベースモデルとしてBLIPを用いた場合、本RLHFフレームワークはROUGE、BLEU、Meteorでそれぞれ平均35.7%、16.9%、9%の向上を達成する。最後に、この問題に対するRLHF手法のさらなる評価と開発を可能にするため、図とキャプションのペアに対する人間のフィードバックを付与した大規模ベンチマークデータセットを公開する。

Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15], leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.

翻訳日:2023-07-21 13:02:06 公開日:2023-07-20

# ディープグラフパーシステンスによるニューラルパーシステンスの注意点への対処

Addressing caveats of neural persistence with deep graph persistence ( http://arxiv.org/abs/2307.10865v1 )

ライセンス: Link先を確認

Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke

(参考訳) ニューラルパーシスタンス(Neural Persistence)は、ディープラーニングにおけるトポロジカルデータ分析の新たな分野において提案される、ニューラルネットワークの複雑性を定量化する重要な尺度である。しかし、本研究では、ネットワーク重みのばらつきと大きな重みの空間集中が神経の持続性に影響を与える主な要因であることを理論的および実証的に見出した。これは線形分類器の有用な情報をキャプチャする一方で、深層ニューラルネットワークの後の層には関連する空間構造が存在しておらず、ニューラルネットワークの永続性は重みの分散とほぼ同値である。さらに、ディープニューラルネットワークのための層間平均化手順は、層間の相互作用を考慮しない。そこで本研究では,1つの行列上でのニューラルネットワークの永続性を計算するのに等価である単一層ではなく,ニューラルネットワーク全体に対するニューラルネットワークの永続性に基づくフィルタリングの拡張を提案する。これは、ネットワークを通した永続的なパスを暗黙的に取り入れ、標準化を通じて分散に関連する問題を軽減します。コードはhttps://github.com/ExplainableML/Deep-Graph-Persistenceで入手できる。

Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence.

翻訳日:2023-07-21 13:01:40 公開日:2023-07-20

# 生成的セマンティック・ナーシング改善のための注意の分割と結合

Divide & Bind Your Attention for Improved Generative Semantic Nursing ( http://arxiv.org/abs/2307.10864v1 )

ライセンス: Link先を確認

Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva

(参考訳) Stable Diffusion(SD)などの新興の大規模テキスト・画像生成モデルは、高い忠実度で圧倒的な結果を示している。目覚ましい進歩にもかかわらず、現在の最先端モデルは入力プロンプトに完全に従った画像の生成に依然として苦労している。先行研究のAttend & Exciteは、推論時にクロスアテンションを最適化して意味をよりよく取り込むことを目的とする、生成的セマンティック・ナーシング(GSN)の概念を導入した。これは「a cat and a dog」のような単純なプロンプトからの生成で有望な結果を示す。しかし、より複雑なプロンプトを扱う際にはその有効性が低下し、不適切な属性結合の問題にも明示的には対処していない。複雑なプロンプトや複数のエンティティを含むシナリオがもたらす課題に対処し、属性結合を改善するため、Divide & Bindを提案する。GSNのための新しい損失目的関数として、attendance lossとbinding lossの2つを導入する。提案手法は、複雑なプロンプトから属性の整合性を改善しつつ所望のオブジェクトを忠実に合成でき、複数の評価ベンチマークで優れた性能を示す。さらなる動画と更新はプロジェクトページ https://sites.google.com/view/divide-and-bind で見ることができる。

Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page https://sites.google.com/view/divide-and-bind.

翻訳日:2023-07-21 13:01:20 公開日:2023-07-20

# BlendFace: フェイススワッピングのためのアイデンティティエンコーダの再設計

BlendFace: Re-designing Identity Encoders for Face-Swapping ( http://arxiv.org/abs/2307.10854v1 )

ライセンス: Link先を確認

Kaede Shiohara, Xingchao Yang, Takafumi Taketomi

(参考訳) コンピュータビジョンにおける敵対的生成ネットワークと顔認識モデルの大きな進歩により、単一ソースの画像でアイデンティティを交換できるようになった。多くの研究がほぼ満足のいく解を提案しているように見えるが、従来手法は依然としてアイデンティティと属性のもつれに悩まされており、望ましくない属性の入れ替わりが生じる。これは、ArcFaceなど広く使われているアイデンティティエンコーダが、顔認識タスクでの事前学習に起因する重大な属性バイアスを持つためである。この問題に対処するため、フェイススワッピングのための新しいアイデンティティエンコーダであるBlendFaceを設計する。BlendFaceの鍵となるアイデアは、属性を他人のものに置き換えたブレンド画像で顔認識モデルを訓練すると、髪型などの個人間バイアスが緩和されるという点にある。BlendFaceは、もつれを解いたアイデンティティ特徴を生成器に入力し、アイデンティティ損失関数として生成器を適切に誘導する。広範な実験により、BlendFaceは従来手法と同等の定量的性能を維持しつつ、フェイススワッピングモデルにおけるアイデンティティと属性の分離を改善することが示された。

The great advancements of generative adversarial networks and face recognition models in computer vision have made it possible to swap identities on images from single sources. Although a lot of studies seem to have proposed almost satisfactory solutions, we notice that previous methods still suffer from an identity-attribute entanglement that causes undesired attribute swapping, because widely used identity encoders, e.g., ArcFace, have some crucial attribute biases owing to their pretraining on face recognition tasks. To address this issue, we design BlendFace, a novel identity encoder for face-swapping. The key idea behind BlendFace is that training face recognition models on blended images, whose attributes are replaced with those of another person, mitigates inter-personal biases such as hairstyles. BlendFace feeds disentangled identity features into generators and guides generators properly as an identity loss function. Extensive experiments demonstrate that BlendFace improves the identity-attribute disentanglement in face-swapping models, maintaining a comparable quantitative performance to previous methods.

翻訳日:2023-07-21 13:00:56 公開日:2023-07-20

# 弱教師付き変化検出のための効果的な事前知識と効率的なモデルの探索

Exploring Effective Priors and Efficient Models for Weakly-Supervised Change Detection ( http://arxiv.org/abs/2307.10853v1 )

ライセンス: Link先を確認

Zhenghui Zhao, Lixiang Ru, Chen Wu

(参考訳) 弱教師付き変化検出(WSCD)は、画像レベルのアノテーションだけでピクセルレベルの変化を検出することを目的とする。ラベル効率の高さから、WSCDは近年注目を集めている。しかし、現在のWSCD手法は、変化の見落とし(change missing)と変化の捏造(change fabricating)、すなわち画像レベルのアノテーションとピクセルレベルの予測の不整合という課題にしばしば直面する。具体的には、変化の見落としとは、画像レベルのラベルが「変化あり」を示しているにもかかわらずWSCDモデルが変化ピクセルを1つも予測できない状況を指し、変化の捏造はその逆である。この課題に対処するため、本研究ではWSCDにおいてグローバルスケールおよびローカルスケールの事前知識を活用し、Dilated Prior(DP)デコーダとLabel Gated(LG)制約という2つの構成要素を提案する。DPデコーダは、画像レベルラベルが「変化あり」のサンプルをデコードし、「変化なし」のサンプルはスキップして、すべて「変化なし」のピクセルレベルラベルで置き換える。LG制約は、変化の表現と画像レベルラベルの対応関係から導かれ、モデルが変化状態を誤って予測した場合にペナルティを与える。さらに、変化検出における弱教師付き学習の可能性を示す、単純ながら強力なTransformerベースのモデルTransWCDを開発する。DPデコーダとLG制約をTransWCDに統合することでTransWCD-DLを構成する。提案するTransWCDとTransWCD-DLは、WHU-CDデータセットにおいて最先端手法に対しF1スコアでそれぞれ+6.33%、+9.55%という大幅な改善を達成する。一部の性能指標は、いくつかの完全教師付き変化検出(FSCD)手法をも上回る。コードは https://github.com/zhenghuizhao/TransWCD で公開予定である。

Weakly-supervised change detection (WSCD) aims to detect pixel-level changes with only image-level annotations. Owing to its label efficiency, WSCD has recently been drawing increasing attention. However, current WSCD methods often encounter the challenge of change missing and fabricating, i.e., the inconsistency between image-level annotations and pixel-level predictions. Specifically, change missing refers to the situation in which the WSCD model fails to predict any changed pixels even though the image-level label indicates a change, and vice versa for change fabricating. To address this challenge, in this work, we leverage global-scale and local-scale priors in WSCD and propose two components: a Dilated Prior (DP) decoder and a Label Gated (LG) constraint. The DP decoder decodes samples with the changed image-level label, skips samples with the unchanged label, and replaces them with an all-unchanged pixel-level label. The LG constraint is derived from the correspondence between changed representations and image-level labels, penalizing the model when it mispredicts the change status. Additionally, we develop TransWCD, a simple yet powerful transformer-based model, showcasing the potential of weakly-supervised learning in change detection. By integrating the DP decoder and LG constraint into TransWCD, we form TransWCD-DL. Our proposed TransWCD and TransWCD-DL achieve significant +6.33% and +9.55% F1 score improvements over the state-of-the-art methods on the WHU-CD dataset, respectively. Some performance metrics even exceed several fully-supervised change detection (FSCD) competitors. Code will be available at https://github.com/zhenghuizhao/TransWCD.

翻訳日:2023-07-21 13:00:35 公開日:2023-07-20

# ディスエンタングルメントに基づく到達可能性計画を用いたゴール条件付き強化学習

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning ( http://arxiv.org/abs/2307.10846v1 )

ライセンス: Link先を確認

Zhifeng Qian and Mingyu You and Hongjun Zhou and Xuanhui Xu and Bin He

(参考訳) ゴール条件付き強化学習(GCRL)は、エージェントが多様なゴールを自発的に設定して一連のスキルを学習することを可能にする。様々な分野で優れた研究が提案されているにもかかわらず、時間的に長いタスクで遠方のゴールに到達することは、GCRLにとって依然として課題である。既存研究は、計画アルゴリズムを用いて中間サブゴールを計画しGCRLを補強することでこの問題に取り組んできた。これらの手法には2つの重要な要件がある。(i)有効なサブゴールを探索するための状態表現空間、(ii)サブゴールの到達可能性を測る距離関数である。しかし、表現がコンパクトでないため、高次元の状態空間へのスケールに苦労する。さらに、標準的なGCポリシーでは高品質な訓練データを収集できず、距離関数が不正確になる。これらはいずれも計画とポリシー学習の効率と性能に影響する。本稿では、時間的に長いタスクを解くために、ディスエンタングルメントに基づく到達可能性計画と組み合わせたゴール条件付きRLアルゴリズム(REPlan)を提案する。REPlanでは、高次元の観測からロボットの姿勢と物体の位置を自己教師ありで分離したコンパクトな表現を学習するDisentangled Representation Module(DRM)を提案する。また、サブゴールの時間的距離を判定する単純なREachability discrimination Module(REM)も設計する。さらに、REMは内発的ボーナスを計算し、訓練のための新規な状態の収集を促す。3つの視覚ベースのシミュレーションタスクと1つの実世界タスクでREPlanを評価した。実験の結果、REPlanは時間的に長いタスクの解決において従来の最先端手法を大幅に上回った。

Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. In the paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate our REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.

翻訳日:2023-07-21 13:00:08 公開日:2023-07-20

# 連続学習のための自己ペース重み統合

Self-paced Weight Consolidation for Continual Learning ( http://arxiv.org/abs/2307.10845v1 )

ライセンス: Link先を確認

Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong

(参考訳) 新しいタスクのパラメータを以前のタスクに近く保持する連続学習アルゴリズムは、シーケンシャルなタスク学習設定における破滅的な忘れの防止に人気がある。しかし、 1) 新たな継続学習者の業績は,以前に学習した課題の貢献を区別することなく劣化する。 2) 既存のアルゴリズムでは,新しいタスクを学習する際には,全てのタスクを正規化する必要があるため,タスク数とともに計算コストが大幅に向上する。上記の課題に対処するために,従来の課題の判別的貢献を評価することによって,堅牢な連続学習を実現するための自己ペース重み統合(spwc)フレームワークを提案する。具体的には,重要性能指標(精度)に基づく難易度を測定することで,過去のタスクの優先順位を反映した自己対応型正規化を開発する。新しいタスクに遭遇すると、すべてのタスクは優先順位に基づいて"difficult"から"easy"にソートされる。すると、新しい連続学習者のパラメータは、より困難な過去のタスクの知識を選択的に維持することで学習される。我々は,bi-convex形式におけるモデルパラメータと優先度重みを反復的に更新するために,代替凸探索を採用する。提案したspWCフレームワークはプラグイン・アンド・プレイであり、ほとんどの連続学習アルゴリズム(例えばEWC、MAS、RCIL)に異なる方向(例えば分類とセグメンテーション)で適用することができる。いくつかの公開ベンチマークデータセットの実験結果から,提案するフレームワークは,他の一般的な連続学習アルゴリズムと比較して,性能を効果的に向上できることが示された。

Continual learning algorithms that keep the parameters of new tasks close to those of previous tasks are popular for preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance of the new continual learner degrades when the contributions of previously learned tasks are not distinguished; 2) the computational cost grows greatly with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning by evaluating the discriminative contributions of previous tasks. Specifically, we develop a self-paced regularization that reflects the priorities of past tasks by measuring difficulty based on a key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on these priorities. The parameters of the new continual learner are then learned by selectively maintaining the knowledge of the more difficult past tasks, which can overcome catastrophic forgetting at lower computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play and applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.

翻訳日:2023-07-21 12:59:44 公開日:2023-07-20

# GPM向けIntegrated Multi-satellitE Retrievals(IMERG)の全球降水ナウキャスティング: U-Net畳み込みLSTMアーキテクチャ

Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture ( http://arxiv.org/abs/2307.10843v1 )

ライセンス: Link先を確認

Reyhaneh Rahimi, Ardeshir Ebtehaj, Ali Behrangi, Jackson Tan

(参考訳) 本稿では,ほぼ全球の降水量を30分ごとに4時間のリードタイムで予測(ナウキャスト)する深層学習アーキテクチャを提案する。このアーキテクチャは、U-Netと畳み込みLSTM(convolutional long short-term memory)ニューラルネットワークを融合したもので、Integrated MultisatellitE Retrievals for GPM(IMERG)のデータと、Global Forecast System(GFS)のいくつかの主要な降水要因を用いて訓練される。平均二乗誤差(回帰)と焦点損失(分類)を含む異なる訓練損失関数が降水ナウキャストの品質に及ぼす影響について検討した。その結果, 回帰ネットワークは弱い降水(1.6mm/hr以下)の捕捉に優れるが, 極端な降水(>8mm/hr)のナウキャストでは, 臨界成功指数(CSI)の観点から分類ネットワークが回帰ネットワークを上回りうることがわかった。ワッサースタイン距離を用いて,分類ネットワークが予測する降水は回帰ネットワークよりもIMERGに近いクラス確率分布を持つことを示した。物理変数を組み込むことで、両ネットワークとも、特にリードタイムが長い場合に降水ナウキャストを改善できることがわかった。IMERGを相対的な基準として、分数スキルスコア(FSS)によるマルチスケール分析を行った結果、GFSの50kmに対し、本ナウキャスト手法は10kmの解像度でもスキル(FSS > 0.5)を維持することが示された。4 mm/hrを超える降水強度では、2時間以内のリードタイムで50kmを超えるスケールにおいてFSSのスキルを維持するのは分類ネットワークのみである。

This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated MultisatellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr) in terms of the critical success index (CSI). Using the Wasserstein distance, it is shown that the precipitation predicted by the classification network has a class probability distribution closer to that of IMERG than the regression network does. The inclusion of the physical variables is found to improve precipitation nowcasting, especially at longer lead times, in both networks. Taking IMERG as a relative reference, a multi-scale analysis in terms of the fractions skill score (FSS) shows that the nowcasting machine remains skillful (FSS > 0.5) at a resolution of 10 km, compared to 50 km for GFS. For precipitation rates greater than 4 mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
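
For readers unfamiliar with the critical success index named in the abstract, the following is a minimal sketch of its standard definition, hits / (hits + misses + false alarms), applied to threshold exceedances. The function name and the sample rain rates are illustrative assumptions and are not taken from the paper.

```python
def critical_success_index(predicted, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events exceeding `threshold`."""
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        predicted_event = p > threshold
        observed_event = o > threshold
        if predicted_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif predicted_event:
            false_alarms += 1
    total = hits + misses + false_alarms
    # CSI is undefined when no event is predicted or observed.
    return hits / total if total else float("nan")


# Hypothetical rain rates in mm/hr, scored at the 8 mm/hr threshold.
predicted = [9.0, 10.0, 2.0, 0.0, 12.0]
observed = [10.0, 3.0, 9.0, 0.0, 8.5]
print(critical_success_index(predicted, observed, threshold=8.0))  # 0.5
```

Correct rejections (neither predicted nor observed) do not enter the score, which is why CSI is often preferred for rare events such as precipitation extremes.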

翻訳日:2023-07-21 12:59:15 公開日:2023-07-20

# ドメインシフト下におけるセマンティックセグメンテーションのためのラベル校正

Label Calibration for Semantic Segmentation Under Domain Shift ( http://arxiv.org/abs/2307.10842v1 )

ライセンス: Link先を確認

Ondrej Bohdal, Da Li, Timothy Hospedales

(参考訳) 事前訓練されたセマンティックセグメンテーションモデルの性能は、新しいドメインのデータを大幅に低下させる可能性がある。予測されたクラス確率を持つベクトルに最も近いプロトタイプに従って予測を行うことにより,事前学習したモデルを,ソフトラベルのプロトタイプを領域シフトで計算し,ラベル付き対象領域データに適用できることを示す。提案した適応手順は高速で、計算資源の面ではほとんど無料で提供され、大幅な性能向上をもたらす。このようなラベル校正の利点を,高度に実践的な合成から現実への意味的セグメンテーション問題に示す。

Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.

翻訳日:2023-07-21 12:58:41 公開日:2023-07-20

# 多視点自己監督学習におけるエントロピーと再構成の役割

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning ( http://arxiv.org/abs/2307.10907v1 )

ライセンス: Link先を確認

Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella

(参考訳) 多視点自己教師学習(MVSSL)の成功のメカニズムはまだ完全には理解されていない。対照的MVSSL法は、相互情報量(MI)の下界であるInfoNCEのレンズを通して研究されてきた。しかし、他のMVSSL法とMIとの関係は未だ不明である。我々は、エントロピー項と再構成項からなるMIの別の下界(ER)を考察し、そのレンズを通して主要なMVSSLファミリーを分析する。このER境界を通して、DeepClusterやSwAVといったクラスタリングベースの手法がMIを最大化することを示す。また,BYOLやDINOといった蒸留ベースの手法のメカニズムを再解釈し,これらが再構成項を明示的に最大化し,安定したエントロピーを暗黙的に促進することを示し,これを実験的に確認する。一般的なMVSSL法の目的関数をこのER境界に置き換えることで, 競争力のある性能が得られ, かつ, より小さいバッチサイズやより小さい指数移動平均(EMA)係数で訓練した場合にも安定することを示す。 GitHub repo: https://github.com/apple/ml-entropy-reconstruction

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
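
A bound of the kind the abstract describes, an entropy term plus a reconstruction term, can be obtained from the standard variational argument sketched below. The notation ($Z_1$, $Z_2$ for the two view representations, $q$ for an arbitrary variational reconstruction density) is assumed here for illustration and may differ from the paper's.

```latex
\begin{align}
I(Z_1; Z_2)
  &= H(Z_2) - H(Z_2 \mid Z_1) \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\left[\log p(z_2 \mid z_1)\right] \\
  &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\left[\log q(z_2 \mid z_1)\right]
     + \mathbb{E}_{p(z_1)}\left[\mathrm{KL}\left(p(\cdot \mid z_1) \,\|\, q(\cdot \mid z_1)\right)\right] \\
  &\geq \underbrace{H(Z_2)}_{\text{entropy}}
     + \underbrace{\mathbb{E}_{p(z_1, z_2)}\left[\log q(z_2 \mid z_1)\right]}_{\text{reconstruction}}
\end{align}
```

The inequality holds because the KL divergence is non-negative, and it is tight when $q(z_2 \mid z_1) = p(z_2 \mid z_1)$.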

翻訳日:2023-07-21 12:51:05 公開日:2023-07-20

# VoteLab: オンライン集団意思決定のためのモジュラーで適応的な実験プラットフォーム

VoteLab: A Modular and Adaptive Experimentation Platform for Online Collective Decision Making ( http://arxiv.org/abs/2307.10903v1 )

ライセンス: Link先を確認

Renato Kunz, Fatemeh Banaie, Abhinav Sharma, Carina I. Hausladen, Dirk Helbing, Evangelos Pournaras

(参考訳) デジタル民主主義と直接デジタル参加のための新しい形態は前例のない勢いを得ている。これは特に、市民集会、参加予算、選挙において公平で包括的で正当な集団的意思決定プロセスを促進するために設計された優先的な投票方法と意思決定支援システムの場合である。しかし、異なる投票方法を用いた体系的な人間実験は面倒で費用がかかる。本稿では,投票実験のモジュール化と適応設計のためのオープンソースかつ徹底的な文書化プラットフォームであるVoteLabを紹介する。これは、異なる投票方法を選択することで、再利用可能なキャンペーンを視覚的にインタラクティブに構築することをサポートし、投票者はスマートフォンで登録された投票質問に簡単に答えることができる。オンライン実験では、投票結果の整合性を調べるために、4つの投票方法と、COVID-19に関する質問を含む概念実証が使用されている。 VoteLabが複雑な投票シナリオの厳格な実験をサポートする能力を示している。

Digital democracy and new forms of direct digital participation in policy making are gaining unprecedented momentum. This is particularly the case for preferential voting methods and decision-support systems designed to promote fairer, more inclusive and legitimate collective decision-making processes in citizens' assemblies, participatory budgeting and elections. However, systematic human experimentation with different voting methods is cumbersome and costly. This paper introduces VoteLab, an open-source and thoroughly documented platform for modular and adaptive design of voting experiments. It supports visually and interactively building reusable campaigns with a choice of different voting methods, while voters can easily respond to subscribed voting questions on a smartphone. A proof of concept with four voting methods and questions on COVID-19 in an online lab experiment has been used to study the consistency of voting outcomes. It demonstrates the capability of VoteLab to support rigorous experimentation with complex voting scenarios.

翻訳日:2023-07-21 12:50:45 公開日:2023-07-20

# 歯科模型における変分点符号化変形

Variational Point Encoding Deformation for Dental Modeling ( http://arxiv.org/abs/2307.10895v1 )

ライセンス: Link先を確認

Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg

(参考訳) 近年,デジタル歯科は大きな進歩を遂げているが,多くの課題が解決されている。本研究では,歯のメッシュの広範なデータセットを新たに公開し,さらなる研究を奨励する。さらに、FoldingNetを拡張して、ポイントクラウド表現の確率的学習を可能にする変分FoldingNet(VF-Net)を提案する。ポイントクラウドの既存の潜在変数モデルにおける重要な課題は、入力点と出力点の間の1対1のマッピングがないことである。代わりに、正規化された分布の対応を持たない計量であるチャムファー距離の最適化に頼らなければならず、確率モデルにおけるその使用を妨げている。確率的拡張を簡素化しながら計算効率を向上させるため,チャムファー距離の明示的な最小化を適切なエンコーダに置き換えることができることを示す。以上の結果から,VF-Netが既存モデルよりも優れていることを示す実証的証拠が得られた。さらに,VF-Netの潜在表現の堅牢性についても検討した。これらの結果は、ポイントクラウドの再構築と分析のための効果的で信頼性の高い方法としてのvf-netの有望な展望を裏付けるものである。

Digital dentistry has made significant advancements in recent years, yet numerous challenges remain to be addressed. In this study, we release a new extensive dataset of tooth meshes to encourage further research. Additionally, we propose Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations. A key challenge in existing latent variable models for point clouds is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension. Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.
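
To make the Chamfer distance mentioned above concrete, here is a minimal pure-Python sketch of one common variant (mean squared nearest-neighbour distance, summed over both directions). Conventions differ between libraries, and this is not claimed to be the exact form used by VF-Net. Note that the two point sets may have different sizes, which is the missing one-to-one correspondence the abstract refers to.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets given as lists of coordinate tuples."""

    def squared_distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_sided(source, target):
        # Average squared distance from each source point to its nearest target point.
        return sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)

    return one_sided(a, b) + one_sided(b, a)


# Two hypothetical point clouds with different numbers of points.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(chamfer_distance(cloud_a, cloud_b))
```

This brute-force version costs O(|a|·|b|) distance evaluations; practical implementations use spatial indexing or batched GPU kernels.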

翻訳日:2023-07-21 12:50:31 公開日:2023-07-20

# 人間の運動生成:調査

Human Motion Generation: A Survey ( http://arxiv.org/abs/2307.10894v1 )

ライセンス: Link先を確認

Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, and Yizhou Wang

(参考訳) 人間の動き生成は、自然の人間のポーズシーケンスを生成し、現実世界の応用に大きな可能性を示す。近年,動きデータ収集技術や生成手法が進歩し,人間の動き生成への関心が高まっている。この分野のほとんどの研究は、テキスト、オーディオ、シーンコンテキストなどの条件信号に基づいて人間の動きを生成することに焦点を当てている。近年は顕著な進歩を遂げているが、人間の動きの複雑な性質と条件付き信号との暗黙的な関係により、課題が続いている。本稿では,人間の運動生成に関する総合的な文献レビューを行う。まず、人間の動作と生成モデルの背景を紹介し、続いて、テキストコンディショニング、オーディオコンディショニング、シーンコンディショニングの3つのメインストリームサブタスクの代表的な手法について検討する。さらに,共通データセットと評価指標の概要について述べる。最後に、オープンな問題について議論し、今後の研究の方向性について概説する。この調査がコミュニティに,この急速に発展する分野の包括的可視化を提供し,優れた課題に対処する新たなアイデアを刺激してくれることを願っています。

Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.

翻訳日:2023-07-21 12:50:13 公開日:2023-07-20

# シミュレーションメタモデリングにおける多項式の学習と一般化

Learning and Generalizing Polynomials in Simulation Metamodeling ( http://arxiv.org/abs/2307.10892v1 )

ライセンス: Link先を確認

Jesper Hauch, Christoffer Riis, Francisco C. Pereira

(参考訳) 多項式を学習し、分配を一般化する能力は、時間ステップの更新が多項式によって記述される工学の多くの分野におけるシミュレーションメタモデルにとって不可欠である。フィードフォワードニューラルネットワークは任意の関数に適合するが、高階多項式の分散の一般化はできない。そこで本研究では,高次多項式を近似するための再帰的ビルディングブロックとして使用される乗法ニューラルネットワーク(MNN)アーキテクチャを収集し,提案する。実験の結果、mnnは一般化時のベースラインモデルよりも優れており、検証のパフォーマンスは分散テストの性能に当てはまることがわかった。 MNNアーキテクチャに加えて,多項式時間ステップ更新を伴うシミュレーションに対して,シミュレーションメタモデリング手法を提案する。これらのシミュレーションでは、ステップサイズを増加させることで、時間間隔のシミュレーションを少ないステップで行うことができる。本手法は多項式時間ステップ更新を伴う任意のシミュレーションと互換性があるが, 疫学シミュレーションモデルを用いて, 高次多項式の学習と一般化のためのmnnの帰納的バイアスを示す。

The ability to learn polynomials and generalize out-of-distribution is essential for simulation metamodels in many disciplines of engineering, where the time step updates are described by polynomials. While feed forward neural networks can fit any function, they cannot generalize out-of-distribution for higher-order polynomials. Therefore, this paper collects and proposes multiplicative neural network (MNN) architectures that are used as recursive building blocks for approximating higher-order polynomials. Our experiments show that MNNs are better than baseline models at generalizing, and their performance in validation is true to their performance in out-of-distribution tests. In addition to MNN architectures, a simulation metamodeling approach is proposed for simulations with polynomial time step updates. For these simulations, simulating a time interval can be performed in fewer steps by increasing the step size, which entails approximating higher-order polynomials. While our approach is compatible with any simulation with polynomial time step updates, a demonstration is shown for an epidemiology simulation model, which also shows the inductive bias in MNNs for learning and generalizing higher-order polynomials.

翻訳日:2023-07-21 12:49:54 公開日:2023-07-20

# 構文対意味線形抽象化とニューラルネットワークの洗練

Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks ( http://arxiv.org/abs/2307.10891v1 )

ライセンス: Link先を確認

Calvin Chau, Jan Křetínský, Stefanie Mohr

(参考訳) 抽象化はスケーラビリティを改善するための重要な検証テクニックです。しかし、ニューラルネットワークへの利用は非常に限られている。分類ネットワークを抽象化するための従来のアプローチは、いくつかのニューロンをその1つに置き換える。類似性は(ニューロン間の接続量を用いて)構文的に定義するか(様々な入力に対するニューロンの活性化値に基づいて)意味的に分類することができる。残念なことに、以前のアプローチは、実装時にのみ適度な削減を達成している。本研究では、ニューロンを他のニューロンの線形結合体に置き換えることのできる、より柔軟な枠組みを提供する。このアプローチを構文抽象と意味抽象の両方に適用し,それらを実験的に実装し,評価する。さらに, 抽象化の精細化手法を導入し, 縮小と精度のバランスを良くする手法を提案する。

Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.
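
The idea of replacing a neuron with a linear combination of other neurons can be illustrated in its simplest form: a single kept neuron and a scalar coefficient fitted by least squares on recorded activations (a "semantic" fit in the abstract's terminology). This is only a sketch under those assumptions. The paper's framework allows combinations of several neurons, and the function names and data layout here are hypothetical.

```python
def fit_coefficient(removed_activations, kept_activations):
    """Least-squares scalar c minimising sum((removed - c * kept)^2) over the recorded inputs."""
    denominator = sum(k * k for k in kept_activations)
    if denominator == 0.0:
        return 0.0
    return sum(r * k for r, k in zip(removed_activations, kept_activations)) / denominator


def merge_neuron(outgoing_weights, removed, kept, c):
    """Drop `removed` and fold its outgoing weights, scaled by c, into those of `kept`."""
    merged = {name: list(weights) for name, weights in outgoing_weights.items()}
    merged[kept] = [wk + c * wr for wk, wr in zip(merged[kept], merged[removed])]
    del merged[removed]
    return merged


# Hypothetical activations of two neurons on three inputs: "a" is exactly 2 * "b".
c = fit_coefficient([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
weights = {"a": [1.0, -1.0], "b": [0.5, 0.0]}  # neuron name -> weights to the next layer
print(c, merge_neuron(weights, removed="a", kept="b", c=c))
```

Because the next layer receives `w_a * a + w_b * b`, substituting `a ≈ c * b` gives `(w_b + c * w_a) * b`, which is the weight update performed in `merge_neuron`.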

翻訳日:2023-07-21 12:49:35 公開日:2023-07-20

# マッチング市場におけるバンディット学習のためのプレイヤー最適安定リグレット

Player-optimal Stable Regret for Bandit Learning in Matching Markets ( http://arxiv.org/abs/2307.10890v1 )

ライセンス: Link先を確認

Fang Kong, Shuai Li

(参考訳) マッチング市場の問題は、その適用範囲の広さから、長い間文献で研究されてきた。安定マッチングを見つけることは、この問題における一般的な均衡目標である。市場参加者は通常自分の選好について不確実であるため、最近の一連の研究では、一方の側の参加者(プレイヤー)が他方の側(アーム)との反復的な相互作用から未知の選好を学習するオンライン設定が研究されている。この系列の先行研究の多くは、プレイヤーにとって最も好ましくない安定マッチングと比較して定義されるプレイヤー悲観的(player-pessimal)安定リグレットに対する理論的保証しか導出できていない。しかし、悲観的安定マッチングの下では、プレイヤーは全ての安定マッチングの中で最小の報酬しか得られない。プレイヤーの利益を最大化するには、プレイヤー最適安定マッチングが最も望ましい。 Basu et al. (2021) はプレイヤー最適安定リグレットの上界を与えているが、その結果はプレイヤーの選好ギャップが小さい場合に指数関数的に大きくなりうる。このリグレットに対する多項式の保証が存在するかどうかは、重要だが未解決の問題である。本研究では,explore-then-Gale-Shapley (ETGS) という新しいアルゴリズムを提案し,各プレイヤーの最適安定リグレットが $O(K\log T/\Delta^2)$ で上から抑えられることを示す。ここで $K$ はアームの数,$T$ は時間地平線,$\Delta$ は上位 $N+1$ 位までのアームに対するプレイヤーの最小選好ギャップである。この結果は、より弱いプレイヤー悲観的安定マッチングを目的とするか、特別な仮定を満たす市場にしか適用できない先行研究を大幅に改善する。参加者の選好がいくつかの特別な条件を満たす場合、我々のリグレット上界は既知の下界とも一致する。

The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works studies the online setting where participants on one side (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined with respect to the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching would be the most desirable. Though Basu et al. (2021) give an upper bound for player-optimal stable regret, their result can be exponentially large if the players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms. This result significantly improves on previous works, which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.

翻訳日:2023-07-21 12:49:22 公開日:2023-07-20

# ロバスト点雲分類におけるリスク最適化外乱除去

Risk-optimized Outlier Removal for Robust Point Cloud Classification ( http://arxiv.org/abs/2307.10875v1 )

ライセンス: Link先を確認

Xinke Li, Junchi Lu

(参考訳) 安全クリティカルな用途での点群深層モデルの利用は増えているが、これらのモデルの信頼性とセキュリティは、意図的あるいは自然に発生する点群ノイズによって損なわれる可能性がある。この問題に対処するために,標準的に訓練されたモデルが追加の外れ値を除去してデータを復元できるようにする、PointCVaRと呼ばれる新しい点群外れ値除去手法を提案する。我々のアプローチは、まず帰属分析を行って各点がモデル出力に与える影響を求めることから始まり、これをポイントリスク(point risk)と呼ぶ。次に,条件付きバリュー・アット・リスク(CVaR)を目的関数として、高リスク点のフィルタリング処理を最適化する。このアプローチの根拠は、点群のノイズ点がリスク分布の裾に集まる傾向にあり、頻度は低いがリスクの水準が高く、分類結果に大きな干渉をもたらすという観察にある。追加の訓練を必要としないにもかかわらず, 本手法は、ランダムノイズ, 敵対的ノイズ, バックドアトリガノイズによって劣化したノイズ点群に対する様々な除去・分類実験において優れた結果を示す。特に、トリガーを除去することで、バックドア攻撃に対する防御において87%の精度を達成する。全体として、提案するPointCVaRは、ノイズ点を効果的に除去して点群分類を強化し、さまざまなシナリオにおいて、さまざまなモデルに対する有望なプラグインモジュールとなる。

The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.
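
The tail-risk intuition behind CVaR can be sketched as follows: given a per-point risk score, the empirical CVaR is the mean of the worst fraction of scores, and the points in that tail are the candidates for removal. This is an illustration of the general CVaR notion only, not the authors' optimization procedure. The function names, the `tail_fraction` parameter and the risk values are assumptions.

```python
def tail_size(count, tail_fraction):
    """Number of items in the high-risk tail (nearest integer, at least one)."""
    return max(1, round(tail_fraction * count))


def empirical_cvar(risks, tail_fraction=0.1):
    """Mean of the largest `tail_fraction` share of the risk scores."""
    k = tail_size(len(risks), tail_fraction)
    tail = sorted(risks, reverse=True)[:k]
    return sum(tail) / k


def drop_high_risk_points(points, risks, tail_fraction=0.1):
    """Return the points that remain after removing the high-risk tail."""
    k = tail_size(len(risks), tail_fraction)
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    removed = set(ranked[:k])
    return [p for i, p in enumerate(points) if i not in removed]


# Hypothetical per-point risk scores: two points dominate the tail.
risks = [0.1, 0.2, 0.1, 5.0, 0.3, 4.0, 0.2, 0.1, 0.2, 0.1]
points = list(range(len(risks)))  # stand-ins for 3D coordinates
print(empirical_cvar(risks, tail_fraction=0.2))
print(drop_high_risk_points(points, risks, tail_fraction=0.2))
```

In this toy example the two outlying scores carry almost all of the tail risk, matching the abstract's observation that noise points are rare but high-risk.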

翻訳日:2023-07-21 12:48:49 公開日:2023-07-20

# 自動車シナリオにおける安全軌道に対する動的物体の知覚関連性の保守的推定

Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios ( http://arxiv.org/abs/2307.10873v1 )

ライセンス: Link先を確認

Ken Mori, Kai Storms, Steven Peters

(参考訳) 効率的なテスト戦略を持つことは、自動運転のリリースにおいて克服すべき課題である。これは明確な要件とテストに適した方法を必要とする。この研究において、知覚モジュールの要件は、関連性に関して考慮される。関連性の概念はいまだ十分に定義されていない。本稿では,ハイウェイ領域における衝突安全への模範的適用により,この課題を克服する新しい手法を提案する。この一般的なシステムとユースケース仕様を用いて、関連する概念を導出する。したがって、無関係なオブジェクトは、すべての不確実性を考慮して、エゴ車両で利用可能な安全なアクションのセットを制限することができないオブジェクトとして定義される。最初のステップでは、衝突の関連性に関してユースケースを機能シナリオに分解します。それぞれの機能シナリオにおいて、ego 車両と他の動的物体の両方の可能な動作は方程式として定式化される。この可能なアクションのセットは、トラフィックルールによって制約され、関連性基準が得られます。その結果,動的対象が知覚に関連し,完全な評価を行う必要があるという保守的な評価が得られた。この推定は、オフラインテストや知覚コンポーネントの検証に適用可能な要件を提供する。高次元データセットの例を視覚化し、結果の妥当性を示す。最後に,提案する妥当性概念の今後の検証の可能性について概説する。

Having efficient testing strategies is a core challenge that needs to be overcome for the release of automated driving. This necessitates clear requirements as well as suitable methods for testing. In this work, the requirements for perception modules are considered with respect to relevance. The concept of relevance currently remains insufficiently defined and specified. In this paper, we propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain. Using this general system and use case specification, a corresponding concept for relevance is derived. Irrelevant objects are thus defined as objects which do not limit the set of safe actions available to the ego vehicle under consideration of all uncertainties. As an initial step, the use case is decomposed into functional scenarios with respect to collision relevance. For each functional scenario, possible actions of both the ego vehicle and any other dynamic object are formalized as equations. This set of possible actions is constrained by traffic rules, yielding relevance criteria. As a result, we present a conservative estimation which dynamic objects are relevant for perception and need to be considered for a complete evaluation. The estimation provides requirements which are applicable for offline testing and validation of perception components. A visualization is presented for examples from the highD dataset, showing the plausibility of the results. Finally, a possibility for a future validation of the presented relevance concept is outlined.

翻訳日:2023-07-21 12:48:23 公開日:2023-07-20

# 非線形メタラーニングは速い速度を保証できる

Nonlinear Meta-Learning Can Guarantee Faster Rates ( http://arxiv.org/abs/2307.10870v1 )

ライセンス: Link先を確認

Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe

(参考訳) メタラーニングに関する近年の理論的研究の多くは、関連タスクに共通する表現構造を活用して目的タスクを単純化する際の保証を得ることを目的としている。重要な点として、この主題に関する理論研究の主な狙いは、共通表現の学習において、収束率がタスク数 $N$(およびタスクあたりのサンプル数)に応じてどの程度スケールしうるかを理解することである。この設定における最初の成果は、タスク間で共有される表現とタスク固有の回帰関数の両方が線形である場合にこの性質を示している。この線形設定では、例えば平均化の議論を通じて、タスクを集約する利点が容易に明らかになる。しかし実際には、表現はしばしば高度に非線形であり、線形の場合のように容易には平均化で打ち消せない非自明なバイアスを各タスクにもたらす。本研究では,非線形表現を用いたメタラーニングの理論的保証を導出する。特に、共有される非線形写像の値域が無限次元のRKHSであると仮定すると、タスク固有回帰関数の滑らかさを活用した注意深い正則化によって、追加のバイアスを緩和できることを示す。

Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim of theoretical works on the subject is to understand the extent to which convergence rates, in learning a common representation, may scale with the number $N$ of tasks (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks and the task-specific regression functions are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions.

翻訳日:2023-07-21 12:48:05 公開日:2023-07-20

# 点雲変形ネットワークを用いた三次元心収縮と緩和のモデリング

Modeling 3D cardiac contraction and relaxation with point cloud deformation networks ( http://arxiv.org/abs/2307.10927v1 )

ライセンス: Link先を確認

Marcel Beetz, Abhirup Banerjee, Vicente Grau

(参考訳) 駆出率のような、臨床で一般的に用いられる心機能のグローバルな単一値バイオマーカーは、真の3次元心臓変形過程に関する限られた洞察しか与えず、健常および病的な心臓力学の理解を制限している。本研究では,心周期の両端の間の3次元心収縮と弛緩をモデル化する新しい幾何学的深層学習手法として,Point Cloud Deformation Network (PCD-Net)を提案する。これは、点群ベースの深層学習の近年の進歩をエンコーダ・デコーダ構造に取り入れ、心臓解剖のマルチクラス3次元点群表現上で直接、効率的なマルチスケール特徴学習を可能にする。UK Biobank研究の1万件を超える大規模データセットで本手法を評価し,予測された解剖と正解の解剖との間の平均チャンファー距離が、元の画像取得のピクセル解像度を下回ることを確認した。さらに、予測集団と正解集団の間で臨床指標が類似していることを観察し、PCD-Netが正常者と心筋梗塞(MI)患者との間のサブ集団固有の差異を捉えられることを示す。また、学習された3次元変形パターンは、既存MIの検出と新規MIの予測のタスクにおいて受信者動作特性曲線下面積でそれぞれ13%と7%、MI生存分析においてHarrellの一致指数で7%、複数の臨床ベンチマークを上回ることを示す。

Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.

翻訳日:2023-07-21 12:42:23 公開日:2023-07-20

# 3次元医用画像セグメンテーションにおける性能推定値の信頼区間

Confidence intervals for performance estimates in 3D medical image segmentation ( http://arxiv.org/abs/2307.10926v1 )

ライセンス: Link先を確認

R. El Jurdi, G. Varoquax, O. Colliot

(参考訳) 医用セグメンテーションモデルは経験的に評価される。このような評価は限られた数のサンプル画像に基づくため、ノイズを避けられない。したがって、平均的な性能指標に加えて、信頼区間を報告することが重要である。しかし、医用画像セグメンテーションではめったに行われない。信頼区間の幅は、テストセットのサイズと性能指標のばらつき(テストセット全体での標準偏差)に依存する。分類では、信頼区間が広くなるのを避けるために多くのテスト画像が必要である。しかし、セグメンテーションについては研究されておらず、1枚のテスト画像がもたらす情報量の点で分類とは異なる。本稿では,医用画像セグメンテーションにおける典型的な信頼区間について検討する。標準的なnnU-Netフレームワーク、Medical Decathlonチャレンジの2つのデータセット、およびDice精度とハウスドルフ距離の2つの性能指標を用いて、3次元画像セグメンテーションの実験を行う。パラメトリック信頼区間は,さまざまなテストセットサイズと性能指標のばらつきに対して、ブートストラップ推定値の妥当な近似であることを示す。重要な点として,所定の精度を達成するのに必要なテストセットのサイズは,分類タスクよりもはるかに小さいことが多いことを示す。典型的には、ばらつきが小さい場合(標準偏差が約3%)、幅1%の信頼区間には約100〜200のテストサンプルが必要である。より難しいセグメンテーションタスクでは、ばらつきが大きくなり、1000を超えるサンプルが必要になることがある。

Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard deviation across the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry out experiments on 3D image segmentation using the standard nnU-Net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
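
The sample-size figures quoted in the abstract can be sanity-checked with the usual normal-approximation interval, mean ± z·σ/√n. The sketch below assumes that "1% wide" means a total interval width of one percentage point at the 95% level (z = 1.96); both the interpretation and the function names are assumptions made for illustration.

```python
import math


def interval_width(std, n, z=1.96):
    """Total width of a normal-approximation confidence interval for a mean."""
    return 2.0 * z * std / math.sqrt(n)


def samples_needed(std, width, z=1.96):
    """Smallest test-set size whose interval is no wider than `width`."""
    return math.ceil((2.0 * z * std / width) ** 2)


# Spread of 3 percentage points, target width of 1 percentage point.
n = samples_needed(std=3.0, width=1.0)
print(n, interval_width(3.0, n))
```

Under these assumptions the required size falls inside the 100-200 range stated in the abstract. Since the width shrinks only as 1/√n, tripling the spread multiplies the required test-set size by nine, which is consistent with the remark that harder tasks may need over 1000 samples.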

翻訳日:2023-07-21 12:41:54 公開日:2023-07-20

# 点雲表現を用いた固有出現分解

Intrinsic Appearance Decomposition Using Point Cloud Representation ( http://arxiv.org/abs/2307.10924v1 )

ライセンス: Link先を確認

Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers

(参考訳) 内在分解は、画像からアルベドとシェーディングを推測することである。かなり不適切な問題であるため、以前の方法は2d画像からの事前の仮定に依存しているが、データ表現自体の探索は限られている。点雲は、画像の幾何学的情報と色情報を自然に整列する豊かなシーン表現形式として知られている。提案手法であるPoint Intrinsic Net, 略してPoInt-Netは, 点雲表現を用いてアルベド, 光源方向, シェーディングを共同で予測する。実験によれば、point-netの利点は、精度の面では、データセットをまたがる複数のメトリクスに対する2d表現アプローチよりも優れており、効率の面では、小規模のポイントクラウド上でトレーニングされ、任意のスケールのポイントクラウド上で安定して実行される。

Intrinsic decomposition aims to infer the albedo and shading from an image. Since it is a heavily ill-posed problem, previous methods rely on prior assumptions from 2D images; however, the exploration of the data representation itself is limited. The point cloud is known as a rich format of scene representation, which naturally aligns the geometric information and the color information of an image. Our proposed method, Point Intrinsic Net (PoInt-Net for short), jointly predicts the albedo, light source direction, and shading using a point cloud representation. Experiments reveal the benefits of PoInt-Net: in terms of accuracy, it outperforms 2D representation approaches on multiple metrics across datasets; in terms of efficiency, it trains on small-scale point clouds and performs stably on point clouds of any scale; in terms of robustness, it is trained only on a single-object-level dataset and demonstrates reasonable generalization ability for unseen objects and scenes.

翻訳日:2023-07-21 12:41:31 公開日:2023-07-20

# 臨床時系列における連続多次元自己監督学習

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series ( http://arxiv.org/abs/2307.10923v1 )

ライセンス: Link先を確認

Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz

(参考訳) 臨床時系列データに対する自己教師付き学習 (ssl) は, 患者の生理的状態に関する重要な情報を提供するため, 近年の文献で注目されている。しかし、既存の臨床時系列のSSL法のほとんどは、構造化された特徴(例えば、実験値やバイタルサイン)や個々の高次元生理的信号(例えば、心電図)のような、単調な時系列のために設計されているという点で制限されている。これらの既存手法は、構造的特徴と高次元データがシーケンスの各時間ステップに記録される多モード性を示すモデル時系列に容易に拡張することはできない。本研究では,このギャップに対処し,シーケンス全体のレベルとシーケンス内の個々の高次元データポイントのレベルの両方でSSLロスを適用し,両方のスケールで情報をよりよく取得する,新たなSSLメソッドであるSequential Multi-dimensional SSLを提案する。当社の戦略は,各レベルで使用される損失関数の特定の形式とは無関係です -- vicregのように,simclrや非contrastiveのように,対照的なものです。本手法は,(1)高周波心電図,(2)検査値とバイタルサインからの構造化データを含む実世界の2つの臨床データセットを用いて評価した。実験結果から,本手法による事前学習と下流タスクの微調整により,両方のデータセットのベースライン上でのパフォーマンスが向上し,複数の設定で異なる自己教師付き損失関数が改良される可能性が示唆された。

Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.

翻訳日:2023-07-21 12:41:14 公開日:2023-07-20

# 言語に基づく行動概念空間は動画の自己教師あり学習を改善する

Language-based Action Concept Spaces Improve Video Self-Supervised Learning ( http://arxiv.org/abs/2307.10922v1 )

ライセンス: Link先を確認

Kanchana Ranasinghe and Michael Ryoo

(参考訳) 最近のコントラスト言語画像事前学習は、高度に転送可能で堅牢な画像表現の学習につながっている。しかし、これらのモデルを最小限の監督でビデオドメインに適応させることは、まだ未解決の問題である。画像CLIPモデルをビデオ領域に適応させるために,言語による自己教師型学習を用いて,その方向への簡単なステップを探索する。時間的モデリングのために修正されたバックボーンは、アクションコンセプト空間で動作する列車の目的と自己蒸留設定の下で訓練される。関連するテキストプロンプトを用いて言語エンコーダから抽出した様々なアクション概念の特徴ベクトルがこの空間を構成する。本稿では, 従来の表現の汎用性を保ちつつ, 動作と属性の関係を強制する, 概念蒸留と概念アライメントという2つの列車目標を紹介する。提案手法は3つの行動認識ベンチマークにおいてゼロショットおよび線形探索性能を向上させる。

Recent contrastive language-image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language-tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with training objectives operating in an action concept space. This space is constructed from feature vectors of various action concepts, extracted from a language encoder using relevant textual prompts. We introduce two training objectives, concept distillation and concept alignment, that retain the generality of the original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.

翻訳日:2023-07-21 12:40:44 公開日:2023-07-20

# 多項式関数の量子コンピュータへの効率的な振幅符号化

Efficient amplitude encoding of polynomial functions into quantum computers ( http://arxiv.org/abs/2307.10917v1 )

ライセンス: Link先を確認

Javier Gonzalez-Conde, Thomas W. Watts, Pablo Rodriguez-Grasa and Mikel Sanz

(参考訳) 関数を量子コンピュータにロードすることは、偏微分方程式の解法のようないくつかの量子アルゴリズムにおいて重要なステップである。したがって、このプロセスの非効率性は、これらのアルゴリズムの適用に大きなボトルネックをもたらす。本稿では,実多項式関数の振幅符号化のための2つの効率的な手法を提示・比較する。最初のものは行列積状態表現に依存し、そこでは結合次元が小さいと仮定された場合の目標状態の近似を研究し、ベンチマークする。第2のアルゴリズムは2つのサブルーチンを組み合わせる。まず、線形関数のアダマール・ウォルシュ級数展開をロードする多制御ゲートの浅い系列と、それに続く逆離散アダマール・ウォルシュ変換によって、線形関数を量子レジスタにエンコードする。次に、この構成をビルディングブロックとして使用して、線形関数に対応する振幅の$\mathcal{O}(n)$ブロック符号化を実現し、対応する多項式変換を実装した量子特異値変換を振幅のブロック符号化に適用する。さらに,線形関数のアダマール・ウォルシュ級数の打ち切りが対象状態の最終的な忠実度にどのように影響するかを考察し,小資源で高い忠実度を報告した。

Loading functions into quantum computers represents an essential step in several quantum algorithms, such as in the resolution of partial differential equations. Therefore, the inefficiency of this process leads to a major bottleneck for the application of these algorithms. Here, we present and compare two efficient methods for the amplitude encoding of real polynomial functions. The first one relies on the matrix product state representation, where we study and benchmark the approximations of the target state when the bond dimension is assumed to be small. The second algorithm combines two subroutines. Initially, we encode the linear function into the quantum registers with a shallow sequence of multi-controlled gates that loads its Hadamard-Walsh series expansion, followed by the inverse discrete Hadamard-Walsh transform. Then, we use this construction as a building block to achieve an $\mathcal{O}(n)$ block encoding of the amplitudes corresponding to the linear function and apply the quantum singular value transformation that implements the corresponding polynomial transformation to the block encoding of the amplitudes. Additionally, we explore how truncating the Hadamard-Walsh series of the linear function affects the final fidelity of the target state, reporting high fidelities with small resources.
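One reason the Hadamard-Walsh series of a linear function is cheap to load can be checked classically. Writing $x = \sum_j 2^j b_j$ with bits $b_j = (1 - (-1)^{b_j})/2$ shows that the function $f(x) = x$ on $n$ bits has only $n + 1$ nonzero Hadamard-Walsh coefficients (index 0 and the powers of two). The sketch below verifies this with an unnormalized fast Walsh-Hadamard transform. It covers only this classical observation, not the quantum circuit from the paper, and the function names are hypothetical.

```python
def walsh_hadamard(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) ordering."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def linear_samples(n_bits):
    """Samples of f(x) = x on the grid x = 0, 1, ..., 2**n_bits - 1."""
    return list(range(2 ** n_bits))


coeffs = walsh_hadamard(linear_samples(3))
nonzero_indices = [k for k, c in enumerate(coeffs) if c != 0]
```

For three bits, `coeffs` is `[28, -4, -8, 0, -16, 0, 0, 0]`, so `nonzero_indices` is `[0, 1, 2, 4]`: four nonzero terms out of eight.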

翻訳日:2023-07-21 12:40:32 公開日:2023-07-20

# 自己監視医用画像解析のための微調整戦略の再検討

Revisiting Fine-Tuning Strategies for Self-supervised Medical Imaging Analysis ( http://arxiv.org/abs/2307.10915v1 )

ライセンス: Link先を確認

Muhammad Osama Khan, Yi Fang

(参考訳) 自己教師付き学習(SSL)の急速な進歩にもかかわらず、医用画像解析におけるエンド・ツー・エンドの微調整戦略は依然として主流である。しかし、この手法が訓練済みの知識を効果的に活用するのに本当に最適なのか、特に異なるタイプの特徴を捉えたSSLの多様なカテゴリを考慮すると、はっきりしない。本稿では,まず,4つの下流タスクにおいてSOTAメソッドを上回る,強力なコントラスト型および復元型のSSLベースラインを確立する。これらの強力なベースラインに基づいて、複数の事前トレーニングおよび微調整データセット、および様々な微調整データセットサイズにわたる広範囲な微調整分析を行う。トレーニング済みネットワークの最後の数層のみを微調整するという従来の知恵とは対照的に、中間層の微調整はより効果的であり、ネットワークの2番目の4分の1(25-50%)の微調整はコントラスト型SSLに最適であるのに対して、3番目の4分の1(50-75%)の微調整は復元型SSLに最適である。エンドツーエンドファインチューニングのデファクト標準と比較すると、トレーニング済みネットワークの最初の4分の3(0-75%)からなる浅いネットワークを微調整する最良の戦略は、最大5.48%の改善を実現する。さらに,これらの知見を用いて,複数のSSLモデルの相補的強みを利用した簡易かつ効果的な手法を提案し,最良のモデル単独と比較して最大3.57%の向上を得た。したがって,提案する微調整戦略は,個々のSSLモデルの性能を向上させるだけでなく,複数のSSLモデルが提供する補完的強みを効果的に活用することで,自己教師付き医用画像解析の大幅な改善を実現する。

Despite the rapid progress in self-supervised learning (SSL), end-to-end fine-tuning remains the dominant fine-tuning strategy for medical imaging analysis. However, it remains unclear whether this approach is truly optimal for effectively utilizing the pre-trained knowledge, especially considering the diverse categories of SSL that capture different types of features. In this paper, we first establish strong contrastive and restorative SSL baselines that outperform SOTA methods across four diverse downstream tasks. Building upon these strong baselines, we conduct an extensive fine-tuning analysis across multiple pre-training and fine-tuning datasets, as well as various fine-tuning dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last few layers of a pre-trained network, we show that fine-tuning intermediate layers is more effective: fine-tuning the second quarter (25-50%) of the network is optimal for contrastive SSL, whereas fine-tuning the third quarter (50-75%) of the network is optimal for restorative SSL. Compared to the de facto standard of end-to-end fine-tuning, our best fine-tuning strategy, which fine-tunes a shallower network consisting of the first three quarters (0-75%) of the pre-trained network, yields improvements of as much as 5.48%. Additionally, using these insights, we propose a simple yet effective method to leverage the complementary strengths of multiple SSL models, resulting in enhancements of up to 3.57% compared to using the best model alone. Hence, our fine-tuning strategies not only enhance the performance of individual SSL models, but also enable effective utilization of the complementary strengths offered by multiple SSL models, leading to significant improvements in self-supervised medical imaging analysis.
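The percentage ranges in this abstract (25-50%, 50-75%, 0-75%) can be read as index ranges over the layers of a pre-trained network. The sketch below is a small illustration of that reading. How the paper actually maps percentages to layers or blocks is not stated in the abstract, so the rounding rule and the function name are assumptions.

```python
def layers_in_fraction(num_layers, start_frac, end_frac):
    """Indices of layers whose position lies in [start_frac, end_frac) of the depth."""
    start = round(num_layers * start_frac)
    end = round(num_layers * end_frac)
    return list(range(start, end))


# Example for a hypothetical 12-layer encoder.
second_quarter = layers_in_fraction(12, 0.25, 0.50)        # layers 3, 4, 5
third_quarter = layers_in_fraction(12, 0.50, 0.75)         # layers 6, 7, 8
first_three_quarters = layers_in_fraction(12, 0.00, 0.75)  # layers 0 through 8
```

In a training script, the returned indices would typically decide which layers keep gradients enabled while the rest stay frozen.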

翻訳日:2023-07-21 12:40:09 公開日:2023-07-20

# WeakPolyp: ポリープセグメンテーションのためにバウンディングボックスだけを見る

WeakPolyp: You Only Look Bounding Box for Polyp Segmentation ( http://arxiv.org/abs/2307.10912v1 )

ライセンス: Link先を確認

Jun Wei, Yiwen Hu, Shuguang Cui, S.Kevin Zhou, Zhen Li

(参考訳) 高価なピクセルレベルラベルに制限されたポリプセグメンテーションモデルは、データ不足と一般化に苦しむ。対照的に、polypバウンディングボックスアノテーションはずっと安く、よりアクセスしやすい。したがって,ラベル付けコストを削減するため,境界ボックスアノテーションをベースとした弱教師付きポリプセグメンテーションモデル(WeakPolyp)の学習を提案する。しかし、粗い境界ボックスにはノイズが多すぎる。干渉を避けるため,マスクツーボックス変換(m2b)を導入する。予測自体ではなく予測の外側ボックスマスクを監視することにより、M2Bは粗いラベルと正確な予測とのミスマッチを大幅に軽減する。しかし、M2Bは厳密な監視しか提供せず、異常な予測に繋がる。そこで我々はさらに,集中管理のためのスケール一貫性(SC)損失を提案する。異なるスケールで同じ画像で予測を明示的に調整することで、sc損失は予測のばらつきを大幅に減少させる。 WeakPolypはプラグアンドプレイモデルで、他の魅力的なバックボーンに簡単に移植できます。さらに、提案されたモジュールはトレーニング中にのみ使用され、推論に計算コストがかからない。提案するweakpolypは,マスクアノテーションをまったく必要とせず,完全に教師付きモデルと同等の性能を実現している。

Limited by expensive pixel-level labels, polyp segmentation models are plagued by data shortage and suffer from impaired generalization. In contrast, polyp bounding box annotations are much cheaper and more accessible. Thus, to reduce labeling cost, we propose to learn a weakly supervised polyp segmentation model (i.e., WeakPolyp) based entirely on bounding box annotations. However, coarse bounding boxes contain too much noise. To avoid interference, we introduce the mask-to-box (M2B) transformation. By supervising the outer box mask of the prediction instead of the prediction itself, M2B greatly mitigates the mismatch between the coarse label and the precise prediction. However, M2B provides only sparse supervision, leading to non-unique predictions. Therefore, we further propose a scale consistency (SC) loss for dense supervision. By explicitly aligning predictions of the same image at different scales, the SC loss largely reduces the variation of predictions. Note that our WeakPolyp is a plug-and-play model, which can be easily ported to other appealing backbones. Besides, the proposed modules are only used during training, bringing no computation cost to inference. Extensive experiments demonstrate the effectiveness of our proposed WeakPolyp, which surprisingly achieves performance comparable to a fully supervised model while requiring no mask annotations at all.
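The idea of supervising "the outer box mask of the prediction" can be pictured with a small geometric example. The sketch below turns a predicted mask into the mask of its tightest enclosing box, which could then be compared with a box annotation. It is an illustration only: the hard threshold and the function name are assumptions, and a training-time version would presumably need to be differentiable, which this one is not.

```python
def mask_to_box(mask, threshold=0.5):
    """Return a binary mask that is 1 inside the tightest box around the foreground."""
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(v > threshold for v in mask[r])]
    cols = [c for c in range(width) if any(mask[r][c] > threshold for r in range(height))]
    box = [[0] * width for _ in range(height)]
    if not rows:
        return box  # no foreground predicted, so the box mask is empty
    for r in range(rows[0], rows[-1] + 1):
        for c in range(cols[0], cols[-1] + 1):
            box[r][c] = 1
    return box


prediction = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box_mask = mask_to_box(prediction)
```

Two different predictions with the same enclosing box produce the same `box_mask`, which illustrates why box-level supervision alone leaves the prediction non-unique.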

翻訳日:2023-07-21 12:39:36 公開日:2023-07-20

# ゲージ対称性による概周期CMV行列の厳密な移動度端

Exact mobility edges for almost-periodic CMV matrices via gauge symmetries ( http://arxiv.org/abs/2307.10909v1 )

ライセンス: Link先を確認

Christopher Cedzich and Jake Fillman and Long Li and Darren Ong and Qi Zhou

(参考訳) 一般化拡張CMV行列の対称性について検討する。標準拡張CMV行列の反射対称性に関わる問題は微妙なものであることはよく文書化されている。一般化された拡張CMV行列のクラスをカンテロ・Gr\"ウンバウム・モラル・ベラスケスの精神における明示的な対角ユニタリを通して、エレガントな方法で扱う方法を示す。これらのアイデアの応用として、モーザイユニタリなニアマチュー作用素と呼ばれる、ほぼ周期的なCMV行列の明示的な族を構築し、正確なモビリティエッジの発生を証明する。すなわち、絶対連続かつ純粋な点スペクトルを持つスペクトル領域を分離し、それらを正確に計算するエネルギーの存在を示す。

We investigate the symmetries of so-called generalized extended CMV matrices. It is well-documented that problems involving reflection symmetries of standard extended CMV matrices can be subtle. We show how to deal with this in an elegant fashion by passing to the class of generalized extended CMV matrices via explicit diagonal unitaries in the spirit of Cantero-Gr\"unbaum-Moral-Vel\'azquez. As an application of these ideas, we construct an explicit family of almost-periodic CMV matrices, which we call the mosaic unitary almost-Mathieu operator, and prove the occurrence of exact mobility edges. That is, we show the existence of energies that separate spectral regions with absolutely continuous and pure point spectrum and exactly calculate them.

翻訳日:2023-07-21 12:39:14 公開日:2023-07-20

# 「$d$次元ベル状態に基づく第三者なしの多者間量子和」の改良

Improvements on "Multi-Party Quantum Summation without a Third Party based on $d$-Dimensional Bell States" ( http://arxiv.org/abs/2307.10908v1 )

ライセンス: Link先を確認

Xiaobing Li and Jiale Hou and Haozhen Situ and Cai Zhang

(参考訳) 2021年、WuらはD次元ベル状態の絡み合い特性を利用した多次元量子和スキームを発表した(Wu et al. in Quantum Inf Process 20:200, 2021)。特に、著者らは3つのパーティの量子和プロトコルを提案し、その成果をマルチパーティのケースに拡張した。彼らのプロトコルは外部や参加者の攻撃に対して安全であると主張されている。しかし、この研究はウーのプロトコルが抜け穴を持っていること、すなわち、特定の位置関係を満たしている2人以上の不正な参加者が、検出されずに一部の正直な参加者のプライベートな入力を得ることを意図していることを指摘している。そのため、これらの問題に対処するための改善が提案されている。

In 2021, Wu et al. presented a multi-party quantum summation scheme exploiting the entanglement properties of d-dimensional Bell states (Wu et al. in Quantum Inf Process 20:200, 2021). In particular, the authors proposed a three-party quantum summation protocol and then extended their work to a multi-party case. It is claimed that their protocol is secure against outside and participant attacks. However, this work points out that the protocol of Wu et al. has a loophole: two or more dishonest participants who meet a specific location relationship can conspire to obtain the private inputs of some honest participants without being detected. Accordingly, improvements are proposed to address this issue.

翻訳日:2023-07-21 12:39:00 公開日:2023-07-20

# 軟部組織駆動型顎顔面手術計画

Soft-tissue Driven Craniomaxillofacial Surgical Planning ( http://arxiv.org/abs/2307.10954v1 )

ライセンス: Link先を確認

Xi Fang, Daeseung Kim, Xuanang Xu, Tianshu Kuang, Nathan Lampen, Jungwook Lee, Hannah H. Deng, Jaime Gateno, Michael A.K. Liebschner, James J. Xia, Pingkun Yan

(参考訳) CMF手術では, 希望する顔の成果を達成するためのボニームーブメントの計画が難しい課題である。現在の骨駆動アプローチは、顔の外観が修正されることを期待して、骨の正常化に焦点を当てている。しかし、骨構造と顔面軟部組織との複雑な非線形関係のため、このような骨駆動法は顔面変形を矯正するには不十分である。骨の動きによる顔の変化をシミュレートする努力にもかかわらず、手術計画はまだ反復的な修正と教育的な推測に依存している。そこで本研究では,手術計画の自動作成と検証が可能なソフトトイシュー駆動フレームワークを提案する。本フレームワークは,所望の顔結果を達成するために必要なボニー運動を推定するボニープランナーネットワークと,推定ボニー運動計画から生じる顔変化をシミュレートする顔シミュレータネットワークとから構成される。これら2つのモデルを組み合わせることで、計画に必要な最終的なボニー運動を検証することができる。提案手法を臨床データを用いて評価し, 従来の骨駆動アプローチと比較して, 軟部組織駆動アプローチが外科的計画の精度と有効性を大幅に改善することを示した。

In CMF surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone-driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue-driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue-driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.

翻訳日:2023-07-21 12:32:37 公開日:2023-07-20

# PE-YOLO:ダークオブジェクト検出のためのピラミッド拡張ネットワーク

PE-YOLO: Pyramid Enhancement Network for Dark Object Detection ( http://arxiv.org/abs/2307.10953v1 )

ライセンス: Link先を確認

Xiangchen Yin, Zhenda Yu, Zetao Fei, Wenjun Lv, Xin Gao

(参考訳) 現在のオブジェクト検出モデルは、多くのベンチマークデータセットで良い結果を得ているが、暗い条件下でオブジェクトを検出することは依然として大きな課題である。この問題に対処するために,ピラミッド拡張ネットワーク(PENet)を提案し,それをYOLOv3と結合してPE-YOLOというダークオブジェクト検出フレームワークを構築する。まずPENetは、画像をラプラシアンピラミッドを用いて異なる解像度の4つのコンポーネントに分解する。具体的には、コンテキストブランチとエッジブランチで構成される画像のディテールを強化するためのディテール処理モジュール(DPM)を提案する。さらに、低周波セマンティクスを捕捉し、高周波ノイズを防止する低周波拡張フィルタ(LEF)を提案する。PE-YOLOはエンドツーエンドのジョイントトレーニングアプローチを採用し、通常の検出損失のみを使用してトレーニングプロセスを簡素化する。我々は,低照度物体検出データセットExDarkで実験を行い,その効果を実証した。その結果,他の暗黒検出器や低照度化モデルと比較して,PE-YOLOはmAPが78.0%,FPSが53.6となり,異なる低照度条件下での物体検出に適応できることがわかった。コードはhttps://github.com/XiangchenYin/PE-YOLOで公開されている。

Although current object detection models have achieved good results on many benchmark datasets, detecting objects in dark conditions remains a major challenge. To address this issue, we propose a pyramid enhanced network (PENet) and combine it with YOLOv3 to build a dark object detection framework named PE-YOLO. First, PENet decomposes the image into four components of different resolutions using the Laplacian pyramid. Specifically, we propose a detail processing module (DPM) to enhance the detail of images, which consists of a context branch and an edge branch. In addition, we propose a low-frequency enhancement filter (LEF) to capture low-frequency semantics and prevent high-frequency noise. PE-YOLO adopts an end-to-end joint training approach and uses only the normal detection loss to simplify the training process. We conduct experiments on the low-light object detection dataset ExDark to demonstrate the effectiveness of our method. The results indicate that, compared with other dark detectors and low-light enhancement models, PE-YOLO achieves advanced results, reaching 78.0% mAP and 53.6 FPS, and can adapt to object detection under different low-light conditions. The code is available at https://github.com/XiangchenYin/PE-YOLO.
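The decomposition into four components of different resolutions can be illustrated with a toy Laplacian pyramid. The sketch below is a simplified stand-in and not PENet's implementation: it works on a 1D signal, uses pairwise averaging instead of a Gaussian blur, and uses sample repetition for upsampling. All function names are hypothetical, and the input length is assumed to be divisible by 2 ** (levels - 1).

```python
def downsample(signal):
    """Halve the resolution by averaging neighboring pairs (box filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating every sample."""
    return [v for v in signal for _ in range(2)]


def laplacian_pyramid(signal, levels=4):
    """Split a signal into (levels - 1) detail bands plus one coarse residual."""
    components = []
    current = list(signal)
    for _ in range(levels - 1):
        smaller = downsample(current)
        # Detail band: what is lost when going to the coarser level.
        components.append([a - b for a, b in zip(current, upsample(smaller))])
        current = smaller
    components.append(current)
    return components


def reconstruct(components):
    """Invert the pyramid by adding the detail bands back, coarse to fine."""
    current = components[-1]
    for detail in reversed(components[:-1]):
        current = [d + u for d, u in zip(detail, upsample(current))]
    return current
```

The fine levels hold high-frequency detail and the last level holds the low-frequency content, which matches the split between detail processing and low-frequency enhancement described in the abstract.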

翻訳日:2023-07-21 12:32:16 公開日:2023-07-20

# object-lane clustering によるオンラインレーングラフ抽出の改善

Improving Online Lane Graph Extraction by Object-Lane Clustering ( http://arxiv.org/abs/2307.10947v1 )

ライセンス: Link先を確認

Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

(参考訳) 自律運転には正確な現場理解情報が必要である。この目的のために、自律エージェントは知覚スタックの一部としてオブジェクト検出とオンラインBEVレーングラフ抽出手法をデプロイする。本研究では,3次元物体検出出力を用いて局所レーングラフ推定精度を向上させるアーキテクチャと損失定式化を提案する。提案手法では, 中心線をクラスタセンタとして, オブジェクトをクラスタセンタ上の確率分布に割り当てるデータポイントとして考慮し, 中心線にオブジェクトを割り当てることを学ぶ。このトレーニングスキームはレーンとオブジェクトの関係を直接監視することを保証するので、パフォーマンスが向上する。提案手法は,最先端手法よりもレーングラフ推定を大幅に改善する。提案手法は,既存の3次元物体検出手法の出力を用いることで,大幅な性能向上が期待できることを示す。本手法では, 中間表現ではなく検出出力を用いるため, テスト時に任意の検出手法を単一モデルで使用することができる。

Autonomous driving requires accurate local scene understanding information. To this end, autonomous agents deploy object detection and online BEV lane graph extraction methods as a part of their perception stack. In this work, we propose an architecture and loss formulation to improve the accuracy of local lane graph estimates by using 3D object detection outputs. The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers and the objects as data points to be assigned a probability distribution over the cluster centers. This training scheme ensures direct supervision on the relationship between lanes and objects, thus leading to better performance. The proposed method improves lane graph estimation substantially over state-of-the-art methods. The extensive ablations show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods. Since our method uses the detection outputs rather than the intermediate representations of the detection method, a single model of our method can use any detection method at test time.
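The clustering view in this abstract (centerlines as cluster centers, objects as data points with a probability distribution over the centers) can be pictured with a hand-written geometric analogue. In the paper the assignment is learned, so the sketch below is only an illustration: the softmax over negative distances, the temperature parameter, and the function names are all assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b (2D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_centerline(p, centerline):
    """Distance from p to a polyline given as a list of 2D points."""
    return min(
        point_segment_distance(p, a, b) for a, b in zip(centerline, centerline[1:])
    )


def assign_object(p, centerlines, temperature=1.0):
    """Softmax over negative distances: a distribution over centerlines."""
    logits = [-distance_to_centerline(p, c) / temperature for c in centerlines]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For an object at (5, 1) and two parallel centerlines at y = 0 and y = 4, the first centerline receives the larger probability because it is 1 unit away instead of 3.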

翻訳日:2023-07-21 12:31:54 公開日:2023-07-20

# プロキシアンカーによる連続一般化カテゴリー探索のための教師なし学習

Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery ( http://arxiv.org/abs/2307.10943v1 )

ライセンス: Link先を確認

Hyungmin Kim, Sungho Suh, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim

(参考訳) ディープラーニングの最近の進歩は、様々なコンピュータビジョンアプリケーションのパフォーマンスを大幅に改善した。しかしながら、インクリメンタル学習シナリオにおける新しいカテゴリの発見は、新しいカテゴリの数と性質に関する事前知識が不足しているため、依然として困難な問題である。既存の新しいカテゴリ発見手法は、ラベル付きデータセットに依存し、新規カテゴリの数やバッチ内の新規サンプルの割合に関する事前知識によって制限される。本稿では,実世界のシナリオをより正確に反映し,その制約に対処するために,事前知識のないラベル付き集合上で新しいカテゴリを発見できる,教師なしクラスインクリメンタル学習手法を提案する。提案手法は,ラベル付きデータセット上の特徴抽出器とプロキシアンカーを微調整し,未ラベルデータセット上の古いカテゴリと新しいカテゴリとクラスタに分割する。さらに、プロキシアンカーベースの例が代表カテゴリーベクトルを生成して破滅的忘れを緩和する。実験の結果,提案手法は実世界のシナリオにおいて,きめ細かなデータセットの最先端手法よりも優れていることがわかった。

Recent advances in deep learning have significantly improved the performance of various computer vision applications. However, discovering novel categories in an incremental learning scenario remains a challenging problem due to the lack of prior knowledge about the number and nature of new categories. Existing methods for novel category discovery are limited by their reliance on labeled datasets and prior knowledge about the number of novel categories and the proportion of novel samples in the batch. To address these limitations and more accurately reflect real-world scenarios, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. The proposed method fine-tunes the feature extractor and proxy anchors on labeled sets, then splits the samples of the unlabeled dataset into old and novel categories and clusters them. Furthermore, proxy anchor-based exemplars generate representative category vectors to mitigate catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms the state-of-the-art methods on fine-grained datasets under real-world scenarios.

翻訳日:2023-07-21 12:31:39 公開日:2023-07-20

# PASTA: 事前訓練されたアクションステートトランスフォーマーエージェント

PASTA: Pretrained Action-State Transformer Agents ( http://arxiv.org/abs/2307.10936v1 )

ライセンス: Link先を確認

Raphael Boige and Yannis Flet-Berliac and Arthur Flajolet and Guillaume Richard and Thomas Pierrot

(参考訳) 自己教師型学習は、NLP、ビジョン、生物学など、さまざまなコンピューティング領域に革命的なパラダイムシフトをもたらした。最近のアプローチでは、大量のラベルのないデータでトランスフォーマーモデルを事前トレーニングし、下流タスクを効率的に解決するための出発点となる。強化学習の分野では、研究者たちは最近、専門家の軌道上で事前訓練されたモデルを開発し、ロボット工学からレコメンデーションシステムまで幅広いタスクに対処できるように、これらのアプローチを適用した。しかし、既存の手法は主に特定の下流アプリケーションに適した複雑な事前学習の目的に依存している。本稿では,前訓練動作状態トランスフォーマーエージェント (pasta) と呼ばれるモデルの包括的検討を行う。本研究は統一的な手法を用い,行動のクローン化,オフラインrl,センサ障害のロバスト性,ダイナミクス変化適応など,幅広い下流タスクをカバーする。私たちの目標は、さまざまな設計選択を体系的に比較し、堅牢なモデルを構築する実践者に貴重な洞察を提供することです。本研究では,アクションと状態コンポーネントレベルでのトークン化,次のトークン予測のような基本的な事前トレーニング目標の利用,多様なドメインをまたいだトレーニングモデル,パラメータ効率の優れた微調整(peft)などについて検討した。また,peftの適用により,下流適応時のパラメータ1万未満の微調整が可能となり,幅広いコミュニティがこれらのモデルを用いて実験を再現することが可能となった。本研究は,RL軌道を表現し,ロバストな政策学習に寄与するために,第一原理設計選択による変圧器の使用に関するさらなる研究を期待する。

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
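Tokenization "at the action and state component level" can be read as giving every scalar component of each state and action vector its own token, rather than one token per whole state or action. The sketch below shows that reading on a toy trajectory. The tuple layout and the function name are assumptions made for illustration, not the format used by PASTA.

```python
def tokenize_trajectory(states, actions):
    """Flatten a trajectory into one token per state or action component.

    Each token is (kind, timestep, component_index, value).
    """
    tokens = []
    for t, (state, action) in enumerate(zip(states, actions)):
        tokens += [("state", t, i, v) for i, v in enumerate(state)]
        tokens += [("action", t, j, v) for j, v in enumerate(action)]
    return tokens


# Toy trajectory: 2 timesteps, 3 state components, 1 action component.
states = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]
actions = [(1.0,), (2.0,)]
tokens = tokenize_trajectory(states, actions)
```

With this layout the sequence length grows as timesteps times (state dimension + action dimension), which is the cost of the finer granularity. Next-token prediction then amounts to predicting each component from all components that precede it.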

翻訳日:2023-07-21 12:31:22 公開日:2023-07-20

# 機械学習と結晶距離を用いたゼオライトの無機合成構造マップ

Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances ( http://arxiv.org/abs/2307.10935v1 )

ライセンス: Link先を確認

Daniel Schwalbe-Koda, Daniel E. Widdowson, Tuan Anh Pham, Vitaliy A. Kurlin

(参考訳) ゼオライト(zeolites)は、用途、合成条件、ポリモルフィックの多様性で知られる無機材料である。合成は無機合成と有機合成の両方で制御されているが、ゼオライト合成の計算的な研究は主に有機テンプレートの設計に焦点が当てられている。本研究では,結晶構造と機械学習(ml)間の強い距離測定値を用いて,ゼオライト中の無機合成マップを作成する。 253個のゼオライトから始めて, 構造単位などのラベルを使わずに, 文献から無機合成条件を連続的に再現する方法を示す。教師なし学習分析では, テンプレートベースの経路においても, 隣り合うゼオライトが類似した無機合成条件をしばしば共有していることが示されている。 ML分類器と組み合わせることで, ゼオライト中の14の無機質, Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, Znの合成構造関係が得られた。モデル予測を説明することで,既知の構造との類似性を合成空間の特徴として利用できることを示す。最後に, ゼオライトから局所的な構造パターンを抽出することにより, 仮説データベースにおける非実現枠組みの無機合成条件の予測と結果の解釈にこれらの手法が利用できることを示す。テンプレート設計と組み合わせることで、この研究はゼオライトの合成条件の空間の探索を加速することができる。

Zeolites are inorganic materials known for their diversity of applications, synthesis conditions, and resulting polymorphs. Although their synthesis is controlled both by inorganic and organic synthesis conditions, computational studies of zeolite synthesis have focused mostly on organic template design. In this work, we use a strong distance metric between crystal structures and machine learning (ML) to create inorganic synthesis maps in zeolites. Starting with 253 known zeolites, we show how the continuous distances between frameworks reproduce inorganic synthesis conditions from the literature without using labels such as building units. An unsupervised learning analysis shows that neighboring zeolites according to our metric often share similar inorganic synthesis conditions, even in template-based routes. In combination with ML classifiers, we find synthesis-structure relationships for 14 common inorganic conditions in zeolites, namely Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, and Zn. By explaining the model predictions, we demonstrate how (dis)similarities towards known structures can be used as features for the synthesis space. Finally, we show how these methods can be used to predict inorganic synthesis conditions for unrealized frameworks in hypothetical databases and interpret the outcomes by extracting local structural patterns from zeolites. In combination with template design, this work can accelerate the exploration of the space of synthesis conditions for zeolites.

翻訳日:2023-07-21 12:30:52 公開日:2023-07-20

# OCTraN:非構造交通シナリオにおける3次元占有畳み込みTransformerネットワーク

OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios ( http://arxiv.org/abs/2307.10934v1 )

ライセンス: Link先を確認

Aditya Nalgunda Ganesh and Dhruval Pobbathi Badrinath and Harshith Mohan Kumar and Priya SS and Surabhi Narayan

(参考訳) 自律ナビゲーションのための視覚中心環境認識の現代的アプローチは、不均一マップを出力する自己教師付き単眼深度推定アルゴリズムを広範囲に活用する。しかし, この差分マップを3次元空間に投影すると, 差分誤差が増大し, カメラからの距離が大きくなるにつれて, 深さ推定誤差が2次的に増加する。 Light Detection and Ranging (LiDAR)はこの問題を解決できるが、多くのアプリケーションでは高価であり実現不可能である。そこで本稿では, 2次元画像の特徴を3次元空間に変換し, 畳み込みと畳み込みを併用し, 空間情報を効率的に操作する変圧器アーキテクチャであるocranを提案する。また, 単眼深度推定から得られた擬似地上真理ラベルを置換することにより, LiDAR基底真理を排除し, 任意のシーンにモデルを一般化する自己教師型訓練パイプラインを開発した。

Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline that generalizes the model to any scene by replacing LiDAR ground truth with pseudo-ground-truth labels obtained from boosted monocular depth estimation.
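The quadratic growth of depth error mentioned in this abstract follows from the standard pinhole stereo relation $Z = fB/d$ between depth $Z$, focal length $f$, baseline $B$, and disparity $d$. Differentiating gives $|\delta Z| \approx \frac{Z^2}{fB}\,|\delta d|$ to first order. The sketch below evaluates both formulas. The camera parameters are made-up example values and the function names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for focal length f (pixels) and baseline B (meters)."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth error: |dZ| ~= Z**2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed example camera: f = 700 px, B = 0.5 m, disparity error of 0.25 px.
error_at_10m = depth_error(10.0, 700.0, 0.5, 0.25)
error_at_20m = depth_error(20.0, 700.0, 0.5, 0.25)
```

Doubling the distance from 10 m to 20 m multiplies the estimated depth error by four under this model, which is the quadratic behavior the abstract refers to.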

翻訳日:2023-07-21 12:30:25 公開日:2023-07-20

# 一卵性双生児と二卵性双生児:文表現のきめ細かい意味的対照学習

Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations ( http://arxiv.org/abs/2307.10932v1 )

ライセンス: Link先を確認

Qingfa Xiao, Shuangyin Li, Lei Chen

(参考訳) 文表現の教師なし学習の強化は、コントラスト学習の有用性によって著しく達成されている。このアプローチは、拡張正のインスタンスをアンカーインスタンスとクラスタリングして、望ましい埋め込みスペースを作成する。しかし、対照的な目的のみに依存することは、正のペア間で微妙な意味のバリエーションを区別できないため、最適以下の結果をもたらす可能性がある。特に、一般的なデータ拡張技術は、しばしば意味的歪みをもたらし、正のペア間の意味的マージンをもたらす。情報損失関数は意味的マージンを見落とし、トレーニング中の正のペア間の類似度最大化を優先するが、トレーニングされたモデルの無意識な意味的理解能力に繋がる。本稿では,異なる拡張手法によって生成される様々な正の対に同時に適応できる,新しいIdentical and Fraternal Twins of Contrastive Learning (IFTCL)フレームワークを提案する。そこで本研究では,学習中に生来のマージンを保ち,データエンハンスメントの可能性を促進し,下位最適化問題を克服する \textit{twins loss} を提案する。また,提案したツインズ・ロスの有効性を証明するために,概念実証実験と対照的な目的を組み合わせる。さらに,新たな計算を行わずに負のインスタンスを復元・再利用するための海馬待ち行列機構を提案し,IFCLの効率と性能をさらに向上させる。英語と中国語のデータセットで9つの意味的テキスト類似性タスクをifclフレームワークで検証し,ifclが最先端の手法よりも優れていることを示す。

Contrastive learning has significantly advanced the unsupervised learning of sentence representations. This approach clusters the augmented positive instance with the anchor instance to create a desired embedding space. However, relying solely on the contrastive objective can result in sub-optimal outcomes due to its inability to differentiate subtle semantic variations between positive pairs. Specifically, common data augmentation techniques frequently introduce semantic distortion, leading to a semantic margin between the positive pair. The InfoNCE loss function, however, overlooks the semantic margin and prioritizes similarity maximization between positive pairs during training, which leaves the trained model insensitive to such semantic differences. In this paper, we introduce a novel Identical and Fraternal Twins of Contrastive Learning (named IFCL) framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques. We propose a \textit{Twins Loss} to preserve the innate margin during training and promote the potential of data enhancement in order to overcome the sub-optimal issue. We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss. Furthermore, we propose a hippocampus queue mechanism to restore and reuse the negative instances without additional calculation, which further enhances the efficiency and performance of IFCL. We verify the IFCL framework on nine semantic textual similarity tasks with both English and Chinese datasets, and the experimental results show that IFCL outperforms state-of-the-art methods.
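The InfoNCE objective named in this abstract can be written in a few lines, which makes the criticism concrete: the loss only rewards a higher anchor-positive similarity relative to the negatives and has no term for a margin between positives. The sketch below is the standard single-anchor InfoNCE with cosine similarity. The temperature value and function names are assumptions, and the paper's Twins Loss is not reproduced here because the abstract does not define it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss for one anchor: -log softmax of the positive's logit."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    # Numerically stable log-sum-exp over the positive and all negatives.
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

Because the loss decreases monotonically as the positive moves closer to the anchor, a heavily distorted augmentation and a faithful one are both pulled toward similarity 1, which is the behavior the abstract describes as overlooking the semantic margin.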

翻訳日:2023-07-21 12:29:55 公開日:2023-07-20

# MediaGPT: 中国語メディアを対象とした大規模言語モデル

MediaGPT : A Large Language Model Target Chinese Media ( http://arxiv.org/abs/2307.10930v1 )

ライセンス: Link先を確認

Zhonghao Wang

(参考訳) 大規模言語モデル(LLM)の開発は近年急速に進展している。最も広く使われているLCMの1つは、メディアドメインを含む様々な分野に適用されているジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)シリーズである。しかし、実際的な応用では、メディアのユースケースとLLMの汎用的応用の違いが、特に中国語で顕著になっている。その結果、メディアドメインのユニークな要件に合わせて、LSMを開発する必要性が高まっている。本稿では,多種多様なメディアデータを用いた大規模言語モデルであるMediaGPTを紹介し,中国メディアの実践的ニーズに対処する。我々は、ドメインの特定の要件を満たすために、多様なタスク命令タイプを設計しました。提案手法の有効性をさらに検証するため,メディア領域に適した独自のデータセットを構築し,生成型タスクに特化して設計された検証手法を開発した。そこで我々は, LLM の汎用性とメディア領域の要件とのギャップを埋めること, この分野における LLM のより効率的かつ効率的な利用の道を開くことを目的としている。本稿では,メディアアプリケーションのためのLLM開発における課題と機会を探究し,これらの課題に対処するための潜在的解決策を提案する。

The development of large language models (LLMs) has seen rapid progress in recent years. One of the most widely used LLMs is the Generative Pre-trained Transformer (GPT) series, which has been applied in various fields, including the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially in Chinese. As a result, there is a growing need to develop LLMs that are specifically tailored to the unique requirements of the media domain. In this paper, we present MediaGPT, a large language model trained on a variety of media data and addressing the practical needs of Chinese media. We have designed a diverse set of task instruction types to cater to the specific requirements of the domain. To further validate the effectiveness of our proposed LLM, we have constructed unique datasets that are tailored to the media domain and have also developed verification methods that are specifically designed for generative-type tasks. By doing so, we aim to bridge the gap between general-purpose LLMs and the requirements of the media domain, and to pave the way for more effective and efficient use of LLMs in this field. This paper aims to explore the challenges and opportunities of developing LLMs for media applications and to propose potential solutions for addressing these challenges.

翻訳日:2023-07-21 12:28:59 公開日:2023-07-20

# FLASK:アライメントスキルセットに基づくきめ細かい言語モデルの評価

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ( http://arxiv.org/abs/2307.10928v1 )

ライセンス: Link先を確認

Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo

(参考訳) 大規模言語モデル(LLM)の評価は、人間の価値観に合わせるには複数のスキルの組み合わせが必要であり、必要なスキルセットは命令によって異なるため、難しい。最近の研究では,(1)複数の独立したベンチマークでの自動評価,(2)応答に対して総合スコアを与える人間または機械による評価,の2つの方法でLLMの性能評価を行っている。しかし、どちらの設定も粗い粒度の評価であり、インスタンスごとのスキル構成を必要とするユーザ命令の性質を考慮していないため、LLMの真の能力の解釈が制限される。本稿では,粗いレベルのスコアリングをインスタンスごとのスキルセットレベルに分解し,モデルベースと人間ベースの両方の評価に適用可能な,きめ細かい評価プロトコルであるFLASK(Fine-grained Language Model Evaluation based on Alignment SKill Sets)を提案する。具体的には、LLMがオープンエンドのユーザ指示に従うために必要な12のきめ細かいスキルを定義し、各インスタンスにスキルセットを割り当てて評価セットを構築する。さらに、各インスタンスのターゲットドメインと難易度をアノテートすることで、FLASKは、スキル、ドメイン、難易度に応じてモデルのパフォーマンスを包括的に分析する全体像を提供する。 FLASKを用いて、複数のオープンソースおよびプロプライエタリなLLMを比較し、モデルに基づく評価と人間による評価の間で高い相関のある結果を観察する。 FLASKを使うことで、開発者はモデルのパフォーマンスをより正確に測定し、特定のスキルにおいてLLMを熟練させる要因を分析することでその改善方法を把握できる。実践者にとって、FLASKは様々なLLMの総合的な比較を通じて、特定の状況に適したモデルを推薦するために使用できる。評価データとコード実装は https://github.com/kaistAI/FLASK で公開する。

Evaluation of Large Language Models (LLMs) is challenging because aligning to human values requires the composition of multiple skills and the required set of skills varies depending on the instruction. Recent studies have evaluated the performance of LLMs in two ways: (1) automatic evaluation on several independent benchmarks and (2) human or machine-based evaluation giving an overall score to the response. However, both settings are coarse-grained evaluations that do not consider the nature of user instructions requiring instance-wise skill composition, which limits the interpretation of the true capabilities of LLMs. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment SKill Sets), a fine-grained evaluation protocol, usable for both model-based and human-based evaluation, which decomposes coarse-level scoring to an instance-wise skill set-level. Specifically, we define 12 fine-grained skills needed for LLMs to follow open-ended user instructions and construct an evaluation set by allocating a set of skills for each instance. Additionally, by annotating the target domains and difficulty level for each instance, FLASK provides a holistic view with a comprehensive analysis of a model's performance depending on skill, domain, and difficulty. Using FLASK, we compare multiple open-sourced and proprietary LLMs and observe highly correlated findings between model-based and human-based evaluations. FLASK enables developers to more accurately measure the model performance and how it can be improved by analyzing factors that make LLMs proficient in particular skills. For practitioners, FLASK can be used to recommend suitable models for particular situations through comprehensive comparison among various LLMs. We release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.
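
The idea of decomposing one overall score into instance-wise skill scores can be pictured with a small aggregation routine. This is only a sketch under assumptions, not the FLASK codebase: the record layout, the skill names (`factuality`, `conciseness`) and the numeric score scale are hypothetical placeholders, since the abstract does not list them.

```python
from collections import defaultdict


def per_skill_average(instances):
    """Average the scores of each skill over the instances annotated with it.

    `instances` is a list of dicts with a "skills" mapping of
    skill name -> score (hypothetical layout, assumed for this sketch).
    Instances contribute only to the skills assigned to them.
    """
    collected = defaultdict(list)
    for instance in instances:
        for skill, score in instance["skills"].items():
            collected[skill].append(score)
    return {skill: sum(scores) / len(scores) for skill, scores in collected.items()}


# Made-up annotations: each instance carries its own subset of skills.
example = [
    {"domain": "science", "difficulty": 2, "skills": {"factuality": 4, "conciseness": 2}},
    {"domain": "coding", "difficulty": 4, "skills": {"factuality": 2}},
]
report = per_skill_average(example)
```

Grouping the same records by `domain` or `difficulty` instead of by skill would give the other two views the abstract mentions.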

翻訳日:2023-07-21 12:28:29 公開日:2023-07-20

# MASR:メタデータ対応音声表現

MASR: Metadata Aware Speech Representation ( http://arxiv.org/abs/2307.10982v1 )

ライセンス: Link先を確認

Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

(参考訳) 近年,音声表現学習は主に自己教師付き学習(SSL)タスクとして構築され,生音声信号のみを使用する一方で,特定の音声記録でしばしば利用できるサイドインフォメーションを無視している。本稿では,前述の制限に対処するメタデータ対応音声表現学習フレームワークであるMASRを提案する。 MASRは、複数の外部知識ソースを組み込むことで、メタデータ情報の利用を促進できる。外部知識源は、ハードマイニング損失に有用なサンプルレベルのペアワイズ類似度行列の形で組み込まれている。 MASRフレームワークの重要な利点は、任意のSSLメソッドと組み合わせることができることである。我々は,MASR表現を用いて,言語識別や音声認識,さらに話者認識や感情認識などの非意味的タスクといった複数の下流タスクの評価を行う。これらの実験では、他の確立されたベンチマークよりもMASRの大幅な性能向上を示す。本稿では,言語識別タスクの詳細な解析を行い,提案した損失関数によって表現が密接に関連する言語を分離できるようになる仕組みについて考察する。

In recent years, speech representation learning has been framed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Metadata Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of metadata information. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.

翻訳日:2023-07-21 12:21:11 公開日:2023-07-20

# PATROL: モデル反転攻撃に対する協調推論のためのプライバシ指向プルーニング

PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks ( http://arxiv.org/abs/2307.10981v1 )

ライセンス: Link先を確認

Shiwei Ding, Lan Zhang, Miao Pan, Xiaoyong Yuan

(参考訳) 協調推論(collaborative inference)は、リソース制約のあるエッジデバイスが最先端のディープニューラルネットワーク(DNN)を使用して推論を行えるようにする、有望なソリューションである。協調推論では、エッジデバイスはまず入力を部分DNNにローカルに供給し、その後中間結果をクラウドにアップロードして推論を完了させる。しかし、近年の研究では、モデル反転攻撃(MIA)は中間結果から入力データを再構築でき、協調推論に深刻なプライバシー上の懸念をもたらすことが示されている。既存の摂動と暗号技術は、正確な推論を行いながらMIAに対して防御するには非効率で信頼性が低い。本稿では,協調推論のプライバシ,効率性,有用性のバランスをとるために,プライバシ指向のプルーニングを開発するPATROLという実用的なソリューションを提供する。 PATROLは、DNNの後段のレイヤほどタスク固有の特徴を抽出できるという事実を活用する。協調推論のための限られたローカルリソースを前提として、PATROLは、プルーニング技術に基づいてエッジにより多くのレイヤをデプロイし、推論のためのタスク固有の特徴を強化するとともに、プライバシ保護のためにタスクに無関係だがセンシティブな特徴を減らすことを目指す。プライバシ指向のプルーニングを実現するために、PATROLはリプシッツ正則化と敵対的再構成トレーニングという2つの重要な構成要素を導入する。前者はMIAの安定性を低下させることで再構成誤差を増加させ、後者は敵対的トレーニングによって目標推論モデルを強化する。

Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.

翻訳日:2023-07-21 12:20:54 公開日:2023-07-20

# 電流-密度相互作用を受けるボース・アインシュタイン凝縮体のカイラル電流

Chiral currents in Bose-Einstein condensates subject to current-density interactions ( http://arxiv.org/abs/2307.10977v1 )

ライセンス: Link先を確認

Maria Arazo, Montserrat Guilleumas, Ricardo Mayol, Vicente Delgado and Antonio Muñoz Mateo

(参考訳) 準1次元ボース・アインシュタイン凝縮体中の持続電流は、電流-密度相互作用の存在下でキラルとなる。この現象は、回転するリング形状にロードされた超低温原子で探索され、様々な電流担持定常状態が解析的に見出され、平均場運動方程式に対する既知の解を一般化する。その動的安定性は数値シミュレーションによって検証され、一定の密度プロファイルと変調された密度プロファイルの両方を持つ状態に対して安定した電流が示される一方、減衰する電流は一方向の速度しきい値を超えた場合にのみ現れる。この分野における最近の実験により、これらの状態は実験的に到達可能な範囲にある。

Persistent currents in quasi-one-dimensional Bose-Einstein condensates become chiral in the presence of current-density interactions. This phenomenon is explored in ultracold atoms loaded in a rotating ring geometry, where diverse current-carrying stationary states are analytically found to generalize previously known solutions to the mean-field equations of motion. Their dynamical stability is tested by numerical simulations that show stable currents for states with both constant and modulated density profiles, while decaying currents appear only beyond a unidirectional velocity threshold. Recent experiments in the field bring these states within experimental reach.

翻訳日:2023-07-21 12:20:25 公開日:2023-07-20

# 集積フォトニック分数畳み込み加速器

Integrated Photonic Fractional Convolution Accelerator ( http://arxiv.org/abs/2307.10976v1 )

ライセンス: Link先を確認

Kevin Zelaya and Mohammad-Ali Miri

(参考訳) 離散分数フーリエ変換(DFrFT)に基づく修正畳み込み演算を行う集積フォトニック回路アーキテクチャを提案する。これは、等間隔の固有モードスペクトルを持ち長さの異なる2つの非一様結合導波路格子が、相補的な次数のDFrFT演算を行い、その間に変調器アレイを挟むことで実現される。数値シミュレーションにより、ノイズのある入力信号でもスムージングとエッジ検出のタスクが実際に実行されることが示された。

An integrated photonic circuit architecture to perform a modified-convolution operation based on the Discrete Fractional Fourier Transform (DFrFT) is introduced. This is accomplished by utilizing two nonuniformly-coupled waveguide lattices with equally-spaced eigenmode spectra and with different lengths that perform DFrFT operations of complementary orders sandwiching a modulator array. Numerical simulations show that smoothing and edge detection tasks are indeed performed even for noisy input signals.

翻訳日:2023-07-21 12:20:15 公開日:2023-07-20

# ストリーミング音声認識のためのトランスデューサのグローバル正規化

Globally Normalising the Transducer for Streaming Speech Recognition ( http://arxiv.org/abs/2307.10975v1 )

ライセンス: Link先を確認

Rogier van Dalen

(参考訳) Transducer(例えばRNN-TransducerやConformer-Transducer)は入力シーケンスをたどりながら出力ラベルシーケンスを生成する。完全な入力を見る前に部分的な仮説を生成するストリーミングモードで使うのは簡単であり、そのため音声認識で人気がある。しかし、ストリーミングモードでは、Transducerには数学的欠陥があり、簡単に言えばモデルが考えを変える能力を制限する。修正は局所正規化(例えばsoftmax)をグローバル正規化に置き換えることだが、そうすると損失関数を正確に評価することが不可能になる。最近の論文は、モデルを近似することでこの問題を解決することを提案しているが、性能が著しく低下する。これに対し本稿では,損失関数を近似することを提案し,最先端のストリーミングモデルにグローバル正規化を適用できるようにする。グローバル正規化は、ワードエラー率を相対的に9-11%削減し、ストリーミングモードとルックアヘッドモードの差のほぼ半分を埋める。

The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates an output label sequence as it traverses the input sequence. It is straightforward to use in streaming mode, where it generates partial hypotheses before the complete input has been seen. This makes it popular in speech recognition. However, in streaming mode the Transducer has a mathematical flaw which, simply put, restricts the model's ability to change its mind. The fix is to replace local normalisation (e.g. a softmax) with global normalisation, but then the loss function becomes impossible to evaluate exactly. A recent paper proposes to solve this by approximating the model, severely degrading performance. Instead, this paper proposes to approximate the loss function, allowing global normalisation to apply to a state-of-the-art streaming model. Global normalisation reduces its word error rate by 9-11% relative, closing almost half the gap between streaming and lookahead mode.
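
The difference between local and global normalisation can be seen in a toy example. The sketch below is not the paper's model or its approximate loss: the two-step label space and every score in it are made-up assumptions, chosen so that the second step carries evidence against the first label. With a per-step softmax the probability of the first label is fixed before that evidence arrives; with a single normalisation over complete paths it can still drop.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed toy scores: first label y1 in {0, 1}, then second label y2 in {0, 1}.
FIRST = [1.0, 0.0]            # step-1 scores favour y1 = 0
SECOND = {0: [-5.0, -5.0],    # after y1 = 0 every continuation scores badly
          1: [0.0, 0.0]}      # after y1 = 1 continuations are neutral


def local_first_label_probs():
    """Local normalisation: P(y1) comes from a softmax over FIRST alone.

    The step-2 softmax sums to 1 for each prefix, so the bad step-2 scores
    after y1 = 0 cannot reduce P(y1 = 0).
    """
    return softmax(FIRST)


def global_first_label_probs():
    """Global normalisation: normalise exp(total path score) once over all paths."""
    weights = {
        (y1, y2): math.exp(FIRST[y1] + SECOND[y1][y2])
        for y1 in (0, 1)
        for y2 in (0, 1)
    }
    z = sum(weights.values())
    return [
        sum(w for (first, _), w in weights.items() if first == y1) / z
        for y1 in (0, 1)
    ]


local = local_first_label_probs()
global_ = global_first_label_probs()
```

In this toy setting `local[0]` stays above 0.7 while `global_[0]` falls below 0.05, which is the "change its mind" behaviour the abstract refers to. Enumerating all paths is only feasible here because the example is tiny, which hints at why the exact globally normalised loss is hard to evaluate for a real model.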

翻訳日:2023-07-21 12:20:04 公開日:2023-07-20

# 画像処理用Deep Spiking-UNet

Deep Spiking-UNet for Image Processing ( http://arxiv.org/abs/2307.10974v1 )

ライセンス: Link先を確認

Hebei Li, Yueyi Zhang, Zhiwei Xiong, Zheng-jun Zha, Xiaoyan Sun

(参考訳) U-Netはその単純かつ効率的なアーキテクチャで知られており、画像処理タスクに広く利用されており、特にニューロモルフィックチップへのデプロイに適している。本稿では,スパイキングニューラルネットワーク(SNN)とU-Netアーキテクチャを組み合わせた,画像処理のためのSpiking-UNetの概念を紹介する。効率的なSpiking-UNetを実現するためには,スパイクによるネットワーク内の高忠実度な情報伝播の確保と,効果的なトレーニング戦略の策定という2つの課題に直面する。情報損失問題に対処するため、Spiking-UNet内の情報伝達効率を向上させるマルチ閾値スパイキングニューロンを導入する。トレーニング戦略には,事前学習されたU-Netモデルを活用した変換および微調整パイプラインを採用する。変換過程では、スキップ接続を利用する際に、異なる部分間でデータ分布に大きなばらつきが観察される。そこで本研究では,不正確な発火率を防止するための接続単位の正規化手法を提案する。さらに,変換したモデルを微調整するフローベーストレーニング手法を採用し,性能を保ちながら時間ステップを削減する。実験の結果,画像のセグメンテーションやノイズ除去において,Spiking-UNetは非スパイキングのU-Netに匹敵する性能を達成し,既存のSNN手法を上回った。微調整なしで変換されたSpiking-UNetと比較して、我々のSpiking-UNetは推論時間を約90%削減する。本研究は、画像処理におけるSNNの適用範囲を広げ、ニューロモルフィックエンジニアリングの分野におけるさらなる探究を促すことが期待される。 Spiking-UNet実装のコードは https://github.com/SNNresearch/Spiking-UNet で公開されている。

U-Net, known for its simple yet efficient architecture, is widely utilized for image processing tasks and is particularly suitable for deployment on neuromorphic chips. This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture. To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy. To address the issue of information loss, we introduce multi-threshold spiking neurons, which improve the efficiency of information transmission within the Spiking-UNet. For the training strategy, we adopt a conversion and fine-tuning pipeline that leverages pre-trained U-Net models. During the conversion process, significant variability in data distribution across different parts is observed when utilizing skip connections. Therefore, we propose a connection-wise normalization method to prevent inaccurate firing rates. Furthermore, we adopt a flow-based training method to fine-tune the converted models, reducing time steps while preserving performance. Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart, surpassing existing SNN methods. Compared with the converted Spiking-UNet without fine-tuning, our Spiking-UNet reduces inference time by approximately 90%. This research broadens the application scope of SNNs in image processing and is expected to inspire further exploration in the field of neuromorphic engineering. The code for our Spiking-UNet implementation is available at https://github.com/SNNresearch/Spiking-UNet.
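
The abstract does not spell out how the multi-threshold spiking neurons work. Purely as an illustration, the sketch below shows one plausible reading: an integrate-and-fire neuron that, at each time step, emits a spike whose amplitude is the largest threshold its membrane potential has reached, then subtracts that amount (a soft reset). The function name, the threshold values and the reset rule are assumptions and should not be taken as the authors' definition.

```python
def multi_threshold_if(inputs, thresholds):
    """Simulate a hypothetical multi-threshold integrate-and-fire neuron.

    inputs     -- input current per time step
    thresholds -- positive firing thresholds (any order)
    Returns (emitted amplitudes per step, residual membrane potential).
    """
    levels = sorted(thresholds, reverse=True)  # check the largest threshold first
    potential = 0.0
    emitted = []
    for current in inputs:
        potential += current                   # integrate the input
        amplitude = 0.0
        for theta in levels:
            if potential >= theta:
                amplitude = theta              # fire at the largest level reached
                potential -= theta             # soft reset keeps the remainder
                break
        emitted.append(amplitude)
    return emitted, potential


spikes, residual = multi_threshold_if([0.5, 0.5, 3.0, 0.2], thresholds=[1.0, 2.0])
```

With a single threshold of 1.0 the large input at the third step would need several time steps to be transmitted; a second, higher threshold lets the neuron pass more of it at once, which is the kind of gain in transmission efficiency the abstract describes. Under the soft reset assumed here, the emitted amplitudes plus the residual potential always add up to the total input.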

翻訳日:2023-07-21 12:19:49 公開日:2023-07-20

# 即時決選投票(IRV)選挙の適応的重み付け監査:AWAIRE

Adaptively Weighted Audits of Instant-Runoff Voting Elections: AWAIRE ( http://arxiv.org/abs/2307.10972v1 )

ライセンス: Link先を確認

Alexander Ek, Philip B. Stark, Peter J. Stuckey, Damjan Vukcevic

(参考訳) 選挙監査は、誤った選挙結果が認定される確率を(事前に指定されたしきい値まで)制限する場合、リスク制限的(risk-limiting)である。即時決選投票(IRV)選挙の既存の監査方法は、リスク制限的でないか、あるいは各投票用紙の票に関する投票システムの電子的記録であるキャスト投票記録(CVR)を必要とする。例えば、IRVコンテストを手動で集計する管轄区域では、CVRは必ずしも利用できない。我々は,CVRが利用できない場合に,テストスーパーマルチンゲールの適応的重み付き平均を用いてIRV選挙を効率よく監査するRLA法(AWAIRE)を開発した。適応重み付けは、選挙結果を確認するために検定すべき効率的な仮説の集合を「学習」する。正確なCVRが利用可能であれば、AWAIREはそれらを用いて効率を高め、CVRを必要とする既存の手法の性能に匹敵させることができる。最大6人の候補者の選挙を処理できるオープンソースのプロトタイプ実装を提供する。実際の選挙のデータを用いたシミュレーションでは、AWAIREは実際に効率的である可能性が高いことが示されている。我々は、より多くの候補者の選挙を扱うために計算手法を拡張する方法について論じる。テストスーパーマルチンゲールの適応的重み付き平均は一般的なツールであり、選挙監査を超えて、ファミリーワイズエラー率を厳密に制御しながら仮説の集合を逐次的に検定するのに有用である。

An election audit is risk-limiting if the audit limits (to a pre-specified threshold) the chance that an erroneous electoral outcome will be certified. Extant methods for auditing instant-runoff voting (IRV) elections are either not risk-limiting or require cast vote records (CVRs), the voting system's electronic record of the votes on each ballot. CVRs are not always available, for instance, in jurisdictions that tabulate IRV contests manually. We develop an RLA method (AWAIRE) that uses adaptively weighted averages of test supermartingales to efficiently audit IRV elections when CVRs are not available. The adaptive weighting 'learns' an efficient set of hypotheses to test to confirm the election outcome. When accurate CVRs are available, AWAIRE can use them to increase the efficiency to match the performance of existing methods that require CVRs. We provide an open-source prototype implementation that can handle elections with up to six candidates. Simulations using data from real elections show that AWAIRE is likely to be efficient in practice. We discuss how to extend the computational approach to handle elections with more candidates. Adaptively weighted averages of test supermartingales are a general tool, useful beyond election audits to test collections of hypotheses sequentially while rigorously controlling the familywise error rate.
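
The general tool named in the last sentence, an adaptively weighted average of test supermartingales, can be sketched generically. This is not the AWAIRE prototype: the function name, the `power` weighting rule and the toy increments are assumptions. The sketch relies on a standard fact: if each base process has per-step multiplicative increments with conditional expectation at most 1 under the null, then a mixture whose non-negative weights sum to 1 and depend only on past data is again a test supermartingale, and by Ville's inequality it reaches 1/alpha with probability at most alpha under the null.

```python
def adaptive_mixture(increments, alpha=0.05, power=2.0):
    """Sequentially combine base test supermartingales with adaptive weights.

    increments[t][i] -- multiplicative increment of base process i at step t
    power            -- weights are proportional to (past base value) ** power
                        (an assumed weighting rule, chosen for illustration)
    Returns (stopping step or None, final mixture value).
    """
    k = len(increments[0])
    base = [1.0] * k          # running values of the base supermartingales
    mixture = 1.0
    for step, incs in enumerate(increments, start=1):
        raw = [b ** power for b in base]          # uses past values only
        total = sum(raw)
        weights = [r / total for r in raw]
        mixture *= sum(w * m for w, m in zip(weights, incs))
        base = [b * m for b, m in zip(base, incs)]
        if mixture >= 1.0 / alpha:                # evidence threshold reached
            return step, mixture
    return None, mixture


# Toy data: base process 0 doubles each step, base process 1 halves.
toy = [[2.0, 0.5]] * 8
stop, value = adaptive_mixture(toy, alpha=0.05, power=2.0)
```

With `power=1.0` the mixture reduces to the plain average of the base processes; larger values shift weight more quickly toward whichever base process has accumulated the most evidence so far, which is one simple way to picture the "learning" described in the abstract.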

翻訳日:2023-07-21 12:19:22 公開日:2023-07-20

# 複数対の空間分離オブザーバへの局所的絡み合い伝達

Local entanglement transfer to multiple pairs of spatially separated observers ( http://arxiv.org/abs/2307.10961v1 )

ライセンス: Link先を確認

Tanmoy Mondal, Kornikar Sen, Chirag Srivastava, Ujjwal Sen

(参考訳) 絡み合いは有利であるが、同時に様々な量子タスクで使われる高価な資源である。絡み合いの効率的な利用と展開のために、空間的に分離された観測者であるCharuとDebuが互いに相互作用することなく絡み合いを共有したいというシナリオを考察する。その解決策として、彼らのシステムは、すでに絡み合った状態を共有しているAliceとBobのシステムと、それぞれ別々に局所的に相互作用することができる。 Alice-Bob 対が事前に共有する絡み合いの量が限られている場合に、Alice-Bob 対から複数の Charu-Debu 対へ絡み合いを転送できるかどうかを問う。我々は、Alice と Charu の1人、および Bob と対応する Debu がそれぞれ適用する合同ユニタリであって、Alice と Bob の間で共有される絡み合いの非ゼロの量を、無制限の数の Charu と Debu の対に順次転送できるものを見出す。これらのユニタリを用いて一定数の対に転送可能な絡み合いの量について議論する。また、一定量の絡み合いを転送できる対の数も決定する。さらに,可能なすべての局所ユニタリにわたって最適化することにより,各対が少なくとも一定量の絡み合いを得るように絡み合いを転送できる対の最大数を解析する。

Entanglement is an advantageous but at the same time costly resource utilized in various quantum tasks. For efficient usage and deployment of entanglement, we envisage the scenario where a pair of spatially separated observers, Charu and Debu, want to share entanglement without interacting with each other. As a way out, their systems can separately and locally interact with those of Alice and Bob, respectively, who already share an entangled state. We ask if it is possible to transfer entanglement from the Alice-Bob pair to multiple Charu-Debu pairs, where the Alice-Bob pair only possesses a limited amount of pre-shared entanglement. We find joint unitaries which, when applied by Alice and one of the Charus, and by Bob and the corresponding Debu, allow a nonzero amount of the entanglement shared between Alice and Bob to be sequentially transferred to an indefinite number of pairs of Charus and Debus. We discuss the amount of entanglement that can be transferred to a fixed number of pairs using these unitaries. Also, we determine to how many pairs a fixed amount of entanglement can be transferred. Moreover, by optimizing over all possible local unitaries, we analyze the maximum number of pairs to which entanglement can be transferred in such a way that each pair gets at least a fixed amount of entanglement.

翻訳日:2023-07-21 12:18:58 公開日:2023-07-20

# 光力学を用いた伝播光モード間の連続的可変絡み合い

Continuous variable entanglement between propagating optical modes using optomechanics ( http://arxiv.org/abs/2307.10956v1 )

ライセンス: Link先を確認

Greeshma Gopinath (1), Yong Li (2), Sankar Davuluri (1) ((1) Department of Physics, BITS Pilani, Hyderabad Campus, Hyderabad, India, (2) Center for Theoretical Physics and School of Science, Hainan University, Haikou 570228, China)

(参考訳) 本稿では, 2つの空間分離した出力レーザー場を, 中間膜を有する光機械的キャビティから絡み合う新しい方法を提案する。放射圧力結合は、入力と出力場の四角形の間の相関を修正するために用いられる。次に、光機械的キャビティ出力のレーザーフィールドを量子バックアクションヌル化メーター技術を用いて絡み合う。熱雑音が絡み合いに及ぼす影響について検討した。実験可能なパラメータでは、レーザーフィールド間の絡み合いは室温まで持続する。

This article proposes a new method to entangle two spatially separated output laser fields from an optomechanical cavity with a membrane in the middle. The radiation pressure force coupling is used to modify the correlations between the input and the output field quadratures. Then the laser fields at the optomechanical cavity output are entangled using the quantum back-action nullifying meter technique. The effect of thermal noise on the entanglement is studied. For experimentally feasible parameters, the entanglement between the laser fields survives up to room temperature.

翻訳日:2023-07-21 12:18:32 公開日:2023-07-20

# 内視鏡手術症例における脊髄神経分節法とデータセット構築

Spinal nerve segmentation method and dataset construction in endoscopic surgical scenarios ( http://arxiv.org/abs/2307.10955v1 )

ライセンス: Link先を確認

Shaowu Peng, Pengcheng Zhao, Yongyu Ye, Junying Chen, Yunbing Chang, Xiaoqing Zheng

(参考訳) 内視鏡手術は現在,脊椎外科領域において重要な治療方法であり,ビデオ誘導によって脊髄神経の損傷を回避することが重要な課題である。本稿では,内視鏡手術における脊髄神経のリアルタイムセグメンテーション手法を初めて提示し,外科医に重要なナビゲーション情報を提供する。手術中に記録された約10,000枚の連続フレームからなる精密にアノテーションされたセグメンテーションデータセットをこの分野で初めて構築し、セマンティックセグメンテーションの問題に取り組む。本データセットに基づいて,フレーム間情報と自己注意機構を利用して最先端の性能を実現する FUnet (Frame-Unet) を提案する。また、類似のポリープ内視鏡映像データセット上で拡張実験を行い、モデルが優れた性能とともに良好な汎化能力を有することを示す。この研究のデータセットとコードは https://github.com/zzzzzzpc/FUnet で公開されている。

Endoscopic surgery is currently an important treatment method in the field of spinal surgery, and avoiding damage to the spinal nerves through video guidance is a key challenge. This paper presents the first real-time segmentation method for spinal nerves in endoscopic surgery, which provides crucial navigational information for surgeons. A finely annotated segmentation dataset of approximately 10,000 consecutive frames recorded during surgery is constructed for the first time for this field, addressing the problem of semantic segmentation. Based on this dataset, we propose FUnet (Frame-Unet), which achieves state-of-the-art performance by utilizing inter-frame information and self-attention mechanisms. We also conduct extended experiments on a similar polyp endoscopy video dataset and show that the model has good generalization ability with advantageous performance. The dataset and code of this work are presented at: https://github.com/zzzzzzpc/FUnet

翻訳日:2023-07-21 12:18:23 公開日:2023-07-20

# シャープネス最小化アルゴリズムは、より良い一般化を達成するためにシャープネスを最小化するだけではない

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization ( http://arxiv.org/abs/2307.11007v1 )

ライセンス: Link先を確認

Kaiyue Wen, Tengyu Ma, Zhiyuan Li

(参考訳) 広範な研究にもかかわらず、過剰パラメータ化されたニューラルネットワークが一般化できる根本的な理由は、いまだに解明されていない。既存の理論では、一般的な確率的最適化器は訓練損失のより平坦な最小化解を好むことが示されており、したがって平坦性が一般化を意味するというのが自然な説明の候補である。この研究はこの説明を批判的に検証する。理論的および実証的な調査を通じて、2層ReLUネットワークについて次の3つのシナリオを特定する。(1) 平坦性が一般化を証明可能な形で意味する、(2) 一般化しない最も平坦なモデルが存在し、シャープネス最小化アルゴリズムは一般化に失敗する、(3) 最も驚くべきことに、一般化しない最も平坦なモデルが存在するにもかかわらず、シャープネス最小化アルゴリズムは依然として一般化する。以上の結果から,シャープネスと一般化の関係はデータ分布とモデルアーキテクチャに微妙に依存し,シャープネス最小化アルゴリズムはより優れた一般化を達成するためにシャープネスを最小化するだけではないことが示唆される。これにより、過剰パラメータ化ニューラルネットワークの一般化に対する他の説明の探索が求められる。

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.

翻訳日:2023-07-21 12:10:38 公開日:2023-07-20

# 事前学習されたASRとLMを統合した音声言語理解のためのシーケンス生成

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding ( http://arxiv.org/abs/2307.11005v1 )

ライセンス: Link先を確認

Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

(参考訳) 事前学習された音声認識(ASR)モデルと言語モデル(LM)を音声言語理解(SLU)フレームワークに統合することへの関心が高まっている。しかし、従来の手法は事前学習モデル間の語彙ミスマッチに悩まされることが多く、LMはNLUの定式化と乖離するため直接利用できない。本研究では,ASRおよびLMサブネットワークをシーケンス生成タスクのためのSLU定式化に効果的に統合する3パスのエンドツーエンド(E2E)SLUシステムを提案する。最初のパスでは、ASRサブネットワークを用いてASRの書き起こしを予測する。続いてLMサブネットワークが最初のSLU予測を行う。最後に第3パスでは、審議(deliberation)サブネットワークがASRおよびLMサブネットワークからの表現を条件として最終的な予測を行う。提案した3パスSLUシステムは,2つのベンチマークSLUデータセット(SLURPとSLUE)において,特に音響的に困難な発話で,カスケード型およびE2E SLUモデルを上回る性能を示す。

There has been an increased interest in the integration of pretrained automatic speech recognition (ASR) models and language models (LMs) into the spoken language understanding (SLU) framework. However, prior methods often struggle with a vocabulary mismatch between pretrained models, and LMs cannot be directly utilized because they diverge from the NLU formulation. In this study, we propose a three-pass end-to-end (E2E) SLU system that effectively integrates ASR and LM subnetworks into the SLU formulation for sequence generation tasks. In the first pass, our architecture predicts ASR transcripts using the ASR subnetwork. This is followed by the LM subnetwork, which makes an initial SLU prediction. Finally, in the third pass, the deliberation subnetwork conditions on representations from the ASR and LM subnetworks to make the final prediction. Our proposed three-pass SLU system shows improved performance over cascaded and E2E SLU models on two benchmark SLU datasets, SLURP and SLUE, especially on acoustically challenging utterances.

翻訳日:2023-07-21 12:10:18 公開日:2023-07-20

# NeoSySPArtaN:数値相対性理論を用いた偏心二重ブラックホールの高次多重極波形のニューロシンボリックスピン予測アーキテクチャ

NeoSySPArtaN: A Neuro-Symbolic Spin Prediction Architecture for higher-order multipole waveforms from eccentric Binary Black Hole mergers using Numerical Relativity ( http://arxiv.org/abs/2307.11003v1 )

ライセンス: Link先を確認

Amrutaa Vibho, Ali Al Bataineh

(参考訳) 連星ブラックホールおよび中性子星の合体におけるスピンの大きさの予測は、これらの激変的な事象の間に放出される重力波(GW)信号と天体物理学的過程を理解する上で重要である。本稿では,ニューラルネットワークとシンボリック回帰の力を組み合わせた新しいニューロシンボリックアーキテクチャ(NSA)を提案し,ブラックホールと中性子星の合体のスピンの大きさを正確に予測する。我々のアプローチは,SXS波形カタログの数値相対論シミュレーションから得られたGW波形データを利用する。これら2つのアプローチを組み合わせることで,両パラダイムの強みを活用し,スピンの大きさの包括的かつ正確な予測を可能にする。実験の結果,提案アーキテクチャは, NSAモデルで0.05の二乗平均平方根誤差(RMSE)と0.03の平均二乗誤差(MSE)を, シンボリック回帰モデル単体では0.12のRMSEを達成した。このモデルは高次多重極波形を扱えるように訓練し,特異な特徴を示すことが知られている偏心候補に特に着目した。以上の結果は,合体におけるスピンの大きさを予測するための頑健かつ解釈可能な枠組みを提供する。これはブラックホールの天体物理学的性質を理解し、GW信号の基礎となる物理を解読する上で意義がある。

The prediction of spin magnitudes in binary black hole and neutron star mergers is crucial for understanding the astrophysical processes and gravitational wave (GW) signals emitted during these cataclysmic events. In this paper, we present a novel Neuro-Symbolic Architecture (NSA) that combines the power of neural networks and symbolic regression to accurately predict spin magnitudes of black hole and neutron star mergers. Our approach utilizes GW waveform data obtained from numerical relativity simulations in the SXS Waveform catalog. By combining these two approaches, we leverage the strengths of both paradigms, enabling a comprehensive and accurate prediction of spin magnitudes. Our experiments demonstrate that the proposed architecture achieves an impressive root-mean-squared-error (RMSE) of 0.05 and mean-squared-error (MSE) of 0.03 for the NSA model and an RMSE of 0.12 for the symbolic regression model alone. We train this model to handle higher-order multipole waveforms, with a specific focus on eccentric candidates, which are known to exhibit unique characteristics. Our results provide a robust and interpretable framework for predicting spin magnitudes in mergers. This has implications for understanding the astrophysical properties of black holes and deciphering the physics underlying the GW signals.

翻訳日:2023-07-21 12:09:57 公開日:2023-07-20

# 自動圧縮によるプライベートフェデレーション学習

Private Federated Learning with Autotuned Compression ( http://arxiv.org/abs/2307.10999v1 )

ライセンス: Link先を確認

Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh

(参考訳) 我々は,圧縮率の設定やチューニングを必要とせずに,プライベートフェデレーション学習における通信量を削減する新しい手法を提案する。我々のオンザフライ方式は,セキュアアグリゲーションと差分プライバシを使用して証明可能なプライバシ保証を維持しつつ,トレーニング中に生じる誤差に基づいて圧縮率を自動的に調整する。提案手法は平均推定において証明可能な形でインスタンス最適であり, すなわち最小限の対話性で「問題の難しさ」に適応できる。チューニングを必要とせずに良好な圧縮率を達成することで,実世界のデータセット上で本手法の有効性を示す。

We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the "hardness of the problem" with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.

翻訳日:2023-07-21 12:09:31 公開日:2023-07-20

# DREAM: ブラックボックスモデルのドメインフリーリバースエンジニアリング属性

DREAM: Domain-free Reverse Engineering Attributes of Black-box Model ( http://arxiv.org/abs/2307.10997v1 )

ライセンス: Link先を確認

Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang

(参考訳) ディープラーニングモデルは通常、機械学習プラットフォームにデプロイされるとブラックボックスとなる。先行研究では、ターゲットのブラックボックスニューラルネットワークの属性(例えば、畳み込みレイヤの数)が一連のクエリを通じて露呈しうることが示されている。しかし、これらの研究には重大な制約がある。ターゲットモデルのトレーニングに使用されたデータセットが事前に既知であると仮定し、このデータセットをモデル属性攻撃に利用している点である。しかし、現実にはターゲットブラックボックスモデルのトレーニングデータセットにアクセスすることは困難である。したがって、この場合にもターゲットブラックボックスモデルの属性を明らかにできるかどうかは疑わしい。本稿では,対象モデルのトレーニングデータセットを必要とせずにブラックボックスターゲットモデルの属性をドメインに依存せずリバースエンジニアリングする,DREAMと呼ぶ新たな問題を調査し,この問題を分布外(OOD)一般化問題として定式化することで,汎用的かつ原理的な枠組みを提案する。このようにして、未知のトレーニングデータを持つターゲットブラックボックスモデルの属性を逆推論する、ドメインに依存しないモデルを学習できる。これにより,本手法は,強力な一般化能力を持ち,モデル属性リバースエンジニアリングにおいて任意のドメインに無理なく適用できる手法の一つとなる。広範な実験を行い,提案手法がベースラインよりも優れていることを検証した。

Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. However, these works share a crucial limitation: they assume the dataset used for training the target model to be known beforehand and leverage this dataset for the model attribute attack. In reality, it is difficult to access the training dataset of the target black-box model. Therefore, whether the attributes of a target black-box model could still be revealed in this case is doubtful. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out-of-distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. This makes our method one of the approaches that can be gracefully applied to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.

翻訳日:2023-07-21 12:09:21 公開日:2023-07-20

# 生音楽生成のためのプログレッシブ蒸留拡散

Progressive distillation diffusion for raw music generation ( http://arxiv.org/abs/2307.10994v1 )

ライセンス: Link先を確認

Svetlana Pavlova

(参考訳) 本稿では,生のオーディオファイルを生成するタスクに,新たなディープラーニングアプローチを適用することを目的とする。これは近年の深層生成モデルである拡散モデルに基づいている。この新しい手法は画像生成において際立った結果を示している。コンピュータビジョンコミュニティによって、これらのモデルに多くの焦点が当てられている。一方で、波形領域の音楽生成など、他の種類のアプリケーションに対して与えられたものはごくわずかである。本稿では,1次元u-netを用いたプログレッシブ蒸留拡散の非条件生成モデルを実装した。次に、拡散の異なるパラメータと完全な結果におけるそれらの値の比較を示す。この方法で実装された方法の大きな利点は、1チャンネル128×384から3チャンネル128×128メルスペクトログラムへの変換とループ生成を使用して、オーディオ処理と生成の進捗に対処できるという事実である。経験的比較は、異なる自己収集データセット間で実現される。

This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This type of method has recently shown outstanding results in image generation. The computer vision community has given these models a lot of attention. On the other hand, very little attention has been given to other types of applications, such as music generation in the waveform domain. In this paper, a model for unconditional generation applied to music is implemented: progressive distillation diffusion with a 1D U-Net. Then, a comparison of different diffusion parameters and their value in the full result is presented. One big advantage of the methods implemented in this work is that the model is able to deal with progressing audio processing and generation, using a transformation from 1-channel 128 x 384 to 3-channel 128 x 128 mel-spectrograms and looped generation. The empirical comparisons are carried out across different self-collected datasets.

翻訳日:2023-07-21 12:08:57 公開日:2023-07-20

# 高密度サンプルディープラーニング

Dense Sample Deep Learning ( http://arxiv.org/abs/2307.10991v1 )

ライセンス: Link先を確認

Stephen Josè Hanson, Vivek Yadev, Catherine Hanson

(参考訳) 1980年代に最初に提案されたニューラルネットワークアルゴリズムの変種であるdeep learning(dl)は、言語翻訳、タンパク質の折り畳み、自動運転車、最近では人間に似た言語モデル(チャットボット)に至るまで、人工知能(ai)において驚くべき進歩を遂げた。ディープラーニング(dl)ネットワークの利用は増加しているが、これらのネットワークをさまざまなアプリケーションで効果的にする学習メカニズムや表現については、実際にはほとんど理解されていない。答えの一部はアーキテクチャの巨大なスケールでなければならないし、もちろんデータの大規模なスケールでなければならない。しかし、深層学習表現の性質はほとんど不明である。残念なことに、数百万から数十億のトークンを持つトレーニングセットには未知のコンビネータがあり、数百万から数十億の隠れたユニットを持つネットワークは容易に可視化できず、そのメカニズムは容易に明らかにできない。本稿では,これらの質問を高密度サンプルタスク(最低500個以上のトークンを含む5つのユニークなトークン)における大きな (1.24M 重量; VGG) DL を用いて探索し,カテゴリ構造と特徴構成の出現をより注意深く追従することを可能にする。これらの結果から,dlの学習ダイナミクスに関する基礎的な観察を収集し,本研究に基づく複雑な特徴構築の新たな理論を提案する。

Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (chatbots), all of which seemed intractable until very recently. Despite the growing use of Deep Learning (DL) networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and of course the large scale of the data, since not much has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, nor can their mechanisms be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL in a novel high density sample task (5 unique tokens with at minimum 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.

翻訳日:2023-07-21 12:08:43 公開日:2023-07-20

# 機械学習回帰におけるトレーニングセット充填距離の最小化の検討

Investigating minimizing the training set fill distance in machine learning regression ( http://arxiv.org/abs/2307.10988v1 )

ライセンス: Link先を確認

Paolo Climaco and Jochen Garcke

(参考訳) 多くの機械学習回帰手法は予測モデルをトレーニングするために大きなデータセットを利用する。しかし、計算上の制限やラベル付けコストが高いため、大規模なデータセットを使用することは不可能である。したがって、計算効率を保ちながらモデル性能を最大化するためには、未ラベルデータポイントのプールから小さなトレーニングセットをサンプリングすることが不可欠である。本研究では,選択した集合の充填距離を最小化するためのサンプリング手法を提案する。我々は,データ特徴の知識を条件として,トレーニングセット満杯距離に線形に依存する最大予測誤差の上限を導出する。経験的検証のために、2つのデータセット上で2つの回帰モデルを用いて実験を行う。実験により, 充填距離を最小化することを目的としたトレーニングセットの選択により, 境界を最小化することで, 各種回帰モデルの最大予測誤差を大幅に低減し, 既存のサンプリングアプローチを高いマージンで上回ることを示した。

Many machine learning regression methods leverage large datasets for training predictive models. However, using large datasets may not be feasible due to computational limitations or high labelling costs. Therefore, sampling small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining computational efficiency. In this work, we study a sampling approach that aims to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error that depends linearly on the training set fill distance, conditional on the knowledge of data features. For empirical validation, we perform experiments using two regression models on two datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing the bound, significantly reduces the maximum prediction error of various regression models, outperforming existing sampling approaches by a large margin.
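
As a rough illustration of the quantity discussed in this abstract, the sketch below computes the fill distance of a selected subset over a finite pool (the largest distance from any pool point to its nearest selected point) and picks a subset with greedy farthest-point sampling. The abstract does not say which selection algorithm the authors use; farthest-point sampling is swapped in here as a standard greedy heuristic for this objective, and the grid data, function names, and Euclidean metric are assumptions made for the example.

```python
import math

def fill_distance(pool, selected):
    """Largest distance from any pool point to its nearest selected point."""
    return max(min(math.dist(x, s) for s in selected) for x in pool)

def farthest_point_sampling(pool, k, start=0):
    """Greedy selection: repeatedly add the pool point farthest from the chosen set."""
    chosen = [start]
    # nearest[i] holds the distance from pool[i] to the closest chosen point so far
    nearest = [math.dist(p, pool[start]) for p in pool]
    while len(chosen) < k:
        nxt = max(range(len(pool)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, pool[nxt])) for d, p in zip(nearest, pool)]
    return chosen

# Hypothetical pool: an 11 x 11 grid of 2-D feature vectors in the unit square.
pool = [(i / 10, j / 10) for i in range(11) for j in range(11)]

fps_idx = farthest_point_sampling(pool, k=5)
fps_set = [pool[i] for i in fps_idx]
edge_set = pool[:5]  # poorly spread baseline: five points along one edge

h_fps = fill_distance(pool, fps_set)
h_edge = fill_distance(pool, edge_set)
print(f"fill distance, farthest-point sampling: {h_fps:.3f}")
print(f"fill distance, edge baseline:           {h_edge:.3f}")
```

The well-spread subset yields a smaller fill distance than the clustered one, which is the property the bound in the abstract rewards. Labels are never needed for the selection, only the feature vectors.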

翻訳日:2023-07-21 12:08:15 公開日:2023-07-20

# 機械的因果グラフによる決定理論の特徴付け

Characterising Decision Theories with Mechanised Causal Graphs ( http://arxiv.org/abs/2307.10987v1 )

ライセンス: Link先を確認

Matt MacDermott, Tom Everitt, and Francesco Belardinelli

(参考訳) 自分の決定は私の期待する成果に対する信念にどのように影響を与えるべきか? ある行動をとることで、自分自身をある種の人と見なすなら、他人が私をどう見ているか、そして私と似た人をどう見ているかに影響を与えます。これは私の期待するユーティリティ計算に影響し、どのアクションがベストかを変更できます。議論の対象となるかどうか、どのように考えるべきかは、明らかな決定理論、因果決定理論、機能的な決定理論を含む、議論の的となっている。本稿では、機械化された因果モデルを用いて、最も重要な決定理論を特徴づけ、区別し、異なる決定理論の分類を生成できることを示す。

How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and generate a taxonomy of different decision theories.
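
To make the contrast between decision theories concrete, the following sketch works through Newcomb's problem, a standard textbook illustration that is not taken from the abstract. The payoffs, the predictor accuracy, and all function names are assumptions chosen for the example; the point is only that conditioning on one's own action (evidential) and holding the box contents fixed (causal) can rank the same two actions differently.

```python
def edt_values(accuracy, big=1_000_000, small=1_000):
    """Evidential view: my action is evidence about what the predictor foresaw."""
    one_box = accuracy * big                 # box is full if one-boxing was predicted
    two_box = (1 - accuracy) * big + small   # box is full only if the predictor erred
    return {"one_box": one_box, "two_box": two_box}

def cdt_values(p_full, big=1_000_000, small=1_000):
    """Causal view: the box contents are already fixed, with probability p_full of being full."""
    one_box = p_full * big
    two_box = p_full * big + small           # taking both boxes always adds `small`
    return {"one_box": one_box, "two_box": two_box}

def best(values):
    """Return the action with the highest expected utility."""
    return max(values, key=values.get)

print("EDT prefers:", best(edt_values(accuracy=0.99)))
print("CDT prefers:", best(cdt_values(p_full=0.5)))
```

Under these assumptions the evidential calculation favours one-boxing while the causal calculation favours two-boxing for every value of `p_full`, which is the kind of disagreement a taxonomy of decision theories has to account for.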

翻訳日:2023-07-21 12:07:59 公開日:2023-07-20

# metric3d: 1つの画像からゼロショットメトリック3d予測へ

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image ( http://arxiv.org/abs/2307.10984v1 )

ライセンス: Link先を確認

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

(参考訳) 画像から正確な3dシーンを再構築することは、長年のビジョン課題だ。単一像再構成問題の不備により、最もよく確立された手法は多視点幾何学に基づいている。 state-of-the-art (sota) 単眼距離推定法は単一のカメラモデルしか処理できず、距離曖昧性のため混合データトレーニングを行うことができない。一方、大きな混合データセットで訓練されたsoma単眼法は、実世界のメトリクスを復元できないアフィン不変深さを学習することでゼロショット一般化を達成する。本研究では,ゼロショット単眼距離モデルにおける鍵は,大規模データトレーニングと様々なカメラモデルによる距離曖昧性解消の組み合わせにあることを示す。そこで本稿では,曖昧性問題に明示的に対処し,既存の単眼モデルに無益に接続可能な標準カメラ空間変換モジュールを提案する。当社のモジュールを搭載した単眼モデルは、数千台のカメラモデルを備えた800万以上のイメージで安定してトレーニングすることが可能です。 7つのゼロショットベンチマークでSOTA性能を示す実験を行った。特に,本手法は,第2回単眼深度推定チャレンジで優勝した。提案手法は, ランダムに収集したインターネット画像上での計測3次元構造の正確な復元を可能にする。潜在的な利点は下流のタスクにまで拡張され、モデルにプラグインするだけで大幅に改善できます。例えば,本モデルではモノクロSLAMのスケールドリフト問題(第1図)を緩和し,高品質な計量スケール高密度マッピングを実現する。コードはhttps://github.com/YvanYin/Metric3Dで入手できる。

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models. Equipped with our module, monocular models can be stably trained with over 8 million images with thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular-SLAM (Fig. 1), leading to high-quality metric scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.
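
The metric ambiguity mentioned in this abstract can be illustrated with a pinhole camera: an object of fixed size looks identical at depth d through focal length f and at depth d * f_c / f through focal length f_c. The sketch below rescales depth into a hypothetical canonical camera and back. It is only one plausible reading of a canonical camera space transformation; the function names, the canonical focal length of 1000 pixels, and the choice to rescale depth rather than resize the image are assumptions, not details confirmed by the abstract.

```python
def projected_size_px(object_size_m, depth_m, focal_px):
    """Pinhole model: image size in pixels of an object at a given depth."""
    return focal_px * object_size_m / depth_m

def to_canonical_depth(depth_m, focal_px, canonical_focal_px=1000.0):
    """Depth the same image content would have if seen by the canonical camera."""
    return depth_m * canonical_focal_px / focal_px

def from_canonical_depth(canonical_depth_m, focal_px, canonical_focal_px=1000.0):
    """Invert the rescaling to recover metric depth for the real camera."""
    return canonical_depth_m * focal_px / canonical_focal_px

# Two cameras that produce the same image of a 1 m object at different depths.
size_a = projected_size_px(1.0, depth_m=5.0, focal_px=500.0)
size_b = projected_size_px(1.0, depth_m=10.0, focal_px=1000.0)

canonical = to_canonical_depth(5.0, focal_px=500.0)
recovered = from_canonical_depth(canonical, focal_px=500.0)
print(size_a, size_b, canonical, recovered)
```

Because both cameras map to the same canonical depth, a network trained on canonical targets no longer receives conflicting labels for identical-looking inputs, and the known focal length restores metric scale at inference time.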

翻訳日:2023-07-21 12:07:47 公開日:2023-07-20

# クラスタ対応半教師付き学習:クラスタリングを学習する関係知識蒸留

Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering ( http://arxiv.org/abs/2307.11030v1 )

ライセンス: Link先を確認

Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward

(参考訳) 教師と生徒のモデル間の特徴(関係)にマッチする(関係)知識蒸留の実証的成功と実用的意義にもかかわらず、対応する理論解釈は様々な知識蒸留パラダイムに限定されている。本研究では, 半教師付き分類問題に着目し, 関係知識蒸留(RKD)の理論的理解に向けて最初の一歩を踏み出した。まず,教師モデルによって示される集団誘発グラフ上で,rkdをスペクトルクラスタリングとしてキャスティングすることから始める。予測値と基底値のクラスタリングのばらつきを定量化するクラスタリングエラーの概念を用いて,人口を超えたrkdがクラスタリングエラーの低減につながることを示す。さらに,非ラベルサンプルを限定してrkdに限定したサンプル複雑性を提供する。半教師付き学習では,クラスタ認識型半教師付き学習の一般的なフレームワークを通じて,クラスタリングエラーを想定するRKDのラベル効率をさらに向上する。最後に、このクラスタ対応フレームワークにデータの強化一貫性の規則化を統一することにより、正確なクラスタリングを学習する共通の効果にもかかわらず、rkdはスペクトルクラスタリングを通じて「グローバル」な視点を促進するが、一貫性の規則化は拡張を通じた「ローカル」な視点に焦点を当てる。

Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate the label efficiency of RKD through a general framework of cluster-aware semi-supervised learning that assumes low clustering errors. Finally, by unifying data augmentation consistency regularization into this cluster-aware framework, we show that despite the common effect of learning accurate clusterings, RKD facilitates a "global" perspective through spectral clustering, whereas consistency regularization focuses on a "local" perspective via expansion.

翻訳日:2023-07-21 12:02:16 公開日:2023-07-20

# ノイズ量子コンピュータ上でのサイクル離散時間量子ウォーク

Cycle discrete-time quantum walks on a noisy quantum computer ( http://arxiv.org/abs/2307.11027v1 )

ライセンス: Link先を確認

Vivek Wadhia, Nicholas Chancellor and Viv Kendon

(参考訳) 量子コンピューティングの急速な発展により、様々なアプリケーションに対する量子アルゴリズムへの関心が高まっている。量子ウォークは、量子アルゴリズムでの使用の可能性から、関心の高まりも経験している。 qiskitソフトウェアパッケージを使用して、ibmが提供する量子コンピュータの現在の世代がいかに正確にサイクル離散時間量子ウォークをシミュレートできるかをテストする。 ibmq_quitoとして知られるIBM量子デバイス上で、8ノード、8ステップウォーク、より単純な4ノード、4ステップの離散時間量子ウォークを実装し、各ウォークの各ステップに対する結果を示す。 ibmq_santiago量子デバイスのノイズレベルを少なくとも94%削減し、16ノード、16ステップサイクルの離散時間量子ウォークを適度な忠実度レベルにするために、カスタムノイズモデルを開発した。

The rapid development of quantum computing has led to increasing interest in quantum algorithms for a variety of different applications. Quantum walks have also experienced a surge in interest due to their potential use in quantum algorithms. Using the Qiskit software package, we test how accurately the current generation of quantum computers provided by IBM can simulate a cycle discrete-time quantum walk. We implement an 8-node, 8-step walk and a simpler 4-node, 4-step discrete-time quantum walk on an IBM quantum device known as ibmq_quito, and present the results for each step of the respective walks. A custom noise model is developed in order to estimate that noise levels in the ibmq_santiago quantum device would need to be reduced by at least 94% in order to execute a 16-node, 16-step cycle discrete-time quantum walk to a reasonable level of fidelity.
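
For readers who want a noiseless reference for the kind of walk described here, the sketch below simulates a discrete-time quantum walk on a cycle classically, using plain Python complex numbers. It is not the authors' Qiskit circuit: the Hadamard coin, the initial state, the shift convention, and the function name are all assumptions made for illustration.

```python
import math

def dtqw_cycle(n_nodes, n_steps, start=0):
    """Position probabilities after n_steps of a Hadamard-coin walk on an n_nodes cycle."""
    # amp[c][x] is the amplitude for coin state c (0 or 1) at node x.
    amp = [[0j] * n_nodes for _ in range(2)]
    amp[0][start] = 1 + 0j
    s = 1 / math.sqrt(2)
    for _ in range(n_steps):
        # Coin step: apply a Hadamard to the coin at every node.
        up = [s * (amp[0][x] + amp[1][x]) for x in range(n_nodes)]
        down = [s * (amp[0][x] - amp[1][x]) for x in range(n_nodes)]
        # Shift step: coin 0 moves to x + 1, coin 1 moves to x - 1 (mod n_nodes).
        amp[0] = [up[(x - 1) % n_nodes] for x in range(n_nodes)]
        amp[1] = [down[(x + 1) % n_nodes] for x in range(n_nodes)]
    return [abs(amp[0][x]) ** 2 + abs(amp[1][x]) ** 2 for x in range(n_nodes)]

one_step = dtqw_cycle(n_nodes=8, n_steps=1)
probs = dtqw_cycle(n_nodes=8, n_steps=8)
for node, p in enumerate(probs):
    print(f"node {node}: {p:.4f}")
```

Ideal distributions like this are what hardware results would typically be compared against when estimating fidelity; on an even-length cycle the walker can only occupy even nodes after an even number of steps, which gives a quick sanity check.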

翻訳日:2023-07-21 12:01:52 公開日:2023-07-20

# ストリーマー自己表現の再構築としてのVTubingの検討:アイデンティティ,パフォーマンス,ジェンダー

Investigating VTubing as a Reconstruction of Streamer Self-Presentation: Identity, Performance, and Gender ( http://arxiv.org/abs/2307.11025v1 )

ライセンス: Link先を確認

Qian Wan and Zhicong Lu

(参考訳) vtubers(virtual youtuber)は、アニメーション2dまたは3d仮想アバターを使ってストリーミングコンテンツを制作するライブストリーマーである。近年、世界中のVTuberクリエイターや視聴者の数が大幅に増加している。この実践は、視聴者のエンゲージメント行動や知覚などのトピックに研究の注意を向けてきたが、アニメーションアバターは、自身の身体を使用する従来のライブストリーミングよりもアイデンティティとパフォーマンスの柔軟性を提供するため、この柔軟性がクリエイター自身の提示方法にどのように影響するかはほとんど研究されていない。この研究は、16人の中国語話者のvtuberのストリーミングプラクティスの質的研究の結果を提示することで、このギャップを埋めようとしている。データによると、ライブストリーミングで使用された仮想アバターは、インフレーションされたプレゼンテーションを使ってクリエイターが自らをプレゼンする機会を与え、視聴者と包括的な対話をもたらした。結果はまた、虚偽の環境に置かれている間、VTubersの膨らみ、しばしばセクシュアライズされた性表現も明らかにした。 VTubingの社会技術的側面は、性嫌がらせや性差別を減らし、自己目的化の懸念も高めた。

VTubers, or Virtual YouTubers, are live streamers who create streaming content using animated 2D or 3D virtual avatars. In recent years, there has been a significant increase in the number of VTuber creators and viewers across the globe. This practice has drawn research attention to topics such as viewers' engagement behaviors and perceptions. However, although animated avatars offer more identity and performance flexibility than traditional live streaming, where streamers use their own bodies, little research has focused on how this flexibility influences how creators present themselves. This research thus seeks to fill this gap by presenting results from a qualitative study of 16 Chinese-speaking VTubers' streaming practices. The data revealed that the virtual avatars used while live streaming afforded creators opportunities to present themselves using inflated presentations and resulted in inclusive interactions with viewers. The results also unveiled the inflated, and often sexualized, gender expressions of VTubers while they were situated in misogynistic environments. The socio-technical facets of VTubing were found to potentially reduce sexual harassment and sexism, whilst also raising self-objectification concerns.

翻訳日:2023-07-21 12:01:37 公開日:2023-07-20

# 検索強化による大規模言語モデルの事実知識境界の検討

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation ( http://arxiv.org/abs/2307.11019v1 )

ライセンス: Link先を確認

Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang

(参考訳) 知識集約的なタスク(例えば、オープンドメイン質問応答(QA))は、かなりの量の事実知識を必要とし、しばしば援助のために外部情報に依存する。最近の大規模言語モデル(例えばchatgpt)は、知識集約的なタスクを含む、世界的知識による幅広いタスクの解決において印象的な能力を示している。しかし、LLMが実際の知識境界、特に検索強化を取り入れた場合の行動をどのように認識できるかは、まだ不明である。本研究では,オープンドメインQA上でのLLMの実態知識境界と検索の増大がLLMに与える影響について,初期分析を行った。特に,3つの主要な研究課題に焦点をあて,QA評価,事前判定,後部判定による分析を行った。 llmが質問に対する回答能力と回答の正確性に不当な自信を持っている証拠を示す。さらに,検索の強化は,llmsの知識境界に対する意識向上に有効なアプローチであることが証明され,その判断能力が向上した。さらに, LLMは, 回答の定式化に際し, 提案した検索結果に依存する傾向があり, これらの結果の質がそれらの信頼性に大きく影響することがわかった。この作業を再現するコードはhttps://github.com/RUCAIBox/LLM-Knowledge-Boundaryで公開されている。

Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. We also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.

翻訳日:2023-07-21 12:01:14 公開日:2023-07-20

# Amortized Variational Inference: When and Why?

Amortized Variational Inference: When and Why? ( http://arxiv.org/abs/2307.11018v1 )

ライセンス: Link先を確認

Charles C. Margossian and David M. Blei

(参考訳) amortized variational inference (a-vi) は確率モデルにおいて生じる難解な後方分布を近似する手法である。 A-VI の定義的特徴は、各観測結果を局所潜在変数の近似後部へマッピングする大域的推論関数を学ぶことである。これは、各潜在変数の近似分布のパラメータを直接学習するより古典的な因子化(平均場)変分推論(f-vi)とは対照的である。深層生成モデルでは、A-VIは局所潜伏変数の推論を高速化する計算トリックとして用いられる。本稿では, A-VI を F-VI の代替として検討した。 a-vi は、退化族が因子化された族の部分集合であるため、f-vi の最適解よりも低いkullback-leibler 分岐を持つ近似を生成することができない。したがって、中心的な理論的問題は、A-VIがF-VIの最適解を得るときに特徴づけることである。我々は、理論上F-VIの最適性を達成できるモデルと推論関数の両方の条件を導出する。より深い生成モデルを含む幅広い階層モデルに対して、A-VIとF-VIのギャップを埋めることが可能であることを示す。さらに、より広範なモデルのクラスでは、推論関数のドメインを拡張して償却を可能な戦略にする方法と方法を確立します。最後に、隠れマルコフモデルやガウス過程を含む特定のモデルにおいて、a-vi はどんなに表現力のある推論関数であっても f-vi の解と一致しないことを証明する。また、A-VIを実験的に研究する [...]

Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically [...]
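
The subset argument in this abstract can be written out in a few lines. The notation below (N observations x_n, local latent variables z_n, variational parameters ν_n, inference function f_φ) is assumed for illustration and may differ from the paper's own symbols.

```latex
% F-VI: one free variational parameter \nu_n per local latent variable z_n
q(z; \nu) = \prod_{n=1}^{N} q(z_n; \nu_n), \qquad
\nu^{*} = \operatorname*{arg\,min}_{\nu} \; \mathrm{KL}\big( q(z; \nu) \,\|\, p(z \mid x) \big)

% A-VI: the parameters are tied together through a shared inference function f_\phi
\nu_n = f_{\phi}(x_n), \qquad
\phi^{*} = \operatorname*{arg\,min}_{\phi} \; \mathrm{KL}\Big( \prod_{n=1}^{N} q\big(z_n; f_{\phi}(x_n)\big) \,\Big\|\, p(z \mid x) \Big)

% Every amortized solution is also a factorized one, so A-VI cannot do better than F-VI
\min_{\phi} \mathrm{KL} \;\ge\; \min_{\nu} \mathrm{KL},
\quad \text{with equality when some } \phi \text{ satisfies } f_{\phi}(x_n) = \nu_n^{*} \text{ for all } n
```

Read this way, the question of closing the gap becomes whether an inference function exists that maps each observation to its optimal F-VI parameter.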

翻訳日:2023-07-21 12:00:50 公開日:2023-07-20

# 心筋梗塞予測のための多目的ポイントクラウドオートエンコーダ

Multi-objective point cloud autoencoders for explainable myocardial infarction prediction ( http://arxiv.org/abs/2307.11017v1 )

ライセンス: Link先を確認

Marcel Beetz, Abhirup Banerjee, Vicente Grau

(参考訳) 心筋梗塞(mi)は、世界で最も一般的な死因の1つである。クリニックで一般的に使用される画像ベースのバイオマーカー、例えば放出分画は、心臓の3D解剖学におけるより複雑なパターンを捉えることができず、診断精度が制限される。本稿では,心臓解剖学と機能学の多クラス3dポイントクラウド表現に基づいて,梗塞予測のための新しい幾何学的深層学習手法として,多目的ポイントクラウドオートエンコーダを提案する。そのアーキテクチャは、低次元の潜在空間で接続された複数のタスク固有の分岐で構成され、リコンストラクションとmi予測の両方の効果的な多目的学習を可能にし、また、解釈可能な潜在空間で病理学的に特異的な3d形状情報をキャプチャする。さらに、ポイントクラウドベースのディープラーニング操作を備えた階層的ブランチ設計により、高分解能の解剖学的ポイントクラウド上で直接、効率的なマルチスケール機能学習が可能になる。大規模な英国バイオバンクデータセットを用いた実験では,マルチオブジェクト・ポイント・クラウド・オートエンコーダは,画像の画素解像度より下方にある予測と入力の解剖学の間のチャムファー距離で,複数の時間的3次元形状を正確に再構成することができる。提案手法は,入射MI予測処理における複数の機械学習および深層学習ベンチマークを,受信者動作曲線の下での面積で19%向上させる。また,そのタスクに特有なコンパクトな潜在性空間は,対象の符号化と対応する3次元形状との間に臨床的に妥当な関係を持つ分離可能な制御およびmiクラスターを示し,その予測可能性を示す。

Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.
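
Since reconstruction quality in this abstract is reported as a Chamfer distance, a small sketch of that metric may help. The version below is one common symmetric form (mean nearest-neighbour Euclidean distance in both directions); the abstract does not state which variant the authors use, and some definitions use squared distances instead. The function name and the toy point clouds are assumptions.

```python
import math

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two point clouds (lists of 3-D tuples)."""
    def one_way(src, dst):
        # Average distance from each source point to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)

# Hypothetical toy clouds: a unit square and a copy shifted by 0.5 along z.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
shifted = [(x, y, z + 0.5) for x, y, z in square]

d_same = chamfer_distance(square, square)
d_shift = chamfer_distance(square, shifted)
print(d_same, d_shift)
```

The brute-force nearest-neighbour search is quadratic in the number of points, so a spatial index would be needed for high-resolution anatomy point clouds.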

翻訳日:2023-07-21 12:00:19 公開日:2023-07-20

# 未知の動的システムのためのフローマップ学習:概要,実装,ベンチマーク

Flow Map Learning for Unknown Dynamical Systems: Overview, Implementation, and Benchmarks ( http://arxiv.org/abs/2307.11013v1 )

ライセンス: Link先を確認

Victor Churchill, Dongbin Xiu

(参考訳) フローマップ学習(FML)は、ディープニューラルネットワーク(DNN)とともに、未知の動的システムのデータ駆動モデリングを約束している。 FMLの注目すべき特徴は、正確な数学的モデルが存在しなくても、部分的に観測されたシステムの正確な予測モデルを作成することができることである。本稿では、FMLフレームワークの概要と、その実装を成功させるために重要な計算の詳細について述べる。また,未知の力学系を学習するための,よく定義されたベンチマーク問題も提示する。これらの問題の数値的な詳細は、それらのfmlの結果とともに示され、問題を横断的に検証し、結果が再現可能であることを保証する。

Flow map learning (FML), in conjunction with deep neural networks (DNNs), has shown promise for data-driven modeling of unknown dynamical systems. A remarkable feature of FML is that it is capable of producing accurate predictive models for partially observed systems, even when their exact mathematical models do not exist. In this paper, we present an overview of the FML framework, along with the important computational details for its successful implementation. We also present a set of well-defined benchmark problems for learning unknown dynamical systems. All the numerical details of these problems are presented, along with their FML results, to ensure that the problems are accessible for cross-examination and the results are reproducible.

翻訳日:2023-07-21 11:59:53 公開日:2023-07-20

# 深層学習テストのためのニューロン感度誘導型テストケース選択

Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing ( http://arxiv.org/abs/2307.11011v1 )

ライセンス: Link先を確認

Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Bocheng Xiao, Heming Cui

(参考訳) Deep Neural Networks (DNN)は様々なタスク(例えば自律運転、医療診断)に対処するためにソフトウェアに広くデプロイされている。しかし、経済的な損失を招き、人間の安全を脅かす誤った行動も生み出す可能性がある。 DNNの誤った振る舞いを明らかにして修正するために、DNN開発者はしばしば、自然界から豊富なラベル付けされていないデータセットを収集し、それらをラベル付けしてDNNモデルをテストする。しかし、多くのラベルのないデータセットを適切にラベル付けすることは、非常に高価で時間がかかります。上記の問題に対処するために,nss(neuron sensitivity guided test case selection)を提案し,ラベルなしデータセットから有用なテストケースを選択することでラベリング時間を短縮する。 NSSは、テストケースによって引き起こされる内部ニューロンの情報を利用して、重要なテストケースを選択する。 sotaベースライン法と比較して,広範に使用される4つのデータセットとよく設計された4つのdnnモデルを用いてnssを評価する。その結果,nssはテストケースの障害トリガ発生確率とモデル改善能力の評価に有効であることがわかった。具体的には、ベースラインアプローチと比較して高いフォールト検出率(例えばMNIST & LeNet1実験でラベルなしデータセットから5%のテストケースを選択する場合、nssはベースラインより20%高い81.8%のフォールト検出率を得ることができる)を得ることができる。

Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal and repair incorrect behaviors in DNNs, developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled datasets is a highly expensive and time-consuming task. To address this problem, we propose NSS, Neuron Sensitivity guided test case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neurons' information induced by test cases to select valuable test cases, that is, those highly likely to cause the model to behave incorrectly. We evaluate NSS with four widely used datasets and four well-designed DNN models, comparing it with SOTA baseline methods. The results show that NSS performs well in assessing the test cases' probability of fault triggering and model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault detection rate, 20% higher than the baselines).

翻訳日:2023-07-21 11:59:42 公開日:2023-07-20

# 反射エントロピーと計算可能なクロスノルムネガティビティ:自由理論と対称性の解決について

On reflected entropy and computable cross-norm negativity: Free theories and symmetry resolution ( http://arxiv.org/abs/2307.11009v1 )

ライセンス: Link先を確認

Clément Berthiere and Gilles Parez

(参考訳) 計算可能なクロスノーム(CCNR)と,CCNR負性度(CCNR Negativity)と呼ばれる関連量に基づく分離性基準を検討する。 CCNR負性率の反射バージョンを導入し、その関係を他の確立された絡み合い関連量、すなわち反射エントロピーと作用素エントロピーとを議論する。自由フェルミオン理論とボゾン理論では、2点相関関数の項で正確な公式を導出し、体系的な数値的な研究と原理的には解析的処理を可能にする。大域的な$U(1)$対称性を持つ系に対しては、対称性を解いた反射エントロピーとCCNR負性度を研究する。我々は隣接する区間の荷電モーメントに対する共形場理論(cft)の結果を提供し、数値との完全な一致を求める。我々は,自由フェルミオンモデルと自由ボソンモデルの両方に対して,反射エントロピーとCCNR負の平衡を観察する。最初の電荷依存補正はフェルミオンに対して予想され、ボソンのcft計算から導かれる。

We investigate a separability criterion based on the computable cross-norm (CCNR), and a related quantity called the CCNR negativity. We introduce a reflected version of the CCNR negativity, and discuss its connection with other well-established entanglement-related quantities, namely the reflected entropy and the operator entanglement entropy. For free fermionic and bosonic theories, we derive exact formulas in terms of two-point correlation functions, which allow for systematic numerical investigations and, in principle, analytical treatments. For systems with a global $U(1)$ symmetry, we study the symmetry-resolved reflected entropy and CCNR negativity. We provide conformal field theory (CFT) results for the charged moments in the case of adjacent intervals, finding perfect agreement with the numerics. We observe an equipartition of reflected entropies and CCNR negativities, for both free-fermion and free-boson models. The first charge-dependent corrections are conjectured for fermions, and worked out from the CFT calculations for bosons.

翻訳日:2023-07-21 11:59:15 公開日:2023-07-20

# 二重非絡み合い操作による蒸留性絡み合い

Distillable entanglement under dually non-entangling operations ( http://arxiv.org/abs/2307.11008v1 )

ライセンス: Link先を確認

Ludovico Lami, Bartosz Regula

(参考訳) ノイズ量子状態からエンタングルメントを蒸留できる正確な速度を計算することは、量子情報における最も長い疑問の1つである。 dne(dually non-entangling)オペレーションのセットの下で、絡み合い蒸留の正確な解を与える -- 一般的に考えられる局所操作と古典的コミュニケーションの緩和であり、分離可能な状態と測定のセットを保存するすべてのチャネルを含んでいる。本研究では, DNE蒸留可能なエンタングルメントは, 議論を分離可能な測定で測定する正規化相対エントロピーの修正版と一致することを示す。 ours は、エンタングルメント理論における任意の種類の自由操作の下での蒸留可能なエンタングルメントの2番目に知られている正規化公式である。我々の発見の直接の結果は、DNEの下では、絡み合った状態から絡み合いを蒸留できるということである。第2の主結果として,dne蒸留性エンタングルメントの一般上界を構成することにより,エンタングルメントの分離可能な相対エントロピーが、エンタングルメントの標準相対エントロピーの正規化よりも厳密に小さいことを証明した。これは [Li/Winter, CMP 326, 63 (2014)] の開問題を解く。

Computing the exact rate at which entanglement can be distilled from noisy quantum states is one of the longest-standing questions in quantum information. We give an exact solution for entanglement distillation under the set of dually non-entangling (DNE) operations -- a relaxation of the typically considered local operations and classical communication, comprising all channels which preserve the sets of separable states and measurements. We show that the DNE distillable entanglement coincides with a modified version of the regularised relative entropy of entanglement in which the arguments are measured with a separable measurement. Ours is only the second known regularised formula for the distillable entanglement under any class of free operations in entanglement theory, after that given by Devetak and Winter for one-way LOCCs. An immediate consequence of our finding is that, under DNE, entanglement can be distilled from any entangled state. As our second main result, we construct a general upper bound on the DNE distillable entanglement, using which we prove that the separably measured relative entropy of entanglement can be strictly smaller than the regularisation of the standard relative entropy of entanglement. This solves an open problem in [Li/Winter, CMP 326, 63 (2014)].

翻訳日:2023-07-21 11:58:55 公開日:2023-07-20

# 科学ワークフローにおけるネットワーク内記憶キャッシュの有効性と予測可能性

Effectiveness and predictability of in-network storage cache for scientific workflows ( http://arxiv.org/abs/2307.11069v1 )

ライセンス: Link先を確認

Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas

(参考訳) 大規模な科学的なコラボレーションでは、複数の科学者が同じファイルセットにアクセスし、異なる分析を行い、遠くにある大量の共有データに繰り返しアクセスする。これらのデータアクセスは、距離による遅延が長く、広域ネットワーク上で利用可能な帯域幅が限られている。広域ネットワークトラフィックとデータアクセス遅延を低減するため、新しいネットワークサービスとして地域データストレージキャッシュがインストールされている。科学的応用におけるキャッシュシステムの有効性を検討するため,南カリフォルニアのペタバイトスケールキャッシュを用いて高エネルギー物理実験を行った。約3TBの運用ログを調べることで、このキャッシュはワイドエリアネットワークから67.6%のファイルリクエストを削除し、ワイドエリアネットワーク上のトラフィック量を平均12.3TB(35.4%)削減した。トラフィック量(35.4%)の削減は、より大きなファイルが再利用される可能性が低いため、ファイル数(67.6%)の削減よりも少ない。このデータアクセスパターンの違いにより、キャッシュシステムは、より大きなファイルを処理する際に小さなファイルを削除しないようにポリシーを実装している。また、キャッシュ動作の予測可能性を研究するための機械学習モデルを構築します。テストの結果、このモデルはキャッシュアクセス、キャッシュミス、ネットワークスループットを正確に予測することができ、将来のリソースのプロビジョニングと計画に関する研究に役立ちます。

Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which creates repeated accesses to large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access latency, regional data storage caches have been installed as a new networking service. To study the effectiveness of such a cache system in scientific applications, we examine the Southern California Petabyte Scale Cache for a high-energy physics experiment. By examining about 3TB of operational logs, we show that this cache removed 67.6% of file requests from the wide-area network and reduced the traffic volume on the wide-area network by 12.3TB (or 35.4%) on an average day. The reduction in the traffic volume (35.4%) is less than the reduction in file counts (67.6%) because the larger files are less likely to be reused. Due to this difference in data access patterns, the cache system has implemented a policy to avoid evicting smaller files when processing larger files. We also build a machine learning model to study the predictability of the cache behavior. Tests show that this model is able to accurately predict the cache accesses, cache misses, and network throughput, making the model useful for future studies on resource provisioning and planning.
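
The gap between the two reported reductions (67.6% of file requests versus 35.4% of traffic volume) follows from how hit rates are counted. The sketch below computes a per-file and a per-byte hit rate on a made-up access log in which small files are reused more often than large ones. The log entries and function names are hypothetical; the final line only back-computes the daily total implied by the abstract's figures, assuming 35.4% refers to the share of all requested bytes.

```python
def hit_rates(accesses):
    """accesses: list of (size_tb, served_from_cache) tuples.

    Returns (fraction of requests served from cache, fraction of bytes served from cache).
    """
    hit_sizes = [size for size, hit in accesses if hit]
    total_bytes = sum(size for size, _ in accesses)
    return len(hit_sizes) / len(accesses), sum(hit_sizes) / total_bytes

# Hypothetical log: small files are mostly cache hits, large files hit only half the time.
toy_log = (
    [(0.001, True)] * 6 + [(0.001, False)] * 2
    + [(0.010, True)] * 1 + [(0.010, False)] * 1
)
file_rate, byte_rate = hit_rates(toy_log)
print(f"requests served from cache: {file_rate:.1%}")
print(f"bytes served from cache:    {byte_rate:.1%}")

# Back-of-the-envelope: if 12.3TB is 35.4% of the daily requested volume,
# the implied daily total is 12.3 / 0.354 TB.
implied_daily_total_tb = 12.3 / 0.354
print(f"implied daily requested volume: {implied_daily_total_tb:.1f} TB")
```

In the toy log the per-byte rate falls below the per-file rate for the same reason the abstract gives: the large files that dominate traffic are the ones least likely to be reused.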

翻訳日:2023-07-21 11:49:58 公開日:2023-07-20
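The gap between the 67.6% reduction in file requests and the 35.4% reduction in traffic volume in the abstract above comes down to the difference between a file hit rate and a byte hit rate. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the `Request` record, the `hit_rates` helper, and the toy log are all hypothetical and chosen only to show how one large, rarely reused file pulls the byte hit rate below the file hit rate.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One hypothetical log entry: file size and whether the cache served it."""
    size_bytes: int
    cache_hit: bool


def hit_rates(requests):
    """Return (file_hit_rate, byte_hit_rate) for a non-empty list of requests."""
    total_files = len(requests)
    total_bytes = sum(r.size_bytes for r in requests)
    hit_files = sum(1 for r in requests if r.cache_hit)
    hit_bytes = sum(r.size_bytes for r in requests if r.cache_hit)
    return hit_files / total_files, hit_bytes / total_bytes


# Toy log (made-up numbers): small files are mostly hits, the large file is a miss.
toy_log = [
    Request(size_bytes=1, cache_hit=True),
    Request(size_bytes=1, cache_hit=True),
    Request(size_bytes=1, cache_hit=False),
    Request(size_bytes=7, cache_hit=False),
]
file_rate, byte_rate = hit_rates(toy_log)
print(f"file hit rate = {file_rate:.2f}, byte hit rate = {byte_rate:.2f}")
```

In this toy log half of the requests are served from the cache, but only a fifth of the bytes are, which is the same qualitative pattern the abstract reports.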

# CNOS:CADベースの新しいオブジェクトセグメンテーションのための強力なベースライン

CNOS: A Strong Baseline for CAD-based Novel Object Segmentation ( http://arxiv.org/abs/2307.11067v1 )

ライセンス: Link先を確認

Van Nguyen Nguyen, Tomas Hodan, Georgy Ponimatkin, Thibault Groueix, Vincent Lepetit

(参考訳) CADモデルを用いて,RGB画像中の未確認オブジェクトを分割する手法を提案する。最近の強力な基盤モデルであるDINOv2とSegment Anythingを活用して、記述子を作成し、与えられた入力RGBイメージのバイナリマスクを含む提案を生成する。 CADモデルから生成された参照記述子と提案を一致させることで、モーダルマスクとともに正確なオブジェクトID割り当てを実現する。我々は,本手法がCADに基づく新しいオブジェクトセグメンテーションにおいて,BOP課題の7つのコアデータセットに対する既存のアプローチを,同一のBOP評価プロトコルを用いて19.8%のAPで上回っていることを示す。ソースコードはhttps://github.com/nv-nguyen/cnosで入手できます。

We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along with modal masks. We experimentally demonstrate that our method achieves state-of-the-art results in CAD-based novel object segmentation, surpassing existing approaches on the seven core datasets of the BOP challenge by 19.8% AP using the same BOP evaluation protocol. Our source code is available at https://github.com/nv-nguyen/cnos.

翻訳日:2023-07-21 11:49:35 公開日:2023-07-20

# ディープラーニングモデルに基づく運転政策予測

Driving Policy Prediction based on Deep Learning Models ( http://arxiv.org/abs/2307.11058v1 )

ライセンス: Link先を確認

Fuxiao Liu

(参考訳) 本研究では,通常のカメラからの映像フレームの視覚的特徴と点群スキャナからの深度情報を組み合わせたエンドツーエンドシステムを構築し,運転方針(車両速度と操舵角度)を予測する。予測結果を実世界の熟練ドライバーによる標準的な行動と比較することにより,システムの安全性を検証した。実験結果から,テストケースの少なくとも半数(モデルによって50%〜80%)で精度の高い予測が可能であり,複合機能の利用はビデオフレームのみを使用するよりも,ほとんどのケースで性能が向上した。

In this project, we implemented an end-to-end system that takes in combined visual features of video frames from a normal camera and depth information from a point cloud scanner, and predicts driving policies (vehicle speed and steering angle). We verified the safety of our system by comparing the predicted results with standard behaviors by real-world experienced drivers. Our test results show that the predictions can be considered accurate in at least half of the testing cases (50% to 80%, depending on the model), and that using combined features improved performance in most cases compared with using video frames only.

翻訳日:2023-07-21 11:49:19 公開日:2023-07-20

# 二次元テンソルネットワーク計算の複雑さに関するランダムな洞察

Random insights into the complexity of two-dimensional tensor network calculations ( http://arxiv.org/abs/2307.11053v1 )

ライセンス: Link先を確認

Sofia Gonzalez-Garcia, Shengqi Sang, Timothy H. Hsieh, Sergio Boixo, Guifre Vidal, Andrew C. Potter and Romain Vasseur

(参考訳) 射影絡み合いペア状態(PEPS)は、絡み合い領域の法則に従う量子多体状態のメモリ効率の表現を提供し、二次元(2d)凝縮物質系における基底状態の古典的なシミュレーションの基礎である。しかし、厳密な結果は、2d PEPS状態から観測可能なものを正確に計算することは、一般に計算的に難しい問題であることを示している。しかし、2d PEPSの計算特性の近似スキームは、(狭すぎる)凝縮物質基底状態の大きなサブクラスに対して、定期的に使われ、経験的に成功と見られる。本研究では, ランダム行列理論の哲学を取り入れ, 解析的マッピングを応用し, 大きな結合次元で制御された解析を許容する効果的な複製統計力学モデルに活用し, 概ね2次元ランダムペップを収縮する複雑性を解析する。この統計力学レンズを通して、我々は次のように論じる。一ランダムPEPSのおよそのサンプリング波動関数振幅は、臨界結合次元を超える計算複雑相転移に直面している。二任意の有限結合次元のノルム及び相関関数を総称的に推定することができる。これらの結果は、様々なボンド次元体制に対して数値的に支持される。乱数PEPSに対する上記の結果が、物理的に関連する基底状態を表すPEPSにもより一般的に適用されるかどうかは、重要な未解決問題である。

Projected entangled pair states (PEPS) offer memory-efficient representations of some quantum many-body states that obey an entanglement area law, and are the basis for classical simulations of ground states in two-dimensional (2d) condensed matter systems. However, rigorous results show that exactly computing observables from a 2d PEPS state is generically a computationally hard problem. Yet approximation schemes for computing properties of 2d PEPS are regularly used, and empirically seen to succeed, for a large subclass of (not too entangled) condensed matter ground states. Adopting the philosophy of random matrix theory, in this work we analyze the complexity of approximately contracting a 2d random PEPS by exploiting an analytic mapping to an effective replicated statistical mechanics model that permits a controlled analysis at large bond dimension. Through this statistical-mechanics lens, we argue that: i) although approximately sampling wave-function amplitudes of random PEPS faces a computational-complexity phase transition above a critical bond dimension, ii) one can generically efficiently estimate the norm and correlation functions for any finite bond dimension. These results are supported numerically for various bond-dimension regimes. It is an important open question whether the above results for random PEPS apply more generally also to PEPS representing physically relevant ground states.

翻訳日:2023-07-21 11:49:08 公開日:2023-07-20

# hrfnet:衛星画像のローカライズのための高解像度偽造ネットワーク

HRFNet: High-Resolution Forgery Network for Localizing Satellite Image Manipulation ( http://arxiv.org/abs/2307.11052v1 )

ライセンス: Link先を確認

Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo

(参考訳) 既存の高解像度衛星画像偽造ローカライズ手法はパッチベースまたはダウンサンプリングベースのトレーニングに依存している。これらのトレーニング手法には、プリスタンと偽造領域の境界の不正確さ、不要なアーティファクトの生成など、大きな欠点がある。本稿では,高分解能画像分割文学に触発された課題に対処するため,衛星画像のフォージェリーローカライゼーションを効果的に実現するためのHRFNetと呼ばれる新しいモデルを提案する。具体的には, 浅い枝と深い枝が組み合わさったモデルにより, RGB と再サンプリング機能を大域的および局所的に統合し, フォージェリーをより正確にローカライズすることができる。メモリ要求と処理速度は既存手法と比較して損なわれないが,本手法が最高の性能を達成することを示すため,様々な実験を行った。

Existing high-resolution satellite image forgery localization methods rely on patch-based or downsampling-based training. Both of these training methods have major drawbacks, such as inaccurate boundaries between pristine and forged regions, the generation of unwanted artifacts, etc. To tackle the aforementioned challenges, inspired by the high-resolution image segmentation literature, we propose a novel model called HRFNet to enable satellite image forgery localization effectively. Specifically, equipped with shallow and deep branches, our model can successfully integrate RGB and resampling features in both global and local manners to localize forgery more accurately. We perform various experiments to demonstrate that our method achieves the best performance, while the memory requirement and processing speed are not compromised compared to existing methods.

翻訳日:2023-07-21 11:48:42 公開日:2023-07-20

# 目標へのパンクラルム:ヒューマン・イン・ザ・ループフィードバックによる目標条件付き探索

Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback ( http://arxiv.org/abs/2307.11049v1 )

ライセンス: Link先を確認

Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta

(参考訳) 探索と報酬の仕様は強化学習の基本的かつ相互に絡み合った課題である。逐次的な意思決定タスクの解決には、報酬関数の慎重な設計や、新規な探索ボーナスの使用が必要である。ヒューマンスーパーバイザーは、探索プロセスを指示するためにループ内で効果的なガイダンスを提供することができるが、このガイダンスを利用する以前の方法は、常に同期した高品質な人間のフィードバックを必要とする。本研究では,非熟練ユーザからの低品質のフィードバックを,散発的で非同期でノイズの多い,ヒューマンガイド探索(huge)と呼ばれる手法を提案する。 HuGEは、シミュレーションだけでなく、実世界でも、厳密な報酬仕様なしで強化学習の探索をガイドしている。人間のフィードバックは探索を手助けするが、探索データから自己監督された学習はバイアスのない政策を生み出す。この手順は、騒々しく非同期な人間のフィードバックを利用して、手作りの報酬設計や探索ボーナスなしでポリシーを学ぶことができる。 HuGEは、専門家でないユーザからのクラウドソースフィードバックを使用して、シミュレーションにおいて、さまざまな困難なマルチステージロボットナビゲーションと操作タスクを学ぶことができる。さらに、このパラダイムは、人間のスーパーバイザーからの非同期フィードバックを使用して、現実世界のロボットで直接学習することができる。

Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.

翻訳日:2023-07-21 11:48:27 公開日:2023-07-20

# 連続的強化学習の定義

A Definition of Continual Reinforcement Learning ( http://arxiv.org/abs/2307.11046v1 )

ライセンス: Link先を確認

David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

(参考訳) 本稿では,継続的強化学習の基盤を開発する。

In this paper we develop a foundation for continual reinforcement learning.

翻訳日:2023-07-21 11:48:01 公開日:2023-07-20

# 有界エージェントの収束について

On the Convergence of Bounded Agents ( http://arxiv.org/abs/2307.11044v1 )

ライセンス: Link先を確認

David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

(参考訳) エージェントがいつ収束したか? 強化学習問題の標準モデルは収束の直接的な定義をもたらす: エージェントがそれぞれの環境状態における振る舞いや性能が変化しなくなると収束する。しかし,学習課題の焦点を環境状態からエージェントの状態へと移すにつれて,エージェントの収束の概念が著しく明確になる。本稿では,有界エージェントを中心とした強化学習問題のフレーミングにおけるエージェント収束の相補的な2つの説明を提案する。第一の見方では、有界エージェントは、エージェントの将来の振る舞いを記述するのに必要な最小の状態数が減少しないときに収束する。第2のビューでは、エージェントの内部状態が変更された場合にのみ、エージェントのパフォーマンスが変化するときのみ、境界エージェントが収束したと述べる。これらの2つの定義の基本的な性質を定め、標準設定における収束の典型的な見解を満たし、それらの性質と関係性に関するいくつかの事実を証明する。これらの視点、定義、分析は、分野の中心的な考え方に明確性をもたらす。

When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.

翻訳日:2023-07-21 11:48:00 公開日:2023-07-20

# Cascade-DETR: 高品質なユニバーサルオブジェクト検出

Cascade-DETR: Delving into High-Quality Universal Object Detection ( http://arxiv.org/abs/2307.11035v1 )

ライセンス: Link先を確認

Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan and Fisher Yu

(参考訳) 一般的な環境でのオブジェクトのローカライゼーションは、視覚システムの基本部分である。 COCOベンチマークで優位に立つ一方で、最近のTransformerベースの検出方法は多様なドメインで競合しない。さらに、これらの手法は複雑な環境でオブジェクトバウンディングボックスを正確に推定するのに苦労している。高品質な普遍物体検出のためのカスケードDETRを提案する。本稿では,対象中心情報を検出デコーダに明示的に統合するカスケード・アテンション・レイヤを提案することにより,多様な領域への一般化と局所化精度を両立させる。さらに精度を高めるために,クエリのスコアリングを再検討する。分類スコアに頼る代わりに、クエリの予想されるiouを予測することで、信頼性が大幅に向上します。最後に、多様なドメインから10のデータセットを含む汎用オブジェクト検出ベンチマークUDB10を紹介する。カスケード-DETRはCOCOの最先端を推し進める一方で、UDB10の全データセット上のDETRベースの検出器を大幅に改善している。厳密な品質要件による改善はさらに顕著である。私たちのコードとモデルはhttps://github.com/syscv/cascade-detrでリリースされる予定です。

Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to very accurately estimate the object bounding boxes in complex environments. We introduce Cascade-DETR for high-quality universal object detection. We jointly tackle the generalization to diverse domains and localization accuracy by proposing the Cascade Attention layer, which explicitly integrates object-centric information into the detection decoder by limiting the attention to the previous box prediction. To further enhance accuracy, we also revisit the scoring of queries. Instead of relying on classification scores, we predict the expected IoU of the query, leading to substantially more well-calibrated confidences. Lastly, we introduce a universal object detection benchmark, UDB10, that contains 10 datasets from diverse domains. While also advancing the state-of-the-art on COCO, Cascade-DETR substantially improves DETR-based detectors on all datasets in UDB10, even by over 10 mAP in some cases. The improvements under stringent quality requirements are even more pronounced. Our code and models will be released at https://github.com/SysCV/cascade-detr.

翻訳日:2023-07-21 11:47:43 公開日:2023-07-20
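The Cascade-DETR abstract above proposes scoring each query by its expected IoU rather than by its classification score. As a reminder of the quantity involved, here is a minimal sketch of intersection-over-union for two axis-aligned boxes. It is a generic textbook computation, not code from the Cascade-DETR repository; the function name `box_iou` and the `(x1, y1, x2, y2)` box convention are assumptions made for this illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x1 <= x2, y1 <= y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as having no overlap.
    return inter / union if union > 0 else 0.0


overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
identical = box_iou((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = box_iou((0, 0, 1, 1), (2, 2, 3, 3))
```

A detector that predicts this value for each query can rank boxes by how well they are expected to be localized, which is the motivation the abstract gives for better-calibrated confidences.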

# Embroid: 教師なし予測の平滑化は、わずかなショットの分類を改善できる

Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ( http://arxiv.org/abs/2307.11031v1 )

ライセンス: Link先を確認

Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

(参考訳) 近年の研究では、手動アノテーションが高価である領域において、言語モデル(LM)のプロンプトベースの学習機能がデータラベリングの自動化に適していることが示されている。課題は、初期プロンプトを書くのは安価だが、プロンプトを改善するのはコストがかかることだ。我々の研究は、ラベル付きデータを追加せずに、プロンプトベースの学習を改善することができるかどうかを問うものである。我々は,プロンプト自体ではなく,プロンプトの予測を変更することでこの問題にアプローチする。我々の直感では、正確な予測も一貫性があるべきである:ある特徴表現の下で類似したサンプルは、同じプロンプト予測を受けなければならない。 Embroidは、異なる埋め込み関数の下でデータセットの複数の表現を計算し、近隣のサンプルに対するLM予測間の整合性を利用して誤予測を識別する手法である。次にembroidは、これらの近傍を使用して各サンプルに対する追加の予測を作成し、これらの予測を単純な潜在変数のグラフィカルモデルと組み合わせて最終補正された予測を生成する。 Embroidの理論解析に加えて、6つの異なるLMと最大95の異なるタスクに対して厳密な経験的評価を行う。その結果,(1)エンブロイドは元々のプロンプト(例えばgpt-jtの平均7.3ポイント)よりも大幅に性能が向上し,(2)より洗練されたプロンプト戦略(例えばチェーン・オブ・マインド)の改善を実現し,(3)埋め込み関数を通じて法のような領域に特化できることがわかった。

Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.

翻訳日:2023-07-21 11:47:22 公開日:2023-07-20
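The intuition in the Embroid abstract above, that samples which are close under some embedding should receive the same prediction, can be illustrated with a deliberately simplified sketch. The code below is not Embroid itself: it uses a single embedding and a plain majority vote, whereas the abstract describes multiple embedding functions combined through a latent variable graphical model. The helper names, the choice of `k`, and the toy data are all hypothetical.

```python
import math


def knn_indices(embeddings, i, k):
    """Indices of the k nearest neighbours of sample i (Euclidean), excluding i itself."""
    dists = [(math.dist(embeddings[i], e), j) for j, e in enumerate(embeddings) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smooth_predictions(embeddings, preds, k=2):
    """Replace each binary prediction by the majority vote of itself and its k neighbours."""
    smoothed = []
    for i in range(len(preds)):
        votes = [preds[i]] + [preds[j] for j in knn_indices(embeddings, i, k)]
        smoothed.append(1 if 2 * sum(votes) > len(votes) else 0)
    return smoothed


# Toy data: two well-separated clusters, each with one prediction that disagrees
# with its neighbours.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
preds = [1, 1, 0, 0, 0, 1]
smoothed = smooth_predictions(embeddings, preds, k=2)
```

With an odd number of votes per sample (the sample plus two neighbours) there are no ties, and the two outlying predictions are flipped to agree with their clusters.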

# 量子相関に関するデータ駆動基準

Data-driven criteria for quantum correlations ( http://arxiv.org/abs/2307.11091v1 )

ライセンス: Link先を確認

Mateusz Krawczyk, Jarosław Pawłowski, Maciej M. Maśka, and Katarzyna Roszak

(参考訳) ランダムに生成された状態に対して教師なしの方法で訓練されたニューラルネットワークを用いて、3量子ビットシステム内の相関を検出する機械学習モデルを構築する。ネットワークは分離可能な状態を認識せざるを得ず、相関状態は異常として検出される。極めて驚くべきことに、提案する検出器は、絡み合いよりも弱い量子相関、すなわち量子不一致を区別するのに優れていることがわかった。実際、絡み合い検出の最適しきい値においても、絡み合い状態の集合を極端に過大評価する傾向があり、不協和状態の集合をはるかに少ない程度に過小評価する傾向にある。量子相関性(quantum-correlated)として分類される状態の性質を説明するために、様々な種類の状態を含むダイアグラムを構築します。認識損失のゼロに近い値は、特に図表上のこの集合の非自明な形状を考慮すると、非識別分離状態の形状を高精度に再現する。ネットワークアーキテクチャは、分離性を保持し、その出力は、キュービットの置換に関して等しく変化する。部分的トレース操作のみを利用するベースラインモデルよりもはるかに優れた検出精度を得るためには,アーキテクチャの選択が重要であることを示す。

We build a machine learning model to detect correlations in a three-qubit system using a neural network trained in an unsupervised manner on randomly generated states. The network is forced to recognize separable states, and correlated states are detected as anomalies. Quite surprisingly, we find that the proposed detector performs much better at distinguishing a weaker form of quantum correlations, namely, the quantum discord, than entanglement. In fact, it has a tendency to grossly overestimate the set of entangled states even at the optimal threshold for entanglement detection, while it underestimates the set of discordant states to a much lesser extent. In order to illustrate the nature of states classified as quantum-correlated, we construct a diagram containing various types of states -- entangled, as well as separable, both discordant and non-discordant. We find that the near-zero value of the recognition loss reproduces the shape of the non-discordant separable states with high accuracy, especially considering the non-trivial shape of this set on the diagram. The network architecture is designed carefully: it preserves separability, and its output is equivariant with respect to qubit permutations. We show that the choice of architecture is important to get the highest detection accuracy, much better than for a baseline model that just utilizes a partial trace operation.

翻訳日:2023-07-21 11:41:57 公開日:2023-07-20

# l-eval:long context language modelの標準化評価

L-Eval: Instituting Standardized Evaluation for Long Context Language Models ( http://arxiv.org/abs/2307.11088v1 )

ライセンス: Link先を確認

Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, and Xipeng Qiu

(参考訳) 近年、単ターンの長い入力(例えば論文の要約)やより広範な歴史との会話を効果的に処理するために、命令追従モデルのコンテキストの長さを拡張することへの関心が高まっている。 GPT-4やClaudeのようなプロプライエタリなモデルは、数万のコンテキストトークンを扱う上でかなりの進歩を見せているが、オープンソースモデルは実験の初期段階にある。これらの長いコンテキストモデルの開発が、チャンク化されたコンテキストでのみ訓練された検索ベースの方法やモデルよりも、実用的な下流タスクにかなりの利益をもたらすかどうかも、まだ不明である。本稿では,この課題に対処するために,ロングコンテキスト言語モデルの標準化評価を行う。具体的には,411件の長文文書と,法律,金融,学校講義,長い会話,ニュース,長文小説,会議などの分野の著者が手作業で注釈とチェックを行った2000以上の質問応答ペアを含むL-Evalを開発した。 L-Evalは様々な評価手法や命令スタイルを採用しており、Long Context Language Models (LCLM) のより信頼性の高い評価を可能にしている。私たちの調査では、オープンソースモデルは一般的に商用モデルよりも遅れているものの、それでも素晴らしいパフォーマンスを示しています。 LLaMA2は、4kコンテキスト長しか持たないオープンエンドタスクにおいて最良の結果(turbo-16kに対する勝率45%)を達成し、ChatGLM2は8k入力トークンを持つクローズドエンドタスクにおいて最高の結果を得る。オープンソースLCLM、GPT4-32k、Claude-100kの予測を含む、新たな評価スイート、コード、およびすべての生成結果を https://github.com/OpenLMLab/LEval で公開する。

Recently, there has been growing interest in extending the context length of instruction-following models in order to effectively process single-turn long input (e.g. summarizing a paper) and conversations with more extensive histories. While proprietary models such as GPT-4 and Claude have demonstrated considerable advancements in handling tens of thousands of tokens of context, open-sourced models are still in the early stages of experimentation. It also remains unclear whether developing these long context models can offer substantial gains on practical downstream tasks over retrieval-based methods or models simply trained on chunked contexts. To address this challenge, we propose to institute standardized evaluation for long context language models. Concretely, we develop L-Eval which contains 411 long documents and over 2,000 query-response pairs manually annotated and checked by the authors encompassing areas such as law, finance, school lectures, lengthy conversations, news, long-form novels, and meetings. L-Eval also adopts diverse evaluation methods and instruction styles, enabling a more reliable assessment of Long Context Language Models (LCLMs). Our findings indicate that while open-source models typically lag behind their commercial counterparts, they still exhibit impressive performance. LLaMA2 achieves the best results (win 45% vs turbo-16k) on open-ended tasks with only 4k context length and ChatGLM2 achieves the best results on closed-ended tasks with 8k input tokens. We release our new evaluation suite, code, and all generation results including predictions from all open-sourced LCLMs, GPT4-32k, Claude-100k at https://github.com/OpenLMLab/LEval.

翻訳日:2023-07-21 11:41:30 公開日:2023-07-20

# PAPR: 近視的注意ポイントレンダリング

PAPR: Proximity Attention Point Rendering ( http://arxiv.org/abs/2307.11086v1 )

ライセンス: Link先を確認

Yanshu Zhang, Shichong Peng, Alireza Moazeni, Ke Li

(参考訳) スクラッチからシーン表面の正確で控えめなポイントクラウド表現を学ぶことは、3d表現学習の課題である。既存のポイントベース手法は、しばしば消失する勾配問題や、シーンの幾何学やテクスチャを正確にモデル化するために多くのポイントを必要とする。これらの制約に対処するため,我々は,ポイントベースのシーン表現と微分可能なレンダラからなる新しい手法である近接注意ポイントレンダリング(papr)を提案する。我々のシーン表現は、各点が空間的位置、前景スコア、ビュー非依存の特徴ベクトルによって特徴づけられる点雲を使用する。レンダラは、各光線に関する関連点を選択し、関連する特徴を用いて正確な色を生成する。 PAPRは、初期化がターゲットの幾何学と大きく異なる場合でも、適切なシーン幾何学を表現するために点雲の位置を効果的に学習する。特に,本手法では,相似点のみを用いて微細なテクスチャの詳細を抽出する。また,本手法の実用的応用として,幾何学的編集,オブジェクト操作,テクスチャ転送,露出制御の4つを挙げる。さらなる結果とコードは、プロジェクトのwebサイトhttps://zvict.github.io/papr/で閲覧できます。

Learning accurate and parsimonious point cloud representations of scene surfaces from scratch remains a challenge in 3D representation learning. Existing point-based methods often suffer from the vanishing gradient problem or require a large number of points to accurately model scene geometry and texture. To address these limitations, we propose Proximity Attention Point Rendering (PAPR), a novel method that consists of a point-based scene representation and a differentiable renderer. Our scene representation uses a point cloud where each point is characterized by its spatial position, foreground score, and view-independent feature vector. The renderer selects the relevant points for each ray and produces accurate colours using their associated features. PAPR effectively learns point cloud positions to represent the correct scene geometry, even when the initialization drastically differs from the target geometry. Notably, our method captures fine texture details while using only a parsimonious set of points. We also demonstrate four practical applications of our method: geometry editing, object manipulation, texture transfer, and exposure control. More results and code are available on our project website at https://zvict.github.io/papr/.

翻訳日:2023-07-21 11:40:55 公開日:2023-07-20

# 異常検出における表現学習:成功、限界、そして大きな挑戦

Representation Learning in Anomaly Detection: Successes, Limits and a Grand Challenge ( http://arxiv.org/abs/2307.11085v1 )

ライセンス: Link先を確認

Yedid Hoshen

(参考訳) 本稿では,異常検出における支配的パラダイムは無限にスケールできず,最終的には基本的限界に達することを論じる。これは、異常検出のための無料ランチの原則がないためである。これらの制限は、多くの産業的タスクと同様に、強いタスク前がある場合に克服できる。このような事前処理が存在しない場合、そのタスクは異常検出よりもずっと難しい。異常検出のための大きな課題として,2つの課題を挙げる。一異常検出による科学的発見 ii) imagenetデータセットにおける最も異常な画像を検出する「ミニグランド」チャレンジ。これらの課題を克服するためには、新たな異常検出ツールやアイデアを開発する必要があると考えています。

In this perspective paper, we argue that the dominant paradigm in anomaly detection cannot scale indefinitely and will eventually hit fundamental limits. This is due to a no-free-lunch principle for anomaly detection. These limitations can be overcome when there are strong task priors, as is the case for many industrial tasks. When such priors do not exist, the task is much harder for anomaly detection. We pose two such tasks as grand challenges for anomaly detection: i) scientific discovery by anomaly detection, and ii) a "mini-grand" challenge of detecting the most anomalous image in the ImageNet dataset. We believe new anomaly detection tools and ideas would need to be developed to overcome these challenges.

翻訳日:2023-07-21 11:40:31 公開日:2023-07-20

# 量子ログスペース計算の検証

Quantum Logspace Computations are Verifiable ( http://arxiv.org/abs/2307.11083v1 )

ライセンス: Link先を確認

Uma Girish, Ran Raz, Wei Zhan

(参考訳) 本稿では、量子対数計算が古典的対数アルゴリズムによって検証され、無条件のセキュリティを持つことを観察する。より正確には、BQLの全ての言語は量子対数証明器と古典対数検証器を備えた(情報理論的に安全な)ストリーミング証明を持つ。証明者は、検証子にストリームされる多項式長証明を提供する。検証者は、その証明に対する一方向の読み取りアクセスを持ち、計算が正しく行われたことを検証できる。すなわち、入力が言語内にあり、証明者が正直であれば、検証者は高い確率で受け入れ、その入力が言語内でなければ、証明者は、たとえ証明者が逆であるとしても、高い確率で拒否する。さらに、検証者は$O(\log n)$ランダムビットのみを使用する。

In this note, we observe that quantum logspace computations are verifiable by classical logspace algorithms, with unconditional security. More precisely, every language in BQL has an (information-theoretically secure) streaming proof with a quantum logspace prover and a classical logspace verifier. The prover provides a polynomial-length proof that is streamed to the verifier. The verifier has a read-once one-way access to that proof and is able to verify that the computation was performed correctly. That is, if the input is in the language and the prover is honest, the verifier accepts with high probability, and, if the input is not in the language, the verifier rejects with high probability even if the prover is adversarial. Moreover, the verifier uses only $O(\log n)$ random bits.

翻訳日:2023-07-21 11:39:56 公開日:2023-07-20

# GLSFormer : 手術ビデオにおけるステップ認識のための長い短いシーケンス変換器

GLSFormer : Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos ( http://arxiv.org/abs/2307.11081v1 )

ライセンス: Link先を確認

Nisarg A. Shah, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel

(参考訳) 外科的ステップの自動認識は、手術中の患者の安全性と意思決定を大幅に改善する重要な課題である。既存の外科的段階認識のための最先端の手法は、空間情報と時間情報の分離した多段階モデリングに依存するか、あるいは、共同で学習した場合に短距離時間分解能で操作する。しかし、時空間的特徴と長距離情報の共同モデリングの利点は考慮されていない。本稿では,フレームレベルのパッチのシーケンスから時空間的特徴を直接学習するビジョントランスフォーマによるアプローチを提案する。本手法では,短期・長期の時空間特徴表現をインテリジェントに組み合わせたゲート時間アテンション機構を組み込んだ。 2つの白内障手術ビデオデータセット(白内障101とd99)に対するアプローチを広範囲に評価し,最先端の手法と比較して優れた性能を示す。これらの結果は, 手術ステップの自動認識における提案手法の適合性を検証する。私たちのコードは、https://github.com/nisargshah1999/GLSFormer でリリースされています。

Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer

翻訳日:2023-07-21 11:39:35 公開日:2023-07-20
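The GLSFormer abstract above mentions a gated mechanism that combines short-term and long-term feature representations but does not spell out its form. One common way to realize such a gate, shown here purely as an assumed illustration and not as the authors' architecture, is an element-wise convex blend `g * short + (1 - g) * long` with `g` obtained from a sigmoid. In a real model the gate logits would be produced by learned layers; here they are passed in by hand, and all names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short_feat, long_feat, gate_logits):
    """Blend two equal-length feature vectors element-wise: g * short + (1 - g) * long."""
    fused = []
    for s, l, z in zip(short_feat, long_feat, gate_logits):
        g = sigmoid(z)  # g near 1 favours the short-term feature, near 0 the long-term one
        fused.append(g * s + (1.0 - g) * l)
    return fused


# A logit of 0 gives an equal mix; a large positive logit selects the short-term value.
fused = gated_fusion([1.0, 1.0], [3.0, 3.0], [0.0, 50.0])
```

The appeal of this kind of gate is that the model can decide per feature how much recent context versus long-range context to trust, without a hard switch between the two.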

# Brain2Music:人間の脳活動から音楽を再構築する

Brain2Music: Reconstructing Music from Human Brain Activity ( http://arxiv.org/abs/2307.11078v1 )

ライセンス: Link先を確認

Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto

(参考訳) 人間の脳活動から経験を再構築するプロセスは、脳が世界をどのように解釈し、表現するかというユニークなレンズを提供する。本稿では,機能的磁気共鳴画像(fMRI)を用いて,脳活動から音楽の再構成を行う手法を提案する。本手法では,fMRIデータからの埋め込みを条件とした音楽検索やMusicLM音楽生成モデルを用いる。生成された音楽は、ジャンル、楽器、ムードといった意味的特性に関して、人間の被験者が経験した音楽刺激に類似している。ボクセル単位の符号化モデル解析により,MusicLMの異なる成分と脳活動の関係について検討した。さらに,音楽刺激の純粋テキスト記述から得られる情報を表現する脳領域についても論じる。我々は https://google-research.github.io/seanet/brain2music で再構成された音楽の例を含む補足資料を提供する。

The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music

翻訳日:2023-07-21 11:39:15 公開日:2023-07-20

# AlignDet: オブジェクト検出における事前トレーニングと微調整の調整

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection ( http://arxiv.org/abs/2307.11077v1 )

ライセンス: Link先を確認

Ming Li, Jie Wu, Xionghui Wang, Chen Chen, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan

(参考訳) 大規模事前学習のパラダイムと下流の微調整は様々な物体検出アルゴリズムで広く採用されている。本稿では,既存の手法における事前学習手順と微調整手順との間に,検出器の性能,一般化能力,収束速度を暗黙的に制限する,データ,モデル,タスクの差異を明らかにする。この目的のために、我々は、様々な既存の検出器に適応可能な統合事前学習フレームワークであるAlignDetを提案する。 AlignDetは事前トレーニングプロセスを、イメージドメインとボックスドメイン事前トレーニングの2つのステージに分離する。イメージドメイン事前トレーニングは検出バックボーンを最適化し、総合的な視覚的抽象化をキャプチャし、ボックスドメイン事前トレーニングはインスタンスレベルのセマンティクスとタスクアウェアの概念を学習し、バックボーンから部品を初期化する。自己教師付きバックボーンを組み込むことで、様々な検出器のための全てのモジュールを教師なしパラダイムで事前訓練することができる。図1に示すように、AlignDetが検出アルゴリズム、モデルバックボーン、データ設定、トレーニングスケジュールなど、さまざまなプロトコルで大幅に改善できることが、広範な実験で示されています。例えば、AlignDetはFCOSを5.3mAPで改善し、RetinaNetを2.1mAPで、Faster R-CNNを3.3mAPで、DETRを2.3mAPで改善した。

The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedures in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.

翻訳日:2023-07-21 11:39:02 公開日:2023-07-20

# 人間のメッシュ回復のための高密度紫外線コンプリート学習

Learning Dense UV Completion for Human Mesh Recovery ( http://arxiv.org/abs/2307.11074v1 )

ライセンス: Link先を確認

Yanjun Wang, Qingping Sun, Wenjia Wang, Jun Ling, Zhongang Cai, Rong Xie, Li Song

(参考訳) 単一画像からの人間のメッシュ再構築は、自己や物体、あるいは他の人間によって引き起こされるオクルージョンの存在下では困難である。既存の手法では、人間の特徴を正確に分離できないか、機能補完のための適切な監督を欠いている。本稿では,密接な対応地図を利用して閉塞処理を行う2段階の手法であるDense Inpainting Human Mesh Recovery (DIMR)を提案する。提案手法は,高密度対応マップを用いて視覚的特徴を分離し,注目機能補完モジュールを用いた高密度UVマップ上での人間の特徴を補完する。また、未使用の機能から学習するためのネットワークを誘導する機能拡張訓練手順を設計する。提案手法を複数のデータセット上で評価し,その性能を他の手法と比較した。広汎な実験により,従来のSOTA法よりも高い性能を示し,標準ベンチマーク(3DPW)において同等の結果が得られた。

Human mesh reconstruction from a single image is challenging in the presence of occlusion, which can be caused by self, objects, or other humans. Existing methods either fail to separate human features accurately or lack proper supervision for feature completion. In this paper, we propose Dense Inpainting Human Mesh Recovery (DIMR), a two-stage method that leverages dense correspondence maps to handle occlusion. Our method utilizes a dense correspondence map to separate visible human features and completes human features on a structured, dense human UV map with an attention-based feature completion module. We also design a feature inpainting training procedure that guides the network to learn from unoccluded features. We evaluate our method on several datasets and demonstrate its superior performance under heavily occluded scenarios compared to other methods. Extensive experiments show that our method clearly outperforms prior SOTA methods on heavily occluded images and achieves comparable results on the standard benchmarks (3DPW).

翻訳日:2023-07-21 11:38:35 公開日:2023-07-20

# OBJECT 3DIT:言語誘導型3D対応画像編集

OBJECT 3DIT: Language-guided 3D-aware Image Editing ( http://arxiv.org/abs/2307.11073v1 )

ライセンス: Link先を確認

Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta

(参考訳) 既存の画像編集ツールは強力だが、画像が投影される基礎となる3D幾何学は無視される。その結果、これらのツールを用いた編集は、画像形成プロセスの基礎となる幾何学的条件や照明条件から切り離される可能性がある。本研究では,画像中のオブジェクトを,下層の3Dシーンの文脈で言語命令に従って編集する,言語誘導型3D対応編集の新規要求を定式化する。この目標に向けての進展を促進するために、手続き的に生成された3Dシーンから作成される400Kの編集例からなるデータセットOBJECTをリリースする。それぞれの例は、入力画像、言語による編集命令、および編集画像からなる。 4つの編集タスクのためのシングルおよびマルチタスクモデルである3ditも紹介する。私たちのモデルでは、周囲の物体、表面、照明条件、影、物理的に表現可能な物体構成など、シーン全体の3D構成を理解する能力が印象的です。驚くべきことに、3DITの編集能力は、OBJECTの合成シーンのみのトレーニングを現実のイメージに一般化する。

Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, an editing instruction in language, and the edited image. We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Surprisingly, despite training on only synthetic scenes from OBJECT, the editing capabilities of 3DIT generalize to real-world images.

翻訳日:2023-07-21 11:38:20 公開日:2023-07-20

PDF登録状況（公開日: 20230720）