Fugu-MT: arXiv Paper Translations

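This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.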

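The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).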

The papers listed here have a publication date of 20230808.

Title	Authors	Abstract	Publication date / Translation date
# LLMを使ったコードインテリジェンスタスクのコンテキスト内説明に何が役立つのか? What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? ( http://arxiv.org/abs/2304.07575v2 ) ライセンス: Link先を確認	Shuzheng Gao, Xin-Cheng Wen, Cuiyun Gao, Wenxuan Wang, Hongyu Zhang, Michael R. Lyu	(参考訳) トレーニング済みのソースコードモデルは、多くのコードインテリジェンスタスクで広く人気を集めている。近年、モデルとコーパスサイズのスケーリングにより、大きな言語モデルでは、コンテキスト内学習(icl)の能力が示されている。 iclはタスク命令といくつかの例をデモンストレーションとして使用し、そのデモンストレーションを言語モデルに入力して予測を行う。この新しい学習パラダイムはトレーニングフリーであり、様々な自然言語処理やコードインテリジェンスタスクで印象的なパフォーマンスを示している。しかし、ICLのパフォーマンスは、例えば選択された例のようなデモの質に大きく依存している。コード関連タスクの良質なデモンストレーションを構築する方法について体系的に調査することが重要である。本稿では,コードインテリジェンスタスクにおけるICLの性能に及ぼす3つの重要な要因 – 選択,順序,実演例の数 – の影響を実証的に検討する。コード要約、バグ修正、プログラム合成を含む3つのコードインテリジェンスタスクについて広範な実験を行った。実験の結果、上記の3つの要因がコードインテリジェンスタスクにおけるICLの性能に劇的な影響を及ぼすことが示された。さらに,本研究の成果を要約し,これらの3つの観点から効果的な実演の作り方を提案する。また,本研究に基づく注意深く設計されたデモンストレーションは,bleu-4,em,emを少なくとも9.90%,175.96%,50.81%,コード要約,バグフィックス,プログラム合成など,広く使用されているデモンストレーション構築手法に対して大幅に改善する可能性を示す。 Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of the model and corpus size, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then inputs the demonstrations to the language models for making predictions. This new learning paradigm is training-free and has shown impressive performance in various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, e.g., the selected examples. It is important to systematically investigate how to construct a good demonstration for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. 
We conduct extensive experiments on three code intelligence tasks including code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all the above three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations, taking into account these three perspectives. We also show that a carefully-designed demonstration based on our findings can lead to substantial improvements over widely-used demonstration construction methods, e.g., improving BLEU-4, EM, and EM by at least 9.90%, 175.96%, and 50.81% on code summarization, bug fixing, and program synthesis, respectively.	翻訳日:2023-10-24 12:47:40 公開日:2023-08-08
# シナリオ生成はSOTIFの準備が整っているか? 体系的な文献レビュー Is Scenario Generation Ready for SOTIF? A Systematic Literature Review ( http://arxiv.org/abs/2308.02273v2 ) ライセンス: Link先を確認	Lukas Birkemeyer, Christian King, Ina Schaefer	(参考訳) シナリオベースのテストは、高度な運転支援システムや自動運転システムを検証するための最先端技術と考えられている。SOTIF標準(ISO 21448)の正式ローンチにより、シナリオベースのテストは、これらの高度に自動化された運転システムのリリースにますます重要になる。しかし、本質的な欠落は、SOTIF標準の実践的適用を妨げる: シナリオベースのテストのシナリオを現実的に生成する方法? 本稿では,SOTIF規格の要件を満たすシナリオを生成する手法を特定するために,システム文献レビューを実施している。既存のシナリオ生成手法を分類し,SOTIF要件に関して生成されたシナリオの特性を評価する。実世界のどの詳細が生成されたシナリオでカバーされているのか、テスト対象のシステム固有のシナリオなのか、それともジェネリックなシナリオなのか、未知のシナリオと危険なシナリオのセットを最小限に抑えるよう設計されているのかを調査した。我々は,既存の技術で生成されたシナリオが,SOTIF規格に規定されている要件に従わないことを結論し,今後の研究の方向性を提案する。 Scenario-based testing is considered state-of-the-art to verify and validate Advanced Driver Assistance Systems or Automated Driving Systems. Due to the official launch of the SOTIF-standard (ISO 21448), scenario-based testing becomes more and more relevant for releasing those Highly Automated Driving Systems. However, an essential missing detail prevents the practical application of the SOTIF-standard: How to practically generate scenarios for scenario-based testing? In this paper, we perform a Systematic Literature Review to identify techniques that generate scenarios complying with requirements of the SOTIF-standard. We classify existing scenario generation techniques and evaluate the characteristics of generated scenarios with respect to SOTIF requirements. We investigate which details of the real-world are covered by generated scenarios, whether scenarios are specific for a system under test or generic, and whether scenarios are designed to minimize the set of unknown and hazardous scenarios. We conclude that scenarios generated with existing techniques do not comply with requirements implied by the SOTIF-standard; hence, we propose directions for future research.	翻訳日:2023-10-23 15:21:08 公開日:2023-08-08
# ファジングのためのモデルベーススクリプト合成 model-based script synthesis for fuzzing ( http://arxiv.org/abs/2308.04115v1 ) ライセンス: Link先を確認	Zian Liu, Chao Chen, Muhammad Ejaz Ahmed, Jun Zhang, Dongxi Liu	(参考訳) カーネルファジングは重要なカーネルの脆弱性を見つけるのに重要である。ソースコードの欠如により、クローズソース(例えばwindows)オペレーティングシステムカーネルのファジングはさらに困難である。既存のアプローチは、トレースからのsyscallシーケンスやシステムコードの静的解析をモデル化することでカーネルを混乱させる。しかしながら、一般的な制限は、異なるカーネル状態に到達するためにsyscallシーケンスを学習したり変更したりしないため、より多くのバグやクラッシュを引き起こす可能性があることである。本稿では,異なるカーネル状態に到達するためにトレースされたsyscallシーケンスを学習し,ミュートする手法であるwinkfuzzを提案する。 WinkFuzzは、トレースからsyscall依存性を学び、後続のsyscallを持つ可能性のあるトレース内の潜在的なsyscallを特定し、依存関係を適用して、依存関係をトレースに保存する。そして、WinkFuzzは合成された新しいsyscallシーケンスをファズしてシステムクラッシュを見つける。我々は,WinkFuzzを4種類のシードアプリケーションに適用し,シースコール数70.8\%,成功率61\%の合計増加を3つのインサートレベルで確認した。トレース時間,依存性解析,モデルスクリプトの復元,合成スクリプトの平均時間は,それぞれ600,39,34,129秒であった。瞬時のファジングレートは3742 syscall per secondである。しかし,初期化時間,待ち時間,その他の要因を考慮した場合,平均ファズ効率は毎秒155回まで低下した。私たちは各シードアプリケーションを24秒間ファズして、平均してその時間内に12.25回クラッシュしました。 Kernel fuzzing is important for finding critical kernel vulnerabilities. Close-source (e.g., Windows) operating system kernel fuzzing is even more challenging due to the lack of source code. Existing approaches fuzz the kernel by modeling syscall sequences from traces or static analysis of system codes. However, a common limitation is that they do not learn and mutate the syscall sequences to reach different kernel states, which can potentially result in more bugs or crashes. In this paper, we propose WinkFuzz, an approach to learn and mutate traced syscall sequences in order to reach different kernel states. WinkFuzz learns syscall dependencies from the trace, identifies potential syscalls in the trace that can have dependent subsequent syscalls, and applies the dependencies to insert more syscalls while preserving the dependencies into the trace. Then WinkFuzz fuzzes the synthesized new syscall sequence to find system crashes. 
We applied WinkFuzz to four seed applications and found a total increase in syscall number of 70.8%, with a success rate of 61%, within three insert levels. The average time for tracing, dependency analysis, recovering model script, and synthesizing script was 600, 39, 34, and 129 seconds respectively. The instant fuzzing rate is 3742 syscall executions per second. However, the average fuzz efficiency dropped to 155 syscall executions per second when the initializing time, waiting time, and other factors were taken into account. We fuzzed each seed application for 24 seconds and, on average, obtained 12.25 crashes within that time frame.	翻訳日:2023-10-23 15:13:00 公開日:2023-08-08
# Inverse Transparency Toolchain: 完全に統合され、素早くデプロイ可能なデータ使用ログインフラストラクチャ The Inverse Transparency Toolchain: A Fully Integrated and Quickly Deployable Data Usage Logging Infrastructure ( http://arxiv.org/abs/2308.04366v1 ) ライセンス: Link先を確認	Valentin Zieglmeier	(参考訳) 逆透明性は、従業員データのすべての使用を見える化することで実現される。これは、利用情報のロギングと保存を処理し、ログされたデータをデータ所有者に可視化するツールを必要とする。逆透過性を統合した研究と教育のコンテキストでは、必要なインフラストラクチャの構築が難しくなります。 Inverse Transparency Toolchainはこのようなシナリオに対して柔軟なソリューションを提供する。簡単にデプロイでき、密に統合できる。そこで本研究では,ユーザによる経験的学習,大学コースでのプロトタイピング,業界パートナによる実験を含むユースケースをうまく処理した。 Inverse transparency is created by making all usages of employee data visible to them. This requires tools that handle the logging and storage of usage information, and making logged data visible to data owners. For research and teaching contexts that integrate inverse transparency, creating this required infrastructure can be challenging. The Inverse Transparency Toolchain presents a flexible solution for such scenarios. It can be easily deployed and is tightly integrated. With it, we successfully handled use cases covering empirical studies with users, prototyping in university courses, and experimentation with our industry partner.	翻訳日:2023-10-23 15:01:30 公開日:2023-08-08
# 公正かつ包括的参加予算:累積および二次投票インタフェースを用いた投票経験 Fair and Inclusive Participatory Budgeting: Voter Experience with Cumulative and Quadratic Voting Interfaces ( http://arxiv.org/abs/2308.04345v1 ) ライセンス: Link先を確認	Thomas Welling, Fatemeh Banaie Heravan, Abhinav Sharma, Lodewijk Gelauff, Regula Haenggli, Evangelos Pournaras	(参考訳) 累積投票と2次投票は、特に参加予算の領域において、公平さと包摂性を促進する2つの分散投票方法である。これらの利点にもかかわらず、累積および二次投票のためのグラフィカル投票インタフェースは、実装と有効利用が複雑である。その結果、このような方法がデジタル投票プラットフォームで広く採用されることはなかった。本稿では,最先端の投票プラットフォームであるstanford participatory budgetingにおいて,累積投票と二次投票の実装と評価を導入することで課題を解決する。その結果、有権者は単純な方法を好むが、より表現力のある(かつ複雑な)累積投票の方が、単純だが表現力の低いkランク投票よりも好まれることがわかった。実装された投票インターフェース要素は有用であり、より表現力のある投票方法に対する投票者の好みを支持する。 Cumulative and quadratic voting are two distributional voting methods that are expressive, promoting fairness and inclusion, particularly in the realm of participatory budgeting. Despite these benefits, graphical voter interfaces for cumulative and quadratic voting are complex to implement and use effectively. As a result, such methods have not yet seen widespread adoption on digital voting platforms. This paper addresses the challenge by introducing an implementation and evaluation of cumulative and quadratic voting within a state-of-the-art voting platform: Stanford Participatory Budgeting. The findings of the study show that while voters prefer simple methods, the more expressive (and complex) cumulative voting becomes the preferred one compared to k-ranking voting that is simpler but less expressive. The implemented voting interface elements are found useful and support the observed voters' preferences for more expressive voting methods.	翻訳日:2023-10-23 15:01:21 公開日:2023-08-08
# オープンソースの機械学習製品のデータセットと分析 A Dataset and Analysis of Open-Source Machine Learning Products ( http://arxiv.org/abs/2308.04328v1 ) ライセンス: Link先を確認	Nadia Nahar, Haoran Zhang, Grace Lewis, Shurui Zhou, Christian K\"astner	(参考訳) 機械学習(ML)コンポーネントはソフトウェア製品にますます取り入れられているが、開発者はMLプロトタイプから製品に移行する上での課題に直面している。学術研究者は、これらの課題に対する解決策の提案と介入を評価するのに苦労している。本研究では,オープンソースのMLプロダクトを定義し,GitHubから262リポジトリのデータセットをキュレートし,さらなる研究と教育を促進する。まず、異なる開発活動に関する6つの幅広い研究課題を調査し、データセットから30のML製品のサンプルから21の調査結果を報告する。この結果から,今後の研究革新に十分な機会を提供するMLモデルの開発プラクティスやアーキテクチャ決定の多様さが明らかになった。また、オープンソースのML製品におけるモデルテストやパイプライン自動化といった業界のベストプラクティスの証拠はほとんどありません。 Machine learning (ML) components are increasingly incorporated into software products, yet developers face challenges in transitioning from ML prototypes to products. Academic researchers struggle to propose solutions to these challenges and evaluate interventions because they often do not have access to close-sourced ML products from industry. In this study, we define and identify open-source ML products, curating a dataset of 262 repositories from GitHub, to facilitate further research and education. As a start, we explore six broad research questions related to different development activities and report 21 findings from a sample of 30 ML products from the dataset. Our findings reveal a variety of development practices and architectural decisions surrounding different types and uses of ML models that offer ample opportunities for future research innovations. We also find very little evidence of industry best practices such as model testing and pipeline automation within the open-source ML products, which leaves room for further investigation to understand its potential impact on the development and eventual end-user experience for the products.	翻訳日:2023-10-23 15:01:06 公開日:2023-08-08
# 技術ノード組立プロセスのための自動機械視覚制御システム Automated machine vision control system for technological nodes assembly process ( http://arxiv.org/abs/2310.00005v1 ) ライセンス: Link先を確認	Nikolay Shtabel, Mikhail Saramud, Stepan Tkachev, Iakov Pikalov	(参考訳) 本稿では,小型宇宙船の組み立てのための自動制御システムの構築,技術的解決,実装の前提条件について論じる。各種の職場における個々のユニットの組み立て過程の制御とログを提供するシステムのハードウェアおよびソフトウェア実装の両方を解析する。本稿では, 組立技術, 特に低解像度のカメラを, 技術マークの形成と処理に特別なアルゴリズムを用いることにより, 機器の要求を低減させる手法を提案する。このツールでは、スレッド接続の締め付けトルクを制御し、所定のアルゴリズムによる無線制御による締め付けトルクを制限することができる。開発システムは、制御だけでなく、技術プロセスのロギング機能も提供しており、将来的には製品のデジタルツインを作成する際にも有用である。 The paper discusses the prerequisites for the creation, technical solutions and implementation of an automated control system for the assembly of a small spacecraft. Both the hardware and software implementation of the system that provides control and logging of the assembly process of individual units at various workplaces are analyzed. The article presents solutions to reduce the requirements for equipment used to control the assembly technology, in particular, to use cameras with a lower resolution, through the use of special algorithms for the formation and processing of technological marks. A tool is presented that allows you to control the tightening torques of threaded connections and limit the tightening torque according to a given algorithm with wireless control. The developed system provides the functions of not only control, but also logging of the technological process, which can be useful in the future when creating a digital twin of the product.	翻訳日:2023-10-23 05:24:59 公開日:2023-08-08
# アナログ回路を用いたMNISTデータセット学習の実装 Implementation Of MNIST Dataset Learning Using Analog Circuit ( http://arxiv.org/abs/2308.16307v1 ) ライセンス: Link先を確認	Minjae Kim	(参考訳) アナログ回路にニューラルネットワークを実装する試みは数多く行われている。それらの多くは多くの入力語を持ち、ほとんどの研究は、Spiceと呼ばれる回路シミュレーションプログラムを通じてアナログ回路にニューラルネットワークを実装し、チップを高コストで設計することを避け、入力する回路を直接実装した。本研究では,コンデンサとダイオードを用いてニューラルネットワークを実装し,マイクロコントローラ(Arduino Mega 2560 R3ボード)を用いて実世界のモデルを駆動し,結果を解析する。 There have been many attempts to implement neural networks in the analog circuit. Most of them had a lot of input terms, and most studies implemented neural networks in the analog circuit through a circuit simulation program called Spice to avoid the need to design chips at a high cost and implement circuits directly to input them. In this study, we will implement neural networks using a capacitor and diode and use microcontrollers (Arduino Mega 2560 R3 boards) to drive real-world models and analyze the results.	翻訳日:2023-09-03 21:22:54 公開日:2023-08-08
# 各種拡張を有する無定常確率型プッシュダウンシステムのモデルチェッキングPCTL特性 Model-Checking PCTL properties of Stateless Probabilistic Pushdown Systems with Various Extensions ( http://arxiv.org/abs/2209.10517v7 ) ライセンス: Link先を確認	Tianrong Lin	(参考訳) 本稿では、まず、無限状態系の確率的検証(具体的には、状態のない確率的プッシュダウン系)における開問題を解決する。我々は、モデルチェック {\em stateless probabilistic pushdown system (pBPA) が一般には決定不可能であることを示す。我々は「em確率的プッシュダウンシステム」と「emマルコフ連鎖」の量子アナログを定義し、本論文で定義された「em量子マルコフ連鎖」の分岐時間特性を記述するために「em確率的計算木論理」の量子アナログを定義する必要があるかどうかをさらに検討する。モデルチェック問題について検討し,計算木論理 (PCTL) に対する状態のない量子プッシュダウンシステム (qBPA) のモデルチェックが概ね決定不可能であることを示す。我々は「em確率的$\omega$-pushdown automaton」の概念を初めて定義し、"em stateless probabilistic $\omega$-pushdown system (\omega$-pbpa)} と$\omega$-pctl (chatterjee et al. in \cite{csh08}) とのモデルチェック問題を調べ、"em stateless probabilistic $\omega$-pushdown system (\omega$-pbpa)} と$\omega$-pctl のモデルチェックが一般に決定不能であることを示し、その結果を要約する。我々のアプローチは間接的に$\omega$-PCTLを符号化する公式を構築することである。 In this paper, we first resolve an open question in the probabilistic verification of infinite-state systems (specifically, the {\em stateless probabilistic pushdown systems}). We show that model checking {\em stateless probabilistic pushdown systems (pBPA)} against {\em probabilistic computational tree logic (PCTL)} is generally undecidable. We define the quantum analogues of the {\em probabilistic pushdown systems} and {\em Markov chains}, and further investigate whether it is necessary to define a quantum analogue of {\em probabilistic computational tree logic} to describe the branching-time properties of the {\em quantum Markov chain} defined in this paper. We study its model-checking question and show that the model-checking of {\em stateless quantum pushdown systems (qBPA)} against {\em probabilistic computational tree logic (PCTL)} is generally undecidable, with the immediate corollaries summarized. 
We define the notion of probabilistic $\omega$-pushdown automaton for the first time and study the model-checking question of stateless probabilistic $\omega$-pushdown systems ($\omega$-pBPA) against $\omega$-PCTL (defined by Chatterjee et al. in [CSH08]). We show that the model-checking of stateless probabilistic $\omega$-pushdown systems ($\omega$-pBPA) against $\omega$-PCTL is generally undecidable, with immediate consequences summarized. Our approach is to construct formulas of $\omega$-PCTL encoding the Post Correspondence Problem indirectly.	翻訳日:2023-08-27 05:32:15 公開日:2023-08-08
# AdaptEx: セルフサービスのコンテキストバンドプラットフォーム AdaptEx: A Self-Service Contextual Bandit Platform ( http://arxiv.org/abs/2308.08650v1 ) ライセンス: Link先を確認	William Black, Ercument Ilhan, Andrea Marchini and Vilda Markeviciute	(参考訳) 本稿では,Expedia Groupで広く利用されているセルフサービスコンテキスト型バンディットプラットフォームであるAdaptExについて述べる。 AdaptExは、各訪問者のユニークなコンテキストを考慮し、最適なバリエーションを選択し、それらが行うすべてのインタラクションから素早く学習する。従来のテストメソッドに関連するコストと時間を最小化しながら、ユーザエクスペリエンスを改善する強力なソリューションを提供する。このプラットフォームは、常に変化するコンテンツや継続的な"コールドスタート"状況でも、最適な製品ソリューションへのイテレーションを迅速に行うことができる。 This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, even in ever-changing content and continuous "cold start" situations gracefully.	翻訳日:2023-08-27 05:16:13 公開日:2023-08-08
# 人工知能のメタヒューリスティックアルゴリズムとバイオインフォマティクス, バイオ統計学, 生態学, 製造業への応用 Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology and, the Manufacturing Industries ( http://arxiv.org/abs/2308.10875v1 ) ライセンス: Link先を確認	Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong	(参考訳) 自然にインスパイアされたメタヒューリスティックアルゴリズムは、人工知能の重要なコンポーネントであり、様々な最適化問題に取り組むために、分野間でますます使われています。我々は,CSO-MAを用いた競合Swarm Optimizationrという,自然に着想を得たメタヒューリスティックアルゴリズムを新たに提案し,その柔軟性と性能を,統計学における様々な最適化問題に適用した。特に、アルゴリズムは効率的であり、様々なコスト構造や複数のユーザ指定非線形制約を組み込むことができる。私たちのアプリケーションには一単細胞一般化傾向モデルにおけるパラメータの最大推定値を求め、バイオインフォマティクスにおける擬似時間を研究する。 (ii)教育研究における一般的なraschモデルにおけるパラメータの推定 (iii)マルコフ更新モデルにおけるcox回帰のためのm-estimatesの探索と (4) 2つのコンパートメントモデルにおける欠落値を暗示する行列補完。さらに応用についても論じる。 (v)生態問題において最適な変数を選定し、 (vi)複数の相互作用因子をもつロジスティックモデルを用いて自動車産業のための燃料補給実験を設計する。 Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. 
In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.	翻訳日:2023-08-27 05:07:22 公開日:2023-08-08
# transtyler: 顔と身体のジェスチャー生成のためのマルチモーダルな動作スタイル転送 TranSTYLer: Multimodal Behavioral Style Transfer for Facial and Body Gestures Generation ( http://arxiv.org/abs/2308.10843v1 ) ライセンス: Link先を確認	Mireille Fares, Catherine Pelachaud, Nicolas Obin	(参考訳) 本稿では,仮想エージェントの行動表現スタイルを他のエージェントに移し,コミュニケーション的意味を持つ行動形態を保ちながら,行動表現スタイルを他のエージェントに移すことの課題について述べる。ここでは行動表現性スタイルを行動の質的特性と見なす。そこで我々は,TranSTYLerを提案する。TranSTYLerは,ソース話者のマルチモーダル動作をターゲット話者のスタイルで合成するマルチモーダルトランスフォーマーモデルである。行動表現スタイルは, テキスト, 音声, 身体ジェスチャー, 表情など, 様々なコミュニケーションのモダリティにまたがってコード化されていると仮定する。このモデルはスタイルとコンテンツの絡み合いスキーマを使用して、転送されたスタイルがソースの振る舞いによって伝達される意味に干渉しないようにします。提案手法は,スタイルラベルの必要性を排除し,トレーニング期間中に見られなかったスタイルへの一般化を可能にする。我々はPATSコーパスでモデルをトレーニングし、ダイアログや2D顔のランドマークを含むように拡張した。客観的および主観的評価は,本モデルがトレーニング中の見知らぬスタイルと見知らぬスタイルの両方において,アートモデルの状態よりも優れていたことを示している。そこで本稿では,コンテンツのリークや流儀の漏えい問題に対処するために,対象のスタイルに関連する動作やジェスチャーの伝達の程度を評価する手法を提案する。 This paper addresses the challenge of transferring the behavior expressivity style of a virtual agent to another one while preserving behaviors shape as they carry communicative meaning. Behavior expressivity style is viewed here as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal transformer based model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. We assume that behavior expressivity style is encoded across various modalities of communication, including text, speech, body gestures, and facial expressions. The model employs a style and content disentanglement schema to ensure that the transferred style does not interfere with the meaning conveyed by the source behaviors. Our approach eliminates the need for style labels and allows the generalization to styles that have not been seen during the training phase. We train our model on the PATS corpus, which we extended to include dialog acts and 2D facial landmarks. 
Objective and subjective evaluations show that our model outperforms state-of-the-art models in style transfer for both seen and unseen styles during training. To tackle the issues of style and content leakage that may arise, we propose a methodology to assess the degree to which behavior and gestures associated with the target style are successfully transferred, while ensuring the preservation of the ones related to the source content.	翻訳日:2023-08-27 05:06:38 公開日:2023-08-08
# スマートエネルギー管理のための電流特徴可視化に基づく非侵入電力負荷モニタリング手法 Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management ( http://arxiv.org/abs/2308.11627v1 ) ライセンス: Link先を確認	Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, and Sam Kwong	(参考訳) 最先端のスマートシティは、特に電力システムにおいて、大規模ネットワーク上で経済的に効率的なエネルギー管理を求められている。システム内の全ユーザの電力負荷を監視、分析、制御することが重要な問題である。本稿では,aiの一般的なコンピュータビジョン技術を用いて,スマートエネルギー管理のための非侵襲的負荷監視手法を提案する。まず,信号変換(ウェーブレット変換と離散フーリエ変換を含む)とグラミアン角場(GAF)法の両方を用いて,一次元の電流信号を2次元カラー特徴像にマッピングする。第2に,多スケール特徴抽出と注意機構を備えたu字型ディープニューラルネットワークを用いて,カラー特徴画像からすべての電気負荷を認識することを提案する。第3に,本手法をクラウドベースで非侵襲的な全ユーザモニタリングとして設計し,電力系統制御時の省エネルギー化を図る。大規模IoT(Internet of Things, モノのインターネット)上での効率的なエネルギー管理を支援することを目的として, 提案手法の有効性を実証した。 The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated our method achieves superior performances than its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT).	翻訳日:2023-08-27 04:59:12 公開日:2023-08-08
# PokerKit: 細粒度多変数ポーカーゲームシミュレーションのための総合Pythonライブラリ PokerKit: A Comprehensive Python Library for Fine-Grained Multi-Variant Poker Game Simulations ( http://arxiv.org/abs/2308.07327v1 ) ライセンス: Link先を確認	Juho Kim	(参考訳) PokerKitは、既存のポーカーゲームシミュレーションと手評価ツールの制限を克服するために設計された、オープンソースのPythonライブラリである。対照的に、ポーカーキットはポーカーの多種多様なバリエーションをサポートし、ユーザーが独自のゲームを定義するための柔軟なアーキテクチャを提供する。本稿では,ポーカーキットの設計と実装について詳述する。ポーカーキットは,直感的なプログラムapi,多変量ゲームサポート,さまざまな手のタイプにわたる統一的なハンド評価スイートなどである。 PokerKitの柔軟性により、ポーカーAI開発、ツール作成、オンラインポーカーカジノ実装など、さまざまな分野のアプリケーションが可能になる。 pokerkitの信頼性は、静的な型チェック、広範なdocテスト、ユニットテストによって確立され、97\%のコードカバレッジを達成している。 PokerKitの導入は、コンピュータポーカーの分野への重要な貢献であり、様々なポーカーゲームのための将来の研究と高度なAI開発を促進する。 PokerKit is an open-source Python library designed to overcome the restrictions of existing poker game simulation and hand evaluation tools, which typically support only a handful of poker variants and lack flexibility in game state control. In contrast, PokerKit significantly expands this scope by supporting an extensive array of poker variants and it provides a flexible architecture for users to define their custom games. This paper details the design and implementation of PokerKit, including its intuitive programmatic API, multi-variant game support, and a unified hand evaluation suite across different hand types. The flexibility of PokerKit allows for applications in diverse areas, such as poker AI development, tool creation, and online poker casino implementation. PokerKit's reliability has been established through static type checking, extensive doctests, and unit tests, achieving 97\% code coverage. The introduction of PokerKit represents a significant contribution to the field of computer poker, fostering future research and advanced AI development for a wide variety of poker games.	翻訳日:2023-08-20 16:29:00 公開日:2023-08-08
# 圧縮, 類似性検索, クラスタリング, 組織化, cDNAライブラリの操作改善のためのシーケンス類似性とコンテキストによるベクトル埋め込み Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries ( http://arxiv.org/abs/2308.05118v1 ) ライセンス: Link先を確認	Daniel H. Um, David A. Knowles, Gail E. Kaiser	(参考訳) 本稿では、フラット文字列遺伝子形式(FASTA/FASTQ5)の研究における、遺伝子の組織的数値表現の有用性を示す。 FASTA/FASTQファイルには、ファイルサイズ、マッピングとアライメントの処理速度の遅さ、コンテキスト依存など、いくつかの制限がある。これらの課題は、類似のシーケンスを見つけることに関わる調査やタスクを著しく妨げている。この解は、配列を別の表現に変換することで、生の配列自身と比較して、類似したグループへのクラスタリングを容易にする。各ショートシーケンスに独自のベクトル埋め込みを割り当てることで、cDNAライブラリの文字列表現に対する圧縮性能をより効率的にクラスタリングし、改善することができる。さらに,コドン三重項の文脈に基づく交互座標ベクトル埋め込みの学習により,アミノ酸特性に基づくクラスタリングを示すことができる。最後に、バーコードとcDNA配列をエンコードするためにこのシーケンス埋め込み法を用いることで、ユークリッド空間におけるベクトルの近接性を決定するアルゴリズムとベクトル埋め込みを結合することで、類似検索の時間的複雑さを向上させることができる。 This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ5). FASTA/FASTQ files have several current limitations, such as their large file sizes, slow processing speeds for mapping and alignment, and contextual dependencies. These challenges significantly hinder investigations and tasks that involve finding similar sequences. The solution lies in transforming sequences into an alternative representation that facilitates easier clustering into similar groups compared to the raw sequences themselves. By assigning a unique vector embedding to each short sequence, it is possible to more efficiently cluster and improve upon compression performance for the string representations of cDNA libraries. Furthermore, through learning alternative coordinate vector embeddings based on the contexts of codon triplets, we can demonstrate clustering based on amino acid properties. 
Finally, using this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of the similarity search by coupling vector embeddings with an algorithm that determines the proximity of vectors in Euclidean space; this allows us to perform sequence similarity searches in a quicker and more modular fashion.	翻訳日:2023-08-11 14:59:22 公開日:2023-08-08
# PTransIPs:タンパク質事前学習言語モデルとトランスフォーマーに基づくリン酸化部位の同定 PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer ( http://arxiv.org/abs/2308.05115v1 ) ライセンス: Link先を確認	Ziyang Xu and Haitian Zhong	(参考訳) リン酸化は多くの基本的な細胞プロセスの中心であり、様々な疾患の発症と進行に影響を与える。リン酸化部位の同定は、細胞やウイルス感染の分子機構を理解するための重要なステップであり、新たな治療標的となる可能性がある。本研究では,リン酸化部位の同定のための新しい深層学習モデルであるPTransIPを提案する。 PTransIPsは、タンパク質配列中のアミノ酸を自然言語の単語として扱い、配列中のアミノ酸の位置と型に基づくユニークなエンコーディングを抽出する。また、大きな事前訓練されたタンパク質モデルの埋め込みを追加のデータ入力として組み込む。 ptransipsはさらに、残差接続を持つ畳み込みニューラルネットワークと、マルチヘッドアテンション機構を備えたトランスフォーマーモデルの組み合わせモデルに基づいて訓練される。最後に、モデルは完全な連結層を通して分類結果を出力する。独立試験の結果、PTransIPsは既存の最先端手法よりも優れており、リン化S/T部位とY部位をそれぞれ同定するためのAUROCs 0.9232と0.9660が達成されている。さらに,プレトレーニングモデル埋め込みがPTransIPの性能に寄与することを示す。さらに、PTransIPsは、解釈可能なアミノ酸嗜好、可視訓練プロセスを有し、他の生物活性分類タスクにおける一般化性を示す。使用を容易にするため、コードとデータは \url{https://github.com/StatXzy7/PTransIPs} で公開されています。 Phosphorylation is central to numerous fundamental cellular processes, influencing the onset and progression of a variety of diseases. Identification of phosphorylation sites is thus an important step for understanding the molecular mechanisms of cells and virus infection, which potentially leads to new therapeutic targets. In this study, we present PTransIPs, a novel deep learning model for the identification of phosphorylation sites. PTransIPs treats amino acids in protein sequences as words in natural language, extracting unique encodings based on the types along with position of amino acids in the sequence. It also incorporates embeddings from large pre-trained protein models as additional data inputs. PTransIPS is further trained on a combination model of convolutional neural network with residual connections and Transformer model equipped with multi-head attention mechanisms. At last, the model outputs classification results through a fully connected layer. 
The results of independent testing reveal that PTransIPs outperforms existing state-of-the-art methodologies, achieving AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites respectively. In addition, ablation studies show that pretrained model embeddings contribute to the performance of PTransIPs. Furthermore, PTransIPs has interpretable amino acid preferences and a visible training process, and shows generalizability on other bioactivity classification tasks. To facilitate usage, our code and data are publicly accessible at https://github.com/StatXzy7/PTransIPs.	翻訳日:2023-08-11 14:59:00 公開日:2023-08-08
# ankylosing spondylitis に対する脊椎x線自動スコアリングの試み Towards Automatic Scoring of Spinal X-ray for Ankylosing Spondylitis ( http://arxiv.org/abs/2308.05123v1 ) ライセンス: Link先を確認	Yuanhan Mo and Yao Chen and Aimee Readie and Gregory Ligozio and Thibaud Coroller and Bart{\l}omiej W. Papie\.z	(参考訳) 脊椎X線画像におけるStoke Ankylosing Spondylitis Spinal Score (mSASSS) の適応による構造変化は, 骨形状の複雑さと画像品質の変化により, コストと時間を要する。本研究では,x線脊椎イメージングにおいて,頚椎・腰椎ユニット(vus)のmsasssスコアを自動予測するために,vertxgradenetと呼ばれる2段階の自動グレーディングパイプラインを試作することで,この課題に対処した。 VertXGradeNetは、以前開発したVU抽出パイプライン(VertXNet)によって生成されたVUを入力として使用し、それらのVUに基づいてmSASSSを予測する。 vertxgradenet は軸椎変形性関節症患者の頚椎外側x線および腰椎x線画像の社内データセットで評価した。以上の結果から,VertXGradeNetは,データ量に制限のある場合,各VUのmSASSSスコアを予測できることがわかった。全体として、4つの異なるmSASSSスコア(すなわち、2つのテストデータセットで0, 1, 2, 3)に対して0.56と0.51のバランスの取れた精度を達成することができる。この方法の精度は, 脊髄x線読影の合理化の可能性を示し, 今後の臨床試験の費用削減に寄与する。 Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by prototyping a 2-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for the cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. The VertXGradeNet utilizes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS based on those VUs. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for axial spondylarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU when the data is limited in quantity and imbalanced. Overall, it can achieve a balanced accuracy of 0.56 and 0.51 for 4 different mSASSS scores (i.e., a score of 0, 1, 2, 3) on two test datasets. 
The accuracy of the presented method shows the potential to streamline the spinal radiograph readings and therefore reduce the cost of future clinical trials.	翻訳日:2023-08-11 14:47:47 公開日:2023-08-08
# fMRIによる自閉症スペクトラム障害の予測 Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder ( http://arxiv.org/abs/2308.05122v1 ) ライセンス: Link先を確認	Nicha C. Dvornek, Catherine Sullivan, James S. Duncan, Abha R. Gupta	(参考訳) 自閉症スペクトラム障害(ASD)の多因子的エティロジーは、その研究が、神経画像、遺伝学、臨床評価など、幅広いプラットフォームからのデータを組み合わせたマルチモーダルアプローチから大きな恩恵を受けることを示唆している。以前のニューロイメージング・ジェネティック分析は、しばしば、データ駆動型作業においてナイーブな特徴結合アプローチを適用したり、あるモダリティからの発見を別のモダリティ分析のガイドに用いたりし、真に統一されたアプローチでペア化されたマルチモーダルデータを解析する機会を欠いた。本稿では、遺伝、人口統計、神経画像データを組み合わせたより統合的なモデルを開発する。遺伝子型が表現型に与える影響に着想を得て,モデル予測において重要な神経画像の特徴に注意を向ける注意型アプローチを提案する。遺伝データはコピー数の変化パラメータから、神経画像データは機能的磁気共鳴画像から得られる。 ASD分類と重大度予測タスクに対する提案手法を,228 ASDの性バランスデータセットを用いて評価し,典型的には10倍のクロスバリデーションフレームワークで被験者を育成する。遺伝情報,人口統計データ,機能的磁気共鳴画像を組み合わせた注意に基づくモデルが,他のマルチモーダル手法と比較して優れた予測性能をもたらすことを実証した。 The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. 
We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.	翻訳日:2023-08-11 14:47:20 公開日:2023-08-08
# インスツルメンテーション・アンド・コントロールシステムに統合された機械学習法の動的モデルの信頼性評価 Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems ( http://arxiv.org/abs/2308.05120v1 ) ライセンス: Link先を確認	Edward Chen, Han Bao, Nam Dinh	(参考訳) 近年、データ駆動ニューラルネットワークベースの機械学習(ML)アルゴリズムの分野は著しく成長し、計測と制御システムへの適用性の研究が加速している。運用環境では有望だが、そのようなアルゴリズムの信頼性は十分に評価されていない。総合的なリスクモデリングの欠如は、これらのシステムの信頼性を低下させる可能性がある。全米標準技術研究所の最近の報告では、MLの信頼性は採用にとって重要な障壁であり、インテリジェントシステムの安全かつ説明責任のある運用において重要な役割を果たす。そこで本研究では,トレーニングデータセットに分散検出を組み込むことで,ml予測の相対的信頼性を評価するリアルタイムモデル非依存手法を提案する。 MLアルゴリズムは補間(または近補間)タスクでは優れているが、補間では著しく劣化する。これは、新しいサンプルがトレーニングサンプルから"遠い"場合に発生する。この手法はlaplacian distributed decay for reliability (laddr)と呼ばれ、予測の相対的信頼性を計算するために使用される運用データと訓練データセットの違いを決定する。 LADDRは、フィードフォワードニューラルネットワークベースのモデルで、異なるフローの損失遷移における安全性の重要な要因を予測する。 LADDRは「データスーパーバイザ」として意図され、運用条件の文脈でよく訓練されたMLモデルの適切性を決定する。最終的に、LADDRは、従来の補間タスクに使用する場合のML予測の信頼性を支える証拠としてトレーニングデータを使用する方法を示している。 In recent years, the field of data-driven neural network-based machine learning (ML) algorithms has grown significantly and spurred research in its applicability to instrumentation and control systems. While they are promising in operational contexts, the trustworthiness of such algorithms is not adequately assessed. Failures of ML-integrated systems are poorly understood; the lack of comprehensive risk modeling can degrade the trustworthiness of these systems. In recent reports by the National Institute for Standards and Technology, trustworthiness in ML is a critical barrier to adoption and will play a vital role in intelligent systems' safe and accountable operation. Thus, in this work, we demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset. It is well documented that ML algorithms excel at interpolation (or near-interpolation) tasks but significantly degrade at extrapolation. 
This occurs when new samples are "far" from training samples. The method, referred to as the Laplacian distributed decay for reliability (LADDR), determines the difference between the operational and training datasets, which is used to calculate a prediction's relative reliability. LADDR is demonstrated on a feedforward neural network-based model used to predict safety significant factors during different loss-of-flow transients. LADDR is intended as a "data supervisor" and determines the appropriateness of well-trained ML models in the context of operational conditions. Ultimately, LADDR illustrates how training data can be used as evidence to support the trustworthiness of ML predictions when utilized for conventional interpolation tasks.	翻訳日:2023-08-11 14:46:53 公開日:2023-08-08
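The distance-based reliability idea described for LADDR can be illustrated with a minimal sketch. The abstract does not state the actual formula, so the score used here, `exp(-d / scale)` with `d` the Euclidean distance to the nearest training sample, is only an assumption chosen because it decays in a Laplacian (exponential) fashion. The names `laplacian_reliability` and `scale` are hypothetical.

```python
import math


def laplacian_reliability(query, training_points, scale=1.0):
    """Illustrative reliability score in (0, 1].

    Assumption (not taken from the paper): reliability decays as
    exp(-d / scale), where d is the Euclidean distance from the query
    to its nearest training sample.
    """
    nearest = min(math.dist(query, point) for point in training_points)
    return math.exp(-nearest / scale)


# Toy training set; an operational point close to it scores near 1,
# a far-away (extrapolation) point scores near 0.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near_score = laplacian_reliability((0.1, 0.1), train)
far_score = laplacian_reliability((5.0, 5.0), train)
```

In this toy setting a query that coincides with a training sample gets a score of exactly 1, and the score shrinks as the query moves into the extrapolation regime.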
# 2次元のディラックデルタ・シュレディンガーポテンシャルに対する一意な連続 L$^2(\mathbb{R}^2)$ 束縛状態解の厳密な点スペクトルと固有ベクトル The Exact Point Spectrum and Eigenvector of the Unique Continuous L$^2(\mathbb{R}^2)$ Bound State Solution to the Dirac Delta Schrodinger Potential in Two Dimensions ( http://arxiv.org/abs/2308.05195v1 ) ライセンス: Link先を確認	Michael Maroun	(参考訳) 2次元と3次元のディラックデルタ関数の点スペクトル、すなわち束縛状態エネルギー固有値を解析することは、正則化や繰り込み(典型的にはその両方)に頼らない限り非常に難しいことで知られている。2次元におけるその理由は2つある。 1) 結合定数は質量とプランク定数と共に無次元量を形成する。これにより、異常な長さスケールが欠落する。 2) 直ちに明らかな L$^2$ の解は原点において発散するが、そこはディラックデルタポテンシャルが測度として重要な台を持つ点である。ここで示される解の一意性から、線型作用素(すべての $\mathbb{R}^2$ 上の2次元ラプラス作用素)が、ここで構成される特別な定義域を持つと、点スペクトルがちょうど1つの要素を持つことが保証される。この要素は正確に決定され、異常な長さスケールに対する自然で数学的に厳密な解決が得られる。この研究では、いかなる種類の繰り込みや正則化にも頼らない。 Analyzing the point spectrum, i.e. bound state energy eigenvalue, of the Dirac delta function in two and three dimensions is notoriously difficult without recourse to regularization or renormalization, typically both. The reason for this in two dimensions is twofold: 1) the coupling constant, together with the mass and Planck's constant, forms a unitless quantity. This causes there to be a missing anomalous length scale. 2) The immediately obvious L$^2$ solution is divergent at the origin, where the Dirac delta potential has its important point of support as a measure. Due to the uniqueness of the solution presented here, it is immediate that the linear operator (the two dimensional Laplace operator on all of $\mathbb{R}^2$), with the specialized domain constructed here, ensures that the point spectrum has exactly one element. This element is determined precisely, and a natural mathematically rigorous resolution to the anomalous length scale arises. In this work, there is no recourse to renormalization or regularization of any kind.	翻訳日:2023-08-11 14:27:50 公開日:2023-08-08
# $(1+1)D$ QED散乱過程における絡み合い生成 Entanglement generation in $(1+1)D$ QED scattering processes ( http://arxiv.org/abs/2105.03445v3 ) ライセンス: Link先を確認	Marco Rigobello, Simone Notarnicola, Giuseppe Magnifico, Simone Montangero	(参考訳) テンソルネットワークを用いて$(1+1)$次元QEDにおける実時間の中間子-中間子散乱過程を研究する。自由フェルミオンモデルに基づく近似を導入することで、与えられた運動量と位置を持つ初期中間子波束を作成する。次に, 初期状態で離れた2つの衝突する中間子の動力学を計算し, 弱結合領域および中間結合領域で相互作用強度と初期状態を変化させると豊かな現象が現れることを観測する。最後に, 弾性衝突を考慮し, いくつかの散乱振幅とプロセスによって生じる絡み合いを計測する。注目すべきことに, 出射中間子間の漸近的絡み合いに対する2つの異なるレジームを同定した。絡み合いはしきい値結合以下では摂動的に小さく, それを超えると結合の関数としての成長が急激に加速する。 We study real-time meson-meson scattering processes in $(1+1)$-dimensional QED by means of Tensor Networks. We prepare initial meson wave packets with given momentum and position introducing an approximation based on the free fermions model. Then, we compute the dynamics of two initially separated colliding mesons, observing a rich phenomenology as the interaction strength and the initial states are varied in the weak and intermediate coupling regimes. Finally, we consider elastic collisions and measure some scattering amplitudes as well as the entanglement generated by the process. Remarkably, we identify two different regimes for the asymptotic entanglement between the outgoing mesons: it is perturbatively small below a threshold coupling, past which its growth as a function of the coupling abruptly accelerates.	翻訳日:2023-08-10 18:38:54 公開日:2023-08-08
# 準最適サンプル複素数を持つゼロサムマルコフゲームにおけるモデルベースマルチエージェントRL Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity ( http://arxiv.org/abs/2007.07461v3 ) ライセンス: Link先を確認	Kaiqing Zhang, Sham M. Kakade, Tamer Ba\c{s}ar, Lin F. Yang	(参考訳) 実験モデルを用いたモデルベース強化学習(RL)は,RLのコーナーストーンの1つとして長年認識されてきた。学習と計画段階を自然に分離するマルチエージェントrl(marl)に特に適しており、全てのエージェントがサンプルを使用してポリシーを同時に改善する場合、非定常問題を回避する。直感的で広く使われているが、モデルベースMARLアルゴリズムのサンプル複雑性は十分に研究されていない。本稿では,サンプルの複雑さに関する根本的な問題に対処することを目的とする。生成モデルにのみアクセス可能な2プレイヤーのゼロサムマルコフゲームについて,最も基本的なMARL設定について検討した。モデルベースMARLは、Nash平衡値(NE)を求めるために$\tilde O(\|S\|\|A\|\|\|B\|(1-\gamma)^{-3}\epsilon^{-2})$と、滑らかな計画オラクルを持つ$\epsilon$-NEポリシーのサンプル複雑性を達成し、$\gamma$は割引係数であり、$S,A,B$は状態空間と2つのエージェントのアクション空間を表す。さらに,アルゴリズムが報酬に依存しない場合,そのようなサンプル境界がミニマックス最適(対数係数まで)であることが示され,アルゴリズムは報酬知識のない遷移サンプルを検索し,一致した下位境界を確立する。これは通常の報酬対応の設定とは対照的で、$\tilde\Omega(\|S\|(\|A\|+\|B\|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound である。今回の結果は,marlにおけるモデルベースアプローチのサンプル効率を示すだけでなく,そのパワー(より困難な報酬非依存のケースを簡易に処理する)と制限($\|a\|,\|b\|$の適応的かつ最適でない)との根本的なトレードオフを詳細に示すものである。 Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the corner stones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intuitive and widely-used, the sample complexity of model-based MARL algorithms has not been fully investigated. In this paper, our goal is to address the fundamental question about its sample complexity. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. 
We show that model-based MARL achieves a sample complexity of $\tilde O(\|S\|\|A\|\|B\|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error, and the $\epsilon$-NE policies with a smooth planning oracle, where $\gamma$ is the discount factor, and $S,A,B$ denote the state space and the action spaces of the two agents. We further show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge, by establishing a matching lower bound. This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(\|S\|(\|A\|+\|B\|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $\|A\|,\|B\|$ dependence. Our results not only demonstrate the sample-efficiency of this basic model-based approach in MARL, but also elaborate on the fundamental tradeoff between its power (easily handling the more challenging reward-agnostic case) and its limitation (less adaptive and suboptimal in $\|A\|,\|B\|$), a tradeoff that arises particularly in the multi-agent context.	翻訳日:2023-08-10 18:38:23 公開日:2023-08-08
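To make the gap between the two bounds in this abstract concrete, the following sketch evaluates both expressions for given problem sizes. It drops all constants and logarithmic factors hidden by the tilde notation, so the numbers only indicate scaling; the function names are hypothetical.

```python
def model_based_upper_bound(num_states, num_a, num_b, gamma, eps):
    """|S||A||B| (1 - gamma)^-3 eps^-2, ignoring constants and log factors."""
    return num_states * num_a * num_b / ((1.0 - gamma) ** 3 * eps ** 2)


def reward_aware_lower_bound(num_states, num_a, num_b, gamma, eps):
    """|S|(|A| + |B|) (1 - gamma)^-3 eps^-2, ignoring constants and log factors."""
    return num_states * (num_a + num_b) / ((1.0 - gamma) ** 3 * eps ** 2)


# With 10 actions per agent the ratio is |A||B| / (|A| + |B|) = 100 / 20.
gap = (model_based_upper_bound(100, 10, 10, 0.9, 0.1)
       / reward_aware_lower_bound(100, 10, 10, 0.9, 0.1))
```

The ratio depends only on the action-space sizes, which matches the abstract's statement that the remaining gap in the reward-aware setting is in the $\|A\|,\|B\|$ dependence.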
# 量子ゼノダイナミクスによる制約付き最適化 Constrained Optimization via Quantum Zeno Dynamics ( http://arxiv.org/abs/2209.15024v6 ) ライセンス: Link先を確認	Dylan Herman, Ruslan Shaydulin, Yue Sun, Shouvanik Chakrabarti, Shaohan Hu, Pierre Minssen, Arthur Rattew, Romina Yalovetzky, Marco Pistoia	(参考訳) 制約付き最適化問題は科学や産業においてユビキタスである。量子アルゴリズムは最適化問題の解法において有望であるが、現在のアルゴリズムでは任意の制約を効果的に扱えない。量子ゼノダイナミクスを用いて、不等式を含む複数の任意の制約で最適化問題を解く手法を提案する。量子最適化のダイナミクスは, 少数の補助量子ビットのみを必要としポスト選択を必要としない反復射影測定により, フォールトトレラント量子コンピュータ上の制約内部分空間に効率的に制限できることを示す。本手法は幅広い適用性を有し、量子近似最適化アルゴリズム(QAOA)および最適化のための変分量子回路に組み込むことでこれを示す。本手法を,複数の現実的制約を持つポートフォリオ最適化問題に対して数値的に評価し,最先端技術よりも優れた解品質と高い制約内確率を観測する。我々は,Quantinuum H1-2量子プロセッサ上で本手法の概念実証を行う。 Constrained optimization problems are ubiquitous in science and industry. Quantum algorithms have shown promise in solving optimization problems, yet none of the current algorithms can effectively handle arbitrary constraints. We introduce a technique that uses quantum Zeno dynamics to solve optimization problems with multiple arbitrary constraints, including inequalities. We show that the dynamics of quantum optimization can be efficiently restricted to the in-constraint subspace on a fault-tolerant quantum computer via repeated projective measurements, requiring only a small number of auxiliary qubits and no post-selection. Our technique has broad applicability, which we demonstrate by incorporating it into the quantum approximate optimization algorithm (QAOA) and variational quantum circuits for optimization. We evaluate our method numerically on portfolio optimization problems with multiple realistic constraints and observe better solution quality and higher in-constraint probability than state-of-the-art techniques. We implement a proof-of-concept demonstration of our method on the Quantinuum H1-2 quantum processor.	翻訳日:2023-08-10 18:31:58 公開日:2023-08-08
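The role of repeated projective measurements can be seen in the textbook single-qubit Zeno effect, sketched below. This is a toy model and not the constrained-optimization algorithm of the paper: a state is rotated out of an "allowed" state by a total angle split into equal steps, with a projective measurement back onto the allowed state after each step. The function name and the rotation model are assumptions made for illustration.

```python
import math


def zeno_survival(total_angle, num_measurements):
    """Probability of remaining in the initial state.

    Each step rotates the state by total_angle / num_measurements, and the
    following projective measurement keeps it with probability cos^2(step).
    """
    step = total_angle / num_measurements
    return math.cos(step) ** (2 * num_measurements)


# A quarter-turn rotation leaves the allowed state entirely if measured once,
# but frequent measurements pin the state in place.
single = zeno_survival(math.pi / 2, 1)
frequent = zeno_survival(math.pi / 2, 100)
```

As the number of measurements grows, the survival probability approaches 1, which is the mechanism by which measurement can confine dynamics to a chosen subspace.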
# M$^2$-3DLaneNet:マルチモーダル3Dレーン検出の探索 M$^2$-3DLaneNet: Exploring Multi-Modal 3D Lane Detection ( http://arxiv.org/abs/2209.05996v3 ) ライセンス: Link先を確認	Yueru Luo, Xu Yan, Chaoda Zheng, Chao Zheng, Shuqi Mei, Tang Kun, Shuguang Cui, Zhen Li	(参考訳) 3d空間における正確なレーン線の推定は、その希薄な性質のため、依然として困難である。以前の研究は主に3dレーン検出に画像を使うことに重点を置いており、内在的な投影誤差と幾何情報の損失を招いた。これらの問題に対処するために,既存の単分子手法と組み合わせて,LiDARを3次元車線検出に活用する可能性を検討する。本稿では,複数のセンサからの補完情報を統合するためのm$^2$-3dlanenetを提案する。具体的には、M$^2$-3DLaneNetは、深度補完を通してLiDARデータから幾何情報を取り込むことで、2次元特徴を3次元空間に持ち上げる。その後、リフトされた2D機能は、BEV融合によりLiDAR機能によりさらに強化される。大規模openlaneデータセットに関する広範囲な実験により、m$^2$-3dlanenetが75mまたは100mの範囲に関係なく有効であることが示されている。 Estimating accurate lane lines in 3D space remains challenging due to their sparse and slim nature. Previous works mainly focused on using images for 3D lane detection, leading to inherent projection error and loss of geometry information. To address these issues, we explore the potential of leveraging LiDAR for 3D lane detection, either as a standalone method or in combination with existing monocular approaches. In this paper, we propose M$^2$-3DLaneNet to integrate complementary information from multiple sensors. Specifically, M$^2$-3DLaneNet lifts 2D features into 3D space by incorporating geometry information from LiDAR data through depth completion. Subsequently, the lifted 2D features are further enhanced with LiDAR features through cross-modality BEV fusion. Extensive experiments on the large-scale OpenLane dataset demonstrate the effectiveness of M$^2$-3DLaneNet, regardless of the range (75m or 100m).	翻訳日:2023-08-10 18:30:39 公開日:2023-08-08
# Archangel: 位置とポーズメタデータを備えたハイブリッドUAVベースのヒューマン検出ベンチマーク Archangel: A Hybrid UAV-based Human Detection Benchmark with Position and Pose Metadata ( http://arxiv.org/abs/2209.00128v3 ) ライセンス: Link先を確認	Yi-Ting Shen, Yaesop Lee, Heesung Kwon, Damon M. Conover, Shuvra S. Bhattacharyya, Nikolas Vale, Joshua D. Gray, G. Jeremy Leong, Kenneth Evensen, Frank Skirlo	(参考訳) 無人航空機(UAV)が捉えた画像の中で、人間などの物体を検出することを学ぶことは、通常、UAVの物体に対する位置によって引き起こされる大きな変動に悩まされる。加えて、既存のUAVベースのベンチマークデータセットは適切なデータセットメタデータを提供していない。こうしたメタデータは、正確なモデル診断や、これらの変動に不変な特徴の学習に不可欠である。本稿では,類似した撮像条件でキャプチャされ,UAV位置およびオブジェクトポーズのメタデータを備えた,実および合成のサブセットからなる,最初のUAVベースのオブジェクト検出データセットであるArchangelを紹介する。モデル評価中にメタデータを活用するメリットを示すために、最先端のオブジェクト検出器を用いて、一連の実験を慎重に設計する。さらに,モデル最適化における実データと合成データの両方に関する重要な知見を提示する。最後に、Archangelのメリット、限界、今後の方向性について議論し、より広範な機械学習コミュニティにその明確な価値を強調する。 Learning to detect objects, such as humans, in imagery captured by an unmanned aerial vehicle (UAV) usually suffers from tremendous variations caused by the UAV's position relative to the objects. In addition, existing UAV-based benchmark datasets do not provide adequate dataset metadata, which is essential for precise model diagnosis and learning features invariant to those variations. In this paper, we introduce Archangel, the first UAV-based object detection dataset composed of real and synthetic subsets captured with similar imaging conditions and UAV position and object pose metadata. A series of experiments are carefully designed with a state-of-the-art object detector to demonstrate the benefits of leveraging the metadata during model evaluation. Moreover, several crucial insights involving both real and synthetic data during model optimization are presented. In the end, we discuss the advantages, limitations, and future directions regarding Archangel to highlight its distinct value for the broader machine learning community.	翻訳日:2023-08-10 18:29:37 公開日:2023-08-08
# ナノスケール力センシングのためのインダクティブ電気機械伝達 Kinetic Inductive Electromechanical Transduction for Nanoscale Force Sensing ( http://arxiv.org/abs/2301.11055v4 ) ライセンス: Link先を確認	August K. Roos, Ermes Scarano, Elisabet K. Arvidsson, Erik Holmgren, David B. Haviland	(参考訳) 原子間力顕微鏡のための共鳴力センサの設計にはキャビティ光学の原理を用いる。このセンサーは、従来の静電容量カップリングと二重の電気機械結合の一種に基づいており、カンチレバーの運動は、超伝導ナノワイヤの運動インダクタンスの変化を引き起こす表面ひずみを誘導する。キャビティは、ナノワイヤのキネティックインダクタンスを含む等価な$lc$回路を備えたコンパクトマイクロ波プラズマモードによって実現される。本装置は完全にコプラナーであり,伝送線路と後続増幅器との最適結合のためにキャビティインピーダンスを変換する方法を示す。ここでは3-10Hzの範囲で, 素動インダクティブ・メカノ電界結合 (KIMEC) 速度$g_0 / 2 \pi$ を推定する。多周波ポンピングと測定手法を用いて, キャンチレバーの位相感度検出を行う。 We use the principles of cavity optomechanics to design a resonant mechanical force sensor for atomic force microscopy. The sensor is based on a type of electromechanical coupling, dual to traditional capacitive coupling, whereby the motion of a cantilever induces surface strain that causes a change in the kinetic inductance of a superconducting nanowire. The cavity is realized by a compact microwave-plasma mode with an equivalent $LC$ circuit involving the kinetic inductance of the nanowire. The device is fully coplanar and we show how to transform the cavity impedance for optimal coupling to the transmission line and the following amplifier. For the device presented here, we estimate the bare kinetic inductive mechano-electric coupling (KIMEC) rate $g_0 / 2 \pi$ in the range 3-10 Hz. We demonstrate phase-sensitive detection of cantilever motion using a multifrequency pumping and measurement scheme.	翻訳日:2023-08-10 18:09:24 公開日:2023-08-08
# 深層学習に基づく時系列因果推論による北極増幅の定量化 Quantifying Causes of Arctic Amplification via Deep Learning based Time-series Causal Inference ( http://arxiv.org/abs/2303.07122v3 ) ライセンス: Link先を確認	Sahara Ali, Omar Faruque, Yiyi Huang, Md. Osman Gani, Aneesh Subramanian, Nicole-Jienne Shchlegel, Jianwu Wang	(参考訳) 北極の温暖化、または北極の増幅は、いくつかの大気と海洋のドライバーによって導かれる。しかし、その根底にある熱力学的原因の詳細はまだ不明である。固定処理効果戦略を用いた海氷融解に対する大気プロセスの因果効果の推算は非現実的な反事実推定につながる。このようなモデルは、時間的な混乱によってバイアスになりがちである。さらに、地球科学データの複雑な非線形性は、既存の限界構造技術を用いて因果推論を行うことができない。これらの課題に取り組むために,反復型ニューラルネットワークと新しい確率的バランス手法を用いて,連続処理中の因果関係を推測する時系列因果推論モデルtcinetを提案する。合成および観測データに関する実験を通じて、我々の研究は北極海氷融解の原因の定量化能力を大幅に向上し、観測地球科学における因果推論の経路をさらに深めることができることを示す。 The warming of the Arctic, also known as Arctic amplification, is led by several atmospheric and oceanic drivers. However, the details of its underlying thermodynamic causes are still unknown. Inferring the causal effects of atmospheric processes on sea ice melt using fixed treatment effect strategies leads to unrealistic counterfactual estimations. Such models are also prone to bias due to time-varying confoundedness. Further, the complex non-linearity in Earth science data makes it infeasible to perform causal inference using existing marginal structural techniques. In order to tackle these challenges, we propose TCINet - time-series causal inference model to infer causation under continuous treatment using recurrent neural networks and a novel probabilistic balancing technique. Through experiments on synthetic and observational data, we show how our research can substantially improve the ability to quantify leading causes of Arctic sea ice melt, further paving paths for causal inference in observational Earth science.	翻訳日:2023-08-10 17:59:42 公開日:2023-08-08
# 注意マップエントロピーに基づくアクティブビジュアル探索 Active Visual Exploration Based on Attention-Map Entropy ( http://arxiv.org/abs/2303.06457v3 ) ライセンス: Link先を確認	Adam Pardyl, Grzegorz Rype\'s\'c, Grzegorz Kurzejamski, Bartosz Zieli\'nski, Tomasz Trzci\'nski	(参考訳) アクティブビジュアル探索は、環境に基づいて連続した観測がアクティブに選択される現実世界のシナリオにおいて、限られたセンサー能力の問題に対処する。この問題に対処するために,Attention-Map Entropy (AME) と呼ばれる新しい手法を導入する。変圧器モデルの内部の不確実性を利用して、最も情報性の高い観測値を決定する。既存のソリューションとは対照的に、トレーニングを単純化する追加の損失コンポーネントは必要ない。網膜様センサを模倣する実験により、そのような簡易なトレーニングにより、公開データセットの再構成、セグメンテーション、分類の性能が大幅に向上することを示した。 Active visual exploration addresses the issue of limited sensor capabilities in real-world scenarios, where successive observations are actively chosen based on the environment. To tackle this problem, we introduce a new technique called Attention-Map Entropy (AME). It leverages the internal uncertainty of the transformer-based model to determine the most informative observations. In contrast to existing solutions, it does not require additional loss components, which simplifies the training. Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.	翻訳日:2023-08-10 17:59:25 公開日:2023-08-08
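As a rough illustration of selecting observations by attention-map entropy, the sketch below scores each candidate location by the Shannon entropy of an attention distribution associated with it and picks the highest-entropy one. How AME actually aggregates attention across heads and layers is not described in the abstract, so the per-candidate distributions, the argmax rule, and all names here are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_next_glimpse(attention_by_candidate):
    """Return the candidate whose attention distribution is most uncertain.

    attention_by_candidate maps a candidate id to a list of attention
    weights that sum to 1 (an assumed input format).
    """
    return max(attention_by_candidate,
               key=lambda name: entropy(attention_by_candidate[name]))


candidates = {
    "left": [0.97, 0.01, 0.01, 0.01],   # peaked attention: low uncertainty
    "right": [0.25, 0.25, 0.25, 0.25],  # flat attention: high uncertainty
}
choice = pick_next_glimpse(candidates)
```

Because the score is computed from attention weights the model already produces, a selection rule of this kind needs no extra loss term during training, in line with the abstract's claim.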
# 大規模言語モデル生成推論のためのコスト効果ハイパーパラメータ最適化 Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference ( http://arxiv.org/abs/2303.04673v2 ) ライセンス: Link先を確認	Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah	(参考訳) 大きな言語モデル(LLM)は、その生成能力に大きな関心を惹き付け、様々な商用アプリケーションの開発につながった。モデルを使用することのコストが高いため、アプリケーションビルダーは限られた推論予算の下で生成の価値を最大化する必要に迫られる。本稿では,テキスト生成の有用性とコストに大きな影響を及ぼす応答数,温度,最大トークンなどの推論ハイパーパラメータの最適化について検討する。経済的なハイパーパラメータ最適化とコストベースプルーニングを活用したEcoOptiGenというフレームワークを設計する。 GPT-3.5/GPT-4モデルを様々なタスクで実験し、その有効性を検証する。 EcoOptiGen は FLAML ライブラリの `autogen` パッケージで実装されている。 Large Language Models (LLMs) have sparked significant interest in their generative capabilities, leading to the development of various commercial applications. The high cost of using the models drives application builders to maximize the value of generation under a limited inference budget. This paper presents a study of optimizing inference hyperparameters such as the number of responses, temperature and max tokens, which significantly affect the utility/cost of text generation. We design a framework named EcoOptiGen which leverages economical hyperparameter optimization and cost-based pruning. Experiments with the GPT-3.5/GPT-4 models on a variety of tasks verify its effectiveness. EcoOptiGen is implemented in the `autogen` package of the FLAML library: \url{https://aka.ms/autogen}.	翻訳日:2023-08-10 17:58:53 公開日:2023-08-08
# 知識グラフのためのユニバーサル質問応答プラットフォーム A Universal Question-Answering Platform for Knowledge Graphs ( http://arxiv.org/abs/2303.00595v2 ) ライセンス: Link先を確認	Reham Omar, Ishika Dhall, Panos Kalnis, Essam Mansour	(参考訳) 多様なアプリケーションドメインからの知識は、SPARQLエンドポイントを介してWebにアクセス可能なRDFエンジンに格納されるナレッジグラフ(KG)として組織される。整形されたSPARQLクエリを表現するには、グラフ構造とそのコンポーネントの正確なURIに関する情報が必要である。質問応答(QA)システムは、自然言語の質問をSPARQLに翻訳するのを支援する。既存のQAシステムは通常、アプリケーション固有の人為的なルールに基づいており、あるいは、事前情報、高価な前処理、ターゲットとする各KGに対するモデル適応を必要とする。したがって、広い範囲のアプリケーションやKGに一般化することは困難である。本稿では,各ターゲットKGに合わせて調整する必要のない汎用QAシステムであるKGQAnを提案する。キュレートされた規則の代わりに、KGQAnは疑問理解の新たな形式化をテキスト生成問題として導入し、質問をニューラルシーケンスからシーケンスモデルを通じて中間抽象表現に変換する。また、クエリ時に抽象表現を特定のkgのsparqlクエリにマップし、公開アクセス可能なapiとrdfストアの既存のインデックスのみを使用するジャストインタイムリンカを開発した。いくつかの実kgを用いた実験により,kgqanは,解答の質や処理時間,特に任意のkgに対して,訓練中は見当たらない処理時間において,最先端の割に容易に展開し,その性能を上回っていることが示された。 Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines accessible in the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions to SPARQL. Existing QA systems are typically based on application-specific human-curated rules, or require prior information, expensive pre-processing and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem to convert a question into an intermediate abstract representation via a neural sequence-to-sequence model. 
We also develop a just-in-time linker that maps at query time the abstract representation to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms by a large margin the state-of-the-art in terms of quality of answers and processing time, especially for arbitrary KGs, unseen during the training.	翻訳日:2023-08-10 17:57:40 公開日:2023-08-08
# DiffIR:画像復元のための効率的な拡散モデル DiffIR: Efficient Diffusion Model for Image Restoration ( http://arxiv.org/abs/2303.09472v2 ) ライセンス: Link先を確認	Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool	(参考訳) 拡散モデル(DM)は、画像合成過程をデノナイジングネットワークのシーケンシャルな応用にモデル化することで、SOTA性能を達成した。しかし、画像合成とは違って、画像復元(IR)は、地上構造に応じて結果を生成するのに強い制約がある。したがって、IRの場合、画像全体や特徴マップを推定する大規模なモデルで大規模なイテレーションを実行する従来のDMは非効率である。この問題に対処するために、コンパクトIR先行抽出ネットワーク(CPEN)、動的IRトランスフォーマ(DIRformer)、復調ネットワーク(denoising network)からなるIR(DiffIR)のための効率的なDMを提案する。具体的には、DiffIRには2つのトレーニングステージがある。事前トレーニングでは, CPEN$_{S1}$に接地画像を入力することで, コンパクトIR先行表現(IPR)を捕捉し, DIRformerを誘導する。第2段階では、LQ画像のみを用いて事前訓練されたCPEN$_{S1}$と同じIRPを直接推定するようにDMを訓練する。 IPRはコンパクトなベクトルであるため、DiffIRは従来のDMよりも少ないイテレーションで正確な推定を行い、より安定でリアルな結果を生成することができる。繰り返しは少ないので、我々のDiffIRはCPEN$_{S2}$, DIRformer, denoising Networkを併用することで、推定誤差の影響をさらに低減することができる。計算コストを削減しつつ、複数のIRタスクを広範囲に実験し、SOTA性能を達成する。コードは \url{https://github.com/zj-binxia/diffir} で入手できる。 Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process into a sequential application of a denoising network. However, different from image synthesis, image restoration (IR) has a strong constraint to generate results in accordance with ground-truth. Thus, for IR, traditional DMs running massive iterations on a large model to estimate whole images or feature maps is inefficient. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), dynamic IR transformer (DIRformer), and denoising network. Specifically, DiffIR has two training stages: pretraining and training DM. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) to guide DIRformer. In the second stage, we train the DM to directly estimate the same IRP as pretrained CPEN$_{S1}$ only using LQ images. 
We observe that since the IPR is only a compact vector, DiffIR can use fewer iterations than traditional DM to obtain accurate estimations and generate more stable and realistic results. Since the iterations are few, our DiffIR can adopt a joint optimization of CPEN$_{S2}$, DIRformer, and denoising network, which can further reduce the estimation error influence. We conduct extensive experiments on several IR tasks and achieve SOTA performance while consuming less computational costs. Code is available at \url{https://github.com/Zj-BinXia/DiffIR}.	翻訳日:2023-08-10 17:48:06 公開日:2023-08-08
# TiDEによる長期予測:時系列Dense Encoder Long-term Forecasting with TiDE: Time-series Dense Encoder ( http://arxiv.org/abs/2304.08424v3 ) ライセンス: Link先を確認	Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen and Rose Yu	(参考訳) 最近の研究で、単純な線形モデルは、長期の時系列予測においてトランスフォーマーベースのアプローチより優れていることが示されている。そこで我々は,線形モデルの単純さと高速さを享受しつつ,共変量や非線形依存性を扱える時系列予測のためのマルチレイヤパーセプトロン(MLP)ベースのエンコーダ・デコーダモデルであるTiDEを提案する。理論的には、このモデルの最も単純な線形類似物は、いくつかの仮定の下で線形力学系(lds)の最適誤差率に近いことを証明できる。実験により,提案手法は,最も優れたTransformerベースモデルよりも5～10倍高速でありながら,一般的な時系列予測ベンチマークにおいて,先行手法に適合あるいは優れることを示す。 Recent work has shown that simple linear models can outperform several Transformer based approaches in long term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve near optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer based model.	翻訳日:2023-08-10 17:38:42 公開日:2023-08-08
# 分散化と加速により大規模バンドル調整が可能に Decentralization and Acceleration Enables Large-Scale Bundle Adjustment ( http://arxiv.org/abs/2305.07026v3 ) ライセンス: Link先を確認	Taosha Fan, Joseph Ortiz, Ming Hsiao, Maurizio Monge, Jing Dong, Todd Murphey, Mustafa Mukadam	(参考訳) 大規模なバンドル調整問題へのスケーリングには、複数のデバイスに分散するデータと計算が必要である。事前作業における集中型メソッドは、計算と通信のオーバーヘッドのため、中小規模の問題を解決することしかできない。本稿では,計算と通信のボトルネックを軽減し,任意に大きなバンドル調整問題を解決する完全分散手法を提案する。再投射誤差を補正し、異なるデバイスから最適化変数を分離する新しい代理関数を導出することにより、これを実現する。この関数は、最大化最小化技術を使用することを可能にし、並列で解決できる独立最適化サブプロブレムへのバンドル調整を減らす。さらに、ネステロフの加速と適応再起動を適用し、理論的な保証を維持しながら収束を改善する。ピアツーピア通信は限られているが,本手法は軽度条件下での1次臨界点への収束が証明可能である。公開データセットを用いた大規模なベンチマークでは,メモリ使用量や通信負荷に類似した分散ベースラインよりもはるかに高速に収束する。単一デバイスを用いた集中型ベースラインと比較して、我々の手法は分散化されているものの、Ceresで最大953.7倍、DeepLMで最大174.6倍の精度で解が得られる。コード: https://joeaortiz.github.io/daba。 Scaling to arbitrarily large bundle adjustment problems requires data and compute to be distributed across multiple devices. Centralized methods in prior works are only able to solve small or medium size problems due to overhead in computation and communication. In this paper, we present a fully decentralized method that alleviates computation and communication bottlenecks to solve arbitrarily large bundle adjustment problems. We achieve this by reformulating the reprojection error and deriving a novel surrogate function that decouples optimization variables from different devices. This function makes it possible to use majorization minimization techniques and reduces bundle adjustment to independent optimization subproblems that can be solved in parallel. We further apply Nesterov's acceleration and adaptive restart to improve convergence while maintaining its theoretical guarantees. Despite limited peer-to-peer communication, our method has provable convergence to first-order critical points under mild conditions. On extensive benchmarks with public datasets, our method converges much faster than decentralized baselines with similar memory usage and communication load. 
Compared to centralized baselines using a single device, our method, while being decentralized, yields more accurate solutions with significant speedups of up to 953.7x over Ceres and 174.6x over DeepLM. Code: https://joeaortiz.github.io/daba.	翻訳日:2023-08-10 17:31:34 公開日:2023-08-08
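Nesterov's acceleration with adaptive restart, mentioned in this abstract, can be sketched on a toy quadratic rather than on bundle adjustment. The restart rule used here (drop the momentum whenever the objective would increase) is one common choice and is an assumption; the abstract does not say which criterion the authors use. All names are hypothetical.

```python
import math


def nesterov_with_restart(f, grad, x0, step, iters):
    """Accelerated gradient descent with function-value restart."""
    x = list(x0)   # current iterate
    y = list(x0)   # extrapolated (momentum) point
    t = 1.0        # momentum schedule parameter
    for _ in range(iters):
        g = grad(y)
        x_new = [yi - step * gi for yi, gi in zip(y, g)]
        if f(x_new) > f(x):
            # Restart: discard momentum and take a plain gradient step from x.
            t = 1.0
            g = grad(x)
            x_new = [xi - step * gi for xi, gi in zip(x, g)]
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# Ill-conditioned toy objective: f(x) = 0.5 * (x0^2 + 100 * x1^2).
def objective(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def gradient(x):
    return [x[0], 100.0 * x[1]]


# step = 1 / L with L = 100, the largest curvature of the objective.
solution = nesterov_with_restart(objective, gradient, [1.0, 1.0], 0.01, 500)
```

With a step size of at most 1/L, the fallback gradient step never increases the objective, so the restart keeps the iterates monotone while still benefiting from momentum between restarts.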
# DOCTOR:ウェアラブル・メディカル・センサを用いたマルチ障害検出連続学習フレームワーク DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors ( http://arxiv.org/abs/2305.05738v2 ) ライセンス: Link先を確認	Chia-Hao Li and Niraj K. Jha	(参考訳) エッジデバイスにおける機械学習(ML)とウェアラブル医療センサ(WMS)の最近の進歩により、スマートヘルスケアのためのML駆動型疾患検出が可能になった。従来のML駆動型疾患検出法は、各疾患の個々のモデルとその対応するWMSデータのカスタマイズに依存している。しかし、このような方法は分散シフトや新しいタスク分類クラスへの適応性に欠ける。さらに、新しい疾患ごとに再設計し、スクラッチから再訓練する必要がある。これらの課題に対処するために,WMSに基づく多相検出連続学習(CL)フレームワークであるDOCTORを提案する。マルチヘッドディープニューラルネットワーク(DNN)と、模範再生スタイルのCLアルゴリズムを採用している。 clアルゴリズムは、異なるデータ分布、分類クラス、病気検出タスクが順次導入される新しいミッションを継続的に学習することを可能にする。データ保存方法と合成データ生成(SDG)モジュールとで破滅的な忘れを対処する。データ保存方法は、前回のミッションから得たトレーニングデータの最も情報性の高いサブセットを効率よく保存して再生する。 SDGモジュールは、実際のトレーニングデータの確率分布をモデル化し、データのプライバシーを維持しながら再生のための合成データを生成する。マルチヘッドDNNにより、DOCTORはユーザWMSデータに基づいて複数の疾患を同時に検出できる。各種CL実験において、1つのDNNモデルを用いて高い疾患分類精度を維持する上でのDOCTORの有効性を実証した。 doctorは平均テスト精度が1.43倍、f1-scoreが1.25倍、naive fine-tuning frameworkよりも0.01倍、モデルサイズが小さく複雑なclシナリオが複雑である。 Modern advances in machine learning (ML) and wearable medical sensors (WMSs) in edge devices have enabled ML-driven disease detection for smart healthcare. Conventional ML-driven disease detection methods rely on customizing individual models for each disease and its corresponding WMS data. However, such methods lack adaptability to distribution shifts and new task classification classes. Moreover, they need to be rearchitected and retrained from scratch for each new disease. To address these challenges, we propose DOCTOR, a multi-disease detection continual learning (CL) framework based on WMSs. It employs a multi-headed deep neural network (DNN) and an exemplar-replay-style CL algorithm. The CL algorithm enables the framework to continually learn new missions where different data distributions, classification classes, and disease detection tasks are introduced sequentially. It counteracts catastrophic forgetting with a data preservation method and a synthetic data generation (SDG) module. 
The data preservation method efficiently preserves the most informative subset of training data from previous missions for replay. The SDG module models the probability distribution of the real training data and generates synthetic data for replays while retaining data privacy. The multi-headed DNN enables DOCTOR to detect multiple diseases simultaneously based on user WMS data. In various CL experiments, we demonstrate DOCTOR's efficacy in maintaining high disease classification accuracy with a single DNN model. DOCTOR achieves 1.43 times better average test accuracy, 1.25 times better F1-score, and 0.41 higher backward transfer than the naive fine-tuning framework, with a small model size and in complex CL scenarios.	翻訳日:2023-08-10 17:31:12 公開日:2023-08-08
# GAD-NR 近傍再構成によるグラフ異常検出 GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction ( http://arxiv.org/abs/2306.01951v4 ) ライセンス: Link先を確認	Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets and Pan Li	(参考訳) Graph Anomaly Detection (GAD) は、グラフ内の異常ノードを識別し、ネットワークセキュリティ、不正検出、ソーシャルメディアスパム検出、その他さまざまな分野の応用を見つけるために用いられるテクニックである。 GADの一般的な方法は、グラフデータをノード表現にエンコードし、これらの表現に基づいてグラフの再構成品質を評価することによって異常を識別するグラフオートエンコーダ(GAE)である。しかし、既存のGAEモデルは直接リンク再構成に最適化されており、グラフに接続されたノードは潜在空間にクラスタ化される。その結果、クラスター型構造異常を検出するのに優れるが、クラスタに適合しないより複雑な構造異常に悩まされる。この制限に対処するため,グラフ異常検出のための近傍再構成を組み込んだGAEの新しい変種であるGAD-NRを提案する。 GAD-NRは、ノード表現に基づいて、ローカル構造、自己属性、および隣接属性を含むノードの近傍全体を再構築することを目的としている。異常ノードと正常ノード間の近傍再構成損失を比較することで、GAD-NRは任意の異常を効果的に検出できる。 6つの実世界のデータセットで実施された大規模な実験は、GAD-NRの有効性を検証し、最先端の競合相手よりも顕著な改善(AUCでは最大30%)を示す。 GAD-NRのソースコードが公開されている。比較分析の結果,既存の手法は3種類の異常から1種類または2種類の異常を検出する場合にのみ有効であることが判明した。対照的に、GAD-NRはデータセット全体の3種類の異常を検知し、その包括的な異常検出能力を示す。 Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. 
GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.	翻訳日:2023-08-10 17:18:58 公開日:2023-08-08
# 静電場による極低温CaF分子の遮蔽衝突 Shielding collisions of ultracold CaF molecules with static electric fields ( http://arxiv.org/abs/2305.07600v2 ) ライセンス: Link先を確認	Bijit Mukherjee, Matthew D. Frye, C. Ruth Le Sueur, Michael R. Tarbutt and Jeremy M. Hutson	(参考訳) 強静電場における極低温CaF分子の衝突について検討する。これらの電場は相互作用ポテンシャルに長距離障壁を作ることを可能にし、非弾性やその他の損失過程が起こる可能性のある短距離領域に分子が到達するのを効果的に妨げている。弾性散乱と損失に対するレート係数の結合チャネル計算を行う。本稿では,Van Vleck変換を用いて,エネルギー的によく分離された回転子関数を基底関数系に含めるための効率的な手続きを開発する。遮蔽はCaFにおいて特に効率的であり,23 kV/cmの電場において,2体損失過程の速度を$10^7$倍以上低減できることを示す。損失率は、かなり広い電場範囲において低いままである。電子スピンと核スピンは、いくつかの狭い電場範囲で強い追加の損失をもたらすが、他の範囲では効果がほとんどない。これらの結果は、量子縮退に向けたCaFの蒸発冷却への道を開く。 We study collisions of ultracold CaF molecules in strong static electric fields. These fields allow the creation of long-range barriers in the interaction potential, effectively preventing the molecules from reaching the short-range region where inelastic and other loss processes are likely to occur. We carry out coupled-channel calculations of rate coefficients for elastic scattering and loss. We develop an efficient procedure for including energetically well-separated rotor functions in the basis set via a Van Vleck transformation. We show that shielding is particularly efficient for CaF and allows the rate of two-body loss processes to be reduced by a factor of $10^7$ or more at a field of 23 kV/cm. The loss rates remain low over a substantial range of fields. Electron and nuclear spins cause strong additional loss in some small ranges of field, but have little effect elsewhere. These results pave the way for evaporative cooling of CaF towards quantum degeneracy.	翻訳日:2023-08-10 17:17:37 公開日:2023-08-08
# AutoHint: Hint生成による自動プロンプト最適化 AutoHint: Automatic Prompt Optimization with Hint Generation ( http://arxiv.org/abs/2307.07415v2 ) ライセンス: Link先を確認	Hong Sun, Xue Li, Yinchuan Xu, Youkow Homma, Qi Cao, Min Wu, Jian Jiao, Denis Charles	(参考訳) 本稿では,大規模言語モデル(LLM)の自動プロンプトエンジニアリングと最適化のための新しいフレームワークであるAutoHintを提案する。 LLMは、様々なタスクで高品質なアノテーションを実現する素晴らしい能力を示しているが、特定のタスクにこの能力を適用する鍵は、高品質なプロンプトを開発することである。そこで本研究では,入出力のデモンストレーションから得られた拡張された指示を組み込むことで,文脈内学習とゼロショット学習の両方のメリットを継承し,元のプロンプトを最適化する枠組みを提案する。我々は、この拡張をヒントと呼び、ラベル付きデータから自動的にヒントを生成するフレームワークを提案する。より具体的には、最初のプロンプトから始めて、提案手法はまず、誤った予測から選択したサンプルに対する新しいヒントを導出するようにLLMに指示し、次にサンプルごとのヒントを要約し、その結果を初期プロンプトに付加して、新しい拡張された指示を生成する。提案手法は, ゼロショットプロンプトと少数ショットプロンプトの両方に対して, BIG-Bench Instruction Inductionデータセットを用いて評価し, 実験により複数のタスクの精度を大幅に向上させることができることを示した。 This paper presents AutoHint, a novel framework for automatic prompt engineering and optimization for Large Language Models (LLMs). While LLMs have demonstrated remarkable ability in achieving high-quality annotation in various tasks, the key to applying this ability to specific tasks lies in developing high-quality prompts. Thus we propose a framework to inherit the merits of both in-context learning and zero-shot learning by incorporating enriched instructions derived from input-output demonstrations to optimize the original prompt. We refer to the enrichment as the hint and propose a framework to automatically generate the hint from labeled data. More concretely, starting from an initial prompt, our method first instructs an LLM to deduce new hints for selected samples from incorrect predictions, and then summarizes the per-sample hints and adds the results back to the initial prompt to form a new, enriched instruction. The proposed method is evaluated on the BIG-Bench Instruction Induction dataset for both zero-shot and few-shot prompts, where experiments demonstrate our method is able to significantly boost accuracy for multiple tasks.	翻訳日:2023-08-10 17:10:50 公開日:2023-08-08
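The hint-generation round described in the AutoHint abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `llm` callable, the function name `autohint_step`, the prompt wording, and the sample count are all hypothetical and introduced here.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def autohint_step(llm: LLM, prompt: str, data: List[Tuple[str, str]],
                  n_samples: int = 4, seed: int = 0) -> str:
    """One illustrative round: collect failures, derive hints, enrich the prompt."""
    # 1. Run the current prompt on labeled data and keep the wrong predictions.
    wrong = [(x, y) for x, y in data
             if llm(f"{prompt}\nInput: {x}").strip() != y]
    if not wrong:
        return prompt  # no incorrect predictions, so nothing to derive hints from

    # 2. Select a subset of the failures (uniform sampling is an assumption).
    rng = random.Random(seed)
    picked = rng.sample(wrong, min(n_samples, len(wrong)))

    # 3. Ask the model for one hint per selected sample.
    hints = [
        llm(f"Task: {prompt}\nInput: {x}\nCorrect output: {y}\n"
            "Give a short hint that would lead to the correct output.")
        for x, y in picked
    ]

    # 4. Summarize the per-sample hints and append the summary to the prompt.
    summary = llm("Summarize these hints into one instruction:\n" + "\n".join(hints))
    return f"{prompt}\nHint: {summary}"
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real setup would substitute its own API call and selection strategy.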
# 蒸留プルーニング: 合成データを使って宝くじを勝ち取る Distilled Pruning: Using Synthetic Data to Win the Lottery ( http://arxiv.org/abs/2307.03364v3 ) ライセンス: Link先を確認	Luke McDermott, Daniel Cummings	(参考訳) この研究は、蒸留データを用いてディープラーニングモデルを刈り取る新しいアプローチを導入する。アーキテクチャやアルゴリズムの最適化を主眼とする従来の戦略とは異なり、我々の手法はこれらのシナリオにおけるデータの役割を再考する。蒸留データセットは、より大きなデータセットから必須パターンをキャプチャし、この能力を活用して、計算効率の良いプルーニングプロセスを実現する方法を実証する。我々のアプローチでは、CIFAR-10で同等の間隔でイテレーティブマグニチュード・プルーニング(Iterative Magnitude Pruning)よりも5倍高速な、スパースでトレーニング可能なサブネットワーク(Lottery Tickets)を見つけることができる。実験結果は,資源効率のよいニューラルネットワークのプルーニング,モデル圧縮,ニューラルネットワークの探索に蒸留データを利用する可能性を強調した。 This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.	翻訳日:2023-08-10 17:09:16 公開日:2023-08-08
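For readers unfamiliar with the Iterative Magnitude Pruning baseline mentioned in the Distilled Pruning abstract above, a minimal sketch follows. It assumes the common lottery-ticket recipe (train, remove the smallest-magnitude surviving weights, rewind the survivors to their initial values, repeat). The `train` callable is hypothetical; in the approach the abstract describes, that training step would presumably run on distilled data, but the abstract gives no further detail.

```python
from typing import Callable, List


def magnitude_prune(weights: List[float], mask: List[int], fraction: float) -> List[int]:
    """Deactivate the given fraction of still-active weights with the smallest magnitude."""
    active = [i for i, m in enumerate(mask) if m]
    k = int(len(active) * fraction)  # number of weights removed in this round
    # Active indices sorted by absolute weight, smallest first.
    to_drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    new_mask = list(mask)
    for i in to_drop:
        new_mask[i] = 0
    return new_mask


def iterative_pruning(init: List[float], rounds: int, fraction: float,
                      train: Callable[[List[float], List[int]], List[float]]) -> List[int]:
    """IMP-style loop: train from the rewound initial weights, then prune by magnitude."""
    mask = [1] * len(init)
    for _ in range(rounds):
        # Rewind: masked initial weights are the starting point of every round.
        trained = train([w * m for w, m in zip(init, mask)], mask)
        mask = magnitude_prune(trained, mask, fraction)
    return mask
```

The cost of this loop is dominated by the repeated `train` calls, which is why replacing the training data with a much smaller synthetic set can shorten the search for a sparse subnetwork.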
# マルチモーダルクエリを用いたアクタ非依存マルチラベル動作認識 Actor-agnostic Multi-label Action Recognition with Multi-modal Query ( http://arxiv.org/abs/2307.10763v2 ) ライセンス: Link先を確認	Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta	(参考訳) 既存の行動認識法は、内在的なトポロジとアクター間の明らかな差異により、アクター固有のものである。これはアクター固有のポーズ推定(例えば人間対動物)を必要とし、複雑なモデル設計と高いメンテナンスコストをもたらす。さらに、他の利用可能な情報ソース(クラス名テキストなど)や複数のアクションの同時発生を無視しながら、視覚的モダリティのみと単一ラベルの分類を学ぶことに注力することが多い。これらの制約を克服するために,人間や動物を含む様々な種類のアクターに統一されたソリューションを提供する「アクター非依存マルチモーダル・マルチラベル動作認識」という新しい手法を提案する。さらに,多モードセマンティッククエリーネットワーク(MSQNet)モデルをトランスフォーマーベースのオブジェクト検出フレームワーク(DETRなど)で定式化し,視覚的およびテキスト的モダリティを活用して,アクションクラスをより良く表現する。アクター固有のモデルデザインの排除は重要な利点であり、アクターのポーズ推定の必要性を完全に排除する。 5つの公開ベンチマークの大規模な実験によると、我々のMSQNetは、人間と動物のシングルラベルとマルチラベルのアクション認識タスクにおいて、アクター固有の代替手段の先行技術を最大50%上回っている。コードはhttps://github.com/mondalanindya/MSQNetでリリースされる。 Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. 
Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.	翻訳日:2023-08-10 16:58:55 公開日:2023-08-08
# スワップ演算子の代数構造による量子マックスカットの緩和と厳密解 Relaxations and Exact Solutions to Quantum Max Cut via the Algebraic Structure of Swap Operators ( http://arxiv.org/abs/2307.15661v2 ) ライセンス: Link先を確認	Adam Bene Watts, Anirban Chowdhury, Aidan Epperly, J. William Helton, Igor Klep	(参考訳) 量子マックスカット(QMC)問題は、局所ハミルトニアン問題の近似アルゴリズムを設計するためのテスト問題として現れた。本稿では、QMCの代数構造、特に量子マックスカットハミルトニアンと対称群の表現論の関係を用いてこの問題に取り組む。この論文の最初の大きな貢献は、非可換二乗和(ncSoS)最適化手法を拡張し、量子マックスカットに対する緩和の新たな階層を与えることである。提示する階層は、キュービットスワップ作用素の多項式に対する最適化に基づいている。これは、パウリ行列で表される多項式に基づく「標準的な」量子Lasserre階層とは対照的である。この階層の正しさを証明するために、キュービットスワップ作用素によって生成される代数の有限表示を利用する。この表示は、スワップ演算子を使って記述された多項式の操作と簡約に計算機代数的手法を使うことを可能にし、それ自体興味深いものかもしれない。驚くべきことに、この新しい階層のレベル2は、最大8頂点のグラフ上の一様な辺重みを持つすべてのQMCインスタンスにおいて、(許容誤差$10^{-7}$の範囲で)厳密である。この論文の2つ目の大きな貢献は、クリークの符号付き結合として「分解」できるグラフを含む特定のグラフに対して、QMCハミルトニアンの最大固有値を厳密に計算する多項式時間アルゴリズムである。後者の特別なケースは、一様な辺重みを持つ完全二部グラフであり、リーブとマティスの業績から厳密解が知られている。この手法は対称群の表現論を用いており、リーブ・マティスの結果の一般化と見なすことができる。 The Quantum Max Cut (QMC) problem has emerged as a test-problem for designing approximation algorithms for local Hamiltonian problems. In this paper we attack this problem using the algebraic structure of QMC, in particular the relationship between the quantum max cut Hamiltonian and the representation theory of the symmetric group. The first major contribution of this paper is an extension of non-commutative Sum of Squares (ncSoS) optimization techniques to give a new hierarchy of relaxations to Quantum Max Cut. The hierarchy we present is based on optimizations over polynomials in the qubit swap operators. This is in contrast to the "standard" quantum Lasserre Hierarchy, which is based on polynomials expressed in terms of the Pauli matrices. To prove correctness of this hierarchy, we exploit a finite presentation of the algebra generated by the qubit swap operators. This presentation allows for the use of computer algebraic techniques to manipulate and simplify polynomials written in terms of the swap operators, and may be of independent interest. 
Surprisingly, we find that level-2 of this new hierarchy is exact (up to tolerance $10^{-7}$) on all QMC instances with uniform edge weights on graphs with at most 8 vertices. The second major contribution of this paper is a polynomial-time algorithm that exactly computes the maximum eigenvalue of the QMC Hamiltonian for certain graphs, including graphs that can be "decomposed" as a signed combination of cliques. A special case of the latter are complete bipartite graphs with uniform edge-weights, for which exact solutions are known from the work of Lieb and Mattis. Our methods, which use representation theory of the symmetric group, can be seen as a generalization of the Lieb-Mattis result.	翻訳日:2023-08-10 16:49:14 公開日:2023-08-08
# ロシアのソーシャルメディアで、ウクライナでの戦争に反対する人々と支持者:彼らは誰なのか? Opponents and proponents of the war in Ukraine in Russian social media: who are they? ( http://arxiv.org/abs/2308.04473v1 ) ライセンス: Link先を確認	Alesya Sokolova	(参考訳) ウクライナでの戦争を支持するロシアの性格を理解することは、この戦争がいかにして可能になったかを理解するための重要なステップの1つである。しかし、戦時中、伝統的な社会学的手法は必ずしも適用されない。ソーシャルメディアは、人々の頭の中にあるものの代替の情報源を提供する。本稿では,ウクライナにおける戦争に対する強硬な立場にあるロシアにおけるソーシャルメディア利用者の政治的アイデンティティ,価値観,利益を比較検討する。私はロシアで最も人気のあるソーシャルメディアプラットフォームであるVKからデータを収集し、ユーザーが購読したグループだけでなく、自己完結したプロフィール情報も分析します。私は、戦争の支持者は、より正確に指定する(しばしば「自由」に限定されるわけではない)相手よりも、より弱い政治的アイデンティティ(自らを「モデレート」と呼ぶ)を持つ傾向があることを見出しました。さらに、支持者の価値観は、正統派や家族といったロシア政府によって推進されたものとよく一致している。これらの違いにもかかわらず、親戦派と反戦派のユーザーは、音楽、歴史、スポーツに焦点を当てた同じグループへのサブスクリプションによって証明されるように、多くの共通の関心を共有している。人々の最も重要な特性(フィールドユーザがVKを埋めることができる)を述べるように頼まれると、両方のグループの最も頻繁な答えは、“親切と誠実さ”である。分析結果は、ロシアにおける世論の理解に寄与するだけでなく、ソーシャルメディアのプロフィールに基づいて戦争における立場を予測するために利用することができる。 Understanding the personality of Russians who support the war in Ukraine is one of the key steps to understanding how this war became possible. However, during the war, traditional sociological methods are not always applicable. Social media provides an alternative source of what is inside people's heads. In this paper, I compare the political identities, values, and interests of social media users in Russia who hold a strong position for or against the war in Ukraine. I collect data from VK, the most popular Russian social media platform, and analyze self-filled profile information as well as the groups that the users subscribed to. I found that proponents of the war tend to have a weaker political identity (self-identified as "moderate") compared to opponents, who specify it more precisely (often, but not limited to, "liberal"). Additionally, the values of the proponents more frequently align with those promoted by the Russian government, such as orthodoxy and family. 
Despite these differences, pro-war and anti-war users share many common interests, as evidenced by their subscriptions to the same groups focused on music, history, and sport. When asked to state the most important trait in people (a field users can fill in VK), the most frequent answer for both groups is "kindness and honesty". The analysis results, in addition to contributing to the understanding of public opinion in Russia, can be utilized for predicting one's position on the war based on their social media profile.	翻訳日:2023-08-10 16:40:41 公開日:2023-08-08
# 量子計測理論における正準占有状態(マクロ)のエントロピー Entropy of the Canonical Occupancy (Macro) State in the Quantum Measurement Theory ( http://arxiv.org/abs/2308.04472v1 ) ライセンス: Link先を確認	Arnaldo Spalvieri	(参考訳) 本論文は, 平衡における不連続粒子の任意の数からなる系のエントロピーを解析し, エントロピーを位相空間表現ではなく, 系の量子状態の関数として定義する。我々の重要な観察は、系のエントロピーが、系の粒子に許される量子状態のランダム占有数のシャノンエントロピーであるということである。我々は、Jaynesの最大エントロピー原理に基づく情報理論的アプローチと、現代の量子熱力学における標準的典型性をもたらす経験的アプローチを考える。情報理論のアプローチでは、粒子の量子状態の占有数は多変量分布であり、経験的アプローチではその分布は多変量ハイパー幾何学である。経験的確率のサンプルの数が無限大になる傾向があるため、多変量超幾何分布は多項分布に傾向がある。これにより、少なくとも極限では、2つのアプローチが和解する。量子計測の観点から考えると、本解析は最大エントロピーアプローチを特徴付ける有名な主観主義よりも、別の種類の主観主義の存在を示唆する。この主観性の形態は、情報理論と経験的アプローチの両方において、量子測定の後にエントロピーがゼロに崩壊する原因である。 The paper analyzes the entropy of a system composed by an arbitrary number of indistinguishable particles at the equilibrium, defining entropy as a function of the quantum state of the system, not of its phase space representation. Our crucial observation is that the entropy of the system is the Shannon entropy of the random occupancy numbers of the quantum states allowed to system's particles. We consider the information-theoretic approach, which is based on Jaynes' maximum entropy principle, and the empirical approach, which leads to canonical typicality in modern quantum thermodynamics. In the information-theoretic approach, the occupancy numbers of particles' quantum states are multinomially distributed, while in the empirical approach their distribution is multivariate hypergeometric. As the number of samples of the empirical probability tends to infinity, the multivariate hypergeometric distribution tends to the multinomial distribution. This reconciles, at least in the limit, the two approaches. When regarded from the perspective of quantum measurement, our analysis suggests the existence of another kind of subjectivism than the well-known subjectivism that characterizes the maximum entropy approach. 
This form of subjectivity is responsible for the collapse of entropy to zero after the quantum measurement, both in the information-theoretic and in the empirical approaches.	翻訳日:2023-08-10 16:40:15 公開日:2023-08-08
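The limiting statement in the abstract above (multivariate hypergeometric tending to multinomial) can be written out explicitly. The notation is introduced here for illustration and is not taken from the paper: $K$ quantum states, $n_k$ the occupancy number of state $k$ among $n$ particles, and $N_k$ the count of state $k$ among $N$ empirical samples.

```latex
% Empirical approach: occupancy numbers are multivariate hypergeometric
\[
P_{\mathrm{hyp}}(n_1,\dots,n_K)
  = \frac{\prod_{k=1}^{K}\binom{N_k}{n_k}}{\binom{N}{n}},
\qquad \sum_{k=1}^{K} n_k = n,\quad \sum_{k=1}^{K} N_k = N .
\]
% Limit N -> infinity with N_k / N -> p_k and n fixed: multinomial distribution
\[
P_{\mathrm{hyp}}(n_1,\dots,n_K)\;\longrightarrow\;
P_{\mathrm{mult}}(n_1,\dots,n_K)
  = \frac{n!}{\prod_{k=1}^{K} n_k!}\,\prod_{k=1}^{K} p_k^{\,n_k}.
\]
% Entropy of the occupancy macrostate: Shannon entropy of the occupancy numbers
\[
H = -\sum_{n_1,\dots,n_K} P(n_1,\dots,n_K)\,\log P(n_1,\dots,n_K).
\]
```

In this form the reconciliation of the two approaches is simply the familiar fact that sampling without replacement from an ever larger pool becomes indistinguishable from sampling with replacement.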
# D-Score:フィルタープルーニングのためのシナプスにインスパイアされたアプローチ D-Score: A Synapse-Inspired Approach for Filter Pruning ( http://arxiv.org/abs/2308.04470v1 ) ライセンス: Link先を確認	Doyoung Park, Jinsoo Kim, Jina Nam, Jooyoung Chang, Sang Min Park	(参考訳) 本稿では,畳み込みニューラルネットワーク(CNN)におけるフィルタプルーニングにおける重要でないフィルタのランクを決定するための新しい側面を紹介する。ヒトシナプス系では、興奮性および抑制性神経伝達物質として知られる2つの重要なチャネルがあり、ニューロンから細胞にシグナルを伝達する。神経科学的な観点から、我々はシナプスにインスパイアされたフィルタプルーニング法、すなわちDynamic Score(D-Score)を提案する。 D-Scoreはフィルタにおける正と負の重みの独立した重要性を分析し、スコアを割り当てることによってその重要性をランク付けする。全体的なスコアが低く、ニューラルネットワークの精度への影響が小さいフィルタを刈り取る。 CIFAR-10 と ImageNet データセットを用いた実験結果は,精度の大幅な低下(Acc. Drop)を伴わずにFLOPsとパラメータ数(Params)を顕著に削減することで,提案手法の有効性を示している。 This paper introduces a new aspect for determining the rank of the unimportant filters for filter pruning on convolutional neural networks (CNNs). In the human synaptic system, there are two important channels known as excitatory and inhibitory neurotransmitters that transmit a signal from a neuron to a cell. Adopting the neuroscientific perspective, we propose a synapse-inspired filter pruning method, namely Dynamic Score (D-Score). D-Score analyzes the independent importance of positive and negative weights in the filters and ranks the independent importance by assigning scores. Filters having low overall scores, and thus low impact on the accuracy of neural networks, are pruned. The experimental results on CIFAR-10 and ImageNet datasets demonstrate the effectiveness of our proposed method by reducing notable amounts of FLOPs and parameters (Params) without a significant accuracy drop (Acc. Drop).	翻訳日:2023-08-10 16:39:56 公開日:2023-08-08
# 深層学習ニューラルネットワークによるメディカルクレームサービスに関する考察 Correlating Medi-Claim Service by Deep Learning Neural Networks ( http://arxiv.org/abs/2308.04469v1 ) ライセンス: Link先を確認	Jayanthi Vajiram, Negha Senthil, Nean Adhith.P	(参考訳) 医療保険金請求の不正は、患者、医師、診断センター、保険業者が関与する組織犯罪であり、常に監視しなければならない連鎖反応を形成する。このような不正行為は、被保険者と健康保険会社の双方の財政的成長に影響を及ぼす。畳み込みニューラルネットワークアーキテクチャは、回帰モデルの相関研究を通じて不正請求を検出するために使用され、異なる提供者による請求におけるマネーロンダリングの検出に役立つ。教師ありおよび教師なしの分類器は、不正請求と正当な請求を検出するために使用される。 Medical insurance claim fraud is an organized crime involving patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. Such fraud affects the financial growth of both insured people and health insurance companies. A convolutional neural network architecture is used to detect fraudulent claims through a correlation study of regression models, which helps to detect money laundering across claims submitted by different providers. Supervised and unsupervised classifiers are used to distinguish fraudulent from non-fraudulent claims.	翻訳日:2023-08-10 16:39:38 公開日:2023-08-08
# シーングラフを用いた3次元シーン拡散誘導 3D Scene Diffusion Guidance using Scene Graphs ( http://arxiv.org/abs/2308.04468v1 ) ライセンス: Link先を確認	Mohammad Naanaa, Katharina Schmid, Yinyu Nie	(参考訳) 高品質な3dシーンの合成は難しい課題である。拡散モデルは、3Dシーンを含む多様なデータを生成することを約束している。しかし、現在の手法は生成を制御するために直接テキスト埋め込みに依存しており、オブジェクト間の複雑な空間的関係の組み込みを制限している。シーングラフを用いた3次元シーン拡散誘導手法を提案する。シーングラフが提供する相対的空間情報を活用するために,我々は,ネットワーク内の関係グラフ畳み込みブロックを利用する。提案手法はシーン記述と生成シーンのアライメントを大幅に改善することを示す。 Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.	翻訳日:2023-08-10 16:39:29 公開日:2023-08-08
# バックドアクリティカルレイヤの毒殺によるバックドアフェデレート学習 Backdoor Federated Learning by Poisoning Backdoor-Critical Layers ( http://arxiv.org/abs/2308.04466v1 ) ライセンス: Link先を確認	Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, and Xu Yuan	(参考訳) フェデレートラーニング(FL)は、分散デバイス間の機密データに対する機械学習トレーニングを可能にするために広くデプロイされている。しかし、FLの分散学習パラダイムと不均一性は、バックドア攻撃の攻撃面をさらに拡張する。既存のFL攻撃と防衛方法は通常、モデル全体に焦点を当てる。いずれも、モデル脆弱性を支配しているバックドアクリティカル(BC)層の存在を認識していない。 bc層を攻撃することは、モデル全体を攻撃することと同等の効果をもたらすが、最先端の防御(sota)によって検出される可能性ははるかに低い。本稿では,攻撃者の視点からBC層を同定し,検証する一般のin-situアプローチを提案する。識別されたbc層に基づき、様々な防御戦略の下で攻撃効果とステルスネスの基本的なバランスを適応的に求める新しいバックドア攻撃手法を慎重に作成する。広範囲な実験によって、bc層対応のバックドア攻撃は7つのsota防御の下でflをうまくバックドアすることができ、悪意のあるクライアントはわずか10%であり、最新のバックドア攻撃方法よりも優れています。 Federated learning (FL) has been widely deployed to enable machine learning training on sensitive data across distributed devices. However, the decentralized learning paradigm and heterogeneity of FL further extend the attack surface for backdoor attacks. Existing FL attack and defense methodologies typically focus on the whole model. None of them recognizes the existence of backdoor-critical (BC) layers-a small subset of layers that dominate the model vulnerabilities. Attacking the BC layers achieves equivalent effects as attacking the whole model but at a far smaller chance of being detected by state-of-the-art (SOTA) defenses. This paper proposes a general in-situ approach that identifies and verifies BC layers from the perspective of attackers. Based on the identified BC layers, we carefully craft a new backdoor attack methodology that adaptively seeks a fundamental balance between attacking effects and stealthiness under various defense strategies. Extensive experiments show that our BC layer-aware backdoor attacks can successfully backdoor FL under seven SOTA defenses with only 10% malicious clients and outperform the latest backdoor attack methods.	翻訳日:2023-08-10 16:39:22 公開日:2023-08-08
# 強化学習型筋制御器による人的バランスのキャラクタリゼーション Characterization of Human Balance through a Reinforcement Learning-based Muscle Controller ( http://arxiv.org/abs/2308.04462v1 ) ライセンス: Link先を確認	Kübra Akbaş, Carlotta Mummolo, Xianlian Zhou	(参考訳) 身体リハビリテーション中のバランスアセスメントは、しばしば患者の身体能力を評価するためにルーブリック指向のバッテリーテストに依存し、主観性につながる。いくつかの客観的バランス評価は存在するが、身体全体の姿勢安定性を完全に把握しない圧力中心(COP)の追跡に限られることが多い。本研究は, 重心(COM)状態空間の利用について検討し, ヒトのバランス能力を監視するための有望な道を示す。我々は、バランスコントローラと統合された筋骨格モデルを用いて、強化学習(RL)を通して訓練し、バランス機能を調べる。 RLフレームワークは、それぞれバランス回復と筋肉調整を管理する2つの相互接続ニューラルネットワークで構成され、PPO(Proximal Policy Optimization)を使用して、参照状態の初期化、早期終了、複数のトレーニング戦略とともにトレーニングされる。トレーニングされたコントローラに対するランダムな初期COM状態(位置と速度)空間からの回復を探索することにより、成功したバランス回復軌道を囲む最終的なBRを得る。線形倒立振子モデルによる解析的姿勢安定性限界と比較すると, 成功するCOM状態は同様の傾向を示すが, 回復可能な領域はより限定的である。さらに,BRに対する筋力低下と神経興奮遅延の影響について検討し,異なる領域におけるバランス能力の低下を明らかにした。全体として, 筋力バランス制御系を学習するアプローチは, バランス回復限界の確立と2足歩行系, 特にヒトにおけるバランス能力の客観的評価に有望な新しい方法を提案する。 Balance assessment during physical rehabilitation often relies on rubric-oriented battery tests to score a patient's physical capabilities, leading to subjectivity. While some objective balance assessments exist, they are often limited to tracking the center of pressure (COP), which does not fully capture the whole-body postural stability. This study explores the use of the center of mass (COM) state space and presents a promising avenue for monitoring the balance capabilities in humans. We employ a musculoskeletal model integrated with a balance controller, trained through reinforcement learning (RL), to investigate balancing capabilities. The RL framework consists of two interconnected neural networks governing balance recovery and muscle coordination respectively, trained using Proximal Policy Optimization (PPO) with reference state initialization, early termination, and multiple training strategies. 
By exploring recovery from random initial COM states (position and velocity) space for a trained controller, we obtain the final BR enclosing successful balance recovery trajectories. Comparing the BRs with analytical postural stability limits from a linear inverted pendulum model, we observe a similar trend in successful COM states but more limited ranges in the recoverable areas. We further investigate the effect of muscle weakness and neural excitation delay on the BRs, revealing reduced balancing capability in different regions. Overall, our approach of learning muscular balance controllers presents a promising new method for establishing balance recovery limits and objectively assessing balance capability in bipedal systems, particularly in humans.	翻訳日:2023-08-10 16:39:02 公開日:2023-08-08
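The abstract above compares learned recovery regions with analytical stability limits from a linear inverted pendulum (LIP) model but does not state which limit is used. As a hedged illustration only, the sketch below uses one well-known LIP criterion, the extrapolated center of mass (also called the capture point), which is an assumption here and not necessarily the paper's formulation. The function names and the `heel`/`toe` bounds of the base of support are hypothetical.

```python
import math


def extrapolated_com(x: float, v: float, leg_length: float, g: float = 9.81) -> float:
    """Extrapolated COM position x + v / omega0 for a linear inverted pendulum."""
    omega0 = math.sqrt(g / leg_length)  # natural frequency of the pendulum
    return x + v / omega0


def is_recoverable(x: float, v: float, heel: float, toe: float,
                   leg_length: float, g: float = 9.81) -> bool:
    """True if the extrapolated COM lies inside the base of support [heel, toe]."""
    return heel <= extrapolated_com(x, v, leg_length, g) <= toe
```

Sweeping `x` and `v` over a grid and recording where `is_recoverable` holds gives a simple analytical region in the COM state space, the kind of reference against which a learned controller's recoverable states could be compared.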
# 会話型マルチモーダル感情認識におけるモーダリティとコンテキストに関する再検討 Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition ( http://arxiv.org/abs/2308.04502v1 ) ライセンス: Link先を確認	Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li	(参考訳) 会話におけるマルチモーダル感情分析(MM-ERC)の課題である対話シナリオ下で、機械が人間の感情を多モーダルな文脈で理解できるようにするためのホットな研究テーマである。 MM-ERCは近年,タスク性能向上のための多種多様な手法が提案されている。 MM-ERCを標準マルチモーダル分類問題として扱い,特徴量最大化のためのマルチモーダル特徴分散と融合を行う。しかし,MM-ERCの特徴を再考した結果,特徴の多相性と会話の文脈化は,特徴の絡み合いや融合の段階において同時にモデル化されるべきである,と論じている。本研究では、上記の知見を十分に考慮し、タスクパフォーマンスのさらなる向上を目標としている。一方,特徴の絡み合いにおいては,コントラスト学習手法に基づき,特徴をモダリティ空間と発話空間の両方に分離するddm(d-level disentanglement mechanism)を考案する。一方,機能融合の段階では,マルチモーダルとコンテキスト統合のための貢献・認識融合機構(cfm)とコンテキスト再融合機構(crm)を提案する。それらは、マルチモーダル機能とコンテキスト機能の適切な統合をスケジュールする。具体的には、CFMは動的にマルチモーダル機能のコントリビューションを管理し、CRMは対話コンテキストの導入を柔軟に調整する。 2つの公開MM-ERCデータセット上で,本システムは新しい最先端性能を一貫して達成する。さらに,マルチモーダルとコンテキスト機能を適応的に活用することにより,提案手法はすべてmm-ercタスクを大いに促進することを示す。提案手法は,より広い範囲の対話型マルチモーダルタスクを実現するための大きな可能性を秘めている。 It has been a hot research topic to enable machines to understand human emotions in multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion analysis in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, where a diverse range of methods has been proposed for securing better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion for maximizing feature utility. Yet after revisiting the characteristic of MM-ERC, we argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we target further pushing the task performance by taking full consideration of the above insights. 
On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM) to decouple the features into both the modality space and utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. They together schedule the proper integrations of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system achieves new state-of-the-art performance consistently. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods have the great potential to facilitate a broader range of other conversational multimodal tasks.	翻訳日:2023-08-10 16:32:32 公開日:2023-08-08
# 量子部分情報分解 Quantum Partial Information Decomposition ( http://arxiv.org/abs/2308.04499v1 ) ライセンス: Link先を確認	S.J. van Enk	(参考訳) 部分情報分解 (Partial Information Decomposition, PID) は、情報2変数$A,B$が持つ第3変数$T$を、一意、共有(または冗長)、相乗的情報という別の部分に分解するシャノンの理論の一歩を踏み出したものである。ここでは、これらの概念を量子的に定義する方法を示す。我々は、量子論的記述が生産的であることが証明された量子多体系のスクランブルに量子PIDを適用した。特に特異な情報は、いわゆる三情報よりもスクランブルの詳細な記述を提供する。 The Partial Information Decomposition (PID) takes one step beyond Shannon's theory in decomposing the information two variables $A,B$ possess about a third variable $T$ into distinct parts: unique, shared (or redundant) and synergistic information. Here we show how these concepts can be defined in a quantum setting. We apply a quantum PID to scrambling in quantum many-body systems, for which a quantum-theoretic description has been proven productive. Unique information in particular provides a finer description of scrambling than does the so-called tri-information.	翻訳日:2023-08-10 16:32:01 公開日:2023-08-08
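For orientation, the classical decomposition that the abstract above generalizes can be summarized as follows. The symbols $U_A$, $U_B$, $R$, $S$ (unique, redundant, synergistic parts) are notation chosen here, and the equations describe the standard classical PID bookkeeping rather than the paper's quantum definitions.

```latex
% Classical PID bookkeeping for two sources A, B and a target T
\begin{align}
I(T;A,B) &= U_A + U_B + R + S, \\
I(T;A)   &= U_A + R, \\
I(T;B)   &= U_B + R .
\end{align}
% Consequence: the co-information only sees the difference R - S
\[
I(T;A) + I(T;B) - I(T;A,B) = R - S .
\]
```

These are three equations in four unknowns, so one extra definition (for example, of the redundant or of the unique part) is needed to fix the decomposition. The last line also suggests why a single combination of mutual informations such as the tri-information, which with the usual sign convention equals this difference, carries less detail than the unique information itself.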
# DialogRE^C+: 対話における関係抽出に共参照がどの程度役立つかを調べるためのDialogREの拡張 DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs ( http://arxiv.org/abs/2308.04498v1 ) ライセンス: Link先を確認	Yiyun Xiong, Mengwei Dai, Fei Li, Hao Fei, Bobo Li, Shengqiong Wu, Donghong Ji, Chong Teng	(参考訳) 対話テキスト中の引数ペア間の関係を識別する対話関係抽出(DRE)は、人称代名詞の頻繁な出現、すなわちエンティティと話者の共参照に大きく悩まされる。本稿では、新しいベンチマークデータセットDialogRE^C+を導入し、DREシナリオに共参照解析を導入する。高品質な共参照知識の活用により、引数間の関係の推論が強化されることが期待される。 DialogRE^C+データセットでは、既存のDialogREデータに基づいて、36,369個の引数言及にわたる合計5,068個の共参照チェーンに手動で注釈を付け、話者チェーン、人物チェーン、場所チェーン、組織チェーンという4種類の共参照チェーンを明示的にマークする。さらに、4つの共参照強化グラフベースDREモデルを開発し、DREタスクを改善するための効果的な共参照表現を学習する。また、アノテーションに基づいた共参照解析モデルをトレーニングし、自動抽出された共参照チェーンの効果を評価することで、データセットの実用性とその他のドメインやタスクへの可能性を示す。 Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers greatly from the frequent occurrence of personal pronouns, or entity and speaker coreference. This work introduces a new benchmark dataset, DialogRE^C+, introducing coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning of argument relations is expected to be enhanced. In the DialogRE^C+ dataset, we manually annotate a total of 5,068 coreference chains over 36,369 argument mentions based on the existing DialogRE data, where four different coreference chain types, namely speaker chain, person chain, location chain and organization chain, are explicitly marked. We further develop 4 coreference-enhanced graph-based DRE models, which learn effective coreference representations for improving the DRE task. We also train a coreference resolution model based on our annotations and evaluate the effect of automatically extracted coreference chains, demonstrating the practicality of our dataset and its potential for other domains and tasks.	翻訳日:2023-08-10 16:31:49 公開日:2023-08-08
# 非エルミート準結晶中の相関粒子の相転移と凝集 Phase transitions and bunching of correlated particles in a non-Hermitian quasicrystal ( http://arxiv.org/abs/2308.04495v1 ) ライセンス: Link先を確認	Stefano Longhi	(参考訳) 非エルミート準結晶中の非相互作用粒子は、点ギャップトポロジーによって特徴づけられる、複素エネルギー平面における局在-非局在転移とスペクトル相転移を示す。ここでは,複素位相をもつ非整合な正弦波ポテンシャル中の有効ハバードモデルで記述される,非エルミート準結晶中の2つの相互作用粒子のスペクトルおよび動的特徴について検討し,エルミート系には対応物のないいくつかの興味深い効果を解き明かす。粒子相互作用によって相関ホッピングが実効的に減少するため、ダブロン状態、すなわち粒子の束縛状態は、単一粒子状態よりもスペクトル転移および局在-非局在転移のしきい値がはるかに低く、その結果として移動度端が出現する。顕著なことに、ダブロンは寿命が長いため、最初に離れたサイトに置かれた2つの粒子は集まって結合し、時間発展の長時間極限においてダブロン状態を形成する傾向にあり、これは「非エルミート粒子バンチング」と呼べる現象である。 Non-interacting particles in non-Hermitian quasicrystals display localization-delocalization and spectral phase transitions in the complex energy plane, which can be characterized by point-gap topology. Here we investigate the spectral and dynamical features of two interacting particles in a non-Hermitian quasicrystal, described by an effective Hubbard model in an incommensurate sinusoidal potential with a complex phase, and unravel some intriguing effects without any Hermitian counterpart. Owing to the effective decrease of correlated hopping introduced by particle interaction, doublon states, i.e. bound particle states, display a much lower threshold for spectral and localization-delocalization transitions than single-particle states, leading to the emergence of mobility edges. Remarkably, since doublons display longer lifetimes, two particles initially placed in distant sites tend to bunch and stick together, forming a doublon state in the long time limit of evolution, a phenomenon that can be dubbed "non-Hermitian particle bunching".	翻訳日:2023-08-10 16:31:28 公開日:2023-08-08
# 波動関数分岐:混合状態から純粋な状態を区別できない場合 Wavefunction branching: when you can't tell pure states from mixed states ( http://arxiv.org/abs/2308.04494v1 ) ライセンス: Link先を確認	Jordan K. Taylor, Ian P. McCulloch	(参考訳) 本稿では、時間的進化の下でも対応する混合状態と区別できない量子重ね合わせの波動関数"分岐"の定義を提案する。我々の定義は解釈から大きく独立しており、枝を区別するよりも多くの局所ゲートを交換する必要がある。そのような分岐分解を認める状態のいくつかの例を示す。本定義では, 枝間の相対位相情報取得の試みは, 頻繁な能動的誤り訂正を行わずに失敗し, 枝はよい誤り訂正符号とは事実上逆であり, 枝は自然進化下の時間に, 枝はより分離して成長し, 枝は空間的絡み合いを吸収し, 枝は保存量の存在下では強く, 分岐は効果的な非可逆性をもたらすことを示した。多体量子状態におけるこれらの分岐分解の同定は、古典性の出現に光を当て、量子/古典境界での実験的実験のためのメトリックを提供し、より長い時間発展シミュレーションを可能にする。本研究は, 環境・環境の明確な分割のない状況に対する, 環境に起因したデコヒーレンスの基本概念の一般化であると考えている。 We propose a definition of wavefunction "branchings": quantum superpositions which can't be feasibly distinguished from the corresponding mixed state, even under time evolution. Our definition is largely independent of interpretations, requiring only that it takes many more local gates to swap branches than to distinguish them. We give several examples of states admitting such branch decompositions. Under our definition, we show that attempts to get relative-phase information between branches will fail without frequent active error correction, that branches are effectively the opposite of good error-correcting codes, that branches effectively only grow further apart in time under natural evolution, that branches tend to absorb spatial entanglement, that branching is stronger in the presence of conserved quantities, and that branching implies effective irreversibility. Identifying these branch decompositions in many-body quantum states could shed light on the emergence of classicality, provide a metric for experimental tests at the quantum/ classical boundary, and allow for longer numerical time evolution simulations. We see this work as a generalization of the basic ideas of environmentally-induced decoherence to situations with no clear system/ environment split.	翻訳日:2023-08-10 16:31:07 公開日:2023-08-08
# ユナリ方式のフォトニックコンピューティングチップと敵対的生成学習による効率的なオプション価格付け Efficient option pricing with unary-based photonic computing chip and generative adversarial learning ( http://arxiv.org/abs/2308.04493v1 ) ライセンス: Link先を確認	Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu	(参考訳) 現代の金融産業システムでは、製品の構造がますます複雑になってきており、古典的コンピューティングパワーのボトルネックの制約は金融産業の発展を既に制限している。本稿では,古典モンテカルロ法と比較して2次高速化を実現するために,量子振幅推定アルゴリズムと組み合わせて,ヨーロピアンオプションの価格付けに対するユナリ方式を実装したフォトニックチップを提案する。回路は、資産価格の分布をロードするモジュール、期待されるペイオフを計算するモジュール、スピードアップを導入する量子振幅推定アルゴリズムを実行するモジュールの3つのモジュールで構成される。分布モジュールでは、資産分布の効率的な学習とロードのために敵対的生成ネットワークが組み込まれ、市場動向を正確に把握する。この研究は金融分野のアプリケーション向けの特殊なフォトニックプロセッサの開発における一歩であり、金融サービスの効率と品質を向上させる可能性を秘めている。 In the modern financial industry system, the structure of products has become more and more complex, and the bottleneck constraint of classical computing power has already restricted the development of the financial industry. Here, we present a photonic chip that implements the unary approach to European option pricing, in combination with the quantum amplitude estimation algorithm, to achieve a quadratic speedup compared to classical Monte Carlo methods. The circuit consists of three modules: a module loading the distribution of asset prices, a module computing the expected payoff, and a module performing the quantum amplitude estimation algorithm to introduce speed-ups. In the distribution module, a generative adversarial network is embedded for efficient learning and loading of asset distributions, which precisely capture the market trends. This work is a step forward in the development of specialized photonic processors for applications in finance, with the potential to improve the efficiency and quality of financial services.	翻訳日:2023-08-10 16:30:44 公開日:2023-08-08
# アラビア語文法誤り訂正のためのChatGPT ChatGPT for Arabic Grammatical Error Correction ( http://arxiv.org/abs/2308.04492v1 ) ライセンス: Link先を確認	Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoud, Muhammad Abdul-Mageed	(参考訳) 近年,人間の指示に追従するように微調整された大規模言語モデル (LLM) は,様々な英語NLPタスクにおいて重要な機能を示している。しかし、文法的誤り訂正(GEC)タスクにおけるそれらの性能は、特に非英語言語では明らかに未解明のままである。本稿では,アラビア語の豊富な形態が原因で複雑化した課題である,アラビア語 GEC における指示微調整された LLM の能力について検討する。この結果から, 各種プロンプト法と (文脈内) 少数ショット学習の併用により高い効果が得られ, GPT-4 はエキスパート・プロンプトで最大 $65.49$ F$_{1}$ のスコア(確立したベースラインより約 $5$ ポイント高い)を達成したことが示唆された。これは低リソース環境でのLLMの可能性を強調し、モデルトレーニングに有用な合成データを生成するための実行可能なアプローチを提供する。これらの肯定的な結果にもかかわらず、指示微調整モデルは、そのサイズに関わらず、かなり小さいサイズの完全微調整モデルに比べて、著しく性能が劣ることがわかった。この格差は、LLMの大幅な改善の余地を浮き彫りにする。また,低リソース機械翻訳の手法に触発されて,合成データを利用し,2つの標準アラビア語ベンチマークで従来のモデルを大きく上回る手法を開発した。我々の研究は、2014年と2015年のQALBデータセットで、それぞれ $72.19\%$ と $73.26$ F$_{1}$ の新たな SoTA をアラビア語 GEC 向けに設定している。 Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into the abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F$_{1}$ score under expert prompting (approximately $5$ points higher than our established baseline). This highlights the potential of LLMs in low-resource settings, offering a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes. This disparity highlights substantial room for improvement for LLMs.
Inspired by methods from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets a new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively.	翻訳日:2023-08-10 16:30:28 公開日:2023-08-08
# (1+1)Dハミルトニアン・ハードコア格子QCDにおけるハドロン Hadrons in (1+1)D Hamiltonian hardcore lattice QCD ( http://arxiv.org/abs/2308.04488v1 ) ライセンス: Link先を確認	Marco Rigobello, Giuseppe Magnifico, Pietro Silvi, Simone Montangero	(参考訳) 本研究では, (1+1)D にハードコアグルーオンを持つ2-フレーバーハミルトニアン格子 QCD を, 行列積状態を用いてゼロ密度および有限密度で検討した。ゲージ冗長性が存在しない理論を定式化し、ゲージ不変テンソルネットワーク ansatz を構成する。モデルがパラメータ空間の拡張部分領域において臨界であることを示し、少なくとも2つの異なる相を同定し、そのうちの1つは連続極限の位置を含む。我々は各相における粒子スペクトルのサブセットを再構成し、エッジとバルクのギャップレスモードを同定する。したがって、研究したモデルは、(3+1)D QCDの既知の現象を再現しながら、最小の SU(3) ゲージ理論を提供することを示した。最も注目すべきは、その粒子スペクトルに荷電パイ中間子が現れることである。 We study 2-flavor Hamiltonian lattice QCD in (1+1)D with hardcore gluons, at zero and finite density, by means of matrix product states. We introduce a formulation of the theory where gauge redundancy is absent and construct a gauge invariant tensor network ansatz. We show that the model is critical in an extended subregion of parameter space and identify at least two distinct phases, one of which embeds the continuum limit location. We reconstruct a subset of the particle spectrum in each phase, identifying edge and bulk gapless modes. We thereby show that the studied model provides a minimal SU(3) gauge theory whilst reproducing known phenomena of (3+1)D QCD. Most notably, its particle spectrum features charged pions.	翻訳日:2023-08-10 16:30:00 公開日:2023-08-08
# デジタル量子コンピュータにおける基底状態準備のためのスケーラブル回路:100Qubit上のSchwinger Model Vacuum Scalable Circuits for Preparing Ground States on Digital Quantum Computers: The Schwinger Model Vacuum on 100 Qubits ( http://arxiv.org/abs/2308.04481v1 ) ライセンス: Link先を確認	Roland C. Farrell, Marc Illa, Anthony N. Ciavarella, Martin J. Savage	(参考訳) 格子シュウィンガーモデルの真空は、最大100キュービットのibmのイーグルプロセッサ量子コンピュータで用意されている。量子コンピュータ上でガッピング変換不変システムの基底状態を生成する新しいアルゴリズムを提案し,スケーラブル回路adapt-vqe (sc-adapt-vqe) と呼ぶ。このアルゴリズムは、ADAPT-VQEとともに、基底状態の遠い領域間の相関関係の指数的減衰を利用して、任意に大きなシステムにスケールできる状態準備のための量子回路を構築する。 SC-ADAPT-VQEはシュウィンガーモデルに適用され、回路深さと指数的に収束する精度で体系的に即効性を示す。回路の構造と準備された波動関数の偏差の両方が、空間的位置の個数($L$)に依存しないことが分かる。これにより、小さいまたは小さめのシステムを用いて決定される回路の制御された外挿が可能となり、任意に$l$となる。シュウィンガーモデルの回路は、カイスキットの古典的シミュレータによる格子上で決定され、その後、IBMの超伝導量子コンピュータ ibm_brisbane と ibm_cusco 上の$L=50$ (100 qubits) 真空を準備するためにスケールアップされた。演算子デコヒーレンス再正規化(Operator Decoherence Renormalization)と呼ばれる改良された誤り軽減手法を適用した後, 量子コンピュータから得られたカイラル縮合および電荷電荷相関器は, 古典的行列積状態シミュレーションとよく一致していることがわかった。 The vacuum of the lattice Schwinger model is prepared on up to 100 qubits of IBM's Eagle-processor quantum computers. A new algorithm to prepare the ground state of a gapped translationally-invariant system on a quantum computer is presented, which we call Scalable Circuits ADAPT-VQE (SC-ADAPT-VQE). This algorithm uses the exponential decay of correlations between distant regions of the ground state, together with ADAPT-VQE, to construct quantum circuits for state preparation that can be scaled to arbitrarily large systems. SC-ADAPT-VQE is applied to the Schwinger model, and shown to be systematically improvable, with an accuracy that converges exponentially with circuit depth. Both the structure of the circuits and the deviations of prepared wavefunctions are found to become independent of the number of spatial sites, $L$. This allows for a controlled extrapolation of the circuits, determined using small or modest-sized systems, to arbitrarily large $L$. 
The circuits for the Schwinger model are determined on lattices up to $L=14$ (28 qubits) with the qiskit classical simulator, and subsequently scaled up to prepare the $L=50$ (100 qubits) vacuum on IBM's 127 superconducting-qubit quantum computers ibm_brisbane and ibm_cusco. After applying an improved error-mitigation technique, which we call Operator Decoherence Renormalization, the chiral condensate and charge-charge correlators obtained from the quantum computers are found to be in good agreement with classical Matrix Product State simulations.	翻訳日:2023-08-10 16:29:45 公開日:2023-08-08
# 10言語にわたるChatGPT 3.5を用いたコード生成の比較検討 A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages ( http://arxiv.org/abs/2308.04477v1 ) ライセンス: Link先を確認	Alessio Buscemi	(参考訳) LLM(Large Language Models)は、人工知能(AI)システムで、人間のものとよく似た言語を理解し生産するために、大規模なデータセットを使用して広範囲に訓練されている。これらのモデルは、いくつかの分野にわたる大学試験を成功させ、新しい問題に対処する機能コードを生成する能力のレベルに達している。本研究は,2022年11月にOpenAIがリリースしたLLMであるChatGPT 3.5の符号化能力について検討した。コードスニペットを作成する際のモデルのスキルは、10の異なるプログラミング言語と4つの異なるソフトウェアドメインで評価される。本研究から得られた知見に基づき, モデルの主な予期せぬ挙動と限界が同定された。本研究は,プログラミング言語の進化と技術産業における自動コード生成の意義を明らかにすることを目的としている。 Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training using large datasets in order to understand and produce language that closely resembles that of humans. These models have reached a level of proficiency where they are capable of successfully completing university exams across several disciplines and generating functional code to handle novel problems. This research investigates the coding proficiency of ChatGPT 3.5, a LLM released by OpenAI in November 2022, which has gained significant recognition for its impressive text generating and code creation capabilities. The skill of the model in creating code snippets is evaluated across 10 various programming languages and 4 different software domains. Based on the findings derived from this research, major unexpected behaviors and limitations of the model have been identified. This study aims to identify potential areas for development and examine the ramifications of automated code generation on the evolution of programming languages and on the tech industry.	翻訳日:2023-08-10 16:29:13 公開日:2023-08-08
# テキストに先駆けて:金融関係抽出のためのエンティティ前置詞の活用 Ahead of the Text: Leveraging Entity Preposition for Financial Relation Extraction ( http://arxiv.org/abs/2308.04534v1 ) ライセンス: Link先を確認	Stefan Pasch, Dimitrios Petridis	(参考訳) ACM KDF-SIGIR 2023コンペティションの文脈では、REFindと呼ばれる金融関係のデータセット上で、エンティティ関係タスクを実行する。私たちのトップパフォーマンスソリューションには、多段階のアプローチがありました。最初は、提供されたエンティティをテキスト内の対応する場所に挿入しました。その後,テキスト分類のためのトランスフォーマーベース言語モデルroberta-largeをラベル付きトレーニングセットを用いて微調整し,実体関係を予測する。最後に,モデルが生成する疑わしい予測を識別し処理するために,処理後フェーズを実装した。提案手法により,大会の公開リーダーボードにおいて,第1位にランクインした。 In the context of the ACM KDF-SIGIR 2023 competition, we undertook an entity relation task on a dataset of financial entity relations called REFind. Our top-performing solution involved a multi-step approach. Initially, we inserted the provided entities at their corresponding locations within the text. Subsequently, we fine-tuned the transformer-based language model roberta-large for text classification by utilizing a labeled training set to predict the entity relations. Lastly, we implemented a post-processing phase to identify and handle improbable predictions generated by the model. As a result of our methodology, we achieved the 1st place ranking on the competition's public leaderboard.	翻訳日:2023-08-10 16:21:37 公開日:2023-08-08
# スタイル変換による現代ペルシャカルペットマップの生成 Generating Modern Persian Carpet Map by Style-transfer ( http://arxiv.org/abs/2308.04529v1 ) ライセンス: Link先を確認	Dorsa Rahmatian, Monireh Moshavash, Mahdi Eftekhari, and Kamran Hoseinkhani	(参考訳) 現在、ディープニューラルネットワーク(DNN)の性能は様々な分野で証明されている。最も魅力的な応用の1つは芸術的なデザインを作ることである。芸術作品として知られるカーペットは、世界中の多くの愛好家がいる家の中で最も重要なアイテムの1つである。カーペットを作る第1段階は、地図を作成することであり、これは困難で時間がかかり、費用がかかる作業である。本研究の目的は,近代ペルシャカルペットマップの作成にDNNを使用することである。この目的を達成するために、3つの異なるDNNスタイルの転送手法を提案し、比較した。提案手法では,初期カーペットマップの作成にスタイルスワップ法を応用し,より多様なデザインを生成するため,クリップスワップ法,ガティ法,スタイルスワップ法を別々に使用する。また, カーペットマップの着色方法についても検討し, 導入した。設計した地図は, ユーザ評価の結果が生成したカーペットマップの人気を裏付けるアンケートの結果によって評価される。最終的に、カーペットマップの作成に初めてインテリジェントな手法が使用され、人間の介入を減らす。提案手法は,従来の手法よりも高速で多種多様なカーペットデザインを作成可能である。 Today, the great performance of Deep Neural Networks(DNN) has been proven in various fields. One of its most attractive applications is to produce artistic designs. A carpet that is known as a piece of art is one of the most important items in a house, which has many enthusiasts all over the world. The first stage of producing a carpet is to prepare its map, which is a difficult, time-consuming, and expensive task. In this research work, our purpose is to use DNN for generating a Modern Persian Carpet Map. To reach this aim, three different DNN style transfer methods are proposed and compared against each other. In the proposed methods, the Style-Swap method is utilized to create the initial carpet map, and in the following, to generate more diverse designs, methods Clip-Styler, Gatys, and Style-Swap are used separately. In addition, some methods are examined and introduced for coloring the produced carpet maps. The designed maps are evaluated via the results of filled questionnaires where the outcomes of user evaluations confirm the popularity of generated carpet maps. Eventually, for the first time, intelligent methods are used in producing carpet maps, and it reduces human intervention. 
The proposed methods can successfully produce diverse carpet designs, and at a higher speed than traditional ways.	翻訳日:2023-08-10 16:21:24 公開日:2023-08-08
# ドメイン適応としての教師なしcamouflaged object segmentation Unsupervised Camouflaged Object Segmentation as Domain Adaptation ( http://arxiv.org/abs/2308.04528v1 ) ライセンス: Link先を確認	Yi Zhang, Chengyi Wu	(参考訳) 人間のラベルがないため、教師なしのイメージセグメンテーションのための深層学習は依然として困難である。一般的なアイデアはセグメンテーションヘッドを訓練することであり、自己教師付きバックボーンの表現に基づいてピクセル単位で擬似ラベルを生成する。これにより、モデルパフォーマンスは、ターゲットデータセットの分布と事前トレーニングデータセット(例えば、ImageNet)の間の距離に大きく依存する。そこで本研究では,対象オブジェクトが共通に稀な属性,すなわちカモフラージュ(camouflage)を持つような,教師なしカモフラージュオブジェクトセグメンテーション(UCOS)の新たなタスクについて検討する。当然のことながら、最先端の教師なしモデルは、ジェネリックオブジェクトとカモフラーグオブジェクトのドメインギャップのため、UCOSの適応に苦慮している。この目的のために、UCOSをソースフリーな教師なしドメイン適応タスク(UCOS-DA)として定式化し、モデルトレーニングプロセス全体において、ソースラベルとターゲットラベルの両方が欠落している。具体的には、imagenetで事前学習された自己教師付き視覚トランスフォーマーからなるソースモデルを定義する。一方、対象領域は単純な線形層(すなわち、ターゲットモデル)とラベルなしのカモフラージュオブジェクトを含む。次に,強固な uco を実現するために,フォアグラウンド・バックグラウンド・コントラッシブな自己競合ドメイン適応のためのパイプラインを設計する。その結果,UCOSベンチマークにおける教師なしモデルと比較すると,教師付きCOSモデルの10分の1のスケールのトレーニングセットに対して,ベースラインモデルの方が優れたセグメンテーション性能が得られることがわかった。 Deep learning for unsupervised image segmentation remains challenging due to the absence of human labels. The common idea is to train a segmentation head, with the supervision of pixel-wise pseudo-labels generated based on the representation of self-supervised backbones. By doing so, the model performance depends much on the distance between the distributions of target datasets and the pre-training dataset (e.g., ImageNet). In this work, we investigate a new task, namely unsupervised camouflaged object segmentation (UCOS), where the target objects own a common rarely-seen attribute, i.e., camouflage. Unsurprisingly, we find that the state-of-the-art unsupervised models struggle in adapting UCOS, due to the domain gap between the properties of generic and camouflaged objects. To this end, we formulate the UCOS as a source-free unsupervised domain adaptation task (UCOS-DA), where both source labels and target labels are absent during the whole model training process. 
Specifically, we define a source model consisting of self-supervised vision transformers pre-trained on ImageNet. On the other hand, the target domain includes a simple linear layer (i.e., our target model) and unlabeled camouflaged objects. We then design a pipeline for foreground-background-contrastive self-adversarial domain adaptation, to achieve robust UCOS. As a result, our baseline model achieves superior segmentation performance when compared with competing unsupervised models on the UCOS benchmark, using a training set whose scale is only one tenth of that of its supervised COS counterpart.	翻訳日:2023-08-10 16:21:05 公開日:2023-08-08
# 超メトリック輪郭マップを用いた大規模マルチハイポテーゼ細胞追跡 Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps ( http://arxiv.org/abs/2308.04526v1 ) ライセンス: Link先を確認	Jordão Bragantini, Merlin Lange, Loïc Royer	(参考訳) 本稿では,セグメンテーション選択アプローチによる大規模3Dセル追跡手法について述べる。提案手法は, 大規模顕微鏡データセットにおけるセルの追跡に2つの面で有効である。 (i)テラバイト規模の3D+tデータセットに数百万のセグメンテーションインスタンスを含む問題を解くことができる。 (ii)蛍光顕微鏡の分野では乏しい3Dアノテーション付きデータを必要とする深層学習の有無にかかわらず、競争力のある結果が得られる。提案手法はセグメンテーション仮説の階層を用いてセルのトラックやセグメントを計算し,隣接フレーム間の重なりを最大化することにより互いに素なセグメントを選択する。本手法は, セル追跡チャレンジの3次元画像で最先端の結果を達成し, より高速な整数線形計画法の定式化を有することを示す。さらに,本フレームワークは柔軟で,既製のセルセグメンテーションモデルからのセグメンテーションをサポートし,それらを組み合わせることで追跡性を向上させる。コードはhttps://github.com/royerlab/ultrackで入手できる。 In this work, we describe a method for large-scale 3D cell-tracking through a segmentation selection approach. The proposed method is effective at tracking cells across large microscopy datasets on two fronts: (i) It can solve problems containing millions of segmentation instances in terabyte-scale 3D+t datasets; (ii) It achieves competitive results with or without deep learning, which requires 3D annotated data, that is scarce in the fluorescence microscopy field. The proposed method computes cell tracks and segments using a hierarchy of segmentation hypotheses and selects disjoint segments by maximizing the overlap between adjacent frames. We show that this method achieves state-of-the-art results in 3D images from the cell tracking challenge and has a faster integer linear programming formulation. Moreover, our framework is flexible and supports segmentations from off-the-shelf cell segmentation models and can combine them into an ensemble that improves tracking. The code is available at https://github.com/royerlab/ultrack.	翻訳日:2023-08-10 16:20:37 公開日:2023-08-08
# 参照誘導型DNA配列アライメントのための量子ゲートアルゴリズム Quantum gate algorithm for reference-guided DNA sequence alignment ( http://arxiv.org/abs/2308.04525v1 ) ライセンス: Link先を確認	G. D. Varsamis, I. G. Karafyllidis, K. M. Gilkes, U. Arranz, R. Martin-Cuevas, G. Calleja, P. Dimitrakis, P. Kolovos, R. Sandaltzopoulos, H. C. Jessen, J. Wong	(参考訳) 参照誘導DNAシークエンシングとアライメントは、計算分子生物学において重要なプロセスである。 DNAデータの量は急速に増加し、数百万のプライベートゲノムを再配列する必要がある間に新しいゲノムが配列されるのを待っている。それぞれのヒトゲノムは3.2B塩基対を持ち、それぞれに2ビットの情報を格納できるため、1つのヒトゲノムは6.4Bビットまたは約760MBの貯蔵を必要とする(National Institute of General Medical Sciences)。現在、ほとんどの強力なテンソル処理ユニットは、計算能力の大きな飛躍を必要とするDNAデータの量を扱うことができない。したがって、ゲノムデータ解析、特にDNA配列アライメントにおける量子コンピュータの有用性を調べることが重要である。量子コンピュータはDNAシークエンシングに関わり、当初は古典的なシステムの一部として、量子加速器として機能することが期待されている。利用可能な量子ビットの数は毎年増えており、将来の量子コンピュータは古典的な計算システムの代わりにdnaシーケンシングを行うことができる。ゲート型量子コンピューティングをモデルとした参照誘導型DNA配列アライメントのための新しい量子アルゴリズムを提案する。このアルゴリズムはスケーラブルで、既存の古典的なDNAシークエンシングシステムに統合することができ、計算エラーを制限するために意図的に構造化されている。量子アルゴリズムはIBM Quantumが提供する量子処理ユニットとシミュレータを用いてテストされており、その正確性が確認されている。 Reference-guided DNA sequencing and alignment is an important process in computational molecular biology. The amount of DNA data grows very fast, and many new genomes are waiting to be sequenced while millions of private genomes need to be re-sequenced. Each human genome has 3.2 B base pairs, and each one could be stored with 2 bits of information, so one human genome would take 6.4 B bits or about 760 MB of storage (National Institute of General Medical Sciences). Today most powerful tensor processing units cannot handle the volume of DNA data necessitating a major leap in computing power. It is, therefore, important to investigate the usefulness of quantum computers in genomic data analysis, especially in DNA sequence alignment. Quantum computers are expected to be involved in DNA sequencing, initially as parts of classical systems, acting as quantum accelerators. 
The number of available qubits is increasing annually, and future quantum computers could conduct DNA sequencing, taking the place of classical computing systems. We present a novel quantum algorithm for reference-guided DNA sequence alignment modeled with gate-based quantum computing. The algorithm is scalable, can be integrated into existing classical DNA sequencing systems and is intentionally structured to limit computational errors. The quantum algorithm has been tested using the quantum processing units and simulators provided by IBM Quantum, and its correctness has been confirmed.	翻訳日:2023-08-10 16:20:19 公開日:2023-08-08
# 多様なデータ型のステガナリシスのためのディープラーニング:レビュー Deep Learning for Diverse Data Types Steganalysis: A Review ( http://arxiv.org/abs/2308.04522v1 ) ライセンス: Link先を確認	Hamza Kheddar, Mustapha Hemis, Yassine Himeur, David Megías, Abbes Amira	(参考訳) ステガノグラフィーとステガナリシスは情報セキュリティの分野における2つの相互関係の側面である。ステガノグラフィーは通信を隠蔽しようとするが、ステガナリシスはそれらを見つけるか、可能であればそれらが含むデータを回収することを目的としている。ステガノグラフィーとステガナリシスは特に法執行機関から大きな関心を集めている。ステガノグラフィーは、多くの国で暗号が禁止または制限されているため、しばしばサイバー犯罪者やテロリストが犯罪証拠を所持している間に捕らえられるのを避けるために使用される。したがって、隠蔽情報を明らかにするための最先端技術に関する知識は、違法行為の暴露に不可欠である。ここ数年、多くの強固で信頼性の高いステガノグラフィーとステガナリシス技術が文献に紹介されている。本稿では,デジタルメディア内の隠れ情報を検出するための深層学習に基づくステガナリシス技術の概要について述べる。本論文は、画像、音声、ビデオを含む、ステガナリシスにおけるあらゆる種類のカバー媒体を扱い、最もよく使われているディープラーニング技術について論じる。さらに,より高度な深層学習技術である深層転移学習 (DTL) や深層強化学習 (DRL) をステガナリシスシステムの性能向上に活用することを検討した。本稿は,最近の研究におけるデータセットや評価指標を含む最近の研究の体系的レビューを提供する。また, DTLに基づくステガナリシス手法の詳細な解析と, 異なるデータセット上での性能について述べる。このレビューは、ディープラーニングに基づくステガナリシスの現状、課題、今後の研究方向性に関する議論で締めくくっている。 Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis is aimed to either find them or even, if possible, recover the data they contain. Steganography and steganalysis have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being captured while in possession of incriminating evidence, even encrypted, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media.
The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including data sets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different data sets. The review concludes with a discussion on the current state of deep learning-based steganalysis, challenges, and future research directions.	翻訳日:2023-08-10 16:19:56 公開日:2023-08-08
# Donkey文のためのDisCoCat DisCoCat for Donkey Sentences ( http://arxiv.org/abs/2308.04519v1 ) ライセンス: Link先を確認	Lachlan McPheat (University College London), Daphne Wang (University College London)	(参考訳) 我々は、Geachのドンキー文を構成的分布モデルで解析する方法を実証する。我々は、談話、決定子、相対代名詞をモデル化する拡張を含むDisCoCat(Distributional Compositional Categorical)フレームワークに関する以前の研究に基づいて構築する。関係空間意味論とベクトル空間意味論の両方を定義するロバ文を解析するための型論理構文を提案する。 We demonstrate how to parse Geach's Donkey sentences in a compositional distributional model of meaning. We build on previous work on the DisCoCat (Distributional Compositional Categorical) framework, including extensions that model discourse, determiners, and relative pronouns. We present a type-logical syntax for parsing donkey sentences, for which we define both relational and vector space semantics.	翻訳日:2023-08-10 16:19:30 公開日:2023-08-08
# 汎用AIによるラベルなし多視点3次元歩行者検出に向けて:技術と性能解析 Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis ( http://arxiv.org/abs/2308.04515v1 ) ライセンス: Link先を確認	João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb	(参考訳) 我々は、ラベルのないターゲットシーンにおける多視点3D歩行者検出を改善するために、一般化可能なAIをいかに活用できるかを明らかにした。新しいシーンへの一般化を促進する方法の1つは、ターゲットデータを自動的にラベル付けすることで、検出器モデルのトレーニングに使用できる。本研究では,教師付き検出器を用いた擬似ラベル付けと,未学習検出器を用いた自動ラベル付けの2つの手法について検討した。自動ラベリング手法を用いて検出器モデルを最適化するためのトレーニングフレームワークを採用する。このフレームワークは、異なるトレーニングセット/モードとマルチラウンドの自動ラベリング戦略を含んでいる。 WILDTRACKおよびMultiviewXデータセットについて解析を行った。学習されていない検出器に基づく自動ラベル付け手法を用いることで、学習されていない検出器や既存のラベル付きソースデータセットでトレーニングされた検出器を直接使用するよりも優れた結果が得られることを示す。ターゲットデータセットとしてWILDTRACKとMultiviewXを使用した場合、既存の最良のラベルなし手法よりもMODAがそれぞれ約4%、約1%向上した。 We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly-available WILDTRACK and MultiviewX datasets. We show that, by using the automatic labeling approach based on an untrained detector, we can obtain better results than directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieved a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively.	翻訳日:2023-08-10 16:19:22 公開日:2023-08-08
# MT-IceNet -北極海氷予測のための空間的・時間的深層学習モデル MT-IceNet -- A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting ( http://arxiv.org/abs/2308.04511v1 ) ライセンス: Link先を確認	Sahara Ali, Jianwu Wang	(参考訳) 北極圏の増幅は、気候パターンを地域的にも世界的にも変化させ、過去数十年で、より頻繁で激しい気象現象を引き起こした。北極圏の増幅の不可欠な部分は、衛星観測による前例のない海氷の喪失である。季節的から季節的スケールで北極海氷を正確に予測することは、根本的な課題を伴う主要な研究課題である。物理に基づく地球系のモデルに加えて、研究者は海氷予測に複数の統計モデルと機械学習モデルを適用している。海氷の変動を研究するためのデータ駆動型アプローチの可能性を検討するため,北極海氷濃度(SIC)予測のためのUNetに基づく空間的・時間的深層学習モデルMT-IceNetを提案する。このモデルはエンコーダ-デコーダアーキテクチャを使用し、スキップ接続と多時間入力ストリームを処理し、将来の時間ステップで空間マップを再生する。 1979-2021年、NSIDCから月毎・月毎の衛星海氷データと、ERA5の再分析製品から得られた大気および海洋の変数を用いて、提案モデルが、最先端の予測誤差を6ヵ月間最大60%減少させ、画素ごとのSIC予測に有望な予測性能を提供することを示した。 Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. 
Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.	翻訳日:2023-08-10 16:19:05 公開日:2023-08-08
# 2粒子非エルミートハバード模型におけるスペクトル構造とダブロン解離 Spectral structure and doublon dissociation in the two-particle non-Hermitian Hubbard model ( http://arxiv.org/abs/2308.04505v1 ) ライセンス: Link先を確認	Stefano Longhi	(参考訳) 非エルミート模型の強相関系は新興の研究領域である。ここでは、格子上の単一粒子ホッピング振幅が非相反な非エルミートハバードモデルを検討し、異なる境界条件下でのヒルベルト空間の2粒子セクターのスペクトル構造の厳密な解析的結果を提供する。この分析は、純粋に非エルミート的性質をもち、単粒子系で見られる通常のシナリオから逸脱する、興味深いスペクトル的および動的効果を明らかにする。具体的には、相互作用エネルギーが臨界値を超えて増大すると、無限格子上のMott-Hubbardバンドが複素エネルギー平面内で開ループから閉ループへとスペクトル相転移を起こすこと、および格子のバルクにおいてダブロンの動的解離(すなわち2粒子束縛状態の不安定性)が起こり、2つの粒子が格子端に達するとダブロン状態が突然復活することを予測する。格子のバルクで観測される粒子解離は、単一粒子状態と2粒子状態の異なる寿命から生じる非エルミート力学の明らかな顕在化であり、一方、境界におけるダブロン状態の突然の復活は、境界依存エネルギースペクトルを持つ非エルミート系に特有の顕著なエッジバースト動的効果であり、相関粒子に対してはここで初めて予測される。 Strongly-correlated systems in non-Hermitian models are an emergent area of research. Here we consider a non-Hermitian Hubbard model, where the single-particle hopping amplitudes on the lattice are not reciprocal, and provide exact analytical results of the spectral structure in the two-particle sector of Hilbert space under different boundary conditions. The analysis unveils some interesting spectral and dynamical effects of purely non-Hermitian nature and that deviate from the usual scenario found in the single-particle regime. Specifically, we predict a spectral phase transition of the Mott-Hubbard band on the infinite lattice as the interaction energy is increased above a critical value, from an open to a closed loop in complex energy plane, and the dynamical dissociation of doublons, i.e. instability of two-particle bound states, in the bulk of the lattice, with a sudden revival of the doublon state when the two particles reach the lattice edge.
Particle dissociation observed in the bulk of the lattice is a clear manifestation of non-Hermitian dynamics arising from the different lifetimes of single-particle and two-particle states, whereas the sudden revival of the doublon state at the boundaries is a striking burst edge dynamical effect peculiar to non-Hermitian systems with boundary-dependent energy spectra, here predicted for the first time for correlated particles.	翻訳日:2023-08-10 16:18:39 公開日:2023-08-08
# FakeからReal(FFR)へ:合成データによる素早い相関を緩和するための2段階トレーニングパイプライン From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data ( http://arxiv.org/abs/2308.04553v1 ) ライセンス: Link先を確認	Maan Qraitem, Kate Saenko, Bryan A. Plummer	(参考訳) 視覚認識モデルは、特定のグループ(女性)が特定のクラス(プログラマ)で不足している不均衡なトレーニングセットによって引き起こされる急激な相関を学習する傾向にある。生成モデルは、マイノリティサンプルの合成データを生成し、トレーニングセットのバランスをとることで、このバイアスを緩和する有望な方向を提供する。しかし、これらのアプローチを用いた以前の研究は、視覚認識モデルが実画像と合成画像の区別を学べることがしばしばあり、したがって元のデータセットのバイアスを解き放つことに失敗する可能性があることを見落としていた。本稿では,この問題を緩和する新たな2段階パイプラインを提案する。 1)バランスの取れた合成データセット上でモデルを事前訓練した後 2)実際のデータを微調整する。このパイプラインを使用することで,実データと合成データの両方のトレーニングを回避し,実データと合成データのバイアスを回避する。さらに,第1ステップではバイアスに対して頑健な特徴を学習し,第2ステップではバイアスを緩和する。さらに、当社のパイプラインはバイアス緩和手法と自然に統合され、微調整ステップに単純に適用することができます。実験により,3つの大規模データセット上での最先端性能を得るバイアス軽減手法の性能をさらに向上させることができた。 Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set where certain groups (e.g., Females) are under-represented in certain classes (e.g., Programmers). Generative models offer a promising direction in mitigating this bias by generating synthetic data for the minority samples and thus balancing the training set. However, prior work that uses these approaches overlooks that visual recognition models can often learn to differentiate between real and synthetic images and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue where 1) we pre-train a model on a balanced synthetic dataset and then 2) fine-tune on the real data. Using this pipeline, we avoid training on both real and synthetic data at the same time, thus avoiding the bias between real and synthetic data. Moreover, in the first step we learn features that are robust to the bias, which mitigates the bias in the second step. In addition, our pipeline naturally integrates with bias mitigation methods; they can simply be applied to the fine-tuning step. 
As our experiments show, our pipeline can further improve the performance of bias mitigation methods, obtaining state-of-the-art performance on three large-scale datasets.	翻訳日:2023-08-10 16:12:46 公開日:2023-08-08
# 自己教師付き事前訓練による雑音ラベルの医用画像分類の改善 Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining ( http://arxiv.org/abs/2308.04551v1 ) ライセンス: Link先を確認	Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian A. Linte	(参考訳) ノイズラベルが深層学習に基づく教師付き画像分類性能を損なうのは、モデルがノイズに過度に適合し、劣化した特徴抽出器を学習するためである。ノイズラベル付きデータを用いた自然画像分類訓練では,自己教師あり重みによるモデル初期化が特徴破壊の低減と分類性能の向上に寄与している。しかし、研究は行われていない。一プレテキストタスクベースの事前学習のような他の自己指導的アプローチが騒音ラベルによる学習に与える影響二騒々しいラベル設定の医用画像に対して単独の自己監督事前訓練方法医療画像は、しばしばより小さなデータセットと微妙なクラス間変異を特徴とし、正確な分類を保証するために人間の専門知識を必要とする。したがって、CIFARのような自然画像データセットにおけるノイズラベルによる学習を改善する手法が医療画像にも役立つかどうかは不明である。本研究は,NCT-CRC-HE-100K組織組織像とCOVID-QU-Ex胸部X線画像を用いた2つの医学データセットの深層学習分類モデルの重み付けを初期化するために,コントラッシブでプレトレーニングされたタスクベースの自己教師付きプレトレーニングについて検討する。その結果,自己教師付き学習から得られた事前学習重みで初期化したモデルでは,より優れた特徴を効果的に学習し,雑音ラベルに対する頑健性を向上させることができた。 Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact the learning with noisy label, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. 
In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.	翻訳日:2023-08-10 16:12:27 公開日:2023-08-08
# 意味認識時間蓄積によるprune時空間トークン Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation ( http://arxiv.org/abs/2308.04549v1 ) ライセンス: Link先を確認	Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian	(参考訳) トランスフォーマーは、その素晴らしい性能により、コンピュータビジョンコミュニティの主要なバックボーンとなっている。しかし、不都合な計算コストは、ビデオ認識領域におけるその可能性を妨げる。速度精度のトレードオフを最適化するために,時空間トークンを一体的にプルーピングするための意味認識時間蓄積スコア(sta)を提案する。 STAスコアは、時間的冗長性と意味的重要性の2つの重要な要因を考慮する。前者は連続するフレームでトークンとtokenの類似性を集約し、後者は全体的な予測への貢献に基づいて各トークンを評価することにより、新しい事象か見掛けられた実体かに基づいて、特定の領域を描写する。その結果、staの高いスコアを持つトークンは、より時間的冗長性を持ち、より低い意味論を持つため、刈り取られる。 STAスコアに基づいて、追加のパラメータを導入することなく、あるいはさらなる再トレーニングを必要とせずに、トークンを段階的にプルークすることができる。市販のvitおよびvideoswinバックボーンにstaモジュールを直接適用し,kinetics-400 および something-something v2 を用いた実験結果では,約0.2%の精度低下で30%以上削減できた。コードはhttps://github.com/Mark12Ding/STAで公開されている。 Transformers have become the primary backbone of the computer vision community due to their impressive performance. However, the unfriendly computation cost impedes their potential in the video recognition domain. To optimize the speed-accuracy trade-off, we propose Semantic-aware Temporal Accumulation score (STA) to prune spatio-temporal tokens integrally. STA score considers two critical factors: temporal redundancy and semantic importance. The former depicts a specific region based on whether it is a new occurrence or a seen entity by aggregating token-to-token similarity in consecutive frames while the latter evaluates each token based on its contribution to the overall prediction. As a result, tokens with higher scores of STA carry more temporal redundancy as well as lower semantics thus being pruned. Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training. 
We directly apply the STA module to off-the-shelf ViT and VideoSwin backbones; on Kinetics-400 and Something-Something V2, it achieves over 30% computation reduction with a negligible ~0.2% accuracy drop. The code is released at https://github.com/Mark12Ding/STA.	翻訳日:2023-08-10 16:12:07 公開日:2023-08-08
# イジングマシンを用いた化学反応ネットワークにおける最適経路の探索 Finding Optimal Pathways in Chemical Reaction Networks Using Ising Machines ( http://arxiv.org/abs/2308.04544v1 ) ライセンス: Link先を確認	Yuta Mizuno and Tamiki Komatsuzaki	(参考訳) 化学反応ネットワークにおける最適経路の発見は化学プロセスの解明と設計に不可欠であり、合成計画や代謝経路解析などの重要な応用がある。このような化学経路探索問題は制約付き組合せ最適化問題として定式化することができ、出発物質とターゲット物質を所定のネットワーク内で接続する化学反応の最適な組み合わせを見つけることを目的としている。組合せ爆発により、最適な経路を見つけるのに必要な計算時間はネットワークサイズによって指数関数的に増加する。量子アニーリングデバイスやシミュレーションアニーリングデバイスを含むイジングマシンは、このようなハードコンビネーション最適化に特化した新しいコンピュータを約束している。しかしながら、我々の知る限りでは、化学経路探索問題にイジングマシンを適用する試みはまだない。本稿では,化学経路探索問題に対する最初の ising/quantum 計算応用について述べる。化学経路フィニング問題から翻訳されたIsingモデルは、制約に違反するいくつかの種類のペナルティ項を含む。異なるタイプの適切なペナルティ強度を設定する方法が明確ではない。この課題に対処するために,パラメータチューニングにベイズ最適化を用いる。さらに,基礎となる問題構造に応じてペナルティ項をグループ化し,チューニング性能を向上させる手法を提案する。提案アルゴリズムの性能評価と解析は,D-Wave Advantageシステムとシミュレートアニーリングを用いて行った。ベンチマークの結果,最適な経路を見つける上での課題が明らかになった。同時に, コスト値の相対誤差がある程度許容できることを示すことにより, 最適経路の探索の可能性を示す。 Finding optimal pathways in chemical reaction networks is essential for elucidating and designing chemical processes, with significant applications such as synthesis planning and metabolic pathway analysis. Such a chemical pathway-finding problem can be formulated as a constrained combinatorial optimization problem, aiming to find an optimal combination of chemical reactions connecting starting materials to target materials in a given network. Due to combinatorial explosion, the computation time required to find an optimal pathway increases exponentially with the network size. Ising machines, including quantum and simulated annealing devices, are promising novel computers dedicated to such hard combinatorial optimization. However, to the best of our knowledge, there has yet to be an attempt to apply Ising machines to chemical pathway-finding problems. In this article, we present the first Ising/quantum computing application for chemical pathway-finding problems. 
The Ising model, translated from a chemical pathway-finding problem, involves several types of penalty terms for violating constraints. It is not obvious how to set appropriate penalty strengths of different types. To address this challenge, we employ Bayesian optimization for parameter tuning. Furthermore, we introduce a novel technique that enhances tuning performance by grouping penalty terms according to the underlying problem structure. The performance evaluation and analysis of the proposed algorithm were conducted using a D-Wave Advantage system and simulated annealing. The benchmark results reveal challenges in finding exact optimal pathways. Concurrently, the results indicate the feasibility of finding approximate optimal pathways, provided that a certain degree of relative error in cost value is acceptable.	翻訳日:2023-08-10 16:11:43 公開日:2023-08-08
# フォトニック量子極端学習機における実験的特性再構成 Experimental property-reconstruction in a photonic quantum extreme learning machine ( http://arxiv.org/abs/2308.04543v1 ) ライセンス: Link先を確認	Alessia Suprano, Danilo Zia, Luca Innocenti, Salvatore Lorenzo, Valeria Cimini, Taira Giordani, Ivan Palmisano, Emanuele Polino, Nicolò Spagnolo, Fabio Sciarrino, G. Massimo Palma, Alessandro Ferraro and Mauro Paternostro	(参考訳) 近年の発展により、量子状態の性質のキャラクタリゼーションを含む重要な問題に対処するために、実験プラットフォームに機械学習ツールを組み込むことが可能になった。これを利用して、光子の偏光状態の資源効率と正確な評価を実現するために、フォトニックプラットフォームに量子極端学習マシンを実装した。このような入力状態が進化する基盤となる貯留層ダイナミクスは、高次元フォトニック軌道角運動量の量子ウォークを用いて実装され、一定の基底で射影的測定を行う。本研究では, 未知の偏光状態の再構成が測定装置の注意深い特徴付けを必要とせず, 実験的な不完全性に対して堅牢であることを示す。 Recent developments have led to the possibility of embedding machine learning tools into experimental platforms to address key problems, including the characterization of the properties of quantum states. Leveraging this, we implement a quantum extreme learning machine in a photonic platform to achieve resource-efficient and accurate characterization of the polarization state of a photon. The underlying reservoir dynamics through which such an input state evolves is implemented using the coined quantum walk of high-dimensional photonic orbital angular momentum and by performing projective measurements over a fixed basis. We demonstrate that the reconstruction of an unknown polarization state does not need a careful characterization of the measurement apparatus and is robust to experimental imperfections, thus representing a promising route for resource-economic state characterization.	翻訳日:2023-08-10 16:11:20 公開日:2023-08-08
# yudo: 統一指向オブジェクト検出のためのyolo YUDO: YOLO for Uniform Directed Object Detection ( http://arxiv.org/abs/2308.04542v1 ) ライセンス: Link先を確認	Đorđe Nedeljković	(参考訳) 本稿では,その中心座標と方向角を予測し,有向物体を効率的に検出する手法を提案する。対象物のサイズは一様であるため,提案モデルは対象物の幅や高さを予測せずに動作する。この問題に使用されるデータセットは、Honeybee Segmentation and Tracking Datasetsプロジェクトで紹介されている。この研究の貢献の1つは、位置や方向を検出するためにyolov7のような標準リアルタイムオブジェクト検出アーキテクチャをカスタマイズする能力の検討である。このアプローチでは、非常に効率的で小さなバージョンのアーキテクチャが使用されます。さらに、アンカーのない3つの検出ヘッドのうち1つだけで十分である。また, 回転箱-方向iou (diriou) に対するskewiou(union over union)計算について, 絶対角度差を含む拡張スキュー交点を導入する。 DirIoUは、mAP計算のためのターゲットと予測バウンディングボックスのマッチング手順と、NMSフィルタリング手順の両方で使用される。コードとモデルはhttps://github.com/djordjened92/yudoで入手できる。 This paper presents an efficient way of detecting directed objects by predicting their center coordinates and direction angle. Since the objects are of uniform size, the proposed model works without predicting the object's width and height. The dataset used for this problem is presented in the Honeybee Segmentation and Tracking Datasets project. One of the contributions of this work is an examination of how well a standard real-time object detection architecture such as YOLOv7 can be customized for position and direction detection. A very efficient, tiny version of the architecture is used in this approach. Moreover, only one of the three detection heads, without anchors, is sufficient for this task. We also introduce directed IoU (DirIoU), an extension of the Skew Intersection over Union (SkewIoU) calculation for rotated boxes that includes an absolute angle difference. DirIoU is used both in the matching procedure of target and predicted bounding boxes for mAP calculation, and in the NMS filtering procedure. The code and models are available at https://github.com/djordjened92/yudo.	翻訳日:2023-08-10 16:11:05 公開日:2023-08-08
# ナノビーム中のシリコンt中心からの高効率単一光子放出 High-efficiency single photon emission from a silicon T-center in a nanobeam ( http://arxiv.org/abs/2308.04541v1 ) ライセンス: Link先を確認	Chang-Min Lee, Fariba Islam, Samuel Harper, Mustafa Atabey Buyukkaya, Daniel Higginbottom, Stephanie Simmons, Edo Waks	(参考訳) Siのカラーセンターは、全シリコンプラットフォームで長いコヒーレンス時間を持つ効率的な量子エミッタと量子メモリの両方として機能する可能性がある。様々な既知の色中心の中で、T中心は長いコヒーレンス時間を持つスピン基底状態を持つため、特定の約束を持っている。しかし、この色中心は長い励起状態の寿命を示し、光子放出速度が低く、高効率で光子放出を抽出する方法が必要となる。ナノビームを用いた単一T中心からの高効率単一光子放出を示す。ナノビームは、レンズファイバとよくマッチするモードにおいて効率的に光を放射し、t中心放射の70%以上を単一モードファイバに直接集めることができる。この効率により、T中心からのコヒーレントな放出を表すゼロフォノン線からの単一光子放出を直接示すことができる。この結果は、量子コンピューティングと量子ネットワークのためのシリコン集積スピン光子インタフェースへの重要な一歩である。 Color centers in Si could serve as both efficient quantum emitters and quantum memories with long coherence times in an all-silicon platform. Of the various known color centers, the T center holds particular promise because it possesses a spin ground state that has long coherence times. But this color center exhibits a long excited state lifetime which results in a low photon emission rate, requiring methods to extract photon emission with high efficiency. We demonstrate high-efficiency single photon emission from a single T center using a nanobeam. The nanobeam efficiently radiates light in a mode that is well-matched to a lensed fiber, enabling us to collect over 70% of the T center emission directly into a single mode fiber. This efficiency enables us to directly demonstrate single photon emission from the zero phonon line, which represents the coherent emission from the T center. Our results represent an important step towards silicon-integrated spin-photon interfaces for quantum computing and quantum networks.	翻訳日:2023-08-10 16:10:46 公開日:2023-08-08
# バイオインスパイアされたアーキテクチャを用いた連続学習タスクの性能向上 Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures ( http://arxiv.org/abs/2308.04539v1 ) ライセンス: Link先を確認	Sandeep Madireddy, Angel Yanguas-Gil, Prasanna Balaprakash	(参考訳) 破滅的な忘れることなく、入ってくるデータストリームから継続的に学習する能力は、インテリジェントなシステムを設計する上で重要である。継続的学習のための多くのアプローチは、確率的勾配降下とそのグローバルエラー更新を用いた変種に依存しているため、安定性、強欲、短期的なメモリ制限を回避するために、メモリバッファやリプレイのような戦略を採用する必要がある。この制限に対処するために,我々は,シナプス可塑性機構とニューロモジュレーションを組み込んだ,生物学的にインスパイアされた軽量ニューラルネットワークアーキテクチャを開発した。提案手法は,スプリット-MNIST,スプリット-CIFAR-10,スプリット-CIFAR-100データセットのオンライン連続学習性能を,他のメモリ制約学習手法と比較し,最先端のメモリ集約リプレイ方式と一致させる。さらに,鍵設計概念を他のバックプロパゲーションに基づく連続学習アルゴリズムに統合し,その精度を大幅に向上させることにより,提案手法の有効性を実証する。我々の結果は、生物学的原則を機械学習モデルに取り入れることの重要性を証明し、オンライン連続学習のためのより効率的で堅牢なシステムの設計にそれらをどのように活用できるかについての洞察を提供する。 The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. 
Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.	翻訳日:2023-08-10 16:10:31 公開日:2023-08-08
# マイクロ表現生成のための顔優先1次運動モデル Facial Prior Based First Order Motion Model for Micro-expression Generation ( http://arxiv.org/abs/2308.04536v1 ) ライセンス: Link先を確認	Yi Zhang, Youjun Zhao, Yuhang Wen, Zixuan Tang, Xinhua Xu, Mengyuan Liu	(参考訳) ビデオから顔のマイクロ表現を見つけると、臨床診断や尋問などの分野で様々な応用が考えられるが、トレーニングデータの規模が限られているため、この課題はまだ難しい。そこで本研究では,マイクロ圧縮生成と呼ばれる新しいタスクを定式化し,第1次動作モデルと顔の先行知識を組み合わせた強力なベースラインを提示する。対象の顔が与えられた場合、原動画の動きパターンに応じて、顔を動かしてマイクロ圧縮ビデオを生成する。具体的には、新しいモデルは3つのモジュールを含む。まず,領域集中モジュールから顔先行特徴を抽出する。第2に,動き予測モジュールを用いたキーポイントと局所アフィン変換を用いて顔の動きを推定する。第三に、表情生成モジュールはターゲットの顔を駆動してビデオを生成する。パブリックなcasme ii、samm、smicデータセットでモデルをトレーニングし、そのモデルを使って評価のために新しいマイクロ表現ビデオを生成します。本モデルは,顔マイクロ表現チャレンジ2021 (megc2021) において,顔動作符号化システム認定を受けた3人の専門家によって,優れた性能が検証される第1位となる。ソースコードはhttps://github.com/Necolizer/Facial-Prior-Based-FOMMで公開されている。 Spotting facial micro-expression from videos finds various potential applications in fields including clinical diagnosis and interrogation, meanwhile this task is still difficult due to the limited scale of training data. To solve this problem, this paper tries to formulate a new task called micro-expression generation and then presents a strong baseline which combines the first order motion model with facial prior knowledge. Given a target face, we intend to drive the face to generate micro-expression videos according to the motion patterns of source videos. Specifically, our new model involves three modules. First, we extract facial prior features from a region focusing module. Second, we estimate facial motion using key points and local affine transformations with a motion prediction module. Third, expression generation module is used to drive the target face to generate videos. We train our model on public CASME II, SAMM and SMIC datasets and then use the model to generate new micro-expression videos for evaluation. 
Our model achieved first place in the Facial Micro-Expression Challenge 2021 (MEGC2021), where its superior performance was verified by three experts with Facial Action Coding System certification. Source code is available at https://github.com/Necolizer/Facial-Prior-Based-FOMM.	翻訳日:2023-08-10 16:10:02 公開日:2023-08-08
# 航空ドローン画像を用いた災害現場のヒューマンコンディションの推定 Estimation of Human Condition at Disaster Site Using Aerial Drone Images ( http://arxiv.org/abs/2308.04535v1 ) ライセンス: Link先を確認	Tomoki Arai, Kenji Iwata, Kensho Hara, Yutaka Satoh	(参考訳) ドローンはさまざまな災害の状況を評価するために使われています。本研究では,災害現場の把握を迅速かつ省力化するために,航空ドローン画像の動作に基づいて,被害状況を自動的に推定する手法について検討する。都市部で発生した仮説的災害における人的行動の航空画像データセットを構築し,3D ResNetを用いて人的被害状況の分類を行った。その結果、人間の行動に特徴的な状態はリコール率80%以上で分類できるが、同様の行動を持つ他の状態はリコール率約50%でしか分類できないことが分かった。さらに、クラウドベースのvrプレゼンテーションアプリケーションは、ドローンを使って災害現場を理解し、人間の状態を推定することの有効性を示唆した。 Drones are being used to assess the situation in various disasters. In this study, we investigate a method to automatically estimate the damage status of people based on their actions in aerial drone images in order to understand disaster sites faster and save labor. We constructed a new dataset of aerial images of human actions in a hypothetical disaster that occurred in an urban area, and classified the human damage status using 3D ResNet. The results showed that the status with characteristic human actions could be classified with a recall rate of more than 80%, while other statuses with similar human actions could only be classified with a recall rate of about 50%. In addition, a cloud-based VR presentation application suggested the effectiveness of using drones to understand the disaster site and estimate the human condition.	翻訳日:2023-08-10 16:09:28 公開日:2023-08-08
# テンポラル・ディノ:アクション予測を強化する自己監督型ビデオ戦略 Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction ( http://arxiv.org/abs/2308.04589v1 ) ライセンス: Link先を確認	Izzeddin Teeti, Rongali Sai Bhargav, Vivek Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin	(参考訳) 行動予測の分野は、自律運転、アクティビティ分析、人間とコンピュータの相互作用など、様々なコンピュータビジョンアプリケーションにおいて重要な役割を果たす。大幅な進歩にもかかわらず、ビデオデータに固有の高次元性、複雑なダイナミクス、不確実性のために、将来の行動を正確に予測することは難しい問題である。従来の教師付きアプローチでは大量のラベル付きデータが必要です。本稿では,DINO (self-distillation with labels) にインスパイアされた行動予測を強化するための,新たな自己教師型ビデオ戦略を提案する。テンポラル・ディノのアプローチでは、過去のフレームを「学生」処理する2つのモデルと、過去と将来のフレームの両方を「教師」処理することで、より広い時間的コンテキストを実現する。授業中、教師は過去のフレームだけを観察して将来の文脈を学ぶよう指導する。この戦略は3D-ResNet, Transformer, LSTMアーキテクチャを用いて, アクション予測下流タスクのためのROADデータセット上で評価される。提案手法は,9.9%の精度ポイント(PP)を平均的に向上させるとともに,長期的依存関係を捕捉するバックボーンの能力向上に有効であることを示す。さらに,本手法は,事前学習データセットのサイズと必要エポック数の効率性を示す。この方法は、様々なバックボーンアーキテクチャを考慮し、複数の予測水平線に対処し、手作りの強化への依存を減らし、事前学習プロセスを単一のステージに合理化することを含む、他のアプローチにおける制限を克服する。これらの結果は,行動認識,運動計画,シーン理解など,多様な映像ベースタスクにおけるアプローチの可能性を強調した。 The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The Temporal-DINO approach employs two models; a 'student' processing past frames; and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context by only observing past frames. 
The strategy is evaluated on the ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. The experimental results show significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), highlighting its effectiveness in enhancing the backbones' ability to capture long-term dependencies. Furthermore, our approach demonstrates efficiency regarding the pretraining dataset size and the number of epochs required. This method overcomes limitations of other approaches by supporting various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.	翻訳日:2023-08-10 16:02:03 公開日:2023-08-08
# ScatterUQ:マルチクラスディープラーニング問題に対する対話型不確実性可視化 ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems ( http://arxiv.org/abs/2308.04588v1 ) ライセンス: Link先を確認	Harry Li, Steven Jorgensen, John Holodnak and Allan Wollaber	(参考訳) 近年,マルチクラスラベリング問題に対する不確実性を考慮したディープラーニング手法が開発され,クラス予測確率の校正と分散(ood)指標を提供し,機械学習(ml)の消費者とエンジニアがモデルの予測に対する信頼度を評価する。しかし、この余分なニューラルネットワーク予測情報は、複数の不確実性条件下で任意のデータソースに対して視覚的に伝達することが困難である。これらの課題に対処するために、ユーザがコンテキスト駆動の不確実性設定におけるモデルパフォーマンスをよりよく理解できるように、ターゲット視覚化を提供するインタラクティブシステムであるScatterUQを提案する。 ScatterUQは、距離対応ニューラルネットワークの最近の進歩を活用し、次元の縮小技術とともに、モデルがテスト例を(1)分布内および特定のクラス、(2)分布外、(3)分布外を予測した理由を説明する頑健な2次元散乱プロットを構築する。 mlのコンシューマとエンジニアは、モデル不確実性のパフォーマンスを理解し、アクションのフォローアップコースを決定するために``hoverコールバック'を使用して、テストサンプルの突出した特徴をトレーニング例と比較することができる。我々は、Fashion-MNISTで訓練され、Fashion-MNIST(分布中)およびMNIST(分布外)でテストされた距離認識ニューラルネットワーク上で、マルチクラス画像分類のためのモデル不確実性を説明するために、ScatterUQの有効性を実証する。文脈駆動型UQ可視化を最適化するために,次元削減手法を定量的に評価する。以上の結果から,ScatterUQシステムは任意のマルチクラスデータセットにスケールすることが示唆された。私たちのコードはhttps://github.com/mit-ll-responsible-ai/equine-webappで利用可能です。 Recently, uncertainty-aware deep learning methods for multiclass labeling problems have been developed that provide calibrated class prediction probabilities and out-of-distribution (OOD) indicators, letting machine learning (ML) consumers and engineers gauge a model's confidence in its predictions. However, this extra neural network prediction information is challenging to scalably convey visually for arbitrary data sources under multiple uncertainty contexts. To address these challenges, we present ScatterUQ, an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings. 
ScatterUQ leverages recent advances in distance-aware neural networks, together with dimensionality reduction techniques, to construct robust 2-D scatter plots explaining why a model predicts a test example to be (1) in-distribution and of a particular class, (2) in-distribution but unsure of the class, and (3) out-of-distribution. ML consumers and engineers can visually compare the salient features of test samples with training examples through a "hover callback" to understand model uncertainty performance and decide on follow-up courses of action. We demonstrate the effectiveness of ScatterUQ in explaining model uncertainty for multiclass image classification with a distance-aware neural network trained on Fashion-MNIST and tested on Fashion-MNIST (in-distribution) and MNIST digits (out-of-distribution), as well as for a deep learning model on a cyber dataset. We quantitatively evaluate dimensionality reduction techniques to optimize our contextually driven UQ visualizations. Our results indicate that the ScatterUQ system should scale to arbitrary multiclass datasets. Our code is available at https://github.com/mit-ll-responsible-ai/equine-webapp	翻訳日:2023-08-10 16:01:35 公開日:2023-08-08
# AIの開発ブートストラップ Developmental Bootstrapping of AIs ( http://arxiv.org/abs/2308.04586v1 ) ライセンス: Link先を確認	Mark Stefik and Robert Price	(参考訳) 現在のAIの中には、ボードゲームのようなクローズドな世界では人間の能力を上回っているものもあるが、乱雑な現実世界でのパフォーマンスは限られている。彼らは奇妙な間違いを犯し、気づかない。簡単には指示できないし、常識を使わず、好奇心を欠いている。彼らは良い協力者はしない。従来の手作業によるシンボリックAIアプローチを使用して構築されたシステムも、大きな言語モデル(LLM)を含む生成的およびディープラーニングAIアプローチを使用して構築されたシステムも、その課題を満たすことができない。堅牢で信頼できるAIを作るには向いていない。メインストリームのAIアプローチの外にあるが、開発ブートストラップは有望だ。発達的なブートストラップでは、AIは人間の子供のように能力を生み出す。彼らは生まれながらの能力から始まる。人間と同様に、彼らは環境と相互作用し、相互作用から学ぶ。彼らは自己発達能力で自然能力を徐々に拡張する。彼らは対話し、人々から学び、知覚、認知、共通基盤を確立する。ブートストラッププロセスに続いて、必要な能力を取得する。しかし、発達ロボット工学はまだ大人レベルの強力な能力を持つAIを生産していない。通常、トードラーバリアでは、音声が流流する前の約2歳で幼児の発達に対応するプロジェクトが中止されている。彼らはまた、llmを駆動する巨大な社会的に発達した情報リソースを巧みに、そして懐疑的に活用できる読み取り障壁を橋渡ししません。人間の認知発達における次の能力は、本質的な動機づけ、模倣学習、想像、協調、コミュニケーションである。本稿では,堅牢でレジリエントなaiを作成するための開発ブートストラップのプラクティスを拡張するための,論理,展望,ギャップ,課題を概説する。 Although some current AIs surpass human abilities especially in closed worlds such as board games, their performance in the messy real world is limited. They make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. They do not make good collaborators. Neither systems built using the traditional manually-constructed symbolic AI approach nor systems built using generative and deep learning AI approaches including large language models (LLMs) can meet the challenges. They are not well suited for creating robust and trustworthy AIs. Although it is outside of mainstream AI approaches, developmental bootstrapping shows promise. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. Like humans, they interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. 
Following a bootstrapping process, they acquire the competences that they need. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped at the Toddler Barrier corresponding to human infant development at about two years of age, before speech is fluent. They also do not bridge the Reading Barrier, where they can skillfully and skeptically tap into the vast socially developed recorded information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to create robust and resilient AIs.	翻訳日:2023-08-10 16:00:59 公開日:2023-08-08
# 決定論的共起のためのカーネル単一プロキシ制御 Kernel Single Proxy Control for Deterministic Confounding ( http://arxiv.org/abs/2308.04585v1 ) ライセンス: Link先を確認	Liyuan Xu, Arthur Gretton	(参考訳) 本研究では,未観測の共同設立者による因果効果推定の問題点を考察し,共同設立者に関連するプロキシ変数を観察する。 Proxy Causal Learning (PCL)は2つのプロキシ変数を用いて真の因果効果を回復するが、結果が決定論的に生成されると、単一のプロキシ変数が因果推定に十分であることを示す。本研究では,2段階回帰法と最大モーメント制限法を組み合わせた2つのカーネルベース手法を提案する。いずれのアプローチも一貫して因果効果を推定できることを実証し,合成データセット上で因果効果を正常に回復できることを実証した。 We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder. Although Proxy Causal Learning (PCL) uses two proxy variables to recover the true causal effect, we show that a single proxy variable is sufficient for causal estimation if the outcome is generated deterministically, generalizing Control Outcome Calibration Approach (COCA). We propose two kernel-based methods for this setting: the first based on the two-stage regression approach, and the second based on a maximum moment restriction approach. We prove that both approaches can consistently estimate the causal effect, and we empirically demonstrate that we can successfully recover the causal effect on a synthetic dataset.	翻訳日:2023-08-10 16:00:35 公開日:2023-08-08
# LATR:トランスを用いた単眼画像からの3次元レーン検出 LATR: 3D Lane Detection from Monocular Images with Transformer ( http://arxiv.org/abs/2308.04583v1 ) ライセンス: Link先を確認	Yueru Luo, Chaoda Zheng, Xu Yan, Tang Kun, Chao Zheng, Shuguang Cui, Zhen Li	(参考訳) 単眼画像からの3次元車線検出は、自動運転の基本的な課題である。最近の進歩は主に、フロントビューの画像特徴とカメラパラメータから構築された構造的な3dサロゲート(鳥の目視など)に依存している。しかし, 単眼画像の奥行きの曖昧さは, 構築したサロゲート特徴写像と原画像との相違を必然的に引き起こし, 正確な車線検出には大きな課題となる。上記の課題に対処するため, 3D 対応のフロントビュー特徴を用いた3次元レーン検出システムである LATR モデルを提案する。具体的には、LATRはクエリとキーと値のペアに基づいて3次元レーンを検出し、車線対応クエリジェネレータと動的3次元地上位置埋め込みを用いて構築する。一方、各クエリは2dレーン認識機能に基づいて生成され、レーン情報を強化するためにハイブリッド組込みを採用する。一方、3D空間情報は、反復的に更新された3D地上面から位置埋め込みとして注入される。 LATRは、合成アポロと現実的なOpenLaneの両方の最先端の手法を大きなマージンで上回る(例えば、OpenLaneのF1スコアの11.4ゲイン)。コードはhttps://github.com/JMoonr/LATRでリリースされる。 3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) that are built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance the lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on both synthetic Apollo and realistic OpenLane by large margins (e.g., 11.4 gains in terms of F1 score on OpenLane). 
Code will be released at https://github.com/JMoonr/LATR.	翻訳日:2023-08-10 16:00:19 公開日:2023-08-08
# RECipe:マルチモーダルレシピ知識グラフは多目的推薦システムに適合しているか? RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? ( http://arxiv.org/abs/2308.04579v1 ) ライセンス: Link先を確認	Ali Pesaranghader, Touqir Sajed	(参考訳) 過去20年間、レコメンデーションシステム(RS)は、機械学習(ML)ソリューションを使用して、例えば映画、本、レストランなどのアイテムを、企業の顧客やオンラインプラットフォームに推奨してきた。しかし、レシピレコメンデーションは、これらのアプリケーションと比べてあまり注目されていない。マルチモーダル知識グラフ(MMKG)をバックボーンとした多目的レシピレコメンデーションフレームワークとしてRECipeを導入する。 RECipeの背後にあるモチベーションは、自然言語でのクエリやイメージの提供によって、ユーザにレシピを推奨することで、(ディープ)ニューラルコラボレーティブフィルタリング(NCF)を越えていくことである。 RECipeは,(1)行動ベースレコメンデータ,(2)レビューベースレコメンデータ,(3)画像ベースレコメンデータの3つのサブシステムから構成される。各サブシステムは、グラフ内のエンティティと関係の埋め込み表現に依存している。まず、MicrosoftのMPNetの微調整モデルから、レビューや材料などのテキストエンティティの(事前訓練された)埋め込み表現を得る。これらの埋め込みでエンティティの重みを初期化し、知識グラフ埋め込み(KGE)モデルをトレーニングします。視覚成分,すなわちレシピ画像に対して,kge誘導変分オートエンコーダ(kg-vae)を開発し,画像の分布と潜在表現を学習する。 KGEとKG-VAEモデルを完全にトレーニングすると、多目的レコメンデーションフレームワークとして使用します。ベンチマークのために、レシピレコメンデーションのためにKaggle上の公開データセットから2つのナレッジグラフ(KG)を作成しました。実験の結果,KGEモデルはニューラルソリューションに匹敵する性能を示した。また,新しいユーザに対するゼロショット推論(あるいはコールドスタート問題)やレシピカテゴリに対する条件付き推奨など,重要な応用に対処するための事前学習NLP埋め込みを提案する。最終的に、多目的レコメンデーション設定におけるRECipeの適用を実証する。 Over the past two decades, recommendation systems (RSs) have used machine learning (ML) solutions to recommend items, e.g., movies, books, and restaurants, to clients of a business or an online platform. Recipe recommendation, however, has not yet received much attention compared to those applications. We introduce RECipe as a multi-purpose recipe recommendation framework with a multi-modal knowledge graph (MMKG) backbone. The motivation behind RECipe is to go beyond (deep) neural collaborative filtering (NCF) by recommending recipes to users when they query in natural language or by providing an image. RECipe consists of 3 subsystems: (1) behavior-based recommender, (2) review-based recommender, and (3) image-based recommender. Each subsystem relies on the embedding representations of entities and relations in the graph. 
We first obtain (pre-trained) embedding representations of textual entities, such as reviews or ingredients, from a fine-tuned model of Microsoft's MPNet. We initialize the weights of the entities with these embeddings to train our knowledge graph embedding (KGE) model. For the visual component, i.e., recipe images, we develop a KGE-Guided variational autoencoder (KG-VAE) to learn the distribution of images and their latent representations. Once KGE and KG-VAE models are fully trained, we use them as a multi-purpose recommendation framework. For benchmarking, we created two knowledge graphs (KGs) from public datasets on Kaggle for recipe recommendation. Our experiments show that the KGE models have comparable performance to the neural solutions. We also present pre-trained NLP embeddings to address important applications such as zero-shot inference for new users (or the cold start problem) and conditional recommendation with respect to recipe categories. We eventually demonstrate the application of RECipe in a multi-purpose recommendation setting.	翻訳日:2023-08-10 15:59:53 公開日:2023-08-08
# Pairwise User Preferencesによるアルゴリズムの最適化 Optimizing Algorithms From Pairwise User Preferences ( http://arxiv.org/abs/2308.04571v1 ) ライセンス: Link先を確認	Leonid Keselman, Katherine Shih, Martial Hebert, Aaron Steinfeld	(参考訳) ロボット工学における典型的なブラックボックス最適化アプローチは、メトリクススコアからの学習に焦点を当てている。しかし、すべての開発者が真実を理解できるわけではないので、必ずしもそれが可能であるとは限らない。人間中心のコンテキストで適切なロボットの振る舞いを学ぶには、多くの場合、正確なメトリクススコアを提供できないユーザーをクエリする必要がある。既存のアプローチでは、暗黙の報酬関数をモデル化するために人間のフィードバックを利用するが、この報酬を効果的に捕獲することは困難または不可能である。本研究では,ペアワイズユーザの好みに基づいてアルゴリズムパラメータを高次元に最適化するSortCMAを提案する。 SortCMAは、報酬を直接モデル化することなく、ユーザー入力を利用してパラメータセットを見つける。本手法は,地上の真理を示さずに市販の深度センサをチューニングし,ロボットの行動よりも複雑な嗜好を伴うロボット社会ナビゲーションに適用する。提案手法は,ユーザの目標を最適化し,ユーザ調査を行い,ソーシャルナビゲーションの結果を評価することに成功している。 Typical black-box optimization approaches in robotics focus on learning from metric scores. However, that is not always possible, as not all developers have ground truth available. Learning appropriate robot behavior in human-centric contexts often requires querying users, who typically cannot provide precise metric scores. Existing approaches leverage human feedback in an attempt to model an implicit reward function; however, this reward may be difficult or impossible to effectively capture. In this work, we introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences. SortCMA efficiently and robustly leverages user input to find parameter sets without directly modeling a reward. We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation, which involves highly complex preferences over robot behavior. We show that our method succeeds in optimizing for the user's goals and perform a user study to evaluate social navigation results.	翻訳日:2023-08-10 15:59:19 公開日:2023-08-08
# single-sentence reader : 回答位置バイアスに対する新しいアプローチ Single-Sentence Reader: A Novel Approach for Addressing Answer Position Bias ( http://arxiv.org/abs/2308.04566v1 ) ライセンス: Link先を確認	Son Quoc Tran and Matt Kretchmar	(参考訳) Machine Reading Comprehension (MRC)モデルは、素早い相関(研究コミュニティのデータセットバイアスやアノテーションアーティファクトとしても知られる)を利用する傾向がある。したがって、これらのモデルは与えられたコンテキストと質問を完全に理解することなくMCCタスクを実行することができ、分散シフトに対するロバスト性が低い可能性があるため、望ましくない。本論文は, 文脈の第一文のみにのみ回答がある学習者のかなりの割合が, 回答位置バイアスという概念を考察する。 MRCにおける解答位置バイアスに対処するための新しいアプローチとして,Single-Sentence Readerを提案する。このアプローチを6つの異なるモデルを用いて実装し、その性能を徹底的に分析する。驚くべきことに,提案するシングルセンテンスリーダは,従来のトレーニングセットでトレーニングされたモデルとほぼ一致し,その効果を実証する。本研究は,シングルセンテンス読者が遭遇するいくつかの課題についても考察し,潜在的な解決策を提案する。 Machine Reading Comprehension (MRC) models tend to take advantage of spurious correlations (also known as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable since it may result in low robustness against distribution shift. This paper delves into the concept of answer-position bias, where a significant percentage of training questions have answers located solely in the first sentence of the context. We propose a Single-Sentence Reader as a new approach for addressing answer position bias in MRC. We implement this approach using six different models and thoroughly analyze their performance. Remarkably, our proposed Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, proving their effectiveness. Our study also discusses several challenges our Single-Sentence Readers encounter and proposes a potential solution.	翻訳日:2023-08-10 15:59:03 公開日:2023-08-08
# スペクトル正則化カーネル適合度検定 Spectral Regularized Kernel Goodness-of-Fit Tests ( http://arxiv.org/abs/2308.04561v1 ) ライセンス: Link先を確認	Omar Hagrass, Bharath K. Sriperumbudur, Bing Li	(参考訳) Maximum mean discrepancy (MMD)は非ユークリッドデータを扱う能力があるため、非パラメトリック仮説検定を含む多くの機械学習や統計応用で多くの成功を収めている。近年、Balasubramanian et al. (2021) において、MMD に基づく適合度検定はミニマックス最適ではないが、その Tikhonov 正則化版は正則化パラメータを適切に選べばミニマックス最適であることが示された。しかし、Balasubramanian et al. (2021) の結果は、平均元が 0 であるという制限的な仮定と、積分作用素の固有関数に対する一様有界性条件の下で得られている。さらに、Balasubramanian et al. (2021) で提案された検定は、多くのカーネルで計算できないため実用的ではない。本稿では,これらの欠点に対処し,Tikhonov 正則化を含む一般のスペクトル正則化器に結果を拡張する。 Maximum mean discrepancy (MMD) has enjoyed a lot of success in many machine learning and statistical applications, including non-parametric hypothesis testing, because of its ability to handle non-Euclidean data. Recently, it has been demonstrated in Balasubramanian et al. (2021) that the goodness-of-fit test based on MMD is not minimax optimal while a Tikhonov regularized version of it is, for an appropriate choice of the regularization parameter. However, the results in Balasubramanian et al. (2021) are obtained under the restrictive assumptions of the mean element being zero, and the uniform boundedness condition on the eigenfunctions of the integral operator. Moreover, the test proposed in Balasubramanian et al. (2021) is not practical as it is not computable for many kernels. In this paper, we address these shortcomings and extend the results to general spectral regularizers that include Tikhonov regularization.	翻訳日:2023-08-10 15:58:48 公開日:2023-08-08
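As a point of reference for the entry above, the following is a minimal sketch of the plain (unregularized) squared-MMD statistic that the abstract takes as its starting point. It is not the spectral regularized test proposed in the paper. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    Illustrative only: this is the classical statistic, with no Tikhonov or
    other spectral regularization applied.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both samples are identical and grows as the two samples move apart, which is the quantity a goodness-of-fit test thresholds.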
# FocalFormer3D : 3Dオブジェクト検出のためのハードインスタンスに着目して FocalFormer3D : Focusing on Hard Instance for 3D Object Detection ( http://arxiv.org/abs/2308.04556v1 ) ライセンス: Link先を確認	Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez	(参考訳) 3Dオブジェクト検出における偽陰性(FN)、例えば歩行者、車両、その他の障害物の予測漏れは、自動運転において潜在的に危険な状況につながる可能性がある。致命的になりうる問題だが、現在の多くの3D検出手法では十分に研究されていない。本研究では,多段階的に FN を識別し,困難なインスタンスの発掘にモデルを集中させる一般的なパイプラインである Hard Instance Probing (HIP) を提案する。 3次元物体検出のために,この手法をFocalFormer3Dとしてインスタンス化する。 FocalFormer3Dは、ハードオブジェクトを見つけるためのマルチステージクエリ生成と、巨大なオブジェクト候補からオブジェクトを効率的に区別するボックスレベルのトランスフォーマーデコーダを備えている。 nuScenesとWaymoデータセットの実験結果は、FocalFormer3Dの優れた性能を検証する。この利点は、LiDARとマルチモーダル設定の両方において、検出とトラッキングの両方で強力なパフォーマンスをもたらす。 FocalFormer3D は nuScenes 検出ベンチマークで 70.5 mAP と 73.9 NDS を達成し、nuScenes 追跡ベンチマークでは 72.1 AMOTA を示し、どちらも nuScenes LiDAR リーダーボードで1位となった。私たちのコードは https://github.com/NVlabs/FocalFormer3D で利用可能です。 False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. 
Notably, FocalFormer3D achieves a 70.5 mAP and 73.9 NDS on nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.	翻訳日:2023-08-10 15:58:32 公開日:2023-08-08
# PSRFlow:科学データのためのフローベースモデルによる確率的超解法 PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data ( http://arxiv.org/abs/2308.04605v1 ) ライセンス: Link先を確認	Jingyi Shen and Han-Wei Shen	(参考訳) 近年,多くの深層学習に基づく超解法が提案されているが,推論段階では基礎的な真理が得られていないため,超解答結果の誤りや不確実性を定量化できるものはほとんどない。しかし、科学的視覚化の応用においては、結果の不確かさを科学者に伝えることは、誤った情報や誤った情報の発生を避けるために不可欠である。本稿では,不確かさの定量化を超解像プロセスに組み込んだ,科学データ超解像のための新しい正規化フロー型生成モデルpsrflowを提案する。 PSRFlowは低解像度データに基づいて高解像度データの条件分布を学習する。高解像度データの欠落情報をキャプチャするガウス潜在空間からサンプリングすることにより、異なる可視超解出力を生成することができる。ガウス潜在空間における効率的なサンプリングにより、超解結果に対する不確実な定量化を行うことができる。モデルトレーニング中、様々なスケールのサンプルでトレーニングデータを増強し、異なるスケールのデータに適応できるようにし、与えられた入力に対して柔軟な超解像を実現する。この結果は,補間やGANに基づく超解像ネットワークなどの既存手法と比較して,優れた性能とロバストな不確実性定量化を示す。 Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. 
Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.	翻訳日:2023-08-10 15:52:35 公開日:2023-08-08
# 分散型連合学習に関する調査研究 A Survey on Decentralized Federated Learning ( http://arxiv.org/abs/2308.04604v1 ) ライセンス: Link先を確認	Edoardo Gabrielli, Giovanni Pica, Gabriele Tolomei	(参考訳) 近年、フェデレーテッド・ラーニング(FL)は、分散、大規模、プライバシ保護機械学習(ML)システムのトレーニングにおいて、非常に一般的なパラダイムとなっている。トレーニングが行われる正確な場所でデータを収集しなければならない標準的なMLとは対照的に、FLは数百万のエッジデバイスの計算能力を活用して、ローカルのプライベートデータを開示することなく、共有グローバルモデルを協調的にトレーニングする。具体的には、典型的なflシステムでは、中央サーバはオーケストレータとしてのみ動作し、各クライアントがトレーニングしたすべてのローカルモデルを、収束するまでそのプライベートデータ上で反復的に収集し集約する。 FLは間違いなく従来のMLよりもいくつかの利点がある(例えば、設計によるプライベートデータ所有権を保護する)が、いくつかの弱点に悩まされている。最も重要な課題の1つは、単一障害リスクや中間者攻撃に弱いことが知られている古典的なFLクライアントサーバアーキテクチャの集中的なオーケストレーションを克服することである。このような露出を軽減するために、すべてのFLクライアントが中央サーバなしで協調して通信する分散FLソリューションが登場した。この調査は、文献で提案されている既存の分散FLアプローチを包括的に要約し、レビューする。さらに、新たな課題を特定し、この未調査領域における有望な研究方向性を提案する。 In recent years, federated learning (FL) has become a very popular paradigm for training distributed, large-scale, and privacy-preserving machine learning (ML) systems. In contrast to standard ML, where data must be collected at the exact location where training is performed, FL takes advantage of the computational capabilities of millions of edge devices to collaboratively train a shared, global model without disclosing their local private data. Specifically, in a typical FL system, the central server acts only as an orchestrator; it iteratively gathers and aggregates all the local models trained by each client on its private data until convergence. Although FL undoubtedly has several benefits over traditional ML (e.g., it protects private data ownership by design), it suffers from several weaknesses. One of the most critical challenges is to overcome the centralized orchestration of the classical FL client-server architecture, which is known to be vulnerable to single-point-of-failure risks and man-in-the-middle attacks, among others. To mitigate such exposure, decentralized FL solutions have emerged where all FL clients cooperate and communicate without a central server. 
This survey comprehensively summarizes and reviews existing decentralized FL approaches proposed in the literature. Furthermore, it identifies emerging challenges and suggests promising research directions in this under-explored domain.	翻訳日:2023-08-10 15:52:15 公開日:2023-08-08
# 深層学習に基づく画像透かし - 簡単な調査- Deep Learning based Image Watermarking: A Brief Survey ( http://arxiv.org/abs/2308.04603v1 ) ライセンス: Link先を確認	Xin Zhong, Arjon Das, Fahad Alrasheedi, Abdullah Tanvir	(参考訳) カバー画像に秘かに透かしを埋め込み抽出して保護する行為は、画像透かし(image watermarking)と呼ばれる。近年,深層学習に基づく画像透かし技術が次々と出現している。そこで本研究では,最先端の深層学習に基づく画像透かし技術について,埋め込み・抽出合同訓練,特徴変換としてのディープネットワーク,ハイブリッドスキームに分類した。各カテゴリの研究方向も分析され、要約される。また,今後の研究の方向性についても論じる。 The act of secretly embedding and extracting a watermark on a cover image to protect it is known as image watermarking. In recent years, deep learning-based image watermarking techniques have been emerging one after another. To study the state-of-the-art, this survey categorizes cutting-edge deep learning-based image watermarking techniques into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes. Research directions in each category are also analyzed and summarized. Additionally, potential future research directions are discussed to envision future studies.	翻訳日:2023-08-10 15:51:54 公開日:2023-08-08
# NSF RESUME HPC Workshop: 疫学モデリングにおける高性能コンピューティングと大規模データ管理 NSF RESUME HPC Workshop: High-Performance Computing and Large-Scale Data Management in Service of Epidemiological Modeling ( http://arxiv.org/abs/2308.04602v1 ) ライセンス: Link先を確認	Abby Stevens, Jonathan Ozik, Kyle Chard, Jaline Gerardin, Justin M. Wozniak	(参考訳) NSFが出資したRobust Epidemic Surveillance and Modeling (RESUME)プロジェクトは、2023年5月1日から2日にかけてシカゴ大学で「疫学モデリングのための高性能コンピューティングと大規模データ管理」というワークショップを開催した。これは、予測知性とパンデミック予防のための持続可能な学際的共同設計を促進するために設計された一連のワークショップの一部である。このイベントでは、疫学モデリング、ハイパフォーマンスコンピューティング(hpc)、hpcワークフロー、大規模データ管理の専門家31人が集結し、パンデミック予防のために計算疫学に必要な能力の共有ビジョンを開発する。ワークショップを通じて、参加者は、HPCのワークフロー、データ統合、およびHPCアクセスに重点を置いて、特に公衆衛生上の意思決定を支援するために、HPC能力が疫学的モデリングを改善するのに使用できる重要な領域を特定した。ワークショップでは、新しいHPCワークフローと、現在疫学モデリングに使われている大規模データ管理アプローチを調査し、疫学モデリングに最も適したプラクティスを決定するために、他のドメインで使われているアプローチから引き出そうとした。本報告では,ワークショップの成果と成果について報告する。 The NSF-funded Robust Epidemic Surveillance and Modeling (RESUME) project successfully convened a workshop entitled "High-performance computing and large-scale data management in service of epidemiological modeling" at the University of Chicago on May 1-2, 2023. This was part of a series of workshops designed to foster sustainable and interdisciplinary co-design for predictive intelligence and pandemic prevention. The event brought together 31 experts in epidemiological modeling, high-performance computing (HPC), HPC workflows, and large-scale data management to develop a shared vision for capabilities needed for computational epidemiology to better support pandemic prevention. Through the workshop, participants identified key areas in which HPC capabilities could be used to improve epidemiological modeling, particularly in supporting public health decision-making, with an emphasis on HPC workflows, data integration, and HPC access. 
The workshop explored nascent HPC workflow and large-scale data management approaches currently in use for epidemiological modeling and sought to draw from approaches used in other domains to determine which practices could be best adapted for use in epidemiological modeling. This report documents the key findings and takeaways from the workshop.	翻訳日:2023-08-10 15:51:43 公開日:2023-08-08
# モデルのモデル -- その1 Model of models -- Part 1 ( http://arxiv.org/abs/2308.04600v1 ) ライセンス: Link先を確認	Shimon Komarovsky	(参考訳) 本稿では,AGIエージェントの主成分として機能する新しい認知モデルを提案する。このモデルは、成熟したインテリジェンス状態に導入され、以前のモデルであるDENN、特にAKREMの拡張として、運用モデル(フレーム/クラス)と意志を含む。このモデルの中核的な仮定は、認知は蓄積された知識を操作することであり、適切な意志のガイダンスである。また、知識の一部である行動が、成熟した知性状態に先行する進化段階において、意志に沿うことを学習していると仮定する。さらに、このモデルは、トップダウンとボトムアップの両方のモデル学習、一般化対特殊化など、既知のすべての知的側面における双対性原理に基づいている。さらに、AGI設計には全体論的アプローチが提唱され、再利用性とシンプルさという形で制約や効率性の下での認知が提案される。最後に、この成熟状態に達するには、統合原理を利用して、幼児から成人への認知的進化を通して記述する。この認知モデルの最終的な製品は、モデルとインスタンスの動的操作メモリである。最後に、成熟状態に達する進化段階のいくつかの例と予備的なアイデアを示す。 This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization versus specialization, and more. Furthermore, a holistic approach is advocated for AGI design, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.	翻訳日:2023-08-10 15:51:21 公開日:2023-08-08
# CVPR2023 BURST ロングテールおよびオープンワールドチャレンジの1位ソリューション 1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges ( http://arxiv.org/abs/2308.04598v1 ) ライセンス: Link先を確認	Kaer Huang	(参考訳) 現在、ビデオインスタンスセグメンテーション(VIS)は、わずか数十のカテゴリを含むクローズドなトレーニングカテゴリから、ビデオ内のオブジェクトをセグメンテーションし、分類することを目的としており、実世界のビデオの多様なオブジェクトを扱う能力に欠けている。 TAOとBURSTのデータセットがリリースされたことで、ロングテールとオープンワールドのシナリオでVISを研究する機会が得られた。従来のVISメソッドは、少数の共通クラスに限定されたベンチマークで評価されるが、実用的なアプリケーションでは、これらの共通クラスを越えて、稀で未知のオブジェクトを検出し、追跡するトラッカーが必要である。ロングテールタスクのための最新のMOT論文(Tracking Every Thing in the Wild, Siyuan Li et al.)にインスパイアされ、BURST long tail challengeでは、repeat factor samplingを使用して、LVISv0.5とCOCOデータセットの組み合わせでモデルをトレーニングする。まず、LVISv0.5 + COCOデータセット上でセグメンテーションとCEMで検出器を訓練する。そして、TAOデータセットでインスタンスの外観類似性ヘッドをトレーニングする。最終的に、我々のメソッド(LeTracker)は、BURSTテストセットで14.9 HOTAallを獲得し、ベンチマークで1位になった。オープンワールドの課題では、64クラス(BURST TrainサブセットとCOCOデータセットの共通クラス、LVISデータセットなし)のアノテーションデータのみでトレーニングし、BURSTテストセットでテストして、ベンチマークで1位となる61.4 OWTAallを取得した。私たちのコードは将来の研究を促進するためにリリースされます。 Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contains only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these common classes, detecting and tracking rare and even never-before-seen objects. Inspired by the latest MOT paper for the long tail task (Tracking Every Thing in the Wild, Siyuan Li et al.), for the BURST long tail challenge, we train our model on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling. First, we train the detector with segmentation and CEM on the LVISv0.5 + COCO dataset. Then, we train the instance appearance similarity head on the TAO dataset. 
Finally, our method (LeTracker) achieves 14.9 HOTAall on the BURST test set, ranking 1st in the benchmark. For the open-world challenge, we train only on annotations for 64 classes (the intersection of the BURST train subset classes and the COCO dataset classes, without the LVIS dataset), test on the BURST test set, and obtain 61.4 OWTAall, ranking 1st in the benchmark. Our code will be released to facilitate future research.	翻訳日:2023-08-10 15:50:59 公開日:2023-08-08
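The entry above mentions repeat factor sampling for long-tailed training data. The sketch below shows the per-image repeat factor as it is commonly defined for LVIS-style datasets: a category-level factor `max(1, sqrt(t / f_c))`, where `f_c` is the fraction of images containing category `c`, and an image-level factor equal to the maximum over the image's categories. The function name, the toy labels, and the threshold value are assumptions for illustration, and nothing here is taken from the authors' code.

```python
import math
from collections import Counter


def repeat_factors(image_labels, threshold):
    """Return one repeat factor per image.

    image_labels: list of label lists, one list per image (hypothetical input format).
    threshold: frequency threshold t; categories rarer than t are oversampled.
    """
    num_images = len(image_labels)
    # Count in how many images each category appears (once per image).
    image_counts = Counter(c for labels in image_labels for c in set(labels))
    category_rf = {
        c: max(1.0, math.sqrt(threshold / (count / num_images)))
        for c, count in image_counts.items()
    }
    # An image is repeated according to its rarest category.
    return [
        max((category_rf[c] for c in set(labels)), default=1.0)
        for labels in image_labels
    ]
```

For example, with four images of which only one contains a rare class and `threshold=0.5`, the three common-only images keep a factor of 1.0 while the image with the rare class gets a factor of `sqrt(2)`. A sampler would then repeat each image in an epoch in proportion to its factor.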
# Pöschl-Teller ポテンシャルの繰り込みとスペクトル Renormalization and spectra of the Pöschl-Teller potential ( http://arxiv.org/abs/2308.04596v1 ) ライセンス: Link先を確認	Ulysses Camara da Silva, Andre Alves Lima, Carlos F.S. Pereira	(参考訳) 2つの無次元パラメータのすべての値に対する Pöschl-Teller ポテンシャルのエネルギー固有関数とスペクトルについて検討した。ポテンシャルは原点に特異性を持ち、パラメータ空間のいくつかの領域では固有関数の境界条件が定義できなくなる。繰り込み手順を適用して適切に定義された解の族を求め,関連する繰り込み群(RG)フローを考察する。繰り込みは「dimensional transmutation」によって異常な長さスケールをもたらす。このスケールをゼロに設定できない結合空間の領域では、特異点の近くで漸近共形対称性が自発的に破れる。この対称性はポテンシャルの次元を持つパラメータによっても陽に破られる。これら2つの競合する共形対称性の破れ方の存在が、RGフローに興味深い構造を与える。ポテンシャルの超対称性は、存在すれば漸近共形対称性の自発的破れを防止できることを示す。固有関数の族を用いてパラメータ空間のすべての領域における S-行列を異常スケールの任意の値に対して計算する。次に、S-行列の極を体系的に研究し、すべての束縛状態、反束縛状態、準安定状態を分類する。 We study the energy eigenfunctions and spectrum of the Pöschl-Teller potential for every value of its two dimensionless parameters. The potential has a singularity at the origin which, in some regions of parameter space, makes boundary conditions of the eigenfunctions ill-defined. We apply a renormalization procedure to obtain a family of well-defined solutions, and study the associated renormalization group (RG) flow. Renormalization introduces an anomalous length scale by "dimensional transmutation". In the regions of coupling space where this scale cannot be set to zero, it spontaneously breaks the asymptotic conformal symmetry near the singularity. The symmetry is also explicitly broken by a dimensionful parameter in the potential. The existence of these two competing ways of breaking conformal symmetry gives the RG flow an interesting structure. We show that supersymmetry of the potential, when present, allows one to prevent spontaneous breaking of the asymptotic conformal symmetry. We use the family of eigenfunctions to compute the S-matrix in all regions of parameter space, for any value of anomalous scale. Then we systematically study the poles of the S-matrix to classify all bound, anti-bound and metastable states.	翻訳日:2023-08-10 15:50:28 公開日:2023-08-08
# 深層ニューラルネットワーク圧縮のための量子化認識因子化 Quantization Aware Factorization for Deep Neural Network Compression ( http://arxiv.org/abs/2308.04595v1 ) ライセンス: Link先を確認	Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak	(参考訳) 畳み込み層と完全連結層のテンソル分解は、ニューラルネットワークのパラメータとFLOPを減らす効果的な方法である。モバイルまたは組み込みデバイスのメモリと消費電力の制限のため、事前トレーニングされたモデルがデプロイされる場合、量子化ステップが通常必要となる。従来のトレーニング後量子化手法は、分解された重みを持つネットワークに適用すると精度が低下する。これを動機として、テンソル近似を量子化された因子で直接求めるアルゴリズムを開発し、モデルの予測品質を維持しながら、両方の圧縮手法の恩恵を受けることができるようにした。すなわち、要素が特定の量子化格子上にある因子を持つ正準ポリアディック(CP)分解に、 Alternating Direction Method of Multipliers (ADMM) を用いることを提案する。考案したアルゴリズムでニューラルネットワークの重みを圧縮し,その予測品質と性能を評価する。本手法を最先端のトレーニング後量子化手法と比較し,望ましい品質・パフォーマンストレードオフの達成において,高い柔軟性と競争力のある結果を示す。 Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.	翻訳日:2023-08-10 15:50:07 公開日:2023-08-08
# セリウム置換M型六フッ化ストロンチウムの4価電子駆動における巨大磁気異方性と光学異方性 Giant magnetic and optical anisotropy in cerium-substituted M-type strontium hexaferrite driven by 4$f$ electrons ( http://arxiv.org/abs/2308.04594v1 ) ライセンス: Link先を確認	Churna Bhandari, Durga Paudyal	(参考訳) 密度汎関数計算により, セリウム (Ce) 置換M型ヘキサフェライト中の巨大結晶異方性 (MCA) 定数が, Ce から特定の鉄 (2a) サイトへの量子閉じ込め電子移動の支援により, エネルギー的に有利なストロンチウムサイトに存在することがわかった。計算された電子構造は、電子移動がCe$^{3+}$とFe$^{2+}$をフェルミ準位以下に占有したCe($4f^1$)状態を生成する2a$サイトで形成し、MCAと磁気モーメントに重要な寄与をもたらすことを示している。ハーフce置換は金属状態を形成し、全置換はストロンチウム-ヘキサフェライト(ホスト)の半導状態を保持する。後者では、ホストのギャップ領域における電荷移動状態の形成によりバンドギャップが減少する。光吸収係数は、平行方向の光偏光と垂直方向の強い異方性を示す。予測可能な競合相の解析を含む計算された生成エネルギーと弾性定数は、両方の組成が化学的に、機械的に安定であることを確認する。 Ce-ヘキサフェライトは、合成の成功により、自動車の駆動モーターなどの装置での使用に適合する新しい高性能な臨界要素のない永久磁石材料となる。 By performing density functional calculations, we find a giant magnetocrystalline anisotropy (MCA) constant in abundant element cerium (Ce) substituted M-type hexaferrite, in the energetically favorable strontium site, assisted by a quantum confined electron transfer from Ce to specific iron (2a) site. Remarkably, the calculated electronic structure shows that the electron transfer leads to the formation of Ce$^{3+}$ and Fe$^{2+}$ at the $2a$ site producing an occupied Ce($4f^1$) state below the Fermi level that adds a significant contribution to MCA and magnetic moment. A half Ce-substitution forms a metallic state, while a full substitution retains the semiconducting state of the strontium-hexaferrite (host). In the latter, the band gap is reduced due to the formation of charge transferred states in the gap region of the host. The optical absorption coefficient shows an enhanced anisotropy between light polarization in parallel and perpendicular directions. Calculated formation energies, including the analysis of probable competing phases, and elastic constants confirm that both compositions are chemically and mechanically stable. 
With successful synthesis, the Ce-hexaferrite can be a new high-performing critical-element-free permanent magnet material adapted for use in devices such as automotive traction drive motors.	翻訳日:2023-08-10 15:49:52 公開日:2023-08-08
# shepherd: 言語モデル生成に対する批判 Shepherd: A Critic for Language Model Generation ( http://arxiv.org/abs/2308.04592v1 ) ライセンス: Link先を確認	Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz	(参考訳) 大きな言語モデルの改善に伴い、これらのモデルの能力を活用して独自の出力を洗練する技術への関心が高まっている。本研究では,応答を批判し,改良を提案する言語モデルとして,多種多様なエラーを識別し,修正を提案する未調整モデルの能力を超えて拡張する。私たちのアプローチの中核は高品質なフィードバックデータセットで、コミュニティのフィードバックとヒューマンアノテーションからキュレートしています。 Shepherd は小さい (7B パラメータ) が、その批判は ChatGPT などの確立したモデルと同等か好まれる。 GPT-4による評価では、シェパードの平均勝利率は53-87%である。人間の評価では、Shepherdは他のモデルを厳密に上回り、ChatGPTと密接な関係にある。 As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.	翻訳日:2023-08-10 15:49:25 公開日:2023-08-08
# ヒルベルト=シュミット作用素と複素ヒルベルト空間の共役:ディラックのブラケット形式を再訪 Hilbert-Schmidt operators and the conjugate of a complex Hilbert space: Dirac's bra-ket formalism revisited ( http://arxiv.org/abs/2308.04627v1 ) ライセンス: Link先を確認	Frank Oertel	(参考訳) 我々は、与えられた複素ヒルベルト空間上の内積の定義(通常、数学で使われるもので、線形性は第一成分で、半線型性は第二成分で仮定される)が、量子物理学におけるディラックの強力なブラケット形式に直接関係していることを詳細に示す。この目的のためには、複素ヒルベルト空間の共役(これにより半線型作用素の解析を線型作用素理論で扱うことができる)を利用し、それに応じて Fréchet-Riesz の定理を再適用すればよい。応用は、2つの複素ヒルベルト空間のテンソル積 $H \otimes K$ の自己完結的で単純な記述や、量子テレポーテーション過程の純粋に線形代数的な記述(例3.8)を含む。その際、ヒルベルト空間 $H \otimes (K \otimes L)$ と $(H \otimes K) \otimes L$ の間の正準な等長同型を明示的に構成する(定理3.7)。 We reveal in detail how the definition of the inner product on a given complex Hilbert space - usually used in mathematics (where linearity is assumed in the first component and semilinearity in the second) - directly links to Dirac's powerful bra-ket formalism in quantum physics. To this end, we just have to make use of the conjugate of a complex Hilbert space (by which an analysis of semilinear operators can be handled by means of linear operator theory) and re-apply the theorem of Fréchet-Riesz accordingly. Applications are specified, including a self-contained and simple description of the tensor product of two complex Hilbert spaces $H \otimes K$ (answering a related question of B. K. Driver) and a purely linear algebraic description of the quantum teleportation process (Example 3.8). In doing so, we provide an explicit construction of a canonical isometric isomorphism between the Hilbert spaces $H \otimes (K \otimes L)$ and $(H \otimes K) \otimes L$ (Theorem 3.7).	翻訳日:2023-08-10 15:41:40 公開日:2023-08-08
# 意味的変動評価のための文埋め込みモデルの比較検討 A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation ( http://arxiv.org/abs/2308.04625v1 ) ライセンス: Link先を確認	Deven M. Mistry and Ali A. Minai	(参考訳) 本や写本のような長い現実世界のテキストにおける意味変化のパターンを分析することは、スタイリスティック、認知、言語の観点から興味深い。また、テキストセグメンテーション、文書要約、セマンティックノベルティの検出などのアプリケーションにも有用である。文埋め込みのためのベクトル空間法が最近出現し、そのような分析が可能になった。しかし、これは様々な方法によって生み出される意味表現がいかに一貫性があり有意義であるかという問題を引き起こす。本稿では,複数の文献において,連続する文間の意味的類似性の時系列と対の文類似性の行列を用いた最近の文埋め込み手法を比較した。文埋め込み法を比較するために,目的とするタスクやデータセットを用いた従来の作業とは対照的に,本手法は「野放し」な手法の評価を提供する。文の埋め込み手法のほとんどは、ある文書において意味的類似性の高相関パターンを推定するが、興味深い相違が見られる。 Analyzing the pattern of semantic variation in long real-world texts such as books or transcripts is interesting from the stylistic, cognitive, and linguistic perspectives. It is also useful for applications such as text segmentation, document summarization, and detection of semantic novelty. The recent emergence of several vector-space methods for sentence embedding has made such analysis feasible. However, this raises the issue of how consistent and meaningful the semantic representations produced by various methods are in themselves. In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature. In contrast to previous work using target tasks and curated datasets to compare sentence embedding methods, our approach provides an evaluation of the methods 'in the wild'. We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.	翻訳日:2023-08-10 15:41:03 公開日:2023-08-08
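The entry above compares embedding methods through two objects: a time series of similarity between successive sentences and a matrix of pairwise sentence similarity. The sketch below computes both from precomputed sentence vectors. Cosine similarity is assumed as the similarity measure (the abstract does not name one), and the embeddings are taken as plain lists of floats produced by some external model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def successive_similarity(embeddings):
    """Similarity between each sentence and the next one, in document order."""
    return [cosine_similarity(a, b) for a, b in zip(embeddings, embeddings[1:])]


def pairwise_similarity(embeddings):
    """Full sentence-by-sentence similarity matrix as a list of rows."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]
```

Running both functions on the same document with vectors from two different embedding models gives two time series (or two matrices) whose correlation indicates how consistently the models track semantic change.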
# LLMを利用したチャットボットのベンチマーク:方法とメトリクス Benchmarking LLM powered Chatbots: Methods and Metrics ( http://arxiv.org/abs/2308.04624v1 ) ライセンス: Link先を確認	Debarag Banerjee, Pooja Singh, Arjun Avadhanam, Saksham Srivastava	(参考訳) 自律的な会話エージェント、すなわちチャットボットは、企業が顧客やパートナーにサポートを提供するための一般的なメカニズムになりつつある。チャットボット、特にLarge Language Models (LLMs)のようなジェネレーティブAIツールを活用するものを評価するためには、パフォーマンスを正確に評価する必要がある。ここでチャットボットのベンチマークが重要になる。本稿では、E2E(End to End)ベンチマークと呼ぶ新しいベンチマークの利用を提案し、チャットボット、特にLLMを利用したチャットボットによる回答の正確性と有用性を評価するためにE2Eベンチマークをどのように利用できるかを示す。我々は、E2Eベンチマークと、最先端の研究で一般的に使用されている他のメトリクスの両方に基づいて、さまざまな高度さのレベルでチャットボットの例を評価し、提案したベンチマークが他と比較して優れた結果を示すことを観察した。さらに、いくつかのメトリクスは予測不可能であることが判明したが、コサイン類似度を利用するE2Eベンチマークのメトリクスはチャットボットの評価において良好に機能した。ベストモデルの性能は、E2Eベンチマークの指標としてコサイン類似度スコアを用いることにいくつかの利点があることを示している。 Autonomous conversational agents, i.e. chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs), we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, we propose the use of a novel benchmark that we call the E2E (End to End) benchmark, and show how the E2E benchmark can be used to evaluate the accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. We evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark and other available metrics commonly used in the state of the art, and observe that the proposed benchmark shows better results compared to others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity, performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.	翻訳日:2023-08-10 15:40:47 公開日:2023-08-08
# staged speculative decodingを用いたLLM推論の高速化 Accelerating LLM Inference with Staged Speculative Decoding ( http://arxiv.org/abs/2308.04623v1 ) ライセンス: Link先を確認	Benjamin Spector and Chris Re	(参考訳) 大規模言語モデル(LLM)による最近の進歩は、その多様な能力を示している。我々は、小バッチ・オンデバイスのシナリオでLLM推論を高速化する新しいアルゴリズム、段階的投機的復号(staged speculative decoding)を提案する。我々は、投機的復号に関する従来研究を改良することで、小バッチ推論の算術強度の低さに対処する。まず、投機的バッチをツリーとして再構成し、生成コストを削減し、バッチ当たりの期待トークン数を増やす。次に、投機的復号の第2段階を追加する。これらを合わせることで、出力品質を完全に保ちながら、762MパラメータのGPT-2-Lモデルで単一バッチの復号遅延を3.16倍削減する。 Recent advances with large language models (LLMs) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, we restructure the speculative batch as a tree, which reduces generation costs and increases the expected tokens per batch. Second, we add a second stage of speculative decoding. Taken together, we reduce single-batch decoding latency by 3.16x with a 762M parameter GPT-2-L model while perfectly preserving output quality.	翻訳日:2023-08-10 15:40:28 公開日:2023-08-08
# モノクロ映像から人間をレンダリングする Rendering Humans from Object-Occluded Monocular Videos ( http://arxiv.org/abs/2308.04622v1 ) ライセンス: Link先を確認	Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, Li Fei-Fei	(参考訳) モノクロビデオから人間を動かすことの3D理解とレンダリングは難しい課題だ。近年の進歩にもかかわらず、実際のシナリオでは、障害物がカメラの視界を遮り、キャプチャーされたビデオに部分的閉塞を引き起こすような作業は依然として困難である。既存のメソッドは2つの理由からこのような欠陥を処理できない。第一に、標準的なレンダリング戦略は点点マッピングに依存しており、これは身体の可視領域と隠蔽領域の間に劇的な差異をもたらす可能性がある。第二に、自然な直接回帰アプローチは、閉塞下でのレンダリングの実現可能性基準(つまり事前情報)を考慮しない。以上の欠点に対処するため,重度の閉鎖シーンにおいて,より優れたレンダリングを実現するニューラルネットワークレンダリング手法であるOccNeRFを提案する。この2つの欠点に対する直接的な解決策として,形状と可視性の統合による表面レンダリングを提案する。シミュレーションと実世界のオクルージョンの両方に対して本手法の有効性を検証する。 3D understanding and rendering of moving humans from monocular videos is a challenging task. Despite recent progress, the task remains difficult in real-world scenarios, where obstacles may block the camera view and cause partial occlusions in the captured videos. Existing methods cannot handle such defects due to two reasons. First, the standard rendering strategy relies on point-point mapping, which could lead to dramatic disparities between the visible and occluded areas of the body. Second, the naive direct regression approach does not consider any feasibility criteria (ie, prior information) for rendering under occlusions. To tackle the above drawbacks, we present OccNeRF, a neural rendering method that achieves better rendering of humans in severely occluded scenes. As direct solutions to the two drawbacks, we propose surface-based rendering by integrating geometry and visibility priors. We validate our method on both simulated and real-world occlusions and demonstrate our method's superiority.	翻訳日:2023-08-10 15:40:17 公開日:2023-08-08
# バンディットフィードバック下でのマルチクラスオンライン学習可能性 Multiclass Online Learnability under Bandit Feedback ( http://arxiv.org/abs/2308.04620v1 ) ライセンス: Link先を確認	Ananth Raman, Vinod Raman, Unique Subedi, Ambuj Tewari	(参考訳) バンディットフィードバック下でのオンラインマルチクラス分類について検討する。ラベル空間が非有界である場合でも、Bandit Littlestone次元の有限性がバンディットオンラインマルチクラス学習可能性の必要十分条件であることを示すことにより、(daniely2013price)の結果を拡張する。この結果は、ラベル空間が非有界である場合に、フル情報設定においてLittlestone次元がオンラインマルチクラス学習可能性を特徴付けることを示した(hanneke2023multiclass)の最近の研究を補完する。 We study online multiclass classification under bandit feedback. We extend the results of (daniely2013price) by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded. Our result complements the recent work of (hanneke2023multiclass), who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting when the label space is unbounded.	翻訳日:2023-08-10 15:40:02 公開日:2023-08-08
# ユニバーサルバックドア緩和とテスト時間検出のためのアクティベーションクリッピングの改善 Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection ( http://arxiv.org/abs/2308.04617v1 ) ライセンス: Link先を確認	Hang Wang, Zhen Xiang, David J. Miller, George Kesidis	(参考訳) ディープニューラルネットワークはバックドア攻撃(トロイの木馬)に脆弱であり、攻撃者がバックドアトリガーでトレーニングセットに毒を盛り、ニューラルネットワークが攻撃者の指定されたターゲットクラスに対するテストタイムトリガーの分類を学ぶ。近年の研究では、バックドア中毒は攻撃されたモデルにおいて過剰フィッティング(異常に大きな活性化)を誘発し、これによりバックドア緩和のための一般的な訓練後のクリッピング法、すなわち、少量のクリーンサンプルを用いて学習した内部層活性化の限界を動機付けることが示されている。我々は、分類マージンを明示的に制限するためにアクティベーション境界を選択する新しいアプローチを考案する。この手法は、CIFAR-10画像分類のためのピア法に対して優れた性能を与える。また,この手法は適応攻撃,x2x攻撃,異なるデータセットに対して強いロバスト性を示す。最後に、元のネットワークとアクティベーションバウンドネットワークの出力差に基づいて、テスト時間検出と修正のための方法拡張を示す。本手法のコードはオンラインで利用可能である。 Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code of our method is online available.	翻訳日:2023-08-10 15:39:52 公開日:2023-08-08
# ストレス・ストレス関連精神疾患の検出・予測・モニタリングのための機械学習・ディープラーニング・データ前処理技術:スコープレビュー Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review ( http://arxiv.org/abs/2308.04616v1 ) ライセンス: Link先を確認	Moein Razavi, Samira Ziyadidegan, Reza Jahromi, Saber Kazeminasab, Vahid Janfaza, Ahmadreza Mahmoudzadeh, Elaheh Baharlouei, Farzan Sasangohar	(参考訳) この総合的なレビューは、精神ストレスとその関連する精神障害の検出、予測、分析に使用される機械学習(ML)方法論を体系的に評価する。厳密なスコーピングレビュープロセスを用いて,ストレスおよびストレス関連mdsの文脈で使用される最新のmlアルゴリズム,前処理技術,データ型について調査を行った。その結果、Support Vector Machine(SVM)、Neural Network(NN)、Random Forest(RF)モデルは、検査されたすべての機械学習アルゴリズムにおいて、常に優れた精度と堅牢性を示すことがわかった。さらに, 心拍数測定や皮膚反応などの生理的パラメータが, mlアルゴリズムのストレス予測因子として広く用いられていることを考察する。これは、ストレスやストレス関連のMDに関する豊富な説明情報と、データ取得の比較的容易さに起因する。さらに、マッピング、特徴選択、フィルタリング、ノイズ低減を含む次元性低減技術の応用は、MLアルゴリズムの訓練に先立って重要なステップとしてしばしば観察される。このレビューの合成は、重要な研究のギャップを明らかにし、この分野の今後の方向性を概説する。これらの領域は、モデル解釈可能性、モデルパーソナライゼーション、自然主義的設定の組み込み、ストレスやストレスに関連するmdsの検出と予測のためのリアルタイム処理能力などを含む。 This comprehensive review systematically evaluates Machine Learning (ML) methodologies employed in the detection, prediction, and analysis of mental stress and its consequent mental disorders (MDs). Utilizing a rigorous scoping review process, the investigation delves into the latest ML algorithms, preprocessing techniques, and data types employed in the context of stress and stress-related MDs. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among all machine learning algorithms examined. Furthermore, the review underscores that physiological parameters, such as heart rate measurements and skin response, are prevalently used as stress predictors in ML algorithms. This is attributed to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. 
Additionally, the application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. The synthesis of this review identifies significant research gaps and outlines future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for detection and prediction of stress and stress-related MDs.	翻訳日:2023-08-10 15:39:34 公開日:2023-08-08
# 深層学習を用いた方向探索のためのスパースアレイ設計 Sparse Array Design for Direction Finding using Deep Learning ( http://arxiv.org/abs/2308.04615v1 ) ライセンス: Link先を確認	Kumar Vijay Mishra, Ahmet M. Elbir and Koichi Ichige	(参考訳) 近年,スパースアレイの設計に深層学習(DL)技術が導入されている。これらの手法は、機能工学と低い予測段階の複雑さの利点を提供し、スパース配列を見つけることに固有の組合せ探索に取り組むのに役立つ。本章では,DLに基づくスパースアレイの応用について,複数の方向の合成を行う。まず、認識レーダ応用のためのスパースアレイの選択に適用可能な教師付きおよび伝達学習手法を検討する。ここでは,2次元スパースアレイの設計において,シミュレートアニーリングなどのメタヒューリスティック学習アルゴリズムの利用についても論じる。次に,sparse array問題とチャネル推定,ビームフォーミング,ローカライズを併用した無線通信のためのdlベースアンテナ選択について検討する。最後に,isac(integrated sensing and communications)アプリケーションにおいて,レーダと通信性能のトレードオフによってisacスパースアレイ問題が非常に困難となるような,深いスパースアレイ手法の例を示す。各設定について,いくつかの数値実験を通してモデルに基づく最適化とdl手法の性能を示す。我々は、配列データの様々な不完全性に対するdlベースのアルゴリズムの堅牢性を確保するために必要となる追加の考慮事項について論じる。 In the past few years, deep learning (DL) techniques have been introduced for designing sparse arrays. These methods offer the advantages of feature engineering and low prediction-stage complexity, which is helpful in tackling the combinatorial search inherent to finding a sparse array. In this chapter, we provide a synopsis of several direction finding applications of DL-based sparse arrays. We begin by examining supervised and transfer learning techniques that have applications in selecting sparse arrays for a cognitive radar application. Here, we also discuss the use of meta-heuristic learning algorithms such as simulated annealing for the case of designing two-dimensional sparse arrays. Next, we consider DL-based antenna selection for wireless communications, wherein sparse array problem may also be combined with channel estimation, beamforming, or localization. Finally, we provide an example of deep sparse array technique for integrated sensing and communications (ISAC) application, wherein a trade-off of radar and communications performance makes ISAC sparse array problem very challenging. 
For each setting, we illustrate the performance of model-based optimization and DL techniques through several numerical experiments. We discuss additional considerations required to ensure robustness of DL-based algorithms against various imperfections in array data.	翻訳日:2023-08-10 15:39:09 公開日:2023-08-08
# 津波に伴う内部重力波の深層学習による検出 : 開海自然災害検出への道 Deep Learning Driven Detection of Tsunami Related Internal GravityWaves: a path towards open-ocean natural hazards detection ( http://arxiv.org/abs/2308.04611v1 ) ライセンス: Link先を確認	Valentino Constantinou, Michela Ravanelli, Hamlin Liu, Jacob Bortnik	(参考訳) 津波は電離圏内で内部重力波(IGW)を発生させ、地球航法衛星システム(GNSS)によって検出される全電子含有量(TEC)を摂動させる。 GNSSは、ヨーロッパのガリレオ、アメリカ合衆国のGPS、ロシアのGlobal'naya Navigatsionnaya Sputnikovaya Sistema(GLONASS)、中国のBeiDouといった地球軌道からの信号を提供する衛星群である。 TIDのリアルタイム検出は津波検出のアプローチを提供し、ブイベースの警報システムでは利用できない地域において、早期警報システムを強化する。 GNSSデータの大部分はディープラーニングによって活用され、何千ものデータストリームにわたる複雑な非線形関係を効果的に処理する。 VARION(Variometric Approach for Real-Time Ionosphere Observation)アルゴリズムからスラント全電子含有量(sTEC)をグラミアン角差分場(Computer Vision)と畳み込みニューラルネットワーク(Convolutional Neural Networks, CNN)を用いてほぼリアルタイムに検出するフレームワークについて述べる。 2010年モーレ地震、2011年東北地震、2012年ハイダ・グワイ地震と津波の過去のデータはモデルトレーニングに使われ、2015年チリのイラペル地震と津波はサンプルモデルの検証に使われている。論文で説明した実験フレームワークを用いて,91.7%のF1スコアを得た。ソースコードはhttps://github.com/vc1492a/tidd。本研究は, 開海の津波によるIGWの検出における新たなフロンティアであり, 沿岸地域の自然災害検出の可能性を大幅に向上させるものである。 Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC) - referred to as Traveling Ionospheric Disturbances (TIDs) that are detectable through the Global Navigation Satellite System (GNSS). The GNSS are constellations of satellites providing signals from Earth orbit - Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Large volumes of the GNSS data is leveraged by deep learning, which effectively handles complex non-linear relationships across thousands of data streams. 
We describe a framework leveraging slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm by Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and the 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in open-ocean, dramatically improving the potential for natural hazards detection for coastal communities.	翻訳日:2023-08-10 15:38:51 公開日:2023-08-08
# データサイエンスプロジェクトが失敗する理由 Why Data Science Projects Fail ( http://arxiv.org/abs/2308.04896v1 ) ライセンス: Link先を確認	Balaram Panda (The University of Auckland)	(参考訳) データサイエンスは現代のデータインテリジェンスの実践であり、多くのビジネスの中核であり、ビジネスの課題をより効率的に扱うためのスマートな戦略を構築するのに役立つ。データサイエンスの実践は、アルゴリズムを使ったビジネスプロセスの自動化にも役立つ。データサイエンスに関しては、主に3つの重要な要素がデータサイエンスプロジェクトの効果的な成果に影響を及ぼす。それらは、1. データの利用可能性、2. アルゴリズム、3. 処理能力やインフラである。 Data Science is a modern Data Intelligence practice, which is the core of many businesses and helps businesses build smart strategies to deal with business challenges more efficiently. Data Science practice also helps in automating business processes using algorithms, and it has several other benefits, which also deliver in a non-profitable framework. In regard to data science, three key components primarily influence the effective outcome of a data science project. Those are: 1. availability of data, 2. the algorithm, and 3. processing power or infrastructure.	翻訳日:2023-08-10 13:52:08 公開日:2023-08-08
# cuts: 医療用画像セグメンテーションのための教師なしフレームワーク CUTS: A Fully Unsupervised Framework for Medical Image Segmentation ( http://arxiv.org/abs/2209.11359v5 ) ライセンス: Link先を確認	Chen Liu, Matthew Amodio, Liangbo L. Shen, Feng Gao, Arman Avesta, Sanjay Aneja, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy	(参考訳) 本研究では,医用画像セグメンテーションのための完全教師なしディープラーニングフレームワークであるCUTS(Contrastive and Unsupervised Training for Segmentation)を導入する。ピクセルとその周辺地域からの自己スーパービジョンを画像自身で活用する。教師なしのアプローチは、コントラスト学習や自動エンコーディングの概念を活用するトレーニング目標を最適化します。いずれの段階においてもラベル付きデータを必要とせず,新たな2段階アプローチで医療画像のセグメンテーションを行う。最初の段階は、高次元の潜在埋め込み空間におけるベクトル表現を用いて、周囲のパッチと共にすべてのピクセルを埋め込む「ピクセル中心のパッチ」を作成することである。第2段階は、多スケールの位相データ解析手法である拡散凝縮を用いて、これらの埋め込みベクトルを任意のレベルの粒度で動的に粗粒化する。最終的な結果は、様々なスケールで画像構造をハイライトする粗い部分分割のシリーズである。本研究では,自然画像,網膜眼底画像,脳mri画像のマルチスケールセグメンテーションを成功させた。本フレームワークは, 医療画像の場合, 臨床解釈に関連のある異なる情報を伝達しうる, 異なるスケールで構造やパターンを規定する。本フレームワークは,3つの医用画像データセットにおける既存の教師なし手法と比較して,ダイス係数とハウスドルフ距離の10%から200%の改善を定量的に示す。ラベルに頼らずに複数の意味のある粒度の医療画像の分節化の問題に取り組む中で,今後,退屈かつ反復的な手動アノテーションを回避できることを実証したい。 In this work we introduce CUTS (Contrastive and Unsupervised Training for Segmentation), a fully unsupervised deep learning framework for medical image segmentation to better utilize the vast majority of imaging data that is not labeled or annotated. We utilize self-supervision from pixels and their local neighborhoods in the images themselves. Our unsupervised approach optimizes a training objective that leverages concepts from contrastive learning and autoencoding. Our framework segments medical images with a novel two-stage approach without relying on any labeled data at any stage. The first stage involves the creation of a "pixel-centered patch" that embeds every pixel along with its surrounding patch, using a vector representation in a high-dimensional latent embedding space. The second stage utilizes diffusion condensation, a multi-scale topological data analysis approach, to dynamically coarse-grain these embedding vectors at all levels of granularity. 
The final outcome is a series of coarse-to-fine segmentations that highlight image structures at various scales. In this work, we show successful multi-scale segmentation on natural images, retinal fundus images, and brain MRI images. Our framework delineates structures and patterns at different scales which, in the cases of medical images, may carry distinct information relevant to clinical interpretation. Quantitatively, our framework demonstrates improvements ranging from 10% to 200% on dice coefficient and Hausdorff distance compared to existing unsupervised methods across three medical image datasets. As we tackle the problem of segmenting medical images at multiple meaningful granularities without relying on any label, we hope to demonstrate the possibility to circumvent tedious and repetitive manual annotations in future practice.	翻訳日:2023-08-10 10:57:28 公開日:2023-08-08
# アニーリングマシンによるベイズネットワークの学習 Learning Bayesian Networks with Annealing Machine ( http://arxiv.org/abs/2006.06926v4 ) ライセンス: Link先を確認	Yuta Shikuri	(参考訳) 近年の研究では、アニーリングマシンは高い精度で組合せ最適化問題を解決することができると報告されている。アニーリングマシンは、スコアベースのベイズネットワーク構造学習に応用できる可能性がある。しかし、現在、アニール機のビット容量は制限されている。このアニール技術を利用するには、スコアベースの学習問題をビット容量内の2次非制約バイナリ最適化に変換する必要がある。本稿では,候補となる親集合の高度な同定と分解による効率的な変換手法を提案する。また、必要なビット数を最小限に抑える分解を見つけるために整数プログラミング問題も提供する。変数が75ドルから223ドルまでのベンチマークデータセットによる実験結果から,半導体技術で開発された完全結合型アニールマシンであるFujitsu Digital Annealerの100ドルKビット容量よりも,我々のアプローチではビット数が少なくなることがわかった。さらに,本手法によるディジタルアニーラは,既存のアルゴリズムよりもスコア最大化に優れることを示す。これらの結果はベイズネットワーク学習におけるアニールプロセッサの有用性を強調した。 Recent studies have reported that annealing machines are capable of solving combinatorial optimization problems with high accuracy. Annealing machines can potentially be applied to score-based Bayesian network structure learning. However, the bit capacity of an annealing machine is currently limited. To utilize the annealing technology, converting score-based learning problems into quadratic unconstrained binary optimizations within the bit capacity is necessary. In this paper, we propose an efficient conversion method with the advanced identification of candidate parent sets and their decomposition. We also provide an integer programming problem to find the decomposition that minimizes the number of required bits. Experimental results on $7$ benchmark datasets with variables from $75$ to $223$ show that our approach requires less bits than the $100$K bit capacity of the fourth-generation Fujitsu Digital Annealer, a fully coupled annealing machine developed with semiconductor technology. Moreover, we demonstrate that the Digital Annealer with our conversion method outperforms existing algorithms on score maximization. These results highlight the utility of annealing processors in learning Bayesian networks.	翻訳日:2023-08-09 18:09:54 公開日:2023-08-08
# モバイルネット畳み込みに基づく軽量ターゲット検出アルゴリズム A lightweight target detection algorithm based on Mobilenet Convolution ( http://arxiv.org/abs/2002.03729v3 ) ライセンス: Link先を確認	Shengquan Wang	(参考訳) RTX2060カードでの処理速度は、解像度320×320の画像に対して30fpsである。 Target detection algorithms based on deep learning require a high-end GPU configuration, or even a high-performance deep learning workstation. This not only increases cost but also greatly limits practical deployment. This paper introduces a lightweight target detection algorithm that balances accuracy and computational efficiency, using MobileNet as the backbone. The processing speed is 30 fps on the RTX2060 card for images with a resolution of 320×320.	翻訳日:2023-08-09 18:09:35 公開日:2023-08-08
# 一般資源理論におけるモノトン Monotones in General Resource Theories ( http://arxiv.org/abs/1912.07085v3 ) ライセンス: Link先を確認	Tom\'a\v{s} Gonda, Robert W. Spekkens	(参考訳) 資源理論の研究における中心的な問題は、資源変換(モノトーンと呼ばれる)下では、リソースフルネスを定量化するために、不要な関数を見つけることである。モノトンの様々な構成は、多くの異なるコンクリートの資源理論に現れる。これらの構造はどのくらい一般的ですか。与えられた構成を適用すべき資源理論に必要な条件は何か。これらの疑問に答えるために、モノトーンを構成するための幅広いスキームを導入する。興味のある資源の序列から、非自明なモノトンが以前に知られていたり、より簡単に構築できるような、明確な事前順序への順序保存写像を見つけることを含む。私たちが研究した2つの主要なクラスのうちの1つでは、リソースの事前順序はリソースの集合の事前順序にマッピングされ、順序関係が包含されている場合、これらの集合内の関数の値の最大化や最小化によってモノトンが定義できる。他のクラスでは、リソースのプレオーダーはリソースのタプルのプレオーダーにマッピングされ、タプルの異なる要素(その情報内容)の識別可能性の量を測定するモノトーンをプルする。収縮に基づくモノトーンは、後者のクラスで自然に発生し、さらに驚くべきことに、重量とロバスト性の測定も行う。標準モノトン構成の多くを捉えることに加えて、このスキームはこれらの重要な一般化も示唆している。結果の適用可能性の広さを適切に把握するために, 構成概念が関連する資源の種類(状態, チャネル, コームなど)に依存しない, 新たな資源理論の抽象的枠組みとして提示する。 A central problem in the study of resource theories is to find functions that are nonincreasing under resource conversions - termed monotones - in order to quantify resourcefulness. Various constructions of monotones appear in many different concrete resource theories. How general are these constructions? What are the necessary conditions on a resource theory for a given construction to be applicable? To answer these questions, we introduce a broad scheme for constructing monotones. It involves finding an order-preserving map from the preorder of resources of interest to a distinct preorder for which nontrivial monotones are previously known or can be more easily constructed; these monotones are then pulled back through the map. In one of the two main classes we study, the preorder of resources is mapped to a preorder of sets of resources, where the order relation is set inclusion, such that monotones can be defined via maximizing or minimizing the value of a function within these sets. 
In the other class, the preorder of resources is mapped to a preorder of tuples of resources, and one pulls back monotones that measure the amount of distinguishability of the different elements of the tuple (hence its information content). Monotones based on contractions arise naturally in the latter class, and, more surprisingly, so do weight and robustness measures. In addition to capturing many standard monotone constructions, our scheme also suggests significant generalizations of these. In order to properly capture the breadth of applicability of our results, we present them within a novel abstract framework for resource theories in which the notion of composition is independent of the types of the resources involved (i.e., whether they are states, channels, combs, etc.).	翻訳日:2023-08-09 18:09:28 公開日:2023-08-08
# 識別器最適輸送 Discriminator optimal transport ( http://arxiv.org/abs/1910.06832v3 ) ライセンス: Link先を確認	Akinori Tanaka	(参考訳) 敵対的生成ネットワークの幅広いクラスにおいて、識別器の最適化プロセスは、ターゲット分布 $p$ とジェネレータ分布 $p_G$ の間のワッサーシュタイン距離に対する双対コスト関数の下界を増大させることを示す。これは、訓練された識別器が $p_G$ から $p$ への最適輸送(OT)を近似できることを意味する。いくつかの実験と少しのOT理論に基づき、生成画像を改善するための識別器最適輸送(DOT)スキームを提案する。CIFAR-10、STL-10 で訓練された無条件 GAN と、ImageNet で訓練された条件付き GAN の公開事前学習モデルについて、Inceptionスコアと FID が向上することを示す。 Within a broad class of generative adversarial networks, we show that the discriminator optimization process increases a lower bound of the dual cost function for the Wasserstein distance between the target distribution $p$ and the generator distribution $p_G$. It implies that the trained discriminator can approximate optimal transport (OT) from $p_G$ to $p$. Based on some experiments and a bit of OT theory, we propose a discriminator optimal transport (DOT) scheme to improve generated images. We show that it improves the inception score and FID calculated for unconditional GANs trained on CIFAR-10 and STL-10 and for a public pre-trained conditional GAN model trained on ImageNet.	翻訳日:2023-08-09 18:08:58 公開日:2023-08-08
# 対数モデルPT対称性作用素における実スペクトル:対数PT対称性における等スペクトル Real spectra in Logarithmic model PT-symmetry operators: Iso-spectra in Logarithmic PT-symmetry ( http://arxiv.org/abs/1904.09983v5 ) ライセンス: Link先を確認	Biswanath Rath, Rabab Jarrar, Hussein Shanak, Jihad Asad, and Rania Wannan	(参考訳) 特異および非特異な性質を持つ新しい対数モデルPT対称性作用素の実スペクトルを報告する。また、反転および非反転の対数型PT対称ポテンシャルの間の等スペクトル性にも注目する。今回の数値結果は以前の結果とよく一致している。 We report real spectra of new logarithmic model PT-symmetry operators that are singular or non-singular in nature. We also note the iso-spectral nature of inverted and non-inverted logarithmic PT-symmetric potentials. The present numerical results are in good agreement with previous results.	翻訳日:2023-08-09 18:08:45 公開日:2023-08-08
# トランスベース事前学習言語モデルを用いた制御可能なテキスト生成に関する調査 A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models ( http://arxiv.org/abs/2201.05337v4 ) ライセンス: Link先を確認	Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song	(参考訳) 制御可能なテキスト生成(CTG)は、自然言語生成(NLG)分野における新興分野である。実用上の制約をよりよく満たす高度なテキスト生成技術を開発する上で重要であると考えられている。近年、大規模な事前学習言語モデル(PLM)を用いた手法、特に広く使われているトランスフォーマーベースのPLMは、NLGの新しいパラダイムとなり、より多種多様な流動的なテキストを生成することができる。しかし、ディープニューラルネットワークの解釈可能性に限界があるため、これらの手法の制御性を保証する必要がある。この目的のために、トランスフォーマーベースのPLMを用いた制御可能なテキスト生成は、急速に成長するが、新しい研究ホットスポットとなっている。近年の3～4年間で、様々なタイプの制御制約を必要とするCTGタスクをターゲットにした多様なアプローチが出現している。本稿では,この分野における共通課題,主なアプローチ,評価手法について,系統的な批判的考察を行う。最後に、この分野が直面している課題について議論し、様々な将来的な方向性を提示する。我々の知る限りでは、トランスフォーマーベースのPLMの観点から最先端CTG技術の概要をまとめた最初の調査論文である。関連分野の研究者や実践者が、学術的および技術的フロンティアを迅速に追跡し、その領域の風景と将来の研究のロードマップを提供するのに役立つことを期待している。 Controllable Text Generation (CTG) is emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that better meet the specific constraints in practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the limited level of interpretability of deep neural networks, the controllability of these methods need to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging new research hotspot. A diverse range of approaches have emerged in the recent 3-4 years, targeting different CTG tasks that require different types of controlled constraints. In this paper, we present a systematic critical review on the common tasks, main approaches, and evaluation methods in this area. 
Finally, we discuss the challenges that the field is facing, and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize the state-of-the-art CTG techniques from the perspective of Transformer-based PLMs. We hope it can help researchers and practitioners in the related fields to quickly track the academic and technological frontier, providing them with a landscape of the area and a roadmap for future research.	翻訳日:2023-08-09 18:04:13 公開日:2023-08-08
# Lawin Transformer: セマンティックセグメンテーションのためのマルチスケール表現による新しいEraビジョンバックボーンの改良 Lawin Transformer: Improving New-Era Vision Backbones with Multi-Scale Representations for Semantic Segmentation ( http://arxiv.org/abs/2201.01615v3 ) ライセンス: Link先を確認	Haotian Yan and Chuang Zhang and Ming Wu	(参考訳) マルチレベルアグリゲーション(MLA)モジュールは、セマンティックセグメンテーションにおいて、新しい時代のビジョンバックボーンを前進させる重要なコンポーネントとして登場した。本稿では,視覚バックボーンからのマルチスケール特徴マップを創造的に活用する新しいMLAアーキテクチャであるLawin (large window) Transformerを提案する。 lawin transformerのコアはlawin attentionであり、ローカルウィンドウよりもずっと大きなコンテキストウィンドウをクエリできる、新たに設計されたウィンドウアテンションメカニズムである。我々は,大規模ウィンドウパラダイムの効率的かつ簡易な応用について研究することに注力し,大規模コンテクストのクエリとマルチスケール表現のキャプチャに対する比率の柔軟な規制を可能にした。我々はLawin TransformerがCityscapesおよびADE20Kに与える影響を検証し、新しい視覚バックボーンと組み合わせることで、広く使われているMLAモジュールに優れた優位性を示す。コードはhttps://github.com/yan-hao-tian/lawinで入手できる。 The multi-level aggregation (MLA) module has emerged as a critical component for advancing new-era vision back-bones in semantic segmentation. In this paper, we propose Lawin (large window) Transformer, a novel MLA architecture that creatively utilizes multi-scale feature maps from the vision backbone. At the core of Lawin Transformer is the Lawin attention, a newly designed window attention mechanism capable of querying much larger context windows than local windows. We focus on studying the efficient and simplistic application of the large-window paradigm, allowing for flexible regulation of the ratio of large context to query and capturing multi-scale representations. We validate the effectiveness of Lawin Transformer on Cityscapes and ADE20K, consistently demonstrating great superiority to widely-used MLA modules when combined with new-era vision backbones. The code is available at https://github.com/yan-hao-tian/lawin.	翻訳日:2023-08-09 18:03:44 公開日:2023-08-08
# カーネルを用いた複合適合度検定 Composite Goodness-of-fit Tests with Kernels ( http://arxiv.org/abs/2111.10275v3 ) ライセンス: Link先を確認	Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez	(参考訳) モデルの誤特定は確率的モデルの実装に重大な課題を生じさせうるため、この問題を直接的に考慮する様々な頑健な手法の開発につながっている。しかし、これらのより込み入った手法が必要かどうかは、モデルが本当に誤特定されているかどうかに依存し、この問いに答える一般的に適用可能な方法は不足している。本稿では、そのような方法の一つを提案する。より正確には、データがあるパラメトリック族の中のいずれかの分布から得られたものかどうかに関心を持つ、難しい複合検定問題に対するカーネルベースの仮説検定を提案する。我々の検定は、最大平均不一致(MMD)とカーネルStein不一致(KSD)に基づく最小距離推定量を利用する。これらは広く適用可能であり、パラメトリックモデルの密度が正規化定数を除いて分かっている場合や、モデルがシミュレータの形式を取る場合も含まれる。主結果として、適切な検定水準を維持しつつ、同じデータに対して(データ分割を伴わずに)パラメータの推定と検定を行えることを示す。提案手法を、非正規化ノンパラメトリック密度モデルの適合度検定や、生体細胞ネットワークの扱いにくい(intractable)生成モデルなど、様々な問題で例示する。 Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to a normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.	翻訳日:2023-08-09 18:03:07 公開日:2023-08-08
# Analysis of Regularized Learning for Linear-functional Data in Banach Spaces ( http://arxiv.org/abs/2109.03159v6 ) License: see linked page	Qi Ye	In this article, we study the whole theory of regularized learning for linear-functional data in Banach spaces, including representer theorems, pseudo-approximation theorems, and convergence theorems. The input training data are composed of linear functionals in the predual space of the Banach space to represent the discrete local information of multimodel data and multiscale models. The training data and the multi-loss functions are used to compute the empirical risks to approximate the expected risks, and the regularized learning is to minimize the regularized empirical risks over the Banach spaces. The exact solutions of the original problems are approximated globally by the regularized learning even if the original problems are unknown or unformulated. In the convergence theorems, we show the convergence of the approximate solutions to the exact solutions by the weak topology of the Banach space. Moreover, the theorems of the regularized learning are applied to solve many problems of machine learning such as support vector machines and artificial neural networks.	Translated: 2023-08-09 18:02:47 Published: 2023-08-08
# Parameter selection in Gaussian process interpolation: an empirical study of selection criteria ( http://arxiv.org/abs/2107.06006v5 ) License: see linked page	Sébastien Petit (L2S, GdR MASCOT-NUM), Julien Bect (L2S, GdR MASCOT-NUM), Paul Feliot, Emmanuel Vazquez (L2S, GdR MASCOT-NUM)	This article revisits the fundamental problem of parameter selection for Gaussian process interpolation. By choosing the mean and the covariance functions of a Gaussian process within parametric families, the user obtains a family of Bayesian procedures to perform predictions about the unknown function, and must choose a member of the family that will hopefully provide good predictive performances. We base our study on the general concept of scoring rules, which provides an effective framework for building leave-one-out selection and validation criteria, and a notion of extended likelihood criteria based on an idea proposed by Fasshauer and co-authors in 2009, which makes it possible to recover standard selection criteria such as, for instance, the generalized cross-validation criterion. Under this setting, we empirically show on several test problems of the literature that the choice of an appropriate family of models is often more important than the choice of a particular selection criterion (e.g., the likelihood versus a leave-one-out selection criterion). Moreover, our numerical results show that the regularity parameter of a Matérn covariance can be selected effectively by most selection criteria.	Translated: 2023-08-09 18:02:32 Published: 2023-08-08
# Practical and Rigorous Uncertainty Bounds for Gaussian Process Regression ( http://arxiv.org/abs/2105.02796v2 ) License: see linked page	Christian Fiedler, Carsten W. Scherer, Sebastian Trimpe	Gaussian Process Regression is a popular nonparametric regression method based on Bayesian principles that provides uncertainty estimates for its predictions. However, these estimates are of a Bayesian nature, whereas for some important applications, like learning-based control with safety guarantees, frequentist uncertainty bounds are required. Although such rigorous bounds are available for Gaussian Processes, they are too conservative to be useful in applications. This often leads practitioners to replace these bounds with heuristics, thus breaking all theoretical guarantees. To address this problem, we introduce new uncertainty bounds that are rigorous, yet practically useful at the same time. In particular, the bounds can be explicitly evaluated and are much less conservative than state-of-the-art results. Furthermore, we show that certain model misspecifications lead to only graceful degradation. We demonstrate these advantages and the usefulness of our results for learning-based control with numerical examples.	Translated: 2023-08-09 18:02:10 Published: 2023-08-08
# Machine learning for rapid discovery of laminar flow channel wall modifications that enhance heat transfer ( http://arxiv.org/abs/2101.08130v2 ) License: see linked page	Yuri Koide, Arjun J. Kaithakkal, Matthias Schniewind, Bradley P. Ladewig, Alexander Stroh and Pascal Friederich	Numerical simulation of fluids plays an essential role in modeling many physical phenomena, which enables technological advancements, contributes to sustainable practices, and expands our understanding of various natural and engineered systems. The calculation of heat transfer in fluid flow in simple flat channels is a relatively easy task for various simulation methods. However, once the channel geometry becomes more complex, numerical simulations become a bottleneck in optimizing wall geometries. We present a combination of accurate numerical simulations of arbitrary, flat, and non-flat channels and machine learning models predicting drag coefficient and Stanton number. We show that convolutional neural networks (CNN) can accurately predict the target properties at a fraction of the time of numerical simulations. We use the CNN models in a virtual high-throughput screening approach to explore a large number of possible, randomly generated wall architectures. Data augmentation was applied to the existing geometry data to generate new training data with the same number of heat transfer parameters, improving the model's generalization. The general approach is not only applicable to simple flow setups as presented here but can be extended to more complex tasks, such as multiphase or even reactive unit operations in chemical engineering.	Translated: 2023-08-09 18:01:22 Published: 2023-08-08
# Set-based value operators for non-stationary Markovian environments ( http://arxiv.org/abs/2207.07271v3 ) License: see linked page	Sarah H.Q. Li, Assalé Adjé, Pierre-Loïc Garoche, Behçet Açıkmeşe	This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. To this end, we generalize the Bellman and policy evaluation operators to contracting operators on the value function space and denote them as value operators. We lift these value operators to act on sets of value functions and denote them as set-based value operators. We prove that the set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition in classic robust MDP literature to a containment condition for all value operators, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contracting operators in dynamic programming. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own extrema elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function. Finally, we derive novel guarantees for probabilistic path-planning problems in planet exploration and stratospheric station-keeping.	Translated: 2023-08-09 17:53:22 Published: 2023-08-08
# Auto-Encoding Adversarial Imitation Learning ( http://arxiv.org/abs/2206.11004v3 ) License: see linked page	Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao	Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than the prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods in both state-based and image-based environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy.	Translated: 2023-08-09 17:52:13 Published: 2023-08-08
# ORC: Network Group-based Knowledge Distillation using Online Role Change ( http://arxiv.org/abs/2206.01186v2 ) License: see linked page	Junyong Choi, Hyeon Cho, Seokhwa Cheung, Wonjun Hwang	In knowledge distillation, since a single, omnipotent teacher network cannot solve all problems, multiple teacher-based knowledge distillations have been studied recently. However, sometimes their improvements are not as good as expected because some immature teachers may transfer false knowledge to the student. In this paper, to overcome this limitation and exploit the efficacy of multiple networks, we divide the multiple networks into teacher and student groups, respectively. That is, the student group is a set of immature networks that require learning the teacher's knowledge, while the teacher group consists of the selected networks that are capable of teaching successfully. We propose our online role change strategy where the top-ranked networks in the student group can be promoted to the teacher group at every iteration. After training the teacher group using the error samples of the student group to refine the teacher group's knowledge, we transfer the collaborative knowledge from the teacher group to the student group successfully. We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet, where it achieves high performance. We further show the generality of our method with various backbone architectures such as ResNet, WRN, VGG, MobileNet, and ShuffleNet.	Translated: 2023-08-09 17:51:58 Published: 2023-08-08
# Resource-Efficient Quantum Simulation of Lattice Gauge Theories in Arbitrary Dimensions: Solving for Gauss' Law and Fermion Elimination ( http://arxiv.org/abs/2206.00685v3 ) License: see linked page	Guy Pardo, Tomer Greenberg, Aryeh Fortinsky, Nadav Katz, Erez Zohar	Quantum simulation of Lattice Gauge Theories has been proposed and used as a method to overcome theoretical difficulties in dealing with the non-perturbative nature of such models. In this work we focus on two important bottlenecks that make developing such simulators hard: one is the difficulty of simulating fermionic degrees of freedom, and the other is the redundancy of the Hilbert space, which leads to a waste of experimental resources and the need to impose and monitor the local symmetry constraints of gauge theories. This has previously been tackled in one-dimensional settings, using non-local methods. Here we show an alternative procedure for dealing with these problems, which removes the matter and the Hilbert space redundancy, and is valid for higher space dimensions. We demonstrate it for a $\mathbb{Z}_2$ lattice gauge theory and implement it experimentally via the IBMQ cloud quantum computing platform.	Translated: 2023-08-09 17:51:38 Published: 2023-08-08
# Information back-flow in quantum non-Markovian dynamics and its connection to teleportation ( http://arxiv.org/abs/2203.00668v3 ) License: see linked page	Spyros Tserkis, Kade Head-Marsden, Prineha Narang	A quantum process is called non-Markovian when memory effects take place during its evolution. Quantum non-Markovianity is a phenomenon typically associated with the information back-flow from the environment to the principal system; however, it has been shown that such an effect is not necessary. In this work, we establish a connection between quantum non-Markovianity and the protocol of quantum teleportation in both discrete and continuous-variable systems. We also show how information flows during a teleportation protocol between the principal system and the environment in a bidirectional way leading up to a state revival. Finally, given the resource-like role of entanglement in the teleportation protocol, the relationship between this property and non-Markovianity is also elucidated.	Translated: 2023-08-09 17:50:50 Published: 2023-08-08
# Lifting-based variational multiclass segmentation: design, analysis and implementation ( http://arxiv.org/abs/2202.04680v2 ) License: see linked page	Nadja Gruber, Johannes Schwab, Sebastien Court, Elke Gizewski, Markus Haltmeier	We propose, analyze and realize a variational multiclass segmentation scheme that partitions a given image into multiple regions exhibiting specific properties. Our method determines multiple functions that encode the segmentation regions by minimizing an energy functional combining information from different channels. Multichannel image data can be obtained by lifting the image into a higher dimensional feature space using specific multichannel filtering or may already be provided by the imaging modality under consideration, such as an RGB image or multimodal medical data. Experimental results show that the proposed method performs well in various scenarios. In particular, promising results are presented for two medical applications involving classification of brain abscess and tumor growth, respectively. As our main theoretical contributions, we prove the existence of global minimizers of the proposed energy functional and show its stability and convergence with respect to noisy inputs. These results also apply to the special case of binary segmentation, where they are likewise novel.	Translated: 2023-08-09 17:50:37 Published: 2023-08-08
# Genie: Show Me the Data for Quantization ( http://arxiv.org/abs/2212.04780v3 ) License: see linked page	Yongkweon Jeon, Chungman Lee, Ho-young Kim	Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible owing to various reasons, including cost and issues related to privacy. By exploiting the learned parameters ($\mu$ and $\sigma$) of batch normalization layers in an FP32-pre-trained model, zero-shot quantization schemes focus on generating synthetic data. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) such that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and long-term optimization as much as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, which is comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we can obtain a unique state-of-the-art zero-shot quantization approach. The code is available at https://github.com/SamsungLabs/Genie.	Translated: 2023-08-09 17:45:30 Published: 2023-08-08
# Selective Memory Recursive Least Squares: Recast Forgetting into Memory in RBF Neural Network Based Real-Time Learning ( http://arxiv.org/abs/2211.07909v2 ) License: see linked page	Yiming Fei, Jiangang Li, Yanan Li	In radial basis function neural network (RBFNN) based real-time learning tasks, forgetting mechanisms are widely used such that the neural network can keep its sensitivity to new data. However, with forgetting mechanisms, some useful knowledge will get lost simply because it was learned a long time ago, which we refer to as the passive knowledge forgetting phenomenon. To address this problem, this paper proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism. Different from the forgetting mechanism, which mainly evaluates the importance of samples according to the time when samples are collected, the memory mechanism evaluates the importance of samples through both temporal and spatial distribution of samples. With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions and a synthesized objective function is developed using synthesized samples from each partition. In addition to the current approximation error, the neural network also updates its weights according to the recorded data from the partition being visited. Compared with classical training methods including the forgetting factor recursive least squares (FFRLS) and stochastic gradient descent (SGD) methods, SMRLS achieves improved learning speed and generalization capability, which are demonstrated by corresponding simulation results.	Translated: 2023-08-09 17:44:45 Published: 2023-08-08
# Learning To Rank Diversely At Airbnb ( http://arxiv.org/abs/2210.07774v3 ) License: see linked page	Malay Haldar, Mustafa Abdool, Liwei He, Dillon Davis, Huiji Gao, Sanjeev Katariya	Airbnb is a two-sided marketplace, bringing together hosts who own listings for rent with prospective guests from around the globe. Applying neural network-based learning to rank techniques has led to significant improvements in matching guests with hosts. These improvements in ranking were driven by a core strategy: order the listings by their estimated booking probabilities, then iterate on techniques to make these booking probability estimates more and more accurate. Embedded implicitly in this strategy was an assumption that the booking probability of a listing could be determined independently of other listings in search results. In this paper we discuss how this assumption, pervasive throughout the commonly-used learning to rank frameworks, is false. We provide a theoretical foundation correcting this assumption, followed by efficient neural network architectures based on the theory. Explicitly accounting for possible similarities between listings, and reducing them to diversify the search results, generated a strong positive impact. We discuss these metric wins as part of the online A/B tests of the theory. Our method provides a practical way to diversify search results for large-scale production ranking systems.	Translated: 2023-08-09 17:43:27 Published: 2023-08-08
# Spiking Neural Networks for event-based action recognition: A new task to understand their advantage ( http://arxiv.org/abs/2209.14915v2 ) License: see linked page	Alex Vicente-Sola, Davide L. Manna, Paul Kirkland, Gaetano Di Caterina, Trevor Bihl	Spiking Neural Networks (SNN) are characterised by their unique temporal dynamics, but the properties and advantages of such computations are still not well understood. In order to provide answers, in this work we demonstrate how Spiking neurons can enable temporal feature extraction in feed-forward neural networks without the need for recurrent synapses, showing how their bio-inspired computing principles can be successfully exploited beyond energy efficiency gains and evidencing their differences with respect to conventional neurons. This is demonstrated by proposing a new task, DVS-Gesture-Chain (DVS-GC), which allows, for the first time, the perception of temporal dependencies to be evaluated in a real event-based action recognition dataset. Our study proves how the widely used DVS Gesture benchmark could be solved by networks without temporal feature extraction, unlike the new DVS-GC which demands an understanding of the ordering of the events. Furthermore, this setup allowed us to unveil the role of the leakage rate in spiking neurons for temporal processing tasks and demonstrated the benefits of "hard reset" mechanisms. Additionally, we show how time-dependent weights and normalization can lead to understanding order by means of temporal attention.	Translated: 2023-08-09 17:43:09 Published: 2023-08-08
# Adversarial Coreset Selection for Efficient Robust Training ( http://arxiv.org/abs/2209.05785v2 ) License: see linked page	Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie	Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training robust models against such attacks. Unfortunately, this method is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively. Once built, we run adversarial training over this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times while experiencing a slight degradation in the clean and robust accuracy.	Translated: 2023-08-09 17:42:46 Published: 2023-08-08
# Simulating Open Quantum System Dynamics on NISQ Computers with Generalized Quantum Master Equations ( http://arxiv.org/abs/2209.04956v2 ) License: see linked page	Yuchen Wang (1), Ellen Mulvihill (2), Zixuan Hu (1), Ningyi Lyu (2), Saurabh Shivpuje (1), Yudan Liu (3), Micheline B. Soley (2 and 4), Eitan Geva (3), Victor S. Batista (2), and Sabre Kais (1) ((1) Purdue University, (2) Yale University, (3) University of Michigan, Ann Arbor, (4) University of Wisconsin-Madison)	We present a quantum algorithm based on the Generalized Quantum Master Equation (GQME) approach to simulate open quantum system dynamics on noisy intermediate-scale quantum (NISQ) computers. This approach overcomes the limitations of the Lindblad equation, which assumes weak system-bath coupling and Markovity, by providing a rigorous derivation of the equations of motion for any subset of elements of the reduced density matrix. The memory kernel resulting from the effect of the remaining degrees of freedom is used as input to calculate the corresponding non-unitary propagator. We demonstrate how the Sz.-Nagy dilation theorem can be employed to transform the non-unitary propagator into a unitary one in a higher-dimensional Hilbert space, which can then be implemented on quantum circuits of NISQ computers. We validate our quantum algorithm as applied to the spin-boson benchmark model by analyzing the impact of the quantum circuit depth on the accuracy of the results when the subset is limited to the diagonal elements of the reduced density matrix. Our findings demonstrate that our approach yields reliable results on NISQ IBM computers.	Translated: 2023-08-09 17:42:22 Published: 2023-08-08
# Swin-transformer-yolov5によるリアルタイムワイングレープバンチ検出 Swin-transformer-yolov5 For Real-time Wine Grape Bunch Detection ( http://arxiv.org/abs/2208.14508v3 ) ライセンス: Link先を確認	Shenglian Lu (1), Xiaoyu Liu (1), Zixaun He (2), Wenbo Liu (3), Xin Zhang (3), and Manoj Karkee (2) ((1) Guangxi Normal University, China, (2) Washington State University, US, (3) Mississippi State University, US)	(参考訳) 本研究では, リアルタイムワイン品種検出において, Swin-transformer-YOLOv5 と Swin-T-YOLOv5 が提案され, YOLOv5 と Swin-transformer の両方の利点を継承した。この研究は、2019年7月から9月にかけて、シャルドネ(白ベリーの皮)とメルロット(未熟時に白または白赤の混合ベリーの皮)の2種類のブドウ品種について行われた。 Swin-T-YOLOv5の優位性を検証するため、その性能はFaster R-CNN、YOLOv3、YOLOv4、YOLOv5など、一般的に使われている、競合するオブジェクト検出器と比較された。いずれのモデルも,2つの異なる気象条件(晴れと曇り),2つの異なるベリー成熟段階(未熟と成熟),および3つの異なる日光方向/強度(朝,正午,午後)を総合的に比較した。さらに,Swin-T-YOLOv5によるブドウの品種数予測は,アノテーション処理中の手動カウントや手動ラベリングなど,真理値と比較した。その結果、提案されたSwin-T-YOLOv5は、天候が曇ったときに平均精度(mAP)が97%、F1スコアが0.89という他の研究モデルよりも優れていた。このmAPはFaster R-CNN, YOLOv3, YOLOv4, YOLOv5より約44%, 18%, 14%, 4%高かった。 Swin-T-YOLOv5 は未熟果検出時に最低 mAP (90%) と F1-score (0.82) を達成し, 約40%, 5%, 3%, 1% の値を示した。さらに、Swin-T-YOLOv5は、予測と地上の真実を比較する際に、R2の最大0.91と2.36の根平均二乗誤差(RMSE)を達成したシャルドネ品種に対してより良い性能を示した。しかし、Merlotの品種では性能が劣り、R2の0.70とRMSEの3.30しか達成できなかった。 In this research, an integrated detection model, Swin-transformer-YOLOv5 or Swin-T-YOLOv5, was proposed for real-time wine grape bunch detection to inherit the advantages from both YOLOv5 and Swin-transformer. The research was conducted on two different grape varieties of Chardonnay (always white berry skin) and Merlot (white or white-red mix berry skin when immature; red when matured) from July to September in 2019. To verify the superiority of Swin-T-YOLOv5, its performance was compared against several commonly used/competitive object detectors, including Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5. 
All models were assessed under different test conditions, including two different weather conditions (sunny and cloudy), two different berry maturity stages (immature and mature), and three different sunlight directions/intensities (morning, noon, and afternoon) for a comprehensive comparison. Additionally, the number of grape bunches predicted by Swin-T-YOLOv5 was further compared with ground truth values, including both in-field manual counting and manual labeling during the annotation process. Results showed that the proposed Swin-T-YOLOv5 outperformed all other studied models for grape bunch detection, with up to 97% mean Average Precision (mAP) and an F1-score of 0.89 when the weather was cloudy. This mAP was approximately 44%, 18%, 14%, and 4% greater than that of Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5, respectively. Swin-T-YOLOv5 achieved its lowest mAP (90%) and F1-score (0.82) when detecting immature berries, where the mAP was approximately 40%, 5%, 3%, and 1% greater than that of the same four detectors. Furthermore, Swin-T-YOLOv5 performed better on the Chardonnay variety, achieving an R2 of up to 0.91 and a root mean square error (RMSE) of 2.36 when comparing the predictions with ground truth. However, it underperformed on the Merlot variety, achieving an R2 of only up to 0.70 and an RMSE of 3.30.	翻訳日:2023-08-09 17:42:02 公開日:2023-08-08
# 深部産業画像の異常検出:調査 Deep Industrial Image Anomaly Detection: A Survey ( http://arxiv.org/abs/2301.11514v4 ) ライセンス: Link先を確認	Jiaqi Liu, Guoyang Xie, Jingbao Wang, Shangnian Li, Chengjie Wang, Feng Zheng, Yaochu Jin	(参考訳) 近年のディープラーニングの急速な発展は,産業用画像異常検出(IAD)のマイルストーンとなった。本稿では,ニューラルネットワークアーキテクチャ,監視レベル,損失関数,メトリクス,データセットの観点から,ディープラーニングに基づく画像異常検出手法の包括的なレビューを行う。また, 工業生産から新たな環境を抽出し, 我々の提案した新たな環境下での現在のIADアプローチを概観する。さらに,画像異常検出における未解決の課題をいくつか挙げる。各種監視下の代表的ネットワークアーキテクチャのメリットと欠点について論じる。最後に,研究成果を要約し,今後の研究方向性を指摘する。さらなるリソースはhttps://github.com/M-3LAB/awesome-industrial-anomaly-detectionで入手できる。 The recent rapid development of deep learning has laid a milestone in industrial Image Anomaly Detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets. In addition, we extract the new setting from industrial manufacturing and review the current IAD approaches under our proposed new setting. Moreover, we highlight several open challenges for image anomaly detection. The merits and downsides of representative network architectures under varying supervision are discussed. Finally, we summarize the research findings and point out future research directions. More resources are available at https://github.com/M-3LAB/awesome-industrial-anomaly-detection.	翻訳日:2023-08-09 17:33:19 公開日:2023-08-08
# 実写フルアノテート顕微鏡画像データセット生成のための非定常拡散確率モデル Denoising Diffusion Probabilistic Models for Generation of Realistic Fully-Annotated Microscopy Image Data Sets ( http://arxiv.org/abs/2301.10227v2 ) ライセンス: Link先を確認	Dennis Eschweiler, Rüveyda Yilmaz, Matisse Baumann, Ina Laube, Rijo Roy, Abin Jose, Daniel Brückner, Johannes Stegmaier	(参考訳) 近年のコンピュータビジョンの進歩は、拡散確率モデルが特に効果的な方法であることが証明され、写実的画像データの生成に大きな進展をもたらした。本研究では,望まれる構造の粗いスケッチを出発点として,教師なしかつ直感的なアプローチにより,拡散モデルが完全注釈付顕微鏡画像データセットを効果的に生成できることを実証する。提案されたパイプラインは、ディープラーニングベースのセグメンテーションアプローチをトレーニングする際の手動アノテーションへの依存を軽減するとともに、人間のアノテーションを必要とせずに、多様なデータセットのセグメンテーションを可能にする。このアプローチは、データ生成プロセスの合理化と、様々な生物や細胞タイプを含む様々な実践実験の例で示すように、セグメンテーションモデルのより効率的でスケーラブルなトレーニングを可能にする、という大きな約束を持っている。 Recent advances in computer vision have led to significant progress in the generation of realistic image data, with denoising diffusion probabilistic models proving to be a particularly effective method. In this study, we demonstrate that diffusion models can effectively generate fully-annotated microscopy image data sets through an unsupervised and intuitive approach, using rough sketches of desired structures as the starting point. The proposed pipeline helps to reduce the reliance on manual annotations when training deep learning-based segmentation approaches and enables the segmentation of diverse datasets without the need for human annotations. This approach holds great promise in streamlining the data generation process and enabling a more efficient and scalable training of segmentation models, as we show in the example of different practical experiments involving various organisms and cell types.	翻訳日:2023-08-09 17:33:09 公開日:2023-08-08
# 深度画像から変形を推定するソフトマテリアルのコマニピュレーション Co-manipulation of soft-materials estimating deformation from depth images ( http://arxiv.org/abs/2301.05609v4 ) ライセンス: Link先を確認	Giorgio Nicola, Enrico Villagrossi, Nicola Pedrocchi	(参考訳) 布、複合材料、紙/ボール紙などの柔らかい材料を人ロボットで共同操作することは、いくつかの産業応用を提示する困難な作業である。コマニピュレーションされた材料の変形状態を推定することが主な課題である。既存の実用的な手法は、人間とロボットの相対距離を計算することで間接的な測度を提供する。本稿では,畳み込みニューラルネットワーク(CNN)を用いて,深度画像から素材の変形状態を推定するデータ駆動モデルを開発する。まず,素材の変形状態を,現在のロボットポーズと人間のつかみ位置との相対的なロト変換として定義する。モデルは、畳み込みニューラルネットワーク、特にImageNetで事前訓練されたDenseNet-121を介して現在の変形状態を推定する。現在の変形状態と所望の変形状態の間の差分はロボットコントローラに供給され、ロボットコントローラはツイストコマンドを出力する。本稿では,データセットの取得,事前処理,モデルのトレーニングのために開発された手法について述べる。このモデルは、カメラからの骨格トラッカーに基づく最先端の手法と比較される。結果から,本手法はより優れた性能を達成し,骨格トラッカーの使用に起因する種々の欠点を回避できることが示された。最後に,データセット取得に必要な時間を最小限に抑えるため,異なるアーキテクチャやデータセット次元によるモデル性能についても検討した。 Human-robot co-manipulation of soft materials, such as fabrics, composites, and sheets of paper/cardboard, is a challenging operation that presents several relevant industrial applications. Estimating the deformation state of the co-manipulated material is one of the main challenges. Viable methods provide the indirect measure by calculating the human-robot relative distance. In this paper, we develop a data-driven model to estimate the deformation state of the material from a depth image through a Convolutional Neural Network (CNN). First, we define the deformation state of the material as the relative roto-translation between the current robot pose and a human grasping position. The model estimates the current deformation state through a Convolutional Neural Network, specifically a DenseNet-121 pretrained on ImageNet. The delta between the current and the desired deformation state is fed to the robot controller that outputs twist commands. The paper describes the developed approach to acquire, preprocess the dataset and train the model. The model is compared with the current state-of-the-art method based on a skeletal tracker from cameras. 
Results show that our approach achieves better performance and avoids the various drawbacks caused by using a skeletal tracker. Finally, we also studied the model performance according to different architectures and dataset dimensions to minimize the time required for dataset acquisition.	翻訳日:2023-08-09 17:32:49 公開日:2023-08-08
# SPTS v2: シングルポイントシーンテキストスポッティング SPTS v2: Single-Point Scene Text Spotting ( http://arxiv.org/abs/2301.01635v3 ) ライセンス: Link先を確認	Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin	(参考訳) エンド・ツー・エンドのシーンテキストスポッティングは、本質的なテキスト検出と認識の相乗効果により大きな進歩を遂げている。従来の手法では、水平長方形、回転矩形、四角形、多角形などの手動アノテーションを前提条件としており、単点法よりもはるかに高価である。新しいフレームワークであるSPTS v2では、単一ポイントアノテーションを使用して高パフォーマンステキストスポッティングモデルをトレーニングできます。 SPTS v2は、同じ予測シーケンス内の全てのテキストインスタンスの中央点を逐次予測し、並行してテキスト認識を行う並列認識デコーダ(PRD)を用いて、インスタンス割り当てデコーダ(IAD)による自動回帰トランスの利点を予約する。これら2つのデコーダは同じパラメータを共有し、単純な情報伝達プロセスと対話的に接続され、勾配と情報を渡す。様々な既存のベンチマークデータセットに関する包括的な実験により、SPTS v2は、より少ないパラメータで以前の最先端のシングルポイントテキストスポッターを上回ることができ、19$\times$の推論速度を実現している。 SPTS v2フレームワークのコンテキスト内では、他の表現と比較した場合、シーンテキストスポッティングにおける単一点表現の潜在的嗜好が示唆される。このような試みは、既存のパラダイムの領域を超えたシーンテキストスポッティングアプリケーションにとって重要な機会を提供する。コードは https://github.com/Yuliang-Liu/SPTSv2 で入手できる。 End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using single-point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 reserves the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) through sequentially predicting the center points of all text instances inside the same predicting sequence, while with a Parallel Recognition Decoder (PRD) for text recognition in parallel. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. 
Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19$\times$ faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at https://github.com/Yuliang-Liu/SPTSv2.	翻訳日:2023-08-09 17:32:29 公開日:2023-08-08
# 最適化情報完全一般化測定によるADAPT-VQEの測定オーバーヘッドの軽減 Mitigating the measurement overhead of ADAPT-VQE with optimised informationally complete generalised measurements ( http://arxiv.org/abs/2212.09719v2 ) ライセンス: Link先を確認	Anton Nykänen, Matteo A. C. Rossi, Elsi-Mari Borrelli, Sabrina Maniscalco, Guillermo García-Pérez	(参考訳) ADAPT-VQE は分子シミュレーションのためのコンパクトな ansätze を構築するための頑健なアルゴリズムである。 UCCSDのような他の手法と比較して回路深度を著しく低減できるが、精度は高く、多くのハードウェア効率の良い ansätze の変動最適化を妨げるようなバレン高原に悩まされない。しかし、標準的な実装では、多くの交換子演算子の推定を通じた勾配評価という形でかなりの測定オーバーヘッドを導入する。本研究では, 最近導入された適応情報完全一般化測定(AIM)に基づくエネルギー評価手法を利用して, この測定オーバーヘッドを軽減する。エネルギー自体の効率的な測定方法を提供する以外に、情報完全(IC)測定データは、古典的に効率的な後処理のみを使用してADAPT-VQEの演算子プール内の演算子のすべての交換子を推定するために再利用することができる。本稿では,AIM-ADAPT-VQE方式の詳細を述べるとともに,いくつかのH4ハミルトニアンと演算子プールを用いてその性能について検討する。数値シミュレーションにより,ここで検討した系では,エネルギーを評価するために得られた測定データを再利用することで,追加の測定オーバーヘッドなしにADAPT-VQEを実装できることを示す。さらに, エネルギーを化学精度で測定すると, 生成回路のCNOTカウントが理想値に近いことを示す。測定データが少ない場合でも、AIM-ADAPT-VQEは高い確率で基底状態に収束するが、回路深さが増加する場合もある。 ADAPT-VQE stands out as a robust algorithm for constructing compact ansätze for molecular simulation. It significantly reduces the circuit depth with respect to other methods, such as UCCSD, while achieving higher accuracy and not suffering from so-called barren plateaus that hinder the variational optimisation of many hardware-efficient ansätze. In its standard implementation, however, it introduces a considerable measurement overhead in the form of gradient evaluations through estimations of many commutator operators. In this work, we mitigate this measurement overhead by exploiting a recently introduced method for energy evaluation relying on Adaptive Informationally complete generalised Measurements (AIM). Besides offering an efficient way to measure the energy itself, Informationally Complete (IC) measurement data can be reused to estimate all the commutators of the operators in the operator pool of ADAPT-VQE, using only classically efficient post-processing. 
We present the AIM-ADAPT-VQE scheme in detail, and investigate its performance with several H4 Hamiltonians and operator pools. Our numerical simulations indicate that the measurement data obtained to evaluate the energy can be reused to implement ADAPT-VQE with no additional measurement overhead for the systems considered here. In addition, we show that, if the energy is measured within chemical precision, the CNOT count in the resulting circuits is close to the ideal one. With scarce measurement data, AIM-ADAPT-VQE still converges to the ground state with high probability, albeit with an increased circuit depth in some cases.	翻訳日:2023-08-09 17:31:36 公開日:2023-08-08
# 測定デバイス非依存量子秘密共有の破断速度-距離制限 Breaking Rate-Distance Limitation of Measurement-Device-Independent Quantum Secret Sharing ( http://arxiv.org/abs/2212.06148v3 ) ライセンス: Link先を確認	Chen-Long Li, Yao Fu, Wen-Bo Liu, Yuan-Mei Xie, Bing-Hong Li, Min-Gang Zhou, Hua-Lei Yin, Zeng-Bing Chen	(参考訳) 現在、量子シークレット共有のほとんどの進歩はレート距離境界に苦しむため、キーレートは限られている。キーレートの制限に加えて、技術的困難とそれに伴うコストが相まって、大規模なデプロイメントを妨げている。さらに, 既存プロトコルの性能は, 参加者の攻撃を考慮せずに漸近的に解析される。本稿では,キーレートと伝送距離を改良した測定デバイス非依存の量子秘密共有プロトコルについて報告する。空間多重化に基づき,少なくとも10の通信相手のネットワーク上でのレート距離境界を破ることができることを示す。他のプロトコルと比較して、我々の研究は秘密鍵レートを2桁以上改善し、送信距離を長くしている。参加者攻撃を考慮した構成可能フレームワークにおけるプロトコルのセキュリティを解析し,有限サイズ領域でその性能評価を行った。さらに,既存のプロトコルと比較して,署名率が$10^7$倍以上向上したデジタル署名に対して,我々のプロトコルを適用することを検討する。我々は、量子ネットワーク上のマルチパーティアプリケーションに、我々の量子秘密共有プロトコルが確かな未来を提供することを期待している。 Currently, most advances in quantum secret sharing suffer from the rate-distance bound, and thus the key rates are limited. In addition to the limited key rate, the technical difficulty and the corresponding cost together prevent large-scale deployment. Furthermore, the performance of most existing protocols is analyzed in the asymptotic regime without considering participant attacks. Here we report a measurement-device-independent quantum secret sharing protocol with improved key rate and transmission distance. Based on spatial multiplexing, our protocol shows it can break rate-distance bounds over network under at least ten communication parties. Compared with other protocols, our work improves the secret key rate by more than two orders of magnitude and has a longer transmission distance. We analyze the security of our protocol in the composable framework considering participant attacks and evaluate its performance in the finite-size regime. In addition, we investigate applying our protocol to digital signatures where the signature rate is improved more than $10^7$ times compared with existing protocols. 
We anticipate that our quantum secret sharing protocol will provide a solid future for multiparty applications on the quantum network.	翻訳日:2023-08-09 17:31:08 公開日:2023-08-08
# コヒーレント励起輸送下におけるスピンリング用エネルギーランドスケープコントローラのロバスト性 Robustness of Energy Landscape Controllers for Spin Rings under Coherent Excitation Transport ( http://arxiv.org/abs/2303.00142v2 ) ライセンス: Link先を確認	Sean O'Neil, Frank Langbein, Edmond Jonckheere, and S Shermer	(参考訳) 量子スピンリングにおける励起輸送を調節するコントローラの設計と解析は、古典的なフィードバック制御技術を用いて効果的な制御を合成し、古典的な制御理論の期待に反する結果をもたらす。本稿では,システムおよび制御パラメータの不確実性に対する励振伝達の忠実性を最適化する制御器のロバスト性について検討する。我々は,追跡誤差の感度を古典的制御アナログとして,ロバスト性尺度として忠実性誤差の対数感度を用いる。本稿では,コヒーレントトランスポートに最適化された量子系が,正確な時間Tでの読み出しに最適化されているか,あるいはTのタイムウインドウで最適化されているかによって,誤差とログ感度の相関が著しく異なることを示した。 The design and analysis of controllers to regulate excitation transport in quantum spin rings presents challenges in the application of classical feedback control techniques to synthesize effective control, and generates results in contradiction to the expectations of classical control theory. In this paper, we examine the robustness of controllers designed to optimize the fidelity of an excitation transfer to uncertainty in system and control parameters. We use the logarithmic sensitivity of the fidelity error as the measure of robustness, drawing on the classical control analog of the sensitivity of the tracking error. In our analysis we demonstrate that quantum systems optimized for coherent transport demonstrate significantly different correlation between error and the log-sensitivity depending on whether the controller is optimized for readout at an exact time T or over a time-window about T.	翻訳日:2023-08-09 17:25:02 公開日:2023-08-08
# 最小識別性原理による量子力学 Quantum Mechanics From Principle of Least Distinguishability ( http://arxiv.org/abs/2302.14619v5 ) ライセンス: Link先を確認	Jianhao M. Yang	(参考訳) 非相対論的量子力学の定式化は最小識別可能性の原理から導出できることを示す。この原理は、2つの仮定を取り入れることで古典力学から最小作用原理の拡張と考えることができる。第一に、Planck定数は、観測可能となるために、物理オブジェクトがそのダイナミクス中に示す必要がある個別のアクションの量を定義する。これにより、古典的軌道との識別性の度合いを計算できる。第二に、古典軌道に沿って一定の真空揺らぎがある。真空揺らぎによる新たな識別可能性を測定するために,情報メトリクスを定義する新しい手法を提案する。変分原理を適用して、識別可能性の合計度を最小にすることで、不確実性関係やシュレーディンガー方程式を含む基本量子定式化を位置および運動量表現の両方で取り戻すことができる。さらに、この原則は2つの面で新しい結果をもたらす。概念レベルでは、真空揺らぎに関する情報指標は、基礎となる物理的相互作用を伴わずに絡み合い効果を示すものであり、絡み合い効果が非因果関係であることを示唆している。数学のレベルでは、相対エントロピーのより一般的な定義を用いて真空揺らぎの情報量を定義することは、相対エントロピーの順序に依存する一般化されたシュレーディンガー方程式をもたらす。最小識別可能性の原理は、新しい数学的ツールであり、他の高度な量子定式化を得られることを期待する。 We show that the formulations of non-relativistic quantum mechanics can be derived from the principle of least distinguishability. The principle can be considered as an extension of the least action principle from classical mechanics by factoring in two assumptions. First, the Planck constant defines the discrete amount of action a physical object needs to exhibit during its dynamics in order to be observable. This enables us to calculate the degree of distinguishability from a classical trajectory. Second, there is constant vacuum fluctuation along a classical trajectory. A novel method is introduced to define the information metrics to measure additional distinguishability due to vacuum fluctuations. Applying the variation principle to minimize the total degree of distinguishability allows us to recover the basic quantum formulations including the uncertainty relation and the Schrödinger equation in both position and momentum representations. Furthermore, the principle brings in new results on two fronts. At the conceptual level, we find that the information metrics for vacuum fluctuations are responsible for manifesting entanglement effects without underlying physical interactions, implying that entanglement effects are non-causal. 
At the mathematical level, defining the information metrics for vacuum fluctuations using more general definitions of relative entropy results in a generalized Schrödinger equation that depends on the order of relative entropy. The least distinguishability principle is a new mathematical tool, and we expect other advanced quantum formulations can be obtained from it.	翻訳日:2023-08-09 17:24:45 公開日:2023-08-08
# グラフ畳み込みネットワークに対する意味的バックドア攻撃 A semantic backdoor attack against Graph Convolutional Networks ( http://arxiv.org/abs/2302.14353v3 ) ライセンス: Link先を確認	Jiazhu Dai, Zhipeng Xiong	(参考訳) グラフ畳み込みネットワーク(GCN)は、ノード分類やグラフ分類など、様々なグラフ構造化タスクの問題に対処するのに非常に効果的である。しかし、最近の研究では、GCNはバックドア攻撃と呼ばれる新しい種類の脅威に弱いことが示されており、敵は隠れバックドアをGCNに注入することで、攻撃されたモデルが良質なサンプルに対して良好に動作するようにしているが、攻撃者が定義したトリガーによって隠れバックドアがアクティベートされた場合、その予測は攻撃者が指定したターゲットラベルに変更される。本稿では,このようなセマンティックバックドア攻撃がGCNに対して可能かどうかを考察し,GCNにおけるセキュリティ脆弱性の存在を明らかにするために,グラフ分類の文脈下でGCNに対するセマンティックバックドア攻撃(SBAG)を提案する。 SBAGはサンプルの特定の種類のノードをバックドアトリガーとして使用し、トレーニングデータを汚染することでGCNモデルに隠れたバックドアを注入する。バックドアがアクティベートされ、GCNモデルは、サンプルが十分なトリガーノードを含む限り、修正されていないサンプルでも攻撃者が指定した悪意のある分類結果を与える。 4つのグラフデータセット上でSBAGを評価する。実験の結果,スバッグは2種類の攻撃試料に対して約99.9%,82%以上の攻撃成功率を達成でき,中毒率は5%以下であった。 Graph convolutional networks (GCNs) have been very effective in addressing the issue of various graph-structured related tasks, such as node classification and graph classification. However, recent research has shown that GCNs are vulnerable to a new type of threat called a backdoor attack, where the adversary can inject a hidden backdoor into GCNs so that the attacked model performs well on benign samples, but its prediction will be maliciously changed to the attacker-specified target label if the hidden backdoor is activated by the attacker-defined trigger. In this paper, we investigate whether such semantic backdoor attacks are possible for GCNs and propose a semantic backdoor attack against GCNs (SBAG) under the context of graph classification to reveal the existence of this security vulnerability in GCNs. SBAG uses a certain type of node in the samples as a backdoor trigger and injects a hidden backdoor into GCN models by poisoning training data. The backdoor will be activated, and the GCN models will give malicious classification results specified by the attacker even on unmodified samples as long as the samples contain enough trigger nodes. We evaluate SBAG on four graph datasets. 
The experimental results indicate that SBAG can achieve attack success rates of approximately 99.9% and over 82% for two kinds of attack samples, respectively, with poisoning rates of less than 5%.	翻訳日:2023-08-09 17:24:21 公開日:2023-08-08
# 何が新しいの? 物語における新しい出来事の展開を特定する Whats New? Identifying the Unfolding of New Events in Narratives ( http://arxiv.org/abs/2302.07748v4 ) ライセンス: Link先を確認	Seyed Mahed Mousavi, Shohei Tanaka, Gabriel Roccabruna, Koichiro Yoshino, Satoshi Nakamura, Giuseppe Riccardi	(参考訳) ナラティブには、時間とコンテキストにまたがる豊富なイベントソースが含まれている。これらの出来事の自動理解は、さらなる計算(推論など)のために物語を要約した理解を提供する。本稿では,イベントの情報状況(IS)を調査し,物語における新たなイベントの自動識別という,新たな課題を提案する。イベントは主題、述語、オブジェクトの三重項として定義します。イベントは、談話の文脈と、コモンセンス推論によって推測できるかどうかに関して、新しく分類される。我々は,人間の注釈を用いて,新しい出来事を文レベルで表現した物語の公開コーパスを注釈した。本稿ではアノテーションプロトコルを提案し,アノテーションの品質とタスクの難易度について検討する。ナラティブ理解のための新しいイベント抽出タスクのために,アノテーション付きデータセット,アノテーション資料,機械学習ベースラインモデルを公開する。 Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.	翻訳日:2023-08-09 17:23:42 公開日:2023-08-08
# アラビア語のエンティティ認識に関する調査:過去・最近の進歩・将来の動向 A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends ( http://arxiv.org/abs/2302.03512v3 ) ライセンス: Link先を確認	Xiaoye Qu, Yingjie Gu, Qingrong Xia, Zechang Li, Zhefeng Wang, Baoxing Huai	(参考訳) アラビア語のテキストがインターネット上に出現するにつれ、これらのアラビア語のテキストから重要な情報を抽出することは特に有用である。基本的な技術として、名前付きエンティティ認識(NER)は情報抽出技術のコアコンポーネントとして機能し、質問応答や知識グラフ構築など多くの自然言語処理(NLP)システムにおいて重要な役割を果たす。本稿では,アラビア語nerの開発,特にディープラーニングと事前学習型言語モデルにおける最近の進歩について概観する。具体的には、アラビア語 NER の背景として、アラビア語 NER の特徴や、アラビア語 NER の既存の資源について紹介する。そこで我々はアラビアNER法の開発を体系的にレビューした。伝統的なアラビア語のNERシステムは機能工学とドメイン固有のルールの設計に重点を置いている。近年,テキストを連続ベクトル表現で表現することで,深層学習が大きな進歩を遂げている。事前訓練された言語モデルの成長に伴い、アラビア語のNERはより良いパフォーマンスを得る。最後に,他の言語からのアラビアNER法とNER法のギャップを解消し,アラビアNERの今後の方向性を概説する。 As more and more Arabic texts emerged on the Internet, extracting important information from these Arabic texts is especially useful. As a fundamental technology, Named entity recognition (NER) serves as the core component in information extraction technology, while also playing a critical role in many other Natural Language Processing (NLP) systems, such as question answering and knowledge graph building. In this paper, we provide a comprehensive review of the development of Arabic NER, especially the recent advances in deep learning and pre-trained language model. Specifically, we first introduce the background of Arabic NER, including the characteristics of Arabic and existing resources for Arabic NER. Then, we systematically review the development of Arabic NER methods. Traditional Arabic NER systems focus on feature engineering and designing domain-specific rules. In recent years, deep learning methods achieve significant progress by representing texts via continuous vector representations. With the growth of pre-trained language model, Arabic NER yields better performance. 
Finally, we summarize the methodological gap between Arabic NER and NER methods for other languages, which helps outline future directions for Arabic NER.	翻訳日:2023-08-09 17:23:30 公開日:2023-08-08
# MonoFlow: Wassersteinグラディエントフローの観点からの多様性GANの再考 MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows ( http://arxiv.org/abs/2302.01075v5 ) ライセンス: Link先を確認	Mingxuan Yi, Zhanxing Zhu, Song Liu	(参考訳) GAN(Generative Adversarial Network)における対人訓練の従来の理解は、判別器が分散を推定するために訓練され、生成器はこの分散を最小化する。 GANの多くの変種がこのパラダイムに従って開発されたという事実にもかかわらず、GANとその実践的アルゴリズムの現在の理論的理解は矛盾している。本稿では,サンプル空間における粒子の進化を特徴づけるwasserstein勾配流を利用して,ganの理論的洞察とアルゴリズム的インスピレーションを得る。粒子の進化は単調に増大する対数密度比のマッピングによって再スケールされる。本手法では, 識別器の訓練によりモノフローのベクトル場を得る手順として, 相手のベクトル場によって定義される粒子流を描画することを学ぶ。また,変動発散最小化と逆行訓練の基本的な違いを明らかにする。この解析は,ganの学習にどのような種類のジェネレータ損失関数が寄与するかを明らかにするのに役立ち,モノフローを実現する限り,ganは文献以上の損失設計(例えば,不飽和損失)を持つ可能性があることを示唆する。本フレームワークの有効性を検証するため, 一貫性のある実証研究を含む。 The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration of GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure first obtaining MonoFlow's vector field via training the discriminator and the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. 
This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggests that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.	翻訳日:2023-08-09 17:22:43 公開日:2023-08-08
# MS-DETR:低結合核融合型マルチスペクトル歩行者検出変換器とモードベース最適化 MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization ( http://arxiv.org/abs/2302.00290v2 ) ライセンス: Link先を確認	Yinghui Xing, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang	(参考訳) 可視・熱変調は特に低照度条件下で相補的な情報を提供することができるため、多スペクトル歩行者検出は、多くの時空応用にとって重要な課題である。利用可能なマルチスペクトル歩行者検出装置のほとんどが非エンド・ツー・エンド検出器に基づいているが,本稿ではマルチスペクトル歩行者検出用トランスフォーマ(ms-detr)を提案し,detrをマルチモーダル検出の分野に拡張する。 ms-detrは2つのモダリティ固有のバックボーンとトランスエンコーダで構成され、続いてマルチモーダルトランスフォーマデコーダがあり、可視性と熱的特徴はマルチモーダルトランスフォーマデコーダで融合される。マルチモーダル画像間の不一致によく抵抗するため,マルチモーダル特徴のキーポイントを個別に抽出し,適応的に学習した注意重みでそれらを融合することにより,疎結合な融合戦略を設計する。さらに、異なるモダリティだけでなく、異なる歩行者インスタンスが最終検出のために異なる信頼度スコアを持つ傾向があるという知見に基づいて、可視およびサーマルデコーダの分岐を保存し、インスタンス毎の動的損失を通じて予測スロットを整列するインスタンス対応モダリティバランス最適化戦略を提案する。我々のエンドツーエンドMS-DETRは、挑戦的なKAIST、CVC-14、LLVIPベンチマークデータセットよりも優れた性能を示している。ソースコードはhttps://github.com/YinghuiXing/MS-DETR で公開されている。 Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Most of the available multispectral pedestrian detectors are based on non-end-to-end detectors, while in this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector, which extends DETR into the field of multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. 
Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR .	翻訳日:2023-08-09 17:22:19 公開日:2023-08-08
# 重量予測はAdamWの収束を高める Weight Prediction Boosts the Convergence of AdamW ( http://arxiv.org/abs/2302.00195v2 ) ライセンス: Link先を確認	Lei Guan	(参考訳) 本稿では、ディープニューラルネットワーク(DNN)モデルをトレーニングする際の収束を高めるために、AdamWオプティマイザに重み予測を導入する。特に、各ミニバッチトレーニングの前に、AdamWの更新ルールに従って将来の重量を予測し、予測された将来の重量を前方通過と後方伝播の両方に応用する。このように、AdamWオプティマイザは、常に現在の重みではなく将来の重みの勾配を利用してDNNパラメータを更新し、AdamWオプティマイザはより良い収束を達成する。提案手法は単純で実装が容易だが, DNN トレーニングの収束性向上に有効である。提案手法の有効性を検証するため,画像分類と言語モデリングタスクについて広範な実験を行った。実験の結果,提案手法はDNNモデルのトレーニングにおいて,AdamWの収束を向上し,AdamWよりも精度がよいことがわかった。 In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, we predict the future weights according to the update rule of AdamW and then apply the predicted future weights to do both forward pass and backward propagation. In this way, the AdamW optimizer always utilizes the gradients w.r.t. the future weights instead of current weights to update the DNN parameters, making the AdamW optimizer achieve better convergence. Our proposal is simple and straightforward to implement but effective in boosting the convergence of DNN training. We performed extensive experimental evaluations on image classification and language modeling tasks to verify the effectiveness of our proposal. The experimental results validate that our proposal can boost the convergence of AdamW and achieve better accuracy than AdamW when training the DNN models.	翻訳日:2023-08-09 17:21:49 公開日:2023-08-08
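The training loop described in this abstract can be sketched on a toy problem. The following is a minimal illustration, not the author's code. It assumes a single scalar weight, a quadratic loss (w - 3)^2, and a one-step lookahead that reuses the cached AdamW moment estimates to guess the next weight. The function names (`adamw_step`, `predict_weight`, `train`) and the hyperparameter values are hypothetical, and the exact prediction formula in the paper may differ.

```python
import math


def adamw_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    """One AdamW update for a scalar weight; returns the new (w, m, v, t)."""
    t += 1
    m = b1 * m + (1.0 - b1) * g          # first-moment estimate
    v = b2 * v + (1.0 - b2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - b1 ** t)          # bias correction
    v_hat = v / (1.0 - b2 ** t)
    # Adam direction plus decoupled weight decay.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v, t


def predict_weight(w, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    """Guess the next weight from cached moments, before a new gradient exists."""
    if t == 0:
        return w  # no optimizer history yet, so nothing to extrapolate from
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    return w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)


def train(grad_fn, w0, steps, use_prediction):
    """Run AdamW, optionally taking gradients at the predicted weight."""
    w, m, v, t = w0, 0.0, 0.0, 0
    for _ in range(steps):
        w_eval = predict_weight(w, m, v, t) if use_prediction else w
        g = grad_fn(w_eval)                      # forward/backward at w_eval
        w, m, v, t = adamw_step(w, g, m, v, t)   # update the real weight
    return w


def quadratic_grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


if __name__ == "__main__":
    print(train(quadratic_grad, 0.0, 200, use_prediction=False))
    print(train(quadratic_grad, 0.0, 200, use_prediction=True))
```

The only difference between the two runs is where the gradient is evaluated. The optimizer state and the update rule are unchanged, which is why the abstract describes the idea as straightforward to implement. This toy run says nothing about the accuracy gains reported for real DNN training.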
# 重力誘起低温原子の絡み合い Gravitationally-induced entanglement in cold atoms ( http://arxiv.org/abs/2304.00734v2 ) ライセンス: Link先を確認	Richard Howl, Nathan Cooper, Lucia Hackermüller	(参考訳) 実験室で量子重力をテストするための有望なルートは、2つ以上の量子物質間の重力誘起絡み合い(GIE)を探すことである。このような試験の提案は、主にN00N状態や高スクイーズ状態のような非古典状態のマイクロソリッドシステムを用いている。ここでは、初めて、2つの原子間ガス干渉計間のGIEを量子重力のテストとして考える。本稿では、2つの干渉計を並列に配置し、GIEと量子重力の証拠として出力ポートにおける原子数の相関関係を求める。 GIEは、N00NやSchrödinger cat状態のようなマクロな重ね合わせ状態に挑戦することなく可能であり、代わりに原子の古典的な「コヒーレント」状態が存在する。これにより、原子干渉計の総質量はプランク質量スケールと長い積分時間でなければならない。しかし、現在最先端の量子スクイージングがコールド原子で行われていることから、質量スケールは接近可能なレベルまで減少し、近い将来にそのような質量スケールが達成できるかを詳細に議論する。 A promising route to testing quantum gravity in the laboratory is to look for gravitationally-induced entanglement (GIE) between two or more quantum matter systems. Proposals for such tests have principally used microsolid systems, with highly non-classical states, such as N00N states or highly-squeezed states. Here, we consider, for the first time, GIE between two atomic gas interferometers as a test of quantum gravity. We propose placing the two interferometers next to each other in parallel and looking for correlations in the number of atoms at the output ports as evidence of GIE and quantum gravity. GIE is possible without challenging macroscopic superposition states, such as N00N or Schrödinger cat states, and instead there can be just classical-like 'coherent' states of atoms. This requires the total mass of the atom interferometers to be on the Planck mass scale, and long integration times. However, with current state-of-the-art quantum squeezing in cold atoms, we argue that the mass scale can be reduced to approachable levels and detail how such a mass scale can be achieved in the near future.	翻訳日:2023-08-09 17:14:06 公開日:2023-08-08
# PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-performance Cloud Removal from Multi-temporal Satellite Imagery ( http://arxiv.org/abs/2303.16565v2 ) License: see linked page	Xuechao Zou, Kai Li, Junliang Xing, Pin Tao, Yachao Cui	Satellite imagery analysis plays a pivotal role in remote sensing; however, information loss due to cloud cover significantly impedes its application. Although existing deep cloud removal models have achieved notable outcomes, they scarcely consider contextual information. This study introduces a high-performance cloud removal architecture, termed Progressive Multi-scale Attention Autoencoder (PMAA), which concurrently harnesses global and local information to construct robust contextual dependencies using a novel Multi-scale Attention Module (MAM) and a novel Local Interaction Module (LIM). PMAA establishes long-range dependencies of multi-scale features using MAM and modulates the reconstruction of fine-grained details utilizing LIM, enabling simultaneous representation of fine- and coarse-grained features at the same level. With the help of diverse and multi-scale features, PMAA consistently outperforms the previous state-of-the-art model CTGAN on two benchmark datasets. 
Moreover, PMAA boasts considerable efficiency advantages, with only 0.5% and 14.6% of the parameters and computational complexity of CTGAN, respectively. These comprehensive results underscore PMAA's potential as a lightweight cloud removal network suitable for deployment on edge devices to accomplish large-scale cloud removal tasks. Our source code and pre-trained models are available at https://github.com/XavierJiezou/PMAA.	Translated: 2023-08-09 17:13:28 Published: 2023-08-08
# GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization ( http://arxiv.org/abs/2303.16459v2 ) License: see linked page	Stefan Abi-Karam, Cong Hao	Many graph neural network (GNN) accelerators have been proposed. However, they rely heavily on users' hardware expertise and are usually optimized for one specific GNN model, making them challenging for practical use. Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework. 
It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes the standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3) GNNBuilder supports end-to-end code generation, simulation, accelerator optimization, and hardware deployment, realizing a push-button fashion for GNN accelerator design; (4) GNNBuilder is equipped with accurate performance models of its generated accelerator, enabling fast and flexible design space exploration (DSE). In the experiments, first, we show that our accelerator performance model has errors within 36% for latency prediction and 18% for BRAM count prediction. Second, we show that our generated accelerators can outperform CPU by 6.33× and GPU by 6.87×. This framework is open-source, and the code is available at https://github.com/sharc-lab/gnn-builder.	Translated: 2023-08-09 17:13:02 Published: 2023-08-08
# Low-rank combinatorial optimization and statistical learning by spatial photonic Ising machine ( http://arxiv.org/abs/2303.14993v2 ) License: see linked page	Hiroshi Yamashita, Ken-ichi Okubo, Suguru Shimomura, Yusuke Ogura, Jun Tanida, Hideyuki Suzuki	The spatial photonic Ising machine (SPIM) [D. Pierangeli et al., Phys. Rev. Lett. 122, 213902 (2019)] is a promising optical architecture utilizing spatial light modulation for solving large-scale combinatorial optimization problems efficiently. The primitive version of the SPIM, however, can accommodate Ising problems with only rank-one interaction matrices. In this Letter, we propose a new computing model for the SPIM that can accommodate any Ising problem without changing its optical implementation. The proposed model is particularly efficient for Ising problems with low-rank interaction matrices, such as knapsack problems. Moreover, it acquires the learning ability of Boltzmann machines. We demonstrate that learning, classification, and sampling of the MNIST handwritten digit images are achieved efficiently using the model with low-rank interactions. Thus, the proposed model exhibits higher practical applicability to various problems of combinatorial optimization and statistical learning, without losing the scalability inherent in the SPIM architecture.	Translated: 2023-08-09 17:12:42 Published: 2023-08-08
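Why low-rank interaction matrices are convenient can be shown with a small worked example. The sketch below is not the paper's computing model. It only checks the algebraic identity that, for an interaction matrix J = sum_k lambda_k * xi_k * xi_k^T, the Ising energy H(s) = -1/2 * sum_ij J_ij s_i s_j equals -1/2 * sum_k lambda_k * (xi_k . s)^2. The sign, the factor 1/2, the inclusion of diagonal terms, and all function names are conventions assumed here.

```python
def low_rank_matrix(lams, xis):
    """Build the dense matrix J = sum_k lams[k] * outer(xis[k], xis[k])."""
    n = len(xis[0])
    return [[sum(lam * xi[i] * xi[j] for lam, xi in zip(lams, xis))
             for j in range(n)] for i in range(n)]


def ising_energy_dense(J, s):
    """Energy from the full matrix: O(N^2) multiplications."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))


def ising_energy_low_rank(lams, xis, s):
    """Same energy from K overlaps xi_k . s: O(N * K) multiplications."""
    total = 0.0
    for lam, xi in zip(lams, xis):
        overlap = sum(x * si for x, si in zip(xi, s))
        total += lam * overlap ** 2
    return -0.5 * total
```

With rank K much smaller than the number of spins N, the second form needs only K overlaps per spin configuration, which is the kind of saving that makes a sum of rank-one terms attractive.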
# Classification and emergence of quantum spin liquids in chiral Rydberg models ( http://arxiv.org/abs/2303.12829v2 ) License: see linked page	Poetri Sonya Tarabunga, Giuliano Giudici, Titas Chanda, Marcello Dalmonte	We investigate the nature of quantum phases arising in chiral interacting Hamiltonians recently realized in Rydberg atom arrays. We classify all possible fermionic chiral spin liquids with $\mathrm{U}(1)$ global symmetry using parton construction on the honeycomb lattice. The resulting classification includes six distinct classes of gapped quantum spin liquids: the corresponding variational wave functions obtained from two of these classes accurately describe the Rydberg many-body ground state at $1/2$ and $1/4$ particle density. Complementing this analysis with tensor network simulations, we conclude that both particle filling sectors host a spin liquid with the same topological order of a $\nu=1/2$ fractional quantum Hall effect. At density $1/2$, our results clarify the phase diagram of the model, while at density $1/4$, they provide an explicit construction of the ground state wave function with almost unit overlap with the microscopic one. These findings pave the way to the use of parton wave functions to guide the discovery of quantum spin liquids in chiral Rydberg models.	Translated: 2023-08-09 17:12:22 Published: 2023-08-08
# Hybrid Spectral Denoising Transformer with Guided Attention ( http://arxiv.org/abs/2303.09040v2 ) License: see linked page	Zeqiang Lai, Chenggang Yan, Ying Fu	In this paper, we present a Hybrid Spectral Denoising Transformer (HSDT) for hyperspectral image (HSI) denoising. Challenges in adapting transformers for HSI arise from the need to tackle existing limitations of CNN-based methods in capturing the global and local spatial-spectral correlations while maintaining efficiency and flexibility. To address these issues, we introduce a hybrid approach that combines the advantages of both models with a Spatial-Spectral Separable Convolution (S3Conv), Guided Spectral Self-Attention (GSSA), and Self-Modulated Feed-Forward Network (SM-FFN). Our S3Conv works as a lightweight alternative to 3D convolution, which extracts more spatial-spectral correlated features while keeping the flexibility to tackle HSIs with an arbitrary number of bands. These features are then adaptively processed by GSSA, which performs 3D self-attention across the spectral bands, guided by a set of learnable queries that encode the spectral signatures. This not only enriches our model with powerful capabilities for identifying global spectral correlations but also maintains linear complexity. 
Moreover, our SM-FFN proposes the self-modulation that intensifies the activations of more informative regions, which further strengthens the aggregated features. Extensive experiments are conducted on various datasets under both simulated and real-world noise, and they show that our HSDT significantly outperforms the existing state-of-the-art methods while maintaining low computational overhead. Code is at https://github.com/Zeqiang-Lai/HSDT.	Translated: 2023-08-09 17:12:04 Published: 2023-08-08
# SemARFlow: Injecting Semantics into Unsupervised Optical Flow Estimation for Autonomous Driving ( http://arxiv.org/abs/2303.06209v2 ) License: see linked page	Shuai Yuan, Shuzhi Yu, Hannah Kim and Carlo Tomasi	Unsupervised optical flow estimation is especially hard near occlusions and motion boundaries and in low-texture regions. We show that additional information such as semantics and domain knowledge can help better constrain this problem. We introduce SemARFlow, an unsupervised optical flow network designed for autonomous driving data that takes estimated semantic segmentation masks as additional inputs. This additional information is injected into the encoder and into a learned upsampler that refines the flow output. In addition, a simple yet effective semantic augmentation module provides self-supervision when learning flow and its boundaries for vehicles, poles, and sky. Together, these injections of semantic information improve the KITTI-2015 optical flow test error rate from 11.80% to 8.38%. We also show visible improvements around object boundaries as well as a greater ability to generalize across datasets. Code is available at https://github.com/duke-vision/semantic-unsup-flow-release.	Translated: 2023-08-09 17:11:36 Published: 2023-08-08
# Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals ( http://arxiv.org/abs/2303.00500v2 ) License: see linked page	Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner	Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. 
We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.	Translated: 2023-08-09 17:11:23 Published: 2023-08-08
# Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction ( http://arxiv.org/abs/2303.00286v3 ) License: see linked page	Nicolas Hubert, Pierre Monnin, Armelle Brun, Davy Monticolo	Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions that are computed considering a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that all negative triples should not be valued equally. In line with this recent assumption, we posit that negative triples that are semantically valid w.r.t. domain and range constraints might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative ones. To this aim, we propose semantic-driven versions for the three main loss functions for link prediction. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned with different schemas, which demonstrates both the generality and superiority of our proposed approach. 
In fact, the proposed loss functions (1) lead to better MRR and Hits@10 values and (2) drive KGEMs towards better semantic awareness, as measured by the Sem@K metric. This highlights that semantic information globally improves KGEMs, and thus should be incorporated into loss functions. Because domains and ranges of relations are largely available in schema-defined KGs, our approach is both beneficial and widely usable in practice.	Translated: 2023-08-09 17:11:02 Published: 2023-08-08
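To make "treating different negatives differently" concrete, here is a hedged sketch of one way a pairwise hinge loss could use domain and range constraints. The abstract does not give the paper's formulas, so everything here is an assumption for illustration: the helper names, the convention that a higher score means a more plausible triple, and the choice to shrink the margin by a factor `epsilon` for semantically valid negatives.

```python
def is_semantically_valid(triple, entity_types, rel_domain, rel_range):
    """True if the head fits the relation's domain and the tail fits its range."""
    head, relation, tail = triple
    return (rel_domain[relation] in entity_types[head]
            and rel_range[relation] in entity_types[tail])


def semantic_hinge_loss(pos_score, neg_score, neg_is_valid,
                        margin=1.0, epsilon=0.25):
    """Pairwise hinge loss with a smaller margin for semantically valid negatives.

    Hypothetical design: a valid negative is "almost plausible", so it only
    needs to score `margin * epsilon` below the positive instead of `margin`.
    """
    effective_margin = margin * epsilon if neg_is_valid else margin
    return max(0.0, effective_margin + neg_score - pos_score)
```

For example, with toy types `{"paris": {"City"}, "france": {"Country"}, "einstein": {"Person"}}` and a relation `capital_of` whose domain is `City` and range is `Country`, the corrupted triple `("einstein", "capital_of", "france")` is semantically invalid and would be pushed away with the full margin.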
# Long-Tailed Recognition by Mutual Information Maximization between Latent Features and Ground-Truth Labels ( http://arxiv.org/abs/2305.01160v3 ) License: see linked page	Min-Kook Suh and Seung-Woo Seo	Although contrastive learning methods have shown prevailing performance on a variety of representation learning tasks, they encounter difficulty when the training dataset is long-tailed. Many researchers have combined contrastive learning and a logit adjustment technique to address this problem, but the combinations are done ad hoc and a theoretical background has not yet been provided. The goal of this paper is to provide the background and further improve the performance. First, we show that the fundamental reason contrastive learning methods struggle with long-tailed tasks is that they try to maximize the mutual information between latent features and input data. As ground-truth labels are not considered in the maximization, they are not able to address imbalances between class labels. Rather, we interpret the long-tailed recognition task as a mutual information maximization between latent features and ground-truth labels. This approach integrates contrastive learning and logit adjustment seamlessly to derive a loss function that shows state-of-the-art performance on long-tailed recognition benchmarks. It also demonstrates its efficacy in image segmentation tasks, verifying its versatility beyond image classification.	Translated: 2023-08-09 17:04:13 Published: 2023-08-08
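The abstract mentions logit adjustment without defining it. As background only, the sketch below shows the generic logit-adjusted softmax cross-entropy as it is commonly described, where `tau * log(prior)` is added to each class logit before the softmax. It is not the loss derived in the paper, and the function name and the `tau` parameter are assumptions of this sketch.

```python
import math


def logit_adjusted_cross_entropy(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    Rare classes have a very negative log prior, so the model must produce a
    larger raw logit for them to reach the same loss, which counteracts the
    bias towards frequent classes.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(adjusted)
    log_norm = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return log_norm - adjusted[label]
```

With equal raw logits and priors of 0.9 and 0.1, the loss for the rare class is -log(0.1), about 2.30, against about 0.105 for the frequent class; setting `tau=0` recovers plain cross-entropy.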
# Cavity-induced charge transfer in periodic systems: length-gauge formalism ( http://arxiv.org/abs/2304.11364v2 ) License: see linked page	Ekaterina Vlasiuk, Valerii K. Kozin, Jelena Klinovaja, Daniel Loss, Ivan V. Iorsh, Ilya V. Tokatly	We develop a length-gauge formalism for treating one-dimensional periodic lattice systems in the presence of a photon cavity inducing light-matter interaction. The purpose of the formalism is to remove mathematical ambiguities that occur when defining the position operator in the context of the Power-Zienau-Woolley Hamiltonian. We then use a diagrammatic approach to analyze perturbatively the interaction between an electronic quantum system and a photonic cavity mode of long wavelength. We illustrate the versatility of the formalism by studying the cavity-induced electric charge imbalance and polarization in the Rice-Mele model with broken inversion symmetry.	Translated: 2023-08-09 17:03:45 Published: 2023-08-08
# From observer-dependent facts to frame-dependent measurement records in Wigner friend scenarios ( http://arxiv.org/abs/2304.09289v2 ) License: see linked page	J. Allam and A. Matzkin	The description of Wigner-friend scenarios, in which external agents describe a closed laboratory containing a friend making a measurement, remains problematic due to the ambiguous nature of quantum measurements. One option is to endorse assumptions leading to observer-dependent facts, given that the friend's measurement outcome is not defined from the point of view of the external observers. We introduce in this work a model in a relativistic context showing that these assumptions can also lead to measurement records that depend on the inertial reference frame in which the agents make their observations. Our model is based on an entangled pair shared by the friend and a distant agent performing space-like separated measurements. An external observer at rest relative to the closed laboratory and observers in a moving frame do not agree on the observed records, which are not Lorentz transforms of one another.	Translated: 2023-08-09 17:03:33 Published: 2023-08-08
# A quantum advantage over classical for local max cut ( http://arxiv.org/abs/2304.08420v3 ) License: see linked page	Charlie Carlson, Zackary Jorquera, Alexandra Kolla, Steven Kordonowy	We compare the performance of a quantum local algorithm to a similar classical counterpart on a well-established combinatorial optimization problem, LocalMaxCut. We show that a popular quantum algorithm first discovered by Farhi, Goldstone, and Gutmann [1], called the quantum approximate optimization algorithm (QAOA), has a computational advantage over comparable local classical techniques on degree-3 graphs. These results hint that even small-scale quantum computation, which is relevant to the current state-of-the-art quantum hardware, could have significant advantages over comparably simple classical computation.	Translated: 2023-08-09 17:03:18 Published: 2023-08-08
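For readers unfamiliar with the problem, the sketch below illustrates the objective under a common definition, assumed here because the abstract does not spell it out: a cut is locally maximal when moving any single vertex to the other side does not increase the number of cut edges. The code is a plain sequential local search meant only to show that objective. It is not the quantum algorithm or the local classical algorithms compared in the paper, and the function names are hypothetical.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def build_adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def is_locally_maximal(n, edges, side):
    """True if no single-vertex flip would increase the cut."""
    adj = build_adjacency(n, edges)
    for u in range(n):
        same = sum(1 for v in adj[u] if side[v] == side[u])
        if 2 * same > len(adj[u]):
            return False
    return True


def local_max_cut(n, edges, side=None):
    """Flip vertices one at a time while a flip strictly increases the cut."""
    side = list(side) if side is not None else [0] * n
    adj = build_adjacency(n, edges)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            same = sum(1 for v in adj[u] if side[v] == side[u])
            # Flipping u cuts its `same` uncut edges and uncuts the rest.
            if 2 * same > len(adj[u]):
                side[u] ^= 1
                improved = True
    return side
```

Each flip strictly increases the cut size, which is bounded by the number of edges, so the loop terminates. On a degree-3 graph, the result is a cut in which every vertex has at least two of its three edges cut.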
# Implementing Responsible AI: Tensions and Trade-Offs Between Ethics Aspects ( http://arxiv.org/abs/2304.08275v3 ) License: see linked page	Conrad Sanderson, David Douglas, Qinghua Lu	Many sets of ethics principles for responsible AI have been proposed to allay concerns about misuse and abuse of AI/ML systems. The underlying aspects of such sets of principles include privacy, accuracy, fairness, robustness, explainability, and transparency. However, there are potential tensions between these aspects that pose difficulties for AI/ML developers seeking to follow these principles. For example, increasing the accuracy of an AI/ML system may reduce its explainability. As part of the ongoing effort to operationalise the principles into practice, in this work we compile and discuss a catalogue of 10 notable tensions, trade-offs and other interactions between the underlying aspects. We primarily focus on two-sided interactions, drawing on support spread across a diverse literature. This catalogue can be helpful in raising awareness of the possible interactions between aspects of ethics principles, as well as facilitating well-supported judgements by the designers and developers of AI/ML systems.	Translated: 2023-08-09 17:03:07 Published: 2023-08-08
# Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach ( http://arxiv.org/abs/2304.08134v3 ) License: see linked page	Martin Knoche and Gerhard Rigoll	Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can't correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets' challenging settings. Then, we conduct a study with 60 participants on these selected tasks with humans and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub.	Translated: 2023-08-09 17:02:50 Published: 2023-08-08
# GaitRef: Gait Recognition with Refined Sequential Skeletons ( http://arxiv.org/abs/2304.07916v3 ) License: see linked page	Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia	Identifying humans with their walking sequences, known as gait recognition, is a useful biometric understanding task as it can be observed from a long distance and does not require cooperation from the subject. Two common modalities used for representing the walking sequence of a person are silhouettes and joint skeletons. Silhouette sequences, which record the boundary of the walking person in each frame, may suffer from the varying appearances caused by carried objects and the person's clothes. Framewise joint detections are noisy and introduce jitter that is not consistent with sequential detections. In this paper, we combine the silhouettes and skeletons and refine the framewise joint predictions for gait recognition. With temporal information from the silhouette sequences, we show that the refined skeletons can improve gait recognition performance without extra annotations. We compare our methods on four public datasets, CASIA-B, OUMVLP, Gait3D and GREW, and show state-of-the-art performance.	Translated: 2023-08-09 17:02:41 Published: 2023-08-08
# Fermionic Gaussian PEPS in $3+1d$: Rotations and Relativistic Limits ( http://arxiv.org/abs/2304.06744v2 ) License: see linked page	Patrick Emonts, Erez Zohar	Fermionic Gaussian Projected Entangled Pair States are fermionic tensor network state constructions which describe the physics of ground states of non-interacting fermionic Hamiltonians. As non-interacting states, one may study and analyze them very efficiently, by both analytical and numerical means. Recently it was shown that they may be used as the starting point (after applying so-called PEPS gauging mechanisms) for variational studies of lattice gauge theories. This is done using sign-problem free variational Monte-Carlo. In this work we show how to generalize such states from two to three spatial dimensions, focusing on spin representations and requirements of lattice rotations. We present constructions which are crucial for the application of the above mentioned variational Monte-Carlo techniques for studying non-perturbative lattice gauge theory physics, with fermionic matter, in $2+1$-d and $3+1$-d models. Thus, the constructions presented here are crucial for the study of non-trivial lattice gauge theories with fermionic tensor network states.	Translated: 2023-08-09 17:02:24 Published: 2023-08-08
# Asynchronous measurement-device-independent quantum key distribution with hybrid source ( http://arxiv.org/abs/2304.04569v3 ) License: see linked page	Jun-Lin Bai, Yuan-Mei Xie, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen	The linear constraint of secret key rate capacity is overcome by twin-field quantum key distribution (QKD). However, the complex phase-locking and phase-tracking technique requirements throttle the real-life applications of the twin-field protocol. The asynchronous measurement-device-independent (AMDI) QKD protocol, also called the mode-pairing QKD protocol, can relax the technical requirements while retaining performance similar to that of the twin-field protocol. Here, we propose an AMDI-QKD protocol with a nonclassical light source by changing the phase-randomized weak coherent state to a phase-randomized coherent-state superposition in the signal state time window. Simulation results show that our proposed hybrid source protocol significantly enhances the key rate of the AMDI-QKD protocol, while exhibiting robustness to imperfect modulation of nonclassical light sources.	Translated: 2023-08-09 17:02:03 Published: 2023-08-08
# Speech Separation based on Contrastive Learning and Deep Modularization ( http://arxiv.org/abs/2305.10652v2 ) License: see linked page	Peter Ochieng	The current monaural state-of-the-art tools for speech separation rely on supervised learning. This means that they must deal with the permutation problem and that they are affected by a mismatch in the number of speakers used in training and inference. Moreover, their performance heavily relies on the presence of high-quality labelled data. These problems can be effectively addressed by employing a fully unsupervised technique for speech separation. In this paper, we use contrastive learning to establish the representations of frames and then use the learned representations in the downstream deep modularization task. Concretely, we demonstrate experimentally that in speech separation, different frames of a speaker can be viewed as augmentations of a given hidden standard frame of that speaker. The frames of a speaker contain enough prosodic information overlap, which is key in speech separation. Based on this, we implement self-supervised learning that learns to minimize the distance between frames belonging to a given speaker. The learned representations are used in a downstream deep modularization task to cluster frames based on speaker identity. 
Evaluation of the developed technique on WSJ0-2mix and WSJ0-3mix shows that it attains SI-SNRi and SDRi of 20.8 and 21.0, respectively, on WSJ0-2mix. On WSJ0-3mix, it attains SI-SNRi and SDRi of 20.7 and 20.7, respectively. Its greatest strength is that its performance does not degrade significantly as the number of speakers increases.	翻訳日:2023-08-09 16:54:57 公開日:2023-08-08
# インプラント位置予測のための2ストリーム回帰ネットワーク Two-Stream Regression Network for Dental Implant Position Prediction ( http://arxiv.org/abs/2305.10044v3 ) ライセンス: Link先を確認	Xinquan Yang and Xuguang Li and Xuechen Li and Wenting Chen and Linlin Shen and Xin Li and Yongqiang Deng	(参考訳) インプラント補綴治療において, 外科的ガイドの設計は, 主観的かつ医師の経験に訴えやすいインプラント位置の手動位置に大きく依存する。この問題を解決するために深層学習法が適用され始めたとき, 歯間空間は様々であり, その一部には実際のインプラント領域と類似したテクスチャ特性を示すものもある。どちらの問題もインプラント位置予測には大きな課題となる。本稿では, 埋込領域検出器 (IRD) とマルチスケールパッチ埋め込み回帰ネットワーク (MSPENet) から構成される2ストリーム埋込位置回帰フレームワーク (TSIPR) を開発し, この問題に対処する。 irdのトレーニングのために、元のアノテーションを拡張して、よりリッチな特徴を持ち、追加のラベリングコストを発生しない、追加の監督情報を提供する。マルチスケールのパッチ埋め込みモジュールはMSPENetが様々な歯の間隔で画像から特徴を適応的に抽出するために設計されている。グローバルローカルな特徴相互作用ブロックは、リッチな特徴表現のための変換器と畳み込みを組み合わせたMSPENetのエンコーダを構築するように設計されている。推測中、IRDから抽出したRoIマスクを用いてMSPENetの予測結果を洗練する。 5倍のクロスバリデーションによる歯科インプラントデータセットの大規模な実験により,提案したTSIPRは既存の方法よりも優れた性能を示した。 In implant prosthesis treatment, the design of the surgical guide heavily relies on the manual location of the implant position, which is subjective and prone to doctor's experiences. When deep learning based methods has started to be applied to address this problem, the space between teeth are various and some of them might present similar texture characteristic with the actual implant region. Both problems make a big challenge for the implant position prediction. In this paper, we develop a two-stream implant position regression framework (TSIPR), which consists of an implant region detector (IRD) and a multi-scale patch embedding regression network (MSPENet), to address this issue. For the training of IRD, we extend the original annotation to provide additional supervisory information, which contains much more rich characteristic and do not introduce extra labeling costs. A multi-scale patch embedding module is designed for the MSPENet to adaptively extract features from the images with various tooth spacing. 
The global-local feature interaction block is designed to build the encoder of MSPENet, which combines the transformer and convolution for enriched feature representation. During inference, the RoI mask extracted from the IRD is used to refine the prediction results of the MSPENet. Extensive experiments on a dental implant dataset through five-fold cross-validation demonstrated that the proposed TSIPR outperforms existing methods.	翻訳日:2023-08-09 16:54:37 公開日:2023-08-08
# 視覚トランスフォーマーとそのcnnトランスフォーマーに基づく変種に関する調査 A survey of the Vision Transformers and its CNN-Transformer based Variants ( http://arxiv.org/abs/2305.09880v3 ) ライセンス: Link先を確認	Asifullah Khan, Zunaira Rauf, Anabia Sohail, Abdul Rehman, Hifsa Asif, Aqsa Asif, and Umair Farooq	(参考訳) 視覚トランスフォーマーは、様々なコンピュータビジョンアプリケーションのための畳み込みニューラルネットワーク(cnns)の代替として人気を博した。これらのトランスフォーマーは、画像のグローバルな関係に焦点を合わせ、大きな学習能力を提供する。しかし、画像の局所的相関をモデル化しないため、限定的な一般化に悩まされることがある。近年,視覚変換器による畳み込み操作と自己認識機構のハイブリッド化が出現し,局所的およびグローバルな画像表現の両面を利用した。これらのハイブリッド視覚トランスフォーマーは、cnn-transformer architectureとも呼ばれ、視覚応用において顕著な結果を示している。急速に増加するハイブリッドビジョントランスフォーマーの数を考えると、これらのハイブリッドアーキテクチャの分類と説明を提供する必要がある。本調査では,近年のビジョントランスフォーマーアーキテクチャの分類,特にハイブリッドビジョントランスフォーマーの分類について述べる。さらに,アテンション機構,位置埋め込み,マルチスケール処理,畳み込みなど,これらのアーキテクチャの重要な特徴についても論じる。個々の視覚トランスフォーマーアーキテクチャやcnnに焦点を当てた以前の調査論文とは対照的に、この調査はハイブリッド視覚トランスフォーマーの新たなトレンドを独特に強調している。ハイブリッドビジョントランスフォーマーが様々なコンピュータビジョンタスクに優れたパフォーマンスをもたらす可能性を示すことによって、この急速に進化するアーキテクチャの今後の方向性を明らかにした。 Vision transformers have become popular as a possible substitute to convolutional neural networks (CNNs) for a variety of computer vision applications. These transformers, with their ability to focus on global relationships in images, offer large learning capacity. However, they may suffer from limited generalization as they do not tend to model local correlation in images. Recently, in vision transformers hybridization of both the convolution operation and self-attention mechanism has emerged, to exploit both the local and global image representations. These hybrid vision transformers, also referred to as CNN-Transformer architectures, have demonstrated remarkable results in vision applications. Given the rapidly growing number of hybrid vision transformers, it has become necessary to provide a taxonomy and explanation of these hybrid architectures. This survey presents a taxonomy of the recent vision transformer architectures and more specifically that of the hybrid vision transformers. 
Additionally, the key features of these architectures such as the attention mechanisms, positional embeddings, multi-scale processing, and convolution are also discussed. In contrast to the previous survey papers that are primarily focused on individual vision transformer architectures or CNNs, this survey uniquely emphasizes the emerging trend of hybrid vision transformers. By showcasing the potential of hybrid vision transformers to deliver exceptional performance across a range of computer vision tasks, this survey sheds light on the future directions of this rapidly evolving architecture.	翻訳日:2023-08-09 16:54:13 公開日:2023-08-08
# 量子論の別の基礎 An alternative foundation of quantum theory ( http://arxiv.org/abs/2305.06727v5 ) ライセンス: Link先を確認	Inge S. Helland	(参考訳) 本稿では,量子論への新たなアプローチを提案する。基本はまず、理論変数、アクセス可能あるいはアクセス不能な変数、すなわち、アクターが任意に鋭い数値をそれらに割り当てることは可能であるか不可能であるかもしれない。認識論的プロセスでは、アクセス可能な変数は、アクターまたは一部の通信アクターと接続された理想的な観察である。群作用はこれらの変数上で定義され、群表現論はここでヒルベルト空間形式論を展開する基礎である。アクセス可能な理論変数に対応する演算子が導出され、離散の場合、可能な物理値はそれらの演算子の固有値であることが証明される。論文の焦点は、提案された量子論の基礎の基礎となるいくつかの数学的定理である。ここで、このアプローチで必要とされる群と変換は、アクセス可能な変数が有限次元である場合に明示的に構成できることを示す。ヒルベルト空間の定式化を再現するには、2つの相補変数の存在を仮定するのに十分である。数学的変数よりも物理変数にのみ焦点を合わせるために、到達不能変数の概念は概念の概念に置き換えられ、この関係において圏論の側面は群論を部分的に置き換える。ここで提案された基礎から推測される解釈は、量子論の一般的なエピステミック解釈と呼ばれる。この解釈の特別な例はQB主義であり、他のいくつかの解釈とも関係している。 A new approach towards quantum theory is proposed in this paper. The basis is first taken to be theoretical variables, variables that may be accessible or inaccessible, i.e., it may be possible or impossible for an actor to assign arbitrarily sharp numerical values to them. In an epistemic process, the accessible variables are just ideal observations connected to an actor or to some communicating actors. Group actions are defined on these variables, and group representation theory is the basis for developing the Hilbert space formalism here. Operators corresponding to accessible theoretical variables are derived, and in the discrete case it is proved that the possible physical values are the eigenvalues of these operators. The focus of the paper are some mathematical theorems paving the ground for the proposed foundation of quantum theory. It is shown here that the groups and transformation needed in this approach can be constructed explicitly in the case where the accessible variables are finite-dimensional. This simplifies the theory considerably: To reproduce the Hilbert space formulation, it is enough to assume the existence of two complementary variables. 
To focus only on physical variables rather than mathematical variables, the concept of inaccessible variables is then replaced by the concept of notions, and in this connection, aspects of category theory partly replace group theory. The interpretation inferred from the proposed foundation here may be called a general epistemic interpretation of quantum theory. A special case of this interpretation is QBism; it also has a relationship to several other interpretations.	翻訳日:2023-08-09 16:53:33 公開日:2023-08-08
# 量子コンピュータ上の多体系シミュレーションのための量子フローアルゴリズム Quantum flow algorithms for simulating many-body systems on quantum computers ( http://arxiv.org/abs/2305.05168v2 ) ライセンス: Link先を確認	Karol Kowalski, Nicholas P. Bauman	(参考訳) 我々は,次元活性空間の縮小した固有値問題を通じてヒルベルト空間の大規模部分空間をサンプリングする量子フロー (QFlow) 法を用いて,強相関系の量子シミュレーションを行った。我々のQFlowアルゴリズムは回路の複雑さを大幅に減らし、スケーラブルで一定の回路幅の量子コンピューティングの道を開く。シミュレーションにより,QFlowは必要量子ビットを増大させることなく,パラメータの桁数が桁違いに少ないアクティブ空間を用いて,波動関数パラメータの集合数を最適化できることを示した。 We conducted quantum simulations of strongly correlated systems using the quantum flow (QFlow) approach, which enables sampling large sub-spaces of the Hilbert space through coupled eigenvalue problems in reduced dimensionality active spaces. Our QFlow algorithms significantly reduce circuit complexity and pave the way for scalable and constant-circuit-depth quantum computing. Our simulations show that QFlow can optimize the collective number of wave function parameters without increasing the required qubits using active spaces having an order of magnitude fewer number of parameters.	翻訳日:2023-08-09 16:53:13 公開日:2023-08-08
# DiffBFR: ブラインド顔復元に向けたブートストラップ拡散モデル DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration ( http://arxiv.org/abs/2305.04517v2 ) ライセンス: Link先を確認	Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Xuecheng Nie	(参考訳) ブラインドフェイス修復(bfr)は挑戦的に重要である。以前の作業では、品質と効率のバランスのため、ganベースのフレームワークを利用してこの問題に取り組むことを好む。しかし、これらの手法は長期分布に対する安定性の低下と適応性に悩まされ、ソースのアイデンティティを同時に保持できず、詳細を復元することができない。本稿では,トレーニング崩壊の回避とロングテール分布の生成という面において,ganよりも優れていることを考慮し,bfrに拡散確率モデル(dpm)を導入することを提案する。 DiffBFRは2段階の設計を用いて、まず低画質の画像から識別情報を復元し、実際の顔の分布に応じてテクスチャの詳細を強化する。この設計は2つの重要なコンポーネントで実装されている。 1) 結果の顔の詳細を保存するためのアイデンティティ復元モジュール(IRM) 逆過程の条件として,LQ画像を用いた純ガウス的ランダム分布からノイズを除去する代わりに,部分雑音を付加したLQ画像から始まる新しい切り出しサンプリング手法を提案する。理論的には、この変化はDPMの限界の低い証拠を縮小し、さらにオリジナルの詳細を復元する。理論的証明により、入力サイズが異なる2つのカスケード条件DPMを導入し、このサンプリング効果を強化し、直接発生する高解像度画像のトレーニング困難を軽減する。 2)画像のテクスチャを磨くためのテクスチャ強化モジュール(TEM)。ここでは、LQフリーモデルである無条件DPMを導入し、修復を現実的に見せるように強制する。理論上は、純粋なHQ画像に基づいて訓練されたこの非条件DPMが、IRMから出力される推論画像の画素レベルの正しい分布を正当化するのに役立つことを証明した。分節時間ステップの切り抜きサンプリングを用いて、アイデンティティ情報を保持しながら画素レベルのテクスチャを研磨する。 Blind face restoration (BFR) is important while challenging. Prior works prefer to exploit GAN-based frameworks to tackle this task due to the balance of quality and efficiency. However, these methods suffer from poor stability and adaptability to long-tail distribution, failing to simultaneously retain source identity and restore detail. We propose DiffBFR to introduce Diffusion Probabilistic Model (DPM) for BFR to tackle the above problem, given its superiority over GAN in aspects of avoiding training collapse and generating long-tail distribution. DiffBFR utilizes a two-step design, that first restores identity information from low-quality images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components: 1) Identity Restoration Module (IRM) for preserving the face details in results. 
Instead of denoising from pure Gaussian random distribution with LQ images as the condition during the reverse process, we propose a novel truncated sampling method which starts from LQ images with part noise added. We theoretically prove that this change shrinks the evidence lower bound of DPM and then restores more original details. With theoretical proof, two cascade conditional DPMs with different input sizes are introduced to strengthen this sampling effect and reduce training difficulty in the high-resolution image generated directly. 2) Texture Enhancement Module (TEM) for polishing the texture of the image. Here an unconditional DPM, a LQ-free model, is introduced to further force the restorations to appear realistic. We theoretically proved that this unconditional DPM trained on pure HQ images contributes to justifying the correct distribution of inference images output from IRM in pixel-level space. Truncated sampling with fractional time step is utilized to polish pixel-level textures while preserving identity information.	翻訳日:2023-08-09 16:53:04 公開日:2023-08-08
# YOLOCS:特徴空間凝固のためのDense Channel Compressionに基づく物体検出 YOLOCS: Object Detection based on Dense Channel Compression for Feature Spatial Solidification ( http://arxiv.org/abs/2305.04170v3 ) ライセンス: Link先を確認	Lin Huang, Weisheng Li, Linlin Shen, Haojie Fu, Xue Xiao, Suihan Xiao	(参考訳) 本研究では,ネットワーク内の前方および後方伝播に着目し,特徴浄化と勾配バックプロパゲーションの過程におけるチャネル特性と畳み込み核の関係について検討する。そこで本稿では,Dense Channel Compression for Feature Spatial Solidificationを提案する。本手法の中心概念に基づき,Dense Channel Compression for Feature Spatial Solidification Structure (DCFS) と非対称多層圧縮デカップリングヘッド (ADH) という,バックボーンとヘッドネットワークのための2つの革新的なモジュールを導入する。 YOLOv5モデルに統合されると、これらの2つのモジュールは例外的な性能を示し、YOLOCSと呼ばれるモデルが修正される。 MSCOCOデータセットに基づいて評価すると、大、中、小のYOLOCSモデルはそれぞれ50.1%、47.6%、42.5%のAPが得られる。推論速度はYOLOv5モデルと著しく類似しており、大、中、小のYOLOCSモデルはYOLOv5モデルのAPをそれぞれ1.1%、2.3%、5.2%上回っている。 In this study, we examine the associations between channel features and convolutional kernels during the processes of feature purification and gradient backpropagation, with a focus on the forward and backward propagation within the network. Consequently, we propose a method called Dense Channel Compression for Feature Spatial Solidification. Drawing upon the central concept of this method, we introduce two innovative modules for backbone and head networks: the Dense Channel Compression for Feature Spatial Solidification Structure (DCFS) and the Asymmetric Multi-Level Compression Decoupled Head (ADH). When integrated into the YOLOv5 model, these two modules demonstrate exceptional performance, resulting in a modified model referred to as YOLOCS. Evaluated on the MSCOCO dataset, the large, medium, and small YOLOCS models yield AP of 50.1%, 47.6%, and 42.5%, respectively. Maintaining inference speeds remarkably similar to those of the YOLOv5 model, the large, medium, and small YOLOCS models surpass the YOLOv5 model's AP by 1.1%, 2.3%, and 5.2%, respectively.	翻訳日:2023-08-09 16:52:33 公開日:2023-08-08
# 駆動マイクロ波共振器の光子放射統計 Photon emission statistics of a driven microwave cavity ( http://arxiv.org/abs/2305.01986v2 ) ライセンス: Link先を確認	Pedro Portugal, Fredrik Brange, Kalle S. U. Kansanen, Peter Samuelsson, and Christian Flindt	(参考訳) 最近の実験的進歩により、ナノスケール導体中の単一電子のトンネル化や非古典光源からの光子放出など、オープン量子系の個々の量子ジャンプを検出できるようになった。本研究では,外部磁場により共鳴駆動されるマイクロ波共振器から放射される光子の統計を理論的に検討する。パラメトリックとコヒーレントドライブの違いに着目し,キャビティフィールドを圧縮または変位させる。ガウス状態に基づく理論的枠組みを用いて,光子放射統計量の生成関数を得るために,計数場を施したlindbladマスター方程式を用いる。次に、2つのドライブの光子待ち時間の分布と、出射光の$g^{(2)}$-関数を比較し、これらの観測値間の重要な違いを同定する。長時間の限界において、光子放射統計の因子的累積と、この2つの駆動で顕著に異なる放出電流の大規模偏差統計を解析する。理論的な枠組みは、マイクロ波共振器を複数組み合わせた、より複雑なシステムにも容易に拡張でき、将来の実験で予測を検証できる。 Recent experimental advances have made it possible to detect individual quantum jumps in open quantum systems, such as the tunneling of single electrons in nanoscale conductors or the emission of photons from non-classical light sources. Here, we investigate theoretically the statistics of photons emitted from a microwave cavity that is driven resonantly by an external field. We focus on the differences between a parametric and a coherent drive, which either squeezes or displaces the cavity field. We employ a Lindblad master equation dressed with counting fields to obtain the generating function of the photon emission statistics using a theoretical framework based on Gaussian states. We then compare the distribution of photon waiting times for the two drives as well as the $g^{(2)}$-functions of the outgoing light, and we identify important differences between these observables. In the long-time limit, we analyze the factorial cumulants of the photon emission statistics and the large-deviation statistics of the emission currents, which are markedly different for the two drives. Our theoretical framework can readily be extended to more complicated systems, for instance, with several coupled microwave cavities, and our predictions may be tested in future experiments.	翻訳日:2023-08-09 16:51:49 公開日:2023-08-08
# GCformer: 正確でスケーラブルな長期多変量時系列予測のための効率的なフレームワーク GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.08325v2 ) ライセンス: Link先を確認	YanJun Zhao, Ziqing Ma, Tian Zhou, Liang Sun, Mengni Ye, Yi Qian	(参考訳) トランスフォーマーベースのモデルは、時系列予測の有望なツールとして登場した。しかし、これらのモデルでは長い入力時系列の正確な予測はできない。一方で、時系列データ内のグローバルな依存関係を捉えられない。他方で、長い入力シーケンスは、通常、大きなモデルサイズと高い時間複雑性をもたらす。この制限に対処するために、長い入力列を処理する構造化グローバル畳み込みブランチと、短い最新の信号をキャプチャするローカルなTransformerベースのブランチを組み合わせたGCformerを提案する。大域的畳み込みカーネルのための統一的なフレームワークが3つの異なるパラメータ化手法を用いて導入された。グローバルブランチで選択された構造化畳み込みカーネルは、特に劣線形の計算量で構築されており、長大で雑音の多い入力信号の効率的かつ効果的な処理を可能にしている。 6つのベンチマークデータセットに関する実証的研究により、GCformerは最先端の手法より優れており、多変量時系列ベンチマークのMSEを4.38%、モデルパラメータを61.92%削減している。特に、グローバル畳み込み分岐は他のモデルの性能を向上させるためのプラグインブロックとして機能することができ、最近発表された様々なトランスフォーマーベースのモデルを含めて平均31.93%改善されている。私たちのコードはhttps://github.com/zyj-111/GCformerで公開しています。 Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, the long input sequence usually leads to large model size and high time complexity. To address these limitations, we present GCformer, which combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for capturing short, recent signals. A cohesive framework for a global convolution kernel has been introduced, utilizing three distinct parameterization methods. The selected structured convolutional kernel in the global branch has been specifically crafted with sublinear complexity, thereby allowing for the efficient and effective processing of lengthy and noisy input signals. 
Empirical studies on six benchmark datasets demonstrate that GCformer outperforms state-of-the-art methods, reducing MSE in multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to enhance the performance of other models, with an average improvement of 31.93%, including various recently published Transformer-based models. Our code is publicly available at https://github.com/zyj-111/GCformer.	翻訳日:2023-08-09 16:46:07 公開日:2023-08-08
# GEmo-CLAP: 音声感情認識のためのジェンダー属性強化コントラスト言語-音声事前学習 GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition ( http://arxiv.org/abs/2306.07848v6 ) ライセンス: Link先を確認	Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu	(参考訳) コントラスト学習に基づくクロスモダリティ事前学習アプローチは、近年、様々な分野で素晴らしい成功を収めている。本稿では,音声感情認識のためのジェンダー属性強化型コントラスト言語-音声事前学習 (CLAP) 手法であるGEmo-CLAPを提案する。具体的には,まず事前学習済みのWavLMモデルとRoBERTaモデルを用いて,感情CLAPモデル(Emo-CLAP)を構築した。第二に、音声感情モデリングにおけるジェンダー属性の重要性から、ソフトラベルに基づくGEmo-CLAP(SL-GEmo-CLAP)とマルチタスク学習に基づくGEmo-CLAP(ML-GEmo-CLAP)の2つのモデルが提案され、音声信号の感情とジェンダー情報を統合し、より合理的な目的を形成する。 IEMOCAPの大規模実験により,提案した2つのGEmo-CLAPモデルがベースラインであるEmo-CLAPより一貫して優れており,また最近の最先端手法と比較して最高の認識性能が得られた。特に、提案したSL-GEmo-CLAPモデルは、81.43%の最高のUARと83.16%のWARを達成し、他の最先端SER手法を少なくとも3%上回る。 Contrastive learning based cross-modality pretraining approaches have recently exhibited impressive success in diverse fields. In this paper, we propose GEmo-CLAP, a kind of gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for speech emotion recognition. Specifically, a novel emotion CLAP model (Emo-CLAP) is first built, utilizing pre-trained WavLM and RoBERTa models. Second, given the significance of the gender attribute in speech emotion modeling, two novel soft label based GEmo-CLAP (SL-GEmo-CLAP) and multi-task learning based GEmo-CLAP (ML-GEmo-CLAP) models are further proposed to integrate emotion and gender information of speech signals, forming more reasonable objectives. Extensive experiments on IEMOCAP show that our proposed two GEmo-CLAP models consistently outperform the baseline Emo-CLAP, while also achieving the best recognition performance compared with recent state-of-the-art methods. Notably, the proposed SL-GEmo-CLAP model achieves the best UAR of 81.43% and WAR of 83.16%, which is better than other state-of-the-art SER methods by at least 3%.	翻訳日:2023-08-09 16:45:45 公開日:2023-08-08
# InstructZero: ブラックボックス大言語モデルの効率的な命令最適化 InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models ( http://arxiv.org/abs/2306.03082v2 ) ライセンス: Link先を確認	Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou	(参考訳) 大規模言語モデル(LLM)は命令フォロワであるが、異なる状況、特にバックプロパゲーションが禁止されているブラックボックスLLMに対して最適な命令を見つけることは困難である。離散命令を直接最適化する代わりに,オープンソースLLMに適用した低次元ソフトプロンプトを最適化し,ブラックボックスLLMの命令を生成する。 InstructZero と呼ぶ提案手法の各イテレーションにおいて,ソフトプロンプトをオープンソース LLM を用いて命令に変換し,ゼロショット評価のためにブラックボックス LLM に送信し,その性能をベイズ最適化に送信し,ゼロショット性能を向上させるソフトプロンプトを新たに生成する。 Vicuna や ChatGPT など,オープンソースの LLM と API の組み合わせによる InstructZero の評価を行った。 InstructZero は,様々な下流タスクにおいて SOTA 自動命令手法より優れていることを示す。私たちのコードとデータはhttps://github.com/Lichang-Chen/InstructZeroで公開されています。 Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts that improve the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.	翻訳日:2023-08-09 16:45:17 公開日:2023-08-08
# 公務分野における話題分類のための大規模言語モデル活用 Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs ( http://arxiv.org/abs/2306.02864v2 ) ライセンス: Link先を確認	Alejandro Peña, Aythami Morales, Julian Fierrez, Ignacio Serna, Javier Ortega-Garcia, Iñigo Puente, Jorge Cordova, Gonzalo Cordova	(参考訳) 公務文書の分析は、透明性、説明責任、情報的意思決定を促進するため、市民にとって不可欠である。市民は政府の政策を理解し、公的な議論に参加し、代表者に責任を負わせることができる。特定の規制に依存している企業にとって、これは重要なことであり、時には生死に関わる問題である。大規模言語モデル(LLM)は、そのような文書で使用される複雑な言語を効果的に処理し理解することで、公務文書の分析を大幅に強化する可能性がある。本研究では,公務文書の分類におけるLLMの性能分析を行う。自然なマルチラベルタスクとして、これらの文書の分類は重要な課題である。本研究では,33K以上のサンプルと22.5Mトークンを持つ公務文書のデータベース収集に,Regexを利用したツールを使用する。実験では,スペイン語の4つの異なるLLMの性能を評価し,最大30のトピックを異なる構成で分類した。その結果, LLM は公務分野の文書など, ドメイン固有の文書の処理に有効であることが示唆された。 The analysis of public affairs documents is crucial for citizens as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. This is crucial, and sometimes a matter of life or death, for companies whose operations depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in such documents. In this work, we analyze the performance of LLMs in classifying public affairs documents. As a natural multi-label task, the classification of these documents presents important challenges. In this work, we use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess the performance of 4 different Spanish LLMs to classify up to 30 different topics in the data in different configurations. The results show that LLMs can be of great use to process domain-specific documents, such as those in the domain of public affairs.	翻訳日:2023-08-09 16:44:58 公開日:2023-08-08
# LLM時代のAI透明性:人間中心の研究ロードマップ AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap ( http://arxiv.org/abs/2306.01941v2 ) ライセンス: Link先を確認	Q. Vera Liao and Jennifer Wortman Vaughan	(参考訳) 強力な大規模言語モデル(llm)の台頭は、イノベーションの絶好の機会をもたらすだけでなく、個人や社会全体に対するリスクも高めている。我々は LLM と LLM を注入したアプリケーションの開発とデプロイを責任を持って行うための重要な瞬間に達した。しかし、責任あるAI — 透明性 — の中心的な柱は、LLMに関する現在の議論から大きく逸脱している。 LLMの透明性を提供するための新しいアプローチを追求することが最重要であり、AIとヒューマンコンピュータの相互作用(HCI)の交差点における長年の研究は、人間中心の視点で行う必要があることを強調している。新たなLLMエコシステムにおける利害関係者のニーズ、新しいタイプのLLM組み込みアプリケーション、LLMに関する新たな利用パターンと課題を考慮し、人々の処理、インタラクション、情報の利用に関する教訓に基づいて、透明性へのアプローチを開発し、設計する必要があります。私たちは、LLMに透明性を提供する上で生じるユニークな課題と、HCIから学んだ教訓、AI透明性を人間中心の視点で捉えた責任あるAI研究を反映しています。次に、透明性を達成するためにコミュニティが採用した4つの一般的なアプローチ -- モデルレポート、評価結果の公開、説明の提供、不確実性の伝達 -- を概説し、これらのアプローチがllmにどのように適用されるか、あるいは適用されないかに関するオープン質問を提起します。これが議論の出発点となり、将来の研究に有用なロードマップになることを願っています。 The rise of powerful large language models (LLMs) brings about tremendous opportunities for innovation but also looming risks for individuals and society at large. We have reached a pivotal moment for ensuring that LLMs and LLM-infused applications are developed and deployed responsibly. However, a central pillar of responsible AI -- transparency -- is largely missing from the current discourse around LLMs. It is paramount to pursue new approaches to provide transparency for LLMs, and years of research at the intersection of AI and human-computer interaction (HCI) highlight that we must do so with a human-centered perspective: Transparency is fundamentally about supporting appropriate human understanding, and this understanding is sought by different stakeholders with different goals in different contexts. 
In this new era of LLMs, we must develop and design approaches to transparency by considering the needs of stakeholders in the emerging LLM ecosystem, the novel types of LLM-infused applications being built, and the new usage patterns and challenges around LLMs, all while building on lessons learned about how people process, interact with, and make use of information. We reflect on the unique challenges that arise in providing transparency for LLMs, along with lessons learned from HCI and responsible AI research that has taken a human-centered perspective on AI transparency. We then lay out four common approaches that the community has taken to achieve transparency -- model reporting, publishing evaluation results, providing explanations, and communicating uncertainty -- and call out open questions around how these approaches may or may not be applied to LLMs. We hope this provides a starting point for discussion and a useful roadmap for future research.	翻訳日:2023-08-09 16:44:42 公開日:2023-08-08
# Shuffle SGD は常に SGD より優れている: 任意データ順序による SGD の解析の改善 Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders ( http://arxiv.org/abs/2305.19259v3 ) ライセンス: Link先を確認	Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi	(参考訳) 確率勾配 Descent (SGD) アルゴリズムはニューラルネットワークの最適化に広く用いられ、ランダムリシャッフル (RR) とシングルシャッフル (SS) はトレーニングデータのランダムまたは単一置換によるサイクリングの一般的な選択肢である。しかし、非凸の場合におけるこれらのアルゴリズムの収束性は完全には理解されていない。既存の結果から,エポックの数がトレーニングセットサイズよりも小さい現実的なトレーニングシナリオでは,RRはSGDよりも悪いパフォーマンスを示す可能性が示唆された。本稿では,任意のデータ順序付けが可能な一般SGDアルゴリズムを解析し,非凸関数に対する収束率の向上を示す。具体的には, ランダムかつ単一シャッフルのSGDは, イテレーション数に関係なく, 従来のSGDよりも常に高速か,少なくとも同等であることを示す。本研究は,SGDをランダム/単一シャッフルで使用することの利点を強調し,非凸最適化のための収束特性に関する新たな知見を提供する。 Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.	翻訳日:2023-08-09 16:44:12 公開日:2023-08-08
# UMD: X2Xバックドア攻撃の教師なしモデル検出 UMD: Unsupervised Model Detection for X2X Backdoor Attacks ( http://arxiv.org/abs/2305.18651v3 ) ライセンス: Link先を確認	Zhen Xiang, Zidi Xiong, Bo Li	(参考訳) バックドア(トロイの木馬)攻撃はディープニューラルネットワークに対する一般的な脅威であり、バックドアトリガーが埋め込まれた1つ以上のソースクラスからのサンプルは、敵のターゲットクラスに誤分類される。分類器がバックドア攻撃を受けているかどうかを検出する既存の方法は、主に単一の敵対的ターゲットを持つ攻撃(例えば全対1攻撃)向けに設計されている。我々の知る限り、教師なしで、任意の数のソースクラスがそれぞれ任意のターゲットクラスと対になる、より一般的なX2X攻撃に効果的に対処できる既存手法は存在しない。本稿では,敵(ソース,ターゲット)クラスペアの合同推論により,X2Xバックドア攻撃を効果的に検出する,初の教師なしモデル検出手法UMDを提案する。特に,提案するクラスタリングアプローチに基づき,バックドアと推定されるクラスペアのサブセットを計測・選択するための新しい転送可能性統計を最初に定義した。次に,提案するロバストで教師なしの異常検出器を用いて,検出推定のためのリバースエンジニアリングトリガサイズの集約に基づいて,選択されたクラスペアを共同で評価する。我々は, CIFAR-10, GTSRB, Imagenetteデータセットの総合的な評価を行い, 多様なX2X攻撃に対する検出精度の観点から, 教師なしUMDがSOTA検出器(監督下でも)を17%, 4%, 8%で上回っていることを示す。また,いくつかの強適応攻撃に対するUMDの強力な検出性能を示す。 Backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes. Existing methods for detecting whether a classifier is backdoor attacked are mostly designed for attacks with a single adversarial target (e.g., all-to-one attack). To the best of our knowledge, without supervision, no existing methods can effectively address the more general X2X attack with an arbitrary number of source classes, each paired with an arbitrary target class. In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs. In particular, we first define a novel transferability statistic to measure and select a subset of putative backdoor class pairs based on a proposed clustering approach. Then, these selected class pairs are jointly assessed based on an aggregation of their reverse-engineered trigger size for detection inference, using a robust and unsupervised anomaly detector that we propose. 
We conduct comprehensive evaluations on the CIFAR-10, GTSRB, and Imagenette datasets, and show that our unsupervised UMD outperforms SOTA detectors (even with supervision) by 17%, 4%, and 8%, respectively, in terms of the detection accuracy against diverse X2X attacks. We also show the strong detection performance of UMD against several strong adaptive attacks.	翻訳日:2023-08-09 16:43:51 公開日:2023-08-08
# P-NOC:弱教師付きセマンティックセグメンテーションのための逆CAM生成 P-NOC: Adversarial CAM Generation for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2305.12522v2 ) ライセンス: Link先を確認	Lucas David, Helio Pedrini, and Zanoni Dias	(参考訳) 大量の教師付きセグメンテーションアノテーションセットの必要性を軽減するため、複数のWeakly Supervised Semantic Segmentation(WSSS)戦略が考案された。これらはしばしば、注釈付き情報の欠如にもかかわらず、セグメンテーション前の有用なプロパティ(例えば、予測完全性と意味境界への忠実性)の開発を促進するための高度なデータとモデル正規化戦略に依存する。本稿では、まず、補完的なWSSS技術を分析し、その強みと限界を考慮して戦略を規則化する。次に,2つの対向CAM生成ネットワークを段階的に改良し,ロバストなセマンティックセマンティックセグメンテーションを提案する。実験の結果,本手法はベースラインの有効性を著しく向上させ,Pascal VOC 2012とMS COCO 2014データセットの両方に対して顕著な改善をもたらすことが示唆された。 To mitigate the necessity for large amounts of supervised segmentation annotation sets, multiple Weakly Supervised Semantic Segmentation (WSSS) strategies have been devised. These will often rely on advanced data and model regularization strategies to instigate the development of useful properties (e.g., prediction completeness and fidelity to semantic boundaries) in segmentation priors, notwithstanding the lack of annotated information. In this work, we first create a strong baseline by analyzing complementary WSSS techniques and regularizing strategies, considering their strengths and limitations. We then propose a new Class-specific Adversarial Erasing strategy, comprising two adversarial CAM generating networks being gradually refined to produce robust semantic segmentation proposals. Empirical results suggest that our approach induces substantial improvement in the effectiveness of the baseline, resulting in a noticeable improvement over both Pascal VOC 2012 and MS COCO 2014 datasets.	翻訳日:2023-08-09 16:42:54 公開日:2023-08-08
# 実数値観測からの強化学習のためのニューロモルフィックアーキテクチャ A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations ( http://arxiv.org/abs/2307.02947v2 ) ライセンス: Link先を確認	Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar	(参考訳) 強化学習(RL)は複雑な環境における意思決定のための強力なフレームワークを提供する。しかし、ハードウェア効率とバイオインスパイアされた方法でRLを実装することは依然として課題である。本稿では,実測値を用いてRL問題を解くための新しいスパイキングニューラルネットワーク(SNN)アーキテクチャを提案する。提案モデルは,td(temporal difference)-error modulation)とeligibility tracesを追加して,事前作業に基づいて多層イベントベースクラスタリングを組み込んだものである。アブレーション研究は、これらの成分がモデルの性能に与える影響を裏付けるものである。適応性トレースを持つ表型アクター批判アルゴリズムと最先端のPPOアルゴリズムをベンチマークとして使用する。当社のネットワークは,従来型のRL環境(マウンテンカー,カートポール,アクロボット)における安定的な制御ポリシの発見に成功した。提案モデルは,計算およびハードウェア実装要件の観点から,魅力的なトレードオフを提供する。このモデルは外部メモリバッファやグローバルエラー勾配計算を必要とせず、ローカル学習ルールと放送されたtd-error信号によってオンラインにシナプス更新が行われる。したがって、この研究はよりハードウェア効率の良いRLソリューションの開発に寄与する。 Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. 
The model requires neither an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.	翻訳日:2023-08-09 16:36:34 公開日:2023-08-08
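The pairing of a single broadcast TD-error with eligibility traces, as described in the abstract above, can be illustrated with the textbook tabular TD(λ) value update. This is a minimal sketch of that generic rule only, not the paper's spiking architecture; the function name `td_lambda_episode`, the toy three-state chain, and all hyperparameter values are assumptions made for illustration.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one episode of tabular TD(lambda) with accumulating traces.

    values: list of state-value estimates, updated in place.
    transitions: iterable of (state, reward, next_state, done) tuples.
    """
    traces = [0.0] * len(values)
    for state, reward, next_state, done in transitions:
        # TD error: one scalar that is shared by ("broadcast" to) every state.
        target = reward + (0.0 if done else gamma * values[next_state])
        td_error = target - values[state]
        # Mark the visited state as eligible for credit.
        traces[state] += 1.0
        for s in range(len(values)):
            # Local update: each entry needs only its own trace and the scalar error.
            values[s] += alpha * td_error * traces[s]
            # Traces decay so that older visits receive less credit.
            traces[s] *= gamma * lam
    return values


# Hypothetical toy chain 0 -> 1 -> 2 (terminal), with reward 1 on the final step.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
values = td_lambda_episode([0.0, 0.0, 0.0], episode)
```

In this toy run the reward arrives only on the second step, yet state 0 also receives credit because its trace has not fully decayed. No replay buffer or global gradient is involved, which is the property the abstract emphasizes.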
# MAE-DFER:自己教師型動的顔表情認識のための効率的なマスク付きオートエンコーダ MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition ( http://arxiv.org/abs/2307.02227v2 ) ライセンス: Link先を確認	Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao	(参考訳) 動的表情認識(DFER)は、インテリジェントで共感的な機械の開発に不可欠である。この分野での以前の取り組みは、主に教師付き学習パラダイムに当てはまり、既存のデータセットの制限付きデータによって厳しく制限されている。近年のマスク付きオートエンコーダ(例:videomae)の成功に触発されて,大量のラベルなしデータに対して大規模自己教師付き事前学習を活用し,dferの開発を大いに前進させる新しい自己教師付き手法mae-dferを提案する。ビデオMAEで使用されるバニラ・ビジョン・トランスフォーマー(ViT)は微調整中にかなりの計算を必要とするため、MAE-DFERはエンコーダとして効率的なローカル・グローバル・インタラクション・トランスフォーマー(LGI-Former)を開発する。さらに,MAE-DFERは,ビデオMAEのスタンドアロンな外観コンテンツ再構成に加えて,LGI-Formerが静的な外観情報と動的動き情報の両方を発掘することを奨励する明示的な時間的顔の動きモデリングも導入している。 6つのデータセットに対する大規模な実験により、MAE-DFERは最先端の教師付き手法をかなりのマージン(DFEWでは+6.30\% UAR、MAFWでは+8.34\% UAR)で一貫して上回り、大規模な自己監督型事前訓練を通じて強力な動的顔表現を学習できることが確認された。さらに、ビデオMAEと同等かそれ以上の性能を有し、計算コスト(約38 % FLOPs)を大幅に削減している。 mae-dferは、dferの進歩のための新しい方法を開拓し、この分野および他の関連するタスクにおいて、より関連する研究を刺激することができると信じている。コードとモデルはhttps://github.com/sunlicai/MAE-DFERで公開されている。 Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by recent unprecedented success of masked autoencoders (e.g., VideoMAE), this paper proposes MAE-DFER, a novel self-supervised method which leverages large-scale self-supervised pre-training on abundant unlabeled data to largely advance the development of DFER. Since the vanilla Vision Transformer (ViT) employed in VideoMAE requires substantial computation during fine-tuning, MAE-DFER develops an efficient local-global interaction Transformer (LGI-Former) as the encoder. 
Moreover, in addition to the standalone appearance content reconstruction in VideoMAE, MAE-DFER also introduces explicit temporal facial motion modeling to encourage LGI-Former to excavate both static appearance and dynamic motion information. Extensive experiments on six datasets show that MAE-DFER consistently outperforms state-of-the-art supervised methods by significant margins (e.g., +6.30% UAR on DFEW and +8.34% UAR on MAFW), verifying that it can learn powerful dynamic facial representations via large-scale self-supervised pre-training. Besides, it has comparable or even better performance than VideoMAE, while largely reducing the computational cost (about 38% FLOPs). We believe MAE-DFER has paved a new way for the advancement of DFER and can inspire more relevant research in this field and even other related tasks. Code and models are publicly available at https://github.com/sunlicai/MAE-DFER.	翻訳日:2023-08-09 16:36:19 公開日:2023-08-08
# 変分ガウス近似のための高次幾何積分器 High-order geometric integrators for the variational Gaussian approximation ( http://arxiv.org/abs/2306.17608v2 ) ライセンス: Link先を確認	Roya Moghaddasi Fereidani and Jiří J. L. Vaníček	(参考訳) 時間依存型シュレーディンガー方程式を解くための単軌道ガウス法のうち、変分ガウス近似が最も正確である。ヘラーの元々のthawed Gaussian近似とは対照的に、シンプレクティックであり、エネルギーを正確に保存し、部分的にトンネルを考慮できる。しかし、変分法もはるかに高価である。効率を向上させるため,Faou と Lubich の2次シンプレクティック積分器を対称に合成し,時間ステップに関して任意の偶数次の収束を達成できる幾何学的積分器を得る。本研究では,高次積分器が2次アルゴリズムに比べて収束を劇的に高速化できることを示すとともに,一般の4次ルンゲ・クッタ法とは対照的に,時間反転可能であり,時間ステップによらずノルムとシンプレクティック構造を正確に保存できることを示す。本手法は低次元系に限定されないことを示すため, 結合モース振動子の非分離性20次元モデル上で解析を行う。また, 変分法はトンネルを捕捉し, 非変分法によるthawed Gaussian近似よりも精度を向上することを示した。 Among the single-trajectory Gaussian-based methods for solving the time-dependent Schrödinger equation, the variational Gaussian approximation is the most accurate one. In contrast to Heller's original thawed Gaussian approximation, it is symplectic, conserves energy exactly, and may partially account for tunneling. However, the variational method is also much more expensive. To improve its efficiency, we symmetrically compose the second-order symplectic integrator of Faou and Lubich and obtain geometric integrators that can achieve an arbitrary even order of convergence in the time step. We demonstrate that the high-order integrators can speed up convergence drastically compared to the second-order algorithm and, in contrast to the popular fourth-order Runge-Kutta method, are time-reversible and conserve the norm and the symplectic structure exactly, regardless of the time step. To show that the method is not restricted to low-dimensional systems, we perform most of the analysis on a non-separable twenty-dimensional model of coupled Morse oscillators. We also show that the variational method may capture tunneling and, in general, improves accuracy over the non-variational thawed Gaussian approximation.	翻訳日:2023-08-09 16:35:19 公開日:2023-08-08
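The idea of symmetrically composing a second-order integrator to reach higher even orders can be tried out on a much simpler system. The sketch below is a stand-in under stated assumptions: it uses the Störmer-Verlet step for a classical harmonic oscillator as the base method (not the Faou-Lubich integrator for Gaussian wavepackets) and the well-known triple-jump composition. The names `verlet_step`, `compose`, and `final_error` are hypothetical.

```python
import math


def verlet_step(q, p, dt):
    """One Stormer-Verlet step for H = p**2/2 + q**2/2 (second order, symmetric)."""
    p -= 0.5 * dt * q  # half kick: dp/dt = -q
    q += dt * p        # full drift: dq/dt = p
    p -= 0.5 * dt * q  # half kick
    return q, p


def compose(step, order):
    """Triple-jump composition: lifts a symmetric method of even `order` to order + 2."""
    g1 = 1.0 / (2.0 - 2.0 ** (1.0 / (order + 1)))
    g2 = 1.0 - 2.0 * g1  # negative sub-step; the three coefficients sum to 1

    def composed(q, p, dt):
        q, p = step(q, p, g1 * dt)
        q, p = step(q, p, g2 * dt)
        return step(q, p, g1 * dt)

    return composed


def final_error(step, n_steps, t_end=1.0):
    """Integrate from (q, p) = (1, 0); compare with the exact q = cos(t), p = -sin(t)."""
    q, p = 1.0, 0.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        q, p = step(q, p, dt)
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))


fourth = compose(verlet_step, 2)  # order 2 -> order 4
sixth = compose(fourth, 4)        # order 4 -> order 6

err2 = final_error(verlet_step, 20)
err4 = final_error(fourth, 20)
err6 = final_error(sixth, 20)

# Halving the step should shrink the fourth-order error by roughly 2**4.
ratio4 = final_error(fourth, 10) / final_error(fourth, 20)
```

Because each composed method is again symmetric, the same `compose` call can be applied recursively, which is how arbitrary even orders are reached. The price is three times as many base steps per level, so the higher order pays off only when the required accuracy is high enough.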
# VCMのためのエンドツーエンド学習型マルチスケール特徴圧縮 End-to-End Learnable Multi-Scale Feature Compression for VCM ( http://arxiv.org/abs/2306.16670v3 ) ライセンス: Link先を確認	Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, and Hui Yong Kim	(参考訳) ディープラーニングベースのマシンビジョンアプリケーションの普及により、ビデオ符号化(VCM)と呼ばれる新しいタイプの圧縮が生まれている。 VCMは従来のビデオコーディングとは異なり、人間の視覚的品質ではなく、マシンビジョンのパフォーマンスに最適化されている。 MPEG-VCMの特徴圧縮トラックでは,画像から抽出したマルチスケール特徴を圧縮する。近年,MPEG-VCM機能アンカーに対するBDレートを最大96%削減できる多目的ビデオ符号化(VVC)標準方式が実証されている。しかし、vvcは抽出された特徴ではなく、自然画像のために設計されたため、まだ最適ではない。さらに、VVCの符号化複雑性が高いため、性能を犠牲にすることなく軽量エンコーダの設計が困難になる。これらの課題に対処するため,我々は,抽出された特徴のエンドツーエンド最適化と軽量エンコーダの設計を両立する,新しいマルチスケール特徴圧縮手法を提案する。提案モデルは,学習可能な圧縮機とマルチスケール特徴融合ネットワークを組み合わせることで,マルチスケール特徴の冗長性を効果的に除去する。融合ネットワークと圧縮ネットワークを単純にカスケードする代わりに、融合処理と符号化処理をインターリーブ方式で統合する。提案モデルでは,まず大規模特徴を符号化して潜伏表現を取得し,さらに小型特徴量で潜伏表現を融合する。この処理は、最小のスケール特徴が融合するまで連続して行われ、最終段階のエントロピー符号化によりエントロピー符号化が行われる。その結果、我々のモデルは、BDレートを少なくとも52%削減し、オブジェクト検出のエンコードタイムを$\times5$から$\times27$に短縮した。 The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so called video coding for machine (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compression. Recent feature compression works have demonstrated that the versatile video coding (VVC) standard-based approach can achieve a BD-rate reduction of up to 96% against MPEG-VCM feature anchor. However, it is still sub-optimal as VVC was not designed for extracted features but for natural images. Moreover, the high encoding complexity of VVC makes it difficult to design a lightweight encoder without sacrificing performance. 
To address these challenges, we propose a novel multi-scale feature compression method that enables both the end-to-end optimization on the extracted features and the design of lightweight encoders. The proposed model combines a learnable compressor with a multi-scale feature fusion network so that the redundancy in the multi-scale features is effectively removed. Instead of simply cascading the fusion network and the compression network, we integrate the fusion and encoding processes in an interleaved way. Our model first encodes a larger-scale feature to obtain a latent representation and then fuses the latent with a smaller-scale feature. This process is successively performed until the smallest-scale feature is fused and then the encoded latent at the final stage is entropy-coded for transmission. The results show that our model outperforms previous approaches by at least 52% in BD-rate reduction and requires $5\times$ to $27\times$ less encoding time for object detection...	翻訳日:2023-08-09 16:34:56 公開日:2023-08-08
# 空間的詳細記憶を用いたパンシャープ化への学習 Learning to Pan-sharpening with Memories of Spatial Details ( http://arxiv.org/abs/2306.16181v3 ) ライセンス: Link先を確認	Maoxun Yuan, Tianyi Zhao, Bo Li, Xingxing Wei	(参考訳) リモートセンシングシステムにおいて最もよく用いられる技術の一つであるパンシャーペニングは、パンクロマティック画像からマルチスペクトル画像(MS)に空間的詳細を注入し、高解像度のマルチスペクトル画像を得る。ディープラーニングはその強固な適合能力と効率的な特徴抽出によって広く注目を集めているため、優れた性能を達成するために様々なパンシャープ化手法が提案されている。しかしながら、現在のパンシャーピング法では、通常、ペア化されたパンクロマトグラフィ(PAN)とMSイメージを入力として必要としており、いくつかのシナリオでは使用を制限している。この問題に対処するために,本論文では,PAN画像の空間的詳細が主に高周波の手がかりである,すなわち入力PAN画像の輪郭を反映していることを観察する。これにより,いくつかのベースエッジを格納するPAN非依存表現を開発し,それを介して対応するPAN画像の輪郭を構成することができる。その結果、推定時にms画像のみを用いてパンシャープ化タスクを行うことができる。この目的のために、メモリベースのネットワークは、トレーニングフェーズ中に空間の詳細を抽出して記憶するように適応し、メモリベースの空間詳細ネットワーク(MSDN)と呼ばれる推論時にPAN画像から空間情報を取得するプロセスを置き換えるために使用される。最後に、提案したMSDNモジュールを既存のディープラーニングベースのパンシャーピング手法に統合し、エンドツーエンドのパンシャーピングネットワークを実現する。我々はGaofen1衛星とWorldView-4衛星の広範な実験により、PAN画像なしで良好な空間的詳細を構築し、最高の性能を達成することを検証する。コードはhttps://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.gitで公開されている。 Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multispectral images (MS) to obtain high-resolution multispectral images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been proposed to achieve remarkable performance. However, current pan-sharpening methods usually require the paired panchromatic (PAN) and MS images as input, which limits their usage in some scenarios. To address this issue, in this paper we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., the edges reflect the contour of input PAN images. This motivates us to develop a PAN-agnostic representation to store some base edges, so as to compose the contour for the corresponding PAN image via them. 
As a result, we can perform the pan-sharpening task with only the MS image at inference time. To this end, a memory-based network is adapted to extract and memorize the spatial details during the training phase and is used to replace the process of obtaining spatial information from PAN images during inference; we call it the Memory-based Spatial Details Network (MSDN). Finally, we integrate the proposed MSDN module into the existing deep learning-based pan-sharpening methods to achieve an end-to-end pan-sharpening network. With extensive experiments on the Gaofen1 and WorldView-4 satellites, we verify that our method constructs good spatial details without PAN images and achieves the best performance. The code is available at https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git.	翻訳日:2023-08-09 16:34:26 公開日:2023-08-08
# 調和振動子の固有状態を記述する経路分布と他の1次元問題 Path distributions for describing eigenstates of the harmonic oscillator and other 1-dimensional problems ( http://arxiv.org/abs/2306.11155v2 ) ライセンス: Link先を確認	Randall M. Feenstra	(参考訳) 経路の確率振幅を合計して調和振動子の波動関数を形成する方法と、他の単純な1次元問題について述べる。各問題に対して既知の閉形式パスベースの伝搬器を用いて、波動関数を記述する積分式を記述する。この表現は伝統的に粒子の初期位置上の積分の形を取るが、経路の終点間の運動に関連した特性運動量の観点からここで再表現される。このようにして、得られた表現は定常位相解析の一般化を用いて解析され、各固有状態を正確に記述する経路の分布に繋がる。これらの分布は全ての旅行時間に有効であるが、長い時間評価すると、特性運動量の実数値の非負関数であることが判明する。特に調和振動子の場合、幾分広い分布が見られ、記述される状態のエネルギー固有値と等しい古典エネルギーに対応する運動量の値でピークとなる。 The manner in which probability amplitudes of paths sum up to form wave functions of a harmonic oscillator, as well as of other simple 1-dimensional problems, is described. Using known, closed-form, path-based propagators for each problem, an integral expression is written that describes the wave function. This expression conventionally takes the form of an integral over initial locations of a particle, but it is re-expressed here in terms of a characteristic momentum associated with motion between the endpoints of a path. In this manner, the resulting expression can be analyzed using a generalization of stationary-phase analysis, leading to distributions of paths that exactly describe each eigenstate. These distributions are valid for all travel times, but when evaluated for long times they turn out to be real, non-negative functions of the characteristic momentum. For the harmonic oscillator in particular, a somewhat broad distribution is found, peaked at a value of momentum that corresponds to a classical energy which in turn equals the energy eigenvalue for the state being described.	翻訳日:2023-08-09 16:33:54 公開日:2023-08-08
# 文書レイアウトアノテーション:公務領域におけるデータベースとベンチマーク Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs ( http://arxiv.org/abs/2306.10046v2 ) ライセンス: Link先を確認	Alejandro Peña, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Marcos Grande, Iñigo Puente, Jorge Cordova, Gonzalo Cordova	(参考訳) 毎日何千ものデジタル文書が、企業、公共団体、市民に有用な情報と共に生成される。手動で処理できないことを考えると、これらの文書の自動処理は特定の分野においてますます必要となってきている。しかし、ほとんどの場合、テキストのみの構文解析では、様々な意味を持つ異なるコンポーネントを通して提示される情報を十分に理解できないため、この課題は依然として困難なままである。このような観点から、文書レイアウト分析(Document Layout Analysis, DLA)は、文書の基本コンポーネントを検出し分類することを目的とした、長年にわたる興味深い研究分野である。本研究では4つの基本レイアウトブロックと4つのテキストカテゴリを含む,異なるレイアウトラベルを持つデジタル文書をセミオートマチックにアノテートする手法を用いた。本稿では,スペイン政府から24件のデータソースを用いて,行政領域におけるDLAの新しいデータベースの収集に本手法を適用した。データベースは、37.9Kドキュメントと441Kドキュメントページと、8Mラベルが8つのレイアウトブロックユニットに関連付けられている。実験の結果,提案するテキストラベリング手順を最大99%の精度で検証した。 Every day, thousands of digital documents are generated with useful information for companies, public organizations, and citizens. Given the impossibility of processing them manually, the automatic processing of these documents is becoming increasingly necessary in certain sectors. However, this task remains challenging, since in most cases a text-only based parsing is not enough to fully understand the information presented through different components of varying significance. In this regard, Document Layout Analysis (DLA) has been an interesting research field for many years, which aims to detect and classify the basic components of a document. In this work, we used a procedure to semi-automatically annotate digital documents with different layout labels, including 4 basic layout blocks and 4 text categories. We apply this procedure to collect a novel database for DLA in the public affairs domain, using a set of 24 data sources from the Spanish Administration. The database comprises 37.9K documents with more than 441K document pages, and more than 8M labels associated to 8 layout block units. 
The results of our experiments validate the proposed text labeling procedure with accuracy up to 99%.	翻訳日:2023-08-09 16:33:37 公開日:2023-08-08
# 大規模言語モデルは本当に優れた論理型推論器か? 総合的な評価とそれ以上 Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond ( http://arxiv.org/abs/2306.09841v3 ) ライセンス: Link先を確認	Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria	(参考訳) 論理的推論は、知識工学と人工知能の分野において、一貫して基本的で重要な役割を果たす。近年、Large Language Models (LLMs) は自然言語処理(NLP)における注目すべき革新として現れ、様々な古典的NLPタスクにおいて顕著な成果を発揮している。しかし、LLMが人間の知性に類似した段階的な認知推論を必要とする論理的推論の課題に効果的に対処できるかどうかという問題は未解決のままである。この目的のために,本論文では,このギャップを橋渡しし,包括的評価を行う。まず,システマティックな評価を行うために,15の典型的な論理推論データセットを選択し,推論,帰納的,帰納的,混合形式の推論設定に整理する。評価の包括性を考慮すると、3つの代表的なLCM(text-davinci-003, ChatGPT, BARD)を含み、ゼロショット、ワンショット、3ショット設定で選択されたすべてのデータセットで評価する。第二に,単純な指標(例えば正確性)のみに依存する従来の評価と異なり,客観的・主観的評価を行い,回答と説明の両方をカバーする。さらに、LLMの論理的欠陥を明らかにするために、問題のあるケースは2次元から5つのエラータイプ、すなわちエビデンス選択プロセスと推論プロセスに起因する。第三に、知識バイアスの影響を回避し、LLMの論理的推論能力のベンチマークに純粋に集中するため、中立性のある新しいデータセットを提案する。サンプルは3,000種類あり、デダクティブ、インダクティブ、アブダクティブの設定をカバーしている。本論文は,詳細な評価に基づいて,6次元から論理推論能力の一般的な評価手法を提案する。 LLMの長所と短所を反映し、将来の作品の指針を与える。 Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP), exhibiting impressive achievements across various classic NLP tasks. However, the question of whether LLMs can effectively address the task of logical reasoning, which requires gradual cognitive inference similar to human intelligence, remains unanswered. To this end, we aim to bridge this gap and provide comprehensive evaluations in this paper. Firstly, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings. 
Considering the comprehensiveness of evaluations, we include three representative LLMs (i.e., text-davinci-003, ChatGPT and BARD) and evaluate them on all selected datasets under zero-shot, one-shot and three-shot settings. Secondly, different from previous evaluations relying only on simple metrics (e.g., accuracy), we propose fine-level evaluations in both objective and subjective manners, covering both answers and explanations. Additionally, to uncover the logical flaws of LLMs, problematic cases will be attributed to five error types from two dimensions, i.e., evidence selection process and reasoning process. Thirdly, to avoid the influence of knowledge bias and purely focus on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. It contains 3,000 samples and covers deductive, inductive and abductive settings. Based on the in-depth evaluations, this paper finally forms a general evaluation scheme of logical reasoning capability from six dimensions. It reflects the pros and cons of LLMs and gives guiding directions for future work.	翻訳日:2023-08-09 16:33:20 公開日:2023-08-08
# テキストから画像へのデータ帰属の評価 Evaluating Data Attribution for Text-to-Image Models ( http://arxiv.org/abs/2306.09345v2 ) ライセンス: Link先を確認	Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang	(参考訳) 大きなテキスト・画像モデルでは「ノーベル」なイメージを合成できるが、これらの画像は必ずしもトレーニングデータのリフレクションである。このようなモデルにおけるデータ帰属の問題 -- トレーニングセット内の画像のどれが、生成された画像の出現に最も責任を持つか -- は、難しいが重要な問題である。この問題に対する最初のステップとして、既存の大規模モデルを所定の例題オブジェクトやスタイルに向けてチューニングする「カスタマイズ」メソッドによる帰属評価を行う。私たちのキーとなる洞察は、これによって、構築によって模範にコンピュータ的に影響される合成画像を効率的に作成できるということです。このような画像の新たなデータセットを用いて、様々なデータ属性アルゴリズムと様々な可能な特徴空間を評価することができる。さらに,データセット上でトレーニングすることで,dino, clip, vitなどの標準モデルを帰属問題に向けてチューニングすることができる。手順は小さな例集合に向けて調整されるが、より大きい集合への一般化を示す。最後に,問題の本質的不確実性を考慮することで,一連のトレーニング画像に対してソフト属性スコアを割り当てることができる。 While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is a difficult yet important one. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned towards small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.	翻訳日:2023-08-09 16:32:23 公開日:2023-08-08
# 大規模言語モデルを用いた数学的導出の生成 Generating Mathematical Derivations with Large Language Models ( http://arxiv.org/abs/2307.09998v3 ) ライセンス: Link先を確認	Jordan Meadows, Marco Valentino, Andre Freitas	(参考訳) LLM(Large Language Models)を用いた特殊分野における数学的結果の導出は、モデルの限界を識別し、数学的発見を支援するための新たな研究方向である。本稿では,記号エンジンを用いて大規模方程式の導出を行い,目的方程式を前提から導出する際の LLM の機能について検討する。具体的には,事前学習戦略の頑健さと一般化を特殊化モデルと比較するため,GPTの文脈内学習とT5モデルの微調整を行う。実験結果から,FLAN-T5-large (MathT5) は従来のスコアにおいて,全ての静的および分布外テストセットにおいてGPTモデルよりも優れていた。しかし、詳細な分析により、微調整されたモデルは、見当たらない記号を含む摂動や(より少ない範囲で)方程式構造の変化に対してより敏感であることが明らかになった。さらに、1.7Kの方程式と200以上の導出を解析し、誤り、無関係、冗長な方程式を含むような一般的な推論誤差を強調する。最後に、数学的導出を評価するための既存の指標の適合性について検討し、摂動に対する感度などの一般的な特性を捉えることができるが、詳細な推論誤差やモデル間の本質的な差異を強調できないことを示す。全体として、この研究は合成データのトレーニングモデルがより大きなLLMよりも数学能力を向上することを示したが、現在のメトリクスは生成した数学的テキストの品質を適切に評価していない。 The derivation of mathematical results in specialised fields, using Large Language Models (LLMs), is an emerging research direction that can help identify models' limitations, and potentially support mathematical discovery. In this paper, we leverage a symbolic engine to generate derivations of equations at scale, and investigate the capabilities of LLMs when deriving goal equations from premises. Specifically, we employ in-context learning for GPT and fine-tune a range of T5 models to compare the robustness and generalisation of pre-training strategies to specialised models. Empirical results show that fine-tuned FLAN-T5-large (MathT5) outperforms GPT models on all static and out-of-distribution test sets in conventional scores. However, an in-depth analysis reveals that the fine-tuned models are more sensitive to perturbations involving unseen symbols and (to a lesser extent) changes to equation structure. In addition, we analyse 1.7K equations, and over 200 derivations, to highlight common reasoning errors such as the inclusion of incorrect, irrelevant, and redundant equations. 
Finally, we explore the suitability of existing metrics for evaluating mathematical derivations and find evidence that, while they can capture general properties such as sensitivity to perturbations, they fail to highlight fine-grained reasoning errors and essential differences between models. Overall, this work demonstrates that training models on synthetic data may improve their math capabilities beyond much larger LLMs, but current metrics are not appropriately assessing the quality of generated mathematical text.	翻訳日:2023-08-09 16:26:52 公開日:2023-08-08
# 遠方点雲登録のための密度不変特性 Density-invariant Features for Distant Point Cloud Registration ( http://arxiv.org/abs/2307.09788v2 ) ライセンス: Link先を確認	Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo	(参考訳) 遠隔地ライダー点雲の登録は、協調走行車の3Dビジョンを拡張する上で重要であるが、重複面積が小さいことと観測点密度の差が大きいため、課題である。本稿では, 遠方のライダー点雲を登録するために, 密度不変な幾何学的特徴を抽出するグループワイズコントラスト学習(GCL)スキームを提案する。我々は、密度不変特徴抽出器を訓練するために、コントラスト正値が独立かつ同一分布(i.i.d.)であるべきであることを、理論的解析と実験を通して示す。本稿では,同一空間的位置(正の群と呼ばれる)における複数の点群の特徴を類似させる,単純かつ効果的な訓練手法を提案し,一対の点群がi.i.d.原理に適合するように導入するサンプリングバイアスを回避する。結果として得られる完全畳み込み特徴抽出器は最先端の手法よりも強力で密度不変であり、KITTIとnuScenesベンチマークにおける遠隔シナリオの登録リコールをそれぞれ40.9%、26.9%改善した。コードはhttps://github.com/liuQuan98/GCLで入手できる。 Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, and yet is challenging due to the small overlapping area and a huge disparity between observed point densities. In this paper, we propose a Group-wise Contrastive Learning (GCL) scheme to extract density-invariant geometric features to register distant outdoor LiDAR point clouds. We show through theoretical analysis and experiments that contrastive positives should be independent and identically distributed (i.i.d.) in order to train density-invariant feature extractors. Building on this conclusion, we propose a simple yet effective training scheme to force the feature of multiple point clouds in the same spatial location (referred to as positive groups) to be similar, which naturally avoids the sampling bias introduced by a pair of point clouds to conform with the i.i.d. principle. The resulting fully-convolutional feature extractor is more powerful and density-invariant than state-of-the-art methods, improving the registration recall of distant scenarios on KITTI and nuScenes benchmarks by 40.9% and 26.9%, respectively. Code is available at https://github.com/liuQuan98/GCL.	翻訳日:2023-08-09 16:26:27 公開日:2023-08-08
# AesPA-Net:美的パターン認識型転送ネットワーク AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks ( http://arxiv.org/abs/2307.09724v3 ) ライセンス: Link先を確認	Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun	(参考訳) 対象のスタイルを芸術的に表現するために、近年の研究では、スタイル画像の局所パッチをコンテンツ画像の対応するパッチにマッピングする能力により、注意機構を活用している。しかし、任意の内容とアートワークのセマンティックな対応が低いため、アテンションモジュールはスタイルイメージから特定のローカルパッチを乱用し、不調和で明らかな反復的なアーティファクトをもたらす。この制限を克服し,芸術的なスタイルの伝達を困難にするため,注意機構の強化とスタイルを整理するパターンのリズムの獲得に重点を置いている。本稿では,スタイル画像におけるパターンの反復を定量化する新しい指標であるパターン反復可能性について述べる。このパターン再現性に基づき,局所的およびグローバル的表現のスイートスポットを探索する美的パターン認識型転送ネットワーク(aespa-net)を提案する。さらに,注意機構が正確で意味のある意味的対応を学習することを奨励する,新たな自己監督タスクを提案する。最後に,局所パターンの精巧なリズムを伝達するためにパッチワイズスタイルロスを導入する。定量的に定量的な評価を行い,人間の知覚に適合するパターン再現性の信頼性を検証し,提案手法の優れていることを示す。 To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and accomplish impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. Based on the pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot of local and global style expressions. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce the patch-wise style loss to transfer the elaborate rhythm of local patterns. 
Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability that aligns with human perception, and demonstrate the superiority of the proposed framework.	翻訳日:2023-08-09 16:26:05 公開日:2023-08-08
# 人工知能のプライバシと進歩のバランス:生物医学研究・教育の病理学における匿名化 Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education ( http://arxiv.org/abs/2307.09426v2 ) ライセンス: Link先を確認	Neel Kanwal, Emiel A.M. Janssen, Kjersti Engan	(参考訳) 生物医学研究の進展は、大量の医療データへのアクセスに大きく依存している。病理組織学の場合,全スライド画像(WSI)と臨床病理学的情報は,Digital Pathology(DP)のための人工知能(AI)アルゴリズムの開発に有用である。医療データの転送は、二次的な目的のためにデータの使用性を高めるが、患者のプライバシにリスクをもたらす。同時に、既存の規制は、再識別リスクを避けるため、医療データを「必要に応じてクローズド」し続けるよう推進している。一般に、これらの法的規制は機密データを削除する必要があるが、現代の画像マッチングアルゴリズムによるデータ連鎖攻撃の可能性を考慮していない。さらに、DPにおける標準化の欠如により、WSIのすべてのフォーマットに対して単一のソリューションを確立するのが難しくなる。これらの課題は、AIアルゴリズムを開発しながらプライバシーと進捗のバランスをとるバイオインフォマティクス研究者の問題を提起する。本稿では,医療データ共有の法的規制と用語について検討する。我々は既存のアプローチをレビューし、病理学的観点から課題を強調する。また,多分野の研究・教育を促進するために,組織データのためのデータ共有ガイドラインも提示する。 The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bio-informatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. 
We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.	翻訳日:2023-08-09 16:25:43 公開日:2023-08-08
# なぜ小さなロバストさが役に立つのか? 代理訓練による対向移動可能性の理解 Why Does Little Robustness Help? Understanding Adversarial Transferability From Surrogate Training ( http://arxiv.org/abs/2307.07873v3 ) ライセンス: Link先を確認	Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin	(参考訳) DNNの逆例(AE)は転送可能であることが示されている: ホワイトボックスサロゲートモデルをうまく騙すAEは、異なるアーキテクチャで他のブラックボックスモデルを騙すこともできる。多くの実験的な研究は、高度に伝達可能なAEを生成するためのガイダンスを提供してきたが、これらの発見の多くは説明に欠け、矛盾するアドバイスに至る。本稿では,敵対的伝達可能性の理解に向けてさらなる一歩を踏み出し,サロゲート的な側面に焦点をあてる。弱い摂動サンプルで逆向きに訓練されたモデルがより良い代理となるという、興味深い小さな堅牢性現象から始まり、モデルの滑らかさと勾配類似性という2つの主要な要因のトレードオフが原因と考えられる。研究は, 移動可能性との相関性ではなく, 共同効果に焦点をあてた。一連の理論的および経験的分析を通して、逆行訓練におけるデータ分布シフトが勾配類似性の低下を説明すると推測する。これらの知見に基づいて,データ拡張と勾配正規化が伝達可能性に与える影響を考察し,そのトレードオフが様々なトレーニングメカニズムに一般的に存在していることを確認する。最後に,入力勾配正則化とシャープネス認識最小化(sam)の組み合わせなど,モデルの滑らかさと勾配の類似性を同時に最適化するトランスファー性を高めるために,より優れたサロゲートを構築するための一般的な経路を提案する。要約すると、我々は、一方を無視しながら一方を最適化するのではなく、他方を効果的に移動攻撃する2つの要因の統一的な影響に注意を向け、代理モデルを操作する重要な役割を強調している。 Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. 
Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.	翻訳日:2023-08-09 16:24:50 公開日:2023-08-08
# Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models ( http://arxiv.org/abs/2307.06713v2 ) License: see link	Lautaro Estienne	A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained with a very large amount of unsupervised text data and adapted to perform a downstream natural language task using methods like fine-tuning, calibration or in-context learning. In this work, we propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples and with only a few in-domain sample queries. The proposed approach treats the LLM as a black box, adding a stage where the model posteriors are calibrated to the task. Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt and a previous approach where calibration is performed without using any adaptation data.	Translated: 2023-08-09 16:23:46 Published: 2023-08-08
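To make the idea of adapting class priors from unlabelled queries concrete, here is a minimal sketch. It uses the classical EM prior re-estimation scheme as a stand-in and is not necessarily the procedure used in the paper above. The function name `adapt_priors`, the toy posteriors, and the uniform starting prior are all illustrative assumptions.

```python
def adapt_priors(posteriors, old_priors, iters=50):
    """EM re-estimation of class priors from unlabelled posteriors.

    Assumption: this is the textbook EM prior-shift update, used here only
    as an illustration; it is not taken from the paper.
    """
    k = len(old_priors)
    new_priors = list(old_priors)
    for _ in range(iters):
        sums = [0.0] * k
        for post in posteriors:
            # Re-weight each posterior by the ratio of new to old priors.
            weights = [post[c] * new_priors[c] / old_priors[c] for c in range(k)]
            z = sum(weights)
            for c in range(k):
                sums[c] += weights[c] / z
        # The new prior is the average of the re-weighted posteriors.
        new_priors = [s / len(posteriors) for s in sums]
    return new_priors


# Hypothetical black-box posteriors for five unlabelled in-domain queries.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.9, 0.1]]
priors = adapt_priors(posteriors, old_priors=[0.5, 0.5])
print([round(p, 3) for p in priors])
```

The model is only queried for posteriors, which matches the black-box treatment described in the abstract; everything else in the sketch is a simplification.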
# Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization ( http://arxiv.org/abs/2307.03571v2 ) License: see link	Chris Kolb and Christian L. Müller and Bernd Bischl and David Rügamer	This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly non-convex problems typically relies on specialized optimization routines. In contrast, the method studied here is compatible with off-the-shelf (stochastic) gradient descent that is ubiquitous in deep learning, thereby enabling differentiable sparse regularization without approximations. The proposed optimization transfer comprises an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has an identical global optimum but also exactly preserves the local minima. This is particularly useful in non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various literature strands on sparsity-inducing parametrizations in a general setting and meaningfully extend existing approaches. The feasibility of our approach is evaluated through numerical experiments, demonstrating its effectiveness by matching or outperforming common implementations of convex and non-convex regularizers.	Translated: 2023-08-09 16:23:32 Published: 2023-08-08
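As a hedged illustration of how a smooth penalty on an overparametrized problem can induce a non-smooth one, consider the simplest scalar case. Writing a parameter as beta = u * v and penalizing (u^2 + v^2) / 2 is equivalent to penalizing |beta|, because (u^2 + v^2) / 2 >= |u * v| with equality when |u| = |v|. The sketch below checks this numerically. The helper `min_l2_penalty` and the grid search are assumptions made for illustration and are not the authors' code; the general $\ell_q$ and $\ell_{p,q}$ cases from the abstract are not covered.

```python
import math


def min_l2_penalty(beta, grid=10000):
    """Minimize (u^2 + v^2) / 2 subject to u * v = beta over a grid of u in (0, 4]."""
    if beta == 0.0:
        return 0.0  # u = v = 0 is feasible and has zero penalty
    best = math.inf
    for i in range(1, grid + 1):
        u = 4.0 * i / grid
        v = beta / u  # enforce the constraint u * v = beta
        best = min(best, 0.5 * (u * u + v * v))
    return best


for beta in (-2.0, -0.5, 0.0, 0.3, 1.0):
    # The minimized smooth penalty should match the absolute value of beta.
    print(beta, round(min_l2_penalty(beta), 4), abs(beta))
```

In this toy setting the smooth penalty in (u, v) behaves like an absolute-value penalty in beta, which is why plain gradient descent on the factors can still produce sparse solutions.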
# Gzip versus bag-of-words for text classification ( http://arxiv.org/abs/2307.15002v5 ) License: see link	Juri Opitz	The effectiveness of compression in text classification ('gzip') has recently garnered lots of attention. In this note we show that 'bag-of-words' approaches can achieve similar or better results, and are more efficient.	Translated: 2023-08-09 16:14:56 Published: 2023-08-08
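The two families of distance compared in the note above can be sketched side by side. This is a toy sketch under stated assumptions: `ncd` is the usual normalized compression distance computed with Python's `gzip` module, and `bow_distance` is a simple multiset-overlap distance chosen for illustration. Neither is claimed to be the exact formulation evaluated in the note, and the two training sentences are made up.

```python
import gzip
from collections import Counter


def ncd(a: str, b: str) -> float:
    """Normalized compression distance based on gzip-compressed lengths."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)


def bow_distance(a: str, b: str) -> float:
    """One minus the multiset (bag-of-words) Jaccard overlap of the two texts."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = sum((wa & wb).values())
    total = sum((wa | wb).values())
    return 1.0 - shared / total if total else 0.0


def nearest_label(query, train, dist):
    """1-nearest-neighbour classification under the given distance."""
    return min(train, key=lambda item: dist(query, item[0]))[1]


# Hypothetical two-example training set.
train = [
    ("the team won the football match", "sports"),
    ("stocks fell as the market closed", "finance"),
]
query = "the football team lost the match"
print("bag-of-words:", nearest_label(query, train, bow_distance))
print("gzip NCD    :", nearest_label(query, train, ncd))
```

The bag-of-words distance needs only one pass over each text, whereas the compression distance compresses every query/training pair, which is one way to see the efficiency argument made in the abstract.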
# RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition ( http://arxiv.org/abs/2307.14016v3 ) License: see link	Lei Shen, Jianlong Jin, Ruixin Zhang, Huaen Li, Kai Zhao, Yingyi Zhang, Jingyun Zhang, Shouhong Ding, Yang Zhao, Wei Jia	Palmprint recently shows great potential in recognition applications as it is a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. We first introduce a conditional modulation generator to improve the intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency against unpaired training. We further improve the Bézier palm creases generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts the recognition model performance. For example, our model improves the state-of-the-art BézierPalm by more than 5% and 14% in terms of TAR@FAR=1e-6 under the 1:1 and 1:3 Open-set protocol. When accessing only 10% of the real training data, our method still outperforms ArcFace with 100% real training data, indicating that we are closer to real-data-free palmprint recognition.	Translated: 2023-08-09 16:14:51 Published: 2023-08-08
# Complex Analysis of Intelligent Systems ( http://arxiv.org/abs/2307.12905v2 ) License: see link	M.W. AlMasri	Logic gates can be written in terms of complex differential operators where the inputs and outputs are analytic functions with several variables. Using the polar representation of complex numbers, we arrive at an immediate connection between the oscillatory behavior of the system and logic gates. We explain the universal programming language (UPL) used by physical objects to process information. To assure the causality structure in UPL, we introduce the concept of layers that characterizes the computations for each time scale.	Translated: 2023-08-09 16:14:31 Published: 2023-08-08
# Damage Vision Mining Opportunity for Imbalanced Anomaly Detection ( http://arxiv.org/abs/2307.12676v3 ) License: see link	Takato Yasuno	In the past decade, previous balanced datasets have been used to advance algorithms for classification, object detection, semantic segmentation, and anomaly detection in industrial applications. Specifically, for condition-based maintenance, automating visual inspection is crucial to ensure high quality. Deterioration prognostic attempts to optimize the fine decision process for predictive maintenance and proactive repair. In civil infrastructure and living environment, damage data mining cannot avoid the imbalanced data issue because of rare unseen events and high quality status by improved operations. For visual inspection, deteriorated class acquired from the surface of concrete and steel components are occasionally imbalanced. From numerous related surveys, we summarize that imbalanced data problems can be categorized into four types: 1) missing range of target and label valuables, 2) majority-minority class imbalance, 3) foreground-background of spatial imbalance, 4) long-tailed class of pixel-wise imbalance. Since 2015, there have been many imbalanced studies using deep learning approaches that include regression, image classification, object detection, and semantic segmentation. However, anomaly detection for imbalanced data is not yet well known. In the study, we highlight one-class anomaly detection application whether anomalous class or not, and demonstrate clear examples on imbalanced vision datasets: blood smear, lung infection, hazardous driving, wooden, concrete deterioration, river sludge, and disaster damage. Illustrated in Fig.1, we provide key results on damage vision mining advantage, hypothesizing that the more effective range of positive ratio, the higher accuracy gain of anomaly detection application. In our imbalanced studies, compared with the balanced case of positive ratio 1/1, we find that there is an applicable positive ratio, where the accuracy is consistently high.	Translated: 2023-08-09 16:14:23 Published: 2023-08-08
# ProtoFL: Unsupervised Federated Learning via Prototypical Distillation ( http://arxiv.org/abs/2307.12450v2 ) License: see link	Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim	Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.	Translated: 2023-08-09 16:13:37 Published: 2023-08-08
# Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? ( http://arxiv.org/abs/2307.12344v2 ) License: see link	Susu Sun, Lisa M. Koch, Christian F. Baumgartner	While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net provide the best performance and can be used to reliably identify faulty model behavior.	Translated: 2023-08-09 16:13:20 Published: 2023-08-08
# Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts ( http://arxiv.org/abs/2307.11661v2 ) License: see link	Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O'Connor	Contrastive pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are 0-shot adapted to a downstream dataset by designing prompts that are relevant to the dataset. Such prompt engineering makes use of domain expertise and a validation dataset. Meanwhile, recent developments in generative pretrained models like GPT-4 mean they can be used as advanced internet search tools. They can also be manipulated to provide visual information in any structure. In this work, we show that GPT-4 can be used to generate text that is visually descriptive and how this can be used to adapt CLIP to downstream tasks. We show considerable improvements in 0-shot transfer accuracy on specialized fine-grained datasets like EuroSAT (~7%), DTD (~7%), SUN397 (~4.6%), and CUB (~3.3%) when compared to CLIP's default prompt. We also design a simple few-shot adapter that learns to choose the best possible sentences to construct generalizable classifiers that outperform the recently proposed CoCoOP by ~2% on average and by over 4% on 4 specialized fine-grained datasets. The code, prompts, and auxiliary text dataset are available at https://github.com/mayug/VDT-Adapter.	Translated: 2023-08-09 16:13:03 Published: 2023-08-08
# SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning ( http://arxiv.org/abs/2307.10579v3 ) License: see link	Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Yongxin Tong and Qiang Yang	SecureBoost is a tree-boosting algorithm leveraging homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and risk of label leakage. To harness the full potential of SecureBoost, hyperparameters of SecureBoost should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods either set hyperparameters empirically or heuristically, which are far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements of the three objectives. In particular, the privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that the CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.	Translated: 2023-08-09 16:12:41 Published: 2023-08-08
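The notion of Pareto optimality over three objectives, as used in the abstract above, can be illustrated with a small dominance filter. This is a minimal sketch and not the CMOSB algorithm itself, which performs a constrained multi-objective search. The candidate tuples of (utility loss, training cost, privacy leakage) are invented numbers, and all three objectives are assumed to be minimized.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (utility_loss, training_cost, privacy_leakage) per hyperparameter set.
candidates = [
    (0.10, 5.0, 0.30),
    (0.12, 3.0, 0.25),
    (0.15, 6.0, 0.35),  # worse than the first candidate in every objective
    (0.08, 7.0, 0.40),
    (0.12, 3.5, 0.25),  # same as the second candidate but with higher cost
]
for point in pareto_front(candidates):
    print(point)
```

Each surviving point is a different tradeoff, so a participant who cares most about privacy and one who cares most about cost can pick different members of the same front.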
# Improving the Reusability of Pre-trained Language Models in Real-world Applications ( http://arxiv.org/abs/2307.10457v3 ) License: see link	Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno, and Hamed Khanpour	The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem, where their performance drastically decreases when evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs' generalization on OOD datasets while improving their performance on in-distribution datasets. The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.	Translated: 2023-08-09 16:12:21 Published: 2023-08-08
# Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion ( http://arxiv.org/abs/2308.02552v2 ) License: see link	Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian	Owing to the unrestricted nature of the content in the training data, large text-to-image diffusion models, such as Stable Diffusion (SD), are capable of generating images with potentially copyrighted or dangerous content based on corresponding textual concepts information. This includes specific intellectual property (IP), human faces, and various artistic styles. However, Negative Prompt, a widely used method for content removal, frequently fails to conceal this content due to inherent limitations in its inference logic. In this work, we propose a novel strategy named Degeneration-Tuning (DT) to shield contents of unwanted concepts from SD weights. By utilizing Scrambled Grid to reconstruct the correlation between undesired concepts and their corresponding image domain, we guide SD to generate meaningless content when such textual concepts are provided as input. As this adaptation occurs at the level of the model's weights, the SD, after DT, can be grafted onto other conditional diffusion frameworks like ControlNet to shield unwanted concepts. In addition to qualitatively showcasing the effectiveness of our DT method in protecting various types of concepts, a quantitative comparison of the SD before and after DT indicates that the DT method does not significantly impact the generative quality of other contents. The FID and IS scores of the model on COCO-30K exhibit only minor changes after DT, shifting from 12.61 and 39.20 to 13.04 and 38.25, respectively, which clearly outperforms the previous methods.	Translated: 2023-08-09 16:06:51 Published: 2023-08-08
# Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization ( http://arxiv.org/abs/2308.02494v2 ) License: see link	Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka	Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train a set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for realtime neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.	Translated: 2023-08-09 16:06:23 Published: 2023-08-08
# Entropic property of randomized QAOA circuits ( http://arxiv.org/abs/2308.01807v2 ) License: see link	A. Yu. Chernyavkiy, B. I. Bantysh	Quantum approximate optimization algorithm (QAOA) aims to minimize some binary objective function by sampling bitstrings using a parameterized quantum circuit. In contrast to common optimization-based methods for searching circuit parameters (angles), here we consider choosing them at random. Despite the fact that this approach does not outperform classical algorithms for quadratic unconstrained spin optimization (QUSO) problems, including Max-Cut, it surprisingly provides an advantage over the classical random search. Investigation of this effect has led us to the following conjecture: given the probability distribution of obtaining distinct objective values, random parameters QAOA for QUSO problems always gives a higher entropy of this distribution than the classical random search. We also provide analytical expressions for the distribution.	Translated: 2023-08-09 16:05:07 Published: 2023-08-08
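The quantity compared in the conjecture above is the entropy of the distribution over objective values. The sketch below computes only the classical random-search side of that comparison for Max-Cut on a triangle graph, by enumerating all bitstrings. The graph, the function names, and the use of base-2 Shannon entropy are assumptions made for illustration; simulating the random-parameter QAOA side would need a quantum circuit simulator and is not attempted here.

```python
import math
from collections import Counter
from itertools import product


def cut_value(bits, edges):
    """Number of edges whose endpoints fall on different sides of the cut."""
    return sum(1 for i, j in edges if bits[i] != bits[j])


def objective_distribution(n, edges):
    """Distribution of cut values when bitstrings are sampled uniformly at random."""
    counts = Counter(cut_value(bits, edges) for bits in product((0, 1), repeat=n))
    total = 2 ** n
    return {value: count / total for value, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


triangle = [(0, 1), (1, 2), (0, 2)]
dist = objective_distribution(3, triangle)
print(dist)
print(round(shannon_entropy(dist), 4))
```

For the triangle, two of the eight bitstrings cut no edge and six cut two edges, so the baseline entropy is that of a 0.25 / 0.75 split.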
# NBIAS: A Natural Language Processing Framework for Bias Identification in Text ( http://arxiv.org/abs/2308.01681v2 ) License: see link	Shaina Raza, Muskan Garg, Deepak John Reji, Syed Raza Bashir, Chen Ding	Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases could perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data ends up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop a comprehensive and robust framework NBIAS that consists of a data layer, corpus construction, model development layer and an evaluation layer. The dataset is constructed by collecting diverse data from various fields, including social media, healthcare, and job hiring portals. As such, we applied a transformer-based token classification model that is able to identify bias words/phrases through a unique named entity. In the assessment procedure, we incorporate a blend of quantitative and qualitative evaluations to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We are also able to generate a robust understanding of the model functioning, capturing not only numerical data but also the quality and intricacies of its performance. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.	Translated: 2023-08-09 16:04:53 Published: 2023-08-08
# FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving ( http://arxiv.org/abs/2308.01006v3 ) License: see link	Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu	Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in the perception task of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuses the information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task. Concretely, we first build a transformer based multi-modality fusion network to effectively produce fusion based features. In contrast to the camera-based end-to-end method UniAD, we then establish fusion aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP, that take advantage of multi-modality features. We conduct extensive experiments on the commonly used nuScenes benchmark dataset. FusionAD achieves state-of-the-art performance, surpassing baselines on average by 15% on perception tasks like detection and tracking and by 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score, and reducing the collision rate from 0.31% to only 0.12%.	Translated: 2023-08-09 16:04:33 Published: 2023-08-08
# RecycleGPT: An Autoregressive Language Model with Recyclable Module ( http://arxiv.org/abs/2308.03421v2 ) License: see link	Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang	Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations and the next token in a sequence can be reasonably guessed or inferred based on the preceding ones. Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup while preserving high performance.	Translated: 2023-08-09 15:55:35 Published: 2023-08-08
# Multi-Label Self-Supervised Learning with Scene Images ( http://arxiv.org/abs/2308.03286v2 ) License: see link	Ke Zhu and Minghao Fu and Jianxin Wu	Self-supervised learning (SSL) methods targeting scene images have seen a rapid growth recently, and they mostly rely on either a dedicated dense matching mechanism or a costly unsupervised object discovery module. This paper shows that instead of hinging on these strenuous operations, quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem, which greatly simplifies the learning framework. Specifically, multiple binary pseudo-labels are assigned for each input image by comparing its embeddings with those in two dictionaries, and the network is optimized using the binary cross entropy loss. The proposed method is named Multi-Label Self-supervised learning (MLS). Visualizations qualitatively show that the pseudo-labels by MLS can automatically find semantically similar pseudo-positive pairs across different images to facilitate contrastive learning. MLS learns high quality representations on MS-COCO and achieves state-of-the-art results on classification, detection and segmentation benchmarks. At the same time, MLS is much simpler than existing methods, making it easier to deploy and for further exploration.	Translated: 2023-08-09 15:55:23 Published: 2023-08-08
# Spatialyze:空間対応最適化による地理空間ビデオ分析システム Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations ( http://arxiv.org/abs/2308.03276v2 ) ライセンス: Link先を確認	Chanwut Kittivorawong, Yongming Ge, Yousef Helal, Alvin Cheung	(参考訳) 携帯電話や監視カメラなどのコモディティなハードウェアを使って撮影されたビデオは、時間や位置などの様々なメタデータを記録する。このような地理空間ビデオには日常的に遭遇し,その量は著しく増加している。しかし、ユーザがそのようなデータを効果的に扱えるデータ管理システムは存在しない。本稿では,地理空間ビデオのエンドツーエンドクエリのための新しいフレームワークであるSpatialyzeについて述べる。 Spatialyzeにはドメイン固有の言語があり、ユーザは3段階の宣言型ビルド-フィルタ-オブザーブパラダイムを使って地理空間ビデオ分析ワークフローを構築することができる。内部的には、Spatialyzeはワークフローの宣言的な性質、ビデオに格納された時間空間メタデータ、現実世界のオブジェクトの物理的な振る舞いを活用してワークフローの実行を最適化する。実世界のビデオとワークフローを用いた結果から、Spatialyzeは、最適化されていない実行と比較して最大97.1%の精度を維持しながら、実行時間を最大5.3倍削減できることがわかった。 Videos that are shot using commodity hardware such as phones and surveillance cameras record various metadata such as time and location. We encounter such geospatial videos on a daily basis and such videos have been growing in volume significantly. Yet, we do not have data management systems that allow users to interact with such data effectively. In this paper, we describe Spatialyze, a new framework for end-to-end querying of geospatial videos. Spatialyze comes with a domain-specific language where users can construct geospatial video analytic workflows using a 3-step, declarative, build-filter-observe paradigm. Internally, Spatialyze leverages the declarative nature of such workflows, the temporal-spatial metadata stored with videos, and physical behavior of real-world objects to optimize the execution of workflows. Our results using real-world videos and workflows show that Spatialyze can reduce execution time by up to 5.3x, while maintaining up to 97.1% accuracy compared to unoptimized execution.	翻訳日:2023-08-09 15:55:04 公開日:2023-08-08
# クエリガイドによるFew-shot 3D Point Cloud Segmentationの強化 Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement ( http://arxiv.org/abs/2308.03177v2 ) ライセンス: Link先を確認	Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei	(参考訳) 3dポイントクラウドセグメンテーションに関する広範な研究が行われているが、ジェネリックモデルを新しいカテゴリに効果的に適応させることは、依然として大きな課題である。本稿では,pc-fss(point cloud few-shot segmentation)モデルを改善するための新しい手法を提案する。従来のPC-FSSでは,クエリサンプルの新規クラスを識別するために,サポートプロトタイプのカテゴリ情報を直接活用する手法とは異なり,提案手法では,サポートプロトタイプとクエリ機能間のコンテキストギャップを減らし,モデル性能を大幅に向上させる2つの重要な側面を識別する。具体的には,(1)クエリサンプルの背景や背景が不明瞭な外部キューを除去しながら,クエリコンテキストに適合するサポートバックグラウンドプロトタイプを適応させるとともに,(2)クエリ機能の指導の下で,クエリターゲットに意味的ギャップがないものをエミュレートするために,サポートプロトタイプを水平的に修正する。提案する設計は特徴抽出器と無関係であり,任意のプロトタイプベース手法に容易に適用できる。 S3DISとScanNetの実験結果は, 高い効率を維持しつつ, 大幅な改善を実現し, 顕著な実用効果を示した。このアプローチのコードはhttps://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Segmentation-via-Query-Guided-Enhancementで公開されています。 Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge. This paper proposes a novel approach to improve point cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods that directly utilize categorical information from support prototypes to recognize novel classes in query samples, our method identifies two critical aspects that substantially enhance model performance by reducing contextual gaps between support prototypes and query features. Specifically, we (1) adapt support background prototypes to match query context while removing extraneous cues that may obscure foreground and background in query samples, and (2) holistically rectify support prototypes under the guidance of query features to emulate the latter having no semantic gap to the query targets. Our proposed designs are agnostic to the feature extractor, rendering them readily applicable to any prototype-based methods. 
The experimental results on S3DIS and ScanNet demonstrate notable practical benefits, as our approach achieves significant improvements while still maintaining high efficiency. The code for our approach is available at https://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Cloud-Segmentation-via-Query-Guided-Enhancement	翻訳日:2023-08-09 15:54:47 公開日:2023-08-08
# 複数参照時代に向けて -- NLG評価におけるデータ漏洩と限定参照多様性の対応 Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation ( http://arxiv.org/abs/2308.03131v2 ) ライセンス: Link先を確認	Xianfeng Zeng, Yijin Liu, Fandong Meng and Jie Zhou	(参考訳) BLEUやchrFのようなN-gramマッチングに基づく評価指標は、自然言語生成(NLG)タスクで広く利用されている。しかし、最近の研究では、特にBLEURTのようなニューラルベースの指標と比較した場合に、これらのマッチングベースの指標と人間の評価との相関が弱いことが明らかになっている。本稿では、マッチングベースの指標におけるパフォーマンスボトルネックは、参照の多様性の制限によって引き起こされる可能性があると推測する。この問題に対処するために,これらの指標と人間の評価との整合性を高めるために,複数の参照(multiple references)を用いることを提案する。 WMT Metricsベンチマークでは、マルチリファレンスのF200spBLEUが従来のシングルリファレンスのものより精度で7.2%上回っている。注目すべきことに、ニューラルベースのBERTscoreも精度で3.9%上回っている。さらに,大規模言語モデル (LLM) におけるデータ漏洩問題は,マルチリファレンス指標によって大幅に軽減できることがわかった。コードとデータは https://github.com/SefaZeng/LLM-Ref で公開する。 N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks. However, recent studies have revealed a weak correlation between these matching-based metrics and human evaluations, especially when compared with neural-based metrics like BLEURT. In this paper, we conjecture that the performance bottleneck in matching-based metrics may be caused by the limited diversity of references. To address this issue, we propose to utilize multiple references to enhance the consistency between these metrics and human evaluations. Within the WMT Metrics benchmarks, we observe that the multi-references F200spBLEU surpasses the conventional single-reference one by an accuracy improvement of 7.2%. Remarkably, it also exceeds the neural-based BERTscore by an accuracy enhancement of 3.9%. Moreover, we observe that the data leakage issue in large language models (LLMs) can be mitigated to a large extent by our multi-reference metric. We release the code and data at https://github.com/SefaZeng/LLM-Ref	翻訳日:2023-08-09 15:54:20 公開日:2023-08-08
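The entry above argues that n-gram matching metrics benefit from multiple references. As a rough illustration only, the sketch below computes BLEU-style clipped n-gram precision, where each hypothesis n-gram is credited up to its maximum count in any single reference. It is not the paper's F200spBLEU: whitespace tokenization is assumed instead of SentencePiece, there is no brevity penalty, and the sentences and the name `clipped_precision` are hypothetical.

```python
from collections import Counter


def clipped_precision(hypothesis, references, n=1):
    """BLEU-style modified n-gram precision against one or more references.

    Each hypothesis n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp_counts = ngrams(hypothesis)
    max_ref_counts = Counter()
    for reference in references:
        for gram, count in ngrams(reference).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    matched = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


# Hypothetical example: a second reference covers a word the first one misses.
hypothesis = "the cat sat on the mat".split()
reference_1 = "the cat is on the mat".split()
reference_2 = "a cat sat on a mat".split()

single_ref_score = clipped_precision(hypothesis, [reference_1])
multi_ref_score = clipped_precision(hypothesis, [reference_1, reference_2])
```

With one reference the word "sat" goes unmatched, and adding the second reference credits it. This shows on a toy scale how limited reference diversity can penalize an acceptable output.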
# Gottesman-Kitaev-Preskill Codesによるボソニック量子誤差補正の進歩:理論・工学・応用 Advances in Bosonic Quantum Error Correction with Gottesman-Kitaev-Preskill Codes: Theory, Engineering and Applications ( http://arxiv.org/abs/2308.02913v2 ) ライセンス: Link先を確認	Anthony J. Brady, Alec Eickbusch, Shraddha Singh, Jing Wu and Quntao Zhuang	(参考訳) 量子情報を一組の高調波発振器に符号化することは、信頼性のある量子情報処理のためのノイズを軽減するためのハードウェア効率の良い手法と考えられる。量子ビットを振動子にエンコードするために、猫符号、二項符号、ゴッテマン・キタエフ・プレスキル(GKP)符号を含む様々な符号が提案されている。これらのボソニック符号は、量子誤差補正の分岐点に達した最初のものの一つである。さらに、GKP状態はボソニックチャネルにおける近接-最適量子通信速度を可能にするだけでなく、多くの発振器への発振器の誤り補正を可能にする。本稿では、超伝導回路アーキテクチャの最近の実験的進歩とマルチモードGKP量子ビット符号と発振器・オシレータ(O2O)符号の理論的進歩に焦点を当て、GKP符号の基本動作機構、性能評価および多くの応用に焦点を当てる。まず、ボソニック符号に必要な事前の連続変数形式から始める。次に、GKP状態の物理的実現に関わる量子工学に進む。本稿では,超伝導アーキテクチャにおけるGKP安定化と準備について深く掘り下げ,光領域におけるGKP状態を実現するための提案について検討する。最後に、マルチモードGKP量子ビットとGKP-O2O符号を示し、コード性能を調べ、計算、通信、センシングなどの量子情報処理タスクにおけるGKP符号の適用について議論する。 Encoding quantum information into a set of harmonic oscillators is considered a hardware efficient approach to mitigate noise for reliable quantum information processing. Various codes have been proposed to encode a qubit into an oscillator -- including cat codes, binomial codes and Gottesman-Kitaev-Preskill (GKP) codes. These bosonic codes are among the first to reach a break-even point for quantum error correction. Furthermore, GKP states not only enable close-to-optimal quantum communication rates in bosonic channels, but also allow for error correction of an oscillator into many oscillators. This review focuses on the basic working mechanism, performance characterization, and the many applications of GKP codes, with emphasis on recent experimental progress in superconducting circuit architectures and theoretical progress in multimode GKP qubit codes and oscillators-to-oscillators (O2O) codes. We begin with a preliminary continuous-variable formalism needed for bosonic codes. We then proceed to the quantum engineering involved to physically realize GKP states. 
We take a deep dive into GKP stabilization and preparation in superconducting architectures and examine proposals for realizing GKP states in the optical domain (along with a concise review of GKP realization in trapped-ion platforms). Finally, we present multimode GKP qubits and GKP-O2O codes, examine code performance and discuss applications of GKP codes in quantum information processing tasks such as computing, communication, and sensing.	翻訳日:2023-08-09 15:54:02 公開日:2023-08-08
# テンソル繰り込み群による(1+1)次元O(3)非線形シグマモデルのエンタングルメントエントロピーとRényiエントロピー Entanglement and Rényi entropies of (1+1)-dimensional O(3) nonlinear sigma model with tensor renormalization group ( http://arxiv.org/abs/2308.02798v2 ) ライセンス: Link先を確認	Xiao Luo, Yoshinobu Kuramashi	(参考訳) (1+1)次元O(3)非線形シグマモデルのエンタングルメントエントロピーとRényiエントロピーをテンソル繰り込み群法を用いて検討した。中心電荷は両エントロピーの漸近スケーリング特性から決定される。また、エンタングルメントエントロピーと、$n\rightarrow 1$ とした $n$ 次Rényiエントロピーとの整合性についても検討する。 We investigate the entanglement and Rényi entropies for the (1+1)-dimensional O(3) nonlinear sigma model using the tensor renormalization group method. The central charge is determined from the asymptotic scaling properties of both entropies. We also examine the consistency between the entanglement entropy and the $n$th-order Rényi entropy with $n\rightarrow 1$.	翻訳日:2023-08-09 15:53:32 公開日:2023-08-08
# 頚椎細胞学分類のための開始ネットワークの投票序列化 A Voting-Stacking Ensemble of Inception Networks for Cervical Cytology Classification ( http://arxiv.org/abs/2308.02781v2 ) ライセンス: Link先を確認	Linyi Qian, Qian Huang, Yulin Chen, Junzhou Chen	(参考訳) 子宮頸癌は女性の健康を脅かす最も深刻な疾患の1つである。早期発見と診断は、頸部細胞診の分類が不可欠である癌リスクを著しく減少させる可能性がある。研究者は最近、頚部癌の自動診断のためのネットワークを多数設計しているが、これらの個々のモデルの精度と大小は、実用的な応用ニーズを満たすことができない。そこで本研究では,3つのインセプションネットワークをベース学習者として採用し,それらのアウトプットを投票アンサンブルで統合した,投票集計アンサンブル戦略を提案する。アンサンブルモデルで誤分類されたサンプルは、線形分類モデルをメタラーナーとして訓練し、最終的な予測を行う新しいトレーニングセットを生成する。さらに、パフォーマンスをさらに向上させるために、マルチレベルスタックアンサンブルフレームワークも設計されている。この手法はSIPakMed, Herlev, Mendeleyの各データセットで評価され, 100%, 100%, 100%の精度が得られた。実験結果は、現在の最先端(SOTA)法よりも優れており、スクリーニングの負荷を減らし、病理学者が子宮頸がんを検出するのに役立つ可能性を示している。 Cervical cancer is one of the most severe diseases threatening women's health. Early detection and diagnosis can significantly reduce cancer risk, in which cervical cytology classification is indispensable. Researchers have recently designed many networks for automated cervical cancer diagnosis, but the limited accuracy and bulky size of these individual models cannot meet practical application needs. To address this issue, we propose a Voting-Stacking ensemble strategy, which employs three Inception networks as base learners and integrates their outputs through a voting ensemble. The samples misclassified by the ensemble model generate a new training set on which a linear classification model is trained as the meta-learner and performs the final predictions. In addition, a multi-level Stacking ensemble framework is designed to improve performance further. The method is evaluated on the SIPakMed, Herlev, and Mendeley datasets, achieving accuracies of 100%, 100%, and 100%, respectively. The experimental results outperform the current state-of-the-art (SOTA) methods, demonstrating its potential for reducing screening workload and helping pathologists detect cervical cancer.	翻訳日:2023-08-09 15:53:22 公開日:2023-08-08
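The voting step described in the entry above can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say whether voting is hard or soft, so hard (majority) voting is assumed. The three base learners are replaced by hypothetical label lists, and the linear meta-learner trained on the misclassified samples is left out.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models by hard voting.

    per_model_predictions: one list of predicted labels per base learner,
    all of the same length. On a tie, the label seen first among the tied
    labels wins (Counter keeps first-seen order for equal counts).
    """
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*per_model_predictions)
    ]


def misclassified_indices(voted_labels, true_labels):
    """Indices of samples the voting ensemble got wrong."""
    return [i for i, (p, t) in enumerate(zip(voted_labels, true_labels)) if p != t]


# Hypothetical predictions from three base learners on four samples.
base_predictions = [
    [0, 1, 2, 1],
    [0, 1, 1, 2],
    [1, 1, 2, 2],
]
true_labels = [0, 1, 1, 2]

voted = majority_vote(base_predictions)
hard_cases = misclassified_indices(voted, true_labels)
```

In the scheme the abstract describes, the samples indexed by `hard_cases` would form the new training set for the meta-learner. How that set is featurized is not stated in the abstract and is not assumed here.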
# 生成逆数ネットワークを用いた自動走行用実時間合成Raw Radarデータの生成 Generation of Realistic Synthetic Raw Radar Data for Automated Driving Applications using Generative Adversarial Networks ( http://arxiv.org/abs/2308.02632v2 ) ライセンス: Link先を確認	Eduardo C. Fidelis and Fabio Reway and Herick Y. S. Ribeiro and Pietro L. Campos and Werner Huber and Christian Icking and Lester A. Faria and Torsten Sch\"on	(参考訳) FMCWレーダをシミュレートする主なアプローチはレイトレーシングであり、通常は計算集約であり、バックグラウンドノイズを考慮しない。本研究では,GAN(Generative Adversarial Network)を用いた合成生レーダデータを生成するFMCWレーダシミュレーションの高速化手法を提案する。コードとトレーニング済みのウェイトはオープンソースであり、githubで入手できる。この方法は16個の同時チャープを生成し、レーダデータ(フィルタリングとクラスタリング)を処理するアルゴリズムのさらなる開発に生成されたデータを使用できる。これは、実生活では再現できない非存在または安全クリティカルなシナリオでデータを生成することによって、データ拡張の可能性を高めることができる。この研究で、GANはオートバイのレーダー測定を訓練され、直線を走行するオートバイの合成生レーダーデータを生成するために使用された。このデータを生成するには、ニューラルネットワークへの入力として、オートバイとガウスノイズの距離を用いる。合成レーダチャープはFrechet Inception Distance (FID)を用いて評価した。次に、このganを用いた合成データに基づいて、第1に、実データに基づいて、範囲方位(ra)マップを2回算出する。これらのRAマップに基づいて、適応しきい値とエッジ検出のアルゴリズムがオブジェクト検出に使用される。以上の結果から, 車両のコヒーレントレーダ反射と背景騒音について, チャープ, RAマップ, 物体検出結果の比較から, 現実的なデータであることが示唆された。そこで本研究では,レーダデータ生成におけるシミュレーションと現実のギャップを最小化する手法を提案する。 The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and do not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. 
In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetic generated radar chirps were evaluated using the Frechet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first, based on synthetic data using this GAN and, second, based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise based on the comparison of chirps, the RA maps and the object detection results. Thus, the proposed method in this work has shown to minimize the simulation-to-reality gap for the generation of radar data.	翻訳日:2023-08-09 15:53:02 公開日:2023-08-08
# Adapt and Decompose: Domain Adapted Least-to-Most PromptingによるText-to-SQLの効率的な一般化 Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting ( http://arxiv.org/abs/2308.02582v2 ) ライセンス: Link先を確認	Aseem Arora, Shabbirhussain Bhaisaheb, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff	(参考訳) Text-to-SQLセマンティックパーシングのクロスドメインとクロスコンポーネントの一般化は難しい課題である。既存のLarge Language Model (LLM) ベースのソリューションは、自然言語(NL)テストクエリ毎に実行時のプロンプトを合成するために、トレーニングセットから少数ショットの例の推論時検索に依存する。対照的に、トレーニングデータから最小限の少数のショットをオフラインでサンプリングするアルゴリズムを考案し、SQL節、演算子、関数を完全にカバーし、許容トークン長内でのドメインカバレッジを最大化する。これにより、固定されたジェネリック・プロンプト (GP) の合成が可能となり、NLテストクエリに共通する様々な例のセットで、高価なテストタイムの例検索を避けることができる。さらに、GPをターゲットデータベース領域(DA-GP)に自動適応させ、クロスドメインの一般化をよりうまく処理し、次いで、クロスコンポジションの一般化を扱うために分解されたLast-To-Most-Prompting(LTMP-DA-GP)を処理します。 LTMP-DA-GPの合成はオフラインタスクであり、人間の介入を最小限に抑えた新しいデータベースに対して1回ずつ実行される。提案手法は,テキストからSQLへのタスクの一般化性を評価するために設計されたKaggleDBQAデータセット上で,優れた性能を示す。さらに,GP 上での LTMP-DA-GP の性能改善を LLM や KaggleDBQA のデータベース上で一貫した性能向上を示し,本手法の有効性とモデルに依存しない利点を強調した。 Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set-of few-shots from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set-of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization; followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. 
The synthesis of LTMP-DA-GP is an offline task, to be performed one-time per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of our prompt based adapt and decompose approach.	翻訳日:2023-08-09 15:52:35 公開日:2023-08-08
# 差分プライベートなパーソナライズドレコメンデーションの高精度測定のためのランダム化アルゴリズム Randomized algorithms for precise measurement of differentially-private, personalized recommendations ( http://arxiv.org/abs/2308.03735v2 ) ライセンス: Link先を確認	Allegra Laro, Yanqing Chen, Hao He, Babak Aghazadeh	(参考訳) パーソナライズドレコメンデーションは、今日のインターネットエコシステムの重要な部分を形成し、アーティストやクリエーターが興味のあるユーザーにリーチすることを支援し、ユーザーが新しく魅力的なコンテンツを見つけるのを助ける。しかし、今日の多くのユーザーは、個人データとデータプライバシーが歴史的に不注意に扱われてきたこともあり、レコメンデーションをパーソナライズするプラットフォームに懐疑的である。現在、パーソナライズドレコメンデーションに依存している企業は、システムの多くをプライバシ優先に作り直さなければならない、新たなパラダイムに移行している。本稿では,高精度かつ差分プライベートな測定を可能にするパーソナライズドレコメンデーションのアルゴリズムを提案する。広告をサンプルアプリケーションとして検討し,提案したプライバシー保護アルゴリズムがユーザエクスペリエンス,広告主価値,プラットフォーム収益に関連する重要な指標にどのように影響するかを,オフライン実験により,(プライベートだが)非パーソナライズな実装と,非プライベートだがパーソナライズされた実装という両極端と比較して定量化する。 Personalized recommendations form an important part of today's internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be privacy-first. In this article, we propose an algorithm for personalized recommendations that facilitates both precise and differentially-private measurement. We consider advertising as an example application, and conduct offline experiments to quantify how the proposed privacy-preserving algorithm affects key metrics related to user experience, advertiser value, and platform revenue compared to the extremes of both (private) non-personalized and non-private, personalized implementations.	翻訳日:2023-08-09 15:46:55 公開日:2023-08-08
# 分散画像セマンティクス無線伝送のための通信効率の高いフレームワーク Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission ( http://arxiv.org/abs/2308.03713v2 ) ライセンス: Link先を確認	Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Derrick Wing Kwan Ng, Wenjun Zhang	(参考訳) 複数のデバイス間の通信を指すマルチノード通信は、多くのIoT(Internet-of-Things)シナリオで注目を集めている。しかし、その膨大なデータフローとタスク拡張の柔軟性は、通信効率のよい分散データ伝送フレームワークの緊急要求を引き起こした。本稿では,帯域幅削減と意味コミュニケーションのタスク適応性に着想を得て,iotデバイスを用いたマルチタスク分散画像伝送のためのflsc(federated learning-based semantic communication)フレームワークを提案する。フェデレートラーニングにより、各ユーザの独立したセマンティックコミュニケーションリンクの設計が可能となり、グローバルアグリゲーションによるセマンティック抽出とタスクパフォーマンスがさらに向上する。 FLSCの各リンクは、階層型視覚変換器(HVT)ベースの抽出器と、粗い意味抽出のためのタスク適応翻訳器と、特定のタスクに応じた意味翻訳からなる。 flscをより現実的な状態に拡張するために,チャネル状態情報に基づく複数入力多重出力伝送モジュールを設計し,チャネルフェーディングやノイズ対策を行う。シミュレーションの結果,粗い意味情報が画像レベルのタスクを処理できることが判明した。さらに、特に低信号対雑音比とチャネル帯域比の規則では、FLSCは従来の方式、例えば3dBチャネル条件で約10のピーク信号対雑音比利得よりも明らかに優れている。 Multi-node communication, which refers to the interaction among multiple devices, has attracted lots of attention in many Internet-of-Things (IoT) scenarios. However, its huge amounts of data flows and inflexibility for task extension have triggered the urgent requirement of communication-efficient distributed data transmission frameworks. In this paper, inspired by the great superiorities on bandwidth reduction and task adaptation of semantic communications, we propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices. Federated learning enables the design of independent semantic communication link of each user while further improves the semantic extraction and task performance through global aggregation. Each link in FLSC is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator for coarse-to-fine semantic extraction and meaning translation according to specific tasks. 
In order to extend the FLSC into more realistic conditions, we design a channel state information-based multiple-input multiple-output transmission module to combat channel fading and noise. Simulation results show that the coarse semantic information can deal with a range of image-level tasks. Moreover, especially in low signal-to-noise ratio and channel bandwidth ratio regimes, FLSC evidently outperforms the traditional scheme, e.g. about 10 peak signal-to-noise ratio gain in the 3 dB channel condition.	翻訳日:2023-08-09 15:46:33 公開日:2023-08-08
# スクリーンベース3次元主観実験ソフトウェア Screen-based 3D Subjective Experiment Software ( http://arxiv.org/abs/2308.03698v2 ) ライセンス: Link先を確認	Songlin Fan and Wei Gao	(参考訳) 近年,多岐にわたる3dグラフィックス(ポイントクラウドやメッシュなど)が学界や産業から,主観的実験を行うことでその知覚的品質を評価するための多大な努力を集めている。しかし、3Dの主観的実験のための便利なソフトウェアがないため、3Dグラフィック品質評価データセットの構築が複雑になり、関連する分野の繁栄を妨げる。本稿では,ユーザが柔軟に3dの主観的方法論を設計でき,高品質なデータセットを構築することができる強力なプラットフォームを開発し,幅広い3dグラフィックの主観的品質研究を可能にした。 3d刺激の知覚的品質差を正確に示すために,本ソフトウェアは音源刺激と刺激障害を同時に描画し,両刺激が同時反応することを可能にする。アマチュアの3d可視化ツールや画像/ビデオレンダリング方式と比較すると,主観実験時の認知的過負荷を最小限に抑えながら,典型的な3dアプリケーションを具現化する。提案するソフトウェアの有効性を検証するために,40名を対象に主観実験を行った。実験分析により,本ソフトウェアにおける主観的テストが3dモデルの合理的主観的品質スコアを生成できることが示されている。この論文のすべてのリソースはhttps://openi.pcl.ac.cn/OpenDatasets/3DQAで見ることができる。 Recently, widespread 3D graphics (e.g., point clouds and meshes) have drawn considerable efforts from academia and industry to assess their perceptual quality by conducting subjective experiments. However, lacking a handy software for 3D subjective experiments complicates the construction of 3D graphics quality assessment datasets, thus hindering the prosperity of relevant fields. In this paper, we develop a powerful platform with which users can flexibly design their 3D subjective methodologies and build high-quality datasets, easing a broad spectrum of 3D graphics subjective quality study. To accurately illustrate the perceptual quality differences of 3D stimuli, our software can simultaneously render the source stimulus and impaired stimulus and allows both stimuli to respond synchronously to viewer interactions. Compared with amateur 3D visualization tool-based or image/video rendering-based schemes, our approach embodies typical 3D applications while minimizing cognitive overload during subjective experiments. We organized a subjective experiment involving 40 participants to verify the validity of the proposed software. Experimental analyses demonstrate that subjective tests on our software can produce reasonable subjective quality scores of 3D models. 
All resources in this paper can be found at https://openi.pcl.ac.cn/OpenDatasets/3DQA.	翻訳日:2023-08-09 15:46:05 公開日:2023-08-08
# MedMine: 薬剤マイニングにおける事前学習言語モデルの検討 MedMine: Examining Pre-trained Language Models on Medication Mining ( http://arxiv.org/abs/2308.03629v2 ) ライセンス: Link先を確認	Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic	(参考訳) 臨床およびバイオメディカルテキストからの薬剤の自動マイニングは、医療アプリケーションへの実際の影響と、近年の強力な言語モデル(LM)の発展により、一般的な話題となっている。しかし、完全自動抽出モデルには、臨床実践に直接デプロイしてより大きな効果を得るために克服すべき障害が依然として残っている。このような障害には、異なるエンティティタイプや臨床イベントに対する不均衡なパフォーマンスが含まれる。本研究では,モノリンガルモデルMed7や多言語大規模言語モデル(LLM)XLM-RoBERTaなどの微調整により,現在の最先端の事前学習言語モデル(PLM)をこれらのタスクで検証する。 n2c2-2018チャレンジの薬剤マイニング共有タスクデータセットを用いて,それらの利点と欠点を比較した。これらの微調整実験から得られた知見を報告し,出力の組み合わせ,モデルのマージ,アンサンブル学習やデータ拡張による全体的な精度の向上といった今後の研究に役立てる。 MedMineはM3 Initiative https://github.com/HECTA-UoM/M3 の一部である。 Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanced performances on different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks, via fine-tuning including the monolingual model Med7 and multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from n2c2-2018 challenges. We report the findings we get from these fine-tuning experiments such that they can facilitate future research on addressing them, for instance, how to combine their outputs, merge such models, or improve their overall accuracy by ensemble learning and data augmentation. MedMine is part of the M3 Initiative https://github.com/HECTA-UoM/M3	翻訳日:2023-08-09 15:45:32 公開日:2023-08-08
# GPT-3のトポロジカル解釈 Topological Interpretations of GPT-3 ( http://arxiv.org/abs/2308.03565v2 ) ライセンス: Link先を確認	Tianyi Sun and Bradley Nelson	(参考訳) 文ベクトルと文の意味的意味の相関関係を導出する一貫した方法を検討するための実験的検討である。我々はまず,GPT-3,Word2Vec,Sentence-BERTの3つの最先端単語/文埋め込み手法を用いて,平文文字列を高次元空間に埋め込む。次に、埋め込み空間における2つの文ベクトルの任意の組合せ間の対距離を計算し、それらを行列にマッピングする。各距離行列に基づいて、埋め込み空間における他の文ベクトルに対する文ベクトルの距離の相関を計算する。次に、距離行列の各対の相関を計算する。異なる埋め込み空間における同じ文の相関と同一埋め込み空間における異なる文の相関を観察した。これらの観察は私たちの仮説と一致し、次の段階へと進む。 This is an experiential study of investigating a consistent method for deriving the correlation between sentence vector and semantic meaning of a sentence. We first used three state-of-the-art word/sentence embedding methods including GPT-3, Word2Vec, and Sentence-BERT, to embed plain text sentence strings into high dimensional spaces. Then we compute the pairwise distance between any possible combination of two sentence vectors in an embedding space and map them into a matrix. Based on each distance matrix, we compute the correlation of distances of a sentence vector with respect to the other sentence vectors in an embedding space. Then we compute the correlation of each pair of the distance matrices. We observed correlations of the same sentence in different embedding spaces and correlations of different sentences in the same embedding space. These observations are consistent with our hypothesis and take us to the next stage.	翻訳日:2023-08-09 15:45:00 公開日:2023-08-08
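The procedure in the entry above (embed sentences, build a pairwise distance matrix per embedding space, then correlate the matrices) can be sketched on toy data. The abstract names neither the distance metric nor the correlation measure, so Euclidean distance and Pearson correlation are assumed here. The vectors are hypothetical stand-ins for real GPT-3, Word2Vec, or Sentence-BERT embeddings.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of vectors.

    This is the upper triangle of the distance matrix, flattened.
    """
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical embeddings of the same three sentences in two spaces;
# space_b is space_a scaled by 2 and padded with an extra dimension.
space_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
space_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

dist_a = pairwise_distances(space_a)
dist_b = pairwise_distances(space_b)
similarity = pearson(dist_a, dist_b)
```

Because the second space is a uniform rescaling of the first, the two sets of distances are perfectly linearly related. Real embedding spaces would be expected to agree less cleanly.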
# 高速インタラクティブセグメンテーションのための特徴デカップリング・リサイクリングネットワーク Feature Decoupling-Recycling Network for Fast Interactive Segmentation ( http://arxiv.org/abs/2308.03529v2 ) ライセンス: Link先を確認	Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei	(参考訳) 近年のインタラクティブセグメンテーション手法は, 画像の不変性を考慮せずに, 画像, ユーザガイダンス, 従来予測されていたマスクを入力とする。その結果、各インタラクションにおいて、ソース画像から特徴抽出が繰り返され、実質的な計算冗長性が生じる。本稿では,本研究で提案するfdrn(feature decoupling-recycling network)を提案する。これにより、インタラクティブプロセス全体の効率を大幅に改善することができる。具体的には,3つの相違点に対処するために,3つの視点からDecoupling-Recycling戦略を適用する。まず,2種類の入力領域を別々に処理するために,ユーザガイダンスの符号化からソース画像意味学の学習を分離する。第二に、FDRNは階層化された意味表現から高レベルの特徴と低レベルの特徴を分離し、特徴学習を強化する。第3に、ユーザガイダンスのエンコーディング中に、現在のユーザガイダンスが履歴ガイダンスから切り離され、現在のユーザガイダンスの効果が強調される。異なるドメインとモダリティから得られた6つのデータセットに関する広範な実験を行い、以下のモデルの有効性を実証する。 1) 他の方法よりも優れた効率性,特に長期的インタラクション(最大4.25倍の速度)を必要とする困難なシナリオにおいて有利であり,かつ,良好なセグメンテーション性能を達成する。 2) ユニバーサルエンハンスメント技術としての様々な方法への強い適用性 3) 医用画像のセグメンテーションや誤解を招くユーザガイダンスに対するロバスト性など,優れたクロスタスク汎用性。 Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input without considering the invariant nature of the source image. As a result, extracting features from the source image is repeated in each interaction, resulting in substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies and then recycles components for each user interaction. Thus, the efficiency of the whole interactive process can be significantly improved. To be specific, we apply the Decoupling-Recycling strategy from three perspectives to address three types of discrepancies, respectively. First, our model decouples the learning of source image semantics from the encoding of user guidance to process two types of input domains separately. Second, FDRN decouples high-level and low-level features from stratified semantic representations to enhance feature learning. 
Third, during the encoding of user guidance, current user guidance is decoupled from historical guidance to highlight the effect of current user guidance. We conduct extensive experiments on 6 datasets from different domains and modalities, which demonstrate the following merits of our model: 1) superior efficiency than other methods, particularly advantageous in challenging scenarios requiring long-term interactions (up to 4.25x faster), while achieving favorable segmentation performance; 2) strong applicability to various methods serving as a universal enhancement technique; 3) well cross-task generalizability, e.g., to medical image segmentation, and robustness against misleading user guidance.	翻訳日:2023-08-09 15:44:29 公開日:2023-08-08
# DiffSynth:リアルタイムビデオ合成のための遅延インイテレーションデクリッカ DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis ( http://arxiv.org/abs/2308.03463v2 ) ライセンス: Link先を確認	Zhongjie Duan, Lizhou You, Chengyu Wang, Cen Chen, Ziheng Wu, Weining Qian, Jun Huang, Fei Chao	(参考訳) 近年、拡散モデルが画像合成における最も強力なアプローチとして登場している。しかし、これらのモデルをビデオ合成に直接適用することは、しばしば目立ったフリックングコンテンツにつながるため、課題となる。最近提案されたゼロショット法は、フリックをある程度緩和するが、コヒーレントなビデオを生成するのに苦労している。本稿では,画像合成パイプラインをビデオ合成パイプラインに変換する新しい手法であるDiffSynthを提案する。 DiffSynthは2つの重要なコンポーネントで構成されている。潜像デクリッカリングフレームワークは、拡散モデルの潜像空間にビデオデクリッカリングを適用し、中間ステップにおけるフレッカの蓄積を効果的に防止する。さらに、異なるフレーム内のオブジェクトをリマップし、それらをブレンドしてビデオ一貫性を高める、patch blending algorithmというビデオデクリッカーアルゴリズムを提案する。 diffsynthの顕著な利点の1つは、テキスト誘導ビデオスタイライゼーション、ファッションビデオ合成、画像誘導ビデオスタイライゼーション、ビデオ復元、および3dレンダリングなど、様々なビデオ合成タスクへの一般的な適用である。テキスト誘導型ビデオスタイリングのタスクでは,チェリーピッキングなしで高品質な映像を合成することができる。実験結果はDiffSynthの有効性を示した。すべてのビデオはプロジェクトのページで見ることができる。ソースコードもリリースされる予定だ。 In recent years, diffusion models have emerged as the most powerful approach in image synthesis. However, applying these models directly to video synthesis presents challenges, as it often leads to noticeable flickering contents. Although recently proposed zero-shot methods can alleviate flicker to some extent, we still struggle to generate coherent videos. In this paper, we propose DiffSynth, a novel approach that aims to convert image synthesis pipelines to video synthesis pipelines. DiffSynth consists of two key components: a latent in-iteration deflickering framework and a video deflickering algorithm. The latent in-iteration deflickering framework applies video deflickering to the latent space of diffusion models, effectively preventing flicker accumulation in intermediate steps. Additionally, we propose a video deflickering algorithm, named patch blending algorithm, that remaps objects in different frames and blends them together to enhance video consistency. 
One of the notable advantages of DiffSynth is its general applicability to various video synthesis tasks, including text-guided video stylization, fashion video synthesis, image-guided video stylization, video restoring, and 3D rendering. In the task of text-guided video stylization, we make it possible to synthesize high-quality videos without cherry-picking. The experimental results demonstrate the effectiveness of DiffSynth. All videos can be viewed on our project page. Source codes will also be released.	翻訳日:2023-08-09 15:43:39 公開日:2023-08-08
# paif:攻撃耐性を持つセマンティクスセグメンテーションのための知覚認識型赤外可視画像融合 PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation ( http://arxiv.org/abs/2308.03979v1 ) ライセンス: Link先を確認	Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, Risheng Liu	(参考訳) 赤外線および可視画像融合は、下流意味知覚タスクのための異なるモダリティからの補完情報を結合する強力な技術である。既存の学習ベースの手法は優れた性能を示すが、敵攻撃の固有の脆弱性に悩まされており、精度が著しく低下する。本研究では, 対向場面におけるセグメンテーションの堅牢性を促進するために, 知覚認識融合フレームワークを提案する。まず画像融合の成分を系統的に解析し, 対向摂動下でのセグメンテーションの堅牢性との関係について検討する。これらの分析に基づいて,標準精度とロバスト性のバランスをとるために,分解構造を用いた調和型アーキテクチャ探索を提案する。また,画像融合のパラメータロバスト性を改善するための適応型学習手法を提案する。したがって、画像融合の目標(つまり、ソースモダリティから相補的な特徴を抽出し、攻撃を防御する)は、アーキテクチャと学習戦略の観点から実現することができる。広範な実験結果から,本手法は,競争相手に比べて15.3%のセグメンテーションが向上し,ロバスト性が大幅に向上することが示された。ソースコードはhttps://github.com/liuzhu-cv/paifで入手できる。 Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance, but are suffering from the inherent vulnerability of adversarial attacks, causing a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses about the components of image fusion, investigating the correlation with segmentation robustness under adversarial perturbations. Based on these analyses, we propose a harmonized architecture search with a decomposition-based structure to balance standard accuracy and robustness. We also propose an adaptive learning strategy to improve the parameter robustness of image fusion, which can learn effective feature extraction under diverse adversarial perturbations. Thus, the goals of image fusion (\textit{i.e.,} extracting complementary features from source modalities and defending attack) can be realized from the perspectives of architectural and learning strategies. 
Extensive experimental results demonstrate that our scheme substantially enhances the robustness, with gains of 15.3% mIOU of segmentation in the adversarial scene, compared with advanced competitors. The source codes are available at https://github.com/LiuZhu-CV/PAIF.	翻訳日:2023-08-09 14:36:38 公開日:2023-08-08
# PUG:表現学習のためのフォトリアリスティックでセマンティックに制御可能な合成データ PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning ( http://arxiv.org/abs/2308.03977v1 ) ライセンス: Link先を確認	Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos	(参考訳) 合成画像データセットは、ディープニューラルネットワークの設計と評価に比類のない利点を提供する。 (i) 必要なだけ多くのデータサンプルをレンダリングする。 (ii)各場面を精密に制御し、細かな正解ラベル(及びキャプション)を付与する。 (iii)健全な実験のために関心のある変数を分離できるよう、トレーニングとテストの間における分布シフトを正確に制御する。このような約束にもかかわらず、合成画像データの使用は、主に写実性の欠如のため、依然として制限されている。それゆえ、ほとんどの作品は実際の画像のデータセットに依存しており、それはインターネット上の公開画像からスクレイピングされたものであることが多く、プライバシー、バイアス、著作権に関して問題があり、オブジェクトが正確にどのように現れるかはほとんど制御できない。本研究では,フォトリアリスティックな合成データの利用を民主化する手法を提案する。我々は,制御可能性と写実性の両方を提供する表現学習研究のための新しい世代の対話環境を開発する。私たちはエンタテインメント業界でよく知られた強力なゲームエンジンであるUnreal Engineを使用して、表現学習のためにPUG(Photorealistic Unreal Graphics)環境とデータセットを作成しています。本稿では,より厳密な視覚モデル評価を可能にするPUGの可能性を示す。 Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited -- and often played down -- mainly due to their lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet, and may have issues with regard to privacy, bias, and copyright, while offering little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research, that offer both controllability and realism. 
We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.	翻訳日:2023-08-09 14:36:13 公開日:2023-08-08
# qutritシステムにおける時間依存デコヒーレンス率の最適化とコヒーレント制御 Optimization of Time-Dependent Decoherence Rates and Coherent Control for a Qutrit System ( http://arxiv.org/abs/2308.03976v1 ) ライセンス: Link先を確認	Oleg Morzhin, Alexander Pechen	(参考訳) この研究は、密度行列 $\rho(t)$ の進化がGorini-Kossakowski-Sudarshan-Lindbladマスター方程式と同時コヒーレント(ハミルトニアン)と非コヒーレント(散逸のスーパーオペレーター)によって制御されるオープンqutrit系を考える。非コヒーレント制御は、特定の制御方法で時間や明確な物理力学内でのデコヒーレンス率に依存する。系の最終状態 $\rho(T)$ と与えられた目標状態 $\rho_{\rm target}$ との重なりを最大化する問題と、これらの状態間の2乗ヒルベルト-シュミット距離を最小化する問題を考える。両問題について実数化を行い, 対応するポントリャーギン関数, 随伴系(両終端目標の2つの場合), 目標の勾配を導出し, 1段階, 2段階, 3段階の勾配投影法を適用した。重なりを最大化する問題に対しては、正則化一階Krotov法も適用する。数値実験では,まず,手法の動作を解析し,次に得られた制御過程を,非コヒーレント制御による資源としての環境という観点から考察した。 The work considers an open qutrit system whose density matrix $\rho(t)$ evolution is governed by the Gorini-Kossakowski-Sudarshan-Lindblad master equation with simultaneous coherent (in the Hamiltonian) and incoherent (in the superoperator of dissipation) controls. Incoherent control makes the decoherence rates depend on time in a specific controlled manner and within clear physical mechanics. We consider the problem of maximizing the Hilbert-Schmidt overlap between the system's final state $\rho(T)$ and a given target state $\rho_{\rm target}$ and the problem of minimizing the squared Hilbert-Schmidt distance between these states. For both problems, we perform their realifications, derive the corresponding Pontryagin function, adjoint system (with the two cases of transversality conditions in view of the two terminal objectives), and gradients of the objectives, and adapt the one-, two-, and three-step gradient projection methods. For the problem of maximizing the overlap, we also adapt the regularized first-order Krotov method. In the numerical experiments, we analyze, first, the methods' operation and, second, the obtained control processes, with respect to considering the environment as a resource via incoherent control.	翻訳日:2023-08-09 14:35:53 公開日:2023-08-08
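As a rough illustration of the two terminal objectives named in the qutrit abstract above, the Hilbert-Schmidt overlap can be written as $\mathrm{Tr}(\rho(T)\rho_{\rm target})$ and the squared Hilbert-Schmidt distance as $\mathrm{Tr}((\rho(T)-\rho_{\rm target})^2)$ for Hermitian matrices. The sketch below only evaluates these two quantities for fixed 3x3 matrices; it does not implement the master equation, the controls, or the optimization methods, and the helper names are hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: the helper names below are hypothetical and are
# not taken from the paper. Matrices are 3x3 nested lists of numbers.

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def hs_overlap(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho * sigma); real for Hermitian inputs."""
    return trace(matmul(rho, sigma)).real

def hs_distance_sq(rho, sigma):
    """Squared Hilbert-Schmidt distance Tr((rho - sigma)^2) for Hermitian inputs."""
    n = len(rho)
    diff = [[rho[i][j] - sigma[i][j] for j in range(n)] for i in range(n)]
    return trace(matmul(diff, diff)).real

# Example states (assumed for illustration): the pure state |0><0| as target
# and the maximally mixed qutrit state as the final state.
rho_target = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
rho_mixed = [[1 / 3, 0, 0], [0, 1 / 3, 0], [0, 0, 1 / 3]]

overlap = hs_overlap(rho_mixed, rho_target)            # equals 1/3
distance_sq = hs_distance_sq(rho_mixed, rho_target)    # equals 2/3
```

The two objectives are linked by $\|\rho-\sigma\|^2 = \mathrm{Tr}\,\rho^2 + \mathrm{Tr}\,\sigma^2 - 2\,\mathrm{Tr}(\rho\sigma)$, so they coincide up to sign and constants only when the purities are fixed, which is presumably why the abstract treats them as separate problems.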
# 仮面運動モデリングによるプロンプトコントラスト:3次元動作表現学習に向けて Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning ( http://arxiv.org/abs/2308.03975v1 ) ライセンス: Link先を確認	Jiahang Zhang, Lilang Lin, Jiaying Liu	(参考訳) 自己教師型学習は骨格に基づく人間の行動理解に有効であることが証明されている。先行研究は主に、骨格関係をモデル化するために、対比学習やマスキングモーションモデリングパラダイムに依存している。しかし,これらの手法では,シーケンスレベルと共同レベルの表現学習を効果的かつ同時に行うことはできない。その結果、学習した表現は、異なる下流タスクに一般化できない。さらに、これらの2つのパラダイムをナイーブな方法で組み合わせることで、相乗効果が失われ、トレーニングの干渉につながる可能性がある。これらの問題に対処するために、多目的な3次元動作表現学習のためのMasked Motion Modeling, PCM$^{\rm 3}$を用いたPrompted Contrastを提案する。本手法は,コントラスト学習とマスキング予測タスクを相互に有益に統合することで,下流課題の一般化能力を大幅に向上させる。具体的には、マスク付き予測は、コントラスト学習のための新しいトレーニングビューを提供し、ハイレベルなセマンティック情報でマスク付き予測トレーニングをガイドする。さらに,2つの異なるプリテキストタスクを学習することによって生じる干渉を低減し,モデル表現をさらに改善するマルチタスクプリトレーニング戦略を提案する。 3つの大規模データセットに基づく5つの下流タスクの大規模な実験を行い、PCM$^{\rm 3}$が最先端の作業と比較して優れた一般化能力を示す。私たちのプロジェクトは、https://jhang2020.github.io/Projects/PCM3/PCM3.htmlで公開されています。 Self-supervised learning has proved effective for skeleton-based human action understanding, which is an important yet challenging topic. Previous works mainly rely on contrastive learning or masked motion modeling paradigm to model the skeleton relations. However, the sequence-level and joint-level representation learning cannot be effectively and simultaneously handled by these methods. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining these two paradigms in a naive manner leaves the synergy between them untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which substantially boosts the generalization capacity for various downstream tasks. 
Specifically, masked prediction provides novel training views for contrastive learning, which in turn guides the masked prediction training with high-level semantic information. Moreover, we propose a dual-prompted multi-task pretraining strategy, which further improves model representations by reducing the interference caused by learning the two different pretext tasks. Extensive experiments on five downstream tasks under three large-scale datasets are conducted, demonstrating the superior generalization capacity of PCM$^{\rm 3}$ compared to the state-of-the-art works. Our project is publicly available at: https://jhang2020.github.io/Projects/PCM3/PCM3.html .	翻訳日:2023-08-09 14:35:23 公開日:2023-08-08
# クラスタ間のコストに依存する有向非巡回グラフの最適分割 Optimal partitioning of directed acyclic graphs with dependent costs between clusters ( http://arxiv.org/abs/2308.03970v1 ) ライセンス: Link先を確認	Paul Pao-Yen Wu, Fabrizio Ruggeri, Kerrie Mengersen	(参考訳) ベイジアンネットワーク(BNs)、マルコフ過程、隠れマルコフモデル(HMMs)を含む多くの統計推論コンテキストは、基礎となる有向非巡回グラフ(DAG)をクラスタに分割することでサポートされる。しかしながら、最適化するコストはクラスタ内のノードと、親ノードや子ノードを介して接続されるクラスタ(依存クラスタと呼ぶ)のマッピングの両方に依存するため、特に統計的推論では、最適分割は困難である。本稿では,依存クラスタを用いた最適なクラスタマッピングのためのDCMAPアルゴリズムを提案する。 DAGとクラスタマッピングに基づいて任意に定義された正のコスト関数が与えられると、DCMAPは収束してすべての最適なクラスタを見つけ、途中に最適に近い解を返す。実験により,計算コスト関数を用いた海草複合体系のDBNモデルに対して,アルゴリズムは時間効率が高いことがわかった。 25ノードDBNと50ノードDBNでは、探索空間のサイズはそれぞれ$9.91\times 10^9$と$1.51\times10^{21}$通りのクラスタマッピングであったが、最適解との類似度が88%と72%の近似最適解がそれぞれ反復170と865で得られた。最初の最適解はそれぞれ反復934 $(\text{95\% CI } 926,971)$と反復2256 $(2150,2271)$で得られ、そのコストは単純なヒューリスティックのコストのそれぞれ4%と0.2%であった。 Many statistical inference contexts, including Bayesian Networks (BNs), Markov processes and Hidden Markov Models (HMMs), could be supported by partitioning (i.e., mapping) the underlying Directed Acyclic Graph (DAG) into clusters. However, optimal partitioning is challenging, especially in statistical inference as the cost to be optimised is dependent on both nodes within a cluster, and the mapping of clusters connected via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG and cluster mappings, we show that DCMAP converges to find all optimal clusters, and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a DBN model of a seagrass complex system using a computation cost function. 
For 25-node and 50-node DBNs, the search space size was $9.91\times 10^9$ and $1.51\times10^{21}$ possible cluster mappings, respectively, but near-optimal solutions with 88% and 72% similarity to the optimal solution were found at iterations 170 and 865, respectively. The first optimal solution was found at iterations 934 $(\text{95\% CI } 926,971)$ and 2256 $(2150,2271)$, with a cost that was 4% and 0.2% of the naive heuristic cost, respectively.	翻訳日:2023-08-09 14:34:56 公開日:2023-08-08
# CheXFusion:ロングテール胸部X線分類のためのトランスフォーマーを用いたマルチビュー機能の有効融合 CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification ( http://arxiv.org/abs/2308.03968v1 ) ライセンス: Link先を確認	Dongkyun Kim	(参考訳) 医用画像分類は、疾患のロングテール分布、診断所見の同時発生、各研究または患者に利用可能な複数の視点により、ユニークな課題を生んでいる。本稿ではICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-raysについて述べる。マルチビュー画像を含むトランスフォーマーベースのフュージョンモジュールであるCheXFusionを提案する。セルフアテンションとクロスアテンション機構により誘導される融合モジュールはラベル共起を考慮したマルチビュー特徴を効率的に集約する。さらに、モデルの性能を最適化するデータバランシングと自己学習手法についても検討する。提案手法はMIMIC-CXRテストセットにおいて0.372 mAPで最先端の結果を達成し,競争において第1位を確保した。この課題の成功は,マルチビュー設定,クラス不均衡,ラベル共起を考慮した医用画像分類の意義を浮き彫りにする。公開コードはhttps://github.com/dongkyuk/CXR-LT-public-solutionで入手できる。 Medical image classification poses unique challenges due to the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper introduces our solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. Our approach introduces CheXFusion, a transformer-based fusion module incorporating multi-view images. The fusion module, guided by self-attention and cross-attention mechanisms, efficiently aggregates multi-view features while considering label co-occurrence. Furthermore, we explore data balancing and self-training methods to optimize the model's performance. Our solution achieves state-of-the-art results with 0.372 mAP in the MIMIC-CXR test set, securing 1st place in the competition. Our success in the task underscores the significance of considering multi-view settings, class imbalance, and label co-occurrence in medical image classification. Public code is available at https://github.com/dongkyuk/CXR-LT-public-solution	翻訳日:2023-08-09 14:34:25 公開日:2023-08-08
# Coarse-to-Fine:単段画像検索のためのコンパクトな識別的表現の学習 Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval ( http://arxiv.org/abs/2308.04008v1 ) ライセンス: Link先を確認	Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun	(参考訳) 画像検索は、クエリ画像と視覚的に類似した画像をデータベースから見つけることを目的とする。検索・再ランク付け(retrieve-and-rerank)パラダイムに続く2段階のメソッドは優れた性能を達成しているが、それぞれのローカルモジュールとグローバルモジュールは実世界のアプリケーションでは非効率である。検索効率と精度を向上させるため、グローバル特徴とローカル特徴を融合表現に融合して単段画像検索を行う手法もある。しかし、背景、オクルージョン、視点など、対処すべき様々な状況があるため、これらは依然として困難である。本研究では,画像レベルのラベルのみを必要とするエンドツーエンドの単段画像検索のためのコンパクト識別表現 (CFCD) を学習するCoarse-to-Fineフレームワークを設計する。具体的には,各ミニバッチのスケールとマージンを動的に調整し,トレーニングやクラス内コンパクト性の向上のために徐々に強化する,適応型ソフトマックスベースロスの設計を行った。さらに,グローバルスケールでクラス間識別性を最適化するためのハードネガティブサンプリング戦略により,著名な局所記述子を注意深く選択し,詳細な意味関係をグローバル表現に注入するメカニズムを提案する。 Revisited Oxford や Revisited Paris などのベンチマークを用いて,最先端の単一ステージ画像検索性能を実現する手法の有効性を実証した。コードはhttps://github.com/bassyess/CFCDで入手できる。 Image retrieval aims to find images from a database that are visually similar to the query image. Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications. To better trade off retrieval efficiency and accuracy, some approaches fuse global and local features into a joint representation to perform single-stage image retrieval. However, they still struggle with the variety of situations to handle, e.g., background, occlusion, and viewpoint. In this work, we design a Coarse-to-Fine framework to learn Compact Discriminative representation (CFCD) for end-to-end single-stage image retrieval, requiring only image-level labels. Specifically, we first design a novel adaptive softmax-based loss which dynamically tunes its scale and margin within each mini-batch and increases them progressively to strengthen supervision during training and intra-class compactness. 
Furthermore, we propose a mechanism which attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation by a hard negative sampling strategy to optimize inter-class distinctiveness at a global scale. Extensive experimental results have demonstrated the effectiveness of our method, which achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris. Code is available at https://github.com/bassyess/CFCD.	翻訳日:2023-08-09 14:28:39 公開日:2023-08-08
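The CFCD abstract above mentions a softmax-based loss with a scale and a margin. The following is a minimal sketch of a generic margin-based softmax loss over cosine similarities, with the scale and margin passed in as plain arguments. It is an assumption-laden illustration: the abstract does not give the rule by which the authors tune scale and margin per mini-batch, so that adaptive part is not reproduced here, and the function name is hypothetical.

```python
import math

def margin_softmax_loss(cosines, target, scale, margin):
    """Cross-entropy over scaled cosine logits with a margin on the target class.

    cosines: cosine similarities between one embedding and each class centre.
    target:  index of the ground-truth class.
    scale, margin: hyperparameters; the abstract says they are tuned within
                   each mini-batch, but that tuning rule is not modelled here.
    """
    logits = [scale * (c - margin) if i == target else scale * c
              for i, c in enumerate(cosines)]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Hypothetical similarities of one sample to three class centres.
cosines = [0.8, 0.3, -0.1]
loss_no_margin = margin_softmax_loss(cosines, target=0, scale=16.0, margin=0.0)
loss_with_margin = margin_softmax_loss(cosines, target=0, scale=16.0, margin=0.2)
```

With the same similarities, a positive margin lowers the target logit and therefore raises the loss, which pushes the embedding to sit closer to its own class centre; a larger scale sharpens the softmax. Increasing both over training would strengthen supervision in the way the abstract describes, though the exact schedule is the authors' own.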
# 視覚言語モデルを用いた単純な形状・テクスチャのテキスト記述子による少数ショット医用画像分類 Few-shot medical image classification with simple shape and texture text descriptors using vision-language models ( http://arxiv.org/abs/2308.04005v1 ) ライセンス: Link先を確認	Michal Byra, Muhammad Febrian Rachmadi, Henrik Skibbe	(参考訳) 本研究では,医用画像の二値少数ショット分類における視覚言語モデル (VLM) と大規模言語モデルの有用性について検討した。 GPT-4モデルを用いて,医療画像中の物体の形状とテクスチャ特性をカプセル化したテキスト記述子を生成する。次に、これらのGPT-4生成ディスクリプタと、自然画像に事前訓練されたVLMを用いて、胸部X線および乳房超音波画像の分類を行う。以上の結果から,VLMとGPT-4生成ディスクリプタを用いた医用画像の少数ショット分類が可能であることが示唆された。しかし、正確な分類は、ある記述子を分類スコアの計算から除外する必要がある。さらに,乳房腫瘤の超音波画像におけるVLMの形状特徴評価能について検討した。さらに, GPT-4 で生成したテキスト記述子の集合間の変動度について検討する。本研究は,医用画像解析へのVLMの適用について,いくつかの重要な知見を提供する。 In this work, we investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images. We utilize the GPT-4 model to generate text descriptors that encapsulate the shape and texture characteristics of objects in medical images. Subsequently, these GPT-4 generated descriptors, alongside VLMs pre-trained on natural images, are employed to classify chest X-rays and breast ultrasound images. Our results indicate that few-shot classification of medical images using VLMs and GPT-4 generated descriptors is a viable approach. However, accurate classification requires excluding certain descriptors from the calculations of the classification scores. Moreover, we assess the ability of VLMs to evaluate shape features in breast mass ultrasound images. We further investigate the degree of variability among the sets of text descriptors produced by GPT-4. Our work provides several important insights about the application of VLMs for medical image analysis.	翻訳日:2023-08-09 14:27:58 公開日:2023-08-08
# 構造化背景知識と演繹的推論を用いたCNN隠れニューロン活性化の理解 Understanding CNN Hidden Neuron Activations using Structured Background Knowledge and Deductive Reasoning ( http://arxiv.org/abs/2308.03999v1 ) ライセンス: Link先を確認	Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, Eugene Vasserman, Pascal Hitzler	(参考訳) 説明可能なAIにおける大きな課題は、隠れニューロンの活性化を正しく解釈することである。正確な解釈は、深層学習システムが入力に関係していると内部的に何が検出されているかについての洞察を与え、深層学習システムのブラックボックス的な性質を解明する。現在の技術水準は、隠れたノードの活性化は、場合によっては人間にとって意味のある方法で解釈可能であるが、隠れたニューロンの活性化の解釈を仮説化し検証できる体系的な自動化手法は、十分に研究されていないことを示している。本稿では,そのような方法を提供し,意味のある解釈を提供することを示す。提案手法は,ウィキペディアの概念階層から学習した約200万クラスの大規模バックグラウンド知識と,セマンティックWeb分野のアプリケーション向けに開発された記述論理に基づく概念帰納(Concept Induction)と呼ばれるシンボリック推論手法をベースとする。以上より,畳み込みニューラルネットワークの密集層内の個々のニューロンに,背景知識から有意なラベルを仮説と検証プロセスを通じて自動的に付加できることを示す。 A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would provide insights into the question of what a deep learning system has internally detected as relevant on the input, de-mystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. Our approach is based on using large-scale background knowledge of approximately 2 million classes curated from the Wikipedia concept hierarchy together with a symbolic reasoning approach called Concept Induction based on description logics, originally developed for applications in the Semantic Web field. Our results show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.	翻訳日:2023-08-09 14:27:16 公開日:2023-08-08
# オープンフィールド環境におけるロボットハーベスティングのための改良型YOLOv5sアーキテクチャに基づくリアルタイムイチゴ検出 Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in open-field environment ( http://arxiv.org/abs/2308.03998v1 ) ライセンス: Link先を確認	Zixuan He (1)(2), Salik Ram Khana (1)(2), Xin Zhang (3), Manoj Karkee (1)(2), Qin Zhang (1)(2) ((1) Center for Precision and Automated Agricultural Systems, Washington State University, (2) Department of Biological Systems Engineering, Washington State University, (3) Department of Agricultural and Biological Engineering, Mississippi State University)	(参考訳) 本研究では、屋外環境下でイチゴを検知するYOLOv5を用いたカスタムオブジェクト検出モデルを提案する。 YOLOv5sの当初のアーキテクチャは、C3モジュールをバックボーンネットワークのC2fモジュールに置き換えることで変更され、より優れた機能勾配フローを提供した。第2に, YOLOv5sのバックボーンネットワークの最終層におけるSpatial Pyramid Pooling FastをCross Stage Partial Netと組み合わせて, イチゴデータセットの一般化能力を向上した。提案されたアーキテクチャはYOLOv5s-Strawと名付けられた。 3つの成熟度クラス(未熟、ほぼ成熟、成熟)を持つイチゴキャノピーのRGB画像データセットは、オープンフィールド環境で収集され、輝度の低下、輝度の増大、ノイズの追加を含む一連の操作によって拡張された。オープンフィールド環境におけるイチゴ検出手法の優位性を検証するため、4つの競合検出モデル(YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, YOLOv8s)をトレーニングし、同じ計算環境下でテストし、YOLOv5s-Strawと比較した。その結果、mean average precisionは提案アーキテクチャで最高の80.3%となり、YOLOv3-tiny、YOLOv5s、YOLOv5s-C2f、YOLOv8sではそれぞれ73.4%、77.8%、79.8%、79.3%であった。具体的には、YOLOv5s-Strawの平均精度は未熟なクラスで82.1%、ほぼ成熟したクラスで73.5%、成熟したクラスで86.6%であり、最新のYOLOv8sよりそれぞれ2.3%と3.7%高かった。モデルには$8.6\times10^6$のネットワークパラメータがあり、1画像あたりの推論速度は18msであったのに対し、YOLOv8sは推論速度が21.0msと遅く、パラメータ数も$11.1\times10^6$と多かった。 This study proposed a YOLOv5-based custom object detection model to detect strawberries in an outdoor environment. The original architecture of the YOLOv5s was modified by replacing the C3 module with the C2f module in the backbone network, which provided a better feature gradient flow. Secondly, the Spatial Pyramid Pooling Fast in the final layer of the backbone network of YOLOv5s was combined with Cross Stage Partial Net to improve the generalization ability over the strawberry dataset in this study. The proposed architecture was named YOLOv5s-Straw. 
The RGB image dataset of the strawberry canopy with three maturity classes (immature, nearly mature, and mature) was collected in an open-field environment and augmented through a series of operations including brightness reduction, brightness increase, and noise addition. To verify the superiority of the proposed method for strawberry detection in an open-field environment, four competitive detection models (YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s) were trained and tested under the same computational environment and compared with YOLOv5s-Straw. The results showed that the highest mean average precision of 80.3% was achieved using the proposed architecture, whereas YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s achieved 73.4%, 77.8%, 79.8%, and 79.3%, respectively. Specifically, the average precision of YOLOv5s-Straw was 82.1% in the immature class, 73.5% in the nearly mature class, and 86.6% in the mature class, which were 2.3% and 3.7%, respectively, higher than that of the latest YOLOv8s. The model included $8.6\times10^6$ network parameters with an inference speed of 18 ms per image, while YOLOv8s had a slower inference speed of 21.0 ms and a heavier parameter count of $11.1\times10^6$, which indicates that the proposed model is fast enough for real-time strawberry detection and localization for robotic picking.	翻訳日:2023-08-09 14:26:45 公開日:2023-08-08
# 宇宙空間統合ネットワークにおける資源管理のための協調型マルチエージェント深層強化学習 Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks ( http://arxiv.org/abs/2308.03995v1 ) ライセンス: Link先を確認	Hengxi Zhang, Huaze Tang, Wenbo Ding, Xiao-Ping Zhang	(参考訳) sagin(space-air-ground integrated network)は、低軌道(leo)衛星、無人航空機(uavs)、地上ユーザー(gus)を含む異種デバイスを統合することで、スマートシティの応用を前進させることを約束している。しかし、SAGINの資源管理は、不適切な資源管理がデータ伝達の貧弱を招き、スマートシティのサービスに影響を及ぼすという緊急の研究を必要とする課題である。本稿では,5つの異なる通信リンクを含む総合的なSAGINシステムを開発し,資源管理問題に対処する効率的な協調型マルチエージェント深層強化学習(CMT-MARL)手法を提案する。実験結果は,提案するcmt-marlの有効性を強調するものである。これらの結果は、将来のSAGINの実装の可能性と実現可能性を示している。 The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study in that inappropriate resource management will cause poor data transmission, and hence affect the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.	翻訳日:2023-08-09 14:25:59 公開日:2023-08-08
# マルチロール教育エージェントとしてのAIチャットボット:CS教育におけるエンゲージメントの変容 AI Chatbots as Multi-Role Pedagogical Agents: Transforming Engagement in CS Education ( http://arxiv.org/abs/2308.03992v1 ) ライセンス: Link先を確認	Cassie Chen Cao, Zijian Ding, Jionghao Lin, Frank Hopfgartner	(参考訳) 本研究では,人工知能(ai)を活用したマルチロールチャットボットを,学習経験の向上とコンピュータサイエンス教育への取り組みの促進に活用する。デザインに基づく研究アプローチを活用し、インストラクターボット、ピアボット、キャリアアドバイザボット、感情支援ボットという4つの異なる役割を持つ新しい学習環境を開発し、実装し、評価する。これらの役割は、自己決定理論の信条に基づいて設計され、能力、自律性、関連性という、学習者の生来の心理学的ニーズを満たしている。さらに、このシステムは質問に基づく学習パラダイムを採用し、学生に質問をし、解決策を求め、その好奇心を探求するよう促す。我々は,このシステムを,200人の参加学生を対象に,1ヶ月の高等教育状況下でテストし,人間教師と1人のチャットボットを含む条件と比較した。本研究は,チャットログのシーケンス分析や,調査やフォーカスグループインタビューなど質的手法などの定量的手法を取り入れた混合手法を用いた。トピックモデリングや感情分析などの最先端自然言語処理技術を統合することにより,学習者の関与,動機づけ,質問に基づく学習に対するシステムの影響を深く理解する。この研究は、厳格な設計と革新的アプローチを通じて、コンピュータサイエンス教育の風景を形作り、熱心で支援的でモチベーションのある学習環境を育む上で、AIを駆使したマルチロールチャットボットの可能性に関する重要な洞察を提供する。 This study investigates the use of Artificial Intelligence (AI)-powered, multi-role chatbots as a means to enhance learning experiences and foster engagement in computer science education. Leveraging a design-based research approach, we develop, implement, and evaluate a novel learning environment enriched with four distinct chatbot roles: Instructor Bot, Peer Bot, Career Advising Bot, and Emotional Supporter Bot. These roles, designed around the tenets of Self-Determination Theory, cater to the three innate psychological needs of learners - competence, autonomy, and relatedness. Additionally, the system embraces an inquiry-based learning paradigm, encouraging students to ask questions, seek solutions, and explore their curiosities. We test this system in a higher education context over a period of one month with 200 participating students, comparing outcomes with conditions involving a human tutor and a single chatbot. Our research utilizes a mixed-methods approach, encompassing quantitative measures such as chat log sequence analysis, and qualitative methods including surveys and focus group interviews. 
By integrating cutting-edge Natural Language Processing techniques such as topic modelling and sentiment analysis, we offer an in-depth understanding of the system's impact on learner engagement, motivation, and inquiry-based learning. This study, through its rigorous design and innovative approach, provides significant insights into the potential of AI-empowered, multi-role chatbots in reshaping the landscape of computer science education and fostering an engaging, supportive, and motivating learning environment.	翻訳日:2023-08-09 14:25:42 公開日:2023-08-08
# NEOLAF - LLMを利用したニューラルシンボリック認知アーキテクチャ NEOLAF, an LLM-powered neural-symbolic cognitive architecture ( http://arxiv.org/abs/2308.03990v1 ) ライセンス: Link先を確認	Richard Jiarui Tong, Cassie Chen Cao, Timothy Xueqian Lee, Guodong Zhao, Ray Wan, Feiyue Wang, Xiangen Hu, Robin Schmucker, Jinsheng Pan, Julian Quevedo, Yu Lu	(参考訳) 本稿では,知的なエージェントをモデル化し構築する統合型ニューラルシンボリック認知アーキテクチャであるnever ending open learning adaptive framework(neolaf)を提案する。 NEOLAFフレームワークは、その説明可能性、漸進学習、効率性、協調的および分散学習、ヒューマン・イン・ザ・ループの実現、自己改善により、純粋接続性および純粋シンボル的アプローチよりもインテリジェントエージェントを構築する上で優れたアプローチである。さらに,課題解決エージェントとして構築されたNEOLAFエージェントを,オープンソースのMATHデータセットから複雑な数学問題に投入する,説得力のある実験を行った。その結果、NEOLAFの優れた学習能力と、認知アーキテクチャの分野や自己改善型適応型教育システムに革命をもたらす可能性を示す。 This paper presents the Never Ending Open Learning Adaptive Framework (NEOLAF), an integrated neural-symbolic cognitive architecture that models and constructs intelligent agents. The NEOLAF framework is a superior approach to constructing intelligent agents than both the pure connectionist and pure symbolic approaches due to its explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement. The paper further presents a compelling experiment where a NEOLAF agent, built as a problem-solving agent, is fed with complex math problems from the open-source MATH dataset. The results demonstrate NEOLAF's superior learning capability and its potential to revolutionize the field of cognitive architectures and self-improving adaptive instructional systems.	翻訳日:2023-08-09 14:25:14 公開日:2023-08-08
# 3次元動的都市気候のリアルタイムシミュレーションのためのフーリエニューラルオペレータ Fourier neural operator for real-time simulation of 3D dynamic urban microclimate ( http://arxiv.org/abs/2308.03985v1 ) ライセンス: Link先を確認	Wenhui Peng, Shaoxiang Qin, Senwen Yang, Jianchun Wang, Xue Liu, Liangzhu (Leon) Wang	(参考訳) 地球規模の都市化は、人間の快適性、健康、建築/都市エネルギー効率のための都市微小気候の重要性を強調している。主な環境影響として、建築設計や都市計画に大きな影響を与えている。都市が気候変動に備え、レジリエンス対策を効果的に実施するためには、地域の微気候を理解することが不可欠である。しかし、都市の微気候を分析するには、計算領域内の屋外パラメータの複雑な配列を屋内よりも長期にわたって考慮する必要がある。その結果, 都市微小気候の影響評価において, 計算流体力学(cfd)などの数値計算手法は計算コストが高くなる。ディープラーニング技術の台頭により、複雑な非線形相互作用とシステムダイナミクスのモデリングを加速する新たな機会が開けた。近年、フーリエニューラル演算子(FNO)は、部分微分方程式(PDE)の解法と流体力学系のモデリングの高速化に非常に有望であることが示されている。本研究では,FNOネットワークを実時間3次元都市風況シミュレーションに適用する。都市域のCFDシミュレーションから,半ラグランジュ的アプローチと分数ステップ法による大規模都市問題モデリングのための都市微気候特性のシミュレートによる訓練・試験データを生成する。数値実験により,fnoモデルは瞬時空間速度場を正確に再現できることがわかった。さらに,風向の異なる未確認データに基づくFNOモデルの評価を行い,FNOモデルが風向の異なるデータに対して良好に一般化可能であることを示す。さらに重要なことに、fnoアプローチはグラフィック処理ユニット上でミリ秒以内の予測を可能にし、3d動的都市気候のリアルタイムシミュレーションを可能にします。 Global urbanization has underscored the significance of urban microclimates for human comfort, health, and building/urban energy efficiency. They profoundly influence building design and urban planning as major environmental impacts. Understanding local microclimates is essential for cities to prepare for climate change and effectively implement resilience measures. However, analyzing urban microclimates requires considering a complex array of outdoor parameters within computational domains at the city scale over a longer period than indoors. As a result, numerical methods like Computational Fluid Dynamics (CFD) become computationally expensive when evaluating the impact of urban microclimates. The rise of deep learning techniques has opened new opportunities for accelerating the modeling of complex non-linear interactions and system dynamics. 
Recently, the Fourier Neural Operator (FNO) has been shown to be very promising for accelerating the solution of partial differential equations (PDEs) and the modeling of fluid dynamic systems. In this work, we apply the FNO network for real-time three-dimensional (3D) urban wind field simulation. The training and testing data are generated from CFD simulation of the urban area, based on the semi-Lagrangian approach and fractional stepping method to simulate urban microclimate features for modeling large-scale urban problems. Numerical experiments show that the FNO model can accurately reconstruct the instantaneous spatial velocity field. We further evaluate the trained FNO model on unseen data with different wind directions, and the results show that the FNO model can generalize well to different wind directions. More importantly, the FNO approach can make predictions within milliseconds on the graphics processing unit, making real-time simulation of 3D dynamic urban microclimate possible.	翻訳日:2023-08-09 14:24:57 公開日:2023-08-08
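For readers unfamiliar with the Fourier Neural Operator mentioned in the abstract above, its core building block is a spectral convolution: transform the input to Fourier space, keep only the lowest few modes, scale each kept mode by a weight, and transform back. The sketch below shows that single step in 1D with a naive discrete Fourier transform from the standard library. It is only an illustration under simplifying assumptions: the weights are fixed rather than learned, there is no pointwise linear path or nonlinearity, and it is not the authors' 3D model. All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def spectral_conv_1d(signal, weights):
    """Keep the lowest len(weights) Fourier modes and scale each by a weight.

    weights[k] multiplies mode k (k = 0 is the mean). The mirrored negative
    frequency gets the complex conjugate, so a real input gives a real output.
    """
    n = len(signal)
    if len(weights) > n // 2:
        raise ValueError("keep at most n // 2 modes")
    spectrum = dft(signal)
    filtered = [0j] * n
    for k, w in enumerate(weights):
        filtered[k] = spectrum[k] * w
        if k > 0:  # mirrored negative frequency
            filtered[n - k] = spectrum[n - k] * w.conjugate()
    return [value.real for value in idft(filtered)]

# A slow wave (mode 1) plus a faster ripple (mode 3) on 8 grid points.
n_points = 8
signal = [math.cos(2 * math.pi * t / n_points)
          + 0.5 * math.cos(2 * math.pi * 3 * t / n_points)
          for t in range(n_points)]

# Keeping modes 0 and 1 with unit weights removes the mode-3 ripple.
smooth = spectral_conv_1d(signal, [1.0, 1.0])
```

Because only a fixed number of low modes carry weights, the parameter count of such a layer does not depend on the grid resolution. In practice a fast Fourier transform would replace the naive `dft` used here.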
# SimplyRetrieve: プライベートで軽量な検索中心の生成AIツール SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool ( http://arxiv.org/abs/2308.03983v1 ) ライセンス: Link先を確認	Youyang Ng, Daisuke Miyashita, Yasuto Hoshi, Yasuhiro Morioka, Osamu Torii, Tomoya Kodama, Jun Deguchi	(参考訳) 大規模言語モデル(LLM)ベースの生成AIシステムは,近年,大きな進歩を遂げている。知識検索アーキテクチャを統合することで、追加のモデル微調整を必要とせずに、事前訓練されたLLMを使用して、プライベートデータを公開可能な生成AIシステムにシームレスに統合することができる。さらに、検索中心生成(RCG)アプローチは、文脈解釈と知識記憶におけるLLMとレトリバーの役割を明確に分離する将来的な研究方向であり、より効率的な実装につながる可能性がある。 simplyretrieveはオープンソースのツールで、機械学習コミュニティへの高度な進歩に対して、ローカライズされ、軽量で、ユーザフレンドリーなインターフェースを提供することを目標としている。 SimplyRetrieveはGUIとAPIベースのRCGプラットフォームを備えており、Private Knowledge Base ConstructorとRetrieval Tuning Moduleが支援している。これらの機能を活用することで、ユーザーはプライバシ標準を維持しながら生成AIのパフォーマンスを改善するためのRCGの可能性を探ることができる。このツールはMITライセンスでhttps://github.com/RCGAI/SimplyRetrieveで入手できる。 Large Language Model (LLM) based Generative AI systems have seen significant progress in recent years. Integrating a knowledge retrieval architecture allows for seamless integration of private data into publicly available Generative AI systems using pre-trained LLM without requiring additional model fine-tuning. Moreover, Retrieval-Centric Generation (RCG) approach, a promising future research direction that explicitly separates roles of LLMs and retrievers in context interpretation and knowledge memorization, potentially leads to more efficient implementation. SimplyRetrieve is an open-source tool with the goal of providing a localized, lightweight, and user-friendly interface to these sophisticated advancements to the machine learning community. SimplyRetrieve features a GUI and API based RCG platform, assisted by a Private Knowledge Base Constructor and a Retrieval Tuning Module. By leveraging these capabilities, users can explore the potential of RCG for improving generative AI performance while maintaining privacy standards. The tool is available at https://github.com/RCGAI/SimplyRetrieve with an MIT license.	
翻訳日:2023-08-09 14:24:30 公開日:2023-08-08
# PARTNER: LiDAR 3Dオブジェクト検出のための極座標表現のレベルアップ PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection ( http://arxiv.org/abs/2308.03982v1 ) ライセンス: Link先を確認	Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang	(参考訳) 近年、極座標ベースの表現は知覚タスクにおいて有望な性質を示している。点群を不均等に分割するデカルト座標ベースのアプローチに加えて,(1)異なる解像度下でのロバスト性能の優位性と(2)ストリーミングベースのアプローチの優位性から,点群を極座標グリッドとして表現する手法が選択肢として認識されている。しかし、極座標表現の不均一な分割のため、最先端の極座標ベースの検出法は必然的に特徴歪み問題に悩まされ、デカルト座標ベースの手法と比較して無視できない性能差が生じる。この問題に対処するため,極座標における新しい3次元物体検出器PARTNERを提案する。 PARTNERは、グローバル表現再構成による特徴歪みのジレンマを緩和し、検出ヘッドにインスタンスレベルの幾何情報を導入することで回帰を容易にする。大規模な実験は、ストリーミングベースの検出と異なる解像度において圧倒的な優位性を示している。さらに,本手法は,Waymo と ONCE の検証セットにおいて,3.68% と 9.15% の顕著なマージンで従来の極座標ベースの手法を上回っており,最先端の手法に対して競争力のある結果が得られる。 Recently, polar-based representation has shown promising properties in perceptual tasks. In addition to Cartesian-based approaches, which separate point clouds unevenly, representing point clouds as polar grids has been recognized as an alternative due to (1) its advantage in robust performance under different resolutions and (2) its superiority in streaming-based approaches. However, state-of-the-art polar-based detection methods inevitably suffer from the feature distortion problem because of the non-uniform division of polar representation, resulting in a non-negligible performance gap compared to Cartesian-based approaches. To tackle this issue, we present PARTNER, a novel 3D object detector in the polar coordinate. PARTNER alleviates the dilemma of feature distortion with global representation re-alignment and facilitates the regression by introducing instance-level geometric information into the detection head. Extensive experiments show overwhelming advantages in streaming-based detection and different resolutions. 
Furthermore, our method outperforms the previous polar-based works with remarkable margins of 3.68% and 9.15% on Waymo and ONCE validation set, thus achieving competitive results over the state-of-the-art methods.	翻訳日:2023-08-09 14:24:11 公開日:2023-08-08
# AgentSims: 大規模言語モデル評価のためのオープンソースサンドボックス AgentSims: An Open-Source Sandbox for Large Language Model Evaluation ( http://arxiv.org/abs/2308.04026v1 ) ライセンス: Link先を確認	Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, Qin Chen	(参考訳) ChatGPTライクな大規模言語モデル(LLM)がコミュニティで普及しているため、LLMの能力を評価する方法はオープンな問題である。既存の評価手法には,(1)評価できる能力が限られる,(2)ベンチマークが脆弱である,(3)指標が客観的でない,という欠点がある。 LLMエージェントがシミュレーション環境でタスクを完了するタスクベース評価は、上記の問題をすべて解決する万能なソリューションであると考える。 AgentSimsは、あらゆる分野の研究者が興味のある特定の能力をテストするための、使いやすいインフラストラクチャである。研究者は、対話的なGUI上でエージェントや建物を追加して評価タスクを構築したり、メモリ、計画、ツール使用システムといった新しいサポートメカニズムを数行のコードでデプロイしてテストしたりできる。デモはhttps://agentsims.comで公開しています。 With ChatGPT-like large language models (LLM) prevailing in the community, how to evaluate the ability of LLMs is an open question. Existing evaluation methods suffer from following shortcomings: (1) constrained evaluation abilities, (2) vulnerable benchmarks, (3) unobjective metrics. We suggest that task-based evaluation, where LLM agents complete tasks in a simulated environment, is a one-for-all solution to solve above problems. We present AgentSims, an easy-to-use infrastructure for researchers from all disciplines to test the specific capacities they are interested in. Researchers can build their evaluation tasks by adding agents and buildings on an interactive GUI or deploy and test new support mechanisms, i.e. memory, planning and tool-use systems, by a few lines of codes. Our demo is available at https://agentsims.com .	翻訳日:2023-08-09 14:16:21 公開日:2023-08-08
# MSAC:音声感情認識のための複数音声属性制御法 MSAC: Multiple Speech Attribute Control Method for Speech Emotion Recognition ( http://arxiv.org/abs/2308.04025v1 ) ライセンス: Link先を確認	Yu Pan	(参考訳) 音声感情認識(SER)は、大きな進歩にもかかわらず、特に実環境では、感情属性に内在する複雑さとあいまいさのため、依然として困難である。最近の研究は主に認識と一般化の能力に焦点を当てているが、本研究はSER手法の信頼性を先駆的に探求し、様々な音声属性にわたるデータ分布の観点から音声感情をモデル化する方法を検討する。具体的には,まず,新たなCNNベースのSERモデルを構築し,加算マージンソフトマックス損失を適用して,異なるクラスの特徴間の距離を拡大することで識別性を高めた。第2に,音声属性を明示的に制御し,感情非依存な属性の影響を軽減し,よりきめ細かい感情関連特徴を捉えるための,新しい複数音声属性制御法であるMSACを提案する。第3に,分布外検出法を用いて,提案するSERワークフローの信頼性をテスト・解析する最初の試みを行った。単一コーパスとクロスコーパスの両方のSERシナリオに関する広範な実験により,提案する統一SERワークフローは,認識,一般化,信頼性の性能において,ベースラインを一貫して上回っていることが示された。さらに単一コーパスのSERでは、提案するSERワークフローはIEMOCAPコーパス上でWAR 72.97%、UAR 71.76%という優れた認識結果を達成している。 Despite significant progress, speech emotion recognition (SER) remains challenging due to inherent complexity and ambiguity of the emotion attribute, particularly in wild world. Whereas current studies primarily focus on recognition and generalization capabilities, this work pioneers an exploration into the reliability of SER methods and investigates how to model the speech emotion from the aspect of data distribution across various speech attributes. Specifically, we first build a novel CNN-based SER model which adopts additive margin softmax loss to expand the distance between features of different classes, thereby enhancing their discrimination. Second, a novel multiple speech attribute control method MSAC is proposed to explicitly control speech attributes, enabling the model to be less affected by emotion-agnostic attributes and capture more fine-grained emotion-related features. Third, we make a first attempt to test and analyze the reliability of the proposed SER workflow using the out-of-distribution detection method. Extensive experiments on both single and cross-corpus SER scenarios show that our proposed unified SER workflow consistently outperforms the baseline in terms of recognition, generalization, and reliability performance. 
Besides, in single-corpus SER, the proposed SER workflow achieves superior recognition results with a WAR of 72.97% and a UAR of 71.76% on the IEMOCAP corpus.	翻訳日:2023-08-09 14:16:05 公開日:2023-08-08
# 不均衡分類とRL探索のためのスコープ損失 Scope Loss for Imbalanced Classification and RL Exploration ( http://arxiv.org/abs/2308.04024v1 ) ライセンス: Link先を確認	Hasham Burhani, Xiao Qi Shi, Jonathan Jaegerman, Daniel Balicki	(参考訳) 強化学習問題と教師付き分類問題との等価性を示す。その結果,強化学習における探索的活用のトレードオフを教師付き分類におけるデータセットの不均衡問題と同一視し,その対処方法の類似性を見出した。上記の問題の解析から,強化学習と教師付き分類のための新しい損失関数を導出する。新たな損失関数であるScope Lossは、チューニングを必要とせずに、パフォーマンス損失のオーバーエクスプロイテーションやデータセットの不均衡を防止するために、勾配を調整する。ベンチマーク強化学習タスクのバスケットとスキュー分類データセットを用いて、SOTA損失関数に対するスコープ損失を検証し、スコープ損失が他の損失関数よりも優れていることを示す。 We demonstrate equivalence between the reinforcement learning problem and the supervised classification problem. We consequently equate the exploration exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and find similarities in how they are addressed. From our analysis of the aforementioned problems we derive a novel loss function for reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances, without the need for any tuning. We test Scope Loss against SOTA loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset, and show that Scope Loss outperforms other loss functions.	翻訳日:2023-08-09 14:15:40 公開日:2023-08-08
# Harrow-Hassidim-Lloydアルゴリズムにおける量子資源 Quantum Resources in Harrow-Hassidim-Lloyd Algorithm ( http://arxiv.org/abs/2308.04021v1 ) ライセンス: Link先を確認	Pradeep Kumar, Tanoy Kanti Konar, Leela Ganesh Chandra Lakkaraju, Aditi Sen De	(参考訳) 量子アルゴリズムは、古典的なアルゴリズムの能力を超えてタスク実行のランタイムを削減できる。したがって、量子優位性をもたらすリソースを特定することは興味深い試みである。 HHL(Harrow-Hassidim-Lloyd)アルゴリズムにおいて、非自明な線形方程式系を解くためには、二部エンタングルメントと真の多部エンタングルメントの両方を含む、ゼロでない量子相関が必要であることを証明する。さらに,システム全体およびレジスタ量子ビットのl1ノルム量子コヒーレンスがゼロでなく、アルゴリズムの成功確率と関連していることがわかった。量子資源の定量的解析により、各ステップでかなりの量の二部エンタングルメントが生成され、このアルゴリズムに必要である一方で、多部エンタングルメントの量は性能指標に反比例することが明らかとなった。さらに,ガウス分布から選択された不完全性が制御回転に組み込まれると,乱れの強さとともに多部エンタングルメントが増加し、誤差も増加する一方で、二部エンタングルメントとコヒーレンスは減少することを報告する。これは、このアルゴリズムにおける二部エンタングルメントとコヒーレンスの有益な役割を裏付けている。 Quantum algorithms have the ability to reduce runtime for executing tasks beyond the capabilities of classical algorithms. Therefore, identifying the resources responsible for quantum advantages is an interesting endeavour. We prove that nonvanishing quantum correlations, both bipartite and genuine multipartite entanglement, are required for solving nontrivial linear systems of equations in the Harrow-Hassidim-Lloyd (HHL) algorithm. Moreover, we find a nonvanishing l1-norm quantum coherence of the entire system and the register qubit which turns out to be related to the success probability of the algorithm. Quantitative analysis of the quantum resources reveals that while a significant amount of bipartite entanglement is generated in each step and required for this algorithm, multipartite entanglement content is inversely proportional to the performance indicator. In addition, we report that when imperfections chosen from Gaussian distribution are incorporated in controlled rotations, multipartite entanglement increases with the strength of the disorder, albeit error also increases while bipartite entanglement and coherence decreases, confirming the beneficial role of bipartite entanglement and coherence in this algorithm.	翻訳日:2023-08-09 14:15:29 公開日:2023-08-08
# 大規模無条件事前訓練による合成強化 Synthetic Augmentation with Large-scale Unconditional Pre-training ( http://arxiv.org/abs/2308.04020v1 ) ライセンス: Link先を確認	Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue	(参考訳) 深層学習に基づく医用画像認識システムは、専門家のアノテーションによるかなりの量のトレーニングデータを必要とすることが多く、その取得には費用と時間がかかる。近年,クラスラベルに条件付けされたリアルな画像を生成することで問題を緩和する合成拡張技術が提案されている。しかし、これらの手法の有効性は、訓練された生成モデルの表現能力に大きく依存しており、十分なラベル付きトレーニングデータなしではその能力を保証できない。アノテートデータへの依存をさらに減らすために,大規模なラベルなしデータセットで事前学習し,後に小規模ラベル付きデータセットに適用して拡張トレーニングを行う,HistoDiffusionと呼ばれる合成拡張法を提案する。特に,多種多様なラベルなしデータセット上で潜在拡散モデル(LDM)をトレーニングし,共通特徴を学習して,条件付き入力なしで現実的な画像を生成する。次に,未知のラベル付きデータセット上で,潜在空間の分類器ガイダンスを用いてモデルを微調整し,特定のカテゴリの画像を合成できるようにする。さらに,ターゲットラベルと一致する信頼度が高い合成サンプルのみを追加する選択的な機構を採用した。本手法は,3つの病理組織学的データセットで事前学習し,事前学習データセットから除外した大腸癌(CRC)の病理組織学的データセットでテストすることで評価する。 HistoDiffusionによる拡張により,元のラベルの小さなサブセットを用いて,バックボーン分類器の分類精度が6.4%向上した。私たちのコードはhttps://github.com/karenyyy/HistoDiffAugで利用可能です。 Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. 
Additionally, we adopt a selective mechanism to only add synthetic samples with high confidence of matching to target labels. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.	翻訳日:2023-08-09 14:15:03 公開日:2023-08-08
# 敵対的攻撃による半教師付き学習の性能向上 Improving Performance of Semi-Supervised Learning by Adversarial Attacks ( http://arxiv.org/abs/2308.04018v1 ) ライセンス: Link先を確認	Dongyoon Yang, Kunwoong Kim, Yongdai Kim	(参考訳) 半教師付き学習(SSL)アルゴリズムは、大量のラベル付きデータへのアクセスが難しいという現実的な仮定に基づいている。本研究では,最近のSSLアルゴリズムの性能向上を目的として,SCAR(Selecting Clean samples with Adversarial Robustness)という汎用的なフレームワークを提案する。半教師付きで事前学習したモデルに敵対的攻撃を加えることにより,本フレームワークは画像分類において大幅な進歩を示す。本稿では,敵対的攻撃によって,現在の予測でラベル付けすべき高信頼度のラベルなしデータをうまく選択する方法を紹介する。 CIFAR10では、SCARを組み合わせた3つの最近のSSLアルゴリズムで画像分類が大幅に改善された。 Semi-supervised learning (SSL) algorithm is a setup built upon a realistic assumption that access to a large amount of labeled data is tough. In this study, we present a generalized framework, named SCAR, standing for Selecting Clean samples with Adversarial Robustness, for improving the performance of recent SSL algorithms. By adversarially attacking pre-trained models with semi-supervision, our framework shows substantial advances in classifying images. We introduce how adversarial attacks successfully select high-confident unlabeled data to be labeled with current predictions. On CIFAR10, three recent SSL algorithms with SCAR result in significantly improved image classification.	翻訳日:2023-08-09 14:14:37 公開日:2023-08-08
# グループレコメンデーションのための多粒度アテンションモデル Multi-Granularity Attention Model for Group Recommendation ( http://arxiv.org/abs/2308.04017v1 ) ライセンス: Link先を確認	Jianye Ji, Jiayan Pei, Shaochuan Lin, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu	(参考訳) グループレコメンデーションは、共通の興味、好み、特徴に基づいて、ユーザーグループにパーソナライズされたレコメンデーションを提供する。最近の研究では、個人の好みを統合し、グループ全体に役立つ集団的な決定を下す様々な方法が研究されている。しかし、それらの多くはリッチな振る舞いを持つユーザに依存しており、比較的まばらな振る舞いを持つユーザの潜在的嗜好を無視しているため、個人の興味の学習は不十分である。この課題に対処するために,複数レベルの粒度(サブセット,グループ,スーパーセットなど)を活用して,グループメンバーの潜伏傾向を解明し,推薦ノイズを軽減する手法であるMGAM(Multi-Granularity Attention Model)を提案する。特に,それまでのアイテムとのインタラクションを取り入れ,階層的な機構を活用し,ユーザの潜在部分レベルの嗜好表現を強化するサブセット選好抽出モジュールを提案する。さらに,グループ選好抽出モジュールとスーパーセット選好抽出モジュールを導入し,グループ選好を継続するグループレベルとグループグループ外見情報を含むスーパーセットレベルという2つのレベルにおいて,ユーザの潜在選好を探索する。提案手法は,サブセットレベルの埋め込み,グループレベルの埋め込み,スーパーセットレベルの埋め込みを組み込むことにより,複数の粒度にわたるグループレコメンデーションノイズを効果的に低減し,個々の興味を包括的に学習する。大規模オフラインおよびオンライン実験により,本手法の優れた性能が実証された。 Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics. Current studies have explored different methods for integrating individual preferences and making collective decisions that benefit the group as a whole. However, most of them heavily rely on users with rich behavior and ignore latent preferences of users with relatively sparse behavior, leading to insufficient learning of individual interests. To address this challenge, we present the Multi-Granularity Attention Model (MGAM), a novel approach that utilizes multiple levels of granularity (i.e., subsets, groups, and supersets) to uncover group members' latent preferences and mitigate recommendation noise. Specially, we propose a Subset Preference Extraction module that enhances the representation of users' latent subset-level preferences by incorporating their previous interactions with items and utilizing a hierarchical mechanism. 
Additionally, our method introduces a Group Preference Extraction module and a Superset Preference Extraction module, which explore users' latent preferences on two levels: the group-level, which maintains users' original preferences, and the superset-level, which includes group-group exterior information. By incorporating the subset-level embedding, group-level embedding, and superset-level embedding, our proposed method effectively reduces group recommendation noise across multiple granularities and comprehensively learns individual interests. Extensive offline and online experiments have demonstrated the superiority of our method in terms of performance.	翻訳日:2023-08-09 14:14:28 公開日:2023-08-08
# 構成ゼロショット学習のための階層的ビジュアルプリミティブエキスパート Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning ( http://arxiv.org/abs/2308.04016v1 ) ライセンス: Link先を確認	Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn	(参考訳) Compositional zero-shot learning (CZSL) は、既知のプリミティブ(属性とオブジェクト)の事前知識を用いて、未知のコンポジションを認識することを目的としている。 CZSLのこれまでの研究は、属性とオブジェクト間の文脈性、視覚的特徴の識別可能性、および現実世界の構成的データのロングテール分布の扱いに苦労することが多かった。このような問題に対処するために,コンポジショントランスフォーマー(CoT)と呼ばれるシンプルでスケーラブルなフレームワークを提案する。 CoTは、視覚ネットワークを階層的に利用し、オブジェクトエキスパートと属性エキスパートをそれぞれ異なる方法で用いて、代表的な埋め込みを生成する。オブジェクトエキスパートは、最終層からボトムアップ方式で代表オブジェクト埋め込みを抽出し、属性エキスパートは、文脈性を明示的にモデル化するオブジェクト誘導アテンションモジュールを用いて、トップダウン方式で属性埋め込みを生成する。不均衡なデータ分布に起因するバイアス予測を緩和するために,2つのイメージを混合して仮想サンプルを合成し,少数属性クラスをオーバーサンプリングする,シンプルなマイノリティ属性拡張(MAA)を開発した。提案手法は,MIT-States,C-GQA,VAW-CZSLなど,いくつかのベンチマークでSoTA性能を実現する。また,CoTが視覚識別を改善し,不均衡データ分布に起因するモデルバイアスに対処する効果を示す。コードはhttps://github.com/HanjaeKim98/CoTで入手できる。 Compositional zero-shot learning (CZSL) aims to recognize unseen compositions with prior knowledge of known primitives (attribute and object). Previous works for CZSL often suffer from grasping the contextuality between attribute and object, as well as the discriminability of visual features, and the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. CoT employs object and attribute experts in distinctive manners to generate representative embeddings, using the visual network hierarchically. The object expert extracts representative object embeddings from the final layer in a bottom-up manner, while the attribute expert makes attribute embeddings in a top-down manner with a proposed object-guided attention module that models contextuality explicitly. To remedy biased prediction caused by imbalanced data distribution, we develop a simple minority attribute augmentation (MAA) that synthesizes virtual samples by mixing two images and oversampling minority attribute classes. 
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the effectiveness of CoT in improving visual discrimination and addressing the model bias from the imbalanced data distribution. The code is available at https://github.com/HanjaeKim98/CoT.	翻訳日:2023-08-09 14:14:02 公開日:2023-08-08
# 大規模言語モデルの継続的な事前学習: モデルをいかに(再)ウォームするか? Continual Pre-Training of Large Language Models: How to (re)warm your model? ( http://arxiv.org/abs/2308.04014v1 ) ライセンス: Link先を確認	Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort	(参考訳) 大規模言語モデル(LLM)は数十億のトークンで定期的に事前訓練されるが、新しいデータが利用可能になると、そのプロセスを最初からやり直すことになる。より安価で効率的な解決策は、これらのモデルの継続的な事前トレーニングを可能にすること、すなわち、スクラッチから再トレーニングする代わりに新しいデータで事前訓練済みモデルを更新することである。しかし、新しいデータによって誘導される分布シフトは、通常過去のデータにおける性能劣化をもたらす。本研究は,効率的な継続事前学習に向けた一歩として,異なるウォームアップ戦略の効果を検討する。私たちの仮説は、新しいデータセットでトレーニングするときの計算効率を改善するために、学習率を再び高めなければならないということです。我々は,Pile(上流データ,300Bトークン)で事前トレーニングされたモデルに対し,SlimPajama(下流データ,297Bトークン)で事前トレーニングを継続する際のウォームアップフェーズを,線形ウォームアップとコサイン減衰のスケジュールに従って検討した。我々はPythia 410M言語モデルアーキテクチャで全ての実験を行い、検証パープレキシティによって性能を評価する。我々は,異なる事前学習チェックポイント,最大学習率,ウォームアップ長で実験を行った。私たちの結果は、モデルのリワーミングが最初は上流データと下流データの損失を増加させる一方で、長期的には下流のパフォーマンスを改善し、大規模な下流データセットの場合でさえ、スクラッチからトレーニングされたモデルを上回ることを示しています。 Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. 
We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch, even for a large downstream dataset.	翻訳日:2023-08-09 14:13:38 公開日:2023-08-08
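As a rough illustration of the "linear warmup and cosine decay" schedule mentioned in the abstract above, the following Python sketch computes a learning rate per step and then "rewarms" it for a second training phase. This is not the authors' code: the function names, the step counts, and the learning-rate values are all assumptions chosen for readability.

```python
import math


def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    All arguments are illustrative; the paper's actual hyperparameters
    are not reproduced here.
    """
    if step < warmup_steps:
        # Linear ramp: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def rewarmed_schedule(upstream_steps, downstream_steps, max_lr, warmup_steps, min_lr):
    """Concatenate two warmup+cosine phases (hypothetical helper).

    The second phase restarts the warmup, i.e. the learning rate is
    re-increased when training continues on the new dataset.
    """
    upstream = [lr_at_step(s, max_lr, warmup_steps, upstream_steps, min_lr)
                for s in range(upstream_steps)]
    downstream = [lr_at_step(s, max_lr, warmup_steps, downstream_steps, min_lr)
                  for s in range(downstream_steps)]
    return upstream + downstream


if __name__ == "__main__":
    schedule = rewarmed_schedule(100, 100, max_lr=3e-4, warmup_steps=10, min_lr=3e-5)
    print(f"end of upstream phase:  {schedule[99]:.2e}")
    print(f"peak after rewarming:   {max(schedule[100:]):.2e}")
```

The maximum learning rate and the warmup length are the two knobs the abstract says were varied; in this sketch they correspond to `max_lr` and `warmup_steps`.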
# 観測ネットワークデータから因果効果を推定するための汎化境界 Generalization bound for estimating causal effects from observational network data ( http://arxiv.org/abs/2308.04011v1 ) ライセンス: Link先を確認	Ruichu Cai, Zeqin Yang, Weilin Chen, Yuguang Yan, Zhifeng Hao	(参考訳) 観測ネットワークデータから因果効果を推定することは重要であるが難しい問題である。観測ネットワークデータに対する因果推論の既存研究は、汎化境界の解析を欠いている。汎化境界は、複雑な交絡バイアスの緩和を理論的に支え、学習目標の設計を原理的な方法で実践的に導くことができる。このギャップを埋めるために,1)同時傾向スコアに基づく再重み付けスキーマと,2)積分確率距離(IPM: Integral Probability Metric)に基づく表現学習スキーマを活用して,ネットワークシナリオにおける因果効果推定のための汎化境界を導出する。我々は、再重み付けと表現学習の観点から、汎化境界に関する2つの視点をそれぞれ提供する。境界の分析に動機づけられ,表現学習を付加した同時傾向スコアに基づく重み付け回帰法を提案する。半合成データを持つ2つの実世界のネットワークに関する広範囲な実験により,本アルゴリズムの有効性が示された。 Estimating causal effects from observational network data is a significant but challenging problem. Existing works in causal inference for observational network data lack an analysis of the generalization bound, which can theoretically provide support for alleviating the complex confounding bias and practically guide the design of learning objectives in a principled manner. To fill this gap, we derive a generalization bound for causal effect estimation in network scenarios by exploiting 1) the reweighting schema based on joint propensity score and 2) the representation learning schema based on Integral Probability Metric (IPM). We provide two perspectives on the generalization bound in terms of reweighting and representation learning, respectively. Motivated by the analysis of the bound, we propose a weighting regression method based on the joint propensity score augmented with representation learning. Extensive experimental studies on two real-world networks with semi-synthetic data demonstrate the effectiveness of our algorithm.	翻訳日:2023-08-09 14:13:09 公開日:2023-08-08
# 形状最適化における異常検出と設計空間次元削減のための生成モデル Generative Models for Anomaly Detection and Design-Space Dimensionality Reduction in Shape Optimization ( http://arxiv.org/abs/2308.04051v1 ) ライセンス: Link先を確認	Danny D'Agostino	(参考訳) 本研究は, 幾何異常のない最適化プロセスにおいて, 高品質な設計の創出を推進しつつ, グローバル最適化アルゴリズムの効率を向上させるために, 新たな形状最適化手法を提案する。これは、幾何学的分散が最大化される新しい縮小部分空間を定義する元の設計変数の数を減らし、因子分析や確率主成分分析のような確率的線形潜在変数モデルを介してデータの基底となる生成過程をモデル化することで達成される。形状修正法が線形であり, 設計変数が一様にランダムにサンプリングされる場合, 中心極限定理の直接適用により, データはガウス分布にほぼ従うことを示す。モデルの不確かさはマハラノビス距離の観点で測定され、異常な設計はこの測定値の高い値を示す傾向があることが示されている。これにより、異常なジオメトリがペナルティ化され、最適化ループ中に回避される新しい最適化モデルの定義が可能になる。この手法はdtmb 5415モデルの船体形状最適化に応用され、形状最適化問題の国際ベンチマークとして広く用いられている。グローバル最適化ルーチンはベイズ最適化とDIRECTアルゴリズムを用いて実行される。数値計算結果から,大域的最適化アルゴリズムの収束性が向上する一方で,高質な幾何学的特徴を持つ設計のみを最適化ルーチンによって生成し,貴重な計算量の多いシミュレーションの段階を回避した。 Our work presents a novel approach to shape optimization, that has the twofold objective to improve the efficiency of global optimization algorithms while promoting the generation of high-quality designs during the optimization process free of geometrical anomalies. This is accomplished by reducing the number of the original design variables defining a new reduced subspace where the geometrical variance is maximized and modeling the underlying generative process of the data via probabilistic linear latent variable models such as Factor Analysis and Probabilistic Principal Component Analysis. We show that the data follows approximately a Gaussian distribution when the shape modification method is linear and the design variables are sampled uniformly at random, due to the direct application of the central limit theorem. The model uncertainty is measured in terms of Mahalanobis distance, and the paper demonstrates that anomalous designs tend to exhibit a high value of this metric. This enables the definition of a new optimization model where anomalous geometries are penalized and consequently avoided during the optimization loop. 
The procedure is demonstrated for hull shape optimization of the DTMB 5415 model, extensively used as an international benchmark for shape optimization problems. The global optimization routine is carried out using Bayesian Optimization and the DIRECT algorithm. From the numerical results, the new framework improves the convergence of global optimization algorithms, while only designs with high-quality geometrical features are generated through the optimization routine thereby avoiding the wastage of precious computationally expensive simulations.	翻訳日:2023-08-09 14:07:32 公開日:2023-08-08
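The abstract above scores designs by Mahalanobis distance and treats high values as anomalous. The Python sketch below shows that idea in its simplest form, assuming a plain Gaussian fitted with the full sample covariance rather than the Factor Analysis or probabilistic PCA models the paper uses. The helper names and the toy design vectors are made up for illustration, and the code uses only the standard library.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length design vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mean):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), computed via a linear solve."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


if __name__ == "__main__":
    # Toy "design variables"; real shape parameters would replace these.
    designs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    mu = mean_vector(designs)
    sigma = covariance(designs, mu)
    for candidate in ([1.0, 1.0], [3.0, 1.0], [6.0, 6.0]):
        print(candidate, round(mahalanobis(candidate, mu, sigma), 3))
```

In an optimization loop, a distance like this could be added as a penalty term so that candidates far from the fitted distribution are discouraged. How the paper weights or thresholds that penalty is not stated in the abstract.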
# SODFormer: イベントとフレームを用いたTransformerによるストリーミングオブジェクト検出 SODFormer: Streaming Object Detection with Transformer Using Events and Frames ( http://arxiv.org/abs/2308.04047v1 ) ライセンス: Link先を確認	Dianze Li and Jianing Li and Yonghong Tian	(参考訳) DAVISカメラは、非同期イベントとフレームという相補的な2つのセンシングモダリティをストリーミングし、オブジェクト検出の主要な課題(例えば、高速モーションのぼかしと低照度)に対処するために徐々に使われるようになっている。しかし、リッチな時間的手がかりを効果的に活用し、2つの異種視覚ストリームを融合する方法は、依然として困難な課題である。この課題に対処するために,イベントとフレームを初めて統合し,非同期にオブジェクトを連続的に検出する,Transformerを用いた新しいストリーミングオブジェクト検出器SODFormerを提案する。技術的には,まず1080.1kを超える手動ラベルを持つ大規模マルチモーダルニューロモルフィック物体検出データセット(PKU-DAVIS-SOD)を構築する。次に,エンドツーエンドのシーケンス予測問題としてオブジェクトを検出する時空間Transformerアーキテクチャを設計し,その中の新しい時間的Transformerモジュールが2つの視覚ストリームからのリッチな時間的手がかりを活用して検出性能を向上させる。最後に、非同期アテンションベースの融合モジュールを提案し、2つの異種センシングモダリティを統合して互いの相補的な利点を生かす。このモジュールは任意のタイミングで問い合わせてオブジェクトを検出でき、同期フレームベースの融合戦略による出力周波数の制限を打破する。その結果,提案するSODFormerは,4つの最先端手法と我々の8つのベースラインを大幅に上回ることが示された。また、従来のフレームベースカメラが機能しない場合、例えば、高速モーションや低照度条件などでも、統一フレームワークがうまく機能することを示す。データセットとコードはhttps://github.com/dianzl/SODFormerから入手可能です。 DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges (e.g., fast motion blur and low-light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual streams to improve the detection performance. 
Finally, an asynchronous attention-based fusion module is proposed to integrate two heterogeneous sensing modalities and take complementary advantages from each end, which can be queried at any time to locate objects and break through the limited output frequency from synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code can be available at https://github.com/dianzl/SODFormer.	翻訳日:2023-08-09 14:07:07 公開日:2023-08-08
# 任意の二次集団-スピン相互作用を持つ非線形時間反転干渉法 Nonlinear time-reversal interferometry with arbitrary quadratic collective-spin interaction ( http://arxiv.org/abs/2308.04042v1 ) ライセンス: Link先を確認	Zhiyao Hu, Qixian Li, Xuanchen Zhang, He-bin Zhang, Long-Gang Huang, Yong-Chun Liu	(参考訳) 原子非線形干渉法は量子計測や量子情報科学に広く応用されている。本稿では、任意の二次集団スピン相互作用によって生じるスピンスクイーズに基づいて、高いロバスト性と計測利得を有する非線形時間反転干渉法を提案する。この相互作用はLipkin-Meshkov-Glick(LMG)モデルで記述できる。スクイーズ過程、符号化過程、アンチスクイーズ過程を最適化し、LMGモデルの2つの特定のケースである1軸ねじれと2軸ねじれが、それぞれロバスト性と精度で優れていることを見出した。さらに,原子系における等価な時間反転を実現するFloquet駆動方式を提案し,精度,ロバスト性,操作性において高い性能をもたらす。本研究は,原子非線形干渉法において高い精度とロバスト性を達成するためのベンチマークを設定する。 Atomic nonlinear interferometry has wide applications in quantum metrology and quantum information science. Here we propose a nonlinear time-reversal interferometry scheme with high robustness and metrological gain based on the spin squeezing generated by arbitrary quadratic collective-spin interaction, which could be described by the Lipkin-Meshkov-Glick (LMG) model. We optimize the squeezing process, encoding process, and anti-squeezing process, finding that the two particular cases of the LMG model, one-axis twisting and two-axis twisting outperform in robustness and precision, respectively. Moreover, we propose a Floquet driving method to realize equivalent time reverse in the atomic system, which leads to high performance in precision, robustness, and operability. Our study sets a benchmark in achieving high precision and robustness in atomic nonlinear interferometry.	翻訳日:2023-08-09 14:06:37 公開日:2023-08-08
# InfeRE: 推論チェーンによるステップバイステップの正規表現生成 InfeRE: Step-by-Step Regex Generation via Chain of Inference ( http://arxiv.org/abs/2308.04041v1 ) ライセンス: Link先を確認	Shuai Zhang, Xiaodong Gu, Yuting Chen, Beijun Shen	(参考訳) 自然言語記述から正規表現(略してregex)を自動生成する研究(NL2RE)は、新たな研究領域として登場している。先行研究は、regexをトークンの線形列として扱い、最終的な式を単一のパスで自己回帰的に生成する。これらは、最終結果の背後にあるステップバイステップの内部テキストマッチング過程を考慮していない。これは、ニューラル言語モデルによるregex生成の有効性と解釈性を著しく阻害する。本稿では,regexの生成をステップバイステップ推論の連鎖に分解する,InfeREと呼ばれる新しいパラダイムを提案する。頑健性を高めるために,異なるモデルからサンプリングされた複数の出力をアンサンブルする自己一貫性復号機構を導入する。我々は、NL-RX-TurkとKB13の2つの公開データセット上でInfeREを評価し、その結果を最先端のアプローチと人気のツリーベース生成アプローチであるTRANXと比較した。実験の結果、InfeREは以前のベースラインを大幅に上回り、2つのデータセットでそれぞれ16.3%と14.7%のDFA@5精度が向上した。特にInfeREは、DFA@5の精度で、両方のデータセットにおいて、人気のツリーベースの生成アプローチをそれぞれ18.1%、11.3%上回っている。 Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance the robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3% and 14.7% improvement in DFA@5 accuracy on two datasets, respectively. 
Particularly, InfeRE outperforms the popular tree-based generation approach by 18.1% and 11.3% on both datasets, respectively, in terms of DFA@5 accuracy.	翻訳日:2023-08-09 14:06:21 公開日:2023-08-08
# マーモセット脳における結合分解と遺伝子発現画像の登録のための暗黙的神経表現 Implicit neural representations for joint decomposition and registration of gene expression images in the marmoset brain ( http://arxiv.org/abs/2308.04039v1 ) ライセンス: Link先を確認	Michal Byra, Charissa Poon, Tomomi Shimogori, Henrik Skibbe	(参考訳) 本稿では,脳の2つの画像に類似した解剖学的構造を登録するが,一方の画像には他方の画像に存在しない特徴やアーティファクトが含まれているという課題を解決する,暗黙的な神経表現に基づく新しい画像登録法を提案する。その効果を示すために,marmoset脳の2次元顕微鏡$\textit{in situ}$ハイブリダイゼーション遺伝子発現画像を用いた。遺伝子発現を正確に定量化するには、脳テンプレートへの画像登録が必要である。提案手法では,暗黙のネットワークと画像排除損失を併用して,画像の登録と分割を共同で行う。サポートイメージはテンプレートとよく一致し、残りのイメージはテンプレートから切り離された個々のイメージ特性をキャプチャします。実験では,提案手法は優れた結果を与え,他の登録手法よりも優れていた。 We propose a novel image registration method based on implicit neural representations that addresses the challenging problem of registering a pair of brain images with similar anatomical structures, but where one image contains additional features or artifacts that are not present in the other image. To demonstrate its effectiveness, we use 2D microscopy $\textit{in situ}$ hybridization gene expression images of the marmoset brain. Accurately quantifying gene expression requires image registration to a brain template, which is difficult due to the diversity of patterns causing variations in visible anatomical brain structures. Our approach uses implicit networks in combination with an image exclusion loss to jointly perform the registration and decompose the image into a support and residual image. The support image aligns well with the template, while the residual image captures individual image characteristics that diverge from the template. In experiments, our method provided excellent results and outperformed other registration techniques.	翻訳日:2023-08-09 14:06:01 公開日:2023-08-08
# 非構造データセットを用いたTF-IDF特徴量法と解析の比較検討 A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset ( http://arxiv.org/abs/2308.04037v1 ) ライセンス: Link先を確認	Mamata Das, Selvakumar K., P.J.A. Alphonse	(参考訳) テキスト分類は、テキストを関連するカテゴリに分類するプロセスであり、そのアルゴリズムは多くの自然言語処理(NLP)の中核にある。 TF-IDF (Term Frequency-Inverse Document Frequency) とNLPはテキスト分類において最もよく用いられる情報検索手法である。本研究では,非構造化データのテキスト分類における特徴重み付け手法の検討と解析を行った。提案モデルは,感情分析のために IMDB movie reviews と Amazon Alexa reviews のデータセット上で N-Gram と TF-IDF の2つの特徴を検討した。次に、最先端の分類器、すなわちSVM(Support Vector Machine)、ロジスティック回帰(Logistic Regression)、Multinomial Naive Bayes(Multinomial NB)、ランダムフォレスト(Random Forest)、決定木(Decision Tree)、k-nearest neighbors(KNN)を用いて手法を検証した。これら2つの特徴抽出のうち,N-Gramに基づく特徴よりもTF-IDF特徴の方が顕著に高い性能を示した。 TF-IDFはRandom Forest分類器において最大の正解率(93.81%)、適合率(94.20%)、再現率(93.81%)、F1スコア(91.99%)を得た。 Text Classification is the process of categorizing text into the relevant categories and its algorithms are at the core of many Natural Language Processing (NLP). Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most highly used information retrieval methods in text classification. We have investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considered two features N-Grams and TF-IDF on the IMDB movie reviews and Amazon Alexa reviews dataset for sentiment analysis. Then we have used the state-of-the-art classifier to validate the method i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). From those two feature extractions, a significant increase in feature extraction with TF-IDF features rather than based on N-Gram. TF-IDF got the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) value in Random Forest classifier.	翻訳日:2023-08-09 14:05:43 公開日:2023-08-08
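For readers unfamiliar with the TF-IDF weighting discussed in the abstract above, here is a minimal Python sketch of one common textbook variant: term frequency normalized by document length, multiplied by the unsmoothed logarithm of inverse document frequency. The abstract does not say which variant or library the authors used, so the formula, the function name `tf_idf`, and the toy documents are assumptions. Library implementations typically add smoothing and normalization and will give different numbers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight(t, d) = (count of t in d / len(d)) * log(N / df(t)),
    where N is the number of documents and df(t) is the number of
    documents containing t. This is an unsmoothed textbook variant.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    reviews = [
        "good movie good plot".split(),
        "bad movie".split(),
        "good sound bad battery".split(),
    ]
    for review, w in zip(reviews, tf_idf(reviews)):
        top = sorted(w.items(), key=lambda kv: -kv[1])[:2]
        print(" ".join(review), "->", top)
```

A term that appears in every document gets weight zero under this variant, which is the intuition behind using TF-IDF rather than raw counts as classifier features.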
# 無線通信仕様情報合成のための基礎モデルの適用 Adapting Foundation Models for Information Synthesis of Wireless Communication Specifications ( http://arxiv.org/abs/2308.04033v1 ) ライセンス: Link先を確認	Manikanta Kotaru	(参考訳) 現代の無線通信技術を理解し、開発し、研究するための既存のアプローチは、多くのWebページや技術仕様文書を精査し、必要な情報を収集し、それを合成する時間集約的で厳しいプロセスである。本稿では,無線通信仕様の情報合成のための対話型人工知能であるNextGen Communications Copilotを提案する。このシステムは、基盤モデルの最近の進歩の上に構築され、ドメイン固有データベース、コンテキスト抽出器、フィードバックメカニズムの3つの主要な追加コンポーネントで構成されている。このシステムは、無線技術仕様のデータベースから抽出された簡潔でクエリ依存のコンテキスト情報と、専門家のフィードバックとデータコントリビューションのためのツールを付加する。対象物の専門家によるクエリと参照応答のベンチマークデータセットを用いた評価では、ChatGPTのような最先端ツールによって達成された0.07と0.59の値と比較して、平均BLEUスコアとBERTScore F1測定値0.37と0.79との関連性および正確な回答を示した。 Existing approaches to understanding, developing and researching modern wireless communication technologies involves time-intensive and arduous process of sifting through numerous webpages and technical specification documents, gathering the required information and synthesizing it. This paper presents NextGen Communications Copilot, a conversational artificial intelligence tool for information synthesis of wireless communication specifications. The system builds on top of recent advancements in foundation models and consists of three key additional components: a domain-specific database, a context extractor, and a feedback mechanism. The system appends user queries with concise and query-dependent contextual information extracted from a database of wireless technical specifications and incorporates tools for expert feedback and data contributions. On evaluation using a benchmark dataset of queries and reference responses created by subject matter experts, the system demonstrated more relevant and accurate answers with an average BLEU score and BERTScore F1-measure of 0.37 and 0.79 respectively compared to the corresponding values of 0.07 and 0.59 achieved by state-of-the-art tools like ChatGPT.	翻訳日:2023-08-09 14:05:15 公開日:2023-08-08
# 人間の感情の不確かさの測定 Measure of Uncertainty in Human Emotions ( http://arxiv.org/abs/2308.04032v1 ) ライセンス: Link先を確認	Etienne Naude (1), Henry Gann (1), Balaram Panda (1), Lance Zhang (1), Raina Song (1), Yuwei Shen (1) ((1) The University of Auckland)	(参考訳) 多くの研究は、コンピュータがいかに人間によって表示された感情を検査し、そのデータを使って異なるタスクを遂行できるかを調査している。しかし,ユーザの意思決定やタスクの実行を支援するために,感情分類情報を生成するコンピュータ能力を評価する研究はほとんどない。これは、人間とコンピュータの双方向コミュニケーションにとって最重要となるため、探究すべき重要な領域である。本研究では,感情分類の異なる不確実性情報表示が意思決定プロセスに与える影響を検討する実験を行った。その結果,より多くの不確実性情報を表示することで,ユーザがより自信を持って意思決定できることが示された。 Many studies explore how well computers are able to examine emotions displayed by humans and use that data to perform different tasks. However, very few studies have evaluated the computer's ability to generate emotion classification information in an attempt to help the user make decisions or perform tasks. This is a crucial area to explore as it is paramount to the two-way communication between humans and computers. This research conducted an experiment to investigate the impact of different uncertainty information displays of emotion classification on the human decision making process. Results show that displaying more uncertainty information can help users to be more confident when making decisions.	翻訳日:2023-08-09 14:04:53 公開日:2023-08-08
# Gentopia: ツール拡張LLMのためのコラボレーションプラットフォーム Gentopia: A Collaborative Platform for Tool-Augmented LLMs ( http://arxiv.org/abs/2308.04030v1 ) ライセンス: Link先を確認	Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu	(参考訳) 拡張言語モデル(ALM)は、大規模言語モデルにツールを使用する能力を与え、それらを実世界のインタラクションのためのインテリジェントエージェントに変える。しかし、ALMの既存のフレームワークのほとんどは、程度の差はあれ、柔軟なカスタマイズ、協調的な民主化、総合的な評価といった重要な特徴に欠けている。シンプルな設定でエージェントを柔軟にカスタマイズでき、様々な言語モデル、タスクフォーマット、プロンプトモジュール、プラグインを統一パラダイムにシームレスに統合できるALMフレームワークであるgentopiaを提案する。さらに,ユーザカスタマイズエージェントの登録と共有を可能にする公開プラットフォームであるgentpoolを構築した。 gentpoolに登録されたエージェントは組み合わせ可能であり、エージェント協調のために組み立てることができ、人工知能の民主化を前進させる。高品質なエージェントを確保するため、gentpoolの不可欠なコンポーネントであるgentbenchは、安全性、堅牢性、効率など様々な面でユーザカスタマイズエージェントを徹底的に評価するように設計されている。 gentopiaをGitHubで公開し、今後も継続的に開発を進める。 Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm. Furthermore, we establish gentpool, a public platform enabling the registration and sharing of user-customized agents. Agents registered in gentpool are composable such that they can be assembled together for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, is designed to thoroughly evaluate user-customized agents across diverse aspects such as safety, robustness, efficiency, etc. We release gentopia on Github and will continuously move forward.	翻訳日:2023-08-09 14:04:43 公開日:2023-08-08
# バイオメディカル質問応答のためのトップK関連パス検索 Top K Relevant Passage Retrieval for Biomedical Question Answering ( http://arxiv.org/abs/2308.04028v1 ) ライセンス: Link先を確認	Shashank Gupta	(参考訳) 質問応答は、大量の文書を用いてファクトイド型の質問に答えるタスクである。自然言語によるユーザの質問に対して,正確な回答を提供することを目標としている。質問応答は、候補コンテキストを選択するための効率的なパッセージ検索に依存しており、そこではTF-IDFやBM25のような伝統的なスパースベクトル空間モデルが事実上の標準手法である。ウェブ上では、ユーザーの質問に対して、インターネットで利用可能なすべての回答を提供できる記事は1つもない。既存のDense Passage Retrieval(DPR)モデルは、2018年12月20日時点のWikipediaダンプを質問応答の情報源文書として訓練されている。質問応答(QA)は、大規模アノテートデータセットを使用して構築されたいくつかのオープンドメインシステムや機械読解システムによって大きく進歩している。しかし、臨床領域では、この問題は比較的未解明のままである。複数の調査によると、バイオメディカル分野の質問はWikipediaの記事からは正しく答えられない。本研究では,既存のDPRフレームワークをバイオメディカル領域に適用し,医学的質問に答える信頼できる情報源であるPubMedの論文から回答を検索する。 BioASQ QAデータセットで評価すると、ファインチューニングした密検索器(dense retriever)は0.81のF1スコアを達成する。 Question answering is a task that answers factoid questions using a large collection of documents. It aims to provide precise answers in response to the user's questions in natural language. Question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. On the web, there is no single article that could provide all the possible answers available on the internet to the question of the problem asked by the user. The existing Dense Passage Retrieval model has been trained on Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions. Question answering (QA) has made big strides with several open-domain and machine comprehension systems built using large-scale annotated datasets. However, in the clinical domain, this problem remains relatively unexplored. According to multiple surveys, Biomedical Questions cannot be answered correctly from Wikipedia Articles. In this work, we work on the existing DPR framework for the biomedical domain and retrieve answers from the Pubmed articles which is a reliable source to answer medical questions. 
When evaluated on a BioASQ QA dataset, our fine-tuned dense retriever results in a 0.81 F1 score.	翻訳日:2023-08-09 14:04:25 公開日:2023-08-08
# 軌道インフォームドサロゲート勾配を用いたフェデレートゼロ階最適化 Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients ( http://arxiv.org/abs/2308.04077v1 ) ライセンス: Link先を確認	Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low	(参考訳) フェデレーション最適化(federated optimization)は、フェデレーション学習のような広い現実世界のアプリケーションを見つける新たなパラダイムで、複数のクライアント(エッジデバイスなど)がグローバルな機能を協調的に最適化する。クライアントはローカルデータセットを共有せず、通常はローカル勾配のみを共有する。しかし、勾配情報はフェデレーション最適化の多くの応用では利用できないため、フェデレーションゼロth-order optimization (zoo) のパラダイムが生まれている。既存のZOOアルゴリズムは、クエリと通信の非効率性の限界に悩まされている。 (a)勾配推定のための相当数の関数クエリに依存すること、及び (b) 実現したローカルアップデートと意図したグローバルアップデートの間に大きな差異がある。この目的のためには (a) 正確でクエリ効率のよい勾配推定のための最適化中に関数クエリの履歴を利用できるトラジェクトリインフォームド勾配サロゲートを導入し、 (b) これらの勾配置換体を用いた適応勾配補正法を開発し, 上記の相違を緩和する。そこで本稿では, トラジェクトリインフォームド・サロゲート勾配 (FZooS) アルゴリズムを用いたフェデレーションゼロ階次最適化手法を提案する。当社のfzoosは,フェデレーションブラックボックス逆攻撃やフェデレーション非微分メトリック最適化といった実世界実験によって支持される,既存のアプローチに対する理論的改善を実現しています。 Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which hence gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. 
To this end, we (a) introduce trajectory-informed gradient surrogates which is able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, which is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.	翻訳日:2023-08-09 13:56:06 公開日:2023-08-08
# DataTales: 大規模言語モデルによるデータ駆動記事のオーサリングの検討 DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles ( http://arxiv.org/abs/2308.04076v1 ) ライセンス: Link先を確認	Nicole Sultanum, Arjun Srinivasan	(参考訳) データ駆動記事の執筆は複雑なプロセスであり、著者は洞察のためにデータを分析するだけでなく、洞察を効果的に伝達する結束的な物語を作る必要がある。現代大言語モデル(llms)のテキスト生成能力は、データ駆動記事の作成を支援し、執筆プロセスを迅速化する機会を提供する。本研究では LLM を活用したデータ駆動記事作成支援の実現可能性と評価について検討する。我々は,llmを利用して与えられたチャートに付随する文章的ナラティブを生成する,プロトタイプシステムdatatalesを設計した。デザインプローブとしてDataTalesを用いて,11人の専門家による質的研究を行い,LLMを価値あるデータ駆動型記事作成アシスタントとして活用する機会と機会を抽出した。 Authoring data-driven articles is a complex process requiring authors to not only analyze data for insights but also craft a cohesive narrative that effectively communicates the insights. Text generation capabilities of contemporary large language models (LLMs) present an opportunity to assist the authoring of data-driven articles and expedite the writing process. In this work, we investigate the feasibility and perceived value of leveraging LLMs to support authors of data-driven articles. We designed a prototype system, DataTales, that leverages a LLM to generate textual narratives accompanying a given chart. Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept, from which we distilled affordances and opportunities to further integrate LLMs as valuable data-driven article authoring assistants.	翻訳日:2023-08-09 13:55:43 公開日:2023-08-08
# 単眼RGBビデオにおける手指再建の空間的文脈の展開 Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video ( http://arxiv.org/abs/2308.04074v1 ) ライセンス: Link先を確認	Weichao Zhao, Hezhen Hu, Wengang Zhou, Li li, Houqiang Li	(参考訳) モノラルなRGBデータから相互作用する手を再構築することは難しい作業であり、例えば、自己と相互の閉塞や類似したテクスチャなど、多くの干渉要因が伴う。それまでの作業では、物理的に妥当な関係をモデル化することなく、単一のRGB画像からの情報しか活用できなかった。本研究は,空間的時空間情報を明示的に活用し,より優れたハンドリコンストラクションを実現することを目的としている。一方,1つのフレームで提供される情報不足を補うために時間的文脈を活用し,手の動きの滑らかさを対話するための時間的制約を伴う新しい時間的枠組みを設計する。また, 物理的衝突を伴わずに, 動的に再現可能な手を作るための相互浸透検出モジュールを提案する。提案フレームワークの有効性を検証するために,公開ベンチマークで新たな最先端性能を実現するための広範囲な実験を行った。 Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g. self- and mutual occlusion and similar textures. Previous works only leverage information from a single RGB image without modeling their physically plausible relation, which leads to inferior reconstruction results. In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction. On one hand, we leverage temporal context to complement insufficient information provided by the single frame, and design a novel temporal framework with a temporal constraint for interacting hand motion smoothness. On the other hand, we further propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments are performed to validate the effectiveness of our proposed framework, which achieves new state-of-the-art performance on public benchmarks.	翻訳日:2023-08-09 13:55:28 公開日:2023-08-08
# 物理形ニューラルネットワークのための特殊活性化関数の学習 Learning Specialized Activation Functions for Physics-informed Neural Networks ( http://arxiv.org/abs/2308.04073v1 ) ライセンス: Link先を確認	Honghui Wang, Lu Lu, Shiji Song, Gao Huang	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は最適化の難しさに悩まされている。本研究では,PINNの最適化難易度とアクティベーション関数の関係を明らかにする。具体的には, PINNは, 異なる性質を持つPDEを解く際に, 活性化関数に対して高い感度を示すことを示す。既存の作業は通常、非効率な試行錯誤によってアクティベーション関数を選択する。非効率な手動選択を回避し、PINNの最適化の難しさを軽減するため、異なる問題を解く際に最適な関数を探すための適応的アクティベーション関数を導入する。異なる適応活性化関数を比較し,その限界をピンの文脈で議論する。さらに,学習関数の滑らかさと多様性の要求度が高いピンズ最適化に,候補活性化関数の学習組合せのアイデアを合わせることを提案する。これは、候補集合から高次微分を与えることができないアクティベーション関数を除去し、手前のPDEに関する以前の知識に従って基本関数を異なる性質で組み込むことによって達成される。我々は,適応傾斜で探索空間をさらに強化する。提案したアダプティブアクティベーション関数は、異なるPDEシステムを解釈可能な方法で解くために使用できる。その効果は一連のベンチマークで示される。コードはhttps://github.com/LeapLabTHU/AdaAFforPINNsで入手できる。 Physics-informed neural networks (PINNs) are known to suffer from optimization difficulty. In this work, we reveal the connection between the optimization difficulty of PINNs and activation functions. Specifically, we show that PINNs exhibit high sensitivity to activation functions when solving PDEs with distinct properties. Existing works usually choose activation functions by inefficient trial-and-error. To avoid the inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions to search for the optimal function when solving different problems. We compare different adaptive activation functions and discuss their limitations in the context of PINNs. Furthermore, we propose to tailor the idea of learning combinations of candidate activation functions to the PINNs optimization, which has a higher requirement for the smoothness and diversity on learned functions. This is achieved by removing activation functions which cannot provide higher-order derivatives from the candidate set and incorporating elementary functions with different properties according to our prior knowledge about the PDE at hand. 
We further enhance the search space with adaptive slopes. The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way. Its effectiveness is demonstrated on a series of benchmarks. Code is available at https://github.com/LeapLabTHU/AdaAFforPINNs.	翻訳日:2023-08-09 13:55:11 公開日:2023-08-08
# 確率的軌道最適化における経路シグナチャ Path Signatures for Diversity in Probabilistic Trajectory Optimisation ( http://arxiv.org/abs/2308.04071v1 ) ライセンス: Link先を確認	Lucas Barcelos, Tin Lai, Rafael Oliveira, Paulo Borges and Fabio Ramos	(参考訳) 移動計画は、発生した軌道の関数としてコストを最小化する軌道最適化問題としてキャストすることができる。いくつかの障害物と複雑な幾何学を持つ複雑な環境では、この最適化問題は一般に解くのが難しく、局所ミニマに傾向がある。しかし、近年のコンピューティングハードウェアの進歩により、複数の解が同時に得られる並列軌道最適化が可能となり、それぞれ異なる出発点から初期化される。残念なことに、2つの解が互いに崩壊することを防ぐ戦略がなければ、単純並列最適化はモード崩壊に悩まされ、アプローチの効率が低下し、グローバルな解を見つける可能性が低下する。本稿では, 粗路理論の最近の進歩を活用し, パラレルトラジェクトリ最適化のアルゴリズムを考案し, 解幅の多様性を促進し, モード崩壊を回避し, より優れたグローバル性を実現する。本手法は軌道の経路シグネチャとヒルベルト空間表現を基盤とし,軌道推定のための並列変分推論とカーネルの多様性を促進する。この戦略は,2次元ナビゲーションからロボットマニピュレータまで,さまざまな問題において競合する代替手段よりも低い平均コストを実現することを実証的に実証する。 Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions to collapse on each other, naive parallel optimisation can suffer from mode collapse diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage on recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapses and achieving better global properties. Our approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity promoting kernels. 
We empirically demonstrate that this strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.	翻訳日:2023-08-09 13:54:52 公開日:2023-08-08
# ConDistFL:部分注釈データからのフェデレーション学習のための条件付き蒸留 ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data ( http://arxiv.org/abs/2308.04070v1 ) ライセンス: Link先を確認	Pochuan Wang, Chen Shen, Weichung Wang, Masahiro Oda, Chiou-Shann Fuh, Kensaku Mori, Holger R. Roth	(参考訳) 複数の臓器と疾患を同時に記述できる一般化セグメンテーションモデルの開発が望まれる。フェデレートラーニング(FL)は、トレーニングデータを交換することなく、モデルの協調開発を可能にする重要な技術である。しかし、完全に注釈付けされたトレーニングデータへの限られたアクセスは、一般化可能なモデルをトレーニングする上で大きな課題となる。本稿では,FLと知識蒸留を組み合わせた「ConDistFL」を提案する。局所モデルは、適切に設計された条件付き確率表現を用いて、グローバルモデルから部分的に注釈付きデータからラベルのない臓器や腫瘍の知識を抽出することができる。我々は,MSDとKITS19の課題から4つの異なる部分的腹部CTデータセットを検証した。実験の結果,提案フレームワークはfedavgおよびfedoptベースラインを大きく上回っている。さらに、外部テストデータセットのパフォーマンスは、各データセットで個別にトレーニングされたモデルと比較して、優れた一般化性を示す。本研究は,コンディストFLが頻繁な凝集を伴わずに良好に機能し,FLの通信コストを低減できることを示す。実装はhttps://github.com/nvidia/nvflare/tree/dev/research/condist-flで利用可能です。 Developing a generalized segmentation model capable of simultaneously delineating multiple organs and diseases is highly desirable. Federated learning (FL) is a key technology enabling the collaborative development of a model without exchanging training data. However, the limited access to fully annotated training data poses a major challenge to training generalizable models. We propose "ConDistFL", a framework to solve this problem by combining FL with knowledge distillation. Local models can extract the knowledge of unlabeled organs and tumors from partially annotated data from the global model with an adequately designed conditional probability representation. We validate our framework on four distinct partially annotated abdominal CT datasets from the MSD and KiTS19 challenges. The experimental results show that the proposed framework significantly outperforms FedAvg and FedOpt baselines. Moreover, the performance on an external test dataset demonstrates superior generalizability compared to models trained on each dataset separately. 
Our ablation study suggests that ConDistFL can perform well without frequent aggregation, reducing the communication cost of FL. Our implementation will be available at https://github.com/NVIDIA/NVFlare/tree/dev/research/condist-fl.	翻訳日:2023-08-09 13:54:30 公開日:2023-08-08
# 適応重み付き正規化と知識蒸留による低ラベルレジームの逆ロバスト性向上 Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation ( http://arxiv.org/abs/2308.04061v1 ) ライセンス: Link先を確認	Dongyoon Yang, Insung Kong, Yongdai Kim	(参考訳) 敵対的堅牢性は、最近、信頼できる人工知能の探求に多くの注目を集めた研究分野である。しかし、近年はラベル付きデータが豊富であると考えられる教師あり学習に焦点が当てられている。本稿では,ラベル付きデータが少ない半教師付き対人訓練について検討する。我々は,ロバストリスクに対する2つの上界を導出し,これら2つの上界に動機づけられたラベルなしデータの正規化項を提案する。そこで,本研究では,半教師型教師(セミ教師型学習アルゴリズムを用いた教師モデル)を用いて,正規化項と知識蒸留を併用した半教師型逆学習アルゴリズムを開発した。実験の結果,提案アルゴリズムは既存のアルゴリズムに比べて高いマージンで最先端の性能を実現することがわかった。特に教師付き学習アルゴリズムと比較して,ラベル付きデータの量が非常に少ない場合でも,提案アルゴリズムの性能はそれほど悪くはない。例えば、8%のラベル付きデータしか持たないアルゴリズムは、CIFAR-10の標準および堅牢な精度の両面で、すべてのラベル付きデータを使用する教師付き敵訓練アルゴリズムに匹敵する。 Adversarial robustness is a research area that has recently received a lot of attention in the quest for trustworthy artificial intelligence. However, recent works on adversarial robustness have focused on supervised learning where it is assumed that labeled data is plentiful. In this paper, we investigate semi-supervised adversarial training where labeled data is scarce. We derive two upper bounds for the robust risk and propose a regularization term for unlabeled data motivated by these two upper bounds. Then, we develop a semi-supervised adversarial training algorithm that combines the proposed regularization term with knowledge distillation using a semi-supervised teacher (i.e., a teacher model trained using a semi-supervised learning algorithm). Our experiments show that our proposed algorithm achieves state-of-the-art performance with significant margins compared to existing algorithms. In particular, compared to supervised learning algorithms, performance of our proposed algorithm is not much worse even when the amount of labeled data is very small. For example, our algorithm with only 8% labeled data is comparable to supervised adversarial training algorithms that use all labeled data, both in terms of standard and robust accuracies on CIFAR-10.	
翻訳日:2023-08-09 13:54:13 公開日:2023-08-08
# クラスタリング手法によるニュージーランドの児童福祉システムの予測リスクモデルの改善に向けて Toward Improving Predictive Risk Modelling for New Zealand's Child Welfare System Using Clustering Methods ( http://arxiv.org/abs/2308.04060v1 ) ライセンス: Link先を確認	Sahar Barmomanesh and Victor Miranda-Soberanis	(参考訳) 臨床的判断と予測的リスクモデルの組み合わせは、社会労働者が児童を虐待のリスクで隔離し、当局が介入すべき時期を決定するために重要な助けとなる。この問題に対処するための予測リスクモデリングは、行政データと機械学習アルゴリズムを含む世界中の政府福祉当局によって始められた。これまでの研究は、子供の虐待に関連するリスク要因を調査してきたが、これらのリスク要因がどのように相互作用するか、予測リスクモデルが異なる特徴を持つ子供に対して異なる機能を持つのかを理解するために、いくつかのギャップが残っている。本稿では,主成分分析とK-平均クラスタリングを統合することで,これらの特徴の同定と,現在のリスクモデリングフレームワークに対する潜在的な影響を明らかにする。このアプローチにより、ニュージーランド(NZ)の子供たちのケアと保護に関する懸念が報告された存在、未確認のクラスターを調べ、内部構造を分析し、訓練されたクラスターの賢明な予測モデルの性能を評価することができる。本研究の目的は,児童虐待の予測リスクモデルの開発に必要となるクラスタリングの程度を明らかにすることであり,児童保護当局が利用しようとするモデルの精度を高めることである。同一クラスタ上で学習したLASSOロジスティック回帰モデルの結果, 性能に有意な差は認められなかった。しかし、これらのモデルは、幼児を含む2つのクラスターに対してわずかに改善された。以上の結果から,特定の年齢の子どもに対して,誤差率のコントロールやモデルの精度向上のために,別のモデルを開発する必要があることが示唆された。結果は有望だが、結論を出すにはさらなる証拠が必要であり、さらなる調査が必要である。 The combination of clinical judgement and predictive risk models crucially assist social workers to segregate children at risk of maltreatment and decide when authorities should intervene. Predictive risk modelling to address this matter has been initiated by several governmental welfare authorities worldwide involving administrative data and machine learning algorithms. While previous studies have investigated risk factors relating to child maltreatment, several gaps remain as to understanding how such risk factors interact and whether predictive risk models perform differently for children with different features. By integrating Principal Component Analysis and K-Means clustering, this paper presents initial findings of our work on the identification of such features as well as their potential effect on current risk modelling frameworks. 
This approach allows examining existing, as yet unidentified, clusters of New Zealand (NZ) children reported with care and protection concerns, as well as to analyse their inner structure, and evaluate the performance of prediction models trained cluster-wise. We aim to discover the extent of clustering degree required as an early step in the development of predictive risk models for child maltreatment and so enhance the accuracy of such models intended for use by child protection authorities. The results from testing LASSO logistic regression models trained on identified clusters revealed no significant difference in their performance. The models, however, performed slightly better for two clusters including younger children. Our results suggest that separate models might need to be developed for children of certain ages to gain additional control over the error rates and to improve model accuracy. While results are promising, more evidence is needed to draw definitive conclusions, and further investigation is necessary.	翻訳日:2023-08-09 13:53:54 公開日:2023-08-08
# 3次元物体検出のための距離の実証分析 An Empirical Analysis of Range for 3D Object Detection ( http://arxiv.org/abs/2308.04054v1 ) ライセンス: Link先を確認	Neehar Peri, Mengtian Li, Benjamin Wilson, Yu-Xiong Wang, James Hays, Deva Ramanan	(参考訳) LiDARベースの3D検出は、自律ナビゲーションにおいて重要な役割を果たす。驚いたことに、自動運転車(AV)は(衝突回避のために)近接場オブジェクトと(長期計画のために)遠距離フィールドオブジェクトの両方を検出する必要があるが、現代のベンチマークは近接場3D検出のみに焦点を当てている。しかし、avは安全な航行のために遠方界物体を検出する必要がある。本稿では、長距離検出データセットArgoverse 2.0を用いた遠距離3次元検出の実証分析を行い、この問題をよりよく理解し、以下の知見を共有する: 近距離LiDAR測定は密度が高く、小さなボクセルで最適に符号化される一方、遠距離測定はスパースであり、大きなボクセルで符号化されている。この観察を利用して近距離vs遠距離検出用に調整された範囲エキスパートのコレクションを構築し,効率を33%向上させ,精度を3.2%向上させる長距離検出のためのモデルを効率的にアンサンブルする簡単な手法を提案する。 LiDAR-based 3D detection plays a vital role in autonomous navigation. Surprisingly, although autonomous vehicles (AVs) must detect both near-field objects (for collision avoidance) and far-field objects (for longer-term planning), contemporary benchmarks focus only on near-field 3D detection. However, AVs must detect far-field objects for safe navigation. In this paper, we present an empirical analysis of far-field 3D detection using the long-range detection dataset Argoverse 2.0 to better understand the problem, and share the following insight: near-field LiDAR measurements are dense and optimally encoded by small voxels, while far-field measurements are sparse and are better encoded with large voxels. We exploit this observation to build a collection of range experts tuned for near-vs-far field detection, and propose simple techniques to efficiently ensemble models for long-range detection that improve efficiency by 33% and boost accuracy by 3.2% CDS.	翻訳日:2023-08-09 13:53:28 公開日:2023-08-08
# 5ドルモデル:文の埋め込みからゲームマップとスプライトを生成する The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings ( http://arxiv.org/abs/2308.04052v1 ) ライセンス: Link先を確認	Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius	(参考訳) 5ドルモデルは、符号化されたテキストプロンプトから低次元画像を生成する軽量なテキスト画像生成アーキテクチャである。このモデルは,低次元領域において,限られたトレーニングデータを用いて,正確かつ美的なコンテンツを生成することができる。モデルとデータセットの両方の小さなサイズにもかかわらず、生成された画像は、テキストプロンプトのエンコードされた意味を維持できる。このモデルを,画素アートゲームマップ,ビデオゲームスプライト画像,ダウンスケール絵文字画像の3つの小さなデータセットに適用し,これらの限られたデータセット上でのモデルの性能向上のために,新たな拡張戦略を適用した。 CLIP ViT-B/32モデルにより生成されたテキスト画像ペア間のコサイン類似度スコアを用いて,本モデルの性能を評価する。 The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our model's performance using cosine similarity score between text-image pairs generated by the CLIP ViT-B/32 model.	翻訳日:2023-08-09 13:53:10 公開日:2023-08-08
# I-WAS: 直喩検出のためのGPT-2を用いたデータ拡張手法 I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection ( http://arxiv.org/abs/2308.04109v1 ) ライセンス: Link先を確認	Yongzhu Chang, Rongsheng Zhang, Jiashu Pu	(参考訳) 直喩検出は多くの自然言語処理(NLP)ベースのアプリケーション、特に文学分野において有用なタスクである。しかし、直喩検出に関する既存の研究は、しばしばサイズが限られ、直喩の形式を十分に網羅していないコーパスに依存している。この問題に対処するため, GPT-2言語モデルを用いた,単語置換と文補完(Word replacement And Sentence completion)に基づく直喩データ拡張手法を提案する。 I-WASと呼ばれる反復的なプロセスは、拡張文の品質を向上させるために設計されている。本手法の性能を実世界のアプリケーションでよりよく評価するために,より多様な直喩形式を含む実験用コーパスを構築した。実験結果は,直喩検出における提案データ拡張手法の有効性を示している。 Simile detection is a valuable task for many natural language processing (NLP)-based applications, particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on Word replacement And Sentence completion using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we have compiled a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of our proposed data augmentation method for simile detection.	翻訳日:2023-08-09 13:47:32 公開日:2023-08-08
# マルチタスクニューラルネットワークによる並列学習 Parallel Learning by Multitasking Neural Networks ( http://arxiv.org/abs/2308.04106v1 ) ライセンス: Link先を確認	Elena Agliari and Andrea Alessandrelli and Adriano Barra and Federico Ricci-Tersenghi	(参考訳) 現代の人工知能の課題は、複数のパターンを同時に学習すること(すなわち並列学習)である。標準的なヘビアン連想ニューラルネットワークでは実現できないが,本論文では,マルチタスキング・ヘビアン・ネットワーク(ホップフィールドモデルがスパースデータセットに取り組んでいるテーマのバリエーション)が,この複雑なタスクを自然に実行可能であることを示す。我々は,パターン認識に携わる標準連想ニューラルネットワークの低storageレベルを反映し,有限(ネットワークサイズが対数的に増加するまで)のパターンを並列に処理することに焦点を当てた。パターンを軽度に希釈するために、ネットワークはそれらを階層的に処理し、それらの信号の振幅をその情報内容(階層的状態)のパワーローとして分配する一方、強い希釈のために、全てのパターンに関連するすべての信号を同じ強度(並列的状態)で引き上げる。さらに、低ストレージ設定(例えば、スピンガラス限界から遠く離れた)に限定され、教師の存在はマルチタスクのパフォーマンスを変更したり、学習のしきい値を変更したりせず、後者はトレーニングプロトコルが監督または監督されていないものと同じである。例えば、モデルのコスト関数が複数のパターン(統計力学による記述)で並列に最小化されるたびに、標準的な総和二乗誤差損失関数(一般的に機械学習で使用される)が同じである。 A modern challenge of Artificial Intelligence is learning multiple patterns at once (i.e.parallel learning). While this can not be accomplished by standard Hebbian associative neural networks, in this paper we show how the Multitasking Hebbian Network (a variation on theme of the Hopfield model working on sparse data-sets) is naturally able to perform this complex task. We focus on systems processing in parallel a finite (up to logarithmic growth in the size of the network) amount of patterns, mirroring the low-storage level of standard associative neural networks at work with pattern recognition. For mild dilution in the patterns, the network handles them hierarchically, distributing the amplitudes of their signals as power-laws w.r.t. their information content (hierarchical regime), while, for strong dilution, all the signals pertaining to all the patterns are raised with the same strength (parallel regime). 
Further, confined to the low-storage setting (i.e., far from the spin glass limit), the presence of a teacher neither alters the multitasking performances nor changes the thresholds for learning: the latter are the same whether the training protocol is supervised or unsupervised. Results obtained through statistical mechanics, signal-to-noise technique and Monte Carlo simulations are overall in perfect agreement and carry interesting insights on multiple learning at once: for instance, whenever the cost-function of the model is minimized in parallel on several patterns (in its description via Statistical Mechanics), the same happens to the standard sum-squared error loss function (typically used in Machine Learning).	翻訳日:2023-08-09 13:47:19 公開日:2023-08-08
# 説明可能な機械学習によるドープ共役高分子の高スループット電気伝導率最適化 Explainable machine learning to enable high-throughput electrical conductivity optimization of doped conjugated polymers ( http://arxiv.org/abs/2308.04103v1 ) ライセンス: Link先を確認	Ji Wei Yoon, Adithya Kumar, Pawan Kumar, Kedar Hippalgaonkar, J Senthilnath, Vijila Chellappan	(参考訳) 高スループット実験技術と機械学習(ml)の組み合わせは、最近加速材料発見の新しい時代を導いており、最先端特性を持つ材料の識別を可能にしている。しかし、ある物理量の測定は自動化が難しいままである。特に、ドープポリマー材料の最適導電性を達成するには、細心のプロセス制御、実験および手間のかかる測定が必要である。本稿では,容易に測定可能な吸収スペクトルを用いたML手法を提案し,導電率測定に伴うワークフローを高速化する。最初のMLモデル(分類モデル)は、導電率>25から100S/cmの試料を正確に分類し、最大100%の精度を達成する。高導電率試料のサブセットについては,2次MLモデル(回帰モデル)を用いて導電率を予測し,印象的なR2値0.984を得た。このアプローチを検証するために, 498 s/cm と 506 s/cm の2つの高い導電率を持つ試料ではトレーニングされなかったモデルが, 精度の高いエラーレベルで正しく分類し, 予測できたことを示した。提案するmlワークフローにより, 導電率測定の効率を最大89%向上させることができた。さらに,記述子とmlモデルの独自な数学的性質を活用し,導電性に対するスペクトルの影響を裏付ける洞察を得ることにより,mlモデルの説明可能性の欠如という共通の課題に対処した。本研究では,実験科学におけるMLの目的的利用から得られる貴重な知見を提示しながら,ドープポリマー材料の特性を最適化するための加速経路を提案する。 The combination of high-throughput experimentation techniques and machine learning (ML) has recently ushered in a new era of accelerated material discovery, enabling the identification of materials with cutting-edge properties. However, the measurement of certain physical quantities remains challenging to automate. Specifically, meticulous process control, experimentation and laborious measurements are required to achieve optimal electrical conductivity in doped polymer materials. We propose a ML approach, which relies on readily measured absorbance spectra, to accelerate the workflow associated with measuring electrical conductivity. The first ML model (classification model), accurately classifies samples with a conductivity >~25 to 100 S/cm, achieving a maximum of 100% accuracy rate. For the subset of highly conductive samples, we employed a second ML model (regression model), to predict their conductivities, yielding an impressive test R2 value of 0.984. 
To validate the approach, we showed that the models, neither of which was trained on the samples with the two highest conductivities (498 and 506 S/cm), were able to correctly classify and predict them, in an extrapolative manner, at satisfactory error levels. The proposed ML workflow results in an improvement in the efficiency of the conductivity measurements by 89% of the maximum achievable using our experimental techniques. Furthermore, our approach addressed the common challenge of the lack of explainability in ML models by exploiting bespoke mathematical properties of the descriptors and ML model, allowing us to gain corroborated insights into the spectral influences on conductivity. Through this study, we offer an accelerated pathway for optimizing the properties of doped polymer materials while showcasing the valuable insights that can be derived from purposeful utilization of ML in experimental science.	翻訳日:2023-08-09 13:46:54 公開日:2023-08-08
# ディープニューラルネットワークアーキテクチャの非同期進化 Asynchronous Evolution of Deep Neural Network Architectures ( http://arxiv.org/abs/2308.04102v1 ) ライセンス: Link先を確認	Jason Liang, Hormoz Shahrzad, Risto Miikkulainen	(参考訳) 多くの進化的アルゴリズム(EA)は、候補の並列評価を利用する。しかし、評価時間が著しく異なる場合、多くのワーカノード(すなわち計算クライアント)は、その時間の大部分をアイドル状態で過ごし、次の世代が作られるのを待つことになる。ディープニューラルネットワークのアーキテクチャとハイパーパラメータを最適化するEAのクラスである進化的ニューラルアーキテクチャ探索(ENAS)は、この問題に特に脆弱である。本稿では,汎用的な非同期評価戦略(AES)を提案し,これをENASに適用する。 AESは、ワーカに送信して評価できる状態の最大$K$個体のキューを維持し、$M \ll K$個体がワーカによって評価され次第、次の世代に進むことでスループットを向上させる。 $M$の適切な値は、多様性と効率のバランスをとって実験的に決定される。 AESの汎用性と威力を示すために、まず11ビットマルチプレクサ設計(単一集団の検証可能な発見タスク)で評価し、次に画像キャプションのためのENAS(複数集団のオープンエンド最適化タスク)へとスケールアップした。両問題とも数倍の性能改善が観察され、AESはENASのような長く変動の大きい評価時間を持つ複雑なシステムの進化を並列化するための有望な手法であることが示唆された。 Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M \ll K$ individuals have been evaluated by the workers. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in 11-bit multiplexer design (a single-population verifiable discovery task) and then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task). 
In both problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.	翻訳日:2023-08-09 13:46:22 公開日:2023-08-08
# 量子近似最適化アルゴリズムによる分子ドッキング Molecular docking via quantum approximate optimization algorithm ( http://arxiv.org/abs/2308.04098v1 ) ライセンス: Link先を確認	Qi-Ming Ding, Yi-Ming Huang, Xiao Yuan	(参考訳) 分子ドッキングは、薬物発見と精密医療において重要な役割を担い、タンパク質の機能を理解し、新しい治療法を進歩させることができる。本稿では, 量子コンピュータ上での逆ダイアバティック駆動とqaoaを利用した, ディジタル化カウンタダイアバティック量子近似最適化アルゴリズム (dc-qaoa) を提案する。 PM-2-020BのSARS-CoV-2 Mpro複合体,イミダゾピリジン34のDPP-4複合体,JP-III-048のHIV-1 gp120複合体など,多様な生物学的システムの解析に応用した。 DC-QAOAは優れた性能を示し、特に大きな分子ドッキング問題に対して、より正確で生物学的に関連するドッキング結果を提供する。さらに、QAOAベースのアルゴリズムは、ノイズの多い中間スケール量子時代のハードウェア互換性を向上し、実用的なドッキングシナリオ下での効率的な実装の可能性を示している。我々の発見は、量子コンピューティングの創薬の可能性の中核となり、タンパク質リグナンドドッキングプロセスを最適化するための貴重な洞察を提供する。 Molecular docking plays a pivotal role in drug discovery and precision medicine, enabling us to understand protein functions and advance novel therapeutics. Here, we introduce a potential alternative solution to this problem, the digitized-counterdiabatic quantum approximate optimization algorithm (DC-QAOA), which utilizes counterdiabatic driving and QAOA on a quantum computer. Our method was applied to analyze diverse biological systems, including the SARS-CoV-2 Mpro complex with PM-2-020B, the DPP-4 complex with piperidine fused imidazopyridine 34, and the HIV-1 gp120 complex with JP-III-048. The DC-QAOA exhibits superior performance, providing more accurate and biologically relevant docking results, especially for larger molecular docking problems. Moreover, QAOA-based algorithms demonstrate enhanced hardware compatibility in the noisy intermediate-scale quantum era, indicating their potential for efficient implementation under practical docking scenarios. Our findings underscore quantum computing's potential in drug discovery and offer valuable insights for optimizing protein-ligand docking processes.	翻訳日:2023-08-09 13:45:59 公開日:2023-08-08
# ユニモーダルからマルチモーダルへ:深い生成モデルによるsEMGに基づくパターン認識の改善 From Unimodal to Multimodal: improving the sEMG-Based Pattern Recognition via deep generative models ( http://arxiv.org/abs/2308.04091v1 ) ライセンス: Link先を確認	Wentao Wei, Linyan Ren	(参考訳) マルチモーダルハンドジェスチャー認識(HGR)システムは高い認識精度を実現する。しかし、マルチモーダルなジェスチャー認識データを取得するには、ユーザーが追加のセンサーを装着する必要があるため、ハードウェアコストが増加する。本稿では,仮想慣性計測ユニット(IMU)信号を用いた表面筋電図(sEMG)に基づくHGRの精度向上のための新しい生成手法を提案する。具体的には,まず前腕sEMG信号と前腕IMU信号の内在的相関に基づいて深層生成モデルを訓練し,入力前腕sEMG信号から仮想前腕IMU信号を生成する。その後、sEMG信号と仮想IMU信号は、ジェスチャー認識のためのマルチモーダル畳み込みニューラルネットワーク(CNN)モデルに入力される。提案手法の性能を評価するため,5つの公開データベースと,38種類のジェスチャーを行う28名の被験者からなる独自収集データベースを含む,sEMGデータとIMUデータの両方を含む6つのデータベースについて実験を行った。その結果,提案手法は, sEMGベースのユニモーダルHGR法よりも優れていた(2.15%～13.10%増加)。深層生成モデルにより生成された仮想IMU信号の組み込みは、sEMGベースのHGRの精度を大幅に向上させることを示した。提案手法は,センサハードウェアを追加せずにユニモーダルHGRからマルチモーダルHGRへの移行に成功した試みである。 Multimodal hand gesture recognition (HGR) systems can achieve higher recognition accuracy. However, acquiring multimodal gesture recognition data typically requires users to wear additional sensors, thereby increasing hardware costs. This paper proposes a novel generative approach to improve Surface Electromyography (sEMG)-based HGR accuracy via virtual Inertial Measurement Unit (IMU) signals. Specifically, we first trained a deep generative model based on the intrinsic correlation between forearm sEMG signals and forearm IMU signals to generate virtual forearm IMU signals from the input forearm sEMG signals. Subsequently, the sEMG signals and virtual IMU signals were fed into a multimodal Convolutional Neural Network (CNN) model for gesture recognition. To evaluate the performance of the proposed approach, we conducted experiments on 6 databases, including 5 publicly available databases and our collected database comprising 28 subjects performing 38 gestures, containing both sEMG and IMU data. The results show that our proposed approach outperforms the sEMG-based unimodal HGR method (with increases of 2.15%-13.10%). 
It demonstrates that incorporating virtual IMU signals, generated by deep generative models, can significantly enhance the accuracy of sEMG-based HGR. The proposed approach represents a successful attempt to transition from unimodal HGR to multimodal HGR without additional sensor hardware.	翻訳日:2023-08-09 13:45:39 公開日:2023-08-08
# メタバースにおける異種360度ビデオ:差分強化学習アプローチ Heterogeneous 360 Degree Videos in Metaverse: Differentiated Reinforcement Learning Approaches ( http://arxiv.org/abs/2308.04083v1 ) ライセンス: Link先を確認	Wenhan Yu and Jun Zhao	(参考訳) 高度なビデオ技術が未来的なメタバースの開発を後押ししている。そのため、ユーザーのユースケースはもっと多様になり、360度ビデオとVR以外の2種類のビデオが混在するようになる。本稿では,フレームレートとサイバーシックネスの異なる360度ビデオに対して,新しい品質のサービスモデルを提案する。本稿では,自己設計の深部強化学習アルゴリズムを用いたフレームスロット構造とフレームワイズ最適化を提案する。具体的には、この異種シナリオに対して、SIDO(Separate Input Differentiated Output)とMIDO(Merged Input Differentiated Output)という2つの構造を設計する。また,その効果を示すため,総合的な実験を行う。 Advanced video technologies are driving the development of the futuristic Metaverse, which aims to connect users from anywhere and anytime. As such, the use cases for users will be much more diverse, leading to a mix of 360-degree videos with two types: non-VR and VR 360-degree videos. This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements for frame rates and cybersickness. We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms. Specifically, we design two structures, Separate Input Differentiated Output (SIDO) and Merged Input Differentiated Output (MIDO), for this heterogeneous scenario. We also conduct comprehensive experiments to demonstrate their effectiveness.	翻訳日:2023-08-09 13:45:14 公開日:2023-08-08
# QUARKを用いた量子生成学習のアプリケーション指向ベンチマーク Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK ( http://arxiv.org/abs/2308.04082v1 ) ライセンス: Link先を確認	Florian J. Kiwit, Marwa Marso, Philipp Ross, Carlos A. Riofrío, Johannes Klepsch, Andre Luckow	(参考訳) 量子機械学習(QML)アルゴリズムのベンチマークは、QMLシステムの複雑さと変動性、例えば、モデルアンザッツ、データセット、トレーニング技術、ハイパーパラメータ選択などによって困難である。 QUantum computing Application benchmaRK (QUARK) フレームワークは、量子コンピューティングアプリケーションのためのベンチマーク研究を単純化し、標準化する。本稿では、量子生成モデルのトレーニングと展開を評価する機能を含むQUARKの拡張をいくつか提案する。更新されたソフトウェアアーキテクチャについて述べ,その柔軟性をいくつかのサンプルアプリケーションを通じて示す。 (1) 複数の回路アンザッツ、データセット、データ変換を用いて、異なる量子生成モデルを訓練した。 (2) GPUおよび実量子ハードウェア上でモデルを評価した。 (3) 生成モデルの一般化能力は, 生成データの新規性や妥当性など, 幅広い指標を用いて評価した。 Benchmarking of quantum machine learning (QML) algorithms is challenging due to the complexity and variability of QML systems, e.g., regarding model ansatzes, data sets, training techniques, and hyperparameter selection. The QUantum computing Application benchmaRK (QUARK) framework simplifies and standardizes benchmarking studies for quantum computing applications. Here, we propose several extensions of QUARK to include the ability to evaluate the training and deployment of quantum generative models. We describe the updated software architecture and illustrate its flexibility through several example applications: (1) We trained different quantum generative models using several circuit ansatzes, data sets, and data transformations. (2) We evaluated our models on GPU and real quantum hardware. (3) We assessed the generalization capabilities of our generative models using a broad set of metrics that capture, e.g., the novelty and validity of the generated data.	翻訳日:2023-08-09 13:45:02 公開日:2023-08-08
# リアルタイム放射フィールドレンダリングのための3次元Gaussian Splatting 3D Gaussian Splatting for Real-Time Radiance Field Rendering ( http://arxiv.org/abs/2308.04079v1 ) ライセンス: Link先を確認	Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis	(参考訳) ラジアンス・フィールド法は、最近、複数の写真やビデオで撮影されたシーンの新規ビュー合成に革命をもたらした。しかし、高い視覚的品質を達成するには、トレーニングとレンダリングにコストがかかるニューラルネットワークが依然として必要であり、近年のより高速な手法は速度と引き換えに品質を犠牲にせざるを得ない。非有界で完全なシーン(孤立したオブジェクトではなく)と1080p解像度のレンダリングでは、現在の方法ではリアルタイムの表示速度を達成できない。競争力のある学習時間を維持しつつ最先端の視覚品質を達成し、さらに1080p解像度で高品質なリアルタイム(30fps以上)の新規ビュー合成を可能にする3つの重要な要素を導入する。第一に、カメラキャリブレーション時に得られるスパースな点群を出発点として、シーンを3次元ガウシアンで表現する。これにより、シーン最適化に望ましい連続的な体積放射場の性質を保ちつつ、空の空間での不要な計算を避ける。第二に、3次元ガウシアンの最適化と密度制御を交互に行い、特に異方性共分散を最適化してシーンを正確に表現する。第三に、異方性スプラッティングに対応した高速な可視性考慮型レンダリングアルゴリズムを開発し、学習の高速化とリアルタイムレンダリングを可能にする。確立された複数のデータセット上で,最先端のビジュアル品質とリアルタイムレンダリングを実証する。 Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. 
First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.	翻訳日:2023-08-09 13:44:48 公開日:2023-08-08
# 連続波レーザーの偏光パス相関のコヒーレンス操作によるマクロ量子相関 Macroscopic quantum correlation using coherence manipulations of polarization-path correlations of a continuous-wave laser ( http://arxiv.org/abs/2308.04078v1 ) ライセンス: Link先を確認	B. S. Ham	(参考訳) 量子重ね合わせは通常、ハイゼンベルクの不確かさ原理が支配する微視的な方法で持続する。ペア粒子間の量子相関は、古典物理学によって支配される局所実在論の違反を意味する。過去数十年間、量子機能は量子コンピューティング、通信、センシングなど様々な量子技術に実装されてきた。このような量子的特徴は一般に古典的な手段では不可能であることが知られている。ここでは、連続波レーザーの偏光-パス相関のコヒーレンス操作のためのマクロ量子相関を提示し、分離不能な積-基底形式の結合パラメータ関係を満たす。偏光パス相関のコヒーレンス制御には、一対の電気光学変調器が干渉しないマッハ・ツェンダー干渉計において、対の偏光基底間の決定論的切替に使用され、従った一対の光変調器によって選択された積-基底選択の偏光積-基底重ねが生じる。この前例のないマクロな量子特徴は、将来の古典光学互換量子情報のための顕微鏡的状態を超えた量子力学の新しい理解の扉を開く。 Quantum superposition is normally sustained in a microscopic regime governed by Heisenberg uncertainty principle applicable to a single particle. Quantum correlation between paired particles implies the violation of local realism governed by classical physics. Over the last decades, quantum features have been implemented in various quantum technologies including quantum computing, communications, and sensing. Such quantum features are generally known to be impossible by any classical means. Here, a macroscopic quantum correlation is presented for coherence manipulations of polarization-path correlations of a continuous wave laser, satisfying the joint-parameter relation in an inseparable product-basis form. For the coherence control of the polarization-path correlation, a pair of electro-optic modulators is used in a noninterfering Mach-Zehnder interferometer for deterministic switching between paired polarization bases, resulting in the polarization product-basis superposition in a selective product-basis choice manner by a followed pair of acousto-optic modulators. This unprecedented macroscopic quantum feature opens the door to a new understanding of quantum mechanics beyond the microscopic regime for future classical optics-compatible quantum information.	翻訳日:2023-08-09 13:44:29 公開日:2023-08-08
# 深層学習分類器の性能の包括的評価により明らかになった驚くべきロバスト性の欠如 Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness ( http://arxiv.org/abs/2308.04137v1 ) ライセンス: Link先を確認	Michael W. Spratling	(参考訳) 信頼性が高くロバストな評価手法は、それ自体が堅牢で信頼性の高い機械学習モデルを開発する上で必要な第一歩である。残念ながら、分類器を評価するために一般的に使用される現在の評価プロトコルは、限られた種類のテストデータに依存する傾向があるため、パフォーマンスを総合的に評価できない。例えば、標準のテストデータを使用すると、分類器が学習していないクラスのサンプルに対する予測を評価することができない。一方、未知クラスのサンプルを含むデータを用いたテストでは、分類器が既知のクラスのラベルをどの程度正確に予測できるかを評価することができない。本稿では,多種多様なデータを用いて性能をベンチマークし,そのようなデータ型すべてに適用可能な単一のメトリクスを用いて一貫した性能評価を行うことを提唱する。このようなベンチマークを用いて、現在のディープニューラルネットワークは、最先端のロバスト性を生み出すと信じられているメソッドで訓練されているものを含む、ある種のデータに対するミスに対して極めて脆弱であることが判明した。つまり、このようなモデルは、さまざまなドメインのデータに遭遇する可能性のある現実のシナリオでは信頼できないし、誤った判断をするのは簡単に騙されるため、安全ではない、ということだ。これらの結果によって、より包括的なテスト手法が広く採用され、その結果、将来的にはより堅牢な機械学習手法の開発につながることが期待されている。コードは次のURLで公開されている: https://codeberg.org/mwspratling/RobustnessEvaluation Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier for samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates benchmarking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. 
Using such a benchmark, it is found that current deep neural networks, including those trained with methods that are believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure as they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future. Code is available at: https://codeberg.org/mwspratling/RobustnessEvaluation	翻訳日:2023-08-09 13:38:06 公開日:2023-08-08
# 量子エンタングルメントとスクイーズを用いたサブSQL電子場センシング Sub-SQL electronic field sensing by simultaneously using quantum entanglements and squeezings ( http://arxiv.org/abs/2308.04136v1 ) ライセンス: Link先を確認	X. N. Feng, M. Zhang, and L. F. Wei	(参考訳) 量子エンタングルメント(quantum entanglement)と量子スクイージング(quantum squeezing)は、量子メトロロジーにおける感度の高い位相推定の標準量子限界(SQL)を打ち負かすための2つの典型的なアプローチである。それぞれが、トラップされたイオンプラットフォームによる電界センシングの感度を向上させるために、すでに個別に利用されてきたが、実証された感度ゲインの上限は、SQLに対して実験的に3dB、理論的に6dBと非常に限られている。ここで、内部(スピン)状態と外部(オシレータ)状態の絡み合いと発振器のスクイージングを同時に使用して蓄積位相を効果的に増幅し、平均励起フォノン数を圧縮することにより、関連するパラメータを適切に設定できれば、これらの感度向上を効果的に超越することができることを示す。この提案が、所望の電界の高感度センシングやその他のメトロロジーにおいて、SQLをより大きく上回るための新しいアプローチとなることが期待される。 Quantum entanglement and quantum squeezing are the two most typical approaches to beating the standard quantum limit (SQL) of sensitive phase estimation in quantum metrology. Each of them has already been utilized individually to improve the sensitivity of electric field sensing with the trapped-ion platform, but the upper bound of the demonstrated sensitivity gain is very limited, i.e., 3 dB experimentally and 6 dB theoretically, over the SQL. Here, by simultaneously using entanglement between the internal (spin) and external (oscillator) states and squeezing of the oscillator to effectively amplify the accumulated phase and compress the mean excited phonon number at the same time, we show that these sensitivity gains can be effectively surpassed, provided the relevant parameters are properly set. Hopefully, the proposal provides a novel approach to beating the SQL by a larger margin in sensitive sensing of the desired electric field and in other metrology tasks.	翻訳日:2023-08-09 13:37:22 公開日:2023-08-08
# マルチパススタッケルバーグ原子干渉法によるブロッホ振動相の研究 Bloch Oscillation Phases investigated by Multi-path Stuckelberg Atom Interferometry ( http://arxiv.org/abs/2308.04134v1 ) ライセンス: Link先を確認	Tahiyat Rahman, Anna Wirth-Singh, Andrew Ivanov, Daniel Gochnauer, Emmett Hough and Subhadeep Gupta	(参考訳) 加速光学格子でブロッホ振動(bos)を受ける原子は、2つの光子反動の運動量を得る。この技術は、原子光学のための大きな運動量伝達ツールを提供するが、原子干渉センサの完全な利用には、関連する位相を実験的に評価する必要がある。各BOは、スタッケルベルク干渉と呼ばれる干渉を引き起こす複数の交差を伴うランダウ・ツェナー交差を含む。我々はマルチパス・スタッケルベルク干渉計を開発し、BO中の原子相進化を最大100光子リコイル運動量移動で調べる。数値計算した単一粒子のシュロディンガー進化と比較し,高度にコヒーレントなBO配列を示し,基礎物理学およびセンシング応用におけるBO強化精密干渉計の位相安定性要件を評価する。 Atoms undergoing Bloch oscillations (BOs) in an accelerating optical lattice acquire momentum of two photon recoils per BO. This technique provides a large momentum transfer tool for atom optics, but its full exploitation for atom interferometric sensors requires experimental characterization of associated phases. Each BO involves a Landau-Zener crossing with multiple crossings inducing interference known as Stuckelberg interference. We develop a multi-path Stuckelberg interferometer and investigate atomic phase evolution during BOs, up to 100 photon recoil momentum transfer. We compare to numerically calculated single-particle Schrodinger evolution, demonstrate highly coherent BO sequences, and assess phase stability requirements for BO-enhanced precision interferometry in fundamental physics and sensing applications.	翻訳日:2023-08-09 13:36:50 公開日:2023-08-08
# 計測シャープネスと外乱トレードオフ Measurement sharpness and disturbance trade-off ( http://arxiv.org/abs/2308.04133v1 ) ライセンス: Link先を確認	Nayere Saberian, Seyed Javad Akhtarshenas, and Fereshte Shahbeigi	(参考訳) 測定によって量子システムから情報を取得すると、通常は状態が乱される。しかし、測定後の状態は独特ではなく、選択した測定モデルに強く依存しており、情報ゆらぎのパズルを複雑にしている。 2つの異なる質問が順番に行われる。第一に、測定が引き起こす最小の障害は何か。第二に、固定された外乱が発生した場合、最善のシナリオで可能な測定量はどの程度有益か? 本稿では,これらの問題に対処する様々な手法を提案し,ユニタリキュービットチャネルの像と等価な,偏りのないバイナリキュービット測定と後測定状態空間の集合に対する明確な解を提供する。特に, この測定のシャープネスと, 測定前状態空間の平均忠実度との間には, 測定後状態に保存されたシャープネスと量子資源とのトレードオフ関係が, 局所的に適用された場合のコヒーレンスと不協和性の関係で異なることを示す。 Obtaining information from a quantum system through a measurement typically disturbs its state. The post-measurement states for a given measurement, however, are not unique and highly rely on the chosen measurement model, complicating the puzzle of information-disturbance. Two distinct questions are then in order. Firstly, what is the minimum disturbance a measurement may induce? Secondly, when a fixed disturbance occurs, how informative is the possible measurement in the best-case scenario? Here, we propose various approaches to tackle these questions and provide explicit solutions for the set of unbiased binary qubit measurements and post-measurement state spaces that are equivalent to the image of a unital qubit channel. In particular, we show there are different trade-off relations between the sharpness of this measurement and the average fidelity of the pre-measurement and post-measurement state spaces as well as the sharpness and quantum resources preserved in the post-measurement states in terms of coherence and discord-like correlation once the measurement is applied locally.	翻訳日:2023-08-09 13:36:23 公開日:2023-08-08
# OmniDataComposer: マルチモーダルデータ融合と無限データ生成のための統一データ構造 OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation ( http://arxiv.org/abs/2308.04126v1 ) ライセンス: Link先を確認	Dongyang Yu and Shihao Wang and Yuan Fang and Wangpeng An	(参考訳) 本稿では,マルチモーダルデータ融合と無制限データ生成のための革新的なアプローチであるOmniDataComposerについて述べる。コアとなるブレークスルーは、ビデオ、オーディオ、テキストを含むマルチモーダルなデータ入力の処理と統合に熟練した凝集性のあるデータ構造の導入だ。提案アルゴリズムは,映像・画像のキャプション抽出,高密度キャプション抽出,自動音声認識(ASR),光学文字認識(OCR),Recognize Anything Model(RAM),オブジェクト追跡など,複数の操作の進歩を活用している。 OmniDataComposerは、6400以上のオブジェクトのカテゴリを識別でき、視覚情報のスペクトルを大きく広げることができる。これらの多様なモダリティを融合させ、モダリティ間の相互強化を促進し、クロスモダリティデータの修正を促進する。最終的な出力は、各ビデオ入力を精巧なシーケンシャルなドキュメントに変換し、ビデオを詳細な物語へと変えることで、大規模言語モデルによる処理を容易にする。将来の展望には、無制限のデータ生成を促進するために各モダリティ用のデータセットを最適化することが含まれる。この堅牢な基盤は、ChatGPTのようなモデルに非常に貴重な洞察を提供し、ビデオキャプションのための高品質なデータセットの作成を可能にし、ビデオコンテンツに基づいた質問応答タスクを容易にする。 OmniDataComposerは、マルチモーダル学習の新たなステージを開拓し、AIの理解と複雑な実世界のデータ生成を増大させる大きな可能性を与える。 This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation, intended to refine and simplify the interplay among diverse data modalities. As its core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm leverages advancements across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Recognize Anything Model (RAM), and object tracking. OmniDataComposer is capable of identifying over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among modalities and facilitating cross-modal data correction. 
The final output turns each video input into an elaborate sequential document, virtually transmuting videos into thorough narratives that are easier for large language models to process. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust base will offer priceless insights to models like ChatGPT, enabling them to create higher-quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, imparting enormous potential for augmenting AI's understanding and generation of complex, real-world data.	翻訳日:2023-08-09 13:36:00 公開日:2023-08-08
# 自治体意思決定支援におけるソーシャルメディア、トピックモデリング、感情分析 Social Media, Topic Modeling and Sentiment Analysis in Municipal Decision Support ( http://arxiv.org/abs/2308.04124v1 ) ライセンス: Link先を確認	Miloš Švaňa	(参考訳) 世界中の多くの都市がスマート化を目指している。しかし、スマートイニシアティブは一般市民の意見にあまり重みを与えないことが多い。ソーシャルメディアは市民の意見の最も重要な情報源の1つである。本稿では,自治体の意思決定を考慮したソーシャルメディア投稿処理フレームワークの試作について述べる。本フレームワークは,(1)各ソーシャルメディア投稿の感情極性を決定すること,(2)主要なトピックを識別し,それらのトピックを個別の投稿にマッピングすること,(3)これら2つの情報を各トピックに対して表現された全体感情を表すファジィ数に集約すること,の3段階からなる。任意にファジィ数は、各トピックに対して表される正と負の意見の「量」を示す2つの実数のタプルに還元することができる。このフレームワークはチェコのオストラヴァから約2ヶ月にわたって公開されたツイートで実証されている。このアプリケーションは、ファジィ数がよりリッチな方法で感情を表現し、ソーシャルメディア上で表現される意見の多様性を捉えていることを示す。 Many cities around the world are aspiring to become smart. However, smart initiatives often give little weight to the opinions of average citizens. Social media are one of the most important sources of citizen opinions. This paper presents a prototype of a framework for processing social media posts with municipal decision-making in mind. The framework consists of a sequence of three steps: (1) determining the sentiment polarity of each social media post, (2) identifying prevalent topics and mapping these topics to individual posts, and (3) aggregating these two pieces of information into a fuzzy number representing the overall sentiment expressed towards each topic. Optionally, the fuzzy number can be reduced into a tuple of two real numbers indicating the "amount" of positive and negative opinion expressed towards each topic. The framework is demonstrated on tweets published from Ostrava, Czechia over a period of about two months. This application illustrates how fuzzy numbers represent sentiment in a richer way and capture the diversity of opinions expressed on social media.	翻訳日:2023-08-09 13:35:31 公開日:2023-08-08
# ディープラーニングを用いたカスタム熱力学の構築 Constructing Custom Thermodynamics Using Deep Learning ( http://arxiv.org/abs/2308.04119v1 ) ライセンス: Link先を確認	Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li	(参考訳) AIの最もエキサイティングな応用の1つは、以前に蓄積されたデータに基づく自動科学的発見と、対称性や保存法を含む既知の物理原則によって提供される制限である。このような自動仮説作成と検証は、従来の物理的直観が失敗する複雑な現象の研究を支援する。特に重要なのが複雑な動的システムであり、時間発展は外部のパラメータによって強く影響を受ける。本稿では,任意の確率散逸系のマクロ的動的記述を,その微視的軌跡の観察から直接学習する,一般化したOnsager原理に基づくプラットフォームを開発する。複雑性と大きさが完全に顕微鏡的な記述を非現実的にするシステムに注目し,理論マクロモデルの構築には広範なドメイン知識や試行錯誤が必要となる。我々の機械学習アプローチは、還元熱力学座標を同時に構築し、これらの座標上の力学を解釈することでこの問題に対処する。提案手法を理論的および実験的に検証し, 外部応用分野における長鎖の延伸を実証する。具体的には,(1)安定状態と遷移状態の同定,(2)伸張速度の制御など,3つの解釈可能な熱力学的座標を学習し,高分子伸展の動的景観を構築する。我々はさらに,このアプローチの普遍性を,異なる領域の無関係問題に適用することで実証する。空間的流行に対するマクロダイナミクスの構築であり,その手法が幅広い科学的・技術的応用に対応していることを示す。 One of the most exciting applications of AI is automated scientific discovery based on previously amassed data, coupled with restrictions provided by the known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Of particular importance are complex dynamic systems where their time evolution is strongly influenced by varying external parameters. In this paper we develop a platform based on a generalised Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. We focus on systems whose complexity and sheer sizes render complete microscopic description impractical, and constructing theoretical macroscopic models requires extensive domain knowledge or trial-and-error. Our machine learning approach addresses this by simultaneously constructing reduced thermodynamic coordinates and interpreting the dynamics on these coordinates. 
We demonstrate our method by theoretically studying and experimentally validating the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including (1) the identification of stable and transition states and (2) the control of the stretching rate. We further demonstrate the universality of our approach by applying it to an unrelated problem in a different domain: constructing macroscopic dynamics for spatial epidemics, showing that our method addresses a wide range of scientific and technological applications.	翻訳日:2023-08-09 13:35:15 公開日:2023-08-08
# ベクターグラフィック文書におけるマルチモーダルカラーレコメンデーション Multimodal Color Recommendation in Vector Graphic Documents ( http://arxiv.org/abs/2308.04118v1 ) ライセンス: Link先を確認	Qianru Qiu, Xueting Wang, Mayu Otani	(参考訳) カラー選択はグラフィック文書設計において重要な役割を担い、様々な文脈を十分に考慮する必要がある。しかし、ドキュメント内の他の色やテキストコンテキストと調和する適切な色を推奨することは、経験豊富なデザイナーにとっても難しい課題である。本研究では,色とテクストのコンテキストを統合したマルチモーダルマスクカラーモデルを提案し,グラフィック文書のテキスト対応カラーレコメンデーションを提案する。提案モデルは,複数のパレットにおける色間の関係をキャプチャする自己注意ネットワークと,色とCLIPに基づくテキスト表現を組み込んだ相互注意ネットワークから構成される。提案手法は主に色とテキストに基づいて色を推奨するカラーパレット補完に焦点を当てている。また、与えられたテキストに対応する完全なカラーパレットを生成するフルパレット生成という別のカラーレコメンデーションタスクにも適用可能である。実験結果から,提案手法は従来のカラーパレット完成法よりも精度,色分布,ユーザエクスペリエンスを上回り,色多様性と地味パレットとの類似性について完全なパレット生成法が得られた。 Color selection plays a critical role in graphic document design and requires sufficient consideration of various contexts. However, recommending appropriate colors which harmonize with the other colors and textual contexts in documents is a challenging task, even for experienced designers. In this study, we propose a multimodal masked color model that integrates both color and textual contexts to provide text-aware color recommendation for graphic documents. Our proposed model comprises self-attention networks to capture the relationships between colors in multiple palettes, and cross-attention networks that incorporate both color and CLIP-based text representations. Our proposed method primarily focuses on color palette completion, which recommends colors based on the given colors and text. Additionally, it is applicable for another color recommendation task, full palette generation, which generates a complete color palette corresponding to the given text. Experimental results demonstrate that our proposed approach surpasses previous color palette completion methods on accuracy, color distribution, and user experience, as well as full palette generation methods concerning color diversity and similarity to the ground truth palettes.	翻訳日:2023-08-09 13:34:51 公開日:2023-08-08
# 分子イオンを用いたシリコン中のドナースピン量子ビットの配置精度の向上 Improved placement precision of implanted donor spin qubits in silicon using molecule ions ( http://arxiv.org/abs/2308.04117v1 ) ライセンス: Link先を確認	Danielle Holmes (1), Benjamin Wilhelm (1), Alexander M. Jakob (2), Xi Yu (1), Fay E. Hudson (1,3), Kohei M. Itoh (4), Andrew S. Dzurak (1,3), David N. Jamieson (2), Andrea Morello (1) ((1) CQC2T, School of Electrical Engineering and Telecommunications, UNSW Sydney, Australia, (2) CQC2T, School of Physics, The University of Melbourne, Australia, (3) Diraq, Sydney, Australia, (4) School of Fundamental Science and Technology, Keio University, Japan)	(参考訳) シリコン28($^{28}$Si)のドナースピンは、固体で最も高性能な量子ビットの1つであり、記録的なコヒーレンス時間と99%以上のゲート忠実度を提供する。ドナースピン量子ビットは、半導体産業と互換性のある決定論的イオン注入法を用いて製造することができる。ここでは, 単原子ではなく分子イオンを注入することで, 製造方法の精度を向上できることを示す。対象のドーパントとともに注入される傍観者イオンは、追加の運動エネルギーを持ち、誘起された電子-正孔対を単一イオン検出器で検出する決定論的ドナー注入の検出信頼度を高める。これにより、検出信頼度を損なうことなくドナー量子ビットの配置不確実性を最小化することができる。高品質なPドナー量子ビットを生成するために二フッ化リン(PF$_2^+$)分子イオンの適合性を検討した。 $^{19}$F核は$I = 1/2$のスピンを持つので、磁気ノイズを加えることによってデコヒーレンスを引き起こすため、Pドナー電子と超微細結合しないようにすることが必須である。二次イオン質量分析法を用いて、ドナー活性化アニールの間に、Fは量子ビットデバイスの活性領域から拡散して離れ、Pドナーは元の位置の近くに留まることが確認された。次に、PF$_2$を注入した量子ビットデバイスを作製し、Pドナー電子に対して電子スピン共鳴(ESR)測定を行った。 Pドナー電子について、純粋位相緩和時間 $T_2^* = 20.5 \pm 0.5$ $\mu$s とコヒーレンス時間 $T_2^{Hahn} = 424 \pm 5$ $\mu$s が得られ、これらは従来のP注入量子ビットデバイスに匹敵する値である。 PドナーESRスペクトルのより詳細な調査により、Pドナー近傍に$^{19}$Fの核スピンは見つからなかったことが判明した。したがって、分子イオンは、決定論的に注入された長寿命ドナースピン量子ビットの高精度な配列を生成する上で非常に有望である。 Donor spins in silicon-28 ($^{28}$Si) are among the most performant qubits in the solid state, offering record coherence times and gate fidelities above 99%. Donor spin qubits can be fabricated using the semiconductor-industry compatible method of deterministic ion implantation. Here we show that the precision of this fabrication method can be boosted by implanting molecule ions instead of single atoms. 
The bystander ions, co-implanted with the dopant of interest, carry additional kinetic energy and thus increase the detection confidence of deterministic donor implantation employing single ion detectors to signal the induced electron-hole pairs. This allows the placement uncertainty of donor qubits to be minimised without compromising on detection confidence. We investigate the suitability of phosphorus difluoride (PF$_2^+$) molecule ions to produce high quality P donor qubits. Since $^{19}$F nuclei have a spin of $I = 1/2$, it is imperative to ensure that they do not hyperfine couple to P donor electrons as they would cause decoherence by adding magnetic noise. Using secondary ion mass spectrometry, we confirm that F diffuses away from the active region of qubit devices while the P donors remain close to their original location during a donor activation anneal. PF$_2$-implanted qubit devices were then fabricated and electron spin resonance (ESR) measurements were performed on the P donor electron. A pure dephasing time of $T_2^* = 20.5 \pm 0.5$ $\mu$s and a coherence time of $T_2^{Hahn} = 424 \pm 5$ $\mu$s were extracted for the P donor electron, values comparable to those found in previous P-implanted qubit devices. Closer investigation of the P donor ESR spectrum revealed that no $^{19}$F nuclear spins were found in the vicinity of the P donor. Molecule ions therefore show great promise for producing high-precision deterministically-implanted arrays of long-lived donor spin qubits.	翻訳日:2023-08-09 13:34:30 公開日:2023-08-08
# 意味的テクスト類似性における集団的人間の意見 Collective Human Opinions in Semantic Textual Similarity ( http://arxiv.org/abs/2308.04114v1 ) ライセンス: Link先を確認	Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, Karin Verspoor	(参考訳) セマンティックテキスト類似性(STS)の主観的な性質とSTSアノテーションの広汎な相違にもかかわらず、既存のベンチマークでは、平均的な人間格付けをゴールドスタンダードとして使用してきた。平均的なマスクは、低い合意の例における人間の意見の真の分布を隠蔽し、モデルが個々の評価が示す意味的曖昧さを捉えるのを防ぐ。本研究では,約15,000の文対と15万のラベルを持つ最初の不確実性対応STSデータセットであるUSTSを紹介する。分析により、スカラーも単一のガウス群も観測された判断のセットに適切に適合しないことが明らかになった。さらに,現在のstsモデルでは,個々のインスタンスに対する人間の不一致によるばらつきを捉えることはできず,集合データセットに対する予測信頼度を反映していることを示した。 Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ~15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgements adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.	翻訳日:2023-08-09 13:33:44 公開日:2023-08-08
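The point about averaging can be made concrete with a small sketch. The ratings below are invented for illustration and are not taken from USTS; the two items share the same averaged label but have very different rating distributions, which a single scalar cannot express.

```python
from statistics import mean, pstdev

# Hypothetical 0-5 similarity ratings from ten annotators (illustrative only).
ratings = {
    "high_agreement": [2, 3, 3, 2, 3, 2, 3, 2, 3, 2],
    "low_agreement": [0, 5, 0, 5, 5, 0, 0, 5, 5, 0],
}


def summarize(scores):
    """Return the averaged label and the spread that averaging discards."""
    return {"mean": mean(scores), "std": pstdev(scores)}


summary = {name: summarize(scores) for name, scores in ratings.items()}
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f} std={stats['std']:.2f}")
```

Both items would receive the same gold score under averaging, while the standard deviation (and, for the second item, the bimodal shape) carries the disagreement that the abstract argues should be modelled.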
# チャーン数の計算:実空間とツイスト境界条件の同値性 Calculations of Chern number: equivalence of real-space and twisted-boundary-condition formulae ( http://arxiv.org/abs/2308.04164v1 ) ライセンス: Link先を確認	Ling Lin, Yongguan Ke, Li Zhang and Chaohong Lee	(参考訳) チャーン数は二次元量子系の位相的特徴を特徴づける重要な不変量である。実空間チャーン数は、変換対称性を伴わずにシステムの位相的性質を抽出できるため、障害や不純物を伴うトポロジカルシステムの調査において重要な役割を果たす。一方、ツイスト境界条件(TBC)は、翻訳対称性のないチャーン数を定義するためにも用いられる。ここではこれらの異なるチャーン数の定義の関係について検討する。 TBC式と2つの実空間式(非可換チャーン数とボット指数式)を解析することにより、これらのアプローチが熱力学極限において等価であることを示す。等価性はハルダンモデルを通じて数値的に確認される。 Chern number is a crucial invariant for characterizing topological feature of two-dimensional quantum systems. Real-space Chern number allows us to extract topological properties of systems without involving translational symmetry, and hence plays an important role in investigating topological systems with disorder or impurity. On the other hand, the twisted boundary condition (TBC) can also be used to define the Chern number in the absence of translational symmetry. Here we study the relation between these different definitions of Chern number. Through analyzing the TBC formula and two real-space formulae (the non-commutative Chern number and the Bott index formula), we show that these approaches are equivalent in the thermodynamic limit. The equivalence is also numerically confirmed via the Haldane model.	翻訳日:2023-08-09 13:28:07 公開日:2023-08-08
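As a point of reference for the integer that the real-space and TBC formulae are expected to reproduce in a clean system, the sketch below computes a Chern number in momentum space. It is an illustration under stated assumptions, not the paper's method: it uses the Fukui-Hatsugai-Suzuki lattice (plaquette) method rather than the Bott index or TBC formula, and the two-band Qi-Wu-Zhang model rather than the Haldane model. The function names and the grid size are hypothetical choices.

```python
import cmath
import math


def lower_band_state(kx, ky, m):
    """Unnormalised lower-band eigenvector of H = d . sigma (Qi-Wu-Zhang model)."""
    dx, dy = math.sin(kx), math.sin(ky)
    dz = m + math.cos(kx) + math.cos(ky)
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Two equivalent forms of the eigenvector; pick the one that cannot vanish.
    if dz <= 0:
        return (complex(dz - norm, 0.0), complex(dx, dy))
    return (complex(-dx, dy), complex(dz + norm, 0.0))


def inner(u, v):
    """Hermitian inner product <u|v> of two-component states."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]


def chern_number(m, n=24):
    """Sum the gauge-invariant plaquette phases over an n x n Brillouin-zone grid."""
    step = 2 * math.pi / n
    states = [[lower_band_state(i * step, j * step, m) for j in range(n)]
              for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            u00 = states[i][j]
            u10 = states[(i + 1) % n][j]
            u11 = states[(i + 1) % n][(j + 1) % n]
            u01 = states[i][(j + 1) % n]
            # Each state enters once as a bra and once as a ket, so the phase
            # of the product does not depend on gauge or normalisation.
            loop = inner(u00, u10) * inner(u10, u11) * inner(u11, u01) * inner(u01, u00)
            total += cmath.phase(loop)
    return round(total / (2 * math.pi))


print(chern_number(1.0), chern_number(3.0))
```

For this model the result has magnitude 1 in the topological phase (0 < |m| < 2) and is 0 for |m| > 2; the overall sign depends on orientation conventions.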
# 散乱効果によるディスク下カメラ画像復元 Under-Display Camera Image Restoration with Scattering Effect ( http://arxiv.org/abs/2308.04163v1 ) ライセンス: Link先を確認	Binbin Song, Xiangyu Chen, Shuning Xu, and Jiantao Zhou	(参考訳) under-display camera(udc)は、ノッチやパンチホールによる邪魔なしにフルスクリーンのビジュアル体験を提供する。しかし、ディスプレイの半透明性は必然的にudc画像に深刻な劣化をもたらす。本稿では,表示による散乱効果の具体的な考察により,UDC画像復元問題に対処する。ディスプレイを均質な散乱媒体として扱うことにより,散乱効果を明示的にモデル化する。散乱効果の物理モデルを用いて、画像合成のための画像形成パイプラインを改善し、基底真理を持つ現実的なudcデータセットを構築する。最終的なUDC画像回復に対する散乱効果を抑制するために、2分岐復元ネットワークを設計する。より具体的には、散乱枝は、劣化した画像から散乱効果のパラメータを推定するためにチャンネルワイズ自己アテンションのグローバルモデリング能力を利用する。画像ブランチはcnnのローカル表現の利点を利用してクリアなシーンを復元する一方で、散乱ブランチによって暗黙的に誘導される。実世界のデータと合成データの両方で大規模な実験を行い、現状のUDC修復技術よりも提案手法の優位性を実証した。ソースコードとデータセットは \url{https://github.com/namecantbenull/srudc} で入手できる。 The under-display camera (UDC) provides consumers with a full-screen visual experience without any obstruction due to notches or punched holes. However, the semi-transparent nature of the display inevitably introduces the severe degradation into UDC images. In this work, we address the UDC image restoration problem with the specific consideration of the scattering effect caused by the display. We explicitly model the scattering effect by treating the display as a piece of homogeneous scattering medium. With the physical model of the scattering effect, we improve the image formation pipeline for the image synthesis to construct a realistic UDC dataset with ground truths. To suppress the scattering effect for the eventual UDC image recovery, a two-branch restoration network is designed. More specifically, the scattering branch leverages global modeling capabilities of the channel-wise self-attention to estimate parameters of the scattering effect from degraded images. While the image branch exploits the local representation advantage of CNN to recover clear scenes, implicitly guided by the scattering branch. 
Extensive experiments are conducted on both real-world and synthesized data, demonstrating the superiority of the proposed method over the state-of-the-art UDC restoration techniques. The source code and dataset are available at https://github.com/NamecantbeNULL/SRUDC.	翻訳日:2023-08-09 13:27:52 公開日:2023-08-08
# epcformer:ユニバーサル参照ビデオオブジェクトセグメンテーションのための表現プロンプト協調トランス EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation ( http://arxiv.org/abs/2308.04162v1 ) ライセンス: Link先を確認	Jiajun Chen, Jiacheng Lin, Zhiqiang Xiao, Haolong Fu, Ke Nai, Kailun Yang, Zhiyong Li	(参考訳) 音声誘導型ビデオオブジェクトセグメンテーション(A-VOS)と参照型ビデオオブジェクトセグメンテーション(R-VOS)は、どちらもユーザが提供する表現プロンプトに従って、ビデオシーケンスから特定のオブジェクトをセグメントすることを目的としている。しかし、異なるモダリティの表現をモデル化する際の課題のため、現代の手法は相互作用の柔軟性と高精度なローカライゼーションとセグメンテーションのバランスをとるのに苦労している。本稿では,音声とテキストのアライメント表現と,音声,テキスト,視覚的特徴間の深い相互作用という2つの観点からこの問題に対処する。まず,epcformerにおいて,汎用アーキテクチャであるexpression prompt collaboration transformerを提案する。次に,音声およびテキスト表現のための表現アライメント(EA)機構を提案する。音声およびテキスト表現のコントラスト学習を導入することにより,同じオブジェクトを表す音声とテキスト表現間の意味的等価性の理解を実現する。次に,音声,テキスト,映像間の深いインタラクションを容易にするために,表現・視覚注意(eva)機構を導入する。表現プロンプトの観点からの映像オブジェクトのセグメンテーションの知識は,テキストと音声の相補的手がかりを深く探求することにより,2つのタスク間のシームレスな移動を可能にする。良く認識されたベンチマークの実験は、我々の普遍的なEPCFormerが両方のタスクで最先端の結果を得ることを示した。 EPCFormerのソースコードはhttps://github.com/lab206/EPCFormerで公開されている。 Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly-related tasks, which both aim to segment specific objects from video sequences according to user-provided expression prompts. However, due to the challenges in modeling representations for different modalities, contemporary methods struggle to strike a balance between interaction flexibility and high-precision localization and segmentation. In this paper, we address this problem from two perspectives: the alignment representation of audio and text and the deep interaction among audio, text, and visual features. First, we propose a universal architecture, the Expression Prompt Collaboration Transformer, herein EPCFormer. Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions. 
By introducing contrastive learning for audio and text expressions, the proposed EPCFormer realizes comprehension of the semantic equivalence between audio and text expressions denoting the same objects. Then, to facilitate deep interactions among audio, text, and video features, we introduce an Expression-Visual Attention (EVA) mechanism. The knowledge of video object segmentation in terms of the expression prompts can seamlessly transfer between the two tasks by deeply exploring complementary cues between text and audio. Experiments on well-recognized benchmarks demonstrate that our universal EPCFormer attains state-of-the-art results on both tasks. The source code of EPCFormer will be made publicly available at https://github.com/lab206/EPCFormer.	翻訳日:2023-08-09 13:27:32 公開日:2023-08-08
# 知識表現と推論の現状と課題 Current and Future Challenges in Knowledge Representation and Reasoning ( http://arxiv.org/abs/2308.04161v1 ) ライセンス: Link先を確認	James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, Frank Wolter	(参考訳) 知識表現と推論は人工知能の中心的で、長く、活発な領域である。近年では、機械学習や不確実性下での推論といった分野の研究によって、その研究に挑戦され、補完されている。 2022年7月、知識表現と推論に関するdagstuhl perspectivesワークショップが開催された。ワークショップの目的は、他の分野との関係、欠点と強み、今後の進歩の勧告などを含む、分野における芸術の状況を説明することであった。私たちは、Dagstuhl Workshopで行われたプレゼンテーション、パネル、ワーキンググループ、ディスカッションに基づいて、このマニフェストを開発しました。それは知識表現に関する私たちの見解の宣言である:その起源、目標、マイルストーン、現在のファシ、その他の分野、特に人工知能との関係、そしてその課題、そして次の10年の主要な優先事項である。 Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.	翻訳日:2023-08-09 13:27:07 公開日:2023-08-08
# ステレオ・アテンションによるトップダウン立体画像品質評価 Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention ( http://arxiv.org/abs/2308.04156v1 ) ライセンス: Link先を確認	Huilin Zhang, Sumei Li, Yongli Chang	(参考訳) 立体画像品質評価(SIQA)は、3Dコンテンツの視覚的体験を評価し改善する上で重要な役割を担っている。 SIQAの既存の双眼鏡特性と注意法は有望な性能を達成した。しかし、これらのボトムアップアプローチは、人間の視覚システム(HVS)の本質的な特徴を利用するには不十分である。本稿では,SIQAをステレオアテンションとして,品質評価プロセスの指針としてトップダウン視点を用いた新しいネットワークを提案する。提案手法は,高次双眼信号から低次単眼信号への誘導を実現する一方,両眼・単眼情報は処理パイプライン全体を通して段階的に校正することができる。我々は,ステレオ知覚におけるトップダウン哲学を実現するために,一般化ステレオアテンション(sat)ブロックを設計する。このブロックは、融合生成アテンションマップを2つの低レベル単眼特徴の表現に影響を与える高レベル双眼鏡変調器として利用する。さらに、霊長類一次視覚野の両眼反応が単眼反応の総和よりも小さいことを示す最近の知見を考慮に入れたエネルギー係数(EC)を導入する。適応ECは両眼反応の大きさを柔軟に調整できるため,我々の枠組み内での頑健な両眼特徴の形成が促進される。単眼的特徴の2つの枝の総和と減算から最も識別的品質情報を抽出するために,ミンプールとマックスプール操作を各枝に適用する二重プール戦略を用いる。実験結果から,SIQA分野における視覚知覚特性のシミュレーションと最先端化におけるトップダウン手法の優位性を強調した。この作業のコードはhttps://github.com/fanning-zhang/satnetで入手できる。 Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing binocular properties and attention-based methods for SIQA have achieved promising performance. However, these bottom-up approaches are inadequate in exploiting the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, employing a top-down perspective to guide the quality assessment process. Our proposed method realizes the guidance from high-level binocular signals down to low-level monocular signals, while the binocular and monocular information can be calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block utilizes the fusion-generated attention map as a high-level binocular modulator, influencing the representation of two low-level monocular features. 
Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC can tune the magnitude of binocular response flexibly, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the property of visual perception and advancing the state-of-the-art in the SIQA field. The code of this work is available at https://github.com/Fanning-Zhang/SATNet.	翻訳日:2023-08-09 13:26:53 公開日:2023-08-08
# ビジョンランゲージモデルを用いたインターリーブ型ビジョンランゲージ指導 Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions ( http://arxiv.org/abs/2308.04152v1 ) ライセンス: Link先を確認	Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang	(参考訳) 最近、MLLM(Multimodal Large Language Models)が大きな関心を集め、様々な視覚言語タスクの汎用モデルとして機能する創発的な能力を示している。しかし、既存の手法は主に、MLLMの普及を妨げる視覚的コンテキストとして単一のイメージを持つ限られたタイプの命令に焦点を当てている。本稿では,視覚に豊かなWebページ/テキスト,講義スライド,エンボディダイアログなど,さまざまなシナリオをカバーする複雑な画像テキストシーケンシャルなコンテキストを含む複雑な視覚言語命令に対する命令に従う能力を総合的に評価するI4ベンチマークを提案する。画像キャプションのアライメントを目標とするVisual Prompt Generator (VPG)は、キャプションのための一般的なフォアグラウンド情報に出席する傾向にあるが、特定のタスクに必要な特定の情報を抽出するのに苦労する。本稿では,LLMの高度な推論能力を利用してVPGを制御し,命令固有の視覚情報を条件付きで抽出し,LLMに再注入する汎用的で軽量な知識再注入モジュールを提案する。さらに,基礎モデルのカスケードを協調させることにより,提案モジュールを体系的に学習するための,アノテーションフリーな対物画像学習戦略を提案する。提案するモジュールとトレーニング戦略によって強化されたcheetahは,多種多様な視覚言語インストラクションを効果的に処理し,高品質なマルチモーダルインストラクションチューニングデータを用いずに,i4のすべてのタスクにおいて最先端のゼロショット性能を実現するmllmである。さらに、Cheetahは、同時MMEベンチマークにおける最先端の命令チューニングモデルと比較して、競合性能を示す。 Multimodal Large Language Models (MLLMs) have recently sparked significant interest, which demonstrates emergent capabilities to serve as a general-purpose model for various vision-language tasks. However, existing methods mainly focus on limited types of instructions with a single image as visual context, which hinders the widespread availability of MLLMs. In this paper, we introduce the I4 benchmark to comprehensively evaluate the instruction following ability on complicated interleaved vision-language instructions, which involve intricate image-text sequential context, covering a diverse range of scenarios (e.g., visually-rich webpages/textbooks, lecture slides, embodied dialogue). 
Systematic evaluation on our I4 benchmark reveals a common defect of existing methods: the Visual Prompt Generator (VPG) trained on an image-captioning alignment objective tends to attend to common foreground information for captioning but struggles to extract specific information required by particular tasks. To address this issue, we propose a generic and lightweight controllable knowledge re-injection module, which utilizes the sophisticated reasoning ability of LLMs to control the VPG to conditionally extract instruction-specific visual information and re-inject it into the LLM. Further, we introduce an annotation-free cross-attention guided counterfactual image training strategy to methodically learn the proposed module by collaborating a cascade of foundation models. Enhanced by the proposed module and training strategy, we present Cheetah, an MLLM that can effectively handle a wide variety of interleaved vision-language instructions and achieves state-of-the-art zero-shot performance across all tasks of I4, without high-quality multimodal instruction tuning data. Moreover, Cheetah also exhibits competitive performance compared with state-of-the-art instruction-tuned models on the concurrent MME benchmark.	翻訳日:2023-08-09 13:26:24 公開日:2023-08-08
# エッジ機械学習を用いた白斑症候群ウイルス(WSSV)モニタリングへの応用 Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning ( http://arxiv.org/abs/2308.04151v1 ) ライセンス: Link先を確認	Lorenzo S. Querol, Macario O. Cordel II, Dan Jeric A. Rustia, Mary Nia M. Santos	(参考訳) 養殖産業はエビの輸出に強く依存しており、生産に深刻な影響を及ぼすホワイトスポット症候群ウイルス(WSSV)のようなウイルス感染による課題に直面している。この文脈では、コンピュータビジョンは、熟練した目や訓練されていない目ですぐに明らかでない特徴を特定する上で重要な役割を果たす。本研究は,WSSV認識のための限られたデータに対する課題である。データ収集とモニタリングに特化したモバイルアプリケーションは、WSSV認識モデルをトレーニングし、国全体の疾病監視を改善するためのイメージデータセットの作成を容易にするために開発された。この研究は、不均衡学習とデバイス上の推論の課題に対処するために、WSSV認識の徹底的な分析も含んでいる。 MobileNetV3-SmallとEfficientNetV2-B0がそれぞれ0.72と0.99のF1スコアを獲得した。両方のモデルの塩分ヒートマップは、これらのモデルの「ブラックボックス」の性質を明らかにし、画像のどの特徴が予測に最も重要であるかについての洞察を得るためにも観察された。これらの結果は、リソース制約のあるデバイス用に設計されたモデルを使用することの有効性と限界を強調し、WSSVを正確に認識し、この領域におけるコンピュータビジョンの使用における貴重な情報と方向性を提供する。 The aquaculture industry, strongly reliant on shrimp exports, faces challenges due to viral infections like the White Spot Syndrome Virus (WSSV) that severely impact output yields. In this context, computer vision can play a significant role in identifying features not immediately evident to skilled or untrained eyes, potentially reducing the time required to report WSSV infections. In this study, the challenge of limited data for WSSV recognition was addressed. A mobile application dedicated to data collection and monitoring was developed to facilitate the creation of an image dataset to train a WSSV recognition model and improve country-wide disease surveillance. The study also includes a thorough analysis of WSSV recognition to address the challenge of imbalanced learning and on-device inference. The models explored, MobileNetV3-Small and EfficientNetV2-B0, gained an F1-Score of 0.72 and 0.99 respectively. The saliency heatmaps of both models were also observed to uncover the "black-box" nature of these models and to gain insight as to what features in the images are most important in making a prediction. 
These results highlight the effectiveness and limitations of using models designed for resource-constrained devices and balancing their performance in accurately recognizing WSSV, providing valuable information and direction in the use of computer vision in this domain.	翻訳日:2023-08-09 13:25:50 公開日:2023-08-08
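Because the study stresses imbalanced learning, F1 is more informative than plain accuracy for this kind of task. A minimal sketch with made-up confusion counts (these are not the paper's data) shows how the two metrics can diverge:

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced test set: 10 infected images, 990 healthy ones.
tp, fp, fn, tn = 4, 2, 6, 988
print(f"accuracy={accuracy(tp, fp, fn, tn):.3f} f1={f1_score(tp, fp, fn):.3f}")
```

With these counts the classifier misses most infected samples, yet accuracy stays above 0.99 because healthy images dominate; F1 on the infected class exposes the weakness.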
# ハイブリッドフィードフォワード受信機による二相シフト鍵識別のための標準量子限界のビーティング Beating the standard quantum limit for binary phase-shift-keying discrimination with a hybrid feed-forward receiver ( http://arxiv.org/abs/2308.04146v1 ) ライセンス: Link先を確認	Michele N. Notarnicola and Stefano Olivares	(参考訳) 低強度局所発振器と光子数分解検出器を用いて、変位フィードフォワード受信機(DFFRE)とホモダインの適切な組み合わせに基づいて、二相シフト鍵コヒーレント状態の判別を行うハイブリッドフィードフォワード受信機(HFFRE)を提案する。提案手法は,非単位量子検出効率,暗カウント,可視性低下の存在下での現実的なシナリオにも対処する。現在のHFFREは、全ての条件においてDFFREよりも優れており、特定のレシエーションにおける標準量子限界を上回っている。 We propose a hybrid feed-forward receiver (HFFRE) for the discrimination of binary phase-shift-keyed coherent states based on the appropriate combination of the displacement feed-forward receiver (DFFRE) and a homodyne-like setup employing a low-intensity local oscillator and photon-number-resolving detectors. We investigate the performance of the proposed scheme addressing also realistic scenarios in the presence of non-unit quantum detection efficiency, dark counts and a visibility reduction. The present HFFRE outperforms the DFFRE in all conditions, beating the standard quantum limit in particular regimes.	翻訳日:2023-08-09 13:25:29 公開日:2023-08-08
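For orientation, the benchmarks that such receivers are measured against have closed forms. The sketch below uses the textbook expressions for BPSK coherent states with mean photon number n = |α|² under ideal detection: the homodyne (standard quantum limit) error ½·erfc(√(2n)), the Kennedy receiver error ½·exp(−4n), and the Helstrom bound ½·[1 − √(1 − exp(−4n))]. These are standard reference formulas, not results from this paper, and the function names are illustrative.

```python
import math


def sql_error(n):
    """Homodyne (standard quantum limit) error probability for BPSK."""
    return 0.5 * math.erfc(math.sqrt(2.0 * n))


def kennedy_error(n):
    """Ideal Kennedy receiver (displacement plus on/off detection)."""
    return 0.5 * math.exp(-4.0 * n)


def helstrom_error(n):
    """Helstrom bound: the minimum error probability allowed by quantum mechanics."""
    return 0.5 * (1.0 - math.sqrt(1.0 - math.exp(-4.0 * n)))


for n in (0.1, 0.5, 1.0, 2.0):
    print(f"n={n}: SQL={sql_error(n):.3e} "
          f"Kennedy={kennedy_error(n):.3e} Helstrom={helstrom_error(n):.3e}")
```

At low energies the simple displacement receiver is worse than homodyne detection, while at higher energies it is better; this crossover is the kind of regime dependence the abstract refers to when it says the limit is beaten "in particular regimes".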
# 視覚表現学習のためのクラスレベル構造関係モデリングと平滑化 Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning ( http://arxiv.org/abs/2308.04142v1 ) ライセンス: Link先を確認	Zitan Chen, Zhuang Qi, Xiao Cao, Xiangxian Li, Xiangxu Meng, Lei Meng	(参考訳) 画像の表現学習は、視覚トランスフォーマーのようなより複雑なニューラルモデルや、構造因果モデルのような新しい学習理論の進歩によって進歩してきた。しかし、これらのモデルはクラスレベルのデータ分布を暗黙的に正則化する分類損失に主に依存しており、様々な視覚的パターンを持つクラスを扱う際に困難に直面する可能性がある。データサンプル間の構造情報の導入は,この状況を改善する可能性がある。この目的を達成するために,本論文では,視覚表現学習のためのクラスレベル構造関係モデリングと平滑化(CSRMS)と呼ぶフレームワークを提案する。 CSRMSは,クラスレベル関係モデリング,クラス対応グラフサンプリング,関係グラフ誘導表現学習の各モジュールから構成され,データセット全体の関係グラフをモデル化し,クラスを考慮した平滑化と正則化を行うことで,クラス内の視覚的多様性とクラス間の類似性の問題を緩和する。具体的には,クラスレベル関係モデリングモジュールは,クラスタリングアルゴリズムを用いて特徴空間におけるデータ分布を学習し,訓練セットに対して3種類のクラスレベルのサンプル関係を特定する。クラス対応グラフサンプリングモジュールは,典型的な訓練バッチ構築プロセスを3つの戦略で拡張し,データセットレベルのサブグラフをサンプリングする。関係グラフ誘導表現学習モジュールは,知識誘導の平滑化操作を備えたグラフ畳み込みネットワークを用いて,異なる視覚パターンから同一クラスへの射影を容易にする。実験により,構造化知識モデリングが表現学習の強化に有効であることを実証し,CSRMSを任意の最先端の視覚表現学習モデルに組み込むことで性能向上が得られることを示した。ソースコードとデモはhttps://github.com/czt117/CSRMSで公開されている。 Representation learning for images has been advanced by recent progress in more complex neural models such as the Vision Transformers and new learning theories such as the structural causal models. 
However, these models mainly rely on the classification loss to implicitly regularize the class-level data distributions, and they may face difficulties when handling classes with diverse visual patterns. We argue that the incorporation of the structural information between data samples may improve this situation. To achieve this goal, this paper presents a framework termed Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning (CSRMS), which includes the Class-level Relation Modelling, Class-aware Graph Sampling, and Relational Graph-Guided Representation Learning modules to model a relational graph of the entire dataset and perform class-aware smoothing and regularization operations to alleviate the issue of intra-class visual diversity and inter-class similarity. Specifically, the Class-level Relation Modelling module uses a clustering algorithm to learn the data distributions in the feature space and identify three types of class-level sample relations for the training set; Class-aware Graph Sampling module extends typical training batch construction process with three strategies to sample dataset-level sub-graphs; and Relational Graph-Guided Representation Learning module employs a graph convolution network with knowledge-guided smoothing operations to ease the projection from different visual patterns to the same class. Experiments demonstrate the effectiveness of structured knowledge modelling for enhanced representation learning and show that CSRMS can be incorporated with any state-of-the-art visual representation learning models for performance gains. The source codes and demos have been released at https://github.com/czt117/CSRMS.	翻訳日:2023-08-09 13:25:15 公開日:2023-08-08
# 長期法的文書分類のための大規模言語モデルプロンプトチェイン Large Language Model Prompt Chaining for Long Legal Document Classification ( http://arxiv.org/abs/2308.04138v1 ) ライセンス: Link先を確認	Dietrich Trautmann	(参考訳) プロンプトは、望ましい結果に合致した適切な応答を生成する際に、言語モデルを誘導または制御するために使用される。チェイン(Chaining)は、複雑なタスクを小さな管理可能なコンポーネントに分解する戦略である。本研究は,広範な法律文書分類タスクにおいて,プロンプト・チェーンを活用し,その複雑なドメイン固有言語と相当な長さの制約を呈する。私たちのアプローチは、元の文書の簡潔な要約の作成から始まり、関連する例文とその対応するアノテーションをトレーニングコーパスから意味的に検索する。最後に、限定的なプロンプトからコンテキスト内学習を活用することで、タスクに基づいたラベルを割り当てるように促します。即時連鎖により、ゼロショット以上の性能を向上できるだけでなく、より小さなモデルを用いてChatGPTゼロショットのような大型モデルによって達成されるマイクロF1スコアを超越できることを実証する。 Prompting is used to guide or steer a language model in generating an appropriate response that is consistent with the desired outcome. Chaining is a strategy used to decompose complex tasks into smaller, manageable components. In this study, we utilize prompt chaining for extensive legal document classification tasks, which present difficulties due to their intricate domain-specific language and considerable length. Our approach begins with the creation of a concise summary of the original document, followed by a semantic search for related exemplar texts and their corresponding annotations from a training corpus. Finally, we prompt for a label - based on the task - to assign, by leveraging the in-context learning from the few-shot prompt. We demonstrate that through prompt chaining, we can not only enhance the performance over zero-shot, but also surpass the micro-F1 score achieved by larger models, such as ChatGPT zero-shot, using smaller models.	翻訳日:2023-08-09 13:24:49 公開日:2023-08-08
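The three-step chain described in the abstract (summarize, retrieve similar annotated examples, then prompt for a label with few-shot context) can be sketched as follows. This is only an illustration under assumptions: the `llm` callable, the prompt wording, the toy corpus and all function names are hypothetical, and a simple bag-of-words cosine similarity stands in for the semantic search used in the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_examples(summary, corpus, k=2):
    """Step 2: retrieve the k annotated summaries most similar to the query."""
    query = bag_of_words(summary)
    ranked = sorted(corpus, key=lambda ex: cosine(query, bag_of_words(ex["summary"])), reverse=True)
    return ranked[:k]


def build_few_shot_prompt(summary, examples, labels):
    """Step 3: assemble the few-shot prompt that asks for a label."""
    lines = [f"Assign one label from: {', '.join(labels)}.", ""]
    for ex in examples:
        lines += [f"Text: {ex['summary']}", f"Label: {ex['label']}", ""]
    lines += [f"Text: {summary}", "Label:"]
    return "\n".join(lines)


def classify(document, corpus, labels, llm, k=2):
    """Chain the steps: summarize -> retrieve -> few-shot label prompt."""
    summary = llm(f"Summarize the following legal document briefly:\n{document}")
    examples = nearest_examples(summary, corpus, k)
    return llm(build_few_shot_prompt(summary, examples, labels)).strip()


def toy_llm(prompt):
    """Stand-in for a real model call so that the sketch runs offline."""
    if prompt.startswith("Summarize"):
        return prompt.split("\n", 1)[1][:200]
    return re.search(r"Label: (\w+)", prompt).group(1)  # copy the first example's label


corpus = [
    {"summary": "tenant failed to pay rent under the lease agreement", "label": "contract"},
    {"summary": "defendant convicted of burglary and theft", "label": "criminal"},
    {"summary": "employer breached the employment agreement terms", "label": "contract"},
]
document = "The landlord claims the tenant did not pay rent due under the lease."
result = classify(document, corpus, ["contract", "criminal"], toy_llm, k=1)
print(result)
```

Keeping each step a separate function mirrors the idea of chaining: the summary shortens the long document before retrieval, and only the retrieved examples plus the summary need to fit into the final prompt.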
# 超強結合系におけるベル状態の超高速および決定論的生成 Ultrafast and deterministic generation of Bell states in the ultrastrong coupling regime ( http://arxiv.org/abs/2308.04183v1 ) ライセンス: Link先を確認	Xin Xie, Junlong Tian, Jie Peng	(参考訳) 我々は、非等方性2量子ラビモデル(qrm)の特別なダーク状態解を発見し、これは少なくとも1つの光子を持ち、カップリング状態全体において一定の固有エネルギーを持つ。そこで本研究では,暗黒状態に沿った断熱的進化を通じて2種類のベル状態を生成する手法を提案する。スタークシフトの助けを借りて、生成時間をサブナノ秒スケールに短縮することができ、共振器周波数の逆に比例し、忠実度は99%に達する。さらに、他の2種類のベル状態も超高速生成することができる。 We have found the special dark state solutions of the anisotropic two-qubit quantum Rabi model (QRM), which has at most one photon, and constant eigenenergy in the whole coupling regime. Accordingly, we propose a scheme to deterministically generate two kinds of the two-qubit Bell states through adiabatic evolution along the dark states. With the assistance of the Stark shift, the generation time can be reduced to subnanosecond scales, proportional to the reverse of the resonator frequency, with fidelity reaching 99%. Furthermore, the other two kinds of Bell states can also be ultrafast generated.	翻訳日:2023-08-09 13:16:43 公開日:2023-08-08
# 社会的に受け入れがたい談話分類(SUD)について : 「我々は同じページにいるのか?」 Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?" ( http://arxiv.org/abs/2308.04180v1 ) ライセンス: Link先を確認	Bruno Machado Carneiro, Michele Linardi, Julien Longhi	(参考訳) オンラインテキストにおけるsud(socially unacceptable discourse)の特徴付けと検出について検討した。我々は、これまで最先端の機械学習(ML) SUD検出ソリューションで使用されてきたさまざまなオンラインソースから、さまざまな手動の注釈付きテキストを含む、新しいコーパスを構築し、提示する。このグローバルな文脈は、異なる文脈からではなく、同じSUDカテゴリに関する知識を取得するSUD分類器の一般化能力をテストすることができる。この観点から、オープンチャレンジとオープンリサーチの方向性を議論することで、異なるアノテーションのモダリティがSUD学習にどのように影響するかを分析することができる。また、アノテーションタスクでドメインエキスパートをサポートするいくつかのデータインサイトも提供します。 We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art Machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning by discussing open challenges and open research directions. We also provide several data insights which can support domain experts in the annotation task.	翻訳日:2023-08-09 13:16:32 公開日:2023-08-08
# 医療のためのチャットボット:簡潔なレビュー Assistive Chatbots for healthcare: a succinct review ( http://arxiv.org/abs/2308.04178v1 ) ライセンス: Link先を確認	Basabdatta Sen Bhattacharya, Vibhav Sinai Pissurlenkar	(参考訳) 医療サービスを支援する人工知能(AI)は、近年の世界的なパンデミックほど必要とされていない。ここでは、過去10年間(2013-2023)に提案された医療におけるAI対応チャットボットの現状について概観する。 AI対応技術に焦点が当てられているのは、チャットボットによる人間と機械のインタラクションの質を高め、人間と人間のインタラクションへの依存を減らし、人間の時間を節約できる可能性があるからだ。われわれのレビューは、患者サポートに使われている(商用)チャットボットはごくわずかだが、臨床試験段階にある他の(商用ではない)チャットボットもあることを示している。しかし、このテクノロジーに対する患者の安全とデータ保護に関する信頼の欠如に加えて、医療従事者や専門家の間では、そのメリットに対するより広い認識の欠如がある。また,ヒトと比較して,チャットボットの自然言語処理(NLP)スキルに不満を呈している。このチャットボットは、nlpテクノロジーのバーを育てた最近のchatgptの導入にもかかわらず、医療支援の「ナロー」領域で機能する徹底的かつ厳格なチェックなしでは、患者の安全と医療倫理に信頼できない。私たちのレビューでは、公衆衛生サービスにおけるAI対応チャットボットのデプロイと統合を可能にするためには、時間の必要性が示唆されている。 (a)研修・開発を中心とした医療コミュニティ b) アウトリーチを通じて患者とより広い地域社会。 Artificial Intelligence (AI) for supporting healthcare services has never been more necessitated than by the recent global pandemic. Here, we review the state-of-the-art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023). The focus on AI-enabled technology is because of its potential for enhancing the quality of human-machine interaction via Chatbots, reducing dependence on human-human interaction and saving man-hours. Our review indicates that there are a handful of (commercial) Chatbots that are being used for patient support, while there are others (non-commercial) that are in the clinical trial phases. However, there is a lack of trust on this technology regarding patient safety and data protection, as well as a lack of wider awareness on its benefits among the healthcare workers and professionals. Also, patients have expressed dissatisfaction with Natural Language Processing (NLP) skills of the Chatbots in comparison to humans. 
Notwithstanding the recent introduction of ChatGPT that has raised the bar for the NLP technology, this Chatbot cannot be trusted with patient safety and medical ethics without thorough and rigorous checks to serve in the 'narrow' domain of assistive healthcare. Our review suggests that to enable deployment and integration of AI-enabled Chatbots in public health services, the need of the hour is: to build technology that is simple and safe to use; to build confidence in the technology among (a) the medical community by focussed training and development; (b) the patients and wider community through outreach.	翻訳日:2023-08-09 13:16:20 公開日:2023-08-08
# ディープフェイク検出器はどの程度一般化可能か? 実証的研究 How Generalizable are Deepfake Detectors? An Empirical Study ( http://arxiv.org/abs/2308.04177v1 ) ライセンス: Link先を確認	Boquan Li, Jun Sun, Christopher M. Poskitt	(参考訳) ディープフェイクビデオや画像はますます信頼性が高くなり、詐欺やバイパスアクセス制御システムを促進する可能性から、大きな脅威となっている。これはディープフェイク検出法の開発を動機付けており、ディープラーニングモデルは実写映像と合成映像を区別するために訓練されている。残念ながら、既存の検出モデルは、トレーニングされていないデータセットのディープフェイクを一般化するのに苦労するが、なぜこの制限に対処できるのかを調査する作業はほとんど行われていない。本稿では,ディープフェイク検出器の汎用性に関する最初の実証的研究について述べる。本研究では,6つのdeepfakeデータセット,5つのdeepfake検出手法,および2つのモデル拡張手法を用いて,ゼロショット設定では検出器が一般化しないことを確認した。さらに, 検出器は, 合成法に特有の不要な特性を学習し, 識別的特徴の抽出に苦慮し, 一般化能力に限界があることが判明した。最後に、見えないデータセットをまたいで検出に普遍的に寄与するニューロンが存在することを見出し、ゼロショット一般化可能性への道筋を照明する。 Deepfake videos and images are becoming increasingly credible, posing a significant threat given their potential to facilitate fraud or bypass access control systems. This has motivated the development of deepfake detection methods, in which deep learning models are trained to distinguish between real and synthesized footage. Unfortunately, existing detection models struggle to generalize to deepfakes from datasets they were not trained on, but little work has been done to examine why or how this limitation can be addressed. In this paper, we present the first empirical study on the generalizability of deepfake detectors, an essential goal for detectors to stay one step ahead of attackers. Our study utilizes six deepfake datasets, five deepfake detection methods, and two model augmentation approaches, confirming that detectors do not generalize in zero-shot settings. Additionally, we find that detectors are learning unwanted properties specific to synthesis methods and struggling to extract discriminative features, limiting their ability to generalize. Finally, we find that there are neurons universally contributing to detection across seen and unseen datasets, illuminating a possible path forward to zero-shot generalizability.	翻訳日:2023-08-09 13:15:57 公開日:2023-08-08
# On Monotonic Aggregation for Open-domain QA ( http://arxiv.org/abs/2308.04176v1 ) License: see link	Sang-eun Han, Yeonseok Jeong, Seung-won Hwang, Kyungjae Lee	Question answering (QA) is a critical task for speech-based retrieval from knowledge sources, as it sifts out only the answers without requiring the user to read supporting documents. Specifically, open-domain QA aims to answer user questions on unrestricted knowledge sources. Ideally, adding a source should not decrease the accuracy, but we find that this property (denoted as "monotonicity") does not hold for current state-of-the-art methods. We identify the cause and, based on that, propose the Judge-Specialist framework. Our framework consists of (1) specialist retrievers/readers to cover individual sources, and (2) a judge, a dedicated language model that selects the final answer. Our experiments show that our framework not only ensures monotonicity, but also outperforms state-of-the-art multi-source QA methods on Natural Questions. Additionally, we show that our models robustly preserve the monotonicity against noise from speech recognition. We publicly release our code and settings.	Translated: 2023-08-09 13:15:35 Published: 2023-08-08
# Predicting Drug-Drug Interactions Using Knowledge Graphs ( http://arxiv.org/abs/2308.04172v1 ) License: see link	Lizzy Farrugia, Lilian M. Azzopardi, Jeremy Debattista and Charlie Abela	In the last decades, people have been consuming and combining more drugs than before, increasing the number of Drug-Drug Interactions (DDIs). To predict unknown DDIs, recent studies have started incorporating Knowledge Graphs (KGs), since they are able to capture the relationships among entities, providing better drug representations than using a single drug property. In this paper, we propose the medicX end-to-end framework that integrates several drug features from public drug repositories into a KG and embeds the nodes in the graph using various translation, factorisation and Neural Network (NN) based KG Embedding (KGE) methods. Ultimately, we use a Machine Learning (ML) algorithm that predicts unknown DDIs. Among the different translation and factorisation-based KGE models, we found that the best performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network, which obtained an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8. This score is 5.61% better than the state-of-the-art model DeepDDI. Additionally, we developed a graph auto-encoder model that uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%. Consequently, GNNs have demonstrated a stronger ability to mine the underlying semantics of the KG than the ComplEx model, and thus using higher dimension embeddings within the GNN can lead to state-of-the-art performance.	Translated: 2023-08-09 13:15:19 Published: 2023-08-08
# Core interface optimization for multi-core neuromorphic processors ( http://arxiv.org/abs/2308.04171v1 ) License: see link	Zhe Su, Hyunjung Hwang, Tristan Torchet, Giacomo Indiveri	Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency, and which cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks or take up significant hardware resources to implement large networks. To realize large-scale and scalable SNNs, it is necessary to develop an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it represents the bottleneck of Power-Performance-Area (PPA), especially for the arbitration architecture and the routing memory. In this paper we present an arbitration mechanism with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces the latency by more than 70% in sparse-event mode, compared to the state-of-the-art arbitration architectures, with lower area cost. The routing memory makes use of asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput against conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. In addition to radically reducing the core interface resources in multi-core neuromorphic processors, the arbitration architecture and CAM architecture we propose can also be applied to a wide range of general asynchronous circuits and systems.	Translated: 2023-08-09 13:14:51 Published: 2023-08-08
# Dual input neural networks for positional sound source localization ( http://arxiv.org/abs/2308.04169v1 ) License: see link	Eric Grinstein, Vincent W. Neo and Patrick A. Naylor	In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN on a test dataset of real recordings.	Translated: 2023-08-09 13:14:21 Published: 2023-08-08
# EFaR 2023: Efficient Face Recognition Competition ( http://arxiv.org/abs/2308.04168v1 ) License: see link	Jan Niklas Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, Naser Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, Sajid Javed, Naoufel Werghi, Klemen Grm, Vitomir Štruc, Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Josef Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sébastien Marcel, Iurii Medvedev, Bo Jin, Diogo Nunes, Ahmad Hassanpour, Pankaj Khatiwada, Aafan Ahmad Toor, Bian Yang	This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some solutions apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.	Translated: 2023-08-09 13:13:59 Published: 2023-08-08
# Security of a Continuous-Variable based Quantum Position Verification Protocol ( http://arxiv.org/abs/2308.04166v1 ) License: see link	Rene Allerstorfer, Llorenç Escolà-Farràs, Arpan Akash Ray, Boris Škorić, Florian Speelman, Philip Verduyn Lunel	In this work we study quantum position verification with continuous-variable quantum states. In contrast to existing discrete protocols, we present and analyze a protocol that utilizes coherent states and their properties. Compared to discrete-variable photonic states, coherent states offer practical advantages since they can be efficiently prepared and manipulated with current technology. We prove security of the protocol against any unentangled attackers via entropic uncertainty relations, showing that the adversary has more uncertainty than the honest prover about the correct response as long as the noise in the quantum channel is below a certain threshold. Additionally, we show that attackers who pre-share one continuous-variable EPR pair can break the protocol.	Translated: 2023-08-09 13:13:47 Published: 2023-08-08
# Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study ( http://arxiv.org/abs/2308.04212v1 ) License: see link	Seyoung Park, Eun Ryung Lee, Hyokyoung G. Hong	Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.	Translated: 2023-08-09 13:08:00 Published: 2023-08-08
# Robust retrieval of material chemical states in X-ray microspectroscopy ( http://arxiv.org/abs/2308.04207v1 ) License: see link	Ting Wang, Xiaotong Wu, Jizhou Li, Chao Wang	X-ray microspectroscopic techniques are essential for studying morphological and chemical changes in materials, providing high-resolution structural and spectroscopic information. However, their practical data analysis for reliably retrieving the chemical states remains a major obstacle to accelerating the fundamental understanding of materials in many research fields. In this work, we propose a novel data formulation model for X-ray microspectroscopy and develop a dedicated unmixing framework to solve this problem, which is robust to noise and spectral variability. Moreover, this framework is not limited to the analysis of two-state material chemistry, making it an effective alternative to conventional and widely-used methods. In addition, an alternative directional multiplier method with provable convergence is applied to obtain the solution efficiently. Our framework can accurately identify and characterize chemical states in complex and heterogeneous samples, even under challenging conditions such as low signal-to-noise ratios and overlapping spectral features. Extensive experimental results on simulated and real datasets demonstrate its effectiveness and reliability.	Translated: 2023-08-09 13:07:42 Published: 2023-08-08
# Exploring Transformers for Open-world Instance Segmentation ( http://arxiv.org/abs/2308.04206v1 ) License: see link	Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo	Open-world instance segmentation is a rising task, which aims to segment all objects in the image by learning from a limited number of base-category objects. This task is challenging, as the number of unseen categories could be hundreds of times larger than that of seen categories. Recently, DETR-like models have been extensively studied in the closed world while remaining unexplored in the open world. In this paper, we utilize the Transformer for open-world instance segmentation and present SWORD. First, we attach a stop-gradient operation before the classification head and further add IoU heads for discovering novel objects. We demonstrate that a simple stop-gradient operation not only prevents the novel objects from being suppressed as background, but also allows the network to enjoy the merit of heuristic label assignment. Second, we propose a novel contrastive learning framework to enlarge the representations between objects and background. Specifically, we maintain a universal object queue to obtain the object center, and dynamically select positive and negative samples from the object queries for contrastive learning. While previous works focus only on pursuing average recall and neglect average precision, we show the prominence of SWORD by giving consideration to both criteria. Our models achieve state-of-the-art performance in various open-world cross-category and cross-dataset generalizations. Particularly, in the VOC to non-VOC setup, our method sets new state-of-the-art results of 40.0% on ARb100 and 34.9% on ARm100. For COCO to UVO generalization, SWORD significantly outperforms the previous best open-world model by 5.9% on APm and 8.1% on ARm100.	Translated: 2023-08-09 13:07:26 Published: 2023-08-08
# Hidden tensor structures of any quantum mechanical system: Towards single-particle quantum computation ( http://arxiv.org/abs/2308.04202v1 ) License: see link	Marek Czachor	The standard architecture of quantum information processing is based on a bottom-up design: one begins with a one-digit one-particle system, while multi-digit quantum registers demand multi-particle configurations, mathematically modeled by tensor products of single quantum digits. Here we show that any single quantum system is automatically equipped with hidden tensor structures that allow for single-particle top-down designs of quantum information processing. Hidden tensor structures imply that any quantum system, even as simple as a single one-dimensional harmonic oscillator, can be decomposed into an arbitrary number of subsystems. The resulting structure is rich enough to enable quantum computation, violation of Bell's inequalities, and formulation of universal quantum gates. In principle, a single-particle quantum computer is possible. Moreover, it is shown that these hidden structures are at the roots of some well known theoretical constructions, such as the Brandt-Greenberg multi-boson representation of creation-annihilation operators, intensively investigated in the context of higher-order or fractional-order squeezing. In effect, certain rather tedious standard proofs known from the literature can be simplified to literally one line. The general construction is illustrated by concrete examples.	Translated: 2023-08-09 13:07:00 Published: 2023-08-08
# A centennial reappraisal of Heisenberg's Quantum Mechanics with a perspective on Einstein's Quantum Riddle ( http://arxiv.org/abs/2308.04199v1 ) License: see link	Tuck C. Choy	Heisenberg's breakthrough in his July 1925 paper that set in motion the development of Quantum Mechanics through subsequent papers by Born, Jordan, Heisenberg and also Dirac (from 1925 to 1927) is reexamined through a modern lens. In this paper, we shall discuss some new perspectives on (i) what could be the guiding intuitions for his discoveries and (ii) the origin of the Born-Jordan-Heisenberg canonical quantization rule. From this vantage point we may get an insight into Einstein's Quantum Riddle (Lande 1974, Sommerfeld 1918, Born 1926) and a possible glimpse of what might come next after the last 100 years of Heisenberg's quantum mechanics.	Translated: 2023-08-09 13:06:38 Published: 2023-08-08
# D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation ( http://arxiv.org/abs/2308.04197v1 ) License: see link	Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun	Temporal sentence grounding (TSG) aims to locate a specific moment from an untrimmed video with a given natural language query. Weakly supervised methods still have a large performance gap compared to fully supervised ones, while the latter require laborious timestamp annotations. In this study, we aim to reduce the annotation cost yet keep competitive performance for the TSG task compared to fully supervised methods. To achieve this goal, we investigate a recently proposed glance-supervised temporal sentence grounding task, which requires only single frame annotation (referred to as glance annotation) for each query. Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA). Specifically, SA-GCL samples reliable positive moments from a 2D temporal map by jointly leveraging the Gaussian prior and semantic consistency, which contributes to aligning the positive sentence-moment pairs in the joint embedding space. Moreover, to alleviate the annotation bias resulting from glance annotation and to model complex queries consisting of multiple events, we propose the DGA module, which adjusts the distribution dynamically to approximate the ground truth of target moments. Extensive experiments on three challenging benchmarks verify the effectiveness of the proposed D3G. It outperforms the state-of-the-art weakly supervised methods by a large margin and narrows the performance gap compared to fully supervised methods. Code is available at https://github.com/solicucu/D3G.	Translated: 2023-08-09 13:06:25 Published: 2023-08-08
# High photon-loss threshold quantum computing using GHZ-state measurements ( http://arxiv.org/abs/2308.04192v1 ) License: see link	Brendan Pankovich, Angus Kan, Kwok Ho Wan, Maike Ostmann, Alex Neville, Srikrishna Omkar, Adel Sohbi and Kamil Brádler	We propose fault-tolerant architectures based on performing projective measurements in the Greenberger-Horne-Zeilinger (GHZ) basis on constant-sized, entangled resource states. We present linear-optical constructions of the architectures, where the GHZ-state measurements are encoded to suppress the errors induced by photon loss and the probabilistic nature of linear optics. Simulations of our constructions demonstrate high single-photon loss thresholds compared to the state-of-the-art linear-optical architecture realized with encoded two-qubit fusion measurements performed on constant-sized resource states. We believe this result shows a resource-efficient path to achieving photonic fault-tolerant quantum computing.	Translated: 2023-08-09 13:05:54 Published: 2023-08-08
# Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch ( http://arxiv.org/abs/2308.04188v1 ) License: see link	Yingjie He, Yuanman Li, Changsheng Chen and Xia Li	Recently developed deep algorithms have achieved promising progress in the field of image copy-move forgery detection (CMFD). However, they have limited generalizability in some practical scenarios, where the copy-move objects may not appear in the training images or the cloned regions are from the background. To address the above issues, in this work, we propose a novel end-to-end CMFD framework by integrating merits from both conventional and deep methods. Specifically, we design a deep cross-scale patchmatch method tailored for CMFD to localize copy-move regions. In contrast to existing deep models, our scheme aims to seek explicit and reliable point-to-point matching between source and target regions using features extracted from high-resolution scales. Further, we develop a manipulation region location branch for source/target separation. The proposed CMFD framework is completely differentiable and can be trained in an end-to-end manner. Extensive experimental results demonstrate the high generalizability of our method to different copy-move contents, and the proposed scheme achieves significantly better performance than existing approaches.	Translated: 2023-08-09 13:05:46 Published: 2023-08-08
# Adding Why to What? Analyses of an Everyday Explanation ( http://arxiv.org/abs/2308.04187v1 ) License: see link	Lutz Terfloth, Michael Schaffer, Heike M. Buhl, Carsten Schulte	In XAI it is important to consider that, in contrast to explanations for professional audiences, one cannot assume common expertise when explaining for laypeople. But such explanations between humans vary greatly, making it difficult to research commonalities across explanations. We used the dual nature theory, a techno-philosophical approach, to cope with these challenges. According to it, one can explain, for example, an XAI's decision by addressing its dual nature: by focusing on the Architecture (e.g., the logic of its algorithms) or the Relevance (e.g., the severity of a decision, the implications of a recommendation). We investigated 20 game explanations using the theory as an analytical framework. We elaborate how we used the theory to quickly structure and compare explanations of technological artifacts. We supplemented results from analyzing the explanation contents with results from a video recall to explore how explainers justified their explanation. We found that explainers were focusing on the physical aspects of the game first (Architecture) and only later on aspects of the Relevance. Reasoning in the video recalls indicated that explainers regarded the focus on the Architecture as important for structuring the explanation initially by explaining the basic components before focusing on more complex, intangible aspects. Shifting between addressing the two sides was justified by explanation goals, emerging misunderstandings, and the knowledge needs of the explainee. We discovered several commonalities that inspire future research questions which, if further generalizable, provide first ideas for the construction of synthetic explanations.	Translated: 2023-08-09 13:05:27 Published: 2023-08-08
# Iterative Sketching for Secure Coded Regression ( http://arxiv.org/abs/2308.04185v1 ) License: see link	Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III	In this work, we propose methods for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers in a centralized coded computing network. This results in a distributive iterative sketching approach for an $\ell_2$-subspace embedding, i.e., a new sketch is considered at each iteration. We also focus on the special case of the Subsampled Randomized Hadamard Transform, which we generalize to block sampling, and discuss how it can be modified in order to secure the data.	Translated: 2023-08-09 13:05:01 Published: 2023-08-08
# キラル特異点を用いたプラズマ共鳴のコヒーレント光-マター相互作用の増強と室温量子収率 Enhanced coherent light-matter interaction and room-temperature quantum yield of plasmonic resonances engineered by a chiral exceptional point ( http://arxiv.org/abs/2308.04239v1 ) ライセンス: Link先を確認	Yuwei Lu, Haoxiang Jiang, Renming Liu	(参考訳) プラズモニック共鳴の強い消散は量子操作に有害である。量子コヒーレンスを高めるために,光磁場の位相が量子状態を柔軟に操作する新しい自由度を提供するキラル例外点(CEP)で作動するフォトニックキャビティを統合することにより,プラズモン共鳴の局所状態密度(LDOS)を調整することを提案する。量子化数モード理論を用いて,提案するハイブリッドキャビティのldosが,cepを伴わない通常のプラズモニック・フォトニックキャビティと比較して最大8倍のエンハンスメントとマグニチュード・オブ・マグニチュード・ライン幅の狭さを伴うサブロレンツ型に進化できることを明らかにした。これにより、偏光状態の散逸が減少すると共にコヒーレントな光-物質相互作用が強化される。さらに,cepにおける量子収率の大幅な向上,ファノ干渉によるプラズモニック吸収の低減,スーパー散乱によるキャビティ放射の増大の2つのメカニズムを明らかにするために,固有モード分解に基づく散乱理論が存在する。また,cepにおける高量子収率は,量子エミッタの蛍光寿命を測定することで,cepにおける拡張ldosの実験的検証に有用であることがわかった。そこで本研究では,CEPを用いた環境下でのプラズマ共鳴が,オープン光共振器の非ハーモニティ性を利用して量子状態制御を探索し,センサ,分光,量子情報処理,量子コンピューティングのための高性能な量子デバイスを構築する上で,有望なプラットフォームとなることを示す。 Strong dissipation of plasmonic resonances is detrimental to quantum manipulation. To enhance the quantum coherence, we propose to tailor the local density of states (LDOS) of plasmonic resonances by integrating with a photonic cavity operating at a chiral exceptional point (CEP), where the phase of light field can offer a new degree of freedom to flexibly manipulate the quantum states. A quantized few-mode theory is employed to reveal that the LDOS of the proposed hybrid cavity can evolve into sub-Lorentzian lineshape, with order-of-magnitude linewidth narrowing and additionally a maximum of eightfold enhancement compared to the usual plasmonic-photonic cavity without CEP. This results in the enhanced coherent light-matter interaction accompanied by the reduced dissipation of polaritonic states. 
Furthermore, a scattering theory based on eigenmode decomposition is presented to elucidate two mechanisms responsible for the significant improvement of quantum yield at the CEP: the reduction of plasmonic absorption by Fano interference and the enhancement of cavity radiation through superscattering. Importantly, we find the latter allows achieving a near-unity quantum yield at room temperature; in return, a high quantum yield is beneficial for experimentally verifying the enhanced LDOS at the CEP by measuring the fluorescence lifetime of a quantum emitter. Therefore, our work demonstrates that plasmonic resonances in a CEP-engineered environment can serve as a promising platform for exploring quantum-state control by virtue of the non-Hermiticity of open optical resonators and for building high-performance quantum devices for sensing, spectroscopy, quantum information processing and quantum computing.	翻訳日:2023-08-09 12:57:13 公開日:2023-08-08
# コンフォメーション予測による無線チャネル上の信頼性不確実性定量化を用いたフェデレーション推定 Federated Inference with Reliable Uncertainty Quantification over Wireless Channels via Conformal Prediction ( http://arxiv.org/abs/2308.04237v1 ) ライセンス: Link先を確認	Meiyi Zhu, Matteo Zecchin, Sangwoo Park, Caili Guo, Chunyan Feng, Osvaldo Simeone	(参考訳) デバイスとサーバが事前訓練されたモデルを共有する設定を考える。サーバはモデルが与えられたら、新しい入力を推論したい。デバイスは、以前トレーニングに使用されていなかったデータにアクセスでき、共通の無線チャネルを介してサーバと通信することができる。デバイスが新しい入力にアクセスできない場合、デバイスからサーバへの通信は、サーバにおける推論決定の質を高めることができるのか? 最近の研究では、デバイス間通信を利用してサーバの決定の信頼性を向上させるfederated conformal prediction(cp)が導入されている。連合CPでは、デバイスがローカルデータ上で共有事前学習モデルによって得られた損失に関するサーバ情報と通信し、サーバは、この情報を利用して決定間隔や設定を校正し、予め定義された目標信頼性レベルに正しい回答を含むことが保証される。以前の作業ではノイズのない通信を想定しており、デバイスは1つの実数をサーバに通信できる。本稿では,無線環境下での初となるフェデレーションCPについて検討する。本稿では,タイプベース多重アクセス(TBMA)と新しい量子補正戦略に基づく新しいプロトコルWFCPを提案する。 WFCPは、サーバが生成した予測セットのカバレッジに関して、正式な信頼性を保証することが証明されている。計算結果を用いて、既存の連合CP方式のデジタル実装に対するWFCPの顕著なアドバンテージを、特に限られた通信資源や多数のデバイスで示している。 Consider a setting in which devices and a server share a pre-trained model. The server wishes to make an inference on a new input given the model. Devices have access to data, previously not used for training, and can communicate to the server over a common wireless channel. If the devices have no access to the new input, can communication from devices to the server enhance the quality of the inference decision at the server? Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. 
In this paper, we study federated CP in a wireless setting for the first time. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or a large number of devices.	翻訳日:2023-08-09 12:56:38 公開日:2023-08-08
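The calibration idea behind conformal prediction can be illustrated in its simplest centralized, noise-free form. The sketch below is standard split conformal prediction, not the WFCP protocol: it ignores TBMA, channel noise, and the quantile correction described in the abstract. The function names, the score `1 - p`, and the toy numbers are illustrative assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold gives the target coverage,
    so infinity is returned (the prediction set then contains every label).
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(scores)[rank - 1]


def prediction_set(probs, qhat):
    """Keep every label whose nonconformity score 1 - p is at most qhat."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]
```

In the federated setting the calibration scores live on the devices, so the server has to estimate this quantile from what the devices transmit rather than sort the scores directly.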
# 合成子レースデータにおけるGANを用いた画像間変換の比較検討 A Comparative Study of Image-to-Image Translation Using GANs for Synthetic Child Race Data ( http://arxiv.org/abs/2308.04232v1 ) ライセンス: Link先を確認	Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran	(参考訳) データにおける民族多様性の欠如は、文献における顔認識技術の限界要因となっている。これは、データサンプルが不足している子供に特に当てはまり、成人データに基づいて訓練されたマシンビジョンアルゴリズムを子供に適応させようとする際の課題である。本研究では,画像から画像への変換を利用して異なる人種のデータを合成し,児童の顔データの民族性を調整することを提案する。ピク2ピク、サイクガン、カットネットワークという3つの異なる画像から画像へのニューラルネットワーク手法を比較し、コーカサス的児童データとアジアの児童データ変換を実装した。画像から画像への変換手法を用いて、幅広い民族多様性を持つ様々な合成子データサンプルを作成することが可能であることを示す。 The lack of ethnic diversity in data has been a limiting factor of face recognition techniques in the literature. This is particularly the case for children where data samples are scarce and presents a challenge when seeking to adapt machine vision algorithms that are trained on adult data to work on children. This work proposes the utilization of image-to-image transformation to synthesize data of different races and thus adjust the ethnicity of children's face data. We consider ethnicity as a style and compare three different Image-to-Image neural network based methods, specifically pix2pix, CycleGAN, and CUT networks to implement Caucasian child data and Asian child data conversion. Experimental validation results on synthetic data demonstrate the feasibility of using image-to-image transformation methods to generate various synthetic child data samples with broader ethnic diversity.	翻訳日:2023-08-09 12:56:10 公開日:2023-08-08
# OpinionConv: 根拠のある意見に基づく会話型製品検索 OpinionConv: Conversational Product Search with Grounded Opinions ( http://arxiv.org/abs/2308.04226v1 ) ライセンス: Link先を確認	Vahid Sadiri Javadi, Martin Potthast, Lucie Flek	(参考訳) 製品を探すとき、他人の意見はインフォームドな意思決定において重要な役割を果たす。製品に関する主観的な経験は貴重な情報源になり得る。これはまた、顧客とセールスアシスタントが製品に関する事実や意見を交換する販売会話においても当てはまる。しかし、そのような会話のためにAIを訓練することは、言語モデルが実世界の経験を欠いており真の意見を持たないという事実によって複雑になる。製品レビューを製品意見の豊富な情報源として活用し、真に主観的な物語の中で対話型AIを基礎づけることでこの問題に対処する。 OpinionConvでは,営業会話をシミュレートする最初の対話型AIを開発した。生成した会話を検証するために,生成した意見が現実的であると認識されることを示すユーザスタディを複数実施する。また, 評価者は意思決定の根拠としての意見の重要性も確認した。 When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations is complicated by the fact that language models do not possess authentic opinions for their lack of real-world experience. We address this problem by leveraging product reviews as a rich source of product opinions to ground conversational AI in true subjective narratives. With OpinionConv, we develop the first conversational AI for simulating sales conversations. To validate the generated conversations, we conduct several user studies showing that the generated opinions are perceived as realistic. Our assessors also confirm the importance of opinions as an informative basis for decision-making.	翻訳日:2023-08-09 12:55:54 公開日:2023-08-08
# Doorbellのカメラは年を重ねるにつれて認識されるのか? Will your Doorbell Camera still recognize you as you grow old ( http://arxiv.org/abs/2308.04224v1 ) ライセンス: Link先を確認	Wang Yao, Muhammad Ali Farooq, Joseph Lemley and Peter Corcoran	(参考訳) ドアベルカメラのような低消費電力の消費者向けデバイスに対するロバスト認証は、価値がありユニークな課題である。本研究は,顔認証法の性能に及ぼす年齢と加齢の影響を考察する。 AgeDBとMorph-IIの2つの公開年齢データセットがこの作業のベースラインとして使用されている。様々な年齢効果を持つ高品質な顔画像の集合を拡大するために、フォトリアリスティックな年齢変換法が用いられている。そして、これらの合成老化データが高速深層学習に基づく顔認識モデルに与える影響を、受信者動作特性(ROC)曲線や一致スコア分布を含む様々な指標を用いて定量化する。実験結果から, 顔認証手法の長期化は依然として重要な課題であることが明らかとなった。 Robust authentication for low-power consumer devices such as doorbell cameras poses a valuable and unique challenge. This work explores the effect of age and aging on the performance of facial authentication methods. Two public age datasets, AgeDB and Morph-II have been used as baselines in this work. A photo-realistic age transformation method has been employed to augment a set of high-quality facial images with various age effects. Then the effect of these synthetic aging data on the high-performance deep-learning-based face recognition model is quantified by using various metrics including Receiver Operating Characteristic (ROC) curves and match score distributions. Experimental results demonstrate that long-term age effects are still a significant challenge for the state-of-the-art facial authentication method.	翻訳日:2023-08-09 12:55:40 公開日:2023-08-08
# リアルタイムプログレッシブラーニング:ニューラルネットワークに基づく選択記憶を用いた相互強化学習と制御 Real-Time Progressive Learning: Mutually Reinforcing Learning and Control with Neural-Network-Based Selective Memory ( http://arxiv.org/abs/2308.04223v1 ) ライセンス: Link先を確認	Yiming Fei, Jiangang Li, Yanan Li	(参考訳) 記憶は、学習の基盤として、知識の記憶、更新、および忘れることを決定し、さらに学習の効率を決定づける。リアルタイム・プログレッシブ・ラーニング(RTPL)と呼ばれる,放射基底関数ニューラルネットワーク(RBFNN)に基づく学習制御方式を,安定性と閉ループ性能を保証したシステムの未知のダイナミクスを学習するために提案する。適応型神経制御(ANC)における確率勾配降下(SGD)更新法則の代わりに、RTPLは選択型メモリ再帰最小二乗法(SMRLS)アルゴリズムを採用し、RBFNNの重みを更新する。 SMRLSを介してRBFNNの近似能力を特徴空間上に均一に分散し、SGD法の受動的知識忘れ現象を抑制する。その後、RTPLは古典的なANCに対して以下のメリットを達成します。 1)低レベル持続励起(PE)下での学習能力保証 2)学習性能の向上(学習速度,精度,一般化能力) 3)実用用途におけるRTPLの堅牢性を確保する低利得要件。さらに、rtplベースの学習と制御は、タスク実行中に徐々に強化され、長期学習制御タスクに適合する。例えば、RTPLは適応フィードフォワードコントローラであるRBFNNを持つ非線形システムのクラスにおけるトラッキング制御問題に対処するために使用される。対応する理論解析およびシミュレーション研究はrtplの有効性を示す。 Memory, as the basis of learning, determines the storage, update and forgetting of the knowledge and further determines the efficiency of learning. Featured with a mechanism of memory, a radial basis function neural network (RBFNN) based learning control scheme named real-time progressive learning (RTPL) is proposed to learn the unknown dynamics of the system with guaranteed stability and closed-loop performance. Instead of the stochastic gradient descent (SGD) update law in adaptive neural control (ANC), RTPL adopts the selective memory recursive least squares (SMRLS) algorithm to update the weights of the RBFNN. Through SMRLS, the approximation capabilities of the RBFNN are uniformly distributed over the feature space and thus the passive knowledge forgetting phenomenon of SGD method is suppressed. Subsequently, RTPL achieves the following merits over the classical ANC: 1) guaranteed learning capability under low-level persistent excitation (PE), 2) improved learning performance (learning speed, accuracy and generalization capability), and 3) low gain requirement ensuring robustness of RTPL in practical applications. 
Moreover, the RTPL based learning and control will gradually reinforce each other during the task execution, making it appropriate for long-term learning control tasks. As an example, RTPL is used to address the tracking control problem of a class of nonlinear systems with RBFNN being an adaptive feedforward controller. Corresponding theoretical analysis and simulation studies demonstrate the effectiveness of RTPL.	翻訳日:2023-08-09 12:55:29 公開日:2023-08-08
# GNNモデルにおけるグラフ注意に基づく説明の意味解釈と検証 Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models ( http://arxiv.org/abs/2308.04220v1 ) ライセンス: Link先を確認	Efimia Panagiotaki, Daniele De Martini, Lars Kunze	(参考訳) 本研究では,グラフニューラルネットワーク(GNN)に基づくモデルの説明可能性を高めるために意味的注意の応用について検討し,意味的インフォームド摂動を導入し,予測特徴量とモデル精度の相関性を確立する手法を提案する。 Graph Deep Learning(GDL)は、複雑な特徴や関係を簡潔に記述するために柔軟なグラフ構造を活用する、シーン解釈のようなタスクのための有望な分野として登場した。 eXplainable AI(XAI)で使用される従来の説明可能性手法は、そのような構造に直接適用できないため、グラフ固有のアプローチが導入された。注意機構は、深層学習モデルにおける入力特徴の重要性を推定する上での有効性を示しており、GNN予測のための特徴に基づく説明を提供するために、これまで用いられてきた。これらの知見に基づいて,注意重みを意味的ソートされた特徴集合の重要性指標として用いることを検討する既存の注意度ベースのグラフ説明可能性手法を拡張する。予測注目度分布の挙動をモデル精度と相関して解析することにより、GNNモデルの挙動に関する特徴的重要性に関する貴重な洞察を得る。提案手法をlidar pointcloud推定モデルに適用し,高機能化に寄与する重要セマンティクスクラスを効果的に同定し,信頼性の高いポストホックセマンティクス記述を生成する。 In this work, we propose a methodology for investigating the application of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models, introducing semantically-informed perturbations and establishing a correlation between predicted feature-importance weights and model accuracy. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eXplainable AI (XAI) cannot be directly applied to such structures, graph-specific approaches are introduced. Attention mechanisms have demonstrated their efficacy in estimating the importance of input features in deep learning models and thus have been previously employed to provide feature-based explanations for GNN predictions. Building upon these insights, we extend existing attention-based graph-explainability methods investigating the use of attention weights as importance indicators of semantically sorted feature sets. 
By analysing the behaviour of the predicted attention-weight distribution in correlation with model accuracy, we gain valuable insights into feature importance with respect to the behaviour of the GNN model. We apply our methodology to a lidar pointcloud estimation model, successfully identifying key semantic classes that contribute to enhanced performance and effectively generating reliable post-hoc semantic explanations.	翻訳日:2023-08-09 12:55:03 公開日:2023-08-08
# AquaSAM:水中画像フォアグラウンドセグメンテーション AquaSAM: Underwater Image Foreground Segmentation ( http://arxiv.org/abs/2308.04218v1 ) ライセンス: Link先を確認	Muduo Xu, Jianhao Su, Yutao Liu	(参考訳) SAM(Segment Anything Model)は自然画像のセグメンテーションに革命をもたらしたが、それでも水中画像のパフォーマンスは制限されている。この研究は、様々な水中ターゲットのセグメンテーションのための汎用的な方法を作成することを目的として、水中画像上でSAMの成功を拡大する最初の試みであるAquaSAMを提示する。これを実現するために、SUIMデータセットで様々なラベルを自動的に分類し抽出することから始める。次に,サムを海中イメージセグメンテーションに適応させるための簡易な微調整法を開発した。人間のダイバーのような8つのセグメンテーションタスクを含む広範な実験を通して、AquaSAMは特にサンゴ礁のような硬いタスクにおいて、デフォルトのSAMモデルよりも優れていることを示した。 AquaSAMは、水中セグメンテーションにおける平均Dice similarity Coefficient(DSC)が7.13(%)改善され、mIoUの改善が平均8.27(%)改善された。 The Segment Anything Model (SAM) has revolutionized natural image segmentation, nevertheless, its performance on underwater images is still restricted. This work presents AquaSAM, the first attempt to extend the success of SAM on underwater images with the purpose of creating a versatile method for the segmentation of various underwater targets. To achieve this, we begin by classifying and extracting various labels automatically in SUIM dataset. Subsequently, we develop a straightforward fine-tuning method to adapt SAM to general foreground underwater image segmentation. Through extensive experiments involving eight segmentation tasks like human divers, we demonstrate that AquaSAM outperforms the default SAM model especially at hard tasks like coral reefs. AquaSAM achieves an average Dice Similarity Coefficient (DSC) of 7.13 (%) improvement and an average of 8.27 (%) on mIoU improvement in underwater segmentation tasks.	翻訳日:2023-08-09 12:54:39 公開日:2023-08-08
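For readers unfamiliar with the two metrics reported above, the following minimal sketch computes the Dice Similarity Coefficient and IoU for a single pair of binary masks. It assumes flat 0/1 sequences and treats two empty masks as a perfect match; both choices are illustrative conventions, not taken from the paper. mIoU is then the mean of the per-class IoU values.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


def mean_iou(per_class_masks):
    """Average IoU over (pred, target) mask pairs, one pair per class."""
    ious = [dice_and_iou(p, t)[1] for p, t in per_class_masks]
    return sum(ious) / len(ious)
```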
# リアルタイム合成支援のためのハイブリッド検索拡張生成 Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance ( http://arxiv.org/abs/2308.04215v1 ) ライセンス: Link先を確認	Xuchao Zhang, Menglin Xia, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle	(参考訳) 検索拡張モデルは、文脈理解を改善し、プライベートデータを統合し、幻覚を減らすことで、伝統的な言語モデルの強化に役立つ。しかし,大規模言語モデルの検索に要する処理時間は,合成支援などのリアルタイム応答を必要とするタスクに適用する際の課題となっている。この制限を克服するために,クライアントモデルとクラウドモデルを組み合わせたハイブリッド設定を利用するハイブリッド検索拡張生成(HybridRAG)フレームワークを提案する。 HybridRAGはクラウド上でLLM(Large Language Model)によって非同期に生成される検索拡張メモリを組み込んでいる。この検索強化メモリを統合することで、クライアントモデルはLLMの能力を利用して、非常に効果的な応答を生成する能力を得る。さらに、非同期メモリの統合により、クライアントモデルはクラウドからのメモリ同期を待つことなく、ユーザの要求に対してリアルタイムにレスポンスを提供することができる。 Wikitext と Pile のサブセットを用いた実験により,HybridRAG はクラウドベースの検索拡張 LLM よりも低レイテンシを実現し,クライアントのみのモデルよりも実用性が高いことがわかった。 Retrieval augmented models show promise in enhancing traditional language models by improving their contextual understanding, integrating private data, and reducing hallucination. However, the processing time required for retrieval augmented large language models poses a challenge when applying them to tasks that require real-time responses, such as composition assistance. To overcome this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework that leverages a hybrid setting that combines both client and cloud models. HybridRAG incorporates retrieval-augmented memory generated asynchronously by a Large Language Model (LLM) in the cloud. By integrating this retrieval augmented memory, the client model acquires the capability to generate highly effective responses, benefiting from the LLM's capabilities. Furthermore, through asynchronous memory integration, the client model is capable of delivering real-time responses to user requests without the need to wait for memory synchronization from the cloud. 
Our experiments on Wikitext and Pile subsets show that HybridRAG achieves lower latency than a cloud-based retrieval-augmented LLM, while outperforming client-only models in utility.	翻訳日:2023-08-09 12:54:22 公開日:2023-08-08
# FLIRT: フィードバックループのコンテキスト内でのレッドチーム FLIRT: Feedback Loop In-context Red Teaming ( http://arxiv.org/abs/2308.04265v1 ) ライセンス: Link先を確認	Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta	(参考訳) 警告: 本論文は不適切または不快な内容を含む。生成モデルが様々なアプリケーションで一般公開されるようになるにつれて、これらのモデルの脆弱性のテストと分析が最優先事項となっている。ここでは,特定のモデルを評価し,その脆弱性を安全でない不適切なコンテンツ生成に対して公開する自動レッドチームフレームワークを提案する。私たちのフレームワークは、レッドチームモデルに対するフィードバックループでコンテキスト内学習を使用し、それらを安全でないコンテンツ生成にトリガーします。本稿では,テキストから画像へのモデルの効果的かつ多様なプロンプトを自動的に学習するための,コンテキスト内攻撃戦略を提案する。提案手法は, ベースラインアプローチと比較して, 安定拡散(SD)モデルにおいて, 安全性が向上した場合でも, 脆弱性の暴露に有効であることが実証された。さらに,提案フレームワークが,テキスト対テキストモデルのレッド・ペアリングに有効であることを実証し,従来報告した数に比べて有毒な応答生成率を有意に高めることを示した。 Warning: this paper contains content that may be inappropriate or offensive. As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. Our framework uses in-context learning in a feedback loop to red team models and trigger them into unsafe content generation. We propose different in-context attack strategies to automatically learn effective and diverse adversarial prompts for text-to-image models. Our experiments demonstrate that compared to baseline approaches, our proposed strategy is significantly more effective in exposing vulnerabilities in Stable Diffusion (SD) model, even when the latter is enhanced with safety features. Furthermore, we demonstrate that the proposed framework is effective for red teaming text-to-text models, resulting in significantly higher toxic response generation rate compared to previously reported numbers.	翻訳日:2023-08-09 12:48:55 公開日:2023-08-08
# BarlowRL:データ効率の良い強化学習のためのバローツイン BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning ( http://arxiv.org/abs/2308.04263v1 ) ライセンス: Link先を確認	Omer Veysel Cagatan	(参考訳) 本稿では,Barlow Twins自己教師型学習フレームワークとDER(Data-Efficient Rainbow)アルゴリズムを組み合わせたデータ効率強化学習エージェントBarlowRLを紹介する。 BarlowRLはAtari 100kベンチマークでDERとそれと対照的なCURLの両方を上回っている。 BarlowRLは空間全体に広がる情報を強制することによって次元的崩壊を避ける。これにより、RLアルゴリズムは、最終的に顕著なパフォーマンスをもたらす一様拡散状態表現を利用することができる。 Barlow TwinsとDERの統合により、データ効率が向上し、RLタスクのパフォーマンスが向上する。 BarlowRLは、RLアルゴリズムを改善するために自己教師付き学習技術を導入する可能性を示している。 This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread to the whole space. This helps RL algorithms to utilize uniformly spread state representation that eventually results in a remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance in the RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.	翻訳日:2023-08-09 12:48:37 公開日:2023-08-08
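The Barlow Twins objective that BarlowRL builds on can be written compactly: standardize two batches of embeddings, form their cross-correlation matrix, push the diagonal towards 1 and the off-diagonal entries towards 0. The pure-Python sketch below illustrates that objective only. How it is combined with the DER loss is not stated in the abstract and is not shown here, and the weight `lam=0.005` is an illustrative default.

```python
import math


def standardize(z):
    """Give each feature column zero mean and unit variance across the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def barlow_twins_loss(z1, z2, lam=0.005):
    """Sum of (1 - C_ii)^2 plus lam times the sum of C_ij^2 for i != j.

    C is the cross-correlation matrix between the two standardized
    embedding batches z1 and z2 (lists of equal-length feature rows).
    """
    a, b = standardize(z1), standardize(z2)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(a[k][i] * b[k][j] for k in range(n)) / n
            loss += (1.0 - c) ** 2 if i == j else lam * c * c
    return loss
```

The off-diagonal term is what penalizes redundant features: two identical feature columns produce a nonzero loss even when both views agree exactly, which matches the abstract's point about spreading information over the whole space.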
# SDLFormer: 高速MR画像再構成のための疎高密度局所変換器 SDLFormer: A Sparse and Dense Locality-enhanced Transformer for Accelerated MR Image Reconstruction ( http://arxiv.org/abs/2308.04262v1 ) ライセンス: Link先を確認	Rahul G.S., Sriprabha Ramnarayanan, Mohammad Al Fahim, Keerthi Ram, Preejith S.P, and Mohanasankar Sivaprakasam	(参考訳) トランスフォーマーは、空間領域における非局所的な領域関係を学習する能力のため、畳み込みニューラルネットワークの有効な代替手段として登場した。トランスの自己アテンション機構により、トランスフォーマーは画像の長距離依存性を捉えることができ、画像領域におけるアンダーサンプリングの効果が非局所的であるため、mri画像再構成の高速化に望ましい。計算効率にも拘わらず、ウィンドウベースのトランスフォーマーはイメージウィンドウの範囲内に限定されるため、レセプティブフィールドの制限を受ける。拡張注意機構と畳み込み機構を統合し,mri画像再構成を高速化する窓型トランスフォーマーネットワークを提案する。提案手法は,mri画像再構成のための低レベル変換不変特性を学習するために,遠方近傍の画素関係を強化し,トランスフォーマモジュール内に奥行き方向畳み込みを導入するために,拡張および密集した近傍注意トランスから構成する。提案モデルは, 自己監督的に訓練される。 k-space スプリッティングに基づく自己教師型学習における4xおよび5xアンダーサンプリングと対比した冠状骨PD, 冠状骨PDFS, 軸方向T2に対する多コイルMRIアクセラレーションの広範な実験を行った。本手法は他の再構築アーキテクチャと並列ドメイン自己教師付き学習ベースラインとの比較を行った。その結果,提案モデルが改善率を示すことがわかった。 (i)PSNRでは約1.40dB、他のアーキテクチャでは平均0.028dBである。 (ii)psnrでは約1.44db、並列ドメイン自己教師付き学習では約0.029db。コードはhttps://github.com/rahul-gs-16/sdlformer.gitで入手できる。 Transformers have emerged as viable alternatives to convolutional neural networks owing to their ability to learn non-local region relationships in the spatial domain. The self-attention mechanism of the transformer enables transformers to capture long-range dependencies in the images, which might be desirable for accelerated MRI image reconstruction as the effect of undersampling is non-local in the image domain. Despite its computational efficiency, the window-based transformers suffer from restricted receptive fields as the dependencies are limited to within the scope of the image windows. We propose a window-based transformer network that integrates dilated attention mechanism and convolution for accelerated MRI image reconstruction. 
The proposed network consists of dilated and dense neighborhood attention transformers to enhance the distant neighborhood pixel relationship and introduces depth-wise convolutions within the transformer module to learn low-level translation invariant features for accelerated MRI image reconstruction. The proposed model is trained in a self-supervised manner. We perform extensive experiments for multi-coil MRI acceleration for coronal PD, coronal PDFS and axial T2 contrasts with 4x and 5x under-sampling in self-supervised learning based on k-space splitting. We compare our method against other reconstruction architectures and the parallel domain self-supervised learning baseline. Results show that the proposed model exhibits improvement margins of (i) around 1.40 dB in PSNR and around 0.028 in SSIM on average over other architectures and (ii) around 1.44 dB in PSNR and around 0.029 in SSIM over parallel domain self-supervised learning. The code is available at https://github.com/rahul-gs-16/sdlformer.git	翻訳日:2023-08-09 12:48:27 公開日:2023-08-08
# PaSSTと大規模音声キャプションデータセットを用いた自然言語に基づく音声検索の高度化 Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets ( http://arxiv.org/abs/2308.04258v1 ) ライセンス: Link先を確認	Paul Primus, Khaled Koutini, Gerhard Widmer	(参考訳) 本研究は,事前学習されたテキストとスペクトログラム変換器に基づく音声検索システムを提案する。提案手法は,異なるモダリティの関連事例が近接する共有音声キャプション空間に録音とテキスト記述を投影する。本研究では,システムの各コンポーネントが検索性能に与える影響を系統的分析により検討する。その結果,音声埋め込みのためのセルフアテンションベースのオーディオエンコーダと,事前学習における人間生成および合成データセットの利用という,性能向上に重要な役割を果たす2つのコンポーネントを特定した。さらに,ClothoV2のキャプションを利用可能なキーワードで拡張し,その多様性を高める実験を行ったが,これはわずかな改善にしか至らなかった。当システムは2023年のDCASEチャレンジで第1位にランクインし, ClothoV2ベンチマークの現在の最先端をmAP@10で5.6 pp上回った。 This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers. Our method projects recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. Through a systematic analysis, we examine how each component of the system influences retrieval performance. As a result, we identify two key components that play a crucial role in driving performance: the self-attention-based audio encoder for audio embedding and the utilization of additional human-generated and synthetic data sets during pre-training. We further experimented with augmenting ClothoV2 captions with available keywords to increase their variety; however, this only led to marginal improvements. Our system ranked first in the 2023 DCASE Challenge, and it outperforms the current state of the art on the ClothoV2 benchmark by 5.6 pp mAP@10.	翻訳日:2023-08-09 12:47:58 公開日:2023-08-08
# CLASSLA-Stanza:南スラヴ語の言語処理の次のステップ CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages ( http://arxiv.org/abs/2308.04255v1 ) ライセンス: Link先を確認	Luka Terčon, Nikola Ljubešić	(参考訳) 本稿では,Stanza自然言語処理パイプラインに基づく,南スラヴ語の自動言語アノテーションのためのパイプラインであるCLASSLA-Stanzaについて述べる。我々は、Stanzaに対するCLASSLA-Stanzaの主な改善点を説明し、パイプラインの最新2.1リリースのモデルトレーニングプロセスの詳細を説明する。また、異なる言語や変種についてパイプラインが生成したパフォーマンススコアも報告する。 CLASSLA-Stanzaは、サポートするすべての言語で一貫して高いパフォーマンスを示し、サポート対象のすべてのタスクにおいて、親パイプラインであるStanzaを上回るか、または拡張している。また、Webデータの効率的な処理を可能にするパイプラインの新機能と、その実装に繋がった理由についても紹介する。 We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza, and give a detailed description of the model training process for the latest 2.1 release of the pipeline. We also report performance scores produced by the pipeline for different languages and varieties. CLASSLA-Stanza exhibits consistently high performance across all the supported languages and outperforms or expands its parent pipeline Stanza at all the supported tasks. We also present the pipeline's new functionality enabling efficient processing of web data and the reasons that led to its implementation.	翻訳日:2023-08-09 12:47:33 公開日:2023-08-08
# 多焦点レンズカメラによるブラア認識距離推定 Blur aware metric depth estimation with multi-focus plenoptic cameras ( http://arxiv.org/abs/2308.04252v1 ) ライセンス: Link先を確認	Mathieu Labussière, Céline Teulière, Omar Ait-Aider	(参考訳) 従来のカメラはシーンの1つの視点のみをキャプチャするが、plenopticまたはlight-fieldカメラは1つのスナップショットで空間的および角的情報をキャプチャし、単一の取得から深さを推定できる。本稿では,多焦点カメラからの生画像のみを用いた新しい距離深度推定アルゴリズムを提案する。提案手法は,焦点長の異なる複数のマイクロレンズを用いたマルチフォーカス構成に特に適合する。 BLADEのアプローチの主な目的は,デフォーカスステレオ画像の一致度とデフォーカス手がかりの両組み合わせによる相違度推定を改善することである。したがって,従来は欠点とされていたぼやけ情報を活用する。スケール係数までの深さ推定を提供するデフォーカスボケを含む逆射影モデルを明示的に導出する。次に, 逆モデルを校正する手法を提案する。したがって、深度スケーリングを考慮に入れ、正確なメートル深度推定を行う。その結果,Defocus cuesの導入により深さ推定が向上した。筆者らは,3次元ライダースキャナーを用いて,相対的な深度推定設定と実世界の3次元複雑なシーンにおけるフレームワークと深度スケーリングキャリブレーションの有効性を実証した。 While a traditional camera only captures one point of view of a scene, a plenoptic or light-field camera, is able to capture spatial and angular information in a single snapshot, enabling depth estimation from a single acquisition. In this paper, we present a new metric depth estimation algorithm using only raw images from a multi-focus plenoptic camera. The proposed approach is especially suited for the multi-focus configuration where several micro-lenses with different focal lengths are used. The main goal of our blur aware depth estimation (BLADE) approach is to improve disparity estimation for defocus stereo images by integrating both correspondence and defocus cues. We thus leverage blur information where it was previously considered a drawback. We explicitly derive an inverse projection model including the defocus blur providing depth estimates up to a scale factor. A method to calibrate the inverse model is then proposed. We thus take into account depth scaling to achieve precise and accurate metric depth estimates. Our results show that introducing defocus cues improves the depth estimation. 
We demonstrate the effectiveness of our framework and depth scaling calibration on relative depth estimation setups and on real-world 3D complex scenes with ground truth acquired with a 3D lidar scanner.	翻訳日:2023-08-09 12:47:12 公開日:2023-08-08
# minddiffuser: 意味的および構造的拡散を伴うヒト脳活動からの画像再構成制御 MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion ( http://arxiv.org/abs/2308.04249v1 ) ライセンス: Link先を確認	Yizhuo Lu, Changde Du, Qiongyi zhou, Dianpeng Wang, Huiguang He	(参考訳) 脳の録音から視覚刺激を再構築することは有意義で難しい課題である。特に、精密かつ制御可能な画像再構成の達成は、脳-コンピュータインタフェースの進歩と活用を促進する上で非常に重要である。複雑な画像再構成技術の進歩にもかかわらず、この課題は、画像刺激と意味(概念と対象)と構造(位置、方向、大きさ)の結合的なアライメントを達成することにある。上記の問題に対処するため,MindDiffuserと呼ばれる2段階画像再構成モデルを提案する。ステージ1では、VQ-VAE潜在表現とfMRIからデコードされたCLIPテキスト埋め込みが安定拡散され、セマンティック情報を含む予備画像が生成される。ステージ2では、fMRIからデコードされたCLIP視覚特徴を監視情報として利用し、バックプロパゲーションによりステージ1でデコードされた2つの特徴ベクトルを継続的に調整し、構造情報を整列させる。定性的および定量的解析の結果から,本モデルがNatural Scenes Dataset (NSD) の最先端モデルを上回ったことが明らかとなった。その後の実験結果は、そのモデルの神経生物学的妥当性を裏付けるものであり、対応する脳反応と一致するマルチモーダル特徴の解釈可能性によって証明された。 Reconstructing visual stimuli from brain recordings has been a meaningful and challenging task. Especially, the achievement of precise and controllable image reconstruction bears great significance in propelling the progress and utilization of brain-computer interfaces. Despite the advancements in complex image reconstruction techniques, the challenge persists in achieving a cohesive alignment of both semantic (concepts and objects) and structure (position, orientation, and size) with the image stimuli. To address the aforementioned issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. 
The results of both qualitative and quantitative analyses demonstrate that our model surpasses the current state-of-the-art models on the Natural Scenes Dataset (NSD). The subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal features employed, which align with the corresponding brain responses.	翻訳日:2023-08-09 12:46:28 公開日:2023-08-08
# 単語埋め込みを用いたグロスアライメント Gloss Alignment Using Word Embeddings ( http://arxiv.org/abs/2308.04248v1 ) ライセンス: Link先を確認	Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, Richard Bowden	(参考訳) 手話データセットのキャプチャとアノテーションは、時間とコストのかかるプロセスである。現在のデータセットは、制約のない手話翻訳(SLT)モデルをうまくトレーニングするには、桁違いに小さすぎる。その結果、研究は、手話通訳者と関連する音声字幕の両方からなる大規模トレーニングデータのソースとして、テレビ放送コンテンツに目を向けるようになった。しかし、手話アノテーションの欠如は、このデータのユーザビリティを制限し、手話スポッティングのような自動アノテーション技術の開発につながった。これらのスポッティングは、字幕ではなくビデオに合わせて位置づけられており、しばしば字幕とスポットされた手話のミスアライメントをもたらす。本論文では,話し言葉の大規模言語モデルを用いて,スポッティングを対応する字幕に合わせる手法を提案する。単一のモダリティを用いることで,計算コストが低く,既存のアライメント手法と組み合わせて利用することができる。本稿では, mDGS および BOBSL データセット上で本手法の有効性を定量的に検証し, 単語アライメントにおいて最大33.22 BLEU-1 スコアを回復する。 Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained sign language translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the mDGS and BOBSL datasets, recovering up to a 33.22 BLEU-1 score in word alignment.	翻訳日:2023-08-09 12:45:52 公開日:2023-08-08
# aicsd: 意味セグメンテーションのための適応的クラス間類似性蒸留 AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation ( http://arxiv.org/abs/2308.04243v1 ) ライセンス: Link先を確認	Amir M. Mansourian, Rozhan Ahmadi, Shohreh Kasaei	(参考訳) 近年、深層ニューラルネットワークはコンピュータビジョンタスクにおいて顕著な精度を実現している。特にセマンティックセグメンテーションのような密集した予測タスクにおいて、推論時間が重要な要因となるため、知識蒸留は軽量な学生ネットワークの精度向上に成功している。既存の手法は、チャンネルや異なるクラスの情報を無視することが多い。これらの制約を克服するために,知識蒸留のためのICSD (Inter-Class similarity Distillation) を提案する。提案手法は,クラス内のクラス内分布をネットワーク出力から独立に計算することにより,教師ネットワークから生徒ネットワークへ高次関係を伝達する。その後、各クラスの分布間のkl発散を用いて蒸留のためのクラス間類似度行列を計算する。提案手法の有効性をさらに向上するため,適応損失重み付け(ALW)トレーニング戦略を提案する。既存の方法とは異なり、alw戦略は教師の予測の誤りを考慮し、訓練プロセスの終了に向けて教師ネットワークの影響を徐々に減少させる。セマンティックセグメンテーションのためのよく知られた2つのデータセットであるCityscapesとPascal VOC 2012で実施された大規模な実験は、mIoUとピクセル精度の観点から提案手法の有効性を検証する。提案手法は, 定量評価と定性評価の両方により, 既存の知識蒸留法よりも優れていた。コードは、https://github.com/AmirMansurian/AICSDで入手できる。 In recent years, deep neural networks have achieved remarkable accuracy in computer vision tasks. With inference time being a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation has emerged as a successful technique for improving the accuracy of lightweight student networks. The existing methods often neglect the information in channels and among different classes. To overcome these limitations, this paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs. This is followed by calculating inter-class similarity matrices for distillation using KL divergence between distributions of each pair of classes. To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is proposed. 
Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network towards the end of the training process to account for errors in the teacher's predictions. Extensive experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most existing knowledge distillation methods, as demonstrated by both quantitative and qualitative evaluations. Code is available at: https://github.com/AmirMansurian/AICSD	翻訳日:2023-08-09 12:45:32 公開日:2023-08-08
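The inter-class similarity idea in the AICSD abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract only states that intra-class distributions are computed per class and compared pairwise with KL divergence, and that the teacher's influence is reduced towards the end of training. The squared-error comparison of teacher and student matrices, the linear decay in `teacher_weight`, and all function names are assumptions made for this sketch.

```python
import math


def softmax(values):
    """Normalize one class's scores over all pixels into a distribution."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def inter_class_similarity(logits):
    """logits: list of C lists, each holding one class's scores over all pixels.

    Returns a C x C matrix of KL divergences between per-class distributions.
    """
    dists = [softmax(class_scores) for class_scores in logits]
    return [[kl_divergence(p, q) for q in dists] for p in dists]


def icsd_loss(teacher_logits, student_logits):
    """Assumed loss: mean squared difference between the two similarity matrices."""
    t = inter_class_similarity(teacher_logits)
    s = inter_class_similarity(student_logits)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def teacher_weight(epoch, total_epochs):
    """Assumed schedule: the distillation weight decays linearly to zero."""
    return max(0.0, 1.0 - epoch / total_epochs)
```

In a training loop, the total loss would then be something like the segmentation loss plus `teacher_weight(epoch, total_epochs) * icsd_loss(...)`; the actual weighting rule used by the paper may differ.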
# AutoPCF: 大規模言語モデルを用いた効率的な製品カーボンフットプリント会計 AutoPCF: Efficient Product Carbon Footprint Accounting with Large Language Models ( http://arxiv.org/abs/2308.04241v1 ) ライセンス: Link先を確認	Zhu Deng, Jinjie Liu, Biao Luo, Can Yuan, Qingrun Yang, Lei Xiao, Wenwen Zhou, Zhu Liu	(参考訳) 製品カーボンフットプリント(PCF)はサプライチェーンの脱炭素化に不可欠であり、製品ライフサイクル中のすべての活動によって引き起こされる直接的および間接的な温室効果ガス排出量を測定する。しかし、PCF会計は、しばしば専門知識とライフサイクルモデルを構築するためのかなりの時間を必要とする。本研究では,製品の「cradle-to-gate」ライフサイクルのモデル化と入出力インベントリデータの生成における5つの大規模言語モデル(LLM)の創発的能力を検証・比較し,一般化されたPCF知識データベースとしての限界を明らかにする。 LLMを活用することで,計算パラメータの自動マッチングにディープラーニングアルゴリズムも適用し,最終的にPCFを計算する,AI駆動の自動PCF会計フレームワークAutoPCFを提案する。 AutoPCFフレームワークを用いて3つのケース製品のカーボンフットプリントを推定した結果,モデリング時間を数日から数分に大幅に短縮し,PCFの自動モデリングと推定を実現する可能性を示した。 The product carbon footprint (PCF) is crucial for decarbonizing the supply chain, as it measures the direct and indirect greenhouse gas emissions caused by all activities during the product's life cycle. However, PCF accounting often requires expert knowledge and significant time to construct life cycle models. In this study, we test and compare the emergent ability of five large language models (LLMs) in modeling the 'cradle-to-gate' life cycles of products and generating the inventory data of inputs and outputs, revealing their limitations as a generalized PCF knowledge database. By utilizing LLMs, we propose an automatic AI-driven PCF accounting framework, called AutoPCF, which also applies deep learning algorithms to automatically match calculation parameters, and ultimately calculate the PCF. The results of estimating the carbon footprint for three case products using the AutoPCF framework demonstrate its potential in achieving automatic modeling and estimation of PCF with a large reduction in modeling time from days to minutes.	翻訳日:2023-08-09 12:45:07 公開日:2023-08-08
# 持続的行動による可変時間離散化を伴うアクター・クリティック Actor-Critic with variable time discretization via sustained actions ( http://arxiv.org/abs/2308.04299v1 ) ライセンス: Link先を確認	Jakub Łyskawa, Paweł Wawrzyński	(参考訳) 強化学習(RL)法は離散時間で機能する。ロボット制御のような本質的に連続した問題にRLを適用するには、特定の時間離散化を定義する必要がある。これは、訓練が容易になりうる粗い時間制御と、最終的な性能の向上を可能にしうるより細かい時間制御との間の選択である。本研究では,異なる時間離散化設定の利点を組み合わせたオフポリシーRLアルゴリズムであるSusACERを提案する。最初は粗い時間離散化で動作し、徐々に細かいものに切り替える。ロボット制御環境であるAnt, HalfCheetah, Hopper, Walker2Dにおいて、時間離散化の変化の影響を解析する。いずれの場合も,提案アルゴリズムは最先端技術より優れている。 Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases our proposed algorithm outperforms the state of the art.	翻訳日:2023-08-09 12:36:50 公開日:2023-08-08
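One way to picture "sparse first, fine later" time discretization through sustained actions is sketched below. The abstract does not give the mechanism, so everything here is an assumption for illustration: the linear schedule in `expected_sustain_length`, the rule of keeping the previous action with probability `1 - 1/L`, and the function names are hypothetical and are not taken from SusACER itself.

```python
import random


def expected_sustain_length(step, total_steps, initial_length=8.0, final_length=1.0):
    """Assumed linear schedule: long sustained actions early, single-step actions late."""
    frac = min(1.0, max(0.0, step / total_steps))
    return initial_length + frac * (final_length - initial_length)


def rollout_actions(policy, observations, sustain_length, rng):
    """Pick actions for a sequence of observations, sustaining the previous action.

    With probability 1 - 1/sustain_length the previous action is repeated,
    so actions last sustain_length environment steps on average.
    """
    keep_prob = 1.0 - 1.0 / sustain_length
    actions = []
    current = None
    for obs in observations:
        if current is None or rng.random() >= keep_prob:
            current = policy(obs)  # query the policy for a fresh action
        actions.append(current)
    return actions
```

For example, `rollout_actions(policy, obs_seq, expected_sustain_length(step, total), random.Random(0))` would act coarsely at the start of training and at every environment step by the end.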
# LaCAM$^\ast$:リアルタイム・大規模・準最適マルチエージェントパスフィニングを目指して Engineering LaCAM$^\ast$: Towards Real-Time, Large-Scale, and Near-Optimal Multi-Agent Pathfinding ( http://arxiv.org/abs/2308.04292v1 ) ライセンス: Link先を確認	Keisuke Okumura	(参考訳) 本稿では,最近提案されたLaCAMアルゴリズムの改良を通じて,リアルタイム,大規模,準最適マルチエージェントパスフィンディング(MAPF)の課題に対処する。 LaCAMはスケーラブルな検索ベースのアルゴリズムであり、累積遷移コストに対する最適解の最終的な発見を保証する。様々な最先端MAPF法を超越した計画成功率を示す一方で、初期解の質は最適には程遠いものであり、最適への収束速度は遅い。これらの制限を克服するために,他のMAPF法からインスピレーションを得た改良手法をいくつか紹介する。これらの手法の融合がLaCAMの解の質を著しく向上させ、MAPFアルゴリズムの境界をさらに推し進めるという実証的な証拠を提供する。 This paper addresses the challenges of real-time, large-scale, and near-optimal multi-agent pathfinding (MAPF) through enhancements to the recently proposed LaCAM algorithm. LaCAM* is a scalable search-based algorithm that guarantees the eventual finding of optimal solutions for cumulative transition costs. While it has demonstrated remarkable planning success rates, surpassing various state-of-the-art MAPF methods, its initial solution quality is far from optimal, and its convergence speed to the optimum is slow. To overcome these limitations, this paper introduces several improvement techniques, partly drawing inspiration from other MAPF methods. We provide empirical evidence that the fusion of these techniques significantly improves the solution quality of LaCAM*, thus further pushing the boundaries of MAPF algorithms.	翻訳日:2023-08-09 12:36:41 公開日:2023-08-08
# 長距離絡み合いを混合に変換する:局所平衡へのテンソル-ネットワークアプローチ Converting long-range entanglement into mixture: tensor-network approach to local equilibration ( http://arxiv.org/abs/2308.04291v1 ) ライセンス: Link先を確認	Miguel Frías-Pérez, Luca Tagliacozzo and Mari Carmen Bañuls	(参考訳) クエンチによって誘導される非平衡時間発展において、速い自由度は、標準的なテンソルネットワークでは符号化が難しい長距離エンタングルメントを生成する。しかし、局所観測量はそのような長距離相関を、縮約された局所状態への混合としての寄与を通じてのみ感知する。本稿では,このような長距離エンタングルメントを特定し,それをはるかに表現しやすい混合へと効率的に変換するテンソルネットワーク法を提案する。このようにして,有限の計算資源で局所演算子の長時間挙動を捉える密度行列として,時間発展した状態の有効な記述を得る。 In the out-of-equilibrium evolution induced by a quench, fast degrees of freedom generate long-range entanglement that is hard to encode with standard tensor networks. However, local observables only sense such long-range correlations through their contribution to the reduced local state as a mixture. We present a tensor network method that identifies such long-range entanglement and efficiently transforms it into mixture, which is much easier to represent. In this way, we obtain an effective description of the time-evolved state as a density matrix that captures the long-time behavior of local operators with finite computational resources.	翻訳日:2023-08-09 12:36:24 公開日:2023-08-08
# Cloth2Tex: 3D仮想トライオンのためのカスタマイズされた布地テクスチャ生成パイプライン Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On ( http://arxiv.org/abs/2308.04288v1 ) ライセンス: Link先を確認	Daiheng Gao, Xu Chen, Xindi Zhang, Qi Wang, Ke Sun, Bang Zhang, Liefeng Bo, Qixing Huang	(参考訳) 3D衣服の製作とデザインは、3D仮想試着、2D衣服の3Dアパレルへのデジタル化、布のアニメーションなど、様々な用途でリアルな着衣人物を合成する必要性が高まるにつれて、非常に需要が高まっている。そのため、2D参照画像などの単純な入力から高品質なテクスチャを得るための、シンプルで簡単なパイプラインが必要となる。伝統的なワーピングベースのテクスチャ生成法では、衣服の種類ごとにかなりの数の制御ポイントを手動で選択する必要があるため、時間と手間がかかる。本稿では,この過程における人的負担をなくす新しい方法であるCloth2Texを提案する。 Cloth2Texは、合理的なレイアウトと構造整合性を持つテクスチャマップを生成する自己教師方式である。 Cloth2Texのもうひとつの重要な特徴は、高忠実なテクスチャインペイントをサポートするために使用できることだ。これはCloth2Texと一般的な潜在拡散モデルを組み合わせることで実現される。提案手法は質的かつ定量的に評価し,Cloth2Texが高品質なテクスチャマップを生成でき,他の手法と比較して最高の視覚効果が得られることを示した。プロジェクトページ:tomguluson92.github.io/projects/cloth2tex/ Fabricating and designing 3D garments has become extremely demanding with the increasing need for synthesizing realistic dressed persons for a variety of applications, e.g. 3D virtual try-on, digitalization of 2D clothes into 3D apparel, and cloth animation. It thus necessitates a simple and straightforward pipeline to obtain high-quality texture from simple input, such as 2D reference images. Traditional warping-based texture generation methods require a significant number of control points to be manually selected for each type of garment, which can be a time-consuming and tedious process. We propose a novel method, called Cloth2Tex, which eliminates the human burden in this process. Cloth2Tex is a self-supervised method that generates texture maps with reasonable layout and structural consistency. Another key feature of Cloth2Tex is that it can be used to support high-fidelity texture inpainting. This is done by combining Cloth2Tex with a prevailing latent diffusion model. 
We evaluate our approach both qualitatively and quantitatively and demonstrate that Cloth2Tex can generate high-quality texture maps and achieve the best visual effects in comparison to other methods. Project page: tomguluson92.github.io/projects/cloth2tex/	翻訳日:2023-08-09 12:36:14 公開日:2023-08-08
# wav2vec 2.0 Feature Extractorの比較解析 Comparative Analysis of the wav2vec 2.0 Feature Extractor ( http://arxiv.org/abs/2308.04286v1 ) ライセンス: Link先を確認	Peter Vieting and Ralf Schlüter and Hermann Ney	(参考訳) 自動音声認識(ASR)システムは通常手作りの特徴抽出パイプラインを使用する。それらに固有の情報損失を回避し、音声から転写テキストへのより一貫したモデリングを達成するために、生波形を入力とするニューラル特徴抽出器(FE)は魅力的なアプローチである。最近広く普及したwav2vec 2.0モデルも、音声波形を直接操作する畳み込みFEを使用している。しかし、文献ではまだ広く研究されていない。本研究では,CTC (connectionist temporal classification) ASRモデルにおける標準的な特徴抽出法を置き換える能力について検討し,別のニューラルFEと比較する。両者ともLibriSpeechベンチマークにおいて従来のFEと競合することを示し、個々のコンポーネントの影響を分析する。さらに、学習したフィルタを分析し、ASRシステムにとって最も重要な情報が一連の帯域通過フィルタによって得られることを示す。 Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. The wav2vec 2.0 model, which has recently gained wide popularity, also uses a convolutional FE that operates directly on the speech waveform. However, it has not yet been studied extensively in the literature. In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters.	翻訳日:2023-08-09 12:35:51 公開日:2023-08-08
# 極端海洋環境下における無人船の視覚に基づく自律航法 Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions ( http://arxiv.org/abs/2308.04283v1 ) ライセンス: Link先を確認	Muhayyuddin Ahmed, Ahsan Baidar Bakht, Taimur Hassan, Waseem Akram, Ahmed Humais, Lakmal Seneviratne, Shaoming He, Defu Lin, and Irfan Hussain	(参考訳) 視覚知覚は無人表面容器(USV)の自律航法において重要な要素であり、特に自律的な検査と追跡に関わるタスクにおいて重要である。これらのタスクには、ナビゲーションのターゲットを特定する視覚ベースのナビゲーション技術が含まれる。海洋環境における極端な気象条件下での視認性の低下は、視覚に基づくアプローチが適切に働くことを困難にしている。これらの課題を克服するために,極端海洋環境下で対象物を追跡する自律型視覚ナビゲーションフレームワークを提案する。提案するフレームワークは、GAN(Generative Adversarial Network)を使用してノイズを除去し、オブジェクト検出器(YOLOv5)に渡す前にオブジェクトの特徴をハイライトする統合認識パイプラインで構成されている。検出された視覚的特徴は、ターゲットを追跡するためにUSVによって使用される。提案手法は砂嵐や霧による可視性低下下でのシミュレーションで徹底的に検証されている。その結果,提案手法が既存の手法を様々な測定値で上回っているmbzircシミュレーションデータセット全体において,最先端のデヘイジング手法と比較した。 Visual perception is an important component for autonomous navigation of unmanned surface vessels (USV), particularly for the tasks related to autonomous inspection and tracking. These tasks involve vision-based navigation techniques to identify the target for navigation. Reduced visibility under extreme weather conditions in marine environments makes it difficult for vision-based approaches to work properly. To overcome these issues, this paper presents an autonomous vision-based navigation framework for tracking target objects in extreme marine conditions. The proposed framework consists of an integrated perception pipeline that uses a generative adversarial network (GAN) to remove noise and highlight the object features before passing them to the object detector (i.e., YOLOv5). The detected visual features are then used by the USV to track the target. The proposed framework has been thoroughly tested in simulation under extremely reduced visibility due to sandstorms and fog. 
The results are compared with state-of-the-art de-hazing methods across the benchmarked MBZIRC simulation dataset, on which the proposed scheme has outperformed the existing methods across various metrics.	翻訳日:2023-08-09 12:35:38 公開日:2023-08-08
# 散逸のないエッジ状態が可能にする線幅狭窄化による位相的に保護されたサブラジアントキャビティポラリトン Topologically protected subradiant cavity polaritons through linewidth narrowing enabled by dissipationless edge states ( http://arxiv.org/abs/2308.04277v1 ) ライセンス: Link先を確認	Yuwei Lu, Jingfeng Liu, Haoxiang Jiang, Zeyang Liao	(参考訳) 量子レベルでの強い光-物質相互作用に由来するキャビティポラリトンは、キャビティ場を介して量子状態を効率的に操作するための基礎となる。狭い線幅と長い寿命を持つポラリトンは、量子センシングや量子メモリなどの応用に魅力的である。本稿では,一次元原子配列で形成したトポロジカルミラーを備えたウィスパリングギャラリーモード共振器を実装する原型的な構成を提案し,キャビティポラリトンの寿命を1桁以上向上させる。この顕著な増強は、空洞モードの漏れを抑制する原子配列のトポロジカルバンドギャップによって保護された散逸のないエッジ状態に、ポラリトン状態が結合することに起因する。トポロジカルバンドギャップがラビ分裂の幅を超えると、ポラリトン状態から原子配列のバルク状態への散逸をさらに減少させ、非常に鋭い線幅を持つサブラジアントなキャビティポラリトンが生じる。結果として得られるラビ振動は、単一の量子エミッタの自由空間での崩壊よりも低い速度で減衰する。エッジ状態の位相的に保護された性質を受け継ぎ、キャビティポラリトンのサブラジアンスは、原子周波数、相互作用強度、位置に関する中程度の摂動を伴う乱れた原子ミラーにおいても保存される。我々の研究は、量子コンピューティングと量子ネットワークの将来の応用に向けて、ロバストな量子コヒーレンスを持つトポロジー工学的な量子状態の新しいパラダイムを開く。 Cavity polaritons derived from the strong light-matter interaction at the quantum level provide a basis for efficient manipulation of quantum states via the cavity field. Polaritons with narrow linewidth and long lifetime are appealing in applications such as quantum sensing and storage. Here, we propose a prototypical arrangement to implement a whispering-gallery-mode resonator with a topological mirror moulded by a one-dimensional atom array, which allows boosting the lifetime of cavity polaritons by over an order of magnitude. This considerable enhancement is attributed to the coupling of polaritonic states to dissipationless edge states protected by the topological bandgap of the atom array, which suppresses the leakage of cavity modes. When it exceeds the width of the Rabi splitting, the topological bandgap can further reduce the dissipation from polaritonic states to bulk states of the atom array, giving rise to subradiant cavity polaritons with extremely sharp linewidth. The resultant Rabi oscillation decays with a rate even below the free-space decay of a single quantum emitter. 
Inheriting from the topologically protected properties of edge states, the subradiance of cavity polaritons can be preserved in the disordered atom mirror with moderate perturbations involving the atomic frequency, interaction strengths and location. Our work opens up a new paradigm of topology-engineered quantum states with robust quantum coherence for future applications in quantum computing and network.	翻訳日:2023-08-09 12:35:18 公開日:2023-08-08
# コンテキスト内アライメント: 微調整前のバニラ言語モデルとのチャット In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ( http://arxiv.org/abs/2308.04275v1 ) ライセンス: Link先を確認	Xiaochuang Han	(参考訳) 本稿では,コンテキスト内学習による推論時アライメントについて検討する。微調整を一切行っていない素の事前学習済み言語モデルLlama-2を対象とし,モデルがチャットスタイルの指示に従うよう促される際に,平均9個のアライメントの実演例を検索する。直接プロンプトする場合と比較して,モデルの重みを変更しないコンテキスト内アライメントは,OpenAIのtext-davinci-003モデルに対する勝率を7倍に高め,素の言語モデルをアライメントの微調整を施した強力なベースラインに匹敵するものにする。 In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model, Llama-2, before any fine-tuning and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, the in-context alignment without changing model weights leads to a 7x increase in win rate against the text-davinci-003 model from OpenAI, making the vanilla language model comparable to strong baselines with alignment fine-tuning.	翻訳日:2023-08-09 12:34:54 公開日:2023-08-08
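The retrieve-then-prompt flow described in the in-context alignment abstract might look like the sketch below. The abstract does not say how demonstrations are retrieved or how the prompt is laid out, so the token-overlap scoring in `token_overlap`, the `User:`/`Assistant:` template, and all function names are assumptions chosen for illustration; a real setup would more likely use a learned retriever.

```python
def token_overlap(a, b):
    """Jaccard overlap between the lowercase word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_demonstrations(query, pool, k):
    """Return the k (instruction, response) pairs most similar to the query."""
    ranked = sorted(pool, key=lambda pair: token_overlap(query, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, demonstrations):
    """Prepend the demonstrations to the query in a chat-style layout."""
    parts = []
    for instruction, response in demonstrations:
        parts.append(f"User: {instruction}\nAssistant: {response}\n")
    parts.append(f"User: {query}\nAssistant:")
    return "\n".join(parts)
```

The resulting string would be passed unchanged to the base model for completion; the model weights are never updated, which is the point of the approach.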
# 非可逆・可逆(L$^2$)ポストトレーニングモデルサイズ圧縮 Lossy and Lossless (L$^2$) Post-training Model Size Compression ( http://arxiv.org/abs/2308.04269v1 ) ライセンス: Link先を確認	Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang	(参考訳) ディープニューラルネットワークは驚くべきパフォーマンスをもたらし、様々なビジュアルタスクで広く使われている。しかし、その巨大なサイズは伝送と保存に多大な不便をもたらす。過去の多くの研究でモデルサイズ圧縮が研究されている。しかしながら、これらの研究は、しばしば様々な非可逆圧縮手法と可逆圧縮手法を個別に扱っており、高い圧縮比を効率的に達成することが難しい。本研究では,非可逆圧縮と可逆圧縮を統一的に組み合わせたポストトレーニングモデルサイズ圧縮法を提案する。まず,異なる非可逆圧縮法をポストトレーニング方式で共同で実行できるようにする統一パラメトリック重み変換を提案する。次に,非可逆圧縮の最適化を導くための専用の微分可能なカウンタを導入し,後続の可逆圧縮により適した点に到達させる。さらに,所望のグローバル圧縮比を容易に制御でき,異なる層に適応的な比率を割り当てることができる。最後に,精度を犠牲にすることなく安定した$10\times$の圧縮比を,わずかな精度低下で$20\times$の圧縮比を短時間で達成できる。私たちのコードはhttps://github.com/ModelTC/L2_Compressionで利用可能です。 Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high compression ratios efficiently. This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. We first propose a unified parametric weight transformation, which ensures different lossy compression methods can be performed jointly in a post-training manner. Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression. Additionally, our method can easily control a desired global compression ratio and allocate adaptive ratios for different layers. Finally, our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Our code is available at https://github.com/ModelTC/L2_Compression.	
翻訳日:2023-08-09 12:34:41 公開日:2023-08-08
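The general idea of chaining a lossy step with a lossless one, as in the L$^2$ compression abstract, can be illustrated with the toy sketch below. It does not reproduce the paper's unified parametric weight transformation or its differentiable counter; uniform quantization stands in for the lossy step and `zlib` stands in for the lossless coder, and both choices (as well as the function names) are assumptions made purely for illustration.

```python
import random
import struct
import zlib


def quantize(weights, num_bits):
    """Lossy step: map each weight to one of 2**num_bits uniformly spaced levels."""
    levels = 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate weights from the integer codes."""
    return [lo + c * scale for c in codes]


def compressed_size(byte_string):
    """Lossless step: size in bytes after DEFLATE compression."""
    return len(zlib.compress(byte_string, 9))


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

raw_bytes = struct.pack(f"{len(weights)}f", *weights)  # float32 baseline
codes, lo, scale = quantize(weights, num_bits=4)
code_bytes = bytes(codes)  # one byte per 4-bit code, left for zlib to squeeze
```

Comparing `compressed_size(raw_bytes)` with `compressed_size(code_bytes)` shows why the two stages interact: the lossy step lowers the entropy of the stored symbols, which is what the lossless coder then exploits.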
# 知識蒸留のための教師-生徒アーキテクチャ: サーベイ Teacher-Student Architecture for Knowledge Distillation: A Survey ( http://arxiv.org/abs/2308.04268v1 ) ライセンス: Link先を確認	Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu	(参考訳) ディープニューラルネットワーク(DNN)は、多くの領域で大規模な問題を解決する高い能力を示しているが、膨大なパラメータのため、そのようなDNNを実世界のシステムに展開することは困難である。この問題に対処するために,パラメータの少ない単純な生徒ネットワークが,パラメータの多い深い教師ネットワークと同等の性能を達成できる教師-生徒アーキテクチャが提案された。近年,教師-生徒アーキテクチャは,知識圧縮,知識拡張,知識適応,知識強化など,様々な知識蒸留(KD)の目的に対して効果的かつ広く採用されている。教師-生徒アーキテクチャの助けを借りて,最近の研究は,軽量で汎化された生徒ネットワークを通じて,複数の蒸留目的を達成できる。知識圧縮を主眼とする既存のKDサーベイと異なり,本サーベイは複数の蒸留目的にわたる教師-生徒アーキテクチャを初めて探究する。本サーベイでは,様々な知識表現とそれに対応する最適化目標を紹介する。さらに,代表的な学習アルゴリズムと効果的な蒸留スキームを備えた教師-生徒アーキテクチャを体系的に概観する。また,分類,認識,生成,ランキング,回帰など,様々な目的にまたがる教師-生徒アーキテクチャの最近の応用を要約する。最後に,アーキテクチャ設計,知識の品質,回帰ベース学習の理論研究に焦点を当て,KDにおける潜在的な研究方向を検討する。この包括的なサーベイを通じて,産業界の実務家や学術コミュニティは,様々な蒸留目的に対して教師-生徒アーキテクチャを効果的に設計,学習,適用するための貴重な洞察とガイドラインを得ることができる。 Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. 
Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.	翻訳日:2023-08-09 12:34:21 公開日:2023-08-08
# RLHF-Blender: 多様なヒューマンフィードバックから学ぶための構成可能な対話インタフェース RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback ( http://arxiv.org/abs/2308.04332v1 ) ライセンス: Link先を確認	Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady	(参考訳) 人間のフィードバックからの強化学習(RLHF)を実用的なアプリケーションで使うためには,多様な人間のフィードバック源から報酬モデルを学習し,異なるタイプのフィードバックの提供に関わる人的要因を考慮することが重要である。しかし,多様なタイプのフィードバックからの学習に関する体系的な研究は,研究者が利用できる標準化されたツールが限られていることによって妨げられている。このギャップを埋めるために,人間のフィードバックから学習するための,構成可能な対話型インタフェースであるRLHF-Blenderを提案する。 RLHF-Blenderはモジュラーな実験フレームワークと実装を提供し,研究者が報酬学習のための人間のフィードバックの特性と品質を体系的に調査できるようにする。このシステムは,デモンストレーション,ランキング,比較,自然言語による指示を含む様々なフィードバックタイプの探索や,その有効性に対する人的要因の影響を考慮した研究を促進する。 RLHF-Blenderによって可能になる具体的な研究機会について論じる。詳細はhttps://rlhfblender.info/を参照。 To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.	翻訳日:2023-08-09 12:27:53 公開日:2023-08-08
# GANを用いたクロスシーン映像のシーン合成によるドメイン適応型人物探索 Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos ( http://arxiv.org/abs/2308.04322v1 ) ライセンス: Link先を確認	Huibing Wang, Tianxiang Cui, Mingze Yao, Huijuan Pang, Yushan Du	(参考訳) 人物探索は、実際のカメラ映像から特定の歩行者を検索することを目的としており、近年コンピュータビジョン分野において難しい課題となっている。しかしながら、ほとんどの監視ビデオは、各歩行者についてごく少数の画像しか含んでおらず、それらはしばしば同じ背景や衣服を特徴としている。したがって,実場面での人物探索において,より識別的な特徴を学習することは困難である。この課題に対処するため、GAN(Generative Adversarial Networks)を用いて監視ビデオからデータを合成する。 GANは高品質な画像を効率よく生成できるため、コンピュータビジョンの問題で成功を収めてきた。本研究では,ビデオを処理して正確な検出結果を得ることができる,広く使われているFast R-CNNモデルをわずかに変更するだけである。 2段階モデルがもたらす負荷を適切に軽減するため,我々はAssisted-Identity Query Module (AIDQ) を設計し,後段に正例画像を提供する。さらに,人物探索タスクのための高品質なクロスID人物画像を合成できる,新しいGANベースのシーン合成モデルを提案する。 GANベースのシーン合成モデルの特徴学習を容易にするため,合成画像と元画像を協調的に学習するオンライン学習戦略を採用した。 CUHK-SYSU と PRW の2つの広く使われている人物探索ベンチマークにおける広範な実験により,本手法が高い性能を達成することを示した。また,広範なアブレーション研究により,GAN合成データがデータセットの多様性を効果的に高め,より現実的なものにできることをさらに裏付けた。 Person search, which aims to find specific pedestrians in footage from real cameras, has recently become a challenging task in the computer vision domain. Nevertheless, most surveillance videos comprise only a handful of images of each pedestrian, which often feature identical backgrounds and clothing. Hence, it is difficult to learn more discriminative features for person search in real scenes. To tackle this challenge, we draw on Generative Adversarial Networks (GAN) to synthesize data from surveillance videos. GAN has thrived in computer vision problems because it produces high-quality images efficiently. We merely alter the popular Fast R-CNN model, which is capable of processing videos and yielding accurate detection outcomes. In order to appropriately relieve the pressure brought by the two-stage model, we design an Assisted-Identity Query Module (AIDQ) to provide positive images for the later stage. Besides, we propose a novel GAN-based Scene Synthesis model that can synthesize high-quality cross-id person images for person search tasks. 
In order to facilitate the feature learning of the GAN-based Scene Synthesis model, we adopt an online learning strategy that collaboratively learns the synthesized images and original images. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, have shown that our method has achieved great performance, and the extensive ablation study further justifies our GAN-synthetic data can effectively increase the variability of the datasets and be more realistic.	翻訳日:2023-08-09 12:27:37 公開日:2023-08-08
# 弱教師付きセマンティックセグメンテーションのための全ペア一貫性学習 All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2308.04321v1 ) ライセンス: Link先を確認	Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes	(参考訳) 本研究では,弱教師付きセマンティックセグメンテーション(WSSS)のために,オブジェクトをより良くローカライズする新しいトランスフォーマーベースの正則化を提案する。画像レベルのWSSSでは、擬似セグメンテーションラベルとしてオブジェクトのローカライゼーションを生成するためにクラスアクティベーションマップ(CAM)が採用されている。 CAMの部分的なアクティベーション問題に対処するために、様々な画像拡張にわたってアクティベーション強度の不変性を維持する整合性正則化が用いられる。しかし、これらの手法は各CAM内の領域間のペアワイズ関係を無視している。この関係はコンテキストを捉えるものであり、画像ビュー間でも不変であるべきである。そこで本研究では,新しい全ペア整合性正則化(ACR)を提案する。一対の拡張ビューが与えられた場合、我々のアプローチは、ビュー間でアクティベーション強度を正則化するとともに、各ビュー内の領域間の親和性が一貫していることを保証する。自己注意機構がペアワイズ親和性を自然に埋め込むことから,ビジョントランスフォーマーを採用する。これにより、拡張画像対の注意行列間の距離を単純に正則化できる。さらに,クラストークンの勾配を利用した新しいクラス単位のローカライズ手法を提案する。我々の手法は,アーキテクチャを変更することなく,トランスフォーマーを用いた既存のWSSS手法にシームレスに統合できる。 PASCAL VOCおよびMS COCOデータセットを用いて本手法の評価を行った。本手法はクラスローカライゼーションマップを著しく改善し(PASCAL VOC trainセットで67.3% mIoU),優れたWSSS性能をもたらす。 In this work, we propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS). In image-level WSSS, Class Activation Maps (CAMs) are adopted to generate object localization as pseudo segmentation labels. To address the partial activation issue of the CAMs, consistency regularization is employed to maintain activation intensity invariance across various image augmentations. However, such methods ignore pair-wise relations among regions within each CAM, which capture context and should also be invariant across image views. To this end, we propose a new all-pairs consistency regularization (ACR). Given a pair of augmented views, our approach regularizes the activation intensities across the two views, while also ensuring that the affinity across regions within each view remains consistent. We adopt vision transformers because the self-attention mechanism naturally embeds pair-wise affinity. 
This enables us to simply regularize the distance between the attention matrices of augmented image pairs. Additionally, we introduce a novel class-wise localization method that leverages the gradients of the class token. Our method can be seamlessly integrated into existing WSSS methods using transformers without modifying the architectures. We evaluate our method on PASCAL VOC and MS COCO datasets. Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train), resulting in superior WSSS performances.	翻訳日:2023-08-09 12:27:09 公開日:2023-08-08
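A rough way to read "regularize the distance between the attention matrices of augmented image pairs" is sketched below. The abstract does not specify the distance or how patches are matched between views, so the mean L1 distance, the explicit patch permutation `perm` (for example, the index map induced by a horizontal flip), and the function names are all assumptions for illustration rather than the authors' ACR loss.

```python
def l1_distance(a, b):
    """Mean absolute difference between two square matrices of equal size."""
    n = len(a)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / (n * n)


def permute_attention(attn, perm):
    """Reindex rows and columns of an attention matrix.

    perm[i] is the index, in the other view, of the patch that is patch i here.
    """
    n = len(attn)
    return [[attn[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def all_pairs_consistency(attn_view1, attn_view2, perm):
    """Assumed regularizer: align view 2 to view 1, then compare all patch pairs."""
    aligned = permute_attention(attn_view2, perm)
    return l1_distance(attn_view1, aligned)
```

With a perfectly equivariant network the aligned matrices would match and the term would vanish; during training it would be added to the classification loss with some weight, which is a further assumption here.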
# サブ回折非コヒーレント光イメージングにおける量子限界 III. 数値解析 Quantum limit to subdiffraction incoherent optical imaging. III. Numerical analysis ( http://arxiv.org/abs/2308.04317v1 ) ライセンス: Link先を確認	Xiao-Jie Tan and Mankei Tsang	(参考訳) 遠方場非コヒーレントイメージングの基本的な限界を調べるために、本研究の前編 [M. Tsang, Phys. Rev. A 99, 012305 (2019); 104, 052411 (2021)] は、物体モーメントの推定誤差に対する量子下界を研究し、物体サイズに関する下界のスケーリング則を証明した。このスケーリング則は物体サイズがゼロに近づく漸近極限でのみ証明されたため、本研究では量子下界の数値解析を行い、この法則が現実のゼロでない物体サイズでもうまく成り立つことを検証する。また,数値的に求めた下界を用いて空間モードデマルチプレクシング (SPADE) と呼ばれる測定の最適性を調べ,SPADEがスケーリングに従うだけでなく,少なくとも低次モーメントについては数値的に最適に近いことを示す。 To investigate the fundamental limit to far-field incoherent imaging, the prequels to this work [M. Tsang, Phys. Rev. A 99, 012305 (2019); 104, 052411 (2021)] have studied a quantum lower bound on the error of estimating an object moment and proved a scaling law for the bound with respect to the object size. As the scaling law was proved only in the asymptotic limit of vanishing object size, this work performs a numerical analysis of the quantum bound to verify that the law works well for nonzero object sizes in reality. We also use the numerical bounds to study the optimality of a measurement called spatial-mode demultiplexing or SPADE, showing that SPADE not only follows the scaling but is also numerically close to being optimal, at least for low-order moments.	翻訳日:2023-08-09 12:26:46 公開日:2023-08-08
# 協調マルチエージェントバンド: 最適な個別レグレットと一定通信コストを持つ分散アルゴリズム Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs ( http://arxiv.org/abs/2308.04314v1 ) ライセンス: Link先を確認	Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C.S. Lui, Don Towsley	(参考訳) 近年,一組の分散エージェントが協調的に同じマルチアームバンディットゲームをする,協調型マルチエージェントマルチアームバンディットの研究が盛んに行われている。目標は、最適なグループと個人の後悔とエージェント間のコミュニケーションの少ないバンディットアルゴリズムを開発することである。以前の作業では、リーダフォローと完全な分散アルゴリズムという2つのパラダイムを使用してこの問題に取り組んでいた。両方のパラダイムにおける先行アルゴリズムは、最適なグループ後悔を達成する。リーダー追跡アルゴリズムは一定の通信コストを達成するが、最適な個人の後悔は達成できない。最先端の完全分散アルゴリズムは、最適な個別の後悔を実現するが、一定の通信コストは達成できない。本稿では,シンプルだが効果的な通信方針を示し,協調的盗賊学習アルゴリズムに統合する。我々のアルゴリズムは、最適な個人の後悔と絶え間ないコミュニケーションコストという、両方のパラダイムのベストを達成する。 Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.	翻訳日:2023-08-09 12:26:26 公開日:2023-08-08
# Apple Vision Pro for Healthcare:「究極のディスプレイ」? Apple Vision Pro for Healthcare: "The Ultimate Display"? ( http://arxiv.org/abs/2308.04313v1 ) ライセンス: Link先を確認	Jan Egger, Christina Gsaxner, Xiaojun Chen, Jiang Bian, Jens Kleesiek, Behrus Puladi	(参考訳) 2023年6月のWorldwide Developers Conference (WWDC)で、AppleはVision Proを発表した。 Vision ProはMR(Mixed Reality)ヘッドセットで、より具体的にはVR(Virtual Reality)デバイスで、VST(Video See-Through)機能が追加されている。 VST機能は、Vision Proを拡張現実(Augmented Reality, AR)デバイスに変える。 AR機能は、カメラを介して現実世界をユーザーの目の前で(VR)スクリーンにストリーミングすることで実現される。もちろんこれはユニークではなく、Varjo XR-3のような他のデバイスと似ている。それでもVision Proには、ヘッドセットの装着者の目が「外」に表示されるインサイド・アウト・スクリーンや、デジタルクラウンと呼ばれる上部のボタンなど、デジタルコンテンツを物理的空間とシームレスにブレンドできる機能があります。さらに、バッテリへのケーブル以外は接続されていないため、varjo xr-3と比較してヘッドセットはより機敏になる。これは、1965年にイヴァン・サザーランドがスケッチした「Ultimate Display」に近いかもしれない。 Ultimate Displayのような一般向けにはまだ公開されていないが、この観点からは、ARがまだ医療分野で直面しているいくつかの臨床的課題を克服できるかどうかを見極めるとともに、Vision Proが臨床医を不可欠なタスクで支援し、患者とより多くの時間を過ごすことができるかどうかを議論したい。 At the Worldwide Developers Conference (WWDC) in June 2023, Apple introduced the Vision Pro. The Vision Pro is a Mixed Reality (MR) headset, more specifically it is a Virtual Reality (VR) device with an additional Video See-Through (VST) capability. The VST capability turns the Vision Pro also into an Augmented Reality (AR) device. The AR feature is enabled by streaming the real world via cameras to the (VR) screens in front of the user's eyes. This is of course not unique and similar to other devices, like the Varjo XR-3. Nevertheless, the Vision Pro has some interesting features, like an inside-out screen that can show the headset wearers' eyes to "outsiders" or a button on the top, called "Digital Crown", that allows you to seamlessly blend digital content with your physical space by turning it. In addition, it is untethered, except for the cable to the battery, which makes the headset more agile, compared to the Varjo XR-3. 
This could actually come closer to the "Ultimate Display", which Ivan Sutherland had already sketched in 1965. Not available to the public yet, like the Ultimate Display, we want to take a look into the crystal ball in this perspective to see if it can overcome some clinical challenges that - especially - AR still faces in the medical domain, but also go beyond and discuss if the Vision Pro could support clinicians in essential tasks to spend more time with their patients.	翻訳日:2023-08-09 12:26:15 公開日:2023-08-08
# 対話シナリオにおける車両軌道予測のための解釈可能なゴールベースモデル Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios ( http://arxiv.org/abs/2308.04312v1 ) ライセンス: Link先を確認	Amina Ghoul, Itheri Yahiaoui, Anne Verroust-Blondet, and Fawzi Nashashibi	(参考訳) 都市環境における交通路を予測しながら、車と周囲の社会的相互作用を理解する能力は、自動運転における道路安全に不可欠である。社会的相互作用は不確実性のため説明が難しい。近年、ニューラルネットワークに基づく手法は軌道予測に広く使われており、手作りの手法よりも優れていることが示されている。しかし、これらの手法は解釈可能性の欠如に苦しむ。この制限を克服するために,対話環境における車両軌道予測タスクにおいて,離散的選択モデルの解釈可能性とニューラルネットワークに基づくモデルの高精度を組み合わせる。インタラクションデータセットを用いてモデルを実装し評価し,提案手法の有効性を実証し,精度を損なうことなくその予測を説明する。 The abilities to understand the social interaction behaviors between a vehicle and its surroundings while predicting its trajectory in an urban environment are critical for road safety in autonomous driving. Social interactions are hard to explain because of their uncertainty. In recent years, neural network-based methods have been widely used for trajectory prediction and have been shown to outperform hand-crafted methods. However, these methods suffer from their lack of interpretability. In order to overcome this limitation, we combine the interpretability of a discrete choice model with the high accuracy of a neural network-based model for the task of vehicle trajectory prediction in an interactive environment. We implement and evaluate our model using the INTERACTION dataset and demonstrate the effectiveness of our proposed architecture to explain its predictions without compromising the accuracy.	翻訳日:2023-08-09 12:25:50 公開日:2023-08-08
# メタファー検出のためのディープラーニングに基づく知識注入:包括的レビュー Deep Learning-Based Knowledge Injection for Metaphor Detection: A Comprehensive Review ( http://arxiv.org/abs/2308.04306v1 ) ライセンス: Link先を確認	Cheng Yang, Wenye Zhao, Qingbao Huang	(参考訳) 比喩研究の歴史は知識注入研究の進化を象徴している。近年のディープラーニング技術の進歩により、自然言語処理コミュニティはメタファ認識タスクの成果に知識を適用することに大きな関心を示している。メタファ認識の分野では,知識注入に関するアプローチが徐々に増えてきたが,知識注入に基づくアプローチに関する完全なレビュー記事が不足している。そこで本稿の目的は,メタファ認識タスクにおける知識注入へのディープラーニングの適用における研究の進歩を包括的にレビューすることである。本稿では,主要な知識と知識の注入原則を体系的に要約し,一般化するとともに,メタファ認識タスクで使用されるデータセット,評価指標,ベンチマークモデルをレビューする。最後に,ナレッジインジェクション手法が直面する課題を探究し,今後の研究の方向性を展望する。 The history of metaphor research also marks the evolution of knowledge infusion research. With the continued advancement of deep learning techniques in recent years, the natural language processing community has shown great interest in applying knowledge to successful results in metaphor recognition tasks. Although there has been a gradual increase in the number of approaches involving knowledge injection in the field of metaphor recognition, there is a lack of a complete review article on knowledge injection based approaches. Therefore, the goal of this paper is to provide a comprehensive review of research advances in the application of deep learning for knowledge injection in metaphor recognition tasks. In this paper, we systematically summarize and generalize the mainstream knowledge and knowledge injection principles, as well as review the datasets, evaluation metrics, and benchmark models used in metaphor recognition tasks. Finally, we explore the current issues facing knowledge injection methods and provide an outlook on future research directions.	翻訳日:2023-08-09 12:25:40 公開日:2023-08-08
# セマンティック通信システムにおけるモデル反転盗聴攻撃 The Model Inversion Eavesdropping Attack in Semantic Communication Systems ( http://arxiv.org/abs/2308.04304v1 ) ライセンス: Link先を確認	Yuhao Chen, Qianqian Yang, Zhiguo Shi and Jiming Chen	(参考訳) 近年,セマンティックコミュニケーションはコミュニケーション効率の優位性について研究が盛んに行われている。意味コミュニケーションは、生のメッセージから意味を抽出するためにディープラーニングに依存するため、ディープラーニングモデルをターゲットにした攻撃には弱い。本稿では, セマンティック通信システムにおけるプライバシー漏洩のリスクを明らかにするために, モデル逆盗聴攻撃(MIEA)を導入する。 mieaでは、攻撃者は最初にセマンティック通信システムによって送信される信号を盗み出し、次にモデル反転攻撃を行い、ホワイトボックスとブラックボックスの設定の両方が考慮される生のメッセージを再構築する。評価の結果,MIEAは異なるチャネル条件下で良好な品質で生メッセージを再構築できることがわかった。次に, セキュアな意味コミュニケーションを実現するために, ランダムな順列と置換に基づく防御手法を提案する。本研究は,MIEA対策における防衛法の有効性を実証するものである。 In recent years, semantic communication has been a popular research topic for its superiority in communication efficiency. As semantic communication relies on deep learning to extract meaning from raw messages, it is vulnerable to attacks targeting deep learning models. In this paper, we introduce the model inversion eavesdropping attack (MIEA) to reveal the risk of privacy leaks in the semantic communication system. In MIEA, the attacker first eavesdrops the signal being transmitted by the semantic communication system and then performs model inversion attack to reconstruct the raw message, where both the white-box and black-box settings are considered. Evaluation results show that MIEA can successfully reconstruct the raw message with good quality under different channel conditions. We then propose a defense method based on random permutation and substitution to defend against MIEA in order to achieve secure semantic communication. Our experimental results demonstrate the effectiveness of the proposed defense method in preventing MIEA.	翻訳日:2023-08-09 12:25:27 公開日:2023-08-08
# 先行情報とセマンティック支援占有グリッドマップを用いた車両運動予測 Vehicle Motion Forecasting using Prior Information and Semantic-assisted Occupancy Grid Maps ( http://arxiv.org/abs/2308.04303v1 ) ライセンス: Link先を確認	Rabbia Asghar, Manuel Diaz-Zapata, Lukas Rummelhard, Anne Spalanzani, Christian Laugier	(参考訳) センサデータの不確実性、未来の非決定論的性質、エージェントの複雑な振る舞いなどにより、自律走行車両の動作予測は困難なタスクである。本稿では,シーンを動的占有グリッドマップ(DOGM)として表現し,占有セルに意味ラベルを関連付け,地図情報を組み込むことにより,この問題に取り組む。本稿では,車両行動予測のための深層学習に基づく時空間的手法と確率論的手法を組み合わせた新しい枠組みを提案する。従来のOGM予測手法とは異なり、本研究の評価は正解アノテーションに対して行う。実世界のNuScenesデータセットを用いて実験を行い,OGMの予測よりも静的車両と動的車両の予測能力が優れていることを示す。さらに,アブレーション研究を行い,アーキテクチャにおける意味ラベルとマップの役割を評価する。 Motion prediction is a challenging task for autonomous vehicles due to uncertainty in the sensor data, the non-deterministic nature of the future, and complex behavior of agents. In this paper, we tackle this problem by representing the scene as dynamic occupancy grid maps (DOGMs), associating semantic labels to the occupied cells and incorporating map information. We propose a novel framework that combines deep-learning-based spatio-temporal and probabilistic approaches to predict vehicle behaviors. Contrary to the conventional OGM prediction methods, evaluation of our work is conducted against the ground truth annotations. We experiment and validate our results on the real-world NuScenes dataset and show that our model has superior ability to predict both static and dynamic vehicles compared to OGM predictions. Furthermore, we perform an ablation study and assess the role of semantic labels and map in the architecture.	翻訳日:2023-08-09 12:25:12 公開日:2023-08-08
# SSTFormer: フレームイベントに基づく認識のためのブリッジングスパイキングニューラルネットワークとメモリサポートトランス SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition ( http://arxiv.org/abs/2308.04369v1 ) ライセンス: Link先を確認	Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian	(参考訳) イベントカメラに基づくパターン認識は近年新たに生まれた研究テーマである。現在の研究者は通常、イベントストリームを画像、グラフ、ボクセルに変換し、イベントベースの分類にディープニューラルネットワークを採用する。しかし、単純なイベント認識データセットでは良いパフォーマンスが得られるが、以下の2つの問題により、結果はまだ限られているかもしれない。まず、認識のみに空間的スパースイベントストリームを採用するが、色や詳細なテクスチャ情報をうまくキャプチャできない場合がある。第2に、Spiking Neural Networks (SNN) をエネルギー効率のよいサブオプティマイズによる認識に、Artificial Neural Networks (ANN) をエネルギー集約的かつ高性能な認識に採用している。しかし、これら2つの側面のバランスを取ることはほとんど考えていない。本稿では,RGBフレームとイベントストリームを同時に融合してパターンを認識することを提案し,上記の問題に対処する新しいRGBフレームイベント認識フレームワークを提案する。提案手法は,RGBフレーム符号化のためのメモリサポートトランスフォーマーネットワーク,生イベントストリーム符号化のためのスパイクニューラルネットワーク,RGBイベント特徴集約のためのマルチモーダルボトルネック融合モジュール,予測ヘッドの4つの主要モジュールを含む。また,RGB-Eventに基づく分類データセットが不足しているため,DVS346イベントカメラを用いて記録した114のクラスと27102のフレームイベントペアを含む大規模PokerEventデータセットを提案する。 2つのRGB-Eventベースの分類データセットに関する広範な実験により,提案フレームワークの有効性が完全に検証された。この作業により、RGBフレームとイベントストリームを融合することで、パターン認識の開発が促進されることを願っています。この作業のデータセットとソースコードは、https://github.com/Event-AHU/SSTFormer で公開されます。 Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited due to the following two issues. Firstly, they adopt spatially sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. 
However, few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., memory support Transformer network for RGB frame encoding, spiking neural network for raw event stream encoding, multi-modal bottleneck fusion module for RGB-Event feature aggregation, and prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes, and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validated the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and source code of this work will be released at https://github.com/Event-AHU/SSTFormer.	翻訳日:2023-08-09 12:18:12 公開日:2023-08-08
# SLEM:超学習方程式モデリングを用いた経路モデリングと因果推論のための機械学習 SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling ( http://arxiv.org/abs/2308.04365v1 ) ライセンス: Link先を確認	Matthew J. Vowels	(参考訳) 因果推論は科学の重要な目標であり、観測データを用いて仮説的介入の予測に関する有意義な結論に達することができる。経路モデル、構造方程式モデル(SEM)、より一般的には、DAG(Directed Acyclic Graphs)は、現象の根底にある因果構造に関する仮定を明確に特定する手段を提供する。関数形式とパラメトリック形式についてほとんど仮定しないDAGとは異なり、SEMは線型性を仮定する。これにより機能的不特定が生じ、研究者が信頼性の高い効果サイズ推定を行うのを防ぐことができる。これとは対照的に,機械学習のスーパーラーナーアンサンブルを統合するパスモデリング技術であるSuper Learner Equation Modelingを提案する。我々は,SEMと比較した場合の線形モデルに対する因果効果の一貫性と不偏性の評価,および非線形関係を扱う場合のSEMに対する優位性を実証的に示す。オープンソースのコードとサンプルを使ったチュートリアルノートブックを提供し,メソッドの使いやすさを強調する。 Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.	翻訳日:2023-08-09 12:17:42 公開日:2023-08-08
# 無バイアス画像分割の学習 : プレーン膝X線撮影を例に Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs ( http://arxiv.org/abs/2308.04356v1 ) ライセンス: Link先を確認	Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, and Ahmad P. Tafti	(参考訳) 膝骨骨解剖の自動分節化は整形外科において必須であり,術前および術後のいずれにおいても数年にわたって行われている。深層学習アルゴリズムは医用画像解析において異常な性能を示しているが、これらのモデルにおける公平性と潜在的なバイアスの評価は限られている。本研究では,単純x線写真を用いた深層学習による膝骨解剖学的セグメント化を再考し,視認性や人種バイアスを明らかにすることを目的とした。現在の貢献はバイアスに対する理解を深める可能性を提供し、医療画像の研究者や実践者に実践的な洞察を提供する。提案された緩和戦略は、男女の偏見を緩和し、公平で偏見のないセグメンテーション結果を保証する。さらに本研究は, 多様な患者集団の正確な診断と治療結果への平等なアクセスを促進し, 公平かつ包括的な医療提供を促進する。 Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered knee-bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The current contribution offers the potential to advance our understanding of biases, and it provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies mitigate gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare provision.	翻訳日:2023-08-09 12:17:24 公開日:2023-08-08
# 3D-VisTA:3Dビジョンとテキストアライメントのためのトレーニング済みトランス 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment ( http://arxiv.org/abs/2308.04352v1 ) ライセンス: Link先を確認	Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li	(参考訳) 3次元視覚言語接地(3D-VL)は、3次元物理世界と自然言語を結びつけることを目的とした新興分野である。現在の3D-VLモデルは、洗練されたモジュール、補助的な損失、最適化のトリックに大きく依存している。本稿では,様々な下流タスクに容易に適応可能な3次元視覚およびテキストアライメントのための事前学習トランスフォーマである3d-vistaを提案する。 3D-VisTAは、単一のモーダルモデリングとマルチモーダル融合の両方に、高度なタスク固有の設計を使わずに自己アテンション層を利用する。 3D-VLタスクの性能をさらに向上するために,3D-VL事前学習のための大規模3DシーンテキストペアデータセットであるScanScribeを構築した。 ScanScribeには、ScanNetと3R-Scanデータセットに由来する1,185の屋内シーンのための2,995のRGB-Dスキャンと、既存の3D-VLタスク、テンプレート、GPT-3から生成された278Kシーン記述が含まれている。 3D-VisTAは、マスク付き言語/オブジェクトモデリングとシーンテキストマッチングによってScanScribe上で事前トレーニングされる。視覚的接地や密接なキャプション、質問応答、位置推論など、様々な3D-VLタスクの最先端結果が得られる。さらに、3D-VisTAはデータ効率が優れており、下流タスクの微調整中に限られたアノテーションでも高い性能が得られる。 3D vision-language grounding (3D-VL) is an emerging field that aims to connect the 3D physical world with natural language, which is crucial for achieving embodied intelligence. Current 3D-VL models rely heavily on sophisticated modules, auxiliary losses, and optimization tricks, which calls for a simple and unified model. In this paper, we propose 3D-VisTA, a pre-trained Transformer for 3D Vision and Text Alignment that can be easily adapted to various downstream tasks. 3D-VisTA simply utilizes self-attention layers for both single-modal modeling and multi-modal fusion without any sophisticated task-specific design. To further enhance its performance on 3D-VL tasks, we construct ScanScribe, the first large-scale 3D scene-text pairs dataset for 3D-VL pre-training. ScanScribe contains 2,995 RGB-D scans for 1,185 unique indoor scenes originating from ScanNet and 3R-Scan datasets, along with paired 278K scene descriptions generated from existing 3D-VL tasks, templates, and GPT-3. 3D-VisTA is pre-trained on ScanScribe via masked language/object modeling and scene-text matching. 
It achieves state-of-the-art results on various 3D-VL tasks, ranging from visual grounding and dense captioning to question answering and situated reasoning. Moreover, 3D-VisTA demonstrates superior data efficiency, obtaining strong performance even with limited annotations during downstream task fine-tuning.	翻訳日:2023-08-09 12:17:07 公開日:2023-08-08
# 国籍バイアスを解き放つ:AI生成記事における人間による国籍の認識に関する研究 Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles ( http://arxiv.org/abs/2308.04346v1 ) ライセンス: Link先を確認	Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao `Kenneth' Huang and Shomir Wilson	(参考訳) 自然言語処理(NLP)モデルにおける国籍バイアスの可能性について,人間の評価手法を用いて検討した。バイアス付きNLPモデルは、ステレオタイプを永続させ、アルゴリズムによる差別につながる可能性がある。本研究は,テキスト生成モデルにおける国籍バイアスの影響を定量的かつ定性的に把握するための2段階の混合手法を用いる。人間中心の定量的分析を通じて、AIソースが生成した記事の国籍バイアスの程度を測定する。次に,被験者との公開面接を行い,質的コーディングと主題分析を行い,これらのバイアスが人間の読者に与える影響を理解する。以上の結果から,NLPモデルでは既存の社会的バイアスを再現・増幅する傾向があり,社会工学的な場面で使用すれば害につながる可能性が示唆された。インタビューから得られた質的な分析は、読者がそのような記事に遭遇する際の体験についての洞察を与え、読者の国に対する認識を変える可能性を強調している。これらの知見は、AIが社会に与える影響を形作り、AIシステムのバイアスを正す必要性において、公衆の認識が重要な役割を担っていることを強調している。 We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. 
These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems.	翻訳日:2023-08-09 12:16:39 公開日:2023-08-08
# クロスモーダル検索のためのトランスフォーマによる2ストリームエンコーダの統合 Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval ( http://arxiv.org/abs/2308.04343v1 ) ライセンス: Link先を確認	Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen	(参考訳) 既存のクロスモーダル検索手法の多くは、画像とテキストの異なるアーキテクチャを持つ2ストリームエンコーダ、画像のCNN、テキストのRNN/Transformerを使用している。このようなアーキテクチャの相違は、異なる意味的分布空間を誘導し、画像とテキスト間の相互作用を制限し、さらに画像とテキストのアライメントが劣る可能性がある。視覚タスクにおけるトランスフォーマーの最近の進歩に触発されたこの研究ギャップを埋めるため,両モードでトランスフォーマーとエンコーダアーキテクチャを統合することを提案する。具体的には、画像変換器、テキスト変換器、階層アライメントモジュールからなる2ストリーム変換器(Hierarchical Alignment Transformers (HAT))を純粋にベースとしたクロスモーダル検索フレームワークを設計する。このような同一のアーキテクチャでは、エンコーダは画像やテキストに類似した特徴を持つ表現を生成し、それらの相互作用やアライメントをより容易にすることができる。さらに、リッチセマンティクスを活用するために、画像とテキストの間の異なるレイヤのマルチレベル対応を探索するための階層的アライメントスキームを考案する。提案するHATの有効性を評価するため,MSCOCOとFlickr30Kという2つのベンチマークデータセットについて広範な実験を行った。実験の結果,HATはSOTAベースラインよりも大きなマージンで優れていた。具体的には、image-to-text と text-to-image の2つの主要な検索タスクにおいて、HAT は MSCOCO での Recall@1 の相対スコア改善を 7.6% と 16.7%、Flickr30K では 4.4% と 11.6% を達成する。コードは https://github.com/LuminosityX/HAT で入手できる。 Most existing cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such discrepancy in architectures may induce different semantic distribution spaces and limit the interactions between images and texts, and further result in inferior alignment between images and texts. To fill this research gap, inspired by recent advances of Transformers in vision tasks, we propose to unify the encoder architectures with Transformers for both modalities. Specifically, we design a cross-modal retrieval framework purely based on two-stream Transformers, dubbed Hierarchical Alignment Transformers (HAT), which consists of an image Transformer, a text Transformer, and a hierarchical alignment module. 
With such identical architectures, the encoders could produce representations with more similar characteristics for images and texts, and make the interactions and alignments between them much easier. Besides, to leverage the rich semantics, we devise a hierarchical alignment scheme to explore multi-level correspondences of different layers between images and texts. To evaluate the effectiveness of the proposed HAT, we conduct extensive experiments on two benchmark datasets, MSCOCO and Flickr30K. Experimental results demonstrate that HAT outperforms SOTA baselines by a large margin. Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30K respectively. The code is available at https://github.com/LuminosityX/HAT.	翻訳日:2023-08-09 12:16:17 公開日:2023-08-08
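The Recall@1 figures quoted in the HAT entry above are relative improvements, which are easy to misread as absolute points. The sketch below shows one common way to compute Recall@K from a query-by-candidate similarity matrix and to turn two Recall values into a relative gain. It is an illustration only, not the authors' evaluation code: the function names, the toy similarity matrix, and the baseline value of 50.0 are all assumptions made for this example.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Percentage of queries whose top-k ranked candidates contain a correct match.

    similarity[i][j] is the score of query i against candidate j (hypothetical input);
    ground_truth[i] is the set of candidate indices that are correct for query i.
    """
    hits = 0
    for scores, positives in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest similarity.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in positives for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent of the baseline."""
    return 100.0 * (new - baseline) / baseline


# Toy data: 3 queries, 3 candidates, one correct candidate per query.
toy_similarity = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.6],
]
toy_ground_truth = [{0}, {1}, {2}]
```

With a hypothetical baseline Recall@1 of 50.0, a 7.6% relative improvement corresponds to 53.8, that is, 3.8 absolute points rather than 7.6.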
# 正確、説明可能、プライベートモデル:トレーニングデータの漏洩を最小限に抑えながらリコースを提供する Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage ( http://arxiv.org/abs/2308.04341v1 ) ライセンス: Link先を確認	Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju	(参考訳) 機械学習モデルは、個々の結果を予測するために、影響のある領域でますます利用されています。このように、多くのモデルは、否定的な結果を受ける個人にアルゴリズム的リコースを提供する。しかし、recourseは敵によってプライベートな情報を開示するために利用される。この研究はそのような攻撃を緩和する最初の試みである。本稿では,微分プライベート・モデル(DPM)とラプラス・リコース(LR)の2つの新しい手法を提案する。実世界および合成データセットのロジスティック回帰分類器を用いて、DPMとLRは、特に低FPRにおいて、敵対者が推論できることを減らすのに有効であることがわかった。トレーニングデータセットのサイズが十分に大きい場合、モデルを維持しながらプライバシーの漏洩を防止し、新しいLR法でレコメンデーション精度を向上することに成功した。 Machine learning models are increasingly utilized across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR). Using logistic regression classifiers and real world and synthetic datasets, we find that DPM and LR perform well in reducing what an adversary can infer, especially at low FPR. When training dataset size is large enough, we find particular success in preventing privacy leakage while maintaining model and recourse accuracy with our novel LR method.	翻訳日:2023-08-09 12:15:41 公開日:2023-08-08
# 網膜面に基づく軽量かつ高精度な顔検出アルゴリズム A Lightweight and Accurate Face Detection Algorithm Based on Retinaface ( http://arxiv.org/abs/2308.04340v1 ) ライセンス: Link先を確認	Baozhu Liu, Hewei Yu	(参考訳) 本稿では,Retinaface を用いた軽量かつ高精度な顔検出アルゴリズム LAFD (Light and accurate face detection) を提案する。アルゴリズムのバックボーンネットワークは、畳み込みカーネルのサイズ、反転残差ブロックのチャネル拡大乗算器、seアテンション機構の使用を調整する改良されたmobilenetv3ネットワークである。変形可能な畳み込みネットワーク(dcn)がコンテキストモジュールに導入され、アルゴリズムはモデルの分類損失関数としてクロスエントロピー損失関数の代わりに焦点損失関数を使用する。 WIDERFACEデータセットの試験結果は、LAFDの平均精度が94.1%、92.2%、82.1%で、それぞれ3.4%、4.0%、および8.3%の改善であり、優れた軽量モデルであるLFFDよりも3.1%、4.1%高い。入力画像が前処理され、長さ1560px、幅1200pxにスケールされた場合、「ハード」検証サブセットの平均精度は86.2%となる。モデルは軽量で、サイズは10.2MBである。 In this paper, we propose a lightweight and accurate face detection algorithm LAFD (Light and accurate face detection) based on Retinaface. Backbone network in the algorithm is a modified MobileNetV3 network which adjusts the size of the convolution kernel, the channel expansion multiplier of the inverted residuals block and the use of the SE attention mechanism. Deformable convolution network(DCN) is introduced in the context module and the algorithm uses focal loss function instead of cross-entropy loss function as the classification loss function of the model. The test results on the WIDERFACE dataset indicate that the average accuracy of LAFD is 94.1%, 92.2% and 82.1% for the "easy", "medium" and "hard" validation subsets respectively with an improvement of 3.4%, 4.0% and 8.3% compared to Retinaface and 3.1%, 4.1% and 4.1% higher than the well-performing lightweight model, LFFD. If the input image is pre-processed and scaled to 1560px in length or 1200px in width, the model achieves an average accuracy of 86.2% on the 'hard' validation subset. The model is lightweight, with a size of only 10.2MB.	翻訳日:2023-08-09 12:15:27 公開日:2023-08-08
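The LAFD entry above mentions replacing the cross-entropy classification loss with a focal loss. As a rough illustration of what that swap changes, here is a minimal sketch of the standard binary focal loss next to binary cross-entropy. The abstract does not state which hyperparameters LAFD uses, so `gamma=2.0` and `alpha=0.25` are assumptions taken from common practice, and the function names are made up for this example.

```python
import math

EPS = 1e-12  # guards against log(0)


def cross_entropy(p, y):
    """Binary cross-entropy for a predicted face probability p and a label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, EPS))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are assumed defaults, not values reported for LAFD.
    The (1 - p_t)^gamma factor shrinks the loss of well-classified examples,
    so training focuses on the hard ones.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))
```

Setting `gamma=0` and `alpha=1` for a positive example recovers plain cross-entropy, which is a quick sanity check on the formula.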
# Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra ( http://arxiv.org/abs/2308.04337v1 ) ライセンス: Link先を確認	Fadhil Muhammad, Alif Bintang Elfandra, Iqbal Pahlevi Amin, Alfan Farizki Wicaksono	(参考訳) インドネシア海域のサンゴ礁の生物多様性は貴重な資産であり、保存する必要がある。急速な気候変動と人的活動はサンゴ礁の生態系を悪化させ、サンゴの白化はサンゴの健康状態の重要な指標となっている。そこで本研究では,健康サンゴと漂白サンゴを区別する正確な分類モデルを開発することを目的としている。本研究はFlickr APIを用いてFlickrから収集した923の画像からなる特別なデータセットを利用する。データセットは、健康サンゴ(438画像)と漂白サンゴ(485画像)の2つの異なるクラスで構成されている。これらの画像は最大300ピクセルの幅や高さにリサイズされ、データセット全体にわたって一貫したサイズを維持している。本研究で用いられる方法は、機械学習モデル、特に畳み込みニューラルネットワーク(cnn)を用いて、健康で漂白したサンゴの視覚パターンを認識し識別することである。この文脈では、データセットは、最適な結果を得るために様々な分類モデルのトレーニングとテストに使用できる。 ResNetモデルを利用することで、Stock-Scratch ResNetモデルは、精度と精度で事前訓練されたモデルより優れていることがわかった。正確な分類モデルの開発の成功は、サンゴ礁の健康をよりよく理解する研究者や海洋生物学者に大いに役立つだろう。これらのモデルはサンゴ礁環境の変化をモニタリングするためにも用いられるため、生命に大きく影響する保護と生態系の回復に重要な貢献をする。 The abundant biodiversity of coral reefs in Indonesian waters is a valuable asset that needs to be preserved. Rapid climate change and uncontrolled human activities have led to the degradation of coral reef ecosystems, including coral bleaching, which is a critical indicator of coral health conditions. Therefore, this research aims to develop an accurate classification model to distinguish between healthy corals and corals experiencing bleaching. This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API. The dataset comprises two distinct classes: healthy corals (438 images) and bleached corals (485 images). These images have been resized to a maximum of 300 pixels in width or height, whichever is larger, to maintain consistent sizes across the dataset. 
The method employed in this research involves the use of machine learning models, particularly convolutional neural networks (CNN), to recognize and differentiate visual patterns associated with healthy and bleached corals. In this context, the dataset can be used to train and test various classification models to achieve optimal results. By leveraging the ResNet model, it was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy. The success in developing accurate classification models will greatly benefit researchers and marine biologists in gaining a better understanding of coral reef health. These models can also be employed to monitor changes in the coral reef environment, thereby making a significant contribution to conservation and ecosystem restoration efforts that have far-reaching impacts on life.	翻訳日:2023-08-09 12:15:03 公開日:2023-08-08
# ガーナのnational science and maths quizを勝ち取るaiを目指して Towards an AI to Win Ghana's National Science and Maths Quiz ( http://arxiv.org/abs/2308.04333v1 ) ライセンス: Link先を確認	George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah	(参考訳) aiはガーナのnational science and maths quiz(nsmq)に勝つことができるか? NSMQ AIプロジェクト(NSMQ AI Project)は、NSMQのライブ配信と勝利を競うAIを開発するオープンソースプロジェクトである。 NSMQ (英語: NSMQ) は、ガーナの2人の学生からなる3つのチームが、生物学、化学、物理学、数学の5段階にわたる質問に答えて、優勝チームが優勝するまでの5段階で競う、毎年開催される科学・数学の大会である。 NSMQは、音声テキスト、テキスト音声、質問応答、人間とコンピュータのインタラクションなど、興味深い技術的課題を抱える、エキサイティングなライブクイズコンペティションである。 2023年1月に始まったこの進行中の作業の中で、プロジェクトの概要、各チーム、これまでの進捗状況、そして10月にNSMQ 2023向けに計画されたAIのローンチとデビューに向けた次のステップについて説明します。この大きな課題を克服するAIは、アフリカの何百万人もの学生が、このAIから一対一の学習支援を受けられるように、教育に現実的な影響を与える可能性がある。 Can an AI win Ghana's National Science and Maths Quiz (NSMQ)? That is the question we seek to answer in the NSMQ AI project, an open-source project that is building AI to compete live in the NSMQ and win. The NSMQ is an annual live science and mathematics competition for senior secondary school students in Ghana in which 3 teams of 2 students compete by answering questions across biology, chemistry, physics, and math in 5 rounds over 5 progressive stages until a winning team is crowned for that year. The NSMQ is an exciting live quiz competition with interesting technical challenges across speech-to-text, text-to-speech, question-answering, and human-computer interaction. In this ongoing work that began in January 2023, we give an overview of the project, describe each of the teams, progress made thus far, and the next steps toward our planned launch and debut of the AI in October for NSMQ 2023. An AI that conquers this grand challenge can have real-world impact on education such as enabling millions of students across Africa to have one-on-one learning support from this AI.	翻訳日:2023-08-09 12:14:36 公開日:2023-08-08
# シーケンス生成のための大規模言語モデルからの学習評価モデル Learning Evaluation Models from Large Language Models for Sequence Generation ( http://arxiv.org/abs/2308.04386v1 ) ライセンス: Link先を確認	Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu	(参考訳) 大規模言語モデルはシーケンス生成評価において最先端のパフォーマンスを実現するが、一般的に多くのパラメータを持つ。これは、評価能力を大規模に適用する際の計算上の課題となる。本稿では, この課題を克服するために, LLM から比較的軽量な言語モデルへ評価能力を移す評価能力転移 (evaluation capability transfer, ECT) 法を提案する。提案するECTに基づいて、ChatGPTから様々な評価モデルを学び、強化学習と再ランキングアプローチによるシーケンス生成モデルの改善に報奨モデルとして活用する。機械翻訳, テキストスタイル転送, 要約タスクの実験結果から, ECTの有効性が示された。特に、学習した評価モデルをシーケンス生成モデルに適用すると、一般的なメトリクスやChatGPTで評価されるように、より優れた生成シーケンスが得られる。 Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters. This poses a computational challenge when applying their evaluation capability at scale. To overcome the challenge, in this paper, we propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models. Based on the proposed ECT, we learn various evaluation models from ChatGPT, and employ them as reward models to improve sequence generation models via reinforcement learning and reranking approaches. Experimental results on machine translation, text style transfer, and summarization tasks demonstrate the effectiveness of our ECT. Notably, applying the learned evaluation models to sequence generation models results in better generated sequences as evaluated by commonly used metrics and ChatGPT.	翻訳日:2023-08-09 12:09:26 公開日:2023-08-08
# DELFlow: 大規模クラウドのためのシーンフローの高精度学習 DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds ( http://arxiv.org/abs/2308.04383v1 ) ライセンス: Link先を確認	Chensheng Peng, Guangming Wang, Xian Wan Lo, Xinrui Wu, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang	(参考訳) 点雲は自然に狭く、画像ピクセルは密度が高い。不整合限界は、ポイントワイドシーンフロー推定のための両モードからの融合である。従来の手法では,局所的な特徴集約のための最遠点サンプリング,kn,ボール問合せアルゴリズムに関わる距離計算とソートによるメモリ効率の非効率とオーバーヘッドのため,一時的推論によってシーン全体のシーンフローを予測することはほとんどなかった。シーンフロー学習におけるこれらの問題を緩和するため、3次元座標を2次元グリッドに格納することにより、生点を濃密な形式に規則化する。既存の作品でよく使われるサンプリング操作とは異なり,密度2次元表現 1)所定のシーンのほとんどのポイントを保存する。 2)効率の大幅な向上をもたらし、 3) 点と画素間の密度ギャップを排除し, 効率的な特徴融合を実現する。また,複数の点を投影中に1つのグリッドにマッピング可能であることによる情報損失問題を軽減するための新しいワーピング投影手法を提案する。十分な実験により,flyingthings3dとkittiデータセットの先行技術に匹敵する,本手法の有効性と有効性が実証された。 Point clouds are naturally sparse, while image pixels are dense. The inconsistency limits feature fusion from both modalities for point-wise scene flow estimation. Previous methods rarely predict scene flow from the entire point clouds of the scene with one-time inference due to the memory inefficiency and heavy overhead from distance calculation and sorting involved in commonly used farthest point sampling, KNN, and ball query algorithms for local feature aggregation. To mitigate these issues in scene flow learning, we regularize raw points to a dense format by storing 3D coordinates in 2D grids. Unlike the sampling operation commonly used in existing works, the dense 2D representation 1) preserves most points in the given scene, 2) brings in a significant boost of efficiency, and 3) eliminates the density gap between points and pixels, allowing us to perform effective feature fusion. We also present a novel warping projection technique to alleviate the information loss problem resulting from the fact that multiple points could be mapped into one grid during projection when computing cost volume. 
Extensive experiments demonstrate the efficiency and effectiveness of our method, which outperforms the prior art on the FlyingThings3D and KITTI datasets.	翻訳日:2023-08-09 12:09:07 公開日:2023-08-08
# 1次元量子多体系における活性誘起強磁性 Activity-induced ferromagnetism in one-dimensional quantum many-body systems ( http://arxiv.org/abs/2308.04382v1 ) ライセンス: Link先を確認	Kazuaki Takasan, Kyosuke Adachi, Kyogo Kawaguchi	(参考訳) 自己推進体のアンサンブルである活性物質は、様々な非平衡相転移を示す。本稿では,活性物質の原型モデルであるヴィエクモデルに類似した1次元の非エルミート量子多体モデルを構築し,その量子相転移について検討する。このモデルは強磁性相互作用と活性を伴う2成分ハードコアボソンから構成される:スピン依存非対称ホッピング。数値的な結果は、古典的な例ではフラッキングの量子的相反する活性によって誘導される強磁性秩序の出現を示し、強磁性相互作用なしでも生き残る。摂動理論と2粒子の場合の解法により、2粒子レベルでの非エルミート皮膚効果がこの群れ形成に不可欠であることがわかった。この効果を考慮に入れ,二点平均場理論を用いて数値的に求めた位相図を定性的に再現する。さらに,ハードコア条件が緩和されたモデルの変形を数値的に検討し,強磁性秩序のロバスト性を確認した。 Active matter, an ensemble of self-propelled entities, exhibits various nonequilibrium phase transitions. In this paper, we construct a non-Hermitian quantum many-body model in one dimension analogous to the Vicsek model, a prototypical model of active matter, and investigate its quantum phase transitions. The model consists of two-component hard-core bosons undergoing ferromagnetic interactions and with activity: spin-dependent asymmetric hopping. Numerical results show the emergence of a ferromagnetic order induced by the activity, which is a quantum counterpart of flocking in classical examples, and it even survives without the ferromagnetic interaction. We find through perturbation theory and solving the two-particle case that the non-Hermitian skin effect at the two-particle level is crucial for this flocking phase. To take this effect into account, we employ a two-site mean-field theory and qualitatively reproduce the numerically obtained phase diagram. We further numerically study a variant of our model, where the hard-core condition is relaxed, and confirm the robustness of the ferromagnetic order.	翻訳日:2023-08-09 12:08:46 公開日:2023-08-08
# あなたの否定は真否定ではないかもしれない:偽陰性除去による画像テキストマッチングの促進 Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination ( http://arxiv.org/abs/2308.04380v1 ) ライセンス: Link先を確認	Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen	(参考訳) 既存の画像テキストマッチング手法の多くは最適化目的として三重項損失を採用しており、モデルを効果的に訓練するには<anchor, positive, negative> の3重項に対する適切な負のサンプルを選択することが重要である。しかし, 既存の手法では, ほぼ同様の試料をハード負として用いるが, 真の負ではない可能性がある。言い換えれば、アンカーと組み合わせていない高い類似性を持つサンプルは、正の意味的関連を保留し、それらを偽陰性と呼ぶ。これらの偽陰性を三重項損失で撃退することは、意味表現学習を誤解させ、検索性能を低下させる。本稿では,偽陰性から生じる問題を緩和できる新しい偽陰性除去法を提案する。具体的には,画像エンコーダとテキストエンコーダから抽出した特徴に基づいて,まず,アンカーとの類似性から正と負のサンプルの分布を別々に構築する。得られたサンプルの偽陰性確率は、アンカーとの類似性および上記の分布に基づいてベイズの法則を用いて計算し、これは負のサンプリング過程においてサンプリング重量として用いられる。小さいバッチサイズでは偽陰性は存在しないかもしれないので、大きな負のバッファを保持するために運動量を持つメモリモジュールを設計し、バッファにまたがる負のサンプリング戦略を実装します。さらに, モデルが強陰性に焦点を合わせるために, 単純な負のサンプリング重みをカットダウン戦略で再割り当てする。 Flickr30KとMS-COCOで大規模な実験を行い,提案した偽陰性除去戦略の優位性を実証した。コードはhttps://github.com/luminosityx/fneで入手できる。 Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, which may not be true negatives. In other words, the samples with high similarity but not paired with the anchor may reserve positive semantic associations, and we call them false negatives. Repelling these false negatives in triplet loss would mislead the semantic representation learning and result in inferior retrieval performance. In this paper, we propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling, which could alleviate the problem introduced by false negatives. 
Specifically, we first construct the distributions of positive and negative samples separately via their similarities with the anchor, based on the features extracted from image and text encoders. Then we calculate the false negative probability of a given sample based on its similarity with the anchor and the above distributions via the Bayes' rule, which is employed as the sampling weight during negative sampling process. Since there may not exist any false negative in a small batch size, we design a memory module with momentum to retain a large negative buffer and implement our negative sampling strategy spanning over the buffer. In addition, to make the model focus on hard negatives, we reassign the sampling weights for the simple negatives with a cut-down strategy. The extensive experiments are conducted on Flickr30K and MS-COCO, and the results demonstrate the superiority of our proposed false negative elimination strategy. The code is available at https://github.com/LuminosityX/FNE.	翻訳日:2023-08-09 12:08:28 公開日:2023-08-08
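The Bayes' rule step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: modelling both similarity distributions as Gaussians, the function names, and the prior `prior_pos` are all assumptions introduced here, and the toy similarity values are made up for the demonstration.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def false_negative_probability(sim, pos_sims, neg_sims, prior_pos=0.01):
    """Posterior P(semantically positive | similarity) via Bayes' rule.

    pos_sims / neg_sims hold anchor similarities of known positive / negative
    pairs. The Gaussian fit and prior_pos are illustrative assumptions.
    """
    like_pos = gaussian_pdf(sim, mean(pos_sims), stdev(pos_sims)) * prior_pos
    like_neg = gaussian_pdf(sim, mean(neg_sims), stdev(neg_sims)) * (1.0 - prior_pos)
    denom = like_pos + like_neg
    # Guard against both densities underflowing to zero.
    return like_pos / denom if denom > 0.0 else 0.0


# Toy similarity values (made up for illustration).
pos = [0.82, 0.78, 0.90, 0.85, 0.75]
neg = [0.10, 0.25, 0.05, 0.30, 0.20]
for s in (0.15, 0.50, 0.80):
    p = false_negative_probability(s, pos, neg)
    print(f"similarity={s:.2f} -> P(false negative)={p:.3g}")
```

An unpaired sample whose similarity to the anchor looks like that of true positives receives a posterior close to 1. How this probability is turned into a sampling weight, and how the momentum memory buffer is maintained, is not shown here.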
# 3+1次元における時空対称量子力学 Space-time-symmetric quantum mechanics in 3+1 dimensions ( http://arxiv.org/abs/2308.04376v1 ) ライセンス: Link先を確認	Eduardo O. Dias	(参考訳) 従来の量子力学(QM)では、時間はパラメータ $t$ として扱われ、時間に関する量子状態の進化は ${\hat {H}}|\psi(t)\rangle=i\hbar \frac{d}{dt}|\psi(t)\rangle$ で記述される。最近提案された QM の時空対称(STS)拡張では、位置がパラメータとなり、新しい量子状態 $|\phi(x)\rangle$ が導入された。この状態は位置 $x$ における粒子の到着時刻を記述し、到着時刻が $x$ に対して変化する方法は ${\hat {P}}|\phi(x)\rangle=-i\hbar \frac{d}{dx} |\phi(x)\rangle$ によって制御される。本研究では, STS拡張を三次元空間を移動する粒子へ一般化する。従来のQMと3次元STS拡張を組み合わせることで、動的方程式 ${\hat {P}}^{\mu}|{\phi}^\mu(x^{\mu})\rangle=-i \hbar~\eta^{\mu\nu}\frac{d}{dx^{\nu}}|{\phi}^\mu (x^{\mu})\rangle$ で与えられる「完全な」STS QM が得られる。ここで $x^{\mu}$ は状態のパラメータとして選ばれた座標である。 $x^\mu$ の選択に応じて、Schrödinger方程式($x^\mu=x^0=t$)または3次元STS拡張($x^\mu=x^i$ が $x$、$y$、$z$ のいずれか)を復元できる。 $x^\mu=x$ を選択することにより、自由粒子に対する STS QM の動的方程式を解き、波動関数 $\langle t,y,z|\phi^1(x)\rangle$ を計算する。この波動関数は、検出器が位置 $x$ にある $yz$ 平面全体を占有しているとき、粒子が時刻 $t$ に位置 ($y$,$z$) へ到達する確率振幅を表す。注目すべきことに、$|\langle t,y,z|\phi (x)\rangle|^2$ の $y$ と $z$ に関する積分は、公理的な Kijowski 分布の3次元版の形を取る。 In conventional quantum mechanics (QM), time is treated as a parameter, $t$, and the evolution of the quantum state with respect to time is described by ${\hat {H}}|\psi(t)\rangle=i\hbar \frac{d}{dt}|\psi(t)\rangle$. In a recently proposed space-time-symmetric (STS) extension of QM, position becomes the parameter and a new quantum state, $|\phi(x)\rangle$, is introduced. This state describes the particle's arrival time at position $x$, and the way the arrival time changes with respect to $x$ is governed by ${\hat {P}}|\phi(x)\rangle=-i\hbar \frac{d}{dx} |\phi(x)\rangle$. In this work, we generalize the STS extension to a particle moving in three-dimensional space. 
By combining the conventional QM with the three-dimensional STS extension, we have a "full" STS QM given by the dynamic equation ${\hat {P}}^{\mu}|{\phi}^\mu(x^{\mu})\rangle=-i \hbar~\eta^{\mu\nu}\frac{d}{dx^{\nu}}|{\phi}^\mu (x^{\mu})\rangle$, where $x^{\mu}$ is the coordinate chosen as the parameter of the state. Depending on the choice of $x^\mu$, we can recover either the Schrödinger equation (with $x^\mu=x^0=t$) or the three-dimensional STS extension (with $x^\mu=x^i$ being either $x$, $y$, or $z$). By selecting $x^\mu=x$, we solve the dynamic equation of the STS QM for a free particle and calculate the wave function $\langle t,y,z|\phi^1(x)\rangle$. This wave function represents the probability amplitude of the particle arriving at position ($y$,$z$) at instant $t$, given that the detector occupies the entire $yz$-plane located at position $x$. Remarkably, we find that the integral of $|\langle t,y,z|\phi (x)\rangle|^2$ in $y$ and $z$ takes the form of the three-dimensional version of the axiomatic Kijowski distribution.	翻訳日:2023-08-09 12:07:56 公開日:2023-08-08
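The two limiting cases named in this abstract can be written side by side. This is a sketch under assumptions not stated in the abstract: the metric is taken as $\eta^{\mu\nu}=\mathrm{diag}(-1,+1,+1,+1)$, the index $\mu$ labelling the state is not summed over, and $\hat{P}^{0}$ is identified with $\hat{H}$ (with $x^0=t$, as the abstract writes).

```latex
\begin{align}
\mu = 0:\quad
  & \hat{P}^{0}\,|\phi^{0}(t)\rangle
    = -i\hbar\,\eta^{00}\,\frac{d}{dt}\,|\phi^{0}(t)\rangle
  \;\Longrightarrow\;
  \hat{H}\,|\psi(t)\rangle = i\hbar\,\frac{d}{dt}\,|\psi(t)\rangle, \\
\mu = 1:\quad
  & \hat{P}^{1}\,|\phi^{1}(x)\rangle
    = -i\hbar\,\eta^{11}\,\frac{d}{dx}\,|\phi^{1}(x)\rangle
  \;\Longrightarrow\;
  \hat{P}\,|\phi(x)\rangle = -i\hbar\,\frac{d}{dx}\,|\phi(x)\rangle .
\end{align}
```

Under these assumptions, the sign difference between the Schrödinger equation and the STS equation quoted in the abstract comes entirely from $\eta^{00}=-1$ versus $\eta^{11}=+1$.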
# 人間-AI協調臨床意思決定における反事実的説明がAIへの信頼と依存に及ぼす影響の理解 Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making ( http://arxiv.org/abs/2308.04375v1 ) ライセンス: Link先を確認	Min Hun Lee, Chong Jun Chew	(参考訳) 人工知能(AI)は、ハイステークスな領域(例えば医療)における人間の意思決定を支援するものとして、ますます検討されている。しかし、人間とAIの相補的なパフォーマンスが達成される代わりに、人間がAIモデルの誤った提案に過度に依存しうるという問題が議論されてきた。そこで本研究では,AIへの過度な依存を減らすために,顕著な特徴の説明に加えて what-if 型の反事実的説明を用いて人間がAIの提案をより分析的に検討できるようにし,これらの説明が臨床意思決定におけるAIへの信頼と依存に及ぼす影響を検討した。我々は,7人のセラピストと10人の一般人を対象に,脳卒中後の生存者の動作の質を評価するタスクで実験を行い,そのパフォーマンス,タスクの合意レベル,AIへの依存度を,2種類のAI説明がある場合とない場合で分析した。その結果,「正しい」AI出力が提示された場合,顕著な特徴の説明と反事実的説明の両方を備えたAIモデルは,セラピストと一般人のパフォーマンスと合意レベルの向上を支援した。セラピストも一般人も「誤った」AI出力に過度に依存していたが、反事実的説明は、顕著な特徴の説明と比較して、両者の「誤った」AI出力への過度な依存を21%減らした。具体的には、一般人のパフォーマンス低下(顕著な特徴の説明で 18.0 f1-score、反事実的説明で 14.0 f1-score)は、セラピストのパフォーマンス低下(それぞれ 8.6 と 2.8 f1-score)よりも大きかった。我々の研究は、反事実的説明がAIモデルの精度をより適切に見積もり、「誤った」AI出力への過度な依存を減らす可能性と、人間とAIの協調的な意思決定を改善するための示唆について論じている。 Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stakes domains (e.g., health). However, researchers have discussed an issue that humans can over-rely on wrong suggestions of the AI model instead of achieving human-AI complementary performance. In this work, we utilized salient feature explanations along with what-if, counterfactual explanations to make humans review AI suggestions more analytically to reduce overreliance on AI and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. 
Our results showed that the AI model with both salient features and counterfactual explanations helped therapists and laypersons improve their performance and agreement level on the task when 'right' AI outputs were presented. While both therapists and laypersons over-relied on 'wrong' AI outputs, counterfactual explanations helped both groups reduce their over-reliance on 'wrong' AI outputs by 21% compared to salient feature explanations. Specifically, laypersons showed larger performance degradation (18.0 f1-score with salient feature explanations and 14.0 f1-score with counterfactual explanations) than therapists (8.6 and 2.8 f1-score, respectively). Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on 'wrong' AI outputs, as well as implications for improving human-AI collaborative decision-making.	翻訳日:2023-08-09 12:07:00 公開日:2023-08-08
# pelta: フェデレーション学習における回避攻撃を軽減するためのトランスフォーマーの遮蔽 Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning ( http://arxiv.org/abs/2308.04373v1 ) ライセンス: Link先を確認	Simon Queyrut, Y\'erom-David Bromberg, Valerio Schiavoni	(参考訳) フェデレートされた学習の主な前提は、機械学習モデルの更新がローカルに計算され、特にユーザーのデータのプライバシを保護するためである。このメカニズムは、一度集約された一般的なモデルを、共同作業や非悪意のあるノードにブロードキャストすると仮定する。しかし、適切な防御がなければ、妥協されたクライアントは、敵の例を探すことで、ローカルメモリ内のモデルを簡単に探すことができる。例えば、画像ベースの応用を考えると、敵対的な例は、ローカルモデルによって誤って分類された(人間の目には)知覚不能に摂動されたイメージから構成される。このような悪質な調査を軽減するため,我々は,信頼できるハードウェアを活用した新たな遮蔽機構であるpeltaを紹介する。 Trusted Execution Environments(TEEs)の能力を活用することで、Peltaはバックプロパゲーションチェーンルールの一部をマスクする。我々は,アートアンサンブルモデルの現状についてペルタを評価し,自己注意勾配攻撃に対する効果を実証する。 The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, as those never leave the perimeter of their device. This mechanism supposes the general model, once aggregated, to be broadcast to collaborating and non malicious nodes. However, without proper defenses, compromised clients can easily probe the model inside their local memory in search of adversarial examples. For instance, considering image-based applications, adversarial examples consist of imperceptibly perturbed images (to the human eye) misclassified by the local model, which can be later presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, otherwise typically exploited by attackers for the design of malicious samples. We evaluate Pelta on a state of the art ensemble model and demonstrate its effectiveness against the Self Attention Gradient adversarial Attack.	翻訳日:2023-08-09 12:06:22 公開日:2023-08-08
# 導出Argument を用いた Bipolar Argument グラフの検証 Some Options for Instantiation of Bipolar Argument Graphs with Deductive Arguments ( http://arxiv.org/abs/2308.04372v1 ) ライセンス: Link先を確認	Anthony Hunter	(参考訳) 議論グラフは議論的状況の抽象表現を提供する。双極性グラフは有向グラフであり、各ノードは引数を表し、各アークは別のノードに対する1つの引数の影響を表す。ここでは、影響が支持、攻撃、曖昧であると仮定する。双極引数グラフでは、各引数はアトミックであるため、内部構造を持たない。しかし、個々の議論の性質やどのように相互作用するかをよりよく理解するには、その内部構造を検討することが重要である。そこで本論文では,双極子グラフのインスタンス化のための論理的引数の利用に基づくフレームワークと,引数の内部構造と引数間の関係のタイプを考慮に入れた議論のインスタンス化に関する制約のセットを提案する。 Argument graphs provide an abstract representation of an argumentative situation. A bipolar argument graph is a directed graph where each node denotes an argument, and each arc denotes the influence of one argument on another. Here we assume that the influence is supporting, attacking, or ambiguous. In a bipolar argument graph, each argument is atomic and so it has no internal structure. Yet to better understand the nature of the individual arguments, and how they interact, it is important to consider their internal structure. To address this need, this paper presents a framework based on the use of logical arguments to instantiate bipolar argument graphs, and a set of possible constraints on instantiating arguments that take into account the internal structure of the arguments, and the types of relationship between arguments.	翻訳日:2023-08-09 12:06:03 公開日:2023-08-08
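The bipolar argument graph defined in this abstract (nodes are atomic arguments; each arc carries a supporting, attacking, or ambiguous influence) can be sketched as a small data structure. The class and method names below are hypothetical, and the sketch covers only the abstract graph, not the paper's instantiation with deductive arguments.

```python
from dataclasses import dataclass, field
from enum import Enum


class Influence(Enum):
    SUPPORT = "support"
    ATTACK = "attack"
    AMBIGUOUS = "ambiguous"


@dataclass
class BipolarArgumentGraph:
    """Directed graph: nodes are atomic arguments, arcs carry an influence label."""
    arguments: set = field(default_factory=set)
    arcs: dict = field(default_factory=dict)  # (source, target) -> Influence

    def add_arc(self, source, target, influence):
        """Record that `source` influences `target`; registers both as arguments."""
        self.arguments.update((source, target))
        self.arcs[(source, target)] = influence

    def influencers(self, target, influence):
        """Sorted list of arguments exerting the given influence on `target`."""
        return sorted(s for (s, t), kind in self.arcs.items()
                      if t == target and kind is influence)


graph = BipolarArgumentGraph()
graph.add_arc("A", "C", Influence.SUPPORT)
graph.add_arc("B", "C", Influence.ATTACK)
graph.add_arc("D", "C", Influence.AMBIGUOUS)
print(graph.influencers("C", Influence.ATTACK))
```

Instantiating such a graph would mean replacing the opaque labels "A", "B", and so on with structured arguments (for example, premises and a claim) and checking that each arc's label is consistent with that internal structure.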
# 大規模言語モデルを用いた累積推論 Cumulative Reasoning With Large Language Models ( http://arxiv.org/abs/2308.04371v1 ) ライセンス: Link先を確認	Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao	(参考訳) 言語モデルは強力で多用途であるが、しばしば非常に複雑な問題に対処できない。これは、複雑な問題を解決するには意図的な思考が必要であり、トレーニングの間は最小限の指導しか行われていないからである。本稿では,言語モデルを累積的かつ反復的に活用し,人間の思考過程をエミュレートするCumulative Reasoning(CR)という新しい手法を提案する。タスクを小さなコンポーネントに分解することで、CR は問題解決プロセスを合理化し、より管理しやすく、効果的にする。論理推論タスクでは、CR は既存の手法を最大 9.3% 上回り、キュレーションされた FOLIO wiki データセットで 98.04% の驚くべき精度を達成する。 Game of 24 では、CR は 94% の精度を達成し、これは以前の最先端手法よりも 20% の大幅な向上を意味する。 While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement up to 9.3%, and achieves the astonishing accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 94%, which signifies a substantial enhancement of 20% over the previous state-of-the-art method.	翻訳日:2023-08-09 12:05:49 公開日:2023-08-08
# 超解像におけるカモフラーゲ型物体検出 : 比較検討 When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study ( http://arxiv.org/abs/2308.04370v1 ) ライセンス: Link先を確認	Juan Wen, Shupeng Cheng, Peng Xu, Bowen Zhou, Radu Timofte, Weiyan Hou, Luc Van Gool	(参考訳) Super Resolution (SR) と Camouflaged Object Detection (COD) は、コンピュータビジョンにおける様々なジョイントアプリケーションとのホットトピックである。例えば、低解像度の監視画像は、超高解像度技術と擬似物体検出によって順次処理することができる。しかし、以前の研究では、この2つの領域は常に孤立して研究されている。本稿では, 両者の総合的な比較評価を初めて実施する。具体的には,一般的なcodデータセット上で異なる超解像法をベンチマークし,sr法で処理したcodデータを用いて,異なるcodモデルのロバスト性を評価する。私たちの目標は、これらの2つの領域を橋渡し、新しい実験現象を発見し、新しい経験をまとめることです。 Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications. For instance, low-resolution surveillance images can be successively processed by super-resolution techniques and camouflaged object detection. However, in previous work, these two areas are always studied in isolation. In this paper, we, for the first time, conduct an integrated comparative evaluation for both. Specifically, we benchmark different super-resolution methods on commonly used COD datasets, and meanwhile, we evaluate the robustness of different COD models by using COD data processed by SR methods. Our goal is to bridge these two domains, discover novel experimental phenomena, summarize new experim.	翻訳日:2023-08-09 12:05:34 公開日:2023-08-08
# 屋外神経放射領域における深度事前の探索 Digging into Depth Priors for Outdoor Neural Radiance Fields ( http://arxiv.org/abs/2308.04413v1 ) ライセンス: Link先を確認	Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang	(参考訳) neural radiance fields (nerf) は、新しいビュー合成や没入現実(immersive reality)など、視覚やグラフィックタスクにおいて印象的なパフォーマンスを示している。しかしながら、放射場の形状-照度あいまいさは、特に希薄な視点設定において、依然として課題である。近年の作業では、問題を緩和するため、奥行き先を屋外のNeRFトレーニングに統合している。しかし, 深度事前の選択基準と, 異なる先行の相対的メリットについては, 十分に検討されていない。さらに、深さ優先法を使うための異なるアプローチを選択するという相対的なメリットも未検討の問題である。本稿では,屋外神経放射場に先行する深度を用いた総合的な研究と評価を行い,一般的な深度センシング技術とその適用方法について述べる。具体的には,広く使用されている2つの屋外データセット上で,4つの共通使用深度前置法と異なる深さ使用法を備えた2つの代表的なnerf法を用いて広範囲な実験を行う。実験結果から,NeRFモデルの深度事前トレーニングにおいて,実践者や研究者が有用である可能性が示唆された。プロジェクトページ: https://cwchenwang.github.io/outdoor-nerf-depth Neural Radiance Fields (NeRF) have demonstrated impressive performance in vision and graphics tasks, such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse viewpoints setting. Recent work resorts to integrating depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated. Moreover, the relative merits of selecting different approaches to use the depth priors is also an unexplored problem. In this paper, we provide a comprehensive study and evaluation of employing depth priors to outdoor neural radiance fields, covering common depth sensing technologies and most application ways. Specifically, we conduct extensive experiments with two representative NeRF methods equipped with four commonly-used depth priors and different depth usages on two widely used outdoor datasets. Our experimental results reveal several interesting findings that can potentially benefit practitioners and researchers in training their NeRF models with depth priors. 
Project Page: https://cwchenwang.github.io/outdoor-nerf-depth	翻訳日:2023-08-09 11:58:44 公開日:2023-08-08
# ランダム化線形分類器を用いた確率不変学習 Probabilistic Invariant Learning with Randomized Linear Classifiers ( http://arxiv.org/abs/2308.04412v1 ) ライセンス: Link先を確認	Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris J. Maddison	(参考訳) 既知のタスクの不分散を表現的かつ保存するモデルの設計は、ますます難しい問題になっている。既存のソリューション計算リソースやメモリリソースに対する不変性。本研究では,表現的かつ不変だが資源の少ないランダム性モデルと設計モデルをどのように活用するかを示す。ランダム化アルゴリズムにインスパイアされた私たちの重要な洞察は、普遍近似と不変性の確率論的概念を受け入れることで、リソースの要求を減らせることである。具体的には,Randomized Linear Classifiers (RLC) と呼ばれるバイナリ分類モデルのクラスを提案する。 rlcはコンパクト群変換に対する不変性を維持しつつ、高確率で任意の(スムース)関数を近似できるパラメータとサンプルサイズ条件を与える。この結果を利用して,集合,グラフ,球面データ上の分類タスクに対して有理確率不変量を持つ3つのrlcを設計した。これらのモデルが、(決定論的)ニューラルネットワークとその不変量よりも少ないリソースを用いて、確率的不変性と普遍性を達成する方法を示す。最後に、決定論的不変ニューラルネットワークが困難であることが知られている不変タスクにおいて、この新しいモデルの利点を実証的に示す。 Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. 
Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.	翻訳日:2023-08-09 11:58:24 公開日:2023-08-08
# 3次元物体検出のための頂点相対位置符号化V-DETR:DETR V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection ( http://arxiv.org/abs/2308.04409v1 ) ライセンス: Link先を確認	Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo	(参考訳) DETRフレームワークを用いた点雲のための高性能な3次元物体検出器を提案する。事前の試みは、訓練データの限られた規模から正確な帰納バイアスを学習できないため、すべて最適以下の結果に終わる。特に、クエリは、ターゲットオブジェクトから遠く離れた点にしばしば参加し、オブジェクト検出の局所性原理に違反します。この制限に対処するために,各デコーダ層におけるクエリによって予測される3Dボックスに対する相対的な位置に基づいて各点の位置エンコーディングを計算し,局所性の原則に従ってモデルがオブジェクト近傍の点に焦点を合わせるための明確な情報を提供する,新しい3D Vertex Relative Position Encoding (3DV-RPE)手法を提案する。さらに,タスクの理解に基づくデータの正規化など,さまざまな側面からパイプラインを体系的に改善する。難解なscannetv2ベンチマークでは、それぞれ65.0\%/47.0\%から77.8\%/66.0\%までの$\rm{ap}_{25}$/$\rm{ap}_{50}$で以前の3detrを大きく改善した。さらに、ScanNetV2 と SUN RGB-D データセットに新しいレコードをセットし、http://github.com/yichaoshen-MS/V-DETR でコードをリリースする。 We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method which computes position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects such as data normalization based on our understanding of the task. We show exceptional results on the challenging ScanNetV2 benchmark, achieving significant improvements over the previous 3DETR in $\rm{AP}_{25}$/$\rm{AP}_{50}$ from 65.0\%/47.0\% to 77.8\%/66.0\%, respectively. 
In addition, our method sets a new record on the ScanNetV2 and SUN RGB-D datasets. Code will be released at http://github.com/yichaoshen-MS/V-DETR.	翻訳日:2023-08-09 11:58:07 公開日:2023-08-08
# XGBD:説明誘導グラフバックドア検出 XGBD: Explanation-Guided Graph Backdoor Detection ( http://arxiv.org/abs/2308.04406v1 ) ライセンス: Link先を確認	Zihan Guan, Mengnan Du, Ninghao Liu	(参考訳) バックドア攻撃は、グラフ学習モデルに重大なセキュリティリスクをもたらす。トレーニングデータセットにバックドアトリガーを挿入することで、ターゲットモデルにバックドアを組み込むことができる。バックドア攻撃に対抗するためにバックドア検出が提案されている。バックドアとクリーンサンプルの混合でモデルのトレーニングを行うと、バックドアサンプルの損失はクリーンサンプルよりも大幅に減少し、最低損失値のサンプルを選択することでバックドアサンプルを容易に検出できる。しかし、グラフデータ上のトポロジ的特徴情報の無知は、グラフ領域に直接適用した場合、検出の有効性を制限する。そこで本稿では,トポロジ情報を活用するために,説明誘導型バックドア検出手法を提案する。具体的には、グラフデータセット上でヘルパモデルをトレーニングし、モデルにグラフサンプルをフィードし、モデル予測を重要なサブグラフに属性付けるために説明手法を採用する。バックドア試料はクリーンサンプルと異なる属性分布を有するので,説明文はバックドア試料を検出するための識別的特徴として有用である。複数のポピュラーデータセットと攻撃手法に関する包括的実験により,本手法の有効性と説明可能性を示す。私たちのコードは、https://github.com/GuanZihan/GNN_backdoor_detectionで利用可能です。 Backdoor attacks pose a significant security risk to graph learning models. Backdoors can be embedded into the target model by inserting backdoor triggers into the training dataset, causing the model to make incorrect predictions when the trigger is present. To counter backdoor attacks, backdoor detection has been proposed. An emerging detection strategy in the vision and NLP domains is based on an intriguing phenomenon: when training models on a mixture of backdoor and clean samples, the loss on backdoor samples drops significantly faster than on clean samples, allowing backdoor samples to be easily detected by selecting samples with the lowest loss values. However, the ignorance of topological feature information on graph data limits its detection effectiveness when applied directly to the graph domain. To this end, we propose an explanation-guided backdoor detection method to take advantage of the topological information. Specifically, we train a helper model on the graph dataset, feed graph samples into the model, and then adopt explanation methods to attribute model prediction to an important subgraph. 
We observe that backdoor samples have distinct attribution distribution than clean samples, so the explanatory subgraph could serve as more discriminative features for detecting backdoor samples. Comprehensive experiments on multiple popular datasets and attack methods demonstrate the effectiveness and explainability of our method. Our code is available: https://github.com/GuanZihan/GNN_backdoor_detection.	翻訳日:2023-08-09 11:57:24 公開日:2023-08-08
# イベント匿名化による識別のない人物再識別 Person Re-Identification without Identification via Event Anonymization ( http://arxiv.org/abs/2308.04402v1 ) ライセンス: Link先を確認	Shafiq Ahmad, Pietro Morerio, Alessio Del Bue	(参考訳) 公共空間における視覚的監視の大規模利用は、個人のプライバシーを犠牲にしつつ、リソース消費(エネルギー、帯域幅、計算)を増加させる。ニューロモルフィック視覚センサ(イベントカメラ)は, 現場の被験者の詳細なRGB視覚情報を捉えないため, プライバシー問題に対する有効な解決策として近年検討されている。しかし、最近のディープラーニングアーキテクチャは、イベントカメラからのイメージを高い忠実度で再構築することができ、イベントベースのビジョンアプリケーションに対するプライバシーに対する潜在的な脅威を再導入している。本稿では,このような画像再構成攻撃から人間の身元を守るために,イベントストリームを匿名化することを目的とする。そこで本研究では,プライバシを保護し,人物ReIdのような下流タスクを実行するという2つの目的に対して,エンドツーエンドネットワークアーキテクチャを共同で最適化する手法を提案する。我々のネットワークは、イベントをスクランブルすることを学び、プライバシー攻撃者から回収された画像の劣化を強制する。この作業では、私たちのアプローチのパフォーマンスを評価するために収集された最初のイベントベースの人物ReIdデータセットもコミュニティに提供します。本手法を広範囲な実験により検証し,SoftBioデータセットと提案したEvent-ReIdデータセットからシミュレーションした合成イベントデータについて報告する。 Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event-cameras) have been recently considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning architectures have been able to reconstruct images from event cameras with high fidelity, reintroducing a potential threat to privacy for event-based vision applications. In this paper, we aim to anonymize event-streams to protect the identity of human subjects against such image reconstruction attacks. To achieve this, we propose an end-to-end network architecture jointly optimized for the twofold objective of preserving privacy and performing a downstream task such as person ReId. Our network learns to scramble events, enforcing the degradation of images recovered from the privacy attacker. In this work, we also bring to the community the first ever event-based person ReId dataset gathered to evaluate the performance of our approach. 
We validate our approach with extensive experiments and report results on the synthetic event data simulated from the publicly available SoftBio dataset and our proposed Event-ReId dataset.	翻訳日:2023-08-09 11:56:49 公開日:2023-08-08
# ファインチューニングゲーム:汎用モデルの獲得と適応 Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models ( http://arxiv.org/abs/2308.04399v1 ) ライセンス: Link先を確認	Benjamin Laufer and Jon Kleinberg and Hoda Heidari	(参考訳) 機械学習(ML)と人工知能(AI)の主な進歩は、汎用モデルの開発とリリースの形式をますます取り入れている。これらのモデルは、他の企業や代理店が特定のドメイン固有の機能を実行するように設計されている。このプロセスは適応や微調整として知られるようになった。本稿では、ジェネラリストが技術製品(以下、MLモデル)を一定のレベルのパフォーマンスで導入し、1つ以上のドメイン-スペシャリストが特定のドメインでの使用に適応する微調整プロセスのモデルを提案する。両社とも、テクノロジに投資するときに利益を計上し、コストを被る。そして、市場に到達するためのテクノロジの収益の共有方法に関する交渉合意に達する必要がある。比較的一般的なコストと収益関数に対して、細調整ゲームが利益分配ソリューションをもたらす条件を特徴付ける。我々は、潜在的なドメイン-特殊化が、テクノロジーの取り込みに寄与し、自由化され、または吸収されることを観察し、これらの異なる戦略をもたらす条件を提供する。我々は,このタイプのインタラクションにおける企業の戦略行動の洞察を,バーゲインソリューションとサブゲーム完全均衡に基づく手法がどのように提供するかを示し,一方の企業が他方よりも著しくコストが高い場合でも,利益の分配が生じることを見出した。また,実用関数の一般集合に対するパレート・最適交渉配置を同定する手法も提案する。 Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential domain-specialization will either contribute, free-ride, or abstain in their uptake of the technology, and we provide conditions yielding these different strategies. 
We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.	翻訳日:2023-08-09 11:56:07 公開日:2023-08-08
# 文字レベルNMTと言語類似性 Character-level NMT and language similarity ( http://arxiv.org/abs/2308.04398v1 ) ライセンス: Link先を確認	Josef Jon and Ond\v{r}ej Bojar	(参考訳) 本稿では,チェコ語とクロアチア語,ドイツ語,ハンガリー語,スロバキア語,スペイン語の翻訳における,トランスフォーマーアーキテクチャを用いた文字レベルのニューラルネットワーク翻訳の有効性について検討する。自動mtメトリクスを用いてモデルを評価し,類似言語間の翻訳が文字レベルの入力セグメンテーションに有益であることを示すが,関連度の低い言語では,文字レベルのバニラトランスフォーマベースがサブワードレベルのセグメンテーションに遅れることが多い。我々は、既に訓練済みのサブワードレベルのモデルを文字レベルに微調整することで、ギャップを閉じることができるという以前の発見を確認する。 We explore the effectiveness of character-level neural machine translation using Transformer architecture for various levels of language similarity and size of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that it is possible to close the gap by finetuning the already trained subword-level models to character-level.	翻訳日:2023-08-09 11:55:43 公開日:2023-08-08
# LEFormer:リモートセンシング画像からの湖沼抽出のためのハイブリッドCNN変換器アーキテクチャ LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery ( http://arxiv.org/abs/2308.04397v1 ) ライセンス: Link先を確認	Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Pin Tao	(参考訳) リモートセンシング画像からの湖の抽出は、湖の複雑な形状とノイズの存在のために困難である。既存の手法は曖昧なセグメンテーション境界と貧弱なフォアグラウンドモデリングに悩まされている。本稿では, LEFormerと呼ばれるCNN-Transformerハイブリッドアーキテクチャを, 正確な湖沼抽出のために提案する。 leformerにはcnnエンコーダ、トランスフォーマーエンコーダ、クロスエンコーダ融合、軽量デコーダの4つのモジュールが含まれている。 CNNエンコーダは、局所的な空間情報を復元し、微細な詳細を改善する。同時にTransformerエンコーダは、任意の長さのシーケンス間の長距離依存関係をキャプチャし、グローバルな特徴とコンテキスト情報をよりよく取得する。最後に、マスク予測に軽量デコーダを用いる。本研究では,2つのデータセットである表層水 (SW) と清海・チベット高原湖 (QTPL) のLEFormerの性能と効率を評価する。実験結果から,LEFormerはこれらの2つのデータセット上で,最新技術(SOTA)のパフォーマンスと効率を一貫して達成し,既存の手法よりも優れていることがわかった。具体的には、LEFormerはSWデータセットとQTPLデータセットの90.86%と97.42% mIoUをそれぞれ3.61Mで達成し、従来のSOTA法より20倍小さい。 Lake extraction from remote sensing imagery is challenging due to the complex shapes of lakes and the presence of noise. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. In this paper, we propose a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains four main modules: CNN encoder, Transformer encoder, cross-encoder fusion, and lightweight decoder. The CNN encoder recovers local spatial information and improves fine-scale details. Simultaneously, the Transformer encoder captures long-range dependencies between sequences of any length, allowing them to obtain global features and context information better. Finally, a lightweight decoder is employed for mask prediction. We evaluate the performance and efficiency of LEFormer on two datasets, the Surface Water (SW) and the Qinghai-Tibet Plateau Lake (QTPL). Experimental results show that LEFormer consistently achieves state-of-the-art (SOTA) performance and efficiency on these two datasets, outperforming existing methods. 
Specifically, LEFormer achieves 90.86% and 97.42% mIoU on the SW and QTPL datasets, respectively, with a parameter count of 3.61M, making it 20x smaller than the previous SOTA method.	翻訳日:2023-08-09 11:55:30 公開日:2023-08-08
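As a rough illustration of the mIoU metric quoted in the LEFormer abstract, the sketch below computes mean intersection-over-union for small binary masks. The function name `mean_iou` and the toy masks are hypothetical and are not taken from the LEFormer code; it assumes two classes (0 = background, 1 = lake) and masks given as lists of rows.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over the classes that appear in at least one of the two masks."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks, and pixels labelled c in either mask.
        intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p == c and t == c)
        union = sum(1 for p, t in zip(flat_pred, flat_target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(intersection / union)
    # Assumes at least one class is present, which holds for any non-empty mask.
    return sum(ious) / len(ious)


pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
print(mean_iou(pred, target))  # background IoU 1/2, lake IoU 2/3, averaged
```

Averaging per-class IoU rather than pooling pixels keeps a small foreground class from being swamped by the background, which is presumably why segmentation papers report it.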
# ソーシャルプロセスマイニングを支援する企業コラボレーションシステムのためのイベント抽象化 Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining ( http://arxiv.org/abs/2308.04396v1 ) ライセンス: Link先を確認	Jonas Blatt, Patrick Delfmann, Petra Schubert	(参考訳) プロセスマイニング(PM)の1つの目的は、情報システムのイベントログからプロセスモデルの発見である。 PMはプロセス指向のエンタープライズシステムに適用されているが、通信やドキュメント指向のエンタープライズコラボレーションシステム(ECS)には適していない。 ECSイベントログは非常に粒度が高く、その結果はスパゲッティモデルに適用される。これに対する一般的な解決策は、発見アルゴリズムを実行する前に低レベルのログをより抽象的な高レベルのログに変換する、イベント抽象化である。 ECSログには、既存のイベント抽象化アプローチで完全に対処されていない特別な特徴がある。このギャップをECSイベント抽象化(ECSEA)アプローチで埋めることを目指しており、記録された実際のユーザアクティビティ(ハイレベルトレース)とシステム生成の低レベルトレース(ECSから抽出した)を比較してモデルを訓練する。このモデルにより、将来の低レベルトレースをPMに使用できる抽象化された高レベルログに変換することができる。本評価は,アルゴリズムが正確な結果を生成することを示す。 ECSEAは、社会プロセスマイニング(Social Process Mining)と呼ばれるECSにおける協調作業活動の解釈に不可欠な前処理手法である。 One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-granular and PM applied to their logs results in spaghetti models. A common solution for this is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. 
ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.	翻訳日:2023-08-09 11:55:05 公開日:2023-08-08
# 医用画像におけるデータ拡張に基づく教師なしドメイン適応 Data Augmentation-Based Unsupervised Domain Adaptation In Medical Imaging ( http://arxiv.org/abs/2308.04395v1 ) ライセンス: Link先を確認	Sebastian Nørgaard Llambias, Mads Nielsen, Mostafa Mehdipour Ghazi	(参考訳) ディープラーニングベースの医療画像モデルは、ハードウェア、取得パラメータ、人口、アーティファクトの違いによって生じるデータの異質性によって、しばしば新しいスキャンを効果的に一般化するのに苦労する。この制限は、臨床に機械学習モデルを採用する上で大きな課題となる。脳MRI領域の領域適応のための教師なし手法として,MRI固有の拡張技術を活用して提案する。本手法の有効性を評価するために,様々なデータセット,モダリティ,セグメンテーションタスクにまたがる広範な実験を行い,最先端手法との比較を行った。その結果,提案手法は高い精度を実現し,幅広い適用性を示し,多くのケースで最先端性能を上回って,様々なタスクにおけるドメインシフトに対する著しい堅牢性を示すことができた。 Deep learning-based models in medical imaging often struggle to generalize effectively to new scans due to data heterogeneity arising from differences in hardware, acquisition parameters, population, and artifacts. This limitation presents a significant challenge in adopting machine learning models for clinical practice. We propose an unsupervised method for robust domain adaptation in brain MRI segmentation by leveraging MRI-specific augmentation techniques. To evaluate the effectiveness of our method, we conduct extensive experiments across diverse datasets, modalities, and segmentation tasks, comparing against the state-of-the-art methods. The results show that our proposed approach achieves high accuracy, exhibits broad applicability, and showcases remarkable robustness against domain shift in various tasks, surpassing the state-of-the-art performance in the majority of cases.	翻訳日:2023-08-09 11:54:44 公開日:2023-08-08
# 追加データセットを組み込むことで、余分な相関を導入すればパフォーマンスを損なうことができる When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations ( http://arxiv.org/abs/2308.04431v1 ) ライセンス: Link先を確認	Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath	(参考訳) この作業は、多くの場合、外部データセットの追加がモデルのパフォーマンスを損なう可能性があることを示すことで、その概念に挑戦する。 4つの異なるオープンソースの胸部x線データセットと9つの異なるラベルの組み合わせを用いた大規模実証研究において,2つの病院のデータに基づいてトレーニングされたモデルでは,単一の病院のデータでトレーニングされたモデルよりも,2つの病院でトレーニングされたモデルの精度が最悪であることが示されている。この驚くべき結果は、追加の病院がトレーニング分布をテスト分布とよりよく似ているとしても起こる。この現象は, 病院固有のイメージアーティファクトが原因で, 疾患と病院との間に生じる急激な相関関係から生じると説明される。複数のデータセットでトレーニングする際のトレードオフ、追加データの明らかなメリットと、導入した急激な相関の差し迫ったコストを強調します。場合によっては、データセットのバランスをとることで、スプリアス相関を取り除き、パフォーマンスを向上させることができるが、必ずしも効果的な戦略ではない。我々は、これらの結果を説明するのに役立つ、散発的な相関に関する文献内の結果の文脈化を行う。本実験は,機械学習モデルにおけるトレーニングデータの選択において,特に医用画像などと相関する危険のある場面において,注意を喚起することの重要性を強調する。リスクの概要は、将来の研究と実践において注意深いデータ選択とモデル評価の必要性を浮き彫りにしている。 In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets in many cases can hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst group accuracy over both hospitals than a model trained on just a single hospital's data. This surprising result occurs even though the added hospital makes the training distribution more similar to the test distribution. We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts. We highlight the trade-off one encounters when training on multiple datasets, between the obvious benefit of additional data and insidious cost of the introduced spurious correlation. 
In some cases, balancing the dataset can remove the spurious correlation and improve performance, but it is not always an effective strategy. We contextualize our results within the literature on spurious correlations to help explain these outcomes. Our experiments underscore the importance of exercising caution when selecting training data for machine learning models, especially in settings where there is a risk of spurious correlations such as with medical imaging. The risks outlined highlight the need for careful data selection and model evaluation in future research and practice.	翻訳日:2023-08-09 11:49:19 公開日:2023-08-08
# SILO言語モデル:非パラメトリックデータストアにおける法的リスクの解消 SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore ( http://arxiv.org/abs/2308.04430v1 ) ライセンス: Link先を確認	Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer	(参考訳) 著作権や制限されたデータに対する訓練言語モデル(LM)の合法性は、激しい議論の対象となっている。しかし, モデルの性能は, 低リスクテキスト(例えば, 著作権外書籍や政府文書)でのみ訓練した場合, サイズやドメインカバレッジが限定されているため, 著しく低下する。これは推論中にリスクパフォーマンスのトレードオフを管理する新しい言語モデルです。 siloは、(1)パブリックドメインの228bトークンと許容ライセンスのテキストをキュレートした新しいコーパスであるopen license corpus(olc)上でパラメトリックlmをトレーニングし、(2)より一般的で容易に修正可能な非パラメトリックデータストア(例えば、著作権付き書籍やニュースを含む)で拡張することで構築されます。データストアは、トレーニングなしでハイリスクデータを使用することができ、文レベルのデータ属性をサポートし、データプロデューサがストアからコンテンツを削除することで、モデルからオプトアウトできる。これらの能力は、米国の公正使用原則や欧州連合のGDPRなどのデータ利用規制の遵守を促進することができる。実験の結果,パラメトリックLMはOLCでカバーされていない領域で苦労していることがわかった。しかし、データストアへのアクセスはドメインのパフォーマンスを大幅に改善し、パフォーマンスギャップの90%を、主にリスクの高いテキストを含むより多様なコーパスであるパイル上でトレーニングされたlmで閉じる。また、どの非パラメトリックアプローチが最適か、残りのエラーがどこにあるか、そしてデータストアサイズでパフォーマンスがどのようにスケールするかを分析します。その結果, 法的リスクを軽減しつつ, 高品質な言語モデルの構築が可能であることが示唆された。 The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. 
These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out-of-domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high-quality language models while mitigating their legal risk.	翻訳日:2023-08-09 11:48:51 公開日:2023-08-08
# マルチタスク非IIDデータによるメタ学習オペレータの最適性 Meta-Learning Operators to Optimality from Multi-Task Non-IID Data ( http://arxiv.org/abs/2308.04428v1 ) ライセンス: Link先を確認	Thomas T.C.K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni	(参考訳) 機械学習の最近の進歩の背後にある強力な概念は、異種ソースやタスクからデータにまたがる共通機能を抽出することだ。直感的には、共通の表現関数を学ぶためにすべてのデータを使用することは、与えられたタスクでより少ないパラメータを微調整に残すことで、計算努力と統計的一般化の両方に利益をもたらす。これらの利点を理論的に基礎づけるために、ノイジーベクトル測度$y = Mx + w$ から線型作用素 $M$ を回復する一般的な設定を提案し、この共変量 $x$ は非等方的かつ非等方的である。既存の等方性非依存のメタラーニングアプローチは,表現更新のバイアスを伴い,ノイズ項のスケーリングによってソースタスク数への好適な依存が失われることを示した。これにより、単一タスクのデータサイズによって、表現学習のサンプル複雑性がボトルネックになる可能性がある。本稿では,collins et al. (2021) で提案されている交互最小化-descent (amd) 方式の適応である$\texttt{de-bias & feature-whiten}$ (\texttt{dfw}$) を導入し,$\textit{total}$ソースデータサイズでスケールダウンしたノイズレベルによる最適表現への線形収束を確立する。これはoracleの実証的リスク最小化器と同じ順序で一般化される。各種数値シミュレーションにおける$\texttt{DFW}$の重要性を検証する。特に,バニラの交互最小化降下は,iidにおいても破滅的に失敗するが,軽度に非等方性データを示す。我々の分析は、事前の作業を統一し、一般化し、制御や動的システムといった幅広いアプリケーションに対して柔軟なフレームワークを提供する。 A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. 
We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al. (2021), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating-minimization descent fails catastrophically even for i.i.d. but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.	翻訳日:2023-08-09 11:48:20 公開日:2023-08-08
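For orientation, the measurement model in the abstract and a single-task least-squares baseline can be written out explicitly. This is the standard ordinary-least-squares estimator, assuming the empirical second-moment matrix is invertible. It is not the DFW algorithm, and the sample index $i$ and count $N$ are notation introduced here.

```latex
y_i = M x_i + w_i, \qquad i = 1, \dots, N,
\qquad
\hat{M} = \Big( \sum_{i=1}^{N} y_i x_i^\top \Big) \Big( \sum_{i=1}^{N} x_i x_i^\top \Big)^{-1}.
```

When the covariates are non-isotropic, $\frac{1}{N}\sum_i x_i x_i^\top$ is not a multiple of the identity, so the inverse factor acts as a whitening step. That is one way to read why the abstract stresses feature whitening.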
# 自動エンコーダと生成逆数ネットワークを用いた古代石器表面の異常検出のためのディープラーニング手法 A Deep-Learning Method Using Auto-encoder and Generative Adversarial Network for Anomaly Detection on Ancient Stone Stele Surfaces ( http://arxiv.org/abs/2308.04426v1 ) ライセンス: Link先を確認	Yikun Liu and Yuning Wang and Cheng Liu	(参考訳) 最初の例では、自然劣化と人為的損傷の正確な検出が、その予防的保存に不可欠である。既存の文化遺産保存法は、正確性、効率性、時系列性、コストのバランスが困難であるため、この目標を完全に達成できない。本稿では, オートエンコーダ (ae) とgan (generative adversarial network) を用いて, 上記の古代石碑の緊急状況をリアルタイムで自動検出する深層学習手法を提案する。提案手法は, 予測不能な異常を包括的に検出しつつ, 広範な異常サンプルを必要とせず, 既存の手法の限界を克服するものである。この方法は、監視、データ取得、前処理、モデル構築、後処理の段階を含む。ロングメン・グロットーズの石碑をケーススタディとして、aeとganアーキテクチャに基づく教師なし学習モデルを提案し、99.74%の再構成精度で検証した。本手法の評価により,人工的に設計された7つの異常を十分に検出し,誤報を伴わずに精度と信頼性を示した。本研究は,文化遺産分野における深層学習の新たな考え方と可能性を提供する。 Accurate detection of natural deterioration and man-made damage on the surfaces of ancient steles in the first instance is essential for their preventive conservation. Existing methods for cultural heritage preservation are not able to achieve this goal perfectly due to the difficulty of balancing accuracy, efficiency, timeliness, and cost. This paper presents a deep-learning method to automatically detect the above-mentioned emergencies on ancient stone steles in real time, employing an autoencoder (AE) and a generative adversarial network (GAN). The proposed method overcomes the limitations of existing methods by requiring no extensive anomaly samples while enabling comprehensive detection of unpredictable anomalies. The method includes stages of monitoring, data acquisition, pre-processing, model structuring, and post-processing. Taking the Longmen Grottoes' stone steles as a case study, an unsupervised learning model based on AE and GAN architectures is proposed and validated with a reconstruction accuracy of 99.74%. The method's evaluation revealed the proficient detection of seven artificially designed anomalies and demonstrated precision and reliability without false alarms. 
This research provides novel ideas and possibilities for the application of deep learning in the field of cultural heritage.	翻訳日:2023-08-09 11:47:44 公開日:2023-08-08
# 共同対話感覚分類と行為認識のための双方向マルチホップ推論モデル A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition ( http://arxiv.org/abs/2308.04424v1 ) ライセンス: Link先を確認	Li Zheng, Fei Li, Yuyang Chai, Chong Teng, Donghong Ji	(参考訳) ダイアログ知覚分類(DSC)とアクト認識(DAR)の併用作業は,ダイアログ中の各発話に対する感情ラベルと行動ラベルを同時に予測することを目的としている。しかし、現在のメソッドはダイアログコンテキストを1つの方向だけエンコードしており、コンテキストを完全に理解する能力が制限されている。さらに、これらの手法は、感情と行動ラベルの明確な相関を見落とし、リッチな感情を捉え、手がかりを行動させ、効果的で正確な推論を妨げる能力に乏しい。これらの問題に対処するために,特徴選択ネットワークと双方向マルチホップ推論ネットワークを活用した双方向マルチホップ推論モデル(bmim)を提案する。また,感情と行動ラベルの相関を明示的にモデル化するために,コントラスト学習と二重学習を用いる。 DARのF1スコアは少なくとも2.6%,DSCのF1スコアは1.4%,BMIMは最先端のベースラインよりも優れていた。さらに,提案モデルでは,パフォーマンスの向上だけでなく,共同感情と行動予測タスクの解釈可能性の向上も図っている。 The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and act labels, which leads to an insufficient ability to capture rich sentiment and act clues and hinders effective and accurate reasoning. To address these issues, we propose a Bi-directional Multi-hop Inference Model (BMIM) that leverages a feature selection network and a bi-directional multi-hop inference network to iteratively extract and integrate rich sentiment and act clues in a bi-directional manner. We also employ contrastive learning and dual learning to explicitly model the correlations of sentiment and act labels. Our experiments on two widely used datasets show that BMIM outperforms state-of-the-art baselines by at least 2.6% on F1 score in DAR and 1.4% on F1 score in DSC. Additionally, our proposed model not only improves the performance but also enhances the interpretability of the joint sentiment and act prediction task.	翻訳日:2023-08-09 11:47:24 公開日:2023-08-08
# 有限干渉計を用いた高次元時空絡み合いの活用法 How to harness high-dimensional temporal entanglement, using limited interferometry setups ( http://arxiv.org/abs/2308.04422v1 ) ライセンス: Link先を確認	Alexandra Bergmayr, Florian Kanitschar, Matej Pivoluska, Marcus Huber	(参考訳) 高次元の絡み合いは量子通信において大きな利点があることが示されている。多くの自由度、特にダウンコンバージョン(SPDC)で定期的に生成される時間領域で利用可能である。ローカルに1つの検出器チャネルだけが必要であるという利点はあるが、特に量子鍵分散アプリケーションに必要な仮定なしの方法で解析することは、悪名高い。分極時間領域における高次元絡み合いの最初の完全解析を行い、関連する密度行列要素と量子鍵分布(qkd)のセキュリティパラメータを効率的に検証する方法を示す。厳密な足場に関する過去の実験に加えて、物理ノイズモデルも開発し、自由空間量子通信の耐雑音性をさらに高める新しい構成を提案する。 High-dimensional entanglement has been shown to have significant advantages in quantum communication. It is available in many degrees of freedom, in particular in the time domain, where it is routinely produced in spontaneous parametric down-conversion (SPDC). While advantageous in the sense that only a single detector channel is needed locally, it is notoriously hard to analyze, especially in the assumption-free manner that is required for quantum key distribution applications. We develop the first complete analysis of high-dimensional entanglement in the polarization-time-domain and show how to efficiently certify relevant density matrix elements and security parameters for Quantum Key Distribution (QKD). In addition to putting past experiments on rigorous footing, we also develop physical noise models and propose a novel setup that can further enhance the noise resistance of free-space quantum communication.	翻訳日:2023-08-09 11:47:02 公開日:2023-08-08
# DiffCR:光学衛星画像からの雲除去のための高速条件拡散フレームワーク DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images ( http://arxiv.org/abs/2308.04417v1 ) ライセンス: Link先を確認	Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, and Pin Tao	(参考訳) 光衛星画像は重要なデータソースであるが、雲は品質を損なうことが多く、画像の応用や分析を妨げている。その結果、光学衛星画像から雲を効果的に除去する研究の方向性が明らかになってきた。クラウド除去の最近の進歩は、主に最適な画像品質をもたらす可能性のある生成的逆ネットワークに依存しているが、拡散モデルは、様々な画像生成タスクにおいて顕著な成功を示しており、この課題に対処できる可能性を示している。本稿では,光衛星画像の高速クラウド除去に深部畳み込みネットワークを用いた条件付き拡散を利用したDiffCRという新しいフレームワークを提案する。具体的には、条件付き画像特徴抽出のための分離エンコーダを導入し、条件付き入力と合成出力との外観情報の密接な類似性を保証する頑健な色表現を提供する。また,雲の除去モデルにおいて,条件画像の出現と目標画像との対応性を計算コストで正確にシミュレートする,新しい,効率的な時間と条件の融合ブロックを提案する。 2つの一般的なベンチマークデータセットに対する大規模な実験的評価は、DiffCRが全ての指標で常に最先端のパフォーマンスを達成しており、パラメータと計算の複雑さはそれぞれ、以前のベストメソッドの5.1%と5.4%であることを示している。ソースコード、事前トレーニングされたモデル、および実験結果は、この論文が受け入れられた時点でhttps://github.com/XavierJiezou/DiffCRで公開されている。 Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. 
Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those of the previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of the paper.	翻訳日:2023-08-09 11:46:30 公開日:2023-08-08
# 新しいタイプの自然崩壊モデルの提案 A proposal for a new kind of spontaneous collapse model ( http://arxiv.org/abs/2308.04415v1 ) ライセンス: Link先を確認	Nicolò Piccione	(参考訳) 自然崩壊モデルは、物理機構が波動関数の崩壊の原因となる標準的な量子力学の修正であり、いわゆる「測定問題」を解決する手段を提供する。しかし、相対論的にしようとすると、大きな課題が現れます。本稿では,相対論的バージョンを容易に得ることができる新しい非相対論的自発的崩壊モデルを提案する。非相対論的な状態においては、このモデルがGhirardi-Rimini-Weberモデルと非常によく似た力学に導かれることを示す。さらに、よく知られた連続自発局所化モデルのマスター方程式を得ることもできる。最後に,提案モデルがGhirardi-Rimini-Weberモデルと概念的に類似した方法で測定問題を解く方法を示す。 Spontaneous collapse models are modifications of standard quantum mechanics in which a physical mechanism is responsible for the collapse of the wavefunction, thus providing a way to solve the so-called "measurement problem". However, they present great challenges when one tries to make them relativistic. Here, we propose a new kind of non-relativistic spontaneous collapse model whose relativistic version could be easier to obtain. In the non-relativistic regime, we show that this model can lead to a dynamics quite similar to that of the Ghirardi-Rimini-Weber model, by also naturally solving the problem of indistinguishable particles. Moreover, we can also obtain the same master equation of the well-known Continuous Spontaneous Localization models. Finally, we show how our proposed model solves the measurement problem in a manner conceptually similar to the Ghirardi-Rimini-Weber model.	翻訳日:2023-08-09 11:45:52 公開日:2023-08-08

# LLMを使ったコードインテリジェンスタスクのコンテキスト内説明に何が役立つのか?

What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? ( http://arxiv.org/abs/2304.07575v2 )

ライセンス: Link先を確認

Shuzheng Gao, Xin-Cheng Wen, Cuiyun Gao, Wenxuan Wang, Hongyu Zhang, Michael R. Lyu

(参考訳) トレーニング済みのソースコードモデルは、多くのコードインテリジェンスタスクで広く人気を集めている。近年、モデルとコーパスサイズのスケーリングにより、大きな言語モデルでは、コンテキスト内学習(icl)の能力が示されている。 iclはタスク命令といくつかの例をデモンストレーションとして使用し、そのデモンストレーションを言語モデルに入力して予測を行う。この新しい学習パラダイムはトレーニングフリーであり、様々な自然言語処理やコードインテリジェンスタスクで印象的なパフォーマンスを示している。しかし、ICLのパフォーマンスは、例えば選択された例のようなデモの質に大きく依存している。コード関連タスクの良質なデモンストレーションを構築する方法について体系的に調査することが重要である。本稿では,コードインテリジェンスタスクにおけるICLの性能に及ぼす3つの重要な要因 – 選択,順序,実演例の数 – の影響を実証的に検討する。コード要約、バグ修正、プログラム合成を含む3つのコードインテリジェンスタスクについて広範な実験を行った。実験の結果、上記の3つの要因がコードインテリジェンスタスクにおけるICLの性能に劇的な影響を及ぼすことが示された。さらに,本研究の成果を要約し,これらの3つの観点から効果的な実演の作り方を提案する。また,本研究に基づく注意深く設計されたデモンストレーションは,bleu-4,em,emを少なくとも9.90%,175.96%,50.81%,コード要約,バグフィックス,プログラム合成など,広く使用されているデモンストレーション構築手法に対して大幅に改善する可能性を示す。

Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of the model and corpus size, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then inputs the demonstrations to the language models for making predictions. This new learning paradigm is training-free and has shown impressive performance in various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, e.g., the selected examples. It is important to systematically investigate how to construct a good demonstration for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. We conduct extensive experiments on three code intelligence tasks including code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all three of the above factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations, taking into account these three perspectives. We also show that a carefully designed demonstration based on our findings can lead to substantial improvements over widely used demonstration construction methods, e.g., improving BLEU-4, EM, and EM by at least 9.90%, 175.96%, and 50.81% on code summarization, bug fixing, and program synthesis, respectively.

翻訳日:2023-10-24 12:47:40 公開日:2023-08-08
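The three factors named in the abstract (selection, order, and number of demonstrations) can be made concrete with a small prompt builder. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names `jaccard` and `build_prompt`, the token-overlap similarity, the "most similar example last" ordering, and the toy example pool are all hypothetical choices made for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for a real retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_prompt(instruction, pool, query, k=2):
    """Pick the k pool examples most similar to the query (selection, number)
    and place the most similar one closest to the query (order)."""
    ranked = sorted(pool, key=lambda ex: jaccard(ex["input"], query))  # ascending
    selected = ranked[-k:] if k > 0 else []
    parts = [instruction]
    for ex in selected:
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


pool = [
    {"input": "def add(a, b): return a + b", "output": "Add two numbers."},
    {"input": "def sub(a, b): return a - b", "output": "Subtract b from a."},
    {"input": "def read(path): return open(path).read()", "output": "Read a file."},
]
prompt = build_prompt("Summarize the code.", pool, "def mul(a, b): return a * b", k=2)
print(prompt)
```

Swapping `jaccard` for another similarity function, reversing the order, or changing `k` gives a cheap way to probe each factor separately on one's own task.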

# シナリオ生成はSOTIFの準備が整っているか? 体系的な文献レビュー

Is Scenario Generation Ready for SOTIF? A Systematic Literature Review ( http://arxiv.org/abs/2308.02273v2 )

ライセンス: Link先を確認

Lukas Birkemeyer, Christian King, Ina Schaefer

(参考訳) シナリオベースのテストは、高度な運転支援システムや自動運転システムを検証するための最先端技術と考えられている。 sotif標準(iso 21448)の正式ローンチにより、シナリオベースのテストは、これらの高度に自動化された運転システムのリリースにますます重要になる。しかし、本質的な欠落は、SOTIF標準の実践的適用を妨げる: シナリオベースのテストのシナリオを現実的に生成する方法? 本稿では,SOTIF規格の要件を満たすシナリオを生成する手法を特定するために,システム文献レビューを実施している。既存のシナリオ生成手法を分類し,生成されたシナリオwrtの特性を評価する。 sotif要件。実世界のどの詳細が生成されたシナリオでカバーされているのか、テスト対象のシステム固有のシナリオなのか、それともジェネリックなシナリオなのか、未知のシナリオと危険なシナリオのセットを最小限に抑えるよう設計されているのかを調査した。我々は,既存の技術で生成されたシナリオが,SOTIF規格に規定されている要件に従わないことを結論し,今後の研究の方向性を提案する。

Scenario-based testing is considered the state of the art for verifying and validating Advanced Driver Assistance Systems or Automated Driving Systems. Due to the official launch of the SOTIF-standard (ISO 21448), scenario-based testing becomes more and more relevant for releasing those Highly Automated Driving Systems. However, an essential missing detail prevents the practical application of the SOTIF-standard: How to practically generate scenarios for scenario-based testing? In this paper, we perform a Systematic Literature Review to identify techniques that generate scenarios complying with requirements of the SOTIF-standard. We classify existing scenario generation techniques and evaluate the characteristics of generated scenarios with respect to SOTIF requirements. We investigate which details of the real-world are covered by generated scenarios, whether scenarios are specific for a system under test or generic, and whether scenarios are designed to minimize the set of unknown and hazardous scenarios. We conclude that scenarios generated with existing techniques do not comply with requirements implied by the SOTIF-standard; hence, we propose directions for future research.

翻訳日:2023-10-23 15:21:08 公開日:2023-08-08

# ファジングのためのモデルベーススクリプト合成

Model-based script synthesis for fuzzing ( http://arxiv.org/abs/2308.04115v1 )

ライセンス: Link先を確認

Zian Liu, Chao Chen, Muhammad Ejaz Ahmed, Jun Zhang, Dongxi Liu

(参考訳) カーネルファジングは重要なカーネルの脆弱性を見つけるのに重要である。ソースコードの欠如により、クローズソース(例えばwindows)オペレーティングシステムカーネルのファジングはさらに困難である。既存のアプローチは、トレースからのsyscallシーケンスやシステムコードの静的解析をモデル化することでカーネルを混乱させる。しかしながら、一般的な制限は、異なるカーネル状態に到達するためにsyscallシーケンスを学習したり変更したりしないため、より多くのバグやクラッシュを引き起こす可能性があることである。本稿では,異なるカーネル状態に到達するためにトレースされたsyscallシーケンスを学習し,ミュートする手法であるwinkfuzzを提案する。 WinkFuzzは、トレースからsyscall依存性を学び、後続のsyscallを持つ可能性のあるトレース内の潜在的なsyscallを特定し、依存関係を適用して、依存関係をトレースに保存する。そして、WinkFuzzは合成された新しいsyscallシーケンスをファズしてシステムクラッシュを見つける。我々は,WinkFuzzを4種類のシードアプリケーションに適用し,シースコール数70.8\%,成功率61\%の合計増加を3つのインサートレベルで確認した。トレース時間,依存性解析,モデルスクリプトの復元,合成スクリプトの平均時間は,それぞれ600,39,34,129秒であった。瞬時のファジングレートは3742 syscall per secondである。しかし,初期化時間,待ち時間,その他の要因を考慮した場合,平均ファズ効率は毎秒155回まで低下した。私たちは各シードアプリケーションを24秒間ファズして、平均してその時間内に12.25回クラッシュしました。

Kernel fuzzing is important for finding critical kernel vulnerabilities. Closed-source (e.g., Windows) operating system kernel fuzzing is even more challenging due to the lack of source code. Existing approaches fuzz the kernel by modeling syscall sequences from traces or static analysis of system code. However, a common limitation is that they do not learn and mutate the syscall sequences to reach different kernel states, which can potentially result in more bugs or crashes. In this paper, we propose WinkFuzz, an approach to learn and mutate traced syscall sequences in order to reach different kernel states. WinkFuzz learns syscall dependencies from the trace, identifies potential syscalls in the trace that can have dependent subsequent syscalls, and applies the dependencies to insert more syscalls into the trace while preserving the dependencies. Then WinkFuzz fuzzes the synthesized new syscall sequence to find system crashes. We applied WinkFuzz to four seed applications and found a total increase in syscall number of 70.8%, with a success rate of 61%, within three insert levels. The average time for tracing, dependency analysis, recovering model script, and synthesizing script was 600, 39, 34, and 129 seconds respectively. The instant fuzzing rate is 3742 syscall executions per second. However, the average fuzz efficiency dropped to 155 syscall executions per second when the initializing time, waiting time, and other factors were taken into account. We fuzzed each seed application for 24 seconds and, on average, obtained 12.25 crashes within that time frame.

翻訳日:2023-10-23 15:13:00 公開日:2023-08-08

# Inverse Transparency Toolchain: 完全に統合され、素早くデプロイ可能なデータ使用ログインフラストラクチャ

The Inverse Transparency Toolchain: A Fully Integrated and Quickly Deployable Data Usage Logging Infrastructure ( http://arxiv.org/abs/2308.04366v1 )

ライセンス: Link先を確認

Valentin Zieglmeier

(参考訳) 逆透明性は、従業員データのすべての使用を見える化することで実現される。これは、利用情報のロギングと保存を処理し、ログされたデータをデータ所有者に可視化するツールを必要とする。逆透過性を統合した研究と教育のコンテキストでは、必要なインフラストラクチャの構築が難しくなります。 Inverse Transparency Toolchainはこのようなシナリオに対して柔軟なソリューションを提供する。簡単にデプロイでき、密に統合できる。そこで本研究では,ユーザによる経験的学習,大学コースでのプロトタイピング,業界パートナによる実験を含むユースケースをうまく処理した。

Inverse transparency is created by making all usages of employee data visible to them. This requires tools that handle the logging and storage of usage information, and making logged data visible to data owners. For research and teaching contexts that integrate inverse transparency, creating this required infrastructure can be challenging. The Inverse Transparency Toolchain presents a flexible solution for such scenarios. It can be easily deployed and is tightly integrated. With it, we successfully handled use cases covering empirical studies with users, prototyping in university courses, and experimentation with our industry partner.

翻訳日:2023-10-23 15:01:30 公開日:2023-08-08

# 公正かつ包括的参加予算:累積および二次投票インタフェースを用いた投票経験

Fair and Inclusive Participatory Budgeting: Voter Experience with Cumulative and Quadratic Voting Interfaces ( http://arxiv.org/abs/2308.04345v1 )

ライセンス: Link先を確認

Thomas Welling, Fatemeh Banaie Heravan, Abhinav Sharma, Lodewijk Gelauff, Regula Haenggli, Evangelos Pournaras

(参考訳) 累積投票と2次投票は、特に参加予算の領域において、公平さと包摂性を促進する2つの分散投票方法である。これらの利点にもかかわらず、累積および二次投票のためのグラフィカル投票インタフェースは、実装と有効利用が複雑である。その結果、このような方法がデジタル投票プラットフォームで広く採用されることはなかった。本稿では,最先端の投票プラットフォームであるStanford Participatory Budgetingにおいて,累積投票と二次投票の実装と評価を導入することで課題を解決する。その結果、有権者は単純な方法を好むが、より表現力のある(かつ複雑な)累積投票の方が、単純だが表現力の低いkランク投票よりも好まれることがわかった。実装された投票インターフェース要素は有用であり、より表現力のある投票方法に対する投票者の好みを支持する。

Cumulative and quadratic voting are two distributional voting methods that are expressive, promoting fairness and inclusion, particularly in the realm of participatory budgeting. Despite these benefits, graphical voter interfaces for cumulative and quadratic voting are complex to implement and use effectively. As a result, such methods have not yet seen widespread adoption on digital voting platforms. This paper addresses the challenge by introducing an implementation and evaluation of cumulative and quadratic voting within a state-of-the-art voting platform: Stanford Participatory Budgeting. The findings of the study show that while voters prefer simple methods, the more expressive (and complex) cumulative voting becomes the preferred one compared to k-ranking voting that is simpler but less expressive. The implemented voting interface elements are found useful and support the observed voters' preferences for more expressive voting methods.

翻訳日:2023-10-23 15:01:21 公開日:2023-08-08
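To make the two ballot types concrete, the sketch below checks ballots under the usual textbook rules: cumulative voting spreads a fixed number of points across projects, and quadratic voting charges `v**2` credits for `v` votes on a project. The function names, the project labels, and the restriction to non-negative votes are assumptions for illustration; nothing here is taken from the Stanford Participatory Budgeting code.

```python
def is_valid_cumulative(ballot: dict, points: int) -> bool:
    """Cumulative voting: non-negative points per project, total within the allowance."""
    return all(v >= 0 for v in ballot.values()) and sum(ballot.values()) <= points


def quadratic_cost(ballot: dict) -> int:
    """Quadratic voting: casting v votes on one project costs v**2 credits."""
    return sum(v * v for v in ballot.values())


def is_valid_quadratic(ballot: dict, credits: int) -> bool:
    """A quadratic ballot is valid if its total credit cost fits the budget."""
    return all(v >= 0 for v in ballot.values()) and quadratic_cost(ballot) <= credits


ballot = {"park": 3, "library": 1, "bike lanes": 0}
print(is_valid_cumulative(ballot, points=5))  # 4 points spent out of 5
print(quadratic_cost(ballot))                 # 3*3 + 1*1 + 0*0 credits
print(is_valid_quadratic(ballot, credits=9))  # the cost above exceeds 9 credits
```

The same ballot can be valid under one rule and invalid under the other, which hints at why an interface has to show voters a running cost as they allocate votes.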

# オープンソースの機械学習製品のデータセットと分析

A Dataset and Analysis of Open-Source Machine Learning Products ( http://arxiv.org/abs/2308.04328v1 )

ライセンス: Link先を確認

Nadia Nahar, Haoran Zhang, Grace Lewis, Shurui Zhou, Christian Kästner

(参考訳) 機械学習(ML)コンポーネントはソフトウェア製品にますます取り入れられているが、開発者はMLプロトタイプから製品に移行する上での課題に直面している。学術研究者は、これらの課題に対する解決策の提案と介入を評価するのに苦労している。本研究では,オープンソースのMLプロダクトを定義し,GitHubから262リポジトリのデータセットをキュレートし,さらなる研究と教育を促進する。まず、異なる開発活動に関する6つの幅広い研究課題を調査し、データセットから30のML製品のサンプルから21の調査結果を報告する。この結果から,今後の研究革新に十分な機会を提供するMLモデルの開発プラクティスやアーキテクチャ決定の多様さが明らかになった。また、オープンソースのML製品におけるモデルテストやパイプライン自動化といった業界のベストプラクティスの証拠はほとんどありません。

Machine learning (ML) components are increasingly incorporated into software products, yet developers face challenges in transitioning from ML prototypes to products. Academic researchers struggle to propose solutions to these challenges and evaluate interventions because they often do not have access to closed-source ML products from industry. In this study, we define and identify open-source ML products, curating a dataset of 262 repositories from GitHub, to facilitate further research and education. As a start, we explore six broad research questions related to different development activities and report 21 findings from a sample of 30 ML products from the dataset. Our findings reveal a variety of development practices and architectural decisions surrounding different types and uses of ML models that offer ample opportunities for future research innovations. We also find very little evidence of industry best practices such as model testing and pipeline automation within the open-source ML products, which leaves room for further investigation to understand their potential impact on the development and eventual end-user experience for the products.

翻訳日:2023-10-23 15:01:06 公開日:2023-08-08

# 技術ノード組立プロセスのための自動機械視覚制御システム

Automated machine vision control system for technological nodes assembly process ( http://arxiv.org/abs/2310.00005v1 )

ライセンス: Link先を確認

Nikolay Shtabel, Mikhail Saramud, Stepan Tkachev, Iakov Pikalov

(参考訳) 本稿では,小型宇宙船の組み立てのための自動制御システムの構築,技術的解決,実装の前提条件について論じる。各種の職場における個々のユニットの組み立て過程の制御とログを提供するシステムのハードウェアおよびソフトウェア実装の両方を解析する。本稿では, 組立技術, 特に低解像度のカメラを, 技術マークの形成と処理に特別なアルゴリズムを用いることにより, 機器の要求を低減させる手法を提案する。このツールでは、スレッド接続の締め付けトルクを制御し、所定のアルゴリズムによる無線制御による締め付けトルクを制限することができる。開発システムは、制御だけでなく、技術プロセスのロギング機能も提供しており、将来的には製品のデジタルツインを作成する際にも有用である。

The paper discusses the prerequisites for the creation, technical solutions and implementation of an automated control system for the assembly of a small spacecraft. Both the hardware and software implementation of the system that provides control and logging of the assembly process of individual units at various workplaces are analyzed. The article presents solutions to reduce the requirements for equipment used to control the assembly technology, in particular, to use cameras with a lower resolution, through the use of special algorithms for the formation and processing of technological marks. A tool is presented that allows you to control the tightening torques of threaded connections and limit the tightening torque according to a given algorithm with wireless control. The developed system provides the functions of not only control, but also logging of the technological process, which can be useful in the future when creating a digital twin of the product.

翻訳日:2023-10-23 05:24:59 公開日:2023-08-08

# アナログ回路を用いたMNISTデータセット学習の実装

Implementation Of MNIST Dataset Learning Using Analog Circuit ( http://arxiv.org/abs/2308.16307v1 )

ライセンス: Link先を確認

Minjae Kim

(参考訳) アナログ回路にニューラルネットワークを実装する試みは数多く行われている。それらの多くは多くの入力語を持ち、ほとんどの研究は、Spiceと呼ばれる回路シミュレーションプログラムを通じてアナログ回路にニューラルネットワークを実装し、チップを高コストで設計することを避け、入力する回路を直接実装した。本研究では,コンデンサとダイオードを用いてニューラルネットワークを実装し,マイクロコントローラ(Arduino Mega 2560 R3ボード)を用いて実世界のモデルを駆動し,結果を解析する。

There have been many attempts to implement neural networks in analog circuits. Most of them had many input terms, and most studies implemented the networks through a circuit simulation program called SPICE, avoiding the high cost of designing chips and of building the circuits directly. In this study, we implement neural networks using a capacitor and a diode and use microcontrollers (Arduino Mega 2560 R3 boards) to drive real-world models and analyze the results.

翻訳日:2023-09-03 21:22:54 公開日:2023-08-08

# 各種拡張を有する無定常確率型プッシュダウンシステムのモデルチェッキングPCTL特性

Model-Checking PCTL properties of Stateless Probabilistic Pushdown Systems with Various Extensions ( http://arxiv.org/abs/2209.10517v7 )

ライセンス: Link先を確認

Tianrong Lin

(参考訳) 本稿では、まず、無限状態系の確率的検証(具体的には、状態のない確率的プッシュダウン系)における未解決問題を解決する。我々は、確率的計算木論理(PCTL)に対する状態のない確率的プッシュダウンシステム(pBPA)のモデル検査が一般には決定不可能であることを示す。我々は確率的プッシュダウンシステムとマルコフ連鎖の量子アナログを定義し、本論文で定義された量子マルコフ連鎖の分岐時間特性を記述するために確率的計算木論理の量子アナログを定義する必要があるかどうかをさらに検討する。そのモデル検査問題について検討し、確率的計算木論理(PCTL)に対する状態のない量子プッシュダウンシステム(qBPA)のモデル検査が一般に決定不可能であることを示し、直ちに得られる系をまとめる。我々は確率的$\omega$-プッシュダウンオートマトンの概念を初めて定義し、$\omega$-PCTL(Chatterjeeらが[CSH08]で定義)に対する状態のない確率的$\omega$-プッシュダウンシステム($\omega$-pBPA)のモデル検査問題を調べ、それが一般に決定不可能であることを示し、直ちに得られる帰結をまとめる。我々のアプローチは、ポストの対応問題を間接的に符号化する$\omega$-PCTLの式を構築することである。

In this paper, we first resolve an open question in the probabilistic verification of infinite-state systems (specifically, the stateless probabilistic pushdown systems). We show that model checking stateless probabilistic pushdown systems (pBPA) against probabilistic computational tree logic (PCTL) is generally undecidable. We define the quantum analogues of the probabilistic pushdown systems and Markov chains, and further investigate whether it is necessary to define a quantum analogue of probabilistic computational tree logic to describe the branching-time properties of the quantum Markov chain defined in this paper. We study its model-checking question and show that the model-checking of stateless quantum pushdown systems (qBPA) against probabilistic computational tree logic (PCTL) is generally undecidable, with the immediate corollaries summarized. We define the notion of probabilistic $\omega$-pushdown automaton for the first time and study the model-checking question of stateless probabilistic $\omega$-pushdown system ($\omega$-pBPA) against $\omega$-PCTL (defined by Chatterjee et al. in [CSH08]) and show that the model-checking of stateless probabilistic $\omega$-pushdown systems ($\omega$-pBPA) against $\omega$-PCTL is generally undecidable, with immediate consequences summarized. Our approach is to construct formulas of $\omega$-PCTL encoding the Post Correspondence Problem indirectly.
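
The undecidability results above rest on the Post Correspondence Problem (PCP). As a reminder of what a PCP instance looks like, the following minimal sketch checks whether a given index sequence solves an instance. Verifying a candidate is easy; it is the search for one that is undecidable in general. The function name and the example tiles are illustrative and are not taken from the paper.

```python
def is_pcp_solution(tiles, indices):
    """Check a candidate PCP solution.

    tiles   -- list of (top, bottom) string pairs
    indices -- non-empty sequence of tile indices (tiles may repeat)
    """
    if not indices:
        return False  # the empty sequence is not accepted as a solution
    top = "".join(tiles[i][0] for i in indices)
    bottom = "".join(tiles[i][1] for i in indices)
    return top == bottom


# A small textbook-style instance with three tiles.
tiles = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(is_pcp_solution(tiles, [2, 1, 2, 0]))  # both rows spell "bbaabbbaa"
print(is_pcp_solution(tiles, [0, 1]))
```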

翻訳日:2023-08-27 05:32:15 公開日:2023-08-08

# AdaptEx: セルフサービスのコンテキストバンドプラットフォーム

AdaptEx: A Self-Service Contextual Bandit Platform ( http://arxiv.org/abs/2308.08650v1 )

ライセンス: Link先を確認

William Black, Ercument Ilhan, Andrea Marchini and Vilda Markeviciute

(参考訳) 本稿では,Expedia Groupで広く利用されているセルフサービスコンテキスト型バンディットプラットフォームであるAdaptExについて述べる。 AdaptExは、各訪問者のユニークなコンテキストを考慮し、最適なバリエーションを選択し、それらが行うすべてのインタラクションから素早く学習する。従来のテストメソッドに関連するコストと時間を最小化しながら、ユーザエクスペリエンスを改善する強力なソリューションを提供する。このプラットフォームは、常に変化するコンテンツや継続的な"コールドスタート"状況でも、最適な製品ソリューションへのイテレーションを迅速に行うことができる。

This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly and gracefully, even with ever-changing content and continuous "cold start" situations.
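
The abstract does not say which bandit algorithms AdaptEx uses, so the following is only a generic sketch of the contextual bandit idea: an epsilon-greedy learner that keeps a running mean reward per (context, variant) pair. The class name, its parameters, and the example contexts are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Toy contextual bandit; contexts must be hashable (e.g. a string or tuple)."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)   # (context, variant) -> number of observations
        self.means = defaultdict(float)  # (context, variant) -> running mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known variant.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=lambda v: self.means[(context, v)])

    def update(self, context, variant, reward):
        # Incremental mean update after observing one interaction.
        key = (context, variant)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


bandit = EpsilonGreedyContextualBandit(["A", "B"], epsilon=0.0)
bandit.update("mobile", "A", 0.0)
bandit.update("mobile", "B", 1.0)
print(bandit.select("mobile"))
```

A context that has never been observed has all means at zero, so the learner falls back to the first variant; that is one simple way to picture the "cold start" situation mentioned in the abstract.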

翻訳日:2023-08-27 05:16:13 公開日:2023-08-08

# 人工知能のメタヒューリスティックアルゴリズムとバイオインフォマティクス, バイオ統計学, 生態学, 製造業への応用

Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology and, the Manufacturing Industries ( http://arxiv.org/abs/2308.10875v1 )

ライセンス: Link先を確認

Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong

(参考訳) 自然にインスパイアされたメタヒューリスティックアルゴリズムは、人工知能の重要なコンポーネントであり、様々な最適化問題に取り組むために、分野間でますます使われています。我々は,CSO-MAを用いた競合Swarm Optimizationrという,自然に着想を得たメタヒューリスティックアルゴリズムを新たに提案し,その柔軟性と性能を,統計学における様々な最適化問題に適用した。特に、アルゴリズムは効率的であり、様々なコスト構造や複数のユーザ指定非線形制約を組み込むことができる。私たちのアプリケーションには一単細胞一般化傾向モデルにおけるパラメータの最大推定値を求め、バイオインフォマティクスにおける擬似時間を研究する。 (ii)教育研究における一般的なraschモデルにおけるパラメータの推定 (iii)マルコフ更新モデルにおけるcox回帰のためのm-estimatesの探索と (4) 2つのコンパートメントモデルにおける欠落値を暗示する行列補完。さらに応用についても論じる。 (v)生態問題において最適な変数を選定し、 (vi)複数の相互作用因子をもつロジスティックモデルを用いて自動車産業のための燃料補給実験を設計する。

Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.

翻訳日:2023-08-27 05:07:22 公開日:2023-08-08

# transtyler: 顔と身体のジェスチャー生成のためのマルチモーダルな動作スタイル転送

TranSTYLer: Multimodal Behavioral Style Transfer for Facial and Body Gestures Generation ( http://arxiv.org/abs/2308.10843v1 )

ライセンス: Link先を確認

Mireille Fares, Catherine Pelachaud, Nicolas Obin

(参考訳) 本稿では,仮想エージェントの行動表現スタイルを他のエージェントに移し,コミュニケーション的意味を持つ行動形態を保ちながら,行動表現スタイルを他のエージェントに移すことの課題について述べる。ここでは行動表現性スタイルを行動の質的特性と見なす。そこで我々は,TranSTYLerを提案する。TranSTYLerは,ソース話者のマルチモーダル動作をターゲット話者のスタイルで合成するマルチモーダルトランスフォーマーモデルである。行動表現スタイルは, テキスト, 音声, 身体ジェスチャー, 表情など, 様々なコミュニケーションのモダリティにまたがってコード化されていると仮定する。このモデルはスタイルとコンテンツの絡み合いスキーマを使用して、転送されたスタイルがソースの振る舞いによって伝達される意味に干渉しないようにします。提案手法は,スタイルラベルの必要性を排除し,トレーニング期間中に見られなかったスタイルへの一般化を可能にする。我々はPATSコーパスでモデルをトレーニングし、ダイアログや2D顔のランドマークを含むように拡張した。客観的および主観的評価は,本モデルがトレーニング中の見知らぬスタイルと見知らぬスタイルの両方において,アートモデルの状態よりも優れていたことを示している。そこで本稿では,コンテンツのリークや流儀の漏えい問題に対処するために,対象のスタイルに関連する動作やジェスチャーの伝達の程度を評価する手法を提案する。

This paper addresses the challenge of transferring the behavior expressivity style of one virtual agent to another while preserving the shape of behaviors, as they carry communicative meaning. Behavior expressivity style is viewed here as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal transformer-based model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. We assume that behavior expressivity style is encoded across various modalities of communication, including text, speech, body gestures, and facial expressions. The model employs a style and content disentanglement schema to ensure that the transferred style does not interfere with the meaning conveyed by the source behaviors. Our approach eliminates the need for style labels and allows the generalization to styles that have not been seen during the training phase. We train our model on the PATS corpus, which we extended to include dialog acts and 2D facial landmarks. Objective and subjective evaluations show that our model outperforms state-of-the-art models in style transfer for both styles seen and unseen during training. To tackle the issues of style and content leakage that may arise, we propose a methodology to assess the degree to which behavior and gestures associated with the target style are successfully transferred, while ensuring the preservation of the ones related to the source content.

翻訳日:2023-08-27 05:06:38 公開日:2023-08-08

# スマートエネルギー管理のための電流特徴可視化に基づく非侵入電力負荷モニタリング手法

Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management ( http://arxiv.org/abs/2308.11627v1 )

ライセンス: Link先を確認

Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, and Sam Kwong

(参考訳) 最先端のスマートシティは、特に電力システムにおいて、大規模ネットワーク上で経済的に効率的なエネルギー管理を求められている。システム内の全ユーザの電力負荷を監視、分析、制御することが重要な問題である。本稿では,aiの一般的なコンピュータビジョン技術を用いて,スマートエネルギー管理のための非侵襲的負荷監視手法を提案する。まず,信号変換(ウェーブレット変換と離散フーリエ変換を含む)とグラミアン角場(GAF)法の両方を用いて,一次元の電流信号を2次元カラー特徴像にマッピングする。第2に,多スケール特徴抽出と注意機構を備えたu字型ディープニューラルネットワークを用いて,カラー特徴画像からすべての電気負荷を認識することを提案する。第3に,本手法をクラウドベースで非侵襲的な全ユーザモニタリングとして設計し,電力系統制御時の省エネルギー化を図る。大規模IoT(Internet of Things, モノのインターネット)上での効率的なエネルギー管理を支援することを目的として, 提案手法の有効性を実証した。

The state-of-the-art smart city calls for economical and efficient energy management over large-scale networks, especially for the electric power system. It is a critical issue to monitor, analyze and control the electric loads of all users in the system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated that our method outperforms its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT).
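
To illustrate how a one-dimensional signal becomes a two-dimensional image, here is a minimal sketch of the summation variant of the Gramian Angular Field: the signal is rescaled to [-1, 1], each value is mapped to an angle via arccos, and entry (i, j) of the image is the cosine of the sum of the two angles. This covers only the GAF step in its standard textbook form; the paper's full pipeline (wavelet and Fourier transforms, color mapping) is not reproduced, and the function name is an assumption.

```python
import math


def gramian_angular_summation_field(signal):
    """Map a 1-D sequence of numbers to a square matrix (list of lists)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must contain at least two distinct values")
    # Rescale to [-1, 1]; the clamp guards against floating-point overshoot.
    scaled = [max(-1.0, min(1.0, 2.0 * (v - lo) / (hi - lo) - 1.0)) for v in signal]
    angles = [math.acos(s) for s in scaled]
    # Entry (i, j) encodes the pairwise relation cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in angles] for a in angles]


image = gramian_angular_summation_field([0.0, 1.0, 2.0, 3.0])
print(len(image), len(image[0]))
```

The resulting matrix is symmetric with entries in [-1, 1], so it can be rendered directly as a grayscale or color-mapped image for a vision model.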

翻訳日:2023-08-27 04:59:12 公開日:2023-08-08

# PokerKit: 細粒度多変数ポーカーゲームシミュレーションのための総合Pythonライブラリ

PokerKit: A Comprehensive Python Library for Fine-Grained Multi-Variant Poker Game Simulations ( http://arxiv.org/abs/2308.07327v1 )

ライセンス: Link先を確認

Juho Kim

(参考訳) PokerKitは、既存のポーカーゲームシミュレーションと手評価ツールの制限を克服するために設計された、オープンソースのPythonライブラリである。対照的に、PokerKitはポーカーの多種多様なバリエーションをサポートし、ユーザーが独自のゲームを定義するための柔軟なアーキテクチャを提供する。本稿では,PokerKitの設計と実装について詳述する。PokerKitは,直感的なプログラムAPI,多変量ゲームサポート,さまざまな手のタイプにわたる統一的なハンド評価スイートなどである。 PokerKitの柔軟性により、ポーカーAI開発、ツール作成、オンラインポーカーカジノ実装など、さまざまな分野のアプリケーションが可能になる。 PokerKitの信頼性は、静的な型チェック、広範なdocテスト、ユニットテストによって確立され、97%のコードカバレッジを達成している。 PokerKitの導入は、コンピュータポーカーの分野への重要な貢献であり、様々なポーカーゲームのための将来の研究と高度なAI開発を促進する。

PokerKit is an open-source Python library designed to overcome the restrictions of existing poker game simulation and hand evaluation tools, which typically support only a handful of poker variants and lack flexibility in game state control. In contrast, PokerKit significantly expands this scope by supporting an extensive array of poker variants and it provides a flexible architecture for users to define their custom games. This paper details the design and implementation of PokerKit, including its intuitive programmatic API, multi-variant game support, and a unified hand evaluation suite across different hand types. The flexibility of PokerKit allows for applications in diverse areas, such as poker AI development, tool creation, and online poker casino implementation. PokerKit's reliability has been established through static type checking, extensive doctests, and unit tests, achieving 97% code coverage. The introduction of PokerKit represents a significant contribution to the field of computer poker, fostering future research and advanced AI development for a wide variety of poker games.

翻訳日:2023-08-20 16:29:00 公開日:2023-08-08

# 圧縮, 類似性検索, クラスタリング, 組織化, cDNAライブラリの操作改善のためのシーケンス類似性とコンテキストによるベクトル埋め込み

Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries ( http://arxiv.org/abs/2308.05118v1 )

ライセンス: Link先を確認

Daniel H. Um, David A. Knowles, Gail E. Kaiser

(参考訳) 本稿では、フラット文字列遺伝子形式(FASTA/FASTQ5)の研究における、遺伝子の組織的数値表現の有用性を示す。 FASTA/FASTQファイルには、ファイルサイズ、マッピングとアライメントの処理速度の遅さ、コンテキスト依存など、いくつかの制限がある。これらの課題は、類似のシーケンスを見つけることに関わる調査やタスクを著しく妨げている。この解は、配列を別の表現に変換することで、生の配列自身と比較して、類似したグループへのクラスタリングを容易にする。各ショートシーケンスに独自のベクトル埋め込みを割り当てることで、cDNAライブラリの文字列表現に対する圧縮性能をより効率的にクラスタリングし、改善することができる。さらに,コドン三重項の文脈に基づく交互座標ベクトル埋め込みの学習により,アミノ酸特性に基づくクラスタリングを示すことができる。最後に、バーコードとcDNA配列をエンコードするためにこのシーケンス埋め込み法を用いることで、ユークリッド空間におけるベクトルの近接性を決定するアルゴリズムとベクトル埋め込みを結合することで、類似検索の時間的複雑さを向上させることができる。

This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ). FASTA/FASTQ files have several current limitations, such as their large file sizes, slow processing speeds for mapping and alignment, and contextual dependencies. These challenges significantly hinder investigations and tasks that involve finding similar sequences. The solution lies in transforming sequences into an alternative representation that facilitates easier clustering into similar groups compared to the raw sequences themselves. By assigning a unique vector embedding to each short sequence, it is possible to more efficiently cluster and improve upon compression performance for the string representations of cDNA libraries. Furthermore, through learning alternative coordinate vector embeddings based on the contexts of codon triplets, we can demonstrate clustering based on amino acid properties. Finally, using this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of the similarity search by coupling vector embeddings with an algorithm that determines the proximity of vectors in Euclidean space; this allows us to perform sequence similarity searches in a quicker and more modular fashion.

翻訳日:2023-08-11 14:59:22 公開日:2023-08-08

# PTransIPs:タンパク質事前学習言語モデルとトランスフォーマーに基づくリン酸化部位の同定

PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer ( http://arxiv.org/abs/2308.05115v1 )

ライセンス: Link先を確認

Ziyang Xu and Haitian Zhong

(参考訳) リン酸化は多くの基本的な細胞プロセスの中心であり、様々な疾患の発症と進行に影響を与える。リン酸化部位の同定は、細胞やウイルス感染の分子機構を理解するための重要なステップであり、新たな治療標的となる可能性がある。本研究では,リン酸化部位の同定のための新しい深層学習モデルであるPTransIPsを提案する。 PTransIPsは、タンパク質配列中のアミノ酸を自然言語の単語として扱い、配列中のアミノ酸の位置と型に基づくユニークなエンコーディングを抽出する。また、大きな事前訓練されたタンパク質モデルの埋め込みを追加のデータ入力として組み込む。 PTransIPsはさらに、残差接続を持つ畳み込みニューラルネットワークと、マルチヘッドアテンション機構を備えたトランスフォーマーモデルの組み合わせモデルに基づいて訓練される。最後に、モデルは完全な連結層を通して分類結果を出力する。独立試験の結果、PTransIPsは既存の最先端手法よりも優れており、リン酸化S/T部位とY部位をそれぞれ同定するためのAUROCs 0.9232と0.9660が達成されている。さらに,プレトレーニングモデル埋め込みがPTransIPsの性能に寄与することを示す。さらに、PTransIPsは、解釈可能なアミノ酸嗜好、可視訓練プロセスを有し、他の生物活性分類タスクにおける一般化性を示す。使用を容易にするため、コードとデータは https://github.com/StatXzy7/PTransIPs で公開されています。

Phosphorylation is central to numerous fundamental cellular processes, influencing the onset and progression of a variety of diseases. Identification of phosphorylation sites is thus an important step for understanding the molecular mechanisms of cells and virus infection, which potentially leads to new therapeutic targets. In this study, we present PTransIPs, a novel deep learning model for the identification of phosphorylation sites. PTransIPs treats amino acids in protein sequences as words in natural language, extracting unique encodings based on the types and positions of amino acids in the sequence. It also incorporates embeddings from large pre-trained protein models as additional data inputs. PTransIPs is further trained on a model combining a convolutional neural network with residual connections and a Transformer model equipped with multi-head attention mechanisms. Finally, the model outputs classification results through a fully connected layer. The results of independent testing reveal that PTransIPs outperforms existing state-of-the-art methodologies, achieving AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites respectively. In addition, ablation studies prove that pretrained model embeddings contribute to the performance of PTransIPs. Furthermore, PTransIPs has interpretable amino acid preferences and a visible training process, and shows generalizability to other bioactivity classification tasks. To facilitate usage, our code and data are publicly accessible at https://github.com/StatXzy7/PTransIPs.

翻訳日:2023-08-11 14:59:00 公開日:2023-08-08

# ankylosing spondylitis に対する脊椎x線自動スコアリングの試み

Towards Automatic Scoring of Spinal X-ray for Ankylosing Spondylitis ( http://arxiv.org/abs/2308.05123v1 )

ライセンス: Link先を確認

Yuanhan Mo and Yao Chen and Aimee Readie and Gregory Ligozio and Thibaud Coroller and Bartłomiej W. Papież

(参考訳) 脊椎X線画像におけるStoke Ankylosing Spondylitis Spinal Score (mSASSS) の適応による構造変化は, 骨形状の複雑さと画像品質の変化により, コストと時間を要する。本研究では,x線脊椎イメージングにおいて,頚椎・腰椎ユニット(vus)のmsasssスコアを自動予測するために,vertxgradenetと呼ばれる2段階の自動グレーディングパイプラインを試作することで,この課題に対処した。 VertXGradeNetは、以前開発したVU抽出パイプライン(VertXNet)によって生成されたVUを入力として使用し、それらのVUに基づいてmSASSSを予測する。 vertxgradenet は軸椎変形性関節症患者の頚椎外側x線および腰椎x線画像の社内データセットで評価した。以上の結果から,VertXGradeNetは,データ量に制限のある場合,各VUのmSASSSスコアを予測できることがわかった。全体として、4つの異なるmSASSSスコア(すなわち、2つのテストデータセットで0, 1, 2, 3)に対して0.56と0.51のバランスの取れた精度を達成することができる。この方法の精度は, 脊髄x線読影の合理化の可能性を示し, 今後の臨床試験の費用削減に寄与する。

Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by prototyping a 2-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for the cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. The VertXGradeNet utilizes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS based on those VUs. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for axial spondylarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU when the data is limited in quantity and imbalanced. Overall, it can achieve a balanced accuracy of 0.56 and 0.51 for 4 different mSASSS scores (i.e., a score of 0, 1, 2, 3) on two test datasets. The accuracy of the presented method shows the potential to streamline the spinal radiograph readings and therefore reduce the cost of future clinical trials.
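
The abstract reports balanced accuracy because the four mSASSS grades are imbalanced. It does not spell out the formula, so the sketch below assumes the common definition: the unweighted mean of per-class recall. The function name and the example labels are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical grades: score 0 dominates, scores 2 and 3 are rare.
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
print(balanced_accuracy(y_true, y_pred))
```

On this toy data plain accuracy is 5/8, while balanced accuracy is lower because the single grade-3 example is missed and every class counts equally.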

翻訳日:2023-08-11 14:47:47 公開日:2023-08-08

# fMRIによる自閉症スペクトラム障害の予測

Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder ( http://arxiv.org/abs/2308.05122v1 )

ライセンス: Link先を確認

Nicha C. Dvornek, Catherine Sullivan, James S. Duncan, Abha R. Gupta

(参考訳) 自閉症スペクトラム障害(ASD)の多因子的エティロジーは、その研究が、神経画像、遺伝学、臨床評価など、幅広いプラットフォームからのデータを組み合わせたマルチモーダルアプローチから大きな恩恵を受けることを示唆している。以前のニューロイメージング・ジェネティック分析は、しばしば、データ駆動型作業においてナイーブな特徴結合アプローチを適用したり、あるモダリティからの発見を別のモダリティ分析のガイドに用いたりし、真に統一されたアプローチでペア化されたマルチモーダルデータを解析する機会を欠いた。本稿では、遺伝、人口統計、神経画像データを組み合わせたより統合的なモデルを開発する。遺伝子型が表現型に与える影響に着想を得て,モデル予測において重要な神経画像の特徴に注意を向ける注意型アプローチを提案する。遺伝データはコピー数の変化パラメータから、神経画像データは機能的磁気共鳴画像から得られる。 ASD分類と重大度予測タスクに対する提案手法を,228 ASDの性バランスデータセットを用いて評価し,典型的には10倍のクロスバリデーションフレームワークで被験者を育成する。遺伝情報,人口統計データ,機能的磁気共鳴画像を組み合わせた注意に基づくモデルが,他のマルチモーダル手法と比較して優れた予測性能をもたらすことを実証した。

The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.

翻訳日:2023-08-11 14:47:20 公開日:2023-08-08

# インスツルメンテーション・アンド・コントロールシステムに統合された機械学習法の動的モデルの信頼性評価

Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems ( http://arxiv.org/abs/2308.05120v1 )

ライセンス: Link先を確認

Edward Chen, Han Bao, Nam Dinh

(参考訳) 近年、データ駆動ニューラルネットワークベースの機械学習(ML)アルゴリズムの分野は著しく成長し、計測と制御システムへの適用性の研究が加速している。運用環境では有望だが、そのようなアルゴリズムの信頼性は十分に評価されていない。総合的なリスクモデリングの欠如は、これらのシステムの信頼性を低下させる可能性がある。全米標準技術研究所の最近の報告では、MLの信頼性は採用にとって重要な障壁であり、インテリジェントシステムの安全かつ説明責任のある運用において重要な役割を果たす。そこで本研究では,トレーニングデータセットに分散検出を組み込むことで,ml予測の相対的信頼性を評価するリアルタイムモデル非依存手法を提案する。 MLアルゴリズムは補間(または近補間)タスクでは優れているが、補間では著しく劣化する。これは、新しいサンプルがトレーニングサンプルから"遠い"場合に発生する。この手法はlaplacian distributed decay for reliability (laddr)と呼ばれ、予測の相対的信頼性を計算するために使用される運用データと訓練データセットの違いを決定する。 LADDRは、フィードフォワードニューラルネットワークベースのモデルで、異なるフローの損失遷移における安全性の重要な要因を予測する。 LADDRは「データスーパーバイザ」として意図され、運用条件の文脈でよく訓練されたMLモデルの適切性を決定する。最終的に、LADDRは、従来の補間タスクに使用する場合のML予測の信頼性を支える証拠としてトレーニングデータを使用する方法を示している。

In recent years, the field of data-driven neural network-based machine learning (ML) algorithms has grown significantly and spurred research in its applicability to instrumentation and control systems. While they are promising in operational contexts, the trustworthiness of such algorithms is not adequately assessed. Failures of ML-integrated systems are poorly understood; the lack of comprehensive risk modeling can degrade the trustworthiness of these systems. In recent reports by the National Institute of Standards and Technology, trustworthiness in ML is a critical barrier to adoption and will play a vital role in intelligent systems' safe and accountable operation. Thus, in this work, we demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset. It is well documented that ML algorithms excel at interpolation (or near-interpolation) tasks but significantly degrade at extrapolation. This occurs when new samples are "far" from training samples. The method, referred to as the Laplacian distributed decay for reliability (LADDR), determines the difference between the operational and training datasets, which is used to calculate a prediction's relative reliability. LADDR is demonstrated on a feedforward neural network-based model used to predict safety-significant factors during different loss-of-flow transients. LADDR is intended as a "data supervisor" and determines the appropriateness of well-trained ML models in the context of operational conditions. Ultimately, LADDR illustrates how training data can be used as evidence to support the trustworthiness of ML predictions when utilized for conventional interpolation tasks.
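
The abstract does not give the LADDR formula, so the following is not the authors' method. It is only a sketch of the general idea that reliability should decay as an operational sample moves away from the training data, here using an assumed Laplace-style decay exp(-d / scale) of the distance d to the nearest training point. The function name and the `scale` parameter are hypothetical.

```python
import math


def relative_reliability(query, training_points, scale=1.0):
    """Return a score in (0, 1]: 1.0 on a training point, decaying with distance."""
    nearest = min(math.dist(query, point) for point in training_points)
    return math.exp(-nearest / scale)


training = [(0.0, 0.0), (1.0, 1.0)]
print(relative_reliability((0.0, 0.0), training))          # on a training sample
print(relative_reliability((3.0, 4.0), [(0.0, 0.0)], 5.0))  # an extrapolation case
```

A supervisor built on such a score could flag predictions whose reliability falls below a chosen threshold, which matches the "data supervisor" role described in the abstract only in spirit.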

翻訳日:2023-08-11 14:46:53 公開日:2023-08-08

# 2次元のディラックデルタシュロディンガーポテンシャルに対する特異連続 L$^2(\mathbb{R}^2)$境界状態解の特異点スペクトルと固有ベクトル

The Exact Point Spectrum and Eigenvector of the Unique Continuous L$^2(\mathbb{R}^2)$ Bound State Solution to the Dirac Delta Schrodinger Potential in Two Dimensions ( http://arxiv.org/abs/2308.05195v1 )

ライセンス: Link先を確認

Michael Maroun

(参考訳) 2次元と3次元のディラックデルタ関数の点スペクトル、すなわち境界状態エネルギー固有値を分析することは、典型的には正規化や再正規化を伴わずに非常に難しい。この2次元の理由は2つの折りたたみである。 1) 結合定数は質量とプランク定数と共に単数量を形成する。これにより、異常な長さのスケールが失われる。 2) 直ちに明らかな l$^2$ の解は原点において発散し、ディラックデルタポテンシャルは測度として重要な支持点を持つ。ここで示される解の一意性から、線型作用素(すべての$\mathbb{r}^2$ 上の2次元ラプラス作用素)が、ここで構成される特別な領域を持つと、点スペクトルがちょうど1つの要素を持つことが保証される。この要素は正確に決定され、異常な長さスケールに対する自然な数学的厳密な分解が起こる。この研究において、任意の種類の再正規化や正規化には関係がない。

Analyzing the point spectrum, i.e. bound state energy eigenvalue, of the Dirac delta function in two and three dimensions is notoriously difficult without recourse to regularization or renormalization, typically both. The reason for this in two dimensions is twofold: 1) the coupling constant, together with the mass and Planck's constant, forms a unitless quantity. This causes there to be a missing anomalous length scale. 2) The immediately obvious L$^2$ solution is divergent at the origin, where the Dirac delta potential has its important point of support as a measure. Due to the uniqueness of the solution presented here, it is immediate that the linear operator (the two dimensional Laplace operator on all of $\mathbb{R}^2$), with the specialized domain constructed here, ensures that the point spectrum has exactly one element. This element is determined precisely, and a natural mathematically rigorous resolution to the anomalous length scale arises. In this work, there is no recourse to renormalization or regularization of any kind.

翻訳日:2023-08-11 14:27:50 公開日:2023-08-08

# $(1+1)D$ QED散乱過程における絡み合い生成

Entanglement generation in $(1+1)D$ QED scattering processes ( http://arxiv.org/abs/2105.03445v3 )

ライセンス: Link先を確認

Marco Rigobello, Simone Notarnicola, Giuseppe Magnifico, Simone Montangero

(参考訳) テンソルネットワークを用いた1+1$次元QEDにおける実時間中間子散乱過程について検討した。自由フェルミオンモデルに基づく近似を導入することで、与えられた運動量と位置を持つ初期中間波パケットを作成する。次に, 2つの初期分離結合中間子の動力学を計算し, 相互作用強度および初期状態が弱結合系および中間結合系で変化することを観測した。最後に, 弾性衝突を考慮し, いくつかの散乱振幅とプロセスによって生じる絡み合いを計測する。驚くべきことに, 外部の中間子間の漸近的絡み合いに対する2つの異なるレジームを同定し, 結合関数としての成長が急激に加速するしきい値結合よりも摂動的に小さい。

We study real-time meson-meson scattering processes in $(1+1)$-dimensional QED by means of Tensor Networks. We prepare initial meson wave packets with given momentum and position introducing an approximation based on the free fermions model. Then, we compute the dynamics of two initially separated colliding mesons, observing a rich phenomenology as the interaction strength and the initial states are varied in the weak and intermediate coupling regimes. Finally, we consider elastic collisions and measure some scattering amplitudes as well as the entanglement generated by the process. Remarkably, we identify two different regimes for the asymptotic entanglement between the outgoing mesons: it is perturbatively small below a threshold coupling, past which its growth as a function of the coupling abruptly accelerates.

翻訳日:2023-08-10 18:38:54 公開日:2023-08-08

# 準最適サンプル複素数を持つゼロサムマルコフゲームにおけるモデルベースマルチエージェントRL

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity ( http://arxiv.org/abs/2007.07461v3 )

ライセンス: Link先を確認

Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

(参考訳) 実験モデルを用いたモデルベース強化学習(RL)は,RLのコーナーストーンの1つとして長年認識されてきた。学習と計画段階を自然に分離するマルチエージェントRL(MARL)に特に適しており、全てのエージェントがサンプルを使用してポリシーを同時に改善する場合、非定常問題を回避する。直感的で広く使われているが、モデルベースMARLアルゴリズムのサンプル複雑性は十分に研究されていない。本稿では,サンプルの複雑さに関する根本的な問題に対処することを目的とする。生成モデルにのみアクセス可能な2プレイヤーのゼロサムマルコフゲームについて,最も基本的なMARL設定について検討した。モデルベースMARLは、Nash平衡値(NE)を求めるために$\tilde O(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$と、滑らかな計画オラクルを持つ$\epsilon$-NEポリシーのサンプル複雑性を達成し、$\gamma$は割引係数であり、$S,A,B$は状態空間と2つのエージェントのアクション空間を表す。さらに,アルゴリズムが報酬に依存しない場合,そのようなサンプル境界がミニマックス最適(対数係数まで)であることが示され,アルゴリズムは報酬知識のない遷移サンプルを検索し,一致した下位境界を確立する。これは通常の報酬対応の設定とは対照的で、$\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound である。今回の結果は,MARLにおけるモデルベースアプローチのサンプル効率を示すだけでなく,そのパワー(より困難な報酬非依存のケースを簡易に処理する)と制限($|A|,|B|$の適応的かつ最適でない)との根本的なトレードオフを詳細に示すものである。

Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intuitive and widely used, the sample complexity of model-based MARL algorithms has not been fully investigated. In this paper, our goal is to address the fundamental question about its sample complexity. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. We show that model-based MARL achieves a sample complexity of $\tilde O(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error, and the $\epsilon$-NE policies with a smooth planning oracle, where $\gamma$ is the discount factor, and $S,A,B$ denote the state space, and the action spaces for the two agents. We further show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge, by establishing a matching lower bound. This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence. Our results not only demonstrate the sample-efficiency of this basic model-based approach in MARL, but also elaborate on the fundamental tradeoff between its power (easily handling the more challenging reward-agnostic case) and limitation (less adaptive and suboptimal in $|A|,|B|$), which arises particularly in the multi-agent context.

翻訳日:2023-08-10 18:38:23 公開日:2023-08-08
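To make the gap between the two bounds in the abstract concrete, the following minimal sketch evaluates both expressions with constants and logarithmic factors dropped. The function name `sample_bounds` and the example sizes are illustrative assumptions, not part of the paper.

```python
import math


def sample_bounds(num_states, num_actions_a, num_actions_b, gamma, epsilon):
    """Return (model_based_upper, reward_aware_lower), ignoring constants and log factors."""
    # Shared factor (1 - gamma)^-3 * epsilon^-2 that appears in both bounds.
    horizon_term = (1.0 - gamma) ** -3 * epsilon ** -2
    upper = num_states * num_actions_a * num_actions_b * horizon_term
    lower = num_states * (num_actions_a + num_actions_b) * horizon_term
    return upper, lower


# Hypothetical problem size, chosen only for illustration.
upper, lower = sample_bounds(num_states=10, num_actions_a=4, num_actions_b=4,
                             gamma=0.9, epsilon=0.1)
# The ratio depends only on the action spaces: |A||B| / (|A| + |B|).
gap = upper / lower
```

The ratio isolates the only place where the two bounds differ, namely the dependence on $|A|$ and $|B|$.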

# Constrained Optimization via Quantum Zeno Dynamics ( http://arxiv.org/abs/2209.15024v6 )

License: see the linked page

Dylan Herman, Ruslan Shaydulin, Yue Sun, Shouvanik Chakrabarti, Shaohan Hu, Pierre Minssen, Arthur Rattew, Romina Yalovetzky, Marco Pistoia

Constrained optimization problems are ubiquitous in science and industry. Quantum algorithms have shown promise in solving optimization problems, yet none of the current algorithms can effectively handle arbitrary constraints. We introduce a technique that uses quantum Zeno dynamics to solve optimization problems with multiple arbitrary constraints, including inequalities. We show that the dynamics of quantum optimization can be efficiently restricted to the in-constraint subspace on a fault-tolerant quantum computer via repeated projective measurements, requiring only a small number of auxiliary qubits and no post-selection. Our technique has broad applicability, which we demonstrate by incorporating it into the quantum approximate optimization algorithm (QAOA) and variational quantum circuits for optimization. We evaluate our method numerically on portfolio optimization problems with multiple realistic constraints and observe better solution quality and higher in-constraint probability than state-of-the-art techniques. We implement a proof-of-concept demonstration of our method on the Quantinuum H1-2 quantum processor.

Translation date: 2023-08-10 18:31:58 Publication date: 2023-08-08
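The role of repeated projective measurements can be illustrated with the textbook two-level Zeno effect. This is a toy sketch, not the construction used in the paper: a state is rotated away from its initial subspace by a total angle, and the rotation is interrupted by equally spaced projections back onto that subspace. The function name `survival_probability` is an assumption made for this example.

```python
import math


def survival_probability(total_angle, num_measurements):
    """Probability that every projection finds the state in its initial subspace.

    Each segment rotates the state by total_angle / num_measurements, and each
    projection succeeds with probability cos^2 of that segment angle.
    """
    step = total_angle / num_measurements
    return math.cos(step) ** (2 * num_measurements)


# A quarter-turn rotation would fully leave the subspace without intermediate projections.
single = survival_probability(math.pi / 2, 1)
frequent = survival_probability(math.pi / 2, 100)
```

As the number of measurements grows, the survival probability approaches 1, which is the mechanism that keeps the dynamics inside the constrained subspace in this simplified picture.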

# M$^2$-3DLaneNet: Exploring Multi-Modal 3D Lane Detection ( http://arxiv.org/abs/2209.05996v3 )

License: see the linked page

Yueru Luo, Xu Yan, Chaoda Zheng, Chao Zheng, Shuqi Mei, Tang Kun, Shuguang Cui, Zhen Li

Estimating accurate lane lines in 3D space remains challenging due to their sparse and slim nature. Previous works mainly focused on using images for 3D lane detection, leading to inherent projection error and loss of geometry information. To address these issues, we explore the potential of leveraging LiDAR for 3D lane detection, either as a standalone method or in combination with existing monocular approaches. In this paper, we propose M$^2$-3DLaneNet to integrate complementary information from multiple sensors. Specifically, M$^2$-3DLaneNet lifts 2D features into 3D space by incorporating geometry information from LiDAR data through depth completion. Subsequently, the lifted 2D features are further enhanced with LiDAR features through cross-modality BEV fusion. Extensive experiments on the large-scale OpenLane dataset demonstrate the effectiveness of M$^2$-3DLaneNet, regardless of the range (75 m or 100 m).

Translation date: 2023-08-10 18:30:39 Publication date: 2023-08-08
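Lifting a 2D feature into 3D space with a per-pixel depth can be sketched with standard pinhole back-projection. The abstract does not specify the camera model, so the intrinsics (`fx`, `fy`, `cx`, `cy`) and the function name `lift_pixel` below are assumptions for illustration only.

```python
def lift_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera coordinates.

    Assumes an ideal pinhole camera: focal lengths fx, fy and principal
    point (cx, cy), all in pixels; depth is measured along the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640x480 image.
center_point = lift_pixel(320, 240, 10.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
side_point = lift_pixel(820, 240, 10.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

In a full pipeline the depth would come from a depth-completion step on LiDAR data, and the lifted points would then be binned into a bird's-eye-view grid.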

# Archangel: A Hybrid UAV-based Human Detection Benchmark with Position and Pose Metadata ( http://arxiv.org/abs/2209.00128v3 )

License: see the linked page

Yi-Ting Shen, Yaesop Lee, Heesung Kwon, Damon M. Conover, Shuvra S. Bhattacharyya, Nikolas Vale, Joshua D. Gray, G. Jeremy Leong, Kenneth Evensen, Frank Skirlo

Learning to detect objects, such as humans, in imagery captured by an unmanned aerial vehicle (UAV) usually suffers from tremendous variations caused by the UAV's position relative to the objects. In addition, existing UAV-based benchmark datasets do not provide adequate dataset metadata, which is essential for precise model diagnosis and learning features invariant to those variations. In this paper, we introduce Archangel, the first UAV-based object detection dataset composed of real and synthetic subsets captured with similar imaging conditions and UAV position and object pose metadata. A series of experiments are carefully designed with a state-of-the-art object detector to demonstrate the benefits of leveraging the metadata during model evaluation. Moreover, several crucial insights involving both real and synthetic data during model optimization are presented. In the end, we discuss the advantages, limitations, and future directions regarding Archangel to highlight its distinct value for the broader machine learning community.

Translation date: 2023-08-10 18:29:37 Publication date: 2023-08-08

# Kinetic Inductive Electromechanical Transduction for Nanoscale Force Sensing ( http://arxiv.org/abs/2301.11055v4 )

License: see the linked page

August K. Roos, Ermes Scarano, Elisabet K. Arvidsson, Erik Holmgren, David B. Haviland

We use the principles of cavity optomechanics to design a resonant mechanical force sensor for atomic force microscopy. The sensor is based on a type of electromechanical coupling, dual to traditional capacitive coupling, whereby the motion of a cantilever induces surface strain that causes a change in the kinetic inductance of a superconducting nanowire. The cavity is realized by a compact microwave-plasma mode with an equivalent $LC$ circuit involving the kinetic inductance of the nanowire. The device is fully coplanar and we show how to transform the cavity impedance for optimal coupling to the transmission line and the following amplifier. For the device presented here, we estimate the bare kinetic inductive mechano-electric coupling (KIMEC) rate $g_0 / 2 \pi$ to be in the range 3-10 Hz. We demonstrate phase-sensitive detection of cantilever motion using a multifrequency pumping and measurement scheme.

Translation date: 2023-08-10 18:09:24 Publication date: 2023-08-08

# Quantifying Causes of Arctic Amplification via Deep Learning based Time-series Causal Inference ( http://arxiv.org/abs/2303.07122v3 )

License: see the linked page

Sahara Ali, Omar Faruque, Yiyi Huang, Md. Osman Gani, Aneesh Subramanian, Nicole-Jienne Shchlegel, Jianwu Wang

The warming of the Arctic, also known as Arctic amplification, is led by several atmospheric and oceanic drivers. However, the details of its underlying thermodynamic causes are still unknown. Inferring the causal effects of atmospheric processes on sea ice melt using fixed treatment effect strategies leads to unrealistic counterfactual estimations. Such models are also prone to bias due to time-varying confoundedness. Further, the complex non-linearity in Earth science data makes it infeasible to perform causal inference using existing marginal structural techniques. To tackle these challenges, we propose TCINet, a time-series causal inference model that infers causation under continuous treatment using recurrent neural networks and a novel probabilistic balancing technique. Through experiments on synthetic and observational data, we show how our research can substantially improve the ability to quantify leading causes of Arctic sea ice melt, further paving paths for causal inference in observational Earth science.

Translation date: 2023-08-10 17:59:42 Publication date: 2023-08-08

# Active Visual Exploration Based on Attention-Map Entropy ( http://arxiv.org/abs/2303.06457v3 )

License: see the linked page

Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzciński

Active visual exploration addresses the issue of limited sensor capabilities in real-world scenarios, where successive observations are actively chosen based on the environment. To tackle this problem, we introduce a new technique called Attention-Map Entropy (AME). It leverages the internal uncertainty of the transformer-based model to determine the most informative observations. In contrast to existing solutions, it does not require additional loss components, which simplifies the training. Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.

Translation date: 2023-08-10 17:59:25 Publication date: 2023-08-08
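One way to read "internal uncertainty" is as the Shannon entropy of attention weights: a flat attention distribution signals that the model is unsure where to look. The sketch below ranks candidate regions by that entropy. The data layout (a dictionary from region id to a list of attention weights) and both function names are assumptions for illustration, not the paper's implementation.

```python
import math


def entropy(weights):
    """Shannon entropy (in nats) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def most_uncertain_region(attention_maps):
    """Pick the region id whose attention weights have the highest entropy."""
    return max(attention_maps, key=lambda region: entropy(attention_maps[region]))


# Hypothetical attention weights for two candidate regions.
maps = {
    "top_left": [0.97, 0.01, 0.01, 0.01],    # peaked: the model is confident
    "bottom_right": [0.25, 0.25, 0.25, 0.25],  # flat: the model is uncertain
}
next_glimpse = most_uncertain_region(maps)
```

Because the score is computed from attention weights the model already produces, no extra loss term is needed to train a separate selection policy in this simplified picture.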

# Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference ( http://arxiv.org/abs/2303.04673v2 )

License: see the linked page

Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah

Large Language Models (LLMs) have sparked significant interest in their generative capabilities, leading to the development of various commercial applications. The high cost of using the models drives application builders to maximize the value of generation under a limited inference budget. This paper presents a study of optimizing inference hyperparameters such as the number of responses, temperature and max tokens, which significantly affect the utility/cost of text generation. We design a framework named EcoOptiGen which leverages economical hyperparameter optimization and cost-based pruning. Experiments with the GPT-3.5/GPT-4 models on a variety of tasks verify its effectiveness. EcoOptiGen is implemented in the `autogen` package of the FLAML library: https://aka.ms/autogen.

Translation date: 2023-08-10 17:58:53 Publication date: 2023-08-08
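The idea of searching inference hyperparameters under a budget can be sketched generically. This is not the FLAML `autogen` API. The function `budgeted_search`, the cost model, and the utility function below are hypothetical stand-ins, and the pruning rule (skip any configuration whose estimated cost would exceed the remaining budget) is a simplified assumption.

```python
from itertools import product


def budgeted_search(configs, estimate_cost, evaluate, budget):
    """Evaluate configurations in order, skipping those that would exceed the budget."""
    best, best_utility, spent = None, float("-inf"), 0.0
    for cfg in configs:
        cost = estimate_cost(cfg)
        if spent + cost > budget:
            continue  # cost-based pruning: never pay for this configuration
        spent += cost
        utility = evaluate(cfg)
        if utility > best_utility:
            best, best_utility = cfg, utility
    return best, best_utility, spent


# Hypothetical search space over number of responses, temperature and max tokens.
grid = [
    {"n": n, "temperature": t, "max_tokens": m}
    for n, t, m in product((1, 2, 4), (0.0, 0.7), (64, 256))
]


def toy_cost(cfg):
    """Hypothetical cost model: proportional to the number of generated tokens."""
    return cfg["n"] * cfg["max_tokens"] * 1e-3


def toy_utility(cfg):
    """Hypothetical stand-in for a task metric measured on validation data."""
    return 1.0 - 0.5 ** cfg["n"] + 0.1 * cfg["temperature"]


best, utility, spent = budgeted_search(grid, toy_cost, toy_utility, budget=1.0)
```

In practice `evaluate` would call the model on validation examples, which is where the real cost is incurred, so pruning before evaluation is what saves budget.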

# A Universal Question-Answering Platform for Knowledge Graphs ( http://arxiv.org/abs/2303.00595v2 )

License: see the linked page

Reham Omar, Ishika Dhall, Panos Kalnis, Essam Mansour

Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines accessible on the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions to SPARQL. Existing QA systems are typically based on application-specific human-curated rules, or require prior information, expensive pre-processing and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem to convert a question into an intermediate abstract representation via a neural sequence-to-sequence model. We also develop a just-in-time linker that maps the abstract representation, at query time, to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms the state of the art by a large margin in terms of quality of answers and processing time, especially for arbitrary KGs unseen during training.

Translation date: 2023-08-10 17:57:40 Publication date: 2023-08-08
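The last step of such a pipeline, turning linked triple patterns into a SPARQL query string, can be sketched as follows. The triple-pattern representation, the function name `to_sparql`, and the `example.org` URIs are hypothetical and only illustrate the shape of the output. KGQAn's actual intermediate representation is not described in the abstract.

```python
def to_sparql(triples, answer_var="answer", limit=10):
    """Render (subject, predicate, object) patterns as a SPARQL SELECT query.

    Each element is expected to already be a SPARQL term: a variable such as
    "?answer" or a full URI wrapped in angle brackets.
    """
    lines = [f"SELECT DISTINCT ?{answer_var} WHERE {{"]
    for subject, predicate, obj in triples:
        lines.append(f"  {subject} {predicate} {obj} .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical linked form of a question such as "Who wrote Book_1?".
query = to_sparql([
    ("<http://example.org/resource/Book_1>",
     "<http://example.org/ontology/author>",
     "?answer"),
])
```

The resulting string could then be sent to any SPARQL endpoint. Linking the question's phrases to the right URIs is the hard part that the just-in-time linker addresses.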

# DiffIR: Efficient Diffusion Model for Image Restoration ( http://arxiv.org/abs/2303.09472v2 )

License: see the linked page

Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool

Diffusion models (DMs) have achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network. However, unlike image synthesis, image restoration (IR) has a strong constraint to generate results in accordance with the ground truth. Thus, for IR, traditional DMs that run massive iterations on a large model to estimate whole images or feature maps are inefficient. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. Specifically, DiffIR has two training stages: pretraining and training the DM. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) to guide DIRformer. In the second stage, we train the DM to directly estimate the same IPR as the pretrained CPEN$_{S1}$ using only LQ images. We observe that since the IPR is only a compact vector, DiffIR can use fewer iterations than traditional DMs to obtain accurate estimations and generate more stable and realistic results. Since the iterations are few, our DiffIR can adopt a joint optimization of CPEN$_{S2}$, DIRformer, and the denoising network, which can further reduce the influence of estimation error. We conduct extensive experiments on several IR tasks and achieve SOTA performance at a lower computational cost. Code is available at https://github.com/Zj-BinXia/DiffIR.

Translation date: 2023-08-10 17:48:06 Publication date: 2023-08-08

# Long-term Forecasting with TiDE: Time-series Dense Encoder ( http://arxiv.org/abs/2304.08424v3 )

License: see the linked page

Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen and Rose Yu

Recent work has shown that simple linear models can outperform several Transformer-based approaches in long-term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve a near-optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer-based model.

Translation date: 2023-08-10 17:38:42 Publication date: 2023-08-08

# Decentralization and Acceleration Enables Large-Scale Bundle Adjustment ( http://arxiv.org/abs/2305.07026v3 )

License: see the linked page

Taosha Fan, Joseph Ortiz, Ming Hsiao, Maurizio Monge, Jing Dong, Todd Murphey, Mustafa Mukadam

Scaling to arbitrarily large bundle adjustment problems requires data and compute to be distributed across multiple devices. Centralized methods in prior works are only able to solve small or medium size problems due to overhead in computation and communication. In this paper, we present a fully decentralized method that alleviates computation and communication bottlenecks to solve arbitrarily large bundle adjustment problems. We achieve this by reformulating the reprojection error and deriving a novel surrogate function that decouples optimization variables from different devices. This function makes it possible to use majorization-minimization techniques and reduces bundle adjustment to independent optimization subproblems that can be solved in parallel. We further apply Nesterov's acceleration and adaptive restart to improve convergence while maintaining its theoretical guarantees. Despite limited peer-to-peer communication, our method has provable convergence to first-order critical points under mild conditions. On extensive benchmarks with public datasets, our method converges much faster than decentralized baselines with similar memory usage and communication load. Compared to centralized baselines using a single device, our method, while being decentralized, yields more accurate solutions with significant speedups of up to 953.7x over Ceres and 174.6x over DeepLM. Code: https://joeaortiz.github.io/daba.

Translation date: 2023-08-10 17:31:34 Publication date: 2023-08-08
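Nesterov's acceleration with adaptive restart can be shown on a toy problem. The sketch below applies the generic technique to a small ill-conditioned quadratic. It does not implement the paper's surrogate function or its decentralized updates, and the function-value restart rule used here is one common variant chosen for illustration.

```python
import math


def accelerated_descent(grad, objective, x0, step, iterations):
    """Gradient descent with Nesterov momentum and function-value restart."""
    x = list(x0)   # current iterate
    y = list(x0)   # extrapolated point where the gradient is taken
    t = 1.0        # momentum parameter
    f_prev = objective(x)
    for _ in range(iterations):
        x_new = [yi - step * gi for yi, gi in zip(y, grad(y))]
        f_new = objective(x_new)
        if f_new > f_prev:
            # Adaptive restart: momentum made things worse, so drop it and
            # take a plain gradient step from the current iterate instead.
            t = 1.0
            x_new = [xi - step * gi for xi, gi in zip(x, grad(x))]
            f_new = objective(x_new)
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t, f_prev = x_new, t_new, f_new
    return x


def objective(x):
    """Toy quadratic with condition number 100."""
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def grad(x):
    return [x[0], 100.0 * x[1]]


solution = accelerated_descent(grad, objective, x0=[10.0, 1.0], step=0.01, iterations=500)
```

The step size is the inverse of the largest curvature, so the fallback gradient step never increases the objective and the sequence of objective values is non-increasing.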

# DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors ( http://arxiv.org/abs/2305.05738v2 )

License: see the linked page

Chia-Hao Li and Niraj K. Jha

Modern advances in machine learning (ML) and wearable medical sensors (WMSs) in edge devices have enabled ML-driven disease detection for smart healthcare. Conventional ML-driven disease detection methods rely on customizing individual models for each disease and its corresponding WMS data. However, such methods lack adaptability to distribution shifts and new task classification classes. Moreover, they need to be rearchitected and retrained from scratch for each new disease. To address these challenges, we propose DOCTOR, a multi-disease detection continual learning (CL) framework based on WMSs. It employs a multi-headed deep neural network (DNN) and an exemplar-replay-style CL algorithm. The CL algorithm enables the framework to continually learn new missions where different data distributions, classification classes, and disease detection tasks are introduced sequentially. It counteracts catastrophic forgetting with a data preservation method and a synthetic data generation (SDG) module. The data preservation method efficiently preserves the most informative subset of training data from previous missions for replay. The SDG module models the probability distribution of the real training data and generates synthetic data for replays while retaining data privacy. The multi-headed DNN enables DOCTOR to detect multiple diseases simultaneously based on user WMS data. In various CL experiments, we demonstrate DOCTOR's efficacy in maintaining high disease classification accuracy with a single DNN model. DOCTOR achieves 1.43 times better average test accuracy, 1.25 times better F1-score, and 0.41 higher backward transfer than the naive fine-tuning framework, with a small model size and in complex CL scenarios.

Translation date: 2023-08-10 17:31:12 Publication date: 2023-08-08
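Backward transfer, one of the metrics reported above, measures how much accuracy on earlier tasks changes after later tasks are learned. The sketch below uses one widely used definition from the continual-learning literature. Whether the paper uses exactly this form is an assumption, and the accuracy matrix is made-up illustrative data.

```python
def backward_transfer(accuracy):
    """Average change in accuracy on earlier tasks after training on all tasks.

    accuracy[i][j] is the test accuracy on task j after training on tasks 0..i.
    Negative values indicate forgetting.
    """
    num_tasks = len(accuracy)
    last = accuracy[num_tasks - 1]
    deltas = [last[j] - accuracy[j][j] for j in range(num_tasks - 1)]
    return sum(deltas) / (num_tasks - 1)


# Hypothetical accuracies for three sequentially learned tasks.
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.90],
]
bwt = backward_transfer(acc)
```

Replay of preserved or synthetic exemplars aims to push this number toward zero or above, since a strongly negative value means earlier tasks were forgotten.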

# GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction ( http://arxiv.org/abs/2306.01951v4 )

License: see the linked page

Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets and Pan Li

Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.

Translation date: 2023-08-10 17:18:58 Publication date: 2023-08-08
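Using a per-node reconstruction loss as an anomaly score, and AUC as the metric, can be sketched without any graph library. The scores and labels below are made-up illustrative data. The function `auc` implements the standard rank-based definition (the probability that a random anomalous node scores higher than a random normal node), not code from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve from anomaly scores and binary labels (1 = anomalous)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # anomalous node correctly ranked above normal node
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical reconstruction losses for four nodes, two of which are anomalous.
reconstruction_loss = [0.9, 0.8, 0.3, 0.1]
is_anomalous = [1, 0, 1, 0]
score = auc(reconstruction_loss, is_anomalous)
```

A node whose neighborhood is poorly reconstructed receives a high loss and therefore a high rank, which is what this metric rewards.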

# Shielding collisions of ultracold CaF molecules with static electric fields ( http://arxiv.org/abs/2305.07600v2 )

License: see the linked page

Bijit Mukherjee, Matthew D. Frye, C. Ruth Le Sueur, Michael R. Tarbutt and Jeremy M. Hutson

We study collisions of ultracold CaF molecules in strong static electric fields. These fields allow the creation of long-range barriers in the interaction potential, effectively preventing the molecules from reaching the short-range region where inelastic and other loss processes are likely to occur. We carry out coupled-channel calculations of rate coefficients for elastic scattering and loss. We develop an efficient procedure for including energetically well-separated rotor functions in the basis set via a Van Vleck transformation. We show that shielding is particularly efficient for CaF and allows the rate of two-body loss processes to be reduced by a factor of $10^7$ or more at a field of 23 kV/cm. The loss rates remain low over a substantial range of fields. Electron and nuclear spins cause strong additional loss in some small ranges of field, but have little effect elsewhere. These results pave the way for evaporative cooling of CaF towards quantum degeneracy.

Translation date: 2023-08-10 17:17:37 Publication date: 2023-08-08

# AutoHint: Automatic Prompt Optimization with Hint Generation ( http://arxiv.org/abs/2307.07415v2 )

License: see the linked page

Hong Sun, Xue Li, Yinchuan Xu, Youkow Homma, Qi Cao, Min Wu, Jian Jiao, Denis Charles

This paper presents AutoHint, a novel framework for automatic prompt engineering and optimization for Large Language Models (LLMs). While LLMs have demonstrated remarkable ability in achieving high-quality annotation in various tasks, the key to applying this ability to specific tasks lies in developing high-quality prompts. Thus we propose a framework to inherit the merits of both in-context learning and zero-shot learning by incorporating enriched instructions derived from input-output demonstrations to optimize the original prompt. We refer to the enrichment as the hint and propose a framework to automatically generate the hint from labeled data. More concretely, starting from an initial prompt, our method first instructs an LLM to deduce new hints for selected samples from incorrect predictions, and then summarizes the per-sample hints and adds the result back to the initial prompt to form a new, enriched instruction. The proposed method is evaluated on the BIG-Bench Instruction Induction dataset for both zero-shot and few-shot prompts, where experiments demonstrate that our method is able to significantly boost accuracy for multiple tasks.

Translation date: 2023-08-10 17:10:50 Publication date: 2023-08-08
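The loop described above (predict, collect mistakes, deduce per-sample hints, summarize, append to the prompt) can be sketched with the model calls abstracted away. The three callables `predict`, `deduce_hint` and `summarize` are hypothetical placeholders for LLM calls, and the toy stand-ins at the bottom exist only to make the sketch runnable.

```python
def autohint(initial_prompt, labeled_samples, predict, deduce_hint, summarize):
    """Return an enriched prompt built from hints on incorrectly predicted samples."""
    wrong = [(x, y) for x, y in labeled_samples if predict(initial_prompt, x) != y]
    if not wrong:
        return initial_prompt  # nothing to learn from, keep the prompt unchanged
    hints = [deduce_hint(initial_prompt, x, y) for x, y in wrong]
    return initial_prompt + "\n" + summarize(hints)


# Toy stand-ins for the three LLM calls (assumptions, not a real model).
def fake_predict(prompt, text):
    return "positive"  # a model that always answers "positive"


def fake_deduce_hint(prompt, text, label):
    return f"'{text}' should be labeled {label}"


def fake_summarize(hints):
    return "Hint: " + "; ".join(hints)


samples = [("great movie", "positive"), ("boring plot", "negative")]
new_prompt = autohint("Classify the sentiment.", samples,
                      fake_predict, fake_deduce_hint, fake_summarize)
```

Only the misclassified sample contributes a hint here, which mirrors the abstract's choice of deriving hints from incorrect predictions.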

# Distilled Pruning: Using Synthetic Data to Win the Lottery ( http://arxiv.org/abs/2307.03364v3 )

License: see the linked page

Luke McDermott, Daniel Cummings

This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.

Translation date: 2023-08-10 17:09:16 Publication date: 2023-08-08
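Iterative magnitude pruning, the baseline named above, alternates training with removing the smallest-magnitude surviving weights. The sketch below shows that loop on a flat list of weights. The `train` callable is a hypothetical placeholder: in the distilled-data setting it would train on a small synthetic dataset, which is where the speedup would come from.

```python
def magnitude_mask(weights, mask, prune_fraction):
    """Zero out the given fraction of surviving weights with the smallest magnitude."""
    alive = sorted((abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m)
    num_to_prune = int(len(alive) * prune_fraction)
    new_mask = list(mask)
    for _, index in alive[:num_to_prune]:
        new_mask[index] = 0
    return new_mask


def iterative_pruning(init_weights, train, rounds, prune_fraction):
    """Repeat: rewind to the initial weights, train under the mask, prune by magnitude."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train(start, mask)  # hypothetical training routine
        mask = magnitude_mask(trained, mask, prune_fraction)
    return mask


# Toy example: an identity "training" step, so pruning acts on the initial weights.
weights = [0.1, -0.5, 0.05, 2.0, -0.3]
final_mask = iterative_pruning(weights, lambda w, m: w, rounds=2, prune_fraction=0.4)
```

Each round calls `train` once, so replacing full-dataset training with training on distilled data reduces the cost of every round without changing the pruning rule.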

# Actor-agnostic Multi-label Action Recognition with Multi-modal Query ( http://arxiv.org/abs/2307.10763v2 )

License: see the linked page

Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms prior actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.

Translation date: 2023-08-10 16:58:55 Publication date: 2023-08-08

# スワップ演算子の代数構造による量子マックスカットの緩和と厳密解

Relaxations and Exact Solutions to Quantum Max Cut via the Algebraic Structure of Swap Operators ( http://arxiv.org/abs/2307.15661v2 )

ライセンス: Link先を確認

Adam Bene Watts, Anirban Chowdhury, Aidan Epperly, J. William Helton, Igor Klep

(参考訳) 量子マックスカット(QMC)問題は、局所ハミルトニアン問題の近似アルゴリズムを設計するためのテスト問題として現れた。本稿では、QMCの代数構造、特に量子マックスカットハミルトニアンと対称群の表現論の関係を用いてこの問題に取り組む。この論文の最初の大きな貢献は、非可換二乗和(ncSoS)最適化手法を拡張し、量子マックスカットに対する緩和の新たな階層を与えることである。提示する階層は、キュービットスワップ作用素の多項式に対する最適化に基づいている。これは、パウリ行列の項で表される多項式に基づく「標準的な」量子Lasserre階層とは対照的である。この階層の正しさを証明するために、キュービットスワップ作用素によって生成される代数の有限表示を利用する。この表示は、スワップ演算子を使って記述された多項式を操作・簡約するために計算機代数的手法を使うことを可能にし、独立した興味を持つかもしれない。驚くべきことに、この新しい階層のレベル2は、最大8頂点のグラフ上の一様エッジ重みを持つすべてのQMCインスタンスにおいて、(許容誤差 $10^{-7}$ の範囲で)厳密である。この論文の2つ目の大きな貢献は、クリークの符号付き結合として「分解」できるグラフを含む特定のグラフに対して、QMCハミルトニアンの最大固有値を正確に計算する多項式時間アルゴリズムである。後者の特別なケースは、一様辺重みを持つ完全二部グラフであり、LiebとMattisの業績から厳密解が知られている。この手法は対称群の表現論を用いており、Lieb-Mattisの結果の一般化と見なすことができる。

The Quantum Max Cut (QMC) problem has emerged as a test-problem for designing approximation algorithms for local Hamiltonian problems. In this paper we attack this problem using the algebraic structure of QMC, in particular the relationship between the quantum max cut Hamiltonian and the representation theory of the symmetric group. The first major contribution of this paper is an extension of non-commutative Sum of Squares (ncSoS) optimization techniques to give a new hierarchy of relaxations to Quantum Max Cut. The hierarchy we present is based on optimizations over polynomials in the qubit swap operators. This is in contrast to the "standard" quantum Lasserre Hierarchy, which is based on polynomials expressed in terms of the Pauli matrices. To prove correctness of this hierarchy, we exploit a finite presentation of the algebra generated by the qubit swap operators. This presentation allows for the use of computer algebraic techniques to manipulate and simplify polynomials written in terms of the swap operators, and may be of independent interest. Surprisingly, we find that level-2 of this new hierarchy is exact (up to tolerance $10^{-7}$) on all QMC instances with uniform edge weights on graphs with at most 8 vertices. The second major contribution of this paper is a polynomial-time algorithm that exactly computes the maximum eigenvalue of the QMC Hamiltonian for certain graphs, including graphs that can be "decomposed" as a signed combination of cliques. A special case of the latter are complete bipartite graphs with uniform edge-weights, for which exact solutions are known from the work of Lieb and Mattis. Our methods, which use representation theory of the symmetric group, can be seen as a generalization of the Lieb-Mattis result.

翻訳日:2023-08-10 16:49:14 公開日:2023-08-08

# ロシアのソーシャルメディアで、ウクライナでの戦争に反対する人々と支持者:彼らは誰なのか?

Opponents and proponents of the war in Ukraine in Russian social media: who are they? ( http://arxiv.org/abs/2308.04473v1 )

ライセンス: Link先を確認

Alesya Sokolova

(参考訳) ウクライナでの戦争を支持するロシア人の人物像を理解することは、この戦争がいかにして可能になったかを理解するための重要なステップの1つである。しかし、戦時中、伝統的な社会学的手法は必ずしも適用できない。ソーシャルメディアは、人々の頭の中にあるものを知るための代替の情報源を提供する。本稿では,ウクライナでの戦争に対して賛成または反対の強い立場をとるロシアのソーシャルメディア利用者の政治的アイデンティティ,価値観,関心を比較する。ロシアで最も人気のあるソーシャルメディアプラットフォームであるVKからデータを収集し、ユーザーが購読しているグループに加えて、自己記入のプロフィール情報も分析する。その結果、戦争の支持者は、政治的アイデンティティをより正確に指定する(「リベラル」が多いが、それに限らない)反対者に比べて、より弱い政治的アイデンティティ(自らを「中道(moderate)」と称する)を持つ傾向があることがわかった。さらに、支持者の価値観は、正教や家族といったロシア政府が推進するものと一致することが多い。これらの違いにもかかわらず、戦争支持派と反戦派のユーザーは、音楽、歴史、スポーツに焦点を当てた同じグループを購読していることからわかるように、多くの共通の関心を持っている。人において最も重要な特性(VKでユーザーが記入できる項目)を尋ねられると、両グループで最も多い回答は「親切さと誠実さ」である。分析結果は、ロシアにおける世論の理解に寄与するだけでなく、ソーシャルメディアのプロフィールに基づいて戦争に対する立場を予測するために利用することができる。

Understanding the personality of Russians who support the war in Ukraine is one of the key steps to understanding how this war became possible. However, during the war, traditional sociological methods are not always applicable. Social media provides an alternative source of what is inside people's heads. In this paper, I compare the political identities, values, and interests of social media users in Russia who hold a strong position for or against the war in Ukraine. I collect data from VK, the most popular Russian social media platform, and analyze self-filled profile information as well as the groups that the users subscribed to. I found that proponents of the war tend to have a weaker political identity (self-identified as "moderate") compared to opponents, who specify it more precisely (often, but not limited to, "liberal"). Additionally, the values of the proponents more frequently align with those promoted by the Russian government, such as orthodoxy and family. Despite these differences, pro-war and anti-war users share many common interests, as evidenced by their subscriptions to the same groups focused on music, history, and sport. When asked to state the most important trait in people (a field users can fill in VK), the most frequent answer for both groups is "kindness and honesty". The analysis results, in addition to contributing to the understanding of public opinion in Russia, can be utilized for predicting one's position on the war based on their social media profile.

翻訳日:2023-08-10 16:40:41 公開日:2023-08-08

# 量子計測理論における正準占有状態(マクロ)のエントロピー

Entropy of the Canonical Occupancy (Macro) State in the Quantum Measurement Theory ( http://arxiv.org/abs/2308.04472v1 )

ライセンス: Link先を確認

Arnaldo Spalvieri

(参考訳) 本論文は, 平衡状態にある任意の数の区別できない粒子からなる系のエントロピーを解析し, エントロピーを位相空間表現ではなく, 系の量子状態の関数として定義する。我々の重要な観察は、系のエントロピーが、系の粒子に許される量子状態のランダムな占有数のシャノンエントロピーであるということである。我々は、Jaynesの最大エントロピー原理に基づく情報理論的アプローチと、現代の量子熱力学における正準典型性をもたらす経験的アプローチを考える。情報理論的アプローチでは、粒子の量子状態の占有数は多項分布に従い、経験的アプローチではその分布は多変量超幾何分布である。経験的確率のサンプル数が無限大に近づくにつれて、多変量超幾何分布は多項分布に近づく。これにより、少なくとも極限では、2つのアプローチが整合する。量子測定の観点から考えると、本解析は最大エントロピーアプローチを特徴付けるよく知られた主観主義とは別の種類の主観主義の存在を示唆する。この主観性の形態は、情報理論的アプローチと経験的アプローチの両方において、量子測定の後にエントロピーがゼロに崩壊する原因である。

The paper analyzes the entropy of a system composed of an arbitrary number of indistinguishable particles at equilibrium, defining entropy as a function of the quantum state of the system, not of its phase space representation. Our crucial observation is that the entropy of the system is the Shannon entropy of the random occupancy numbers of the quantum states allowed to the system's particles. We consider the information-theoretic approach, which is based on Jaynes' maximum entropy principle, and the empirical approach, which leads to canonical typicality in modern quantum thermodynamics. In the information-theoretic approach, the occupancy numbers of particles' quantum states are multinomially distributed, while in the empirical approach their distribution is multivariate hypergeometric. As the number of samples of the empirical probability tends to infinity, the multivariate hypergeometric distribution tends to the multinomial distribution. This reconciles, at least in the limit, the two approaches. When regarded from the perspective of quantum measurement, our analysis suggests the existence of a kind of subjectivism different from the well-known subjectivism that characterizes the maximum entropy approach. This form of subjectivity is responsible for the collapse of entropy to zero after the quantum measurement, both in the information-theoretic and in the empirical approaches.
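
The limiting statement in this abstract, that the multivariate hypergeometric distribution tends to the multinomial distribution as the number of samples grows, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the three state probabilities, the particle number, and the urn sizes are illustrative assumptions and are not taken from the paper.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(counts) when sum(counts) draws are made with replacement."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


def multivariate_hypergeom_pmf(counts, population):
    """P(counts) when sum(counts) draws are made without replacement
    from an urn holding population[i] items of type i."""
    n = sum(counts)
    total = sum(population)
    ways = prod(comb(K, c) for K, c in zip(population, counts))
    return ways / comb(total, n)


def max_abs_gap(scale, probs=(0.5, 0.3, 0.2), n=4):
    """Largest pointwise difference between the two pmfs over all
    occupancy vectors of n particles, for an urn of `scale` items."""
    population = [round(p * scale) for p in probs]
    gap = 0.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            counts = (a, b, n - a - b)
            diff = abs(multivariate_hypergeom_pmf(counts, population)
                       - multinomial_pmf(counts, probs))
            gap = max(gap, diff)
    return gap


# Hypothetical urn sizes; the gap should shrink as the urn grows.
gaps = [max_abs_gap(scale) for scale in (10, 100, 1000, 10000)]
print(gaps)
```

Under these assumptions the printed gaps decrease with the urn size, which is the finite-size picture of the limit described in the abstract.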

翻訳日:2023-08-10 16:40:15 公開日:2023-08-08

# d-score:フィルタープルーニングのためのシナプスにインスパイアされたアプローチ

D-Score: A Synapse-Inspired Approach for Filter Pruning ( http://arxiv.org/abs/2308.04470v1 )

ライセンス: Link先を確認

Doyoung Park, Jinsoo Kim, Jina Nam, Jooyoung Chang, Sang Min Park

(参考訳) 本稿では,畳み込みニューラルネットワーク(CNN)におけるフィルタプルーニングにおける重要でないフィルタのランクを決定するための新しい側面を紹介する。ヒトシナプス系では、興奮性および抑制性神経伝達物質として知られる2つの重要なチャネルがあり、ニューロンから細胞にシグナルを伝達する。神経科学的な観点から、我々はシナプスにインスパイアされたフィルタプルーニング法、すなわちDynamic Score(D-Score)を提案する。 D-Scoreはフィルタにおける正と負の重みの独立した重要性を分析し、スコアを割り当てることによってその重要性をランク付けする。全体的なスコアが低く、したがってニューラルネットワークの精度への影響が小さいフィルタをプルーニングする。 CIFAR-10 と ImageNet データセットを用いた実験結果から,精度の大幅な低下(Acc. Drop)を伴わずに FLOPs とパラメータ数を顕著に削減でき,提案手法の有効性が示された。

This paper introduces a new aspect for determining the rank of the unimportant filters for filter pruning on convolutional neural networks (CNNs). In the human synaptic system, there are two important channels known as excitatory and inhibitory neurotransmitters that transmit a signal from a neuron to a cell. Adopting the neuroscientific perspective, we propose a synapse-inspired filter pruning method, namely Dynamic Score (D-Score). D-Score analyzes the independent importance of positive and negative weights in the filters and ranks the independent importance by assigning scores. Filters having low overall scores, and thus low impact on the accuracy of neural networks, are pruned. The experimental results on CIFAR-10 and ImageNet datasets demonstrate the effectiveness of our proposed method by reducing notable amounts of FLOPs and Params without a significant accuracy drop (Acc. Drop).
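
The abstract does not give the D-Score formula, so the sketch below is only a generic illustration of the idea it describes: scoring the positive and negative weights of each filter separately and pruning the filters with the lowest overall score. The rank-sum rule, the function names, and the toy weights are all assumptions made for illustration and should not be read as the authors' method.

```python
def sign_aware_scores(filters):
    """Score each filter by ranking its positive and negative weight mass separately.

    `filters` is a list of flat weight lists, one per filter. The rank-sum
    scoring rule is an illustrative assumption, not the paper's formula.
    """
    pos_mass = [sum(w for w in f if w > 0) for f in filters]
    neg_mass = [sum(-w for w in f if w < 0) for f in filters]

    def ranks(values):
        # Rank 0 goes to the smallest mass; ties are broken by filter index.
        order = sorted(range(len(values)), key=lambda i: (values[i], i))
        result = [0] * len(values)
        for rank, idx in enumerate(order):
            result[idx] = rank
        return result

    pos_rank, neg_rank = ranks(pos_mass), ranks(neg_mass)
    return [p + n for p, n in zip(pos_rank, neg_rank)]


def filters_to_prune(filters, prune_ratio):
    """Return the indices of the lowest-scoring filters, in ascending index order."""
    scores = sign_aware_scores(filters)
    n_prune = int(len(filters) * prune_ratio)
    lowest = sorted(range(len(filters)), key=lambda i: (scores[i], i))[:n_prune]
    return sorted(lowest)


# Hypothetical toy filters, flattened to plain lists of weights.
toy_filters = [
    [0.9, -0.8, 0.7, -0.6],      # large positive and negative mass
    [0.01, -0.02, 0.0, 0.01],    # near-zero weights
    [0.5, 0.4, -0.1, 0.0],       # mostly positive
    [-0.05, 0.02, -0.03, 0.01],  # small weights
]
pruned = filters_to_prune(toy_filters, prune_ratio=0.5)
print(pruned)
```

In a real network the flattened weights would come from each convolutional filter tensor, and the pruning ratio would be chosen per layer; both choices are outside what the abstract specifies.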

翻訳日:2023-08-10 16:39:56 公開日:2023-08-08

# 深層学習ニューラルネットワークによるメディカルクレームサービスに関する考察

Correlating Medi-Claim Service by Deep Learning Neural Networks ( http://arxiv.org/abs/2308.04469v1 )

ライセンス: Link先を確認

Jayanthi Vajiram, Negha Senthil, Nean Adhith.P

(参考訳) 医療保険請求の不正は、患者、医師、診断センター、保険業者が関わる組織犯罪であり、常に監視されなければならない連鎖反応を形成する。このような不正行為は、被保険者と医療保険会社の双方の財政的成長に影響を及ぼす。畳み込みニューラルネットワークアーキテクチャを用い、回帰モデルの相関研究を通じて不正な請求を検出する。これは、異なる業者による異なる請求におけるマネーロンダリングの検出に役立つ。教師ありおよび教師なしの分類器を用いて、不正な請求と不正でない請求を検出する。

Medical insurance claim frauds are organized crimes involving patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. These kinds of frauds affect the financial growth of both insured people and health insurance companies. A Convolutional Neural Network architecture is used to detect fraudulent claims through a correlation study of regression models, which helps to detect money laundering on different claims given by different providers. Supervised and unsupervised classifiers are used to detect fraud and non-fraud claims.

翻訳日:2023-08-10 16:39:38 公開日:2023-08-08

# シーングラフを用いた3次元シーン拡散誘導

3D Scene Diffusion Guidance using Scene Graphs ( http://arxiv.org/abs/2308.04468v1 )

ライセンス: Link先を確認

Mohammad Naanaa, Katharina Schmid, Yinyu Nie

(参考訳) 高品質な3Dシーンの誘導付き合成は難しい課題である。拡散モデルは、3Dシーンを含む多様なデータの生成において有望であることが示されている。しかし、現在の手法は生成の制御をテキスト埋め込みに直接依存しており、オブジェクト間の複雑な空間的関係を組み込むことが制限されている。本稿では、シーングラフを用いた3次元シーン拡散誘導の新しい手法を提案する。シーングラフが提供する相対的な空間情報を活用するために,デノイジングネットワーク内で関係グラフ畳み込みブロックを利用する。提案手法がシーン記述と生成シーンのアライメントを大幅に改善することを示す。

Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.

翻訳日:2023-08-10 16:39:29 公開日:2023-08-08

# バックドアクリティカルレイヤの毒殺によるバックドアフェデレート学習

Backdoor Federated Learning by Poisoning Backdoor-Critical Layers ( http://arxiv.org/abs/2308.04466v1 )

ライセンス: Link先を確認

Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, and Xu Yuan

(参考訳) フェデレートラーニング(FL)は、分散デバイス間の機密データに対する機械学習トレーニングを可能にするために広くデプロイされている。しかし、FLの分散学習パラダイムと不均一性は、バックドア攻撃の攻撃面をさらに拡張する。既存のFL攻撃と防御の方法は通常、モデル全体に焦点を当てる。いずれも、モデルの脆弱性を支配する少数の層であるバックドアクリティカル(BC)層の存在を認識していない。 BC層を攻撃することは、モデル全体を攻撃することと同等の効果をもたらすが、最先端(SOTA)の防御によって検出される可能性ははるかに低い。本稿では,攻撃者の視点からBC層を同定し,検証する一般的なin-situアプローチを提案する。識別されたBC層に基づき、様々な防御戦略の下で攻撃効果とステルス性の基本的なバランスを適応的に求める新しいバックドア攻撃手法を慎重に作成する。広範な実験によって、BC層を考慮したバックドア攻撃は、悪意のあるクライアントがわずか10%でも7つのSOTA防御の下でFLにバックドアを仕込むことに成功し、最新のバックドア攻撃手法よりも優れていることが示された。

Federated learning (FL) has been widely deployed to enable machine learning training on sensitive data across distributed devices. However, the decentralized learning paradigm and heterogeneity of FL further extend the attack surface for backdoor attacks. Existing FL attack and defense methodologies typically focus on the whole model. None of them recognizes the existence of backdoor-critical (BC) layers, a small subset of layers that dominate the model vulnerabilities. Attacking the BC layers achieves equivalent effects as attacking the whole model but at a far smaller chance of being detected by state-of-the-art (SOTA) defenses. This paper proposes a general in-situ approach that identifies and verifies BC layers from the perspective of attackers. Based on the identified BC layers, we carefully craft a new backdoor attack methodology that adaptively seeks a fundamental balance between attacking effects and stealthiness under various defense strategies. Extensive experiments show that our BC layer-aware backdoor attacks can successfully backdoor FL under seven SOTA defenses with only 10% malicious clients and outperform the latest backdoor attack methods.

翻訳日:2023-08-10 16:39:22 公開日:2023-08-08

# 強化学習型筋制御器による人的バランスのキャラクタリゼーション

Characterization of Human Balance through a Reinforcement Learning-based Muscle Controller ( http://arxiv.org/abs/2308.04462v1 )

ライセンス: Link先を確認

Kübra Akbaş, Carlotta Mummolo, Xianlian Zhou

(参考訳) 身体リハビリテーション中のバランス評価は、患者の身体能力を採点するためにルーブリック型のテストバッテリーに依存することが多く、主観性につながる。いくつかの客観的なバランス評価は存在するが、全身の姿勢安定性を完全には捉えない圧力中心(COP)の追跡に限られることが多い。本研究は, 重心(COM)状態空間の利用について検討し, ヒトのバランス能力を監視するための有望な道を示す。我々は、強化学習(RL)によって訓練されたバランスコントローラと統合された筋骨格モデルを用いて、バランス能力を調べる。 RLフレームワークは、それぞれバランス回復と筋協調を司る2つの相互接続されたニューラルネットワークで構成され、参照状態の初期化、早期終了、複数のトレーニング戦略とともにPPO(Proximal Policy Optimization)を用いて訓練される。訓練されたコントローラについて、ランダムな初期COM状態(位置と速度)の空間からの回復を探索することにより、成功したバランス回復軌道を囲む最終的なBRを得る。線形倒立振子モデルによる解析的な姿勢安定性限界とBRを比較すると, 成功するCOM状態は同様の傾向を示すが, 回復可能な領域はより限定的である。さらに,BRに対する筋力低下と神経興奮遅延の影響について検討し,異なる領域におけるバランス能力の低下を明らかにした。全体として, 筋によるバランスコントローラを学習する我々のアプローチは, バランス回復限界の確立と、2足歩行系, 特にヒトにおけるバランス能力の客観的評価のための有望な新しい方法を示す。

Balance assessment during physical rehabilitation often relies on rubric-oriented battery tests to score a patient's physical capabilities, leading to subjectivity. While some objective balance assessments exist, they are often limited to tracking the center of pressure (COP), which does not fully capture the whole-body postural stability. This study explores the use of the center of mass (COM) state space and presents a promising avenue for monitoring the balance capabilities in humans. We employ a musculoskeletal model integrated with a balance controller, trained through reinforcement learning (RL), to investigate balancing capabilities. The RL framework consists of two interconnected neural networks governing balance recovery and muscle coordination respectively, trained using Proximal Policy Optimization (PPO) with reference state initialization, early termination, and multiple training strategies. By exploring recovery from random initial COM states (position and velocity) space for a trained controller, we obtain the final BR enclosing successful balance recovery trajectories. Comparing the BRs with analytical postural stability limits from a linear inverted pendulum model, we observe a similar trend in successful COM states but more limited ranges in the recoverable areas. We further investigate the effect of muscle weakness and neural excitation delay on the BRs, revealing reduced balancing capability in different regions. Overall, our approach of learning muscular balance controllers presents a promising new method for establishing balance recovery limits and objectively assessing balance capability in bipedal systems, particularly in humans.

翻訳日:2023-08-10 16:39:02 公開日:2023-08-08

# 会話型マルチモーダル感情認識におけるモーダリティとコンテキストに関する再検討

Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition ( http://arxiv.org/abs/2308.04502v1 )

ライセンス: Link先を確認

Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li

(参考訳) 対話シナリオにおいて、機械がマルチモーダルな文脈で人間の感情を理解できるようにすることは、ホットな研究テーマであり、会話におけるマルチモーダル感情分析(MM-ERC)という課題として扱われている。 MM-ERCは近年継続的に注目されており,タスク性能向上のための多種多様な手法が提案されている。既存研究の多くは MM-ERCを標準的なマルチモーダル分類問題として扱い,特徴の有用性を最大化するためにマルチモーダル特徴の分離(disentanglement)と融合を行う。しかし,MM-ERCの特徴を再考した結果,特徴のマルチモーダル性と会話の文脈化は,特徴の分離と融合の段階において同時に適切にモデル化されるべきである,と我々は主張する。本研究では、上記の知見を十分に考慮し、タスク性能のさらなる向上を目指す。一方で,特徴の分離においては,コントラスト学習手法に基づき,特徴をモダリティ空間と発話空間の両方に分離するDual-level Disentanglement Mechanism(DDM)を考案する。他方で,特徴融合の段階では,マルチモーダル統合とコンテキスト統合のために,それぞれContribution-aware Fusion Mechanism(CFM)とContext Refusion Mechanism(CRM)を提案する。これらは合わせて、マルチモーダル特徴とコンテキスト特徴の適切な統合をスケジュールする。具体的には、CFMはマルチモーダル特徴の寄与を明示的かつ動的に管理し、CRMは対話コンテキストの導入を柔軟に調整する。 2つの公開MM-ERCデータセット上で,本システムは新しい最先端性能を一貫して達成する。さらなる分析により,提案するすべての機構が,マルチモーダル特徴とコンテキスト特徴を適応的に十分活用することで,MM-ERCタスクを大いに促進することが示された。提案手法は,より広い範囲の他の対話型マルチモーダルタスクを促進する大きな可能性を秘めている。

It has been a hot research topic to enable machines to understand human emotions in multimodal contexts under dialogue scenarios, a task known as multimodal emotion analysis in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, where a diverse range of methods has been proposed for securing better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion for maximizing feature utility. Yet after revisiting the characteristic of MM-ERC, we argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we aim to push task performance further by taking full consideration of the above insights. On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM) to decouple the features into both the modality space and utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. They together schedule the proper integrations of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system achieves new state-of-the-art performance consistently. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods have great potential to facilitate a broader range of other conversational multimodal tasks.

翻訳日:2023-08-10 16:32:32 公開日:2023-08-08

# 量子部分情報分解

Quantum Partial Information Decomposition ( http://arxiv.org/abs/2308.04499v1 )

ライセンス: Link先を確認

S.J. van Enk

(参考訳) 部分情報分解 (Partial Information Decomposition, PID) は、2つの変数 $A,B$ が第3の変数 $T$ について持つ情報を、一意(unique)情報、共有(または冗長)情報、相乗的情報という別々の部分に分解する点で、シャノンの理論から一歩踏み出したものである。ここでは、これらの概念を量子的な設定でどのように定義できるかを示す。我々は、量子論的記述が有益であることが示されている量子多体系のスクランブリングに量子PIDを適用する。特に一意情報は、いわゆる三情報(tri-information)よりもスクランブリングの詳細な記述を提供する。

The Partial Information Decomposition (PID) takes one step beyond Shannon's theory in decomposing the information two variables $A,B$ possess about a third variable $T$ into distinct parts: unique, shared (or redundant) and synergistic information. Here we show how these concepts can be defined in a quantum setting. We apply a quantum PID to scrambling in quantum many-body systems, for which a quantum-theoretic description has been proven productive. Unique information in particular provides a finer description of scrambling than does the so-called tri-information.

翻訳日:2023-08-10 16:32:01 公開日:2023-08-08

# DialogRE^C+: 対話における関係抽出に共参照がどの程度役立つかを調べるためのDialogREの拡張

DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs ( http://arxiv.org/abs/2308.04498v1 )

ライセンス: Link先を確認

Yiyun Xiong, Mengwei Dai, Fei Li, Hao Fei, Bobo Li, Shengqiong Wu, Donghong Ji, Chong Teng

(参考訳) 対話テキスト中の引数ペア間の関係を識別する対話関係抽出(DRE)は、人称代名詞の頻繁な出現、すなわちエンティティと話者の共参照に大きく悩まされる。本稿では、新しいベンチマークデータセットDialogRE^C+を導入し、DREシナリオに共参照解決を導入する。高品質な共参照知識の活用により、引数関係の推論が強化されることが期待される。 DialogRE^C+データセットでは、既存のDialogREデータに基づいて、36,369個の引数メンションに対して合計5,068個の共参照チェーンを手動でアノテーションし、話者チェーン、人物チェーン、場所チェーン、組織チェーンという4種類の共参照チェーンを明示的に区別している。さらに、4つの共参照強化グラフベースDREモデルを開発し、DREタスクを改善するための効果的な共参照表現を学習する。また、我々のアノテーションに基づいて共参照解決モデルを訓練し、自動抽出された共参照チェーンの効果を評価することで、データセットの実用性と他のドメインやタスクへの可能性を示す。

Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers much from the frequent occurrence of personal pronouns, or entity and speaker coreference. This work introduces a new benchmark dataset, DialogRE^C+, introducing coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning of argument relations is expected to be enhanced. In the DialogRE^C+ dataset, we manually annotate a total of 5,068 coreference chains over 36,369 argument mentions based on the existing DialogRE data, where four different coreference chain types, namely speaker chain, person chain, location chain, and organization chain, are explicitly marked. We further develop 4 coreference-enhanced graph-based DRE models, which learn effective coreference representations for improving the DRE task. We also train a coreference resolution model based on our annotations and evaluate the effect of automatically extracted coreference chains, demonstrating the practicality of our dataset and its potential for other domains and tasks.

翻訳日:2023-08-10 16:31:49 公開日:2023-08-08

# 非エルミート準結晶中の相関粒子の相転移と凝集

Phase transitions and bunching of correlated particles in a non-Hermitian quasicrystal ( http://arxiv.org/abs/2308.04495v1 )

ライセンス: Link先を確認

Stefano Longhi

(参考訳) 非エルミート準結晶中の非相互作用粒子は、複素エネルギー平面において、点ギャップトポロジーによって特徴づけられる局在化-非局在化転移とスペクトル相転移を示す。ここでは,複素位相をもつ非整合な正弦波ポテンシャル中の有効ハバードモデルで記述される非エルミート準結晶中の2つの相互作用粒子のスペクトルおよび動的特徴について検討し,エルミート系に対応物のないいくつかの興味深い効果を明らかにする。粒子相互作用によって相関ホッピングが実効的に減少するため、ダブロン状態、すなわち束縛粒子状態は、単一粒子状態よりもスペクトル転移および局在化-非局在化転移のしきい値がはるかに低く、移動度端の出現につながる。注目すべきことに、ダブロンは寿命が長いため、最初に離れたサイトに置かれた2つの粒子は集まって結びつき、時間発展の長時間極限においてダブロン状態を形成する傾向がある。これは「非エルミート粒子バンチング」と呼べる現象である。

Non-interacting particles in non-Hermitian quasicrystals display localization-delocalization and spectral phase transitions in the complex energy plane, which can be characterized by point-gap topology. Here we investigate the spectral and dynamical features of two interacting particles in a non-Hermitian quasicrystal, described by an effective Hubbard model in an incommensurate sinusoidal potential with a complex phase, and unravel some intriguing effects without any Hermitian counterpart. Owing to the effective decrease of correlated hopping introduced by particle interaction, doublon states, i.e. bound particle states, display a much lower threshold for spectral and localization-delocalization transitions than single-particle states, leading to the emergence of mobility edges. Remarkably, since doublons display longer lifetimes, two particles initially placed in distant sites tend to bunch and stick together, forming a doublon state in the long time limit of evolution, a phenomenon that can be dubbed "non-Hermitian particle bunching".

翻訳日:2023-08-10 16:31:28 公開日:2023-08-08

# 波動関数分岐:混合状態から純粋な状態を区別できない場合

Wavefunction branching: when you can't tell pure states from mixed states ( http://arxiv.org/abs/2308.04494v1 )

ライセンス: Link先を確認

Jordan K. Taylor, Ian P. McCulloch

(参考訳) 本稿では、波動関数の「分岐」の定義を提案する。これは、時間発展の下でも、対応する混合状態と現実的には区別できない量子重ね合わせである。我々の定義は解釈からほぼ独立しており、ブランチを入れ替えるにはブランチを区別するよりもはるかに多くの局所ゲートが必要であることだけを要求する。そのような分岐分解を許す状態の例をいくつか示す。この定義の下で、ブランチ間の相対位相情報を得ようとする試みは頻繁な能動的誤り訂正なしには失敗すること、ブランチは事実上、良い誤り訂正符号の反対であること、ブランチは自然な時間発展の下で事実上離れていく一方であること、ブランチは空間的エンタングルメントを吸収する傾向があること、保存量の存在下では分岐がより強くなること、そして分岐は実効的な不可逆性を意味することを示す。多体量子状態におけるこれらの分岐分解の同定は、古典性の出現に光を当て、量子/古典境界での実験的検証のための指標を提供し、より長時間の数値的時間発展シミュレーションを可能にする可能性がある。我々は本研究を、系と環境の明確な分割がない状況への、環境誘起デコヒーレンスの基本的な考え方の一般化と考えている。

We propose a definition of wavefunction "branchings": quantum superpositions which can't be feasibly distinguished from the corresponding mixed state, even under time evolution. Our definition is largely independent of interpretations, requiring only that it takes many more local gates to swap branches than to distinguish them. We give several examples of states admitting such branch decompositions. Under our definition, we show that attempts to get relative-phase information between branches will fail without frequent active error correction, that branches are effectively the opposite of good error-correcting codes, that branches effectively only grow further apart in time under natural evolution, that branches tend to absorb spatial entanglement, that branching is stronger in the presence of conserved quantities, and that branching implies effective irreversibility. Identifying these branch decompositions in many-body quantum states could shed light on the emergence of classicality, provide a metric for experimental tests at the quantum/classical boundary, and allow for longer numerical time evolution simulations. We see this work as a generalization of the basic ideas of environmentally-induced decoherence to situations with no clear system/environment split.

翻訳日:2023-08-10 16:31:07 公開日:2023-08-08

# 単元型フォトニックコンピューティングチップによる効率的なオプション価格設定と生成逆学習

Efficient option pricing with unary-based photonic computing chip and generative adversarial learning ( http://arxiv.org/abs/2308.04493v1 )

ライセンス: Link先を確認

Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu

(参考訳) 現代の金融産業システムでは、商品の構造がますます複雑になってきており、古典的な計算能力のボトルネックが金融産業の発展を既に制限している。本稿では,量子振幅推定アルゴリズムと組み合わせて,ヨーロピアンオプションの価格付けに対するunary方式を実装したフォトニックチップを提示し、古典モンテカルロ法と比較して2次の高速化を実現する。回路は、資産価格の分布をロードするモジュール、期待ペイオフを計算するモジュール、高速化をもたらす量子振幅推定アルゴリズムを実行するモジュールの3つのモジュールで構成される。分布モジュールには、資産分布の効率的な学習とロードのために敵対的生成ネットワークが組み込まれ、市場動向を正確に捉える。この研究は金融分野のアプリケーション向けの特殊なフォトニックプロセッサの開発における一歩であり、金融サービスの効率と品質を向上させる可能性を秘めている。

In the modern financial industry system, the structure of products has become more and more complex, and the bottleneck constraint of classical computing power has already restricted the development of the financial industry. Here, we present a photonic chip that implements the unary approach to European option pricing, in combination with the quantum amplitude estimation algorithm, to achieve a quadratic speedup compared to classical Monte Carlo methods. The circuit consists of three modules: a module loading the distribution of asset prices, a module computing the expected payoff, and a module performing the quantum amplitude estimation algorithm to introduce speed-ups. In the distribution module, a generative adversarial network is embedded for efficient learning and loading of asset distributions, which precisely capture the market trends. This work is a step forward in the development of specialized photonic processors for applications in finance, with the potential to improve the efficiency and quality of financial services.
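
For context on the classical baseline named in this abstract, the sketch below prices a European call by plain Monte Carlo sampling and compares the estimate with the Black-Scholes closed form. The statistical error of such an estimate shrinks like $1/\sqrt{N}$ in the number of samples $N$, whereas amplitude estimation targets an error that shrinks like $1/N$ in the number of queries, which is the quadratic speedup the abstract refers to. The market parameters, function names, and sample count are illustrative assumptions; nothing here models the photonic chip or the GAN-based distribution loading.

```python
import math
import random


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form price of a European call under geometric Brownian motion."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity))
    d2 = d1 - sigma * math.sqrt(maturity)

    def cdf(x):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


def monte_carlo_call(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Classical Monte Carlo estimate of the same price from n_paths samples."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        terminal = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)  # call payoff at maturity
    return math.exp(-rate * maturity) * total / n_paths


# Hypothetical at-the-money option used only for illustration.
params = dict(s0=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)
exact = black_scholes_call(**params)
estimate = monte_carlo_call(n_paths=100_000, **params)
print(exact, estimate)
```

Increasing `n_paths` by a factor of 100 should reduce the typical gap between the two printed values by roughly a factor of 10, which is the scaling a quantum amplitude estimation routine aims to improve on.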

翻訳日:2023-08-10 16:30:44 公開日:2023-08-08

# アラビア語文法誤り訂正のためのChatGPT

ChatGPT for Arabic Grammatical Error Correction ( http://arxiv.org/abs/2308.04492v1 )

ライセンス: Link先を確認

Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoud, Muhammad Abdul-Mageed

(参考訳) 近年,人間の指示に従うように微調整された大規模言語モデル (LLM) は,様々な英語NLPタスクにおいて顕著な能力を示している。しかし、文法誤り訂正(GEC)タスクにおけるそれらの性能は、特に英語以外の言語では、ほとんど未解明のままである。本稿では,アラビア語の豊富な形態のために複雑となるタスクである,アラビア語 GEC における指示微調整 LLM の能力について検討する。この結果から, 各種プロンプト法と (文脈内) 少数ショット学習の併用がかなり有効であり, GPT-4 はエキスパート・プロンプトで最大 $65.49$ F$_{1}$ のスコア(我々の確立したベースラインより約 $5$ ポイント高い)を達成することが示唆された。これは低リソース環境でのLLMの可能性を強調し、モデルトレーニングに有用な合成データを生成するための実行可能なアプローチを提供する。これらの肯定的な結果にもかかわらず、指示微調整モデルは、そのサイズに関わらず、かなり小さいサイズの完全微調整モデルに比べて、著しく性能が劣ることがわかった。この格差は、LLMに大幅な改善の余地があることを浮き彫りにする。また,低リソース機械翻訳の手法に触発されて,2つの標準アラビア語ベンチマークで従来のモデルを大きく上回る、合成データを利用する手法を開発した。我々の研究は、2014年と2015年のQALBデータセットで、それぞれ $72.19\%$ と $73.26$ F$_{1}$ という新たな SoTA をアラビア語 GEC に対して設定している。

Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into the abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F$_{1}$ score under expert prompting (approximately $5$ points higher than our established baseline). This highlights the potential of LLMs in low-resource settings, offering a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes. This disparity highlights substantial room for improvement for LLMs. Inspired by methods from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively.

翻訳日:2023-08-10 16:30:28 公開日:2023-08-08

# (1+1)Dハミルトニアンハードコア格子QCDにおけるハドロン

Hadrons in (1+1)D Hamiltonian hardcore lattice QCD ( http://arxiv.org/abs/2308.04488v1 )

ライセンス: Link先を確認

Marco Rigobello, Giuseppe Magnifico, Pietro Silvi, Simone Montangero

(参考訳) 本研究では, ハードコアグルーオンを持つ(1+1)Dの2フレーバーハミルトニアン格子 QCD を, 行列積状態を用いてゼロ密度および有限密度で検討する。ゲージ冗長性が存在しない理論の定式化を導入し、ゲージ不変なテンソルネットワーク ansatz を構成する。モデルがパラメータ空間の広がった部分領域において臨界的であることを示し、少なくとも2つの異なる相を同定する。そのうちの1つは連続極限の位置を含む。各相における粒子スペクトルの一部を再構成し、エッジおよびバルクのギャップレスモードを同定する。これにより、研究したモデルが、(3+1)D QCDの既知の現象を再現しつつ、最小限の SU(3) ゲージ理論を提供することを示す。最も注目すべきは、その粒子スペクトルが荷電パイ中間子を含むことである。

We study 2-flavor Hamiltonian lattice QCD in (1+1)D with hardcore gluons, at zero and finite density, by means of matrix product states. We introduce a formulation of the theory where gauge redundancy is absent and construct a gauge invariant tensor network ansatz. We show that the model is critical in an extended subregion of parameter space and identify at least two distinct phases, one of which embeds the continuum limit location. We reconstruct a subset of the particle spectrum in each phase, identifying edge and bulk gapless modes. We thereby show that the studied model provides a minimal SU(3) gauge theory whilst reproducing known phenomena of (3+1)D QCD. Most notably, its particle spectrum features charged pions.

翻訳日:2023-08-10 16:30:00 公開日:2023-08-08

# デジタル量子コンピュータにおける基底状態準備のためのスケーラブル回路:100Qubit上のSchwinger Model Vacuum

Scalable Circuits for Preparing Ground States on Digital Quantum Computers: The Schwinger Model Vacuum on 100 Qubits ( http://arxiv.org/abs/2308.04481v1 )

ライセンス: Link先を確認

Roland C. Farrell, Marc Illa, Anthony N. Ciavarella, Martin J. Savage

(参考訳) 格子シュウィンガーモデルの真空を、IBMのEagleプロセッサ量子コンピュータの最大100量子ビット上で準備する。量子コンピュータ上でギャップのある並進不変な系の基底状態を準備する新しいアルゴリズムを提示し,Scalable Circuits ADAPT-VQE (SC-ADAPT-VQE) と呼ぶ。このアルゴリズムは、基底状態の遠く離れた領域間の相関の指数的減衰をADAPT-VQEとともに利用して、任意に大きな系にスケールできる状態準備用の量子回路を構築する。 SC-ADAPT-VQEをシュウィンガーモデルに適用し、回路深さとともに指数的に収束する精度で系統的に改善可能であることを示す。回路の構造と準備された波動関数の偏差の両方が、空間サイト数 $L$ に依存しなくなることがわかる。これにより、小規模または中規模の系を用いて決定された回路を、任意に大きな $L$ へ制御された形で外挿できる。シュウィンガーモデルの回路は、qiskitの古典シミュレータを用いて $L=14$ (28量子ビット)までの格子上で決定され、その後、IBMの127超伝導量子ビット量子コンピュータ ibm_brisbane と ibm_cusco 上で $L=50$ (100量子ビット)の真空を準備するためにスケールアップされた。演算子デコヒーレンス再正規化(Operator Decoherence Renormalization)と呼ぶ改良された誤り軽減手法を適用した後, 量子コンピュータから得られたカイラル凝縮および電荷-電荷相関関数は, 古典的な行列積状態シミュレーションとよく一致することがわかった。

The vacuum of the lattice Schwinger model is prepared on up to 100 qubits of IBM's Eagle-processor quantum computers. A new algorithm to prepare the ground state of a gapped translationally-invariant system on a quantum computer is presented, which we call Scalable Circuits ADAPT-VQE (SC-ADAPT-VQE). This algorithm uses the exponential decay of correlations between distant regions of the ground state, together with ADAPT-VQE, to construct quantum circuits for state preparation that can be scaled to arbitrarily large systems. SC-ADAPT-VQE is applied to the Schwinger model, and shown to be systematically improvable, with an accuracy that converges exponentially with circuit depth. Both the structure of the circuits and the deviations of prepared wavefunctions are found to become independent of the number of spatial sites, $L$. This allows for a controlled extrapolation of the circuits, determined using small or modest-sized systems, to arbitrarily large $L$. The circuits for the Schwinger model are determined on lattices up to $L=14$ (28 qubits) with the qiskit classical simulator, and subsequently scaled up to prepare the $L=50$ (100 qubits) vacuum on IBM's 127 superconducting-qubit quantum computers ibm_brisbane and ibm_cusco. After applying an improved error-mitigation technique, which we call Operator Decoherence Renormalization, the chiral condensate and charge-charge correlators obtained from the quantum computers are found to be in good agreement with classical Matrix Product State simulations.

翻訳日:2023-08-10 16:29:45 公開日:2023-08-08

# 10言語にわたるChatGPT 3.5を用いたコード生成の比較検討

A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages ( http://arxiv.org/abs/2308.04477v1 )

ライセンス: Link先を確認

Alessio Buscemi

(参考訳) 大規模言語モデル(LLM)は、人間の言語によく似た言語を理解・生成するために、大規模なデータセットで広範な訓練を受けた高度な人工知能(AI)システムである。これらのモデルは、複数の分野の大学試験に合格し、新しい問題に対処する機能的なコードを生成できる水準に達している。本研究は、2022年11月にOpenAIがリリースし、優れたテキスト生成・コード作成能力で広く知られるLLMであるChatGPT 3.5のコーディング能力を調査する。コードスニペット作成におけるモデルの能力を、10種類のプログラミング言語と4つのソフトウェアドメインにわたって評価する。その結果に基づき、モデルの主な予期せぬ挙動と限界を特定した。本研究は、今後の開発の余地がある領域を特定し、自動コード生成がプログラミング言語の進化と技術産業に与える影響を検討することを目的とする。

Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training using large datasets in order to understand and produce language that closely resembles that of humans. These models have reached a level of proficiency where they are capable of successfully completing university exams across several disciplines and generating functional code to handle novel problems. This research investigates the coding proficiency of ChatGPT 3.5, a LLM released by OpenAI in November 2022, which has gained significant recognition for its impressive text generating and code creation capabilities. The skill of the model in creating code snippets is evaluated across 10 various programming languages and 4 different software domains. Based on the findings derived from this research, major unexpected behaviors and limitations of the model have been identified. This study aims to identify potential areas for development and examine the ramifications of automated code generation on the evolution of programming languages and on the tech industry.

翻訳日:2023-08-10 16:29:13 公開日:2023-08-08

# テキストに先駆けて:金融関係抽出のためのエンティティ前置詞の活用

Ahead of the Text: Leveraging Entity Preposition for Financial Relation Extraction ( http://arxiv.org/abs/2308.04534v1 )

ライセンス: Link先を確認

Stefan Pasch, Dimitrios Petridis

(参考訳) ACM KDF-SIGIR 2023コンペティションにおいて、REFindと呼ばれる金融エンティティ関係のデータセットを対象に、エンティティ関係タスクに取り組んだ。最も性能の高かった解法は多段階のアプローチである。まず、与えられたエンティティをテキスト内の対応する位置に挿入した。次に、ラベル付き訓練セットを用いて、トランスフォーマーベースの言語モデルroberta-largeをテキスト分類用に微調整し、エンティティ関係を予測した。最後に、モデルが生成したありそうにない予測を特定して処理する後処理フェーズを実装した。この手法により、コンペティションの公開リーダーボードで1位を獲得した。

In the context of the ACM KDF-SIGIR 2023 competition, we undertook an entity relation task on a dataset of financial entity relations called REFind. Our top-performing solution involved a multi-step approach. Initially, we inserted the provided entities at their corresponding locations within the text. Subsequently, we fine-tuned the transformer-based language model roberta-large for text classification by utilizing a labeled training set to predict the entity relations. Lastly, we implemented a post-processing phase to identify and handle improbable predictions generated by the model. As a result of our methodology, we achieved the 1st place ranking on the competition's public leaderboard.

翻訳日:2023-08-10 16:21:37 公開日:2023-08-08

# スタイル変換による現代ペルシャカーペットマップの生成

Generating Modern Persian Carpet Map by Style-transfer ( http://arxiv.org/abs/2308.04529v1 )

ライセンス: Link先を確認

Dorsa Rahmatian, Monireh Moshavash, Mahdi Eftekhari, and Kamran Hoseinkhani

(参考訳) 今日、ディープニューラルネットワーク(DNN)の優れた性能は様々な分野で実証されている。その最も魅力的な応用の1つが芸術的なデザインの生成である。芸術作品として知られるカーペットは、家の中で最も重要な品の1つであり、世界中に多くの愛好家がいる。カーペット製作の第1段階はマップ(図案)の作成であり、これは困難で時間と費用のかかる作業である。本研究の目的は、DNNを用いて現代ペルシャカーペットマップを生成することである。この目的のために、3つの異なるDNNスタイル変換手法を提案し、相互に比較する。提案手法では、Style-Swap法で初期カーペットマップを作成し、続いて、より多様なデザインを生成するためにClip-Styler、Gatys、Style-Swapの各手法を個別に用いる。さらに、生成したカーペットマップの着色方法についても検討し、導入する。設計したマップはアンケートにより評価し、ユーザ評価の結果は生成したカーペットマップが好評であることを裏付けた。カーペットマップの作成に知的手法を用いるのは本研究が初めてであり、人手の介入を減らすことができる。提案手法は、従来の方法よりも高速に、多様なカーペットデザインを生成できる。

Today, the great performance of Deep Neural Networks(DNN) has been proven in various fields. One of its most attractive applications is to produce artistic designs. A carpet that is known as a piece of art is one of the most important items in a house, which has many enthusiasts all over the world. The first stage of producing a carpet is to prepare its map, which is a difficult, time-consuming, and expensive task. In this research work, our purpose is to use DNN for generating a Modern Persian Carpet Map. To reach this aim, three different DNN style transfer methods are proposed and compared against each other. In the proposed methods, the Style-Swap method is utilized to create the initial carpet map, and in the following, to generate more diverse designs, methods Clip-Styler, Gatys, and Style-Swap are used separately. In addition, some methods are examined and introduced for coloring the produced carpet maps. The designed maps are evaluated via the results of filled questionnaires where the outcomes of user evaluations confirm the popularity of generated carpet maps. Eventually, for the first time, intelligent methods are used in producing carpet maps, and it reduces human intervention. The proposed methods can successfully produce diverse carpet designs, and at a higher speed than traditional ways.

翻訳日:2023-08-10 16:21:24 公開日:2023-08-08

# ドメイン適応としての教師なしカモフラージュ物体セグメンテーション

Unsupervised Camouflaged Object Segmentation as Domain Adaptation ( http://arxiv.org/abs/2308.04528v1 )

ライセンス: Link先を確認

Yi Zhang, Chengyi Wu

(参考訳) 人手によるラベルが存在しないため、教師なし画像セグメンテーションのための深層学習は依然として難しい。一般的な考え方は、自己教師付きバックボーンの表現に基づいて生成した画素単位の擬似ラベルを教師信号として、セグメンテーションヘッドを訓練することである。そのため、モデルの性能は、ターゲットデータセットの分布と事前学習データセット(例えばImageNet)の分布との距離に大きく依存する。本研究では、対象物体がカモフラージュという、めったに見られない共通の属性を持つ新しいタスク、教師なしカモフラージュ物体セグメンテーション(UCOS)を検討する。予想どおり、最先端の教師なしモデルは、一般的な物体とカモフラージュ物体の性質の間にあるドメインギャップのため、UCOSへの適応に苦戦する。そこで、UCOSをソースフリーの教師なしドメイン適応タスク(UCOS-DA)として定式化する。ここでは、モデルの訓練過程全体を通じて、ソースラベルもターゲットラベルも存在しない。具体的には、ImageNetで事前学習した自己教師付きVision Transformerからなるソースモデルを定義する。一方、ターゲットドメインは、単純な線形層(すなわちターゲットモデル)とラベルなしのカモフラージュ物体からなる。次に、頑健なUCOSを実現するため、前景・背景コントラストに基づく自己敵対的ドメイン適応のパイプラインを設計する。その結果、提案するベースラインモデルは、教師ありCOSの10分の1の規模の訓練セットしか用いずに、UCOSベンチマーク上で競合する教師なしモデルよりも優れたセグメンテーション性能を達成した。

Deep learning for unsupervised image segmentation remains challenging due to the absence of human labels. The common idea is to train a segmentation head, with the supervision of pixel-wise pseudo-labels generated based on the representation of self-supervised backbones. By doing so, the model performance depends much on the distance between the distributions of target datasets and the pre-training dataset (e.g., ImageNet). In this work, we investigate a new task, namely unsupervised camouflaged object segmentation (UCOS), where the target objects own a common rarely-seen attribute, i.e., camouflage. Unsurprisingly, we find that the state-of-the-art unsupervised models struggle in adapting UCOS, due to the domain gap between the properties of generic and camouflaged objects. To this end, we formulate the UCOS as a source-free unsupervised domain adaptation task (UCOS-DA), where both source labels and target labels are absent during the whole model training process. Specifically, we define a source model consisting of self-supervised vision transformers pre-trained on ImageNet. On the other hand, the target domain includes a simple linear layer (i.e., our target model) and unlabeled camouflaged objects. We then design a pipeline for foreground-background-contrastive self-adversarial domain adaptation, to achieve robust UCOS. As a result, our baseline model achieves superior segmentation performance when compared with competing unsupervised models on the UCOS benchmark, with the training set which's scale is only one tenth of the supervised COS counterpart.

翻訳日:2023-08-10 16:21:05 公開日:2023-08-08

# 超計量輪郭マップを用いた大規模多仮説細胞追跡

Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps ( http://arxiv.org/abs/2308.04526v1 )

ライセンス: Link先を確認

Jordão Bragantini, Merlin Lange, Loïc Royer

(参考訳) 本稿では、セグメンテーション選択アプローチによる大規模3次元細胞追跡手法を述べる。提案手法は、次の2点で大規模顕微鏡データセットにおける細胞追跡に有効である。 (i)テラバイト規模の3D+tデータセットにおいて、数百万のセグメンテーションインスタンスを含む問題を解くことができる。 (ii)深層学習の有無にかかわらず競争力のある結果が得られる。深層学習には3次元のアノテーション付きデータが必要だが、蛍光顕微鏡の分野ではそうしたデータは乏しい。提案手法は、セグメンテーション仮説の階層を用いて細胞のトラックとセグメントを計算し、隣接フレーム間の重なりを最大化することで互いに素なセグメントを選択する。本手法がCell Tracking Challengeの3次元画像で最先端の結果を達成し、より高速な整数線形計画法の定式化を持つことを示す。さらに、本フレームワークは柔軟で、既製の細胞セグメンテーションモデルによるセグメンテーションをサポートし、それらをアンサンブルとして組み合わせることで追跡を改善できる。コードはhttps://github.com/royerlab/ultrackで入手できる。

In this work, we describe a method for large-scale 3D cell-tracking through a segmentation selection approach. The proposed method is effective at tracking cells across large microscopy datasets on two fronts: (i) It can solve problems containing millions of segmentation instances in terabyte-scale 3D+t datasets; (ii) It achieves competitive results with or without deep learning, which requires 3D annotated data, that is scarce in the fluorescence microscopy field. The proposed method computes cell tracks and segments using a hierarchy of segmentation hypotheses and selects disjoint segments by maximizing the overlap between adjacent frames. We show that this method achieves state-of-the-art results in 3D images from the cell tracking challenge and has a faster integer linear programming formulation. Moreover, our framework is flexible and supports segmentations from off-the-shelf cell segmentation models and can combine them into an ensemble that improves tracking. The code is available https://github.com/royerlab/ultrack.

翻訳日:2023-08-10 16:20:37 公開日:2023-08-08

# 参照誘導型DNA配列アライメントのための量子ゲートアルゴリズム

Quantum gate algorithm for reference-guided DNA sequence alignment ( http://arxiv.org/abs/2308.04525v1 )

ライセンス: Link先を確認

G. D. Varsamis, I. G. Karafyllidis, K. M. Gilkes, U. Arranz, R. Martin-Cuevas, G. Calleja, P. Dimitrakis, P. Kolovos, R. Sandaltzopoulos, H. C. Jessen, J. Wong

(参考訳) 参照誘導型のDNAシーケンシングとアライメントは、計算分子生物学における重要なプロセスである。DNAデータの量は急速に増加しており、多くの新しいゲノムがシーケンシングを待つ一方で、数百万の個人ゲノムを再シーケンシングする必要がある。ヒトゲノムは1つあたり32億塩基対を持ち、各塩基対は2ビットの情報で格納できるため、1つのヒトゲノムには64億ビット、すなわち約760MBのストレージが必要となる(National Institute of General Medical Sciences)。現在最も強力なテンソル処理ユニットでもこのDNAデータ量を扱うことはできず、計算能力の大きな飛躍が必要である。したがって、ゲノムデータ解析、特にDNA配列アライメントにおける量子コンピュータの有用性を調べることが重要である。量子コンピュータは、当初は古典システムの一部として量子アクセラレータの役割を果たす形で、DNAシーケンシングに関与すると期待される。利用可能な量子ビット数は年々増加しており、将来の量子コンピュータは古典計算システムに代わってDNAシーケンシングを実行できる可能性がある。本稿では、ゲート型量子計算でモデル化した、参照誘導型DNA配列アライメントのための新しい量子アルゴリズムを提案する。このアルゴリズムはスケーラブルで、既存の古典的なDNAシーケンシングシステムに統合でき、計算誤差を抑えるよう意図的に構成されている。量子アルゴリズムはIBM Quantumが提供する量子処理ユニットとシミュレータでテストされ、その正しさが確認された。

Reference-guided DNA sequencing and alignment is an important process in computational molecular biology. The amount of DNA data grows very fast, and many new genomes are waiting to be sequenced while millions of private genomes need to be re-sequenced. Each human genome has 3.2 B base pairs, and each one could be stored with 2 bits of information, so one human genome would take 6.4 B bits or about 760 MB of storage (National Institute of General Medical Sciences). Today most powerful tensor processing units cannot handle the volume of DNA data necessitating a major leap in computing power. It is, therefore, important to investigate the usefulness of quantum computers in genomic data analysis, especially in DNA sequence alignment. Quantum computers are expected to be involved in DNA sequencing, initially as parts of classical systems, acting as quantum accelerators. The number of available qubits is increasing annually, and future quantum computers could conduct DNA sequencing, taking the place of classical computing systems. We present a novel quantum algorithm for reference-guided DNA sequence alignment modeled with gate-based quantum computing. The algorithm is scalable, can be integrated into existing classical DNA sequencing systems and is intentionally structured to limit computational errors. The quantum algorithm has been tested using the quantum processing units and simulators provided by IBM Quantum, and its correctness has been confirmed.
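
The storage estimate in the abstract (3.2 B base pairs at 2 bits each) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the base-to-bit mapping and the function names `pack_bases` and `genome_storage` are assumptions made for this example. It suggests that the quoted "about 760 MB" corresponds to roughly 763 MiB (binary units), or 800 MB in decimal units.

```python
# Illustrative sketch: 2-bit encoding of DNA bases and the resulting storage estimate.
# The mapping below is an arbitrary choice for this example, not the paper's encoding.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a final partial chunk so every byte has the same layout.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def genome_storage(base_pairs: int, bits_per_base: int = 2) -> dict:
    """Return the raw storage needed for a genome, in several units."""
    total_bits = base_pairs * bits_per_base
    total_bytes = total_bits / 8
    return {
        "bits": total_bits,
        "megabytes": total_bytes / 10**6,  # decimal MB
        "mebibytes": total_bytes / 2**20,  # binary MiB
    }


if __name__ == "__main__":
    print(pack_bases("ACGT").hex())
    print(genome_storage(3_200_000_000))
```

Packing four bases per byte is the simplest layout that realizes the 2-bit figure; real genome formats typically add headers, ambiguity codes, and compression, which this sketch ignores.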

翻訳日:2023-08-10 16:20:19 公開日:2023-08-08

# 多様なデータ型のステガナリシスのためのディープラーニング:レビュー

Deep Learning for Diverse Data Types Steganalysis: A Review ( http://arxiv.org/abs/2308.04522v1 )

ライセンス: Link先を確認

Hamza Kheddar, Mustapha Hemis, Yassine Himeur, David Megías, Abbes Amira

(参考訳) ステガノグラフィとステガナリシスは、情報セキュリティ分野における相互に関連する2つの側面である。ステガノグラフィは通信を隠蔽しようとするのに対し、ステガナリシスは隠された通信を発見すること、さらに可能であればそこに含まれるデータを復元することを目的とする。ステガノグラフィとステガナリシスは、特に法執行機関から大きな関心を集めている。多くの国で暗号の使用が禁止または制限されているため、ステガノグラフィは、サイバー犯罪者やテロリストが、暗号化されたものも含め、犯罪の証拠を所持したまま捕まることを避けるためにしばしば利用される。したがって、隠蔽された情報を明らかにする最先端技術の知識は、違法行為を暴くうえで不可欠である。ここ数年、強力で信頼性の高いステガノグラフィおよびステガナリシス技術が文献で数多く提案されている。本レビュー論文は、デジタルメディア内の隠れた情報を検出するための深層学習に基づくステガナリシス技術を包括的に概観する。画像、音声、動画を含む、ステガナリシスにおけるあらゆる種類のカバーメディアを対象とし、最もよく使われる深層学習技術について論じる。さらに、深層転移学習(DTL)や深層強化学習(DRL)といったより高度な深層学習技術を用いて、ステガナリシスシステムの性能を向上させる方法も検討する。また、最近の研究で用いられたデータセットや評価指標を含め、この分野の最近の研究を体系的にレビューする。DTLに基づくステガナリシス手法とその各データセット上での性能についても詳細に分析する。最後に、深層学習に基づくステガナリシスの現状、課題、今後の研究の方向性について議論する。

Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis is aimed to either find them or even, if possible, recover the data they contain. Steganography and steganalysis have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being captured while in possession of incriminating evidence, even encrypted, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media. The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including data sets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different data sets. The review concludes with a discussion on the current state of deep learning-based steganalysis, challenges, and future research directions.

翻訳日:2023-08-10 16:19:56 公開日:2023-08-08

# Donkey文のためのDisCoCat

DisCoCat for Donkey Sentences ( http://arxiv.org/abs/2308.04519v1 )

ライセンス: Link先を確認

Lachlan McPheat (University College London), Daphne Wang (University College London)

(参考訳) Geachのロバ文(donkey sentence)を、意味の構成的分布モデルで構文解析する方法を示す。談話、限定詞、関係代名詞をモデル化する拡張を含む、DisCoCat(Distributional Compositional Categorical)フレームワークに関する先行研究に基づいている。ロバ文を構文解析するための型論理的な統語論を提示し、それに対して関係意味論とベクトル空間意味論の両方を定義する。

We demonstrate how to parse Geach's Donkey sentences in a compositional distributional model of meaning. We build on previous work on the DisCoCat (Distributional Compositional Categorical) framework, including extensions that model discourse, determiners, and relative pronouns. We present a type-logical syntax for parsing donkey sentences, for which we define both relational and vector space semantics.

翻訳日:2023-08-10 16:19:30 公開日:2023-08-08

# 汎用AIによるラベルなし多視点3次元歩行者検出に向けて:技術と性能解析

Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis ( http://arxiv.org/abs/2308.04515v1 )

ライセンス: Link先を確認

João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

(参考訳) 汎化可能なAIを用いて、ラベルのないターゲットシーンにおける多視点3次元歩行者検出をどのように改善できるかを明らかにする。新しいシーンへの汎化を高める方法の1つは、ターゲットデータに自動でラベルを付け、それを検出器モデルの訓練に用いることである。そこで、ターゲットデータを自動ラベル付けする2つのアプローチ、すなわち教師あり検出器を用いた擬似ラベル付けと、訓練不要の検出器(訓練なしでそのまま適用できる検出器)を用いた自動ラベル付けを検討する。自動ラベル付け手順を用いて検出器モデルを最適化するための訓練フレームワークを採用する。このフレームワークは、異なる訓練セット/モードと、複数ラウンドの自動ラベル付け戦略を含む。公開されているWILDTRACKおよびMultiviewXデータセットで分析を行う。訓練不要の検出器に基づく自動ラベル付けアプローチを用いることで、訓練不要の検出器を直接用いる場合や、既存のラベル付きソースデータセットで訓練した検出器を用いる場合よりも優れた結果が得られることを示す。WILDTRACKとMultiviewXをターゲットデータセットとした場合、既存の最良のラベルなし手法と比べて、MODAがそれぞれ約4%および約1%向上した。

We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly-available WILDTRACK and MultiviewX datasets. We show that, by using the automatic labeling approach based on an untrained detector, we can obtain superior results than directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieved a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively.

翻訳日:2023-08-10 16:19:22 公開日:2023-08-08

# MT-IceNet -- 北極海氷予測のための空間・多時間深層学習モデル

MT-IceNet -- A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting ( http://arxiv.org/abs/2308.04511v1 )

ライセンス: Link先を確認

Sahara Ali, Jianwu Wang

(参考訳) 北極温暖化増幅は、地域的にも世界的にも気候パターンを変化させ、過去数十年の間に極端気象現象の頻度と強度を高めてきた。北極温暖化増幅の本質的な部分は、衛星観測が示す前例のない海氷の減少である。季節内から季節スケールで北極海氷を正確に予測することは、根本的な課題を伴う主要な研究課題となっている。物理に基づく地球システムモデルに加えて、研究者は海氷予測に複数の統計モデルや機械学習モデルを適用してきた。海氷変動の研究におけるデータ駆動型アプローチの可能性に着目し、北極海氷密接度(SIC)を予測するための、UNetに基づく空間・多時間(MT)深層学習モデルMT-IceNetを提案する。このモデルは、スキップ接続を持つエンコーダ・デコーダ構造を用い、多時間の入力ストリームを処理して、将来の時間ステップにおける空間マップを生成する。1979年から2021年までの、NSIDCのbi-monthlyおよび月次の衛星海氷データと、ERA5再解析プロダクトの大気・海洋変数を用いて、提案モデルが画素単位のSIC予測で有望な予測性能を示し、6か月のリードタイムにおいて最先端の手法と比べて予測誤差を最大60%低減することを示す。

Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.

翻訳日:2023-08-10 16:19:05 公開日:2023-08-08

# 2粒子非エルミートハバード模型におけるスペクトル構造とダブロン解離

Spectral structure and doublon dissociation in the two-particle non-Hermitian Hubbard model ( http://arxiv.org/abs/2308.04505v1 )

ライセンス: Link先を確認

Stefano Longhi

(参考訳) 非エルミート模型における強相関系は、新たに台頭しつつある研究領域である。ここでは、格子上の単一粒子ホッピング振幅が非相反的である非エルミートハバード模型を考察し、異なる境界条件の下で、ヒルベルト空間の2粒子セクターにおけるスペクトル構造の厳密な解析的結果を与える。この解析により、純粋に非エルミート的な性質を持ち、単一粒子領域で見られる通常のシナリオから逸脱する、興味深いスペクトル的および動的な効果が明らかになる。具体的には、無限格子上のMott-Hubbardバンドが、相互作用エネルギーが臨界値を超えて増加すると、複素エネルギー平面上で開いたループから閉じたループへとスペクトル相転移を起こすことを予測する。また、格子のバルクにおけるダブロンの動的解離、すなわち2粒子束縛状態の不安定性と、2粒子が格子端に達したときのダブロン状態の突然の復活を予測する。格子のバルクで観測される粒子解離は、単一粒子状態と2粒子状態の寿命の違いから生じる非エルミートダイナミクスの明確な現れである。一方、境界におけるダブロン状態の突然の復活は、境界に依存するエネルギースペクトルを持つ非エルミート系に特有の顕著なエッジバースト型の動的効果であり、相関粒子に対しては本研究で初めて予測されるものである。

Strongly-correlated systems in non-Hermitian models are an emergent area of research. Here we consider a non-Hermitian Hubbard model, where the single-particle hopping amplitudes on the lattice are not reciprocal, and provide exact analytical results of the spectral structure in the two-particle sector of Hilbert space under different boundary conditions. The analysis unveils some interesting spectral and dynamical effects of purely non-Hermitian nature and that deviate from the usual scenario found in the single-particle regime. Specifically, we predict a spectral phase transition of the Mott-Hubbard band on the infinite lattice as the interaction energy is increased above a critical value, from an open to a closed loop in complex energy plane, and the dynamical dissociation of doublons, i.e. instability of two-particle bound states, in the bulk of the lattice, with a sudden revival of the doublon state when the two particles reach the lattice edge. Particle dissociation observed in the bulk of the lattice is a clear manifestation of non-Hermitian dynamics arising from the different lifetimes of single-particle and two-particle states, whereas the sudden revival of the doublon state at the boundaries is a striking burst edge dynamical effect peculiar to non-Hermitian systems with boundary-dependent energy spectra, here predicted for the first time for correlated particles.

翻訳日:2023-08-10 16:18:39 公開日:2023-08-08

# FakeからReal(FFR)へ:合成データで疑似相関を緩和するための2段階トレーニングパイプライン

From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data ( http://arxiv.org/abs/2308.04553v1 )

ライセンス: Link先を確認

Maan Qraitem, Kate Saenko, Bryan A. Plummer

(参考訳) 視覚認識モデルは、特定のグループ(例えば女性)が特定のクラス(例えばプログラマ)で過小に表現されている不均衡な訓練セットに起因する疑似相関を学習しやすい。生成モデルは、少数派サンプルの合成データを生成して訓練セットのバランスをとることで、このバイアスを緩和する有望な方向性を提供する。しかし、こうしたアプローチを用いる先行研究は、視覚認識モデルが実画像と合成画像の区別を学習してしまうことが多く、その結果、元のデータセットのバイアスを解消できない可能性があることを見落としている。本研究では、この問題を緩和する新しい2段階パイプラインを提案する。 1)バランスのとれた合成データセットでモデルを事前訓練し、 2)実データで微調整する。このパイプラインでは、実データと合成データを同時に用いた訓練を避けることで、実データと合成データの間のバイアスを回避する。さらに、第1段階でバイアスに対して頑健な特徴を学習し、それが第2段階でのバイアスを緩和する。また、本パイプラインは既存のバイアス緩和手法と自然に統合でき、それらを微調整段階にそのまま適用できる。実験により、本パイプラインがバイアス緩和手法の性能をさらに向上させ、3つの大規模データセットで最先端の性能を達成することを示す。

Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set where certain groups (e.g., Females) are under-represented in certain classes (e.g., Programmers). Generative models offer a promising direction in mitigating this bias by generating synthetic data for the minority samples and thus balancing the training set. However, prior work that uses these approaches overlooks that visual recognition models could often learn to differentiate between real and synthetic images and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue where 1) we pre-train a model on a balanced synthetic dataset and then 2) fine-tune on the real data. Using this pipeline, we avoid training on both real and synthetic data, thus avoiding the bias between real and synthetic data. Moreover, we learn robust features against the bias in the first step that mitigate the bias in the second step. Moreover, our pipeline naturally integrates with bias mitigation methods; they can be simply applied to the fine-tuning step. As our experiments prove, our pipeline can further improve the performance of bias mitigation methods obtaining state-of-the-art performance on three large-scale datasets.

翻訳日:2023-08-10 16:12:46 公開日:2023-08-08

# 自己教師付き事前学習のみを用いたノイズラベル下での医用画像分類の改善

Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining ( http://arxiv.org/abs/2308.04551v1 )

ライセンス: Link先を確認

Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian A. Linte

(参考訳) ノイズラベルは、モデルがノイズに過適合して劣化した特徴抽出器を学習してしまうため、深層学習に基づく教師あり画像分類の性能を損なう。ノイズラベル付きデータを用いた自然画像分類の訓練では、対照学習による自己教師付き事前学習済み重みでモデルを初期化すると、特徴の劣化が減り、分類性能が向上することが示されている。しかし、次の点を調べた研究はない。 i)プレテキストタスクに基づく事前学習など、他の自己教師付きアプローチがノイズラベル下の学習に与える影響、 ii)ノイズラベル設定の医用画像に対して、自己教師付き事前学習手法を単独で用いた場合の効果。医用画像は、データセットが小さく、クラス間の差異が微妙であることが多く、正確な分類には人間の専門知識が必要となる。したがって、CIFARのような自然画像データセットでノイズラベル下の学習を改善する手法が、医用画像でも有効かどうかは明らかでない。本研究では、人為的にノイズラベルを付与した2つの医用データセット(NCT-CRC-HE-100Kの組織学的画像とCOVID-QU-Exの胸部X線画像)を対象に、深層学習分類モデルの重みを初期化するための、対照学習およびプレテキストタスクに基づく自己教師付き事前学習を検討する。その結果、自己教師付き学習で得た事前学習済み重みで初期化したモデルは、より良い特徴を効果的に学習し、ノイズラベルに対する頑健性を向上できることが示された。

Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact the learning with noisy label, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.

翻訳日:2023-08-10 16:12:27 公開日:2023-08-08

# 意味認識時間蓄積による時空間トークンの枝刈り

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation ( http://arxiv.org/abs/2308.04549v1 )

ライセンス: Link先を確認

Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian

(参考訳) Transformerは、その優れた性能により、コンピュータビジョン分野の主要なバックボーンとなっている。しかし、大きな計算コストが、動画認識領域におけるその可能性を妨げている。速度と精度のトレードオフを最適化するため、時空間トークンを統合的に枝刈りする意味認識時間蓄積スコア(Semantic-aware Temporal Accumulation score、STA)を提案する。STAスコアは、時間的冗長性と意味的重要性という2つの重要な要因を考慮する。前者は、連続するフレーム間のトークン同士の類似度を集約することで、ある領域が新たに出現したものか既出の実体かを表す。後者は、全体の予測への寄与に基づいて各トークンを評価する。その結果、STAスコアの高いトークンは、時間的冗長性が高く意味的重要性が低いため、枝刈りされる。STAスコアに基づき、追加のパラメータを導入することも再訓練を行うこともなく、トークンを段階的に枝刈りできる。既製のViTおよびVideoSwinバックボーンにSTAモジュールを直接適用した結果、Kinetics-400とSomething-Something V2において、約0.2%というわずかな精度低下で30%以上の計算量削減を達成した。コードはhttps://github.com/Mark12Ding/STAで公開されている。

Transformers have become the primary backbone of the computer vision community due to their impressive performance. However, the unfriendly computation cost impedes their potential in the video recognition domain. To optimize the speed-accuracy trade-off, we propose Semantic-aware Temporal Accumulation score (STA) to prune spatio-temporal tokens integrally. STA score considers two critical factors: temporal redundancy and semantic importance. The former depicts a specific region based on whether it is a new occurrence or a seen entity by aggregating token-to-token similarity in consecutive frames while the latter evaluates each token based on its contribution to the overall prediction. As a result, tokens with higher scores of STA carry more temporal redundancy as well as lower semantics thus being pruned. Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training. We directly apply the STA module to off-the-shelf ViT and VideoSwin backbones, and the empirical results on Kinetics-400 and Something-Something V2 achieve over 30% computation reduction with a negligible ~0.2% accuracy drop. The code is released at https://github.com/Mark12Ding/STA.
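
The idea of scoring tokens by temporal redundancy and semantic importance can be illustrated with a toy example. The sketch below is not the authors' STA formula, which the abstract does not spell out. It assumes a simplified score (best cosine similarity to any previous-frame token, minus a given importance value) and prunes the highest-scoring tokens. The function names and the `keep_ratio` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def redundancy_scores(prev_tokens, cur_tokens):
    """Redundancy of each current token: its best match among previous-frame tokens."""
    if not prev_tokens:
        return [0.0] * len(cur_tokens)  # first frame: nothing has been seen yet
    return [max(cosine(t, p) for p in prev_tokens) for t in cur_tokens]


def prune_frame(prev_tokens, cur_tokens, importance, keep_ratio=0.5):
    """Return the sorted indices of tokens to keep in the current frame.

    Toy score (an assumption, not the paper's definition):
    redundancy minus importance; the highest-scoring tokens are pruned.
    """
    red = redundancy_scores(prev_tokens, cur_tokens)
    score = [r - s for r, s in zip(red, importance)]
    n_keep = max(1, round(keep_ratio * len(cur_tokens)))
    order = sorted(range(len(cur_tokens)), key=lambda i: score[i])  # low score first
    return sorted(order[:n_keep])


if __name__ == "__main__":
    prev = [[1.0, 0.0], [0.0, 1.0]]
    cur = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
    importance = [0.1, 0.5, 0.9, 0.2]  # e.g. attention-derived weights, given as input
    print(prune_frame(prev, cur, importance))
```

In this toy run, token 0 repeats a previous-frame token and has low importance, so it is pruned first, while token 2 survives despite being redundant because its importance is high. Like the approach described in the abstract, the sketch needs no learned parameters or re-training.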

翻訳日:2023-08-10 16:12:07 公開日:2023-08-08

# イジングマシンを用いた化学反応ネットワークにおける最適経路の探索

Finding Optimal Pathways in Chemical Reaction Networks Using Ising Machines ( http://arxiv.org/abs/2308.04544v1 )

ライセンス: Link先を確認

Yuta Mizuno and Tamiki Komatsuzaki

(参考訳) 化学反応ネットワークにおける最適経路の探索は、化学プロセスの解明と設計に不可欠であり、合成計画や代謝経路解析などの重要な応用がある。このような化学経路探索問題は、与えられたネットワーク内で出発物質と目的物質を結ぶ化学反応の最適な組み合わせを見つけることを目的とした、制約付き組合せ最適化問題として定式化できる。組合せ爆発のため、最適経路の探索に必要な計算時間はネットワークサイズに対して指数関数的に増加する。量子アニーリング装置やシミュレーテッドアニーリング装置を含むイジングマシンは、こうした困難な組合せ最適化に特化した有望な新型計算機である。しかし、我々の知る限り、化学経路探索問題にイジングマシンを適用する試みはまだない。本稿では、化学経路探索問題に対するイジング/量子計算の最初の応用を示す。化学経路探索問題から変換されたイジング模型は、制約違反に対する数種類のペナルティ項を含む。種類の異なるペナルティの強度をどのように適切に設定すべきかは自明ではない。この課題に対処するため、パラメータチューニングにベイズ最適化を用いる。さらに、基礎となる問題構造に応じてペナルティ項をグループ化することでチューニング性能を高める新しい手法を導入する。提案アルゴリズムの性能評価と分析は、D-Wave Advantageシステムとシミュレーテッドアニーリングを用いて行った。ベンチマーク結果から、厳密な最適経路を見つけることの難しさが明らかになった。同時に、コスト値にある程度の相対誤差を許容すれば、近似的な最適経路を見つけることは可能であることも示された。

Finding optimal pathways in chemical reaction networks is essential for elucidating and designing chemical processes, with significant applications such as synthesis planning and metabolic pathway analysis. Such a chemical pathway-finding problem can be formulated as a constrained combinatorial optimization problem, aiming to find an optimal combination of chemical reactions connecting starting materials to target materials in a given network. Due to combinatorial explosion, the computation time required to find an optimal pathway increases exponentially with the network size. Ising machines, including quantum and simulated annealing devices, are promising novel computers dedicated to such hard combinatorial optimization. However, to the best of our knowledge, there has yet to be an attempt to apply Ising machines to chemical pathway-finding problems. In this article, we present the first Ising/quantum computing application for chemical pathway-finding problems. The Ising model, translated from a chemical pathway-finding problem, involves several types of penalty terms for violating constraints. It is not obvious how to set appropriate penalty strengths of different types. To address this challenge, we employ Bayesian optimization for parameter tuning. Furthermore, we introduce a novel technique that enhances tuning performance by grouping penalty terms according to the underlying problem structure. The performance evaluation and analysis of the proposed algorithm were conducted using a D-Wave Advantage system and simulated annealing. The benchmark results reveal challenges in finding exact optimal pathways. Concurrently, the results indicate the feasibility of finding approximate optimal pathways, provided that a certain degree of relative error in cost value is acceptable.
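
To see why penalty strengths matter when constraints are folded into an Ising-type objective, consider a toy instance. The sketch below is a hypothetical four-reaction example, not the paper's formulation: the costs, the two "choose exactly one reaction" groups, and all function names are assumptions. It uses binary (QUBO) variables, which map to Ising spins through x = (1 + s)/2, and brute-force search stands in for an Ising machine.

```python
from itertools import product

COSTS = [2.0, 3.0, 1.0, 4.0]  # hypothetical reaction costs
GROUPS = [(0, 1), (2, 3)]     # exactly one reaction per group must be chosen


def energy(x, penalties):
    """Total cost plus quadratic penalties lam * (sum of x_i in the group - 1)^2."""
    e = sum(c * xi for c, xi in zip(COSTS, x))
    for lam, group in zip(penalties, GROUPS):
        e += lam * (sum(x[i] for i in group) - 1) ** 2
    return e


def ground_state(penalties):
    """Brute-force minimiser over all 2^4 assignments of the toy instance."""
    return min(product((0, 1), repeat=len(COSTS)), key=lambda x: energy(x, penalties))


def is_feasible(x):
    """True if every group selects exactly one reaction."""
    return all(sum(x[i] for i in group) == 1 for group in GROUPS)


if __name__ == "__main__":
    for penalties in [(0.5, 0.5), (5.0, 0.5), (5.0, 5.0)]:
        x = ground_state(penalties)
        print(penalties, x, energy(x, penalties), is_feasible(x))
```

With weak penalties the lowest-energy state selects no reaction at all, and with one weak penalty it violates only that constraint. Only when both strengths exceed the cost they guard does the ground state become the feasible cheapest pathway. Larger penalties are not free on real hardware, since they compress the energy scale of the actual cost terms, which is one plausible reason the strengths need tuning rather than being set arbitrarily high.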

翻訳日:2023-08-10 16:11:43 公開日:2023-08-08

# フォトニック量子極端学習機における実験的特性再構成

Experimental property-reconstruction in a photonic quantum extreme learning machine ( http://arxiv.org/abs/2308.04543v1 )

ライセンス: Link先を確認

Alessia Suprano, Danilo Zia, Luca Innocenti, Salvatore Lorenzo, Valeria Cimini, Taira Giordani, Ivan Palmisano, Emanuele Polino, Nicolò Spagnolo, Fabio Sciarrino, G. Massimo Palma, Alessandro Ferraro and Mauro Paternostro

(参考訳) 近年の進展により、量子状態の性質の特徴付けを含む重要な問題に対処するために、機械学習ツールを実験プラットフォームに組み込むことが可能になった。これを活用し、フォトニックプラットフォーム上に量子エクストリームラーニングマシンを実装して、光子の偏光状態を資源効率よく正確に特徴付ける。入力状態が発展する基盤となるリザバーのダイナミクスは、高次元の光軌道角運動量におけるコイン付き量子ウォークと、固定基底での射影測定によって実装される。未知の偏光状態の再構成が、測定装置の入念な特徴付けを必要とせず、実験上の不完全性に対して頑健であることを示す。これは、資源を節約した状態特徴付けに向けた有望な道筋となる。

Recent developments have led to the possibility of embedding machine learning tools into experimental platforms to address key problems, including the characterization of the properties of quantum states. Leveraging on this, we implement a quantum extreme learning machine in a photonic platform to achieve resource-efficient and accurate characterization of the polarization state of a photon. The underlying reservoir dynamics through which such input state evolves is implemented using the coined quantum walk of high-dimensional photonic orbital angular momentum, and performing projective measurements over a fixed basis. We demonstrate how the reconstruction of an unknown polarization state does not need a careful characterization of the measurement apparatus and is robust to experimental imperfections, thus representing a promising route for resource-economic state characterisation.

翻訳日:2023-08-10 16:11:20 公開日:2023-08-08

# YUDO: 一様な有向オブジェクト検出のためのYOLO

YUDO: YOLO for Uniform Directed Object Detection ( http://arxiv.org/abs/2308.04542v1 )

ライセンス: Link先を確認

Đorđe Nedeljković

(参考訳) 本稿では,その中心座標と方向角を予測し,有向物体を効率的に検出する手法を提案する。対象物のサイズは一様であるため,提案モデルは対象物の幅や高さを予測せずに動作する。この問題に使用されるデータセットは、Honeybee Segmentation and Tracking Datasetsプロジェクトで紹介されている。この研究の貢献の1つは、位置や方向を検出するためにYoloV7のような標準的なリアルタイムオブジェクト検出アーキテクチャをカスタマイズできるかどうかの検討である。このアプローチでは、非常に効率的で小さなバージョンのアーキテクチャが使用される。さらに、このタスクにはアンカーを用いない3つの検出ヘッドのうち1つだけで十分である。また, 回転ボックスに対するSkew Intersection over Union (SkewIoU) 計算を拡張し、絶対角度差を含むdirected IoU (DirIoU) を導入する。 DirIoUは、mAP計算のためのターゲットと予測バウンディングボックスのマッチング手順と、NMSフィルタリング手順の両方で使用される。コードとモデルは https://github.com/djordjened92/yudo で入手できる。

This paper presents an efficient way of detecting directed objects by predicting their center coordinates and direction angle. Since the objects are of uniform size, the proposed model works without predicting the object's width and height. The dataset used for this problem is presented in Honeybee Segmentation and Tracking Datasets project. One of the contributions of this work is an examination of the ability of the standard real-time object detection architecture like YoloV7 to be customized for position and direction detection. A very efficient, tiny version of the architecture is used in this approach. Moreover, only one of three detection heads without anchors is sufficient for this task. We also introduce the extended Skew Intersection over Union (SkewIoU) calculation for rotated boxes - directed IoU (DirIoU), which includes an absolute angle difference. DirIoU is used both in the matching procedure of target and predicted bounding boxes for mAP calculation, and in the NMS filtering procedure. The code and models are available at https://github.com/djordjened92/yudo.

翻訳日:2023-08-10 16:11:05 公開日:2023-08-08

# ナノビーム中のシリコンt中心からの高効率単一光子放出

High-efficiency single photon emission from a silicon T-center in a nanobeam ( http://arxiv.org/abs/2308.04541v1 )

ライセンス: Link先を確認

Chang-Min Lee, Fariba Islam, Samuel Harper, Mustafa Atabey Buyukkaya, Daniel Higginbottom, Stephanie Simmons, Edo Waks

(参考訳) Siのカラーセンターは、全シリコンプラットフォームで長いコヒーレンス時間を持つ効率的な量子エミッタと量子メモリの両方として機能する可能性がある。様々な既知の色中心の中で、T中心は長いコヒーレンス時間を持つスピン基底状態を持つため、特定の約束を持っている。しかし、この色中心は長い励起状態の寿命を示し、光子放出速度が低く、高効率で光子放出を抽出する方法が必要となる。ナノビームを用いた単一T中心からの高効率単一光子放出を示す。ナノビームは、レンズファイバとよくマッチするモードにおいて効率的に光を放射し、t中心放射の70%以上を単一モードファイバに直接集めることができる。この効率により、T中心からのコヒーレントな放出を表すゼロフォノン線からの単一光子放出を直接示すことができる。この結果は、量子コンピューティングと量子ネットワークのためのシリコン集積スピン光子インタフェースへの重要な一歩である。

Color centers in Si could serve as both efficient quantum emitters and quantum memories with long coherence times in an all-silicon platform. Of the various known color centers, the T center holds particular promise because it possesses a spin ground state that has long coherence times. But this color center exhibits a long excited state lifetime which results in a low photon emission rate, requiring methods to extract photon emission with high efficiency. We demonstrate high-efficiency single photon emission from a single T center using a nanobeam. The nanobeam efficiently radiates light in a mode that is well-matched to a lensed fiber, enabling us to collect over 70% of the T center emission directly into a single mode fiber. This efficiency enables us to directly demonstrate single photon emission from the zero phonon line, which represents the coherent emission from the T center. Our results represent an important step towards silicon-integrated spin-photon interfaces for quantum computing and quantum networks.

翻訳日:2023-08-10 16:10:46 公開日:2023-08-08

# バイオインスパイアされたアーキテクチャを用いた連続学習タスクの性能向上

Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures ( http://arxiv.org/abs/2308.04539v1 )

ライセンス: Link先を確認

Sandeep Madireddy, Angel Yanguas-Gil, Prasanna Balaprakash

(参考訳) 破滅的な忘れることなく、入ってくるデータストリームから継続的に学習する能力は、インテリジェントなシステムを設計する上で重要である。継続的学習のための多くのアプローチは、確率的勾配降下とそのグローバルエラー更新を用いた変種に依存しているため、安定性、強欲、短期的なメモリ制限を回避するために、メモリバッファやリプレイのような戦略を採用する必要がある。この制限に対処するために,我々は,シナプス可塑性機構とニューロモジュレーションを組み込んだ,生物学的にインスパイアされた軽量ニューラルネットワークアーキテクチャを開発した。提案手法は,スプリット-MNIST,スプリット-CIFAR-10,スプリット-CIFAR-100データセットのオンライン連続学習性能を,他のメモリ制約学習手法と比較し,最先端のメモリ集約リプレイ方式と一致させる。さらに,鍵設計概念を他のバックプロパゲーションに基づく連続学習アルゴリズムに統合し,その精度を大幅に向上させることにより,提案手法の有効性を実証する。我々の結果は、生物学的原則を機械学習モデルに取り入れることの重要性を証明し、オンライン連続学習のためのより効率的で堅牢なシステムの設計にそれらをどのように活用できるかについての洞察を提供する。

The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.

翻訳日:2023-08-10 16:10:31 公開日:2023-08-08

# マイクロ表現生成のための顔優先1次運動モデル

Facial Prior Based First Order Motion Model for Micro-expression Generation ( http://arxiv.org/abs/2308.04536v1 )

ライセンス: Link先を確認

Yi Zhang, Youjun Zhao, Yuhang Wen, Zixuan Tang, Xinhua Xu, Mengyuan Liu

(参考訳) ビデオから顔のマイクロ表現を見つけることは、臨床診断や尋問などの分野で様々な応用が考えられるが、トレーニングデータの規模が限られているため、この課題はまだ難しい。そこで本研究では,マイクロ表現生成と呼ばれる新しいタスクを定式化し,一次運動モデル(first order motion model)と顔の事前知識を組み合わせた強力なベースラインを提示する。対象の顔が与えられた場合、ソースビデオの動きパターンに応じて顔を駆動し、マイクロ表現ビデオを生成する。具体的には、新しいモデルは3つのモジュールを含む。第一に,領域集中モジュールから顔の事前特徴を抽出する。第二に,動き予測モジュールにより,キーポイントと局所アフィン変換を用いて顔の動きを推定する。第三に、表情生成モジュールがターゲットの顔を駆動してビデオを生成する。公開されているCASME II、SAMM、SMICデータセットでモデルを訓練し、そのモデルを使って評価用の新しいマイクロ表現ビデオを生成する。本モデルはFacial Micro-Expression Challenge 2021 (MEGC2021)で第1位を獲得し,その優れた性能はFacial Action Coding Systemの認定を受けた3人の専門家によって検証された。ソースコードは https://github.com/Necolizer/Facial-Prior-Based-FOMM で公開されている。

Spotting facial micro-expression from videos finds various potential applications in fields including clinical diagnosis and interrogation, meanwhile this task is still difficult due to the limited scale of training data. To solve this problem, this paper tries to formulate a new task called micro-expression generation and then presents a strong baseline which combines the first order motion model with facial prior knowledge. Given a target face, we intend to drive the face to generate micro-expression videos according to the motion patterns of source videos. Specifically, our new model involves three modules. First, we extract facial prior features from a region focusing module. Second, we estimate facial motion using key points and local affine transformations with a motion prediction module. Third, expression generation module is used to drive the target face to generate videos. We train our model on public CASME II, SAMM and SMIC datasets and then use the model to generate new micro-expression videos for evaluation. Our model achieves the first place in the Facial Micro-Expression Challenge 2021 (MEGC2021), where our superior performance is verified by three experts with Facial Action Coding System certification. Source code is provided in https://github.com/Necolizer/Facial-Prior-Based-FOMM.

翻訳日:2023-08-10 16:10:02 公開日:2023-08-08

# 航空ドローン画像を用いた災害現場のヒューマンコンディションの推定

Estimation of Human Condition at Disaster Site Using Aerial Drone Images ( http://arxiv.org/abs/2308.04535v1 )

ライセンス: Link先を確認

Tomoki Arai, Kenji Iwata, Kensho Hara, Yutaka Satoh

(参考訳) ドローンはさまざまな災害の状況を評価するために使われています。本研究では,災害現場の把握を迅速かつ省力化するために,航空ドローン画像の動作に基づいて,被害状況を自動的に推定する手法について検討する。都市部で発生した仮説的災害における人的行動の航空画像データセットを構築し,3D ResNetを用いて人的被害状況の分類を行った。その結果、人間の行動に特徴的な状態はリコール率80%以上で分類できるが、同様の行動を持つ他の状態はリコール率約50%でしか分類できないことが分かった。さらに、クラウドベースのvrプレゼンテーションアプリケーションは、ドローンを使って災害現場を理解し、人間の状態を推定することの有効性を示唆した。

Drones are being used to assess the situation in various disasters. In this study, we investigate a method to automatically estimate the damage status of people based on their actions in aerial drone images in order to understand disaster sites faster and save labor. We constructed a new dataset of aerial images of human actions in a hypothetical disaster that occurred in an urban area, and classified the human damage status using 3D ResNet. The results showed that the status with characteristic human actions could be classified with a recall rate of more than 80%, while other statuses with similar human actions could only be classified with a recall rate of about 50%. In addition, a cloud-based VR presentation application suggested the effectiveness of using drones to understand the disaster site and estimate the human condition.

翻訳日:2023-08-10 16:09:28 公開日:2023-08-08

# テンポラル・ディノ:アクション予測を強化する自己監督型ビデオ戦略

Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction ( http://arxiv.org/abs/2308.04589v1 )

ライセンス: Link先を確認

Izzeddin Teeti, Rongali Sai Bhargav, Vivek Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin

(参考訳) 行動予測の分野は、自律運転、アクティビティ分析、人間とコンピュータの相互作用など、様々なコンピュータビジョンアプリケーションにおいて重要な役割を果たす。大幅な進歩にもかかわらず、ビデオデータに固有の高次元性、複雑なダイナミクス、不確実性のために、将来の行動を正確に予測することは難しい問題である。従来の教師付きアプローチでは大量のラベル付きデータが必要であり、その取得には費用と時間がかかる。本稿では,DINO (self-distillation with no labels) にインスパイアされた、行動予測を強化するための新たな自己教師型ビデオ戦略を提案する。Temporal-DINOのアプローチでは、過去のフレームを処理する「学生」と、過去と将来のフレームの両方を処理する「教師」の2つのモデルを用いることで、より広い時間的コンテキストを実現する。訓練中、教師は、学生が過去のフレームだけを観察して将来の文脈を学ぶよう導く。この戦略は3D-ResNet, Transformer, LSTMアーキテクチャを用いて, アクション予測下流タスクのためのROADデータセット上で評価される。実験の結果、これらのアーキテクチャ全体で予測性能が大幅に向上し、提案手法は平均9.9% Precision Points (PP) の向上を達成しており、長期的依存関係を捕捉するバックボーンの能力向上に有効であることを示す。さらに,本手法は,事前学習データセットのサイズと必要エポック数の面で効率的である。この方法は、様々なバックボーンアーキテクチャを考慮し、複数の予測ホライズンに対処し、手作りのデータ拡張への依存を減らし、事前学習プロセスを単一のステージに合理化することを含め、他のアプローチにおける制限を克服する。これらの結果は,行動認識,運動計画,シーン理解など,多様な映像ベースタスクにおける本アプローチの可能性を強調している。

The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The Temporal-DINO approach employs two models; a 'student' processing past frames; and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context by only observing past frames. The strategy is evaluated on ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. The experimental results showcase significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), highlighting its effectiveness in enhancing the backbones' capabilities of capturing long-term dependencies. Furthermore, our approach demonstrates efficiency regarding the pretraining dataset size and the number of epochs required. This method overcomes limitations present in other approaches, including considering various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.

翻訳日:2023-08-10 16:02:03 公開日:2023-08-08
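The teacher/student setup described above can be illustrated with a DINO-style distillation loss. This is only a sketch under stated assumptions: the logits are made-up numbers standing in for network outputs, the temperatures (0.1 and 0.04) are illustrative defaults, and the centering and momentum-update steps used in DINO-style training are omitted.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - peak - log_total for v in scaled]


def distillation_loss(student_logits, teacher_logits, student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student distribution."""
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))


# Hypothetical outputs: the teacher saw past and future frames, the students only past ones.
teacher = [2.0, 0.5, 0.1]
student_aligned = [1.8, 0.4, 0.2]   # student that already agrees with the teacher
student_off = [0.1, 0.5, 2.0]       # student that disagrees

loss_aligned = distillation_loss(student_aligned, teacher)
loss_off = distillation_loss(student_off, teacher)
```

Minimizing this loss with respect to the student only (the teacher is treated as fixed) pushes a student that sees past frames toward the output of a teacher that also sees future frames.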

# ScatterUQ:マルチクラスディープラーニング問題に対する対話型不確実性可視化

ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems ( http://arxiv.org/abs/2308.04588v1 )

ライセンス: Link先を確認

Harry Li, Steven Jorgensen, John Holodnak and Allan Wollaber

(参考訳) 近年,マルチクラスラベリング問題に対する不確実性を考慮したディープラーニング手法が開発され,校正されたクラス予測確率と分布外(OOD)指標を提供することで,機械学習(ML)の利用者やエンジニアがモデルの予測に対する信頼度を評価できるようになった。しかし、この追加のニューラルネットワーク予測情報を、複数の不確実性の文脈において任意のデータソースに対してスケーラブルに視覚的に伝えることは困難である。これらの課題に対処するために、ユーザがコンテキスト駆動の不確実性設定におけるモデルパフォーマンスをよりよく理解できるように、ターゲットを絞った可視化を提供するインタラクティブシステムであるScatterUQを提案する。 ScatterUQは、距離対応ニューラルネットワークの最近の進歩を次元削減技術とともに活用し、モデルがテスト例を(1)分布内かつ特定のクラスである、(2)分布内だがクラスが不確かである、(3)分布外である、と予測した理由を説明する頑健な2次元散布図を構築する。 MLの利用者とエンジニアは、「hover callback」を使用してテストサンプルの顕著な特徴をトレーニング例と視覚的に比較し、モデルの不確実性に関する性能を理解して、その後の対応方針を決定できる。我々は、Fashion-MNISTで訓練され、Fashion-MNIST(分布内)およびMNIST数字(分布外)でテストされた距離対応ニューラルネットワーク上のマルチクラス画像分類と、サイバーデータセット向けのディープラーニングモデルについて、モデルの不確実性を説明するScatterUQの有効性を実証する。文脈駆動型UQ可視化を最適化するために,次元削減手法を定量的に評価する。以上の結果から,ScatterUQシステムは任意のマルチクラスデータセットにスケールすることが示唆された。私たちのコードは https://github.com/mit-ll-responsible-ai/equine-webapp で利用可能です。

Recently, uncertainty-aware deep learning methods for multiclass labeling problems have been developed that provide calibrated class prediction probabilities and out-of-distribution (OOD) indicators, letting machine learning (ML) consumers and engineers gauge a model's confidence in its predictions. However, this extra neural network prediction information is challenging to scalably convey visually for arbitrary data sources under multiple uncertainty contexts. To address these challenges, we present ScatterUQ, an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings. ScatterUQ leverages recent advances in distance-aware neural networks, together with dimensionality reduction techniques, to construct robust, 2-D scatter plots explaining why a model predicts a test example to be (1) in-distribution and of a particular class, (2) in-distribution but unsure of the class, and (3) out-of-distribution. ML consumers and engineers can visually compare the salient features of test samples with training examples through the use of a "hover callback" to understand model uncertainty performance and decide follow up courses of action. We demonstrate the effectiveness of ScatterUQ to explain model uncertainty for a multiclass image classification on a distance-aware neural network trained on Fashion-MNIST and tested on Fashion-MNIST (in distribution) and MNIST digits (out of distribution), as well as a deep learning model for a cyber dataset. We quantitatively evaluate dimensionality reduction techniques to optimize our contextually driven UQ visualizations. Our results indicate that the ScatterUQ system should scale to arbitrary, multiclass datasets. Our code is available at https://github.com/mit-ll-responsible-ai/equine-webapp

翻訳日:2023-08-10 16:01:35 公開日:2023-08-08
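The three prediction contexts listed in the abstract can be pictured as a simple decision rule over a model's class probabilities and OOD score. The sketch below is an assumption for illustration only: the function name, the thresholds and the label strings are hypothetical and are not taken from the ScatterUQ code base.

```python
def explain_prediction(class_probs, ood_score, ood_threshold=0.5, confidence_threshold=0.7):
    """Map a prediction to one of the three uncertainty contexts.

    class_probs: calibrated class probabilities for one test example.
    ood_score:   out-of-distribution indicator in [0, 1] (higher = more likely OOD).
    Both thresholds are illustrative placeholders, not values from the paper.
    """
    if ood_score >= ood_threshold:
        return "out-of-distribution"
    if max(class_probs) >= confidence_threshold:
        return "in-distribution, confident class"
    return "in-distribution, class uncertain"


confident = explain_prediction([0.90, 0.05, 0.05], ood_score=0.1)
unsure = explain_prediction([0.45, 0.40, 0.15], ood_score=0.1)
outlier = explain_prediction([0.90, 0.05, 0.05], ood_score=0.9)
```

A visualization layer could then pick a different scatter-plot view for each returned context.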

# AIの発達的ブートストラップ

Developmental Bootstrapping of AIs ( http://arxiv.org/abs/2308.04586v1 )

ライセンス: Link先を確認

Mark Stefik and Robert Price

(参考訳) 現在のAIの中には、特にボードゲームのようなクローズドな世界では人間の能力を上回っているものもあるが、乱雑な現実世界でのパフォーマンスは限られている。彼らは奇妙な間違いを犯し、それに気づかない。簡単には指示できないし、常識を使わず、好奇心を欠いている。良い協力者にはならない。従来の手作業によるシンボリックAIアプローチを使用して構築されたシステムも、大規模言語モデル(LLM)を含む生成的およびディープラーニングAIアプローチを使用して構築されたシステムも、その課題を満たすことができない。堅牢で信頼できるAIを作るには向いていない。メインストリームのAIアプローチの外にあるが、発達的ブートストラップは有望だ。発達的ブートストラップでは、AIは人間の子供のように能力を発達させる。彼らは生得的な能力から始まる。人間と同様に、彼らは環境と相互作用し、相互作用から学ぶ。彼らは自ら発達させた能力で生得的な能力を徐々に拡張する。彼らは人々と対話して学び、知覚的、認知的、共通の基盤を確立する。ブートストラッププロセスを経て、必要な能力を獲得する。しかし、発達ロボット工学はまだ大人レベルの頑健な能力を持つAIを生み出していない。プロジェクトは通常、発話が流暢になる前の約2歳の幼児の発達に対応するToddler Barrierで止まっている。また、LLMの原動力である、社会的に蓄積された膨大な記録情報資源を巧みに、かつ懐疑的に活用できるようになるReading Barrierも越えていない。人間の認知発達における次の能力は、内発的動機づけ、模倣学習、想像、協調、コミュニケーションである。本稿では,堅牢でレジリエントなAIを作成するために発達的ブートストラップの実践を拡張するための,論理,展望,ギャップ,課題を概説する。

Although some current AIs surpass human abilities especially in closed worlds such as board games, their performance in the messy real world is limited. They make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. They do not make good collaborators. Neither systems built using the traditional manually-constructed symbolic AI approach nor systems built using generative and deep learning AI approaches including large language models (LLMs) can meet the challenges. They are not well suited for creating robust and trustworthy AIs. Although it is outside of mainstream AI approaches, developmental bootstrapping shows promise. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. Like humans, they interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. Following a bootstrapping process, they acquire the competences that they need. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped at the Toddler Barrier corresponding to human infant development at about two years of age, before speech is fluent. They also do not bridge the Reading Barrier, where they can skillfully and skeptically tap into the vast socially developed recorded information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to create robust and resilient AIs.

翻訳日:2023-08-10 16:00:59 公開日:2023-08-08

# 決定論的交絡のためのカーネル単一プロキシ制御

Kernel Single Proxy Control for Deterministic Confounding ( http://arxiv.org/abs/2308.04585v1 )

ライセンス: Link先を確認

Liyuan Xu, Arthur Gretton

(参考訳) 本研究では,未観測の交絡因子が存在する場合の因果効果推定の問題を考察し,交絡因子に関連するプロキシ変数が観測される状況を扱う。 Proxy Causal Learning (PCL)は2つのプロキシ変数を用いて真の因果効果を回復するが、結果が決定論的に生成される場合には、単一のプロキシ変数で因果推定に十分であることを示し、Control Outcome Calibration Approach (COCA)を一般化する。この設定に対して,2段階回帰アプローチに基づく手法と最大モーメント制限アプローチに基づく手法の2つのカーネルベース手法を提案する。いずれのアプローチも因果効果を一致推定できることを証明し,合成データセット上で因果効果を正しく回復できることを実証的に示す。

We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder. Although Proxy Causal Learning (PCL) uses two proxy variables to recover the true causal effect, we show that a single proxy variable is sufficient for causal estimation if the outcome is generated deterministically, generalizing Control Outcome Calibration Approach (COCA). We propose two kernel-based methods for this setting: the first based on the two-stage regression approach, and the second based on a maximum moment restriction approach. We prove that both approaches can consistently estimate the causal effect, and we empirically demonstrate that we can successfully recover the causal effect on a synthetic dataset.

翻訳日:2023-08-10 16:00:35 公開日:2023-08-08

# LATR:トランスを用いた単眼画像からの3次元レーン検出

LATR: 3D Lane Detection from Monocular Images with Transformer ( http://arxiv.org/abs/2308.04583v1 )

ライセンス: Link先を確認

Yueru Luo, Chaoda Zheng, Xu Yan, Tang Kun, Chao Zheng, Shuguang Cui, Zhen Li

(参考訳) 単眼画像からの3次元車線検出は、自動運転の基本的な課題である。最近の進歩は主に、フロントビューの画像特徴とカメラパラメータから構築された構造的な3dサロゲート(鳥の目視など)に依存している。しかし, 単眼画像の奥行きの曖昧さは, 構築したサロゲート特徴写像と原画像との相違を必然的に引き起こし, 正確な車線検出には大きな課題となる。上記の課題に対処するため, 3D 対応のフロントビュー特徴を用いた3次元レーン検出システムである LATR モデルを提案する。具体的には、LATRはクエリとキーと値のペアに基づいて3次元レーンを検出し、車線対応クエリジェネレータと動的3次元地上位置埋め込みを用いて構築する。一方、各クエリは2dレーン認識機能に基づいて生成され、レーン情報を強化するためにハイブリッド組込みを採用する。一方、3D空間情報は、反復的に更新された3D地上面から位置埋め込みとして注入される。 LATRは、合成アポロと現実的なOpenLaneの両方の最先端の手法を大きなマージンで上回る(例えば、OpenLaneのF1スコアの11.4ゲイン)。コードはhttps://github.com/JMoonr/LATRでリリースされる。

3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) that are built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance the lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on both synthetic Apollo and realistic OpenLane by large margins (e.g., 11.4 gains in terms of F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR.

翻訳日:2023-08-10 16:00:19 公開日:2023-08-08

# RECipe:マルチモーダルレシピ知識グラフは多目的推薦システムに適合しているか?

RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? ( http://arxiv.org/abs/2308.04579v1 )

ライセンス: Link先を確認

Ali Pesaranghader, Touqir Sajed

(参考訳) 過去20年間、レコメンデーションシステム(RS)は、機械学習(ML)ソリューションを使用して、例えば映画、本、レストランなどのアイテムを、企業の顧客やオンラインプラットフォームに推奨してきた。しかし、レシピレコメンデーションは、これらのアプリケーションと比べてあまり注目されていない。マルチモーダル知識グラフ(MMKG)をバックボーンとした多目的レシピレコメンデーションフレームワークとしてRECipeを導入する。 RECipeの背後にあるモチベーションは、自然言語でのクエリやイメージの提供によって、ユーザにレシピを推奨することで、(ディープ)ニューラルコラボレーティブフィルタリング(NCF)を越えていくことである。 RECipeは,(1)行動ベースレコメンデータ,(2)レビューベースレコメンデータ,(3)画像ベースレコメンデータの3つのサブシステムから構成される。各サブシステムは、グラフ内のエンティティと関係の埋め込み表現に依存している。まず、MicrosoftのMPNetの微調整モデルから、レビューや材料などのテキストエンティティの(事前訓練された)埋め込み表現を得る。これらの埋め込みでエンティティの重みを初期化し、知識グラフ埋め込み(KGE)モデルをトレーニングします。視覚成分,すなわちレシピ画像に対して,kge誘導変分オートエンコーダ(kg-vae)を開発し,画像の分布と潜在表現を学習する。 KGEとKG-VAEモデルを完全にトレーニングすると、多目的レコメンデーションフレームワークとして使用します。ベンチマークのために、レシピレコメンデーションのためにKaggle上の公開データセットから2つのナレッジグラフ(KG)を作成しました。実験の結果,KGEモデルはニューラルソリューションに匹敵する性能を示した。また,新しいユーザに対するゼロショット推論(あるいはコールドスタート問題)やレシピカテゴリに対する条件付き推奨など,重要な応用に対処するための事前学習NLP埋め込みを提案する。最終的に、多目的レコメンデーション設定におけるRECipeの適用を実証する。

Over the past two decades, recommendation systems (RSs) have used machine learning (ML) solutions to recommend items, e.g., movies, books, and restaurants, to clients of a business or an online platform. Recipe recommendation, however, has not yet received much attention compared to those applications. We introduce RECipe as a multi-purpose recipe recommendation framework with a multi-modal knowledge graph (MMKG) backbone. The motivation behind RECipe is to go beyond (deep) neural collaborative filtering (NCF) by recommending recipes to users when they query in natural language or by providing an image. RECipe consists of 3 subsystems: (1) behavior-based recommender, (2) review-based recommender, and (3) image-based recommender. Each subsystem relies on the embedding representations of entities and relations in the graph. We first obtain (pre-trained) embedding representations of textual entities, such as reviews or ingredients, from a fine-tuned model of Microsoft's MPNet. We initialize the weights of the entities with these embeddings to train our knowledge graph embedding (KGE) model. For the visual component, i.e., recipe images, we develop a KGE-Guided variational autoencoder (KG-VAE) to learn the distribution of images and their latent representations. Once KGE and KG-VAE models are fully trained, we use them as a multi-purpose recommendation framework. For benchmarking, we created two knowledge graphs (KGs) from public datasets on Kaggle for recipe recommendation. Our experiments show that the KGE models have comparable performance to the neural solutions. We also present pre-trained NLP embeddings to address important applications such as zero-shot inference for new users (or the cold start problem) and conditional recommendation with respect to recipe categories. We eventually demonstrate the application of RECipe in a multi-purpose recommendation setting.

翻訳日:2023-08-10 15:59:53 公開日:2023-08-08

# Pairwise User Preferencesによるアルゴリズムの最適化

Optimizing Algorithms From Pairwise User Preferences ( http://arxiv.org/abs/2308.04571v1 )

ライセンス: Link先を確認

Leonid Keselman, Katherine Shih, Martial Hebert, Aaron Steinfeld

(参考訳) ロボット工学における典型的なブラックボックス最適化アプローチは、メトリクススコアからの学習に焦点を当てている。しかし、すべての開発者が真実を理解できるわけではないので、必ずしもそれが可能であるとは限らない。人間中心のコンテキストで適切なロボットの振る舞いを学ぶには、多くの場合、正確なメトリクススコアを提供できないユーザーをクエリする必要がある。既存のアプローチでは、暗黙の報酬関数をモデル化するために人間のフィードバックを利用するが、この報酬を効果的に捕獲することは困難または不可能である。本研究では,ペアワイズユーザの好みに基づいてアルゴリズムパラメータを高次元に最適化するSortCMAを提案する。 SortCMAは、報酬を直接モデル化することなく、ユーザー入力を利用してパラメータセットを見つける。本手法は,地上の真理を示さずに市販の深度センサをチューニングし,ロボットの行動よりも複雑な嗜好を伴うロボット社会ナビゲーションに適用する。提案手法は,ユーザの目標を最適化し,ユーザ調査を行い,ソーシャルナビゲーションの結果を評価することに成功している。

Typical black-box optimization approaches in robotics focus on learning from metric scores. However, that is not always possible, as not all developers have ground truth available. Learning appropriate robot behavior in human-centric contexts often requires querying users, who typically cannot provide precise metric scores. Existing approaches leverage human feedback in an attempt to model an implicit reward function; however, this reward may be difficult or impossible to effectively capture. In this work, we introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences. SortCMA efficiently and robustly leverages user input to find parameter sets without directly modeling a reward. We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation, which involves highly complex preferences over robot behavior. We show that our method succeeds in optimizing for the user's goals and perform a user study to evaluate social navigation results.

翻訳日:2023-08-10 15:59:19 公開日:2023-08-08
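The idea of optimizing from pairwise preferences instead of metric scores can be sketched with a rank-based evolution strategy, since such strategies need only an ordering of the candidates. This is a deliberately simplified stand-in and not the SortCMA algorithm itself: there is no covariance adaptation, the step-size schedule is a fixed decay, and the "user" is simulated by a hidden target vector (all assumptions for illustration).

```python
import random
from functools import cmp_to_key


def prefer(a, b, target=(0.3, -0.7)):
    """Simulated user comparison: -1 if a is preferred to b, 1 if b is, 0 on a tie.

    A real user would answer by inspection; the hidden target stands in for them
    and is never exposed to the optimizer as a numeric score.
    """
    da = sum((ai - ti) ** 2 for ai, ti in zip(a, target))
    db = sum((bi - ti) ** 2 for bi, ti in zip(b, target))
    return -1 if da < db else (1 if da > db else 0)


def optimize_from_preferences(dim=2, population=12, elite=4, generations=40, seed=0):
    """Rank-based search that only ever sees pairwise comparisons."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    sigma = 1.0
    for _ in range(generations):
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        # Sorting with a comparator turns pairwise preferences into a ranking.
        ranked = sorted(candidates, key=cmp_to_key(prefer))
        best = ranked[:elite]
        mean = [sum(c[i] for c in best) / elite for i in range(dim)]
        sigma *= 0.9  # fixed decay instead of CMA-style adaptation
    return mean


result = optimize_from_preferences()
distance = sum((r - t) ** 2 for r, t in zip(result, (0.3, -0.7))) ** 0.5
```

A comparison-based sort asks on the order of n log n questions per generation, which suggests why the number of queries put to a human matters for methods of this kind.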

# single-sentence reader : 回答位置バイアスに対する新しいアプローチ

Single-Sentence Reader: A Novel Approach for Addressing Answer Position Bias ( http://arxiv.org/abs/2308.04566v1 )

ライセンス: Link先を確認

Son Quoc Tran and Matt Kretchmar

(参考訳) Machine Reading Comprehension (MRC)モデルは、擬似相関(研究コミュニティではデータセットバイアスやアノテーションアーティファクトとしても知られる)を利用する傾向がある。その結果、これらのモデルは与えられたコンテキストと質問を完全に理解することなくMRCタスクを実行する可能性があり、分布シフトに対するロバスト性が低くなりうるため望ましくない。本論文は, 訓練用の質問のかなりの割合で回答が文脈の第一文のみに位置するという、回答位置バイアスの概念を掘り下げる。 MRCにおける回答位置バイアスに対処するための新しいアプローチとして,Single-Sentence Readerを提案する。このアプローチを6つの異なるモデルを用いて実装し、その性能を徹底的に分析する。注目すべきことに,提案するSingle-Sentence Readerは,従来のトレーニングセットで訓練されたモデルとほぼ一致する結果を達成し,その有効性を示している。本研究は,Single-Sentence Readerが直面するいくつかの課題についても考察し,潜在的な解決策を提案する。

Machine Reading Comprehension (MRC) models tend to take advantage of spurious correlations (also known as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable since it may result in low robustness against distribution shift. This paper delves into the concept of answer-position bias, where a significant percentage of training questions have answers located solely in the first sentence of the context. We propose a Single-Sentence Reader as a new approach for addressing answer position bias in MRC. We implement this approach using six different models and thoroughly analyze their performance. Remarkably, our proposed Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, proving their effectiveness. Our study also discusses several challenges our Single-Sentence Readers encounter and proposes a potential solution.

翻訳日:2023-08-10 15:59:03 公開日:2023-08-08
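Answer-position bias, as described above, can be measured on a dataset before any model is trained. The sketch below assumes SQuAD-style records with hypothetical `context` and `answer_start` fields and uses a naive punctuation-based sentence split; it illustrates the measurement only and is not the authors' code.

```python
import re


def first_sentence_end(context):
    """Character index just past the first sentence (naive split on . ! ?)."""
    match = re.search(r"[.!?](\s|$)", context)
    return match.end() if match else len(context)


def first_sentence_answer_rate(examples):
    """Fraction of examples whose answer span starts inside the first sentence."""
    hits = sum(1 for ex in examples
               if ex["answer_start"] < first_sentence_end(ex["context"]))
    return hits / len(examples)


examples = [
    # Answer "Paris" is in the first sentence.
    {"context": "Paris is the capital of France. It lies on the Seine.", "answer_start": 0},
    # Answer "Seine" is in the second sentence.
    {"context": "Paris is the capital of France. It lies on the Seine.", "answer_start": 47},
    # Answer "Nile" is in the first sentence.
    {"context": "The Nile flows north. It ends in Egypt.", "answer_start": 4},
]
rate = first_sentence_answer_rate(examples)
```

A rate close to 1 on a training split would indicate the kind of skew the abstract describes.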

# スペクトル正則化カーネル適合度検定

Spectral Regularized Kernel Goodness-of-Fit Tests ( http://arxiv.org/abs/2308.04561v1 )

ライセンス: Link先を確認

Omar Hagrass, Bharath K. Sriperumbudur, Bing Li

(参考訳) Maximum mean discrepancy (MMD)は非ユークリッドデータを扱う能力があるため、ノンパラメトリック仮説検定を含む多くの機械学習や統計の応用で大きな成功を収めている。近年、Balasubramanian et al. (2021) において、MMDに基づく適合度検定はミニマックス最適ではないが、そのTikhonov正則化版は正則化パラメータを適切に選択すればミニマックス最適であることが示された。しかし、Balasubramanian et al. (2021) の結果は、平均元が0であるという制限的な仮定と、積分作用素の固有関数に対する一様有界性条件の下で得られている。さらに、Balasubramanian et al. (2021) で提案された検定は、多くのカーネルで計算できないため実用的ではない。本稿では,これらの欠点に対処し,Tikhonov正則化を含む一般のスペクトル正則化に結果を拡張する。

Maximum mean discrepancy (MMD) has enjoyed a lot of success in many machine learning and statistical applications, including non-parametric hypothesis testing, because of its ability to handle non-Euclidean data. Recently, it has been demonstrated in Balasubramanian et al. (2021) that the goodness-of-fit test based on MMD is not minimax optimal while a Tikhonov regularized version of it is, for an appropriate choice of the regularization parameter. However, the results in Balasubramanian et al. (2021) are obtained under the restrictive assumptions of the mean element being zero, and the uniform boundedness condition on the eigenfunctions of the integral operator. Moreover, the test proposed in Balasubramanian et al. (2021) is not practical as it is not computable for many kernels. In this paper, we address these shortcomings and extend the results to general spectral regularizers that include Tikhonov regularization.

翻訳日:2023-08-10 15:58:48 公開日:2023-08-08
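For readers unfamiliar with the statistic that the abstract builds on, here is a minimal sketch of the plain (unregularized) squared MMD with a Gaussian kernel, estimated from samples. It does not implement the spectral regularized test; the bandwidth, sample sizes and the use of reference draws from the null distribution are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        return sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]  # draws from the null distribution P
same = [rng.gauss(0.0, 1.0) for _ in range(200)]       # data that really follows P
shifted = [rng.gauss(1.5, 1.0) for _ in range(200)]    # data from a shifted alternative

mmd_same = mmd_squared(same, reference)
mmd_shifted = mmd_squared(shifted, reference)
```

A goodness-of-fit test would reject the null when the statistic exceeds a threshold calibrated under P; how that threshold is chosen, and the regularization itself, are outside this sketch.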

# FocalFormer3D : 3Dオブジェクト検出のためのハードインスタンスに着目して

FocalFormer3D : Focusing on Hard Instance for 3D Object Detection ( http://arxiv.org/abs/2308.04556v1 )

ライセンス: Link先を確認

Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

(参考訳) 3Dオブジェクト検出における偽陰性(FN)、例えば歩行者、車両、その他の障害物の予測漏れは、自動運転において潜在的に危険な状況につながる可能性がある。致命的でありながら、この問題は現在の多くの3D検出手法では十分に研究されていない。本研究では,多段階的にFNを識別し、モデルが難しいインスタンスの発掘に集中するよう導く一般的なパイプラインであるHard Instance Probing (HIP)を提案する。 3次元物体検出のために,この手法をFocalFormer3Dとしてインスタンス化する。これは、難しい物体の発掘と予測の再現率向上に優れた、単純かつ効果的な検出器である。 FocalFormer3Dは、ハードオブジェクトを見つけるためのマルチステージクエリ生成と、膨大なオブジェクト候補からオブジェクトを効率的に区別するボックスレベルのトランスフォーマーデコーダを備えている。 nuScenesとWaymoデータセットの実験結果は、FocalFormer3Dの優れた性能を検証する。この利点は、LiDARとマルチモーダル設定の両方において、検出とトラッキングの両方で強力なパフォーマンスをもたらす。 FocalFormer3D は nuScenes 検出ベンチマークで 70.5 mAP と 73.9 NDS を達成し、nuScenes 追跡ベンチマークでは 72.1 AMOTA を示し、どちらも nuScenes LiDAR リーダーボードで1位となった。私たちのコードは https://github.com/NVlabs/FocalFormer3D で利用可能です。

False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves a 70.5 mAP and 73.9 NDS on nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.

翻訳日:2023-08-10 15:58:32 公開日:2023-08-08

# PSRFlow:科学データのためのフローベースモデルによる確率的超解像

PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data ( http://arxiv.org/abs/2308.04605v1 )

ライセンス: Link先を確認

Jingyi Shen and Han-Wei Shen

(参考訳) 近年,多くの深層学習に基づく超解法が提案されているが,推論段階では基礎的な真理が得られていないため,超解答結果の誤りや不確実性を定量化できるものはほとんどない。しかし、科学的視覚化の応用においては、結果の不確かさを科学者に伝えることは、誤った情報や誤った情報の発生を避けるために不可欠である。本稿では,不確かさの定量化を超解像プロセスに組み込んだ,科学データ超解像のための新しい正規化フロー型生成モデルpsrflowを提案する。 PSRFlowは低解像度データに基づいて高解像度データの条件分布を学習する。高解像度データの欠落情報をキャプチャするガウス潜在空間からサンプリングすることにより、異なる可視超解出力を生成することができる。ガウス潜在空間における効率的なサンプリングにより、超解結果に対する不確実な定量化を行うことができる。モデルトレーニング中、様々なスケールのサンプルでトレーニングデータを増強し、異なるスケールのデータに適応できるようにし、与えられた入力に対して柔軟な超解像を実現する。この結果は,補間やGANに基づく超解像ネットワークなどの既存手法と比較して,優れた性能とロバストな不確実性定量化を示す。

Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.
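
The sampling-based uncertainty estimate described above can be illustrated with a toy sketch. It draws several Gaussian latent vectors, decodes each into a high-resolution candidate, and reports the per-point mean and standard deviation. The `toy_decoder` function is a hypothetical stand-in for a trained inverse flow and is not the PSRFlow model; the names, the 2x upsampling, and the noise scale are all assumptions.

```python
import random
import statistics

def toy_decoder(low_res, z):
    # Hypothetical stand-in for a trained conditional flow (NOT PSRFlow itself):
    # upsample by 2x via repetition, then add latent-dependent detail.
    upsampled = [v for v in low_res for _ in range(2)]
    return [u + 0.1 * zi for u, zi in zip(upsampled, z)]

def sample_super_resolution(low_res, num_samples, seed=0):
    """Draw Gaussian latents, decode each one, and summarize the spread per point."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(2 * len(low_res))]
        samples.append(toy_decoder(low_res, z))
    # zip(*samples) groups the values that all samples produced for the same point.
    mean = [statistics.mean(vals) for vals in zip(*samples)]
    std = [statistics.stdev(vals) for vals in zip(*samples)]
    return mean, std

mean, std = sample_super_resolution([1.0, 2.0], num_samples=50)
```

Here `std` plays the role of a per-point uncertainty map: points where the decoded samples disagree more receive larger values.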

翻訳日:2023-08-10 15:52:35 公開日:2023-08-08

# 分散型連合学習に関する調査研究

A Survey on Decentralized Federated Learning ( http://arxiv.org/abs/2308.04604v1 )

ライセンス: Link先を確認

Edoardo Gabrielli, Giovanni Pica, Gabriele Tolomei

(参考訳) 近年、フェデレーテッド・ラーニング(FL)は、分散、大規模、プライバシ保護機械学習(ML)システムのトレーニングにおいて、非常に一般的なパラダイムとなっている。トレーニングが行われる正確な場所でデータを収集しなければならない標準的なMLとは対照的に、FLは数百万のエッジデバイスの計算能力を活用して、ローカルのプライベートデータを開示することなく、共有グローバルモデルを協調的にトレーニングする。具体的には、典型的なflシステムでは、中央サーバはオーケストレータとしてのみ動作し、各クライアントがトレーニングしたすべてのローカルモデルを、収束するまでそのプライベートデータ上で反復的に収集し集約する。 FLは間違いなく従来のMLよりもいくつかの利点がある(例えば、設計によるプライベートデータ所有権を保護する)が、いくつかの弱点に悩まされている。最も重要な課題の1つは、単一障害リスクや中間者攻撃に弱いことが知られている古典的なFLクライアントサーバアーキテクチャの集中的なオーケストレーションを克服することである。このような露出を軽減するために、すべてのFLクライアントが中央サーバなしで協調して通信する分散FLソリューションが登場した。この調査は、文献で提案されている既存の分散FLアプローチを包括的に要約し、レビューする。さらに、新たな課題を特定し、この未調査領域における有望な研究方向性を提案する。

In recent years, federated learning (FL) has become a very popular paradigm for training distributed, large-scale, and privacy-preserving machine learning (ML) systems. In contrast to standard ML, where data must be collected at the exact location where training is performed, FL takes advantage of the computational capabilities of millions of edge devices to collaboratively train a shared, global model without disclosing their local private data. Specifically, in a typical FL system, the central server acts only as an orchestrator; it iteratively gathers and aggregates all the local models trained by each client on its private data until convergence. Although FL undoubtedly has several benefits over traditional ML (e.g., it protects private data ownership by design), it suffers from several weaknesses. One of the most critical challenges is to overcome the centralized orchestration of the classical FL client-server architecture, which is known to be vulnerable to single-point-of-failure risks and man-in-the-middle attacks, among others. To mitigate such exposure, decentralized FL solutions have emerged where all FL clients cooperate and communicate without a central server. This survey comprehensively summarizes and reviews existing decentralized FL approaches proposed in the literature. Furthermore, it identifies emerging challenges and suggests promising research directions in this under-explored domain.
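
The contrast between server-side aggregation and serverless cooperation can be sketched as follows. The weighted average is a FedAvg-style rule and the neighbor averaging is a simple gossip step; both are assumptions chosen for illustration, since the abstract does not name a specific aggregation rule, and all function names are hypothetical.

```python
def aggregate(client_models, client_sizes):
    """Centralized step: data-size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    global_model = [0.0] * len(client_models[0])
    for params, n in zip(client_models, client_sizes):
        for i, p in enumerate(params):
            global_model[i] += p * n / total
    return global_model

def gossip_round(models, neighbors):
    """Decentralized step: each client averages its model with its neighbors' models."""
    new_models = []
    for i, own in enumerate(models):
        group = [own] + [models[j] for j in neighbors[i]]
        new_models.append([sum(vals) / len(group) for vals in zip(*group)])
    return new_models

# Centralized: the server combines two clients holding 1 and 3 samples.
global_model = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# Decentralized: three fully connected clients reach consensus in one round.
consensus = gossip_round([[0.0], [6.0], [3.0]], [[1, 2], [0, 2], [0, 1]])
```

In the gossip variant no node ever sees all models unless the graph is fully connected, which removes the single point of failure at the cost of slower agreement on sparse graphs.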

翻訳日:2023-08-10 15:52:15 公開日:2023-08-08

# 深層学習に基づく画像透かし - 簡単な調査-

Deep Learning based Image Watermarking: A Brief Survey ( http://arxiv.org/abs/2308.04603v1 )

ライセンス: Link先を確認

Xin Zhong, Arjon Das, Fahad Alrasheedi, Abdullah Tanvir

(参考訳) カバー画像に秘かに透かしを埋め込み抽出して保護する行為は、画像透かし(image watermarking)と呼ばれる。近年,深層学習に基づく画像透かし技術が次々と出現している。そこで本研究では,最先端の深層学習に基づく画像透かし技術について,埋め込み・抽出合同訓練,特徴変換としてのディープネットワーク,ハイブリッドスキームに分類した。各カテゴリの研究方向も分析され、要約される。また,今後の研究の方向性についても論じる。

The act of secretly embedding and extracting a watermark on a cover image to protect it is known as image watermarking. In recent years, deep learning-based image watermarking techniques have been emerging one after another. To study the state-of-the-art, this survey categorizes cutting-edge deep learning-based image watermarking techniques into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes. Research directions in each category are also analyzed and summarized. Additionally, potential future research directions are discussed to envision future studies.

翻訳日:2023-08-10 15:51:54 公開日:2023-08-08

# NSF RESUME HPC Workshop: 疫学モデリングにおける高性能コンピューティングと大規模データ管理

NSF RESUME HPC Workshop: High-Performance Computing and Large-Scale Data Management in Service of Epidemiological Modeling ( http://arxiv.org/abs/2308.04602v1 )

ライセンス: Link先を確認

Abby Stevens, Jonathan Ozik, Kyle Chard, Jaline Gerardin, Justin M. Wozniak

(参考訳) NSFが出資したRobust Epidemic Surveillance and Modeling (RESUME)プロジェクトは、2023年5月1日から2日にかけてシカゴ大学で「疫学モデリングのための高性能コンピューティングと大規模データ管理」というワークショップを開催した。これは、予測知性とパンデミック予防のための持続可能な学際的共同設計を促進するために設計された一連のワークショップの一部である。このイベントでは、疫学モデリング、ハイパフォーマンスコンピューティング(hpc)、hpcワークフロー、大規模データ管理の専門家31人が集結し、パンデミック予防のために計算疫学に必要な能力の共有ビジョンを開発する。ワークショップを通じて、参加者は、HPCのワークフロー、データ統合、およびHPCアクセスに重点を置いて、特に公衆衛生上の意思決定を支援するために、HPC能力が疫学的モデリングを改善するのに使用できる重要な領域を特定した。ワークショップでは、新しいHPCワークフローと、現在疫学モデリングに使われている大規模データ管理アプローチを調査し、疫学モデリングに最も適したプラクティスを決定するために、他のドメインで使われているアプローチから引き出そうとした。本報告では,ワークショップの成果と成果について報告する。

The NSF-funded Robust Epidemic Surveillance and Modeling (RESUME) project successfully convened a workshop entitled "High-performance computing and large-scale data management in service of epidemiological modeling" at the University of Chicago on May 1-2, 2023. This was part of a series of workshops designed to foster sustainable and interdisciplinary co-design for predictive intelligence and pandemic prevention. The event brought together 31 experts in epidemiological modeling, high-performance computing (HPC), HPC workflows, and large-scale data management to develop a shared vision for capabilities needed for computational epidemiology to better support pandemic prevention. Through the workshop, participants identified key areas in which HPC capabilities could be used to improve epidemiological modeling, particularly in supporting public health decision-making, with an emphasis on HPC workflows, data integration, and HPC access. The workshop explored nascent HPC workflow and large-scale data management approaches currently in use for epidemiological modeling and sought to draw from approaches used in other domains to determine which practices could be best adapted for use in epidemiological modeling. This report documents the key findings and takeaways from the workshop.

翻訳日:2023-08-10 15:51:43 公開日:2023-08-08

# モデルモデル -- その1

Model of models -- Part 1 ( http://arxiv.org/abs/2308.04600v1 )

ライセンス: Link先を確認

Shimon Komarovsky

(参考訳) 本稿では,AGIエージェントの主成分として機能する新しい認知モデルを提案する。このモデルは、成熟したインテリジェンス状態に導入され、以前のモデルであるDENN、特にAKREMの拡張として、運用モデル(フレーム/クラス)と意志を含む。このモデルの中核的な仮定は、認知は蓄積された知識を操作することであり、適切な意志のガイダンスである。また、知識の一部である行動が、成熟した知性状態に先行する進化段階において、意志に沿うことを学習していると仮定する。さらに、このモデルは、トップダウンとボトムアップの両方のモデル学習、一般化のバース特殊化など、既知のすべての知的側面における双対性原理に基づいている。さらに、AGI設計には全体論的アプローチが提唱され、再利用性とシンプルさという形で制約や効率性の下での認知が提案される。最後に、この成熟状態に達するには、統合原理を利用して、幼児から成人への認知的進化を通して記述する。この認知モデルの最終的な製品は、モデルとインスタンスの動的操作メモリである。最後に、成熟状態に達する進化段階のいくつかの例と予備的なアイデアを示す。

This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, which are part of knowledge, learn to be aligned with will during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization versus specialization, and more. Furthermore, a holistic approach is advocated for AGI design, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.

翻訳日:2023-08-10 15:51:21 公開日:2023-08-08

# cvpr2023のバーストロングテールとオープンワールドへの挑戦

1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges ( http://arxiv.org/abs/2308.04598v1 )

ライセンス: Link先を確認

Kaer Huang

(参考訳) 現在、ビデオインスタンスセグメンテーション(vis)は、わずか数十のカテゴリを含むクローズドなトレーニングカテゴリから、ビデオ内のオブジェクトをセグメンテーションし、分類することを目的としている。 TAOとBURSTのデータセットがリリースされるにつれて、長い尾とオープンワールドのシナリオでVISを研究する機会が得られます。従来のVISメソッドは、少数の共通クラスに限定されたベンチマークで評価されるが、実用的なアプリケーションでは、これらの共通クラスを越えて、稀で目に見えないオブジェクトを検出し、追跡するトラッカーが必要である。ロングテールタスクのための最新のmot論文(野生のあらゆるものを追跡するsiyuan li et)にインスパイアされたburst long tail challengeでは、反復係数サンプリングを使用して、lvisv0.5とcocoデータセットの組み合わせでモデルをトレーニングします。まず、LVISv0.5 + COCOデータセット上でセグメンテーションとCEMで検出器を訓練する。そして、TAOデータセットでインスタンスの外観の類似性をトレーニングする。最終的に、我々のメソッド(LeTracker)は、BURSTテストセットで14.9 HOTAallを獲得し、ベンチマークで1位になった。オープンワールドの課題では、64クラス(BURST TrainサブセットのIntersectionクラスとCOCOデータセット、LVISデータセットなしで)のアノテーションデータトレーニングと、BURSTテストセットデータセット上でのテストのみを使用し、ベンチマークで1位となる61.4 OWTAallを取得します。私たちのコードは将来の研究を促進するためにリリースされます。

Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contains only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these common classes, detecting and tracking rare and even never-before-seen objects. Inspired by the latest MOT paper for the long-tail task (Tracking Every Thing in the Wild, Siyuan Li et al.), for the BURST long tail challenge, we train our model on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling. First, we train the detector with segmentation and CEM on the LVISv0.5 + COCO dataset. Then, we train the instance appearance similarity head on the TAO dataset. Finally, our method (LeTracker) achieves 14.9 HOTAall on the BURST test set, ranking 1st in the benchmark. For the open-world challenge, we train using annotations for only 64 classes (the intersection of the BURST train subset and the COCO dataset classes, without the LVIS dataset), test on the BURST test set, and achieve 61.4 OWTAall, ranking 1st in the benchmark. Our code will be released to facilitate future research.
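
Repeat factor sampling, mentioned above, oversamples images that contain rare categories. The sketch below follows the rule as commonly described for the LVIS dataset: a category that appears in a fraction f of images gets the factor max(1, sqrt(t / f)), and an image takes the largest factor among its categories. The threshold value and the toy data are assumptions; the abstract does not state the settings used by the authors.

```python
import math
from collections import Counter

def repeat_factors(image_categories, threshold):
    """Return one repeat factor per image; rare categories push the factor above 1."""
    num_images = len(image_categories)
    # Count, for each category, the number of images that contain it at least once.
    counts = Counter(c for cats in image_categories for c in set(cats))
    category_factor = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in counts.items()
    }
    # An image is repeated according to its rarest category.
    return [
        max((category_factor[c] for c in cats), default=1.0)
        for cats in image_categories
    ]

# Toy data: "person" is in every image, "unicycle" in one of four.
images = [["person"], ["person"], ["person"], ["person", "unicycle"]]
factors = repeat_factors(images, threshold=0.5)
```

With this toy data only the last image receives a factor above 1, so a sampler that repeats images in proportion to `factors` would show the rare category more often per epoch.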

翻訳日:2023-08-10 15:50:59 公開日:2023-08-08

# Pöschl-Teller電位の再正規化とスペクトル

Renormalization and spectra of the Pöschl-Teller potential ( http://arxiv.org/abs/2308.04596v1 )

ライセンス: Link先を確認

Ulysses Camara da Silva, Andre Alves Lima, Carlos F.S. Pereira

(参考訳) 2次元パラメータのすべての値に対する Pöschl-Teller ポテンシャルのエネルギー固有関数とスペクトルについて検討した。ポテンシャルは原点に特異性を持ち、パラメータ空間のいくつかの領域では固有関数の境界条件が不定義となる。再正規化手順を解の族に応用し,関連する再正規化群(RG)フローを考察する。再正規化は「dimensional transmutation」によって異常な長さスケールをもたらす。このスケールがゼロに設定できないカップリング空間の領域では、特異点の近くで漸近共形対称性を自発的に破る。対称性はポテンシャルの次元パラメータによって明確に破られる。これら2つの競合する共形対称性を破る方法の存在は、RGフローを興味深い構造にする。ポテンシャルの超対称性は、存在すれば漸近共形対称性の自発的破れを防止できることを示す。固有関数の族を用いてパラメータ空間のすべての領域における S-行列を異常スケールの任意の値に対して計算する。次に、S-行列の極を体系的に研究し、すべての有界、反有界、準安定状態を分類する。

We study the energy eigenfunctions and spectrum of the Pöschl-Teller potential for every value of its two dimensionless parameters. The potential has a singularity at the origin which, in some regions of parameter space, makes boundary conditions of the eigenfunctions ill-defined. We apply a renormalization procedure to obtain a family of well-defined solutions, and study the associated renormalization group (RG) flow. Renormalization introduces an anomalous length scale by "dimensional transmutation". In the regions of coupling space where this scale cannot be set to zero, it spontaneously breaks the asymptotic conformal symmetry near the singularity. The symmetry is also explicitly broken by a dimensionful parameter in the potential. The existence of these two competing ways of breaking conformal symmetry gives the RG flow an interesting structure. We show that supersymmetry of the potential, when present, allows one to prevent spontaneous breaking of the asymptotic conformal symmetry. We use the family of eigenfunctions to compute the S-matrix in all regions of parameter space, for any value of anomalous scale. Then we systematically study the poles of the S-matrix to classify all bound, anti-bound and metastable states.

翻訳日:2023-08-10 15:50:28 公開日:2023-08-08

# 深層ニューラルネットワーク圧縮のための量子化認識因子化

Quantization Aware Factorization for Deep Neural Network Compression ( http://arxiv.org/abs/2308.04595v1 )

ライセンス: Link先を確認

Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

(参考訳) 畳み込み層と完全連結層のテンソル分解は、ニューラルネットワークのパラメータとフラップを減らす効果的な方法である。モバイルまたは組み込みデバイスのメモリと消費電力の制限のため、事前トレーニングされたモデルがデプロイされる場合、量子化ステップが通常必要となる。従来のトレーニング後量子化手法は、分割重み付きネットワークに適用され、精度が低下する。これにより、テンソル近似を量子化因子で直接求めるアルゴリズムを開発し、モデルの予測品質を維持しながら、両方の圧縮手法の恩恵を受けることができる。すなわち、特定の量子化格子上に存在する要素を持つ正準ポリアディック(CP)分解に、 Alternating Direction Method of Multipliers (ADMM) を用いることを提案する。ニューラルネットワークの重み付けを考案したアルゴリズムで圧縮し,その予測品質と性能を評価する。本手法を最先端のトレーニング後量子化手法と比較し,望ましい品質・パフォーマンストレードオフの達成において,高い柔軟性と競争性を示す。

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
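
ADMM-style methods typically alternate an unconstrained update with a projection onto the constraint set, which here is a quantization grid. The sketch below shows only such a projection, assuming a symmetric uniform grid with a fixed step; the grid, the bit width, and the function name are illustrative assumptions and are not taken from the paper.

```python
def project_to_grid(values, step, num_bits):
    """Snap each value to the nearest point of a symmetric uniform grid.

    Grid points are k * step for integers k in [-2^(b-1), 2^(b-1) - 1].
    Note: Python's round() rounds exact ties to the nearest even integer.
    """
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1
    projected = []
    for v in values:
        q = round(v / step)              # nearest integer grid index
        q = max(q_min, min(q_max, q))    # clamp indices that fall outside the grid
        projected.append(q * step)
    return projected

# 3-bit grid with step 0.25: representable values range from -1.0 to 0.75.
factors = project_to_grid([0.26, -0.9, 5.0], step=0.25, num_bits=3)
```

In a full algorithm this projection would be applied to each factor matrix of the CP decomposition inside the iteration loop, rather than once after training.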

翻訳日:2023-08-10 15:50:07 公開日:2023-08-08

# セリウム置換M型六フッ化ストロンチウムの4価電子駆動における巨大磁気異方性と光学異方性

Giant magnetic and optical anisotropy in cerium-substituted M-type strontium hexaferrite driven by 4$f$ electrons ( http://arxiv.org/abs/2308.04594v1 )

ライセンス: Link先を確認

Churna Bhandari, Durga Paudyal

(参考訳) 密度汎関数計算により, セリウム (Ce) 置換M型ヘキサフェライト中の巨大結晶異方性 (MCA) 定数が, Ce から特定の鉄 (2a) サイトへの量子閉じ込め電子移動の支援により, エネルギー的に有利なストロンチウムサイトに存在することがわかった。計算された電子構造は、電子移動がCe$^{3+}$とFe$^{2+}$をフェルミ準位以下に占有したCe($4f^1$)状態を生成する2a$サイトで形成し、MCAと磁気モーメントに重要な寄与をもたらすことを示している。ハーフce置換は金属状態を形成し、全置換はストロンチウム-ヘキサフェライト(ホスト)の半導状態を保持する。後者では、ホストのギャップ領域における電荷移動状態の形成によりバンドギャップが減少する。光吸収係数は、平行方向の光偏光と垂直方向の強い異方性を示す。予測可能な競合相の解析を含む計算された生成エネルギーと弾性定数は、両方の組成が化学的に、機械的に安定であることを確認する。 Ce-ヘキサフェライトは、合成の成功により、自動車の駆動モーターなどの装置での使用に適合する新しい高性能な臨界要素のない永久磁石材料となる。

By performing density functional calculations, we find a giant magnetocrystalline anisotropy (MCA) constant in abundant element cerium (Ce) substituted M-type hexaferrite, in the energetically favorable strontium site, assisted by a quantum confined electron transfer from Ce to specific iron (2a) site. Remarkably, the calculated electronic structure shows that the electron transfer leads to the formation of Ce$^{3+}$ and Fe$^{2+}$ at the $2a$ site producing an occupied Ce($4f^1$) state below the Fermi level that adds a significant contribution to MCA and magnetic moment. A half Ce-substitution forms a metallic state, while a full substitution retains the semiconducting state of the strontium-hexaferrite (host). In the latter, the band gap is reduced due to the formation of charge transferred states in the gap region of the host. The optical absorption coefficient shows an enhanced anisotropy between light polarization in parallel and perpendicular directions. Calculated formation energies, including the analysis of probable competing phases, and elastic constants confirm that both compositions are chemically and mechanically stable. With successful synthesis, the Ce-hexaferrite can be a new high-performing critical-element-free permanent magnet material adapted for use in devices such as automotive traction drive motors.

翻訳日:2023-08-10 15:49:52 公開日:2023-08-08

# shepherd: 言語モデル生成に対する批判

Shepherd: A Critic for Language Model Generation ( http://arxiv.org/abs/2308.04592v1 )

ライセンス: Link先を確認

Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz

(参考訳) 大きな言語モデルの改善に伴い、これらのモデルの能力を活用して独自の出力を洗練する技術への関心が高まっている。本研究では,応答を批判し,改良を提案する言語モデルとして,多種多様なエラーを識別し,修正を提案する未調整モデルの能力を超えて拡張する。私たちのアプローチの中核は高品質なフィードバックデータセットで、コミュニティのフィードバックとヒューマンアノテーションからキュレートしています。 Shepherd は小さい (7B パラメータ) が、その批判は ChatGPT などの確立したモデルと同等か好まれる。 GPT-4による評価では、シェパードの平均勝利率は53-87%である。人間の評価では、Shepherdは他のモデルを厳密に上回り、ChatGPTと密接な関係にある。

As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.

翻訳日:2023-08-10 15:49:25 公開日:2023-08-08

# ヒルベルト=シュミット作用素と複素ヒルベルト空間の共役:ディラックのブラケット形式を再訪

Hilbert-Schmidt operators and the conjugate of a complex Hilbert space: Dirac's bra-ket formalism revisited ( http://arxiv.org/abs/2308.04627v1 )

ライセンス: Link先を確認

Frank Oertel

(参考訳) 我々は、与えられた複素ヒルベルト空間上の内積の定義(数学で通常使われるもので、線形性は第一成分で、半線型性は第二成分で仮定される)が、量子物理学におけるディラックの強力なブラケット形式性に直接関係していることを詳細に示す。この目的のために、複素ヒルベルト空間の共役(半線型作用素の解析を線型作用素理論で扱うことができる)を利用し、従って Fréchet-Riesz の定理を再適用する必要がある。応用は、2つの複素ヒルベルト空間 $H \otimes K$ のテンソル積の自己完結的で単純な記述や、量子テレポーテーション過程の純粋に線形代数的記述(例3.8)を含む。そのような場合、ヒルベルト空間 $H \otimes (K \otimes L)$ と $(H \otimes K) \otimes L$ (Theorem 3.7) の間の正準同型を明示的に構成する。

We reveal in detail how the definition of the inner product on a given complex Hilbert space - usually used in mathematics (where linearity is assumed in the first component and semilinearity in the second) - directly links to Dirac's powerful bra-ket formalism in quantum physics. To this end, we just have to make use of the conjugate of a complex Hilbert space (by which an analysis of semilinear operators can be handled by means of linear operator theory) and re-apply the theorem of Fréchet-Riesz accordingly. Applications are specified, including a self-contained and simple description of the tensor product of two complex Hilbert spaces $H \otimes K$ (answering a related question of B. K. Driver) and a purely linear algebraic description of the quantum teleportation process (Example 3.8). In doing so, we provide an explicit construction of a canonical isometric isomorphism between the Hilbert spaces $H \otimes (K \otimes L)$ and $(H \otimes K) \otimes L$ (Theorem 3.7).

翻訳日:2023-08-10 15:41:40 公開日:2023-08-08

# 意味的変動評価のための文埋め込みモデルの比較検討

A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation ( http://arxiv.org/abs/2308.04625v1 )

ライセンス: Link先を確認

Deven M. Mistry and Ali A. Minai

(参考訳) 本や写本のような長い現実世界のテキストにおける意味変化のパターンを分析することは、スタイリスティック、認知、言語の観点から興味深い。また、テキストセグメンテーション、文書要約、セマンティックノベルティの検出などのアプリケーションにも有用である。文埋め込みのためのベクトル空間法が最近出現し、そのような分析が可能になった。しかし、これは様々な方法によって生み出される意味表現がいかに一貫性があり有意義であるかという問題を引き起こす。本稿では,複数の文献において,連続する文間の意味的類似性の時系列と対の文類似性の行列を用いた最近の文埋め込み手法を比較した。文埋め込み法を比較するために,目的とするタスクやデータセットを用いた従来の作業とは対照的に,本手法は「野放し」な手法の評価を提供する。文の埋め込み手法のほとんどは、ある文書において意味的類似性の高相関パターンを推定するが、興味深い相違が見られる。

Analyzing the pattern of semantic variation in long real-world texts such as books or transcripts is interesting from the stylistic, cognitive, and linguistic perspectives. It is also useful for applications such as text segmentation, document summarization, and detection of semantic novelty. The recent emergence of several vector-space methods for sentence embedding has made such analysis feasible. However, this raises the issue of how consistent and meaningful the semantic representations produced by various methods are in themselves. In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature. In contrast to previous work using target tasks and curated datasets to compare sentence embedding methods, our approach provides an evaluation of the methods 'in the wild'. We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
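
The two comparison views named above, a time series of similarity between successive sentences and a matrix of pairwise similarity, can be sketched as follows. Cosine similarity is assumed as the similarity measure, and the two-dimensional vectors are toy placeholders rather than outputs of a real sentence embedding model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def successive_similarity(embeddings):
    """Time series: similarity between each sentence and the one that follows it."""
    return [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]

def pairwise_similarity(embeddings):
    """Square matrix of similarities between every pair of sentences."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]

# Toy "document": two sentences on one topic, then an abrupt topic change.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
series = successive_similarity(toy_embeddings)
matrix = pairwise_similarity(toy_embeddings)
```

A drop in `series` marks a semantic shift between adjacent sentences, which is the kind of signal that text segmentation or novelty detection would build on.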

翻訳日:2023-08-10 15:41:03 公開日:2023-08-08

# LLMを利用したチャットボットのベンチマーク:方法とメトリクス

Benchmarking LLM powered Chatbots: Methods and Metrics ( http://arxiv.org/abs/2308.04624v1 )

ライセンス: Link先を確認

Debarag Banerjee, Pooja Singh, Arjun Avadhanam, Saksham Srivastava

(参考訳) 自律的な会話エージェント、すなわちチャットボットは、企業が顧客やパートナーにサポートを提供するための一般的なメカニズムになりつつある。チャットボット、特にLarge Language Models (LLMs)のようなジェネレーティブAIツールを活用するものを評価するためには、パフォーマンスを正確に評価する必要がある。ここでチャットボットのベンチマークが重要になる。本稿では,e2e(end to end)ベンチマークと呼ばれる新しいベンチマークの利用を提案し,チャットボット,特にllmsによる回答の正確性と有用性を評価するためにe2eベンチマークをどのように利用できるかを示す。我々は,E2Eベンチマークと,技術状況で一般的に使用されている他のメトリクスの両方に基づいて,さまざまなレベルの高度度でチャットボットの例を評価し,提案したベンチマークが他と比較して優れた結果を示すことを観察した。さらに、いくつかのメトリクスは予測不可能であることが判明したが、チャットボットの評価においてコサインの類似性を利用したE2Eベンチマークに関連するメトリクスは良好に動作した。ベストモデルの性能は,コサイン類似度スコアを指標としてE2Eベンチマークにいくつかの利点があることを示している。

Autonomous conversational agents, i.e., chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs), we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, we propose the use of a novel benchmark that we call the E2E (End to End) benchmark, and show how the E2E benchmark can be used to evaluate the accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. We evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark and other available metrics commonly used in the state of the art, and observe that the proposed benchmark shows better results compared to others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity, performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.

翻訳日:2023-08-10 15:40:47 公開日:2023-08-08

# staged speculative decodingを用いたllm推論の高速化

Accelerating LLM Inference with Staged Speculative Decoding ( http://arxiv.org/abs/2308.04623v1 )

ライセンス: Link先を確認

Benjamin Spector and Chris Re

(参考訳) 大規模言語モデル(LLM)による最近の進歩は、その多様な能力を示している。そこで我々は,小型デバイス上でのLDM推論を高速化する新しいアルゴリズム,ステージド投機デコーディングを提案する。我々は、投機的復号法における従来の作業を改善することで、小バッチ推論の算術強度を低くする。まず、投機的バッチをツリーとして再構成し、生成コストを削減し、バッチ当たりの期待トークンを増やす。次に、投機的復号化の第2段階を追加します。出力品質を完全に保ちながら、762MパラメータGPT-2-Lモデルを用いて、単一バッチ復号遅延を3.16倍削減する。

Recent advances with large language models (LLMs) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, we restructure the speculative batch as a tree, which reduces generation costs and increases the expected tokens per batch. Second, we add a second stage of speculative decoding. Taken together, we reduce single-batch decoding latency by 3.16x with a 762M parameter GPT-2-L model while perfectly preserving output quality.
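
For orientation, the basic speculative decoding loop that this work builds on can be sketched with toy deterministic models. This is the plain greedy variant only: it does not implement the tree-structured batch or the second speculative stage proposed in the paper, and the toy `target_next` and `draft_next` functions are hypothetical stand-ins for real language models.

```python
def speculative_decode(target_next, draft_next, prompt, num_tokens, k=4):
    """Greedy speculative decoding; output matches greedy decoding with the target."""
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # The target model checks every position (one batched pass in a real system).
        for i in range(k):
            expected = target_next(tokens + proposal[:i])
            if proposal[i] != expected:
                # Keep the agreeing prefix, then substitute the target's own token.
                proposal = proposal[:i] + [expected]
                break
        else:
            # Every draft token was accepted; the target adds one extra token.
            proposal.append(target_next(tokens + proposal))
        tokens.extend(proposal)
    return tokens[:goal]

def greedy_decode(next_token, prompt, num_tokens):
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(next_token(tokens))
    return tokens

def target_next(seq):
    # Toy "large" model: counts upward modulo 10.
    return (seq[-1] + 1) % 10

def draft_next(seq):
    # Toy "small" model: agrees with the target except at every third position.
    return (seq[-1] + 1) % 10 if len(seq) % 3 else 0

fast = speculative_decode(target_next, draft_next, [0], num_tokens=12)
slow = greedy_decode(target_next, [0], num_tokens=12)
```

Because every accepted token is checked against the target model, the output is unchanged; the gain comes from the target verifying several positions per pass instead of generating one token per pass.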

翻訳日:2023-08-10 15:40:28 公開日:2023-08-08

# モノクロ映像から人間をレンダリングする

Rendering Humans from Object-Occluded Monocular Videos ( http://arxiv.org/abs/2308.04622v1 )

ライセンス: Link先を確認

Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, Li Fei-Fei

(参考訳) モノクロビデオから人間を動かすことの3D理解とレンダリングは難しい課題だ。近年の進歩にもかかわらず、実際のシナリオでは、障害物がカメラの視界を遮り、キャプチャーされたビデオに部分的閉塞を引き起こすような作業は依然として困難である。既存のメソッドは2つの理由からこのような欠陥を処理できない。第一に、標準的なレンダリング戦略は点点マッピングに依存しており、これは身体の可視領域と隠蔽領域の間に劇的な差異をもたらす可能性がある。第二に、自然な直接回帰アプローチは、閉塞下でのレンダリングの実現可能性基準(つまり事前情報)を考慮しない。以上の欠点に対処するため,重度の閉鎖シーンにおいて,より優れたレンダリングを実現するニューラルネットワークレンダリング手法であるOccNeRFを提案する。この2つの欠点に対する直接的な解決策として,形状と可視性の統合による表面レンダリングを提案する。シミュレーションと実世界のオクルージョンの両方に対して本手法の有効性を検証する。

3D understanding and rendering of moving humans from monocular videos is a challenging task. Despite recent progress, the task remains difficult in real-world scenarios, where obstacles may block the camera view and cause partial occlusions in the captured videos. Existing methods cannot handle such defects for two reasons. First, the standard rendering strategy relies on point-point mapping, which could lead to dramatic disparities between the visible and occluded areas of the body. Second, the naive direct regression approach does not consider any feasibility criteria (i.e., prior information) for rendering under occlusions. To tackle the above drawbacks, we present OccNeRF, a neural rendering method that achieves better rendering of humans in severely occluded scenes. As direct solutions to the two drawbacks, we propose surface-based rendering by integrating geometry and visibility priors. We validate our method on both simulated and real-world occlusions and demonstrate our method's superiority.

翻訳日:2023-08-10 15:40:17 公開日:2023-08-08

# 帯域フィードバックによるマルチクラスオンライン学習

Multiclass Online Learnability under Bandit Feedback ( http://arxiv.org/abs/2308.04620v1 )

ライセンス: Link先を確認

Ananth Raman, Vinod Raman, Unique Subedi, Ambuj Tewari

(参考訳) バンディットフィードバックに基づくオンラインマルチクラス分類について検討する。ラベル空間が非有界である場合でも、Bandit Littlestone次元の有限性が必要かつ十分であることを示すことにより、(ダニーリー2013プライス)の結果を拡張した。この結果から,ラベル空間が非有界である場合,Littlestone次元がオンラインマルチクラス学習能力を特徴付けることを示す(Hanneke2023multiclass)最近の研究を補完する。

We study online multiclass classification under bandit feedback. We extend the results of (daniely2013price) by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded. Our result complements the recent work by (hanneke2023multiclass) who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting when the label space is unbounded.

翻訳日:2023-08-10 15:40:02 公開日:2023-08-08

# ユニバーサルバックドア緩和とテスト時間検出のためのアクティベーションクリッピングの改善

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection ( http://arxiv.org/abs/2308.04617v1 )

ライセンス: Link先を確認

Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

(参考訳) ディープニューラルネットワークはバックドア攻撃(トロイの木馬)に脆弱であり、攻撃者がバックドアトリガーでトレーニングセットに毒を盛り、ニューラルネットワークが攻撃者の指定されたターゲットクラスに対するテストタイムトリガーの分類を学ぶ。近年の研究では、バックドア中毒は攻撃されたモデルにおいて過剰フィッティング(異常に大きな活性化)を誘発し、これによりバックドア緩和のための一般的な訓練後のクリッピング法、すなわち、少量のクリーンサンプルを用いて学習した内部層活性化の限界を動機付けることが示されている。我々は、分類マージンを明示的に制限するためにアクティベーション境界を選択する新しいアプローチを考案する。この手法は、CIFAR-10画像分類のためのピア法に対して優れた性能を与える。また,この手法は適応攻撃,x2x攻撃,異なるデータセットに対して強いロバスト性を示す。最後に、元のネットワークとアクティベーションバウンドネットワークの出力差に基づいて、テスト時間検出と修正のための方法拡張を示す。本手法のコードはオンラインで利用可能である。

Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code of our method is available online.
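
The general idea of activation clipping and of output-difference detection can be sketched on a toy layer. The bound used here, the largest activation observed per unit on clean samples, is a simplifying assumption; the paper instead chooses bounds to limit classification margins, which is not reproduced. All names and numbers are hypothetical, and activations are assumed non-negative (for example, after a ReLU).

```python
def fit_bounds(clean_activations):
    """Per-unit upper bound: the largest activation seen on clean samples."""
    return [max(unit) for unit in zip(*clean_activations)]

def clip(activations, bounds):
    """Cap each unit's activation at its bound."""
    return [min(a, b) for a, b in zip(activations, bounds)]

def predict(hidden, weights):
    """Linear output layer followed by argmax over classes."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return scores.index(max(scores))

# Hidden-layer activations of three clean samples (two hidden units).
clean = [[0.5, 1.0], [0.8, 0.6], [0.3, 0.9]]
bounds = fit_bounds(clean)

# Toy output layer: class 0 reads unit 0, class 1 reads unit 1.
weights = [[1.0, 0.0], [0.0, 1.0]]

# A triggered input drives unit 0 far beyond anything seen on clean data.
triggered = [9.0, 0.9]
original_class = predict(triggered, weights)
clipped_class = predict(clip(triggered, bounds), weights)

# Test-time detection: flag inputs whose decision changes under clipping.
flagged = original_class != clipped_class
```

In this toy case clipping both removes the effect of the abnormal activation and exposes the input as suspicious, mirroring the mitigation and detection roles described in the abstract.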

翻訳日:2023-08-10 15:39:52 公開日:2023-08-08

# ストレス・ストレス関連精神疾患の検出・予測・モニタリングのための機械学習・ディープラーニング・データ前処理技術:スコープレビュー

Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review ( http://arxiv.org/abs/2308.04616v1 )

ライセンス: Link先を確認

Moein Razavi, Samira Ziyadidegan, Reza Jahromi, Saber Kazeminasab, Vahid Janfaza, Ahmadreza Mahmoudzadeh, Elaheh Baharlouei, Farzan Sasangohar

(参考訳) この総合的なレビューは、精神ストレスとその関連する精神障害の検出、予測、分析に使用される機械学習(ML)方法論を体系的に評価する。厳密なスコーピングレビュープロセスを用いて,ストレスおよびストレス関連mdsの文脈で使用される最新のmlアルゴリズム,前処理技術,データ型について調査を行った。その結果、Support Vector Machine(SVM)、Neural Network(NN)、Random Forest(RF)モデルは、検査されたすべての機械学習アルゴリズムにおいて、常に優れた精度と堅牢性を示すことがわかった。さらに, 心拍数測定や皮膚反応などの生理的パラメータが, mlアルゴリズムのストレス予測因子として広く用いられていることを考察する。これは、ストレスやストレス関連のMDに関する豊富な説明情報と、データ取得の比較的容易さに起因する。さらに、マッピング、特徴選択、フィルタリング、ノイズ低減を含む次元性低減技術の応用は、MLアルゴリズムの訓練に先立って重要なステップとしてしばしば観察される。このレビューの合成は、重要な研究のギャップを明らかにし、この分野の今後の方向性を概説する。これらの領域は、モデル解釈可能性、モデルパーソナライゼーション、自然主義的設定の組み込み、ストレスやストレスに関連するmdsの検出と予測のためのリアルタイム処理能力などを含む。

This comprehensive review systematically evaluates Machine Learning (ML) methodologies employed in the detection, prediction, and analysis of mental stress and its consequent mental disorders (MDs). Utilizing a rigorous scoping review process, the investigation delves into the latest ML algorithms, preprocessing techniques, and data types employed in the context of stress and stress-related MDs. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among all machine learning algorithms examined. Furthermore, the review underscores that physiological parameters, such as heart rate measurements and skin response, are prevalently used as stress predictors in ML algorithms. This is attributed to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. Additionally, the application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. The synthesis of this review identifies significant research gaps and outlines future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for detection and prediction of stress and stress-related MDs.

翻訳日:2023-08-10 15:39:34 公開日:2023-08-08

# 深層学習を用いた方向探索のためのスパースアレイ設計

Sparse Array Design for Direction Finding using Deep Learning ( http://arxiv.org/abs/2308.04615v1 )

ライセンス: Link先を確認

Kumar Vijay Mishra, Ahmet M. Elbir and Koichi Ichige

(参考訳) 近年,スパースアレイの設計に深層学習(DL)技術が導入されている。これらの手法は、機能工学と低い予測段階の複雑さの利点を提供し、スパース配列を見つけることに固有の組合せ探索に取り組むのに役立つ。本章では,DLに基づくスパースアレイの応用について,複数の方向の合成を行う。まず、認識レーダ応用のためのスパースアレイの選択に適用可能な教師付きおよび伝達学習手法を検討する。ここでは,2次元スパースアレイの設計において,シミュレートアニーリングなどのメタヒューリスティック学習アルゴリズムの利用についても論じる。次に,sparse array問題とチャネル推定,ビームフォーミング,ローカライズを併用した無線通信のためのdlベースアンテナ選択について検討する。最後に,isac(integrated sensing and communications)アプリケーションにおいて,レーダと通信性能のトレードオフによってisacスパースアレイ問題が非常に困難となるような,深いスパースアレイ手法の例を示す。各設定について,いくつかの数値実験を通してモデルに基づく最適化とdl手法の性能を示す。我々は、配列データの様々な不完全性に対するdlベースのアルゴリズムの堅牢性を確保するために必要となる追加の考慮事項について論じる。

In the past few years, deep learning (DL) techniques have been introduced for designing sparse arrays. These methods offer the advantages of feature engineering and low prediction-stage complexity, which is helpful in tackling the combinatorial search inherent to finding a sparse array. In this chapter, we provide a synopsis of several direction finding applications of DL-based sparse arrays. We begin by examining supervised and transfer learning techniques that have applications in selecting sparse arrays for a cognitive radar application. Here, we also discuss the use of meta-heuristic learning algorithms such as simulated annealing for the case of designing two-dimensional sparse arrays. Next, we consider DL-based antenna selection for wireless communications, wherein the sparse array problem may also be combined with channel estimation, beamforming, or localization. Finally, we provide an example of a deep sparse array technique for an integrated sensing and communications (ISAC) application, wherein a trade-off between radar and communications performance makes the ISAC sparse array problem very challenging. For each setting, we illustrate the performance of model-based optimization and DL techniques through several numerical experiments. We discuss additional considerations required to ensure robustness of DL-based algorithms against various imperfections in array data.

翻訳日:2023-08-10 15:39:09 公開日:2023-08-08

# 津波に伴う内部重力波の深層学習による検出 : 開海自然災害検出への道

Deep Learning Driven Detection of Tsunami Related Internal Gravity Waves: a path towards open-ocean natural hazards detection ( http://arxiv.org/abs/2308.04611v1 )

ライセンス: Link先を確認

Valentino Constantinou, Michela Ravanelli, Hamlin Liu, Jacob Bortnik

(参考訳) 津波は電離圏内で内部重力波(IGW)を発生させ、地球航法衛星システム(GNSS)によって検出される全電子含有量(TEC)を摂動させる。 GNSSは、ヨーロッパのガリレオ、アメリカ合衆国のGPS、ロシアのGlobal'naya Navigatsionnaya Sputnikovaya Sistema(GLONASS)、中国のBeiDouといった地球軌道からの信号を提供する衛星群である。 TIDのリアルタイム検出は津波検出のアプローチを提供し、ブイベースの警報システムでは利用できない地域において、早期警報システムを強化する。 GNSSデータの大部分はディープラーニングによって活用され、何千ものデータストリームにわたる複雑な非線形関係を効果的に処理する。 VARION(Variometric Approach for Real-Time Ionosphere Observation)アルゴリズムからスラント全電子含有量(sTEC)をグラミアン角差分場(Computer Vision)と畳み込みニューラルネットワーク(Convolutional Neural Networks, CNN)を用いてほぼリアルタイムに検出するフレームワークについて述べる。 2010年モーレ地震、2011年東北地震、2012年ハイダ・グワイ地震と津波の過去のデータはモデルトレーニングに使われ、2015年チリのイラペル地震と津波はサンプルモデルの検証に使われている。論文で説明した実験フレームワークを用いて,91.7%のF1スコアを得た。ソースコードはhttps://github.com/vc1492a/tidd。本研究は, 開海の津波によるIGWの検出における新たなフロンティアであり, 沿岸地域の自然災害検出の可能性を大幅に向上させるものである。

Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC). These perturbations, referred to as Traveling Ionospheric Disturbances (TIDs), are detectable through the Global Navigation Satellite System (GNSS). The GNSS are constellations of satellites providing signals from Earth orbit: Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Large volumes of GNSS data are leveraged by deep learning, which effectively handles complex non-linear relationships across thousands of data streams. We describe a framework that takes slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm and uses Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile are used for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in the open ocean, dramatically improving the potential for natural hazards detection for coastal communities.
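
As a rough illustration of the Gramian Angular Difference Field encoding mentioned in this abstract, the sketch below turns a short series into a GADF matrix using the commonly cited definition: rescale the series to [-1, 1], take the arccosine, then fill the matrix with sin(phi_i - phi_j). The function name and the sample values are hypothetical and are not taken from the paper; the authors' actual pipeline lives in their repository and may differ.

```python
import math

def gramian_angular_difference_field(series):
    """Return the GADF matrix of a 1-D series (illustrative sketch only)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined; clamp against rounding error.
    scaled = [max(-1.0, min(1.0, 2.0 * (x - lo) / (hi - lo) - 1.0)) for x in series]
    phi = [math.acos(v) for v in scaled]
    # Entry (i, j) is sin(phi_i - phi_j): zero diagonal, antisymmetric matrix.
    return [[math.sin(a - b) for b in phi] for a in phi]

# Hypothetical toy values standing in for an sTEC time series.
gadf = gramian_angular_difference_field([0.0, 0.5, 1.0, 0.25])
```

The resulting square matrix can be treated as a single-channel image, which is what makes it a natural input for a CNN.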

翻訳日:2023-08-10 15:38:51 公開日:2023-08-08

# データサイエンスプロジェクトが失敗する理由

Why Data Science Projects Fail ( http://arxiv.org/abs/2308.04896v1 )

ライセンス: Link先を確認

Balaram Panda (The University of Auckland)

(参考訳) データサイエンスは現代のデータインテリジェンスの実践であり、多くのビジネスの中核であり、ビジネスの課題をより効率的に扱うためのスマートな戦略を構築するのに役立ちます。データサイエンスの実践は、このアルゴリズムを使ってビジネスプロセスを自動化するのにも役立ちます。データサイエンスに関しては、主に3つの重要な要素がデータサイエンスプロジェクトの効果的な成果に影響を及ぼす。データの利用可能性 2.Algorithm 3.技術力やインフラ

Data Science is a modern Data Intelligence practice, which is at the core of many businesses and helps them build smart strategies to deal with business challenges more efficiently. Data Science practice also helps in automating business processes using algorithms, and it has several other benefits, which it also delivers in non-profit settings. With regard to data science, three key components primarily influence the effective outcome of a data science project. These are: (1) availability of data, (2) the algorithm, and (3) processing power or infrastructure.

翻訳日:2023-08-10 13:52:08 公開日:2023-08-08

# cuts: 医療用画像セグメンテーションのための教師なしフレームワーク

CUTS: A Fully Unsupervised Framework for Medical Image Segmentation ( http://arxiv.org/abs/2209.11359v5 )

ライセンス: Link先を確認

Chen Liu, Matthew Amodio, Liangbo L. Shen, Feng Gao, Arman Avesta, Sanjay Aneja, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

(参考訳) 本研究では,医用画像セグメンテーションのための完全教師なしディープラーニングフレームワークであるCUTS(Contrastive and Unsupervised Training for Segmentation)を導入する。ピクセルとその周辺地域からの自己スーパービジョンを画像自身で活用する。教師なしのアプローチは、コントラスト学習や自動エンコーディングの概念を活用するトレーニング目標を最適化します。いずれの段階においてもラベル付きデータを必要とせず,新たな2段階アプローチで医療画像のセグメンテーションを行う。最初の段階は、高次元の潜在埋め込み空間におけるベクトル表現を用いて、周囲のパッチと共にすべてのピクセルを埋め込む「ピクセル中心のパッチ」を作成することである。第2段階は、多スケールの位相データ解析手法である拡散凝縮を用いて、これらの埋め込みベクトルを任意のレベルの粒度で動的に粗粒化する。最終的な結果は、様々なスケールで画像構造をハイライトする粗い部分分割のシリーズである。本研究では,自然画像,網膜眼底画像,脳mri画像のマルチスケールセグメンテーションを成功させた。本フレームワークは, 医療画像の場合, 臨床解釈に関連のある異なる情報を伝達しうる, 異なるスケールで構造やパターンを規定する。本フレームワークは,3つの医用画像データセットにおける既存の教師なし手法と比較して,ダイス係数とハウスドルフ距離の10%から200%の改善を定量的に示す。ラベルに頼らずに複数の意味のある粒度の医療画像の分節化の問題に取り組む中で,今後,退屈かつ反復的な手動アノテーションを回避できることを実証したい。

In this work we introduce CUTS (Contrastive and Unsupervised Training for Segmentation), a fully unsupervised deep learning framework for medical image segmentation to better utilize the vast majority of imaging data that is not labeled or annotated. We utilize self-supervision from pixels and their local neighborhoods in the images themselves. Our unsupervised approach optimizes a training objective that leverages concepts from contrastive learning and autoencoding. Our framework segments medical images with a novel two-stage approach without relying on any labeled data at any stage. The first stage involves the creation of a "pixel-centered patch" that embeds every pixel along with its surrounding patch, using a vector representation in a high-dimensional latent embedding space. The second stage utilizes diffusion condensation, a multi-scale topological data analysis approach, to dynamically coarse-grain these embedding vectors at all levels of granularity. The final outcome is a series of coarse-to-fine segmentations that highlight image structures at various scales. In this work, we show successful multi-scale segmentation on natural images, retinal fundus images, and brain MRI images. Our framework delineates structures and patterns at different scales which, in the cases of medical images, may carry distinct information relevant to clinical interpretation. Quantitatively, our framework demonstrates improvements ranging from 10% to 200% on dice coefficient and Hausdorff distance compared to existing unsupervised methods across three medical image datasets. As we tackle the problem of segmenting medical images at multiple meaningful granularities without relying on any label, we hope to demonstrate the possibility to circumvent tedious and repetitive manual annotations in future practice.
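
The two evaluation metrics named in this abstract can be sketched on toy binary masks as follows. This is a minimal illustration assuming masks are given as sets of (row, column) foreground pixels; the function names and the example masks are hypothetical, and the paper's own evaluation code may use different conventions (for example, for empty masks).

```python
import math

def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|) for two sets of foreground pixels."""
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, truth), directed(truth, pred))

# Hypothetical 2x2 example: the masks share two of their three pixels.
pred = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 0), (0, 1), (1, 1)}
dice = dice_coefficient(pred, truth)
hausdorff = hausdorff_distance(pred, truth)
```

Higher Dice is better, while lower Hausdorff distance is better, so the two metrics move in opposite directions when a segmentation improves.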

翻訳日:2023-08-10 10:57:28 公開日:2023-08-08

# アニーリングマシンによるベイズネットワークの学習

Learning Bayesian Networks with Annealing Machine ( http://arxiv.org/abs/2006.06926v4 )

ライセンス: Link先を確認

Yuta Shikuri

(参考訳) 近年の研究では、アニーリングマシンは高い精度で組合せ最適化問題を解決することができると報告されている。アニーリングマシンは、スコアベースのベイズネットワーク構造学習に応用できる可能性がある。しかし、現在、アニール機のビット容量は制限されている。このアニール技術を利用するには、スコアベースの学習問題をビット容量内の2次非制約バイナリ最適化に変換する必要がある。本稿では,候補となる親集合の高度な同定と分解による効率的な変換手法を提案する。また、必要なビット数を最小限に抑える分解を見つけるために整数プログラミング問題も提供する。変数が75ドルから223ドルまでのベンチマークデータセットによる実験結果から,半導体技術で開発された完全結合型アニールマシンであるFujitsu Digital Annealerの100ドルKビット容量よりも,我々のアプローチではビット数が少なくなることがわかった。さらに,本手法によるディジタルアニーラは,既存のアルゴリズムよりもスコア最大化に優れることを示す。これらの結果はベイズネットワーク学習におけるアニールプロセッサの有用性を強調した。

Recent studies have reported that annealing machines are capable of solving combinatorial optimization problems with high accuracy. Annealing machines can potentially be applied to score-based Bayesian network structure learning. However, the bit capacity of an annealing machine is currently limited. To utilize the annealing technology, it is necessary to convert score-based learning problems into quadratic unconstrained binary optimizations within the bit capacity. In this paper, we propose an efficient conversion method with the advanced identification of candidate parent sets and their decomposition. We also provide an integer programming problem to find the decomposition that minimizes the number of required bits. Experimental results on $7$ benchmark datasets with variables from $75$ to $223$ show that our approach requires fewer bits than the $100$K bit capacity of the fourth-generation Fujitsu Digital Annealer, a fully coupled annealing machine developed with semiconductor technology. Moreover, we demonstrate that the Digital Annealer with our conversion method outperforms existing algorithms on score maximization. These results highlight the utility of annealing processors in learning Bayesian networks.
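
To make the notion of a quadratic unconstrained binary optimization (QUBO) concrete, here is a toy sketch that is not the paper's conversion method. It assumes a hypothetical situation in which exactly one of three candidate parent sets must be selected, encodes that one-hot constraint as a quadratic penalty, and finds the minimum by brute force rather than on an annealing machine. The costs and the penalty weight are made up for illustration.

```python
from itertools import product

def qubo_energy(Q, x):
    """Energy of binary vector x under an upper-triangular QUBO matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def one_hot_qubo(costs, penalty):
    """QUBO for sum_i c_i x_i + penalty * (sum_i x_i - 1)^2, constant dropped.

    Using x_i^2 = x_i, the penalty expands to -penalty on the diagonal and
    2 * penalty on every pair (i, j) with i < j, plus the constant `penalty`.
    """
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = costs[i] - penalty
        for j in range(i + 1, n):
            Q[i][j] = 2.0 * penalty
    return Q

def brute_force_minimum(Q):
    """Exhaustively search all 2^n assignments (only viable for tiny n)."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))

costs = [3.0, 1.0, 2.0]  # hypothetical costs of three candidate parent sets
Q = one_hot_qubo(costs, penalty=10.0)
best = brute_force_minimum(Q)  # selects the cheapest single candidate
```

Each binary variable consumes one bit on the machine, which is why reducing the number of candidate parent sets matters for fitting a problem within a fixed bit capacity.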

翻訳日:2023-08-09 18:09:54 公開日:2023-08-08

# モバイルネット畳み込みに基づく軽量ターゲット検出アルゴリズム

A lightweight target detection algorithm based on Mobilenet Convolution ( http://arxiv.org/abs/2002.03729v3 )

ライセンス: Link先を確認

Shengquan Wang

Target detection algorithms based on deep learning require a high-end GPU configuration, and sometimes even a high-performance deep learning workstation. This not only increases cost but also greatly limits practical deployment. This paper introduces a lightweight target detection algorithm that balances accuracy and computational efficiency, using MobileNet as the backbone together with a CNN separator layer. The processing speed is 30 fps on an RTX 2060 card for images with a resolution of 320*320.
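
The abstract does not say what its "CNN separator layer" is. Assuming it refers to the depthwise separable convolutions that MobileNet is generally known for, the arithmetic below sketches why such layers are lighter than standard convolutions. The layer sizes are hypothetical, biases are ignored, and nothing here is taken from the paper's actual architecture.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)        # 3*3*32*64 = 18432
sep = depthwise_separable_params(3, 32, 64)  # 3*3*32 + 32*64 = 2336
reduction = std / sep                        # roughly 7.9x fewer weights
```

For this hypothetical layer the separable form needs about one eighth of the weights, which is the kind of saving that makes real-time inference on a mid-range GPU plausible.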

翻訳日:2023-08-09 18:09:35 公開日:2023-08-08

# 一般資源理論におけるモノトン

Monotones in General Resource Theories ( http://arxiv.org/abs/1912.07085v3 )

ライセンス: Link先を確認

Tomáš Gonda, Robert W. Spekkens

(参考訳) 資源理論の研究における中心的な問題は、資源変換(モノトーンと呼ばれる)下では、リソースフルネスを定量化するために、不要な関数を見つけることである。モノトンの様々な構成は、多くの異なるコンクリートの資源理論に現れる。これらの構造はどのくらい一般的ですか。与えられた構成を適用すべき資源理論に必要な条件は何か。これらの疑問に答えるために、モノトーンを構成するための幅広いスキームを導入する。興味のある資源の序列から、非自明なモノトンが以前に知られていたり、より簡単に構築できるような、明確な事前順序への順序保存写像を見つけることを含む。私たちが研究した2つの主要なクラスのうちの1つでは、リソースの事前順序はリソースの集合の事前順序にマッピングされ、順序関係が包含されている場合、これらの集合内の関数の値の最大化や最小化によってモノトンが定義できる。他のクラスでは、リソースのプレオーダーはリソースのタプルのプレオーダーにマッピングされ、タプルの異なる要素(その情報内容)の識別可能性の量を測定するモノトーンをプルする。収縮に基づくモノトーンは、後者のクラスで自然に発生し、さらに驚くべきことに、重量とロバスト性の測定も行う。標準モノトン構成の多くを捉えることに加えて、このスキームはこれらの重要な一般化も示唆している。結果の適用可能性の広さを適切に把握するために, 構成概念が関連する資源の種類(状態, チャネル, コームなど)に依存しない, 新たな資源理論の抽象的枠組みとして提示する。

A central problem in the study of resource theories is to find functions that are nonincreasing under resource conversions - termed monotones - in order to quantify resourcefulness. Various constructions of monotones appear in many different concrete resource theories. How general are these constructions? What are the necessary conditions on a resource theory for a given construction to be applicable? To answer these questions, we introduce a broad scheme for constructing monotones. It involves finding an order-preserving map from the preorder of resources of interest to a distinct preorder for which nontrivial monotones are previously known or can be more easily constructed; these monotones are then pulled back through the map. In one of the two main classes we study, the preorder of resources is mapped to a preorder of sets of resources, where the order relation is set inclusion, such that monotones can be defined via maximizing or minimizing the value of a function within these sets. In the other class, the preorder of resources is mapped to a preorder of tuples of resources, and one pulls back monotones that measure the amount of distinguishability of the different elements of the tuple (hence its information content). Monotones based on contractions arise naturally in the latter class, and, more surprisingly, so do weight and robustness measures. In addition to capturing many standard monotone constructions, our scheme also suggests significant generalizations of these. In order to properly capture the breadth of applicability of our results, we present them within a novel abstract framework for resource theories in which the notion of composition is independent of the types of the resources involved (i.e., whether they are states, channels, combs, etc.).

翻訳日:2023-08-09 18:09:28 公開日:2023-08-08

# 識別器最適輸送

Discriminator optimal transport ( http://arxiv.org/abs/1910.06832v3 )

ライセンス: Link先を確認

Akinori Tanaka

(参考訳) 生成逆数ネットワークの幅広いクラスにおいて、判別器最適化プロセスは、ターゲット分布$p$とジェネレータ分布$p_G$の間のワッサーシュタイン距離に対する双対コスト関数の下位境界を増大させることを示す。これは、訓練された判別器が$p_G$から$p$まで最適輸送(OT)を近似できることを意味する。いくつかの実験と少しのot理論に基づき、画像生成を改善するための判別器最適輸送(dot)スキームを提案する。 CIFAR-10, STL-10 で訓練された無条件 GAN と ImageNet による条件付き GAN の事前学習モデルにより, 開始スコアと FID が向上することを示す。

Within a broad class of generative adversarial networks, we show that the discriminator optimization process increases a lower bound of the dual cost function for the Wasserstein distance between the target distribution $p$ and the generator distribution $p_G$. It implies that the trained discriminator can approximate optimal transport (OT) from $p_G$ to $p$. Based on some experiments and a bit of OT theory, we propose a discriminator optimal transport (DOT) scheme to improve generated images. We show that it improves the inception score and FID of unconditional GANs trained on CIFAR-10 and STL-10 and of a public pre-trained conditional GAN model trained on ImageNet.
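
The Wasserstein distance that this abstract refers to has a simple closed form in one dimension, which the sketch below uses to show what "transporting" generated samples onto target samples means. It is only an illustration of the distance itself for two equal-size 1-D samples, not an implementation of the DOT scheme; the sample values are made up.

```python
def wasserstein1_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan matches sorted samples,
    so the distance is the mean absolute gap between order statistics.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

real = [0.0, 1.0, 2.0]        # hypothetical target samples
generated = [2.5, 0.5, 1.5]   # hypothetical generator samples
distance = wasserstein1_1d(real, generated)  # each sorted pair differs by 0.5
```

In higher dimensions no such sorting shortcut exists, which is one reason an approximation through a trained discriminator is attractive.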

翻訳日:2023-08-09 18:08:58 公開日:2023-08-08

# 対数モデルpt対称性作用素における実スペクトル:対数モデルpt対称性におけるisoスペクトル

Real spectra in Logarithmic model PT-symmetry operators: Iso-spectra in Logarithmic PT-symmetry ( http://arxiv.org/abs/1904.09983v5 )

ライセンス: Link先を確認

Biswanath Rath, Rabab Jarrar, Hussein Shanak, Jihad Asad, and Rania Wannan

(参考訳) 特異および非特異な性質を持つ新しい対数モデルPT対称性作用素の実スペクトルを反映する。また, 逆対数および非逆対数型pt対称ポテンシャル間のisoスペクトルの性質にも気付く。現在の数値結果は以前の結果とよく一致している。

We report real spectra of new logarithmic-model PT-symmetric operators, both singular and non-singular in nature. We also note the iso-spectral nature of inverted and non-inverted logarithmic PT-symmetric potentials. The present numerical results are in good agreement with previous results.

翻訳日:2023-08-09 18:08:45 公開日:2023-08-08

# トランスベース事前学習言語モデルを用いた制御可能なテキスト生成に関する調査

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models ( http://arxiv.org/abs/2201.05337v4 )

ライセンス: Link先を確認

Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song

(参考訳) 制御可能なテキスト生成(CTG)は、自然言語生成(NLG)分野における新興分野である。実用上の制約をよりよく満たす高度なテキスト生成技術を開発する上で重要であると考えられている。近年、大規模な事前学習言語モデル(PLM)を用いた手法、特に広く使われているトランスフォーマーベースのPLMは、NLGの新しいパラダイムとなり、より多種多様な流動的なテキストを生成することができる。しかし、ディープニューラルネットワークの解釈可能性に限界があるため、これらの手法の制御性を保証する必要がある。この目的のために、トランスフォーマーベースのPLMを用いた制御可能なテキスト生成は、急速に成長するが、新しい研究ホットスポットとなっている。近年の3～4年間で、様々なタイプの制御制約を必要とするCTGタスクをターゲットにした多様なアプローチが出現している。本稿では,この分野における共通課題,主なアプローチ,評価手法について,系統的な批判的考察を行う。最後に、この分野が直面している課題について議論し、様々な将来的な方向性を提示する。我々の知る限りでは、トランスフォーマーベースのPLMの観点から最先端CTG技術の概要をまとめた最初の調査論文である。関連分野の研究者や実践者が、学術的および技術的フロンティアを迅速に追跡し、その領域の風景と将来の研究のロードマップを提供するのに役立つことを期待している。

Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that better meet the specific constraints in practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the limited level of interpretability of deep neural networks, the controllability of these methods needs to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging new research hotspot. A diverse range of approaches have emerged in the last 3-4 years, targeting different CTG tasks that require different types of controlled constraints. In this paper, we present a systematic critical review of the common tasks, main approaches, and evaluation methods in this area. Finally, we discuss the challenges that the field is facing, and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize the state-of-the-art CTG techniques from the perspective of Transformer-based PLMs. We hope it can help researchers and practitioners in the related fields to quickly track the academic and technological frontier, providing them with a landscape of the area and a roadmap for future research.

翻訳日:2023-08-09 18:04:13 公開日:2023-08-08

# Lawin Transformer: セマンティックセグメンテーションのためのマルチスケール表現による新しいEraビジョンバックボーンの改良

Lawin Transformer: Improving New-Era Vision Backbones with Multi-Scale Representations for Semantic Segmentation ( http://arxiv.org/abs/2201.01615v3 )

ライセンス: Link先を確認

Haotian Yan and Chuang Zhang and Ming Wu

(参考訳) マルチレベルアグリゲーション(MLA)モジュールは、セマンティックセグメンテーションにおいて、新しい時代のビジョンバックボーンを前進させる重要なコンポーネントとして登場した。本稿では,視覚バックボーンからのマルチスケール特徴マップを創造的に活用する新しいMLAアーキテクチャであるLawin (large window) Transformerを提案する。 lawin transformerのコアはlawin attentionであり、ローカルウィンドウよりもずっと大きなコンテキストウィンドウをクエリできる、新たに設計されたウィンドウアテンションメカニズムである。我々は,大規模ウィンドウパラダイムの効率的かつ簡易な応用について研究することに注力し,大規模コンテクストのクエリとマルチスケール表現のキャプチャに対する比率の柔軟な規制を可能にした。我々はLawin TransformerがCityscapesおよびADE20Kに与える影響を検証し、新しい視覚バックボーンと組み合わせることで、広く使われているMLAモジュールに優れた優位性を示す。コードはhttps://github.com/yan-hao-tian/lawinで入手できる。

The multi-level aggregation (MLA) module has emerged as a critical component for advancing new-era vision backbones in semantic segmentation. In this paper, we propose Lawin (large window) Transformer, a novel MLA architecture that creatively utilizes multi-scale feature maps from the vision backbone. At the core of Lawin Transformer is the Lawin attention, a newly designed window attention mechanism capable of querying much larger context windows than local windows. We focus on studying the efficient and simple application of the large-window paradigm, allowing for flexible regulation of the ratio of large context to query and capturing multi-scale representations. We validate the effectiveness of Lawin Transformer on Cityscapes and ADE20K, consistently demonstrating clear superiority over widely-used MLA modules when combined with new-era vision backbones. The code is available at https://github.com/yan-hao-tian/lawin.

翻訳日:2023-08-09 18:03:44 公開日:2023-08-08

# カーネルを用いた複合適合試験

Composite Goodness-of-fit Tests with Kernels ( http://arxiv.org/abs/2111.10275v3 )

ライセンス: Link先を確認

Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

(参考訳) モデルの不特定は確率的モデルの実装に重大な課題を生じさせうるため、この問題を直接的に考慮する様々な堅牢な手法の開発につながっている。しかし、これらのより関連するメソッドが必要かどうかは、モデルが本当に誤った仕様であるかどうかに依存し、この質問に答える一般的な方法が欠如している。本稿では,そのような方法を提案する。より正確には、あるパラメトリックな家系の任意の分布からデータが得られるかどうかに関心を持つ、難しい複合テスト問題に対するカーネルベースの仮説テストを提案する。実験では,最小距離推定器を用いて,最大平均誤差とカーネルのスタイン誤差を推定する。これらは広く適用可能であり、パラメトリックモデルの密度が正規化定数まで分かる場合や、モデルがシミュレータの形式を取る場合などである。その結果,適切なテストレベルを維持しつつ,パラメータを推定し,同じデータに対して(データ分割を伴わずに)テストを行うことが可能であることが判明した。提案手法は, 異常な非パラメトリック密度モデルの有効性の検証や, 生体細胞ネットワークの難易度生成モデルなど, 様々な問題について考察する。

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to a normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.
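
The maximum mean discrepancy (MMD) named in this abstract can be estimated from two samples with a few lines of code. The sketch below uses the biased V-statistic estimator with a Gaussian kernel on scalar data; the bandwidth, the function names, and the samples are assumptions made for illustration, and the paper's composite test additionally involves parameter estimation and a calibration step that are not shown here.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

# Hypothetical samples: identical data versus data shifted far away.
same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = mmd_squared([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
```

Identical samples give an estimate of zero, while well-separated samples give a clearly positive value; a goodness-of-fit test turns this gap into a decision by comparing it with a threshold.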

翻訳日:2023-08-09 18:03:07 公開日:2023-08-08

# バナッハ空間における線形関数データの正規化学習の解析

Analysis of Regularized Learning for Linear-functional Data in Banach Spaces ( http://arxiv.org/abs/2109.03159v6 )

ライセンス: Link先を確認

Qi Ye

(参考訳) 本稿では, 表現定理, 擬近似定理, 収束定理を含むバナッハ空間における線形汎関数データに対する正規化学習の全体論を考察する。入力トレーニングデータは、マルチモデルデータとマルチスケールモデルの離散局所情報を表現するために、バナッハ空間の先行空間における線形関数からなる。トレーニングデータとマルチロス関数は、期待されるリスクを近似するために経験的リスクを計算するために使用され、正規化学習はバナッハ空間上の正規化された経験的リスクを最小化する。元の問題の厳密な解は、たとえ元の問題が未知あるいは未定であっても、正規化学習によって世界規模で近似される。収束定理では、バナッハ空間の弱*位相による正確な解への近似解の収束を示す。さらに、正規化学習の定理を適用して、サポートベクトルマシンや人工ニューラルネットワークといった機械学習の多くの問題を解決する。

In this article, we study the whole theory of regularized learning for linear-functional data in Banach spaces including representer theorems, pseudo-approximation theorems, and convergence theorems. The input training data are composed of linear functionals in the predual space of the Banach space to represent the discrete local information of multimodel data and multiscale models. The training data and the multi-loss functions are used to compute the empirical risks to approximate the expected risks, and the regularized learning is to minimize the regularized empirical risks over the Banach spaces. The exact solutions of the original problems are approximated globally by the regularized learning even if the original problems are unknown or unformulated. In the convergence theorems, we show the convergence of the approximate solutions to the exact solutions by the weak* topology of the Banach space. Moreover, the theorems of the regularized learning are applied to solve many problems of machine learning such as support vector machines and artificial neural networks.

翻訳日:2023-08-09 18:02:47 公開日:2023-08-08

# ガウス過程補間におけるパラメータ選択--選択基準の実証的研究

Parameter selection in Gaussian process interpolation: an empirical study of selection criteria ( http://arxiv.org/abs/2107.06006v5 )

ライセンス: Link先を確認

Sébastien Petit (L2S, GdR MASCOT-NUM), Julien Bect (L2S, GdR MASCOT-NUM), Paul Feliot, Emmanuel Vazquez (L2S, GdR MASCOT-NUM)

(参考訳) 本稿では,ガウス過程補間におけるパラメータ選択の基本問題を再検討する。パラメトリックファミリー内のガウス過程の平均および共分散関数を選択することにより、ユーザは未知の機能についての予測を行うベイズ手順のファミリーを取得し、良好な予測パフォーマンスを提供する家族を選択する必要がある。本研究は,2009年にファスハウアーと共著者が提唱した概念に基づいて,例えば一般のクロスバリデーション基準のような標準選択基準の回復を可能にする,離脱一貫選択基準と検証基準を構築するための効果的な枠組みを提供する,スコアリングルールの一般的な概念に基づく。この条件下では, 適切なモデル群の選択が, 特定の選択基準の選択よりも重要であることが, 文献のいくつかのテスト問題として実証的に示される。さらに,Matérn共分散の正則性パラメータは,ほとんどの選択基準により効果的に選択できることを示した。

This article revisits the fundamental problem of parameter selection for Gaussian process interpolation. By choosing the mean and the covariance functions of a Gaussian process within parametric families, the user obtains a family of Bayesian procedures to perform predictions about the unknown function, and must choose a member of the family that will hopefully provide good predictive performances. We base our study on the general concept of scoring rules, which provides an effective framework for building leave-one-out selection and validation criteria, and a notion of extended likelihood criteria based on an idea proposed by Fasshauer and co-authors in 2009, which makes it possible to recover standard selection criteria such as, for instance, the generalized cross-validation criterion. Under this setting, we empirically show on several test problems of the literature that the choice of an appropriate family of models is often more important than the choice of a particular selection criterion (e.g., the likelihood versus a leave-one-out selection criterion). Moreover, our numerical results show that the regularity parameter of a Matérn covariance can be selected effectively by most selection criteria.
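
For readers unfamiliar with the Matérn covariance and its regularity parameter, the sketch below evaluates the standard closed forms for the three half-integer regularities most often used in practice. The function name and default hyperparameters are assumptions for illustration; the paper selects the regularity from data, which this snippet does not attempt.

```python
import math

def matern(r, nu, lengthscale=1.0, variance=1.0):
    """Matérn covariance at distance r for regularity nu in {0.5, 1.5, 2.5}."""
    if r < 0:
        raise ValueError("distance must be non-negative")
    s = r / lengthscale
    if nu == 0.5:
        # Exponential kernel: continuous but non-differentiable sample paths.
        return variance * math.exp(-s)
    if nu == 1.5:
        t = math.sqrt(3.0) * s
        return variance * (1.0 + t) * math.exp(-t)
    if nu == 2.5:
        t = math.sqrt(5.0) * s
        return variance * (1.0 + t + t * t / 3.0) * math.exp(-t)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is handled in this sketch")

# Covariance at unit distance for the three regularities.
values = [matern(1.0, nu) for nu in (0.5, 1.5, 2.5)]
```

Larger values of the regularity parameter correspond to smoother sample paths, which is why choosing it well can matter more than the choice of selection criterion.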

翻訳日:2023-08-09 18:02:32 公開日:2023-08-08

# ガウス過程回帰の実用的かつ厳密な不確実性境界

Practical and Rigorous Uncertainty Bounds for Gaussian Process Regression ( http://arxiv.org/abs/2105.02796v2 )

ライセンス: Link先を確認

Christian Fiedler, Carsten W. Scherer, Sebastian Trimpe

(参考訳) ガウス過程回帰(Gaussian Process Regression)は、ベイズ原理に基づく一般的な非パラメトリック回帰法であり、予測に対する不確実性推定を提供する。しかしながら、これらの推定はベイズの性質であり、安全性を保証する学習ベース制御のような重要な応用には、頻繁な不確実性境界が必要である。このような厳密な境界はガウス過程で利用できるが、それらはアプリケーションで役立つには保守的すぎる。これはしばしば実践者がこれらの境界をヒューリスティックに置き換え、理論上の保証を全て破ることになる。この問題に対処するために,厳密だが実用上有用である新たな不確実性境界を導入する。特に、境界は明示的に評価され、芸術結果の状態よりも保守的ではない。さらに,特定のモデル誤特定は優雅な劣化のみをもたらすことを示した。数値例による学習ベース制御におけるこれらの利点と有用性を示す。

Gaussian Process Regression is a popular nonparametric regression method based on Bayesian principles that provides uncertainty estimates for its predictions. However, these estimates are of a Bayesian nature, whereas for some important applications, like learning-based control with safety guarantees, frequentist uncertainty bounds are required. Although such rigorous bounds are available for Gaussian Processes, they are too conservative to be useful in applications. This often leads practitioners to replace these bounds with heuristics, thus breaking all theoretical guarantees. To address this problem, we introduce new uncertainty bounds that are rigorous, yet practically useful at the same time. In particular, the bounds can be explicitly evaluated and are much less conservative than state-of-the-art results. Furthermore, we show that certain model misspecifications lead to only graceful degradation. We demonstrate these advantages and the usefulness of our results for learning-based control with numerical examples.

翻訳日:2023-08-09 18:02:10 公開日:2023-08-08

# 熱伝達を増強する層流流路壁修正の迅速発見のための機械学習

Machine learning for rapid discovery of laminar flow channel wall modifications that enhance heat transfer ( http://arxiv.org/abs/2101.08130v2 )

ライセンス: Link先を確認

Yuri Koide, Arjun J. Kaithakkal, Matthias Schniewind, Bradley P. Ladewig, Alexander Stroh and Pascal Friederich

(参考訳) 流体の数値シミュレーションは,多くの物理現象をモデル化する上で重要な役割を担っている。単純平たい流路内の流体中の伝熱の計算は, 様々なシミュレーション手法において比較的容易な作業である。しかし、チャネル幾何がより複雑になると、数値シミュレーションは壁のジオメトリの最適化においてボトルネックとなる。任意の, 平坦な, 非平坦なチャネルの正確な数値シミュレーションと, ドラッグ係数とスタントン数を予測する機械学習モデルを組み合わせる。畳み込みニューラルネットワーク(CNN)は,数値シミュレーションのわずかな時間で,目標特性を正確に予測できることを示す。我々は,CNNモデルを仮想的な高スループットスクリーニング手法を用いて,多種多様なランダムな壁構造を探索する。データ拡張は既存のジオメトリデータに適用され、モデルの一般化を改善するために同じ数の熱伝達パラメータを持つ生成された新しいトレーニングデータを追加した。一般的なアプローチは、ここで述べたような単純なフロー設定に適用できるだけでなく、化学工学における多相や反応単位操作のようなより複雑なタスクにも拡張できる。

Numerical simulation of fluids plays an essential role in modeling many physical phenomena, which enables technological advancements, contributes to sustainable practices, and expands our understanding of various natural and engineered systems. The calculation of heat transfer in fluid flow in simple flat channels is a relatively easy task for various simulation methods. However, once the channel geometry becomes more complex, numerical simulations become a bottleneck in optimizing wall geometries. We present a combination of accurate numerical simulations of arbitrary, flat, and non-flat channels and machine learning models predicting drag coefficient and Stanton number. We show that convolutional neural networks (CNNs) can accurately predict the target properties in a fraction of the time required by numerical simulations. We use the CNN models in a virtual high-throughput screening approach to explore a large number of possible, randomly generated wall architectures. Data augmentation was applied to the existing geometry data to generate additional training samples that share the same heat transfer parameters, improving the model's generalization. The general approach is not only applicable to simple flow setups as presented here but can be extended to more complex tasks, such as multiphase or even reactive unit operations in chemical engineering.

翻訳日:2023-08-09 18:01:22 公開日:2023-08-08

# 非定常マルコフ環境に対する集合ベース値演算子

Set-based value operators for non-stationary Markovian environments ( http://arxiv.org/abs/2207.07271v3 )

ライセンス: Link先を確認

Sarah H.Q. Li, Assalé Adjé, Pierre-Loïc Garoche, Behçet Açıkmeşe

(参考訳) 本稿では,有限状態マルコフ決定過程(MDPs)をコンパクトな集合における不確かさパラメータで解析し,集合ベースの固定点理論による堅牢なMDPの結果を再検討する。この目的のために、ベルマンとポリシー評価演算子を値関数空間上の収縮作用素に一般化し、それらを value operator と表す。これらの値演算子は値関数の sets に作用し、それらを set-based value operator と表す。集合ベースの値作用素がコンパクト値関数集合の空間において contractions であることを証明する。集合論からの洞察を生かして、古典ロバストなmdp文献における矩形性条件を、より弱く、動的計画法においてパラメータ不明なmdpと契約演算子のより大きな集合に適用できる全ての値演算子の封じ込め条件に一般化する。矩形条件と包含条件の両方が、集合ベースの値演算子の固定点集合が自身のエクストリーム要素を含むことを十分に保証する。不確実な MDP パラメータの凸集合とコンパクト集合に対して、古典的ロバスト値関数と集合ベースのベルマン作用素の固定点集合の上限との同値性を示す。コンパクト集合における動的に変化するMDPパラメータの下では、値反復に対する集合収束結果が証明され、そうでなければ単一の値関数に収束しない。最後に,惑星探査と成層圏観測における確率的経路計画問題に対する新たな保証を得る。

This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. To this end, we generalize the Bellman and policy evaluation operators to contracting operators on the value function space and denote them as value operators. We lift these value operators to act on sets of value functions and denote them as set-based value operators. We prove that the set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition in classic robust MDP literature to a containment condition for all value operators, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contracting operators in dynamic programming. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own extrema elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function. Finally, we derive novel guarantees for probabilistic path-planning problems in planet exploration and stratospheric station-keeping.
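
The contraction property that underlies this abstract can be checked numerically for the ordinary (single-parameter) Bellman operator. The sketch below is only that baseline case on a made-up two-state, two-action MDP with reward maximization; it does not implement the set-based operators or the uncertain parameter sets studied in the paper, and all numbers are hypothetical.

```python
def bellman_operator(V, P, R, gamma):
    """One application of the Bellman optimality operator.

    P[s][a][s2] is the transition probability, R[s][a] the reward.
    """
    n = len(V)
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n))
            for a in range(len(R[s]))
        )
        for s in range(n)
    ]

def sup_norm_distance(V1, V2):
    """Sup-norm distance between two value functions."""
    return max(abs(a - b) for a, b in zip(V1, V2))

# Hypothetical MDP with two states and two actions per state.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.9

V1, V2 = [0.0, 0.0], [10.0, -3.0]
before = sup_norm_distance(V1, V2)
after = sup_norm_distance(
    bellman_operator(V1, P, R, gamma), bellman_operator(V2, P, R, gamma)
)
# Contraction: the distance shrinks by at least the factor gamma.
```

Because every operator of this kind shrinks distances by the discount factor, repeated application converges to a unique fixed point; the paper's contribution is to carry this argument over to sets of value functions.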

翻訳日:2023-08-09 17:53:22 公開日:2023-08-08

# 敵対的模倣学習の自動エンコーディング

Auto-Encoding Adversarial Imitation Learning ( http://arxiv.org/abs/2206.11004v3 )

ライセンス: Link先を確認

Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao

(参考訳) 強化学習(RL)は意思決定のための強力なフレームワークを提供するが、実際には注意深く設計された報酬関数を必要とすることが多い。 AIL(Adversarial Imitation Learning)は、環境からの報酬信号にアクセスせずに自動ポリシー取得に光を当てる。本稿では,堅牢でスケーラブルな AIL フレームワークである Auto-Encoding Adversarial Imitation Learning (AEAIL) を提案する。 AEAILは、実証から専門家ポリシーを誘導するため、オートエンコーダの再構成誤差を報酬信号として利用し、従来の識別器ベースのものよりも、ポリシーを最適化するための情報を多く提供する。その後、導出した目的関数を用いてオートエンコーダとエージェントポリシーを訓練する。実験の結果,AEAILは状態ベースおよび画像ベースの環境の両方において,最先端の手法よりも優れていることがわかった。さらに重要なのは、AEAILは、専門家によるデモがノイズを含む場合に、はるかに優れた堅牢性を示すことである。

Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than the prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods in both state-based and image-based environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy.

翻訳日:2023-08-09 17:52:13 公開日:2023-08-08

# orc: オンラインロールチェンジを用いたネットワークグループベースの知識蒸留

ORC: Network Group-based Knowledge Distillation using Online Role Change ( http://arxiv.org/abs/2206.01186v2 )

ライセンス: Link先を確認

Junyong Choi, Hyeon Cho, Seokhwa Cheung, Wonjun Hwang

(参考訳) 知識蒸留では,全能全能の教師ネットワークではすべての問題を解決できないため,近年,複数の教師による知識蒸留が研究されている。しかし、一部の未熟な教師が生徒に虚偽の知識を移すことがあるため、その改善は期待したほど良くないこともある。本稿では,この制限を克服し,複数のネットワークの有効性を活かすために,複数のネットワークを教師グループと学生グループに分割する。すなわち、学生グループは教師の知識を学習する必要がある未熟なネットワークの集合であり、教師グループは、うまく教えられる選択されたネットワークで構成されている。学生グループ内の上位ネットワークが各イテレーションで教師グループに昇格できるオンラインの役割変更戦略を提案する。教師集団の知識を洗練させるために,教師集団の誤りサンプルを用いて教員集団を訓練した後,教師グループから学生グループへの協調的知識の伝達に成功した。 CIFAR-10, CIFAR-100, ImageNetにおける提案手法の優位性を検証する。我々はさらに,resnet, wrn, vgg, mobilenet, shufflenet などの様々なバックボーンアーキテクチャを用いた手法の汎用性を示す。

In knowledge distillation, since a single, omnipotent teacher network cannot solve all problems, multiple teacher-based knowledge distillations have been studied recently. However, sometimes their improvements are not as good as expected because some immature teachers may transfer the false knowledge to the student. In this paper, to overcome this limitation and take the efficacy of the multiple networks, we divide the multiple networks into teacher and student groups, respectively. That is, the student group is a set of immature networks that require learning the teacher's knowledge, while the teacher group consists of the selected networks that are capable of teaching successfully. We propose our online role change strategy where the top-ranked networks in the student group are able to promote to the teacher group at every iteration. After training the teacher group using the error samples of the student group to refine the teacher group's knowledge, we transfer the collaborative knowledge from the teacher group to the student group successfully. We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet which achieves high performance. We further show the generality of our method with various backbone architectures such as ResNet, WRN, VGG, Mobilenet, and Shufflenet.

翻訳日:2023-08-09 17:51:58 公開日:2023-08-08

# 任意の次元における格子ゲージ理論の資源効率の良い量子シミュレーション:ガウスの法則とフェルミオン除去の解法

Resource-Efficient Quantum Simulation of Lattice Gauge Theories in Arbitrary Dimensions: Solving for Gauss' Law and Fermion Elimination ( http://arxiv.org/abs/2206.00685v3 )

ライセンス: Link先を確認

Guy Pardo, Tomer Greenberg, Aryeh Fortinsky, Nadav Katz, Erez Zohar

(参考訳) 格子ゲージ理論の量子シミュレーションが提案され、そのようなモデルの非摂動的性質を扱う理論的困難を克服する手法として利用されている。一つはフェルミオン自由度をシミュレートすることの難しさであり、もう一つはヒルベルト空間の冗長性であり、これは実験資源の無駄とゲージ理論の局所対称性の制約を課し、監視する必要性をもたらす。これは以前、非局所的な方法を用いて、1次元の設定で取り組まれてきた。ここでは、この問題とヒルベルト空間の冗長性を取り除き、より高い空間次元に有効である、これらの問題に対処するための別の手順を示す。我々は、$\mathbb{Z}_2$の格子ゲージ理論を実証し、IBMQクラウド量子コンピューティングプラットフォームを介して実験的に実装する。

Quantum simulation of Lattice Gauge Theories has been proposed and used as a method to overcome theoretical difficulties in dealing with the non-perturbative nature of such models. In this work we focus on two important bottlenecks that make developing such simulators hard: one is the difficulty of simulating fermionic degrees of freedom, and the other is the redundancy of the Hilbert space, which leads to a waste of experimental resources and the need to impose and monitor the local symmetry constraints of gauge theories. This has previously been tackled in one dimensional settings, using non-local methods. Here we show an alternative procedure for dealing with these problems, which removes the matter and the Hilbert space redundancy, and is valid for higher space dimensions. We demonstrate it for a $\mathbb{Z}_2$ lattice gauge theory and implement it experimentally via the IBMQ cloud quantum computing platform.

翻訳日:2023-08-09 17:51:38 公開日:2023-08-08

# 量子非マルコフ力学における情報バックフローとテレポーテーションへの接続

Information back-flow in quantum non-Markovian dynamics and its connection to teleportation ( http://arxiv.org/abs/2203.00668v3 )

ライセンス: Link先を確認

Spyros Tserkis, Kade Head-Marsden, Prineha Narang

(参考訳) 量子過程は、その進化中に記憶効果が発生するとき、非マルコフ過程と呼ばれる。量子非マルコフ性(quantum non-markovianity)は、環境から主系への情報バックフローに関連する現象であるが、そのような効果は必要ないことが示されている。本研究では、離散性と連続変数系の量子非マルコビアン性と量子テレポーテーションのプロトコルとの接続を確立する。また、主システムと環境間のテレポーテーションプロトコル中に、状態回復につながる双方向の方法で情報がどのように流れるかを示す。最後に、テレポーテーションプロトコルにおけるリソースのような絡み合いの役割を考えると、この性質と非マルコフ性との関係も解明される。

A quantum process is called non-Markovian when memory effects take place during its evolution. Quantum non-Markovianity is a phenomenon typically associated with the information back-flow from the environment to the principal system, however it has been shown that such an effect is not necessary. In this work, we establish a connection between quantum non-Markovianity and the protocol of quantum teleportation in both discrete and continuous-variable systems. We also show how information flows during a teleportation protocol between the principal system and the environment in a bidirectional way leading up to a state revival. Finally, given the resource-like role of entanglement in the teleportation protocol, the relationship between this property and non-Markovianity is also elucidated.

翻訳日:2023-08-09 17:50:50 公開日:2023-08-08

# リフティングに基づく変分マルチクラスセグメンテーション:設計,解析,実装

Lifting-based variational multiclass segmentation: design, analysis and implementation ( http://arxiv.org/abs/2202.04680v2 )

ライセンス: Link先を確認

Nadja Gruber, Johannes Schwab, Sebastien Court, Elke Gizewski, Markus Haltmeier

(参考訳) 与えられた画像を特定の特性を示す複数の領域に分割する変分多クラスセグメンテーションスキームを提案し,解析し,実現する。異なるチャネルからのエネルギー汎関数結合情報を最小化することにより、セグメンテーション領域を符号化する複数の関数を決定する。特定のマルチチャネルフィルタリングを用いて高次元の特徴空間に画像を持ち上げることで、またはRGB画像やマルチモーダル医療データなど、検討中の画像モダリティによって既に提供されることができる。実験の結果,提案手法は様々なシナリオで有効であることがわかった。特に,脳膿瘍の分類と腫瘍増殖の2つの医学的応用について有望な結果が得られた。主な理論的貢献として、提案したエネルギー関数のグローバル最小化器の存在を証明し、ノイズ入力に対する安定性と収束性を示す。特に、これらの結果はバイナリセグメンテーションの特殊な場合にも当てはまり、この特定の状況においてもこれらの結果は新規である。

We propose, analyze and realize a variational multiclass segmentation scheme that partitions a given image into multiple regions exhibiting specific properties. Our method determines multiple functions that encode the segmentation regions by minimizing an energy functional combining information from different channels. Multichannel image data can be obtained by lifting the image into a higher dimensional feature space using specific multichannel filtering or may already be provided by the imaging modality under consideration, such as an RGB image or multimodal medical data. Experimental results show that the proposed method performs well in various scenarios. In particular, promising results are presented for two medical applications involving classification of brain abscess and tumor growth, respectively. As main theoretical contributions, we prove the existence of global minimizers of the proposed energy functional and show its stability and convergence with respect to noisy inputs. In particular, these results also apply to the special case of binary segmentation, and these results are also novel in this particular situation.

翻訳日:2023-08-09 17:50:37 公開日:2023-08-08

# Genie: 量子化のデータを見せてください

Genie: Show Me the Data for Quantization ( http://arxiv.org/abs/2212.04780v3 )

ライセンス: Link先を確認

Yongkweon Jeon, Chungman Lee, Ho-young Kim

(参考訳) ゼロショット量子化は、プライバシに関連するコストや問題など、さまざまな理由からデータがアクセスできない場合に、軽量なディープニューラルネットワークを開発する上で有望なアプローチである。 FP32事前学習モデルにおけるバッチ正規化層の学習パラメータ($\mu$と$\sigma$)を利用することで、ゼロショット量子化スキームは合成データの生成に焦点を当てる。その後、事前学習されたモデル(教師)から量子化モデル(学生)への知識を蒸留し、量子化モデルに合成データセットを最適化する。しかし、これまでのゼロショット量子化は、タスク固有の損失と長期最適化を必要とする量子化対応トレーニング手法の文脈で主に議論されてきた。そこで我々は,高品質な量子化ネットワークを数時間で生成するゼロショット量子化のための後学習量子化方式を提案する。さらに,量子化に適したデータを生成するGenieというフレームワークを提案する。 Genieによって合成されたデータにより、実際のデータセットを使わずに堅牢な量子化モデルを作成できる。また,学習後の量子化アルゴリズムを提案し,量子化モデルの性能を向上させる。これらを組み合わせることで、ゼロショットと少数ショットの量子化のギャップを埋めることができ、既存のアプローチと比べて量子化性能を著しく改善することができる。言い換えれば、ユニークな最先端ゼロショット量子化アプローチを得ることができる。コードは \url{https://github.com/samsunglabs/genie} で入手できる。

Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible owing to various reasons, including cost and issues related to privacy. By exploiting the learned parameters ($\mu$ and $\sigma$) of batch normalization layers in an FP32-pre-trained model, zero-shot quantization schemes focus on generating synthetic data. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) such that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and long-term optimization as much as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, which is comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we can obtain a unique state-of-the-art zero-shot quantization approach. The code is available at \url{https://github.com/SamsungLabs/Genie}.
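
The abstract notes that zero-shot schemes generate synthetic data from the batch-normalization statistics ($\mu$, $\sigma$) stored in the pre-trained model. A common way to express this idea is a statistics-matching loss; the sketch below is a generic illustration under that assumption and is not taken from the Genie code. The function name and the (batch, channels) activation layout are hypothetical.

```python
import numpy as np

def bn_statistics_loss(activations, bn_means, bn_stds):
    """Sum of squared gaps between batch statistics and stored BN statistics.

    activations: list of arrays, one per BN layer, each of shape (batch, channels),
                 computed on the current synthetic batch.
    bn_means, bn_stds: lists of per-channel arrays stored in the pre-trained model.
    """
    loss = 0.0
    for act, mu, sigma in zip(activations, bn_means, bn_stds):
        loss += np.sum((act.mean(axis=0) - mu) ** 2)   # mean mismatch
        loss += np.sum((act.std(axis=0) - sigma) ** 2) # std mismatch
    return float(loss)
```

In a full pipeline this quantity would be minimised with respect to the synthetic inputs (or a generator producing them) using an autodiff framework; the NumPy version only shows what is being matched.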

翻訳日:2023-08-09 17:45:30 公開日:2023-08-08

# 選択的記憶再帰的最小二乗法:rbfニューラルネットワークによるリアルタイム学習における記憶への再キャスト

Selective Memory Recursive Least Squares: Recast Forgetting into Memory in RBF Neural Network Based Real-Time Learning ( http://arxiv.org/abs/2211.07909v2 )

ライセンス: Link先を確認

Yiming Fei, Jiangang Li, Yanan Li

(参考訳) 放射ベース関数ニューラルネットワーク(RBFNN)に基づくリアルタイム学習タスクでは、ニューラルネットワークが新たなデータに対する感度を維持するために、忘れるメカニズムが広く使用されている。しかし, 忘れる機構によっては, 昔から学習されていただけあって, 受動的知識を忘れる現象として, 有用な知識が失われる。そこで本稿では,従来の記憶機構を記憶機構に再キャストする,smrls(selective memory recursive least squares)と呼ばれるリアルタイム学習手法を提案する。サンプルの収集時間に応じてサンプルの重要性を主に評価する忘れ機構とは異なり、記憶機構はサンプルの時間分布と空間分布の両方を通してサンプルの重要性を評価する。 SMRLSでは、RBFNNの入力空間を有限個の分割に均等に分割し、各分割から合成されたサンプルを用いて合成目的関数を開発する。現在の近似誤差に加えて、ニューラルネットワークは、訪問したパーティションから記録されたデータに従って重みも更新する。 SMRLSは, 最小二乗(FFRLS)や確率勾配降下(SGD)といった古典的学習法と比較して, 学習速度と一般化能力の向上を実現し, 対応するシミュレーション結果から検証した。

In radial basis function neural network (RBFNN) based real-time learning tasks, forgetting mechanisms are widely used such that the neural network can keep its sensitivity to new data. However, with forgetting mechanisms, some useful knowledge will get lost simply because they are learned a long time ago, which we refer to as the passive knowledge forgetting phenomenon. To address this problem, this paper proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism. Different from the forgetting mechanism, which mainly evaluates the importance of samples according to the time when samples are collected, the memory mechanism evaluates the importance of samples through both temporal and spatial distribution of samples. With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions and a synthesized objective function is developed using synthesized samples from each partition. In addition to the current approximation error, the neural network also updates its weights according to the recorded data from the partition being visited. Compared with classical training methods including the forgetting factor recursive least squares (FFRLS) and stochastic gradient descent (SGD) methods, SMRLS achieves improved learning speed and generalization capability, which are demonstrated by corresponding simulation results.
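
For context, the forgetting-factor recursive least squares (FFRLS) baseline named in the abstract can be sketched as follows. This is the textbook FFRLS update applied to Gaussian RBF features, not the proposed SMRLS method; the feature width, centers and function names are illustrative assumptions.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian RBF features of a scalar input x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def ffrls_update(w, P, phi, y, lam=0.98):
    """One forgetting-factor RLS step for the model y ~ w . phi.

    lam < 1 discounts old samples exponentially, which is the time-based
    forgetting that the abstract contrasts with a memory mechanism.
    """
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)      # gain vector
    err = y - w @ phi                  # a priori prediction error
    w = w + k * err                    # weight update
    P = (P - np.outer(k, Pphi)) / lam  # covariance update with forgetting
    return w, P
```

Since every past sample is down-weighted by a power of `lam` regardless of where it lies in the input space, knowledge about regions that have not been visited recently fades, which is the passive forgetting described above.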

翻訳日:2023-08-09 17:44:45 公開日:2023-08-08

# Airbnbにおける多様性を考慮したランク学習

Learning To Rank Diversely At Airbnb ( http://arxiv.org/abs/2210.07774v3 )

ライセンス: Link先を確認

Malay Haldar, Mustafa Abdool, Liwei He, Dillon Davis, Huiji Gao, Sanjeev Katariya

(参考訳) Airbnbは二面的なマーケットプレースで、賃貸用のリスティングを所有するホストと世界中の見込みゲストを結びつけている。ニューラルネットワークベースのランク学習技術を適用することで、ゲストとホストのマッチングが大幅に改善されている。これらのランキングの改善はコア戦略によって推進された: 推定された予約確率でリスティングを順序付けし、これらの予約確率の推定をより正確にするためのテクニックを反復する。この戦略に暗黙的に埋め込まれていたのは、リスティングの予約確率が検索結果の他のリスティングとは独立して決定できるという仮定であった。本稿では,一般的なランク学習フレームワークに広く浸透しているこの仮定がいかに誤っているかを論じる。この仮定を補正する理論的基盤を提供し、その後に理論に基づく効率的なニューラルネットワークアーキテクチャを提供する。リスティング間の類似性を明示的に考慮し、それを減らして検索結果を多様化することで、強いポジティブな影響が生じた。この理論のオンラインA/Bテストの一環として,これらの指標の改善について議論する。本手法は,大規模な本番ランキングシステムの検索結果を多様化するための実用的な手法である。

Airbnb is a two-sided marketplace, bringing together hosts who own listings for rent, with prospective guests from around the globe. Applying neural network-based learning to rank techniques has led to significant improvements in matching guests with hosts. These improvements in ranking were driven by a core strategy: order the listings by their estimated booking probabilities, then iterate on techniques to make these booking probability estimates more and more accurate. Embedded implicitly in this strategy was an assumption that the booking probability of a listing could be determined independently of other listings in search results. In this paper we discuss how this assumption, pervasive throughout the commonly-used learning to rank frameworks, is false. We provide a theoretical foundation correcting this assumption, followed by efficient neural network architectures based on the theory. Explicitly accounting for possible similarities between listings, and reducing them to diversify the search results generated strong positive impact. We discuss these metric wins as part of the online A/B tests of the theory. Our method provides a practical way to diversify search results for large-scale production ranking systems.
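
To illustrate why ordering purely by independent booking probabilities can be suboptimal, the sketch below uses greedy maximal marginal relevance (MMR), a standard diversification heuristic. It is a stand-in for illustration and is not the neural architecture proposed in the paper; the scores, similarity matrix and `trade_off` parameter are hypothetical.

```python
def mmr_rerank(scores, sim, trade_off=0.5):
    """Greedy MMR ordering.

    scores:    list of relevance scores (e.g. estimated booking probabilities).
    sim:       square matrix (list of lists) of pairwise similarities in [0, 1].
    trade_off: 1.0 ranks by score only; lower values penalise similarity
               to items that have already been placed.
    """
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        def value(i):
            # Similarity to the closest already-selected item (0 if none yet).
            penalty = max((sim[i][j] for j in order), default=0.0)
            return trade_off * scores[i] - (1 - trade_off) * penalty
        best = max(remaining, key=value)
        order.append(best)
        remaining.remove(best)
    return order
```

With two near-duplicate listings at the top of the score ranking, a pure score ordering shows both first, while the MMR ordering moves a dissimilar listing into second place.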

翻訳日:2023-08-09 17:43:27 公開日:2023-08-08

# イベントベース行動認識のためのスパイクニューラルネットワーク:その利点を理解するための新しいタスク

Spiking Neural Networks for event-based action recognition: A new task to understand their advantage ( http://arxiv.org/abs/2209.14915v2 )

ライセンス: Link先を確認

Alex Vicente-Sola, Davide L. Manna, Paul Kirkland, Gaetano Di Caterina, Trevor Bihl

(参考訳) スパイキングニューラルネットワーク(snn)は、その独特の時間ダイナミクスによって特徴付けられるが、そのような計算の性質と利点はまだよく分かっていない。そこで本研究では,スパイキングニューロンが繰り返しシナプスを必要とせずに,フィードフォワードニューラルネットワークの時間的特徴抽出を可能にし,そのバイオインスパイアされた計算原理をエネルギー効率の向上を超えてうまく活用し,従来のニューロンとの違いを推定する方法を示す。これは、dvs-gesture-chain(dvs-gc)という新しいタスクを提案し、実イベントベースのアクション認識データセットにおける時間依存の知覚を初めて評価する。本研究は,イベントの順序の理解を必要とする新しいDVS-GCと異なり,時間的特徴抽出を伴わないネットワークで広く使用されているDVS Gestureベンチマークを解く方法を示す。さらに,この機構により,スパイクニューロンの時間的処理における漏洩率の役割を明らかにし,「ハードリセット」機構の利点を実証した。さらに,時間依存重みと正規化が時間的注意による順序の理解につながることを示す。

Spiking Neural Networks (SNN) are characterised by their unique temporal dynamics, but the properties and advantages of such computations are still not well understood. In order to provide answers, in this work we demonstrate how Spiking neurons can enable temporal feature extraction in feed-forward neural networks without the need for recurrent synapses, showing how their bio-inspired computing principles can be successfully exploited beyond energy efficiency gains and evidencing their differences with respect to conventional neurons. This is demonstrated by proposing a new task, DVS-Gesture-Chain (DVS-GC), which allows, for the first time, to evaluate the perception of temporal dependencies in a real event-based action recognition dataset. Our study proves how the widely used DVS Gesture benchmark could be solved by networks without temporal feature extraction, unlike the new DVS-GC which demands an understanding of the ordering of the events. Furthermore, this setup allowed us to unveil the role of the leakage rate in spiking neurons for temporal processing tasks and demonstrated the benefits of "hard reset" mechanisms. Additionally, we also show how time-dependent weights and normalization can lead to understanding order by means of temporal attention.
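
The roles of the leakage rate and the "hard reset" mentioned above can be seen in a single leaky integrate-and-fire neuron. The following is a minimal discrete-time sketch with assumed parameter names (`leak`, `threshold`); it is not the network used in the paper.

```python
import numpy as np

def lif_hard_reset(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron with hard reset.

    inputs: 1-D sequence of input currents, one per time step.
    Returns a binary spike train of the same length.
    """
    v = 0.0
    spikes = np.zeros(len(inputs), dtype=int)
    for t, x in enumerate(inputs):
        v = leak * v + x      # leaky integration of the membrane potential
        if v >= threshold:
            spikes[t] = 1
            v = 0.0           # hard reset: potential returns to zero after a spike
    return spikes
```

With a slow leak the neuron accumulates inputs that are several steps apart, so its output depends on input timing; with a fast leak the same inputs may never reach the threshold.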

翻訳日:2023-08-09 17:43:09 公開日:2023-08-08

# 効率的なロバストトレーニングのための逆コアセット選択

Adversarial Coreset Selection for Efficient Robust Training ( http://arxiv.org/abs/2209.05785v2 )

ライセンス: Link先を確認

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie

(参考訳) ニューラルネットワークは敵の攻撃に弱い: 入力に巧みに作り上げられた、知覚不能な摂動を加えることで、出力を変更できる。敵の訓練は、そのような攻撃に対して堅牢なモデルを訓練するための最も効果的なアプローチの1つである。残念ながら、トレーニングデータ全体の逆例をイテレーション毎に構築する必要があるため、ニューラルネットワークのバニラトレーニングよりもはるかに遅い。コアセット選択の理論を活用することで、トレーニングデータの小さなサブセットの選択が、堅牢なトレーニングの時間的複雑さを軽減するための原則的なアプローチを提供することを示す。この目的のために、まず、逆コアセット選択に対する収束保証を提供する。特に、収束境界は、コアセットがトレーニングデータ全体にわたって計算された勾配をいかにうまく近似できるかに直接関係していることを示す。理論的解析により,この勾配近似誤差を逆コアセット選択目的として用いて,トレーニングセットのサイズを効果的に削減する。一度構築すると、トレーニングデータのこのサブセット上で逆トレーニングを実行します。既存の手法と異なり,TRADES,$\ell_p$-PGD,Perceptual Adversarial Trainingなど,さまざまなトレーニング対象に適用することができる。我々は,我々のアプローチが,クリーンでロバストな精度の低下を経験しながら,敵のトレーニングを2～3倍高速化することを示すために,広範な実験を行った。

Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training robust models against such attacks. Unfortunately, this method is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively. Once built, we run adversarial training over this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times while experiencing a slight degradation in the clean and robust accuracy.

翻訳日:2023-08-09 17:42:46 公開日:2023-08-08

# 一般化量子マスター方程式を用いたNISQコンピュータ上のオープン量子システムダイナミクスのシミュレーション

Simulating Open Quantum System Dynamics on NISQ Computers with Generalized Quantum Master Equations ( http://arxiv.org/abs/2209.04956v2 )

ライセンス: Link先を確認

Yuchen Wang (1), Ellen Mulvihill (2), Zixuan Hu (1), Ningyi Lyu (2), Saurabh Shivpuje (1), Yudan Liu (3), Micheline B. Soley (2 and 4), Eitan Geva (3), Victor S. Batista (2), and Sabre Kais (1) ((1) Purdue University, (2) Yale University, (3) University of Michigan, Ann Arbor, (4) University of Wisconsin-Madison)

(参考訳) 本稿では,一般化量子マスター方程式(GQME)に基づき,NISQコンピュータ上で開放量子系のダイナミクスをシミュレートする量子アルゴリズムを提案する。このアプローチは、縮約密度行列の要素の任意の部分集合に対する運動方程式の厳密な導出を提供することにより、弱いシステム-バス結合とマルコフ性を仮定するリンドブラッド方程式の限界を克服する。残りの自由度の影響によるメモリカーネルを入力として、対応する非ユニタリプロパゲータを算出する。 Sz.-Nagyの拡張定理を用いて、非ユニタリプロパゲータを高次元のヒルベルト空間内のユニタリなものに変換し、それをNISQコンピュータの量子回路上で実装できることを示す。我々は, 部分集合を縮約密度行列の対角要素に制限した場合に, 量子回路深度が結果の精度に与える影響を解析することで, スピンボソンベンチマークモデルに適用した量子アルゴリズムの有効性を検証した。提案手法は, IBMのNISQコンピュータ上で信頼性の高い結果が得られることを示す。

We present a quantum algorithm based on the Generalized Quantum Master Equation (GQME) approach to simulate open quantum system dynamics on noisy intermediate-scale quantum (NISQ) computers. This approach overcomes the limitations of the Lindblad equation, which assumes weak system-bath coupling and Markovity, by providing a rigorous derivation of the equations of motion for any subset of elements of the reduced density matrix. The memory kernel resulting from the effect of the remaining degrees of freedom is used as input to calculate the corresponding non-unitary propagator. We demonstrate how the Sz.-Nagy dilation theorem can be employed to transform the non-unitary propagator into a unitary one in a higher-dimensional Hilbert space, which can then be implemented on quantum circuits of NISQ computers. We validate our quantum algorithm as applied to the spin-boson benchmark model by analyzing the impact of the quantum circuit depth on the accuracy of the results when the subset is limited to the diagonal elements of the reduced density matrix. Our findings demonstrate that our approach yields reliable results on NISQ IBM computers.
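
The dilation step can be illustrated numerically. For a contraction $A$ (operator norm at most 1), the block matrix $U = \begin{pmatrix} A & \sqrt{I - AA^\dagger} \\ \sqrt{I - A^\dagger A} & -A^\dagger \end{pmatrix}$ is unitary and contains $A$ in its top-left block. The sketch below builds this standard one-step dilation; how the paper normalises its propagator and compiles the result to a circuit is not stated in the abstract, so that part is left out. Function names are illustrative.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (V * np.sqrt(w)) @ V.conj().T

def unitary_dilation(A):
    """One-step unitary dilation of a square contraction A."""
    n = A.shape[0]
    I = np.eye(n)
    Ah = A.conj().T
    D_A = psd_sqrt(I - Ah @ A)   # defect operator of A
    D_Ah = psd_sqrt(I - A @ Ah)  # defect operator of A^dagger
    return np.block([[A, D_Ah], [D_A, -Ah]])
```

Applying `U` to a state padded with zeros and keeping only the first `n` components reproduces the action of `A`, at the cost of doubling the dimension (one extra qubit).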

翻訳日:2023-08-09 17:42:22 公開日:2023-08-08

# Swin-transformer-yolov5によるリアルタイムワイングレープバンチ検出

Swin-transformer-yolov5 For Real-time Wine Grape Bunch Detection ( http://arxiv.org/abs/2208.14508v3 )

ライセンス: Link先を確認

Shenglian Lu (1), Xiaoyu Liu (1), Zixaun He (2), Wenbo Liu (3), Xin Zhang (3), and Manoj Karkee (2) ((1) Guangxi Normal University, China, (2) Washington State University, US, (3) Mississippi State University, US)

(参考訳) 本研究では, リアルタイムワイン品種検出において, Swin-transformer-YOLOv5 と Swin-T-YOLOv5 が提案され, YOLOv5 と Swin-transformer の両方の利点を継承した。この研究は、2019年7月から9月にかけて、シャルドネ(白ベリーの皮)とメルロット(未熟時に白または白赤の混合ベリーの皮)の2種類のブドウ品種について行われた。 Swin-T-YOLOv5の優位性を検証するため、その性能はFaster R-CNN、YOLOv3、YOLOv4、YOLOv5など、一般的に使われている、競合するオブジェクト検出器と比較された。いずれのモデルも,2つの異なる気象条件(晴れと曇り),2つの異なるベリー成熟段階(未熟と成熟),および3つの異なる日光方向/強度(朝,正午,午後)を総合的に比較した。さらに,Swin-T-YOLOv5によるブドウの品種数予測は,アノテーション処理中の手動カウントや手動ラベリングなど,真理値と比較した。その結果、提案されたSwin-T-YOLOv5は、天候が曇ったときに平均精度(mAP)が97%、F1スコアが0.89という他の研究モデルよりも優れていた。このmAPはFaster R-CNN, YOLOv3, YOLOv4, YOLOv5より約44%, 18%, 14%, 4%高かった。 Swin-T-YOLOv5 は未熟果検出時に最低 mAP (90%) と F1-score (0.82) を達成し, 約40%, 5%, 3%, 1% の値を示した。さらに、Swin-T-YOLOv5は、予測と地上の真実を比較する際に、R2の最大0.91と2.36の根平均二乗誤差(RMSE)を達成したシャルドネ品種に対してより良い性能を示した。しかし、Merlotの品種では性能が劣り、R2の0.70とRMSEの3.30しか達成できなかった。

In this research, an integrated detection model, Swin-transformer-YOLOv5 or Swin-T-YOLOv5, was proposed for real-time wine grape bunch detection to inherit the advantages of both YOLOv5 and Swin-transformer. The research was conducted on two grape varieties, Chardonnay (always white berry skin) and Merlot (white or white-red mix berry skin when immature; red when matured), from July to September 2019. To verify the superiority of Swin-T-YOLOv5, its performance was compared against several commonly used/competitive object detectors, including Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5. All models were assessed under different test conditions, including two different weather conditions (sunny and cloudy), two different berry maturity stages (immature and mature), and three different sunlight directions/intensities (morning, noon, and afternoon) for a comprehensive comparison. Additionally, the number of grape bunches predicted by Swin-T-YOLOv5 was further compared with ground truth values, including both in-field manual counting and manual labeling during the annotation process. Results showed that the proposed Swin-T-YOLOv5 outperformed all other studied models for grape bunch detection, with a mean Average Precision (mAP) of up to 97% and an F1-score of up to 0.89 when the weather was cloudy. This mAP was approximately 44%, 18%, 14%, and 4% greater than that of Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5, respectively. Swin-T-YOLOv5 achieved its lowest mAP (90%) and F1-score (0.82) when detecting immature berries, where the mAP was approximately 40%, 5%, 3%, and 1% greater than that of the same four detectors, respectively. Furthermore, Swin-T-YOLOv5 performed better on the Chardonnay variety, achieving an R2 of up to 0.91 and a root mean square error (RMSE) of 2.36 when comparing the predictions with ground truth. However, it underperformed on the Merlot variety, achieving an R2 of only up to 0.70 and an RMSE of 3.30.
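
The F1-score, RMSE and R2 figures quoted above follow standard definitions, sketched here for reference. The abstract does not say whether R2 is the coefficient of determination or a squared correlation from a fitted regression line; the sketch assumes the former, and all function names are illustrative.

```python
import math

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean square error between counted and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Here `y_true` would be the manually counted bunches per image and `y_pred` the detector's counts.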

翻訳日:2023-08-09 17:42:02 公開日:2023-08-08

# 深層学習による産業画像異常検出:サーベイ

Deep Industrial Image Anomaly Detection: A Survey ( http://arxiv.org/abs/2301.11514v4 )

ライセンス: Link先を確認

Jiaqi Liu, Guoyang Xie, Jingbao Wang, Shangnian Li, Chengjie Wang, Feng Zheng, Yaochu Jin

(参考訳) 近年のディープラーニングの急速な発展は,産業用画像異常検出(IAD)のマイルストーンとなった。本稿では,ニューラルネットワークアーキテクチャ,監視レベル,損失関数,メトリクス,データセットの観点から,ディープラーニングに基づく画像異常検出手法の包括的なレビューを行う。また, 工業生産から新たな環境を抽出し, 我々の提案した新たな環境下での現在のIADアプローチを概観する。さらに,画像異常検出のオープニング課題をいくつか挙げる。各種監視下の代表的ネットワークアーキテクチャのメリットと欠点について論じる。最後に,研究成果を要約し,今後の研究方向性を指摘する。さらなるリソースはhttps://github.com/M-3LAB/awesome-industrial-anomaly-detectionで入手できる。

The recent rapid development of deep learning has laid a milestone in industrial Image Anomaly Detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets. In addition, we extract a new setting from industrial manufacturing and review the current IAD approaches under our proposed new setting. Moreover, we highlight several open challenges for image anomaly detection. The merits and downsides of representative network architectures under varying supervision are discussed. Finally, we summarize the research findings and point out future research directions. More resources are available at https://github.com/M-3LAB/awesome-industrial-anomaly-detection.

翻訳日:2023-08-09 17:33:19 公開日:2023-08-08

# 現実的な完全アノテーション付き顕微鏡画像データセット生成のためのノイズ除去拡散確率モデル

Denoising Diffusion Probabilistic Models for Generation of Realistic Fully-Annotated Microscopy Image Data Sets ( http://arxiv.org/abs/2301.10227v2 )

ライセンス: Link先を確認

Dennis Eschweiler, Rüveyda Yilmaz, Matisse Baumann, Ina Laube, Rijo Roy, Abin Jose, Daniel Brückner, Johannes Stegmaier

(参考訳) 近年のコンピュータビジョンの進歩は、拡散確率モデルが特に効果的な方法であることが証明され、写実的画像データの生成に大きな進展をもたらした。本研究では,望まれる構造の粗いスケッチを出発点として,教師なしかつ直感的なアプローチにより,拡散モデルが完全注釈付顕微鏡画像データセットを効果的に生成できることを実証する。提案されたパイプラインは、ディープラーニングベースのセグメンテーションアプローチをトレーニングする際の手動アノテーションへの依存を軽減するとともに、人間のアノテーションを必要とせずに、多様なデータセットのセグメンテーションを可能にする。このアプローチは、データ生成プロセスの合理化と、様々な生物や細胞タイプを含む様々な実践実験の例で示すように、セグメンテーションモデルのより効率的でスケーラブルなトレーニングを可能にする、という大きな約束を持っている。

Recent advances in computer vision have led to significant progress in the generation of realistic image data, with denoising diffusion probabilistic models proving to be a particularly effective method. In this study, we demonstrate that diffusion models can effectively generate fully-annotated microscopy image data sets through an unsupervised and intuitive approach, using rough sketches of desired structures as the starting point. The proposed pipeline helps to reduce the reliance on manual annotations when training deep learning-based segmentation approaches and enables the segmentation of diverse datasets without the need for human annotations. This approach holds great promise in streamlining the data generation process and enabling a more efficient and scalable training of segmentation models, as we show in the example of different practical experiments involving various organisms and cell types.

翻訳日:2023-08-09 17:33:09 公開日:2023-08-08

# 深度画像から変形を推定するソフトマテリアルのコマニピュレーション

Co-manipulation of soft-materials estimating deformation from depth images ( http://arxiv.org/abs/2301.05609v4 )

ライセンス: Link先を確認

Giorgio Nicola, Enrico Villagrossi, Nicola Pedrocchi

(参考訳) 布、複合材料、紙/ボール紙などの柔らかい材料を人ロボットで共同操作することは、いくつかの産業応用を提示する困難な作業である。コマニピュレーションされた材料の変形状態を推定することが主な課題である。人間のロボットの相対距離を計算して間接測度を提供する。本稿では,畳み込みニューラルネットワーク(CNN)を用いて,深度画像から素材の変形状態を推定するデータ駆動モデルを開発する。まず,素材の変形状態を,現在のロボットポーズと人間のつかみ位置との相対的なロト変換として定義する。モデルは、畳み込みニューラルネットワーク、特にImageNetで事前訓練されたDenseNet-121を介して、電流と所望の変形状態の間のデルタをロボットコントローラに供給し、ツイストコマンドを出力する。本稿では,データセットの取得,事前処理,モデルのトレーニングのために開発された手法について述べる。このモデルは、カメラからの骨格トラッカーに基づく最先端の手法と比較される。結果から,本手法は,骨格トラッカーによる性能向上と種々の欠点を回避し,データセット取得に必要な時間を最小限に抑えるため,異なるアーキテクチャやデータセット次元によるモデル性能についても検討した。

Human-robot co-manipulation of soft materials, such as fabrics, composites, and sheets of paper/cardboard, is a challenging operation that presents several relevant industrial applications. Estimating the deformation state of the co-manipulated material is one of the main challenges. Viable methods provide an indirect measure by calculating the human-robot relative distance. In this paper, we develop a data-driven model to estimate the deformation state of the material from a depth image through a Convolutional Neural Network (CNN). First, we define the deformation state of the material as the relative roto-translation between the current robot pose and a human grasping position. The model estimates the current deformation state through a Convolutional Neural Network, specifically a DenseNet-121 pretrained on ImageNet. The delta between the current and the desired deformation state is fed to the robot controller, which outputs twist commands. The paper describes the approach developed to acquire and preprocess the dataset and to train the model. The model is compared with the current state-of-the-art method based on a skeletal tracker from cameras. Results show that our approach achieves better performance and avoids the various drawbacks caused by using a skeletal tracker. Finally, we also studied the model performance according to different architectures and dataset dimensions to minimize the time required for dataset acquisition.

翻訳日:2023-08-09 17:32:49 公開日:2023-08-08

# SPTS v2: シングルポイントシーンテキストスポッティング

SPTS v2: Single-Point Scene Text Spotting ( http://arxiv.org/abs/2301.01635v3 )

ライセンス: Link先を確認

Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

(参考訳) エンド・ツー・エンドのシーンテキストスポッティングは、テキスト検出と認識の本質的な相乗効果により大きな進歩を遂げている。従来の手法では、水平長方形、回転矩形、四角形、多角形などの手動アノテーションを前提条件としており、単一点アノテーションよりもはるかに高価である。新しいフレームワークであるSPTS v2では、単一点アノテーションを使用して高性能なテキストスポッティングモデルをトレーニングできる。 SPTS v2は、同じ予測シーケンス内の全てのテキストインスタンスの中心点を逐次予測するインスタンス割り当てデコーダ(IAD)によって自己回帰Transformerの利点を保持しつつ、並列認識デコーダ(PRD)によってテキスト認識を並列に行う。これら2つのデコーダは同じパラメータを共有し、単純だが効果的な情報伝達プロセスで相互に接続され、勾配と情報を渡す。様々な既存のベンチマークデータセットに関する包括的な実験により、SPTS v2は、より少ないパラメータで以前の最先端の単一点テキストスポッターを上回り、19$\times$高速な推論速度を実現することが示された。 SPTS v2フレームワークの文脈では、他の表現と比較して、シーンテキストスポッティングにおいて単一点表現が好ましい可能性が実験から示唆される。このような試みは、既存のパラダイムの領域を超えたシーンテキストスポッティングアプリケーションにとって重要な機会を提供する。コードは https://github.com/Yuliang-Liu/SPTSv2 で入手できる。

End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using single-point annotations. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) that sequentially predicts the center points of all text instances inside the same predicting sequence, while a Parallel Recognition Decoder (PRD) performs text recognition in parallel. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19$\times$ faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at https://github.com/Yuliang-Liu/SPTSv2.

翻訳日:2023-08-09 17:32:29 公開日:2023-08-08

# 最適化情報完全一般化測定によるADAPT-VQEの測定オーバーヘッドの軽減

Mitigating the measurement overhead of ADAPT-VQE with optimised informationally complete generalised measurements ( http://arxiv.org/abs/2212.09719v2 )

ライセンス: Link先を確認

Anton Nykänen, Matteo A. C. Rossi, Elsi-Mari Borrelli, Sabrina Maniscalco, Guillermo García-Pérez

(参考訳) ADAPT-VQE は分子シミュレーションのためのコンパクトな ansätze を構築するための頑健なアルゴリズムである。 UCCSDのような他の手法と比較して回路深度を著しく低減できるが、精度は高く、多くのハードウェア効率の良い ansätze の変動最適化を妨げるようなバレン高原に悩まされない。しかし、標準的な実装では、多くの交換子演算子の推定による勾配評価という形でかなりの測定オーバーヘッドを導入する。本研究では, 最近導入された適応情報完全一般化計測(AIM)に基づくエネルギー評価手法を利用して, この測定オーバーヘッドを軽減する。エネルギー自体の効率的な測定方法を提供する以外に、情報完全(IC)測定データは、古典的に効率的な後処理のみを使用してADAPT-VQEの演算子プール内の演算子のすべての交換子を推定するために再利用することができる。本稿では,AIM-ADAPT-VQE方式の詳細を述べるとともに,複数のH4ハミルトニアンと演算子プールを用いてその性能について検討する。数値シミュレーションにより,ここで検討した系では,エネルギーを評価するために得られた測定データを再利用して,追加の測定オーバーヘッドなしにADAPT-VQEを実装できることを示す。さらに, エネルギーを化学精度で測定すると, 生成回路のCNOTカウントが理想値に近いことを示す。測定データが少ない場合でも、AIM-ADAPT-VQEは高い確率で基底状態に収束するが、回路深さが増加する場合もある。

ADAPT-VQE stands out as a robust algorithm for constructing compact ansätze for molecular simulation. It significantly reduces the circuit depth with respect to other methods, such as UCCSD, while achieving higher accuracy and not suffering from so-called barren plateaus that hinder the variational optimisation of many hardware-efficient ansätze. In its standard implementation, however, it introduces a considerable measurement overhead in the form of gradient evaluations through estimations of many commutator operators. In this work, we mitigate this measurement overhead by exploiting a recently introduced method for energy evaluation relying on Adaptive Informationally complete generalised Measurements (AIM). Besides offering an efficient way to measure the energy itself, Informationally Complete (IC) measurement data can be reused to estimate all the commutators of the operators in the operator pool of ADAPT-VQE, using only classically efficient post-processing. We present the AIM-ADAPT-VQE scheme in detail, and investigate its performance with several H4 Hamiltonians and operator pools. Our numerical simulations indicate that the measurement data obtained to evaluate the energy can be reused to implement ADAPT-VQE with no additional measurement overhead for the systems considered here. In addition, we show that, if the energy is measured within chemical precision, the CNOT count in the resulting circuits is close to the ideal one. With scarce measurement data, AIM-ADAPT-VQE still converges to the ground state with high probability, albeit with an increased circuit depth in some cases.

翻訳日:2023-08-09 17:31:36 公開日:2023-08-08

# 測定デバイス非依存量子秘密共有の破断速度-距離制限

Breaking Rate-Distance Limitation of Measurement-Device-Independent Quantum Secret Sharing ( http://arxiv.org/abs/2212.06148v3 )

ライセンス: Link先を確認

Chen-Long Li, Yao Fu, Wen-Bo Liu, Yuan-Mei Xie, Bing-Hong Li, Min-Gang Zhou, Hua-Lei Yin, Zeng-Bing Chen

(参考訳) 現在、量子シークレット共有のほとんどの進歩はレート距離境界に苦しむため、キーレートは限られている。キーレートの制限に加えて、技術的困難とそれに伴うコストが相まって、大規模なデプロイメントを妨げている。さらに, 既存プロトコルの性能は, 参加者の攻撃を考慮せずに漸近的に解析される。本稿では,キーレートと伝送距離を改良した測定デバイス非依存の量子秘密共有プロトコルについて報告する。空間多重化に基づき,少なくとも10の通信相手のネットワーク上でのレート距離境界を破ることができることを示す。他のプロトコルと比較して、我々の研究は秘密鍵レートを2桁以上改善し、送信距離を長くしている。参加者攻撃を考慮した構成可能フレームワークにおけるプロトコルのセキュリティを解析し,有限サイズ領域におけるその性能評価を行った。さらに,我々のプロトコルをデジタル署名に適用することを検討し,既存のプロトコルと比較して署名率が$10^7$倍以上向上することを示す。我々は、量子ネットワーク上のマルチパーティアプリケーションに、我々の量子秘密共有プロトコルが確かな未来を提供することを期待している。

Currently, most advances in quantum secret sharing suffer from the rate-distance bound, and thus the key rates are limited. In addition to the limited key rate, the technical difficulty and the corresponding cost together prevent large-scale deployment. Furthermore, the performance of most existing protocols is analyzed in the asymptotic regime without considering participant attacks. Here we report a measurement-device-independent quantum secret sharing protocol with improved key rate and transmission distance. Based on spatial multiplexing, our protocol can break the rate-distance bound over a network with at least ten communication parties. Compared with other protocols, our work improves the secret key rate by more than two orders of magnitude and has a longer transmission distance. We analyze the security of our protocol in the composable framework considering participant attacks and evaluate its performance in the finite-size regime. In addition, we investigate applying our protocol to digital signatures, where the signature rate is improved by more than $10^7$ times compared with existing protocols. We anticipate that our quantum secret sharing protocol will provide a solid future for multiparty applications on the quantum network.

翻訳日:2023-08-09 17:31:08 公開日:2023-08-08

# コヒーレント励起輸送下におけるスピンリング用エネルギーランドスケープコントローラのロバスト性

Robustness of Energy Landscape Controllers for Spin Rings under Coherent Excitation Transport ( http://arxiv.org/abs/2303.00142v2 )

ライセンス: Link先を確認

Sean O'Neil, Frank Langbein, Edmond Jonckheere, and S Shermer

(参考訳) 量子スピンリングにおける励起輸送を調節するコントローラの設計と解析は、古典的なフィードバック制御技術を用いて効果的な制御を合成し、古典的な制御理論の期待に反する結果をもたらす。本稿では,システムおよび制御パラメータの不確実性に対する励振伝達の忠実性を最適化する制御器のロバスト性について検討する。我々は,追跡誤差の感度を古典的制御アナログとして,ロバスト性尺度として忠実性誤差の対数感度を用いる。本稿では,コヒーレントトランスポートに最適化された量子系が,正確な時間Tでの読み出しに最適化されているか,あるいはTのタイムウインドウで最適化されているかによって,誤差とログ感度の相関が著しく異なることを示した。

The design and analysis of controllers to regulate excitation transport in quantum spin rings present challenges in the application of classical feedback control techniques to synthesize effective control, and generate results in contradiction to the expectations of classical control theory. In this paper, we examine the robustness of controllers designed to optimize the fidelity of an excitation transfer to uncertainty in system and control parameters. We use the logarithmic sensitivity of the fidelity error as the measure of robustness, drawing on the classical control analog of the sensitivity of the tracking error. In our analysis we demonstrate that quantum systems optimized for coherent transport exhibit significantly different correlation between error and the log-sensitivity depending on whether the controller is optimized for readout at an exact time T or over a time-window about T.

翻訳日:2023-08-09 17:25:02 公開日:2023-08-08

# 最小識別性原理による量子力学

Quantum Mechanics From Principle of Least Distinguishability ( http://arxiv.org/abs/2302.14619v5 )

ライセンス: Link先を確認

Jianhao M. Yang

(参考訳) 非相対論的量子力学の定式化は最小識別可能性の原理から導出できることを示す。この原理は、2つの仮定を取り入れることで古典力学の最小作用原理を拡張したものと考えることができる。第一に、Planck定数は、観測可能となるために、物理オブジェクトがそのダイナミクス中に示す必要がある離散的な作用の量を定義する。これにより、古典的軌道との識別可能性の度合いを計算できる。第二に、古典軌道に沿って一定の真空揺らぎがある。真空揺らぎによる追加の識別可能性を測定するために,情報メトリクスを定義する新しい手法を提案する。変分原理を適用して、識別可能性の合計度を最小にすることで、不確実性関係やシュレーディンガー方程式を含む基本量子定式化を位置および運動量表現の両方で取り戻すことができる。さらに、この原理は2つの面で新しい結果をもたらす。概念レベルでは、真空揺らぎに関する情報指標は、基礎となる物理的相互作用を伴わずに絡み合い効果を示すものであり、絡み合い効果が非因果的であることを示唆している。数学のレベルでは、相対エントロピーのより一般的な定義を用いて真空揺らぎの情報指標を定義することは、相対エントロピーの次数に依存する一般化されたシュレーディンガー方程式をもたらす。最小識別可能性原理は、新しい数学的ツールであり、他の高度な量子定式化を得られることを期待する。

We show that the formulations of non-relativistic quantum mechanics can be derived from the principle of least distinguishability. The principle can be considered as an extension of the least action principle from classical mechanics by factoring in two assumptions. First, the Planck constant defines the discrete amount of action a physical object needs to exhibit during its dynamics in order to be observable. This enables us to calculate the degree of distinguishability from a classical trajectory. Second, there is constant vacuum fluctuation along a classical trajectory. A novel method is introduced to define the information metrics to measure additional distinguishability due to vacuum fluctuations. Applying the variation principle to minimize the total degree of distinguishability allows us to recover the basic quantum formulations including the uncertainty relation and the Schrödinger equation in both position and momentum representations. Furthermore, the principle brings in new results on two fronts. At the conceptual level, we find that the information metrics for vacuum fluctuations are responsible for manifesting entanglement effects without underlying physical interactions, implying that entanglement effects are non-causal. At the mathematical level, defining the information metrics for vacuum fluctuations using more general definitions of relative entropy results in a generalized Schrödinger equation that depends on the order of relative entropy. The least distinguishability principle is a new mathematical tool, and we expect other advanced quantum formulations can be obtained from it.

翻訳日:2023-08-09 17:24:45 公開日:2023-08-08

# グラフ畳み込みネットワークに対する意味的バックドア攻撃

A semantic backdoor attack against Graph Convolutional Networks ( http://arxiv.org/abs/2302.14353v3 )

ライセンス: Link先を確認

Jiazhu Dai, Zhipeng Xiong

(参考訳) グラフ畳み込みネットワーク(GCN)は、ノード分類やグラフ分類など、様々なグラフ構造化タスクの問題に対処するのに非常に効果的である。しかし、最近の研究では、GCNはバックドア攻撃と呼ばれる新しい種類の脅威に弱いことが示されており、敵は隠れバックドアをGCNに注入することで、攻撃されたモデルが良質なサンプルに対して良好に動作するようにしているが、攻撃者が定義したトリガーによって隠れバックドアがアクティベートされた場合、その予測は攻撃者が指定したターゲットラベルに変更される。本稿では,このようなセマンティックバックドア攻撃がGCNに対して可能かどうかを考察し,GCNにおけるセキュリティ脆弱性の存在を明らかにするために,グラフ分類の文脈下でGCNに対するセマンティックバックドア攻撃(SBAG)を提案する。 SBAGはサンプルの特定の種類のノードをバックドアトリガーとして使用し、トレーニングデータを汚染することでGCNモデルに隠れたバックドアを注入する。バックドアがアクティベートされ、GCNモデルは、サンプルが十分なトリガーノードを含む限り、修正されていないサンプルでも攻撃者が指定した悪意のある分類結果を与える。 4つのグラフデータセット上でSBAGを評価する。実験の結果,SBAGは2種類の攻撃試料に対してそれぞれ約99.9%,82%以上の攻撃成功率を達成でき,中毒率は5%以下であった。

Graph convolutional networks (GCNs) have been very effective in addressing the issue of various graph-structured related tasks, such as node classification and graph classification. However, recent research has shown that GCNs are vulnerable to a new type of threat called a backdoor attack, where the adversary can inject a hidden backdoor into GCNs so that the attacked model performs well on benign samples, but its prediction will be maliciously changed to the attacker-specified target label if the hidden backdoor is activated by the attacker-defined trigger. In this paper, we investigate whether such semantic backdoor attacks are possible for GCNs and propose a semantic backdoor attack against GCNs (SBAG) under the context of graph classification to reveal the existence of this security vulnerability in GCNs. SBAG uses a certain type of node in the samples as a backdoor trigger and injects a hidden backdoor into GCN models by poisoning training data. The backdoor will be activated, and the GCN models will give malicious classification results specified by the attacker even on unmodified samples as long as the samples contain enough trigger nodes. We evaluate SBAG on four graph datasets. The experimental results indicate that SBAG can achieve attack success rates of approximately 99.9% and over 82% for two kinds of attack samples, respectively, with poisoning rates of less than 5%.

翻訳日:2023-08-09 17:24:21 公開日:2023-08-08

# 何が新しいの? 物語における新しい出来事の展開を特定する

Whats New? Identifying the Unfolding of New Events in Narratives ( http://arxiv.org/abs/2302.07748v4 )

ライセンス: Link先を確認

Seyed Mahed Mousavi, Shohei Tanaka, Gabriel Roccabruna, Koichiro Yoshino, Satoshi Nakamura, Giuseppe Riccardi

(参考訳) ナラティブには、時間とコンテキストにまたがる豊富なイベントソースが含まれている。これらの出来事の自動理解は、さらなる計算(推論など)のために物語を要約した理解を提供する。本稿では,イベントの情報状況(IS)を調査し,物語における新たなイベントの自動識別という,新たな課題を提案する。イベントは主題、述語、オブジェクトの三重項として定義します。イベントは、談話の文脈と、コモンセンス推論によって推測できるかどうかに関して、新しく分類される。我々は,人間の注釈を用いて,新しい出来事を文レベルで表現した物語の公開コーパスを注釈した。本稿ではアノテーションプロトコルを提案し,アノテーションの品質とタスクの難易度について検討する。ナラティブ理解のための新しいイベント抽出タスクのために,アノテーション付きデータセット,アノテーション資料,機械学習ベースラインモデルを公開する。

Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.

翻訳日:2023-08-09 17:23:42 公開日:2023-08-08

# アラビア語のエンティティ認識に関する調査:過去・最近の進歩・将来の動向

A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends ( http://arxiv.org/abs/2302.03512v3 )

ライセンス: Link先を確認

Xiaoye Qu, Yingjie Gu, Qingrong Xia, Zechang Li, Zhefeng Wang, Baoxing Huai

(参考訳) アラビア語のテキストがインターネット上に出現するにつれ、これらのアラビア語のテキストから重要な情報を抽出することは特に有用である。基本的な技術として、名前付きエンティティ認識(NER)は情報抽出技術のコアコンポーネントとして機能し、質問応答や知識グラフ構築など多くの自然言語処理(NLP)システムにおいて重要な役割を果たす。本稿では,アラビア語nerの開発,特にディープラーニングと事前学習型言語モデルにおける最近の進歩について概観する。具体的には、アラビア語 NER の背景として、アラビア語 NER の特徴や、アラビア語 NER の既存の資源について紹介する。そこで我々はアラビアNER法の開発を体系的にレビューした。伝統的なアラビア語のNERシステムは機能工学とドメイン固有のルールの設計に重点を置いている。近年,テキストを連続ベクトル表現で表現することで,深層学習が大きな進歩を遂げている。事前訓練された言語モデルの成長に伴い、アラビア語のNERはより良いパフォーマンスを得る。最後に,他の言語からのアラビアNER法とNER法のギャップを解消し,アラビアNERの今後の方向性を概説する。

As more and more Arabic texts emerged on the Internet, extracting important information from these Arabic texts is especially useful. As a fundamental technology, Named entity recognition (NER) serves as the core component in information extraction technology, while also playing a critical role in many other Natural Language Processing (NLP) systems, such as question answering and knowledge graph building. In this paper, we provide a comprehensive review of the development of Arabic NER, especially the recent advances in deep learning and pre-trained language model. Specifically, we first introduce the background of Arabic NER, including the characteristics of Arabic and existing resources for Arabic NER. Then, we systematically review the development of Arabic NER methods. Traditional Arabic NER systems focus on feature engineering and designing domain-specific rules. In recent years, deep learning methods achieve significant progress by representing texts via continuous vector representations. With the growth of pre-trained language model, Arabic NER yields better performance. Finally, we conclude the method gap between Arabic NER and NER methods from other languages, which helps outline future directions for Arabic NER.

翻訳日:2023-08-09 17:23:30 公開日:2023-08-08

# MonoFlow: Wassersteinグラディエントフローの観点からの多様性GANの再考

MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows ( http://arxiv.org/abs/2302.01075v5 )

ライセンス: Link先を確認

Mingxuan Yi, Zhanxing Zhu, Song Liu

(参考訳) GAN(Generative Adversarial Network)における対人訓練の従来の理解は、判別器が分散を推定するために訓練され、生成器はこの分散を最小化する。 GANの多くの変種がこのパラダイムに従って開発されたという事実にもかかわらず、GANとその実践的アルゴリズムの現在の理論的理解は矛盾している。本稿では,サンプル空間における粒子の進化を特徴づけるwasserstein勾配流を利用して,ganの理論的洞察とアルゴリズム的インスピレーションを得る。粒子の進化は単調に増大する対数密度比のマッピングによって再スケールされる。本手法では, 識別器の訓練によりモノフローのベクトル場を得る手順として, 相手のベクトル場によって定義される粒子流を描画することを学ぶ。また,変動発散最小化と逆行訓練の基本的な違いを明らかにする。この解析は,ganの学習にどのような種類のジェネレータ損失関数が寄与するかを明らかにするのに役立ち,モノフローを実現する限り,ganは文献以上の損失設計(例えば,不飽和損失)を持つ可能性があることを示唆する。本フレームワークの有効性を検証するため, 一貫性のある実証研究を含む。

The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration of GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure first obtaining MonoFlow's vector field via training the discriminator and the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggest that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.

翻訳日:2023-08-09 17:22:43 公開日:2023-08-08

# MS-DETR:低結合核融合型マルチスペクトル歩行者検出変換器とモードベース最適化

MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization ( http://arxiv.org/abs/2302.00290v2 )

ライセンス: Link先を確認

Yinghui Xing, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang

(参考訳) 可視・熱変調は特に低照度条件下で相補的な情報を提供することができるため、多スペクトル歩行者検出は、多くの時空応用にとって重要な課題である。利用可能なマルチスペクトル歩行者検出装置のほとんどが非エンド・ツー・エンド検出器に基づいているが,本稿ではマルチスペクトル歩行者検出用トランスフォーマ(ms-detr)を提案し,detrをマルチモーダル検出の分野に拡張する。 ms-detrは2つのモダリティ固有のバックボーンとトランスエンコーダで構成され、続いてマルチモーダルトランスフォーマデコーダがあり、可視性と熱的特徴はマルチモーダルトランスフォーマデコーダで融合される。マルチモーダル画像間の不一致によく抵抗するため,マルチモーダル特徴のキーポイントを個別に抽出し,適応的に学習した注意重みでそれらを融合することにより,疎結合な融合戦略を設計する。さらに、異なるモダリティだけでなく、異なる歩行者インスタンスが最終検出のために異なる信頼度スコアを持つ傾向があるという知見に基づいて、可視およびサーマルデコーダの分岐を保存し、インスタンス毎の動的損失を通じて予測スロットを整列するインスタンス対応モダリティバランス最適化戦略を提案する。我々のエンドツーエンドMS-DETRは、挑戦的なKAIST、CVC-14、LLVIPベンチマークデータセットよりも優れた性能を示している。ソースコードはhttps://github.com/YinghuiXing/MS-DETR で公開されている。

Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Most of the available multispectral pedestrian detectors are based on non-end-to-end detectors, while in this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector, which extends DETR into the field of multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR .

翻訳日:2023-08-09 17:22:19 公開日:2023-08-08

# 重量予測はAdamWの収束を高める

Weight Prediction Boosts the Convergence of AdamW ( http://arxiv.org/abs/2302.00195v2 )

ライセンス: Link先を確認

Lei Guan

(参考訳) 本稿では、ディープニューラルネットワーク(DNN)モデルをトレーニングする際の収束を高めるために、AdamWオプティマイザに重み予測を導入する。特に、各ミニバッチトレーニングの前に、AdamWの更新ルールに従って将来の重量を予測し、予測された将来の重量を前方通過と後方伝播の両方に応用する。このように、AdamWオプティマイザは、常に現在の重みではなく将来の重みの勾配を利用してDNNパラメータを更新し、AdamWオプティマイザはより良い収束を達成する。提案手法は単純で実装が容易だが, DNN トレーニングの収束性向上に有効である。提案手法の有効性を検証するため,画像分類と言語モデリングタスクについて広範な実験を行った。実験の結果,提案手法はDNNモデルのトレーニングにおいて,AdamWの収束を向上し,AdamWよりも精度がよいことがわかった。

In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, we predict the future weights according to the update rule of AdamW and then apply the predicted future weights to do both forward pass and backward propagation. In this way, the AdamW optimizer always utilizes the gradients w.r.t. the future weights instead of current weights to update the DNN parameters, making the AdamW optimizer achieve better convergence. Our proposal is simple and straightforward to implement but effective in boosting the convergence of DNN training. We performed extensive experimental evaluations on image classification and language modeling tasks to verify the effectiveness of our proposal. The experimental results validate that our proposal can boost the convergence of AdamW and achieve better accuracy than AdamW when training the DNN models.
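The idea in the abstract above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the author's implementation: it assumes a one-step lookahead, assumes the prediction reuses the cached AdamW moments from the previous step, and the names `adamw_step` and `train_with_weight_prediction` are hypothetical.

```python
import numpy as np


def adamw_step(w, m, v, t, lr, beta1, beta2, eps, weight_decay):
    """Return the AdamW step (the quantity subtracted from w) for moments m, v at step t."""
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    return lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)


def train_with_weight_prediction(grad_fn, w0, steps=500, lr=0.05,
                                 beta1=0.9, beta2=0.999, eps=1e-8,
                                 weight_decay=0.0):
    """AdamW loop where each gradient is evaluated at predicted future weights.

    Assumption (not from the source): the prediction looks one step ahead
    and reuses the moments cached from the previous update.
    """
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        if t > 1:
            # Predict where AdamW would move w using the cached moments.
            w_pred = w - adamw_step(w, m, v, t - 1, lr, beta1, beta2,
                                    eps, weight_decay)
        else:
            # No moment history yet, so fall back to the current weights.
            w_pred = w
        g = grad_fn(w_pred)  # forward/backward pass at the predicted weights
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # The actual update is still applied to the current weights.
        w = w - adamw_step(w, m, v, t, lr, beta1, beta2, eps, weight_decay)
    return w


# Toy usage: minimise 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([1.0, -2.0])
w_final = train_with_weight_prediction(lambda w: w - target, np.zeros(2))
```

The toy quadratic only shows the control flow. It says nothing about the convergence gains reported in the paper, which were measured on image classification and language modeling tasks.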

翻訳日:2023-08-09 17:21:49 公開日:2023-08-08

# 重力誘起低温原子の絡み合い

Gravitationally-induced entanglement in cold atoms ( http://arxiv.org/abs/2304.00734v2 )

ライセンス: Link先を確認

Richard Howl, Nathan Cooper, Lucia Hackermüller

(参考訳) 実験室で量子重力をテストするための有望なルートは、2つ以上の量子物質系間の重力誘起絡み合い(GIE)を探すことである。このような試験の提案は、主にN00N状態や高スクイーズ状態のような非古典状態のマイクロソリッドシステムを用いている。ここでは、初めて、2つの原子ガス干渉計間のGIEを量子重力のテストとして考える。本稿では、2つの干渉計を並列に配置し、GIEと量子重力の証拠として出力ポートにおける原子数の相関関係を求める。 GIEは、N00NやSchrödinger cat状態のような実現の難しいマクロな重ね合わせ状態を用いなくても可能であり、代わりに原子の古典的な「コヒーレント」状態だけで十分である。これには、原子干渉計の総質量がプランク質量スケールであることと、長い積分時間が必要である。しかし、現在最先端の量子スクイージングがコールド原子で行われていることから、質量スケールは接近可能なレベルまで減少できると論じ、近い将来にそのような質量スケールをどのように達成できるかを詳細に議論する。

A promising route to testing quantum gravity in the laboratory is to look for gravitationally-induced entanglement (GIE) between two or more quantum matter systems. Proposals for such tests have principally used microsolid systems, with highly non-classical states, such as N00N states or highly-squeezed states. Here, we consider, for the first time, GIE between two atomic gas interferometers as a test of quantum gravity. We propose placing the two interferometers next to each other in parallel and looking for correlations in the number of atoms at the output ports as evidence of GIE and quantum gravity. GIE is possible without challenging macroscopic superposition states, such as N00N or Schrödinger cat states, and instead there can be just classical-like 'coherent' states of atoms. This requires the total mass of the atom interferometers to be on the Planck mass scale, and long integration times. However, with current state-of-the-art quantum squeezing in cold atoms, we argue that the mass scale can be reduced to approachable levels and detail how such a mass scale can be achieved in the near future.

翻訳日:2023-08-09 17:14:06 公開日:2023-08-08

# PMAA:マルチ時間衛星画像からの高速雲除去のためのプログレッシブなマルチスケールアテンションオートエンコーダモデル

PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-performance Cloud Removal from Multi-temporal Satellite Imagery ( http://arxiv.org/abs/2303.16565v2 )

ライセンス: Link先を確認

Xuechao Zou, Kai Li, Junliang Xing, Pin Tao, Yachao Cui

(参考訳) 衛星画像解析はリモートセンシングにおいて重要な役割を担っているが、雲による情報損失は適用を著しく妨げている。既存のディープクラウド除去モデルは顕著な成果を上げているが、文脈情報を考えることはほとんどない。本研究では,MAM(Multiscale Attention Module)とLIM(Local Interaction Module)を用いて,グローバルおよびローカル情報を同時利用し,ロバストなコンテキスト依存を構築するための高性能クラウド除去アーキテクチャであるPMAA(Progressive Multi-scale Attention Autoencoder)を紹介する。 PMAAは、MAMを用いたマルチスケール機能の長距離依存性を確立し、LIMを用いた細粒度細部再構築を調整し、細粒度と粗粒度の同時表現を可能にする。多様なマルチスケール機能の助けを借りて、PMAAは2つのベンチマークデータセットで従来の最先端モデルCTGANを一貫して上回っている。さらに、PMAAは、それぞれCTGANのパラメータと計算複雑性の0.5%と14.6%しかなく、かなりの効率上の利点を持っている。これらの総合的な結果は、大規模なクラウド除去タスクを達成するためにエッジデバイスへのデプロイに適した軽量クラウド除去ネットワークとしてのPMAAの可能性を示している。ソースコードと事前トレーニングされたモデルは、https://github.com/xavierjiezou/pmaaで利用可能です。

Satellite imagery analysis plays a pivotal role in remote sensing; however, information loss due to cloud cover significantly impedes its application. Although existing deep cloud removal models have achieved notable outcomes, they scarcely consider contextual information. This study introduces a high-performance cloud removal architecture, termed Progressive Multi-scale Attention Autoencoder (PMAA), which concurrently harnesses global and local information to construct robust contextual dependencies using a novel Multi-scale Attention Module (MAM) and a novel Local Interaction Module (LIM). PMAA establishes long-range dependencies of multi-scale features using MAM and modulates the reconstruction of fine-grained details utilizing LIM, enabling simultaneous representation of fine- and coarse-grained features at the same level. With the help of diverse and multi-scale features, PMAA consistently outperforms the previous state-of-the-art model CTGAN on two benchmark datasets. Moreover, PMAA boasts considerable efficiency advantages, with only 0.5% and 14.6% of the parameters and computational complexity of CTGAN, respectively. These comprehensive results underscore PMAA's potential as a lightweight cloud removal network suitable for deployment on edge devices to accomplish large-scale cloud removal tasks. Our source code and pre-trained models are available at https://github.com/XavierJiezou/PMAA.

翻訳日:2023-08-09 17:13:28 公開日:2023-08-08

# gnnbuilder - 汎用グラフニューラルネットワークアクセラレーション生成,シミュレーション,最適化のための自動化フレームワーク

GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization ( http://arxiv.org/abs/2303.16459v2 )

ライセンス: Link先を確認

Stefan Abi-Karam, Cong Hao

(参考訳) たくさんのグラフニューラルネットワーク(GNN)加速器が提案されている。しかし、それらはユーザーのハードウェアの専門知識に強く依存しており、通常は特定のGNNモデルに最適化されているため、実用上は困難である。そこで、本研究では、GNNBuilder を提案する。これは、最初の自動化された、汎用的な、エンドツーエンドのGNNアクセラレーター生成フレームワークである。 It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3) GNNBuilder supports end-to-end code generation, simulation, accelerator optimization, and hardware deployment, realizing a push-button fashion for GNN accelerator design; (4) GNNBuilder is equipped with accurate performance models of its generated accelerator, enabling fast and flexible design space exploration (DSE). 実験では、まず、我々のアクセラレータ性能モデルがレイテンシ予測で$36\%$以内、BRAMカウント予測で$18\%$以内の誤差を持つことを示した。次に、生成したアクセラレーターはCPUを$6.33\times$、GPUを$6.87\times$上回ります。このフレームワークはオープンソースであり、コードはhttps://github.com/sharc-lab/gnn-builder で入手できる。

There are plenty of graph neural network (GNN) accelerators being proposed. However, they highly rely on users' hardware expertise and are usually optimized for one specific GNN model, making them challenging for practical use. Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework. It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3) GNNBuilder supports end-to-end code generation, simulation, accelerator optimization, and hardware deployment, realizing a push-button fashion for GNN accelerator design; (4) GNNBuilder is equipped with accurate performance models of its generated accelerator, enabling fast and flexible design space exploration (DSE). In the experiments, first, we show that our accelerator performance model has errors within $36\%$ for latency prediction and $18\%$ for BRAM count prediction. Second, we show that our generated accelerators can outperform CPU by $6.33\times$ and GPU by $6.87\times$. This framework is open-source, and the code is available at https://github.com/sharc-lab/gnn-builder.

翻訳日:2023-08-09 17:13:02 公開日:2023-08-08

# 空間フォトニックイジングマシンによる低ランク組合せ最適化と統計的学習

Low-rank combinatorial optimization and statistical learning by spatial photonic Ising machine ( http://arxiv.org/abs/2303.14993v2 )

ライセンス: Link先を確認

Hiroshi Yamashita, Ken-ichi Okubo, Suguru Shimomura, Yusuke Ogura, Jun Tanida, Hideyuki Suzuki

(参考訳) 空間フォトニックイジングマシン (SPIM) [D. Pierangeli et al., Phys. Rev. Lett. 122, 213902 (2019)] は、空間光変調を利用して大規模な組合せ最適化問題を効率的に解くための有望な光学アーキテクチャである。しかし、SPIMの原始バージョンは、ランク1の相互作用行列だけでIsing問題に対応できる。本稿では,任意のイジング問題に光学的実装を変更せずに対応可能なSPIMの新しい計算モデルを提案する。提案モデルはナップサック問題のような低ランク相互作用行列のイジング問題において特に効率的である。さらに、ボルツマンマシンの学習能力を取得する。低ランク相互作用モデルを用いて,MNIST手書き桁画像の学習,分類,サンプリングを効率的に行うことを示す。提案手法は,SPIMアーキテクチャに固有のスケーラビリティを損なうことなく,組合せ最適化と統計的学習の様々な問題に適用可能であることを示す。

The spatial photonic Ising machine (SPIM) [D. Pierangeli et al., Phys. Rev. Lett. 122, 213902 (2019)] is a promising optical architecture utilizing spatial light modulation for solving large-scale combinatorial optimization problems efficiently. The primitive version of the SPIM, however, can accommodate Ising problems with only rank-one interaction matrices. In this Letter, we propose a new computing model for the SPIM that can accommodate any Ising problem without changing its optical implementation. The proposed model is particularly efficient for Ising problems with low-rank interaction matrices, such as knapsack problems. Moreover, it acquires the learning ability of Boltzmann machines. We demonstrate that learning, classification, and sampling of the MNIST handwritten digit images are achieved efficiently using the model with low-rank interactions. Thus, the proposed model exhibits higher practical applicability to various problems of combinatorial optimization and statistical learning, without losing the scalability inherent in the SPIM architecture.
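To make the rank-one versus low-rank distinction above concrete, the following sketch splits a symmetric interaction matrix into rank-one terms by eigendecomposition and checks that the Ising energy is recovered from those terms. This is a generic linear-algebra illustration, not the computing model proposed in the paper. The energy convention $H = -\frac{1}{2}\sum_{ij} J_{ij}\sigma_i\sigma_j$ and the function names are assumptions made here.

```python
import numpy as np


def ising_energy(J, sigma):
    """Ising energy under the assumed convention H = -1/2 * sigma^T J sigma."""
    return -0.5 * sigma @ J @ sigma


def rank_one_terms(J, tol=1e-10):
    """Split a symmetric J into rank-one terms: J = sum_k lam_k * v_k v_k^T."""
    eigvals, eigvecs = np.linalg.eigh(J)
    keep = np.abs(eigvals) > tol  # drop numerically zero directions
    return eigvals[keep], eigvecs[:, keep]


def energy_from_terms(eigvals, eigvecs, sigma):
    """Evaluate H as a weighted sum of squared projections, one per rank-one term."""
    projections = eigvecs.T @ sigma
    return -0.5 * np.sum(eigvals * projections ** 2)


# Toy usage: a rank-two interaction matrix on 8 spins.
rng = np.random.default_rng(0)
a, b = rng.normal(size=8), rng.normal(size=8)
J = np.outer(a, a) - 0.5 * np.outer(b, b)
sigma = rng.choice([-1.0, 1.0], size=8)

lams, vecs = rank_one_terms(J)
direct = ising_energy(J, sigma)
decomposed = energy_from_terms(lams, vecs, sigma)
```

Each retained term has the same form as a rank-one Ising problem, so the number of terms equals the rank of `J`. That is one way to see why low-rank problems are the cheap case.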

翻訳日:2023-08-09 17:12:42 公開日:2023-08-08

# カイラルRydbergモデルにおける量子スピン液体の分類と発生

Classification and emergence of quantum spin liquids in chiral Rydberg models ( http://arxiv.org/abs/2303.12829v2 )

ライセンス: Link先を確認

Poetri Sonya Tarabunga, Giuliano Giudici, Titas Chanda, Marcello Dalmonte

(参考訳) ライドバーグ原子配列で最近実現されたキラル相互作用ハミルトニアンの量子相の性質について検討する。ハニカム格子上のパートン構成を用いて、$\mathrm{U}(1)$ 大域対称性を持つ全ての可能なフェルミオンキラルスピン液体を分類する。得られた分類には、ギャップを持つ量子スピン液体の6つの異なるクラスが含まれ、そのうち2つのクラスから得られる対応する変分波動関数は、粒子密度$1/2$と$1/4$におけるRydberg多体基底状態を正確に記述する。この解析をテンソルネットワークシミュレーションで補完することにより、両方の粒子充填セクタは、$\nu=1/2$分数量子ホール効果と同じトポロジカル秩序を持つスピン液体を持つと結論づける。密度$1/2$では, モデルの位相図を明らかにするが, 密度$1/4$では, 微視的な基底状態とほぼ完全に重なる基底状態波動関数を明示的に構成する。これらの発見は、パートン波動関数を用いてカイラル・リドバーグ模型における量子スピン液体の発見を導く道を開いた。

We investigate the nature of quantum phases arising in chiral interacting Hamiltonians recently realized in Rydberg atom arrays. We classify all possible fermionic chiral spin liquids with $\mathrm{U}(1)$ global symmetry using parton construction on the honeycomb lattice. The resulting classification includes six distinct classes of gapped quantum spin liquids: the corresponding variational wave functions obtained from two of these classes accurately describe the Rydberg many-body ground state at $1/2$ and $1/4$ particle density. Complementing this analysis with tensor network simulations, we conclude that both particle filling sectors host a spin liquid with the same topological order of a $\nu=1/2$ fractional quantum Hall effect. At density $1/2$, our results clarify the phase diagram of the model, while at density $1/4$, they provide an explicit construction of the ground state wave function with almost unit overlap with the microscopic one. These findings pave the way to the use of parton wave functions to guide the discovery of quantum spin liquids in chiral Rydberg models.

翻訳日:2023-08-09 17:12:22 公開日:2023-08-08

# 誘導アテンションを有するハイブリッドスペクトルDenoising Transformer

Hybrid Spectral Denoising Transformer with Guided Attention ( http://arxiv.org/abs/2303.09040v2 )

ライセンス: Link先を確認

Zeqiang Lai, Chenggang Yan, Ying Fu

(参考訳) 本稿では,ハイパースペクトル画像デノージングのためのハイブリッドスペクトルデノージングトランス(HSDT)を提案する。 HSIにトランスフォーマーを適用する際の課題は、効率と柔軟性を維持しつつ、大域的および局所的な空間スペクトル相関を捕捉するCNNベースの手法の既存の制限に対処する能力から生じる。この問題に対処するために,空間スペクトル分離畳み込み(S3Conv),誘導スペクトル自己注意(GSSA),自己変調フィードフォワードネットワーク(SM-FFN)により両モデルの利点を組み合わせたハイブリッド手法を提案する。私たちのS3Convは、3D畳み込みの軽量な代替として機能し、任意のバンド数でHSIに取り組む柔軟性を維持しながら、より空間的・スペクトル的な特徴を抽出します。これらの機能はGSSAによって適応的に処理され、GSSAはスペクトルシグネチャを符号化する学習可能なクエリセットに導かれて、スペクトル帯域にわたる3Dの自己アテンションを実行する。これは我々のモデルに、大域的なスペクトル相関を識別する強力な能力を与えるだけでなく、線形複雑性も維持する。さらに, SM-FFNでは, より情報的領域の活性化を促進させる自己変調法を提案する。シミュレーションと実世界のノイズの両面において,様々な実験を行い,HSDTが計算オーバーヘッドを低く保ちながら既存の最先端手法を著しく上回ることを示す。コードはhttps://github.com/Zeqiang-Lai/HSDT にある。

In this paper, we present a Hybrid Spectral Denoising Transformer (HSDT) for hyperspectral image (HSI) denoising. The challenge in adapting transformers to HSI lies in overcoming the limitations of CNN-based methods in capturing global and local spatial-spectral correlations while maintaining efficiency and flexibility. To address these issues, we introduce a hybrid approach that combines the advantages of both models with a Spatial-Spectral Separable Convolution (S3Conv), Guided Spectral Self-Attention (GSSA), and Self-Modulated Feed-Forward Network (SM-FFN). Our S3Conv works as a lightweight alternative to 3D convolution, which extracts more spatial-spectral correlated features while keeping the flexibility to tackle HSIs with an arbitrary number of bands. These features are then adaptively processed by GSSA, which performs 3D self-attention across the spectral bands, guided by a set of learnable queries that encode the spectral signatures. This not only enriches our model with powerful capabilities for identifying global spectral correlations but also maintains linear complexity. Moreover, our SM-FFN introduces a self-modulation that intensifies the activations of more informative regions, which further strengthens the aggregated features. Extensive experiments on various datasets under both simulated and real-world noise show that our HSDT significantly outperforms the existing state-of-the-art methods while maintaining low computational overhead. Code is at https://github.com/Zeqiang-Lai/HSDT.

翻訳日:2023-08-09 17:12:04 公開日:2023-08-08

# SemARFlow: 自律運転のための教師なし光フロー推定にセマンティックスを注入する

SemARFlow: Injecting Semantics into Unsupervised Optical Flow Estimation for Autonomous Driving ( http://arxiv.org/abs/2303.06209v2 )

ライセンス: Link先を確認

Shuai Yuan, Shuzhi Yu, Hannah Kim and Carlo Tomasi

(参考訳) 教師なし光フロー推定は、特に低テクスチャ領域における閉塞や運動境界付近で困難である。セマンティクスやドメイン知識などの追加情報は、この問題をより制約するのに役立ちます。本稿では,セマンティックセグメンテーションマスクを付加入力として利用する自律運転データのための教師なし光フローネットワークSemARFlowを紹介する。この追加情報はエンコーダに注入され、フロー出力を洗練する学習アップサンプラーに注入される。さらに、単純だが効果的なセマンティック拡張モジュールは、車両、ポール、空のフローとその境界を学習する際の自己スーパービジョンを提供する。これらの意味情報の注入により、KITTI-2015の光学フローテストの誤差は11.80%から8.38%に改善された。また、オブジェクト境界に関する目に見える改善や、データセットをまたいで一般化する能力も示しています。コードはhttps://github.com/duke-vision/semantic-unsup-flow-releaseで入手できる。

Unsupervised optical flow estimation is especially hard near occlusions and motion boundaries and in low-texture regions. We show that additional information such as semantics and domain knowledge can help better constrain this problem. We introduce SemARFlow, an unsupervised optical flow network designed for autonomous driving data that takes estimated semantic segmentation masks as additional inputs. This additional information is injected into the encoder and into a learned upsampler that refines the flow output. In addition, a simple yet effective semantic augmentation module provides self-supervision when learning flow and its boundaries for vehicles, poles, and sky. Together, these injections of semantic information improve the KITTI-2015 optical flow test error rate from 11.80% to 8.38%. We also show visible improvements around object boundaries as well as a greater ability to generalize across datasets. Code is available at https://github.com/duke-vision/semantic-unsup-flow-release.

翻訳日:2023-08-09 17:11:36 公開日:2023-08-08

# クラス固有の反実仮想を用いた本質的に解釈可能なマルチラベル分類

Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals ( http://arxiv.org/abs/2303.00500v2 )

ライセンス: Link先を確認

Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner

(参考訳) 医療画像解析などの高度な応用分野における機械学習アルゴリズムの解釈性は不可欠である。しかし、高いパフォーマンスのブラックボックスニューラルネットワークは予測の説明を提供していないため、不信感や人間とMLのコラボレーションにつながる可能性がある。実際には広く使われているポストホックな説明技術は、深刻な概念的問題に苦しむことが示されている。さらに,本論文で示すように,複数の医学的所見が1つの画像に共生するマルチラベルシナリオでは,現在の説明手法が適切に機能しない。マルチラベル分類のための本質的に解釈可能なモデルであるAttri-Netを提案する。 attri-netは、透明で信頼できる、人間に理解可能な説明を提供する強力な分類器である。モデルはまず、偽物に基づいてクラス固有の帰属マップを生成し、どの画像領域が特定の医学的所見に対応するかを特定する。次に、単純なロジスティック回帰分類器を用いて、これらの帰属写像のみに基づいて予測を行う。 Attri-Netを5つのポストホックな説明手法と3つの胸部X線データセット上の本質的に解釈可能な分類器と比較した。 Attri-Netは、臨床知識と整合した高品質なマルチラベル説明を生成し、最先端の分類モデルに匹敵する分類性能を有する。

Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.

翻訳日:2023-08-09 17:11:23 公開日:2023-08-08

# 異なる負の扱い方:リンク予測のための領域制約と範囲制約による損失関数の強化

Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction ( http://arxiv.org/abs/2303.00286v3 )

ライセンス: Link先を確認

Nicolas Hubert, Pierre Monnin, Armelle Brun, Davy Monticolo

(参考訳) 知識グラフ埋め込みモデル(KGEM)は、リンク予測を含む知識グラフ(KG)に関連する様々なタスクに使用される。それらは、三重項とその対応するラベルのバッチを考慮して計算される損失関数で訓練される。伝統的なアプローチでは、三重項のラベルは真か偽かである。しかし、最近の研究は全ての負の三重項が等しく評価されるべきでないことを示唆している。この仮定に従って、w.r.t.ドメインと範囲制約が意味的に妥当な負の三重項は高品質な負の三重項であると仮定する。したがって、損失関数は、意味的に無効な否定関数とは異なる扱いをするべきである。そこで本研究では,リンク予測のための3つの主損失関数に対する意味駆動型バージョンを提案する。広範かつ制御された実験環境において,提案した損失関数は,異なるスキーマを基盤とする3つの公開ベンチマークKGに対して,体系的に満足度の高い結果を与えることを示す。実際、提案された損失関数は(1) MRR と Hits@10 の値が向上し、(2) KGEM は Sem@K 測定値によって測定されるように、よりセマンティックな認識に向かわせる。これは意味情報がKGEMをグローバルに改善し、損失関数に組み込むべきであることを強調している。ドメインと範囲の関係はスキーマ定義のKGでほとんど利用できますが、このアプローチは実用的にも広く利用できます。

Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions that are computed considering a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that all negative triples should not be valued equally. In line with this recent assumption, we posit that negative triples that are semantically valid w.r.t. domain and range constraints might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative ones. To this aim, we propose semantic-driven versions for the three main loss functions for link prediction. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned with different schemas, which demonstrates both the generality and superiority of our proposed approach. In fact, the proposed loss functions do (1) lead to better MRR and Hits@10 values, (2) drive KGEMs towards better semantic awareness as measured by the Sem@K metric. This highlights that semantic information globally improves KGEMs, and thus should be incorporated into loss functions. Domains and ranges of relations being largely available in schema-defined KGs, this makes our approach both beneficial and widely usable in practice.
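
The idea of treating semantically valid and invalid negatives differently can be sketched with a pairwise hinge loss whose margin depends on the type of negative. This is only an illustration under stated assumptions, not the loss functions defined in the paper (the abstract does not give them): the function name `semantic_hinge_loss`, the parameters `margin` and `valid_factor`, and the choice to shrink the margin for semantically valid negatives are all hypothetical.

```python
def semantic_hinge_loss(pos_score, neg_score, neg_is_semantically_valid,
                        margin=1.0, valid_factor=0.5):
    """Pairwise hinge loss with a margin that depends on the negative's type.

    Assumptions for illustration only: a higher score means a more plausible
    triple, and negatives that satisfy the domain and range constraints get a
    reduced margin (margin * valid_factor) instead of the full margin.
    """
    m = margin * valid_factor if neg_is_semantically_valid else margin
    # Standard hinge: penalise the pair when the negative scores within
    # m of the positive.
    return max(0.0, m + neg_score - pos_score)
```

With the same pair of scores, a semantically invalid negative is pushed further away from the positive than a semantically valid one, which is one simple way to encode "treat different negatives differently".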

翻訳日:2023-08-09 17:11:02 公開日:2023-08-08

# 潜在特徴と接地ラベルの相互情報最大化によるロングテール認識

Long-Tailed Recognition by Mutual Information Maximization between Latent Features and Ground-Truth Labels ( http://arxiv.org/abs/2305.01160v3 )

ライセンス: Link先を確認

Min-Kook Suh and Seung-Woo Seo

(参考訳) コントラスト学習手法は,様々な表現学習タスクにおいて有意な性能を示したが,訓練データセットが長期化されると困難に陥る。多くの研究者は、この問題を解決するためにコントラスト学習とロジット調整技術を組み合わせたが、これらの組み合わせはアドホックに行われ、理論的背景はまだ提供されていない。本稿の目標は,背景を提供し,パフォーマンスをさらに向上させることである。まず,ロングテールタスクに苦しむコントラスト学習の基本的な理由は,潜在特徴量と入力データ間の相互情報最大化を最大化しようとすることである。基底ラベルは最大化では考慮されないため、クラスラベル間の不均衡に対処することはできない。むしろ、ロングテール認識タスクを潜在特徴と接地ラベルの相互情報最大化として解釈する。このアプローチは、コントラスト学習とロジット調整をシームレスに統合し、ロングテール認識ベンチマークで最先端のパフォーマンスを示す損失関数を導出する。また、画像分割タスクにおいて有効性を示し、画像分類を超えた汎用性を検証する。

Although contrastive learning methods have shown prevailing performance on a variety of representation learning tasks, they encounter difficulty when the training dataset is long-tailed. Many researchers have combined contrastive learning and a logit adjustment technique to address this problem, but the combinations are done ad hoc and a theoretical background has not yet been provided. The goal of this paper is to provide the background and further improve the performance. First, we show that the fundamental reason contrastive learning methods struggle with long-tailed tasks is that they try to maximize the mutual information between latent features and input data. As ground-truth labels are not considered in the maximization, they are not able to address imbalances between class labels. Rather, we interpret the long-tailed recognition task as a mutual information maximization between latent features and ground-truth labels. This approach integrates contrastive learning and logit adjustment seamlessly to derive a loss function that shows state-of-the-art performance on long-tailed recognition benchmarks. It also demonstrates its efficacy in image segmentation tasks, verifying its versatility beyond image classification.
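
For readers unfamiliar with the logit adjustment technique mentioned above, the following is a minimal sketch of the generic logit-adjusted cross-entropy, in which the log of the class prior is added to each logit before the softmax. It is not the loss derived in the paper; the function name and the temperature-like parameter `tau` are assumptions made for illustration.

```python
import math

def logit_adjusted_cross_entropy(logits, label, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    logits:       list of raw scores, one per class
    label:        index of the ground-truth class
    class_counts: number of training samples per class (defines the prior)
    """
    total = sum(class_counts)
    adjusted = [z + tau * math.log(n / total)
                for z, n in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]
```

With uninformative logits, a sample from a rare class incurs a larger loss than a sample from a frequent class, which pushes the model to enlarge the margin of tail classes.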

翻訳日:2023-08-09 17:04:13 公開日:2023-08-08

# 周期系におけるキャビティ誘起電荷移動:長さゲージ形式

Cavity-induced charge transfer in periodic systems: length-gauge formalism ( http://arxiv.org/abs/2304.11364v2 )

ライセンス: Link先を確認

Ekaterina Vlasiuk, Valerii K. Kozin, Jelena Klinovaja, Daniel Loss, Ivan V. Iorsh, Ilya V. Tokatly

(参考訳) 光-物質相互作用を誘起する光子キャビティの存在下で1次元周期格子系を扱うための長さゲージ形式を開発した。この形式の目的は、Power-Zienau-Woolleyハミルトニアンの文脈で位置演算子を定義する際に生じる数学的曖昧さを取り除くことである。次に、ダイアグラム法を用いて、電子量子系と長波長のフォトニックキャビティモードとの相互作用を摂動的に解析する。反転対称性の破れたRice-Meleモデルにおけるキャビティ誘起の電荷不均衡と分極を調べることで、この形式の汎用性を示す。

We develop a length-gauge formalism for treating one-dimensional periodic lattice systems in the presence of a photon cavity inducing light-matter interaction. The purpose of the formalism is to remove mathematical ambiguities that occur when defining the position operator in the context of the Power-Zienau-Woolley Hamiltonian. We then use a diagrammatic approach to analyze perturbatively the interaction between an electronic quantum system and a photonic cavity mode of long wavelength. We illustrate the versatility of the formalism by studying the cavity-induced electric charge imbalance and polarization in the Rice-Mele model with broken inversion symmetry.

翻訳日:2023-08-09 17:03:45 公開日:2023-08-08

# wigner friend シナリオにおけるオブザーバ依存事実からフレーム依存計測記録へ

From observer-dependent facts to frame-dependent measurement records in Wigner friend scenarios ( http://arxiv.org/abs/2304.09289v2 )

ライセンス: Link先を確認

J. Allam and A. Matzkin

(参考訳) 友人が測定を行うクローズドラボを外部エージェントが記述するwigner-friendのシナリオの記述は、量子測定のあいまいな性質のために問題となっている。 1つの選択肢は、友人の測定結果が外部の観察者の観点から定義されていないことを考慮し、観察者依存の事実につながる仮定を支持することである。本研究では,エージェントが観測を行う慣性参照フレームに依存する測定記録が,これらの仮定によってもたらされることを示す相対論的文脈のモデルを提案する。我々のモデルは、友人と遠方のエージェントが共有する絡み合ったペアに基づいて、空間的に分離された測定を行う。閉じた実験室に相対して休息中の外部観察者と移動フレームの観測者は観測された記録について一致しないが、これは互いにローレンツ変換ではない。

The description of Wigner-friend scenarios -- in which external agents describe a closed laboratory containing a friend making a measurement -- remains problematic due to the ambiguous nature of quantum measurements. One option is to endorse assumptions leading to observer-dependent facts, given that the friend's measurement outcome is not defined from the point of view of the external observers. We introduce in this work a model in a relativistic context showing that these assumptions can also lead to measurement records that depend on the inertial reference frame in which the agents make their observations. Our model is based on an entangled pair shared by the friend and a distant agent performing space-like separated measurements. An external observer at rest relative to the closed laboratory and observers in a moving frame do not agree on the observed records, which are not Lorentz transforms of one another.

翻訳日:2023-08-09 17:03:33 公開日:2023-08-08

# 局所的最大切断に対する古典上の量子的優位性

A quantum advantage over classical for local max cut ( http://arxiv.org/abs/2304.08420v3 )

ライセンス: Link先を確認

Charlie Carlson, Zackary Jorquera, Alexandra Kolla, Steven Kordonowy

(参考訳) 量子局所アルゴリズムの性能を、よく確立された組合せ最適化問題LocalMaxCut上で、類似の古典的アルゴリズムと比較する。Farhi, Goldstone, Gutmann [1] によって最初に発見された、量子近似最適化アルゴリズム (QAOA) と呼ばれる一般的な量子アルゴリズムが、次数3のグラフ上で比較可能な古典的局所手法よりも計算上優れていることを示す。これらの結果は、現在の最先端の量子ハードウェアに関連する小規模な量子計算であっても、同程度に単純な古典計算よりも大きな利点を持ちうることを示唆している。

We compare the performance of a quantum local algorithm to a similar classical counterpart on a well-established combinatorial optimization problem, LocalMaxCut. We show that a popular quantum algorithm first discovered by Farhi, Goldstone, and Gutmann [1], called the quantum approximate optimization algorithm (QAOA), has a computational advantage over comparable local classical techniques on degree-3 graphs. These results hint that even small-scale quantum computation, which is relevant to the current state-of-the-art quantum hardware, could have significant advantages over comparably simple classical computation.
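
To make the problem concrete, here is a minimal sketch of a classical local search for a locally maximal cut, assuming the usual definition in which a cut is locally maximal when no single vertex can increase the cut size by switching sides. This is a generic textbook-style procedure, not one of the specific classical algorithms compared in the paper; the function names and the adjacency-dictionary format are assumptions.

```python
def local_max_cut(adjacency, side=None):
    """Flip vertices until no single flip increases the cut size.

    adjacency: dict mapping each vertex to the set of its neighbours
    side:      optional initial assignment vertex -> 0 or 1 (default: all 0)
    Returns a dict vertex -> 0 or 1 describing a locally maximal cut.
    """
    side = dict(side) if side is not None else {v: 0 for v in adjacency}
    improved = True
    while improved:
        improved = False
        for v, nbrs in adjacency.items():
            same = sum(1 for u in nbrs if side[u] == side[v])
            # Flipping v gains (same - cross) cut edges, so flip only when
            # strictly more neighbours sit on v's own side.
            if same > len(nbrs) - same:
                side[v] = 1 - side[v]
                improved = True
    return side

def cut_size(adjacency, side):
    """Number of edges whose endpoints lie on different sides."""
    crossing = sum(1 for v, nbrs in adjacency.items()
                   for u in nbrs if side[v] != side[u])
    return crossing // 2  # each undirected edge was counted twice
```

The loop terminates because every flip strictly increases the cut size, which is bounded by the number of edges.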

翻訳日:2023-08-09 17:03:18 公開日:2023-08-08

# 責任あるAIを実装する:倫理的側面の緊張とトレードオフ

Implementing Responsible AI: Tensions and Trade-Offs Between Ethics Aspects ( http://arxiv.org/abs/2304.08275v3 )

ライセンス: Link先を確認

Conrad Sanderson, David Douglas, Qinghua Lu

(参考訳) 責任あるAIに対する多くの倫理原則が、AI/MLシステムの誤用と悪用に関する懸念を和らげるために提案されている。このような原則の基本的な側面は、プライバシー、正確性、公正性、堅牢性、説明可能性、透明性である。しかし、これらの側面の間には潜在的な緊張関係があり、これらの原則に従おうとするAI/ML開発者には困難をもたらしている。例えば、AI/MLシステムの精度を高めることで、その説明可能性を減らすことができる。この作業では、原則を実践するための継続的な取り組みの一環として、10の顕著な緊張、トレードオフ、および基盤となる側面の間のその他の相互作用のカタログをまとめ、議論します。主に双方向の対話に焦点を合わせ、さまざまな文献にまたがるサポートを描いています。このカタログは、倫理原則の側面間の相互作用の認識を高めるとともに、AI/MLシステムのデザイナと開発者による十分に支持された判断を促進するのに役立つ。

Many sets of ethics principles for responsible AI have been proposed to allay concerns about misuse and abuse of AI/ML systems. The underlying aspects of such sets of principles include privacy, accuracy, fairness, robustness, explainability, and transparency. However, there are potential tensions between these aspects that pose difficulties for AI/ML developers seeking to follow these principles. For example, increasing the accuracy of an AI/ML system may reduce its explainability. As part of the ongoing effort to operationalise the principles into practice, in this work we compile and discuss a catalogue of 10 notable tensions, trade-offs and other interactions between the underlying aspects. We primarily focus on two-sided interactions, drawing on support spread across a diverse literature. This catalogue can be helpful in raising awareness of the possible interactions between aspects of ethics principles, as well as facilitating well-supported judgements by the designers and developers of AI/ML systems.

翻訳日:2023-08-09 17:03:07 公開日:2023-08-08

# 顔認証エッジケースに取り組む - 奥行き解析とヒューマンマシン融合アプローチ-

Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach ( http://arxiv.org/abs/2304.08134v3 )

ライセンス: Link先を確認

Martin Knoche and Gerhard Rigoll

(参考訳) 現在、顔認識システムは複数のデータセットで人間のパフォーマンスを上回っている。しかし、マシンが正しく分類できないエッジケースは依然として存在する。本稿では,顔認証タスクにおける機械と操作者の組合せの効果について検討する。まず、いくつかの最先端モデルのエッジケースに注目して、共通のデータセットの困難な設定を見つける。次に,選択タスクの参加者60名を対象に,人間による調査を行い,詳細な分析を行った。最後に、機械と人間の意思決定を組み合わせることで、様々なベンチマークデータセットにおける最先端の顔認証システムの性能をさらに向上できることを実証する。コードとデータはgithubで公開されている。

Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can't correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets' challenging settings. Then, we conduct a study with 60 participants on these selected tasks with humans and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub.

翻訳日:2023-08-09 17:02:50 公開日:2023-08-08

# GaitRef:refined Sequential Skeletonsを用いた歩行認識

GaitRef: Gait Recognition with Refined Sequential Skeletons ( http://arxiv.org/abs/2304.07916v3 )

ライセンス: Link先を確認

Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

(参考訳) 歩行認識と呼ばれる歩行シーケンスで人間を識別することは、遠くから観察できるとともに、被験者の協力を必要としない、有用な生体情報理解タスクである。人の歩行の順序を表すのに使われる2つの一般的な様相はシルエットと関節骨格である。各フレーム内の歩行者の境界を記録するシルエットシーケンスは、その人物の持ち運び物や衣服の様々な外観に苦しむ可能性がある。フレームワイドな関節検出はノイズが多く、シーケンシャルな検出と一致しないジッタを導入する。本稿では,シルエットと骨格を組み合わせることで,歩行認識のためのフレームワイドジョイント予測を洗練する。シルエット配列からの時間的情報を用いて,改良された骨格は付加アノテーションなしで歩行認識性能を向上させることができることを示す。我々は,CASIA-B,OUMVLP,Gait3D,GREWの4つの公開データセットを用いて手法を比較し,最先端の性能を示す。

Identifying humans by their walking sequences, known as gait recognition, is a useful biometric understanding task as it can be observed from a long distance and does not require cooperation from the subject. Two common modalities used for representing the walking sequence of a person are silhouettes and joint skeletons. Silhouette sequences, which record the boundary of the walking person in each frame, may suffer from varying appearances caused by carried objects and the person's clothing. Framewise joint detections are noisy and introduce jitter that is not consistent across the sequence. In this paper, we combine the silhouettes and skeletons and refine the framewise joint predictions for gait recognition. With temporal information from the silhouette sequences, we show that the refined skeletons can improve gait recognition performance without extra annotations. We compare our methods on four public datasets, CASIA-B, OUMVLP, Gait3D and GREW, and show state-of-the-art performance.

翻訳日:2023-08-09 17:02:41 公開日:2023-08-08

# $3+1d$ のフェルミオンガウスPEPS: 回転と相対論的極限

Fermionic Gaussian PEPS in $3+1d$: Rotations and Relativistic Limits ( http://arxiv.org/abs/2304.06744v2 )

ライセンス: Link先を確認

Patrick Emonts, Erez Zohar

(参考訳) フェルミオンガウス射影エンタングルドペア状態(Fermionic Gaussian Projected Entangled Pair States)は、非相互作用フェルミオンハミルトニアンの基底状態の物理を記述するフェルミオンテンソルネットワーク状態の構成である。非相互作用状態として、解析的および数値的な方法で、それらを非常に効率的に研究し分析することができる。近年,いわゆるPEPSゲージ化機構を適用した上で、格子ゲージ理論の変分研究の出発点として用いることができることが示されている。これは符号問題のない変分モンテカルロを用いて行われる。本研究では、スピン表現と格子回転の要求に焦点をあてて、そのような状態を空間2次元から空間3次元に一般化する方法を示す。 $2+1$-d および $3+1$-d モデルにおいて、フェルミオン物質を伴う非摂動的な格子ゲージ理論の物理を研究するために、上記の変分モンテカルロ法の適用に不可欠な構成を示す。したがって、ここで提示される構成はフェルミオンテンソルネットワーク状態を用いた非自明な格子ゲージ理論の研究に不可欠である。

Fermionic Gaussian Projected Entangled Pair States are fermionic tensor network state constructions which describe the physics of ground states of non-interacting fermionic Hamiltonians. As non-interacting states, one may study and analyze them very efficiently, in both analytical and numerical means. Recently it was shown that they may be used as the starting point - after applying so-called PEPS gauging mechanisms - for variational study of lattice gauge theories. This is done using sign-problem free variational Monte-Carlo. In this work we show how to generalize such states from two to three spatial dimensions, focusing on spin representations and requirements of lattice rotations. We present constructions which are crucial for the application of the above mentioned variational Monte-Carlo techniques for studying non-perturbative lattice gauge theory physics, with fermionic matter, in $2+1$-d and $3+1$-d models. Thus, the constructions presented here are crucial for the study of non-trivial lattice gauge theories with fermionic tensor network states.

翻訳日:2023-08-09 17:02:24 公開日:2023-08-08

# ハイブリッド音源を用いた非同期計測デバイス非依存量子鍵分布

Asynchronous measurement-device-independent quantum key distribution with hybrid source ( http://arxiv.org/abs/2304.04569v3 )

ライセンス: Link先を確認

Jun-Lin Bai, Yuan-Mei Xie, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen

(参考訳) 秘密鍵レート容量の線形制約は、ツインフィールド量子鍵分布(QKD)によって克服される。しかし、複雑な位相同期と位相追跡の技術的要求は、ツインフィールドプロトコルの実際の応用を阻害する。非同期計測デバイス非依存(AMDI)QKDプロトコル(モードペアリングQKDプロトコルとも呼ばれる)は、技術的要求を緩和し、ツインフィールドプロトコルと同様の性能を維持することができる。本稿では,信号状態の時間窓において位相ランダム化弱コヒーレント状態を位相ランダム化コヒーレント状態重ね合わせに置き換えることにより,非古典光源を用いたAMDI-QKDプロトコルを提案する。シミュレーションの結果,提案するハイブリッド光源プロトコルはAMDI-QKDプロトコルの鍵レートを大幅に向上するとともに,非古典光源の不完全変調に対するロバスト性を示した。

The linear constraint of secret key rate capacity is overcome by twin-field quantum key distribution (QKD). However, the complex phase-locking and phase-tracking requirements throttle the real-life applications of the twin-field protocol. The asynchronous measurement-device-independent (AMDI) QKD protocol, also called the mode-pairing QKD protocol, can relax the technical requirements while keeping performance similar to that of the twin-field protocol. Here, we propose an AMDI-QKD protocol with a nonclassical light source by changing the phase-randomized weak coherent state to a phase-randomized coherent-state superposition in the signal state time window. Simulation results show that our proposed hybrid source protocol significantly enhances the key rate of the AMDI-QKD protocol, while exhibiting robustness to imperfect modulation of nonclassical light sources.

翻訳日:2023-08-09 17:02:03 公開日:2023-08-08

# コントラスト学習と深いモジュール化に基づく音声分離

Speech Separation based on Contrastive Learning and Deep Modularization ( http://arxiv.org/abs/2305.10652v2 )

ライセンス: Link先を確認

Peter Ochieng

(参考訳) モノラル音声分離の最先端ツールは教師付き学習に依存している。これは、置換問題に対処する必要があることを意味しており、トレーニングと推論で使用する話者数のミスマッチの影響を受ける。さらに、その性能は高品質なラベル付きデータの存在に大きく依存している。これらの問題は、完全に教師なしの音声分離技術を用いることで効果的に解決できる。本稿では,コントラスト学習を用いてフレームの表現を確立し,下流のディープモジュール化タスクで学習表現を使用する。具体的には、音声分離において、話者の異なるフレームを、その話者の隠れた標準フレームの拡張と見なすことができることを実験的に示す。話者のフレームは、音声分離の鍵となる十分な韻律情報の重複を含む。そこで本研究では,与えられた話者に属するフレーム間の距離を最小化する自己教師付き学習を実装する。学習された表現は、下流のディープモジュール化タスクで、話者のアイデンティティに基づいてフレームをクラスタリングするために使用される。 WSJ0-2mix と WSJ0-3mix で評価した結果, WSJ0-2mix では SI-SNRi と SDRi がそれぞれ 20.8 と 21.0 に達した。 WSJ0-3mix では、SI-SNRi と SDRi はそれぞれ 20.7 と 20.7 である。最大の強みは、話者数が増えても、その性能が著しく低下しないことである。

The current state-of-the-art monaural speech separation tools rely on supervised learning. This means that they must deal with the permutation problem and are affected by a mismatch between the number of speakers used in training and in inference. Moreover, their performance heavily relies on the presence of high-quality labelled data. These problems can be effectively addressed by employing a fully unsupervised technique for speech separation. In this paper, we use contrastive learning to establish the representations of frames and then use the learned representations in a downstream deep modularization task. Concretely, we demonstrate experimentally that in speech separation, different frames of a speaker can be viewed as augmentations of a given hidden standard frame of that speaker. The frames of a speaker contain enough overlapping prosodic information, which is key in speech separation. Based on this, we implement self-supervised learning that minimizes the distance between frames belonging to a given speaker. The learned representations are used in a downstream deep modularization task to cluster frames based on speaker identity. Evaluation of the developed technique on WSJ0-2mix and WSJ0-3mix shows that it attains SI-SNRi and SDRi of 20.8 and 21.0 respectively on WSJ0-2mix. On WSJ0-3mix, it attains SI-SNRi and SDRi of 20.7 and 20.7 respectively. Its greatest strength is that its performance does not degrade significantly as the number of speakers increases.
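
The SI-SNRi figures quoted above are improvements in scale-invariant signal-to-noise ratio over the unprocessed mixture. The following is a minimal sketch of how that metric is commonly computed, assuming the standard definition (zero-mean signals, projection of the estimate onto the target); the function names and the small stabilising constant `eps` are choices made here, not taken from the paper.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]   # zero-mean estimate
    t = [x - mean_t for x in target]     # zero-mean target
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    # Component of the estimate that lies along the target.
    s_target = [dot / t_energy * b for b in t]
    # Everything else is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the raw mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what "scale-invariant" refers to.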

翻訳日:2023-08-09 16:54:57 公開日:2023-08-08

# インプラント位置予測のための2ストリーム回帰ネットワーク

Two-Stream Regression Network for Dental Implant Position Prediction ( http://arxiv.org/abs/2305.10044v3 )

ライセンス: Link先を確認

Xinquan Yang and Xuguang Li and Xuechen Li and Wenting Chen and Linlin Shen and Xin Li and Yongqiang Deng

(参考訳) インプラント補綴治療において, 外科的ガイドの設計は, 主観的かつ医師の経験に訴えやすいインプラント位置の手動位置に大きく依存する。この問題を解決するために深層学習法が適用され始めたとき, 歯間空間は様々であり, その一部には実際のインプラント領域と類似したテクスチャ特性を示すものもある。どちらの問題もインプラント位置予測には大きな課題となる。本稿では, 埋込領域検出器 (IRD) とマルチスケールパッチ埋め込み回帰ネットワーク (MSPENet) から構成される2ストリーム埋込位置回帰フレームワーク (TSIPR) を開発し, この問題に対処する。 irdのトレーニングのために、元のアノテーションを拡張して、よりリッチな特徴を持ち、追加のラベリングコストを発生しない、追加の監督情報を提供する。マルチスケールのパッチ埋め込みモジュールはMSPENetが様々な歯の間隔で画像から特徴を適応的に抽出するために設計されている。グローバルローカルな特徴相互作用ブロックは、リッチな特徴表現のための変換器と畳み込みを組み合わせたMSPENetのエンコーダを構築するように設計されている。推測中、IRDから抽出したRoIマスクを用いてMSPENetの予測結果を洗練する。 5倍のクロスバリデーションによる歯科インプラントデータセットの大規模な実験により,提案したTSIPRは既存の方法よりも優れた性能を示した。

In implant prosthesis treatment, the design of the surgical guide heavily relies on manually locating the implant position, which is subjective and dependent on the doctor's experience. Although deep learning based methods have started to be applied to this problem, the spaces between teeth vary, and some of them may present texture characteristics similar to the actual implant region. Both problems pose a big challenge for implant position prediction. In this paper, we develop a two-stream implant position regression framework (TSIPR), which consists of an implant region detector (IRD) and a multi-scale patch embedding regression network (MSPENet), to address this issue. For the training of the IRD, we extend the original annotation to provide additional supervisory information, which contains richer characteristics and does not introduce extra labeling costs. A multi-scale patch embedding module is designed for the MSPENet to adaptively extract features from images with various tooth spacing. The global-local feature interaction block is designed to build the encoder of MSPENet, which combines the transformer and convolution for enriched feature representation. During inference, the RoI mask extracted from the IRD is used to refine the prediction results of the MSPENet. Extensive experiments on a dental implant dataset through five-fold cross-validation demonstrate that the proposed TSIPR achieves superior performance to existing methods.

翻訳日:2023-08-09 16:54:37 公開日:2023-08-08

# 視覚トランスフォーマーとそのcnnトランスフォーマーに基づく変種に関する調査

A survey of the Vision Transformers and its CNN-Transformer based Variants ( http://arxiv.org/abs/2305.09880v3 )

ライセンス: Link先を確認

Asifullah Khan, Zunaira Rauf, Anabia Sohail, Abdul Rehman, Hifsa Asif, Aqsa Asif, and Umair Farooq

(参考訳) 視覚トランスフォーマーは、様々なコンピュータビジョンアプリケーションのための畳み込みニューラルネットワーク(cnns)の代替として人気を博した。これらのトランスフォーマーは、画像のグローバルな関係に焦点を合わせ、大きな学習能力を提供する。しかし、画像の局所的相関をモデル化しないため、限定的な一般化に悩まされることがある。近年,視覚変換器による畳み込み操作と自己認識機構のハイブリッド化が出現し,局所的およびグローバルな画像表現の両面を利用した。これらのハイブリッド視覚トランスフォーマーは、cnn-transformer architectureとも呼ばれ、視覚応用において顕著な結果を示している。急速に増加するハイブリッドビジョントランスフォーマーの数を考えると、これらのハイブリッドアーキテクチャの分類と説明を提供する必要がある。本調査では,近年のビジョントランスフォーマーアーキテクチャの分類,特にハイブリッドビジョントランスフォーマーの分類について述べる。さらに,アテンション機構,位置埋め込み,マルチスケール処理,畳み込みなど,これらのアーキテクチャの重要な特徴についても論じる。個々の視覚トランスフォーマーアーキテクチャやcnnに焦点を当てた以前の調査論文とは対照的に、この調査はハイブリッド視覚トランスフォーマーの新たなトレンドを独特に強調している。ハイブリッドビジョントランスフォーマーが様々なコンピュータビジョンタスクに優れたパフォーマンスをもたらす可能性を示すことによって、この急速に進化するアーキテクチャの今後の方向性を明らかにした。

Vision transformers have become popular as a possible substitute for convolutional neural networks (CNNs) in a variety of computer vision applications. These transformers, with their ability to focus on global relationships in images, offer large learning capacity. However, they may suffer from limited generalization as they do not tend to model local correlation in images. Recently, hybridization of the convolution operation and the self-attention mechanism has emerged in vision transformers, to exploit both local and global image representations. These hybrid vision transformers, also referred to as CNN-Transformer architectures, have demonstrated remarkable results in vision applications. Given the rapidly growing number of hybrid vision transformers, it has become necessary to provide a taxonomy and explanation of these hybrid architectures. This survey presents a taxonomy of the recent vision transformer architectures and more specifically that of the hybrid vision transformers. Additionally, the key features of these architectures such as the attention mechanisms, positional embeddings, multi-scale processing, and convolution are also discussed. In contrast to the previous survey papers that are primarily focused on individual vision transformer architectures or CNNs, this survey uniquely emphasizes the emerging trend of hybrid vision transformers. By showcasing the potential of hybrid vision transformers to deliver exceptional performance across a range of computer vision tasks, this survey sheds light on the future directions of this rapidly evolving architecture.

翻訳日:2023-08-09 16:54:13 公開日:2023-08-08

# 量子論の別の基礎

An alternative foundation of quantum theory ( http://arxiv.org/abs/2305.06727v5 )

ライセンス: Link先を確認

Inge S. Helland

(参考訳) 本稿では,量子論への新たなアプローチを提案する。基本はまず、理論変数、アクセス可能あるいはアクセス不能な変数、すなわち、アクターが任意に鋭い数値をそれらに割り当てることは可能であるか不可能であるかもしれない。認識論的プロセスでは、アクセス可能な変数は、アクターまたは一部の通信アクターと接続された理想的な観察である。群作用はこれらの変数上で定義され、群表現論はここでヒルベルト空間形式論を展開する基礎である。アクセス可能な理論変数に対応する演算子が導出され、離散の場合、可能な物理値はそれらの演算子の固有値であることが証明される。論文の焦点は、提案された量子論の基礎の基礎となるいくつかの数学的定理である。ここで、このアプローチで必要とされる群と変換は、アクセス可能な変数が有限次元である場合に明示的に構成できることを示す。ヒルベルト空間の定式化を再現するには、2つの相補変数の存在を仮定するのに十分である。数学的変数よりも物理変数にのみ焦点を合わせるために、到達不能変数の概念は概念の概念に置き換えられ、この関係において圏論の側面は群論を部分的に置き換える。ここで提案された基礎から推測される解釈は、量子論の一般的なエピステミック解釈と呼ばれる。この解釈の特別な例はQB主義であり、他のいくつかの解釈とも関係している。

A new approach towards quantum theory is proposed in this paper. The basis is first taken to be theoretical variables, variables that may be accessible or inaccessible, i.e., it may be possible or impossible for an actor to assign arbitrarily sharp numerical values to them. In an epistemic process, the accessible variables are just ideal observations connected to an actor or to some communicating actors. Group actions are defined on these variables, and group representation theory is the basis for developing the Hilbert space formalism here. Operators corresponding to accessible theoretical variables are derived, and in the discrete case it is proved that the possible physical values are the eigenvalues of these operators. The focus of the paper is on some mathematical theorems paving the way for the proposed foundation of quantum theory. It is shown here that the groups and transformations needed in this approach can be constructed explicitly in the case where the accessible variables are finite-dimensional. This simplifies the theory considerably: to reproduce the Hilbert space formulation, it is enough to assume the existence of two complementary variables. To focus only on physical variables rather than mathematical variables, the concept of inaccessible variables is then replaced by the concept of notions, and in this connection, aspects of category theory partly replace group theory. The interpretation inferred from the proposed foundation here may be called a general epistemic interpretation of quantum theory. A special case of this interpretation is QBism; it also has a relationship to several other interpretations.

翻訳日:2023-08-09 16:53:33 公開日:2023-08-08

# 量子コンピュータ上の多体系シミュレーションのための量子フローアルゴリズム

Quantum flow algorithms for simulating many-body systems on quantum computers ( http://arxiv.org/abs/2305.05168v2 )

ライセンス: Link先を確認

Karol Kowalski, Nicholas P. Bauman

(参考訳) 我々は,次元活性空間の縮小した固有値問題を通じてヒルベルト空間の大規模部分空間をサンプリングする量子フロー (QFlow) 法を用いて,強相関系の量子シミュレーションを行った。我々のQFlowアルゴリズムは回路の複雑さを大幅に減らし、スケーラブルで一定の回路幅の量子コンピューティングの道を開く。シミュレーションにより,QFlowは必要量子ビットを増大させることなく,パラメータの桁数が桁違いに少ないアクティブ空間を用いて,波動関数パラメータの集合数を最適化できることを示した。

We conducted quantum simulations of strongly correlated systems using the quantum flow (QFlow) approach, which enables sampling large subspaces of the Hilbert space through coupled eigenvalue problems in reduced-dimensionality active spaces. Our QFlow algorithms significantly reduce circuit complexity and pave the way for scalable and constant-circuit-depth quantum computing. Our simulations show that QFlow can optimize the collective number of wave function parameters without increasing the required qubits, using active spaces with an order of magnitude fewer parameters.

Translated: 2023-08-09 16:53:13 Published: 2023-08-08

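# DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration

DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration ( http://arxiv.org/abs/2305.04517v2 )

License: check the linked page

Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Xuecheng Nie

(Reference translation) Blind face restoration (BFR) is important yet challenging. Prior work has favored GAN-based frameworks for this problem because they balance quality and efficiency. However, these methods suffer from poor stability and poor adaptability to long-tail distributions, and they fail to retain the source identity and restore detail at the same time. We propose DiffBFR, which introduces a diffusion probabilistic model (DPM) for BFR, given its advantages over GANs in avoiding training collapse and generating long-tail distributions. DiffBFR uses a two-step design that first restores identity information from low-quality (LQ) images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components. 1) An Identity Restoration Module (IRM) that preserves facial details in the results. Instead of denoising from a pure Gaussian random distribution with the LQ image as the condition of the reverse process, we propose a new truncated sampling method that starts from the LQ image with partial noise added. We prove theoretically that this change shrinks the evidence lower bound of the DPM and restores more of the original details. Based on this proof, we introduce two cascaded conditional DPMs with different input sizes to strengthen the sampling effect and ease the difficulty of training to generate high-resolution images directly. 2) A Texture Enhancement Module (TEM) that polishes the texture of the image. Here we introduce an unconditional DPM, an LQ-free model, to further force the restorations to look realistic. We prove theoretically that this unconditional DPM, trained on pure HQ images, helps justify the correct pixel-level distribution of the inference images output by the IRM. Truncated sampling with fractional time steps is used to polish pixel-level textures while preserving identity information.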

Blind face restoration (BFR) is important yet challenging. Prior works prefer to exploit GAN-based frameworks to tackle this task because of their balance of quality and efficiency. However, these methods suffer from poor stability and poor adaptability to long-tail distributions, failing to simultaneously retain source identity and restore detail. We propose DiffBFR, which introduces a Diffusion Probabilistic Model (DPM) for BFR to tackle the above problem, given its superiority over GANs in avoiding training collapse and generating long-tail distributions. DiffBFR utilizes a two-step design that first restores identity information from low-quality (LQ) images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components: 1) An Identity Restoration Module (IRM) for preserving the face details in results. Instead of denoising from a pure Gaussian random distribution with LQ images as the condition during the reverse process, we propose a novel truncated sampling method which starts from LQ images with partial noise added. We theoretically prove that this change shrinks the evidence lower bound of the DPM and then restores more original details. Based on this proof, two cascaded conditional DPMs with different input sizes are introduced to strengthen this sampling effect and reduce the difficulty of training to generate high-resolution images directly. 2) A Texture Enhancement Module (TEM) for polishing the texture of the image. Here an unconditional DPM, an LQ-free model, is introduced to further force the restorations to appear realistic. We theoretically prove that this unconditional DPM trained on pure HQ images contributes to justifying the correct distribution of inference images output from the IRM in pixel-level space. Truncated sampling with fractional time steps is utilized to polish pixel-level textures while preserving identity information.

Translated: 2023-08-09 16:53:04 Published: 2023-08-08
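The truncated sampling idea in the DiffBFR abstract above can be written out in standard DDPM notation. This is only a sketch under assumptions: the noise schedule $\bar\alpha_t$, the truncation step $\tau$, and the symbols $x_{\mathrm{LQ}}$ and $\epsilon$ are notation chosen here for illustration, and the paper's exact formulation may differ.

```latex
% Standard DDPM forward noising of a clean image x_0 at step t:
\[
  x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \quad t = 1, \dots, T .
\]
% Conventional conditional sampling starts the reverse process from pure noise:
\[
  x_T \sim \mathcal{N}(0, I) .
\]
% Truncated sampling (assumed form): start from the partially noised LQ image
% at an intermediate step tau < T ...
\[
  x_\tau = \sqrt{\bar\alpha_\tau}\, x_{\mathrm{LQ}} + \sqrt{1 - \bar\alpha_\tau}\, \epsilon,
  \qquad 1 \le \tau < T ,
\]
% ... and run the reverse (denoising) process only for t = tau, tau - 1, ..., 1.
```

Under this reading, a smaller $\tau$ keeps more of the LQ input's structure, while a larger $\tau$ gives the model more freedom to regenerate detail.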

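# YOLOCS: Object Detection based on Dense Channel Compression for Feature Spatial Solidification

YOLOCS: Object Detection based on Dense Channel Compression for Feature Spatial Solidification ( http://arxiv.org/abs/2305.04170v3 )

License: check the linked page

Lin Huang, Weisheng Li, Linlin Shen, Haojie Fu, Xue Xiao, Suihan Xiao

(Reference translation) In this study, we focus on forward and backward propagation within the network and examine the relationship between channel features and convolutional kernels during feature purification and gradient backpropagation. On this basis, we propose Dense Channel Compression for Feature Spatial Solidification. Building on the central concept of this method, we introduce two innovative modules for the backbone and head networks: the Dense Channel Compression for Feature Spatial Solidification Structure (DCFS) and the Asymmetric Multi-Level Compression Decoupled Head (ADH). When integrated into the YOLOv5 model, these two modules show exceptional performance, yielding a modified model called YOLOCS. Evaluated on the MSCOCO dataset, the large, medium, and small YOLOCS models achieve AP of 50.1%, 47.6%, and 42.5%, respectively. With inference speeds remarkably similar to those of the YOLOv5 models, the large, medium, and small YOLOCS models exceed the AP of the YOLOv5 models by 1.1%, 2.3%, and 5.2%, respectively.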

In this study, we examine the associations between channel features and convolutional kernels during the processes of feature purification and gradient backpropagation, with a focus on the forward and backward propagation within the network. Consequently, we propose a method called Dense Channel Compression for Feature Spatial Solidification. Drawing upon the central concept of this method, we introduce two innovative modules for backbone and head networks: the Dense Channel Compression for Feature Spatial Solidification Structure (DCFS) and the Asymmetric Multi-Level Compression Decoupled Head (ADH). When integrated into the YOLOv5 model, these two modules demonstrate exceptional performance, resulting in a modified model referred to as YOLOCS. Evaluated on the MSCOCO dataset, the large, medium, and small YOLOCS models yield AP of 50.1%, 47.6%, and 42.5%, respectively. Maintaining inference speeds remarkably similar to those of the YOLOv5 model, the large, medium, and small YOLOCS models surpass the YOLOv5 model's AP by 1.1%, 2.3%, and 5.2%, respectively.

Translated: 2023-08-09 16:52:33 Published: 2023-08-08

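# Photon emission statistics of a driven microwave cavity

Photon emission statistics of a driven microwave cavity ( http://arxiv.org/abs/2305.01986v2 )

License: check the linked page

Pedro Portugal, Fredrik Brange, Kalle S. U. Kansanen, Peter Samuelsson, and Christian Flindt

(Reference translation) Recent experimental advances have made it possible to detect individual quantum jumps in open quantum systems, such as the tunneling of single electrons in nanoscale conductors or the emission of photons from non-classical light sources. In this work, we theoretically investigate the statistics of photons emitted from a microwave cavity that is driven resonantly by an external field. We focus on the differences between parametric and coherent drives, which respectively squeeze or displace the cavity field. Using a theoretical framework based on Gaussian states, we employ a Lindblad master equation dressed with counting fields to obtain the generating function of the photon emission statistics. We then compare the distributions of photon waiting times for the two drives, as well as the $g^{(2)}$-functions of the outgoing light, and identify important differences between these observables. In the long-time limit, we analyze the factorial cumulants of the photon emission statistics and the large-deviation statistics of the emission currents, which differ markedly between the two drives. The theoretical framework can readily be extended to more complicated systems, such as several coupled microwave cavities, and its predictions can be tested in future experiments.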

Recent experimental advances have made it possible to detect individual quantum jumps in open quantum systems, such as the tunneling of single electrons in nanoscale conductors or the emission of photons from non-classical light sources. Here, we investigate theoretically the statistics of photons emitted from a microwave cavity that is driven resonantly by an external field. We focus on the differences between a parametric and a coherent drive, which either squeezes or displaces the cavity field. We employ a Lindblad master equation dressed with counting fields to obtain the generating function of the photon emission statistics using a theoretical framework based on Gaussian states. We then compare the distribution of photon waiting times for the two drives as well as the $g^{(2)}$-functions of the outgoing light, and we identify important differences between these observables. In the long-time limit, we analyze the factorial cumulants of the photon emission statistics and the large-deviation statistics of the emission currents, which are markedly different for the two drives. Our theoretical framework can readily be extended to more complicated systems, for instance, with several coupled microwave cavities, and our predictions may be tested in future experiments.

Translated: 2023-08-09 16:51:49 Published: 2023-08-08
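For readers unfamiliar with the observables named in the abstract above, the textbook definitions can be stated compactly. The notation is an assumption made here, not taken from the paper: $a$ and $a^\dagger$ are the cavity field operators, and $P(n,t)$ is the probability that $n$ photons have been emitted during a time span $t$.

```latex
% Second-order coherence of the stationary field, written with the cavity operators
\[
  g^{(2)}(\tau) =
  \frac{\langle a^\dagger(t)\, a^\dagger(t+\tau)\, a(t+\tau)\, a(t) \rangle}
       {\langle a^\dagger(t)\, a(t) \rangle^{2}} .
\]
% Factorial moment generating function of the number n of emitted photons
\[
  M_F(s,t) = \sum_{n=0}^{\infty} P(n,t)\,(1+s)^{n} ,
\]
% and the factorial cumulants obtained from its logarithm
\[
  \langle\!\langle n^{m} \rangle\!\rangle_F
  = \left. \partial_s^{m} \ln M_F(s,t) \right|_{s=0} .
\]
```

As a sanity check on these definitions, a Poisson distribution with mean $\lambda$ gives $M_F(s,t) = e^{\lambda s}$, so only its first factorial cumulant is nonzero. This is why nonzero higher factorial cumulants signal deviations from uncorrelated emission.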

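# GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting

GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.08325v2 )

License: check the linked page

YanJun Zhao, Ziqing Ma, Tian Zhou, Liang Sun, Mengni Ye, Yi Qian

(Reference translation) Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, long input sequences usually lead to large model size and high time complexity. To address these limitations, we propose GCformer, which combines a structured global convolutional branch that processes long input sequences with a local Transformer-based branch that captures short, recent signals. A cohesive framework for the global convolution kernel is introduced using three distinct parameterization methods. The structured convolutional kernel selected for the global branch is built specifically with sublinear complexity, enabling efficient and effective processing of long, noisy input signals. Empirical studies on six benchmark datasets show that GCformer outperforms state-of-the-art methods, reducing MSE on multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to improve the performance of other models, including various recently published Transformer-based models, with an average improvement of 31.93%. Our code is publicly available at https://github.com/zyj-111/GCformer.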

Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, the long input sequence usually leads to large model size and high time complexity. To address these limitations, we present GCformer, which combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for capturing short, recent signals. A cohesive framework for a global convolution kernel has been introduced, utilizing three distinct parameterization methods. The selected structured convolutional kernel in the global branch has been specifically crafted with sublinear complexity, thereby allowing for the efficient and effective processing of lengthy and noisy input signals. Empirical studies on six benchmark datasets demonstrate that GCformer outperforms state-of-the-art methods, reducing MSE in multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to enhance the performance of other models, including various recently published Transformer-based models, with an average improvement of 31.93%. Our code is publicly available at https://github.com/zyj-111/GCformer.

Translated: 2023-08-09 16:46:07 Published: 2023-08-08

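# GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition ( http://arxiv.org/abs/2306.07848v6 )

License: check the linked page

Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu

(Reference translation) Cross-modality pretraining approaches based on contrastive learning have recently achieved impressive success in a variety of fields. In this paper, we propose GEmo-CLAP, a gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for speech emotion recognition. Specifically, we first build an emotion CLAP model (Emo-CLAP) using pre-trained WavLM and RoBERTa models. Second, given the importance of the gender attribute in speech emotion modeling, we propose two models, a soft-label-based GEmo-CLAP (SL-GEmo-CLAP) and a multi-task-learning-based GEmo-CLAP (ML-GEmo-CLAP), which integrate the emotion and gender information of speech signals to form more reasonable objectives. Extensive experiments on IEMOCAP show that the two proposed GEmo-CLAP models consistently outperform the baseline Emo-CLAP and also achieve the best recognition performance compared with recent state-of-the-art methods. In particular, the proposed SL-GEmo-CLAP model achieves the best UAR of 81.43% and WAR of 83.16%.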

Contrastive-learning-based cross-modality pretraining approaches have recently exhibited impressive success in diverse fields. In this paper, we propose GEmo-CLAP, a gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for speech emotion recognition. Specifically, a novel emotion CLAP model (Emo-CLAP) is first built, utilizing pre-trained WavLM and RoBERTa models. Second, given the significance of the gender attribute in speech emotion modeling, two novel models, a soft-label-based GEmo-CLAP (SL-GEmo-CLAP) and a multi-task-learning-based GEmo-CLAP (ML-GEmo-CLAP), are further proposed to integrate the emotion and gender information of speech signals, forming more reasonable objectives. Extensive experiments on IEMOCAP show that our two proposed GEmo-CLAP models consistently outperform the baseline Emo-CLAP, while also achieving the best recognition performance compared with recent state-of-the-art methods. Notably, the proposed SL-GEmo-CLAP model achieves the best UAR of 81.43% and WAR of 83.16%, outperforming other state-of-the-art SER methods by at least 3%.

Translated: 2023-08-09 16:45:45 Published: 2023-08-08

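# InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models ( http://arxiv.org/abs/2306.03082v2 )

License: check the linked page

Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou

(Reference translation) Large language models (LLMs) are instruction followers, but finding the best instruction for different situations is difficult, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. In each iteration of the proposed method, which we call InstructZero, the soft prompt is converted into an instruction by the open-source LLM, the instruction is submitted to the black-box LLM for zero-shot evaluation, and the resulting performance is passed to Bayesian optimization to produce new soft prompts that improve zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs, including Vicuna and ChatGPT. The results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.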

Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.

Translated: 2023-08-09 16:45:17 Published: 2023-08-08

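# Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs

Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs ( http://arxiv.org/abs/2306.02864v2 )

License: check the linked page

Alejandro Peña, Aythami Morales, Julian Fierrez, Ignacio Serna, Javier Ortega-Garcia, Iñigo Puente, Jorge Cordova, Gonzalo Cordova

(Reference translation) The analysis of public affairs documents is essential for citizens because it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, take part in public discourse, and hold their representatives accountable. For companies whose operations depend on certain regulations, this is crucial and sometimes a matter of life or death. Large language models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in them. In this work, we analyze the performance of LLMs in classifying public affairs documents. As a naturally multi-label task, classifying these documents poses important challenges. We use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments evaluate the performance of four different Spanish LLMs in classifying up to 30 different topics under different configurations. The results show that LLMs can be of great use for processing domain-specific documents, such as those in the domain of public affairs.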

The analysis of public affairs documents is crucial for citizens as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. This is crucial, and sometimes a matter of life or death, for companies whose operations depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in such documents. In this work, we analyze the performance of LLMs in classifying public affairs documents. As a natural multi-label task, the classification of these documents presents important challenges. We use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess the performance of 4 different Spanish LLMs in classifying up to 30 different topics in the data in different configurations. The results show that LLMs can be of great use for processing domain-specific documents, such as those in the domain of public affairs.

Translated: 2023-08-09 16:44:58 Published: 2023-08-08

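# AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap

AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap ( http://arxiv.org/abs/2306.01941v2 )

License: check the linked page

Q. Vera Liao and Jennifer Wortman Vaughan

(Reference translation) The rise of powerful large language models (LLMs) brings tremendous opportunities for innovation, but it also heightens risks for individuals and for society at large. We have reached a pivotal moment for ensuring that LLMs and LLM-infused applications are developed and deployed responsibly. However, a central pillar of responsible AI -- transparency -- is largely missing from the current discourse around LLMs. It is paramount to pursue new approaches to providing transparency for LLMs, and years of research at the intersection of AI and human-computer interaction (HCI) highlight that this must be done with a human-centered perspective. In the new LLM ecosystem, we need to develop and design approaches to transparency by considering the needs of stakeholders, the new types of LLM-infused applications, and the new usage patterns and challenges around LLMs, while building on lessons about how people process, interact with, and make use of information. We reflect on the unique challenges that arise in providing transparency for LLMs, together with lessons learned from HCI and from responsible AI research that has taken a human-centered perspective on AI transparency. We then outline four common approaches that the community has taken to achieve transparency -- model reporting, publishing evaluation results, providing explanations, and communicating uncertainty -- and raise open questions about how these approaches may or may not apply to LLMs. We hope this provides a starting point for discussion and a useful roadmap for future research.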

The rise of powerful large language models (LLMs) brings about tremendous opportunities for innovation but also looming risks for individuals and society at large. We have reached a pivotal moment for ensuring that LLMs and LLM-infused applications are developed and deployed responsibly. However, a central pillar of responsible AI -- transparency -- is largely missing from the current discourse around LLMs. It is paramount to pursue new approaches to provide transparency for LLMs, and years of research at the intersection of AI and human-computer interaction (HCI) highlight that we must do so with a human-centered perspective: Transparency is fundamentally about supporting appropriate human understanding, and this understanding is sought by different stakeholders with different goals in different contexts. In this new era of LLMs, we must develop and design approaches to transparency by considering the needs of stakeholders in the emerging LLM ecosystem, the novel types of LLM-infused applications being built, and the new usage patterns and challenges around LLMs, all while building on lessons learned about how people process, interact with, and make use of information. We reflect on the unique challenges that arise in providing transparency for LLMs, along with lessons learned from HCI and responsible AI research that has taken a human-centered perspective on AI transparency. We then lay out four common approaches that the community has taken to achieve transparency -- model reporting, publishing evaluation results, providing explanations, and communicating uncertainty -- and call out open questions around how these approaches may or may not be applied to LLMs. We hope this provides a starting point for discussion and a useful roadmap for future research.

Translated: 2023-08-09 16:44:42 Published: 2023-08-08

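# Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders ( http://arxiv.org/abs/2305.19259v3 )

License: check the linked page

Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

(Reference translation) Stochastic Gradient Descent (SGD) algorithms are widely used to optimize neural networks, and Random Reshuffling (RR) and Single Shuffle (SS) are common choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, we show that SGD with random and single shuffling is always faster than, or at least as good as, classical SGD with replacement, regardless of the number of iterations. Overall, this study highlights the benefits of using SGD with random or single shuffling and provides new insights into its convergence properties for non-convex optimization.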

Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.

Translated: 2023-08-09 16:44:12 Published: 2023-08-08
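The three data orders compared in the abstract above differ only in how sample indices are drawn in each epoch. The following is a minimal sketch on a toy one-dimensional least-squares problem; the problem, the step size, and all function names are assumptions made here for illustration. It shows the sampling schemes only and does not reproduce the paper's non-convex analysis or its conclusions.

```python
import random


def sgd(data, order_fn, epochs, lr, x0=0.0):
    """Minimize (1/n) * sum_i 0.5 * (x - data[i])**2 using one sample per step."""
    x = x0
    n = len(data)
    for epoch in range(epochs):
        for i in order_fn(epoch, n):
            x -= lr * (x - data[i])  # gradient of 0.5 * (x - data[i])**2
    return x


def with_replacement(rng):
    """Classical SGD: every step draws an index independently."""
    def order(epoch, n):
        return [rng.randrange(n) for _ in range(n)]
    return order


def random_reshuffling(rng):
    """RR: a fresh permutation of the data in every epoch."""
    def order(epoch, n):
        perm = list(range(n))
        rng.shuffle(perm)
        return perm
    return order


def single_shuffle(rng):
    """SS: one permutation drawn once and reused in every epoch."""
    cache = {}

    def order(epoch, n):
        if n not in cache:
            perm = list(range(n))
            rng.shuffle(perm)
            cache[n] = perm
        return cache[n]
    return order


data_rng = random.Random(0)
data = [data_rng.uniform(-1.0, 1.0) for _ in range(50)]
optimum = sum(data) / len(data)  # minimizer of the average loss

results = {}
for name, make_order in [("with replacement", with_replacement),
                         ("random reshuffling", random_reshuffling),
                         ("single shuffle", single_shuffle)]:
    x = sgd(data, make_order(random.Random(1)), epochs=100, lr=0.005)
    results[name] = abs(x - optimum)
    print(f"{name:>20}: |x - x*| = {results[name]:.2e}")
```

Passing the order as a function keeps the update rule identical across the three variants, so any difference in the printed distances comes from the data order alone.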

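# UMD: Unsupervised Model Detection for X2X Backdoor Attacks

UMD: Unsupervised Model Detection for X2X Backdoor Attacks ( http://arxiv.org/abs/2305.18651v3 )

License: check the linked page

Zhen Xiang, Zidi Xiong, Bo Li

(Reference translation) Backdoor (Trojan) attacks are a common threat to deep neural networks: samples from one or more source classes that are embedded with a backdoor trigger are misclassified to adversarial target classes. Existing methods for detecting whether a classifier has been backdoor attacked are mostly designed for attacks with a single adversarial target (e.g., all-to-one attacks). To the best of our knowledge, no existing method can, without supervision, effectively address the more general X2X attack with an arbitrary number of source classes, each paired with an arbitrary target class. In this paper, we propose UMD, the first unsupervised model detection method that effectively detects X2X backdoor attacks through joint inference of the adversarial (source, target) class pairs. In particular, we first define a novel transferability statistic to measure and select a subset of putative backdoor class pairs based on a proposed clustering approach. The selected class pairs are then assessed jointly, based on an aggregation of their reverse-engineered trigger sizes, using a proposed robust and unsupervised anomaly detector for detection inference. We conduct comprehensive evaluations on the CIFAR-10, GTSRB, and Imagenette datasets and show that the unsupervised UMD outperforms SOTA detectors (even those with supervision) by 17%, 4%, and 8%, respectively, in detection accuracy against diverse X2X attacks. We also show the strong detection performance of UMD against several strong adaptive attacks.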

Backdoor (Trojan) attacks are a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes. Existing methods for detecting whether a classifier is backdoor attacked are mostly designed for attacks with a single adversarial target (e.g., all-to-one attacks). To the best of our knowledge, without supervision, no existing method can effectively address the more general X2X attack with an arbitrary number of source classes, each paired with an arbitrary target class. In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs. In particular, we first define a novel transferability statistic to measure and select a subset of putative backdoor class pairs based on a proposed clustering approach. Then, these selected class pairs are jointly assessed based on an aggregation of their reverse-engineered trigger sizes for detection inference, using a robust and unsupervised anomaly detector that we propose. We conduct comprehensive evaluations on the CIFAR-10, GTSRB, and Imagenette datasets, and show that our unsupervised UMD outperforms SOTA detectors (even with supervision) by 17%, 4%, and 8%, respectively, in terms of the detection accuracy against diverse X2X attacks. We also show the strong detection performance of UMD against several strong adaptive attacks.

Translated: 2023-08-09 16:43:51 Published: 2023-08-08
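The UMD abstract above does not say which anomaly detector is applied to the aggregated trigger sizes. The sketch below uses a median-absolute-deviation (MAD) rule purely as a stand-in, so that the idea of flagging class pairs with unusually small reverse-engineered triggers is concrete. The function names, the threshold, and the example numbers are all hypothetical.

```python
from statistics import median


def mad_anomaly_scores(values):
    """Signed robust z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for Gaussian data
    if scale == 0:
        return [0.0] * len(values)
    return [(v - med) / scale for v in values]


def flag_small_outliers(values, threshold=3.0):
    """Indices whose value lies more than `threshold` robust deviations below the median."""
    return [i for i, s in enumerate(mad_anomaly_scores(values)) if s < -threshold]


# Hypothetical aggregated trigger sizes, one per candidate class pair (not from the paper).
trigger_sizes = [95, 102, 98, 110, 105, 12, 99, 101]
print(flag_small_outliers(trigger_sizes))
```

Only the lower tail is flagged, on the assumption (made here, following a common heuristic in reverse-engineering-based detection) that a backdoored class pair needs a much smaller trigger than clean pairs do.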

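# P-NOC: Adversarial CAM Generation for Weakly Supervised Semantic Segmentation

P-NOC: Adversarial CAM Generation for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2305.12522v2 )

License: check the linked page

Lucas David, Helio Pedrini, and Zanoni Dias

(Reference translation) To reduce the need for large sets of supervised segmentation annotations, multiple Weakly Supervised Semantic Segmentation (WSSS) strategies have been devised. These often rely on advanced data and model regularization strategies to encourage useful properties in segmentation priors (e.g., prediction completeness and fidelity to semantic boundaries) despite the lack of annotated information. In this work, we first create a strong baseline by analyzing complementary WSSS techniques and regularization strategies, considering their strengths and limitations. We then propose a new class-specific adversarial erasing strategy, in which two adversarial CAM-generating networks are gradually refined to produce robust semantic segmentation proposals. Experimental results suggest that our approach substantially improves the effectiveness of the baseline, yielding a noticeable improvement on both the Pascal VOC 2012 and MS COCO 2014 datasets.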

To mitigate the necessity for large amounts of supervised segmentation annotation sets, multiple Weakly Supervised Semantic Segmentation (WSSS) strategies have been devised. These will often rely on advanced data and model regularization strategies to instigate the development of useful properties (e.g., prediction completeness and fidelity to semantic boundaries) in segmentation priors, notwithstanding the lack of annotated information. In this work, we first create a strong baseline by analyzing complementary WSSS techniques and regularizing strategies, considering their strengths and limitations. We then propose a new Class-specific Adversarial Erasing strategy, comprising two adversarial CAM generating networks being gradually refined to produce robust semantic segmentation proposals. Empirical results suggest that our approach induces substantial improvement in the effectiveness of the baseline, resulting in a noticeable improvement over both Pascal VOC 2012 and MS COCO 2014 datasets.

Translated: 2023-08-09 16:42:54 Published: 2023-08-08

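# A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations ( http://arxiv.org/abs/2307.02947v2 )

License: check the linked page

Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar

(Reference translation) Reinforcement learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel spiking neural network (SNN) architecture for solving RL problems with real-valued observations. Building on prior work, the proposed model incorporates multi-layered event-based clustering, with the addition of temporal difference (TD)-error modulation and eligibility traces. An ablation study confirms the impact of these components on the model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art PPO algorithm are used as benchmarks. Our network succeeds in finding stable control policies on classic RL environments (mountain car, cart-pole, and acrobot). The proposed model offers an attractive trade-off in terms of computational and hardware implementation requirements. The model requires neither an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. This work therefore contributes to the development of more hardware-efficient RL solutions.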

Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.

Translated: 2023-08-09 16:36:34 Published: 2023-08-08
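The TD error and eligibility traces mentioned in the abstract above are easiest to see in a tabular setting. The sketch below runs plain tabular TD(lambda) on the classic five-state random walk. It is not the paper's spiking architecture, and the environment, hyperparameters, and function name are assumptions chosen here for illustration.

```python
import random


def td_lambda_random_walk(episodes=10000, alpha=0.002, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) value estimation on a 5-state random walk."""
    rng = random.Random(seed)
    n_states = 5
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility traces are reset at the start of each episode
        state = n_states // 2      # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:              # left terminal state, reward 0
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:    # right terminal state, reward 1
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0            # accumulating trace for the visited state
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]  # one scalar TD error updates all states
                traces[s] *= gamma * lam                   # traces decay after every step
            if done:
                break
            state = next_state
    return values


estimates = td_lambda_random_walk()
true_values = [(i + 1) / 6 for i in range(5)]  # exact values for this random walk
for est, true in zip(estimates, true_values):
    print(f"estimate {est:.3f}   exact {true:.3f}")
```

The update needs only the current scalar TD error and a per-state trace, with no replay buffer and no gradient through a global loss, which is the property the abstract emphasizes for hardware-friendly learning.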

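# MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition ( http://arxiv.org/abs/2307.02227v2 )

License: check the linked page

Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

(Reference translation) Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into the supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by the recent success of masked autoencoders (e.g., VideoMAE), we propose MAE-DFER, a novel self-supervised method that leverages large-scale self-supervised pre-training on abundant unlabeled data to substantially advance the development of DFER. Because the vanilla Vision Transformer (ViT) used in VideoMAE requires substantial computation during fine-tuning, MAE-DFER develops an efficient local-global interaction Transformer (LGI-Former) as the encoder. Moreover, in addition to the standalone appearance content reconstruction of VideoMAE, MAE-DFER introduces explicit temporal facial motion modeling to encourage LGI-Former to mine both static appearance information and dynamic motion information. Extensive experiments on six datasets show that MAE-DFER consistently outperforms state-of-the-art supervised methods by significant margins (+6.30% UAR on DFEW and +8.34% UAR on MAFW), confirming that it can learn powerful dynamic facial representations through large-scale self-supervised pre-training. In addition, it performs comparably to or better than VideoMAE while greatly reducing the computational cost (about 38% FLOPs). We believe MAE-DFER opens a new path for the advancement of DFER and can inspire more related research in this field and in other related tasks. Code and models are publicly available at https://github.com/sunlicai/MAE-DFER.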

Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into the supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by the recent unprecedented success of masked autoencoders (e.g., VideoMAE), this paper proposes MAE-DFER, a novel self-supervised method which leverages large-scale self-supervised pre-training on abundant unlabeled data to largely advance the development of DFER. Since the vanilla Vision Transformer (ViT) employed in VideoMAE requires substantial computation during fine-tuning, MAE-DFER develops an efficient local-global interaction Transformer (LGI-Former) as the encoder. Moreover, in addition to the standalone appearance content reconstruction in VideoMAE, MAE-DFER also introduces explicit temporal facial motion modeling to encourage LGI-Former to excavate both static appearance and dynamic motion information. Extensive experiments on six datasets show that MAE-DFER consistently outperforms state-of-the-art supervised methods by significant margins (e.g., +6.30% UAR on DFEW and +8.34% UAR on MAFW), verifying that it can learn powerful dynamic facial representations via large-scale self-supervised pre-training. Besides, it has comparable or even better performance than VideoMAE, while largely reducing the computational cost (about 38% FLOPs). We believe MAE-DFER has paved a new way for the advancement of DFER and can inspire more relevant research in this field and even other related tasks. Codes and models are publicly available at https://github.com/sunlicai/MAE-DFER.

Translated: 2023-08-09 16:36:19 Published: 2023-08-08

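# High-order geometric integrators for the variational Gaussian approximation

High-order geometric integrators for the variational Gaussian approximation ( http://arxiv.org/abs/2306.17608v2 )

License: check the linked page

Roya Moghaddasi Fereidani and Jiří J. L. Vaníček

(Reference translation) Among the single-trajectory Gaussian-based methods for solving the time-dependent Schrödinger equation, the variational Gaussian approximation is the most accurate. In contrast to Heller's original thawed Gaussian approximation, it is symplectic, conserves energy exactly, and can partially account for tunneling. However, the variational method is also much more expensive. To improve its efficiency, we symmetrically compose the second-order symplectic integrator of Faou and Lubich and obtain geometric integrators that can achieve an arbitrary even order of convergence in the time step. We show that the high-order integrators can speed up convergence drastically compared with the second-order algorithm and that, in contrast to the popular fourth-order Runge-Kutta method, they preserve the norm and the symplectic structure exactly. To show that the method is not restricted to low-dimensional systems, we carry out the analysis on a non-separable twenty-dimensional model of coupled Morse oscillators. We also show that the variational method captures tunneling and improves accuracy over the non-variational thawed Gaussian approximation.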

Among the single-trajectory Gaussian-based methods for solving the time-dependent Schrödinger equation, the variational Gaussian approximation is the most accurate one. In contrast to Heller's original thawed Gaussian approximation, it is symplectic, conserves energy exactly, and may partially account for tunneling. However, the variational method is also much more expensive. To improve its efficiency, we symmetrically compose the second-order symplectic integrator of Faou and Lubich and obtain geometric integrators that can achieve an arbitrary even order of convergence in the time step. We demonstrate that the high-order integrators can speed up convergence drastically compared to the second-order algorithm and, in contrast to the popular fourth-order Runge-Kutta method, are time-reversible and conserve the norm and the symplectic structure exactly, regardless of the time step. To show that the method is not restricted to low-dimensional systems, we perform most of the analysis on a non-separable twenty-dimensional model of coupled Morse oscillators. We also show that the variational method may capture tunneling and, in general, improves accuracy over the non-variational thawed Gaussian approximation.

Translated: 2023-08-09 16:35:19 Published: 2023-08-08
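Symmetric composition, the technique named in the abstract above, can be illustrated on a much simpler system. The sketch below applies the well-known "triple-jump" composition to a velocity Verlet step for a classical harmonic oscillator. The oscillator and velocity Verlet stand in for the variational Gaussian dynamics and the Faou-Lubich integrator, and the function names are assumptions chosen here. The paper may use other composition schemes.

```python
import math


def verlet_step(q, p, dt):
    """One velocity Verlet step for H = p**2 / 2 + q**2 / 2 (second order, symmetric)."""
    p -= 0.5 * dt * q
    q += dt * p
    p -= 0.5 * dt * q
    return q, p


def triple_jump_step(q, p, dt, base_step=verlet_step):
    """Fourth-order step built from three rescaled second-order steps."""
    g1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
    g2 = 1.0 - 2.0 * g1  # negative middle substep; the coefficients sum to 1
    for g in (g1, g2, g1):  # palindromic order keeps the composed step symmetric
        q, p = base_step(q, p, g * dt)
    return q, p


def final_error(step, dt, t_end=1.0):
    """Distance from the exact solution q = cos(t), p = -sin(t) at time t_end."""
    q, p = 1.0, 0.0
    for _ in range(round(t_end / dt)):
        q, p = step(q, p, dt)
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))


for name, step in [("velocity Verlet", verlet_step), ("triple jump", triple_jump_step)]:
    ratio = final_error(step, 0.1) / final_error(step, 0.05)
    print(f"{name:>16}: error ratio when halving dt = {ratio:.1f}")
```

Halving the time step should reduce the error by roughly a factor of 2^k for a method of order k, so the printed ratio gives a quick empirical check of the convergence order. Applying `triple_jump_step` to its own output recursively is one way to reach higher even orders.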

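# End-to-End Learnable Multi-Scale Feature Compression for VCM

End-to-End Learnable Multi-Scale Feature Compression for VCM ( http://arxiv.org/abs/2306.16670v3 )

License: check the linked page

Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, and Hui Yong Kim

(Reference translation) The spread of deep learning-based machine vision applications has given rise to a new type of compression called video coding for machines (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance rather than human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are compressed. Recent work has shown that an approach based on the versatile video coding (VVC) standard can reduce BD-rate by up to 96% against the MPEG-VCM feature anchor. However, it is still sub-optimal because VVC was designed for natural images rather than extracted features. Moreover, the high encoding complexity of VVC makes it difficult to design a lightweight encoder without sacrificing performance. To address these challenges, we propose a novel multi-scale feature compression method that enables both end-to-end optimization on the extracted features and the design of lightweight encoders. The proposed model combines a learnable compressor with a multi-scale feature fusion network to remove the redundancy in multi-scale features effectively. Instead of simply cascading the fusion network and the compression network, we integrate the fusion and encoding processes in an interleaved way. The model first encodes a larger-scale feature to obtain a latent representation and then fuses the latent with a smaller-scale feature. This process is repeated until the smallest-scale feature has been fused, and the latent encoded at the final stage is then entropy-coded. As a result, our model reduces BD-rate by at least 52% and shortens the encoding time for object detection by a factor of $\times5$ to $\times27$.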

The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machines (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compression. Recent feature compression works have demonstrated that the versatile video coding (VVC) standard-based approach can achieve a BD-rate reduction of up to 96% against the MPEG-VCM feature anchor. However, it is still sub-optimal as VVC was not designed for extracted features but for natural images. Moreover, the high encoding complexity of VVC makes it difficult to design a lightweight encoder without sacrificing performance. To address these challenges, we propose a novel multi-scale feature compression method that enables both the end-to-end optimization on the extracted features and the design of lightweight encoders. The proposed model combines a learnable compressor with a multi-scale feature fusion network so that the redundancy in the multi-scale features is effectively removed. Instead of simply cascading the fusion network and the compression network, we integrate the fusion and encoding processes in an interleaved way. Our model first encodes a larger-scale feature to obtain a latent representation and then fuses the latent with a smaller-scale feature. This process is successively performed until the smallest-scale feature is fused and then the encoded latent at the final stage is entropy-coded for transmission. The results show that our model outperforms previous approaches by at least 52% BD-rate reduction and requires $\times5$ to $\times27$ less encoding time for object detection...

翻訳日:2023-08-09 16:34:56 公開日:2023-08-08

# 空間的詳細の記憶を用いたパンシャープニングの学習

Learning to Pan-sharpening with Memories of Spatial Details ( http://arxiv.org/abs/2306.16181v3 )

ライセンス: Link先を確認

Maoxun Yuan, Tianyi Zhao, Bo Li, Xingxing Wei

(参考訳) リモートセンシングシステムで最もよく用いられる技術の一つであるパンシャープニングは、パンクロマティック画像の空間的詳細をマルチスペクトル(MS)画像に注入し、高解像度のマルチスペクトル画像を得ることを目的とする。ディープラーニングはその強力なフィッティング能力と効率的な特徴抽出によって広く注目を集めており、優れた性能を達成する様々なパンシャープニング手法が提案されている。しかしながら、現在のパンシャープニング手法は通常、ペアになったパンクロマティック(PAN)画像とMS画像を入力として必要とするため、一部のシナリオでは利用が制限される。この問題に対処するため、本論文では、PAN画像の空間的詳細が主に高周波の手がかりであること、すなわちエッジが入力PAN画像の輪郭を反映していることに着目する。この観察に基づき、いくつかの基本エッジを格納するPAN非依存の表現を開発し、それらを用いて対応するPAN画像の輪郭を構成する。その結果、推論時にはMS画像のみを用いてパンシャープニングを実行できる。この目的のために、メモリベースのネットワークを適用して訓練フェーズ中に空間的詳細を抽出・記憶し、推論時にPAN画像から空間情報を取得する処理を置き換える。これをMemory-based Spatial Details Network(MSDN)と呼ぶ。最後に、提案したMSDNモジュールを既存のディープラーニングベースのパンシャープニング手法に統合し、エンドツーエンドのパンシャープニングネットワークを実現する。Gaofen1衛星とWorldView-4衛星での広範な実験により、提案手法がPAN画像なしで良好な空間的詳細を構築し、最高の性能を達成することを検証する。コードはhttps://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.gitで公開されている。

Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multispectral images (MS) to obtain high-resolution multispectral images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been proposed to achieve remarkable performance. However, current pan-sharpening methods usually require the paired panchromatic (PAN) and MS images as input, which limits their usage in some scenarios. To address this issue, in this paper we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., the edges reflect the contour of input PAN images. This motivates us to develop a PAN-agnostic representation to store some base edges, so as to compose the contour for the corresponding PAN image via them. As a result, we can perform the pan-sharpening task with only the MS image when inference. To this end, a memory-based network is adapted to extract and memorize the spatial details during the training phase and is used to replace the process of obtaining spatial information from PAN images when inference, which is called Memory-based Spatial Details Network (MSDN). Finally, we integrate the proposed MSDN module into the existing deep learning-based pan-sharpening methods to achieve an end-to-end pan-sharpening network. With extensive experiments on the Gaofen1 and WorldView-4 satellites, we verify that our method constructs good spatial details without PAN images and achieves the best performance. The code is available at https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git.

翻訳日:2023-08-09 16:34:26 公開日:2023-08-08

# 調和振動子およびその他の1次元問題の固有状態を記述する経路分布

Path distributions for describing eigenstates of the harmonic oscillator and other 1-dimensional problems ( http://arxiv.org/abs/2306.11155v2 )

ライセンス: Link先を確認

Randall M. Feenstra

(参考訳) 経路の確率振幅が足し合わされて、調和振動子およびその他の単純な1次元問題の波動関数を形成する様子を述べる。各問題について既知の閉形式の経路ベース伝播関数を用いて、波動関数を記述する積分表現を書き下す。この表現は従来、粒子の初期位置に関する積分の形をとるが、ここでは経路の端点間の運動に関連する特性運動量を用いて書き直す。こうして得られた表現は定常位相解析の一般化を用いて解析でき、各固有状態を厳密に記述する経路の分布が導かれる。これらの分布はすべての移動時間に対して有効であるが、長時間について評価すると、特性運動量の実数値かつ非負の関数となることがわかる。特に調和振動子の場合、やや幅の広い分布が得られ、そのピークは、記述される状態のエネルギー固有値に等しい古典的エネルギーに対応する運動量の値に位置する。

The manner in which probability amplitudes of paths sum up to form wave functions of a harmonic oscillator, as well as other, simple 1-dimensional problems, is described. Using known, closed-form, path-based propagators for each problem, an integral expression is written that describes the wave function. This expression conventionally takes the form of an integral over initial locations of a particle, but it is re-expressed here in terms of a characteristic momentum associated with motion between the endpoints of a path. In this manner, the resulting expression can be analyzed using a generalization of stationary-phase analysis, leading to distributions of paths that exactly describe each eigenstate. These distributions are valid for all travel times, but when evaluated for long times they turn out to be real, non-negative functions of the characteristic momentum. For the harmonic oscillator in particular, a somewhat broad distribution is found, peaked at a value of momentum that corresponds to a classical energy which in turn equals the energy eigenvalue for the state being described.

翻訳日:2023-08-09 16:33:54 公開日:2023-08-08

# 文書レイアウトアノテーション:公務領域におけるデータベースとベンチマーク

Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs ( http://arxiv.org/abs/2306.10046v2 )

ライセンス: Link先を確認

Alejandro Peña, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Marcos Grande, Iñigo Puente, Jorge Cordova, Gonzalo Cordova

(参考訳) 毎日、企業、公共団体、市民にとって有用な情報を含む何千ものデジタル文書が生成されている。手作業での処理は不可能であるため、これらの文書の自動処理は特定の分野でますます必要になっている。しかし、多くの場合、テキストのみに基づく解析では、重要度の異なる様々な構成要素を通して提示される情報を十分に理解できないため、この課題は依然として難しい。この点で、文書の基本構成要素を検出し分類することを目的とする文書レイアウト分析(Document Layout Analysis, DLA)は、長年にわたり興味深い研究分野となっている。本研究では、4つの基本レイアウトブロックと4つのテキストカテゴリを含む、異なるレイアウトラベルをデジタル文書に半自動でアノテーションする手順を用いた。この手順を適用し、スペイン行政機関の24のデータソースを用いて、公務領域におけるDLAのための新しいデータベースを収集した。データベースは37.9Kの文書、441K以上の文書ページ、および8つのレイアウトブロック単位に関連付けられた8M以上のラベルで構成される。実験の結果、提案するテキストラベリング手順の有効性が最大99%の精度で確認された。

Every day, thousands of digital documents are generated with useful information for companies, public organizations, and citizens. Given the impossibility of processing them manually, the automatic processing of these documents is becoming increasingly necessary in certain sectors. However, this task remains challenging, since in most cases a text-only based parsing is not enough to fully understand the information presented through different components of varying significance. In this regard, Document Layout Analysis (DLA) has been an interesting research field for many years, which aims to detect and classify the basic components of a document. In this work, we used a procedure to semi-automatically annotate digital documents with different layout labels, including 4 basic layout blocks and 4 text categories. We apply this procedure to collect a novel database for DLA in the public affairs domain, using a set of 24 data sources from the Spanish Administration. The database comprises 37.9K documents with more than 441K document pages, and more than 8M labels associated to 8 layout block units. The results of our experiments validate the proposed text labeling procedure with accuracy up to 99%.

翻訳日:2023-08-09 16:33:37 公開日:2023-08-08

# 大規模言語モデルは本当に優れた論理型推論器か? 総合的な評価とそれ以上

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond ( http://arxiv.org/abs/2306.09841v3 )

ライセンス: Link先を確認

Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria

(参考訳) 論理的推論は、知識工学と人工知能の分野において、一貫して基本的かつ重要な役割を果たしている。近年、大規模言語モデル(LLM)は自然言語処理(NLP)における注目すべき革新として登場し、様々な古典的NLPタスクで顕著な成果を示している。しかし、人間の知性に似た段階的な認知的推論を必要とする論理的推論の課題にLLMが効果的に対処できるかどうかは、未解決のままである。そこで本論文では、このギャップを埋め、包括的な評価を行うことを目指す。第一に、体系的な評価を行うために、15の典型的な論理推論データセットを選択し、演繹的、帰納的、アブダクティブ(仮説的)、および混合形式の推論設定に整理する。評価の包括性を考慮し、3つの代表的なLLM(text-davinci-003、ChatGPT、BARD)を対象とし、選択したすべてのデータセットについてゼロショット、ワンショット、3ショットの設定で評価する。第二に、単純な指標(例えば正解率)のみに依存する従来の評価とは異なり、客観的および主観的な観点から、回答と説明の両方をカバーするきめ細かな評価を提案する。さらに、LLMの論理的欠陥を明らかにするために、問題のある事例を、証拠選択プロセスと推論プロセスという2つの次元からなる5つのエラータイプに分類する。第三に、知識バイアスの影響を避け、LLMの論理的推論能力のベンチマークに純粋に集中するため、中立的な内容の新しいデータセットを提案する。このデータセットは3,000サンプルを含み、演繹的、帰納的、アブダクティブの設定をカバーする。これらの詳細な評価に基づき、本論文は最終的に、6つの次元から論理的推論能力を評価する一般的な評価スキームを構成する。これはLLMの長所と短所を反映し、今後の研究の指針を与える。

Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP), exhibiting impressive achievements across various classic NLP tasks. However, the question of whether LLMs can effectively address the task of logical reasoning, which requires gradual cognitive inference similar to human intelligence, remains unanswered. To this end, we aim to bridge this gap and provide comprehensive evaluations in this paper. Firstly, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings. Considering the comprehensiveness of evaluations, we include three representative LLMs (i.e., text-davinci-003, ChatGPT and BARD) and evaluate them on all selected datasets under zero-shot, one-shot and three-shot settings. Secondly, different from previous evaluations relying only on simple metrics (e.g., accuracy), we propose fine-level evaluations from objective and subjective manners, covering both answers and explanations. Additionally, to uncover the logical flaws of LLMs, problematic cases will be attributed to five error types from two dimensions, i.e., evidence selection process and reasoning process. Thirdly, to avoid the influences of knowledge bias and purely focus on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. It contains 3,000 samples and covers deductive, inductive and abductive settings. Based on the in-depth evaluations, this paper finally forms a general evaluation scheme of logical reasoning capability from six dimensions. It reflects the pros and cons of LLMs and gives guiding directions for future works.

翻訳日:2023-08-09 16:33:20 公開日:2023-08-08

# テキストから画像への生成モデルにおけるデータ帰属の評価

Evaluating Data Attribution for Text-to-Image Models ( http://arxiv.org/abs/2306.09345v2 )

ライセンス: Link先を確認

Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang

(参考訳) 大規模なテキストから画像への生成モデルは「新規」な画像を合成できるが、これらの画像は必然的に訓練データを反映したものである。このようなモデルにおけるデータ帰属の問題 -- 訓練セット内のどの画像が、与えられた生成画像の外観に最も寄与しているか -- は、難しいが重要な問題である。この問題への最初のステップとして、既存の大規模モデルを所定の模範(exemplar)オブジェクトやスタイルに向けてチューニングする「カスタマイズ」手法を通じて帰属を評価する。我々の重要な洞察は、これにより、構成上、模範から計算的に影響を受けた合成画像を効率的に作成できるという点である。このような模範の影響を受けた画像からなる新たなデータセットを用いて、様々なデータ帰属アルゴリズムと様々な特徴空間を評価できる。さらに、このデータセットで訓練することで、DINO、CLIP、ViTなどの標準モデルを帰属問題に向けてチューニングできる。この手順は小さな模範集合に向けて調整されているが、より大きな集合への汎化を示す。最後に、問題に内在する不確実性を考慮することで、訓練画像の集合に対してソフトな帰属スコアを割り当てることができる。

While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is a difficult yet important one. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned towards small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.

翻訳日:2023-08-09 16:32:23 公開日:2023-08-08

# 大規模言語モデルを用いた数学的導出の生成

Generating Mathematical Derivations with Large Language Models ( http://arxiv.org/abs/2307.09998v3 )

ライセンス: Link先を確認

Jordan Meadows, Marco Valentino, Andre Freitas

(参考訳) 大規模言語モデル(LLM)を用いた専門分野における数学的結果の導出は、モデルの限界の特定に役立ち、数学的発見を支援する可能性のある新たな研究方向である。本稿では、記号エンジンを用いて方程式の導出を大規模に生成し、前提から目的の方程式を導出する際のLLMの能力を調べる。具体的には、事前学習戦略と特化モデルの頑健性および汎化性能を比較するため、GPTに対してはコンテキスト内学習を用い、一連のT5モデルを微調整する。実験結果から、微調整したFLAN-T5-large(MathT5)は、従来のスコアにおいて、すべての静的および分布外テストセットでGPTモデルを上回ることが示された。しかし、詳細な分析により、微調整されたモデルは、未知の記号を含む摂動や(より程度は小さいが)方程式構造の変化に対してより敏感であることが明らかになった。さらに、1.7Kの方程式と200以上の導出を分析し、誤った方程式、無関係な方程式、冗長な方程式の混入といった一般的な推論誤りを明らかにする。最後に、数学的導出を評価するための既存指標の適合性を検討し、これらの指標は摂動に対する感度などの一般的な性質を捉えられる一方で、きめ細かな推論誤りやモデル間の本質的な差異を浮き彫りにできないという証拠を得た。全体として、本研究は、合成データでモデルを訓練することで、はるかに大きなLLMを超えて数学能力を向上させられる可能性があることを示すが、現在の指標は生成された数学的テキストの品質を適切に評価できていない。

The derivation of mathematical results in specialised fields, using Large Language Models (LLMs), is an emerging research direction that can help identify models' limitations, and potentially support mathematical discovery. In this paper, we leverage a symbolic engine to generate derivations of equations at scale, and investigate the capabilities of LLMs when deriving goal equations from premises. Specifically, we employ in-context learning for GPT and fine-tune a range of T5 models to compare the robustness and generalisation of pre-training strategies to specialised models. Empirical results show that fine-tuned FLAN-T5-large (MathT5) outperforms GPT models on all static and out-of-distribution test sets in conventional scores. However, an in-depth analysis reveals that the fine-tuned models are more sensitive to perturbations involving unseen symbols and (to a lesser extent) changes to equation structure. In addition, we analyse 1.7K equations, and over 200 derivations, to highlight common reasoning errors such as the inclusion of incorrect, irrelevant, and redundant equations. Finally, we explore the suitability of existing metrics for evaluating mathematical derivations and find evidence that, while they can capture general properties such as sensitivity to perturbations, they fail to highlight fine-grained reasoning errors and essential differences between models. Overall, this work demonstrates that training models on synthetic data may improve their math capabilities beyond much larger LLMs, but current metrics are not appropriately assessing the quality of generated mathematical text.

翻訳日:2023-08-09 16:26:52 公開日:2023-08-08

# 遠距離点群レジストレーションのための密度不変特徴

Density-invariant Features for Distant Point Cloud Registration ( http://arxiv.org/abs/2307.09788v2 )

ライセンス: Link先を確認

Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo

(参考訳) 遠距離の屋外LiDAR点群のレジストレーションは、協調型自動運転車の3Dビジョンを拡張するうえで極めて重要であるが、重なり領域が小さく、観測される点密度の差が非常に大きいため困難である。本稿では、遠距離の屋外LiDAR点群をレジストレーションするために、密度不変な幾何学的特徴を抽出するグループワイズコントラスト学習(GCL)スキームを提案する。我々は理論解析と実験を通じて、密度不変な特徴抽出器を訓練するには、コントラスト学習の正例が独立同分布(i.i.d.)であるべきことを指摘する。この結論に基づき、同一の空間位置にある複数の点群(正例グループと呼ぶ)の特徴を類似させる、単純かつ効果的な訓練スキームを提案する。これにより、一対の点群によって生じるサンプリングバイアスを自然に回避し、i.i.d.の原則に適合させる。結果として得られる完全畳み込み特徴抽出器は、最先端手法よりも強力かつ密度不変であり、KITTIおよびnuScenesベンチマークにおける遠距離シナリオのレジストレーションリコールをそれぞれ40.9%、26.9%改善する。コードはhttps://github.com/liuQuan98/GCLで入手できる。

Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, and yet is challenging due to small overlapping area and a huge disparity between observed point densities. In this paper, we propose Group-wise Contrastive Learning (GCL) scheme to extract density-invariant geometric features to register distant outdoor LiDAR point clouds. We mark through theoretical analysis and experiments that, contrastive positives should be independent and identically distributed (i.i.d.), in order to train density-invariant feature extractors. We propose upon the conclusion a simple yet effective training scheme to force the feature of multiple point clouds in the same spatial location (referred to as positive groups) to be similar, which naturally avoids the sampling bias introduced by a pair of point clouds to conform with the i.i.d. principle. The resulting fully-convolutional feature extractor is more powerful and density-invariant than state-of-the-art methods, improving the registration recall of distant scenarios on KITTI and nuScenes benchmarks by 40.9% and 26.9%, respectively. Code is available at https://github.com/liuQuan98/GCL.

翻訳日:2023-08-09 16:26:27 公開日:2023-08-08

# AesPA-Net:美的パターン認識型スタイル転送ネットワーク

AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks ( http://arxiv.org/abs/2307.09724v3 )

ライセンス: Link先を確認

Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun

(参考訳) 対象スタイルの芸術的表現を伝えるために、近年の研究では、スタイル画像の局所パッチをコンテンツ画像の対応するパッチにマッピングできることから、注意機構が活用されている。しかし、任意のコンテンツとアートワークの間の意味的対応は低いため、注意モジュールはスタイル画像の特定の局所パッチを繰り返し乱用し、不調和で目立つ反復的なアーティファクトを生じさせる。この制限を克服し、非の打ちどころのない芸術的スタイル変換を達成するために、我々は注意機構の強化と、スタイルを構成するパターンのリズムの捕捉に焦点を当てる。本稿では、スタイル画像におけるパターンの反復を定量化する新しい指標であるパターン反復性(pattern repeatability)を導入する。このパターン反復性に基づき、局所的なスタイル表現とグローバルなスタイル表現のスイートスポットを見出す美的パターン認識型スタイル転送ネットワーク(AesPA-Net)を提案する。さらに、注意機構が正確で意味のある意味的対応を学習するよう促す、新たな自己教師ありタスクを提案する。最後に、局所パターンの精巧なリズムを転送するために、パッチ単位のスタイル損失を導入する。定性的および定量的な評価を通じて、提案するパターン反復性が人間の知覚と整合し信頼できることを検証し、提案フレームワークの優位性を示す。

To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and accomplish impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. Based on the pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot of local and global style expressions. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce the patch-wise style loss to transfer the elaborate rhythm of local patterns. Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability that aligns with human perception, and demonstrate the superiority of the proposed framework.

翻訳日:2023-08-09 16:26:05 公開日:2023-08-08

# 人工知能におけるプライバシと進歩のバランス:生物医学研究・教育のための病理組織学における匿名化

Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education ( http://arxiv.org/abs/2307.09426v2 )

ライセンス: Link先を確認

Neel Kanwal, Emiel A.M. Janssen, Kjersti Engan

(参考訳) 生物医学研究の進展は、大量の医療データへのアクセスに大きく依存している。病理組織学の場合、全スライド画像(WSI)と臨床病理学的情報は、デジタル病理学(DP)のための人工知能(AI)アルゴリズムの開発に有用である。医療データを「可能な限りオープンに」転送することは、二次利用におけるデータの有用性を高めるが、患者のプライバシにリスクをもたらす。同時に、既存の規制は、再識別リスクを避けるため、医療データを「必要な限りクローズドに」保つことを推進している。一般に、これらの法的規制は機微なデータの削除を要求するが、現代の画像マッチングアルゴリズムによるデータリンケージ攻撃の可能性を考慮していない。さらに、DPにおける標準化の欠如により、あらゆる形式のWSIに対して単一の解決策を確立することが難しくなっている。これらの課題は、AIアルゴリズムを開発しつつプライバシと進歩のバランスをとろうとするバイオインフォマティクス研究者に問題を提起する。本稿では、医療データ共有に関する法的規制と用語について検討する。既存のアプローチをレビューし、病理組織学の観点から課題を明らかにする。また、学際的な研究と教育を促進するために、組織学的データのためのデータ共有ガイドラインを提示する。

The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bio-informatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.

翻訳日:2023-08-09 16:25:43 公開日:2023-08-08

# なぜわずかなロバスト性が役に立つのか? 代理モデルの訓練から理解する敵対的転移可能性

Why Does Little Robustness Help? Understanding Adversarial Transferability From Surrogate Training ( http://arxiv.org/abs/2307.07873v3 )

ライセンス: Link先を確認

Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

(参考訳) DNNに対する敵対的サンプル(AE)は転移可能であることが示されている。すなわち、ホワイトボックスの代理(サロゲート)モデルを欺くことに成功したAEは、異なるアーキテクチャを持つ他のブラックボックスモデルも欺くことができる。多くの実証研究が転移性の高いAEを生成するための指針を提供してきたが、これらの知見の多くは説明を欠き、矛盾する助言につながることさえある。本稿では、特に代理モデルの側面に焦点を当て、敵対的転移可能性の理解に向けてさらに一歩踏み込む。弱く摂動された敵対的サンプルで敵対的訓練を行ったモデルがより良い代理モデルになるという、興味深い「わずかなロバスト性」現象を出発点とし、これをモデルの滑らかさと勾配類似性という2つの主要因のトレードオフに帰着させる。我々の調査は、それぞれと転移可能性との個別の相関ではなく、両者の共同効果に焦点を当てる。一連の理論的および経験的分析を通じて、敵対的訓練におけるデータ分布シフトが勾配類似性の低下を説明すると推測する。これらの知見に基づき、データ拡張と勾配正則化が転移可能性に与える影響を調べ、このトレードオフが様々な訓練機構に一般的に存在することを確認し、転移可能性の背後にある調整機構の包括的な青写真を構築する。最後に、入力勾配正則化とシャープネス認識最小化(SAM)の組み合わせなど、モデルの滑らかさと勾配類似性の両方を同時に最適化することで転移可能性を高める、より優れた代理モデルを構築するための一般的な道筋を示し、広範な実験で検証する。要約すると、効果的な転移攻撃を行うためには、一方を無視して他方を最適化するのではなく、これら2つの要因の統合的な影響に注意を払うべきであり、代理モデルを操作することの重要な役割を強調する。

Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.

翻訳日:2023-08-09 16:24:50 公開日:2023-08-08

# 大規模言語モデルを用いたテキスト分類のための事前分布適応による教師なし校正

Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models ( http://arxiv.org/abs/2307.06713v2 )

ライセンス: Link先を確認

Lautaro Estienne

(参考訳) 現在、多種多様な自然言語タスクが大規模言語モデル(LLM)で処理されている。これらのモデルは通常、非常に大量の教師なしテキストデータで訓練され、微調整、キャリブレーション、コンテキスト内学習などの手法を用いて下流の自然言語タスクを実行するように適応される。本研究では、ラベル付きサンプルを必要とせず、少数のドメイン内サンプルクエリのみを用いて、テキスト分類タスクを行うために事前クラス分布を適応させるアプローチを提案する。提案アプローチはLLMをブラックボックスとして扱い、モデルの事後確率をタスクに合わせて校正する段階を追加する。結果は、プロンプト内の訓練ショット数が異なる場合において、これらの手法が、適応を行わないモデル、および適応データを用いずにキャリブレーションを行う従来アプローチを上回ることを示している。

A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained with a very large amount of unsupervised text data and adapted to perform a downstream natural language task using methods like fine-tuning, calibration or in-context learning. In this work, we propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples and only a few in-domain sample queries. The proposed approach treats the LLM as a black box, adding a stage where the model posteriors are calibrated to the task. Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt and a previous approach where calibration is performed without using any adaptation data.

翻訳日:2023-08-09 16:23:46 公開日:2023-08-08
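As a rough illustration of the idea in the entry above (adapting the class prior of a black-box classifier using only unlabelled queries), the sketch below rescales posteriors by the ratio of a new prior to the old one and estimates the new prior with a simple EM loop. This is a generic prior-shift correction and is not claimed to be the paper's method; the names `reweight` and `estimate_prior`, the uniform starting prior, and the toy posteriors are all assumptions made for illustration.

```python
def reweight(posterior, old_prior, new_prior):
    """Rescale one posterior vector by new_prior / old_prior and renormalise."""
    scaled = [p * n / o for p, o, n in zip(posterior, old_prior, new_prior)]
    total = sum(scaled)
    return [s / total for s in scaled]


def estimate_prior(posteriors, old_prior, iterations=20):
    """EM-style estimate of the class prior from unlabelled posteriors."""
    prior = list(old_prior)
    for _ in range(iterations):
        # E-step: posteriors under the current prior estimate.
        adapted = [reweight(p, old_prior, prior) for p in posteriors]
        # M-step: the new prior is the average adapted posterior per class.
        prior = [sum(col) / len(adapted) for col in zip(*adapted)]
    return prior


# Hypothetical black-box outputs for five unlabelled in-domain queries.
posteriors = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.6, 0.4], [0.4, 0.6]]
old_prior = [0.5, 0.5]  # assumed prior implied by the un-adapted model
new_prior = estimate_prior(posteriors, old_prior)
calibrated = [reweight(p, old_prior, new_prior) for p in posteriors]
```

The model is only queried for posteriors, which matches the black-box setting described in the abstract; everything else here is a simplification.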

# エッジの平滑化: Hadamard overparametrization を用いたスパース正規化におけるスムース最適化のための汎用フレームワーク

Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization ( http://arxiv.org/abs/2307.03571v2 )

ライセンス: Link先を確認

Chris Kolb and Christian L. Müller and Bernd Bischl and David Rügamer

(参考訳) 本稿では、(構造化)スパース性のための$\ell_q$および$\ell_{p,q}$正則化を伴う目的関数を滑らかに最適化するためのフレームワークを提示する。これらの非滑らかで、場合によっては非凸な問題の解を求めるには、通常、専用の最適化ルーチンに頼ることになる。対照的に、ここで検討する手法は、深層学習で広く普及している既製の(確率的)勾配降下法と互換性があり、近似なしで微分可能なスパース正則化を可能にする。提案する最適化転送は、選択したモデルパラメータの過剰パラメータ化と、それに続くペナルティの変更からなる。過剰パラメータ化された問題では、滑らかで凸な$\ell_2$正則化が、元のパラメータ化における非滑らかかつ非凸な正則化を誘導する。得られる代理問題は、同一の大域的最適解を持つだけでなく、局所的最小値も厳密に保存することを示す。これは、大域解の探索がNP困難であり、局所的最小値がしばしばよく汎化する非凸正則化において特に有用である。スパース性を誘導するパラメータ化に関する様々な文献の流れを一般的な設定のもとで統合する概観を提供し、既存のアプローチを有意義に拡張する。本手法の実現可能性を数値実験により評価し、凸および非凸正則化器の一般的な実装と同等以上の性能を示すことで、その有効性を実証する。

This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly non-convex problems typically relies on specialized optimization routines. In contrast, the method studied here is compatible with off-the-shelf (stochastic) gradient descent that is ubiquitous in deep learning, thereby enabling differentiable sparse regularization without approximations. The proposed optimization transfer comprises an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has an identical global optimum but also exactly preserves the local minima. This is particularly useful in non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various literature strands on sparsity-inducing parametrizations in a general setting and meaningfully extend existing approaches. The feasibility of our approach is evaluated through numerical experiments, demonstrating its effectiveness by matching or outperforming common implementations of convex and non-convex regularizers.

翻訳日:2023-08-09 16:23:32 公開日:2023-08-08
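The simplest instance of the overparametrization described in the entry above can be checked numerically. Writing a scalar parameter as a product b = u * v and penalising u and v with a smooth squared penalty gives an L1 penalty on b, because (u**2 + v**2) / 2 >= |u * v| with equality when |u| = |v|. The sketch below is only an illustration on a one-dimensional denoising objective; the function names, step size, iteration count, and starting point are assumptions and are not taken from the paper.

```python
def soft_threshold(y, lam):
    """Closed-form minimiser of 0.5 * (b - y)**2 + lam * abs(b)."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0


def hadamard_gd(y, lam, lr=0.05, steps=5000, u=1.0, v=0.5):
    """Plain gradient descent on the smooth surrogate
    0.5 * (u*v - y)**2 + 0.5 * lam * (u**2 + v**2); returns b = u * v."""
    for _ in range(steps):
        residual = u * v - y
        grad_u = residual * v + lam * u
        grad_v = residual * u + lam * v
        # Simultaneous update of both factors.
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v


# Compare the smooth surrogate with the non-smooth L1 solution.
for y in (3.0, -3.0, 0.5):
    print(y, hadamard_gd(y, 1.0), soft_threshold(y, 1.0))
```

The asymmetric start (u different from v) is a deliberate choice in this sketch: with u equal to v the product can never become negative under these updates, so a negative target would not be reached.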

# テキスト分類におけるGzip vs. bag-of-words

Gzip versus bag-of-words for text classification ( http://arxiv.org/abs/2307.15002v5 )

ライセンス: Link先を確認

Juri Opitz

(参考訳) テキスト分類における圧縮の有効性('gzip')は最近多くの注目を集めている。本稿では, 'bag-of-words' アプローチが類似あるいは良好な結果を達成し,より効率的であることを示す。

The effectiveness of compression in text classification ('gzip') has recently garnered lots of attention. In this note we show that 'bag-of-words' approaches can achieve similar or better results, and are more efficient.

翻訳日:2023-08-09 16:14:56 公開日:2023-08-08
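To make the comparison in the entry above concrete, the sketch below places a gzip-based normalised compression distance next to a simple bag-of-words distance, both plugged into the same 1-nearest-neighbour rule. It is a toy illustration only: the training sentences, the function names, and the choice of multiset overlap as the bag-of-words distance are assumptions and do not reproduce the note's experimental setup.

```python
import gzip
from collections import Counter


def ncd(x, y):
    """Normalised compression distance based on gzip-compressed lengths."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)


def bow_distance(x, y):
    """One minus the multiset overlap of lower-cased whitespace tokens."""
    a, b = Counter(x.lower().split()), Counter(y.lower().split())
    union = sum((a | b).values())
    return 1.0 - sum((a & b).values()) / union if union else 0.0


def predict(query, train, distance):
    """Label of the nearest training text under the given distance."""
    return min(train, key=lambda item: distance(query, item[0]))[1]


# Hypothetical labelled examples.
train = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("stocks fell as markets closed", "finance"),
    ("the bank raised interest rates", "finance"),
]

print(predict("the goalkeeper saved a goal", train, bow_distance))
print(predict("interest rates fell", train, bow_distance))
```

Passing `ncd` instead of `bow_distance` to `predict` gives the compression-based variant. The bag-of-words distance needs only token counts, whereas the compression distance runs the compressor for every query and training pair.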

# RPG-Palm:掌紋(パームプリント)認識のための現実的な擬似データ生成

RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition ( http://arxiv.org/abs/2307.14016v3 )

ライセンス: Link先を確認

Lei Shen, Jianlong Jin, Ruixin Zhang, Huaen Li, Kai Zhao, Yingyi Zhang, Jingyun Zhang, Shouhong Ding, Yang Zhao, Wei Jia

(参考訳) 掌紋(パームプリント)は、プライバシに配慮した安定なバイオメトリクスであるため、近年、認識アプリケーションにおいて大きな可能性を示している。しかし、大規模な公開掌紋データセットが存在しないことが、掌紋認識のさらなる研究開発を制限している。本稿では、膨大な数のIDを持つ掌紋を合成する、新しい現実的な擬似掌紋生成(RPG)モデルを提案する。まず、クラス内多様性を向上させる条件付き変調生成器を導入する。次に、非ペア訓練のもとでIDの一貫性を確保するために、ID認識損失を提案する。さらに、IDの独立性を保証するために、ベジェ(Bézier)曲線による掌のしわ生成戦略を改良する。広範な実験結果から、合成データによる事前学習が認識モデルの性能を大幅に向上させることが示された。例えば、我々のモデルは、$1:1$および$1:3$のオープンセットプロトコルにおいて、TAR@FAR=1e-6の観点で最先端のBézierPalmを$5\%$および$14\%$以上上回る。実訓練データの$10\%$しか利用できない場合でも、本手法は実訓練データを$100\%$用いたArcFaceを上回っており、実データを必要としない掌紋認識に近づいていることを示している。

Palmprint recently shows great potential in recognition applications as it is a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. We first introduce a conditional modulation generator to improve the intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency against unpaired training. We further improve the Bézier palm creases generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts the recognition model performance. For example, our model improves the state-of-the-art BézierPalm by more than $5\%$ and $14\%$ in terms of TAR@FAR=1e-6 under the $1:1$ and $1:3$ Open-set protocol. When accessing only $10\%$ of the real training data, our method still outperforms ArcFace with $100\%$ real training data, indicating that we are closer to real-data-free palmprint recognition.

翻訳日:2023-08-09 16:14:51 公開日:2023-08-08

# インテリジェントシステムの複雑解析

Complex Analysis of Intelligent Systems ( http://arxiv.org/abs/2307.12905v2 )

ライセンス: Link先を確認

M.W. AlMasri

(参考訳) 論理ゲートは、入力と出力が複数の変数を持つ解析関数である複素微分作用素を用いて書くことができる。複素数の極表現を用いて、系の振動挙動と論理ゲートの間の即時接続に到達する。物理オブジェクトが情報処理に使用するユニバーサルプログラミング言語(UPL)について説明する。 UPLの因果構造を保証するため,各時間スケールの計算を特徴付けるレイヤの概念を導入する。

Logic gates can be written in terms of complex differential operators where the inputs and outputs are analytic functions with several variables. Using the polar representation of complex numbers, we arrive at an immediate connection between the oscillatory behavior of the system and logic gates. We explain the universal programming language (UPL) used by physical objects to process information. To assure the causality structure in UPL, we introduce the concept of layers that characterizes the computations for each time scale.

翻訳日:2023-08-09 16:14:31 公開日:2023-08-08

# 不均衡異常検出のための損傷ビジョンマイニング機会

Damage Vision Mining Opportunity for Imbalanced Anomaly Detection ( http://arxiv.org/abs/2307.12676v3 )

ライセンス: Link先を確認

Takato Yasuno

(参考訳) 過去10年間で、従来のバランスの取れたデータセットは、産業アプリケーションにおける分類、オブジェクト検出、セマンティックセグメンテーション、異常検出のアルゴリズムの進歩に使われてきた。特に、条件ベースのメンテナンスでは、品質を保証するために視覚検査の自動化が不可欠である。予測保守と前向きな修復のための細かな決定過程を最適化するための劣化予測の試み。土木インフラや生活環境において, 被害データマイニングが不均衡なデータ問題を回避することはできない。視覚検査では, コンクリート表面から得られた劣化クラスと鋼材成分とのバランスが, 時々不均衡になる。多くの関連調査から、不均衡なデータ問題は4つのタイプに分類できると要約する。 1)対象物及びラベル有価物の範囲の欠如 2)マイノリティ階級の不均衡 3)空間的不均衡の背景 4) 画素単位の不均衡の長尾クラス。 2015年以降、回帰、画像分類、オブジェクト検出、セマンティックセグメンテーションを含むディープラーニングアプローチを用いた不均衡な研究が数多く行われている。しかし、不均衡なデータの異常検出はまだよく分かっていない。本研究は, 異常クラスの有無にかかわらず一級異常検出アプリケーションに注目し, 血液スメア, 肺感染症, 危険運転, 木質, コンクリート劣化, 河川汚泥, 災害被害等, 不均衡視覚データセットの明確な例を示す。図1に示すように、ダメージビジョンマイニングの優位性に関する重要な結果を提供し、より効果的な正比の範囲、異常検出アプリケーションの精度向上を仮定する。不均衡な研究では、正比1/1の平衡の場合と比較して、正比が適用可能であり、精度は一貫して高いことが判明した。

In the past decade, previous balanced datasets have been used to advance algorithms for classification, object detection, semantic segmentation, and anomaly detection in industrial applications. Specifically, for condition-based maintenance, automating visual inspection is crucial to ensure high quality. Deterioration prognostic attempts to optimize the fine decision process for predictive maintenance and proactive repair. In civil infrastructure and living environment, damage data mining cannot avoid the imbalanced data issue because of rare unseen events and high quality status by improved operations. For visual inspection, deteriorated classes acquired from the surface of concrete and steel components are occasionally imbalanced. From numerous related surveys, we summarize that imbalanced data problems can be categorized into four types: 1) missing range of target and label valuables, 2) majority-minority class imbalance, 3) foreground-background of spatial imbalance, 4) long-tailed class of pixel-wise imbalance. Since 2015, there have been many imbalanced studies using deep learning approaches that include regression, image classification, object detection, and semantic segmentation. However, anomaly detection for imbalanced data is not yet well known. In the study, we highlight one-class anomaly detection application whether anomalous class or not, and demonstrate clear examples on imbalanced vision datasets: blood smear, lung infection, hazardous driving, wooden, concrete deterioration, river sludge, and disaster damage. Illustrated in Fig.1, we provide key results on damage vision mining advantage, hypothesizing that the more effective range of positive ratio, the higher accuracy gain of anomaly detection application. In our imbalanced studies, compared with the balanced case of positive ratio 1/1, we find that there is an applicable positive ratio, where the accuracy is consistently high.

翻訳日:2023-08-09 16:14:23 公開日:2023-08-08

# ProtoFL: 原型蒸留による教師なしフェデレーション学習

ProtoFL: Unsupervised Federated Learning via Prototypical Distillation ( http://arxiv.org/abs/2307.12450v2 )

ライセンス: Link先を確認

Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim

(参考訳) フェデレートラーニング(FL)は、特に認証システムにおいて、データのプライバシ保護を強化するための有望なアプローチである。しかしながら、ラウンドコミュニケーションの制限、表現の不足、スケーラビリティは、デプロイメントに重大な課題をもたらし、その潜在能力を完全に阻害する。本稿では,グローバルモデルの表現力を高め,ラウンドコミュニケーションコストを削減するために,教師なしフェデレーション学習に基づく原型的表現蒸留法である「protofl」を提案する。さらに,正規化フローに基づく局所的な一クラス分類器を導入し,データ制限による性能向上を図る。本研究は,FLを用いた一級分類性能向上のための最初の研究である。我々は,MNIST, CIFAR-10, CIFAR-100, ImageNet-30, Keystroke-Dynamicsの5つの広く利用されているベンチマークにおいて,従来の手法よりも優れた性能を示した。

Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.

翻訳日:2023-08-09 16:13:37 公開日:2023-08-08

# 誤った理由で正しい: 解釈可能なML技術は偽相関を検出できるか?

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? ( http://arxiv.org/abs/2307.12344v2 )

ライセンス: Link先を確認

Susu Sun, Lisa M. Koch, Christian F. Baumgartner

(参考訳) ディープニューラルネットワークモデルは比類のない分類性能を提供するが、データ内のスプリアス相関(偽相関)を学習しやすい。テストデータがトレーニングデータと同じ分布から来ている場合、交絡情報へのこのような依存を性能指標で検出することは困難である。ポストホックな説明や本質的に解釈可能な分類器のような解釈可能なML手法は、誤ったモデル推論を特定できると期待されている。しかし、これらの手法の多くが実際にそれを実現できるかどうかについては、肯定と否定の両方の証拠がある。本稿では,説明手法がスプリアス相関を正しく識別する能力を評価するための厳密な評価戦略を提案する。この戦略を用いて,5つのポストホックな説明手法と1つの本質的に解釈可能な手法について,胸部X線診断タスクに人工的に追加した3種類の交絡因子を検出する能力を評価した。ポストホックな手法であるSHAPと本質的に解釈可能なAttri-Netが最高の性能を示し、誤ったモデルの振る舞いを確実に識別するために使用できることがわかった。

While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net provide the best performance and can be used to reliably identify faulty model behavior.

翻訳日:2023-08-09 16:13:20 公開日:2023-08-08

# GPT-4によるCLIPの強化: プロンプトとしての視覚記述の調和

Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts ( http://arxiv.org/abs/2307.11661v2 )

ライセンス: Link先を確認

Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O'Connor

(参考訳) CLIPのような対照学習で事前訓練された大規模なVLM(Vision-Language Model)は、下流データセットで優れたパフォーマンスを提供することで、視覚表現学習に革命をもたらした。 VLMは、データセットに関連するプロンプトを設計することで、下流データセットに0ショットで適応される。このようなプロンプトエンジニアリングはドメインの専門知識と検証データセットを利用する。一方、GPT-4のような生成型事前学習モデルの最近の発展により、これらは高度なインターネット検索ツールとして使用できる。また、任意の構造で視覚情報を提供するように操作することもできる。本稿では,GPT-4を用いて視覚的な記述を含むテキストを生成できること,およびそれを用いてCLIPを下流タスクに適応させる方法を示す。我々は、CLIPのデフォルトプロンプトと比較して、EuroSAT (~7%)、DTD (~7%)、SUN397 (~4.6%)、CUB (~3.3%)のような特殊な細粒度データセットの0ショット転送精度が大幅に改善することを示す。また,最近提案されたCoCoOPを平均で約2%、4つの特殊な細粒度データセットで4%以上上回る汎用的な分類器を構築するために、最良の文を選択することを学習する単純な数ショットアダプタも設計した。コード、プロンプト、補助テキストデータセットはhttps://github.com/mayug/VDT-Adapterで入手できる。

Contrastive pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are 0-shot adapted to a downstream dataset by designing prompts that are relevant to the dataset. Such prompt engineering makes use of domain expertise and a validation dataset. Meanwhile, recent developments in generative pretrained models like GPT-4 mean they can be used as advanced internet search tools. They can also be manipulated to provide visual information in any structure. In this work, we show that GPT-4 can be used to generate text that is visually descriptive and how this can be used to adapt CLIP to downstream tasks. We show considerable improvements in 0-shot transfer accuracy on specialized fine-grained datasets like EuroSAT (~7%), DTD (~7%), SUN397 (~4.6%), and CUB (~3.3%) when compared to CLIP's default prompt. We also design a simple few-shot adapter that learns to choose the best possible sentences to construct generalizable classifiers that outperform the recently proposed CoCoOP by ~2% on average and by over 4% on 4 specialized fine-grained datasets. The code, prompts, and auxiliary text dataset is available at https://github.com/mayug/VDT-Adapter.

翻訳日:2023-08-09 16:13:03 公開日:2023-08-08

# 多目的フェデレーション学習によるSecureBoostハイパーパラメータチューニング

SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning ( http://arxiv.org/abs/2307.10579v3 )

ライセンス: Link先を確認

Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Yongxin Tong and Qiang Yang

(参考訳) SecureBoostは、準同型暗号化を活用して、垂直連邦学習環境でデータのプライバシを保護するツリーブースティングアルゴリズムである。金融や医療などの分野では、解釈可能性、有効性、プライバシー保護能力によって広く利用されている。しかしSecureBoostは、高い計算複雑性とラベルリークのリスクに悩まされている。 SecureBoostの潜在能力を最大限活用するためには、SecureBoostのハイパーパラメータを慎重に選択して、ユーティリティ、効率、プライバシの最適なバランスをとる必要がある。既存の手法では経験的あるいはヒューリスティックにハイパーパラメータを設定するが、それらは最適とはほど遠い。このギャップを埋めるために、制約付きマルチオブジェクトセキュアBoost(CMOSB)アルゴリズムを提案し、各ソリューションがユーティリティ損失、トレーニングコスト、プライバシリークの間の最適なトレードオフを達成するためのハイパーパラメータのセットである、Pareto最適解を見つける。 3つの目的の測定を設計する。特に,提案したインスタンスクラスタリング攻撃を用いて,プライバシリークを測定する。実験により、CMOSBはベースラインよりも優れたハイパーパラメータを得るだけでなく、FL参加者のフレキシブルな要求を満たすための最適なハイパーパラメータセットも得られることが示された。

SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and a risk of label leakage. To harness the full potential of SecureBoost, its hyperparameters should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods set hyperparameters either empirically or heuristically, which is far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions, where each solution is a set of hyperparameters achieving an optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements of the three objectives. In particular, the privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.
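
To make the notion of a Pareto optimal hyperparameter set concrete, the following is a minimal Python sketch of non-dominated filtering over the three objectives named in the abstract. It is not the CMOSB algorithm itself; the function names and the objective values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each candidate is (utility_loss, training_cost, privacy_leakage).
# Lower is better for all three objectives.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up objective values for four hypothetical hyperparameter settings.
candidates = [
    (0.10, 5.0, 0.30),
    (0.20, 3.0, 0.20),
    (0.25, 6.0, 0.35),  # worse than the first setting on every objective
    (0.05, 9.0, 0.40),
]
front = pareto_front(candidates)
```

Every setting left in `front` is a different tradeoff: improving one objective from there means giving up another, which is why a set of solutions rather than a single one is returned.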

翻訳日:2023-08-09 16:12:41 公開日:2023-08-08

# 実世界応用における事前学習言語モデルの再利用性の向上

Improving the Reusability of Pre-trained Language Models in Real-world Applications ( http://arxiv.org/abs/2307.10457v3 )

ライセンス: Link先を確認

Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno, and Hamed Khanpour

(参考訳) 最先端の事前学習言語モデル(PLM)の再利用可能性はしばしば、その一般化問題によって制限され、トレーニングデータセットと異なる例であるOOD(Out-of-Distribution)/unseenの例で評価すると、その性能が劇的に低下する。この制限はplmsがスプリアス相関に依存しており、頻繁な例型ではうまく機能するが、一般的な例ではうまく機能しない。この問題に対処するため,我々は Masked Language Modeling (MLM) トレーニング目標を微調整プロセスに統合して PLM の一般化を向上する Mask-tuning というトレーニング手法を提案する。総合的な実験により、Mask-tuningは現在の最先端技術を超え、PLMのOODデータセットへの一般化を促進しながら、分散データセットのパフォーマンスを改善している。この結果から,マスクチューニングにより,見えないデータ上でのPLMの再利用性が向上し,現実のアプリケーションにおいてより実用的で効果的であることが示唆された。

The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem, where their performance drastically decreases when evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs' generalization on OOD datasets while improving their performance on in-distribution datasets. The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.

翻訳日:2023-08-09 16:12:21 公開日:2023-08-08

# Degeneration-Tuning: Scrambled Gridを用いてStable Diffusionから不要な概念を遮蔽する

Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion ( http://arxiv.org/abs/2308.02552v2 )

ライセンス: Link先を確認

Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian

(参考訳) トレーニングデータにおけるコンテンツの制約のない性質のため、SD(Stable Diffusion)のような大規模なテキストから画像への拡散モデルは、対応するテキスト概念情報に基づいて、著作権で保護されている可能性のあるコンテンツや危険なコンテンツを含む画像を生成できる。これには、特定の知的財産権(IP)、人間の顔、様々な芸術様式が含まれる。しかし、広く使われるコンテンツ削除の方法であるNegative Promptは、推論ロジックに固有の制限があるため、しばしばこのコンテンツを隠すことに失敗する。本研究では,不要な概念の内容をSDの重みから遮蔽するための新しい戦略である \textbf{Degeneration-Tuning (DT)} を提案する。 Scrambled Gridを利用して、望ましくない概念とそれに対応する画像領域の相関関係を再構築することにより、そのようなテキスト概念が入力として提供されるとき、SDが無意味なコンテンツを生成するように誘導する。この適応はモデルの重みのレベルで発生するため、DT後のSDはControlNetのような他の条件付き拡散フレームワークに移植して不要な概念を遮蔽することができる。各種概念の保護におけるDT法の有効性を定性的に示すことに加えて,DT前後のSDの定量的比較は,DT法が他のコンテンツの生成品質に大きな影響を与えないことを示している。 COCO-30KにおけるモデルのFIDとISスコアはDT後、それぞれ12.61と39.20から13.04と38.25へとわずかに変化するのみであり、従来手法を明らかに上回っている。

Owing to the unrestricted nature of the content in the training data, large text-to-image diffusion models, such as Stable Diffusion (SD), are capable of generating images with potentially copyrighted or dangerous content based on corresponding textual concepts information. This includes specific intellectual property (IP), human faces, and various artistic styles. However, Negative Prompt, a widely used method for content removal, frequently fails to conceal this content due to inherent limitations in its inference logic. In this work, we propose a novel strategy named \textbf{Degeneration-Tuning (DT)} to shield contents of unwanted concepts from SD weights. By utilizing Scrambled Grid to reconstruct the correlation between undesired concepts and their corresponding image domain, we guide SD to generate meaningless content when such textual concepts are provided as input. As this adaptation occurs at the level of the model's weights, the SD, after DT, can be grafted onto other conditional diffusion frameworks like ControlNet to shield unwanted concepts. In addition to qualitatively showcasing the effectiveness of our DT method in protecting various types of concepts, a quantitative comparison of the SD before and after DT indicates that the DT method does not significantly impact the generative quality of other contents. The FID and IS scores of the model on COCO-30K exhibit only minor changes after DT, shifting from 12.61 and 39.20 to 13.04 and 38.25, respectively, which clearly outperforms the previous methods.

翻訳日:2023-08-09 16:06:51 公開日:2023-08-08

# 大規模データ可視化のための適応配置マルチグリッドシーン表現ネットワーク

Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization ( http://arxiv.org/abs/2308.02494v2 )

ライセンス: Link先を確認

Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka

(参考訳) 科学データの圧縮と可視化のためにSRN(Scene representation network)が最近提案されている。しかし、現在最先端のSRNは、利用可能なネットワークパラメータの割り当てを科学データに見られる複雑な特徴に適応させておらず、再構成品質が低下する。本稿では,適応配置マルチグリッドSRN (APMGSRN) によってこの欠点に対処し,マルチGPUシステム上での並列学習を高速化するためのドメイン分割による学習・推論手法を提案する。また、任意のPyTorchベースのSRNでプラグアンドプレイのレンダリングを可能にする、オープンソースのニューラルボリュームレンダリングアプリケーションもリリースする。提案するAPMGSRNアーキテクチャは,領域内のどこに配置されるべきかを学習する複数の空間適応型特徴グリッドを用いて,ボリューム内の誤差が大きい場所により多くのニューラルネットワーク資源を動的に割り当てる。これにより,従来の適応モデルのような高価なオクツリーの細分化、プルーニング、トラバーサルを必要とせずに,科学データに対するSRNの最先端の再構成精度を向上させる。大規模データを表現するためのドメイン分割アプローチでは、ボリュームの別々のブロック上でAPMGSRNの集合を並列に学習することで、GPUメモリに収まらないほど大きなボリュームに対するアウトオブコアソリューションに必要なオーバーヘッドを回避しつつ、学習時間を削減する。学習後、軽量なSRNはオープンソースレンダラーでのリアルタイムなニューラルボリュームレンダリングに使用され、任意の視点角度と伝達関数を探索することができる。本論文のコピー、すべてのコード、実験で使用したすべてのモデル、およびすべての補足資料とビデオは、https://github.com/skywolf829/APMGSRNで入手できる。

Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train a set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding the overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for real-time neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.

翻訳日:2023-08-09 16:06:23 公開日:2023-08-08

# ランダム化QAOA回路のエントロピー特性

Entropic property of randomized QAOA circuits ( http://arxiv.org/abs/2308.01807v2 )

ライセンス: Link先を確認

A. Yu. Chernyavkiy, B. I. Bantysh

(参考訳) 量子近似最適化アルゴリズム (QAOA) は、パラメータ化量子回路を用いてビットストリングをサンプリングすることにより、いくつかのバイナリ目的関数を最小化する。回路パラメータ(角度)を探索する一般的な最適化手法とは対照的に,ランダムに選択することを検討する。このアプローチは、Max-Cutを含む2次非拘束スピン最適化(QUSO)問題に対して古典的アルゴリズムより優れているわけではないが、古典的ランダム探索よりも驚くほど有利である。異なる目的値を得る確率分布を考えると、QUSO問題に対する確率パラメータ QAOA は常に古典的ランダム探索よりも高いエントロピーを与える。また,分布解析式も提供する。

The quantum approximate optimization algorithm (QAOA) aims to minimize some binary objective function by sampling bitstrings using a parameterized quantum circuit. In contrast to common optimization-based methods for searching circuit parameters (angles), here we consider choosing them at random. Although this approach does not outperform classical algorithms for quadratic unconstrained spin optimization (QUSO) problems, including Max-Cut, it surprisingly provides an advantage over the classical random search. Investigation of this effect has led us to the following conjecture: given the probability distribution of obtaining distinct objective values, random-parameter QAOA for QUSO problems always gives a higher entropy of this distribution than the classical random search. We also provide analytical expressions for the distribution.
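
As a rough illustration of the quantity the conjecture compares against, the sketch below computes the distribution of objective values under classical random search (uniformly random bitstrings) for a tiny Max-Cut instance, together with its Shannon entropy. It does not simulate QAOA. The triangle graph, the use of the cut size as the objective value, and the base-2 logarithm are assumptions made only for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def cut_value(bits, edges):
    """Number of edges whose endpoints fall on different sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def value_distribution(n, edges):
    """Probability of each cut value when all 2**n bitstrings are equally likely."""
    counts = Counter(cut_value(bits, edges) for bits in product((0, 1), repeat=n))
    total = 2 ** n
    return {value: count / total for value, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Hypothetical 3-vertex instance: a triangle.
triangle = [(0, 1), (1, 2), (0, 2)]
dist = value_distribution(3, triangle)
entropy = shannon_entropy(dist)
```

For the triangle, two of the eight bitstrings cut no edge and the remaining six cut two edges, so the distribution has only two distinct values. The conjecture states that sampling from a randomly parameterized QAOA circuit yields a higher entropy than this baseline.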

翻訳日:2023-08-09 16:05:07 公開日:2023-08-08

# NBIAS:テキスト中のバイアス識別のための自然言語処理フレームワーク

NBIAS: A Natural Language Processing Framework for Bias Identification in Text ( http://arxiv.org/abs/2308.01681v2 )

ライセンス: Link先を確認

Shaina Raza, Muskan Garg, Deepak John Reji, Syed Raza Bashir, Chen Ding

(参考訳) テキストデータのバイアスは、データが使用されると歪んだ解釈や結果につながる可能性がある。これらのバイアスは、ステレオタイプ、差別、その他の不公平な扱いを永続する可能性がある。偏ったデータに基づいて訓練されたアルゴリズムは、あるグループに不公平に影響を及ぼす決定を下す。したがって、データの公正かつ倫理的利用を確保するためには、これらのバイアスを検出して取り除くことが不可欠である。そこで我々は,データ層,コーパス・コントラクション,モデル開発層,評価層から構成される包括的で堅牢なフレームワークであるtextsc{Nbias} を開発した。このデータセットは、ソーシャルメディア、ヘルスケア、雇用ポータルなど、さまざまな分野からさまざまなデータを収集することによって構築される。そこで,変圧器を用いたトークン分類モデルを適用し,一意な名前を持つエンティティを通じてバイアス語やフレーズを識別する。評価手法では,定量的および定性的な評価をブレンドして,モデルの有効性を評価する。ベースラインに比べて1%から8%の精度向上を実現しています。また,モデル機能に関する堅牢な理解を生成でき,数値データだけでなく,その性能の質や複雑さも把握できる。提案手法は,様々なバイアスに適用でき,公平かつ倫理的なテキストデータの活用に寄与する。

Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases could perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data ends up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop a comprehensive and robust framework \textsc{Nbias} that consists of a data layer, corpus construction, a model development layer, and an evaluation layer. The dataset is constructed by collecting diverse data from various fields, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity. In the assessment procedure, we incorporate a blend of quantitative and qualitative evaluations to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We are also able to generate a robust understanding of the model functioning, capturing not only numerical data but also the quality and intricacies of its performance. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.

翻訳日:2023-08-09 16:04:53 公開日:2023-08-08

# fusionad: 自動運転の予測と計画タスクのためのマルチモダリティ融合

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving ( http://arxiv.org/abs/2308.01006v3 )

ライセンス: Link先を確認

Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu

(参考訳) 高精度でロバストなパフォーマンスに向けたマルチモダリティマルチタスクニューラルネットワークの構築は、自動運転の知覚タスクにおけるデファクトスタンダードである。しかし、複数のセンサからのそのようなデータを活用して予測と計画タスクを共同で最適化することは、ほとんど未検討のままである。本稿では、FusionADについて、私たちの知る限りでは、カメラとLiDARの2つの重要なセンサーからの情報を融合する最初の統合フレームワークであるFusionADについて述べる。具体的には、最初にトランスフォーマーベースのマルチモダリティフュージョンネットワークを構築し、フュージョンベースの機能を効果的に生み出す。カメラベースのエンドツーエンド手法であるUniADに対して、マルチモーダル特徴の利点を生かしたFMSPnPと呼ばれるモダリティ対応予測とステータス対応計画モジュールを融合して構築する。一般的なベンチマークnuscenesデータセットを広範囲に実験した結果,fusionadは最先端のパフォーマンスを達成し,検出や追跡などの知覚タスクでは平均15%,占有予測精度では10%,adeスコアでは0.708から0.389に低下し,衝突率を0.31%から0.12%に低減した。

Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in the perception tasks of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge the first unified framework that fuses the information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task. Concretely, we first build a transformer-based multi-modality fusion network to effectively produce fusion-based features. In contrast to the camera-based end-to-end method UniAD, we then establish fusion-aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP, that take advantage of multi-modality features. We conduct extensive experiments on the commonly used nuScenes benchmark dataset. FusionAD achieves state-of-the-art performance, surpassing baselines on average by 15% on perception tasks like detection and tracking and by 10% on occupancy prediction accuracy, reducing the prediction error from 0.708 to 0.389 in ADE score, and reducing the collision rate from 0.31% to only 0.12%.

翻訳日:2023-08-09 16:04:33 公開日:2023-08-08

# RecycleGPT: リサイクル可能なモジュールを備えた自動回帰言語モデル

RecycleGPT: An Autoregressive Language Model with Recyclable Module ( http://arxiv.org/abs/2308.03421v2 )

ライセンス: Link先を確認

Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang

(参考訳) 既存の大きな言語モデルは、Kトークンのシーケンスを生成するためにK回実行する必要がある。本稿では,複数のステップでモデル全体を動作させることなく,事前生成したモデル状態をリサイクルすることで,高速な復号化速度を持つ生成言語モデルRecycleGPTを提案する。提案手法は,シーケンス内の隣接トークンは通常強い相関関係を持ち,シーケンス内の次のトークンは前列のトークンに基づいて合理的に推測あるいは推測できるという観測に基づく。実験と解析により,提案手法が推論遅延を低減し,最大1.4倍の高速化を実現し,高い性能を維持した。

Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations and the next token in a sequence can be reasonably guessed or inferred based on the preceding ones. Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup while preserving high performance.

翻訳日:2023-08-09 15:55:35 公開日:2023-08-08

# シーン画像を用いたマルチラベル自己監督学習

Multi-Label Self-Supervised Learning with Scene Images ( http://arxiv.org/abs/2308.03286v2 )

ライセンス: Link先を確認

Ke Zhu and Minghao Fu and Jianxin Wu

(参考訳) シーンイメージをターゲットとした自己教師あり学習(SSL)手法は最近急速に成長しており、主に専用の密マッチング機構か、高価な教師なしオブジェクト発見モジュールに依存している。本稿では,これらの厳密な操作に代えて,シーン/複数ラベル画像SSLを多ラベル分類問題として扱い,学習フレームワークを大幅に単純化することで,高品質な画像表現を学習可能であることを示す。具体的には、組込みと2つの辞書の組込みを比較して各入力画像に複数の二項擬似ラベルを割り当て、二項クロスエントロピー損失を用いてネットワークを最適化する。提案手法はマルチラベル自己教師学習(MLS)と呼ばれる。 MLSによる擬似ラベルは、異なる画像にまたがって意味的に類似した擬似陽性のペアを自動的に見つけ、コントラスト学習を容易にする。 MLSはMS-COCOの高品質な表現を学習し、分類、検出、セグメンテーションのベンチマークで最先端の結果を得る。同時に、MLSは既存のメソッドよりもはるかにシンプルで、デプロイやさらなる探索が容易である。

Self-supervised learning (SSL) methods targeting scene images have seen a rapid growth recently, and they mostly rely on either a dedicated dense matching mechanism or a costly unsupervised object discovery module. This paper shows that instead of hinging on these strenuous operations, quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem, which greatly simplifies the learning framework. Specifically, multiple binary pseudo-labels are assigned for each input image by comparing its embeddings with those in two dictionaries, and the network is optimized using the binary cross entropy loss. The proposed method is named Multi-Label Self-supervised learning (MLS). Visualizations qualitatively show that clearly the pseudo-labels by MLS can automatically find semantically similar pseudo-positive pairs across different images to facilitate contrastive learning. MLS learns high quality representations on MS-COCO and achieves state-of-the-art results on classification, detection and segmentation benchmarks. At the same time, MLS is much simpler than existing methods, making it easier to deploy and for further exploration.
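
The pseudo-labelling step described above can be sketched in a few lines of Python. This is only a toy reading of the abstract: the top-k selection rule, the single dictionary (the abstract mentions two), and all names and numbers below are assumptions rather than the authors' implementation.

```python
from math import exp, log, sqrt


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def pseudo_labels(embedding, dictionary, k):
    """Mark the k dictionary entries most similar to the embedding as positives."""
    sims = [cosine(embedding, entry) for entry in dictionary]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return [1 if i in top else 0 for i in range(len(sims))]


def binary_cross_entropy(logits, labels):
    """Mean binary cross entropy between sigmoid(logits) and 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + exp(-z))
        total -= y * log(p) + (1 - y) * log(1 - p)
    return total / len(logits)


# Hypothetical 2-D embeddings, used only to exercise the functions.
image_embedding = [1.0, 0.0]
dictionary = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]]
labels = pseudo_labels(image_embedding, dictionary, k=2)
loss = binary_cross_entropy([2.0, -2.0, 2.0, -2.0], labels)
```

Because each dictionary entry receives its own 0/1 label, one image can have several positives at once, which is what turns the problem into multi-label classification.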

翻訳日:2023-08-09 15:55:23 公開日:2023-08-08

# Spatialyze: 空間対応最適化による地理空間ビデオ分析システム

Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations ( http://arxiv.org/abs/2308.03276v2 )

ライセンス: Link先を確認

Chanwut Kittivorawong, Yongming Ge, Yousef Helal, Alvin Cheung

(参考訳) 携帯電話や監視カメラなどのコモディティなハードウェアを使って撮影されたビデオは、時間や位置などの様々なメタデータを記録する。このような地理空間的ビデオは日常的に遭遇し,その量は著しく増加している。しかし、そのようなデータと効率的に対話できるデータ管理システムは存在しません。本稿では,地理空間ビデオのエンドツーエンドクエリのための新しいフレームワークであるSpatialyzeについて述べる。 Spatialyzeにはドメイン固有の言語があり、ユーザは3段階の宣言型ビルド-フィルタ-オブザーブパラダイムを使って地理空間ビデオ分析ワークフローを構築することができる。内部的には、Spatialyzeはワークフローの宣言的な性質、ビデオに格納された時間空間メタデータ、現実世界のオブジェクトの物理的な振る舞いを活用してワークフローの実行を最適化する。実世界のビデオとワークフローを用いた結果から、spatialyzeは、最適化されていない実行と比較して97.1%の精度を維持しながら、実行時間を最大5.3倍削減できることがわかった。

Videos that are shot using commodity hardware such as phones and surveillance cameras record various metadata such as time and location. We encounter such geospatial videos on a daily basis and such videos have been growing in volume significantly. Yet, we do not have data management systems that allow users to interact with such data effectively. In this paper, we describe Spatialyze, a new framework for end-to-end querying of geospatial videos. Spatialyze comes with a domain-specific language where users can construct geospatial video analytic workflows using a 3-step, declarative, build-filter-observe paradigm. Internally, Spatialyze leverages the declarative nature of such workflows, the temporal-spatial metadata stored with videos, and physical behavior of real-world objects to optimize the execution of workflows. Our results using real-world videos and workflows show that Spatialyze can reduce execution time by up to 5.3x, while maintaining up to 97.1% accuracy compared to unoptimized execution.

翻訳日:2023-08-09 15:55:04 公開日:2023-08-08

# クエリガイドによるFew-shot 3D Point Cloud Segmentationの強化

Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement ( http://arxiv.org/abs/2308.03177v2 )

ライセンス: Link先を確認

Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei

(参考訳) 3Dポイントクラウドセグメンテーションに関する広範な研究が行われているが、汎用モデルを新しいカテゴリに効果的に適応させることは、依然として大きな課題である。本稿では,PC-FSS(point cloud few-shot segmentation)モデルを改善するための新しい手法を提案する。サポートプロトタイプのカテゴリ情報を直接利用してクエリサンプル内の新規クラスを認識する既存のPC-FSS手法とは異なり,提案手法では,サポートプロトタイプとクエリ特徴の間のコンテキストギャップを減らすことでモデル性能を大幅に向上させる2つの重要な側面を特定する。具体的には,(1)クエリサンプルの前景と背景を不明瞭にしうる余分な手がかりを除去しながら,サポート背景プロトタイプをクエリコンテキストに適合するよう適応させ,(2)クエリ特徴の誘導の下でサポートプロトタイプを全体的に修正し,クエリターゲットとの意味的ギャップがないプロトタイプを模倣する。提案する設計は特徴抽出器に依存せず,任意のプロトタイプベース手法に容易に適用できる。 S3DISとScanNetでの実験結果は,高い効率を維持しつつ大幅な改善を達成し,顕著な実用上の利点を示した。このアプローチのコードはhttps://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Cloud-Segmentation-via-Query-Guided-Enhancementで公開されている。

Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge. This paper proposes a novel approach to improve point cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods that directly utilize categorical information from support prototypes to recognize novel classes in query samples, our method identifies two critical aspects that substantially enhance model performance by reducing contextual gaps between support prototypes and query features. Specifically, we (1) adapt support background prototypes to match query context while removing extraneous cues that may obscure foreground and background in query samples, and (2) holistically rectify support prototypes under the guidance of query features to emulate the latter having no semantic gap to the query targets. Our proposed designs are agnostic to the feature extractor, rendering them readily applicable to any prototype-based methods. The experimental results on S3DIS and ScanNet demonstrate notable practical benefits, as our approach achieves significant improvements while still maintaining high efficiency. The code for our approach is available at https://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Cloud-Segmentation-via-Query-Guided-Enhancement

翻訳日:2023-08-09 15:54:47 公開日:2023-08-08

# 複数参照時代に向けて -- NLG評価におけるデータ漏洩と限定参照多様性の対応

Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation ( http://arxiv.org/abs/2308.03131v2 )

ライセンス: Link先を確認

Xianfeng Zeng, Yijin Liu, Fandong Meng and Jie Zhou

(参考訳) BLEUやchrFのようなN-gramマッチングに基づく評価指標は、自然言語生成(NLG)タスクで広く利用されている。しかし、最近の研究では、これらのマッチングベースのメトリクスと人間の評価との間に弱い相関関係が明らかになっている。本稿では、マッチングベースのメトリクスにおけるパフォーマンスボトルネックは、参照の多様性の制限によって引き起こされる可能性があると推測する。この問題に対処するために,これらの指標と人的評価との整合性を高めるために, textit{multiple references} を用いることを提案する。 wmtメトリックベンチマークでは、マルチリファレンスf200spbleuが従来のシングルリファレンスより7.2\%精度が向上している。驚くべきことに、ニューラルネットワークベースのbertscoreを3.9\%の精度向上で上回っている。さらに,大規模言語モデル (LLM) におけるデータ漏洩問題は,マルチリファレンス・メトリックによって大幅に軽減できることがわかった。コードとデータは \url{https://github.com/sefazeng/llm-ref} でリリースします。

N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks. However, recent studies have revealed a weak correlation between these matching-based metrics and human evaluations, especially when compared with neural-based metrics like BLEURT. In this paper, we conjecture that the performance bottleneck in matching-based metrics may be caused by the limited diversity of references. To address this issue, we propose to utilize \textit{multiple references} to enhance the consistency between these metrics and human evaluations. Within the WMT Metrics benchmarks, we observe that the multi-references F200spBLEU surpasses the conventional single-reference one by an accuracy improvement of 7.2\%. Remarkably, it also exceeds the neural-based BERTscore by an accuracy enhancement of 3.9\%. Moreover, we observe that the data leakage issue in large language models (LLMs) can be mitigated to a large extent by our multi-reference metric. We release the code and data at \url{https://github.com/SefaZeng/LLM-Ref}
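
To show why extra references can help a matching-based metric, here is a small Python sketch of BLEU-style clipped n-gram precision with multiple references, where each candidate n-gram is credited up to the largest count found in any single reference. This is only the standard clipping rule, not the paper's F200spBLEU pipeline; the whitespace tokenization and the example sentences are made up for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand = ngram_counts(candidate.split(), n)
    if not cand:
        return 0.0
    # For each n-gram, the clip limit is its highest count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


candidate = "the cat sat on the mat"
ref_a = "a cat is sitting on the mat"
ref_b = "the cat sat on a rug"

single = modified_precision(candidate, [ref_a])
multi = modified_precision(candidate, [ref_a, ref_b])
```

With only `ref_a`, the valid word "sat" earns no credit; adding `ref_b` recovers it, so the multi-reference precision is higher for the same candidate.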

翻訳日:2023-08-09 15:54:20 公開日:2023-08-08

# Gottesman-Kitaev-Preskill Codesによるボソニック量子誤差補正の進歩:理論・工学・応用

Advances in Bosonic Quantum Error Correction with Gottesman-Kitaev-Preskill Codes: Theory, Engineering and Applications ( http://arxiv.org/abs/2308.02913v2 )

ライセンス: Link先を確認

Anthony J. Brady, Alec Eickbusch, Shraddha Singh, Jing Wu and Quntao Zhuang

(参考訳) 量子情報を一組の高調波発振器に符号化することは、信頼性のある量子情報処理のためのノイズを軽減するためのハードウェア効率の良い手法と考えられる。量子ビットを振動子にエンコードするために、猫符号、二項符号、ゴッテマン・キタエフ・プレスキル(GKP)符号を含む様々な符号が提案されている。これらのボソニック符号は、量子誤差補正の分岐点に達した最初のものの一つである。さらに、GKP状態はボソニックチャネルにおける近接-最適量子通信速度を可能にするだけでなく、多くの発振器への発振器の誤り補正を可能にする。本稿では、超伝導回路アーキテクチャの最近の実験的進歩とマルチモードGKP量子ビット符号と発振器・オシレータ(O2O)符号の理論的進歩に焦点を当て、GKP符号の基本動作機構、性能評価および多くの応用に焦点を当てる。まず、ボソニック符号に必要な事前の連続変数形式から始める。次に、GKP状態の物理的実現に関わる量子工学に進む。本稿では,超伝導アーキテクチャにおけるGKP安定化と準備について深く掘り下げ,光領域におけるGKP状態を実現するための提案について検討する。最後に、マルチモードGKP量子ビットとGKP-O2O符号を示し、コード性能を調べ、計算、通信、センシングなどの量子情報処理タスクにおけるGKP符号の適用について議論する。

Encoding quantum information into a set of harmonic oscillators is considered a hardware efficient approach to mitigate noise for reliable quantum information processing. Various codes have been proposed to encode a qubit into an oscillator -- including cat codes, binomial codes and Gottesman-Kitaev-Preskill (GKP) codes. These bosonic codes are among the first to reach a break-even point for quantum error correction. Furthermore, GKP states not only enable close-to-optimal quantum communication rates in bosonic channels, but also allow for error correction of an oscillator into many oscillators. This review focuses on the basic working mechanism, performance characterization, and the many applications of GKP codes, with emphasis on recent experimental progress in superconducting circuit architectures and theoretical progress in multimode GKP qubit codes and oscillators-to-oscillators (O2O) codes. We begin with a preliminary continuous-variable formalism needed for bosonic codes. We then proceed to the quantum engineering involved to physically realize GKP states. We take a deep dive into GKP stabilization and preparation in superconducting architectures and examine proposals for realizing GKP states in the optical domain (along with a concise review of GKP realization in trapped-ion platforms). Finally, we present multimode GKP qubits and GKP-O2O codes, examine code performance and discuss applications of GKP codes in quantum information processing tasks such as computing, communication, and sensing.

翻訳日:2023-08-09 15:54:02 公開日:2023-08-08

# テンソル繰り込み群による(1+1)次元O(3)非線形シグマモデルのエンタングルメントエントロピーとRényiエントロピー

Entanglement and Rényi entropies of (1+1)-dimensional O(3) nonlinear sigma model with tensor renormalization group ( http://arxiv.org/abs/2308.02798v2 )

ライセンス: Link先を確認

Xiao Luo, Yoshinobu Kuramashi

(参考訳) (1+1)次元O(3)非線形シグマモデルのエンタングルメントエントロピーとRényiエントロピーを、テンソル繰り込み群法を用いて調べる。中心電荷は両エントロピーの漸近スケーリング特性から決定される。また、エンタングルメントエントロピーと、$n\rightarrow 1$とした$n$次Rényiエントロピーとの整合性についても検討する。

We investigate the entanglement and Rényi entropies for the (1+1)-dimensional O(3) nonlinear sigma model using the tensor renormalization group method. The central charge is determined from the asymptotic scaling properties of both entropies. We also examine the consistency between the entanglement entropy and the $n$th-order Rényi entropy with $n\rightarrow 1$.
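
The consistency check mentioned in the last sentence rests on a standard fact: the $n$th-order Rényi entropy $S_n = \frac{1}{1-n}\log \sum_i p_i^n$ tends to the entanglement (von Neumann) entropy $-\sum_i p_i \log p_i$ as $n\rightarrow 1$. The following is a minimal sketch of that limit only. The spectrum is a made-up example, not data from the paper, and no tensor renormalization group step is involved.

```python
import math

def renyi_entropy(probs, n):
    """Renyi entropy S_n = log(sum_i p_i^n) / (1 - n), defined for n != 1."""
    if n == 1:
        raise ValueError("n = 1 is the von Neumann limit; use von_neumann_entropy")
    return math.log(sum(p ** n for p in probs if p > 0)) / (1.0 - n)

def von_neumann_entropy(probs):
    """Entanglement (von Neumann) entropy S = -sum_i p_i log p_i."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical eigenvalues of a reduced density matrix (they sum to 1).
spectrum = [0.5, 0.25, 0.125, 0.125]

s_vn = von_neumann_entropy(spectrum)
for n in (2.0, 1.5, 1.1, 1.01, 1.001):
    print(f"n = {n}: S_n = {renyi_entropy(spectrum, n):.6f}")
print(f"entanglement entropy: {s_vn:.6f}")
```

As $n$ approaches 1 from above, the printed values of $S_n$ increase toward the entanglement entropy.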

翻訳日:2023-08-09 15:53:32 公開日:2023-08-08

# 子宮頸部細胞診分類のためのInceptionネットワークのVoting-Stackingアンサンブル

A Voting-Stacking Ensemble of Inception Networks for Cervical Cytology Classification ( http://arxiv.org/abs/2308.02781v2 )

ライセンス: Link先を確認

Linyi Qian, Qian Huang, Yulin Chen, Junzhou Chen

(参考訳) 子宮頸癌は女性の健康を脅かす最も深刻な疾患の1つである。早期発見と診断は、頸部細胞診の分類が不可欠である癌リスクを著しく減少させる可能性がある。研究者は最近、頚部癌の自動診断のためのネットワークを多数設計しているが、これらの個々のモデルの精度と大小は、実用的な応用ニーズを満たすことができない。そこで本研究では,3つのインセプションネットワークをベース学習者として採用し,それらのアウトプットを投票アンサンブルで統合した,投票集計アンサンブル戦略を提案する。アンサンブルモデルで誤分類されたサンプルは、線形分類モデルをメタラーナーとして訓練し、最終的な予測を行う新しいトレーニングセットを生成する。さらに、パフォーマンスをさらに向上させるために、マルチレベルスタックアンサンブルフレームワークも設計されている。この手法はSIPakMed, Herlev, Mendeleyの各データセットで評価され, 100%, 100%, 100%の精度が得られた。実験結果は、現在の最先端(SOTA)法よりも優れており、スクリーニングの負荷を減らし、病理学者が子宮頸がんを検出するのに役立つ可能性を示している。

Cervical cancer is one of the most severe diseases threatening women's health. Early detection and diagnosis can significantly reduce cancer risk, and cervical cytology classification is indispensable to both. Researchers have recently designed many networks for automated cervical cancer diagnosis, but the limited accuracy and bulky size of these individual models cannot meet practical application needs. To address this issue, we propose a Voting-Stacking ensemble strategy, which employs three Inception networks as base learners and integrates their outputs through a voting ensemble. The samples misclassified by the ensemble model form a new training set on which a linear classification model is trained as the meta-learner that performs the final predictions. In addition, a multi-level Stacking ensemble framework is designed to improve performance further. The method is evaluated on the SIPakMed, Herlev, and Mendeley datasets, achieving accuracies of 100%, 100%, and 100%, respectively. In these experiments the method outperforms the current state-of-the-art (SOTA) methods, demonstrating its potential for reducing screening workload and helping pathologists detect cervical cancer.
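
The voting stage described above can be pictured with a small sketch. It assumes plain majority voting over hard labels from three base learners and shows how the misclassified samples would be collected as the meta-learner's training set. The abstract does not specify the voting rule, the tie-breaking, or the meta-learner's features, so all names and data here are hypothetical and this is not the authors' implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions by majority vote.

    predictions: one list of labels per base learner, aligned by sample.
    Ties are broken in favour of the earliest model's label (an assumption).
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        labels = [model_preds[i] for model_preds in predictions]
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in model order, that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == best))
    return voted

def misclassified_indices(voted, targets):
    """Indices where the voted label disagrees with the ground truth."""
    return [i for i, (v, t) in enumerate(zip(voted, targets)) if v != t]

# Hypothetical hard-label outputs of three base learners on five samples.
base_predictions = [
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 2],
    [0, 2, 2, 0, 1],
]
targets = [0, 1, 2, 0, 0]

voted = majority_vote(base_predictions)
hard_samples = misclassified_indices(voted, targets)
print(voted, hard_samples)
```

In a stacking setup of this kind, the samples in `hard_samples` would then be used to fit the linear meta-learner.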

翻訳日:2023-08-09 15:53:22 公開日:2023-08-08

# 敵対的生成ネットワークを用いた自動運転用の現実的な合成生レーダデータの生成

Generation of Realistic Synthetic Raw Radar Data for Automated Driving Applications using Generative Adversarial Networks ( http://arxiv.org/abs/2308.02632v2 )

ライセンス: Link先を確認

Eduardo C. Fidelis and Fabio Reway and Herick Y. S. Ribeiro and Pietro L. Campos and Werner Huber and Christian Icking and Lester A. Faria and Torsten Schön

(参考訳) FMCWレーダをシミュレートする主なアプローチはレイトレーシングであり、通常は計算集約であり、バックグラウンドノイズを考慮しない。本研究では,GAN(Generative Adversarial Network)を用いた合成生レーダデータを生成するFMCWレーダシミュレーションの高速化手法を提案する。コードとトレーニング済みのウェイトはオープンソースであり、githubで入手できる。この方法は16個の同時チャープを生成し、レーダデータ(フィルタリングとクラスタリング)を処理するアルゴリズムのさらなる開発に生成されたデータを使用できる。これは、実生活では再現できない非存在または安全クリティカルなシナリオでデータを生成することによって、データ拡張の可能性を高めることができる。この研究で、GANはオートバイのレーダー測定を訓練され、直線を走行するオートバイの合成生レーダーデータを生成するために使用された。このデータを生成するには、ニューラルネットワークへの入力として、オートバイとガウスノイズの距離を用いる。合成レーダチャープはFrechet Inception Distance (FID)を用いて評価した。次に、このganを用いた合成データに基づいて、第1に、実データに基づいて、範囲方位(ra)マップを2回算出する。これらのRAマップに基づいて、適応しきい値とエッジ検出のアルゴリズムがオブジェクト検出に使用される。以上の結果から, 車両のコヒーレントレーダ反射と背景騒音について, チャープ, RAマップ, 物体検出結果の比較から, 現実的なデータであることが示唆された。そこで本研究では,レーダデータ生成におけるシミュレーションと現実のギャップを最小化する手法を提案する。

The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and does not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetically generated radar chirps were evaluated using the Frechet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first, based on synthetic data using this GAN and, second, based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise, based on the comparison of chirps, the RA maps and the object detection results. Thus, the method proposed in this work has been shown to minimize the simulation-to-reality gap for the generation of radar data.

翻訳日:2023-08-09 15:53:02 公開日:2023-08-08

# Adapt and Decompose: Domain Adapted Least-to-Most PromptingによるText-to-SQLの効率的な一般化

Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting ( http://arxiv.org/abs/2308.02582v2 )

ライセンス: Link先を確認

Aseem Arora, Shabbirhussain Bhaisaheb, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff

(参考訳) Text-to-SQLセマンティックパーシングのクロスドメインとクロスコンポーネントの一般化は難しい課題である。既存のLarge Language Model (LLM) ベースのソリューションは、自然言語(NL)テストクエリ毎に実行時のプロンプトを合成するために、トレーニングセットから少数ショットの例の推論時検索に依存する。対照的に、トレーニングデータから最小限の少数のショットをオフラインでサンプリングするアルゴリズムを考案し、SQL節、演算子、関数を完全にカバーし、許容トークン長内でのドメインカバレッジを最大化する。これにより、固定されたジェネリック・プロンプト (GP) の合成が可能となり、NLテストクエリに共通する様々な例のセットで、高価なテストタイムの例検索を避けることができる。さらに、GPをターゲットデータベース領域(DA-GP)に自動適応させ、クロスドメインの一般化をよりうまく処理し、次いで、クロスコンポジションの一般化を扱うために分解されたLeast-To-Most-Prompting(LTMP-DA-GP)を処理します。 LTMP-DA-GPの合成はオフラインタスクであり、人間の介入を最小限に抑えた新しいデータベースに対して1回ずつ実行される。提案手法は,テキストからSQLへのタスクの一般化性を評価するために設計されたKaggleDBQAデータセット上で,優れた性能を示す。さらに,GP 上での LTMP-DA-GP の性能改善を LLM や KaggleDBQA のデータベース上で一貫した性能向上を示し,本手法の有効性とモデルに依存しない利点を強調した。

Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set of few-shot exemplars from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set of exemplars common across NL test queries, avoiding expensive test-time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization, followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, to be performed once per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model-agnostic benefits of our prompt-based adapt-and-decompose approach.
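
One way to picture the offline sampling step is as a budgeted set-cover problem: choose few exemplars so that every required SQL construct appears at least once, without exceeding a token limit. The sketch below uses a generic greedy heuristic (most newly covered constructs per token). The abstract does not describe the authors' algorithm in this detail, so the heuristic, the function name `select_exemplars`, and the toy pool are all assumptions for illustration.

```python
def select_exemplars(pool, required, token_budget):
    """Greedy budgeted set cover over SQL constructs.

    pool: list of dicts with keys "id", "constructs" (a set) and "tokens".
    Returns the chosen exemplars and any constructs left uncovered.
    """
    uncovered = set(required)
    chosen, used = [], 0
    remaining = list(pool)
    while uncovered and remaining:
        # Only exemplars that fit the budget and cover something new qualify.
        candidates = [
            ex for ex in remaining
            if used + ex["tokens"] <= token_budget and uncovered & ex["constructs"]
        ]
        if not candidates:
            break
        # Pick the exemplar with the most newly covered constructs per token.
        best = max(candidates, key=lambda ex: len(uncovered & ex["constructs"]) / ex["tokens"])
        chosen.append(best)
        used += best["tokens"]
        uncovered -= best["constructs"]
        remaining.remove(best)
    return chosen, uncovered

# Hypothetical training exemplars, annotated with the constructs they use.
pool = [
    {"id": "a", "constructs": {"SELECT", "WHERE"}, "tokens": 25},
    {"id": "b", "constructs": {"SELECT", "GROUP BY", "COUNT"}, "tokens": 30},
    {"id": "c", "constructs": {"SELECT", "WHERE", "ORDER BY", "LIMIT"}, "tokens": 32},
    {"id": "d", "constructs": {"SELECT", "JOIN"}, "tokens": 40},
]
required = {"SELECT", "WHERE", "GROUP BY", "COUNT", "ORDER BY", "LIMIT", "JOIN"}

chosen, leftover = select_exemplars(pool, required, token_budget=110)
chosen_ids = [ex["id"] for ex in chosen]
print(chosen_ids, leftover)
```

Because the selection runs once, offline, the resulting exemplar set can be reused in a fixed prompt for every test query.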

翻訳日:2023-08-09 15:52:35 公開日:2023-08-08

# 差分プライベートなパーソナライズドレコメンデーションの高精度測定のためのランダム化アルゴリズム

Randomized algorithms for precise measurement of differentially-private, personalized recommendations ( http://arxiv.org/abs/2308.03735v2 )

ライセンス: Link先を確認

Allegra Laro, Yanqing Chen, Hao He, Babak Aghazadeh

(参考訳) パーソナライズドレコメンデーションは、今日のインターネットエコシステムの重要な部分を形成し、アーティストやクリエーターが興味のあるユーザーにリーチすることを支援し、ユーザーが新しく魅力的なコンテンツを見つけるのを助ける。しかし、今日の多くのユーザーは、歴史的に不注意な個人データの扱いとデータのプライバシーのために、推奨をパーソナライズするプラットフォームに懐疑的です。現在、パーソナライズドレコメンデーションに依存している企業は、プライバシ優先のシステムの多くをオーバーホールしなければならない、新たなパラダイムに移行している。本稿では,個人毎の個人別測定を容易にするためのアルゴリズムを提案する。広告をサンプルアプリケーションとして検討し,提案したプライバシー保護アルゴリズムがユーザエクスペリエンス,広告主価値,プラットフォーム収益に関連する重要な指標にどのように影響するかを,非個人的かつ非個人的かつパーソナライズされた実装の極端な部分と比較して定量化する。

Personalized recommendations form an important part of today's internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be privacy-first. In this article, we propose an algorithm for personalized recommendations that facilitates both precise and differentially-private measurement. We consider advertising as an example application, and conduct offline experiments to quantify how the proposed privacy-preserving algorithm affects key metrics related to user experience, advertiser value, and platform revenue compared to the extremes of both (private) non-personalized and non-private, personalized implementations.

翻訳日:2023-08-09 15:46:55 公開日:2023-08-08

# 分散画像セマンティクス無線伝送のための通信効率の高いフレームワーク

Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission ( http://arxiv.org/abs/2308.03713v2 )

ライセンス: Link先を確認

Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Derrick Wing Kwan Ng, Wenjun Zhang

(参考訳) 複数のデバイス間の通信を指すマルチノード通信は、多くのIoT(Internet-of-Things)シナリオで注目を集めている。しかし、その膨大なデータフローとタスク拡張の柔軟性は、通信効率のよい分散データ伝送フレームワークの緊急要求を引き起こした。本稿では,帯域幅削減と意味コミュニケーションのタスク適応性に着想を得て,iotデバイスを用いたマルチタスク分散画像伝送のためのflsc(federated learning-based semantic communication)フレームワークを提案する。フェデレートラーニングにより、各ユーザの独立したセマンティックコミュニケーションリンクの設計が可能となり、グローバルアグリゲーションによるセマンティック抽出とタスクパフォーマンスがさらに向上する。 FLSCの各リンクは、階層型視覚変換器(HVT)ベースの抽出器と、粗い意味抽出のためのタスク適応翻訳器と、特定のタスクに応じた意味翻訳からなる。 flscをより現実的な状態に拡張するために,チャネル状態情報に基づく複数入力多重出力伝送モジュールを設計し,チャネルフェーディングやノイズ対策を行う。シミュレーションの結果,粗い意味情報が画像レベルのタスクを処理できることが判明した。さらに、特に低信号対雑音比とチャネル帯域比の規則では、FLSCは従来の方式、例えば3dBチャネル条件で約10のピーク信号対雑音比利得よりも明らかに優れている。

Multi-node communication, which refers to the interaction among multiple devices, has attracted considerable attention in many Internet-of-Things (IoT) scenarios. However, its huge data flows and inflexibility for task extension have created an urgent need for communication-efficient distributed data transmission frameworks. In this paper, inspired by the advantages of semantic communications in bandwidth reduction and task adaptation, we propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices. Federated learning enables the design of an independent semantic communication link for each user while further improving semantic extraction and task performance through global aggregation. Each link in FLSC is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator for coarse-to-fine semantic extraction and meaning translation according to specific tasks. In order to extend FLSC to more realistic conditions, we design a channel state information-based multiple-input multiple-output transmission module to combat channel fading and noise. Simulation results show that the coarse semantic information can deal with a range of image-level tasks. Moreover, especially in low signal-to-noise ratio and channel bandwidth ratio regimes, FLSC evidently outperforms the traditional scheme, e.g., with a peak signal-to-noise ratio gain of about 10 in the 3 dB channel condition.

翻訳日:2023-08-09 15:46:33 公開日:2023-08-08

# スクリーンベース3次元主観実験ソフトウェア

Screen-based 3D Subjective Experiment Software ( http://arxiv.org/abs/2308.03698v2 )

ライセンス: Link先を確認

Songlin Fan and Wei Gao

(参考訳) 近年,多岐にわたる3dグラフィックス(ポイントクラウドやメッシュなど)が学界や産業から,主観的実験を行うことでその知覚的品質を評価するための多大な努力を集めている。しかし、3Dの主観的実験のための便利なソフトウェアがないため、3Dグラフィック品質評価データセットの構築が複雑になり、関連する分野の繁栄を妨げる。本稿では,ユーザが柔軟に3dの主観的方法論を設計でき,高品質なデータセットを構築することができる強力なプラットフォームを開発し,幅広い3dグラフィックの主観的品質研究を可能にした。 3d刺激の知覚的品質差を正確に示すために,本ソフトウェアは音源刺激と刺激障害を同時に描画し,両刺激が同時反応することを可能にする。アマチュアの3d可視化ツールや画像/ビデオレンダリング方式と比較すると,主観実験時の認知的過負荷を最小限に抑えながら,典型的な3dアプリケーションを具現化する。提案するソフトウェアの有効性を検証するために,40名を対象に主観実験を行った。実験分析により,本ソフトウェアにおける主観的テストが3dモデルの合理的主観的品質スコアを生成できることが示されている。この論文のすべてのリソースはhttps://openi.pcl.ac.cn/OpenDatasets/3DQAで見ることができる。

Recently, widespread 3D graphics (e.g., point clouds and meshes) have drawn considerable efforts from academia and industry to assess their perceptual quality by conducting subjective experiments. However, the lack of convenient software for 3D subjective experiments complicates the construction of 3D graphics quality assessment datasets, thus hindering the prosperity of relevant fields. In this paper, we develop a powerful platform with which users can flexibly design their 3D subjective methodologies and build high-quality datasets, easing a broad spectrum of 3D graphics subjective quality studies. To accurately illustrate the perceptual quality differences of 3D stimuli, our software can simultaneously render the source stimulus and impaired stimulus and allows both stimuli to respond synchronously to viewer interactions. Compared with amateur 3D visualization tool-based or image/video rendering-based schemes, our approach embodies typical 3D applications while minimizing cognitive overload during subjective experiments. We organized a subjective experiment involving 40 participants to verify the validity of the proposed software. Experimental analyses demonstrate that subjective tests on our software can produce reasonable subjective quality scores of 3D models. All resources in this paper can be found at https://openi.pcl.ac.cn/OpenDatasets/3DQA.

翻訳日:2023-08-09 15:46:05 公開日:2023-08-08

# MedMine: メディケイトマイニングにおける事前学習言語モデルの検討

MedMine: Examining Pre-trained Language Models on Medication Mining ( http://arxiv.org/abs/2308.03629v2 )

ライセンス: Link先を確認

Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic

(参考訳) 臨床およびバイオメディカルテキストからの薬剤の自動マイニングは、医療アプリケーションや最近の強力な言語モデル(lms)の開発に実際に影響するため、一般的な話題となっている。しかし、完全自動抽出モデルは依然として克服すべき障害に直面しており、より優れた影響を得るために直接臨床実践にデプロイすることができる。このような障害には、異なるエンティティタイプや臨床イベントに対する不均衡なパフォーマンスが含まれる。本研究では,モノリンガルモデルMed7や多言語大言語モデル(LLM)XLM-RoBERTaなどの微調整により,現状のPLMについて検討する。 n2c2-2018課題の共有タスクデータセットを用いて,それらの利点と欠点を比較した。これらの微調整実験から得られた知見を報告する。例えば、それらの出力を組み合わせたり、モデルをマージしたり、学習とデータ拡張によって全体的な精度を向上させることができる。 MedMineはM3 Initiative \url{https://github.com/HECTA-UoM/M3}の一部である。

Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully automatic extraction models still face obstacles that must be overcome before they can be deployed directly into clinical practice. Such obstacles include their imbalanced performance across different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks via fine-tuning, including the monolingual model Med7 and the multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from the n2c2-2018 challenges. We report the findings from these fine-tuning experiments so that they can facilitate future research on addressing these obstacles, for instance, how to combine model outputs, merge such models, or improve overall accuracy through ensemble learning and data augmentation. MedMine is part of the M3 Initiative: https://github.com/HECTA-UoM/M3

翻訳日:2023-08-09 15:45:32 公開日:2023-08-08

# GPT-3のトポロジカル解釈

Topological Interpretations of GPT-3 ( http://arxiv.org/abs/2308.03565v2 )

ライセンス: Link先を確認

Tianyi Sun and Bradley Nelson

(参考訳) 文ベクトルと文の意味的意味の相関関係を導出する一貫した方法を検討するための実験的検討である。我々はまず,GPT-3,Word2Vec,Sentence-BERTの3つの最先端単語/文埋め込み手法を用いて,平文文字列を高次元空間に埋め込む。次に、埋め込み空間における2つの文ベクトルの任意の組合せ間の対距離を計算し、それらを行列にマッピングする。各距離行列に基づいて、埋め込み空間における他の文ベクトルに対する文ベクトルの距離の相関を計算する。次に、距離行列の各対の相関を計算する。異なる埋め込み空間における同じ文の相関と同一埋め込み空間における異なる文の相関を観察した。これらの観察は私たちの仮説と一致し、次の段階へと進む。

This is an experimental study investigating a consistent method for deriving the correlation between the vector of a sentence and its semantic meaning. We first used three state-of-the-art word/sentence embedding methods, GPT-3, Word2Vec, and Sentence-BERT, to embed plain text sentence strings into high-dimensional spaces. Then we compute the pairwise distance between every possible pair of sentence vectors in an embedding space and map the distances into a matrix. Based on each distance matrix, we compute the correlation of distances of a sentence vector with respect to the other sentence vectors in an embedding space. Then we compute the correlation of each pair of the distance matrices. We observed correlations of the same sentence in different embedding spaces and correlations of different sentences in the same embedding space. These observations are consistent with our hypothesis and take us to the next stage.
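
The pipeline in this abstract (embed sentences, build a pairwise distance matrix per embedding space, then correlate the matrices) can be sketched as follows. The abstract names neither the distance nor the correlation measure, so Euclidean distance and Pearson correlation over the upper triangle are assumptions here, and the toy vectors stand in for real GPT-3, Word2Vec, or Sentence-BERT embeddings.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances between all vectors in one embedding space."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def upper_triangle(matrix):
    """Flatten the strictly upper triangle (each unordered pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def matrix_correlation(a, b):
    """Correlation between two distance matrices over the same sentences."""
    return pearson(upper_triangle(a), upper_triangle(b))

# Toy stand-ins for the same four sentences in two embedding spaces.
space_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
# Space B is space A scaled by 2 and padded to 3-D, so distances are proportional.
space_b = [[2 * x, 2 * y, 0.0] for x, y in space_a]

corr = matrix_correlation(distance_matrix(space_a), distance_matrix(space_b))
print(f"correlation between distance matrices: {corr:.3f}")
```

Proportional distance structures give a correlation of 1, while unrelated embedding geometries would give values closer to 0.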

翻訳日:2023-08-09 15:45:00 公開日:2023-08-08

# 高速インタラクティブセグメンテーションのための特徴デカップリング・リサイクリングネットワーク

Feature Decoupling-Recycling Network for Fast Interactive Segmentation ( http://arxiv.org/abs/2308.03529v2 )

ライセンス: Link先を確認

Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei

(参考訳) 近年のインタラクティブセグメンテーション手法は, 画像の不変性を考慮せずに, 画像, ユーザガイダンス, 従来予測されていたマスクを入力とする。その結果、各インタラクションにおいて、ソース画像から特徴抽出が繰り返され、実質的な計算冗長性が生じる。本稿では,本研究で提案するfdrn(feature decoupling-recycling network)を提案する。これにより、インタラクティブプロセス全体の効率を大幅に改善することができる。具体的には,3つの相違点に対処するために,3つの視点からDecoupling-Recycling戦略を適用する。まず,2種類の入力領域を別々に処理するために,ユーザガイダンスの符号化からソース画像意味学の学習を分離する。第二に、FDRNは階層化された意味表現から高レベルの特徴と低レベルの特徴を分離し、特徴学習を強化する。第3に、ユーザガイダンスのエンコーディング中に、現在のユーザガイダンスが履歴ガイダンスから切り離され、現在のユーザガイダンスの効果が強調される。異なるドメインとモダリティから得られた6つのデータセットに関する広範な実験を行い、以下のモデルの有効性を実証する。 1) 他の方法よりも優れた効率性,特に長期的インタラクション(最大4.25倍の速度)を必要とする困難なシナリオにおいて有利であり,かつ,良好なセグメンテーション性能を達成する。 2) ユニバーサルエンハンスメント技術としての様々な方法への強い適用性 3) 医用画像のセグメンテーションや誤解を招くユーザガイダンスに対するロバスト性など,優れたクロスタスク汎用性。

Recent interactive segmentation methods iteratively take the source image, user guidance and the previously predicted mask as input without considering the invariant nature of the source image. As a result, extracting features from the source image is repeated in each interaction, resulting in substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies and then recycles components for each user interaction. Thus, the efficiency of the whole interactive process can be significantly improved. To be specific, we apply the Decoupling-Recycling strategy from three perspectives to address three types of discrepancies, respectively. First, our model decouples the learning of source image semantics from the encoding of user guidance to process two types of input domains separately. Second, FDRN decouples high-level and low-level features from stratified semantic representations to enhance feature learning. Third, during the encoding of user guidance, current user guidance is decoupled from historical guidance to highlight the effect of current user guidance. We conduct extensive experiments on 6 datasets from different domains and modalities, which demonstrate the following merits of our model: 1) higher efficiency than other methods, particularly advantageous in challenging scenarios requiring long-term interactions (up to 4.25x faster), while achieving favorable segmentation performance; 2) strong applicability to various methods, serving as a universal enhancement technique; 3) good cross-task generalizability, e.g., to medical image segmentation, and robustness against misleading user guidance.

翻訳日:2023-08-09 15:44:29 公開日:2023-08-08

# DiffSynth: リアルなビデオ合成のための潜在空間における反復内デフリッカリング

DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis ( http://arxiv.org/abs/2308.03463v2 )

ライセンス: Link先を確認

Zhongjie Duan, Lizhou You, Chengyu Wang, Cen Chen, Ziheng Wu, Weining Qian, Jun Huang, Fei Chao

(参考訳) 近年、拡散モデルが画像合成における最も強力なアプローチとして登場している。しかし、これらのモデルをビデオ合成に直接適用することは、しばしば目立ったフリックングコンテンツにつながるため、課題となる。最近提案されたゼロショット法は、フリックをある程度緩和するが、コヒーレントなビデオを生成するのに苦労している。本稿では,画像合成パイプラインをビデオ合成パイプラインに変換する新しい手法であるDiffSynthを提案する。 DiffSynthは2つの重要なコンポーネントで構成されている。潜像デクリッカリングフレームワークは、拡散モデルの潜像空間にビデオデクリッカリングを適用し、中間ステップにおけるフレッカの蓄積を効果的に防止する。さらに、異なるフレーム内のオブジェクトをリマップし、それらをブレンドしてビデオ一貫性を高める、patch blending algorithmというビデオデクリッカーアルゴリズムを提案する。 diffsynthの顕著な利点の1つは、テキスト誘導ビデオスタイライゼーション、ファッションビデオ合成、画像誘導ビデオスタイライゼーション、ビデオ復元、および3dレンダリングなど、様々なビデオ合成タスクへの一般的な適用である。テキスト誘導型ビデオスタイリングのタスクでは,チェリーピッキングなしで高品質な映像を合成することができる。実験結果はDiffSynthの有効性を示した。すべてのビデオはプロジェクトのページで見ることができる。ソースコードもリリースされる予定だ。

In recent years, diffusion models have emerged as the most powerful approach in image synthesis. However, applying these models directly to video synthesis presents challenges, as it often leads to noticeable flickering content. Although recently proposed zero-shot methods can alleviate flicker to some extent, we still struggle to generate coherent videos. In this paper, we propose DiffSynth, a novel approach that aims to convert image synthesis pipelines to video synthesis pipelines. DiffSynth consists of two key components: a latent in-iteration deflickering framework and a video deflickering algorithm. The latent in-iteration deflickering framework applies video deflickering to the latent space of diffusion models, effectively preventing flicker accumulation in intermediate steps. Additionally, we propose a video deflickering algorithm, named the patch blending algorithm, that remaps objects in different frames and blends them together to enhance video consistency. One of the notable advantages of DiffSynth is its general applicability to various video synthesis tasks, including text-guided video stylization, fashion video synthesis, image-guided video stylization, video restoration, and 3D rendering. In the task of text-guided video stylization, we make it possible to synthesize high-quality videos without cherry-picking. The experimental results demonstrate the effectiveness of DiffSynth. All videos can be viewed on our project page. The source code will also be released.

翻訳日:2023-08-09 15:43:39 公開日:2023-08-08

# paif:攻撃耐性を持つセマンティクスセグメンテーションのための知覚認識型赤外可視画像融合

PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation ( http://arxiv.org/abs/2308.03979v1 )

ライセンス: Link先を確認

Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, Risheng Liu

(参考訳) 赤外線および可視画像融合は、下流意味知覚タスクのための異なるモダリティからの補完情報を結合する強力な技術である。既存の学習ベースの手法は優れた性能を示すが、敵攻撃の固有の脆弱性に悩まされており、精度が著しく低下する。本研究では, 対向場面におけるセグメンテーションの堅牢性を促進するために, 知覚認識融合フレームワークを提案する。まず画像融合の成分を系統的に解析し, 対向摂動下でのセグメンテーションの堅牢性との関係について検討する。これらの分析に基づいて,標準精度とロバスト性のバランスをとるために,分解構造を用いた調和型アーキテクチャ探索を提案する。また,画像融合のパラメータロバスト性を改善するための適応型学習手法を提案する。したがって、画像融合の目標(つまり、ソースモダリティから相補的な特徴を抽出し、攻撃を防御する)は、アーキテクチャと学習戦略の観点から実現することができる。広範な実験結果から,本手法は,競争相手に比べて15.3%のセグメンテーションが向上し,ロバスト性が大幅に向上することが示された。ソースコードはhttps://github.com/liuzhu-cv/paifで入手できる。

Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance, but suffer from an inherent vulnerability to adversarial attacks, which causes a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses of the components of image fusion, investigating the correlation with segmentation robustness under adversarial perturbations. Based on these analyses, we propose a harmonized architecture search with a decomposition-based structure to balance standard accuracy and robustness. We also propose an adaptive learning strategy to improve the parameter robustness of image fusion, which can learn effective feature extraction under diverse adversarial perturbations. Thus, the goals of image fusion (i.e., extracting complementary features from source modalities and defending against attacks) can be realized from the perspectives of architectural and learning strategies. Extensive experimental results demonstrate that our scheme substantially enhances robustness, with gains of 15.3% mIoU in segmentation in the adversarial scene, compared with advanced competitors. The source code is available at https://github.com/LiuZhu-CV/PAIF.

翻訳日:2023-08-09 14:36:38 公開日:2023-08-08

# PUG:表現学習のためのフォトリアリスティックでセマンティックに制御可能な合成データ

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning ( http://arxiv.org/abs/2308.03977v1 )

ライセンス: Link先を確認

Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos

(参考訳) 合成画像データセットは、ディープニューラルネットワークの設計と評価に不整合な利点を提供する。 i) 必要なだけ多くのデータサンプルをレンダリングする。 (ii)各場面を精密に制御し、細かな地上真理ラベル(及びキャプション)を付与する。 (iii)音実験の興味のある変数を分離するために、トレーニングとテストの間における分布の正確な制御を行う。このような約束にもかかわらず、合成画像データの使用は、主に現実主義が欠如しているため、依然として制限されている。それゆえ、ほとんどの作品は実際の画像のデータセットに依存しており、それはインターネット上の公開画像からしばしば取り除かれており、プライバシー、バイアス、著作権に関して問題があり、オブジェクトが正確にどのように現れるかはほとんど制御できない。本研究では,フォトリアリスティックな合成データの利用を民主化する手法を提案する。我々は,制御可能性と現実性の両方を提供する表現学習研究のための新しい世代の対話環境を開発する。私たちはエンタテインメント業界でよく知られた強力なゲームエンジンであるunreal engineを使用して、表現学習のためにpug(photorealistic unreal graphics)環境とデータセットを作成しています。本稿では,より厳密な視覚モデル評価を可能にするPUGの可能性を示す。

Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited -- and often played down -- mainly due to their lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet, and may have issues with regards to privacy, bias, and copyright, while offering little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research, that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.

翻訳日:2023-08-09 14:36:13 公開日:2023-08-08

# qutritシステムにおける時間依存デコヒーレンス率の最適化とコヒーレント制御

Optimization of Time-Dependent Decoherence Rates and Coherent Control for a Qutrit System ( http://arxiv.org/abs/2308.03976v1 )

ライセンス: Link先を確認

Oleg Morzhin, Alexander Pechen

(参考訳) この研究は、密度行列 $\rho(t)$ の進化がgorini-kossakowski-sudarshan-lindbladマスター方程式と同時コヒーレント(ハミルトニアン)と非コヒーレント(散逸のスーパーオペレーター)によって制御されるオープンクトリット系を考える。非コヒーレント制御は、特定の制御方法で時間や明確な物理力学内でのデコヒーレンス率に依存する。系の最終状態 $\rho(T)$ と与えられた目標状態 $\rho_{\rm target}$ との重なりを最大化する問題と、これらの状態間の2乗ヒルベルト-シュミット距離を最小化する問題を考える。両問題を両立させ, 対応するポントリャーギン関数, 随伴系(両終端目標の2つの場合), 目標の勾配を導出し, 1段階, 2段階, 3段階の勾配投影法を適用した。重なりを最大化する問題に対しては、正則化一階krotov法も適用する。数値実験では,まず,手法の動作を解析し,次に得られた制御過程を,非一貫性制御による資源としての環境を考察した。

The work considers an open qutrit system whose density matrix $\rho(t)$ evolution is governed by the Gorini-Kossakowski-Sudarshan-Lindblad master equation with simultaneous coherent (in the Hamiltonian) and incoherent (in the superoperator of dissipation) controls. Incoherent control makes the decoherence rates depend on time in a specific controlled manner and within clear physical mechanics. We consider the problem of maximizing the Hilbert-Schmidt overlap between the system's final state $\rho(T)$ and a given target state $\rho_{\rm target}$ and the problem of minimizing the squared Hilbert-Schmidt distance between these states. For both problems, we perform their realifications, derive the corresponding Pontryagin function, adjoint system (with the two cases of transversality conditions in view of the two terminal objectives), and gradients of the objectives, and adapt the one-, two-, and three-step gradient projection methods. For the problem of maximizing the overlap, we also adapt the regularized first-order Krotov method. In the numerical experiments, we analyze, first, the methods' operation and, second, the obtained control processes, with respect to considering the environment as a resource via incoherent control.
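
The two terminal objectives named above have simple closed forms: the Hilbert-Schmidt overlap is ${\rm Tr}(\rho(T)\rho_{\rm target})$ and the squared Hilbert-Schmidt distance is ${\rm Tr}\big((\rho(T)-\rho_{\rm target})^2\big)$ for Hermitian matrices. The following is a minimal sketch that evaluates both for $3\times 3$ density matrices. The two example states are illustrative assumptions, and no master-equation dynamics or optimization is included.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def hs_overlap(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho sigma); real for Hermitian inputs."""
    return complex(trace(matmul(rho, sigma))).real

def hs_distance_sq(rho, sigma):
    """Squared Hilbert-Schmidt distance Tr((rho - sigma)^2) for Hermitian inputs."""
    diff = [[r - s for r, s in zip(row_r, row_s)] for row_r, row_s in zip(rho, sigma)]
    return complex(trace(matmul(diff, diff))).real

# Illustrative qutrit states: the maximally mixed state and the pure state |0><0|.
rho_final = [[1 / 3, 0, 0], [0, 1 / 3, 0], [0, 0, 1 / 3]]
rho_target = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

overlap = hs_overlap(rho_final, rho_target)
dist_sq = hs_distance_sq(rho_final, rho_target)
print(f"overlap = {overlap:.4f}, squared distance = {dist_sq:.4f}")
```

The two objectives are linked by ${\rm Tr}((\rho-\sigma)^2) = {\rm Tr}\,\rho^2 + {\rm Tr}\,\sigma^2 - 2\,{\rm Tr}(\rho\sigma)$, so they differ through the purity of the final state, which is why maximizing the overlap and minimizing the distance are treated as separate problems.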

翻訳日:2023-08-09 14:35:53 公開日:2023-08-08

# 仮面運動モデリングによるプロンプトコントラスト:3次元動作表現学習に向けて

Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning ( http://arxiv.org/abs/2308.03975v1 )

ライセンス: Link先を確認

Jiahang Zhang, Lilang Lin, Jiaying Liu

(参考訳) 自己教師型学習は骨格に基づく人間の行動理解に有効であることが証明されている。先行研究は主に、骨格関係をモデル化するために、対比学習やマスキングモーションモデリングパラダイムに依存している。しかし,これらの手法では,シーケンスレベルと共同レベルの表現学習を効果的かつ同時に行うことはできない。その結果、学習した表現は、異なる下流タスクに一般化できない。さらに、これらの2つのパラダイムをナイーブな方法で組み合わせることで、相乗効果が失われ、トレーニングの干渉につながる可能性がある。これらの問題に対処するために、多目的な3次元動作表現学習のためのMasked Motion Modeling, PCM$^{\rm 3}$を用いたPrompted Contrastを提案する。本手法は,コントラスト学習とマスキング予測タスクを相互に有益に統合することで,下流課題の一般化能力を大幅に向上させる。具体的には、マスク付き予測は、コントラスト学習のための新しいトレーニングビューを提供し、ハイレベルなセマンティック情報でマスク付き予測トレーニングをガイドする。さらに,2つの異なるプリテキストタスクを学習することによって生じる干渉を低減し,モデル表現をさらに改善するマルチタスクプリトレーニング戦略を提案する。 3つの大規模データセットに基づく5つの下流タスクの大規模な実験を行い、PCM$^{\rm 3}$が最先端の作業と比較して優れた一般化能力を示す。私たちのプロジェクトは、https://jhang2020.github.io/Projects/PCM3/PCM3.htmlで公開されています。

Self-supervised learning has proved effective for skeleton-based human action understanding, which is an important yet challenging topic. Previous works mainly rely on contrastive learning or masked motion modeling paradigm to model the skeleton relations. However, the sequence-level and joint-level representation learning cannot be effectively and simultaneously handled by these methods. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining these two paradigms in a naive manner leaves the synergy between them untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which substantially boosts the generalization capacity for various downstream tasks. Specifically, masked prediction provides novel training views for contrastive learning, which in turn guides the masked prediction training with high-level semantic information. Moreover, we propose a dual-prompted multi-task pretraining strategy, which further improves model representations by reducing the interference caused by learning the two different pretext tasks. Extensive experiments on five downstream tasks under three large-scale datasets are conducted, demonstrating the superior generalization capacity of PCM$^{\rm 3}$ compared to the state-of-the-art works. Our project is publicly available at: https://jhang2020.github.io/Projects/PCM3/PCM3.html .

翻訳日:2023-08-09 14:35:23 公開日:2023-08-08

# クラスタ間のコストに依存する有向非巡回グラフの最適分割

Optimal partitioning of directed acyclic graphs with dependent costs between clusters ( http://arxiv.org/abs/2308.03970v1 )

ライセンス: Link先を確認

Paul Pao-Yen Wu, Fabrizio Ruggeri, Kerrie Mengersen

(参考訳) ベイジアンネットワーク(BN)、マルコフ過程、隠れマルコフモデル(HMM)を含む多くの統計的推論の文脈は、基礎となる有向非巡回グラフ(DAG)をクラスタに分割(すなわちマッピング)することで支援できる。しかし、最適化すべきコストが、クラスタ内のノードと、親ノードや子ノードを介して接続されたクラスタ(依存クラスタと呼ぶ)のマッピングの両方に依存するため、特に統計的推論では最適分割は困難である。本稿では、依存クラスタを伴う最適なクラスタマッピングのための新しいアルゴリズムDCMAPを提案する。DAGとクラスタマッピングに基づいて任意に定義された正のコスト関数が与えられたとき、DCMAPは収束してすべての最適クラスタを見つけ、その過程で最適に近い解を返すことを示す。実験的には、計算コスト関数を用いた海草複雑系のDBNモデルに対して、アルゴリズムは時間効率が高いことがわかった。25ノードと50ノードのDBNでは、探索空間のサイズはそれぞれ$9.91\times 10^9$通りと$1.51\times10^{21}$通りのクラスタマッピングであったが、最適解との類似度が88\%と72\%の近似最適解は、それぞれ反復170と865で見つかった。最初の最適解は反復934 $(\text{95\% CI } 926,971)$と反復2256 $(2150,2271)$で見つかり、そのコストはそれぞれナイーブなヒューリスティックコストの4\%と0.2\%であった。

Many statistical inference contexts, including Bayesian Networks (BNs), Markov processes and Hidden Markov Models (HMMs), could be supported by partitioning (i.e., mapping) the underlying Directed Acyclic Graph (DAG) into clusters. However, optimal partitioning is challenging, especially in statistical inference, as the cost to be optimised depends on both the nodes within a cluster and the mapping of clusters connected via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG and cluster mappings, we show that DCMAP converges to find all optimal clusters, and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a DBN model of a seagrass complex system using a computation cost function. For 25-node and 50-node DBNs, the search space size was $9.91\times 10^9$ and $1.51\times10^{21}$ possible cluster mappings, respectively, but near-optimal solutions with 88\% and 72\% similarity to the optimal solution were found at iterations 170 and 865, respectively. The first optimal solution was found at iterations 934 $(\text{95\% CI } 926,971)$ and 2256 $(2150,2271)$, with costs that were 4\% and 0.2\% of the naive heuristic cost, respectively.

翻訳日:2023-08-09 14:34:56 公開日:2023-08-08

# CheXFusion:長尺胸部X線分類のためのトランスフォーマーを用いたマルチビュー機能の有効融合

CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification ( http://arxiv.org/abs/2308.03968v1 )

ライセンス: Link先を確認

Dongkyun Kim

(参考訳) 医用画像分類は、病気の長期分布、診断所見の同時発生、各研究または患者に利用可能な複数の視点により、ユニークな課題を生んでいる。本稿ではICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-raysについて述べる。マルチビュー画像を含むトランスフォーマーベースのフュージョンモジュールであるchexfusionを提案する。セルフアテンションとクロスアテンション機構により誘導される融合モジュールはラベル共起を考慮したマルチビュー特徴を効率的に集約する。さらに、モデルの性能を最適化するデータバランシングと自己学習手法についても検討する。提案手法はMIMIC-CXRテストセットにおいて0.372 mAPで最先端の結果を達成し,競争において第1位を確保した。この課題の成功は,マルチビュー設定,クラス不均衡,ラベル共起を考慮した医用画像分類の意義を浮き彫りにする。公開コードはhttps://github.com/dongkyuk/cxr-lt-public-solutionで入手できる。

Medical image classification poses unique challenges due to the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper introduces our solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. Our approach introduces CheXFusion, a transformer-based fusion module incorporating multi-view images. The fusion module, guided by self-attention and cross-attention mechanisms, efficiently aggregates multi-view features while considering label co-occurrence. Furthermore, we explore data balancing and self-training methods to optimize the model's performance. Our solution achieves state-of-the-art results with 0.372 mAP in the MIMIC-CXR test set, securing 1st place in the competition. Our success in the task underscores the significance of considering multi-view settings, class imbalance, and label co-occurrence in medical image classification. Public code is available at https://github.com/dongkyuk/CXR-LT-public-solution

翻訳日:2023-08-09 14:34:25 公開日:2023-08-08

# 単段画像検索のためのラフ・トゥ・フィギュア:学習コンパクト識別表現

Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval ( http://arxiv.org/abs/2308.04008v1 )

ライセンス: Link先を確認

Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun

(参考訳) 画像検索ターゲットは、クエリ画像と視覚的に類似したデータベースから画像を見つける。フェッチ・アンド・リランク・パラダイムに続く2段階のメソッドは優れた性能を達成しているが、それぞれのローカルモジュールとグローバルモジュールは実世界のアプリケーションでは非効率である。検索効率と精度を向上させるため、グローバル特徴とローカル特徴を融合表現に融合して単段画像検索を行う手法もある。しかし、様々な状況、例えば$、バックグラウンド、オクルージョン、視点によって、これらは依然として困難である。本研究では,一段階画像検索のためのコンパクト識別表現 (CFCD) を学習するための粗結合フレームワークを設計する。具体的には,各ミニバッチのスケールとマージンを動的に調整し,トレーニングやクラス内コンパクト性の向上のために徐々に強化する,適応型ソフトマックスベースロスの設計を行った。さらに,グローバルスケールでクラス間識別性を最適化するためのハードネガティブサンプリング戦略により,著名な局所記述子を注意深く選択し,詳細な意味関係をグローバル表現に注入するメカニズムを提案する。 Revisited Oxford や Revisited Paris などのベンチマークを用いて,最先端の単一ステージ画像検索性能を実現する手法の有効性を実証した。コードはhttps://github.com/bassyess/CFCDで入手できる。

Image retrieval aims to find images in a database that are visually similar to the query image. Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications. To better trade off retrieval efficiency and accuracy, some approaches fuse global and local features into a joint representation to perform single-stage image retrieval. However, they still struggle with the variety of situations they must handle, e.g., background, occlusion and viewpoint. In this work, we design a Coarse-to-Fine framework to learn Compact Discriminative representation (CFCD) for end-to-end single-stage image retrieval, requiring only image-level labels. Specifically, we first design a novel adaptive softmax-based loss which dynamically tunes its scale and margin within each mini-batch and increases them progressively to strengthen supervision during training and intra-class compactness. Furthermore, we propose a mechanism which attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation by a hard negative sampling strategy to optimize inter-class distinctiveness at a global scale. Extensive experimental results have demonstrated the effectiveness of our method, which achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris. Code is available at https://github.com/bassyess/CFCD.
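
The role of the scale and margin in a softmax-based loss can be illustrated with a minimal sketch. The abstract does not give the CFCD loss or its per-mini-batch tuning rule, so a generic additive cosine-margin softmax stands in here as an assumption, with fixed `scale` and `margin` values; the function name and the toy cosines are hypothetical.

```python
import math

def margin_softmax_loss(cosines, target, scale, margin):
    """Cross-entropy over scaled cosine logits with an additive margin on the target class.

    cosines: cosine similarity between one embedding and each class centre.
    scale, margin: fixed here. The abstract says they are tuned per mini-batch,
    but the tuning rule is not given, so none is assumed.
    """
    logits = [scale * (c - margin if i == target else c)
              for i, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Toy cosines for one sample whose true class is index 0.
cosines = [0.8, 0.3, 0.1]
without_margin = margin_softmax_loss(cosines, target=0, scale=16.0, margin=0.0)
with_margin = margin_softmax_loss(cosines, target=0, scale=16.0, margin=0.3)
print(without_margin < with_margin)
```

A positive margin makes the same sample look harder, so the loss keeps pushing it toward its class centre even after it is classified correctly, which is one way to encourage intra-class compactness.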

翻訳日:2023-08-09 14:28:39 公開日:2023-08-08

# 視覚言語モデルを用いた単純な形状とテクスチャテキスト記述子を用いた医用画像分類

Few-shot medical image classification with simple shape and texture text descriptors using vision-language models ( http://arxiv.org/abs/2308.04005v1 )

ライセンス: Link先を確認

Michal Byra, Muhammad Febrian Rachmadi, Henrik Skibbe

(参考訳) 本研究では,視覚言語モデル (vlms) と大言語モデル (大言語モデル) の有用性について検討した。 gpt-4モデルを用いて,医療画像中の物体の形状とテクスチャ特性をカプセル化したテキスト記述子を生成する。次に、これらのgpt-4生成ディスクリプタと、自然画像に事前訓練されたvlmを用いて、胸部x線および胸部超音波画像の分類を行う。以上の結果から,VLMとGPT-4生成ディスクリプタを用いた医療画像の少ない分類が可能であることが示唆された。しかし、正確な分類は、ある記述子を分類スコアの計算から除外する必要がある。さらに,乳房超音波画像におけるvlmの形状特徴評価能について検討した。さらに, GPT-4 で生成したテキスト記述子の集合間の変動度について検討する。本研究は,医用画像解析へのVLMの適用について,いくつかの重要な知見を提供する。

In this work, we investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images. We utilize the GPT-4 model to generate text descriptors that encapsulate the shape and texture characteristics of objects in medical images. Subsequently, these GPT-4 generated descriptors, alongside VLMs pre-trained on natural images, are employed to classify chest X-rays and breast ultrasound images. Our results indicate that few-shot classification of medical images using VLMs and GPT-4 generated descriptors is a viable approach. However, accurate classification requires excluding certain descriptors from the calculation of the classification scores. Moreover, we assess the ability of VLMs to evaluate shape features in breast mass ultrasound images. We further investigate the degree of variability among the sets of text descriptors produced by GPT-4. Our work provides several important insights into the application of VLMs for medical image analysis.
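
Descriptor-based scoring with exclusion can be sketched as follows. The abstract does not state the scoring rule, so the mean cosine similarity between an image embedding and each class's descriptor embeddings is an assumption, and the class names, descriptor texts, and 3-dimensional vectors are hypothetical toy values rather than VLM outputs.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def class_score(image_vec, descriptor_vecs, excluded=()):
    """Mean image-to-descriptor similarity, skipping excluded descriptor names."""
    kept = [vec for name, vec in descriptor_vecs.items() if name not in excluded]
    return sum(cosine(image_vec, vec) for vec in kept) / len(kept)

# Toy embeddings; a real pipeline would take these from a VLM's image and text encoders.
image = [0.9, 0.1, 0.2]
descriptors = {
    "mass": {
        "irregular shape": [1.0, 0.0, 0.1],
        "spiculated margin": [0.8, 0.2, 0.0],
        "dark image": [0.0, 0.0, 1.0],
    },
    "normal": {
        "smooth texture": [0.1, 1.0, 0.0],
        "uniform brightness": [0.0, 0.9, 0.3],
        "dark image": [0.0, 0.0, 1.0],
    },
}
# "dark image" appears under both classes and carries no class information, so it is excluded.
scores = {label: class_score(image, vecs, excluded={"dark image"})
          for label, vecs in descriptors.items()}
prediction = max(scores, key=scores.get)
print(prediction)
```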

翻訳日:2023-08-09 14:27:58 公開日:2023-08-08

# 構造化背景知識と誘導推論を用いたCNN隠れニューロン活性化の理解

Understanding CNN Hidden Neuron Activations using Structured Background Knowledge and Deductive Reasoning ( http://arxiv.org/abs/2308.03999v1 )

ライセンス: Link先を確認

Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, Eugene Vasserman, Pascal Hitzler

(参考訳) 正確な解釈は、深層学習システムが入力に関係していると内部的に何が検出されているかについての洞察を与え、深層学習システムのブラックボックス文字を非神秘化する。その技術は、隠れたノードの活性化は、人間にとって意味のある方法で解釈可能であるが、隠れたニューロンの活性化の解釈を仮説化し検証できる体系的な自動化手法は、過小評価されていることを示している。本稿では,そのような方法を提供し,意味のある解釈を提供することを示す。提案手法は,ウィキペディアの概念階層から学習した約200万クラスの大規模バックグラウンド知識と,セマンティックWeb分野のアプリケーション向けに開発された記述論理に基づく概念推論と呼ばれるシンボリック推論手法をベースとする。以上より,畳み込みニューラルネットワークの密集層内の個々のニューロンに,背景知識から有意なラベルを仮説と検証プロセスを通じて自動的に付加できることを示す。

A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would provide insights into the question of what a deep learning system has internally detected as relevant on the input, de-mystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. Our approach is based on using large-scale background knowledge of approximately 2 million classes curated from the Wikipedia concept hierarchy, together with a symbolic reasoning approach called Concept Induction based on description logics, originally developed for applications in the Semantic Web field. Our results show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.

翻訳日:2023-08-09 14:27:16 公開日:2023-08-08

# オープンフィールド環境におけるロボットハーベスティングのための改良型YOLOv5sアーキテクチャに基づくリアルタイムイチゴ検出

Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in open-field environment ( http://arxiv.org/abs/2308.03998v1 )

ライセンス: Link先を確認

Zixuan He (1)(2), Salik Ram Khanal (1)(2), Xin Zhang (3), Manoj Karkee (1)(2), Qin Zhang (1)(2) ((1) Center for Precision and Automated Agricultural Systems, Washington State University, (2) Department of Biological Systems Engineering, Washington State University, (3) Department of Agricultural and Biological Engineering, Mississippi State University)

(参考訳) 本研究では、屋外環境下でイチゴを検知するYOLOv5を用いたカスタムオブジェクト検出モデルを提案する。 YOLOv5sの当初のアーキテクチャは、C3モジュールをバックボーンネットワークのC2fモジュールに置き換えることで変更され、より優れた機能勾配フローを提供した。第2に, YOLOv5sのバックボーンネットワークの最終層における空間ピラミッドのポーリング速度をクロスステージ部分ネットと組み合わせて, イチゴデータセットの一般化能力を向上した。提案されたアーキテクチャはYOLOv5s-Strawと名付けられた。 3つの成熟度クラス(未熟、ほぼ成熟、成熟)を持つイチゴキャノピーのrgb画像データセットは、オープンフィールド環境で収集され、輝度の低下、輝度の増大、ノイズの追加を含む一連の操作によって拡張された。オープンフィールド環境におけるイチゴ検出手法の優位性を検証するため、4つの競合検出モデル(YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, YOLOv8s)をトレーニングし、同じ計算環境下でテストし、YOLOv5s-Strawと比較した。その結果、平均平均精度は80.3%で、yolov3-tiny、yolov5s、yolov5s-c2f、yolov8では73.4%、77.8%、79.8%、79.3%であった。具体的には、YOLOv5s-Strawの平均精度は未熟なクラスで82.1%、ほぼ成熟したクラスで73.5%、成熟したクラスで86.6%であり、それぞれ2.3%と3.7%であった。モデルには8.6*10^6のネットワークパラメータがあり、1画像あたりの推論速度は18msであり、yolov8の推論速度は21.0ms、重いパラメータは11.1*10^6であった。

This study proposed a YOLOv5-based custom object detection model to detect strawberries in an outdoor environment. The original YOLOv5s architecture was modified by replacing the C3 module with the C2f module in the backbone network, which provided better feature gradient flow. In addition, the Spatial Pyramid Pooling Fast module in the final layer of the YOLOv5s backbone network was combined with Cross Stage Partial Net to improve generalization on the strawberry dataset used in this study. The proposed architecture was named YOLOv5s-Straw. An RGB image dataset of the strawberry canopy with three maturity classes (immature, nearly mature, and mature) was collected in an open-field environment and augmented through a series of operations including brightness reduction, brightness increase, and noise addition. To verify the superiority of the proposed method for strawberry detection in an open-field environment, four competitive detection models (YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s) were trained and tested under the same computational environment and compared with YOLOv5s-Straw. The proposed architecture achieved the highest mean average precision, 80.3%, whereas YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s achieved 73.4%, 77.8%, 79.8%, and 79.3%, respectively. Specifically, the average precision of YOLOv5s-Straw was 82.1% in the immature class, 73.5% in the nearly mature class, and 86.6% in the mature class, which were 2.3% and 3.7%, respectively, higher than that of the latest YOLOv8s. The model had 8.6*10^6 network parameters and an inference speed of 18ms per image, while YOLOv8s was slower, at 21.0ms per image, and heavier, with 11.1*10^6 parameters, which indicates that the proposed model is fast enough for real-time strawberry detection and localization for robotic picking.
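
The real-time claim can be checked with simple arithmetic on the figures quoted in the abstract. The sketch below converts per-image latency into frames per second and compares parameter counts; it assumes the quoted latency covers the whole per-image cost, which the abstract does not confirm.

```python
def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency, ignoring pre- and post-processing."""
    return 1000.0 / latency_ms

# Figures quoted in the abstract.
models = {
    "YOLOv5s-Straw": {"latency_ms": 18.0, "params": 8.6e6},
    "YOLOv8s": {"latency_ms": 21.0, "params": 11.1e6},
}
for name, m in models.items():
    fps = frames_per_second(m["latency_ms"])
    print(f"{name}: {fps:.1f} FPS, {m['params'] / 1e6:.1f}M parameters")

param_ratio = models["YOLOv5s-Straw"]["params"] / models["YOLOv8s"]["params"]
print(f"parameter ratio: {param_ratio:.2f}")
```

This gives roughly 56 frames per second for YOLOv5s-Straw against roughly 48 for YOLOv8s, with about 77% of the parameters.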

翻訳日:2023-08-09 14:26:45 公開日:2023-08-08

# 宇宙空間統合ネットワークにおける資源管理のための協調型マルチエージェント深層強化学習

Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks ( http://arxiv.org/abs/2308.03995v1 )

ライセンス: Link先を確認

Hengxi Zhang, Huaze Tang, Wenbo Ding, Xiao-Ping Zhang

(参考訳) sagin(space-air-ground integrated network)は、低軌道(leo)衛星、無人航空機(uavs)、地上ユーザー(gus)を含む異種デバイスを統合することで、スマートシティの応用を前進させることを約束している。しかし、SAGINの資源管理は、不適切な資源管理がデータ伝達の貧弱を招き、スマートシティのサービスに影響を及ぼすという緊急の研究を必要とする課題である。本稿では,5つの異なる通信リンクを含む総合的なSAGINシステムを開発し,資源管理問題に対処する効率的な協調型マルチエージェント深層強化学習(CMT-MARL)手法を提案する。実験結果は,提案するcmt-marlの有効性を強調するものである。これらの結果は、将来のSAGINの実装の可能性と実現可能性を示している。

The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study in that inappropriate resource management will cause poor data transmission, and hence affect the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.

翻訳日:2023-08-09 14:25:59 公開日:2023-08-08

# マルチロール教育エージェントとしてのAIチャットボット:CS教育におけるエンゲージメントの変容

AI Chatbots as Multi-Role Pedagogical Agents: Transforming Engagement in CS Education ( http://arxiv.org/abs/2308.03992v1 )

ライセンス: Link先を確認

Cassie Chen Cao, Zijian Ding, Jionghao Lin, Frank Hopfgartner

(参考訳) 本研究では,人工知能(ai)を活用したマルチロールチャットボットを,学習経験の向上とコンピュータサイエンス教育への取り組みの促進に活用する。デザインに基づく研究アプローチを活用し、インストラクターボット、ピアボット、キャリアアドバイザボット、感情支援ボットという4つの異なる役割を持つ新しい学習環境を開発し、実装し、評価する。これらの役割は、自己決定理論の信条に基づいて設計され、能力、自律性、関連性という、学習者の生来の心理学的ニーズを満たしている。さらに、このシステムは質問に基づく学習パラダイムを採用し、学生に質問をし、解決策を求め、その好奇心を探求するよう促す。我々は,このシステムを,200人の参加学生を対象に,1ヶ月の高等教育状況下でテストし,人間教師と1人のチャットボットを含む条件と比較した。本研究は,チャットログのシーケンス分析や,調査やフォーカスグループインタビューなど質的手法などの定量的手法を取り入れた混合手法を用いた。トピックモデリングや感情分析などの最先端自然言語処理技術を統合することにより,学習者の関与,動機づけ,質問に基づく学習に対するシステムの影響を深く理解する。この研究は、厳格な設計と革新的アプローチを通じて、コンピュータサイエンス教育の風景を形作り、熱心で支援的でモチベーションのある学習環境を育む上で、AIを駆使したマルチロールチャットボットの可能性に関する重要な洞察を提供する。

This study investigates the use of Artificial Intelligence (AI)-powered, multi-role chatbots as a means to enhance learning experiences and foster engagement in computer science education. Leveraging a design-based research approach, we develop, implement, and evaluate a novel learning environment enriched with four distinct chatbot roles: Instructor Bot, Peer Bot, Career Advising Bot, and Emotional Supporter Bot. These roles, designed around the tenets of Self-Determination Theory, cater to the three innate psychological needs of learners - competence, autonomy, and relatedness. Additionally, the system embraces an inquiry-based learning paradigm, encouraging students to ask questions, seek solutions, and explore their curiosities. We test this system in a higher education context over a period of one month with 200 participating students, comparing outcomes with conditions involving a human tutor and a single chatbot. Our research utilizes a mixed-methods approach, encompassing quantitative measures such as chat log sequence analysis, and qualitative methods including surveys and focus group interviews. By integrating cutting-edge Natural Language Processing techniques such as topic modelling and sentiment analysis, we offer an in-depth understanding of the system's impact on learner engagement, motivation, and inquiry-based learning. This study, through its rigorous design and innovative approach, provides significant insights into the potential of AI-empowered, multi-role chatbots in reshaping the landscape of computer science education and fostering an engaging, supportive, and motivating learning environment.

翻訳日:2023-08-09 14:25:42 公開日:2023-08-08

# NEOLAF - LLMを利用したニューラルシンボリック認知アーキテクチャ

NEOLAF, an LLM-powered neural-symbolic cognitive architecture ( http://arxiv.org/abs/2308.03990v1 )

ライセンス: Link先を確認

Richard Jiarui Tong, Cassie Chen Cao, Timothy Xueqian Lee, Guodong Zhao, Ray Wan, Feiyue Wang, Xiangen Hu, Robin Schmucker, Jinsheng Pan, Julian Quevedo, Yu Lu

(参考訳) 本稿では,知的なエージェントをモデル化し構築する統合型ニューラルシンボリック認知アーキテクチャであるnever ending open learning adaptive framework(neolaf)を提案する。 NEOLAFフレームワークは、その説明可能性、漸進学習、効率性、協調的および分散学習、ヒューマン・イン・ザ・ループの実現、自己改善により、純粋接続性および純粋シンボル的アプローチよりもインテリジェントエージェントを構築する上で優れたアプローチである。さらに,課題解決エージェントとして構築されたNEOLAFエージェントを,オープンソースのMATHデータセットから複雑な数学問題に投入する,説得力のある実験を行った。その結果、NEOLAFの優れた学習能力と、認知アーキテクチャの分野や自己改善型適応型教育システムに革命をもたらす可能性を示す。

This paper presents the Never Ending Open Learning Adaptive Framework (NEOLAF), an integrated neural-symbolic cognitive architecture that models and constructs intelligent agents. The NEOLAF framework is a superior approach to constructing intelligent agents than both the pure connectionist and pure symbolic approaches due to its explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement. The paper further presents a compelling experiment where a NEOLAF agent, built as a problem-solving agent, is fed with complex math problems from the open-source MATH dataset. The results demonstrate NEOLAF's superior learning capability and its potential to revolutionize the field of cognitive architectures and self-improving adaptive instructional systems.

翻訳日:2023-08-09 14:25:14 公開日:2023-08-08

# 3次元動的都市気候のリアルタイムシミュレーションのためのフーリエニューラルオペレータ

Fourier neural operator for real-time simulation of 3D dynamic urban microclimate ( http://arxiv.org/abs/2308.03985v1 )

ライセンス: Link先を確認

Wenhui Peng, Shaoxiang Qin, Senwen Yang, Jianchun Wang, Xue Liu, Liangzhu (Leon) Wang

(参考訳) 地球規模の都市化は、人間の快適性、健康、建築/都市エネルギー効率のための都市微小気候の重要性を強調している。主な環境影響として、建築設計や都市計画に大きな影響を与えている。都市が気候変動に備え、レジリエンス対策を効果的に実施するためには、地域の微気候を理解することが不可欠である。しかし、都市の微気候を分析するには、計算領域内の屋外パラメータの複雑な配列を屋内よりも長期にわたって考慮する必要がある。その結果, 都市微小気候の影響評価において, 計算流体力学(cfd)などの数値計算手法は計算コストが高くなる。ディープラーニング技術の台頭により、複雑な非線形相互作用とシステムダイナミクスのモデリングを加速する新たな機会が開けた。近年、フーリエニューラル演算子(FNO)は、部分微分方程式(PDE)の解法と流体力学系のモデリングの高速化に非常に有望であることが示されている。本研究では,FNOネットワークを実時間3次元都市風況シミュレーションに適用する。都市域のCFDシミュレーションから,半ラグランジュ的アプローチと分数ステップ法による大規模都市問題モデリングのための都市微気候特性のシミュレートによる訓練・試験データを生成する。数値実験により,fnoモデルは瞬時空間速度場を正確に再現できることがわかった。さらに,風向の異なる未確認データに基づくFNOモデルの評価を行い,FNOモデルが風向の異なるデータに対して良好に一般化可能であることを示す。さらに重要なことに、fnoアプローチはグラフィック処理ユニット上でミリ秒以内の予測を可能にし、3d動的都市気候のリアルタイムシミュレーションを可能にします。

Global urbanization has underscored the significance of urban microclimates for human comfort, health, and building/urban energy efficiency. They profoundly influence building design and urban planning as major environmental impacts. Understanding local microclimates is essential for cities to prepare for climate change and effectively implement resilience measures. However, analyzing urban microclimates requires considering a complex array of outdoor parameters within computational domains at the city scale over a longer period than indoors. As a result, numerical methods like Computational Fluid Dynamics (CFD) become computationally expensive when evaluating the impact of urban microclimates. The rise of deep learning techniques has opened new opportunities for accelerating the modeling of complex non-linear interactions and system dynamics. Recently, the Fourier Neural Operator (FNO) has been shown to be very promising in accelerating the solution of Partial Differential Equations (PDEs) and the modeling of fluid dynamic systems. In this work, we apply the FNO network to real-time three-dimensional (3D) urban wind field simulation. The training and testing data are generated from CFD simulation of the urban area, based on the semi-Lagrangian approach and fractional stepping method to simulate urban microclimate features for modeling large-scale urban problems. Numerical experiments show that the FNO model can accurately reconstruct the instantaneous spatial velocity field. We further evaluate the trained FNO model on unseen data with different wind directions, and the results show that the FNO model generalizes well to different wind directions. More importantly, the FNO approach can make predictions within milliseconds on the graphics processing unit, making real-time simulation of 3D dynamic urban microclimate possible.
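
The core operation of a Fourier neural operator is a convolution carried out in the frequency domain: transform the field, reweight a small number of low-frequency modes, drop the rest, and transform back. The following is a one-dimensional, single-channel sketch with fixed weights, written with a naive DFT so that it needs only the standard library. It is an illustration under those assumptions, not the paper's 3D model; in a trained FNO the weights are learned and mix several channels.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_conv_1d(signal, weights):
    """Scale the lowest len(weights) Fourier modes and drop all higher ones.

    Mode k is multiplied by weights[k]; its mirror mode n - k gets the complex
    conjugate so that a real input yields a real output. weights[0] should be
    real, and len(weights) should not exceed len(signal) // 2.
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = [0j] * n
    for k, w in enumerate(weights):
        filtered[k] = spectrum[k] * w
        if 0 < k < n - k:
            filtered[n - k] = spectrum[n - k] * w.conjugate()
    return [value.real for value in inverse_dft(filtered)]

# A slow wave (mode 1) plus a fast ripple (mode 6) on 16 grid points.
n = 16
low = [math.sin(2 * math.pi * t / n) for t in range(n)]
mixed = [low[t] + 0.5 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]

# Identity weights on modes 0..2: the ripple is truncated away, the slow wave is kept.
smoothed = spectral_conv_1d(mixed, [1 + 0j, 1 + 0j, 1 + 0j])
print(max(abs(a - b) for a, b in zip(smoothed, low)))
```

Because only a fixed number of modes is kept, the cost of the layer does not depend on fine-scale detail in the input, which is part of why inference can be fast once the model is trained.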

翻訳日:2023-08-09 14:24:57 公開日:2023-08-08

# SimplyRetrieve: プライベートで軽量な検索中心の生成AIツール

SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool ( http://arxiv.org/abs/2308.03983v1 )

ライセンス: Link先を確認

Youyang Ng, Daisuke Miyashita, Yasuto Hoshi, Yasuhiro Morioka, Osamu Torii, Tomoya Kodama, Jun Deguchi

(参考訳) 大規模言語モデル(LLM)ベースの生成AIシステムは,近年,大きな進歩を遂げている。知識検索アーキテクチャを統合することで、追加のモデル微調整を必要とせずに、事前訓練されたLLMを使用して、プライベートデータを公開可能な生成AIシステムにシームレスに統合することができる。さらに、検索中心生成(RCG)アプローチは、文脈解釈と知識記憶におけるLLMとレトリバーの役割を明確に分離する将来的な研究方向であり、より効率的な実装につながる可能性がある。 simplyretrieveはオープンソースのツールで、機械学習コミュニティへの高度な進歩に対して、ローカライズされ、軽量で、ユーザフレンドリーなインターフェースを提供することを目標としている。 SimplyRetrieveはGUIとAPIベースのRCGプラットフォームを備えており、Private Knowledge Base ConstructorとRetrieval Tuning Moduleが支援している。これらの機能を活用することで、ユーザーはプライバシ標準を維持しながら生成AIのパフォーマンスを改善するためのRCGの可能性を探ることができる。このツールはMITライセンスでhttps://github.com/RCGAI/SimplyRetrieveで入手できる。

Large Language Model (LLM) based Generative AI systems have seen significant progress in recent years. Integrating a knowledge retrieval architecture allows for seamless integration of private data into publicly available Generative AI systems using pre-trained LLMs without requiring additional model fine-tuning. Moreover, the Retrieval-Centric Generation (RCG) approach, a promising future research direction that explicitly separates the roles of LLMs and retrievers in context interpretation and knowledge memorization, potentially leads to more efficient implementation. SimplyRetrieve is an open-source tool with the goal of providing the machine learning community with a localized, lightweight, and user-friendly interface to these sophisticated advancements. SimplyRetrieve features a GUI and API based RCG platform, assisted by a Private Knowledge Base Constructor and a Retrieval Tuning Module. By leveraging these capabilities, users can explore the potential of RCG for improving generative AI performance while maintaining privacy standards. The tool is available at https://github.com/RCGAI/SimplyRetrieve with an MIT license.
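
The separation of roles described above, where the retriever supplies knowledge and the LLM only interprets the supplied context, can be sketched generically. The code below is not SimplyRetrieve's API: every function name is hypothetical, the bag-of-words embedding is a toy stand-in for a trained encoder, and the final LLM call is left out.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system would use a trained sentence encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, passages):
    """Ask the model to answer from the retrieved context only."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}\nAnswer:")

# A hypothetical private knowledge base that the pre-trained LLM has never seen.
knowledge_base = [
    "The warranty period for model X is three years.",
    "Model Y ships with a two metre power cable.",
]
question = "What is the warranty period for model X?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)
```

The private passages never enter the model's weights; they only appear in the prompt, so no fine-tuning is needed.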

翻訳日:2023-08-09 14:24:30 公開日:2023-08-08

# PARTNER: LiDAR 3Dオブジェクト検出のための極性表現のレベルアップ

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection ( http://arxiv.org/abs/2308.03982v1 )

ライセンス: Link先を確認

Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang

(参考訳) 近年、極性に基づく表現は知覚タスクにおいて有望な性質を示している。点雲を均等に分離するデカルト的アプローチに加えて,(1)異なる解像度下でのロバスト性能の優位性と(2)ストリーミングベースのアプローチの優位性から,点雲を極性グリッドとして表現する手法が選択肢として認識されている。しかし、極性表現の不均一な分割のため、最先端の極性検出法は必然的に特徴歪み問題に悩まされ、カルテシアン法と比較して非無視的な性能差が生じる。この問題に対処するため,極座標における新しい3次元物体検出器Partnerを提案する。 PartNERは、グローバル表現再構成による特徴歪みのジレンマを緩和し、検出ヘッドにインスタンスレベルの幾何情報を導入することで回帰を容易にする。大規模な実験は、ストリーミングベースの検出と異なる解像度において圧倒的な優位性を示している。さらに,本手法は,Waymo と ONCE の検証セットにおいて,3.68% と 9.15% の顕著なマージンを持つ従来の極性理論よりも優れており,最先端の手法よりも競争力のある結果が得られる。

Recently, polar-based representation has shown promising properties in perceptual tasks. In addition to Cartesian-based approaches, which separate point clouds evenly, representing point clouds as polar grids has been recognized as an alternative due to (1) its advantage in robust performance under different resolutions and (2) its superiority in streaming-based approaches. However, state-of-the-art polar-based detection methods inevitably suffer from the feature distortion problem because of the non-uniform division of polar representation, resulting in a non-negligible performance gap compared to Cartesian-based approaches. To tackle this issue, we present PARTNER, a novel 3D object detector in polar coordinates. PARTNER alleviates the dilemma of feature distortion with global representation re-alignment and facilitates the regression by introducing instance-level geometric information into the detection head. Extensive experiments show overwhelming advantages in streaming-based detection and different resolutions. Furthermore, our method outperforms the previous polar-based works with remarkable margins of 3.68% and 9.15% on the Waymo and ONCE validation sets, thus achieving competitive results over the state-of-the-art methods.
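
The non-uniform division behind the feature distortion problem can be made concrete with a short sketch. It bins a point into a polar grid cell and computes the ground area a cell covers at different ranges; the grid resolution (1 m rings, 360 sectors) and the function names are illustrative assumptions, not values from the paper.

```python
import math

def polar_cell(x, y, radial_step, angular_bins):
    """Index (ring, sector) of the polar grid cell containing the point (x, y)."""
    radius = math.hypot(x, y)
    angle = math.atan2(y, x) % (2 * math.pi)  # map the angle to [0, 2*pi)
    ring = int(radius // radial_step)
    sector = int(angle / (2 * math.pi) * angular_bins) % angular_bins
    return ring, sector

def polar_cell_area(ring, radial_step, angular_bins):
    """Area of one cell in the given ring; it is proportional to 2 * ring + 1."""
    inner = ring * radial_step
    outer = inner + radial_step
    return math.pi * (outer ** 2 - inner ** 2) / angular_bins

near = polar_cell_area(0, radial_step=1.0, angular_bins=360)
far = polar_cell_area(50, radial_step=1.0, angular_bins=360)
print(polar_cell(3.0, 4.0, radial_step=1.0, angular_bins=360), far / near)
```

With these settings a cell 50 m from the sensor covers about 101 times the area of a cell next to it, so the same object occupies very different numbers of cells depending on range, whereas a Cartesian grid uses cells of equal size everywhere.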

翻訳日:2023-08-09 14:24:11 公開日:2023-08-08

# AgentSims: 大規模言語モデル評価のためのオープンソースサンドボックス

AgentSims: An Open-Source Sandbox for Large Language Model Evaluation ( http://arxiv.org/abs/2308.04026v1 )

ライセンス: Link先を確認

Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, Qin Chen

(参考訳) ChatGPTライクな大規模言語モデル(LLM)がコミュニティで普及しているため、LLMの能力を評価する方法はオープンな問題である。既存の評価手法では,(1)制約付き評価能力,(2)脆弱なベンチマーク,(3)客観的な指標が不足している。 LLMエージェントがシミュレーション環境でタスクを完了するタスクベース評価は、上記の問題を解決するための一対一のソリューションである。 agentimsは、あらゆる分野の研究者が興味のある特定の能力をテストするための、使いやすいインフラストラクチャです。研究者は対話的なGUIにエージェントやビルディングを追加するか、メモリ、計画、ツール使用システムといった新しいサポートメカニズムを数行のコードでテストすることで、評価タスクを構築することができる。デモはhttps://agentsims.comで公開しています。

With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate the ability of LLMs is an open question. Existing evaluation methods suffer from the following shortcomings: (1) constrained evaluation abilities, (2) vulnerable benchmarks, (3) unobjective metrics. We suggest that task-based evaluation, where LLM agents complete tasks in a simulated environment, is a one-for-all solution to the above problems. We present AgentSims, an easy-to-use infrastructure for researchers from all disciplines to test the specific capacities they are interested in. Researchers can build their evaluation tasks by adding agents and buildings on an interactive GUI, or deploy and test new support mechanisms, i.e., memory, planning and tool-use systems, with a few lines of code. Our demo is available at https://agentsims.com .

翻訳日:2023-08-09 14:16:21 公開日:2023-08-08

# MSAC:音声感情認識のための複数音声属性制御法

MSAC: Multiple Speech Attribute Control Method for Speech Emotion Recognition ( http://arxiv.org/abs/2308.04025v1 )

ライセンス: Link先を確認

Yu Pan

(参考訳) 言語感情認識(SER)は、大きな進歩にもかかわらず、特に野生世界では、感情特性の複雑さとあいまいさのため、依然として困難である。最近の研究は主に認識と一般化の能力に焦点を当てているが、本研究はser法の信頼性を探求し、様々な音声属性間のデータ分布の観点から音声感情をモデル化する方法を検討する。具体的には,新たなcnnベースのserモデルを構築し,加算マージンソフトマックス損失を適用し,異なるクラスの特徴間の距離を拡大することで識別性を高めた。第2に,音声属性を明示的に制御し,感情非依存な属性の影響を軽減し,よりきめ細かい感情関連特徴を捉えるための,新しい複数音声属性制御法であるmsacを提案する。第3に,out-of-distribution detection法を用いて,提案するserワークフローの信頼性をテスト・解析する試みを行った。単一とクロスコーポレートの両方のserシナリオに関する広範な実験により,提案する統一serワークフローは,認識,一般化,信頼性性能において,ベースラインを一貫して上回っていることが示された。さらにシングルコーパスのserでは、提案するserワークフローは72.97\%のwarとiemocapコーパス上の71.76\%のuarで優れた認識結果を達成している。

Despite significant progress, speech emotion recognition (SER) remains challenging due to the inherent complexity and ambiguity of the emotion attribute, particularly in the wild. Whereas current studies primarily focus on recognition and generalization capabilities, this work pioneers an exploration into the reliability of SER methods and investigates how to model speech emotion from the perspective of data distribution across various speech attributes. Specifically, we first build a novel CNN-based SER model which adopts additive margin softmax loss to expand the distance between features of different classes, thereby enhancing their discrimination. Second, a novel multiple speech attribute control method, MSAC, is proposed to explicitly control speech attributes, enabling the model to be less affected by emotion-agnostic attributes and capture more fine-grained emotion-related features. Third, we make a first attempt to test and analyze the reliability of the proposed SER workflow using the out-of-distribution detection method. Extensive experiments on both single and cross-corpus SER scenarios show that our proposed unified SER workflow consistently outperforms the baseline in terms of recognition, generalization, and reliability performance. In addition, in single-corpus SER, the proposed SER workflow achieves superior recognition results with a WAR of 72.97\% and a UAR of 71.76\% on the IEMOCAP corpus.
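
The abstract reports WAR and UAR without defining them. Under the definitions commonly used in speech emotion recognition, which are assumed here, the weighted average recall equals overall accuracy and the unweighted average recall is the plain mean of per-class recalls. The sketch below shows how the two diverge on an imbalanced toy set; the labels are hypothetical.

```python
from collections import defaultdict

def war_uar(true_labels, predicted_labels):
    """Return (WAR, UAR): overall accuracy and the unweighted mean of per-class recalls."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    war = sum(correct.values()) / sum(total.values())
    uar = sum(correct[c] / total[c] for c in total) / len(total)
    return war, uar

# Imbalanced toy data: 8 "neutral" utterances and 2 "angry" ones, one of which is missed.
truth = ["neutral"] * 8 + ["angry"] * 2
guess = ["neutral"] * 8 + ["neutral", "angry"]
war, uar = war_uar(truth, guess)
print(war, uar)
```

Here WAR is 0.9 while UAR is 0.75, because the missed minority-class utterance costs half of that class's recall. Reporting both, as the abstract does, guards against a model that only does well on frequent emotions.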

翻訳日:2023-08-09 14:16:05 公開日:2023-08-08

# 不均衡分類とRL探索のためのスコープ損失

Scope Loss for Imbalanced Classification and RL Exploration ( http://arxiv.org/abs/2308.04024v1 )

ライセンス: Link先を確認

Hasham Burhani, Xiao Qi Shi, Jonathan Jaegerman, Daniel Balicki

(参考訳) 強化学習問題と教師付き分類問題との等価性を示す。その結果,強化学習における探索的活用のトレードオフを教師付き分類におけるデータセットの不均衡問題と同一視し,その対処方法の類似性を見出した。上記の問題の解析から,強化学習と教師付き分類のための新しい損失関数を導出する。新たな損失関数であるScope Lossは、チューニングを必要とせずに、パフォーマンス損失のオーバーエクスプロイテーションやデータセットの不均衡を防止するために、勾配を調整する。ベンチマーク強化学習タスクのバスケットとスキュー分類データセットを用いて、SOTA損失関数に対するスコープ損失を検証し、スコープ損失が他の損失関数よりも優れていることを示す。

We demonstrate equivalence between the reinforcement learning problem and the supervised classification problem. We consequently equate the exploration exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and find similarities in how they are addressed. From our analysis of the aforementioned problems we derive a novel loss function for reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances, without the need for any tuning. We test Scope Loss against SOTA loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset, and show that Scope Loss outperforms other loss functions.

翻訳日:2023-08-09 14:15:40 公開日:2023-08-08

# Harrow-Hassidim-Lloydアルゴリズムにおける量子資源

Quantum Resources in Harrow-Hassidim-Lloyd Algorithm ( http://arxiv.org/abs/2308.04021v1 )

ライセンス: Link先を確認

Pradeep Kumar, Tanoy Kanti Konar, Leela Ganesh Chandra Lakkaraju, Aditi Sen De

(参考訳) 量子アルゴリズムは、古典的なアルゴリズムの能力を超えたタスク実行のランタイムを削減できる。したがって、量子の利点に責任を持つリソースを特定することは興味深い試みである。 HHL(Harrow-Hassidim-Lloyd)アルゴリズムにおいて、非自明な線形方程式系を解くためには、二分法と真多分法の両方の非消滅量子相関が不可欠であることを示す。さらに,システム全体の非有望なl1-ノルム量子コヒーレンスとレジスタ量子ビットがアルゴリズムの成功確率と関連していることがわかった。量子資源の定量的解析により、各ステップでかなりの量の二部交絡が生成され、このアルゴリズムに必要な一方で、多部交絡内容は性能指標に逆比例することが明らかとなった。さらに,ガウス分布から選択された不完全性が制御された回転に組み込まれると,障害の強さによって多部交絡が増加し,二部交絡とコヒーレンスも減少する一方で,二部交絡とコヒーレンスも増加し,このアルゴリズムにおける二部交絡とコヒーレンスの有効性が確かめられる。

Quantum algorithms have the ability to reduce runtime for executing tasks beyond the capabilities of classical algorithms. Therefore, identifying the resources responsible for quantum advantages is an interesting endeavour. We prove that nonvanishing quantum correlations, both bipartite and genuine multipartite entanglement, are required for solving nontrivial linear systems of equations in the Harrow-Hassidim-Lloyd (HHL) algorithm. Moreover, we find a nonvanishing l1-norm quantum coherence of the entire system and the register qubit which turns out to be related to the success probability of the algorithm. Quantitative analysis of the quantum resources reveals that while a significant amount of bipartite entanglement is generated in each step and required for this algorithm, multipartite entanglement content is inversely proportional to the performance indicator. In addition, we report that when imperfections chosen from Gaussian distribution are incorporated in controlled rotations, multipartite entanglement increases with the strength of the disorder, albeit error also increases while bipartite entanglement and coherence decreases, confirming the beneficial role of bipartite entanglement and coherence in this algorithm.

翻訳日:2023-08-09 14:15:29 公開日:2023-08-08
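
The HHL abstract above relates the l1-norm of quantum coherence to the algorithm's success probability. As a reminder of what that quantity is, the following minimal sketch computes the standard l1-norm coherence, the sum of the absolute values of the off-diagonal entries of a density matrix in a fixed basis. The function name and the two single-qubit example states are illustrative assumptions and are not taken from the paper.

```python
def l1_coherence(rho):
    """l1-norm coherence: sum of |rho[i][j]| over all off-diagonal entries."""
    n = len(rho)
    return sum(abs(rho[i][j]) for i in range(n) for j in range(n) if i != j)


# Illustrative single-qubit states (assumed examples, not from the paper).
plus_state = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|, maximally coherent qubit
mixed_state = [[0.5, 0.0], [0.0, 0.5]]   # I/2, no coherence in this basis

print(l1_coherence(plus_state))   # off-diagonals 0.5 + 0.5
print(l1_coherence(mixed_state))  # no off-diagonal weight
```

Because `abs` also handles complex numbers, the same function works for density matrices with complex off-diagonal entries.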

# 大規模無条件事前訓練による合成強化

Synthetic Augmentation with Large-scale Unconditional Pre-training ( http://arxiv.org/abs/2308.04020v1 )

ライセンス: Link先を確認

Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue

(参考訳) 深層学習に基づく医用画像認識システムは、専門家のアノテーションによるかなりの量のトレーニングデータを必要とすることが多く、その取得には費用と時間がかかる。近年,クラスラベルに条件付けされたリアルな画像を生成することで問題を緩和する合成拡張技術が提案されている。しかし、これらの手法の有効性は、訓練された生成モデルの表現能力に大きく依存しており、その表現能力は十分なラベル付きトレーニングデータなしでは保証できない。アノテートデータへの依存をさらに減らすために,大規模なラベルなしデータセットで事前学習し,後に小規模ラベル付きデータセットに適用して拡張トレーニングを行える,HistoDiffusionと呼ばれる合成拡張法を提案する。特に,多種多様なラベルなしデータセット上で潜在拡散モデル(LDM)をトレーニングし,共通特徴を学習して,条件付き入力なしで現実的な画像を生成する。次に,未見のラベル付きデータセット上で,潜在空間の分類器ガイダンスを用いてモデルを微調整し,特定のカテゴリの画像を合成できるようにする。さらに,ターゲットラベルと一致する信頼度が高い合成サンプルのみを追加する選択的な機構を採用した。提案手法は,3つの病理組織学的データセットで事前学習し,事前学習データセットから除外した大腸癌(CRC)の病理組織学的データセットでテストすることで評価する。 HistoDiffusionによる拡張により,元のラベルの小さなサブセットを用いて,バックボーン分類器の分類精度が6.4%向上した。私たちのコードはhttps://github.com/karenyyy/HistoDiffAug で利用可能です。

Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism to only add synthetic samples with high confidence of matching to target labels. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.

翻訳日:2023-08-09 14:15:03 公開日:2023-08-08

# 敵対的攻撃による半教師付き学習の性能向上

Improving Performance of Semi-Supervised Learning by Adversarial Attacks ( http://arxiv.org/abs/2308.04018v1 )

ライセンス: Link先を確認

Dongyoon Yang, Kunwoong Kim, Yongdai Kim

(参考訳) 半教師付き学習(SSL)アルゴリズムは、大量のラベル付きデータへのアクセスが難しいという現実的な仮定に基づいている。本研究では,最近のSSLアルゴリズムの性能向上を目的として,SCAR(Selecting Clean samples with Adversarial Robustness)という一般化されたフレームワークを提案する。半教師付きで事前学習したモデルを敵対的に攻撃することにより,本フレームワークは画像分類において大幅な改善を示す。本稿では,敵対的攻撃によって,現在の予測でラベル付けすべき高信頼度のラベルなしデータをどのように選択できるかを紹介する。 CIFAR10では、SCARを組み合わせた3つの最近のSSLアルゴリズムで画像分類が大幅に改善された。

Semi-supervised learning (SSL) algorithm is a setup built upon a realistic assumption that access to a large amount of labeled data is tough. In this study, we present a generalized framework, named SCAR, standing for Selecting Clean samples with Adversarial Robustness, for improving the performance of recent SSL algorithms. By adversarially attacking pre-trained models with semi-supervision, our framework shows substantial advances in classifying images. We introduce how adversarial attacks successfully select high-confident unlabeled data to be labeled with current predictions. On CIFAR10, three recent SSL algorithms with SCAR result in significantly improved image classification.

翻訳日:2023-08-09 14:14:37 公開日:2023-08-08

# グループレコメンデーションのための多粒度アテンションモデル

Multi-Granularity Attention Model for Group Recommendation ( http://arxiv.org/abs/2308.04017v1 )

ライセンス: Link先を確認

Jianye Ji, Jiayan Pei, Shaochuan Lin, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu

(参考訳) グループレコメンデーションは、共通の興味、好み、特徴に基づいて、ユーザーグループにパーソナライズされたレコメンデーションを提供する。最近の研究では、個人の好みを統合し、グループ全体に役立つ集団的な決定を下す様々な方法が研究されている。しかし、それらの多くは行動データが豊富なユーザに強く依存し、行動データが比較的疎なユーザの潜在的嗜好を無視しているため、個人の興味の学習が不十分になる。この課題に対処するために,複数レベルの粒度(すなわちサブセット,グループ,スーパーセット)を活用して,グループメンバーの潜在的嗜好を明らかにし,推薦ノイズを軽減する新しい手法であるMGAM(Multi-Granularity Attention Model)を提案する。特に,ユーザの過去のアイテムとのインタラクションを取り入れ,階層的な機構を活用することで,ユーザの潜在的なサブセットレベルの嗜好表現を強化するサブセット選好抽出モジュールを提案する。さらに,グループ選好抽出モジュールとスーパーセット選好抽出モジュールを導入し,ユーザ本来の選好を維持するグループレベルと,グループ間の外部情報を含むスーパーセットレベルという2つのレベルにおいて,ユーザの潜在選好を探索する。提案手法は,サブセットレベルの埋め込み,グループレベルの埋め込み,スーパーセットレベルの埋め込みを組み込むことにより,複数の粒度にわたるグループレコメンデーションノイズを効果的に低減し,個々の興味を包括的に学習する。大規模なオフラインおよびオンライン実験により,本手法の優れた性能が実証された。

Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics. Current studies have explored different methods for integrating individual preferences and making collective decisions that benefit the group as a whole. However, most of them heavily rely on users with rich behavior and ignore latent preferences of users with relatively sparse behavior, leading to insufficient learning of individual interests. To address this challenge, we present the Multi-Granularity Attention Model (MGAM), a novel approach that utilizes multiple levels of granularity (i.e., subsets, groups, and supersets) to uncover group members' latent preferences and mitigate recommendation noise. Specially, we propose a Subset Preference Extraction module that enhances the representation of users' latent subset-level preferences by incorporating their previous interactions with items and utilizing a hierarchical mechanism. Additionally, our method introduces a Group Preference Extraction module and a Superset Preference Extraction module, which explore users' latent preferences on two levels: the group-level, which maintains users' original preferences, and the superset-level, which includes group-group exterior information. By incorporating the subset-level embedding, group-level embedding, and superset-level embedding, our proposed method effectively reduces group recommendation noise across multiple granularities and comprehensively learns individual interests. Extensive offline and online experiments have demonstrated the superiority of our method in terms of performance.

翻訳日:2023-08-09 14:14:28 公開日:2023-08-08

# 構成ゼロショット学習のための階層的ビジュアルプリミティブエキスパート

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning ( http://arxiv.org/abs/2308.04016v1 )

ライセンス: Link先を確認

Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

(参考訳) Compositional zero-shot learning (CZSL) は、既知のプリミティブ(属性とオブジェクト)の事前知識を用いて、未知の構成を認識することを目的としている。 CZSLのこれまでの研究は、属性とオブジェクト間の文脈性、視覚的特徴の識別可能性、および現実世界の構成データのロングテール分布の扱いに苦しむことが多かった。このような問題に対処するために,コンポジショントランスフォーマー(CoT)と呼ばれるシンプルでスケーラブルなフレームワークを提案する。 CoTは、視覚ネットワークを階層的に利用し、オブジェクトエキスパートと属性エキスパートをそれぞれ異なる方法で用いて、代表的な埋め込みを生成する。オブジェクトエキスパートは、最終層からボトムアップ方式で代表的なオブジェクト埋め込みを抽出し、属性エキスパートは、文脈性を明示的にモデル化するオブジェクト誘導アテンションモジュールを用いて、トップダウン方式で属性埋め込みを生成する。不均衡なデータ分布に起因する偏った予測を緩和するために,2つの画像を混合して仮想サンプルを合成し,少数属性クラスをオーバーサンプリングする,シンプルなマイノリティ属性拡張(MAA)を開発した。提案手法は,MIT-States,C-GQA,VAW-CZSLなど,いくつかのベンチマークでSoTA性能を実現する。また,CoTが視覚的識別を改善し,不均衡なデータ分布に起因するモデルバイアスに対処する効果を示す。コードはhttps://github.com/HanjaeKim98/CoT で入手できる。

Compositional zero-shot learning (CZSL) aims to recognize unseen compositions with prior knowledge of known primitives (attribute and object). Previous works for CZSL often suffer from grasping the contextuality between attribute and object, as well as the discriminability of visual features, and the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. CoT employs object and attribute experts in distinctive manners to generate representative embeddings, using the visual network hierarchically. The object expert extracts representative object embeddings from the final layer in a bottom-up manner, while the attribute expert makes attribute embeddings in a top-down manner with a proposed object-guided attention module that models contextuality explicitly. To remedy biased prediction caused by imbalanced data distribution, we develop a simple minority attribute augmentation (MAA) that synthesizes virtual samples by mixing two images and oversampling minority attribute classes. Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the effectiveness of CoT in improving visual discrimination and addressing the model bias from the imbalanced data distribution. The code is available at https://github.com/HanjaeKim98/CoT.

翻訳日:2023-08-09 14:14:02 公開日:2023-08-08

# 大規模言語モデルの継続的な事前学習: モデルをいかに(再)ウォームするか?

Continual Pre-Training of Large Language Models: How to (re)warm your model? ( http://arxiv.org/abs/2308.04014v1 )

ライセンス: Link先を確認

Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

(参考訳) 大規模言語モデル(LLM)は数十億のトークンで定期的に事前訓練されるが、新しいデータが利用可能になるたびにそのプロセスを最初からやり直している。より安価で効率的な解決策は、これらのモデルの継続的な事前トレーニング、すなわちスクラッチから再訓練する代わりに、事前訓練済みモデルを新しいデータで更新することを可能にすることである。しかし、新しいデータによって誘導される分布シフトは、通常、過去のデータにおける性能劣化をもたらす。本研究は,効率的な継続事前学習に向けた一歩として,異なるウォームアップ戦略の効果を検討する。私たちの仮説は、新しいデータセットでトレーニングするときの計算効率を改善するために、学習率を再び高めなければならないということです。我々は,Pile(上流データ,300Bトークン)上で事前トレーニングされたモデルをSlimPajama(下流データ,297Bトークン)で継続して事前トレーニングする際のウォームアップフェーズについて,線形ウォームアップおよびコサイン減衰スケジュールに従って検討した。我々はPythia 410M言語モデルアーキテクチャで全ての実験を行い、検証パープレキシティによって性能を評価する。我々は,事前学習チェックポイント,最大学習率,ウォームアップ長の異なる実験を行った。私たちの結果は、モデルのリワーミングが最初は上流データと下流データの損失を増加させる一方で、長期的には下流の性能を改善し、大規模な下流データセットに対してさえ、スクラッチからトレーニングされたモデルを上回ることを示しています。

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch, even for a large downstream dataset.

翻訳日:2023-08-09 14:13:38 公開日:2023-08-08
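
The continual pre-training abstract above studies a linear warmup followed by cosine decay, with the maximum learning rate and warmup length as the varied knobs. The sketch below shows one common way to write such a schedule. The function name, the minimum learning rate, and all numeric values are assumptions chosen for illustration and are not the paper's settings.

```python
import math


def lr_at(step, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Warmup: learning rate grows linearly with the step index.
        return max_lr * step / warmup_steps
    # Decay: fraction of the post-warmup budget consumed, clipped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings for a continued-training run on a new dataset.
schedule = [lr_at(s, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000)
            for s in range(0, 1001, 100)]
print(schedule)
```

In this framing, "rewarming" simply means restarting the schedule at step 0 when training resumes on the new data, so the learning rate climbs back to `max_lr` before decaying again.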

# 観測ネットワークデータから因果効果を推定するための一般化境界

Generalization bound for estimating causal effects from observational network data ( http://arxiv.org/abs/2308.04011v1 )

ライセンス: Link先を確認

Ruichu Cai, Zeqin Yang, Weilin Chen, Yuguang Yan, Zhifeng Hao

(参考訳) 観測ネットワークデータから因果効果を推定することは重要であるが難しい問題である。観測ネットワークデータに対する因果推論の既存研究には一般化境界の解析が欠けている。一般化境界は,理論的には複雑な交絡バイアスの緩和を裏付け,実践的には学習目的関数の設計を原理的に導くことができる。このギャップを埋めるために, 1)同時傾向スコアに基づく再重み付けスキーマと 2)IPM(Integral Probability Metric)に基づく表現学習スキーマを活用して,ネットワークシナリオにおける因果効果推定のための一般化境界を導出する。我々は、再重み付けと表現学習のそれぞれの観点から、一般化境界に関する2つの視点を提供する。本稿では,境界の分析に動機づけられ,表現学習で強化した同時傾向スコアに基づく重み付け回帰法を提案する。半合成データを持つ2つの実世界のネットワークに関する広範囲な実験により,本アルゴリズムの有効性が示された。

Estimating causal effects from observational network data is a significant but challenging problem. Existing works in causal inference for observational network data lack an analysis of the generalization bound, which can theoretically provide support for alleviating the complex confounding bias and practically guide the design of learning objectives in a principled manner. To fill this gap, we derive a generalization bound for causal effect estimation in network scenarios by exploiting 1) the reweighting schema based on joint propensity score and 2) the representation learning schema based on Integral Probability Metric (IPM). We provide two perspectives on the generalization bound in terms of reweighting and representation learning, respectively. Motivated by the analysis of the bound, we propose a weighting regression method based on the joint propensity score augmented with representation learning. Extensive experimental studies on two real-world networks with semi-synthetic data demonstrate the effectiveness of our algorithm.

翻訳日:2023-08-09 14:13:09 公開日:2023-08-08

# 形状最適化における異常検出と設計空間次元削減のための生成モデル

Generative Models for Anomaly Detection and Design-Space Dimensionality Reduction in Shape Optimization ( http://arxiv.org/abs/2308.04051v1 )

ライセンス: Link先を確認

Danny D'Agostino

(参考訳) 本研究は, グローバル最適化アルゴリズムの効率を向上させると同時に, 最適化プロセスにおいて幾何学的異常のない高品質な設計の生成を促進するという2つの目的を持つ, 新たな形状最適化手法を提案する。これは、元の設計変数の数を減らして幾何学的分散が最大化される新しい縮小部分空間を定義し、因子分析や確率的主成分分析のような確率的線形潜在変数モデルを介してデータの基底となる生成過程をモデル化することで達成される。形状修正法が線形であり, 設計変数が一様にランダムにサンプリングされる場合, 中心極限定理の直接適用により, データはガウス分布にほぼ従うことを示す。モデルの不確かさはマハラノビス距離で測定され、異常な設計はこの指標の値が高くなる傾向があることを示す。これにより、異常なジオメトリにペナルティを与え、最適化ループ中に回避する新しい最適化モデルの定義が可能になる。この手法は、形状最適化問題の国際ベンチマークとして広く用いられているDTMB 5415モデルの船体形状最適化に適用される。グローバル最適化ルーチンはベイズ最適化とDIRECTアルゴリズムを用いて実行される。数値計算結果から,新しいフレームワークはグローバル最適化アルゴリズムの収束性を向上させる一方で,高品質な幾何学的特徴を持つ設計のみが最適化ルーチンによって生成され,計算コストの高い貴重なシミュレーションの浪費を回避できることが示された。

Our work presents a novel approach to shape optimization, that has the twofold objective to improve the efficiency of global optimization algorithms while promoting the generation of high-quality designs during the optimization process free of geometrical anomalies. This is accomplished by reducing the number of the original design variables defining a new reduced subspace where the geometrical variance is maximized and modeling the underlying generative process of the data via probabilistic linear latent variable models such as Factor Analysis and Probabilistic Principal Component Analysis. We show that the data follows approximately a Gaussian distribution when the shape modification method is linear and the design variables are sampled uniformly at random, due to the direct application of the central limit theorem. The model uncertainty is measured in terms of Mahalanobis distance, and the paper demonstrates that anomalous designs tend to exhibit a high value of this metric. This enables the definition of a new optimization model where anomalous geometries are penalized and consequently avoided during the optimization loop. The procedure is demonstrated for hull shape optimization of the DTMB 5415 model, extensively used as an international benchmark for shape optimization problems. The global optimization routine is carried out using Bayesian Optimization and the DIRECT algorithm. From the numerical results, the new framework improves the convergence of global optimization algorithms, while only designs with high-quality geometrical features are generated through the optimization routine thereby avoiding the wastage of precious computationally expensive simulations.

翻訳日:2023-08-09 14:07:32 公開日:2023-08-08
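
The shape-optimization abstract above scores designs by Mahalanobis distance and treats high values as a sign of an anomalous geometry. The following toy sketch illustrates only that scoring idea, in two dimensions and directly on raw design vectors. The paper computes the distance under a fitted latent variable model, so this is a simplified stand-in, and the sample designs and function names are assumptions.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and unbiased 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)) using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Hypothetical "design" samples; their covariance happens to be the identity.
designs = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
mean, cov = mean_and_cov_2d(designs)

print(mahalanobis_2d((1.0, 1.0), mean, cov))  # a typical design, at the mean
print(mahalanobis_2d((4.0, 5.0), mean, cov))  # a far-away, suspicious design
```

A penalized objective could then add a term that grows with this distance, which is the general mechanism the abstract describes for steering the optimizer away from anomalous shapes.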

# SODFormer: イベントとフレームを用いたTransformerによるストリーミングオブジェクト検出

SODFormer: Streaming Object Detection with Transformer Using Events and Frames ( http://arxiv.org/abs/2308.04047v1 )

ライセンス: Link先を確認

Dianze Li, Jianing Li, Yonghong Tian

(参考訳) DAVISカメラは、非同期イベントとフレームという相補的な2つのセンシングモダリティをストリーミングし、物体検出の主要な課題(例えば、高速モーションによるぼけや低照度)に対処するために徐々に使われるようになっている。しかし、リッチな時間的手がかりを効果的に活用し、2つの異種視覚ストリームを融合する方法は、依然として困難な試みである。この課題に対処するために,イベントとフレームを統合して非同期にオブジェクトを連続的に検出する初めての手法として,Transformerを用いた新しいストリーミングオブジェクト検出器SODFormerを提案する。まず,1080.1k以上の手動ラベルを持つ大規模マルチモーダルニューロモルフィック物体検出データセット(PKU-DAVIS-SOD)を構築する。次に,エンドツーエンドのシーケンス予測問題としてオブジェクトを検出する時空間Transformerアーキテクチャを設計する。ここで,新しい時間的Transformerモジュールは2つの視覚ストリームからのリッチな時間的手がかりを利用して検出性能を向上させる。最後に、非同期アテンションベースの融合モジュールを提案し、2つの異種センシングモダリティを統合して双方の相補的な利点を生かす。このモジュールは任意の時刻に問い合わせてオブジェクトの位置を特定でき、同期フレームベースの融合戦略による出力周波数の制限を打破できる。その結果,提案するSODFormerは,4つの最先端手法と我々の8つのベースラインを大幅に上回った。また、従来のフレームベースカメラが失敗する場合、例えば、高速モーションや低照度条件などでも、統一フレームワークがうまく機能することを示す。データセットとコードはhttps://github.com/dianzl/SODFormer から入手可能です。

DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges (e.g., fast motion blur and low-light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual streams to improve the detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate two heterogeneous sensing modalities and take complementary advantages from each end, which can be queried at any time to locate objects and break through the limited output frequency from synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code can be available at https://github.com/dianzl/SODFormer.

翻訳日:2023-08-09 14:07:07 公開日:2023-08-08

# 任意の二次集団-スピン相互作用を持つ非線形時間反転干渉法

Nonlinear time-reversal interferometry with arbitrary quadratic collective-spin interaction ( http://arxiv.org/abs/2308.04042v1 )

ライセンス: Link先を確認

Zhiyao Hu, Qixian Li, Xuanchen Zhang, He-bin Zhang, Long-Gang Huang, Yong-Chun Liu

(参考訳) 原子非線形干渉法は量子計測や量子情報科学に広く応用されている。本稿では、任意の二次集団スピン相互作用によって生じるスピンスクイーズに基づいて、高いロバスト性と計測利得を有する非線形時間反転干渉法を提案する。この相互作用はLipkin-Meshkov-Glick(LMG)モデルで記述できる。スクイーズ過程, 符号化過程, アンチスクイーズ過程を最適化し, LMGモデルの2つの特定のケースである1軸ねじれと2軸ねじれが, それぞれロバスト性と精度で優れていることを見出した。さらに,原子系における等価な時間反転を実現するFloquet駆動方式を提案し,精度,ロバスト性,操作性において高い性能を得る。本研究は,原子非線形干渉法において高精度かつ高ロバスト性を達成するためのベンチマークを設定する。

Atomic nonlinear interferometry has wide applications in quantum metrology and quantum information science. Here we propose a nonlinear time-reversal interferometry scheme with high robustness and metrological gain based on the spin squeezing generated by arbitrary quadratic collective-spin interaction, which could be described by the Lipkin-Meshkov-Glick (LMG) model. We optimize the squeezing process, encoding process, and anti-squeezing process, finding that the two particular cases of the LMG model, one-axis twisting and two-axis twisting outperform in robustness and precision, respectively. Moreover, we propose a Floquet driving method to realize equivalent time reverse in the atomic system, which leads to high performance in precision, robustness, and operability. Our study sets a benchmark in achieving high precision and robustness in atomic nonlinear interferometry.

翻訳日:2023-08-09 14:06:37 公開日:2023-08-08

# InfeRE: 推論チェーンによるステップバイステップの正規表現生成

InfeRE: Step-by-Step Regex Generation via Chain of Inference ( http://arxiv.org/abs/2308.04041v1 )

ライセンス: Link先を確認

Shuai Zhang, Xiaodong Gu, Yuting Chen, Beijun Shen

(参考訳) 自然言語記述から正規表現(略してregex)を自動生成するNL2REは、新たな研究領域となっている。先行研究は、regexをトークンの線形列として扱い、最終的な式を単一のパスで自己回帰的に生成する。それらは最終結果の背後にある内部のテキストマッチング処理をステップバイステップで考慮していない。これは、ニューラル言語モデルによるregex生成の有効性と解釈性を著しく阻害する。本稿では,regexの生成をステップバイステップ推論の連鎖に分解する,InfeREと呼ばれる新しいパラダイムを提案する。頑健性を高めるために,異なるモデルからサンプリングされた複数の出力をアンサンブルする自己一貫性復号機構を導入する。我々は、NL-RX-TurkとKB13の2つの公開データセット上でInfeREを評価し、その結果を最先端のアプローチおよび人気のツリーベース生成アプローチであるTRANXと比較した。実験の結果、InfeREは以前のベースラインを大幅に上回り、2つのデータセットでそれぞれ16.3%と14.7%のDFA@5精度の向上が得られた。特にInfeREは、DFA@5の精度で、2つのデータセットにおいて、人気のツリーベースの生成アプローチをそれぞれ18.1%、11.3%上回っている。

Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance the robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3% and 14.7% improvement in DFA@5 accuracy on two datasets, respectively. Particularly, InfeRE outperforms the popular tree-based generation approach by 18.1% and 11.3% on both datasets, respectively, in terms of DFA@5 accuracy.

翻訳日:2023-08-09 14:06:21 公開日:2023-08-08
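
The InfeRE abstract above mentions a self-consistency decoding step that ensembles several sampled outputs. The abstract does not say how candidates are compared, so the sketch below makes an explicit assumption: candidate regexes are grouped by how they behave on a handful of probe strings, and a member of the largest group wins. The function name, the candidates, and the probe strings are all hypothetical.

```python
import re
from collections import defaultdict


def vote_by_behavior(candidates, probes):
    """Pick a regex from the largest group of candidates that agree on all probes."""
    groups = defaultdict(list)
    for pattern in candidates:
        try:
            compiled = re.compile(pattern)
        except re.error:
            continue  # discard candidates that are not valid regexes
        # Signature: which probe strings the pattern matches in full.
        signature = tuple(bool(compiled.fullmatch(p)) for p in probes)
        groups[signature].append(pattern)
    if not groups:
        return None
    # Ties go to the group that was seen first (dicts keep insertion order).
    return max(groups.values(), key=len)[0]


# Hypothetical sampled outputs for "one or more digits"; the last one is malformed.
candidates = [r"\d+", "[0-9]+", "[0-9]*", "[a-z]+("]
probes = ["", "7", "42", "a1"]

print(vote_by_behavior(candidates, probes))
```

Agreement on a few probes is only a rough proxy for true equivalence; a stricter variant would compare the automata of the candidates, which is the notion behind the DFA-based accuracy the abstract reports.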

# マーモセット脳における遺伝子発現画像の同時分解と位置合わせのための暗黙的神経表現

Implicit neural representations for joint decomposition and registration of gene expression images in the marmoset brain ( http://arxiv.org/abs/2308.04039v1 )

ライセンス: Link先を確認

Michal Byra, Charissa Poon, Tomomi Shimogori, Henrik Skibbe

(参考訳) 本稿では,解剖学的構造が類似した一対の脳画像を位置合わせする際に,一方の画像に他方の画像には存在しない特徴やアーティファクトが含まれているという難しい課題を解決する,暗黙的神経表現に基づく新しい画像位置合わせ法を提案する。その有効性を示すために,マーモセット脳の2次元顕微鏡$\textit{in situ}$ハイブリダイゼーション遺伝子発現画像を用いた。遺伝子発現を正確に定量化するには、脳テンプレートへの画像位置合わせが必要であるが,発現パターンの多様性によって可視の解剖学的脳構造にばらつきが生じるため,これは困難である。提案手法では,暗黙的ネットワークと画像排除損失を併用して,位置合わせと,サポート画像および残差画像への画像分解を同時に行う。サポート画像はテンプレートとよく一致し、残差画像はテンプレートから乖離した個々の画像特性を捉える。実験では,提案手法は優れた結果を与え,他の位置合わせ手法よりも優れていた。

We propose a novel image registration method based on implicit neural representations that addresses the challenging problem of registering a pair of brain images with similar anatomical structures, but where one image contains additional features or artifacts that are not present in the other image. To demonstrate its effectiveness, we use 2D microscopy $\textit{in situ}$ hybridization gene expression images of the marmoset brain. Accurately quantifying gene expression requires image registration to a brain template, which is difficult due to the diversity of patterns causing variations in visible anatomical brain structures. Our approach uses implicit networks in combination with an image exclusion loss to jointly perform the registration and decompose the image into a support and residual image. The support image aligns well with the template, while the residual image captures individual image characteristics that diverge from the template. In experiments, our method provided excellent results and outperformed other registration techniques.

翻訳日:2023-08-09 14:06:01 公開日:2023-08-08

# 非構造データセットを用いたTF-IDF特徴量法と解析の比較検討

A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset ( http://arxiv.org/abs/2308.04037v1 )

ライセンス: Link先を確認

Mamata Das, Selvakumar K., P.J.A. Alphonse

(参考訳) テキスト分類は、テキストを関連するカテゴリに分類するプロセスであり、そのアルゴリズムは多くの自然言語処理(NLP)の中核にある。 TF-IDF (Term Frequency-Inverse Document Frequency) とNLPは、テキスト分類において最もよく用いられる情報検索手法である。本研究では,非構造化データのテキスト分類における特徴重み付け手法の検討と解析を行った。提案モデルは,感情分析のためのIMDB映画レビューとAmazon Alexaレビューのデータセット上で,N-GramとTF-IDFの2つの特徴を検討した。次に、最先端の分類器、すなわちSVM(Support Vector Machine)、ロジスティック回帰(Logistic Regression)、Multinomial Naive Bayes(Multinomial NB)、ランダムフォレスト(Random Forest)、決定木(Decision Tree)、k-nearest neighbors(KNN)を用いて手法を検証した。これら2つの特徴抽出のうち,N-Gramに基づく特徴よりもTF-IDF特徴のほうが顕著に向上した。 TF-IDFはランダムフォレスト分類器で最大の精度(93.81%)、適合率(94.20%)、再現率(93.81%)、F1スコア(91.99%)を得た。

Text Classification is the process of categorizing text into the relevant categories and its algorithms are at the core of many Natural Language Processing (NLP). Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most highly used information retrieval methods in text classification. We have investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considered two features N-Grams and TF-IDF on the IMDB movie reviews and Amazon Alexa reviews dataset for sentiment analysis. Then we have used the state-of-the-art classifier to validate the method i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). From those two feature extractions, a significant increase in feature extraction with TF-IDF features rather than based on N-Gram. TF-IDF got the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) value in Random Forest classifier.

翻訳日:2023-08-09 14:05:43 公開日:2023-08-08
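
The TF-IDF abstract above compares feature weighting schemes without spelling out the weighting itself. For orientation, this minimal sketch computes the textbook form, term frequency times log(N / document frequency), on whitespace-tokenized text. It is not the pipeline used in the paper, libraries often apply smoothed or normalized variants, and the three toy reviews are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({term: (count / total) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return weights


# Toy reviews (hypothetical data, not from the IMDB or Alexa datasets).
reviews = ["great movie great cast", "boring movie", "great sound"]
for row in tf_idf(reviews):
    print(row)
```

Terms that appear in fewer documents receive larger weights, which is why "boring" outweighs "movie" in the second review even though both occur once there.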

# 無線通信仕様情報合成のための基礎モデルの適用

Adapting Foundation Models for Information Synthesis of Wireless Communication Specifications ( http://arxiv.org/abs/2308.04033v1 )

ライセンス: Link先を確認

Manikanta Kotaru

(参考訳) 現代の無線通信技術を理解し、開発し、研究するための既存のアプローチは、多くのWebページや技術仕様文書を精査し、必要な情報を収集し、それを合成する時間集約的で厳しいプロセスである。本稿では,無線通信仕様の情報合成のための対話型人工知能であるNextGen Communications Copilotを提案する。このシステムは、基盤モデルの最近の進歩の上に構築され、ドメイン固有データベース、コンテキスト抽出器、フィードバックメカニズムの3つの主要な追加コンポーネントで構成されている。このシステムは、無線技術仕様のデータベースから抽出された簡潔でクエリ依存のコンテキスト情報と、専門家のフィードバックとデータコントリビューションのためのツールを付加する。対象物の専門家によるクエリと参照応答のベンチマークデータセットを用いた評価では、ChatGPTのような最先端ツールによって達成された0.07と0.59の値と比較して、平均BLEUスコアとBERTScore F1測定値0.37と0.79との関連性および正確な回答を示した。

Existing approaches to understanding, developing and researching modern wireless communication technologies involves time-intensive and arduous process of sifting through numerous webpages and technical specification documents, gathering the required information and synthesizing it. This paper presents NextGen Communications Copilot, a conversational artificial intelligence tool for information synthesis of wireless communication specifications. The system builds on top of recent advancements in foundation models and consists of three key additional components: a domain-specific database, a context extractor, and a feedback mechanism. The system appends user queries with concise and query-dependent contextual information extracted from a database of wireless technical specifications and incorporates tools for expert feedback and data contributions. On evaluation using a benchmark dataset of queries and reference responses created by subject matter experts, the system demonstrated more relevant and accurate answers with an average BLEU score and BERTScore F1-measure of 0.37 and 0.79 respectively compared to the corresponding values of 0.07 and 0.59 achieved by state-of-the-art tools like ChatGPT.

翻訳日:2023-08-09 14:05:15 公開日:2023-08-08

# 人間の感情の不確かさの測定

Measure of Uncertainty in Human Emotions ( http://arxiv.org/abs/2308.04032v1 )

ライセンス: Link先を確認

Etienne Naude (1), Henry Gann (1), Balaram Panda (1), Lance Zhang (1), Raina Song (1), Yuwei Shen (1) ((1) The University of Auckland)

(参考訳) 多くの研究は、コンピュータがいかに人間によって表示された感情を検査し、そのデータを使って異なるタスクを遂行できるかを調査している。しかし,ユーザの意思決定やタスクの実行を支援するために,感情分類情報を生成するコンピュータ能力を評価する研究はほとんどない。これは、人間とコンピュータの双方向コミュニケーションにとって最重要となるため、探究すべき重要な領域である。本研究では,感情分類の異なる不確実性情報表示が意思決定プロセスに与える影響を検討する実験を行った。その結果,不確実性情報を表示することで,意思決定に自信が持てることがわかった。

Many research explore how well computers are able to examine emotions displayed by humans and use that data to perform different tasks. However, there have been very few research which evaluate the computers ability to generate emotion classification information in an attempt to help the user make decisions or perform tasks. This is a crucial area to explore as it is paramount to the two way communication between humans and computers. This research conducted an experiment to investigate the impact of different uncertainty information displays of emotion classification on the human decision making process. Results show that displaying more uncertainty information can help users to be more confident when making decisions.

翻訳日:2023-08-09 14:04:53 公開日:2023-08-08

# Gentopia: ツール拡張LLMのためのコラボレーションプラットフォーム

Gentopia: A Collaborative Platform for Tool-Augmented LLMs ( http://arxiv.org/abs/2308.04030v1 )

ライセンス: Link先を確認

Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu

(参考訳) 拡張言語モデル(ALM)は、大規模言語モデルにツールを使用する能力を与え、それらを実世界のインタラクションのためのインテリジェントエージェントに変換する。しかし、ALMの既存のフレームワークのほとんどは、程度の差はあれ、柔軟なカスタマイズ、協調的な民主化、全体的評価といった重要な特徴に欠けている。シンプルな構成でエージェントを柔軟にカスタマイズでき、様々な言語モデル、タスクフォーマット、プロンプトモジュール、プラグインを統一パラダイムにシームレスに統合できるALMフレームワークであるgentopiaを提案する。さらに,ユーザがカスタマイズしたエージェントの登録と共有を可能にする公開プラットフォームであるgentpoolを構築した。gentpoolに登録されたエージェントは組み合わせ可能であり、エージェント間の協調のために組み立てることができ、人工知能の民主化を前進させる。高品質なエージェントを確保するため、gentpoolの不可欠なコンポーネントであるgentbenchは、安全性、堅牢性、効率など様々な面でユーザカスタマイズエージェントを徹底的に評価するように設計されている。 gentopiaをGitHubで公開し、今後も継続的に開発を進めていく。

Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm. Furthermore, we establish gentpool, a public platform enabling the registration and sharing of user-customized agents. Agents registered in gentpool are composable such that they can be assembled together for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, is designed to thoroughly evaluate user-customized agents across diverse aspects such as safety, robustness, efficiency, etc. We release gentopia on Github and will continuously move forward.

翻訳日:2023-08-09 14:04:43 公開日:2023-08-08

# バイオメディカル質問応答のためのトップK関連パッセージ検索

Top K Relevant Passage Retrieval for Biomedical Question Answering ( http://arxiv.org/abs/2308.04028v1 )

ライセンス: Link先を確認

Shashank Gupta

(参考訳) 質問応答は、大量の文書を用いて事実のない質問に答えるタスクである。自然言語によるユーザの質問に対して,正確な回答を提供することを目標としている。質問応答は、TF-IDFやBM25のような伝統的なスパースベクトル空間モデルが事実上の方法である、選択された候補コンテキストに対する効率的な経路探索に依存する。ウェブ上では、ユーザーが質問した問題に対して、インターネットで利用可能なすべての回答を提供することのできる記事は1つもない。既存の密集した通路の検索モデルは、2018年12月20日からwikipediaのダンプで、質問に答えるための資料として訓練されている。質問応答(QA)は、大規模アノテートデータセットを使用して構築されたいくつかのオープンドメインとマシン理解システムで大きく進歩している。しかし、臨床領域では、この問題は比較的未解明のままである。複数の調査によると、バイオメディカル質問はWikipediaの記事から正しく答えられない。本研究では,既存の生体医学領域のためのdprフレームワークを開発し,医学的質問に答える信頼できる情報源であるpubmedアーティクルから回答を取り出す。 BioASQ QAデータセットで評価すると、細調整された高密度検索器は0.81F1スコアとなる。

Question answering is a task that answers factoid questions using a large collection of documents. It aims to provide precise answers in response to the user's questions in natural language. Question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. On the web, no single article can provide all the possible answers available on the internet to a user's question. The existing Dense Passage Retrieval (DPR) model has been trained on a Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions. Question answering (QA) has made big strides with several open-domain and machine comprehension systems built using large-scale annotated datasets. However, in the clinical domain, this problem remains relatively unexplored. According to multiple surveys, biomedical questions cannot be answered correctly from Wikipedia articles. In this work, we build on the existing DPR framework for the biomedical domain and retrieve answers from PubMed articles, which are a reliable source for answering medical questions. When evaluated on a BioASQ QA dataset, our fine-tuned dense retriever achieves a 0.81 F1 score.

翻訳日:2023-08-09 14:04:25 公開日:2023-08-08
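
The abstract above names TF-IDF and BM25 as the de facto sparse baselines for passage retrieval. As a rough illustration of what "top K relevant passage retrieval" means for such a baseline, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the paper's dense retriever; the whitespace tokeniser, the parameter values `k1` and `b`, and the toy passages are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_top_k(query, passages, k=2, k1=1.5, b=0.75):
    """Rank passages against a query with Okapi BM25 and return the top-k indices.

    Tokenisation is a lowercase whitespace split (an illustrative
    simplification, not what a production retriever would use).
    """
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, idx))
    # Highest score first; ties are broken by original passage order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]


# Toy corpus (hypothetical passages, not taken from PubMed or BioASQ).
passages = [
    "aspirin inhibits platelet aggregation",
    "the stadium hosted a football match",
    "aspirin is used to reduce fever and pain",
]
top = bm25_top_k("aspirin fever", passages, k=2)
```

A dense retriever replaces the term-matching score with a similarity between learned question and passage embeddings, but the final "sort and keep the best K" step is the same.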

# 軌道インフォームドサロゲート勾配を用いたフェデレートゼロ階最適化

Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients ( http://arxiv.org/abs/2308.04077v1 )

ライセンス: Link先を確認

Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low

(参考訳) フェデレーション最適化(federated optimization)は、フェデレーション学習のような広い現実世界のアプリケーションを見つける新たなパラダイムで、複数のクライアント(エッジデバイスなど)がグローバルな機能を協調的に最適化する。クライアントはローカルデータセットを共有せず、通常はローカル勾配のみを共有する。しかし、勾配情報はフェデレーション最適化の多くの応用では利用できないため、フェデレーションゼロth-order optimization (zoo) のパラダイムが生まれている。既存のZOOアルゴリズムは、クエリと通信の非効率性の限界に悩まされている。 (a)勾配推定のための相当数の関数クエリに依存すること、及び (b) 実現したローカルアップデートと意図したグローバルアップデートの間に大きな差異がある。この目的のためには (a) 正確でクエリ効率のよい勾配推定のための最適化中に関数クエリの履歴を利用できるトラジェクトリインフォームド勾配サロゲートを導入し、 (b) これらの勾配置換体を用いた適応勾配補正法を開発し, 上記の相違を緩和する。そこで本稿では, トラジェクトリインフォームド・サロゲート勾配 (FZooS) アルゴリズムを用いたフェデレーションゼロ階次最適化手法を提案する。当社のfzoosは,フェデレーションブラックボックス逆攻撃やフェデレーション非微分メトリック最適化といった実世界実験によって支持される,既存のアプローチに対する理論的改善を実現しています。

Federated optimization, an emerging paradigm that finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates that can use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, and this is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.

翻訳日:2023-08-09 13:56:06 公開日:2023-08-08
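
To make the query cost mentioned in the abstract above concrete, the following is a minimal sketch of a plain federated zeroth-order step: each client estimates its gradient from function values only (coordinate-wise central differences) and the server averages the estimates. This is the generic kind of estimator whose query cost the paper aims to reduce, not the FZooS surrogate itself. The function names, the step size, the smoothing radius `mu`, and the two toy quadratic clients are assumptions for illustration.

```python
def zo_gradient(f, x, mu=1e-4):
    """Estimate the gradient of f at x from function values only.

    Uses coordinate-wise central differences, which costs 2 * len(x) queries.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += mu
        backward[i] -= mu
        grad.append((f(forward) - f(backward)) / (2 * mu))
    return grad


def federated_zo_step(client_fns, x, lr=0.1):
    """One round: every client estimates its local gradient, the server averages."""
    grads = [zo_gradient(f, x) for f in client_fns]
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(x))]
    return [xi - lr * gi for xi, gi in zip(x, avg)]


# Two toy clients (hypothetical) with quadratic objectives centred at different points.
clients = [
    lambda v: (v[0] - 1.0) ** 2 + (v[1] - 1.0) ** 2,
    lambda v: (v[0] + 1.0) ** 2 + (v[1] - 3.0) ** 2,
]
x = [5.0, 5.0]
for _ in range(100):
    x = federated_zo_step(clients, x)
```

Each round here spends `2 * d` queries per client for a `d`-dimensional input, which is the cost a query-efficient surrogate built from past queries would try to avoid.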

# DataTales: 大規模言語モデルによるデータ駆動記事のオーサリングの検討

DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles ( http://arxiv.org/abs/2308.04076v1 )

ライセンス: Link先を確認

Nicole Sultanum, Arjun Srinivasan

(参考訳) データ駆動記事の執筆は複雑なプロセスであり、著者は洞察のためにデータを分析するだけでなく、洞察を効果的に伝達する結束的な物語を作る必要がある。現代大言語モデル(llms)のテキスト生成能力は、データ駆動記事の作成を支援し、執筆プロセスを迅速化する機会を提供する。本研究では LLM を活用したデータ駆動記事作成支援の実現可能性と評価について検討する。我々は,llmを利用して与えられたチャートに付随する文章的ナラティブを生成する,プロトタイプシステムdatatalesを設計した。デザインプローブとしてDataTalesを用いて,11人の専門家による質的研究を行い,LLMを価値あるデータ駆動型記事作成アシスタントとして活用する機会と機会を抽出した。

Authoring data-driven articles is a complex process requiring authors to not only analyze data for insights but also craft a cohesive narrative that effectively communicates the insights. Text generation capabilities of contemporary large language models (LLMs) present an opportunity to assist the authoring of data-driven articles and expedite the writing process. In this work, we investigate the feasibility and perceived value of leveraging LLMs to support authors of data-driven articles. We designed a prototype system, DataTales, that leverages an LLM to generate textual narratives accompanying a given chart. Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept, from which we distilled affordances and opportunities to further integrate LLMs as valuable data-driven article authoring assistants.

翻訳日:2023-08-09 13:55:43 公開日:2023-08-08

# 単眼RGBビデオにおける手指再建の空間的文脈の展開

Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video ( http://arxiv.org/abs/2308.04074v1 )

ライセンス: Link先を確認

Weichao Zhao, Hezhen Hu, Wengang Zhou, Li li, Houqiang Li

(参考訳) モノラルなRGBデータから相互作用する手を再構築することは難しい作業であり、例えば、自己と相互の閉塞や類似したテクスチャなど、多くの干渉要因が伴う。それまでの作業では、物理的に妥当な関係をモデル化することなく、単一のRGB画像からの情報しか活用できなかった。本研究は,空間的時空間情報を明示的に活用し,より優れたハンドリコンストラクションを実現することを目的としている。一方,1つのフレームで提供される情報不足を補うために時間的文脈を活用し,手の動きの滑らかさを対話するための時間的制約を伴う新しい時間的枠組みを設計する。また, 物理的衝突を伴わずに, 動的に再現可能な手を作るための相互浸透検出モジュールを提案する。提案フレームワークの有効性を検証するために,公開ベンチマークで新たな最先端性能を実現するための広範囲な実験を行った。

Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g. self- and mutual occlusion and similar textures. Previous works only leverage information from a single RGB image without modeling their physically plausible relation, which leads to inferior reconstruction results. In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction. On one hand, we leverage temporal context to complement insufficient information provided by the single frame, and design a novel temporal framework with a temporal constraint for interacting hand motion smoothness. On the other hand, we further propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments are performed to validate the effectiveness of our proposed framework, which achieves new state-of-the-art performance on public benchmarks.

翻訳日:2023-08-09 13:55:28 公開日:2023-08-08
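
The abstract above mentions a temporal constraint for hand motion smoothness without giving its form. One common way to express such a constraint is to penalise the acceleration of each joint across consecutive frames. The sketch below shows that generic penalty; it is an assumption for illustration and not necessarily the loss used in the paper.

```python
def temporal_smoothness_loss(joint_seq):
    """Mean squared acceleration of joints over a sequence of frames.

    joint_seq[t][j] is the (x, y, z) position of joint j at frame t.
    Acceleration is approximated by the second finite difference.
    """
    total, count = 0.0, 0
    for t in range(1, len(joint_seq) - 1):
        frames = zip(joint_seq[t - 1], joint_seq[t], joint_seq[t + 1])
        for prev, cur, nxt in frames:
            for p, c, n in zip(prev, cur, nxt):
                total += (n - 2 * c + p) ** 2
                count += 1
    return total / count if count else 0.0


# One joint over three frames: constant velocity versus a back-and-forth jitter.
smooth = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
jitter = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
```

A constant-velocity track has zero penalty, while the jittering track is penalised, which is the behaviour a smoothness term is meant to encourage.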

# 物理形ニューラルネットワークのための特殊活性化関数の学習

Learning Specialized Activation Functions for Physics-informed Neural Networks ( http://arxiv.org/abs/2308.04073v1 )

ライセンス: Link先を確認

Honghui Wang, Lu Lu, Shiji Song, Gao Huang

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は最適化の難しさに悩まされている。本研究では,PINNの最適化難易度とアクティベーション関数の関係を明らかにする。具体的には, PINNは, 異なる性質を持つPDEを解く際に, 活性化関数に対して高い感度を示すことを示す。既存の作業は通常、非効率な試行錯誤によってアクティベーション関数を選択する。非効率な手動選択を回避し、PINNの最適化の難しさを軽減するため、異なる問題を解く際に最適な関数を探すための適応的アクティベーション関数を導入する。異なる適応活性化関数を比較し,その限界をピンの文脈で議論する。さらに,学習関数の滑らかさと多様性の要求度が高いピンズ最適化に,候補活性化関数の学習組合せのアイデアを合わせることを提案する。これは、候補集合から高次微分を与えることができないアクティベーション関数を除去し、手前のPDEに関する以前の知識に従って基本関数を異なる性質で組み込むことによって達成される。我々は,適応傾斜で探索空間をさらに強化する。提案したアダプティブアクティベーション関数は、異なるPDEシステムを解釈可能な方法で解くために使用できる。その効果は一連のベンチマークで示される。コードはhttps://github.com/LeapLabTHU/AdaAFforPINNsで入手できる。

Physics-informed neural networks (PINNs) are known to suffer from optimization difficulty. In this work, we reveal the connection between the optimization difficulty of PINNs and activation functions. Specifically, we show that PINNs exhibit high sensitivity to activation functions when solving PDEs with distinct properties. Existing works usually choose activation functions by inefficient trial-and-error. To avoid the inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions to search for the optimal function when solving different problems. We compare different adaptive activation functions and discuss their limitations in the context of PINNs. Furthermore, we propose to tailor the idea of learning combinations of candidate activation functions to the PINNs optimization, which has a higher requirement for the smoothness and diversity on learned functions. This is achieved by removing activation functions which cannot provide higher-order derivatives from the candidate set and incorporating elementary functions with different properties according to our prior knowledge about the PDE at hand. We further enhance the search space with adaptive slopes. The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way. Its effectiveness is demonstrated on a series of benchmarks. Code is available at https://github.com/LeapLabTHU/AdaAFforPINNs.

翻訳日:2023-08-09 13:55:11 公開日:2023-08-08
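
The abstract above describes learning a combination of smooth candidate activation functions, each with an adaptive slope. A minimal sketch of that idea follows. The softmax normalisation of the mixing weights, the particular candidate set, and all names are assumptions for illustration; the authors' repository linked in the abstract is the reference for the actual formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Candidate set (assumed): smooth functions only, so higher-order derivatives exist.
CANDIDATES = [math.sin, math.tanh, lambda z: 1.0 / (1.0 + math.exp(-z))]


def adaptive_activation(x, logits, slopes):
    """Weighted mix of candidate activations, each applied to slope * x.

    In training, logits and slopes would be learnable parameters.
    """
    weights = softmax(logits)
    return sum(w * f(a * x) for w, f, a in zip(weights, CANDIDATES, slopes))


# With logits that strongly favour sin, the mix behaves like sin(2 * x).
y = adaptive_activation(0.5, logits=[10.0, -10.0, -10.0], slopes=[2.0, 1.0, 1.0])
```

Inspecting the learned weights after training is one way such a mixture can be read as an interpretable choice among the candidates.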

# 確率的軌道最適化における経路シグナチャ

Path Signatures for Diversity in Probabilistic Trajectory Optimisation ( http://arxiv.org/abs/2308.04071v1 )

ライセンス: Link先を確認

Lucas Barcelos, Tin Lai, Rafael Oliveira, Paulo Borges and Fabio Ramos

(参考訳) 移動計画は、発生した軌道の関数としてコストを最小化する軌道最適化問題としてキャストすることができる。いくつかの障害物と複雑な幾何学を持つ複雑な環境では、この最適化問題は一般に解くのが難しく、局所ミニマに傾向がある。しかし、近年のコンピューティングハードウェアの進歩により、複数の解が同時に得られる並列軌道最適化が可能となり、それぞれ異なる出発点から初期化される。残念なことに、2つの解が互いに崩壊することを防ぐ戦略がなければ、単純並列最適化はモード崩壊に悩まされ、アプローチの効率が低下し、グローバルな解を見つける可能性が低下する。本稿では, 粗路理論の最近の進歩を活用し, パラレルトラジェクトリ最適化のアルゴリズムを考案し, 解幅の多様性を促進し, モード崩壊を回避し, より優れたグローバル性を実現する。本手法は軌道の経路シグネチャとヒルベルト空間表現を基盤とし,軌道推定のための並列変分推論とカーネルの多様性を促進する。この戦略は,2次元ナビゲーションからロボットマニピュレータまで,さまざまな問題において競合する代替手段よりも低い平均コストを実現することを実証的に実証する。

Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions from collapsing onto each other, naive parallel optimisation can suffer from mode collapse, diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapse and achieving better global properties. Our approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity-promoting kernels. We empirically demonstrate that this strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.

翻訳日:2023-08-09 13:54:52 公開日:2023-08-08
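
The abstract above builds on path signatures as a representation of trajectories. The sketch below computes the signature of a piecewise-linear path truncated at level 2, using Chen's identity to append one straight segment at a time. It only illustrates the representation; the diversity-promoting kernel and the variational inference from the paper are not shown, and the function names are assumptions.

```python
def signature_level2(path):
    """Signature of a piecewise-linear path, truncated at level 2.

    path is a list of points, each a list of d floats. Returns (s1, s2) where
    s1[i] is the total increment in coordinate i and s2[i][j] is the iterated
    integral of dx_i followed by dx_j.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for start, end in zip(path, path[1:]):
        dx = [e - s for s, e in zip(start, end)]
        # Chen's identity for concatenating one straight segment.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2


def levy_area(path):
    """Signed area between a 2-D path and its chord, read off the level-2 terms."""
    _, s2 = signature_level2(path)
    return 0.5 * (s2[0][1] - s2[1][0])


# Two toy paths with the same endpoints that go around opposite sides.
right_then_up = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
up_then_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

Both toy paths share the same level-1 terms (same start and end) but have opposite signed areas at level 2, which is the kind of information that lets a signature-based kernel tell apart trajectories passing on different sides of an obstacle.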

# ConDistFL:部分注釈データからのフェデレーション学習のための条件付き蒸留

ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data ( http://arxiv.org/abs/2308.04070v1 )

ライセンス: Link先を確認

Pochuan Wang, Chen Shen, Weichung Wang, Masahiro Oda, Chiou-Shann Fuh, Kensaku Mori, Holger R. Roth

(参考訳) 複数の臓器と疾患を同時に記述できる一般化セグメンテーションモデルの開発が望まれる。フェデレートラーニング(FL)は、トレーニングデータを交換することなく、モデルの協調開発を可能にする重要な技術である。しかし、完全に注釈付けされたトレーニングデータへの限られたアクセスは、一般化可能なモデルをトレーニングする上で大きな課題となる。本稿では,FLと知識蒸留を組み合わせた「ConDistFL」を提案する。局所モデルは、適切に設計された条件付き確率表現を用いて、グローバルモデルから部分的に注釈付きデータからラベルのない臓器や腫瘍の知識を抽出することができる。我々は,MSDとKITS19の課題から4つの異なる部分的腹部CTデータセットを検証した。実験の結果,提案フレームワークはfedavgおよびfedoptベースラインを大きく上回っている。さらに、外部テストデータセットのパフォーマンスは、各データセットで個別にトレーニングされたモデルと比較して、優れた一般化性を示す。本研究は,コンディストFLが頻繁な凝集を伴わずに良好に機能し,FLの通信コストを低減できることを示す。実装はhttps://github.com/nvidia/nvflare/tree/dev/research/condist-flで利用可能です。

Developing a generalized segmentation model capable of simultaneously delineating multiple organs and diseases is highly desirable. Federated learning (FL) is a key technology enabling the collaborative development of a model without exchanging training data. However, the limited access to fully annotated training data poses a major challenge to training generalizable models. We propose "ConDistFL", a framework to solve this problem by combining FL with knowledge distillation. Local models can extract the knowledge of unlabeled organs and tumors from partially annotated data from the global model with an adequately designed conditional probability representation. We validate our framework on four distinct partially annotated abdominal CT datasets from the MSD and KiTS19 challenges. The experimental results show that the proposed framework significantly outperforms FedAvg and FedOpt baselines. Moreover, the performance on an external test dataset demonstrates superior generalizability compared to models trained on each dataset separately. Our ablation study suggests that ConDistFL can perform well without frequent aggregation, reducing the communication cost of FL. Our implementation will be available at https://github.com/NVIDIA/NVFlare/tree/dev/research/condist-fl.

翻訳日:2023-08-09 13:54:30 公開日:2023-08-08
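
The abstract above compares against the FedAvg baseline and notes that less frequent aggregation reduces communication cost. For readers unfamiliar with the aggregation step, here is a minimal sketch of FedAvg-style weighted averaging over flat parameter lists. It shows the baseline only, not the conditional distillation of ConDistFL; the function name and toy numbers are assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients holding 100 and 300 training cases respectively.
global_weights = fedavg([[1.0, 0.0], [3.0, 4.0]], client_sizes=[100, 300])
```

Every call to this step implies one upload and one download per client, so running more local training between calls trades communication for possible drift between local and global models.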

# 適応重み付き正規化と知識蒸留による低ラベルレジームの逆ロバスト性向上

Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation ( http://arxiv.org/abs/2308.04061v1 )

ライセンス: Link先を確認

Dongyoon Yang, Insung Kong, Yongdai Kim

(参考訳) 敵対的堅牢性は、最近、信頼できる人工知能の探求に多くの注目を集めた研究分野である。しかし、近年はラベル付きデータが豊富であると考えられる教師あり学習に焦点が当てられている。本稿では,ラベル付きデータが少ない半教師付き対人訓練について検討する。我々は,ロバストリスクに対する2つの上界を導出し,これら2つの上界に動機づけられたラベルなしデータの正規化項を提案する。そこで,本研究では,半教師型教師(セミ教師型学習アルゴリズムを用いた教師モデル)を用いて,正規化項と知識蒸留を併用した半教師型逆学習アルゴリズムを開発した。実験の結果,提案アルゴリズムは既存のアルゴリズムに比べて高いマージンで最先端の性能を実現することがわかった。特に教師付き学習アルゴリズムと比較して,ラベル付きデータの量が非常に少ない場合でも,提案アルゴリズムの性能はそれほど悪くはない。例えば、8\%のラベル付きデータしか持たないアルゴリズムは、CIFAR-10の標準および堅牢な精度の両面で、すべてのラベル付きデータを使用する教師付き敵訓練アルゴリズムに匹敵する。

Adversarial robustness is a research area that has recently received a lot of attention in the quest for trustworthy artificial intelligence. However, recent works on adversarial robustness have focused on supervised learning where it is assumed that labeled data is plentiful. In this paper, we investigate semi-supervised adversarial training where labeled data is scarce. We derive two upper bounds for the robust risk and propose a regularization term for unlabeled data motivated by these two upper bounds. Then, we develop a semi-supervised adversarial training algorithm that combines the proposed regularization term with knowledge distillation using a semi-supervised teacher (i.e., a teacher model trained using a semi-supervised learning algorithm). Our experiments show that our proposed algorithm achieves state-of-the-art performance with significant margins compared to existing algorithms. In particular, compared to supervised learning algorithms, the performance of our proposed algorithm is not much worse even when the amount of labeled data is very small. For example, our algorithm with only 8% labeled data is comparable to supervised adversarial training algorithms that use all labeled data, both in terms of standard and robust accuracies on CIFAR-10.

翻訳日:2023-08-09 13:54:13 公開日:2023-08-08

# クラスタリング手法によるニュージーランドの児童福祉システムの予測リスクモデルの改善に向けて

Toward Improving Predictive Risk Modelling for New Zealand's Child Welfare System Using Clustering Methods ( http://arxiv.org/abs/2308.04060v1 )

ライセンス: Link先を確認

Sahar Barmomanesh and Victor Miranda-Soberanis

(参考訳) 臨床的判断と予測的リスクモデルの組み合わせは、社会労働者が児童を虐待のリスクで隔離し、当局が介入すべき時期を決定するために重要な助けとなる。この問題に対処するための予測リスクモデリングは、行政データと機械学習アルゴリズムを含む世界中の政府福祉当局によって始められた。これまでの研究は、子供の虐待に関連するリスク要因を調査してきたが、これらのリスク要因がどのように相互作用するか、予測リスクモデルが異なる特徴を持つ子供に対して異なる機能を持つのかを理解するために、いくつかのギャップが残っている。本稿では,主成分分析とK-平均クラスタリングを統合することで,これらの特徴の同定と,現在のリスクモデリングフレームワークに対する潜在的な影響を明らかにする。このアプローチにより、ニュージーランド(NZ)の子供たちのケアと保護に関する懸念が報告された存在、未確認のクラスターを調べ、内部構造を分析し、訓練されたクラスターの賢明な予測モデルの性能を評価することができる。本研究の目的は,児童虐待の予測リスクモデルの開発に必要となるクラスタリングの程度を明らかにすることであり,児童保護当局が利用しようとするモデルの精度を高めることである。同一クラスタ上で学習したLASSOロジスティック回帰モデルの結果, 性能に有意な差は認められなかった。しかし、これらのモデルは、幼児を含む2つのクラスターに対してわずかに改善された。以上の結果から,特定の年齢の子どもに対して,誤差率のコントロールやモデルの精度向上のために,別のモデルを開発する必要があることが示唆された。結果は有望だが、結論を出すにはさらなる証拠が必要であり、さらなる調査が必要である。

The combination of clinical judgement and predictive risk models crucially assists social workers in identifying children at risk of maltreatment and deciding when authorities should intervene. Predictive risk modelling to address this matter has been initiated by several governmental welfare authorities worldwide, involving administrative data and machine learning algorithms. While previous studies have investigated risk factors relating to child maltreatment, several gaps remain in understanding how such risk factors interact and whether predictive risk models perform differently for children with different features. By integrating Principal Component Analysis and K-Means clustering, this paper presents initial findings of our work on the identification of such features as well as their potential effect on current risk modelling frameworks. This approach allows us to examine existing, as yet unidentified, clusters of New Zealand (NZ) children reported with care and protection concerns, to analyse their inner structure, and to evaluate the performance of prediction models trained cluster-wise. We aim to discover the degree of clustering required as an early step in the development of predictive risk models for child maltreatment and so enhance the accuracy of such models intended for use by child protection authorities. The results from testing LASSO logistic regression models trained on identified clusters revealed no significant difference in their performance. The models, however, performed slightly better for two clusters including younger children. Our results suggest that separate models might need to be developed for children of certain ages to gain additional control over the error rates and to improve model accuracy. While results are promising, more evidence is needed to draw definitive conclusions, and further investigation is necessary.

翻訳日:2023-08-09 13:53:54 公開日:2023-08-08
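
The abstract above combines Principal Component Analysis with K-Means before training one model per cluster. The clustering step can be sketched in a few lines with Lloyd's algorithm. The version below omits the PCA projection and the per-cluster LASSO models, uses a deterministic "first k points" initialisation for reproducibility, and runs on synthetic 2-D points; all of these are simplifying assumptions for illustration.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic initialisation (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic, already low-dimensional points (purely illustrative).
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
```

In a cluster-wise setup, the returned labels would be used to split the training data so that a separate predictive model can be fitted and evaluated for each cluster.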

# 3次元物体検出のための距離の実証分析

An Empirical Analysis of Range for 3D Object Detection ( http://arxiv.org/abs/2308.04054v1 )

ライセンス: Link先を確認

Neehar Peri, Mengtian Li, Benjamin Wilson, Yu-Xiong Wang, James Hays, Deva Ramanan

(参考訳) LiDARベースの3D検出は、自律ナビゲーションにおいて重要な役割を果たす。驚いたことに、自動運転車(AV)は(衝突回避のために)近接場オブジェクトと(長期計画のために)遠距離フィールドオブジェクトの両方を検出する必要があるが、現代のベンチマークは近接場3D検出のみに焦点を当てている。しかし、avは安全な航行のために遠方界物体を検出する必要がある。本稿では、長距離検出データセットArgoverse 2.0を用いた遠距離3次元検出の実証分析を行い、この問題をよりよく理解し、以下の知見を共有する: 近距離LiDAR測定は密度が高く、小さなボクセルで最適に符号化される一方、遠距離測定はスパースであり、大きなボクセルで符号化されている。この観察を利用して近距離vs遠距離検出用に調整された範囲エキスパートのコレクションを構築し,効率を33%向上させ,精度を3.2%向上させる長距離検出のためのモデルを効率的にアンサンブルする簡単な手法を提案する。

LiDAR-based 3D detection plays a vital role in autonomous navigation. Surprisingly, although autonomous vehicles (AVs) must detect both near-field objects (for collision avoidance) and far-field objects (for longer-term planning), contemporary benchmarks focus only on near-field 3D detection. However, AVs must detect far-field objects for safe navigation. In this paper, we present an empirical analysis of far-field 3D detection using the long-range detection dataset Argoverse 2.0 to better understand the problem, and share the following insight: near-field LiDAR measurements are dense and optimally encoded by small voxels, while far-field measurements are sparse and are better encoded with large voxels. We exploit this observation to build a collection of range experts tuned for near-vs-far field detection, and propose simple techniques to efficiently ensemble models for long-range detection that improve efficiency by 33% and boost accuracy by 3.2% CDS.

翻訳日:2023-08-09 13:53:28 公開日:2023-08-08
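
The abstract above proposes range experts: one detector tuned for the near field (small voxels) and one for the far field (large voxels), combined into a single output. A very simple way to combine them is to trust each expert only inside its own range band, as sketched below. The switch distance of 50 m, the detection dictionary layout, and the function name are hypothetical; the paper's actual ensembling technique may differ.

```python
import math


def ensemble_by_range(near_dets, far_dets, switch_range=50.0):
    """Keep near-expert detections inside switch_range and far-expert ones beyond it.

    Each detection is a dict with at least 'x' and 'y' in metres (ego frame).
    """
    def distance(det):
        return math.hypot(det["x"], det["y"])

    kept = [d for d in near_dets if distance(d) < switch_range]
    kept += [d for d in far_dets if distance(d) >= switch_range]
    return kept


# Hypothetical outputs from the two experts on the same LiDAR sweep.
near_dets = [{"x": 10.0, "y": 0.0, "label": "car"}, {"x": 80.0, "y": 0.0, "label": "car"}]
far_dets = [{"x": 12.0, "y": 0.0, "label": "car"}, {"x": 90.0, "y": 5.0, "label": "bus"}]
merged = ensemble_by_range(near_dets, far_dets)
```

Because each expert only needs to process its own band, the fine-voxel model does not have to cover the full long-range scene, which is one plausible source of the efficiency gain the abstract reports.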

# 5ドルモデル:文の埋め込みからゲームマップとスプライトを生成する

The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings ( http://arxiv.org/abs/2308.04052v1 )

ライセンス: Link先を確認

Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius

(参考訳) 5ドルモデルは、符号化されたテキストプロンプトから低次元画像を生成する軽量なテキスト画像生成アーキテクチャである。このモデルは,低次元領域において,限られたトレーニングデータを用いて,正確かつ美的なコンテンツを生成することができる。モデルとデータセットの両方の小さなサイズにもかかわらず、生成された画像は、テキストプロンプトのエンコードされた意味を維持できる。このモデルを,画素アートゲームマップ,ビデオゲームスプライト画像,ダウンスケール絵文字画像の3つの小さなデータセットに適用し,これらの限られたデータセット上でのモデルの性能向上のために,新たな拡張戦略を適用した。 CLIP VIT-B/32モデルにより生成されたテキスト画像ペア間のコサイン類似度スコアを用いて,本モデルの性能を評価する。

The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low-dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images, and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our model's performance using the cosine similarity score between text-image pairs generated by the CLIP ViT-B/32 model.

翻訳日:2023-08-09 13:53:10 公開日:2023-08-08
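
The evaluation in the abstract above relies on cosine similarity between text and image embeddings. The metric itself is short enough to write out. The sketch below assumes the embeddings have already been produced by some encoder and are given as plain lists of floats; no CLIP call is shown, and the toy vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for a text embedding and two image embeddings.
text_vec = [3.0, 4.0]
matching_image_vec = [6.0, 8.0]    # same direction, different length
unrelated_image_vec = [-4.0, 3.0]  # orthogonal direction
```

The score ignores vector length and depends only on direction, so a generated image scores highly when its embedding points the same way as the prompt's embedding.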

# I-WAS: 同期検出のためのGPT-2を用いたデータ拡張手法

I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection ( http://arxiv.org/abs/2308.04109v1 )

ライセンス: Link先を確認

Yongzhu Chang, Rongsheng Zhang, Jiashu Pu

(参考訳) シミュラ検出は多くの自然言語処理(NLP)ベースのアプリケーション、特に文学分野において重要なタスクである。しかし、模擬検出に関する既存の研究は、しばしばサイズが限られており、完全な模擬形態を適切に表現していないコーパスに依存している。この問題に対処するため, GPT-2言語モデルを用いて, \textbf{W}ord置換および文補完に基づくデータ拡張手法を提案する。 I-WASと呼ばれる反復的なプロセスは、拡張文の品質を向上させるために設計されている。本手法の性能を実世界のアプリケーションでよりよく評価するために,実験のためにより多様なシミール形式を含むコーパスをコンパイルした。提案手法の有効性を実験的に検証し,本手法の有効性を検証した。

Simile detection is a valuable task for many natural language processing (NLP)-based applications, particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on Word replacement And Sentence completion (WAS) using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we have compiled a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of our proposed data augmentation method for simile detection.

翻訳日:2023-08-09 13:47:32 公開日:2023-08-08

# マルチタスクニューラルネットワークによる並列学習

Parallel Learning by Multitasking Neural Networks ( http://arxiv.org/abs/2308.04106v1 )

ライセンス: Link先を確認

Elena Agliari and Andrea Alessandrelli and Adriano Barra and Federico Ricci-Tersenghi

(参考訳) 現代の人工知能の課題は、複数のパターンを同時に学習すること(すなわち並列学習)である。標準的なヘビアン連想ニューラルネットワークでは実現できないが,本論文では,マルチタスキング・ヘビアン・ネットワーク(ホップフィールドモデルがスパースデータセットに取り組んでいるテーマのバリエーション)が,この複雑なタスクを自然に実行可能であることを示す。我々は,パターン認識に携わる標準連想ニューラルネットワークの低storageレベルを反映し,有限(ネットワークサイズが対数的に増加するまで)のパターンを並列に処理することに焦点を当てた。パターンを軽度に希釈するために、ネットワークはそれらを階層的に処理し、それらの信号の振幅をその情報内容(階層的状態)のパワーローとして分配する一方、強い希釈のために、全てのパターンに関連するすべての信号を同じ強度(並列的状態)で引き上げる。さらに、低ストレージ設定(例えば、スピンガラス限界から遠く離れた)に限定され、教師の存在はマルチタスクのパフォーマンスを変更したり、学習のしきい値を変更したりせず、後者はトレーニングプロトコルが監督または監督されていないものと同じである。例えば、モデルのコスト関数が複数のパターン(統計力学による記述)で並列に最小化されるたびに、標準的な総和二乗誤差損失関数(一般的に機械学習で使用される)が同じである。

A modern challenge of Artificial Intelligence is learning multiple patterns at once (i.e., parallel learning). While this cannot be accomplished by standard Hebbian associative neural networks, in this paper we show how the Multitasking Hebbian Network (a variation on the theme of the Hopfield model working on sparse datasets) is naturally able to perform this complex task. We focus on systems processing in parallel a finite (up to logarithmic growth in the size of the network) number of patterns, mirroring the low-storage level of standard associative neural networks at work with pattern recognition. For mild dilution in the patterns, the network handles them hierarchically, distributing the amplitudes of their signals as power laws with respect to their information content (hierarchical regime), while, for strong dilution, all the signals pertaining to all the patterns are raised with the same strength (parallel regime). Further, confined to the low-storage setting (i.e., far from the spin glass limit), the presence of a teacher neither alters the multitasking performance nor changes the thresholds for learning: the latter are the same whether the training protocol is supervised or unsupervised. Results obtained through statistical mechanics, the signal-to-noise technique and Monte Carlo simulations are overall in perfect agreement and carry interesting insights on learning multiple patterns at once: for instance, whenever the cost function of the model is minimized in parallel on several patterns (in its description via statistical mechanics), the same happens to the standard sum-squared error loss function (typically used in machine learning).

翻訳日:2023-08-09 13:47:19 公開日:2023-08-08
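
The abstract above concerns Hebbian networks storing diluted patterns, where some entries are blank. A small deterministic sketch can show why dilution permits parallel retrieval: with entries in {-1, 0, +1}, a single network state can align with several patterns at once and still be a fixed point of the zero-temperature dynamics. The normalisation by the network size and the toy patterns are assumptions for illustration and may differ from the paper's conventions.

```python
def hebbian_couplings(patterns):
    """Hebbian coupling matrix J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu, zero diagonal."""
    n = len(patterns[0])
    return [
        [0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
        for i in range(n)
    ]


def overlaps(patterns, state):
    """Mattis overlaps m_mu = (1/N) * sum_i xi_i^mu * sigma_i, one per pattern."""
    n = len(state)
    return [sum(x * s for x, s in zip(p, state)) / n for p in patterns]


def is_stable(couplings, state):
    """True if every spin is aligned with its local field (zero-temperature fixed point)."""
    for i, s in enumerate(state):
        field = sum(j_ij * s_j for j_ij, s_j in zip(couplings[i], state))
        if s * field <= 0:
            return False
    return True


# Two diluted patterns with disjoint non-blank entries (toy example, N = 6).
patterns = [[1, -1, 1, 0, 0, 0], [0, 0, 0, 1, 1, -1]]
state = [1, -1, 1, 1, 1, -1]  # fills the blanks of one pattern with the other
J = hebbian_couplings(patterns)
m = overlaps(patterns, state)
```

Here the state has the same non-zero overlap with both patterns and is stable, a toy analogue of the parallel regime in which all signals are raised with the same strength.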

# 説明可能な機械学習によるドープ共役高分子の高スループット電気伝導率最適化

Explainable machine learning to enable high-throughput electrical conductivity optimization of doped conjugated polymers ( http://arxiv.org/abs/2308.04103v1 )

ライセンス: Link先を確認

Ji Wei Yoon, Adithya Kumar, Pawan Kumar, Kedar Hippalgaonkar, J Senthilnath, Vijila Chellappan

(参考訳) 高スループット実験技術と機械学習(ml)の組み合わせは、最近加速材料発見の新しい時代を導いており、最先端特性を持つ材料の識別を可能にしている。しかし、ある物理量の測定は自動化が難しいままである。特に、ドープポリマー材料の最適導電性を達成するには、細心のプロセス制御、実験および手間のかかる測定が必要である。本稿では,容易に測定可能な吸収スペクトルを用いたML手法を提案し,導電率測定に伴うワークフローを高速化する。最初のMLモデル(分類モデル)は、導電率>25から100S/cmの試料を正確に分類し、最大100%の精度を達成する。高導電率試料のサブセットについては,2次MLモデル(回帰モデル)を用いて導電率を予測し,印象的なR2値0.984を得た。このアプローチを検証するために, 498 s/cm と 506 s/cm の2つの高い導電率を持つ試料ではトレーニングされなかったモデルが, 精度の高いエラーレベルで正しく分類し, 予測できたことを示した。提案するmlワークフローにより, 導電率測定の効率を最大89%向上させることができた。さらに,記述子とmlモデルの独自な数学的性質を活用し,導電性に対するスペクトルの影響を裏付ける洞察を得ることにより,mlモデルの説明可能性の欠如という共通の課題に対処した。本研究では,実験科学におけるMLの目的的利用から得られる貴重な知見を提示しながら,ドープポリマー材料の特性を最適化するための加速経路を提案する。

The combination of high-throughput experimentation techniques and machine learning (ML) has recently ushered in a new era of accelerated material discovery, enabling the identification of materials with cutting-edge properties. However, the measurement of certain physical quantities remains challenging to automate. Specifically, meticulous process control, experimentation and laborious measurements are required to achieve optimal electrical conductivity in doped polymer materials. We propose an ML approach, which relies on readily measured absorbance spectra, to accelerate the workflow associated with measuring electrical conductivity. The first ML model (a classification model) accurately classifies samples with a conductivity >~25 to 100 S/cm, achieving a maximum accuracy of 100%. For the subset of highly conductive samples, we employed a second ML model (a regression model) to predict their conductivities, yielding an impressive test R^2 value of 0.984. To validate the approach, we showed that the models, although trained on neither of the samples with the two highest conductivities (498 and 506 S/cm), were able to correctly classify and predict them, in an extrapolative manner, at satisfactory levels of error. The proposed ML workflow results in an improvement in the efficiency of the conductivity measurements by 89% of the maximum achievable using our experimental techniques. Furthermore, our approach addressed the common challenge of the lack of explainability in ML models by exploiting bespoke mathematical properties of the descriptors and ML model, allowing us to gain corroborated insights into the spectral influences on conductivity. Through this study, we offer an accelerated pathway for optimizing the properties of doped polymer materials while showcasing the valuable insights that can be derived from purposeful utilization of ML in experimental science.

翻訳日:2023-08-09 13:46:54 公開日:2023-08-08
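
The workflow in the abstract above has two stages: a classifier screens samples for high conductivity, and a regressor predicts a value only for samples that pass, with the regressor judged by its R^2 score. The control flow and the metric can be sketched as follows. The stand-in classifier and regressor are hypothetical toy rules, not the paper's models or descriptors.

```python
def two_stage_predict(spectrum, is_highly_conductive, predict_conductivity):
    """Run the regressor only on samples the classifier flags as highly conductive."""
    if not is_highly_conductive(spectrum):
        return None  # screened out: no detailed prediction, no measurement needed
    return predict_conductivity(spectrum)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical stand-ins: a threshold on peak absorbance and a linear rule.
toy_classifier = lambda spectrum: max(spectrum) > 0.75
toy_regressor = lambda spectrum: 100.0 * sum(spectrum)

low = two_stage_predict([0.25, 0.5], toy_classifier, toy_regressor)
high = two_stage_predict([1.0, 0.5], toy_classifier, toy_regressor)
```

Screening first means that laborious conductivity measurements can be reserved for the samples most likely to matter, which is where the reported efficiency gain would come from.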

# ディープニューラルネットワークアーキテクチャの非同期進化

Asynchronous Evolution of Deep Neural Network Architectures ( http://arxiv.org/abs/2308.04102v1 )

ライセンス: Link先を確認

Jason Liang, Hormoz Shahrzad, Risto Miikkulainen

(参考訳) 多くの進化的アルゴリズム(EA)は、候補の並列評価を利用する。しかし、評価時間が著しく異なる場合、多くのワーカノード(例えば、\計算クライアント)は、その時間の大部分をアイドル状態にし、次の世代が作られるのを待ちます。ディープニューラルネットワークのアーキテクチャとハイパーパラメータを最適化するeasのクラスである evolutionary neural architecture search (enas) は、この問題に特に脆弱である。本稿では,ENASと協調して動作する汎用非同期評価戦略(AES)を提案する。 AESは、最大$K$のキューを労働者に送信して評価し、M<K$の個人が労働者によって評価され次第、次の世代に進むことでスループットを向上させる。 M$の適切な値は、多様性と効率のバランスをとって実験的に決定される。 AESの汎用性と威力を示すために、まず11ビット多重化設計(単一ポピュレーション検証探索タスク)で評価され、画像キャプション(複数ポピュレーション開放最適化タスク)のためにENASまで拡張された。両問題とも多角的性能改善が観察され、AESはENASのような長大かつ可変的な評価時間を持つ複雑なシステムの進化を並列化するための有望な手法であることが示唆された。

Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M \ll K$ individuals have been evaluated by the workers. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in 11-bit multiplexer design (a single-population verifiable discovery task) and then scaled up to ENAS for image captioning (a multi-population open-ended optimization task). In both problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.

翻訳日:2023-08-09 13:46:22 公開日:2023-08-08
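
The abstract above attributes idle workers to generations that wait for the slowest evaluation, and proposes moving on after M completed evaluations instead. The toy simulation below contrasts the two schedules on a fixed list of evaluation times. It is a simplification under stated assumptions: the queue is pre-filled rather than refilled by new offspring, and the function names and timings are made up for the example.

```python
import heapq


def synchronous_time(eval_times, n_workers):
    """Wall-clock time when each batch waits for its slowest evaluation.

    Individuals are processed in batches of n_workers, one per worker.
    """
    total = 0.0
    for start in range(0, len(eval_times), n_workers):
        total += max(eval_times[start:start + n_workers])
    return total


def asynchronous_completions(eval_times, n_workers):
    """Completion times when a free worker immediately takes the next individual."""
    free_at = [0.0] * n_workers  # min-heap of times at which workers become free
    heapq.heapify(free_at)
    completions = []
    for t in eval_times:
        done = heapq.heappop(free_at) + t
        completions.append(done)
        heapq.heappush(free_at, done)
    return sorted(completions)


def generation_times(completions, m):
    """Times at which a new generation could be formed: after every m completions."""
    return completions[m - 1::m]


# Alternating fast and slow evaluations on two workers (toy numbers).
times = [1.0, 9.0, 1.0, 9.0, 9.0, 1.0, 9.0, 1.0]
sync_total = synchronous_time(times, n_workers=2)
done = asynchronous_completions(times, n_workers=2)
gens = generation_times(done, m=2)
```

On this toy input the asynchronous schedule finishes all evaluations well before the synchronous one, because no worker sits idle waiting for a slow batch mate. Choosing M then controls how much fresh information each new generation is built from.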

# 量子近似最適化アルゴリズムによる分子ドッキング

Molecular docking via quantum approximate optimization algorithm ( http://arxiv.org/abs/2308.04098v1 )

ライセンス: Link先を確認

Qi-Ming Ding, Yi-Ming Huang, Xiao Yuan

(参考訳) 分子ドッキングは、薬物発見と精密医療において重要な役割を担い、タンパク質の機能を理解し、新しい治療法を進歩させることができる。本稿では, 量子コンピュータ上での逆ダイアバティック駆動とqaoaを利用した, ディジタル化カウンタダイアバティック量子近似最適化アルゴリズム (dc-qaoa) を提案する。 PM-2-020BのSARS-CoV-2 Mpro複合体,イミダゾピリジン34のDPP-4複合体,JP-III-048のHIV-1 gp120複合体など,多様な生物学的システムの解析に応用した。 DC-QAOAは優れた性能を示し、特に大きな分子ドッキング問題に対して、より正確で生物学的に関連するドッキング結果を提供する。さらに、QAOAベースのアルゴリズムは、ノイズの多い中間スケール量子時代のハードウェア互換性を向上し、実用的なドッキングシナリオ下での効率的な実装の可能性を示している。我々の発見は、量子コンピューティングの創薬の可能性の中核となり、タンパク質リグナンドドッキングプロセスを最適化するための貴重な洞察を提供する。

Molecular docking plays a pivotal role in drug discovery and precision medicine, enabling us to understand protein functions and advance novel therapeutics. Here, we introduce a potential alternative solution to this problem, the digitized-counterdiabatic quantum approximate optimization algorithm (DC-QAOA), which utilizes counterdiabatic driving and QAOA on a quantum computer. Our method was applied to analyze diverse biological systems, including the SARS-CoV-2 Mpro complex with PM-2-020B, the DPP-4 complex with piperidine fused imidazopyridine 34, and the HIV-1 gp120 complex with JP-III-048. The DC-QAOA exhibits superior performance, providing more accurate and biologically relevant docking results, especially for larger molecular docking problems. Moreover, QAOA-based algorithms demonstrate enhanced hardware compatibility in the noisy intermediate-scale quantum era, indicating their potential for efficient implementation under practical docking scenarios. Our findings underscore quantum computing's potential in drug discovery and offer valuable insights for optimizing protein-ligand docking processes.

翻訳日:2023-08-09 13:45:59 公開日:2023-08-08

# ユニモーダルからマルチモーダルへ:深い生成モデルによるsEMGに基づくパターン認識の改善

From Unimodal to Multimodal: improving the sEMG-Based Pattern Recognition via deep generative models ( http://arxiv.org/abs/2308.04091v1 )

ライセンス: Link先を確認

Wentao Wei, Linyan Ren

(参考訳) マルチモーダルハンドジェスチャー認識(HGR)システムは高い認識精度を実現する。しかし、マルチモーダルなジェスチャー認識データを取得するには、ユーザーが追加のセンサーを装着する必要があるため、ハードウェアコストが増加する。本稿では,仮想慣性計測ユニット(IMU)信号を用いた表面筋電図(sEMG)に基づくHGRの精度向上のための新しい生成手法を提案する。具体的には,前腕sEMG信号と前腕IMU信号の内在的相関に基づいて深部生成モデルを訓練し,入力前腕sEMG信号から仮想前腕IMU信号を生成する。その後、sEMG信号と仮想IMU信号は、ジェスチャー認識のためのマルチモーダル畳み込みニューラルネットワーク(CNN)モデルに入力される。提案手法の性能を評価するため,5つの公開データベースと,28名の被験者が38種類のジェスチャーを行いsEMGデータとIMUデータの両方を含む独自収集データベースの計6つのデータベースで実験を行った。その結果,提案手法は, sEMGに基づく単一モーダルHGR法を上回った(2.15%～13.10%の向上)。深部生成モデルにより生成された仮想IMU信号の組み込みは、sEMGベースのHGRの精度を大幅に向上させることを示した。提案手法は,センサハードウェアを追加せずに単一モーダルHGRからマルチモーダルHGRへ移行する試みとして成功したものである。

Multimodal hand gesture recognition (HGR) systems can achieve higher recognition accuracy. However, acquiring multimodal gesture recognition data typically requires users to wear additional sensors, thereby increasing hardware costs. This paper proposes a novel generative approach to improve Surface Electromyography (sEMG)-based HGR accuracy via virtual Inertial Measurement Unit (IMU) signals. Specifically, we first trained a deep generative model, based on the intrinsic correlation between forearm sEMG signals and forearm IMU signals, to generate virtual forearm IMU signals from the input forearm sEMG signals. Subsequently, the sEMG signals and virtual IMU signals were fed into a multimodal Convolutional Neural Network (CNN) model for gesture recognition. To evaluate the performance of the proposed approach, we conducted experiments on 6 databases: 5 publicly available databases and our own collected database, which comprises 28 subjects performing 38 gestures and contains both sEMG and IMU data. The results show that our proposed approach outperforms the sEMG-based unimodal HGR method (with increases of 2.15%-13.10%). This demonstrates that incorporating virtual IMU signals, generated by deep generative models, can significantly enhance the accuracy of sEMG-based HGR. The proposed approach represents a successful attempt to transition from unimodal HGR to multimodal HGR without additional sensor hardware.

翻訳日:2023-08-09 13:45:39 公開日:2023-08-08

# メタバースにおける異種360度ビデオ:差分強化学習アプローチ

Heterogeneous 360 Degree Videos in Metaverse: Differentiated Reinforcement Learning Approaches ( http://arxiv.org/abs/2308.04083v1 )

ライセンス: Link先を確認

Wenhan Yu and Jun Zhao

(参考訳) 高度なビデオ技術が未来的なメタバースの開発を後押ししている。そのため、ユーザーのユースケースはもっと多様になり、360度ビデオとVR以外の2種類のビデオが混在するようになる。本稿では,フレームレートとサイバーシックネスの異なる360度ビデオに対して,新しい品質のサービスモデルを提案する。本稿では,自己設計の深部強化学習アルゴリズムを用いたフレームスロット構造とフレームワイズ最適化を提案する。具体的には、この異種シナリオに対して、SIDO(Separate Input Differentiated Output)とMIDO(Merged Input Differentiated Output)という2つの構造を設計する。また,その効果を示すため,総合的な実験を行う。

Advanced video technologies are driving the development of the futuristic Metaverse, which aims to connect users from anywhere and anytime. As such, the use cases for users will be much more diverse, leading to a mix of 360-degree videos with two types: non-VR and VR 360-degree videos. This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements for frame rates and cybersickness. We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms. Specifically, we design two structures, Separate Input Differentiated Output (SIDO) and Merged Input Differentiated Output (MIDO), for this heterogeneous scenario. We also conduct comprehensive experiments to demonstrate their effectiveness.

翻訳日:2023-08-09 13:45:14 公開日:2023-08-08

# QUARKを用いた量子生成学習のアプリケーション指向ベンチマーク

Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK ( http://arxiv.org/abs/2308.04082v1 )

ライセンス: Link先を確認

Florian J. Kiwit, Marwa Marso, Philipp Ross, Carlos A. Riofrío, Johannes Klepsch, Andre Luckow

(参考訳) 量子機械学習(QML)アルゴリズムのベンチマークは、QMLシステムの複雑さと変動性、例えば、モデルアンザッツ、データセット、トレーニング技術、ハイパーパラメータ選択などによって困難である。 QUantum computing Application benchmaRK (QUARK) フレームワークは、量子コンピューティングアプリケーションのためのベンチマーク研究を単純化し、標準化する。本稿では、量子生成モデルのトレーニングと展開を評価する機能を含むQUARKの拡張をいくつか提案する。ソフトウェアアーキテクチャの更新について述べるとともに,その柔軟性を,いくつかのサンプルアプリケーションを通じて説明している。 (1) 複数の回路アンザッツ、データセット、データ変換を用いて、異なる量子生成モデルを訓練した。 (2) GPUおよび実量子ハードウェアを用いたモデルの評価を行った。 (3) 生成モデルの一般化能力は, 生成データの新規性や妥当性など, 幅広い指標を用いて評価した。

Benchmarking of quantum machine learning (QML) algorithms is challenging due to the complexity and variability of QML systems, e.g., regarding model ansatzes, data sets, training techniques, and hyperparameter selection. The QUantum computing Application benchmaRK (QUARK) framework simplifies and standardizes benchmarking studies for quantum computing applications. Here, we propose several extensions of QUARK to include the ability to evaluate the training and deployment of quantum generative models. We describe the updated software architecture and illustrate its flexibility through several example applications: (1) We trained different quantum generative models using several circuit ansatzes, data sets, and data transformations. (2) We evaluated our models on GPU and real quantum hardware. (3) We assessed the generalization capabilities of our generative models using a broad set of metrics that capture, e.g., the novelty and validity of the generated data.

翻訳日:2023-08-09 13:45:02 公開日:2023-08-08

# リアルタイム放射フィールドレンダリングのための3次元gaussian splatting

3D Gaussian Splatting for Real-Time Radiance Field Rendering ( http://arxiv.org/abs/2308.04079v1 )

ライセンス: Link先を確認

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis

(参考訳) ラジアンス・フィールド法は、最近、複数の写真やビデオで撮影されたシーンの新規ビュー合成に革命をもたらした。しかし、高い視覚的品質を達成するには、トレーニングとレンダリングにコストがかかるニューラルネットワークが依然として必要であり、近年の高速な手法は必然的に速度と品質をトレードオフしている。非有界で完全なシーン(孤立したオブジェクトではなく)と1080p解像度のレンダリングでは、現在の方法ではリアルタイムの表示速度を達成できない。我々は、競争力のあるトレーニング時間を維持しつつ最先端の視覚品質を達成し、1080pの解像度で高品質なリアルタイム(30fps以上)の新規ビュー合成を可能にする3つの重要な要素を導入する。第一に、カメラキャリブレーション中に生成されるスパースな点群から出発し、シーン最適化に望ましい連続的なボリューメトリック放射フィールドの性質を保ちながら、空の空間での不要な計算を避ける3Dガウシアンでシーンを表現する。第二に、3Dガウシアンの最適化と密度制御を交互に行い、特に異方性共分散を最適化してシーンの正確な表現を実現する。第三に、異方性スプラッティングをサポートし、トレーニングの高速化とリアルタイムレンダリングの両方を可能にする、高速な可視性考慮型レンダリングアルゴリズムを開発する。複数の確立されたデータセット上で、最先端の視覚品質とリアルタイムレンダリングを実証する。

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
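To make "anisotropic covariance" and "splatting" concrete, here is a minimal single-pixel sketch in plain Python. It is an illustration under stated assumptions, not the authors' renderer: the abstract does not describe the blending rule, so the front-to-back alpha compositing used here is an assumption, and the function names, the dictionary layout of a splat, and the scalar "color" are all hypothetical.

```python
import math


def gaussian_weight(point, mean, cov):
    """Unnormalised 2D Gaussian falloff exp(-0.5 * d^T cov^-1 d)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Apply the inverse of the 2x2 covariance to the offset (dx, dy).
    qx = (d * dx - b * dy) / det
    qy = (-c * dx + a * dy) / det
    return math.exp(-0.5 * (dx * qx + dy * qy))


def composite(splats, pixel):
    """Front-to-back alpha blending of depth-sorted splats at one pixel."""
    color = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for splat in sorted(splats, key=lambda s: s["depth"]):
        alpha = splat["opacity"] * gaussian_weight(pixel, splat["mean"], splat["cov"])
        color += transmittance * alpha * splat["color"]
        transmittance *= 1.0 - alpha
    return color


if __name__ == "__main__":
    # A splat stretched along x: the same distance costs less along x than y.
    stretched = [[4.0, 0.0], [0.0, 1.0]]
    print(gaussian_weight((2.0, 0.0), (0.0, 0.0), stretched))
    print(gaussian_weight((0.0, 2.0), (0.0, 0.0), stretched))
```

The anisotropy shows up as direction-dependent falloff: a covariance that is larger along one axis lets the splat cover more of the image in that direction.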

翻訳日:2023-08-09 13:44:48 公開日:2023-08-08

# 連続波レーザーの偏光パス相関のコヒーレンス操作によるマクロ量子相関

Macroscopic quantum correlation using coherence manipulations of polarization-path correlations of a continuous-wave laser ( http://arxiv.org/abs/2308.04078v1 )

ライセンス: Link先を確認

B. S. Ham

(参考訳) 量子重ね合わせは通常、ハイゼンベルクの不確かさ原理が支配する微視的な方法で持続する。ペア粒子間の量子相関は、古典物理学によって支配される局所実在論の違反を意味する。過去数十年間、量子機能は量子コンピューティング、通信、センシングなど様々な量子技術に実装されてきた。このような量子的特徴は一般に古典的な手段では不可能であることが知られている。ここでは、連続波レーザーの偏光-パス相関のコヒーレンス操作のためのマクロ量子相関を提示し、分離不能な積-基底形式の結合パラメータ関係を満たす。偏光パス相関のコヒーレンス制御には、一対の電気光学変調器が干渉しないマッハ・ツェンダー干渉計において、対の偏光基底間の決定論的切替に使用され、従った一対の光変調器によって選択された積-基底選択の偏光積-基底重ねが生じる。この前例のないマクロな量子特徴は、将来の古典光学互換量子情報のための顕微鏡的状態を超えた量子力学の新しい理解の扉を開く。

Quantum superposition is normally sustained in a microscopic regime governed by the Heisenberg uncertainty principle applicable to a single particle. Quantum correlation between paired particles implies the violation of local realism governed by classical physics. Over the last decades, quantum features have been implemented in various quantum technologies including quantum computing, communications, and sensing. Such quantum features are generally known to be impossible by any classical means. Here, a macroscopic quantum correlation is presented for coherence manipulations of polarization-path correlations of a continuous-wave laser, satisfying the joint-parameter relation in an inseparable product-basis form. For the coherence control of the polarization-path correlation, a pair of electro-optic modulators is used in a noninterfering Mach-Zehnder interferometer for deterministic switching between paired polarization bases, resulting in a polarization product-basis superposition whose product basis is selected by a subsequent pair of acousto-optic modulators. This unprecedented macroscopic quantum feature opens the door to a new understanding of quantum mechanics beyond the microscopic regime for future classical optics-compatible quantum information.

翻訳日:2023-08-09 13:44:29 公開日:2023-08-08

# 難易度を考慮に入れた深層学習分類器の性能に関する総合的評価

Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness ( http://arxiv.org/abs/2308.04137v1 )

ライセンス: Link先を確認

Michael W. Spratling

(参考訳) 信頼性が高くロバストな評価手法は、それ自体が堅牢で信頼性の高い機械学習モデルを開発する上で必要な第一歩である。残念ながら、分類器を評価するために一般的に使用される現在の評価プロトコルは、限られた種類のテストデータに依存する傾向があるため、パフォーマンスを総合的に評価できない。例えば、標準のテストデータを使用すると、分類器がトレーニングしていないクラスからサンプルへの予測を評価することができない。一方、未知クラスのサンプルを含むデータを用いたテストでは、分類器が既知のクラスのラベルをどの程度正確に予測できるかを評価することができない。本稿では,多種多様なデータを用いたベンチマーキング性能と,そのようなデータ型すべてに適用可能な単一のメトリクスを用いて,一貫した性能評価を行う。このようなベンチマークを用いて、現在のディープニューラルネットワークは、最先端のロバスト性を生み出すと信じられているメソッドで訓練されているものを含む、ある種のデータに対するミスに対して極めて脆弱であることが判明した。つまり、このようなモデルは、さまざまなドメインのデータに遭遇する可能性のある現実のシナリオでは信頼できないし、誤った判断をするのは簡単に騙されるため、安全ではない、ということだ。これらの結果によって、より包括的なテスト手法が広く採用され、その結果、将来的にはより堅牢な機械学習手法の開発につながることが期待されている。コードは以下の通り。 https://codeberg.org/mwspratling/RobustnessEvaluation

Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier for samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates benchmarking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure as they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future. Code is available at: https://codeberg.org/mwspratling/RobustnessEvaluation
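The abstract does not define its single metric, so the following is only a hypothetical illustration of what "one score for both known-class and unknown-class data" could look like. The function `unified_accuracy`, the record layout, and the confidence-threshold rejection rule are assumptions for illustration and should not be read as the metric used in the paper or its repository.

```python
def unified_accuracy(records, threshold):
    """Score known-class and unknown-class samples with one number.

    records: iterable of (true_label, predicted_label, confidence), where
    true_label is None for a sample from a class the model never saw.
    A known-class sample counts as correct when it is accepted and labelled
    correctly; an unknown-class sample counts as correct when it is rejected.
    """
    correct = 0
    total = 0
    for true_label, predicted_label, confidence in records:
        accepted = confidence >= threshold
        if true_label is None:
            correct += int(not accepted)
        else:
            correct += int(accepted and predicted_label == true_label)
        total += 1
    if total == 0:
        raise ValueError("no records to evaluate")
    return correct / total


if __name__ == "__main__":
    records = [
        ("cat", "cat", 0.90),   # known class, right label
        ("dog", "cat", 0.80),   # known class, wrong label
        (None, "cat", 0.95),    # unknown class, confidently mislabelled
        (None, "dog", 0.30),    # unknown class, correctly rejected
    ]
    print(unified_accuracy(records, threshold=0.5))
```

A score of this kind penalises a confident prediction on an unknown-class input in the same way as a wrong label on a known-class input, which is the type of failure the abstract says standard test sets leave unmeasured.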

翻訳日:2023-08-09 13:38:06 公開日:2023-08-08

# 量子エンタングルメントとスクイーズを用いたサブSQL電子場センシング

Sub-SQL electronic field sensing by simultaneously using quantum entanglements and squeezings ( http://arxiv.org/abs/2308.04136v1 )

ライセンス: Link先を確認

X. N. Feng, M. Zhang, and L. F. Wei

(参考訳) 量子エンタングルメント(quantum entanglement)と量子スクイージング(quantum squeezing)は、量子メトロロジーにおける感度の高い位相推定の標準量子限界(sql)を打ち負かすための2つの典型的なアプローチである。それぞれが、トラップされたイオンプラットフォームによる電界センシングの感度を向上させるために、すでに個別に利用されてきたが、実証された感度ゲインの上限は、SQL上の実験的な3dBと理論的な6dBと非常に限られている。ここで、内部(スピン)外部(オシレータ)状態の絡み合いと発振器のスクイージングを同時に使用して蓄積位相を効果的に増幅し、平均励起フォノン数を圧縮することにより、関連するパラメータを適切に設定できれば、これらの感度向上を効果的に超越することができることを示す。願わくば、この提案は、所望の電界や他のメトロロギーの繊細なセンシングのためのsqlのより強力なビートに対する新しいアプローチを提供する。

Quantum entanglement and quantum squeezing are the two most typical approaches to beating the standard quantum limit (SQL) of sensitive phase estimation in quantum metrology. Each of them has already been utilized individually to improve the sensitivity of electric field sensing with the trapped-ion platform, but the upper bound of the demonstrated sensitivity gain over the SQL is very limited, i.e., 3 dB experimentally and 6 dB theoretically. Here, by simultaneously using the internal (spin)-external (oscillator) state entanglement and the oscillator squeezing to effectively amplify the accumulated phase and compress the mean excited phonon number at the same time, we show that these sensitivity gains can be effectively surpassed, provided that the relevant parameters are properly set. Hopefully, the proposal provides a novel approach to beating the SQL by a larger margin in the sensitive sensing of the desired electric field and in other metrology tasks.

翻訳日:2023-08-09 13:37:22 公開日:2023-08-08

# マルチパススタッケルバーグ原子干渉法によるブロッホ振動相の研究

Bloch Oscillation Phases investigated by Multi-path Stuckelberg Atom Interferometry ( http://arxiv.org/abs/2308.04134v1 )

ライセンス: Link先を確認

Tahiyat Rahman, Anna Wirth-Singh, Andrew Ivanov, Daniel Gochnauer, Emmett Hough and Subhadeep Gupta

(参考訳) 加速光学格子でブロッホ振動(bos)を受ける原子は、2つの光子反動の運動量を得る。この技術は、原子光学のための大きな運動量伝達ツールを提供するが、原子干渉センサの完全な利用には、関連する位相を実験的に評価する必要がある。各BOは、スタッケルベルク干渉と呼ばれる干渉を引き起こす複数の交差を伴うランダウ・ツェナー交差を含む。我々はマルチパス・スタッケルベルク干渉計を開発し、BO中の原子相進化を最大100光子リコイル運動量移動で調べる。数値計算した単一粒子のシュロディンガー進化と比較し,高度にコヒーレントなBO配列を示し,基礎物理学およびセンシング応用におけるBO強化精密干渉計の位相安定性要件を評価する。

Atoms undergoing Bloch oscillations (BOs) in an accelerating optical lattice acquire momentum of two photon recoils per BO. This technique provides a large momentum transfer tool for atom optics, but its full exploitation for atom interferometric sensors requires experimental characterization of associated phases. Each BO involves a Landau-Zener crossing with multiple crossings inducing interference known as Stuckelberg interference. We develop a multi-path Stuckelberg interferometer and investigate atomic phase evolution during BOs, up to 100 photon recoil momentum transfer. We compare to numerically calculated single-particle Schrodinger evolution, demonstrate highly coherent BO sequences, and assess phase stability requirements for BO-enhanced precision interferometry in fundamental physics and sensing applications.

翻訳日:2023-08-09 13:36:50 公開日:2023-08-08

# 計測シャープネスと外乱トレードオフ

Measurement sharpness and disturbance trade-off ( http://arxiv.org/abs/2308.04133v1 )

ライセンス: Link先を確認

Nayere Saberian, Seyed Javad Akhtarshenas, and Fereshte Shahbeigi

(参考訳) 測定によって量子システムから情報を取得すると、通常は状態が乱される。しかし、与えられた測定に対する測定後の状態は一意ではなく、選択した測定モデルに強く依存しており、情報と外乱のパズルを複雑にしている。そこで2つの異なる問いが生じる。第一に、測定が引き起こす最小の外乱は何か。第二に、固定された外乱が発生した場合、最善のシナリオで可能な測定はどの程度の情報をもたらすか。本稿では,これらの問題に対処する様々な手法を提案し,偏りのないバイナリキュービット測定の集合と,ユニタルなキュービットチャネルの像と等価な測定後状態空間に対する明確な解を提供する。特に, この測定のシャープネスと測定前後の状態空間の平均忠実度との間,および,測定を局所的に適用した場合に測定後状態に保存される量子資源(コヒーレンスと不協和的な相関)とシャープネスとの間に,異なるトレードオフ関係があることを示す。

Obtaining information from a quantum system through a measurement typically disturbs its state. The post-measurement states for a given measurement, however, are not unique and highly rely on the chosen measurement model, complicating the puzzle of information-disturbance. Two distinct questions are then in order. Firstly, what is the minimum disturbance a measurement may induce? Secondly, when a fixed disturbance occurs, how informative is the possible measurement in the best-case scenario? Here, we propose various approaches to tackle these questions and provide explicit solutions for the set of unbiased binary qubit measurements and post-measurement state spaces that are equivalent to the image of a unital qubit channel. In particular, we show there are different trade-off relations between the sharpness of this measurement and the average fidelity of the pre-measurement and post-measurement state spaces as well as the sharpness and quantum resources preserved in the post-measurement states in terms of coherence and discord-like correlation once the measurement is applied locally.

翻訳日:2023-08-09 13:36:23 公開日:2023-08-08

# OmniDataComposer: マルチモーダルデータ融合と無限データ生成のための統一データ構造

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation ( http://arxiv.org/abs/2308.04126v1 )

ライセンス: Link先を確認

Dongyang Yu and Shihao Wang and Yuan Fang and Wangpeng An

(参考訳) 本稿では,マルチモーダルデータ融合と無制限データ生成のための革新的なアプローチであるOmniDataComposerについて述べる。コアとなるブレークスルーは、ビデオ、オーディオ、テキストを含むマルチモーダルなデータ入力の処理と統合に熟練した凝集性のあるデータ構造の導入だ。提案アルゴリズムは,映像・画像のキャプション抽出,高密度キャプション抽出,自動音声認識(ASR),光学文字認識(OCR),Recognize Anything Model (RAM),オブジェクト追跡など,複数の操作の進歩を活用している。 OmniDataComposerは、6400以上のオブジェクトのカテゴリを識別でき、視覚情報のスペクトルを大きく広げることができる。これらの多様なモダリティを融合させ、モダリティ間の相互強化を促進し、クロスモダリティデータの修正を促進する。最終的な出力は、各ビデオの入力を精巧なシーケンシャルなドキュメントに変換し、ビデオを徹底的な物語に変換し、大きな言語モデルによって処理しやすくする。将来の展望には、無制限のデータ生成を促進するために各モダリティ用のデータセットを最適化することが含まれる。この堅牢なベースは、ChatGPTのようなモデルに非常に貴重な洞察を提供し、ビデオキャプションのための高品質なデータセットを作成し、ビデオコンテンツに基づいた質問応答タスクを容易にする。 OmniDataComposerは、マルチモーダル学習の新たなステージを開拓し、AIによる複雑な実世界データの理解と生成を増大させる大きな可能性を与える。

This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to the core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm leverages advancements across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Recognize Anything Model (RAM), and object tracking. OmniDataComposer is capable of identifying over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among modalities and facilitating cross-modal data correction. The final output metamorphoses each video input into an elaborate sequential document, virtually transmuting videos into thorough narratives, making them easier to be processed by large language models. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust base will offer priceless insights to models like ChatGPT, enabling them to create higher quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, imparting enormous potential for augmenting AI's understanding and generation of complex, real-world data.

翻訳日:2023-08-09 13:36:00 公開日:2023-08-08

# 自治体意思決定支援におけるソーシャルメディアとトピックモデリングと感性分析

Social Media, Topic Modeling and Sentiment Analysis in Municipal Decision Support ( http://arxiv.org/abs/2308.04124v1 )

ライセンス: Link先を確認

Miloš Švaňa

(参考訳) 世界中の多くの都市がスマート化を目指している。しかし、スマートイニシアティブは一般市民の意見にあまり重みを与えないことが多い。ソーシャルメディアは市民の意見の最も重要な情報源の1つである。本稿では,自治体の意思決定を考慮したソーシャルメディア投稿処理フレームワークの試作について述べる。本フレームワークは,(1)各ソーシャルメディア投稿の感情極性を決定すること,(2)主要なトピックを識別し,それらのトピックを個別の投稿にマッピングすること,(3)これら2つの情報を各トピックに対して表現された全体感情を表すファジィ数に集約すること,の3段階からなる。任意にファジィ数は、各トピックに対して表される正と負の意見の「量」を示す2つの実数のタプルに還元することができる。このフレームワークはチェコのオストラヴァから約2ヶ月にわたって公開されたツイートで実証されている。このアプリケーションは、ファジィ数がよりリッチな方法で感情を表現し、ソーシャルメディア上で表現される意見の多様性を捉えていることを示す。

Many cities around the world are aspiring to become smart. However, smart initiatives often give little weight to the opinions of average citizens. Social media are one of the most important sources of citizen opinions. This paper presents a prototype of a framework for processing social media posts with municipal decision-making in mind. The framework consists of a sequence of three steps: (1) determining the sentiment polarity of each social media post, (2) identifying prevalent topics and mapping these topics to individual posts, and (3) aggregating these two pieces of information into a fuzzy number representing the overall sentiment expressed towards each topic. Optionally, the fuzzy number can be reduced into a tuple of two real numbers indicating the "amount" of positive and negative opinion expressed towards each topic. The framework is demonstrated on tweets published from Ostrava, Czechia over a period of about two months. This application illustrates how fuzzy numbers represent sentiment in a richer way and capture the diversity of opinions expressed on social media.
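The final reduction step, a per-topic tuple of positive and negative opinion "amounts", can be sketched as follows. This is a minimal illustration assuming each post already carries a topic and a polarity score in [-1, 1]; the abstract does not say how the fuzzy number itself is built, so that step is skipped here, and the function name `aggregate_by_topic` and the input layout are hypothetical.

```python
from collections import defaultdict


def aggregate_by_topic(posts):
    """Reduce (topic, polarity) pairs to {topic: (positive, negative)}.

    polarity is assumed to lie in [-1, 1]; positive and negative amounts are
    accumulated separately so that opposing opinions do not cancel out.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for topic, polarity in posts:
        if polarity >= 0:
            totals[topic][0] += polarity
        else:
            totals[topic][1] += -polarity
    return {topic: (pos, neg) for topic, (pos, neg) in totals.items()}


if __name__ == "__main__":
    posts = [
        ("transport", 0.5),
        ("transport", -0.5),
        ("parks", 0.25),
    ]
    # "transport" has a mean polarity of zero, yet the tuple keeps both sides.
    print(aggregate_by_topic(posts))
```

Keeping two numbers per topic is what lets a divided topic be told apart from one that nobody feels strongly about, which a single averaged score would conflate.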

翻訳日:2023-08-09 13:35:31 公開日:2023-08-08

# ディープラーニングを用いたカスタム熱力学の構築

Constructing Custom Thermodynamics Using Deep Learning ( http://arxiv.org/abs/2308.04119v1 )

ライセンス: Link先を確認

Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li

(参考訳) AIの最もエキサイティングな応用の1つは、以前に蓄積されたデータに基づく自動科学的発見と、対称性や保存法を含む既知の物理原則によって提供される制限である。このような自動仮説作成と検証は、従来の物理的直観が失敗する複雑な現象の研究を支援する。特に重要なのが複雑な動的システムであり、時間発展は外部のパラメータによって強く影響を受ける。本稿では,任意の確率散逸系のマクロ的動的記述を,その微視的軌跡の観察から直接学習する,一般化したOnsager原理に基づくプラットフォームを開発する。複雑性と大きさが完全に顕微鏡的な記述を非現実的にするシステムに注目し,理論マクロモデルの構築には広範なドメイン知識や試行錯誤が必要となる。我々の機械学習アプローチは、還元熱力学座標を同時に構築し、これらの座標上の力学を解釈することでこの問題に対処する。提案手法を理論的および実験的に検証し, 外部応用分野における長鎖の延伸を実証する。具体的には,(1)安定状態と遷移状態の同定,(2)伸張速度の制御など,3つの解釈可能な熱力学的座標を学習し,高分子伸展の動的景観を構築する。我々はさらに,このアプローチの普遍性を,異なる領域の無関係問題に適用することで実証する。空間的流行に対するマクロダイナミクスの構築であり,その手法が幅広い科学的・技術的応用に対応していることを示す。

One of the most exciting applications of AI is automated scientific discovery based on previously amassed data, coupled with restrictions provided by the known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Of particular importance are complex dynamic systems where their time evolution is strongly influenced by varying external parameters. In this paper we develop a platform based on a generalised Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. We focus on systems whose complexity and sheer sizes render complete microscopic description impractical, and constructing theoretical macroscopic models requires extensive domain knowledge or trial-and-error. Our machine learning approach addresses this by simultaneously constructing reduced thermodynamic coordinates and interpreting the dynamics on these coordinates. We demonstrate our method by studying theoretically and validating experimentally, the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including (1) the identification of stable and transition states and (2) the control of the stretching rate. We further demonstrate the universality of our approach by applying it to an unrelated problem in a different domain: constructing macroscopic dynamics for spatial epidemics, showing that our method addresses wide scientific and technological applications.

翻訳日:2023-08-09 13:35:15 公開日:2023-08-08

# ベクターグラフィック文書におけるマルチモーダルカラーレコメンデーション

Multimodal Color Recommendation in Vector Graphic Documents ( http://arxiv.org/abs/2308.04118v1 )

ライセンス: Link先を確認

Qianru Qiu, Xueting Wang, Mayu Otani

(参考訳) カラー選択はグラフィック文書設計において重要な役割を担い、様々な文脈を十分に考慮する必要がある。しかし、ドキュメント内の他の色やテキストコンテキストと調和する適切な色を推奨することは、経験豊富なデザイナーにとっても難しい課題である。本研究では,色とテクストのコンテキストを統合したマルチモーダルマスクカラーモデルを提案し,グラフィック文書のテキスト対応カラーレコメンデーションを提案する。提案モデルは,複数のパレットにおける色間の関係をキャプチャする自己注意ネットワークと,色とCLIPに基づくテキスト表現を組み込んだ相互注意ネットワークから構成される。提案手法は主に色とテキストに基づいて色を推奨するカラーパレット補完に焦点を当てている。また、与えられたテキストに対応する完全なカラーパレットを生成するフルパレット生成という別のカラーレコメンデーションタスクにも適用可能である。実験結果から,提案手法は従来のカラーパレット完成法よりも精度,色分布,ユーザエクスペリエンスを上回り,色多様性と地味パレットとの類似性について完全なパレット生成法が得られた。

Color selection plays a critical role in graphic document design and requires sufficient consideration of various contexts. However, recommending appropriate colors which harmonize with the other colors and textual contexts in documents is a challenging task, even for experienced designers. In this study, we propose a multimodal masked color model that integrates both color and textual contexts to provide text-aware color recommendation for graphic documents. Our proposed model comprises self-attention networks to capture the relationships between colors in multiple palettes, and cross-attention networks that incorporate both color and CLIP-based text representations. Our proposed method primarily focuses on color palette completion, which recommends colors based on the given colors and text. Additionally, it is applicable for another color recommendation task, full palette generation, which generates a complete color palette corresponding to the given text. Experimental results demonstrate that our proposed approach surpasses previous color palette completion methods on accuracy, color distribution, and user experience, as well as full palette generation methods concerning color diversity and similarity to the ground truth palettes.

翻訳日:2023-08-09 13:34:51 公開日:2023-08-08

# 分子イオンを用いたシリコン中のドナースピン量子ビットの配置精度の向上

Improved placement precision of implanted donor spin qubits in silicon using molecule ions ( http://arxiv.org/abs/2308.04117v1 )

ライセンス: Link先を確認

Danielle Holmes (1), Benjamin Wilhelm (1), Alexander M. Jakob (2), Xi Yu (1), Fay E. Hudson (1,3), Kohei M. Itoh (4), Andrew S. Dzurak (1,3), David N. Jamieson (2), Andrea Morello (1) ((1) CQC2T, School of Electrical Engineering and Telecommunications, UNSW Sydney, Australia, (2) CQC2T, School of Physics, The University of Melbourne, Australia, (3) Diraq, Sydney, Australia, (4) School of Fundamental Science and Technology, Keio University, Japan)

(参考訳) シリコン28($^{28}$Si)のドナースピンは、固体で最も高性能な量子ビットの1つであり、記録的なコヒーレンス時間と99%以上のゲート忠実度を提供する。ドナースピン量子ビットは、決定論的イオン注入という半導体産業互換の方法を用いて製造することができる。ここでは, 単原子ではなく分子イオンを注入することで, 製造方法の精度を向上できることを示す。傍観者イオンは関心のドーパントと同時に注入され、さらなる運動エネルギーを持つため、誘導された電子-ホール対を単一イオン検出器で検出する決定論的ドナー注入の検出信頼性を高める。これにより、検出信頼を損なうことなくドナー量子ビットの配置不確実性を最小化することができる。高品質なPドナー量子ビットを生成するために二フッ化リン(PF$_2^+$)分子イオンの適合性を検討した。 $^{19}$F核は$I = 1/2$のスピンを持つので、磁気ノイズを加えることによってデコヒーレンスを引き起こすため、Pドナー電子と超微細結合しないようにすることが必須である。二次イオン質量分析法を用いて、ドナー活性化アニールの間にFは量子ビットデバイスの活性領域から拡散して離れ、Pドナーは元の位置の近くにとどまることが確認された。 PF$_2$を注入した量子ビットデバイスを作製し、Pドナー電子上で電子スピン共鳴(ESR)測定を行った。 Pドナー電子に対して $T_2^* = 20.5 \pm 0.5$ $\mu$s のピュアデフェージング時間と $T_2^{Hahn} = 424 \pm 5$ $\mu$s のコヒーレンス時間が抽出され、これらは従来のP注入量子ビットデバイスに匹敵する値である。 PドナーESRスペクトルのより詳細な調査により、Pドナー近傍に$^{19}$Fの核スピンは見つからなかったことが判明した。したがって、分子イオンは、長寿命ドナースピン量子ビットを高精度かつ決定論的に注入した配列を生成するうえで大いに有望である。

Donor spins in silicon-28 ($^{28}$Si) are among the most performant qubits in the solid state, offering record coherence times and gate fidelities above 99%. Donor spin qubits can be fabricated using the semiconductor-industry compatible method of deterministic ion implantation. Here we show that the precision of this fabrication method can be boosted by implanting molecule ions instead of single atoms. The bystander ions, co-implanted with the dopant of interest, carry additional kinetic energy and thus increase the detection confidence of deterministic donor implantation employing single ion detectors to signal the induced electron-hole pairs. This allows the placement uncertainty of donor qubits to be minimised without compromising on detection confidence. We investigate the suitability of phosphorus difluoride (PF$_2^+$) molecule ions to produce high quality P donor qubits. Since $^{19}$F nuclei have a spin of $I = 1/2$, it is imperative to ensure that they do not hyperfine couple to P donor electrons as they would cause decoherence by adding magnetic noise. Using secondary ion mass spectrometry, we confirm that F diffuses away from the active region of qubit devices while the P donors remain close to their original location during a donor activation anneal. PF$_2$-implanted qubit devices were then fabricated and electron spin resonance (ESR) measurements were performed on the P donor electron. A pure dephasing time of $T_2^* = 20.5 \pm 0.5$ $\mu$s and a coherence time of $T_2^{Hahn} = 424 \pm 5$ $\mu$s were extracted for the P donor electron; these values are comparable to those found in previous P-implanted qubit devices. Closer investigation of the P donor ESR spectrum revealed that no $^{19}$F nuclear spins were found in the vicinity of the P donor. Molecule ions therefore show great promise for producing high-precision deterministically-implanted arrays of long-lived donor spin qubits.

翻訳日:2023-08-09 13:34:30 公開日:2023-08-08

# 意味的テクスト類似性における集団的人間の意見

Collective Human Opinions in Semantic Textual Similarity ( http://arxiv.org/abs/2308.04114v1 )

ライセンス: Link先を確認

Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, Karin Verspoor

(参考訳) セマンティックテキスト類似性(STS)の主観的な性質とSTSアノテーションの広汎な相違にもかかわらず、既存のベンチマークでは、平均的な人間格付けをゴールドスタンダードとして使用してきた。平均的なマスクは、低い合意の例における人間の意見の真の分布を隠蔽し、モデルが個々の評価が示す意味的曖昧さを捉えるのを防ぐ。本研究では,約15,000の文対と15万のラベルを持つ最初の不確実性対応STSデータセットであるUSTSを紹介する。分析により、スカラーも単一のガウス群も観測された判断のセットに適切に適合しないことが明らかになった。さらに,現在のstsモデルでは,個々のインスタンスに対する人間の不一致によるばらつきを捉えることはできず,集合データセットに対する予測信頼度を反映していることを示した。

Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ~15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgements adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.
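The claim that averaging masks the distribution of opinions can be seen with two toy rating sets. This is a minimal sketch with made-up ratings on a 0 to 4 scale (the scale and the numbers are assumptions, not data from USTS); `summarize` is a hypothetical helper.

```python
from statistics import mean, pstdev


def summarize(ratings):
    """Return (mean, population standard deviation) of one item's ratings."""
    return mean(ratings), pstdev(ratings)


if __name__ == "__main__":
    agree = [2, 2, 2, 2, 2, 2]   # annotators all give the same middle score
    split = [0, 0, 0, 4, 4, 4]   # annotators fall into two opposing camps
    print("agree:", summarize(agree))
    print("split:", summarize(split))
```

Both items receive the same averaged gold label, yet the second one is bimodal, which is also why a single Gaussian centred on the mean describes it poorly.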

翻訳日:2023-08-09 13:33:44 公開日:2023-08-08

# チャーン数の計算:実空間とツイスト境界条件の同値性

Calculations of Chern number: equivalence of real-space and twisted-boundary-condition formulae ( http://arxiv.org/abs/2308.04164v1 )

ライセンス: Link先を確認

Ling Lin, Yongguan Ke, Li Zhang and Chaohong Lee

(参考訳) チャーン数は二次元量子系の位相的特徴を特徴づける重要な不変量である。実空間チャーン数は、変換対称性を伴わずにシステムの位相的性質を抽出できるため、障害や不純物を伴うトポロジカルシステムの調査において重要な役割を果たす。一方、ツイスト境界条件(TBC)は、翻訳対称性のないチャーン数を定義するためにも用いられる。ここではこれらの異なるチャーン数の定義の関係について検討する。 TBC式と2つの実空間式(非可換チャーン数とボット指数式)を解析することにより、これらのアプローチが熱力学極限において等価であることを示す。等価性はハルダンモデルを通じて数値的に確認される。

The Chern number is a crucial invariant for characterizing the topological features of two-dimensional quantum systems. The real-space Chern number allows us to extract topological properties of systems without involving translational symmetry, and hence plays an important role in investigating topological systems with disorder or impurities. On the other hand, the twisted boundary condition (TBC) can also be used to define the Chern number in the absence of translational symmetry. Here we study the relation between these different definitions of the Chern number. Through analyzing the TBC formula and two real-space formulae (the non-commutative Chern number and the Bott index formula), we show that these approaches are equivalent in the thermodynamic limit. The equivalence is also numerically confirmed via the Haldane model.

翻訳日:2023-08-09 13:28:07 公開日:2023-08-08

# 散乱効果を考慮したディスプレイ下カメラ画像復元

Under-Display Camera Image Restoration with Scattering Effect ( http://arxiv.org/abs/2308.04163v1 )

ライセンス: Link先を確認

Binbin Song, Xiangyu Chen, Shuning Xu, and Jiantao Zhou

(参考訳) under-display camera(UDC)は、ノッチやパンチホールによる邪魔なしにフルスクリーンのビジュアル体験を提供する。しかし、ディスプレイの半透明性は必然的にUDC画像に深刻な劣化をもたらす。本稿では,ディスプレイによる散乱効果の具体的な考察により,UDC画像復元問題に対処する。ディスプレイを均質な散乱媒体として扱うことにより,散乱効果を明示的にモデル化する。散乱効果の物理モデルを用いて、画像合成のための画像形成パイプラインを改善し、基底真理を持つ現実的なUDCデータセットを構築する。最終的なUDC画像回復に対する散乱効果を抑制するために、2分岐復元ネットワークを設計する。より具体的には、散乱枝は、劣化した画像から散乱効果のパラメータを推定するためにチャンネルワイズ自己アテンションのグローバルモデリング能力を利用する。画像ブランチはCNNのローカル表現の利点を利用してクリアなシーンを復元する一方で、散乱ブランチによって暗黙的に誘導される。実世界のデータと合成データの両方で大規模な実験を行い、現状のUDC修復技術よりも提案手法の優位性を実証した。ソースコードとデータセットは https://github.com/NamecantbeNULL/SRUDC で入手できる。

The under-display camera (UDC) provides consumers with a full-screen visual experience without any obstruction due to notches or punched holes. However, the semi-transparent nature of the display inevitably introduces severe degradation into UDC images. In this work, we address the UDC image restoration problem with specific consideration of the scattering effect caused by the display. We explicitly model the scattering effect by treating the display as a homogeneous scattering medium. With the physical model of the scattering effect, we improve the image formation pipeline for image synthesis to construct a realistic UDC dataset with ground truths. To suppress the scattering effect for the eventual UDC image recovery, a two-branch restoration network is designed. More specifically, the scattering branch leverages the global modeling capabilities of channel-wise self-attention to estimate parameters of the scattering effect from degraded images, while the image branch exploits the local representation advantage of CNNs to recover clear scenes, implicitly guided by the scattering branch. Extensive experiments are conducted on both real-world and synthesized data, demonstrating the superiority of the proposed method over state-of-the-art UDC restoration techniques. The source code and dataset are available at https://github.com/NamecantbeNULL/SRUDC.

翻訳日:2023-08-09 13:27:52 公開日:2023-08-08

# EPCFormer:ユニバーサル参照ビデオオブジェクトセグメンテーションのための表現プロンプト協調トランスフォーマー

EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation ( http://arxiv.org/abs/2308.04162v1 )

ライセンス: Link先を確認

Jiajun Chen, Jiacheng Lin, Zhiqiang Xiao, Haolong Fu, Ke Nai, Kailun Yang, Zhiyong Li

(参考訳) 音声誘導型ビデオオブジェクトセグメンテーション(A-VOS)と参照型ビデオオブジェクトセグメンテーション(R-VOS)は、どちらもユーザが提供する表現プロンプトに従って、ビデオシーケンスから特定のオブジェクトをセグメントすることを目的としている。しかし、異なるモダリティの表現をモデル化する際の課題のため、現代の手法は相互作用の柔軟性と高精度なローカライゼーションとセグメンテーションのバランスをとるのに苦労している。本稿では,音声とテキストのアライメント表現と,音声,テキスト,視覚的特徴間の深い相互作用という2つの観点からこの問題に対処する。まず,epcformerにおいて,汎用アーキテクチャであるexpression prompt collaboration transformerを提案する。次に,音声およびテキスト表現のための表現アライメント(EA)機構を提案する。音声およびテキスト表現のコントラスト学習を導入することにより,同じオブジェクトを表す音声とテキスト表現間の意味的等価性の理解を実現する。次に,音声,テキスト,映像間の深いインタラクションを容易にするために,表現・視覚注意(eva)機構を導入する。表現プロンプトの観点からの映像オブジェクトのセグメンテーションの知識は,テキストと音声の相補的手がかりを深く探求することにより,2つのタスク間のシームレスな移動を可能にする。良く認識されたベンチマークの実験は、我々の普遍的なEPCFormerが両方のタスクで最先端の結果を得ることを示した。 EPCFormerのソースコードはhttps://github.com/lab206/EPCFormerで公開されている。

Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly-related tasks, which both aim to segment specific objects from video sequences according to user-provided expression prompts. However, due to the challenges in modeling representations for different modalities, contemporary methods struggle to strike a balance between interaction flexibility and high-precision localization and segmentation. In this paper, we address this problem from two perspectives: the alignment representation of audio and text and the deep interaction among audio, text, and visual features. First, we propose a universal architecture, the Expression Prompt Collaboration Transformer, herein EPCFormer. Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions. By introducing contrastive learning for audio and text expressions, the proposed EPCFormer realizes comprehension of the semantic equivalence between audio and text expressions denoting the same objects. Then, to facilitate deep interactions among audio, text, and video features, we introduce an Expression-Visual Attention (EVA) mechanism. The knowledge of video object segmentation in terms of the expression prompts can seamlessly transfer between the two tasks by deeply exploring complementary cues between text and audio. Experiments on well-recognized benchmarks demonstrate that our universal EPCFormer attains state-of-the-art results on both tasks. The source code of EPCFormer will be made publicly available at https://github.com/lab206/EPCFormer.

翻訳日:2023-08-09 13:27:32 公開日:2023-08-08

# 知識表現と推論の現状と課題

Current and Future Challenges in Knowledge Representation and Reasoning ( http://arxiv.org/abs/2308.04161v1 )

ライセンス: Link先を確認

James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, Frank Wolter

(参考訳) 知識表現と推論は人工知能の中心的で、長く、活発な領域である。近年では、機械学習や不確実性下での推論といった分野の研究によって、その研究に挑戦され、補完されている。 2022年7月、知識表現と推論に関するDagstuhl Perspectivesワークショップが開催された。ワークショップの目的は、他の分野との関係、欠点と強み、今後の進歩の勧告などを含む、この分野の最先端の状況を説明することであった。私たちは、Dagstuhl Workshopで行われたプレゼンテーション、パネル、ワーキンググループ、ディスカッションに基づいて、このマニフェストを開発しました。それは知識表現に関する私たちの見解の宣言である:その起源、目標、マイルストーン、現在の焦点、その他の分野、特に人工知能との関係、そしてその課題、そして次の10年の主要な優先事項である。

Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.

翻訳日:2023-08-09 13:27:07 公開日:2023-08-08

# ステレオ・アテンションによるトップダウン立体画像品質評価

Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention ( http://arxiv.org/abs/2308.04156v1 )

ライセンス: Link先を確認

Huilin Zhang, Sumei Li, Yongli Chang

(参考訳) 立体画像品質評価(SIQA)は、3Dコンテンツの視覚的体験を評価し改善する上で重要な役割を担っている。 SIQAの既存の双眼鏡特性と注意法は有望な性能を達成した。しかし、これらのボトムアップアプローチは、人間の視覚システム(HVS)の本質的な特徴を利用するには不十分である。本稿では,SIQAをステレオアテンションとして,品質評価プロセスの指針としてトップダウン視点を用いた新しいネットワークを提案する。提案手法は,高次双眼信号から低次単眼信号への誘導を実現する一方,両眼・単眼情報は処理パイプライン全体を通して段階的に校正することができる。我々は,ステレオ知覚におけるトップダウン哲学を実現するために,一般化ステレオアテンション(sat)ブロックを設計する。このブロックは、融合生成アテンションマップを2つの低レベル単眼特徴の表現に影響を与える高レベル双眼鏡変調器として利用する。さらに、霊長類一次視覚野の両眼反応が単眼反応の総和よりも小さいことを示す最近の知見を考慮に入れたエネルギー係数(EC)を導入する。適応ECは両眼反応の大きさを柔軟に調整できるため,我々の枠組み内での頑健な両眼特徴の形成が促進される。単眼的特徴の2つの枝の総和と減算から最も識別的品質情報を抽出するために,ミンプールとマックスプール操作を各枝に適用する二重プール戦略を用いる。実験結果から,SIQA分野における視覚知覚特性のシミュレーションと最先端化におけるトップダウン手法の優位性を強調した。この作業のコードはhttps://github.com/fanning-zhang/satnetで入手できる。

Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing binocular properties and attention-based methods for SIQA have achieved promising performance. However, these bottom-up approaches are inadequate in exploiting the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, employing a top-down perspective to guide the quality assessment process. Our proposed method realizes the guidance from high-level binocular signals down to low-level monocular signals, while the binocular and monocular information can be calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block utilizes the fusion-generated attention map as a high-level binocular modulator, influencing the representation of two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC can tune the magnitude of binocular response flexibly, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the property of visual perception and advancing the state-of-the-art in the SIQA field. The code of this work is available at https://github.com/Fanning-Zhang/SATNet.

翻訳日:2023-08-09 13:26:53 公開日:2023-08-08

# ビジョンランゲージモデルを用いたインターリーブ型ビジョンランゲージ指導

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions ( http://arxiv.org/abs/2308.04152v1 )

ライセンス: Link先を確認

Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang

(参考訳) 最近、MLLM(Multimodal Large Language Models)が大きな関心を集め、様々な視覚言語タスクの汎用モデルとして機能する創発的な能力を示している。しかし、既存の手法は主に、MLLMの普及を妨げる視覚的コンテキストとして単一のイメージを持つ限られたタイプの命令に焦点を当てている。本稿では,視覚に豊かなWebページ/テキスト,講義スライド,エンボディダイアログなど,さまざまなシナリオをカバーする複雑な画像テキストシーケンシャルなコンテキストを含む複雑な視覚言語命令に対する命令に従う能力を総合的に評価するI4ベンチマークを提案する。画像キャプションのアライメントを目標とするVisual Prompt Generator (VPG)は、キャプションのための一般的なフォアグラウンド情報に出席する傾向にあるが、特定のタスクに必要な特定の情報を抽出するのに苦労する。本稿では,LLMの高度な推論能力を利用してVPGを制御し,命令固有の視覚情報を条件付きで抽出し,LLMに再注入する汎用的で軽量な知識再注入モジュールを提案する。さらに,基礎モデルのカスケードを協調させることにより,提案モジュールを体系的に学習するための,アノテーションフリーな対物画像学習戦略を提案する。提案するモジュールとトレーニング戦略によって強化されたcheetahは,多種多様な視覚言語インストラクションを効果的に処理し,高品質なマルチモーダルインストラクションチューニングデータを用いずに,i4のすべてのタスクにおいて最先端のゼロショット性能を実現するmllmである。さらに、Cheetahは、同時MMEベンチマークにおける最先端の命令チューニングモデルと比較して、競合性能を示す。

Multimodal Large Language Models (MLLMs) have recently sparked significant interest, demonstrating emergent capabilities to serve as general-purpose models for various vision-language tasks. However, existing methods mainly focus on limited types of instructions with a single image as visual context, which hinders the widespread availability of MLLMs. In this paper, we introduce the I4 benchmark to comprehensively evaluate the instruction-following ability on complicated interleaved vision-language instructions, which involve intricate image-text sequential context and cover a diverse range of scenarios (e.g., visually-rich webpages/textbooks, lecture slides, embodied dialogue). Systematic evaluation on our I4 benchmark reveals a common defect of existing methods: the Visual Prompt Generator (VPG), trained on an image-captioning alignment objective, tends to attend to common foreground information for captioning but struggles to extract the specific information required by particular tasks. To address this issue, we propose a generic and lightweight controllable knowledge re-injection module, which utilizes the sophisticated reasoning ability of LLMs to control the VPG to conditionally extract instruction-specific visual information and re-inject it into the LLM. Further, we introduce an annotation-free, cross-attention-guided counterfactual image training strategy to methodically learn the proposed module through the collaboration of a cascade of foundation models. Enhanced by the proposed module and training strategy, we present Cheetah, an MLLM that can effectively handle a wide variety of interleaved vision-language instructions and achieves state-of-the-art zero-shot performance across all tasks of I4, without high-quality multimodal instruction tuning data. Moreover, Cheetah also exhibits competitive performance compared with state-of-the-art instruction-tuned models on the concurrent MME benchmark.

翻訳日:2023-08-09 13:26:24 公開日:2023-08-08

# エッジ機械学習を用いた白斑症候群ウイルス(WSSV)モニタリングへの応用

Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning ( http://arxiv.org/abs/2308.04151v1 )

ライセンス: Link先を確認

Lorenzo S. Querol, Macario O. Cordel II, Dan Jeric A. Rustia, Mary Nia M. Santos

(参考訳) 養殖産業はエビの輸出に強く依存しており、生産に深刻な影響を及ぼすホワイトスポット症候群ウイルス(WSSV)のようなウイルス感染による課題に直面している。この文脈では、コンピュータビジョンは、熟練した目や訓練されていない目ですぐに明らかでない特徴を特定する上で重要な役割を果たす。本研究は,WSSV認識のための限られたデータに対する課題である。データ収集とモニタリングに特化したモバイルアプリケーションは、WSSV認識モデルをトレーニングし、国全体の疾病監視を改善するためのイメージデータセットの作成を容易にするために開発された。この研究は、不均衡学習とデバイス上の推論の課題に対処するために、WSSV認識の徹底的な分析も含んでいる。 MobileNetV3-SmallとEfficientNetV2-B0がそれぞれ0.72と0.99のF1スコアを獲得した。両方のモデルの塩分ヒートマップは、これらのモデルの「ブラックボックス」の性質を明らかにし、画像のどの特徴が予測に最も重要であるかについての洞察を得るためにも観察された。これらの結果は、リソース制約のあるデバイス用に設計されたモデルを使用することの有効性と限界を強調し、WSSVを正確に認識し、この領域におけるコンピュータビジョンの使用における貴重な情報と方向性を提供する。

The aquaculture industry, strongly reliant on shrimp exports, faces challenges due to viral infections like the White Spot Syndrome Virus (WSSV) that severely impact output yields. In this context, computer vision can play a significant role in identifying features not immediately evident to skilled or untrained eyes, potentially reducing the time required to report WSSV infections. In this study, the challenge of limited data for WSSV recognition was addressed. A mobile application dedicated to data collection and monitoring was developed to facilitate the creation of an image dataset to train a WSSV recognition model and improve country-wide disease surveillance. The study also includes a thorough analysis of WSSV recognition to address the challenges of imbalanced learning and on-device inference. The models explored, MobileNetV3-Small and EfficientNetV2-B0, achieved F1-scores of 0.72 and 0.99, respectively. The saliency heatmaps of both models were also examined to probe the "black-box" nature of these models and to gain insight into which image features are most important in making a prediction. These results highlight the effectiveness and limitations of using models designed for resource-constrained devices and of balancing their performance in accurately recognizing WSSV, providing valuable information and direction for the use of computer vision in this domain.
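Because the abstract reports F1-scores in an imbalanced-learning setting, a short sketch of why F1 is more informative than accuracy there may help. The abstract does not say how F1 was averaged, so the binary, positive-class F1 below is an assumption, and the label counts are hypothetical rather than taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced test set: 95 healthy (0) and 5 infected (1) images.
y_true = [0] * 95 + [1] * 5
always_healthy = [0] * 100          # a classifier that never flags WSSV
one_hit = [0] * 95 + [1] + [0] * 4  # finds only 1 of the 5 infected images
```

On this toy set both classifiers score high accuracy, while their F1 for the infected class stays low, which is the behavior that makes F1 the more honest metric when positives are rare.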

翻訳日:2023-08-09 13:25:50 公開日:2023-08-08

# ハイブリッドフィードフォワード受信機による二相シフト鍵識別のための標準量子限界のビーティング

Beating the standard quantum limit for binary phase-shift-keying discrimination with a hybrid feed-forward receiver ( http://arxiv.org/abs/2308.04146v1 )

ライセンス: Link先を確認

Michele N. Notarnicola and Stefano Olivares

(参考訳) 変位フィードフォワード受信機(DFFRE)と、低強度局所発振器と光子数分解検出器を用いたホモダイン型構成とを適切に組み合わせることで、二相シフト鍵コヒーレント状態の判別を行うハイブリッドフィードフォワード受信機(HFFRE)を提案する。提案手法の性能を、非単位量子検出効率、暗カウント、可視性低下の存在下での現実的なシナリオも含めて検討する。現在のHFFREは、全ての条件においてDFFREよりも優れており、特定の領域において標準量子限界を上回っている。

We propose a hybrid feed-forward receiver (HFFRE) for the discrimination of binary phase-shift-keyed coherent states based on the appropriate combination of the displacement feed-forward receiver (DFFRE) and a homodyne-like setup employing a low-intensity local oscillator and photon-number-resolving detectors. We investigate the performance of the proposed scheme addressing also realistic scenarios in the presence of non-unit quantum detection efficiency, dark counts and a visibility reduction. The present HFFRE outperforms the DFFRE in all conditions, beating the standard quantum limit in particular regimes.
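To make "beating the standard quantum limit" concrete, the sketch below evaluates two textbook reference curves for discriminating the coherent states |+alpha> and |-alpha> with equal prior probabilities: the ideal homodyne error (the standard quantum limit) and the Helstrom bound. It does not model the HFFRE or DFFRE themselves; the function names are assumptions, and `mean_photons` stands for |alpha|^2.

```python
import math


def sql_error(mean_photons):
    """Ideal homodyne (standard quantum limit) error probability for BPSK."""
    return 0.5 * math.erfc(math.sqrt(2.0 * mean_photons))


def helstrom_error(mean_photons):
    """Helstrom bound: the minimum error probability allowed by quantum mechanics."""
    return 0.5 * (1.0 - math.sqrt(1.0 - math.exp(-4.0 * mean_photons)))


for n in (0.1, 0.5, 1.0, 2.0):
    print(f"|alpha|^2 = {n:3.1f}  SQL = {sql_error(n):.3e}  Helstrom = {helstrom_error(n):.3e}")
```

Any receiver whose error probability falls between the two curves at a given signal energy beats the standard quantum limit while remaining above the Helstrom bound.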

翻訳日:2023-08-09 13:25:29 公開日:2023-08-08

# 視覚表現学習のためのクラスレベル構造関係モデリングと平滑化

Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning ( http://arxiv.org/abs/2308.04142v1 )

ライセンス: Link先を確認

Zitan Chen, Zhuang Qi, Xiao Cao, Xiangxian Li, Xiangxu Meng, Lei Meng

(参考訳) 画像の表現学習は、視覚トランスフォーマーのようなより複雑な神経モデルや、構造因果モデルのような新しい学習理論の進歩によって進歩してきた。しかし、これらのモデルはクラスレベルのデータ分散を暗黙的に規則化する分類損失に主に依存しており、様々な視覚的パターンを持つクラスを扱う際に困難に直面する可能性がある。データサンプル間の構造情報の導入は,この状況を改善する可能性がある。この目的を達成するために，本稿では，視覚表現学習のためのクラスレベル構造関係モデリングと平滑化(CSRMS)と呼ぶフレームワークを提案する。このフレームワークは，クラスレベル関係モデリング，クラス認識グラフサンプリング，関係グラフ誘導表現学習の各モジュールからなり，データセット全体の関係グラフをモデル化し，クラスを考慮した平滑化と正則化を行うことで，クラス内の視覚的多様性とクラス間の類似性の問題を緩和する。具体的には，クラスレベル関係モデリングモジュールはクラスタリングアルゴリズムを用いて特徴空間におけるデータ分布を学習し，訓練セットに対して3種類のクラスレベルのサンプル関係を同定する。クラス認識グラフサンプリングモジュールは，典型的な訓練バッチ構築プロセスを3つの戦略で拡張し，データセットレベルのサブグラフをサンプリングする。関係グラフ誘導表現学習モジュールは，知識誘導平滑化操作を備えたグラフ畳み込みネットワークを用いて，異なる視覚パターンから同一クラスへの射影を容易にする。構造化知識モデルによる表現学習の効果を実証し、CSRMSを任意の最先端の視覚表現学習モデルと組み込むことで、パフォーマンスの向上が期待できることを示した。ソースコードとデモはhttps://github.com/czt117/CSRMSで公開されている。

Representation learning for images has been advanced by recent progress in more complex neural models such as the Vision Transformers and new learning theories such as the structural causal models. However, these models mainly rely on the classification loss to implicitly regularize the class-level data distributions, and they may face difficulties when handling classes with diverse visual patterns. We argue that the incorporation of the structural information between data samples may improve this situation. To achieve this goal, this paper presents a framework termed Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning (CSRMS), which includes the Class-level Relation Modelling, Class-aware Graph Sampling, and Relational Graph-Guided Representation Learning modules to model a relational graph of the entire dataset and perform class-aware smoothing and regularization operations to alleviate the issue of intra-class visual diversity and inter-class similarity. Specifically, the Class-level Relation Modelling module uses a clustering algorithm to learn the data distributions in the feature space and identify three types of class-level sample relations for the training set; the Class-aware Graph Sampling module extends the typical training batch construction process with three strategies to sample dataset-level sub-graphs; and the Relational Graph-Guided Representation Learning module employs a graph convolution network with knowledge-guided smoothing operations to ease the projection from different visual patterns to the same class. Experiments demonstrate the effectiveness of structured knowledge modelling for enhanced representation learning and show that CSRMS can be incorporated with any state-of-the-art visual representation learning models for performance gains. The source code and demos have been released at https://github.com/czt117/CSRMS.

翻訳日:2023-08-09 13:25:15 公開日:2023-08-08

# 長期法的文書分類のための大規模言語モデルプロンプトチェイン

Large Language Model Prompt Chaining for Long Legal Document Classification ( http://arxiv.org/abs/2308.04138v1 )

ライセンス: Link先を確認

Dietrich Trautmann

(参考訳) プロンプトは、望ましい結果に合致した適切な応答を生成する際に、言語モデルを誘導または制御するために使用される。チェイン(Chaining)は、複雑なタスクを小さな管理可能なコンポーネントに分解する戦略である。本研究は,広範な法律文書分類タスクにおいて,プロンプト・チェーンを活用し,その複雑なドメイン固有言語と相当な長さの制約を呈する。私たちのアプローチは、元の文書の簡潔な要約の作成から始まり、関連する例文とその対応するアノテーションをトレーニングコーパスから意味的に検索する。最後に、限定的なプロンプトからコンテキスト内学習を活用することで、タスクに基づいたラベルを割り当てるように促します。即時連鎖により、ゼロショット以上の性能を向上できるだけでなく、より小さなモデルを用いてChatGPTゼロショットのような大型モデルによって達成されるマイクロF1スコアを超越できることを実証する。

Prompting is used to guide or steer a language model in generating an appropriate response that is consistent with the desired outcome. Chaining is a strategy used to decompose complex tasks into smaller, manageable components. In this study, we utilize prompt chaining for extensive legal document classification tasks, which present difficulties due to their intricate domain-specific language and considerable length. Our approach begins with the creation of a concise summary of the original document, followed by a semantic search for related exemplar texts and their corresponding annotations from a training corpus. Finally, we prompt the model to assign a task-specific label, leveraging in-context learning from the few-shot prompt. We demonstrate that through prompt chaining, we can not only enhance the performance over zero-shot, but also surpass the micro-F1 score achieved by larger models, such as ChatGPT zero-shot, using smaller models.
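The three steps described in the abstract (summarize, retrieve similar annotated examples, prompt with few-shot context) can be sketched as below. This is a minimal illustration under stated assumptions, not the author's implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented, and bag-of-words cosine similarity is a crude stand-in for the semantic search used in the paper.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity of bag-of-words counts (stand-in for semantic search)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def classify_with_chain(document, train_summaries, labels, llm, k=2):
    """Three-step chain: summarize, retrieve similar labeled summaries, few-shot prompt.

    train_summaries is a list of (summary_text, label) pairs; llm is a
    hypothetical callable taking a prompt string and returning a string.
    """
    # Step 1: condense the long document so that it fits the context window.
    summary = llm(f"Summarize the following legal document concisely:\n{document}")
    # Step 2: pick the k most similar (summary, label) pairs from the training corpus.
    ranked = sorted(train_summaries, key=lambda ex: bow_cosine(summary, ex[0]), reverse=True)
    shots = ranked[:k]
    # Step 3: build a few-shot prompt and ask for one label from the allowed set.
    lines = [f"Choose one label from: {', '.join(labels)}."]
    for text, label in shots:
        lines.append(f"Text: {text}\nLabel: {label}")
    lines.append(f"Text: {summary}\nLabel:")
    prompt = "\n\n".join(lines)
    return llm(prompt).strip(), prompt
```

Returning the prompt alongside the label makes each link of the chain inspectable, which is useful when checking whether the retrieved examples are actually relevant to the summarized document.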

翻訳日:2023-08-09 13:24:49 公開日:2023-08-08

# 超強結合系におけるベル状態の超高速および決定論的生成

Ultrafast and deterministic generation of Bell states in the ultrastrong coupling regime ( http://arxiv.org/abs/2308.04183v1 )

ライセンス: Link先を確認

Xin Xie, Junlong Tian, Jie Peng

(参考訳) 我々は、非等方性2量子ラビモデル(qrm)の特別なダーク状態解を発見し、これは少なくとも1つの光子を持ち、カップリング状態全体において一定の固有エネルギーを持つ。そこで本研究では,暗黒状態に沿った断熱的進化を通じて2種類のベル状態を生成する手法を提案する。スタークシフトの助けを借りて、生成時間をサブナノ秒スケールに短縮することができ、共振器周波数の逆に比例し、忠実度は99%に達する。さらに、他の2種類のベル状態も超高速生成することができる。

We have found special dark-state solutions of the anisotropic two-qubit quantum Rabi model (QRM), which have at most one photon and constant eigenenergy over the whole coupling regime. Accordingly, we propose a scheme to deterministically generate two kinds of two-qubit Bell states through adiabatic evolution along the dark states. With the assistance of the Stark shift, the generation time can be reduced to subnanosecond scales, proportional to the inverse of the resonator frequency, with fidelity reaching 99%. Furthermore, the other two kinds of Bell states can also be generated on ultrafast timescales.
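For reference, the four two-qubit Bell states in the computational basis are given below. These are the standard definitions; the abstract does not state which two kinds are produced by the adiabatic scheme, so no particular grouping is implied here.

```latex
\[
|\Phi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right),
\qquad
|\Psi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right)
\]
```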

翻訳日:2023-08-09 13:16:43 公開日:2023-08-08

# 社会的に受け入れがたい談話分類(SUD)について : 「我々は同じページにいるのか?」

Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?" ( http://arxiv.org/abs/2308.04180v1 )

ライセンス: Link先を確認

Bruno Machado Carneiro, Michele Linardi, Julien Longhi

(参考訳) オンラインテキストにおけるsud(socially unacceptable discourse)の特徴付けと検出について検討した。我々は、これまで最先端の機械学習(ML) SUD検出ソリューションで使用されてきたさまざまなオンラインソースから、さまざまな手動の注釈付きテキストを含む、新しいコーパスを構築し、提示する。このグローバルな文脈は、異なる文脈からではなく、同じSUDカテゴリに関する知識を取得するSUD分類器の一般化能力をテストすることができる。この観点から、オープンチャレンジとオープンリサーチの方向性を議論することで、異なるアノテーションのモダリティがSUD学習にどのように影響するかを分析することができる。また、アノテーションタスクでドメインエキスパートをサポートするいくつかのデータインサイトも提供します。

We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art Machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning by discussing open challenges and open research directions. We also provide several data insights which can support domain experts in the annotation task.

翻訳日:2023-08-09 13:16:32 公開日:2023-08-08

# 医療のためのチャットボット:簡潔なレビュー

Assistive Chatbots for healthcare: a succinct review ( http://arxiv.org/abs/2308.04178v1 )

ライセンス: Link先を確認

Basabdatta Sen Bhattacharya, Vibhav Sinai Pissurlenkar

(参考訳) 医療サービスを支援する人工知能(AI)は、近年の世界的なパンデミックほど必要とされていない。ここでは、過去10年間(2013-2023)に提案された医療におけるAI対応チャットボットの現状について概観する。 AI対応技術に焦点が当てられているのは、チャットボットによる人間と機械のインタラクションの質を高め、人間と人間のインタラクションへの依存を減らし、人間の時間を節約できる可能性があるからだ。われわれのレビューは、患者サポートに使われている(商用)チャットボットはごくわずかだが、臨床試験段階にある他の(商用ではない)チャットボットもあることを示している。しかし、このテクノロジーに対する患者の安全とデータ保護に関する信頼の欠如に加えて、医療従事者や専門家の間では、そのメリットに対するより広い認識の欠如がある。また,ヒトと比較して,チャットボットの自然言語処理(NLP)スキルに不満を呈している。このチャットボットは、NLPテクノロジーのバーを育てた最近のChatGPTの導入にもかかわらず、医療支援の「ナロー」領域で機能する徹底的かつ厳格なチェックなしでは、患者の安全と医療倫理に信頼できない。私たちのレビューでは、公衆衛生サービスにおけるAI対応チャットボットのデプロイと統合を可能にするためには、時間の必要性が示唆されている。 (a) 研修・開発を中心とした医療コミュニティ (b) アウトリーチを通じて患者とより広い地域社会。

Artificial Intelligence (AI) for supporting healthcare services has never been more necessitated than by the recent global pandemic. Here, we review the state of the art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023). The focus on AI-enabled technology is because of its potential for enhancing the quality of human-machine interaction via Chatbots, reducing dependence on human-human interaction and saving man-hours. Our review indicates that there are a handful of (commercial) Chatbots that are being used for patient support, while there are others (non-commercial) that are in the clinical trial phases. However, there is a lack of trust in this technology regarding patient safety and data protection, as well as a lack of wider awareness of its benefits among healthcare workers and professionals. Also, patients have expressed dissatisfaction with the Natural Language Processing (NLP) skills of the Chatbots in comparison to humans. Notwithstanding the recent introduction of ChatGPT that has raised the bar for NLP technology, this Chatbot cannot be trusted with patient safety and medical ethics without thorough and rigorous checks to serve in the `narrow' domain of assistive healthcare. Our review suggests that to enable deployment and integration of AI-enabled Chatbots in public health services, the need of the hour is: to build technology that is simple and safe to use; and to build confidence in the technology among (a) the medical community, by focussed training and development, and (b) the patients and wider community, through outreach.

翻訳日:2023-08-09 13:16:20 公開日:2023-08-08

# ディープフェイク検出器はどの程度一般化可能か? 実証的研究

How Generalizable are Deepfake Detectors? An Empirical Study ( http://arxiv.org/abs/2308.04177v1 )

ライセンス: Link先を確認

Boquan Li, Jun Sun, Christopher M. Poskitt

(参考訳) ディープフェイクビデオや画像はますます信頼性が高くなり、詐欺やバイパスアクセス制御システムを促進する可能性から、大きな脅威となっている。これはディープフェイク検出法の開発を動機付けており、ディープラーニングモデルは実写映像と合成映像を区別するために訓練されている。残念ながら、既存の検出モデルは、トレーニングされていないデータセットのディープフェイクを一般化するのに苦労するが、なぜこの制限に対処できるのかを調査する作業はほとんど行われていない。本稿では,ディープフェイク検出器の汎用性に関する最初の実証的研究について述べる。本研究では,6つのdeepfakeデータセット,5つのdeepfake検出手法,および2つのモデル拡張手法を用いて,ゼロショット設定では検出器が一般化しないことを確認した。さらに, 検出器は, 合成法に特有の不要な特性を学習し, 識別的特徴の抽出に苦慮し, 一般化能力に限界があることが判明した。最後に、見えないデータセットをまたいで検出に普遍的に寄与するニューロンが存在することを見出し、ゼロショット一般化可能性への道筋を照明する。

Deepfake videos and images are becoming increasingly credible, posing a significant threat given their potential to facilitate fraud or bypass access control systems. This has motivated the development of deepfake detection methods, in which deep learning models are trained to distinguish between real and synthesized footage. Unfortunately, existing detection models struggle to generalize to deepfakes from datasets they were not trained on, but little work has been done to examine why or how this limitation can be addressed. In this paper, we present the first empirical study on the generalizability of deepfake detectors, an essential goal for detectors to stay one step ahead of attackers. Our study utilizes six deepfake datasets, five deepfake detection methods, and two model augmentation approaches, confirming that detectors do not generalize in zero-shot settings. Additionally, we find that detectors are learning unwanted properties specific to synthesis methods and struggling to extract discriminative features, limiting their ability to generalize. Finally, we find that there are neurons universally contributing to detection across seen and unseen datasets, illuminating a possible path forward to zero-shot generalizability.

翻訳日:2023-08-09 13:15:57 公開日:2023-08-08

# オープンドメインQAのためのモノトニックアグリゲーションについて

On Monotonic Aggregation for Open-domain QA ( http://arxiv.org/abs/2308.04176v1 )

ライセンス: Link先を確認

Sang-eun Han, Yeonseok Jeong, Seung-won Hwang, Kyungjae Lee

(参考訳) 質問応答 (QA) は, 支援文書を読み取ることなく, 回答のみを精査することで, 知識ソースからの音声検索において重要な課題である。特に、オープンドメインのQAは、制限なしの知識ソースに関するユーザの質問に答えることを目的としている。理想的には、ソースを追加することは精度を低下させるべきではないが、この特性("モノトニック性"と表記される)は現在の最先端のメソッドには当てはまらない。我々はその原因を特定し,それに基づいてジャッジ・スペシャリストの枠組みを提案する。本フレームワークは,(1)個々の情報源をカバーする専門的検索者/読み手,(2)最終回答を選択する専用言語モデルからなる。実験の結果,本フレームワークはモノトニック性を保証するだけでなく,最先端のマルチソースQA手法よりも優れていることがわかった。さらに,音声認識による雑音に対する単調性は頑健に保たれることを示す。コードと設定を公開しています。

Question answering (QA) is a critical task for speech-based retrieval from knowledge sources, as it sifts out only the answers without requiring users to read supporting documents. Specifically, open-domain QA aims to answer user questions on unrestricted knowledge sources. Ideally, adding a source should not decrease the accuracy, but we find this property (denoted as "monotonicity") does not hold for current state-of-the-art methods. We identify the cause, and based on that we propose the Judge-Specialist framework. Our framework consists of (1) specialist retrievers/readers to cover individual sources, and (2) a judge, a dedicated language model to select the final answer. Our experiments show that our framework not only ensures monotonicity, but also outperforms state-of-the-art multi-source QA methods on Natural Questions. Additionally, we show that our models robustly preserve the monotonicity against noise from speech recognition. We publicly release our code and settings.
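The two-stage structure described in the abstract, and the monotonicity property it targets, can be sketched as follows. This is only an outline under assumptions: in the paper the specialists are retriever/reader models and the judge is a dedicated language model, whereas here both are hypothetical plain callables, and `is_monotonic` merely checks measured accuracies rather than guaranteeing the property.

```python
def judge_specialist_answer(question, specialists, judge):
    """Collect one candidate answer per source, then let the judge pick the final one.

    specialists maps a source name to a hypothetical callable question -> answer;
    judge is a hypothetical callable (question, candidates) -> final answer.
    """
    candidates = [(name, reader(question)) for name, reader in specialists.items()]
    return judge(question, candidates)


def is_monotonic(accuracy_by_num_sources):
    """True if measured accuracy never drops as knowledge sources are added."""
    pairs = zip(accuracy_by_num_sources, accuracy_by_num_sources[1:])
    return all(later >= earlier for earlier, later in pairs)
```

Keeping one specialist per source means a newly added source contributes only one extra candidate for the judge to consider, instead of changing what the existing specialists retrieve.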

翻訳日:2023-08-09 13:15:35 公開日:2023-08-08

# 知識グラフを用いた薬物-薬物相互作用の予測

Predicting Drug-Drug Interactions Using Knowledge Graphs ( http://arxiv.org/abs/2308.04172v1 )

ライセンス: Link先を確認

Lizzy Farrugia, Lilian M. Azzopardi, Jeremy Debattista and Charlie Abela

(参考訳) 過去数十年間、人々は以前よりも多くの薬物を消費し、組み合わせ、ドラッグ・ドラッグ・インタラクション(DDI)の数を増やしてきた。未知のDDIを予測するために、近年では、単一の薬物特性を使用するよりも優れた薬物表現を提供するエンティティ間の関係を捉えることができるため、知識グラフ(KG)を導入し始めた。本稿では,公開薬物リポジトリからいくつかの薬物機能を1つのKGに統合し,様々な翻訳,因子化,ニューラルネットワーク(NN)ベースのKG埋め込み(KGE)手法を用いてそのノードを埋め込む,medicX end-to-endフレームワークを提案する。最終的に、未知のDDIを予測する機械学習(ML)アルゴリズムを使用します。異なる翻訳と分解に基づくKGEモデルの中で、最も優れた組み合わせは、ComplExとLong Short-Term Memory (LSTM) ネットワークの組込みであり、DrugBankのバージョン5.1.8にあるDDIに基づくデータセットでF1スコアの95.19%を得ることができた。このスコアは最先端のDeepDDIよりも5.61%良い。さらに,グラフニューラルネットワーク(GNN)を用いたグラフ自動エンコーダモデルも開発し,91.94%のF1スコアを達成した。その結果、GNNはComplExモデルよりもKGの基盤となるセマンティクスをマイニングする能力が強く、したがって、GNN内に高次元の埋め込みを使用することで、最先端のパフォーマンスを実現することができる。

In the last decades, people have been consuming and combining more drugs than before, increasing the number of Drug-Drug Interactions (DDIs). To predict unknown DDIs, recently, studies started incorporating Knowledge Graphs (KGs) since they are able to capture the relationships among entities providing better drug representations than using a single drug property. In this paper, we propose the medicX end-to-end framework that integrates several drug features from public drug repositories into a KG and embeds the nodes in the graph using various translation, factorisation and Neural Network (NN) based KG Embedding (KGE) methods. Ultimately, we use a Machine Learning (ML) algorithm that predicts unknown DDIs. Among the different translation and factorisation-based KGE models, we found that the best performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network, which obtained an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8. This score is 5.61% better than the state-of-the-art model DeepDDI. Additionally, we also developed a graph auto-encoder model that uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%. Consequently, GNNs have demonstrated a stronger ability to mine the underlying semantics of the KG than the ComplEx model, and thus using higher dimension embeddings within the GNN can lead to state-of-the-art performance.
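Since ComplEx is named as the best-performing embedding method, a short sketch of its scoring function may help. The function below is the standard ComplEx triple score; the embeddings are made-up toy values, the entity and relation names are hypothetical, and the downstream LSTM classifier from the paper is not shown.

```python
def complex_score(head, relation, tail):
    """ComplEx triple score: Re(sum_k h_k * r_k * conj(t_k))."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail)).real


# Toy 2-dimensional complex embeddings; the numbers are made up for illustration.
drug_a = [1 + 1j, 0.5 - 0.5j]
drug_b = [1 - 1j, 0.5 + 0.5j]
interacts_with = [0 + 1j, 0 - 1j]   # purely imaginary relation: antisymmetric score
co_prescribed = [1 + 0j, 2 + 0j]    # purely real relation: symmetric score

forward = complex_score(drug_a, interacts_with, drug_b)
backward = complex_score(drug_b, interacts_with, drug_a)
```

With a purely imaginary relation vector, swapping head and tail flips the sign of the score, while a purely real one gives the same score in both directions; this ability to represent both symmetric and asymmetric relations is a commonly cited reason for using complex-valued embeddings on knowledge graphs.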

翻訳日:2023-08-09 13:15:19 公開日:2023-08-08

# 多コアニューロモルフィックプロセッサのコアインタフェース最適化

Core interface optimization for multi-core neuromorphic processors ( http://arxiv.org/abs/2308.04171v1 )

ライセンス: Link先を確認

Zhe Su, Hyunjung Hwang, Tristan Torchet, Giacomo Indiveri

(参考訳) Spiking Neural Networks(SNN)のハードウェア実装は、低電力と低レイテンシを必要とし、外部クラウドベースのコンピューティングサービスに頼らないアプリケーションのためのエッジコンピューティングへの有望なアプローチである。しかし、これまで提案されたほとんどのソリューションは、比較的小さなネットワークしかサポートしていないか、大きなネットワークを実装するための重要なハードウェアリソースを取り上げている。大規模でスケーラブルなSNNを実現するためには、マルチコアアーキテクチャの設計を可能にする効率的な非同期通信およびルーティングファブリックを開発する必要がある。特に、コア間スパイク通信を管理するコアインターフェースは、特に調停アーキテクチャとルーティングメモリにおける電力性能領域(ppa)のボトルネックを表しているため、重要なコンポーネントである。本稿では,階層型アービタ木に基づく,対応する非同期符号化パイプライン回路との調停機構を提案する。提案手法は,最先端の調停アーキテクチャと比較して,スパースイベントモードでのレイテンシを70%以上削減し,面積コストを低減した。ルーティングメモリは、電流センシング完了検出(cscd)を伴う非同期コンテンツアドレス可能メモリ(cam)を使用し、約46%の省エネを実現し、構成可能な遅延線を用いて従来の非同期camに対するスループットを40%向上させる。さらに、マルチコアニューロモルフィックプロセッサのコアインタフェースリソースを劇的に削減すると同時に、我々が提案する調停アーキテクチャとCAMアーキテクチャは、幅広い一般的な非同期回路やシステムにも適用可能である。

Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency, and which cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks or take up significant hardware resources to implement large networks. To realize large-scale and scalable SNNs, it is necessary to develop an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it represents the Power-Performance-Area (PPA) bottleneck, especially for the arbitration architecture and the routing memory. In this paper, we present an arbitration mechanism with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces the latency by more than 70% in sparse-event mode, compared to the state-of-the-art arbitration architectures, with lower area cost. The routing memory makes use of asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput against conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. In addition to radically reducing the core interface resources in multi-core neuromorphic processors, the arbitration architecture and CAM architecture we propose can also be applied to a wide range of general asynchronous circuits and systems.

翻訳日:2023-08-09 13:14:51 公開日:2023-08-08

# 位置音源定位のための2重入力ニューラルネットワーク

Dual input neural networks for positional sound source localization ( http://arxiv.org/abs/2308.04169v1 )

ライセンス: Link先を確認

Eric Grinstein, Vincent W. Neo and Patrick A. Naylor

(参考訳) 多くの信号処理アプリケーションでは、メタデータを高次元信号と組み合わせて所望の出力を生成するのに有利に使用できる。従来のサウンドソースローカライゼーション(SSL)アルゴリズムでは、多くの分散マイクロホンから受信される高次元のマルチチャンネルオーディオ信号から得られる情報と、空間内のマイクロホンの座標などのシーンの音響特性を記述する情報を組み合わせて、音源の位置を推定する。本稿では,これら2つのデータ型をニューラルネットワークでモデル化するための簡易かつ効果的な手法として,dual input neural network (di-nns)を導入する。提案したDI-NNを,難易度やリアリズムの異なるシナリオで訓練・評価し,従来のLast-Squares(LS)法や,従来の畳み込みリカレントニューラルネットワーク(CRNN)法と比較する。その結果、実記録の試験データセットにおいて、di-nnがベースラインを著しく上回り、ls法より5倍低いローカライズエラーとなり、crnnより2倍低い値を示した。

In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.

翻訳日:2023-08-09 13:14:21 公開日:2023-08-08

# EFaR 2023: 効率的な顔認識コンペティション

EFaR 2023: Efficient Face Recognition Competition ( http://arxiv.org/abs/2308.04168v1 )

ライセンス: Link先を確認

Jan Niklas Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, Naser Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, Sajid Javed, Naoufel Werghi, Klemen Grm, Vitomir Štruc, Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Josef Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sébastien Marcel, Iurii Medvedev, Bo Jin, Diogo Nunes, Ahmad Hassanpour, Pankaj Khatiwada, Aafan Ahmad Toor, Bian Yang

(参考訳) 本稿では,2023年の国際生体認証合同会議(ijcb 2023)で開かれた,効率的な顔認識コンペティション(efar)の概要を紹介する。この大会は6つの異なるチームから17の応募を受けた。効率的な顔認識モデルのさらなる発展を促進するため、提案したソリューションは、様々なベンチマークで達成された検証精度の重み付けスコアと、浮動小数点演算数とモデルサイズによるデプロイ可能性に基づいてランク付けされる。提案の評価はバイアス、クロス品質、大規模認識ベンチマークに拡張される。本稿では,提案したソリューションの性能評価結果の概要と,多様なベースラインのセットについて概説する。提出されたソリューションは、計算コストを削減するために小さく効率的なネットワークアーキテクチャを使用し、いくつかのソリューションはモデル量子化を適用する。現在のソリューションで不足している可能性のある技術についても,その展望が述べられている。

This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some solutions apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.

翻訳日:2023-08-09 13:13:59 公開日:2023-08-08

# 連続変数ベースの量子位置検証プロトコルのセキュリティ

Security of a Continuous-Variable based Quantum Position Verification Protocol ( http://arxiv.org/abs/2308.04166v1 )

ライセンス: Link先を確認

Rene Allerstorfer, Llorenç Escolà-Farràs, Arpan Akash Ray, Boris Škorić, Florian Speelman, Philip Verduyn Lunel

(参考訳) 本研究では,連続可変量子状態を用いた量子位置検証について検討する。既存の離散プロトコルとは対照的に,コヒーレントな状態とその特性を利用するプロトコルを提示・分析する。離散可変フォトニック状態と比較して、コヒーレント状態は、現在の技術で効率的に調製および操作できるため、実用的な利点がある。我々は,量子チャネル内の雑音が一定のしきい値以下である限り,敵は正しい応答について正直な証明者よりも不確実性が高いことを示すため,エントロピーな不確実性関係を通じて,絡み合っていない攻撃者に対するプロトコルのセキュリティを証明した。さらに,eprペアを1つだけ共有する攻撃者がプロトコルを破ることができることを示す。

In this work we study quantum position verification with continuous-variable quantum states. In contrast to existing discrete protocols, we present and analyze a protocol that utilizes coherent states and their properties. Compared to discrete-variable photonic states, coherent states offer practical advantages since they can be efficiently prepared and manipulated with current technology. We prove security of the protocol against any unentangled attackers via entropic uncertainty relations, showing that the adversary has more uncertainty than the honest prover about the correct response as long as the noise in the quantum channel is below a certain threshold. Additionally, we show that attackers who pre-share one continuous-variable EPR pair can break the protocol.

翻訳日:2023-08-09 13:13:47 公開日:2023-08-08

# KNNベースのLASSOによる領域分位点の変動係数モデルと健康アウトカム研究への応用

Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study ( http://arxiv.org/abs/2308.04212v1 )

ライセンス: Link先を確認

Seyoung Park, Eun Ryung Lee, Hyokyoung G. Hong

(参考訳) 身体の質量指数やコレステロール濃度などの健康影響は年齢に依存し、関連する危険因子に様々な影響を与えることが知られている。本稿では,k-nearest neighbors (knn) fused lasso を用いた変分共効率(vc)地域分位回帰を用いた,健康成果とリスク要因の関係を動的にモデル化する新しい枠組みを提案する。提案手法は,厳密な推定誤差バウンドと,特定の正規性条件下で正確なクラスターパターンを検出する能力を含む,強い理論的特性を有する。結果の最適化問題を効率的に解くために,乗算器アルゴリズムの交互方向法(ADMM)を開発した。本研究は,健康成果とリスク因子の複雑な年齢依存関係を捉えるための提案手法の有効性を実証する。

Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.
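
Two building blocks commonly appear when quantile regression with a Lasso-type penalty is solved by ADMM: the quantile check (pinball) loss and the soft-thresholding operator, which is the proximal map of the L1 norm. The sketch below shows both in plain Python. It is a generic illustration under those assumptions, not the authors' KNN fused Lasso algorithm, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0]) for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def soft_threshold(value, lam):
    """Proximal map of lam * |x|: shrink value toward zero by lam."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# For tau = 0.75, under-prediction (positive residual) is penalised three
# times as heavily as over-prediction of the same size.
loss_under = check_loss(2.0, 0.75)
loss_over = check_loss(-2.0, 0.75)
```

In a fused-Lasso setting the soft-thresholding step would typically be applied to differences between neighbouring coefficients, which is what encourages clustered (piecewise-constant) patterns.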

翻訳日:2023-08-09 13:08:00 公開日:2023-08-08

# x線マイクロスペクトロスコピーによる材料化学状態のロバスト検索

Robust retrieval of material chemical states in X-ray microspectroscopy ( http://arxiv.org/abs/2308.04207v1 )

ライセンス: Link先を確認

Ting Wang, Xiaotong Wu, Jizhou Li, Chao Wang

(参考訳) x線マイクロスペクトロスコープ技術は、材料の形態的および化学的変化を研究するために必須であり、高分解能な構造と分光情報を提供する。しかし、化学状態の確実な回収のための実用的なデータ分析は、多くの研究分野における材料の基本的理解を加速させる大きな障害である。本研究では、x線マイクロスペクトロスコピーのための新しいデータ定式化モデルを提案し、ノイズやスペクトル変動に頑健な、この問題を解決するための専用非混合フレームワークを開発した。さらに、この枠組みは二状態物質化学の分析に限らず、従来および広く用いられている手法の代替として有効である。また、より効率的に解を得るために、証明可能な収束を伴う代替方向乗算法が適用される。提案手法は,低信号対雑音比や重なり合うスペクトル特徴といった困難な条件下でも,複雑な試料や異種試料の化学状態を正確に同定し特徴付けることができる。シミュレーションおよび実データに対する大規模な実験結果は、その有効性と信頼性を示している。

X-ray microspectroscopic techniques are essential for studying morphological and chemical changes in materials, providing high-resolution structural and spectroscopic information. However, the practical data analysis needed to reliably retrieve chemical states remains a major obstacle to accelerating the fundamental understanding of materials in many research fields. In this work, we propose a novel data formulation model for X-ray microspectroscopy and develop a dedicated unmixing framework to solve this problem, which is robust to noise and spectral variability. Moreover, this framework is not limited to the analysis of two-state material chemistry, making it an effective alternative to conventional and widely used methods. In addition, an alternative directional multiplier method with provable convergence is applied to obtain the solution efficiently. Our framework can accurately identify and characterize chemical states in complex and heterogeneous samples, even under challenging conditions such as low signal-to-noise ratios and overlapping spectral features. Extensive experimental results on simulated and real datasets demonstrate its effectiveness and reliability.

翻訳日:2023-08-09 13:07:42 公開日:2023-08-08

# オープンワールドインスタンスセグメンテーションのためのトランスフォーマーの探索

Exploring Transformers for Open-world Instance Segmentation ( http://arxiv.org/abs/2308.04206v1 )

ライセンス: Link先を確認

Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo

(参考訳) オープンワールドのインスタンスセグメンテーションは、少数のベースカテゴリオブジェクトから学習することで、イメージ内のすべてのオブジェクトをセグメンテーションすることを目的としている。目に見えないカテゴリの数は、見られているカテゴリの何百倍も大きい可能性があるため、このタスクは困難である。近年、DETRのようなモデルがクローズドな世界で広く研究され、オープンな世界では探索されていない。本稿では,Transformerを用いてオープンワールドのインスタンスセグメンテーションとSWORDを提案する。まず,分類ヘッドの前にストップグレード操作をアタッチし,さらに新たなオブジェクト発見のためのiouヘッドを追加する。単純なストップグレード操作は,新しいオブジェクトが背景として抑制されるのを防ぐだけでなく,ヒューリスティックラベル割り当てのメリットをネットワークが享受できることを示す。次に,オブジェクトと背景の表現を拡大するための新しいコントラスト学習フレームワークを提案する。具体的には,オブジェクトセンタを得るためにユニバーサルオブジェクトキューを維持し,オブジェクトクエリから正と負のサンプルを動的に選択して対比学習を行う。本研究は, 平均リコールと平均精度の無視にのみ焦点をあてるものであるが, いずれの基準も考慮し, SWORDの優位性を示す。我々のモデルは、様々なオープンワールドのクロスカテゴリやクロスデータセットの一般化において最先端のパフォーマンスを達成する。特にVOC以外のシステムでは,ARb100では40.0%,ARm100では34.9%の新たな技術結果が得られた。 COCO と UVO の一般化では、SWORD はAPm では5.9%、ARm100 では8.1% で過去最高のオープンワールドモデルを上回っている。

Open-world instance segmentation is a rising task, which aims to segment all objects in the image by learning from a limited number of base-category objects. This task is challenging, as the number of unseen categories could be hundreds of times larger than that of seen categories. Recently, DETR-like models have been extensively studied in the closed world while remaining unexplored in the open world. In this paper, we utilize the Transformer for open-world instance segmentation and present SWORD. Firstly, we attach a stop-gradient operation before the classification head and further add IoU heads for discovering novel objects. We demonstrate that a simple stop-gradient operation not only prevents the novel objects from being suppressed as background, but also allows the network to enjoy the merit of heuristic label assignment. Secondly, we propose a novel contrastive learning framework to enlarge the representations between objects and background. Specifically, we maintain a universal object queue to obtain the object center, and dynamically select positive and negative samples from the object queries for contrastive learning. While previous works focus only on pursuing average recall and neglect average precision, we show the prominence of SWORD by giving consideration to both criteria. Our models achieve state-of-the-art performance in various open-world cross-category and cross-dataset generalizations. Particularly, in the VOC to non-VOC setup, our method sets new state-of-the-art results of 40.0% on ARb100 and 34.9% on ARm100. For COCO to UVO generalization, SWORD significantly outperforms the previous best open-world model by 5.9% on APm and 8.1% on ARm100.

翻訳日:2023-08-09 13:07:26 公開日:2023-08-08

# 量子力学系の隠れテンソル構造:単一粒子量子計算を目指して

Hidden tensor structures of any quantum mechanical system: Towards single-particle quantum computation ( http://arxiv.org/abs/2308.04202v1 )

ライセンス: Link先を確認

Marek Czachor

(参考訳) 量子情報処理の標準的なアーキテクチャはボトムアップ設計に基づいている: 1桁の1粒子システムから始まり、マルチ桁の量子レジスタは1つの量子桁のテンソル積によって数学的にモデル化されたマルチ粒子構成を要求する。ここでは、量子情報処理の単一粒子トップダウン設計を可能にする隠れテンソル構造を、任意の単一量子システムが自動的に備えていることを示す。隠れテンソル構造は、単一の1次元調和振動子のように単純な量子系を任意の数のサブシステムに分解できることを意味する。結果として生じる構造は、量子計算、ベルの不等式違反、普遍量子ゲートの定式化を可能にするのに十分なリッチである。原則として、単一粒子量子コンピュータは可能である。さらに、これらの隠れた構造は、ブラント・グリーンバーグによる生成消滅作用素のマルチボゾン表現のような、いくつかのよく知られた理論構成のルーツであり、高次または分数次スクイージングの文脈で集中的に研究されていることが示されている。事実上、文献から知られているかなり退屈な標準的な証明は、文字通り1行に単純化することができる。一般的な構成は具体例で示される。

Standard architecture of quantum information processing is based on bottom-up design: One begins with a one-digit one-particle system, while multi-digit quantum registers demand multi-particle configurations, mathematically modeled by tensor products of single quantum digits. Here we show that any single quantum system is automatically equipped with hidden tensor structures that allow for single-particle top-down designs of quantum information processing. Hidden tensor structures imply that any quantum system, even as simple as a single one-dimensional harmonic oscillator, can be decomposed into an arbitrary number of subsystems. The resulting structure is rich enough to enable quantum computation, violation of Bell's inequalities, and formulation of universal quantum gates. In principle, a single-particle quantum computer is possible. Moreover, it is shown that these hidden structures are at the roots of some well known theoretical constructions, such as the Brandt-Greenberg multi-boson representation of creation-annihilation operators, intensively investigated in the context of higher-order or fractional-order squeezing. In effect, certain rather tedious standard proofs known from the literature can be simplified to literally one line. The general construction is illustrated by concrete examples.

翻訳日:2023-08-09 13:07:00 公開日:2023-08-08

# アインシュタインの量子リドルの観点から見たハイゼンベルクの量子力学の百年次再評価

A centennial reappraisal of Heisenberg's Quantum Mechanics with a perspective on Einstein's Quantum Riddle ( http://arxiv.org/abs/2308.04199v1 )

ライセンス: Link先を確認

Tuck C. Choy

(参考訳) ハイゼンベルクは1925年7月に発表した論文で、ボルン、ヨルダン、ハイゼンベルク、そしてディラック(1925年から1927年まで)によるその後の論文を通じて量子力学の発展を推し進めた。本稿では,新しい視点について考察する。 (i)彼の発見の直観を導くものは何か (ii)ボルン=ヨルダン=ハイゼンベルク正準量子化規則の起源この点から、アインシュタインの量子リドル (Lande 1974, Sommerfeld1918, Born1926) についての洞察と、ハイゼンベルクの量子力学の過去100年後に何が起こるのかを垣間見ることができる。

Heisenberg's breakthrough in his July 1925 paper that set in motion the development of Quantum Mechanics through subsequent papers by Born, Jordan, Heisenberg and also Dirac (from 1925 to 1927) is reexamined through a modern lens. In this paper, we shall discuss some new perspectives on (i) what could be the guiding intuitions for his discoveries and (ii) the origin of the Born-Jordan-Heisenberg canonical quantization rule. From this vantage point we may get an insight into Einstein's Quantum Riddle (Lande 1974, Sommerfeld 1918, Born 1926) and a possible glimpse of what might come next after the last 100 years of Heisenberg's quantum mechanics.

翻訳日:2023-08-09 13:06:38 公開日:2023-08-08

# D3G: Glanceアノテーションを用いた時間文接地のためのガウス先行探索

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation ( http://arxiv.org/abs/2308.04197v1 )

ライセンス: Link先を確認

Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun

(参考訳) Temporal sentence grounding (TSG) は、与えられた自然言語クエリに対応する特定のモーメントを未トリミングビデオから見つけることを目的としている。近年でも、弱教師付き手法は完全教師付き手法と比べて大きな性能差があり、一方で後者は手間のかかるタイムスタンプアノテーションを必要とする。本研究では、完全教師付き手法と比べて競争力のある性能を保ちながら、TSGタスクのアノテーションコストを削減することを目指す。この目的のために、クエリごとに単一フレームのアノテーション(glanceアノテーションと呼ぶ)のみを必要とする、最近提案されたglance教師付きTSGタスクを検討する。この設定のもとで、Semantic Alignment Group Contrastive Learningモジュール(SA-GCL)とDynamic Gaussian prior Adjustmentモジュール(DGA)からなる、glanceアノテーションを用いた動的ガウス事前分布ベースのグラウンディングフレームワーク(D3G)を提案する。具体的には、SA-GCLはガウス事前分布と意味的一貫性を併用して2次元時間マップから信頼できる正例モーメントをサンプリングし、共同埋め込み空間における正の文-モーメント対の整合に寄与する。さらに、glanceアノテーションに起因するアノテーションバイアスを軽減し、複数のイベントからなる複雑なクエリをモデル化するために、分布を動的に調整してターゲットモーメントの正解に近づけるDGAモジュールを提案する。3つの挑戦的なベンチマークでの広範な実験により、提案するD3Gの有効性を検証した。D3Gは最先端の弱教師付き手法を大きく上回り、完全教師付き手法との性能差を縮める。コードはhttps://github.com/solicucu/D3Gで入手できる。

Temporal sentence grounding (TSG) aims to locate a specific moment from an untrimmed video with a given natural language query. Recently, weakly supervised methods still have a large performance gap compared to fully supervised ones, while the latter requires laborious timestamp annotations. In this study, we aim to reduce the annotation cost yet keep competitive performance for TSG task compared to fully supervised ones. To achieve this goal, we investigate a recently proposed glance-supervised temporal sentence grounding task, which requires only single frame annotation (referred to as glance annotation) for each query. Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA). Specifically, SA-GCL samples reliable positive moments from a 2D temporal map via jointly leveraging Gaussian prior and semantic consistency, which contributes to aligning the positive sentence-moment pairs in the joint embedding space. Moreover, to alleviate the annotation bias resulting from glance annotation and model complex queries consisting of multiple events, we propose the DGA module, which adjusts the distribution dynamically to approximate the ground truth of target moments. Extensive experiments on three challenging benchmarks verify the effectiveness of the proposed D3G. It outperforms the state-of-the-art weakly supervised methods by a large margin and narrows the performance gap compared to fully supervised methods. Code is available at https://github.com/solicucu/D3G.
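
To make the idea of a Gaussian prior centred on a glance annotation concrete, the following sketch assigns each frame a weight that decays with its distance from the annotated frame. The function name, the fixed `sigma`, and the frame counts are assumptions for illustration only. D3G itself samples moments from a 2D temporal map and adjusts the distribution dynamically, which this sketch does not attempt.

```python
import math


def gaussian_prior_weights(num_frames, glance_index, sigma):
    """Unnormalised Gaussian weights over frames, peaked at the glance frame."""
    if num_frames <= 0 or not 0 <= glance_index < num_frames:
        raise ValueError("glance_index must address one of the frames")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [
        math.exp(-((t - glance_index) ** 2) / (2.0 * sigma ** 2))
        for t in range(num_frames)
    ]


# Hypothetical clip of 9 frames with the glance annotation on frame 4.
weights = gaussian_prior_weights(num_frames=9, glance_index=4, sigma=2.0)
```

Frames near the glance receive weights close to 1 and are therefore more plausible positives; a larger `sigma` spreads that belief over a longer moment.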

翻訳日:2023-08-09 13:06:25 公開日:2023-08-08

# GHZ状態測定を用いた高い光子損失しきい値の量子コンピューティング

High photon-loss threshold quantum computing using GHZ-state measurements ( http://arxiv.org/abs/2308.04192v1 )

ライセンス: Link先を確認

Brendan Pankovich, Angus Kan, Kwok Ho Wan, Maike Ostmann, Alex Neville, Srikrishna Omkar, Adel Sohbi and Kamil Brádler

(参考訳) 本稿では、一定サイズのエンタングルした資源状態に対してGreenberger-Horne-Zeilinger (GHZ) 基底での射影測定を行うことに基づく、フォールトトレラントなアーキテクチャを提案する。これらのアーキテクチャの線形光学による構成を示し、そこではGHZ状態測定を符号化することで、光子損失と線形光学の確率的性質に起因する誤りを抑制する。シミュレーションにより、一定サイズの資源状態上での符号化された2量子ビット融合測定によって実現される最先端の線形光学アーキテクチャと比較して、高い単一光子損失しきい値が示された。この結果は、フォトニックなフォールトトレラント量子コンピューティングを実現するための資源効率の良い道筋を示すものと考えている。

We propose fault-tolerant architectures based on performing projective measurements in the Greenberger-Horne-Zeilinger (GHZ) basis on constant-sized, entangled resource states. We present linear-optical constructions of the architectures, where the GHZ-state measurements are encoded to suppress the errors induced by photon loss and the probabilistic nature of linear optics. Simulations of our constructions demonstrate high single-photon loss thresholds compared to the state-of-the-art linear-optical architecture realized with encoded two-qubit fusion measurements performed on constant-sized resource states. We believe this result shows a resource-efficient path to achieving photonic fault-tolerant quantum computing.

翻訳日:2023-08-09 13:05:54 公開日:2023-08-08

# ディープクロススケールパッチマッチによる画像コピームーブ偽造検出

Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch ( http://arxiv.org/abs/2308.04188v1 )

ライセンス: Link先を確認

Yingjie He, Yuanman Li, Changsheng Chen and Xia Li

(参考訳) 最近開発された深層アルゴリズムは,イメージコピーモーブ偽造検出(cmfd)の分野で有望な進歩を遂げている。しかし、訓練画像やクローンされた領域にコピーモブオブジェクトが存在しない場合、いくつかの実用的なシナリオでは一般化性は限られている。以上の課題に対処するため,本研究では,従来の手法と深層手法を融合した新しいエンドツーエンドCMFDフレームワークを提案する。具体的には、コピー-ムーブ領域をローカライズするCMFDに適した、ディープクロススケールパッチマッチ手法を設計する。既存の深層モデルとは対照的に,高分解能スケールから抽出した特徴を用いて,ソースとターゲット領域間の明確かつ信頼性の高いポイント・ツー・ポイントマッチングを求める。さらに、ソース/ターゲット分離のための操作領域位置分岐を開発する。提案したCMFDフレームワークは完全に差別化可能であり、エンドツーエンドでトレーニングすることができる。提案手法は,本手法をコピー・ムーブの異なる内容に対して高い一般化性を示し,提案手法は既存手法よりも優れた性能を実現する。

The recently developed deep algorithms achieve promising progress in the field of image copy-move forgery detection (CMFD). However, they have limited generalizability in some practical scenarios, where the copy-move objects may not appear in the training images or cloned regions are from the background. To address the above issues, in this work, we propose a novel end-to-end CMFD framework by integrating merits from both conventional and deep methods. Specifically, we design a deep cross-scale patchmatch method tailored for CMFD to localize copy-move regions. In contrast to existing deep models, our scheme aims to seek explicit and reliable point-to-point matching between source and target regions using features extracted from high-resolution scales. Further, we develop a manipulation region location branch for source/target separation. The proposed CMFD framework is completely differentiable and can be trained in an end-to-end manner. Extensive experimental results demonstrate the high generalizability of our method to different copy-move contents, and the proposed scheme achieves significantly better performance than existing approaches.

翻訳日:2023-08-09 13:05:46 公開日:2023-08-08

# 何に理由を加えるか? 日常的な説明の分析

Adding Why to What? Analyses of an Everyday Explanation ( http://arxiv.org/abs/2308.04187v1 )

ライセンス: Link先を確認

Lutz Terfloth, Michael Schaffer, Heike M. Buhl, Carsten Schulte

(参考訳) xaiでは、専門的なオーディエンスのための説明とは対照的に、在職者について説明するとき、共通の専門知識を想定できないと考えることが重要である。しかし、人間間の説明は大きく異なるため、説明の共通性の研究は困難である。技術哲学的なアプローチである双対自然理論を使って、これらの課題に対処しました。アーキテクチャ(例えば、アルゴリズムのロジック)や関連性(例えば、決定の重大さ、レコメンデーションの意味)に焦点を当てることによって、XAIの2つの性質に対処することで、XAIの決定を説明することができる。本理論を分析的枠組みとして20種類のゲーム説明を検討した。我々は、この理論を使って、技術的アーティファクトの説明を素早く構造化し、比較した。ビデオリコールの結果から説明内容を分析した結果を補足し,説明者による説明の正当性について検討した。説明者はまずゲームの物理的側面(アーキテクチャ)に注目し、その後にのみ関連性の側面に注目することを発見した。ビデオのリコールでは、EXがアーキテクチャに重点を置くことは、より複雑で無形な側面にフォーカスする前に、まずは基本的なコンポーネントを説明することによって、説明を構築する上で重要であると見なされた。両者の対応の切り替えは、説明の目標、新たな誤解、説明者の知識ニーズによって正当化された。我々は,今後の研究課題を喚起するいくつかの共通点を発見し,さらに一般化すれば,合成説明の構成に第一のアイデアを与える。

In XAI it is important to consider that, in contrast to explanations for professional audiences, one cannot assume common expertise when explaining for laypeople. But such explanations between humans vary greatly, making it difficult to research commonalities across explanations. We used the dual nature theory, a techno-philosophical approach, to cope with these challenges. According to it, one can explain, for example, an XAI's decision by addressing its dual nature: by focusing on the Architecture (e.g., the logic of its algorithms) or the Relevance (e.g., the severity of a decision, the implications of a recommendation). We investigated 20 game explanations using the theory as an analytical framework. We elaborate how we used the theory to quickly structure and compare explanations of technological artifacts. We supplemented results from analyzing the explanation contents with results from a video recall to explore how explainers justified their explanation. We found that explainers focused on the physical aspects of the game first (Architecture) and only later on aspects of the Relevance. Reasoning in the video recalls indicated that the explainers regarded the focus on the Architecture as important for structuring the explanation initially by explaining the basic components before focusing on more complex, intangible aspects. Shifting between addressing the two sides was justified by explanation goals, emerging misunderstandings, and the knowledge needs of the explainee. We discovered several commonalities that inspire future research questions which, if further generalizable, provide first ideas for the construction of synthetic explanations.

翻訳日:2023-08-09 13:05:27 公開日:2023-08-08

# セキュアコード回帰のための反復スケッチ

Iterative Sketching for Secure Coded Regression ( http://arxiv.org/abs/2308.04185v1 )

ライセンス: Link先を確認

Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III

(参考訳) 本研究では、安全性を確保しつつ、線形回帰を分散的に高速化する手法を提案する。ランダム化スケッチ技術を活用し、非同期システムにおけるストラグラー耐性を改善する。具体的には、ランダムな正規直交行列を適用した後にブロックをサブサンプリングすることで、情報の保護と回帰問題の次元削減を同時に行う。我々の設定では、この変換は近似勾配符号化方式における符号化された暗号化に対応し、サブサンプリングは集中型符号化計算ネットワークにおける非ストラグラーのワーカーの応答に対応する。これにより、$\ell_2$部分空間埋め込みのための分散的な反復スケッチ手法が得られる。すなわち、各イテレーションで新しいスケッチが用いられる。また、Subsampled Randomized Hadamard Transformの特別な場合に注目し、これをブロックサンプリングに一般化するとともに、データを保護するためにどのように修正できるかを議論する。

In this work, we propose methods for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers in a centralized coded computing network. This results in a distributive iterative sketching approach for an $\ell_2$-subspace embedding, i.e., a new sketch is considered at each iteration. We also focus on the special case of the Subsampled Randomized Hadamard Transform, which we generalize to block sampling, and discuss how it can be modified in order to secure the data.
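
The Subsampled Randomized Hadamard Transform mentioned above can be sketched for a single vector as: flip signs at random, apply an orthonormal Walsh-Hadamard transform, then keep a rescaled random subset of coordinates. The code below is a minimal textbook-style illustration in plain Python with hypothetical function names. It samples individual coordinates rather than blocks and does not include the paper's security modification or the distributed worker setup.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def srht_sketch(x, m, rng):
    """Random signs, Hadamard mixing, then m rescaled coordinates without replacement."""
    n = len(x)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= len(x)")
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    mixed = fwht([s * v for s, v in zip(signs, x)])
    rows = rng.sample(range(n), m)
    scale = math.sqrt(n / m)
    return [scale * mixed[r] for r in rows]


rng = random.Random(0)  # fixed seed so the example is repeatable
x = [3.0, -1.0, 2.0, 0.5, 0.0, 4.0, -2.5, 1.0]
full = srht_sketch(x, len(x), rng)  # m = n: a signed, permuted rotation of x
small = srht_sketch(x, 4, rng)      # m < n: a 4-dimensional sketch
```

Drawing fresh signs and rows on every call mirrors the iterative idea that a new sketch is used at each iteration. With `m` equal to the input length the squared norm is preserved exactly up to rounding, whereas for smaller `m` it is preserved only in expectation.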

翻訳日:2023-08-09 13:05:01 公開日:2023-08-08

# キラル例外点によって設計されたプラズモニック共鳴のコヒーレント光-物質相互作用の増強と室温量子収率

Enhanced coherent light-matter interaction and room-temperature quantum yield of plasmonic resonances engineered by a chiral exceptional point ( http://arxiv.org/abs/2308.04239v1 )

ライセンス: Link先を確認

Yuwei Lu, Haoxiang Jiang, Renming Liu

(参考訳) プラズモニック共鳴の強い消散は量子操作に有害である。量子コヒーレンスを高めるために,光磁場の位相が量子状態を柔軟に操作する新しい自由度を提供するキラル例外点(CEP)で作動するフォトニックキャビティを統合することにより,プラズモン共鳴の局所状態密度(LDOS)を調整することを提案する。量子化数モード理論を用いて,提案するハイブリッドキャビティのldosが,cepを伴わない通常のプラズモニック・フォトニックキャビティと比較して最大8倍のエンハンスメントとマグニチュード・オブ・マグニチュード・ライン幅の狭さを伴うサブロレンツ型に進化できることを明らかにした。これにより、偏光状態の散逸が減少すると共にコヒーレントな光-物質相互作用が強化される。さらに,cepにおける量子収率の大幅な向上,ファノ干渉によるプラズモニック吸収の低減,スーパー散乱によるキャビティ放射の増大の2つのメカニズムを明らかにするために,固有モード分解に基づく散乱理論が存在する。また,cepにおける高量子収率は,量子エミッタの蛍光寿命を測定することで,cepにおける拡張ldosの実験的検証に有用であることがわかった。そこで本研究では,CEPを用いた環境下でのプラズマ共鳴が,オープン光共振器の非ハーモニティ性を利用して量子状態制御を探索し,センサ,分光,量子情報処理,量子コンピューティングのための高性能な量子デバイスを構築する上で,有望なプラットフォームとなることを示す。

Strong dissipation of plasmonic resonances is detrimental to quantum manipulation. To enhance the quantum coherence, we propose to tailor the local density of states (LDOS) of plasmonic resonances by integrating them with a photonic cavity operating at a chiral exceptional point (CEP), where the phase of the light field offers a new degree of freedom to flexibly manipulate the quantum states. A quantized few-mode theory is employed to reveal that the LDOS of the proposed hybrid cavity can evolve into a sub-Lorentzian lineshape, with order-of-magnitude linewidth narrowing and, additionally, a maximum of eightfold enhancement compared to the usual plasmonic-photonic cavity without CEP. This results in enhanced coherent light-matter interaction accompanied by reduced dissipation of polaritonic states. Furthermore, a scattering theory based on eigenmode decomposition is presented to elucidate two mechanisms responsible for the significant improvement of quantum yield at the CEP: the reduction of plasmonic absorption by Fano interference and the enhancement of cavity radiation through superscattering. Importantly, we find that the latter allows a near-unity quantum yield to be achieved at room temperature; in return, a high quantum yield helps to experimentally verify the enhanced LDOS at the CEP by measuring the fluorescence lifetime of a quantum emitter. Therefore, our work demonstrates that plasmonic resonances in a CEP-engineered environment can serve as a promising platform for exploring quantum-state control by virtue of the non-Hermiticity of open optical resonators and for building high-performance quantum devices for sensing, spectroscopy, quantum information processing and quantum computing.

翻訳日:2023-08-09 12:57:13 公開日:2023-08-08

# コンフォーマル予測による無線チャネル上での信頼性の高い不確実性定量化を伴う連合推論

Federated Inference with Reliable Uncertainty Quantification over Wireless Channels via Conformal Prediction ( http://arxiv.org/abs/2308.04237v1 )

ライセンス: Link先を確認

Meiyi Zhu, Matteo Zecchin, Sangwoo Park, Caili Guo, Chunyan Feng, Osvaldo Simeone

(参考訳) デバイスとサーバが事前訓練されたモデルを共有する設定を考える。サーバはモデルが与えられたら、新しい入力を推論したい。デバイスは、以前トレーニングに使用されていなかったデータにアクセスでき、共通の無線チャネルを介してサーバと通信することができる。デバイスが新しい入力にアクセスできない場合、デバイスからサーバへの通信は、サーバにおける推論決定の質を高めることができるのか? 最近の研究では、デバイス間通信を利用してサーバの決定の信頼性を向上させるfederated conformal prediction(cp)が導入されている。連合CPでは、デバイスがローカルデータ上で共有事前学習モデルによって得られた損失に関するサーバ情報と通信し、サーバは、この情報を利用して決定間隔や設定を校正し、予め定義された目標信頼性レベルに正しい回答を含むことが保証される。以前の作業ではノイズのない通信を想定しており、デバイスは1つの実数をサーバに通信できる。本稿では,無線環境下での初となるフェデレーションCPについて検討する。本稿では,タイプベース多重アクセス(TBMA)と新しい量子補正戦略に基づく新しいプロトコルWFCPを提案する。 WFCPは、サーバが生成した予測セットのカバレッジに関して、正式な信頼性を保証することが証明されている。計算結果を用いて、既存の連合CP方式のデジタル実装に対するWFCPの顕著なアドバンテージを、特に限られた通信資源や多数のデバイスで示している。

Consider a setting in which devices and a server share a pre-trained model. The server wishes to make an inference on a new input given the model. Devices have access to data, previously not used for training, and can communicate to the server over a common wireless channel. If the devices have no access to the new input, can communication from devices to the server enhance the quality of the inference decision at the server? Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. In this paper, we study for the first time federated CP in a wireless setting. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or large number of devices.
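
The calibration step behind conformal prediction can be sketched as follows: collect the losses (nonconformity scores) the shared model incurs on held-out data, take a finite-sample-corrected quantile as a threshold, and return every label whose score stays below it. The sketch assumes an idealised setting in which the server receives all device scores exactly. It is standard split conformal prediction with hypothetical names and data, not the WFCP protocol, and it models neither the wireless channel nor TBMA.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    if n == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need calibration scores and 0 < alpha < 1")
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(label_probs, threshold):
    """Keep labels whose score 1 - p(label) does not exceed the threshold."""
    return {label for label, p in label_probs.items() if 1.0 - p <= threshold}


# Hypothetical scores held by three devices, pooled at the server.
device_scores = [[0.05, 0.3, 0.9], [0.1, 0.4], [0.2, 0.6]]
pooled = [s for scores in device_scores for s in scores]

threshold = conformal_threshold(pooled, alpha=0.25)
labels = prediction_set({"cat": 0.7, "dog": 0.25, "bird": 0.05}, threshold)
```

Under exchangeability of calibration and test data, a set built this way contains the true label with probability at least `1 - alpha`. The difficulty the abstract addresses is keeping such a guarantee when the scores reach the server only through a noisy shared channel.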

翻訳日:2023-08-09 12:56:38 公開日:2023-08-08

# 合成子レースデータにおけるGANを用いた画像間変換の比較検討

A Comparative Study of Image-to-Image Translation Using GANs for Synthetic Child Race Data ( http://arxiv.org/abs/2308.04232v1 )

ライセンス: Link先を確認

Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

(参考訳) データにおける民族多様性の欠如は、文献における顔認識技術の限界要因となっている。これは、データサンプルが不足している子供に特に当てはまり、成人データに基づいて訓練されたマシンビジョンアルゴリズムを子供に適応させようとする際の課題である。本研究では,画像から画像への変換を利用して異なる人種のデータを合成し,児童の顔データの民族性を調整することを提案する。ピク2ピク、サイクガン、カットネットワークという3つの異なる画像から画像へのニューラルネットワーク手法を比較し、コーカサス的児童データとアジアの児童データ変換を実装した。画像から画像への変換手法を用いて、幅広い民族多様性を持つ様々な合成子データサンプルを作成することが可能であることを示す。

The lack of ethnic diversity in data has been a limiting factor of face recognition techniques in the literature. This is particularly the case for children, for whom data samples are scarce, which presents a challenge when adapting machine vision algorithms trained on adult data to work on children. This work proposes using image-to-image transformation to synthesize data of different races and thus adjust the ethnicity of children's face data. We consider ethnicity as a style and compare three image-to-image neural network methods, specifically pix2pix, CycleGAN, and CUT, to implement conversion between Caucasian child data and Asian child data. Experimental validation results on synthetic data demonstrate the feasibility of using image-to-image transformation methods to generate various synthetic child data samples with broader ethnic diversity.

翻訳日:2023-08-09 12:56:10 公開日:2023-08-08

# opinionconv: 接頭辞を持つ会話型製品検索

OpinionConv: Conversational Product Search with Grounded Opinions ( http://arxiv.org/abs/2308.04226v1 )

ライセンス: Link先を確認

Vahid Sadiri Javadi, Martin Potthast, Lucie Flek

(参考訳) 製品を探すとき、他人の意見はインフォームドな意思決定において重要な役割を果たす。製品に関する主観的な経験は貴重な情報源になり得る。これはまた、顧客とセールスアシスタントが製品に関する事実や意見を交換する販売会話においても当てはまる。しかし、そのような会話のためにAIを訓練することは、言語モデルが実世界の経験の欠如に対して真の意見を持っていないという事実によって複雑である。製品レビューを製品意見の豊富な情報源として活用し、真に主観的な物語の中で対話型AIを基礎にすることでこの問題に対処する。 OpinionConvでは,営業会話をシミュレートする最初の対話型AIを開発した。生成した会話を検証するために,生成した意見が現実的であると認識されることを示すユーザスタディを複数実施する。また, 意思決定の根拠として, 意見の重要性も確認した。

When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations is complicated by the fact that language models do not possess authentic opinions because they lack real-world experience. We address this problem by leveraging product reviews as a rich source of product opinions to ground conversational AI in true subjective narratives. With OpinionConv, we develop the first conversational AI for simulating sales conversations. To validate the generated conversations, we conduct several user studies showing that the generated opinions are perceived as realistic. Our assessors also confirm the importance of opinions as an informative basis for decision-making.

翻訳日:2023-08-09 12:55:54 公開日:2023-08-08

# Doorbellのカメラは年を重ねるにつれて認識されるのか?

Will your Doorbell Camera still recognize you as you grow old ( http://arxiv.org/abs/2308.04224v1 )

ライセンス: Link先を確認

Wang Yao, Muhammad Ali Farooq, Joseph Lemley and Peter Corcoran

(参考訳) ドアベルカメラのような低消費電力の消費者向けデバイスに対するロバスト認証は、価値がありユニークな課題である。本研究は,顔認証法の性能に及ぼす年齢と加齢の影響を考察する。 AgeDBとMorph-IIの2つの公開年齢データセットがこの作業のベースラインとして使用されている。様々な年齢効果を持つ高品質な顔画像の集合を拡大するために、フォトリアリスティックな年齢変換法が用いられている。そして、これらの合成老化データが高速深層学習に基づく顔認識モデルに与える影響を、受信者動作特性(ROC)曲線や一致スコア分布を含む様々な指標を用いて定量化する。実験結果から, 顔認証手法の長期化は依然として重要な課題であることが明らかとなった。

Robust authentication for low-power consumer devices such as doorbell cameras poses a valuable and unique challenge. This work explores the effect of age and aging on the performance of facial authentication methods. Two public age datasets, AgeDB and Morph-II, have been used as baselines in this work. A photo-realistic age transformation method has been employed to augment a set of high-quality facial images with various age effects. The effect of these synthetic aging data on a high-performance deep-learning-based face recognition model is then quantified using various metrics, including Receiver Operating Characteristic (ROC) curves and match score distributions. Experimental results demonstrate that long-term age effects are still a significant challenge for the state-of-the-art facial authentication method.
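To make the ROC-based evaluation concrete, the sketch below computes ROC points from match scores. It assumes similarity scores where higher means a better match; the score lists are invented for illustration and are not results from the paper.

```python
def roc_points(genuine, impostor, thresholds):
    """Return (threshold, false-positive rate, true-positive rate) triples."""
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in genuine) / len(genuine)    # genuine pairs accepted
        fpr = sum(s >= t for s in impostor) / len(impostor)  # impostor pairs accepted
        points.append((t, fpr, tpr))
    return points

# Hypothetical match scores: same person at a similar age vs. after synthetic aging.
genuine_same_age = [0.90, 0.80, 0.85, 0.70]
genuine_aged = [0.60, 0.50, 0.75, 0.40]
impostor = [0.30, 0.20, 0.45, 0.10]

same_age_curve = roc_points(genuine_same_age, impostor, [0.5])
aged_curve = roc_points(genuine_aged, impostor, [0.5])
```

Comparing the two curves at the same threshold shows how aging can lower the true-positive rate while the false-positive rate stays unchanged.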

翻訳日:2023-08-09 12:55:40 公開日:2023-08-08

# リアルタイムプログレッシブラーニング:ニューラルネットワークに基づく選択記憶を用いた相互強化学習と制御

Real-Time Progressive Learning: Mutually Reinforcing Learning and Control with Neural-Network-Based Selective Memory ( http://arxiv.org/abs/2308.04223v1 )

ライセンス: Link先を確認

Yiming Fei, Jiangang Li, Yanan Li

(参考訳) 記憶は、学習の基盤として、知識の記憶、更新、および忘れることを決定し、さらに学習の効率を決定づける。リアルタイム・プログレッシブ・ラーニング(RTPL)と呼ばれる,放射基底関数ニューラルネットワーク(RBFNN)に基づく学習制御方式を,安定性と閉ループ性能を保証したシステムの未知のダイナミクスを学習するために提案する。適応型神経制御(ANC)における確率勾配降下(SGD)更新法則の代わりに、RTPLは選択型メモリ再帰最小二乗法(SMRLS)アルゴリズムを採用し、RBFNNの重みを更新する。 SMRLSを介してRBFNNの近似能力を特徴空間上に均一に分散し、SGD法の受動的知識忘れ現象を抑制する。その後、RTPLは古典的なANCに対して以下のメリットを達成します。 1)低レベル持続励起(PE)下での学習能力保証 2)学習性能の向上(学習速度,精度,一般化能力) 3)実用用途におけるRTPLの堅牢性を確保する低利得要件。さらに、rtplベースの学習と制御は、タスク実行中に徐々に強化され、長期学習制御タスクに適合する。例えば、RTPLは適応フィードフォワードコントローラであるRBFNNを持つ非線形システムのクラスにおけるトラッキング制御問題に対処するために使用される。対応する理論解析およびシミュレーション研究はrtplの有効性を示す。

Memory, as the basis of learning, determines the storage, update and forgetting of knowledge and thereby the efficiency of learning. A radial basis function neural network (RBFNN) based learning control scheme with a memory mechanism, named real-time progressive learning (RTPL), is proposed to learn the unknown dynamics of the system with guaranteed stability and closed-loop performance. Instead of the stochastic gradient descent (SGD) update law used in adaptive neural control (ANC), RTPL adopts the selective memory recursive least squares (SMRLS) algorithm to update the weights of the RBFNN. Through SMRLS, the approximation capabilities of the RBFNN are uniformly distributed over the feature space, and thus the passive knowledge forgetting phenomenon of the SGD method is suppressed. As a result, RTPL achieves the following merits over classical ANC: 1) guaranteed learning capability under low-level persistent excitation (PE), 2) improved learning performance (learning speed, accuracy and generalization capability), and 3) a low gain requirement, ensuring robustness of RTPL in practical applications. Moreover, RTPL-based learning and control gradually reinforce each other during task execution, making the scheme appropriate for long-term learning control tasks. As an example, RTPL is used to address the tracking control problem of a class of nonlinear systems, with the RBFNN acting as an adaptive feedforward controller. Corresponding theoretical analysis and simulation studies demonstrate the effectiveness of RTPL.
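As a rough illustration of updating RBFNN weights with recursive least squares instead of SGD, the sketch below uses the textbook RLS recursion. It is not the selective-memory variant (SMRLS) from the paper; the centers, width, initial covariance, and target weights are all assumptions chosen for the example.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian radial basis features of a scalar input."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def rls_update(w, P, phi, y, forgetting=1.0):
    """One standard recursive least squares step for y ~ w . phi."""
    n = len(w)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(w[i] * phi[i] for i in range(n))
    w = [w[i] + gain[i] * error for i in range(n)]
    # P stays symmetric, so phi^T P equals P_phi transposed.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
         for i in range(n)]
    return w, P

centers, width = [-1.0, 0.0, 1.0], 1.0
true_w = [1.0, -2.0, 0.5]  # hypothetical "unknown dynamics" to be learned

def target(x):
    return sum(a * b for a, b in zip(true_w, rbf_features(x, centers, width)))

w = [0.0, 0.0, 0.0]
P = [[1e6 if i == j else 0.0 for j in range(3)] for i in range(3)]
for k in range(31):
    x = -1.5 + 0.1 * k
    w, P = rls_update(w, P, rbf_features(x, centers, width), target(x))

phi_test = rbf_features(0.3, centers, width)
prediction_error = abs(sum(a * b for a, b in zip(w, phi_test)) - target(0.3))
```

Unlike a single SGD step, each RLS step solves the least-squares problem over all samples seen so far, which is one reason such schemes are less prone to forgetting earlier data.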

翻訳日:2023-08-09 12:55:29 公開日:2023-08-08

# GNNモデルにおけるグラフ注意に基づく説明の意味解釈と検証

Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models ( http://arxiv.org/abs/2308.04220v1 )

ライセンス: Link先を確認

Efimia Panagiotaki, Daniele De Martini, Lars Kunze

(参考訳) 本研究では,グラフニューラルネットワーク(GNN)に基づくモデルの説明可能性を高めるために意味的注意の応用について検討し,意味的インフォームド摂動を導入し,予測特徴量とモデル精度の相関性を確立する手法を提案する。 Graph Deep Learning(GDL)は、複雑な特徴や関係を簡潔に記述するために柔軟なグラフ構造を活用する、シーン解釈のようなタスクのための有望な分野として登場した。 eXplainable AI(XAI)で使用される従来の説明可能性手法は、そのような構造に直接適用できないため、グラフ固有のアプローチが導入された。注意機構は、深層学習モデルにおける入力特徴の重要性を推定する上での有効性を示しており、GNN予測のための特徴に基づく説明を提供するために、これまで用いられてきた。これらの知見に基づいて,注意重みを意味的ソートされた特徴集合の重要性指標として用いることを検討する既存の注意度ベースのグラフ説明可能性手法を拡張する。予測注目度分布の挙動をモデル精度と相関して解析することにより、GNNモデルの挙動に関する特徴的重要性に関する貴重な洞察を得る。提案手法をlidar pointcloud推定モデルに適用し,高機能化に寄与する重要セマンティクスクラスを効果的に同定し,信頼性の高いポストホックセマンティクス記述を生成する。

In this work, we propose a methodology for investigating the application of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models, introducing semantically-informed perturbations and establishing a correlation between predicted feature-importance weights and model accuracy. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eXplainable AI (XAI) cannot be directly applied to such structures, graph-specific approaches have been introduced. Attention mechanisms have demonstrated their efficacy in estimating the importance of input features in deep learning models and have thus been employed previously to provide feature-based explanations for GNN predictions. Building upon these insights, we extend existing attention-based graph-explainability methods by investigating the use of attention weights as importance indicators of semantically sorted feature sets. By analysing the behaviour of the predicted attention-weight distribution in correlation with model accuracy, we gain valuable insights into feature importance with respect to the behaviour of the GNN model. We apply our methodology to a lidar pointcloud estimation model, successfully identifying key semantic classes that contribute to enhanced performance and effectively generating reliable post-hoc semantic explanations.

翻訳日:2023-08-09 12:55:03 公開日:2023-08-08

# AquaSAM:水中画像フォアグラウンドセグメンテーション

AquaSAM: Underwater Image Foreground Segmentation ( http://arxiv.org/abs/2308.04218v1 )

ライセンス: Link先を確認

Muduo Xu, Jianhao Su, Yutao Liu

(参考訳) SAM(Segment Anything Model)は自然画像のセグメンテーションに革命をもたらしたが、それでも水中画像のパフォーマンスは制限されている。この研究は、様々な水中ターゲットのセグメンテーションのための汎用的な方法を作成することを目的として、水中画像上でSAMの成功を拡大する最初の試みであるAquaSAMを提示する。これを実現するために、SUIMデータセットで様々なラベルを自動的に分類し抽出することから始める。次に,サムを海中イメージセグメンテーションに適応させるための簡易な微調整法を開発した。人間のダイバーのような8つのセグメンテーションタスクを含む広範な実験を通して、AquaSAMは特にサンゴ礁のような硬いタスクにおいて、デフォルトのSAMモデルよりも優れていることを示した。 AquaSAMは、水中セグメンテーションにおける平均Dice similarity Coefficient(DSC)が7.13(%)改善され、mIoUの改善が平均8.27(%)改善された。

The Segment Anything Model (SAM) has revolutionized natural image segmentation; nevertheless, its performance on underwater images is still restricted. This work presents AquaSAM, the first attempt to extend the success of SAM to underwater images, with the purpose of creating a versatile method for the segmentation of various underwater targets. To achieve this, we begin by classifying and extracting various labels automatically in the SUIM dataset. Subsequently, we develop a straightforward fine-tuning method to adapt SAM to general foreground underwater image segmentation. Through extensive experiments involving eight segmentation tasks, such as human divers, we demonstrate that AquaSAM outperforms the default SAM model, especially on hard tasks like coral reefs. AquaSAM achieves an average improvement of 7.13% in Dice Similarity Coefficient (DSC) and an average improvement of 8.27% in mIoU on underwater segmentation tasks.
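For readers unfamiliar with the two reported metrics, the sketch below computes the Dice Similarity Coefficient and IoU for a single binary mask pair. The masks are flattened toy examples; averaging IoU over classes or tasks to obtain mIoU is left out.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two flat binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as a perfect match
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou

# Toy foreground masks (1 = foreground pixel).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred_mask, true_mask)
```

Dice weights the overlap twice and is therefore never lower than IoU on the same mask pair.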

翻訳日:2023-08-09 12:54:39 公開日:2023-08-08

# リアルタイム合成支援のためのハイブリッド検索拡張生成

Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance ( http://arxiv.org/abs/2308.04215v1 )

ライセンス: Link先を確認

Xuchao Zhang, Menglin Xia, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle

(参考訳) 検索拡張モデルは、文脈理解を改善し、プライベートデータを統合し、幻覚を減らすことで、伝統的な言語モデルの強化に役立つ。しかし,大規模言語モデルの検索に要する処理時間は,合成支援などのリアルタイム応答を必要とするタスクに適用する際の課題となっている。この制限を克服するために,クライアントモデルとクラウドモデルを組み合わせたハイブリッド設定を利用するハイブリッド検索拡張生成(HybridRAG)フレームワークを提案する。 HybridRAGはクラウド上でLLM(Large Language Model)によって非同期に生成される検索拡張メモリを組み込んでいる。この検索強化メモリを統合することで、クライアントモデルはLLMの能力を利用して、非常に効果的な応答を生成する能力を得る。さらに、非同期メモリの統合により、クライアントモデルはクラウドからのメモリ同期を待つことなく、ユーザの要求に対してリアルタイムにレスポンスを提供することができる。 Wikitext と Pile のサブセットを用いた実験により,HybridRAG はクラウドベースの検索拡張 LLM よりも低レイテンシを実現し,クライアントのみのモデルよりも実用性が高いことがわかった。

Retrieval augmented models show promise in enhancing traditional language models by improving their contextual understanding, integrating private data, and reducing hallucination. However, the processing time required for retrieval augmented large language models poses a challenge when applying them to tasks that require real-time responses, such as composition assistance. To overcome this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework that leverages a hybrid setting that combines both client and cloud models. HybridRAG incorporates retrieval-augmented memory generated asynchronously by a Large Language Model (LLM) in the cloud. By integrating this retrieval augmented memory, the client model acquires the capability to generate highly effective responses, benefiting from the LLM's capabilities. Furthermore, through asynchronous memory integration, the client model is capable of delivering real-time responses to user requests without the need to wait for memory synchronization from the cloud. Our experiments on Wikitext and Pile subsets show that HybridRAG achieves lower latency than a cloud-based retrieval-augmented LLM, while outperforming client-only models in utility.

翻訳日:2023-08-09 12:54:22 公開日:2023-08-08

# FLIRT: フィードバックループのコンテキスト内でのレッドチーム

FLIRT: Feedback Loop In-context Red Teaming ( http://arxiv.org/abs/2308.04265v1 )

ライセンス: Link先を確認

Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

(参考訳) 警告: 本論文は不適切または不快な内容を含む。生成モデルが様々なアプリケーションで一般公開されるようになるにつれて、これらのモデルの脆弱性のテストと分析が最優先事項となっている。ここでは,特定のモデルを評価し,その脆弱性を安全でない不適切なコンテンツ生成に対して公開する自動レッドチームフレームワークを提案する。私たちのフレームワークは、レッドチームモデルに対するフィードバックループでコンテキスト内学習を使用し、それらを安全でないコンテンツ生成にトリガーします。本稿では,テキストから画像へのモデルの効果的かつ多様なプロンプトを自動的に学習するための,コンテキスト内攻撃戦略を提案する。提案手法は, ベースラインアプローチと比較して, 安定拡散(SD)モデルにおいて, 安全性が向上した場合でも, 脆弱性の暴露に有効であることが実証された。さらに,提案フレームワークが,テキスト対テキストモデルのレッド・ペアリングに有効であることを実証し,従来報告した数に比べて有毒な応答生成率を有意に高めることを示した。

Warning: this paper contains content that may be inappropriate or offensive. As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. Our framework uses in-context learning in a feedback loop to red team models and trigger them into unsafe content generation. We propose different in-context attack strategies to automatically learn effective and diverse adversarial prompts for text-to-image models. Our experiments demonstrate that, compared to baseline approaches, our proposed strategy is significantly more effective in exposing vulnerabilities in the Stable Diffusion (SD) model, even when the latter is enhanced with safety features. Furthermore, we demonstrate that the proposed framework is effective for red teaming text-to-text models, resulting in a significantly higher toxic response generation rate compared to previously reported numbers.

翻訳日:2023-08-09 12:48:55 公開日:2023-08-08

# BarlowRL:データ効率の良い強化学習のためのバローツイン

BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning ( http://arxiv.org/abs/2308.04263v1 )

ライセンス: Link先を確認

Omer Veysel Cagatan

(参考訳) 本稿では,Barlow Twins自己教師型学習フレームワークとDER(Data-Efficient Rainbow)アルゴリズムを組み合わせたデータ効率強化学習エージェントBarlowRLを紹介する。 BarlowRLはAtari 100kベンチマークでDERとそれと対照的なCURLの両方を上回っている。 BarlowRLは空間全体に広がる情報を強制することによって次元的崩壊を避ける。これにより、RLアルゴリズムは、最終的に顕著なパフォーマンスをもたらす一様拡散状態表現を利用することができる。 Barlow TwinsとDERの統合により、データ効率が向上し、RLタスクのパフォーマンスが向上する。 BarlowRLは、RLアルゴリズムを改善するために自己教師付き学習技術を導入する可能性を示している。

This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with the DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by forcing information to spread across the whole representation space. This lets RL algorithms use a uniformly spread state representation, which eventually results in remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance in RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
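The Barlow Twins objective mentioned above can be sketched in a few lines. This is the generic loss on two batches of embeddings, written in plain Python for readability; it is not BarlowRL's training code, and the weight `lam` and the toy embeddings are assumptions.

```python
import math

def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins loss for two batches of embeddings (lists of rows)."""
    n, d = len(z_a), len(z_a[0])

    def standardize(z):
        # Return d columns, each with zero mean and unit variance over the batch.
        columns = []
        for j in range(d):
            col = [row[j] for row in z]
            mean = sum(col) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
            columns.append([(v - mean) / std for v in col])
        return columns

    a, b = standardize(z_a), standardize(z_b)
    # Cross-correlation matrix between the two views.
    c = [[sum(a[i][k] * b[j][k] for k in range(n)) / n for j in range(d)]
         for i in range(d)]
    on_diag = sum((1 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag

view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
flipped = [[-v for v in row] for row in view]

loss_identical = barlow_twins_loss(view, view)
loss_flipped = barlow_twins_loss(view, flipped)
```

The diagonal term pulls the two views of the same state together, while the off-diagonal term decorrelates embedding dimensions, which is what discourages dimensional collapse.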

翻訳日:2023-08-09 12:48:37 公開日:2023-08-08

# SDLFormer: 高速MR画像再構成のための疎高密度局所変換器

SDLFormer: A Sparse and Dense Locality-enhanced Transformer for Accelerated MR Image Reconstruction ( http://arxiv.org/abs/2308.04262v1 )

ライセンス: Link先を確認

Rahul G.S., Sriprabha Ramnarayanan, Mohammad Al Fahim, Keerthi Ram, Preejith S.P, and Mohanasankar Sivaprakasam

(参考訳) トランスフォーマーは、空間領域における非局所的な領域関係を学習する能力のため、畳み込みニューラルネットワークの有効な代替手段として登場した。トランスの自己アテンション機構により、トランスフォーマーは画像の長距離依存性を捉えることができ、画像領域におけるアンダーサンプリングの効果が非局所的であるため、mri画像再構成の高速化に望ましい。計算効率にも拘わらず、ウィンドウベースのトランスフォーマーはイメージウィンドウの範囲内に限定されるため、レセプティブフィールドの制限を受ける。拡張注意機構と畳み込み機構を統合し,mri画像再構成を高速化する窓型トランスフォーマーネットワークを提案する。提案手法は,mri画像再構成のための低レベル変換不変特性を学習するために,遠方近傍の画素関係を強化し,トランスフォーマモジュール内に奥行き方向畳み込みを導入するために,拡張および密集した近傍注意トランスから構成する。提案モデルは, 自己監督的に訓練される。 k-space スプリッティングに基づく自己教師型学習における4xおよび5xアンダーサンプリングと対比した冠状骨PD, 冠状骨PDFS, 軸方向T2に対する多コイルMRIアクセラレーションの広範な実験を行った。本手法は他の再構築アーキテクチャと並列ドメイン自己教師付き学習ベースラインとの比較を行った。その結果,提案モデルが改善率を示すことがわかった。 (i)PSNRでは約1.40dB、他のアーキテクチャでは平均0.028dBである。 (ii)psnrでは約1.44db、並列ドメイン自己教師付き学習では約0.029db。コードはhttps://github.com/rahul-gs-16/sdlformer.gitで入手できる。

Transformers have emerged as viable alternatives to convolutional neural networks owing to their ability to learn non-local region relationships in the spatial domain. The self-attention mechanism enables transformers to capture long-range dependencies in images, which is desirable for accelerated MRI image reconstruction because the effect of undersampling is non-local in the image domain. Despite their computational efficiency, window-based transformers suffer from restricted receptive fields, as the dependencies are limited to the scope of the image windows. We propose a window-based transformer network that integrates a dilated attention mechanism and convolution for accelerated MRI image reconstruction. The proposed network consists of dilated and dense neighborhood attention transformers to enhance distant-neighborhood pixel relationships, and introduces depth-wise convolutions within the transformer module to learn low-level translation-invariant features for accelerated MRI image reconstruction. The proposed model is trained in a self-supervised manner. We perform extensive experiments for multi-coil MRI acceleration for coronal PD, coronal PDFS and axial T2 contrasts with 4x and 5x under-sampling in self-supervised learning based on k-space splitting. We compare our method against other reconstruction architectures and the parallel domain self-supervised learning baseline. Results show that the proposed model exhibits improvement margins of (i) around 1.40 dB in PSNR and around 0.028 in SSIM on average over other architectures, and (ii) around 1.44 dB in PSNR and around 0.029 in SSIM over parallel domain self-supervised learning. The code is available at https://github.com/rahul-gs-16/sdlformer.git
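The PSNR margins quoted above are on a logarithmic scale. As a reminder of what the metric measures, the sketch below computes PSNR for two flattened images; the pixel values are invented, and a peak value of 1.0 is assumed.

```python
import math

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat images."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

reference = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.1, 0.5, 0.9, 0.5]
value_db = psnr(reference, reconstruction)
```

Because of the logarithm, a gain of about 1.4 dB corresponds to a reduction in mean squared error of roughly 28%.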

翻訳日:2023-08-09 12:48:27 公開日:2023-08-08

# passt と large audio-caption データセットを用いた自然言語に基づく音声検索の高度化

Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets ( http://arxiv.org/abs/2308.04258v1 )

ライセンス: Link先を確認

Paul Primus, Khaled Koutini, Gerhard Widmer

(参考訳) 本研究は,事前学習されたテキストとスペクトログラム変換器に基づく音声検索システムを提案する。提案手法は,異なるモーダルの関連事例が近接した共有音声キャプション空間に記録とテキスト記述を投影する。本研究では,システムの各コンポーネントが検索性能に与える影響を系統的分析により検討する。その結果,音声埋め込みのためのセルフアテンションベースのオーディオエンコーダと,事前学習における人間生成および合成データセットの利用という2つの重要な役割を担っている。さらに,ClosoV2字幕をキーワードで拡張し,その多様性を高める実験を行ったが,これは限界改善にしか至らなかった。当システムは2023年のdcaseチャレンジで第1位にランクインし, clothov2ベンチマークの現在の成果を5.6ppも上回った。マップ@10。

This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers. Our method projects recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. Through a systematic analysis, we examine how each component of the system influences retrieval performance. As a result, we identify two key components that play a crucial role in driving performance: the self-attention-based audio encoder for audio embedding and the utilization of additional human-generated and synthetic data sets during pre-training. We further experimented with augmenting ClothoV2 captions with available keywords to increase their variety; however, this only led to marginal improvements. Our system ranked first in the 2023 DCASE Challenge, and it outperforms the current state of the art on the ClothoV2 benchmark by 5.6 pp. mAP@10.
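The mAP@10 metric reported above can be illustrated with a short sketch. It uses the common definition of average precision truncated at rank k; the query results and identifiers are invented, and the exact evaluation protocol of the benchmark is not reproduced here.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision over the top-k retrieved items for one query."""
    if not relevant_ids:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant_ids:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant_ids), k)

def mean_average_precision_at_k(results, k=10):
    """Mean of per-query AP@k; `results` is a list of (ranking, relevant set)."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in results) / len(results)

# Hypothetical text queries with one relevant recording each.
results = [
    (["a", "b", "c"], {"b"}),  # relevant recording at rank 2
    (["x", "y"], {"x"}),       # relevant recording at rank 1
]
map_at_10 = mean_average_precision_at_k(results)
```

When every query has exactly one relevant recording, AP@10 reduces to the reciprocal rank of that recording, or zero if it falls outside the top ten.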

翻訳日:2023-08-09 12:47:58 公開日:2023-08-08

# CLASSLA-Stanza:南スラヴ語の言語処理の次のステップ

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages ( http://arxiv.org/abs/2308.04255v1 )

ライセンス: Link先を確認

Luka Terčon, Nikola Ljubešić

(参考訳) 本稿では,南スラヴ語の自動言語アノテーションのためのパイプラインであるCLASSLA-Stanzaについて述べる。我々は、Stanzaに対するCLASSLA-Stanzaの主な改善点を説明し、パイプラインの最新2.1リリースのモデルトレーニングプロセスの詳細を説明します。また、異なる言語や品種のパイプラインによって生成されたパフォーマンススコアも報告する。 CLASSLA-Stanzaは、サポートするすべての言語で一貫して高いパフォーマンスを示し、サポート対象のすべてのタスクにおいて、親パイプラインStanzaをパフォーマンスまたは拡張する。また、Webデータの効率的な処理を可能にするパイプラインの新機能と、その実装に繋がった理由についても紹介する。

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza, and give a detailed description of the model training process for the latest 2.1 release of the pipeline. We also report performance scores produced by the pipeline for different languages and varieties. CLASSLA-Stanza exhibits consistently high performance across all the supported languages and outperforms or expands its parent pipeline Stanza at all the supported tasks. We also present the pipeline's new functionality enabling efficient processing of web data and the reasons that led to its implementation.

翻訳日:2023-08-09 12:47:33 公開日:2023-08-08

# 多焦点レンズカメラによるブラア認識距離推定

Blur aware metric depth estimation with multi-focus plenoptic cameras ( http://arxiv.org/abs/2308.04252v1 )

ライセンス: Link先を確認

Mathieu Labussière, Céline Teulière, Omar Ait-Aider

(参考訳) 従来のカメラはシーンの1つの視点のみをキャプチャするが、plenopticまたはlight-fieldカメラは1つのスナップショットで空間的および角的情報をキャプチャし、単一の取得から深さを推定できる。本稿では,多焦点カメラからの生画像のみを用いた新しい距離深度推定アルゴリズムを提案する。提案手法は,焦点長の異なる複数のマイクロレンズを用いたマルチフォーカス構成に特に適合する。 BLADEのアプローチの主な目的は,デフォーカスステレオ画像の一致度とデフォーカス手がかりの両組み合わせによる相違度推定を改善することである。したがって,従来は欠点とされていたぼやけ情報を活用する。スケール係数までの深さ推定を提供するデフォーカスボケを含む逆射影モデルを明示的に導出する。次に, 逆モデルを校正する手法を提案する。したがって、深度スケーリングを考慮に入れ、正確なメートル深度推定を行う。その結果,Defocus cuesの導入により深さ推定が向上した。筆者らは,3次元ライダースキャナーを用いて,相対的な深度推定設定と実世界の3次元複雑なシーンにおけるフレームワークと深度スケーリングキャリブレーションの有効性を実証した。

While a traditional camera only captures one point of view of a scene, a plenoptic or light-field camera is able to capture spatial and angular information in a single snapshot, enabling depth estimation from a single acquisition. In this paper, we present a new metric depth estimation algorithm using only raw images from a multi-focus plenoptic camera. The proposed approach is especially suited for the multi-focus configuration, where several micro-lenses with different focal lengths are used. The main goal of our blur aware depth estimation (BLADE) approach is to improve disparity estimation for defocus stereo images by integrating both correspondence and defocus cues. We thus leverage blur information, which was previously considered a drawback. We explicitly derive an inverse projection model that includes the defocus blur and provides depth estimates up to a scale factor. A method to calibrate the inverse model is then proposed. We thus take depth scaling into account to achieve precise and accurate metric depth estimates. Our results show that introducing defocus cues improves the depth estimation. We demonstrate the effectiveness of our framework and depth scaling calibration on relative depth estimation setups and on real-world complex 3D scenes with ground truth acquired with a 3D lidar scanner.

翻訳日:2023-08-09 12:47:12 公開日:2023-08-08

# minddiffuser: 意味的および構造的拡散を伴うヒト脳活動からの画像再構成制御

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion ( http://arxiv.org/abs/2308.04249v1 )

ライセンス: Link先を確認

Yizhuo Lu, Changde Du, Qiongyi zhou, Dianpeng Wang, Huiguang He

(参考訳) 脳の録音から視覚刺激を再構築することは有意義で難しい課題である。特に、精密かつ制御可能な画像再構成の達成は、脳-コンピュータインタフェースの進歩と活用を促進する上で非常に重要である。複雑な画像再構成技術の進歩にもかかわらず、この課題は、画像刺激と意味(概念と対象)と構造(位置、方向、大きさ)の結合的なアライメントを達成することにある。上記の問題に対処するため,MindDiffuserと呼ばれる2段階画像再構成モデルを提案する。ステージ1では、VQ-VAE潜在表現とfMRIからデコードされたCLIPテキスト埋め込みが安定拡散され、セマンティック情報を含む予備画像が生成される。ステージ2では、fMRIからデコードされたCLIP視覚特徴を監視情報として利用し、バックプロパゲーションによりステージ1でデコードされた2つの特徴ベクトルを継続的に調整し、構造情報を整列させる。定性的および定量的解析の結果から,本モデルがNatural Scenes Dataset (NSD) の最先端モデルを上回ったことが明らかとなった。その後の実験結果は、そのモデルの神経生物学的妥当性を裏付けるものであり、対応する脳反応と一致するマルチモーダル特徴の解釈可能性によって証明された。

Reconstructing visual stimuli from brain recordings is a meaningful and challenging task. In particular, achieving precise and controllable image reconstruction is of great significance for advancing the development and use of brain-computer interfaces. Despite advances in complex image reconstruction techniques, it remains challenging to align both the semantics (concepts and objects) and the structure (position, orientation, and size) of the reconstruction with the image stimuli. To address this issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are fed into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our model surpasses the current state-of-the-art models on the Natural Scenes Dataset (NSD). Subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal features employed, which align with the corresponding brain responses.

翻訳日:2023-08-09 12:46:28 公開日:2023-08-08

# 単語埋め込みを用いた光沢アライメント

Gloss Alignment Using Word Embeddings ( http://arxiv.org/abs/2308.04248v1 )

ライセンス: Link先を確認

Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, Richard Bowden

(参考訳) 手話データセットのキャプチャとアノテーションは、時間とコストのかかるプロセスである。現在のデータセットは、制約のない手話翻訳(SLT)モデルをうまくトレーニングするには、桁違いに小さすぎる。その結果、研究は、手話インタプリタと関連するオーディオサブタイトルの両方からなる大規模トレーニングデータのソースとして、テレビ放送コンテンツに転換した。しかし、手話アノテーションの欠如は、このデータのユーザビリティを制限し、手話スポッティングのような自動アノテーション技術の開発につながった。これらのスポッティングは、字幕ではなくビデオと一致しており、しばしば字幕とスポッティングされた手話のミスアライメントをもたらす。本論文では,大規模な音声言語モデルを用いて,スポッティングを対応する字幕に合わせる手法を提案する。単一のモダリティを用いることで,計算コストが低く,既存のアライメント手法と組み合わせて利用することができる。本稿では, mDGS と BOBSL データセットにおける提案手法の有効性を定量的に検証し, 単語アライメントにおいて最大33.22 BLEU-1 スコアを回復する。

Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, the lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and the spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the mDGS and BOBSL datasets, recovering up to a 33.22 BLEU-1 score in word alignment.

翻訳日:2023-08-09 12:45:52 公開日:2023-08-08

# aicsd: 意味セグメンテーションのための適応的クラス間類似性蒸留

AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation ( http://arxiv.org/abs/2308.04243v1 )

ライセンス: Link先を確認

Amir M. Mansourian, Rozhan Ahmadi, Shohreh Kasaei

(参考訳) 近年、深層ニューラルネットワークはコンピュータビジョンタスクにおいて顕著な精度を実現している。特にセマンティックセグメンテーションのような密集した予測タスクにおいて、推論時間が重要な要因となるため、知識蒸留は軽量な学生ネットワークの精度向上に成功している。既存の手法は、チャンネルや異なるクラスの情報を無視することが多い。これらの制約を克服するために,知識蒸留のためのICSD (Inter-Class similarity Distillation) を提案する。提案手法は,クラス内のクラス内分布をネットワーク出力から独立に計算することにより,教師ネットワークから生徒ネットワークへ高次関係を伝達する。その後、各クラスの分布間のkl発散を用いて蒸留のためのクラス間類似度行列を計算する。提案手法の有効性をさらに向上するため,適応損失重み付け(ALW)トレーニング戦略を提案する。既存の方法とは異なり、alw戦略は教師の予測の誤りを考慮し、訓練プロセスの終了に向けて教師ネットワークの影響を徐々に減少させる。セマンティックセグメンテーションのためのよく知られた2つのデータセットであるCityscapesとPascal VOC 2012で実施された大規模な実験は、mIoUとピクセル精度の観点から提案手法の有効性を検証する。提案手法は, 定量評価と定性評価の両方により, 既存の知識蒸留法よりも優れていた。コードは、https://github.com/AmirMansurian/AICSDで入手できる。

In recent years, deep neural networks have achieved remarkable accuracy in computer vision tasks. With inference time being a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation has emerged as a successful technique for improving the accuracy of lightweight student networks. Existing methods often neglect the information in channels and among different classes. To overcome these limitations, this paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs. Inter-class similarity matrices for distillation are then calculated using the KL divergence between the distributions of each pair of classes. To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is proposed. Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network towards the end of the training process to account for errors in the teacher's predictions. Extensive experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most existing knowledge distillation methods, as demonstrated by both quantitative and qualitative evaluations. Code is available at: https://github.com/AmirMansurian/AICSD
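One possible reading of the inter-class similarity idea is sketched below: each class's logit map is turned into a distribution over pixels, pairwise KL divergences form a matrix, and the student is penalized for deviating from the teacher's matrix. The squared-error comparison and the linear decay in `teacher_weight` are assumptions made for illustration; the paper's exact loss and ALW schedule may differ.

```python
import math

def spatial_softmax(logits):
    """Turn one class's flattened logit map into a distribution over pixels."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def inter_class_matrix(class_logits):
    """Pairwise KL divergence between the per-class pixel distributions."""
    dists = [spatial_softmax(logits) for logits in class_logits]
    return [[kl_divergence(p, q) for q in dists] for p in dists]

def similarity_distillation_loss(teacher_logits, student_logits):
    """Mean squared difference between teacher and student matrices (assumed form)."""
    t = inter_class_matrix(teacher_logits)
    s = inter_class_matrix(student_logits)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

def teacher_weight(epoch, total_epochs):
    """Hypothetical linear decay of the teacher's influence over training."""
    return 1.0 - epoch / total_epochs

# Toy logits: 3 classes, 4 pixels each.
teacher = [[2.0, 0.5, 0.1, 0.0], [0.0, 1.5, 0.2, 0.1], [0.1, 0.0, 0.3, 2.5]]
student = [[1.0, 0.4, 0.2, 0.1], [0.1, 1.0, 0.3, 0.0], [0.0, 0.2, 0.1, 1.5]]

matrix = inter_class_matrix(teacher)
loss = similarity_distillation_loss(teacher, student)
```

The total training loss would then combine the usual segmentation loss with this term, scaled by the decaying teacher weight.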

翻訳日:2023-08-09 12:45:32 公開日:2023-08-08

# AutoPCF: 大規模言語モデルを用いた効率的な製品カーボンフットプリント会計

AutoPCF: Efficient Product Carbon Footprint Accounting with Large Language Models ( http://arxiv.org/abs/2308.04241v1 )

ライセンス: Link先を確認

Zhu Deng, Jinjie Liu, Biao Luo, Can Yuan, Qingrun Yang, Lei Xiao, Wenwen Zhou, Zhu Liu

(参考訳) 製品炭素フットプリント(pcf)はサプライチェーンの脱炭素化に不可欠であり、製品ライフサイクル中のすべての活動によって引き起こされる間接的および間接的な温室効果ガス排出量を測定する。しかし、PCF会計は、しばしば専門知識とライフサイクルモデルを構築するのにかなりの時間を必要とする。本研究では,5つの大規模言語モデル(llm)の創発的能力を用いて,製品の'cradle-to-gate'ライフサイクルをモデル化し,入力と出力のインベントリデータを生成し,その限界を一般化pcf知識データベースとして明らかにする。 llmsを活用することで,計算パラメータの自動マッチングにディープラーニングアルゴリズムを適用し,最終的にpcfを計算する,自動ai駆動型pcf会計フレームワークautopcfを提案する。 autopcfフレームワークを用いて3つのケース製品の炭素フットプリントを推定した結果,モデリング時間を数日から数分に短縮し,pcfの自動モデリングと推定を実現する可能性を示した。

The product carbon footprint (PCF) is crucial for decarbonizing the supply chain, as it measures the direct and indirect greenhouse gas emissions caused by all activities during the product's life cycle. However, PCF accounting often requires expert knowledge and significant time to construct life cycle models. In this study, we test and compare the emergent ability of five large language models (LLMs) in modeling the 'cradle-to-gate' life cycles of products and generating the inventory data of inputs and outputs, revealing their limitations as a generalized PCF knowledge database. By utilizing LLMs, we propose an automatic AI-driven PCF accounting framework, called AutoPCF, which also applies deep learning algorithms to automatically match calculation parameters, and ultimately calculate the PCF. The results of estimating the carbon footprint for three case products using the AutoPCF framework demonstrate its potential in achieving automatic modeling and estimation of PCF with a large reduction in modeling time from days to minutes.
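At its core, a PCF estimate multiplies each inventory item by a matching emission factor and sums the results. The sketch below shows only that final arithmetic; the inventory entries and factors are made-up placeholders rather than real life-cycle data, and the LLM-based inventory generation and parameter matching steps of AutoPCF are not modelled.

```python
def product_carbon_footprint(inventory, emission_factors):
    """Sum of activity amount times emission factor over all inventory items."""
    return sum(amount * emission_factors[name] for name, amount in inventory.items())

# Hypothetical cradle-to-gate inventory for one product unit.
inventory = {"electricity_kwh": 2.0, "steel_kg": 10.0}

# Hypothetical emission factors in kg CO2e per unit of each input.
emission_factors = {"electricity_kwh": 1.5, "steel_kg": 0.5}

pcf_kg_co2e = product_carbon_footprint(inventory, emission_factors)
```

Most of the manual effort the abstract describes lies in building the inventory and choosing the right factors, which is the part AutoPCF aims to automate.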

翻訳日:2023-08-09 12:45:07 公開日:2023-08-08

# 持続的行動による時間的離散化を伴うアクタ-クリティック

Actor-Critic with variable time discretization via sustained actions ( http://arxiv.org/abs/2308.04299v1 )

ライセンス: Link先を確認

Jakub {\L}yskawa, Pawe{\l} Wawrzy\'nski

(参考訳) 強化学習(RL)法は離散時間で機能する。ロボット制御のような本質的に連続した問題にRLを適用するには、特定の時間離散化を定義する必要がある。これは、訓練が容易なスパースタイムコントロールと、最終的なパフォーマンス向上を可能にするより細かいタイムコントロールの2つの選択肢である。本研究では,異なる時間離散化設定の利点を組み合わせたオフポリシーrlアルゴリズムであるsusacerを提案する。最初はスパースタイムの離散化で動作し、徐々に微細なものに切り替える。ロボット制御環境における時間偏差変化の影響を解析する:Ant, HalfCheetah, Hopper, Walker2D。いずれの場合も,提案アルゴリズムは最先端技術より優れている。

Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
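
One way to picture "sparse first, fine later" is to keep the environment's fine time step fixed and let the agent sustain its previous action with some probability that shrinks during training. The sketch below is an illustration under that assumption only: the geometric sustain rule, the linear schedule, and all function names are hypothetical and are not taken from the SusACER paper.

```python
import random

def expected_duration(step, total_steps, start=8.0, end=1.0):
    # Assumed schedule: expected action length shrinks linearly to 1 step.
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac

def sustain_probability(duration):
    # Geometric action length: E[length] = 1 / (1 - p), so p = 1 - 1/duration.
    return 1.0 - 1.0 / duration

def rollout_actions(policy, n_steps, duration, rng):
    # At every fine time step, keep the previous action with probability p,
    # otherwise query the policy for a new one.
    p = sustain_probability(duration)
    actions = []
    current = None
    for t in range(n_steps):
        if current is None or rng.random() >= p:
            current = policy(t)
        actions.append(current)
    return actions
```

With `duration=1.0` the sustain probability is zero and the agent acts at every fine step; larger durations give the coarse control that may be easier to train early on.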

翻訳日:2023-08-09 12:36:50 公開日:2023-08-08

# LaCAM$^\ast$:リアルタイム・大規模・準最適マルチエージェントパスフィニングを目指して

Engineering LaCAM$^\ast$: Towards Real-Time, Large-Scale, and Near-Optimal Multi-Agent Pathfinding ( http://arxiv.org/abs/2308.04292v1 )

ライセンス: Link先を確認

Keisuke Okumura

(参考訳) 本稿では,最近提案されたLaCAM*アルゴリズムの改良を通じて,リアルタイム,大規模,準最適マルチエージェントパスフィンディング(MAPF)の課題に対処する。 LaCAM*はスケーラブルな検索ベースのアルゴリズムであり、累積遷移コストに対する最適解の最終的な発見を保証する。様々な最先端MAPF法を超越した計画成功率を示す一方で、初期解の質は最適には程遠いものであり、最適への収束速度は遅い。これらの制限を克服するために,他のMAPF法からインスピレーションを得た改良手法をいくつか紹介する。これらの手法の融合がLaCAM*の解の質を著しく向上させ、MAPFアルゴリズムの境界をさらに推し進めるという実証的な証拠を提供する。

This paper addresses the challenges of real-time, large-scale, and near-optimal multi-agent pathfinding (MAPF) through enhancements to the recently proposed LaCAM* algorithm. LaCAM* is a scalable search-based algorithm that guarantees the eventual finding of optimal solutions for cumulative transition costs. While it has demonstrated remarkable planning success rates, surpassing various state-of-the-art MAPF methods, its initial solution quality is far from optimal, and its convergence speed to the optimum is slow. To overcome these limitations, this paper introduces several improvement techniques, partly drawing inspiration from other MAPF methods. We provide empirical evidence that the fusion of these techniques significantly improves the solution quality of LaCAM*, thus further pushing the boundaries of MAPF algorithms.

翻訳日:2023-08-09 12:36:41 公開日:2023-08-08

# 長距離絡み合いを混合に変換する:局所平衡へのテンソル-ネットワークアプローチ

Converting long-range entanglement into mixture: tensor-network approach to local equilibration ( http://arxiv.org/abs/2308.04291v1 )

ライセンス: Link先を確認

Miguel Fr\'ias-P\'erez, Luca Tagliacozzo and Mari Carmen Ba\~nuls

(参考訳) クエンチによって誘導される平衡外進化において、高速自由度は標準テンソルネットワークでエンコードし難い長距離絡みを生じる。しかし、局所観測者はそのような長距離相関を、還元された局所状態への寄与を通じてのみ知覚する。本稿では,このような長距離の絡み合いを識別し,それを効率的に混合して表現しやすいテンソルネットワーク法を提案する。このように,有限計算資源を持つ局所作用素の長時間挙動をキャプチャする密度行列として,時間発展状態の効果的な記述を得る。

In the out-of-equilibrium evolution induced by a quench, fast degrees of freedom generate long-range entanglement that is hard to encode with standard tensor networks. However, local observables only sense such long-range correlations through their contribution to the reduced local state as a mixture. We present a tensor network method that identifies such long-range entanglement and efficiently transforms it into a mixture, which is much easier to represent. In this way, we obtain an effective description of the time-evolved state as a density matrix that captures the long-time behavior of local operators with finite computational resources.

翻訳日:2023-08-09 12:36:24 公開日:2023-08-08

# Cloth2Tex: 3D仮想トライオンのためのカスタマイズされた布地テクスチャ生成パイプライン

Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On ( http://arxiv.org/abs/2308.04288v1 )

ライセンス: Link先を確認

Daiheng Gao, Xu Chen, Xindi Zhang, Qi Wang, Ke Sun, Bang Zhang, Liefeng Bo, Qixing Huang

(参考訳) 3D服の製作とデザインは、3D仮想試着、2D服の3Dアパレルへのデジタル化、布のアニメーションなど、様々な用途でリアルな服装を合成する必要性が高まるにつれて、非常に要求されるようになった。そのため、2D参照画像などの単純な入力から高品質なテクスチャを得るために、シンプルで簡単なパイプラインが必要となる。伝統的なワーピングベースのテクスチャ生成法では、各タイプの衣服に手動で選択するかなりの数の制御ポイントが必要であるため、時間と手間がかかる。本稿では,この過程における人的負担をなくす新しい方法である Cloth2Tex を提案する。 Cloth2Texは、合理的なレイアウトと構造整合性を持つテクスチャマップを生成する自己教師方式である。 Cloth2Texのもうひとつの重要な特徴は、高忠実なテクスチャインペイントをサポートするために使用できることだ。これはCloth2Texと一般的な潜在拡散モデルを組み合わせることで実現される。提案手法は質的かつ定量的に評価し,Cloth2Texが高品質なテクスチャマップを生成でき,他の手法と比較して最高の視覚効果が得られることを示した。プロジェクトページ:tomguluson92.github.io/projects/cloth2tex/

Fabricating and designing 3D garments has become extremely demanding with the increasing need for synthesizing realistic dressed persons for a variety of applications, e.g., 3D virtual try-on, digitalization of 2D clothes into 3D apparel, and cloth animation. This calls for a simple and straightforward pipeline to obtain high-quality texture from simple input, such as 2D reference images. Traditional warping-based texture generation methods require a significant number of control points to be manually selected for each type of garment, which can be a time-consuming and tedious process. We propose a novel method, called Cloth2Tex, which eliminates the human burden in this process. Cloth2Tex is a self-supervised method that generates texture maps with reasonable layout and structural consistency. Another key feature of Cloth2Tex is that it can be used to support high-fidelity texture inpainting. This is done by combining Cloth2Tex with a prevailing latent diffusion model. We evaluate our approach both qualitatively and quantitatively and demonstrate that Cloth2Tex can generate high-quality texture maps and achieve the best visual effects in comparison to other methods. Project page: tomguluson92.github.io/projects/cloth2tex/

翻訳日:2023-08-09 12:36:14 公開日:2023-08-08

# wav2vec 2.0 Feature Extractorの比較解析

Comparative Analysis of the wav2vec 2.0 Feature Extractor ( http://arxiv.org/abs/2308.04286v1 )

ライセンス: Link先を確認

Peter Vieting and Ralf Schl\"uter and Hermann Ney

(参考訳) 自動音声認識(ASR)システムは通常手作りの特徴抽出パイプラインを使用する。固有情報損失を回避し、音声から転写テキストへのより一貫したモデリングを達成するために、neural raw waveform feature extractor(fes)は魅力的なアプローチである。また、最近広く普及したwav2vec 2.0モデルは、音声波形を直接操作する畳み込みFEを使用している。しかし、文献ではまだ広く研究されていない。本研究では,ctc (connectionist temporal classification) asrモデルにおける標準特徴抽出法を代替する能力について検討し,それを代替神経feと比較する。両者とも、librispeechベンチマークにおいて従来のfesと競合し、個々のコンポーネントの影響を分析する。さらに、学習したフィルタを分析し、ASRシステムにとって最も重要な情報が一連の帯域通過フィルタによって得られることを示す。

Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. The wav2vec 2.0 model, which has recently gained wide popularity, also uses a convolutional FE that operates directly on the speech waveform. However, it has not yet been studied extensively in the literature. In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters.

翻訳日:2023-08-09 12:35:51 公開日:2023-08-08

# 極端海洋環境下における無人船の視覚に基づく自律航法

Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions ( http://arxiv.org/abs/2308.04283v1 )

ライセンス: Link先を確認

Muhayyuddin Ahmed, Ahsan Baidar Bakht, Taimur Hassan, Waseem Akram, Ahmed Humais, Lakmal Seneviratne, Shaoming He, Defu Lin, and Irfan Hussain

(参考訳) 視覚知覚は無人表面容器(USV)の自律航法において重要な要素であり、特に自律的な検査と追跡に関わるタスクにおいて重要である。これらのタスクには、ナビゲーションのターゲットを特定する視覚ベースのナビゲーション技術が含まれる。海洋環境における極端な気象条件下での視認性の低下は、視覚に基づくアプローチが適切に働くことを困難にしている。これらの課題を克服するために,極端海洋環境下で対象物を追跡する自律型視覚ナビゲーションフレームワークを提案する。提案するフレームワークは、GAN(Generative Adversarial Network)を使用してノイズを除去し、オブジェクト検出器(YOLOv5)に渡す前にオブジェクトの特徴をハイライトする統合認識パイプラインで構成されている。検出された視覚的特徴は、ターゲットを追跡するためにUSVによって使用される。提案手法は砂嵐や霧による可視性低下下でのシミュレーションで徹底的に検証されている。その結果,提案手法が既存の手法を様々な測定値で上回っているmbzircシミュレーションデータセット全体において,最先端のデヘイジング手法と比較した。

Visual perception is an important component for autonomous navigation of unmanned surface vessels (USV), particularly for the tasks related to autonomous inspection and tracking. These tasks involve vision-based navigation techniques to identify the target for navigation. Reduced visibility under extreme weather conditions in marine environments makes it difficult for vision-based approaches to work properly. To overcome these issues, this paper presents an autonomous vision-based navigation framework for tracking target objects in extreme marine conditions. The proposed framework consists of an integrated perception pipeline that uses a generative adversarial network (GAN) to remove noise and highlight the object features before passing them to the object detector (i.e., YOLOv5). The detected visual features are then used by the USV to track the target. The proposed framework has been thoroughly tested in simulation under extremely reduced visibility due to sandstorms and fog. The results are compared with state-of-the-art de-hazing methods across the benchmarked MBZIRC simulation dataset, on which the proposed scheme has outperformed the existing methods across various metrics.

翻訳日:2023-08-09 12:35:38 公開日:2023-08-08

# 散逸のないエッジ状態による線幅狭化による位相保護下空洞ポラリトン

Topologically protected subradiant cavity polaritons through linewidth narrowing enabled by dissipationless edge states ( http://arxiv.org/abs/2308.04277v1 )

ライセンス: Link先を確認

Yuwei Lu, Jingfeng Liu, Haoxiang Jiang, Zeyang Liao

(参考訳) 量子レベルでの強い光-物質相互作用に由来するキャビティ偏光子は、キャビティ場を介して量子状態の効率的な操作の基礎となる。狭い直線幅と長い寿命を持つポラリトンは、量子センシングや記憶などの応用に魅力的である。本稿では,一次元原子配列で成形したトポロジカルミラーを用いた発振ガリーモード共振器を試作し,キャビティ偏光子の寿命を等級的に向上させる手法を提案する。この顕著な強化特性は、空洞モードの漏れを抑制する原子配列のトポロジカルバンドギャップによって保護される散逸のないエッジ状態への分極状態のカップリングによるものである。ラビ分割の幅を超えると、位相的バンドギャップは、極性状態から原子配列のバルク状態への散逸をさらに減少させ、非常に鋭い線幅を持つ亜ラジアントキャビティポラリトンに生じる。結果のラビ振動は、単一の量子エミッタの自由空間崩壊よりも低い速度で崩壊する。エッジ状態の位相的に保護された性質から受け継いだキャビティポラリトンは、原子周波数、相互作用強度、位置を含む中程度の摂動を伴う乱れた原子ミラーに保存することができる。我々の研究は、量子コンピューティングとネットワークの将来の応用にロバストな量子コヒーレンスを持つトポロジー工学の量子状態の新しいパラダイムを開放する。

Cavity polaritons derived from the strong light-matter interaction at the quantum level provide a basis for efficient manipulation of quantum states via the cavity field. Polaritons with narrow linewidth and long lifetime are appealing in applications such as quantum sensing and storage. Here, we propose a prototypical arrangement to implement a whispering-gallery-mode resonator with a topological mirror formed by a one-dimensional atom array, which allows the lifetime of cavity polaritons to be boosted by over an order of magnitude. This considerable enhancement is attributed to the coupling of polaritonic states to dissipationless edge states protected by the topological bandgap of the atom array, which suppresses the leakage of cavity modes. When it exceeds the width of the Rabi splitting, the topological bandgap can further reduce the dissipation from polaritonic states to bulk states of the atom array, giving rise to subradiant cavity polaritons with extremely sharp linewidth. The resultant Rabi oscillation decays at a rate even below the free-space decay of a single quantum emitter. Inheriting the topologically protected properties of edge states, the subradiance of cavity polaritons can be preserved in a disordered atom mirror with moderate perturbations involving the atomic frequencies, interaction strengths, and locations. Our work opens up a new paradigm of topology-engineered quantum states with robust quantum coherence for future applications in quantum computing and networks.

翻訳日:2023-08-09 12:35:18 公開日:2023-08-08

# コンテキストアライメント - 微調整前のバニラ言語モデルとのチャット

In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ( http://arxiv.org/abs/2308.04275v1 )

ライセンス: Link先を確認

Xiaochuang Han

(参考訳) 本稿では,コンテキスト内学習による推論時間アライメントについて検討する。我々は,事前学習された言語モデルであるllama-2を微調整する前に検討し,モデルがチャットスタイルの指示に従うように促された場合,平均9個のデモンストレーションアライメント例を取得する。直接的プロンプトと比較すると、モデル重みを変更しないコンテキスト内アライメントは、OpenAIのtext-davinci-003モデルであるWin-rate w.r.tの7倍増加し、アライメントを微調整する強力なベースラインに匹敵するバニラ言語モデルとなる。

In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model Llama-2 before any fine-tuning and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, the in-context alignment without changing model weights leads to a 7x increase in win-rate w.r.t. the text-davinci-003 model from OpenAI, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
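
The retrieve-then-prompt idea described above can be sketched as follows. This is an illustration only: the word-overlap (Jaccard) retriever, the `User:`/`Assistant:` template, and the function names are assumptions made for the example, and the note itself may use a different retriever and prompt format.

```python
def tokenize(text):
    # Crude whitespace tokenizer; a real retriever would likely use embeddings.
    return set(text.lower().split())

def retrieve(query, demos, k):
    # Rank (instruction, response) pairs by Jaccard overlap with the query.
    q = tokenize(query)

    def score(demo):
        d = tokenize(demo[0])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(demos, key=score, reverse=True)[:k]

def build_prompt(query, demos, k=2):
    # Prepend the k retrieved demonstrations, then leave the answer open
    # for the vanilla (not fine-tuned) language model to complete.
    parts = []
    for instruction, response in retrieve(query, demos, k):
        parts.append(f"User: {instruction}\nAssistant: {response}\n")
    parts.append(f"User: {query}\nAssistant:")
    return "\n".join(parts)
```

Because the demonstrations live entirely in the prompt, the model weights stay untouched, which is the property the abstract contrasts with alignment fine-tuning.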

翻訳日:2023-08-09 12:34:54 公開日:2023-08-08

# Lossy and Lossless (L$^2$) 学習後モデルサイズ圧縮

Lossy and Lossless (L$^2$) Post-training Model Size Compression ( http://arxiv.org/abs/2308.04269v1 )

ライセンス: Link先を確認

Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang

(参考訳) ディープニューラルネットワークは驚くべきパフォーマンスをもたらし、様々なビジュアルタスクで広く使われている。しかし、その巨大なサイズは伝送と保存に多大な不便をもたらす。過去の多くの研究でモデルサイズ圧縮が研究されている。しかしながら、これらの研究は、非可逆圧縮と可逆圧縮の各手法を個別に扱うことが多く、高い圧縮比を効率的に達成することが難しい。本研究では,非可逆圧縮と可逆圧縮を統一的に組み合わせた学習後モデルサイズ圧縮法を提案する。本稿ではまず,異なる非可逆圧縮法を学習後に共同で実行できるようにする統一パラメトリック重み変換を提案する。次に、専用の微分可能なカウンタを導入して非可逆圧縮の最適化を導き、後続の可逆圧縮により適した点に到達させる。さらに, 所望のグローバル圧縮比を容易に制御でき, 異なる層に対して適応的な比率を割り当てることができる。最後に,本手法は精度を犠牲にすることなく安定した$10\times$の圧縮比を達成し,わずかな精度低下で$20\times$の圧縮比を短時間で達成できる。私たちのコードはhttps://github.com/ModelTC/L2_Compressionで利用可能です。

Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high compression ratios efficiently. This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. We first propose a unified parametric weight transformation, which ensures different lossy compression methods can be performed jointly in a post-training manner. Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression. Additionally, our method can easily control a desired global compression ratio and allocate adaptive ratios for different layers. Finally, our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Our code is available at https://github.com/ModelTC/L2_Compression .
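
The basic "lossy, then lossless" pipeline can be illustrated with standard-library tools. The sketch below is a generic stand-in, not the paper's method: it assumes uniform quantization as the lossy step and zlib (DEFLATE) as the lossless step, and it omits the unified weight transformation and the differentiable counter that the abstract describes. All function names are hypothetical.

```python
import struct
import zlib

def quantize(weights, step):
    # Lossy step: map each weight to the nearest multiple of `step`.
    return [round(w / step) for w in weights]

def dequantize(codes, step):
    # Reconstruct approximate weights from integer codes.
    return [c * step for c in codes]

def pack_and_compress(codes):
    # Lossless step: store codes as little-endian int16, then DEFLATE them.
    raw = struct.pack(f"<{len(codes)}h", *codes)
    return zlib.compress(raw, 9)

def decompress_and_unpack(blob):
    # Exact inverse of pack_and_compress.
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

Coarser quantization leaves fewer distinct codes, which lowers the entropy seen by the lossless coder; that interaction is why the two stages are worth tuning jointly rather than in isolation.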

翻訳日:2023-08-09 12:34:41 公開日:2023-08-08

# 知識蒸留のための教師学生アーキテクチャ:調査

Teacher-Student Architecture for Knowledge Distillation: A Survey ( http://arxiv.org/abs/2308.04268v1 )

ライセンス: Link先を確認

Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu

(参考訳) ディープニューラルネットワーク(DNN)は、多くの領域で大規模な問題を解決する高い能力を示してきたが、パラメータが膨大であるため、そのようなDNNを実世界のシステムに展開することは困難である。この問題に対処するために,少数のパラメータを持つ単純な学生ネットワークが,多数のパラメータを持つ深い教師ネットワークと同等の性能を達成できる,教師-学生アーキテクチャが提案されている。近年, 知識圧縮, 知識拡張, 知識適応, 知識向上など, 様々な知識蒸留(KD)の目標に対して, 教師-学生アーキテクチャが効果的に広く受け入れられている。教師-学生アーキテクチャの助けを借りて,最近の研究は,軽量で汎用的な学生ネットワークを通じて,複数の蒸留目的を達成することができる。知識圧縮を主眼とする既存のKD調査と異なり、この調査はまず、複数の蒸留目標にわたる教師-学生アーキテクチャについて調査する。本調査では,様々な知識表現とそれに対応する最適化目標について紹介する。さらに, 代表的な学習アルゴリズムと効果的な蒸留スキームを用いて, 教師-学生アーキテクチャを体系的に概観する。この調査は、分類、認識、生成、ランキング、回帰など、様々な目的にまたがる教師-学生アーキテクチャの最近の応用を要約している。最後に,アーキテクチャ設計,知識品質,回帰ベース学習の理論研究を中心に,KDにおける潜在的研究方向を検討する。この包括的調査を通じて、産業実践家や学術コミュニティは、様々な蒸留目的に教師-学生アーキテクチャを効果的に設計、学習、適用するための貴重な洞察とガイドラインを得ることができる。

Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with few parameters can achieve performance comparable to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.

翻訳日:2023-08-09 12:34:21 公開日:2023-08-08

# RLHF-Blender: 多様なヒューマンフィードバックから学ぶための構成可能な対話インタフェース

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback ( http://arxiv.org/abs/2308.04332v1 )

ライセンス: Link先を確認

Yannick Metz, David Lindner, Rapha\"el Baur, Daniel Keim, Mennatallah El-Assady

(参考訳) ヒューマンフィードバック(RLHF)からの強化学習を実用化するためには,多様なフィードバック源から報酬モデルを学習し,異なるタイプのフィードバックの提供に関わる人的要因を検討することが重要である。しかし、多様なフィードバックから学習する体系的な研究は、研究者が利用できる限られた標準ツールによって支えられている。このギャップを埋めるために,人間のフィードバックから学習するための,構成可能な対話型インタフェースであるRLHF-Blenderを提案する。 RLHF-Blenderはモジュラー実験フレームワークと実装を提供しており、研究者は報酬学習のために人間のフィードバックの特性と品質を体系的に研究することができる。このシステムは、デモ、ランキング、比較、自然言語指導を含む様々なフィードバックタイプの探索や、その効果に対するヒューマンファクターの影響を考慮した研究を促進する。 RLHF-ブレンダーによる具体的な研究の機会について論じる。詳細はhttps://rlhfblender.info/を参照。

To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.

翻訳日:2023-08-09 12:27:53 公開日:2023-08-08

# GANを用いたクロスシーン映像のシーン合成によるドメイン適応型人物探索

Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos ( http://arxiv.org/abs/2308.04322v1 )

ライセンス: Link先を確認

Huibing Wang, Tianxiang Cui, Mingze Yao, Huijuan Pang, Yushan Du

(参考訳) 人探しは近年、実際のカメラから特定の歩行者を検索することを目的としているコンピュータビジョン分野において難しい課題となっている。しかしながら、ほとんどの監視ビデオは、歩行者のイメージのみで構成されており、しばしば同じ背景や衣服を特徴としている。したがって,実場面での人物検索において,より識別的な特徴を知ることは困難である。この課題に対処するため、GAN(Generative Adversarial Networks)を用いて監視ビデオからデータを合成する。 GANは高品質な画像を効率よく生成するため、コンピュータビジョンの問題に発展してきた。ビデオの処理や正確な検出結果の取得が可能な,人気の高いFast R-CNNモデルを変更するだけでよい。 2段階モデルがもたらす圧力を適切に軽減するため,我々はAIDQ (Assisted-Identity Query Module) を設計し,後方部に対して肯定的な画像を提供する。さらに,人物検索作業のための高品質な人物画像の合成が可能な,新しいGANベースのシーン合成モデルを提案する。 GANに基づくシーン合成モデルの特徴学習を容易にするため,合成画像とオリジナル画像の協調学習を行うオンライン学習戦略を採用した。 CUHK-SYSU と PRW の2つの広く使われている個人探索ベンチマークによる広範囲な実験により,本手法は高い性能を達成し,より広範なアブレーション研究により,GAN合成データがデータセットの変動性を効果的に増加し,より現実的になることを示す。

Person search has recently become a challenging task in the computer vision domain; it aims to search for specific pedestrians across real cameras. Nevertheless, most surveillance videos comprise only a handful of images of each pedestrian, which often feature identical backgrounds and clothing. Hence, it is difficult to learn more discriminative features for person search in real scenes. To tackle this challenge, we draw on Generative Adversarial Networks (GANs) to synthesize data from surveillance videos. GANs have thrived in computer vision problems because they produce high-quality images efficiently. We merely alter the popular Fast R-CNN model, which is capable of processing videos and yielding accurate detection outcomes. To appropriately relieve the pressure brought by the two-stage model, we design an Assisted-Identity Query Module (AIDQ) to provide positive images for the later part of the model. In addition, we propose a novel GAN-based Scene Synthesis model that can synthesize high-quality cross-id person images for person search tasks. To facilitate the feature learning of the GAN-based Scene Synthesis model, we adopt an online learning strategy that collaboratively learns from the synthesized images and original images. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, show that our method achieves strong performance, and an extensive ablation study further shows that our GAN-synthesized data can effectively increase the variability of the datasets and be more realistic.

翻訳日:2023-08-09 12:27:37 公開日:2023-08-08

# 弱教師付きセマンティックセグメンテーションのための全ペア一貫性学習

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2308.04321v1 )

ライセンス: Link先を確認

Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

(参考訳) 本研究では,Wakly supervised semantic segmentation (WSSS) のためのオブジェクトのローカライズを改良したトランスフォーマーベース正規化を提案する。画像レベルのWSSSでは、擬似セグメンテーションラベルとしてオブジェクトローカライゼーションを生成するためにクラスアクティベーションマップ(CAM)が採用されている。 CAMの部分的なアクティベーション問題に対処するために、様々な画像拡張におけるアクティベーション強度の不変性を維持するために整合正則化を用いる。しかし、これらの手法は各CAM内の領域間のペアワイズ関係を無視し、コンテキストをキャプチャし、画像ビュー間で不変であるべきである。そこで本研究では,新しい全対整合正規化(ACR)を提案する。一対の拡張ビューが与えられた場合、我々のアプローチは、一対の拡張ビュー間でのアクティベーション強度を規則化するとともに、各ビュー内の領域間の親和性が一貫していることを保証する。視覚トランスフォーマーを自己着脱機構として採用し,自然にペアワイズ親和性を埋め込む。これにより、強調画像対の注目行列間の距離を簡易に調整できる。さらに,クラストークンの勾配を利用した新しいクラス単位のローカライズ手法を提案する。我々の手法はアーキテクチャを変更することなくトランスフォーマーを用いて既存のWSSSメソッドにシームレスに統合することができる。 PASCAL VOCおよびMS COCOデータセットを用いて本手法の評価を行った。本手法はクラスローカライゼーションマップ(PASCAL VOC列車の67.3% mIoU)を著しく改善し,WSSS性能が向上した。

In this work, we propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS). In image-level WSSS, the Class Activation Map (CAM) is adopted to generate object localization as pseudo segmentation labels. To address the partial activation issue of CAMs, consistency regularization is employed to maintain activation intensity invariance across various image augmentations. However, such methods ignore pair-wise relations among regions within each CAM, which capture context and should also be invariant across image views. To this end, we propose a new all-pairs consistency regularization (ACR). Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent. We adopt vision transformers, as the self-attention mechanism naturally embeds pair-wise affinity. This enables us to simply regularize the distance between the attention matrices of augmented image pairs. Additionally, we introduce a novel class-wise localization method that leverages the gradients of the class token. Our method can be seamlessly integrated into existing transformer-based WSSS methods without modifying the architectures. We evaluate our method on PASCAL VOC and MS COCO datasets. Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train), resulting in superior WSSS performance.
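
Regularizing "the distance between the attention matrices of augmented image pairs" requires first putting the two matrices into the same region order, since an augmentation such as a horizontal flip moves regions around. The sketch below shows that alignment for a toy case. It assumes the augmentation is a pure permutation of patches and uses a mean absolute difference as the distance; both choices, and the function names, are illustrative assumptions rather than the paper's exact formulation.

```python
def attention_consistency(attn_a, attn_b, perm):
    # perm[i] is the index in view B of the patch that sits at index i in view A.
    # The loss compares every pair (i, j) in A with its counterpart in B.
    n = len(attn_a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(attn_a[i][j] - attn_b[perm[i]][perm[j]])
    return total / (n * n)

def permute_attention(attn, perm):
    # Attention a perfectly equivariant model would produce on the permuted view.
    n = len(attn)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = attn[i][j]
    return out
```

A model whose pair-wise affinities are invariant to the augmentation yields zero loss; any mismatch between corresponding pairs is penalized, which is the all-pairs aspect of the regularizer.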

翻訳日:2023-08-09 12:27:09 公開日:2023-08-08

# サブ回折非コヒーレント光イメージングにおける量子限界 III。数値解析

Quantum limit to subdiffraction incoherent optical imaging. III. Numerical analysis ( http://arxiv.org/abs/2308.04317v1 )

ライセンス: Link先を確認

Xiao-Jie Tan and Mankei Tsang

(参考訳) 遠距離非コヒーレントイメージングの基本的な限界を調べるために、この研究の予備(m. tsang, phys. rev. a 99, 012305 (2019), 104, 052411 (2021)]は、物体モーメント推定誤差の量子下限を研究し、物体サイズに関する境界のスケーリング則を証明した。スケーリングの法則は、消滅する物体の大きさの漸近的極限でのみ証明されたため、この研究は量子境界の数値解析を行い、実際にゼロでない物体サイズでうまく働くことを検証した。また,空間モードデマルチプレクシング (SPADE) と呼ばれる測定値の最適性について検討し,SPADEがスケーリングに追従するだけでなく,少なくとも低次モーメントに対して最適に近い数値的に近いことを示す。

To investigate the fundamental limit to far-field incoherent imaging, the prequels to this work [M. Tsang, Phys. Rev. A 99, 012305 (2019); 104, 052411 (2021)] have studied a quantum lower bound on the error of estimating an object moment and proved a scaling law for the bound with respect to the object size. As the scaling law was proved only in the asymptotic limit of vanishing object size, this work performs a numerical analysis of the quantum bound to verify that the law works well for nonzero object sizes in reality. We also use the numerical bounds to study the optimality of a measurement called spatial-mode demultiplexing or SPADE, showing that SPADE not only follows the scaling but is also numerically close to being optimal, at least for low-order moments.

翻訳日:2023-08-09 12:26:46 公開日:2023-08-08

# 協調マルチエージェントバンド: 最適な個別レグレットと一定通信コストを持つ分散アルゴリズム

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs ( http://arxiv.org/abs/2308.04314v1 )

ライセンス: Link先を確認

Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C.S. Lui, Don Towsley

(参考訳) 近年,一組の分散エージェントが協調的に同じマルチアームバンディットゲームをする,協調型マルチエージェントマルチアームバンディットの研究が盛んに行われている。目標は、最適なグループと個人の後悔とエージェント間のコミュニケーションの少ないバンディットアルゴリズムを開発することである。以前の作業では、リーダフォローと完全な分散アルゴリズムという2つのパラダイムを使用してこの問題に取り組んでいた。両方のパラダイムにおける先行アルゴリズムは、最適なグループ後悔を達成する。リーダー追跡アルゴリズムは一定の通信コストを達成するが、最適な個人の後悔は達成できない。最先端の完全分散アルゴリズムは、最適な個別の後悔を実現するが、一定の通信コストは達成できない。本稿では,シンプルだが効果的な通信方針を示し,協調的盗賊学習アルゴリズムに統合する。我々のアルゴリズムは、最適な個人の後悔と絶え間ないコミュニケーションコストという、両方のパラダイムのベストを達成する。

Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.

翻訳日:2023-08-09 12:26:26 公開日:2023-08-08

# Apple Vision Pro for Healthcare:「究極のディスプレイ」?

Apple Vision Pro for Healthcare: "The Ultimate Display"? ( http://arxiv.org/abs/2308.04313v1 )

ライセンス: Link先を確認

Jan Egger, Christina Gsaxner, Xiaojun Chen, Jiang Bian, Jens Kleesiek, Behrus Puladi

(参考訳) 2023年6月のWorldwide Developers Conference (WWDC)で、AppleはVision Proを発表した。 Vision ProはMR(Mixed Reality)ヘッドセットで、より具体的にはVR(Virtual Reality)デバイスで、VST(Video See-Through)機能が追加されている。 VST機能は、Vision Proを拡張現実(Augmented Reality, AR)デバイスに変える。 AR機能は、カメラを介して現実世界をユーザーの目の前で(VR)スクリーンにストリーミングすることで実現される。もちろんこれはユニークではなく、Varjo XR-3のような他のデバイスと似ている。それでもVision Proには、ヘッドセットの装着者の目が「外」に表示されるインサイド・アウト・スクリーンや、デジタルクラウンと呼ばれる上部のボタンなど、デジタルコンテンツを物理的空間とシームレスにブレンドできる機能があります。さらに、バッテリへのケーブル以外は接続されていないため、varjo xr-3と比較してヘッドセットはより機敏になる。これは、1965年にイヴァン・サザーランドがスケッチした「Ultimate Display」に近いかもしれない。 Ultimate Displayのような一般向けにはまだ公開されていないが、この観点からは、ARがまだ医療分野で直面しているいくつかの臨床的課題を克服できるかどうかを見極めるとともに、Vision Proが臨床医を不可欠なタスクで支援し、患者とより多くの時間を過ごすことができるかどうかを議論したい。

At the Worldwide Developers Conference (WWDC) in June 2023, Apple introduced the Vision Pro. The Vision Pro is a Mixed Reality (MR) headset; more specifically, it is a Virtual Reality (VR) device with an additional Video See-Through (VST) capability. The VST capability also turns the Vision Pro into an Augmented Reality (AR) device. The AR feature is enabled by streaming the real world via cameras to the (VR) screens in front of the user's eyes. This is of course not unique and is similar to other devices, like the Varjo XR-3. Nevertheless, the Vision Pro has some interesting features, like an inside-out screen that can show the headset wearer's eyes to "outsiders", or a button on the top, called the "Digital Crown", that allows you to seamlessly blend digital content with your physical space by turning it. In addition, it is untethered, except for the cable to the battery, which makes the headset more agile compared to the Varjo XR-3. This could actually come closer to the "Ultimate Display", which Ivan Sutherland had already sketched in 1965. Since the Vision Pro, like the Ultimate Display, is not yet available to the public, we want to take a look into the crystal ball in this perspective to see if it can overcome some clinical challenges that AR, in particular, still faces in the medical domain, and also go beyond this to discuss whether the Vision Pro could support clinicians in essential tasks so that they can spend more time with their patients.

翻訳日:2023-08-09 12:26:15 公開日:2023-08-08

# 対話シナリオにおける車両軌道予測のための解釈可能なゴールベースモデル

Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios ( http://arxiv.org/abs/2308.04312v1 )

ライセンス: Link先を確認

Amina Ghoul, Itheri Yahiaoui, Anne Verroust-Blondet, and Fawzi Nashashibi

(参考訳) 都市環境における交通路を予測しながら、車と周囲の社会的相互作用を理解する能力は、自動運転における道路安全に不可欠である。社会的相互作用は不確実性のため説明が難しい。近年、ニューラルネットワークに基づく手法は軌道予測に広く使われており、手作りの手法よりも優れていることが示されている。しかし、これらの手法は解釈可能性の欠如に苦しむ。この制限を克服するために,対話環境における車両軌道予測タスクにおいて,離散的選択モデルの解釈可能性とニューラルネットワークに基づくモデルの高精度を組み合わせる。インタラクションデータセットを用いてモデルを実装し評価し,提案手法の有効性を実証し,精度を損なうことなくその予測を説明する。

The ability to understand the social interaction behaviors between a vehicle and its surroundings while predicting its trajectory in an urban environment is critical for road safety in autonomous driving. Social interactions are hard to explain because of their uncertainty. In recent years, neural network-based methods have been widely used for trajectory prediction and have been shown to outperform hand-crafted methods. However, these methods suffer from a lack of interpretability. To overcome this limitation, we combine the interpretability of a discrete choice model with the high accuracy of a neural network-based model for the task of vehicle trajectory prediction in an interactive environment. We implement and evaluate our model using the INTERACTION dataset and demonstrate the effectiveness of our proposed architecture in explaining its predictions without compromising accuracy.
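
The interpretability of a discrete choice model comes from its form: each candidate gets a utility that is a weighted sum of named features, and a softmax turns utilities into choice probabilities. The sketch below shows a plain multinomial logit over candidate goals. The feature names and weights are hypothetical, and the paper's actual model, which combines such a component with a neural network, is not reproduced here.

```python
import math

def choice_probabilities(candidates, weights):
    # Utility of each candidate = sum of weight * feature value.
    utilities = [
        sum(weights[name] * value for name, value in feats.items())
        for feats in candidates
    ]
    # Multinomial logit: softmax over utilities (stabilized by the max).
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]

# Two hypothetical candidate goals described by hand-picked features.
candidates = [
    {"distance_to_goal": 1.0, "heading_change": 0.0},
    {"distance_to_goal": 1.0, "heading_change": 1.0},
]
# Hypothetical learned weights; their signs can be read directly.
weights = {"distance_to_goal": -0.5, "heading_change": -2.0}

probs = choice_probabilities(candidates, weights)
```

Here the negative weight on `heading_change` can be read as "sharper turns are less likely to be chosen", which is the kind of explanation a black-box predictor does not offer.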

翻訳日:2023-08-09 12:25:50 公開日:2023-08-08

# メタファー検出のためのディープラーニングに基づく知識注入:包括的レビュー

Deep Learning-Based Knowledge Injection for Metaphor Detection: A Comprehensive Review ( http://arxiv.org/abs/2308.04306v1 )

ライセンス: Link先を確認

Cheng Yang, Wenye Zhao, Qingbao Huang

(参考訳) 比喩研究の歴史は知識注入研究の進化を象徴している。近年のディープラーニング技術の進歩により、自然言語処理コミュニティはメタファ認識タスクの成果に知識を適用することに大きな関心を示している。メタファ認識の分野では,知識注入に関するアプローチが徐々に増えてきたが,知識注入に基づくアプローチに関する完全なレビュー記事が不足している。そこで本稿の目的は,メタファ認識タスクにおける知識注入へのディープラーニングの適用における研究の進歩を包括的にレビューすることである。本稿では,主要な知識と知識の注入原則を体系的に要約し,一般化するとともに,メタファ認識タスクで使用されるデータセット,評価指標,ベンチマークモデルをレビューする。最後に,ナレッジインジェクション手法が直面する課題を探究し,今後の研究の方向性を展望する。

The history of metaphor research also marks the evolution of knowledge infusion research. With the continued advancement of deep learning techniques in recent years, the natural language processing community has shown great interest in applying knowledge to achieve successful results in metaphor recognition tasks. Although the number of approaches involving knowledge injection in the field of metaphor recognition has gradually increased, a complete review article on knowledge injection based approaches is still lacking. Therefore, the goal of this paper is to provide a comprehensive review of research advances in the application of deep learning for knowledge injection in metaphor recognition tasks. In this paper, we systematically summarize and generalize the mainstream knowledge and knowledge injection principles, and review the datasets, evaluation metrics, and benchmark models used in metaphor recognition tasks. Finally, we explore the current issues facing knowledge injection methods and provide an outlook on future research directions.

翻訳日:2023-08-09 12:25:40 公開日:2023-08-08

# セマンティック通信システムにおけるモデル反転盗聴攻撃

The Model Inversion Eavesdropping Attack in Semantic Communication Systems ( http://arxiv.org/abs/2308.04304v1 )

ライセンス: Link先を確認

Yuhao Chen, Qianqian Yang, Zhiguo Shi and Jiming Chen

(参考訳) 近年,セマンティックコミュニケーションはコミュニケーション効率の優位性について研究が盛んに行われている。意味コミュニケーションは、生のメッセージから意味を抽出するためにディープラーニングに依存するため、ディープラーニングモデルをターゲットにした攻撃には弱い。本稿では, セマンティック通信システムにおけるプライバシー漏洩のリスクを明らかにするために, モデル反転盗聴攻撃(MIEA)を導入する。 MIEAでは、攻撃者は最初にセマンティック通信システムによって送信される信号を盗聴し、次にモデル反転攻撃を行って生のメッセージを再構築する。ここではホワイトボックスとブラックボックスの両方の設定を考慮する。評価の結果,MIEAは異なるチャネル条件下で良好な品質で生メッセージを再構築できることがわかった。次に, セキュアな意味コミュニケーションを実現するために, ランダムな順列と置換に基づく防御手法を提案する。本研究は,MIEA対策における防衛法の有効性を実証するものである。

In recent years, semantic communication has been a popular research topic for its superiority in communication efficiency. As semantic communication relies on deep learning to extract meaning from raw messages, it is vulnerable to attacks targeting deep learning models. In this paper, we introduce the model inversion eavesdropping attack (MIEA) to reveal the risk of privacy leaks in the semantic communication system. In MIEA, the attacker first eavesdrops on the signal being transmitted by the semantic communication system and then performs a model inversion attack to reconstruct the raw message, where both the white-box and black-box settings are considered. Evaluation results show that MIEA can successfully reconstruct the raw message with good quality under different channel conditions. We then propose a defense method based on random permutation and substitution to defend against MIEA in order to achieve secure semantic communication. Our experimental results demonstrate the effectiveness of the proposed defense method in preventing MIEA.

翻訳日:2023-08-09 12:25:27 公開日:2023-08-08

# 先行情報とセマンティック支援占有グリッドマップを用いた車両運動予測

Vehicle Motion Forecasting using Prior Information and Semantic-assisted Occupancy Grid Maps ( http://arxiv.org/abs/2308.04303v1 )

ライセンス: Link先を確認

Rabbia Asghar, Manuel Diaz-Zapata, Lukas Rummelhard, Anne Spalanzani, Christian Laugier

(参考訳) センサデータの不確実性、未来の非決定論的性質、エージェントの複雑な振る舞いなどにより、自律走行車両の動作予測は困難なタスクである。本稿では,シーンを動的占有グリッドマップ(DOGM)として表現し,占有セルに意味ラベルを関連付け,地図情報を組み込むことにより,この問題に取り組む。本稿では,車両行動予測のための深層学習に基づく時空間的手法と確率論的手法を組み合わせた新しい枠組みを提案する。従来のOGM予測手法とは対照的に,本研究の評価は真値アノテーションに対して行う。実世界のNuScenesデータセットを用いて実験を行い,OGMの予測よりも静的車両と動的車両の予測能力が優れていることを示す。さらに,アブレーション研究を行い,アーキテクチャにおける意味ラベルとマップの役割を評価する。

Motion prediction is a challenging task for autonomous vehicles due to uncertainty in the sensor data, the non-deterministic nature of the future, and the complex behavior of agents. In this paper, we tackle this problem by representing the scene as dynamic occupancy grid maps (DOGMs), associating semantic labels to the occupied cells and incorporating map information. We propose a novel framework that combines deep-learning-based spatio-temporal and probabilistic approaches to predict vehicle behaviors. Contrary to the conventional OGM prediction methods, evaluation of our work is conducted against the ground truth annotations. We experiment and validate our results on the real-world NuScenes dataset and show that our model has a superior ability to predict both static and dynamic vehicles compared to OGM predictions. Furthermore, we perform an ablation study and assess the role of semantic labels and the map in the architecture.

翻訳日:2023-08-09 12:25:12 公開日:2023-08-08

# SSTFormer: フレームイベントに基づく認識のためのブリッジングスパイキングニューラルネットワークとメモリサポートトランス

SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition ( http://arxiv.org/abs/2308.04369v1 )

ライセンス: Link先を確認

Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian

(参考訳) イベントカメラに基づくパターン認識は近年新たに生まれた研究テーマである。現在の研究者は通常、イベントストリームを画像、グラフ、voxelに変換し、イベントベースの分類にディープニューラルネットワークを採用する。しかし、単純なイベント認識データセットでは良いパフォーマンスが得られるが、以下の2つの問題により、結果はまだ限られているかもしれない。まず、空間的にスパースなイベントストリームのみを認識に用いるため、色や詳細なテクスチャ情報をうまくキャプチャできない場合がある。第2に、Spiking Neural Networks (SNN) をエネルギー効率は高いが準最適な結果にとどまる認識に、Artificial Neural Networks (ANN) をエネルギー集約的かつ高性能な認識に採用している。しかし、これら2つの側面のバランスを取ることはほとんど考えていない。本稿では,RGBフレームとイベントストリームを同時に融合してパターンを認識することを提案し,上記の問題に対処する新しいRGBフレームイベント認識フレームワークを提案する。提案手法は,RGBフレーム符号化のためのメモリサポートトランスフォーマーネットワーク,生イベントストリーム符号化のためのスパイクニューラルネットワーク,RGBイベント特徴集約のためのマルチモーダルボトルネック融合モジュール,予測ヘッドの4つの主要モジュールを含む。また,RGB-Eventに基づく分類データセットが不足しているため,DVS346イベントカメラを用いて記録した114のクラスと27102のフレームイベントペアを含む大規模PokerEventデータセットを提案する。 2つのRGB-Eventベースの分類データセットに関する広範な実験により,提案フレームワークの有効性が完全に検証された。この作業により、RGBフレームとイベントストリームを融合することで、パターン認識の開発が促進されることを願っています。この作業のデータセットとソースコードは、https://github.com/Event-AHU/SSTFormer で公開されます。

Event camera-based pattern recognition has emerged as a research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited due to the following two issues. Firstly, they adopt spatially sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. However, few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validated the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and source code of this work will be released at https://github.com/Event-AHU/SSTFormer.

翻訳日:2023-08-09 12:18:12 公開日:2023-08-08

# SLEM:超学習方程式モデリングを用いた経路モデリングと因果推論のための機械学習

SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling ( http://arxiv.org/abs/2308.04365v1 )

ライセンス: Link先を確認

Matthew J. Vowels

(参考訳) 因果推論は科学の重要な目標であり、観測データを用いて仮説的介入の予測に関する有意義な結論に達することができる。経路モデル、構造方程式モデル(SEM)、より一般的には、DAG(Directed Acyclic Graphs)は、現象の根底にある因果構造に関する仮定を明確に特定する手段を提供する。関数形式とパラメトリック形式についてほとんど仮定しないDAGとは異なり、SEMは線型性を仮定する。これにより機能的不特定が生じ、研究者が信頼性の高い効果サイズ推定を行うのを防ぐことができる。これとは対照的に,機械学習のスーパーラーナーアンサンブルを統合するパスモデリング技術であるSuper Learner Equation Modelingを提案する。我々は,因果効果の一貫した不偏な推定値を提供できること,線形モデルではSEMと比較して遜色のない性能を示すことを実証的に示し,非線形関係を扱う場合にはSEMより優れることを強調する。オープンソースのコードとサンプルを使ったチュートリアルノートブックを提供し,メソッドの使いやすさを強調する。

Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.

翻訳日:2023-08-09 12:17:42 公開日:2023-08-08

# 無バイアス画像分割の学習 : プレーン膝X線撮影を例に

Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs ( http://arxiv.org/abs/2308.04356v1 )

ライセンス: Link先を確認

Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, and Ahmad P. Tafti

(参考訳) 膝の骨解剖の自動セグメンテーションは整形外科において不可欠であり,術前および術後のいずれの場面でも数年前から用いられている。深層学習アルゴリズムは医用画像解析において優れた性能を示しているが、これらのモデルにおける公平性と潜在的なバイアスの評価は限られている。本研究では,単純X線写真を用いた深層学習による膝骨解剖学的セグメンテーションを再考し,目に見える性別および人種のバイアスを明らかにすることを目的とした。現在の貢献はバイアスに対する理解を深める可能性を提供し、医療画像の研究者や実践者に実践的な洞察を提供する。提案された緩和戦略は、性別および人種のバイアスを緩和し、公平で偏りのないセグメンテーション結果を保証する。さらに本研究は, 多様な患者集団の正確な診断と治療結果への平等なアクセスを促進し, 公平かつ包括的な医療提供を促進する。

Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered knee-bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The current contribution offers the potential to advance our understanding of biases, and it provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies mitigate gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare provision.

翻訳日:2023-08-09 12:17:24 公開日:2023-08-08

# 3D-VisTA:3Dビジョンとテキストアライメントのためのトレーニング済みトランス

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment ( http://arxiv.org/abs/2308.04352v1 )

ライセンス: Link先を確認

Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li

(参考訳) 3次元視覚言語接地(3D-VL)は、3次元物理世界と自然言語を結びつけることを目的とした新興分野であり、身体性を持つ知能の実現に不可欠である。現在の3D-VLモデルは、洗練されたモジュール、補助的な損失、最適化のトリックに大きく依存しており、単純で統一されたモデルが求められている。本稿では,様々な下流タスクに容易に適応可能な3次元視覚およびテキストアライメントのための事前学習トランスフォーマである3D-VisTAを提案する。 3D-VisTAは、単一のモーダルモデリングとマルチモーダル融合の両方に、高度なタスク固有の設計を使わずに自己アテンション層を利用する。 3D-VLタスクの性能をさらに向上するために,3D-VL事前学習のための初の大規模3DシーンテキストペアデータセットであるScanScribeを構築した。 ScanScribeには、ScanNetと3R-Scanデータセットに由来する1,185の屋内シーンのための2,995のRGB-Dスキャンと、既存の3D-VLタスク、テンプレート、GPT-3から生成された278Kシーン記述が含まれている。 3D-VisTAは、マスク付き言語/オブジェクトモデリングとシーンテキストマッチングによってScanScribe上で事前トレーニングされる。視覚的接地や密接なキャプション、質問応答、位置推論など、様々な3D-VLタスクの最先端結果が得られる。さらに、3D-VisTAはデータ効率が優れており、下流タスクの微調整中に限られたアノテーションでも高い性能が得られる。

3D vision-language grounding (3D-VL) is an emerging field that aims to connect the 3D physical world with natural language, which is crucial for achieving embodied intelligence. Current 3D-VL models rely heavily on sophisticated modules, auxiliary losses, and optimization tricks, which calls for a simple and unified model. In this paper, we propose 3D-VisTA, a pre-trained Transformer for 3D Vision and Text Alignment that can be easily adapted to various downstream tasks. 3D-VisTA simply utilizes self-attention layers for both single-modal modeling and multi-modal fusion without any sophisticated task-specific design. To further enhance its performance on 3D-VL tasks, we construct ScanScribe, the first large-scale 3D scene-text pairs dataset for 3D-VL pre-training. ScanScribe contains 2,995 RGB-D scans for 1,185 unique indoor scenes originating from ScanNet and 3R-Scan datasets, along with paired 278K scene descriptions generated from existing 3D-VL tasks, templates, and GPT-3. 3D-VisTA is pre-trained on ScanScribe via masked language/object modeling and scene-text matching. It achieves state-of-the-art results on various 3D-VL tasks, ranging from visual grounding and dense captioning to question answering and situated reasoning. Moreover, 3D-VisTA demonstrates superior data efficiency, obtaining strong performance even with limited annotations during downstream task fine-tuning.

翻訳日:2023-08-09 12:17:07 公開日:2023-08-08

# 国籍バイアスを解き放つ:AI生成記事における人間による国籍の認識に関する研究

Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles ( http://arxiv.org/abs/2308.04346v1 )

ライセンス: Link先を確認

Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao `Kenneth' Huang and Shomir Wilson

(参考訳) 自然言語処理(NLP)モデルにおける国籍バイアスの可能性について,人間の評価手法を用いて検討した。バイアス付きNLPモデルは、ステレオタイプを永続させ、アルゴリズムによる差別につながる可能性がある。本研究は,テキスト生成モデルにおける国籍バイアスの影響を定量的かつ定性的に把握するための2段階の混合手法を用いる。人間中心の定量的分析を通じて、AIソースが生成した記事の国籍バイアスの程度を測定する。次に,被験者との公開面接を行い,質的コーディングと主題分析を行い,これらのバイアスが人間の読者に与える影響を理解する。以上の結果から,NLPモデルでは既存の社会的バイアスを再現・増幅する傾向があり,社会工学的な場面で使用すれば害につながる可能性が示唆された。インタビューから得られた質的な分析は、読者がそのような記事に遭遇する際の体験についての洞察を与え、読者の国に対する認識を変える可能性を強調している。これらの知見は、AIが社会に与える影響を形作り、AIシステムのバイアスを正す必要性において、公衆の認識が重要な役割を担っていることを強調している。

We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems.

翻訳日:2023-08-09 12:16:39 公開日:2023-08-08

# クロスモーダル検索のためのトランスフォーマによる2ストリームエンコーダの統合

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval ( http://arxiv.org/abs/2308.04343v1 )

ライセンス: Link先を確認

Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

(参考訳) 既存のクロスモーダル検索手法の多くは、画像とテキストで異なるアーキテクチャを持つ2ストリームエンコーダ、例えば画像にはCNN、テキストにはRNN/Transformerを使用している。このようなアーキテクチャの相違は、異なる意味的分布空間を誘導し、画像とテキスト間の相互作用を制限し、さらに画像とテキストのアライメントが劣る可能性がある。この研究ギャップを埋めるため,視覚タスクにおけるトランスフォーマーの最近の進歩に触発されて,両モダリティのエンコーダアーキテクチャをトランスフォーマーで統一することを提案する。具体的には、画像トランスフォーマー、テキストトランスフォーマー、階層アライメントモジュールからなる、純粋に2ストリームトランスフォーマーに基づくクロスモーダル検索フレームワークHierarchical Alignment Transformers (HAT)を設計する。このような同一のアーキテクチャでは、エンコーダは画像とテキストに対してより類似した特徴を持つ表現を生成し、それらの相互作用やアライメントをより容易にすることができる。さらに、リッチセマンティクスを活用するために、画像とテキストの間の異なるレイヤのマルチレベル対応を探索するための階層的アライメントスキームを考案する。提案するHATの有効性を評価するため,MSCOCOとFlickr30Kという2つのベンチマークデータセットについて広範な実験を行った。実験の結果,HATはSOTAベースラインよりも大きなマージンで優れていた。具体的には、image-to-text検索とtext-to-image検索という2つの主要なタスクにおいて、HATはMSCOCOでのRecall@1の相対スコア改善7.6%と16.7%、Flickr30kでは4.4%と11.6%をそれぞれ達成する。コードは https://github.com/LuminosityX/HAT で入手できる。

Most existing cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such discrepancy in architectures may induce different semantic distribution spaces and limit the interactions between images and texts, and further result in inferior alignment between images and texts. To fill this research gap, inspired by recent advances of Transformers in vision tasks, we propose to unify the encoder architectures with Transformers for both modalities. Specifically, we design a cross-modal retrieval framework purely based on two-stream Transformers, dubbed Hierarchical Alignment Transformers (HAT), which consists of an image Transformer, a text Transformer, and a hierarchical alignment module. With such identical architectures, the encoders could produce representations with more similar characteristics for images and texts, and make the interactions and alignments between them much easier. Besides, to leverage the rich semantics, we devise a hierarchical alignment scheme to explore multi-level correspondences of different layers between images and texts. To evaluate the effectiveness of the proposed HAT, we conduct extensive experiments on two benchmark datasets, MSCOCO and Flickr30K. Experimental results demonstrate that HAT outperforms SOTA baselines by a large margin. Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30k respectively. The code is available at https://github.com/LuminosityX/HAT.
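The Recall@1 figures in this abstract are relative improvements, which are easy to confuse with absolute point gains. The following is a minimal Python sketch of how Recall@K and a relative improvement are commonly computed. The function names and the toy rankings are hypothetical and are not taken from the HAT code base.

```python
from typing import Dict, List, Set


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 1) -> float:
    """Return 1.0 if any relevant item appears in the top-k results, else 0.0."""
    return 1.0 if any(item in relevant_ids for item in ranked_ids[:k]) else 0.0


def mean_recall_at_k(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 1
) -> float:
    """Average the per-query Recall@K over all queries."""
    scores = [recall_at_k(rankings[q], relevant[q], k) for q in rankings]
    return sum(scores) / len(scores)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative gain in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Hypothetical toy data: two image queries, each with a ranked list of captions.
rankings = {
    "img1": ["cap_a", "cap_b", "cap_c"],
    "img2": ["cap_c", "cap_d", "cap_a"],
}
relevant = {"img1": {"cap_a"}, "img2": {"cap_d"}}

r1 = mean_recall_at_k(rankings, relevant, k=1)
r2 = mean_recall_at_k(rankings, relevant, k=2)
```

Under this definition, a move from a Recall@1 of 50.0 to 60.0 is a 10-point absolute gain but a 20% relative improvement, which is the kind of figure the abstract reports.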

翻訳日:2023-08-09 12:16:17 公開日:2023-08-08

# 正確、説明可能、プライベートモデル:トレーニングデータの漏洩を最小限に抑えながらリコースを提供する

Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage ( http://arxiv.org/abs/2308.04341v1 )

ライセンス: Link先を確認

Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju

(参考訳) 機械学習モデルは、個々の結果を予測するために、影響のある領域でますます利用されています。このように、多くのモデルは、否定的な結果を受ける個人にアルゴリズム的リコースを提供する。しかし、リコースは敵によってプライベートな情報を開示するために利用される可能性がある。この研究はそのような攻撃を緩和する最初の試みである。本稿では,差分プライベートなリコースを生成する2つの新しい手法,Differentially Private Model(DPM)とLaplace Recourse(LR)を提案する。ロジスティック回帰分類器と実世界および合成データセットを用いて、DPMとLRは、特に低FPRにおいて、敵対者が推論できることを減らすのに有効であることがわかった。トレーニングデータセットのサイズが十分に大きい場合、新しいLR法により、モデルとリコースの精度を維持しながらプライバシーの漏洩を防ぐことに特に成功した。

Machine learning models are increasingly utilized across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR). Using logistic regression classifiers and real-world and synthetic datasets, we find that DPM and LR perform well in reducing what an adversary can infer, especially at a low false positive rate (FPR). When the training dataset is large enough, we find particular success in preventing privacy leakage while maintaining model and recourse accuracy with our novel LR method.
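The abstract does not spell out how Laplace Recourse works, so the sketch below does not reproduce it. It only illustrates the generic Laplace mechanism from differential privacy, in which noise with scale sensitivity/epsilon is added to a released value. Applying it to a recourse vector, and every name and parameter value here, is an assumption made for illustration.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(
    values: List[float], sensitivity: float, epsilon: float, rng: random.Random
) -> List[float]:
    """Add Laplace noise with scale = sensitivity / epsilon to each value."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical recourse vector: suggested changes to three features.
rng = random.Random(0)
recourse = [1.0, 2.0, 3.0]
noisy_recourse = laplace_mechanism(recourse, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller epsilon gives a larger noise scale, which reflects the usual trade-off between privacy and the accuracy of what is released.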

翻訳日:2023-08-09 12:15:41 公開日:2023-08-08

# Retinafaceに基づく軽量かつ高精度な顔検出アルゴリズム

A Lightweight and Accurate Face Detection Algorithm Based on Retinaface ( http://arxiv.org/abs/2308.04340v1 )

ライセンス: Link先を確認

Baozhu Liu, Hewei Yu

(参考訳) 本稿では,Retinaface を用いた軽量かつ高精度な顔検出アルゴリズム LAFD (Light and accurate face detection) を提案する。アルゴリズムのバックボーンネットワークは、畳み込みカーネルのサイズ、反転残差ブロックのチャネル拡大乗算器、SEアテンション機構の使用を調整する改良されたMobileNetV3ネットワークである。変形可能な畳み込みネットワーク(DCN)がコンテキストモジュールに導入され、アルゴリズムはモデルの分類損失関数としてクロスエントロピー損失関数の代わりに焦点損失関数を使用する。 WIDERFACEデータセットの試験結果は、LAFDの平均精度が「easy」「medium」「hard」の各検証サブセットでそれぞれ94.1%、92.2%、82.1%であり、Retinafaceと比較してそれぞれ3.4%、4.0%、8.3%の改善、優れた軽量モデルであるLFFDよりも3.1%、4.1%、4.1%高いことを示している。入力画像が前処理され、長さ1560pxまたは幅1200pxにスケールされた場合、「hard」検証サブセットの平均精度は86.2%となる。モデルは軽量で、サイズは10.2MBである。

In this paper, we propose a lightweight and accurate face detection algorithm LAFD (Light and accurate face detection) based on Retinaface. The backbone network of the algorithm is a modified MobileNetV3 network which adjusts the size of the convolution kernel, the channel expansion multiplier of the inverted residuals block and the use of the SE attention mechanism. A deformable convolution network (DCN) is introduced in the context module, and the algorithm uses the focal loss function instead of the cross-entropy loss function as the classification loss function of the model. The test results on the WIDERFACE dataset indicate that the average accuracy of LAFD is 94.1%, 92.2% and 82.1% for the "easy", "medium" and "hard" validation subsets respectively, an improvement of 3.4%, 4.0% and 8.3% compared to Retinaface and 3.1%, 4.1% and 4.1% higher than the well-performing lightweight model, LFFD. If the input image is pre-processed and scaled to 1560px in length or 1200px in width, the model achieves an average accuracy of 86.2% on the "hard" validation subset. The model is lightweight, with a size of only 10.2MB.
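To make the switch from cross-entropy to focal loss concrete, here is a minimal Python sketch of the standard binary focal loss, -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma = 2.0 and alpha = 0.25 are common choices from the focal loss literature and are an assumption here; the abstract does not state which values LAFD uses.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one prediction p = P(face) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confidently correct) versus a hard positive (badly wrong).
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples much more than that of misclassified ones, so the many easy background anchors in a detector contribute less to training than under plain cross-entropy.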

翻訳日:2023-08-09 12:15:27 公開日:2023-08-08

# 画像分類によるサンゴ礁の損傷検出モデルの開発

Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra ( http://arxiv.org/abs/2308.04337v1 )

ライセンス: Link先を確認

Fadhil Muhammad, Alif Bintang Elfandra, Iqbal Pahlevi Amin, Alfan Farizki Wicaksono

(参考訳) インドネシア海域のサンゴ礁の生物多様性は貴重な資産であり、保存する必要がある。急速な気候変動と人的活動はサンゴ礁の生態系を悪化させ、サンゴの白化はサンゴの健康状態の重要な指標となっている。そこで本研究では,健康サンゴと漂白サンゴを区別する正確な分類モデルを開発することを目的としている。本研究はFlickr APIを用いてFlickrから収集した923の画像からなる特別なデータセットを利用する。データセットは、健康サンゴ(438画像)と漂白サンゴ(485画像)の2つの異なるクラスで構成されている。これらの画像は最大300ピクセルの幅や高さにリサイズされ、データセット全体にわたって一貫したサイズを維持している。本研究で用いられる方法は、機械学習モデル、特に畳み込みニューラルネットワーク(CNN)を用いて、健康で漂白したサンゴの視覚パターンを認識し識別することである。この文脈では、データセットは、最適な結果を得るために様々な分類モデルのトレーニングとテストに使用できる。 ResNetモデルを利用することで、スクラッチから学習したResNetモデルは、適合率と正解率の点で事前学習済みモデルより優れうることがわかった。正確な分類モデルの開発の成功は、サンゴ礁の健康をよりよく理解する研究者や海洋生物学者に大いに役立つだろう。これらのモデルはサンゴ礁環境の変化をモニタリングするためにも用いられるため、生命に大きく影響する保護と生態系の回復に重要な貢献をする。

The abundant biodiversity of coral reefs in Indonesian waters is a valuable asset that needs to be preserved. Rapid climate change and uncontrolled human activities have led to the degradation of coral reef ecosystems, including coral bleaching, which is a critical indicator of coral health conditions. Therefore, this research aims to develop an accurate classification model to distinguish between healthy corals and corals experiencing bleaching. This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API. The dataset comprises two distinct classes: healthy corals (438 images) and bleached corals (485 images). These images have been resized to a maximum of 300 pixels in width or height, whichever is larger, to maintain consistent sizes across the dataset. The method employed in this research involves the use of machine learning models, particularly convolutional neural networks (CNN), to recognize and differentiate visual patterns associated with healthy and bleached corals. In this context, the dataset can be used to train and test various classification models to achieve optimal results. By leveraging the ResNet model, it was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy. The success in developing accurate classification models will greatly benefit researchers and marine biologists in gaining a better understanding of coral reef health. These models can also be employed to monitor changes in the coral reef environment, thereby making a significant contribution to conservation and ecosystem restoration efforts that have far-reaching impacts on life.

翻訳日:2023-08-09 12:15:03 公開日:2023-08-08

# ガーナのNational Science and Maths Quizを勝ち取るAIを目指して

Towards an AI to Win Ghana's National Science and Maths Quiz ( http://arxiv.org/abs/2308.04333v1 )

ライセンス: Link先を確認

George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah

(参考訳) AIはガーナのNational Science and Maths Quiz (NSMQ)に勝つことができるか? これは,NSMQにライブで出場して勝利するAIを構築するオープンソースプロジェクトであるNSMQ AIプロジェクトで我々が答えようとしている問いである。 NSMQは、ガーナの高校生を対象に毎年開催されるライブの科学・数学大会であり,生徒2人からなる3チームが,生物学、化学、物理学、数学にわたる質問に5ラウンドで答え,5つの段階を勝ち進んで,その年の優勝チームが決まる。 NSMQは、音声テキスト変換、テキスト音声変換、質問応答、人間とコンピュータのインタラクションなど、興味深い技術的課題を抱える、エキサイティングなライブクイズコンペティションである。 2023年1月に始まったこの進行中の作業の中で、プロジェクトの概要、各チーム、これまでの進捗状況、そして10月のNSMQ 2023で計画しているAIのローンチとデビューに向けた次のステップについて説明する。この大きな課題を克服するAIは、アフリカの何百万人もの学生がこのAIから一対一の学習支援を受けられるようにするなど、教育に現実的な影響を与える可能性がある。

Can an AI win Ghana's National Science and Maths Quiz (NSMQ)? That is the question we seek to answer in the NSMQ AI project, an open-source project that is building AI to compete live in the NSMQ and win. The NSMQ is an annual live science and mathematics competition for senior secondary school students in Ghana in which 3 teams of 2 students compete by answering questions across biology, chemistry, physics, and math in 5 rounds over 5 progressive stages until a winning team is crowned for that year. The NSMQ is an exciting live quiz competition with interesting technical challenges across speech-to-text, text-to-speech, question-answering, and human-computer interaction. In this ongoing work that began in January 2023, we give an overview of the project, describe each of the teams, progress made thus far, and the next steps toward our planned launch and debut of the AI in October for NSMQ 2023. An AI that conquers this grand challenge can have real-world impact on education such as enabling millions of students across Africa to have one-on-one learning support from this AI.

翻訳日:2023-08-09 12:14:36 公開日:2023-08-08

# シーケンス生成のための大規模言語モデルからの学習評価モデル

Learning Evaluation Models from Large Language Models for Sequence Generation ( http://arxiv.org/abs/2308.04386v1 )

ライセンス: Link先を確認

Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu

(参考訳) 大規模言語モデルはシーケンス生成評価において最先端のパフォーマンスを実現するが、一般的に多くのパラメータを持つ。これは、その評価能力を大規模に適用する際の計算上の課題となる。本稿では,この課題を克服するために,LLMから比較的軽量な言語モデルへ評価能力を移す評価能力転移(ECT: evaluation capability transfer)法を提案する。提案するECTに基づいて、ChatGPTから様々な評価モデルを学び、強化学習と再ランキングアプローチによるシーケンス生成モデルの改善に報酬モデルとして活用する。機械翻訳, テキストスタイル転送, 要約タスクの実験結果から, ECTの有効性が示された。特に、学習した評価モデルをシーケンス生成モデルに適用すると、一般的なメトリクスやChatGPTで評価されるように、より優れた生成シーケンスが得られる。

Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters. This poses a computational challenge when applying their evaluation capability at scale. To overcome the challenge, in this paper, we propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models. Based on the proposed ECT, we learn various evaluation models from ChatGPT, and employ them as reward models to improve sequence generation models via reinforcement learning and reranking approaches. Experimental results on machine translation, text style transfer, and summarization tasks demonstrate the effectiveness of our ECT. Notably, applying the learned evaluation models to sequence generation models results in better generated sequences as evaluated by commonly used metrics and ChatGPT.
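The reranking use of a learned evaluation model can be sketched in a few lines of Python: generate several candidates, score each with the evaluator, and keep the best. The scorer below is a hypothetical stand-in based on word overlap with a reference; it is not the ECT evaluation model, which would be a trained language model that needs no reference of this kind.

```python
from typing import Callable, List


def rerank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Sort candidates from highest to lowest evaluator score."""
    return sorted(candidates, key=score, reverse=True)


def make_toy_scorer(reference: str) -> Callable[[str], float]:
    """Hypothetical stand-in evaluator: fraction of reference words covered."""
    ref_words = set(reference.split())

    def score(text: str) -> float:
        return len(ref_words & set(text.split())) / len(ref_words)

    return score


# Hypothetical candidates from a sequence generation model.
candidates = ["the cat sat", "a dog", "the cat sat on the mat"]
scorer = make_toy_scorer("the cat sat on the mat")
ranked = rerank(candidates, scorer)
best = ranked[0]
```

Because the evaluator is called once per candidate, a lightweight evaluation model keeps this step cheap even when many candidates are sampled, which is the motivation the abstract gives for transferring the capability out of a large model.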

翻訳日:2023-08-09 12:09:26 公開日:2023-08-08

# DELFlow: 大規模点群のためのシーンフローの高密度かつ効率的な学習

DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds ( http://arxiv.org/abs/2308.04383v1 )

ライセンス: Link先を確認

Chensheng Peng, Guangming Wang, Xian Wan Lo, Xinrui Wu, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang

(参考訳) 点群は本質的に疎であり、画像ピクセルは密である。この不整合が、点単位のシーンフロー推定における両モダリティの特徴融合を制限している。従来の手法では,局所的な特徴集約によく用いられる最遠点サンプリング,KNN,ボールクエリアルゴリズムに伴う距離計算とソートによるメモリ非効率と大きなオーバーヘッドのため,1回の推論でシーン全体の点群からシーンフローを予測することはほとんどなかった。シーンフロー学習におけるこれらの問題を緩和するため、3次元座標を2次元グリッドに格納することにより、生の点を密な形式に正規化する。既存の研究でよく使われるサンプリング操作とは異なり,密な2次元表現は 1) 所定のシーンのほとんどの点を保持し, 2) 効率を大幅に向上させ, 3) 点と画素間の密度ギャップを排除して効果的な特徴融合を可能にする。また,コストボリュームの計算時に,投影中に複数の点が1つのグリッドにマッピングされうることに起因する情報損失問題を軽減するための新しいワーピング投影手法を提案する。十分な実験により,FlyingThings3DとKITTIデータセットにおいて従来技術を上回る,本手法の効率性と有効性が実証された。

Point clouds are naturally sparse, while image pixels are dense. The inconsistency limits feature fusion from both modalities for point-wise scene flow estimation. Previous methods rarely predict scene flow from the entire point cloud of a scene with one-time inference due to the memory inefficiency and heavy overhead from distance calculation and sorting involved in commonly used farthest point sampling, KNN, and ball query algorithms for local feature aggregation. To mitigate these issues in scene flow learning, we regularize raw points to a dense format by storing 3D coordinates in 2D grids. Unlike the sampling operation commonly used in existing works, the dense 2D representation 1) preserves most points in the given scene, 2) brings in a significant boost of efficiency, and 3) eliminates the density gap between points and pixels, allowing us to perform effective feature fusion. We also present a novel warping projection technique to alleviate the information loss problem resulting from the fact that multiple points could be mapped into one grid during projection when computing cost volume. Sufficient experiments demonstrate the efficiency and effectiveness of our method, outperforming the prior art on the FlyingThings3D and KITTI datasets.
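The idea of storing 3D coordinates in a 2D grid can be sketched as follows, assuming a simple pinhole projection. The intrinsics (fx, fy, cx, cy), the grid size, and the keep-the-nearest-point rule for collisions are all assumptions made for illustration. In particular, this is not the paper's warping projection technique; it only shows where the information loss from multiple points landing in one cell comes from.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_grid(
    points: Sequence[Point],
    fx: float, fy: float, cx: float, cy: float,
    height: int, width: int,
) -> Tuple[List[List[Optional[Point]]], int]:
    """Store each 3D point in the grid cell it projects to; count collisions."""
    grid: List[List[Optional[Point]]] = [[None] * width for _ in range(height)]
    collisions = 0
    for x, y, z in points:
        if z <= 0.0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # column index
        v = int(round(fy * y / z + cy))  # row index
        if not (0 <= u < width and 0 <= v < height):
            continue  # point projects outside the grid
        current = grid[v][u]
        if current is None:
            grid[v][u] = (x, y, z)
        else:
            collisions += 1  # two points share a cell, so one must be dropped
            if z < current[2]:
                grid[v][u] = (x, y, z)  # assumed rule: keep the nearer point
    return grid, collisions


# Hypothetical toy scene: the first two points project to the same cell.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
grid, collisions = project_to_grid(
    points, fx=2.0, fy=2.0, cx=2.0, cy=2.0, height=5, width=5
)
```

Once the points sit in a regular grid, neighbors can be found by index offsets instead of distance sorting, which is the efficiency argument made in the abstract.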

翻訳日:2023-08-09 12:09:07 公開日:2023-08-08

# 1次元量子多体系における活性誘起強磁性

Activity-induced ferromagnetism in one-dimensional quantum many-body systems ( http://arxiv.org/abs/2308.04382v1 )

ライセンス: Link先を確認

Kazuaki Takasan, Kyosuke Adachi, Kyogo Kawaguchi

(参考訳) 自己推進体のアンサンブルである活性物質は、様々な非平衡相転移を示す。本稿では,活性物質の原型モデルであるVicsekモデルに類似した1次元の非エルミート量子多体モデルを構築し,その量子相転移について検討する。このモデルは,強磁性相互作用と活性(スピン依存の非対称ホッピング)を伴う2成分ハードコアボソンから構成される。数値的な結果は、活性によって誘起される強磁性秩序の出現を示す。これは古典系におけるフロッキングの量子版であり,強磁性相互作用がなくても生き残る。摂動理論と2粒子の場合の解法により、2粒子レベルでの非エルミート表皮効果がこのフロッキング相に不可欠であることがわかった。この効果を考慮に入れ,2サイト平均場理論を用いて数値的に求めた相図を定性的に再現する。さらに,ハードコア条件が緩和されたモデルの変形を数値的に検討し,強磁性秩序のロバスト性を確認した。

Active matter, an ensemble of self-propelled entities, exhibits various nonequilibrium phase transitions. In this paper, we construct a non-Hermitian quantum many-body model in one dimension analogous to the Vicsek model, a prototypical model of active matter, and investigate its quantum phase transitions. The model consists of two-component hard-core bosons undergoing ferromagnetic interactions and with activity: spin-dependent asymmetric hopping. Numerical results show the emergence of a ferromagnetic order induced by the activity, which is a quantum counterpart of flocking in classical examples, and it even survives without the ferromagnetic interaction. We find through perturbation theory and solving the two-particle case that the non-Hermitian skin effect at the two-particle level is crucial for this flocking phase. To take this effect into account, we employ a two-site mean-field theory and qualitatively reproduce the numerically obtained phase diagram. We further numerically study a variant of our model, where the hard-core condition is relaxed, and confirm the robustness of the ferromagnetic order.

翻訳日:2023-08-09 12:08:46 公開日:2023-08-08

# あなたの否定は真否定ではないかもしれない:偽陰性除去による画像テキストマッチングの促進

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination ( http://arxiv.org/abs/2308.04380v1 )

ライセンス: Link先を確認

Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

(参考訳) 既存の画像テキストマッチング手法の多くは最適化目的として三重項損失を採用しており、モデルを効果的に訓練するには<anchor, positive, negative> の3重項に対する適切な負のサンプルを選択することが重要である。しかし, 既存の手法では, ほぼ同様の試料をハード負として用いるが, 真の負ではない可能性がある。言い換えれば、アンカーと組み合わせていない高い類似性を持つサンプルは、正の意味的関連を保留し、それらを偽陰性と呼ぶ。これらの偽陰性を三重項損失で撃退することは、意味表現学習を誤解させ、検索性能を低下させる。本稿では,偽陰性から生じる問題を緩和できる新しい偽陰性除去法を提案する。具体的には,画像エンコーダとテキストエンコーダから抽出した特徴に基づいて,まず,アンカーとの類似性から正と負のサンプルの分布を別々に構築する。得られたサンプルの偽陰性確率は、アンカーとの類似性および上記の分布に基づいてベイズの法則を用いて計算し、これは負のサンプリング過程においてサンプリング重量として用いられる。小さいバッチサイズでは偽陰性は存在しないかもしれないので、大きな負のバッファを保持するために運動量を持つメモリモジュールを設計し、バッファにまたがる負のサンプリング戦略を実装します。さらに, モデルが強陰性に焦点を合わせるために, 単純な負のサンプリング重みをカットダウン戦略で再割り当てする。 Flickr30KとMS-COCOで大規模な実験を行い,提案した偽陰性除去戦略の優位性を実証した。コードはhttps://github.com/luminosityx/fneで入手できる。

Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, which may not be true negatives. In other words, samples with high similarity that are not paired with the anchor may retain positive semantic associations, and we call them false negatives. Repelling these false negatives in triplet loss would mislead the semantic representation learning and result in inferior retrieval performance. In this paper, we propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling, which could alleviate the problem introduced by false negatives. Specifically, we first construct the distributions of positive and negative samples separately via their similarities with the anchor, based on the features extracted from image and text encoders. Then we calculate the false negative probability of a given sample based on its similarity with the anchor and the above distributions via Bayes' rule, which is employed as the sampling weight during the negative sampling process. Since there may not exist any false negative in a small batch, we design a memory module with momentum to retain a large negative buffer and implement our negative sampling strategy spanning over the buffer. In addition, to make the model focus on hard negatives, we reassign the sampling weights for the simple negatives with a cut-down strategy. Extensive experiments are conducted on Flickr30K and MS-COCO, and the results demonstrate the superiority of our proposed false negative elimination strategy. The code is available at https://github.com/LuminosityX/FNE.
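
The Bayes' rule step can be sketched as follows. This is a minimal illustration under stated assumptions, not the FNE code: the similarity distributions of positives and negatives are modeled as Gaussians, and `prior_pos` (the prior probability that a sample is semantically positive) is a hypothetical parameter. How the resulting probability is turned into a sampling weight is deliberately left out.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def false_negative_probability(sim, pos_mean, pos_std, neg_mean, neg_std, prior_pos):
    """Posterior P(semantically positive | similarity) via Bayes' rule.

    Assumes Gaussian similarity distributions for positives and negatives;
    the parameter names are illustrative only.
    """
    p_pos = gaussian_pdf(sim, pos_mean, pos_std) * prior_pos
    p_neg = gaussian_pdf(sim, neg_mean, neg_std) * (1.0 - prior_pos)
    return p_pos / (p_pos + p_neg)
```

A sample whose similarity lies near the positive distribution gets a probability close to 1, and one near the negative distribution gets a probability close to 0. For similarities far from both means the two densities can underflow to zero, so a practical version would work in log space.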

翻訳日:2023-08-09 12:08:28 公開日:2023-08-08

# 3+1次元における時空対称量子力学

Space-time-symmetric quantum mechanics in 3+1 dimensions ( http://arxiv.org/abs/2308.04376v1 )

ライセンス: Link先を確認

Eduardo O. Dias

(参考訳) 従来の量子力学(QM)では、時間はパラメータとして$t$として扱われ、時間に関する量子状態の進化は${\hat {H}}|\psi(t)\rangle=i\hbar \frac{d}{dt}|\psi(t)\rangle$で記述される。 QM の最近提案された時空対称(STS)拡張では、位置がパラメータとなり、新しい量子状態 $|\phi(x)\rangle$ が導入された。この状態は、粒子の到着時刻が $x$ の位置で記述され、到着時刻が $x$ に対して変化する方法は ${\hat {p}}|\phi(x)\rangle=-i\hbar \frac{d}{dx} |\phi(x)\rangle$ によって制御される。本研究では,三次元空間を移動する粒子へのSTS拡張を一般化する。従来のQMと3次元STS拡張を組み合わせることで、動的方程式 ${\hat { P}}^{\mu}|{\phi }^\mu(x^{\mu})\rangle=-i \hbar~\eta^{\mu\nu}\frac{d}{dx^{\nu}}|{\phi}^\mu (x^{\mu})\rangle$ で与えられる `full'' STS QM が得られる。 x^\mu$を選択すると、Schr\"odinger方程式($x^\mu=x^0=t$)または3次元STS拡張($x^\mu=x^i=$または$x$、$y$、または$z$)を復元できる。 x^\mu=x$ を選択することにより、自由粒子に対する STS QM の動的方程式を解き、波動関数 $\langle t,y,z|\phi^1(x)\rangle$ を計算する。この波動関数は、検出器がx$の位置にあるyz$平面全体を占有していることを考えると、即時$t$で到達する粒子の確率振幅(y$,z$)を表す。注目すべきことに、$|\langle t,y,z|\phi (x)\rangle|^2$ in $y$ と $z$ の積分は、公理的キョフスキ分布の3次元版の形を取る。

In conventional quantum mechanics (QM), time is treated as a parameter, $t$, and the evolution of the quantum state with respect to time is described by ${\hat {H}}|\psi(t)\rangle=i\hbar \frac{d}{dt}|\psi(t)\rangle$. In a recently proposed space-time-symmetric (STS) extension of QM, position becomes the parameter and a new quantum state, $|\phi(x)\rangle$, is introduced. This state describes the particle's arrival time at position $x$, and the way the arrival time changes with respect to $x$ is governed by ${\hat {P}}|\phi(x)\rangle=-i\hbar \frac{d}{dx} |\phi(x)\rangle$. In this work, we generalize the STS extension to a particle moving in three-dimensional space. By combining the conventional QM with the three-dimensional STS extension, we have a "full" STS QM given by the dynamic equation ${\hat { P}}^{\mu}|{\phi }^\mu(x^{\mu})\rangle=- i \hbar~\eta^{\mu\nu}\frac{d}{dx^{\nu}}|{\phi}^\mu (x^{\mu})\rangle$, where $x^{\mu}$ is the coordinate chosen as the parameter of the state. Depending on the choice of $x^\mu$, we can recover either the Schrödinger equation (with $x^\mu=x^0=t$) or the three-dimensional STS extension (with $x^\mu=x^i=$ either $x$, $y$, or $z$). By selecting $x^\mu=x$, we solve the dynamic equation of the STS QM for a free particle and calculate the wave function $\langle t,y,z|\phi^1(x)\rangle$. This wave function represents the probability amplitude of the particle arriving at position $(y, z)$ at instant $t$, given that the detector occupies the entire $yz$-plane located at position $x$. Remarkably, we find that the integral of $|\langle t,y,z|\phi (x)\rangle|^2$ in $y$ and $z$ takes the form of the three-dimensional version of the axiomatic Kijowski distribution.

翻訳日:2023-08-09 12:07:56 公開日:2023-08-08

# 人間-ai共同臨床意思決定における反事実的説明が信頼と信頼に及ぼす影響の理解

Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making ( http://arxiv.org/abs/2308.04375v1 )

ライセンス: Link先を確認

Min Hun Lee, Chong Jun Chew

(参考訳) 人工知能(AI)は、ハイテイクドメイン(例えば健康)における人間の意思決定を支援するものと考えられている。しかし、研究者は人間のAI補完的なパフォーマンスを達成する代わりに、人間がAIモデルの間違った提案を過度に評価できるという問題を議論してきた。そこで本研究では,AIに対する信頼度を低下させるため,AI提案をより分析的にレビューする上で,有能な特徴説明に加えて,臨床意思決定におけるAIへの信頼度と信頼度への影響を検討した。我々は,7人のセラピストと10人のレイパーを対象に,ストローク後の生存者の動作の質を評価するための実験を行い,そのパフォーマンス,タスクの合意レベル,AIへの依存度を2種類のAIの説明なしで分析した。その結果,「正しい」aiアウトプットが提示された場合,aiモデルがセラピストや素直な説明を補助し,作業の成果や合意レベルを改善することができた。セラピストもレイパーもAIのアウトプットを過度に頼っていたが、反ファクト的な説明はセラピストとレイパーの双方が、優れた特徴説明と比較して「ホワイト」AIのアウトプットへの過度な依存を21倍に減らした。具体的には、18.0 f1-score によるパフォーマンス劣化が顕著で、14.0 f1-score は8.6 f1-score と2.8 f1-score のパフォーマンス劣化のセラピストよりも高い。我々の研究は、AIモデルの精度をより正確に見積り、AI出力の過度な信頼度を減らし、人間とAIの協調的な意思決定を改善することの意義について論じている。

Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stakes domains (e.g., health). However, researchers have discussed the issue that humans can over-rely on wrong suggestions from an AI model instead of achieving complementary human-AI performance. In this work, we utilized salient feature explanations along with what-if, counterfactual explanations to make humans review AI suggestions more analytically and thereby reduce over-reliance on AI, and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient features and counterfactual explanations assisted therapists and laypersons to improve their performance and agreement level on the task when 'right' AI outputs are presented. While both therapists and laypersons over-relied on 'wrong' AI outputs, counterfactual explanations assisted both therapists and laypersons to reduce their over-reliance on 'wrong' AI outputs by 21% compared to salient feature explanations. Specifically, laypersons showed larger performance degradation (18.0 F1-score with salient feature explanations and 14.0 F1-score with counterfactual explanations) than therapists (8.6 and 2.8 F1-score, respectively). Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on 'wrong' AI outputs, as well as implications for improving human-AI collaborative decision-making.

翻訳日:2023-08-09 12:07:00 公開日:2023-08-08

# pelta: フェデレーション学習における回避攻撃を軽減するためのトランスフォーマーの遮蔽

Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning ( http://arxiv.org/abs/2308.04373v1 )

ライセンス: Link先を確認

Simon Queyrut, Yérom-David Bromberg, Valerio Schiavoni

(参考訳) フェデレートされた学習の主な前提は、機械学習モデルの更新がローカルに計算され、特にユーザーのデータのプライバシを保護するためである。このメカニズムは、一度集約された一般的なモデルを、共同作業や非悪意のあるノードにブロードキャストすると仮定する。しかし、適切な防御がなければ、妥協されたクライアントは、敵の例を探すことで、ローカルメモリ内のモデルを簡単に探すことができる。例えば、画像ベースの応用を考えると、敵対的な例は、ローカルモデルによって誤って分類された(人間の目には)知覚不能に摂動されたイメージから構成される。このような悪質な調査を軽減するため,我々は,信頼できるハードウェアを活用した新たな遮蔽機構であるpeltaを紹介する。 Trusted Execution Environments(TEEs)の能力を活用することで、Peltaはバックプロパゲーションチェーンルールの一部をマスクする。我々は,アートアンサンブルモデルの現状についてペルタを評価し,自己注意勾配攻撃に対する効果を実証する。

The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, as the data never leave the perimeter of the user's device. This mechanism supposes that the general model, once aggregated, is broadcast to collaborating and non-malicious nodes. However, without proper defenses, compromised clients can easily probe the model inside their local memory in search of adversarial examples. For instance, considering image-based applications, adversarial examples consist of imperceptibly perturbed images (to the human eye) misclassified by the local model, which can be later presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, otherwise typically exploited by attackers for the design of malicious samples. We evaluate Pelta on a state-of-the-art ensemble model and demonstrate its effectiveness against the Self Attention Gradient adversarial Attack.

翻訳日:2023-08-09 12:06:22 公開日:2023-08-08

# 導出Argument を用いた Bipolar Argument グラフの検証

Some Options for Instantiation of Bipolar Argument Graphs with Deductive Arguments ( http://arxiv.org/abs/2308.04372v1 )

ライセンス: Link先を確認

Anthony Hunter

(参考訳) 議論グラフは議論的状況の抽象表現を提供する。双極性グラフは有向グラフであり、各ノードは引数を表し、各アークは別のノードに対する1つの引数の影響を表す。ここでは、影響が支持、攻撃、曖昧であると仮定する。双極引数グラフでは、各引数はアトミックであるため、内部構造を持たない。しかし、個々の議論の性質やどのように相互作用するかをよりよく理解するには、その内部構造を検討することが重要である。そこで本論文では,双極子グラフのインスタンス化のための論理的引数の利用に基づくフレームワークと,引数の内部構造と引数間の関係のタイプを考慮に入れた議論のインスタンス化に関する制約のセットを提案する。

Argument graphs provide an abstract representation of an argumentative situation. A bipolar argument graph is a directed graph where each node denotes an argument, and each arc denotes the influence of one argument on another. Here we assume that the influence is supporting, attacking, or ambiguous. In a bipolar argument graph, each argument is atomic and so it has no internal structure. Yet to better understand the nature of the individual arguments, and how they interact, it is important to consider their internal structure. To address this need, this paper presents a framework based on the use of logical arguments to instantiate bipolar argument graphs, and a set of possible constraints on instantiating arguments that take into account the internal structure of the arguments, and the types of relationship between arguments.

翻訳日:2023-08-09 12:06:03 公開日:2023-08-08

# 大規模言語モデルを用いた累積推論

Cumulative Reasoning With Large Language Models ( http://arxiv.org/abs/2308.04371v1 )

ライセンス: Link先を確認

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao

(参考訳) 言語モデルは強力で多用途であるが、しばしば非常に複雑な問題に対処できない。これは、複雑な問題を解決するには意図的な思考が必要であり、トレーニングの間は最小限の指導しか行われていないからである。本稿では,言語モデルを累積的かつ反復的に活用し,人間の思考過程をエミュレートするCumulative Reasoning(CR)という新しい手法を提案する。タスクを小さなコンポーネントに分解することで、 \ournamebは問題解決プロセスを合理化し、より管理しやすく、効果的にする。論理推論タスクでは、CRは既存のメソッドを9.3\%改善し、計算済みのFOLIO wikiデータセットで98.04\%の驚くべき精度を達成する。 24 のゲームでは、CR は 94 % の精度を達成し、これは以前の最先端手法よりも 20 % の大幅な向上を意味する。

While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement of up to 9.3%, and achieves an accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 94%, which signifies a substantial enhancement of 20% over the previous state-of-the-art method.

翻訳日:2023-08-09 12:05:49 公開日:2023-08-08

# 超解像におけるカモフラーゲ型物体検出 : 比較検討

When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study ( http://arxiv.org/abs/2308.04370v1 )

ライセンス: Link先を確認

Juan Wen, Shupeng Cheng, Peng Xu, Bowen Zhou, Radu Timofte, Weiyan Hou, Luc Van Gool

(参考訳) Super Resolution (SR) と Camouflaged Object Detection (COD) は、コンピュータビジョンにおける様々なジョイントアプリケーションとのホットトピックである。例えば、低解像度の監視画像は、超高解像度技術と擬似物体検出によって順次処理することができる。しかし、以前の研究では、この2つの領域は常に孤立して研究されている。本稿では, 両者の総合的な比較評価を初めて実施する。具体的には,一般的なcodデータセット上で異なる超解像法をベンチマークし,sr法で処理したcodデータを用いて,異なるcodモデルのロバスト性を評価する。私たちの目標は、これらの2つの領域を橋渡し、新しい実験現象を発見し、新しい経験をまとめることです。

Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications. For instance, low-resolution surveillance images can be successively processed by super-resolution techniques and camouflaged object detection. However, in previous work, these two areas are always studied in isolation. In this paper, we, for the first time, conduct an integrated comparative evaluation for both. Specifically, we benchmark different super-resolution methods on commonly used COD datasets, and meanwhile, we evaluate the robustness of different COD models by using COD data processed by SR methods. Our goal is to bridge these two domains, discover novel experimental phenomena, summarize new experim.

翻訳日:2023-08-09 12:05:34 公開日:2023-08-08

# 屋外神経放射領域における深度事前の探索

Digging into Depth Priors for Outdoor Neural Radiance Fields ( http://arxiv.org/abs/2308.04413v1 )

ライセンス: Link先を確認

Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang

(参考訳) neural radiance fields (nerf) は、新しいビュー合成や没入現実(immersive reality)など、視覚やグラフィックタスクにおいて印象的なパフォーマンスを示している。しかしながら、放射場の形状-照度あいまいさは、特に希薄な視点設定において、依然として課題である。近年の作業では、問題を緩和するため、奥行き先を屋外のNeRFトレーニングに統合している。しかし, 深度事前の選択基準と, 異なる先行の相対的メリットについては, 十分に検討されていない。さらに、深さ優先法を使うための異なるアプローチを選択するという相対的なメリットも未検討の問題である。本稿では,屋外神経放射場に先行する深度を用いた総合的な研究と評価を行い,一般的な深度センシング技術とその適用方法について述べる。具体的には,広く使用されている2つの屋外データセット上で,4つの共通使用深度前置法と異なる深さ使用法を備えた2つの代表的なnerf法を用いて広範囲な実験を行う。実験結果から,NeRFモデルの深度事前トレーニングにおいて,実践者や研究者が有用である可能性が示唆された。プロジェクトページ: https://cwchenwang.github.io/outdoor-nerf-depth

Neural Radiance Fields (NeRF) have demonstrated impressive performance in vision and graphics tasks, such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse-viewpoint setting. Recent work resorts to integrating depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated. Moreover, the relative merits of different approaches to using the depth priors are also unexplored. In this paper, we provide a comprehensive study and evaluation of employing depth priors in outdoor neural radiance fields, covering common depth sensing technologies and most ways of applying them. Specifically, we conduct extensive experiments with two representative NeRF methods equipped with four commonly used depth priors and different depth usages on two widely used outdoor datasets. Our experimental results reveal several interesting findings that can potentially benefit practitioners and researchers in training their NeRF models with depth priors. Project Page: https://cwchenwang.github.io/outdoor-nerf-depth

翻訳日:2023-08-09 11:58:44 公開日:2023-08-08

# ランダム化線形分類器を用いた確率不変学習

Probabilistic Invariant Learning with Randomized Linear Classifiers ( http://arxiv.org/abs/2308.04412v1 )

ライセンス: Link先を確認

Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris J. Maddison

(参考訳) 既知のタスクの不分散を表現的かつ保存するモデルの設計は、ますます難しい問題になっている。既存のソリューション計算リソースやメモリリソースに対する不変性。本研究では,表現的かつ不変だが資源の少ないランダム性モデルと設計モデルをどのように活用するかを示す。ランダム化アルゴリズムにインスパイアされた私たちの重要な洞察は、普遍近似と不変性の確率論的概念を受け入れることで、リソースの要求を減らせることである。具体的には,Randomized Linear Classifiers (RLC) と呼ばれるバイナリ分類モデルのクラスを提案する。 rlcはコンパクト群変換に対する不変性を維持しつつ、高確率で任意の(スムース)関数を近似できるパラメータとサンプルサイズ条件を与える。この結果を利用して,集合,グラフ,球面データ上の分類タスクに対して有理確率不変量を持つ3つのrlcを設計した。これらのモデルが、(決定論的)ニューラルネットワークとその不変量よりも少ないリソースを用いて、確率的不変性と普遍性を達成する方法を示す。最後に、決定論的不変ニューラルネットワークが困難であることが知られている不変タスクにおいて、この新しいモデルの利点を実証的に示す。

Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use fewer resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using fewer resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.

翻訳日:2023-08-09 11:58:24 公開日:2023-08-08

# 3次元物体検出のための頂点相対位置符号化V-DETR:DETR

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection ( http://arxiv.org/abs/2308.04409v1 )

ライセンス: Link先を確認

Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

(参考訳) DETRフレームワークを用いた点雲のための高性能な3次元物体検出器を提案する。事前の試みは、訓練データの限られた規模から正確な帰納バイアスを学習できないため、すべて最適以下の結果に終わる。特に、クエリは、ターゲットオブジェクトから遠く離れた点にしばしば参加し、オブジェクト検出の局所性原理に違反します。この制限に対処するために,各デコーダ層におけるクエリによって予測される3Dボックスに対する相対的な位置に基づいて各点の位置エンコーディングを計算し,局所性の原則に従ってモデルがオブジェクト近傍の点に焦点を合わせるための明確な情報を提供する,新しい3D Vertex Relative Position Encoding (3DV-RPE)手法を提案する。さらに,タスクの理解に基づくデータの正規化など,さまざまな側面からパイプラインを体系的に改善する。難解なscannetv2ベンチマークでは、それぞれ65.0\%/47.0\%から77.8\%/66.0\%までの$\rm{ap}_{25}$/$\rm{ap}_{50}$で以前の3detrを大きく改善した。さらに、ScanNetV2 と SUN RGB-D データセットに新しいレコードをセットし、http://github.com/yichaoshen-MS/V-DETR でコードをリリースする。

We introduce a highly performant 3D object detector for point clouds using the DETR framework. Prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address this limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method, which computes a position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects, such as data normalization, based on our understanding of the task. We show exceptional results on the challenging ScanNetV2 benchmark, achieving significant improvements over the previous 3DETR in $\rm{AP}_{25}$/$\rm{AP}_{50}$ from 65.0%/47.0% to 77.8%/66.0%, respectively. In addition, our method sets a new record on the ScanNetV2 and SUN RGB-D datasets. Code will be released at http://github.com/yichaoshen-MS/V-DETR.
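
To make the geometric input of a vertex-relative encoding concrete, the sketch below computes the offsets from one point to the eight corners of a predicted box. It is a simplified illustration, not the V-DETR code: the box is assumed to be axis-aligned and given as a center and size, the helper names `box_vertices` and `vertex_relative_offsets` are hypothetical, and the learned mapping from offsets to an attention encoding is omitted.

```python
from itertools import product


def box_vertices(center, size):
    """Return the 8 corners of an axis-aligned 3D box (assumed box format)."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def vertex_relative_offsets(point, center, size):
    """Offsets from one point to each of the 8 box vertices."""
    px, py, pz = point
    return [
        (px - vx, py - vy, pz - vz)
        for vx, vy, vz in box_vertices(center, size)
    ]
```

Points inside or near the box have small offsets to at least some vertices, while distant points have uniformly large ones. This gives each query a direct signal about which points are near its predicted box, which is how the abstract describes the locality guidance.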

翻訳日:2023-08-09 11:58:07 公開日:2023-08-08

# XGBD:説明誘導グラフバックドア検出

XGBD: Explanation-Guided Graph Backdoor Detection ( http://arxiv.org/abs/2308.04406v1 )

ライセンス: Link先を確認

Zihan Guan, Mengnan Du, Ninghao Liu

(参考訳) バックドア攻撃は、グラフ学習モデルに重大なセキュリティリスクをもたらす。トレーニングデータセットにバックドアトリガーを挿入することで、ターゲットモデルにバックドアを組み込むことができる。バックドア攻撃に対抗するためにバックドア検出が提案されている。バックドアとクリーンサンプルの混合でモデルのトレーニングを行うと、バックドアサンプルの損失はクリーンサンプルよりも大幅に減少し、最低損失値のサンプルを選択することでバックドアサンプルを容易に検出できる。しかし、グラフデータ上のトポロジ的特徴情報の無知は、グラフ領域に直接適用した場合、検出の有効性を制限する。そこで本稿では,トポロジ情報を活用するために,説明誘導型バックドア検出手法を提案する。具体的には、グラフデータセット上でヘルパモデルをトレーニングし、モデルにグラフサンプルをフィードし、モデル予測を重要なサブグラフに属性付けるために説明手法を採用する。バックドア試料はクリーンサンプルと異なる属性分布を有するので,説明文はバックドア試料を検出するための識別的特徴として有用である。複数のポピュラーデータセットと攻撃手法に関する包括的実験により,本手法の有効性と説明可能性を示す。私たちのコードは、https://github.com/GuanZihan/GNN_backdoor_detectionで利用可能です。

Backdoor attacks pose a significant security risk to graph learning models. Backdoors can be embedded into the target model by inserting backdoor triggers into the training dataset, causing the model to make incorrect predictions when the trigger is present. To counter backdoor attacks, backdoor detection has been proposed. An emerging detection strategy in the vision and NLP domains is based on an intriguing phenomenon: when training models on a mixture of backdoor and clean samples, the loss on backdoor samples drops significantly faster than on clean samples, allowing backdoor samples to be easily detected by selecting samples with the lowest loss values. However, this strategy ignores topological feature information in graph data, which limits its detection effectiveness when applied directly to the graph domain. To this end, we propose an explanation-guided backdoor detection method to take advantage of the topological information. Specifically, we train a helper model on the graph dataset, feed graph samples into the model, and then adopt explanation methods to attribute model prediction to an important subgraph. We observe that backdoor samples have a different attribution distribution from clean samples, so the explanatory subgraph could serve as a more discriminative feature for detecting backdoor samples. Comprehensive experiments on multiple popular datasets and attack methods demonstrate the effectiveness and explainability of our method. Our code is available at https://github.com/GuanZihan/GNN_backdoor_detection.
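
The loss-based baseline that the abstract contrasts with can be sketched in a few lines: rank training samples by loss and flag the lowest-loss fraction as suspected backdoor samples. This illustrates only that baseline, not XGBD's explanation-guided method; the function name `flag_lowest_loss` and the `fraction` parameter are hypothetical.

```python
def flag_lowest_loss(losses, fraction):
    """Return sorted indices of the `fraction` of samples with the smallest loss.

    Hypothetical helper for the loss-based baseline: backdoor samples are
    assumed to reach low loss faster than clean samples.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = int(len(losses) * fraction)  # number of samples to flag
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:k])
```

For example, `flag_lowest_loss([0.9, 0.05, 0.7, 0.02, 0.8], 0.4)` flags indices 1 and 3. The rule looks only at a scalar loss per sample, so it cannot use graph structure, which is the gap the explanation-guided approach targets.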

翻訳日:2023-08-09 11:57:24 公開日:2023-08-08

# イベント匿名化による識別のない人物再識別

Person Re-Identification without Identification via Event Anonymization ( http://arxiv.org/abs/2308.04402v1 )

ライセンス: Link先を確認

Shafiq Ahmad, Pietro Morerio, Alessio Del Bue

(参考訳) 公共空間における視覚的監視の大規模利用は、個人のプライバシーを犠牲にしつつ、リソース消費(エネルギー、帯域幅、計算)を増加させる。ニューロモルフィック視覚センサ(イベントカメラ)は, 現場の被験者の詳細なRGB視覚情報を捉えないため, プライバシー問題に対する有効な解決策として近年検討されている。しかし、最近のディープラーニングアーキテクチャは、イベントカメラからのイメージを高い忠実度で再構築することができ、イベントベースのビジョンアプリケーションに対するプライバシーに対する潜在的な脅威を再導入している。本稿では,このような画像再構成攻撃から人間の身元を守るために,イベントストリームを匿名化することを目的とする。そこで本研究では,プライバシを保護し,人物ReIdのような下流タスクを実行するという2つの目的に対して,エンドツーエンドネットワークアーキテクチャを共同で最適化する手法を提案する。我々のネットワークは、イベントをスクランブルすることを学び、プライバシー攻撃者から回収された画像の劣化を強制する。この作業では、私たちのアプローチのパフォーマンスを評価するために収集された最初のイベントベースの人物ReIdデータセットもコミュニティに提供します。本手法を広範囲な実験により検証し,SoftBioデータセットと提案したEvent-ReIdデータセットからシミュレーションした合成イベントデータについて報告する。

Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event-cameras) have been recently considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning architectures have been able to reconstruct images from event cameras with high fidelity, reintroducing a potential threat to privacy for event-based vision applications. In this paper, we aim to anonymize event-streams to protect the identity of human subjects against such image reconstruction attacks. To achieve this, we propose an end-to-end network architecture jointly optimized for the twofold objective of preserving privacy and performing a downstream task such as person ReId. Our network learns to scramble events, enforcing the degradation of images recovered from the privacy attacker. In this work, we also bring to the community the first ever event-based person ReId dataset gathered to evaluate the performance of our approach. We validate our approach with extensive experiments and report results on the synthetic event data simulated from the publicly available SoftBio dataset and our proposed Event-ReId dataset.

翻訳日:2023-08-09 11:56:49 公開日:2023-08-08

# ファインチューニングゲーム:汎用モデルの獲得と適応

Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models ( http://arxiv.org/abs/2308.04399v1 )

ライセンス: Link先を確認

Benjamin Laufer and Jon Kleinberg and Hoda Heidari

(参考訳) 機械学習(ML)と人工知能(AI)の主な進歩は、汎用モデルの開発とリリースの形式をますます取り入れている。これらのモデルは、他の企業や代理店が特定のドメイン固有の機能を実行するように設計されている。このプロセスは適応や微調整として知られるようになった。本稿では、ジェネラリストが技術製品(以下、MLモデル)を一定のレベルのパフォーマンスで導入し、1つ以上のドメイン-スペシャリストが特定のドメインでの使用に適応する微調整プロセスのモデルを提案する。両社とも、テクノロジに投資するときに利益を計上し、コストを被る。そして、市場に到達するためのテクノロジの収益の共有方法に関する交渉合意に達する必要がある。比較的一般的なコストと収益関数に対して、細調整ゲームが利益分配ソリューションをもたらす条件を特徴付ける。我々は、潜在的なドメイン-特殊化が、テクノロジーの取り込みに寄与し、自由化され、または吸収されることを観察し、これらの異なる戦略をもたらす条件を提供する。我々は,このタイプのインタラクションにおける企業の戦略行動の洞察を,バーゲインソリューションとサブゲーム完全均衡に基づく手法がどのように提供するかを示し,一方の企業が他方よりも著しくコストが高い場合でも,利益の分配が生じることを見出した。また,実用関数の一般集合に対するパレート・最適交渉配置を同定する手法も提案する。

Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential Domain-specialist will either contribute, free-ride, or abstain in their uptake of the technology, and we provide conditions yielding these different strategies. We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.

翻訳日:2023-08-09 11:56:07 公開日:2023-08-08

# 文字レベルNMTと言語類似性

Character-level NMT and language similarity ( http://arxiv.org/abs/2308.04398v1 )

ライセンス: Link先を確認

Josef Jon and Ondřej Bojar

(参考訳) 本稿では,チェコ語とクロアチア語,ドイツ語,ハンガリー語,スロバキア語,スペイン語の翻訳における,トランスフォーマーアーキテクチャを用いた文字レベルのニューラルネットワーク翻訳の有効性について検討する。自動mtメトリクスを用いてモデルを評価し,類似言語間の翻訳が文字レベルの入力セグメンテーションに有益であることを示すが,関連度の低い言語では,文字レベルのバニラトランスフォーマベースがサブワードレベルのセグメンテーションに遅れることが多い。我々は、既に訓練済みのサブワードレベルのモデルを文字レベルに微調整することで、ギャップを閉じることができるという以前の発見を確認する。

We explore the effectiveness of character-level neural machine translation using Transformer architecture for various levels of language similarity and size of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that it is possible to close the gap by finetuning the already trained subword-level models to character-level.

翻訳日:2023-08-09 11:55:43 公開日:2023-08-08

# LEFormer:リモートセンシング画像からの湖沼抽出のためのハイブリッドCNN変換器アーキテクチャ

LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery ( http://arxiv.org/abs/2308.04397v1 )

ライセンス: Link先を確認

Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Pin Tao

(参考訳) リモートセンシング画像からの湖の抽出は、湖の複雑な形状とノイズの存在のために困難である。既存の手法は曖昧なセグメンテーション境界と貧弱なフォアグラウンドモデリングに悩まされている。本稿では, LEFormerと呼ばれるCNN-Transformerハイブリッドアーキテクチャを, 正確な湖沼抽出のために提案する。 leformerにはcnnエンコーダ、トランスフォーマーエンコーダ、クロスエンコーダ融合、軽量デコーダの4つのモジュールが含まれている。 CNNエンコーダは、局所的な空間情報を復元し、微細な詳細を改善する。同時にTransformerエンコーダは、任意の長さのシーケンス間の長距離依存関係をキャプチャし、グローバルな特徴とコンテキスト情報をよりよく取得する。最後に、マスク予測に軽量デコーダを用いる。本研究では,2つのデータセットである表層水 (SW) と清海・チベット高原湖 (QTPL) のLEFormerの性能と効率を評価する。実験結果から,LEFormerはこれらの2つのデータセット上で,最新技術(SOTA)のパフォーマンスと効率を一貫して達成し,既存の手法よりも優れていることがわかった。具体的には、LEFormerはSWデータセットとQTPLデータセットの90.86%と97.42% mIoUをそれぞれ3.61Mで達成し、従来のSOTA法より20倍小さい。

Lake extraction from remote sensing imagery is challenging due to the complex shapes of lakes and the presence of noise. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. In this paper, we propose a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains four main modules: CNN encoder, Transformer encoder, cross-encoder fusion, and lightweight decoder. The CNN encoder recovers local spatial information and improves fine-scale details. Simultaneously, the Transformer encoder captures long-range dependencies between sequences of any length, allowing it to better obtain global features and context information. Finally, a lightweight decoder is employed for mask prediction. We evaluate the performance and efficiency of LEFormer on two datasets, the Surface Water (SW) and the Qinghai-Tibet Plateau Lake (QTPL). Experimental results show that LEFormer consistently achieves state-of-the-art (SOTA) performance and efficiency on these two datasets, outperforming existing methods. Specifically, LEFormer achieves 90.86% and 97.42% mIoU on the SW and QTPL datasets, respectively, with a parameter count of 3.61M, while being 20x smaller than the previous SOTA method.
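
For readers unfamiliar with the reported metric, the sketch below computes mean intersection-over-union (mIoU) for flattened label masks. It is a generic illustration with a hypothetical function name `mean_iou`. It assumes mIoU is the average of per-class IoU over the classes present in either mask (here lake and background), which may differ in detail from the evaluation protocol used in the paper.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two equal-length sequences of class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either mask")
    return sum(ious) / len(ious)
```

With `pred = [1, 1, 0, 0]` and `target = [1, 0, 0, 0]`, the background IoU is 2/3 and the lake IoU is 1/2, so the mean is 7/12.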

翻訳日:2023-08-09 11:55:30 公開日:2023-08-08

# ソーシャルプロセスマイニングを支援する企業コラボレーションシステムのためのイベント抽象化

Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining ( http://arxiv.org/abs/2308.04396v1 )

ライセンス: Link先を確認

Jonas Blatt, Patrick Delfmann, Petra Schubert

One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-grained, and PM applied to them results in spaghetti models. A common solution is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.
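
To make the idea of event abstraction concrete, here is a deliberately simple sketch: a frequency lookup that pairs low-level traces with the high-level activity most often recorded alongside them. This is not the ECSEA algorithm, whose details the abstract does not give; the event names, activity names, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_abstraction(pairs):
    """Learn a lookup from a low-level trace to its most frequent high-level activity.

    pairs: iterable of (low_level_trace, high_level_activity).
    """
    counts = defaultdict(Counter)
    for low, high in pairs:
        counts[tuple(low)][high] += 1
    return {low: c.most_common(1)[0][0] for low, c in counts.items()}


def abstract(model, low_level_traces, unknown="UNKNOWN"):
    """Map each low-level trace to a high-level activity, or `unknown` if unseen."""
    return [model.get(tuple(t), unknown) for t in low_level_traces]


# Hypothetical training data: system events paired with recorded user activities
training = [
    (["open_editor", "save_page"], "Edit wiki page"),
    (["open_editor", "save_page"], "Edit wiki page"),
    (["upload_file", "create_version"], "Upload document"),
]
model = train_abstraction(training)
high_level_log = abstract(
    model,
    [["upload_file", "create_version"], ["open_editor", "save_page"], ["like"]],
)
print(high_level_log)
```

An exact-match lookup cannot generalize to unseen event sequences, which is one reason a trained model is needed in practice.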

Translated: 2023-08-09 11:55:05 Published: 2023-08-08

# Data Augmentation-Based Unsupervised Domain Adaptation In Medical Imaging ( http://arxiv.org/abs/2308.04395v1 )

License: check the linked page

Sebastian Nørgaard Llambias, Mads Nielsen, Mostafa Mehdipour Ghazi

Deep learning-based models in medical imaging often struggle to generalize effectively to new scans due to data heterogeneity arising from differences in hardware, acquisition parameters, population, and artifacts. This limitation presents a significant challenge in adopting machine learning models for clinical practice. We propose an unsupervised method for robust domain adaptation in brain MRI segmentation by leveraging MRI-specific augmentation techniques. To evaluate the effectiveness of our method, we conduct extensive experiments across diverse datasets, modalities, and segmentation tasks, comparing against state-of-the-art methods. The results show that our proposed approach achieves high accuracy, exhibits broad applicability, and showcases remarkable robustness against domain shift in various tasks, surpassing state-of-the-art performance in the majority of cases.

Translated: 2023-08-09 11:54:44 Published: 2023-08-08

# When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations ( http://arxiv.org/abs/2308.04431v1 )

License: check the linked page

Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath

In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets can, in many cases, hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst-group accuracy over both hospitals than a model trained on just a single hospital's data. This surprising result occurs even though the added hospital makes the training distribution more similar to the test distribution. We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts. We highlight the trade-off one encounters when training on multiple datasets, between the obvious benefit of additional data and the insidious cost of the introduced spurious correlation. In some cases, balancing the dataset can remove the spurious correlation and improve performance, but it is not always an effective strategy. We contextualize our results within the literature on spurious correlations to help explain these outcomes. Our experiments underscore the importance of exercising caution when selecting training data for machine learning models, especially in settings where there is a risk of spurious correlations, such as with medical imaging. The risks outlined highlight the need for careful data selection and model evaluation in future research and practice.
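
Worst-group accuracy, the metric the abstract relies on, is the minimum accuracy over predefined groups. The sketch below assumes groups are (hospital, label) pairs, which the abstract does not state explicitly; the function name and all data are made up for illustration.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (minimum per-group accuracy, dict of per-group accuracies)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Made-up labels and predictions for scans from two hospitals
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
hospital = ["A", "A", "A", "A", "B", "B", "B", "B"]
groups = list(zip(hospital, y_true))  # assumption: group = (hospital, label)

worst, per_group = worst_group_accuracy(y_true, y_pred, groups)
overall = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, worst, per_group)
```

In this toy data the overall accuracy looks moderate while one group is always misclassified, which is the kind of failure an average hides.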

Translated: 2023-08-09 11:49:19 Published: 2023-08-08

# SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore ( http://arxiv.org/abs/2308.04430v1 )

License: check the linked page

Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer

The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on the Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text, and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out-of-domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high-quality language models while mitigating their legal risk.
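
One common way to combine a parametric LM with a datastore queried at inference time is kNN-LM-style interpolation, sketched below. The abstract does not say which nonparametric scheme SILO uses, so treat this purely as an illustration of the general idea; the function names, the interpolation weight `lam`, and all probabilities are assumptions.

```python
import math


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (distance, next_token) pairs into a token distribution."""
    weights = [math.exp(-d / temperature) for d, _ in neighbors]
    z = sum(weights)
    dist = {}
    for w, (_, tok) in zip(weights, neighbors):
        dist[tok] = dist.get(tok, 0.0) + w / z
    return dist


def interpolate(p_lm, p_knn, lam=0.3):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p_lm.get(tok, 0.0)
        for tok in vocab
    }


# Made-up next-token probabilities from the parametric LM
p_lm = {"court": 0.2, "cat": 0.8}
# Made-up datastore hits for the current context: (distance, next token)
neighbors = [(0.0, "court"), (0.0, "court")]

p_knn = knn_distribution(neighbors)
p_final = interpolate(p_lm, p_knn, lam=0.5)
print(p_final)
```

Because the datastore entries are ordinary records, removing a producer's content amounts to deleting its entries, with no retraining of the parametric model.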

Translated: 2023-08-09 11:48:51 Published: 2023-08-08

# Meta-Learning Operators to Optimality from Multi-Task Non-IID Data ( http://arxiv.org/abs/2308.04428v1 )

License: check the linked page

Thomas T.C.K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni

A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al. (2021), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.

Translated: 2023-08-09 11:48:20 Published: 2023-08-08

# A Deep-Learning Method Using Auto-encoder and Generative Adversarial Network for Anomaly Detection on Ancient Stone Stele Surfaces ( http://arxiv.org/abs/2308.04426v1 )

License: check the linked page

Yikun Liu, Yuning Wang, Cheng Liu

Accurate and prompt detection of natural deterioration and man-made damage on the surfaces of ancient steles is essential for their preventive conservation. Existing methods for cultural heritage preservation cannot fully achieve this goal because of the difficulty of balancing accuracy, efficiency, timeliness, and cost. This paper presents a deep-learning method that automatically detects the above-mentioned emergencies on ancient stone steles in real time, employing an autoencoder (AE) and a generative adversarial network (GAN). The proposed method overcomes the limitations of existing methods by requiring no extensive anomaly samples while enabling comprehensive detection of unpredictable anomalies. The method includes stages of monitoring, data acquisition, pre-processing, model structuring, and post-processing. Taking the Longmen Grottoes' stone steles as a case study, an unsupervised learning model based on AE and GAN architectures is proposed and validated with a reconstruction accuracy of 99.74%. The evaluation showed that the method reliably detected seven artificially designed anomalies and produced no false alarms. This research provides novel ideas and possibilities for the application of deep learning in the field of cultural heritage.
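
Reconstruction-based detectors of this kind typically flag an input when a model trained only on normal data reconstructs it poorly. The sketch below shows just that scoring step; the AE/GAN model itself is not implemented, and the mean-plus-k-standard-deviations threshold, the function names, and all numbers are assumptions rather than details from the paper.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of reconstruction errors on normal samples."""
    mean = sum(normal_errors) / len(normal_errors)
    var = sum((e - mean) ** 2 for e in normal_errors) / len(normal_errors)
    return mean + k * var ** 0.5


def is_anomalous(image, reconstruction, threshold):
    """Flag the image if its reconstruction error exceeds the threshold."""
    return mse(image, reconstruction) > threshold


# Made-up reconstruction errors measured on undamaged surface images
threshold = fit_threshold([0.010, 0.012, 0.008, 0.010])

image = [0.5, 0.5, 0.5, 0.5]
good_reconstruction = [0.5, 0.5, 0.5, 0.55]  # model reproduces the input well
poor_reconstruction = [0.5, 0.5, 0.5, 0.9]   # model fails on an unfamiliar region

print(is_anomalous(image, good_reconstruction, threshold))
print(is_anomalous(image, poor_reconstruction, threshold))
```

Because the threshold is fitted on normal data only, no labeled anomaly samples are needed, which matches the motivation stated in the abstract.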

Translated: 2023-08-09 11:47:44 Published: 2023-08-08

# A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition ( http://arxiv.org/abs/2308.04424v1 )

License: check the linked page

Li Zheng, Fei Li, Yuyang Chai, Chong Teng, Donghong Ji

The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and act labels, which leads to an insufficient ability to capture rich sentiment and act clues and hinders effective and accurate reasoning. To address these issues, we propose a Bi-directional Multi-hop Inference Model (BMIM) that leverages a feature selection network and a bi-directional multi-hop inference network to iteratively extract and integrate rich sentiment and act clues in a bi-directional manner. We also employ contrastive learning and dual learning to explicitly model the correlations of sentiment and act labels. Our experiments on two widely used datasets show that BMIM outperforms state-of-the-art baselines by at least 2.6% on F1 score in DAR and 1.4% on F1 score in DSC. Additionally, our proposed model not only improves performance but also enhances the interpretability of the joint sentiment and act prediction task.

Translated: 2023-08-09 11:47:24 Published: 2023-08-08

# How to harness high-dimensional temporal entanglement, using limited interferometry setups ( http://arxiv.org/abs/2308.04422v1 )

License: check the linked page

Alexandra Bergmayr, Florian Kanitschar, Matej Pivoluska, Marcus Huber

High-dimensional entanglement has been shown to have significant advantages in quantum communication. It is available in many degrees of freedom, and in particular in the time domain, where it is routinely produced in spontaneous parametric down-conversion (SPDC). While advantageous in the sense that only a single detector channel is needed locally, it is notoriously hard to analyze, especially in the assumption-free manner required for quantum key distribution applications. We develop the first complete analysis of high-dimensional entanglement in the polarization-time domain and show how to efficiently certify relevant density matrix elements and security parameters for Quantum Key Distribution (QKD). In addition to putting past experiments on rigorous footing, we also develop physical noise models and propose a novel setup that can further enhance the noise resistance of free-space quantum communication.

Translated: 2023-08-09 11:47:02 Published: 2023-08-08

# DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images ( http://arxiv.org/abs/2308.04417v1 )

License: check the linked page

Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, and Pin Tao

Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. Recent advances in cloud removal rely primarily on generative adversarial networks, which may yield suboptimal image quality, whereas diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential for addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal from optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those of the previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of the paper.

Translated: 2023-08-09 11:46:30 Published: 2023-08-08

# A proposal for a new kind of spontaneous collapse model ( http://arxiv.org/abs/2308.04415v1 )

License: check the linked page

Nicolò Piccione

Spontaneous collapse models are modifications of standard quantum mechanics in which a physical mechanism is responsible for the collapse of the wavefunction, thus providing a way to solve the so-called "measurement problem". However, they present great challenges when one tries to make them relativistic. Here, we propose a new kind of non-relativistic spontaneous collapse model whose relativistic version could be easier to obtain. In the non-relativistic regime, we show that this model can lead to dynamics quite similar to those of the Ghirardi-Rimini-Weber model, while also naturally solving the problem of indistinguishable particles. Moreover, we can also obtain the same master equation as that of the well-known Continuous Spontaneous Localization models. Finally, we show how our proposed model solves the measurement problem in a manner conceptually similar to the Ghirardi-Rimini-Weber model.

Translated: 2023-08-09 11:45:52 Published: 2023-08-08

PDF registration status (publication date: 20230808)