Fugu-MT: arXiv Paper Translations

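This site publishes Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.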

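The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).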

Papers with a publication date of 20230618.

Title	Authors	Abstract	Publication date / Translation date
# Ethereumブロックチェーンクライアントのカオスエンジニアリング Chaos Engineering of Ethereum Blockchain Clients ( http://arxiv.org/abs/2111.00221v2 ) ライセンス: Link先を確認	Long Zhang, Javier Ron, Benoit Baudry, and Martin Monperrus	(参考訳) 本稿では,Ethereumブロックチェーンクライアントのレジリエンス評価のためのカオスエンジニアリングアプローチであるChaosETHを提案する。 ChaosETHは以下の方法で動作する。まず、Ethereumクライアントを監視して、通常の動作を決定する。その後、システム呼び出しのエラーをひとつのethereumクライアントに一度に注入し、摂動による動作を監視する。最後に、ChaosETHは、インジェクションされたシステム呼び出しの呼び出しエラーの影響を評価するために、摂動前後に記録された振る舞いを比較する。実験は、最も人気のあるethereumクライアント実装であるgoethereumとnethermindで実施された。 15のアプリケーションレベルのメトリクスに対して、22の異なるシステムコールエラーがEthereumクライアントに与える影響を評価します。システムコールの呼び出しエラーは,直接クラッシュから完全なレジリエンスに至るまで,Ethereumクライアントの幅広いレジリエンス特性を明らかにした。この実験は、ブロックチェーンシステムにカオスエンジニアリング原則を適用する可能性を明確に示している。 In this paper, we present ChaosETH, a chaos engineering approach for resilience assessment of Ethereum blockchain clients. ChaosETH operates in the following manner: First, it monitors Ethereum clients to determine their normal behavior. Then, it injects system call invocation errors into one single Ethereum client at a time, and observes the behavior resulting from perturbation. Finally, ChaosETH compares the behavior recorded before, during, and after perturbation to assess the impact of the injected system call invocation errors. The experiments are performed on the two most popular Ethereum client implementations: GoEthereum and Nethermind. We assess the impact of 22 different system call errors on those Ethereum clients with respect to 15 application-level metrics. Our results reveal a broad spectrum of resilience characteristics of Ethereum clients w.r.t. system call invocation errors, ranging from direct crashes to full resilience. The experiments clearly demonstrate the feasibility of applying chaos engineering principles to blockchain systems.	翻訳日:2023-10-24 15:48:43 公開日:2023-06-18
# NLPに基づくGDPRに対するデータ処理契約の自動コンプライアンスチェック NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR ( http://arxiv.org/abs/2209.09722v2 ) ライセンス: Link先を確認	Orlando Amaral, Muhammad Ilyas Azeem, Sallam Abualhaija and Lionel C Briand	(参考訳) 個人データの処理は、一般データ保護規則(GDPR)により、データ処理協定(DPA)を通じてヨーロッパで規制されている。 DPAのコンプライアンスを確認することは、個人データの処理を含むソフトウェア開発において、DPAとしてソフトウェアシステムのコンプライアンス検証に寄与する。しかし、GDPRにおけるDPA関連コンプライアンス要件を理解し、特定し、それらの要件をDPAで検証するためにかなりの時間と労力を必要とするため、与えられたDPAがGDPRに準拠するかどうかを手作業で確認することは困難である。本稿では,GDPR に対する DPA の適合性をチェックするための自動解法を提案する。法律の専門家との密接な交流の中で、私たちはまず2つのアーティファクトを構築しました。一 DPAの遵守及び遵守に係るGDPRの規定から抽出した「shall」要件 (ii)要件の法的概念を定義する用語表。そこで我々は、自然言語処理(NLP)技術を活用して、与えられたDPAの適合性をチェックする自動化ソリューションを開発した。具体的には,DPAのテキストコンテンツに対するフレーズレベルの表現を自動生成し,あらかじめ定義された"shall"要件の表現と比較する。 30の実際のDPAのデータセットでは、750の真偽の違反のうち618が正しく発見され、76の偽の違反を発生させ、さらに524の満足した要件を正しく識別する。このアプローチの平均精度は89.1%、リコールは82.4%、精度は84.6%である。市販のNLPツールに依存するベースラインと比較して,提案手法は平均精度が約20ポイント向上する。提案手法の精度は手作業による検証に制限を加えて約94%向上できる。 Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. 
Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares it against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach has thus an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.	翻訳日:2023-10-24 14:55:43 公開日:2023-06-18
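The reported precision, recall, and accuracy can be roughly re-derived from the counts in the abstract above. This is a minimal sketch, assuming that the four counts form a complete confusion matrix over all requirement checks and that the figures are pooled over the 30 DPAs; the variable names are illustrative and not taken from the paper. Under these assumptions recall and accuracy match the reported 82.4% and 84.6%, while pooled precision comes out at about 89.0%, slightly below the reported 89.1% average, presumably because the paper averages differently.

```python
# Counts quoted in the abstract above.
genuine_violations = 750   # all genuine violations in the 30 DPAs
true_positives = 618       # genuine violations the approach found
false_positives = 76       # violations raised that were not genuine
true_negatives = 524       # satisfied requirements correctly identified

# Assumption: every genuine violation that was not found counts as a miss.
false_negatives = genuine_violations - true_positives

# Standard definitions, computed on the pooled counts.
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
total = true_positives + false_positives + true_negatives + false_negatives
accuracy = (true_positives + true_negatives) / total

print(f"precision={precision:.1%} recall={recall:.1%} accuracy={accuracy:.1%}")
```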
# Eunomia: WebAssemblyバイナリのシンボリック実行でユーザ指定のファイングレード検索を実現する Eunomia: Enabling User-specified Fine-Grained Search in Symbolically Executing WebAssembly Binaries ( http://arxiv.org/abs/2304.07204v2 ) ライセンス: Link先を確認	Ningyu He, Zhehao Zhao, Jikai Wang, Yubin Hu, Shengjian Guo, Haoyu Wang, Guangtai Liang, Ding Li, Xiangqun Chen, Yao Guo	(参考訳) 既存の手法ではシンボリック実行のパス爆発問題を軽減するための自動アプローチが提案されているが、ユーザは様々な探索戦略を慎重に適用してシンボリック実行を最適化する必要がある。既存のアプローチは粗粒度のグローバル検索戦略のみをサポートするため、複雑なコード構造を効率的に横断することはできない。本稿では,局所的なドメイン知識を指定して,きめ細かい検索を可能にするシンボル実行手法であるEunomiaを提案する。 Eunomiaでは、ユーザーがターゲットプログラムの異なる部分にローカル検索戦略を正確に特定できる表現型DSL、Aesを設計する。局所探索戦略をさらに最適化するために,異なる局所探索戦略に対して変数のコンテキストを自動的に分離し,同じ変数に対する局所探索戦略間の競合を回避する区間ベースのアルゴリズムを設計する。 WebAssemblyをターゲットにしたシンボリック実行プラットフォームとして、Eunomiaを実装しています。これにより、さまざまな言語(CやGoなど)で書かれたアプリケーションを解析できますが、WebAssemblyにコンパイルすることができます。私たちの知る限りでは、EunomiaはWebAssemblyランタイムの全機能をサポートする最初のシンボリックな実行エンジンです。シンボリック実行のためのマイクロベンチマークスイートと6つの実世界のアプリケーションを用いて,Eunomiaの評価を行った。評価の結果,Eunomiaは実世界のアプリケーションにおけるバグ検出を最大3桁高速化することがわかった。総合的なユーザスタディの結果によると、ユーザはシンプルで直感的なAesスクリプトを書くことで、シンボリック実行の効率と効率を大幅に改善することができる。既知の6つの実世界のバグの検証に加えて、Eunomia氏は人気のあるオープンソースプロジェクトである Collections-C で2つのゼロデイバグも検出した。 Although existing techniques have proposed automated approaches to alleviate the path explosion problem of symbolic execution, users still need to optimize symbolic execution by applying various searching strategies carefully. As existing approaches mainly support only coarse-grained global searching strategies, they cannot efficiently traverse through complex code structures. In this paper, we propose Eunomia, a symbolic execution technique that allows users to specify local domain knowledge to enable fine-grained search. In Eunomia, we design an expressive DSL, Aes, that lets users precisely pinpoint local searching strategies to different parts of the target program. 
To further optimize local searching strategies, we design an interval-based algorithm that automatically isolates the context of variables for different local searching strategies, avoiding conflicts between local searching strategies for the same variable. We implement Eunomia as a symbolic execution platform targeting WebAssembly, which enables us to analyze applications written in various languages (like C and Go) but can be compiled into WebAssembly. To the best of our knowledge, Eunomia is the first symbolic execution engine that supports the full features of the WebAssembly runtime. We evaluate Eunomia with a dedicated microbenchmark suite for symbolic execution and six real-world applications. Our evaluation shows that Eunomia accelerates bug detection in real-world applications by up to three orders of magnitude. According to the results of a comprehensive user study, users can significantly improve the efficiency and effectiveness of symbolic execution by writing a simple and intuitive Aes script. Besides verifying six known real-world bugs, Eunomia also detected two new zero-day bugs in a popular open-source project, Collections-C.	翻訳日:2023-10-24 12:47:14 公開日:2023-06-18
# 適応可能なjson diffフレームワーク An adaptable JSON Diff Framework ( http://arxiv.org/abs/2305.05865v2 ) ライセンス: Link先を確認	Ao Sun	(参考訳) 本稿では,json-diffフレームワークであるjycmの実装について述べる。このフレームワークは"非順序"比較の概念を導入して既存のフレームワークを拡張し,ユーザが柔軟に比較シナリオをカスタマイズできる。さらに,jsonオブジェクト間の差異をより可視化し,理解するためのdiff-resultレンダラも提供する。私たちの作業は、より適応的で包括的な比較を可能にし、幅広いユースケースと要件に対応します。 In this paper, we present an implementation of JSON-diff framework JYCM, extending the existing framework by introducing the concept of "unordered" comparisons and allowing users to customize their comparison scenarios flexibly. Furthermore, we provide a diff-result renderer to visualize better and understand the differences between JSON objects. Our work enables more adaptable and comprehensive comparisons to accommodate a wider range of use cases and requirements.	翻訳日:2023-10-24 09:15:48 公開日:2023-06-18
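To illustrate what an "unordered" comparison means for JSON data, the following is a minimal sketch that treats arrays as order-insensitive by sorting them into a canonical form before comparing. It does not use or reproduce the JYCM API; the function names `canonical` and `json_equal` are hypothetical and exist only for this example.

```python
import json


def canonical(value, unordered=True):
    """Return a canonical form of a JSON value.

    With unordered=True, list elements are sorted by their serialized
    form so that two lists with the same elements in a different order
    become identical.
    """
    if isinstance(value, dict):
        return {key: canonical(item, unordered) for key, item in value.items()}
    if isinstance(value, list):
        items = [canonical(item, unordered) for item in value]
        if unordered:
            # sort_keys makes the serialization independent of dict key order
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value


def json_equal(left, right, unordered=True):
    """Compare two JSON values, optionally ignoring array order."""
    return canonical(left, unordered) == canonical(right, unordered)


left = {"tags": ["a", "b"], "items": [{"id": 1}, {"id": 2}]}
right = {"items": [{"id": 2}, {"id": 1}], "tags": ["b", "a"]}
print(json_equal(left, right))                   # arrays compared as multisets
print(json_equal(left, right, unordered=False))  # arrays compared position by position
```

A real diff framework would also report where the values differ; this sketch only answers whether they are equal under the chosen comparison mode.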
# 研究ソフトウェアのFAIR性を改善するメタデータに基づくエコシステム A Metadata-Based Ecosystem to Improve the FAIRness of Research Software ( http://arxiv.org/abs/2306.10620v1 ) ライセンス: Link先を確認	Patrick Kuckertz, Jan Göpfert, Oliver Karras, David Neuroth, Julian Schönau, Rodrigo Pueblas, Stephan Ferenz, Felix Engel, Noah Pflugradt, Jann M. Weinand, Astrid Nieße, Sören Auer, Detlef Stolten	(参考訳) 研究ソフトウェアの再利用は、研究効率と学術交流の中心である。ソフトウェアの適用により、さまざまなバックグラウンドを持つ研究者は、研究結果の再現、検証、拡張が可能になる。さらに、オープンソースコードの解析は、アプローチの理解、比較、統合に役立つ。しかし、関連するソフトウェアが見つからない、あるいは既存の研究プロセスと互換性がないため、それ以上の使用が行われないことが多い。これは反復的なソフトウェア開発をもたらし、個々の研究者や研究コミュニティ全体の進歩を妨げる。この記事では、詳細でマシン操作可能なメタデータを持つソフトウェアインターフェースのデータモデルを記述するための、DataDescエコシステムを紹介します。特別なメタデータスキーマに加えて、簡単に収集できる交換フォーマットとサポートツール、およびソフトウェアドキュメントの自動公開が導入されている。このアプローチは、研究ソフトウェアのFAIRness、すなわち、発見可能性、アクセシビリティ、相互運用性、そして再利用性を実際に高め、研究への影響を効果的に促進する。 The reuse of research software is central to research efficiency and academic exchange. The application of software enables researchers with varied backgrounds to reproduce, validate, and expand upon study findings. Furthermore, the analysis of open source code aids in the comprehension, comparison, and integration of approaches. Often, however, no further use occurs because relevant software cannot be found or is incompatible with existing research processes. This results in repetitive software development, which impedes the advancement of individual researchers and entire research communities. In this article, the DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata. In addition to a specialized metadata schema, an exchange format and support tools for easy collection and the automated publishing of software documentation are introduced. This approach practically increases the FAIRness, i.e., findability, accessibility, interoperability, and so the reusability of research software, as well as effectively promotes its impact on research.	翻訳日:2023-10-23 19:27:02 公開日:2023-06-18
# 2クラス依存サイクルのアンタングリングパターンに関する実証的研究 An Empirical Study of Untangling Patterns of Two-Class Dependency Cycles ( http://arxiv.org/abs/2306.10599v1 ) ライセンス: Link先を確認	Qiong Feng, Shuwen Liu, Huan Ji, Xiaotian Ma, Peng Liang	(参考訳) 依存性のサイクルは、ソフトウェアの品質と保守性に大きな課題をもたらします。しかし、実際のシナリオにおいて、実践者が依存性のサイクルをどのように解決するかの理解は限られている。本稿では,ソフトウェア開発者が2つのクラス間の依存性サイクルを実際に解決するための繰り返しパターンについて,実証的研究を行った。さまざまなドメインにまたがる18のオープンソースプロジェクトのデータを分析し,数百のサイクルアンタングリングケースを手作業で調査した。私たちの調査によると、開発者は依存性サイクルに対処するために5つの繰り返しパターンを使う傾向があります。選択されたパターンは、巡回クラス間の依存関係関係によって決定されるだけでなく、その設計コンテキスト、すなわち、巡回クラスが隣のクラスに依存するか、あるいは依存するかに非常に関係している。この経験的な研究を通じて、通常、サイクルのハンドリング中に開発者が犯した3つのよくある間違いを発見した。これらの繰り返しのパターンと依存性サイクルのプラクティスに見られるよくある誤りは、開発者の認識を改善するための分類法となり、ソフトウェア工学の学生や経験の浅い開発者のための教材としても使われる。また,依存性サイクルの内部構造を考慮することに加えて,自動ツールが依存関係サイクルのリファクタリングを支援するために,サイクルの設計コンテキストを考慮する必要があることも示唆した。 Dependency cycles pose a significant challenge to software quality and maintainability. However, there is limited understanding of how practitioners resolve dependency cycles in real-world scenarios. This paper presents an empirical study investigating the recurring patterns employed by software developers to resolve dependency cycles between two classes in practice. We analyzed the data from 18 open-source projects across different domains and manually inspected hundreds of cycle untangling cases. Our findings reveal that developers tend to employ five recurring patterns to address dependency cycles. The chosen patterns are not only determined by dependency relations between cyclic classes, but also highly related to their design context, i.e., how cyclic classes depend on or are depended by their neighbor classes. Through this empirical study, we also discovered three common mistakes developers usually made during cycles' handling. 
These recurring patterns and common mistakes observed in dependency cycles' practice can serve as a taxonomy to improve developers' awareness and also be used as learning materials for students in software engineering and inexperienced developers. Our results also suggest that, in addition to considering the internal structure of dependency cycles, automatic tools need to consider the design context of cycles to provide better support for refactoring dependency cycles.	翻訳日:2023-10-23 19:26:46 公開日:2023-06-18
# 感性分析のためのテキストアノテーションツールとしてのChatGPTの活用 Leveraging ChatGPT As Text Annotation Tool For Sentiment Analysis ( http://arxiv.org/abs/2306.17177v1 ) ライセンス: Link先を確認	Mohammad Belal, James She, Simon Wong	(参考訳) 感性分析は、あるテキストの感情的なトーンや極性を特定することを含む、よく知られた自然言語処理タスクである。ソーシャルメディアやその他のオンラインプラットフォームの成長に伴い、顧客からのフィードバックや意見の監視と理解を求める企業や組織にとって、感情分析はますます重要になっている。教師付き学習アルゴリズムはこのタスクに広く採用されているが、分類器を作成するには人間の注釈付きテキストが必要である。この課題を克服するために、レキシコンベースのツールが使用されている。辞書ベースのアルゴリズムの欠点は、事前に定義された感情レキシコンに依存していることだ。 ChatGPTはOpenAIの新製品で、最も人気のあるAI製品として登場した。さまざまなトピックやタスクに関する質問に答えることができる。本研究では、さまざまな感情分析タスクのためのデータラベリングツールとしてのChatGPTについて検討する。異なる目的の2つの感情分析データセットで評価する。以上の結果から,ChatGPTは他のレキシコンをベースとした非教師なし手法よりも高い性能を示し,全体的な精度が向上した。特に、最もパフォーマンスの良い語彙ベースのアルゴリズムと比較して、ChatGPTはツイートデータセットの精度が20%、Amazonレビューデータセットの約25%向上している。これらの結果は、感情分析タスクにおけるChatGPTの異常な性能を強調し、既存のレキシコンベースのアプローチをかなり上回った。この証拠は、異なる感情分析イベントやタスクのアノテーションとして使用できることを示唆している。 Sentiment analysis is a well-known natural language processing task that involves identifying the emotional tone or polarity of a given piece of text. With the growth of social media and other online platforms, sentiment analysis has become increasingly crucial for businesses and organizations seeking to monitor and comprehend customer feedback as well as opinions. Supervised learning algorithms have been popularly employed for this task, but they require human-annotated text to create the classifier. To overcome this challenge, lexicon-based tools have been used. A drawback of lexicon-based algorithms is their reliance on pre-defined sentiment lexicons, which may not capture the full range of sentiments in natural language. ChatGPT is a new product of OpenAI and has emerged as the most popular AI product. It can answer questions on various topics and tasks. This study explores the use of ChatGPT as a tool for data labeling for different sentiment analysis tasks. It is evaluated on two distinct sentiment analysis datasets with varying purposes. 
The results demonstrate that ChatGPT outperforms other lexicon-based unsupervised methods with significant improvements in overall accuracy. Specifically, compared to the best-performing lexical-based algorithms, ChatGPT achieves a remarkable increase in accuracy of 20% for the tweets dataset and approximately 25% for the Amazon reviews dataset. These findings highlight the exceptional performance of ChatGPT in sentiment analysis tasks, surpassing existing lexicon-based approaches by a significant margin. The evidence suggests it can be used for annotation on different sentiment analysis events and tasks.	翻訳日:2023-07-09 14:20:34 公開日:2023-06-18
# News Verifiers Showdown: News Fact-CheckingにおけるChatGPT 3.5, ChatGPT 4.0, Bing AI, Bardの比較評価 News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking ( http://arxiv.org/abs/2306.17176v1 ) ライセンス: Link先を確認	Kevin Matthe Caramancion	(参考訳) 本研究では,openai の chatgpt 3.5 と 4.0,google の bard (lamda) と microsoft の bing ai といった著名な大規模言語モデル (llm) の習熟度を評価することを目的とした。独立したファクトチェック機関から提供された100のファクトチェックされたニュースアイテムは、制御された条件下でこれら各llmにそれぞれ提示された。これらの回答は、true, false, and partial true/falseの3つのカテゴリの1つに分類された。 LLMの有効性は、独立機関が提供した検証事実に対する分類の正確さに基づいて評価された。結果は全モデル中適度な熟練度を示し、平均スコアは100点中65.25点であった。モデルのうち、OpenAIのGPT-4.0はスコア71で際立っており、偽造と事実を区別する新しいLSMの能力の限界が示唆された。しかし、人間のファクトチェッカーのパフォーマンスに逆らうと、AIモデルは、約束を示すにもかかわらず、ニュース情報に固有の微妙さとコンテキストを理解できない。この発見は、人間の認知スキルの重要性と、AI能力の継続的な進歩の必要性を強調しながら、ファクトチェックの領域におけるAIの可能性を強調している。最後に、この研究のシミュレーションから得られた実験データは、kaggleで公開されている。 This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard(LaMDA), and Microsoft's Bing AI in discerning the truthfulness of news items using black box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged based on the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed a moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's GPT-4.0 stood out with a score of 71, suggesting an edge in newer LLMs' abilities to differentiate fact from deception. 
However, when juxtaposed against the performance of human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the necessity for persistent advancements in AI capabilities. Finally, the experimental data produced from the simulation of this work is openly available on Kaggle.	翻訳日:2023-07-09 14:20:11 公開日:2023-06-18
# ソフトウェア問題の自動割り当てと分類 Automated Assignment and Classification of Software Issues ( http://arxiv.org/abs/2307.00009v1 ) ライセンス: Link先を確認	Büşra Tabak	(参考訳) ソフトウェアの問題には、開発中に新しいスレッドを修正、改善、作成するための作業単位が含まれ、チームメンバ間のコミュニケーションを容易にする。最も関係のあるチームメンバーにイシューを割り当てて、イシューのカテゴリを決定するのは、面倒で難しい作業です。間違った分類は、プロジェクトの遅延や再作業、チームメンバー間のトラブルを引き起こします。本論文は,浅層機械学習のための言語的特徴を注意深く整理し,浅層およびアンサンブル法の性能を深層言語モデルと比較するものである。 state-of-the-artとは異なり、私たちはソリューションの汎用性に貢献するために、特定の個人やチームではなく、4つの役割(設計者、開発者、テスター、リーダー)に問題を割り当てます。また、ソリューションの定式化における産業的プラクティスを反映した開発者の経験レベルも考えています。私たちは、問題をバグ、新機能、改善など、異なるクラスに分類する分類アプローチを採用しています。さらに、必要な修正の種類に基づいてバグをさらに分類する努力も行います。グローバルテレビプロデューサーの上位3社のうちの1社から5つの産業データセットを収集し,評価し,深層言語モデルと比較した。われわれのデータセットには5324の問題がある。浅い手法のアンサンブル分類器は問題割当ての0.92と、最先端のディープ言語モデルに統計的に匹敵する精度のイシュー分類の0.90を達成できることを示す。この貢献には、5つのアノテートされた産業問題データセットの公開共有、明確で包括的な特徴セットの開発、新しいラベルセットの導入、浅い機械学習技術のアンサンブル分類器の有効性の検証が含まれる。 Software issues contain units of work to fix, improve or create new threads during the development and facilitate communication among the team members. Assigning an issue to the most relevant team member and determining a category of an issue is a tedious and challenging task. Wrong classifications cause delays and rework in the project and trouble among the team members. This thesis proposes a set of carefully curated linguistic features for shallow machine learning methods and compares the performance of shallow and ensemble methods with deep language models. Unlike the state-of-the-art, we assign issues to four roles (designer, developer, tester, and leader) rather than to specific individuals or teams to contribute to the generality of our solution. We also consider the level of experience of the developers to reflect the industrial practices in our solution formulation. We employ a classification approach to categorize issues into distinct classes, namely bug, new feature, improvement, and other. Additionally, we endeavor to further classify bugs based on the specific type of modification required. 
We collect and annotate five industrial data sets from one of the top three global television producers to evaluate our proposal and compare it with deep language models. Our data sets contain 5324 issues in total. We show that an ensemble classifier of shallow techniques achieves 0.92 for issue assignment and 0.90 for issue classification in accuracy which is statistically comparable to the state-of-the-art deep language models. The contributions include the public sharing of five annotated industrial issue data sets, the development of a clear and comprehensive feature set, the introduction of a novel label set and the validation of the efficacy of an ensemble classifier of shallow machine learning techniques.	翻訳日:2023-07-09 14:02:31 公開日:2023-06-18
# 欺瞞的AIエコシステム: ChatGPTの事例 Deceptive AI Ecosystems: The Case of ChatGPT ( http://arxiv.org/abs/2306.13671v1 ) ライセンス: Link先を確認	Xiao Zhan, Yifan Xu, Stefan Sarkadi	(参考訳) AIチャットボットのChatGPTは、人間のような応答を生成する能力で人気を集めている。しかし、この機能にはいくつかのリスクが伴う。特に大きいのは、誤解を招く情報や捏造された情報をユーザーに提示し、さらなる倫理的問題を引き起こしかねない欺瞞的な振る舞いに起因するリスクである。社会的、文化的、経済的、政治的相互作用に対するChatGPTの影響をより深く理解するためには、ChatGPTが、様々な社会的圧力が開発と展開に影響を与える現実世界でどのように機能するかを検討することが不可欠である。本稿では,ChatGPTが組み込まれているエコシステムの一部として,ユーザの関与を重視しながら,ChatGPTを"野生"で研究する必要性を強調する。そこで我々は,ChatGPTの欺瞞的な人間らしい対話から生じる倫理的課題を考察し,より透明で信頼性の高いチャットボットを開発するためのロードマップを提案する。当社のアプローチの中心は、チャットボット技術の未来を形作る上で、積極的なリスクアセスメントとユーザ参加の重要性です。 ChatGPT, an AI chatbot, has gained popularity for its capability in generating human-like responses. However, this feature carries several risks, most notably due to its deceptive behaviour such as offering users misleading or fabricated information that could further cause ethical issues. To better understand the impact of ChatGPT on our social, cultural, economic, and political interactions, it is crucial to investigate how ChatGPT operates in the real world where various societal pressures influence its development and deployment. This paper emphasizes the need to study ChatGPT "in the wild", as part of the ecosystem it is embedded in, with a strong focus on user involvement. We examine the ethical challenges stemming from ChatGPT's deceptive human-like interactions and propose a roadmap for developing more transparent and trustworthy chatbots. Central to our approach is the importance of proactive risk assessment and user participation in shaping the future of chatbot technology.	翻訳日:2023-07-02 13:46:58 公開日:2023-06-18
# 「少し改題しようと思うかもしれない」--ピアツーリングにおけるヘッジの特定 "You might think about slightly revising the title": identifying hedges in peer-tutoring interactions ( http://arxiv.org/abs/2306.14911v1 ) ライセンス: Link先を確認	Yann Raphalen, Chloé Clavel, Justine Cassell	(参考訳) ヘッジは会話の相互作用の管理において重要な役割を果たす。ピア・チュータリングでは、インストラクションやネガティブなフィードバックの影響を抑えるために低いラプポートを経験するダイアド(インターロケーターのペア)の家庭教師が特に用いている。学習を改善するために学生とのラプポートを管理する学習エージェント構築の目的を追求し,マルチモーダルピアツーリングデータセットを用いてヘッジ識別のための計算フレームワークを構築した。我々は,社会科学文献の洞察を取り入れた,事前学習した資源を活用したアプローチを比較した。私たちの最高のパフォーマンスは、解釈しやすく、既存のベースラインを上回るハイブリッドアプローチでした。我々は,ピアツーリング会話におけるヘッジを特徴付ける特徴を探索するためにモデル説明可能性ツールを用い,新たな特徴とハイブリッドモデルアプローチの利点を明らかにした。 Hedges play an important role in the management of conversational interaction. In peer tutoring, they are notably used by tutors in dyads (pairs of interlocutors) experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with students in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features, and the benefits of such a hybrid model approach.	翻訳日:2023-07-02 13:27:28 公開日:2023-06-18
# llms時代におけるヒューマンラベルデータの重要性 The Importance of Human-Labeled Data in the Era of LLMs ( http://arxiv.org/abs/2306.14910v1 ) ライセンス: Link先を確認	Yang Liu	(参考訳) 大規模言語モデル(LLM)の出現は、カスタマイズされた機械学習モデルの開発に革命をもたらし、データ要件の再定義に関する議論を引き起こした。 LLMの訓練と実施によって促進される自動化は、人間レベルのラベリング介入が、教師付き学習の時代と同じレベルの重要さをもはや持たないという議論や願望につながった。本稿では LLM 時代における人間ラベルデータの継続的な関連性を支持する説得力のある議論について述べる。 The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.	翻訳日:2023-07-02 13:27:13 公開日:2023-06-18
# 動的ニューラルネットワークを用いた株価予測 Stock Price Prediction using Dynamic Neural Networks ( http://arxiv.org/abs/2306.12969v1 ) ライセンス: Link先を確認	David Noel	(参考訳) 本稿では,日替わり価格を予測する時系列動的ニューラルネットワークの解析と実装を行う。ニューラルネットワークはカオス、非線形、一見ランダムなデータの基本パターンを識別する能力を有しており、現在の多くの技術よりもはるかに正確に株価の動きを予測するメカニズムを提供する。基本技術、技術的手法、回帰手法を含むストック分析の現代的手法は、ニューラルネットワークのパフォーマンスと会話され、並列化される。また、効率的な市場仮説(EMH)を提示し、ニューラルネットワークを用いたカオス理論と対比する。本稿では,EMHを論じ,カオス理論を支持する。最後に、株価予測にニューラルネットワークを使用するための推奨事項を示す。 This paper will analyze and implement a time series dynamic neural network to predict daily closing stock prices. Neural networks possess unsurpassed abilities in identifying underlying patterns in chaotic, non-linear, and seemingly random data, thus providing a mechanism to predict stock price movements much more precisely than many current techniques. Contemporary methods for stock analysis, including fundamental, technical, and regression techniques, are conversed and paralleled with the performance of neural networks. Also, the Efficient Market Hypothesis (EMH) is presented and contrasted with Chaos theory using neural networks. This paper will refute the EMH and support Chaos theory. Finally, recommendations for using neural networks in stock price prediction will be presented.	翻訳日:2023-06-23 14:07:09 公開日:2023-06-18
# ラベル付き確率ブロックモデルにおけるインスタンス最適クラスタリカバリ Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model ( http://arxiv.org/abs/2306.12968v1 ) ライセンス: Link先を確認	Kaito Ariu, Alexandre Proutiere, Se-Young Yun	(参考訳) 我々は,有限個のクラスタを持つラベル付き確率ブロックモデル (lsbm) において隠れたコミュニティを回復する問題を考える。 LSBMでは、ラベルは(独立して)各アイテムに対して観測される。我々の目的は、観測されたラベルを用いてクラスタを復元する効率的なアルゴリズムを考案することである。この目的のために、任意のクラスタリングアルゴリズムで満たされる誤分類項目の期待数について、インスタンス固有の下限を再検討する。本稿では,これらの下位境界を期待値と高い確率で一致させる最初のアルゴリズムであるIACを提案する。 iacは1回のスペクトルクラスタリングアルゴリズムと反復的確率に基づくクラスタ割り当て改善からなる。このアプローチはインスタンス固有の低境界に基づいており、クラスタ数を含むモデルパラメータは一切必要としない。スペクトルクラスタリングを一度だけ実行することで、IACは$\mathcal{O}(n \text{polylog}(n))$の全体的な計算複雑性を維持する。本手法の有効性を数値実験により示す。 We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.	翻訳日:2023-06-23 14:07:00 公開日:2023-06-18
# Anchor-Guided Clustering と Spatio-Temporal Consistency ID Re Assignment によるマルチカメラ人物追跡の強化 Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment ( http://arxiv.org/abs/2304.09471v2 ) ライセンス: Link先を確認	Hsiang-Wei Huang, Cheng-Yen Yang, Zhongyu Jiang, Pyong-Kun Kim, Kyoungoh Lee, Kwangju Kim, Samartha Ramkumar, Chaitanya Mullapudi, In-Su Jang, Chung-I Huang, Jenq-Neng Hwang	(参考訳) マルチカメラの多人数追跡は、特に小売、医療センター、交通ハブなどの環境において、正確で効率的な屋内人物追跡システムへの需要が高まり、研究の重要領域になりつつある。我々は、アンカー誘導クラスタリングを用いて、幾何学に基づくクロスカメラIDの再割り当てのための、クロスカメラの再識別と時空間整合性を実現する、新しいマルチカメラ多人数追跡手法を提案する。本研究の目的は,各個人に特有の特徴を識別し,カメラ間の視界の重なりを利用して,実際のカメラパラメータを必要とせずに正確な軌跡の予測を行うことにより,トラッキングの精度を向上させることである。本手法は合成データと実世界のデータの両方を扱う際のロバスト性と有効性を示している。提案手法はCVPR AI City Challenge 2023データセットで評価され,95.36%のIDF1を達成し,第1位となった。コードはhttps://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRIで公開されている。 Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We proposed a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassigning. Our approach aims to improve the accuracy of tracking by identifying key features that are unique to every individual and utilizing the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data. The proposed method is evaluated on CVPR AI City Challenge 2023 dataset, achieving IDF1 of 95.36% with the first-place ranking in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.	翻訳日:2023-06-22 17:25:33 公開日:2023-06-18
# 注意に基づく畳み込みネットワークと説明可能なAIを用いた乳癌分離 Breast Cancer Segmentation using Attention-based Convolutional Network and Explainable AI ( http://arxiv.org/abs/2305.14389v2 ) ライセンス: Link先を確認	Jai Vardhan, Taraka Satya Krishna Teja Malisetti	(参考訳) 乳がん(BC)は依然として重大な健康上の脅威であり、現在長期治療は行われていない。早期発見は重要であるが、マンモグラフィーの解釈は高い偽陽性と陰性によって妨げられる。 BCは肺がんに勝ると予想され、早期発見法の改善が不可欠である。高分解能赤外線カメラを用いたサーモグラフィは、特に人工知能(ai)と組み合わせると期待できる。この研究は、セグメンテーションのための注意に基づく畳み込みニューラルネットワークを示し、BCの検出と分類のスピードと精度を高める。このシステムは画像を強化し、説明可能なAIを用いて癌セグメンテーションを行う。 irt画像を用いてunetアーキテクチャのバイアスと弱点領域を分析するために,障害同定のためのトランスフォーマッティングに基づく畳み込みアーキテクチャ(unet)を提案し,勾配重み付けクラスアクティベーションマッピング(grad-cam)を用いた。既存のディープラーニングフレームワークと比較して,提案フレームワークの優位性が確認された。 Breast cancer (BC) remains a significant health threat, with no long-term cure currently available. Early detection is crucial, yet mammography interpretation is hindered by high false positives and negatives. With BC incidence projected to surpass lung cancer, improving early detection methods is vital. Thermography, using high-resolution infrared cameras, offers promise, especially when combined with artificial intelligence (AI). This work presents an attention-based convolutional neural network for segmentation, providing increased speed and precision in BC detection and classification. The system enhances images and performs cancer segmentation with explainable AI. We propose a transformer-attention-based convolutional architecture (UNet) for fault identification and employ Gradient-weighted Class Activation Mapping (Grad-CAM) to analyze areas of bias and weakness in the UNet architecture with IRT images. The superiority of our proposed framework is confirmed when compared with existing deep learning frameworks.	翻訳日:2023-06-22 17:03:43 公開日:2023-06-18
# 一様量子重ね合わせ状態作成のための効率的な量子アルゴリズム An efficient quantum algorithm for preparation of uniform quantum superposition states ( http://arxiv.org/abs/2306.11747v1 ) ライセンス: Link先を確認	Alok Shukla, Prakash Vedula	(参考訳) $n$-qubitの計算基底状態の空でない部分集合上の一様重ね合わせを含む量子状態準備は、多くの量子計算アルゴリズムや応用において重要かつ困難なステップである。本研究は、$\ket{\Psi} = \frac{1}{\sqrt{M}}\sum_{j = 0}^{M - 1} \ket{j}$ の形の一様重ね合わせ状態を作成する問題を扱う。ここで、$M$は重ね合わせ状態における異なる状態の数を表し、$2 \leq M \leq 2^n$である。重ね合わせ状態 $\ket{\Psi}$ は、全ての$M$に対して、ゲートの複雑さと回路深さのみ$O(\log_2~M)$で効率的に作成できることが示される。これは、この問題の一般的な場合の文献における他の既存のアプローチと比較して、ゲート複雑性が指数関数的に減少することを示している。提案されたアプローチのもう1つの利点は、$n=\lceil \log_2~M \rceil$ qubitsしか必要としないことである。さらに、ancilla qubits や複数の制御を持つ量子ゲートは、一様重ね合わせ状態 $\ket{\Psi}$ を作成するのに必要としない。また、一様重ね合わせ状態の混合を含む多種多様な非一様重ね合わせ状態は、前述した一様重ね合わせ状態$\ket{\Psi}$を作成するのに使用されるのと同じ回路構成で効率よく生成できるが、修正されたパラメータで生成できることも示されている。 Quantum state preparation involving a uniform superposition over a non-empty subset of $n$-qubit computational basis states is an important and challenging step in many quantum computation algorithms and applications. In this work, we address the problem of preparation of a uniform superposition state of the form $\ket{\Psi} = \frac{1}{\sqrt{M}}\sum_{j = 0}^{M - 1} \ket{j}$, where $M$ denotes the number of distinct states in the superposition state and $2 \leq M \leq 2^n$. We show that the superposition state $\ket{\Psi}$ can be efficiently prepared with a gate complexity and circuit depth of only $O(\log_2~M)$ for all $M$. This demonstrates an exponential reduction in gate complexity in comparison to other existing approaches in the literature for the general case of this problem. Another advantage of the proposed approach is that it requires only $n=\lceil \log_2~M \rceil$ qubits. Furthermore, neither ancilla qubits nor any quantum gates with multiple controls are needed in our approach for creating the uniform superposition state $\ket{\Psi}$. 
It is also shown that a broad class of nonuniform superposition states that involve a mixture of uniform superposition states can also be efficiently created with the same circuit configuration that is used for creating the uniform superposition state $\ket{\Psi}$ described earlier, but with modified parameters.	翻訳日:2023-06-22 16:34:27 公開日:2023-06-18
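To make the target state $\ket{\Psi}$ concrete, here is a minimal Python sketch that writes out its amplitude vector classically. It only illustrates the definition in the abstract ($M$ equal amplitudes on $n=\lceil \log_2 M \rceil$ qubits). It is not the paper's $O(\log_2 M)$ circuit construction, and the function name `uniform_superposition` is a hypothetical helper introduced here.

```python
import math


def uniform_superposition(M: int) -> list[float]:
    """Amplitudes of (1/sqrt(M)) * sum_{j=0}^{M-1} |j> on n = ceil(log2 M) qubits.

    Classical illustration of the state definition only; this does not
    implement the gate-level preparation described in the abstract.
    """
    if M < 2:
        raise ValueError("the abstract assumes 2 <= M <= 2**n")
    # (M - 1).bit_length() equals ceil(log2 M) for every integer M >= 2.
    n = (M - 1).bit_length()
    amp = 1.0 / math.sqrt(M)
    # Basis states |0>, ..., |M-1> share one amplitude; the rest are zero.
    return [amp if j < M else 0.0 for j in range(2 ** n)]


state = uniform_superposition(5)   # 3 qubits, 5 nonzero amplitudes
norm_sq = sum(a * a for a in state)
```

For `M = 5` the vector has eight entries (three qubits), the first five of which equal $1/\sqrt{5}$, and `norm_sq` is 1 up to floating-point rounding.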
# ハイブリッドレンズの深部適応融合による光電界再構成 Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses ( http://arxiv.org/abs/2102.07085v3 ) ライセンス: Link先を確認	Jing Jin and Mantang Guo and Junhui Hou and Hui Liu and Hongkai Xiong	(参考訳) 本稿では,複数の低解像度カメラを取り囲む高分解能カメラを含むハイブリッドレンズからの高分解能光電界(lf)像の再構成の問題について検討する。既存手法の性能は, 平坦なテクスチャ領域のぼやけた結果や, 不連続境界付近の歪みなど, 依然として限られている。この課題に対処するために,2つの相補的および並列的な視点から入力の特徴を包括的に活用する,エンドツーエンドの学習ベースアプローチを提案する。具体的には、深い多次元およびクロスドメインの特徴表現を学習することにより、空間的に一貫した中間推定を回帰し、他方のモジュールは、高分解能ビューの情報を伝播することにより、高周波数テクスチャを維持する別の中間推定をワープする。最後に,2つの中間推定の利点を学習アテンションマップを通して適応的に活用し,平滑なテクスチャ領域と深さの不連続境界の両方において,最終的な高分解能のlf画像を得る。さらに,ハイブリッドLFイメージングシステムによって得られた実ハイブリッドデータに対して,シミュレーションハイブリッドデータを用いてトレーニングした手法の有効性を向上するために,ネットワークアーキテクチャとトレーニング戦略を慎重に設計する。実データとシミュレーションデータの両方について広範な実験を行った結果,最先端データよりも優れたアプローチが得られた。我々の知る限りでは、これは真のハイブリッド入力からのLF再構成のための最初のエンドツーエンドのディープラーニング手法である。我々のフレームワークは、高解像度なLFデータ取得のコストを削減し、LFデータストレージと送信の恩恵を受ける可能性があると考えています。 This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. 
We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.	翻訳日:2023-06-22 08:30:20 公開日:2023-06-18
# ゴールコンディショニングトランスポーターネットワークを用いた変形可能なケーブル、布地、バッグの再構成学習 Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks ( http://arxiv.org/abs/2012.03385v4 ) ライセンス: Link先を確認	Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng	(参考訳) ケーブル、布地、バッグなどの変形可能な物体の配置と操作は、ロボット操作における長年の課題である。変形可能な複雑なダイナミクスと高次元の構成空間は、剛性のある物体と比較すると、多段計画だけでなくゴールの仕様においても操作が困難である。ゴールは剛体のポーズほど簡単に特定できず、「バッグの中にアイテムを置く」といった複雑な空間関係を伴うこともある。本研究では,画像ベースゴールコンディショニングや複数ステップの変形操作を含む,1D,2D,3Dの変形可能な構造を持つシミュレーションベンチマークスイートを開発する。本稿では,最近提案されたロボット操作を学習するためのモデルアーキテクチャであるトランスポーターネットワークに目標条件を組み込む手法を提案する。シミュレーションおよび物理実験において、目標条件付きトランスポーターネットワークは、ターゲット位置に対するテスト時間視覚アンカーを使わずに、変形可能な構造を柔軟に指定した構成に操作できることを示した。また, 2次元および3次元の変形可能なタスクでテストすることにより, 変形可能なオブジェクトを操作するトランスポーターネットワークを用いて, 先行結果を著しく拡張した。補足資料はhttps://berkeleyautomation.github.io/bags/で入手できる。 Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag". In this work, we develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures, including tasks that involve image-based goal-conditioning and multi-step deformable manipulation. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation that rearranges deep features to infer displacements that can represent pick and place actions. 
In simulation and in physical experiments, we demonstrate that goal-conditioned Transporter Networks enable agents to manipulate deformable structures into flexibly specified configurations without test-time visual anchors for target locations. We also significantly extend prior results using Transporter Networks for manipulating deformable objects by testing on tasks with 2D and 3D deformables. Supplementary material is available at https://berkeleyautomation.github.io/bags/.	翻訳日:2023-06-22 08:29:19 公開日:2023-06-18
# 文脈広い音素クラス情報を活用した音声強調性能の向上 Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information ( http://arxiv.org/abs/2011.07442v5 ) ライセンス: Link先を確認	Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao	(参考訳) 従来,音声の音響的特徴を調音的特徴の場所/マンガで増大させることで,音声強調(SE)過程を導出することにより,音声の幅広い音韻特性を考慮し,性能向上を図ることができた。本稿では,音節属性の文脈情報を付加情報として検討し,SEをさらに活用する。より具体的には、幅広い音素クラス(bpcs)のシーケンスを予測するエンドツーエンド自動音声認識(e2e-asr)モデルによる損失を利用して、se性能を改善することを提案する。また,BPCをベースとしたE2E-ASRに基づくSEシステムの学習において,ASRを用いた多目的トレーニングと知覚的損失も開発した。音声の発声, 発声残響, 音声強調課題による実験結果から, 文脈的bpc情報がse性能を向上できることが確認された。さらに、BPCベースのE2E-ASRで訓練されたSEモデルは、音素ベースのE2E-ASRよりも優れている。その結果、ASRシステムによる音素の誤分類による目的が不完全なフィードバックにつながる可能性があり、BPCがよりよい選択である可能性が示唆された。最後に,重畳可能な音声目標を同一のBPCに組み合わせることで,SE性能を効果的に向上できることに注意する。 Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that objectives with misclassification of phonemes by the ASR system may lead to imperfect feedback, and BPC could be a potentially better choice. 
Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.	翻訳日:2023-06-22 08:28:56 公開日:2023-06-18
# 条件付き生成逆数ネットワークを用いた深層学習対流 Deep Learning Convective Flow Using Conditional Generative Adversarial Networks ( http://arxiv.org/abs/2005.06422v2 ) ライセンス: Link先を確認	Changlin Jiang, Amir Barati Farimani	(参考訳) 我々は,エネルギー輸送を伴う時間依存対流の学習と予測が可能な汎用ディープラーニングフレームワークfluidganを開発した。 fluidganは高速で正確でデータ駆動であり、基礎となる流体やエネルギー輸送物理学の知識なしに流体の物理を満たしている。また、FluidGANは速度、圧力、温度場の結合も学習する。我々の枠組みは、基礎となる物理モデルが複雑または未知である決定論的多物理現象を理解するのに役立つ。 We developed a general deep learning framework, FluidGAN, capable of learning and predicting time-dependent convective flow coupled with energy transport. FluidGAN is thoroughly data-driven with high speed and accuracy and satisfies the physics of fluid without any prior knowledge of underlying fluid and energy transport physics. FluidGAN also learns the coupling between velocity, pressure, and temperature fields. Our framework helps understand deterministic multiphysics phenomena where the underlying physical model is complex or unknown.	翻訳日:2023-06-22 08:27:10 公開日:2023-06-18
# 深層学習における認識的不確かさの定量化 Quantifying Epistemic Uncertainty in Deep Learning ( http://arxiv.org/abs/2110.12122v4 ) ライセンス: Link先を確認	Ziyi Huang, Henry Lam and Haofeng Zhang	(参考訳) 不確かさの定量化は、機械学習の信頼性と堅牢性の中核にある。本稿では,この不確実性,特に,深層学習において,不確実性(不確実性)を(訓練手順から)\textit{procedural variability} と(訓練データから) \textit{data variability} (訓練データから) に分解する理論的枠組みを提案する。次に,これらの不確実性を評価するための2つの手法を提案する。我々は,古典的な統計手法を適用する際の計算困難を克服する方法を実証する。複数の問題設定に関する実験的な評価は、我々の理論を裏付け、我々のフレームワークと推定が、どのようにしてモデリングとデータ収集の直接的なガイダンスを提供するかを説明する。 Uncertainty quantification is at the core of the reliability and robustness of machine learning. In this paper, we provide a theoretical framework to dissect the uncertainty, especially the \textit{epistemic} component, in deep learning into \textit{procedural variability} (from the training procedure) and \textit{data variability} (from the training data), which is the first such attempt in the literature to our best knowledge. We then propose two approaches to estimate these uncertainties, one based on influence function and one on batching. We demonstrate how our approaches overcome the computational difficulties in applying classical statistical methods. Experimental evaluations on multiple problem settings corroborate our theory and illustrate how our framework and estimation can provide direct guidance on modeling and data collection efforts.	翻訳日:2023-06-22 06:47:39 公開日:2023-06-18
# 一般化総変分最小化によるクラスタ化フェデレーション学習 Clustered Federated Learning via Generalized Total Variation Minimization ( http://arxiv.org/abs/2105.12769v4 ) ライセンス: Link先を確認	Yasmin SarcheshmehPour, Yu Tian, Linli Zhang, Alexander Jung	(参考訳) ネットワーク構造を持つローカルデータセットの分散収集のための局所的(あるいはパーソナライズされた)モデルを学習するための最適化手法を検討する。このネットワーク構造は、ローカルデータセット間の類似性のドメイン固有の概念から生じる。そのような概念の例としては、時空間的近接、統計的依存関係、機能的関係などがある。我々の主要な概念的貢献は、一般化総変動(GTV)最小化としてフェデレーション学習を定式化することである。この定式化は、既存の連合学習方法を統一し、大幅に拡張する。柔軟性が高く、一般化線形モデルやディープニューラルネットワークを含む幅広いパラメトリックモデルと組み合わせることができる。私たちのアルゴリズムの主な貢献は、完全に分散した連合学習アルゴリズムです。このアルゴリズムは、GTVの最小化を解くために確立された原始双対法を適用して得られる。メッセージパッシングとして実装することができ、処理時間や帯域幅を含む限られた計算資源から生じる不正確な計算に対して堅牢である。私たちの主な分析的貢献は、アルゴリズムが学習したローカルモデルパラメータと、oracleベースのクラスタ型フェデレーション学習方法との偏差の上限です。この上界は、ローカルモデルと、gtvの最小化が(ほぼ)均質なローカルデータセットをプールできるローカルデータセットのネットワーク構造に関する条件を明らかにする。 We study optimization methods to train local (or personalized) models for decentralized collections of local datasets with an intrinsic network structure. This network structure arises from domain-specific notions of similarity between local datasets. Examples for such notions include spatio-temporal proximity, statistical dependencies or functional relations. Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization. This formulation unifies and considerably extends existing federated learning methods. It is highly flexible and can be combined with a broad range of parametric models, including generalized linear models or deep neural networks. Our main algorithmic contribution is a fully decentralized federated learning algorithm. This algorithm is obtained by applying an established primal-dual method to solve GTV minimization. It can be implemented as message passing and is robust against inexact computations that arise from limited computational resources including processing time or bandwidth. 
Our main analytic contribution is an upper bound on the deviation between the local model parameters learnt by our algorithm and an oracle-based clustered federated learning method. This upper bound reveals conditions on the local models and the network structure of local datasets such that GTV minimization is able to pool (nearly) homogeneous local datasets.	翻訳日:2023-06-22 06:45:20 公開日:2023-06-18
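As a rough sketch of what a GTV-minimization objective of this kind can look like, one may write the following. The notation is assumed here for illustration and is not given in the abstract: nodes $i \in \mathcal{V}$ index the local datasets, edges $\mathcal{E}$ with weights $A_{ij}$ encode their similarity, $L_i$ is a local loss, $w^{(i)}$ are the local model parameters, $\phi$ is a penalty such as a norm, and $\lambda$ is a regularization strength. $$ \min_{\{w^{(i)}\}} \sum_{i \in \mathcal{V}} L_i\big(w^{(i)}\big) + \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij}\, \phi\big(w^{(i)} - w^{(j)}\big) $$ The second term penalizes variation of the parameters across strongly connected datasets, which is the mechanism that would let (nearly) homogeneous local datasets be pooled.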
# 確率近似と強化学習における漸近統計量のODE法 The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning ( http://arxiv.org/abs/2110.14427v3 ) ライセンス: Link先を確認	Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis and Sean Meyn	(参考訳) 本論文は、$d$次元の確率近似再帰 $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ を扱う。ここで $\Phi$ は一般状態空間 $\textsf{X}$ 上の幾何学的エルゴード的マルコフ連鎖で定常分布 $\pi$ を持ち、$f:\Re^d\times\textsf{X}\to\Re^d$ である。主な結果は、(DV3)として知られるDonsker-Varadhan Lyapunovドリフト条件と、ベクトル場 $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$ ($\Phi\sim\pi$) による平均流の安定性条件の下で確立される。 (i) $\{ \theta_n\}$ は $\bar{f}(\theta)$ の一意根 $\theta^*$ に概収束し、かつ $L_4$ 収束する。 (ii) 正規化誤差に対する通常の1次元CLTに加えて、関数型CLTが確立される。 (iii) ステップサイズに関する標準的な仮定の下で、平均化パラメータ $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$ の正規化版 $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$ に対してCLTが成り立つ。さらに、正規化された共分散は収束する: $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ ここで $\Sigma_\theta^*$ はPolyakとRuppertの最小共分散である。 (iv) $f$ と $\bar{f}$ が $\theta$ について線形で、マルコフ連鎖 $\Phi$ が幾何学的エルゴード的であるが(DV3)を満たさない例を示す。この場合、アルゴリズムは収束するが、2次モーメントは非有界である: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ ($n\to\infty$)。 The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. 
and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.	翻訳日:2023-06-22 06:36:26 公開日:2023-06-18
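The recursion and the Polyak-Ruppert average in this abstract can be illustrated with a small Python sketch. Everything specific here is an assumption chosen for illustration and is not from the paper: a scalar problem, the linear field $f(\theta, \Phi) = -(\theta - \theta^*) + \Phi$, i.i.d. Gaussian noise standing in for the Markov chain $\Phi$, and the step size $\alpha_n = n^{-0.7}$.

```python
import random


def sa_with_pr_averaging(n_steps: int, seed: int = 0) -> tuple[float, float]:
    """Run theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, Phi_{n+1}).

    Returns the last iterate and the Polyak-Ruppert average of the iterates.
    Toy setting (all assumptions): f(theta, phi) = -(theta - theta_star) + phi
    with i.i.d. N(0, 1) noise phi instead of a general Markov chain.
    """
    rng = random.Random(seed)
    theta_star = 2.0          # hypothetical root of the mean field
    theta = 0.0
    running_sum = 0.0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / n ** 0.7            # assumed step-size schedule
        phi = rng.gauss(0.0, 1.0)         # i.i.d. stand-in for Phi_{n+1}
        theta += alpha * (-(theta - theta_star) + phi)
        running_sum += theta
    return theta, running_sum / n_steps   # (theta_n, theta_n^PR)


last_iterate, pr_average = sa_with_pr_averaging(20_000)
```

In this toy setting both outputs end up close to the root 2.0, and the averaged estimate typically fluctuates less than the last iterate, which is the behavior the CLT for $\theta^{\text{PR}}_n$ describes.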
# metaverse: セキュリティとプライバシの懸念 Metaverse: Security and Privacy Concerns ( http://arxiv.org/abs/2203.03854v3 ) ライセンス: Link先を確認	Ruoyu Zhao, Yushu Zhang, Youwen Zhu, Rushi Lan, Zhongyun Hua	(参考訳) 現実世界に似た3次元仮想宇宙である「メタバース」という用語は、1990年代に先延ばしされて以来、常に想像力に満ちていた。近年,様々な技術の継続的な出現と進歩によってメタバースを実現することが可能となり,再び注目を浴びている。差別の削減、個人差の排除、社会化など、人間社会に多くの利益をもたらす可能性がある。しかし、すべてにはセキュリティとプライバシに関する懸念がある。本稿では,メタバースの概念をまず分析し,他のVR技術と比較して超仮想現実性(VR)エコシステムであることを示す。そして、ユーザ情報、コミュニケーション、シナリオ、グッズという4つの視点から、セキュリティとプライバシに関する懸念を慎重に分析し、詳細化します。一方、我々は、新たなバケット効果を利用して、哲学的な観点から、セキュリティとプライバシの懸念に包括的に対処する必要性を提起し、メタバースコミュニティに多少の進展をもたらすことを期待する。 The term "metaverse", a three-dimensional virtual universe similar to the real realm, has always been full of imagination since it was put forward in the 1990s. Recently, it is possible to realize the metaverse with the continuous emergence and progress of various technologies, and thus it has attracted extensive attention again. It may bring a lot of benefits to human society such as reducing discrimination, eliminating individual differences, and socializing. However, everything has security and privacy concerns, which is no exception for the metaverse. In this article, we firstly analyze the concept of the metaverse and propose that it is a super virtual-reality (VR) ecosystem compared with other VR technologies. Then, we carefully analyze and elaborate on possible security and privacy concerns from four perspectives: user information, communication, scenario, and goods, and immediately, the potential solutions are correspondingly put forward. Meanwhile, we propose the need to take advantage of the new buckets effect to comprehensively address security and privacy concerns from a philosophical perspective, which hopefully will bring some progress to the metaverse community.	翻訳日:2023-06-22 06:26:30 公開日:2023-06-18
# バックドアポゾンサンプル検出のためのプロアクティブMLアプローチに向けて Towards A Proactive ML Approach for Detecting Backdoor Poison Samples ( http://arxiv.org/abs/2205.13616v3 ) ライセンス: Link先を確認	Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal	(参考訳) 広告主は、トレーニングデータセットにバックドア毒サンプルを導入することで、ディープラーニングモデルにバックドアを埋め込むことができる。本研究は,バックドア攻撃の脅威を軽減するために,このような毒のサンプルを検出する方法を検討する。まず、最も先行作業の基盤となるポストホックなワークフローを明らかにし、ディフェンダーは攻撃の進行を受動的に許可し、その後攻撃後のモデルの特徴を活用して毒のサンプルを明らかにする。このワークフローがディフェンダーの能力を十分に活用していないことは明らかで、その上に構築されたディフェンスパイプラインは、多くのシナリオで障害やパフォーマンスの低下を引き起こします。第2に,モデルトレーニングと毒物検出パイプライン全体に対して,ディフェンダーが積極的に関与し,攻撃後のモデルの特徴を強要し,拡大し,毒物検出を容易にするという,積極的な考え方を促進することによるパラダイムシフトを提案する。これに基づいて統一フレームワークを定式化し,より堅牢で一般化可能な検出パイプラインの設計に関する実践的洞察を提供する。第3に,本フレームワークの具体的インスタンス化として,CT(Confusion Training)技術を導入する。 CTは、既に有毒なデータセットに追加の中毒攻撃を加え、検出にバックドアパターンを露出しながら、良性相関を積極的に分離する。 4種類のデータセットと14種類の攻撃に対する実証的評価は、14のベースライン防御に対するCTの優位性を検証した。 Adversaries can embed backdoors in deep learning models by introducing backdoor poison samples into training datasets. In this work, we investigate how to detect such poison samples to mitigate the threat of backdoor attacks. First, we uncover a post-hoc workflow underlying most prior work, where defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to uncover poison samples. We reveal that this workflow does not fully exploit defenders' capabilities, and defense pipelines built on it are prone to failure or performance degradation in many scenarios. Second, we suggest a paradigm shift by promoting a proactive mindset in which defenders engage proactively with the entire model training and poison detection pipeline, directly enforcing and magnifying distinctive characteristics of the post-attacked model to facilitate poison detection. Based on this, we formulate a unified framework and provide practical insights on designing detection pipelines that are more robust and generalizable. 
Third, we introduce the technique of Confusion Training (CT) as a concrete instantiation of our framework. CT applies an additional poisoning attack to the already poisoned dataset, actively decoupling benign correlation while exposing backdoor patterns to detection. Empirical evaluations on 4 datasets and 14 types of attacks validate the superiority of CT over 14 baseline defenses.	翻訳日:2023-06-22 06:18:36 公開日:2023-06-18
# 自動車用創発型視覚センサ Emergent Visual Sensors for Autonomous Vehicles ( http://arxiv.org/abs/2205.09383v2 ) ライセンス: Link先を確認	You Li, Julien Moreau, Javier Ibanez-Guzman	(参考訳) 自動運転車は、周囲を理解するために認識システムに依存している。カメラは、現代のコンピュータビジョンアルゴリズムが提供する物体検出と認識の利点から、lidarやレーダーなどの他のセンサーと比較して、知覚システムにとって不可欠である。しかし、その固有の撮像原理によって制限されるため、標準的なrgbカメラは、低照度、高コントラスト、霧・雨・雪などの悪天候など、様々な悪いシナリオで性能が低下する可能性がある。一方,2次元画像検出による3次元情報の推定は,ライダーやレーダーに比べて一般的に困難である。近年、従来のRGBカメラの限界に対応するために、いくつかの新しいセンシング技術が登場している。本稿では,赤外線カメラ,レンジゲートカメラ,偏光カメラ,イベントカメラの4つの新しいイメージセンサの原理を概観する。それらの比較優位性、既存または潜在的アプリケーション、および対応するデータ処理アルゴリズムはすべて、体系的な方法で提示される。本研究は、自動運転社会の実践者に対して、新たな視点と洞察を提供することを期待する。 Autonomous vehicles rely on perception systems to understand their surroundings for further navigation missions. Cameras are essential for perception systems due to the advantages of object detection and recognition provided by modern computer vision algorithms, comparing to other sensors, such as LiDARs and radars. However, limited by its inherent imaging principle, a standard RGB camera may perform poorly in a variety of adverse scenarios, including but not limited to: low illumination, high contrast, bad weather such as fog/rain/snow, etc. Meanwhile, estimating the 3D information from the 2D image detection is generally more difficult when compared to LiDARs or radars. Several new sensing technologies have emerged in recent years to address the limitations of conventional RGB cameras. In this paper, we review the principles of four novel image sensors: infrared cameras, range-gated cameras, polarization cameras, and event cameras. Their comparative advantages, existing or potential applications, and corresponding data processing algorithms are all presented in a systematic manner. We expect that this study will assist practitioners in the autonomous driving society with new perspectives and insights.	翻訳日:2023-06-22 06:17:24 公開日:2023-06-18
# 量子カーネルモデルにおける帯域幅の一般化 Bandwidth Enables Generalization in Quantum Kernel Models ( http://arxiv.org/abs/2206.06686v3 ) ライセンス: Link先を確認	Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin	(参考訳) 量子コンピュータは、いくつかの特殊な設定で古典的な最先端の機械学習手法を高速化することが知られている。例えば、量子カーネルの手法は離散対数問題の学習版で指数関数的な高速化をもたらすことが示されている。量子モデルの一般化を理解することは、実用上の問題において同様のスピードアップを実現するために不可欠である。最近の結果は、一般化が量子的特徴空間の指数的大きさによって妨げられることを証明している。これらの結果は量子モデルが量子ビットの数が大きい場合には一般化できないことを示唆するが、本論文ではこれらの結果は過度に制限的な仮定に依存していることを示す。我々は、量子カーネル帯域幅と呼ばれるハイパーパラメータを変化させることで、より広いモデルのクラスを考える。我々は、大量子ビット極限を解析し、閉形式で解ける量子モデルの一般化のための明示的な公式を提供する。具体的には、帯域幅の値を変更することで、任意の対象関数に一般化できないモデルから、整列した目標に対する良好な一般化を得られることを示す。本解析では,帯域幅がカーネル積分演算子のスペクトルを制御し,モデルの帰納バイアスを制御していることを示す。この理論が量子モデルの一般化にどのように影響するかを正確に予測できることを実証的に証明する。我々は、機械学習における量子優位性に対する結果の意義について論じる。 Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical interest. Recent results demonstrate that generalization is hindered by the exponential size of the quantum feature space. Although these results suggest that quantum models cannot generalize when the number of qubits is large, in this paper we show that these results rely on overly restrictive assumptions. We consider a wider class of models by varying a hyperparameter that we call quantum kernel bandwidth. We analyze the large-qubit limit and provide explicit formulas for the generalization of a quantum model that can be solved in closed form. Specifically, we show that changing the value of the bandwidth can take a model from provably not being able to generalize to any target function to good generalization for well-aligned targets. 
Our analysis shows how the bandwidth controls the spectrum of the kernel integral operator and thereby the inductive bias of the model. We demonstrate empirically that our theory correctly predicts how varying the bandwidth affects generalization of quantum models on challenging datasets, including those far outside our theoretical assumptions. We discuss the implications of our results for quantum advantage in machine learning.	翻訳日:2023-06-22 06:08:31 公開日:2023-06-18
# Live in the Moment: 政策の進化に適応した学習ダイナミクスモデル Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy ( http://arxiv.org/abs/2207.12141v3 ) ライセンス: Link先を確認	Xiyao Wang, Wichayaporn Wongkamjan, Ruonan Jia, Furong Huang	(参考訳) モデルベース強化学習(RL)は、ダイナミクスモデルを学習して政策学習のためのサンプルを生成することにより、実際にはモデルフリーRLよりも高いサンプル効率を達成することが多い。以前の研究は、すべての履歴ポリシーに対する経験的な状態-行動訪問分布、すなわちサンプル再生バッファに適合するダイナミクスモデルを学習していた。しかし本稿では、使用中のポリシーが時間とともに絶えず変化するため、\emph{すべての履歴ポリシー}の分布の下でダイナミクスモデルを適合させることが、必ずしも\emph{現在のポリシー}に対するモデル予測に有益であるとは限らないことを観察する。トレーニング中のポリシーの進化は、状態-行動訪問分布のシフトを引き起こす。我々は、履歴ポリシーにわたるこの分布シフトがモデル学習とモデルロールアウトに与える影響を理論的に分析する。次に、新しいダイナミクスモデル学習法である \textit{Policy-adapted Dynamics Model Learning (PDML)} を提案する。 PDMLは履歴ポリシーの混合分布を動的に調整し、学習したモデルが進化するポリシーの状態-行動訪問分布に継続的に適応できるようにする。 MuJoCoにおける一連の連続制御環境の実験により、PDMLは最先端のモデルベースRL法と組み合わせることで、サンプル効率を大幅に改善し、より高い漸近性能を達成することが示された。 Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for \emph{all historical policies} does not necessarily benefit model prediction for the \emph{current policy} since the policy in use is constantly evolving over time. The evolving policy during training will cause state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects the model learning and model rollouts. We then propose a novel dynamics model learning method, named \textit{Policy-adapted Dynamics Model Learning (PDML)}. PDML dynamically adjusts the historical policy mixture distribution to ensure the learned model can continually adapt to the state-action visitation distribution of the evolving policy. 
Experiments on a range of continuous control environments in MuJoCo show that PDML achieves significant improvement in sample efficiency and higher asymptotic performance combined with the state-of-the-art model-based RL methods.	翻訳日:2023-06-22 05:58:11 公開日:2023-06-18
# 大規模コーパスの意味的類似性分析に関する認知的研究:トランスフォーマーによるアプローチ A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach ( http://arxiv.org/abs/2207.11716v2 ) ライセンス: Link先を確認	Praneeth Nemani, Satyanarayana Vollala	(参考訳) 意味的類似性分析とモデリングは、今日の多くの自然言語処理の先駆的応用において、基本的に賞賛されているタスクである。シーケンシャルパターン認識の感覚により、RNNやLSTMのような多くのニューラルネットワークはセマンティック類似性モデリングにおいて満足な結果を得た。しかし、これらの解は、非系列的な方法で情報を処理できないため、不適切なコンテキスト抽出につながるため、非効率であると考えられている。トランスフォーマーは、非逐次データ処理や自己アテンションといった長所があるため、最先端アーキテクチャとして機能する。本稿では,従来の手法とトランスフォーマー方式の両方を用いて,米国特許用語のPhrase Matching Datasetに対する意味的類似性解析とモデリングを行う。提案手法は,4種類の復号化BERT-DeBERTaを試作し,K-Foldクロスバリデーションにより性能を向上する。実験の結果,従来の手法と比較して手法の性能が向上し,平均ピアソン相関スコアは0.79。 Semantic similarity analysis and modeling is a fundamentally acclaimed task in many pioneering applications of natural language processing today. Owing to the sensation of sequential pattern recognition, many neural networks like RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient due to their inability to process information in a non-sequential manner, thus leading to the improper extraction of context. Transformers function as the state-of-the-art architecture due to their advantages like non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S Patent Phrase to Phrase Matching Dataset using both traditional and transformer-based techniques. We experiment upon four different variants of the Decoding Enhanced BERT - DeBERTa and enhance its performance by performing K-Fold Cross-Validation. The experimental results demonstrate our methodology's enhanced performance compared to traditional techniques, with an average Pearson correlation score of 0.79.	翻訳日:2023-06-22 05:57:49 公開日:2023-06-18
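The metric reported in this abstract, the Pearson correlation between predicted and reference similarity scores, can be computed as in the following minimal Python sketch. The score lists are made-up illustrative values, not data from the paper, and `pearson` is a hypothetical helper written out from the textbook definition.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical predicted vs. reference similarity scores in [0, 1].
predicted = [0.10, 0.35, 0.50, 0.80, 0.95]
reference = [0.00, 0.25, 0.50, 0.75, 1.00]
score = pearson(predicted, reference)
```

A value of 1 means the predictions rank and scale exactly like the references. With K-fold cross-validation, one common way to summarize is to average the per-fold scores, which is presumably what the "average" in the abstract refers to.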
# CPU上のディープラーニングモデル:効率的なトレーニングの方法論 Deep Learning Models on CPUs: A Methodology for Efficient Training ( http://arxiv.org/abs/2206.10034v2 ) ライセンス: Link先を確認	Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, and Douglas C. Schmidt	(参考訳) GPUは、高度に並列化されたアーキテクチャのため、ディープラーニングモデルのトレーニングに好まれている。その結果、トレーニング最適化に関するほとんどの研究はGPUに焦点を当てている。しかし、トレーニング用の適切なハードウェアを選択する方法を決定する際には、コストと効率のトレードオフがしばしばあります。特にCPUサーバは、ハードウェア更新コストが少なく、既存のインフラをより活用できるため、CPU上でのトレーニングがより効率的であれば有益である。本稿では,CPUを用いた深層学習モデルの学習にいくつかの貢献をする。まず、Intel CPU上でディープラーニングモデルのトレーニングを最適化する手法と、パフォーマンスプロファイリングを改善するために開発したProfileDNNと呼ばれるツールキットを提案する。第2に、ワークフローをガイドする汎用的なトレーニング最適化手法を説明し、パフォーマンス問題を特定してPyTorch用のIntel Extensionを最適化したいくつかのケーススタディを探索する。その結果、RetinaNet-ResNext50モデル全体で2倍のトレーニングパフォーマンス向上が得られた。第3に、ProfileDNNの可視化機能を活用する方法を示す。この機能により、ボトルネックを特定し、PyTorchの公式リファレンス実装より2倍高速なカスタム焦点損失(focal loss)カーネルを作成することができた。 GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when deciding on how to choose the proper hardware for training. In particular, CPU servers can be beneficial if training on CPUs is more efficient, as they incur fewer hardware update costs and make better use of existing infrastructure. This paper makes several contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs and a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow and explores several case studies where we identified performance issues and then optimized the Intel Extension for PyTorch, resulting in an overall 2x training performance increase for the RetinaNet-ResNext50 model. 
Third, we show how to leverage the visualization capabilities of ProfileDNN, which enabled us to pinpoint bottlenecks and create a custom focal loss kernel that was two times faster than the official reference PyTorch implementation.	翻訳日:2023-06-22 05:56:13 公開日:2023-06-18
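For readers unfamiliar with the focal loss mentioned above, the following is a minimal scalar Python sketch of the standard binary focal loss definition, $-\alpha_t (1 - p_t)^\gamma \log p_t$. It is a plain reference of the formula only and not the custom kernel described in the abstract. The function name and the defaults `alpha=0.25`, `gamma=2.0` are illustrative assumptions.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25,
                      gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 0 or 1.
    The factor (1 - p_t) ** gamma down-weights well-classified examples.
    """
    p = min(max(p, eps), 1.0 - eps)          # guard against log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # confident and correct: small loss
hard = binary_focal_loss(0.1, 1)   # confident and wrong: large loss
```

With `gamma = 0` the expression reduces to an alpha-weighted cross-entropy, which is a quick sanity check for any optimized implementation.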
# SE(3)-DiffusionFields:拡散による関節握りと運動最適化のためのスムーズなコスト関数の学習 SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion ( http://arxiv.org/abs/2209.03855v4 ) ライセンス: Link先を確認	Julen Urain and Niklas Funk and Jan Peters and Georgia Chalvatzaki	(参考訳) 多目的最適化問題は、ロボット工学においてユビキタスである。例えば、ロボット操作タスクの最適化には、ポーズの設定、衝突、関節制限の把握に関する共同検討が必要である。いくつかの要求は容易に手作業で設計できるが、例えば、軌道の滑らかさはデータから学習する必要がある。本稿では,データ駆動型se(3)コスト関数を拡散モデルとして学習する手法を提案する。拡散モデルは高度に表現されたマルチモーダル分布を表現することができ、スコアマッチングトレーニングの目的のため、空間全体に適切な勾配を示すことができる。拡散モデルとしての学習コストは、他のコストとシームレスに1つの微分可能な目的関数に統合し、関節勾配に基づく運動最適化を可能にする。本研究では,6dof把持のためのse(3)拡散モデルの学習に着目し,把持選択と軌道生成を分離することなく,関節把持と運動最適化の新しい枠組みを創り出す。本研究は,SE(3)拡散モデルw.r.t.古典的生成モデルの表現力を評価し,代表的ベースラインに対するシミュレーションおよび実世界のロボット操作の一連のタスクにおいて,提案した最適化フレームワークの優れた性能を示す。 Multi-objective optimization problems are ubiquitous in robotics, e.g., the optimization of a robot manipulation task requires a joint consideration of grasp pose configurations, collisions and joint limits. While some demands can be easily hand-designed, e.g., the smoothness of a trajectory, several task-specific objectives need to be learned from data. This work introduces a method for learning data-driven SE(3) cost functions as diffusion models. Diffusion models can represent highly-expressive multimodal distributions and exhibit proper gradients over the entire space due to their score-matching training objective. Learning costs as diffusion models allows their seamless integration with other costs into a single differentiable objective function, enabling joint gradient-based motion optimization. In this work, we focus on learning SE(3) diffusion models for 6DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation. We evaluate the representation power of our SE(3) diffusion models w.r.t. 
classical generative models, and we showcase the superior performance of our proposed optimization framework in a series of simulated and real-world robotic manipulation tasks against representative baselines.	翻訳日:2023-06-22 05:49:27 公開日:2023-06-18
# 任意の次元に対するstabiliser符号の数え上げ Counting stabiliser codes for arbitrary dimension ( http://arxiv.org/abs/2209.01449v2 ) ライセンス: Link先を確認	Tanmay Singal, Che Chiang, Eugene Hsu, Eunsang Kim, Hsi-Sheng Goan and Min-Hsiu Hsieh	(参考訳) この作業では、任意の正の整数$d$に対して、$d$-dimensional qudits からなる $[[n,k]]_d$ 安定化符号の数を計算する。 Gross (Ref. [23]) による独創的な著作において、$[[n,k]]_d$安定化符号の数は、$d$ が素数である場合(または素数のべき、すなわち $d=p^m$ で、qudit が Galois qudit である場合)に計算された。 Ref. [23]の証明は、非素数の場合には適用できない。 この証明のために、グループ構造を $[[n,k]]_d$ コードに導入し、これを中国の剰余定理と組み合わせて $[[n,k]]_d$ コードの数を数える。我々の研究は $d$ が素数の場合に Ref. [23] と重なり、この場合、我々の結果は正確に一致するが、より一般的なケースでは結果が異なる。それにもかかわらず、安定化符号の総桁数は、その次元が素数であるか非素数であるかに依存しない。これは、安定化状態の数(またはより一般に安定化符号)を数えるために使われる方法が$d$が素数であるかどうかに依存するため、驚くべきことである。安定状態の濃度は、素数次元の場合(およびガロア・クディット素数-パワー次元の場合)でしか知られていなかったが、量子コンピューティングにおける多くの話題において量子化器として重要な役割を果たす。その中には、魔法の資源理論、設計理論、安定状態に対するデ・フィネッティの定理、クリフォード回路の古典的シミュラビリティの研究と最適化、小次元系の量子的文脈性の研究、ウィグナー函数の研究などが含まれる。我々の研究は、一般の場合でこの量子化器を利用できるので、素数次元でない量子系を素数次元系と同じ台座に配置する上で重要なステップである。 In this work, we compute the number of $[[n,k]]_d$ stabilizer codes made up of $d$-dimensional qudits, for arbitrary positive integers $d$. In a seminal work by Gross (Ref. [23]) the number of $[[n,k]]_d$ stabilizer codes was computed for the case when $d$ is a prime (or the power of a prime, i.e., $d=p^m$, but when the qudits are Galois-qudits). The proof in Ref. [23] is inapplicable to the non-prime case. For our proof, we introduce a group structure to $[[n,k]]_d$ codes, and use this in conjunction with the Chinese remainder theorem to count the number of $[[n,k]]_d$ codes. Our work overlaps with Ref. [23] when $d$ is a prime and in this case our results match exactly, but the results differ for the more generic case. Despite that, the overall order of magnitude of the number of stabilizer codes scales agnostic of whether the dimension is prime or non-prime. 
This is surprising since the method employed to count the number of stabilizer states (or more generally stabilizer codes) depends on whether $d$ is prime or not. The cardinality of stabilizer states, which was so far known only for the prime-dimensional case (and the Galois qudit prime-power dimensional case) plays an important role as a quantifier in many topics in quantum computing. Salient among these are the resource theory of magic, design theory, de Finetti theorem for stabilizer states, the study and optimisation of the classical simulability of Clifford circuits, the study of quantum contextuality of small-dimensional systems and the study of Wigner-functions. Our work makes available this quantifier for the generic case, and thus is an important step needed to place results for quantum computing with non-prime dimensional quantum systems on the same pedestal as prime-dimensional systems.	翻訳日:2023-06-22 05:48:41 公開日:2023-06-18
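For orientation, the previously known prime-dimensional count mentioned in the abstract can be evaluated directly. The sketch below assumes the standard closed form for the number of pure $n$-qudit stabilizer states in prime dimension $d$, namely $d^n \prod_{k=1}^{n} (d^k + 1)$. It does not reproduce the paper's new result for composite $d$, which the abstract does not state, and the function names are hypothetical.

```python
def is_prime(d: int) -> bool:
    """Trial-division primality check, adequate for small qudit dimensions."""
    if d < 2:
        return False
    i = 2
    while i * i <= d:
        if d % i == 0:
            return False
        i += 1
    return True


def count_stabilizer_states_prime(d: int, n: int) -> int:
    """Number of n-qudit stabilizer states for PRIME d (assumed closed form).

    Uses d^n * prod_{k=1..n} (d^k + 1). Composite d is rejected on purpose,
    because this formula is only the known prime-dimensional case.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not is_prime(d):
        raise ValueError("this closed form is only valid for prime d")
    total = d ** n
    for k in range(1, n + 1):
        total *= d ** k + 1
    return total
```

For qubits this gives the familiar values 6 for one qubit and 60 for two qubits, which is a quick sanity check on the formula.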
# グローバル収束勾配型バイレベルハイパーパラメータ最適化法 A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method ( http://arxiv.org/abs/2208.12118v2 ) ライセンス: Link先を確認	Ankur Sinha, Satender Gunwal and Shivam Kumar	(参考訳) 機械学習におけるハイパーパラメータ最適化は、通常、近似したハイパーパラメータセットのみをもたらすナイーブなテクニックによって達成される。ベイズ最適化のような手法は、与えられたハイパーパラメータの領域をインテリジェントに探索するが、最適解を保証しない。これらのアプローチの大きな欠点は、ハイパーパラメータの数で探索領域が指数関数的に増加し、計算コストが増加し、アプローチが遅くなることである。超パラメータ最適化問題は本質的には二段階最適化問題であり、この問題を解決するための二段階解法を試みている研究もある。しかしながら、これらの研究はトレーニング損失を最小限にするユニークなモデル重み付けを仮定している。本稿では,超パラメータ最適化問題の解法として,これらの欠点に対処する勾配法について述べる。提案手法は,実験で正規化ハイパーパラメータを選択した連続ハイパーパラメータを扱うことができる。この手法は、理論的に証明された最適パラメータの集合への収束を保証する。この考え方はガウス過程回帰を用いた低レベル最適値関数の近似に基づいている。その結果、二レベル問題は、拡張ラグランジアン法を用いて解決される単一レベル制約最適化タスクに還元される。我々は,MNISTおよびCIFAR-10データセットを多層パーセプトロンおよびLeNetアーキテクチャ上で広範囲に計算し,提案手法の有効性を確認した。格子探索, ランダム探索, ベイズ最適化, ハイバーバンド法の比較研究により, 提案アルゴリズムはより低い計算量に収束し, テストセットをより一般化するモデルが導かれることを示した。 Hyperparameter optimization in machine learning is often achieved using naive techniques that only lead to an approximate set of hyperparameters. Although techniques such as Bayesian optimization perform an intelligent search on a given domain of hyperparameters, it does not guarantee an optimal solution. A major drawback of most of these approaches is an exponential increase of their search domain with number of hyperparameters, increasing the computational cost and making the approaches slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for solving this problem. However, these studies assume a unique set of model weights that minimize the training loss, which is generally violated by deep learning architectures. This paper discusses a gradient-based bilevel method addressing these drawbacks for solving the hyperparameter optimization problem. The proposed method can handle continuous hyperparameters for which we have chosen the regularization hyperparameter in our experiments. 
The method is theoretically proven to converge to the set of optimal hyperparameters. The idea is based on approximating the lower-level optimal value function using Gaussian process regression. As a result, the bilevel problem is reduced to a single-level constrained optimization task that is solved using the augmented Lagrangian method. We have performed an extensive computational study on the MNIST and CIFAR-10 datasets on multi-layer perceptron and LeNet architectures that confirms the efficiency of the proposed method. A comparative study against grid search, random search, Bayesian optimization, and the Hyperband method on various hyperparameter problems shows that the proposed algorithm converges with lower computation and leads to models that generalize better on the testing set.	翻訳日:2023-06-22 05:47:26 公開日:2023-06-18
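The abstract above reduces the bilevel problem to a single-level constrained task solved with the augmented Lagrangian method. As a rough illustration of that last step only, here is a minimal sketch on a toy problem: minimize $(x-2)^2$ subject to $x - 1 = 0$. The toy objective, the equality constraint, the penalty value `rho`, and the function name are all assumptions made for illustration; the paper's actual constraint involves a Gaussian-process approximation of the lower-level value function, which is not modeled here.

```python
def augmented_lagrangian_toy(rho: float = 10.0, iterations: int = 30):
    """Augmented Lagrangian iterations for: min (x-2)^2  s.t.  x - 1 = 0.

    L(x, lam) = (x-2)^2 + lam*(x-1) + (rho/2)*(x-1)^2.
    The inner minimisation over x has a closed form for this quadratic toy,
    so no numerical optimiser is needed.
    """
    lam = 0.0  # Lagrange multiplier estimate
    x = 0.0
    for _ in range(iterations):
        # Stationarity in x: 2(x-2) + lam + rho*(x-1) = 0
        x = (4.0 - lam + rho) / (2.0 + rho)
        # Multiplier update driven by the constraint violation h(x) = x - 1
        lam += rho * (x - 1.0)
    return x, lam
```

The iterates approach the constrained minimiser $x = 1$ with multiplier $\lambda = 2$, which satisfies the stationarity condition $2(x-2) + \lambda = 0$.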
# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v4 ) ライセンス: Link先を確認	Hao Xue and Flora D. Salim	(参考訳) 本稿では,時系列予測の新しい視点を提案する。既存の時系列予測手法では、モデルは入力として数値の列を取り、出力として数値値を生成する。既存のSOTAモデルはトランスフォーマーアーキテクチャに基づいており、複数のエンコーディング機構で変更され、歴史的データのコンテキストとセマンティクスが組み込まれている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを疑問視する。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (promptcast)を提案する。この新しいタスクでは、数値入力と出力をプロンプトに変換し、予測タスクを文から文へのフレーム化することで、予測目的の言語モデルを直接適用することができる。本研究を支援するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)を提案する。我々は異なるSOTA数値に基づく予測手法と言語生成モデルを評価する。様々な予測設定によるベンチマーク結果は、言語生成モデルで提案するプロンプトキャストが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較すると、PromptCastはゼロショット設定下でのより優れた一般化能力を示す。 This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. 
The benchmark results with various forecasting settings demonstrate that the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.	翻訳日:2023-06-22 05:28:20 公開日:2023-06-18
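The sentence-to-sentence framing described in the abstract above can be sketched as a pair of helpers: one that turns a numerical history into a text prompt, and one that reads a number back out of the generated sentence. The template wording, the function names, and the parsing rule are hypothetical; the actual PISA dataset templates may look different.

```python
import re


def build_prompt(history, quantity="the value", next_step="the next step"):
    """Render a numerical history as an input sentence plus a question."""
    joined = ", ".join(str(v) for v in history)
    return (
        f"From step 1 to step {len(history)}, {quantity} was {joined}. "
        f"What will {quantity} be on {next_step}?"
    )


def parse_forecast(sentence):
    """Return the first number found in a generated answer, or None.

    Taking the first number is a simplification: an answer that mentions a
    step index before the forecast value would be parsed incorrectly.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", sentence)
    return float(match.group()) if match else None
```

A language model would be called between the two helpers; that call is left out because the abstract does not name a specific model or API.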
# エッジ対応事前学習によるMR画像合成のためのマルチスケールトランスネットワーク Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis ( http://arxiv.org/abs/2212.01108v3 ) ライセンス: Link先を確認	Yonghao Li, Tao Zhou, Kelei He, Yi Zhou, Dinggang Shen	(参考訳) 磁気共鳴(MR)画像合成は、与えられたモダリティから欠落するモダリティを生成するために用いられる。既存の(教師付き学習)手法は、効果的な合成モデルを訓練するために、多数のペアのマルチモーダルデータを必要とすることが多い。しかし、教師付きトレーニングに十分なペアデータを得ることは、しばしば困難である。実際、ペアデータの数は少ないが、ペアデータの数は少ないことが多い。本稿では,2つのペアデータとアンペアデータの両方を活用するために,エッジ対応MR画像合成のためのマルチスケールトランスフォーマーネットワーク(MT-Net)を提案する。具体的には、Edge保存型Masked AutoEncoder(Edge-MAE)を自己教師方式で事前訓練し、同時に実行する。 1)各画像にランダムにマスキングされたパッチに対する画像インプテーション 2)エッジマップ全体の推定はコンテキスト情報と構造情報の両方を効果的に学習する。さらに,各対策の難しさに応じて異なるマスクパッチを別々に処理することにより,Edge-MAEの性能を向上させるパッチワイド・ロスを提案する。提案した事前学習に基づいて、後続の微調整段階において、事前訓練したエッジ-MAEのエンコーダから抽出したマルチスケール特徴を統合することにより、欠損モード画像を合成するデュアルスケール選択融合(DSF)モジュールを設計(MT-Net)する。さらに、この事前学習エンコーダを用いて、合成画像と、トレーニングにおいて類似(一貫性)を必要とする対応する接地構造画像から高レベル特徴を抽出する。実験の結果, MT-Net は, 利用可能な全ペアデータに対して 70 % の費用を用いても, 競合する手法と同等の性能を発揮することがわかった。私たちのコードはhttps://github.com/lyhkevin/mt-netで公開されます。 Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In reality, we often have a small number of paired data while a large number of unpaired data. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. 
Besides, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulties of their respective imputations. Based on this proposed pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Further, this pre-trained encoder is also employed to extract high-level features from the synthesized image and corresponding ground-truth image, which are required to be similar (consistent) in the training. Experimental results show that our MT-Net achieves comparable performance to the competing methods even using $70\%$ of all available paired data. Our code will be publicly available at https://github.com/lyhkevin/MT-Net.	翻訳日:2023-06-22 05:00:35 公開日:2023-06-18
# 注意機構に基づくBi-LSTM価格予測 Bi-LSTM Price Prediction based on Attention Mechanism ( http://arxiv.org/abs/2212.03443v2 ) ライセンス: Link先を確認	Jiashu Lou, Leyi Cui, Ye Li	(参考訳) 金融デリバティブ市場の拡大と発展に伴い、取引の頻度もより速く、より速くなります。人間の限界により、最近はアルゴリズムと自動トレーディングが議論の中心となっている。本稿では,金とビットコインという2つの一般的な資産をベースとした,注目機構に基づく双方向LSTMニューラルネットワークを提案する。機能工学の面では,従来の技術要素を付加すると同時に,時系列モデルを組み合わせることで,要因の開発も行います。モデルパラメータの選択において、我々は最終的に2層深層学習ネットワークを選択した。 aucの測定によれば、bitcoinと金の正確性はそれぞれ71.94%と73.03%である。予測結果を用いて,2年間で1089.34%のリターンを達成した。同時に,本論文で提案した Bi-LSTM モデルと従来のモデルとの比較を行い,本モデルがデータセット上で最高の性能を示すことを示す。最後に, モデルの重要性と実験結果, 今後の改善方向性について考察する。 With the increasing enrichment and development of the financial derivatives market, the frequency of transactions is also faster and faster. Due to human limitations, algorithms and automatic trading have recently become the focus of discussion. In this paper, we propose a bidirectional LSTM neural network based on an attention mechanism, which is based on two popular assets, gold and bitcoin. In terms of Feature Engineering, on the one hand, we add traditional technical factors, and at the same time, we combine time series models to develop factors. In the selection of model parameters, we finally chose a two-layer deep learning network. According to AUC measurement, the accuracy of bitcoin and gold is 71.94% and 73.03% respectively. Using the forecast results, we achieved a return of 1089.34% in two years. At the same time, we also compare the attention Bi-LSTM model proposed in this paper with the traditional model, and the results show that our model has the best performance in this data set. Finally, we discuss the significance of the model and the experimental results, as well as the possible improvement direction in the future.	翻訳日:2023-06-22 04:48:07 公開日:2023-06-18
# ディープニューラルネットワークは2年生よりスマートか? Are Deep Neural Networks SMARTer than Second Graders? ( http://arxiv.org/abs/2212.09993v5 ) ライセンス: Link先を確認	Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum	(参考訳) 最近では、高度な認知能力を必要とするタスク(例えば、囲い込み、アートの生成、チャットgptなど)を解決するためのディープニューラルネットワークの応用が増えている。幅広いスキルを必要とする問題を解決する上で、ニューラルネットワークはどの程度一般化可能か? この質問に答えるために、ニューラルネットワークの抽象化、推論、一般化能力を評価するための、単純なマルチモーダルアルゴリズム推論タスクと関連するsmart-101データセットを提案する。私たちのデータセットは101の独特なパズルで構成されており、それぞれのパズルは絵と質問で構成されており、それらの解には算術、代数、空間的推論などいくつかの基本的なスキルが必要です。ディープニューラルネットワークのトレーニングに向けてデータセットをスケールするために、解アルゴリズムを維持しながら、パズルごとに完全に新しいインスタンスをプログラムで生成する。 SMART-101の性能をベンチマークするために,様々な最先端のバックボーンを用いた視覚・言語メタラーニングモデルを提案する。実験の結果,強力な深層モデルでは教師付き環境下でのパズルに対して妥当な性能が得られたが,一般化のための解析ではランダムな精度に劣らないことがわかった。また,最近のChatGPTや他の大規模言語モデルをSMART-101のサブセットで評価した結果,これらのモデルが合理的な推論能力を示す一方で,解答はしばしば誤りであることがわかった。 Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6--8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle, while retaining their solution algorithm. 
To benchmark performances on SMART-101, we propose a vision and language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models offer reasonable performances on puzzles in a supervised setting, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT and other large language models on a subset of SMART-101 and find that while these models show convincing reasoning abilities, the answers are often incorrect.	翻訳日:2023-06-22 04:41:03 公開日:2023-06-18
# ブラウアー群同変ニューラルネットワーク Brauer's Group Equivariant Neural Networks ( http://arxiv.org/abs/2212.08630v2 ) ライセンス: Link先を確認	Edward Pearce-Crump	(参考訳) 私たちは、機械学習の文献に欠けている3つの対称性群に対して、層が$\mathbb{r}^{n}$のテンソルパワーを持つ可能性のある全てのグループ同変ニューラルネットワークの完全な特徴付けを提供する:$o(n)$、特別な直交群である$so(n)$、シンプレクティック群である$sp(n)$。特に、この群が$O(n)$または$SO(n)$であるとき、および群が$Sp(n)$であるときの$\mathbb{R}^{n}$のシンプレクティック基底において、そのようなテンソルパワー空間の間の学習可能で線型で同変な層函数のスパンニング集合を見つける。 We provide a full characterisation of all of the possible group equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$ for three symmetry groups that are missing from the machine learning literature: $O(n)$, the orthogonal group; $SO(n)$, the special orthogonal group; and $Sp(n)$, the symplectic group. In particular, we find a spanning set of matrices for the learnable, linear, equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$ when the group is $O(n)$ or $SO(n)$, and in the symplectic basis of $\mathbb{R}^{n}$ when the group is $Sp(n)$.	翻訳日:2023-06-22 04:39:50 公開日:2023-06-18
# 大規模フレキシブルタイトガウス混合モデルの確率的1次学習 Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model ( http://arxiv.org/abs/2212.05402v2 ) ライセンス: Link先を確認	Mohammad Pasande, Reshad Hosseini, Babak Nadjar Araabi	(参考訳) ガウス混合モデル(英: Gaussian Mixture Models、GMM)は、多くの科学的領域に適用できるカーネルモデルに基づく最も強力なパラメトリック密度推定器の1つである。近年、データソースの劇的な拡大に伴い、典型的な機械学習アルゴリズム、例えば期待最大化(EM)は、高次元およびストリーミングデータで困難に直面する。さらに、複雑な密度はしばしば多数のガウス成分を必要とする。本稿では,一階確率最適化を用いたGMMの高速オンラインパラメータ推定アルゴリズムを提案する。このアプローチは、高次元のストリーミングデータや複雑な密度に直面した場合のGMMの課題に対応するためのフレームワークを提供する。直交性を保存する新しい確率多様体最適化アルゴリズムを導入し、よく知られたユークリッド空間の数値最適化と共に用いる。合成データと実データの両方における数多くの実験結果により,提案手法がEM法よりも精度良く収束し,収束に必要なエポック数が少なく,エポック当たりの時間消費も少ないという点で有効であることが証明された。 Gaussian Mixture Models (GMM) are one of the most potent parametric density estimators based on the kernel model that find application in many scientific domains. In recent years, with the dramatic enlargement of data sources, typical machine learning algorithms, e.g. Expectation Maximization (EM), encounter difficulty with high-dimensional and streaming data. Moreover, complicated densities often demand a large number of Gaussian components. This paper proposes a fast online parameter estimation algorithm for GMM by using first-order stochastic optimization. This approach provides a framework to cope with the challenges of GMM when faced with high-dimensional streaming data and complex densities by leveraging the flexibly-tied factorization of the covariance matrix. A new stochastic manifold optimization algorithm that preserves the orthogonality is introduced and used along with the well-known Euclidean space numerical optimization. Numerous empirical results on both synthetic and real datasets justify the effectiveness of our proposed stochastic method over EM-based methods in the sense of a better-converged maximum of the likelihood function, fewer epochs needed for convergence, and less time consumption per epoch.	翻訳日:2023-06-22 04:38:28 公開日:2023-06-18
# ディープ線形ネットワークにおけるニューラル崩壊:バランスデータから不均衡データへ Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data ( http://arxiv.org/abs/2301.00437v5 ) ライセンス: Link先を確認	Hien Dang and Tho Tran and Stanley Osher and Hung Tran-The and Nhat Ho and Tan Nguyen	(参考訳) 最近のディープニューラルネットワークは、画像分類から自然言語処理まで、タスクで素晴らしいパフォーマンスを達成している。驚くべきことに、大量のパラメータを持つこれらの複雑なシステムは、収束までのトレーニングにおいて、最終層の特徴と分類器において同じ構造特性を示す。特に、ラスト層の特徴はクラス平均に崩壊し、それらのクラス平均は単純等角タイトフレーム(etf)の頂点であることが観察されている。この現象はNeural Collapse(NC)として知られている。近年の論文では、単純化された"unconstrained feature model"を用いた学習問題の大域的最小化にncが現れることが理論的に示されている。この文脈では、一般的な平均二乗誤差 (MSE) とクロスエントロピー (CE) の損失に対して、より深い線形ネットワークにおけるNCの発生を証明し、大域的な解が線形層にまたがるNC特性を示すことを示す。さらに,本研究をmse損失に対する不均衡データに拡張し,バイアスフリー設定下でのncの最初の幾何解析を提案する。以上の結果から,最終層の特徴と分類器の直交ベクトルからなる幾何への収束が,対応するクラスにおけるデータ量に依存することを示す。最後に、バランスの取れたシナリオと不均衡なシナリオの両方で、合成および実用的なネットワークアーキテクチャに関する理論的解析を実証的に検証する。 Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model". In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of NC under bias-free setting. 
Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures in both balanced and imbalanced scenarios.	翻訳日:2023-06-22 04:28:25 公開日:2023-06-18
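The simplex Equiangular Tight Frame (ETF) mentioned in the abstract above has a standard explicit construction, sketched here for $K$ classes as the rows of $\sqrt{K/(K-1)}\,(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$. This is the textbook construction in its own $K$-dimensional coordinates, not code from the paper; the function names are hypothetical.

```python
import math


def simplex_etf(k: int):
    """Return k unit vectors (as rows) forming a simplex ETF in R^k."""
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

Every row has unit norm and every pair of distinct rows has inner product $-1/(K-1)$, the maximally separated equiangular configuration that class means approach under Neural Collapse in the balanced case.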
# カリキュラムによるシングルタスクrlの複雑性向上の理解 Understanding the Complexity Gains of Single-Task RL with a Curriculum ( http://arxiv.org/abs/2212.12809v3 ) ライセンス: Link先を確認	Qiyang Li, Yuexiang Zhai, Yi Ma, Sergey Levine	(参考訳) 強化学習 (Reinforcement Learning, RL) の問題は, 十分な報奨がなければ難しい。証明可能なRL法に関する先行研究は、一般的にこの問題に専用の探索戦略で対処することを提案している。しかし、この課題に取り組む別の方法は、タスク空間が興味深いタスクだけでなく、暗黙的にカリキュラムとして機能する簡単なタスクを含むマルチタスクrl問題として再編成することである。このような改革により、既存のマルチタスクRLメソッドをスクラッチから1つの課題を解決するためのより効率的な代替手段として実行することが可能となる。本研究では,単タスクrl問題をカリキュラムで定義されたマルチタスクrl問題として再構成する理論的枠組みを提案する。カリキュラムの厳密な規則性条件下では、マルチタスクRL問題における各タスクの逐次的解決は、明確な探索ボーナスや探索戦略を伴わずに、元の単一タスク問題の解決よりも計算的に効率的であることを示す。また, シミュレーションロボットタスクにおけるカリキュラム学習を高速化する効果的な実践的学習アルゴリズムに, 理論的洞察を変換できることを示した。 Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.	翻訳日:2023-06-22 04:28:02 公開日:2023-06-18
# 量子チャネルの時間表現のヒット:既約の場合とユニタリウォークへの応用を超えて Hitting time expressions for quantum channels: beyond the irreducible case and applications to unitary walks ( http://arxiv.org/abs/2301.07003v3 ) ライセンス: Link先を確認	C. F. Lardizabal and L. F. L. Pereira	(参考訳) この研究では、有限次元ヒルベルト空間に作用する量子チャネルに関連する一般化された逆数を用いて、粒子が選択されたゴール部分空間に到達する平均ヒット時間を計算することができる。この研究で研究されている問題は、グラフ、特に量子マルコフ連鎖の量子力学に関する最近の結果に動機づけられている。我々は,一般化された逆数と打点時間がどのように得られるかを記述することに集中する。 a) 既約性の概念を弱めることができるので、既約の例も考慮できる。 b) 一般正のトレース保存地図に対する任意の到着部分空間を考えることができる。可算写像の自然な例はユニタリ量子ウォークによって与えられる。また、より特定の逆元、すなわち群逆元が我々の文脈でどのように現れるかを説明し、独立した興味を持つ行列代数的構成と関係付ける。 In this work we make use of generalized inverses associated with quantum channels acting on finite-dimensional Hilbert spaces, so that one may calculate the mean hitting time for a particle to reach a chosen goal subspace. The questions studied in this work are motivated by recent results on quantum dynamics on graphs, most particularly quantum Markov chains. We focus on describing how generalized inverses and hitting times can be obtained, with the main novelties of this work with respect to previous ones being that a) we are able to weaken the notion of irreducibility, so that reducible examples can be considered as well, and b) one may consider arbitrary arrival subspaces for general positive, trace preserving maps. Natural examples of reducible maps are given by unitary quantum walks. We also take the opportunity to explain how a more specific inverse, namely the group inverse, appears in our context, in connection with matrix algebraic constructions which may be of independent interest.	翻訳日:2023-06-22 04:19:54 公開日:2023-06-18
# 多次元概念発見(MCD):完全性を保証する統一フレームワーク Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees ( http://arxiv.org/abs/2301.11911v2 ) ライセンス: Link先を確認	Johanna Vielhaben, Stefan Bl\"ucher, Nils Strodthoff	(参考訳) 完全性公理は、モデルに局所的に忠実である、すなわち一つの決定に対してのみ、ポストホックなXAI法の説明を与える。 XAIの信頼できる応用、特に高い意思決定には、よりグローバルなモデル理解が必要です。近年,概念に基づく手法が提案されているが,実際のモデル推論に縛られることは保証されていない。この問題を回避するために,概念レベルの完全性関係を満たす従来のアプローチの拡張として,多次元概念発見(MCD)を提案する。提案手法は一般線形部分空間から概念として始まり,概念解釈可能性の強化やモデル部品の再学習は不要である。改良された概念を発見し,多次元部分空間の可能性を完全に活用するために,スパース部分空間クラスタリングを提案する。 mcdは、入力空間の概念を補完する2つの分析ツールを提供している: (1) 概念活性化マップ(concept activation map)は、サンプル内で概念が表現される場所を示し、原型的なサンプルを通して概念のキャラクタリゼーションを可能にする。どちらのツールもモデル推論の詳細な理解を可能にし、完全性関係を通じてモデルと関係することを保証する。これは、より信頼できるコンセプトベースのXAIへの道を開く。我々はより制約のある概念定義に対するmcdの優位性を実証的に示す。 The completeness axiom renders the explanation of a post-hoc XAI method only locally faithful to the model, i.e. for a single decision. For the trustworthy application of XAI, in particular for high-stake decisions, a more global model understanding is required. Recently, concept-based methods have been proposed, which are however not guaranteed to be bound to the actual model reasoning. To circumvent this problem, we propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. Our method starts from general linear subspaces as concepts and does neither require reinforcing concept interpretability nor re-training of model parts. We propose sparse subspace clustering to discover improved concepts and fully leverage the potential of multi-dimensional subspaces. MCD offers two complementary analysis tools for concepts in input space: (1) concept activation maps, that show where a concept is expressed within a sample, allowing for concept characterization through prototypical samples, and (2) concept relevance heatmaps, that decompose the model decision into concept contributions. 
Both tools together enable a detailed understanding of the model reasoning, which is guaranteed to relate to the model via a completeness relation. This paves the way towards more trustworthy concept-based XAI. We empirically demonstrate the superiority of MCD against more constrained concept definitions.	翻訳日:2023-06-22 04:09:21 公開日:2023-06-18
# 入力摂動による拡散モデルにおける露光バイアス低減 Input Perturbation Reduces Exposure Bias in Diffusion Models ( http://arxiv.org/abs/2301.11706v3 ) ライセンス: Link先を確認	Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, Rita Cucchiara	(参考訳) Denoising Diffusion Probabilistic Modelsは、長いサンプリングチェーンは高い計算コストをもたらすが、優れた生成品質を示している。本稿では,長いサンプリングチェーンが誤り蓄積現象の原因となり,自己回帰的テキスト生成における露光バイアス問題と類似していることを示す。具体的には、前者は真理サンプルに、後者は前回生成した結果に条件付けされているため、トレーニングとテストの間には相違があることに留意する。この問題を緩和するために,基底真理サンプルを摂動させて推定時間予測誤差をシミュレートする,非常に単純かつ効果的なトレーニング正規化を提案する。提案する入力摂動は,リコールや精度に影響を与えず,トレーニング時間と推論時間の両方を削減しつつ,サンプル品質の大幅な改善をもたらすことを実証的に示す。例えば、CelebA 64$\times$64では、トレーニング時間の37.5%を節約しながら、新しい最先端のFIDスコア1.27を達成する。コードはhttps://github.com/forever208/DDPM-IPで公開されている。 Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP	翻訳日:2023-06-22 04:08:56 公開日:2023-06-18
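The training regularization described in the abstract above, perturbing the ground-truth sample fed to the network, can be sketched as follows. This assumes the perturbation is extra Gaussian noise added to the forward-process noise, i.e. $y_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,(\epsilon + \gamma\,\xi)$ with the regression target left as $\epsilon$. The abstract does not spell out this form, so the formula, the default `gamma`, and the function name are assumptions; the linked repository is the authoritative reference.

```python
import math
import random


def perturbed_training_input(x0, alpha_bar, gamma=0.1, rng=random):
    """Build a perturbed noisy input y_t and the unperturbed target eps.

    x0        : clean sample as a flat list of floats
    alpha_bar : cumulative noise-schedule product at the sampled timestep
    gamma     : strength of the extra perturbation (assumed value)
    rng       : object with a gauss(mu, sigma) method, e.g. random.Random(0)
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # forward-process noise
    xi = [rng.gauss(0.0, 1.0) for _ in x0]   # extra perturbation noise
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    y_t = [a * x + b * (e + gamma * z) for x, e, z in zip(x0, eps, xi)]
    # The network would be trained to predict eps from (y_t, t).
    return y_t, eps
```

Setting `gamma=0` recovers the usual noisy training input, which makes the perturbation easy to switch off when comparing against an unmodified baseline.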
# 不規則サンプリング時間列に対するニューラル連続離散状態空間モデル Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series ( http://arxiv.org/abs/2301.11308v3 ) ライセンス: Link先を確認	Abdul Fatir Ansari, Alvin Heng, Andre Lim, Harold Soh	(参考訳) 実世界の動的現象(例えば気候、生物)の正確な予測モデルを学ぶことは難しい課題である。鍵となる問題は、自然プロセスと人工プロセスの両方によって生成されたデータは、しばしば不規則にサンプリングされ、または欠落した観察を含む時系列で構成されていることである。本研究では,離散時間観測による時系列連続時間モデリングのためのニューラル連続離散状態空間モデル(NCDSSM)を提案する。 NCDSSMは補助変数を用いて力学からの認識をアンタングルし、補助変数のみに償却推論を必要とする。連続離散フィルタリング理論の手法を活用して,動的状態の正確なベイズ推定を行う方法を示す。本研究では,潜在ダイナミクスの3つの柔軟なパラメータ化と,推論中に動的状態を限界化する効率的な学習目標を提案する。様々なドメインにわたる複数のベンチマークデータセットでの実証結果は、既存のモデルに対するncdssmのインプテーションと予測性能が改善されたことを示している。 Learning accurate predictive models of real-world dynamic phenomena (e.g., climate, biological) remains a challenging task. One key issue is that the data generated by both natural and artificial processes often comprise time series that are irregularly sampled and/or contain missing observations. In this work, we propose the Neural Continuous-Discrete State Space Model (NCDSSM) for continuous-time modeling of time series through discrete-time observations. NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables. Leveraging techniques from continuous-discrete filtering theory, we demonstrate how to perform accurate Bayesian inference for the dynamic states. We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference. Empirical results on multiple benchmark datasets across various domains show improved imputation and forecasting performance of NCDSSM over existing models.	翻訳日:2023-06-22 04:08:18 公開日:2023-06-18
# 交互群同変ニューラルネットワークのゼリーフィッシュ特性 How Jellyfish Characterise Alternating Group Equivariant Neural Networks ( http://arxiv.org/abs/2301.10152v2 ) ライセンス: Link先を確認	Edward Pearce-Crump	(参考訳) 我々は、層が$\mathbb{R}^{n}$のテンソルパワーを持つ任意の交互群(A_n$)同変ニューラルネットワークの完全な特徴付けを提供する。特に、学習可能で線型で$A_n$-同変な層函数に対する行列の基底は、そのようなテンソルパワー空間の間の標準基底$\mathbb{R}^{n}$である。また,本手法が局所対称性に同値なニューラルネットワークの構築にどのように一般化するかについても述べる。 We provide a full characterisation of all of the possible alternating group ($A_n$) equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. In particular, we find a basis of matrices for the learnable, linear, $A_n$-equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$. We also describe how our approach generalises to the construction of neural networks that are equivariant to local symmetries.	翻訳日:2023-06-22 04:07:53 公開日:2023-06-18
# ローカルクレジットと不完全軌道を用いたGFlowNetsのより良いトレーニング Better Training of GFlowNets with Local Credit and Incomplete Trajectories ( http://arxiv.org/abs/2302.01687v2 ) ライセンス: Link先を確認	Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio	(参考訳) Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). それらは、生成軌道の最後に与えられる、いくつかの報酬関数 $r(x)$ (または $\exp(-\mathcal{e}(x))$ with $\mathcal{e}(x)$ に比例する確率を持つ一連のステップを通じて、オブジェクト $x$を生成するように訓練される。最終的に報酬が与えられる他のRL設定と同様に、トレーニングとクレジットの割り当ての効率は、これらの軌道が長くなると損なわれる可能性がある。従来のgflownetでは,不完全なトラジェクタ(終端状態と関連する報酬の計算)からの学習は不可能だった。本稿では, 終端状態だけでなく, 中間状態にもエネルギー関数が適用可能であることを考察する。これは例えば、エネルギー関数が加法的であるときに達成され、軌道に沿って項が利用できる。我々は、GFlowNet状態フロー関数を再パラメータ化して、各状態で既に獲得した部分的な報酬を利用する方法を示す。これにより、不完全なトラジェクトリであってもパラメータの更新に適用可能なトレーニングの目標が可能になる。完全な軌道が利用可能である場合でも、多くのシミュレーションで示されているように、より局所化されたクレジットと勾配を得ることができることはトレーニング収束をスピードアップさせる。 Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). 
They are trained to generate an object $x$ through a sequence of steps with probability proportional to some reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$ with $\mathcal{E}(x)$ denoting the energy function), given at the end of the generative trajectory. Like for other RL settings where the reward is only given at the end, the efficiency of training and credit assignment may suffer when those trajectories are longer. With previous GFlowNet work, no learning was possible from incomplete trajectories (lacking a terminal state and the computation of the associated reward). In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is for example achieved when the energy function is additive, with terms available along the trajectory. We show how to reparameterize the GFlowNet state flow function to take advantage of the partial reward already accrued at each state. This enables a training objective that can be applied to update parameters even with incomplete trajectories. Even when complete trajectories are available, being able to obtain more localized credit and gradients is found to speed up training convergence, as demonstrated across many simulations.	翻訳日:2023-06-22 04:00:16 公開日:2023-06-18
# 生成的対向対称性発見 Generative Adversarial Symmetry Discovery ( http://arxiv.org/abs/2302.00236v4 ) ライセンス: Link先を確認	Jianke Yang, Robin Walters, Nima Dehmamy, Rose Yu	(参考訳) 科学応用における等価ニューラルネットワークの成功にもかかわらず、それらは対称性群 a を事前に知る必要がある。しかし、実際どの対称性を帰納的バイアスとして使うかを知るのは難しいかもしれない。間違った対称性を強制してもパフォーマンスを損なうことさえある。本稿では,生成的対人訓練に類似したパラダイムを用いて,データセットから同値を自動的に検出するフレームワークLieGANを提案する。具体的には、生成器がデータに適用された変換のグループを学習し、元の分布を保存し、識別器を騙す。リーGANは対称性を解釈可能なリー代数基底として表現し、回転群 $\mathrm{SO}(n)$、制限ローレンツ群 $\mathrm{SO}(1,3)^+$ のような様々な対称性を軌道予測やトップクォークタギングタスクにおいて発見することができる。学習された対称性は、予測の精度と一般化を改善するために、既存の同変ニューラルネットワークで容易に利用できる。 Despite the success of equivariant neural networks in scientific applications, they require knowing the symmetry group a priori. However, it may be difficult to know which symmetry to use as an inductive bias in practice. Enforcing the wrong symmetry could even hurt the performance. In this paper, we propose a framework, LieGAN, to automatically discover equivariances from a dataset using a paradigm akin to generative adversarial training. Specifically, a generator learns a group of transformations applied to the data, which preserve the original distribution and fool the discriminator. LieGAN represents symmetry as interpretable Lie algebra basis and can discover various symmetries such as the rotation group $\mathrm{SO}(n)$, restricted Lorentz group $\mathrm{SO}(1,3)^+$ in trajectory prediction and top-quark tagging tasks. The learned symmetry can also be readily used in several existing equivariant neural networks to improve accuracy and generalization in prediction.	翻訳日:2023-06-22 03:59:08 公開日:2023-06-18
# アダプタフュージョンによるパラメータ効率変調バイアス低減 Parameter-efficient Modularised Bias Mitigation via AdapterFusion ( http://arxiv.org/abs/2302.06321v2 ) ライセンス: Link先を確認	Deepak Kumar, Oleg Lesota, George Zerveas, Daniel Cohen, Carsten Eickhoff, Markus Schedl, Navid Rekabsaz	(参考訳) 大きな事前学習された言語モデルは社会バイアスを含み、これらのバイアスに沿って下流タスクに運ばれます。現行のプロセス内バイアス緩和アプローチ(例えば逆行訓練)は、モデルのパラメータを更新することでデバイアスを課し、効果的にモデルを新しい、不可逆なデバイアス状態に移行する。本研究では,モデルから分離したスタンドアロンのデバイアス機能を開発するための新しい手法を提案する。 dam(debiasing with adapter modules) - 任意のバイアス緩和機能を別々のアダプタにカプセル化し、それをオンデマンドでモデルに追加することで公平性を提供する。我々は、性別、人種、年齢を保護属性とする3つの分類タスクに関する大規模な実験を行った。以上の結果から, DAMはバイアス緩和の有効性を改善し, マルチ属性シナリオにおける破滅的な忘れを回避し, パラメータ効率を付与し, オリジナルモデルとデバイアスモデルとの切り替えが容易なタスク性能を維持した。 Large pre-trained language models contain societal biases and carry along these biases to downstream tasks. Current in-processing bias mitigation approaches (like adversarial training) impose debiasing by updating a model's parameters, effectively transferring the model to a new, irreversible debiased state. In this work, we propose a novel approach to develop stand-alone debiasing functionalities separate from the model, which can be integrated into the model on-demand, while keeping the core model untouched. Drawing from the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules) - a debiasing approach to first encapsulate arbitrary bias mitigation functionalities into separate adapters, and then add them to the model on-demand in order to deliver fairness qualities. We conduct a large set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and maintains on-par task performance, while granting parameter-efficiency and easy switching between the original and debiased models.	翻訳日:2023-06-22 03:51:45 公開日:2023-06-18
# SOCRATES:ロボット犬を用いたテキスト検索とアプローチ SOCRATES: Text-based Human Search and Approach using a Robot Dog ( http://arxiv.org/abs/2302.05324v2 ) ライセンス: Link先を確認	Jeongeun Park, Jefferson Silveria, Matthew Pan, and Sungjoon Choi	(参考訳) 本稿では、自由形式のテキスト記述に基づく人間の検索とアプローチに焦点を当てたTEXシステム(SOCRATES)に基づく人間接近ロボットのためのSOCraticモデルを提案する。特に、文章の記述は外観(例えば、黒い髪の白いシャツ)と位置情報(例えば、ロボットを扱う学生)で構成されている。本稿ではまず,言語領域における大規模事前学習モデルと,テキスト記述に基づいて対象者を探索するダウンストリームタスクを接続するHuman Search Socratic Modelを提案する。そこで,本研究では,目標音場ロボットの動作を生成するためのハイブリッド学習フレームワークを提案し,実験モジュールと知識蒸留モジュールからなる人物にアプローチする。仮想移動ロボットを用いたシミュレーションと,参加者とBoston Dynamics Spotロボットによる実世界の実験により,提案した探索モジュールを検証した。さらに,ロボット社会属性尺度 (robotic social attribute scale,rosas) に基づいて,人間参加型フレームワークの特性を解析した。 In this paper, we propose a SOCratic model for Robots Approaching humans based on TExt System (SOCRATES) focusing on the human search and approach based on free-form textual description; the robot first searches for the target user, then the robot proceeds to approach in a human-friendly manner. In particular, textual descriptions are composed of appearance (e.g., wearing white shirts with black hair) and location clues (e.g., is a student who works with robots). We initially present a Human Search Socratic Model that connects large pre-trained models in the language domain to solve the downstream task, which is searching for the target person based on textual descriptions. Then, we propose a hybrid learning-based framework for generating target-cordial robotic motion to approach a person, consisting of a learning-from-demonstration module and a knowledge distillation module. We validate the proposed searching module via simulation using a virtual mobile robot as well as through real-world experiments involving participants and the Boston Dynamics Spot robot. Furthermore, we analyze the properties of the proposed approaching framework with human participants based on the Robotic Social Attributes Scale (RoSAS)	翻訳日:2023-06-22 03:50:19 公開日:2023-06-18
# 自由グラフモデルの構造学習のための原理的・効率的なモチーフ探索 Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models ( http://arxiv.org/abs/2302.04599v3 ) ライセンス: Link先を確認	Jonathan Feldstein, Dominic Phillips and Efthymia Tsamoura	(参考訳) 構造学習は、ニューロシンボリックAIと統計リレーショナル学習の分野の中心となるAIの中核的な問題である。データから論理理論を自動的に学習する。構造学習の基礎は、構造モチーフとして知られるデータの繰り返しパターンをマイニングすることである。これらのパターンを見つけることは指数探索空間を減らし、したがって公式の学習を導く。モチーフ学習の重要性にもかかわらず、まだよく理解されていない。本稿では,一階述語論理と確率論的モデルとをブレンドする言語であるリフト型グラフィカルモデルにおいて,構造モチーフをマイニングする第一原理的手法を提案する。私たちの最初の貢献は、2つの直感的なハイパーパラメータに依存するアルゴリズムです。1つはエンティティの類似性測度の不確実性を制御するもので、もう1つは結果のルールの柔らかさを制御するものです。第2のコントリビューションは、最も関連するデータへの検索スペースを減らすために、データの階層的クラスタリングを実行する前処理ステップです。 3つ目の貢献は、構造関連データをクラスタリングするためのO(n ln n)アルゴリズムの導入です。提案手法は, 標準ベンチマークを用いて評価し, 最先端構造学習手法の精度を最大6%, 実行速度を最大80%向上することを示す。 Structure learning is a core problem in AI central to the fields of neuro-symbolic AI and statistical relational learning. It consists in automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models, which uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm, which depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step where we perform hierarchical clustering on the data to reduce the search space to the most relevant data. 
Our third contribution is to introduce an O(n ln n) (in the size of the entities in the data) algorithm for clustering structurally-related data. We evaluate our approach using standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.	翻訳日:2023-06-22 03:49:44 公開日:2023-06-18
# nl2cmd: 自然言語からbashコマンドへの変換をアップデートしたワークフロー NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation ( http://arxiv.org/abs/2302.07845v3 ) ライセンス: Link先を確認	Quchen Fu, Zhongwei Teng, Marco Georgaklis, Jules White, Douglas C. Schmidt	(参考訳) 自然言語をBash Commandsに翻訳することは近年注目されている研究分野である。ほとんどの努力はより正確な翻訳モデルの作成に集中している。私たちの知る限りでは、2つのデータセットしか利用できません。どちらのデータセットも、既知のデータソース(stack overflowやクラウドソーシングなどを通じて)をスクレイピングし、英語テキストまたはbashコマンドの検証と修正を行う専門家を雇う。本稿では,Bashコマンドをスクラッチから合成する研究に2つの貢献をする。まず、対応する英文からBashコマンドを生成するための最先端翻訳モデルについて述べる。第2に、NL2CMDデータセットを新たに導入し、自動生成し、人間の介入を最小限に抑え、以前のデータセットの6倍以上の規模となる。生成パイプラインは既存のBashコマンドに依存しないので、分散とコマンドの種類をカスタマイズすることができる。このタスクにおけるChatGPTの性能を評価し、データジェネレータとして使用する可能性について議論する。私たちの実験結果は、データセットのスケールと多様性が、セマンティック解析研究者にユニークな機会を提供することを示す。 Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets involve scraping through known data sources (through platforms like stack overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or Bash Commands. This paper provides two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Since the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be custom adjusted. We evaluate the performance of ChatGPT on this task and discuss the potential of using it as a data generator. 
Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.	翻訳日:2023-06-22 03:40:05 公開日:2023-06-18
# 量子エントロピーと中心極限定理 Quantum Entropy and Central Limit Theorem ( http://arxiv.org/abs/2302.07841v3 ) ライセンス: Link先を確認	Kaifeng Bu, Weichen Gu, Arthur Jaffe	(参考訳) 離散変数(dv)量子系をquditsに基づいて研究する枠組みを提案する。これは平均状態(MS)、最小の安定射影状態(MSPS)、新しい畳み込みの概念に依存している。興味深い結果がいくつかある: ms は相対エントロピーに関して与えられた状態に対する最も近い msps であり、ms はフォン・ノイマンエントロピーに関して極端であり、「dv系における最大エントロピー原理」を示す。我々は、ゼロ平均量子状態の畳み込みを反復して中央極限定理を確立し、これをその ms に収束させることを示す。 DVビームスプリッタとDV増幅器の2つの例について詳述する。 We introduce a framework to study discrete-variable (DV) quantum systems based on qudits. It relies on notions of a mean state (MS), a minimal stabilizer-projection state (MSPS), and a new convolution. Some interesting consequences are: The MS is the closest MSPS to a given state with respect to the relative entropy; the MS is extremal with respect to the von Neumann entropy, demonstrating a ''maximal entropy principle in DV systems.'' We obtain a series of inequalities for quantum entropies and for Fisher information based on convolution, giving a ''second law of thermodynamics for quantum convolutions.'' We show that the convolution of two stabilizer states is a stabilizer state. We establish a central limit theorem, based on iterating the convolution of a zero-mean quantum state, and show this converges to its MS. The rate of convergence is characterized by the ''magic gap,'' which we define in terms of the support of the characteristic function of the state. We elaborate on two examples: the DV beam splitter and the DV amplifier.	翻訳日:2023-06-22 03:39:45 公開日:2023-06-18
# 空間時間データオーバーフィッティングによる高画質・高能率ビデオ超解法の実現に向けて Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting ( http://arxiv.org/abs/2303.08331v2 ) ライセンス: Link先を確認	Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma	(参考訳) 深層畳み込みニューラルネットワーク(deep convolutional neural network, dnns)は,コンピュータビジョンのさまざまな分野で広く使用されているため,dnnによるビデオ解像度向上能力の活用が,現代の映像配信システムにおいて新たなトレンドとなっている。ビデオをチャンクに分割し、各チャンクを超高解像度モデルでオーバーフィットさせることで、サーバはビデオをクライアントに送信する前にエンコードする。しかし、大量のチャンクが良いオーバーフィッティング品質を保証することが期待され、ストレージを大幅に増加させ、データ転送により多くの帯域幅リソースを消費する。一方で、トレーニング最適化技術によるチャンク数の減少は通常、高いモデルキャパシティを必要とするため、実行速度が大幅に低下する。そこで本稿では,空間的時間的情報を利用して映像をチャンクに正確に分割し,チャンク数とモデルサイズを最小限に抑える,高品質で効率的な映像解像度アップスケーリングタスクのための新しい手法を提案する。さらに,本手法をデータ認識合同学習手法により,単一のオーバーフィッティングモデルに進化させ,品質低下によるストレージ要件の低減を図っている。市販の携帯電話にモデルをデプロイし,実験結果から,映像品質の高いリアルタイムビデオ解像度を実現することを示す。 41.6 PSNRで28fpsのストリーミング速度を実現し、ライブビデオ解像度アップスケールタスクでは14$\times$と2.29dBの高速化を実現した。 https://github.com/coulsonlee/STDO-CVPR2023.gitで利用可能なコード As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. 
To reconcile these trade-offs, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages spatial-temporal information to accurately divide video into chunks, thus keeping both the number of chunks and the model size to a minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with a negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Code is available at https://github.com/coulsonlee/STDO-CVPR2023.git	翻訳日:2023-06-22 03:23:41 公開日:2023-06-18
# moe展開に向けて:mixing-of-expert(moe)推論の非効率化 Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference ( http://arxiv.org/abs/2303.06182v2 ) ライセンス: Link先を確認	Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin Lee	(参考訳) Mixture-of-Experts (MoE)モデルはコンピュータビジョンと自然言語処理の幅広いタスクにおいて最先端のパフォーマンスを達成するために人気を集めている。トレーニング中の計算コストの最小化を図りながら、モデル容量を効果的に拡大する。しかし,そのようなモデルの導入は,大規模で複雑な通信パターンのため困難である。本稿では,2つのmoeワークロード,すなわち言語モデリング(lm)と機械翻訳(mt)のキャラクタリゼーションを行い,デプロイ時の非効率なソースを特定する。本研究では,(1)動的ゲーティング,(2)エキスパートバッファリング,(3)エキスパートロードバランシングの3つの非効率化手法を提案する。我々は,動的ゲーティングにより最大スループットが6.21-11.23$\times$ for LM, 5.75-10.98$\times$ for MT Encoder, 2.58-5.71$\times$ for MT Decoderを示す。また、LMで最大1.36$\times$、MTで最大1.1$\times$までメモリ使用量を削減します。また、CPUメモリで残りをバッファリングしながら、GPUメモリで熱くアクティブな専門家のみを保持する新しいキャッシングメカニズムであるExpert Bufferingを提案します。これにより、静的メモリ割り当てを最大1.47$\times$まで削減できる。最後に、ワークロードにさらなるスケーラビリティを提供するロードバランシング手法を提案する。 Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show that dynamic gating improves maximum throughput by 6.21-11.23$\times$ for LM, 5.75-10.98$\times$ for MT Encoder and 2.58-5.71$\times$ for MT Decoder. It also reduces memory usage by up to 1.36$\times$ for LM and up to 1.1$\times$ for MT. 
We further propose Expert Buffering, a new caching mechanism that only keeps hot, active experts in GPU memory while buffering the rest in CPU memory. This reduces static memory allocation by up to 1.47$\times$. We finally propose a load balancing methodology that provides additional scalability to the workload.	翻訳日:2023-06-22 03:22:27 公開日:2023-06-18
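The Expert Buffering idea in the abstract above (keep only hot experts in GPU memory, leave the rest in CPU memory) can be illustrated with a small cache model. This is a toy sketch, not the authors' implementation: the class name `ExpertBuffer`, the `gpu_slots` parameter, and the least-recently-used eviction policy are all assumptions made for illustration, and plain Python objects stand in for expert weights.

```python
from collections import OrderedDict


class ExpertBuffer:
    """Toy model of holding a bounded set of experts in a 'GPU' cache.

    All experts live in `cpu_store`; at most `gpu_slots` (>= 1) of them are
    held in `gpu_cache`. LRU eviction is an assumption of this sketch.
    """

    def __init__(self, cpu_store, gpu_slots):
        self.cpu_store = cpu_store        # expert_id -> weights (placeholders)
        self.gpu_slots = gpu_slots
        self.gpu_cache = OrderedDict()    # expert_id -> weights, oldest first
        self.misses = 0

    def fetch(self, expert_id):
        if expert_id in self.gpu_cache:
            # Cache hit: mark the expert as most recently used.
            self.gpu_cache.move_to_end(expert_id)
            return self.gpu_cache[expert_id]
        # Cache miss: in a real system this would be a CPU-to-GPU copy.
        self.misses += 1
        if len(self.gpu_cache) >= self.gpu_slots:
            self.gpu_cache.popitem(last=False)  # evict least recently used
        self.gpu_cache[expert_id] = self.cpu_store[expert_id]
        return self.gpu_cache[expert_id]


buf = ExpertBuffer({i: f"weights-{i}" for i in range(4)}, gpu_slots=2)
for expert_id in [0, 1, 0, 2, 0, 3]:
    buf.fetch(expert_id)
```

In this trace expert 0 is requested repeatedly and stays resident, while the rarely used experts are swapped in and out; the miss counter stands in for the transfer cost that such a buffer trades against static memory allocation.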
# Pacos: 推奨反転におけるユーザの解釈とコンテキスト依存の選択をモデル化する Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals ( http://arxiv.org/abs/2303.05648v2 ) ライセンス: Link先を確認	Qingming Li and H. Vicky Zhao	(参考訳) 選択問題とは、いくつかの項目から最適な選択を選択することを指し、選択問題におけるユーザの好みを学ぶことは、意思決定メカニズムを理解し、パーソナライズされたサービスを提供する上で非常に重要である。現存する作品は通常、人々が個別にアイテムを評価すると仮定する。しかし、実際には、ユーザの嗜好は、コンテキスト効果と呼ばれるアイテムが配置されている市場に依存しており、2つの項目に対するユーザの嗜好の順序は逆転し、嗜好逆転と呼ばれることもある。本研究では,ユーザの適応的な重み付け,項目間比較,表示位置の3つの要因を明らかにする。本稿では,3つの要素を同時に扱うための統一フレームワークとしてpacosと呼ばれる文脈依存選好モデルを提案し,高い解釈性を持つ付加法と高精度な ann 法を含む2つの設計法を検討する。プライオリティ・リバーサルの発生条件について検討し,プライオリティ・リバーサルの対処におけるpacosの有効性を理論的に証明する。実験結果から,提案手法は,ユーザの選択を予測するための先行作業よりも優れた性能を示し,好みの逆転の原因を理解するのに大いに役立つことがわかった。 Choice problems refer to selecting the best choices from several items, and learning users' preferences in choice problems is of great significance in understanding the decision making mechanisms and providing personalized services. Existing works typically assume that people evaluate items independently. In practice, however, users' preferences depend on the market in which items are placed, which is known as context effects; and the order of users' preferences for two items may even be reversed, which is referred to as preference reversals. In this work, we identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions. We propose a context-dependent preference model named Pacos as a unified framework for addressing these three factors simultaneously, and consider two design methods including an additive method with high interpretability and an ANN-based method with high accuracy. We study the conditions for preference reversals to occur and provide a theoretical proof of the effectiveness of Pacos in addressing preference reversals.
Experimental results show that the proposed method outperforms prior works in predicting users' choices, and that its interpretability helps explain the causes of preference reversals.	翻訳日:2023-06-22 03:21:35 公開日:2023-06-18
# 位相・経路コヒーレンスに基づく指向性ルータと制御可能な非相互伝送 Directional router and controllable non-reciprocity transmission based on phase and pathway coherence ( http://arxiv.org/abs/2303.13784v2 ) ライセンス: Link先を確認	Xu Yang, Lei Tan, and Wu-Ming Liu	(参考訳) 4つの空洞を持つ多チャネル量子ルータは、2つの結合共振器導波路と4つの単一空洞によって構成される。このハイブリッドシステムでは、入射ポートから出港ポートまでの光子間の複数の経路に基づき、特定ポートから出射する光子を100%に近い位置に調整することで方向経路を実現することができる。 2つの古典的光場間の位相差の影響下では、異なる経路間の相互干渉を破壊的干渉や建設的干渉に調整することができ、ルーティング確率の増大と減少の基礎となる。単一光子ルーティング確率に対するパラメータ値の影響についても検討した。確率振幅の解析式を調べることで、一定のパラメータ条件下で出口が閉じられる物理機構と、光子の後方伝達と元の方向伝達との間の位相関係が得られる。さらに、カイラルカップリングを超えた非相反的な伝送と方向ルーティングも実現でき、量子ルータの研究に新たな可能性と光子伝送特性の研究への新たな洞察を与えることができる。 A multi-channel quantum router with four nodal cavities is constructed by two coupled-resonator waveguides and four single cavities. We can achieve directional routing by adjusting the probability of photon exiting from the specified port to close to 100% based on multiple pathways between the photon from the incident port to the outgoing port in this hybrid system. Under the effect of phase difference between two classical light fields, the mutual interference between different pathways can be adjusted to destructive interference or constructive interference, which lays the foundation for the increase and decrease of the routing probability. The influence of different parameter values on single photon routing probability is also studied. By studying the analytic formula of probability amplitude, we get the physical mechanism of exiting ports being closed under certain parameter conditions and the phase relationship between the backward transmission and the original direction transmission of photons. Furthermore the non-reciprocal transmission and directional routing beyond chiral coupling can also be realized, which provides new possibilities for the study of quantum routers and new insights for the study of photon transmission characteristics.	翻訳日:2023-06-22 03:10:40 公開日:2023-06-18
# OpenAGI: LLMがドメインエキスパートと出会ったとき OpenAGI: When LLM Meets Domain Experts ( http://arxiv.org/abs/2304.04370v4 ) ライセンス: Link先を確認	Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang	(参考訳) ヒューマンインテリジェンスは、複雑なタスクを解決するための基本的なスキルの組み合わせに長けている。この能力は人工知能(AI)にとって不可欠であり、包括的なインテリジェントモデルに組み込まれるべきであり、AI(Artificial General Intelligence)に向けた複雑なタスク解決のためのエキスパートモデルを活用することができる。大規模言語モデル(llm)は有望な学習能力と推論能力を示し、外部モデルを用いて複雑な問題に取り組むことができる。本研究では,マルチステップ実世界のタスク用に設計されたオープンソースのAGI研究プラットフォームであるOpenAGIを紹介する。具体的には、OpenAGIはデュアル戦略を使用し、ベンチマークと評価のための標準ベンチマークタスクと、クリエイティブな問題解決のためのより拡張可能なモデルを含むオープンエンドタスクを統合する。タスクはLLMに自然言語クエリとして表示され、適切なモデルを選択し実行します。また,タスクフィードバック(rltf)機構からの強化学習を提案し,タスク結果を用いてllmの能力を改善し,自己改善型aiフィードバックループを作成する。我々は、AGIが一意に定義された解決経路を持たない、広く多面的な研究課題であることを認めているが、LLMとドメイン固有の専門家モデルの統合は、人間における一般知能と専門知能の混在を反映したものであり、AGIに対する有望なアプローチである。私たちは、openagiプロジェクトのコード、データセット、ベンチマーク、評価メソッド、デモをオープンソース化し、agiの進歩へのコミュニティの関与を促進しています。 Human intelligence excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive intelligent models, enabling them to harness expert models for complex task-solving towards Artificial General Intelligence (AGI). Large Language Models (LLMs) show promising learning and reasoning abilities, and can effectively use external models to tackle complex problems. In this work, we introduce OpenAGI, an open-source AGI research platform designed for multi-step, real-world tasks. Specifically, OpenAGI uses a dual strategy, integrating standard benchmark tasks for benchmarking and evaluation, and open-ended tasks including more expandable models for creative problem-solving. Tasks are presented as natural language queries to the LLM, which then selects and executes appropriate models. We also propose a Reinforcement Learning from Task Feedback (RLTF) mechanism that uses task results to improve the LLM's ability, which creates a self-improving AI feedback loop. 
While we acknowledge that AGI is a broad and multifaceted research challenge with no single, well-defined solution path, the integration of LLMs with domain-specific expert models, which mirrors the blend of general and specialized intelligence in humans, offers a promising approach towards AGI. We are open-sourcing the OpenAGI project's code, dataset, benchmarks, evaluation methods, and demo to foster community involvement in AGI advancement: https://github.com/agiresearch/OpenAGI.	翻訳日:2023-06-22 03:03:28 公開日:2023-06-18
# 炭化ケイ素中の炭素クラスターエミッタ Carbon cluster emitters in silicon carbide ( http://arxiv.org/abs/2304.04197v2 ) ライセンス: Link先を確認	Pei Li, P\'eter Udvarhelyi, Song Li, Bing Huang, and Adam Gali	(参考訳) 4Hポリタイプ(4H-SiC)の炭化ケイ素は、高破壊電界、キャリア飽和速度、優れた熱伝導率、その他の良好な特性により、高要求の電子機器に期待できる広帯域ギャップ半導体である。近年, 4H-SiC, 例えば負電荷のシリコン空孔と中性希薄量子ビットの蛍光高スピン点欠陥は, 急速に出現する量子技術分野における多くの応用候補として注目されている。さらに、炭素クラスターは4H-SiCの熱酸化後に現れる蛍光中心としても機能し、SiC結晶中の炭素原子を放出する照射技術を用いることができる。照射技術は空室関連量子ビットを生成するためにしばしば用いられるため、蛍光炭素クラスターは既に確立された空室関連量子ビットに干渉する可能性がある。本研究では, 4H-SiCの炭素原子4個以上を含む炭素クラスターの電子構造, 生成エネルギー, 解離エネルギー, 振動特性およびフル蛍光スペクトルを密度汎関数理論計算により系統的に検討した。これらの炭素クラスターのすべての局所的な構成を検討しました。炭素クラスターの電子的および振動的性質は、4h-sic格子の実際の局所配置に大きく依存する。 4H-SiCの炭素クラスターを4H-SiCの安定可視発光体として同定した。 Silicon carbide in its 4H polytype (4H-SiC) is a promising wide band gap semiconductor for highly-demanding electronic devices, thanks to its high breakdown electrical field, high carrier saturation speed, excellent thermal conductivity, and other favorable properties. Recently, fluorescent high-spin point defects in 4H-SiC, e.g., negatively charged silicon-vacancy and neutral divacancy qubits, have been proven to be outstanding candidates for numerous applications in the rapidly emerging field of quantum technology. In addition, carbon clusters can act as fluorescent centers too that may appear after thermal oxidation of 4H-SiC or using irradiation techniques which kick out carbon atoms from their sites in the SiC crystal. As irradiation techniques are often used to generate vacancy-related qubits, fluorescent carbon clusters may interfere with the already established vacancy-related qubits. In this study, we systematically investigate the electronic structure, formation energy, dissociation energy, vibrational properties and the full fluorescence spectrum of carbon clusters involving up to four carbon atoms in 4H-SiC by means of density functional theory calculations. We considered all the possible local configurations for these carbon clusters. 
The electronic and vibronic properties of the carbon clusters depend strongly on the actual local configuration of the 4H-SiC lattice. By comparing the calculated and previously observed fluorescence spectra in 4H-SiC, we identify several carbon clusters as stable visible emitters in 4H-SiC.	翻訳日:2023-06-22 03:03:02 公開日:2023-06-18
# クロスレファレンストランスによる医療画像の分節化 Few-shot Medical Image Segmentation via Cross-Reference Transformer ( http://arxiv.org/abs/2304.09630v3 ) ライセンス: Link先を確認	Yao Huang and Jianming Liu	(参考訳) 深層学習モデルは医用画像セグメンテーションの主流となっているが、トレーニングには大規模な手動ラベル付きデータセットが必要であり、目に見えないカテゴリに拡張することは困難である。 Few-shot segmentation(FSS)は、少数のラベル付きサンプルから新しいカテゴリを学習することで、これらの課題に対処する可能性がある。現在の手法のほとんどはプロトタイプ学習アーキテクチャを採用しており、サポート対象のベクトルを拡張し、条件付きセグメンテーションを実行するためにクエリ機能と結合する。しかし、このようなフレームワークは、サポートとクエリ機能の相関を無視する一方で、クエリ機能に重点を置く可能性がある。本稿では,支援画像と問合せ画像との相互作用の欠如に対処するために,クロスリファレンストランスを用いた,自己教師付き少数の医用画像分割ネットワークを提案する。まず,両方向のクロスアテンションモジュールを用いて,サポートセット画像とクエリ画像の相関性を向上する。次に,高次元チャネルにおけるサポート機能やクエリ機能の類似部分を発掘・拡張するために,クロスリファレンス機構を採用している。実験の結果,CTデータセットとMRIデータセットの両方で良好な結果が得られた。 Deep learning models have become the mainstream method for medical image segmentation, but they require a large manually labeled dataset for training and are difficult to extend to unseen categories. Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples. The majority of the current methods employ a prototype learning architecture, which involves expanding support prototype vectors and concatenating them with query features to conduct conditional segmentation. However, such a framework may focus more on query features while neglecting the correlation between support and query features. In this paper, we propose a novel self-supervised few-shot medical image segmentation network with a Cross-Reference Transformer, which addresses the lack of interaction between the support image and the query image. We first enhance the correlation features between the support set image and the query image using a bidirectional cross-attention module. Then, we employ a cross-reference mechanism to mine and enhance the similar parts of support features and query features in high-dimensional channels.
Experimental results show that the proposed model achieves good results on both CT and MRI datasets.	翻訳日:2023-06-22 02:53:41 公開日:2023-06-18
# 微調整事前学習言語モデルのためのk-NNの再検討 Revisiting k-NN for Fine-tuning Pre-trained Language Models ( http://arxiv.org/abs/2304.09058v2 ) ライセンス: Link先を確認	Lei Li, Jing Chen, Bozhong Tian, Ningyu Zhang	(参考訳) パラメトリックベースの熱心な学習者であるプレトレーニング言語モデル(PLM)は、現在の自然言語処理(NLP)のパラダイムにおいて事実上の選択肢となっている。対照的に、k-Nearest-Neighbor(kNN)分類器は遅延学習パラダイムであり、過度なフィットと孤立したノイズを軽減する傾向がある。本稿では, PLM に基づく分類器の拡張のために kNN 分類器を再検討する。方法論的なレベルでは,(1)kNNを事前知識として活用してトレーニングプロセスの校正を行う,という2つのステップで,PLMのテキスト表現を持つkNNを採用することを提案する。 2) kNNで予測される確率分布とPLMの分類器の確率分布を線形に補間する。私たちのアプローチの核心は、kNN校正トレーニングの実装です。これは、トレーニングプロセスにおいて、予測結果を簡単な例と難しい例の指標として扱います。アプリケーションシナリオの多様性の観点から、我々は8つのエンドタスクに対して、微調整、急速調整、ゼロショット、少数ショット、完全教師付き設定に関する広範な実験を行います。我々は,NLPを効率的にするための古典的手法の力をコミュニティに再考させることを願っている。コードとデータセットはhttps://github.com/zjunlp/Revisit-KNNで公開されている。 Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (kNN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit kNN classifiers for augmenting the PLMs-based classifiers. From the methodological level, we propose to adopt kNN with textual representations of PLMs in two steps: (1) Utilize kNN as prior knowledge to calibrate the training process. (2) Linearly interpolate the probability distribution predicted by kNN with that of the PLMs' classifier. At the heart of our approach is the implementation of kNN-calibrated training, which treats predicted results as indicators for easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning, prompt-tuning paradigms and zero-shot, few-shot and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. 
Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.	翻訳日:2023-06-22 02:53:20 公開日:2023-06-18
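The second step described in the abstract above, linearly interpolating the kNN class distribution with that of the PLM classifier, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, the distance-weighted voting rule, and the mixing weight `lam` are hypothetical choices, and short lists of floats stand in for PLM text representations.

```python
import math


def knn_distribution(query, datastore, num_classes, k=3):
    """Class distribution from the k nearest stored representations.

    datastore: list of (vector, label) pairs. Votes are weighted by
    exp(-Euclidean distance), which is an assumption of this sketch.
    """
    neighbors = sorted(
        ((math.dist(query, vec), label) for vec, label in datastore),
        key=lambda pair: pair[0],
    )[:k]
    weights = [math.exp(-dist) for dist, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * num_classes
    for weight, (_, label) in zip(weights, neighbors):
        probs[label] += weight / total
    return probs


def interpolate(p_knn, p_plm, lam=0.5):
    """Return lam * p_knn + (1 - lam) * p_plm, element-wise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_plm)]


datastore = [([0.0, 0.0], 0), ([0.1, 0.0], 0), ([1.0, 1.0], 1)]
p_knn = knn_distribution([0.0, 0.1], datastore, num_classes=2, k=2)
p_final = interpolate(p_knn, [0.4, 0.6], lam=0.5)
```

Here both nearest neighbours carry label 0, so the kNN distribution pulls the final prediction towards class 0 even though the stand-in classifier output `[0.4, 0.6]` slightly favours class 1. Because both inputs are probability distributions, the interpolated result also sums to one for any `lam` in [0, 1].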
# 人間-aiチームにおける統計的プロアクティブダイアログモデリングのための信頼度対応ユーザシミュレータの開発 Development of a Trust-Aware User Simulator for Statistical Proactive Dialog Modeling in Human-AI Teams ( http://arxiv.org/abs/2304.11913v2 ) ライセンス: Link先を確認	Matthias Kraus, Ron Riekenbrauck, Wolfgang Minker	(参考訳) 近年,人間-AIチームという概念が注目されている。人間とAIチームメイトとの効果的なコラボレーションのためには、緊密な協調と効果的なコミュニケーションには、積極的活動が不可欠である。しかしながら、人間をサポートするAIベースのシステムのための適切な能動性の設計は、まだオープンな問題であり、課題である。本稿では,プロアクティブダイアログポリシーのトレーニングとテストのためのコーパスベースユーザシミュレータの開発について述べる。このシミュレータは、プロアクティブダイアログとそのユーザ信頼への影響に関するインフォームド知識を取り入れ、社会デポグラフィ的特徴やパーソナリティ特性を含むユーザの行動や個人情報をシミュレートする。 2つの異なるシミュレーション手法を比較し、タスクステップベースの手法により、逐次依存関係のモデリングの強化により、全体的な結果が改善された。本研究では,人間-AIチーム改善のための対話ゲーム設定において,適切なプロアクティブ戦略を探索し,評価するための有望な方法を提案する。 The concept of a Human-AI team has gained increasing attention in recent years. For effective collaboration between humans and AI teammates, proactivity is crucial for close coordination and effective communication. However, the design of adequate proactivity for AI-based systems to support humans is still an open question and a challenging topic. In this paper, we present the development of a corpus-based user simulator for training and testing proactive dialog policies. The simulator incorporates informed knowledge about proactive dialog and its effect on user trust and simulates user behavior and personal information, including socio-demographic features and personality traits. Two different simulation approaches were compared, and a task-step-based approach yielded better overall results due to enhanced modeling of sequential dependencies. This research presents a promising avenue for exploring and evaluating appropriate proactive strategies in a dialog game setting for improving Human-AI teams.	翻訳日:2023-06-22 02:40:54 公開日:2023-06-18
# 6次元非定常マニピュレーションのためのハイブリッドアクタ・クリティカルマップの学習 Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation ( http://arxiv.org/abs/2305.03942v2 ) ライセンス: Link先を確認	Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton, David Held	(参考訳) 物を握らずに操作することは、人間の器用さに欠かせない要素であり、非理解的な操作と呼ばれる。非包括的操作は、オブジェクトとのより複雑な相互作用を可能にするだけでなく、グリップとオブジェクトの相互作用を推論する際の課題も提示する。本研究では,物体の6次元非包括的操作のための強化学習手法であるHybrid Actor-Critic Maps for Manipulation (HACMan)を紹介する。 HACManは、オブジェクトポイントクラウドから接触位置を選択することと、ロボットが接触した後どのように動くかを記述した一連の動きパラメータからなる、時間的に制限された空間的空間的なオブジェクト中心のアクション表現を提案する。我々は、このハイブリッド離散連続アクション表現で学習するために、既存のオフポリチィRLアルゴリズムを変更した。シミュレーションおよび実世界における6次元オブジェクトポーズアライメントタスクにおけるHACManの評価を行った。ランダム化された初期ポーズ,ランダム化された6d目標,多様なオブジェクトカテゴリを備えた最難のタスクでは,性能低下を伴わないオブジェクトカテゴリに対する強力な一般化が実証され,実世界でのゼロショット転送で89%の成功率と50%の成功率を達成した。代替アクション表現と比較して、HACManは最高のベースラインの3倍以上の成功率を達成する。ゼロショットのsim2realトランスファーでは、動的かつ接触に富んだ非包括的スキルを用いて、現実の未確認物体をうまく操作できる。ビデオはプロジェクトのwebサイト(https://hacman-2023.github.io)で見ることができる。 Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. 
On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.	翻訳日:2023-06-22 02:32:22 公開日:2023-06-18
# ChatGPTの動作記憶能力に関する実証的研究 Working Memory Capacity of ChatGPT: An Empirical Study ( http://arxiv.org/abs/2305.03731v2 ) ライセンス: Link先を確認	Dongyu Gong, Xingchen Wan, Dingmin Wang	(参考訳) ワーキングメモリは、人間の知性と人工知能の両方において重要な側面であり、情報の一時記憶と操作のためのワークスペースとして機能する。本稿では,OpenAI が開発した大規模言語モデルである ChatGPT (gpt-3.5-turbo) の動作記憶能力について,様々な条件下での音声および空間的 n-back タスクの性能を検証し,系統的に評価する。実験の結果,nが増加するにつれてchatgptの性能が大幅に低下することが明らかとなり(作業記憶に格納する情報が増える必要がある),作業記憶能力の限界がヒトに非常に近いことが示唆された。さらに,chatgptの性能に対する異なる指導戦略の影響を調査し,キャパシティ制限の基本パターンが持続することを確認した。実験結果から,n-backタスクは大規模言語モデルのワーキングメモリ容量をベンチマークするためのツールとして機能し,aiワーキングメモリの強化とaiモデルによるヒューマンワーキングメモリの理解の深化を目的とした今後の取り組みの可能性を秘めている可能性が示唆された。 Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT (gpt-3.5-turbo), a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT experiences significant declines in performance as n increases (which necessitates more information to be stored in working memory), suggesting a limit to the working memory capacity strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working memory and deepening our understanding of human working memory through AI models.	翻訳日:2023-06-22 02:31:53 公開日:2023-06-18
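The n-back task mentioned in the abstract above has a simple scoring rule: at each step the correct answer is "match" exactly when the current item equals the item n steps earlier. A minimal sketch of that rule is shown below; the helper names and the letter set are assumptions for illustration, and the prompting of ChatGPT itself is not modeled.

```python
import random


def nback_targets(sequence, n):
    """Return, for each position, whether the item matches the one n steps back.

    Positions with fewer than n predecessors can never be matches.
    """
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]


def accuracy(responses, targets):
    """Fraction of positions where the response equals the correct answer."""
    correct = sum(r == t for r, t in zip(responses, targets))
    return correct / len(targets)


def random_letter_sequence(length, alphabet="bcdfghjk", seed=0):
    """Hypothetical stimulus generator; the letter set is an assumption."""
    rng = random.Random(seed)
    return [rng.choice(alphabet) for _ in range(length)]


# Worked example: a fixed sequence scored as a 2-back task.
letters = list("abacbcba")
targets_2back = nback_targets(letters, n=2)

# A responder that never reports a match is only right on the non-match positions.
never_match = [False] * len(letters)
baseline_accuracy = accuracy(never_match, targets_2back)
```

Raising `n` forces the responder to hold more items at once, which is the load manipulation the abstract describes.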
# ChatGPTとBardはデータプロバイダと利益を共有すべきか? AI時代の新しいビジネスモデル Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era ( http://arxiv.org/abs/2305.02555v2 ) ライセンス: Link先を確認	Dong Zhang	(参考訳) ChatGPTのようなさまざまなAIツールが普及するにつれて、私たちは真のAIの時代に入りつつある。例外的なAIツールがすぐにかなりの利益を得ると予想できる。 AIツールは、従来の利害関係者や株主に加えて、トレーニングデータプロバイダと収益を共有するべきか? 答えはイエスです。大規模言語モデルのような大規模なAIツールは、継続的に改善するためには、より高品質なデータを必要とするが、現在の著作権法は様々な種類のデータへのアクセスを制限する。 AIツールとデータプロバイダ間で収益を共有することで、現在の敵対的なゼロサムゲーム関係を、AIツールと著作権のあるデータ所有者の大多数が協力的かつ相互に利益をもたらすものにすることができる。しかし、現在の収益分配ビジネスモデルは、次のAI時代のAIツールでは機能しない。なぜなら、ウェブサイトベースのトラフィックやクリックのようなアクションのための最も広く使われているメトリクスは、生成AIツールのプロンプトやコストといった新しいメトリクスに置き換えられるからだ。まったく新しい収益分配ビジネスモデルは、AIツールからほぼ独立して、データプロバイダに簡単に説明できる必要があるが、各データプロバイダのデータエンゲージメントを測定するために、プロンプトベースのスコアリングシステムを確立する必要がある。本稿では、分類とコンテンツ類似性モデルに基づいて、AIツールのすべてのデータプロバイダに対して、このようなスコアリングシステムを構築する方法を体系的に議論し、それを構築するためのAIツールやサードパーティの要件を概説する。このようなスコアリングシステムを使ってデータプロバイダと収益を共有することで、より多くのデータ所有者が収益共有プログラムに参加することができる。これは、すべての当事者が恩恵を受ける、実用的なAI時代になるでしょう。 With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is yes. Large AI tools, such as large language models, always require more and better-quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. 
However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.	翻訳日:2023-06-22 02:31:31 公開日:2023-06-18
# WSSSに代わるもの? 弱教師付きセマンティックセマンティックセグメンテーション問題におけるセグメンテーションモデル(SAM)の実証的研究 An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems ( http://arxiv.org/abs/2305.01586v2 ) ライセンス: Link先を確認	Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes	(参考訳) Segment Anything Model (SAM)は優れたパフォーマンスと汎用性を示しており、様々なタスクに有望なツールとなっている。本稿では,Wakly-Supervised Semantic Segmentation (WSSS)におけるSAMの適用について検討する。特に,画像レベルのクラスラベルのみを付与した擬似ラベル生成パイプラインとしてSAMを適用した。ほとんどのケースで目覚ましい結果が見られたが、特定の限界も特定できた。本研究は,PASCAL VOCとMS-COCOの性能評価を含む。このレポートは、WSSSにSAMを採用するためのさらなる調査と、より広範な現実世界のアプリケーションを促進することを期待する。 The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks. In this report, we explore the application of SAM in Weakly-Supervised Semantic Segmentation (WSSS). Particularly, we adapt SAM as the pseudo-label generation pipeline given only the image-level class labels. While we observed impressive results in most cases, we also identify certain limitations. Our study includes performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable improvements over the latest state-of-the-art methods on both datasets. We anticipate that this report encourages further explorations of adopting SAM in WSSS, as well as wider real-world applications.	翻訳日:2023-06-22 02:29:58 公開日:2023-06-18
# 軽量オールコンベネト・トランスファー学習による表面emgに基づくセッション間/サブジェクション認識 Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning ( http://arxiv.org/abs/2305.08014v2 ) ライセンス: Link先を確認	Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, and Wei-Ping Zhu	(参考訳) 低解像度のHD-sEMG画像を用いたジェスチャー認識は、より流動的で自然な筋肉-コンピュータインターフェースを開発するための新たな道を開く。しかし、セッション間およびサブジェクト間シナリオ間のデータ変動は大きな課題となる。既存のアプローチでは、非常に大きく複雑なConvNetまたは2SRNNベースのドメイン適応手法を使用して、これらのセッション間およびオブジェクト間データのばらつきに起因する分散シフトを近似した。したがって、これらの方法は、何百万ものトレーニングパラメータと、事前トレーニングと適応段階の両方で、トレーニング済みおよびターゲットドメインデータセットを学習する必要がある。その結果、リアルタイムアプリケーションへのデプロイには、ハイエンドのリソースバウンドと計算コストが非常にかかる。本稿では,この問題を解決するために,軽量なall-convnet and transfer learning(tl)を活用した軽量なall-convnet+tlモデルを提案する。 all-convnet+tlモデルは畳み込み層のみで構成されており、セッション間およびサブジェクト間データ可変性によって引き起こされる分散シフトに対処するための不変および判別表現を学習するための単純かつ効率的なフレームワークである。 4つのデータセットに対する実験により,提案手法は,既存の手法よりも大きなマージンで優れており,セッション間およびオブジェクト間シナリオにおける最先端の結果が得られ,セッション内ジェスチャ認識において同等あるいは競合的に実行されることを示した。これらのパフォーマンスギャップは、少数のデータ(例えば単一のトライアル)がターゲットドメインで利用可能になったときにさらに増加する。これらの顕著な実験結果は、現在の最先端モデルが、sEMGベースのセッション間およびオブジェクト間ジェスチャー認識タスクに対して過度にパラメータ化されていることを示す。 Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. Existing approaches employ very large and complex deep ConvNets or 2SRNN-based domain adaptation methods to approximate the distribution shift caused by this inter-session and inter-subject data variability. Hence, these methods also require learning millions of training parameters and large pre-training and target-domain datasets in both the pre-training and adaptation stages. As a result, they are bound to high-end resources and are computationally very expensive to deploy in real-time applications. 
To overcome this problem, we propose a lightweight All-ConvNet+TL model that leverages lightweight All-ConvNet and transfer learning (TL) for the enhancement of inter-session and inter-subject gesture recognition performance. The All-ConvNet+TL model consists solely of convolutional layers, a simple yet efficient framework for learning invariant and discriminative representations to address the distribution shifts caused by inter-session and inter-subject data variability. Experiments on four datasets demonstrate that our proposed methods outperform the most complex existing approaches by a large margin and achieve state-of-the-art results on inter-session and inter-subject scenarios and perform on par or competitively on intra-session gesture recognition. These performance gaps increase even more when a tiny amount (e.g., a single trial) of data is available on the target domain for adaptation. These outstanding experimental results provide evidence that the current state-of-the-art models may be overparameterized for sEMG-based inter-session and inter-subject gesture recognition tasks.	翻訳日:2023-06-22 02:23:09 公開日:2023-06-18
# 健康保険請求の経時的変化に関する大規模研究 Large-Scale Study of Temporal Shift in Health Insurance Claims ( http://arxiv.org/abs/2305.05087v2 ) ライセンス: Link先を確認	Christina X Ji, Ahmed M Alaa, David Sontag	(参考訳) 臨床結果を予測する機械学習モデルは歴史的データを用いて開発されている。しかし、たとえこれらのモデルが近い将来デプロイされるとしても、データセットの時間的シフトは理想的なパフォーマンスに満たない可能性がある。この現象を捉えるために,歴史的モデルがもはやその結果を予測するのに最適でない場合,特定の時点において予測される結果が非定常であるようなタスクを考える。本研究では,集団レベルでの時間的シフトを検証するためのアルゴリズムを構築した。次に,大規模なタスク群における時間変化の振り返りスキャンを行うためのメタアルゴリズムを構築した。我々のアルゴリズムは、医療の時間的シフトを私たちの知識にまとめて評価することを可能にする。我々は、2015年から2020年にかけて、医療保険請求データセットに基づいて242の医療結果を評価し、1,010のタスクを作成します。タスクの9.7%は人口レベルでの時間的シフトを示し、93.0%は人口移動の影響を受けている。臨床的意義を理解するためにケーススタディを掘り下げる。我々の分析は、医療における時間的シフトの広範性を強調している。 Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task--that is, an outcome to be predicted at a particular time point--to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform the first comprehensive evaluation of temporal shift in healthcare to our knowledge. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.	翻訳日:2023-06-22 02:20:38 公開日:2023-06-18
# 区別可能なセグメンテーションを用いたエンドツーエンド同時音声翻訳 End-to-End Simultaneous Speech Translation with Differentiable Segmentation ( http://arxiv.org/abs/2305.16093v2 ) ライセンス: Link先を確認	Shaolei Zhang, Yang Feng	(参考訳) エンドツーエンド同時音声翻訳(simulst)は、ストリーミング音声入力を受信しながら翻訳を出力する(すなわち、ストリーミング音声翻訳)ため、音声入力を分割して、現在の受信音声に基づいて翻訳する必要がある。しかし、不利な瞬間に音声入力を分割すると、音響的完全性が損なわれ、翻訳モデルの性能に悪影響を及ぼす可能性がある。したがって、翻訳モデルが高品質な翻訳を生み出すのに役立つこれらの瞬間に音声入力を分割する学習は、シマルストの鍵となる。既存のSimulST法は、固定長セグメンテーションまたは外部セグメンテーションモデルのいずれかを使用しており、常に基礎となる翻訳モデルとセグメンテーションを分離している。そこで本稿では,SimulST における微分可能セグメンテーション (DiSeg) を提案し,基礎となる翻訳モデルから直接セグメンテーションを学習する。 DiSegは、予測トレーニングによってハードセグメンテーションを微分可能にし、翻訳モデルと共同でトレーニングし、翻訳効果セグメンテーションを学ぶことができる。実験結果から,DiSegは最先端性能を実現し,セグメンテーション能力に優れることが示された。 End-to-end simultaneous speech translation (SimulST) outputs translation while receiving the streaming speech inputs (a.k.a. streaming speech translation), and hence needs to segment the speech inputs and then translate based on the current received speech. However, segmenting the speech inputs at unfavorable moments can disrupt the acoustic integrity and adversely affect the performance of the translation model. Therefore, learning to segment the speech inputs at those moments that are beneficial for the translation model to produce high-quality translation is the key to SimulST. Existing SimulST methods, either using the fixed-length segmentation or external segmentation model, always separate segmentation from the underlying translation model, where the gap results in segmentation outcomes that are not necessarily beneficial for the translation process. In this paper, we propose Differentiable Segmentation (DiSeg) for SimulST to directly learn segmentation from the underlying translation model. DiSeg turns hard segmentation into differentiable through the proposed expectation training, enabling it to be jointly trained with the translation model and thereby learn translation-beneficial segmentation. 
Experimental results demonstrate that DiSeg achieves state-of-the-art performance and exhibits superior segmentation capability.	翻訳日:2023-06-22 01:52:30 公開日:2023-06-18
# 低コストセキュリティ検査のための大規模単発ミリ波イメージングに向けて Towards Large-scale Single-shot Millimeter-wave Imaging for Low-cost Security Inspection ( http://arxiv.org/abs/2305.15750v2 ) ライセンス: Link先を確認	Liheng Bian, Daoyu Li, Shuoguang Wang, Chunyang Teng, Huteng Liu, Hanwen Xu, Xuyang Chang, Guoqiang Zhao, Shiyong Li, Jun Zhang	(参考訳) 安全検査のための有望な技術としてミリ波イメージング(MMW)が登場している。画像分解能、透過性、人間の安全性の微妙なバランスを実現し、低周波マイクロ波に比べて高い分解能、可視光よりも強い透過性、X線より強い安全性を実現している。近年の進歩にもかかわらず、必要な大規模アンテナアレイの高コストは、実際にMMWイメージングを広く採用することを妨げている。この課題に取り組むため,sparseアンテナアレーを用いた大規模単発mmwイメージングフレームワークを報告し,解釈可能な学習方式で低コストかつ高精度なセキュリティ検査を実現する。まず,大規模アレイにおける各要素の統計的ランク付けについて検討するため,全サンプルのMMWエコーを収集した。これらの要素はランキングに基づいてサンプリングされ、実験的に最適なスパースサンプリング戦略を構築し、アンテナアレイのコストを最大1桁削減する。さらに,スパースサンプルエコーから頑健で正確な画像再構成を実現する非学習的解釈可能な学習手法を考案した。最後に,物体の自動検出のためのニューラルネットワークを開発し,10%のスパースアレイを用いた隠れたセンチメートルサイズのターゲットの検出を実験的に実証した。報告した手法の性能は、精度、リコール、mAP50を含む様々な指標で既存のMMW撮像方式よりも50%以上優れている。このような強力な検出能力とオーダー・オブ・マグニチュードのコスト削減により、この技術は大規模単発MMWイメージングの実用的な方法となり、さらに実用的な応用が期待できる。 Millimeter-wave (MMW) imaging is emerging as a promising technique for safe security inspection. It achieves a delicate balance between imaging resolution, penetrability and human safety, offering higher resolution than low-frequency microwaves, stronger penetrability than visible light, and greater safety than X-rays. Despite advances in recent decades, the high cost of the requisite large-scale antenna array hinders widespread adoption of MMW imaging in practice. To tackle this challenge, we report a large-scale single-shot MMW imaging framework using a sparse antenna array, achieving low-cost but high-fidelity security inspection under an interpretable learning scheme. We first collected extensive fully sampled MMW echoes to study the statistical ranking of each element in the large-scale array. These elements are then sampled based on the ranking, building the experimentally optimal sparse sampling strategy that reduces the cost of the antenna array by up to one order of magnitude. 
Additionally, we derived an untrained interpretable learning scheme, which realizes robust and accurate image reconstruction from sparsely sampled echoes. Finally, we developed a neural network for automatic object detection and experimentally demonstrated successful detection of concealed centimeter-sized targets using a 10% sparse array, whereas all other contemporary approaches failed at the same sampling ratio. The reported technique outperforms existing MMW imaging schemes by more than 50% on various metrics, including precision, recall, and mAP50. With such strong detection ability and order-of-magnitude cost reduction, we anticipate that this technique provides a practical route to large-scale single-shot MMW imaging and could promote its further practical application.	翻訳日:2023-06-22 01:52:09 公開日:2023-06-18
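The MMW abstract above says antenna elements are sampled based on a statistical ranking to obtain a sparse array. A rough sketch of one way to do this is shown below. Keeping the top-ranked fraction is an assumption, since the abstract does not state the exact selection rule, and the scores are made-up placeholder values.

```python
def select_sparse_elements(scores, keep_ratio):
    """Keep the highest-ranked fraction of array elements.

    scores: one importance score per antenna element (hypothetical values;
            the paper derives its ranking from fully sampled echoes).
    keep_ratio: fraction of elements to keep, e.g. 0.1 for a 10% sparse array.
    Returns the sorted indices of the retained elements.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")
    num_keep = max(1, round(len(scores) * keep_ratio))
    # Rank element indices from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep])


# Placeholder scores for a 10-element array; keep 30% of the elements.
element_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.5]
kept = select_sparse_elements(element_scores, keep_ratio=0.3)
```

With these placeholder scores the three highest-scoring elements (indices 1, 3 and 5) are retained and the rest would be omitted from the physical array.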
# テキスト誘導拡散モデルの興味ある特性 Intriguing Properties of Text-guided Diffusion Models ( http://arxiv.org/abs/2306.00974v3 ) ライセンス: Link先を確認	Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, and Alan Yuille	(参考訳) テキスト誘導拡散モデル(TDM)は広く応用されているが、予期せず失敗することがある。よくある失敗は (i)自然に見えるテキストは、間違った内容の画像を生成させるか、または (ii)同じテキストプロンプトで条件付けされているにもかかわらず、非常に異なる、あるいは無関係な出力を生成する潜在変数の異なるランダムなサンプル。本研究では,TDMの障害モードについて,より詳細に研究し,理解することを目的とする。これを実現するために,画像分類器を代理損失関数として利用するTDMに対する敵対攻撃であるSAGEを提案し,画像生成における予期せぬ動作や故障事例を自動的に発見するために,TDMの離散的なプロンプト空間と高次元潜在空間を探索する。我々は,sageが分類器ではなく拡散モデルの障害事例を見出すために,いくつかの技術的貢献を行い,人間の研究で検証する。本研究は,これまでに体系的に研究されていないtdmの4つの興味をそそる性質を明らかにした。(1)入力テキストのセマンティクスを捉えない画像を生成する,様々な自然テキストプロンプトを見つける。これらの障害を根本原因に基づいた10の異なるタイプに分類する。 2) テキストプロンプトから独立して歪んだ画像につながる潜伏空間(外れ値ではない)のサンプルを見つけ, 潜伏空間の一部が十分に構造化されていないことを示唆した。 3)テキストプロンプトと無関係な自然画像に繋がる潜在サンプルを見つけ、潜在空間とプロンプト空間の間の潜在的な不一致を示唆する。 (4) 入力プロンプトに1つの逆数トークンを埋め込むことで、CLIPスコアに最小限の影響を与えながら、さまざまな特定のターゲットオブジェクトを生成することができる。これは言語表現の脆弱さを示し、潜在的な安全性の懸念を提起する。 Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables that generate vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, an adversarial attack on TDMs that uses image classifiers as surrogate loss functions, to search over the discrete prompt space and the high-dimensional latent space of TDMs to automatically discover unexpected behaviors and failure cases in the image generation. We make several technical contributions to ensure that SAGE finds failure cases of the diffusion model, rather than the classifier, and verify this in a human study. 
Our study reveals four intriguing properties of TDMs that have not been systematically studied before: (1) We find a variety of natural text prompts producing images that fail to capture the semantics of input texts. We categorize these failures into ten distinct types based on the underlying causes. (2) We find samples in the latent space (which are not outliers) that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured. (3) We also find latent samples that lead to natural-looking images which are unrelated to the text prompt, implying a potential misalignment between the latent and prompt spaces. (4) By appending a single adversarial token embedding to an input prompt we can generate a variety of specified target objects, while only minimally affecting the CLIP score. This demonstrates the fragility of language representations and raises potential safety concerns.	翻訳日:2023-06-22 01:22:48 公開日:2023-06-18
# マスク画像モデリングによる自己教師付き学習フレームワークに基づく新しいドライバ抽出行動検出 A Novel Driver Distraction Behavior Detection Based on Self-Supervised Learning Framework with Masked Image Modeling ( http://arxiv.org/abs/2306.00543v3 ) ライセンス: Link先を確認	Yingzhi Zhang, Taiguo Li, Chao Li and Xinghong Zhou	(参考訳) ドライバーの気晴らしは毎年かなりの数の交通事故を引き起こし、経済的な損失と損失をもたらす。現在、商用車両の自動化のレベルは完全に無人ではなく、ドライバーは依然として車両の操作と制御において重要な役割を担っている。そのため,道路安全には運転者の注意散らし行動検出が不可欠である。現在、ドライバーの注意散逸検出は主に従来の畳み込みニューラルネットワーク(cnn)と教師付き学習方法に依存している。しかし、ラベル付きデータセットの高コスト、高レベルのセマンティック情報をキャプチャする能力の制限、一般化性能の低下など、依然として課題がある。そこで本研究では,ドライバの注意散逸行動検出のためのマスク画像モデルに基づく自己教師付き学習手法を提案する。まず,マスク付き画像モデリング(MIM)のための自己教師型学習フレームワークを導入し,データセットのラベル付けによる人的・物質的消費の問題を解決する。次に、Swin Transformerがエンコーダとして使用される。 Swin Transformerブロックを再構成し、ウィンドウマルチヘッド自己アテンション(W-MSA)とシフトウィンドウマルチヘッド自己アテンション(SW-MSA)検出ヘッドの分布を全ステージにわたって調整することで、より軽量化を実現する。最後に、モデルの認識と一般化能力を強化するために、様々なデータ拡張戦略と最適なランダムマスキング戦略が使用される。大規模運転注意散逸行動データセットの試験結果から,本論文で提案した自己教師学習法は99.60%の精度で,高度な教師付き学習法の優れた性能を近似する。 Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. Currently, the level of automation in commercial vehicles is far from completely unmanned, and drivers still play an important role in operating and controlling the vehicle. Therefore, driver distraction behavior detection is crucial for road safety. At present, driver distraction detection primarily relies on traditional Convolutional Neural Networks (CNN) and supervised learning methods. However, there are still challenges such as the high cost of labeled datasets, limited ability to capture high-level semantic information, and weak generalization performance. In order to solve these problems, this paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection. Firstly, a self-supervised learning framework for masked image modeling (MIM) is introduced to solve the serious human and material consumption issues caused by dataset labeling. 
Secondly, the Swin Transformer is employed as an encoder. Performance is enhanced by reconfiguring the Swin Transformer block and adjusting the distribution of the number of window multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA) detection heads across all stages, which also makes the model more lightweight. Finally, various data augmentation strategies are used along with the best random masking strategy to strengthen the model's recognition and generalization ability. Test results on a large-scale driver distraction behavior dataset show that the self-supervised learning method proposed in this paper achieves an accuracy of 99.60%, approaching the performance of advanced supervised learning methods.	翻訳日:2023-06-22 01:21:59 公開日:2023-06-18
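The random masking step used in masked image modeling, as referenced in the abstract above, amounts to hiding a random subset of image patches and training the model to reconstruct them. The sketch below shows only the mask selection. The grid size and the 50% mask ratio are assumptions, because the abstract does not say which masking strategy was found to be best.

```python
import random


def random_patch_mask(grid_size, mask_ratio, seed=None):
    """Pick which patches of a grid_size x grid_size patch grid to mask.

    Returns a flat list of booleans (True = masked) with exactly
    round(mask_ratio * num_patches) masked entries.
    """
    num_patches = grid_size * grid_size
    num_masked = round(mask_ratio * num_patches)
    rng = random.Random(seed)
    # Sample masked positions without replacement.
    masked_indices = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_indices for i in range(num_patches)]


# Assumed setup: a 6 x 6 patch grid with half of the patches hidden.
mask = random_patch_mask(grid_size=6, mask_ratio=0.5, seed=0)
```

The encoder would then see only the unmasked patches (or masked patches replaced by a learned token), and the reconstruction loss would be computed on the masked ones.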
# 雑音ラベルを用いた線形距離メトリック学習 Linear Distance Metric Learning with Noisy Labels ( http://arxiv.org/abs/2306.03173v2 ) ライセンス: Link先を確認	Meysam Alishahi, Anna Little, and Jeff M. Phillips	(参考訳) 線形距離距離学習では、あるユークリッド距離空間内のデータを与えられ、ある距離条件を可能な限り尊重する別のユークリッド距離空間への適切な線型写像を見つけることが目的である。本稿では,一般連続凸損失最適化問題に還元する単純でエレガントな手法を定式化し,異なる雑音モデルに対して対応する損失関数を導出する。その結果、データがノイズである場合でも、十分なサンプルへのアクセスを提供する精度で基底真理線形計量を学習できることを示し、対応するサンプル複雑性を限定する。さらに,学習したモデルを低ランクモデルに切り離し,損失関数とパラメータの精度を良好に維持する効果的な手法を提案する。合成および実データ集合に関するいくつかの実験的な観察は、我々の理論的結果を支持し、知らせる。 In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy, the ground truth linear metric can be learned with any precision provided access to enough samples, and we provide a corresponding sample complexity bound. Moreover, we present an effective way to truncate the learned model to a low-rank model that can provably maintain the accuracy in loss function and in parameters -- the first such results of this type. Several experimental observations on synthetic and real data sets support and inform our theoretical results.	翻訳日:2023-06-22 01:11:11 公開日:2023-06-18
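In linear distance metric learning as described above, the learned object is a linear map A, and distances are measured after applying it: d(x, y) = ||Ax - Ay||. Equivalently, the squared distance is (x - y)^T M (x - y) with M = A^T A. The sketch below only evaluates such a distance. It does not implement the paper's loss or optimization, and the matrices are illustrative values.

```python
import math


def apply_linear_map(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def learned_distance(A, x, y):
    """Euclidean distance between A x and A y."""
    ax, ay = apply_linear_map(A, x), apply_linear_map(A, y)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ax, ay)))


def gram_matrix(A):
    """Return M = A^T A, so the squared distance is (x - y)^T M (x - y)."""
    cols = list(zip(*A))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


# Full-rank example: the first coordinate counts twice as much as the second.
A_full = [[2, 0], [0, 1]]
d_full = learned_distance(A_full, [1, 1], [0, 0])

# Rank-1 example: points that differ only along the ignored direction collapse.
A_low_rank = [[1, 1]]
d_low = learned_distance(A_low_rank, [1, 0], [0, 1])
```

The rank-1 case illustrates why truncating to a low-rank model changes which directions the metric can distinguish, which is the trade-off the abstract's truncation result addresses.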
# hierarchyeom.jl:オープン量子システムにおける階層的運動方程式のための効率的なjuliaフレームワーク HierarchicalEOM.jl: An efficient Julia framework for hierarchical equations of motion in open quantum systems ( http://arxiv.org/abs/2306.07522v3 ) ライセンス: Link先を確認	Yi-Te Huang, Po-Chen Kuo, Neill Lambert, Mauro Cirio, Simon Cross, Shen-Liang Yang, Franco Nori, Yueh-Nan Chen	(参考訳) 我々は,複数のボソニック環境とフェルミオン環境を同時に結合したシステムのダイナミクスを減少させるために,階層的運動方程式(heom)を統合するためのjuliaフレームワークであるhierarchicaleom.jlというオープンソースソフトウェアパッケージを導入する。 HierarchicalEOM.jlは、ボソニックおよびフェルミオンスペクトル、定常状態、および全ての補助密度作用素(ADO)の拡張空間におけるフルダイナミックスを計算する方法の集合を特徴としている。 ADOのマルチインデックスの必要な処理は、ユーザフレンドリーなインターフェースによって実現される。 2つのフェルミオン貯水池と相互作用する1つの不純物(アンダーソンモデル)と1つのボゾンと2つのフェルミオン貯水池と相互作用する超強結合電荷キャビティ系を解析することにより、パッケージの機能性を実証する。 hierarchyeom.jl は heom liouvillian superoperator の構築において、全ての ados のダイナミクスと定常状態の解法として、このパッケージが確立された python (qutip) の量子ツールボックスの対応するメソッドに関して、桁違いに高速化することができる。 We introduce an open-source software package called "HierarchicalEOM.jl", a Julia framework to integrate the hierarchical equations of motion (HEOM) for the reduced dynamics of a system simultaneously coupled to multiple bosonic and fermionic environments. HierarchicalEOM.jl features a collection of methods to compute bosonic and fermionic spectra, stationary states, and the full dynamics in the extended space of all auxiliary density operators (ADOs). The required handling of the ADOs multi-indexes is achieved through a user-friendly interface. We exemplify the functionalities of the package by analyzing a single impurity interacting with two fermionic reservoirs (Anderson model), and an ultra-strongly coupled charge-cavity system interacting with one bosonic and two fermionic reservoirs. HierarchicalEOM.jl allows for an order of magnitude speedup in the construction of the HEOM Liouvillian superoperator, solving dynamics and stationary states for all ADOs, with respect to the corresponding method in the Quantum Toolbox in Python (QuTiP), upon which this package is founded.	
翻訳日:2023-06-22 01:04:35 公開日:2023-06-18
# LAMM: 言語支援マルチモーダル命令-チューニングデータセット、フレームワーク、ベンチマーク LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark ( http://arxiv.org/abs/2306.06687v2 ) ライセンス: Link先を確認	Zhenfei Yin, Jiong Wang, Jianjian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang	(参考訳) 大規模言語モデルは、人工知能の実現への潜在的経路となっている。マルチモーダル大規模言語モデルに関する最近の研究は、視覚モダリティの処理における効果を実証している。本研究では,MLLMの研究をポイントクラウドに拡張し,2次元画像と3次元ポイントクラウド理解のためのLAMMデータセットとLAMMベンチマークを示す。また,MLLMのさらなるモダリティへの拡張を容易にする拡張可能なフレームワークを構築した。私たちの主な貢献は3倍です。 1) LAMM-Dataset と LAMM-Benchmark について述べる。広範な実験によって、データセットとベンチマークの有効性が検証されます。 2)mllmのインストラクションチューニングデータセットとベンチマークを構築するための詳細な方法を示し,mllmに関する今後の研究により,他のドメインやタスク,モダリティへのスケールアップと拡張を高速化する。 3)モダリティの拡張に最適化されたMLLMトレーニングフレームワークを提供する。また、今後の研究を加速するために、ベースラインモデル、総合的な実験観測、分析も提供する。コードとデータセットはhttps://github.com/OpenLAMM/LAMMで公開されている。 Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models have demonstrated their effectiveness in handling visual modalities. In this work, we extend the research of MLLMs to point clouds and present the LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding. We also establish an extensible framework to facilitate the extension of MLLMs to additional modalities. Our main contribution is three-fold: 1) We present the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We demonstrate the detailed methods of constructing instruction-tuning datasets and benchmarks for MLLMs, which will enable future research on MLLMs to scale up and extend to other domains, tasks, and modalities faster. 3) We provide a primary but potential MLLM training framework optimized for modalities' extension. 
We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Code and datasets are now available at https://github.com/OpenLAMM/LAMM.	翻訳日:2023-06-22 01:03:44 公開日:2023-06-18
# グラフニューラルネットワークの局所的・グローバル的展望 Local-to-global Perspectives on Graph Neural Networks ( http://arxiv.org/abs/2306.06547v2 ) ライセンス: Link先を確認	Chen Cai	(参考訳) この論文は、グラフ構造化データを処理するための主要なアーキテクチャである、グラフニューラルネットワーク(gnn)のローカルからグローバルへの展望を示している。 GNNをローカルメッセージパッシングニューラルネットワーク(MPNN)とグローバルグラフトランスフォーマーに分類した後、我々は3つの作品を提示した。 1)グローバルGNNの一種である不変グラフネットワークの収束特性について検討する。 2)ローカルMPNNとグローバルグラフ変換器を接続し、 3)グローバルモデリングで使用される標準サブルーチンであるグラフ粗大化にローカルMPNNを使用する。 This thesis presents a local-to-global perspective on graph neural networks (GNN), the leading architecture to process graph-structured data. After categorizing GNN into local Message Passing Neural Networks (MPNN) and global Graph transformers, we present three pieces of work: 1) study the convergence property of a type of global GNN, Invariant Graph Networks, 2) connect the local MPNN and global Graph Transformer, and 3) use local MPNN for graph coarsening, a standard subroutine used in global modeling.	翻訳日:2023-06-22 01:03:25 公開日:2023-06-18
# セルワイズ物体追跡、速度推定、時間経過によるセンサデータの投影のための深層学習法 Deep Learning Method for Cell-Wise Object Tracking, Velocity Estimation and Projection of Sensor Data over Time ( http://arxiv.org/abs/2306.06126v2 ) ライセンス: Link先を確認	Marco Braun, Moritz Luszek, Mirko Meuter, Dominic Spata, Kevin Kollek and Anton Kummert	(参考訳) 環境セグメンテーションと速度推定のための最近のディープラーニング手法は、得られたセンサデータ内の時空間関係を利用する畳み込みリカレントニューラルネットワークに依存している。これらのアプローチは、ConvNetsを利用した新しい入力と記憶データの関連付けにより、シーンダイナミクスを暗黙的に導き出す。我々は、convnetがこのタスクのアーキテクチャ上の制約に苦しむ様子を示す。そこで本研究では,トランスフォーマー機構を応用した新しいリカレントニューラルネットワークユニットを提示することにより,センサ記録の時系列における時空間相関の活用に関する様々な課題を解決する。このユニット内のオブジェクトエンコーディングは、それぞれセンサ入力とメモリ状態から派生したキー-クエリペアを関連付け、連続したフレーム間で追跡される。次に、結果の追跡パターンを使用して、シーンダイナミクスと回帰速度を得る。最後のステップでは、抽出された速度推定に基づいてリカレントニューラルネットワークのメモリ状態を投影し、上記の時空間的不一致を解決する。 Current Deep Learning methods for environment segmentation and velocity estimation rely on Convolutional Recurrent Neural Networks to exploit spatio-temporal relationships within obtained sensor data. These approaches derive scene dynamics implicitly by correlating novel input and memorized data utilizing ConvNets. We show how ConvNets suffer from architectural restrictions for this task. Based on these findings, we then provide solutions to various issues on exploiting spatio-temporal correlations in a sequence of sensor recordings by presenting a novel Recurrent Neural Network unit utilizing Transformer mechanisms. Within this unit, object encodings are tracked across consecutive frames by correlating key-query pairs derived from sensor inputs and memory states, respectively. We then use resulting tracking patterns to obtain scene dynamics and regress velocities. In a last step, the memory state of the Recurrent Neural Network is projected based on extracted velocity estimates to resolve aforementioned spatio-temporal misalignment.	翻訳日:2023-06-22 01:02:43 公開日:2023-06-18
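The abstract above describes tracking object encodings across frames by correlating keys derived from sensor inputs with queries derived from memory states. A generic way to express such a correlation is scaled dot-product attention, sketched below. This is standard Transformer attention used as a stand-in, not the authors' recurrent unit, and the vectors are toy values.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def associate(queries, keys):
    """For each query vector, return attention weights over all key vectors."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights


def best_matches(weights):
    """Index of the most strongly attended key for each query."""
    return [max(range(len(w)), key=w.__getitem__) for w in weights]


# Toy data: two memory cells (queries) and two cells of the new frame (keys).
memory_queries = [[1.0, 0.0], [0.0, 1.0]]
input_keys = [[0.0, 2.0], [2.0, 0.0]]
weights = associate(memory_queries, input_keys)
matches = best_matches(weights)
```

Here memory cell 0 is matched to input cell 1 and vice versa. In a grid setting, the offset between matched cell positions divided by the frame interval would give a velocity estimate, which is the kind of tracking pattern the abstract says is used to regress velocities.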
# vision datasets: 視覚に基づく産業検査のベンチマーク VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON ( http://arxiv.org/abs/2306.07890v2 ) ライセンス: Link先を確認	Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ramazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, Meng Cao	(参考訳) ビジョンベースの検査アルゴリズムの進歩にもかかわらず、データ可用性、品質、複雑な生産要件など、現実の産業上の課題は、しばしば未解決のままである。我々は,14の産業検査データセットの多種多様なコレクションであるvision datasetsを紹介する。以前のデータセットとは異なり、VISIONは欠陥検出に汎用性をもたらし、すべての分割にアノテーションマスクを提供し、さまざまな検出方法に対処する。データセットにはインスタンスセグメンテーションアノテーションがあり、正確な欠陥識別を可能にします。 44の欠陥を含む合計18kイメージにより、VISIONは幅広い実世界のプロダクションシナリオを反映しようと試みている。 Vision Datasetsで進行中の2つのチャレンジコンペティションを支援することで、ビジョンベースの産業検査のさらなる進歩を期待する。 Despite progress in vision-based inspection algorithms, real-world industrial challenges -- specifically in data availability, quality, and complex production requirements -- often remain under-addressed. We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges. Unlike previous datasets, VISION brings versatility to defect detection, offering annotation masks across all splits and catering to various detection methodologies. Our datasets also feature instance-segmentation annotation, enabling precise defect identification. With a total of 18k images encompassing 44 defect types, VISION strives to mirror a wide range of real-world production scenarios. By supporting two ongoing challenge competitions on the VISION Datasets, we hope to foster further advancements in vision-based industrial inspection.	翻訳日:2023-06-22 00:52:52 公開日:2023-06-18
# 顧客レビューから洞察を効率的に抽出するためのクラウドベースの機械学習パイプライン A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews ( http://arxiv.org/abs/2306.07786v2 ) ライセンス: Link先を確認	Robert Lakatos, Gergo Bogacsovics, Balazs Harangi, Istvan Lakatos, Attila Tiba, Janos Toth, Marianna Szabo, Andras Hajdu	(参考訳) 自然言語処理の効率は、機械学習モデル、特にニューラルネットワークベースのソリューションの出現によって劇的に向上した。しかしながら、特定のドメインを考慮する場合、いくつかのタスクはまだ難しい。本稿では,パイプラインに統合された機械学習手法を用いて,顧客レビューから洞察を抽出するクラウドシステムを提案する。トピックモデリングには、自然言語処理、ベクトル埋め込みに基づくキーワード抽出、クラスタリング用に設計されたトランスフォーマーベースニューラルネットワークを用いる。提案モデルの要素は,効率的な情報抽出,抽出した情報のトピックモデリング,ユーザニーズといった要件を満たすために,さらに統合され,さらに発展してきた。さらに,本タスクの既存のトピックモデリングやキーワード抽出ソリューションよりも優れた結果が得られる。提案手法は,ベンチマークのために公開されているデータセットを用いて,他の最先端手法と比較して検証・比較する。 The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks are still challenging, especially when considering specific domains. In this paper, we present a cloud-based system that can extract insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector embedding-based keyword extraction, and clustering. The elements of our model have been integrated and further developed to meet better the requirements of efficient information extraction, topic modeling of the extracted information, and user needs. Furthermore, our system can achieve better results than this task's existing topic modeling and keyword extraction solutions. Our approach is validated and compared with other state-of-the-art methods using publicly available datasets for benchmarking.	翻訳日:2023-06-22 00:52:22 公開日:2023-06-18
# リンク予測のためのグラフニューラルネットワークの評価:現在の落とし穴とベンチマーク Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking ( http://arxiv.org/abs/2306.10453v1 ) ライセンス: Link先を確認	Juanhui Li, Harry Shomer, Haitao Mao, Shenglai Zeng, Yao Ma, Neil Shah, Jiliang Tang, Dawei Yin	(参考訳) リンク予測は、グラフのエッジの一部のみに基づいて、見当たらないエッジが存在するかどうかを予測しようとする。近年,この課題にグラフニューラルネットワーク(GNN)を活用すべく,一連の手法が導入されている。さらに、これらの新しいモデルの有効性をより良く評価するために、新しく多様なデータセットも作成されている。しかし、これらの新しい手法を適切に評価する能力を阻害する複数の落とし穴がある。これらの落とし穴には、(1)複数のベースラインでの実際のパフォーマンスよりも低いこと、(2)いくつかのデータセットにおける統一データ分割と評価指標の欠如、(3)簡単な負のサンプルを用いた非現実的な評価設定が含まれる。これらの課題を克服するために、我々はまず、同じデータセットとハイパーパラメータ検索設定を利用して、注目すべきメソッドとデータセットを公正に比較する。次に,複数のヒューリスティックスを用いて硬い負のサンプルをサンプリングするヒューリスティック関連サンプリング手法(heart)に基づいて,より実用的な評価設定を行う。新しい評価設定は、評価を現実世界の状況に合わせることによって、リンク予測の新たな挑戦と機会を促進するのに役立つ。私たちの実装とデータはhttps://github.com/Juanhui28/HeaRTで利用可能です。 Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph. A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task. Furthermore, new and diverse datasets have also been created to better evaluate the effectiveness of these new models. However, multiple pitfalls currently exist that hinder our ability to properly evaluate these new methods. These pitfalls mainly include: (1) Lower than actual performance on multiple baselines, (2) A lack of a unified data split and evaluation metric on some datasets, and (3) An unrealistic evaluation setting that uses easy negative samples. To overcome these challenges, we first conduct a fair comparison across prominent methods and datasets, utilizing the same dataset and hyperparameter search settings. We then create a more practical evaluation setting based on a Heuristic Related Sampling Technique (HeaRT), which samples hard negative samples via multiple heuristics. 
The new evaluation setting helps promote new challenges and opportunities in link prediction by aligning the evaluation with real-world situations. Our implementation and data are available at https://github.com/Juanhui28/HeaRT	翻訳日:2023-06-21 20:44:23 公開日:2023-06-18
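The hard-negative idea in the HeaRT entry above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single heuristic (common neighbors) where the abstract mentions several, and the function names and toy graph are hypothetical.

```python
def common_neighbors(adj, u, v):
    """Number of neighbors shared by nodes u and v."""
    return len(adj[u] & adj[v])


def hard_negatives(edges, num_nodes, source, k):
    """Return the k non-edges (source, n) with the highest common-neighbor score.

    Assumption: one heuristic stands in for the multiple heuristics
    described in the abstract; HeaRT itself may rank and combine differently.
    """
    adj = {n: set() for n in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Candidates are nodes that are not already linked to the source.
    candidates = [n for n in range(num_nodes)
                  if n != source and n not in adj[source]]
    # Highest score first; ties are broken by node id for determinism.
    candidates.sort(key=lambda n: (-common_neighbors(adj, source, n), n))
    return [(source, n) for n in candidates[:k]]


# Toy undirected graph (made up for illustration).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(hard_negatives(edges, 6, source=0, k=2))  # → [(0, 3), (0, 4)]
```

Node 3 shares two neighbors with node 0, so the pair (0, 3) is a harder negative than a uniformly sampled non-edge would typically be.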
# mismatch:ミスマッチエラー型を用いたマシン生成テキストのきめ細かい評価 MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types ( http://arxiv.org/abs/2306.10452v1 ) ライセンス: Link先を確認	Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray	(参考訳) 大規模言語モデルへの関心が高まっており、参照(典型的には人間生成)テキストと比較して機械テキストの品質を評価する必要性が注目されている。最近の研究はタスク固有の評価メトリクスにフォーカスするか、既存のメトリクスでキャプチャされたマシン生成テキストの特性を研究している。本研究では,一対のテキスト間のきめ細かいミスマッチに基づいて,人間の判断を7つのNLPタスクでモデル化する新しい評価手法を提案する。微粒化評価のためのNLPタスクの最近の取り組みに触発されて,空間的/地理的誤りや実体的誤りなど13種類のミスマッチエラータイプを導入し,人間の判断をより正確に予測するためのモデル指導を行った。本稿では,これらのミスマッチエラータイプを補助的タスクとして用いたマシンテキスト評価のためのニューラルネットワークフレームワークを提案し,既存の単一数値評価指標を,マシンから抽出したテキスト特徴や参照テキストに加え,スカラー機能として再活用する。当社の実験では、ミスマッチエラーによる既存のメトリクスに関する重要な洞察を明らかにしました。 7つのNLPタスクから得られたデータセットの文対間のミスマッチ誤差は,人間の評価とよく一致している。 With the growing interest in large language models, the need for evaluating the quality of machine text compared to reference (typically human-generated) text has become focal attention. Most recent works focus either on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types such as spatial/geographic errors, entity errors, etc, to guide the model for better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes the existing single-number evaluation metrics as additional scalar features, in addition to textual features extracted from the machine and reference texts. 
Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.	翻訳日:2023-06-21 20:44:04 公開日:2023-06-18
# 協調的知識の活用による胸部X線の放射線学所見の生成 Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge ( http://arxiv.org/abs/2306.10448v1 ) ライセンス: Link先を確認	Manuela Daniela Danu, George Marica, Sanjeev Kumar Karn, Bogdan Georgescu, Awais Mansoor, Florin Ghesu, Lucian Mihai Itu, Constantin Suciu, Sasa Grbic, Oladimeji Farri, Dorin Comaniciu	(参考訳) 典型的な放射線医学レポートの全てのサブセクションのうち、臨床適応、所見、印象は患者の健康状態に関する重要な詳細を反映していることが多い。インプレッションに含まれる情報は、しばしば発見によってカバーされる。 FindingsとImpressionは画像の検査によって推測できるが、臨床指標は追加のコンテキストを必要とすることが多い。医学的イメージを解釈する認知的タスクは、放射線学のワークフローにおいて最も重要かつしばしば時間を要するステップである。本稿では,医療画像の自動解釈,特に胸部X線(CXR)から発見物を生成することに焦点を当てた。したがって、この研究は、研究の執筆やナレーションにほとんどの時間を費やす放射線科医の作業量を減らすことに焦点を当てている。ラジオグラフィーレポート生成を単一ステップ画像キャプションタスクとして扱う過去の研究とは異なり、CXR画像の解釈の複雑さを考慮し、2段階のアプローチを提案する。 (a)画像に異常のある領域を検出すること、 (b)生成型大言語モデル(llm)を用いて異常領域の関連テキストを生成すること。この2段階のアプローチは解釈可能性の層を導入し、放射線技師がcxrをレビューする際に使用する体系的な推論とフレームワークを整合させる。 Among all the sub-sections in a typical radiology report, the Clinical Indications, Findings, and Impression often reflect important details about the health status of a patient. The information included in Impression is also often covered in Findings. While Findings and Impression can be deduced by inspecting the image, Clinical Indications often require additional context. The cognitive task of interpreting medical images remains the most critical and often time-consuming step in the radiology workflow. Instead of generating an end-to-end radiology report, in this paper, we focus on generating the Findings from automated interpretation of medical images, specifically chest X-rays (CXRs). Thus, this work focuses on reducing the workload of radiologists who spend most of their time either writing or narrating the Findings. 
Unlike past research, which addresses radiology report generation as a single-step image captioning task, we have further taken into consideration the complexity of interpreting CXR images and propose a two-step approach: (a) detecting the regions with abnormalities in the image, and (b) generating relevant text for regions with abnormalities by employing a generative large language model (LLM). This two-step approach introduces a layer of interpretability and aligns the framework with the systematic reasoning that radiologists use when reviewing a CXR.	翻訳日:2023-06-21 20:43:44 公開日:2023-06-18
# 分布マッチングによるグラフ学習のインプロセス大域的解釈 In-Process Global Interpretation for Graph Learning via Distribution Matching ( http://arxiv.org/abs/2306.10447v1 ) ライセンス: Link先を確認	Yi Nian, Wei Jin, Lu Lin	(参考訳) グラフニューラルネットワーク(GNN)は、重要なグラフパターンを捉える能力が優れているため、強力なグラフ学習モデルとして登場した。解釈可能なグラフ学習のためにモデルのメカニズムに関する洞察を得るべく、これまでの研究は、事前学習されたGNNモデルが個々の予測に用いるデータパターンを抽出する、ポストホックな局所解釈に焦点を当ててきた。しかし、近年の研究では、ポストホック法はモデル初期化に非常に敏感であり、局所的な解釈は特定のインスタンスに固有のモデル予測しか説明できないことが示されている。本研究では、モデルのトレーニング手順の大域的な解釈をどのように提供するかという、まだ研究されていない重要な問いに答えることで、これらの制限に対処する。我々はこの問題を、GNNのトレーニング手順を支配する高レベルかつ人間が理解可能なパターンを蒸留することを目的とした、インプロセス大域的解釈として定式化する。さらに、GNNの特徴空間における元のグラフと解釈グラフの分布を学習の進行に合わせてマッチングすることにより、解釈グラフを合成するGraph Distribution Matching(GDM)を提案する。これら少数の解釈グラフは、トレーニング中にモデルが捉える最も有益なパターンを示す。グラフ分類データセットに関する広範な実験により、高い説明精度、時間効率、クラス関連構造を明らかにする能力など、提案手法の複数の利点が示された。 Graph neural networks (GNNs) have emerged as a powerful graph learning model due to their superior capacity in capturing critical graph patterns. To gain insights about the model mechanism for interpretable graph learning, previous efforts focus on post-hoc local interpretation by extracting the data pattern that a pre-trained GNN model uses to make an individual prediction. However, recent works show that post-hoc methods are highly sensitive to model initialization and local interpretation can only explain the model prediction specific to a particular instance. In this work, we address these limitations by answering an important question that is not yet studied: how to provide global interpretation of the model training procedure? We formulate this problem as in-process global interpretation, which aims to distill high-level and human-intelligible patterns that dominate the training procedure of GNNs. We further propose Graph Distribution Matching (GDM) to synthesize interpretive graphs by matching the distribution of the original and interpretive graphs in the feature space of the GNN as its training proceeds.
These few interpretive graphs demonstrate the most informative patterns the model captures during training. Extensive experiments on graph classification datasets demonstrate multiple advantages of the proposed method, including high explanation accuracy, time efficiency and the ability to reveal class-relevant structure.	翻訳日:2023-06-21 20:43:23 公開日:2023-06-18
# メタ事前学習型自己検索によるユニバーサル情報抽出 Universal Information Extraction with Meta-Pretrained Self-Retrieval ( http://arxiv.org/abs/2306.10444v1 ) ライセンス: Link先を確認	Xin Cong, Bowen Yu, Mengcheng Fang, Tingwen Liu, Haiyang Yu, Zhongkai Hu, Fei Huang, Yongbin Li, Bin Wang	(参考訳) Universal Information Extraction (Universal IE) は、テキストから構造への統一的な生成方式で異なる抽出タスクを解くことを目的としている。このような生成手順は、抽出すべき複雑な情報構造が存在する場合に苦労する傾向がある。外部知識ベースから知識を取得することは、モデルがこの問題を克服するのに役立つかもしれないが、様々なIEタスクに適した知識ベースを構築することは不可能である。本稿では、事前学習された言語モデル (PLM) に大量の知識が格納されており、それを明示的に取り出せることに着想を得て、タスク固有の知識をPLMから取得してUniversal IEを強化するMetaRetrieverを提案する。異なるIEタスクは異なる知識を必要とするため、下流のIEタスクで微調整する際に、MetaRetrieverがタスク固有の検索性能を迅速に最大化できるメタ事前学習アルゴリズムをさらに提案する。実験の結果、MetaRetrieverは4つのIEタスク、12のデータセットにおいて、完全教師あり、低リソース、少数ショットのシナリオで新たな最先端を達成した。 Universal Information Extraction (Universal IE) aims to solve different extraction tasks in a uniform text-to-structure generation manner. Such a generation procedure tends to struggle when there exist complex information structures to be extracted. Retrieving knowledge from external knowledge bases may help models to overcome this problem, but it is impossible to construct a knowledge base suitable for various IE tasks. Inspired by the fact that a large amount of knowledge is stored in pretrained language models (PLMs) and can be retrieved explicitly, in this paper we propose MetaRetriever to retrieve task-specific knowledge from PLMs to enhance universal IE. As different IE tasks need different knowledge, we further propose a Meta-Pretraining Algorithm which allows MetaRetriever to quickly achieve maximum task-specific retrieval performance when fine-tuning on downstream IE tasks. Experimental results show that MetaRetriever achieves the new state-of-the-art on 4 IE tasks, 12 datasets under fully-supervised, low-resource and few-shot scenarios.	翻訳日:2023-06-21 20:43:05 公開日:2023-06-18
# エルニーニョ・南方振動の季節予測のための畳み込みGRUネットワーク Convolutional GRU Network for Seasonal Prediction of the El Niño-Southern Oscillation ( http://arxiv.org/abs/2306.10443v1 ) ライセンス: Link先を確認	Lingda Wang, Savana Ammons, Vera Mikyoung Hur, Ryan L. Sriver, Zhizhen Zhao	(参考訳) 全球の気温と降水パターンに大きな影響を及ぼすため、エルニーニョ・南方振動(ENSO)領域の海面水温(SST)の予測は広く研究されている。線形逆モデル(LIM)、アナログ予測(AF)、リカレントニューラルネットワーク(RNN)といった統計モデルは、大規模な力学モデルに比べて柔軟で計算コストが比較的低いため、ENSO予測に広く用いられている。しかし、これらのモデルには、SST変動の空間パターンを捉えにくい、あるいは線形力学に依存するという制限がある。本稿では、ENSO領域の時空間シーケンス予測問題に対する改良版Convolutional Gated Recurrent Unit (ConvGRU) ネットワークと、下流タスクとしてのNiño 3.4インデックス予測を提案する。提案するConvGRUネットワークはエンコーダ・デコーダ型のシーケンス・ツー・シーケンス構造を持ち、太平洋地域の過去のSSTマップを入力として、その後数ヶ月間のENSO領域内の将来のSSTマップを生成する。 ConvGRUネットワークの性能を評価するために、複数の大規模気候モデルから得られたデータを用いて学習とテストを行った。その結果、LIM、AF、RNNと比較して、ConvGRUネットワークはNiño 3.4インデックスの予測可能性を大幅に向上することが示された。この改善は、有用な予測範囲の拡大、より高いピアソン相関、より低い二乗平均平方根誤差によって裏付けられる。提案モデルは、ENSO現象の理解と予測能力の向上に期待でき、空間パターンとテレコネクションを伴う他の気象・気候予測シナリオにも広く適用可能である。 Predicting sea surface temperature (SST) within the El Niño-Southern Oscillation (ENSO) region has been extensively studied due to its significant influence on global temperature and precipitation patterns. Statistical models such as linear inverse model (LIM), analog forecasting (AF), and recurrent neural network (RNN) have been widely used for ENSO prediction, offering flexibility and relatively low computational expense compared to large dynamic models. However, these models are limited either in capturing spatial patterns in SST variability or by their reliance on linear dynamics. Here we present a modified Convolutional Gated Recurrent Unit (ConvGRU) network for the ENSO region spatio-temporal sequence prediction problem, with Niño 3.4 index prediction as a downstream task. The proposed ConvGRU network, with an encoder-decoder sequence-to-sequence structure, takes historical SST maps of the Pacific region as input and generates future SST maps for subsequent months within the ENSO region.
To evaluate the performance of the ConvGRU network, we trained and tested it using data from multiple large climate models. The results demonstrate that the ConvGRU network significantly improves the predictability of the Niño 3.4 index compared to LIM, AF, and RNN. This improvement is evidenced by an extended useful prediction range, higher Pearson correlation, and lower root-mean-square error. The proposed model holds promise for improving our understanding and predicting capabilities of the ENSO phenomenon and can be broadly applicable to other weather and climate prediction scenarios with spatial patterns and teleconnections.	翻訳日:2023-06-21 20:42:46 公開日:2023-06-18
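The two skill measures named in the ENSO entry above, Pearson correlation and root-mean-square error, can be computed as in the following minimal sketch. The index values are invented for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rmse(x, y):
    """Root-mean-square error between forecast x and observation y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical monthly index values (forecast vs. observed), for illustration only.
forecast = [0.4, 0.9, 1.3, 1.1, 0.2]
observed = [0.5, 1.0, 1.5, 0.9, 0.1]
print(pearson(forecast, observed), rmse(forecast, observed))
```

A forecast is usually judged better when the correlation is higher and the RMSE lower at the same lead time, which is the sense in which the abstract reports an improvement.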
# ロボット操作のためのユニバーサルセマンティクス・ジオメトリ表現 A Universal Semantic-Geometric Representation for Robotic Manipulation ( http://arxiv.org/abs/2306.10474v1 ) ライセンス: Link先を確認	Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao	(参考訳) ロボットはセンサー、特にRGBと深度カメラに大きく依存し、世界に対する認識と対話を行う。 RGBカメラは、正確な空間情報を欠きながら、豊かな意味情報を持つ2D画像を記録する。一方、深度カメラは重要な3Dジオメトリデータを提供するが、セマンティクスは限られている。したがって、ロボットの知覚と制御を学習するためには、両方のモダリティを統合することが不可欠である。しかし、現在の研究は主にこれらのモダリティの1つに焦点を合わせており、両方を組み込むことの利点を無視している。この目的のために,大規模な事前学習型2次元モデルのリッチな意味情報を活用し,三次元空間推論の利点を継承するロボットのための汎用認識モジュールであるセマンティック・幾何学表現(SGR)を提案する。実験の結果、SGRはエージェントに対して、シミュレーションおよび実世界の様々なロボット操作タスクを成功させ、シングルタスクとマルチタスクの両方において、最先端の手法よりも優れた性能を発揮することが示された。さらに、SGRには、新しいセマンティック属性に一般化するユニークな機能があり、他のメソッドとは分離されている。 Robots rely heavily on sensors, especially RGB and depth cameras, to perceive and interact with the world. RGB cameras record 2D images with rich semantic information while missing precise spatial information. On the other side, depth cameras offer critical 3D geometry data but capture limited semantics. Therefore, integrating both modalities is crucial for learning representations for robotic perception and control. However, current research predominantly focuses on only one of these modalities, neglecting the benefits of incorporating both. To this end, we present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning. Our experiments demonstrate that SGR empowers the agent to successfully complete a diverse range of simulated and real-world robotic manipulation tasks, outperforming state-of-the-art methods significantly in both single-task and multi-task settings. Furthermore, SGR possesses the unique capability to generalize to novel semantic attributes, setting it apart from the other methods.	翻訳日:2023-06-21 20:35:22 公開日:2023-06-18
# 2D-Shapley: 断片化されたデータ評価のためのフレームワーク 2D-Shapley: A Framework for Fragmented Data Valuation ( http://arxiv.org/abs/2306.10473v1 ) ライセンス: Link先を確認	Zhihong Liu, Hoang Anh Just, Xiangyu Chang, Xi Chen, Ruoxi Jia	(参考訳) データ評価 -- モデルの特定の予測行動に対する個々のデータソースの貢献を定量化する -- は、機械学習の透明性を高め、データ共有のためのインセンティブシステムを設計する上で非常に重要である。既存の作業は、共有機能やサンプルスペースでデータソースを評価することに集中しています。それぞれの部分的な特徴とサンプルのみを含む断片化されたデータソースの評価方法は、未解決の問題のままである。まず,集約されたデータマトリックスから断片を除去することの反事実を計算する手法を提案する。反事実計算に基づいてさらに,断片化されたデータコンテキストにおける一意に魅力的な公理を満たす,断片化されたデータ評価のための理論的枠組みである2d-shapleyを提案する。 2D-Shapleyは、有用なデータフラグメントの選択、サンプル単位のデータ値の解釈、きめ細かいデータ問題診断など、さまざまな新しいユースケースを促進する。 Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources with the shared feature or sample space. How to valuate fragmented data sources of which each only contains partial features and samples remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.	翻訳日:2023-06-21 20:35:03 公開日:2023-06-18
# 高次依存パーシングのためのニューラルポテンシャルの伝達 Transferring Neural Potentials For High Order Dependency Parsing ( http://arxiv.org/abs/2306.10469v1 ) ライセンス: Link先を確認	Farshad Noravesh	(参考訳) 高次依存構造解析は、兄弟や孫といった高次特徴を活用して、現在の一次依存構造解析器の最先端の精度を向上させる。本稿では、biaffineスコアを用いて弧スコアを推定し、それをグラフィカルモデルに伝播する。グラフィカルモデル内の推論は双対分解を用いて解かれる。本アルゴリズムはbiaffineのニューラルスコアをグラフィカルモデルに伝播し、双対分解による推論を活用することで、回路全体をエンドツーエンドに訓練し、一次の情報を高次の情報へ転送する。 High order dependency parsing leverages high order features such as siblings or grandchildren to improve the state-of-the-art accuracy of current first order dependency parsers. The present paper uses biaffine scores to estimate the arc scores, which are then propagated into a graphical model. The inference inside the graphical model is solved using dual decomposition. The present algorithm propagates biaffine neural scores to the graphical model and, by leveraging dual decomposition inference, the overall circuit is trained end-to-end to transfer first order information to high order information.	翻訳日:2023-06-21 20:34:47 公開日:2023-06-18
# ブラウン運動制御器によるGANのトレーニングの安定化 Stabilizing GANs' Training with Brownian Motion Controller ( http://arxiv.org/abs/2306.10468v1 ) ライセンス: Link先を確認	Tianjiao Luo, Ziyu Zhu, Jianfei Chen, Jun Zhu	(参考訳) Generative Adversarial Networks (GAN) のトレーニングプロセスは不安定であり、大域的に収束しない。本稿では、制御理論の観点からGANの安定性を考察し、Brownian Motion Controller (BMC) と呼ばれる汎用的な高次のノイズベース制御器を提案する。原型であるDirac-GANの場合から出発して、まったく同じでありながら到達可能な最適平衡を得るBMCを設計する。DiracGANs-BMCの訓練過程が大域的に指数安定であることを理論的に証明し、収束率の上界を導出する。次に、BMCを通常のGANに拡張し、GANs-BMCの実装手順を示す。実験の結果、GANs-BMCは、StyleGANv2-adaフレームワークの下で、より速い収束、より小さい振動幅、より良いFIDスコアでGANのトレーニングを効果的に安定化することがわかった。 The training process of generative adversarial networks (GANs) is unstable and does not converge globally. In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called Brownian Motion Controller (BMC). Starting with the prototypical case of Dirac-GANs, we design a BMC to retrieve precisely the same but reachable optimal equilibrium. We theoretically prove that the training process of DiracGANs-BMC is globally exponentially stable and derive bounds on the rate of convergence. Then we extend our BMC to normal GANs and provide implementation instructions on GANs-BMC. Our experiments show that our GANs-BMC effectively stabilizes GANs' training under StyleGANv2-ada frameworks with a faster rate of convergence, a smaller range of oscillation, and better performance in terms of FID score.	翻訳日:2023-06-21 20:34:36 公開日:2023-06-18
# グラフラドリング: 中間的コミュニケーションを伴わない極めて単純な並列GNNトレーニング Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication ( http://arxiv.org/abs/2306.10466v1 ) ライセンス: Link先を確認	Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang	(参考訳) グラフは一様であり、GNNはグラフを学習するためのニューラルネットワークの強力なファミリーである。その人気にもかかわらず、gnnの拡張は、不健全な勾配、過剰なスモーニング、情報のスカッシュといった一般的な問題に苦しめられ、それがしばしば標準以下のパフォーマンスに繋がる。本研究では,GNNのキャパシティを拡張・拡張することなく拡張し,複数の小・大規模グラフにまたがる性能向上を図ることに興味がある。最近のモデルスープの興味深い現象に触発されて、複数の大規模言語事前学習モデルの微調整重量をより良いミニマにマージできることが示唆され、モデルスープの基本を利用して、GNNスケーリング時のメモリボトルネックやトレーサビリティの問題を緩和する。より具体的には、現在のGNNの深化や拡大はしないが、GNNに適したモデルスープのデータ中心の視点を示す。すなわち、巨大なグラフデータを独立に分割して並列に訓練された複数のGNNを中間的な通信なしで構築し、その強度をグリーディ補間スーププロシージャと組み合わせて最先端のパフォーマンスを達成することで、強力なGNNを構築する。さらに,大規模なグラフデータ構造を扱える最先端のグラフサンプリングとグラフ分割アプローチを活用することで,幅広いモデルスープ作成手法を提供する。実世界の小規模・大規模グラフにまたがる広範な実験は、我々のアプローチの有効性を示し、GNNスケーリングのための有望な直交方向に向かっている。コードは以下の通り: \url{https://github.com/VITA-Group/graph_ladling}。 Graphs are omnipresent and GNNs are a powerful family of neural networks for learning over graphs. Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNNs capacity without deepening or widening, which can improve its performance across multiple small and large graphs. Motivated by the recent intriguing phenomenon of model soups, which suggest that fine-tuned weights of multiple large-language pre-trained models can be merged to a better minima, we argue to exploit the fundamentals of model soups to mitigate the aforementioned issues of memory bottleneck and trainability during GNNs scaling. 
More specifically, we propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs: we build powerful GNNs by partitioning giant graph data, training multiple comparatively weaker GNNs on the parts independently and in parallel without any intermediate communication, and combining their strengths using a greedy interpolation soup procedure to achieve state-of-the-art performance. Moreover, we provide a wide variety of model soup preparation techniques by leveraging state-of-the-art graph sampling and graph partitioning approaches that can handle large graph data structures. Our extensive experiments across many real-world small and large graphs illustrate the effectiveness of our approach and point towards a promising orthogonal direction for GNN scaling. Code is available at: https://github.com/VITA-Group/graph_ladling	翻訳日:2023-06-21 20:34:19 公開日:2023-06-18
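The greedy interpolation soup mentioned in the Graph Ladling entry above can be sketched roughly as follows. This is a toy under stated assumptions, not the released code: weights are plain lists of floats, `val_score` is a hypothetical validation function, and the candidate interpolation ratios are chosen arbitrarily.

```python
def interpolate(w_a, w_b, alpha):
    """Element-wise interpolation (1 - alpha) * w_a + alpha * w_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def greedy_interpolation_soup(candidates, val_score, alphas=(0.25, 0.5, 0.75)):
    """Merge independently trained weight vectors, keeping only helpful merges.

    Candidates are visited from best to worst validation score; a merge
    is kept only if it improves the score of the current soup.
    """
    ranked = sorted(candidates, key=val_score, reverse=True)
    soup = ranked[0]
    best = val_score(soup)
    for weights in ranked[1:]:
        base = soup  # interpolate against the soup as it stood before this candidate
        for alpha in alphas:
            trial = interpolate(base, weights, alpha)
            score = val_score(trial)
            if score > best:
                soup, best = trial, score
    return soup, best


# Toy validation score: negative squared distance to a hidden optimum (assumption).
TARGET = [1.0, -1.0]


def toy_score(w):
    return -sum((a - t) ** 2 for a, t in zip(w, TARGET))


models = [[2.0, -1.0], [0.0, -1.0], [1.0, 1.0]]
soup, score = greedy_interpolation_soup(models, toy_score)
print(soup, score)
```

In the toy run the first two candidates average to the optimum and the third is rejected because merging it would lower the score. Because the candidates are trained separately, only the final weights need to be gathered for the merge.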
# 改良されたRDOプロセスによるGANベースの画像圧縮 GAN-based Image Compression with Improved RDO Process ( http://arxiv.org/abs/2306.10461v1 ) ライセンス: Link先を確認	Fanxin Xia, Jian Jin, Lili Meng, Feng Ding, Huaxiang Zhang	(参考訳) GANベースの画像圧縮方式は、低ビットレートで高い知覚品質を実現できるため、近年顕著な進歩を見せている。しかし、主な問題が2つある。 1)色、テクスチャ、構造における再構成画像の知覚的劣化、 2)不正確なエントロピーモデルである。本稿では、レート歪み最適化(RDO)プロセスを改良した新しいGANベースの画像圧縮手法を提案する。これを実現するために、DISTSとMS-SSIMのメトリクスを用いて、色、テクスチャ、構造における知覚的劣化を測定する。さらに、エントロピーモデリングに離散化ガウス・ラプラシアン・ロジスティック混合モデル(GLLMM)を取り入れ、潜在表現の確率分布の推定精度を向上させる。評価においては、IQAメトリクスで再構成画像の知覚品質を評価する代わりに、人間の実際の知覚結果を十分に反映する平均オピニオン評点(MOS)実験を異なるコーデック間で直接実施する。実験の結果、提案手法は既存のGANベースの手法と最先端のハイブリッドコーデック(VVC)よりも優れていた。 GAN-based image compression schemes have shown remarkable progress lately due to their high perceptual quality at low bit rates. However, two main issues remain: 1) the perceptual degeneration of the reconstructed image in color, texture, and structure, and 2) the inaccurate entropy model. In this paper, we present a novel GAN-based image compression approach with an improved rate-distortion optimization (RDO) process. To achieve this, we utilize the DISTS and MS-SSIM metrics to measure perceptual degeneration in color, texture, and structure. In addition, we adopt the discretized Gaussian-Laplacian-logistic mixture model (GLLMM) for entropy modeling to improve the accuracy in estimating the probability distributions of the latent representation. During the evaluation process, instead of evaluating the perceptual quality of the reconstructed image via IQA metrics, we directly conduct the Mean Opinion Score (MOS) experiment among different codecs, which fully reflects the actual perceptual results of humans. Experimental results demonstrate that the proposed method outperforms the existing GAN-based methods and the state-of-the-art hybrid codec (i.e., VVC).	翻訳日:2023-06-21 20:33:42 公開日:2023-06-18
# instant soup: 安いプランニングアンサンブルを1枚のパスで作れば、大きなモデルから宝くじを引ける Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models ( http://arxiv.org/abs/2306.10460v1 ) ライセンス: Link先を確認	Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang	(参考訳) 大規模な事前訓練されたトランスフォーマは、微調整による多数の下流アプリケーションへの適応性の高さから、ここ数年で爆発的な注目を集めてきたが、その指数関数的に増加するパラメータ数は、業界標準のハードウェアなしでそれらを微調整する上でも、大きなハードルとなっている。近年、LTH(Lottery Ticket hypothesis)とその変種は、これらの大きな事前訓練されたモデルを用いて、密度の高いモデルと同等の性能を達成できるサブネットを創出するが、LTHプラグマティズムは、反復的なフルトレーニングと反復的マグニチュードプルーニング(IMP)のプルーニングルーチンによって著しく阻害され、モデルサイズが増加するにつれて悪化する。モデルスープの最近の観察から,複数のモデルの微調整された重量をより小型化できる可能性が示唆されている。我々は,IMPの高価な中間プルーニング段階を計算効率の悪いマスク生成と集約ルーチンに置き換えることで,従来のIMPコストのごく一部を用いて,宝くじ品質のサブネットワークを生成するInstant Soup Pruning (ISP)を提案する。具体的には、マスク生成の段階では、ISPは、様々なトレーニングプロトコルとデータサブセットを使用して、弱いノイズの多いサブネットを多数生成し、ノイズを平均化し、高品質のノイズを発生させる。複数のベンチマークビジョンと言語データセットにわたるCLIP(未探索)とBERTの2つの大規模な事前訓練モデルに対する広範な実験とアブレーションにより、ISPの有効性がいくつかの最先端のプルーニング手法と比較して検証された。コードは以下の通り。 \url{https://github.com/VITA-Group/instant_soup} Large pre-trained transformers have been receiving explosive attention in the past few years, due to their wide adaptability for numerous downstream applications via fine-tuning, but their exponentially increasing parameter counts are becoming a primary hurdle to even just fine-tune them without industry-standard hardware. Recently, Lottery Ticket Hypothesis (LTH) and its variants, have been exploited to prune these large pre-trained models generating subnetworks that can achieve similar performance as their dense counterparts, but LTH pragmatism is enormously inhibited by repetitive full training and pruning routine of iterative magnitude pruning (IMP) which worsens with increasing model size. 
Motivated by the recent observations of model soups, which suggest that fine-tuned weights of multiple models can be merged to a better minima, we propose Instant Soup Pruning (ISP) to generate lottery ticket quality subnetworks, using a fraction of the original IMP cost by replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine. More specifically, during the mask generation stage, ISP takes a small handful of iterations using varying training protocols and data subsets to generate many weak and noisy subnetworks, and superposes them to average out the noise, creating a high-quality denoised subnetwork. Our extensive experiments and ablation on two popular large-scale pre-trained models, CLIP (unexplored in pruning to date) and BERT, across multiple benchmark vision and language datasets validate the effectiveness of ISP compared to several state-of-the-art pruning methods. Code is available at: https://github.com/VITA-Group/instant_soup	翻訳日:2023-06-21 20:33:24 公開日:2023-06-18
# 区間ターゲットを用いた弱教師付き回帰 Weakly Supervised Regression with Interval Targets ( http://arxiv.org/abs/2306.10458v1 ) ライセンス: Link先を確認	Xin Cheng and Yuzhou Cao and Ximing Li and Bo An and Lei Feng	(参考訳) 本稿では、regression with interval targets (RIT) と呼ばれる、興味深い弱教師付き回帰設定について検討する。関連する回帰設定に対する従来の手法のいくつかはRITに適用できるが、統計的に一致性を持たないため、経験的性能は保証されない。本稿では、RITに関する詳細な研究を行う。第一に、RITのデータ生成過程を記述するための新しい統計モデルを提案し、その妥当性を示す。第二に、区間内の特定の値を対象値として選択してモデルを訓練する、RITのための単純な選択法を解析する。第三に、予測を区間内に制限することでモデルを訓練する、統計的に一致性のある制限法を提案する。さらに、制限法に対する推定誤差の上界を導出する。最後に、様々なデータセットに関する広範な実験により、提案手法の有効性を示す。 This paper investigates an interesting weakly supervised regression setting called regression with interval targets (RIT). Although some of the previous methods on relevant regression settings can be adapted to RIT, they are not statistically consistent, and thus their empirical performance is not guaranteed. In this paper, we provide a thorough study on RIT. First, we propose a novel statistical model to describe the data generation process for RIT and demonstrate its validity. Second, we analyze a simple selection method for RIT, which selects a particular value in the interval as the target value to train the model. Third, we propose a statistically consistent limiting method for RIT to train the model by limiting the predictions to the interval. We further derive an estimation error bound for our limiting method. Finally, extensive experiments on various datasets demonstrate the effectiveness of our proposed method.	翻訳日:2023-06-21 20:32:48 publication date placeholder
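The two strategies named in the RIT entry above, selecting a value inside the interval and limiting predictions to the interval, can be illustrated with a small sketch. The abstract does not give the paper's actual objective, so the midpoint choice and the hinge-style violation below are assumptions made only for illustration.

```python
def midpoint_target(lower, upper):
    """Selection-style surrogate: use the interval midpoint as the target (assumed choice)."""
    return (lower + upper) / 2.0


def interval_violation(pred, lower, upper):
    """Zero when pred lies in [lower, upper], else the distance to the nearest endpoint."""
    return max(lower - pred, 0.0) + max(pred - upper, 0.0)


def clip_to_interval(pred, lower, upper):
    """Limit a prediction to the interval by projecting it onto [lower, upper]."""
    return min(max(pred, lower), upper)


# A prediction of 5.0 against the interval target [2.0, 4.0].
print(midpoint_target(2.0, 4.0))          # → 3.0
print(interval_violation(5.0, 2.0, 4.0))  # → 1.0
print(clip_to_interval(5.0, 2.0, 4.0))    # → 4.0
```

The violation term leaves predictions inside the interval unpenalized, which reflects the fact that the interval is the only supervision available in this setting.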
# グラフ表現学習によるバイオメディシンの進歩 : 最近の進歩,課題,今後の方向性 Advancing Biomedicine with Graph Representation Learning: Recent Progress, Challenges, and Future Directions ( http://arxiv.org/abs/2306.10456v1 ) ライセンス: Link先を確認	Fang Li, Yi Nian, Zenan Sun, Cui Tao	(参考訳) グラフ表現学習(GRL)は、バイオメディシンを含む様々な分野のブレークスルーに大きく貢献する中心的な分野として登場した。本調査の目的は, GRL法の最近の進歩とそのバイオメディカル分野への応用を概観することである。また、GRLが現在直面している重要な課題を強調し、今後の研究の方向性について概説する。 Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various fields, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research.	翻訳日:2023-06-21 20:32:37 公開日:2023-06-18
# 制限された相手に対する量子サンプリングによるワンウェイエンタングルメント浄化の安全性 Security of One-Way Entanglement Purification with Quantum Sampling Against a Restricted Adversary ( http://arxiv.org/abs/2306.10455v1 ) ライセンス: Link先を確認	Cameron Cianci	(参考訳) エンタングルメント浄化プロトコルは、ノイズの多いチャネルにエンタングルメントを分散することにより、量子ネットワークの将来において重要な役割を果たすことを約束する。しかし, これまで詳しく検討されてきたのは双方向浄化プロトコルの安全性のみである。そこで本研究では,量子サンプリングを応用した一方向のエンタングルメント浄化プロトコルを提案し,単一量子ビットパウリゲートに制限された敵に対するセキュリティを証明する。これは、一方向のエンタングルメント浄化プロトコルと誤り訂正符号の等価性を利用する。このプロトコルの安全性を証明するために、ブーマンとフェーアが導入した量子サンプリングフレームワークを用いて、チャネルを通過した量子ビットのハミング重量を推定し、Eveが量子チャネルに課した干渉の量を決定するために、推定相対的なハミング重量$\omega$を使用する。 Eveは単一キュービットのパウリゲートに制限されているため、適用ゲートの数をハミング重量を用いて直接見積もることができる。敵が適用した1量子ビットゲートの数を推定すると、誤差補正を行い、確率$1-\epsilon_{qu}^\delta$で論理量子ビットをEveから切り離すことができる。このプロトコルは一方向のみの通信を可能にするため、送信前にコードの距離を決定する必要があるため、Bobは、Eveがコードで訂正できる以上のゲートを施したことを知った場合、プロトコルを中止せざるを得なくなる。ワンウェイプロトコルは、通信が限られている場合や、双方向プロトコルで必要とされる複数の通信ラウンドと比較してレイテンシーを減らしたい場合に使われる可能性がある。さらなる研究は、より一般的な敵に対するセキュリティ保証を得るために、任意のシングルまたはマルチキュービットゲートに対するこのプロトコルのセキュリティを調査することができる。 Entanglement purification protocols promise to play a critical role in the future of quantum networks by distributing entanglement across noisy channels. However, only the security of two-way purification protocols has been closely studied. To address this, we propose a one-way entanglement purification protocol which utilizes quantum sampling and prove its security against an adversary restricted to single qubit Pauli gates. This is done through leveraging the equivalence of one-way entanglement purification protocols with error-correcting codes. To prove the security of this protocol, we first use the quantum sampling framework introduced by Bouman and Fehr to estimate the Hamming weight of the qubits which passed through the channel and then use the estimated relative Hamming weight $\omega$ to determine the amount of interference that Eve has subjected to the quantum channel. 
Since Eve is restricted to single qubit Pauli gates, the number of applied gates can be directly estimated using the Hamming weight. Estimating the number of adversarial single qubit gates allows us to perform error correction and disentangle the logical qubit from Eve with probability $1-\epsilon_{qu}^\delta$. Since this protocol allows communication only in one direction, the distance of the code must be decided before transmission, and therefore Bob will be forced to abort the protocol if he finds that Eve has applied more gates than the code can correct. One-way protocols may find use when communication is limited, or when we desire to decrease latency compared to the multiple rounds of communication needed in two-way protocols. Further research may investigate the security of this protocol against arbitrary single or multi-qubit gates to obtain security guarantees against a more general adversary.	翻訳日:2023-06-21 20:32:29 公開日:2023-06-18
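The sampling step in the abstract above can be pictured with a purely classical analogue: estimate the relative Hamming weight of a bit string from a random subset, then compare the implied error count with what a code of a given distance can correct. This is only a sketch under stated assumptions. The function names are hypothetical, the quantum sampling framework itself is not modeled, and the abort rule ignores the error margin $\delta$ that the paper's analysis would include.

```python
import random
from typing import Sequence


def relative_hamming_weight(bits: Sequence[int]) -> float:
    """Fraction of positions that are 1 (i.e. flipped) in the full string."""
    return sum(bits) / len(bits)


def sampled_relative_weight(
    bits: Sequence[int], sample_size: int, rng: random.Random
) -> float:
    """Estimate the relative Hamming weight from a random subset of positions."""
    # Sample positions without replacement and count the ones among them.
    indices = rng.sample(range(len(bits)), sample_size)
    return sum(bits[i] for i in indices) / sample_size


def should_abort(
    estimated_weight: float, block_length: int, code_distance: int
) -> bool:
    """Hypothetical abort rule: too many estimated errors for the code.

    A code of distance d can correct up to floor((d - 1) / 2) errors.
    """
    correctable = (code_distance - 1) // 2
    return estimated_weight * block_length > correctable
```

Because the code distance is fixed before transmission, the only decision left at the receiving end in this sketch is whether the estimated error count is within the correctable range.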
# 自律運転のためのオンライン地図ベクトル化:ラスタライズの視点から Online Map Vectorization for Autonomous Driving: A Rasterization Perspective ( http://arxiv.org/abs/2306.10502v1 ) ライセンス: Link先を確認	Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, Zuoguan Wang	(参考訳) ベクトル化高精細度(hd)マップは自動運転に必須であり、高度な知覚と計画のための詳細な環境情報を提供する。しかし、現在の地図ベクトル化法はしばしば偏差を示し、既存の地図ベクトル化の評価基準ではこれらの偏差を検出するのに十分な感度が欠けている。これらの制約に対処するため、ラスタ化の哲学をマップベクトル化に統合することを提案する。具体的には、ラスタライズに基づく新しい評価指標を導入し、感度が良く、現実の自律運転シナリオに適している。さらに、ベクトル化出力に微分可能ラスタ化を適用し、ラスタ化HDマップの精密かつ幾何学的監視を行う新しいフレームワークであるMapVR(Map Vectorization via Rasterization)を提案する。特に、MapVRは様々な幾何学的な形状のラスタ化戦略を設計し、幅広い地図要素に効果的に適用することができる。実験により、ラスタ化を地図ベクトル化に組み込むことは、推論中に余分な計算コストを伴わずに性能を大幅に向上させ、より正確な地図認識をもたらし、究極的にはより安全な自動運転を促進することが示されている。 Vectorized high-definition (HD) map is essential for autonomous driving, providing detailed and precise environmental information for advanced perception and planning. However, current map vectorization methods often exhibit deviations, and the existing evaluation metric for map vectorization lacks sufficient sensitivity to detect these deviations. To address these limitations, we propose integrating the philosophy of rasterization into map vectorization. Specifically, we introduce a new rasterization-based evaluation metric, which has superior sensitivity and is better suited to real-world autonomous driving scenarios. Furthermore, we propose MapVR (Map Vectorization via Rasterization), a novel framework that applies differentiable rasterization to vectorized outputs and then performs precise and geometry-aware supervision on rasterized HD maps. Notably, MapVR designs tailored rasterization strategies for various geometric shapes, enabling effective adaptation to a wide range of map elements. 
Experiments show that incorporating rasterization into map vectorization greatly enhances performance with no extra computational cost during inference, leading to more accurate map perception and ultimately promoting safer autonomous driving.	翻訳日:2023-06-21 20:24:49 公開日:2023-06-18
# 多層心血管疾患予測のための半教師付き学習:マルチデータセットによる検討 Semi-Supervised Learning for Multi-Label Cardiovascular Diseases Prediction:A Multi-Dataset Study ( http://arxiv.org/abs/2306.10494v1 ) ライセンス: Link先を確認	Rushuang Zhou, Lei Lu, Zijun Liu, Ting Xiang, Zhen Liang, David A. Clifton, Yining Dong, Yuan-Ting Zhang	(参考訳) 心電図は、心血管疾患(CVD)を予測するための非侵襲的なツールである。現在の心電図に基づく診断システムは,ディープラーニング技術の急速な発展により,有望な性能を示す。しかし、ラベルの不足、複数のCVDの共起、見えないデータセットの性能の低下は、ディープラーニングベースのモデルの普及を著しく妨げている。統一されたフレームワークでそれらに取り組むことは、依然として大きな課題である。そこで本研究では,複数のCVDを同時に認識するマルチラベル半教師付きモデル(ECGMatch)を提案する。 ECGMatchでは、弱い強力なECGデータ拡張のためにECGAugmentモジュールが開発され、モデルトレーニングのための多様なサンプルを生成する。その後、ラベル不足を緩和する擬似ラベル生成・改良のために、近隣の合意モデリングと知識蒸留を備えたハイパーパラメータ効率のフレームワークを設計する。最後に,ラベル付きサンプル内の異なるCVDの共起情報を捕捉し,ラベル付きサンプルに伝達するラベル相関アライメントモジュールを提案する。 4つのデータセットと3つのプロトコルに関する大規模な実験は、提案モデルの有効性と安定性を実証している。そのため,本モデルは,限られた監督下での多ラベルCVD予測において,堅牢な性能を実現する診断システムを実現することができる。 Electrocardiography (ECG) is a non-invasive tool for predicting cardiovascular diseases (CVDs). Current ECG-based diagnosis systems show promising performance owing to the rapid development of deep learning techniques. However, the label scarcity problem, the co-occurrence of multiple CVDs and the poor performance on unseen datasets greatly hinder the widespread application of deep learning-based models. Addressing them in a unified framework remains a significant challenge. To this end, we propose a multi-label semi-supervised model (ECGMatch) to recognize multiple CVDs simultaneously with limited supervision. In the ECGMatch, an ECGAugment module is developed for weak and strong ECG data augmentation, which generates diverse samples for model training. Subsequently, a hyperparameter-efficient framework with neighbor agreement modeling and knowledge distillation is designed for pseudo-label generation and refinement, which mitigates the label scarcity problem. 
Finally, a label correlation alignment module is proposed to capture the co-occurrence information of different CVDs within labeled samples and propagate this information to unlabeled samples. Extensive experiments on four datasets and three protocols demonstrate the effectiveness and stability of the proposed model, especially on unseen datasets. As such, this model can pave the way for diagnostic systems that achieve robust performance on multi-label CVD prediction with limited supervision.	翻訳日:2023-06-21 20:24:28 公開日:2023-06-18
# MOSPC: ペアワイズ比較に基づくMOS予測 MOSPC: MOS Prediction Based on Pairwise Comparison ( http://arxiv.org/abs/2306.10493v1 ) ライセンス: Link先を確認	Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang	(参考訳) 合成音声の品質を評価する主観的指標として、平均評価スコア (MOS)は、通常、複数の注釈者が同じ音声を得点する必要がある。このようなアノテーションアプローチには多くのマンパワーが必要で、時間もかかります。自動評価のためのMOS予測モデルは、労働コストを大幅に削減することができる。先行研究では,MOSスコアが近い場合,音声品質を正確にランク付けすることは困難である。しかし, 実用的応用においては, 単にMOSスコアを予測するよりも, 合成システムや文の品質を正しくランク付けすることが重要である。一方、アノテーション中に各アノテータが複数のオーディオをスコアする際、スコアはアノテータが最初の1つまたは数個の音声に付与したスコアに基づく相対値となる可能性が高い。以上の2点により,ペア比較(MOSPC)に基づくMOS予測のための一般的なフレームワークを提案し,C-Mixupアルゴリズムを用いてMOSPCの一般化性能を向上させる。 BVCCとVCC2018の実験は、我々のフレームワークが相関係数の指標のほとんど、特に品質ランキングに関する指標KTAUにおいてベースラインよりも優れていることを示している。また,このフレームワークは,各細粒度セグメントのランキング精度でも強力なベースラインを超えている。これらの結果から,本フレームワークが音声品質のランク付け精度の向上に寄与することが示唆された。 As a subjective metric to evaluate the quality of synthesized speech, mean opinion score (MOS) usually requires multiple annotators to score the same speech. Such an annotation approach requires a lot of manpower and is also time-consuming. A MOS prediction model for automatic evaluation can significantly reduce labor costs. In previous works, it is difficult to accurately rank the quality of speech when the MOS scores are close. However, in practical applications, it is more important to correctly rank the quality of synthesis systems or sentences than simply predicting MOS scores. Meanwhile, as each annotator scores multiple audios during annotation, the score is probably a relative value based on the first or the first few speech scores given by the annotator. Motivated by the above two points, we propose a general framework for MOS prediction based on pair comparison (MOSPC), and we utilize the C-Mixup algorithm to enhance the generalization performance of MOSPC. The experiments on BVCC and VCC2018 show that our framework outperforms the baselines on most of the correlation coefficient metrics, especially on the metric KTAU related to quality ranking. 
Our framework also surpasses the strong baseline in ranking accuracy on each fine-grained segment. These results indicate that our framework contributes to improving the ranking accuracy of speech quality.	翻訳日:2023-06-21 20:24:07 公開日:2023-06-18
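The abstract above stresses ranking quality and reports a Kendall-tau-based metric (KTAU). As a hedged illustration of what such a rank correlation measures, the sketch below computes the tau-a variant over all pairs of items. Whether the paper uses tau-a, tau-b, or another variant is not stated in the abstract, and the function name `kendall_tau_a` is chosen here for illustration only.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Kendall tau-a between two score lists of equal length.

    Counts pairs ranked in the same order (concordant) and in opposite
    order (discordant); tied pairs contribute to neither count.
    """
    if len(predicted) != len(reference):
        raise ValueError("score lists must have equal length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        product = (predicted[i] - predicted[j]) * (reference[i] - reference[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs
```

Under this pairwise view, two systems whose MOS values are close still count as one pair that is either ordered correctly or not, which is why a ranking metric can separate models that look similar under a plain regression error.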
# レーン分割注意マップ類似性を用いた自律走行シミュレーションにおけるSim2Real画像ギャップの定量化に関する研究 A Study on Quantifying Sim2Real Image Gap in Autonomous Driving Simulations Using Lane Segmentation Attention Map Similarity ( http://arxiv.org/abs/2306.10491v1 ) ライセンス: Link先を確認	Seongjeong Park, Jinu Pahk, Lennart Lorenz Freimuth Jahn, Yongseob Lim, Jinung An, Gyeungho Choi	(参考訳) 自動運転シミュレーションは非常に現実的な画像を必要とする。予備研究では,DCLGANを用いてCARLAシミュレータ画像がより現実に近いものになると,車線認識モデルの性能は現実の運転に匹敵するレベルまで向上した。また、車両が車線から外れた後に車線の中心に戻る能力が大幅に改善されたことも確認された。しかし,シミュレーション画像のリアリズムを定量的に評価するための合意基準は現時点では存在しない。そこで本研究では,FID (Fréchet Inception Distance) が事前学習モデルを用いて特徴ベクトル分布距離を測定することを前提として,ENet-SADの自己注意蒸留過程からの注意マップを用いてシミュレーション道路画像の類似度を測定する指標を提案する。最後に,実世界の自律走行試験道路を実装したCARLAマップの画像に適用することにより,計測方法の適合性を検証した。 Autonomous driving simulations require highly realistic images. Our preliminary study found that when the CARLA Simulator image was made more like reality by using DCLGAN, the performance of the lane recognition model improved to levels comparable to real-world driving. It was also confirmed that the vehicle's ability to return to the center of the lane after deviating from it improved significantly. However, there is currently no agreed-upon metric for quantitatively evaluating the realism of simulation images. To address this issue, based on the idea that FID (Fréchet Inception Distance) measures the feature vector distribution distance using a pre-trained model, this paper proposes a metric that measures the similarity of simulation road images using the attention map from the self-attention distillation process of ENet-SAD. Finally, this paper verified the suitability of the measurement method by applying it to the image of the CARLA map that implemented a real-world autonomous driving test road.	翻訳日:2023-06-21 20:23:44 公開日:2023-06-18
# ニューロシンボリック学習による高速画像ラベリング Rapid Image Labeling via Neuro-Symbolic Learning ( http://arxiv.org/abs/2306.10490v1 ) ライセンス: Link先を確認	Yifeng Wang, Zhi Tu, Yiwen Xiang, Shiyuan Zhou, Xiyuan Chen, Bingxuan Li, and Tianyi Zhang	(参考訳) コンピュータビジョン(cv)の成功は、手動の注釈データに大きく依存している。しかし、データラベリングには重要なドメイン専門知識が必要であり、クラウドワーカーに簡単に委譲することはできない、ヘルスケアのような重要なドメインで画像に注釈をつけるのは、非常に高価である。この課題に対処するために、ドメインの専門家が提供した少量のラベル付きデータから画像ラベル規則を推論し、そのルールを用いて無注釈データを自動的にラベル付けするRapidというニューロシンボリックアプローチを提案する。特にRapidは、事前訓練されたCVモデルと誘導論理学習を組み合わせて、ロジックベースのラベリングルールを推論する。 rapidは4つの画像ラベリングタスクで83.33%から88.33%のラベリング精度を達成している。特にrapidは、2つの高度に専門的なタスクで微調整されたcvモデルを大幅に上回っている。これらの結果は,小さなデータから高速に学習することの有効性と,異なるタスクを一般化する能力を示している。コードとデータセットはhttps://github.com/Neural-Symbolic-Image-Labeling/で公開されています。 The success of Computer Vision (CV) relies heavily on manually annotated data. However, it is prohibitively expensive to annotate images in key domains such as healthcare, where data labeling requires significant domain expertise and cannot be easily delegated to crowd workers. To address this challenge, we propose a neuro-symbolic approach called Rapid, which infers image labeling rules from a small amount of labeled data provided by domain experts and automatically labels unannotated data using the rules. Specifically, Rapid combines pre-trained CV models and inductive logic learning to infer the logic-based labeling rules. Rapid achieves a labeling accuracy of 83.33% to 88.33% on four image labeling tasks with only 12 to 39 labeled samples. In particular, Rapid significantly outperforms finetuned CV models in two highly specialized tasks. These results demonstrate the effectiveness of Rapid in learning from small data and its capability to generalize among different tasks. Code and our dataset are publicly available at https://github.com/Neural-Symbolic-Image-Labeling/	翻訳日:2023-06-21 20:23:25 公開日:2023-06-18
# 2層ニューラルネットワークパラメトリゼーションを用いた自然アクタークリティックの大域収束について On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization ( http://arxiv.org/abs/2306.10486v1 ) ライセンス: Link先を確認	Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal	(参考訳) アクター批判アルゴリズムは最先端の意思決定問題を解決するのに顕著な成功を収めた。しかしながら、その経験的効果にもかかわらず、その理論的基盤は、特にニューラルネットワークのパラメトリゼーションにおいて、比較的未探査のままである。本稿では,ニューラルネットを用いて批評家を表現する自然なアクタ-クリティックアルゴリズムの研究について述べる。本研究の目的は,本アルゴリズムの性能特性をより深く理解し,サンプル複雑性の保証を確立することである。そこで本研究では,2層批判パラメトリゼーション(NAC2L)を用いたNatural Actor-Criticアルゴリズムを提案する。我々のアプローチでは、凸最適化問題を通じて各イテレーションの$Q$関数を推定する。提案手法により,$\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$ のサンプル複雑性が得られることを確認した。対照的に、文献中の既存のサンプルの複雑さは、表状または線形のMDPのみを保持する。一方、この結果は可算な状態空間に対して成り立ち、MDP上の線形構造やローランク構造を必要としない。 Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics. To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach involves estimating the $Q$-function in each iteration through a convex optimization problem. We establish that our proposed approach attains a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. In contrast, the existing sample complexity results in the literature only hold for a tabular or linear MDP. Our result, on the other hand, holds for countable state spaces and does not require a linear or low-rank structure on the MDP.	翻訳日:2023-06-21 20:23:07 公開日:2023-06-18
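To give a feel for the scale of the bound quoted above, the following worked instance plugs in $\epsilon = 0.1$ and $\gamma = 0.9$. These values are chosen here purely for illustration and do not come from the paper, and the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are ignored.

```latex
\[
\frac{1}{\epsilon^{4}(1-\gamma)^{4}}
  = \frac{1}{(10^{-1})^{4}\,(10^{-1})^{4}}
  = \frac{1}{10^{-8}}
  = 10^{8}.
\]
```

Halving $\epsilon$ to $0.05$ multiplies this figure by $2^{4} = 16$, which shows how strongly the bound depends on the target accuracy.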
# 分布外検出のための平衡エネルギー正規化損失 Balanced Energy Regularization Loss for Out-of-distribution Detection ( http://arxiv.org/abs/2306.10485v1 ) ライセンス: Link先を確認	Hyunjun Choi, Hawook Jeong, Jin Young Choi	(参考訳) オフ・オブ・ディストリビューション(OOD)検出の分野では、OODデータとして補助データを使用する従来の手法が有望な性能を示している。しかし、この方法はすべての補助データに等しく損失を与え、インライアと区別する。一方, 様々なタスクにおいて, 補助的OODデータのクラス間での分布には, 一般的な不均衡が存在する。本稿では, 単純だが多種多様なタスクに有効である平衡エネルギー正規化損失を提案する。我々の平衡エネルギー正規化損失は、OODデータのクラス不均衡に対処するために補助データに対して、クラスごとに異なる事前確率を利用する。主な概念は、マイノリティクラスよりも多数派クラスからの補助的なサンプルをより強く正則化することである。本手法は, 従来のエネルギー正規化損失よりも, セマンティックセグメンテーション, ロングテール画像分類, 画像分類におけるOOD検出に優れている。さらに, セマンティックセグメンテーションにおけるOOD検出とロングテール画像分類の2つのタスクにおいて, 最先端性能を実現する。コードはhttps://github.com/hyunjunChhoi/Balanced_Energyで入手できる。 In the field of out-of-distribution (OOD) detection, a previous method that uses auxiliary data as OOD data has shown promising performance. However, the method provides an equal loss to all auxiliary data to differentiate them from inliers. Yet, based on our observation, in various tasks, there is a general imbalance in the distribution of the auxiliary OOD data across classes. We propose a balanced energy regularization loss that is simple but generally effective for a variety of tasks. Our balanced energy regularization loss utilizes class-wise different prior probabilities for auxiliary data to address the class imbalance in OOD data. The main concept is to regularize auxiliary samples from majority classes more heavily than those from minority classes. Our approach performs better for OOD detection in semantic segmentation, long-tailed image classification, and image classification than the prior energy regularization loss. Furthermore, our approach achieves state-of-the-art performance in two tasks: OOD detection in semantic segmentation and long-tailed image classification. Code is available at https://github.com/hyunjunChhoi/Balanced_Energy.	翻訳日:2023-06-21 20:22:55 公開日:2023-06-18
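The two ingredients named in the abstract above, an energy score and class-wise priors over auxiliary data, can be sketched as follows. The energy score uses the common log-sum-exp definition from energy-based OOD detection. The prior-based weighting is a simplified assumption made for this example: the helper names `class_priors` and `sample_weights` are hypothetical, and the paper's actual loss is not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Energy of a sample: -T * logsumexp(logits / T), computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating to avoid overflow.
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def class_priors(predicted_classes: Sequence[int], num_classes: int) -> List[float]:
    """Empirical share of each class among the auxiliary samples."""
    counts = Counter(predicted_classes)
    total = len(predicted_classes)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def sample_weights(
    predicted_classes: Sequence[int], priors: Sequence[float]
) -> List[float]:
    """Hypothetical weighting: samples of frequent classes get larger weights."""
    return [priors[c] for c in predicted_classes]
```

In this sketch, an auxiliary sample assigned to a class that covers 75% of the auxiliary data is weighted three times as heavily as one from a class covering 25%, mirroring the abstract's statement that majority-class samples are regularized more heavily.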
# STOIC2021 COVID-19 AIチャレンジ:再利用可能なトレーニング方法論をプライベートデータに適用 The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data ( http://arxiv.org/abs/2306.10484v1 ) ライセンス: Link先を確認	Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis, Ngoc Dung Huynh, Imran Razzak, Reda Bouadjenek, Mario Verdicchio, Pasquale Borrelli, Marco Aiello, James A. Meakin, Alexander Lemm, Christoph Russ, Razvan Ionasec, Nikos Paragios, Bram van Ginneken, and Marie-Pierre Revel Dubois	(参考訳) 課題は、自動医療画像分析の最先端を推進する。彼らが提供する公開トレーニングデータの量は、ソリューションのパフォーマンスを制限できる。これらのソリューションのトレーニング方法論へのパブリックアクセスはまだ残っていない。本研究は、プライベートデータ上でのトレーニングソリューションと再利用可能なトレーニング方法論を保証できるType Three (T3)チャレンジフォーマットを実装した。 T3では、チャレンジオーガナイザが参加者が提供するコードベースを、隔離されたトレーニングデータでトレーニングする。 T3はSTOIC2021チャレンジで実施され、CT(Computed tomography)スキャンから被験者が1ヶ月以内にインキュベーションまたは死亡と定義される重症なCOVID-19感染症を患っているかどうかを予測することを目的としている。 stoic2021は、2000年公開のctスキャンを使用してチャレンジソリューションを開発した資格フェーズと、9724名の被験者のctスキャンでソリューションをトレーニングしたトレーニング方法論を参加者が提出する最終フェーズで構成されていた。主催者は最終段階の8回のうち6回を修了した。トレーニングと実行のためのコードベースが公開された。勝利解は、重篤なCOVID-19と非重症なCOVID-19(0.815)の鑑別のために、受信機動作特性曲線の下にある領域を得た。すべてのファイナリストの最終フェーズソリューションは、予選フェーズソリューションによって改善されました。 Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. 
With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.	翻訳日:2023-06-21 20:22:37 公開日:2023-06-18
# 重み付き構造テンソル総変動による画像の雑音化 Weighted structure tensor total variation for image denoising ( http://arxiv.org/abs/2306.10482v1 ) ライセンス: Link先を確認	Xiuhan Sheng and Jingya Changa	(参考訳) 本稿では, 画像復号化問題の変分枠組みに基づいて, 異方性全変量モデル (ATV) と構造テンソル全変量モデル (STV) を組み合わせた新しい画像復号化正規化手法を提案する。本モデルは,stvモデルにおけるパッチベースヤコビ行列に対して,atvモデルで提案する行列重み演算子を適用することにより,画像の1次情報を効果的に捕捉し,ノイズ処理中に局所的な特徴を維持できる。グレースケールとrgbカラー画像のノイズ除去実験により,提案手法は,全変量ベースモデルとstvモデルに基づく他の既知の手法と比較して,良好な修復品質が得られることが示された。 Based on the variational framework of the image denoising problem, we introduce a novel image denoising regularizer that combines anisotropic total variation model (ATV) and structure tensor total variation model (STV) in this paper. The model can effectively capture the first-order information of the image and maintain local features during the denoising process by applying the matrix weighting operator proposed in the ATV model to the patch-based Jacobian matrix in the STV model. Denoising experiments on grayscale and RGB color images demonstrate that the suggested model can produce better restoration quality in comparison to other well-known methods based on total-variation-based models and the STV model.	翻訳日:2023-06-21 20:22:17 公開日:2023-06-18
# if2net: 連続学習のためのインナートフリーネットワーク IF2Net: Innately Forgetting-Free Networks for Continual Learning ( http://arxiv.org/abs/2306.10480v1 ) ライセンス: Link先を確認	Depeng Li, Tianqi Wang, Bingrong Xu, Kenji Kawaguchi, Zhigang Zeng, and Ponnuthurai Nagaratnam Suganthan	(参考訳) 継続的学習は、以前の学習した知識に干渉することなく、新しい概念を段階的に吸収することができる。ニューラルネットワークの特徴として,情報を接続に重み付けして格納する手法を考案し,連続学習環境におけるIF2Netの設計方法について検討した。本研究では,新しいタスクの学習前後において,各タスクに対する重み付けを巧みに保ちながら,単純かつ効果的な学習パラダイムを提案する。まず,ランダム重み付きタスク列の表現レベル学習について紹介した。このテクニックは、ランダム化によって引き起こされるドリフト表現を別々のタスク最適動作状態に調整することを指すが、関連する重みは凍結され、再利用される(重みの層的な更新がよく知られている)。そして、出力重み更新を同相直交空間に投影し、モデル可塑性を維持しながら古い知識を邪魔しないようにすることで、忘れることなく逐次意思決定を行うことができる。 IF2Netは、ランダム化と直交化のそれぞれの強みを統合することにより、テスト時にタスクの同一性を知ることなく、本質的に無制限のマッピングルールを学習することができる。理論解析および実証研究において,本手法の有効性を検証した。 Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, in which information is stored in weights on connections, we investigated how to design an Innately Forgetting-Free Network (IF2Net) for continual learning context. This study proposed a straightforward yet effective learning paradigm by ingeniously keeping the weights relative to each seen task untouched before and after learning a new task. We first presented the novel representation-level learning on task sequences with random weights. This technique refers to tweaking the drifted representations caused by randomization back to their separate task-optimal working states, but the involved weights are frozen and reused (opposite to well-known layer-wise updates of weights). Then, sequential decision-making without forgetting can be achieved by projecting the output weight updates into the parsimonious orthogonal space, making the adaptations not disturb old knowledge while maintaining model plasticity. 
IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time by integrating the respective strengths of randomization and orthogonalization. We validated the effectiveness of our approach through extensive theoretical analysis and empirical study.	翻訳日:2023-06-21 20:22:02 公開日:2023-06-18
# CT金属アーティファクト低減のためのRetinexFlow RetinexFlow for CT metal artifact reduction ( http://arxiv.org/abs/2306.10520v1 ) ライセンス: Link先を確認	Jiandong Su and Ce Wang and Yinsheng Li and Kun Shang and Dong Liang	(参考訳) 金属アーティファクトはCTイメージングにおいて大きな課題であり、画質を著しく低下させ、正確な診断を困難にしている。しかし、従来の方法は、金属インプラントの位置の事前知識を必要とするか、あるいはアーティファクト形成のメカニズムとの間にモデリング上の乖離があり、高品質なCT画像を得る能力が制限されている。本研究では,金属アーティファクト低減問題を分解と完了タスクの組合せとして定式化する。そこで本研究では,Retinex理論と条件付き正規化フローに基づく,新たなエンドツーエンド画像ドメインモデルであるRetinexFlowを提案する。具体的には,金属インプラント成分と固有の成分を分解する特徴分解エンコーダを設計し,固有の特徴を抽出する。そして、特徴から画像へのフローモジュールを使用して、金属アーティファクトのないCT画像を、一連の可逆変換によって段階的に完成させる。これらの設計はcoarse-to-fine戦略でモデルに組み込まれており、優れた性能を実現しています。シミュレーションおよび臨床データを用いた実験結果から,本手法はより良い定量的・定性的な結果が得られ,アーティファクト除去や画像忠実度が向上することが示された。 Metal artifacts are a major challenge in computed tomography (CT) imaging, significantly degrading image quality and making accurate diagnosis difficult. However, previous methods either require prior knowledge of the location of metal implants, or have modeling deviations with the mechanism of artifact formation, which limits the ability to obtain high-quality CT images. In this work, we formulate the metal artifact reduction problem as a combination of decomposition and completion tasks. To solve it, we propose RetinexFlow, a novel end-to-end image domain model based on Retinex theory and conditional normalizing flow. Specifically, we first design a feature decomposition encoder for decomposing the metal implant component and inherent component, and extracting the inherent feature. Then, it uses a feature-to-image flow module to complete the metal artifact-free CT image step by step through a series of invertible transformations. These designs are incorporated in our model with a coarse-to-fine strategy, enabling it to achieve superior performance. 
The experimental results on simulation and clinical datasets show our method achieves better quantitative and qualitative results, exhibiting better visual performance in artifact removal and image fidelity.	翻訳日:2023-06-21 20:17:04 公開日:2023-06-18
# 量子プログラムのリファクタリングについて On Refactoring Quantum Programs ( http://arxiv.org/abs/2306.10517v1 ) ライセンス: Link先を確認	Jianjun Zhao	(参考訳) リファクタリングは、ソフトウェアの内部設計を再構築し、外部の振る舞いを保ちながら、ソフトウェアの効率と保守性を改善する上で重要な技術である。古典的なプログラムは様々なリファクタリング手法の恩恵を受けているが、量子プログラミングの分野には専用のリファクタリング技法がない。量子重ね合わせ、絡み合い、複製不可能原理といった量子コンピューティングの異なる性質は、特別なリファクタリング技術を必要とする。本稿では,量子プログラム専用に設計された包括的リファクタリングのセットを提示することで,このギャップを解消する。各リファクタリングは、量子プログラムの効果的な再構成を保証するために慎重に設計され、説明される。さらに,量子プログラムのリファクタリングプロセスの自動化におけるツールサポートの重要性を強調する。我々の研究は量子プログラミング言語Q#に焦点を当てているが、我々のアプローチは他の量子プログラミング言語にも適用でき、量子ソフトウェアの保守性と効率を高めるための一般的なソリューションを提供する。 Refactoring is a crucial technique for improving the efficiency and maintainability of software by restructuring its internal design while preserving its external behavior. While classical programs have benefited from various refactoring methods, the field of quantum programming lacks dedicated refactoring techniques. The distinct properties of quantum computing, such as quantum superposition, entanglement, and the no-cloning principle, necessitate specialized refactoring techniques. This paper bridges this gap by presenting a comprehensive set of refactorings specifically designed for quantum programs. Each refactoring is carefully designed and explained to ensure the effective restructuring of quantum programs. Additionally, we highlight the importance of tool support in automating the refactoring process for quantum programs. Although our study focuses on the quantum programming language Q#, our approach is applicable to other quantum programming languages, offering a general solution for enhancing the maintainability and efficiency of quantum software.	翻訳日:2023-06-21 20:16:25 公開日:2023-06-18
# 群衆の生体信号検出のためのビジョンガイドMIMOレーダビームフォーミング Vision Guided MIMO Radar Beamforming for Enhanced Vital Signs Detection in Crowds ( http://arxiv.org/abs/2306.10515v1 ) ライセンス: Link先を確認	Shuaifeng Jiang, Ahmed Alkhateeb, Daniel W. Bliss, and Yu Rong	(参考訳) リモートセンシング技術としてのレーダーは、人間の活動を分析するために何十年も使われてきた。モーション感度、プライバシー保護、透過性などの優れた特徴にもかかわらず、レーダーは光学センサーに比べて空間的自由度が制限されているため、事前情報なしで混雑した環境を感知することは困難である。本稿では,複数入力多重出力 (MIMO) レーダにおけるディジタルビームフォーミングの誘導に視覚センサを応用した,新しいデュアルセンシングシステムを開発した。また,2種類のセンサを整列するキャリブレーションアルゴリズムを開発し,キャリブレーションされたデュアルシステムは,$75^\circ$×$65^\circ$の視野および2mの範囲で,3次元空間における約2cmの精度を達成可能であることを示した。最後に,実環境におけるバイタルサイン検出の有望な方向性を浮き彫りにした,座位と立位が密集した被験者群に対して,同時にバイタルサインを検出できることを示す。 Radar as a remote sensing technology has been used to analyze human activity for decades. Despite all the great features such as motion sensitivity, privacy preservation, penetrability, and more, radar has limited spatial degrees of freedom compared to optical sensors and thus makes it challenging to sense crowded environments without prior information. In this paper, we develop a novel dual-sensing system, in which a vision sensor is leveraged to guide digital beamforming in a multiple-input multiple-output (MIMO) radar. Also, we develop a calibration algorithm to align the two types of sensors and show that the calibrated dual system achieves about two centimeters precision in three-dimensional space within a field of view of $75^\circ$ by $65^\circ$ and for a range of two meters. Finally, we show that the proposed approach is capable of detecting the vital signs simultaneously for a group of closely spaced subjects, sitting and standing, in a cluttered environment, which highlights a promising direction for vital signs detection in realistic environments.	翻訳日:2023-06-21 20:16:00 公開日:2023-06-18
# プロンプトに基づくFew Shotテキスト分類のための進化的バーバライザ探索 Evolutionary Verbalizer Search for Prompt-based Few Shot Text Classification ( http://arxiv.org/abs/2306.10514v1 ) ライセンス: Link先を確認	Tongtao Ling, Lei Chen, Yutao Lai and Hai-Lin Liu	(参考訳) few-shotテキスト分類の最近の進歩は、テキスト入力をタスク固有のプロンプトでラップしてクローズ(穴埋め)形式の質問に変換し、マスク付き言語モデルでそれらを処理してマスク付きトークンを予測し、予測された単語とターゲットラベルのマッピングを構成するバーバライザを用いる。事前訓練された言語モデルを使用するこのアプローチは、プロンプトベースのチューニングと呼ばれ、低データシナリオにおける従来の微調整アプローチよりも著しく優れている。プロンプトベースのチューニングのコアとして、バーバライザは通常、人間の努力で手作りされるか、勾配降下法によって準最適に探索される。本稿では, 最適なバーバライザの自動構築に着目し, 高性能なバーバライザによってプロンプトベースのチューニングを改善する新しい進化的バーバライザ探索(EVS)アルゴリズムを提案する。具体的には、進化アルゴリズム(EA)にインスパイアされ、進化過程において様々なバーバライザを自動進化させ、何回か繰り返して最適なものを選択する。 5つのテキスト分類データセットに関する広範囲なfew-shot実験を行い,本手法の有効性を示した。 Recent advances in few-shot text classification wrap textual inputs with task-specific prompts to form cloze questions, process them with a masked language model to predict the masked tokens, and use a verbalizer that constructs the mapping between predicted words and target labels. This approach of using pre-trained language models is called prompt-based tuning, which could remarkably outperform the conventional fine-tuning approach in the low-data scenario. As the core of prompt-based tuning, the verbalizer is usually handcrafted with human efforts or suboptimally searched by gradient descent. In this paper, we focus on automatically constructing the optimal verbalizer and propose a novel evolutionary verbalizer search (EVS) algorithm, to improve prompt-based tuning with the high-performance verbalizer. Specifically, inspired by evolutionary algorithms (EA), we utilize them to automatically evolve various verbalizers during the evolutionary procedure and select the best one after several iterations. Extensive few-shot experiments on five text classification datasets show the effectiveness of our method.	翻訳日:2023-06-21 20:15:21 公開日:2023-06-18
# LLMの認知能力を効果的に測定する:適応的テストの観点から Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective ( http://arxiv.org/abs/2306.10512v1 ) ライセンス: Link先を確認	Yan Zhuang, Qi Liu, Yuting Ning, Weizhe Huang, Rui Lv, Zhenya Huang, Guanhao Zhao, Zheng Zhang, Qingyang Mao, Shijin Wang, Enhong Chen	(参考訳) ChatGPTのような大型言語モデル(LLM)は、人間に似た認知能力を示している。これらの異なるモデルの能力を比較するために、異なる分野(文学、生物学、心理学など)のいくつかのベンチマーク(標準テスト質問の組)がしばしば採用され、精度、リコール、F1などの伝統的な指標によるテスト結果が報告されている。しかし、このようなLLMの評価方法は認知科学の観点から非効率で不正確になりうる。心理測定に使用されるCAT(Computerized Adaptive Testing)にヒントを得て,LLM評価のための適応テストフレームワークを提案する。標準的なテストセットを使用して単に精度を報告するのではなく、モデルの性能に基づいて、難易度などのテスト問題の特徴を動的に調整する。これにより、より少ない質問を使ってモデルの能力をより正確に推定できる。さらに重要なのは、LLMを人間と簡単に比較できることであり、人間レベルの能力を目指すNLPモデルに必須である。診断報告によると、ChatGPTは「不注意な学生」のように振る舞うことが多く、うっかり誤答したり、時折質問を推測したりする傾向がある。対象知識,数学的推論,プログラミングの3つの側面から詳細な診断を行い,最新の6つの命令調整LLMをランク付けした結果,GPT4は他のモデルを大幅に上回り,中程度のレベルの学生の認知能力に到達しうることがわかった。効率的な適応テストを使った異なるモデルの異なるテスト -- 私たちは、これは大きな言語モデルを評価するための新しい規範になる可能性があると信じています。 Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. For comparing these abilities of different models, several benchmarks (i.e., sets of standard test questions) from different fields (e.g., Literature, Biology and Psychology) are often adopted and the test results under traditional metrics such as accuracy, recall and F1, are reported. However, such a way of evaluating LLMs can be inefficient and inaccurate from the cognitive science perspective. Inspired by Computerized Adaptive Testing (CAT) used in psychometrics, we propose an adaptive testing framework for LLM evaluation. Rather than using a standard test set and simply reporting accuracy, this approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance. This allows for a more accurate estimation of the model's abilities, using fewer questions. 
More importantly, it allows LLMs to be compared with humans easily, which is essential for NLP models that aim for human-level ability. Our diagnostic reports have found that ChatGPT often behaves like a "careless student", prone to slips and occasionally guessing at questions. We conduct a fine-grained diagnosis and rank the latest 6 instruction-tuned LLMs from three aspects of Subject Knowledge, Mathematical Reasoning, and Programming, where GPT4 can outperform other models significantly and reach the cognitive ability of middle-level students. We believe that giving different models different tests through efficient adaptive testing has the potential to become a new norm in evaluating large language models.	翻訳日:2023-06-21 20:15:01 公開日:2023-06-18
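The adaptive-testing idea in this abstract can be sketched with a toy loop. This is only an illustration under stated assumptions: it uses a one-parameter (Rasch) item response model, picks the next question by closeness of difficulty to the current ability estimate, and updates the estimate with a single gradient step. The function names, the learning rate, and the item pool are hypothetical and are not taken from the paper.

```python
import math


def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that the examinee answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def adaptive_test(answer_fn, difficulties, n_items=10, lr=0.5):
    """Run a toy adaptive test and return the final ability estimate.

    answer_fn(difficulty) -> bool stands in for the examinee,
    for example a wrapper that asks an LLM a question of that difficulty.
    """
    ability = 0.0
    remaining = list(difficulties)
    for _ in range(min(n_items, len(remaining))):
        # Under the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        correct = 1.0 if answer_fn(item) else 0.0
        # One gradient-ascent step on the log-likelihood of this response.
        ability += lr * (correct - p_correct(ability, item))
    return ability


pool = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
strong = adaptive_test(lambda d: True, pool, n_items=5)   # always answers correctly
weak = adaptive_test(lambda d: False, pool, n_items=5)    # always answers incorrectly
```

Because each question is chosen near the current estimate, easy questions are not wasted on a strong examinee, which is the intuition behind needing fewer questions than a fixed test set.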
# クロスドメイン・ファウショット学習のためのデュアル適応表現アライメント Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning ( http://arxiv.org/abs/2306.10511v1 ) ライセンス: Link先を確認	Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian	(参考訳) ベース知識から学習することで、限られたサポートサンプルを持つ新規なクエリを認識することを目的としている。この設定の最近の進歩は、ベース知識と新しいクエリサンプルが同じドメインに分散されていることを前提としている。本稿では,対象領域で利用可能なサンプルが極端に少ないクロスドメイン・少数ショット学習問題に対処することを提案する。この現実的な環境下では,効果的な二重適応表現アライメントアプローチを提案することで,メタリーナーの迅速な適応能力に焦点をあてる。提案手法では,まず,サポートインスタンスをプロトタイプとして再検討し,それらのプロトタイプを識別可能なクローズドフォームソリューションで再計画する。したがって、学習知識の特徴空間は、クロスインスタンスとクロスプロトタイプの関係により、クエリ空間に適応的に変換することができる。機能アライメントの他に,サポートおよびクエリサンプル間の共変シフトを解決するために,クエリサンプルの事前統計値を利用する正規化分布アライメントモジュールも提示する。これら2つのモジュールにより、プログレッシブなメタ学習フレームワークが構築され、その一般化能力を維持しながら、極めて少数のサンプルを用いて高速な適応を行う。実験結果から,cdfslベンチマーク4回,細粒度クロスドメインベンチマーク4回において,新たな最先端結果が得られた。 Few-shot learning aims to recognize novel queries with limited support samples by learning from base knowledge. Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which are usually infeasible for realistic applications. Toward this issue, we propose to address the cross-domain few-shot learning problem where only extremely few samples are available in target domains. Under this realistic setting, we focus on the fast adaptation capability of meta-learners by proposing an effective dual adaptive representation alignment approach. In our approach, a prototypical feature alignment is first proposed to recalibrate support instances as prototypes and reproject these prototypes with a differentiable closed-form solution. Therefore feature spaces of learned knowledge can be adaptively transformed to query spaces by the cross-instance and cross-prototype relations. Besides the feature alignment, we further present a normalized distribution alignment module, which exploits prior statistics of query samples for solving the covariant shifts among the support and query samples. 
With these two modules, a progressive meta-learning framework is constructed to perform the fast adaptation with extremely few-shot samples while maintaining its generalization capabilities. Experimental evidence demonstrates our approach achieves new state-of-the-art results on 4 CDFSL benchmarks and 4 fine-grained cross-domain benchmarks.	翻訳日:2023-06-21 20:14:32 公開日:2023-06-18
# 人間対機械: 学生生成とAI生成の教育内容の比較 Human vs Machine: Comparison of Student-generated and AI-generated Educational Content ( http://arxiv.org/abs/2306.10509v1 ) ライセンス: Link先を確認	Paul Denny and Hassan Khosravi and Arto Hellas and Juho Leinonen and Sami Sarsa	(参考訳) パーソナライズされた学習体験を提供するオンライン学習プラットフォームに移行する学生が増えているため、高品質な教育コンテンツの生産には大きなニーズがある。大規模言語モデル(llm)は、大規模学習教材の迅速な作成に有望な解決策を提供し、インストラクターの負担を軽減する。本研究では,学習支援活動の一環として,LLMが生み出す資源の質を学生が生み出すものと比較することにより,導入プログラミングの文脈において学習資源を生み出す可能性を検討した。盲目評価を用いて、学生はaiとその仲間によって生成されたリソースの正確性と有用性を評価した。その結果,学生が認識するai生成資源の質は,仲間が生成する資源の質と同等であることがわかった。これは、AI生成資源が特定の文脈において有効な補助材料として機能する可能性を示唆している。 llmsが生成するリソースは与えられた例に忠実に反映する傾向があるが、学生が生成するリソースは、使用するコンテンツの長さと特定の構文の特徴の点で、より多種多様である。この研究は、さまざまなタイプの学習リソースと幅広い主題領域を探索し、AI生成リソースが学習結果に長期的な影響を理解することの必要性を強調している。 As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors. In this study, we investigated the potential for LLMs to produce learning resources in an introductory programming context, by comparing the quality of the resources generated by an LLM with those created by students as part of a learnersourcing activity. Using a blind evaluation, students rated the correctness and helpfulness of resources generated by AI and their peers, after both were initially provided with identical exemplars. Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to the quality of resources generated by their peers. This suggests that AI-generated resources may serve as viable supplementary material in certain contexts. 
Resources generated by LLMs tend to closely mirror the given exemplars, whereas student-generated resources exhibit greater variety in terms of content length and specific syntax features used. The study highlights the need for further research exploring different types of learning resources and a broader range of subject areas, and understanding the long-term impact of AI-generated resources on learning outcomes.	翻訳日:2023-06-21 20:14:10 公開日:2023-06-18
# qcnext:ジョイントマルチエージェント軌道予測のための次世代フレームワーク QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2306.10508v1 ) ライセンス: Link先を確認	Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang	(参考訳) 路上エージェントの将来の軌跡の同時分布を推定することは自動運転に不可欠である。本稿では,QCNeXtと呼ばれるマルチエージェント軌道予測のための次世代フレームワークを提案する。まず,複合マルチエージェント軌道予測のタスクとして,クエリ中心のエンコーディングパラダイムを採用する。この符号化方式により, シーンエンコーダは, 設定要素の置換等価性, 空間次元の回転変換不変性, 時間次元の変換不変性を備える。これらの不変性は、精度の高いマルチエージェント予測を可能にするだけでなく、エンコーダにストリーミング処理能力を与える。第2に,エージェントの相互作用をモデル化することで,複数エージェントの軌道予測を容易にする多エージェントDETR型デコーダを提案する。連立予測モデルが限界指標においても限界予測モデルを上回ることが初めて示され,軌道予測における新たな研究機会が開かれた。我々の手法はArgoverse 2のマルチエージェントモーション予測ベンチマークで1位にランクされ、CVPR 2023 Workshop on Autonomous DrivingでArgoverse Challengeのチャンピオンを獲得した。 Estimating the joint distribution of on-road agents' future trajectories is essential for autonomous driving. In this technical report, we propose a next-generation framework for joint multi-agent trajectory prediction called QCNeXt. First, we adopt the query-centric encoding paradigm for the task of joint multi-agent trajectory prediction. Powered by this encoding scheme, our scene encoder is equipped with permutation equivariance on the set elements, roto-translation invariance in the space dimension, and translation invariance in the time dimension. These invariance properties not only enable accurate multi-agent forecasting fundamentally but also empower the encoder with the capability of streaming processing. Second, we propose a multi-agent DETR-like decoder, which facilitates joint multi-agent trajectory prediction by modeling agents' interactions at future time steps. For the first time, we show that a joint prediction model can outperform marginal prediction models even on the marginal metrics, which opens up new research opportunities in trajectory prediction. 
Our approach ranks 1st on the Argoverse 2 multi-agent motion forecasting benchmark, winning the championship of the Argoverse Challenge at the CVPR 2023 Workshop on Autonomous Driving.	翻訳日:2023-06-21 20:13:46 公開日:2023-06-18
# 非log-concave分布に対するMCMCアルゴリズムの高速条件混合 Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions ( http://arxiv.org/abs/2306.10506v1 ) ライセンス: Link先を確認	Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu	(参考訳) MCMCアルゴリズムは、ターゲット分布$\pi(x) \propto \exp(-V(x))$からサンプリングするための経験的に効率的なツールを提供する。しかし理論側では、MCMCアルゴリズムは$\pi(x)$ が非log-concaveであるときに混合速度が遅い。我々の研究は、このギャップを検証し、ポアンカレ型不等式が状態空間のサブセット$\mathcal{X}$上で成り立つとき、$\mathcal{X}$上のMCMC反復の条件分布は真の条件分布に高速に混合することを示す。この高速混合保証は、グローバル混合が証明可能な形で遅い場合にも成り立ちうる。ステートメントを形式化し,条件付き混合率を定量化する。さらに,条件付き混合はガウス混合からのサンプリング,ガウス混合モデルのパラメータ推定,よく連結した局所最小点を持つ場合のGibbsサンプリングに興味深い意味を持つことを示す。 MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rates when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when a Poincaré-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.	翻訳日:2023-06-21 20:13:28 公開日:2023-06-18
# グラフ分類のための構造感性グラフ辞書 Structure-Sensitive Graph Dictionary Embedding for Graph Classification ( http://arxiv.org/abs/2306.10505v1 ) ライセンス: Link先を確認	Guangbu Liu, Tong Zhang, Xudong Wang, Wenting Zhao, Chuanwei Zhou, and Zhen Cui	(参考訳) グラフ構造表現は、様々なグラフを区別する上で重要な役割を果たす。本研究では,入力グラフをグラフ分類タスク用のグラフ辞書の埋め込み空間に変換するための,構造化グラフ辞書埋め込み(SS-GDE)フレームワークを提案する。本稿では,基本グラフ辞書を日常的に使用する代わりに,各入力グラフに対応するパーソナライズされた辞書(名前付きグラフ辞書)を生成するための変分グラフ辞書適応(VGDA)を提案する。特に,ベースグラフキーのサブ構造を各入力に応じて調整するためにベルヌーイサンプリングを導入することで,ベース辞書の表現能力を大幅に向上させる。クロスグラフ計測を高感度かつ安定にするために, 最適輸送に対するマルチスケールの注意を設計し, 多感度ワッサースタイン符号化法を提案する。この枠組みを最適化するために, 相互情報を目的として導入し, 適合グラフ辞書の変分推論にさらに寄与する。グラフ分類の複数のデータセット上でSS-GDEを行い、実験結果から最先端手法よりも有効性と優位性を示す。 Graph structure expression plays a vital role in distinguishing various graphs. In this work, we propose a Structure-Sensitive Graph Dictionary Embedding (SS-GDE) framework to transform input graphs into the embedding space of a graph dictionary for the graph classification task. Instead of a plain use of a base graph dictionary, we propose the variational graph dictionary adaptation (VGDA) to generate a personalized dictionary (named adapted graph dictionary) for catering to each input graph. In particular, for the adaptation, the Bernoulli sampling is introduced to adjust substructures of base graph keys according to each input, which increases the expression capacity of the base dictionary tremendously. To make cross-graph measurement sensitive as well as stable, multi-sensitivity Wasserstein encoding is proposed to produce the embeddings by designing multi-scale attention on optimal transport. To optimize the framework, we introduce mutual information as the objective, which further deduces to variational inference of the adapted graph dictionary. We perform our SS-GDE on multiple datasets of graph classification, and the experimental results demonstrate the effectiveness and superiority over the state-of-the-art methods.	翻訳日:2023-06-21 20:13:12 公開日:2023-06-18
# MARBLE:ユニバーサル評価のための音楽オーディオ表現ベンチマーク MARBLE: Music Audio Representation Benchmark for Universal Evaluation ( http://arxiv.org/abs/2306.10548v1 ) ライセンス: Link先を確認	Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu	(参考訳) 画像生成やフィクションの共創など、芸術と人工知能(AI)の広範な交差の時代において、音楽のためのAIは、特に音楽の理解において比較的初期段階にある。これは、深い音楽表現に関する限られた作業、大規模データセットの不足、普遍的でコミュニティ主導のベンチマークの欠如によって明らかである。この問題に対処するため,MARBLEと呼ばれるUniversaL評価のためのMusic Audio Representation Benchmarkを導入する。音響、パフォーマンス、スコア、ハイレベル記述を含む4つの階層レベルを持つ包括的分類を定義することで、様々な音楽情報検索(MIR)タスクのベンチマークを提供する。次に,8つの公開データセット上で14のタスクに基づく統一プロトコルを構築し,音楽録音をベースラインとして開発したオープンソース事前学習モデルの表現を公平かつ標準的に評価する。さらに、MARBLEは、データセットの著作権問題に関する明確な声明とともに、使いやすく、拡張可能で、再現可能なスイートをコミュニティに提供する。その結果、近年提案されている大規模事前学習型言語モデルは、多くのタスクにおいて最善を尽くし、さらなる改善の余地があることがわかった。 leaderboardと toolkitリポジトリは、将来の音楽ai研究を促進するためにhttps://marble-bm.shef.ac.ukで公開されている。 In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. 
Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.	翻訳日:2023-06-21 20:05:37 公開日:2023-06-18
# UniMC:関係表現学習による長期記憶会話のための統一フレームワーク UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning ( http://arxiv.org/abs/2306.10543v1 ) ライセンス: Link先を確認	Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang	(参考訳) オープンドメインの長期記憶会話は、人間との長期的な親密性を確立することができ、鍵となるのは、長期の対話履歴情報を理解し記憶する能力である。既存の作業は、パイプラインを通じてモデリングする複数のモデルを統合することで、異なるステージ間の結合を無視します。本稿では,関係表現を学習することで異なるステージ間の接続を増加させる,長期記憶会話(unimc)のための統一フレームワークを提案する。具体的には、主タスクを確率グラフに基づいて3つのサブタスクに分解する。 1)会話要約 2)メモリ検索 3)メモリ拡張世代。各サブタスクは、デコーダ入力の先頭に特別なトークンを挿入することによってモデル化されたクエリとメモリ間の関連性を計算する表現を学習する。関連表現学習は、パラメータ共有と合同トレーニングを通じてサブタスク間の接続を強化する。実験結果から,提案手法は強いベースラインよりも一貫して改善され,対話の一貫性と係合性が向上することが示された。 Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation. Specifically, we decompose the main task into three subtasks based on probability graphs: 1) conversation summarization, 2) memory retrieval, 3) memory-augmented generation. Each subtask involves learning a representation for calculating the relevance between the query and memory, which is modelled by inserting a special token at the beginning of the decoder input. The relevance representation learning strengthens the connection across subtasks through parameter sharing and joint training. Extensive experimental results show that the proposed method consistently improves over strong baselines and yields better dialogue consistency and engagingness.	翻訳日:2023-06-21 20:05:15 公開日:2023-06-18
# 畳み込みニューラルネットワークにおける負の情報強化の学習 Learn to Enhance the Negative Information in Convolutional Neural Network ( http://arxiv.org/abs/2306.10536v1 ) ライセンス: Link先を確認	Zhicheng Cai, Chenglei Peng, Qiu Shen	(参考訳) 本稿では,畳み込みニューラルネットワーク(CNN)に特化して学習可能な非線形活性化機構を提案する。負のニューロンを切断し「死のReLU」の問題に苦しむReLUとは対照的に、LENIは死んだ神経細胞を再構築し、情報損失を減らす能力を持っている。改良されたReLUと比較して、LENIは負相情報をより適切に処理するための学習可能なアプローチを導入している。これにより、LENIはReLUの本来の利点を維持しつつ、モデル表現能力を大幅に向上させることができる。汎用的なアクティベーションメカニズムとして、レニはポータビリティの特性を持ち、アクティベーション層を単にレニブロックに置き換えることで、任意のcnnモデルで容易に利用できる。大規模な実験により、LENIは様々なベンチマークデータセット上の様々なベースラインモデルの性能を、明確なマージン(ImageNet-1kで最大1.24%高いトップ1精度)で、無視できる余分なパラメータで改善できることが確認された。さらなる実験では、LENIがチャネル補償機構として機能し、競争力や性能が向上するが、ベースラインモデルよりも学習パラメータが少ないことが示されている。さらに、LENIは表現能力の向上に寄与するモデル構造に非対称性を導入する。可視化実験を通じて、LENIがより多くの情報を保持し、より多くの表現を学習できることを検証する。 This paper proposes a learnable nonlinear activation mechanism specifically for convolutional neural network (CNN) termed as LENI, which learns to enhance the negative information in CNNs. In sharp contrast to ReLU which cuts off the negative neurons and suffers from the issue of ''dying ReLU'', LENI enjoys the capacity to reconstruct the dead neurons and reduce the information loss. Compared to improved ReLUs, LENI introduces a learnable approach to process the negative phase information more properly. In this way, LENI can enhance the model representational capacity significantly while maintaining the original advantages of ReLU. As a generic activation mechanism, LENI possesses the property of portability and can be easily utilized in any CNN models through simply replacing the activation layers with LENI block. Extensive experiments validate that LENI can improve the performance of various baseline models on various benchmark datasets by a clear margin (up to 1.24% higher top-1 accuracy on ImageNet-1k) with negligible extra parameters. 
Further experiments show that LENI can act as a channel compensation mechanism, offering competitive or even better performance but with fewer learned parameters than baseline models. In addition, LENI introduces the asymmetry to the model structure which contributes to the enhancement of representational capacity. Through visualization experiments, we validate that LENI can retain more information and learn more representations.	翻訳日:2023-06-21 20:05:00 公開日:2023-06-18
# ProMIL: 医用画像の確率的多重学習 ProMIL: Probabilistic Multiple Instance Learning for Medical Imaging ( http://arxiv.org/abs/2306.10535v1 ) ライセンス: Link先を確認	Łukasz Struski, Dawid Rymarczyk, Arkadiusz Lewicki, Robert Sabiniewicz, Jacek Tabor, Bartosz Zieliński	(参考訳) マルチインスタンスラーニング(MIL)は、ひとつのラベルがインスタンスの袋全体に割り当てられる弱い教師付き問題である。 MILモデルの重要なクラスはインスタンスベースで、まずインスタンスを分類し、その予測を集約してバッグラベルを取得する。最も一般的なMILモデルでは、インスタンスの少なくとも1つが正のラベルを持つ場合にバッグを正とみなす。しかし、この推論は、ポジティブなバッグラベルが特定のポジティブなインスタンスのパーセンテージの結果であるような、多くの現実のシナリオでは成り立たない。この問題に対処するために,深層ニューラルネットワークとベルンシュタイン多項式推定に基づく,ProMILと呼ばれる専用インスタンスベースの手法を提案する。 ProMILの重要な利点は、意思決定に最適なパーセンテージを自動的に検出できることである。 ProMILは実世界の医療応用において標準のインスタンスベースMILよりも優れていることを示す。コードを利用可能にします。 Multiple Instance Learning (MIL) is a weakly-supervised problem in which one label is assigned to the whole bag of instances. An important class of MIL models is instance-based, where we first classify instances and then aggregate those predictions to obtain a bag label. The most common MIL model considers a bag positive if at least one of its instances has a positive label. However, this reasoning does not hold in many real-life scenarios, where the positive bag label is often a consequence of a certain percentage of positive instances. To address this issue, we introduce a dedicated instance-based method called ProMIL, based on deep neural networks and Bernstein polynomial estimation. An important advantage of ProMIL is that it can automatically detect the optimal percentage level for decision-making. We show that ProMIL outperforms standard instance-based MIL in real-world medical applications. We make the code available.	翻訳日:2023-06-21 20:04:35 公開日:2023-06-18
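The difference between the standard MIL assumption and a percentage-based one can be shown in a few lines. This sketch is not ProMIL itself: according to the abstract, ProMIL detects the percentage level automatically using deep networks and Bernstein polynomial estimation, whereas here the threshold `min_fraction` is a fixed, hypothetical parameter chosen by hand.

```python
def bag_label_standard(instance_preds):
    """Standard MIL assumption: the bag is positive if any instance is positive."""
    return int(any(p == 1 for p in instance_preds))


def bag_label_percentage(instance_preds, min_fraction):
    """Percentage-based assumption: the bag is positive only if the share of
    positive instances reaches min_fraction (a hand-picked value in this sketch)."""
    if not instance_preds:
        return 0
    positives = sum(1 for p in instance_preds if p == 1)
    return int(positives / len(instance_preds) >= min_fraction)


# A bag with a single positive instance out of ten.
sparse_bag = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# A bag where four of ten instances are positive.
dense_bag = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

With a 30% threshold, the sparse bag is positive under the standard rule but negative under the percentage rule, which is the kind of case the abstract describes as common in real-life scenarios.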
# データ拡張によるグラフ異常検出モデルの一般化性の向上 Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation ( http://arxiv.org/abs/2306.10534v1 ) ライセンス: Link先を確認	Shuang Zhou, Xiao Huang, Ninghao Liu, Huachi Zhou, Fu-Lai Chung, Long-Kai Huang	(参考訳) グラフ異常検出(GAD)は、少数の異常でさえ、良心的なユーザーに大きな脅威をもたらす可能性があるため、重要なタスクである。従来の知識として利用可能なラベルを効果的に活用できる最近の半教師付きGAD法は、教師なし手法よりも優れた性能を実現している。実際には、人々はビジネスを確保するために新しい(サブ)グラフ上の異常を識別する必要があるが、効果的な検出モデルをトレーニングするラベルが欠落している可能性がある。自然なアイデアのひとつは、トレーニング済みのgadモデルをテスト用の新しい(サブ)グラフに直接導入することだ。しかし、既存の半教師付きGAD法は一般化の問題に悩まされており、例えば、よく訓練されたモデルは、同じグラフの見えない領域(つまり、トレーニングではアクセスできない)ではうまく機能しない。それは大きなトラブルを引き起こすかもしれない。本稿では,この現象を基礎として,学習領域グラフと未発見テストグラフの両方の異常を効果的に識別し,潜在的な危険を解消することを目的とした,一般化グラフ異常検出の一般的かつ新しい研究問題を提案する。それでも、限られたラベルしか利用できないため、通常のバックグラウンドはトレーニングとテストデータの違いがあるため、難しい作業です。そこで本研究では,学習データを充実させ,GADモデルの一般化性を高めるために,textit{AugAN} (\uline{Aug}mentation for \uline{A}nomaly and \uline{N}ormal distributions) というデータ拡張手法を提案する。モデル一般化性向上における本手法の有効性を検証する。 Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved superior performances than unsupervised methods. In practice, people usually need to identify anomalies on new (sub)graphs to secure their business, but they may lack labels to train an effective detection model. One natural idea is to directly adopt a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization issue, i.e., well-trained models could not perform well on an unseen area (i.e., not accessible in training) of the same graph. It may cause great troubles. 
In this paper, building on this observation, we propose a general and novel research problem, generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph and the unseen testing graph to eliminate potential dangers. Nevertheless, it is a challenging task since only limited labels are available, and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.	翻訳日:2023-06-21 20:04:20 公開日:2023-06-18
# 事前学習したテキストから画像への拡散モデルによるポイントクラウド補完 Point-Cloud Completion with Pretrained Text-to-image Diffusion Models ( http://arxiv.org/abs/2306.10533v1 ) ライセンス: Link先を確認	Yoni Kasten, Ohad Rahamim, Gal Chechik	(参考訳) 実世界のアプリケーションで収集されるポイントクラウドデータは、しばしば不完全である。データは一般的に、特定の視点や角度しか捉えない部分的な視点から観察されるオブジェクトのために欠落している。さらに、オクルージョンと低解像度サンプリングのため、データは不完全である。既存の補完アプローチは、ノイズと不完全な点雲の完成を導くために、事前に定義されたオブジェクトのデータセットに依存している。しかし、これらのアプローチは、トレーニングデータセットでは不十分なOOD(Out-Of-Distribution)オブジェクトでテストすると、パフォーマンスが悪くなります。ここでは,近年のテキストガイド画像生成の進歩を活かし,テキストガイド形状生成の大きなブレークスルーを導いた。本稿では,事前学習したテキストから画像への拡散モデルを用いて,与えられた不完全点クラウドのテキストセマンティクスを活用し,完全な表面表現を得るsds完全というアプローチについて述べる。 SDS-Completeは、高価な3D情報を集めることなく、テスト時間最適化を用いて様々なオブジェクトを補完することができる。実世界の深度センサとLiDARスキャナーで捉えた不完全なスキャン対象に対するSDS完全性を評価する。一般的なデータセットから欠落したオブジェクトを効果的に再構築し、現在の手法と比較して、Chamferの損失を平均50%削減する。プロジェクトページ: https://sds-complete.github.io/ Point-cloud data collected in real-world applications are often incomplete. Data is typically missing due to objects being observed from partial viewpoints, which only capture a specific perspective or angle. Additionally, data can be incomplete due to occlusion and low-resolution sampling. Existing completion approaches rely on datasets of predefined objects to guide the completion of noisy and incomplete, point clouds. However, these approaches perform poorly when tested on Out-Of-Distribution (OOD) objects, that are poorly represented in the training dataset. Here we leverage recent advances in text-guided image generation, which lead to major breakthroughs in text-guided shape generation. We describe an approach called SDS-Complete that uses a pre-trained text-to-image diffusion model and leverages the text semantics of a given incomplete point cloud of an object, to obtain a complete surface representation. SDS-Complete can complete a variety of objects using test-time optimization without expensive collection of 3D information. 
We evaluate SDS-Complete on incomplete scanned objects, captured by real-world depth sensors and LiDAR scanners. We find that it effectively reconstructs objects that are absent from common datasets, reducing Chamfer loss by 50% on average compared with current methods. Project page: https://sds-complete.github.io/	翻訳日:2023-06-21 20:03:52 公開日:2023-06-18
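For readers unfamiliar with the Chamfer loss mentioned above, the following is a minimal pure-Python sketch of one common definition: the mean squared nearest-neighbour distance, summed over both directions. Conventions vary (squared or unsquared distances, sums or means), and the abstract does not say which variant the authors use, so treat the exact formula here as an assumption.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Each set is a list of coordinate tuples of equal dimension.
    This brute-force version is O(len(a) * len(b)) and is meant for illustration.
    """
    def one_way(src, dst):
        # Mean over src of the squared distance to the nearest point in dst.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
same = chamfer_distance(square, square)
shifted = chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
```

A lower value means the completed surface lies closer to the reference points, so a 50% reduction corresponds to a substantially tighter reconstruction under whichever variant is used.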
# GenPose:拡散モデルによる生成カテゴリレベルのオブジェクトポーズ推定 GenPose: Generative Category-level Object Pose Estimation via Diffusion Models ( http://arxiv.org/abs/2306.10531v1 ) ライセンス: Link先を確認	Jiyao Zhang, Mingdong Wu and Hao Dong	(参考訳) オブジェクトのポーズ推定は、エンボディドAIとコンピュータビジョンにおいて重要な役割を果たす。カテゴリーレベルのポーズ推定の実用性にもかかわらず、現在のアプローチは、多仮説問題として知られる部分的観測点雲の課題に遭遇する。本研究では,カテゴリーレベルのオブジェクトポーズ推定を条件付き生成モデルとして再検討し,従来のポイント・ツー・ポイント回帰から外れた新しい解を提案する。スコアベース拡散モデルを利用して、拡散モデルから候補をサンプリングし、2段階のプロセスでそれらを集約することによりオブジェクトのポーズを推定する。確率を推定する際のコストのかかる統合プロセスを回避するため,従来のスコアベースモデルからエネルギーベースモデルを訓練し,エンドツーエンドの推定を可能にする方法を提案する。提案手法は, 厳密な5d2cmおよび5d5cmで50%, 60%以上の精度でREAL275データセット上での最先端性能を実現する。さらに,本手法は,類似の対称特性を微調整せずに共有する新しいカテゴリに対して高い一般化性を示し,オブジェクトポーズ追跡タスクに容易に適応でき,現在の最先端ベースラインに匹敵する結果が得られることを示した。 Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches encounter challenges with partially observed point clouds, known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on strict 5d2cm and 5d5cm metrics, respectively. 
Furthermore, our method demonstrates strong generalizability to novel categories sharing similar symmetric properties without fine-tuning and can readily adapt to object pose tracking tasks, yielding comparable results to the current state-of-the-art baselines.	翻訳日:2023-06-21 20:03:31 公開日:2023-06-18
# トランスフォーマーモデルにおけるジェンダーバイアス:包括的調査 Gender Bias in Transformer Models: A comprehensive survey ( http://arxiv.org/abs/2306.10530v1 ) ライセンス: Link先を確認	Praneeth Nemani, Yericherla Deepak Joel, Palla Vijay, Farhana Ferdousi Liza	(参考訳) 人工知能(AI)におけるジェンダーバイアスは、個人の生活に深く影響する懸念として浮上している。本稿では,トランスフォーマーモデルにおけるジェンダーバイアスを言語学的観点から調査する。言語モデルにおけるジェンダーバイアスの存在は以前の研究で認識されているが、このバイアスを効果的に測定し評価する方法に関するコンセンサスが不足している。本研究はトランスフォーマーにおけるジェンダーバイアスに関する既存の文献を批判的に検討し,バイアス評価に使用される多様な方法論と指標について考察した。現在のトランスフォーマにおける性別バイアス測定のアプローチには、不完全あるいは欠陥のあるメトリクスの利用、不適切なデータセットサイズ、評価方法の標準化の欠如など、いくつかの制限がある。さらに,対話システムや機械翻訳など,下流アプリケーション用トランスフォーマーにおけるジェンダーバイアスの潜在的影響について検討した。我々は、言語技術の開発と展開における認識の高まりと説明責任の必要性を強調し、これらのシステムにおける公平性と公平性を育むことの重要性を強調している。本稿では、トランスフォーマーモデルにおけるジェンダーバイアスの包括的概要として、新しい洞察を提供し、この重要な領域における将来の研究に有用な方向性を提供する。 Gender bias in artificial intelligence (AI) has emerged as a pressing concern with profound implications for individuals' lives. This paper presents a comprehensive survey that explores gender bias in Transformer models from a linguistic perspective. While the existence of gender bias in language models has been acknowledged in previous studies, there remains a lack of consensus on how to effectively measure and evaluate this bias. Our survey critically examines the existing literature on gender bias in Transformers, shedding light on the diverse methodologies and metrics employed to assess bias. Several limitations in current approaches to measuring gender bias in Transformers are identified, encompassing the utilization of incomplete or flawed metrics, inadequate dataset sizes, and a dearth of standardization in evaluation methods. Furthermore, our survey delves into the potential ramifications of gender bias in Transformers for downstream applications, including dialogue systems and machine translation. We underscore the importance of fostering equity and fairness in these systems by emphasizing the need for heightened awareness and accountability in developing and deploying language technologies. 
This paper serves as a comprehensive overview of gender bias in Transformer models, providing novel insights and offering valuable directions for future research in this critical domain.	翻訳日:2023-06-21 20:03:06 公開日:2023-06-18
# 線形モデルにおけるDropout Regularization Versus $\ell_2$-Penalization Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model ( http://arxiv.org/abs/2306.10529v1 ) ライセンス: Link先を確認	Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber	(参考訳) 線形回帰モデルにおける降下を伴う勾配降下の統計的挙動について検討する。特に、イテレートの期待と共分散行列に対する非漸近境界が導出される。期待値におけるドロップアウトと$\ell_2$-レギュライゼーションの相関が広く引用されているのとは対照的に、この結果は勾配降下ダイナミクスとドロップアウトによって引き起こされる追加のランダム性との相互作用により、はるかに微妙な関係を示している。また,正規化効果を持たず,最小二乗推定器に収束するドロップアウトの簡易変種についても検討した。 We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for expectations and covariance matrices of the iterates are derived. In contrast with the widely cited connection between dropout and $\ell_2$-regularization in expectation, the results indicate a much more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. We also study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.	翻訳日:2023-06-21 20:02:46 公開日:2023-06-18
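The "widely cited connection between dropout and $\ell_2$-regularization in expectation" that this abstract contrasts itself with can be written out in one short computation. The notation below is an assumption introduced for illustration (it is not taken from the paper): $X$ is the design matrix, $w$ the weight vector, and $D$ a diagonal matrix with i.i.d. Bernoulli($p$) entries, so that $\mathbb{E}[D] = pI$ and each diagonal entry has variance $p(1-p)$.

```latex
\begin{align*}
\mathbb{E}_D \,\| y - X D w \|_2^2
  &= \| y - p X w \|_2^2 + \mathbb{E}_D \,\| X (D - pI) w \|_2^2
     && \text{(the cross term vanishes since } \mathbb{E}[D - pI] = 0\text{)} \\
  &= \| y - p X w \|_2^2 + p(1-p)\, w^\top \operatorname{diag}(X^\top X)\, w .
\end{align*}
```

Averaged over the dropout noise, the objective is therefore least squares plus a weighted $\ell_2$ penalty. The abstract's point is that the gradient descent iterates themselves do not simply minimize this averaged objective, because the dynamics interact with the extra randomness that dropout injects at every step.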
# 測定の最適化によるダークカウント効果の低減 Reduce dark count effects by optimizing measurements ( http://arxiv.org/abs/2306.10525v1 ) ライセンス: Link先を確認	Hao Shu	(参考訳) 量子タスクを実践する場合、デバイスの不完全性を考慮する必要がある。中でも、重要かつ未解決な問題の1つは、単一光子検出器によるダークカウント効果である。本稿では,この問題を考察し,実用的検出器を用いた場合のダークカウント効果に対するロバスト性を反映した測定の新たな最適性を定義する。また、一般の測定のための最適化スキームを提供する。この研究は、測定の選択を最適化してダークカウント効果を扱おうとする最初の研究である可能性があり、この問題は本スキームによって軽減できると信じている。 When implementing quantum tasks in practice, the imperfection of devices should be taken into account. Among these imperfections, one significant but unsolved problem is the dark count effect caused by single-photon detectors. In this paper, we consider this issue and define a new optimality criterion for measurements that reflects robustness to dark count effects with practical detectors. We also provide an optimization scheme for general measurements. This research may be the first attempt to handle dark count effects by optimizing the choice of measurements, and we believe that the problem can be mitigated by the scheme.	翻訳日:2023-06-21 20:02:35 公開日:2023-06-18
# OpenDataVal: データ評価のための統一ベンチマーク OpenDataVal: a Unified Benchmark for Data Valuation ( http://arxiv.org/abs/2306.10577v1 ) ライセンス: Link先を確認	Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon	(参考訳) 個々のデータポイントの品質と影響を評価することは、モデルパフォーマンスを改善し、トレーニングデータセット内の望ましくないバイアスを軽減するために重要です。データ品質を定量化するためにいくつかのデータ評価アルゴリズムが提案されているが、データ評価のための体系的で標準化されたベンチマークシステムがない。本稿では、研究者や実践者が様々なデータ評価アルゴリズムを適用して比較できるようにする、使いやすく統一されたベンチマークフレームワークOpenDataValを紹介する。 OpenDataValは統合された環境を提供する (i)画像、自然言語、表形式のデータセットの多種多様なコレクション。 (ii)9種類の最先端データ評価アルゴリズムの実装、及び (iii) scikit-learnで任意のモデルをインポート可能な予測モデルapi。さらに、データ値の品質を評価するための4つの下流機械学習タスクを提案する。我々はOpenDataValを用いてベンチマーク分析を行い、最先端データ評価手法の有効性を定量化し比較する。一つのアルゴリズムが全てのタスクに対して一様に最善を尽くすことはなく、ユーザの下流タスクに適切なアルゴリズムを適用すべきである。 OpenDataValはhttps://opendataval.github.ioで公開されている。さらに、研究者が自身のデータバリュエーションアルゴリズムの有効性を評価できるリーダーボードを提供する。 Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality, however, there lacks a systemic and standardized benchmarking system for data valuation. In this paper, we introduce OpenDataVal, an easy-to-use and unified benchmark framework that empowers researchers and practitioners to apply and compare various data valuation algorithms. OpenDataVal provides an integrated environment that includes (i) a diverse collection of image, natural language, and tabular datasets, (ii) implementations of nine different state-of-the-art data valuation algorithms, and (iii) a prediction model API that can import any models in scikit-learn. Furthermore, we propose four downstream machine learning tasks for evaluating the quality of data values. We perform benchmarking analysis using OpenDataVal, quantifying and comparing the efficacy of state-of-the-art data valuation approaches. 
We find that no single algorithm performs uniformly best across all tasks, and an appropriate algorithm should be employed for a user's downstream task. OpenDataVal is publicly available at https://opendataval.github.io with comprehensive documentation. Furthermore, we provide a leaderboard where researchers can evaluate the effectiveness of their own data valuation algorithms.	翻訳日:2023-06-21 19:56:32 公開日:2023-06-18
# スコアに基づくデータ同化 Score-based Data Assimilation ( http://arxiv.org/abs/2306.10574v1 ) ライセンス: Link先を確認	François Rozet and Gilles Louppe	(参考訳) データ同化は、その最も包括的な形において、確率力学系のノイズを含む、あるいは不完全な観測を説明する、もっともらしい状態軌道を特定するベイズ逆問題を扱う。この問題を解くために、粒子法や変分法など様々な手法が提案されている。しかし、ほとんどのアルゴリズムは推論を遷移ダイナミクスに依存しており、長い時間範囲や、海洋や大気のような複雑なダイナミクスを持つ高次元システムでは計算が困難になる。本研究では、軌道推論のためのスコアに基づくデータ同化を導入する。任意の長さの軌道のスコアは短いセグメント上の一連のスコアに分解できるという重要な洞察に基づいて、状態軌道のスコアベース生成モデルを学習する。トレーニング後は、スコアモデルを用いて、全ての状態を同時に生成することで非自己回帰的に推論を行う。特徴的な点として、観測モデルをトレーニング手順から切り離し、推論時にのみ生成過程を誘導するために用いることで、幅広いゼロショットの観測シナリオを可能にする。本手法の有効性を裏付ける理論的および実証的な証拠を提示する。 Data assimilation, in its most comprehensive form, addresses the Bayesian inverse problem of identifying plausible state trajectories that explain noisy or incomplete observations of stochastic dynamical systems. Various approaches have been proposed to solve this problem, including particle-based and variational methods. However, most algorithms depend on the transition dynamics for inference, which becomes intractable for long time horizons or for high-dimensional systems with complex dynamics, such as oceans or atmospheres. In this work, we introduce score-based data assimilation for trajectory inference. We learn a score-based generative model of state trajectories based on the key insight that the score of an arbitrarily long trajectory can be decomposed into a series of scores over short segments. After training, inference is carried out using the score model, in a non-autoregressive manner by generating all states simultaneously. Quite distinctively, we decouple the observation model from the training procedure and use it only at inference to guide the generative process, which enables a wide range of zero-shot observation scenarios. We present theoretical and empirical evidence supporting the effectiveness of our method.	翻訳日:2023-06-21 19:56:16 公開日:2023-06-18
# 高次相関関数における強結合および超強結合の発現 Manifestation of strong and ultra-strong coupling in high-order correlation function ( http://arxiv.org/abs/2306.10573v1 ) ライセンス: Link先を確認	A. S. Belashov, E. S. Andrianov, A. A. Zyablovsky	(参考訳) 「キャビティ-単一原子」系における強結合と超強結合は、基礎物理学と応用物理学の両方において大きな関心を集めている。キャビティモードと原子との結合強度を増加させると、まず弱結合から強結合へ、次に超強結合状態へ遷移すると考えられている。本レターでは、この通説に反論し、結合状態間の遷移が、相関関数の次数によって異なる順序で起こることを示す。n次相関関数の場合、強結合状態への遷移には、1次相関関数の場合の約 $n^{2/3}$ 倍の結合強度が必要であることを示す。対照的に、超強結合状態への遷移は、n次相関関数のダイナミクスにおいて、1次相関関数のダイナミクスよりも小さい結合強度で現れる。その結果、ある次数を超える相関関数では、結合強度の増加により、まず弱結合から超強結合状態へ、次に強結合状態へと遷移する。高次相関関数を測定することで、結合強度が振動周波数の10分の1よりはるかに小さい場合でも、「キャビティモード-単一原子」系における超強結合を観測できると主張する。 Strong and ultra-strong coupling in the "cavity - single atom" system are of great interest for both fundamental and applied physics. It is considered that the increase in the coupling strength between a cavity mode and an atom leads, first, to a transition from weak to strong coupling and, second, to the ultra-strong coupling regime. In this letter, we refute this common opinion and demonstrate that the transitions between the coupling regimes occur in a different sequence for correlation functions of different orders. We show that for n-th order correlation functions, the transition to the strong coupling regime requires a coupling strength approximately $n^{2/3}$ times greater than that for first-order correlation functions. In contrast, the transition to the ultra-strong coupling regime manifests in the dynamics of n-th order correlation functions at a lower coupling strength than in the dynamics of first-order correlation functions. As a result, there is an order of correlation functions above which an increase in the coupling strength leads, first, to a transition from weak coupling to the ultra-strong coupling regime and, second, to the strong coupling regime. 
We argue that measuring high-order correlation functions makes it possible to observe ultra-strong coupling in the "cavity mode - single atom" system when the coupling strength is much less than one tenth of the oscillation frequency.	翻訳日:2023-06-21 19:55:59 公開日:2023-06-18
# 最短共通スーパーストリング問題とテキスト組み立て問題に対する量子アルゴリズム Quantum Algorithms for the Shortest Common Superstring and Text Assembling Problems ( http://arxiv.org/abs/2306.10572v1 ) ライセンス: Link先を確認	Kamil Khadiev, Carlos Manuel Bosch Machado, Zeyu Chen, Junde Wu	(参考訳) 本稿では、テキスト組み立て問題の2つのバージョンについて考察する。全長 $L$ の文字列の列 $s^1,\dots,s^n$(辞書)と、長さ $m$ の文字列 $t$(テキスト)が与えられる。問題の最初のバージョンは、辞書から $t$ を組み立てることである。2番目のバージョンは「Shortest Superstring Problem」(SSP)または「Shortest Common Superstring Problem」(SCS)である。この場合、$t$ は与えられず、与えられた列の各文字列を部分文字列として含む最短の文字列(スーパーストリングと呼ぶ)を構築する。これらの問題は、小さな断片から長いDNA配列を再構成する配列アセンブリ法に関連している。どちらの問題に対しても、古典的アルゴリズムよりも優れた新しい量子アルゴリズムを提案する。最初のケースでは、実行時間 $O(m+\log m\sqrt{nL})$ の量子アルゴリズムを示す。SSPの場合、実行時間 $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$ の量子アルゴリズムを示す。 In this paper, we consider two versions of the Text Assembling problem. We are given a sequence of strings $s^1,\dots,s^n$ of total length $L$ that is a dictionary, and a string $t$ of length $m$ that is a text. The first version of the problem is assembling $t$ from the dictionary. The second version is the "Shortest Superstring Problem" (SSP) or the "Shortest Common Superstring Problem" (SCS). In this case, $t$ is not given, and we should construct the shortest string (we call it superstring) that contains each string from the given sequence as a substring. These problems are connected with the sequence assembly method for reconstructing a long DNA sequence from small fragments. For both problems, we suggest new quantum algorithms that work better than their classical counterparts. In the first case, we present a quantum algorithm with $O(m+\log m\sqrt{nL})$ running time. In the case of SSP, we present a quantum algorithm with running time $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$.	翻訳日:2023-06-21 19:55:37 公開日:2023-06-18
# スピングラス系における量子重ね合わせと絡み合い Quantum Superposition and Entanglement in Spin-Glass Systems ( http://arxiv.org/abs/2306.10571v1 ) ライセンス: Link先を確認	Aslı Tuncer and Serhat C. Kadıoğlu	(参考訳) スピングラスが、取りうる配置からなる等確率の重ね合わせ状態(SS)に存在しうることを提案する。Edwards-Anderson(EA)秩序パラメータと磁化を用いて、SG相、強磁性(FM)相、常磁性(PM)相などの磁気秩序(無秩序)の識別への寄与に基づき、これらのSSの分類手法を確立する。また、様々なシステムサイズを対象に、これらの相依存SSのエンタングルメント特性をネガティビティ測度を用いて調べる。解析の結果、SG秩序パラメータを用いて磁気秩序(無秩序)相のエンタングルメント特性を決定でき、逆にネガティビティが磁気秩序の存在を示すことが分かった。具体的には、スピングラス系におけるネガティビティと感受率の関係を調べる。本研究は、スピングラスと量子磁石における量子重ね合わせの役割についてさらなる知見を与える。 We propose that spin glasses can exist in equally probable superposition states (SSs) comprising potential configurations. Employing the Edwards-Anderson (EA) order parameter and magnetization, we establish a classification scheme for these SSs based on their contribution to discerning magnetic order (disorder), such as SG, ferromagnetic (FM), and paramagnetic (PM) phases. We also encompass various system sizes and investigate the entanglement properties of these phase-dependent SSs using the negativity measure. Our analysis reveals that the SG order parameter can be employed to determine the entanglement characteristics of magnetically ordered (disordered) phases, and vice versa, with negativity indicating the presence of magnetic order. Specifically, we examine the relationship between negativity and susceptibility in spin-glass systems. Our findings provide further insight into the role of quantum superposition in spin glasses and quantum magnets.	翻訳日:2023-06-21 19:55:15 公開日:2023-06-18
# MIR-GAN:アダベリアルネットワークを用いたフレームレベルモード不変表現の精製 MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition ( http://arxiv.org/abs/2306.10567v1 ) ライセンス: Link先を確認	Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng	(参考訳) 音声視覚音声認識(AVSR)は、近年、人間の発話を理解するためにマルチモーダル信号を活用することで、研究の関心が高まりつつある。この課題に対処する主流のアプローチは、マルチモーダリティ融合と表現学習のための高度なアーキテクチャと技術を開発した。しかし、異なるモダリティの自然な不均一性は、それらの表現間の分布ギャップを生じさせ、それらを融合させることを困難にする。本稿では,モダリティ間の共通表現を学習してギャップを埋めることを目的とする。感情分析などの他のマルチモーダルタスクにおける既存の類似手法とは異なり,avsrのシーケンス間タスク設定を考慮した時間的文脈依存性に注目した。特に,フレームレベルのモダリティ不変表現(MIR-GAN)を改良する対角ネットワークを提案する。 LRS3 と LRS2 の公開ベンチマークによる大規模な実験により,我々の手法は最先端技術よりも優れていることが示された。 Audio-visual speech recognition (AVSR) attracts a surge of research interest recently by leveraging multimodal signals to understand human speech. Mainstream approaches addressing this task have developed sophisticated architectures and techniques for multi-modality fusion and representation learning. However, the natural heterogeneity of different modalities causes distribution gap between their representations, making it challenging to fuse them. In this paper, we aim to learn the shared representations across modalities to bridge their gap. Different from existing similar methods on other multimodal tasks like sentiment analysis, we focus on the temporal contextual dependencies considering the sequence-to-sequence task setting of AVSR. In particular, we propose an adversarial network to refine frame-level modality-invariant representations (MIR-GAN), which captures the commonality across modalities to ease the subsequent multimodal fusion process. Extensive experiments on public benchmarks LRS3 and LRS2 show that our approach outperforms the state-of-the-arts.	翻訳日:2023-06-21 19:55:01 公開日:2023-06-18
# 雑音中の唇の聴取:ロバストな音声認識のための普遍音素マッピングと伝達 Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition ( http://arxiv.org/abs/2306.10563v1 ) ライセンス: Link先を確認	Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng	(参考訳) AVSR(Audio-visual speech Recognition)は、視覚情報を用いた音声のみの音声認識のノイズロス性を改善するための有望なソリューションを提供する。しかし, AVSRタスクの優越性を考慮して, 音質改善に重点を置いており, フロントエンドの雑音処理などの雑音適応技術が注目されている。効果はあるものの、これらの手法は通常2つの実践的な課題に直面している。 1) 実環境シナリオにおける騒音発声・視聴覚訓練の十分なラベルの欠如と課題 2) テストノイズに対する最適モデル一般性は低い。本研究では,非教師なし雑音適応の学習データに依存することなく,どのテストノイズにも適応できるavsrの頑健性を高めるために,雑音不変な視覚モダリティについて検討する。人間の知覚機構に着想を得て,視覚信号からクリーンな音声を復元し,雑音のある環境下での音声認識を可能にする,普遍的な音素マッピング(UniVPM)手法を提案する。 LRS3 と LRS2 のベンチマーク実験により, 様々なノイズや清潔な条件下での最先端性を実現することができた。また,視覚音声認識タスクにおける先行技術よりも優れていた。 Audio-visual speech recognition (AVSR) provides a promising solution to ameliorate the noise-robustness of audio-only speech recognition with visual information. However, most existing efforts still focus on audio modality to improve robustness considering its dominance in AVSR task, with noise adaptation techniques such as front-end denoise processing. Though effective, these methods are usually faced with two practical challenges: 1) lack of sufficient labeled noisy audio-visual training data in some real-world scenarios and 2) less optimal model generality to unseen testing noises. In this work, we investigate the noise-invariant visual modality to strengthen robustness of AVSR, which can adapt to any testing noises while without dependence on noisy training data, a.k.a., unsupervised noise adaptation. Inspired by human perception mechanism, we propose a universal viseme-phoneme mapping (UniVPM) approach to implement modality transfer, which can restore clean audio from visual signals to enable speech recognition under any noisy conditions. 
Extensive experiments on public benchmarks LRS3 and LRS2 show that our approach achieves state-of-the-art performance under various noisy as well as clean conditions. In addition, we also outperform the previous state of the art on the visual speech recognition task.	翻訳日:2023-06-21 19:54:43 公開日:2023-06-18
# 原表現定理のイザベル形式化 Isabelle Formalisation of Original Representation Theorems ( http://arxiv.org/abs/2306.10558v1 ) ライセンス: Link先を確認	Marco B. Caminati	(参考訳) 最近の論文では、巨大なデータベース上のクロスサイトデータマイニングと、既存のイザベルで検証されたイベント構造列挙アルゴリズムに基づいて、明らかに無関係な数学的対象(並行性理論と計算生物学における全グラフからのイベント構造)をリンクする新たな定理が発見された。そのような定理の起源と新しさを考えると、それらの形式的検証は特に望ましい。本稿では,Isabelle/HOL定義と定理による検証を行い,そのプロセスにおける技術的課題を明らかにする。導入された形式化は、Isabelleで検証されたイベント構造列挙アルゴリズムの完全な検証フレームワークへの検証を完了し、イベント構造を完全なグラフにリンクする。 In a recent paper, new theorems linking apparently unrelated mathematical objects (event structures from concurrency theory and full graphs arising in computational biology) were discovered by cross-site data mining on huge databases, and building on existing Isabelle-verified event structures enumeration algorithms. Given the origin and newness of such theorems, their formal verification is particularly desirable. This paper presents such a verification via Isabelle/HOL definitions and theorems, and exposes the technical challenges found in the process. The introduced formalisation completes the verification of Isabelle-verified event structure enumeration algorithms into a fully verified framework to link event structures to full graphs.	翻訳日:2023-06-21 19:54:20 公開日:2023-06-18
# リーダボードから実践への要約:表現バックボーンの選択と堅牢性確保 Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness ( http://arxiv.org/abs/2306.10555v1 ) ライセンス: Link先を確認	David Demeter, Oshin Agarwal, Simon Ben Igeri, Marko Sterbentz, Neil Molino, John M. Conroy, Ani Nenkova	(参考訳) 学術文献は、既存の研究コンポーネントから可能な限り最良の顧客向け要約システムを構築する方法について、あまり指針を与えていない。本稿では、一般的なモデルの中からシステムのバックボーンを選択するための分析を示し、自動評価と人間による評価の両方において、BARTがPEGASUSやT5よりも優れた性能を示すことを確認する。また、ドメインをまたいで適用した場合、要約システムの性能が著しく低下することも分かった。一方で、異種ドメインでファインチューニングしたシステムは全てのドメインで良好に動作し、幅広いドメインを対象とする要約システムに最も適している。本研究は、異種ドメインの要約ベンチマークの必要性を強調する。システム出力には、人間による評価でしか捉えられないかなりのばらつきがあり、自動評価のみを用いる標準的なリーダーボードには反映されにくい。 Academic literature does not give much guidance on how to build the best possible customer-facing summarization system from existing research components. Here we present analyses to inform the selection of a system backbone from popular models; we find that in both automatic and human evaluation, BART performs better than PEGASUS and T5. We also find that when applied cross-domain, summarizers exhibit considerably worse performance. At the same time, a system fine-tuned on heterogeneous domains performs well on all domains and will be most suitable for a broad-domain summarizer. Our work highlights the need for heterogeneous domain summarization benchmarks. We find considerable variation in system output that can be captured only with human evaluation and is thus unlikely to be reflected in standard leaderboards with only automatic evaluation.	翻訳日:2023-06-21 19:54:06 公開日:2023-06-18
# 予測モデルは因果推論に使用できるか? Can predictive models be used for causal inference? ( http://arxiv.org/abs/2306.10551v1 ) ライセンス: Link先を確認	Maximilian Pichler and Florian Hartig	(参考訳) 機械学習 (ML) と深層学習 (DL) アルゴリズムは予測タスクに優れるが、一般的には非因果関係を利用して、解釈可能性と一般化可能性の両方を制限すると仮定される。ここでは,この説明と予測のトレードオフが,期待したほど深く,基本的なものではないことを示す。 MLとDLのアルゴリズムは、すべてのデータに不特定に入力された場合の予測に非因果的特徴を用いる傾向にあるが、Pearlのバックドア調整基準に従って特徴を選択することで、任意のMLとDLアルゴリズムの学習プロセスを制限することができる。このような状況では、いくつかのアルゴリズム、特にディープニューラルネットワークは、特徴コリニアリティの下でほぼ偏りのない効果推定を提供することができる。残されるバイアスは、特定のアルゴリズム構造とハイパーパラメータ選択によって説明される。その結果、予測や推論のために調整された場合、最適なハイパーパラメータ設定が異なり、予測と説明の間のトレードオフの一般的な期待を確認する。しかし、このトレードオフの効果は因果的に制約された特徴選択の効果と比較して小さい。したがって、特徴間の因果関係が説明されれば、予測と説明の差は一般的に想定されるよりもはるかに小さくなる。また,このような因果制約のあるモデルが,共線形構造が変化した新しいデータに対してより一般化することを示し,一般化の失敗はしばしば因果学習の欠如によるものであることを示唆する。以上の結果から,mlモデルを用いて(causal)効果を推定する視点を提供するだけでなく,新しいデータに対するmlモデルとdlモデルの一般化可能性の向上にも寄与した。 Supervised machine learning (ML) and deep learning (DL) algorithms excel at predictive tasks, but it is commonly assumed that they often do so by exploiting non-causal correlations, which may limit both interpretability and generalizability. Here, we show that this trade-off between explanation and prediction is not as deep and fundamental as expected. Whereas ML and DL algorithms will indeed tend to use non-causal features for prediction when fed indiscriminately with all data, it is possible to constrain the learning process of any ML and DL algorithm by selecting features according to Pearl's backdoor adjustment criterion. In such a situation, some algorithms, in particular deep neural networks, can provide near unbiased effect estimates under feature collinearity. Remaining biases are explained by the specific algorithmic structures as well as hyperparameter choice. Consequently, optimal hyperparameter settings are different when tuned for prediction or inference, confirming the general expectation of a trade-off between prediction and explanation. 
However, the effect of this trade-off is small compared to the effect of a causally constrained feature selection. Thus, once the causal relationship between the features is accounted for, the difference between prediction and explanation may be much smaller than commonly assumed. We also show that such causally constrained models generalize better to new data with altered collinearity structures, suggesting generalization failure may often be due to a lack of causal learning. Our results not only provide a perspective for using ML for inference of (causal) effects but also help to improve the generalizability of fitted ML and DL models to new data.	翻訳日:2023-06-21 19:53:53 公開日:2023-06-18
# グリオーマにおける経時的MRI画像解析のための深層学習に基づくグループ登録法 Deep learning-based group-wise registration for longitudinal MRI analysis in glioma ( http://arxiv.org/abs/2306.10611v1 ) ライセンス: Link先を確認	Claudia Chinea Hammecher, Karin van Garderen, Marion Smits, Pieter Wesseling, Bart Westerman, Pim French, Mathilde Kouwenhoven, Roel Verhaak, Frans Vos, Esther Bron and Bo Li	(参考訳) グリオーマの成長は縦断画像登録で定量化できる。しかし、画像全体にわたる大きな質量効果と組織の変化は、さらなる課題をもたらす。本稿では,グリオーマMRIの正確かつ偏りのない登録のための縦断的,学習的,集団的登録法を提案する。我々は,Glioma Longitudinal AnalySiSコンソーシアムのデータセットを評価し,古典的な登録手法と比較した。より詳細な登録で同等のDice係数を実現し、ランタイムを1分以内で大幅に削減します。提案手法は、グリオーマの成長に関するさらなる知見を提供するため、古典的なツールボックスの代替として機能する可能性がある。 Glioma growth may be quantified with longitudinal image registration. However, the large mass-effects and tissue changes across images pose an added challenge. Here, we propose a longitudinal, learning-based, and groupwise registration method for the accurate and unbiased registration of glioma MRI. We evaluate on a dataset from the Glioma Longitudinal AnalySiS consortium and compare it to classical registration methods. We achieve comparable Dice coefficients, with more detailed registrations, while significantly reducing the runtime to under a minute. The proposed methods may serve as an alternative to classical toolboxes, to provide further insight into glioma growth.	翻訳日:2023-06-21 19:45:46 公開日:2023-06-18
# STHG:空間時間不均一グラフ学習による高度なオーディオ・ビジュアルダイアリゼーション STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization ( http://arxiv.org/abs/2306.10608v1 ) ライセンス: Link先を確認	Kyle Min	(参考訳) 本稿では,Ego4D Challenge 2023の音声・視覚ダイアリゼーションタスクにおけるSTHGという新しい手法を紹介する。キーとなるイノベーションは、単一の一元的なグラフ学習フレームワークを使用して、ビデオ内のすべての話者をモデル化することです。カメラ装着者のみに独立したコンポーネントを必要とする従来のアプローチとは異なり、STHGはカメラ装着者を含む全ての人の音声活動を共同で検出することができる。最終手法はEgo4Dのテストセット上で61.1%のDERを得るが、これは昨年の勝者と同様に全てのベースラインを著しく上回っている。 Ego4D Challenge 2023で1位を獲得した。また,本課題では,sthgによるダイアリゼーション音声セグメントに市販音声認識システムを適用することで,音声認識課題における競合性能が向上することを示す。 This report introduces our novel method named STHG for the Audio-Visual Diarization task of the Ego4D Challenge 2023. Our key innovation is that we model all the speakers in a video using a single, unified heterogeneous graph learning framework. Unlike previous approaches that require a separate component solely for the camera wearer, STHG can jointly detect the speech activities of all people including the camera wearer. Our final method obtains 61.1% DER on the test set of Ego4D, which significantly outperforms all the baselines as well as last year's winner. Our submission achieved 1st place in the Ego4D Challenge 2023. We additionally demonstrate that applying the off-the-shelf speech recognition system to the diarized speech segments by STHG produces a competitive performance on the Speech Transcription task of this challenge.	翻訳日:2023-06-21 19:45:35 公開日:2023-06-18
# 表現による混雑緩和:マーケットプレイスにおける経済的厚生を改善するための学習 Decongestion by Representation: Learning to Improve Economic Welfare in Marketplaces ( http://arxiv.org/abs/2306.10606v1 ) ライセンス: Link先を確認	Omer Nahum, Gali Noti, David Parkes, Nir Rosenfeld	(参考訳) 混雑は市場によく見られる失敗モードであり、消費者が同じ商品の部分集合をめぐって非効率に競合することで生じる(例えば、バケーションレンタルのプラットフォームで同じ少数の物件を追い求める場合)。典型的な経済学の説明では、価格が需要と供給のバランスをとることで市場の混雑を解消し、この問題を解決する。しかし、現代のオンラインマーケットプレイスでは、価格は通常、売り手によって分散的に設定され、プラットフォームの力は表現、すなわち商品について提供される情報の制御に限られる。このことが、プラットフォームがこの力を使って、混雑を減らし社会的厚生を改善する表現を学習するという、表現による混雑緩和に関する本研究の動機となっている。技術的な課題は2つある。真の評価値ではなく、ユーザの過去の選択から得られる顕示選好のみに依存すること、そして、どの特徴を開示するかを決定する、本質的に組合せ的な表現を扱うことである。我々は、消費者の選択データに基づいてエンドツーエンドで訓練できる、厚生の微分可能な代理指標を提案することで、両方の課題に取り組む。混雑緩和が厚生を向上させるための十分条件を与える理論を示し、合成データと実データの両方を用いた実験により、本設定とアプローチを検証する。 Congestion is a common failure mode of markets, where consumers compete inefficiently on the same subset of goods (e.g., chasing the same small set of properties on a vacation rental platform). The typical economic story is that prices solve this problem by balancing supply and demand in order to decongest the market. But in modern online marketplaces, prices are typically set in a decentralized way by sellers, with the power of a platform limited to controlling representations -- the information made available about products. This motivates the present study of decongestion by representation, where a platform uses this power to learn representations that improve social welfare by reducing congestion. The technical challenge is twofold: relying only on revealed preferences from users' past choices, rather than true valuations; and working with representations that determine which features to reveal and are inherently combinatorial. We tackle both by proposing a differentiable proxy of welfare that can be trained end-to-end on consumer choice data. We provide theory giving sufficient conditions for when decongestion promotes welfare, and present experiments on both synthetic and real data shedding light on our setting and approach.	翻訳日:2023-06-21 19:45:20 公開日:2023-06-18
# Fermi-Hubbardモデルに対する交換子スケーリングを伴うトロッター誤差 Trotter error with commutator scaling for the Fermi-Hubbard model ( http://arxiv.org/abs/2306.10603v1 ) ライセンス: Link先を確認	Ansgar Schubert and Christian B. Mendl	(参考訳) 一般のトロッター積公式に対して、小さな前因子を持つ高次の誤差限界を導出し、Childsらの結果 [Phys. Rev. X 11, 011020 (2021)] を一般化する。次に、これらの限界を、1次元格子および2次元の正方格子・三角格子上のFermi-Hubbardハミルトニアンに支配される実時間量子時間発展作用素に適用する。本研究の主な技術的貢献は、与えられた格子形状に対する、ホッピング項と相互作用項の間の入れ子交換子の記号的評価である。この計算により、時間ステップとハミルトニアン係数を用いた誤差限界の明示的な表現が得られる。実際のトロッター誤差(小さな系で評価)と比較すると、これらの限界は依然として誤差を過大評価していることがわかる。 We derive higher-order error bounds with small prefactors for a general Trotter product formula, generalizing a result of Childs et al. [Phys. Rev. X 11, 011020 (2021)]. We then apply these bounds to the real-time quantum time evolution operator governed by the Fermi-Hubbard Hamiltonian on one-dimensional and two-dimensional square and triangular lattices. The main technical contribution of our work is a symbolic evaluation of nested commutators between hopping and interaction terms for a given lattice geometry. The calculations result in explicit expressions for the error bounds in terms of the time step and Hamiltonian coefficients. Comparison with the actual Trotter error (evaluated on a small system) indicates that the bounds still overestimate the error.	翻訳日:2023-06-21 19:44:58 公開日:2023-06-18
# DropCompute: 計算時間の分散低減による、シンプルでより堅牢な分散同期トレーニング DropCompute: simple and more robust distributed synchronous training via compute variance reduction ( http://arxiv.org/abs/2306.10598v1 ) ライセンス: Link先を確認	Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry	(参考訳) 背景: ディープニューラルネットワーク(DNN)の大規模トレーニングには分散トレーニングが不可欠である。大規模DNNトレーニングの主流の手法は同期型(All-Reduceなど)であるが、各ステップですべてのワーカーを待つ必要がある。そのため、これらの手法は遅延するワーカー(ストラグラー)による待ち時間に制限される。結果: 計算時間のばらつきによってワーカーが遅延する典型的なシナリオを調べる。計算時間の特性と、そのような遅延ワーカーに起因するスケーラビリティの限界との間に、解析的な関係を見出す。この知見に基づき、ワーカー間のばらつきを低減し、同期トレーニングの堅牢性を向上させる、シンプルかつ効果的な非集中型の手法を提案する。この手法は広く使われているAll-Reduceと統合できる。本研究の知見は、200台のGaudiアクセラレータを用いた大規模トレーニングタスクで検証されている。 Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each step. Thus, these methods are limited by the delays caused by straggling workers. Results: We study a typical scenario in which workers are straggling due to variability in compute time. We find an analytical relation between compute time properties and scalability limitations, caused by such straggling workers. With these findings, we propose a simple yet effective decentralized method to reduce the variation among workers and thus improve the robustness of synchronous training. This method can be integrated with the widely used All-Reduce. Our findings are validated on large-scale training tasks using 200 Gaudi Accelerators.	翻訳日:2023-06-21 19:44:45 公開日:2023-06-18
# コンパクトカーネルによる条件付き期待値 Conditional expectation via compact kernels ( http://arxiv.org/abs/2306.10592v1 ) ライセンス: Link先を確認	Suddhasattwa Das	(参考訳) ノイズ除去、条件付き期待値、多様体学習という別々のタスクは、2つの確率変数の積から生じる条件付き期待値を求めるという共通の設定で定式化できることが多い。本稿では、このより一般的な問題に焦点をあて、条件付き期待値を推定する作用素論的アプローチについて述べる。カーネル積分作用素をコンパクト化のツールとして用い、推定問題を再生核ヒルベルト空間における線形逆問題として定式化する。この方程式は数値近似に対して安定な解を持つことが示され、データ駆動型の実装の収束が保証される。手法全体は実装が容易であり、いくつかの実世界の問題への適用の成功例も示す。 The separate tasks of denoising, conditional expectation and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that are stable to numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and its successful application to some real-world problems is also shown.	翻訳日:2023-06-21 19:44:34 公開日:2023-06-18
# 量子コンピュータを用いた機械学習における特徴選択 Quantum computer based Feature Selection in Machine Learning ( http://arxiv.org/abs/2306.10591v1 ) ライセンス: Link先を確認	Gerhard Hellstern, Vanessa Dehn, Martin Zaefferer	(参考訳) 本稿では,教師付き学習問題における適切な特徴数を選択する問題について検討する。機械学習の一般的な手法を出発点として、特徴選択タスクを古典的数値手法や量子計算フレームワークで扱うことができる二次的非拘束最適化問題(qubo)として扱う。異なる結果と小さな問題設定を比較した。本研究の結果から,QUBO法が他の特徴選択法より優れているか否かは,データセットに依存することがわかった。 27の特徴を持つより大きなデータセットの拡張として、量子コンピューティングによるQUBO法の収束挙動と古典的確率的最適化法を比較する。誤差率の持続により、古典確率最適化法は依然として優れている。 The problem of selecting an appropriate number of features in supervised learning problems is investigated in this paper. Starting with common methods in machine learning, we treat the feature selection task as a quadratic unconstrained optimization problem (QUBO), which can be tackled with classical numerical methods as well as within a quantum computing framework. We compare the different results in small-sized problem setups. According to the results of our study, whether the QUBO method outperforms other feature selection methods depends on the data set. In an extension to a larger data set with 27 features, we compare the convergence behavior of the QUBO methods via quantum computing with classical stochastic optimization methods. Due to persisting error rates, the classical stochastic optimization methods are still superior.	翻訳日:2023-06-21 19:44:23 公開日:2023-06-18
# 二重強汎函数のウォルド信頼区間の妥当性の正当性は仮定なしで証明できるのか? Can we falsify the justification of the validity of Wald confidence intervals of doubly robust functionals, without assumptions? ( http://arxiv.org/abs/2306.10590v1 ) ライセンス: Link先を確認	Lin Liu and Rajarshi Mukherjee and James M. Robins	(参考訳) 本稿では,lotnitzkyらによって研究された2重ロバスト(dr)関数のクラスに属する任意の2重機械学習(dml)推定器を中心に,報告された公称$(1 - \alpha)$ wald confidence interval(ci)の有効性をアナリストの正当化を偽造する,liu et al. 20における仮定-リーンテストの実現可能なバージョンを開発する。 DR機能学のクラスは広く、経済学やバイオ統計学において中心的な重要性を持つ。厳密には、(i)chernozhukovらによって研究された条件付き期待のアフィン汎関数の期待として書ける平均二乗連続汎函数のクラスと、robinsらによって研究された函数のクラスの両方を含む。 DR関数の現在の最先端推定子 $\psi$ は DML 推定子 $\hat{\psi}_{1}$ である。 $\hat{\psi}_{1}$ のバイアスは、2つのニュアンス関数 $b$ と $p$ が推定されるレートの積に依存する。最も一般的なアナリストは、彼女の複雑性を低減した仮定の下で、Cauchy-Schwarz (CS) の上限が $\hat{\psi}_{1}$ のバイアスの $o (n^{- 1 / 2})$ であることを証明することによって、彼女の Wald CI の有効性を正当化する。したがって、仮説 $H_{0}$: CS上界が$o (n^{- 1 / 2})$ であるなら、ウォルドCIの有効性に対するアナリストの正当化を偽ることになる。本研究では、$b, p$ あるいはそれらの推定値 $\hat{b}, \hat{p}$ の複雑性還元仮定に頼ることなく、$H_{0}$ の有効な仮定リーンのファルシフィケーションテストを示す。シミュレーション実験を行い,提案する仮定-リーンテストの実用性を示す。我々の方法論の避けられない制限は、我々のを含む$h_{0}$の仮定-リーンテストが一貫性のあるテストにならないことである。したがって、テストの拒絶の失敗は$h_{0}$を支持する意味のある証拠ではない。 In this article we develop a feasible version of the assumption-lean tests in Liu et al. 20 that can falsify an analyst's justification for the validity of a reported nominal $(1 - \alpha)$ Wald confidence interval (CI) centered at a double machine learning (DML) estimator for any member of the class of doubly robust (DR) functionals studied by Rotnitzky et al. 21. The class of DR functionals is broad and of central importance in economics and biostatistics. It strictly includes both (i) the class of mean-square continuous functionals that can be written as an expectation of an affine functional of a conditional expectation studied by Chernozhukov et al. 22 and the class of functionals studied by Robins et al. 08. The present state-of-the-art estimators for DR functionals $\psi$ are DML estimators $\hat{\psi}_{1}$. 
The bias of $\hat{\psi}_{1}$ depends on the product of the rates at which two nuisance functions $b$ and $p$ are estimated. Most commonly an analyst justifies the validity of her Wald CIs by proving that, under her complexity-reducing assumptions, the Cauchy-Schwarz (CS) upper bound for the bias of $\hat{\psi}_{1}$ is $o (n^{- 1 / 2})$. Thus if the hypothesis $H_{0}$: the CS upper bound is $o (n^{- 1 / 2})$ is rejected by our test, we will have falsified the analyst's justification for the validity of her Wald CIs. In this work, we exhibit a valid assumption-lean falsification test of $H_{0}$, without relying on complexity-reducing assumptions on $b, p$, or their estimates $\hat{b}, \hat{p}$. Simulation experiments are conducted to demonstrate how the proposed assumption-lean test can be used in practice. An unavoidable limitation of our methodology is that no assumption-lean test of $H_{0}$, including ours, can be a consistent test. Thus failure of our test to reject is not meaningful evidence in favor of $H_{0}$.	翻訳日:2023-06-21 19:44:12 公開日:2023-06-18
# 方策最適化における楽観性と適応性 Optimism and Adaptivity in Policy Optimization ( http://arxiv.org/abs/2306.10587v1 ) ライセンス: Link先を確認	Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag	(参考訳) 我々は、楽観性(optimism)と適応性(adaptivity)を通じて、強化学習(RL)における方策最適化手法を高速化するための統一的なパラダイムの構築を目指す。方策反復法と方策勾配法との深い関係を利用して、一見無関係な方策最適化アルゴリズムを、交互に行われる2つのステップの繰り返し適用として捉え直す。(i) 楽観的方策改善作用素(optimistic policy improvement operator)が、勾配上昇予測(gradient ascent prediction)を用いて事前方策 $\pi_t$ を仮説 $\pi_{t+1}$ に写像し、続いて (ii) $\pi_{t+1}$ の性能の部分的な評価に基づいて、楽観的予測の後知恵適応(hindsight adaptation)を行う。我々はこの共通の視点を用いて、ソフト方策反復および楽観的方策反復、自然アクター・クリティック法、前向き探索に基づくモデルベースの方策改善、メタ学習アルゴリズムなど、他のよく知られたアルゴリズムを統一的に表現する。これにより、楽観性と適応性による高速化に関連する理論的性質全体に光を当てる。これらの知見に基づいて、メタ勾配学習による適応的かつ楽観的な方策勾配(adaptive & optimistic policy gradient)アルゴリズムを設計し、例示的なタスクにおいて、楽観性に関わるいくつかの設計上の選択を実証的に示す。 We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through optimism & adaptivity. Leveraging the deep connection between policy iteration and policy gradient methods, we recast seemingly unrelated policy optimization algorithms as the repeated application of two interleaving steps: (i) an optimistic policy improvement operator maps a prior policy $\pi_t$ to a hypothesis $\pi_{t+1}$ using a gradient ascent prediction, followed by (ii) a hindsight adaptation of the optimistic prediction based on a partial evaluation of the performance of $\pi_{t+1}$. We use this shared lens to jointly express other well-known algorithms, including soft and optimistic policy iteration, natural actor-critic methods, model-based policy improvement based on forward search, and meta-learning algorithms. By doing so, we shed light on collective theoretical properties related to acceleration via optimism & adaptivity. 
Building on these insights, we design an \emph{adaptive \& optimistic policy gradient} algorithm via meta-gradient learning, and empirically highlight several design choices pertaining to optimism, in an illustrative task.	翻訳日:2023-06-21 19:43:29 公開日:2023-06-18
# Hierarchical entanglement shells of multichannel Kondo clouds ( http://arxiv.org/abs/2306.10583v1 ) License: see the linked page	Jeongmin Shim, Donghoon Kim, and H.-S. Sim	Impurities or boundaries often impose nontrivial boundary conditions on a gapless bulk, resulting in distinct boundary universality classes for a given bulk, phase transitions, and non-Fermi liquids in diverse systems. The underlying boundary states, however, remain largely unexplored. This is related to the fundamental question of how a Kondo cloud forms spatially to screen a magnetic impurity in a metal. Here we predict the quantum-coherent spatial and energy structure of multichannel Kondo clouds, representative boundary states involving competing non-Fermi liquids, by studying quantum entanglement between the impurity and the channels. Entanglement shells of distinct non-Fermi liquids coexist in the structure, depending on the channels. As temperature increases, the shells become suppressed one by one from the outside, and the remaining outermost shell determines the thermal phase of each channel. Detection of the entanglement shells is experimentally feasible. Our findings suggest a guide to studying other boundary states and boundary-bulk entanglement.	Translated: 2023-06-21 19:43:01 Published: 2023-06-18
# Evolving Strategies for Competitive Multi-Agent Search ( http://arxiv.org/abs/2306.10640v1 ) License: see the linked page	Erkin Bahceci, Riitta Katila, and Risto Miikkulainen	While evolutionary computation is well suited for automatic discovery in engineering, it can also be used to gain insight into how humans and organizations could perform more effectively. Using a real-world problem of innovation search in organizations as the motivating example, this article first formalizes human creative problem solving as competitive multiagent search (CMAS). CMAS is different from existing single-agent and team search problems in that the agents interact through knowledge of other agents' searches and through the dynamic changes in the search landscape that result from these searches. The main hypothesis is that evolutionary computation can be used to discover effective strategies for CMAS; this hypothesis is verified in a series of experiments on the NK model, i.e., partially correlated and tunably rugged fitness landscapes. Different specialized strategies are evolved for each different competitive environment, and also general strategies that perform well across environments. These strategies are more effective and more complex than hand-designed strategies and a strategy based on traditional tree search. Using a novel spherical visualization of such landscapes, insight is gained about how successful strategies work, e.g., by tracking positive changes in the landscape. The article thus provides a possible framework for studying various human creative activities as competitive multi-agent search in the future.	Translated: 2023-06-21 19:37:14 Published: 2023-06-18
# MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts ( http://arxiv.org/abs/2306.10627v1 ) License: see the linked page	Diederick Vermetten, Furong Ye, Thomas Bäck, Carola Doerr	Extending a recent suggestion to generate new instances for numerical black-box optimization benchmarking by interpolating pairs of the well-established BBOB functions from the COmparing COntinuous Optimizers (COCO) platform, we propose in this work a further generalization that allows multiple affine combinations of the original instances and arbitrarily chosen locations of the global optima. We demonstrate that the MA-BBOB generator can help fill the instance space, while overall patterns in algorithm performance are preserved. By combining the landscape features of the problems with the performance data, we pose the question of whether these features are as useful for algorithm selection as previous studies suggested. MA-BBOB is built on the publicly available IOHprofiler platform, which facilitates standardized experimentation routines, provides access to the interactive IOHanalyzer module for performance analysis and visualization, and enables comparisons with the rich and growing data collection available for the (MA-)BBOB functions.	Translated: 2023-06-21 19:36:50 Published: 2023-06-18
# Meta-Learning for Airflow Simulations with Graph Neural Networks ( http://arxiv.org/abs/2306.10624v1 ) License: see the linked page	Wenzhuo Liu, Mouadh Yagoubi, Marc Schoenauer	The field of numerical simulation is of significant importance for the design and management of real-world systems, with partial differential equations (PDEs) being a commonly used mathematical modeling tool. However, solving PDEs remains a challenge, as traditional numerical solvers often incur high computational costs. As a result, data-driven methods leveraging machine learning (more particularly Deep Learning) algorithms have been increasingly proposed to learn models that can predict solutions to complex PDEs, such as those arising in computational fluid dynamics (CFD). However, these methods are known to suffer from poor generalization performance on out-of-distribution (OoD) samples, highlighting the need for more efficient approaches. To this end, we present a meta-learning approach to enhance the performance of learned models on OoD samples. Specifically, we set the airflow simulation in CFD over various airfoils as a meta-learning problem, where each set of examples defined on a single airfoil shape is treated as a separate task. Through the use of model-agnostic meta-learning (MAML), we learn a meta-learner capable of adapting to new tasks, i.e., previously unseen airfoil shapes, using only a small amount of task-specific data. We experimentally demonstrate the efficiency of the proposed approach for improving the OoD generalization performance of learned models while maintaining efficiency.	Translated: 2023-06-21 19:36:27 Published: 2023-06-18
# Enhanced Masked Image Modeling for Analysis of Dental Panoramic Radiographs ( http://arxiv.org/abs/2306.10623v1 ) License: see the linked page	Amani Almalki and Longin Jan Latecki	The computer-assisted radiologic informative report has received increasing research attention to facilitate diagnosis and treatment planning for dental care providers. However, manual interpretation of dental images is limited, expensive, and time-consuming. Another barrier in dental imaging is the limited number of available images for training, which is a challenge in the era of deep learning. This study proposes a novel self-distillation (SD) enhanced self-supervised learning on top of the masked image modeling (SimMIM) Transformer, called SD-SimMIM, to improve the outcome with a limited number of dental radiographs. In addition to the prediction loss on masked patches, SD-SimMIM computes the self-distillation loss on the visible patches. We apply SD-SimMIM to dental panoramic X-rays for teeth numbering, detection of dental restorations and orthodontic appliances, and instance segmentation tasks. Our results show that SD-SimMIM outperforms other self-supervised learning methods. Furthermore, we augment and improve the annotation of an existing dataset of panoramic X-rays.	Translated: 2023-06-21 19:36:03 Published: 2023-06-18
# Prior-knowledge-informed deep learning for lacune detection and quantification using multi-site brain MRI ( http://arxiv.org/abs/2306.10622v1 ) License: see the linked page	Bo Li, Jeroen de Bresser, Wiro Niessen, Matthias van Osch, Wiesje M. van der Flier, Geert Jan Biessels, Meike W. Vernooij, Esther Bron (for the Heart-Brain Connection Consortium)	Lacunes of presumed vascular origin, also referred to as lacunar infarcts, are important to assess cerebral small vessel disease and cognitive diseases such as dementia. However, visual rating of lacunes from imaging data is challenging, time-consuming, and rater-dependent, owing to their small size, sparsity, and mimics. Whereas recent automatic algorithms have been shown to make the detection of lacunes faster while preserving sensitivity, they also produce a large number of false positives, which makes them impractical for use in clinical practice or large-scale studies. Here, we develop a novel framework that, in addition to lacune detection, outputs a categorical burden score. This score could provide a more practical estimate of lacune presence that simplifies and effectively accelerates the imaging assessment of lacunes. We hypothesize that the combination of detection and the categorical score makes the procedure less sensitive to noisy labels.	Translated: 2023-06-21 19:35:45 Published: 2023-06-18
# UniSG^GA: A 3D scenegraph powered by Geometric Algebra unifying geometry, behavior and GNNs towards generative AI ( http://arxiv.org/abs/2306.10621v1 ) License: see the linked page	Manos Kamarianakis, Antonis Protopsaltis, Dimitris Angelis, Paul Zikas, Mike Kentros, George Papagiannakis	This work introduces UniSG^GA, a novel integrated scenegraph structure that incorporates behavior and geometry data on a 3D scene. It is specifically designed to seamlessly integrate Graph Neural Networks (GNNs) and address the challenges associated with transforming a 3D scenegraph (3D-SG) during generative tasks. To effectively capture and preserve the topological relationships between objects in a simplified way within the graph representation, we propose UniSG^GA, which seamlessly integrates Geometric Algebra (GA) forms. This novel approach enhances the overall performance and capability of GNNs in handling generative and predictive tasks, opening up new possibilities and aiming to lay the foundation for further exploration and development of graph-based generative AI models that can effectively incorporate behavior data for enhanced scene generation and synthesis.	Translated: 2023-06-21 19:35:29 Published: 2023-06-18
# Towards Stability of Autoregressive Neural Operators ( http://arxiv.org/abs/2306.10619v1 ) License: see the linked page	Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown	Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense, and these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to prototypical neural networks leads to significantly lower errors in long-range forecasts with 800% longer forecasts without qualitative signs of divergence compared to the original models for these systems. We open-source our code ( https://anonymous.4open.science/r/stabilizing_neural_operators-5774/ ) for reproducibility.	Translated: 2023-06-21 19:35:10 Published: 2023-06-18
# GPU-Accelerated Verification of Machine Learning Models for Power Systems ( http://arxiv.org/abs/2306.10617v1 ) License: see the linked page	Samuel Chevalier, Ilgiz Murzakhanov, Spyros Chatzivasileiadis	Computational tools for rigorously verifying the performance of large-scale machine learning (ML) models have progressed significantly in recent years. The most successful solvers employ highly specialized, GPU-accelerated branch and bound routines. Such tools are crucial for the successful deployment of machine learning applications in safety-critical systems, such as power systems. Despite their successes, however, barriers prevent out-of-the-box application of these routines to power system problems. This paper addresses this issue in two key ways. First, for the first time to our knowledge, we enable the simultaneous verification of multiple verification problems (e.g., checking for the violation of all line flow constraints simultaneously and not by solving individual verification problems). For that, we introduce an exact transformation that converts the "worst-case" violation across a set of potential violations to a series of ReLU-based layers that augment the original neural network. This allows verifiers to interpret them directly. Second, power system ML models often must be verified to satisfy power flow constraints. We propose a dualization procedure which encodes linear equality and inequality constraints directly into the verification problem, in a manner which is mathematically consistent with the specialized verification tools. To demonstrate these innovations, we verify problems associated with data-driven security constrained DC-OPF solvers. We build and test our first set of innovations using the $\alpha,\beta$-CROWN solver, and we benchmark against Gurobi 10.0. Our contributions achieve a speedup that can exceed 100x and allow higher degrees of verification flexibility.	Translated: 2023-06-21 19:34:49 Published: 2023-06-18
# Agnostically Learning Single-Index Models using Omnipredictors ( http://arxiv.org/abs/2306.10615v1 ) License: see the linked page	Aravind Gollakota and Parikshit Gopalan and Adam R. Klivans and Konstantinos Stavropoulos	We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$^+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting.	Translated: 2023-06-21 19:34:29 Published: 2023-06-18
# Identifiable causal inference with noisy treatment and no side information ( http://arxiv.org/abs/2306.10614v1 ) License: see the linked page	Antti Pöllänen, Pekka Marttinen	In some causal inference scenarios, the treatment (i.e., cause) variable is measured inaccurately, for instance in epidemiology or econometrics. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming access to side information. For such a scenario, this paper proposes a model that assumes a continuous treatment variable which is inaccurately measured. Building on existing results for measurement error models, we prove that our model's causal effect estimates are identifiable, even without knowledge of the measurement error variance or other side information. Our method relies on a deep latent variable model where Gaussian conditionals are parameterized by neural networks, and we develop an amortized importance-weighted variational objective for training the model. Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications where reliable causal inference can be conducted.	Translated: 2023-06-21 19:34:14 Published: 2023-06-18
# CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification ( http://arxiv.org/abs/2306.10649v1 ) License: see the linked page	Lele Cao, Vilhelm von Ehrenheim, Mark Granroth-Wilding, Richard Anselmo Stahl, Andrew McCornack, Armin Catovic and Dhiana Deva Cavacanti Rocha	In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings, and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.	Translated: 2023-06-21 19:26:34 Published: 2023-06-18
# Referenceless User Controllable Semantic Image Synthesis ( http://arxiv.org/abs/2306.10646v1 ) License: see the linked page	Jonghyun Kim, Gen Li, Joongkyu Kim	Despite recent progress in semantic image synthesis, complete control over image style remains a challenging problem. Existing methods require reference images to feed style information into semantic layouts, which means that the style is constrained by the given image. In this paper, we propose a model named RUCGAN for user controllable semantic image synthesis, which utilizes a singular color to represent the style of a specific semantic region. The proposed network achieves reference-free semantic image synthesis by injecting color as user-desired styles into each semantic layout, and is able to synthesize semantic images with unusual colors. Extensive experimental results on various challenging datasets show that the proposed method outperforms existing methods, and we further provide an interactive UI to demonstrate the advantage of our approach for style controllability.	Translated: 2023-06-21 19:26:04 Published: 2023-06-18
# Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy ( http://arxiv.org/abs/2306.10645v1 ) License: see the linked page	Cansu Koyuturk, Mona Yavari, Emily Theophilou, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Alessandro Gabbiadini, Davinia Hernandez-Leo, Martin Ruskov, Dimitri Ognibene	Educational chatbots come with a promise of interactive and personalized learning experiences, yet their development has been limited by the restricted free interaction capabilities of available platforms and the difficulty of encoding knowledge in a suitable format. Recent advances in language learning models with zero-shot learning capabilities, such as ChatGPT, suggest a new possibility for developing educational chatbots using a prompt-based approach. We present a case study with a simple system that enables mixed-turn chatbot interactions and we discuss the insights and preliminary guidelines obtained from initial tests. We examine ChatGPT's ability to pursue multiple interconnected learning objectives, adapt the educational activity to users' characteristics, such as culture, age, and level of education, and its ability to use diverse educational strategies and conversational styles. Although the results are encouraging, challenges are posed by the limited history maintained for the conversation and the highly structured form of responses by ChatGPT, as well as their variability, which can lead to an unexpected switch of the chatbot's role from a teacher to a therapist. We provide some initial guidelines to address these issues and to facilitate the development of effective educational chatbots.	Translated: 2023-06-21 19:25:36 Published: 2023-06-18
# Metal-insulator transition and magnetism of SU(3) fermions in the square lattice ( http://arxiv.org/abs/2306.10644v1 ) License: see the linked page	Eduardo Ibarra-García-Padilla, Chunhan Feng, Giulio Pasqualetti, Simon Fölling, Richard T. Scalettar, Ehsan Khatami, Kaden R. A. Hazzard	We study the SU(3) symmetric Fermi-Hubbard model (FHM) in the square lattice at $1/3$-filling using numerically exact determinant quantum Monte Carlo (DQMC) and numerical linked-cluster expansion (NLCE) techniques. We present the $T$-$U$ phase diagram of the model, in which we observe signatures of the metal-insulator transition and magnetic crossovers. These signatures are the temperature scale characterizing the rise of the compressibility, and an interaction-dependent change in the sign of the diagonal spin-spin correlation function. The analysis of the compressibility estimates the location of the metal-insulator quantum critical point at $U_c/t \sim 6$, and provides a temperature scale for observing Mott physics at finite-$T$. Furthermore, from the analysis of the spin-spin correlation function we observe that for $U/t \gtrsim 6$ and $T \sim J = 4t^2/U$ there is a development of a short-ranged two sublattice (2-SL) antiferromagnetic structure, as well as an emerging three sublattice (3-SL) antiferromagnetic structure as the temperature is lowered below $T/J \lesssim 0.57$. This crossover from 2-SL to 3-SL magnetic ordering agrees with Heisenberg limit predictions, and has observable effects on the number of on-site pairs. Finally, we describe how the features of the $T$-$U$ phase diagram can be explored with alkaline-earth-like atoms in optical lattices with currently-achieved experimental techniques and temperatures. The results discussed in this manuscript provide a starting point for the exploration of the SU(3) FHM upon doping.	Translated: 2023-06-21 19:24:56 Published: 2023-06-18
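The scales quoted in the SU(3) Fermi-Hubbard entry above can be expressed in units of the hopping $t$ with a little arithmetic. The sketch below is illustrative only and not taken from the paper: it evaluates the exchange scale $J = 4t^2/U$ and converts the quoted threshold $T/J \lesssim 0.57$ into a temperature in units of $t$. The function names are hypothetical, and evaluating at $U/t = 6$ (the estimated critical interaction) is an assumption made purely for the example.

```python
def superexchange_scale(t: float, U: float) -> float:
    """Return J = 4 t^2 / U, the exchange scale quoted in the abstract."""
    if U <= 0:
        raise ValueError("U must be positive")
    return 4.0 * t * t / U


def three_sublattice_onset(t: float, U: float, ratio: float = 0.57) -> float:
    """Return the temperature T = ratio * J below which the abstract
    reports an emerging 3-SL structure (ratio 0.57 is the quoted value)."""
    return ratio * superexchange_scale(t, U)


# Assumed evaluation point: U/t = 6, with t used as the unit of energy.
t = 1.0
U = 6.0
J = superexchange_scale(t, U)
T_3sl = three_sublattice_onset(t, U)
print(f"J/t = {J:.3f}")
print(f"T/t at 3-SL onset = {T_3sl:.3f}")
```

Because $J$ falls as $1/U$, the same ratio $T/J$ corresponds to a lower absolute temperature at stronger interactions.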

# Chaos Engineering of Ethereum Blockchain Clients ( http://arxiv.org/abs/2111.00221v2 )

License: see the linked page

Long Zhang, Javier Ron, Benoit Baudry, and Martin Monperrus

In this paper, we present ChaosETH, a chaos engineering approach for resilience assessment of Ethereum blockchain clients. ChaosETH operates in the following manner: First, it monitors Ethereum clients to determine their normal behavior. Then, it injects system call invocation errors into one single Ethereum client at a time, and observes the behavior resulting from perturbation. Finally, ChaosETH compares the behavior recorded before, during, and after perturbation to assess the impact of the injected system call invocation errors. The experiments are performed on the two most popular Ethereum client implementations: GoEthereum and Nethermind. We assess the impact of 22 different system call errors on those Ethereum clients with respect to 15 application-level metrics. Our results reveal a broad spectrum of resilience characteristics of Ethereum clients w.r.t. system call invocation errors, ranging from direct crashes to full resilience. The experiments clearly demonstrate the feasibility of applying chaos engineering principles to blockchain systems.

Translated: 2023-10-24 15:48:43 Published: 2023-06-18

# NLPに基づくGDPRに対するデータ処理契約の自動コンプライアンスチェック

NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR ( http://arxiv.org/abs/2209.09722v2 )

ライセンス: Link先を確認

Orlando Amaral, Muhammad Ilyas Azeem, Sallam Abualhaija and Lionel C Briand

(参考訳) 個人データの処理は、一般データ保護規則(GDPR)により、データ処理協定(DPA)を通じてヨーロッパで規制されている。 DPAのコンプライアンスを確認することは、個人データの処理を含むソフトウェア開発において、DPAとしてソフトウェアシステムのコンプライアンス検証に寄与する。しかし、GDPRにおけるDPA関連コンプライアンス要件を理解し、特定し、それらの要件をDPAで検証するためにかなりの時間と労力を必要とするため、与えられたDPAがGDPRに準拠するかどうかを手作業で確認することは困難である。本稿では,GDPR に対する DPA の適合性をチェックするための自動解法を提案する。法律の専門家との密接な交流の中で、私たちはまず2つのアーティファクトを構築しました。一 DPAの遵守及び遵守に係るGDPRの規定から抽出した「shall」要件 (ii)要件の法的概念を定義する用語表。そこで我々は、自然言語処理(NLP)技術を活用して、与えられたDPAの適合性をチェックする自動化ソリューションを開発した。具体的には,DPAのテキストコンテンツに対するフレーズレベルの表現を自動生成し,あらかじめ定義された"shall"要件の表現と比較する。 30の実際のDPAのデータセットでは、750の真偽の違反のうち618が正しく発見され、76の偽の違反を発生させ、さらに524の満足した要件を正しく識別する。このアプローチの平均精度は89.1%、リコールは82.4%、精度は84.6%である。市販のNLPツールに依存するベースラインと比較して,提案手法は平均精度が約20ポイント向上する。提案手法の精度は手作業による検証に制限を加えて約94%向上できる。

Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares it against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach has thus an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.
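The reported figures can be cross-checked from the counts given in the abstract. The sketch below is a minimal check under an assumption that is ours rather than the authors': the 618 detected violations are treated as true positives, the 76 false violations as false positives, the remaining 750 - 618 = 132 genuine violations as false negatives, and the 524 correctly identified satisfied requirements as true negatives. The function name `confusion_metrics` is hypothetical. Because the abstract describes its figures as averages, pooled ratios need not match them exactly.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return precision, recall and accuracy as fractions in [0, 1]."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts read from the abstract; mapping them to TP/FP/FN/TN is our assumption.
GENUINE_VIOLATIONS = 750
TP = 618                       # genuine violations correctly found
FP = 76                        # false violations raised
FN = GENUINE_VIOLATIONS - TP   # genuine violations that were missed
TN = 524                       # satisfied requirements correctly identified

metrics = confusion_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under this reading, pooled recall (82.4%) and accuracy (84.6%) match the abstract, while pooled precision comes out at about 89.0%, slightly below the reported average of 89.1%. A small gap of this kind is what one would expect if the reported precision is averaged per DPA rather than pooled.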

翻訳日:2023-10-24 14:55:43 公開日:2023-06-18

# Eunomia: WebAssemblyバイナリのシンボリック実行でユーザ指定のきめ細かい探索を実現する

Eunomia: Enabling User-specified Fine-Grained Search in Symbolically Executing WebAssembly Binaries ( http://arxiv.org/abs/2304.07204v2 )

ライセンス: Link先を確認

Ningyu He, Zhehao Zhao, Jikai Wang, Yubin Hu, Shengjian Guo, Haoyu Wang, Guangtai Liang, Ding Li, Xiangqun Chen, Yao Guo

(参考訳) 既存の手法ではシンボリック実行のパス爆発問題を軽減するための自動アプローチが提案されているが、ユーザは依然として様々な探索戦略を慎重に適用してシンボリック実行を最適化する必要がある。既存のアプローチは主に粗粒度のグローバルな探索戦略のみをサポートするため、複雑なコード構造を効率的に探索することはできない。本稿では,ユーザが局所的なドメイン知識を指定して,きめ細かい探索を可能にするシンボリック実行手法であるEunomiaを提案する。 Eunomiaでは、ユーザがターゲットプログラムの異なる部分に対して局所探索戦略を正確に指定できる、表現力の高いDSLであるAesを設計する。局所探索戦略をさらに最適化するために,異なる局所探索戦略に対して変数のコンテキストを自動的に分離し,同じ変数に対する局所探索戦略間の競合を回避する区間ベースのアルゴリズムを設計する。 WebAssemblyをターゲットにしたシンボリック実行プラットフォームとして、Eunomiaを実装した。これにより、さまざまな言語(CやGoなど)で書かれ、WebAssemblyにコンパイル可能なアプリケーションを解析できる。私たちの知る限りでは、EunomiaはWebAssemblyランタイムの全機能をサポートする最初のシンボリック実行エンジンである。シンボリック実行のための専用マイクロベンチマークスイートと6つの実世界のアプリケーションを用いて,Eunomiaの評価を行った。評価の結果,Eunomiaは実世界のアプリケーションにおけるバグ検出を最大3桁高速化することがわかった。総合的なユーザスタディの結果によると、ユーザはシンプルで直感的なAesスクリプトを書くことで、シンボリック実行の効率と有効性を大幅に改善することができる。既知の6つの実世界のバグの検証に加えて、Eunomiaは人気のあるオープンソースプロジェクトである Collections-C で2つの新しいゼロデイバグも検出した。

Although existing techniques have proposed automated approaches to alleviate the path explosion problem of symbolic execution, users still need to optimize symbolic execution by applying various searching strategies carefully. As existing approaches mainly support only coarse-grained global searching strategies, they cannot efficiently traverse complex code structures. In this paper, we propose Eunomia, a symbolic execution technique that allows users to specify local domain knowledge to enable fine-grained search. In Eunomia, we design an expressive DSL, Aes, that lets users precisely assign local searching strategies to different parts of the target program. To further optimize local searching strategies, we design an interval-based algorithm that automatically isolates the context of variables for different local searching strategies, avoiding conflicts between local searching strategies for the same variable. We implement Eunomia as a symbolic execution platform targeting WebAssembly, which enables us to analyze applications that are written in various languages (such as C and Go) and can be compiled into WebAssembly. To the best of our knowledge, Eunomia is the first symbolic execution engine that supports the full features of the WebAssembly runtime. We evaluate Eunomia with a dedicated microbenchmark suite for symbolic execution and six real-world applications. Our evaluation shows that Eunomia accelerates bug detection in real-world applications by up to three orders of magnitude. According to the results of a comprehensive user study, users can significantly improve the efficiency and effectiveness of symbolic execution by writing a simple and intuitive Aes script. Besides verifying six known real-world bugs, Eunomia also detected two new zero-day bugs in a popular open-source project, Collections-C.

翻訳日:2023-10-24 12:47:14 公開日:2023-06-18

# 適応可能なJSON Diffフレームワーク

An adaptable JSON Diff Framework ( http://arxiv.org/abs/2305.05865v2 )

ライセンス: Link先を確認

Ao Sun

(参考訳) 本稿では,JSON-diffフレームワークであるJYCMの実装について述べる。このフレームワークは順序を問わない("unordered")比較の概念を導入して既存のフレームワークを拡張し,ユーザが柔軟に比較シナリオをカスタマイズできる。さらに,JSONオブジェクト間の差異をよりよく可視化し,理解するためのdiff-resultレンダラも提供する。私たちの作業は、より適応的で包括的な比較を可能にし、幅広いユースケースと要件に対応します。

In this paper, we present JYCM, an implementation of a JSON-diff framework that extends the existing framework by introducing the concept of "unordered" comparisons and allowing users to customize their comparison scenarios flexibly. Furthermore, we provide a diff-result renderer to better visualize and understand the differences between JSON objects. Our work enables more adaptable and comprehensive comparisons to accommodate a wider range of use cases and requirements.
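To make the idea of an "unordered" comparison concrete, here is a minimal sketch in plain Python. It does not use or reproduce JYCM's API; the function name `json_equal` and its `unordered` flag are hypothetical, chosen only for illustration. With `unordered=True`, arrays are compared as multisets, so element order is ignored while duplicates still count.

```python
from typing import Any


def json_equal(left: Any, right: Any, unordered: bool = False) -> bool:
    """Compare two parsed JSON values; optionally ignore array order."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Objects match when they have the same keys and equal values.
        return left.keys() == right.keys() and all(
            json_equal(left[key], right[key], unordered) for key in left
        )
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            return False
        if not unordered:
            # Ordered mode: compare element by element.
            return all(json_equal(a, b, unordered) for a, b in zip(left, right))
        # Unordered mode: match each left item with one unused right item.
        remaining = list(right)
        for item in left:
            for index, candidate in enumerate(remaining):
                if json_equal(item, candidate, unordered):
                    del remaining[index]
                    break
            else:
                return False  # no partner found for this item
        return True
    # Scalars: require the same type so that True and 1 are not conflated.
    # (As a simplification, 1 and 1.0 are therefore treated as different.)
    return type(left) is type(right) and left == right


a = {"id": 7, "tags": ["x", "y"]}
b = {"id": 7, "tags": ["y", "x"]}
print(json_equal(a, b))                  # ordered comparison
print(json_equal(a, b, unordered=True))  # order-insensitive comparison
```

The greedy matching is sufficient here because the comparison is an equivalence relation: any equal, still-unused partner is as good as another. A real diff framework would additionally report where the values differ rather than returning a single boolean.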

翻訳日:2023-10-24 09:15:48 公開日:2023-06-18

# 研究ソフトウェアの公平性を改善するメタデータベースのエコシステム

A Metadata-Based Ecosystem to Improve the FAIRness of Research Software ( http://arxiv.org/abs/2306.10620v1 )

ライセンス: Link先を確認

Patrick Kuckertz, Jan Göpfert, Oliver Karras, David Neuroth, Julian Schönau, Rodrigo Pueblas, Stephan Ferenz, Felix Engel, Noah Pflugradt, Jann M. Weinand, Astrid Nieße, Sören Auer, Detlef Stolten

(参考訳) 研究ソフトウェアの再利用は、研究効率と学術交流の中心である。ソフトウェアの適用により、さまざまなバックグラウンドを持つ研究者は、研究結果の再現、検証、拡張が可能になる。さらに、オープンソースコードの解析は、アプローチの理解、比較、統合に役立つ。しかし、関連するソフトウェアが見つからない、あるいは既存の研究プロセスと互換性がないため、それ以上の使用は行われない。これは反復的なソフトウェア開発をもたらし、個々の研究者や研究コミュニティ全体の進歩を妨げる。この記事では、詳細でマシン操作可能なメタデータを持つソフトウェアインターフェースのデータモデルを記述するための、DataDescエコシステムを紹介します。特別なメタデータスキーマに加えて、簡単に収集できる交換フォーマットとサポートツール、およびソフトウェアドキュメントの自動公開が導入されている。このアプローチは、実質的にフェアネス、すなわち、発見可能性、アクセシビリティ、相互運用性、そして研究ソフトウェアの再利用性を高め、研究への影響を効果的に促進する。

The reuse of research software is central to research efficiency and academic exchange. The application of software enables researchers with varied backgrounds to reproduce, validate, and expand upon study findings. Furthermore, the analysis of open source code aids in the comprehension, comparison, and integration of approaches. Often, however, no further use occurs because relevant software cannot be found or is incompatible with existing research processes. This results in repetitive software development, which impedes the advancement of individual researchers and entire research communities. In this article, the DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata. In addition to a specialized metadata schema, an exchange format and support tools for easy collection and the automated publishing of software documentation are introduced. This approach practically increases the FAIRness, i.e., the findability, accessibility, interoperability, and thereby the reusability of research software, and effectively promotes its impact on research.

翻訳日:2023-10-23 19:27:02 公開日:2023-06-18

# 2クラス依存サイクルのアンタングリングパターンに関する実証的研究

An Empirical Study of Untangling Patterns of Two-Class Dependency Cycles ( http://arxiv.org/abs/2306.10599v1 )

ライセンス: Link先を確認

Qiong Feng, Shuwen Liu, Huan Ji, Xiaotian Ma, Peng Liang

(参考訳) 依存性のサイクルは、ソフトウェアの品質と保守性に大きな課題をもたらします。しかし、実際のシナリオにおいて、実践者が依存性のサイクルをどのように解決するかの理解は限られている。本稿では,ソフトウェア開発者が2つのクラス間の依存性サイクルを実際に解決するための繰り返しパターンについて,実証的研究を行った。さまざまなドメインにまたがる18のオープンソースプロジェクトのデータを分析し,数百のサイクルアンタングリングケースを手作業で調査した。私たちの調査によると、開発者は依存性サイクルに対処するために5つの繰り返しパターンを使う傾向があります。選択されたパターンは、巡回クラス間の依存関係関係によって決定されるだけでなく、その設計コンテキスト、すなわち、巡回クラスが隣のクラスに依存するか、あるいは依存するかに非常に関係している。この経験的な研究を通じて、通常、サイクルのハンドリング中に開発者が犯した3つのよくある間違いを発見した。これらの繰り返しのパターンと依存性サイクルのプラクティスに見られるよくある誤りは、開発者の認識を改善するための分類法となり、ソフトウェア工学の学生や経験の浅い開発者のための教材としても使われる。また,依存性サイクルの内部構造を考慮することに加えて,自動ツールが依存関係サイクルのリファクタリングを支援するために,サイクルの設計コンテキストを考慮する必要があることも示唆した。

Dependency cycles pose a significant challenge to software quality and maintainability. However, there is limited understanding of how practitioners resolve dependency cycles in real-world scenarios. This paper presents an empirical study investigating the recurring patterns employed by software developers to resolve dependency cycles between two classes in practice. We analyzed the data from 18 open-source projects across different domains and manually inspected hundreds of cycle untangling cases. Our findings reveal that developers tend to employ five recurring patterns to address dependency cycles. The chosen patterns are not only determined by dependency relations between cyclic classes, but are also highly related to their design context, i.e., how cyclic classes depend on or are depended on by their neighbor classes. Through this empirical study, we also discovered three common mistakes developers usually make when handling cycles. These recurring patterns and common mistakes observed in the handling of dependency cycles can serve as a taxonomy to improve developers' awareness and can also be used as learning materials for students in software engineering and inexperienced developers. Our results also suggest that, in addition to considering the internal structure of dependency cycles, automatic tools need to consider the design context of cycles to provide better support for refactoring dependency cycles.

翻訳日:2023-10-23 19:26:46 公開日:2023-06-18

# 感情分析のためのテキストアノテーションツールとしてのChatGPTの活用

Leveraging ChatGPT As Text Annotation Tool For Sentiment Analysis ( http://arxiv.org/abs/2306.17177v1 )

ライセンス: Link先を確認

Mohammad Belal, James She, Simon Wong

(参考訳) 感情分析は、あるテキストの感情的なトーンや極性を特定することを含む、よく知られた自然言語処理タスクである。ソーシャルメディアやその他のオンラインプラットフォームの成長に伴い、顧客からのフィードバックや意見の監視と理解を求める企業や組織にとって、感情分析はますます重要になっている。教師付き学習アルゴリズムはこのタスクに広く採用されているが、分類器を作成するには人間の注釈付きテキストが必要である。この課題を克服するために、レキシコンベースのツールが使用されている。レキシコンベースのアルゴリズムの欠点は、事前に定義された感情レキシコンに依存していることであり、自然言語における感情の全範囲を捉えられない可能性がある。 ChatGPTはOpenAIの新製品で、最も人気のあるAI製品として登場した。さまざまなトピックやタスクに関する質問に答えることができる。本研究では、さまざまな感情分析タスクのためのデータラベリングツールとしてのChatGPTについて検討する。異なる目的の2つの感情分析データセットで評価する。以上の結果から,ChatGPTは他のレキシコンベースの教師なし手法よりも高い性能を示し,全体的な精度が大幅に向上した。特に、最もパフォーマンスの良いレキシコンベースのアルゴリズムと比較して、ChatGPTはツイートデータセットで20%、Amazonレビューデータセットで約25%精度が向上している。これらの結果は、感情分析タスクにおけるChatGPTの卓越した性能を強調し、既存のレキシコンベースのアプローチをかなり上回った。この証拠は、異なる感情分析イベントやタスクのアノテーションに使用できることを示唆している。

Sentiment analysis is a well-known natural language processing task that involves identifying the emotional tone or polarity of a given piece of text. With the growth of social media and other online platforms, sentiment analysis has become increasingly crucial for businesses and organizations seeking to monitor and comprehend customer feedback as well as opinions. Supervised learning algorithms have been popularly employed for this task, but they require human-annotated text to create the classifier. To overcome this challenge, lexicon-based tools have been used. A drawback of lexicon-based algorithms is their reliance on pre-defined sentiment lexicons, which may not capture the full range of sentiments in natural language. ChatGPT is a new product of OpenAI and has emerged as the most popular AI product. It can answer questions on various topics and tasks. This study explores the use of ChatGPT as a tool for data labeling for different sentiment analysis tasks. It is evaluated on two distinct sentiment analysis datasets with varying purposes. The results demonstrate that ChatGPT outperforms other lexicon-based unsupervised methods with significant improvements in overall accuracy. Specifically, compared to the best-performing lexicon-based algorithms, ChatGPT achieves a remarkable increase in accuracy of 20% for the tweets dataset and approximately 25% for the Amazon reviews dataset. These findings highlight the exceptional performance of ChatGPT in sentiment analysis tasks, surpassing existing lexicon-based approaches by a significant margin. The evidence suggests it can be used for annotation on different sentiment analysis events and tasks.

翻訳日:2023-07-09 14:20:34 公開日:2023-06-18

# News Verifiers Showdown: News Fact-CheckingにおけるChatGPT 3.5, ChatGPT 4.0, Bing AI, Bardの比較評価

News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking ( http://arxiv.org/abs/2306.17176v1 )

ライセンス: Link先を確認

Kevin Matthe Caramancion

(参考訳) 本研究では,OpenAI の ChatGPT 3.5 と 4.0,Google の Bard (LaMDA),Microsoft の Bing AI といった著名な大規模言語モデル (LLM) が,ニュース記事の真偽を見分ける習熟度を,ブラックボックステストを用いて評価することを目的とした。独立したファクトチェック機関から提供された100のファクトチェック済みニュースアイテムは、制御された条件下でこれら各LLMにそれぞれ提示された。これらの回答は、True, False, Partially True/Falseの3つのカテゴリの1つに分類された。 LLMの有効性は、独立機関が提供した検証済みの事実に対する分類の正確さに基づいて評価された。結果は全モデルで中程度の習熟度を示し、平均スコアは100点中65.25点であった。モデルのうち、OpenAIのGPT-4.0はスコア71で際立っており、事実と欺瞞を区別する能力において新しいLLMが優位であることが示唆された。しかし、人間のファクトチェッカーのパフォーマンスと比較すると、AIモデルは、有望ではあるものの、ニュース情報に固有の微妙さとコンテキストの理解において後れを取っている。この発見は、人間の認知スキルの継続的な重要性と、AI能力の継続的な進歩の必要性を強調しながら、ファクトチェックの領域におけるAIの可能性を強調している。最後に、この研究のシミュレーションから得られた実験データは、Kaggleで公開されている。

This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged based on the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed a moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's GPT-4.0 stood out with a score of 71, suggesting an edge in newer LLMs' abilities to differentiate fact from deception. However, when juxtaposed against the performance of human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the necessity for persistent advancements in AI capabilities. Finally, the experimental data produced from the simulation of this work is openly available on Kaggle.

翻訳日:2023-07-09 14:20:11 公開日:2023-06-18

# ソフトウェア問題の自動割り当てと分類

Automated Assignment and Classification of Software Issues ( http://arxiv.org/abs/2307.00009v1 )

ライセンス: Link先を確認

Büşra Tabak

(参考訳) ソフトウェアの問題には、開発中に新しいスレッドを修正、改善、作成するための作業単位が含まれ、チームメンバ間のコミュニケーションを容易にする。最も関係のあるチームメンバーにイシューを割り当てて、イシューのカテゴリを決定するのは、面倒で難しい作業です。間違った分類は、プロジェクトの遅延や再作業、チームメンバー間のトラブルを引き起こします。本論文は,浅層機械学習のための言語的特徴を注意深く整理し,浅層およびアンサンブル法の性能を深層言語モデルと比較するものである。 state-of-the-artとは異なり、私たちはソリューションの汎用性に貢献するために、特定の個人やチームではなく、4つの役割(設計者、開発者、テスター、リーダー)に問題を割り当てます。また、ソリューションの定式化における産業的プラクティスを反映した開発者の経験レベルも考えています。私たちは、問題をバグ、新機能、改善など、異なるクラスに分類する分類アプローチを採用しています。さらに、必要な修正の種類に基づいてバグをさらに分類する努力も行います。グローバルテレビプロデューサーの上位3社のうちの1社から5つの産業データセットを収集し,評価し,深層言語モデルと比較した。われわれのデータセットには5324の問題がある。浅い手法のアンサンブル分類器は問題割当ての0.92と、最先端のディープ言語モデルに統計的に匹敵する精度のイシュー分類の0.90を達成できることを示す。この貢献には、5つのアノテートされた産業問題データセットの公開共有、明確で包括的な特徴セットの開発、新しいラベルセットの導入、浅い機械学習技術のアンサンブル分類器の有効性の検証が含まれる。

Software issues contain units of work to fix, improve or create new threads during the development and facilitate communication among the team members. Assigning an issue to the most relevant team member and determining a category of an issue is a tedious and challenging task. Wrong classifications cause delays and rework in the project and trouble among the team members. This thesis proposes a set of carefully curated linguistic features for shallow machine learning methods and compares the performance of shallow and ensemble methods with deep language models. Unlike the state-of-the-art, we assign issues to four roles (designer, developer, tester, and leader) rather than to specific individuals or teams to contribute to the generality of our solution. We also consider the level of experience of the developers to reflect the industrial practices in our solution formulation. We employ a classification approach to categorize issues into distinct classes, namely bug, new feature, improvement, and other. Additionally, we endeavor to further classify bugs based on the specific type of modification required. We collect and annotate five industrial data sets from one of the top three global television producers to evaluate our proposal and compare it with deep language models. Our data sets contain 5324 issues in total. We show that an ensemble classifier of shallow techniques achieves 0.92 for issue assignment and 0.90 for issue classification in accuracy which is statistically comparable to the state-of-the-art deep language models. The contributions include the public sharing of five annotated industrial issue data sets, the development of a clear and comprehensive feature set, the introduction of a novel label set and the validation of the efficacy of an ensemble classifier of shallow machine learning techniques.

翻訳日:2023-07-09 14:02:31 公開日:2023-06-18

# 欺瞞的なAIエコシステム: ChatGPTの事例

Deceptive AI Ecosystems: The Case of ChatGPT ( http://arxiv.org/abs/2306.13671v1 )

ライセンス: Link先を確認

Xiao Zhan, Yifan Xu, Stefan Sarkadi

(参考訳) AIチャットボットのChatGPTは、人間のような応答を生成する能力で人気を集めている。しかし、この機能にはいくつかのリスクが伴う。特に、ユーザーが誤解を招いたり、倫理的な問題をさらに引き起こす可能性のある情報を作成したりするといった、欺く行動が原因である。社会的、文化的、経済的、政治的相互作用に対するChatGPTの影響をより深く理解するためには、ChatGPTが、様々な社会的圧力が開発と展開に影響を与える現実世界でどのように機能するかを検討することが不可欠である。本稿では,ChatGPTが組み込まれているエコシステムの一部として,ユーザの関与を重視しながら,ChatGPTを"野生"で研究する必要性を強調する。そこで我々は,ChatGPTの疑わしい人間的対話から生じる倫理的課題を考察し,より透明で信頼性の高いチャットボットを開発するためのロードマップを提案する。当社のアプローチの中心は、チャットボット技術の未来を形作る上で、積極的なリスクアセスメントとユーザ参加の重要性です。

ChatGPT, an AI chatbot, has gained popularity for its capability in generating human-like responses. However, this feature carries several risks, most notably due to its deceptive behaviour such as offering users misleading or fabricated information that could further cause ethical issues. To better understand the impact of ChatGPT on our social, cultural, economic, and political interactions, it is crucial to investigate how ChatGPT operates in the real world where various societal pressures influence its development and deployment. This paper emphasizes the need to study ChatGPT "in the wild", as part of the ecosystem it is embedded in, with a strong focus on user involvement. We examine the ethical challenges stemming from ChatGPT's deceptive human-like interactions and propose a roadmap for developing more transparent and trustworthy chatbots. Central to our approach is the importance of proactive risk assessment and user participation in shaping the future of chatbot technology.

翻訳日:2023-07-02 13:46:58 公開日:2023-06-18

# 「タイトルを少し修正することを考えてもよいかもしれない」--ピアチュータリングにおけるヘッジの特定

"You might think about slightly revising the title": identifying hedges in peer-tutoring interactions ( http://arxiv.org/abs/2306.14911v1 )

ライセンス: Link先を確認

Yann Raphalen, Chloé Clavel, Justine Cassell

(参考訳) ヘッジは会話の相互作用の管理において重要な役割を果たす。ピアチュータリングでは、ラポールの低いダイアド(対話者のペア)のチューターが、指示や否定的なフィードバックの影響を和らげるために特に用いている。学習を改善するために学生とのラポールを管理するチュータリングエージェントの構築を目的として,マルチモーダルなピアチュータリングデータセットを用いてヘッジ識別のための計算フレームワークを構築した。我々は,事前学習した資源を活用するアプローチと,社会科学文献の洞察を取り入れたアプローチを比較した。最高の性能を示したのは、解釈しやすく、既存のベースラインを上回るハイブリッドアプローチであった。我々は,ピアチュータリング会話におけるヘッジを特徴付ける特徴を探索するためにモデル説明可能性ツールを用い,いくつかの新たな特徴とハイブリッドモデルアプローチの利点を明らかにした。

Hedges play an important role in the management of conversational interaction. In peer tutoring, they are notably used by tutors in dyads (pairs of interlocutors) experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with students in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features, and the benefits of such a hybrid model approach.

翻訳日:2023-07-02 13:27:28 公開日:2023-06-18

# LLM時代におけるヒューマンラベルデータの重要性

The Importance of Human-Labeled Data in the Era of LLMs ( http://arxiv.org/abs/2306.14910v1 )

ライセンス: Link先を確認

Yang Liu

(参考訳) 大規模言語モデル(LLM)の出現は、カスタマイズされた機械学習モデルの開発に革命をもたらし、データ要件の再定義に関する議論を引き起こした。 LLMの訓練と実施によって促進される自動化は、人間レベルのラベリング介入が、教師付き学習の時代と同じレベルの重要さをもはや持たないという議論や願望につながった。本稿では LLM 時代における人間ラベルデータの継続的な関連性を支持する説得力のある議論について述べる。

The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.

翻訳日:2023-07-02 13:27:13 公開日:2023-06-18

# 動的ニューラルネットワークを用いた株価予測

Stock Price Prediction using Dynamic Neural Networks ( http://arxiv.org/abs/2306.12969v1 )

ライセンス: Link先を確認

David Noel

(参考訳) 本稿では,日次の終値を予測する時系列動的ニューラルネットワークの解析と実装を行う。ニューラルネットワークはカオス的、非線形、一見ランダムなデータの基本パターンを識別する卓越した能力を有しており、現在の多くの技術よりもはるかに正確に株価の動きを予測するメカニズムを提供する。ファンダメンタル分析、テクニカル分析、回帰手法を含む株式分析の現代的手法について議論し、ニューラルネットワークのパフォーマンスと比較する。また、効率的市場仮説(EMH)を提示し、ニューラルネットワークを用いたカオス理論と対比する。本稿では,EMHに反論し,カオス理論を支持する。最後に、株価予測にニューラルネットワークを使用するための推奨事項を示す。

This paper will analyze and implement a time series dynamic neural network to predict daily closing stock prices. Neural networks possess unsurpassed abilities in identifying underlying patterns in chaotic, non-linear, and seemingly random data, thus providing a mechanism to predict stock price movements much more precisely than many current techniques. Contemporary methods for stock analysis, including fundamental, technical, and regression techniques, are discussed and compared with the performance of neural networks. Also, the Efficient Market Hypothesis (EMH) is presented and contrasted with Chaos theory using neural networks. This paper will refute the EMH and support Chaos theory. Finally, recommendations for using neural networks in stock price prediction will be presented.

翻訳日:2023-06-23 14:07:09 公開日:2023-06-18

# ラベル付き確率ブロックモデルにおけるインスタンス最適クラスタリカバリ

Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model ( http://arxiv.org/abs/2306.12968v1 )

ライセンス: Link先を確認

Kaito Ariu, Alexandre Proutiere, Se-Young Yun

(参考訳) 我々は,有限個のクラスタを持ち,クラスタサイズがアイテム総数 $n$ に対して線形に増大するラベル付き確率ブロックモデル (LSBM) において隠れたコミュニティを回復する問題を考える。 LSBMでは、ラベルは(独立して)各アイテムのペアに対して観測される。我々の目的は、観測されたラベルを用いてクラスタを復元する効率的なアルゴリズムを考案することである。この目的のために、任意のクラスタリングアルゴリズムが満たす誤分類アイテムの期待数について、インスタンス固有の下界を再検討する。本稿では,これらの下界に期待値と高い確率の両方で一致する性能を持つ最初のアルゴリズムである Instance-Adaptive Clustering (IAC) を提案する。 IACは1回のスペクトルクラスタリングアルゴリズムと、それに続く尤度に基づく反復的なクラスタ割り当て改善からなる。このアプローチはインスタンス固有の下界に基づいており、クラスタ数を含むモデルパラメータは一切必要としない。スペクトルクラスタリングを一度だけ実行することで、IACは$\mathcal{O}(n \text{polylog}(n))$の全体的な計算複雑性を維持する。本手法の有効性を数値実験により示す。

We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.

翻訳日:2023-06-23 14:07:00 公開日:2023-06-18

# Anchor-Guided Clustering と Spatio-Temporal Consistency ID Re-Assignment によるマルチカメラ人物追跡の強化

Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment ( http://arxiv.org/abs/2304.09471v2 )

ライセンス: Link先を確認

Hsiang-Wei Huang, Cheng-Yen Yang, Zhongyu Jiang, Pyong-Kun Kim, Kyoungoh Lee, Kwangju Kim, Samartha Ramkumar, Chaitanya Mullapudi, In-Su Jang, Chung-I Huang, Jenq-Neng Hwang

(参考訳) マルチカメラの多人数追跡は、特に小売、医療センター、交通ハブなどの環境において、正確で効率的な屋内人物追跡システムへの需要が高まり、研究の重要領域になりつつある。我々は、アンカー誘導クラスタリングを用いて、幾何学に基づくクロスカメラIDの再割り当てのための、クロスカメラの再識別と時空間整合性を実現する、新しいマルチカメラ多人数追跡手法を提案する。本研究の目的は,各個人に特有の特徴を識別し,カメラ間の視界の重なりを利用して,実際のカメラパラメータを必要とせずに正確な軌跡の予測を行うことにより,トラッキングの精度を向上させることである。本手法は合成データと実世界のデータの両方を扱う際のロバスト性と有効性を示している。提案手法はCVPR AI City Challenge 2023データセットで評価され,95.36%のIDF1を達成し,第1位となった。コードはhttps://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRIで公開されている。

Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We proposed a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassigning. Our approach aims to improve the accuracy of tracking by identifying key features that are unique to every individual and utilizing the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data. The proposed method is evaluated on CVPR AI City Challenge 2023 dataset, achieving IDF1 of 95.36% with the first-place ranking in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.

翻訳日:2023-06-22 17:25:33 公開日:2023-06-18

# 注意に基づく畳み込みネットワークと説明可能なAIを用いた乳がんセグメンテーション

Breast Cancer Segmentation using Attention-based Convolutional Network and Explainable AI ( http://arxiv.org/abs/2305.14389v2 )

ライセンス: Link先を確認

Jai Vardhan, Taraka Satya Krishna Teja Malisetti

(参考訳) 乳がん(BC)は依然として重大な健康上の脅威であり、現在のところ長期的な治療法は存在しない。早期発見は重要であるが、マンモグラフィーの解釈は高い偽陽性と偽陰性によって妨げられる。 BCの罹患数は肺がんを上回ると予測されており、早期発見法の改善が不可欠である。高分解能赤外線カメラを用いたサーモグラフィは、特に人工知能(AI)と組み合わせると期待できる。この研究は、セグメンテーションのための注意に基づく畳み込みニューラルネットワークを示し、BCの検出と分類のスピードと精度を高める。このシステムは画像を強調し、説明可能なAIを用いてがんのセグメンテーションを行う。異常の同定のために、トランスフォーマーのアテンションに基づく畳み込みアーキテクチャ(UNet)を提案し,IRT画像を用いてUNetアーキテクチャのバイアスと弱点の領域を分析するために,勾配重み付けクラスアクティベーションマッピング(Grad-CAM)を用いた。既存のディープラーニングフレームワークと比較して,提案フレームワークの優位性が確認された。

Breast cancer (BC) remains a significant health threat, with no long-term cure currently available. Early detection is crucial, yet mammography interpretation is hindered by high false positives and negatives. With BC incidence projected to surpass lung cancer, improving early detection methods is vital. Thermography, using high-resolution infrared cameras, offers promise, especially when combined with artificial intelligence (AI). This work presents an attention-based convolutional neural network for segmentation, providing increased speed and precision in BC detection and classification. The system enhances images and performs cancer segmentation with explainable AI. We propose a transformer-attention-based convolutional architecture (UNet) for fault identification and employ Gradient-weighted Class Activation Mapping (Grad-CAM) to analyze areas of bias and weakness in the UNet architecture with IRT images. The superiority of our proposed framework is confirmed when compared with existing deep learning frameworks.

翻訳日:2023-06-22 17:03:43 公開日:2023-06-18

# 一様量子重ね合わせ状態作成のための効率的な量子アルゴリズム

An efficient quantum algorithm for preparation of uniform quantum superposition states ( http://arxiv.org/abs/2306.11747v1 )

ライセンス: Link先を確認

Alok Shukla, Prakash Vedula

(参考訳) $n$-qubitの計算基底状態の空でない部分集合上の一様重ね合わせを含む量子状態準備は、多くの量子計算アルゴリズムや応用において重要かつ困難なステップである。本研究では、$\ket{\Psi} = \frac{1}{\sqrt{M}}\sum_{j = 0}^{M - 1} \ket{j}$ という形式の一様重ね合わせ状態を作成する問題に取り組む。ここで、$M$は重ね合わせ状態における異なる状態の数を表し、$2 \leq M \leq 2^n$である。重ね合わせ状態 $\ket{\Psi}$ は、全ての$M$に対して、ゲートの複雑さと回路深さのみ$O(\log_2~M)$で効率的に作成できることが示される。これは、この問題の一般的な場合の文献における他の既存のアプローチと比較して、ゲート複雑性が指数関数的に減少することを示している。提案されたアプローチのもう1つの利点は、必要な量子ビット数が $n=\lceil \log_2~M \rceil$ のみであることである。さらに、ancilla qubits や複数の制御を持つ量子ゲートは、一様重ね合わせ状態 $\ket{\Psi}$ を作成するのに必要としない。また、一様重ね合わせ状態の混合を含む多種多様な非一様重ね合わせ状態は、前述した一様重ね合わせ状態$\ket{\Psi}$を作成するのに使用されるのと同じ回路構成で、パラメータを修正するだけで効率よく生成できることも示されている。

Quantum state preparation involving a uniform superposition over a non-empty subset of $n$-qubit computational basis states is an important and challenging step in many quantum computation algorithms and applications. In this work, we address the problem of preparation of a uniform superposition state of the form $\ket{\Psi} = \frac{1}{\sqrt{M}}\sum_{j = 0}^{M - 1} \ket{j}$, where $M$ denotes the number of distinct states in the superposition state and $2 \leq M \leq 2^n$. We show that the superposition state $\ket{\Psi}$ can be efficiently prepared with a gate complexity and circuit depth of only $O(\log_2~M)$ for all $M$. This demonstrates an exponential reduction in gate complexity in comparison to other existing approaches in the literature for the general case of this problem. Another advantage of the proposed approach is that it requires only $n=\lceil \log_2~M \rceil$ qubits. Furthermore, neither ancilla qubits nor any quantum gates with multiple controls are needed in our approach for creating the uniform superposition state $\ket{\Psi}$. It is also shown that a broad class of nonuniform superposition states that involve a mixture of uniform superposition states can also be efficiently created with the same circuit configuration that is used for creating the uniform superposition state $\ket{\Psi}$ described earlier, but with modified parameters.
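The target state and the qubit count are easy to write out classically. The sketch below is only a bookkeeping aid and not the paper's circuit construction: it lists the amplitudes of the state with amplitude $1/\sqrt{M}$ on the first $M$ basis states and computes the number of qubits as the ceiling of $\log_2 M$. The function names `qubits_needed` and `uniform_superposition` are hypothetical.

```python
import math


def qubits_needed(m: int) -> int:
    """Smallest n with 2**n >= m, i.e. ceil(log2(m)), for m >= 2."""
    if m < 2:
        raise ValueError("the abstract assumes M >= 2")
    # Integer arithmetic avoids floating-point error in log2 for large m.
    return (m - 1).bit_length()


def uniform_superposition(m: int) -> list[float]:
    """Amplitudes over 2**n basis states: 1/sqrt(m) for j < m, else 0."""
    n = qubits_needed(m)
    amplitude = 1.0 / math.sqrt(m)
    return [amplitude if j < m else 0.0 for j in range(2 ** n)]


state = uniform_superposition(5)
print(qubits_needed(5), len(state))
print(sum(a * a for a in state))  # squared amplitudes sum to ~1
```

For $M = 5$ this gives three qubits and an eight-entry vector whose first five entries are $1/\sqrt{5}$. Storing the vector explicitly costs memory exponential in $n$, which is exactly what a circuit with $O(\log_2 M)$ gates avoids.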

翻訳日:2023-06-22 16:34:27 公開日:2023-06-18

# ハイブリッドレンズの深層適応融合によるライトフィールド再構成

Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses ( http://arxiv.org/abs/2102.07085v3 )

ライセンス: Link先を確認

Jing Jin and Mantang Guo and Junhui Hou and Hui Liu and Hongkai Xiong

(参考訳) 本稿では,複数の低解像度カメラに囲まれた高解像度カメラを含むハイブリッドレンズからの高解像度ライトフィールド(LF)画像の再構成の問題について検討する。既存手法の性能は, 平坦なテクスチャ領域のぼやけた結果や, 深さの不連続境界付近の歪みなど, 依然として限られている。この課題に対処するために,2つの相補的かつ並列的な視点から入力の特徴を包括的に活用する,新しいエンドツーエンドの学習ベースアプローチを提案する。具体的には、一方のモジュールは、深い多次元およびクロスドメインの特徴表現を学習することにより、空間的に一貫した中間推定を回帰し、他方のモジュールは、高解像度ビューの情報を伝播することにより、高周波テクスチャを維持する別の中間推定をワープする。最後に,2つの中間推定の利点を学習されたアテンションマップを通して適応的に活用し,平坦なテクスチャ領域と深さの不連続境界の両方において満足のいく結果を持つ,最終的な高解像度のLF画像を得る。さらに,シミュレーションハイブリッドデータを用いてトレーニングした手法が,ハイブリッドLFイメージングシステムによって取得された実ハイブリッドデータに対しても有効となるように,ネットワークアーキテクチャとトレーニング戦略を慎重に設計する。実データとシミュレーションデータの両方について広範な実験を行った結果,最先端手法に対する本手法の顕著な優位性が示された。我々の知る限りでは、これは実際のハイブリッド入力からのLF再構成のための最初のエンドツーエンドのディープラーニング手法である。我々のフレームワークは、高解像度なLFデータ取得のコストを削減し、LFデータの保存と伝送に恩恵をもたらす可能性があると考えています。

This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.

翻訳日:2023-06-22 08:30:20 公開日:2023-06-18

# ゴールコンディショニングトランスポーターネットワークを用いた変形可能なケーブル、布地、バッグの再構成学習

Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks ( http://arxiv.org/abs/2012.03385v4 )

ライセンス: Link先を確認

Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng

(参考訳) ケーブル、布地、バッグなどの変形可能な物体の配置と操作は、ロボット操作における長年の課題である。変形可能な複雑なダイナミクスと高次元の構成空間は、剛性のある物体と比較すると、多段計画だけでなくゴールの仕様においても操作が困難である。ゴールは剛体のポーズほど簡単に特定できず、「バッグの中にアイテムを置く」といった複雑な空間関係を伴うこともある。本研究では,画像ベースゴールコンディショニングや複数ステップの変形操作を含む,1D,2D,3Dの変形可能な構造を持つシミュレーションベンチマークスイートを開発する。本稿では,最近提案されたロボット操作を学習するためのモデルアーキテクチャであるトランスポーターネットワークに目標条件を組み込む手法を提案する。シミュレーションおよび物理実験において、目標条件付きトランスポーターネットワークは、ターゲット位置に対するテスト時間視覚アンカーを使わずに、変形可能な構造を柔軟に指定した構成に操作できることを示した。また, 2次元および3次元の変形可能なタスクでテストすることにより, 変形可能なオブジェクトを操作するトランスポーターネットワークを用いて, 先行結果を著しく拡張した。補足資料はhttps://berkeleyautomation.github.io/bags/で入手できる。

Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag". In this work, we develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures, including tasks that involve image-based goal-conditioning and multi-step deformable manipulation. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation that rearranges deep features to infer displacements that can represent pick and place actions. In simulation and in physical experiments, we demonstrate that goal-conditioned Transporter Networks enable agents to manipulate deformable structures into flexibly specified configurations without test-time visual anchors for target locations. We also significantly extend prior results using Transporter Networks for manipulating deformable objects by testing on tasks with 2D and 3D deformables. Supplementary material is available at https://berkeleyautomation.github.io/bags/.

翻訳日:2023-06-22 08:29:19 公開日:2023-06-18

# 文脈広い音素クラス情報を活用した音声強調性能の向上

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information ( http://arxiv.org/abs/2011.07442v5 )

ライセンス: Link先を確認

Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

(参考訳) 従来,音声の音響的特徴を調音的特徴の場所/マンガで増大させることで,音声強調(SE)過程を導出することにより,音声の幅広い音韻特性を考慮し,性能向上を図ることができた。本稿では,音節属性の文脈情報を付加情報として検討し,SEをさらに活用する。より具体的には、幅広い音素クラス(bpcs)のシーケンスを予測するエンドツーエンド自動音声認識(e2e-asr)モデルによる損失を利用して、se性能を改善することを提案する。また,BPCをベースとしたE2E-ASRに基づくSEシステムの学習において,ASRを用いた多目的トレーニングと知覚的損失も開発した。音声の発声, 発声残響, 音声強調課題による実験結果から, 文脈的bpc情報がse性能を向上できることが確認された。さらに、BPCベースのE2E-ASRで訓練されたSEモデルは、音素ベースのE2E-ASRよりも優れている。その結果、ASRシステムによる音素の誤分類による目的が不完全なフィードバックにつながる可能性があり、BPCがよりよい選択である可能性が示唆された。最後に,重畳可能な音声目標を同一のBPCに組み合わせることで,SE性能を効果的に向上できることに注意する。

Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that objectives with misclassification of phonemes by the ASR system may lead to imperfect feedback, and BPC could be a potentially better choice. Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.

翻訳日:2023-06-22 08:28:56 公開日:2023-06-18

# 条件付き生成逆数ネットワークを用いた深層学習対流

Deep Learning Convective Flow Using Conditional Generative Adversarial Networks ( http://arxiv.org/abs/2005.06422v2 )

ライセンス: Link先を確認

Changlin Jiang, Amir Barati Farimani

(参考訳) 我々は,エネルギー輸送を伴う時間依存対流の学習と予測が可能な汎用ディープラーニングフレームワークfluidganを開発した。 fluidganは高速で正確でデータ駆動であり、基礎となる流体やエネルギー輸送物理学の知識なしに流体の物理を満たしている。また、FluidGANは速度、圧力、温度場の結合も学習する。我々の枠組みは、基礎となる物理モデルが複雑または未知である決定論的多物理現象を理解するのに役立つ。

We developed a general deep learning framework, FluidGAN, capable of learning and predicting time-dependent convective flow coupled with energy transport. FluidGAN is thoroughly data-driven with high speed and accuracy and satisfies the physics of fluid without any prior knowledge of underlying fluid and energy transport physics. FluidGAN also learns the coupling between velocity, pressure, and temperature fields. Our framework helps understand deterministic multiphysics phenomena where the underlying physical model is complex or unknown.

翻訳日:2023-06-22 08:27:10 公開日:2023-06-18

# 深層学習における認識的不確かさの定量化

Quantifying Epistemic Uncertainty in Deep Learning ( http://arxiv.org/abs/2110.12122v4 )

ライセンス: Link先を確認

Ziyi Huang, Henry Lam and Haofeng Zhang

(参考訳) 不確かさの定量化は、機械学習の信頼性と堅牢性の中核にある。本稿では,この不確実性,特に,深層学習において,不確実性(不確実性)を(訓練手順から)\textit{procedural variability} と(訓練データから) \textit{data variability} (訓練データから) に分解する理論的枠組みを提案する。次に,これらの不確実性を評価するための2つの手法を提案する。我々は,古典的な統計手法を適用する際の計算困難を克服する方法を実証する。複数の問題設定に関する実験的な評価は、我々の理論を裏付け、我々のフレームワークと推定が、どのようにしてモデリングとデータ収集の直接的なガイダンスを提供するかを説明する。

Uncertainty quantification is at the core of the reliability and robustness of machine learning. In this paper, we provide a theoretical framework to dissect the uncertainty, especially the \textit{epistemic} component, in deep learning into \textit{procedural variability} (from the training procedure) and \textit{data variability} (from the training data), which is, to the best of our knowledge, the first such attempt in the literature. We then propose two approaches to estimate these uncertainties, one based on influence functions and one on batching. We demonstrate how our approaches overcome the computational difficulties in applying classical statistical methods. Experimental evaluations on multiple problem settings corroborate our theory and illustrate how our framework and estimation can provide direct guidance on modeling and data collection efforts.

翻訳日:2023-06-22 06:47:39 公開日:2023-06-18

# 一般化総変分最小化によるクラスタ化フェデレーション学習

Clustered Federated Learning via Generalized Total Variation Minimization ( http://arxiv.org/abs/2105.12769v4 )

ライセンス: Link先を確認

Yasmin SarcheshmehPour, Yu Tian, Linli Zhang, Alexander Jung

(参考訳) ネットワーク構造を持つローカルデータセットの分散収集のための局所的(あるいはパーソナライズされた)モデルを学習するための最適化手法を検討する。このネットワーク構造は、ローカルデータセット間の類似性のドメイン固有の概念から生じる。そのような概念の例としては、時空間的近接、統計的依存関係、機能的関係などがある。我々の主要な概念的貢献は、一般化総変動(GTV)最小化としてフェデレーション学習を定式化することである。この定式化は、既存の連合学習方法を統一し、大幅に拡張する。柔軟性が高く、一般化線形モデルやディープニューラルネットワークを含む幅広いパラメトリックモデルと組み合わせることができる。私たちのアルゴリズムの主な貢献は、完全に分散した連合学習アルゴリズムです。このアルゴリズムは、GTVの最小化を解くために確立された原始双対法を適用して得られる。メッセージパッシングとして実装することができ、処理時間や帯域幅を含む限られた計算資源から生じる不正確な計算に対して堅牢である。私たちの主な分析的貢献は、アルゴリズムが学習したローカルモデルパラメータと、oracleベースのクラスタ型フェデレーション学習方法との偏差の上限です。この上界は、ローカルモデルと、gtvの最小化が(ほぼ)均質なローカルデータセットをプールできるローカルデータセットのネットワーク構造に関する条件を明らかにする。

We study optimization methods to train local (or personalized) models for decentralized collections of local datasets with an intrinsic network structure. This network structure arises from domain-specific notions of similarity between local datasets. Examples of such notions include spatio-temporal proximity, statistical dependencies, and functional relations. Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization. This formulation unifies and considerably extends existing federated learning methods. It is highly flexible and can be combined with a broad range of parametric models, including generalized linear models and deep neural networks. Our main algorithmic contribution is a fully decentralized federated learning algorithm. This algorithm is obtained by applying an established primal-dual method to solve GTV minimization. It can be implemented as message passing and is robust against inexact computations that arise from limited computational resources, including processing time or bandwidth. Our main analytic contribution is an upper bound on the deviation between the local model parameters learnt by our algorithm and an oracle-based clustered federated learning method. This upper bound reveals conditions on the local models and the network structure of local datasets such that GTV minimization is able to pool (nearly) homogeneous local datasets.
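
To make the pooling effect concrete, here is a minimal sketch of a GTV-style objective on a toy graph. It assumes scalar local models, squared local losses, and a squared-difference penalty on edges, and it uses plain gradient descent rather than the primal-dual method named in the abstract. The node values, edge weights, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy problem: 6 nodes, each holding a local scalar estimate m[i].
m = np.array([0.8, 1.3, 0.9, 5.2, 4.7, 5.1])
# Weighted edges (i, j, A_ij): strong links inside each cluster, one weak link across.
edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (2, 3, 0.01)]


def gtv_objective(w, lam):
    """Local squared losses plus lam * sum over edges of A_ij * (w_i - w_j)^2."""
    local = np.sum((w - m) ** 2)
    variation = sum(a * (w[i] - w[j]) ** 2 for i, j, a in edges)
    return local + lam * variation


def gtv_gradient(w, lam):
    """Gradient of gtv_objective with respect to the stacked local parameters w."""
    grad = 2.0 * (w - m)
    for i, j, a in edges:
        diff = 2.0 * lam * a * (w[i] - w[j])
        grad[i] += diff
        grad[j] -= diff
    return grad


def solve_gtv(lam=1.0, step=0.05, num_iters=500):
    """Plain gradient descent (a stand-in, not the paper's primal-dual method)."""
    w = m.copy()
    for _ in range(num_iters):
        w = w - step * gtv_gradient(w, lam)
    return w


w_hat = solve_gtv()
print(np.round(w_hat, 3))
```

With these weights the parameters of nodes 0 to 2 and of nodes 3 to 5 move toward each other within their cluster, while the weak cross edge keeps the two clusters apart.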

翻訳日:2023-06-22 06:45:20 公開日:2023-06-18

# 確率近似と強化学習における漸近統計量のODE法

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning ( http://arxiv.org/abs/2110.14427v3 )

ライセンス: Link先を確認

Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis and Sean Meyn

(参考訳) 本論文は、$d$次元の確率近似再帰 $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ を扱う。ここで $\Phi$ は一般状態空間 $\textsf{X}$ 上の幾何学的エルゴードマルコフ連鎖で、定常分布は $\pi$、$f:\Re^d\times\textsf{X}\to\Re^d$ である。主な結果は、(DV3) として知られる Donsker-Varadhan Lyapunov ドリフト条件と、ベクトル場 $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$($\Phi\sim\pi$)を持つ平均流の安定性条件の下で確立される。 (i) $\{ \theta_n\}$ は a.s. かつ $L_4$ で $\bar{f}(\theta)$ の一意根 $\theta^*$ に収束する。 (ii) 正規化誤差に対する通常の1次元CLTと同様に、関数型CLTが確立される。 (iii) ステップサイズに関する標準的な仮定の下で、平均化パラメータ $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$ の正規化バージョン $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$ に対して CLT が成り立つ。さらに、正規化された共分散は収束する: $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ ここで $\Sigma_\theta^*$ は Polyak と Ruppert の最小共分散である。 (iv) $f$ と $\bar{f}$ が $\theta$ について線型であり、マルコフ連鎖 $\Phi$ が幾何学的エルゴード的であるが (DV3) を満たさない例を示す。アルゴリズムは収束するが、第二モーメントは非有界である: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$。

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
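
The recursion and the Polyak-Ruppert average $\theta^{\text{PR}}_n$ can be simulated in a few lines. The sketch below is only an illustration under strong assumptions: a hypothetical two-state Markov chain for $\Phi$, the choice $f(\theta, \Phi) = g(\Phi) - \theta$ (so that $\bar{f}(\theta) = \theta^* - \theta$), and step size $\alpha_n = n^{-\rho}$ with $\rho = 0.7$. None of these choices come from the paper.

```python
import numpy as np

# Hypothetical two-state Markov chain Phi with transition matrix P.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# g[phi] is the observation in state phi; f(theta, phi) = g[phi] - theta.
g = np.array([[1.0, 0.0],
              [0.0, 3.0]])


def f(theta, phi):
    return g[phi] - theta


def run_sa(num_steps=50_000, rho=0.7, seed=0):
    """Iterate theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, Phi_n), alpha_n = n**-rho."""
    rng = np.random.default_rng(seed)
    uniforms = rng.random(num_steps)
    theta = np.zeros(2)
    running_sum = np.zeros(2)
    phi = 0
    for n in range(1, num_steps + 1):
        # Move to state 1 with probability P[phi, 1], otherwise to state 0.
        phi = int(uniforms[n - 1] < P[phi, 1])
        theta = theta + n ** (-rho) * f(theta, phi)
        running_sum += theta
    theta_pr = running_sum / num_steps  # Polyak-Ruppert average of theta_1..theta_N
    return theta, theta_pr


# The stationary distribution of P is (2/3, 1/3), so the root of f_bar is pi @ g.
theta_star = np.array([2.0 / 3.0, 1.0 / 3.0]) @ g
theta_last, theta_pr = run_sa()
print(theta_star, theta_last, theta_pr)
```

In runs of this toy model the averaged iterate is typically closer to $\theta^*$ than the last iterate, which is the behaviour the CLT in item (iii) quantifies.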

翻訳日:2023-06-22 06:36:26 公開日:2023-06-18

# metaverse: セキュリティとプライバシの懸念

Metaverse: Security and Privacy Concerns ( http://arxiv.org/abs/2203.03854v3 )

ライセンス: Link先を確認

Ruoyu Zhao, Yushu Zhang, Youwen Zhu, Rushi Lan, Zhongyun Hua

(参考訳) 現実世界に似た3次元仮想宇宙である「メタバース」という用語は、1990年代に先延ばしされて以来、常に想像力に満ちていた。近年,様々な技術の継続的な出現と進歩によってメタバースを実現することが可能となり,再び注目を浴びている。差別の削減、個人差の排除、社会化など、人間社会に多くの利益をもたらす可能性がある。しかし、すべてにはセキュリティとプライバシに関する懸念がある。本稿では,メタバースの概念をまず分析し,他のVR技術と比較して超仮想現実性(VR)エコシステムであることを示す。そして、ユーザ情報、コミュニケーション、シナリオ、グッズという4つの視点から、セキュリティとプライバシに関する懸念を慎重に分析し、詳細化します。一方、我々は、新たなバケット効果を利用して、哲学的な観点から、セキュリティとプライバシの懸念に包括的に対処する必要性を提起し、メタバースコミュニティに多少の進展をもたらすことを期待する。

The term "metaverse", a three-dimensional virtual universe similar to the real world, has captured the imagination since it was put forward in the 1990s. Recently, the continuous emergence and progress of various technologies have made it possible to realize the metaverse, and it has thus attracted extensive attention again. It may bring many benefits to human society, such as reducing discrimination, eliminating individual differences, and socializing. However, everything has security and privacy concerns, and the metaverse is no exception. In this article, we first analyze the concept of the metaverse and propose that it is a super virtual-reality (VR) ecosystem compared with other VR technologies. Then, we carefully analyze and elaborate on possible security and privacy concerns from four perspectives: user information, communication, scenario, and goods, and put forward corresponding potential solutions. Meanwhile, we propose the need to take advantage of the new buckets effect to comprehensively address security and privacy concerns from a philosophical perspective, which hopefully will bring some progress to the metaverse community.

翻訳日:2023-06-22 06:26:30 公開日:2023-06-18

# バックドアポゾンサンプル検出のためのプロアクティブMLアプローチに向けて

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples ( http://arxiv.org/abs/2205.13616v3 )

ライセンス: Link先を確認

Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal

(参考訳) 広告主は、トレーニングデータセットにバックドア毒サンプルを導入することで、ディープラーニングモデルにバックドアを埋め込むことができる。本研究は,バックドア攻撃の脅威を軽減するために,このような毒のサンプルを検出する方法を検討する。まず、最も先行作業の基盤となるポストホックなワークフローを明らかにし、ディフェンダーは攻撃の進行を受動的に許可し、その後攻撃後のモデルの特徴を活用して毒のサンプルを明らかにする。このワークフローがディフェンダーの能力を十分に活用していないことは明らかで、その上に構築されたディフェンスパイプラインは、多くのシナリオで障害やパフォーマンスの低下を引き起こします。第2に,モデルトレーニングと毒物検出パイプライン全体に対して,ディフェンダーが積極的に関与し,攻撃後のモデルの特徴を強要し,拡大し,毒物検出を容易にするという,積極的な考え方を促進することによるパラダイムシフトを提案する。これに基づいて統一フレームワークを定式化し,より堅牢で一般化可能な検出パイプラインの設計に関する実践的洞察を提供する。第3に,本フレームワークの具体的インスタンス化として,CT(Confusion Training)技術を導入する。 CTは、既に有毒なデータセットに追加の中毒攻撃を加え、検出にバックドアパターンを露出しながら、良性相関を積極的に分離する。 4種類のデータセットと14種類の攻撃に対する実証的評価は、14のベースライン防御に対するCTの優位性を検証した。

Adversaries can embed backdoors in deep learning models by introducing backdoor poison samples into training datasets. In this work, we investigate how to detect such poison samples to mitigate the threat of backdoor attacks. First, we uncover a post-hoc workflow underlying most prior work, where defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to uncover poison samples. We reveal that this workflow does not fully exploit defenders' capabilities, and defense pipelines built on it are prone to failure or performance degradation in many scenarios. Second, we suggest a paradigm shift by promoting a proactive mindset in which defenders engage proactively with the entire model training and poison detection pipeline, directly enforcing and magnifying distinctive characteristics of the post-attacked model to facilitate poison detection. Based on this, we formulate a unified framework and provide practical insights on designing detection pipelines that are more robust and generalizable. Third, we introduce the technique of Confusion Training (CT) as a concrete instantiation of our framework. CT applies an additional poisoning attack to the already poisoned dataset, actively decoupling benign correlation while exposing backdoor patterns to detection. Empirical evaluations on 4 datasets and 14 types of attacks validate the superiority of CT over 14 baseline defenses.

翻訳日:2023-06-22 06:18:36 公開日:2023-06-18

# 自動車用創発型視覚センサ

Emergent Visual Sensors for Autonomous Vehicles ( http://arxiv.org/abs/2205.09383v2 )

ライセンス: Link先を確認

You Li, Julien Moreau, Javier Ibanez-Guzman

(参考訳) 自動運転車は、周囲を理解するために認識システムに依存している。カメラは、現代のコンピュータビジョンアルゴリズムが提供する物体検出と認識の利点から、lidarやレーダーなどの他のセンサーと比較して、知覚システムにとって不可欠である。しかし、その固有の撮像原理によって制限されるため、標準的なrgbカメラは、低照度、高コントラスト、霧・雨・雪などの悪天候など、様々な悪いシナリオで性能が低下する可能性がある。一方,2次元画像検出による3次元情報の推定は,ライダーやレーダーに比べて一般的に困難である。近年、従来のRGBカメラの限界に対応するために、いくつかの新しいセンシング技術が登場している。本稿では,赤外線カメラ,レンジゲートカメラ,偏光カメラ,イベントカメラの4つの新しいイメージセンサの原理を概観する。それらの比較優位性、既存または潜在的アプリケーション、および対応するデータ処理アルゴリズムはすべて、体系的な方法で提示される。本研究は、自動運転社会の実践者に対して、新たな視点と洞察を提供することを期待する。

Autonomous vehicles rely on perception systems to understand their surroundings for further navigation missions. Cameras are essential for perception systems because of the object detection and recognition capabilities provided by modern computer vision algorithms, compared with other sensors such as LiDARs and radars. However, limited by its inherent imaging principle, a standard RGB camera may perform poorly in a variety of adverse scenarios, including low illumination, high contrast, and bad weather such as fog, rain, or snow. Meanwhile, estimating 3D information from 2D image detection is generally more difficult than with LiDARs or radars. Several new sensing technologies have emerged in recent years to address the limitations of conventional RGB cameras. In this paper, we review the principles of four novel image sensors: infrared cameras, range-gated cameras, polarization cameras, and event cameras. Their comparative advantages, existing or potential applications, and corresponding data processing algorithms are all presented in a systematic manner. We expect that this study will assist practitioners in the autonomous driving community with new perspectives and insights.

翻訳日:2023-06-22 06:17:24 公開日:2023-06-18

# 量子カーネルモデルにおける帯域幅の一般化

Bandwidth Enables Generalization in Quantum Kernel Models ( http://arxiv.org/abs/2206.06686v3 )

ライセンス: Link先を確認

Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin

(参考訳) 量子コンピュータは、いくつかの特殊な設定で古典的な最先端の機械学習手法を高速化することが知られている。例えば、量子カーネルの手法は離散対数問題の学習版で指数関数的な高速化をもたらすことが示されている。量子モデルの一般化を理解することは、実用上の問題において同様のスピードアップを実現するために不可欠である。最近の結果は、一般化が量子的特徴空間の指数的大きさによって妨げられることを証明している。これらの結果は量子モデルが量子ビットの数が大きい場合には一般化できないことを示唆するが、本論文ではこれらの結果は過度に制限的な仮定に依存していることを示す。我々は、量子カーネル帯域幅と呼ばれるハイパーパラメータを変化させることで、より広いモデルのクラスを考える。我々は、大量子ビット極限を解析し、閉形式で解ける量子モデルの一般化のための明示的な公式を提供する。具体的には、帯域幅の値を変更することで、任意の対象関数に一般化できないモデルから、整列した目標に対する良好な一般化を得られることを示す。本解析では,帯域幅がカーネル積分演算子のスペクトルを制御し,モデルの帰納バイアスを制御していることを示す。この理論が量子モデルの一般化にどのように影響するかを正確に予測できることを実証的に証明する。我々は、機械学習における量子優位性に対する結果の意義について論じる。

Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical interest. Recent results demonstrate that generalization is hindered by the exponential size of the quantum feature space. Although these results suggest that quantum models cannot generalize when the number of qubits is large, in this paper we show that these results rely on overly restrictive assumptions. We consider a wider class of models by varying a hyperparameter that we call quantum kernel bandwidth. We analyze the large-qubit limit and provide explicit formulas for the generalization of a quantum model that can be solved in closed form. Specifically, we show that changing the value of the bandwidth can take a model from provably not being able to generalize to any target function to good generalization for well-aligned targets. Our analysis shows how the bandwidth controls the spectrum of the kernel integral operator and thereby the inductive bias of the model. We demonstrate empirically that our theory correctly predicts how varying the bandwidth affects generalization of quantum models on challenging datasets, including those far outside our theoretical assumptions. We discuss the implications of our results for quantum advantage in machine learning.
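
The role of the bandwidth can be seen in a toy kernel that is easy to evaluate classically. The sketch assumes a product-state feature map in which qubit $i$ is rotated by $R_X(c\,x_i)$, so the fidelity kernel is $k(x, y) = \prod_i \cos^2\!\big(c\,(x_i - y_i)/2\big)$ with bandwidth $c$. This feature map, the data distribution, and the function names are assumptions for illustration and need not match the models analyzed in the paper.

```python
import numpy as np


def product_state_kernel(X, Y, bandwidth):
    """k(x, y) = prod_i cos^2(bandwidth * (x_i - y_i) / 2) for rows x of X and y of Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.prod(np.cos(bandwidth * diff / 2.0) ** 2, axis=-1)


def mean_off_diagonal(K):
    """Average kernel value between distinct data points."""
    n = K.shape[0]
    return (K.sum() - np.trace(K)) / (n * (n - 1))


rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(50, 20))  # 50 points, 20 features (one per qubit)
for c in (1.0, 0.1):
    K = product_state_kernel(X, X, c)
    print(c, mean_off_diagonal(K))
```

At bandwidth 1 the off-diagonal entries are close to zero, so the kernel matrix is nearly the identity and carries little information about similarity between points. Shrinking the bandwidth restores non-trivial off-diagonal values.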

翻訳日:2023-06-22 06:08:31 公開日:2023-06-18

# Live in the Moment: 政策の進化に適応した学習ダイナミクスモデル

Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy ( http://arxiv.org/abs/2207.12141v3 )

ライセンス: Link先を確認

Xiyao Wang, Wichayaporn Wongkamjan, Ruonan Jia, Furong Huang

(参考訳) モデルベース強化学習(RL)は、動的モデルを学び、政策学習のためのサンプルを生成することにより、モデルフリーRLよりも実際に高いサンプル効率を達成する。以前の研究は、すべての歴史的政策、すなわちサンプル再生バッファの実証的な状態-行動訪問分布に適合するダイナミックスモデルを学習した。しかし、本稿では、使用中のポリシーが時間とともに変化し続けるため、\emph{すべての履歴ポリシー}の分布の下でダイナミックスモデルを適合させても、\emph{現在のポリシー}に対するモデル予測に必ずしも有益ではないことを観察する。トレーニング中のポリシーの進化は、状態-行動訪問分布の変化を引き起こす。我々は、この分布シフトがモデル学習とモデルロールアウトに与える影響を理論的に分析する。次に、新しいダイナミックスモデル学習法である \textit{Policy-adapted Dynamics Model Learning (PDML)} を提案する。 PDMLは歴史的政策混合分布を動的に調整し、学習したモデルが進化する政策の状態-行動訪問分布に継続的に適応できるようにする。 MuJoCoにおける一連の連続制御環境の実験により、PDMLは、最先端のモデルベースRL法と組み合わせて、サンプル効率を大幅に向上し、漸近性能を向上することが示された。

Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for \emph{all historical policies} does not necessarily benefit model prediction for the \emph{current policy} since the policy in use is constantly evolving over time. The evolving policy during training will cause state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects the model learning and model rollouts. We then propose a novel dynamics model learning method, named \textit{Policy-adapted Dynamics Model Learning (PDML)}. PDML dynamically adjusts the historical policy mixture distribution to ensure the learned model can continually adapt to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PDML achieves significant improvement in sample efficiency and higher asymptotic performance combined with the state-of-the-art model-based RL methods.

翻訳日:2023-06-22 05:58:11 公開日:2023-06-18

# 大規模コーパスの意味的類似性分析に関する認知的研究:トランスフォーマーによるアプローチ

A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach ( http://arxiv.org/abs/2207.11716v2 )

ライセンス: Link先を確認

Praneeth Nemani, Satyanarayana Vollala

(参考訳) 意味的類似性分析とモデリングは、今日の多くの自然言語処理の先駆的応用において、基本的に賞賛されているタスクである。シーケンシャルパターン認識の感覚により、RNNやLSTMのような多くのニューラルネットワークはセマンティック類似性モデリングにおいて満足な結果を得た。しかし、これらの解は、非系列的な方法で情報を処理できないため、不適切なコンテキスト抽出につながるため、非効率であると考えられている。トランスフォーマーは、非逐次データ処理や自己アテンションといった長所があるため、最先端アーキテクチャとして機能する。本稿では,従来の手法とトランスフォーマー方式の両方を用いて,米国特許用語のPhrase Matching Datasetに対する意味的類似性解析とモデリングを行う。提案手法は,4種類の復号化BERT-DeBERTaを試作し,K-Foldクロスバリデーションにより性能を向上する。実験の結果,従来の手法と比較して手法の性能が向上し,平均ピアソン相関スコアは0.79。

Semantic similarity analysis and modeling is a fundamental task in many applications of natural language processing today. Owing to their capacity for sequential pattern recognition, many neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient due to their inability to process information in a non-sequential manner, which leads to improper extraction of context. Transformers function as the state-of-the-art architecture due to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques. We experiment with four different variants of Decoding-enhanced BERT (DeBERTa) and enhance its performance by performing K-fold cross-validation. The experimental results demonstrate our methodology's enhanced performance compared to traditional techniques, with an average Pearson correlation score of 0.79.
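
The evaluation protocol mentioned here (K-fold cross-validation scored by Pearson correlation) can be sketched without any model. The helpers below are hypothetical and only show how fold indices and the correlation score could be computed. They are not the authors' pipeline and involve no DeBERTa model.

```python
import numpy as np


def pearson(pred, target):
    """Pearson correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()
    tc = target - target.mean()
    return float(pc @ tc / np.sqrt((pc @ pc) * (tc @ tc)))


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, non-overlapping folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


for train_idx, val_idx in kfold_indices(10, 5):
    print(sorted(val_idx.tolist()))
```

In a real experiment each fold would train one model on `train_idx`, score it with `pearson` on `val_idx`, and report the mean over folds.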

翻訳日:2023-06-22 05:57:49 公開日:2023-06-18

# CPU上のディープラーニングモデル:効率的なトレーニングの方法論

Deep Learning Models on CPUs: A Methodology for Efficient Training ( http://arxiv.org/abs/2206.10034v2 )

ライセンス: Link先を確認

Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, and Douglas C. Schmidt

(参考訳) GPUは、高度に並列化されたアーキテクチャのため、ディープラーニングモデルのトレーニングに好まれている。その結果、トレーニング最適化に関するほとんどの研究はGPUに焦点を当てている。しかし、トレーニング用の適切なハードウェアを選択する方法を決定する際には、コストと効率のトレードオフがしばしばあります。特にcpuサーバは、ハードウェア更新コストが少なく、既存のインフラをより活用できるため、cpu上でのトレーニングがより効率的であれば有益である。本稿では,CPUを用いた深層学習モデルの学習にいくつかの貢献をする。まず、intel cpu上でディープラーニングモデルのトレーニングを最適化する手法と、パフォーマンスプロファイリングを改善するために開発したprofilednnと呼ばれるツールキットを提案する。第2に、ワークフローをガイドし、パフォーマンス問題を特定するいくつかのケーススタディを探索し、PyTorch用のIntel Extensionを最適化することで、RetinaNet-ResNext50モデル全体の2倍のトレーニングパフォーマンスが向上する。第3に、PyTorchの公式実装の2倍高速な、ボトルネックの特定とカスタム焦点損失カーネル作成を可能にするProfileDNNの可視化機能を活用する方法を示す。

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when deciding how to choose the proper hardware for training. In particular, CPU servers can be beneficial if training on CPUs were more efficient, as they incur fewer hardware update costs and better utilize existing infrastructure. This paper makes several contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs and a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow and explore several case studies where we identified performance issues and then optimized the Intel Extension for PyTorch, resulting in an overall 2x training performance increase for the RetinaNet-ResNext50 model. Third, we show how to leverage the visualization capabilities of ProfileDNN, which enabled us to pinpoint bottlenecks and create a custom focal loss kernel that was two times faster than the official reference PyTorch implementation.
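
The abstract does not describe the custom kernel itself. As a reference point for what such a kernel has to compute, here is a minimal NumPy sketch of the standard sigmoid focal loss used by RetinaNet, $-\alpha_t (1 - p_t)^\gamma \log p_t$, with the common defaults $\alpha = 0.25$ and $\gamma = 2$. It is an unoptimized illustration, not the authors' implementation, and the function name is chosen here for convenience.

```python
import numpy as np


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Element-wise focal loss for binary targets in {0, 1}."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))
    # Numerically stable binary cross-entropy with logits:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    ce = np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    p_t = p * targets + (1.0 - p) * (1.0 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)  # class-balancing weight
    return alpha_t * (1.0 - p_t) ** gamma * ce


print(sigmoid_focal_loss([4.0, 0.0, -4.0], [1.0, 1.0, 1.0]))
```

The factor $(1 - p_t)^\gamma$ shrinks the loss of well-classified examples, so a fused kernel mainly saves the intermediate arrays (`p`, `ce`, `p_t`, `alpha_t`) that this version materializes.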

翻訳日:2023-06-22 05:56:13 公開日:2023-06-18

# SE(3)-DiffusionFields:拡散による関節握りと運動最適化のためのスムーズなコスト関数の学習

SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion ( http://arxiv.org/abs/2209.03855v4 )

ライセンス: Link先を確認

Julen Urain and Niklas Funk and Jan Peters and Georgia Chalvatzaki

(参考訳) 多目的最適化問題は、ロボット工学においてユビキタスである。例えば、ロボット操作タスクの最適化には、ポーズの設定、衝突、関節制限の把握に関する共同検討が必要である。いくつかの要求は容易に手作業で設計できるが、例えば、軌道の滑らかさはデータから学習する必要がある。本稿では,データ駆動型se(3)コスト関数を拡散モデルとして学習する手法を提案する。拡散モデルは高度に表現されたマルチモーダル分布を表現することができ、スコアマッチングトレーニングの目的のため、空間全体に適切な勾配を示すことができる。拡散モデルとしての学習コストは、他のコストとシームレスに1つの微分可能な目的関数に統合し、関節勾配に基づく運動最適化を可能にする。本研究では,6dof把持のためのse(3)拡散モデルの学習に着目し,把持選択と軌道生成を分離することなく,関節把持と運動最適化の新しい枠組みを創り出す。本研究は,SE(3)拡散モデルw.r.t.古典的生成モデルの表現力を評価し,代表的ベースラインに対するシミュレーションおよび実世界のロボット操作の一連のタスクにおいて,提案した最適化フレームワークの優れた性能を示す。

Multi-objective optimization problems are ubiquitous in robotics, e.g., the optimization of a robot manipulation task requires a joint consideration of grasp pose configurations, collisions and joint limits. While some demands can be easily hand-designed, e.g., the smoothness of a trajectory, several task-specific objectives need to be learned from data. This work introduces a method for learning data-driven SE(3) cost functions as diffusion models. Diffusion models can represent highly-expressive multimodal distributions and exhibit proper gradients over the entire space due to their score-matching training objective. Learning costs as diffusion models allows their seamless integration with other costs into a single differentiable objective function, enabling joint gradient-based motion optimization. In this work, we focus on learning SE(3) diffusion models for 6DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation. We evaluate the representation power of our SE(3) diffusion models w.r.t. classical generative models, and we showcase the superior performance of our proposed optimization framework in a series of simulated and real-world robotic manipulation tasks against representative baselines.

翻訳日:2023-06-22 05:49:27 公開日:2023-06-18

# 任意の次元に対するstabiliser符号の数え上げ

Counting stabiliser codes for arbitrary dimension ( http://arxiv.org/abs/2209.01449v2 )

ライセンス: Link先を確認

Tanmay Singal, Che Chiang, Eugene Hsu, Eunsang Kim, Hsi-Sheng Goan and Min-Hsiu Hsieh

(参考訳) この作業では、任意の正の整数$d$に対して、$d$-dimensional qudits からなる $[[n,k]]_d$ 安定化符号の数を計算する。 gross (ref. [23]) による独創的な著作において、$[[n,k]]_d$安定化符号は、$d$ が素数である場合(または素数、すなわち $d=p^m$ である場合)に計算された。 Refの証明。参照。 [23]は,非プライム事件には適用できない. この証明のために、グループ構造を $[n,k]]_d$ コードに導入し、これを中国の剰余定理と組み合わせて $[[n,k]]_d$ コードの数を数える。私たちの仕事はRefと重なる。参照。 [23]$d$が素数であり、この場合、我々の結果は正確に一致するが、より一般的なケースでは結果が異なる。それにもかかわらず、安定化符号の総桁数は、その次元が素数であるか非素数であるかに依存しない。これは、安定化状態の数(またはより一般に安定化符号)を数えるために使われる方法が$d$が素数であるかどうかに依存するため、驚くべきことである。安定状態の濃度は、素数次元の場合(およびガロア・クディット素数-パワー次元の場合)でしか知られていなかったが、量子コンピューティングにおける多くの話題において量子化器として重要な役割を果たす。その中には、魔法の資源理論、設計理論、安定状態に対するデ・フィネッティの定理、クリフォード回路の古典的シミュラビリティの研究と最適化、小次元系の量子的文脈性の研究、ウィグナー函数の研究などが含まれる。我々の研究は、一般の場合でこの量子化器を利用できるので、素数次元でない量子系を素数次元系と同じ台座に配置する上で重要なステップである。

In this work, we compute the number of $[[n,k]]_d$ stabilizer codes made up of $d$-dimensional qudits, for arbitrary positive integers $d$. In a seminal work by Gross (Ref. [23]) the number of $[[n,k]]_d$ stabilizer codes was computed for the case when $d$ is a prime (or the power of a prime, i.e., $d=p^m$, but when the qudits are Galois-qudits). The proof in Ref. [23] is inapplicable to the non-prime case. For our proof, we introduce a group structure to $[[n,k]]_d$ codes, and use this in conjunction with the Chinese remainder theorem to count the number of $[[n,k]]_d$ codes. Our work overlaps with Ref. [23] when $d$ is a prime, and in this case our results match exactly, but the results differ for the more generic case. Despite that, the overall order of magnitude of the number of stabilizer codes scales independently of whether the dimension is prime or non-prime. This is surprising since the method employed to count the number of stabilizer states (or more generally stabilizer codes) depends on whether $d$ is prime or not. The cardinality of stabilizer states, which was so far known only for the prime-dimensional case (and the Galois qudit prime-power dimensional case), plays an important role as a quantifier in many topics in quantum computing. Salient among these are the resource theory of magic, design theory, the de Finetti theorem for stabilizer states, the study and optimisation of the classical simulability of Clifford circuits, the study of quantum contextuality of small-dimensional systems and the study of Wigner functions. Our work makes this quantifier available for the generic case, and thus is an important step needed to place results for quantum computing with non-prime dimensional quantum systems on the same pedestal as prime-dimensional systems.
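For orientation, two ingredients mentioned in this abstract can be sketched in Python. The first function evaluates the standard, previously known count of $n$-qudit stabilizer states for prime $d$, namely $d^n\prod_{i=1}^{n}(d^i+1)$. The second pair of helpers only illustrates how the Chinese remainder theorem splits $\mathbb{Z}_d$ into coprime prime-power factors; it is not the paper's counting formula for composite $d$, and all function names are illustrative.

```python
def stabilizer_state_count_prime(d, n):
    """Number of n-qudit stabilizer states for PRIME d: d^n * prod_{i=1..n} (d^i + 1)."""
    count = d ** n
    for i in range(1, n + 1):
        count *= d ** i + 1
    return count


def prime_power_factors(d):
    """Split d into coprime prime-power factors, e.g. 12 -> [4, 3]."""
    factors, p = [], 2
    while p * p <= d:
        if d % p == 0:
            q = 1
            while d % p == 0:
                d //= p
                q *= p
            factors.append(q)
        p += 1
    if d > 1:
        factors.append(d)
    return factors


def crt_residues(x, d):
    """Image of x in Z_d under the CRT map Z_d -> product of Z_q over prime powers q."""
    return tuple(x % q for q in prime_power_factors(d))
```

For a single qubit the first function returns 6, the familiar six Pauli eigenstates, and `crt_residues` is a bijection on `range(d)`, which is what makes a factor-by-factor count plausible.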

翻訳日:2023-06-22 05:48:41 公開日:2023-06-18

# グローバル収束勾配型バイレベルハイパーパラメータ最適化法

A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method ( http://arxiv.org/abs/2208.12118v2 )

ライセンス: Link先を確認

Ankur Sinha, Satender Gunwal and Shivam Kumar

(参考訳) 機械学習におけるハイパーパラメータ最適化は、通常、近似したハイパーパラメータセットのみをもたらすナイーブなテクニックによって達成される。ベイズ最適化のような手法は、与えられたハイパーパラメータの領域をインテリジェントに探索するが、最適解を保証しない。これらのアプローチの大きな欠点は、ハイパーパラメータの数で探索領域が指数関数的に増加し、計算コストが増加し、アプローチが遅くなることである。超パラメータ最適化問題は本質的には二段階最適化問題であり、この問題を解決するための二段階解法を試みている研究もある。しかしながら、これらの研究はトレーニング損失を最小限にするユニークなモデル重み付けを仮定している。本稿では,超パラメータ最適化問題の解法として,これらの欠点に対処する勾配法について述べる。提案手法は,実験で正規化ハイパーパラメータを選択した連続ハイパーパラメータを扱うことができる。この手法は、理論的に証明された最適パラメータの集合への収束を保証する。この考え方はガウス過程回帰を用いた低レベル最適値関数の近似に基づいている。その結果、二レベル問題は、拡張ラグランジアン法を用いて解決される単一レベル制約最適化タスクに還元される。我々は,MNISTおよびCIFAR-10データセットを多層パーセプトロンおよびLeNetアーキテクチャ上で広範囲に計算し,提案手法の有効性を確認した。格子探索, ランダム探索, ベイズ最適化, ハイバーバンド法の比較研究により, 提案アルゴリズムはより低い計算量に収束し, テストセットをより一般化するモデルが導かれることを示した。

Hyperparameter optimization in machine learning is often achieved using naive techniques that only lead to an approximate set of hyperparameters. Although techniques such as Bayesian optimization perform an intelligent search on a given domain of hyperparameters, they do not guarantee an optimal solution. A major drawback of most of these approaches is an exponential increase of their search domain with the number of hyperparameters, increasing the computational cost and making the approaches slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for solving this problem. However, these studies assume a unique set of model weights that minimize the training loss, an assumption that is generally violated by deep learning architectures. This paper discusses a gradient-based bilevel method addressing these drawbacks for solving the hyperparameter optimization problem. The proposed method can handle continuous hyperparameters, for which we have chosen the regularization hyperparameter in our experiments. The method guarantees convergence to the set of optimal hyperparameters, which this study proves theoretically. The idea is based on approximating the lower-level optimal value function using Gaussian process regression. As a result, the bilevel problem is reduced to a single-level constrained optimization task that is solved using the augmented Lagrangian method. We have performed an extensive computational study on the MNIST and CIFAR-10 datasets on multi-layer perceptron and LeNet architectures that confirms the efficiency of the proposed method. A comparative study against grid search, random search, Bayesian optimization, and the Hyperband method on various hyperparameter problems shows that the proposed algorithm converges with lower computation and leads to models that generalize better on the testing set.
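The reduction described in this abstract can be written out as follows. The notation is introduced here for illustration and is not taken from the paper: $\lambda$ denotes the hyperparameters, $w$ the model weights, $F$ the upper-level (validation) loss, $f$ the lower-level (training) loss, and $\hat{\varphi}$ the Gaussian-process approximation of the lower-level optimal value function.

```latex
\begin{aligned}
\text{Bilevel problem:}\quad & \min_{\lambda,\, w}\; F(\lambda, w)
  \quad \text{s.t.}\quad w \in \arg\min_{w'} f(\lambda, w'), \\
\text{Value function:}\quad & \varphi(\lambda) = \min_{w'} f(\lambda, w'), \\
\text{Single-level form:}\quad & \min_{\lambda,\, w}\; F(\lambda, w)
  \quad \text{s.t.}\quad f(\lambda, w) \le \hat{\varphi}(\lambda).
\end{aligned}
```

Written this way, the constraint only refers to the optimal training loss value, not to a particular minimizer, which is why no uniqueness assumption on the weights is needed.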

翻訳日:2023-06-22 05:47:26 公開日:2023-06-18

# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム

PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v4 )

ライセンス: Link先を確認

Hao Xue and Flora D. Salim

(参考訳) 本稿では,時系列予測の新しい視点を提案する。既存の時系列予測手法では、モデルは入力として数値の列を取り、出力として数値値を生成する。既存のSOTAモデルはトランスフォーマーアーキテクチャに基づいており、複数のエンコーディング機構で変更され、歴史的データのコンテキストとセマンティクスが組み込まれている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを疑問視する。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (promptcast)を提案する。この新しいタスクでは、数値入力と出力をプロンプトに変換し、予測タスクを文から文へのフレーム化することで、予測目的の言語モデルを直接適用することができる。本研究を支援するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)を提案する。我々は異なるSOTA数値に基づく予測手法と言語生成モデルを評価する。様々な予測設定によるベンチマーク結果は、言語生成モデルで提案するプロンプトキャストが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較すると、PromptCastはゼロショット設定下でのより優れた一般化能力を示す。

This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.
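The sentence-to-sentence framing can be illustrated with a small Python sketch. The prompt template, the function names and the answer parser are hypothetical and are not the templates used in the PISA dataset; the point is only that numbers go in as text and come back out of generated text.

```python
import re


def make_prompt(values, unit, horizon=1):
    """Turn a numeric history into an input sentence (hypothetical template)."""
    history = ", ".join(str(v) for v in values)
    return (f"The {unit} over the past {len(values)} days was {history}. "
            f"What will it be over the next {horizon} day(s)?")


def parse_answer(sentence):
    """Pull the first number out of a generated answer sentence, or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", sentence)
    return float(match.group()) if match else None
```

A language model would sit between the two calls: `make_prompt` builds its input, and `parse_answer` recovers a numeric forecast from its output so that it can be scored against numerical baselines.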

翻訳日:2023-06-22 05:28:20 公開日:2023-06-18

# エッジ対応事前学習によるMR画像合成のためのマルチスケールトランスネットワーク

Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis ( http://arxiv.org/abs/2212.01108v3 )

ライセンス: Link先を確認

Yonghao Li, Tao Zhou, Kelei He, Yi Zhou, Dinggang Shen

(参考訳) クロスモダリティ磁気共鳴(MR)画像合成は、与えられたモダリティから欠落するモダリティを生成するために用いられる。既存の(教師付き学習)手法は、効果的な合成モデルを訓練するために、多数のペアのマルチモーダルデータを必要とすることが多い。しかし、教師付きトレーニングに十分なペアデータを得ることは、しばしば困難である。実際には、ペアデータは少数である一方、非ペアデータは多数あることが多い。本稿では、ペアデータと非ペアデータの両方を活用するために、クロスモダリティMR画像合成のための、エッジ対応事前学習を用いたマルチスケールトランスフォーマーネットワーク(MT-Net)を提案する。具体的には、Edge保存型Masked AutoEncoder(Edge-MAE)をまず自己教師方式で事前学習し、1)各画像でランダムにマスクされたパッチに対する画像インピュテーションと、2)エッジマップ全体の推定を同時に行うことで、コンテキスト情報と構造情報の両方を効果的に学習する。さらに、各パッチのインピュテーションの難しさに応じてマスクされたパッチを異なる扱いとすることで、Edge-MAEの性能を向上させる新しいパッチワイズ損失を提案する。提案した事前学習に基づいて、後続の微調整段階では、事前学習したEdge-MAEのエンコーダから抽出したマルチスケール特徴を統合することにより欠損モダリティ画像を合成するデュアルスケール選択融合(DSF)モジュールを(MT-Net内に)設計する。さらに、この事前学習エンコーダを用いて、合成画像と対応する正解画像から高レベル特徴を抽出し、トレーニングにおいて両者が類似(一貫)することを要求する。実験の結果、MT-Netは、利用可能な全ペアデータの70%のみを用いても、競合する手法と同等の性能を発揮することがわかった。私たちのコードはhttps://github.com/lyhkevin/MT-Netで公開されます。

Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In practice, paired data are often scarce while unpaired data are plentiful. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. In addition, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulties of their respective imputations. Based on this proposed pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Further, this pre-trained encoder is also employed to extract high-level features from the synthesized image and the corresponding ground-truth image, which are required to be similar (consistent) during training. Experimental results show that our MT-Net achieves comparable performance to the competing methods even when using only $70\%$ of all available paired data. Our code will be publicly available at https://github.com/lyhkevin/MT-Net.
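The random patch masking that drives the imputation task can be sketched in a few lines of Python. This is a generic masked-autoencoder style mask, not code from the MT-Net repository; the grid size, the mask ratio and the function name are assumptions made for illustration.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=0):
    """Return a grid_h x grid_w boolean grid; True marks a masked patch."""
    rng = random.Random(seed)
    n = grid_h * grid_w
    # Pick a fixed fraction of patch indices without replacement.
    masked = set(rng.sample(range(n), round(n * mask_ratio)))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]
```

During pre-training the encoder would see only the unmasked patches, and the decoder would be asked to reconstruct the masked ones together with the edge map.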

翻訳日:2023-06-22 05:00:35 公開日:2023-06-18

# 注意機構に基づくBi-LSTM価格予測

Bi-LSTM Price Prediction based on Attention Mechanism ( http://arxiv.org/abs/2212.03443v2 )

ライセンス: Link先を確認

Jiashu Lou, Leyi Cui, Ye Li

(参考訳) 金融デリバティブ市場の拡大と発展に伴い、取引の頻度もより速く、より速くなります。人間の限界により、最近はアルゴリズムと自動トレーディングが議論の中心となっている。本稿では,金とビットコインという2つの一般的な資産をベースとした,注目機構に基づく双方向LSTMニューラルネットワークを提案する。機能工学の面では,従来の技術要素を付加すると同時に,時系列モデルを組み合わせることで,要因の開発も行います。モデルパラメータの選択において、我々は最終的に2層深層学習ネットワークを選択した。 aucの測定によれば、bitcoinと金の正確性はそれぞれ71.94%と73.03%である。予測結果を用いて,2年間で1089.34%のリターンを達成した。同時に,本論文で提案した Bi-LSTM モデルと従来のモデルとの比較を行い,本モデルがデータセット上で最高の性能を示すことを示す。最後に, モデルの重要性と実験結果, 今後の改善方向性について考察する。

With the growth and development of the financial derivatives market, transactions are becoming increasingly frequent. Due to human limitations, algorithmic and automated trading have recently become the focus of discussion. In this paper, we propose a bidirectional LSTM neural network based on an attention mechanism and apply it to two popular assets, gold and bitcoin. In terms of feature engineering, we add traditional technical factors and also combine time series models to develop additional factors. For the model parameters, we chose a two-layer deep learning network. According to the AUC measurement, the accuracy for bitcoin and gold is 71.94% and 73.03%, respectively. Using the forecast results, we achieved a return of 1089.34% in two years. We also compare the attention Bi-LSTM model proposed in this paper with traditional models, and the results show that our model has the best performance on this data set. Finally, we discuss the significance of the model and the experimental results, as well as possible directions for future improvement.
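As a rough illustration of what an attention mechanism on top of Bi-LSTM outputs does, the following Python sketch pools a sequence of per-step hidden vectors into one context vector. Dot-product scoring against a single query vector is an assumption made here; the abstract does not specify the scoring function, and the names are illustrative.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-step hidden vectors (dot-product scoring)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, hidden_states))
               for j in range(dim)]
    return weights, context
```

In a price model the context vector would feed a final classification layer, and the weights indicate which past time steps the prediction leaned on.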

翻訳日:2023-06-22 04:48:07 公開日:2023-06-18

# ディープニューラルネットワークは2年生よりスマートか?

Are Deep Neural Networks SMARTer than Second Graders? ( http://arxiv.org/abs/2212.09993v5 )

ライセンス: Link先を確認

Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum

(参考訳) 最近では、高度な認知能力を必要とするタスク(例えば、囲い込み、アートの生成、チャットgptなど)を解決するためのディープニューラルネットワークの応用が増えている。幅広いスキルを必要とする問題を解決する上で、ニューラルネットワークはどの程度一般化可能か? この質問に答えるために、ニューラルネットワークの抽象化、推論、一般化能力を評価するための、単純なマルチモーダルアルゴリズム推論タスクと関連するsmart-101データセットを提案する。私たちのデータセットは101の独特なパズルで構成されており、それぞれのパズルは絵と質問で構成されており、それらの解には算術、代数、空間的推論などいくつかの基本的なスキルが必要です。ディープニューラルネットワークのトレーニングに向けてデータセットをスケールするために、解アルゴリズムを維持しながら、パズルごとに完全に新しいインスタンスをプログラムで生成する。 SMART-101の性能をベンチマークするために,様々な最先端のバックボーンを用いた視覚・言語メタラーニングモデルを提案する。実験の結果,強力な深層モデルでは教師付き環境下でのパズルに対して妥当な性能が得られたが,一般化のための解析ではランダムな精度に劣らないことがわかった。また,最近のChatGPTや他の大規模言語モデルをSMART-101のサブセットで評価した結果,これらのモデルが合理的な推論能力を示す一方で,解答はしばしば誤りであることがわかった。

Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6--8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle, while retaining their solution algorithm. To benchmark performance on SMART-101, we propose a vision and language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models offer reasonable performance on puzzles in a supervised setting, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT and other large language models on a subset of SMART-101 and find that while these models show convincing reasoning abilities, the answers are often incorrect.

翻訳日:2023-06-22 04:41:03 公開日:2023-06-18

# ブラウアー群同変ニューラルネットワーク

Brauer's Group Equivariant Neural Networks ( http://arxiv.org/abs/2212.08630v2 )

ライセンス: Link先を確認

Edward Pearce-Crump

(参考訳) 私たちは、機械学習の文献に欠けている3つの対称性群、すなわち直交群$O(n)$、特殊直交群$SO(n)$、シンプレクティック群$Sp(n)$に対して、層が$\mathbb{R}^{n}$のテンソルパワーである可能性のある全てのグループ同変ニューラルネットワークの完全な特徴付けを提供する。特に、群が$O(n)$または$SO(n)$であるときは$\mathbb{R}^{n}$の標準基底において、群が$Sp(n)$であるときは$\mathbb{R}^{n}$のシンプレクティック基底において、そのようなテンソルパワー空間の間の学習可能で線型で同変な層函数に対する行列のスパンニング集合を見つける。

We provide a full characterisation of all of the possible group equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$ for three symmetry groups that are missing from the machine learning literature: $O(n)$, the orthogonal group; $SO(n)$, the special orthogonal group; and $Sp(n)$, the symplectic group. In particular, we find a spanning set of matrices for the learnable, linear, equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$ when the group is $O(n)$ or $SO(n)$, and in the symplectic basis of $\mathbb{R}^{n}$ when the group is $Sp(n)$.
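A small numerical check makes the idea of a spanning set concrete for the simplest non-trivial case, layers from $\mathbb{R}^{n}\otimes\mathbb{R}^{n}$ to itself under $O(n)$. A 2-tensor is stored as an $n \times n$ matrix $T$, on which $Q \in O(n)$ acts by $T \mapsto Q T Q^{\top}$; the three Brauer diagrams on 2+2 vertices then correspond to the identity, the transpose, and the trace times the identity matrix. This is an illustration written for this listing, not code from the paper, and the helper names are assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def act(q, t):
    """Action of Q on a 2-tensor stored as a matrix: T -> Q T Q^T."""
    return matmul(matmul(q, t), transpose(q))


def trace_map(t):
    """Diagram that contracts the inputs and re-emits the identity tensor."""
    n = len(t)
    tr = sum(t[i][i] for i in range(n))
    return [[tr if i == j else 0.0 for j in range(n)] for i in range(n)]


def layer(t, a, b, c):
    """Learnable combination a*identity + b*swap + c*trace of the diagram maps."""
    s, tr = transpose(t), trace_map(t)
    n = len(t)
    return [[a * t[r][col] + b * s[r][col] + c * tr[r][col]
             for col in range(n)] for r in range(n)]
```

Equivariance means `layer(act(q, t), a, b, c)` equals `act(q, layer(t, a, b, c))` for every orthogonal `q`, whatever the coefficients `a`, `b`, `c` are, so only those three numbers need to be learned.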

翻訳日:2023-06-22 04:39:50 公開日:2023-06-18

# 大規模フレキシブルタイトガウス混合モデルの確率的1次学習

Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model ( http://arxiv.org/abs/2212.05402v2 )

ライセンス: Link先を確認

Mohammad Pasande, Reshad Hosseini, Babak Nadjar Araabi

(参考訳) ガウス混合モデル(英: Gaussian Mixture Models、GMM)は、多くの科学的領域に適用できるカーネルモデルに基づく最も強力なパラメトリック密度推定器の1つである。近年、データソースの劇的な拡大に伴い、典型的な機械学習アルゴリズム、例えば期待最大化(em)は、高次元およびストリーミングデータで困難に直面する。さらに、複雑な密度はしばしば多数のガウス成分を必要とする。本稿では,一階確率最適化を用いたGMMの高速オンラインパラメータ推定アルゴリズムを提案する。このアプローチは、高次元のストリーミングデータや複雑な密度に直面した場合のGMMの課題に対応するためのフレームワークを提供する。直交性を保存する新しい確率多様体最適化アルゴリズムを導入し、よく知られたユークリッド空間の数値最適化と共に用いる。合成データと実データの両方における数多くの実験結果により,提案手法がem法よりも精度良く収束し,収束に必要なエポック数が少なく,エポック当たりの時間消費も少ないという点で有効であることが証明された。

Gaussian Mixture Models (GMMs) are among the most potent parametric density estimators based on the kernel model and find application in many scientific domains. In recent years, with the dramatic growth of data sources, typical machine learning algorithms, e.g., Expectation Maximization (EM), encounter difficulty with high-dimensional and streaming data. Moreover, complicated densities often demand a large number of Gaussian components. This paper proposes a fast online parameter estimation algorithm for GMMs using first-order stochastic optimization. This approach provides a framework to cope with the challenges of GMMs when faced with high-dimensional streaming data and complex densities by leveraging the flexibly-tied factorization of the covariance matrix. A new stochastic manifold optimization algorithm that preserves orthogonality is introduced and used along with well-known Euclidean-space numerical optimization. Numerous empirical results on both synthetic and real datasets justify the effectiveness of our proposed stochastic method over EM-based methods in the sense of a better-converged maximum of the likelihood function, fewer epochs needed for convergence, and less time consumed per epoch.
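To show what first-order stochastic (online) estimation of a mixture looks like in its simplest form, here is a deliberately reduced Python sketch: a one-dimensional, equal-weight, unit-variance mixture whose means are updated one sample at a time by gradient ascent on the log-likelihood. It does not implement the flexibly-tied covariance factorization or the manifold step of the paper; the learning rate, the data and the names are assumptions.

```python
import math
import random


def sgd_gmm_means(data, means, lr=0.02):
    """One streaming pass of stochastic gradient ascent on the log-likelihood
    of an equal-weight, unit-variance 1-D mixture; only the means are learned."""
    means = list(means)
    for x in data:
        # Responsibilities r_k are proportional to exp(-(x - mu_k)^2 / 2).
        logits = [-0.5 * (x - m) ** 2 for m in means]
        peak = max(logits)
        w = [math.exp(l - peak) for l in logits]
        z = sum(w)
        # Gradient of log p(x) with respect to mu_k is r_k * (x - mu_k).
        means = [m + lr * (wk / z) * (x - m) for m, wk in zip(means, w)]
    return means


rng = random.Random(0)
data = [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 1.0) for _ in range(4000)]
fitted = sorted(sgd_gmm_means(data, means=[-0.5, 0.5]))
```

Each sample is used once and then discarded, which is the property that makes this style of update suitable for streaming data, in contrast to EM passes over a stored dataset.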

翻訳日:2023-06-22 04:38:28 公開日:2023-06-18

# ディープ線形ネットワークにおけるニューラル崩壊:バランスデータから不均衡データへ

Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data ( http://arxiv.org/abs/2301.00437v5 )

ライセンス: Link先を確認

Hien Dang and Tho Tran and Stanley Osher and Hung Tran-The and Nhat Ho and Tan Nguyen

(参考訳) 最近のディープニューラルネットワークは、画像分類から自然言語処理まで、タスクで素晴らしいパフォーマンスを達成している。驚くべきことに、大量のパラメータを持つこれらの複雑なシステムは、収束までのトレーニングにおいて、最終層の特徴と分類器において同じ構造特性を示す。特に、ラスト層の特徴はクラス平均に崩壊し、それらのクラス平均は単純等角タイトフレーム(etf)の頂点であることが観察されている。この現象はNeural Collapse(NC)として知られている。近年の論文では、単純化された"unconstrained feature model"を用いた学習問題の大域的最小化にncが現れることが理論的に示されている。この文脈では、一般的な平均二乗誤差 (MSE) とクロスエントロピー (CE) の損失に対して、より深い線形ネットワークにおけるNCの発生を証明し、大域的な解が線形層にまたがるNC特性を示すことを示す。さらに,本研究をmse損失に対する不均衡データに拡張し,バイアスフリー設定下でのncの最初の幾何解析を提案する。以上の結果から,最終層の特徴と分類器の直交ベクトルからなる幾何への収束が,対応するクラスにおけるデータ量に依存することを示す。最後に、バランスの取れたシナリオと不均衡なシナリオの両方で、合成および実用的なネットワークアーキテクチャに関する理論的解析を実証的に検証する。

Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model". In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of NC under bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.
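The simplex Equiangular Tight Frame mentioned in this abstract has a simple closed form, $\sqrt{K/(K-1)}\,\bigl(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^{\top}\bigr)$, whose rows are unit vectors with pairwise cosine $-1/(K-1)$. The Python sketch below builds it and lets the reader check that property; it illustrates the balanced-data geometry only, and the function names are illustrative.

```python
import math


def simplex_etf(k):
    """Rows are the K vertices: sqrt(K/(K-1)) * (I - (1/K) * ones)."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
```

For the imbalanced setting the abstract reports a different limiting geometry, orthogonal vectors with class-dependent lengths, which would show up here as pairwise cosines of zero instead of $-1/(K-1)$.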

翻訳日:2023-06-22 04:28:25 公開日:2023-06-18

# カリキュラムによるシングルタスクrlの複雑性向上の理解

Understanding the Complexity Gains of Single-Task RL with a Curriculum ( http://arxiv.org/abs/2212.12809v3 )

ライセンス: Link先を確認

Qiyang Li, Yuexiang Zhai, Yi Ma, Sergey Levine

(参考訳) 強化学習 (Reinforcement Learning, RL) の問題は, 十分な報奨がなければ難しい。証明可能なRL法に関する先行研究は、一般的にこの問題に専用の探索戦略で対処することを提案している。しかし、この課題に取り組む別の方法は、タスク空間が興味深いタスクだけでなく、暗黙的にカリキュラムとして機能する簡単なタスクを含むマルチタスクrl問題として再編成することである。このような改革により、既存のマルチタスクRLメソッドをスクラッチから1つの課題を解決するためのより効率的な代替手段として実行することが可能となる。本研究では,単タスクrl問題をカリキュラムで定義されたマルチタスクrl問題として再構成する理論的枠組みを提案する。カリキュラムの厳密な規則性条件下では、マルチタスクRL問題における各タスクの逐次的解決は、明確な探索ボーナスや探索戦略を伴わずに、元の単一タスク問題の解決よりも計算的に効率的であることを示す。また, シミュレーションロボットタスクにおけるカリキュラム学習を高速化する効果的な実践的学習アルゴリズムに, 理論的洞察を変換できることを示した。

Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.

翻訳日:2023-06-22 04:28:02 公開日:2023-06-18

# 量子チャネルの時間表現のヒット:既約の場合とユニタリウォークへの応用を超えて

Hitting time expressions for quantum channels: beyond the irreducible case and applications to unitary walks ( http://arxiv.org/abs/2301.07003v3 )

ライセンス: Link先を確認

C. F. Lardizabal and L. F. L. Pereira

(参考訳) この研究では、有限次元ヒルベルト空間に作用する量子チャネルに関連する一般化された逆数を用いて、粒子が選択されたゴール部分空間に到達する平均ヒット時間を計算することができる。この研究で研究されている問題は、グラフ、特に量子マルコフ連鎖の量子力学に関する最近の結果に動機づけられている。我々は,一般化された逆数と打点時間がどのように得られるかを記述することに集中する。 a) 既約性の概念を弱めることができるので、既約の例も考慮できる。 b) 一般正のトレース保存地図に対する任意の到着部分空間を考えることができる。可算写像の自然な例はユニタリ量子ウォークによって与えられる。また、より特定の逆元、すなわち群逆元が我々の文脈でどのように現れるかを説明し、独立した興味を持つ行列代数的構成と関係付ける。

In this work we make use of generalized inverses associated with quantum channels acting on finite-dimensional Hilbert spaces, so that one may calculate the mean hitting time for a particle to reach a chosen goal subspace. The questions studied in this work are motivated by recent results on quantum dynamics on graphs, most particularly quantum Markov chains. We focus on describing how generalized inverses and hitting times can be obtained, with the main novelties of this work with respect to previous ones being that a) we are able to weaken the notion of irreducibility, so that reducible examples can be considered as well, and b) one may consider arbitrary arrival subspaces for general positive, trace preserving maps. Natural examples of reducible maps are given by unitary quantum walks. We also take the opportunity to explain how a more specific inverse, namely the group inverse, appears in our context, in connection with matrix algebraic constructions which may be of independent interest.
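A classical analogue may help fix ideas. For an ordinary Markov chain, the mean hitting times of a target state solve the linear system $(I - Q)h = \mathbf{1}$, where $Q$ is the transition matrix restricted to non-target states. The Python sketch below solves that system by Gauss-Jordan elimination. It is a classical illustration only, not the quantum-channel construction with generalized inverses studied in the paper, and the function name is an assumption.

```python
def mean_hitting_times(p, target):
    """Solve (I - Q) h = 1, where Q is P restricted to non-target states."""
    states = [s for s in range(len(p)) if s != target]
    n = len(states)
    # Augmented matrix [I - Q | 1].
    a = [[(1.0 if i == j else 0.0) - p[si][sj] for j, sj in enumerate(states)] + [1.0]
         for i, si in enumerate(states)]
    for col in range(n):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return {s: a[i][n] for i, s in enumerate(states)}
```

For the three-state chain `[[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]]` with target state 2, the expected hitting times are 6 steps from state 0 and 4 steps from state 1.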

翻訳日:2023-06-22 04:19:54 公開日:2023-06-18

# 多次元概念発見(MCD):完全性を保証する統一フレームワーク

Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees ( http://arxiv.org/abs/2301.11911v2 )

ライセンス: Link先を確認

Johanna Vielhaben, Stefan Bl\"ucher, Nils Strodthoff

(参考訳) 完全性公理は、モデルに局所的に忠実である、すなわち一つの決定に対してのみ、ポストホックなXAI法の説明を与える。 XAIの信頼できる応用、特に高い意思決定には、よりグローバルなモデル理解が必要です。近年,概念に基づく手法が提案されているが,実際のモデル推論に縛られることは保証されていない。この問題を回避するために,概念レベルの完全性関係を満たす従来のアプローチの拡張として,多次元概念発見(MCD)を提案する。提案手法は一般線形部分空間から概念として始まり,概念解釈可能性の強化やモデル部品の再学習は不要である。改良された概念を発見し,多次元部分空間の可能性を完全に活用するために,スパース部分空間クラスタリングを提案する。 mcdは、入力空間の概念を補完する2つの分析ツールを提供している: (1) 概念活性化マップ(concept activation map)は、サンプル内で概念が表現される場所を示し、原型的なサンプルを通して概念のキャラクタリゼーションを可能にする。どちらのツールもモデル推論の詳細な理解を可能にし、完全性関係を通じてモデルと関係することを保証する。これは、より信頼できるコンセプトベースのXAIへの道を開く。我々はより制約のある概念定義に対するmcdの優位性を実証的に示す。

The completeness axiom renders the explanation of a post-hoc XAI method only locally faithful to the model, i.e. for a single decision. For the trustworthy application of XAI, in particular for high-stakes decisions, a more global model understanding is required. Recently, concept-based methods have been proposed, which are however not guaranteed to be bound to the actual model reasoning. To circumvent this problem, we propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. Our method starts from general linear subspaces as concepts and requires neither reinforcing concept interpretability nor re-training of model parts. We propose sparse subspace clustering to discover improved concepts and fully leverage the potential of multi-dimensional subspaces. MCD offers two complementary analysis tools for concepts in input space: (1) concept activation maps, which show where a concept is expressed within a sample, allowing for concept characterization through prototypical samples, and (2) concept relevance heatmaps, which decompose the model decision into concept contributions. Both tools together enable a detailed understanding of the model reasoning, which is guaranteed to relate to the model via a completeness relation. This paves the way towards more trustworthy concept-based XAI. We empirically demonstrate the superiority of MCD against more constrained concept definitions.
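A completeness relation at the concept level can be illustrated with a linear read-out. If the feature space is split into mutually orthogonal concept subspaces (with any leftover directions collected in one extra subspace), the score $w \cdot \phi$ decomposes exactly into one term per subspace. The Python sketch below assumes a linear head without bias and orthonormal bases; these assumptions and the function name are chosen for illustration and are not taken from the paper.

```python
def concept_contributions(features, weights, concept_bases):
    """Split the linear-head score w . phi into one term per concept subspace.
    Each basis is a list of orthonormal vectors; bases are mutually orthogonal."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    contributions = []
    for basis in concept_bases:
        # w . P_c(phi), with P_c the orthogonal projector onto the concept subspace.
        contributions.append(sum(dot(features, b) * dot(weights, b) for b in basis))
    return contributions
```

When the bases together span the whole feature space, the contributions sum to the original score, so nothing in the decision is left unexplained.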

翻訳日:2023-06-22 04:09:21 公開日:2023-06-18

# 入力摂動による拡散モデルにおける露光バイアス低減

Input Perturbation Reduces Exposure Bias in Diffusion Models ( http://arxiv.org/abs/2301.11706v3 )

ライセンス: Link先を確認

Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, Rita Cucchiara

(参考訳) Denoising Diffusion Probabilistic Modelsは、長いサンプリングチェーンは高い計算コストをもたらすが、優れた生成品質を示している。本稿では,長いサンプリングチェーンが誤り蓄積現象の原因となり,自己回帰的テキスト生成における露光バイアス問題と類似していることを示す。具体的には、前者は真理サンプルに、後者は前回生成した結果に条件付けされているため、トレーニングとテストの間には相違があることに留意する。この問題を緩和するために,基底真理サンプルを摂動させて推定時間予測誤差をシミュレートする,非常に単純かつ効果的なトレーニング正規化を提案する。提案する入力摂動は,リコールや精度に影響を与えず,トレーニング時間と推論時間の両方を削減しつつ,サンプル品質の大幅な改善をもたらすことを実証的に示す。例えば、CelebA 64$\times$64では、トレーニング時間の37.5%を節約しながら、新しい最先端のFIDスコア1.27を達成する。コードはhttps://github.com/forever208/DDPM-IPで公開されている。

Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP
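One way to realise such a training-time perturbation is sketched below in Python. The network input is built from the clean sample plus the usual noise and an additional independent Gaussian term, while the regression target stays the unperturbed noise. The exact functional form and the default `gamma=0.1` are assumptions made for this sketch; the abstract only states that the ground truth samples are perturbed, and the released code should be consulted for the actual recipe.

```python
import math
import random


def noisy_training_input(x0, alpha_bar, gamma=0.1, rng=random):
    """Return (network_input, regression_target) for one training sample.
    gamma = 0 recovers the usual forward-noised input."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # noise the network must predict
    xi = [rng.gauss(0.0, 1.0) for _ in x0]    # extra perturbation, input only
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    y = [a * x + b * (e + gamma * p) for x, e, p in zip(x0, eps, xi)]
    return y, eps
```

Because the target is unchanged, the network learns to predict the correct noise from a slightly wrong input, which mimics the situation at sampling time when its input comes from its own earlier predictions.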

翻訳日:2023-06-22 04:08:56 公開日:2023-06-18

# 不規則サンプリング時間列に対するニューラル連続離散状態空間モデル

Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series ( http://arxiv.org/abs/2301.11308v3 )

ライセンス: Link先を確認

Abdul Fatir Ansari, Alvin Heng, Andre Lim, Harold Soh

(参考訳) 実世界の動的現象(例えば気候、生物)の正確な予測モデルを学ぶことは難しい課題である。鍵となる問題は、自然プロセスと人工プロセスの両方によって生成されたデータは、しばしば不規則にサンプリングされ、または欠落した観察を含む時系列で構成されていることである。本研究では,離散時間観測による時系列連続時間モデリングのためのニューラル連続離散状態空間モデル(NCDSSM)を提案する。 NCDSSMは補助変数を用いて力学からの認識をアンタングルし、補助変数のみに償却推論を必要とする。連続離散フィルタリング理論の手法を活用して,動的状態の正確なベイズ推定を行う方法を示す。本研究では,潜在ダイナミクスの3つの柔軟なパラメータ化と,推論中に動的状態を限界化する効率的な学習目標を提案する。様々なドメインにわたる複数のベンチマークデータセットでの実証結果は、既存のモデルに対するncdssmのインプテーションと予測性能が改善されたことを示している。

Learning accurate predictive models of real-world dynamic phenomena (e.g., climate, biological) remains a challenging task. One key issue is that the data generated by both natural and artificial processes often comprise time series that are irregularly sampled and/or contain missing observations. In this work, we propose the Neural Continuous-Discrete State Space Model (NCDSSM) for continuous-time modeling of time series through discrete-time observations. NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables. Leveraging techniques from continuous-discrete filtering theory, we demonstrate how to perform accurate Bayesian inference for the dynamic states. We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference. Empirical results on multiple benchmark datasets across various domains show improved imputation and forecasting performance of NCDSSM over existing models.
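The core of continuous-discrete filtering, a continuous-time prediction over an arbitrary gap followed by a discrete-time update at each observation, can be shown on a scalar toy model. The Python sketch below uses an Ornstein-Uhlenbeck process $dx = -a\,x\,dt + \sigma\,dW$, for which the prediction step has a closed form. NCDSSM itself uses neural parameterizations of the latent dynamics; the linear model, the parameter values and the function names are assumptions made here.

```python
import math


def predict(mean, var, dt, a=1.0, sigma=1.0):
    """Propagate N(mean, var) through dx = -a x dt + sigma dW for a gap dt."""
    decay = math.exp(-a * dt)
    stationary = sigma ** 2 / (2.0 * a)
    return mean * decay, var * decay ** 2 + stationary * (1.0 - decay ** 2)


def update(mean, var, y, obs_var):
    """Condition on a noisy observation y = x + N(0, obs_var)."""
    gain = var / (var + obs_var)
    return mean + gain * (y - mean), (1.0 - gain) * var


def filter_irregular(times, observations, obs_var=0.1, mean=0.0, var=1.0):
    """Run the filter over observations at arbitrary increasing times."""
    t_prev, out = times[0], []
    for t, y in zip(times, observations):
        mean, var = predict(mean, var, t - t_prev)
        mean, var = update(mean, var, y, obs_var)
        out.append((mean, var))
        t_prev = t
    return out
```

Irregular sampling needs no special handling: the gap `t - t_prev` simply enters the prediction step, and a missing observation corresponds to skipping the update.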

翻訳日:2023-06-22 04:08:18 公開日:2023-06-18

# 交互群同変ニューラルネットワークのゼリーフィッシュ特性

How Jellyfish Characterise Alternating Group Equivariant Neural Networks ( http://arxiv.org/abs/2301.10152v2 )

ライセンス: Link先を確認

Edward Pearce-Crump

(参考訳) 我々は、層が$\mathbb{R}^{n}$のテンソルパワーである可能性のある全ての交代群($A_n$)同変ニューラルネットワークの完全な特徴付けを提供する。特に、そのようなテンソルパワー空間の間の学習可能で線型で$A_n$-同変な層函数に対する行列の基底を、$\mathbb{R}^{n}$の標準基底において求める。また,本手法が局所対称性に同変なニューラルネットワークの構築にどのように一般化するかについても述べる。

We provide a full characterisation of all of the possible alternating group ($A_n$) equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. In particular, we find a basis of matrices for the learnable, linear, $A_n$-equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$. We also describe how our approach generalises to the construction of neural networks that are equivariant to local symmetries.

翻訳日:2023-06-22 04:07:53 公開日:2023-06-18

# ローカルクレジットと不完全軌道を用いたGFlowNetsのより良いトレーニング

Better Training of GFlowNets with Local Credit and Incomplete Trajectories ( http://arxiv.org/abs/2302.01687v2 )

ライセンス: Link先を確認

Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio

(参考訳) Generative Flow Networks(GFlowNets)は、モンテカルロ・マルコフ連鎖法(エネルギー関数で指定された分布からサンプリングするため)、強化学習(一連のステップを通じて合成オブジェクトをサンプリングするポリシーを学習するため)、生成モデル(分布を表現しそこからサンプリングすることを学習するため)、および償却変分法(事前分布と尤度が与えられたとき、他の方法では扱いにくい事後分布の近似とサンプリングを学習できるため)と関連している。それらは、生成軌道の最後に与えられる報酬関数 $R(x)$ (または $\exp(-\mathcal{E}(x))$、ここで $\mathcal{E}(x)$ はエネルギー関数)に比例する確率で、一連のステップを通じてオブジェクト $x$ を生成するように訓練される。最終的に報酬が与えられる他のRL設定と同様に、トレーニングとクレジットの割り当ての効率は、これらの軌道が長くなると損なわれる可能性がある。従来のGFlowNetでは、不完全な軌道(終端状態と関連する報酬の計算を欠くもの)からの学習は不可能だった。本稿では、終端状態だけでなく、中間状態にもエネルギー関数が適用可能である場合を考察する。これは例えば、エネルギー関数が加法的であるときに達成され、軌道に沿って項が利用できる。我々は、GFlowNet状態フロー関数を再パラメータ化して、各状態で既に獲得した部分的な報酬を利用する方法を示す。これにより、不完全な軌道であってもパラメータの更新に適用可能なトレーニング目標が可能になる。完全な軌道が利用可能である場合でも、多くのシミュレーションで示されているように、より局所化されたクレジットと勾配を得られることはトレーニング収束をスピードアップさせる。

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object $x$ through a sequence of steps with probability proportional to some reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$ with $\mathcal{E}(x)$ denoting the energy function), given at the end of the generative trajectory. Like for other RL settings where the reward is only given at the end, the efficiency of training and credit assignment may suffer when those trajectories are longer. With previous GFlowNet work, no learning was possible from incomplete trajectories (lacking a terminal state and the computation of the associated reward). In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is for example achieved when the energy function is additive, with terms available along the trajectory. We show how to reparameterize the GFlowNet state flow function to take advantage of the partial reward already accrued at each state. This enables a training objective that can be applied to update parameters even with incomplete trajectories. Even when complete trajectories are available, being able to obtain more localized credit and gradients is found to speed up training convergence, as demonstrated across many simulations.
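The reparameterization described in this abstract can be made explicit with a short derivation. The notation $\tilde{F}$ (a residual state flow), $P_F$ and $P_B$ (forward and backward policies) is introduced here for illustration; the loss below is what follows from substituting the reparameterized flow into a detailed-balance condition on a single transition, and should be read as a sketch, not as the paper's exact objective.

```latex
\begin{aligned}
\log F(s) &= -\mathcal{E}(s) + \log \tilde{F}(s), \\
F(s)\,P_F(s' \mid s) &= F(s')\,P_B(s \mid s')
  \quad \text{(detailed balance on a transition } s \to s'\text{)}, \\
\mathcal{L}(s, s') &= \Big( \log \tilde{F}(s) + \log P_F(s' \mid s)
  - \log \tilde{F}(s') - \log P_B(s \mid s')
  + \mathcal{E}(s') - \mathcal{E}(s) \Big)^2 .
\end{aligned}
```

The loss involves only the two states of one transition and the energy difference between them, so it can be evaluated on a trajectory that never reaches a terminal state.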

翻訳日:2023-06-22 04:00:16 公開日:2023-06-18

# 生成的対向対称性発見

Generative Adversarial Symmetry Discovery ( http://arxiv.org/abs/2302.00236v4 )

ライセンス: Link先を確認

Jianke Yang, Robin Walters, Nima Dehmamy, Rose Yu

(参考訳) 科学応用における同変ニューラルネットワークの成功にもかかわらず、それらは対称性群を事前に知る必要がある。しかし、実際にどの対称性を帰納的バイアスとして使うべきかを知るのは難しい場合がある。間違った対称性を強制すると、性能を損なうことさえある。本稿では,生成的敵対的訓練に類似したパラダイムを用いて,データセットから同変性を自動的に発見するフレームワークLieGANを提案する。具体的には、生成器がデータに適用される変換の群を学習し、その変換は元の分布を保存して識別器を騙す。LieGANは対称性を解釈可能なリー代数基底として表現し、回転群 $\mathrm{SO}(n)$ や制限ローレンツ群 $\mathrm{SO}(1,3)^+$ のような様々な対称性を軌道予測やトップクォークタギングタスクにおいて発見することができる。学習された対称性は、予測の精度と一般化を改善するために、既存の同変ニューラルネットワークで容易に利用できる。

Despite the success of equivariant neural networks in scientific applications, they require knowing the symmetry group a priori. However, it may be difficult to know which symmetry to use as an inductive bias in practice. Enforcing the wrong symmetry could even hurt the performance. In this paper, we propose a framework, LieGAN, to automatically discover equivariances from a dataset using a paradigm akin to generative adversarial training. Specifically, a generator learns a group of transformations applied to the data, which preserve the original distribution and fool the discriminator. LieGAN represents symmetry as an interpretable Lie algebra basis and can discover various symmetries, such as the rotation group $\mathrm{SO}(n)$ and the restricted Lorentz group $\mathrm{SO}(1,3)^+$, in trajectory prediction and top-quark tagging tasks. The learned symmetry can also be readily used in several existing equivariant neural networks to improve accuracy and generalization in prediction.
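
To make the phrase "Lie algebra basis" concrete, here is a minimal sketch of how a single basis element generates group transformations through the matrix exponential. It is not the LieGAN implementation: the fixed $\mathrm{so}(2)$ generator, the truncated Taylor series, and all function names are assumptions chosen for illustration, whereas a LieGAN-style generator would learn the basis from data.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30):
    """Matrix exponential via a truncated Taylor series: sum_k m^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds m^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, m)
        term = [[x / k for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Assumed example basis: the single generator of so(2), i.e. planar rotations.
SO2_GENERATOR = [[0.0, -1.0],
                 [1.0, 0.0]]


def group_element(theta, generator=SO2_GENERATOR):
    """Map the Lie algebra element theta * generator to the group via exp."""
    scaled = [[theta * x for x in row] for row in generator]
    return mat_exp(scaled)


def apply(g, v):
    """Apply the matrix g to the vector v."""
    return [sum(g[i][j] * v[j] for j in range(len(v))) for i in range(len(g))]


if __name__ == "__main__":
    g = group_element(math.pi / 2)
    print(apply(g, [1.0, 0.0]))  # approximately a quarter-turn of the x unit vector
```

Reading off the generator matrix is what makes such a representation interpretable: an antisymmetric generator like the one above corresponds to a rotation.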

翻訳日:2023-06-22 03:59:08 公開日:2023-06-18

# アダプタフュージョンによるパラメータ効率変調バイアス低減

Parameter-efficient Modularised Bias Mitigation via AdapterFusion ( http://arxiv.org/abs/2302.06321v2 )

ライセンス: Link先を確認

Deepak Kumar, Oleg Lesota, George Zerveas, Daniel Cohen, Carsten Eickhoff, Markus Schedl, Navid Rekabsaz

(参考訳) 大きな事前学習された言語モデルは社会バイアスを含み、これらのバイアスに沿って下流タスクに運ばれます。現行のプロセス内バイアス緩和アプローチ(例えば逆行訓練)は、モデルのパラメータを更新することでデバイアスを課し、効果的にモデルを新しい、不可逆なデバイアス状態に移行する。本研究では,モデルから分離したスタンドアロンのデバイアス機能を開発するための新しい手法を提案する。 dam(debiasing with adapter modules) - 任意のバイアス緩和機能を別々のアダプタにカプセル化し、それをオンデマンドでモデルに追加することで公平性を提供する。我々は、性別、人種、年齢を保護属性とする3つの分類タスクに関する大規模な実験を行った。以上の結果から, DAMはバイアス緩和の有効性を改善し, マルチ属性シナリオにおける破滅的な忘れを回避し, パラメータ効率を付与し, オリジナルモデルとデバイアスモデルとの切り替えが容易なタスク性能を維持した。

Large pre-trained language models contain societal biases and carry along these biases to downstream tasks. Current in-processing bias mitigation approaches (like adversarial training) impose debiasing by updating a model's parameters, effectively transferring the model to a new, irreversible debiased state. In this work, we propose a novel approach to develop stand-alone debiasing functionalities separate from the model, which can be integrated into the model on-demand, while keeping the core model untouched. Drawing from the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules) - a debiasing approach to first encapsulate arbitrary bias mitigation functionalities into separate adapters, and then add them to the model on-demand in order to deliver fairness qualities. We conduct a large set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and maintains on-par task performance, while granting parameter-efficiency and easy switching between the original and debiased models.

翻訳日:2023-06-22 03:51:45 公開日:2023-06-18

# SOCRATES:ロボット犬を用いたテキスト検索とアプローチ

SOCRATES: Text-based Human Search and Approach using a Robot Dog ( http://arxiv.org/abs/2302.05324v2 )

ライセンス: Link先を確認

Jeongeun Park, Jefferson Silveria, Matthew Pan, and Sungjoon Choi

(参考訳) 本稿では、自由形式のテキスト記述に基づく人間の検索とアプローチに焦点を当てたTEXシステム(SOCRATES)に基づく人間接近ロボットのためのSOCraticモデルを提案する。特に、文章の記述は外観(例えば、黒い髪の白いシャツ)と位置情報(例えば、ロボットを扱う学生)で構成されている。本稿ではまず,言語領域における大規模事前学習モデルと,テキスト記述に基づいて対象者を探索するダウンストリームタスクを接続するHuman Search Socratic Modelを提案する。そこで,本研究では,目標音場ロボットの動作を生成するためのハイブリッド学習フレームワークを提案し,実験モジュールと知識蒸留モジュールからなる人物にアプローチする。仮想移動ロボットを用いたシミュレーションと,参加者とBoston Dynamics Spotロボットによる実世界の実験により,提案した探索モジュールを検証した。さらに,ロボット社会属性尺度 (robotic social attribute scale,rosas) に基づいて,人間参加型フレームワークの特性を解析した。

In this paper, we propose a SOCratic model for Robots Approaching humans based on TExt System (SOCRATES) focusing on the human search and approach based on free-form textual description; the robot first searches for the target user, then the robot proceeds to approach in a human-friendly manner. In particular, textual descriptions are composed of appearance (e.g., wearing white shirts with black hair) and location clues (e.g., is a student who works with robots). We initially present a Human Search Socratic Model that connects large pre-trained models in the language domain to solve the downstream task, which is searching for the target person based on textual descriptions. Then, we propose a hybrid learning-based framework for generating target-cordial robotic motion to approach a person, consisting of a learning-from-demonstration module and a knowledge distillation module. We validate the proposed searching module via simulation using a virtual mobile robot as well as through real-world experiments involving participants and the Boston Dynamics Spot robot. Furthermore, we analyze the properties of the proposed approaching framework with human participants based on the Robotic Social Attributes Scale (RoSAS).

翻訳日:2023-06-22 03:50:19 公開日:2023-06-18

# リフト型グラフィカルモデルの構造学習のための原理的・効率的なモチーフ探索

Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models ( http://arxiv.org/abs/2302.04599v3 )

ライセンス: Link先を確認

Jonathan Feldstein, Dominic Phillips and Efthymia Tsamoura

(参考訳) 構造学習は、ニューロシンボリックAIと統計リレーショナル学習の分野の中心となるAIの中核的な問題である。データから論理理論を自動的に学習する。構造学習の基礎は、構造モチーフとして知られるデータの繰り返しパターンをマイニングすることである。これらのパターンを見つけることは指数探索空間を減らし、したがって公式の学習を導く。モチーフ学習の重要性にもかかわらず、まだよく理解されていない。本稿では,一階述語論理と確率論的モデルとをブレンドする言語であるリフト型グラフィカルモデルにおいて,構造モチーフをマイニングする第一原理的手法を提案する。私たちの最初の貢献は、2つの直感的なハイパーパラメータに依存するアルゴリズムです。1つはエンティティの類似性測度の不確実性を制御するもので、もう1つは結果のルールの柔らかさを制御するものです。第2のコントリビューションは、最も関連するデータへの検索スペースを減らすために、データの階層的クラスタリングを実行する前処理ステップです。 3つ目の貢献は、構造関連データをクラスタリングするためのO(n ln n)アルゴリズムの導入です。提案手法は, 標準ベンチマークを用いて評価し, 最先端構造学習手法の精度を最大6%, 実行速度を最大80%向上することを示す。

Structure learning is a core problem in AI central to the fields of neuro-symbolic AI and statistical relational learning. It consists in automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models, which uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm, which depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step where we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is to introduce an O(n ln n) (in the size of the entities in the data) algorithm for clustering structurally-related data. We evaluate our approach using standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.

翻訳日:2023-06-22 03:49:44 公開日:2023-06-18

# NL2CMD: 自然言語からBashコマンドへの翻訳のための更新されたワークフロー

NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation ( http://arxiv.org/abs/2302.07845v3 )

ライセンス: Link先を確認

Quchen Fu, Zhongwei Teng, Marco Georgaklis, Jules White, Douglas C. Schmidt

(参考訳) 自然言語をBash Commandsに翻訳することは近年注目されている研究分野である。ほとんどの努力はより正確な翻訳モデルの作成に集中している。私たちの知る限りでは、2つのデータセットしか利用できません。どちらのデータセットも、既知のデータソース(stack overflowやクラウドソーシングなどを通じて)をスクレイピングし、英語テキストまたはbashコマンドの検証と修正を行う専門家を雇う。本稿では,Bashコマンドをスクラッチから合成する研究に2つの貢献をする。まず、対応する英文からBashコマンドを生成するための最先端翻訳モデルについて述べる。第2に、NL2CMDデータセットを新たに導入し、自動生成し、人間の介入を最小限に抑え、以前のデータセットの6倍以上の規模となる。生成パイプラインは既存のBashコマンドに依存しないので、分散とコマンドの種類をカスタマイズすることができる。このタスクにおけるChatGPTの性能を評価し、データジェネレータとして使用する可能性について議論する。私たちの実験結果は、データセットのスケールと多様性が、セマンティック解析研究者にユニークな機会を提供することを示す。

Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets involve scraping through known data sources (through platforms like stack overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or Bash Commands. This paper provides two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Since the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be custom adjusted. We evaluate the performance of ChatGPT on this task and discuss the potential of using it as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.

翻訳日:2023-06-22 03:40:05 公開日:2023-06-18

# 量子エントロピーと中心極限定理

Quantum Entropy and Central Limit Theorem ( http://arxiv.org/abs/2302.07841v3 )

ライセンス: Link先を確認

Kaifeng Bu, Weichen Gu, Arthur Jaffe

(参考訳) 離散変数(dv)量子系をquditsに基づいて研究する枠組みを提案する。これは平均状態(MS)、最小の安定射影状態(MSPS)、新しい畳み込みの概念に依存している。興味深い結果がいくつかある: ms は相対エントロピーに関して与えられた状態に対する最も近い msps であり、ms はフォン・ノイマンエントロピーに関して極端であり、「dv系における最大エントロピー原理」を示す。我々は、ゼロ平均量子状態の畳み込みを反復して中央極限定理を確立し、これをその ms に収束させることを示す。 DVビームスプリッタとDV増幅器の2つの例について詳述する。

We introduce a framework to study discrete-variable (DV) quantum systems based on qudits. It relies on notions of a mean state (MS), a minimal stabilizer-projection state (MSPS), and a new convolution. Some interesting consequences are: The MS is the closest MSPS to a given state with respect to the relative entropy; the MS is extremal with respect to the von Neumann entropy, demonstrating a "maximal entropy principle in DV systems." We obtain a series of inequalities for quantum entropies and for Fisher information based on convolution, giving a "second law of thermodynamics for quantum convolutions." We show that the convolution of two stabilizer states is a stabilizer state. We establish a central limit theorem, based on iterating the convolution of a zero-mean quantum state, and show this converges to its MS. The rate of convergence is characterized by the "magic gap," which we define in terms of the support of the characteristic function of the state. We elaborate on two examples: the DV beam splitter and the DV amplifier.

翻訳日:2023-06-22 03:39:45 公開日:2023-06-18

# 空間時間データオーバーフィッティングによる高画質・高能率ビデオ超解法の実現に向けて

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting ( http://arxiv.org/abs/2303.08331v2 )

ライセンス: Link先を確認

Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

(参考訳) 深層畳み込みニューラルネットワーク(deep convolutional neural network, dnns)は,コンピュータビジョンのさまざまな分野で広く使用されているため,dnnによるビデオ解像度向上能力の活用が,現代の映像配信システムにおいて新たなトレンドとなっている。ビデオをチャンクに分割し、各チャンクを超高解像度モデルでオーバーフィットさせることで、サーバはビデオをクライアントに送信する前にエンコードする。しかし、大量のチャンクが良いオーバーフィッティング品質を保証することが期待され、ストレージを大幅に増加させ、データ転送により多くの帯域幅リソースを消費する。一方で、トレーニング最適化技術によるチャンク数の減少は通常、高いモデルキャパシティを必要とするため、実行速度が大幅に低下する。そこで本稿では,空間的時間的情報を利用して映像をチャンクに正確に分割し,チャンク数とモデルサイズを最小限に抑える,高品質で効率的な映像解像度アップスケーリングタスクのための新しい手法を提案する。さらに,本手法をデータ認識合同学習手法により,単一のオーバーフィッティングモデルに進化させ,品質低下によるストレージ要件の低減を図っている。市販の携帯電話にモデルをデプロイし,実験結果から,映像品質の高いリアルタイムビデオ解像度を実現することを示す。 41.6 PSNRで28fpsのストリーミング速度を実現し、ライブビデオ解像度アップスケールタスクでは14$\times$と2.29dBの高速化を実現した。 https://github.com/coulsonlee/STDO-CVPR2023.gitで利用可能なコード

As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile these trade-offs, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to a minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Code is available at https://github.com/coulsonlee/STDO-CVPR2023.git

翻訳日:2023-06-22 03:23:41 公開日:2023-06-18

# MoE展開に向けて: Mixture-of-Expert (MoE) 推論における非効率性の軽減

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference ( http://arxiv.org/abs/2303.06182v2 )

ライセンス: Link先を確認

Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin Lee

(参考訳) Mixture-of-Experts (MoE)モデルはコンピュータビジョンと自然言語処理の幅広いタスクにおいて最先端のパフォーマンスを達成するために人気を集めている。トレーニング中の計算コストの最小化を図りながら、モデル容量を効果的に拡大する。しかし,そのようなモデルの導入は,大規模で複雑な通信パターンのため困難である。本稿では,2つのmoeワークロード,すなわち言語モデリング(lm)と機械翻訳(mt)のキャラクタリゼーションを行い,デプロイ時の非効率なソースを特定する。本研究では,(1)動的ゲーティング,(2)エキスパートバッファリング,(3)エキスパートロードバランシングの3つの非効率化手法を提案する。我々は,動的ゲーティングにより最大スループットが6.21-11.23$\times$ for LM, 5.75-10.98$\times$ for MT Encoder, 2.58-5.71$\times$ for MT Decoderを示す。また、LMで最大1.36$\times$、MTで最大1.1$\times$までメモリ使用量を削減します。また、CPUメモリで残りをバッファリングしながら、GPUメモリで熱くアクティブな専門家のみを保持する新しいキャッシングメカニズムであるExpert Bufferingを提案します。これにより、静的メモリ割り当てを最大1.47$\times$まで削減できる。最後に、ワークロードにさらなるスケーラビリティを提供するロードバランシング手法を提案する。

Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show that dynamic gating improves maximum throughput by 6.21-11.23$\times$ for LM, 5.75-10.98$\times$ for MT Encoder and 2.58-5.71$\times$ for MT Decoder. It also reduces memory usage by up to 1.36$\times$ for LM and up to 1.1$\times$ for MT. We further propose Expert Buffering, a new caching mechanism that only keeps hot, active experts in GPU memory while buffering the rest in CPU memory. This reduces static memory allocation by up to 1.47$\times$. We finally propose a load balancing methodology that provides additional scalability to the workload.
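
The Expert Buffering idea described above (keep hot experts in GPU memory, leave the rest in CPU memory) can be sketched as a small cache. The abstract does not specify an eviction policy, so the least-recently-used policy here is an assumption, and `ExpertBuffer`, `gpu_slots`, and `load_fn` are hypothetical names; no real GPU transfers are performed.

```python
from collections import OrderedDict


class ExpertBuffer:
    """Toy cache: at most `gpu_slots` experts are resident "on the GPU".

    Assumption: least-recently-used eviction. `load_fn` stands in for the
    CPU-to-GPU copy of one expert's weights.
    """

    def __init__(self, gpu_slots, load_fn):
        if gpu_slots < 1:
            raise ValueError("gpu_slots must be at least 1")
        self.gpu_slots = gpu_slots
        self.load_fn = load_fn
        self._gpu = OrderedDict()  # expert_id -> weights, oldest use first
        self.hits = 0
        self.misses = 0

    def fetch(self, expert_id):
        """Return the expert's weights, loading and evicting if needed."""
        if expert_id in self._gpu:
            self._gpu.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return self._gpu[expert_id]
        self.misses += 1
        if len(self._gpu) >= self.gpu_slots:
            self._gpu.popitem(last=False)  # evict the least recently used
        weights = self.load_fn(expert_id)
        self._gpu[expert_id] = weights
        return weights

    def resident(self):
        """Expert ids currently resident, least recently used first."""
        return list(self._gpu)


if __name__ == "__main__":
    buf = ExpertBuffer(gpu_slots=2, load_fn=lambda e: f"weights-{e}")
    for expert in [0, 1, 0, 2, 0, 1]:
        buf.fetch(expert)
    print(buf.resident(), buf.hits, buf.misses)
```

A cache of this kind only pays off when expert activations are skewed toward a few hot experts; with uniform routing, every fetch would tend to be a miss.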

翻訳日:2023-06-22 03:22:27 公開日:2023-06-18

# Pacos: 推奨反転におけるユーザの解釈とコンテキスト依存の選択をモデル化する

Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals ( http://arxiv.org/abs/2303.05648v2 )

ライセンス: Link先を確認

Qingming Li and H. Vicky Zhao

(参考訳) 選択問題とは、いくつかの項目から最適な選択を選択することを指し、選択問題におけるユーザの好みを学ぶことは、意思決定メカニズムを理解し、パーソナライズされたサービスを提供する上で非常に重要である。現存する作品は通常、人々が個別にアイテムを評価すると仮定する。しかし、実際には、ユーザの嗜好は、コンテキスト効果と呼ばれるアイテムが配置されている市場に依存しており、2つの項目に対するユーザの嗜好の順序は逆転し、嗜好逆転と呼ばれることもある。本研究では,ユーザの適応的な重み付け,項目間比較,表示位置の3つの要因を明らかにする。本稿では,3つの要素を同時に扱うための統一フレームワークとしてpacosと呼ばれる文脈依存選好モデルを提案し,高い解釈性を持つ付加法と高精度な ann 法を含む2つの設計法を検討する。プライオリティ・リバーサルの発生条件について検討し,プライオリティ・リバーサルの対処におけるpacosの有効性を理論的に証明する。実験結果から,提案手法は,ユーザの選択を予測するための先行作業よりも優れた性能を示し,好みの逆転の原因を理解するのに大いに役立つことがわかった。

Choice problems refer to selecting the best choices from several items, and learning users' preferences in choice problems is of great significance in understanding the decision making mechanisms and providing personalized services. Existing works typically assume that people evaluate items independently. In practice, however, users' preferences depend on the market in which items are placed, which is known as context effects; and the order of users' preferences for two items may even be reversed, which is referred to as preference reversals. In this work, we identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions. We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously, and consider two design methods including an additive method with high interpretability and an ANN-based method with high accuracy. We study the conditions for preference reversals to occur and provide a theoretical proof of the effectiveness of Pacos in addressing preference reversals. Experimental results show that the proposed method has better performance than prior works in predicting users' choices, and has great interpretability to help understand the cause of preference reversals.

翻訳日:2023-06-22 03:21:35 公開日:2023-06-18

# 位相・経路コヒーレンスに基づく指向性ルータと制御可能な非相互伝送

Directional router and controllable non-reciprocity transmission based on phase and pathway coherence ( http://arxiv.org/abs/2303.13784v2 )

ライセンス: Link先を確認

Xu Yang, Lei Tan, and Wu-Ming Liu

(参考訳) 4つの空洞を持つ多チャネル量子ルータは、2つの結合共振器導波路と4つの単一空洞によって構成される。このハイブリッドシステムでは、入射ポートから出港ポートまでの光子間の複数の経路に基づき、特定ポートから出射する光子を100%に近い位置に調整することで方向経路を実現することができる。 2つの古典的光場間の位相差の影響下では、異なる経路間の相互干渉を破壊的干渉や建設的干渉に調整することができ、ルーティング確率の増大と減少の基礎となる。単一光子ルーティング確率に対するパラメータ値の影響についても検討した。確率振幅の解析式を調べることで、一定のパラメータ条件下で出口が閉じられる物理機構と、光子の後方伝達と元の方向伝達との間の位相関係が得られる。さらに、カイラルカップリングを超えた非相反的な伝送と方向ルーティングも実現でき、量子ルータの研究に新たな可能性と光子伝送特性の研究への新たな洞察を与えることができる。

A multi-channel quantum router with four nodal cavities is constructed by two coupled-resonator waveguides and four single cavities. We can achieve directional routing by adjusting the probability of a photon exiting from the specified port to close to 100%, based on multiple pathways for the photon from the incident port to the outgoing port in this hybrid system. Under the effect of the phase difference between two classical light fields, the mutual interference between different pathways can be adjusted to destructive interference or constructive interference, which lays the foundation for the increase and decrease of the routing probability. The influence of different parameter values on single photon routing probability is also studied. By studying the analytic formula of probability amplitude, we get the physical mechanism of exiting ports being closed under certain parameter conditions and the phase relationship between the backward transmission and the original direction transmission of photons. Furthermore, the non-reciprocal transmission and directional routing beyond chiral coupling can also be realized, which provides new possibilities for the study of quantum routers and new insights for the study of photon transmission characteristics.

翻訳日:2023-06-22 03:10:40 公開日:2023-06-18

# OpenAGI: LLMがドメインエキスパートと出会ったとき

OpenAGI: When LLM Meets Domain Experts ( http://arxiv.org/abs/2304.04370v4 )

ライセンス: Link先を確認

Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang

(参考訳) ヒューマンインテリジェンスは、複雑なタスクを解決するための基本的なスキルの組み合わせに長けている。この能力は人工知能(AI)にとって不可欠であり、包括的なインテリジェントモデルに組み込まれるべきであり、AI(Artificial General Intelligence)に向けた複雑なタスク解決のためのエキスパートモデルを活用することができる。大規模言語モデル(llm)は有望な学習能力と推論能力を示し、外部モデルを用いて複雑な問題に取り組むことができる。本研究では,マルチステップ実世界のタスク用に設計されたオープンソースのAGI研究プラットフォームであるOpenAGIを紹介する。具体的には、OpenAGIはデュアル戦略を使用し、ベンチマークと評価のための標準ベンチマークタスクと、クリエイティブな問題解決のためのより拡張可能なモデルを含むオープンエンドタスクを統合する。タスクはLLMに自然言語クエリとして表示され、適切なモデルを選択し実行します。また,タスクフィードバック(rltf)機構からの強化学習を提案し,タスク結果を用いてllmの能力を改善し,自己改善型aiフィードバックループを作成する。我々は、AGIが一意に定義された解決経路を持たない、広く多面的な研究課題であることを認めているが、LLMとドメイン固有の専門家モデルの統合は、人間における一般知能と専門知能の混在を反映したものであり、AGIに対する有望なアプローチである。私たちは、openagiプロジェクトのコード、データセット、ベンチマーク、評価メソッド、デモをオープンソース化し、agiの進歩へのコミュニティの関与を促進しています。

Human intelligence excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive intelligent models, enabling them to harness expert models for complex task-solving towards Artificial General Intelligence (AGI). Large Language Models (LLMs) show promising learning and reasoning abilities, and can effectively use external models to tackle complex problems. In this work, we introduce OpenAGI, an open-source AGI research platform designed for multi-step, real-world tasks. Specifically, OpenAGI uses a dual strategy, integrating standard benchmark tasks for benchmarking and evaluation, and open-ended tasks including more expandable models for creative problem-solving. Tasks are presented as natural language queries to the LLM, which then selects and executes appropriate models. We also propose a Reinforcement Learning from Task Feedback (RLTF) mechanism that uses task results to improve the LLM's ability, which creates a self-improving AI feedback loop. While we acknowledge that AGI is a broad and multifaceted research challenge with no singularly defined solution path, the integration of LLMs with domain-specific expert models, inspired by mirroring the blend of general and specialized intelligence in humans, offers a promising approach towards AGI. We are open-sourcing the OpenAGI project's code, dataset, benchmarks, evaluation methods, and demo to foster community involvement in AGI advancement: https://github.com/agiresearch/OpenAGI.

翻訳日:2023-06-22 03:03:28 公開日:2023-06-18

# 炭化ケイ素中の炭素クラスターエミッタ

Carbon cluster emitters in silicon carbide ( http://arxiv.org/abs/2304.04197v2 )

ライセンス: Link先を確認

Pei Li, Péter Udvarhelyi, Song Li, Bing Huang, and Adam Gali

(参考訳) 4Hポリタイプ(4H-SiC)の炭化ケイ素は、高破壊電界、キャリア飽和速度、優れた熱伝導率、その他の良好な特性により、高要求の電子機器に期待できる広帯域ギャップ半導体である。近年, 4H-SiC, 例えば負電荷のシリコン空孔と中性希薄量子ビットの蛍光高スピン点欠陥は, 急速に出現する量子技術分野における多くの応用候補として注目されている。さらに、炭素クラスターは4H-SiCの熱酸化後に現れる蛍光中心としても機能し、SiC結晶中の炭素原子を放出する照射技術を用いることができる。照射技術は空室関連量子ビットを生成するためにしばしば用いられるため、蛍光炭素クラスターは既に確立された空室関連量子ビットに干渉する可能性がある。本研究では, 4H-SiCの炭素原子4個以上を含む炭素クラスターの電子構造, 生成エネルギー, 解離エネルギー, 振動特性およびフル蛍光スペクトルを密度汎関数理論計算により系統的に検討した。これらの炭素クラスターのすべての局所的な構成を検討しました。炭素クラスターの電子的および振動的性質は、4h-sic格子の実際の局所配置に大きく依存する。 4H-SiCの炭素クラスターを4H-SiCの安定可視発光体として同定した。

Silicon carbide in its 4H polytype (4H-SiC) is a promising wide band gap semiconductor for highly-demanding electronic devices, thanks to its high breakdown electrical field, high carrier saturation speed, excellent thermal conductivity, and other favorable properties. Recently, fluorescent high-spin point defects in 4H-SiC, e.g., negatively charged silicon-vacancy and neutral divacancy qubits, have been proven to be outstanding candidates for numerous applications in the rapidly emerging field of quantum technology. In addition, carbon clusters can act as fluorescent centers too that may appear after thermal oxidation of 4H-SiC or using irradiation techniques which kick out carbon atoms from their sites in the SiC crystal. As irradiation techniques are often used to generate vacancy-related qubits, fluorescent carbon clusters may interfere with the already established vacancy-related qubits. In this study, we systematically investigate the electronic structure, formation energy, dissociation energy, vibrational properties and the full fluorescence spectrum of carbon clusters involving up to four carbon atoms in 4H-SiC by means of density functional theory calculations. We considered all the possible local configurations for these carbon clusters. The electronic and vibronic properties of the carbon clusters depend strongly on the actual local configuration of the 4H-SiC lattice. By comparing the calculated and previously observed fluorescence spectra in 4H-SiC, we identify several carbon clusters as stable visible emitters in 4H-SiC.

翻訳日:2023-06-22 03:03:02 公開日:2023-06-18

# クロスレファレンストランスによる医療画像の分節化

Few-shot Medical Image Segmentation via Cross-Reference Transformer ( http://arxiv.org/abs/2304.09630v3 )

ライセンス: Link先を確認

Yao Huang and Jianming Liu

(参考訳) 深層学習モデルは医用画像セグメンテーションの主流となっているが、トレーニングには大規模な手動ラベル付きデータセットが必要であり、目に見えないカテゴリに拡張することは困難である。 Few-shot segmentation(FSS)は、少数のラベル付きサンプルから新しいカテゴリを学習することで、これらの課題に対処する可能性がある。現在の手法のほとんどはプロトタイプ学習アーキテクチャを採用しており、サポート対象のベクトルを拡張し、条件付きセグメンテーションを実行するためにクエリ機能と結合する。しかし、このようなフレームワークは、サポートとクエリ機能の相関を無視する一方で、クエリ機能に重点を置く可能性がある。本稿では,支援画像と問合せ画像との相互作用の欠如に対処するために,クロスリファレンストランスを用いた,自己教師付き少数の医用画像分割ネットワークを提案する。まず,両方向のクロスアテンションモジュールを用いて,サポートセット画像とクエリ画像の相関性を向上する。次に,高次元チャネルにおけるサポート機能やクエリ機能の類似部分を発掘・拡張するために,クロスリファレンス機構を採用している。実験の結果,CTデータセットとMRIデータセットの両方で良好な結果が得られた。

Deep learning models have become the mainstream method for medical image segmentation, but they require a large manually labeled dataset for training and are difficult to extend to unseen categories. Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples. The majority of the current methods employ a prototype learning architecture, which involves expanding support prototype vectors and concatenating them with query features to conduct conditional segmentation. However, such a framework potentially focuses more on query features and may neglect the correlation between support and query features. In this paper, we propose a novel self-supervised few-shot medical image segmentation network with Cross-Reference Transformer, which addresses the lack of interaction between the support image and the query image. We first enhance the correlation features between the support set image and the query image using a bidirectional cross-attention module. Then, we employ a cross-reference mechanism to mine and enhance the similar parts of support features and query features in high-dimensional channels. Experimental results show that the proposed model achieves good results on both CT and MRI datasets.

翻訳日:2023-06-22 02:53:41 公開日:2023-06-18

# 微調整事前学習言語モデルのためのk-NNの再検討

Revisiting k-NN for Fine-tuning Pre-trained Language Models ( http://arxiv.org/abs/2304.09058v2 )

ライセンス: Link先を確認

Lei Li, Jing Chen, Bozhong Tian, Ningyu Zhang

(参考訳) パラメトリックベースの熱心な学習者であるプレトレーニング言語モデル(PLM)は、現在の自然言語処理(NLP)のパラダイムにおいて事実上の選択肢となっている。対照的に、k-Nearest-Neighbor(kNN)分類器は遅延学習パラダイムであり、過度なフィットと孤立したノイズを軽減する傾向がある。本稿では, PLM に基づく分類器の拡張のために kNN 分類器を再検討する。方法論的なレベルでは,(1)kNNを事前知識として活用してトレーニングプロセスの校正を行う,という2つのステップで,PLMのテキスト表現を持つkNNを採用することを提案する。 2) kNNで予測される確率分布とPLMの分類器の確率分布を線形に補間する。私たちのアプローチの核心は、kNN校正トレーニングの実装です。これは、トレーニングプロセスにおいて、予測結果を簡単な例と難しい例の指標として扱います。アプリケーションシナリオの多様性の観点から、我々は8つのエンドタスクに対して、微調整、急速調整、ゼロショット、少数ショット、完全教師付き設定に関する広範な実験を行います。我々は,NLPを効率的にするための古典的手法の力をコミュニティに再考させることを願っている。コードとデータセットはhttps://github.com/zjunlp/Revisit-KNNで公開されている。

Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (kNN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit kNN classifiers for augmenting the PLMs-based classifiers. At the methodological level, we propose to adopt kNN with textual representations of PLMs in two steps: (1) Utilize kNN as prior knowledge to calibrate the training process. (2) Linearly interpolate the probability distribution predicted by kNN with that of the PLMs' classifier. At the heart of our approach is the implementation of kNN-calibrated training, which treats predicted results as indicators for easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning, prompt-tuning paradigms and zero-shot, few-shot and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.
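
Step (2) of the abstract, linearly interpolating the kNN distribution with the classifier's distribution, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the squared Euclidean distance, the uniform vote among neighbours, the weight name `lam`, and the toy two-dimensional "representations" are all hypothetical choices.

```python
from collections import Counter


def knn_distribution(query, memory, k, num_classes):
    """Class distribution from the k nearest stored examples.

    memory: list of (vector, label) pairs. Assumption: squared Euclidean
    distance and one equal vote per neighbour.
    """
    if not memory:
        raise ValueError("memory must contain at least one example")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(memory, key=lambda item: sq_dist(query, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return [counts.get(c, 0) / len(nearest) for c in range(num_classes)]


def interpolate(p_knn, p_plm, lam):
    """Linear interpolation: lam * p_knn + (1 - lam) * p_plm."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_plm)]


if __name__ == "__main__":
    # Toy stand-ins for PLM representations of labelled training examples.
    memory = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([5.0, 5.0], 1), ([6.0, 5.0], 1)]
    p_knn = knn_distribution([0.2, 0.1], memory, k=3, num_classes=2)
    p_plm = [0.4, 0.6]  # hypothetical classifier output for the same query
    print(interpolate(p_knn, p_plm, lam=0.5))
```

Because both inputs are probability distributions and the weights sum to one, the interpolated output is again a distribution; `lam` controls how much the neighbours can override the classifier.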

翻訳日:2023-06-22 02:53:20 公開日:2023-06-18

# 人間-AIチームにおける統計的プロアクティブダイアログモデリングのための信頼度対応ユーザシミュレータの開発

Development of a Trust-Aware User Simulator for Statistical Proactive Dialog Modeling in Human-AI Teams ( http://arxiv.org/abs/2304.11913v2 )

ライセンス: Link先を確認

Matthias Kraus, Ron Riekenbrauck, Wolfgang Minker

(参考訳) 近年,人間-AIチームという概念が注目されている。人間とAIチームメイトとの効果的なコラボレーションのためには、緊密な協調と効果的なコミュニケーションには、積極的活動が不可欠である。しかしながら、人間をサポートするAIベースのシステムのための適切な能動性の設計は、まだオープンな問題であり、課題である。本稿では,プロアクティブダイアログポリシーのトレーニングとテストのためのコーパスベースユーザシミュレータの開発について述べる。このシミュレータは、プロアクティブダイアログとそのユーザ信頼への影響に関するインフォームド知識を取り入れ、社会デポグラフィ的特徴やパーソナリティ特性を含むユーザの行動や個人情報をシミュレートする。 2つの異なるシミュレーション手法を比較し、タスクステップベースの手法により、逐次依存関係のモデリングの強化により、全体的な結果が改善された。本研究では,人間-AIチーム改善のための対話ゲーム設定において,適切なプロアクティブ戦略を探索し,評価するための有望な方法を提案する。

The concept of a Human-AI team has gained increasing attention in recent years. For effective collaboration between humans and AI teammates, proactivity is crucial for close coordination and effective communication. However, the design of adequate proactivity for AI-based systems to support humans is still an open question and a challenging topic. In this paper, we present the development of a corpus-based user simulator for training and testing proactive dialog policies. The simulator incorporates informed knowledge about proactive dialog and its effect on user trust and simulates user behavior and personal information, including socio-demographic features and personality traits. Two different simulation approaches were compared, and a task-step-based approach yielded better overall results due to enhanced modeling of sequential dependencies. This research presents a promising avenue for exploring and evaluating appropriate proactive strategies in a dialog game setting for improving Human-AI teams.

翻訳日:2023-06-22 02:40:54 公開日:2023-06-18

# 6次元非定常マニピュレーションのためのハイブリッドアクタ・クリティカルマップの学習

Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation ( http://arxiv.org/abs/2305.03942v2 )

ライセンス: Link先を確認

Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton, David Held

(参考訳) 物を握らずに操作することは、人間の器用さに欠かせない要素であり、非理解的な操作と呼ばれる。非包括的操作は、オブジェクトとのより複雑な相互作用を可能にするだけでなく、グリップとオブジェクトの相互作用を推論する際の課題も提示する。本研究では,物体の6次元非包括的操作のための強化学習手法であるHybrid Actor-Critic Maps for Manipulation (HACMan)を紹介する。 HACManは、オブジェクトポイントクラウドから接触位置を選択することと、ロボットが接触した後どのように動くかを記述した一連の動きパラメータからなる、時間的に制限された空間的空間的なオブジェクト中心のアクション表現を提案する。我々は、このハイブリッド離散連続アクション表現で学習するために、既存のオフポリチィRLアルゴリズムを変更した。シミュレーションおよび実世界における6次元オブジェクトポーズアライメントタスクにおけるHACManの評価を行った。ランダム化された初期ポーズ,ランダム化された6d目標,多様なオブジェクトカテゴリを備えた最難のタスクでは,性能低下を伴わないオブジェクトカテゴリに対する強力な一般化が実証され,実世界でのゼロショット転送で89%の成功率と50%の成功率を達成した。代替アクション表現と比較して、HACManは最高のベースラインの3倍以上の成功率を達成する。ゼロショットのsim2realトランスファーでは、動的かつ接触に富んだ非包括的スキルを用いて、現実の未確認物体をうまく操作できる。ビデオはプロジェクトのwebサイト(https://hacman-2023.github.io)で見ることができる。

Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
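
The hybrid discrete-continuous action described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a critic has already produced one score per point and an actor one motion-parameter vector per point; the function name `select_hybrid_action` and the data layout are hypothetical and are not taken from the HACMan code.

```python
def select_hybrid_action(points, point_scores, point_motions):
    """Pick a contact point (discrete choice) and its motion parameters (continuous part).

    points        -- list of (x, y, z) tuples from the object point cloud
    point_scores  -- one critic score per point (assumed to be given)
    point_motions -- one motion-parameter tuple per point (assumed to be given)
    """
    if not (len(points) == len(point_scores) == len(point_motions)):
        raise ValueError("points, scores and motions must have the same length")
    # Discrete part: index of the highest-scoring point.
    best = max(range(len(points)), key=lambda i: point_scores[i])
    # Continuous part: the motion parameters attached to that point.
    return best, points[best], point_motions[best]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
scores = [0.2, 0.9, 0.5]
motions = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
index, contact, motion = select_hybrid_action(points, scores, motions)
```

Choosing the location by an argmax over per-point scores is a greedy evaluation-time rule; a training loop would typically sample from the scores instead, which this sketch does not cover.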

翻訳日:2023-06-22 02:32:22 公開日:2023-06-18

# ChatGPTの動作記憶能力に関する実証的研究

Working Memory Capacity of ChatGPT: An Empirical Study ( http://arxiv.org/abs/2305.03731v2 )

ライセンス: Link先を確認

Dongyu Gong, Xingchen Wan, Dingmin Wang

(参考訳) ワーキングメモリは、人間の知性と人工知能の両方において重要な側面であり、情報の一時記憶と操作のためのワークスペースとして機能する。本稿では,OpenAI が開発した大規模言語モデルである ChatGPT (gpt-3.5-turbo) の動作記憶能力について,様々な条件下での音声および空間的 n-back タスクの性能を検証し,系統的に評価する。実験の結果,nが増加するにつれてchatgptの性能が大幅に低下することが明らかとなり(作業記憶に格納する情報が増える必要がある),作業記憶能力の限界がヒトに非常に近いことが示唆された。さらに,chatgptの性能に対する異なる指導戦略の影響を調査し,キャパシティ制限の基本パターンが持続することを確認した。実験結果から,n-backタスクは大規模言語モデルのワーキングメモリ容量をベンチマークするためのツールとして機能し,aiワーキングメモリの強化とaiモデルによるヒューマンワーキングメモリの理解の深化を目的とした今後の取り組みの可能性を秘めている可能性が示唆された。

Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT (gpt-3.5-turbo), a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT experiences significant declines in performance as n increases (which necessitates more information to be stored in working memory), suggesting a limit to the working memory capacity strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working memory and deepening our understanding of human working memory through AI models.
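
For readers unfamiliar with the n-back paradigm, the sketch below generates a verbal n-back sequence and scores a list of match/non-match responses. The alphabet, the match rate, and the function names are illustrative assumptions, not the settings used in the paper.

```python
import random


def make_nback_sequence(length, n, alphabet="bcdfghjklm", match_rate=0.3, seed=0):
    """Return (sequence, labels); labels[i] is True when item i equals item i - n."""
    rng = random.Random(seed)
    seq = []
    for i in range(length):
        if i >= n and rng.random() < match_rate:
            # Force a target trial by repeating the item from n steps back.
            seq.append(seq[i - n])
        else:
            # Non-target trial: exclude the letter that would create a match.
            candidates = [c for c in alphabet if i < n or c != seq[i - n]]
            seq.append(rng.choice(candidates))
    labels = [i >= n and seq[i] == seq[i - n] for i in range(length)]
    return seq, labels


def score_responses(responses, labels):
    """Return (hit_rate, false_alarm_rate) for boolean responses against labels."""
    targets = sum(labels)
    nontargets = len(labels) - targets
    hits = sum(1 for r, t in zip(responses, labels) if r and t)
    false_alarms = sum(1 for r, t in zip(responses, labels) if r and not t)
    hit_rate = hits / targets if targets else 0.0
    fa_rate = false_alarms / nontargets if nontargets else 0.0
    return hit_rate, fa_rate


seq, labels = make_nback_sequence(20, 2)
hit, fa = score_responses([False, False, True, True], [False, False, True, False])
```

Increasing `n` while holding the sequence length fixed is what raises the memory load in this task.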

翻訳日:2023-06-22 02:31:53 公開日:2023-06-18

# ChatGPTとBardはデータプロバイダと利益を共有すべきか? AI時代の新しいビジネスモデル

Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era ( http://arxiv.org/abs/2305.02555v2 )

ライセンス: Link先を確認

Dong Zhang

(参考訳) ChatGPTのようなさまざまなAIツールが普及するにつれて、私たちは真のAIの時代に入りつつある。例外的なAIツールがすぐにかなりの利益を得ると予想できる。 AIツールは、従来の利害関係者や株主に加えて、トレーニングデータプロバイダと収益を共有するべきか? 答えはイエスです。大規模言語モデルのような大規模なAIツールは、継続的に改善するためには、より高品質なデータを必要とするが、現在の著作権法は様々な種類のデータへのアクセスを制限する。 AIツールとデータプロバイダ間で収益を共有することで、現在の敵対的なゼロサムゲーム関係を、AIツールと著作権のあるデータ所有者の大多数が協力的かつ相互に利益をもたらすものにすることができる。しかし、現在の収益分配ビジネスモデルは、次のAI時代のAIツールでは機能しない。なぜなら、ウェブサイトベースのトラフィックやクリックのようなアクションのための最も広く使われているメトリクスは、生成AIツールのプロンプトやコストといった新しいメトリクスに置き換えられるからだ。まったく新しい収益分配ビジネスモデルは、AIツールからほぼ独立して、データプロバイダに簡単に説明できる必要があるが、各データプロバイダのデータエンゲージメントを測定するために、プロンプトベースのスコアリングシステムを確立する必要がある。本稿では、分類とコンテンツ類似性モデルに基づいて、AIツールのすべてのデータプロバイダに対して、このようなスコアリングシステムを構築する方法を体系的に議論し、それを構築するためのAIツールやサードパーティの要件を概説する。このようなスコアリングシステムを使ってデータプロバイダと収益を共有することで、より多くのデータ所有者が収益共有プログラムに参加することができる。これは、すべての当事者が恩恵を受ける、実用的なAI時代になるでしょう。

With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is yes. Large AI tools, such as large language models, always require more and better-quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure the data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.
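
A prompt-based scoring system of the kind discussed above might look roughly like the following sketch. Word-level Jaccard overlap stands in for the content similarity model, and every name here (`jaccard`, `provider_scores`, `split_revenue`) is a hypothetical illustration rather than the paper's actual design.

```python
def jaccard(a, b):
    """Word-set overlap between two texts; a crude stand-in for a similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def provider_scores(prompts, provider_docs):
    """Accumulate, per provider, the best similarity between each prompt and its documents."""
    scores = {name: 0.0 for name in provider_docs}
    for prompt in prompts:
        for name, docs in provider_docs.items():
            scores[name] += max((jaccard(prompt, d) for d in docs), default=0.0)
    return scores


def split_revenue(revenue, scores):
    """Divide revenue in proportion to the accumulated scores."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: revenue * s / total for name, s in scores.items()}


docs = {"a": ["cats and dogs"], "b": ["stock market news"]}
scores = provider_scores(["cats and dogs"], docs)
shares = split_revenue(100.0, scores)
```

A proportional split is only one possible payout rule; it is used here because it is easy to explain to data providers, which the abstract names as a requirement.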

翻訳日:2023-06-22 02:31:31 公開日:2023-06-18

# WSSSに代わるもの? 弱教師付きセマンティックセマンティックセグメンテーション問題におけるセグメンテーションモデル(SAM)の実証的研究

An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems ( http://arxiv.org/abs/2305.01586v2 )

ライセンス: Link先を確認

Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes

(参考訳) Segment Anything Model (SAM)は優れたパフォーマンスと汎用性を示しており、様々なタスクに有望なツールとなっている。本稿では,Wakly-Supervised Semantic Segmentation (WSSS)におけるSAMの適用について検討する。特に,画像レベルのクラスラベルのみを付与した擬似ラベル生成パイプラインとしてSAMを適用した。ほとんどのケースで目覚ましい結果が見られたが、特定の限界も特定できた。本研究は,PASCAL VOCとMS-COCOの性能評価を含む。このレポートは、WSSSにSAMを採用するためのさらなる調査と、より広範な現実世界のアプリケーションを促進することを期待する。

The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks. In this report, we explore the application of SAM in Weakly-Supervised Semantic Segmentation (WSSS). Particularly, we adapt SAM as the pseudo-label generation pipeline given only the image-level class labels. While we observed impressive results in most cases, we also identify certain limitations. Our study includes performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable improvements over the latest state-of-the-art methods on both datasets. We anticipate that this report encourages further explorations of adopting SAM in WSSS, as well as wider real-world applications.

翻訳日:2023-06-22 02:29:58 公開日:2023-06-18

# 軽量オールコンベネト・トランスファー学習による表面emgに基づくセッション間/サブジェクション認識

Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning ( http://arxiv.org/abs/2305.08014v2 )

ライセンス: Link先を確認

Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, and Wei-Ping Zhu

(参考訳) 低解像度のHD-sEMG画像を用いたジェスチャー認識は、より流動的で自然な筋肉-コンピュータインターフェースを開発するための新たな道を開く。しかし、セッション間およびサブジェクト間シナリオ間のデータ変動は大きな課題となる。既存のアプローチでは、非常に大きく複雑なConvNetまたは2SRNNベースのドメイン適応手法を使用して、これらのセッション間およびオブジェクト間データのばらつきに起因する分散シフトを近似した。したがって、これらの方法は、何百万ものトレーニングパラメータと、事前トレーニングと適応段階の両方で、トレーニング済みおよびターゲットドメインデータセットを学習する必要がある。その結果、リアルタイムアプリケーションへのデプロイには、ハイエンドのリソースバウンドと計算コストが非常にかかる。本稿では,この問題を解決するために,軽量なall-convnet and transfer learning(tl)を活用した軽量なall-convnet+tlモデルを提案する。 all-convnet+tlモデルは畳み込み層のみで構成されており、セッション間およびサブジェクト間データ可変性によって引き起こされる分散シフトに対処するための不変および判別表現を学習するための単純かつ効率的なフレームワークである。 4つのデータセットに対する実験により,提案手法は,既存の手法よりも大きなマージンで優れており,セッション間およびオブジェクト間シナリオにおける最先端の結果が得られ,セッション内ジェスチャ認識において同等あるいは競合的に実行されることを示した。これらのパフォーマンスギャップは、少数のデータ(例えば単一のトライアル)がターゲットドメインで利用可能になったときにさらに増加する。これらの顕著な実験結果は、現在の最先端モデルが、sEMGベースのセッション間およびオブジェクト間ジェスチャー認識タスクに対して過度にパラメータ化されていることを示す。

Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. Existing approaches employ very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate the distribution shift caused by this inter-session and inter-subject data variability. Hence, these methods also require learning over millions of training parameters and a large pre-trained and target domain dataset in both the pre-training and adaptation stages. As a result, they demand high-end resources and are computationally very expensive to deploy in real-time applications. To overcome this problem, we propose a lightweight All-ConvNet+TL model that leverages lightweight All-ConvNet and transfer learning (TL) for the enhancement of inter-session and inter-subject gesture recognition performance. The All-ConvNet+TL model consists solely of convolutional layers, a simple yet efficient framework for learning invariant and discriminative representations to address the distribution shifts caused by inter-session and inter-subject data variability. Experiments on four datasets demonstrate that our proposed methods outperform the most complex existing approaches by a large margin, achieve state-of-the-art results in inter-session and inter-subject scenarios, and perform on par or competitively on intra-session gesture recognition. These performance gaps increase even more when a tiny amount (e.g., a single trial) of data is available on the target domain for adaptation. These outstanding experimental results provide evidence that the current state-of-the-art models may be overparameterized for sEMG-based inter-session and inter-subject gesture recognition tasks.

翻訳日:2023-06-22 02:23:09 公開日:2023-06-18

# 健康保険請求の経時的変化に関する大規模研究

Large-Scale Study of Temporal Shift in Health Insurance Claims ( http://arxiv.org/abs/2305.05087v2 )

ライセンス: Link先を確認

Christina X Ji, Ahmed M Alaa, David Sontag

(参考訳) 臨床結果を予測する機械学習モデルは歴史的データを用いて開発されている。しかし、たとえこれらのモデルが近い将来デプロイされるとしても、データセットの時間的シフトは理想的なパフォーマンスに満たない可能性がある。この現象を捉えるために,歴史的モデルがもはやその結果を予測するのに最適でない場合,特定の時点において予測される結果が非定常であるようなタスクを考える。本研究では,集団レベルでの時間的シフトを検証するためのアルゴリズムを構築した。次に,大規模なタスク群における時間変化の振り返りスキャンを行うためのメタアルゴリズムを構築した。我々のアルゴリズムは、医療の時間的シフトを私たちの知識にまとめて評価することを可能にする。我々は、2015年から2020年にかけて、医療保険請求データセットに基づいて242の医療結果を評価し、1,010のタスクを作成します。タスクの9.7%は人口レベルでの時間的シフトを示し、93.0%は人口移動の影響を受けている。臨床的意義を理解するためにケーススタディを掘り下げる。我々の分析は、医療における時間的シフトの広範性を強調している。

Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task--that is, an outcome to be predicted at a particular time point--to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform the first comprehensive evaluation of temporal shift in healthcare to our knowledge. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.

翻訳日:2023-06-22 02:20:38 公開日:2023-06-18

# 区別可能なセグメンテーションを用いたエンドツーエンド同時音声翻訳

End-to-End Simultaneous Speech Translation with Differentiable Segmentation ( http://arxiv.org/abs/2305.16093v2 )

ライセンス: Link先を確認

Shaolei Zhang, Yang Feng

(参考訳) エンドツーエンド同時音声翻訳(simulst)は、ストリーミング音声入力を受信しながら翻訳を出力する(すなわち、ストリーミング音声翻訳)ため、音声入力を分割して、現在の受信音声に基づいて翻訳する必要がある。しかし、不利な瞬間に音声入力を分割すると、音響的完全性が損なわれ、翻訳モデルの性能に悪影響を及ぼす可能性がある。したがって、翻訳モデルが高品質な翻訳を生み出すのに役立つこれらの瞬間に音声入力を分割する学習は、シマルストの鍵となる。既存のSimulST法は、固定長セグメンテーションまたは外部セグメンテーションモデルのいずれかを使用しており、常に基礎となる翻訳モデルとセグメンテーションを分離している。そこで本稿では,SimulST における微分可能セグメンテーション (DiSeg) を提案し,基礎となる翻訳モデルから直接セグメンテーションを学習する。 DiSegは、予測トレーニングによってハードセグメンテーションを微分可能にし、翻訳モデルと共同でトレーニングし、翻訳効果セグメンテーションを学ぶことができる。実験結果から,DiSegは最先端性能を実現し,セグメンテーション能力に優れることが示された。

End-to-end simultaneous speech translation (SimulST) outputs translation while receiving the streaming speech inputs (a.k.a. streaming speech translation), and hence needs to segment the speech inputs and then translate based on the speech received so far. However, segmenting the speech inputs at unfavorable moments can disrupt the acoustic integrity and adversely affect the performance of the translation model. Therefore, learning to segment the speech inputs at those moments that are beneficial for the translation model to produce high-quality translation is the key to SimulST. Existing SimulST methods, using either fixed-length segmentation or an external segmentation model, always separate segmentation from the underlying translation model, and this gap results in segmentation outcomes that are not necessarily beneficial for the translation process. In this paper, we propose Differentiable Segmentation (DiSeg) for SimulST to directly learn segmentation from the underlying translation model. DiSeg makes hard segmentation differentiable through the proposed expectation training, enabling it to be jointly trained with the translation model and thereby learn translation-beneficial segmentation. Experimental results demonstrate that DiSeg achieves state-of-the-art performance and exhibits superior segmentation capability.

翻訳日:2023-06-22 01:52:30 公開日:2023-06-18

# 低コストセキュリティ検査のための大規模単発ミリ波イメージングに向けて

Towards Large-scale Single-shot Millimeter-wave Imaging for Low-cost Security Inspection ( http://arxiv.org/abs/2305.15750v2 )

ライセンス: Link先を確認

Liheng Bian, Daoyu Li, Shuoguang Wang, Chunyang Teng, Huteng Liu, Hanwen Xu, Xuyang Chang, Guoqiang Zhao, Shiyong Li, Jun Zhang

(参考訳) 安全検査のための有望な技術としてミリ波イメージング(MMW)が登場している。画像分解能、透過性、人間の安全性の微妙なバランスを実現し、低周波マイクロ波に比べて高い分解能、可視光よりも強い透過性、X線より強い安全性を実現している。近年の進歩にもかかわらず、必要な大規模アンテナアレイの高コストは、実際にMMWイメージングを広く採用することを妨げている。この課題に取り組むため,sparseアンテナアレーを用いた大規模単発mmwイメージングフレームワークを報告し,解釈可能な学習方式で低コストかつ高精度なセキュリティ検査を実現する。まず,大規模アレイにおける各要素の統計的ランク付けについて検討するため,全サンプルのMMWエコーを収集した。これらの要素はランキングに基づいてサンプリングされ、実験的に最適なスパースサンプリング戦略を構築し、アンテナアレイのコストを最大1桁削減する。さらに,スパースサンプルエコーから頑健で正確な画像再構成を実現する非学習的解釈可能な学習手法を考案した。最後に,物体の自動検出のためのニューラルネットワークを開発し,10%のスパースアレイを用いた隠れたセンチメートルサイズのターゲットの検出を実験的に実証した。報告した手法の性能は、精度、リコール、mAP50を含む様々な指標で既存のMMW撮像方式よりも50%以上優れている。このような強力な検出能力とオーダー・オブ・マグニチュードのコスト削減により、この技術は大規模単発MMWイメージングの実用的な方法となり、さらに実用的な応用が期待できる。

Millimeter-wave (MMW) imaging is emerging as a promising technique for safe security inspection. It achieves a delicate balance between imaging resolution, penetrability and human safety, offering higher resolution than low-frequency microwaves, stronger penetrability than visible light, and greater safety than X-rays. Despite recent advances over the last decades, the high cost of the requisite large-scale antenna array hinders widespread adoption of MMW imaging in practice. To tackle this challenge, we report a large-scale single-shot MMW imaging framework using a sparse antenna array, achieving low-cost but high-fidelity security inspection under an interpretable learning scheme. We first collected extensive full-sampled MMW echoes to study the statistical ranking of each element in the large-scale array. These elements are then sampled based on the ranking, building the experimentally optimal sparse sampling strategy that reduces the cost of the antenna array by up to one order of magnitude. Additionally, we derived an untrained interpretable learning scheme, which realizes robust and accurate image reconstruction from sparsely sampled echoes. Last, we developed a neural network for automatic object detection, and experimentally demonstrated successful detection of concealed centimeter-sized targets using a 10% sparse array, whereas all the other contemporary approaches failed at the same sampling ratio. The reported technique outperforms existing MMW imaging schemes by more than 50% on various metrics including precision, recall, and mAP50. With such strong detection ability and order-of-magnitude cost reduction, we anticipate that this technique provides a practical way for large-scale single-shot MMW imaging, and could promote its further practical applications.
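
The ranking-based sparse sampling step can be pictured with the short sketch below. It assumes each antenna element already has a scalar importance value derived from the full-sampled echoes; how that statistic is computed is not stated in the abstract, so `importance` and `select_sparse_elements` are hypothetical placeholders.

```python
def select_sparse_elements(importance, keep_ratio=0.10):
    """Return the sorted indices of the top-ranked array elements.

    importance -- one score per antenna element (assumed precomputed)
    keep_ratio -- fraction of elements to keep, e.g. 0.10 for a 10% sparse array
    """
    k = max(1, round(len(importance) * keep_ratio))
    # Rank element indices from most to least important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


importance = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.5]
kept_10 = select_sparse_elements(importance)
kept_20 = select_sparse_elements(importance, keep_ratio=0.2)
```

Keeping a fixed fraction of elements is what ties the array cost directly to `keep_ratio`.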

翻訳日:2023-06-22 01:52:09 公開日:2023-06-18

# テキスト誘導拡散モデルの興味ある特性

Intriguing Properties of Text-guided Diffusion Models ( http://arxiv.org/abs/2306.00974v3 )

ライセンス: Link先を確認

Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, and Alan Yuille

(参考訳) テキスト誘導拡散モデル(TDM)は広く応用されているが、予期せず失敗することがある。よくある失敗は (i)自然に見えるテキストは、間違った内容の画像を生成させるか、または (ii)同じテキストプロンプトで条件付けされているにもかかわらず、非常に異なる、あるいは無関係な出力を生成する潜在変数の異なるランダムなサンプル。本研究では,TDMの障害モードについて,より詳細に研究し,理解することを目的とする。これを実現するために,画像分類器を代理損失関数として利用するTDMに対する敵対攻撃であるSAGEを提案し,画像生成における予期せぬ動作や故障事例を自動的に発見するために,TDMの離散的なプロンプト空間と高次元潜在空間を探索する。我々は,sageが分類器ではなく拡散モデルの障害事例を見出すために,いくつかの技術的貢献を行い,人間の研究で検証する。本研究は,これまでに体系的に研究されていないtdmの4つの興味をそそる性質を明らかにした。(1)入力テキストのセマンティクスを捉えない画像を生成する,様々な自然テキストプロンプトを見つける。これらの障害を根本原因に基づいた10の異なるタイプに分類する。 2) テキストプロンプトから独立して歪んだ画像につながる潜伏空間(外れ値ではない)のサンプルを見つけ, 潜伏空間の一部が十分に構造化されていないことを示唆した。 3)テキストプロンプトと無関係な自然画像に繋がる潜在サンプルを見つけ、潜在空間とプロンプト空間の間の潜在的な不一致を示唆する。 (4) 入力プロンプトに1つの逆数トークンを埋め込むことで、CLIPスコアに最小限の影響を与えながら、さまざまな特定のターゲットオブジェクトを生成することができる。これは言語表現の脆弱さを示し、潜在的な安全性の懸念を提起する。

Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables that generate vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, an adversarial attack on TDMs that uses image classifiers as surrogate loss functions, to search over the discrete prompt space and the high-dimensional latent space of TDMs to automatically discover unexpected behaviors and failure cases in the image generation. We make several technical contributions to ensure that SAGE finds failure cases of the diffusion model, rather than the classifier, and verify this in a human study. Our study reveals four intriguing properties of TDMs that have not been systematically studied before: (1) We find a variety of natural text prompts producing images that fail to capture the semantics of input texts. We categorize these failures into ten distinct types based on the underlying causes. (2) We find samples in the latent space (which are not outliers) that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured. (3) We also find latent samples that lead to natural-looking images which are unrelated to the text prompt, implying a potential misalignment between the latent and prompt spaces. (4) By appending a single adversarial token embedding to an input prompt we can generate a variety of specified target objects, while only minimally affecting the CLIP score. This demonstrates the fragility of language representations and raises potential safety concerns.

翻訳日:2023-06-22 01:22:48 公開日:2023-06-18

# マスク画像モデリングによる自己教師付き学習フレームワークに基づく新しいドライバ抽出行動検出

A Novel Driver Distraction Behavior Detection Based on Self-Supervised Learning Framework with Masked Image Modeling ( http://arxiv.org/abs/2306.00543v3 )

ライセンス: Link先を確認

Yingzhi Zhang, Taiguo Li, Chao Li and Xinghong Zhou

(参考訳) ドライバーの気晴らしは毎年かなりの数の交通事故を引き起こし、経済的な損失と損失をもたらす。現在、商用車両の自動化のレベルは完全に無人ではなく、ドライバーは依然として車両の操作と制御において重要な役割を担っている。そのため,道路安全には運転者の注意散らし行動検出が不可欠である。現在、ドライバーの注意散逸検出は主に従来の畳み込みニューラルネットワーク(cnn)と教師付き学習方法に依存している。しかし、ラベル付きデータセットの高コスト、高レベルのセマンティック情報をキャプチャする能力の制限、一般化性能の低下など、依然として課題がある。そこで本研究では,ドライバの注意散逸行動検出のためのマスク画像モデルに基づく自己教師付き学習手法を提案する。まず,マスク付き画像モデリング(MIM)のための自己教師型学習フレームワークを導入し,データセットのラベル付けによる人的・物質的消費の問題を解決する。次に、Swin Transformerがエンコーダとして使用される。 Swin Transformerブロックを再構成し、ウィンドウマルチヘッド自己アテンション(W-MSA)とシフトウィンドウマルチヘッド自己アテンション(SW-MSA)検出ヘッドの分布を全ステージにわたって調整することで、より軽量化を実現する。最後に、モデルの認識と一般化能力を強化するために、様々なデータ拡張戦略と最適なランダムマスキング戦略が使用される。大規模運転注意散逸行動データセットの試験結果から,本論文で提案した自己教師学習法は99.60%の精度で,高度な教師付き学習法の優れた性能を近似する。

Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. Currently, the level of automation in commercial vehicles is far from completely unmanned, and drivers still play an important role in operating and controlling the vehicle. Therefore, driver distraction behavior detection is crucial for road safety. At present, driver distraction detection primarily relies on traditional Convolutional Neural Networks (CNN) and supervised learning methods. However, there are still challenges such as the high cost of labeled datasets, limited ability to capture high-level semantic information, and weak generalization performance. In order to solve these problems, this paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection. First, a self-supervised learning framework for masked image modeling (MIM) is introduced to reduce the heavy human and material costs of dataset labeling. Second, the Swin Transformer is employed as an encoder. Performance is enhanced by reconfiguring the Swin Transformer block and adjusting the distribution of the number of window multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA) detection heads across all stages, which also makes the model more lightweight. Finally, various data augmentation strategies are used along with the best random masking strategy to strengthen the model's recognition and generalization ability. Test results on a large-scale driver distraction behavior dataset show that the self-supervised learning method proposed in this paper achieves an accuracy of 99.60%, approximating the excellent performance of advanced supervised learning methods.

翻訳日:2023-06-22 01:21:59 公開日:2023-06-18

# 雑音ラベルを用いた線形距離メトリック学習

Linear Distance Metric Learning with Noisy Labels ( http://arxiv.org/abs/2306.03173v2 )

ライセンス: Link先を確認

Meysam Alishahi, Anna Little, and Jeff M. Phillips

(参考訳) 線形距離距離学習では、あるユークリッド距離空間内のデータを与えられ、ある距離条件を可能な限り尊重する別のユークリッド距離空間への適切な線型写像を見つけることが目的である。本稿では,一般連続凸損失最適化問題に還元する単純でエレガントな手法を定式化し,異なる雑音モデルに対して対応する損失関数を導出する。その結果、データがノイズである場合でも、十分なサンプルへのアクセスを提供する精度で基底真理線形計量を学習できることを示し、対応するサンプル複雑性を限定する。さらに,学習したモデルを低ランクモデルに切り離し,損失関数とパラメータの精度を良好に維持する効果的な手法を提案する。合成および実データ集合に関するいくつかの実験的な観察は、我々の理論的結果を支持し、知らせる。

In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy, the ground truth linear metric can be learned with any precision provided access to enough samples, and we provide a corresponding sample complexity bound. Moreover, we present an effective way to truncate the learned model to a low-rank model that can provably maintain the accuracy in loss function and in parameters -- the first such results of this type. Several experimental observations on synthetic and real data sets support and inform our theoretical results.
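
As background for the abstract above, the standard way to write a learned linear metric and a rank-k truncation of it is sketched below. The notation (A, M, the eigenpairs) is assumed here for illustration, and eigenvalue truncation is a common choice that may differ from the paper's exact procedure.

```latex
% Assumed notation: A is the learned linear map, M the induced positive semidefinite matrix.
d_M(x, y)^2 = \| A x - A y \|_2^2 = (x - y)^\top M \, (x - y),
\qquad M = A^\top A \succeq 0 .

% Rank-k truncation via the eigendecomposition
% M = \sum_{i=1}^{d} \lambda_i u_i u_i^\top with \lambda_1 \ge \dots \ge \lambda_d \ge 0:
M_k = \sum_{i=1}^{k} \lambda_i u_i u_i^\top .
```

Because M is positive semidefinite, discarding the smallest eigenvalues keeps M_k positive semidefinite, so it still defines a valid (pseudo-)metric.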

翻訳日:2023-06-22 01:11:11 公開日:2023-06-18

# HierarchicalEOM.jl: オープン量子システムにおける階層的運動方程式のための効率的なJuliaフレームワーク

HierarchicalEOM.jl: An efficient Julia framework for hierarchical equations of motion in open quantum systems ( http://arxiv.org/abs/2306.07522v3 )

ライセンス: Link先を確認

Yi-Te Huang, Po-Chen Kuo, Neill Lambert, Mauro Cirio, Simon Cross, Shen-Liang Yang, Franco Nori, Yueh-Nan Chen

(参考訳) 我々は,複数のボソニック環境とフェルミオン環境を同時に結合したシステムのダイナミクスを減少させるために,階層的運動方程式(heom)を統合するためのjuliaフレームワークであるhierarchicaleom.jlというオープンソースソフトウェアパッケージを導入する。 HierarchicalEOM.jlは、ボソニックおよびフェルミオンスペクトル、定常状態、および全ての補助密度作用素(ADO)の拡張空間におけるフルダイナミックスを計算する方法の集合を特徴としている。 ADOのマルチインデックスの必要な処理は、ユーザフレンドリーなインターフェースによって実現される。 2つのフェルミオン貯水池と相互作用する1つの不純物(アンダーソンモデル)と1つのボゾンと2つのフェルミオン貯水池と相互作用する超強結合電荷キャビティ系を解析することにより、パッケージの機能性を実証する。 hierarchyeom.jl は heom liouvillian superoperator の構築において、全ての ados のダイナミクスと定常状態の解法として、このパッケージが確立された python (qutip) の量子ツールボックスの対応するメソッドに関して、桁違いに高速化することができる。

We introduce an open-source software package called "HierarchicalEOM.jl", a Julia framework to integrate the hierarchical equations of motion (HEOM) for the reduced dynamics of a system simultaneously coupled to multiple bosonic and fermionic environments. HierarchicalEOM.jl features a collection of methods to compute bosonic and fermionic spectra, stationary states, and the full dynamics in the extended space of all auxiliary density operators (ADOs). The required handling of the ADO multi-indexes is achieved through a user-friendly interface. We exemplify the functionalities of the package by analyzing a single impurity interacting with two fermionic reservoirs (Anderson model), and an ultra-strongly coupled charge-cavity system interacting with one bosonic and two fermionic reservoirs. HierarchicalEOM.jl allows for an order of magnitude speedup in the construction of the HEOM Liouvillian superoperator, and in solving dynamics and stationary states for all ADOs, with respect to the corresponding methods in the Quantum Toolbox in Python (QuTiP), upon which this package is founded.

翻訳日:2023-06-22 01:04:35 公開日:2023-06-18

# LAMM: 言語支援マルチモーダル命令-チューニングデータセット、フレームワーク、ベンチマーク

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark ( http://arxiv.org/abs/2306.06687v2 )

ライセンス: Link先を確認

Zhenfei Yin, Jiong Wang, Jianjian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

(参考訳) 大規模言語モデルは、人工知能の実現への潜在的経路となっている。マルチモーダル大規模言語モデルに関する最近の研究は、視覚モダリティの処理における効果を実証している。本研究では,MLLMの研究をポイントクラウドに拡張し,2次元画像と3次元ポイントクラウド理解のためのLAMMデータセットとLAMMベンチマークを示す。また,MLLMのさらなるモダリティへの拡張を容易にする拡張可能なフレームワークを構築した。私たちの主な貢献は3倍です。 1) LAMM-Dataset と LAMM-Benchmark について述べる。広範な実験によって、データセットとベンチマークの有効性が検証されます。 2)mllmのインストラクションチューニングデータセットとベンチマークを構築するための詳細な方法を示し,mllmに関する今後の研究により,他のドメインやタスク,モダリティへのスケールアップと拡張を高速化する。 3)モダリティの拡張に最適化されたMLLMトレーニングフレームワークを提供する。また、今後の研究を加速するために、ベースラインモデル、総合的な実験観測、分析も提供する。コードとデータセットはhttps://github.com/OpenLAMM/LAMMで公開されている。

Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models (MLLMs) have demonstrated their effectiveness in handling visual modalities. In this work, we extend the research of MLLMs to point clouds and present the LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding. We also establish an extensible framework to facilitate the extension of MLLMs to additional modalities. Our main contribution is three-fold: 1) We present the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We demonstrate the detailed methods of constructing instruction-tuning datasets and benchmarks for MLLMs, which will enable future research on MLLMs to scale up and extend to other domains, tasks, and modalities faster. 3) We provide a preliminary but promising MLLM training framework optimized for the extension of modalities. We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Code and datasets are now available at https://github.com/OpenLAMM/LAMM.

翻訳日:2023-06-22 01:03:44 公開日:2023-06-18

# グラフニューラルネットワークの局所的・グローバル的展望

Local-to-global Perspectives on Graph Neural Networks ( http://arxiv.org/abs/2306.06547v2 )

ライセンス: Link先を確認

Chen Cai

(参考訳) この論文は、グラフ構造化データを処理するための主要なアーキテクチャである、グラフニューラルネットワーク(gnn)のローカルからグローバルへの展望を示している。 GNNをローカルメッセージパッシングニューラルネットワーク(MPNN)とグローバルグラフトランスフォーマーに分類した後、我々は3つの作品を提示した。 1)グローバルGNNの一種である不変グラフネットワークの収束特性について検討する。 2)ローカルMPNNとグローバルグラフ変換器を接続し、 3)グローバルモデリングで使用される標準サブルーチンであるグラフ粗大化にローカルMPNNを使用する。

This thesis presents a local-to-global perspective on graph neural networks (GNNs), the leading architecture for processing graph-structured data. After categorizing GNNs into local Message Passing Neural Networks (MPNNs) and global Graph Transformers, we present three pieces of work: 1) studying the convergence property of a type of global GNN, Invariant Graph Networks, 2) connecting the local MPNN and the global Graph Transformer, and 3) using local MPNNs for graph coarsening, a standard subroutine used in global modeling.

翻訳日:2023-06-22 01:03:25 公開日:2023-06-18

# セルワイズ物体追跡、速度推定、時間経過によるセンサデータの投影のための深層学習法

Deep Learning Method for Cell-Wise Object Tracking, Velocity Estimation and Projection of Sensor Data over Time ( http://arxiv.org/abs/2306.06126v2 )

ライセンス: Link先を確認

Marco Braun, Moritz Luszek, Mirko Meuter, Dominic Spata, Kevin Kollek and Anton Kummert

(参考訳) 環境セグメンテーションと速度推定のための最近のディープラーニング手法は、得られたセンサデータ内の時空間関係を利用する畳み込みリカレントニューラルネットワークに依存している。これらのアプローチは、ConvNetsを利用した新しい入力と記憶データの関連付けにより、シーンダイナミクスを暗黙的に導き出す。我々は、convnetがこのタスクのアーキテクチャ上の制約に苦しむ様子を示す。そこで本研究では,トランスフォーマー機構を応用した新しいリカレントニューラルネットワークユニットを提示することにより,センサ記録の時系列における時空間相関の活用に関する様々な課題を解決する。このユニット内のオブジェクトエンコーディングは、それぞれセンサ入力とメモリ状態から派生したキー-クエリペアを関連付け、連続したフレーム間で追跡される。次に、結果の追跡パターンを使用して、シーンダイナミクスと回帰速度を得る。最後のステップでは、抽出された速度推定に基づいてリカレントニューラルネットワークのメモリ状態を投影し、上記の時空間的不一致を解決する。

Current Deep Learning methods for environment segmentation and velocity estimation rely on Convolutional Recurrent Neural Networks to exploit spatio-temporal relationships within obtained sensor data. These approaches derive scene dynamics implicitly by correlating novel input and memorized data utilizing ConvNets. We show how ConvNets suffer from architectural restrictions for this task. Based on these findings, we then provide solutions to various issues on exploiting spatio-temporal correlations in a sequence of sensor recordings by presenting a novel Recurrent Neural Network unit utilizing Transformer mechanisms. Within this unit, object encodings are tracked across consecutive frames by correlating key-query pairs derived from sensor inputs and memory states, respectively. We then use resulting tracking patterns to obtain scene dynamics and regress velocities. In a last step, the memory state of the Recurrent Neural Network is projected based on extracted velocity estimates to resolve aforementioned spatio-temporal misalignment.
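
The key-query correlation mentioned above is, in Transformer terms, an attention computation. The sketch below shows plain scaled dot-product attention weights, with queries standing for memory-state encodings and keys for sensor-input encodings; the function name and the toy vectors are assumptions, and the paper's recurrent unit is more elaborate than this.

```python
import math


def attention_weights(queries, keys):
    """Scaled dot-product attention weights; each row sums to 1.

    queries -- list of vectors (here: encodings derived from the memory state)
    keys    -- list of vectors (here: encodings derived from the sensor input)
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


w = attention_weights([[1.0, 0.0]], [[5.0, 0.0], [0.0, 5.0]])
```

A strong weight between a memory cell and a displaced input cell is the kind of tracking pattern from which a displacement, and hence a velocity, could be read off.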

翻訳日:2023-06-22 01:02:43 公開日:2023-06-18

# VISION Datasets: 視覚に基づく産業検査のベンチマーク

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON ( http://arxiv.org/abs/2306.07890v2 )

ライセンス: Link先を確認

Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ramazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, Meng Cao

(参考訳) ビジョンベースの検査アルゴリズムの進歩にもかかわらず、データ可用性、品質、複雑な生産要件など、現実の産業上の課題は、しばしば未解決のままである。我々は,14の産業検査データセットの多種多様なコレクションであるvision datasetsを紹介する。以前のデータセットとは異なり、VISIONは欠陥検出に汎用性をもたらし、すべての分割にアノテーションマスクを提供し、さまざまな検出方法に対処する。データセットにはインスタンスセグメンテーションアノテーションがあり、正確な欠陥識別を可能にします。 44の欠陥を含む合計18kイメージにより、VISIONは幅広い実世界のプロダクションシナリオを反映しようと試みている。 Vision Datasetsで進行中の2つのチャレンジコンペティションを支援することで、ビジョンベースの産業検査のさらなる進歩を期待する。

Despite progress in vision-based inspection algorithms, real-world industrial challenges -- specifically in data availability, quality, and complex production requirements -- often remain under-addressed. We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges. Unlike previous datasets, VISION brings versatility to defect detection, offering annotation masks across all splits and catering to various detection methodologies. Our datasets also feature instance-segmentation annotation, enabling precise defect identification. With a total of 18k images encompassing 44 defect types, VISION strives to mirror a wide range of real-world production scenarios. By supporting two ongoing challenge competitions on the VISION Datasets, we hope to foster further advancements in vision-based industrial inspection.

翻訳日:2023-06-22 00:52:52 公開日:2023-06-18

# 顧客レビューから洞察を効率的に抽出するためのクラウドベースの機械学習パイプライン

A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews ( http://arxiv.org/abs/2306.07786v2 )

ライセンス: Link先を確認

Robert Lakatos, Gergo Bogacsovics, Balazs Harangi, Istvan Lakatos, Attila Tiba, Janos Toth, Marianna Szabo, Andras Hajdu

(参考訳) 自然言語処理の効率は、機械学習モデル、特にニューラルネットワークベースのソリューションの出現によって劇的に向上した。しかしながら、特定のドメインを考慮する場合、いくつかのタスクはまだ難しい。本稿では,パイプラインに統合された機械学習手法を用いて,顧客レビューから洞察を抽出するクラウドシステムを提案する。トピックモデリングには、自然言語処理、ベクトル埋め込みに基づくキーワード抽出、クラスタリング用に設計されたトランスフォーマーベースニューラルネットワークを用いる。提案モデルの要素は,効率的な情報抽出,抽出した情報のトピックモデリング,ユーザニーズといった要件を満たすために,さらに統合され,さらに発展してきた。さらに,本タスクの既存のトピックモデリングやキーワード抽出ソリューションよりも優れた結果が得られる。提案手法は,ベンチマークのために公開されているデータセットを用いて,他の最先端手法と比較して検証・比較する。

The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks are still challenging, especially when considering specific domains. In this paper, we present a cloud-based system that can extract insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector embedding-based keyword extraction, and clustering. The elements of our model have been integrated and further developed to better meet the requirements of efficient information extraction, topic modeling of the extracted information, and user needs. Furthermore, our system can achieve better results than existing topic modeling and keyword extraction solutions for this task. Our approach is validated and compared with other state-of-the-art methods using publicly available datasets for benchmarking.

翻訳日:2023-06-22 00:52:22 公開日:2023-06-18

# リンク予測のためのグラフニューラルネットワークの評価:現在の落とし穴とベンチマーク

Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking ( http://arxiv.org/abs/2306.10453v1 )

ライセンス: Link先を確認

Juanhui Li, Harry Shomer, Haitao Mao, Shenglai Zeng, Yao Ma, Neil Shah, Jiliang Tang, Dawei Yin

(参考訳) リンク予測は、グラフのエッジの一部のみに基づいて、見当たらないエッジが存在するかどうかを予測しようとする。近年,この課題にグラフニューラルネットワーク(GNN)を活用すべく,一連の手法が導入されている。さらに、これらの新しいモデルの有効性をより良く評価するために、新しく多様なデータセットも作成されている。しかし、これらの新しい手法を適切に評価する能力を阻害する複数の落とし穴がある。これらの落とし穴には、(1)複数のベースラインでの実際のパフォーマンスよりも低いこと、(2)いくつかのデータセットにおける統一データ分割と評価指標の欠如、(3)簡単な負のサンプルを用いた非現実的な評価設定が含まれる。これらの課題を克服するために、我々はまず、同じデータセットとハイパーパラメータ検索設定を利用して、注目すべきメソッドとデータセットを公正に比較する。次に,複数のヒューリスティックスを用いて硬い負のサンプルをサンプリングするヒューリスティック関連サンプリング手法(heart)に基づいて,より実用的な評価設定を行う。新しい評価設定は、評価を現実世界の状況に合わせることによって、リンク予測の新たな挑戦と機会を促進するのに役立つ。私たちの実装とデータはhttps://github.com/Juanhui28/HeaRTで利用可能です。

Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph. A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task. Furthermore, new and diverse datasets have also been created to better evaluate the effectiveness of these new models. However, multiple pitfalls currently exist that hinder our ability to properly evaluate these new methods. These pitfalls mainly include: (1) Lower than actual performance on multiple baselines, (2) A lack of a unified data split and evaluation metric on some datasets, and (3) An unrealistic evaluation setting that uses easy negative samples. To overcome these challenges, we first conduct a fair comparison across prominent methods and datasets, utilizing the same dataset and hyperparameter search settings. We then create a more practical evaluation setting based on a Heuristic Related Sampling Technique (HeaRT), which samples hard negative samples via multiple heuristics. The new evaluation setting helps promote new challenges and opportunities in link prediction by aligning the evaluation with real-world situations. Our implementation and data are available at https://github.com/Juanhui28/HeaRT
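
The hard-negative idea in the abstract above can be illustrated with a minimal sketch. It assumes a single common-neighbors heuristic as the ranking score and a hypothetical helper named `hard_negatives`; the actual HeaRT procedure combines multiple heuristics and is defined in the paper and its repository, not here.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def hard_negatives(adj, source, k):
    """Hypothetical helper: return the k non-neighbors of `source` that share
    the most neighbors with it. They are 'hard' because a heuristic already
    scores them highly, unlike uniformly sampled negatives."""
    candidates = [n for n in adj if n != source and n not in adj[source]]
    # Sort by common-neighbor count (descending), then node id for determinism.
    candidates.sort(key=lambda n: (-len(adj[source] & adj[n]), n))
    return candidates[:k]


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adjacency(edges)
negatives = hard_negatives(adj, source=0, k=2)
```

Here node 3 ranks first because it shares two neighbors with node 0, which makes the pair (0, 3) harder to reject than a randomly drawn non-edge.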

翻訳日:2023-06-21 20:44:23 公開日:2023-06-18

# MISMATCH: ミスマッチエラー型を用いたマシン生成テキストのきめ細かい評価

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types ( http://arxiv.org/abs/2306.10452v1 )

ライセンス: Link先を確認

Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray

(参考訳) 大規模言語モデルへの関心が高まっており、参照(典型的には人間生成)テキストと比較して機械テキストの品質を評価する必要性が注目されている。最近の研究はタスク固有の評価メトリクスにフォーカスするか、既存のメトリクスでキャプチャされたマシン生成テキストの特性を研究している。本研究では,一対のテキスト間のきめ細かいミスマッチに基づいて,人間の判断を7つのNLPタスクでモデル化する新しい評価手法を提案する。微粒化評価のためのNLPタスクの最近の取り組みに触発されて,空間的/地理的誤りや実体的誤りなど13種類のミスマッチエラータイプを導入し,人間の判断をより正確に予測するためのモデル指導を行った。本稿では,これらのミスマッチエラータイプを補助的タスクとして用いたマシンテキスト評価のためのニューラルネットワークフレームワークを提案し,既存の単一数値評価指標を,マシンから抽出したテキスト特徴や参照テキストに加え,スカラー機能として再活用する。当社の実験では、ミスマッチエラーによる既存のメトリクスに関する重要な洞察を明らかにしました。 7つのNLPタスクから得られたデータセットの文対間のミスマッチ誤差は,人間の評価とよく一致している。

With the growing interest in large language models, the need to evaluate the quality of machine-generated text against reference (typically human-generated) text has become a focus of attention. Most recent works either focus on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by recent efforts toward fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes the existing single-number evaluation metrics as additional scalar features, in addition to textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.

翻訳日:2023-06-21 20:44:04 公開日:2023-06-18

# 協調的知識の活用による胸部X線の放射線学所見の生成

Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge ( http://arxiv.org/abs/2306.10448v1 )

ライセンス: Link先を確認

Manuela Daniela Danu, George Marica, Sanjeev Kumar Karn, Bogdan Georgescu, Awais Mansoor, Florin Ghesu, Lucian Mihai Itu, Constantin Suciu, Sasa Grbic, Oladimeji Farri, Dorin Comaniciu

(参考訳) 典型的な放射線医学レポートの全てのサブセクションのうち、臨床適応、所見、印象は患者の健康状態に関する重要な詳細を反映していることが多い。インプレッションに含まれる情報は、しばしば発見によってカバーされる。 FindingsとImpressionは画像の検査によって推測できるが、臨床指標は追加のコンテキストを必要とすることが多い。医学的イメージを解釈する認知的タスクは、放射線学のワークフローにおいて最も重要かつしばしば時間を要するステップである。本稿では,医療画像の自動解釈,特に胸部X線(CXR)から発見物を生成することに焦点を当てた。したがって、この研究は、研究の執筆やナレーションにほとんどの時間を費やす放射線科医の作業量を減らすことに焦点を当てている。ラジオグラフィーレポート生成を単一ステップ画像キャプションタスクとして扱う過去の研究とは異なり、CXR画像の解釈の複雑さを考慮し、2段階のアプローチを提案する。 (a)画像に異常のある領域を検出すること、 (b)生成型大言語モデル(llm)を用いて異常領域の関連テキストを生成すること。この2段階のアプローチは解釈可能性の層を導入し、放射線技師がcxrをレビューする際に使用する体系的な推論とフレームワークを整合させる。

Among all the sub-sections in a typical radiology report, the Clinical Indications, Findings, and Impression often reflect important details about the health status of a patient. The information included in Impression is also often covered in Findings. While Findings and Impression can be deduced by inspecting the image, Clinical Indications often require additional context. The cognitive task of interpreting medical images remains the most critical and often time-consuming step in the radiology workflow. Instead of generating an end-to-end radiology report, in this paper, we focus on generating the Findings from automated interpretation of medical images, specifically chest X-rays (CXRs). Thus, this work focuses on reducing the workload of radiologists who spend most of their time either writing or narrating the Findings. Unlike past research, which addresses radiology report generation as a single-step image captioning task, we have further taken into consideration the complexity of interpreting CXR images and propose a two-step approach: (a) detecting the regions with abnormalities in the image, and (b) generating relevant text for regions with abnormalities by employing a generative large language model (LLM). This two-step approach introduces a layer of interpretability and aligns the framework with the systematic reasoning that radiologists use when reviewing a CXR.

翻訳日:2023-06-21 20:43:44 公開日:2023-06-18

# 分散マッチングによるグラフ学習の現場グローバル解釈

In-Process Global Interpretation for Graph Learning via Distribution Matching ( http://arxiv.org/abs/2306.10447v1 )

ライセンス: Link先を確認

Yi Nian, Wei Jin, Lu Lin

(参考訳) グラフニューラルネットワーク(GNN)は、重要なグラフパターンをキャプチャする能力が優れているため、強力なグラフ学習モデルとして登場した。解釈可能なグラフ学習のためのモデルメカニズムに関する洞察を得るためには、事前学習されたgnnモデルが個人予測に使用するデータパターンを抽出し、hoc後の局所解釈に焦点を当てている。しかし、近年の研究では、ポストホック法はモデル初期化に非常に敏感であり、局所的な解釈は特定のインスタンス特有のモデル予測のみを説明できることを示している。本研究では、モデルトレーニング手順のグローバルな解釈を提供する方法について、まだ研究されていない重要な質問に答えることで、これらの制限に対処します。我々は,この問題を,GNNのトレーニング手順を支配する高レベルかつ人間の知能なパターンを蒸留することを目的とした,プロセス内グローバル解釈として定式化する。さらに,GNNの特徴空間における原グラフと解釈グラフの分布を学習の過程でマッチングすることにより,解釈グラフを合成するグラフ分散マッチング(GDM)を提案する。これらのわずかな解釈グラフは、トレーニング中にモデルがキャプチャする最も有益なパターンを示しています。グラフ分類データセットに関する広範囲な実験により,高い説明精度,時間効率,クラス関連構造を明らかにする能力など,提案手法の複数の利点が示された。

Graph neural networks (GNNs) have emerged as a powerful graph learning model due to their superior capacity in capturing critical graph patterns. To gain insights about the model mechanism for interpretable graph learning, previous efforts focus on post-hoc local interpretation by extracting the data pattern that a pre-trained GNN model uses to make an individual prediction. However, recent works show that post-hoc methods are highly sensitive to model initialization, and local interpretation can only explain the model prediction specific to a particular instance. In this work, we address these limitations by answering an important question that is not yet studied: how to provide global interpretation of the model training procedure? We formulate this problem as in-process global interpretation, which aims to distill high-level and human-intelligible patterns that dominate the training procedure of GNNs. We further propose Graph Distribution Matching (GDM) to synthesize interpretive graphs by matching the distribution of the original and interpretive graphs in the feature space of the GNN as its training proceeds. These few interpretive graphs demonstrate the most informative patterns the model captures during training. Extensive experiments on graph classification datasets demonstrate multiple advantages of the proposed method, including high explanation accuracy, time efficiency, and the ability to reveal class-relevant structure.

翻訳日:2023-06-21 20:43:23 公開日:2023-06-18

# メタ事前自己検索によるユニバーサル情報抽出

Universal Information Extraction with Meta-Pretrained Self-Retrieval ( http://arxiv.org/abs/2306.10444v1 )

ライセンス: Link先を確認

Xin Cong, Bowen Yu, Mengcheng Fang, Tingwen Liu, Haiyang Yu, Zhongkai Hu, Fei Huang, Yongbin Li, Bin Wang

(参考訳) Universal Information extract~(Universal IE)は、テキストから構造までの一様生成方法で異なる抽出タスクを解くことを目的としている。このような生成手順は、抽出すべき複雑な情報構造が存在する場合に苦労する傾向がある。外部知識ベースから知識を取得することは、モデルがこの問題を克服するのに役立つかもしれないが、様々なIEタスクに適した知識ベースを構築することは不可能である。本稿では,事前学習された言語モデル~(plm)に大量の知識が格納されていることに着想を得て,タスク固有の知識をplmから取得してユニバーサルieを強化するメタレトリエを提案する。異なるIEタスクが異なる知識を必要とするため、下流のIEタスクを微調整する際に、MetaRetrieverがタスク固有の検索性能の最大化を迅速に行えるメタトレーニングアルゴリズムを提案する。実験の結果、MetaRetrieverは4つのIEタスク、12のデータセットで、完全に管理され、低リソースで、少数ショットのシナリオで新しい最先端を実現している。

Universal Information Extraction (Universal IE) aims to solve different extraction tasks in a uniform text-to-structure generation manner. Such a generation procedure tends to struggle when there exist complex information structures to be extracted. Retrieving knowledge from external knowledge bases may help models to overcome this problem, but it is impossible to construct a knowledge base suitable for various IE tasks. Inspired by the fact that a large amount of knowledge is stored in pretrained language models (PLMs) and can be retrieved explicitly, in this paper, we propose MetaRetriever to retrieve task-specific knowledge from PLMs to enhance universal IE. As different IE tasks need different knowledge, we further propose a Meta-Pretraining Algorithm which allows MetaRetriever to quickly achieve maximum task-specific retrieval performance when fine-tuning on downstream IE tasks. Experimental results show that MetaRetriever achieves the new state-of-the-art on 4 IE tasks, 12 datasets under fully-supervised, low-resource and few-shot scenarios.

翻訳日:2023-06-21 20:43:05 公開日:2023-06-18

# エルニーニョ・南方振動の季節予測のための畳み込みGRUネットワーク

Convolutional GRU Network for Seasonal Prediction of the El Niño-Southern Oscillation ( http://arxiv.org/abs/2306.10443v1 )

ライセンス: Link先を確認

Lingda Wang, Savana Ammons, Vera Mikyoung Hur, Ryan L. Sriver, Zhizhen Zhao

(参考訳) 地球温度と降水パターンに大きな影響を及ぼすため,エルニーニョ・南方振動(ENSO)地域の海面温度(SST)の予測が広く研究されている。線形逆モデル(LIM)やアナログ予測(AF)、リカレントニューラルネットワーク(RNN)といった統計モデルは、大きな動的モデルに比べて柔軟性と計算コストの低いENSO予測に広く用いられている。しかし、これらのモデルには、SST変数の空間パターンのキャプチャや線形力学に依存する制限がある。本稿では,ENSO領域時空間シーケンス予測問題に対するConvolutional Gated Recurrent Unit (ConvGRU)ネットワークの改良と,ダウンストリームタスクとしてのNiño 3.4インデックス予測を提案する。提案するConvGRUネットワークはエンコーダ・デコーダシーケンス・ツー・シーケンス構造を持ち,太平洋地域の歴史的SSTマップを入力として取り込んで,その後数ヶ月間,ENSO領域内で将来のSSTマップを生成する。 ConvGRUネットワークの性能を評価するために,複数の大規模気候モデルから得られたデータを用いて実験を行った。その結果、LIM、AF、RNNと比較して、ConvGRUネットワークはNiño 3.4インデックスの予測可能性を大幅に向上することが示された。この改善は、拡張された有用予測範囲、高いピアソン相関、低い根-平均二乗誤差によって証明される。提案モデルは,ENSO現象の理解と予測能力の向上に期待でき,空間パターンとテレコネクションを用いた他の気象・気候予測シナリオにも適用可能である。

Predicting sea surface temperature (SST) within the El Niño-Southern Oscillation (ENSO) region has been extensively studied due to its significant influence on global temperature and precipitation patterns. Statistical models such as linear inverse model (LIM), analog forecasting (AF), and recurrent neural network (RNN) have been widely used for ENSO prediction, offering flexibility and relatively low computational expense compared to large dynamic models. However, these models have limitations in capturing spatial patterns in SST variability or relying on linear dynamics. Here we present a modified Convolutional Gated Recurrent Unit (ConvGRU) network for the ENSO region spatio-temporal sequence prediction problem, along with the Niño 3.4 index prediction as a downstream task. The proposed ConvGRU network, with an encoder-decoder sequence-to-sequence structure, takes historical SST maps of the Pacific region as input and generates future SST maps for subsequent months within the ENSO region. To evaluate the performance of the ConvGRU network, we trained and tested it using data from multiple large climate models. The results demonstrate that the ConvGRU network significantly improves the predictability of the Niño 3.4 index compared to LIM, AF, and RNN. This improvement is evidenced by extended useful prediction range, higher Pearson correlation, and lower root-mean-square error. The proposed model holds promise for improving our understanding and predicting capabilities of the ENSO phenomenon and can be broadly applicable to other weather and climate prediction scenarios with spatial patterns and teleconnections.
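
The two skill measures named in the abstract above, Pearson correlation and root-mean-square error, can be sketched in a few lines. The index values below are made-up toy numbers, not data from the paper, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy index series: the forecast equals the observation plus a 0.2 bias.
observed = [0.5, 1.0, 1.5, 1.0, 0.0]
forecast = [0.7, 1.2, 1.7, 1.2, 0.2]
correlation = pearson(observed, forecast)
error = rmse(observed, forecast)
```

The toy forecast is perfectly correlated with the observation yet still has a nonzero error, which is one reason to report both measures together.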

翻訳日:2023-06-21 20:42:46 公開日:2023-06-18

# ロボット操作のためのユニバーサルセマンティクス・ジオメトリ表現

A Universal Semantic-Geometric Representation for Robotic Manipulation ( http://arxiv.org/abs/2306.10474v1 )

ライセンス: Link先を確認

Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao

(参考訳) ロボットはセンサー、特にRGBと深度カメラに大きく依存し、世界に対する認識と対話を行う。 RGBカメラは、正確な空間情報を欠きながら、豊かな意味情報を持つ2D画像を記録する。一方、深度カメラは重要な3Dジオメトリデータを提供するが、セマンティクスは限られている。したがって、ロボットの知覚と制御を学習するためには、両方のモダリティを統合することが不可欠である。しかし、現在の研究は主にこれらのモダリティの1つに焦点を合わせており、両方を組み込むことの利点を無視している。この目的のために,大規模な事前学習型2次元モデルのリッチな意味情報を活用し,三次元空間推論の利点を継承するロボットのための汎用認識モジュールであるセマンティック・幾何学表現(SGR)を提案する。実験の結果、SGRはエージェントに対して、シミュレーションおよび実世界の様々なロボット操作タスクを成功させ、シングルタスクとマルチタスクの両方において、最先端の手法よりも優れた性能を発揮することが示された。さらに、SGRには、新しいセマンティック属性に一般化するユニークな機能があり、他のメソッドとは分離されている。

Robots rely heavily on sensors, especially RGB and depth cameras, to perceive and interact with the world. RGB cameras record 2D images with rich semantic information while missing precise spatial information. On the other side, depth cameras offer critical 3D geometry data but capture limited semantics. Therefore, integrating both modalities is crucial for learning representations for robotic perception and control. However, current research predominantly focuses on only one of these modalities, neglecting the benefits of incorporating both. To this end, we present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning. Our experiments demonstrate that SGR empowers the agent to successfully complete a diverse range of simulated and real-world robotic manipulation tasks, outperforming state-of-the-art methods significantly in both single-task and multi-task settings. Furthermore, SGR possesses the unique capability to generalize to novel semantic attributes, setting it apart from the other methods.

翻訳日:2023-06-21 20:35:22 公開日:2023-06-18

# 2D-Shapley: 断片化されたデータ評価のためのフレームワーク

2D-Shapley: A Framework for Fragmented Data Valuation ( http://arxiv.org/abs/2306.10473v1 )

ライセンス: Link先を確認

Zhihong Liu, Hoang Anh Just, Xiangyu Chang, Xi Chen, Ruoxi Jia

(参考訳) データ評価 -- モデルの特定の予測行動に対する個々のデータソースの貢献を定量化する -- は、機械学習の透明性を高め、データ共有のためのインセンティブシステムを設計する上で非常に重要である。既存の作業は、共有機能やサンプルスペースでデータソースを評価することに集中しています。それぞれの部分的な特徴とサンプルのみを含む断片化されたデータソースの評価方法は、未解決の問題のままである。まず,集約されたデータマトリックスから断片を除去することの反事実を計算する手法を提案する。反事実計算に基づいてさらに,断片化されたデータコンテキストにおける一意に魅力的な公理を満たす,断片化されたデータ評価のための理論的枠組みである2d-shapleyを提案する。 2D-Shapleyは、有用なデータフラグメントの選択、サンプル単位のデータ値の解釈、きめ細かいデータ問題診断など、さまざまな新しいユースケースを促進する。

Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share a feature space or a sample space. How to value fragmented data sources, each of which contains only partial features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.
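
As background for the abstract above, the following sketch computes the classic (one-dimensional) Shapley value for three data sources. It is not the 2D-Shapley algorithm, and the utility function is a toy assumption chosen so the result can be checked by hand.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal contributions over all
    orderings. The cost is factorial, so this only suits a few players."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = utility(frozenset(coalition))
            coalition.add(p)
            after = utility(frozenset(coalition))
            values[p] += after - before
    return {p: v / len(orderings) for p, v in values.items()}


def toy_utility(coalition):
    """Toy utility (an assumption): sources "a" and "b" are redundant with
    each other, while "c" adds independent value."""
    score = 0.0
    if "a" in coalition or "b" in coalition:
        score += 0.6
    if "c" in coalition:
        score += 0.3
    return score


values = shapley_values(["a", "b", "c"], toy_utility)
```

The redundant sources "a" and "b" split their shared 0.6 equally, and the three values sum to the utility of the full set.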

翻訳日:2023-06-21 20:35:03 公開日:2023-06-18

# 高次依存パーシングのためのニューラルポテンシャルの伝達

Transferring Neural Potentials For High Order Dependency Parsing ( http://arxiv.org/abs/2306.10469v1 )

ライセンス: Link先を確認

Farshad Noravesh

(参考訳) 高階依存性解析は兄弟や孫といった高階機能を活用して、現在の一階依存性解析の精度を向上させる。本稿では,ビアフィンスコアを用いて弧スコアの推定を行い,それをグラフィカルモデルに伝播する。グラフィカルモデル内の推論は二重分解を用いて解決される。本アルゴリズムは,バイアフィンのニューラルスコアをグラフィカルモデルに伝達し,2重分解推論を活用し,回路全体をエンドツーエンドに訓練し,第1次情報を高次情報に転送する。

High order dependency parsing leverages high order features such as siblings or grandchildren to improve on the state-of-the-art accuracy of current first order dependency parsers. The present paper uses biaffine scores to estimate the arc scores, which are then propagated into a graphical model. Inference inside the graphical model is solved using dual decomposition. The algorithm propagates biaffine neural scores to the graphical model and, by leveraging dual decomposition inference, trains the overall circuit end-to-end to transfer first order information to high order information.
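
A biaffine arc score, as mentioned in the abstract above, is commonly written as a bilinear term plus a linear term over the head and dependent vectors. The sketch below assumes that common form; the parameter names `U`, `w`, and `b` are illustrative and not taken from the paper.

```python
def biaffine_score(head, dep, U, w, b):
    """Arc score = head^T U dep + w . [head; dep] + b, on plain lists."""
    bilinear = sum(
        head[i] * U[i][j] * dep[j]
        for i in range(len(head))
        for j in range(len(dep))
    )
    # The linear term acts on the concatenation of the two vectors.
    linear = sum(wi * xi for wi, xi in zip(w, head + dep))
    return bilinear + linear + b


head_vec = [1.0, 0.0]          # toy representation of the candidate head
dep_vec = [0.0, 1.0]           # toy representation of the dependent
U = [[0.0, 2.0], [0.0, 0.0]]   # bilinear interaction weights
w = [1.0, 1.0, 1.0, 1.0]       # linear weights over [head; dep]
score = biaffine_score(head_vec, dep_vec, U, w, b=0.5)
```

In a parser, such scores would be computed for every head-dependent pair and then passed to the inference step.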

翻訳日:2023-06-21 20:34:47 公開日:2023-06-18

# ブラウン運動制御器によるGANトレーニングの安定化

Stabilizing GANs' Training with Brownian Motion Controller ( http://arxiv.org/abs/2306.10468v1 )

ライセンス: Link先を確認

Tianjiao Luo, Ziyu Zhu, Jianfei Chen, Jun Zhu

(参考訳) generative adversarial networks(gans)のトレーニングプロセスは不安定であり、グローバルに収束しない。本稿では,制御理論の観点からGANの安定性を考察し,BMC(Brownian Motion Controller)と呼ばれる高次騒音制御系を提案する。ディラックGANの原型の場合から、我々はBMCを設計し、正確に同じだが到達可能な最適平衡を求める。理論上、diracgans-bmcの訓練過程は指数関数的に安定であり、収束率の境界が導かれることを証明している。次に、BMCを通常のGANに拡張し、GANs-BMCの実装手順を提供する。実験の結果,我々のGANs-BMCは,より高速な収束率,発振域の小さい,FIDスコアの点で優れた性能で,StyleGANv2-adaフレームワーク下でのGANsのトレーニングを効果的に安定化することがわかった。

The training process of generative adversarial networks (GANs) is unstable and does not converge globally. In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called Brownian Motion Controller (BMC). Starting with the prototypical case of Dirac-GANs, we design a BMC to retrieve precisely the same but reachable optimal equilibrium. We theoretically prove that the training process of DiracGANs-BMC is globally exponentially stable and derive bounds on the rate of convergence. Then we extend our BMC to normal GANs and provide implementation instructions for GANs-BMC. Our experiments show that GANs-BMC effectively stabilizes GAN training under the StyleGANv2-ada framework, with a faster rate of convergence, a smaller range of oscillation, and better performance in terms of FID score.
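
The instability that motivates the controller can be seen in a small simulation of the uncontrolled Dirac-GAN. The sketch assumes the standard Dirac-GAN formulation from the GAN convergence literature (a one-parameter generator `theta`, a one-parameter discriminator `psi`, and f(t) = -log(1 + exp(-t))); it shows only the baseline dynamics and does not implement BMC.

```python
import math


def dirac_gan_step(theta, psi, lr):
    """One simultaneous gradient step on the objective f(psi * theta).
    The generator (theta) descends and the discriminator (psi) ascends."""
    slope = 1.0 / (1.0 + math.exp(psi * theta))  # f'(psi * theta), always > 0
    return theta - lr * slope * psi, psi + lr * slope * theta


theta, psi = 1.0, 1.0
norms = []
for _ in range(200):
    # Record the distance from the equilibrium at (0, 0) before each step.
    norms.append(math.hypot(theta, psi))
    theta, psi = dirac_gan_step(theta, psi, lr=0.1)
```

Each step multiplies the squared distance from the equilibrium by 1 + (lr * slope)^2, so the iterates spiral outward instead of converging.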

翻訳日:2023-06-21 20:34:36 公開日:2023-06-18

# グラフラドリング: 中間的コミュニケーションを伴わない極めて単純な並列GNNトレーニング

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication ( http://arxiv.org/abs/2306.10466v1 )

ライセンス: Link先を確認

Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

(参考訳) グラフは一様であり、GNNはグラフを学習するためのニューラルネットワークの強力なファミリーである。その人気にもかかわらず、gnnの拡張は、不健全な勾配、過剰なスモーニング、情報のスカッシュといった一般的な問題に苦しめられ、それがしばしば標準以下のパフォーマンスに繋がる。本研究では,GNNのキャパシティを拡張・拡張することなく拡張し,複数の小・大規模グラフにまたがる性能向上を図ることに興味がある。最近のモデルスープの興味深い現象に触発されて、複数の大規模言語事前学習モデルの微調整重量をより良いミニマにマージできることが示唆され、モデルスープの基本を利用して、GNNスケーリング時のメモリボトルネックやトレーサビリティの問題を緩和する。より具体的には、現在のGNNの深化や拡大はしないが、GNNに適したモデルスープのデータ中心の視点を示す。すなわち、巨大なグラフデータを独立に分割して並列に訓練された複数のGNNを中間的な通信なしで構築し、その強度をグリーディ補間スーププロシージャと組み合わせて最先端のパフォーマンスを達成することで、強力なGNNを構築する。さらに,大規模なグラフデータ構造を扱える最先端のグラフサンプリングとグラフ分割アプローチを活用することで,幅広いモデルスープ作成手法を提供する。実世界の小規模・大規模グラフにまたがる広範な実験は、我々のアプローチの有効性を示し、GNNスケーリングのための有望な直交方向に向かっている。コードは以下の通り: https://github.com/VITA-Group/graph_ladling

Graphs are omnipresent, and GNNs are a powerful family of neural networks for learning over graphs. Despite their popularity, scaling GNNs by either deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothing, and information squashing, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNN capacity without deepening or widening, which can improve performance across multiple small and large graphs. Motivated by the recent intriguing phenomenon of model soups, which suggests that fine-tuned weights of multiple large-language pre-trained models can be merged into a better minimum, we argue for exploiting the fundamentals of model soups to mitigate the aforementioned issues of memory bottleneck and trainability during GNN scaling. More specifically, we propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs: building powerful GNNs by dividing giant graph data to train multiple comparatively weaker GNNs independently and in parallel, without any intermediate communication, and combining their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance. Moreover, we provide a wide variety of model soup preparation techniques by leveraging state-of-the-art graph sampling and graph partitioning approaches that can handle large graph data structures. Our extensive experiments across many real-world small and large graphs illustrate the effectiveness of our approach and point towards a promising orthogonal direction for GNN scaling. Code is available at https://github.com/VITA-Group/graph_ladling
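
The soup step can be sketched as a greedy loop over independently trained weight vectors. This is a simplified illustration under stated assumptions: it uses plain uniform averaging rather than the interpolation search the abstract refers to, flat Python lists stand in for GNN parameters, and `toy_validate` is a made-up substitute for validation accuracy.

```python
def average(weight_sets):
    """Element-wise mean of several equally shaped flat weight vectors."""
    n = len(weight_sets)
    return [sum(ws) / n for ws in zip(*weight_sets)]


def greedy_soup(candidates, validate):
    """Visit candidates from best to worst validation score and keep one
    only if adding it to the running average does not hurt validation."""
    ranked = sorted(candidates, key=validate, reverse=True)
    soup = [ranked[0]]
    best = validate(average(soup))
    for weights in ranked[1:]:
        trial = validate(average(soup + [weights]))
        if trial >= best:
            soup.append(weights)
            best = trial
    return average(soup)


TARGET = [1.0, 2.0]  # toy optimum, an assumption for illustration


def toy_validate(weights):
    """Higher is better: negative squared distance to the toy optimum."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


# Three "independently trained" weight vectors; the last is a poor model.
candidates = [[0.8, 2.2], [1.2, 1.8], [5.0, 5.0]]
soup = greedy_soup(candidates, toy_validate)
```

The two reasonable candidates average to the toy optimum, while the poor one is rejected because it would lower the validation score.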

翻訳日:2023-06-21 20:34:19 公開日:2023-06-18

# 改良されたRDOプロセスによるGAN画像圧縮

GAN-based Image Compression with Improved RDO Process ( http://arxiv.org/abs/2306.10461v1 )

ライセンス: Link先を確認

Fanxin Xia, Jian Jin, Lili Meng, Feng Ding, Huaxiang Zhang

(参考訳) GANベースの画像圧縮方式は,低ビットレートで高い知覚品質を実現するため,近年顕著な進歩を見せている。しかし、主な問題として2つある。 1)色,テクスチャ,構造及び構造における再構成画像の知覚的変性 2)不正確なエントロピーモデル。本稿では、レート歪み最適化(RDO)プロセスを改善した新しいGANベースの画像圧縮手法を提案する。これを実現するために、DisTSとMS-SSIMのメトリクスを用いて、色、テクスチャ、構造における知覚的変性を測定する。さらに,エントロピーモデルのための離散ガウス・ラプラシア・ロジスティック混合モデル(gllmm)を吸収し,潜在表現の確率分布の推定精度を向上させる。評価過程において, iqaメトリクスを用いて再構成画像の知覚品質を評価する代わりに, ヒトの知覚結果を完全に反映する異なるコーデック間の平均評価スコア(mos)実験を直接実施する。実験の結果,提案手法は既存のGAN法と最先端のハイブリッドコーデック(VVC)よりも優れていた。

GAN-based image compression schemes have shown remarkable progress lately due to their high perceptual quality at low bit rates. However, two main issues remain: 1) perceptual degeneration of the reconstructed image in color, texture, and structure, and 2) an inaccurate entropy model. In this paper, we present a novel GAN-based image compression approach with an improved rate-distortion optimization (RDO) process. To achieve this, we utilize the DISTS and MS-SSIM metrics to measure perceptual degeneration in color, texture, and structure. In addition, we adopt the discretized Gaussian-Laplacian-logistic mixture model (GLLMM) for entropy modeling to improve the accuracy of estimating the probability distributions of the latent representation. During evaluation, instead of assessing the perceptual quality of the reconstructed image via IQA metrics, we directly conduct a Mean Opinion Score (MOS) experiment among different codecs, which fully reflects actual human perception. Experimental results demonstrate that the proposed method outperforms the existing GAN-based methods and the state-of-the-art hybrid codec (i.e., VVC).

翻訳日:2023-06-21 20:33:42 公開日:2023-06-18

# Instant Soup: 安いプランニングアンサンブルを1枚のパスで作れば、大きなモデルから宝くじを引ける

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models ( http://arxiv.org/abs/2306.10460v1 )

ライセンス: Link先を確認

Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

(参考訳) 大規模な事前訓練されたトランスフォーマは、微調整による多数の下流アプリケーションへの適応性の高さから、ここ数年で爆発的な注目を集めてきたが、その指数関数的に増加するパラメータ数は、業界標準のハードウェアなしでそれらを微調整する上でも、大きなハードルとなっている。近年、LTH(Lottery Ticket hypothesis)とその変種は、これらの大きな事前訓練されたモデルを用いて、密度の高いモデルと同等の性能を達成できるサブネットを創出するが、LTHプラグマティズムは、反復的なフルトレーニングと反復的マグニチュードプルーニング(IMP)のプルーニングルーチンによって著しく阻害され、モデルサイズが増加するにつれて悪化する。モデルスープの最近の観察から,複数のモデルの微調整された重量をより小型化できる可能性が示唆されている。我々は,IMPの高価な中間プルーニング段階を計算効率の悪いマスク生成と集約ルーチンに置き換えることで,従来のIMPコストのごく一部を用いて,宝くじ品質のサブネットワークを生成するInstant Soup Pruning (ISP)を提案する。具体的には、マスク生成の段階では、ISPは、様々なトレーニングプロトコルとデータサブセットを使用して、弱いノイズの多いサブネットを多数生成し、ノイズを平均化し、高品質のノイズを発生させる。複数のベンチマークビジョンと言語データセットにわたるCLIP(未探索)とBERTの2つの大規模な事前訓練モデルに対する広範な実験とアブレーションにより、ISPの有効性がいくつかの最先端のプルーニング手法と比較して検証された。コードは以下の通り。 https://github.com/VITA-Group/instant_soup

Large pre-trained transformers have been receiving explosive attention in the past few years, due to their wide adaptability for numerous downstream applications via fine-tuning, but their exponentially increasing parameter counts are becoming a primary hurdle to even just fine-tuning them without industry-standard hardware. Recently, the Lottery Ticket Hypothesis (LTH) and its variants have been exploited to prune these large pre-trained models, generating subnetworks that can achieve similar performance as their dense counterparts, but the practicality of LTH is enormously inhibited by the repetitive full training and pruning routine of iterative magnitude pruning (IMP), which worsens with increasing model size. Motivated by the recent observations of model soups, which suggest that fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate lottery ticket quality subnetworks using a fraction of the original IMP cost, by replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine. More specifically, during the mask generation stage, ISP takes a small handful of iterations using varying training protocols and data subsets to generate many weak and noisy subnetworks, and superposes them to average out the noise, creating a high-quality denoised subnetwork. Our extensive experiments and ablation on two popular large-scale pre-trained models, CLIP (unexplored in pruning to date) and BERT, across multiple benchmark vision and language datasets validate the effectiveness of ISP compared to several state-of-the-art pruning methods. Code is available at https://github.com/VITA-Group/instant_soup
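
The "generate many weak masks, then superpose them" step can be pictured with a toy example. The sketch assumes magnitude-based weak masks and a simple vote count for the superposition; the snapshot values are made up, and the function names are illustrative rather than taken from the ISP code.

```python
def magnitude_mask(weights, keep):
    """Binary mask keeping the `keep` largest-magnitude weights."""
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def superpose_masks(masks, keep):
    """Sum the weak masks and keep the `keep` positions with the most votes
    (ties broken by lower index)."""
    votes = [sum(col) for col in zip(*masks)]
    order = sorted(range(len(votes)), key=lambda i: (-votes[i], i))
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(votes))]


# Three noisy snapshots of the same 5-weight layer (toy numbers).
snapshots = [
    [0.9, 0.1, -0.8, 0.05, 0.3],
    [1.0, 0.4, -0.7, 0.02, 0.2],
    [0.8, 0.1, -0.2, 0.03, 0.6],
]
weak_masks = [magnitude_mask(w, keep=2) for w in snapshots]
final_mask = superpose_masks(weak_masks, keep=2)
```

The third snapshot disagrees with the other two about which weight to keep, and the vote removes that disagreement from the final mask.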

翻訳日:2023-06-21 20:33:24 公開日:2023-06-18

# Interval Targets を用いた弱教師付き回帰

Weakly Supervised Regression with Interval Targets ( http://arxiv.org/abs/2306.10458v1 )

ライセンス: Link先を確認

Xin Cheng and Yuzhou Cao and Ximing Li and Bo An and Lei Feng

(参考訳) 本稿では,Regressed with interval target (RIT)と呼ばれる,興味深い教師付き回帰設定について検討する。関連する回帰設定に関する従来の手法のいくつかはRITに適応できるが、統計的に一貫性がなく、経験的性能は保証されない。本稿では,RITに関する詳細な研究について述べる。まず,ritのデータ生成過程を記述するための新しい統計モデルを提案し,その妥当性を示す。第二に、RITの簡単な選択法を解析し、対象値として区間内の特定の値を選択してモデルを訓練する。第3に、予測を間隔に制限することでモデルを訓練するための統計的に一貫した制限法を提案する。さらに,限界法に対する推定誤差を導出する。最後に,様々なデータセットに関する広範囲な実験を行い,提案手法の有効性を示す。

This paper investigates an interesting weakly supervised regression setting called regression with interval targets (RIT). Although some of the previous methods on relevant regression settings can be adapted to RIT, they are not statistically consistent, and thus their empirical performance is not guaranteed. In this paper, we provide a thorough study on RIT. First, we propose a novel statistical model to describe the data generation process for RIT and demonstrate its validity. Second, we analyze a simple selection method for RIT, which selects a particular value in the interval as the target value to train the model. Third, we propose a statistically consistent limiting method for RIT to train the model by limiting the predictions to the interval. We further derive an estimation error bound for our limiting method. Finally, extensive experiments on various datasets demonstrate the effectiveness of our proposed method.
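
The difference between the selection and limiting ideas can be shown with three small helper functions. This is a sketch under assumptions: the abstract does not specify which value the selection method picks or how the limiting loss is defined, so the midpoint choice and the projection-based violation below are illustrative only.

```python
def select_midpoint(lower, upper):
    """Selection baseline (assumed variant): use the interval midpoint as
    the regression target."""
    return (lower + upper) / 2.0


def limit_to_interval(prediction, lower, upper):
    """Limiting step: project a raw prediction onto [lower, upper]."""
    return min(max(prediction, lower), upper)


def interval_violation(prediction, lower, upper):
    """Distance from the prediction to the interval; zero when inside it."""
    return abs(prediction - limit_to_interval(prediction, lower, upper))


target = select_midpoint(2.0, 6.0)
clipped = limit_to_interval(7.5, 2.0, 6.0)
penalty = interval_violation(1.0, 2.0, 6.0)
```

A selection method penalizes every prediction that differs from the chosen point, whereas a limiting view leaves predictions inside the interval unpenalized.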

翻訳日:2023-06-21 20:32:48 公開日:2023-06-18
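
To make the two families of methods in the abstract above concrete, here is a minimal sketch. It assumes the midpoint as one possible "selection" target and a simple hinge-style penalty as one possible way of "limiting" predictions to the interval. The abstract does not give the paper's actual losses, so the names `midpoint_target`, `interval_violation`, and `clip_to_interval` are illustrative only.

```python
def midpoint_target(lower, upper):
    """Selection-style surrogate: pick one value in the interval as the target."""
    return 0.5 * (lower + upper)

def interval_violation(pred, lower, upper):
    """Limiting-style penalty: zero inside [lower, upper], linear outside."""
    return max(lower - pred, 0.0) + max(pred - upper, 0.0)

def clip_to_interval(pred, lower, upper):
    """Project a prediction onto the interval."""
    return min(max(pred, lower), upper)

if __name__ == "__main__":
    for pred in (0.5, 2.0, 4.5):
        print(pred, interval_violation(pred, 1.0, 3.0), clip_to_interval(pred, 1.0, 3.0))
```

The penalty is flat inside the interval, so it only constrains the model where the weak label carries information.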

# グラフ表現学習によるバイオメディシンの進歩 : 最近の進歩,課題,今後の方向性

Advancing Biomedicine with Graph Representation Learning: Recent Progress, Challenges, and Future Directions ( http://arxiv.org/abs/2306.10456v1 )

ライセンス: Link先を確認

Fang Li, Yi Nian, Zenan Sun, Cui Tao

(参考訳) グラフ表現学習(GRL)は、バイオメディシンを含む様々な分野のブレークスルーに大きく貢献する中心的な分野として登場した。本調査の目的は, GRL法の最近の進歩とそのバイオメディカル分野への応用を概観することである。また、GRLが現在直面している重要な課題を強調し、今後の研究の方向性について概説する。

Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various fields, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research.

翻訳日:2023-06-21 20:32:37 公開日:2023-06-18

# 制限された相手に対する量子サンプリングによるワンウェイエンタングルメント浄化の安全性

Security of One-Way Entanglement Purification with Quantum Sampling Against a Restricted Adversary ( http://arxiv.org/abs/2306.10455v1 )

ライセンス: Link先を確認

Cameron Cianci

(参考訳) エンタングルメント浄化プロトコルは、ノイズの多いチャネルにエンタングルメントを分散することにより、量子ネットワークの将来において重要な役割を果たすことを約束する。しかし, 双方向浄化プロトコルの安全性は検討されているのみである。そこで本研究では,量子サンプリングを応用し,単一量子ビットパウリゲートに制限された敵に対するセキュリティを証明する一方通行の絡み合い解消プロトコルを提案する。これは、一方向の絡み合わせプロトコルと誤り訂正符号の等価性を利用する。このプロトコルの安全性を証明するために、ブーマンとフェーアが導入した量子サンプリングフレームワークを用いて、チャネルを通過した量子ビットのハミング重量を推定し、Eveが量子チャネルに課した干渉の量を決定するために、推定相対的なハミング重量$\omega$を使用する。Eveは単一キュービットのパウリゲートに制限されているため、適用ゲートの数をハミング重量を用いて直接見積もることができる。逆1量子ビットゲートの数を推定すると、誤差補正を行い、確率$1-\epsilon_{qu}^\delta$で論理量子ビットをEveから切り離すことができる。このプロトコルは一方向のみの通信を可能にするため、送信前にコードの距離を決定する必要があるため、Bobは、Eveがコードを修正できる以上のゲートを施したことを知っていれば、プロトコルを中止せざるを得なくなるだろう。ワンウェイプロトコルは、通信が限られている場合や、双方向プロトコルで必要とされる複数の通信ラウンドと比較してレイテンシーを減らしたい場合に使われる可能性がある。さらなる研究は、より一般的な敵に対するセキュリティ保証を得るために、任意のシングルまたはマルチキュービットゲートに対するこのプロトコルのセキュリティを調査することができる。

Entanglement purification protocols promise to play a critical role in the future of quantum networks by distributing entanglement across noisy channels. However, only the security of two-way purification protocols has been closely studied. To address this, we propose a one-way entanglement purification protocol that utilizes quantum sampling and prove its security against an adversary restricted to single-qubit Pauli gates. This is done by leveraging the equivalence of one-way entanglement purification protocols with error-correcting codes. To prove the security of this protocol, we first use the quantum sampling framework introduced by Bouman and Fehr to estimate the Hamming weight of the qubits that passed through the channel, and then use the estimated relative Hamming weight $\omega$ to determine the amount of interference to which Eve has subjected the quantum channel. Since Eve is restricted to single-qubit Pauli gates, the number of applied gates can be directly estimated using the Hamming weight. Estimating the number of adversarial single-qubit gates allows us to perform error correction and disentangle the logical qubit from Eve with probability $1-\epsilon_{qu}^\delta$. Since this protocol allows communication only in one direction, the distance of the code must be decided before transmission, and therefore Bob will be forced to abort the protocol if he finds that Eve has applied more gates than the code can correct. One-way protocols may find use when communication is limited, or when we desire to decrease latency compared to the multiple rounds of communication needed in two-way protocols. Further research may investigate the security of this protocol against arbitrary single or multi-qubit gates to obtain security guarantees against a more general adversary.

翻訳日:2023-06-21 20:32:29 公開日:2023-06-18
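
The sampling-then-abort logic described in the abstract above has a simple classical analogue, sketched below. This is only an analogy under stated assumptions: errors are modelled as a classical 0/1 string, `delta` is an assumed sampling tolerance, and the abort rule (compare the projected error count with the number of errors a distance-`distance` code can correct) is an illustrative reading of the abstract, not the paper's security analysis.

```python
import random

def estimate_relative_weight(errors, sample_size, rng):
    """Estimate the relative Hamming weight from a uniformly random subset."""
    idx = rng.sample(range(len(errors)), sample_size)
    return sum(errors[i] for i in idx) / sample_size

def should_abort(omega_est, delta, n_remaining, distance):
    """Abort when the projected error count exceeds what the code can correct."""
    correctable = (distance - 1) // 2
    return (omega_est + delta) * n_remaining > correctable

if __name__ == "__main__":
    rng = random.Random(1)
    errors = [1 if rng.random() < 0.05 else 0 for _ in range(200)]
    omega = estimate_relative_weight(errors, 50, rng)
    print(omega, should_abort(omega, 0.02, 150, 21))
```

Because the code distance is fixed before transmission, the only decision left to the receiver in this sketch is whether to proceed or abort.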

# 自律運転のためのオンライン地図ベクトル化:ラスタライズの視点から

Online Map Vectorization for Autonomous Driving: A Rasterization Perspective ( http://arxiv.org/abs/2306.10502v1 )

ライセンス: Link先を確認

Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, Zuoguan Wang

(参考訳) ベクトル化高精細度(hd)マップは自動運転に必須であり、高度な知覚と計画のための詳細な環境情報を提供する。しかし、現在の地図ベクトル化法はしばしば偏差を示し、既存の地図ベクトル化の評価基準ではこれらの偏差を検出するのに十分な感度が欠けている。これらの制約に対処するため、ラスタ化の哲学をマップベクトル化に統合することを提案する。具体的には、ラスタライズに基づく新しい評価指標を導入し、感度が良く、現実の自律運転シナリオに適している。さらに、ベクトル化出力に微分可能ラスタ化を適用し、ラスタ化HDマップの精密かつ幾何学的監視を行う新しいフレームワークであるMapVR(Map Vectorization via Rasterization)を提案する。特に、MapVRは様々な幾何学的な形状のラスタ化戦略を設計し、幅広い地図要素に効果的に適用することができる。実験により、ラスタ化を地図ベクトル化に組み込むことは、推論中に余分な計算コストを伴わずに性能を大幅に向上させ、より正確な地図認識をもたらし、究極的にはより安全な自動運転を促進することが示されている。

Vectorized high-definition (HD) maps are essential for autonomous driving, providing detailed and precise environmental information for advanced perception and planning. However, current map vectorization methods often exhibit deviations, and the existing evaluation metric for map vectorization lacks sufficient sensitivity to detect these deviations. To address these limitations, we propose integrating the philosophy of rasterization into map vectorization. Specifically, we introduce a new rasterization-based evaluation metric, which has superior sensitivity and is better suited to real-world autonomous driving scenarios. Furthermore, we propose MapVR (Map Vectorization via Rasterization), a novel framework that applies differentiable rasterization to vectorized outputs and then performs precise and geometry-aware supervision on rasterized HD maps. Notably, MapVR designs tailored rasterization strategies for various geometric shapes, enabling effective adaptation to a wide range of map elements. Experiments show that incorporating rasterization into map vectorization greatly enhances performance with no extra computational cost during inference, leading to more accurate map perception and ultimately promoting safer autonomous driving.

翻訳日:2023-06-21 20:24:49 公開日:2023-06-18
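
As a rough illustration of what a rasterization-based comparison of vectorized map elements could look like, the sketch below draws polylines onto a small grid and compares two rasters with intersection-over-union. It uses hard (non-differentiable) rasterization and a plain IoU, both of which are assumptions made for illustration. The abstract does not specify the exact metric or the differentiable rasterizer used by MapVR.

```python
def rasterize_polyline(points, width, height, samples_per_segment=50):
    """Mark every grid cell touched by densely sampled points along the polyline."""
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for s in range(samples_per_segment + 1):
            t = s / samples_per_segment
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = 1
    return grid

def raster_iou(a, b):
    """Intersection-over-union of two binary rasters of equal size."""
    inter = sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    union = sum(pa | pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return inter / union if union else 1.0

if __name__ == "__main__":
    gt = rasterize_polyline([(0, 0), (7, 0)], 8, 4)
    pred = rasterize_polyline([(0, 0), (7, 1)], 8, 4)
    print(raster_iou(gt, pred))
```

A pixel-level overlap of this kind reacts to small lateral shifts that a coarse point-distance threshold might tolerate, which is the sensitivity argument the abstract makes.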

# 多層心血管疾患予測のための半教師付き学習:マルチデータセットによる検討

Semi-Supervised Learning for Multi-Label Cardiovascular Diseases Prediction: A Multi-Dataset Study ( http://arxiv.org/abs/2306.10494v1 )

ライセンス: Link先を確認

Rushuang Zhou, Lei Lu, Zijun Liu, Ting Xiang, Zhen Liang, David A. Clifton, Yining Dong, Yuan-Ting Zhang

(参考訳) 心電図は、心血管疾患(CVD)を予測するための非侵襲的なツールである。現在の心電図に基づく診断システムは,ディープラーニング技術の急速な発展により,有望な性能を示す。しかし、ラベルの不足、複数のCVDの共起、見えないデータセットの性能の低下は、ディープラーニングベースのモデルの普及を著しく妨げている。統一されたフレームワークでそれらに取り組むことは、依然として大きな課題である。そこで本研究では,複数のCVDを同時に認識するマルチラベル半教師付きモデル(ECGMatch)を提案する。 ECGMatchでは、弱い強力なECGデータ拡張のためにECGAugmentモジュールが開発され、モデルトレーニングのための多様なサンプルを生成する。その後、ラベル不足を緩和する擬似ラベル生成・改良のために、近隣の合意モデリングと知識蒸留を備えたハイパーパラメータ効率のフレームワークを設計する。最後に,ラベル付きサンプル内の異なるCVDの共起情報を捕捉し,ラベル付きサンプルに伝達するラベル相関アライメントモジュールを提案する。 4つのデータセットと3つのプロトコルに関する大規模な実験は、提案モデルの有効性と安定性を実証している。そのため,本モデルは,限られた監督下での多ラベルCVD予測において,堅牢な性能を実現する診断システムを実現することができる。

Electrocardiography (ECG) is a non-invasive tool for predicting cardiovascular diseases (CVDs). Current ECG-based diagnosis systems show promising performance owing to the rapid development of deep learning techniques. However, the label scarcity problem, the co-occurrence of multiple CVDs and the poor performance on unseen datasets greatly hinder the widespread application of deep learning-based models. Addressing them in a unified framework remains a significant challenge. To this end, we propose a multi-label semi-supervised model (ECGMatch) to recognize multiple CVDs simultaneously with limited supervision. In the ECGMatch, an ECGAugment module is developed for weak and strong ECG data augmentation, which generates diverse samples for model training. Subsequently, a hyperparameter-efficient framework with neighbor agreement modeling and knowledge distillation is designed for pseudo-label generation and refinement, which mitigates the label scarcity problem. Finally, a label correlation alignment module is proposed to capture the co-occurrence information of different CVDs within labeled samples and propagate this information to unlabeled samples. Extensive experiments on four datasets and three protocols demonstrate the effectiveness and stability of the proposed model, especially on unseen datasets. As such, this model can pave the way for diagnostic systems that achieve robust performance on multi-label CVDs prediction with limited supervision.

翻訳日:2023-06-21 20:24:28 公開日:2023-06-18

# MOSPC: ペアワイズ比較に基づくMOS予測

MOSPC: MOS Prediction Based on Pairwise Comparison ( http://arxiv.org/abs/2306.10493v1 )

ライセンス: Link先を確認

Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang

(参考訳) 合成音声の品質を評価する主観的指標として、平均評価スコア~(mos)は、通常、複数の注釈者が同じ音声を得点する必要がある。このようなアノテーションアプローチには多くのマンパワーが必要で、時間もかかります。自動評価のためのMOS予測モデルは、労働コストを大幅に削減することができる。先行研究では,mosスコアが近い場合,音声品質を正確にランク付けすることは困難である。しかし, 実用的応用においては, 単にmosスコアを予測するよりも, 合成システムや文の品質を正しくランク付けすることが重要である。一方、アノテーション中に各アノテータが複数のオーディオをスコアする際、アノテータが付与する第1または第1の音声スコアに基づいてスコアが相対値となる。以上の2点により,ペア比較(MOSPC)に基づくMOS予測のための一般的なフレームワークを提案し,C-Mixupアルゴリズムを用いてMOSPCの一般化性能を向上させる。 BVCCとVCC2018の実験は、我々のフレームワークが相関係数の指標のほとんど、特に品質ランキングに関するKTAUの基準よりも優れていることを示している。また,このフレームワークは,各細粒度セグメントのランキング精度も高いベースラインを超えている。これらの結果から,音声品質のランク付け精度の向上に寄与することが示唆された。

As a subjective metric to evaluate the quality of synthesized speech, the mean opinion score (MOS) usually requires multiple annotators to score the same speech. Such an annotation approach requires a lot of manpower and is also time-consuming. A MOS prediction model for automatic evaluation can significantly reduce labor cost. In previous works, it is difficult to accurately rank the quality of speech when the MOS scores are close. However, in practical applications, it is more important to correctly rank the quality of synthesis systems or sentences than simply to predict MOS scores. Meanwhile, as each annotator scores multiple audios during annotation, the score is probably a relative value based on the first or the first few speech scores given by the annotator. Motivated by the above two points, we propose a general framework for MOS prediction based on pair comparison (MOSPC), and we utilize the C-Mixup algorithm to enhance the generalization performance of MOSPC. The experiments on BVCC and VCC2018 show that our framework outperforms the baselines on most of the correlation coefficient metrics, especially on the metric KTAU related to quality ranking. Our framework also surpasses the strong baseline in ranking accuracy on each fine-grained segment. These results indicate that our framework contributes to improving the ranking accuracy of speech quality.

翻訳日:2023-06-21 20:24:07 公開日:2023-06-18
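
The abstract above stresses ranking quality and reports a metric it calls KTAU, which presumably refers to the Kendall rank correlation. As a hedged illustration of why a ranking metric differs from a plain error metric, the following sketch computes Kendall's tau-a (ties count as neither concordant nor discordant) between predicted and reference scores. The function name `kendall_tau` is ours, not from the paper.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

if __name__ == "__main__":
    reference_mos = [3.1, 3.2, 4.0, 4.5]
    predicted_mos = [3.3, 3.0, 4.1, 4.4]   # small errors, but one swapped pair
    print(kendall_tau(reference_mos, predicted_mos))
```

Two systems with close MOS values can be swapped by a predictor that still has a low absolute error, and only a pairwise metric like this one exposes the swap.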

# レーン分割注意マップ類似性を用いた自律走行シミュレーションにおけるSim2Real画像ギャップの定量化に関する研究

A Study on Quantifying Sim2Real Image Gap in Autonomous Driving Simulations Using Lane Segmentation Attention Map Similarity ( http://arxiv.org/abs/2306.10491v1 )

ライセンス: Link先を確認

Seongjeong Park, Jinu Pahk, Lennart Lorenz Freimuth Jahn, Yongseob Lim, Jinung An, Gyeungho Choi

(参考訳) 自動運転シミュレーションは非常に現実的な画像を必要とする。予備研究では,DCLGANを用いてCARLAシミュレータ画像がより現実に近いものになると,車線認識モデルの性能は現実の運転に匹敵するレベルまで向上した。また、車両が車線から外れた後に車線の中心に戻る能力が大幅に改善されたことも確認された。しかし,シミュレーション画像のリアリズムを定量的に評価するための合意基準は現時点では存在しない。そこで本研究では,FID (Fréchet Inception Distance) が事前学習モデルを用いて特徴ベクトル分布距離を測定することを前提として,ENet-SADの自己注意蒸留過程からの注意マップを用いてシミュレーション道路画像の類似度を測定する指標を提案する。最後に,実世界の自律走行試験道路を実装したCARLAマップの画像に適用することにより,計測方法の適合性を検証した。

Autonomous driving simulations require highly realistic images. Our preliminary study found that when the CARLA Simulator image was made more like reality by using DCLGAN, the performance of the lane recognition model improved to levels comparable to real-world driving. It was also confirmed that the vehicle's ability to return to the center of the lane after deviating from it improved significantly. However, there is currently no agreed-upon metric for quantitatively evaluating the realism of simulation images. To address this issue, based on the idea that FID (Fréchet Inception Distance) measures the feature vector distribution distance using a pre-trained model, this paper proposes a metric that measures the similarity of simulation road images using the attention map from the self-attention distillation process of ENet-SAD. Finally, this paper verified the suitability of the measurement method by applying it to the image of the CARLA map that implemented a real-world autonomous driving test road.

翻訳日:2023-06-21 20:23:44 公開日:2023-06-18

# ニューロシンボリック学習による高速画像ラベリング

Rapid Image Labeling via Neuro-Symbolic Learning ( http://arxiv.org/abs/2306.10490v1 )

ライセンス: Link先を確認

Yifeng Wang, Zhi Tu, Yiwen Xiang, Shiyuan Zhou, Xiyuan Chen, Bingxuan Li, and Tianyi Zhang

(参考訳) コンピュータビジョン(cv)の成功は、手動の注釈データに大きく依存している。しかし、データラベリングには重要なドメイン専門知識が必要であり、クラウドワーカーに簡単に委譲することはできない、ヘルスケアのような重要なドメインで画像に注釈をつけるのは、非常に高価である。この課題に対処するために、ドメインの専門家が提供した少量のラベル付きデータから画像ラベル規則を推論し、そのルールを用いて無注釈データを自動的にラベル付けするRapidというニューロシンボリックアプローチを提案する。特にRapidは、事前訓練されたCVモデルと誘導論理学習を組み合わせて、ロジックベースのラベリングルールを推論する。 rapidは4つの画像ラベリングタスクで83.33%から88.33%のラベリング精度を達成している。特にrapidは、2つの高度に専門的なタスクで微調整されたcvモデルを大幅に上回っている。これらの結果は,小さなデータから高速に学習することの有効性と,異なるタスクを一般化する能力を示している。コードとデータセットはhttps://github.com/Neural-Symbolic-Image-Labeling/で公開されています。

The success of Computer Vision (CV) relies heavily on manually annotated data. However, it is prohibitively expensive to annotate images in key domains such as healthcare, where data labeling requires significant domain expertise and cannot be easily delegated to crowd workers. To address this challenge, we propose a neuro-symbolic approach called Rapid, which infers image labeling rules from a small amount of labeled data provided by domain experts and automatically labels unannotated data using the rules. Specifically, Rapid combines pre-trained CV models and inductive logic learning to infer the logic-based labeling rules. Rapid achieves a labeling accuracy of 83.33% to 88.33% on four image labeling tasks with only 12 to 39 labeled samples. In particular, Rapid significantly outperforms finetuned CV models in two highly specialized tasks. These results demonstrate the effectiveness of Rapid in learning from small data and its capability to generalize among different tasks. Code and our dataset are publicly available at https://github.com/Neural-Symbolic-Image-Labeling/

翻訳日:2023-06-21 20:23:25 公開日:2023-06-18

# 2層ニューラルネットワークパラメトリゼーションを用いた自然アクタークリティックの大域収束について

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization ( http://arxiv.org/abs/2306.10486v1 )

ライセンス: Link先を確認

Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

(参考訳) アクター批判アルゴリズムは最先端の意思決定問題を解決するのに顕著な成功を収めた。しかしながら、その経験的効果にもかかわらず、その理論的基盤は、特にニューラルネットワークのパラメトリゼーションにおいて、比較的未探査のままである。本稿では,ニューラルネットを用いて批評家を表現する自然なアクタ-クリティックアルゴリズムの研究について述べる。本研究の目的は,本アルゴリズムの性能特性をより深く理解し,サンプル複雑性の保証を確立することである。そこで本研究では,2層批判パラメトリゼーション(NAC2L)を用いたNatural Actor-Criticアルゴリズムを提案する。我々のアプローチでは、凸最適化問題を通じて各イテレーションの$Q$関数を推定する。提案手法により,$\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$ のサンプル複雑性が得られることを確認した。対照的に、文献中の既存のサンプルの複雑さは、表状または線形のMDPのみを保持する。一方、この結果は可算な状態空間に対して成り立ち、MDP上の線形構造やローランク構造を必要としない。

Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics. To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach involves estimating the $Q$-function in each iteration through a convex optimization problem. We establish that our proposed approach attains a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. In contrast, the existing sample complexity results in the literature only hold for a tabular or linear MDP. Our result, on the other hand, holds for countable state spaces and does not require a linear or low-rank structure on the MDP.

翻訳日:2023-06-21 20:23:07 公開日:2023-06-18

# 分散検出のための平衡エネルギー正規化損失

Balanced Energy Regularization Loss for Out-of-distribution Detection ( http://arxiv.org/abs/2306.10485v1 )

ライセンス: Link先を確認

Hyunjun Choi, Hawook Jeong, Jin Young Choi

(参考訳) オフ・オブ・ディストリビューション(OOD)検出の分野では、OODデータとして補助データを使用する従来の手法が有望な性能を示している。しかし、この方法はすべての補助データに等しく損失を与え、不整合と区別する。しかし, 様々なタスクにおいて, 補助的oodデータのクラス間での分布には, 一般的な不均衡が存在する。本稿では, 単純だが多種多様なタスクに有効である平衡エネルギー正規化損失を提案する。我々の平衡エネルギー正規化損失は、OODデータのクラス不均衡に対処するために補助データに対して、クラスごとに異なる事前確率を利用する。主な概念は、マイノリティクラスよりも多数派クラスからの補助的なサンプルを規則化することである。本手法は, 従来のエネルギー正規化損失よりも, セマンティックセグメンテーション, ロングテール画像分類, 画像分類におけるood検出に優れている。さらに, セマンティックセグメンテーションにおけるOOD検出と長期画像分類の2つのタスクにおいて, 最先端性能を実現する。コードはhttps://github.com/hyunjunChhoi/Balanced_Energyで入手できる。

In the field of out-of-distribution (OOD) detection, a previous method that uses auxiliary data as OOD data has shown promising performance. However, that method applies an equal loss to all auxiliary data to differentiate them from inliers. Based on our observation, in various tasks there is a general imbalance in the distribution of the auxiliary OOD data across classes. We propose a balanced energy regularization loss that is simple but generally effective for a variety of tasks. Our balanced energy regularization loss utilizes class-wise different prior probabilities for auxiliary data to address the class imbalance in OOD data. The main concept is to regularize auxiliary samples from majority classes more heavily than those from minority classes. Our approach performs better for OOD detection in semantic segmentation, long-tailed image classification, and image classification than the prior energy regularization loss. Furthermore, our approach achieves state-of-the-art performance in two tasks: OOD detection in semantic segmentation and long-tailed image classification. Code is available at https://github.com/hyunjunChhoi/Balanced_Energy.

翻訳日:2023-06-21 20:22:55 公開日:2023-06-18
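
The following sketch shows one way the idea in the abstract above could be written down. It assumes the usual energy score $E(x) = -T \log \sum_k e^{f_k(x)/T}$ over classifier logits and a squared hinge that pushes the energy of auxiliary samples above a margin. The prior-dependent weight `prior_weight` and the margin value are hypothetical stand-ins, since the abstract only states that samples resembling majority classes are regularized more heavily.

```python
import math

def energy_score(logits, temperature=1.0):
    """Negative temperature-scaled log-sum-exp of the logits (numerically stable)."""
    m = max(l / temperature for l in logits)
    return -temperature * (m + math.log(sum(math.exp(l / temperature - m) for l in logits)))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prior_weight(logits, class_prior):
    """Hypothetical weight: prior mass of the classes the sample resembles."""
    return sum(p * q for p, q in zip(softmax(logits), class_prior))

def balanced_ood_penalty(logits, class_prior, margin_out=-5.0):
    """Squared hinge on the energy of an auxiliary sample, scaled by its prior weight."""
    hinge = max(0.0, margin_out - energy_score(logits))
    return prior_weight(logits, class_prior) * hinge ** 2

if __name__ == "__main__":
    prior = [0.9, 0.1]                      # assumed class imbalance in the auxiliary data
    print(balanced_ood_penalty([8.0, 0.0], prior))   # resembles the majority class
    print(balanced_ood_penalty([0.0, 8.0], prior))   # resembles the minority class
```

With equal hinge values, the sample that looks like the majority class receives the larger penalty, which is the balancing effect the abstract describes.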

# STOIC2021 COVID-19 AIチャレンジ:再利用可能なトレーニング方法論をプライベートデータに適用

The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data ( http://arxiv.org/abs/2306.10484v1 )

ライセンス: Link先を確認

Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis, Ngoc Dung Huynh, Imran Razzak, Reda Bouadjenek, Mario Verdicchio, Pasquale Borrelli, Marco Aiello, James A. Meakin, Alexander Lemm, Christoph Russ, Razvan Ionasec, Nikos Paragios, Bram van Ginneken, and Marie-Pierre Revel Dubois

(参考訳) 課題は、自動医療画像分析の最先端を推進する。彼らが提供する公開トレーニングデータの量は、ソリューションのパフォーマンスを制限できる。これらのソリューションのトレーニング方法論へのパブリックアクセスはまだ残っていない。本研究は、プライベートデータ上でのトレーニングソリューションと再利用可能なトレーニング方法論を保証できるType Three (T3)チャレンジフォーマットを実装した。 T3では、チャレンジオーガナイザが参加者が提供するコードベースを、隔離されたトレーニングデータでトレーニングする。 T3はSTOIC2021チャレンジで実施され、CT(Computed tomography)スキャンから被験者が1ヶ月以内にインキュベーションまたは死亡と定義される重症なCOVID-19感染症を患っているかどうかを予測することを目的としている。 stoic2021は、2000年公開のctスキャンを使用してチャレンジソリューションを開発した資格フェーズと、9724名の被験者のctスキャンでソリューションをトレーニングしたトレーニング方法論を参加者が提出する最終フェーズで構成されていた。主催者は最終段階の8回のうち6回を修了した。トレーニングと実行のためのコードベースが公開された。勝利解は、重篤なCOVID-19と非重症なCOVID-19(0.815)の鑑別のために、受信機動作特性曲線の下にある領域を得た。すべてのファイナリストの最終フェーズソリューションは、予選フェーズソリューションによって改善されました。

Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.

翻訳日:2023-06-21 20:22:37 公開日:2023-06-18
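
The headline number in the abstract above is an area under the receiver operating characteristic curve (AUC). For readers unfamiliar with the metric, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The data in the example are made up for illustration and have nothing to do with the challenge.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # 1 = severe outcome (illustrative)
    scores = [0.1, 0.4, 0.35, 0.8]   # predicted probabilities (illustrative)
    print(roc_auc(labels, scores))
```

Under this reading, an AUC of 0.815 means a severe case outranks a non-severe case in roughly 81.5% of such pairs.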

# 重み付き構造テンソル総変動による画像の雑音化

Weighted structure tensor total variation for image denoising ( http://arxiv.org/abs/2306.10482v1 )

ライセンス: Link先を確認

Xiuhan Sheng and Jingya Changa

(参考訳) 本稿では, 画像復号化問題の変分枠組みに基づいて, 異方性全変量モデル (ATV) と構造テンソル全変量モデル (STV) を組み合わせた新しい画像復号化正規化手法を提案する。本モデルは,stvモデルにおけるパッチベースヤコビ行列に対して,atvモデルで提案する行列重み演算子を適用することにより,画像の1次情報を効果的に捕捉し,ノイズ処理中に局所的な特徴を維持できる。グレースケールとrgbカラー画像のノイズ除去実験により,提案手法は,全変量ベースモデルとstvモデルに基づく他の既知の手法と比較して,良好な修復品質が得られることが示された。

Based on the variational framework of the image denoising problem, we introduce a novel image denoising regularizer that combines anisotropic total variation model (ATV) and structure tensor total variation model (STV) in this paper. The model can effectively capture the first-order information of the image and maintain local features during the denoising process by applying the matrix weighting operator proposed in the ATV model to the patch-based Jacobian matrix in the STV model. Denoising experiments on grayscale and RGB color images demonstrate that the suggested model can produce better restoration quality in comparison to other well-known methods based on total-variation-based models and the STV model.

翻訳日:2023-06-21 20:22:17 公開日:2023-06-18

# if2net: 連続学習のためのインナートフリーネットワーク

IF2Net: Innately Forgetting-Free Networks for Continual Learning ( http://arxiv.org/abs/2306.10480v1 )

ライセンス: Link先を確認

Depeng Li, Tianqi Wang, Bingrong Xu, Kenji Kawaguchi, Zhigang Zeng, and Ponnuthurai Nagaratnam Suganthan

(参考訳) 継続的学習は、以前の学習した知識に干渉することなく、新しい概念を段階的に吸収することができる。ニューラルネットワークの特徴として,情報を接続に重み付けして格納する手法を考案し,連続学習環境におけるIF2Netの設計方法について検討した。本研究では,新しいタスクの学習前後において,各タスクに対する重み付けを巧みに保ちながら,単純かつ効果的な学習パラダイムを提案する。まず,ランダム重み付きタスク列の表現レベル学習について紹介した。このテクニックは、ランダム化によって引き起こされるドリフト表現を別々のタスク最適動作状態に調整することを指すが、関連する重みは凍結され、再利用される(重みの層的な更新がよく知られている)。そして、出力重み更新を同相直交空間に投影し、モデル可塑性を維持しながら古い知識を邪魔しないようにすることで、忘れることなく逐次意思決定を行うことができる。 IF2Netは、ランダム化と直交化のそれぞれの強みを統合することにより、テスト時にタスクの同一性を知ることなく、本質的に無制限のマッピングルールを学習することができる。理論解析および実証研究において,本手法の有効性を検証した。

Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, in which information is stored in weights on connections, we investigated how to design an Innately Forgetting-Free Network (IF2Net) for the continual learning context. This study proposed a straightforward yet effective learning paradigm by ingeniously keeping the weights relative to each seen task untouched before and after learning a new task. We first presented the novel representation-level learning on task sequences with random weights. This technique refers to tweaking the drifted representations caused by randomization back to their separate task-optimal working states, but the involved weights are frozen and reused (opposite to well-known layer-wise updates of weights). Then, sequential decision-making without forgetting can be achieved by projecting the output weight updates into the parsimonious orthogonal space, making the adaptations not disturb old knowledge while maintaining model plasticity. IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time by integrating the respective strengths of randomization and orthogonalization. We validated the effectiveness of our approach through extensive theoretical analysis and empirical study.

翻訳日:2023-06-21 20:22:02 公開日:2023-06-18
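
The orthogonal-projection step mentioned in the abstract above can be illustrated in its simplest form: for a linear output $y = w \cdot x$, removing from a weight update its component along a previously seen input leaves the output on that input unchanged. The sketch handles a single stored direction only and is a toy version of the idea, not the procedure used in IF2Net.

```python
def project_orthogonal(update, old_input):
    """Remove from `update` its component along `old_input`."""
    dot = sum(u * a for u, a in zip(update, old_input))
    norm2 = sum(a * a for a in old_input)
    if norm2 == 0:
        return list(update)
    return [u - dot / norm2 * a for u, a in zip(update, old_input)]

def linear_output(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

if __name__ == "__main__":
    w = [0.5, -1.0]
    old_x = [1.0, 0.0]              # an input from an earlier task
    raw_update = [1.0, 1.0]         # update suggested by the new task
    safe_update = project_orthogonal(raw_update, old_x)
    new_w = [wi + ui for wi, ui in zip(w, safe_update)]
    print(linear_output(w, old_x), linear_output(new_w, old_x))
```

The projected update still changes the weights in directions the old input does not use, which is how plasticity for the new task is retained.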

# ct金属アーティファクト低減のためのretinexflow

RetinexFlow for CT metal artifact reduction ( http://arxiv.org/abs/2306.10520v1 )

ライセンス: Link先を確認

Jiandong Su and Ce Wang and Yinsheng Li and Kun Shang and Dong Liang

(参考訳) 金属アーティファクトはctイメージングにおいて大きな課題であり、画質を著しく低下させ、正確な診断を困難にしている。しかし、従来の方法では、金属インプラントの位置の事前知識が必要か、あるいは高品質ct画像を得る能力を制限するアーティファクト形成のメカニズムによるモデリングの逸脱が必要である。本研究では,金属アーティファクト低減問題を分解と完了タスクの組合せとして定式化する。そこで本研究では,retinex理論と条件付き正規化フローに基づく,新たなエンドツーエンド画像ドメインモデルであるretinexflowを提案する。具体的には,金属インプラント成分と固有の成分を分解する機能分解エンコーダを設計し,その特性を抽出する。そして、機能対イメージフローモジュールを使用して、金属製のアーティファクトフリーCT画像ステップを、一連の可逆変換をステップで完了させる。これらの設計は粗細な戦略でモデルに組み込まれており、優れた性能を実現しています。シミュレーションおよび臨床データを用いた実験結果から,本手法はより定量的で質的な結果が得られ,アーティファクト除去や画像忠実度が向上することが示された。

Metal artifacts are a major challenge in computed tomography (CT) imaging, significantly degrading image quality and making accurate diagnosis difficult. However, previous methods either require prior knowledge of the location of metal implants or have modeling deviations from the mechanism of artifact formation, which limits the ability to obtain high-quality CT images. In this work, we formulate the metal artifact reduction problem as a combination of decomposition and completion tasks. To solve it, we propose RetinexFlow, a novel end-to-end image domain model based on Retinex theory and conditional normalizing flow. Specifically, we first design a feature decomposition encoder for decomposing the metal implant component and inherent component, and extracting the inherent feature. Then, it uses a feature-to-image flow module to complete the metal artifact-free CT image step by step through a series of invertible transformations. These designs are incorporated in our model with a coarse-to-fine strategy, enabling it to achieve superior performance. The experimental results on simulation and clinical datasets show that our method achieves better quantitative and qualitative results, exhibiting better visual performance in artifact removal and image fidelity.

翻訳日:2023-06-21 20:17:04 公開日:2023-06-18

# 量子プログラムのリファクタリングについて

On Refactoring Quantum Programs ( http://arxiv.org/abs/2306.10517v1 )

ライセンス: Link先を確認

Jianjun Zhao

(参考訳) リファクタリングは、ソフトウェアの内部設計を再構築し、外部の振る舞いを保ちながら、ソフトウェアの効率と保守性を改善する上で重要な技術である。古典的なプログラムは様々なリファクタリング手法の恩恵を受けているが、量子プログラミングの分野には専用のリファクタリング技法がない。量子重ね合わせ、絡み合い、非閉包原理といった量子コンピューティングの異なる性質は、特別なリファクタリング技術を必要とする。本稿では,量子プログラム専用に設計された包括的リファクタリングのセットを提示することで,このギャップを解消する。各リファクタリングは、量子プログラムの効果的な再構成を保証するために慎重に設計され、説明される。さらに,量子プログラムのリファクタリングプロセスの自動化におけるツールサポートの重要性を強調する。我々の研究は量子プログラミング言語Q\#に焦点を当てているが、我々のアプローチは他の量子プログラミング言語にも適用でき、量子ソフトウェアの保守性と効率を高めるための一般的なソリューションを提供する。

Refactoring is a crucial technique for improving the efficiency and maintainability of software by restructuring its internal design while preserving its external behavior. While classical programs have benefited from various refactoring methods, the field of quantum programming lacks dedicated refactoring techniques. The distinct properties of quantum computing, such as quantum superposition, entanglement, and the no-cloning principle, necessitate specialized refactoring techniques. This paper bridges this gap by presenting a comprehensive set of refactorings specifically designed for quantum programs. Each refactoring is carefully designed and explained to ensure the effective restructuring of quantum programs. Additionally, we highlight the importance of tool support in automating the refactoring process for quantum programs. Although our study focuses on the quantum programming language Q#, our approach is applicable to other quantum programming languages, offering a general solution for enhancing the maintainability and efficiency of quantum software.

翻訳日:2023-06-21 20:16:25 公開日:2023-06-18

# 群衆の生体信号検出のためのビジョンガイドMIMOレーダビームフォーミング

Vision Guided MIMO Radar Beamforming for Enhanced Vital Signs Detection in Crowds ( http://arxiv.org/abs/2306.10515v1 )

ライセンス: Link先を確認

Shuaifeng Jiang, Ahmed Alkhateeb, Daniel W. Bliss, and Yu Rong

(参考訳) リモートセンシング技術としてのレーダーは、人間の活動を分析するために何十年も使われてきた。モーション感度、プライバシー保護、透過性などの優れた特徴にもかかわらず、レーダーは光学センサーに比べて空間的自由度が制限されているため、事前情報なしで混雑した環境を感知することは困難である。本稿では,複数入力多重出力 (MIMO) レーダにおけるディジタルビームフォーミングの誘導に視覚センサを応用した,新しいデュアルセンシングシステムを開発した。また,2種類のセンサを整列するキャリブレーションアルゴリズムを開発し,キャリブレーションされたデュアルシステムは,$75^\circ$×$65^\circ$,及び2mの範囲で3次元空間における約2cm精度を達成可能であることを示した。最後に,実環境におけるバイタルサイン検出の有望な方向性を浮き彫りにした,座位と立位が密集した被験者群に対して,同時にバイタルサインを検出できることを示す。

Radar as a remote sensing technology has been used to analyze human activity for decades. Despite all the great features such as motion sensitivity, privacy preservation, penetrability, and more, radar has limited spatial degrees of freedom compared to optical sensors and thus makes it challenging to sense crowded environments without prior information. In this paper, we develop a novel dual-sensing system, in which a vision sensor is leveraged to guide digital beamforming in a multiple-input multiple-output (MIMO) radar. Also, we develop a calibration algorithm to align the two types of sensors and show that the calibrated dual system achieves about two centimeters precision in three-dimensional space within a field of view of $75^\circ$ by $65^\circ$ and for a range of two meters. Finally, we show that the proposed approach is capable of detecting the vital signs simultaneously for a group of closely spaced subjects, sitting and standing, in a cluttered environment, which highlights a promising direction for vital signs detection in realistic environments.

翻訳日:2023-06-21 20:16:00 公開日:2023-06-18
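
To give a feel for what "guiding digital beamforming" with an externally supplied direction means, the sketch below computes the response of a conventional (delay-and-sum) beamformer for a uniform linear array with half-wavelength spacing. The array geometry, the one-dimensional angle, and the function names are all assumptions for illustration. The system in the abstract works with a MIMO radar in three-dimensional space and a calibration step that is not modelled here.

```python
import cmath
import math

def steering_vector(angle_deg, n_antennas):
    """Array response of a half-wavelength-spaced uniform linear array."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * phase * n) for n in range(n_antennas)]

def beamformer_gain(steer_deg, source_deg, n_antennas):
    """Normalized magnitude response of a beam steered to `steer_deg`
    for a source located at `source_deg`."""
    w = steering_vector(steer_deg, n_antennas)
    a = steering_vector(source_deg, n_antennas)
    return abs(sum(wn.conjugate() * an for wn, an in zip(w, a))) / n_antennas

if __name__ == "__main__":
    camera_angle = 20.0   # hypothetical direction of a subject reported by the vision sensor
    for source in (20.0, 0.0, -30.0):
        print(source, round(beamformer_gain(camera_angle, source, 8), 3))
```

Steering at the angle reported by the camera gives full gain toward that subject and attenuates returns from other directions, which is what lets closely spaced subjects be separated.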

# プロンプトに基づくFew Shotテキスト分類のための進化的バーバリザ探索

Evolutionary Verbalizer Search for Prompt-based Few Shot Text Classification ( http://arxiv.org/abs/2306.10514v1 )

ライセンス: Link先を確認

Tongtao Ling, Lei Chen, Yutao Lai and Hai-Lin Liu

(参考訳) テキスト分類の最近の進歩は、テキスト入力をタスク固有のプロンプトでラップして質問をクローズすることを目的としている。マスク付き言語モデルでそれらを処理し、マスク付きトークンを予測し、予測された単語とターゲットラベルのマッピングを構成する動詞を用いた。事前訓練された言語モデルを使用するこのアプローチは、プロンプトベースのチューニングと呼ばれ、低データシナリオにおける従来の微調整アプローチよりも著しく優れている。プロンプトベースのチューニングのコアとして、動詞化語は通常、人間の努力で手作りされる。本稿では, 最適言語化器の自動構築に着目し, 高速言語化器による即興チューニングを改善するための新しい進化的言語化器探索アルゴリズムを提案する。具体的には、進化アルゴリズム(EA)にインスパイアされ、進化過程において様々な動詞を自動進化させ、何回か繰り返して最適なものを選択する。 5つのテキスト分類データセットに関する広範囲なサンプル実験を行い,本手法の有効性を示した。

Recent advances in few-shot text classification wrap textual inputs with task-specific prompts, turning them into cloze questions. A masked language model then predicts the masked tokens, and a verbalizer maps the predicted words to target labels. This approach of using pre-trained language models is called prompt-based tuning, and it can remarkably outperform the conventional fine-tuning approach in the low-data scenario. As the core of prompt-based tuning, the verbalizer is usually handcrafted with human effort or suboptimally searched by gradient descent. In this paper, we focus on automatically constructing the optimal verbalizer and propose a novel evolutionary verbalizer search (EVS) algorithm to improve prompt-based tuning with a high-performance verbalizer. Specifically, inspired by evolutionary algorithms (EA), we use one to automatically evolve various verbalizers during the evolutionary procedure and select the best one after several iterations. Extensive few-shot experiments on five text classification datasets show the effectiveness of our method.

翻訳日:2023-06-21 20:15:21 公開日:2023-06-18
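
A generic evolutionary search over verbalizers (mappings from each label to a label word) might look like the sketch below. It is a plain elitist evolutionary loop with crossover and mutation, written under the assumption that a `fitness` callable is available. In practice that would be few-shot validation accuracy of the prompted model, but here it is a toy stand-in, and none of the operators are taken from the paper.

```python
import random

def evolve_verbalizer(labels, vocab, fitness, generations=20, pop_size=8, rng=None):
    """Evolve a label -> word mapping that maximizes `fitness`."""
    rng = rng or random.Random(0)
    population = [{lab: rng.choice(vocab) for lab in labels} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]              # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = {lab: rng.choice((a[lab], b[lab])) for lab in labels}  # crossover
            if rng.random() < 0.3:                          # mutation
                child[rng.choice(labels)] = rng.choice(vocab)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    labels = ["positive", "negative"]
    vocab = ["great", "terrible", "table", "good", "bad", "blue"]
    # Toy fitness (assumption): reward words a human would pick for each label.
    good_words = {"positive": {"great", "good"}, "negative": {"terrible", "bad"}}
    toy_fitness = lambda v: sum(v[lab] in good_words[lab] for lab in labels)
    print(evolve_verbalizer(labels, vocab, toy_fitness))
```

Because the parents survive each generation unchanged, the best fitness in the population never decreases from one iteration to the next.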

# LLMの認知能力を効果的に測定する:適応的テストの観点から

Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective ( http://arxiv.org/abs/2306.10512v1 )

ライセンス: Link先を確認

Yan Zhuang, Qi Liu, Yuting Ning, Weizhe Huang, Rui Lv, Zhenya Huang, Guanhao Zhao, Zheng Zhang, Qingyang Mao, Shijin Wang, Enhong Chen

(参考訳) ChatGPTのような大型言語モデル(LLM)は、人間に似た認知能力を示している。これらの異なるモデルの能力を比較するために、異なる分野(文学、生物学、心理学など)のいくつかのベンチマーク(標準テスト質問の組)がしばしば採用され、精度、リコール、f1などの伝統的な指標によるテスト結果が報告されている。しかし、LCMの評価方法は認知科学の観点から非効率で不正確である。心理測定に使用されるCAT(Computerized Adaptive Testing)にヒントを得て,LLM評価のための適応テストフレームワークを提案する。標準的なテストセットを使用し、単に精度を報告するのではなく、モデルの性能に基づいて、難易度などのテスト問題の特徴を動的に調整する。これにより、より少ない質問を使ってモデルの能力をより正確に推定できる。さらに重要なのは、LLMを人間と簡単に比較できることであり、人間レベルの能力を目指すNLPモデルに必須である。診断報告によると、ChatGPTは「不注意な学生」のように振る舞うことが多く、時折質問を推測する傾向がある。対象知識,数学的推論,プログラミングの3つの側面から,gpt4が他のモデルを大幅に上回ることができ,中学生の認知能力に到達できる,詳細な診断を行い,最新の6つの指導調整llmをランク付けした。効率的な適応テストを使った異なるモデルの異なるテスト -- 私たちは、これは大きな言語モデルを評価するための新しい規範になる可能性があると信じています。

Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. To compare these abilities across models, several benchmarks (i.e., sets of standard test questions) from different fields (e.g., Literature, Biology and Psychology) are often adopted, and test results under traditional metrics such as accuracy, recall and F1 are reported. However, such a way of evaluating LLMs can be inefficient and inaccurate from the cognitive science perspective. Inspired by Computerized Adaptive Testing (CAT) used in psychometrics, we propose an adaptive testing framework for LLM evaluation. Rather than using a standard test set and simply reporting accuracy, this approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance. This allows for a more accurate estimation of the model's abilities, using fewer questions. More importantly, it allows LLMs to be compared with humans easily, which is essential for NLP models that aim for human-level ability. Our diagnostic reports have found that ChatGPT often behaves like a "careless student", prone to slips and occasionally guessing the answers. We conduct a fine-grained diagnosis and rank the latest 6 instruction-tuned LLMs on three aspects (Subject Knowledge, Mathematical Reasoning, and Programming), where GPT4 outperforms the other models significantly and reaches the cognitive ability of middle-level students. Different tests for different models using efficient adaptive testing -- we believe this has the potential to become a new norm in evaluating large language models.
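The adaptive loop described in this abstract can be illustrated with a small, generic sketch of computerized adaptive testing. It is not the authors' framework: the two-parameter logistic (2PL) item model, the maximum-information selection rule, the grid-based MAP ability estimate, and all function names are assumptions chosen for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (assumed model): P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information an item (discrimination a, difficulty b) carries at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_ability(responses):
    """MAP estimate of theta on a fixed grid, with a standard normal prior."""
    grid = [i / 10.0 for i in range(-40, 41)]

    def log_posterior(theta):
        lp = -0.5 * theta * theta
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            lp += math.log(p if correct else 1.0 - p)
        return lp

    return max(grid, key=log_posterior)

def adaptive_test(item_bank, answer_fn, n_questions):
    """Ask the most informative remaining item, then re-estimate ability."""
    theta = 0.0
    remaining = list(item_bank)
    responses = []
    for _ in range(n_questions):
        item = max(remaining, key=lambda it: fisher_information(theta, *it))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        theta = estimate_ability(responses)
    return theta, responses
```

Here `answer_fn` stands in for querying the model under test and grading its answer. Because each question is picked near the current ability estimate, fewer questions are spent on items that are far too easy or far too hard.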

翻訳日:2023-06-21 20:15:01 公開日:2023-06-18

# クロスドメイン・ファウショット学習のためのデュアル適応表現アライメント

Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning ( http://arxiv.org/abs/2306.10511v1 )

ライセンス: Link先を確認

Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian

(参考訳) ベース知識から学習することで、限られたサポートサンプルを持つ新規なクエリを認識することを目的としている。この設定の最近の進歩は、ベース知識と新しいクエリサンプルが同じドメインに分散されていることを前提としている。本稿では,対象領域で利用可能なサンプルが極端に少ないクロスドメイン・少数ショット学習問題に対処することを提案する。この現実的な環境下では,効果的な二重適応表現アライメントアプローチを提案することで,メタリーナーの迅速な適応能力に焦点をあてる。提案手法では,まず,サポートインスタンスをプロトタイプとして再検討し,それらのプロトタイプを識別可能なクローズドフォームソリューションで再計画する。したがって、学習知識の特徴空間は、クロスインスタンスとクロスプロトタイプの関係により、クエリ空間に適応的に変換することができる。機能アライメントの他に,サポートおよびクエリサンプル間の共変シフトを解決するために,クエリサンプルの事前統計値を利用する正規化分布アライメントモジュールも提示する。これら2つのモジュールにより、プログレッシブなメタ学習フレームワークが構築され、その一般化能力を維持しながら、極めて少数のサンプルを用いて高速な適応を行う。実験結果から,cdfslベンチマーク4回,細粒度クロスドメインベンチマーク4回において,新たな最先端結果が得られた。

Few-shot learning aims to recognize novel queries with limited support samples by learning from base knowledge. Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which is usually infeasible for realistic applications. To this end, we propose to address the cross-domain few-shot learning problem where only extremely few samples are available in target domains. Under this realistic setting, we focus on the fast adaptation capability of meta-learners by proposing an effective dual adaptive representation alignment approach. In our approach, a prototypical feature alignment is first proposed to recalibrate support instances as prototypes and reproject these prototypes with a differentiable closed-form solution. Therefore, feature spaces of learned knowledge can be adaptively transformed to query spaces by the cross-instance and cross-prototype relations. Besides the feature alignment, we further present a normalized distribution alignment module, which exploits prior statistics of query samples for solving the covariant shifts among the support and query samples. With these two modules, a progressive meta-learning framework is constructed to perform the fast adaptation with extremely few-shot samples while maintaining its generalization capabilities. Experimental evidence demonstrates our approach achieves new state-of-the-art results on 4 CDFSL benchmarks and 4 fine-grained cross-domain benchmarks.

翻訳日:2023-06-21 20:14:32 公開日:2023-06-18

# 人間対機械: 学生生成とAI生成の教育内容の比較

Human vs Machine: Comparison of Student-generated and AI-generated Educational Content ( http://arxiv.org/abs/2306.10509v1 )

ライセンス: Link先を確認

Paul Denny and Hassan Khosravi and Arto Hellas and Juho Leinonen and Sami Sarsa

(参考訳) パーソナライズされた学習体験を提供するオンライン学習プラットフォームに移行する学生が増えているため、高品質な教育コンテンツの生産には大きなニーズがある。大規模言語モデル(llm)は、大規模学習教材の迅速な作成に有望な解決策を提供し、インストラクターの負担を軽減する。本研究では,学習支援活動の一環として,LLMが生み出す資源の質を学生が生み出すものと比較することにより,導入プログラミングの文脈において学習資源を生み出す可能性を検討した。盲目評価を用いて、学生はaiとその仲間によって生成されたリソースの正確性と有用性を評価した。その結果,学生が認識するai生成資源の質は,仲間が生成する資源の質と同等であることがわかった。これは、AI生成資源が特定の文脈において有効な補助材料として機能する可能性を示唆している。 llmsが生成するリソースは与えられた例に忠実に反映する傾向があるが、学生が生成するリソースは、使用するコンテンツの長さと特定の構文の特徴の点で、より多種多様である。この研究は、さまざまなタイプの学習リソースと幅広い主題領域を探索し、AI生成リソースが学習結果に長期的な影響を理解することの必要性を強調している。

As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors. In this study, we investigated the potential for LLMs to produce learning resources in an introductory programming context, by comparing the quality of the resources generated by an LLM with those created by students as part of a learnersourcing activity. Using a blind evaluation, students rated the correctness and helpfulness of resources generated by AI and their peers, after both were initially provided with identical exemplars. Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to the quality of resources generated by their peers. This suggests that AI-generated resources may serve as viable supplementary material in certain contexts. Resources generated by LLMs tend to closely mirror the given exemplars, whereas student-generated resources exhibit greater variety in terms of content length and specific syntax features used. The study highlights the need for further research exploring different types of learning resources and a broader range of subject areas, and understanding the long-term impact of AI-generated resources on learning outcomes.

翻訳日:2023-06-21 20:14:10 公開日:2023-06-18

# qcnext:ジョイントマルチエージェント軌道予測のための次世代フレームワーク

QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2306.10508v1 )

ライセンス: Link先を確認

Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang

(参考訳) 路上エージェントの将来の軌跡の同時分布を推定することは自動運転に不可欠である。本稿では,QCNeXtと呼ばれるマルチエージェント軌道予測のための次世代フレームワークを提案する。まず,複合マルチエージェント軌道予測のタスクとして,クエリ中心のエンコーディングパラダイムを採用する。この符号化方式により, シーンエンコーダは, 設定要素の置換等価性, 空間次元の回転変換不変性, 時間次元の変換不変性を備える。これらの不変性は、精度の高いマルチエージェント予測を可能にするだけでなく、エンコーダにストリーミング処理能力を与える。第2に,エージェントの相互作用をモデル化することで,複数エージェントの軌道予測を容易にする多エージェントDETR型デコーダを提案する。連立予測モデルが限界指標においても限界予測モデルを上回ることが初めて示され,軌道予測における新たな研究機会が開かれた。我々の手法はArgoverse 2のマルチエージェントモーション予測ベンチマークで1位にランクされ、CVPR 2023 Workshop on Autonomous DrivingでArgoverse Challengeのチャンピオンを獲得した。

Estimating the joint distribution of on-road agents' future trajectories is essential for autonomous driving. In this technical report, we propose a next-generation framework for joint multi-agent trajectory prediction called QCNeXt. First, we adopt the query-centric encoding paradigm for the task of joint multi-agent trajectory prediction. Powered by this encoding scheme, our scene encoder is equipped with permutation equivariance on the set elements, roto-translation invariance in the space dimension, and translation invariance in the time dimension. These invariance properties not only enable accurate multi-agent forecasting fundamentally but also empower the encoder with the capability of streaming processing. Second, we propose a multi-agent DETR-like decoder, which facilitates joint multi-agent trajectory prediction by modeling agents' interactions at future time steps. For the first time, we show that a joint prediction model can outperform marginal prediction models even on the marginal metrics, which opens up new research opportunities in trajectory prediction. Our approach ranks 1st on the Argoverse 2 multi-agent motion forecasting benchmark, winning the championship of the Argoverse Challenge at the CVPR 2023 Workshop on Autonomous Driving.

翻訳日:2023-06-21 20:13:46 公開日:2023-06-18

# 非log-concave分布に対するMCMCアルゴリズムの高速条件混合

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions ( http://arxiv.org/abs/2306.10506v1 )

ライセンス: Link先を確認

Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu

(参考訳) MCMCアルゴリズムは、ターゲット分布$\pi(x) \propto \exp(-V(x))$からサンプリングするための経験的に効率的なツールを提供する。しかし理論側では、mcmcアルゴリズムは$\pi(x)$ が非log-concaveであるときに混合速度が遅い。我々の研究は、このギャップを検証し、ポアンカー型不等式が状態空間のサブセット$\mathcal{X}$に収まるとき、MCMC の条件分布は $\mathcal{X}$ より速く真の条件分布に混合することを示す。この高速混合保証は、グローバル混合が確実に遅い場合に保持することができる。ステートメントを形式化し,条件付き混合率を定量化する。さらに,条件付き混合はガウス型混合物のサンプリング,ガウス型混合モデルのパラメータ推定,局所的極小点のgibbsサンプリングに興味深い意味を持つことを示す。

MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from a slow mixing rate when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when a Poincaré-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.
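As a toy illustration of the gap between global and conditional mixing, the sketch below runs an unadjusted Langevin chain on an equal mixture of $N(-\mu, 1)$ and $N(\mu, 1)$. The choice of sampler, the target, the step size, and the function names are all assumptions made for illustration; the abstract does not specify them, and the paper's analysis is not reproduced here.

```python
import math
import random

def grad_potential(x, mu):
    """Gradient of V(x) = x^2/2 - log cosh(mu*x), i.e. pi(x) ∝ exp(-V(x))
    is an equal mixture of N(-mu, 1) and N(mu, 1)."""
    return x - mu * math.tanh(mu * x)

def langevin_chain(x0, mu, step, n_steps, seed=0):
    """Unadjusted Langevin iterates: x <- x - step * V'(x) + sqrt(2*step) * noise."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_potential(x, mu) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def conditional_mean(samples, in_subset):
    """Mean of the iterates that fall inside the subset of interest."""
    kept = [x for x in samples if in_subset(x)]
    return sum(kept) / len(kept)
```

With well-separated modes (for example `mu = 6.0`), a chain started in the right-hand mode is very unlikely to cross to the left-hand one within a few thousand steps, so its global distribution is far from $\pi$. Restricted to the subset $x > 0$, however, the iterates quickly resemble the conditional target, whose mean is close to $\mu$.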

翻訳日:2023-06-21 20:13:28 公開日:2023-06-18

# グラフ分類のための構造感性グラフ辞書

Structure-Sensitive Graph Dictionary Embedding for Graph Classification ( http://arxiv.org/abs/2306.10505v1 )

ライセンス: Link先を確認

Guangbu Liu, Tong Zhang, Xudong Wang, Wenting Zhao, Chuanwei Zhou, and Zhen Cui

(参考訳) グラフ構造表現は、様々なグラフを区別する上で重要な役割を果たす。本研究では,入力グラフをグラフ分類タスク用のグラフ辞書の埋め込み空間に変換するための,構造化グラフ辞書埋め込み(SS-GDE)フレームワークを提案する。本稿では,基本グラフ辞書を日常的に使用する代わりに,各入力グラフに対応するパーソナライズされた辞書(名前付きグラフ辞書)を生成するための変分グラフ辞書適応(VGDA)を提案する。特に,ベースグラフキーのサブ構造を各入力に応じて調整するためにベルヌーイサンプリングを導入することで,ベース辞書の表現能力を大幅に向上させる。クロスグラフ計測を高感度かつ安定にするために, 最適輸送に対するマルチスケールの注意を設計し, 多感度ワッサースタイン符号化法を提案する。この枠組みを最適化するために, 相互情報を目的として導入し, 適合グラフ辞書の変分推論にさらに寄与する。グラフ分類の複数のデータセット上でSS-GDEを行い、実験結果から最先端手法よりも有効性と優位性を示す。

Graph structure expression plays a vital role in distinguishing various graphs. In this work, we propose a Structure-Sensitive Graph Dictionary Embedding (SS-GDE) framework to transform input graphs into the embedding space of a graph dictionary for the graph classification task. Instead of a plain use of a base graph dictionary, we propose variational graph dictionary adaptation (VGDA) to generate a personalized dictionary (named adapted graph dictionary) that caters to each input graph. In particular, for the adaptation, Bernoulli sampling is introduced to adjust substructures of base graph keys according to each input, which increases the expression capacity of the base dictionary tremendously. To make cross-graph measurement sensitive as well as stable, multi-sensitivity Wasserstein encoding is proposed to produce the embeddings by designing multi-scale attention on optimal transport. To optimize the framework, we introduce mutual information as the objective, which further leads to variational inference of the adapted graph dictionary. We evaluate SS-GDE on multiple graph classification datasets, and the experimental results demonstrate its effectiveness and superiority over state-of-the-art methods.

翻訳日:2023-06-21 20:13:12 公開日:2023-06-18

# MARBLE:ユニバーサル評価のための音楽オーディオ表現ベンチマーク

MARBLE: Music Audio Representation Benchmark for Universal Evaluation ( http://arxiv.org/abs/2306.10548v1 )

ライセンス: Link先を確認

Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

(参考訳) 画像生成やフィクションの共創など、芸術と人工知能(AI)の広範な交差の時代において、音楽のためのAIは、特に音楽の理解において比較的初期段階にある。これは、深い音楽表現に関する限られた作業、大規模データセットの不足、普遍的でコミュニティ主導のベンチマークの欠如によって明らかである。この問題に対処するため,MARBLEと呼ばれるUniversaL評価のためのMusic Audio Representation Benchmarkを導入する。音響、パフォーマンス、スコア、ハイレベル記述を含む4つの階層レベルを持つ包括的分類を定義することで、様々な音楽情報検索(MIR)タスクのベンチマークを提供する。次に,8つの公開データセット上で14のタスクに基づく統一プロトコルを構築し,音楽録音をベースラインとして開発したオープンソース事前学習モデルの表現を公平かつ標準的に評価する。さらに、MARBLEは、データセットの著作権問題に関する明確な声明とともに、使いやすく、拡張可能で、再現可能なスイートをコミュニティに提供する。その結果、近年提案されている大規模事前学習型言語モデルは、多くのタスクにおいて最善を尽くし、さらなる改善の余地があることがわかった。 leaderboardと toolkitリポジトリは、将来の音楽ai研究を促進するためにhttps://marble-bm.shef.ac.ukで公開されている。

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest that recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.

翻訳日:2023-06-21 20:05:37 公開日:2023-06-18

# UniMC:関係表現学習による長期記憶会話のための統一フレームワーク

UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning ( http://arxiv.org/abs/2306.10543v1 )

ライセンス: Link先を確認

Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang

(参考訳) オープンドメインの長期記憶会話は、人間との長期的な親密性を確立することができ、鍵となるのは、長期の対話履歴情報を理解し記憶する能力である。既存の作業は、パイプラインを通じてモデリングする複数のモデルを統合することで、異なるステージ間の結合を無視します。本稿では,関係表現を学習することで異なるステージ間の接続を増加させる,長期記憶会話(unimc)のための統一フレームワークを提案する。具体的には、主タスクを確率グラフに基づいて3つのサブタスクに分解する。 1)会話要約 2)メモリ検索 3)メモリ拡張世代。各サブタスクは、デコーダ入力の先頭に特別なトークンを挿入することによってモデル化されたクエリとメモリ間の関連性を計算する表現を学習する。関連表現学習は、パラメータ共有と合同トレーニングを通じてサブタスク間の接続を強化する。実験結果から,提案手法は強いベースラインよりも一貫して改善され,対話の一貫性と係合性が向上することが示された。

Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation. Specifically, we decompose the main task into three subtasks based on probability graphs: 1) conversation summarization, 2) memory retrieval, 3) memory-augmented generation. Each subtask involves learning a representation for calculating the relevance between the query and memory, which is modelled by inserting a special token at the beginning of the decoder input. The relevance representation learning strengthens the connection across subtasks through parameter sharing and joint training. Extensive experimental results show that the proposed method consistently improves over strong baselines and yields better dialogue consistency and engagingness.

翻訳日:2023-06-21 20:05:15 公開日:2023-06-18

# 畳み込みニューラルネットワークにおける負の情報強化の学習

Learn to Enhance the Negative Information in Convolutional Neural Network ( http://arxiv.org/abs/2306.10536v1 )

ライセンス: Link先を確認

Zhicheng Cai, Chenglei Peng, Qiu Shen

(参考訳) 本稿では,畳み込みニューラルネットワーク(CNN)に特化して学習可能な非線形活性化機構を提案する。負のニューロンを切断し「死のReLU」の問題に苦しむReLUとは対照的に、LENIは死んだ神経細胞を再構築し、情報損失を減らす能力を持っている。改良されたReLUと比較して、LENIは負相情報をより適切に処理するための学習可能なアプローチを導入している。これにより、LENIはReLUの本来の利点を維持しつつ、モデル表現能力を大幅に向上させることができる。汎用的なアクティベーションメカニズムとして、レニはポータビリティの特性を持ち、アクティベーション層を単にレニブロックに置き換えることで、任意のcnnモデルで容易に利用できる。大規模な実験により、LENIは様々なベンチマークデータセット上の様々なベースラインモデルの性能を、明確なマージン(ImageNet-1kで最大1.24%高いトップ1精度)で、無視できる余分なパラメータで改善できることが確認された。さらなる実験では、LENIがチャネル補償機構として機能し、競争力や性能が向上するが、ベースラインモデルよりも学習パラメータが少ないことが示されている。さらに、LENIは表現能力の向上に寄与するモデル構造に非対称性を導入する。可視化実験を通じて、LENIがより多くの情報を保持し、より多くの表現を学習できることを検証する。

This paper proposes a learnable nonlinear activation mechanism specifically for convolutional neural networks (CNNs), termed LENI, which learns to enhance the negative information in CNNs. In sharp contrast to ReLU, which cuts off the negative neurons and suffers from the "dying ReLU" issue, LENI has the capacity to reconstruct the dead neurons and reduce the information loss. Compared to improved ReLUs, LENI introduces a learnable approach to process the negative phase information more properly. In this way, LENI can enhance the model's representational capacity significantly while maintaining the original advantages of ReLU. As a generic activation mechanism, LENI is portable and can be easily used in any CNN model by simply replacing the activation layers with LENI blocks. Extensive experiments validate that LENI can improve the performance of various baseline models on various benchmark datasets by a clear margin (up to 1.24% higher top-1 accuracy on ImageNet-1k) with negligible extra parameters. Further experiments show that LENI can act as a channel compensation mechanism, offering competitive or even better performance but with fewer learned parameters than baseline models. In addition, LENI introduces asymmetry to the model structure, which contributes to the enhancement of representational capacity. Through visualization experiments, we validate that LENI can retain more information and learn more representations.

翻訳日:2023-06-21 20:05:00 公開日:2023-06-18

# ProMIL: 医用画像の確率的多重学習

ProMIL: Probabilistic Multiple Instance Learning for Medical Imaging ( http://arxiv.org/abs/2306.10535v1 )

ライセンス: Link先を確認

Łukasz Struski, Dawid Rymarczyk, Arkadiusz Lewicki, Robert Sabiniewicz, Jacek Tabor, Bartosz Zieliński

(参考訳) マルチインスタンスラーニング(MIL)は、ひとつのラベルがインスタンスの袋全体に割り当てられる弱い教師付き問題である。 MILモデルの重要なクラスはインスタンスベースで、まずインスタンスを分類し、その予測を集約してバッグラベルを取得する。最も一般的なMILモデルは、バッグが正のラベルを持つ場合、そのインスタンスの少なくとも1つが正のラベルを持つ場合である。しかし、この推論は、ポジティブなバッグラベルが特定のポジティブなインスタンスのパーセンテージの結果であるような、多くの現実のシナリオでは成り立たない。この問題に対処するために,深層ニューラルネットワークとベルンシュタイン多項式推定に基づく,ProMILと呼ばれる専用インスタンスベースの手法を提案する。 ProMILの重要な利点は、意思決定に最適なパーセンテージを自動的に検出できることである。 ProMILは実世界の医療応用において標準のインスタンスベースMILよりも優れていることを示す。コードを利用可能にします。

Multiple Instance Learning (MIL) is a weakly-supervised problem in which one label is assigned to the whole bag of instances. An important class of MIL models is instance-based, where we first classify instances and then aggregate those predictions to obtain a bag label. The most common MIL model is when we consider a bag as positive if at least one of its instances has a positive label. However, this reasoning does not hold in many real-life scenarios, where the positive bag label is often a consequence of a certain percentage of positive instances. To address this issue, we introduce a dedicated instance-based method called ProMIL, based on deep neural networks and Bernstein polynomial estimation. An important advantage of ProMIL is that it can automatically detect the optimal percentage level for decision-making. We show that ProMIL outperforms standard instance-based MIL in real-world medical applications. We make the code available.
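The difference between the standard MIL assumption and a percentage-based one can be sketched in a few lines. This is only an illustration of the two aggregation rules: the function names and the hand-set thresholds are assumptions, and the Bernstein polynomial estimation that ProMIL uses to find the percentage level automatically is not reproduced.

```python
def bag_label_standard(instance_probs, instance_threshold=0.5):
    """Standard assumption: a bag is positive if at least one instance is positive."""
    return any(p >= instance_threshold for p in instance_probs)

def bag_label_percentage(instance_probs, min_positive_fraction, instance_threshold=0.5):
    """Percentage-based rule: a bag is positive only if enough instances are positive.

    min_positive_fraction is fixed by hand here; in the abstract's description
    this level is detected automatically.
    """
    positives = sum(p >= instance_threshold for p in instance_probs)
    return positives / len(instance_probs) >= min_positive_fraction
```

For a bag such as `[0.9, 0.1, 0.2, 0.1, 0.3]`, the standard rule returns a positive label because of the single confident instance, while the percentage rule with `min_positive_fraction=0.4` does not.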

翻訳日:2023-06-21 20:04:35 公開日:2023-06-18

# データ拡張によるグラフ異常検出モデルの一般化性の向上

Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation ( http://arxiv.org/abs/2306.10534v1 )

ライセンス: Link先を確認

Shuang Zhou, Xiao Huang, Ninghao Liu, Huachi Zhou, Fu-Lai Chung, Long-Kai Huang

(参考訳) グラフ異常検出(GAD)は、少数の異常でさえ、良心的なユーザーに大きな脅威をもたらす可能性があるため、重要なタスクである。従来の知識として利用可能なラベルを効果的に活用できる最近の半教師付きGAD法は、教師なし手法よりも優れた性能を実現している。実際には、人々はビジネスを確保するために新しい(サブ)グラフ上の異常を識別する必要があるが、効果的な検出モデルをトレーニングするラベルが欠落している可能性がある。自然なアイデアのひとつは、トレーニング済みのGADモデルをテスト用の新しい(サブ)グラフに直接導入することだ。しかし、既存の半教師付きGAD法は一般化の問題に悩まされており、例えば、よく訓練されたモデルは、同じグラフの見えない領域(つまり、トレーニングではアクセスできない)ではうまく機能しない。それは大きなトラブルを引き起こすかもしれない。本稿では,この現象を基礎として,学習領域グラフと未発見テストグラフの両方の異常を効果的に識別し,潜在的な危険を解消することを目的とした,一般化グラフ異常検出の一般的かつ新しい研究問題を提案する。それでも、限られたラベルしか利用できないため、通常のバックグラウンドはトレーニングとテストデータの違いがあるため、難しい作業です。そこで本研究では,学習データを充実させ,GADモデルの一般化性を高めるために,AugAN (Augmentation for Anomaly and Normal distributions) というデータ拡張手法を提案する。モデル一般化性向上における本手法の有効性を検証する。

Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved better performance than unsupervised methods. In practice, people usually need to identify anomalies on new (sub)graphs to secure their business, but they may lack labels to train an effective detection model. One natural idea is to directly apply a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from a poor generalization issue, i.e., well-trained models do not perform well on an unseen area (i.e., not accessible in training) of the same graph, which may cause serious problems. Building on this observation, we propose a general and novel research problem of generalized graph anomaly detection that aims to effectively identify anomalies on both the training-domain graph and the unseen testing graph to eliminate potential dangers. Nevertheless, it is a challenging task since only limited labels are available, and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.

翻訳日:2023-06-21 20:04:20 公開日:2023-06-18

# 事前学習したテキストから画像への拡散モデルによるポイントクラウド補完

Point-Cloud Completion with Pretrained Text-to-image Diffusion Models ( http://arxiv.org/abs/2306.10533v1 )

ライセンス: Link先を確認

Yoni Kasten, Ohad Rahamim, Gal Chechik

(参考訳) 実世界のアプリケーションで収集されるポイントクラウドデータは、しばしば不完全である。データは一般的に、特定の視点や角度しか捉えない部分的な視点から観察されるオブジェクトのために欠落している。さらに、オクルージョンと低解像度サンプリングのため、データは不完全である。既存の補完アプローチは、ノイズと不完全な点雲の完成を導くために、事前に定義されたオブジェクトのデータセットに依存している。しかし、これらのアプローチは、トレーニングデータセットでは不十分なOOD(Out-Of-Distribution)オブジェクトでテストすると、パフォーマンスが悪くなります。ここでは,近年のテキストガイド画像生成の進歩を活かし,テキストガイド形状生成の大きなブレークスルーを導いた。本稿では,事前学習したテキストから画像への拡散モデルを用いて,与えられた不完全点クラウドのテキストセマンティクスを活用し,完全な表面表現を得るsds完全というアプローチについて述べる。 SDS-Completeは、高価な3D情報を集めることなく、テスト時間最適化を用いて様々なオブジェクトを補完することができる。実世界の深度センサとLiDARスキャナーで捉えた不完全なスキャン対象に対するSDS完全性を評価する。一般的なデータセットから欠落したオブジェクトを効果的に再構築し、現在の手法と比較して、Chamferの損失を平均50%削減する。プロジェクトページ: https://sds-complete.github.io/

Point-cloud data collected in real-world applications are often incomplete. Data is typically missing due to objects being observed from partial viewpoints, which only capture a specific perspective or angle. Additionally, data can be incomplete due to occlusion and low-resolution sampling. Existing completion approaches rely on datasets of predefined objects to guide the completion of noisy and incomplete point clouds. However, these approaches perform poorly when tested on Out-Of-Distribution (OOD) objects that are poorly represented in the training dataset. Here we leverage recent advances in text-guided image generation, which led to major breakthroughs in text-guided shape generation. We describe an approach called SDS-Complete that uses a pre-trained text-to-image diffusion model and leverages the text semantics of a given incomplete point cloud of an object to obtain a complete surface representation. SDS-Complete can complete a variety of objects using test-time optimization without expensive collection of 3D information. We evaluate SDS-Complete on incomplete scanned objects, captured by real-world depth sensors and LiDAR scanners. We find that it effectively reconstructs objects that are absent from common datasets, reducing Chamfer loss by 50% on average compared with current methods. Project page: https://sds-complete.github.io/

翻訳日:2023-06-21 20:03:52 公開日:2023-06-18

# GenPose:拡散モデルによる生成カテゴリレベルのオブジェクトポス推定

GenPose: Generative Category-level Object Pose Estimation via Diffusion Models ( http://arxiv.org/abs/2306.10531v1 )

ライセンス: Link先を確認

Jiyao Zhang, Mingdong Wu and Hao Dong

(参考訳) オブジェクトのポーズ推定は、AIとコンピュータビジョンの具体化において重要な役割を果たす。カテゴリーレベルのポーズ推定の実用性にもかかわらず、現在のアプローチは、マルチハイポテーゼ問題として知られる部分的観測点雲の課題に遭遇する。本研究では,カテゴリーレベルのオブジェクトポーズ推定を条件付き生成モデルとして再検討し,従来のポイント・ツー・ポイント回帰から外れた新しい解を提案する。スコアベース拡散モデルを利用して、拡散モデルから候補をサンプリングし、2段階のプロセスでそれらを集約することによりオブジェクトのポーズを推定する。確率を推定する際のコストのかかる統合プロセスを回避するため,従来のスコアベースモデルからエネルギーベースモデルを訓練し,エンドツーエンドの推定を可能にする方法を提案する。提案手法は, 厳密な5d2cmおよび5d5cmで50%, 60%以上の精度でREAL275データセット上での最先端性能を実現する。さらに,本手法は,類似の対称特性を微調整せずに共有する新しいカテゴリに対して高い一般化性を示し,オブジェクトポーズ追跡タスクに容易に適応でき,現在の最先端ベースラインに匹敵する結果が得られることを示した。

Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches encounter challenges with partially observed point clouds, known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics, respectively. Furthermore, our method demonstrates strong generalizability to novel categories sharing similar symmetric properties without fine-tuning and can readily adapt to object pose tracking tasks, yielding comparable results to the current state-of-the-art baselines.
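The two-step aggregation mentioned in this abstract (filter candidates by estimated likelihood, then mean-pool the rest) can be sketched as follows. This is a simplified illustration, not the GenPose implementation: candidates are plain translation vectors, the log-likelihoods are assumed to be given, and the name `aggregate_candidates` and its `n_keep` parameter are hypothetical. Averaging rotations needs more care than the component-wise mean used here.

```python
def aggregate_candidates(candidates, log_likelihoods, n_keep):
    """Keep the n_keep most likely candidates and return their component-wise mean.

    candidates: list of equal-length lists of floats (e.g. 3-D translations).
    log_likelihoods: one score per candidate; higher means more plausible.
    """
    ranked = sorted(
        zip(log_likelihoods, candidates), key=lambda pair: pair[0], reverse=True
    )
    kept = [candidate for _, candidate in ranked[:n_keep]]
    dim = len(kept[0])
    return [sum(c[i] for c in kept) / len(kept) for i in range(dim)]
```

Dropping low-likelihood samples before pooling keeps a few implausible hypotheses from dragging the average away from the dominant mode.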

翻訳日:2023-06-21 20:03:31 公開日:2023-06-18

# トランスフォーマーモデルにおけるジェンダーバイアス:包括的調査

Gender Bias in Transformer Models: A comprehensive survey ( http://arxiv.org/abs/2306.10530v1 )

ライセンス: Link先を確認

Praneeth Nemani, Yericherla Deepak Joel, Palla Vijay, Farhana Ferdousi Liza

(参考訳) 人工知能(AI)におけるジェンダーバイアスは、個人の生活に深く影響する懸念として浮上している。本稿では,トランスフォーマーモデルにおけるジェンダーバイアスを言語学的観点から調査する。言語モデルにおけるジェンダーバイアスの存在は以前の研究で認識されているが、このバイアスを効果的に測定し評価する方法に関するコンセンサスが不足している。本研究はトランスフォーマーにおけるジェンダーバイアスに関する既存の文献を批判的に検討し,バイアス評価に使用される多様な方法論と指標について考察した。現在のトランスフォーマにおける性別バイアス測定のアプローチには、不完全あるいは欠陥のあるメトリクスの利用、不適切なデータセットサイズ、評価方法の標準化の欠如など、いくつかの制限がある。さらに,対話システムや機械翻訳など,下流アプリケーション用トランスフォーマーにおけるジェンダーバイアスの潜在的影響について検討した。我々は、言語技術の開発と展開における認識の高まりと説明責任の必要性を強調し、これらのシステムにおける公平性と公平性を育むことの重要性を強調している。本稿では、トランスフォーマーモデルにおけるジェンダーバイアスの包括的概要として、新しい洞察を提供し、この重要な領域における将来の研究に有用な方向性を提供する。

Gender bias in artificial intelligence (AI) has emerged as a pressing concern with profound implications for individuals' lives. This paper presents a comprehensive survey that explores gender bias in Transformer models from a linguistic perspective. While the existence of gender bias in language models has been acknowledged in previous studies, there remains a lack of consensus on how to effectively measure and evaluate this bias. Our survey critically examines the existing literature on gender bias in Transformers, shedding light on the diverse methodologies and metrics employed to assess bias. Several limitations in current approaches to measuring gender bias in Transformers are identified, encompassing the utilization of incomplete or flawed metrics, inadequate dataset sizes, and a dearth of standardization in evaluation methods. Furthermore, our survey delves into the potential ramifications of gender bias in Transformers for downstream applications, including dialogue systems and machine translation. We underscore the importance of fostering equity and fairness in these systems by emphasizing the need for heightened awareness and accountability in developing and deploying language technologies. This paper serves as a comprehensive overview of gender bias in Transformer models, providing novel insights and offering valuable directions for future research in this critical domain.

翻訳日:2023-06-21 20:03:06 公開日:2023-06-18

# 線形モデルにおけるDropout Regularization Versus $\ell_2$-Penalization

Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model ( http://arxiv.org/abs/2306.10529v1 )

ライセンス: Link先を確認

Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

(参考訳) 線形回帰モデルにおける降下を伴う勾配降下の統計的挙動について検討する。特に、イテレートの期待と共分散行列に対する非漸近境界が導出される。期待値におけるドロップアウトと$\ell_2$-レギュライゼーションの相関が広く引用されているのとは対照的に、この結果は勾配降下ダイナミクスとドロップアウトによって引き起こされる追加のランダム性との相互作用により、はるかに微妙な関係を示している。また,正規化効果を持たず,最小二乗推定器に収束するドロップアウトの簡易変種についても検討した。

We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for expectations and covariance matrices of the iterates are derived. In contrast with the widely cited connection between dropout and $\ell_2$-regularization in expectation, the results indicate a much more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. We also study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.
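For context, the widely cited connection in expectation can be written out as a short derivation. The notation is an assumption made here, not taken from the paper: $Y$ is the response vector, $X$ the design matrix, $\beta$ the coefficient vector, and $D$ a diagonal matrix with i.i.d. Bernoulli($p$) entries, where $p$ is the retention probability.

```latex
% Averaging the dropout least-squares loss over the dropout mask D:
\mathbb{E}_D \bigl\lVert Y - X D \beta \bigr\rVert_2^2
  = \bigl\lVert Y - p X \beta \bigr\rVert_2^2
  + p(1-p)\, \beta^\top \operatorname{Diag}(X^\top X)\, \beta .

% Setting the gradient to zero gives the minimizer
% (assuming the matrix in parentheses is invertible):
\hat{\beta}
  = \bigl( p\, X^\top X + (1-p) \operatorname{Diag}(X^\top X) \bigr)^{-1} X^\top Y .
```

The second term of the averaged loss is a weighted $\ell_2$ penalty on $\beta$. This describes the averaged objective only. The abstract's point is that gradient descent iterates with freshly sampled dropout masks relate to it in a more subtle way.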

翻訳日:2023-06-21 20:02:46 公開日:2023-06-18

# 測定の最適化による暗カウント効果の低減

Reduce dark count effects by optimizing measurements ( http://arxiv.org/abs/2306.10525v1 )

ライセンス: Link先を確認

Hao Shu

(参考訳) 量子タスクを実践する場合、デバイスの不完全性を考慮する必要がある。中でも、重要かつ未解決な問題の1つは、単一光子検出器によるダークカウント効果である。本稿では,これらの問題を考察し,実用的検出器を用いた暗視計数効果のロバスト性を反映した測定の新たな最適性を定義する。また、一般計測のための最適化スキームを提供する。この研究は、測定値の選択を最適化してダークカウント効果を扱おうとする最初の研究であり、この問題はスキームによって軽減できると信じている。

When implementing quantum tasks in practice, the imperfection of devices should be taken into account. Among these, one of the significant but unsolved problems is the dark count effect caused by single photon detectors. In this paper, we consider this issue and define a new optimality for measurements, reflecting the robustness against dark count effects with practical detectors. An optimization scheme for general measurements is also provided. This research could be the first to handle dark count effects by optimizing the choice of measurements, and we believe that the problem can be reduced by the scheme.

翻訳日:2023-06-21 20:02:35 公開日:2023-06-18

# OpenDataVal: データ評価のための統一ベンチマーク

OpenDataVal: a Unified Benchmark for Data Valuation ( http://arxiv.org/abs/2306.10577v1 )

ライセンス: Link先を確認

Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon

(参考訳) 個々のデータポイントの品質と影響を評価することは、モデルパフォーマンスを改善し、トレーニングデータセット内の望ましくないバイアスを軽減するために重要です。データ品質を定量化するためにいくつかのデータ評価アルゴリズムが提案されているが、データ評価のための体系的で標準化されたベンチマークシステムがない。本稿では、研究者や実践者が様々なデータ評価アルゴリズムを適用して比較できるようにする、使いやすく統一されたベンチマークフレームワークOpenDataValを紹介する。 OpenDataValは統合された環境を提供する (i)画像、自然言語、表形式のデータセットの多種多様なコレクション。 (ii)9種類の最先端データ評価アルゴリズムの実装、及び (iii) scikit-learnで任意のモデルをインポート可能な予測モデルapi。さらに、データ値の品質を評価するための4つの下流機械学習タスクを提案する。我々はOpenDataValを用いてベンチマーク分析を行い、最先端データ評価手法の有効性を定量化し比較する。一つのアルゴリズムが全てのタスクに対して一様に最善を尽くすことはなく、ユーザの下流タスクに適切なアルゴリズムを適用すべきである。 OpenDataValはhttps://opendataval.github.ioで公開されている。さらに、研究者が自身のデータバリュエーションアルゴリズムの有効性を評価できるリーダーボードを提供する。

Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality; however, a systematic and standardized benchmarking system for data valuation is lacking. In this paper, we introduce OpenDataVal, an easy-to-use and unified benchmark framework that empowers researchers and practitioners to apply and compare various data valuation algorithms. OpenDataVal provides an integrated environment that includes (i) a diverse collection of image, natural language, and tabular datasets, (ii) implementations of nine different state-of-the-art data valuation algorithms, and (iii) a prediction model API that can import any model in scikit-learn. Furthermore, we propose four downstream machine learning tasks for evaluating the quality of data values. We perform benchmarking analysis using OpenDataVal, quantifying and comparing the efficacy of state-of-the-art data valuation approaches. We find that no single algorithm performs uniformly best across all tasks, and an appropriate algorithm should be employed for a user's downstream task. OpenDataVal is publicly available at https://opendataval.github.io with comprehensive documentation. Furthermore, we provide a leaderboard where researchers can evaluate the effectiveness of their own data valuation algorithms.
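
To make the idea of a data value concrete, here is a minimal sketch of leave-one-out valuation, a classic baseline in which a training point's value is the drop in validation accuracy when that point is removed. It deliberately does not use the OpenDataVal API. The 1-nearest-neighbour model, the function names, and the toy data are all assumptions chosen for this illustration.

```python
def nearest_label(train, x):
    """Predict the label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, valid):
    """Fraction of validation points classified correctly by 1-NN on train."""
    if not train:
        return 0.0
    return sum(nearest_label(train, x) == y for x, y in valid) / len(valid)


def leave_one_out_values(train, valid):
    """Value of point i = accuracy with all points - accuracy without point i."""
    full = accuracy(train, valid)
    return [
        full - accuracy(train[:i] + train[i + 1:], valid)
        for i in range(len(train))
    ]


# One-dimensional toy data as (feature, label); the last point is mislabeled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (4.4, 0)]
valid = [(0.5, 0), (4.5, 1), (4.6, 1), (0.8, 0)]

values = leave_one_out_values(train, valid)
print(values)  # [0.0, 0.0, 0.0, 0.0, -0.5]
```

The mislabeled point receives a negative value because removing it raises validation accuracy. Detecting such points is one typical use of data values, although the four downstream tasks in the paper are not reproduced here.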

翻訳日:2023-06-21 19:56:32 公開日:2023-06-18

# スコアに基づくデータ同化

Score-based Data Assimilation ( http://arxiv.org/abs/2306.10574v1 )

ライセンス: Link先を確認

François Rozet and Gilles Louppe

(参考訳) データ同化は、最も包括的な形で、確率力学系のノイズまたは不完全な観察を説明する可塑性状態軌跡を特定するベイズ逆問題に対処する。粒子法や変分法などの様々な手法が提案されている。しかし、ほとんどのアルゴリズムは、長期間の地平線や、海洋や大気のような複雑な力学を持つ高次元システムにとって、推論の遷移力学に依存している。本研究では,軌道推定のためのスコアに基づくデータ同化について述べる。我々は、任意の長さの軌道のスコアを短いセグメントで一連のスコアに分解できるというキーインサイトに基づいて、状態軌道のスコアに基づく生成モデルを学ぶ。トレーニング後、全ての状態を同時に生成して非自己回帰的にスコアモデルを用いて推論を行う。極めて特筆すべきは、トレーニング手順から観察モデルを分離し、推論時にのみ使用して生成過程をガイドし、幅広いゼロショット観察シナリオを可能にすることである。本手法の有効性を裏付ける理論的,実証的な証拠を提示する。

Data assimilation, in its most comprehensive form, addresses the Bayesian inverse problem of identifying plausible state trajectories that explain noisy or incomplete observations of stochastic dynamical systems. Various approaches have been proposed to solve this problem, including particle-based and variational methods. However, most algorithms depend on the transition dynamics for inference, which becomes intractable for long time horizons or for high-dimensional systems with complex dynamics, such as oceans or atmospheres. In this work, we introduce score-based data assimilation for trajectory inference. We learn a score-based generative model of state trajectories based on the key insight that the score of an arbitrarily long trajectory can be decomposed into a series of scores over short segments. After training, inference is carried out using the score model, in a non-autoregressive manner by generating all states simultaneously. Quite distinctively, we decouple the observation model from the training procedure and use it only at inference to guide the generative process, which enables a wide range of zero-shot observation scenarios. We present theoretical and empirical evidence supporting the effectiveness of our method.

翻訳日:2023-06-21 19:56:16 公開日:2023-06-18

# 高次相関関数における強および超強結合の操作

Manifestation of strong and ultra-strong coupling in high-order correlation function ( http://arxiv.org/abs/2306.10573v1 )

ライセンス: Link先を確認

A. S. Belashov, E. S. Andrianov, A. A. Zyablovsky

(参考訳) キャビティ-単一原子」系における強結合と超強結合は、基礎物理学と応用物理学の両方に大きな関心を持つ。キャビティモードと原子との結合強度の増加は、第一に弱結合から強結合へ、第二に超強結合状態へ遷移すると考えられる。この書簡では、この共通の意見を反論し、カップリングレジーム間の遷移が異なる順序の相関関数に対して異なる順序で起こることを実証する。また, n次相関関数の場合, 強結合状態への遷移には, 1次相関関数の結合強度が約$n^{2/3} 大きいことが判明した。対照的に、超強結合状態への移行は、第一次相関関数の動力学よりも結合強度が低いn次相関関数の動力学に現れる。その結果、カップリング強度の増加が第一に弱いカップリングから超強結合への遷移、第二に強結合状態へと繋がる相関関数の次数が存在する。高次相関関数の測定は、結合強度が振動周波数の10分の1以下である場合、「キャビティモード-単一原子」における超強結合を観測できると主張している。

Strong and ultra-strong coupling in the "cavity - single atom" system are of great interest for both fundamental and applied physics. It is commonly assumed that increasing the coupling strength between a cavity mode and an atom leads first to a transition from weak to strong coupling and then to the ultra-strong coupling regime. In this letter, we refute this common opinion and demonstrate that the transitions between the coupling regimes occur in different sequences for correlation functions of different orders. We show that for n-th order correlation functions, the transition to the strong coupling regime requires a coupling strength approximately $n^{2/3}$ times greater than that for first-order correlation functions. In contrast, the transition to the ultra-strong coupling regime manifests in the dynamics of n-th order correlation functions at a lower coupling strength than in the dynamics of first-order correlation functions. As a result, there is an order of correlation functions above which increasing the coupling strength leads first to the transition from weak coupling to the ultra-strong coupling regime, and only then to the strong coupling regime. We argue that measuring high-order correlation functions makes it possible to observe ultra-strong coupling in a "cavity mode - single atom" system when the coupling strength is much less than one tenth of the oscillation frequency.

翻訳日:2023-06-21 19:55:59 公開日:2023-06-18

# 最短共通スーパーストリングとテキスト集合問題に対する量子アルゴリズム

Quantum Algorithms for the Shortest Common Superstring and Text Assembling Problems ( http://arxiv.org/abs/2306.10572v1 )

ライセンス: Link先を確認

Kamil Khadiev, Carlos Manuel Bosch Machado, Zeyu Chen, Junde Wu

(参考訳) 本稿では,テキスト集合問題の2つのバージョンについて考察する。文字列の列$s^1,\dots,s^n$ of total length $l$(辞書)と$t$ of length $m$(テキスト)が与えられます。問題の最初のバージョンは、辞書から$t$を組み立てることである。 2番目のバージョンは ``Shortest Superstring Problem' (SSP) または ``Shortest Common Superstring Problem' (SCS) である。この場合、$t$は与えられず、与えられたシーケンスから各文字列をサブストリングとして含む最短文字列(スーパーストリングと呼ぶ)を構築するべきです。これらの問題は、小さな断片から長いDNA配列を再構成する配列アセンブリー法に関連付けられている。どちらの問題に対しても、従来のアルゴリズムよりも優れた量子アルゴリズムを提案する。最初のケースでは、$O(m+\log m\sqrt{nL})$ run time の量子アルゴリズムを示す。 SSP の場合、実行時間 $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$ の量子アルゴリズムを示す。

In this paper, we consider two versions of the Text Assembling problem. We are given a sequence of strings $s^1,\dots,s^n$ of total length $L$ that is a dictionary, and a string $t$ of length $m$ that is a text. The first version of the problem is assembling $t$ from the dictionary. The second version is the ``Shortest Superstring Problem'' (SSP), also known as the ``Shortest Common Superstring Problem'' (SCS). In this case, $t$ is not given, and we should construct the shortest string (we call it a superstring) that contains each string from the given sequence as a substring. These problems are connected with the sequence assembly method for reconstructing a long DNA sequence from small fragments. For both problems, we suggest new quantum algorithms that outperform their classical counterparts. In the first case, we present a quantum algorithm with $O(m+\log m\sqrt{nL})$ running time. In the case of SSP, we present a quantum algorithm with running time $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$.
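
For readers unfamiliar with the second problem, the sketch below solves the Shortest Common Superstring problem classically by brute force over all orderings. It is only a reference implementation for tiny inputs, with factorial running time, and has nothing to do with the quantum algorithm of the paper. The function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(strings):
    """Return a shortest string containing every input string as a substring."""
    # Remove duplicates, then strings already contained in another string.
    uniq = list(dict.fromkeys(strings))
    core = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        # Glue the strings in this order, reusing the maximal overlap each time.
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


result = shortest_superstring(["abc", "bcd", "cde"])
print(result)  # abcde
```

Once contained strings are removed, merging each ordering with maximal overlaps yields the shortest superstring for that ordering, so trying every ordering is exact. Classical dynamic programming over subsets reduces the cost to roughly $2^n$ up to polynomial factors, which is the kind of exponential base that the $1.728^n$ term in the abstract improves on.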

翻訳日:2023-06-21 19:55:37 公開日:2023-06-18

# スピングラス系における量子重ね合わせと絡み合い

Quantum Superposition and Entanglement in Spin-Glass Systems ( http://arxiv.org/abs/2306.10571v1 )

ライセンス: Link先を確認

Aslı Tuncer and Serhat C. Kadıoğlu

(参考訳) スピングラスは、ポテンシャル配置を含む等しく可能性の高い重ね合わせ状態(sss)に存在することを提案する。我々は、Edward-Anderson(EA)の秩序パラメータと磁化を利用して、SG、強磁性(FM)、常磁性(PM)相などの磁気秩序(秩序)の識別への寄与に基づいて、これらのSSの分類手法を確立する。また,様々なシステムサイズを包含し,これらの相依存SSの絡み合い特性を負性測定を用いて検討した。解析の結果,SG秩序パラメータは磁性秩序(秩序のずれ)相の絡み合い特性を決定するのに有効であり,その逆も磁性秩序の存在を示す負性を示す。具体的には,スピングラス系における負性率と感受性の関係について検討する。本研究はスピングラスと量子磁石における量子重ね合わせの役割についてさらなる知見を与える。

We propose that spin glasses can exist in equally probable superposition states (SSs) comprising potential configurations. Employing the Edwards-Anderson (EA) order parameter and magnetization, we establish a classification scheme for these SSs based on their contribution to discerning magnetic order (disorder), such as SG, ferromagnetic (FM), and paramagnetic (PM) phases. We also consider various system sizes and investigate the entanglement properties of these phase-dependent SSs using the negativity measure. Our analysis reveals that the SG order parameter can be employed to determine the entanglement characteristics of magnetically ordered (disordered) phases, and vice versa, with negativity indicating the presence of magnetic order. Specifically, we examine the relationship between negativity and susceptibility in spin-glass systems. Our findings provide further insight into the role of quantum superposition in spin glasses and quantum magnets.

翻訳日:2023-06-21 19:55:15 公開日:2023-06-18

# MIR-GAN:アダベリアルネットワークを用いたフレームレベルモード不変表現の精製

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition ( http://arxiv.org/abs/2306.10567v1 )

ライセンス: Link先を確認

Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng

(参考訳) 音声視覚音声認識(AVSR)は、近年、人間の発話を理解するためにマルチモーダル信号を活用することで、研究の関心が高まりつつある。この課題に対処する主流のアプローチは、マルチモーダリティ融合と表現学習のための高度なアーキテクチャと技術を開発した。しかし、異なるモダリティの自然な不均一性は、それらの表現間の分布ギャップを生じさせ、それらを融合させることを困難にする。本稿では,モダリティ間の共通表現を学習してギャップを埋めることを目的とする。感情分析などの他のマルチモーダルタスクにおける既存の類似手法とは異なり,avsrのシーケンス間タスク設定を考慮した時間的文脈依存性に注目した。特に,フレームレベルのモダリティ不変表現(MIR-GAN)を改良する対角ネットワークを提案する。 LRS3 と LRS2 の公開ベンチマークによる大規模な実験により,我々の手法は最先端技術よりも優れていることが示された。

Audio-visual speech recognition (AVSR) has recently attracted a surge of research interest by leveraging multimodal signals to understand human speech. Mainstream approaches to this task have developed sophisticated architectures and techniques for multi-modality fusion and representation learning. However, the natural heterogeneity of different modalities causes a distribution gap between their representations, making it challenging to fuse them. In this paper, we aim to learn shared representations across modalities to bridge this gap. Unlike existing similar methods for other multimodal tasks such as sentiment analysis, we focus on temporal contextual dependencies, considering the sequence-to-sequence task setting of AVSR. In particular, we propose an adversarial network to refine frame-level modality-invariant representations (MIR-GAN), which captures the commonality across modalities to ease the subsequent multimodal fusion process. Extensive experiments on the public benchmarks LRS3 and LRS2 show that our approach outperforms the state of the art.

翻訳日:2023-06-21 19:55:01 公開日:2023-06-18

# 雑音中の唇の聴取:ロバストな音声認識のための普遍音素マッピングと伝達

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition ( http://arxiv.org/abs/2306.10563v1 )

ライセンス: Link先を確認

Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

(参考訳) AVSR(Audio-visual speech Recognition)は、視覚情報を用いた音声のみの音声認識のノイズロス性を改善するための有望なソリューションを提供する。しかし, AVSRタスクの優越性を考慮して, 音質改善に重点を置いており, フロントエンドの雑音処理などの雑音適応技術が注目されている。効果はあるものの、これらの手法は通常2つの実践的な課題に直面している。 1) 実環境シナリオにおける騒音発声・視聴覚訓練の十分なラベルの欠如と課題 2) テストノイズに対する最適モデル一般性は低い。本研究では,非教師なし雑音適応の学習データに依存することなく,どのテストノイズにも適応できるavsrの頑健性を高めるために,雑音不変な視覚モダリティについて検討する。人間の知覚機構に着想を得て,視覚信号からクリーンな音声を復元し,雑音のある環境下での音声認識を可能にする,普遍的な音素マッピング(UniVPM)手法を提案する。 LRS3 と LRS2 のベンチマーク実験により, 様々なノイズや清潔な条件下での最先端性を実現することができた。また,視覚音声認識タスクにおける先行技術よりも優れていた。

Audio-visual speech recognition (AVSR) provides a promising solution for improving the noise robustness of audio-only speech recognition with visual information. However, most existing efforts still focus on the audio modality to improve robustness, considering its dominance in the AVSR task, with noise adaptation techniques such as front-end denoising. Though effective, these methods usually face two practical challenges: 1) a lack of sufficient labeled noisy audio-visual training data in some real-world scenarios and 2) suboptimal generalization to unseen testing noises. In this work, we investigate the noise-invariant visual modality to strengthen the robustness of AVSR, which can adapt to any testing noise without depending on noisy training data, i.e., unsupervised noise adaptation. Inspired by the human perception mechanism, we propose a universal viseme-phoneme mapping (UniVPM) approach to implement modality transfer, which can restore clean audio from visual signals to enable speech recognition under any noisy conditions. Extensive experiments on the public benchmarks LRS3 and LRS2 show that our approach achieves state-of-the-art results under various noisy as well as clean conditions. In addition, we also outperform the previous state of the art on the visual speech recognition task.

翻訳日:2023-06-21 19:54:43 公開日:2023-06-18

# 原表現定理のイザベル形式化

Isabelle Formalisation of Original Representation Theorems ( http://arxiv.org/abs/2306.10558v1 )

ライセンス: Link先を確認

Marco B. Caminati

(参考訳) 最近の論文では、巨大なデータベース上のクロスサイトデータマイニングと、既存のイザベルで検証されたイベント構造列挙アルゴリズムに基づいて、明らかに無関係な数学的対象(並行性理論と計算生物学における全グラフからのイベント構造)をリンクする新たな定理が発見された。そのような定理の起源と新しさを考えると、それらの形式的検証は特に望ましい。本稿では,Isabelle/HOL定義と定理による検証を行い,そのプロセスにおける技術的課題を明らかにする。導入された形式化は、Isabelleで検証されたイベント構造列挙アルゴリズムの完全な検証フレームワークへの検証を完了し、イベント構造を完全なグラフにリンクする。

In a recent paper, new theorems linking apparently unrelated mathematical objects (event structures from concurrency theory and full graphs arising in computational biology) were discovered by cross-site data mining on huge databases, and building on existing Isabelle-verified event structures enumeration algorithms. Given the origin and newness of such theorems, their formal verification is particularly desirable. This paper presents such a verification via Isabelle/HOL definitions and theorems, and exposes the technical challenges found in the process. The introduced formalisation completes the verification of Isabelle-verified event structure enumeration algorithms into a fully verified framework to link event structures to full graphs.

翻訳日:2023-06-21 19:54:20 公開日:2023-06-18

# リーダボードから実践への要約:表現バックボーンの選択と堅牢性確保

Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness ( http://arxiv.org/abs/2306.10555v1 )

ライセンス: Link先を確認

David Demeter, Oshin Agarwal, Simon Ben Igeri, Marko Sterbentz, Neil Molino, John M. Conroy, Ani Nenkova

(参考訳) 学術文献は、既存の研究コンポーネントから最大限の顧客向け要約システムを構築する方法についてはあまりガイダンスを提供していない。本稿では,一般的なモデルからシステムバックボーンの選択を知らせる解析を行い,自動評価と人間評価の両方において,bartがペガサスやt5よりも優れた性能を示す。また,クロスドメインを適用した場合,要約処理の性能が著しく低下することが判明した。同時に、異種ドメインに微調整されたシステムは、すべてのドメインでうまく動作し、幅広いドメインの要約に最も適します。我々の研究は異種ドメイン要約ベンチマークの必要性を強調している。システム出力のかなりのばらつきは、人間による評価だけでは捉えられず、自動評価だけでは標準のリーダーボードに反映されそうにない。

Academic literature does not give much guidance on how to build the best possible customer-facing summarization system from existing research components. Here we present analyses to inform the selection of a system backbone from popular models; we find that in both automatic and human evaluation, BART performs better than PEGASUS and T5. We also find that when applied cross-domain, summarizers exhibit considerably worse performance. At the same time, a system fine-tuned on heterogeneous domains performs well on all domains and will be most suitable for a broad-domain summarizer. Our work highlights the need for heterogeneous domain summarization benchmarks. We find considerable variation in system output that can be captured only with human evaluation and are thus unlikely to be reflected in standard leaderboards with only automatic evaluation.

翻訳日:2023-06-21 19:54:06 公開日:2023-06-18

# 予測モデルは因果推論に使用できるか?

Can predictive models be used for causal inference? ( http://arxiv.org/abs/2306.10551v1 )

ライセンス: Link先を確認

Maximilian Pichler and Florian Hartig

(参考訳) 機械学習 (ML) と深層学習 (DL) アルゴリズムは予測タスクに優れるが、一般的には非因果関係を利用して、解釈可能性と一般化可能性の両方を制限すると仮定される。ここでは,この説明と予測のトレードオフが,期待したほど深く,基本的なものではないことを示す。 MLとDLのアルゴリズムは、すべてのデータに不特定に入力された場合の予測に非因果的特徴を用いる傾向にあるが、Pearlのバックドア調整基準に従って特徴を選択することで、任意のMLとDLアルゴリズムの学習プロセスを制限することができる。このような状況では、いくつかのアルゴリズム、特にディープニューラルネットワークは、特徴コリニアリティの下でほぼ偏りのない効果推定を提供することができる。残されるバイアスは、特定のアルゴリズム構造とハイパーパラメータ選択によって説明される。その結果、予測や推論のために調整された場合、最適なハイパーパラメータ設定が異なり、予測と説明の間のトレードオフの一般的な期待を確認する。しかし、このトレードオフの効果は因果的に制約された特徴選択の効果と比較して小さい。したがって、特徴間の因果関係が説明されれば、予測と説明の差は一般的に想定されるよりもはるかに小さくなる。また,このような因果制約のあるモデルが,共線形構造が変化した新しいデータに対してより一般化することを示し,一般化の失敗はしばしば因果学習の欠如によるものであることを示唆する。以上の結果から,mlモデルを用いて(causal)効果を推定する視点を提供するだけでなく,新しいデータに対するmlモデルとdlモデルの一般化可能性の向上にも寄与した。

Supervised machine learning (ML) and deep learning (DL) algorithms excel at predictive tasks, but it is commonly assumed that they often do so by exploiting non-causal correlations, which may limit both interpretability and generalizability. Here, we show that this trade-off between explanation and prediction is not as deep and fundamental as expected. Whereas ML and DL algorithms will indeed tend to use non-causal features for prediction when fed indiscriminately with all data, it is possible to constrain the learning process of any ML and DL algorithm by selecting features according to Pearl's backdoor adjustment criterion. In such a situation, some algorithms, in particular deep neural networks, can provide near unbiased effect estimates under feature collinearity. Remaining biases are explained by the specific algorithmic structures as well as hyperparameter choice. Consequently, optimal hyperparameter settings are different when tuned for prediction or inference, confirming the general expectation of a trade-off between prediction and explanation. However, the effect of this trade-off is small compared to the effect of a causally constrained feature selection. Thus, once the causal relationship between the features is accounted for, the difference between prediction and explanation may be much smaller than commonly assumed. We also show that such causally constrained models generalize better to new data with altered collinearity structures, suggesting generalization failure may often be due to a lack of causal learning. Our results not only provide a perspective for using ML for inference of (causal) effects but also help to improve the generalizability of fitted ML and DL models to new data.
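
The effect of causally constrained feature selection can be seen even in a linear toy model. The sketch below simulates a confounder `z` that drives both the feature `x` and the outcome `y`, then compares a regression on `x` alone with one that also adjusts for `z`, as the backdoor criterion would require here. The data-generating process, coefficients, and function names are assumptions made for this example and are unrelated to the experiments in the paper, which use ML and DL models.

```python
import random


def ols_two(x1, x2, y):
    """Least-squares coefficients of y on (x1, x2) without intercept."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations by Cramer's rule.
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def simulate(n=20000, seed=0):
    """Return (naive, adjusted) estimates of the effect of x on y (true value 1)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]                 # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]                  # z -> x
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1)              # x -> y and z -> y
         for xi, zi in zip(x, z)]
    naive = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    adjusted, _ = ols_two(x, z, y)
    return naive, adjusted


naive, adjusted = simulate()
```

In this setup the naive slope concentrates around 2, because it absorbs the confounded path through `z`, while the adjusted slope concentrates around the true effect of 1.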

翻訳日:2023-06-21 19:53:53 公開日:2023-06-18

# グリオーマにおける経時的MRI画像解析のための深層学習に基づくグループ登録法

Deep learning-based group-wise registration for longitudinal MRI analysis in glioma ( http://arxiv.org/abs/2306.10611v1 )

ライセンス: Link先を確認

Claudia Chinea Hammecher, Karin van Garderen, Marion Smits, Pieter Wesseling, Bart Westerman, Pim French, Mathilde Kouwenhoven, Roel Verhaak, Frans Vos, Esther Bron and Bo Li

(参考訳) グリオーマの成長は縦断画像登録で定量化できる。しかし、画像全体にわたる大きな質量効果と組織の変化は、さらなる課題をもたらす。本稿では,グリオーマMRIの正確かつ偏りのない登録のための縦断的,学習的,集団的登録法を提案する。我々は,Glioma Longitudinal AnalySiSコンソーシアムのデータセットを評価し,古典的な登録手法と比較した。より詳細な登録で同等のDice係数を実現し、ランタイムを1分以内で大幅に削減します。提案手法は、グリオーマの成長に関するさらなる知見を提供するため、古典的なツールボックスの代替として機能する可能性がある。

Glioma growth may be quantified with longitudinal image registration. However, the large mass-effects and tissue changes across images pose an added challenge. Here, we propose a longitudinal, learning-based, and groupwise registration method for the accurate and unbiased registration of glioma MRI. We evaluate on a dataset from the Glioma Longitudinal AnalySiS consortium and compare it to classical registration methods. We achieve comparable Dice coefficients, with more detailed registrations, while significantly reducing the runtime to under a minute. The proposed methods may serve as an alternative to classical toolboxes, to provide further insight into glioma growth.

翻訳日:2023-06-21 19:45:46 公開日:2023-06-18

# STHG:空間時間不均一グラフ学習による高度なオーディオ・ビジュアルダイアリゼーション

STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization ( http://arxiv.org/abs/2306.10608v1 )

ライセンス: Link先を確認

Kyle Min

(参考訳) 本稿では,Ego4D Challenge 2023の音声・視覚ダイアリゼーションタスクにおけるSTHGという新しい手法を紹介する。キーとなるイノベーションは、単一の一元的なグラフ学習フレームワークを使用して、ビデオ内のすべての話者をモデル化することです。カメラ装着者のみに独立したコンポーネントを必要とする従来のアプローチとは異なり、STHGはカメラ装着者を含む全ての人の音声活動を共同で検出することができる。最終手法はEgo4Dのテストセット上で61.1%のDERを得るが、これは昨年の勝者と同様に全てのベースラインを著しく上回っている。 Ego4D Challenge 2023で1位を獲得した。また,本課題では,sthgによるダイアリゼーション音声セグメントに市販音声認識システムを適用することで,音声認識課題における競合性能が向上することを示す。

This report introduces our novel method named STHG for the Audio-Visual Diarization task of the Ego4D Challenge 2023. Our key innovation is that we model all the speakers in a video using a single, unified heterogeneous graph learning framework. Unlike previous approaches that require a separate component solely for the camera wearer, STHG can jointly detect the speech activities of all people including the camera wearer. Our final method obtains 61.1% DER on the test set of Ego4D, which significantly outperforms all the baselines as well as last year's winner. Our submission achieved 1st place in the Ego4D Challenge 2023. We additionally demonstrate that applying the off-the-shelf speech recognition system to the diarized speech segments by STHG produces a competitive performance on the Speech Transcription task of this challenge.

翻訳日:2023-06-21 19:45:35 公開日:2023-06-18

# 表象によるデコンゲーション:市場における経済福祉改善のための学習

Decongestion by Representation: Learning to Improve Economic Welfare in Marketplaces ( http://arxiv.org/abs/2306.10606v1 )

ライセンス: Link先を確認

Omer Nahum, Gali Noti, David Parkes, Nir Rosenfeld

(参考訳) 混雑は、消費者が同じ商品のサブセット(例えば、休暇の賃貸プラットフォームで同じ小さな資産を追求するなど)で非効率に競争する市場において共通の失敗モードである。典型的な経済的な話では、物価は市場を切り離すために需給のバランスをとることでこの問題を解決している。しかし、現代のオンラインマーケットプレースでは、価格は通常、売り手によって分散された方法で設定される。このことは、プラットフォームがこの力を使って、混雑を減らして社会福祉を改善する表現を学習する、表現による混雑の現在の研究を動機付けている。技術的な課題は2つある — 真の評価ではなく、ユーザの過去の選択から明らかな選好のみに依存すること、そして、どの機能を明かすか、本質的にコンビネーションであるかを判断する表現を扱うこと、の2つだ。我々は、消費者選択データに基づいてエンドツーエンドで訓練できる福祉の差別化可能なプロキシを提案することで、両方の課題に取り組む。脱便が福祉を促進するための十分な条件を与える理論を提供し、我々の設定とアプローチに光を当てる合成データと実データの両方について実験を行う。

Congestion is a common failure mode of markets, where consumers compete inefficiently on the same subset of goods (e.g., chasing the same small set of properties on a vacation rental platform). The typical economic story is that prices solve this problem by balancing supply and demand in order to decongest the market. But in modern online marketplaces, prices are typically set in a decentralized way by sellers, with the power of a platform limited to controlling representations -- the information made available about products. This motivates the present study of decongestion by representation, where a platform uses this power to learn representations that improve social welfare by reducing congestion. The technical challenge is twofold: relying only on revealed preferences from users' past choices, rather than true valuations; and working with representations that determine which features to reveal and are inherently combinatorial. We tackle both by proposing a differentiable proxy of welfare that can be trained end-to-end on consumer choice data. We provide theory giving sufficient conditions for when decongestion promotes welfare, and present experiments on both synthetic and real data shedding light on our setting and approach.

翻訳日:2023-06-21 19:45:20 公開日:2023-06-18

# Fermi-Hubbardモデルに対するコンピュテータスケーリングによるトラッター誤差

Trotter error with commutator scaling for the Fermi-Hubbard model ( http://arxiv.org/abs/2306.10603v1 )

ライセンス: Link先を確認

Ansgar Schubert and Christian B. Mendl

(参考訳) 一般トロッター積公式の小さな因子による高階誤差境界を導出し、子などの結果を一般化する。 (第11回第11回第011020回(2021年)) 次に、これらの境界をフェルミ・ハバード・ハミルトニアンによって支配される実時間量子時間発展作用素に1次元および2次元の正方格子および三角形格子に応用する。我々の研究の主な技術的貢献は、与えられた格子幾何学のホッピングと相互作用項の間の入れ子交換子の象徴的評価である。この計算は、時間ステップとハミルトニアン係数の項による誤差境界の明示的な表現をもたらす。実際のtrotterエラー(小さなシステムで評価された)と比較すると、バウンダリがエラーを過大評価していることがわかる。

We derive higher-order error bounds with small prefactors for a general Trotter product formula, generalizing a result of Childs et al. [Phys. Rev. X 11, 011020 (2021)]. We then apply these bounds to the real-time quantum time evolution operator governed by the Fermi-Hubbard Hamiltonian on one-dimensional and two-dimensional square and triangular lattices. The main technical contribution of our work is a symbolic evaluation of nested commutators between hopping and interaction terms for a given lattice geometry. The calculations result in explicit expressions for the error bounds in terms of the time step and Hamiltonian coefficients. Comparison with the actual Trotter error (evaluated on a small system) indicates that the bounds still overestimate the error.
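
For orientation, the simplest instance of commutator scaling is the well-known first-order bound for a Hamiltonian split into two Hermitian terms. Writing $T$ for the hopping part and $V$ for the interaction part (labels assumed here for illustration), a single step and $r$ repeated steps satisfy the following in the operator norm. The higher-order bounds derived in the paper involve nested commutators and are not reproduced here.

```latex
\left\| e^{-it(T+V)} - e^{-itV} e^{-itT} \right\|
  \le \frac{t^{2}}{2} \left\| [V, T] \right\|,
\qquad
\left\| e^{-it(T+V)} - \left( e^{-i\frac{t}{r}V} e^{-i\frac{t}{r}T} \right)^{r} \right\|
  \le \frac{t^{2}}{2r} \left\| [V, T] \right\|
```

The second inequality follows from the first by applying it to each of the $r$ steps of length $t/r$ and summing the errors with the triangle inequality.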

翻訳日:2023-06-21 19:44:58 公開日:2023-06-18

# dropcompute: 計算分散低減による、シンプルでより堅牢な分散同期トレーニング

DropCompute: simple and more robust distributed synchronous training via compute variance reduction ( http://arxiv.org/abs/2306.10598v1 )

ライセンス: Link先を確認

Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry

(参考訳) 背景: ディープニューラルネットワーク(DNN)の大規模トレーニングには分散トレーニングが不可欠である。大規模DNNトレーニングの主要な方法は同期(All-Reduceなど)であるが、各ステップですべてのワーカーを待つ必要がある。このように、これらの方法は、重労働による遅延によって制限される。結果: 計算時間の変動によって作業員が行き詰まる典型的なシナリオについて検討した。計算時間特性とスケーラビリティの制約との間には,このような乱雑な作業者によって引き起こされる解析的な関係がある。そこで本研究では,作業者間のばらつきを低減し,同期訓練の堅牢性を向上させるための簡易かつ効果的な分散化手法を提案する。この方法は広く使われているall-reduceと統合できる。本研究は,200ガウディ加速器を用いた大規模トレーニングタスクで検証した。

Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each step. Thus, these methods are limited by the delays caused by straggling workers. Results: We study a typical scenario in which workers are straggling due to variability in compute time. We find an analytical relation between compute time properties and scalability limitations, caused by such straggling workers. With these findings, we propose a simple yet effective decentralized method to reduce the variation among workers and thus improve the robustness of synchronous training. This method can be integrated with the widely used All-Reduce. Our findings are validated on large-scale training tasks using 200 Gaudi Accelerators.

翻訳日:2023-06-21 19:44:45 公開日:2023-06-18

# コンパクトカーネルによる条件付き期待

Conditional expectation via compact kernels ( http://arxiv.org/abs/2306.10592v1 )

ライセンス: Link先を確認

Suddhasattwa Das

(参考訳) 2つの確率変数の積から生じる条件付き期待を見出すという共通の設定において、微分、条件付き期待、および多様体学習の別々のタスクがしばしば表される。本稿では、このより一般的な問題に焦点をあて、条件付き期待値を推定する演算子理論的アプローチについて述べる。カーネル積分作用素は、再生カーネルヒルベルト空間における線形逆問題として推定問題を設定するためのコンパクト化ツールとして用いられる。この方程式は数値近似に安定な解を持つことが示されており、データ駆動実装の収束を保証する。全体的なテクニックは実装が容易で、現実世界の問題に対する彼らの成功例も示されています。

The separate tasks of denoising, conditional expectation and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that are stable to numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and its successful application to some real-world problems is also shown.

翻訳日:2023-06-21 19:44:34 公開日:2023-06-18

# 量子コンピュータを用いた機械学習における特徴選択

Quantum computer based Feature Selection in Machine Learning ( http://arxiv.org/abs/2306.10591v1 )

ライセンス: Link先を確認

Gerhard Hellstern, Vanessa Dehn, Martin Zaefferer

(参考訳) 本稿では,教師付き学習問題における適切な特徴数を選択する問題について検討する。機械学習の一般的な手法を出発点として、特徴選択タスクを古典的数値手法や量子計算フレームワークで扱うことができる二次的非拘束最適化問題(qubo)として扱う。異なる結果と小さな問題設定を比較した。本研究の結果から,QUBO法が他の特徴選択法より優れているか否かは,データセットに依存することがわかった。 27の特徴を持つより大きなデータセットの拡張として、量子コンピューティングによるQUBO法の収束挙動と古典的確率的最適化法を比較する。誤差率の持続により、古典確率最適化法は依然として優れている。

The problem of selecting an appropriate number of features in supervised learning problems is investigated in this paper. Starting with common methods in machine learning, we treat the feature selection task as a quadratic unconstrained binary optimization (QUBO) problem, which can be tackled with classical numerical methods as well as within a quantum computing framework. We compare the different results in small-sized problem setups. According to the results of our study, whether the QUBO method outperforms other feature selection methods depends on the data set. In an extension to a larger data set with 27 features, we compare the convergence behavior of the QUBO methods via quantum computing with classical stochastic optimization methods. Due to persisting error rates, the classical stochastic optimization methods are still superior.
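
One way to picture feature selection as a QUBO is sketched below: diagonal entries reward relevant features, off-diagonal entries penalize redundant pairs, and a binary vector minimizing the quadratic form marks the selected features. The weighting scheme, the `alpha` parameter, the scores, and the function names are assumptions for this illustration and need not match the formulation in the paper. The solver is plain enumeration, which only works for a handful of features.

```python
from itertools import product


def feature_selection_qubo(relevance, redundancy, alpha=0.5):
    """Build an upper-triangular QUBO matrix from relevance and redundancy scores."""
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -alpha * relevance[i]                   # reward picking feature i
        for j in range(i + 1, n):
            Q[i][j] = (1 - alpha) * redundancy[i][j]      # penalize redundant pairs
    return Q


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q):
    """Enumerate all 2^n binary vectors and return one with minimal energy."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))


# Features 0 and 1 are both relevant but nearly redundant with each other.
relevance = [0.9, 0.8, 0.1]
redundancy = [
    [0.0, 0.95, 0.05],
    [0.95, 0.0, 0.05],
    [0.05, 0.05, 0.0],
]
Q = feature_selection_qubo(relevance, redundancy)
selection = solve_qubo_brute_force(Q)
print(selection)  # (1, 0, 1)
```

The minimizer keeps the most relevant feature, drops its near-duplicate, and adds the weakly relevant but non-redundant third feature. A quantum annealer or a classical stochastic optimizer would replace `solve_qubo_brute_force` for larger problems.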

翻訳日:2023-06-21 19:44:23 公開日:2023-06-18

# 二重強汎函数のウォルド信頼区間の妥当性の正当性は仮定なしで証明できるのか?

Can we falsify the justification of the validity of Wald confidence intervals of doubly robust functionals, without assumptions? ( http://arxiv.org/abs/2306.10590v1 )

ライセンス: Link先を確認

Lin Liu and Rajarshi Mukherjee and James M. Robins

(参考訳) 本稿では,lotnitzkyらによって研究された2重ロバスト(dr)関数のクラスに属する任意の2重機械学習(dml)推定器を中心に,報告された公称$(1 - \alpha)$ wald confidence interval(ci)の有効性をアナリストの正当化を偽造する,liu et al. 20における仮定-リーンテストの実現可能なバージョンを開発する。 DR機能学のクラスは広く、経済学やバイオ統計学において中心的な重要性を持つ。厳密には、(i)chernozhukovらによって研究された条件付き期待のアフィン汎関数の期待として書ける平均二乗連続汎函数のクラスと、robinsらによって研究された函数のクラスの両方を含む。 DR関数の現在の最先端推定子 $\psi$ は DML 推定子 $\hat{\psi}_{1}$ である。 $\hat{\psi}_{1}$ のバイアスは、2つのニュアンス関数 $b$ と $p$ が推定されるレートの積に依存する。最も一般的なアナリストは、彼女の複雑性を低減した仮定の下で、Cauchy-Schwarz (CS) の上限が $\hat{\psi}_{1}$ のバイアスの $o (n^{- 1 / 2})$ であることを証明することによって、彼女の Wald CI の有効性を正当化する。したがって、仮説 $H_{0}$: CS上界が$o (n^{- 1 / 2})$ であるなら、ウォルドCIの有効性に対するアナリストの正当化を偽ることになる。本研究では、$b, p$ あるいはそれらの推定値 $\hat{b}, \hat{p}$ の複雑性還元仮定に頼ることなく、$H_{0}$ の有効な仮定リーンのファルシフィケーションテストを示す。シミュレーション実験を行い,提案する仮定-リーンテストの実用性を示す。我々の方法論の避けられない制限は、我々のを含む$h_{0}$の仮定-リーンテストが一貫性のあるテストにならないことである。したがって、テストの拒絶の失敗は$h_{0}$を支持する意味のある証拠ではない。

In this article, we develop a feasible version of the assumption-lean tests in Liu et al. 20 that can falsify an analyst's justification for the validity of a reported nominal $(1 - \alpha)$ Wald confidence interval (CI) centered at a double machine learning (DML) estimator for any member of the class of doubly robust (DR) functionals studied by Rotnitzky et al. 21. The class of DR functionals is broad and of central importance in economics and biostatistics. It strictly includes both (i) the class of mean-square continuous functionals that can be written as an expectation of an affine functional of a conditional expectation studied by Chernozhukov et al. 22 and (ii) the class of functionals studied by Robins et al. 08. The present state-of-the-art estimators for DR functionals $\psi$ are DML estimators $\hat{\psi}_{1}$. The bias of $\hat{\psi}_{1}$ depends on the product of the rates at which two nuisance functions $b$ and $p$ are estimated. Most commonly, an analyst justifies the validity of her Wald CIs by proving that, under her complexity-reducing assumptions, the Cauchy-Schwarz (CS) upper bound for the bias of $\hat{\psi}_{1}$ is $o (n^{- 1 / 2})$. Thus, if the hypothesis $H_{0}$: the CS upper bound is $o (n^{- 1 / 2})$ is rejected by our test, we will have falsified the analyst's justification for the validity of her Wald CIs. In this work, we exhibit a valid assumption-lean falsification test of $H_{0}$, without relying on complexity-reducing assumptions on $b, p$, or their estimates $\hat{b}, \hat{p}$. Simulation experiments are conducted to demonstrate how the proposed assumption-lean test can be used in practice. An unavoidable limitation of our methodology is that no assumption-lean test of $H_{0}$, including ours, can be a consistent test. Thus, failure of our test to reject is not meaningful evidence in favor of $H_{0}$.

翻訳日:2023-06-21 19:44:12 公開日:2023-06-18

# 政策最適化における楽観性と適応性

Optimism and Adaptivity in Policy Optimization ( http://arxiv.org/abs/2306.10587v1 )

ライセンス: Link先を確認

Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

(参考訳) 我々は,強化学習(RL)における政策最適化手法の高速化のための統一パラダイムを,<emph{optimism} \& \emph{adaptivity} を通じて進める。ポリシー反復法とポリシー勾配法との深い関係を生かして、一見無関係なポリシー最適化アルゴリズムを2つのインターリーブステップの繰り返し適用として再キャストする。 i) \emph{optimistic policy improve operator} は、先行ポリシー $\pi_t$ を \emph{gradient ascent prediction} を用いて仮説 $\pi_{t+1} にマッピングし、次に続く。 (ii)$\pi_{t+1}$のパフォーマンスの部分評価に基づく楽観的予測のemph{hindsight adaptation}。我々はこの共有レンズを用いて、ソフトで楽観的なポリシー反復、自然なアクター批判法、前方探索に基づくモデルベースのポリシー改善、メタ学習アルゴリズムなど、他のよく知られたアルゴリズムを共同で表現する。そうすることで、オプティミズム \& 適応性による加速度に関連する集合的理論的性質に光を当てた。これらの知見に基づいて,メタグラディエント・ラーニングによる<emph{adaptive \& optistic policy gradient} アルゴリズムを設計し,最適性に関連するいくつかの設計選択を実証的に強調する。

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through \emph{optimism} \& \emph{adaptivity}. Leveraging the deep connection between policy iteration and policy gradient methods, we recast seemingly unrelated policy optimization algorithms as the repeated application of two interleaving steps: (i) an \emph{optimistic policy improvement operator} maps a prior policy $\pi_t$ to a hypothesis $\pi_{t+1}$ using a \emph{gradient ascent prediction}, followed by (ii) a \emph{hindsight adaptation} of the optimistic prediction based on a partial evaluation of the performance of $\pi_{t+1}$. We use this shared lens to jointly express other well-known algorithms, including soft and optimistic policy iteration, natural actor-critic methods, model-based policy improvement based on forward search, and meta-learning algorithms. By doing so, we shed light on collective theoretical properties related to acceleration via optimism \& adaptivity. Building on these insights, we design an \emph{adaptive \& optimistic policy gradient} algorithm via meta-gradient learning, and empirically highlight several design choices pertaining to optimism in an illustrative task.

翻訳日:2023-06-21 19:43:29 公開日:2023-06-18

# 多チャンネル近藤雲の階層的絡み合い殻

Hierarchical entanglement shells of multichannel Kondo clouds ( http://arxiv.org/abs/2306.10583v1 )

ライセンス: Link先を確認

Jeongmin Shim, Donghoon Kim, and H.-S. Sim

(参考訳) 不純物や境界はしばしば隙間のないバルクに非自明な境界条件を課し、与えられたバルク、相転移、非フェルミ液体に対して異なる境界普遍性クラスを生じる。しかし、基礎となる境界状態はほとんど未探検のままである。これは、金属中の磁気不純物を遮蔽する近藤雲がどのように空間的に形成されるかという根本的な問題に関連している。ここでは,競合する非フェルミ液体を含む代表的な境界状態である多チャンネル近藤雲の量子コヒーレントな空間・エネルギー構造を,不純物とチャネル間の量子絡み合いを調べることにより予測する。異なる非フェルミ液体の絡み合い殻は、チャネルによって構造に共存する。温度が上昇すると、シェルは外側から1つずつ抑制され、残った最も外側のシェルが各チャネルの熱位相を決定する。エンタングルメントシェルの検出は実験的に可能である。以上より,他の境界状態や境界とバルクの絡み合いを研究するための手掛かりが示唆された。

Impurities or boundaries often impose nontrivial boundary conditions on a gapless bulk, resulting in distinct boundary universality classes for a given bulk, phase transitions, and non-Fermi liquids in diverse systems. The underlying boundary states, however, remain largely unexplored. This is related to the fundamental question of how a Kondo cloud forms spatially to screen a magnetic impurity in a metal. Here we predict the quantum-coherent spatial and energy structure of multichannel Kondo clouds, representative boundary states involving competing non-Fermi liquids, by studying quantum entanglement between the impurity and the channels. Entanglement shells of distinct non-Fermi liquids coexist in the structure, depending on the channels. As temperature increases, the shells become suppressed one by one from the outside, and the remaining outermost shell determines the thermal phase of each channel. Detection of the entanglement shells is experimentally feasible. Our findings suggest a guide to studying other boundary states and boundary-bulk entanglement.

翻訳日:2023-06-21 19:43:01 公開日:2023-06-18

# 競合型マルチエージェント探索のための進化戦略

Evolving Strategies for Competitive Multi-Agent Search ( http://arxiv.org/abs/2306.10640v1 )

ライセンス: Link先を確認

Erkin Bahceci, Riitta Katila, and Risto Miikkulainen

(参考訳) 進化的計算は工学における自動発見に適しているが、人間や組織がより効果的に機能する方法についての洞察を得るためにも利用できる。本稿では、組織におけるイノベーションサーチの現実的な課題をモチベーションの例として用い、まず、人間の創造的問題解決を競合マルチエージェントサーチ(CMAS)として定式化する。 CMASは既存のシングルエージェントやチーム検索と異なり、エージェントは他のエージェントの検索の知識と、これらの検索から生じる検索環境の動的変化を通して対話する。主な仮説は、進化的計算がCMASの効果的な戦略を発見するために用いられることであり、この仮説はNKモデルに関する一連の実験、すなわち部分的に相関し、調整可能な頑丈なフィットネスランドスケープで検証される。異なる競争環境ごとに異なる専門戦略が進化し、環境全体にわたってうまく機能する一般的な戦略も進化する。これらの戦略は、手作りの戦略や伝統的な木探索に基づく戦略よりも効率的で複雑である。このような風景を新しい球面で可視化することで、例えば、風景のポジティブな変化を追跡するなど、戦略がいかにうまく機能するかについての洞察が得られる。本稿は,将来,競争力のあるマルチエージェント検索として,様々な創造活動を研究するためのフレームワークを提供する。

While evolutionary computation is well suited for automatic discovery in engineering, it can also be used to gain insight into how humans and organizations could perform more effectively. Using a real-world problem of innovation search in organizations as the motivating example, this article first formalizes human creative problem solving as competitive multiagent search (CMAS). CMAS is different from existing single-agent and team search problems in that the agents interact through knowledge of other agents' searches and through the dynamic changes in the search landscape that result from these searches. The main hypothesis is that evolutionary computation can be used to discover effective strategies for CMAS; this hypothesis is verified in a series of experiments on the NK model, i.e. partially correlated and tunably rugged fitness landscapes. Different specialized strategies are evolved for each different competitive environment, and also general strategies that perform well across environments. These strategies are more effective and more complex than hand-designed strategies and a strategy based on traditional tree search. Using a novel spherical visualization of such landscapes, insight is gained about how successful strategies work, e.g. by tracking positive changes in the landscape. The article thus provides a possible framework for studying various human creative activities as competitive multi-agent search in the future.
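
The NK model named in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the common variant in which each of the N loci interacts with its K cyclically adjacent neighbours, and the names `make_nk_landscape` and `hill_climb` are hypothetical helpers introduced only for illustration. Larger K makes the landscape more rugged, which is the "tunable" property the abstract refers to.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Build an NK fitness function (assumed variant: adjacent, cyclic neighbours)."""
    rng = random.Random(seed)
    # One lookup table per locus: each (k+1)-bit local pattern gets a random
    # contribution drawn uniformly from [0, 1).
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genome):
        total = 0.0
        for i in range(n):
            # Locus i together with its k right-hand neighbours (wrapping around).
            pattern = tuple(genome[(i + j) % n] for j in range(k + 1))
            total += tables[i][pattern]
        return total / n  # mean contribution, so fitness stays in [0, 1)

    return fitness


def hill_climb(fitness, genome):
    """Greedy one-bit-flip local search; stops at a local optimum."""
    genome = list(genome)
    best = fitness(genome)
    improved = True
    while improved:
        improved = False
        for i in range(len(genome)):
            genome[i] ^= 1            # try flipping bit i
            candidate = fitness(genome)
            if candidate > best:
                best = candidate      # keep the improving flip
                improved = True
            else:
                genome[i] ^= 1        # undo the flip
    return genome, best
```

With K = 0 every locus is independent, so the greedy search reaches the global optimum; with larger K it can stall on local optima, which is the regime where search strategies start to differ.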

翻訳日:2023-06-21 19:37:14 公開日:2023-06-18

# MA-BBOB:ノイズのない数値ブラックボックス最適化文脈における自動ML手法の評価のためのBBOB関数の多要素結合

MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts ( http://arxiv.org/abs/2306.10627v1 )

ライセンス: Link先を確認

Diederick Vermetten, Furong Ye, Thomas Bäck, Carola Doerr

(参考訳) 連続最適化(coco)プラットフォームから確立されたbbob関数のペアを補間することにより、数値ブラックボックス最適化ベンチマークのための新しいインスタンスを生成するための最近の提案を拡張し、本研究では、元のインスタンスと任意に選択されたグローバルオプティマの複数のアフィン結合を可能にするさらなる一般化を提案する。 ma-bbobジェネレータがインスタンス空間を満たし、アルゴリズム性能の全体的なパターンが保存されていることを実証する。課題のランドスケープ特徴と性能データを組み合わせることで,これらの特徴がアルゴリズムの選択に有用かどうかを従来の研究が示唆したように疑問視する。 MA-BBOBは、標準化された実験ルーチンを促進し、パフォーマンス分析と視覚化のためにインタラクティブなIOHanalyzerモジュールへのアクセスを提供し、(MA-)BBOB関数で利用可能なリッチで成長するデータコレクションとの比較を可能にする。

Extending a recent suggestion to generate new instances for numerical black-box optimization benchmarking by interpolating pairs of the well-established BBOB functions from the COmparing COntinuous Optimizers (COCO) platform, we propose in this work a further generalization that allows multiple affine combinations of the original instances and arbitrarily chosen locations of the global optima. We demonstrate that the MA-BBOB generator can help fill the instance space, while overall patterns in algorithm performance are preserved. By combining the landscape features of the problems with the performance data, we pose the question of whether these features are as useful for algorithm selection as previous studies suggested. MA-BBOB is built on the publicly available IOHprofiler platform, which facilitates standardized experimentation routines, provides access to the interactive IOHanalyzer module for performance analysis and visualization, and enables comparisons with the rich and growing data collection available for the (MA-)BBOB functions.
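
The idea of building new benchmark instances as affine combinations with a freely chosen optimum location can be sketched as follows. This is a simplified illustration under stated assumptions, not the MA-BBOB generator itself: `sphere`, `rastrigin` and `ellipsoid` are stand-in test functions (not the COCO implementations), `affine_combination` is a hypothetical helper, and any value scaling the real generator may apply is omitted.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def ellipsoid(x):
    n = len(x)
    return sum((10 ** (6 * i / max(n - 1, 1))) * v * v for i, v in enumerate(x))


def affine_combination(functions, weights, optimum):
    """Combine component functions with weights summing to 1.

    Every component here has its minimum (value 0) at the origin, so
    shifting the input by `optimum` places the combined minimum there.
    """
    if len(functions) != len(weights):
        raise ValueError("need exactly one weight per function")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights of an affine combination must sum to 1")

    def combined(x):
        shifted = [xi - oi for xi, oi in zip(x, optimum)]
        return sum(w * f(shifted) for f, w in zip(functions, weights))

    return combined
```

For example, `affine_combination([sphere, rastrigin, ellipsoid], [0.5, 0.3, 0.2], [1.0, -2.0])` yields a function whose minimum sits at `(1.0, -2.0)`; varying the weights moves the instance between the landscapes of its components.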

翻訳日:2023-06-21 19:36:50 公開日:2023-06-18

# グラフニューラルネットワークを用いた気流シミュレーションのためのメタラーニング

Meta-Learning for Airflow Simulations with Graph Neural Networks ( http://arxiv.org/abs/2306.10624v1 )

ライセンス: Link先を確認

Wenzhuo Liu, Mouadh Yagoubi, Marc Schoenauer

(参考訳) 数値シミュレーションの分野は実世界のシステムの設計と管理において重要であり、偏微分方程式(PDE)は一般的な数学モデリングツールである。しかしながら、従来の数値解法は高い計算コストを必要とすることが多いため、PDEの解法は依然として課題である。その結果、機械学習(特にディープラーニング)アルゴリズムを利用したデータ駆動手法が、計算流体力学(CFD)のような複雑なPDEの解を予測できるモデルを学ぶために、ますます提案されている。しかし、これらの手法は、OoD(Out-of-distribution)サンプルの一般化性能の低下に悩まされており、より効率的なアプローチの必要性を強調している。そこで本研究では,OoDサンプルに対する学習モデルの性能向上のためのメタラーニング手法を提案する。具体的には,様々な翼型上のCFDにおける気流シミュレーションをメタラーニング問題として設定し,一つの翼型形状で定義された例の集合を個別のタスクとして扱う。モデルに依存しないメタラーニング(MAML)を用いることで,新しいタスク,すなわち未知の翼型形状に適応可能なメタラーナーを,少数のタスク固有データのみを用いて学習する。提案手法の効率を実験的に実証し, 学習モデルのOoD一般化性能を向上し, 効率を維持できることを示した。

The field of numerical simulation is of significant importance for the design and management of real-world systems, with partial differential equations (PDEs) being a commonly used mathematical modeling tool. However, solving PDEs remains a challenge, as commonly used traditional numerical solvers often require high computational costs. As a result, data-driven methods leveraging machine learning (in particular, deep learning) algorithms have been increasingly proposed to learn models that can predict solutions to complex PDEs, such as those arising in computational fluid dynamics (CFD). However, these methods are known to suffer from poor generalization performance on out-of-distribution (OoD) samples, highlighting the need for more efficient approaches. To this end, we present a meta-learning approach to enhance the performance of learned models on OoD samples. Specifically, we set the airflow simulation in CFD over various airfoils as a meta-learning problem, where each set of examples defined on a single airfoil shape is treated as a separate task. Through the use of model-agnostic meta-learning (MAML), we learn a meta-learner capable of adapting to new tasks, i.e., previously unseen airfoil shapes, using only a small amount of task-specific data. We experimentally demonstrate the efficiency of the proposed approach for improving the OoD generalization performance of learned models while maintaining efficiency.
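
The task structure described above (one task per airfoil shape, a few task-specific samples for adaptation) can be sketched on a toy problem. The code is an assumption-laden illustration, not the paper's method: it uses the first-order approximation of MAML rather than full MAML, replaces the graph neural network with a one-parameter model `y = w * x`, and replaces airfoils with tasks that differ only in their slope `a`. All function names are hypothetical.

```python
def mse(w, xs, a):
    """Mean squared error of the model y = w * x against targets y = a * x."""
    return sum((w * x - a * x) ** 2 for x in xs) / len(xs)


def mse_grad(w, xs, a):
    """Derivative of mse with respect to w."""
    return sum(2.0 * (w * x - a * x) * x for x in xs) / len(xs)


def adapt(w, xs, a, inner_lr):
    """Inner loop: one gradient step on a task's support samples."""
    return w - inner_lr * mse_grad(w, xs, a)


def first_order_maml(task_slopes, support_xs, query_xs,
                     inner_lr=0.1, outer_lr=0.05, steps=200, w0=0.0):
    """Outer loop: move the shared initialisation w so that one inner step
    gives a low query loss on every task. First-order variant: the gradient
    is evaluated at the adapted weights and not propagated through adapt()."""
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_slopes:
            w_task = adapt(w, support_xs, a, inner_lr)
            meta_grad += mse_grad(w_task, query_xs, a)
        w -= outer_lr * meta_grad / len(task_slopes)
    return w
```

The returned `w` is an initialisation from which a single call to `adapt` on a handful of samples improves the fit to an individual task, which mirrors adapting to an unseen airfoil shape from a small amount of data.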

翻訳日:2023-06-21 19:36:27 公開日:2023-06-18

# 歯科用パノラマX線写真解析のための強化マスク画像モデリング

Enhanced Masked Image Modeling for Analysis of Dental Panoramic Radiographs ( http://arxiv.org/abs/2306.10623v1 )

ライセンス: Link先を確認

Amani Almalki and Longin Jan Latecki

(参考訳) コンピュータ支援放射線情報報告は, 歯科医療提供者の診断・治療計画を容易にするため, 研究の注目を集めている。しかし,手動による歯科画像の解釈には限界があり,高価であり,時間を要する。デンタルイメージングのもうひとつの障壁は、トレーニング用に利用可能なイメージの数が限られていることだ。本研究では,マスク画像モデリング(SimMIM)トランスフォーマ上に自己教師付き学習を自己蒸留(SD)で強化した新しい手法SD-SimMIMを提案する。マスクパッチの予測損失に加えて、SD-SimMIMは可視パッチの自己蒸留損失を計算する。 SD-SimMIMを歯科用パノラマX線に適用し, 歯の番号付け, 歯科修復物および矯正器具の検出, インスタンスセグメンテーションのタスクを行った。その結果,SD-SimMIMは他の自己教師あり学習方法よりも優れていた。さらに、既存のパノラマX線データセットのアノテーションを増強し、改善する。

The computer-assisted radiologic informative report has received increasing research attention to facilitate diagnosis and treatment planning for dental care providers. However, manual interpretation of dental images is limited, expensive, and time-consuming. Another barrier in dental imaging is the limited number of available images for training, which is a challenge in the era of deep learning. This study proposes a novel self-distillation (SD) enhanced self-supervised learning on top of the masked image modeling (SimMIM) Transformer, called SD-SimMIM, to improve the outcome with a limited number of dental radiographs. In addition to the prediction loss on masked patches, SD-SimMIM computes the self-distillation loss on the visible patches. We apply SD-SimMIM on dental panoramic X-rays for teeth numbering, detection of dental restorations and orthodontic appliances, and instance segmentation tasks. Our results show that SD-SimMIM outperforms other self-supervised learning methods. Furthermore, we augment and improve the annotation of an existing dataset of panoramic X-rays.

翻訳日:2023-06-21 19:36:03 公開日:2023-06-18

# 多地点脳MRIを用いた事前知識インフォームドディープラーニングによるラクナ検出と定量化

Prior-knowledge-informed deep learning for lacune detection and quantification using multi-site brain MRI ( http://arxiv.org/abs/2306.10622v1 )

ライセンス: Link先を確認

Bo Li, Jeroen de Bresser, Wiro Niessen, Matthias van Osch, Wiesje M. van der Flier, Geert Jan Biessels, Meike W. Vernooij, Esther Bron (for the Heart-Brain Connection Consortium)

(参考訳) 血管起源と推定されるラクナは、脳小血管疾患や認知症などの認知疾患を評価するのに重要である。しかしながら、画像データからのラクナの視覚的評価は、そのサイズの小ささ、スパース性、類似病変(mimics)のため、困難で、時間がかかり、評価者依存である。最近の自動アルゴリズムの発展により、感度を保ちながらラクナの検出を高速化できることが示された一方で、偽陽性が多数見られ、臨床や大規模研究での使用には実用的でない。そこで我々は,ラクナ検出に加えて,カテゴリカルな負荷スコアを出力する新しいフレームワークを開発した。このスコアは、ラクナの画像評価を単純化し、効果的に加速する、より実用的なラクナの存在推定を提供する可能性がある。我々は,検出とカテゴリカルスコアの組み合わせにより,ノイズラベルに対する感度が低下する,という仮説を立てた。

Lacunes of presumed vascular origin, also referred to as lacunar infarcts, are important to assess cerebral small vessel disease and cognitive diseases such as dementia. However, visual rating of lacunes from imaging data is challenging, time-consuming, and rater-dependent, owing to their small size, sparsity, and mimics. Whereas recent automatic algorithms have been shown to speed up lacune detection while preserving sensitivity, they also produce a large number of false positives, which makes them impractical for use in clinical practice or large-scale studies. Here, we develop a novel framework that, in addition to lacune detection, outputs a categorical burden score. This score could provide a more practical estimate of lacune presence that simplifies and effectively accelerates the imaging assessment of lacunes. We hypothesize that the combination of detection and the categorical score makes the procedure less sensitive to noisy labels.

翻訳日:2023-06-21 19:35:45 公開日:2023-06-18

# UniSG^GA:Geometric Algebraを用いた3Dシーングラフによる幾何学・行動・GNNの創成AIへの応用

UniSG^GA: A 3D scenegraph powered by Geometric Algebra unifying geometry, behavior and GNNs towards generative AI ( http://arxiv.org/abs/2306.10621v1 )

ライセンス: Link先を確認

Manos Kamarianakis, Antonis Protopsaltis, Dimitris Angelis, Paul Zikas, Mike Kentros, George Papagiannakis

(参考訳) 本研究は,3次元シーンの挙動と幾何学的データを組み込んだ,新たな統合されたシーングラフ構造UniSG^GAの導入について述べる。グラフニューラルネットワーク(GNN)をシームレスに統合し、生成タスク中に3Dシーングラフ(3D-SG)を変換する際の課題に対処するように設計されている。グラフ表現において,物体間のトポロジ的関係を効率的に把握し,保存するために,幾何学的代数形式をシームレスに統合するUniSG^GAを提案する。この新しいアプローチは、生成的および予測的タスクの処理におけるGNNの全体的なパフォーマンスと能力を高め、新たな可能性を開き、シーン生成と合成を効果的に活用できるグラフベースの生成AIモデルのさらなる探索と開発の基礎を築くことを目的としている。

This work introduces UniSG^GA, a novel integrated scenegraph structure that incorporates behavior and geometry data on a 3D scene. It is specifically designed to seamlessly integrate Graph Neural Networks (GNNs) and address the challenges associated with transforming a 3D scenegraph (3D-SG) during generative tasks. To effectively capture and preserve the topological relationships between objects in a simplified way within the graph representation, we propose UniSG^GA, which seamlessly integrates Geometric Algebra (GA) forms. This novel approach enhances the overall performance and capability of GNNs in handling generative and predictive tasks, opening up new possibilities and aiming to lay the foundation for further exploration and development of graph-based generative AI models that can effectively incorporate behavior data for enhanced scene generation and synthesis.

翻訳日:2023-06-21 19:35:29 公開日:2023-06-18

# 自己回帰型ニューラル演算子の安定性に向けて

Towards Stability of Autoregressive Neural Operators ( http://arxiv.org/abs/2306.10619v1 )

ライセンス: Link先を確認

Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown

(参考訳) ニューラル演算子は、物理科学における時空間系のモデリングに有望なアプローチであることが証明されている。しかし、これらのモデルを大規模システム向けにトレーニングすることは、計算とメモリの大幅なコストを発生させるため、非常に難しい - これらのシステムは、将来の時間状態を予測するために、ニューラルネットワークの自己回帰的タイムステッピングに頼ることを余儀なくされることが多い。これはコスト管理に有効であるが、時間とともに制御不能なエラーの増加と最終的には不安定になる可能性がある。この自己回帰的誤差の増大の原因を,物理システムのための典型的なニューラルオペレータモデルを用いて解析し,その軽減法を探究する。計算/メモリコストを膨らませることなく、これらのモデル内で不安定誘導操作を慎重に制御できるアーキテクチャとアプリケーション固有の改善を導入する。本研究では,Navier-Stokes流体の流れ,回転浅水系,高分解能全球気象予報システムなどの科学システムについて結果を報告する。典型的なニューラルネットワークに設計原則を適用すると、これらのシステムのオリジナルのモデルと比較して、発散の定性的な兆候なしに800%長い予測が可能となり、長期予測の誤差が大幅に減少することを示した。再現性のために、私たちはコード( https://anonymous.4open.science/r/stabilizing_neural_operators-5774/ )をオープンソース化した。

Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to prototypical neural networks leads to significantly lower errors in long-range forecasts with 800% longer forecasts without qualitative signs of divergence compared to the original models for these systems. We open-source our code ( https://anonymous.4open.science/r/stabilizing_neural_operators-5774/ ) for reproducibility.

翻訳日:2023-06-21 19:35:10 公開日:2023-06-18

# GPUで高速化した電力系統用機械学習モデルの検証

GPU-Accelerated Verification of Machine Learning Models for Power Systems ( http://arxiv.org/abs/2306.10617v1 )

ライセンス: Link先を確認

Samuel Chevalier, Ilgiz Murzakhanov, Spyros Chatzivasileiadis

(参考訳) 近年,大規模機械学習(ML)モデルの性能を厳格に検証するための計算ツールが著しく進歩している。最も成功した解法は、高度に専門化されたGPU加速の分枝限定法ルーチンである。このようなツールは、電力システムなどの安全クリティカルなシステムに機械学習アプリケーションをデプロイする上で、極めて重要である。しかし、その成功にもかかわらず、障壁は電力システム問題にこれらのルーチンをそのまま適用することを妨げる。本稿ではこの問題を2つの重要な方法で解決する。第一に,我々の知る限り初めて,複数の検証問題の同時検証を可能にする(例えば,個々の検証問題を解くことによってではなく,すべての線路潮流制約の違反を同時にチェックする)。そこで本研究では,一連の潜在的違反にわたる"ワーストケース"違反を,元のニューラルネットワークを補強する一連のReLUベースのレイヤに変換する,厳密な変換を導入する。これにより、検証者は直接解釈することができる。第二に、電力システムMLモデルはしばしば電力フロー制約を満たすことを検証する必要がある。本稿では,線形等式制約と不等式制約を直接検証問題にエンコードする双対化手法を提案する。これらのイノベーションを実証するために,データ駆動型セキュリティ制約付きDC-OPFソルバに関わる問題を検証した。私たちは最初のイノベーションセットを$\alpha,\beta$-CROWNソルバを使って構築し、テストし、Gurobi 10.0に対してベンチマークします。当社のコントリビューションは100倍を超えうるスピードアップを実現し、より高い検証の柔軟性を実現しています。

Computational tools for rigorously verifying the performance of large-scale machine learning (ML) models have progressed significantly in recent years. The most successful solvers employ highly specialized, GPU-accelerated branch and bound routines. Such tools are crucial for the successful deployment of machine learning applications in safety-critical systems, such as power systems. Despite their successes, however, barriers prevent out-of-the-box application of these routines to power system problems. This paper addresses this issue in two key ways. First, for the first time to our knowledge, we enable the simultaneous verification of multiple verification problems (e.g., checking for the violation of all line flow constraints simultaneously and not by solving individual verification problems). For that, we introduce an exact transformation that converts the "worst-case" violation across a set of potential violations to a series of ReLU-based layers that augment the original neural network. This allows verifiers to interpret them directly. Second, power system ML models often must be verified to satisfy power flow constraints. We propose a dualization procedure that encodes linear equality and inequality constraints directly into the verification problem, in a manner that is mathematically consistent with the specialized verification tools. To demonstrate these innovations, we verify problems associated with data-driven security constrained DC-OPF solvers. We build and test our first set of innovations using the $\alpha,\beta$-CROWN solver, and we benchmark against Gurobi 10.0. Our contributions achieve a speedup that can exceed 100x and allow higher degrees of verification flexibility.
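
The abstract does not spell out the exact transformation, but one standard way to express a worst-case (maximum) violation using only ReLU operations is the identity max(a, b) = a + ReLU(b - a), applied repeatedly. The sketch below illustrates that identity only; `max_via_relu` is a hypothetical helper and should not be read as the authors' construction.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def max_via_relu(values):
    """Maximum of a non-empty list using only +, - and ReLU.

    Each step applies max(a, b) = a + relu(b - a), so the whole reduction
    could be written as a chain of ReLU layers appended to a network.
    """
    if not values:
        raise ValueError("need at least one value")
    running = values[0]
    for v in values[1:]:
        running = running + relu(v - running)
    return running
```

A verifier that already handles ReLU networks can then bound this single output instead of checking each constraint violation separately.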

翻訳日:2023-06-21 19:34:49 公開日:2023-06-18

# Omnipredictor を用いた単一インデックスモデルの学習

Agnostically Learning Single-Index Models using Omnipredictors ( http://arxiv.org/abs/2306.10615v1 )

ライセンス: Link先を確認

Aravind Gollakota and Parikshit Gopalan and Adam R. Klivans and Konstantinos Stavropoulos

(参考訳) 任意の単調およびリプシッツのアクティベーションを持つSIM(Single-Index Models)を学習するための最初の結果を与える。以前のすべての作業は、実現可能な設定でのみ保持するか、アクティベーションを知る必要がある。さらに、有界な第二モーメントを持つことは限界しか必要としないが、事前の作業はすべてより強い分布仮定(反集中や有界性など)を必要とする。本アルゴリズムは, [GHK$^+$23] の検定多重精度を満たす予測器を用いた全方位予測に関する最近の研究に基づいている。我々の分析は単純であり、ブレグマンの発散(あるいは損失の一致)と$\ell_p$距離の関係に依存する。また、GLMtronのような標準アルゴリズムと非依存設定におけるロジスティック回帰の新しい保証も提供する。

We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$^+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting.

翻訳日:2023-06-21 19:34:29 公開日:2023-06-18

# 雑音処理とサイド情報のない同定可能な因果推論

Identifiable causal inference with noisy treatment and no side information ( http://arxiv.org/abs/2306.10614v1 )

ライセンス: Link先を確認

Antti Pöllänen, Pekka Marttinen

(参考訳) いくつかの因果推論シナリオでは、例えば疫学や計量学において、治療変数(すなわち原因)が不正確に測定される。この測定誤差の影響を補正できないと、偏りのある因果効果の推定につながる。従来の研究では、複雑な非線形依存を可能とし、側面情報へのアクセスを前提とせず、因果的観点からこの問題に対処する方法は研究されていない。シナリオとして,不正確な測定を行う連続処理変数を仮定したモデルを提案する。測定誤差モデルに対する既存の結果に基づいて,測定誤差の分散やその他の側面情報を知ることなく,モデルの因果効果の推定値が同定可能であることを示す。本手法は,ガウス条件がニューラルネットワークによってパラメータ化される深い潜在変数モデルに依拠し,モデル学習のための重要度重み付き変分目標を開発した。実験結果から, 測定誤差が未知であることを示す。より広い範囲において、我々の仕事は信頼できる因果推論ができるアプリケーションの範囲を広げます。

In some causal inference scenarios, the treatment (i.e. cause) variable is measured inaccurately, for instance in epidemiology or econometrics. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming access to side information. For such a scenario, this paper proposes a model that assumes a continuous treatment variable which is inaccurately measured. Building on existing results for measurement error models, we prove that our model's causal effect estimates are identifiable, even without knowledge of the measurement error variance or other side information. Our method relies on a deep latent variable model where Gaussian conditionals are parameterized by neural networks, and we develop an amortized importance-weighted variational objective for training the model. Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications where reliable causal inference can be conducted.

翻訳日:2023-06-21 19:34:14 公開日:2023-06-18

# CompanyKG: 企業類似性定量化のための大規模不均一グラフ

CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification ( http://arxiv.org/abs/2306.10649v1 )

ライセンス: Link先を確認

Lele Cao, Vilhelm von Ehrenheim, Mark Granroth-Wilding, Richard Anselmo Stahl, Andrew McCornack, Armin Catovic and Dhiana Deva Cavacanti Rocha

(参考訳) 投資業界では、市場マッピング、競合分析、合併や買収など、様々な目的のために細かな会社の類似度定量化を行うことが不可欠であることが多い。企業の特徴と関係を表現・学習するために,企業kgという知識グラフを提案し,公開する。具体的には、117万の企業が企業記述の埋め込みに富んだノードとして表現され、15の異なる企業間関係によって51.06百万のエッジが生成される。企業の類似度定量化手法の包括的評価を可能にするために,類似度予測,競合検索,類似度ランキングという3つの評価タスクをアノテートした。本稿では,11個の再現可能な予測手法について,ノードのみ,エッジのみ,ノード+エッジの3つのグループに分類した。私たちの知る限りでは、企業間類似性を定量化するのに適した、実世界の投資プラットフォームから派生した、最初の大規模な異種グラフデータセットである。

In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.

翻訳日:2023-06-21 19:26:34 公開日:2023-06-18

# 参照なしユーザ制御可能なセマンティック画像合成

Referenceless User Controllable Semantic Image Synthesis ( http://arxiv.org/abs/2306.10646v1 )

ライセンス: Link先を確認

Jonghyun Kim, Gen Li, Joongkyu Kim

(参考訳) 近年のセマンティック画像合成の進歩にもかかわらず、画像スタイルの完全な制御は難しい問題である。既存の手法では、スタイル情報をセマンティックレイアウトにフィードするために参照画像が必要であり、そのスタイルが与えられた画像によって制約されていることを示す。本稿では,特定の意味領域のスタイルを表現するために特異色を利用するユーザ制御可能な意味画像合成のためのRUCGANというモデルを提案する。提案するネットワークは,各セマンティックレイアウトにユーザ希望のスタイルとして色を注入することにより,参照不要なセマンティックイメージ合成を実現し,特異な色でセマンティックイメージを合成することができる。様々な挑戦的データセットに対する大規模な実験結果から,提案手法は既存手法よりも優れており,我々はさらに,スタイル制御性に対するアプローチの利点を示すインタラクティブUIを提供する。

Despite recent progress in semantic image synthesis, complete control over image style remains a challenging problem. Existing methods require reference images to feed style information into semantic layouts, which indicates that the style is constrained by the given image. In this paper, we propose a model named RUCGAN for user controllable semantic image synthesis, which utilizes a singular color to represent the style of a specific semantic region. The proposed network achieves reference-free semantic image synthesis by injecting color as user-desired styles into each semantic layout, and is able to synthesize semantic images with unusual colors. Extensive experimental results on various challenging datasets show that the proposed method outperforms existing methods, and we further provide an interactive UI to demonstrate the advantage of our approach for style controllability.

翻訳日:2023-06-21 19:26:04 公開日:2023-06-18

# ChatGPTプロンプトを用いた効果的な教育チャットボットの開発:ソーシャルメディア・リテラシーを事例として

Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy ( http://arxiv.org/abs/2306.10645v1 )

ライセンス: Link先を確認

Cansu Koyuturk, Mona Yavari, Emily Theophilou, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Alessandro Gabbiadini, Davinia Hernandez-Leo, Martin Ruskov, Dimitri Ognibene

(参考訳) 教育用チャットボットには、インタラクティブでパーソナライズされた学習体験が約束されているが、その開発は、利用可能なプラットフォームの自由なインタラクション機能と、知識を適切なフォーマットでエンコーディングすることの難しさによって制限されている。 chatgptのようなゼロショット学習機能を持つ言語学習モデルの最近の進歩は、プロンプトベースのアプローチで教育用チャットボットを開発する新しい可能性を示唆している。本稿では,チャットボットを交互に操作できる簡易システムを用いた事例研究を行い,最初のテストから得られた洞察と予備ガイドラインについて述べる。本稿では,ChatGPTが複数の相互接続型学習目標を追求する能力,文化,年齢,教育レベルといったユーザの特性に教育活動を適用する能力,多様な教育戦略や会話スタイルを利用する能力について検討する。その結果,チャットボットの役割が教師からセラピストに予期せぬ変化をもたらすおそれのある,チャットボットによる会話の限られた歴史と,ChatGPTによる高度に構造化された応答の形式によって,課題が提起される。これらの課題に対処し、効果的な教育チャットボットの開発を促進するための初期ガイドラインを提示する。

Educational chatbots come with a promise of interactive and personalized learning experiences, yet their development has been limited by the restricted free interaction capabilities of available platforms and the difficulty of encoding knowledge in a suitable format. Recent advances in language learning models with zero-shot learning capabilities, such as ChatGPT, suggest a new possibility for developing educational chatbots using a prompt-based approach. We present a case study with a simple system that enables mixed-turn chatbot interactions and we discuss the insights and preliminary guidelines obtained from initial tests. We examine ChatGPT's ability to pursue multiple interconnected learning objectives, adapt the educational activity to users' characteristics, such as culture, age, and level of education, and its ability to use diverse educational strategies and conversational styles. Although the results are encouraging, challenges are posed by the limited history maintained for the conversation and the highly structured form of responses by ChatGPT, as well as their variability, which can lead to an unexpected switch of the chatbot's role from a teacher to a therapist. We provide some initial guidelines to address these issues and to facilitate the development of effective educational chatbots.

翻訳日:2023-06-21 19:25:36 公開日:2023-06-18

# 正方格子中のSU(3)フェルミオンの金属-絶縁体転移と磁性

Metal-insulator transition and magnetism of SU(3) fermions in the square lattice ( http://arxiv.org/abs/2306.10644v1 )

ライセンス: Link先を確認

Eduardo Ibarra-García-Padilla, Chunhan Feng, Giulio Pasqualetti, Simon Fölling, Richard T. Scalettar, Ehsan Khatami, Kaden R. A. Hazzard

(参考訳) 数値的に厳密な行列式量子モンテカルロ法(DQMC)と数値連結クラスタ展開法(NLCE)を用いて, 正方格子中のSU(3)対称フェルミ-ハッバードモデル(FHM)を$1/3$充填で検討した。本稿では,金属絶縁体遷移と磁気クロスオーバーのシグネチャが観察されるモデルの$T$-$U$相図を示す。これらのシグネチャは、圧縮率の上昇を特徴づける温度スケールと、対角スピン-スピン相関関数の符号の相互作用依存的な変化である。圧縮率の解析は、金属絶縁体量子臨界点の位置を$U_c/t \sim 6$と推定し、有限$T$でモット物理を観測するための温度スケールを提供する。さらに、スピン-スピン相関関数の解析から、$U/t \gtrsim 6$ かつ $T \sim J = 4t^2/U$ では短距離の2副格子(2-SL)反強磁性構造が発達し、温度が $T/J \lesssim 0.57$ まで下がると3副格子(3-SL)反強磁性構造が出現することを観察する。この2-SLから3-SLへの磁気秩序のクロスオーバーは、ハイゼンベルク極限の予測と一致し、オンサイト対の数に観測可能な効果を持つ。最後に、現在達成されている実験技術と温度を持つ光学格子中のアルカリ土類様原子を用いて、$T$-$U$相図の特徴を探索する方法について述べる。本論文で論じられた結果は, ドーピングによるSU(3) FHM探査の出発点となる。

We study the SU(3) symmetric Fermi-Hubbard model (FHM) in the square lattice at $1/3$-filling using numerically exact determinant quantum Monte Carlo (DQMC) and numerical linked-cluster expansion (NLCE) techniques. We present the $T$-$U$ phase diagram of the model, in which we observe signatures of the metal-insulator transition and magnetic crossovers. These signatures are the temperature scale characterizing the rise of the compressibility, and an interaction-dependent change in the sign of the diagonal spin-spin correlation function. The analysis of the compressibility estimates the location of the metal-insulator quantum critical point at $U_c/t \sim 6$, and provides a temperature scale for observing Mott physics at finite-$T$. Furthermore, from the analysis of the spin-spin correlation function we observe that for $U/t \gtrsim6$ and $T \sim J = 4t^2/U$ there is a development of a short-ranged two sublattice (2-SL) antiferromagnetic structure, as well as an emerging three sublattice (3-SL) antiferromagnetic structure as the temperature is lowered below $T/J \lesssim 0.57$. This crossover from 2-SL to 3-SL magnetic ordering agrees with Heisenberg limit predictions, and has observable effects on the number of on-site pairs. Finally, we describe how the features of the $T$-$U$ phase diagram can be explored with alkaline-earth-like atoms in optical lattices with currently-achieved experimental techniques and temperatures. The results discussed in this manuscript provide a starting point for the exploration of the SU(3) FHM upon doping.
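
As a rough worked example of the temperature scales quoted above, take the illustrative value $U/t = 8$. This value is an assumption chosen for round numbers inside the $U/t \gtrsim 6$ regime and is not reported in the abstract. The exchange scale and the temperature below which the 3-SL structure is said to emerge then evaluate to:

```latex
\[
J = \frac{4t^{2}}{U} = \frac{4t^{2}}{8t} = 0.5\,t ,
\qquad
T \lesssim 0.57\,J = 0.57 \times 0.5\,t = 0.285\,t .
\]
```

Because $J$ falls as $1/U$, a larger interaction strength pushes this crossover to a lower temperature in units of $t$.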

翻訳日:2023-06-21 19:24:56 公開日:2023-06-18

PDF登録状況（公開日: 20230618）