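Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and published under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).

These papers have a publication date of 20231024.

Title	Authors	Abstract	Publication date / Translation date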
# Non-Fungible Token Security ( http://arxiv.org/abs/2310.15518v1 ) License: see the linked page	Ryleigh McKinney, Sundar Krishnan	Non-fungible tokens (NFTs) are unique digital assets stored on the blockchain and are used to certify ownership and authenticity of the digital asset. NFTs were first created in 2014, while their popularity peaked between 2021 and 2022. In this paper, the authors dive into the world of non-fungible tokens (NFTs), their history, the future of NFTs, and the associated security concerns.	Translated: 2024-03-25 14:05:29 Published: 2023-10-24
# An Impact and Risk Assessment Framework for National Electronic Identity (eID) Systems ( http://arxiv.org/abs/2310.15784v1 ) License: see the linked page	Jide Edu, Mark Hooper, Carsten Maple, Jon Crowcroft	Electronic identification (eID) systems allow citizens to assert and authenticate their identities for various purposes, such as accessing government services or conducting financial transactions. These systems improve user access to rights, services, and the formal economy. As eID systems become an essential facet of national development, any failure, compromise, or misuse can be costly and damaging to the government, users, and society. Therefore, an effective risk assessment is vital for identifying emerging risks to the system and assessing their impact. However, a comprehensive risk assessment for these systems must extend far beyond technical security and privacy impacts and must be conducted with a contextual understanding of stakeholders and the communities these systems serve. In this study, we posit that current risk assessments do not address risk factors for all key stakeholders and explore how potential compromise could impact each of them in turn. In examining the broader impact of risks and the potentially significant consequences for stakeholders, we propose a framework that considers a wide range of factors, including the social, economic, and political contexts in which these systems were implemented. This provides a holistic platform for a better assessment of risk to the eID system.	Translated: 2024-03-25 14:05:29 Published: 2023-10-24
# Exploring the Risks and Challenges of National Electronic Identity (NeID) System ( http://arxiv.org/abs/2310.15813v1 ) License: see the linked page	Jide Edu, Mark Hooper, Carsten Maple, Jon Crowcroft	Many countries have embraced national electronic identification (NeID) systems, recognising their potential to foster a fair, transparent, and well-governed society by ensuring the secure verification of citizens' identities. The inclusive nature of NeID empowers people to exercise their rights while holding them accountable for fulfilling their obligations. Nevertheless, the development and implementation of these complex identity-verification systems have raised concerns regarding security, privacy, and exclusion. In this study, we discuss the different categories of NeID risk and explore the successful deployment of these systems, while examining how the specific risks and other challenges posed by this technology are addressed. Based on a review of different NeID systems and the efforts made to mitigate the unique risks and challenges presented within each deployment, we highlight best practices for mitigating risk, including implementing strong security measures, conducting regular risk assessments, and involving stakeholders in the design and implementation of the system.	Translated: 2024-03-25 14:05:29 Published: 2023-10-24
# Redactable Signature Schemes and Zero-knowledge Proofs: A comparative examination for applications in Decentralized Digital Identity Systems ( http://arxiv.org/abs/2310.15934v1 ) License: see the linked page	Bryan Kumara, Mark Hooper, Carsten Maple, Timothy Hobson, Jon Crowcroft	Redactable Signature Schemes and Zero-Knowledge Proofs are two radically different approaches to enabling privacy. This paper analyses their merits and drawbacks when applied to decentralized identity systems. Redactable signatures, though competitively quick and compact, are not as expressive as zero-knowledge proofs and do not provide the same level of privacy. On the other hand, zero-knowledge proofs can be much faster, but some protocols require a trusted set-up. We conclude that, given the benefits and drawbacks, redactable signatures are more appropriate at an earlier stage and zero-knowledge proofs are more appropriate at a later stage for decentralized identity systems.	Translated: 2024-03-25 14:05:29 Published: 2023-10-24
# An Efficient Method for Realizing Contractions of Access Structures in Cloud Storage ( http://arxiv.org/abs/2310.15972v1 ) License: see the linked page	Shuai Feng, Liang Feng Zhang	In single-cloud storage, ciphertext-policy attribute-based encryption (CP-ABE) allows one to encrypt any data under an access structure to a cloud server, specifying what attributes are required to decrypt. In multi-cloud storage, a secret sharing scheme (SSS) allows one to split any data into multiple shares, one per server, and specify which subsets of the servers are able to recover the data. It is an interesting problem to remove some attributes/servers but still enable the remaining attributes/servers in every authorized set to recover the data. The problem is related to the contraction problem of access structures for SSSs. In this paper, we propose a method that can efficiently transform a given SSS for an access structure to SSSs for contractions of the access structure. We show its applications in solving the attribute removal problem in CP-ABE based single-cloud storage and the data relocating problem in multi-cloud storage. Our method results in solutions that require either less server storage or even no additional server storage.	Translated: 2024-03-25 13:55:39 Published: 2023-10-24
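The secret sharing scheme (SSS) mentioned in the abstract above can be illustrated with a minimal sketch. The code below is Shamir's (t, n) threshold scheme over a prime field, a standard textbook SSS, and not the paper's contraction method; the modulus `P` and the names `split_secret` and `recover_secret` are assumptions chosen for illustration.

```python
import secrets

# Assumed field modulus for illustration: the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < P:
        raise ValueError("secret must lie in [0, P)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # The constant term is the secret; the other coefficients are uniformly random.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

Any `threshold` distinct shares recover the secret, for example `recover_secret(split_secret(42, 3, 5)[:3])`. The paper's contribution concerns contractions of general access structures, which this sketch does not implement.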
# Can Virtual Reality Protect Users from Keystroke Inference Attacks? ( http://arxiv.org/abs/2310.16191v1 ) License: see the linked page	Zhuolin Yang, Zain Sarwar, Iris Hwang, Ronik Bhaskar, Ben Y. Zhao, Haitao Zheng	Virtual Reality (VR) has gained popularity by providing immersive and interactive experiences without geographical limitations. It also provides a sense of personal privacy through physical separation. In this paper, we show that despite assumptions of enhanced privacy, VR is unable to shield its users from side-channel attacks that steal private information. Ironically, this vulnerability arises from VR's greatest strength, its immersive and interactive nature. We demonstrate this by designing and implementing a new set of keystroke inference attacks in shared virtual environments, where an attacker (VR user) can recover the content typed by another VR user by observing their avatar. While the avatar displays noisy telemetry of the user's hand motion, an intelligent attacker can use that data to recognize typed keys and reconstruct typed content, without knowing the keyboard layout or gathering labeled data. We evaluate the proposed attacks using IRB-approved user studies across multiple VR scenarios. For 13 out of 15 tested users, our attacks accurately recognize 86%-98% of typed keys, and the recovered content retains up to 98% of the meaning of the original typed content. We also discuss potential defenses.	Translated: 2024-03-25 13:55:39 Published: 2023-10-24
# Driver Safety Reward with Cooperative Platooning using Blockchain ( http://arxiv.org/abs/2312.02164v1 ) License: see the linked page	Sruthi Rachamalla, Henry Hexmoor	Cooperative driving (or platooning) focuses on improving safety and efficiency by connecting two or more vehicles on a road through vehicular communication protocols. The leader is crucial, as it manages the platoon, establishes communication between cars, and performs platoon maneuvers. In this paper, we propose a driver incentive model that encourages platooning on roads, leading to driver safety. As the leader of a platoon has more responsibilities than the followers, our model rewards the leader with more incentives than the followers. These incentives are rewarded as crypto tokens. This digital monetization method for both leaders and followers of a platoon is accomplished by secure transactions using blockchain.	Translated: 2024-03-25 13:06:53 Published: 2023-10-24
# Reinforcement learning based local path planning for mobile robot ( http://arxiv.org/abs/2403.12463v1 ) License: see the linked page	Mehmet Gok, Mehmet Tekerek, Hamza Aydemir	Different methods are used for a mobile robot to go to a specific target location. These methods work in different ways for online and offline scenarios. In the offline scenario, an environment map is created once, and fixed path planning is made on this map to reach the target. Path planning algorithms such as A* and RRT (Rapidly-Exploring Random Tree) are examples of offline methods. The most obvious drawback here is the need to re-plan the path when the conditions of the loaded map change. On the other hand, in the online scenario, the robot moves dynamically to a given target without using a map, relying on the perceived data coming from the sensors. Approaches such as SFM (Social Force Model) are used in online systems. However, these methods suffer from the requirement of a lot of dynamic sensing data. Thus, it can be said that the need for re-planning and mapping in offline systems and the various system design requirements in online systems are the subjects on which autonomous mobile robot research focuses. Recently, deep neural network powered Q-Learning methods have been used as an emerging solution to the aforementioned problems in mobile robot navigation. In this study, machine learning algorithms with deep Q-Learning (DQN) and Deep DQN architectures are evaluated as solutions to the problems presented above, to realize path planning of an autonomous mobile robot that avoids obstacles.	Translated: 2024-03-25 07:36:54 Published: 2023-10-24
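The Q-learning idea underlying the DQN approach in the abstract above can be sketched in tabular form. This is a minimal illustration of the standard temporal-difference update, not the authors' deep network; the function names `q_update` and `greedy_action`, the dictionary-based table, the two-action robot example, and the values of `alpha` and `gamma` are all assumptions.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, done=False):
    """Apply one tabular Q-learning step and return the updated value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


def greedy_action(q_table, state, actions):
    """Pick the action with the highest estimated value in `state`."""
    return max(actions, key=lambda a: q_table[(state, a)])


# Hypothetical example: two moves, and a reward of 1.0 for reaching the goal.
ACTIONS = ("left", "right")
q = defaultdict(float)
q_update(q, state=0, action="right", reward=1.0, next_state=1,
         actions=ACTIONS, done=True)
```

In a DQN, the table lookup is replaced by a neural network that maps sensor readings to action values, while the update target keeps the same general form.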
# Graph-based Trajectory Prediction with Cooperative Information ( http://arxiv.org/abs/2310.15692v1 ) License: see the linked page	Jan Strohbeck, Sebastian Maschke, Max Mertens, Michael Buchholz	For automated driving, predicting the future trajectories of other road users in complex traffic situations is a hard problem. Modern neural networks use the past trajectories of traffic participants as well as map data to gather hints about the possible driver intention and likely maneuvers. With increasing connectivity between cars and other traffic actors, cooperative information is another source of data that can be used as input for trajectory prediction algorithms. Connected actors might transmit their intended path or even complete planned trajectories to other actors, which simplifies the prediction problem due to the imposed constraints. In this work, we outline the benefits of using this source of data for trajectory prediction and propose a graph-based neural network architecture that can leverage this additional data. We show that the network performance increases substantially if cooperative data is present. Also, our proposed training scheme improves the network's performance even for cases where no cooperative information is available. We also show that the network can deal with inaccurate cooperative data, which allows it to be used in real automated driving environments.	Translated: 2024-02-18 14:31:58 Published: 2023-10-24
# AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5? ( http://arxiv.org/abs/2311.10737v1 ) License: see the linked page	Imen Azaiz, Oliver Deckarm, Sven Strickroth	Timely formative feedback is considered one of the most important drivers of effective learning. Delivering timely and individualized feedback is particularly challenging in large classes in higher education. Recently, large language models such as GPT-3 became available to the public and showed promising results on various tasks such as code generation and code explanation. This paper investigates the potential of AI in providing personalized code correction and generating feedback. Based on existing student submissions for two different real-world assignments, the correctness of the AI-aided e-assessment is investigated, along with characteristics of the generated feedback such as fault localization, correctness of hints, and code style suggestions. The results show that 73% of the submissions were correctly identified as either correct or incorrect. In 59% of these cases, GPT-3.5 also successfully generated effective and high-quality feedback. Additionally, GPT-3.5 exhibited weaknesses in its evaluation, including localization of errors that were not the actual errors, or even hallucinated errors. Implications and potential new usage scenarios are discussed.	Translated: 2023-11-27 01:00:53 Published: 2023-10-24
# Contextualised Out-of-Distribution Detection using Pattern Identification ( http://arxiv.org/abs/2311.12855v1 ) License: see the linked page	Romain Xu-Darme (LSL, LIG), Julien Girard-Satabin (LSL), Darryl Hond (TRT UK), Gabriele Incorvaia (TRT UK), Zakaria Chihani (LSL)	In this work, we propose CODE, an extension of existing work from the field of explainable AI that identifies class-specific recurring patterns to build a robust Out-of-Distribution (OoD) detection method for visual classifiers. CODE does not require any classifier retraining and is OoD-agnostic, i.e., tuned directly to the training dataset. Crucially, pattern identification allows us to provide images from the In-Distribution (ID) dataset as reference data to provide additional context to the confidence scores. In addition, we introduce a new benchmark based on perturbations of the ID dataset that provides a known and quantifiable measure of the discrepancy between the ID and OoD datasets, serving as a reference value for the comparison between OoD detection methods.	Translated: 2023-11-27 00:36:35 Published: 2023-10-24
# Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer ( http://arxiv.org/abs/2311.12856v1 ) License: see the linked page	Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park	The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer.	Translated: 2023-11-27 00:18:34 Published: 2023-10-24
# Design Of Rubble Analyzer Probe Using ML For Earthquake ( http://arxiv.org/abs/2311.02087v1 ) License: see the linked page	Abhishek Sebastian, R Pragna, K Vishal Vythianathan, Dasaraju Sohan Sai, U Shiva Sri Hari Al, R Anirudh and Apurv Choudhary	The earthquake rubble analyzer uses machine learning to detect human presence via ambient sounds, achieving 97.45% accuracy. It also provides real-time environmental data, aiding in assessing survival prospects for trapped individuals, which is crucial for post-earthquake rescue efforts.	Translated: 2023-11-12 19:58:03 Published: 2023-10-24
# CMIP X-MOS: Improving Climate Models with Extreme Model Output Statistics ( http://arxiv.org/abs/2311.03370v1 ) License: see the linked page	Vsevolod Morozov, Artem Galliamov, Aleksandr Lukashevich, Antonina Kurdukova, and Yury Maximov	Climate models are essential for assessing the impact of greenhouse gas emissions on our changing climate and the resulting increase in the frequency and severity of natural disasters. Despite the widespread acceptance of climate models produced by the Coupled Model Intercomparison Project (CMIP), they still face challenges in accurately predicting climate extremes, which pose the most significant threats to both people and the environment. To address this limitation and improve predictions of natural disaster risks, we introduce Extreme Model Output Statistics (X-MOS). This approach utilizes deep regression techniques to precisely map CMIP model outputs to real measurements obtained from weather stations, which results in a more accurate analysis of the XXI-century climate extremes. In contrast to previous research, our study places a strong emphasis on enhancing the estimation of the tails of future climate parameter distributions. The latter supports decision-makers, enabling them to better assess climate-related risks across the globe.	Translated: 2023-11-12 19:48:41 Published: 2023-10-24
# Combining Deep Learning on Order Books with Reinforcement Learning for Profitable Trading ( http://arxiv.org/abs/2311.02088v1 ) License: see the linked page	Koti S. Jaddu and Paul A. Bilokon	High-frequency trading is prevalent, where automated decisions must be made quickly to take advantage of price imbalances and patterns in price action that forecast near-future movements. While many algorithms have been explored and tested, analytical methods fail to harness the whole nature of the market environment by focusing on a limited domain. With the ever-growing machine learning field, many large-scale end-to-end studies on raw data have been successfully employed to increase the domain scope for profitable trading but are very difficult to replicate. Combining deep learning on the order books with reinforcement learning is one way of breaking down large-scale end-to-end learning into more manageable and lightweight components for reproducibility, suitable for retail trading. The following work focuses on forecasting returns across multiple horizons using order flow imbalance and training three temporal-difference learning models for five financial instruments to provide trading signals. The instruments used are two foreign exchange pairs (GBPUSD and EURUSD), two indices (DE40 and FTSE100), and one commodity (XAUUSD). The performances of these 15 agents are evaluated through backtesting simulation, and successful models proceed to forward testing on a retail trading platform. The results show potential, but consistently profitable trading requires further minimal modifications to fully handle retail trading costs, slippage, and spread fluctuation.	Translated: 2023-11-12 19:44:07 Published: 2023-10-24
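Order flow imbalance, the predictive feature named in the abstract above, can be sketched as follows. The definition used here is a common best-level formulation from the market microstructure literature and may differ from the paper's exact construction; the `Quote` record and the function names `ofi_step` and `order_flow_imbalance` are assumptions for illustration.

```python
from typing import List, NamedTuple


class Quote(NamedTuple):
    """Best bid/ask snapshot (hypothetical record layout)."""
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float


def ofi_step(prev: Quote, curr: Quote) -> float:
    """Order flow contribution of one quote update at the best level."""
    # Bid side: count the new size if the bid held or improved,
    # and subtract the old size if the bid held or dropped.
    bid_flow = (curr.bid_size if curr.bid_price >= prev.bid_price else 0.0) \
        - (prev.bid_size if curr.bid_price <= prev.bid_price else 0.0)
    # Ask side: count the new size if the ask held or dropped,
    # and subtract the old size if the ask held or rose.
    ask_flow = (curr.ask_size if curr.ask_price <= prev.ask_price else 0.0) \
        - (prev.ask_size if curr.ask_price >= prev.ask_price else 0.0)
    return bid_flow - ask_flow


def order_flow_imbalance(quotes: List[Quote]) -> float:
    """Sum the per-update contributions over a window of consecutive quotes."""
    return sum(ofi_step(p, c) for p, c in zip(quotes, quotes[1:]))
```

A positive value indicates net buying pressure at the best quotes over the window. Values computed over several window lengths could serve as inputs to a return-forecasting model of the kind the abstract describes.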
# Second Born electrons, born again seamen ( http://arxiv.org/abs/2310.17666v1 ) License: see the linked page	A. R. P. Rau	The multiple puns in the title play on a curiosity: that the rescue of a person overboard at sea and the dominance of the second Born term in charge transfer in atomic collisions share common elements of physics. Essentials and commonality in the two are explained.	Translated: 2023-11-05 14:14:46 Published: 2023-10-24
# Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler ( http://arxiv.org/abs/2310.17817v1 ) License: see the linked page	Jiayu Qian, Yuanyuan Liu, Jingya Yang and Qingping Zhou	Bayesian inference with deep generative priors has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The prior distribution is learned from available prior measurements, which makes its selection an important representation-learning problem. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.	Translated: 2023-11-05 14:04:38 Published: 2023-10-24
# Fine tuning Pre trained Models for Robustness Under Noisy Labels ( http://arxiv.org/abs/2310.17668v1 ) License: see the linked page	Sumyeong Ahn, Sihyeon Kim, Jongwoo Ko, Se-Young Yun	The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models. To tackle this issue, researchers have explored methods for Learning with Noisy Labels to identify clean samples and reduce the influence of noisy labels. However, constraining the influence of a certain portion of the training dataset can result in a reduction in overall generalization performance. To alleviate this, recent studies have considered the careful utilization of noisy labels by leveraging huge computational resources. Therefore, the increasing training cost necessitates a reevaluation of efficiency. In other areas of research, there has been a focus on developing fine-tuning techniques for large pre-trained models that aim to achieve both high generalization performance and efficiency. However, these methods have mainly concentrated on clean datasets, and there has been limited exploration of the noisy label scenario. In this research, our aim is to find an appropriate way to fine-tune pre-trained models for noisy labeled datasets. To achieve this goal, we investigate the characteristics of pre-trained models when they encounter noisy datasets. Through empirical analysis, we introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models. The algorithm consists of two main steps: (1) independently tuning the linear classifier to protect the feature extractor from being distorted by noisy labels, and (2) reducing the noisy label ratio and fine-tuning the entire model based on the noise-reduced dataset to adapt it to the target dataset. The proposed algorithm has been extensively tested and demonstrates efficient yet improved denoising performance on various benchmarks compared to previous methods.	Translated: 2023-11-05 14:03:58 Published: 2023-10-24
# DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding ( http://arxiv.org/abs/2310.18359v1 ) License: see the linked page	Xiao-Yu Guo and Yuan-Fang Li and Gholamreza Haffari	Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.	Translated: 2023-11-05 13:56:00 Published: 2023-10-24
# 大規模言語モデルのプロンプト工学手法に関するコミュニケーション理論の展望 A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models ( http://arxiv.org/abs/2310.18358v1 ) ライセンス: Link先を確認	Yuanfeng Song, Yuanqin He, Xuefang Zhao, Hanlin Gu, Di Jiang, Haijun Yang, Lixin Fan, Qiang Yang	(参考訳) 大規模言語モデル(LLM)の台頭により、コミュニティはシングルタスク指向の自然言語処理(NLP)研究から総合的なエンドツーエンドマルチタスク学習パラダイムへと移行した。この分野におけるこの研究の線に沿って、LLMベースのプロンプト法は、プロンプト工学(PE)による技術的アドバンテージと、様々なプロンプト法によって開示される基礎的NLP原則によって、多くの注目を集めている。従来の教師付き学習では、通常、ラベル付きデータに基づいてモデルをトレーニングし、その後に予測を行う必要がある。対照的にPE法は、特に少数ショットやゼロショットのシナリオにおいて、適切なプロンプトを構成することによって既存のLLM(GPT-3とGPT-4)の強力な能力を直接利用する。プロンプトに関する研究の豊富さと、この分野の絶えず進化する性質を踏まえ、本論文は以下を目的とする。 (i) 確立された通信理論の枠組みの中で,既存のPE手法をレビューするための新たな視点を示す。 (ii) 4つの典型的な課題に使用される既存のPE手法の発展動向の理解を深める。 (iii) 将来のPE法の有望な研究方向に光を当てる。 The springing up of Large Language Models (LLMs) has shifted the community from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research endeavors in the area, LLM-based prompting methods have attracted much attention, partially due to the technological advantages brought by prompt engineering (PE) as well as the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model based on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (i.e., GPT-3 and GPT-4) via composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to the prompting and the ever-evolving nature of this field, this article aims to (i) illustrate a novel perspective to review existing PE methods, within the well-established communication theory framework; (ii) facilitate a better/deeper understanding of developing trends of existing PE methods used in four typical tasks; (iii) shed light on promising research directions for future PE methods.	
翻訳日:2023-11-05 13:55:41 公開日:2023-10-24
# 大規模言語モデルを活用したeコマースにおける製品記述の強化 Leveraging Large Language Models for Enhanced Product Descriptions in eCommerce ( http://arxiv.org/abs/2310.18357v1 ) ライセンス: Link先を確認	Jianghong Zhou and Bo Liu and Jhalak Nilesh Acharya Yao Hong and Kuang-chih Lee and Musen Wen	(参考訳) eコマースのダイナミックな分野では、検索の可視性と顧客エンゲージメントを高めるために、製品記述の品質と包括性が重要である。効果的な製品説明は、'コールドスタート'問題に対処し、市場のトレンドに合わせて、最終的にクリックスルー率の増加につながる。これらの記述を作成するための従来の手法は、しばしば多大な人手を伴い、一貫性とスケーラビリティの両方を欠くことがある。本稿では,LLAMA 2.0 7B言語モデルを用いた製品記述の自動生成手法を提案する。私たちは、最大のeコマースプラットフォームの1つであるWalmartから、本物の製品説明のデータセットでモデルをトレーニングします。このモデルは、ドメイン固有の言語機能やeコマースのニュアンスに合わせて微調整され、販売やユーザエンゲージメントにおける実用性を高める。我々は、NDCG、顧客クリックスルー率、人間評価など、複数の評価指標を用いて、アプローチの有効性を検証する。この結果から,システムはスケーラブルであるだけでなく,製品記述の作成に関わる人的作業量を大幅に削減できることが判明した。本研究は,eコマースプラットフォームのさまざまな面の自動化と最適化において,LLAMA 2.0 7B のような大規模言語モデルのかなりの可能性を強調し,検索機能の改善や販売の増加など,大きなビジネス上の効果をもたらす。 In the dynamic field of eCommerce, the quality and comprehensiveness of product descriptions are pivotal for enhancing search visibility and customer engagement. Effective product descriptions can address the 'cold start' problem, align with market trends, and ultimately lead to increased click-through rates. Traditional methods for crafting these descriptions often involve significant human effort and may lack both consistency and scalability. This paper introduces a novel methodology for automating product description generation using the LLAMA 2.0 7B language model. We train the model on a dataset of authentic product descriptions from Walmart, one of the largest eCommerce platforms. The model is then fine-tuned for domain-specific language features and eCommerce nuances to enhance its utility in sales and user engagement. We employ multiple evaluation metrics, including NDCG, customer click-through rates, and human assessments, to validate the effectiveness of our approach. Our findings reveal that the system is not only scalable but also significantly reduces the human workload involved in creating product descriptions. 
This study underscores the considerable potential of large language models like LLAMA 2.0 7B in automating and optimizing various facets of eCommerce platforms, offering significant business impact, including improved search functionality and increased sales.	翻訳日:2023-11-05 13:55:21 公開日:2023-10-24
# ヒューリスティックから分析へ:コヒーレント物理コモンセンス推論のための認知的動機付け戦略 From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning ( http://arxiv.org/abs/2310.18364v1 ) ライセンス: Link先を確認	Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai	(参考訳) 事前学習済み言語モデル(PLM)は、様々な言語タスクにおいて印象的なパフォーマンスを示している。しかし、それらは疑似相関に陥りやすく、しばしば事実に基づかない情報を生成する。現実世界のアプリケーションでは、PLMは形式化された一貫性のある推論チェーンで決定を正当化する必要があるが、この課題は十分に探究されていない。認知心理学は、人間が高速で直感的なヒューリスティックな思考を活用して過去の経験に基づいて意思決定を行い、その後、より遅く熟慮的な分析的推論を通じて決定を合理化することができると理論化している。 PLMによる微調整および文脈内学習にこれらの相互に結びついた二重プロセスを導入し、コヒーレントな物理コモンセンス推論を必要とする2つの言語理解タスクに適用する。提案するヒューリスティック・分析的推論(HAR)戦略はモデル決定の合理化のコヒーレンスを劇的に改善し,Tiered Reasoning for Intuitive Physics(TRIP)において最先端の結果をもたらすことを示した。また、この改良されたコヒーレンスが、推論の各ステップにおいて、関連する言語コンテキストに対するより忠実な注意の直接の結果であることも分かりました。以上の結果から, 人間のような推論戦略によってPLM推論の一貫性と信頼性を効果的に向上できる可能性が示唆された。 Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing fast and intuitive heuristic thinking to make decisions based on past experience, then rationalizing the decisions through slower and deliberative analytic reasoning. We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning. We show that our proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP). We also find that this improved coherence is a direct result of more faithful attention to relevant language context in each step of reasoning. 
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.	翻訳日:2023-11-05 13:40:07 公開日:2023-10-24
# 強化学習におけるグラフ畳み込みネットワークを用いた会話エージェントの文脈化リアルタイムマルチモーダル感情認識 A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning ( http://arxiv.org/abs/2310.18363v1 ) ライセンス: Link先を確認	Fathima Abdul Rahman, Guang Lu	(参考訳) 最近の生成型人工知能(genai)と大規模言語モデル(llm)の発展により、会話エージェントはますます普及し、受け入れられている。身近な方法でインタラクションし、仮想的なコンパニオンとしてサポートすることで、ヒューマンタッチを提供します。したがって、ユーザの感情を理解して慎重に反応することが重要である。感情認識の標準的な問題と比較すると、会話エージェントはリアルタイムでなければならないという追加の制約に直面している。音声、視覚、テキストのモダリティを用いたモデルアーキテクチャの研究は、オンライン機能を提供しないフルビデオシーケンスを用いた感情分類に重点を置いている。本稿では,グラフ畳み込みネットワークと強化学習(coner-grl)を用いたコンテキスト化感情認識のための新しいパラダイムを提案する。会話は、文脈情報の効果的な抽出のために、発話の小さなグループに分割される。このシステムは、GRU(Gated Recurrent Units)を用いて、これらの発話群からマルチモーダル特徴を抽出する。さらに重要なことに、グラフ畳み込みネットワーク(gcn)と強化学習(rl)エージェントは、インタラクティブなシナリオにおける感情機能の複雑な依存関係を捉えるために訓練される。 ConER-GRLモデルとベンチマークデータセット上の他の最先端モデルを比較して、IEMOCAPはマルチモーダルな会話信号からリアルタイムで感情を認識する際に、conER-GRLアーキテクチャの利点を示す。 Owing to the recent developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLM), conversational agents are becoming increasingly popular and accepted. They provide a human touch by interacting in ways familiar to us and by providing support as virtual companions. Therefore, it is important to understand the user's emotions in order to respond considerately. Compared to the standard problem of emotion recognition, conversational agents face an additional constraint in that recognition must be real-time. Studies on model architectures using audio, visual, and textual modalities have mainly focused on emotion classification using full video sequences that do not provide online features. In this work, we present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Network with Reinforcement Learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. 
The system uses Gated Recurrent Units (GRU) to extract multimodal features from these groups of utterances. More importantly, Graph Convolutional Networks (GCN) and Reinforcement Learning (RL) agents are cascade trained to capture the complex dependencies of emotion features in interactive scenarios. Comparing the results of the conER-GRL model with other state-of-the-art models on the benchmark dataset IEMOCAP demonstrates the advantageous capabilities of the conER-GRL architecture in recognizing emotions in real-time from multimodal conversational signals.	翻訳日:2023-11-05 13:39:40 公開日:2023-10-24
# SoK: 汎用大規模言語モデルにおける記憶 SoK: Memorization in General-Purpose Large Language Models ( http://arxiv.org/abs/2310.18362v1 ) ライセンス: Link先を確認	Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West	(参考訳) 大規模言語モデル(LLM)は、無数のアプリケーションが開発中で、目覚ましいペースで進んでいる。従来の機械学習モデルの多くとは異なり、それらはもはや特定のアプリケーションのために構築されるものではなく、幅広いタスクに優れるように設計されている。この成功の大きな要因は、膨大なトレーニングデータセットと、トレーニングデータに含まれる大量の情報を記憶できる前例のない数のモデルパラメータにある。この記憶は単なる言語にとどまらず、いくつかの文書にのみ存在する情報を包含している。これは、質問応答のようなタスクを実行するために必要であり、したがって学習の重要な部分であるため、しばしば望ましいが、プライバシーやセキュリティ、著作権など、さまざまな問題をもたらす。 LLMはトレーニングデータの短い秘密を記憶できるだけでなく、さまざまな方法でテキストで表現できる事実や文体といった概念を記憶することもできる。本稿では,逐語的テキスト,事実,アイデアとアルゴリズム,文体,分布特性,アライメント目標を網羅したLLMにおける記憶のための分類法を提案する。モデル性能,プライバシ,セキュリティと機密性,著作権,監査に対して各種類の記憶が持つ(肯定的および否定的な)意味と,記憶の検出と防止方法について述べる。さらに,モデルの重みではなくモデルの振る舞いに基づいて記憶を定義する方法が主流であることから生じる課題を,推論能力や復号アルゴリズムの違いといったLLM特有の現象の観点から強調する。本稿全体を通して,LLMの記憶から生じる潜在的なリスクと機会について述べ,新たな研究の方向性を動機づけることを期待する。 Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. 
We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.	翻訳日:2023-11-05 13:39:16 公開日:2023-10-24
# ユナニ医学従事者のための臨床判断支援システム Clinical Decision Support System for Unani Medicine Practitioners ( http://arxiv.org/abs/2310.18361v1 ) ライセンス: Link先を確認	Haider Sultan, Hafiza Farwa Mahmood, Noor Fatima, Marriyam Nadeem and Talha Waheed	(参考訳) 伝統医学の他の分野と同様に、ユナニ医学は長年にわたり有効な医療として見なされてきた。現在でも亜大陸、特にパキスタンやインドで広く使われている。しかし、ユナニ医学の実践者は日々の臨床実践において現代のIT応用を欠いている。オンライン臨床意思決定支援システムは、この課題に対処し、見習いのユナニ医学実践者の診断過程を支援しうる。提案システムは、患者の症状を入力するためのwebベースのインターフェースを提供し、その症状を自動的に分析し、可能性のある疾患のリストを生成する。このシステムにより、実践者は最も可能性の高い疾患を選択し、関連する治療法を患者に遠隔で知らせることができる。このシステムは、オンライン臨床意思決定支援システム、人工知能推論エンジン、総合的なユナニ医学データベースの3つのモジュールで構成されている。このシステムは、決定木、ディープラーニング、自然言語処理といった高度なAI技術を採用している。システム開発では、React、FastAPI、MySQLを含むテクノロジスタックを使用した。アプリケーションのデータと機能は、同様のドメインアプリケーションとの統合と拡張のためにAPIを使用して公開されます。このプロジェクトの新規性は、ユナニ医学の原則の文脈で、病気を正確にかつ効率的に診断することの課題に対処することである。技術力を活用することで, 医療サービスや情報へのアクセスの容易化, コスト削減, 開業医や患者の満足度向上, 診断プロセスの速度と正確性の向上, 遠隔での効果的な治療が期待できる。このアプリケーションは、ユナニ医学の実践者、患者、政府の医薬品規制当局、ソフトウェア開発者、医学研究者に役立つ。 Like other fields of Traditional Medicines, Unani Medicines have been found as an effective medical practice for ages. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani Medicines Practitioners are lacking modern IT applications in their everyday clinical practices. An Online Clinical Decision Support System may address this challenge to assist apprentice Unani Medicines practitioners in their diagnostic processes. The proposed system provides a web-based interface to enter the patient's symptoms, which are then automatically analyzed by our system to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and inform patients about the associated treatment options remotely. 
The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs advanced AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. The application's data and functionality are exposed through APIs for integration and extension with similar domain applications. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani Medicines principles. By leveraging the power of technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani Medicines Practitioners, Patients, Government Drug Regulators, Software Developers, and Medical Researchers.	翻訳日:2023-11-05 13:38:50 公開日:2023-10-24
# LLMが自らを欺くよう誘導する: 機械読解のショートカットトリガーを自動的に操作する Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers ( http://arxiv.org/abs/2310.18360v1 ) ライセンス: Link先を確認	Mosh Levy, Shauli Ravfogel, Yoav Goldberg	(参考訳) 機械読解(MRC)システムにおけるLLMの最近の応用は目覚ましい結果を示しているが、真のラベルと疑似的に相関した特徴によって引き起こされるメカニズムであるショートカットの使用は、その信頼性に対する潜在的な脅威として現れている。そこで本研究では,LLMを誤解させるようテキストを編集するよう誘導される編集者としてのLLMと,編集されたテキストに基づいて質問に回答する読者としてのLLMの2つの角度から問題を分析する。サンプルに潜在的なショートカットトリガーを追加するようエディタを誘導するフレームワークを導入する。 GPT4をエディタとして使うと、LLMを騙すショートカットトリガーをサンプルにうまく編集して加えられることがわかった。 LLMを読者として分析すると、能力のあるLLMであってもショートカット知識で騙されうることが観察された。驚くべきことに、GPT4は自身の編集によって欺かれうる(F1が15%低下)。本研究の結果は,ショートカット操作に対するLLMの本質的な脆弱性を浮き彫りにする。今後の研究のためにフレームワークが生成したキュレートデータセットであるShortcutQAを公開します。 Recent applications of LLMs in Machine Reading Comprehension (MRC) systems have shown impressive results, but the use of shortcuts, mechanisms triggered by features spuriously correlated to the true label, has emerged as a potential threat to their reliability. We analyze the problem from two angles: LLMs as editors, guided to edit text to mislead LLMs; and LLMs as readers, who answer questions based on the edited text. We introduce a framework that guides an editor to add potential shortcuts-triggers to samples. Using GPT4 as the editor, we find it can successfully edit trigger shortcut in samples that fool LLMs. Analysing LLMs as readers, we observe that even capable LLMs can be deceived using shortcut knowledge. Strikingly, we discover that GPT4 can be deceived by its own edits (15% drop in F1). Our findings highlight inherent vulnerabilities of LLMs to shortcut manipulations. We publish ShortcutQA, a curated dataset generated by our framework for future research.	翻訳日:2023-11-05 13:38:27 公開日:2023-10-24
# 非定常確率的多腕バンディットのためのリスク回避フレームワーク A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits ( http://arxiv.org/abs/2310.19821v1 ) ライセンス: Link先を確認	Reda Alami, Mohammed Mahfoud, Mastane Achab	(参考訳) 典型的な確率的多腕バンディット問題では、ある時間軸 $T$ にわたる報酬の期待総和を最大化することが目的となることが多い。これを達成する戦略の選択は追加情報がなければ最適であるが、環境固有の追加知識が与えられた場合はもはやそうではない。特に、医療や金融のような高ボラティリティの分野では、単純な報酬最大化アプローチは、学習問題の複雑さを正確に捉えておらず、信頼性の低いソリューションをもたらすことが多い。そこで本研究では,非定常環境で動作する適応型リスクアウェア戦略の枠組みを提案する。本フレームワークは,多腕バンディットアルゴリズムの複数のファミリーをリスクに敏感な設定にマップするために,文献に広く普及する様々なリスク尺度を取り入れている。さらに、得られたアルゴリズムをRestarted Bayesian Online Change-Point Detection (R-BOCPD)アルゴリズムと組み合わせ、局所的な(アームごとの)スイッチを検出するために(調整可能な)強制探索戦略を課す。我々は、有限時間の理論的保証と、時間軸 $T$ までのオーダー $\tilde O(\sqrt{K_T T})$ の漸近的リグレット上界を与える。ここで $K_T$ は変化点の総数である。実際に,本フレームワークは,合成環境と実環境の両方において最先端技術と比べて良好な結果を示し,リスク感受性と非定常性の両方に関して効率よく機能する。 In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no longer the case when provided additional environment-specific knowledge. In particular, in areas of high volatility like healthcare or finance, a naive reward maximization approach often does not accurately capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. 
We provide finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$ with $K_T$ the total number of change-points. In practice, our framework compares favorably to the state-of-the-art in both synthetic and real-world environments and manages to perform efficiently with respect to both risk-sensitivity and non-stationarity.	翻訳日:2023-11-05 13:28:44 公開日:2023-10-24
# NetDistiller: in-situ蒸留によるTiny Deep Learningの強化 NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation ( http://arxiv.org/abs/2310.19820v1 ) ライセンス: Link先を確認	Shunyao Zhang, Yonggan Fu, Shang Wu, Jyotikrishna Dass, Haoran You, Yingyan (Celine) Lin	(参考訳) 小さなニューラルネットワーク(TNN)のタスク精度を高めることは、メモリ、計算、帯域幅、電源の制限によって制限されるエッジデバイスへのTNNのデプロイを可能にするための根本的な課題となっている。そこで本研究では,TNNのチャネル数を拡大して構築した重み共有教師のサブネットワークとして扱うことにより,TNNの達成可能な精度を高めるためのNetDistillerというフレームワークを提案する。具体的には, 目標TNNモデルと, 1) 勾配の衝突に対処するための勾配手術と(2) 教師モデルの過度な適合を緩和するための不確実性を考慮した蒸留を通じて, 重み付け教師モデルとの共同訓練を行う。多様なタスクにわたる大規模な実験は、最先端の手法よりも達成可能なTNNの精度を高めるNetDistillerの有効性を検証する。私たちのコードは https://github.com/GATECH-EIC/NetDistiller から入手可能です。 Boosting the task accuracy of tiny neural networks (TNNs) has become a fundamental challenge for enabling the deployments of TNNs on edge devices which are constrained by strict limitations in terms of memory, computation, bandwidth, and power supply. To this end, we propose a framework called NetDistiller to boost the achievable accuracy of TNNs by treating them as sub-networks of a weight-sharing teacher constructed by expanding the number of channels of the TNN. Specifically, the target TNN model is jointly trained with the weight-sharing teacher model via (1) gradient surgery to tackle the gradient conflicts between them and (2) uncertainty-aware distillation to mitigate the overfitting of the teacher model. Extensive experiments across diverse tasks validate NetDistiller's effectiveness in boosting TNNs' achievable accuracy over state-of-the-art methods. Our code is available at https://github.com/GATECH-EIC/NetDistiller.	翻訳日:2023-11-05 13:28:16 公開日:2023-10-24
# 構成ファインチューニング: 汎化向上のための事前学習済みデノイジングオートエンコーダの凍結 Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization ( http://arxiv.org/abs/2006.16205v4 ) ライセンス: Link先を確認	Sang Michael Xie, Tengyu Ma, Percy Liang	(参考訳) 我々は,コードがコンパイル可能でなければならない擬似コードからコードへの変換など,出力の妥当性制約を受ける構造化出力の予測問題に注目する。ラベル付き入出力ペアは入手に費用がかかるが、"ラベルなし"出力(つまり、対応する入力のない出力)は自由に利用可能であり(GitHub上のコードなど)、出力妥当性に関する情報を提供する。ラベルなし出力の劣化バージョンを復調するためにデノイザを事前訓練することで、出力構造をキャプチャできる。まず,プレトレーニング後の標準的な微調整が,この構造の一部を破壊していることを示す。次に, 出力構造を保存するために凍結した事前学習済みデノイザと合成した予測器を微調整する構成ファインチューニングを提案する。 2層ReLUネットワークの場合、構成ファインチューニングによって予測器の複雑さが大幅に減少し、一般化が向上することを証明する。実験により,2つの擬似コードからコードへの変換データセットにおいて,構成ファインチューニングが標準的な微調整を上回ることを示した(相対で3%と6%)。構成ファインチューニングによる改善は、アウト・オブ・ディストリビューション(OOD)の例(相対で4%と25%)で拡大される。 We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. We can capture the output structure by pre-training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser, which is frozen to preserve output structure. For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). The improvement from composed fine-tuning is magnified on out-of-distribution (OOD) examples (4% and 25% relative).	翻訳日:2023-10-28 07:34:29 公開日:2023-10-24
# 説明可能なニューラル推論のための前方構成伝搬 Forward Composition Propagation for Explainable Neural Reasoning ( http://arxiv.org/abs/2112.12717v4 ) ライセンス: Link先を確認	Isel Grau and Gonzalo Nápoles and Marilyn Bello and Yamisleydi Salgueiro and Agnieszka Jastrzebska	(参考訳) 本稿では,構造化された分類問題を扱うフィードフォワードニューラルネットワークの予測を説明するため,Forward Composition Propagation(FCP)と呼ばれるアルゴリズムを提案する。提案するFCPアルゴリズムでは、各ニューロンは、そのニューロンにおける各問題の特徴の役割を示す構成ベクトルによって記述される。構成ベクトルは与えられた入力インスタンスを使用して初期化され、出力層に到達するまでネットワーク全体に伝播する。各構成値の符号は、対応する特徴がニューロンを興奮させるか阻害するかを示し、絶対値はその影響を定量化する。 FCPアルゴリズムは、学習プロセスが完了した後に、ポストホックベースで実行される。本稿では,FCPアルゴリズムを説明することを目的として,根拠真理が分かっている公平性問題におけるバイアス検出に関するケーススタディを開発した。シミュレーションの結果, 構成値は保護特徴の期待挙動と密接に一致することがわかった。この論文のソースコードと補足資料は https://github.com/igraugar/fcp で入手できる。 This paper proposes an algorithm called Forward Composition Propagation (FCP) to explain the predictions of feed-forward neural networks operating on structured classification problems. In the proposed FCP algorithm, each neuron is described by a composition vector indicating the role of each problem feature in that neuron. Composition vectors are initialized using a given input instance and subsequently propagated through the whole network until reaching the output layer. The sign of each composition value indicates whether the corresponding feature excites or inhibits the neuron, while the absolute value quantifies its impact. The FCP algorithm is executed on a post-hoc basis, i.e., once the learning process is completed. Aiming to illustrate the FCP algorithm, this paper develops a case study concerning bias detection in a fairness problem in which the ground truth is known. The simulation results show that the composition values closely align with the expected behavior of protected features. The source code and supplementary material for this paper are available at https://github.com/igraugar/fcp.	翻訳日:2023-10-28 07:05:57 公開日:2023-10-24
# クロスドメイン少数ショット学習のための特徴抽出器スタッキング Feature Extractor Stacking for Cross-domain Few-shot Learning ( http://arxiv.org/abs/2205.05831v4 ) ライセンス: Link先を確認	Hongyu Wang, Eibe Frank, Bernhard Pfahringer, Michael Mayo, Geoffrey Holmes	(参考訳) クロスドメイン少数ショット学習(CDFSL)は、知識を1つ以上のソースドメインから、明確に異なる分布を持つインスタンスの乏しいターゲットドメインに転送する必要がある学習問題に対処する。最近発表されたCDFSL法は一般に、複数のソースドメインの知識を1つの特徴抽出器に組み合わせた普遍モデルを構築している。これにより効率的な推論が可能になるが、新しいソースドメインが追加されるたびに抽出器を再計算する必要がある。これらの手法の一部は、異種のソースドメイン抽出器アーキテクチャと互換性がない。そこで本研究では,抽出器の集合からの情報を組み合わせる新しいCDFSL法である特徴抽出器スタッキング(FES)を提案する。この手法は,異種の事前学習済み抽出器をそのまま利用でき,抽出器の集合が更新されるたびに再計算が必要となる普遍モデルを維持しない。本稿では,古典的なスタック汎化(stacked generalisation)法に着想を得た基本的FESアルゴリズムと,畳み込みFES(ConFES)と正則化FES(ReFES)の2つの変種を紹介する。対象領域のタスクが与えられた場合、これらのアルゴリズムは、各抽出器を独立に微調整し、クロスバリデーションを使用して、サポートセットからスタック汎化のためのトレーニングデータを抽出し、このデータから単純な線形スタッキング分類器を学習する。我々は,畳み込みニューラルネットワークを用いた画像分類を対象とするMeta-Datasetベンチマークにおいて,FES法を評価し,最先端の性能を達成できることを示した。 Cross-domain few-shot learning (CDFSL) addresses learning problems where knowledge needs to be transferred from one or more source domains into an instance-scarce target domain with an explicitly different distribution. Recently published CDFSL methods generally construct a universal model that combines knowledge of multiple source domains into one feature extractor. This enables efficient inference but necessitates re-computation of the extractor whenever a new source domain is added. Some of these methods are also incompatible with heterogeneous source domain extractor architectures. We propose feature extractor stacking (FES), a new CDFSL method for combining information from a collection of extractors, that can utilise heterogeneous pretrained extractors out of the box and does not maintain a universal model that needs to be re-computed when its extractor collection is updated. We present the basic FES algorithm, which is inspired by the classic stacked generalisation approach, and also introduce two variants: convolutional FES (ConFES) and regularised FES (ReFES). 
Given a target-domain task, these algorithms fine-tune each extractor independently, use cross-validation to extract training data for stacked generalisation from the support set, and learn a simple linear stacking classifier from this data. We evaluate our FES methods on the well-known Meta-Dataset benchmark, targeting image classification with convolutional neural networks, and show that they can achieve state-of-the-art performance.	翻訳日:2023-10-28 06:54:33 公開日:2023-10-24
# 治療コミュニティにおける相互影響の同定 Identifying Peer Influence in Therapeutic Communities ( http://arxiv.org/abs/2203.14223v3 ) ライセンス: Link先を確認	Shanjukta Nath, Keith Warren, Subhadeep Paul	(参考訳) 治療コミュニティ(TCs)の卒業に相互の影響や役割モデルの影響があるかを検討する。住民間の確認書と修正書の交換記録と正確な出入り日数を記録した3TCから匿名化された個人レベルの観測データを分析した。アサーションによってピアネットワークを形成することができ、エントリとイグジットの日付は関心の因果効果を定義することができる。因果的役割モデルの効果を,社会的接点の1つ(例えば,肯定的な意見を述べた仲間)を観察できる住民(ego)の期待結果の差を測定することで,egoの退学前に卒業を成功させるか,egoの退学前に卒業を成功させるか,という概念化する。ピアインフルエンスは通常観測データにおいて観測されていないホモフィアと結合するので、ネットワークを潜在変数モデルでモデル化し、ホモフィアを推定し、結果方程式に含める。我々は,ピア影響推定器のバイアスがサンプルサイズとともに減少するという理論的保証を提供する。以上の結果から,学生の卒業が住民の卒業に与える影響が示唆された。ピアの影響の大きさは、性別、人種、ロールモデル効果の定義に基づいて異なる。カウンターファクチュアル・エクササイズは、ネットワークの伝播を通じて、被治療者の直接的および間接的に、友人を「危険にさらされている」個人に割り当てることの潜在的利益を定量化する。 We investigate if there is a peer influence or role model effect on successful graduation from Therapeutic Communities (TCs). We analyze anonymized individual-level observational data from 3 TCs that kept records of written exchanges of affirmations and corrections among residents, and their precise entry and exit dates. The affirmations allow us to form peer networks, and the entry and exit dates allow us to define a causal effect of interest. We conceptualize the causal role model effect as measuring the difference in the expected outcome of a resident (ego) who can observe one of their social contacts (e.g., peers who gave affirmations), to be successful in graduating before the ego's exit vs not successfully graduating before the ego's exit. Since peer influence is usually confounded with unobserved homophily in observational data, we model the network with a latent variable model to estimate homophily and include it in the outcome equation. We provide a theoretical guarantee that the bias of our peer influence estimator decreases with sample size. Our results indicate there is an effect of peers' graduation on the graduation of residents. 
The magnitude of peer influence differs based on gender, race, and the definition of the role model effect. A counterfactual exercise quantifies the potential benefits of intervention of assigning a buddy to "at-risk" individuals directly on the treated resident and indirectly on their peers through network propagation.	翻訳日:2023-10-28 06:53:18 公開日:2023-10-24
# 畳み込みスパース符号化による教師なしエネルギー分散 Unsupervised energy disaggregation via convolutional sparse coding ( http://arxiv.org/abs/2207.09785v2 ) ライセンス: Link先を確認	Christian Aarset (1) and Andreas Habring (1) and Martin Holler (1) and Mario Mitter (2) ((1) University of Graz, (2) Solgenium OG)	(参考訳) 本研究では,スマートメータを備えた民家における非教師なしエネルギー分散手法を提案する。本手法は, 電力消費を能動的・受動的に分類し, 直接の相互作用なしに住民の活動や存在を報告できることを目的とする。これは、個人住宅の非侵入的な健康モニタリングのようなアプリケーションの基盤となる。提案手法は,ipalm(inertial proximal alternating linearized minimization)アルゴリズムを用いて,収束を保証した種々の条件を満たした適切なエネルギー汎関数を最小化するものである。提案手法の実現可能性を確認するため,半合成テストデータセットに関する実験と,既存の教師付き手法との比較を行った。 In this work, a method for unsupervised energy disaggregation in private households equipped with smart meters is proposed. This method aims to classify power consumption as active or passive, granting the ability to report on the residents' activity and presence without direct interaction. This lays the foundation for applications like non-intrusive health monitoring of private homes. The proposed method is based on minimizing a suitable energy functional, for which the iPALM (inertial proximal alternating linearized minimization) algorithm is employed, demonstrating that various conditions guaranteeing convergence are satisfied. In order to confirm feasibility of the proposed method, experiments on semi-synthetic test data sets and a comparison to existing, supervised methods are provided.	翻訳日:2023-10-28 06:46:06 公開日:2023-10-24
# Open-Radiomics: 標準化されたデータセットのコレクションと再現可能なラジオミクス機械学習パイプラインのための技術プロトコル Open-radiomics: A Collection of Standardized Datasets and a Technical Protocol for Reproducible Radiomics Machine Learning Pipelines ( http://arxiv.org/abs/2207.14776v2 ) ライセンス: Link先を確認	Khashayar Namdar, Matthias W. Wagner, Birgit B. Ertl-Wagner, Farzad Khalvati	(参考訳) 目的: 医療画像における機械学習パイプラインの重要な分野として、ラジオミクスは再現性とアクセシビリティという2つの大きな課題に直面している。本研究では,ラジオミクス特徴抽出が結果の再現性に及ぼす影響を調べるため,提案する技術プロトコルに基づく包括的なラジオミクスパイプラインとともに,ラジオミクスデータセットのセットであるopen-radiomicsを導入する。材料と方法: 実験はBraTS 2020オープンソースMRI(Magnetic Resonance Imaging)データセットで行われ、369人の成人脳腫瘍患者(低悪性度グリオーマ(LGG)76例、高悪性度グリオーマ(HGG)293例)を含む。 LGGとHGGの分類にPyRadiomicsライブラリを使用し、4つのMRIシーケンス、3つのbinWidth、6つの画像正規化法、4つの腫瘍サブリージョンの組み合わせからなる288のラジオミクスデータセットを形成する。ランダムフォレスト分類器が使用され、各ラジオミクスデータセットについて、異なるデータ分割とモデルのランダム状態を用いた訓練・検証・テスト(60%/20%/20%)実験を100回繰り返し(テスト結果は28,800件)、Area Under Receiver Operating Characteristic Curve(AUC)を算出した。結果: binWidthや画像正規化と異なり,腫瘍サブリージョンと撮像シーケンスはモデルの性能に大きく影響した。 T1コントラスト強調シーケンスと、壊死領域と非増強腫瘍コア領域の和集合により、最高のAUC(平均テストAUC 0.951, 95%信頼区間 (0.949, 0.952))が得られた。 28の設定とデータ分割でテストAUCが1となったが,再現性はなかった。結論: この実験は, ラジオミクスパイプラインの変動源(例:腫瘍サブリージョン)が結果に有意な影響を及ぼしうることを示しており, 再現不可能な見かけ上の完璧な性能に繋がる可能性がある。 Purpose: As an important branch of machine learning pipelines in medical imaging, radiomics faces two major challenges namely reproducibility and accessibility. In this work, we introduce open-radiomics, a set of radiomics datasets along with a comprehensive radiomics pipeline based on our proposed technical protocol to investigate the effects of radiomics feature extraction on the reproducibility of the results. Materials and Methods: Experiments are conducted on BraTS 2020 open-source Magnetic Resonance Imaging (MRI) dataset that includes 369 adult patients with brain tumors (76 low-grade glioma (LGG), and 293 high-grade glioma (HGG)). Using PyRadiomics library for LGG vs. HGG classification, 288 radiomics datasets are formed; the combinations of 4 MRI sequences, 3 binWidths, 6 image normalization methods, and 4 tumor subregions. 
Random Forest classifiers were used, and for each radiomics dataset the training-validation-test (60%/20%/20%) experiment with different data splits and model random states was repeated 100 times (28,800 test results) and Area Under Receiver Operating Characteristic Curve (AUC) was calculated. Results: Unlike binWidth and image normalization, tumor subregion and imaging sequence significantly affected performance of the models. T1 contrast-enhanced sequence and the union of necrotic and the non-enhancing tumor core subregions resulted in the highest AUCs (average test AUC 0.951, 95% confidence interval of (0.949, 0.952)). Although 28 settings and data splits yielded test AUC of 1, they were irreproducible. Conclusion: Our experiments demonstrate the sources of variability in radiomics pipelines (e.g., tumor subregion) can have a significant impact on the results, which may lead to superficial perfect performances that are irreproducible.	翻訳日:2023-10-28 06:32:04 公開日:2023-10-24
# 小児低グレードグリオーマ腫瘍の3次元確率分布を用いた分子サブタイプ同定による深層学習モデルの改善 Improving Deep Learning Models for Pediatric Low-Grade Glioma Tumors Molecular Subtype Identification Using 3D Probability Distributions of Tumor Location ( http://arxiv.org/abs/2210.07287v2 ) ライセンス: Link先を確認	Khashayar Namdar, Matthias W. Wagner, Kareem Kudus, Cynthia Hawkins, Uri Tabori, Brigit Ertl-Wagner, Farzad Khalvati	(参考訳) 背景と目的:小児低次グリオーマ(pLGG)は小児で最も一般的な脳腫瘍であり,pLGGの分子マーカーの同定は治療計画の立案に不可欠である。 pLGGサブタイプ同定のための畳み込みニューラルネットワーク(CNN)モデルは腫瘍セグメンテーションに依存している。腫瘍の分節は最適ではないと仮定し,mriデータに腫瘍位置確率を用いたcnnモデルの拡張を提案する。材料と方法: rebが承認した回顧的研究には、mri流体減衰逆回復法(flair)の143個のブラフ融合癌と71個のブラフv600e変異腫瘍の配列があった。腫瘍セグメンテーション(ROIs)は小児神経放射線学のフェローが提供し、高齢者神経放射線学者が検証した。それぞれの実験では、データを80/20の割合で開発とテストにランダムに分割する。腫瘍位置の確率密度関数 (PDF) を導出するために, 開発データセットの各クラス毎の3DバイナリROIマスクを組み合わせ, 位置ベース, CNNベース, ハイブリッドの3つのパイプラインを開発した。結果:異なるモデルの初期化とデータを100回分割して実験を繰り返し,AUC(Area Under Receiver Operating Characteristics Curve)を算出した。位置ベース分類器は 77.90, 95% 信頼区間 (CI) (76.76, 79.03) を達成した。 CNNベースの分類器は86.11、CI(84.96、87.25)、CNNは88.64 CI(87.57、89.72)で前者を上回った(Studentのt-test p-value 0.0018)。結論: 腫瘍位置をCNNモデルに組み込むことにより, 統計的に有意な改善が得られた。結果から,手動で分割したROIが最適でない可能性が示唆された。 Background and Purpose: Pediatric low-grade glioma (pLGG) is the most common type of brain tumor in children, and identification of molecular markers for pLGG is crucial for successful treatment planning. Convolutional Neural Network (CNN) models for pLGG subtype identification rely on tumor segmentation. We hypothesize tumor segmentations are suboptimal and thus, we propose to augment the CNN models using tumor location probability in MRI data. Materials and Methods: Our REB-approved retrospective study included MRI Fluid-Attenuated Inversion Recovery (FLAIR) sequences of 143 BRAF fused and 71 BRAF V600E mutated tumors. Tumor segmentations (regions of interest (ROIs)) were provided by a pediatric neuroradiology fellow and verified by a senior pediatric neuroradiologist. 
In each experiment, we randomly split the data into development and test sets with an 80/20 ratio. We combined the 3D binary ROI masks for each class in the development dataset to derive the probability density functions (PDFs) of tumor location, and developed three pipelines: location-based, CNN-based, and hybrid. Results: We repeated the experiment with different model initializations and data splits 100 times and calculated the Area Under Receiver Operating Characteristic Curve (AUC). The location-based classifier achieved an AUC of 77.90, 95% confidence interval (CI) (76.76, 79.03). CNN-based classifiers achieved an AUC of 86.11, CI (84.96, 87.25), while the tumor-location-guided CNNs outperformed both with an average AUC of 88.64, CI (87.57, 89.72), which was statistically significant (Student's t-test p-value 0.0018). Conclusion: We achieved statistically significant improvements by incorporating tumor location into the CNN models. Our results suggest that manually segmented ROIs may not be optimal.	翻訳日:2023-10-28 06:25:49 公開日:2023-10-24
# 自己教師あり学習による心臓超音波からのラベルなしセグメンテーション Label-free segmentation from cardiac ultrasound using self-supervised learning ( http://arxiv.org/abs/2210.04979v2 ) ライセンス: Link先を確認	Danielle L. Ferreira, Zaynaf Salaymang, Rima Arnaout	(参考訳) 心室のセグメンテーションと測定は心エコーにおいて重要であるが、困難で再現性に乏しい。ニューラルネットワークは補助できるが、教師付きアプローチは、同じ面倒な手動アノテーションを必要とする。コンピュータビジョン,臨床領域知識,深層学習を組み合わせた自己教師型(手動ラベルなし)セグメンテーションのためのパイプラインを構築した。 450件の心エコー検査(93,000枚)で学習し,8,393件の心エコー検査(4,476,266枚,平均61歳,女性51%)でテストを行い,得られたセグメンテーションから生体計測値を算出した。また,左室の手動トレースが利用可能な患者10,030例の外部画像についても検討した。臨床測定値とパイプライン予測値の間のr2は、報告されたクリニック間変動と類似しており、いくつかの異なる測定値(r2 0.56-0.84)で教師あり学習に匹敵する。異常室径と機能を検出する平均精度は,臨床検査と比較して0.85(範囲0.71-0.97)であった。テスト心エコー図(n=553)のサブセットは、MRIがゴールド標準である心臓MRIに対応していた。パイプラインとMRIの相関は臨床心エコー図とMRIと類似していた。最後に、パイプラインは、外部の手動ラベル付きデータセットで、左室を0.89 (95% CI [0.89])の平均Diceスコアで正確に区分する。本研究は, ノイズは多いが世界的に重要な画像モダリティである超音波から,手動ラベルを必要とせず,臨床的に有効かつ高度にスケーラブルにセグメンテーションを行う方法を示す。 Segmentation and measurement of cardiac chambers is critical in cardiac ultrasound but is laborious and poorly reproducible. Neural networks can assist, but supervised approaches require the same laborious manual annotations. We built a pipeline for self-supervised (no manual labels) segmentation combining computer vision, clinical domain knowledge, and deep learning. We trained on 450 echocardiograms (93,000 images) and tested on 8,393 echocardiograms (4,476,266 images; mean 61 years, 51% female), using the resulting segmentations to calculate biometrics. We also tested against external images from an additional 10,030 patients with available manual tracings of the left ventricle. The r2 between clinically measured and pipeline-predicted measurements was similar to reported inter-clinician variation and comparable to supervised learning across several different measurements (r2 0.56-0.84). Average accuracy for detecting abnormal chamber size and function was 0.85 (range 0.71-0.97) compared to clinical measurements. A subset of test echocardiograms (n=553) had corresponding cardiac MRIs, where MRI is the gold standard. 
Correlation between pipeline and MRI measurements was similar to that between clinical echocardiogram and MRI. Finally, the pipeline accurately segments the left ventricle with an average Dice score of 0.89 (95% CI [0.89]) in the external, manually labeled dataset. Our results demonstrate a manual-label free, clinically valid, and highly scalable method for segmentation from ultrasound, a noisy but globally important imaging modality.	翻訳日:2023-10-28 06:25:13 公開日:2023-10-24
# 部分観測軌道からの作動型クープマン発電機の非線形モデル学習 Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories ( http://arxiv.org/abs/2209.09977v3 ) ライセンス: Link先を確認	Samuel E. Otto, Sebastian Peitz, Clarence W. Rowley	(参考訳) 基礎となるkoopman演算子やジェネレータの近似に基づく非線形力学系のデータ駆動モデルは、予測、特徴学習、状態推定、制御に成功している。制御-アフィン系に対するクープマン生成器は入力に対するアフィン依存性も持つことがよく知られており、ダイナミクスの便利な有限次元双線型近似に繋がる。しかし、動作を伴うシステムのクープマン発生器を近似するための現在のアプローチの範囲を制限する2つの主要な障害がある。まず、既存の手法の性能は、クープマン生成器が近似される基底関数の選択に大きく依存する。第二に、全状態が観測されない場合、出力時系列の入力列への依存性を、近似koopman演算子にオブザーバブルを構築する際に考慮する必要がある。これらの問題に対処するため、クープマン発生器が支配する可観測体の力学を双線型隠れマルコフモデルとして記述し、予測最大化(EM)アルゴリズムを用いてモデルパラメータを決定する。 Eステップは標準のカルマンフィルタとスムーズで、Mステップはジェネレータの制御-アフィン動的モード分解に似ている。本手法は,ゆるい多様体を持つ作動系に対する有限次元koopman-invariant部分空間の復元,非強制ダフィング方程式に対するkoopman固有関数の推定,揚力と抗力のノイズ観測のみに基づく流体ピンボール系のモデル予測制御といった3つの実例で性能を示す。 Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, then it becomes necessary to account for the dependence of the output time series on the sequence of supplied inputs when constructing observables to approximate Koopman operators. 
To address these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.	翻訳日:2023-10-28 06:22:47 公開日:2023-10-24
# グラフの非現実的説明に関する調査:定義,方法,評価 A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation ( http://arxiv.org/abs/2210.12089v2 ) ライセンス: Link先を確認	Mario Alfonso Prado-Romero and Bardh Prenkaj and Giovanni Stilo and Fosca Giannotti	(参考訳) グラフニューラルネットワーク(GNN)は、コミュニティ検出と分子分類においてよく機能する。 Counterfactual Explanations (CE) はブラックボックスモデルの透明性の限界を克服するための反例を提供する。グラフ学習の関心が高まっているため、我々はGNNにおけるCEの概念に注目している。私たちはsoaを分析して分類法、一様表記法、ベンチマークデータセットと評価メトリクスを提供しました。本稿では,14の手法,評価プロトコル,22のデータセット,19のメトリクスについて論じる。提案手法の大半をGRETELライブラリに統合し,その強度と落とし穴を理解する実験的な評価を行った。オープンな課題と今後の作業を強調します。 Graph Neural Networks (GNNs) perform well in community detection and molecule classification. Counterfactual Explanations (CE) provide counter-examples to overcome the transparency limitations of black-box models. Due to the growing attention in graph learning, we focus on the concepts of CE for GNNs. We analysed the SoA to provide a taxonomy, a uniform notation, and the benchmarking datasets and evaluation metrics. We discuss fourteen methods, their evaluation protocols, twenty-two datasets, and nineteen metrics. We integrated the majority of methods into the GRETEL library to conduct an empirical evaluation to understand their strengths and pitfalls. We highlight open challenges and future work.	翻訳日:2023-10-28 06:12:11 公開日:2023-10-24
# 過パラメータ学習におけるバギング:リスク特性とリスク単調化 Bagging in overparameterized learning: Risk characterization and risk monotonization ( http://arxiv.org/abs/2210.11445v3 ) ライセンス: Link先を確認	Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla	(参考訳) バギング(英: Bagging)は、統計学と機械学習において、予測手順の性能を改善するために一般的に用いられるアンサンブル技法である。本稿では,比例漸近法の下での袋詰め予測器の変種について,特徴数と観測数との比率が一定に収束する確率について検討する。具体的には,単純なランダムサンプリングによる古典的結果を用いて,袋詰め予測器の2乗誤差損失下での予測リスクを分析する一般的な手法を提案する。戦略を特化することで,任意の特徴共分散行列と信号ベクトルを持つ定型線形モデルの下で,任意の数のバッグを持つ袋付リッジおよびリッジレス予測器の正確な漸近的リスクを導出する。さらに,バッグングの最適サブサンプルサイズを選択するための一般的なクロスバリデーション手順を規定し,サンプルサイズ(二重あるいは多重の降下)の制限リスクの非単調な挙動を排除するために,その実用性について議論する。袋詰めリッジとリッジレス予測器に対する提案手法の実証において, 最適なサブサンプルサイズのオラクル特性を徹底的に検討し, 異なる袋詰めタイプ間の詳細な比較を行った。 Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.	翻訳日:2023-10-28 06:12:03 公開日:2023-10-24
# DeepGOPlus推論の数値安定性 Numerical Stability of DeepGOPlus Inference ( http://arxiv.org/abs/2212.06361v3 ) ライセンス: Link先を確認	Inés Gonzalez Pepe, Yohan Chatelain, Gregory Kiar, Tristan Glatard	(参考訳) 畳み込みニューラルネットワーク(CNN)は現在、利用可能な最も広く使用されているディープニューラルネットワーク(DNN)アーキテクチャの1つであり、多くの問題に対して最先端のパフォーマンスを実現している。元々はコンピュータビジョンのタスクに応用され、CNNは画像以外の空間的関係のあるデータでもうまく機能し、様々な分野に適用されてきた。しかし、近年の研究では、DNNにおける数値安定性の課題が強調されている。これらの課題は、パフォーマンスと信頼性を損なう可能性がある。本稿では,タンパク質機能を予測するCNNであるDeepGOPlusについて検討する。 DeepGOPlusは最先端のパフォーマンスを達成し,プロテオミクスに出現するタンパク質配列をうまく活用し,アノテートすることができる。浮動小数点データの摂動による不確かさを定量化し,モデル推論段階の数値的安定性を判定する。さらに,DeepGOPlus推論に精度の低い浮動小数点形式を用いることで,メモリ消費とレイテンシを低減する機会を探る。これは、浮動小数点演算エラーを実験的に定量化するMonte Carlo Arithmeticと、カスタマイズ可能な浮動小数点演算精度フォーマットで結果をエミュレートするVPRECを使用してDeepGOPlusの実行を計測することで実現されている。 DeepGOPlusモデルの主要な成果物であり、異なる環境にまたがって広く適用できるため、推論の段階に焦点が当てられる。以上の結果から,DeepGOPlus CNNは数値的に非常に安定しているが,より精度の低い浮動小数点形式では選択的にしか実装できないことがわかった。事前学習したDeepGOPlusモデルから得られた予測は非常に信頼性が高く,既存の浮動小数点形式を効率的に利用することができる。 Convolutional neural networks (CNNs) are currently among the most widely-used deep neural network (DNN) architectures available and achieve state-of-the-art performance for many problems. Originally applied to computer vision tasks, CNNs work well with any data with a spatial relationship, besides images, and have been applied to different fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection. These challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully take advantage of and annotate the abounding protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty due to perturbations of the underlying floating-point data. 
In addition, we explore the opportunity to use reduced-precision floating point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution using Monte Carlo Arithmetic, a technique that experimentally quantifies floating point operation errors and VPREC, a tool that emulates results with customizable floating point precision formats. Focus is placed on the inference stage as it is the primary deliverable of the DeepGOPlus model, widely applicable across different environments. All in all, our results show that although the DeepGOPlus CNN is very stable numerically, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are very reliable numerically, and use existing floating-point formats efficiently.	翻訳日:2023-10-28 06:05:43 公開日:2023-10-24
# ガウス状態の光子数モーメントと累積 Photon-number moments and cumulants of Gaussian states ( http://arxiv.org/abs/2212.06067v3 ) ライセンス: Link先を確認	Yanic Cardin, Nicolás Quesada	(参考訳) 光子数に基づく場合,ガウス状態のモーメントと累積に対する閉形式表現を開発する。ガウス状態の光子数モーメントをループハフニアンで表現し、グラフの隣接を表す$(0,1)$-行列に適用すると、その完全マッチングの数を数える。同様に、$(0,1)$-行列に適用されたとき、そのグラフのハミルトニアンサイクルの数をカウントする新しく導入された行列関数であるMontrealerを用いて光子数累積を表現する。これらのグラフ理論接続に基づいて、光子数モーメントと累積の計算が$\#P$-hardであることを示す。さらに、ハフニアンのよく知られた結果と一致する、Montrealer(したがって累積)を計算する指数時間アルゴリズムを提供する。次に、一様損失の干渉計が、ゼロ変位を持つ同一の単一モードガウス状態を持つ全ての入力で供給されると、1次を除く奇数次累積は、すべてゼロであることが示される。最後に,$K$個の同一状態が$\ell$モード干渉計に供給されるガウスボソンサンプリング装置において,累積の分布を4次まで異なる入力状態に対して研究するために導出した式を用いる。本研究では, 入力状態のタイプ, 圧縮状態, 損失値, スクラッシュ状態, 熱状態, および非真空入力数の関数として, 累積物の依存性を解析した。熱状態は他の古典的状態(例えばスカッシュ状態)よりも、損失状態や無損失状態の光子数累積状態の模倣においてずっと悪い結果をもたらすことが判明した。 We develop closed-form expressions for the moments and cumulants of Gaussian states when measured in the photon-number basis. We express the photon-number moments of a Gaussian state in terms of the loop Hafnian, a function that when applied to a $(0,1)$-matrix representing the adjacency of a graph, counts the number of its perfect matchings. Similarly, we express the photon-number cumulants in terms of the Montrealer, a newly introduced matrix function that when applied to a $(0,1)$-matrix counts the number of Hamiltonian cycles of that graph. Based on these graph-theoretic connections, we show that the calculation of photon-number moments and cumulants is $\#P$-hard. Moreover, we provide an exponential time algorithm to calculate Montrealers (and thus cumulants), matching well-known results for Hafnians. We then demonstrate that when a uniformly lossy interferometer is fed in every input with identical single-mode Gaussian states with zero displacement, all the odd-order cumulants but the first one are zero. 
Finally, we employ the expressions we derive to study the distribution of cumulants up to the fourth order for different input states in a Gaussian boson sampling setup where $K$ identical states are fed into an $\ell$-mode interferometer. We analyze the dependence of the cumulants as a function of the type of input state, squeezed, lossy squeezed, squashed, or thermal, and as a function of the number of non-vacuum inputs. We find that thermal states perform much worse than other classical states, such as squashed states, at mimicking the photon-number cumulants of lossy or lossless squeezed states.	翻訳日:2023-10-28 06:05:09 公開日:2023-10-24
# JASMINE:Few-Shot LearningのためのアラビアGPTモデル JASMINE: Arabic GPT Models for Few-Shot Learning ( http://arxiv.org/abs/2212.10755v2 ) ライセンス: Link先を確認	El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md Tawkat Islam Khondaker	(参考訳) 生成前訓練(GPT)に関する学術研究は、我々の自己回帰モデル全体の理解に深刻なギャップを残している。例えば、これらのモデルの可能性や、多様な言語的・文化的環境における社会的影響についてはほとんど知識がない。我々は、ジャスミンを導入することで、人口4億人を超える幅広い言語と方言のコレクションであるアラビア語のこの問題を緩和する。 JASMINEは、大きく多様なデータセット(約235GBのテキスト)で事前訓練された3億-6.7億のパラメータの大きさの強力なアラビア語の自動回帰トランスフォーマー言語モデルのスイートである。また,アラビア語自己回帰モデルの自動評価および人間評価のための包括的なベンチマークを,社会的バイアス,有害性,毒性の可能性を網羅して,慎重に設計し,公開する。新たなベンチマークを用いて,JASMINEは多種多様なNLPタスクにおける数ショット学習と同様に,本質的に強力な性能を示す。我々は、興味のある研究者とモデルと評価ベンチマークを責任を持ってリリースし、実験するためのコードを提供することを目標としています。 Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models. For example, we have little knowledge about the potential of these models and their societal impacts in diverse linguistic and cultural settings. We alleviate this issue for Arabic, a wide collection of languages and dialectal varieties with more than 400 million population, by introducing JASMINE. JASMINE is a suite of powerful Arabic autoregressive Transformer language models ranging in size between 300 million-6.7 billion parameters pretrained on a large and diverse dataset (~ 235 GB of text). We also carefully design and release a comprehensive benchmark for both automated and human evaluation of Arabic autoregressive models, with coverage of potential social biases, harms, and toxicity. Using our novel benchmark, we evaluate JASMINE extensively showing powerful performance intrinsically as well as in few-shot learning on a wide range of NLP tasks. We aim to responsibly release our models and evaluation benchmark with interested researchers, along with code for experimenting with them.	翻訳日:2023-10-28 05:52:39 公開日:2023-10-24
# 抽出NLP課題生成モデルにおけるトークン化整合性 Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks ( http://arxiv.org/abs/2212.09912v2 ) ライセンス: Link先を確認	Kaiser Sun, Peng Qi, Yuhao Zhang, Lan Liu, William Yang Wang, Zhiheng Huang	(参考訳) 生成モデルは、入力の一部を抽出して所望の出力を形成する抽出タスクを解くために広く応用され、大きな成功を収めた。例えば、抽出質問応答(QA)では、生成モデルは常に最先端の結果をもたらす。本研究では,これらのモデルのトレーニングにおいて一般的に無視されるトークン化の不整合の問題を特定する。この問題は、インプットとアウトプットがトークン化されていないことでこれらのタスクの抽出性が損なわれ、結果としてパフォーマンスの低下と幻覚が引き起こされる。本稿では,この問題に対する簡易かつ効果的な解決法を提案し,抽出QAのケーススタディを行う。我々は、一貫したトークン化により、BARTモデルがSQuAD上でトレーニングされ、8つのQAデータセットで評価された場合、ドメイン内データセットとドメイン外データセットの両方で、注目すべき平均+1.7 F2ゲインを達成できることを示した。さらに、モデルはより速く収束し、文脈外回答を生じにくくなります。これらの結果から,抽出タスクの解決においてトークン化をどのように行うべきか,トレーニング中に一貫したトークン化を適用することを推奨したい。 Generative models have been widely applied to solve extractive tasks, where parts of the input are extracted to form the desired output, and have achieved significant success. For example, in extractive question answering (QA), generative models have constantly yielded state-of-the-art results. In this work, we identify the issue of tokenization inconsistency that is commonly neglected in training these models. This issue damages the extractive nature of these tasks after the input and output are tokenized inconsistently by the tokenizer, and thus leads to a performance drop as well as hallucination. We propose a simple yet effective fix to this issue and conduct a case study on extractive QA. We show that, with consistent tokenization, the model performs better in both in-domain and out-of-domain datasets, with a notable average of +1.7 F2 gain when a BART model is trained on SQuAD and evaluated on 8 QA datasets. Further, the model converges faster, and becomes less likely to generate out-of-context answers. With these findings, we would like to call for more attention to how tokenization should be done when solving extractive tasks and recommend applying consistent tokenization during training.	翻訳日:2023-10-28 05:51:56 公開日:2023-10-24
# 契約書で何を読むべきか? 法的義務、権利及び禁止の当事者固有の要約 What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions ( http://arxiv.org/abs/2212.09825v2 ) ライセンス: Link先を確認	Abhilasha Sancheti, Aparna Garimella, Balaji Vasan Srinivasan, Rachel Rudinger	(参考訳) 法的契約における重要な義務、権利、および禁止の見直しと理解は、その長さとドメイン固有性のために退屈な作業となり得る。さらに、契約当事者ごとに重要な権利と義務が異なります。本研究では,権利と義務の理解の迅速化と改善を図るために,法定契約の当事者別抽出要約タスクを提案する。そこで,本研究では,法的専門家が注釈を付した,当事者固有の対関係の重要度比較からなるデータセットを収集し,リース契約から抽出された義務,権利,禁止を含む約293k文対をカバーする。このデータセットを用いて,ペアワイズ重要ランカを訓練し,パーティ固有の契約要約を生成するパイプラインベース抽出要約システムを提案する。自動評価法と人間評価法の両方を用いて,システムと各種ベースラインの比較を行い,要約中にドメイン固有の重要概念を取り入れる必要性を確立する。 Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved comprehension of rights and duties. To facilitate this, we curate a dataset comprising party-specific pairwise importance comparisons annotated by legal experts, covering ~293K sentence pairs that include obligations, entitlements, and prohibitions extracted from lease agreements. Using this dataset, we train a pairwise importance ranker and propose a pipeline-based extractive summarization system that generates a party-specific contract summary. We establish the need for incorporating a domain-specific notion of importance during summarization by comparing our system against various baselines using both automatic and human evaluation methods.	翻訳日:2023-10-28 05:51:38 公開日:2023-10-24
# cp-bcs:制御フローグラフと擬似コードによるバイナリコードの要約 CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code ( http://arxiv.org/abs/2310.16853v1 ) ライセンス: Link先を確認	Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang	(参考訳) 低レベルの言語(アセンブリコード)の実行動作とセマンティクスを人間可読な自然言語に変換することを含むため、バイナリの関数サマリーの自動生成は極めて価値のある作業である。しかしながら、アセンブリコードの理解に関する現在の作業のほとんどは、関数名の生成に向けられている。このギャップを埋めるため、バイナリ関数、特に削除されたバイナリ(シンボルテーブルやデバッグ情報がない)の完全な要約を生成することに重点を置いています。アセンブリコードのセマンティクスを十分に活用するために,cp-bcsと呼ばれる制御フローグラフと擬似コードガイドバイナリコード要約フレームワークを提案する。 CP-BCSは双方向の命令レベル制御フローグラフと擬似コードを利用して、専門家の知識を取り入れ、包括的なバイナリ関数の実行動作と論理意味学を学ぶ。 CP-BCSを3種類のコンピュータアーキテクチャ(X86, X64, ARM)に対して3種類のバイナリ最適化レベル(O1, O2, O3)で評価する。その結果,cp-bcsが優れ,リバースエンジニアリングの効率が著しく向上した。 Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that make them still confusing. To bridge this gap, we focus on generating complete summaries for binary functions, especially for stripped binary (no symbol table and debug information in reality). To fully exploit the semantics of assembly code, we present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics. We evaluate CP-BCS on 3 different binary optimization levels (O1, O2, and O3) for 3 different computer architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is superior and significantly improves the efficiency of reverse engineering.	翻訳日:2023-10-28 00:17:36 公開日:2023-10-24
# 医療画像を用いた深層学習モデルによる新型コロナウイルス患者の分類 Deep Learning Models for Classification of COVID-19 Cases by Medical Images ( http://arxiv.org/abs/2310.16851v1 ) ライセンス: Link先を確認	Amir Ali	(参考訳) 近年,胸部ct画像を用いた新型コロナウイルス感染の検出が注目されている。しかし、医療画像から患者を分類することは、特にその両側の変化を特定する上で非常に困難である。この課題に対処するために,本研究では,感染患者の正確な分類に深層学習モデルの力を利用する。本研究では,DenseNet201,GoogleNet,AlexNetを含む深層伝達学習に基づく分類モデルと,注意深く選択された教師付き学習モデルの比較分析を行った。また,X線や心電図などの医用画像の識別と識別を含むCOVID-19の分類も検討した。この包括的なアプローチにより、我々のモデルは幅広い医療画像タイプを扱えるようになり、COVID-19の特徴的なパターンを効果的に特定できる。高度な深層学習技術を用いて、綿密な研究を行い、COVID-19診断の精度とスピードを高めるために大きな努力をしてきた。これらのモデルの有効性と、COVID-19対策のグローバルな取り組みに多大な貢献ができる可能性を実証した。 In recent times, the use of chest Computed Tomography (CT) images for detecting coronavirus infections has gained significant attention, owing to their ability to reveal bilateral changes in affected individuals. However, classifying patients from medical images presents a formidable challenge, particularly in identifying such bilateral changes. To tackle this challenge, our study harnesses the power of deep learning models for the precise classification of infected patients. Our research involves a comparative analysis of deep transfer learning-based classification models, including DenseNet201, GoogleNet, and AlexNet, against carefully chosen supervised learning models. Additionally, our work encompasses COVID-19 classification, which involves the identification and differentiation of medical images, such as X-rays and electrocardiograms, that exhibit telltale signs of COVID-19 infection. This comprehensive approach ensures that our models can handle a wide range of medical image types and effectively identify characteristic patterns indicative of COVID-19. By conducting meticulous research and employing advanced deep learning techniques, we have made significant strides in enhancing the accuracy and speed of COVID-19 diagnosis. 
Our results demonstrate the effectiveness of these models and their potential to make substantial contributions to the global effort to combat COVID-19.	翻訳日:2023-10-28 00:17:13 公開日:2023-10-24
# 新視点音響合成 Novel-View Acoustic Synthesis ( http://arxiv.org/abs/2301.08730v3 ) ライセンス: Link先を確認	Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi	(参考訳) 我々は,nvas(new-view acoustic synthesis)タスクについて紹介する。音源の視点で観測された視覚と音を考えると,対象とする視点からそのシーンの音を合成できるのか? 入力された音声・視覚的手がかりを分析し,空間内の任意の点の音を合成することを学ぶ視覚誘導音響合成(ViGAS)ネットワークを提案する。このタスクをベンチマークするために、我々は2つの大規模マルチビューオーディオ視覚データセットを収集した。提案手法は,空間的手がかりの推論に成功し,両データセットに忠実な音声を合成することを示す。我々の知る限り、この研究は、AR/VRからアート、デザインに至るまで、エキサイティングな可能性のある、新しい視点の音響合成タスクを解決するための、最初の定式化、データセット、アプローチを表している。この研究に縛られずに、我々は、新しいビュー合成の未来は、ビデオからのマルチモーダル学習にあると信じている。 We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benchmark this task, we collect two first-of-their-kind large-scale multi-view audio-visual datasets, one synthetic and one real. We show that our model successfully reasons about the spatial cues and synthesizes faithful audio on both datasets. To our knowledge, this work represents the very first formulation, dataset, and approach to solve the novel-view acoustic synthesis task, which has exciting potential applications ranging from AR/VR to art and design. Unlocked by this work, we believe that the future of novel-view synthesis is in multi-modal learning from videos.	翻訳日:2023-10-27 18:18:12 公開日:2023-10-24
# 未知力学系のロバスト進化演算子学習のための臨界サンプリング Critical Sampling for Robust Evolution Operator Learning of Unknown Dynamical Systems ( http://arxiv.org/abs/2304.07485v3 ) ライセンス: Link先を確認	Ce Zhang, Kailiang Wu, Zhihai He	(参考訳) 未知の力学系を考えると、その統治法則の効果的な学習と将来の進化の正確な予測に必要なサンプルの最小数と、これらの臨界試料をどうやって選択するか。そこで本研究では,設計アプローチに基づくこの問題について検討する。少数の初期サンプルから始めて、システム進化のより正確な学習を実現するために、臨界サンプルを適応的に発見する。ここでの課題の1つは、地平系状態が未知であるため、ネットワークモデリングエラーを知らないことですが、これはクリティカルサンプリングに必要です。この課題に対処するために,前向きと後向きの進化ネットワークをそれぞれ前向きと後向きの時間方向の時間的進化の挙動を学習する多段階の相互予測ネットワークを提案する。非常に興味深いことに、所望のネットワークモデリング誤差は、現在のシステム状態から直接計算できる多段階相互予測誤差と高い相関関係にあることがわかった。これにより、動的システムに対する高いネットワークモデリング誤差を持つ領域から臨界サンプルを動的に選択できる。さらに、空間力学モデリングを時間的進化予測に組み込んだ共同時空間進化ネットワークを導入し、システム進化演算子を少数のサンプルで頑健に学習する。提案手法は,未知力学系の効果的な学習に必要なサンプル数を劇的に削減し,未知力学系の進化挙動を正確に予測できることが実証された。 Given an unknown dynamical system, what is the minimum number of samples needed for effective learning of its governing laws and accurate prediction of its future evolution behavior, and how to select these critical samples? In this work, we propose to explore this problem based on a design approach. Starting from a small initial set of samples, we adaptively discover critical samples to achieve increasingly accurate learning of the system evolution. One central challenge here is that we do not know the network modeling error since the ground-truth system state is unknown, which is however needed for critical sampling. To address this challenge, we introduce a multi-step reciprocal prediction network where forward and backward evolution networks are designed to learn the temporal evolution behavior in the forward and backward time directions, respectively. Very interestingly, we find that the desired network modeling error is highly correlated with the multi-step reciprocal prediction error, which can be directly computed from the current system state. This allows us to perform a dynamic selection of critical samples from regions with high network modeling errors for dynamical systems. 
Additionally, a joint spatial-temporal evolution network is introduced which incorporates spatial dynamics modeling into the temporal evolution prediction for robust learning of the system evolution operator with few samples. Our extensive experimental results demonstrate that our proposed method is able to dramatically reduce the number of samples needed for effective learning and accurate prediction of evolution behaviors of unknown dynamical systems by up to hundreds of times.	翻訳日:2023-10-26 21:25:47 公開日:2023-10-24
# アウトソース機械学習タスクの低コスト結果検証のための生成フレームワーク A Generative Framework for Low-Cost Result Validation of Outsourced Machine Learning Tasks ( http://arxiv.org/abs/2304.00083v2 ) ライセンス: Link先を確認	Abhinav Kumar, Miguel A. Guirao Aguilera, Reza Tourani, Satyajayant Misra	(参考訳) 機械学習(ML)の人気が高まり、さまざまなセンシティブなドメインにデプロイされるようになり、MLのセキュリティとプライバシを重視した大きな研究がもたらされた。しかしながら、自動運転など一部のアプリケーションでは、アウトソースされたMLワークロードの整合性検証がより重要になっている。マルチパーティ計算や証明ベースシステムといった既存のソリューションは、計算オーバーヘッドがかなり大きいため、リアルタイムアプリケーションには適さない。我々は、アウトソースされたMLワークロードのリアルタイム検証のための新しいフレームワークであるFidesを提案する。 Fidesは、信頼された実行環境内で実行中に対応するサービスモデルを検証するための、空間を動的に蒸留し微調整する、新しい、効率的な蒸留技術である、Greedy Distillation Transfer Learningを特徴としている。 fideは、統計分析と分岐測定を使用して、サービスモデルが攻撃されている場合に高い確率で識別するクライアント側の攻撃検出モデルを備えている。 Fidesはまた、攻撃が特定されるたびに元のクラスを予測する再分類機能を提供する。攻撃検出と再分類モデルの訓練のための生成的逆ネットワークフレームワークを考案した。評価の結果,fideは攻撃検出で最大98%,再分類で94%の精度を達成した。 The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as autonomous driving, integrity verification of the outsourced ML workload is more critical--a facet that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time validation of outsourced ML workloads. Fides features a novel and efficient distillation technique--Greedy Distillation Transfer Learning--that dynamically distills and fine-tunes a space and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with a high likelihood, if the service model is under attack. 
Fides also offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification.	翻訳日:2023-10-26 21:24:03 公開日:2023-10-24
# バンディットフィードバックによる実効予測:再パラメータ化による学習 Performative Prediction with Bandit Feedback: Learning through Reparameterization ( http://arxiv.org/abs/2305.01094v3 ) ライセンス: Link先を確認	Yatong Chen, Wei Tang, Chien-Ju Ho, Yang Liu	(参考訳) 実効的予測は、Perdomo et al. (2020) によって導入された、モデルの展開に応じてデータ分布自体が変化する社会予測を研究するためのフレームワークである。この分野での既存の作業は通常、実行リスクがデプロイされたモデル上で凸である、モデルからデータ分散へのマッピングが事前にモデルデザイナに知られており、実行リスクの第一次情報が利用可能である、という3つの前提に基づいている。本稿では,これらの仮定を必要としない実効予測問題の研究を開始する。具体的には、実行予測目標を誘導データ分布の関数として再パラメータ化する「再パラメータ化」フレームワークを開発する。また,第1レベルが分布パラメータ空間上で反復最適化を行い,第2レベルが各イテレーションで特定の目標分布パラメータを誘導するモデルを学習する2レベルゼロ階最適化手法を開発した。軽度条件下では、この再パラメータ化により、非凸目標を凸目標に変換し、証明可能な後悔保証を達成することができる。特に, 実演サンプルの総数において部分線形であり, モデルパラメータの次元における多項式のみである後悔境界を与える。アプリケーション側では、YouTubeやTikTokのような大規模なオンラインレコメンデーションシステムでは、レコメンデーション更新頻度が高く、将来の好みを変える可能性がある。 Performative prediction, as introduced by Perdomo et al. (2020), is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model. Existing work in this field usually hinges on three assumptions that are easily violated in practice: that the performative risk is convex over the deployed model, that the mapping from the model to the data distribution is known to the model designer in advance, and that first-order information about the performative risk is available. In this paper, we initiate the study of performative prediction problems that do not require these assumptions. Specifically, we develop a reparameterization framework that reparameterizes the performative prediction objective as a function of the induced data distribution. We also develop a two-level zeroth-order optimization procedure, where the first level performs iterative optimization on the distribution parameter space, and the second level learns the model that induces a particular target distribution parameter at each iteration. 
Under mild conditions, this reparameterization allows us to transform the non-convex objective into a convex one and achieve provable regret guarantees. In particular, we provide a regret bound that is sublinear in the total number of performative samples taken and is only polynomial in the dimension of the model parameter. On the application side, we believe our method is useful for large online recommendation systems like YouTube or TikTok, where the recommendation update frequency is high and might potentially reshape future preferences.	翻訳日:2023-10-26 21:13:45 公開日:2023-10-24
# ブロック不変対称性シフト:二量化ハミルトニアンのユニタリの線形結合への分解を改善する前処理法 Block-Invariant Symmetry Shift: Preprocessing technique for second-quantized Hamiltonians to improve their decompositions to Linear Combination of Unitaries ( http://arxiv.org/abs/2304.13772v3 ) ライセンス: Link先を確認	Ignacio Loaiza, Artur F. Izmaylov	(参考訳) 量子位相推定(QPE)による分子電子ハミルトニアンのエネルギー推定の計算コストは、ハミルトニアンの最大値と最小値の違いによって増大する。本研究では、ハミルトニアンのノルムを特定の対称性の標的状態の固有スペクトルを変更することなく減少させる前処理手順を提案する。新しい手順であるBlock-Invariant Symmetry Shift (BLISS) は作用素 T を構築し、H-T を実装するコストは H のそれと比較すると削減されるが、H-T は H と同じ方法で利子の部分空間に作用する。 BLISS性能は、LCU(Linear Combination of Unitary)に基づく小さな分子の集合上のQPEアプローチに対して実証される。目標とする状態の集合を示す対称性として電子の数を用いると、BLISSはいくつかのLCU分解に対して非シフトバージョンと比較して2つの1ノルムの減少係数を与えた。 Computational cost of energy estimation for molecular electronic Hamiltonians via Quantum Phase Estimation (QPE) grows with the difference between the largest and smallest eigenvalues of the Hamiltonian. In this work we propose a preprocessing procedure that reduces the norm of the Hamiltonian without changing its eigenspectrum for the target states of a particular symmetry. The new procedure, Block-Invariant Symmetry Shift (BLISS), builds an operator T such that the cost of implementing H-T is reduced compared to that of H, yet H-T acts on the subspaces of interest the same way as H does. BLISS performance is demonstrated for Linear Combination of Unitaries (LCU)-based QPE approaches on a set of small molecules. Using the number of electrons as the symmetry specifying the target set of states, BLISS provided a factor of 2 reduction of 1-norm for several LCU decompositions compared to their unshifted versions.	翻訳日:2023-10-26 21:13:19 公開日:2023-10-24
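The invariance claimed in the BLISS abstract above can be checked in one line for the simplest possible shift. The following is only a sketch under stated assumptions: it takes the symmetry to be the electron-number operator $\hat{N}$ with target value $N_e$ (as in the abstract) and uses a single scalar parameter $\mu$, which is a hypothetical simplification and not the full operator $T$ constructed in the paper.

```latex
\begin{align}
  \hat{T} &= \mu \,(\hat{N} - N_e), \\
  \hat{T}\,\lvert\psi\rangle &= \mu\,(N_e - N_e)\,\lvert\psi\rangle = 0
    \quad \text{whenever } \hat{N}\lvert\psi\rangle = N_e \lvert\psi\rangle, \\
  (\hat{H} - \hat{T})\,\lvert\psi\rangle &= \hat{H}\,\lvert\psi\rangle .
\end{align}
```

Because $\hat{H}$ commutes with $\hat{N}$, the shifted operator $\hat{H}-\hat{T}$ has the same eigenvalues as $\hat{H}$ inside the $N_e$-electron block, while eigenvalues in other blocks move by $\mu\,(N - N_e)$. This leaves $\mu$ free to be chosen so that the norm or spectral range of $\hat{H}-\hat{T}$ is smaller.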
# 超強結合空洞QEDにおける緩和破壊と共鳴トンネル Relaxation breakdown and resonant tunneling in ultrastrong-coupling cavity QED ( http://arxiv.org/abs/2304.11191v2 ) ライセンス: Link先を確認	Daniele De Bernardis	(参考訳) 単一電磁空洞モードと超強結合した非対称双極子の開緩和ダイナミクスについて検討した。相互作用系全体に対する熱化マスター方程式を用いることで、リウビリアンギャップの位相図を導出する。超強結合は双極子トンネル速度の指数関数的な抑制により平衡状態への緩和を抑制する。しかし、ポーラロン的な多光子共鳴はキャビティを介する双極子共鳴トンネル法により高速な緩和を回復する。数値的なエビデンスとは別に、一般化された回転波近似によりRabiモデルを対角化して完全に解析的な記述を開発する。このような超強結合系の緩和物理学は、標準のテキストブック装束状態図の多光子ポーラロン版に還元される。最後に、超強結合系におけるカスケード共振トンネル構成の基礎を設定できるマルチウェルダイポールの拡張について議論する。 We study the open relaxation dynamics of an asymmetric dipole that is ultrastrongly coupled to a single electromagnetic cavity mode. By using a thermalizing master equation for the whole interacting system, we derive a phase diagram of the Liouvillian gap. It emerges that the ultrastrong coupling inhibits the system relaxation toward the equilibrium state due to an exponential suppression of the dipole tunneling rate. However, we find that polaronic multi-photon resonances restore fast relaxation by a cavity-mediated dipole resonant tunneling process. Beyond the numerical evidence, we develop a fully analytical description by diagonalizing the Rabi model through a generalized rotating-wave approximation, valid in the so-called polaron frame. The relaxation physics of such ultrastrong-coupling systems is then reduced to a multi-photon polaron version of the standard textbook dressed-states picture. Finally, we discuss an extension to a multi-well dipole that can form the basis of a cascaded resonant tunneling setup in the ultrastrong-coupling regime.	翻訳日:2023-10-26 21:12:19 公開日:2023-10-24
# TELeR:複雑なタスクのベンチマークのためのLLMプロンプトの一般的な分類法 TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks ( http://arxiv.org/abs/2305.11430v2 ) ライセンス: Link先を確認	Shubhra Kanti Karmaker Santu and Dongji Feng	(参考訳) LLMは従来の会話環境におけるテキストの理解と生成に大きな成功を収めてきたが、不明確な複雑なタスクを実行する可能性はほとんど研究されていない。実際、我々は複雑なタスクにのみ焦点を絞った複数のLSMを用いて包括的なベンチマーク研究を行っていません。しかし,このようなベンチマーク研究を行うことは,プロンプトタイプやスタイルが異なる場合や,プロンプトで詳細度が異なる場合,llmsの性能のばらつきが大きいため,困難である。この問題に対処するため,本論文では,様々な複雑なタスクを実行するために,特定の特性を持つプロンプトを設計できる汎用分類法を提案する。この分類は、将来のベンチマーク研究が研究の一部として使用される特定のカテゴリのプロンプトを報告し、異なる研究間で有意義な比較を可能にする。また、この分類学を通じて共通標準を確立することで、研究者は特定の複雑なタスクにおいてLLMのパフォーマンスについてより正確な結論を導き出すことができる。 While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks is largely under-studied. Indeed, we are yet to conduct comprehensive benchmarking studies with multiple LLMs that are exclusively focused on a complex task. However, conducting such benchmarking studies is challenging because of the large variations in LLMs' performance when different prompt types/styles are used and different degrees of detail are provided in the prompts. To address this issue, the paper proposes a general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies. Also, by establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLMs' performance on a specific complex task.	翻訳日:2023-10-26 21:04:05 公開日:2023-10-24
# DoReMi: データ混合の最適化が言語モデルの事前トレーニングを高速化 DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining ( http://arxiv.org/abs/2305.10429v3 ) ライセンス: Link先を確認	Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu	(参考訳) 事前学習データドメイン(wikipedia、書籍、webテキストなど)の混合比率は、言語モデル(lm)の性能に大きく影響する。本稿では,minimax optimization (doremi) によるドメインの重み付けを提案する。これはまず,グループ分散ロバスト最適化 (group distributionally robust optimization, group dro) を用いた小さなプロキシモデルを,ダウンストリームタスクを知らずにドメインの重み付け (mixture proportions) を生成する。次に、これらのドメインウェイトでデータセットを再サンプリングし、より大きなフルサイズのモデルをトレーニングします。実験では、280Mパラメータのプロキシモデル上でDoReMiを使用して、8Bパラメータモデル(30倍大きい)をより効率的にトレーニングするためのドメイン重みを求める。 The Pileでは、DoReMiはドメインをダウンウェイトしても、すべてのドメインのパープレキシティを改善します。 DoReMiは、The Pileのデフォルトドメインウェイトを使用してトレーニングされたベースラインモデルに対して平均的な数ショットダウンストリーム精度を6.5%改善し、2.6倍のトレーニングステップでベースライン精度に達する。 GLaMデータセットでは、下流タスクの知識がないDoReMiが、下流タスクにチューニングされたドメインウェイトの使用パフォーマンスにマッチする。 The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to find domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5% points over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. 
On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.	翻訳日:2023-10-26 21:03:47 公開日:2023-10-24
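The DoReMi abstract above describes two mechanical steps: producing domain weights from a proxy model, and resampling a dataset with those weights. The sketch below illustrates both in plain Python. It is not the authors' implementation: the function names, the `step_size` and `smoothing` parameters, and the clipped excess-loss input are all assumptions chosen for illustration, and the update shown is a generic exponentiated-gradient step of the kind used in Group DRO.

```python
import math
import random


def update_domain_weights(weights, excess_losses, step_size=1.0, smoothing=1e-3):
    """One exponentiated-gradient step on the domain weights (hypothetical helper).

    Domains with a larger excess loss are upweighted. The result is mixed
    with the uniform distribution so that no weight collapses to zero.
    """
    scaled = {
        d: w * math.exp(step_size * max(excess_losses[d], 0.0))
        for d, w in weights.items()
    }
    total = sum(scaled.values())
    uniform = 1.0 / len(weights)
    return {
        d: (1.0 - smoothing) * s / total + smoothing * uniform
        for d, s in scaled.items()
    }


def resample(examples_by_domain, weights, n, seed=0):
    """Draw n (domain, example) pairs, choosing each domain by its weight."""
    rng = random.Random(seed)
    domains = list(weights)
    probs = [weights[d] for d in domains]
    picked = rng.choices(domains, weights=probs, k=n)
    return [(d, rng.choice(examples_by_domain[d])) for d in picked]
```

In this sketch the weights stay a probability distribution after every update, so they can be passed directly to `resample` to build the training mixture for the larger model.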
# 低ランク共変量近似による変量誤差fr\'echet回帰 Errors-in-variables Fr\'echet Regression with Low-rank Covariate Approximation ( http://arxiv.org/abs/2305.09282v2 ) ライセンス: Link先を確認	Kyunghee Han and Dogyoon Song	(参考訳) fr\'echet回帰は非ユークリッド応答変数を含む回帰分析に有望なアプローチとして現れた。しかし、その実用的適用性は、豊富でノイズのない共変量データを持つ理想的なシナリオに依存することによって妨げられている。本稿では,共変量行列に内在する低ランク構造を活用し,これらの制約に対処する新しい推定手法を提案する。提案手法は,大域的Fr'echet回帰と主成分回帰の概念を組み合わせて,回帰推定器の効率と精度の向上を目的とする。低ランク構造を取り入れることで、特に高次元および誤差不変回帰設定において、より効率的なモデリングと推定が可能となる。提案した推定器の大サンプル特性の理論的解析を行い, 偏差, 分散, および測定誤差による追加変動の包括的解析を行った。さらに, 数値実験により, 理論的な知見を裏付ける実証的なエビデンスを与え, 提案手法の優れた性能を示す。全体として、この研究は非ユークリッド変数の回帰分析のための有望なフレームワークを導入し、様々な分野の潜在的な応用とともに、限定的でノイズの多い共変量データに関連する課題に効果的に対処する。 Fr\'echet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables. However, its practical applicability has been hindered by its reliance on ideal scenarios with abundant and noiseless covariate data. In this paper, we present a novel estimation method that tackles these limitations by leveraging the low-rank structure inherent in the covariate matrix. Our proposed framework combines the concepts of global Fr\'echet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator. By incorporating the low-rank structure, our method enables more effective modeling and estimation, particularly in high-dimensional and errors-in-variables regression settings. We provide a theoretical analysis of the proposed estimator's large-sample properties, including a comprehensive rate analysis of bias, variance, and additional variations due to measurement errors. Furthermore, our numerical experiments provide empirical evidence that supports the theoretical findings, demonstrating the superior performance of our approach. 
Overall, this work introduces a promising framework for regression analysis of non-Euclidean variables, effectively addressing the challenges associated with limited and noisy covariate data, with potential applications in diverse fields.	翻訳日:2023-10-26 21:02:46 公開日:2023-10-24
# シーングラフを用いた事前学習型視覚・言語モデルへの構造化表現の導入 Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs ( http://arxiv.org/abs/2305.06343v2 ) ライセンス: Link先を確認	Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson	(参考訳) 視覚と言語モデル(VLM)は、様々なタスクにおいて顕著なゼロショット(ZS)性能を示した。しかし、近年の研究では、最高のVLMでさえ、オブジェクト属性、関係性、行動状態などの構成的シーン理解の側面を捉えるのに苦労していることが示されている。対照的に、これらのモデルを改善することができるシーングラフ(SG)のような構造化アノテーションを得るためには、時間とコストがかかり、大規模では利用できない。ここでは,SGデータセットが事前学習されたVLMの構造的理解を高めるのに十分な情報を提供できるかどうかを問う。構造化情報を視覚表現とテキスト表現の両方に組み込むコンポーネントを統合することで,sgsから学習する際にvlmを改善することが可能であることを示す。視覚面では、SG情報を予測するために訓練されたイメージトランスフォーマーに特別な「SGコンポーネント」を組み込む一方、テキスト側では、SGを使用して、シーンの異なる構成面をハイライトするきめ細かいキャプションを生成する。提案手法は,ZS能力を軽度に低下させるだけで,複数のVLデータセット上でのVLMの性能を向上する。 Vision and language models (VLMs) have demonstrated remarkable zero-shot (ZS) performance in a variety of tasks. However, recent works have shown that even the best VLMs struggle to capture aspects of compositional scene understanding, such as object attributes, relations, and action states. In contrast, obtaining structured annotations, such as scene graphs (SGs), that could improve these models is time-consuming and costly, and thus cannot be used on a large scale. Here we ask whether small SG datasets can provide sufficient information for enhancing structured understanding of pretrained VLMs. We show that it is indeed possible to improve VLMs when learning from SGs by integrating components that incorporate structured information into both visual and textual representations. For the visual side, we incorporate a special "SG Component" in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene. Our method improves the performance of several popular VLMs on multiple VL datasets with only a mild degradation in ZS capabilities.	
翻訳日:2023-10-26 21:02:26 公開日:2023-10-24
# NerfAcc: 効率的なサンプリングがNeRFを加速 NerfAcc: Efficient Sampling Accelerates NeRFs ( http://arxiv.org/abs/2305.04966v2 ) ライセンス: Link先を確認	Ruilong Li, Hang Gao, Matthew Tancik, Angjoo Kanazawa	(参考訳) ボリュームレンダリングに必要な大量のサンプルのため、ニューラルレイディアンスフィールドの最適化とレンダリングは計算コストがかかる。最近の研究には、彼らのメソッドを加速するための代替サンプリングアプローチが含まれているが、それらはしばしば作業の焦点ではない。本稿では,複数のサンプリング手法を検討・比較し,改良されたサンプリングは透過率推定器の統一的概念の下でNeRFの変種に適用可能であることを示す。今後の実験を容易にするため,NeRF関連手法に高度なサンプリング手法を組み込むための柔軟なAPIを提供するPythonツールボックスであるNerfAccを開発した。既存のコードベースに最小限の変更を加えることで、最近のNeRFメソッドのトレーニング時間を1.5倍から20倍に短縮できることを示し、その柔軟性を示す。さらに、Instant-NGPのような高度にカスタマイズされたNeRFは、NerfAccを使用してネイティブのPyTorchで実装できる。 Optimizing and rendering Neural Radiance Fields is computationally expensive due to the vast number of samples required by volume rendering. Recent works have included alternative sampling approaches to help accelerate their methods; however, these approaches are often not the focus of the work. In this paper, we investigate and compare multiple sampling approaches and demonstrate that improved sampling is generally applicable across NeRF variants under a unified concept of transmittance estimator. To facilitate future experiments, we develop NerfAcc, a Python toolbox that provides flexible APIs for incorporating advanced sampling methods into NeRF-related methods. We demonstrate its flexibility by showing that it can reduce the training time of several recent NeRF methods by 1.5x to 20x with minimal modifications to the existing codebase. Additionally, highly customized NeRFs, such as Instant-NGP, can be implemented in native PyTorch using NerfAcc.	翻訳日:2023-10-26 21:01:40 公開日:2023-10-24
# マルチホップインストラクションによる画像操作 -- 新しいデータセットと弱スーパービジョンニューロシンボリックアプローチ Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach ( http://arxiv.org/abs/2305.14410v2 ) ライセンス: Link先を確認	Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Kumar Mondal, Dinesh Khandelwal, Dinesh Garg, Parag Singla	(参考訳) 私たちは自然言語テキストによるイメージ操作に関心があります -- 複数のAIアプリケーションに有用なタスクですが、マルチモーダルスペースに対する複雑な推論が必要です。視覚質問応答(VQA)のタスクに非常に有効であった、近年提案されたニューロシンボリック・コンセプト・ラーニング(NSCL)を、画像操作のタスクに拡張する。 NeuroSIM と呼ばれるシステムでは,マルチオブジェクトシーン上で複雑なマルチホップ推論を行うことができ,VQA の注釈付きデータ形式において弱い監視しか必要としない。 NeuroSIMは、オブジェクト属性と操作操作からなるドメイン固有言語(DSL)に基づいて、命令をシンボルプログラムに解析し、その実行を導く。我々はタスクのための新しいデータセットを作成し、幅広い実験により、NeuroSIMが、操作のための教師付きデータを使用するSOTAベースラインと同等かそれ以上の性能を示すことを実証した。 We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend the recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for Visual Question Answering (VQA), to the task of image manipulation. Our system, referred to as NeuroSIM, can perform complex multi-hop reasoning over multi-object scenes and only requires weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising object attributes and manipulation operations, that guides its execution. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with or beats SOTA baselines that make use of supervised data for manipulation.	翻訳日:2023-10-26 20:53:12 公開日:2023-10-24
# 画像テキストグラフ空間における粗相関学習による視覚・言語構成性の向上 Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality ( http://arxiv.org/abs/2305.13812v3 ) ライセンス: Link先を確認	Harman Singh, Pengchuan Zhang, Qifan Wang, Mengjiao Wang, Wenhan Xiong, Jingfei Du, Yu Chen	(参考訳) 対照的に訓練された視覚言語モデルは、視覚と言語表現の学習において著しく進歩し、様々な下流のマルチモーダルタスクのための最先端のモデルに繋がった。しかし、最近の研究では、オブジェクト、属性、関係性に対して構成的推論を行う能力において、これらのモデルの厳しい制限が強調されている。シーングラフは、イメージを合成的に理解する効果的な方法として登場した。これらは、オブジェクト、それらの属性、シーン内の他のオブジェクトとの関係を含む画像のグラフ構造化セマンティック表現である。本研究では,テキストから解析したシーングラフを画像シーングラフのプロキシとして考慮し,様々な複雑な文を同じ画像にアライメントする画像とテキスト間の粗い相互差分学習目標とともに,グラフ分解と拡張フレームワークを提案する。これと合わせて,属性結合と関係理解を改善するために,シーングラフ空間における新規な負のマイニング手法を提案する。本研究では,提案する複数のベンチマークにおいて,属性結合,関係理解,系統的一般化,生産性を大幅に向上させる手法の有効性を実証すると共に,様々なマルチモーダルタスクにおけるクリップと同等あるいは優れた性能を実現するとともに,提案手法の有効性を実証する。 Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning, leading to state-of-the-art models for various downstream multimodal tasks. However, recent research has highlighted severe limitations of these models in their ability to perform compositional reasoning over objects, attributes, and relations. Scene graphs have emerged as an effective way to understand images compositionally. These are graph-structured semantic representations of images that contain objects, their attributes, and relations with other objects in a scene. In this work, we consider the scene graph parsed from text as a proxy for the image scene graph and propose a graph decomposition and augmentation framework along with a coarse-to-fine contrastive learning objective between images and text that aligns sentences of various complexities to the same image. Along with this, we propose novel negative mining techniques in the scene graph space for improving attribute binding and relation understanding. 
Through extensive experiments, we demonstrate the effectiveness of our approach, which significantly improves attribute binding, relation understanding, systematic generalization, and productivity on multiple recently proposed benchmarks (for example, improvements of up to $18\%$ for systematic generalization and $16.5\%$ for relation understanding over a strong baseline), while achieving similar or better performance than CLIP on various general multimodal tasks.	翻訳日:2023-10-26 20:52:05 公開日:2023-10-24
# ハミルトン構造とカオアエネルギーとフーリエ景観構造をつなぐ Connecting the Hamiltonian structure to the QAOA energy and Fourier landscape structure ( http://arxiv.org/abs/2305.13594v2 ) ライセンス: Link先を確認	Micha{\l} St\k{e}ch{\l}y, Lanruo Gao, Boniface Yogendran, Enrico Fontana, Manuel Rudolph	(参考訳) 本稿では,量子近似最適化アルゴリズム(QAOA)におけるハミルトニアンの構成と,対応するコスト景観特性との関係の理解を深めることを目的とする。 QAOAは、組合せ最適化に最もよく用いられる変分量子アルゴリズム(VQA)の顕著な例である。 qaoaの成功はパラメータ最適化に大きく依存しており、特にノイズの多い量子ハードウェアでは大きな課題となっている。したがって、コスト関数のランドスケープを理解することは、より良い最適化ヒューリスティックを設計するのに役立つ。最大5つの局所項と最大20量子ビットを持つハミルトニアンの1層QAOAの場合を考える。コストランドスケープの可視化に加えて、それらのフーリエ変換を計算し、補完的な視点からハミルトニアンの構造との関係を研究する。さらに,地形の粗さを定量化するための指標を導入し,高次元パラメトリドランドスケープの性質に関する貴重な知見を提供する。これらの手法により、ハミルトン構造、項の順序、係数が最適化ランドスケープの粗さに与える影響を明らかにすることができるが、第一原理からVQAの複雑なランドスケープを予測することは非常に困難であり、一般的には実現不可能である。 In this paper, we aim to expand the understanding of the relationship between the composition of the Hamiltonian in the Quantum Approximate Optimization Algorithm (QAOA) and the corresponding cost landscape characteristics. QAOA is a prominent example of a Variational Quantum Algorithm (VQA), which is most commonly used for combinatorial optimization. The success of QAOA heavily relies on parameter optimization, which is a great challenge, especially on scarce noisy quantum hardware. Thus understanding the cost function landscape can aid in designing better optimization heuristics and therefore potentially provide eventual value. We consider the case of 1-layer QAOA for Hamiltonians with up to 5-local terms and up to 20 qubits. In addition to visualizing the cost landscapes, we calculate their Fourier transform to study the relationship with the structure of the Hamiltonians from a complementary perspective. Furthermore, we introduce metrics to quantify the roughness of the landscape, which provide valuable insights into the nature of high-dimensional parametrized landscapes. 
While these techniques allow us to elucidate the role of Hamiltonian structure, order of the terms and their coefficients on the roughness of the optimization landscape, we also find that predicting the intricate landscapes of VQAs from first principles is very challenging and unlikely to be feasible in general.	翻訳日:2023-10-26 20:51:00 公開日:2023-10-24
# 多言語機械翻訳におけるデータ不均衡と表現変性の緩和 Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation ( http://arxiv.org/abs/2305.12786v2 ) ライセンス: Link先を確認	Wen Lai, Alexandra Chronopoulou, Alexander Fraser	(参考訳) 多言語ニューラルマシン翻訳(mnmt)の進歩にもかかわらず、この分野には依然として2つの大きな課題があると主張している。データ不均衡問題は、全ての言語対、特にロングテール言語(すなわち非常に低リソース言語)における並列コーパスの量の不均衡を指す。表現退化問題(representation degeneration problem)とは、mnmtモデルで利用可能な全空間の小さな部分空間にのみ現れるエンコードされたトークンの問題を指す。そこで,本稿では,mnmtモデルの性能向上のために,ターゲット側単言語データとバイリンガル辞書のみを使用するフレームワークであるbi-aclを提案する。我々は、オンライン制約ビーム探索とカリキュラム学習サンプリング戦略を組み合わせた双方向オートエンコーダと双方向コントラスト学習という2つのモジュールを定義した。広範な実験により,提案手法は,ロングテール言語と高リソース言語の両方においてより効果的であることが判明した。また,我々のアプローチは,ゼロショットシナリオでドメインと言語間の知識を伝達できることを実証する。 Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration. The data imbalance problem refers to the imbalance in the amount of parallel corpora for all language pairs, especially for long-tail languages (i.e., very low-resource languages). The representation degeneration problem refers to the problem of encoded tokens tending to appear only in a small subspace of the full space available to the MNMT model. To solve these two issues, we propose Bi-ACL, a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model. We define two modules, named bidirectional autoencoder and bidirectional contrastive learning, which we combine with an online constrained beam search and a curriculum learning sampling strategy. Extensive experiments show that our proposed method is more effective both in long-tail languages and in high-resource languages. We also demonstrate that our approach is capable of transferring knowledge between domains and languages in zero-shot scenarios.	翻訳日:2023-10-26 20:50:31 公開日:2023-10-24
# ほとんどのニューラルネットワークがほぼ学習可能 Most Neural Networks Are Almost Learnable ( http://arxiv.org/abs/2305.16508v3 ) ライセンス: Link先を確認	Amit Daniely, Nathan Srebro, Gal Vardi	(参考訳) ランダムな定数深度ネットワークを学習するためのPTASを提案する。固定された$\epsilon>0$と深さ$i$に対して、$\sqrt{d} \cdot \mathbb{S}^{d-1}$上の任意の分布に対して、深さ$i$のランダムなXavierネットワークを$\epsilon$の加算誤差まで学習するポリ時間アルゴリズムが存在することを示す。このアルゴリズムは$(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$の時間とサンプルの複雑さで動作し、ここで$\bar d$はネットワークのサイズである。 Sigmoid や ReLU のような活性化の場合、境界は $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$ に改善され、定数深度ランダムネットワークを学習するための準ポリ時間アルゴリズムが生成される。 We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$. The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar d$ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, resulting in a quasi-poly-time algorithm for learning constant depth random networks.	翻訳日:2023-10-26 20:43:01 公開日:2023-10-24
# ディックモデルにおける量子カオスとその変種 Quantum chaos in the Dicke model and its variants ( http://arxiv.org/abs/2305.15505v2 ) ライセンス: Link先を確認	Devvrat Tiwari and Subhashish Banerjee	(参考訳) 近年,時間外秩序相関器 (OTOC) が量子カオスの指標として注目されている。半古典的極限では、指数的成長速度は古典的リアプノフ指数に類似している。量子古典的対応は、多レベル原子と空洞場相互作用のモデルであるディックモデルのように、一体カオスシステムと相互作用を持つ現実的なシステムでサポートされている。この目的のために、オープン量子系設定におけるディックモデルの異なるバリエーションに対するOTOCを計算する。ディックモデルの超放射相転移とOTOCの関連性を検討した。さらに、otocと第2次コヒーレンス関数との関係も確立する。これは、量子光学モデルにおけるOTOCと量子カオスの実験的研究において重要である。 Recently, the out-of-time-ordered correlator (OTOC) has gained much attention as an indicator of quantum chaos. In the semi-classical limit, its exponential growth rate resembles the classical Lyapunov exponent. The quantum-classical correspondence has been supported for the one-body chaotic systems as well as realistic systems with interactions, as in the Dicke model, a model of multi-two-level atoms and cavity field interactions. To this end, we calculate the OTOC for different variations of the Dicke model in an open quantum system setting. The connection between the superradiant phase transition of the Dicke model and the OTOC is studied. Further, we establish a relation between the OTOC and the second-order coherence function. This becomes important for the experimental studies of the OTOC and quantum chaos in the models of quantum optics.	翻訳日:2023-10-26 20:42:23 公開日:2023-10-24
# 時系列分類におけるロバストな説明枠組み Robust Framework for Explanation Evaluation in Time Series Classification ( http://arxiv.org/abs/2306.05501v3 ) ライセンス: Link先を確認	Thu Trang Nguyen, Thach Le Nguyen, and Georgiana Ifrim	(参考訳) 時系列分類(英: time series classification)は、人間の活動認識、スポーツ分析、一般医療などの領域でよく見られる、一般的なデータ型、時間系列を扱うタスクである。本稿では時系列分類のための説明手法を定量的に評価・ランク付けするための枠組みを提供する。時系列の説明手法に対する近年の関心は、様々な説明手法を提供してきた。しかし、その説明が特定の問題について意見が一致しない場合、どちらを使うべきかは不明のままである。正しい答えを見つけるために複数の説明を比較することは自明ではない。 2つの重要な課題は、与えられた説明方法(例えば、分類タスクの関連性)の定量的かつ堅牢な評価方法と、説明手法を並べて比較する方法である。本稿では、時系列分類のための複数の相性に基づく説明を評価・比較するための堅牢なモデル非依存的説明評価フレームワークAMEEを提案する。このアプローチでは、各説明によって導かれる入力時系列にデータ摂動を加える。次に、摂動が分類精度に及ぼす影響を計測し、説明評価に用いる。その結果,時系列の判別部を乱すと分類精度が大きく変化し,各説明の評価に使用できることがわかった。異なるタイプの摂動と異なる種類の分類器にロバストにするために、摂動と分類器にまたがる精度の損失を集約する。この新しいアプローチは、異なる説明方法の定量化とランク付けを可能にします。合成データセットの定量的および定性的な分析、さまざまな時系列データセット、および既知の専門家基盤真理を持つ実世界のデータセットを提供する。 Time series classification is a task which deals with a prevalent data type, temporal sequences, common in domains such as human activity recognition, sports analytics and general healthcare. This paper provides a framework to quantitatively evaluate and rank explanation methods for time series classification. The recent interest in explanation methods for time series has provided a great variety of explanation techniques. Nevertheless, when the explanations disagree on a specific problem, it remains unclear which of them to use. Comparing multiple explanations to find the right answer is non-trivial. Two key challenges remain: how to quantitatively and robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanation methods side-by-side. We propose AMEE, a robust Model-Agnostic Explanation Evaluation framework for evaluating and comparing multiple saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. 
The impact of perturbation on classification accuracy is then measured and used for explanation evaluation. The results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to quantify and rank different explanation methods. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time-series datasets, as well as a real-world dataset with known expert ground truth.	翻訳日:2023-10-26 20:32:29 公開日:2023-10-24
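The evaluation loop described in the AMEE abstract above (perturb the input where an explanation points, measure the loss in accuracy, aggregate over perturbations and classifiers) can be sketched in a few lines. This is a toy illustration, not the AMEE code: every function name here is hypothetical, the perturbation is a simple constant fill, and classifiers are assumed to be plain callables that map one series to a label.

```python
def perturb(series, saliency, k, fill):
    """Replace the k time steps with the highest saliency by the value `fill`."""
    top = sorted(range(len(series)), key=lambda t: saliency[t], reverse=True)[:k]
    out = list(series)
    for t in top:
        out[t] = fill
    return out


def accuracy(classify, X, y):
    """Fraction of series in X that `classify` labels correctly."""
    return sum(classify(x) == label for x, label in zip(X, y)) / len(y)


def accuracy_drop(classify, X, y, saliencies, k, fill):
    """Accuracy lost when each series is perturbed where its explanation points."""
    X_pert = [perturb(x, s, k, fill) for x, s in zip(X, saliencies)]
    return accuracy(classify, X, y) - accuracy(classify, X_pert, y)


def explanation_score(classifiers, fills, X, y, saliencies, k):
    """Average accuracy drop over all classifier and perturbation combinations."""
    drops = [
        accuracy_drop(c, X, y, saliencies, k, f)
        for c in classifiers
        for f in fills
    ]
    return sum(drops) / len(drops)
```

Under these assumptions, an explanation that highlights the truly discriminative time steps should receive a higher score than one that highlights irrelevant steps, which gives a ranking of explanation methods.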
# 逆問題に対するデータ一貫性を用いた直接拡散ブリッジ Direct Diffusion Bridge using Data Consistency for Inverse Problems ( http://arxiv.org/abs/2305.19809v2 ) ライセンス: Link先を確認	Hyungjin Chung, Jeongsol Kim, Jong Chul Ye	(参考訳) 拡散モデルに基づく逆問題解法は優れた性能を示したが、主にノイズから始まる逆拡散サンプリングを必要とするため、速度は制限されている。最近のいくつかの研究は、特定の逆問題に対してクリーンと腐敗を直接ブリッジすることで拡散過程を構築することでこの問題を緩和しようと試みている。本稿では,これらの既存の研究をDDB (Direct Diffusion Bridges) という名前で統一し,異なる理論に動機付けられながら,結果のアルゴリズムがパラメータの選択でのみ異なることを示す。そして、現在のddbフレームワークの重要な制限、すなわちデータの一貫性が保証されないことを強調します。この問題に対処するため,我々は,微調整を必要とせずにデータ一貫性を課す修正推論手順を提案する。得られた手法データをCDDB (Consistent DDB) と呼び、知覚と歪みの両指標において矛盾する結果が得られ、Pareto-frontier を最適な方向に効果的に推し進める。提案手法は両評価基準の最先端化を実現し,既存手法よりも優れていることを示す。コードはhttps://github.com/HJ-harry/CDDBで入手できる。 Diffusion model-based inverse problem solvers have shown impressive performance, but are limited in speed, mostly as they require reverse diffusion sampling starting from noise. Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters. Then, we highlight a critical limitation of the current DDB framework, namely that it does not ensure data consistency. To address this problem, we propose a modified inference procedure that imposes data consistency without the need for fine-tuning. We term the resulting method data Consistent DDB (CDDB), which outperforms its inconsistent counterpart in terms of both perception and distortion metrics, thereby effectively pushing the Pareto-frontier toward the optimum. Our proposed method achieves state-of-the-art results on both evaluation criteria, showcasing its superiority over existing methods. Code is available at https://github.com/HJ-harry/CDDB	翻訳日:2023-10-26 20:29:46 公開日:2023-10-24
# 学習不可能なデータセットから何が学べるか? What Can We Learn from Unlearnable Datasets? ( http://arxiv.org/abs/2305.19254v2 ) ライセンス: Link先を確認	Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein	(参考訳) 広範なWebスクレイピングの時代、未学習のデータセットメソッドは、ディープニューラルネットワークの一般化を防ぎ、データのプライバシを保護する可能性がある。しかし、それらの利用を危うくする多くの実用的な制限に加えて、データを保護する能力に疑問を投げかける多くの発見を行ないました。まず、学習不可能なデータセットでトレーニングされたニューラルネットワークはショートカットのみを学ぶと広く信じられている。これとは対照的に,ネットワークは高いテスト性能を期待できる有用な特徴を実際に学習することができ,画像保護が保証されていないことを示唆している。学習不能なデータセットは、追加の摂動の線形分離性を通じて学習ショートカットを誘導すると考えられている。摂動の線形分離性は必要条件ではないことを示す反例を提供する。線形分離可能な摂動を頼りにすべきでない理由を強調するため,ICML 2021 と ICLR 2023 で発行された未学習データセットから学習が可能な直交射影攻撃を提案する。提案手法は, 提案手法に比べてかなり複雑ではない。 In an era of widespread web scraping, unlearnable dataset methods have the potential to protect data privacy by preventing deep neural networks from generalizing. But in addition to a number of practical limitations that make their use unlikely, we make a number of findings that call into question their ability to safeguard data. First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization. In contrast, we find that networks actually can learn useful features that can be reweighed for high test performance, suggesting that image protection is not assured. Unlearnable datasets are also believed to induce learning shortcuts through linear separability of added perturbations. We provide a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. To emphasize why linearly separable perturbations should not be relied upon, we propose an orthogonal projection attack which allows learning from unlearnable datasets published in ICML 2021 and ICLR 2023. Our proposed attack is significantly less complex than recently proposed techniques.	翻訳日:2023-10-26 20:29:11 公開日:2023-10-24
# ノイズのあるバンディットフィードバックを持つ敵対者に対する行列ゲームの対数リグレット Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback ( http://arxiv.org/abs/2306.13233v2 ) ライセンス: Link先を確認	Arnab Maiti, Kevin Jamieson, Lillian J. Ratliff	(参考訳) 本稿では,各タイムステップで行プレイヤーが行$i$を選択し,列プレイヤーが列$j$を選択し,行プレイヤーが平均$A_{i,j}$のノイズを含む報酬を受け取る,ゼロサム行列ゲームの一変種について考察する。行プレイヤーの目的は、敵対的な列プレイヤーに対してさえ、できるだけ多くの報酬を蓄積することである。行プレイヤーが、任意の報酬列に対して$\sqrt{T}$のリグレットを得ることで知られるEXP3戦略を使用すると、このゲーム設定におけるナッシュ均衡に対しても$\sqrt{T}$のリグレットが直ちに達成される。しかしながら、EXP3戦略がゲームの構造を考慮しないという事実を動機の一つとして、O'Donoghue et al. (2021) はゲーム構造を活用する UCB スタイルのアルゴリズムを提案し、このアルゴリズムがEXP3を経験的に大きく上回ることを示した。彼らは、このUCBスタイルのアルゴリズムが$\sqrt{T}$のリグレットを達成することを示したが、本論文では、確率的バンディットの結果と同様に、任意の敵対者に対して$\text{polylog}(T)$のリグレットを証明可能な形で達成するアルゴリズムが存在するかどうかを問う。単純な$2 \times 2$設定において、この問いに肯定的に答える新しいアルゴリズムを提案し、リグレット設定におけるゲームに対する最初のインスタンス依存保証を提供する。我々のアルゴリズムは2つの大きなハードルを克服する。 1) ナッシュ均衡は$1/\sqrt{T}$のレートでしか推定できないにもかかわらず、対数リグレットを得る。 2) 敵対者がナッシュ均衡に関する情報を提供するか、または行プレイヤーが負のリグレットを被るかのいずれかを保証する行プレイヤー戦略を設計する。さらに、全情報の場合には、最初のハードルが依然として関係する一般の$n \times m$の場合に対処する。最後に、EXP3 と UCB ベースのアルゴリズムは、必然的に $\sqrt{T}$ より良い性能を発揮できないことを示す。 This paper considers a variant of zero-sum matrix games where at each timestep the row player chooses row $i$, the column player chooses column $j$, and the row player receives a noisy reward with mean $A_{i,j}$. The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player. If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret relative to the Nash equilibrium in this game setting. However, partly motivated by the fact that the EXP3 strategy is myopic to the structure of the game, O'Donoghue et al. (2021) proposed a UCB-style algorithm that leverages the game structure and demonstrated that this algorithm greatly outperforms EXP3 empirically. 
While they showed that this UCB-style algorithm achieved $\sqrt{T}$ regret, in this paper we ask if there exists an algorithm that provably achieves $\text{polylog}(T)$ regret against any adversary, analogous to results from stochastic bandits. We propose a novel algorithm that answers this question in the affirmative for the simple $2 \times 2$ setting, providing the first instance-dependent guarantees for games in the regret setting. Our algorithm overcomes two major hurdles: 1) obtaining logarithmic regret even though the Nash equilibrium is estimable only at a $1/\sqrt{T}$ rate, and 2) designing row-player strategies that guarantee that either the adversary provides information about the Nash equilibrium, or the row player incurs negative regret. Moreover, in the full information case we address the general $n \times m$ case where the first hurdle is still relevant. Finally, we show that EXP3 and the UCB-based algorithm necessarily cannot perform better than $\sqrt{T}$.	翻訳日:2023-10-26 20:23:02 公開日:2023-10-24
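Editor's note on the entry above: as a point of reference for the EXP3 baseline the abstract mentions, the following is a minimal sketch of one EXP3 round for the row player. It is the textbook EXP3 update, not the algorithm proposed in the paper. The function names, the exploration parameter `gamma`, and the assumption that rewards lie in [0, 1] are illustrative choices made here.

```python
import math

def exp3_probs(weights, gamma):
    """Mix the normalized weights with uniform exploration over the rows."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]

def exp3_update(weights, probs, chosen, reward, gamma):
    """Importance-weighted update for the row that was played.

    Only the chosen row is updated, because under bandit feedback the
    rewards of the other rows are never observed. Assumes reward in [0, 1].
    """
    k = len(weights)
    estimate = reward / probs[chosen]      # unbiased estimate of the reward
    new_weights = list(weights)
    new_weights[chosen] *= math.exp(gamma * estimate / k)
    return new_weights
```

In a simulated game, the row player would sample a row from `exp3_probs`, observe a noisy reward whose mean is the matrix entry for the chosen row and the opponent's column, and then call `exp3_update`.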
# 深層アンサンブルを超えて:分布シフト下におけるベイズ深層学習の大規模評価 Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift ( http://arxiv.org/abs/2306.12306v3 ) ライセンス: Link先を確認	Florian Seligmann, Philipp Becker, Michael Volpp, Gerhard Neumann	(参考訳) Bayesian Deep Learning (BDL) は、分布シフトしたデータに対するよく校正された予測を実現するための有望なアプローチである。それにもかかわらず、最近のSOTA手法を多様で現実的で挑戦的なベンチマークタスクを体系的に評価する大規模な調査は存在しない。本稿では,BDL研究の現状を明らかにするために,WILDSコレクションから,分散シフトによる一般化能力とキャリブレーションに着目した,挑戦的な分類と回帰作業を含む実世界のデータセットに対する最新のBDLアルゴリズムの評価を行った。我々は、大規模な、畳み込み、トランスフォーマーベースのニューラルネットワークアーキテクチャでアルゴリズムを比較した。特に,予測校正誤差の符号付きバージョンについて検討し,メソッドが過度か過度かを明らかにし,メソッドの振舞いに関するさらなる知見を提供する。さらに,スクラッチからのトレーニングが極めて高価である大規模事前学習モデルに対して,bdlの体系的評価を行った。最後に,近年のDeep Ensemblesの成功を踏まえ,一般的な単一モード後部近似をアンサンブルを用いて複数のモードに拡張する。単一モード近似は一般にモデルの一般化能力とキャリブレーションをかなりの差で向上させるが、大きなトランスフォーマーベース言語モデルを微調整する際のアンサンブルの失敗モードも同定する。この設定では、最終層ベイズ・バイ・バックプロップのような変分推論に基づくアプローチは、SWAGのような現代の近似推論アルゴリズムが最適なキャリブレーションを達成するのに対し、大きなマージンによる精度で他の手法よりも優れている。 Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods. 
Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.	翻訳日:2023-10-26 20:22:07 公開日:2023-10-24
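Editor's note on the entry above: the signed calibration error mentioned in the abstract can be illustrated with a small sketch. The equal-width confidence bins and the sign convention (positive means over-confident) are assumptions made here for illustration, and the paper's exact definition may differ.

```python
def signed_ece(confidences, correct, n_bins=10):
    """Sum over bins of (bin weight) * (mean confidence - accuracy).

    Under the sign convention assumed here, a positive value indicates
    over-confidence and a negative value indicates under-confidence.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = 0.0
    for members in bins:
        if not members:
            continue                               # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        total += len(members) / n * (mean_conf - accuracy)
    return total
```

Taking the absolute value of each bin's gap before summing would give the usual unsigned expected calibration error, which discards the direction of the miscalibration.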
# SynerGPT:パーソナライズドドラッグのシナジー予測と薬物設計のためのインコンテキストラーニング SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design ( http://arxiv.org/abs/2307.11694v2 ) ライセンス: Link先を確認	Carl Edwards and Aakanksha Naik and Tushar Khot and Martin Burke and Heng Ji and Tom Hope	(参考訳) 相乗的な薬物の組み合わせを予測することは、がん治療、特に生検細胞を介して患者の特定の腫瘍にパーソナライズされた治療の発見を加速するのに役立つ。本稿では,文脈内薬物シナジー学習のための新しい設定とモデルを提案する。特定のがん細胞標的の文脈における10～20の薬物相乗関係の「個人化データセット」を作成した。私たちの目標は、そのコンテキストで追加の薬物シナジー関係を予測することです。 gpt言語モデル(lm)を"in-context learn"共通関数クラスに事前トレーニングする最近の作業に触発されて、gptモデルが"drug synergy function"を学習できるようにする新しい事前学習スキームを考案する。我々のモデルは -- テキストコーパス、分子指紋、タンパク質相互作用、その他のドメイン固有の知識を使用しない -- は、競争的な結果を達成することができる。さらに, モデルプロンプトを最適化する遺伝的アルゴリズムと文脈内アプローチを統合し, 患者生検を行った後, テスト対象のシナジー候補を選定する。最後に、特定の患者の「パーソナライズされたデータセット」をターゲットとした、特に相乗効果のある薬物の設計を可能にする逆薬物設計の新たなタスクについて検討する。我々の発見は、精密がん医学に重要な影響を与える可能性があり、またlmsの非テキスト事前トレーニングに関する興味深い疑問も提起できる。 Predicting synergistic drug combinations can help accelerate discovery of cancer treatments, particularly therapies personalized to a patient's specific tumor via biopsied cells. In this paper, we propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Our goal is to predict additional drug synergy relationships in that context. Inspired by recent work that pre-trains a GPT language model (LM) to "in-context learn" common function classes, we devise novel pre-training schemes that enable a GPT model to in-context learn "drug synergy functions". Our model -- which does not use any textual corpora, molecular fingerprints, protein interaction or any other domain-specific knowledge -- is able to achieve competitive results. We further integrate our in-context approach with a genetic algorithm to optimize model prompts and select synergy candidates to test after conducting a patient biopsy. 
Finally, we explore a novel task of inverse drug design which can potentially enable the design of drugs that synergize specifically to target a given patient's "personalized dataset". Our findings can potentially have an important impact on precision cancer medicine, and also raise intriguing questions on non-textual pre-training for LMs.	翻訳日:2023-10-26 20:11:25 公開日:2023-10-24
# Back to Optimization:拡散に基づくゼロショット3次元人物ポーズ推定 Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation ( http://arxiv.org/abs/2307.03833v3 ) ライセンス: Link先を確認	Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang	(参考訳) 学習に基づく手法は、従来の最適化に基づく手法よりも多くのベンチマークにおいて非常に優れた性能を持ち、3Dヒューマンポーズ推定(HPE)タスクを支配している。それにもかかわらず、訓練されたネットワークは暗黙的にカメラ固有のパラメータとドメインに基づく3D人物ポーズの分布を学習し、統計的平均によってポーズを推定するため、2D-3Dリフト、画像から3D、あるいは拡散ベースの手法のいずれであっても、学習ベースのモデルにとって、野生環境での3D HPEは依然として最大の課題である。一方、最適化に基づく手法は結果をケース・バイ・ケースで推定するため、野生環境でより多様で洗練された人間のポーズを予測することができる。最適化に基づく手法と学習に基づく手法の利点を組み合わせることで、クロスドメインおよび野生環境での3D HPEの問題を解決するため、3D HPEのための Zero-shot Diffusion-based Optimization (ZeDO) パイプラインを提案する。マルチ仮説版のZeDOは、2D-3Dまたはimage-3Dペアによる学習なしに、Human3.6MでminMPJPE 51.4 mmの最先端(SOTA)性能を達成する。さらに,単一仮説版のZeDOは,クロスデータセット評価において3DPWデータセット上でPA-MPJPE 40.3 mmのSOTA性能を達成し,3DPWで学習した学習ベースの手法をも上回る。 Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm, without training with any 2D-3D or image-3D pairs. 
Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset, with a PA-MPJPE of 40.3 mm in cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.	翻訳日:2023-10-26 20:09:57 公開日:2023-10-24
# 3:1 Nesting Rules in Redistricting 3:1 Nesting Rules in Redistricting ( http://arxiv.org/abs/2308.00605v2 ) ライセンス: Link先を確認	Christopher Donnay	(参考訳) 立法再編成では、ほとんどの州が下院と上院の地図を別々に描いている。オハイオ州とウィスコンシン州は上院の選挙区に3:1のネスト規則、すなわち隣接する下院の3つの選挙区から作るよう求めている。我々は、この要件が再編成に与える影響、特に特定の政党が獲得した議席数について調査する。我々はマルコフチェインモンテカルロ法を用いて生成された2つのアンサンブルを比較した。一方はReCom連鎖を用いてネスト要求のない上院地図を生成するもので、もう一方は3:1ネスト要求の上院地図を生成する。 3:1のネスト規則を必要とすることは、勝利した席の分布に最小限の影響を与える。さらに、選択された下院地図がネストされた上院地図の分布に与える影響について検討し、下院レベルでの極端な議席偏差が上院レベルでの当選議席の分布に大きく影響しないことを見出した。 In legislative redistricting, most states draw their House and Senate maps separately. Ohio and Wisconsin require that their Senate districts be made with a 3:1 nesting rule, i.e., out of triplets of adjacent House districts. We seek to study the impact of this requirement on redistricting, specifically on the number of seats won by a particular political party. We compare two ensembles generated using Markov Chain Monte Carlo methods; one which uses the ReCom chain to generate Senate maps without a nesting requirement, and the other which uses a chain that generates Senate maps with a 3:1 nesting requirement. We find that requiring a 3:1 nesting rule has minimal impact on the distribution of seats won. Moreover, we study the impact the chosen House map has on the distribution of nested Senate maps, and find that an extreme seat bias at the House level does not significantly impact the distribution of seats won at the Senate level.	翻訳日:2023-10-26 20:00:15 公開日:2023-10-24
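Editor's note on the entry above: to make the 3:1 nesting requirement concrete, the sketch below checks whether a proposed Senate plan groups House districts into triplets that are each connected in the House adjacency graph. The data layout (a dict mapping each House district to its set of neighbors, and a list of triplets) and the function name are hypothetical, and it assumes adjacency is symmetric. It is not the Markov chain used in the paper, and actual state rules may define adjacency more strictly.

```python
def is_valid_nesting(adjacency, senate_plan):
    """Return True if every House district is used exactly once and each
    Senate district is a connected triplet of House districts."""
    used = [d for triplet in senate_plan for d in triplet]
    if sorted(used) != sorted(adjacency):
        return False                       # a district is missing or reused
    for triplet in senate_plan:
        if len(triplet) != 3:
            return False
        a, b, c = triplet
        # Count which of the three possible edges (a-b, b-c, c-a) exist.
        links = sum([b in adjacency[a], c in adjacency[b], a in adjacency[c]])
        if links < 2:                      # three nodes are connected iff >= 2 edges
            return False
    return True
```

For example, with six House districts arranged in a line, grouping districts 1 to 3 and 4 to 6 passes the check, while grouping 1, 2 and 4 together fails because district 4 touches neither 1 nor 2.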
# 不均衡データを用いたクロスエントロピー損失下における非拘束特徴モデルのNeural Collapse Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data ( http://arxiv.org/abs/2309.09725v2 ) ライセンス: Link先を確認	Wanli Hong and Shuyang Ling	(参考訳) 近年、コンピュータビジョンやテキスト処理の様々なタスクにおいて、ディープニューラルネットワーク(DNN)が大きな成功を収めているのを目撃している。興味深いことに、大量のパラメータを持つこれらのDNNは、訓練終末期(TPT)における特徴表現と最終層分類器に類似した構造特性を共有している。具体的には、トレーニングデータがバランスしている(各クラスのサンプル数が同じ)場合、同じクラスのサンプルの特徴ベクトルが対応するクラス内平均特徴に収束し、ペアワイズ角が同じであることが観察される。この現象は、2019年にPapyan、Han、Donohoによって初めて命名されたNeural Collapse(NC)として知られている。近年の多くの研究は、いわゆるunconstrained feature model(UFM)を採用してこの現象を理論的に説明している。本稿では,非拘束特徴モデルの文脈におけるクロスエントロピー損失関数下の不均衡データへのNC現象の拡張について検討する。私たちの貢献は最先端の成果と比較すると多面的である。 (a) 特徴ベクトルが崩壊現象を示すこと、すなわち同じクラス内の特徴が同じ平均ベクトルに崩壊することを示す。 (b) 平均特徴ベクトルは、もはや等角的タイトフレームを形成しない。その代わりに、そのペアワイズ角はサンプルサイズに依存する。 (c) 少数群の崩壊(少数群の特徴ベクトルが1つのベクトルに崩壊する)が起こるシャープなしきい値も正確に特徴づける。 (d) 最後に、サンプルサイズが大きくなるとデータサイズの不均衡の影響が減少することを論じる。以上より,不均衡データに対するクロスエントロピー損失下でのNCの全体像を示す。数値実験は我々の理論解析を裏付ける。 Recent years have witnessed the huge success of deep neural networks (DNNs) in various tasks of computer vision and text processing. Interestingly, these DNNs with a massive number of parameters share similar structural properties on their feature representation and last-layer classifier at the terminal phase of training (TPT). Specifically, if the training data are balanced (each class shares the same number of samples), it is observed that the feature vectors of samples from the same class converge to their corresponding in-class mean features and their pairwise angles are the same. This fascinating phenomenon is known as Neural Collapse (NC), first termed by Papyan, Han, and Donoho in 2019. Many recent works manage to theoretically explain this phenomenon by adopting the so-called unconstrained feature model (UFM). In this paper, we study the extension of the NC phenomenon to imbalanced data under the cross-entropy loss function in the context of the unconstrained feature model. 
Our contribution is multi-fold compared with the state-of-the-art results: (a) we show that the feature vectors exhibit a collapse phenomenon, i.e., the features within the same class collapse to the same mean vector; (b) the mean feature vectors no longer form an equiangular tight frame. Instead, their pairwise angles depend on the sample size; (c) we also precisely characterize the sharp threshold at which minority collapse (the feature vectors of the minority groups collapse to one single vector) takes place; (d) finally, we argue that the effect of the imbalance in data size diminishes as the sample size grows. Our results provide a complete picture of NC under the cross-entropy loss for imbalanced data. Numerical experiments confirm our theoretical analysis.	翻訳日:2023-10-26 19:51:12 公開日:2023-10-24
# グローバルが局所化:グローバルマスター方程式の効率的な多体力学 Global becomes local: Efficient many-body dynamics for global master equations ( http://arxiv.org/abs/2309.07105v2 ) ライセンス: Link先を確認	Alexander Schnell	(参考訳) この研究は、グローバル対ローカルマスター方程式の問題に進展をもたらす。レッドフィールドマスター方程式のような大域的マスター方程式(標準ボルン近似やマルコフ近似に従う)は、ハミルトニアン系を完全に対角化する必要がある。これは量子多体系の相互作用には特に困難である。我々は、相反(エネルギー)空間における短波相関時間展開について議論し、ハミルトニアンの対角化を避けるジャンプ作用素の連続展開をもたらす。局所的に1つの場所に結合された浴場の場合、これは典型的には、局所的なオペレーターの観点から、グローバルなレッドフィールドジャンプ演算子の拡張につながる。さらに、局所レッドフィールドマスター方程式を近似したリンドブラッド形式にマッピングし、より広い体系のクラスに適用できる一方で、従来の局所リンドブラッドアプローチと同じ概念上の利点を持つ方程式を与える。我々のアイデアは局所マスター方程式の非ヒューリスティックな基礎を生み出し、確立された多体法と組み合わせることができる。 This work makes progress on the issue of global- vs. local- master equations. Global master equations like the Redfield master equation (following from standard Born- and Markov- approximation) require a full diagonalization of the system Hamiltonian. This is especially challenging for interacting quantum many-body systems. We discuss a short-bath-correlation-time expansion in reciprocal (energy) space, leading to a series expansion of the jump operator, which avoids a diagonalization of the Hamiltonian. For a bath that is coupled locally to one site, this typically leads to an expansion of the global Redfield jump operator in terms of local operators. We additionally map the local Redfield master equation to an approximate Lindblad form, giving an equation which has the same conceptual advantages of traditional local Lindblad approaches, while being applicable in a much broader class of systems. Our ideas give rise to a non-heuristic foundation of local master equations, which can be combined with established many-body methods.	翻訳日:2023-10-26 19:50:21 公開日:2023-10-24
# egofalls - エゴセントリックカメラを用いた視覚聴覚データセットと転倒検出ベンチマーク EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras ( http://arxiv.org/abs/2309.04579v2 ) ライセンス: Link先を確認	Xueyi Wang	(参考訳) 転倒は重大であり、高齢者のような脆弱な人口にとって致命的である。これまでの研究は、単一のセンサー、画像、加速度計によるデータキャプチャによるフォールの検出に対処してきた。本研究では,エゴセントリックカメラで撮影した映像から抽出したマルチモーダルディスクリプタを利用する。提案手法は,抽出した記述子上に構築した遅延決定融合層を含む。さらに,提案手法を評価するためのデータセットを新たに収集した。この種の公開データセットとしてはこれが初めてのものだと考えています。データセットは、14人の被験者による10,948のビデオサンプルからなる。個々の特徴抽出器の性能,視覚情報の融合,視覚情報と音声情報の融合を評価するため,アブレーション実験を行った。さらに,内部および外部のクロスバリデーション実験を行った。その結果,遅延決定融合による音声情報と視覚情報の融合により検出性能が向上し,転倒防止・緩和に有望なツールとなることが示された。 Falls are significant and often fatal for vulnerable populations such as the elderly. Previous works have addressed the detection of falls by relying on data capture by a single sensor, images or accelerometers. In this work, we rely on multimodal descriptors extracted from videos captured by egocentric cameras. Our proposed method includes a late decision fusion layer that builds on top of the extracted descriptors. Furthermore, we collect a new dataset on which we assess our proposed approach. We believe this is the first public dataset of its kind. The dataset comprises 10,948 video samples by 14 subjects. We conducted ablation experiments to assess the performance of individual feature extractors, fusion of visual information, and fusion of both visual and audio information. Moreover, we experimented with internal and external cross-validation. Our results demonstrate that the fusion of audio and visual information through late decision fusion improves detection performance, making it a promising tool for fall prevention and mitigation.	翻訳日:2023-10-26 19:49:49 公開日:2023-10-24
# マルチタスク多言語機械翻訳のためのタスクベースMOE Task-Based MoE for Multitask Multilingual Machine Translation ( http://arxiv.org/abs/2308.15772v3 ) ライセンス: Link先を確認	Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla	(参考訳) Mixture-of-experts (MoE) アーキテクチャは多くのアプリケーションで深層モデルのトレーニングにおいて、多様なタスクのための強力な手法であることが証明されている。しかし、現在のMoE実装はタスク非依存であり、異なるタスクから全てのトークンを同じように扱う。そこで本研究では,タスク情報を異なる粒度レベルでMoEモデルに組み込む新しい手法を,動的タスクベースアダプタの共用により設計する。実験と解析により,マルチタスク多言語機械翻訳における高密度および標準MoEモデルに対するアプローチの利点が示された。タスク固有のアダプタでは、モデルを新しいタスクに効率的に一般化することができます。 Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translations. With task-specific adapters, our models can additionally generalize to new tasks efficiently.	翻訳日:2023-10-26 19:48:10 公開日:2023-10-24
# オープンソースツールキットと公開データを用いたウィスパースタイルの再現訓練 Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data ( http://arxiv.org/abs/2309.13876v3 ) ライセンス: Link先を確認	Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe	(参考訳) 大量のデータで事前学習した音声モデルは、大きな成功を収めている。 OpenAI Whisperは680k時間の教師付き音声データに基づいてトレーニングされた多言語マルチタスクモデルである。ゼロショット設定であっても、音声認識や翻訳のベンチマークによく当てはまる。しかし、そのようなモデルを開発するための完全なパイプライン(データ収集からトレーニングまで)は公開されていないため、研究者がパフォーマンスを改善し、効率性、堅牢性、公正性、バイアスといったトレーニング関連の問題に対処することは困難である。本研究は,オープンソースツールキットと公開データを用いたWhisperスタイルのトレーニングを再現するOpen Whisperスタイル音声モデル(OWSM)を提案する。 owsmはさらに多くの翻訳方向をサポートし、より効率的にトレーニングできる。データ準備、トレーニング、推論、スコアリングに使用されるすべてのスクリプトと、オープンサイエンスを促進するための事前訓練されたモデルとトレーニングログを公開します。 Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness, fairness, and bias. This work presents an Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM even supports more translation directions and can be more efficient to train. We will publicly release all scripts used for data preparation, training, inference, and scoring as well as pre-trained models and training logs to promote open science.	翻訳日:2023-10-26 19:40:55 公開日:2023-10-24
# ジョイントインタラクティブナビゲーションの拡散モデル A Diffusion-Model of Joint Interactive Navigation ( http://arxiv.org/abs/2309.12508v2 ) ライセンス: Link先を確認	Matthew Niedoba, Jonathan Wilder Lavington, Yunpeng Liu, Vasileios Lioutas, Justice Sefas, Xiaoxuan Liang, Dylan Green, Setareh Dabiri, Berend Zwartsenberg, Adam Scibior, Frank Wood	(参考訳) 自動運転車システムのシミュレーションには、シミュレーションされた交通参加者が多様で現実的な行動を示す必要がある。シミュレーションにおける事前記録された実世界の交通シナリオの使用は、現実主義を保証するが、安全クリティカルイベントの希少さにより、大規模な運転シナリオの収集が高価になる。本稿では,トラフィックシナリオ生成のための拡散ベース手法であるdjinnを提案する。提案手法は,過去,現在,未来からの柔軟な状態観測に基づいて,すべてのエージェントの軌道を協調的に拡散させる。人気トラジェクトリ予測データセットについて,共同トラジェクトリ指標を用いたアートパフォーマンスの現状を報告する。さらに, DJINNは, 目標ベースサンプリング, 行動クラスサンプリング, シナリオ編集など, 様々な価値条件分布からの直接的テストタイムサンプリングを柔軟に行えるかを示した。 Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.	翻訳日:2023-10-26 19:40:18 公開日:2023-10-24
# 迷路データセットの生成と操作のための構成可能なライブラリ A Configurable Library for Generating and Manipulating Maze Datasets ( http://arxiv.org/abs/2309.10498v2 ) ライセンス: Link先を確認	Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn, Samy Wu Fung	(参考訳) 分布シフトに機械学習モデルがどのように反応するかを理解することは、重要な研究課題である。迷路は、多様な生成アルゴリズムにより、微妙な分布シフトと顕著な分布シフトの両方をシミュレートできるきめ細かなプラットフォームを提供するため、優れたテストベッドとして機能する。分布外データに対するモデル挙動の体系的な調査を可能にするため,迷路解法タスクからなるデータセットの生成,処理,視覚化のための包括的なライブラリである$\texttt{maze-dataset}$を提案する。このライブラリを使用すると、研究者はデータセットを簡単に作成でき、使用する生成アルゴリズム、選択したアルゴリズムに与えるパラメータ、生成された迷路が満たすべきフィルタを幅広く制御できる。さらに、ラスタライズドおよびテキストベースを含む複数の出力フォーマットをサポートし、畳み込みニューラルネットワークと自己回帰トランスフォーマーモデルに対応している。これらのフォーマットは、可視化と変換のためのツールとともに、研究アプリケーションにおける汎用性と適応性を保証する。 Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes serve as an excellent testbed due to varied generation algorithms offering a nuanced platform to simulate both subtle and pronounced distributional shifts. To enable systematic investigations of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks. With this library, researchers can easily create datasets, having extensive control over the generation algorithm used, the parameters fed to the algorithm of choice, and the filters that generated mazes must satisfy. Furthermore, it supports multiple output formats, including rasterized and text-based, catering to convolutional neural networks and autoregressive transformer models. These formats, along with tools for visualizing and converting between them, ensure versatility and adaptability in research applications.	翻訳日:2023-10-26 19:39:27 公開日:2023-10-24
# 編集による要約の改善 Improving Summarization with Human Edits ( http://arxiv.org/abs/2310.05857v2 ) ライセンス: Link先を確認	Zonghai Yao, Benjamin J Schloss, and Sai P. Selvaraj	(参考訳) 近年の研究では、人間のフィードバックパラダイムで学習し、人間の判断による高品質なテキストを生成することが期待されている。既存の作品は、人間のフィードバックを使って、一般的なドメイン抽象要約の大規模言語モデル(llm)を訓練し、従来よりも質の高い要約を得た。本稿では,より探索の少ない人間のフィードバック,すなわち人間の編集に焦点をあてる。トレーニングループにおいて,人文編集データとモデル生成データの両方を併用する新しい手法であるシーケンスアライメント(un)Likelihood Training(SALT)を提案する。また,既存のトレーニングデータから得られる真実の要約と人文編集のシミュレーションを実演し,トレーニング後に得られたモデル生成要約と合わせて,高価な人文データの必要性を低減させる。実験では,一般領域要約から医療領域要約まで,人間のフィードバック探索を拡張した。本研究は,人間および模倣編集による要約品質向上における塩の効果を示す。追加実験により、SALTは従来のRLHF法(人間の嗜好のために設計された)-DPOよりも優れた性能を示した。私たちの論文の証拠は、研究者にさまざまな人間のフィードバックアプローチを精査し、収集し、よりうまく活用するよう促すことを願っています。 Recent work has shown the promise of learning with human feedback paradigms to produce human-determined high-quality text. Existing works use human feedback to train large language models (LLMs) in general domain abstractive summarization and have obtained summary quality exceeding traditional likelihood training. In this paper, we focus on a less explored form of human feedback -- Human Edits. We propose Sequence Alignment (un)Likelihood Training (SALT), a novel technique to use both the human-edited and model-generated data together in the training loop. In addition, we demonstrate simulating Human Edits with ground truth summaries coming from existing training data -- Imitation edits, along with the model-generated summaries obtained after the training, to reduce the need for expensive human-edit data. In our experiments, we extend human feedback exploration from general domain summarization to medical domain summarization. Our results demonstrate the effectiveness of SALT in improving the summary quality with Human and Imitation Edits. Through additional experiments, we show that SALT outperforms the conventional RLHF method (designed for human preferences) -- DPO, when applied to human-edit data. 
We hope the evidence in our paper prompts researchers to explore, collect, and better use different human feedback approaches scalably.	翻訳日:2023-10-26 19:30:47 公開日:2023-10-24
# 反緩和コーティングおよび緩衝ガス充填アルカリ蒸気セルにおける光貯蔵の比較研究 Comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells ( http://arxiv.org/abs/2310.03726v2 ) ライセンス: Link先を確認	Marin Đujić, D. Buhin, N. Šantić, D. Aumiler, and T. Ban	(参考訳) 温かいルビジウム蒸気中における電磁誘導透明化 (EIT) を用いて、反緩和コーティングおよび緩衝ガス充填アルカリ蒸気セルの光貯蔵の比較研究を行った。緩衝ガス充填セルの使用は、反緩和コーティングセルと比較して保存時間と効率が約10倍向上した。我々は、共鳴スキームの代わりに近共鳴のEIT $\Lambda$スキームを使用することにより、同様のメモリ寿命を維持しながら、緩衝ガス充填セルのメモリ効率を最大6倍向上させる。本研究は,フィールド展開可能な量子メモリの開発に寄与する。 We perform a comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells using electromagnetically induced transparency (EIT) in warm rubidium vapor. The use of a buffer-gas-filled cell resulted in an $\approx$10-fold improvement in storage time and efficiency compared to antirelaxation-coated cells. We achieve up to a sixfold enhancement in buffer-gas-filled memory efficiency, while maintaining a similar memory lifetime, by employing a near-resonant EIT $\Lambda$-scheme instead of a resonant one. Our findings contribute to the development of field-deployable quantum memories.	翻訳日:2023-10-26 19:29:41 公開日:2023-10-24
# KGQuiz:大規模言語モデルにおける符号化知識の一般化の評価 KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models ( http://arxiv.org/abs/2310.09725v2 ) ライセンス: Link先を確認	Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov	(参考訳) 大規模言語モデル(llm)は知識集約型タスクにおいて顕著な性能を示し、実世界の知識がモデルパラメータにエンコードされていることを示唆する。しかし、限られた知識領域におけるいくつかの探索課題の他に、LLMの知識を体系的に評価する方法や、その知識能力がいかに一般化するかは、知識領域や徐々に複雑化するタスク形式でよく理解されていない。そこで本研究では,LLMの知識一般化能力を総合的に研究するための知識集約型ベンチマークKGQuizを提案する。 KGQuizは3つの知識ドメインをカバーするスケーラブルなフレームワークで、複雑さを増す5つのタスクで構成されている。我々は,LLMの知識能力とその一般化をより深く理解するために,KGQuizベンチマークを用いて,5つの知識集約タスクと知識領域の10個のオープンソースおよびブラックボックスLSMを評価した。大規模な実験では、LLMは簡単な知識のQAタスクにおいて印象的なパフォーマンスを達成する一方で、より複雑な推論やドメイン固有の事実の活用を必要とする設定やコンテキストは依然として重大な課題を呈している。 kgquizをテストベッドとして、ドメインとタスクフォーマット間のパフォーマンスの微妙な変動を分析し、最終的には幅広い知識ドメインとタスクにわたってllmsの知識能力を理解し、評価し、改善することを想定した。 Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. 
Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.	翻訳日:2023-10-26 19:21:25 公開日:2023-10-24
# ノード回帰/分類のための無限幅グラフニューラルネットワーク Infinite Width Graph Neural Networks for Node Regression/ Classification ( http://arxiv.org/abs/2310.08176v2 ) ライセンス: Link先を確認	Yunus Cobanoglu	(参考訳) 本研究は,グラフ構造化データ上の完全連結深層ニューラルネットワークの一般化であるグラフニューラルネットワークについて,その幅,すなわち各完全連結層のノード数が無限大に増加する場合の解析を行う。 Infinite Width Neural Networksは、Deep Learningを、長い伝統と広範な理論的基盤を持つ機械学習フレームワークであるGaussian ProcessesとKernelsに接続する。 Gaussian ProcessesとKernelsは、ニューラルネットワークよりもハイパーパラメータがはるかに少なく、不確実性推定に使用できるため、アプリケーションに対してよりユーザフレンドリである。この研究は、ガウス過程とカーネルをニューラルネットワークに接続する、増えつつある研究を拡張するものである。 Kernel と Gaussian Process のクローズドフォームは、標準の Graph Neural Network、Skip-Concatenate Connections を備えた Graph Neural Network、Graph Attention Neural Network など、さまざまなアーキテクチャに対して導出される。すべてのアーキテクチャは、トランスダクティブなノード回帰と分類のタスクにおいて、さまざまなデータセット上で評価される。さらに、Effective Resistance(有効抵抗)として知られるスペクトルスパーシフィケーション手法を、ランタイムとメモリ要求を改善するために使用する。インダクティブグラフ学習タスク(グラフ回帰/分類)への設定の拡張は簡単であり、3.5で簡単に議論される。 This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on Graph structured data, when their width, that is, the number of nodes in each fully-connected layer, increases to infinity. Infinite Width Neural Networks connect Deep Learning to Gaussian Processes and Kernels, both Machine Learning Frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have far fewer hyperparameters than Neural Networks and can be used for uncertainty estimation, making them more user-friendly for applications. This work extends the increasing amount of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. 
Extending the setting to inductive graph learning tasks (Graph Regression/Classification) is straightforward and is briefly discussed in Section 3.5.	翻訳日:2023-10-26 19:20:36 公開日:2023-10-24
# 最適な探索はトンプソンサンプリングよりも難しくない Optimal Exploration is no harder than Thompson Sampling ( http://arxiv.org/abs/2310.06069v2 ) ライセンス: Link先を確認	Zhaoqi Li, Kevin Jamieson, Lalit Jain	(参考訳) アームの集合 $\mathcal{Z}\subset \mathbb{R}^d$ と未知のパラメータベクトル $\theta_\ast\in\mathbb{R}^d$ が与えられたとき、純粋探索線形バンディット問題は、$x\in \mathcal{X}\subset \mathbb{R}^d$ に対する $x^{\top}\theta_{\ast}$ のノイズ測定を通じて、$\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$ を高い確率で返すことを目的としている。既存の(漸近的に)最適な方法は、a) 各アーム $z\in \mathcal{Z}$ に対する潜在的にコストがかかるプロジェクション、または b) 各時点で検討対象となる $\mathcal{Z}$ のサブセットを明示的に保持すること、のいずれかを必要とする。この複雑さは、後悔の最小化のために人気があり単純なトンプソンサンプリングアルゴリズムと対照的である。トンプソンサンプリングは事後サンプリングとargmaxオラクルへのアクセスを必要とするだけであり、任意の時点で$\mathcal{Z}$を列挙する必要はない。残念ながら、トンプソンサンプリングは純粋な探索に最適ではないことが知られている。本研究では、最適な探索が可能で、トンプソンサンプリングと同じ計算プリミティブしか必要としないアルゴリズムは存在するのか、という自然な問いを立てる。私たちはその質問に肯定的に答える。我々はサンプリングとargmaxオラクルのみを利用するアルゴリズムを提供し、指数関数的な収束率を達成し、その指数は漸近的に可能な全ての割り当ての中で最適である。さらに,本アルゴリズムは容易に実装でき,既存の漸近的最適手法と同等の経験的性能を示す。 Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$, with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. 
We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being the optimal among all possible allocations asymptotically. In addition, we show that our algorithm can be easily implemented and performs as well empirically as existing asymptotically optimal methods.	翻訳日:2023-10-26 19:19:33 公開日:2023-10-24
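The two computational primitives named in this abstract, a posterior sampler and an argmax oracle, can be sketched as follows. This is only an illustration of one step of plain Thompson Sampling, not the authors' pure-exploration algorithm. The independent Gaussian posterior (diagonal covariance) and the names `sample_posterior`, `argmax_oracle`, and `thompson_step` are assumptions made for the example.

```python
import random


def sample_posterior(mean, var, rng):
    """Draw theta from an assumed independent Gaussian posterior N(mean, diag(var))."""
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]


def argmax_oracle(arms, theta):
    """Return the arm z maximising the inner product z^T theta."""
    return max(arms, key=lambda z: sum(zi * ti for zi, ti in zip(z, theta)))


def thompson_step(arms, mean, var, rng):
    """One Thompson Sampling step: sample a parameter, then query the argmax oracle."""
    theta = sample_posterior(mean, var, rng)
    return argmax_oracle(arms, theta)


rng = random.Random(0)
arms = [(1.0, 0.0), (0.0, 1.0)]
# With zero posterior variance the sample equals the mean, so the choice is deterministic.
chosen = thompson_step(arms, mean=[1.0, 2.0], var=[0.0, 0.0], rng=rng)
```

In this sketch the arm set is a short list, so the oracle is a linear scan. The point made in the abstract is that the oracle can be any routine that solves the argmax without the algorithm itself enumerating $\mathcal{Z}$.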
# Segue: 現実世界における顔のプライバシー保護のための、サイド情報に導かれた生成的な学習不能例 Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World ( http://arxiv.org/abs/2310.16061v1 ) ライセンス: Link先を確認	Zhiling Zhang, Jie Zhang, Kui Zhang, Wenbo Zhou, Weiming Zhang, and Nenghai Yu	(参考訳) 顔認識技術の普及は、多くの個人が顔データの収集と利用を心配しているため、プライバシー上の懸念を引き起こしている。これらの懸念に対処するため、研究者はモデルトレーニング段階におけるデータに知覚不可能な摂動を加えることで、モデルが対象の顔の識別的特徴を学習するのを防ぐことを目的として、「非学習可能な例」の概念を積極的に検討している。しかし、現在の手法は非効率であり、トランスファービリティとロバスト性を同時に保証できないため、現実世界では実用的でない。そこで本研究では,Segue: Side-information guided generative unlearnable examples という新しい手法を提案する。具体的には,一度学習すれば繰り返し利用できるモデルを用いて,時間のかかる勾配ベースの手法ではなく,所望の摂動を生成する。転送性を改善するために,異なるシナリオ間で本質的に一貫している真のラベルや擬似ラベルなどのサイド情報を導入する。堅牢性向上のため、トレーニングパイプラインには歪み層が組み込まれている。広範な実験により、提案法が従来の方法よりはるかに高速であることが証明され(1000$\times$)、異なるデータセットとモデルアーキテクチャ間で転送可能な有効性を達成する。さらに、JPEG圧縮、敵対的トレーニング、およびいくつかの標準的なデータ拡張に抵抗することができる。 The widespread use of face recognition technology has given rise to privacy concerns, as many individuals are worried about the collection and utilization of their facial data. To address these concerns, researchers are actively exploring the concept of "unlearnable examples", by adding imperceptible perturbation to data in the model training stage, which aims to prevent the model from learning discriminative features of the target face. However, current methods are inefficient and cannot guarantee transferability and robustness at the same time, causing impracticality in the real world. To remedy this, we propose a novel method called Segue: Side-information guided generative unlearnable examples. Specifically, we leverage a once-trained multiple-used model to generate the desired perturbation rather than the time-consuming gradient-based method. To improve transferability, we introduce side information such as true labels and pseudo labels, which are inherently consistent across different scenarios. For robustness enhancement, a distortion layer is integrated into the training pipeline. 
Extensive experiments demonstrate that the proposed Segue is much faster than previous methods (1000$\times$) and achieves transferable effectiveness across different datasets and model architectures. Furthermore, it can resist JPEG compression, adversarial training, and some standard data augmentations.	翻訳日:2023-10-26 19:10:31 公開日:2023-10-24
# GOES-16衛星観測を用いた対流開始ナウキャスティングのための物理的に説明可能な深層学習 Physically Explainable Deep Learning for Convective Initiation Nowcasting Using GOES-16 Satellite Observations ( http://arxiv.org/abs/2310.16015v1 ) ライセンス: Link先を確認	Da Fan, Steven J. Greybush, David John Gagne II, and Eugene E. Clothiaux	(参考訳) Convection Initiation (CI) nowcasting は、数値天気予報モデルと既存の nowcasting アルゴリズムの両方において難しい問題である。本研究では,多チャンネル赤外線GOES-R衛星観測に基づくCI予測のためのオブジェクトベース確率的深層学習モデルを開発した。このデータは、2020年6月・7月および2021年6月に、グレートプレーンズ地域のマルチレーダーマルチセンサードップラー気象レーダ製品で特定されたCIの可能性のある事象の周辺パッチから得られたものだ。客観的なレーダーベースのアプローチは、これらのイベントを識別するために使用される。ディープラーニングモデルは、特に誤報率において、リードタイムで最大1時間までの古典的ロジスティックモデルを著しく上回る。ケーススタディを通じて、深層学習モデルは、雲と湿気の特性に複数のレベルで依存することを示す。モデル説明は、モデルの決定過程を異なるベースラインで明らかにする。説明結果は,ベースラインの選択によって異なるレベルの水分と雲の特徴の重要性を強調した。本研究は, モデル行動の理解を深め, 科学的洞察を得る上で, 異なるベースラインを用いることの利点を示す。 Convection initiation (CI) nowcasting remains a challenging problem for both numerical weather prediction models and existing nowcasting algorithms. In this study, object-based probabilistic deep learning models are developed to predict CI based on multichannel infrared GOES-R satellite observations. The data come from patches surrounding potential CI events identified in Multi-Radar Multi-Sensor Doppler weather radar products over the Great Plains region from June and July 2020 and June 2021. An objective radar-based approach is used to identify these events. The deep learning models significantly outperform the classical logistic model at lead times up to 1 hour, especially on the false alarm ratio. Through case studies, the deep learning model exhibits the dependence on the characteristics of clouds and moisture at multiple levels. Model explanation further reveals the model's decision-making process with different baselines. The explanation results highlight the importance of moisture and cloud features at different levels depending on the choice of baseline. 
Our study demonstrates the advantage of using different baselines in further understanding model behavior and gaining scientific insights.	翻訳日:2023-10-26 19:09:57 公開日:2023-10-24
# 量子シミュレータにおける位相励起と創発フェルミオンのゲージ冷却 Gauged cooling of topological excitations and emergent fermions on quantum simulators ( http://arxiv.org/abs/2310.16082v1 ) ライセンス: Link先を確認	Gilad Kishony, Mark S. Rudner, Achim Rosch, Erez Berg	(参考訳) シミュレーション冷却は、短期量子シミュレータ上で多体ハミルトニアンの低エネルギー状態を作成するための堅牢な方法である。このようなスキームでは、シミュレータのスピン(またはキュービット)のサブセットは、興味のあるシステムからエネルギーとエントロピーを抽出する「バス」として扱われる。しかし、このようなプロトコルは、トポロジカル位相のような微視的な自由度で励起が極めて非局所的なシステムに適用される場合、非効率であり、そのような励起は浴への局所結合によって抽出することが困難である。我々は,システムの自由度を非局所的に量子シミュレータに符号化することで,この障害を克服するための経路を探究する。提案手法を説明するために,IsingスピンをZ_2$ゲージ場に結合し,励起を除去するための貯留体として同時に機能する"ゲージ冷却"プロトコルを用いて,励起がドメイン壁である量子Isingモデルの強磁性相を効率的に冷却する方法を示す。本プロトコルは強磁性相と常磁性相の基底状態を等しく効率的に作成できることを示す。ゲージ冷却プロトコルは自然に(相互作用する)フェルミオン系に拡張され、単一フェルミオンホッピングによるフェルミオン浴とのカップリングによる冷却に相当する。 Simulated cooling is a robust method for preparing low-energy states of many-body Hamiltonians on near-term quantum simulators. In such schemes, a subset of the simulator's spins (or qubits) are treated as a "bath," which extracts energy and entropy from the system of interest. However, such protocols are inefficient when applied to systems whose excitations are highly non-local in terms of the microscopic degrees of freedom, such as topological phases of matter; such excitations are difficult to extract by a local coupling to a bath. We explore a route to overcome this obstacle by encoding of the system's degrees of freedom into those of the quantum simulator in a non-local manner. To illustrate the approach, we show how to efficiently cool the ferromagnetic phase of the quantum Ising model, whose excitations are domain walls, via a "gauged cooling" protocol in which the Ising spins are coupled to a $Z_2$ gauge field that simultaneously acts as a reservoir for removing excitations. We show that our protocol can prepare the ground states of the ferromagnetic and paramagnetic phases equally efficiently. 
The gauged cooling protocol naturally extends to (interacting) fermionic systems, where it is equivalent to cooling by coupling to a fermionic bath via single-fermion hopping.	翻訳日:2023-10-26 19:01:54 公開日:2023-10-24
# 線形トランスフォーマーの実用計算能力とその再帰的・自己参照的拡張 Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions ( http://arxiv.org/abs/2310.16076v1 ) ライセンス: Link先を確認	Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber	(参考訳) 最近のリカレントニューラルネットワーク(RNN)の計算能力の研究は、リアルタイムおよび有限精度の仮定を与えられたRNNアーキテクチャの階層構造を明らかにしている。本稿では,線形化アテンションを持つ自己回帰トランスフォーマー,すなわち線形トランスフォーマー(LT)またはFast Weight Programmers(FWP)について検討する。 LTは固定サイズの状態を持つRNNライクなシーケンスプロセッサと等価であるという意味で特有であり、今や人気になっている自己アテンションネットワークとしても表現できる。本稿では,標準トランスフォーマーについてよく知られた多くの結果がLT/FWPにそのまま当てはまることを示す。形式言語認識実験により,リカレントFWPや自己参照重み行列といった最近提案されたFWP拡張が,例えばパリティ問題での一般化を可能にするなど,LTのいくつかの制限を克服することに成功したことを示す。私たちのコードは公開されています。 Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public.	翻訳日:2023-10-26 19:01:30 公開日:2023-10-24
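The equivalence stated in this abstract, that a linear Transformer can be run either as an RNN-like processor with a fixed-size state or as causal self-attention, can be checked numerically with a small sketch. The feature map `phi` (ELU plus one), the normalisation by the summed keys, and all function names are assumptions chosen for illustration. They are one common formulation of linearised attention and are not taken from the paper's code.

```python
import math


def phi(x):
    """Assumed positive feature map, elu(x) + 1, applied element-wise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention_recurrent(queries, keys, values):
    """RNN view: a fixed-size state (matrix S, vector z) is updated once per step."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_k for _ in range(d_v)]  # running sum of outer(v, phi(k))
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk, fq = phi(k), phi(q)
        for i in range(d_v):
            for j in range(d_k):
                S[i][j] += v[i] * fk[j]
        for j in range(d_k):
            z[j] += fk[j]
        denom = sum(z[j] * fq[j] for j in range(d_k))
        outputs.append(
            [sum(S[i][j] * fq[j] for j in range(d_k)) / denom for i in range(d_v)]
        )
    return outputs


def linear_attention_causal(queries, keys, values):
    """Attention view: each step re-weights all past values by phi(k_s) . phi(q_t)."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        scores = [
            sum(a * b for a, b in zip(phi(keys[s]), fq)) for s in range(t + 1)
        ]
        total = sum(scores)
        outputs.append(
            [sum(scores[s] * values[s][i] for s in range(t + 1)) / total
             for i in range(d_v)]
        )
    return outputs
```

The recurrent form stores only `S` and `z`, whose sizes do not depend on the sequence length, while the attention form revisits every earlier key and value at each step.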
# RePoseDM: Pose Guided Image Synthesis における繰り返しポッドアライメントとグラディエントガイダンス RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis ( http://arxiv.org/abs/2310.16074v1 ) ライセンス: Link先を確認	Anant Khandelwal	(参考訳) ポーズ誘導された人物画像合成タスクは、フォトリアリスティックな外観と欠陥のないポーズ転送を備えた参照イメージを再レンダリングする必要がある。人物画像は高度に構造化されているため、既存のアプローチでは複雑な変形や閉塞のために密接な接続を必要としている。しかし畳み込みニューラルネットワークによって生成される特徴マップには等分散性がなく、したがってマルチレベルウォーピングでさえ完全なポーズアライメントを持っていない。拡散モデルが与えられた条件付きガイダンスからフォトリアリスティックな画像を生成する能力に着想を得て,ポーズアライメントを条件付きガイダンスとして提案する。さらに,対象ポーズからの距離を入力として適切なポーズ多様体から出力するポーズ相互作用場からの勾配誘導を提案する。これは、フォトリアリズムと非歪なテクスチャの詳細をもたらす、もっともらしいポーズ伝達軌道の学習に役立つ。 2つの大規模ベンチマークとユーザ調査の結果から,提案手法が課題シナリオにおいて,フォトリアリスティックなポーズ伝達を生成する可能性を実証した。また,HumanArtデータセット上でのポーズ誘導画像生成における勾配誘導の効率性を示す。 Pose-guided person image synthesis task requires re-rendering a reference image, which should have a photorealistic appearance and flawless pose transfer. Since person images are highly structured, existing approaches require dense connections for complex deformations and occlusions because these are generally handled through multi-level warping and masking in latent space. But the feature maps generated by convolutional neural networks do not have equivariance, and hence even the multi-level warping does not have a perfect pose alignment. Inspired by the ability of the diffusion model to generate photorealistic images from the given conditional guidance, we propose recurrent pose alignment to provide pose-aligned texture features as conditional guidance. Moreover, we propose gradient guidance from pose interaction fields, which output the distance from the valid pose manifold given a target pose as input. This helps in learning plausible pose transfer trajectories that result in photorealism and undistorted texture details. Extensive results on two large-scale benchmarks and a user study demonstrate the ability of our proposed approach to generate photorealistic pose transfer under challenging scenarios. 
Additionally, we prove the efficiency of gradient guidance in pose-guided image generation on the HumanArt dataset with fine-tuned stable diffusion.	翻訳日:2023-10-26 19:01:11 公開日:2023-10-24
# ビデオにおける非偏在シーングラフ生成の相関バイアス Correlation Debiasing for Unbiased Scene Graph Generation in Videos ( http://arxiv.org/abs/2310.16073v1 ) ライセンス: Link先を確認	Anant Khandelwal	(参考訳) ビデオからの動的シーングラフ生成(SGG)は、時間的変動に起因するシーン全体のオブジェクトを包括的に理解するだけでなく、時間的動きと異なるオブジェクトとの相互作用のモデルを必要とする。さらに、視覚関係のロングテール分布は、多くの動的SGG法において重要なボトルネックであり、そのほとんどが複雑なアーキテクチャを用いて時空間的コンテキストを捉えることに焦点を当てており、バイアス付きシーングラフの生成に繋がる。これらの課題に対処するために,フローアウェアな時間的一貫性と不確実性との相関脱バイアスを,非バイアス動的シーングラフに対して提案する。 FloCoDeはフローを使ってフレーム全体の時間的に一貫したオブジェクトを検出する。さらに、ロングテールクラスの偏りのない関係表現を学ぶために相関デバイアスを用いる。さらに、予測の不確実性を弱めるために、sigmoidal cross-entropy loss と contrastive loss を混合してラベル相関を組み込んで、共通の共起関係を識別し、長い尾を持つ関係を弱めるのに役立つ。大規模な実験的評価は、より偏りのないシーングラフを生成する優位性を示す最大4.1%のパフォーマンス向上を示している。 Dynamic scene graph generation (SGG) from videos requires not only comprehensive understanding of objects across the scenes that are prone to temporal fluctuations but also a model of the temporal motions and interactions with different objects. Moreover, the long-tailed distribution of visual relationships is the crucial bottleneck of most dynamic SGG methods, since most of them focus on capturing spatio-temporal context using complex architectures, which leads to the generation of biased scene graphs. To address these challenges, we propose FloCoDe: Flow-aware temporal consistency and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs. FloCoDe employs feature warping using flow to detect temporally consistent objects across the frames. In addition, it uses correlation debiasing to learn the unbiased relation representation for long-tailed classes. Moreover, to attenuate the predictive uncertainties, it uses a mixture of sigmoidal cross-entropy loss and contrastive loss to incorporate label correlations to identify the commonly co-occurring relations and help debias the long-tailed ones. Extensive experimental evaluation shows a performance gain as high as 4.1% showing the superiority of generating more unbiased scene graphs.	翻訳日:2023-10-26 19:00:50 公開日:2023-10-24
# 畳み込みLSTMを用いた大学キャンパスにおける格子周波数予測 Grid Frequency Forecasting in University Campuses using Convolutional LSTM ( http://arxiv.org/abs/2310.16071v1 ) ライセンス: Link先を確認	Aneesh Sathe, Wen Ren Yang	(参考訳) 現代の電力網は複雑化しており、主に再生可能エネルギー源の統合と消費パターンの進化に起因している。本稿では,畳み込みニューラルネットワーク(CNN)とLong Short-Term Memory(LSTM)を用いて,グリッド周波数の時系列予測モデルを構築する手法を提案する。これらのモデルは、グリッド周波数データに固有の時空間的複雑さを効果的に捉え、予測精度を著しく向上し、電力グリッドの信頼性を高める。本研究は,大学キャンパス内の建物を対象とした個別コンボリューショナルLSTM(ConvLSTM)モデルの可能性と開発について検討し,各建物に対して個別に学習し,評価することを可能にする。個々のConvLSTMモデルは、各キャンパスビルの電力消費データに基づいて訓練され、歴史的傾向に基づいてグリッド周波数を予測する。その結果、平均二乗誤差(mse)、平均絶対誤差(mae)、平均絶対パーセンテージ誤差(mape)といった性能指標によって示される従来の予測手法よりも、提案モデルが優れていることが示された。さらに、アンサンブルモデルによって、建物固有のモデルから洞察を集約し、キャンパス全体に包括的な予測を提供する。このアプローチは、各建物固有の電力消費データのプライバシーとセキュリティを保証する。 The modern power grid is facing increasing complexities, primarily stemming from the integration of renewable energy sources and evolving consumption patterns. This paper introduces an innovative methodology that harnesses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to establish robust time series forecasting models for grid frequency. These models effectively capture the spatiotemporal intricacies inherent in grid frequency data, significantly enhancing prediction accuracy and bolstering power grid reliability. The research explores the potential and development of individualized Convolutional LSTM (ConvLSTM) models for buildings within a university campus, enabling them to be independently trained and evaluated for each building. Individual ConvLSTM models are trained on power consumption data for each campus building and forecast the grid frequency based on historical trends. The results convincingly demonstrate the superiority of the proposed models over traditional forecasting techniques, as evidenced by performance metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). 
Additionally, an Ensemble Model is formulated to aggregate insights from the building-specific models, delivering comprehensive forecasts for the entire campus. This approach ensures the privacy and security of power consumption data specific to each building.	翻訳日:2023-10-26 19:00:26 公開日:2023-10-24
# 交通予測のための時空間ハイパーグラフニューラルネットワーク Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting ( http://arxiv.org/abs/2310.16070v1 ) ライセンス: Link先を確認	Chengzhi Yao, Zhi Li, Junbo Wang	(参考訳) モバイルインターネット開発と位置技術から恩恵を受ける交通予測は、インテリジェントトランスポーテーションシステムにおいて重要な役割を果たす。豊かで多様な交通アプリケーションの実装や、収集された交通データに基づいた便利な交通サービスの実現に役立ちます。既存のほとんどの手法はグラフベースのディープラーニングネットワークを利用して、交通予測を浅くする複雑な道路ネットワークをモデル化する。その効果にもかかわらず、これらの手法は一般に道路網のトポロジと交通力学による高次時間的依存関係によって引き起こされる高次空間依存性を完全に捉えることに制限されている。道路網のトポロジと交通力学を組み合わせて交通データの高次時空間依存性を捕捉するSTHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Networkを提案する。技術的には、STHODEは空間モジュールと時間モジュールから構成される。一方,空間ハイパーグラフを構築し,適応型mixhopハイパーグラフodeネットワークを用いて高次空間依存性をキャプチャする。一方,時間的ハイパーグラフを用い,ハイパーエッジ進化型odeネットワークを用いて高次時間的依存関係をキャプチャする。最後に、積み重ねられたSTHODE層の出力を集約し、予測性能を相互に向上する。 4つの実世界のトラヒックデータセットで行った広範囲な実験により、提案モデルの性能が様々なベースラインよりも優れていることを示した。 Traffic forecasting, which benefits from mobile Internet development and position technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods usually leverage graph-based deep learning networks to model the complex road network for traffic forecasting shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. 
On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.	翻訳日:2023-10-26 19:00:01 公開日:2023-10-24
# CPSeg: Chain-of-Thought 言語プロンプトによる細かな画像意味セグメンテーション CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting ( http://arxiv.org/abs/2310.16069v1 ) ライセンス: Link先を確認	Lei Li	(参考訳) 自然シーン分析とリモートセンシング画像は、大規模言語誘導コンテキスト認識データ利用の進歩に大きな可能性を秘めている。このポテンシャルは、設計言語プロンプトによるオブジェクト検出やセグメンテーションといった下流タスクのパフォーマンス向上に特に重要である。そこで本稿では,画像に関連づけられたテキスト情報を活用した新たな「思考の連鎖」プロセスを統合することにより,画像分割性能を向上させるための革新的なフレームワークである CPSeg を紹介する。この画期的なアプローチは洪水災害のシナリオに適用されている。 CPSegは、様々な文から派生したプロンプトテキストを符号化し、コヒーレント連鎖を定式化する。我々は、画像、セマンティックマスク、および対応するテキスト情報を含む新しい視覚言語データセット、FloodPromptを提案する。これはシナリオの意味的理解を強化するだけでなく、ピクセルとテキストのマッチングマップの相互作用を通じて意味的セグメンテーションの重要なタスクを支援する。 CPSegの有効性を質的,定量的に検証した。 Natural scene analysis and remote sensing imagery offer immense potential for advancements in large-scale language-guided context-aware data utilization. This potential is particularly significant for enhancing performance in downstream tasks such as object detection and segmentation with designed language prompting. In light of this, we introduce CPSeg (Chain-of-Thought Language Prompting for Finer-grained Semantic Segmentation), an innovative framework designed to augment image segmentation performance by integrating a novel "Chain-of-Thought" process that harnesses textual information associated with images. This groundbreaking approach has been applied to a flood disaster scenario. CPSeg encodes prompt texts derived from various sentences to formulate a coherent chain-of-thought. We propose a new vision-language dataset, FloodPrompt, which includes images, semantic masks, and corresponding text information. This not only strengthens the semantic understanding of the scenario but also aids in the key task of semantic segmentation through an interplay of pixel and text matching maps. Our qualitative and quantitative analyses validate the effectiveness of CPSeg.	翻訳日:2023-10-26 18:59:38 公開日:2023-10-24
# 超次元変換:関数のホログラフィック表現 The Hyperdimensional Transform: a Holographic Representation of Functions ( http://arxiv.org/abs/2310.16065v1 ) ライセンス: Link先を確認	Pieter Dewulf, Michiel Stock, Bernard De Baets	(参考訳) 積分変換は、関数をキャラクタリゼーションが容易な空間にマッピングする貴重な数学的ツールである。我々は超次元変換を新しい積分変換として導入する。正方積分可能な関数を超次元ベクトルと呼ばれるノイズロバスト、ホログラフィック、高次元表現に変換する。中心となる考え方は、関数をランダム関数の線型結合で近似することである。確率的直交基底関数の集合を正式に導入し、超次元変換とその逆写像を定義する。本稿では、その特異性、逆変換の近似特性、積分と微分の表現など、一般的な変換関連特性について論じる。超次元変換は、フーリエ変換、ラプラス変換、ファジィ変換などの他の積分変換と密接に結合する強力で柔軟なフレームワークを提供する。さらに、より効率的で説明可能な機械学習アルゴリズムに急速に注目を集めている超次元コンピューティングの分野に対する理論的基礎と新しい洞察を提供し、統計モデリングや機械学習の潜在的な応用の可能性を提供する。さらに,チュートリアルとして機能し,変換の計算から微分方程式の解法まで,実例の再現を可能にする,簡単で分かりやすいコードも提供する。 Integral transforms are invaluable mathematical tools to map functions into spaces where they are easier to characterize. We introduce the hyperdimensional transform as a new kind of integral transform. It converts square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors. The central idea is to approximate a function by a linear combination of random functions. We formally introduce a set of stochastic, orthogonal basis functions and define the hyperdimensional transform and its inverse. We discuss general transform-related properties such as its uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives. The hyperdimensional transform offers a powerful, flexible framework that connects closely with other integral transforms, such as the Fourier, Laplace, and fuzzy transforms. Moreover, it provides theoretical foundations and new insights for the field of hyperdimensional computing, a computing paradigm that is rapidly gaining attention for efficient and explainable machine learning algorithms, with potential applications in statistical modelling and machine learning. 
In addition, we provide straightforward and easily understandable code, which can function as a tutorial and allows for the reproduction of the demonstrated examples, from computing the transform to solving differential equations.	翻訳日:2023-10-26 18:59:19 公開日:2023-10-24
# 学習可能なフィルタモジュールによる交通予測の強化 Enhancing Traffic Prediction with Learnable Filter Module ( http://arxiv.org/abs/2310.16063v1 ) ライセンス: Link先を確認	Yuanshao Zhu, Yongchao Ye, Xiangyu Zhao, and James J.Q. Yu	(参考訳) 将来の交通条件のモデル化は、時間的および空間的相関を捉えるために複雑な空間-時間的ニューラルネットワークに大きく依存することが多い。このノイズは、しばしば交通観測における予期せぬ短期的なピークや落下として現れ、交通事故や固有のセンサー振動によって引き起こされる。実際には、そのようなノイズはその確率的性質のためにモデル化することが困難であり、ニューラルネットワークがこの振る舞いを学習するように設計された場合、リスクを過度に当てはめる可能性がある。この問題に対処するために,トラフィックデータのノイズを適応的にフィルタする学習可能なフィルタモジュールを提案する。このモジュールはフーリエ変換を利用して、そのパターンに基づいてノイズがフィルタリングされる周波数領域にデータを変換する。離散データは逆フーリエ変換を用いて時間領域に復元される。提案手法は,交通予測モデルにおける入力データの品質向上に重点を置いている。提案するモジュールは軽量であり,既存モデルとの統合が容易であり,トラフィック予測性能を大幅に向上できることを示す。さらに,実世界のデータセットに対する広範囲な実験結果を用いて検証を行い,ノイズを効果的に軽減し,予測精度を向上させることを示す。 Modeling future traffic conditions often relies heavily on complex spatial-temporal neural networks to capture spatial and temporal correlations, which can overlook the inherent noise in the data. This noise, often manifesting as unexpected short-term peaks or drops in traffic observation, is typically caused by traffic accidents or inherent sensor vibration. In practice, such noise can be challenging to model due to its stochastic nature and can lead to overfitting risks if a neural network is designed to learn this behavior. To address this issue, we propose a learnable filter module to filter out noise in traffic data adaptively. This module leverages the Fourier transform to convert the data to the frequency domain, where noise is filtered based on its pattern. The denoised data is then recovered to the time domain using the inverse Fourier transform. Our approach focuses on enhancing the quality of the input data for traffic prediction models, which is a critical yet often overlooked aspect in the field. We demonstrate that the proposed module is lightweight, easy to integrate with existing models, and can significantly improve traffic prediction performance. 
Furthermore, we validate our approach with extensive experimental results on real-world datasets, showing that it effectively mitigates noise and enhances prediction accuracy.	翻訳日:2023-10-26 18:59:01 公開日:2023-10-24
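The three steps described in this abstract (Fourier transform, filtering in the frequency domain, inverse transform back to the time domain) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' module: the per-frequency weights are fixed by hand as a low-pass mask instead of being learned, the noise is a single high-frequency oscillation, and the names `dft`, `idft`, `frequency_filter`, and `low_pass_weights` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, sufficient for a short series."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def frequency_filter(signal, weights):
    """Scale each frequency bin by its weight, then return to the time domain."""
    spectrum = dft(signal)
    filtered = [w * c for w, c in zip(weights, spectrum)]
    return [value.real for value in idft(filtered)]


def low_pass_weights(n, cutoff):
    """Hand-set stand-in for learned weights: keep bins whose frequency is <= cutoff."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


n = 32
clean = [math.sin(2 * math.pi * t / n) for t in range(n)]
# Assumed noise model: one high-frequency oscillation added to the slow trend.
noisy = [c + 0.5 * math.cos(2 * math.pi * 12 * t / n) for t, c in enumerate(clean)]
denoised = frequency_filter(noisy, low_pass_weights(n, cutoff=2))
max_error = max(abs(d - c) for d, c in zip(denoised, clean))
```

In a learnable version the entries of `weights` would be trainable parameters updated together with the downstream prediction model, so the filter shape is fitted to the data instead of being chosen in advance.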
# 調整済み大規模モデルの逆行領域適応における共同設立者のバランシング Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning ( http://arxiv.org/abs/2310.16062v1 ) ライセンス: Link先を確認	Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu	(参考訳) プレトレーニング済みの大規模モデル(PLM)における優れた一般化、文脈学習、および出現能力は、直接トレーニングデータなしで特定のタスクを処理し、ソースドメインから学習した知識をターゲットドメインに転送するために、敵対的ドメイン適応(ADA)手法のより良い基礎モデルとなる。しかし、既存のadaメソッドは、ターゲットドメインと異なるソースデータ分散の根本原因である、confounderを適切に考慮していない。本研究では, PLMs fine-tuning (ADA-CBF) のための共創バランシングを用いた対向ドメイン適応を提案する。 ADA−CBFは、特徴抽出器、ドメイン分類器及び共同分類器の基盤モデルとしてPLMを含み、対向的損失で共同訓練される。この損失は、ドメイン分類器の識別を希釈することで、ドメイン不変表現学習を改善するために設計されている。同時に、敵対的損失は、トレーニング中のソースドメインと未測定ドメインの共作者分布のバランスをとる。既存のADA法と比較して、ADA-CBFはドメイン不変の特徴の共創者を正しく識別し、PLMから抽出した特徴の共創バイアスを取り除くことができる。 ADA-CBFの共創者分類器はプラグアンドプレイとして設計されており、共創者計測可能、測定不能、または部分的に測定可能な環境に適用することができる。自然言語処理とコンピュータビジョンダウンストリームタスクの実証結果は、ADA-CBFが最新のGPT-4, LLaMA2, ViT, ADAメソッドより優れていることを示している。 The excellent generalization, contextual learning, and emergence abilities in the pre-trained large models (PLMs) handle specific tasks without direct training data, making them the better foundation models in the adversarial domain adaptation (ADA) methods to transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account for the confounder properly, which is the root cause of the source data distribution that differs from the target domains. This study proposes an adversarial domain adaptation with confounder balancing for PLMs fine-tuning (ADA-CBF). The ADA-CBF includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, and they are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. 
Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in ADA-CBF is designed as a plug-and-play and can be applied in the confounder measurable, unmeasurable, or partially measurable environments. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.	翻訳日:2023-10-26 18:58:41 公開日:2023-10-24
# グラフを用いた分散オンライン学習のための局所的個人的勾配追跡 Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs ( http://arxiv.org/abs/2310.16105v1 ) ライセンス: Link先を確認	Ziqin Chen and Yongqiang Wang	(参考訳) 分散オンライン学習は、ストリーミングデータを含む大規模な機械学習問題を解決するのに極めて効果的であることが証明されている。しかし、分散学習における学習者間の情報共有は、個々の学習者のセンシティブなデータの漏洩を懸念させる。このリスクを軽減するため、分散オンライン学習において、プライバシー保護の「金の標準」として広く見なされている差分プライバシーが、多くの既存の結果に広く採用されている。しかし、これらの結果はしばしば、学習精度とプライバシーの根本的なトレードオフに直面します。本稿では,このトレードオフを回避するために,局所的微分勾配追跡に基づく分散オンライン学習アルゴリズムを提案する。解析の結果,提案アルゴリズムは局所的差分プライバシーの厳密性を確保しつつ,平均二乗に収束し,反復回数が無限大となる場合においても,累積的プライバシー予算が有限であることが保証された。このアルゴリズムは、学習者間のコミュニケーショングラフが向けられた場合でも適用できる。私たちの知る限りでは、有向グラフ上の分散オンライン学習において、学習精度と厳密な局所微分プライバシーを同時に確保する最初の結果です。我々は,Mushroomsデータセットのロジスティック回帰と,MNISTデータセットとCIFAR-10データセットのCNN画像分類を含む,複数のベンチマーク機械学習アプリケーションを用いて,アルゴリズムの性能を評価する。実験の結果,提案アルゴリズムが既存のアルゴリズムよりも精度が向上していることが確認された。 Distributed online learning has been proven extremely effective in solving large-scale machine learning problems involving streaming data. However, information sharing between learners in distributed learning also raises concerns about the potential leakage of individual learners' sensitive data. To mitigate this risk, differential privacy, which is widely regarded as the "gold standard" for privacy protection, has been widely employed in many existing results on distributed online learning. However, these results often face a fundamental tradeoff between learning accuracy and privacy. In this paper, we propose a locally differentially private gradient tracking based distributed online learning algorithm that successfully circumvents this tradeoff. Our analysis shows that the proposed algorithm converges in mean square to the exact optimal solution while ensuring rigorous local differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. The algorithm is applicable even when the communication graph among learners is directed. 
To the best of our knowledge, this is the first result that simultaneously ensures learning accuracy and rigorous local differential privacy in distributed online learning over directed graphs. We evaluate our algorithm's performance by using multiple benchmark machine-learning applications, including logistic regression on the "Mushrooms" dataset and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets, respectively. The experimental results confirm that the proposed algorithm outperforms existing counterparts in both training and testing accuracies.	翻訳日:2023-10-26 18:49:48 公開日:2023-10-24
# LaksNet:Udacityシミュレーターにおける自動運転車のエンドツーエンドディープラーニングモデル LaksNet: an end-to-end deep learning model for self-driving cars in Udacity simulator ( http://arxiv.org/abs/2310.16103v1 ) ライセンス: Link先を確認	Lakshmikar R. Polamreddy and Youshan Zhang	(参考訳) 道路事故の大半は、注意散らし、無謀さ、飲酒運転など、人間の間違いによるものである。この危険な状況を克服する効果的な方法の1つは、車両に自動運転技術を実装することである。本稿では、自動運転車のための効率的なディープラーニングモデルの構築に着目する。本研究では、4つの畳み込み層と2つの完全連結層からなる新しい効果的畳み込みニューラルネットワークモデル「ラークスネット」を提案する。 Udacityシミュレータから生成されたトレーニングデータを用いて,LaksNetモデルを用いた広範な実験を行った。我々のモデルは、シミュレーターのトラックを降りることなく走行する車の走行時間において、既存のImageNetやNVIDIAモデルよりも優れています。 The majority of road accidents occur because of human errors, including distraction, recklessness, and drunken driving. One of the effective ways to overcome this dangerous situation is by implementing self-driving technologies in vehicles. In this paper, we focus on building an efficient deep-learning model for self-driving cars. We propose a new and effective convolutional neural network model called `LaksNet' consisting of four convolutional layers and two fully connected layers. We conduct extensive experiments using our LaksNet model with the training data generated from the Udacity simulator. Our model outperforms many existing pre-trained ImageNet and NVIDIA models in terms of the duration of the car for which it drives without going off the track on the simulator.	翻訳日:2023-10-26 18:49:23 公開日:2023-10-24
# 光子効率多光子顕微鏡のための学習・不確実性駆動型適応獲得 Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy ( http://arxiv.org/abs/2310.16102v1 ) ライセンス: Link先を確認	Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You	(参考訳) 多光子顕微鏡(MPM)は強力なイメージングツールであり、生体組織イメージングにおいて重要な効果がある。しかし、ほとんどの多光子顕微鏡プラットフォームは点走査に依存しているため、取得時間、視野(fov)、光毒性、および画質の間に固有のトレードオフがあり、高速で大きなfov、および/または穏やかな撮像が必要な場合、ノイズの測定結果が発生することが多い。深層学習は多光子顕微鏡測定に応用できるが、これらのアルゴリズムは幻覚を引き起こす傾向があり、医学や科学の分野では破滅的なものである。本稿では,多光子画像計測における画素方向の不確かさを同時に推定し,アルゴリズムの信頼性を改善し,深層学習予測のための統計的保証を提供する手法を提案する。さらに,この学習された画素単位の不確実性を利用して,サンプルの最も不確実な領域のみをスキャンする適応的取得手法を提案する。本研究では,ヒト子宮内膜組織のMPM測定実験において,微細な特徴を維持でき,各画素における不確かさを予測しながら,他の denoising 法より優れていることを示す。最後に, 適応的獲得手法を用いて, 試料中の微細な特徴を回収しながら, 120倍の取得時間と全光量削減効果を示した。実実験データを用いた復調作業における分布自由不確実性定量化と再構成不確実性に基づく適応的獲得の提案を最初に行った。 Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. 
We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty.	翻訳日:2023-10-26 18:49:15 公開日:2023-10-24
# 教師なしドメイン適応のためのDeep Feature Registration Deep Feature Registration for Unsupervised Domain Adaptation ( http://arxiv.org/abs/2310.16100v1 ) ライセンス: Link先を確認	Youshan Zhang and Brian D. Davison	(参考訳) ラベル付きソースドメインからラベルなしターゲットドメインへ知識を活用するために、教師なしドメイン適応が検討されているが、既存の手法は2つのドメイン間の分布アライメントに焦点を当てている。しかし、ソースとターゲットの特徴をよりよく整合させる方法には、うまく対応していない。本稿では,ドメイン不変特徴を維持し,ヒストグラムマッチングによる登録特徴と対象特徴のドメイン異同性を同時に最小化する,登録特徴を生成できるディープ特徴登録(DFR)モデルを提案する。さらに,確率的ソフトセレクションとセンターベースハードセレクションの両方を考慮して,ターゲット領域における擬似ラベルの品質を向上させる擬似ラベルリファインメントプロセスも採用する。複数のUDAベンチマークでの大規模な実験は、我々のDFRモデルの有効性を示し、その結果、新しい最先端の性能をもたらす。 While unsupervised domain adaptation has been explored to leverage the knowledge from a labeled source domain to an unlabeled target domain, existing methods focus on the distribution alignment between two domains. However, how to better align source and target features is not well addressed. In this paper, we propose a deep feature registration (DFR) model to generate registered features that maintain domain invariant features and simultaneously minimize the domain-dissimilarity of registered features and target features via histogram matching. We further employ a pseudo label refinement process, which considers both probabilistic soft selection and center-based hard selection to improve the quality of pseudo labels in the target domain. Extensive experiments on multiple UDA benchmarks demonstrate the effectiveness of our DFR model, resulting in new state-of-the-art performance.	翻訳日:2023-10-26 18:48:43 公開日:2023-10-24
# 半教師付き画像分割における解剖学的不確かさ Anatomically-aware Uncertainty for Semi-supervised Image Segmentation ( http://arxiv.org/abs/2310.16099v1 ) ライセンス: Link先を確認	Sukesh Adiga V, Jose Dolz, Herve Lombaert	(参考訳) 半教師付き学習は、ラベルなしデータを活用することにより、画像セグメンテーションのための大きなピクセル単位のラベル付きデータセットの必要性を緩和する。ラベルのないデータを利用するための顕著な方法は、モデル予測を規則化することである。非ラベルデータの予測は信頼できないため、不確実性認識スキームは徐々に有意義で信頼性の高い予測から学ぶために用いられる。しかしながら、不確実性推定法は、各トレーニングステップで計算しなければならないモデル予測から、計算コストの高い複数の推論に依存する。さらに、これらの不確実性マップは画素ワイドの差を捉え、グローバルな情報を考慮しない。本研究では,セグメント化マスクのグローバル情報を活用することによってセグメント化の不確実性を推定する手法を提案する。より正確には、解剖学的に認識された表現は、最初に利用可能なセグメンテーションマスクをモデル化することを学ぶ。学習表現は、新しいセグメンテーションの予測を解剖学的に表現可能なセグメンテーションにマップする。推定可能なセグメンテーションからのずれは、セグメンテーションネットワークをさらに導くために基礎となる画素レベルの不確かさを推定するのに役立つ。提案手法は,この表現から単一推論を用いて不確実性を推定し,全体の計算量を削減する。心臓MRIでは左心房,腹部CTでは多発臓器の2つの公用セグメンテーションデータセットについて検討した。我々の解剖学的手法は2つの一般的な評価指標を用いて,最先端の半教師付き手法よりもセグメンテーション精度を向上する。 Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. A prominent way to exploit unlabeled data is to regularize model predictions. Since the predictions of unlabeled data can be unreliable, uncertainty-aware schemes are typically employed to gradually learn from meaningful and reliable predictions. Uncertainty estimation methods, however, rely on multiple inferences from the model predictions that must be computed for each training step, which is computationally expensive. Moreover, these uncertainty maps capture pixel-wise disparities and do not consider global information. This work proposes a novel method to estimate segmentation uncertainty by leveraging global information from the segmentation masks. More precisely, an anatomically-aware representation is first learnt to model the available segmentation masks. The learnt representation thereupon maps the prediction of a new segmentation into an anatomically-plausible segmentation. 
The deviation from the plausible segmentation aids in estimating the underlying pixel-level uncertainty in order to further guide the segmentation network. The proposed method consequently estimates the uncertainty using a single inference from our representation, thereby reducing the total computation. We evaluate our method on two publicly available segmentation datasets of left atria in cardiac MRIs and of multiple organs in abdominal CTs. Our anatomically-aware method improves the segmentation accuracy over the state-of-the-art semi-supervised methods in terms of two commonly used evaluation metrics.	翻訳日:2023-10-26 18:48:29 公開日:2023-10-24
# 在庫管理政策の評価と改善のための文脈帯域 Contextual Bandits for Evaluating and Improving Inventory Control Policies ( http://arxiv.org/abs/2310.16096v1 ) ライセンス: Link先を確認	Dean Foster, Randy Jia, Dhruv Madeka	(参考訳) 定期的なレビュー在庫管理問題に、非定常的なランダムな需要、失った販売、確率的ベンダーのリードタイムに対処する解決策は、一般に近似またはシミュレーションのダイナミクスを強く仮定し、最適化、動的プログラミング、強化学習などの手法を適用する。したがって、特に改善の余地があるかどうかを確認するためには、在庫管理政策の分析と評価が重要である。我々は,政策の望ましい性質である均衡政策の概念について紹介する。これは,ほんのわずかな行動だけを変更するだけでは,実質的な報酬は得られない,という直観的な意味を持つ。本手法は, 理論上, 経験的研究においても良好な保証が得られることを示すため, 軽量なコンテキストバンディットベースアルゴリズムを提案する。 Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.	翻訳日:2023-10-26 18:48:09 公開日:2023-10-24
# CR-COPEC:財務報告から学ぶ企業業績変化の因果関係 CR-COPEC: Causal Rationale of Corporate Performance Changes to Learn from Financial Reports ( http://arxiv.org/abs/2310.16095v1 ) ライセンス: Link先を確認	Ye Eun Chun, Sunjae Kwon, Kyunghwan Sohn, Nakwon Sung, Junyoup Lee, Byungki Seo, Kevin Compher, Seung-won Hwang, Jaesik Choi	(参考訳) 本稿では,企業業績の変化の因果関係(Causal Rationale of Corporate Performance Changes)を財務報告から紹介する。これは、企業業績の変化を検出するための包括的な大規模ドメイン適応因果文データセットである。 CR-COPECは2つの大きな業績に貢献している。まず、会計基準に従う専門家の因果分析を形式的に含む米国企業の10-kの年次報告書から因果的根拠を検出する。このデータセットは、個々の投資家とアナリストの両方が、すべてのドキュメントを読み取るのに多大な努力をすることなく、投資と意思決定のための材料情報リソースとして広く利用することができる。第2に、12の業界における企業の財務パフォーマンスに影響を与えるさまざまな特性を慎重に検討する。その結果、CR-COPECは各産業における独自の物語を考慮に入れ、各産業における因果文を区別することができる。また, CR-COPECデータセットの構築方法や, 目的文を産業特性に関する因果文として分類するのに適していることを示す。私たちのデータセットと実験コードは公開されています。 In this paper, we introduce CR-COPEC called Causal Rationale of Corporate Performance Changes from financial reports. This is a comprehensive large-scale domain-adaptation causal sentence dataset to detect financial performance changes of corporate. CR-COPEC contributes to two major achievements. First, it detects causal rationale from 10-K annual reports of the U.S. companies, which contain experts' causal analysis following accounting standards in a formal manner. This dataset can be widely used by both individual investors and analysts as material information resources for investing and decision making without tremendous effort to read through all the documents. Second, it carefully considers different characteristics which affect the financial performance of companies in twelve industries. As a result, CR-COPEC can distinguish causal sentences in various industries by taking unique narratives in each industry into consideration. We also provide an extensive analysis of how well CR-COPEC dataset is constructed and suited for classifying target sentences as causal ones with respect to industry characteristics. Our dataset and experimental codes are publicly available.	翻訳日:2023-10-26 18:47:54 公開日:2023-10-24
# 長距離絡み合いと位相励起 Long-range entanglement and topological excitations ( http://arxiv.org/abs/2310.16091v1 ) ライセンス: Link先を確認	Gianpaolo Torre, Jovan Odavić, Pierre Fromholz, Salvatore Marco Giampaolo, Fabio Franchini	(参考訳) トポロジカル秩序は様々な形態を持ち、その分類と検出は近代研究の重要な分野である。本研究では, 位相位相を同定するために導入された非連結エントロピーが, 単一の分数化励起によって搬送される長距離エンタングルメント (lre) も明らかにできることを示す。反強磁性スピン鎖のトポロジカルフラストレーションを誘導することにより、量子的に非局在化されたドメインウォール励起をシステムに導入できることを示す。さらに, 量子クエンチに対するlreの弾力性と障害の導入について検討し, 典型的な位相秩序や対称性が保護された位相を持つ位相の存在を確立した。 Topological order comes in different forms, and its classification and detection is an important field of modern research. In this work, we show that the Disconnected Entanglement Entropy, a measure originally introduced to identify topological phases, is also able to unveil the long-range entanglement (LRE) carried by a single, fractionalized excitation. We show this by considering a quantum, delocalized domain wall excitation that can be introduced into a system by inducing topological frustration in an antiferromagnetic spin chain. Furthermore, we study the resilience of LRE against a quantum quench and the introduction of disorder, thus establishing the existence of a phase with topological features despite not being a typical topological order or symmetry-protected one.	翻訳日:2023-10-26 18:47:38 公開日:2023-10-24
# フラクトニック量子物質の動的スペクトル応答 Dynamical Spectral Response of Fractonic Quantum Matter ( http://arxiv.org/abs/2310.16084v1 ) ライセンス: Link先を確認	Philip Zechmann, Julian Boesl, Johannes Feldmeier, Michael Knap	(参考訳) フラクタル励起を持つ量子多体系は、興味深い物質の段階を実現することができる。本研究では,粒子数に加えて質量中心の保存や双極子モーメントも考慮した,拘束されたボース・ハバード模型の低エネルギー励起を1次元で研究する。このモデルは、双極子モット絶縁体、双極子ルッティンガー液体、準安定双極子超固体を含むフラクトン相を実現することが知られている。テンソルネットワーク法を用いてシステムの動的応答からスペクトル関数を計算し、対応する基底状態相の低エネルギー場理論から予測を検証する。双極子mott絶縁体,双極子ルッティンガー液の線形音響モード,および非整数充填時の電荷密度波秩序と位相コヒーレンスを有する超固体状態における零モード,有限モードの軟2次モードにおいて,ガッピング励起の存在を実証する。 Quantum many-body systems with fractonic excitations can realize fascinating phases of matter. Here, we study the low-energy excitations of a constrained Bose-Hubbard model in one dimension, which conserves the center of mass or, equivalently, the dipole moment in addition to the particle number. This model is known to realize fractonic phases, including a dipole Mott insulator, a dipole Luttinger liquid, and a metastable dipole supersolid. We use tensor network methods to compute spectral functions from the dynamical response of the system and verify predictions from low-energy field theories of the corresponding ground state phases. We demonstrate the existence of gapped excitations compatible with strong coupling results in a dipole Mott insulator, linear sound modes characteristic of a Luttinger liquid of dipoles, and soft quadratic modes at both zero and finite momenta in a supersolid state with charge density wave order and phase coherence at non-integer filling.	翻訳日:2023-10-26 18:47:22 公開日:2023-10-24
# 局所量子場の経路積分による粒子検出器モデル Particle detector models from path integrals of localized quantum fields ( http://arxiv.org/abs/2310.16083v1 ) ライセンス: Link先を確認	Bruno de S. L. Torres	(参考訳) シュウィンガー・ケルディッシュ経路積分を用いて、相対論的量子情報 (rqi) における局所量子場理論とより一般的な局所プローブのモデルとの接続を描く。プローブとして使用される局所化された場の到達不能モードを積分して追跡することにより、摂動理論の先頭の順において、プローブ場の有限個のモードのダイナミクスは、ちょうど有限個の調和振動子unruh-dewitt(udw)検出器のそれであることを示す。等価性は、プローブターゲット場系の入力状態の比較的一般的なクラスと、検出器として含む任意の数のモードに対して有効である。経路積分はまた、追跡された追加モードの存在により摂動理論のより高い順序でUDWモデルの補正を得る体系的な方法を与える閉形式式も提供する。このアプローチは、最近提案された量子場理論(arXiv:2308.11698)のための検出器ベースとフィールド理論ベースの測定フレームワークの間の橋渡しと拡張し、また、経路積分法がより一般的な分野であるRQIと他の物理学領域における粒子検出器モデルの間の潜在的な接続を指している。 Using the Schwinger-Keldysh path integral, we draw a connection between localized quantum field theories and more commonly used models of local probes in Relativistic Quantum Information (RQI). By integrating over and then tracing out the inaccessible modes of the localized field being used as a probe, we show that, at leading order in perturbation theory, the dynamics of any finite number of modes of the probe field is exactly that of a finite number of harmonic-oscillator Unruh-DeWitt (UDW) detectors. The equivalence is valid for a rather general class of input states of the probe-target field system, as well as for any arbitrary number of modes included as detectors. The path integral also provides a closed-form expression which gives us a systematic way of obtaining the corrections to the UDW model at higher orders in perturbation theory due to the existence of the additional modes that have been traced out. 
This approach vindicates and extends a recently proposed bridge between detector-based and field-theory-based measurement frameworks for quantum field theory [arXiv:2308.11698], and also points to potential connections between particle detector models in RQI and other areas of physics where path integral methods are more commonplace -- in particular, the Wilsonian approach to the renormalization group and effective field theories.	翻訳日:2023-10-26 18:47:04 公開日:2023-10-24
# 葉を通しての立体視深度知覚 Stereoscopic Depth Perception Through Foliage ( http://arxiv.org/abs/2310.16120v1 ) ライセンス: Link先を確認	Robert Kerschner, Rakesh John Amala Arokia Nathan, Rafal Mantiuk, Oliver Bimber	(参考訳) 人間も計算手法も葉の下に隠された物体の深さを識別するのに苦労している。しかし,計算合成開口センシングと立体画像を融合する人間の能力を組み合わせた場合,このような識別が実現可能となる。捜索・救助、野生生物の観察、監視、早期の山火事検出に必要な物体識別タスクでは、人、動物、車両などの誤った発見と地上や樹冠の日光を浴びたパッチ、あるいは地上の火災と樹木のトランクとの区別を深度支援する。我々は、密集した森の上空でドローンが撮影したビデオを使って、ユーザーの奥行きを識別する能力をテストした。単視ビデオを見たり,運動視差に頼ると,これは不可能であることがわかった。葉の閉塞が原因で立体映像でも同様であった。しかし,オクルージョンを減少させるために合成開口センシングが用いられ,立体視ビデオに差が生じたが,計算(立体視マッチング)手法は失敗し,人間の観察者は深度を識別することに成功した。これは、計算方法と人間の視覚の相乗効果を利用して、単独では実行できないタスクを実行するシステムの可能性を示している。 Both humans and computational methods struggle to discriminate the depths of objects hidden beneath foliage. However, such discrimination becomes feasible when we combine computational optical synthetic aperture sensing with the human ability to fuse stereoscopic images. For object identification tasks, as required in search and rescue, wildlife observation, surveillance, and early wildfire detection, depth assists in differentiating true from false findings, such as people, animals, or vehicles vs. sun-heated patches at the ground level or in the tree crowns, or ground fires vs. tree trunks. We used video captured by a drone above dense woodland to test users' ability to discriminate depth. We found that this is impossible when viewing monoscopic video and relying on motion parallax. The same was true with stereoscopic video because of the occlusions caused by foliage. However, when synthetic aperture sensing was used to reduce occlusions and disparity-scaled stereoscopic video was presented, whereas computational (stereoscopic matching) methods were unsuccessful, human observers successfully discriminated depth. This shows the potential of systems which exploit the synergy between computational methods and human vision to perform tasks that neither can perform alone.	翻訳日:2023-10-26 18:41:08 公開日:2023-10-24
# alquist 5.0: 対話ツリーは生成モデルと出会う。ソーシャルボットの会話を促進する新しいアプローチ Alquist 5.0: Dialogue Trees Meet Generative Models. A Novel Approach for Enhancing SocialBot Conversations ( http://arxiv.org/abs/2310.16119v1 ) ライセンス: Link先を確認	Ondřej Kobza, Jan Čuhel, Tommaso Gargiani, David Herel, Petr Marek (Faculty of Electrical Engineering, CTU in Prague)	(参考訳) Alexa Prize SocialBot Grand Challenge 5のために開発されたSocialBot - Alquist 5.0を紹介します。従来のシステムに基づいて、NRG Baristaを導入し、社会ボットにバリスタを統合するための革新的なアプローチをいくつか紹介し、全体的な会話体験を改善した。さらに、SocialBotを拡張してマルチモーダルデバイスをサポートします。本稿では,多種多様なトピックにまたがる共感的・知識的な会話能力を維持しつつ,ユーザ期待の進展に対応するAlquist 5.0の開発に関する知見を提供する。 We present our SocialBot -- Alquist 5.0 -- developed for the Alexa Prize SocialBot Grand Challenge 5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist 5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.	翻訳日:2023-10-26 18:40:45 公開日:2023-10-24
# NADI 2023:第4回アラビア方言識別タスク NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task ( http://arxiv.org/abs/2310.16117v1 ) ライセンス: Link先を確認	Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash	(参考訳) 第4回Nuanced Arabic Dialect Identification Shared Task (NADI 2023)の報告を行った。 NADIの目的は、研究チームが標準化された条件下で協力的に競争する機会を作ることで、最先端のアラビアNLPを促進することである。アラビア語の方言に注目し、新しいデータセットを提供し、異なるアプローチ間で意味のある比較を可能にするサブタスクを定義する。 NADI 2023は、方言識別(Subtask 1)と方言からMSAへの機械翻訳(Subtask 2とSubtask 3)の両方をターゲットにしている。共有タスクには58のユニークなチームが登録され、そのうち18チームが参加している(テストフェーズには76の有効な応募がある）。そのうち16チームがSubtask 1に、5チームがSubtask 2に、3チームがSubtask 3に参加した。優勝チームは、Subtask 1で87.27 F1、Subtask 2で14.76 Bleu、Subtask 3で21.10 Bleuをそれぞれ達成した。その結果,3つのサブタスクは依然として困難なままであり,将来的な作業のモチベーションが得られた。参加チームが採用する手法について説明し,NADIの展望を簡潔に述べる。 We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under standardized conditions. It does so with a focus on Arabic dialects, offering novel datasets and defining subtasks that allow for meaningful comparisons between different approaches. NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered for the shared task, of whom 18 teams have participated (with 76 valid submissions during test phase). Among these, 16 teams participated in Subtask 1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 Bleu in Subtask 2, and 21.10 Bleu in Subtask 3, respectively. Results show that all three subtasks remain challenging, thereby motivating future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.	翻訳日:2023-10-26 18:40:36 公開日:2023-10-24
# 過去のデータのない概念の覚醒:オンラインプラセボからのクラスインクリメンタルラーニング Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos ( http://arxiv.org/abs/2310.16115v1 ) ライセンス: Link先を確認	Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun	(参考訳) 古いクラス知識を忘れないことは、モデルが新しいクラスに継続的に適応する場合、クラスインクリメンタル学習(cil)にとって重要な課題である。これに対処する一般的なテクニックは知識蒸留(kd)であり、古いモデルと新しいモデルの予測の不一致を罰する。このような予測は、cilのメモリ制限が厳しいため、古いクラスデータは極めて少ないため、ほとんど新しいクラスデータで行われます。本稿では,KDの損失を深く掘り下げ,「KDの新しいクラスデータの利用」がモデル適応を阻害するだけでなく(新しいクラスを学習するために),古いクラスの知識を保存するための効率の低下をもたらすことを明らかにする。ここでは,Google Imagesなどの無料画像ストリームから,Placebosを自動的かつ経済的に選択するKDの古いクラスのPlaceboを使用することによって,この問題に対処する。この目的のために,オンラインプレースボ選択ポリシーをトレーニングし,ストリーミング画像(良か悪か)の品質を迅速に評価し,kdの1回フィードフォワード計算によいもののみを使用する。我々は,オンラインマルコフ決定プロセス(MDP)としてポリシートレーニングプロセスを定式化し,このMDP問題を解決するためのオンライン学習アルゴリズムを導入する。実験では、我々の方法が示されます。 1) placebosとオリジナルの古いクラスデータの間にクラス重複がない場合でも、驚くほど効果的である。 2)追加の監督や記憶予算を必要としない。 3)多くの上位パフォーマンスcilメソッド、特にクラスごとに5つのexemplarsのような古いクラスのexemplarに対して低いメモリ予算を使用する場合を著しく上回っている。 Not forgetting old class knowledge is a key challenge for class-incremental learning (CIL) when the model continuously adapts to new classes. A common technique to address this is knowledge distillation (KD), which penalizes prediction inconsistencies between old and new models. Such prediction is made with almost new class data, as old class data is extremely scarce due to the strict memory limitation in CIL. In this paper, we take a deep dive into KD losses and find that "using new class data for KD" not only hinders the model adaption (for learning new classes) but also results in low efficiency for preserving old class knowledge. We address this by "using the placebos of old classes for KD", where the placebos are chosen from a free image stream, such as Google Images, in an automatical and economical fashion. 
To this end, we train an online placebo selection policy to quickly evaluate the quality of streaming images (good or bad placebos) and use only good ones for one-time feed-forward computation of KD. We formulate the policy training process as an online Markov Decision Process (MDP), and introduce an online learning algorithm to solve this MDP problem without causing much computation costs. In experiments, we show that our method 1) is surprisingly effective even when there is no class overlap between placebos and original old class data, 2) does not require any additional supervision or memory budget, and 3) significantly outperforms a number of top-performing CIL methods, in particular when using lower memory budgets for old class exemplars, e.g., five exemplars per class.	翻訳日:2023-10-26 18:40:17 公開日:2023-10-24
# 脳遺伝子転写の圧縮発現 Compressed representation of brain genetic transcription ( http://arxiv.org/abs/2310.16113v1 ) ライセンス: Link先を確認	James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev	(参考訳) 脳のアーキテクチャは複雑すぎるので、コンパクトでナビゲート可能な空間に変化を投影する圧縮表現を使わずに直感的に調査できる。この課題は、解剖学的および転写学的パターンの結合複雑性が最大圧縮を要求する遺伝子表現のような高次元データにおいて特に困難である。確立された実践は標準主成分分析(pca)であり、その計算フェリシティは限定的な表現率、特に大きな圧縮比によって相殺される。ここでは、全脳のボクセル単位のAllen Brain Atlas転写データを用いて、最も広く支持されている線形・非線形手法（PCA、カーネルPCA、非負値行列因子分解(NMF)、t-SNE、UMAP、深層オートエンコーダ）に基づく圧縮表現を体系的に比較し、シグナル伝達、微細構造、代謝の各ターゲットに関する再構成忠実度、解剖学的一貫性、予測的有用性を定量化する。ディープオートエンコーダは、パフォーマンスとターゲットドメインのすべての指標において優れた表現力を示し、人間の脳における転写パターンを表現する基準標準としての使用をサポートする。 The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. 
Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.	翻訳日:2023-10-26 18:39:46 公開日:2023-10-24
# 胸部X線からの長期多ラベル疾患分類に向けて:CXR-LT課題の概観 Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge ( http://arxiv.org/abs/2310.16112v1 ) ライセンス: Link先を確認	Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng	(参考訳) 診断医療画像検査のような現実世界の画像認識問題の多くは "long-tailed" である。すなわち、少数の一般的な所見のあとに、比較的まれな疾患が多数続く。胸部X線撮影では、診断は長い尾と多ラベルの問題であり、患者は同時に複数の所見を呈することが多い。医学画像認識における長期学習の問題の研究が始まっているが、長期学習によるラベルの不均衡とラベル共起の相互作用を研究する研究者はほとんどいない。今回我々は,胸部x線 (cxr) からの胸部多発性胸部疾患の分類について, 研究コミュニティと協働し, cxr-lt (cxr-lt) のオープンチャレンジを行った。我々は、35万以上のCXRの大規模ベンチマークデータセットを公開し、それぞれに長い尾の分布に従う26の臨床所見のうち少なくとも1つをラベル付けした。トップパフォーマンスソリューションの一般的なテーマを合成し,ロングテール,マルチラベルの医用画像分類を実践的に推奨する。最後に,これらの知見を用いて,視覚言語基礎モデルによる少数・ゼロショットの疾患分類を提案する。 Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. 
We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.	翻訳日:2023-10-26 18:39:28 公開日:2023-10-24
# ゼロショットプロンプトを用いた局所微分プライベート文書生成 Locally Differentially Private Document Generation Using Zero Shot Prompting ( http://arxiv.org/abs/2310.16111v1 ) ライセンス: Link先を確認	Saiteja Utpala, Sara Hooker, Pin Yu Chen	(参考訳) 多くの研究が、事前訓練された大きな言語モデルに関連するプライバシーリスクを強調している。対照的に,本研究は,事前学習された大規模言語モデルがプライバシー保護に効果的に寄与することを示すことにより,独自の視点を提供する。本稿では,DP-Promptと呼ばれる局所微分プライベートな機構を提案する。これは事前訓練された大規模言語モデルのパワーとゼロショットプロンプトを利用して,ダウンストリームユーティリティへの影響を最小限に抑えながら,著者の非匿名化攻撃に対抗する。 DP-PromptをChatGPT(gpt-3.5)のような強力な言語モデルで使用すると、非匿名化攻撃の成功率の顕著な低下が観察され、より単純な設計にもかかわらず既存のアプローチをかなり上回っていることが示された。例えば、IMDBデータセットの場合、DP-Prompt(ChatGPT)は、静的攻撃者に対する著者識別F1スコアの46%の低下、適応攻撃者に対する26%の低下を達成しながら、クリーンな感情F1スコアを完全に回復する。プライバシ利用トレードオフのさまざまな影響を分析するために,最大70億パラメータまでの,オープンソースの6つの大規模言語モデルを対象に,広範な実験を行いました。 Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a unique perspective by demonstrating that pretrained large language models can effectively contribute to privacy preservation. We propose a locally differentially private mechanism called DP-Prompt, which leverages the power of pretrained large language models and zero-shot prompting to counter author de-anonymization attacks while minimizing the impact on downstream utility. When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks, showing that it surpasses existing approaches by a considerable margin despite its simpler design. For instance, in the case of the IMDB dataset, DP-Prompt (with ChatGPT) perfectly recovers the clean sentiment F1 score while achieving a 46% reduction in author identification F1 score against static attackers and a 26% reduction against adaptive attackers. We conduct extensive experiments across six open-source large language models, ranging up to 7 billion parameters, to analyze various effects of the privacy-utility tradeoff.	翻訳日:2023-10-26 18:39:01 公開日:2023-10-24
# 複合画像生成SwinTransformer Network for Audio Denoising Complex Image Generation SwinTransformer Network for Audio Denoising ( http://arxiv.org/abs/2310.16109v1 ) ライセンス: Link先を確認	Youshan Zhang and Jialu Li	(参考訳) 高性能なオーディオデノーミングを実現することは、現実世界のアプリケーションでは依然として難しい課題である。既存の時間周波数法は、しばしば生成された周波数領域画像の品質を無視している。本稿では,音声の雑音化問題を画像生成タスクに変換する。まず、複雑なフーリエ領域からより多くの情報を取得するための複雑な画像生成SwinTransformerネットワークを開発する。そこで我々は,高品質な画像を生成するために構造類似性と詳細な損失関数を課し,識別音声とクリーンオーディオの差を最小限に抑えるためにSDR損失を開発する。 2つのベンチマークデータセットに関する広範囲な実験により,提案手法が最先端の手法よりも優れていることを証明した。 Achieving high-performance audio denoising is still a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of generated frequency domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex image generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detailed loss functions to generate high-quality images and develop an SDR loss to minimize the difference between denoised and clean audios. Extensive experiments on two benchmark datasets demonstrate that our proposed model is better than state-of-the-art methods.	翻訳日:2023-10-26 18:38:38 公開日:2023-10-24
# 進化の物理的性質と統計的収縮性は写像の等価概念である Physicality of evolution and statistical contractivity are equivalent notions of maps ( http://arxiv.org/abs/2310.16107v1 ) ライセンス: Link先を確認	Matteo Scandi, Paolo Abiuso, Dario De Santis, Jacopo Surace	(参考訳) 統計量化器は、ノイズ変換の下で情報が失われるべきという直感に従って、物理的進化の下で収縮するために一般的に必要である。この原理は統計学において非常に関係があり、それに基づいて一意性の結果を導出することさえ可能である: 任意の物理写像の下にそれらの縮約性を与えることによって、チェンツォフ=ペッツの定理はフィッシャー情報計量と呼ばれる確率分布(あるいは密度行列)の空間上の一意の計量を抽出する。この結果から、統計量化器は、その定義が物理写像に基づいているため、導出概念である可能性が示唆される。この作品の目的は、この信念を否定することである。実際、チェンツォフ=ペッツの定理に双対な結果を示し、すべての可能な線型写像の中で、フィッシャー情報に一致するのは、まさに物理的なものであることを証明した。この結果は、共通の意見に反して、物理地図と標準統計量化器の間には基本的な階層構造が存在しないことを示している。 Statistical quantifiers are generically required to contract under physical evolutions, following the intuition that information should be lost under noisy transformations. This principle is very relevant in statistics, and it even allows to derive uniqueness results based on it: by imposing their contractivity under any physical maps, the Chentsov-Petz theorem singles out a unique family of metrics on the space of probability distributions (or density matrices) called the Fisher information metrics. This result might suggest that statistical quantifiers are a derived concept, as their very definition is based on physical maps. The aim of this work is to disprove this belief. Indeed, we present a result dual to the Chentsov-Petz theorem, proving that among all possible linear maps, the only ones that contract the Fisher information are exactly the physical ones. This result shows that, contrary to the common opinion, there is no fundamental hierarchy between physical maps and canonical statistical quantifiers, as either of them can be defined in terms of the other.	翻訳日:2023-10-26 18:38:27 公開日:2023-10-24
# ブロードキャストベースサブグラフサンプリングを用いた無線ネットワークによる分散学習 Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling ( http://arxiv.org/abs/2310.16106v1 ) ライセンス: Link先を確認	Daniel P\'erez Herrera, Zheng Chen and Erik G. Larsson	(参考訳) 本研究は、コンセンサスに基づく分散確率勾配勾配(D-SGD)を用いて、無線ネットワーク上の分散学習のコミュニケーション面に焦点を当てる。ネットワーク内情報交換による実際の通信コストや遅延を考慮すると,送信スロット毎の改善によって測定されたアルゴリズムの高速収束を実現することが目的である。本稿では,無線ネットワーク上でのD-SGDの効率的な通信フレームワークであるBASSを提案する。各イテレーションにおいて、非干渉ノードの複数のサブセットを起動し、隣人にモデル更新をブロードキャストする。これらのサブセットは時間とともにランダムに活性化され、確率はネットワーク接続の重要性を反映し、通信コストの制約(例えば、イテレーション当たりの平均送信スロット数)を受ける。コンセンサス更新ステップでは、通信対称性を維持するために双方向リンクのみを効果的に保存する。既存のリンクベースのスケジューリング手法と比較して、無線チャネルの固有の放送特性は、同じ数の送信スロットでより多くの通信リンクを作成することにより、分散学習の収束を早めるという本質的な利点を提供する。 This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD). Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot. We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, we activate multiple subsets of non-interfering nodes to broadcast model updates to their neighbors. These subsets are randomly activated over time, with probabilities reflecting their importance in network connectivity and subject to a communication cost constraint (e.g., the average number of transmission slots per iteration). During the consensus update step, only bi-directional links are effectively preserved to maintain communication symmetry. 
In comparison to existing link-based scheduling methods, the inherent broadcasting nature of wireless channels offers intrinsic advantages in speeding up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.	翻訳日:2023-10-26 18:38:03 公開日:2023-10-24
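The consensus update described in the abstract above (keep only bi-directional links, mix with neighbors, then take a gradient step) can be sketched as follows. This is a minimal illustration with scalar parameters, not the BASS sampling scheme itself: the Metropolis weighting rule and the helper names `bidirectional_links`, `metropolis_weights`, and `dsgd_step` are assumptions chosen for this example.

```python
def bidirectional_links(directed_links):
    """Keep only ordered pairs (i, j) whose reverse link (j, i) also exists."""
    links = set(directed_links)
    return {(i, j) for (i, j) in links if i != j and (j, i) in links}


def metropolis_weights(num_nodes, links):
    """Build a symmetric, doubly stochastic mixing matrix from bidirectional links."""
    degree = [0] * num_nodes
    for i, _ in links:
        degree[i] += 1  # each neighbor of i appears once as (i, j)
    W = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in links:
        W[i][j] = 1.0 / (1 + max(degree[i], degree[j]))
    for i in range(num_nodes):
        W[i][i] = 1.0 - sum(W[i])  # remaining mass stays on the node itself
    return W


def dsgd_step(params, grads, W, lr):
    """One D-SGD update: average with neighbors, then descend the local gradient."""
    n = len(params)
    mixed = [sum(W[i][j] * params[j] for j in range(n)) for i in range(n)]
    return [mixed[i] - lr * grads[i] for i in range(n)]


# Node 1 reaches node 2, but not the reverse, so that link is dropped.
links = bidirectional_links([(0, 1), (1, 0), (1, 2)])
W = metropolis_weights(3, links)
print(dsgd_step([0.0, 2.0, 5.0], [0.0, 0.0, 0.0], W, lr=0.1))
```

Because the mixing matrix is symmetric with rows summing to one, the network-wide average of the parameters is unchanged by the mixing step, which is why symmetry of the communication pattern matters for consensus.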
# 知識グラフに対する文脈対応説明可能なレコメンデーション Context-aware explainable recommendations over knowledge graphs ( http://arxiv.org/abs/2310.16141v1 ) ライセンス: Link先を確認	Jinfeng Zhong, Elsa Negre	(参考訳) 知識グラフは、アイテムに関連する豊富な意味関係を含み、そのような意味関係をレコメンデーションシステムに組み込むことで、アイテムの潜伏した関係を探索し、予測の精度を改善し、レコメンデーションの説明可能性を高める。しかし、このような説明はユーザのコンテキストに適応せず、ユーザーの好みに大きく影響する可能性がある。そこで本研究では,コンテキストに適応したユーザの嗜好をモデル化し,項目に関する知識グラフにリッチな意味関係を組み込むためのエンドツーエンドフレームワークであるca-kgcn(context-aware knowledge graph convolutional network)を提案する。このフレームワークは、アイテムのコンテキストや特徴など、さまざまな要素に対するユーザの注意を捉える。具体的には、コンテキストに適合したユーザの好みをモデル化し、与えられたコンテキストに適応した説明を提供する。実世界の3つのデータセットの実験は、ユーザの好みを文脈に合わせてモデル化し、生成したリコメンデーションを説明するという、我々のフレームワークの有効性を示している。 Knowledge graphs contain rich semantic relationships related to items and incorporating such semantic relationships into recommender systems helps to explore the latent connections of items, thus improving the accuracy of prediction and enhancing the explainability of recommendations. However, such explainability is not adapted to users' contexts, which can significantly influence their preferences. In this work, we propose CA-KGCN (Context-Aware Knowledge Graph Convolutional Network), an end-to-end framework that can model users' preferences adapted to their contexts and can incorporate rich semantic relationships in the knowledge graph related to items. This framework captures users' attention to different factors: contexts and features of items. More specifically, the framework can model users' preferences adapted to their contexts and provide explanations adapted to the given context. Experiments on three real-world datasets show the effectiveness of our framework: modeling users' preferences adapted to their contexts and explaining the recommendations generated.	翻訳日:2023-10-26 18:29:51 公開日:2023-10-24
# Pix2HDR -- 高速HDRビデオのための画素単位の取得と深層学習に基づく合成アプローチ Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos ( http://arxiv.org/abs/2310.16139v1 ) ライセンス: Link先を確認	Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings	(参考訳) 広い動きと光強度でダイナミックなシーンを正確に捉えることは、多くの視覚アプリケーションにとって不可欠である。しかし、カメラのフレームレートがダイナミックレンジを制限するため、高速ハイダイナミックレンジ(HDR)ビデオの取得は困難である。既存の方法はマルチ露光フレームを取得するために速度を犠牲にする。しかし、これらのフレーム内の不整合運動は、なおもHDR融合アルゴリズムの複雑さを生じさせ、成果物をもたらす。フレームベースの露光の代わりに、個々のピクセルを様々な露光や位相オフセットでサンプリングする。ピクセル単位でプログラマブルなイメージセンサに実装したサンプリングパターンは,高速動作を同時に高ダイナミックレンジでキャプチャする。次に,ディープニューラルネットワークによるエンドツーエンド学習重みを用いて,画素毎の出力をhdrビデオに変換し,動きのぼやけを最小限に抑えながら,高い時空間分解能を達成する。我々は、1000FPSでエイリアスフリーのHDRビデオの取得を実証し、低照度条件下での高速な動きと明るい背景を解消する。複雑なシーンをデコードする際の深層ニューラルネットワークの強度と画素ワイドサンプリングパターンの汎用性を組み合わせることにより,動的条件下での視覚システムの適応性と性能を大幅に向上させる。 Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds - both challenging conditions for conventional cameras. 
By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system's adaptability and performance in dynamic conditions.	翻訳日:2023-10-26 18:29:33 公開日:2023-10-24
# 下位信号:脳神経発達過程としての乳児非栄養摂取の検出 Subtle Signals: Video-based Detection of Infant Non-nutritive Sucking as a Neurodevelopmental Cue ( http://arxiv.org/abs/2310.16138v1 ) ライセンス: Link先を確認	Shaotong Zhu, Michael Wan, Sai Kumar Reddy Manne, Emily Zimmerman, Sarah Ostadabbas	(参考訳) 栄養素を摂取せずにおしゃぶり、指または類似の物体を吸う行為である非栄養吸引(non-nutritive sucking, nns)は、健康な初期発達を評価する上で重要な役割を果たす。早産児の場合、NNS行動は摂食準備度を決定する重要な要素である。年長の幼児では、nns行動の特徴は神経および運動発達に関する貴重な洞察を与える。さらに、突発性乳幼児死亡症候群(SIDS)の予防としてNNS活性が提案されている。しかし、NNS評価の臨床応用は、現在、労働集約的および主観的指先評価によって妨げられている。そのため、研究者はしばしば、客観的なNS信号測定のために高価な圧力変換器を利用する。臨床医と研究者双方のNS信号監視のアクセシビリティと信頼性を高めるため,自然環境下でのベビーモニター映像を用いたNNS活動の非接触検出のためのビジョンベースアルゴリズムを提案する。本手法では,乳幼児の微妙な信号の検出と増幅を可能にするため,光学的流れと時間的畳み込みネットワークを包括的に探索する。均一長の短いビデオクリップをNNSおよび非NNS周期に分類することに成功した。さらに,NNSおよび非NNSセグメントに長い混合能動画を分割し,局所的な分類結果をまとめる手動および学習に基づく手法について検討した。本研究は,19名の乳児と183時間の乳児モニター映像を含む,乳児の注釈付きビデオの2つの新しいデータセットを紹介した。 Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key component in determining their readiness for feeding. In older infants, the characteristics of NNS behavior offer valuable insights into neural and motor development. Additionally, NNS activity has been proposed as a potential safeguard against sudden infant death syndrome (SIDS). However, the clinical application of NNS assessment is currently hindered by labor-intensive and subjective finger-in-mouth evaluations. Consequently, researchers often resort to expensive pressure transducers for objective NNS signal measurement. To enhance the accessibility and reliability of NNS signal monitoring for both clinicians and researchers, we introduce a vision-based algorithm designed for non-contact detection of NNS activity using baby monitor footage in natural settings. 
Our approach involves a comprehensive exploration of optical flow and temporal convolutional networks, enabling the detection and amplification of subtle infant-sucking signals. We successfully classify short video clips of uniform length into NNS and non-NNS periods. Furthermore, we investigate manual and learning-based techniques to piece together local classification results, facilitating the segmentation of longer mixed-activity videos into NNS and non-NNS segments of varying duration. Our research introduces two novel datasets of annotated infant videos, including one sourced from our clinical study featuring 19 infant subjects and 183 hours of overnight baby monitor footage.	翻訳日:2023-10-26 18:29:09 公開日:2023-10-24
# あなたは私をフォローできますか。 ChatGPTにおける状況理解のテスト Can You Follow Me? Testing Situational Understanding in ChatGPT ( http://arxiv.org/abs/2310.16135v1 ) ライセンス: Link先を確認	Chenghao Yang, Allyson Ettinger	(参考訳) 文の意味を理解し、時間の経過に応じて情報状態を適切に更新すること、すなわち私たちが「situational understanding (SU)」と呼ぶ能力は、人間のようなAIエージェントにとって重要である。特にChatGPTのようなチャットモデルでは、人間とAIの間で一貫性があり、首尾一貫した、効果的な対話を可能にするためにSUが不可欠である。従来,非チャットボット大規模言語モデル(LLM)のSUの限界は特定されてきたが,これらの限界の程度や原因はよく理解されておらず,現在のチャットベースモデルのこの領域における能力については検討されていない。本研究では,モデルが環境状態を追跡・列挙する能力を評価することによって,チャット指向モデルにおけるSUの制御された体系的なテストを可能にする,SUテストのための新しい合成環境を提案する。私たちの環境はまた、パフォーマンスパターンの根本原因をより深く理解するために、モデル性能のダイナミクスを綿密に分析することを可能にする。このテストを最先端のチャットボットであるChatGPTに適用したところ、タスクの基本的な単純さにもかかわらず、モデルの性能は、時間を通じて正しい環境状態を保持できないことを反映していた。フォローアップ分析によると、性能の低下は主に、ChatGPTが(完全な対話履歴にアクセスできるにもかかわらず)非永続的なインコンテキストメモリしか持たず、精度を人工的に膨らませる更新を含む、幻覚による更新の影響を受けやすいためである。以上の結果から,ChatGPTは現状では状況状態をロバストに追跡する能力を備えておらず,ChatGPTの優れた対話性能への信頼にはリスクが伴うことが示唆された。テスト環境を再現するためのコードベースと、すべてのプロンプトおよびChatGPTからのAPIレスポンスを、https://github.com/yangalan123/SituationalTestingで公開している。 Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is essential in particular for chat models, such as ChatGPT, to enable consistent, coherent, and effective dialogue between humans and AI. Previous works have identified certain SU limitations in non-chatbot Large Language models (LLMs), but the extent and causes of these limitations are not well understood, and capabilities of current chat-based models in this domain have not been explored. In this work we tackle these questions, proposing a novel synthetic environment for SU testing which allows us to do controlled and systematic testing of SU in chat-oriented models, through assessment of models' ability to track and enumerate environment states. 
Our environment also allows for close analysis of dynamics of model performance, to better understand underlying causes for performance patterns. We apply our test to ChatGPT, the state-of-the-art chatbot, and find that despite the fundamental simplicity of the task, the model's performance reflects an inability to retain correct environment states across time. Our follow-up analyses suggest that performance degradation is largely because ChatGPT has non-persistent in-context memory (although it can access the full dialogue history) and it is susceptible to hallucinated updates -- including updates that artificially inflate accuracies. Our findings suggest overall that ChatGPT is not currently equipped for robust tracking of situation states, and that trust in the impressive dialogue performance of ChatGPT comes with risks. We release the codebase for reproducing our test environment, as well as all prompts and API responses from ChatGPT, at https://github.com/yangalan123/SituationalTesting.	翻訳日:2023-10-26 18:28:43 公開日:2023-10-24
# ソフトウェア工学会議とジャーナルの多様性 Diversity in Software Engineering Conferences and Journals ( http://arxiv.org/abs/2310.16132v1 ) ライセンス: Link先を確認	Aditya Shankar Narayanan, Dheeraj Vagavolu, Nancy A Day, Meiyappan Nagappan	(参考訳) 民族や性別に関する多様性は、ソフトウェア開発のオープンソースや産業環境で研究されてきた。学術会議や雑誌などの出版の道は、成長する技術産業に寄与している。しかし、学界における多様性に関する研究はほとんど行われていない。本稿では,ソフトウェア工学の会議や雑誌に掲載した著者の民族,性別,地理的多様性について検討する。ソフトウェア工学における3つのトップカンファレンスと2つのトップジャーナルの出版物の多様性を体系的に定量的に分析し、ソフトウェア工学の会議や出版物において、特定の民族、性別、地理的な場所に属する著者や委員に対するバイアスと参入障壁の存在を示唆する。本研究は,2010年から2022年までのICSE, FSE, ASEおよびIEEE TSE, ACM TOSEMの会議から,出版物(受理者)および委員会データ(プログラム・組織委員会・ジャーナル編集委員会)を分析した。このデータの分析によると、参加者や委員会メンバーの間では、アフリカ、南アメリカ、オセアニアの国からの出版物など、表現力が著しく低いコミュニティが存在する。しかし、委員会の多様性と参加者との相関研究では決定的な証拠は得られなかった。さらに、白人作家や男性作家との論文が引用される可能性が高いという決定的な証拠はない。最後に、2010-2022年の間に著者の民族多様性が向上したが、性別や地理的多様性は改善しなかった。 Diversity with respect to ethnicity and gender has been studied in open-source and industrial settings for software development. Publication avenues such as academic conferences and journals contribute to the growing technology industry. However, there have been very few diversity-related studies conducted in the context of academia. In this paper, we study the ethnic, gender, and geographical diversity of the authors published in Software Engineering conferences and journals. We provide a systematic quantitative analysis of the diversity of publications and organizing and program committees of three top conferences and two top journals in Software Engineering, which indicates the existence of bias and entry barriers towards authors and committee members belonging to certain ethnicities, gender, and/or geographical locations in Software Engineering conferences and journal publications. For our study, we analyse publication (accepted authors) and committee data (Program and Organizing committee/ Journal Editorial Board) from the conferences ICSE, FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. 
The analysis of the data shows that across participants and committee members, there are some communities that are consistently significantly lower in representation, for example, publications from countries in Africa, South America, and Oceania. However, a correlation study between the diversity of the committees and the participants did not yield any conclusive evidence. Furthermore, there is no conclusive evidence that papers with White authors or male authors were more likely to be cited. Finally, we see an improvement in the ethnic diversity of the authors over the years 2010-2022 but not in gender or geographical diversity.	翻訳日:2023-10-26 18:28:10 公開日:2023-10-24
# GenKIE:ロバストな生成型マルチモーダルドキュメントキー情報抽出 GenKIE: Robust Generative Multimodal Document Key Information Extraction ( http://arxiv.org/abs/2310.16131v1 ) ライセンス: Link先を確認	Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng	(参考訳) スキャンされた文書からキー情報抽出(KIE)が注目されている。最近のkieのアプローチによって有望な結果が得られたが、通常は識別モデルに基づいて構築され、ocr(optical character recognition)エラーの処理能力がなく、不必要なトークンレベルのラベル付けが必要となる。本稿では,KIEタスクに対処する新しい生成的エンドツーエンドモデルであるGenkieを提案する。 genkieは、マルチモーダルエンコーダを使用して視覚、レイアウト、テキストの特徴を埋め込み、デコーダを使用して所望の出力を生成するシーケンスツーシーケンスのマルチモーダル生成モデルである。適切に設計されたプロンプトを利用して、ラベルセマンティクスを弱教師付き信号として組み込んで、キー情報の生成を促す。生成モデルの顕著な利点は、OCRエラーの自動修正を可能にすることである。さらに、トークンレベルの粒度アノテーションは不要である。複数のパブリックな実世界のデータセットに対する大規模な実験は、GenKIEが様々な種類のドキュメントを効果的に一般化し、最先端の結果を達成することを示している。実験では,OCRエラーに対するモデルの堅牢性も検証し,実際のシナリオにおいてGenKIEを高度に適用する。 Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this paper, we propose a novel generative end-to-end model, named GenKIE, to address the KIE task. GenKIE is a sequence-to-sequence multimodal generative model that utilizes multimodal encoders to embed visual, layout and textual features and a decoder to generate the desired output. Well-designed prompts are leveraged to incorporate the label semantics as the weakly supervised signals and entice the generation of the key information. One notable advantage of the generative model is that it enables automatic correction of OCR errors. Besides, token-level granular annotation is not required. Extensive experiments on multiple public real-world datasets show that GenKIE effectively generalizes over different types of documents and achieves state-of-the-art results. 
Our experiments also validate the model's robustness against OCR errors, making GenKIE highly applicable in real-world scenarios.	翻訳日:2023-10-26 18:27:46 公開日:2023-10-24
# Octopus:アラビア語自然言語生成のためのマルチタスクモデルとツールキット Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation ( http://arxiv.org/abs/2310.16127v1 ) ライセンス: Link先を確認	AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed	(参考訳) アラビア語のテキストを理解し、人間のような応答を生成することは、難しい取り組みだ。多くの研究者が個々の問題に対するモデルと解決策を提案しているが、幅広いタスクを処理できる包括的なアラビア語自然言語生成ツールキットは著しく不足している。本稿では,新しいアラビア語text-to-text TransformerモデルであるAraT5v2について述べる。新しいモデルは,拡張シーケンス長2,048トークンを使用して,大規模で多様なデータに対して体系的に訓練されている。我々は,シングルタスクとマルチタスクの両方の設定下で,教師なし,教師あり,共同事前学習を含む様々な事前学習戦略を検討する。私たちのモデルは、競争力のあるベースラインを大きなマージンで上回る。さらに、8つのアラビア語生成タスク向けに調整され、すべて単一のモデルを利用するPythonベースのパッケージ兼コマンドラインツールキットであるOctopusを開発し、公開する。モデルとツールキットをパブリックリポジトリでリリースしている。 Understanding Arabic text and generating human-like responses is a challenging endeavor. While many researchers have proposed models and solutions for individual problems, there is an acute shortage of a comprehensive Arabic natural language generation toolkit that is capable of handling a wide range of tasks. In this work, we present a novel Arabic text-to-text Transformer model, namely AraT5v2. Our new model is methodically trained on extensive and diverse data, utilizing an extended sequence length of 2,048 tokens. We explore various pretraining strategies including unsupervised, supervised, and joint pretraining, under both single and multitask settings. Our models outperform competitive baselines with large margins. We take our work one step further by developing and publicly releasing Octopus, a Python-based package and command-line toolkit tailored for eight Arabic generation tasks all exploiting a single model. We release the models and the toolkit on our public repository.	翻訳日:2023-10-26 18:27:20 公開日:2023-10-24
# 薄肉金属添加物製造におけるオンライン熱場予測 Online Thermal Field Prediction for Metal Additive Manufacturing of Thin Walls ( http://arxiv.org/abs/2310.16125v1 ) ライセンス: Link先を確認	Yifan Tang, M. Rahmani Dehaghani, Pouyan Sajadi, Shahriar Bakrani Balani, Akshay Dhalpe, Suraj Panicker, Di Wu, Eric Coatanea, G. Gary Wang	(参考訳) 本論文は, 金属AMにおける実用的問題, すなわち, 少数のセンサが利用可能であれば, 印刷部品の熱場をオンラインで予測する方法について検討することを目的とする。本研究は,オンライン性能制御のための金属AMプロセスに統合可能なマッピングと再構成を用いたオンライン熱場予測手法を提案する。温度曲線(一点の温度プロファイルの曲線セグメント)の類似性に基づいて、熱電界マッピングは、予め印刷された層上のある点の測定温度から、未印刷層の点の温度曲線を推定する人工ニューラルネットワークを適用する。同じ層上の複数の点の温度分布を測定・予測することで、熱電界再構成は、同じ層上のすべての点の温度プロファイルを構築するための還元次数モデル(rom)を提案し、層全体の温度場を構築するのに使用できる。 ROMのトレーニングは、計算効率を高めるための極端な学習機械(ELM)を用いて行われる。 15本のワイヤアークAM実験と9つのシミュレーションは、各層の固定長と一方向印刷の薄い壁のために設計されている。実験結果から, 提案手法は, 低コストデスクトップ上で0.1秒以内で, 未印刷層の温度場を構築できることが示唆された。一方,本手法は,低層から高層へ,同じシミュレーションでは高層へ,異なるAMプロセスパラメータ上での新しいシミュレーションに至るまで,ほとんどの場合において適用可能である。さらに,提案手法を限られた実験データで微調整した後,新しい実験における予測温度分布の相対誤差は十分に小さく,金属AMのオンライン応用における熱場予測法の適用性と一般化が実証された。 This paper aims to study a practical issue in metal AM, i.e., how to predict the thermal field of yet-to-print parts online when only a few sensors are available. This work proposes an online thermal field prediction method using mapping and reconstruction, which could be integrated into a metal AM process for online performance control. Based on the similarity of temperature curves (curve segments of a temperature profile of one point), the thermal field mapping applies an artificial neural network to estimate the temperature curves of points on the yet-to-print layer from measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, the thermal field reconstruction proposes a reduced order model (ROM) to construct the temperature profiles of all points on the same layer, which could be used to build the temperature field of the entire layer. 
The training of ROM is performed with an extreme learning machine (ELM) for computational efficiency. Fifteen wire arc AM experiments and nine simulations are designed for thin walls with a fixed length and unidirectional printing of each layer. The test results indicate that the proposed prediction method could construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop. Meanwhile, the method has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters. More importantly, after fine-tuning the proposed method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment are sufficiently small, demonstrating the applicability and generalization of the proposed thermal field prediction method in online applications for metal AM.	翻訳日:2023-10-26 18:27:06 公開日:2023-10-24
# アンカー空間最適輸送:複数のOT問題のバッチ処理の高速化 Anchor Space Optimal Transport: Accelerating Batch Processing of Multiple OT Problems ( http://arxiv.org/abs/2310.16123v1 ) ライセンス: Link先を確認	Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai	(参考訳) 最適輸送(OT)理論は、定義された距離空間上の確率分布を比較する効果的な方法を提供するが、3乗の計算量に悩まされる。SinkhornのアルゴリズムはOT解の計算量を大幅に削減するが、複数のOT問題を解くことは、実際には依然として時間とメモリを消費する。しかし、OTの計算高速化に関する多くの研究は、通常、単一のOT問題を前提としており、ミニバッチにおける分布の潜在的な共通特性を無視している。そこで本研究では,複数のOT問題解のバッチ処理に特化して設計された,アンカー空間最適輸送(ASOT)問題と呼ぶ変換されたOT問題を提案する。提案するASOT問題では、分布を共有アンカー点空間にマッピングすることで、潜在的な共通特性を学習し、OTのバッチ処理を高速化する。提案するASOTに基づいて、元のOT問題に対するWasserstein距離の誤差は、グラウンドコストの誤差によって上から抑えられることが証明される。これに基づき,距離誤差を最小化するアンカー空間を学習する3つの手法を提案する。各手法にはそれぞれの応用背景がある。実世界のデータセットの数値実験により,提案手法は妥当な近似性能を維持しつつ計算時間を劇的に短縮できることを示した。 The optimal transport (OT) theory provides an effective way to compare probability distributions on a defined metric space, but it suffers from cubic computational complexity. Although Sinkhorn's algorithm greatly reduces the computational complexity of OT solutions, solving multiple OT problems is still time-consuming and memory-consuming in practice. However, many works on the computational acceleration of OT are usually based on the premise of a single OT problem, ignoring the potential common characteristics of the distributions in a mini-batch. Therefore, we propose a translated OT problem designated as the anchor space optimal transport (ASOT) problem, which is specially designed for batch processing of multiple OT problem solutions. For the proposed ASOT problem, the distributions will be mapped into a shared anchor point space, which learns the potential common characteristics and thus helps accelerate OT batch processing. Based on the proposed ASOT, the Wasserstein distance error to the original OT problem is proven to be bounded by ground cost errors. Building upon this, we propose three methods to learn an anchor space minimizing the distance error, each of which has its application background. 
Numerical experiments on real-world datasets show that our proposed methods can greatly reduce computational time while maintaining reasonable approximation performance.	翻訳日:2023-10-26 18:26:38 公開日:2023-10-24
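For context on the baseline the abstract above refers to, the sketch below shows plain Sinkhorn iterations for a single entropy-regularized OT problem. It is not the ASOT method: solving a batch this way means repeating the loop once per problem, which is the cost the paper aims to reduce. The function names, the regularization value `reg`, and the iteration count are assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, num_iters=500):
    """Approximate OT plan between histograms a and b via Sinkhorn scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


a = [0.2, 0.8]
b = [0.6, 0.4]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
print(plan, transport_cost(plan, cost))
```

The row and column sums of the returned plan approach `a` and `b` as the iterations converge; a smaller `reg` gives a plan closer to the unregularized optimum at the price of slower convergence.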
# 19個のパラメータで十分:素粒子物理学のための小さなニューラルネットワーク 19 Parameters Is All You Need: Tiny Neural Networks for Particle Physics ( http://arxiv.org/abs/2310.16121v1 ) ライセンス: Link先を確認	Alexander Bogatskiy, Timothy Hoffman, Jan T. Offermann	(参考訳) 粒子加速器の衝突頻度が向上し、ディープラーニングソリューションがその実現可能性を証明するにつれ、トリガーのような低レイテンシタスクのための軽量で高速なニューラルネットワークアーキテクチャの必要性が高まっている。本稿では,最近提案されたローレンツ対称かつ置換対称なアーキテクチャであるPELICANの可能性を検証し,トップクォークジェットタギングの二値分類タスクにおいて,数万のパラメータを持つ汎用アーキテクチャを上回る,わずか19個の学習可能パラメータのインスタンスを提示する。 As particle accelerators increase their collision rates, and deep learning solutions prove their viability, there is a growing need for lightweight and fast neural network architectures for low-latency tasks such as triggering. We examine the potential of one recent Lorentz- and permutation-symmetric architecture, PELICAN, and present its instances with as few as 19 trainable parameters that outperform generic architectures with tens of thousands of parameters when compared on the binary classification task of top quark jet tagging.	翻訳日:2023-10-26 18:26:17 公開日:2023-10-24
# 静的長距離双極子相互作用による量子位置相関を持つ冷エミッタアンサンブル中の光伝播 Propagation of light in cold emitter ensembles with quantum position correlations due to static long-range dipolar interactions ( http://arxiv.org/abs/2310.16158v1 ) ライセンス: Link先を確認	G. J. Bean, N. D. Drummond, J. Ruostekoski	(参考訳) 我々は、不規則な位置が静的な長距離双極子-双極子相互作用によって引き起こされる相関を示す双極子エミッタからの光の散乱を分析する。量子力学的位置相関は、変動量子および拡散量子モンテカルロ法によるゼロ温度ボゾン原子または分子に対して計算される。低光強度の極限における高密度アンサンブル中の定常原子に対して、シミュレーションは、電子基底状態と励起状態を含む全ての位置相関関数に対する光学応答の解を与える。我々は,コヒーレントかつ非コヒーレントな散乱,集合線幅,直線シフト,固有モード,および障害誘発励起局在が静的相互作用と密度に影響されるかを計算する。強く閉じ込められたオービタントトラップとプロラトトラップの強い反発的な静的相互作用は、光を介する共鳴双極子-双極子相互作用において大きな変動を緩和する双極子間の短距離秩序をもたらす。典型的には、コヒーレント反射と光学的深さが増大し、コヒーレント散乱が減少する。静的双極子相互作用の存在は、密度の強い雲におけるサブラジアント固有モードの高選択的励起を可能にする。この効果は、自然の線幅より下にある共鳴が狭いプロラトトラップにおいてさらに顕著になる。静的双極子相互作用が光遷移周波数に影響を及ぼすと、アンサンブルは協調効果を抑制する不均一に経験された静的双極子相互作用によって不均質な広がりを示す。 We analyze the scattering of light from dipolar emitters whose disordered positions exhibit correlations induced by static, long-range dipole-dipole interactions. The quantum-mechanical position correlations are calculated for zero temperature bosonic atoms or molecules using variational and diffusion quantum Monte Carlo methods. For stationary atoms in dense ensembles in the limit of low light intensity, the simulations yield solutions for the optical responses to all orders of position correlation functions that involve electronic ground and excited states. We calculate how coherent and incoherent scattering, collective linewidths, line shifts, and eigenmodes, and disorder-induced excitation localization are influenced by the static interactions and the density. We find that dominantly repulsive static interactions in strongly confined oblate and prolate traps introduce short-range ordering among the dipoles which curtails large fluctuations in the light-mediated resonant dipole-dipole interactions. This typically results in an increase in coherent reflection and optical depth, accompanied by reduced incoherent scattering. 
The presence of static dipolar interactions permits the highly selective excitation of subradiant eigenmodes in dense clouds. This effect becomes even more pronounced in a prolate trap, where the resonances narrow below the natural linewidth. When the static dipolar interactions affect the optical transition frequencies, the ensemble exhibits inhomogeneous broadening due to the nonuniformly experienced static dipolar interactions that suppress cooperative effects.	翻訳日:2023-10-26 18:21:51 公開日:2023-10-24
# 議論による文脈認識特徴帰属 Context-aware feature attribution through argumentation ( http://arxiv.org/abs/2310.16157v1 ) ライセンス: Link先を確認	Jinfeng Zhong, Elsa Negre	(参考訳) 特徴帰属(feature attribution)は、機械学習とデータ分析の両方において、モデル出力に対する個々の特徴や変数の寄与を決定する基本的なタスクである。このプロセスは、結果を予測する上で最も重要な特徴を特定するのに役立つ。特徴属性法の歴史は、従属変数と独立変数の間の非線形関係を組み込んで線形回帰モデルを拡張する一般付加モデル(GAM)に遡ることができる。近年、勾配に基づく手法やサロゲートモデルが複雑な人工知能(AI)システムに応用されているが、これらの手法には限界がある。ガンは精度が低い傾向にあり、勾配に基づく手法は解釈が難しく、サロゲートモデルはしばしば安定性と忠実性の問題に苦しむ。さらに,既存の手法ではユーザのコンテキストを考慮せず,好みに大きな影響を及ぼす可能性がある。このような制約に対処し、現在の最先端を推し進めるために、我々は、CA-FATA(Context-Aware Feature Attribution Through Argumentation)と呼ばれる新しい特徴属性フレームワークを定義します。我々のフレームワークは、各フィーチャを、予測をサポートし、攻撃し、または中和できる引数として扱うことによって、議論の力を利用する。さらに、CA-FATAは議論手順として属性を定式化し、各計算には明示的な意味論があり、本質的に解釈可能である。 CA-FATAは、ユーザのコンテキストなどのサイド情報を容易に統合し、より正確な予測を行う。 Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to General Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state-of-the-art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). 
Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure, and each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.	翻訳日:2023-10-26 18:21:06 公開日:2023-10-24
# 光による超伝導量子ビットのコヒーレント制御 Coherent control of a superconducting qubit using light ( http://arxiv.org/abs/2310.16155v1 ) ライセンス: Link先を確認	Hana K. Warner (1), Jeffrey Holzgrafe (1 and 2), Beatriz Yankelevich (3), David Barton (1), Stefano Poletto (3), C. J. Xin (1), Neil Sinclair (1 and 4), Di Zhu (1), Eyob Sete (3), Brandon Langley (3), Emma Batson (5), Marco Colangelo (5), Amirhassan Shams-Ansari (1), Graham Joe (1), Karl K. Berggren (5), Liang Jiang (6), Matthew Reagor (3), and Marko Loncar (1) ((1) Harvard John A. Paulson School for Engineering and Applied Sciences, Cambridge, MA, USA, (2) Hyperlight Corporation, Cambridge, MA, USA, (3) Rigetti Computing, Berkeley, CA, USA, (4) Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA, USA, (5) Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA, (6) Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA)	(参考訳) 量子科学と技術は、エンタングル状態を分配できる低損失かつ低ノイズの通信チャネルで接続された量子プロセッサのネットワークに依存する、強力な計算資源の実現を約束している [1,2]。極低温環境で動作する超伝導マイクロ波量子ビット (3-8 GHz) は、その強いジョセフソン非線形性と低損失 [3] のために量子プロセッサノードの有望な候補として現れているが、空間的に分離されたプロセッサノード間の情報は、低損失光ファイバを伝搬する通信波長帯の光子 (200 THz) を介して室温で伝達される可能性が高い。したがって、これらの異なる周波数間の量子情報の変換 [4-10] は、量子資源を相互に接続して各プラットフォームの利点を活用するために重要である。ここでは超伝導量子ビットのコヒーレント光制御を示す。我々は、最大1.18%の変換効率(1.16%の協調性)で動作するマイクロ波-光量子トランスデューサを開発し、量子ビットのコヒーレンス時間 (800 ns) に影響を与えずに、超伝導量子ビットにおける光駆動のラビ振動 (2.27 MHz) を実証した。最後に,量子プロセッサノードのネットワーク化に向けたトランスデューサの利用に関する展望について述べる。 Quantum science and technology promise the realization of a powerful computational resource that relies on a network of quantum processors connected with low loss and low noise communication channels capable of distributing entangled states [1,2]. 
While superconducting microwave qubits (3-8 GHz) operating in cryogenic environments have emerged as promising candidates for quantum processor nodes due to their strong Josephson nonlinearity and low loss [3], the information between spatially separated processor nodes will likely be carried at room temperature via telecommunication photons (200 THz) propagating in low loss optical fibers. Transduction of quantum information [4-10] between these disparate frequencies is therefore critical to leverage the advantages of each platform by interfacing quantum resources. Here, we demonstrate coherent optical control of a superconducting qubit. We achieve this by developing a microwave-optical quantum transducer that operates with up to 1.18% conversion efficiency (1.16% cooperativity) and demonstrate optically-driven Rabi oscillations (2.27 MHz) in a superconducting qubit without impacting qubit coherence times (800 ns). Finally, we discuss outlooks towards using the transducer to network quantum processor nodes.	翻訳日:2023-10-26 18:20:23 公開日:2023-10-24
# 深層ニューラルネットワークにおける不変表現の学習による次元の呪いの破れ Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations ( http://arxiv.org/abs/2310.16154v1 ) ライセンス: Link先を確認	Leonardo Petrini	(参考訳) 人工知能、特に機械学習のサブフィールドは、データから学び、データに適応するデータ駆動モデルへとパラダイムシフトしている。このことは、自然言語処理やコンピュータビジョンといった様々な領域において前例のない進歩をもたらした。ディープラーニングは、一連の計算層を通じて生データから関連する特徴を学習することで、従来のアプローチをはるかに超えている。この論文は、これらのモデルのアーキテクチャとそれらが処理するデータ内の固有の構造との関係を研究することによって、ディープラーニングの理論的基礎を探求する。特に、深層学習アルゴリズムの有効性を問うことで、いわゆる次元の呪い(すなわち、次元が増大するデータポイントの必要性が指数関数的に増加することによる、高次元での一般学習の難しさ)を克服できるだろうか? データの構造を利用して、関連する表現を学ぶ能力はあるか? 異なるアーキテクチャはどのように異なるデータ構造を利用するのか? これらの問題に対処するために、データの構造は、その不変性、すなわち、手元にあるタスクに無関係な側面によって効果的に特徴づけられるという考えを推し進める。本手法は,実験研究と物理モデルを組み合わせた深層学習への経験的アプローチを取り入れている。これらの単純化されたモデルは、私たちが深層学習システムで観察する複雑な振る舞いを調査し、解釈し、理論と実践のギャップを埋めることが目的である。 Artificial intelligence, particularly the subfield of machine learning, has seen a paradigm shift towards data-driven models that learn from and adapt to data. This has resulted in unprecedented advancements in various domains such as natural language processing and computer vision, largely attributed to deep learning, a special class of machine learning models. Deep learning arguably surpasses traditional approaches by learning the relevant features from raw data through a series of computational layers. This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. In particular, we ask What drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality-i.e. the difficulty of generally learning functions in high dimensions due to the exponentially increasing need for data points with increased dimensionality? Is it their ability to learn relevant representations of the data by exploiting their structure? How do different architectures exploit different data structures? 
In order to address these questions, we push forward the idea that the structure of the data can be effectively characterized by its invariances-i.e. aspects that are irrelevant for the task at hand. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models. These simplified models allow us to investigate and interpret the complex behaviors we observe in deep learning systems, offering insights into their inner workings, with the far-reaching goal of bridging the gap between theory and practice.	翻訳日:2023-10-26 18:19:41 公開日:2023-10-24
# WojoodNER 2023: アラビア語の最初の名前付きエンティティ認識共有タスク WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task ( http://arxiv.org/abs/2310.16153v1 ) ライセンス: Link先を確認	Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa' Omar	(参考訳) WojoodNER-2023は、最初のアラビア名付きエンティティ認識(NER)共有タスクである。 WojoodNER-2023の主な焦点はアラビア語のNERであり、新しいNERデータセット(すなわちWojood)と異なるNERアプローチ間の有意義な比較を促進するために設計されたサブタスクの定義を提供する。 WojoodNER-2023はFlatNERとNestedNERの2つのサブタスクを含む。合計45のチームがこの共有タスクに登録され、そのうち11チームがテストフェーズに積極的に参加した。具体的には11チームがFlatNERに参加し、8チームがNestedNERに挑戦した。優勝チームのF1スコアは、FlatNERで91.96、NestedNERで93.73であった。 We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful comparisons between different NER approaches. WojoodNER-2023 encompassed two Subtasks: FlatNER and NestedNER. A total of 45 unique teams registered for this shared task, with 11 of them actively participating in the test phase. Specifically, 11 teams participated in FlatNER, while 8 teams tackled NestedNER. The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.	翻訳日:2023-10-26 18:19:08 公開日:2023-10-24
# FLTrojan: 選択的な重み付けによるフェデレーション言語モデルに対するプライバシ漏洩攻撃 FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering ( http://arxiv.org/abs/2310.16152v1 ) ライセンス: Link先を確認	Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz	(参考訳) フェデレーション学習(federated learning, fl)は、言語モデリングを含む多くのテクノロジベースのアプリケーションにおいて、重要なコンポーネントになりつつある。しかし、連合言語モデルにおけるプライバシー漏洩の程度を認識するのは簡単ではなく、既存の攻撃は、それがどれほど敏感であるか、あるいは無意味であるかに関わらず、データを抽出することだけを目的としている。本稿では,このギャップを埋めるため,フェデレーション言語モデルからプライバシに敏感なユーザデータを漏洩する2つの新たな知見を提案する。まず、FLの中間ラウンドからのモデルスナップショットが、最終的なトレーニングモデルよりも大きなプライバシリークを引き起こす可能性があることを重要視する。第2に、センシティブなトレーニングデータを記憶する責任を特に負うモデルの選択的な重みを改ざんすることで、プライバシの漏洩が増大する可能性があることを特定する。悪意のあるクライアントが、サーバからの協力なしに、FL内の他のユーザのプライバシーに敏感なデータを漏洩させる方法を示す。提案手法は, メンバシップ推定のリコールを29%向上させ, 最大70%のプライベートデータ再構成を達成し, 敵の能力の強い仮定で既存の攻撃よりも優れていた。 Federated learning (FL) is becoming a key component in many technology-based applications including language modeling -- where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. 
Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.	翻訳日:2023-10-26 18:18:45 公開日:2023-10-24
# Yin Yang Convolutional Nets:Opposites解析による画像マニフォールド抽出 Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites ( http://arxiv.org/abs/2310.16148v1 ) ライセンス: Link先を確認	Augusto Seben da Rosa, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior	(参考訳) コンピュータビジョンは、トレーニング最適化、新しいアーキテクチャ(純粋注意、効率的なブロック、視覚言語モデル、生成モデルなど)など、いくつかの進歩を示した。これにより、分類などのいくつかのタスクのパフォーマンスが向上した。しかし、これらのモデルの大部分は、脳に関する現実的な神経科学的アプローチから遠ざかっている修正に焦点を当てている。本研究では,視覚多様体を抽出するアーキテクチャであるYin Yang Convolutional Networkを紹介する。我々のアーキテクチャは,データセットCIFAR-10の低パラメータアーキテクチャ間で,最先端の効率を提供することを示す。最初のモデルは93.32%のテスト精度に達し、このカテゴリの従来のSOTAよりも0.8%高く、パラメータは15万個少ない(合計726k)。第2のモデルは52kパラメータを使用し、テスト精度の低下はわずか3.86%です。 ImageNetでも分析を行い、1.6Mパラメータで66.49%の検証精度を達成しました。コードはhttps://github.com/NoSavedDATA/YinYang_CNNで公開しています。 Computer vision has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision-language models, and generative models, among others). These have improved performance in several tasks, such as classification. However, the majority of these models focus on modifications that move away from realistic neuroscientific approaches to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts the visual manifold; its blocks are intended to separate the analysis of colors and forms in its initial layers, simulating the operations of the occipital lobe. Our results show that our architecture provides state-of-the-art efficiency among low-parameter architectures on the CIFAR-10 dataset. Our first model reached 93.32% test accuracy, 0.8% more than the previous SOTA in this category, while having 150k fewer parameters (726k in total). 
Our second model uses 52k parameters, losing only 3.86% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.	翻訳日:2023-10-26 18:18:21 公開日:2023-10-24
# PreWoMe: ロングフォーム質問回答のためのワーキングメモリとしての前提事項のエクスプロイト PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering ( http://arxiv.org/abs/2310.16147v1 ) ライセンス: Link先を確認	Wookje Han, Jinsol Park, Kyungjae Lee	(参考訳) 長文質問応答(LFQA)における情報探索質問は、その質問の曖昧さや偽の前提によって誤解を招くことが多い。既存の多くのアプローチは誤解を招く問題に対処するが、予測不可能な入力特性を持つ現実世界では不十分な限られた問題に適応している。本研究では,任意の種類の情報探索問題に対処できる統一的なアプローチであるPreWoMeを提案する。 PreWoMeのキーとなるアイデアは、質問の前提を抽出し、それらをワーキングメモリとして利用して、質問に対するフィードバックとアクションを生成することである。実験の結果,PreWoMeは誤解を招く質問に対処するだけでなく,通常の質問に対処する上でも有効であることがわかった。 Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.	翻訳日:2023-10-26 18:17:59 公開日:2023-10-24
# Clinfo.ai: 学術文献を用いた医学質問応答のためのオープンソースの検索型大規模言語モデルシステム Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature ( http://arxiv.org/abs/2310.16146v1 ) ライセンス: Link先を確認	Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, and Nigam Shah	(参考訳) 出版される医学文献の急速な発展は、臨床医や研究者が最新の関連する発見をタイムリーに追従し、要約することを困難にしている。大規模言語モデル(LLM)に基づくいくつかのクローズドソース要約ツールが存在するが、その出力の厳密で体系的な評価は欠如している。さらに、これらのツールを評価するための高品質なデータセットと適切なベンチマークタスクが不足している。 The quickly expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. 
We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.	翻訳日:2023-10-26 18:17:45 公開日:2023-10-24
# 限定記憶能力を持つ言語モデルによる人間の文処理における干渉 A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing ( http://arxiv.org/abs/2310.16142v1 ) ライセンス: Link先を確認	William Timkey, Tal Linzen	(参考訳) 人間の文処理の難易度を左右する2つの要因は、作業記憶からの期待と検索である。最近の統合認知モデル作成の試みは、トランスフォーマー言語モデルの自己愛機構と、人間の文処理における作業記憶のcueに基づく検索理論(ryuとlewis 2021)との並列性に依拠している。 ryuとlewisは、gpt-2の特殊注意ヘッドの注意パターンが類似性に基づく干渉、すなわちcueに基づく検索モデルの鍵となる予測と一致していることを示したが、それらの方法は構文的に特殊な注意ヘッドを識別することを必要とし、数百のメモリ検索操作が並行して行われるという認知的に予測不能な仮定を与える。本研究は,認知理論によって仮定される記憶系とより密接に類似した,単一の自己注意頭部を持つ反復型ニューラルネットワークモデルを開発する。本モデルでは,人間の実験で観察された意味的および構文的干渉効果を捉える。 Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.	翻訳日:2023-10-26 18:17:30 公開日:2023-10-24
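The analogy between a single self-attention head and cue-based memory retrieval can be made concrete with a toy sketch. The code below is only an illustration under stated assumptions: it uses generic scaled dot-product attention, and the function name `attention_retrieve`, the two-feature encoding of memory items, and the example vectors are all hypothetical, not the model or stimuli from the paper.

```python
import math

def attention_retrieve(query, keys, values):
    """Single-head scaled dot-product attention over stored memory items.

    query  : retrieval cue (list of floats)
    keys   : one feature vector per memory item
    values : one content vector per memory item
    Returns the attention weights and the weighted readout.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    readout = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, readout

# Hypothetical two-feature cue: the retrieval target matches both features.
cue = [1.0, 1.0]
target = [1.0, 1.0]
dissimilar_distractor = [0.0, 0.0]  # shares no feature with the cue
similar_distractor = [0.0, 1.0]     # shares one feature with the cue
contents = [[1.0, 0.0], [0.0, 1.0]]  # one-hot content per memory item

w_easy, _ = attention_retrieve(cue, [target, dissimilar_distractor], contents)
w_hard, _ = attention_retrieve(cue, [target, similar_distractor], contents)
```

In this toy setting, the distractor that partially matches the cue draws attention weight away from the target (`w_hard[0] < w_easy[0]`), which is the qualitative signature of similarity-based interference described in the abstract.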
# 隠れたサイテーションが科学に本当の影響を与えている Hidden Citations Obscure True Impact in Science ( http://arxiv.org/abs/2310.16181v1 ) ライセンス: Link先を確認	Xiangyi Meng, Onur Varol, Albert-László Barabási	(参考訳) 参照科学者が以前の知識に依拠するメカニズムは、近年広く使われて誤用された科学的影響の尺度へと変化している。しかし、発見が常識となると、引用は法人化によって消滅する。これは隠れた引用の概念につながり、それを具現化した出版物に言及することなく、発見への明確なテキストクレジットを表す。ここでは,各論文の全文に適用した教師なしの解釈可能な機械学習を用いて,隠れた引用を体系的に識別する。出版場所や規律に関係なく出現する,影響力のある発見や隠された引用数が引用数を上回っていることが判明した。引用数ではなく,写本の本文中の話題に関する談話の程度から判断し,より議論が深まるほど,標準書誌分析の可視性が低下することを示した。隠れた引用は、文献測度が発見の真の影響を定量化するための限られた視点を与え、科学的コーパスの全文から知識を抽出する必要性を高めていることを示している。 References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.	翻訳日:2023-10-26 18:09:11 公開日:2023-10-24
# 逆追跡による補正は要約における幻覚を減少させる Correction with Backtracking Reduces Hallucination in Summarization ( http://arxiv.org/abs/2310.16176v1 ) ライセンス: Link先を確認	Zhenzhen Liu, Chao Wan, Varsha Kishore, Jin Peng Zhou, Minmin Chen, Kilian Q. Weinberger	(参考訳) 抽象要約は、重要な要素を保持しながら簡潔なソースドキュメントの自然言語要約を生成することを目的としている。近年の進歩にもかかわらず、ニューラルネットワークの要約モデルは、ソースドキュメントに基礎を置かない詳細の要約を生成させる幻覚(またはより正確に表現する)の影響を受けやすいことが知られている。本稿では,抽象的な要約における幻覚を低減するため,シンプルだが効率的な手法であるCoBaを紹介する。アプローチは幻覚検出と緩和という2つのステップに基づいている。前者は条件付き単語の確率と文脈語の距離に関する単純な統計値を測定することで達成可能であることを示す。さらに,ストレートフォワードバックトラッキングが驚くほど効果的であることを示す。テキスト要約のための3つのベンチマークデータセットに対して,先行技術を用いて提案手法を徹底的に評価した。その結果,CoBaは幻覚の低減に有効かつ効率的であり,適応性と柔軟性に優れていた。 Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.	翻訳日:2023-10-26 18:08:52 公開日:2023-10-24
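The detect-then-backtrack idea in the CoBa abstract can be sketched with a toy decoder. This is a minimal sketch under loud assumptions, not the authors' implementation: the detector here (flag a token when its conditional probability is below a threshold and it does not occur in the source) is a simplified stand-in for the statistics the abstract mentions, and `decode_with_backtracking`, `toy_model`, `THRESHOLD`, and the probability table are all hypothetical.

```python
def decode_with_backtracking(next_candidates, is_flagged, max_len, eos="</s>"):
    """Greedy decoding that steps back when every continuation is flagged.

    next_candidates(prefix) -> {token: conditional probability}
    is_flagged(prefix, token, prob) -> True if the token looks hallucinated
    """
    prefix = []
    banned = [set()]  # banned[i] holds tokens ruled out at position i
    while len(prefix) < max_len:
        options = [
            (prob, tok)
            for tok, prob in next_candidates(tuple(prefix)).items()
            if tok not in banned[len(prefix)] and not is_flagged(prefix, tok, prob)
        ]
        if options:
            _, tok = max(options)  # most probable unflagged token
            if tok == eos:
                break
            prefix.append(tok)
            banned.append(set())
        elif prefix:
            # Dead end: undo the last token and forbid it at that position.
            banned.pop()
            banned[-1].add(prefix.pop())
        else:
            break  # nothing acceptable even at the first position
    return prefix

# Hypothetical source document and next-token table for illustration only.
SOURCE = {"the", "cat", "sat"}
THRESHOLD = 0.5
TABLE = {
    (): {"the": 0.9},
    ("the",): {"dog": 0.6, "cat": 0.4},
    ("the", "dog"): {"barked": 0.3, "ran": 0.2},
    ("the", "cat"): {"sat": 0.7, "</s>": 0.3},
    ("the", "cat", "sat"): {"</s>": 0.8},
}

def toy_model(prefix):
    return TABLE.get(prefix, {"</s>": 1.0})

def low_prob_and_ungrounded(prefix, token, prob):
    return prob < THRESHOLD and token not in SOURCE

summary = decode_with_backtracking(toy_model, low_prob_and_ungrounded, max_len=10)
print(summary)  # ['the', 'cat', 'sat']
```

In this example the decoder first commits to "dog", finds that every continuation is flagged, backtracks, bans "dog" at that position, and continues with the source-grounded "cat".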
# G-CASCADE:2次元医用画像分割のための効率的なカスケードグラフ畳み込みデコーディング G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation ( http://arxiv.org/abs/2310.16175v1 ) ライセンス: Link先を確認	Md Mostafijur Rahman and Radu Marculescu	(参考訳) 近年,医療画像分割は,コンピュータ支援診断の分野において重要な応用となっている。本稿では,2次元医用画像分割のための新しいグラフ畳み込み型デコーダであるカスケードグラフ畳み込み注意デコーダ(g-cascade)を提案する。 G-CASCADEは、効率的なグラフ畳み込みブロックを持つ階層変換器エンコーダによって生成される多段特徴写像を徐々に洗練する。エンコーダはセルフアテンション機構を利用して長距離依存関係をキャプチャし、デコーダはグラフ畳み込みブロックのグローバル受容フィールドによる長距離情報を保存する特徴マップを洗練する。複数のトランスフォーマーエンコーダを用いたデコーダの厳密な評価は,5つの医用画像分割作業(腹部臓器,心臓臓器,ポリープ病変,皮膚病変,網膜血管)において,我々のモデルが他のSOTA法よりも優れていることを示している。また,パラメータが80.8%少なく,FLOPが82.3%少ないSOTA CASCADEデコーダよりも優れたDICEスコアが得られることを示す。我々のデコーダは他の階層エンコーダと簡単に使用でき、汎用的セマンティックおよび医用画像セグメンテーションタスクに利用できる。 In recent years, medical image segmentation has become an important application in the field of computer-aided diagnosis. In this paper, we are the first to propose a new graph convolution-based decoder namely, Cascaded Graph Convolutional Attention Decoder (G-CASCADE), for 2D medical image segmentation. G-CASCADE progressively refines multi-stage feature maps generated by hierarchical transformer encoders with an efficient graph convolution block. The encoder utilizes the self-attention mechanism to capture long-range dependencies, while the decoder refines the feature maps preserving long-range information due to the global receptive fields of the graph convolution block. Rigorous evaluations of our decoder with multiple transformer encoders on five medical image segmentation tasks (i.e., Abdomen organs, Cardiac organs, Polyp lesions, Skin lesions, and Retinal vessels) show that our model outperforms other state-of-the-art (SOTA) methods. We also demonstrate that our decoder achieves better DICE scores than the SOTA CASCADE decoder with 80.8% fewer parameters and 82.3% fewer FLOPs. Our decoder can easily be used with other hierarchical encoders for general-purpose semantic and medical image segmentation tasks.	
翻訳日:2023-10-26 18:08:39 公開日:2023-10-24
# フォトニック状態の量子幾何学による電磁非相反性の探索 Probing Electromagnetic Nonreciprocity with Quantum Geometry of Photonic States ( http://arxiv.org/abs/2310.16174v1 ) ライセンス: Link先を確認	Ioannis Petrides, Jonathan B. Curtis, Marie Wesson, Amir Yacoby, Prineha Narang	(参考訳) 誘電体および磁性材料における相互および非相互効果は、電子の微視的性質に関する重要な情報を提供する。しかし、この2つを実験的に区別することは、特に関連する効果が極めて小さい場合に困難であることが証明されている。そこで本研究では,関心のある材料を中心に配置したクロスキャビティデバイスを用いた非接触検出を提案する。本稿では, キャビティの電磁モード間の結合と共振周波数のシフトに, Kerr や Faraday などの材料の光学特性, 複屈折が現れることを示す。幾何学的フォトニック状態のダイナミクスを計算することにより、量子メトリックおよび量子プロセストモグラフィーに基づいて、物質の複素屈折率の個々の成分を分離し、関連するパラメータ推定の分散に対する量子力学的クラメール・ラオ限界を最小化する計測プロトコルを定式化する。本手法は,光キャビティにおけるフォック状態,マイクロ波およびTHz共振器におけるコヒーレント状態など,幅広い実験プラットフォームに適用可能であることが期待される。 Reciprocal and nonreciprocal effects in dielectric and magnetic materials provide crucial information about the microscopic properties of electrons. However, experimentally distinguishing the two has proven to be challenging, especially when the associated effects are extremely small. To this end, we propose a contact-less detection using a cross-cavity device where a material of interest is placed at its centre. We show that the optical properties of the material, such as Kerr and Faraday rotation or birefringence, manifest in the coupling between the cavities' electromagnetic modes and in the shift of their resonant frequencies. By calculating the dynamics of a geometrical photonic state, we formulate a measurement protocol based on the quantum metric and quantum process tomography that isolates the individual components of the material's complex refractive index and minimizes the quantum mechanical Cramér-Rao bound on the variance of the associated parameter estimation. Our approach is expected to be applicable across a broad spectrum of experimental platforms, including Fock states in optical cavities or coherent states in microwave and THz resonators.	翻訳日:2023-10-26 18:08:18 公開日:2023-10-24
# $\epsilon$-Greedyによる深部Q-Networksの収束とサンプル複雑度解析について On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration ( http://arxiv.org/abs/2310.16173v1 ) ライセンス: Link先を確認	Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury	(参考訳) 本稿では,深層強化学習における$\varepsilon$-greedyによるDQN(Deep Q-Network)の理論的理解を提供する。 DQNの壮大な経験的成果にもかかわらず、その理論的特徴は未解明のままである。まず、探査戦略は非現実的か既存の分析で無視される。第2に、従来のQ-ラーニングアルゴリズムとは対照的に、DQNはターゲットネットワークと経験リプレイを使用して、Q-ネットワークのトレーニングに使用する平均2乗ベルマン誤差(MSBE)のバイアスのない推定値を取得する。しかし、dqnsの既存の理論解析では収束解析が欠如しており、計算効率に乏しい超パラメータニューラルネットワークを配置することで技術的な課題を回避している。本稿では,DQNの実用的設定を$\epsilon$-greedyポリシーを用いて理論的収束とサンプル複雑性解析を行う。減衰$\epsilon$が最適Q値関数に幾何学的に収束する反復手順を証明する。さらに、$\epsilon$値のより高いレベルは収束領域を拡大するが収束を遅くするが、反対のレベルは$\epsilon$値の低レベルである。実験はdqnsの確立した理論的洞察を正当化する。 This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. 
Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.	翻訳日:2023-10-26 18:07:58 公開日:2023-10-24
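As a rough illustration of the $\epsilon$-greedy exploration with decaying $\epsilon$ discussed in the abstract above, here is a minimal sketch. The geometric decay schedule, its default constants, and the helper names `epsilon_at` and `epsilon_greedy` are assumptions chosen for illustration; the abstract does not specify the schedule analyzed in the paper.

```python
import random

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay=0.99):
    """Exploration rate after `step` updates: geometric decay with a floor.

    The schedule and its constants are illustrative assumptions.
    """
    return max(eps_end, eps_start * decay ** step)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the largest estimated Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: Q-value estimates for three actions in some state.
rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
actions = [epsilon_greedy(q_values, epsilon_at(t), rng) for t in (0, 50, 500)]
```

Early steps (large $\epsilon$) visit more of the action space, while late steps mostly exploit the current Q-estimates; this is the trade-off between the size of the convergence region and the convergence speed that the abstract describes.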
# iNVS:新しいビュー合成のための拡散塗料の再利用 iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis ( http://arxiv.org/abs/2310.16167v1 ) ライセンス: Link先を確認	Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski	(参考訳) 単一ソース画像から一貫した新しいビューを生成する方法を提案する。本手法は,画像からの可視画素の再利用を最大化する。これを実現するために,光源ビューから対象ビューへ可視画素を転送する単眼深度推定器を用いる。事前学習した2次元インペインティング拡散モデルから始めて,大規模オブジャバースデータセットを用いて3次元オブジェクトの事前学習を行う。トレーニング中は、エピポーラ線に基づく新しいマスキング機構を使用して、アプローチの質をさらに向上する。これにより、さまざまなオブジェクトに対してゼロショットの新規ビュー合成を行うことができる。 Google Scanned Objects、Ray Traced Multiview、Common Objectsの3つの挑戦的なデータセットでフレームワークのゼロショット能力を評価する。詳細は、私たちのWebページを参照してください。 We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaverse dataset to learn 3D object priors. While training we use a novel masking mechanism based on epipolar lines to further improve the quality of our approach. This allows our framework to perform zero-shot novel view synthesis on a variety of objects. We evaluate the zero-shot abilities of our framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D. See our webpage for more details: https://yashkant.github.io/invs/	翻訳日:2023-10-26 18:07:39 公開日:2023-10-24
# Brainchop:次世代Webベースのニューロイメージングアプリケーション Brainchop: Next Generation Web-Based Neuroimaging Application ( http://arxiv.org/abs/2310.16162v1 ) ライセンス: Link先を確認	Mohamed Masoud, Pratyush Reddy, Farfalla Hu, and Sergey Plis	(参考訳) ブラウザ内でのボリューム画像処理、特に医療データを直接行うことは、従来のバックエンドツールと比較して前例のない課題である。これらの課題は、制約付き計算リソースやフロントエンド機械学習ライブラリの可用性など、ブラウザ環境に固有の制限から生じる。その結果、エンドユーザーデータのプライバシと居住性を維持しつつ、脳全体の前処理とセグメンテーションに包括的なエンドツーエンドソリューションを提供することができる、神経画像フロントエンドツールが不足している。この状況を踏まえて、brainchop(http://www.brainchop.org)を、事前訓練されたフル脳深層学習モデルを使用して、構造mriのボリューム分析を可能にする画期的なブラウザ内神経イメージングツールとして紹介します。データプライバシに関するコミットメントに加えて、このフロントエンドツールはスケーラビリティ、低レイテンシ、ユーザフレンドリな操作、クロスプラットフォーム互換性、アクセシビリティ向上など、複数の機能を提供する。本稿では,brainchopの処理パイプラインを概説し,各種ソフトウェアおよびハードウェア構成におけるモデルの性能評価を行う。その結果,webブラウザのリソース制約環境においても,ロバストなメッシュネットアーキテクチャにより,ボリュームデータに対するクライアント側処理の実用性が示された。 Performing volumetric image processing directly within the browser, particularly with medical data, presents unprecedented challenges compared to conventional backend tools. These challenges arise from limitations inherent in browser environments, such as constrained computational resources and the availability of frontend machine learning libraries. Consequently, there is a shortage of neuroimaging frontend tools capable of providing comprehensive end-to-end solutions for whole brain preprocessing and segmentation while preserving end-user data privacy and residency. In light of this context, we introduce Brainchop (http://www.brainchop.org) as a groundbreaking in-browser neuroimaging tool that enables volumetric analysis of structural MRI using pre-trained full-brain deep learning models, all without requiring technical expertise or intricate setup procedures. Beyond its commitment to data privacy, this frontend tool offers multiple features, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility. 
This paper outlines the processing pipeline of Brainchop and evaluates the performance of models across various software and hardware configurations. The results demonstrate the practicality of client-side processing for volumetric data, owing to the robust MeshNet architecture, even within the resource-constrained environment of web browsers.	翻訳日:2023-10-26 18:07:24 公開日:2023-10-24
# MyriadAL: 病理学のためのアクティブショットラーニング MyriadAL: Active Few Shot Learning for Histopathology ( http://arxiv.org/abs/2310.16161v1 ) ライセンス: Link先を確認	Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, and Xingyu Li	(参考訳) アクティブラーニング(AL)とFew Shot Learning(FSL)は,近年,優れた成果を上げているラベル効率のよい2つの手法である。しかし、両方の学習パラダイムにおけるほとんどの先行技術は、膨大な未学習データの富を探索することができない。本研究では,アノテーションの予算が非常に限られているが,目的とするタスクにラベルなしのデータが大量に含まれている場合に,この問題に対処する。この研究は、ラベリングが禁止的に高価である病理組織学の文脈で行われます。そこで,本研究では,ループ内のコントラスト学習エンコーダ,擬似ラベル生成,新規クエリサンプル選択などを含む,能動的少数ショット学習フレームワークであるmyriad active learning (mal)を提案する。具体的には、得られたデータ表現とクラスタリング知識が基礎を形成してalループを活性化する自己教師あり方式で、ラベルなしデータをマッサージする。各ALサイクルのオラクルからのフィードバックにより、エンコーダの上の浅いタスク固有ネットを最適化することにより、未ラベルデータの擬似ラベルを洗練する。これらの更新された擬似ラベルは、アクティブな学習クエリ選択プロセスの通知と改善に役立つ。さらに,既存の不確実性対策を組み合わせて,不確実性リスト全体を活用し,alのサンプル冗長性を低減するための新しいレシピを提案する。 2つの公開病理組織学データセットに関する広範な実験により、malは以前の研究よりも優れたテスト精度、マクロf1-スコア、ラベル効率を示し、データセットのわずか5%をラベル付けしながら、完全な教師付きアルゴリズムと同等のテスト精度を達成できることが示された。 Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods which have achieved excellent results recently. However, most prior arts in both learning paradigms fail to explore the wealth of the vast unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), including a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to massage unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis to activate the AL loop. 
With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels serve to inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe to combine existing uncertainty measures and utilize the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.	翻訳日:2023-10-26 18:07:00 公開日:2023-10-24
# 高重み安定化器を用いたトーリック符号の単発誤差補正 Single-shot error correction on toric codes with high-weight stabilizers ( http://arxiv.org/abs/2310.16160v1 ) ライセンス: Link先を確認	Yingjia Lin, Shilin Huang, Kenneth R. Brown	(参考訳) 量子エラー訂正符号の場合、要求される測定ラウンドの数は通常、測定が故障した場合の符号距離とともに増加する。単発エラー訂正では、コードサイズに関係なく1ラウンドのノイズシンドローム測定でエラーしきい値を設定することができる。ここでは、トーリックコードのシングルショットチェック演算子を実装します。シングルショットチェックは、Campbell [Campbell, 2019] に従い、ガウス消去法によって構築される。単発チェック演算子は、ノイズ測定による誤差モデルに対して5.62%の持続しきい値となり、複数ラウンドのノイズのある測定を用いる従来のトーリック符号のチェック演算子を上回る。この変換のコストは非局所的な高重安定化器発生器である。次に,安定度重みで測定誤差を増大させるゲートに基づく誤差モデルを検討する。ここでは、単発のしきい値の振る舞いは見つからず、代わりに、コードファミリが固定エラー率に対して最適なコードサイズを持つことを見つけます。この誤差モデルでは、複数の測定値を持つ従来のチェック演算子は論理誤差率を低くする。 For quantum error correction codes the required number of measurement rounds typically increases with the code distance when measurements are faulty. Single-shot error correction allows for an error threshold with only one round of noisy syndrome measurements regardless of the code size. Here we implement single-shot check operators for toric codes. The single-shot checks are constructed by Gaussian elimination following Campbell [Campbell, 2019]. The single-shot check operators result in a sustainable threshold at 5.62% for an error model with noisy measurements, outperforming the conventional toric code check operators with multiple rounds of noisy measurement. The cost of the transformation is non-local high-weight stabilizer generators. We then consider a gate-based error model that leads to increased measurement error with stabilizer weight. Here we find no single-shot threshold behavior and instead find the code family will have an optimal code size for a fixed error rate. For this error model, the conventional check operators with multiple measurements yield a lower logical error rate.	翻訳日:2023-10-26 18:06:32 公開日:2023-10-24
# 固定イジング結合を有する超伝導または半導体スピン量子ビット配列に対するロバスト形状パルス Robust shaped pulses for arrays of superconducting or semiconductor spin qubits with fixed Ising coupling ( http://arxiv.org/abs/2310.16159v1 ) ライセンス: Link先を確認	David W. Kanaar and J. P. Kestner	(参考訳) 固体量子コンピューティングにおける現在の大きな課題は、量子ビット配列をより多くの量子ビットに拡張することである。これは、これらの配列内の独立に調整可能な多数の量子ビット間カップリングに対する制御配線の複雑さによって妨げられる。問題を単純化する1つのアプローチは、固定Ising(ZZ$)相互作用を持つqubit配列を使用することである。そのようなシステムにおいて、量子ビットの特定の部分集合を同時に駆動するとき、ダイナミクスは、$\mathfrak{su}$(2) 部分代数の集合に制限される。これらの$\mathfrak{su}$(2)sの中で、x$-gatesと$\frac{\pi}{2}$$zz$ローテーションを、トランスモン量子ビットにおけるエラーの主な原因である漏洩や、フラックスや半導体スピン量子ビットにおける不確かさの主な源である結合ゆらぎに対して頑健に行う方法を説明します。これらのゲートと仮想$z$ゲートは、量子コンピューティングのための普遍的なゲートセットを形成する。超伝導量子ビットおよび半導体スピン量子ビットアレイを構成する2辺,3辺,4辺の頂点に対して,このロバストゲートセットを構築する。 A major current challenge in solid-state quantum computing is to scale qubit arrays to a larger number of qubits. This is hampered by the complexity of the control wiring for the large number of independently tunable interqubit couplings within these arrays. One approach to simplifying the problem is to use a qubit array with fixed Ising ($ZZ$) interactions. When simultaneously driving a specific subset of qubits in such a system, the dynamics are confined to a set of commuting $\mathfrak{su}$(2) subalgebras. Within these $\mathfrak{su}$(2)s we describe how to perform $X$-gates and $\frac{\pi}{2}$ $ZZ$ rotations robustly against either leakage, which is the main source of error in transmon qubits, or coupling fluctuations, which is the main source of infidelity in flux or semiconductor spin qubits. These gates together with virtual-$z$ gates form a universal set of gates for quantum computing. We construct this set of robust gates for two-edge, three-edge, and four-edge vertices, which compose all existing superconducting qubit and semiconductor spin qubit arrays.	翻訳日:2023-10-26 18:06:17 公開日:2023-10-24
# GPU組み込みシステムのパフォーマンスチューニング:マシンラーニングと解析モデル駆動チューニング手法 Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies ( http://arxiv.org/abs/2310.16214v1 ) ライセンス: Link先を確認	Adrian Perez Dieguez, Margarita Amor Lopez	(参考訳) GPU組み込みシステムは、効率的な電力消費のために、様々な領域で人気を集めている。しかし、これらのシステム上で動作するリアルタイムまたは時間を要するアプリケーションの要求を満たすためには、高いパフォーマンスを示すように調整することが不可欠である。本稿では,GPU組み込みシステム上での2つのチューニング手法の開発と比較による課題に対処するとともに,これらのアーキテクチャ上で動作するアプリケーションの最適化を目指す開発者や研究者に対して,パフォーマンス上の洞察を提供する。我々は、FFT、スキャンプリミティブ、および多くのアプリケーションにおいて性能クリティカルなコンポーネントである三角形システムソルバなどの並列プレフィックス演算に焦点を当てる。本研究は,分析モデル駆動型チューニング手法と機械学習(ML)に基づくチューニング手法を紹介する。 NVIDIA JetsonシステムにおけるBPLGライブラリの異なる並列プレフィックス実装のための2つのチューニング手法の性能評価を行い、その性能を網羅的な探索によって達成されたものと比較した。この発見は、サーバと組み込みデバイス間の主要な計算パターンのパフォーマンスポータビリティに関するオープンな課題に対処するための最良の戦略を明らかにし、オフラインおよびオンラインチューニングの実践的なガイダンスを提供した。また,CUSPARSE,CUB,CUFFTなどの最先端ライブラリとBPLGの性能を比較し,GPU組み込みシステムにおける並列計算パターンに関する既存の研究のギャップにも対処する。 GPU-embedded systems have gained popularity across various domains due to their efficient power consumption. However, in order to meet the demands of real-time or time-consuming applications running on these systems, it is crucial for them to be tuned to exhibit high performance. This paper addresses the issue by developing and comparing two tuning methodologies on GPU-embedded systems, and also provides performance insights for developers and researchers seeking to optimize applications running on these architectures. We focus on parallel prefix operations, such as FFT, scan primitives, and tridiagonal system solvers, which are performance-critical components in many applications. The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system, and compare their performance to the ones achieved through an exhaustive search. 
The findings shed light on the best strategies for handling the open challenge of performance portability for major computational patterns among server and embedded devices, providing practical guidance for offline and online tuning. We also address the existing gap in performance studies for parallel computational patterns in GPU-embedded systems by comparing the BPLG performance against other state-of-the-art libraries, including CUSPARSE, CUB, and CUFFT.	翻訳日:2023-10-26 18:00:26 公開日:2023-10-24
# シャドウセンス:RGB熱ドローン画像からのシャドウ非依存樹冠検出のための教師なしドメイン適応と特徴融合 ShadowSense: Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic Tree Crown Detection from RGB-Thermal Drone Imagery ( http://arxiv.org/abs/2310.16212v1 ) ライセンス: Link先を確認	Rudraksh Kapil, Seyed Mojtaba Marvasti-Zadeh, Nadir Erbilgin, Nilanjan Ray	(参考訳) リモートセンシングデータから個々の樹冠の正確な検出は、森林天蓋の密集した性質と、重複する天蓋、閉塞、および様々な照明条件など様々な環境変化の存在により、大きな課題となる。さらに、ロバストモデルのトレーニングのためのデータ不足は、複雑な森林条件を効果的に研究する上で、別の制限を加える。本稿では,新しい陰影樹冠検出法を提案し,約50k対のrgb熱画像を含む難解なデータセットを提供する。提案手法(ShadowSense)は完全に自己教師型であり,特徴抽出のためのソースドメインアノテーションと,特徴ピラミッドネットワークのための前景特徴アライメントを使わずに,それぞれ目に見える前景領域に着目してドメイン不変表現を適応させる。そして、両方のモードの補完情報を融合し、rgbで訓練された検出器の予測を効果的に改善し、全体的な精度を高める。広汎な実験は、ベースラインRGB訓練検出器と、教師なし領域適応や早期画像融合に依存する最先端技術の両方よりも提案手法が優れていることを示す。私たちのコードとデータは https://github.com/rudrakshkapil/ShadowSense で利用可能です。 Accurate detection of individual tree crowns from remote sensing data poses a significant challenge due to the dense nature of forest canopy and the presence of diverse environmental variations, e.g., overlapping canopies, occlusions, and varying lighting conditions. Additionally, the lack of data for training robust models adds another limitation in effectively studying complex forest conditions. This paper presents a novel method for detecting shadowed tree crowns and provides a challenging dataset comprising roughly 50k paired RGB-thermal images to facilitate future research for illumination-invariant detection. The proposed method (ShadowSense) is entirely self-supervised, leveraging domain adversarial training without source domain annotations for feature extraction and foreground feature alignment for feature pyramid networks to adapt domain-invariant representations by focusing on visible foreground regions, respectively. It then fuses complementary information of both modalities to effectively improve upon the predictions of an RGB-trained detector and boost the overall accuracy. 
Extensive experiments demonstrate the superiority of the proposed method over both the baseline RGB-trained detector and state-of-the-art techniques that rely on unsupervised domain adaptation or early image fusion. Our code and data are available: https://github.com/rudrakshkapil/ShadowSense	翻訳日:2023-10-26 18:00:03 公開日:2023-10-24
# 深層学習による衛星ハイパースペクトル画像の海とクラウドのセグメンテーション Sea-Land-Cloud Segmentation in Satellite Hyperspectral Imagery by Deep Learning ( http://arxiv.org/abs/2310.16210v1 ) ライセンス: Link先を確認	Jon Alvarez Justo, Joseph Landon Garrett, Mariana-Iuliana Georgescu, Jesus Gonzalez-Llorente, Radu Tudor Ionescu, Tor Arne Johansen	(参考訳) 衛星は、エッジ推論を通じてプラットフォームの自律性を高めるために、オンボード人工知能(AI)技術の採用が増えている。この文脈において,hs衛星画像のセグメンテーションにおける深層学習(dl)技術の利用は,リモートセンシング応用に有利であり,本研究では,海洋(海),陸(陸),雲形成に焦点をあてた,hs画像のオンボードマルチクラスセグメンテーションに関連があると考えられる16種類の異なるモデルを訓練する。我々は,海陸クラウドセグメンテーションの実証事例としてHYPSO-1ミッションを採用し,その有効性を示すために,新しい海陸クラウドランキングアプリケーションシナリオを導入する。本システムでは, セグメント画像から海, 陸, 雲の濃度に基づいて, HS画像のダウンリンクを優先する。性能,パラメータ数,推測時間を考慮して,軌道内配置のモデルを比較的評価した。モデルには浅部モデルと深部モデルの両方が含まれており、新たに4つのDLモデルを提案すると、スペクトル(1D)と空間(2D)の両方のコンテキストからなる1つのスペクトルシグネチャ(1D)のセグメンテーションが3Dデータ処理より優れていることを示す。 1D-Justo-LiuNet と呼ばれる軽量DLモデルは,U-Net などの海面-クラウドセグメンテーションの最先端モデルを,性能 (0.93 精度) とパラメータ数 (4,563) で一貫して上回っている。しかし、1Dモデルは、テストされた処理アーキテクチャにおいて、明らかに準最適である15秒の推論時間を示す。最後に、軌道内画像のセグメンテーションは生データではなく、L1bの放射率キャリブレーション後に起こることを実証した後、より弱いセグメンテーション性能を犠牲にして、スペクトルチャネルを3つのモデルのパラメータと推論時間に下げることも示す。 Satellites are increasingly adopting on-board Artificial Intelligence (AI) techniques to enhance platforms' autonomy through edge inference. In this context, the utilization of deep learning (DL) techniques for segmentation in HS satellite imagery offers advantages for remote sensing applications, and therefore, we train 16 different models, whose codes are made available through our study, which we consider to be relevant for on-board multi-class segmentation of HS imagery, focusing on classifying oceanic (sea), terrestrial (land), and cloud formations. We employ the HYPSO-1 mission as an illustrative case for sea-land-cloud segmentation, and to demonstrate the utility of the segments, we introduce a novel sea-land-cloud ranking application scenario. Our system prioritizes HS image downlink based on sea, land, and cloud coverage levels from the segmented images. 
We comparatively evaluate the models for in-orbit deployment, considering performance, parameter count, and inference time. The models include both shallow and deep models, and after we propose four new DL models, we demonstrate that segmenting single spectral signatures (1D) outperforms 3D data processing comprising both spectral (1D) and spatial (2D) contexts. We conclude that our lightweight DL model, called 1D-Justo-LiuNet, consistently surpasses state-of-the-art models for sea-land-cloud segmentation, such as U-Net and its variations, in terms of performance (0.93 accuracy) and parameter count (4,563). However, the 1D models present longer inference time (15s) in the tested processing architecture, which is clearly suboptimal. Finally, after demonstrating that in-orbit image segmentation should occur post L1b radiance calibration rather than on raw data, we additionally show that reducing spectral channels down to 3 lowers models' parameters and inference time, at the cost of weaker segmentation performance.	翻訳日:2023-10-26 17:59:35 公開日:2023-10-24
# ELMリッジ回帰ブースティング ELM Ridge Regression Boosting ( http://arxiv.org/abs/2310.16209v1 ) ライセンス: Link先を確認	M. Andrecut	(参考訳) ELM(Extreme Learning Machine)に適用したRidge Regression(RR)手法のブースティング手法について検討し,提案手法がELMの分類性能とロバスト性を大幅に向上させることを示す。 We discuss a boosting approach for the Ridge Regression (RR) method, with applications to the Extreme Learning Machine (ELM), and we show that the proposed method significantly improves the classification performance and robustness of ELMs.	翻訳日:2023-10-26 17:58:56 公開日:2023-10-24
# イベントタイムラインの背景要約 Background Summarization of Event Timelines ( http://arxiv.org/abs/2310.16197v1 ) ライセンス: Link先を確認	Adithya Pratapa, Kevin Small, Markus Dreyer	(参考訳) ニュースイベントの簡潔な要約を生成することは、難しい自然言語処理タスクである。ジャーナリストは、重要なサブイベントをハイライトするためにタイムラインをキュレートすることが多いが、ニュースイベントへの新参者は、歴史的な状況に追いつくことの難しさに直面する。本稿では、各タイムライン更新を補完するバックグラウンドニュース要約のタスクと、関連する先行イベントの背景要約を導入することで、このニーズに対処する。既存の時系列データセットをマージしてデータセットを構築し,各ニュースイベント毎の背景概要を記述する。本稿では,最先端の要約システムを用いて強力なベースライン性能を確立し,背景要約を生成するクエリ指向型を提案する。背景要約の質を評価するため,背景要約が回答する現在の事象経過に関する質問の割合を測定する質問応答に基づく評価指標であるバックグラウンドユーティリティスコア(BUS)を提案する。 GPT-3.5を用いたゼロショット性能の向上に加えて,Flan-T5などの微調整システムの有効性を示す。 Generating concise summaries of news events is a challenging natural language processing task. While journalists often curate timelines to highlight key sub-events, newcomers to a news event face challenges in catching up on its historical context. In this paper, we address this need by introducing the task of background news summarization, which complements each timeline update with a background summary of relevant preceding events. We construct a dataset by merging existing timeline datasets and asking human annotators to write a background summary for each timestep of each news event. We establish strong baseline performance using state-of-the-art summarization systems and propose a query-focused variant to generate background summaries. To evaluate background summary quality, we present a question-answering-based evaluation metric, Background Utility Score (BUS), which measures the percentage of questions about a current event timestep that a background summary answers. Our experiments show the effectiveness of instruction fine-tuned systems such as Flan-T5, in addition to strong zero-shot performance using GPT-3.5.	翻訳日:2023-10-26 17:58:49 公開日:2023-10-24
# 単純決定論的オートエンコーダによる低位潜在空間の学習:理論的および経験的考察 Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights ( http://arxiv.org/abs/2310.16194v1 ) ライセンス: Link先を確認	Alokendu Mazumder, Tirthajit Baruah, Bhartendu Kumar, Rishab Sharma, Vishwajeet Pattanaik, Punit Rathore	(参考訳) autoencoderは教師なしの学習パラダイムであり、再構成損失を最小限にすることでデータのコンパクトな潜在表現を作ることを目的としている。しかし、ほとんどのデータ(画像)が低次元空間に埋め込まれているという事実は見過ごされがちであり、効果的なデータ表現には不可欠である。この制限に対処するため,Low-Rank Autoencoder (LoRAE) と呼ばれる新しい手法を提案する。 LoRAEでは,低次元潜在空間を適応的に再構成し,オートエンコーダの基本目的を保ちながら低ランク正規化器を組み込んだ。これは重要な情報を保存しながら、データを低次元空間に埋め込むのに役立つ。低ランク潜在空間を学習する単純なオートエンコーダ拡張である。理論的には、モデルに対してより厳密なエラー境界を確立する。経験的に、我々のモデルの優越性は画像生成や下流分類といった様々なタスクを通して輝いています。理論的および実践的な結果は、低次元埋め込みを取得することの重要性を強調している。 The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (images) are embedded in a lower-dimensional space, which is crucial for effective data representation. To address this limitation, we propose a novel approach called Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporated a low-rank regularizer to adaptively reconstruct a low-dimensional latent space while preserving the basic objective of an autoencoder. This helps embed the data in a lower-dimensional space while preserving important information. It is a simple autoencoder extension that learns low-rank latent space. Theoretically, we establish a tighter error bound for our model. Empirically, our model's superiority shines through various tasks such as image generation and downstream classification. Both theoretical and practical outcomes highlight the importance of acquiring low-dimensional embeddings.	翻訳日:2023-10-26 17:58:32 公開日:2023-10-24
# Lengthは文書レベルのセマンティックスのためのカースと祝福 Length is a Curse and a Blessing for Document-level Semantics ( http://arxiv.org/abs/2310.16193v1 ) ライセンス: Link先を確認	Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed	(参考訳) 近年、コントラスト学習(cl)は、事前学習された言語モデルから文と文書レベルのエンコーディング能力を回復するために広く利用されている。本研究では,CLモデルの長さ一般化可能性,すなわち,長さ誘起セマンティックシフトに対する脆弱性について考察する。我々は、その長さの脆弱性が重要で見過ごされている研究のギャップであるだけでなく、文書の長さによって提供される意味的信号のみに応じて教師なしのclメソッドを考案することができることを検証した。まず,文書の伸長がCLによってもたらされた文書内類似度を高めることを示し,文書の長さ攻撃の基礎となる理論的基礎を導出する。さらに,clが約束する等方性は,学習中に露呈するテキストの長さ範囲に大きく依存することがわかった。これらの知見に触発されて、単純で普遍的な文書表現学習フレームワークla(ser)$^{3}$: 意味論的にロバストな文表現学習のための長さ非依存の自己参照を導入し、標準情報検索ベンチマークで最先端の教師なしパフォーマンスを実現する。 In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.	翻訳日:2023-10-26 17:58:17 公開日:2023-10-24
# スパース観測と時変センサを用いた高効率深部データ同化 Efficient deep data assimilation with sparse observations and time-varying sensors ( http://arxiv.org/abs/2310.16187v1 ) ライセンス: Link先を確認	Sibo Cheng, Che Liu, Yike Guo, Rossella Arcucci	(参考訳) 変分データ同化(DA)は、複数のノイズデータソースの重み付けをすることで、現場復元と予測の工学的問題に広く用いられている。近年,DAにおけるディープラーニング(DL)技術の統合は,高次元力学系における効率と精度の向上を約束している。それにもかかわらず、既存の深部DAアプローチは、特に時間とともにセンサーの配置と数が動的である場合、非構造化観測データを扱うのに困難に直面している。本稿では,dl逆演算子を同化目的関数に組み込んだ変分データ同化のためのvoronoi-tessellation inverse operator(vivid)という新しい変分daスキームを導入する。 voronoi-tessellationとconvolutional neural networksの能力を活用することで、vividは、スパース、非構造化、時間変化のセンサーデータの処理に長けている。さらに、DL逆演算子の組み入れにより、観測と状態空間の直接リンクが確立され、DAに必要な最小化ステップの数が減少する。さらに、 vivid は適切な直交分解 (pod) とシームレスに統合でき、エンドツーエンドの還元順序 da スキームを開発することができる。流体力学系における数値実験により、VIVIDは既存のDAおよびDLアルゴリズムを大幅に上回ることを示す。 VIVIDのロバスト性は、様々なレベルの事前エラー、様々なセンサーの利用、DAにおける誤り共分散の誤特定などを通じてもアクセス可能である。 Variational Data Assimilation (DA) has been broadly used in engineering problems for field reconstruction and prediction by performing a weighted combination of multiple sources of noisy data. In recent years, the integration of deep learning (DL) techniques in DA has shown promise in improving the efficiency and accuracy in high-dimensional dynamical systems. Nevertheless, existing deep DA approaches face difficulties in dealing with unstructured observation data, especially when the placement and number of sensors are dynamic over time. We introduce a novel variational DA scheme, named Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), that incorporates a DL inverse operator into the assimilation objective function. By leveraging the capabilities of the Voronoi-tessellation and convolutional neural networks, VIVID is adept at handling sparse, unstructured, and time-varying sensor data. Furthermore, the incorporation of the DL inverse operator establishes a direct link between observation and state space, leading to a reduction in the number of minimization steps required for DA. 
Additionally, VIVID can be seamlessly integrated with Proper Orthogonal Decomposition (POD) to develop an end-to-end reduced-order DA scheme, which can further expedite field reconstruction. Numerical experiments in a fluid dynamics system demonstrate that VIVID can significantly outperform existing DA and DL algorithms. The robustness of VIVID is also assessed through the application of various levels of prior error, the utilization of varying numbers of sensors, and the misspecification of error covariance in DA.	翻訳日:2023-10-26 17:57:55 公開日:2023-10-24
# 粉末x線回折像に対するu-netアーキテクチャを用いた画像分割 Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images ( http://arxiv.org/abs/2310.16186v1 ) ライセンス: Link先を確認	Howard Yanxon, Eric Roberts, Hannah Parraga, James Weng, Wenqian Xu, Uta Ruett, Alexander Hexemer, Petrus Zwart, Nickolas Schwarz	(参考訳) 科学研究者は、高エネルギー粉末X線回折(XRD)技術を用いて、充電可能な電池材料などの機能デバイスにおける材料の結晶構造を調べる。実験XRD画像中のアーティファクトを識別する手法を提案する。提案手法では,チューニング可能なu-netなど,ディープラーニング畳み込みニューラルネットワークアーキテクチャを用いてアーチファクトを識別する。特に、予測されたアーティファクトは、全体正の正の率またはリコールを用いて、対応する基底真理(手動で実装)に対して評価される。その結果、u-netはトレーニングに含まれないテストデータセット上で92.4%という高いリコール性能を実現でき、従来の方法と比較して平均的な偽陽性率を34%削減できた。 U-Netsはまた、アーティファクトの識別と分離に要する時間を50%以上削減している。さらに, アーティファクトの排除は, 統合された1次元XRDパターンに大きな変化を示し, 後処理のXRDデータのさらなる解析を促進する。 Scientific researchers frequently use the in situ synchrotron high-energy powder X-ray diffraction (XRD) technique to examine the crystallographic structures of materials in functional devices such as rechargeable battery materials. We propose a method for identifying artifacts in experimental XRD images. The proposed method uses deep learning convolutional neural network architectures, such as tunable U-Nets to identify the artifacts. In particular, the predicted artifacts are evaluated against the corresponding ground truth (manually implemented) using the overall true positive rate or recall. The result demonstrates that the U-Nets can consistently produce great recall performance at 92.4% on the test dataset, which is not included in the training, with a 34% reduction in average false positives in comparison to the conventional method. The U-Nets also reduce the time required to identify and separate artifacts by more than 50%. Furthermore, the exclusion of the artifacts shows major changes in the integrated 1D XRD pattern, enhancing further analysis of the post-processing XRD data.	翻訳日:2023-10-26 17:57:25 公開日:2023-10-24
# BLP 2023 タスク2:感情分析 BLP 2023 Task 2: Sentiment Analysis ( http://arxiv.org/abs/2310.16183v1 ) ライセンス: Link先を確認	Md. Arid Hasan, Firoj Alam, Anika Anjum, Shudipta Das, Afiyat Anjum	(参考訳) EMNLP 2023と併催された第1回BLP 2023ワークショップの一環として編成されたBLP感情分析共有タスクの概要を紹介する。このタスクは、ソーシャルメディアのテキスト中の感情の検出として定義されます。このタスクには71人の参加者が関心を示し、29チームと30チームがそれぞれ開発フェーズと評価フェーズにシステムを提出した。参加者は合計597件のランを提出した。しかし、システム記述論文を提出したのは合計15チームであった。提出されたシステムにおけるアプローチの範囲は、古典的な機械学習モデル、微調整された事前訓練モデル、ゼロショットと少数ショットの設定でLarge Language Model(LLM)を活用することまで様々である。本稿では,データセット開発と評価設定を含むタスク設定の詳細な説明を行う。また,参加者が提出したシステムの概要についても概説する。共有タスクからのすべてのデータセットと評価スクリプトが研究コミュニティ向けに公開され、この領域におけるさらなる研究が進められている。 We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop, co-located with EMNLP 2023. The task is defined as the detection of sentiment in a given piece of social media text. This task attracted interest from 71 participants, among whom 29 and 30 teams submitted systems during the development and evaluation phases, respectively. In total, participants submitted 597 runs. However, a total of 15 teams submitted system description papers. The approaches in the submitted systems range from classical machine learning models and fine-tuned pre-trained models to Large Language Models (LLMs) in zero- and few-shot settings. In this paper, we provide a detailed account of the task setup, including dataset development and evaluation setup. Additionally, we provide a brief overview of the systems submitted by the participants. All datasets and evaluation scripts from the shared task have been made publicly available for the research community, to foster further research in this domain.	翻訳日:2023-10-26 17:57:08 公開日:2023-10-24
# 事前学習型言語モデルの改良と解釈のための混合言語訓練適応器 Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models ( http://arxiv.org/abs/2310.16240v1 ) ライセンス: Link先を確認	Raymond Li, Gabriel Murray and Giuseppe Carenini	(参考訳) 本研究では,パラメータ効率のよい微調整(PEFT)設定において,言語構造を事前学習言語モデルに注入することで,2つの人気のある研究領域を組み合わせる手法を提案する。このアプローチでは、異なる言語構造をエンコードする並列アダプタモジュールを、gumbel-softmaxゲートを使用してモデルの各層におけるこれらのモジュールの重要性を判断する、新しい混合言語専門家アーキテクチャを用いて結合する。パラメータの数を減らすために、まず、その重要度に基づいて専門家を刈り取る前に、一定数のステップでモデルをトレーニングします。実験の結果,3種類の事前学習モデルによる実験結果から,本手法はパラメータ数に比較して,最先端のPEFT法より優れていることが示された。さらに,各モデルで選択した専門家を各層で分析し,今後の研究に対する洞察を提供する。 In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experiment results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we provide additional analysis to examine the experts selected by each model at each layer to provide insights for future studies.	翻訳日:2023-10-26 17:49:59 公開日:2023-10-24
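The Gumbel-Softmax gating mentioned in the abstract above can be illustrated with a minimal, stdlib-only sketch. It is not the authors' implementation: the function names `gumbel_softmax_gate` and `combine_experts`, the temperature `tau`, and the toy expert outputs are all assumptions made for illustration, and the actual adapter modules and pruning step are not shown.

```python
import math
import random


def gumbel_softmax_gate(logits, tau=1.0, rng=None):
    """Return relaxed one-hot gate weights for a list of expert logits.

    Hypothetical helper: perturbs each logit with Gumbel(0, 1) noise and
    applies a temperature-scaled softmax.
    """
    rng = rng or random.Random(0)
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_outputs, gate):
    """Weighted sum of per-expert output vectors using the gate weights."""
    dim = len(expert_outputs[0])
    return [
        sum(g * out[i] for g, out in zip(gate, expert_outputs))
        for i in range(dim)
    ]
```

A lower `tau` pushes the gate towards a one-hot choice of a single expert, while a higher `tau` spreads weight more evenly across experts.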
# 教師なし画像セグメンテーションのための画素レベルクラスタリングネットワーク Pixel-Level Clustering Network for Unsupervised Image Segmentation ( http://arxiv.org/abs/2310.16234v1 ) ライセンス: Link先を確認	Cuong Manh Hoang and Byeongkeun Kang	(参考訳) 画像分割は、自動運転、把持、ロボットナビゲーションなどの様々なコンピュータビジョンアプリケーションにおいて不可欠であるが、トレーニングのためにピクセルレベルですべてのオブジェクトに注釈を付けることはほぼ不可能である。したがって、教師なし画像分割法の研究は不可欠である。本稿では,画像の領域分割のためのピクセルレベルのクラスタリングフレームワークを提案する。提案フレームワークは、注意機構を備えた機能埋め込みモジュール、特徴統計計算モジュール、画像再構成、および高精度な教師なしセグメンテーションを実現するスーパーピクセルセグメンテーションを含む。さらに,各スーパーピクセル間の一貫性,隣接スーパーピクセル間の相似/相似性,画像間の構造相似性を利用したトレーニング戦略を提案する。また,スーパーピクセルによる損失による過大セグメント化を回避するため,ポストプロセッシング手法を提案する。さらに,教師なしセマンティックセグメンテーションのための提案手法の拡張を提案する。提案フレームワークの有効性を実証するために,3つの公開データセット(berkeley segmentation dataset,pascal voc 2012 dataset,coco-stuff dataset)について実験を行った。実験の結果,提案手法は従来の最先端手法よりも優れていた。 While image segmentation is crucial in various computer vision applications, such as autonomous driving, grasping, and robot navigation, annotating all objects at the pixel-level for training is nearly impossible. Therefore, the study of unsupervised image segmentation methods is essential. In this paper, we present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations. The proposed framework includes feature embedding modules with an attention mechanism, a feature statistics computing module, image reconstruction, and superpixel segmentation to achieve accurate unsupervised segmentation. Additionally, we propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images. To avoid potential over-segmentation caused by superpixel-based losses, we also propose a post-processing method. Furthermore, we present an extension of the proposed method for unsupervised semantic segmentation. 
We conducted experiments on three publicly available datasets (Berkeley segmentation dataset, PASCAL VOC 2012 dataset, and COCO-Stuff dataset) to demonstrate the effectiveness of the proposed framework. The experimental results show that the proposed framework outperforms previous state-of-the-art methods.	翻訳日:2023-10-26 17:49:43 公開日:2023-10-24
# 時系列予測のための注意に基づくアンサンブルプール Attention-Based Ensemble Pooling for Time Series Forecasting ( http://arxiv.org/abs/2310.16231v1 ) ライセンス: Link先を確認	Dhruvit Patel and Alexander Wikner	(参考訳) 時系列予測におけるモデルバイアスを低減する一般的な手法は、予測モデルのアンサンブルを使用して、その出力をアンサンブル予測にまとめることである。しかし、各予測モデルが異なるバイアスを持つ場合、このプーリング中に各モデル予測がどのように評価されるべきかは必ずしも明確ではない。提案手法は,注意に基づくアンサンブルプーリングモデルによって重み付け値が学習される候補モデル予測よりも重み付け平均を行うプーリング手法を提案する。本手法は,非定常Lorenz `63方程式のダイナミクスのマルチステップ予測と,COVID-19による週のインシデント死亡の1ステップ予測という2つの時系列予測問題に対して試行する。当モデルでは,非定常ロレンツ式63を予測した場合に優れた有効時間が得られるが,covid-19週次インシデント死亡を予測した場合,既存のアンサンブルプールよりも良好に動作しないことがわかった。 A common technique to reduce model bias in time-series forecasting is to use an ensemble of predictive models and pool their output into an ensemble forecast. In cases where each predictive model has different biases, however, it is not always clear exactly how each model forecast should be weighed during this pooling. We propose a method for pooling that performs a weighted average over candidate model forecasts, where the weights are learned by an attention-based ensemble pooling model. We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz `63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19. We find that while our model achieves excellent valid times when forecasting the non-stationary Lorenz `63 equation, it does not consistently perform better than the existing ensemble pooling when forecasting COVID-19 weekly incident deaths.	翻訳日:2023-10-26 17:49:23 公開日:2023-10-24
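The pooling step described in the abstract above, a weighted average over candidate forecasts with attention-derived weights, can be sketched as follows. This is only an illustration under stated assumptions: the per-model `keys`, the context `query`, and the scaled dot-product scoring are hypothetical stand-ins, and the abstract does not specify how the authors' attention model is parameterized or trained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(forecasts, keys, query):
    """Pool model forecasts with scaled dot-product attention weights.

    forecasts: one list of predicted values per model (equal lengths)
    keys:      one feature vector per model (assumed, e.g. learned embeddings)
    query:     feature vector describing the current context (assumed)
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(k_i * q_i for k_i, q_i in zip(key, query)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    horizon = len(forecasts[0])
    pooled = [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(horizon)
    ]
    return pooled, weights
```

When all scores are equal the weights are uniform, so the sketch reduces to the plain ensemble mean.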
# ショートカット学習の基礎について On the Foundations of Shortcut Learning ( http://arxiv.org/abs/2310.16228v1 ) ライセンス: Link先を確認	Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer	(参考訳) ディープラーニングモデルは、データから豊富な特徴を抽出できる。モデルがどの特徴を使うかは、予測性(特徴が訓練セットのラベルをどれだけ確実に示すか)だけでなく、可用性(特徴を入力からどれだけ容易に抽出・活用できるか)にも依存する。ショートカット学習に関する文献では、例えば、形状よりもテクスチャ、前景の物体よりも画像背景など、モデルがある特徴を別の特徴より優先する例が指摘されている。本稿では,モデルに対してどの入力特性がより利用しやすいかという仮説を検証し,モデルの特徴利用に対する予測性と可用性の相互作用を体系的に検討する。予測性と,可用性に関係すると仮定した要因とが異なる2つの潜在特徴を持つ分類データセットを合成する最小限かつ明示的な生成フレームワークを構築し,モデルのショートカットバイアス,すなわちコア特徴(可用性が低く予測性が高い)を犠牲にしたショートカット特徴(可用性が高く予測性が低い)への過度な依存を定量化する。線形モデルは比較的偏りがないが、ReLUまたはTanh単位を持つ単一の隠れ層を導入するとバイアスが生じる。我々の経験的発見は、Neural Tangent Kernelsに基づく理論的考察と一致している。最後に,実際に使われるモデルが自然なデータセットにおいて予測性と可用性をどうトレードオフするかを検討し,モデルのショートカットバイアスを増大させる可用性の操作を発見する。これらの結果は、ショートカット特徴を学習する傾向が深い非線形アーキテクチャの基本的な特徴であり、モデルがタスクをどう解決するかを形作る役割を考えると体系的な研究に値することを示唆する。 Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates train-set labels) but also on availability (how easily the feature can be extracted, or leveraged, from inputs). The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and quantify a model's shortcut bias, that is, its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. 
Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.	翻訳日:2023-10-26 17:49:04 公開日:2023-10-24
# TiC-CLIP:CLIPモデルの継続的なトレーニング TiC-CLIP: Continual Training of CLIP Models ( http://arxiv.org/abs/2310.16226v1 ) ライセンス: Link先を確認	Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri	(参考訳) 最新のデータで大規模な基盤モデルを最新に保つのは本質的にコストがかかる。絶え間ない再訓練の禁止コストを避けるためには、これらのモデルを継続的に訓練することが不可欠である。この問題は、大規模な連続学習ベンチマークやベースラインの欠如によって悪化している。我々は、TiC-DataComp、TiC-YFCC、TiC-RedCapsといったビジョン言語モデルをトレーニングするための、WebスケールのTime-Continual(TiC)ベンチマークの最初のセットを紹介します。まず、ベンチマークを用いて様々な動的評価を算出し、既存のモデルの時間的堅牢性を測定する。私たちは、OpenAIのCLIP(2020年までのデータでトレーニングされた)が、最近トレーニングされたOpenCLIPリポジトリのモデルと比較して、2021年から2022年までのキュレートされた検索タスクにおいて、$\approx 8\%$ゼロショットの精度を失うことを示しています。次に、時間連続データに基づいてモデルを効率的にトレーニングする方法を研究します。最後のチェックポイントからトレーニングを継続し、古いデータを再生するシンプルなリハーサルベースのアプローチは、スクラッチからリトレーニングする標準的なプラクティスと比較して、計算コストを$2.5\times$削減する。 Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-RedCaps with over 12.7B timestamped image-text pairs spanning 9 years (2014--2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses $\approx 8\%$ zero-shot accuracy on our curated retrieval task from 2021--2022 compared with more recently trained models in OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by $2.5\times$ when compared to the standard practice of retraining from scratch.	翻訳日:2023-10-26 17:48:32 公開日:2023-10-24
# CleanCoNLL: ほとんどノイズのない名前付きエンティティ認識データセット CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset ( http://arxiv.org/abs/2310.16225v1 ) ライセンス: Link先を確認	Susanna R\"ucker, Alan Akbik	(参考訳) conll-03コーパスは、名前付きエンティティ認識(ner)のための最もよく知られているベンチマークデータセットである。しかし、以前の研究では、データにかなりの数のアノテーションエラー、不完全性、不整合が見つかった。これは、現在の最先端モデルは、CoNLL-03の推定ノイズレベルに匹敵する、あるいは超えるF1スコアを達成するため、NERアプローチを客観的に比較し、それらのエラーを分析するための課題となる。この問題に対処するために,全ラベルの7.0%を英語のconll-03で訂正する自動一貫性チェックによる包括的relabelingの取り組みを提案する。我々の取り組みは、NERラベルのより良い説明可能性とアノテーション品質のさらなる保護のためにエンティティリンクアノテーションのレイヤを追加します。実験結果から, 最先端の手法がF1スコア(97.1%)をはるかに上回っているだけでなく, アノテーションノイズによる誤りとして誤算された正確な予測のシェアが47%から6%に低下していることがわかった。このことは、我々の資源は最先端モデルによる残差を分析するのに適しており、理論上界は高資源でも粗粒NERに到達していないことを示している。このような分析を容易にするため,研究コミュニティにCleanCoNLLを公開する。 The CoNLL-03 corpus is arguably the most well-known and utilized benchmark dataset for named entity recognition (NER). However, prior works found significant numbers of annotation errors, incompleteness, and inconsistencies in the data. This poses challenges to objectively comparing NER approaches and analyzing their errors, as current state-of-the-art models achieve F1-scores that are comparable to or even exceed the estimated noise level in CoNLL-03. To address this issue, we present a comprehensive relabeling effort assisted by automatic consistency checking that corrects 7.0% of all labels in the English CoNLL-03. Our effort adds a layer of entity linking annotation both for better explainability of NER labels and as additional safeguard of annotation quality. Our experimental evaluation finds not only that state-of-the-art approaches reach significantly higher F1-scores (97.1%) on our data, but crucially that the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%. 
This indicates that our resource is well suited to analyze the remaining errors made by state-of-the-art models, and that the theoretical upper bound even on high resource, coarse-grained NER is not yet reached. To facilitate such analysis, we make CleanCoNLL publicly available to the research community.	翻訳日:2023-10-26 17:48:06 公開日:2023-10-24
# 毒物は痕跡がない:毒物攻撃の完全無依存検出 Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks ( http://arxiv.org/abs/2310.16224v1 ) ライセンス: Link先を確認	Xinglong Chang, Katharina Dost, Gillian Dobbie, J\"org Wicker	(参考訳) 機械学習モデルのパフォーマンスは、基礎となるデータの品質に依存する。悪意のあるアクターは、トレーニングデータを汚染することでモデルを攻撃することができる。現在の検出器は、特定のデータタイプ、モデル、または攻撃と結びついているため、実際のシナリオでの適用性は限られている。本稿では,汚染された可能性のあるデータセットの分析にのみ依存して攻撃を検知する,完全無依存の新たなフレームワークであるDIVA(Detecting InVisible Attacks)を提案する。 DIVAは、汚染されたデータとクリーンなデータに対する分類器の精度を比較することで汚染攻撃を検知できるという考えに基づいており、仮説上のクリーンデータセット上の未知の精度を推定するために、複雑度尺度(Complexity Measures)を用いてメタ学習器を事前訓練する。このフレームワークは一般的な汚染攻撃に適用できる。評価のために,本稿ではラベルフリップ攻撃に対してDIVAを検証した。 The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack the model by poisoning the training data. Current detectors are tied to either specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, DIVA (Detecting InVisible Attacks), that detects attacks solely relying on analyzing the potentially poisoned data set. DIVA is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data and pre-trains a meta-learner using Complexity Measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks. For evaluation purposes, in this paper, we test DIVA on label-flipping attacks.	翻訳日:2023-10-26 17:47:42 公開日:2023-10-24
# 階層的ランダム化平滑化 Hierarchical Randomized Smoothing ( http://arxiv.org/abs/2310.16221v1 ) ライセンス: Link先を確認	Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, Stephan G\"unnemann	(参考訳) 実世界のデータは複雑で、しばしば複数のエンティティ(例えば画像はピクセル、グラフは相互接続ノード)に分解できるオブジェクトで構成されている。ランダム化平滑化(randomized smoothing)は、モデルが入力の小さな変更に対して確実に堅牢になるための強力なフレームワークである。しかし、オブジェクト全体(例えば画像)を任意に摂動せず、エンティティのサブセット(例えばピクセル)しか持たない場合、ランダムな平滑化による複雑なデータに対するロバスト性の証明は困難である。ランダムに選択されたエンティティのサブセットにのみランダムノイズを追加することにより、部分的にオブジェクトを平滑化します。従来の手法よりも標的に雑音を加えることで、高い精度を維持しながら強靭性を保証する。異なるノミージング分布を用いて階層的平滑化を初期化し,離散的および連続的領域に対する新しいロバスト性証明を導出する。画像とノードの分類における階層的平滑化の重要性を実験的に実証し,ロバスト性・正確性に優れたトレードオフをもたらすことを示した。全体として、階層的平滑化は、摂動に対して確実に堅牢で正確であるモデルにとって重要な貢献である。 Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs - by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. 
Overall, hierarchical smoothing is an important contribution towards models that are both certifiably robust to perturbations and accurate.	翻訳日:2023-10-26 17:47:28 公開日:2023-10-24
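The two-level noise idea in the Hierarchical Randomized Smoothing entry above can be sketched in a few lines. This is only an illustration under stated assumptions: entities are rows of a list of lists, the upper level picks each entity independently with probability `p`, the lower level adds Gaussian noise with standard deviation `sigma`, and the function names are hypothetical. It shows the smoothing and majority vote only, not the robustness certificates derived in the paper.

```python
import random
from collections import Counter

def hierarchical_noise(x, p, sigma, rng):
    """Noise only a randomly selected subset of entities (rows of x)."""
    noisy, selected = [], []
    for entity in x:
        pick = rng.random() < p  # upper level: is this entity selected?
        selected.append(pick)
        if pick:
            # lower level: Gaussian noise on every feature of the entity
            noisy.append([v + rng.gauss(0.0, sigma) for v in entity])
        else:
            noisy.append(list(entity))  # unselected entities stay unchanged
    return noisy, selected

def smoothed_predict(classifier, x, p, sigma, n_samples, seed=0):
    """Majority vote of the base classifier over hierarchically noised copies."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy, _ = hierarchical_noise(x, p, sigma, rng)
        votes[classifier(noisy)] += 1
    return votes.most_common(1)[0][0], votes
```

Setting `p = 1` would noise every entity, as in ordinary randomized smoothing, while `p = 0` leaves the input untouched; values in between trade off how much of the object is perturbed per sample.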
# 大規模言語モデルのための知識編集:調査 Knowledge Editing for Large Language Models: A Survey ( http://arxiv.org/abs/2310.16218v1 ) ライセンス: Link先を確認	Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong L	(参考訳) 大規模言語モデル(LLM)は、その膨大な知識と推論能力に基づいてテキストを理解し、分析し、生成する顕著な能力のために、最近、学術的および産業的景観を変革した。それにもかかわらず、llmsの大きな欠点は、前例のない量のパラメータによる事前トレーニングの計算コストである。事前訓練されたモデルに新しい知識を頻繁に導入する必要がある場合、デメリットは悪化する。したがって、事前訓練されたLLMを更新するための効率的かつ効率的な技術を開発することが不可欠である。従来の手法は、事前訓練されたllmにおける新しい知識を直接微調整によってエンコードする。しかし, 自己学習型LLMは計算集約的であり, モデル更新によらず, 価値ある事前学習知識を劣化させるリスクがある。近年,知識に基づくモデル編集(KME)が注目され,他の無関係な知識に悪影響を及ぼすことなく,特定の知識を組み込むためにLLMを正確に修正することを目指している。本調査では,KME分野の最近の進歩を包括的かつ詳細に概観することを目的としている。まず、異なるKME戦略を包含するKMEの一般的な定式化を導入する。その後,本手法の革新的分類法として,既存のKME戦略を考察し,各カテゴリの手法の重要点,利点,限界を分析した上で,新たな知識の事前学習 LLM への導入方法に基づくKME手法の革新的分類法を提案する。さらに、KMEの代表的な指標、データセット、応用を紹介する。最後に,KMEの実践性と課題の残りについて詳細な分析を行い,今後の発展に向けた今後の研究の方向性を提案する。 Large language models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is their substantial computational cost for pre-training due to their unprecedented amounts of parameters. The disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degenerating valuable pre-trained knowledge irrelevant to the update in the model. Recently, Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge, without negatively influencing other irrelevant knowledge. 
In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME to encompass different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.	翻訳日:2023-10-26 17:47:04 公開日:2023-10-24
# NaRb分子の複数回転状態に対するマジックトラップ Magic Traps for Multiple Rotational States of NaRb Molecule ( http://arxiv.org/abs/2310.16215v1 ) ライセンス: Link先を確認	Svetlana Kotochigova, Qingze Guan, Vito Scarola, Brian DeMarco, Bryce Gadway	(参考訳) 分子は振動、回転、スピン軌道、超微細な自由度を持ち、それぞれが外部電磁放射に特異的に反応する。これらの量子状態の重ね合わせに対するコヒーレント制御は分子の操作の鍵となる。例えば、コヒーレンス時間が長いほど、量子シミュレーションを長く続けることができる。レーザー光で分子を制御する上で重要な量は、その複素値の分子動的偏光性である。その実部は分子が感じるツイーザー電位を決定するが、虚部はコヒーレンス時間に寄与する。本研究は、電気双極子禁制の分子遷移に対して、(数十GHzのオーダーで)小さなデチューニングを持つレーザ周波数を選択することによって、光学ポテンシャルにおける分子の効率的なトラップを実現できることを示す。この遷移に近接して、これらの状態間のコヒーレンスを犠牲にすることなく、多重回転状態のトラップ電位を著しく修正することができる。超低温23Na87Rb極性分子の複数の回転状態に対するマジックトラップ条件が生成できることを実証する。また,スピン分離したマジックトラップは磁場方向に向いた静電場を印加することで実現可能であることを示した。 Molecules have vibrational, rotational, spin-orbit and hyperfine degrees of freedom, each of which responds in a unique fashion to external electromagnetic radiation. The coherent control over superpositions of these quantum states is key to manipulation of molecules. For example, the better the coherence time the longer quantum simulations can last. The important quantity for controlling a molecule with laser light is its complex-valued molecular dynamic polarizability. Its real part determines the tweezer potential as felt by the molecule, while its imaginary part contributes to the coherence time. Our studies show that efficient trapping of a molecule in an optical potential can be achieved by selecting a laser frequency that has a small detuning (on the order of tens of GHz) relative to an electric-dipole-forbidden molecular transition. Close proximity to this transition allows us to significantly modify the trapping potentials for multiple rotational states without sacrificing coherences among these states. We demonstrate that magic trapping conditions for multiple rotational states in ultracold 23Na87Rb polar molecules can be created. 
In addition, we show that spin-decoupled magic trapping can be achieved with an applied static electric field oriented along the magnetic field direction.	翻訳日:2023-10-26 17:46:37 公開日:2023-10-24
# speakerly:テキスト合成のための音声ベースのライティングアシスタント Speakerly: A Voice-based Writing Assistant for Text Composition ( http://arxiv.org/abs/2310.16251v1 ) ライセンス: Link先を確認	Dhruv Kumar, Vipul Raheja, Alice Kaiser-Schatzlein, Robyn Perry, Apurva Joshi, Justin Hugues-Nuger, Samuel Lou, Navid Chowdhury	(参考訳) メールやインスタントメッセージ,ノートなど,さまざまなユースケースにわたるテキスト合成を支援する,リアルタイム音声による文字作成支援システムである speakerly を提案する。ユーザーは指示や指示を通じてシステムと対話でき、システムはよく書式化され、一貫性のある文書を生成する。システムアーキテクチャと,このようなシステムを大規模に構築およびデプロイする上でのさまざまな課題に対する対処方法について詳述する。具体的には,タスク固有モデルと事前学習した言語モデルを組み合わせて,テキスト合成を高速かつ効果的に行うとともに,多様な入力モードをサポートしてユーザビリティを向上させる。 We present Speakerly, a new real-time voice-based writing assistance system that helps users with text composition across various use cases such as emails, instant messages, and notes. The user can interact with the system through instructions or dictation, and the system generates a well-formatted and coherent document. We describe the system architecture and detail how we address the various challenges while building and deploying such a system at scale. More specifically, our system uses a combination of small, task-specific models as well as pre-trained language models for fast and effective text composition while supporting a variety of input modes for better usability.	翻訳日:2023-10-26 17:39:03 公開日:2023-10-24
# グラフ隣接の固有ベクトルに基づく有限要素モデルの問合せのためのクラスタリングツール A clustering tool for interrogating finite element models based on eigenvectors of graph adjacency ( http://arxiv.org/abs/2310.16249v1 ) ライセンス: Link先を確認	Ramaseshan Kannan	(参考訳) 本稿では,有限要素(fe)シミュレーションモデルにおける誤りをデバッグするための教師なし学習アルゴリズムを紹介し,その生成方法について詳述する。このアルゴリズムは、剛性行列の隣接性の数値的性質を用いてfeモデルにおける自由度を集合する。このアルゴリズムは、商用構造FEスイートOasys GSA(www.oasys-software.com/gsa)の「モデル安定性解析」ツールとしてデプロイされている。実世界のfeモデルのデバッグにエンドユーザがうまく利用し、実際に動作するツールの例を示す。 This note introduces an unsupervised learning algorithm to debug errors in finite element (FE) simulation models and details how it was productionised. The algorithm clusters degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix. The algorithm has been deployed as a tool called `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA (www.oasys-software.com/gsa). It has been used successfully by end-users for debugging real world FE models and we present examples of the tool in action.	翻訳日:2023-10-26 17:38:51 公開日:2023-10-24
# GlotLID:低リソース言語のための言語識別 GlotLID: Language Identification for Low-Resource Languages ( http://arxiv.org/abs/2310.16248v1 ) ライセンス: Link先を確認	Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze	(参考訳) 最近のいくつかの論文は、約300の高リソース言語と中リソース言語のための優れた言語識別(LID)ソリューションを公開している。しかし、(i) 幅広い低リソース言語をカバーし、(ii) 厳格に評価されて信頼性があり、(iii) 効率的で使いやすいLIDは存在しない。 GlotLID-Mは広範なカバレッジ,信頼性,効率性という要件を満たすLIDモデルである。 1665の言語を識別し、以前の作業に比べてカバー範囲が大幅に増加した。実験では,F1と偽陽性率(FPR)のバランスをとる場合,GlotLID-Mは4つのベースライン(CLD3,FT176,OpenLID,NLLB)を上回った。低リソースLIDに特有の課題として、コーパスメタデータの誤り、高リソース言語からの漏洩、密接な関連言語間の分離の困難、マクロ言語対バラエティの処理、一般的なノイズデータなどを分析する。 GlotLID-Mをデータセット生成パイプラインに統合することで,低リソース言語や文化に対するNLP技術の品質向上とアクセシビリティ向上が期待できる。 GlotLID-Mモデル、コード、およびデータソースのリストが利用可能である。 Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable and (iii) is efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguages vs. varieties, and noisy data in general. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.	翻訳日:2023-10-26 17:38:42 公開日:2023-10-24
# 汎用最小補助イジングマシンの設計 Design of General Purpose Minimal-Auxiliary Ising Machines ( http://arxiv.org/abs/2310.16246v1 ) ライセンス: Link先を確認	Isaac K. Martin, Andrew G. Moore, John T. Daly, Jess J. Meyer, Teresa M. Ranadive	(参考訳) isingマシンは、従来のコンピューティングパラダイムの制限を克服し、エネルギー使用量のごく一部で運用する、量子インメモリ処理コンピュータの一形態である。イジングマシンを設計する過程は逆イジング問題として知られている。不運なことに、この問題は一般に計算的に難解である:これは非凸混合整数線形計画問題であり、スピン数が多いランタイムの指数的スケーリングのため、最も単純な場合を除いて、素直にブルート強化できない。我々は、探索空間を2次スケーリングで1つに減らすことができる新しい理論的結果を証明する。この理論を利用して、逆イジング問題に対する汎用アルゴリズム解を開発する。特に、3ビットと4ビットの整数乗算のイジングの定式化を実証する。この結果,スピンがプレミアムである現代のIsingハードウェア上でそのような回路を実装する実践性が向上した。 Ising machines are a form of quantum-inspired processing-in-memory computer which has shown great promise for overcoming the limitations of traditional computing paradigms while operating at a fraction of the energy use. The process of designing Ising machines is known as the reverse Ising problem. Unfortunately, this problem is in general computationally intractable: it is a nonconvex mixed-integer linear programming problem which cannot be naively brute-forced except in the simplest cases due to exponential scaling of runtime with number of spins. We prove new theoretical results which allow us to reduce the search space to one with quadratic scaling. We utilize this theory to develop general purpose algorithmic solutions to the reverse Ising problem. In particular, we demonstrate Ising formulations of 3-bit and 4-bit integer multiplication which use fewer total spins than previously known methods by a factor of more than three. Our results increase the practicality of implementing such circuits on modern Ising hardware, where spins are at a premium.	翻訳日:2023-10-26 17:38:19 公開日:2023-10-24
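To make the "reverse Ising problem" in the entry above concrete, the sketch below checks by brute force that a candidate energy function has exactly the valid rows of a logic gate as its ground states. It uses the well-known quadratic penalty for an AND gate over 0/1 variables (a spin formulation follows from the substitution s = 2x - 1); this small gate is an assumption chosen for illustration and is not one of the multiplication circuits from the paper. Enumerating all 2^n states is exactly the naive approach the abstract says stops scaling beyond the simplest cases.

```python
from itertools import product

def and_gate_energy(x, y, z):
    """Quadratic penalty: 0 exactly when z == (x AND y), positive otherwise."""
    return x * y - 2 * (x + y) * z + 3 * z

def ground_states(energy, n_bits):
    """Enumerate all 2**n_bits binary states and return the minimum-energy ones."""
    energies = {s: energy(*s) for s in product((0, 1), repeat=n_bits)}
    e_min = min(energies.values())
    return {s for s, e in energies.items() if e == e_min}, e_min

if __name__ == "__main__":
    states, e_min = ground_states(and_gate_energy, 3)
    print(e_min, sorted(states))
```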
# ZzzGPT:睡眠の質を高めるインタラクティブGPTアプローチ ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality ( http://arxiv.org/abs/2310.16242v1 ) ライセンス: Link先を確認	Yonchanok Khaokaew, Thuc Hanh Nguyen, Kaixin Ji, Hiruni Kegalle, Marwah Alaofi	(参考訳) 今日の世界では、睡眠の質は全体の幸福に欠かせない。ウェアラブルセンサーはリアルタイムのモニタリングを提供するが、アクション可能な洞察を欠くことが多く、ユーザの放棄につながる。本稿では,睡眠パターンの理解における技術の役割について述べる。本研究では,大規模言語モデル(llm)を活用した2段階フレームワークを導入し,動作可能なフィードバックによる正確な睡眠予測を実現する。 GLOBEMデータセットとLLMからの合成データを活用して、XGBoostのようなモデルによる強化結果を強調する。本手法は,高度な機械学習とユーザ中心設計を融合し,科学的正確性と実用性を融合する。 In today's world, sleep quality is pivotal for overall well-being. While wearable sensors offer real-time monitoring, they often lack actionable insights, leading to user abandonment. This paper delves into the role of technology in understanding sleep patterns. We introduce a two-stage framework, utilizing Large Language Models (LLMs), aiming to provide accurate sleep predictions with actionable feedback. Leveraging the GLOBEM dataset and synthetic data from LLMs, we highlight enhanced results with models like XGBoost. Our approach merges advanced machine learning with user-centric design, blending scientific accuracy with practicality.	翻訳日:2023-10-26 17:38:01 公開日:2023-10-24
# タスク親和性予測によるマルチタスク機械学習のためのタスクグループ化 Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction ( http://arxiv.org/abs/2310.16241v1 ) ライセンス: Link先を確認	Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka	(参考訳) 類似したタスクを同時に学習する必要がある場合、マルチタスク学習(MTL)モデルはシングルタスク学習(STL)モデルよりもはるかに高い精度が得られる。しかし、MTLの利点は、タスクの類似性、データセットのサイズなど、様々な要因に依存している。では、どのタスクを一緒に学ぶべきか? ドメインの専門家は直観、経験、ベストプラクティスに従ってタスクをグループ化できますが、手動のグルーピングは労働集約的で最適なものではありません。本稿では,タスクグループ化のための新しい自動化手法を提案する。まず、mtl文献で広く使われている4つのベンチマークデータセットを用いて、mtlのタスクの親和性を調べ、ニューラルネットワークに基づくmtlモデルに焦点をあてる。我々は、MTLを用いてタスク群を同時に学習すべきか、STLを用いて独立して学習すべきかを予測するのに役立つ固有のタスク特徴とSTLの特徴を識別する。この予測器をベースとしたランダム化探索アルゴリズムを導入し,タスク群探索時に行うMTLトレーニングの数を最小化する。提案する4つのベンチマークデータセットでは,既存のベースラインアプローチよりも,予測型検索アプローチの方が優れたタスクグループ化を実現できることを示す。 When a number of similar tasks have to be learned simultaneously, multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models. However, the advantage of MTL depends on various factors, such as the similarity of the tasks, the sizes of the datasets, and so on; in fact, some tasks might not benefit from MTL and may even incur a loss of accuracy compared to STL. Hence, the question arises: which tasks should be learned together? Domain experts can attempt to group tasks together following intuition, experience, and best practices, but manual grouping can be labor-intensive and far from optimal. In this paper, we propose a novel automated approach for task grouping. First, we study the affinity of tasks for MTL using four benchmark datasets that have been used extensively in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. 
Building on this predictor, we introduce a randomized search algorithm, which employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.	翻訳日:2023-10-26 17:37:53 公開日:2023-10-24
# スケール空間理論を用いた深層畳み込みネットワークの解像学習 Resolution learning in deep convolutional networks using scale-space theory ( http://arxiv.org/abs/2106.03412v3 ) ライセンス: Link先を確認	Silvia L.Pintea and Nergis Tomen and Stanley F. Goes and Marco Loog and Jan C. van Gemert	(参考訳) 深層畳み込みニューラルネットワーク(cnns)の分解能は、通常、フィルタサイズを通じて受容場サイズに制限され、特徴地図上のレイヤーまたはストレート畳み込みをサブサンプリングする。最適な解像度はデータセットによって大きく異なる可能性がある。現代のCNNは、そのようなハイパーパラメータのチューニングを煩雑にするネットワークアーキテクチャにおいて、その解像度のハイパーパラメータをハードコードしている。我々は、ハードコードされた解像度ハイパーパラメータを廃止し、データから適切な解像度を学ぶことを提案する。スケール空間理論を用いてフィルタの自己相似パラメトリゼーションを求め、ガウス微分フィルタの学習的組み合わせによりフィルタを近似するために、N-Jet: truncated Taylor級数を用いる。ガウス基底のパラメータシグマは、フィルタが符号化する詳細度とフィルタの空間的範囲の両方を制御する。 sigmaは連続パラメータであるため、損失に関して最適化することができる。提案したN-Jetレイヤは,各レイヤの解像度を自動的に学習しながら,最先端のアーキテクチャで使用する場合と同等のパフォーマンスを実現する。我々は,N-Jet層を分類とセグメンテーションの両方で評価し,学習シグマが複数サイズの入力に特に有用であることを示す。 Resolution in deep convolutional neural networks (CNNs) is typically bounded by the receptive field size through filter sizes, and subsampling layers or strided convolutions on feature maps. The optimal resolution may vary significantly depending on the dataset. Modern CNNs hard-code their resolution hyper-parameters in the network architecture which makes tuning such hyper-parameters cumbersome. We propose to do away with hard-coded resolution hyper-parameters and aim to learn the appropriate resolution from data. We use scale-space theory to obtain a self-similar parametrization of filters and make use of the N-Jet: a truncated Taylor series to approximate a filter by a learned combination of Gaussian derivative filters. The parameter sigma of the Gaussian basis controls both the amount of detail the filter encodes and the spatial extent of the filter. Since sigma is a continuous parameter, we can optimize it with respect to the loss. The proposed N-Jet layer achieves comparable performance when used in state-of-the art architectures, while learning the correct resolution in each layer automatically. 
We evaluate our N-Jet layer on both classification and segmentation, and we show that learning sigma is especially beneficial for inputs at multiple sizes.	翻訳日:2023-10-26 04:10:02 公開日:2023-10-24
# 未知不変多様体近傍の低速確率系の非線形モデル還元 Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds ( http://arxiv.org/abs/2104.02120v2 ) ライセンス: Link先を確認	Felix X.-F. Ye, Sichen Yang, Mauro Maggioni	(参考訳) 本稿では,低次元不変有効多様体と低速ダイナミクス,高次元大速モードを有する高次元確率力学系に対して,非線形確率モデル還元法を提案する。シミュレーションの短いバーストが得られたブラックボックスシミュレータへのアクセスのみを前提として、不変多様体の推定値を出力するアルゴリズムと、高速モードを平均化した実効確率力学のプロセスと、そのシミュレータを設計する。このシミュレータは、不変多様体の低次元を活用し、有効プロセスの正則性に依存する大きさの時間ステップを要し、したがって通常、高速モードを解決しなければならない元のシミュレータよりもはるかに大きいという点で効率的である。アルゴリズムと推定はオンザフライで実行でき、基礎となるダイナミクスとの一貫性を失うことなく、効率的な状態空間の探索に繋がる。この構造は, 定常分布, 準安定状態の同定, 滞留時間, 遷移速度など, それらの力学の重要な特徴と観測可能性の推定とともに, 有効力学の経路の高速かつ効率的なシミュレーションを可能にする。 We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics, and high-dimensional, large fast modes. Given only access to a black box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold, a process of the effective stochastic dynamics on it, which has averaged out the fast modes, and a simulator thereof. This simulator is efficient in that it exploits the low dimension of the invariant manifold, and takes time steps of size dependent on the regularity of the effective process, and therefore typically much larger than that of the original simulator, which had to resolve the fast modes. The algorithm and the estimation can be performed on-the-fly, leading to efficient exploration of the effective state space, without losing consistency with the underlying dynamics. This construction enables fast and efficient simulation of paths of the effective dynamics, together with estimation of crucial features and observables of such dynamics, including the stationary distribution, identification of metastable states, and residence times and transition rates between them.	翻訳日:2023-10-26 04:09:40 公開日:2023-10-24
# 記述論理 ELHr の来歴 Provenance for the Description Logic ELHr ( http://arxiv.org/abs/2001.07541v3 ) ライセンス: Link先を確認	Camille Bourgaux, Ana Ozaki, Rafael Peñaloza and Livia Predoiu	(参考訳) ELHrオントロジーにおける来歴情報の取り扱いの問題に対処する。本稿では,オントロジーに基づくデータアクセスのために最近導入された設定について考察する。この設定は半環に基づき,古典的なデータ来歴を拡張したもので,オントロジーの公理に来歴トークンが付加される。帰結は,その導出に関わる公理の来歴を継承し,注釈として来歴多項式を生成する。 ELHrの場合のセマンティクスを分析し,連言の存在が来歴の扱いに様々な困難をもたらすことを示し,その一部は半環の乗法的冪等性を仮定することによって緩和される。この仮定の下で, 来歴つきオントロジーの完備化, 帰結に対する関連する公理の集合の計算, 問合せ応答の3つの問題について検討する。 We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering.	翻訳日:2023-10-26 04:08:51 公開日:2023-10-24
# ディープニューラルネットワークのための微分スカラー化 Differentiable Sparsification for Deep Neural Networks ( http://arxiv.org/abs/1910.03201v6 ) ライセンス: Link先を確認	Yognjin Lee	(参考訳) ディープニューラルネットワークは、機能エンジニアリングの負担を大幅に軽減していますが、これらのネットワークの効果的なアーキテクチャを決定するには、それと同等の努力が必要です。さらに、ネットワークサイズが過大になるにつれて、そのサイズを減らすためにかなりの量のリソースが投資される。これらの課題はオーバーコンプリートモデルのスパース化によって効果的に対処できる。本研究では,確率的勾配降下を伴う正規化対象関数を直接最適化することにより,重要でないパラメータをゼロにすることができるディープニューラルネットワークの完全微分可能なスパーシフィケーション法を提案する。その結果,提案手法はネットワークのスパース化構造と重み付けの両方をエンドツーエンドで学習することができる。様々な現代のディープニューラルネットワークに直接適用することができ、トレーニングプロセスに最小限の変更を必要とする。私たちの知る限りでは、これは最初の完全に微分可能なスパーシフィケーション方法です。 Deep neural networks have significantly alleviated the burden of feature engineering, but comparable efforts are now required to determine effective architectures for these networks. Furthermore, as network sizes have become excessively large, a substantial amount of resources is invested in reducing their sizes. These challenges can be effectively addressed through the sparsification of over-complete models. In this study, we propose a fully differentiable sparsification method for deep neural networks, which can zero out unimportant parameters by directly optimizing a regularized objective function with stochastic gradient descent. Consequently, the proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner. It can be directly applied to various modern deep neural networks and requires minimal modification to the training process. To the best of our knowledge, this is the first fully differentiable sparsification method.	翻訳日:2023-10-26 04:08:22 公開日:2023-10-24
# 微分プライバシーにおける完全適応構成 Fully Adaptive Composition in Differential Privacy ( http://arxiv.org/abs/2203.05481v3 ) ライセンス: Link先を確認	Justin Whitehouse and Aaditya Ramdas and Ryan Rogers and Zhiwei Steven Wu	(参考訳) 構成は差分プライバシーの重要な特徴である。よく知られている高度な合成定理は、プライバシの基本的な構成が許すよりも2倍の頻度でプライベートデータベースをクエリできる。しかし、これらの結果は、すべてのアルゴリズムのプライバシパラメータをデータとやりとりする前に修正する必要がある。これを解決するためにRogersらは、アルゴリズムとプライバシパラメータの両方を適応的に選択できる完全適応型合成を導入した。彼らは、適応的な構成でプライバシを測定するための2つの確率的オブジェクトを定義した。すなわち、構成されたインタラクションに対して差分プライバシー保証を提供するプライバシフィルタと、プライバシ損失に関する時間的一様境界であるプライバシオドメータである。高度な合成と既存のフィルターとオドメーターの間には大きなギャップがある。まず、既存のフィルタは、構成されるアルゴリズムに強い仮定を与える。第二に、これらのオドメータとフィルターは大きな定数に苦しめられ、実用的でない。我々は,プライバシパラメータを適応的に選択したにもかかわらず,定数を含む高度な構成率に適合するフィルタを構築した。途中で、おおよそのzcdpに対するプライバシーフィルターも導出します。また、オドメーターの一般的なファミリーもいくつか構築する。これらのオドメーターは、任意の、事前選択された時点、または同時に同時に2つの対数係数まで高度な合成の厳密さと一致する。マルティンゲール濃度の進歩を利用して結果を得る。要約すると、完全に適応的なプライバシは、ほとんど損失なく取得可能である。 Composition is a key feature of differential privacy. Well-known advanced composition theorems allow one to query a private database quadratically more times than basic privacy composition would permit. However, these results require that the privacy parameters of all algorithms be fixed before interacting with the data. To address this, Rogers et al. introduced fully adaptive composition, wherein both algorithms and their privacy parameters can be selected adaptively. They defined two probabilistic objects to measure privacy in adaptive composition: privacy filters, which provide differential privacy guarantees for composed interactions, and privacy odometers, time-uniform bounds on privacy loss. There are substantial gaps between advanced composition and existing filters and odometers. First, existing filters place stronger assumptions on the algorithms being composed. Second, these odometers and filters suffer from large constants, making them impractical. 
We construct filters that match the rates of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters. En route we also derive a privacy filter for approximate zCDP. We also construct several general families of odometers. These odometers match the tightness of advanced composition at an arbitrary, preselected point in time, or at all points in time simultaneously, up to a doubly-logarithmic factor. We obtain our results by leveraging advances in martingale concentration. In sum, we show that fully adaptive privacy is obtainable at almost no loss.	翻訳日:2023-10-26 04:02:28 公開日:2023-10-24
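The "quadratically more queries" claim in the entry above can be checked numerically with the classical, non-adaptive composition bounds. The sketch assumes k mechanisms that are each (eps, delta)-differentially private with parameters fixed in advance, and uses the standard advanced composition bound eps' = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1). It does not implement the adaptive filters or odometers constructed in the paper, and the function names are illustrative.

```python
import math

def basic_composition(eps, k):
    """Total epsilon after k mechanisms under basic composition."""
    return k * eps

def advanced_composition(eps, k, delta_prime):
    """Total epsilon after k mechanisms under advanced composition.

    The overall guarantee is (eps', k * delta + delta_prime)-DP, where eps'
    is the value returned here and delta is the per-mechanism delta.
    """
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))

if __name__ == "__main__":
    eps, k, delta_prime = 0.1, 100, 1e-6
    print("basic:   ", basic_composition(eps, k))
    print("advanced:", advanced_composition(eps, k, delta_prime))
```

For small eps the first term dominates and grows like sqrt(k) rather than k, which is where the quadratic gain in the number of queries comes from.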
# LAP:畳み込みニューラルネットワークにおける概念に基づく自己解釈と知識注入のための注意型モジュール LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks ( http://arxiv.org/abs/2201.11808v5 ) ライセンス: Link先を確認	Rassa Ghavami Modegh, Ahmad Salimi, Alireza Dizaji, Hamid R. Rabiee	(参考訳) 深層畳み込みニューラルネットワークの最先端性能にもかかわらず、見当たらない状況ではバイアスや誤動作の影響を受けやすい。さらに、推論の背後にある複雑な計算は、信頼を育むには人間には理解できない。外部説明手法は、人間の理解可能な方法でネットワーク決定を解釈しようと試みてきたが、仮定や単純化により誤認を訴えられている。一方、モデル固有の自己解釈性は、前述の誤りに対してより堅牢であるが、既に訓練されたモデルには適用できない。そこで本研究では, 自己解釈性を実現し, 性能損失を伴わない知識注入の可能性を実現する, LAP (Local Attention Pooling) と呼ばれる新しい注意層を提案する。このモジュールは、どんな畳み込みニューラルネットワークにも簡単に接続できる。我々は、専門家の注釈に頼らずに、意思決定における特徴の区別を学ぶための弱教師付きトレーニングスキームを定義した。我々は、ImageNetを含む2つのデータセット上で複数のLAP拡張モデルを評価することによって、我々の主張を検証する。提案するフレームワークは、一般的なホワイトボックスの説明手法よりも、人間の理解しやすく忠実なモデル解釈を提供する。 Despite the state-of-the-art performance of deep convolutional neural networks, they are susceptible to bias and malfunction in unseen situations. Moreover, the complex computation behind their reasoning is not human-understandable to develop trust. External explainer methods have tried to interpret network decisions in a human-understandable way, but they are accused of fallacies due to their assumptions and simplifications. On the other side, the inherent self-interpretability of models, while being more robust to the mentioned fallacies, cannot be applied to the already trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability and the possibility for knowledge injection without performance loss. The module is easily pluggable into any convolutional neural network, even the already trained ones. We have defined a weakly supervised training scheme to learn the distinguishing features in decision-making without depending on experts' annotations. We verified our claims by evaluating several LAP-extended models on two datasets, including ImageNet. 
The proposed framework offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.	翻訳日:2023-10-26 04:01:23 公開日:2023-10-24
# heam:ディープニューラルネットワークの高効率近似マルチプライア最適化 HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks ( http://arxiv.org/abs/2201.08022v4 ) ライセンス: Link先を確認	Su Zheng, Zhen Li, Yao Lu, Jingbo Gao, Jide Zhang, Lingli Wang	(参考訳) オペランド分布にしたがって平均誤差を最小化する近似乗算器の自動設計のための最適化手法を提案する。我々の乗算器は、DNNにおいて最もよく再現された近似乗算器よりも50.24%高い精度で15.76%小さく、消費電力が25.05%減少し、3.50%遅れている。正確な乗算器と比較して、乗算器は面積、消費電力、遅延を44.94%、47.63%、および16.78%削減し、精度の損失は無視できる。我々の乗算器を持つ試験されたDNN加速器モジュールは、18.70%の面積と9.99%の消費電力を得る。 We propose an optimization method for the automatic design of approximate multipliers, which minimizes the average error according to the operand distributions. Our multiplier achieves up to 50.24% higher accuracy than the best reproduced approximate multiplier in DNNs, with 15.76% smaller area, 25.05% less power consumption, and 3.50% shorter delay. Compared with an exact multiplier, our multiplier reduces the area, power consumption, and delay by 44.94%, 47.63%, and 16.78%, respectively, with negligible accuracy losses. The tested DNN accelerator modules with our multiplier obtain up to 18.70% smaller area and 9.99% less power consumption than the original modules.	翻訳日:2023-10-26 04:01:05 公開日:2023-10-24
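The objective named in the HEAM entry above, average multiplier error weighted by the operand distributions, can be written down directly. The sketch below is an assumption-laden illustration: the truncating multiplier is a toy stand-in rather than the optimized design from the paper, the error metric is absolute error, and operands are treated as independent with distributions given as value-to-probability dictionaries.

```python
def truncating_multiplier(a, b, drop_bits=2):
    """Toy approximate multiplier: zero the low bits of each operand first."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

def expected_abs_error(multiplier, p_a, p_b):
    """Average |approximate - exact| product under independent operand distributions."""
    return sum(
        pa * pb * abs(multiplier(a, b) - a * b)
        for a, pa in p_a.items()
        for b, pb in p_b.items()
    )

if __name__ == "__main__":
    uniform = {v: 1 / 16 for v in range(16)}  # 4-bit operands, all equally likely
    aligned = {0: 0.5, 8: 0.5}                # operands whose low bits are always 0
    print(expected_abs_error(truncating_multiplier, uniform, uniform))
    print(expected_abs_error(truncating_multiplier, aligned, aligned))
```

The same approximate circuit scores very differently under the two distributions, which is why an error measure that takes the operand statistics of the target DNN into account can favor a different design than a uniform-input measure.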
# DNNテストのための実世界のメディアデータの多変量解析 Provably Valid and Diverse Mutations of Real-World Media Data for DNN Testing ( http://arxiv.org/abs/2112.01956v2 ) ライセンス: Link先を確認	Yuanyuan Yuan, Qi Pang, Shuai Wang	(参考訳) ディープニューラルネットワーク(dnn)は、しばしば高次元メディアデータ(例えば写真、テキスト、音声)を受け取り、その知覚内容(例えば猫)を理解する。 DNNをテストするには、誤予測を引き起こすために多様な入力が必要である。いくつかの予備的な研究では、バイトレベルの突然変異やドメイン固有のフィルター(霧など)を使用し、有効変異は制限され、エラーを起こしやすい。 sota worksは(無限の)入力を生成するために深い生成モデルを採用している。また、変異した入力を知覚的に有効に保つために(例えば、猫は突然変異後に「猫」のままである)、既存の努力は不正確で一般化不可能なヒューリスティックに頼っている。本研究は,低次元空間における高次元メディアデータの知覚を捉える理論である,多様体に基づく厳密な手法により,メディア入力変異(DIV)と妥当性(VAL)の2つの重要な目的を再考する。 DIV と VAL が互いに密接な関係にある重要な結果を示し、SOTA 生成モデルに基づく手法が実世界のメディアデータ(DIV と VAL の犠牲)を根本的に変更できないことを証明した。対照的に,実世界のメディアデータを,多様体に基づく高いDIVとVALで変更できる可能性について論じる。我々は,様々なフォーマット(画像,音声,テキスト)のメディアデータを,多様体に基づく統一的な方法で変更する技術ソリューションを考案する。特に、メディアデータが低次元多様体に投影されると、そのデータは特定の方向とステップサイズで多様体の上を歩くことで変更することができる。入力データと対比すると、変異されたデータは、適度に高いval(犬はまだ残っている)を保持しながら、知覚特性(例えば、横たわる犬対立犬)にdivを奨励する。 DNNをテストするためにDEEPWALKで実装する。 DEEPWALKは包括性テストにおいて従来の手法よりも優れており、より高い品質のエラートリガー入力を見つけることができる。 Deep neural networks (DNNs) often accept high-dimensional media data (e.g., photos, text, and audio) and understand their perceptual content (e.g., a cat). To test DNNs, diverse inputs are needed to trigger mis-predictions. Some preliminary works use byte-level mutations or domain-specific filters (e.g., foggy), whose enabled mutations may be limited and likely error-prone. SOTA works employ deep generative models to generate (infinite) inputs. Also, to keep the mutated inputs perceptually valid (e.g., a cat remains a "cat" after mutation), existing efforts rely on imprecise and less generalizable heuristics. This study revisits two key objectives in media input mutation - perception diversity (DIV) and validity (VAL) - in a rigorous manner based on manifold, a well-developed theory capturing perceptions of high-dimensional media data in a low-dimensional space. 
We show that DIV and VAL are inextricably bound to each other, and prove that SOTA generative model-based methods fundamentally fail to mutate real-world media data (sacrificing either DIV or VAL). In contrast, we discuss the feasibility of mutating real-world media data with provably high DIV and VAL based on manifold. We concretize the technical solution of mutating media data of various formats (images, audio, text) in a unified manner based on manifold. Specifically, when media data are projected into a low-dimensional manifold, the data can be mutated by walking on the manifold with certain directions and step sizes. When contrasted with the input data, the mutated data exhibit encouraging DIV in the perceptual traits (e.g., lying vs. standing dog) while retaining reasonably high VAL (i.e., a dog remains a dog). We implement our techniques in DEEPWALK for testing DNNs. DEEPWALK outperforms prior methods in testing comprehensiveness and can find more error-triggering inputs with higher quality.	翻訳日:2023-10-26 04:00:49 公開日:2023-10-24
# 最も単純なストリーミング木 Simplest Streaming Trees ( http://arxiv.org/abs/2110.08483v6 ) ライセンス: Link先を確認	Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein	(参考訳) ランダムフォレストや勾配ブースティング木などの決定森林は、特に表データにおいて、現実世界のデータ問題の主要な機械学習手法である。しかし、現在の実装のほとんどはバッチモードでのみ動作するため、より多くのデータが到着してもインクリメンタルに更新することはできない。以前のいくつかの作品は、この制限を克服するためにストリーミングツリーとアンサンブルを開発した。それにもかかわらず、これらの最先端アルゴリズムは、いくつかの問題に対する精度の低下や、他の問題でのメモリ使用率など、多くの欠点を抱えていることがわかった。そこで、我々は、決定木を可能な限りシンプルに拡張し、新しいデータを与え、成長を続けることで既存の木を更新し、古い木を新しい木に置き換え、全体の木数を制御する。 72の分類問題(OpenML-CC18データスイート)を含むベンチマークスイートでは、上記のいずれの制限も受けないストリーム決定フォレスト(SDF)のアプローチが示されている。これらのデータセット上では、従来のバッチ決定森林アルゴリズムと同等に、時にはより良く機能することを示した。したがって、SDFは多くの現実世界の問題に容易に適用できるストリーミング木や森林の単純な標準を確立している。 Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including low accuracy on some problems and high memory usage on others. We therefore developed the simplest possible extension of decision trees: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach, Stream Decision Forest (SDF), does not suffer from either of the aforementioned limitations. On those datasets, we also demonstrate that our approach often performs as well as, and sometimes even better than, conventional batch decision forest algorithms. Thus, SDFs establish a simple standard for streaming trees and forests that could readily be applied to many real-world problems.	翻訳日:2023-10-26 04:00:14 公開日:2023-10-24
# 非教師なしビデオ領域適応による行動認識:対角的視点 Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective ( http://arxiv.org/abs/2208.07365v3 ) ライセンス: Link先を確認	Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin	(参考訳) 教師なしビデオドメイン適応は実用的だが難しい課題である。この作業では、初めて、歪んだ視点からそれに取り組む。我々のキーとなる考え方は、空間的領域と時間的領域の分断を分離して扱うことである。具体的には,静的情報のエンコードと動的情報をエンコードする2組の潜在要因によるクロスドメインビデオの生成を検討する。その後、トランスファーシーケンスVAE(TranSVAE)フレームワークが開発され、そのような世代をモデル化する。適応性を高めるために,潜在因子を制約する目的をいくつか提案する。これらの制約により、静的なドメイン固有情報を切り離すことで空間的ばらつきを容易に取り除き、対角学習により時間的ばらつきをフレームレベルとビデオレベルの両方からさらに低減することができる。 UCF-HMDB、Jester、Epic-Kitchensデータセットの大規模な実験は、最先端のいくつかのアプローチと比較してTranSVAEの有効性と優位性を検証する。コードは公開されている。 Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better serve for adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling the static domain-specific information out, and the temporal divergence is further reduced from both frame- and video-levels through adversarial learning. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches. Code is publicly available.	翻訳日:2023-10-26 03:50:54 公開日:2023-10-24
# 作用素値シャッテン空間と量子エントロピー Operator-valued Schatten spaces and quantum entropies ( http://arxiv.org/abs/2207.06693v3 ) ライセンス: Link先を確認	Salman Beigi, Milad M. Goodarzi	(参考訳) 作用素値のシャッテン空間は g. pisier によってベクトル値 $\ell_p$-spaces の非可換対応として導入された。この作用素空間の族は補間スケールを形成し、様々なアプリケーションにおいて強力で便利なツールとなる。特に、この族から来るノルムは自然に量子情報理論(QIT)におけるあるエントロピー量の定義に現れるので、ピシエの理論を用いてそれらの量のいくつかの特徴を確立することができる。それにもかかわらず、既存の文献からこの理論の主結果の証明に従うことは極めて困難である。本稿では,特にQITコミュニティ全体において,Pisierの理論の基礎となる概念と概念を自己完結した形で提示することによって,このギャップを埋めようとしている。さらに、この理論のいくつかの応用をQITで述べる。特に、量子条件 R'enyi エントロピーに束縛された新しい一様連続性を証明する。 Operator-valued Schatten spaces were introduced by G. Pisier as a noncommutative counterpart of vector-valued $\ell_p$-spaces. This family of operator spaces forms an interpolation scale which makes it a powerful and convenient tool in a variety of applications. In particular, as the norms coming from this family naturally appear in the definition of certain entropic quantities in Quantum Information Theory (QIT), one may apply Pisier's theory to establish some features of those quantities. Nevertheless, it could be quite challenging to follow the proofs of the main results of this theory from the existing literature. In this article, we attempt to fill this gap by presenting the underlying concepts and ideas of Pisier's theory in a self-contained way which we hope to be more accessible, especially for the QIT community at large. Furthermore, we describe some applications of this theory in QIT. In particular, we prove a new uniform continuity bound for the quantum conditional R\'enyi entropy.	翻訳日:2023-10-26 03:50:28 公開日:2023-10-24
# 動的変動軌跡モデルを用いた心エコー図の異常検出 Anomaly Detection in Echocardiograms with Dynamic Variational Trajectory Models ( http://arxiv.org/abs/2206.15316v3 ) ライセンス: Link先を確認	Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt	(参考訳) 心エコービデオの新しい異常検出法を提案する。本手法は循環周期の周期的特性を利用して変動潜在軌道モデル(TVAE)の3つの変種を学習する。第1の2つの変種(TVAE-CとTVAE-R)は心臓の厳格な周期運動をモデル化するが、第3の変種(TVAE-S)はより一般的であり、ビデオ全体を通して空間表現の変化を可能にする。全てのモデルは、健康な人口の規範を学ぶために、複数のチャンバービューからなる幼児の心エコービデオの、新しい社内データセットの健全なサンプルに基づいて訓練される。推定の際には,データセット内の分布外サンプルを検出するために,MAPに基づく最大異常検出を行う。提案手法は, Ebstein's Anomaly や Shone-complex などの重症先天性心疾患を確実に同定する。さらに、肺高血圧や右室拡張を検出する際に、標準変分オートエンコーダを用いたMAPベースの異常検出よりも優れた性能を発揮する。最後に, 異常心構造に対応する領域を強調するヒートマップを用いて, 出力の解釈可能な説明を可能にすることを実証する。 We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein's Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.	翻訳日:2023-10-26 03:50:12 公開日:2023-10-24
# ボソニックガウス系の量子r\'{e}nyiエントロピー汎関数 Quantum R\'{e}nyi Entropy Functionals for Bosonic Gaussian Systems ( http://arxiv.org/abs/2204.10737v3 ) ライセンス: Link先を確認	Junseo Lee and Kabgyun Jeong	(参考訳) 本研究では、次数 $p>1$ とパワー $\kappa$ の量子 r\'{e}nyi エントロピーパワー不等式を古典的 r\'{e}nyi-$p$ エントロピーパワー不等式(英語版)の量子アナログとして導入する。この不等式を導出するために、一般化ビームスプリッター演算である量子畳み込みの混合演算により、ボソニックガウス系のWehrl-$p$エントロピーパワー不等式を利用する。この観測は、量子R\'{e}nyi-$p$エントロピーパワーの不等式を、D$モードボソニックガウスの準確率分布に対して直接提供する。提案された不等式は、量子チャネル容量の非自明な計算、特にボソニックガウス量子チャネル上の普遍上界、およびガウス増幅器のスクイージング操作によるガウスの絡み合い観測に有用である。 In this study, the quantum R\'{e}nyi entropy power inequality of order $p>1$ and power $\kappa$ is introduced as a quantum analog of the classical R\'{e}nyi-$p$ entropy power inequality. To derive this inequality, we first exploit the Wehrl-$p$ entropy power inequality on bosonic Gaussian systems via the mixing operation of quantum convolution, which is a generalized beam-splitter operation. This observation directly provides a quantum R\'{e}nyi-$p$ entropy power inequality over a quasi-probability distribution for $D$-mode bosonic Gaussian regimes. The proposed inequality is expected to be useful for the nontrivial computing of quantum channel capacities, particularly universal upper bounds on bosonic Gaussian quantum channels, and a Gaussian entanglement witness in the case of Gaussian amplifier via squeezing operations.	翻訳日:2023-10-26 03:48:47 公開日:2023-10-24
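The entry above builds on the classical Rényi-$p$ entropy. As a minimal sketch, assuming a finite discrete distribution and natural logarithms, the classical quantity $H_p = \frac{1}{1-p}\log\sum_i q_i^p$ can be computed as follows. The function name `renyi_entropy` is illustrative only; the code covers neither the quantum entropy power inequality nor the bosonic Gaussian setting of the paper.

```python
import math


def renyi_entropy(probs, p):
    """Classical Renyi entropy of order p (in nats) of a discrete distribution."""
    if any(q < 0 for q in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    if p <= 0:
        raise ValueError("order p must be positive")
    if abs(p - 1.0) < 1e-12:
        # The p -> 1 limit of the Renyi entropy is the Shannon entropy.
        return -sum(q * math.log(q) for q in probs if q > 0)
    return math.log(sum(q ** p for q in probs if q > 0)) / (1.0 - p)


if __name__ == "__main__":
    dist = [0.5, 0.25, 0.25]
    for order in (1.0, 2.0, 5.0):
        print(order, renyi_entropy(dist, order))
```

For a fixed distribution the value does not increase as the order grows, and for a uniform distribution it equals the logarithm of the number of outcomes for every order.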
# ULF: Cross-Validation を用いた非教師付きラベリング関数補正 ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision ( http://arxiv.org/abs/2204.06863v3 ) ライセンス: Link先を確認	Anastasiia Sedova, Benjamin Roth	(参考訳) 手動ラベリングの費用対効果は弱い監督(WS)であり、データサンプルは事前に定義されたラベリング関数のセット(LF)を使って自動的にアノテートされ、関連するクラスの人工ラベリングを生成するルールベースのメカニズムである。そこで本研究では,k-foldクロスバリデーションの原理に基づくWSのノイズ低減手法について検討する。非教師付きラベル関数補正のための新しいアルゴリズムULFを導入し、いくつかのLF以外のモデルで訓練されたモデルを利用してWSデータを識別し、保持されたLFに固有のバイアスを補正する。特にULFは、高信頼性のクロスバリデーションサンプルにこの割り当てを再見積することで、クラスへのLFの割り当てを洗練します。複数のデータセットの評価は、手動ラベリングを必要とせずにWS学習を向上するULFの有効性を確認する。 A cost-effective alternative to manual data labeling is weak supervision (WS), where data samples are automatically annotated using a predefined set of labeling functions (LFs), rule-based mechanisms that generate artificial labels for the associated classes. In this work, we investigate noise reduction techniques for WS based on the principle of k-fold cross-validation. We introduce a new algorithm ULF for Unsupervised Labeling Function correction, which denoises WS data by leveraging models trained on all but some LFs to identify and correct biases specific to the held-out LFs. Specifically, ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples. Evaluation on multiple datasets confirms ULF's effectiveness in enhancing WS learning without the need for manual labeling.	翻訳日:2023-10-26 03:48:28 公開日:2023-10-24
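The weak supervision setting described above can be illustrated with a small sketch in which rule-based labeling functions vote on each sample. The three labeling functions, the `spam`/`ham` classes, and the majority-vote rule are all hypothetical choices made for illustration. The code shows only how noisy weak labels arise; it does not implement ULF's cross-validated correction.

```python
from collections import Counter

ABSTAIN = None  # returned when a labeling function does not match a sample


def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_contains_meeting(text):
    return "ham" if "meeting" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over the labeling functions that fire; ties and no votes abstain."""
    votes = [label for label in (lf(text) for lf in lfs) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting labeling functions with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for sample in ["Free prize!!!", "Team meeting at 10", "free meeting", "hello"]:
        print(repr(sample), "->", weak_label(sample))
```

The `"free meeting"` sample shows why denoising matters: two rules fire with conflicting classes, so a plain vote cannot decide.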
# 弱結合を超える有限時間ランダウアー原理 Finite-time Landauer principle beyond weak coupling ( http://arxiv.org/abs/2211.02065v3 ) ライセンス: Link先を確認	Alberto Rolandi and Mart\'i Perarnau-Llobet	(参考訳) ランダウアーの原理は、情報を消去する熱力学的コストに根本的な制限を与える。その飽和は可逆等温過程を必要とし、したがって無限の時間を必要とする。我々は,単一のフェルミオンモードの占有中にエンコードされたビットに対して,ランドウアーの原理の有限時間バージョンを開発した。正確な非平衡力学を解くことによって、熱力学への幾何学的アプローチにより、遅い駆動状態における消去過程(フェルミオンのエネルギーと系-バス結合を制御パラメータとする)を最適化する。数値的に解くことができる熱力学的計量と測地線方程式の解析式を求める。これらの解は、非マルコフ的かつ強いカップリング効果を完全に考慮して、ランダウアーの束縛に対する有限時間補正を特徴付けるための最適な過程を与える。 Landauer's principle gives a fundamental limit to the thermodynamic cost of erasing information. Its saturation requires a reversible isothermal process, and hence infinite time. We develop a finite-time version of Landauer's principle for a bit encoded in the occupation of a single fermionic mode, which can be strongly coupled to a reservoir. By solving the exact non-equilibrium dynamics, we optimize erasure processes (taking both the fermion's energy and system-bath coupling as control parameters) in the slow driving regime through a geometric approach to thermodynamics. We find analytic expressions for the thermodynamic metric and geodesic equations, which can be solved numerically. Their solution yields optimal processes that allow us to characterize a finite-time correction to Landauer's bound, fully taking into account non-markovian and strong coupling effects.	翻訳日:2023-10-26 03:42:03 公開日:2023-10-24
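For orientation, the infinite-time Landauer bound that the entry above refines is $k_B T \ln 2$ of dissipated heat per erased bit. The sketch below evaluates that standard bound only. The function name is illustrative, and the finite-time and strong-coupling corrections derived in the paper are not modeled.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact value in the 2019 SI)


def landauer_bound(temperature_kelvin, bits=1.0):
    """Minimum heat in joules dissipated when erasing `bits` bits at temperature T."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return bits * K_B * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    print(f"{landauer_bound(300.0):.3e} J per bit at 300 K")
```

Any erasure completed in finite time costs more than this value, which is the gap the paper's correction characterizes.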
# FrischとSegr\`eによる多段階Stern$\unicode{x2013}$Gerlach実験の量子力学的モデリング Quantum mechanical modeling of the multi-stage Stern$\unicode{x2013}$Gerlach experiment conducted by Frisch and Segr\`e ( http://arxiv.org/abs/2210.11553v3 ) ライセンス: Link先を確認	S. S\"uleyman Kahraman, Kelvin Titimbo, Zhe He, Jung-Tsung Shen, Lihong V. Wang	(参考訳) マルチステージ Stern$\unicode{x2013}$Gerlach 実験はカスケード量子測定を提供する。 Frisch と Segr\``e が行ったマルチステージ Stern$\unicode{x2013}$Gerlach 実験は、Majorana の量子力学を用いて解析的にモデル化され、Rabi によって修正された。しかし、理論的な予測は実験的な観測とよく一致しない。ここでは、スピンの時間発展の超微細構造相互作用を含むフォン・ノイマン方程式を用いて、標準量子力学モデルを数値的に解く。これまでのところ、自由パラメータを使わずに標準量子力学モデルから決定される係数はまだゼロ以下であり、理論と実験のミスマッチを示している。一致を改善する非標準変種を議論するために検討する。 Multi-stage Stern$\unicode{x2013}$Gerlach experiments provide cascaded quantum measurements. The multi-stage Stern$\unicode{x2013}$Gerlach experiment conducted by Frisch and Segr\`e has been modeled analytically using quantum mechanics by Majorana and revised by Rabi by including the hyperfine interaction. However, the theoretical predictions do not match the experimental observation well. Here, we numerically solve the standard quantum mechanical model, via the von Neumann equation, that includes the hyperfine interaction for the time evolution of the spin. Thus far, the coefficients of determination from the standard quantum mechanical model without using free parameters are still below zero, indicating a mismatch between theory and experiment. Non-standard variants that improve the match are explored for discussion.	翻訳日:2023-10-26 03:41:49 公開日:2023-10-24
# 深層生成モデルのコンテンツベース検索 Content-Based Search for Deep Generative Models ( http://arxiv.org/abs/2210.03116v4 ) ライセンス: Link先を確認	Daohan Lu, Sheng-Yu Wang, Nupur Kumari, Rohan Agarwal, Mia Tang, David Bau, Jun-Yan Zhu	(参考訳) カスタマイズおよび事前学習された生成モデルの増大により、ユーザが既存のすべてのモデルを完全に認識することは不可能になった。このニーズに対処するために、我々はコンテンツベースのモデル検索のタスクを導入する: クエリと大量の生成モデルが与えられたとき、クエリに最も適したモデルを見つける。各生成モデルは画像の分布を生成するため、探索タスクを最適化問題として定式化し、クエリと類似したコンテンツを生成する確率が最も高いモデルを選択する。画像,スケッチ,テキストなど,異なるモーダル性からのクエリを考慮し,この確率を近似する定式化を導入する。さらに,様々な問合せモダリティに適合する特徴を学習するモデル検索のためのコントラスト学習フレームワークを提案する。本手法は,モデル検索タスク用に作成した新しいベンチマークである生成モデルzooのベースライン数を上回ることを示す。 The growing proliferation of customized and pretrained generative models has made it infeasible for a user to be fully cognizant of every model in existence. To address this need, we introduce the task of content-based model search: given a query and a large set of generative models, finding the models that best match the query. As each generative model produces a distribution of images, we formulate the search task as an optimization problem to select the model with the highest probability of generating similar content as the query. We introduce a formulation to approximate this probability given the query from different modalities, e.g., image, sketch, and text. Furthermore, we propose a contrastive learning framework for model retrieval, which learns to adapt features for various query modalities. We demonstrate that our method outperforms several baselines on Generative Model Zoo, a new benchmark we create for the model retrieval task.	翻訳日:2023-10-26 03:41:32 公開日:2023-10-24
# Spectral2 Spectral: Image-spectral similarity Assisted Spectral CT Deep Reconstruction without Reference Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without Reference ( http://arxiv.org/abs/2210.01125v2 ) ライセンス: Link先を確認	Xiaodong Guo, Longhui Li, Peng He, Peng Feng, Dingyue Chang, Hengyong Yu, Weiwen Wu	(参考訳) 光子計数検出器(英語版)(PCD)に基づくスペクトル計算トモグラフィーは、バイオメディカル素材のより正確な同定と定量分析を提供する能力を持つため、ますます注目を集めている。狭いエネルギービン内での光子数の制限は、低信号ノイズ比の撮像結果をもたらす。既存のCT再建のための教師付き深層再構築ネットワークは,ノイズのない臨床像を基準として取得することは不可能であるため,これらの課題に対処するのは難しい。本稿では,教師なし手法とデータ先行処理を,Spectral2Spectralという名前の統一フレームワークに相乗化するための反復的深層再構築ネットワークを提案する。我々のSpectral2Spectralは、教師なしの深層学習戦略を用いて、ノイズの多いデータからエンドツーエンドで高品質な画像を得る。画像スペクトル領域内の構造的類似性は、ネットワークトレーニングをさらに制約するために正規化項として洗練される。ニューラルネットワークの重みは自動的に更新され、反復プロセス内の画像の特徴と構造をキャプチャする。 3つの大規模な前臨床データセット実験は、スペクトル2スペクトルが他の最先端の手法よりも優れた画質を再構成することを示した。 Spectral computed tomography based on a photon-counting detector (PCD) attracts increasing attention because it can provide more accurate identification and quantitative analysis of biomedical materials. The limited number of photons within narrow energy bins leads to imaging results with a low signal-to-noise ratio. Existing supervised deep reconstruction networks for CT reconstruction struggle to address these challenges because it is usually impossible to acquire noise-free clinical images with clear structures as references. In this paper, we propose an iterative deep reconstruction network, named Spectral2Spectral, that synergizes an unsupervised method and data priors in a unified framework. Our Spectral2Spectral employs an unsupervised deep training strategy to obtain high-quality images from noisy data in an end-to-end fashion. The structural similarity prior within the image-spectral domain is refined as a regularization term to further constrain the network training. The weights of the neural network are automatically updated to capture image features and structures within the iterative process. Experiments on three large-scale preclinical datasets demonstrate that Spectral2Spectral reconstructs images of better quality than other state-of-the-art methods.	翻訳日:2023-10-26 03:41:17 公開日:2023-10-24
# 畳み込みニューラルネットワークにおける最大プール特徴写像のシフト不変性について On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks ( http://arxiv.org/abs/2209.11740v2 ) ライセンス: Link先を確認	Hubert Leterme (UGA, LJK), K\'evin Polisano (UGA, LJK), Val\'erie Perrier (Grenoble INP, LJK), Karteek Alahari (LJK)	(参考訳) 本稿では,画像分類における畳み込みニューラルネットワーク(cnns)の数学的解釈性の向上に着目する。具体的には、imagenetのようなデータセットでトレーニングすると、指向したバンドパスフィルタによく似たパラメータを学習する傾向がある、第1層で発生する不安定な問題に取り組む。このようなガボル型フィルタによるサブサンプル畳み込みはエイリアスしやすく、小さな入力シフトに敏感である。この文脈では、最大プーリング作用素が複素モジュラーを近似する条件を確立するが、これはほとんどシフト不変である。次に、サブサンプル畳み込みに対するシフト不変性の尺度を導出し、最大プーリングを行う。特に,安定を達成する上で,フィルタの周波数と方向が果たす重要な役割を強調する。本稿では,二本木複素ウェーブレットパケット変換に基づく決定論的特徴抽出器,特に離散ガボール分解の場合について実験的に検証する。 This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.	翻訳日:2023-10-26 03:40:55 公開日:2023-10-24
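The claim above, that max pooling of real-valued filter responses can approximate a shift-invariant complex modulus only under conditions on the filter frequency, can be illustrated with a toy 1-D sketch. It assumes an idealized complex response $e^{i(\omega n + \varphi)}$ of modulus 1, where the phase $\varphi$ plays the role of the input shift. The function names are illustrative, and this is not the shift-invariance measure derived in the paper.

```python
import math


def max_pooled_real_part(omega, phase, window):
    """Max over `window` consecutive samples of Re(exp(i * (omega * n + phase)))."""
    return max(math.cos(omega * n + phase) for n in range(window))


def worst_case_over_shifts(omega, window, n_phases=360):
    """Smallest pooled value as the phase (i.e. the input shift) sweeps a full period."""
    return min(
        max_pooled_real_part(omega, 2.0 * math.pi * j / n_phases, window)
        for j in range(n_phases)
    )


if __name__ == "__main__":
    # The modulus of the idealized response is always 1, whatever the shift.
    print("omega = pi/2 :", worst_case_over_shifts(math.pi / 2, window=4))
    print("omega = pi/16:", worst_case_over_shifts(math.pi / 16, window=4))
```

With a window of 4 samples, the high-frequency filter keeps the pooled value between roughly 0.71 and 1 for every shift, whereas the low-frequency filter lets it drop close to -1, so the pooled output is far from shift invariant there.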
# Amortized Variational Inference: A Systematic Review Amortized Variational Inference: A Systematic Review ( http://arxiv.org/abs/2209.10888v2 ) ライセンス: Link先を確認	Ankush Ganguly, Sanjana Jain, and Ukrit Watchareeruetai	(参考訳) 変分推論(VI)の中核となる原理は、複雑な後続確率密度の統計的推論問題をトラクタブルな最適化問題に変換することである。この特性により、VIは複数のサンプリングベース技術よりも高速になる。しかし、従来のVIアルゴリズムは大規模データセットには拡張性がなく、最適化プロセスを再実行することなく容易に境界外データポイントを推測できない。ストーシャスティック、ブラックボックス、アモールタイズVIといったこの分野の最近の発展は、これらの問題に対処するのに役立っている。生成的モデリングタスクは、パラメータ化関数を用いて近似後続密度パラメータを学習するため、その効率と拡張性にアモータイズVIを広く利用している。本稿では、様々なVI技法の数学的基礎を概観し、VIの解釈の基礎を形成する。さらに, 償却ギャップ, 一般化問題, 不整合表現学習, 後方崩壊など, 償却viの諸問題に対処した最近の傾向について概説する。最後に、VI 最適化を改善するための交互分散手法を解析する。 The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, like stochastic-, black box-, and amortized-VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.	翻訳日:2023-10-26 03:40:38 公開日:2023-10-24
# MaXM:多言語視覚質問応答を目指して MaXM: Towards Multilingual Visual Question Answering ( http://arxiv.org/abs/2209.05401v3 ) ライセンス: Link先を確認	Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut	(参考訳) VQA(Visual Question Answering)は、主に英語のレンズを通して研究されている。しかし、同じ方法で他の言語でVQAに取り組むには、かなりの量のリソースが必要になる。本稿では,データとモデリングの両面で,多言語視覚質問応答(mVQA)のスケーラブルな解を提案する。まず,従来の質問や回答を直接収集する手法よりも,人間のアノテーションの取り組みをはるかに少なくする,mVQAデータ生成のための翻訳ベースのフレームワークを提案する。次に,Crossmodal-3600データセットの多言語キャプションに適用し,テスト専用VQAベンチマークであるMaXMを作成するための効率的なアノテーションプロトコルを開発する。最後に, 単純で軽量で効果的なアプローチと, 最先端の英語および多言語VQAモデルのベンチマークを行う。われわれのベンチマークがmVQAのさらなる研究を促進することを願っている。 Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework for mVQA data generation that requires much less human annotation effort than the conventional approach of directly collecting questions and answers. Then, we apply our framework to the multilingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MaXM, a test-only VQA benchmark in 7 diverse languages. Finally, we develop a simple, lightweight, and effective approach as well as benchmark state-of-the-art English and multilingual VQA models. We hope that our benchmark encourages further research on mVQA.	翻訳日:2023-10-26 03:40:19 公開日:2023-10-24
# SCL-RAI:NERにおける未ラベルエンティティ問題に対する検索拡張推論を用いたスパン型コントラスト学習 SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER ( http://arxiv.org/abs/2209.01646v3 ) ライセンス: Link先を確認	Shuzheng Si, Shuang Zeng, Jiaxing Lin, Baobao Chang	(参考訳) 名前付きエンティティ認識は、テキスト内のエンティティを見つけて分類するタスクである。しかし、NERデータセットのUnlabeled Entity Problemは、NERのパフォーマンスを著しく損なう。本稿では,この問題に対処するためのSCL-RAIを提案する。まず,異なるラベルで表現するスパンの距離を減らし,異なるラベルで表現するコントラスト学習を行うことにより,エンティティ間のあいまいさを軽減し,ラベルのないエンティティに対するモデルの堅牢性を向上させる。そこで我々は,決定境界シフト問題を緩和する検索拡張推論を提案する。本手法は,2つの実世界のデータセットにおいて,従来のSOTA法よりも4.21%,F1スコアが8.64%向上した。 Named Entity Recognition is the task to locate and classify the entities in the text. However, Unlabeled Entity Problem in NER datasets seriously hinders the improvement of NER performance. This paper proposes SCL-RAI to cope with this problem. Firstly, we decrease the distance of span representations with the same label while increasing it for different ones via span-based contrastive learning, which relieves the ambiguity among entities and improves the robustness of the model over unlabeled entities. Then we propose retrieval augmented inference to mitigate the decision boundary shifting problem. Our method significantly outperforms the previous SOTA method by 4.21% and 8.64% F1-score on two real-world datasets.	翻訳日:2023-10-26 03:40:05 公開日:2023-10-24
# カプセル学習のためのハイブリッドGromov-Wasserstein埋め込み Hybrid Gromov-Wasserstein Embedding for Capsule Learning ( http://arxiv.org/abs/2209.00232v2 ) ライセンス: Link先を確認	Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das, Eric Granger, Salvador Garcia	(参考訳) Capsule Networks(CapsNets)は、イメージをオブジェクト、部品、およびそれらの関係の階層にパースすることを目的として、部分全体変換と階層的コンポーネントルーティングを含む2段階のプロセスを使用する。しかし、この階層的関係モデリングは計算コストが高く、潜在的な利点にもかかわらずcapsnetの利用が制限されている。 capsnetモデルの現在の状況は、主に彼らのパフォーマンスとカプセルのベースラインを比較することに集中しており、複雑なタスクでディープcnnの変種と同じレベルの熟練度を達成できていない。この制限に対処するために、標準ベースラインモデルを超え、高性能な畳み込みモデルよりも優れた性能を示すカプセルの学習手法を提案する。まず、入力ベクトルが投影される部分カプセルのグループを紹介します。次に、まず、サブカプセルによってモデル化されたコンポーネントと入力の相違性を定量化し、次に最適な輸送によってアライメント度を決定するハイブリッドGromov-Wassersteinフレームワークを提案する。この革新的なメカニズムは、それぞれのコンポーネント分布の類似性に基づいて、入力とサブカプセルのアライメントを定義する新しい洞察を生かしている。このアプローチはCapsNetsの複雑な高次元データから学ぶ能力を高め、解釈可能性と階層構造を維持する。提案モデルには2つの利点がある。 (i)その軽量な性質は、物体検出を含むより複雑な視覚タスクへのカプセルの応用を促進する。 (ii)これらの要求タスクにおけるベースラインアプローチよりも優れています。 Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relations using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNet despite its potential advantages. The current state of CapsNet models primarily focuses on comparing their performance with capsule baselines, falling short of achieving the same level of proficiency as deep CNN variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared to high-performing convolution models. Our contribution can be outlined in two aspects: firstly, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the Hybrid Gromov-Wasserstein framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport. This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: (i) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; (ii) it outperforms baseline approaches in these demanding tasks.	翻訳日:2023-10-26 03:39:53 公開日:2023-10-24
# DenseShift: 高精度で効率的な2ビットの量子化を目指す DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization ( http://arxiv.org/abs/2208.09708v3 ) ライセンス: Link先を確認	Xinlin Li, Bang Liu, Rui Heng Yang, Vanessa Courville, Chao Xing, Vahid Partovi Nia	(参考訳) 低リソースのエッジデバイスにディープニューラルネットワークを効率的にデプロイするのは、リソース要件の増大が原因で難しい。この問題に対処するため、研究者は2つの量子化のパワーや、メモリ使用量の削減と計算の簡素化を目的としたシフトネットワークなど、乗算フリーなニューラルネットワークを提案している。しかし、既存の低ビットシフトネットワークはフル精度のネットワークほど正確ではなく、通常は制限されたウェイトレンジ符号化スキームと量子化損失に悩まされている。本稿では,シフトネットワークの精度を大幅に向上し,視覚・音声アプリケーションのための全精度ネットワークと競合する性能を実現する高密度シフトネットワークを提案する。さらに,非量子化浮動小数点アクティベーションを用いた効率的なDenseShiftネットワークのデプロイ手法を導入し,既存手法の1.6倍の高速化を実現した。これを実現するために,低ビットシフトネットワークにおけるゼロウェイト値がモデルのキャパシティに寄与せず,推論計算に悪影響を及ぼすことを実証する。そこで本研究では,モデルキャパシティの向上と推論を簡略化するゼロフリーシフト機構を提案する。さらに,学習効率を向上させるための符号スケール分解設計と,モデルの伝達学習性能を向上させるための低分散ランダム初期化戦略を提案する。様々なコンピュータビジョンおよび音声タスクに関する広範な実験により,高密度シフトは既存の低ビット乗算フリーネットワークよりも優れており,全精度ネットワークに比べて競争性能が向上することが示された。さらに,提案手法は,精度を低下させることなく強い転送学習性能を示す。私たちのコードはGitHubでリリースされました。 Efficiently deploying deep neural networks on low-resource edge devices is challenging due to their ever-increasing resource requirements. To address this issue, researchers have proposed multiplication-free neural networks, such as Power-of-Two quantization networks, also known as Shift networks, which aim to reduce memory usage and simplify computation. However, existing low-bit Shift networks are not as accurate as their full-precision counterparts, typically suffering from limited weight range encoding schemes and quantization loss. In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks, achieving competitive performance to full-precision networks for vision and speech applications. In addition, we introduce a method to deploy an efficient DenseShift network using non-quantized floating-point activations, while obtaining 1.6X speed-up over existing methods. To achieve this, we demonstrate that zero-weight values in low-bit Shift networks do not contribute to model capacity and negatively impact inference computation. To address this issue, we propose a zero-free shifting mechanism that simplifies inference and increases model capacity. We further propose a sign-scale decomposition design to enhance training efficiency and a low-variance random initialization strategy to improve the model's transfer learning performance. Our extensive experiments on various computer vision and speech tasks demonstrate that DenseShift outperforms existing low-bit multiplication-free networks and achieves competitive performance compared to full-precision networks. Furthermore, our proposed approach exhibits strong transfer learning performance without a drop in accuracy. Our code was released on GitHub.	翻訳日:2023-10-26 03:39:27 公開日:2023-10-24
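The idea behind power-of-two (shift) quantization mentioned above can be sketched in a few lines: each weight is stored as a sign and an integer exponent, so multiplying an integer activation by the weight reduces to a bit shift. The sketch assumes nearest-exponent rounding in the log2 domain and a zero-free weight set; the function names and the default exponent range are hypothetical. It is a generic illustration, not the DenseShift training procedure or its sign-scale decomposition.

```python
import math


def quantize_pow2(weight, min_exp=-3, max_exp=0):
    """Map a real weight to (sign, k), representing sign * 2**k; zero is never produced."""
    sign = -1 if weight < 0 else 1
    magnitude = abs(weight)
    if magnitude == 0.0:
        exponent = min_exp  # zero-free: fall back to the smallest magnitude
    else:
        exponent = round(math.log2(magnitude))  # nearest exponent in the log2 domain
    exponent = max(min_exp, min(max_exp, exponent))
    return sign, exponent


def shift_multiply(activation, sign, exponent):
    """Multiply an integer activation by sign * 2**exponent using shifts only."""
    if exponent >= 0:
        shifted = activation << exponent
    else:
        shifted = activation >> -exponent  # right shift floors the result
    return -shifted if sign < 0 else shifted


if __name__ == "__main__":
    for w in (0.3, -0.9, 0.0):
        s, k = quantize_pow2(w)
        print(w, "->", s * 2.0 ** k, "| 40 * w ~", shift_multiply(40, s, k))
```

Because a right shift floors, results for negative exponents are exact only when the activation is divisible by the corresponding power of two; a fixed-point deployment would have to account for that rounding.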
# 非エルミタン系における量子力学の欠陥解凍 Quantum Metric Unveils Defect Freezing in Non-Hermitian Systems ( http://arxiv.org/abs/2301.02247v3 ) ライセンス: Link先を確認	Karin Sim, Nicol\`o Defenu, Paolo Molignini, R. Chitra	(参考訳) 量子ハミルトニアンにおける非ハーモニティ性は、非単位時間進化とおそらく複雑なエネルギー固有値をもたらし、エルミート的でない豊富な現象論をもたらす。本研究では, 完全可解な非エルミート系のダイナミクスを研究し, 線形クエンチを受ける$\mathcal{pt}$-symmetric モードと$\mathcal{pt}$-brokenモードの両方をホストする。ヒルベルト空間に非自明な動的計量が与えられる完全に一貫したフレームワークを用いることで、生成された欠陥のダイナミクスを分析する。エルミート系とは対照的に, PT崩壊時間進化は欠陥凍結を引き起こすため, 断熱性に反することが明らかとなった。この物理学は、状態の時間依存ノルムによる量正規化の法則によって見逃されるため、いわゆるメートル法フレームワークを必要とする。我々の結果は幅広い実験システムに関係している。 Non-Hermiticity in quantum Hamiltonians leads to nonunitary time evolution and possibly complex energy eigenvalues, which can lead to a rich phenomenology with no Hermitian counterpart. In this work, we study the dynamics of an exactly solvable non-Hermitian system, hosting both $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken modes subject to a linear quench. Employing a fully consistent framework, in which the Hilbert space is endowed with a nontrivial dynamical metric, we analyze the dynamics of the generated defects. In contrast to Hermitian systems, our study reveals that $\mathcal{PT}$-broken time evolution leads to defect freezing and hence the violation of adiabaticity. This physics necessitates the so-called metric framework, as it is missed by the oft-used approach of normalizing quantities by the time-dependent norm of the state. Our results are relevant for a wide class of experimental systems.	翻訳日:2023-10-26 03:30:46 公開日:2023-10-24
# 境界時間結晶を用いた量子力学 Quantum metrology with boundary time crystals ( http://arxiv.org/abs/2301.02103v2 ) ライセンス: Link先を確認	V. Montenegro, M. G. Genoni, A. Bayat, M. G. A. Paris	(参考訳) 量子センシングは、古典的な技術よりも量子技術の優位性を実証するアリーナの1つである。しかし、そのような優位性は、回避不可能なノイズとプローブのデコヒーレンスにより減少することができる。したがって、デコヒーレンスと戦うか利益を得るための気象学的戦略は非常に望ましい。これは、散逸相転移をサポートするある種の脱コヒーレンス駆動多体系であり、センシングに役立つかもしれない。境界時結晶(バウンダリ時結晶)は、時間-翻訳対称性が破られ、熱力学の極限で開量子系に長寿命の振動が出現する物質のエキゾチックな散逸相である。対称から境界時間結晶相への遷移は2次遷移によって説明され、量子フィッシャー情報によって定量化された量子エンハンス感度を示す。また,システムの臨界指数を決定し,それらの関係性を確立する。我々の手法は、量子エンハンス感度を達成するためにデコヒーレンスを活用することの実証である。実用の観点からは、初期化とは無関係であることの利点があり、単純な測定で捉えることができる。 Quantum sensing is one of the arenas that exemplifies the superiority of quantum technologies over their classical counterparts. Such superiority, however, can be diminished due to unavoidable noise and decoherence of the probe. Thus, metrological strategies to fight against or profit from decoherence are highly desirable. This is the case of certain types of decoherence-driven many-body systems supporting dissipative phase transitions, which might be helpful for sensing. Boundary time crystals are exotic dissipative phases of matter in which the time-translational symmetry is broken, and long-lasting oscillations emerge in open quantum systems at the thermodynamic limit. We show that the transition from a symmetry unbroken into a boundary time crystal phase, described by a second-order transition, reveals quantum-enhanced sensitivity quantified through quantum Fisher information. We also determine the critical exponents of the system and establish their relationship. Our scheme is indeed a demonstration of harnessing decoherence for achieving quantum-enhanced sensitivity. From a practical perspective, it has the advantage of being independent of initialization and can be captured by a simple measurement.	翻訳日:2023-10-26 03:30:28 公開日:2023-10-24
# T-Projection:シーケンスラベリングタスクのための高品質アノテーションプロジェクション T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks ( http://arxiv.org/abs/2212.10548v2 ) ライセンス: Link先を確認	Iker Garc\'ia-Ferrero, Rodrigo Agerri, German Rigau	(参考訳) 与えられたシーケンスラベリングタスクと言語に対するラベル付きデータがないため、アノテーションプロジェクションは注釈付きデータを自動的に生成する戦略のひとつとして提案されている。アノテーションプロジェクションはしばしば、並列コーパス上で、ソース言語の与えられたスパンに関連するラベルをターゲット言語の対応するスパンに転送するタスクとして定式化されている。本稿では,大規模な事前学習されたテキスト・テキスト言語モデルと最先端機械翻訳技術を活用したアノテーション投影手法T-Projectionを提案する。 T-プロジェクションはラベルプロジェクションタスクを2つのサブタスクに分解する。 (i)多言語t5モデルを用いた投影候補の集合を生成した候補生成ステップ (ii)翻訳確率に基づいて生成候補をランク付けする候補選択ステップ。 5つのインド・ヨーロッパ語と8つの低資源アフリカの言語において内在的および外在的タスクについて実験を行った。我々は、T射影が従来のアノテーション投影法よりも広いマージンで優れていると評価した。我々は、T-Projectionがシーケンスラベリングタスクにおける高品質なトレーニングデータの欠如を自動的に緩和するのに役立つと考えている。コードとデータは公開されている。 In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data. Annotation projection has often been formulated as the task of transporting, on parallel corpora, the labels pertaining to a given span in the source language into its corresponding span in the target language. In this paper we present T-Projection, a novel approach for annotation projection that leverages large pretrained text-to-text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) a candidate generation step, in which a set of projection candidates is generated using a multilingual T5 model, and (ii) a candidate selection step, in which the generated candidates are ranked based on translation probabilities. We conducted experiments on intrinsic and extrinsic tasks in 5 Indo-European and 8 low-resource African languages. We demonstrate that T-Projection outperforms previous annotation projection methods by a wide margin. We believe that T-Projection can help to automatically alleviate the lack of high-quality training data for sequence labeling tasks. Code and data are publicly available.	翻訳日:2023-10-26 03:30:11 公開日:2023-10-24
# 近位因果学習のための最適治療基準 Optimal Treatment Regimes for Proximal Causal Learning ( http://arxiv.org/abs/2212.09494v3 ) ライセンス: Link先を確認	Tao Shen, Yifan Cui	(参考訳) 政策立案者が因果推論を引き合いに出し、観測データに基づいて決定を下す場合の一般的な懸念は、測定された共変量体が、すべての共変量体、すなわち標準的無根性の仮定が成り立たないことである。最近提案された近親因果推論フレームワークは、実生活シナリオに付随するプロキシ変数を利用して因果効果を特定し、意思決定を容易にする。そこで本研究では, 橋梁の既往と治療を基盤とした, 最適な個別化治療手法を提案する。以上の結果から,この新しい最適治療体制の価値関数は文献上既存のものよりも優れていることが示された。識別、優越性、過剰な価値境界、推定された体制の整合性を含む理論的保証が確立される。さらに,提案手法を数値実験により実証し,実データに適用する。 A common concern when a policymaker draws causal inferences from and makes decisions based on observational data is that the measured covariates are insufficiently rich to account for all sources of confounding, i.e., the standard no confoundedness assumption fails to hold. The recently proposed proximal causal inference framework shows that proxy variables that abound in real-life scenarios can be leveraged to identify causal effects and therefore facilitate decision-making. Building upon this line of work, we propose a novel optimal individualized treatment regime based on so-called outcome and treatment confounding bridges. We then show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature. Theoretical guarantees, including identification, superiority, excess value bound, and consistency of the estimated regime, are established. Furthermore, we demonstrate the proposed optimal regime via numerical experiments and a real data application.	翻訳日:2023-10-26 03:29:29 公開日:2023-10-24
# PhoMoH:人間の頭部のフォトリアリスティックな3Dモデル PhoMoH: Implicit Photorealistic 3D Models of Human Heads ( http://arxiv.org/abs/2212.07275v3 ) ライセンス: Link先を確認	Mihai Zanfir, Thiemo Alldieck and Cristian Sminchisescu	(参考訳) 本稿では,フォトリアリスティックな3次元形状の生成モデルを構築し,頭髪,あごひげ,口腔,衣服などの人間の頭部の外観をモデル化するニューラルネット手法であるフォモ(英語版)を提案する。以前の研究とは対照的に、PhoMoHは神経場を用いて人間の頭部をモデル化し、複雑なトポロジーをサポートする。ヘッドモデルをゼロから学習する代わりに,既存の表現型ヘッドモデルに新機能を加えることを提案する。具体的には,中解像度の頭部モデル上に高精細なジオメトリネットワークを,細部,局所的なジオメトリ認識,不連続色場とともに学習する。提案するアーキテクチャにより,比較的少ないデータからフォトリアリスティックな頭部モデルを学ぶことができる。学習された生成幾何学と出現ネットワークは個別にサンプリングすることができ、多様で現実的な人間の頭を作ることができる。大規模な実験は、我々のメソッドを定性的かつ異なるメトリクスで検証する。 We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photo-realistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and enable the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.	翻訳日:2023-10-26 03:29:12 公開日:2023-10-24
# チャネル識別のための利益のある絡み合い Profitable entanglement for channel discrimination ( http://arxiv.org/abs/2211.15108v2 ) ライセンス: Link先を確認	Samad Khabbazi Oskouei, Stefano Mancini, Milajiguli Rexiti	(参考訳) 本稿では,2つの一般量子ビットチャネル(単位前処理と後処理)を識別する際の側絡み合いの有用性について検討し,それが成功確率を増大させる(及び、そうでない)正確な条件を決定する。これは、まず、完全正およびトレース保存されたキュービット線型写像の集合において極端であるチャネルの問題を解析し、次にそのような集合の内部にあるチャネルについて構成的に行われる。 We investigate the usefulness of side entanglement in discriminating between two generic qubit channels, up to unitary pre- and post-processing, and determine exact conditions under which it does enhance (as well as conditions under which it does not) the success probability. This is done in a constructive way by first analyzing the problem for channels that are extremal in the set of completely positive and trace-preserving qubit linear maps and then for channels that are inside such a set.	翻訳日:2023-10-26 03:28:42 公開日:2023-10-24
# 物理ベースオブジェクト6d-pose推定法 Physics-Based Object 6D-Pose Estimation during Non-Prehensile Manipulation ( http://arxiv.org/abs/2211.13572v3 ) ライセンス: Link先を確認	Zisong Xu, Rafael Papallas, Mehmet Dogar	(参考訳) 本研究では,物体の6次元姿勢を時間とともに追跡する手法を提案する。オブジェクトの操作中にいつでも、ロボットのジョイントコントロールとカメラからのイメージへのアクセスを前提とします。ロボットのジョイントコントロールを使って、物体の動きを物理ベースの予測します。そして、この予測とカメラからの観測を組み合わせることで、物体のポーズを可能な限り正確に推定する。本研究では,制御情報と視覚情報を組み合わせた粒子フィルタリング手法を提案する。提案手法を2つのベースラインと比較する。 (i)各時間ステップでのイメージベースポーズ推定システムのみの使用、及び (II)計算に高価な物理予測を行わない粒子フィルタであって,物体が一定の速度で動くことを仮定する。その結果、物理ベースの予測を行うことで計算コストが上がり、より正確な追跡が可能となり、カメラに見えない物体でも物体のポーズを推定できることがわかった。 We propose a method to track the 6D pose of an object over time, while the object is under non-prehensile manipulation by a robot. At any given time during the manipulation of the object, we assume access to the robot joint controls and an image from a camera. We use the robot joint controls to perform a physics-based prediction of how the object might be moving. We then combine this prediction with the observation coming from the camera, to estimate the object pose as accurately as possible. We use a particle filtering approach to combine the control information with the visual information. We compare the proposed method with two baselines: (i) using only an image-based pose estimation system at each time-step, and (ii) a particle filter which does not perform the computationally expensive physics predictions, but assumes the object moves with constant velocity. Our results show that making physics-based predictions is worth the computational cost, resulting in more accurate tracking, and estimating object pose even when the object is not clearly visible to the camera.	翻訳日:2023-10-26 03:28:31 公開日:2023-10-24
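The constant-velocity baseline (ii) in the abstract above can be sketched as a small particle filter. This is a one-dimensional illustration under stated assumptions, not the authors' implementation: the state layout `(position, velocity)`, the noise levels, and the function names `predict`, `update`, `resample`, and `track` are all hypothetical. The paper's proposed method would replace the `predict` step with a physics-based prediction driven by the robot joint controls.

```python
import math
import random


def predict(particles, dt, process_noise, rng):
    """Constant-velocity motion model: x <- x + v * dt, plus Gaussian noise."""
    return [
        (x + v * dt + rng.gauss(0.0, process_noise), v + rng.gauss(0.0, process_noise))
        for x, v in particles
    ]


def update(particles, observation, obs_noise):
    """Weight each particle by a Gaussian likelihood of the observed position."""
    weights = [math.exp(-0.5 * ((observation - x) / obs_noise) ** 2) for x, _ in particles]
    total = sum(weights)
    if total == 0.0:  # every likelihood underflowed: fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Draw a new particle set with probability proportional to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def track(observations, n_particles=500, dt=1.0, seed=0):
    """Return the mean position estimate after each observation."""
    rng = random.Random(seed)
    particles = [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 2.0)) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = predict(particles, dt, process_noise=0.05, rng=rng)
        weights = update(particles, z, obs_noise=0.2)
        particles = resample(particles, weights, rng)
        estimates.append(sum(x for x, _ in particles) / len(particles))
    return estimates
```

Calling `track([float(t) for t in range(1, 11)])` follows an object that moves one unit per step. The sketch tracks a scalar position only; a 6D pose would need a richer state and an image-based likelihood.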
# 低複素性を考慮した適応フェデレーションミニマックス最適化 Adaptive Federated Minimax Optimization with Lower complexities ( http://arxiv.org/abs/2211.07303v3 ) ライセンス: Link先を確認	Feihu Huang	(参考訳) フェデレーション学習(Federated Learning)は、分散およびプライバシ保護の機械学習パラダイムとして人気がある。一方、機械学習において、効率的な階層最適化としてミニマックス最適化が広く適用されている。近年,分散ミニマックス問題の解法としてフェデレーション最適化法が提案されている。しかし、これらのフェデレーションされたミニマックス法は依然として高い勾配と通信の複雑さに苦しむ。一方,適応学習速度を用いてアルゴリズムを高速化するアルゴリズムは少ない。このギャップを埋めるため,本論文では,非凸ミニマックス最適化のクラスについて検討し,分散ミニマックス問題を解くための効率的な適応フェデレーションミニマックス最適化アルゴリズム(adafgda)を提案する。特に,adafgdaは運動量に基づく分散低減法と局所sgd法を基盤とし,統一適応行列を用いて様々な適応学習率を柔軟に組み込むことができる。理論的には、AdaFGDAアルゴリズムに対して、非i.i.d.条件下でのソリッド収束解析フレームワークを提供する。さらに、我々のアルゴリズムは、非凸ミニマックス問題の$\epsilon$-stationary pointを求める際に、$\tilde{o}(\epsilon^{-3})$と$\tilde{o}(\epsilon^{-2})$の通信複雑性がより低い勾配(すなわち確率的一階オラクル、sfo)の複雑さを得ることを証明します。実験では,アルゴリズムの効率性を検証するために,深層auc最大化とロバストニューラルネットワークトレーニングタスクについて実験を行う。 Federated learning is a popular distributed and privacy-preserving machine learning paradigm. Meanwhile, minimax optimization, as an effective hierarchical optimization, is widely applied in machine learning. Recently, some federated optimization methods have been proposed to solve distributed minimax problems. However, these federated minimax methods still suffer from high gradient and communication complexities. Meanwhile, few algorithms use adaptive learning rates for acceleration. To fill this gap, in this paper we study a class of nonconvex minimax optimization problems and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems. Specifically, our AdaFGDA builds on the momentum-based variance-reduced and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using a unified adaptive matrix. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under the non-i.i.d. setting. 
Moreover, we prove that our algorithms obtain a lower gradient (i.e., stochastic first-order oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point of the nonconvex minimax problems. Experimentally, we conduct experiments on deep AUC maximization and robust neural network training tasks to verify the efficiency of our algorithms.	翻訳日:2023-10-26 03:28:15 公開日:2023-10-24
# 超不均衡太陽電池モジュール画像における欠陥分割のための高調波出力不均衡 Harmonizing output imbalance for defect segmentation on extremely-imbalanced photovoltaic module cells images ( http://arxiv.org/abs/2211.05295v4 ) ライセンス: Link先を確認	Jianye Yi, Xiaopin Zhong, Weixiang Liu, Zongze Wu, Yuanlong Deng and Zhengguang Wu	(参考訳) 太陽光発電(PV)産業の継続的な発展は、PVモジュール細胞の単結晶の品質に対する高い要求を高めている。 PVモジュールセルイメージの欠陥領域の分割を学ぶとき、Tiny Hidden Cracks (THC) は極めて不均衡なサンプルを生み出す。欠陥画素と通常の画素の比率は1:2000程度である。この極端不均衡により、PVモジュール細胞のTHCのセグメンテーションが難しくなり、セグメンテーションの課題でもある。 To address the problem of segmenting defects on extremely-imbalanced THC data, the paper makes contributions from three aspects: (1) it proposes an explicit measure for output imbalance; (2) it generalizes a distribution-based loss that can handle different types of output imbalances; and (3) it introduces a compound loss with our adaptive hyperparameter selection algorithm that can keep the consistency of training and inference for harmonizing the output imbalance on extremelyimbalanced input data. 提案手法は,広く使用されている4つのディープラーニングアーキテクチャと,入力の不均衡度が異なる4つのデータセットを用いて評価する。実験の結果,提案手法は既存手法よりも優れていた。 The continuous development of the photovoltaic (PV) industry has raised high requirements for the quality of monocrystalline of PV module cells. When learning to segment defect regions in PV module cell images, Tiny Hidden Cracks (THC) lead to extremely-imbalanced samples. The ratio of defect pixels to normal pixels can be as low as 1:2000. This extreme imbalance makes it difficult to segment the THC of PV module cells, which is also a challenge for semantic segmentation. 
To address the problem of segmenting defects on extremely-imbalanced THC data, the paper makes contributions from three aspects: (1) it proposes an explicit measure for output imbalance; (2) it generalizes a distribution-based loss that can handle different types of output imbalances; and (3) it introduces a compound loss with our adaptive hyperparameter selection algorithm that can keep the consistency of training and inference for harmonizing the output imbalance on extremely-imbalanced input data. The proposed method is evaluated on four widely-used deep learning architectures and four datasets with varying degrees of input imbalance. The experimental results show that the proposed method outperforms existing methods.	翻訳日:2023-10-26 03:27:20 公開日:2023-10-24
# 重要再サンプリングによる言語モデルのデータ選択 Data Selection for Language Models via Importance Resampling ( http://arxiv.org/abs/2302.03169v2 ) ライセンス: Link先を確認	Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang	(参考訳) 適切な事前学習データセットの選択は、一般ドメイン(gpt-3など)とドメイン固有言語モデル(例えば、コードx)の両方において不可欠である。この問題を、ラベルなしのターゲットサンプルが与えられた場合に、所望のターゲット分布にマッチするように、大きな生のラベルなしデータセットのサブセットを選択することで定式化する。テキストデータの大規模化と次元化のため、既存の手法では単純なヒューリスティックスや専門家を使ってデータを手作業でキュレートする。代わりに、lmデータ選択に低次元で使用される古典的な重要度再サンプリングアプローチを拡張します。本研究では,トラクタビリティの低減した特徴空間における重み付けを推定し,重み付けによる重み付けを伴うデータを選択する,効率的でスケーラブルなフレームワークであるData Selection with Importance Resampling(DSIR)を提案する。適切な特徴空間を決定するために、選択した事前学習データと特徴空間のターゲットとの近接度を測定するデータ計量であるKL削減が、単純なn-gram特徴量で計算した場合の平均下流精度(r=0.89)と高い相関を持つことを示す。これは、n-gram特徴を用いたDSIRのインスタンス化を動機付けます。特定のドメインに対して事前トレーニングを継続する場合、DSIRは8つのターゲットディストリビューションにわたる専門家のキュレーションと互換性がある。汎用ドメインモデル(ターゲットはウィキペディア+書籍)を事前トレーニングする場合、DSIRはGLUEベンチマークでランダム選択とヒューリスティックフィルタリングベースラインを2-2.5%改善する。 Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting a subset of a large raw unlabeled dataset to match a desired target distribution given some unlabeled target samples. Due to the large scale and dimensionality of the raw text data, existing methods use simple heuristics or use experts to manually curate data. Instead, we extend the classic importance resampling approach used in low-dimensions for LM data selection. We propose Data Selection with Importance Resampling (DSIR), an efficient and scalable framework that estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights. 
To determine an appropriate feature space, we show that KL reduction, a data metric that measures the proximity between selected pretraining data and the target in a feature space, has high correlation with average downstream accuracy (r=0.89) when computed with simple n-gram features. This motivates our instantiation of DSIR using n-gram features. When performing continued pretraining towards a specific domain, DSIR performs comparably to expert curation across 8 target distributions. When pretraining general-domain models (target is Wikipedia + books), DSIR improves over random selection and heuristic filtering baselines by 2-2.5% on the GLUE benchmark.	翻訳日:2023-10-26 03:22:12 公開日:2023-10-24
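The DSIR recipe described above (estimate importance weights in a reduced n-gram feature space, then resample according to those weights) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the released DSIR code: the bucket count `NUM_BUCKETS`, the whitespace tokenizer, add-one smoothing, CRC32 hashing, and the Gumbel top-k sampling step are all choices made for this example, and every function name is hypothetical.

```python
import math
import random
import zlib
from collections import Counter

NUM_BUCKETS = 1 << 16  # hypothetical size of the hashed n-gram feature space


def ngram_features(text, n_max=2):
    """Hashed counts of word unigrams and bigrams (whitespace tokenization)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[zlib.crc32(gram.encode("utf-8")) % NUM_BUCKETS] += 1
    return counts


def fit_log_probs(texts, smoothing=1.0):
    """Bag-of-n-grams model: smoothed log-probability of each hash bucket."""
    counts = Counter()
    for text in texts:
        counts.update(ngram_features(text))
    total = sum(counts.values()) + smoothing * NUM_BUCKETS
    return [math.log((counts.get(b, 0) + smoothing) / total) for b in range(NUM_BUCKETS)]


def log_importance_weight(text, target_logp, raw_logp):
    """log p_target(features) - log p_raw(features) for one document."""
    return sum(c * (target_logp[b] - raw_logp[b]) for b, c in ngram_features(text).items())


def select(raw_texts, target_texts, k, seed=0):
    """Sample k raw documents without replacement, favouring high importance weights."""
    rng = random.Random(seed)
    target_logp = fit_log_probs(target_texts)
    raw_logp = fit_log_probs(raw_texts)
    scored = []
    for text in raw_texts:
        # Gumbel top-k trick: add Gumbel(0, 1) noise to the log-weight, keep the k largest.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        scored.append((log_importance_weight(text, target_logp, raw_logp) + gumbel, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

Documents whose n-grams are relatively more frequent in the target samples than in the raw pool receive larger log-weights, so they are more likely (but not guaranteed) to be selected.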
# NA-SODINN:残音条件に基づく外惑星画像検出のためのディープラーニングアルゴリズム NA-SODINN: a deep learning algorithm for exoplanet image detection based on residual noise regimes ( http://arxiv.org/abs/2302.02854v2 ) ライセンス: Link先を確認	Carles Cantero, Olivier Absil, Carl-Henrik Dahlqvist and Marc Van Droogenbroeck	(参考訳) SODINNアルゴリズムは、角微分画像(ADI)データセットにおける外惑星検出のために設計された畳み込みニューラルネットワークである。 EIDC (Exoplanet Imaging Data Challenge) におけるHCIアルゴリズムのベンチマークの結果が得られた。 i) SODINNは、最終検出マップにおいて、多数の偽陽性を生成でき、 (ii)より局所的に画像を処理するアルゴリズムは、より優れた性能を発揮する。本研究は,新しい局所処理手法を導入し,それに従って学習プロセスを適用することで, sodinn検出性能を向上させることを目的とする。本稿では,畳み込みニューラルネットワーク(CNN)に基づく新しいディープラーニングバイナリ分類器NA-SODINNを提案する。我々の新しいアプローチは、VLT/SPHEREとKeck/NIRC-2のADI配列の局所受信動作特性(ROC)解析を通じて、2つのSODINNベースハイブリッドモデルとより標準の環状PCAアプローチに対して試験された。その結果、NA-SODINNは感度と特異性の両方でSODINNを強化し、特にスペックルが支配するノイズレシエーションにおいて顕著であることがわかった。また, NA-SODINNは, EIDCにおける提案された検出アルゴリズムの完全セットに対してベンチマークを行い, 最終的な検出スコアが最強検出アルゴリズムと一致しているか, あるいは上回っていることを示すとともに, 教師付き機械学習のケースにおいて, 処理された画像の局所的内容に検出タスクを適用することの重要性を図示し, 強化する。 Supervised deep learning was recently introduced in high-contrast imaging (HCI) through the SODINN algorithm, a convolutional neural network designed for exoplanet detection in angular differential imaging (ADI) datasets. The benchmarking of HCI algorithms within the Exoplanet Imaging Data Challenge (EIDC) showed that (i) SODINN can produce a high number of false positives in the final detection maps, and (ii) algorithms processing images in a more local manner perform better. This work aims to improve the SODINN detection performance by introducing new local processing approaches and adapting its learning process accordingly. We propose NA-SODINN, a new deep learning binary classifier based on a convolutional neural network (CNN) that better captures image noise correlations in ADI-processed frames by identifying noise regimes. 
Our new approach was tested against its predecessor, as well as two SODINN-based hybrid models and a more standard annular-PCA approach, through local receiver operating characteristic (ROC) analysis of ADI sequences from the VLT/SPHERE and Keck/NIRC-2 instruments. Results show that NA-SODINN enhances SODINN in both sensitivity and specificity, especially in the speckle-dominated noise regime. NA-SODINN is also benchmarked against the complete set of submitted detection algorithms in EIDC, in which we show that its final detection score matches or outperforms the most powerful detection algorithms. Throughout the supervised machine learning case, this study illustrates and reinforces the importance of adapting the task of detection to the local content of processed images.	翻訳日:2023-10-26 03:21:45 公開日:2023-10-24
# 半スーパービジョンの医用画像分割再考 : ばらつき低減の視点から Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective ( http://arxiv.org/abs/2302.01735v5 ) ライセンス: Link先を確認	Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S Kevin Zhou, Lawrence Hamilton Staib, James S Duncan	(参考訳) 医用画像のセグメンテーションにおいて, 比較学習は, 意味論的に類似した, 異種のサンプルを対比することにより, 視覚表現の質を向上させるための主流の実践である。これは、真に異なる解剖学的特徴を持つ負の例が、もしサンプルを採取すれば、性能が著しく向上する、という観察によって可能となった。しかし実際には、これらのサンプルは類似した解剖学的領域から来ており、モデルは少数派のテールクラスのサンプルを区別するのに苦労し、テールクラスは誤分類されやすくなり、両者ともモデル崩壊に繋がる。本稿では,医療画像分割のための階層化群理論を用いた半教師付きコントラスト学習(cl)フレームワークarcoを提案する。特に, 分散還元推定の概念を通したarcoの構築を最初に提案し, 限定ラベルを持つ画素/ボクセルレベル分割タスクにおいて, ある種の分散還元手法が特に有益であることを示す。さらに,これらのサンプリング手法が分散還元において普遍的であることを理論的に証明する。最後に,5つの2D/3D医療データセットと3つのセマンティックセマンティックセグメンテーションデータセットとラベル設定の異なる8つのベンチマークに対して,我々の手法を実験的に検証した。さらに、clフレームワークをこれらのサンプリング技術で強化し、以前の方法を大きく上回る結果を示す。我々は,これらの課題を克服するために,現在の自己超越目標の限界を定量化し,半監督的医用画像セグメンテーションに向けた重要なステップであると考えている。 For medical image segmentation, contrastive learning is the dominant practice to improve the quality of visual representations by contrasting semantically similar and dissimilar pairs of samples. This is enabled by the observation that without accessing ground truth labels, negative examples with truly dissimilar anatomical features, if sampled, can significantly improve the performance. In reality, however, these samples may come from similar anatomical regions and the models may struggle to distinguish the minority tail-class samples, making the tail classes more prone to misclassification, both of which typically lead to model collapse. In this paper, we propose ARCO, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation. In particular, we first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks with extremely limited labels. 
Furthermore, we theoretically prove these sampling techniques are universal in variance reduction. Finally, we experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings, and our methods consistently outperform state-of-the-art semi-supervised methods. Additionally, we augment the CL frameworks with these sampling techniques and demonstrate significant gains over previous methods. We believe our work is an important step towards semi-supervised medical image segmentation by quantifying the limitation of current self-supervision objectives for accomplishing such challenging safety-critical tasks.	翻訳日:2023-10-26 03:21:17 公開日:2023-10-24
# コンテキストプルーニングメタラーニングによる大規模ニューラルネットワークの学習 Learning Large-scale Neural Fields via Context Pruned Meta-Learning ( http://arxiv.org/abs/2302.00617v3 ) ライセンス: Link先を確認	Jihoon Tack, Subin Kim, Sihyun Yu, Jaeho Lee, Jinwoo Shin, Jonathan Richard Schwarz	(参考訳) 本稿では,オンラインコンテキストポイントの自動選択による大幅なメモリ節約を実現することで,大規模ニューラルネットワークトレーニングのための効率的な最適化に基づくメタ学習手法を提案する。これは、各学習ステップをデータサブセットに集中させ、モデル品質の即時改善を期待し、その結果、大域構造のほぼ瞬時にモデリングし、高周波の詳細を洗練させることによって達成される。さらに,最適化に基づくメタ学習のマイオピアを緩和しつつ,文脈セットの縮小によって生じる誤りを最小化するブートストラップ補正を導入することで,メタ学習初期化の質をさらに向上させる。最後に,メタテスト時間における勾配再スケーリングが,最適化手順を大幅に短縮する上で,極めて高品質なニューラルネットワークの学習を可能にすることを示す。私たちのフレームワークはモデルに依存しず、直感的で、実装が簡単で、幅広い信号に対する大幅な再構成改善を示しています。本稿では,複数のモダリティにまたがる9つのデータセットの広範な実験評価を行い,その手法を構成するアルゴリズム成分を注意深く分析することで,最先端の結果を示す。コードはhttps://github.com/jihoontack/GradNCPで入手できる。 We introduce an efficient optimization-based meta-learning technique for large-scale neural field training by realizing significant memory savings through automated online context point selection. This is achieved by focusing each learning step on the subset of data with the highest expected immediate improvement in model quality, resulting in the almost instantaneous modeling of global structure and subsequent refinement of high-frequency details. We further improve the quality of our meta-learned initialization by introducing a bootstrap correction resulting in the minimization of any error introduced by reduced context sets while simultaneously mitigating the well-known myopia of optimization-based meta-learning. Finally, we show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields in significantly shortened optimization procedures. Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals. 
We provide an extensive empirical evaluation on nine datasets across multiple modalities, demonstrating state-of-the-art results while providing additional insight through careful analysis of the algorithmic components constituting our method. Code is available at https://github.com/jihoontack/GradNCP	翻訳日:2023-10-26 03:20:48 公開日:2023-10-24
# グラフニューラルネットワークのゼロワン法則 Zero-One Laws of Graph Neural Networks ( http://arxiv.org/abs/2301.13060v5 ) ライセンス: Link先を確認	Sam Adam-Day, Theodor Mihai Iliant, İsmail İlkan Ceylan	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ上の機械学習のためのデファクト標準ディープラーニングアーキテクチャである。これにより、これらのモデルの能力と限界、特にそれらの表現と外挿能力に関する多くの作業が分析された。グラフノードの数が非常に大きくなるにつれて、GNNはどのように振る舞うのか? 穏やかな仮定の下では、Erdős-Rényi モデルから増大するグラフを描くと、そのようなグラフがGNN分類器のクラスによって特定の出力にマップされる確率は 0 または 1 の傾向を示す。このクラスは一般的なグラフ畳み込みネットワークアーキテクチャを含んでいる。その結果、これらのGNNに対して「ゼロワン法則」を確立し、他の収束法則と類似して、その能力に関する理論的制限を課す。理論的な漸近限界は、比較的小さなグラフ上で既に明らかなものであることを観察し、実験的に検証した。 Graph neural networks (GNNs) are the de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs, by answering the question: how do GNNs behave as the number of graph nodes becomes very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erdős-Rényi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or to one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs, and analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are evident already on relatively small graphs.	翻訳日:2023-10-26 03:20:28 公開日:2023-10-24
# グラフニューラルネットワークは、グラフ構造のみから隠れた特徴を回復できる Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure ( http://arxiv.org/abs/2301.10956v3 ) ライセンス: Link先を確認	Ryoma Sato	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ学習問題の一般的なモデルである。 gnnは多くの実用的なタスクで強い経験的パフォーマンスを示します。しかし、理論的な性質は完全に解明されていない。本稿では,GNNの表現力の観点から,GNNがグラフ構造を活用できるかどうかを検討する。本分析では,グラフ構造に関するすべての情報を含む隠れノード特徴(あるいは潜在ノード特徴)によって制御されるグラフ生成プロセスについて考察する。このフレームワークの典型的な例は、隠れた特徴から構築されたkNNグラフである。本研究の主目的は,隠れた特徴自身や間接的なヒントを含むすべてのノード特徴が利用できない場合でも,GNNが入力グラフのみから隠れたノード特徴を復元できることである。 gnnは、ダウンストリームタスクで回復したノード機能をさらに使用できる。これらの結果から、GNNはグラフ構造を自分自身で完全に活用でき、事実上、GNNは下流タスクに隠されたノード機能と明示的なノード機能の両方を利用することができる。実験では,理論解析に基づいて構築されたGNNアーキテクチャを用いて,GNNが隠れた特徴を正確に復元できることを示し,その妥当性を確認した。 Graph Neural Networks (GNNs) are popular models for graph learning problems. GNNs show strong empirical performance in many practical tasks. However, the theoretical properties have not been completely elucidated. In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs. In our analysis, we consider graph generation processes that are controlled by hidden (or latent) node features, which contain all information about the graph structure. A typical example of this framework is kNN graphs constructed from the hidden features. In our main results, we show that GNNs can recover the hidden node features from the input graph alone, even when all node features, including the hidden features themselves and any indirect hints, are unavailable. GNNs can further use the recovered node features for downstream tasks. These results show that GNNs can fully exploit the graph structure by themselves, and in effect, GNNs can use both the hidden and explicit node features for downstream tasks. In the experiments, we confirm the validity of our results by showing that GNNs can accurately recover the hidden features using a GNN architecture built based on our theoretical analysis.	
翻訳日:2023-10-26 03:19:26 公開日:2023-10-24
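The kNN-graph example mentioned in the abstract above can be made concrete with a short sketch. It only shows how a graph might be generated from hidden node features; the function name `knn_graph`, the Euclidean metric, and the directed-edge convention are assumptions for illustration, and the GNN that recovers the features is not reproduced here.

```python
import math


def knn_graph(features, k):
    """Return directed edges (i, j) where j is one of the k nearest neighbours of i."""
    edges = set()
    for i, fi in enumerate(features):
        # Sort all other nodes by Euclidean distance to node i.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            edges.add((i, j))
    return edges
```

In the setting the abstract describes, a learner would see only the edge set returned by `knn_graph`, never the `features` themselves, and the question is whether those features can be recovered from the edges alone.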
# Batch Prompting: 大規模言語モデルAPIによる効率的な推論 Batch Prompting: Efficient Inference with Large Language Model APIs ( http://arxiv.org/abs/2301.08721v2 ) ライセンス: Link先を確認	Zhoujun Cheng, Jungo Kasai, Tao Yu	(参考訳) 大規模言語モデル(LLM)を用いた大量のサンプルに対する推論は、産業や実世界の利用において計算的かつ経済的にコストがかかる可能性がある。我々は,LLMが1回に1つのサンプルではなく,バッチで推論を実行できるようにする,シンプルで効果的なプロンプト手法であるバッチプロンプトを提案する。ダウンストリーム性能を維持しながらトークンと時間の両方のコストを削減する。理論的には、数ショットのコンテキスト内学習環境では、各バッチのサンプル数とともに、推論コストはほぼ線形に減少する。バッチプロンプトが著しく(最大で6つのサンプルで5倍)、LLM(Codex)推論トークンと時間コストが削減され、性能が向上または同等になる。 GPT-3.5 や GPT-4 のような最先端の Chat ベースの LLM では、バッチプロンプトの利点も保たれている。さらに分析した結果、各バッチ内のサンプル数とタスクの複雑さがパフォーマンスに影響することがわかった。さらに、バッチプロンプトはLLMを用いて異なる推論方法に適用できる。私たちのコードはhttps://github.com/xlang-ai/batch-promptingのサイトにある。 Performing inference on large volumes of samples with large language models (LLMs) can be computationally and financially costly in industry and real-world use. We propose batch prompting, a simple yet effective prompting approach that enables the LLM to run inference in batches, instead of one sample at a time. Our method reduces both token and time costs while retaining downstream performance. We theoretically demonstrate that under a few-shot in-context learning setting, the inference costs decrease almost inversely linearly with the number of samples in each batch. We extensively validate the effectiveness of batch prompting on ten datasets across commonsense QA, arithmetic reasoning, and NLI/NLU: batch prompting significantly reduces (by up to 5x with six samples per batch) the LLM (Codex) inference token and time costs while achieving better or comparable performance. For state-of-the-art Chat-based LLMs, e.g., GPT-3.5 and GPT-4, we show the benefits of batch prompting also hold. Further analysis shows that the number of samples in each batch and the complexity of tasks affect its performance. Moreover, batch prompting can be applied across different reasoning methods using LLMs. Our code can be found at the site https://github.com/xlang-ai/batch-prompting.	
翻訳日:2023-10-26 03:19:08 公開日:2023-10-24
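The idea of answering several samples in one call can be sketched as a prompt builder plus a response parser. The indexed `Q[i]` / `A[i]` layout and the helper names `build_batch_prompt` and `parse_batch_response` are assumptions made for this illustration; no LLM API is called, and the authors' repository linked above should be consulted for their actual prompt format.

```python
import re


def build_batch_prompt(demos, questions):
    """Build one prompt from few-shot (question, answer) demos and a batch of new questions."""
    lines = []
    # Few-shot demonstrations, grouped as a batch: all questions first, then all answers.
    for i, (question, _) in enumerate(demos, start=1):
        lines.append(f"Q[{i}]: {question}")
    for i, (_, answer) in enumerate(demos, start=1):
        lines.append(f"A[{i}]: {answer}")
    lines.append("")
    # The new samples to be answered together in a single model call.
    for i, question in enumerate(questions, start=1):
        lines.append(f"Q[{i}]: {question}")
    return "\n".join(lines)


def parse_batch_response(text, n):
    """Map each 'A[i]: ...' line back to sample i; missing answers become None."""
    found = {
        int(m.group(1)): m.group(2).strip()
        for m in re.finditer(r"^A\[(\d+)\]:\s*(.*)$", text, flags=re.MULTILINE)
    }
    return [found.get(i) for i in range(1, n + 1)]
```

Because the few-shot demonstrations are sent once per batch rather than once per sample, the per-sample prompt cost shrinks as the batch grows, which is the intuition behind the cost reduction the abstract reports. The parser returns `None` for any index the model skipped, so callers can retry those samples individually.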
# FENDI:量子インターネットにおける高密度エンタングルメント分布を目指して FENDI: Toward High-Fidelity Entanglement Distribution in the Quantum Internet ( http://arxiv.org/abs/2301.08269v3 ) ライセンス: Link先を確認	Huayue Gu, Zhouyu Li, Ruozhou Yu, Xiaojian Wang, Fangtong Zhou, Jianqing Liu, Guoliang Xue	(参考訳) 量子ネットワークは、遠隔ノード間で量子の絡み合いを分散させ、セキュアな通信、量子センシング、分散量子コンピューティングにおける多くの応用の鍵となる。本稿では,マルチホップ量子リピータネットワークにおけるスループットと絡み合い分布の質のトレードオフについて検討する。エンタングルメント分布率(EDR)および/またはエンタングルメント忠実度をヒューリスティックに最大化することを目的とした既存の研究と比較して、我々のゴールは、任意の量子ノード間の最大到達可能なEDRの上限を満たしつつ、最大到達可能な最悪のケース忠実度を特徴づけることである。この特徴付けは、量子ネットワークの達成可能な性能領域の基本的な境界を提供し、量子ネットワークトポロジー、プロトコル、アプリケーションの設計を支援する。しかし、そのタスクは非常に非自明であり、証明する限りNPハードである。我々の主な貢献は、達成可能な最悪のケースの忠実度を厳密なEDR境界に近似する完全多項式時間近似スキームであり、最適忠実度非依存なEDR最適化と最悪のケース等方性雑音モデルを組み合わせたものである。 EDRとフィデリティ保証は、量子メモリを備えたポストセレクション・アンド・ストレージプロトコルによって実装できる。離散時間量子ネットワークシミュレータを開発することで,ネットワークの特徴的な性能領域(近似パレートフロンティア)を示すシミュレーションを行い,既存のプロトコルが実質的なギャップを示す一方で,設計プロトコルが性能領域を達成できることを実証する。 A quantum network distributes quantum entanglements between remote nodes, and is key to many applications in secure communication, quantum sensing and distributed quantum computing. This paper explores the fundamental trade-off between the throughput and the quality of entanglement distribution in a multi-hop quantum repeater network. Compared to existing work which aims to heuristically maximize the entanglement distribution rate (EDR) and/or entanglement fidelity, our goal is to characterize the maximum achievable worst-case fidelity, while satisfying a bound on the maximum achievable expected EDR between an arbitrary pair of quantum nodes. This characterization will provide fundamental bounds on the achievable performance region of a quantum network, which can assist with the design of quantum network topology, protocols and applications. However, the task is highly non-trivial and is NP-hard as we shall prove. 
Our main contribution is a fully polynomial-time approximation scheme to approximate the achievable worst-case fidelity subject to a strict expected EDR bound, combining an optimal fidelity-agnostic EDR-maximizing formulation and a worst-case isotropic noise model. The EDR and fidelity guarantees can be implemented by a post-selection-and-storage protocol with quantum memories. By developing a discrete-time quantum network simulator, we conduct simulations to show the characterized performance region (the approximate Pareto frontier) of a network, and demonstrate that the designed protocol can achieve the performance region while existing protocols exhibit a substantial gap.	翻訳日:2023-10-26 03:18:42 公開日:2023-10-24
# 超伝導回路上の時間最適ユニバーサル量子ゲート Time-optimal universal quantum gates on superconducting circuits ( http://arxiv.org/abs/2301.03334v2 ) ライセンス: Link先を確認	Ze Li, Ming-Jie Liang, Zheng-Yuan Xue	(参考訳) 量子系を操作する場合、デコヒーレンスは避けられない。量子操作の質を低下させるため、高忠実度量子ゲートを必要とする大規模量子計算の主要な障害の1つである。一般的に、ゲート操作が長ければ長いほど、デコヒーレンスによって引き起こされるゲートの不完全性が増す。したがって、ゲート時間を短くする方法は、解決すべき緊急の問題となる。この目的のために、量子ブラヒストローネ方程式の解法に基づく時間最適制御は簡単な解である。本稿では,2次元正方格子配置の超伝導量子ビット上での普遍量子ゲートを実現する手法を提案し,2量子ビットゲートの忠実度は99.9%に近づく。一方、外部駆動の変形を調整することにより、Z軸ゲートをかなり加速させることができる。最後に,デフォーカスエラーの影響を低減するために,デコヒーレンスフリーな部分空間符号化も実装に取り入れた。そこで我々は,大規模量子計算に期待できる高速量子スキームを提案する。 Decoherence is inevitable when manipulating quantum systems. It decreases the quality of quantum manipulations and thus is one of the main obstacles for large-scale quantum computation, where high-fidelity quantum gates are needed. Generally, the longer a gate operation is, the greater the decoherence-induced gate infidelity will be. Therefore, how to shorten the gate time becomes an urgent problem to be solved. To this end, time-optimal control based on solving the quantum brachistochrone equation is a straightforward solution. Here, based on time-optimal control, we propose a scheme to realize universal quantum gates on superconducting qubits in a two-dimensional square lattice configuration, and the two-qubit gate fidelity approaches 99.9%. Meanwhile, we can further accelerate the Z-axis gate considerably by adjusting the detuning of the external driving. Finally, in order to reduce the influence of the dephasing error, decoherence-free subspace encoding is also incorporated in our physical implementation. Therefore, we present a fast quantum scheme which is promising for large-scale quantum computation.	翻訳日:2023-10-26 03:18:11 公開日:2023-10-24
# cosyn:コンテキスト同期双曲ネットワークを用いたオンライン会話における暗黙的ヘイトスピーチの検出 CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network ( http://arxiv.org/abs/2303.03387v3 ) ライセンス: Link先を確認	Sreyan Ghosh and Manan Suri and Purva Chiniya and Utkarsh Tyagi and Sonal Kumar and Dinesh Manocha	(参考訳) オンラインの会話で交流するソーシャルメディア利用者の急増はヘイトスピーチを著しく増加させ、様々な人口層からの影響を受けている。先行研究のほとんどが、暗黙のヘイトスピーチの検出や間接言語やコード化された言語によるヘイトスピーチの検出に重点を置いて、ヘイトフルなフレーズを活用している、明示的なヘイトスピーチの検出に重点を置いている。本稿では,オンライン会話における暗黙のヘイトスピーチを検出するために,ユーザと会話のコンテキストを明示的に組み込んだ,コンテキストシナージュ型ニューラルネットワークCoSynを提案する。 cosyn氏は、これらの外部コンテキストをエンコードする新しい方法を紹介し、それらの間の相互作用を明確に捉える新しいコンテキストインタラクションメカニズムを採用し、これらのノイズの多いコンテキストから取得すべき情報量について独立的に評価する。さらに、ソーシャルメディアのスケールフリーなダイナミクスを考慮するために、双曲空間でこれらすべての操作を実行する。我々は6つのヘイトスピーチデータセットに対するCoSynの有効性を実証し、CoSynが1.24%から57.8%の範囲で絶対的な改善を施した暗黙のヘイトスピーチの検出において、すべてのベースラインを上回っていることを示す。 The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded language. In this paper, we present CoSyn, a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. CoSyn introduces novel ways to encode these external contexts and employs a novel context interaction mechanism that clearly captures the interplay between them, making independent assessments of the amounts of information to be retrieved from these noisy contexts. Additionally, it carries out all these operations in the hyperbolic space to account for the scale-free dynamics of social media. 
We demonstrate the effectiveness of CoSyn on 6 hate speech datasets and show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.	翻訳日:2023-10-26 01:35:41 公開日:2023-10-24
# 欧州連合における政治広告の透明性向上法についての一考察 A Note on the Proposed Law for Improving the Transparency of Political Advertising in the European Union ( http://arxiv.org/abs/2303.02863v4 ) ライセンス: Link先を確認	Jukka Ruohonen	(参考訳) 世界中で政治広告の供給と需要が高まっている。同時に、外国政府や他の悪役による選挙妨害のような社会的な脅威は、多くの民主政治において迫る懸念となっている。さらに、外国軍や国内軍による選挙結果の操作は、基本的権利を心配している多くの市民の関心事であり続けている。この目的のために、欧州連合(EU)はこの問題に取り組むためのいくつかの取り組みを開始した。 2020年には、政治広告の透明性を高めるための新しい規制が提案された。この短い解説は提案された規制を見直し、その制限と潜在的な影響についていくつかの点を提起する。 There is an increasing supply and demand for political advertising throughout the world. At the same time, societal threats, such as election interference by foreign governments and other bad actors, continues to be a pressing concern in many democracies. Furthermore, manipulation of electoral outcomes, whether by foreign or domestic forces, continues to be a concern of many citizens who are also worried about their fundamental rights. To these ends, the European Union (EU) has launched several initiatives for tackling the issues. A new regulation was proposed in 2020 also for improving the transparency of political advertising in the union. This short commentary reviews the regulation proposed and raises a few points about its limitations and potential impacts.	翻訳日:2023-10-26 01:35:17 公開日:2023-10-24
# 大規模言語モデルによるゼロショットクロスリンガル要約 Zero-Shot Cross-Lingual Summarization via Large Language Models ( http://arxiv.org/abs/2302.14229v4 ) ライセンス: Link先を確認	Jiaan Wang, Yunlong Liang, Fandong Meng, Beiqi Zou, Zhixu Li, Jianfeng Qu, Jie Zhou	(参考訳) ソース言語の文書が与えられた場合、言語間要約(CLS)は異なるターゲット言語で要約を生成することを目的としている。近年, GPT-3.5, ChatGPT, GPT-4 などの大規模言語モデル (LLM) の出現は, 計算言語学コミュニティから広く注目を集めている。しかし、CLSにおけるLLMの性能は未だ分かっていない。本稿では,異なるパラダイム(エンド・ツー・エンドおよびパイプライン)からLLMにゼロショットCLSを実行させるための様々なプロンプトを実証的に使用し,生成された要約の予備評価を行う。 ChatGPT と GPT-4 は,もともと詳細な情報を持つ長い要約を生成する傾向があることがわかった。これらの2つのLLMは、対話的なプロンプトの助けを借りて、情報量と簡潔さのバランスを更に取ることができ、CLSの性能が大幅に向上する。 3つの広く使用されているCLSデータセットによる実験結果から、GPT-4は最先端のゼロショットCLS性能を達成し、ファインチューニングされたmBART-50と比べても競争力のある性能を発揮することが示された。さらに,一部の多言語およびバイリンガルLLM(BLOOMZ,ChatGLM-6B,Vicuna-13B,ChatYuan)はゼロショットCLS能力に制限があることもわかった。要約と翻訳を同時に行うモデルを必要とするCLSの複合的な性質のため、ゼロショット方式でこのタスクを実現することは、LLMにとっても課題である。したがって、今後のLLM研究がCLSをテストベッドとして利用することを心から願い、推奨する。 Given a document in a source language, cross-lingual summarization (CLS) aims to generate a summary in a different target language. Recently, the emergence of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has attracted wide attention from the computational linguistics community. However, the performance of LLMs on CLS is not yet known. In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms (i.e., end-to-end and pipeline), and provide a preliminary evaluation on the generated summaries. We find that ChatGPT and GPT-4 originally prefer to produce lengthy summaries with detailed information. These two LLMs can further balance informativeness and conciseness with the help of an interactive prompt, significantly improving their CLS performance. Experimental results on three widely-used CLS datasets show that GPT-4 achieves state-of-the-art zero-shot CLS performance, and performs competitively compared with the fine-tuned mBART-50. 
Moreover, we also find some multi-lingual and bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited zero-shot CLS ability. Due to the composite nature of CLS, which requires models to perform summarization and translation simultaneously, accomplishing this task in a zero-shot manner is even a challenge for LLMs. Therefore, we sincerely hope and recommend future LLM research could use CLS as a testbed.	翻訳日:2023-10-26 01:35:07 公開日:2023-10-24
# 深層ニューラルネットワークにおける早期トレーニングダイナミクスの位相図:学習率,深さ,幅の影響 Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width ( http://arxiv.org/abs/2302.12250v2 ) ライセンス: Link先を確認	Dayal Singh Kalra and Maissam Barkeshli	(参考訳) 確率勾配降下法(SGD)で訓練したディープニューラルネットワーク(DNN)の最適化ダイナミクスを系統的に解析し,ニューラルネットワークの学習率$\eta$,深さ$d$,幅$w$の効果について検討した。損失ランドスケープの鋭さ(シャープネス)の尺度である損失のヘシアンの最大固有値 $\lambda^H_t$ を解析することにより、ダイナミクスが4つの異なる体制を示しうることを見出した。 (i)早期の過渡的な体制、 (ii)中間飽和体制、 (iii)漸進的な鋭化(progressive sharpening)体制、 (iv)後期の「安定性の端(edge of stability)」体制。初期と中間の体制 (i)および(ii)は、$\eta \equiv c / \lambda_0^H$, $d$, $w$ に依存する豊富な位相図を示す。トレーニング損失とシャープネスの初期ダイナミクスにおいて定性的に異なる現象を分離する、$c$のいくつかの臨界値を同定した。特に、$d$ と $1/w$ が増加するにつれて、シャープネスが早い段階で減少する「シャープネス低減(sharpness reduction)」相が開くことを見出した。 We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H$, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a "sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.	翻訳日:2023-10-26 01:34:38 公開日:2023-10-24
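The abstract above parameterizes the learning rate as $\eta \equiv c / \lambda_0^H$, where $\lambda_0^H$ is the largest Hessian eigenvalue (sharpness) at initialization. The following is a minimal sketch of that relation, not the authors' code: it assumes a toy quadratic loss whose Hessian is the fixed matrix `A`, and it uses plain power iteration, which is one common way to estimate a top eigenvalue from Hessian-vector products. The names `power_iteration`, `hessian_vector_product`, and `learning_rate` are illustrative.

```python
import math


def power_iteration(matvec, dim, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD operator.

    `matvec` maps a vector (list of floats) to the operator applied to it.
    """
    v = [1.0 / math.sqrt(dim)] * dim  # normalized starting vector
    eigenvalue = 0.0
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        hv = matvec(v)
        eigenvalue = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient
    return eigenvalue


# Assumption: toy loss L(theta) = 0.5 * theta^T A theta, so the Hessian is A.
A = [[4.0, 1.0], [1.0, 3.0]]


def hessian_vector_product(v):
    """Multiply the toy Hessian A by the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def learning_rate(c, sharpness0):
    """Learning rate eta = c / lambda_0^H for a chosen constant c."""
    return c / sharpness0


sharpness0 = power_iteration(hessian_vector_product, dim=2)
eta = learning_rate(c=2.0, sharpness0=sharpness0)
```

For a real network the Hessian is never formed explicitly; `hessian_vector_product` would be replaced by an automatic-differentiation routine, which this sketch does not cover.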
# 共変量子組合せ論とゼロエラー通信への応用 Covariant quantum combinatorics with applications to zero-error communication ( http://arxiv.org/abs/2302.07776v2 ) ライセンス: Link先を確認	Dominic Verdon	(参考訳) すべての系(有限次元$C^*$-代数)がコンパクト量子群$G$の作用を持ち、すべてのチャネル(標準的な$G$-不変状態を保存する完全正写像)が$G$-作用に関して共変であるような有限次元の共変設定において、量子(非可換)関係と量子(非可換)グラフの理論を開発する。我々は、対称性制約を持つゼロエラー量子通信理論への応用によって定義を動機付ける。主な結果は以下の通りである。 1) 共変量子関係が共変チャネルの基底関係となるための必要十分条件を与える。 2) $G$-作用を持つすべての量子混同可能性グラフ(これを量子$G$-グラフと呼ぶ)が、共変チャネルの混同可能性グラフとして現れることを示す。 3) 共変チャネルが可逆であるのは、その混同可能性$G$-グラフが離散的であるとき、かつそのときに限ることを示す。 4) $G$が準三角である場合(これはすべてのコンパクト群を含む)、共変ゼロエラー情報源-通信路符号化スキームは、混同可能性$G$-グラフ間の共変準同型によって分類されることを示す。 We develop the theory of quantum (a.k.a. noncommutative) relations and quantum (a.k.a. noncommutative) graphs in the finite-dimensional covariant setting, where all systems (finite-dimensional $C^*$-algebras) carry an action of a compact quantum group $G$, and all channels (completely positive maps preserving the canonical $G$-invariant state) are covariant with respect to the $G$-actions. We motivate our definitions by applications to zero-error quantum communication theory with a symmetry constraint. Some key results are the following: 1) We give a necessary and sufficient condition for a covariant quantum relation to be the underlying relation of a covariant channel. 2) We show that every quantum confusability graph with a $G$-action (which we call a quantum $G$-graph) arises as the confusability graph of a covariant channel. 3) We show that a covariant channel is reversible precisely when its confusability $G$-graph is discrete. 4) When $G$ is quasitriangular (this includes all compact groups), we show that covariant zero-error source-channel coding schemes are classified by covariant homomorphisms between confusability $G$-graphs.	翻訳日:2023-10-26 01:34:15 公開日:2023-10-24
# 近位ニュートンによる効率的なグラフラプラシアン推定 Efficient Graph Laplacian Estimation by Proximal Newton ( http://arxiv.org/abs/2302.06434v2 ) ライセンス: Link先を確認	Yakov Medvedovsky, Eran Treister, Tirza Routtenberg	(参考訳) Laplacian-Constrained Gaussian Markov Random Field (LGMRF) は、与えられたデータから重み付きスパース依存グラフを学ぶための一般的な多変量統計モデルである。このグラフ学習問題は、ラプラシア構造制約を受ける精度行列の最大極大推定(MLE)として、スパース性誘導ペナルティ項で定式化することができる。本稿では,この学習問題を正確かつ効率的に解くことを目的とする。まず、一般的な$\ell_1$-normのペナルティは、この設定では不適切であり、完全なグラフにつながる可能性があるため、推定バイアスの低いスパース解を促進する非凸ミニマックスペナルティ(MCP)を用いる。第二に, 既存の一階法とは対照的に, 共役勾配, プリコンディショニング, およびアクティブ/フリー集合への分割といったアルゴリズム的特徴を活かし, 効率的な解法を得るための二階間近ニュートン法を開発した。数値実験により,既存の手法と比較して計算複雑性とグラフ学習精度の両方において,提案手法の利点が示された。 The Laplacian-constrained Gaussian Markov Random Field (LGMRF) is a common multivariate statistical model for learning a weighted sparse dependency graph from given data. This graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix, subject to Laplacian structural constraints, with a sparsity-inducing penalty term. This paper aims to solve this learning problem accurately and efficiently. First, since the commonly used $\ell_1$-norm penalty is inappropriate in this setting and may lead to a complete graph, we employ the nonconvex minimax concave penalty (MCP), which promotes sparse solutions with lower estimation bias. Second, as opposed to existing first-order methods for this problem, we develop a second-order proximal Newton approach to obtain an efficient solver, utilizing several algorithmic features, such as using Conjugate Gradients, preconditioning, and splitting to active/free sets. Numerical experiments demonstrate the advantages of the proposed method in terms of both computational complexity and graph learning accuracy compared to existing methods.	翻訳日:2023-10-26 01:33:06 公開日:2023-10-24
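The abstract above contrasts the $\ell_1$-norm penalty with the minimax concave penalty (MCP). As a small illustration, the sketch below evaluates the standard scalar MCP, $\lambda|x| - x^2/(2\gamma)$ for $|x| \le \gamma\lambda$ and $\gamma\lambda^2/2$ otherwise, next to the $\ell_1$ penalty. The function names and the parameter values are assumptions made for this example; the paper's solver and its exact parameterization are not reproduced here.

```python
def mcp_penalty(x, lam, gamma):
    """Scalar minimax concave penalty with threshold lam and concavity gamma > 0."""
    ax = abs(x)
    if ax <= gamma * lam:
        # Quadratic taper: behaves like lam * |x| near zero, flattens out at gamma * lam.
        return lam * ax - ax * ax / (2.0 * gamma)
    # Constant beyond gamma * lam, so large weights receive no additional penalty.
    return 0.5 * gamma * lam * lam


def l1_penalty(x, lam):
    """Scalar l1 penalty, which keeps growing linearly with |x|."""
    return lam * abs(x)


# Hypothetical edge weights, used only to compare the two penalties.
weights = [0.0, 0.5, 2.0, 10.0]
comparison = [(w, l1_penalty(w, 1.0), mcp_penalty(w, 1.0, 2.0)) for w in weights]
```

Because the MCP saturates for large arguments, strong edges are shrunk less than under the $\ell_1$ penalty, which is consistent with the lower estimation bias mentioned in the abstract.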
# ディープ・パーセプチュアル・ロス・ネットワークの系統的性能解析--Breaking Transfer Learning Conventions- A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions ( http://arxiv.org/abs/2302.04032v2 ) ライセンス: Link先を確認	Gustav Grund Pihlgren, Konstantina Nikolaidou, Prakash Chandra Chhipa, Nosheen Abid, Rajkumar Saini, Fredrik Sandin, Marcus Liwicki	(参考訳) ディープ・パーセプチュアル・ロス(deep perceptual loss)は、ニューラルネットワークから抽出された深い特徴を用いて人間の知覚を模倣することを目的としたコンピュータビジョンにおける損失関数の一種である。近年,画像合成やセグメンテーション,奥行き予測など,画像や画像ライクなアウトプットを持つタスクにおいて,興味深いコンピュータビジョンタスクのホストに対して大きな効果が与えられている。この手法の多くのアプリケーションは事前訓練されたネットワーク(しばしば畳み込みネットワーク)を損失計算に利用する。関心が高まり、広く使われるようになったにも拘わらず、深い知覚的損失を計算するためにどのネットワークを使うか、どの層から特徴を抽出するかを探索するにはより多くの努力が必要である。本研究の目的は,既存の4つの重度知覚喪失例において,多種多様な特徴抽出点に対して,広く利用され,容易に利用できる事前学習ネットワークのホストを体系的に評価することである。知覚的類似性,超解像,画像分割,次元化のユースケースをベンチマークにより評価した。ベンチマークは、選択したネットワークと抽出ポイントを評価する以前の作業の実装である。ベンチマークのパフォーマンスとネットワークの属性と抽出ポイントは、詳細な分析の基盤として使用される。この分析は、どのアーキテクチャが深い知覚損失に対して優れたパフォーマンスを提供するのか、特定のタスクやデータセットの適切な抽出ポイントをどのように選択するかに関する洞察を明らかにする。さらに本研究は, 深い知覚喪失に対する結果の意義と, 転校学習の幅広い分野について論じる。その結果, 転校学習における2つの慣例から深い知覚損失が逸脱し, それらの規則がより深い分析を必要とすることが示唆された。 Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using the deep features extracted from neural networks. In recent years, the method has been applied to great effect on a host of interesting computer vision tasks, especially for tasks with image or image-like outputs, such as image synthesis, segmentation, depth prediction, and more. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more effort is needed toward exploring which networks to use for calculating deep perceptual loss and from which layers to extract the features. 
This work aims to rectify this by systematically evaluating a host of commonly used and readily available, pretrained networks for a number of different feature extraction points on four existing use cases of deep perceptual loss. The use cases of perceptual similarity, super-resolution, image segmentation, and dimensionality reduction, are evaluated through benchmarks. The benchmarks are implementations of previous works where the selected networks and extraction points are evaluated. The performance on the benchmarks, and attributes of the networks and extraction points are then used as a basis for an in-depth analysis. This analysis uncovers insight regarding which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results show that deep perceptual loss deviates from two commonly held conventions in transfer learning, which suggests that those conventions are in need of deeper analysis.	翻訳日:2023-10-26 01:32:29 公開日:2023-10-24
# 順序付けによる規則の執行 Rule Enforcing Through Ordering ( http://arxiv.org/abs/2303.17971v2 ) ライセンス: Link先を確認	David Sychrovský, Sameer Desai, Martin Loebl	(参考訳) 大都市の軽微な交通違反のような現実の多くの状況では、中央機関は多数の個人に対して定期的に罰を科す任務を負っている。一般的な慣行は、各個人に、より少額の罰金を支払うことで、かなり重い罰を伴う可能性が高い法的手続きを確実に回避する機会を与えることである。しかし、違反者が多く中央機関の処理能力が限られているため、個人のリスクは通常小さく、合理的な個人は罰金を支払うことを選択しない。本稿では、中央機関が違反者を公開された順序で処理すれば、違反者に罰金を支払うよう適切にインセンティブを与えられることを示す。我々は、このメカニズムが非協力を促進し、個人に支払いのインセンティブを与えることを、解析的に、および現実的な実験によって示す。さらに、任意の連合についても同じことが成り立つ。我々は、中央機関が受け取る総支払額の期待値を定量化し、それが大幅に増加することを示す。 In many real world situations, like minor traffic offenses in big cities, a central authority is tasked with periodically administering punishments to a large number of individuals. Common practice is to give each individual a chance to suffer a smaller fine and be guaranteed to avoid the legal process with probable considerably larger punishment. However, thanks to the large number of offenders and a limited capacity of the central authority, the individual risk is typically small and a rational individual will not choose to pay the fine. Here we show that if the central authority processes the offenders in a publicly known order, it properly incentivizes the offenders to pay the fine. We show analytically and on realistic experiments that our mechanism promotes non-cooperation and incentivizes individuals to pay. Moreover, the same holds for an arbitrary coalition. We quantify the expected total payment the central authority receives, and show it increases considerably.	翻訳日:2023-10-26 01:25:56 公開日:2023-10-24
# 正面視のためのNeRFおよびニューラルビュー合成法の知覚的品質評価 Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views ( http://arxiv.org/abs/2303.15206v3 ) ライセンス: Link先を確認	Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli	(参考訳) ニューラルビュー合成(neural view synthesis, nvs)は、自由視点映像を合成する最も成功した手法の1つであり、撮像された画像の集合から高い忠実度を達成することができる。この成功は、PSNR、SSIM、LPIPSといった画像品質の指標を用いて、テストビューのセットで評価される、多くのバリエーションを生み出した。 nvsの手法がビデオ品質に対してどのように機能するかについては、研究が不足している。本研究は,NVSおよびNeRFの知覚的評価に関する最初の研究である。本研究では,制御された実験室環境で撮影されたシーンの2つのデータセットと,室内のシーンを収集した。既存のデータセットとは対照的に、これらのシーンには参照ビデオシーケンスがあり、静的画像のみを見る際に容易に見過ごされる時間的アーティファクトや微妙な歪みをテストできます。我々は,NVS法によって合成された映像の品質をよく制御された知覚品質評価実験で測定した。本稿では,nvs評価のためのデータセットとメトリック選択の結果と推奨結果の詳細な分析を行う。 Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.	翻訳日:2023-10-26 01:25:38 公開日:2023-10-24
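The abstract above names PSNR as one of the image quality metrics typically used to score synthesized test views. As a reminder of what that metric measures, here is a minimal sketch of the textbook definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, on flat lists of pixel values. The pixel data and function names are hypothetical, and the study's own evaluation pipeline is not reproduced here.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical inputs."""
    error = mse(reference, test)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical 8-bit pixel values for a reference view and a rendered view.
reference_pixels = [100, 100, 100, 100]
rendered_pixels = [110, 90, 110, 90]
score = psnr(reference_pixels, rendered_pixels)
```

PSNR is computed per frame, so on its own it cannot register the temporal artifacts that the reference video sequences in this study are meant to expose.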
# 医用画像解析におけるラベル有効深層学習の課題と今後の方向性 Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions ( http://arxiv.org/abs/2303.12484v2 ) ライセンス: Link先を確認	Cheng Jin, Zhengrui Guo, Yi Lin, Luyang Luo, Hao Chen	(参考訳) ディープラーニングは近年急速に成長し、幅広いアプリケーションで最先端のパフォーマンスを達成している。しかし、トレーニングモデルは通常、大量のラベル付きデータの高価で時間を要する。これは医療画像解析(MIA)の分野において特に当てはまり、データに制限があり、ラベルを取得するのに費用がかかる。これにより、ラベル付きデータとラベルなしデータと弱いラベル付きデータとを包括的に利用するためのラベル効率の高いディープラーニング手法が開発される。本調査では,最近300以上の論文を網羅的に調査し,MIAにおけるラベル効率学習戦略の最近の進歩を概観した。まず,ラベル効率の高い学習の背景を示し,そのアプローチを異なるスキームに分類する。次に、各スキームを通して現在の最先端手法を詳細に検討する。具体的には,カノニカルな半教師付き,自己教師付き,マルチインスタンスの学習スキームだけでなく,最近ではアクティブでアノテーション効率のよい学習戦略も紹介する。さらに, この分野への総合的な貢献として, 調査手法の共通点や特徴を解明するだけでなく, 現状の課題を詳細に分析し, 今後の研究への道のりを示唆する。 Deep learning has seen rapid growth in recent years and achieved state-of-the-art performance in a wide range of applications. However, training models typically requires expensive and time-consuming collection of large quantities of labeled data. This is particularly true within the scope of medical imaging analysis (MIA), where data are limited and labels are expensive to be acquired. Thus, label-efficient deep learning methods are developed to make comprehensive use of the labeled data as well as the abundance of unlabeled and weak-labeled data. In this survey, we extensively investigated over 300 recent papers to provide a comprehensive overview of recent progress on label-efficient learning strategies in MIA. We first present the background of label-efficient learning and categorize the approaches into different schemes. Next, we examine the current state-of-the-art methods in detail through each scheme. Specifically, we provide an in-depth investigation, covering not only canonical semi-supervised, self-supervised, and multi-instance learning schemes, but also recently emerged active and annotation-efficient learning strategies. 
Moreover, as a comprehensive contribution to the field, this survey not only elucidates the commonalities and unique features of the surveyed methods but also presents a detailed analysis of the current challenges in the field and suggests potential avenues for future research.	翻訳日:2023-10-26 01:25:04 公開日:2023-10-24
# 自律運転における3次元占有推定のための簡易フレームワーク A Simple Framework for 3D Occupancy Estimation in Autonomous Driving ( http://arxiv.org/abs/2303.10076v4 ) ライセンス: Link先を確認	Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya	(参考訳) 周囲の画像から3D占有率を推定するタスクは、Bird's Eye View (BEV) の認識の成功に続いて、自動運転分野におけるエキサイティングな発展である。このタスクは、運転環境の重要な3D特性を提供し、周囲空間の全体的な理解と認識を高める。本研究では,ネットワーク設計や最適化,評価などの3D占有率推定の重要要素を明らかにするために設計されたCNNベースのフレームワークである,3D占有率推定のためのシンプルなフレームワークを提案する。さらに, 単眼深度推定や3次元再構成など, 3次元占有推定と他の関連課題との関係について検討し, これは自律運転における3次元知覚研究を推進しうる。評価のために,現在の公開データセットに柔軟に適用できる占有評価基準を定義するための簡単なサンプリング戦略を提案する。さらに,深度推定メトリックの観点からベンチマークを確立し,DDADおよびnuScenesデータセット上で提案手法を単眼深度推定法と比較して,競合性能を達成した。関連するコードはhttps://github.com/GANWANSHUI/SimpleOccupancyで更新される。 The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. In this work, we present a simple framework for 3D occupancy estimation, which is a CNN-based framework designed to reveal several key factors for 3D occupancy estimation, such as network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction, which could advance the study of 3D perception in autonomous driving. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish the benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets and achieve competitive performance. The relevant code will be updated at https://github.com/GANWANSHUI/SimpleOccupancy.	翻訳日:2023-10-26 01:24:43 公開日:2023-10-24
# CoLT5: 条件付き計算によるより高速なロングレンジトランスフォーマー CoLT5: Faster Long-Range Transformers with Conditional Computation ( http://arxiv.org/abs/2303.09752v3 ) ライセンス: Link先を確認	Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai	(参考訳) 多くの自然言語処理タスクは、長い入力の恩恵を受けるが、長い文書をトランスフォーマーで処理するのは高価である。これは、注意機構の二次的な計算量だけでなく、フィードフォワード層と射影層をすべてのトークンに適用することにも起因する。しかし、特に長い文書では、すべてのトークンが等しく重要であるわけではない。本研究では,この直観に基づき,条件付き計算を用いて,フィードフォワード層とアテンション層の両方で重要なトークンにより多くのリソースを割り当てる長入力トランスフォーマーモデル CoLT5 を提案する。我々は、CoLT5がLongT5よりもはるかに高速なトレーニングと推論で強力な性能を達成し、長入力のSCROLLSベンチマークでSOTAを達成することを示す。さらに、CoLT5は、非常に長い入力を効果的かつ扱いやすい形で利用でき、64kまでの入力長で強い性能向上を示す。 Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive, not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.	翻訳日:2023-10-26 01:24:23 公開日:2023-10-24
# I Tag, You Tag, Everybody Tags! I Tag, You Tag, Everybody Tags! ( http://arxiv.org/abs/2303.06073v2 ) ライセンス: Link先を確認	Hazem Ibrahim, Rohail Asim, Matteo Varvello, Yasir Zaki	(参考訳) 位置タグは個人の持ち物を追跡するように設計されている。それでも、位置情報タグが人をストーカーするのに悪用されているという逸話もある。追跡は、例えばBluetoothとペアの電話でローカルに達成され、タグに近づいた位置レポート装置にピギーバックすることでリモートで達成される。本稿では,最も人気のある2つの位置情報タグ (apple の airtag と samsungの smarttag) の性能を,実生活のユースケースをエミュレートする目的で,遭遇したデバイスの数や種類を制御せず,多数の位置情報報告デバイスを含む制御実験によって検討する。どちらのタグも同様の性能を示しており、例えば、半径100m以内の約10分で55%の位置にある。両方のタグが同時にデプロイされ、半分の時間で同等の精度を達成する場合でも、位置タグによる正確な位置へのリアルタイムストーカーは実行不可能である。それにもかかわらず、被害者の正確な動きの半分は、1時間だけ遅れて正確にバックトラッキングできる(エラーは10m)。 Location tags are designed to track personal belongings. Nevertheless, there has been anecdotal evidence that location tags are also misused to stalk people. Tracking is achieved locally, e.g., via Bluetooth with a paired phone, and remotely, by piggybacking on location-reporting devices which come into proximity of a tag. This paper studies the performance of the two most popular location tags (Apple's AirTag and Samsung's SmartTag) through controlled experiments - with a known large distribution of location-reporting devices - as well as in-the-wild experiments - with no control on the number and kind of reporting devices encountered, thus emulating real-life use-cases. We find that both tags achieve similar performance, e.g., they are located 55% of the times in about 10 minutes within a 100 m radius. It follows that real time stalking to a precise location via location tags is impractical, even when both tags are concurrently deployed which achieves comparable accuracy in half the time. Nevertheless, half of a victim's exact movements can be backtracked accurately (10m error) with just a one-hour delay, which is still perilous information in the possession of a stalker.	翻訳日:2023-10-26 01:23:50 公開日:2023-10-24
# ChatGPTは優れたNLG評価器か? 予備的研究 Is ChatGPT a Good NLG Evaluator? A Preliminary Study ( http://arxiv.org/abs/2303.04048v3 ) ライセンス: Link先を確認	Jiaan Wang, Yunlong Liang, Fandong Meng, Zengkui Sun, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, Jie Zhou	(参考訳) 近年、ChatGPTの出現は、計算言語学コミュニティから広く注目を集めている。多くの先行研究により、ChatGPTは自動評価指標の観点で様々なNLPタスクにおいて顕著な性能を発揮することが示されている。しかし、ChatGPTが評価指標として機能する能力はまだ十分に検討されていない。自然言語生成(NLG)モデルの質を評価することは困難な作業であり、NLGの指標は人間の判断との相関が低いことで悪名高いことから、ChatGPTは優れたNLG評価指標であるのだろうかという疑問が生じる。本稿では,NLG指標としての信頼性を示すため,ChatGPT の予備的なメタ評価を行う。より詳しくは、ChatGPTを人間の評価者とみなし、タスク固有(例えば、要約)とアスペクト固有(例えば、関連性)の指示を与えて、ChatGPTにNLGモデルの生成結果を評価させる。我々は5つのNLGメタ評価データセット(要約、ストーリー生成、データ・トゥ・テキストタスクを含む)について実験を行った。実験の結果,ChatGPTは従来の自動評価指標と比較して,ほとんどの場合,人間の判断との相関が最先端あるいは競争力のある水準に達した。さらに,ChatGPT評価器の有効性は,メタ評価データセットの作成方法の影響を受けている可能性が示唆された。参照に大きく依存して作成され、そのためバイアスのあるメタ評価データセットに対して、ChatGPT評価器は効果を失う可能性がある。我々の予備研究が、汎用的で信頼性の高いNLGメトリックの出現を促すことを願っている。 Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously show poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). 
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For the meta-evaluation datasets which are created greatly depending on the reference and thus are biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.	翻訳日:2023-10-26 01:23:26 公開日:2023-10-24
# 次近接結合をもつHofstadter格子における光-物質相互作用 Light-Matter interactions in Hofstadter lattice with the next-nearest neighbor couplings ( http://arxiv.org/abs/2304.14580v2 ) ライセンス: Link先を確認	Jia-Qi Li, Zhao-Min Gao, Wen-Xiao Liu and Xin Wang	(参考訳) ホフスタッター格子のバルク領域に結合するエミッタの光-物質相互作用は、最近De Bernardisらによって研究された [D. De Bernardis, Z.-P. Cian, I. Carusotto, M. Hafezi, and P. Rabl, Phys. Rev. Lett. 126, 103603 (2021), https://link.aps.org/doi/10.1103/PhysRevLett.126.103603]。本研究では,次近接(NNN: next-nearest neighbor)結合をもつ拡張ホフスタッター格子における光-物質相互作用を提案する。標準ホフスタッター格子と比較して、NNN結合はミラー対称性を破り、エネルギーバンドは平坦ではなく、すなわち非ゼロの群速度をもつ分散的なものとなる。De Bernardisらの研究とは対照的に、2準位エミッタが拡張ホフスタッター格子のバルク領域と相互作用する場合、エミッタはもはやフラットバンドとのコヒーレント振動によって捕捉されず、光子を一方向に放射することができる。このカイラル機構は、破れたパリティ対称性に由来する。放射率とカイラリティはともにエミッタの結合位置によって周期的に変化する。これらの特徴はすべてフォトニック格子プラットフォーム上で実現でき、カイラル量子情報処理に応用される可能性がある。 The light-matter interactions for an emitter coupling to the bulk region of a Hofstadter lattice have recently been investigated by De Bernardis et al. [D. De Bernardis, Z.-P. Cian, I. Carusotto, M. Hafezi, and P. Rabl, Phys. Rev. Lett. 126, 103603 (2021), https://link.aps.org/doi/10.1103/PhysRevLett.126.103603]. We propose the light-matter interactions in an extended Hofstadter lattice with the next-nearest neighbor (NNN) couplings. Compared with the standard Hofstadter lattice, the NNN couplings break the mirror symmetry and the energy bands are not flat, i.e., dispersive with nonzero group velocity. In contrast to the study by De Bernardis et al., when a two-level emitter interacts with the bulk region of extended Hofstadter lattice, the emitter is no longer trapped by the coherent oscillations with the flat band, and can radiate photons unidirectionally. The chiral mechanism stems from the broken parity symmetry. Both the radiation rate and the chirality periodically change with the emitter's coupling position. 
All of those particular features can be realized on the photonic lattice platform and may find potential application in chiral quantum information processing.	翻訳日:2023-10-26 01:16:02 公開日:2023-10-24
# 生成モデルのための平均場ゲーム実験室 A mean-field games laboratory for generative modeling ( http://arxiv.org/abs/2304.13534v5 ) ライセンス: Link先を確認	Benjamin J. Zhang and Markos A. Katsoulakis	(参考訳) 生成モデルの説明,拡張,設計のための数学的枠組みとして,平均場ゲーム(MFG)の汎用性を実証する。生成フローでは、各粒子(生成サンプル)がその模擬経路上の損失関数を最小化するラグランジアン定式化が用いられる。しかし、この損失は他の粒子の経路に依存しており、粒子の集団間での競合につながっている。この競技の漸近的な行動は平均場ゲームをもたらす。我々は,MFGsと生成フローと,連続時間正規化フロー,スコアベース生成モデル(SGM),ワッサーシュタイン勾配フローなどの拡散とを関連づける。さらに,各生成モデルの数学的性質を,結合した前方-後方非線形偏微分方程式の組であるmfgの最適性条件を用いて検討する。 MFG最適条件によって記述される数学的構造は、生成フローの誘導バイアスを特定する。 SGMの数学的構造を解明し, ワッサーシュタイン勾配流のMFG定式化を導出し, 正規化流れの健全性と構造について検討する。アルゴリズムの観点から、最適条件は生成モデルの訓練を強化するためにハミルトン・ヤコビ・ベルマン正則化器(HJB)を生成する。特に,標準SGMよりも性能が向上したHJB正規化SGMを提案する。本稿では,本フレームワークをMFG実験室として紹介し,新たな実験方法と生成モデルの創出の場として機能する。 We demonstrate the versatility of mean-field games (MFGs) as a mathematical framework for explaining, enhancing, and designing generative models. In generative flows, a Lagrangian formulation is used where each particle (generated sample) aims to minimize a loss function over its simulated path. The loss, however, is dependent on the paths of other particles, which leads to a competition among the population of particles. The asymptotic behavior of this competition yields a mean-field game. We establish connections between MFGs and major classes of generative flows and diffusions including continuous-time normalizing flows, score-based generative models (SGM), and Wasserstein gradient flows. Furthermore, we study the mathematical properties of each generative model by studying their associated MFG's optimality condition, which is a set of coupled forward-backward nonlinear partial differential equations. The mathematical structure described by the MFG optimality conditions identifies the inductive biases of generative flows. We investigate the well-posedness and structure of normalizing flows, unravel the mathematical structure of SGMs, and derive a MFG formulation of Wasserstein gradient flows. 
From an algorithmic perspective, the optimality conditions yields Hamilton-Jacobi-Bellman (HJB) regularizers for enhanced training of generative models. In particular, we propose and demonstrate an HJB-regularized SGM with improved performance over standard SGMs. We present this framework as an MFG laboratory which serves as a platform for revealing new avenues of experimentation and invention of generative models.	翻訳日:2023-10-26 01:15:30 公開日:2023-10-24
# DiffTraj:拡散確率モデルによるGPS軌道生成 DiffTraj: Generating GPS Trajectory with Diffusion Probabilistic Model ( http://arxiv.org/abs/2304.11582v2 ) ライセンス: Link先を確認	Yuanshao Zhu, Yongchao Ye, Shiyao Zhang, Xiangyu Zhao, and James J.Q. Yu	(参考訳) GPS対応機器とデータ取得技術の広範囲な統合により、GPSトラジェクトリーデータの増加が加速し、時空間データマイニング研究の進歩が促進された。それにもかかわらず、GPSトラジェクトリには個人位置情報が含まれており、生データを扱う際に深刻なプライバシー上の懸念が生じる。この問題に対処するための有望なアプローチは、オリジナルのデータを生成されたプライバシフリーな代替手段に置き換える、トラジェクトリ生成である。軌道生成の可能性にもかかわらず、人間の行動の複雑な性質とその固有の確率特性は、高品質な軌道生成に挑戦する。本研究では,軌道生成のための時空間拡散確率モデル(DiffTraj)を提案する。このモデルは拡散モデルの生成能力と実際の軌道から導かれる時空間的特徴を効果的に組み合わせる。中心となる考え方は、逆軌道分解過程を通じて白いノイズから地理的軌跡を再構成し、合成することである。さらに、条件情報を埋め込んだトラジェクトリUNet(Traj-UNet)ディープニューラルネットワークを提案し、逆処理中のノイズレベルを正確に推定する。 2つの実世界のデータセットの実験により、DiffTrajは元の分布を保持しながら高忠実な軌道を生成するために直感的に適用可能であることが示された。さらに, 生成した結果は下流経路解析タスクをサポートし, 地理的分布評価の点で他の手法を著しく上回っている。 Pervasive integration of GPS-enabled devices and data acquisition technologies has led to an exponential increase in GPS trajectory data, fostering advancements in spatial-temporal data mining research. Nonetheless, GPS trajectories contain personal geolocation information, rendering serious privacy concerns when working with raw data. A promising approach to address this issue is trajectory generation, which involves replacing original data with generated, privacy-free alternatives. Despite the potential of trajectory generation, the complex nature of human behavior and its inherent stochastic characteristics pose challenges in generating high-quality trajectories. In this work, we propose a spatial-temporal diffusion probabilistic model for trajectory generation (DiffTraj). This model effectively combines the generative abilities of diffusion models with the spatial-temporal features derived from real trajectories. The core idea is to reconstruct and synthesize geographic trajectories from white noise through a reverse trajectory denoising process. 
Furthermore, we propose a Trajectory UNet (Traj-UNet) deep neural network to embed conditional information and accurately estimate noise levels during the reverse process. Experiments on two real-world datasets show that DiffTraj can be intuitively applied to generate high-fidelity trajectories while retaining the original distributions. Moreover, the generated results can support downstream trajectory analysis tasks and significantly outperform other methods in terms of geo-distribution evaluations.	翻訳日:2023-10-26 01:15:05 公開日:2023-10-24
# 量子特異値変換を用いたハミルトンシミュレーション:複雑性解析と線形vlasov-poisson方程式への応用 Hamiltonian simulation using quantum singular value transformation: complexity analysis and application to the linearized Vlasov-Poisson equation ( http://arxiv.org/abs/2304.08937v2 ) ライセンス: Link先を確認	Kiichiro Toyoizumi, Naoki Yamamoto, Kazuo Hoshino	(参考訳) 量子コンピューティングは物理系のシミュレーション時間(より正確にはアルゴリズムのクエリ数)を高速化するために使用することができる。近年,量子特異値変換(QSVT)がHSの最小シミュレーション時間を達成することが証明された。 QSVTベースのHSアルゴリズムの重要なサブルーチンは振幅増幅演算であり、これはQSVTフレームワークにおける可視振幅増幅または固定点振幅増幅によって実現できる。そこで本研究では,QSVT ベースの HS の誤りとクエリ数に関する詳細な解析を行い,シミュレーション時間における不明瞭な手法が固定点法よりも優れていることを示す。この結果に基づいて,QSVT に基づく HS を 1 次元線形化 Vlasov-Poisson 方程式に適用し,線形ランドウ減衰のシミュレーションに成功したことを示す。 Quantum computing can be used to speed up the simulation time (more precisely, the number of queries of the algorithm) for physical systems; one such promising approach is the Hamiltonian simulation (HS) algorithm. Recently, it was proven that the quantum singular value transformation (QSVT) achieves the minimum simulation time for HS. An important subroutine of the QSVT-based HS algorithm is the amplitude amplification operation, which can be realized via the oblivious amplitude amplification or the fixed-point amplitude amplification in the QSVT framework. In this work, we execute a detailed analysis of the error and number of queries of the QSVT-based HS and show that the oblivious method is better than the fixed-point one in the sense of simulation time. Based on this finding, we apply the QSVT-based HS to the one-dimensional linearized Vlasov-Poisson equation and demonstrate that the linear Landau damping can be successfully simulated.	翻訳日:2023-10-26 01:14:21 公開日:2023-10-24
# Few-Shot Class-Incremental Learningに関する調査 A Survey on Few-Shot Class-Incremental Learning ( http://arxiv.org/abs/2304.08130v2 ) ライセンス: Link先を確認	Songsong Tian, Lusi Li, Weijun Li, Hang Ran, Xin Ning, Prayag Tiwari	(参考訳) 大規模なディープラーニングモデルは印象的だが、リアルタイムデータが利用できないと苦労する。 FSCIL(Few-shot class-incremental Learning)は、ディープニューラルネットワークにおいて、これまで学んだことを忘れずに、ラベル付きサンプルから新しいタスクを学習する上で重要な課題となる。このセットアップは、破滅的な忘れと過度な問題を引き起こし、モデルパフォーマンスに深刻な影響を与えます。 FSCILの研究は、データボリュームと取得時間に関するディープラーニングモデルの制限を克服し、機械学習モデルの実用性と適応性を向上させる。本稿では FSCIL に関する総合的な調査を行う。これまでの調査と異なり,2つの視点からfscilを導入することに着目し,30以上の理論研究と20以上の応用研究をレビューした。理論的には,従来の機械学習手法,メタ学習に基づく手法,特徴量と特徴量に基づく手法,リプレイに基づく手法,動的ネットワーク構造に基づく手法の5つのサブカテゴリに分けた新しい分類手法を提案する。また、FSCILのベンチマークデータセットに関する最近の理論的研究の評価を行った。アプリケーションの観点からは、FSCILは、自然言語処理やグラフと同様に、画像分類、オブジェクト検出、画像分割など、コンピュータビジョンの様々な分野において、目覚ましい成果を達成している。我々は重要な応用をまとめる。最後に,応用,問題設定,理論開発など今後の研究の方向性を指摘する。本稿では,FSCILの方法論,性能,アプリケーションの観点からの最近の進歩を包括的に分析する。 Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks to learn new tasks from just a few labeled samples without forgetting the previously learned ones. This setup easily leads to catastrophic forgetting and overfitting problems, severely affecting model performance. Studying FSCIL helps overcome deep learning model limitations on data volume and acquisition time, while improving practicality and adaptability of machine learning models. This paper provides a comprehensive survey on FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, focusing on introducing FSCIL from two perspectives, while reviewing over 30 theoretical research studies and more than 20 applied research studies. 
From the theoretical perspective, we provide a novel categorization approach that divides the field into five subcategories, including traditional machine learning methods, meta-learning based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on benchmark datasets of FSCIL. From the application perspective, FSCIL has achieved impressive achievements in various fields of computer vision such as image classification, object detection, and image segmentation, as well as in natural language processing and graph. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from a methodological, performance, and application perspective.	翻訳日:2023-10-26 01:14:02 公開日:2023-10-24
# 大規模言語モデルを用いた文書レベル機械翻訳 Document-Level Machine Translation with Large Language Models ( http://arxiv.org/abs/2304.02210v2 ) ライセンス: Link先を確認	Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu	(参考訳) ChatGPTのような大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクに対して、一貫性、凝集性、関連性、および流動性のある回答を生成することができる。本稿では,文書レベルの機械翻訳(MT)をテストベッドとして,談話モデルにおけるLLMの能力の詳細な評価を行う。この研究は3つの側面に焦点を当てています 1) 文脈認識プロンプトの効果は,文書レベルの翻訳品質と談話現象に異なるプロンプトが与える影響について検討する。 2)ChatGPTの翻訳性能を商用MTシステムと高度文書レベルのMT手法と比較する翻訳モデルの比較 3) 会話モデリング能力の分析により, llmで符号化された談話知識をさらに探究し, 学習技術が談話モデリングに与える影響に光を当てる。多くのベンチマークで評価した結果、LCMは優れた性能を示し、文書レベルの翻訳の新たなパラダイムとなる可能性を示した。 1)GPT-3.5及びGPT-4は、その強力な長文モデリング機能を活用し、人的評価において商用MTシステムより優れている。 2) GPT-4 は GPT-3.5 よりも言語知識の探索能力が高い。この研究は、MT における LLM の課題と機会を強調し、将来 LLM の設計と評価を刺激したいと思っています。 Large language models (LLMs) such as ChatGPT can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. Taking document-level machine translation (MT) as a testbed, this paper provides an in-depth evaluation of LLMs' ability on discourse modeling. The study focuses on three aspects: 1) Effects of Context-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modelling Abilities, where we further probe discourse knowledge encoded in LLMs and shed light on impacts of training techniques on discourse modeling. 
By evaluating on a number of benchmarks, we surprisingly find that LLMs have demonstrated superior performance and show potential to become a new paradigm for document-level translation: 1) leveraging their powerful long-text modeling capabilities, GPT-3.5 and GPT-4 outperform commercial MT systems in terms of human evaluation; 2) GPT-4 demonstrates a stronger ability for probing linguistic knowledge than GPT-3.5. This work highlights the challenges and opportunities of LLMs for MT, which we hope can inspire the future design and evaluation of LLMs. We release our data and annotations at https://github.com/longyuewangdcu/Document-MT-LLM.	翻訳日:2023-10-26 01:12:52 公開日:2023-10-24
# 完全配向量子センサを用いた超伝導渦の広視野定量磁気イメージング Wide-field quantitative magnetic imaging of superconducting vortices using perfectly aligned quantum sensors ( http://arxiv.org/abs/2304.01024v2 ) ライセンス: Link先を確認	Shunsuke Nishimura, Taku Kobayashi, Daichi Sasaki, Takeyuki Tsuji, Takayuki Iwasaki, Mutsuko Hatano, Kento Sasaki, and Kensuke Kobayashi	(参考訳) 超伝導渦の可視化に様々な技術が応用され、電磁応答の手がかりとなっている。ここでは, 完全に整列したダイヤモンド量子センサを用いて, 超伝導薄膜中の渦の漏れ磁場を広視野で定量的に可視化する。センサの不均一性の影響を軽減する解析により,YBa$_2$Cu$_3$O$_{7-\delta}$における単一渦の磁束を,精度$\pm10~\%$で可視化する。得られた渦形状は理論モデルと一致し, 浸透深さと温度依存性は従来の研究と一致し, 精度と広い適用性が証明された。この広視野イメージングは、原理的には極端条件下でも機能し、様々な超伝導体のキャラクタリゼーションを可能にする。 Various techniques have been applied to visualize superconducting vortices, providing clues to their electromagnetic response. Here, we present a wide-field, quantitative imaging of the stray field of the vortices in a superconducting thin film using perfectly aligned diamond quantum sensors. Our analysis, which mitigates the influence of the sensor inhomogeneities, visualizes the magnetic flux of single vortices in YBa$_2$Cu$_3$O$_{7-\delta}$ with an accuracy of $\pm10~\%$. The obtained vortex shape is consistent with the theoretical model, and penetration depth and its temperature dependence agree with previous studies, proving our technique's accuracy and broad applicability. This wide-field imaging, which in principle works even under extreme conditions, allows the characterization of various superconductors.	翻訳日:2023-10-26 01:12:30 公開日:2023-10-24
# 議論中の暗黙の質問としての包括的単純化 Elaborative Simplification as Implicit Questions Under Discussion ( http://arxiv.org/abs/2305.10387v3 ) ライセンス: Link先を確認	Yating Wu, William Sheffield, Kyle Mahowald and Junyi Jessy Li	(参考訳) 自動テキスト簡易化(automated text simplification)は、子供や創発的なバイリンガルなどの人々にとって、テキストをより使いやすくするための技術であり、複雑な文からエンコーダ・デコーダモデルを用いた簡易文への単言語翻訳タスクとしてよく考えられている。このビューは、単純化されたテキストに新しい情報が加えられる詳細化の考慮に失敗している。本稿では,議論中の問題(qud)フレームワークのレンズを通して,説明の簡略化を考察し,著者が何を精巧に扱っているのか,どのように精巧化が談話の文脈にどのように適合するかを,暗黙的な問いに対する明示的な答えとして捉えて検討する。我々は,これらの現象を研究するために,暗黙のQUDを伴う1.3KのelabQUDを紹介する。質問生成による)qudを明示的にモデル化することで、説明の単純化と他の談話とどのように結びつくかという本質的な理解がもたらされるだけでなく、説明生成の質が大幅に向上することを示す。 Automated text simplification, a technique useful for making text more accessible to people such as children and emergent bilinguals, is often thought of as a monolingual translation task from complex sentences to simplified sentences using encoder-decoder models. This view fails to account for elaborative simplification, where new information is added into the simplified text. This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework, providing a robust way to investigate what writers elaborate upon, how they elaborate, and how elaborations fit into the discourse context by viewing elaborations as explicit answers to implicit questions. We introduce ElabQUD, consisting of 1.3K elaborations accompanied with implicit QUDs, to study these phenomena. We show that explicitly modeling QUD (via question generation) not only provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse, but also substantially improves the quality of elaboration generation.	翻訳日:2023-10-26 01:07:06 公開日:2023-10-24
# オフライン強化学習へのミニマリストアプローチの再検討 Revisiting the Minimalist Approach to Offline Reinforcement Learning ( http://arxiv.org/abs/2305.09836v2 ) ライセンス: Link先を確認	Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov	(参考訳) 近年、オフライン強化学習(RL)が大幅に進歩し、複雑さの度合いの異なる多数のアルゴリズムが開発された。これらのアルゴリズムは注目すべき改善をもたらしたが、多くは中核的なアルゴリズムの進歩を超えてその有効性に影響を与える一見小さな設計選択を取り入れている。しかし、これらの設計選択が確立されたベースラインに与える影響は未定である。本稿では,オフラインRLにおける最近の作業のふりかえり分析を行い,TD3+BC法上に構築された設計要素を統合する最小化アルゴリズムであるReBRACを提案することで,このギャップを埋めることを目的とする。 D4RLとV-D4RLのベンチマークを用いて51のデータセット上のReBRACの評価を行い、オフラインとオフラインからオンラインへの両方の設定におけるアンサンブルフリーメソッド間の最先端性能を実証した。これらの設計選択の有効性をさらに説明するために、数千の実験で大規模なアブレーション研究とハイパーパラメータ感度分析を行う。 Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.	翻訳日:2023-10-26 01:06:29 公開日:2023-10-24
# 顔認証の視力説明に向けて Towards Visual Saliency Explanations of Face Verification ( http://arxiv.org/abs/2305.08546v4 ) ライセンス: Link先を確認	Yuhang Lu, Zewei Xu, Touradj Ebrahimi	(参考訳) 過去数年間、深層畳み込みニューラルネットワークは、認証と識別の両方のシナリオにおいて、顔認識(FR)技術のフロンティアを推し進めてきた。精度が高いにもかかわらず、説明性に欠けるとしてしばしば批判される。深層顔認識システムにおける意思決定プロセスの理解に対する需要が高まっている。近年の研究では、視覚塩分マップの解説としての利用が研究されているが、顔認識の文脈では議論や分析が欠如していることが多い。本稿では,説明可能な顔認証タスクに集中し,新しい説明枠組みを提案する。まず, 深層frモデルによる決定に焦点をあてた, 塩分に基づく説明方法の定義が提案されている。第二に,CorrRISEというモデルに依存しない新しい説明法が提案され,任意の顔画像の類似領域と相似領域の両方を明らかにする。次に、顔認証における一般的な視覚塩分説明手法の性能を測定するために評価手法を考案する。最後に, 視覚的, 定量的な結果から, 提案手法は他の最先端の顔認証手法と比較して有望な結果が得られた。 In the past years, deep convolutional neural networks have been pushing the frontier of face recognition (FR) techniques in both verification and identification scenarios. Despite the high accuracy, they are often criticized for lacking explainability. There has been an increasing demand for understanding the decision-making process of deep face recognition systems. Recent studies have investigated the usage of visual saliency maps as an explanation, but they often lack a discussion and analysis in the context of face recognition. This paper concentrates on explainable face verification tasks and conceives a new explanation framework. Firstly, a definition of the saliency-based explanation method is provided, which focuses on the decisions made by the deep FR model. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions of any given pair of face images. Then, an evaluation methodology is designed to measure the performance of general visual saliency explanation methods in face verification. Finally, substantial visual and quantitative results have shown that the proposed CorrRISE method demonstrates promising results in comparison with other state-of-the-art explainable face verification approaches.	翻訳日:2023-10-26 01:05:48 公開日:2023-10-24
# 広視野眼底画像からの網膜疾患認識のためのドメイン適応 Supervised Domain Adaptation for Recognizing Retinal Diseases from Wide-Field Fundus Images ( http://arxiv.org/abs/2305.08078v2 ) ライセンス: Link先を確認	Qijie Wei, Jingyuan Yang, Bo Wang, Jinrui Wang, Jianchun Zhao, Xinyu Zhao, Sheng Yang, Niranchana Manivannan, Youxin Chen, Dayong Ding, Jing Zhou and Xirong Li	(参考訳) 本稿では,広視野 (WF) と超広視野 (UWF) の眼底画像から複数の網膜疾患を認識するための課題について述べる。既存の大量のラベル付きカラーファンドス写真(CFP)データと、比較的少量のWFおよびUWFデータを有効利用するために、クロスドメイン協調学習(CdCL)というドメイン適応手法を提案する。教師なしドメイン適応における固定比に基づくミックスアップの成功に触発されて、我々はこの戦略を現在のタスクに再活用する。 CFP画像とWF/UWF画像の視野の違いにより,CFP画像の解剖学的構造がWF/UWF画像よりもかなり大きくなるという,スケールバイアスが自然に存在する。 CdCL法は,変圧器を用いたスケール・バイアス補正法により,スケール不変な特徴を生成できる。 wf画像とuwf画像の両方をカバーする複数のデータセットに関する広範囲な実験によって示されているように、提案手法は多くの競合ベースラインと比較できる。 This paper addresses the emerging task of recognizing multiple retinal diseases from wide-field (WF) and ultra-wide-field (UWF) fundus images. For an effective use of existing large amount of labeled color fundus photo (CFP) data and the relatively small amount of WF and UWF data, we propose a supervised domain adaptation method named Cross-domain Collaborative Learning (CdCL). Inspired by the success of fixed-ratio based mixup in unsupervised domain adaptation, we re-purpose this strategy for the current task. Due to the intrinsic disparity between the field-of-view of CFP and WF/UWF images, a scale bias naturally exists in a mixup sample that the anatomic structure from a CFP image will be considerably larger than its WF/UWF counterpart. The CdCL method resolves the issue by Scale-bias Correction, which employs Transformers for producing scale-invariant features. As demonstrated by extensive experiments on multiple datasets covering both WF and UWF images, the proposed method compares favorably against a number of competitive baselines.	翻訳日:2023-10-26 01:05:31 公開日:2023-10-24
# 医用画像の拡散モデルに留意すること --脳MRIおよび胸部X線画像の記憶におけるGANとの比較 Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images ( http://arxiv.org/abs/2305.07644v2 ) ライセンス: Link先を確認	Muhammad Usman Akbar, Wuhao Wang, Anders Eklund	(参考訳) 拡散モデルは当初テキスト・画像生成のために開発され、現在では高品質な合成画像の生成に利用されている。 GANが先行する拡散モデルでは,様々な評価指標を用いて顕著な結果が得られた。しかし、FIDやISなどの一般的なメトリクスは、拡散モデルが単にトレーニングイメージを再現しているかどうかを決定するのに適していない。ここでは、BRATS20、BRATS21および胸部X線肺炎データセットを用いてStyleGANおよび拡散モデルを訓練し、脳MRIおよび胸部X線画像を合成し、合成画像とすべてのトレーニング画像との相関を測定する。以上の結果から,拡散モデルでは,特に小さなデータセットや3次元ボリュームの2次元スライスを用いた場合,StyleGANと比較してトレーニング画像を記憶する傾向が示唆された。研究者は、合成画像の共有が最終的な目的であれば、医用イメージングに拡散モデルを使用する際に注意する必要がある。 Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high-quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and diffusion models, using BRATS20, BRATS21 and a chest x-ray pneumonia dataset, to synthesize brain MRI and chest x-ray images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are more likely to memorize the training images, compared to StyleGAN, especially for small datasets and when using 2D slices from 3D volumes. Researchers should be careful when using diffusion models for medical imaging, if the final goal is to share the synthetic images.	翻訳日:2023-10-26 01:05:11 公開日:2023-10-24
# VPGTrans: LLM間でのビジュアルプロンプトジェネレータの転送 VPGTrans: Transfer Visual Prompt Generator across LLMs ( http://arxiv.org/abs/2305.01278v2 ) ライセンス: Link先を確認	Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, and Tat-Seng Chua	(参考訳) 画像テキストペアをスクラッチから事前学習することで,新たなマルチモーダル LLM (MLLM) を開発するには, 既存の LLM を比較的軽量なビジュアルプロンプトジェネレータ (VPG) と接続することが, 実現可能なパラダイムとなる。しかし、MLLMのVPG部分のさらなるチューニングは依然として必要な計算コスト、すなわち何千時間ものGPU時間と数百万のトレーニングデータに悩まされている。 1つの代替策は、既存のMLLMからターゲットMLLMに既存のVPGを転送することである。本研究では,LLM間のVPG転送可能性について初めて検討し,VPG転送コストを低減するための解決策を探究する。我々はまず, 異なるLLMサイズ(例えば, 小さいから大きい)および異なるLLMタイプにわたるVPG転送について検討し, 転送効率を最大化するために重要な因子を診断する。本稿では,VPGTransという2段階の転送フレームワークを設計する。広範な実験を通じて,vpgtransは,パフォーマンスを損なうことなく,転送学習プロセスを大幅に高速化できることを実証する。 BLIP-2 OPT$_\text{2.7B}$からBLIP-2 OPT$_\text{6.7B}$へのVPG転送には10倍以上のスピードアップと10.7%のトレーニングデータがある。さらに、その背後にある一連の興味深い発見と潜在的な根拠を提供し、議論する。最後に、VL-LLaMAとVL-Vicunaを含む2つの新しいMLLMと、最近リリースされたLLaMAとVicuna LLMをカスタマイズすることで、VPGTransアプローチの実用価値を示す。 While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the MLLM still suffers from indispensable computational costs, i.e., requiring thousands of GPU hours and millions of training data. One alternative solution is to transfer an existing VPG from any existing MLLMs for the target MLLM. In this work, we for the first time investigate the VPG transferability across LLMs, and explore a solution to reduce the cost of VPG transfer. We first study the VPG transfer across different LLM sizes (e.g., small-to-large), and across different LLM types, through which we diagnose the key factors to maximize the transfer efficiency. Based on our observation, we design a two-stage transfer framework named VPGTrans, which is simple yet highly effective. 
Through extensive experiments, we demonstrate that VPGTrans helps significantly speed up the transfer learning process without compromising performance. Remarkably, it helps achieve the VPG transfer from BLIP-2 OPT$_\text{2.7B}$ to BLIP-2 OPT$_\text{6.7B}$ with over 10 times speed-up and 10.7% training data compared with connecting a VPG to OPT$_\text{6.7B}$ from scratch. Further, a series of intriguing findings and potential rationales behind them are provided and discussed. Finally, we showcase the practical value of our VPGTrans approach, by customizing two novel MLLMs, including VL-LLaMA and VL-Vicuna, with recently released LLaMA and Vicuna LLMs.	翻訳日:2023-10-26 01:04:24 公開日:2023-10-24
# GPT-2はどのように「より大きい」を計算するか? 事前学習言語モデルにおける数学的能力の解釈 How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model ( http://arxiv.org/abs/2305.00586v4 ) ライセンス: Link先を確認	Michael Hanna, Ollie Liu and Alexandre Variengien	(参考訳) 事前訓練された言語モデルは、明示的に訓練されていないタスクに驚くほど適しているが、これらの機能の実装方法はあまり理解されていない。本稿では,事前学習された言語モデルによってしばしば得られる基本的な数学的能力について検討する。具体的には,GPT-2 smallの(限定的な)数学的能力を説明するために,機械的解釈可能性技術を用いる。ケーススタディとして,「戦争は1732年から17年まで続いた」などの文を入力として,有効な2桁の終了年(32より大きい年)を予測する能力について検討する。まず、このタスクの出力を計算するGPT-2 smallの計算グラフの小さなサブセットである回路を同定する。そして、各回路部品の役割を説明し、GPT-2 smallの最終的な多層パーセプトロンが、開始年よりも大きい終了年の確率を高めることを示す。最後に、回路を活性化する関連タスクを見つける。以上の結果から,GPT-2 smallは多種多様なコンテキストにまたがって活性化する複雑だが汎用的な機構を用いて「より大きい」を計算することが示唆された。 Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex but general mechanism that activates across diverse contexts.	翻訳日:2023-10-26 01:03:43 公開日:2023-10-24
# instructalign: 連続的な言語間インストラクションチューニングによる高低リソース言語アライメント InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning ( http://arxiv.org/abs/2305.13627v2 ) ライセンス: Link先を確認	Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung	(参考訳) 命令を調整した大規模言語モデル(LLM)は、様々なタスクや言語で顕著な能力を示している。しかし、利用可能なデータが不足しているため、表現不足の言語に一般化する能力は限られている。さらに、命令調整されたLLMに新しい言語を直接適用すると、破滅的な忘れ込みが生じ、マルチタスク能力が失われる。この問題に対処するために,LLMが新たな未知言語と学習済み高ソース言語との整合を可能にするために,連続的な言語間命令チューニングを使用するInstructAlignを提案する。 InstructAlignの有効性を実証し,並列データに制限のある低リソース言語をモデルで理解し,破滅的な忘れ込みを防止した。我々の研究は、言語適応手法の進歩に寄与し、特に、未表現言語への命令調整 LLM の適応に寄与する。私たちのコードはhttps://github.com/HLTCHKUST/InstructAlignでリリースされています Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. Additionally, directly adapting new languages to instruction-tuned LLMs can result in catastrophic forgetting, which leads to the loss of multitasking ability. To address this issue, we propose InstructAlign which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages. Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting. Our work contributes to the advancement of language adaptation methods, particularly for adapting instruction-tuned LLMs to underrepresented languages. Our code is released on https://github.com/HLTCHKUST/InstructAlign	翻訳日:2023-10-26 00:55:37 公開日:2023-10-24
# KineticNet: 軌道自由密度汎関数理論のための伝達可能な運動エネルギー関数の深層学習 KineticNet: Deep learning a transferable kinetic energy functional for orbital-free density functional theory ( http://arxiv.org/abs/2305.13316v2 ) ライセンス: Link先を確認	Roman Remme, Tobias Kaczun, Maximilian Scheurer, Andreas Dreuw, Fred A. Hamprecht	(参考訳) 軌道自由密度汎関数理論(OF-DFT)は、最小コストで基底状態分子特性を計算することを約束する。しかし、電子密度のみの関数として運動エネルギーを計算できないため、これは抑制されている。ここでは、より高価なコーン・シャム密度汎関数理論によって提供される基底真理から運動エネルギー汎関数を学習する。モデルに十分な表現性と空間的コンテキストを付与し、メモリフットプリントをGPU上の計算能力に制限する、トレーニングデータの十分な広範な分布を作成して、初期推定が貧弱な場合でも反復的な密度最適化を可能にする、という2つの課題に直面している。そこで我々は,分子二次格子上の量予測に適応した点畳み込みに基づく等価なディープニューラルネットワークアーキテクチャであるkineticnetを提案する。核カスプ近傍で十分な空間分解能を有する畳み込みフィルタ、複数の結合長にわたって情報を伝達する原子中心のスパースだが表現力のあるアーキテクチャ、およびランダムな外部電位による摂動面の基底状態密度を見つけ、様々なトレーニングデータを生成する新しい戦略を含む。 KineticNetは、入力密度と微小分子のジオメトリにわたる学習された機能の化学的精度を初めて達成した。 2つの電子系に対して、化学的精度でOF-DFT密度を最適化する。 Orbital-free density functional theory (OF-DFT) holds the promise to compute ground state molecular properties at minimal cost. However, it has been held back by our inability to compute the kinetic energy as a functional of the electron density only. We here set out to learn the kinetic energy functional from ground truth provided by the more expensive Kohn-Sham density functional theory. Such learning is confronted with two key challenges: Giving the model sufficient expressivity and spatial context while limiting the memory footprint to afford computations on a GPU; and creating a sufficiently broad distribution of training data to enable iterative density optimization even when starting from a poor initial guess. In response, we introduce KineticNet, an equivariant deep neural network architecture based on point convolutions adapted to the prediction of quantities on molecular quadrature grids. 
Important contributions include convolution filters with sufficient spatial resolution in the vicinity of the nuclear cusp, an atom-centric sparse but expressive architecture that relays information across multiple bond lengths; and a new strategy to generate varied training data by finding ground state densities in the face of perturbations by a random external potential. KineticNet achieves, for the first time, chemical accuracy of the learned functionals across input densities and geometries of tiny molecules. For two electron systems, we additionally demonstrate OF-DFT density optimization with chemical accuracy.	翻訳日:2023-10-26 00:55:21 公開日:2023-10-24
# GQA:マルチヘッドチェックポイントを用いた汎用マルチクエリトランスフォーマモデルの訓練 GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints ( http://arxiv.org/abs/2305.13245v2 ) ライセンス: Link先を確認	Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai	(参考訳) 単一のキー値ヘッドのみを使用するマルチクエリアテンション(MQA)は、デコーダ推論を大幅に高速化する。しかし、MQAは品質の低下につながる可能性があるし、より高速な推論のためだけに別のモデルをトレーニングすることは望ましくないかもしれない。我々は、(1) 既存のマルチヘッド言語モデルのチェックポイントを、従来の事前学習計算の5%を用いてMQAモデルにアップトレーニングするためのレシピを提案し、(2) キー値ヘッドの中間数(1より多く、クエリヘッド数より少ない数)を使用するマルチクエリアテンションの一般化であるグループクエリアテンション(GQA)を導入する。アップトレーニングされたGQAは、MQAに匹敵する速度でマルチヘッドアテンションに近い品質を達成することを示す。 Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce grouped-query attention (GQA), a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads. We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA.	翻訳日:2023-10-26 00:54:57 公開日:2023-10-24
# SpokenWOZ:タスク指向対話エージェントのための大規模音声テキストベンチマーク SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents ( http://arxiv.org/abs/2305.13040v4 ) ライセンス: Link先を確認	Shuzheng Si, Wentao Ma, Haoyu Gao, Yuchuan Wu, Ting-En Lin, Yinpei Dai, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li	(参考訳) タスク指向対話(TOD)モデルは近年大きな進歩を遂げている。しかし,従来の研究は主にアノテータによるデータセットに焦点を当てており,学術研究と実世界の会話シナリオのギャップが生じた。いくつかの小規模音声TODデータセットは、ASRエラーなどの堅牢性問題に対処するために提案されているが、音声会話におけるユニークな課題は無視されている。この制限に対処するために,8つのドメイン,203kのターン,5.7kの対話,対人会話からの249時間の音声を含む,音声TODのための大規模音声テキストデータセットであるSpokenWOZを導入する。 SpokenWOZはさらに、音声言語における単語間処理や推論などの一般的な音声特徴を取り入れている。これらの特徴に基づき,新たな課題としてクロスターンスロットと推論スロット検出を提案する。テキストモーダルモデル,新たに提案されたデュアルモーダルモデル,LLM,例えばChatGPTなど,さまざまなベースライン上で実験を行う。その結果、最も先進的な対話状態追跡装置は、結合目標精度が25.65%しか達成できず、SOTAエンドツーエンドモデルでは52.1%の対話でしかユーザ要求を正しく完了できない。データセット、コード、およびリーダーボードは、https://spokenwoz.github.io/SpokenWOZ-github.io/で入手できる。 Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audios from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. 
The results show that the current models still have substantial room for improvement in spoken conversation, where the most advanced dialogue state tracker only achieves 25.65% in joint goal accuracy and the SOTA end-to-end model only correctly completes the user request in 52.1% of dialogues. The dataset, code, and leaderboard are available: https://spokenwoz.github.io/SpokenWOZ-github.io/.	翻訳日:2023-10-26 00:54:35 公開日:2023-10-24
# 形状のViT:計算最適モデル設計のためのスケーリング法則 Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design ( http://arxiv.org/abs/2305.13035v4 ) ライセンス: Link先を確認	Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer	(参考訳) スケーリング法則は、最近、与えられた計算時間に最適なモデルサイズ(パラメータの数)を導出するために用いられる。このような手法を改良して、幅や深さなどの計算最適モデル形状を推定し、視覚トランスフォーマーでこれをうまく実装した。我々の形状最適化型視覚変換器SoViTは、同等の計算量で事前訓練されているにもかかわらず、サイズが2倍以上のモデルと競合する結果を得る。例えば、SoViT-400m/14はILSVRC2012で90.3%の微調整精度を達成し、はるかに大きなViT-g/14を超え、同じ設定でViT-G/14に近づき、推論コストも半分以下である。画像分類,キャプション,VQA,ゼロショット転送など,複数のタスクにわたって徹底的な評価を行い,幅広い領域にわたるモデルの有効性と限界の特定を実証した。全体として、私たちの発見は視覚モデルを盲目的にスケールアップするという一般的なアプローチに挑戦し、より情報に基づいたスケーリングの道を開く。 Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, with also less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for a more informed scaling.	翻訳日:2023-10-26 00:54:07 公開日:2023-10-24
# 最近傍の機械翻訳は出力投影層上でのメタオプティマイザである Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer ( http://arxiv.org/abs/2305.13034v2 ) ライセンス: Link先を確認	Ruize Gao, Zhirui Zhang, Yichao Du, Lemao Liu, Rui Wang	(参考訳) Nearest Neighbor Machine Translation ($k$NN-MT)は、訓練済みニューラルネットワーク翻訳(NMT)モデルとドメイン固有のトークンレベルの検索を統合することで、ドメイン適応タスクにおいて大きな成功を収めた。しかし、その成功の背景にある理由は十分に調査されていない。本稿では,理論的および実証的研究を通じて,$k$NN-MTを包括的に分析する。当初,NMTの出力射影層に勾配降下を暗黙的に実行する手法として,$k$NN-MTの動作機構に関する新たな知見を提供し,モデル微調整の特定の事例であることを示す。その後、我々は、$k$NN-MTとモデル全体の微調整性能の違いを調べるために、複数ドメインの実験と単語レベルの分析を行う。その結果、(1)アダプタに$k$NN-MTを組み込むことで、ドメイン内テストセットの微調整と同等の翻訳性能が得られると同時に、ドメイン外テストセットのパフォーマンスも向上し、(2)ドメイン内低頻度単語のリコールでは微調整が$k$NN-MTを大きく上回っているが、このギャップは、追加のアダプタ層でコンテキスト表現を最適化することで橋渡しできる。 Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze $k$NN-MT through theoretical and empirical studies. Initially, we provide new insights into the working mechanism of $k$NN-MT as an efficient technique to implicitly execute gradient descent on the output projection layer of NMT, indicating that it is a specific case of model fine-tuning. Subsequently, we conduct multi-domain experiments and word-level analysis to examine the differences in performance between $k$NN-MT and entire-model fine-tuning. Our findings suggest that: (1) Incorporating $k$NN-MT with adapters yields comparable translation performance to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (2) Fine-tuning significantly outperforms $k$NN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers.	翻訳日:2023-10-26 00:53:37 公開日:2023-10-24
# chatgptを蒸留して自動解答評価を行う Distilling ChatGPT for Explainable Automated Student Answer Assessment ( http://arxiv.org/abs/2305.12962v2 ) ライセンス: Link先を確認	Jiazheng Li, Lin Gui, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He	(参考訳) 説明可能で忠実なフィードバックを提供することは,学生の回答自動評価に不可欠である。本稿では,最先端の大規模言語モデルであるChatGPTを用いて,学生の回答スコアリングと合理性生成の同時処理を行う新しいフレームワークを提案する。そこで我々は,ChatGPTに異なるテンプレートを付けて,一貫性のない有理を改良してマーキング基準に適合させることにより,適切な指示を識別する。洗練されたChatGPT出力により、学生の回答を同時に評価し、合理的な結果を提供する、より小さな言語モデルを微調整できる。ベンチマークデータセットの広範な実験により,提案手法はchatgptと比較してqwk全体のスコアを11%向上させた。さらに,提案手法によって得られた理論的根拠がchatgptに匹敵することを示した。このアプローチは,教育における説明可能な自動評価を実現するための有効なソリューションを提供する。コードはhttps://github.com/lijiazheng99/aeraで入手できる。 Providing explainable and faithful feedback is crucial for automated student answer assessment. In this paper, we introduce a novel framework that explores using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation. We identify the appropriate instructions by prompting ChatGPT with different templates to collect the rationales, where inconsistent rationales are refined to align with marking standards. The refined ChatGPT outputs enable us to fine-tune a smaller language model that simultaneously assesses student answers and provides rationales. Extensive experiments on the benchmark dataset show that the proposed method improves the overall QWK score by 11% compared to ChatGPT. Furthermore, our thorough analysis and human evaluation demonstrate that the rationales generated by our proposed method are comparable to those of ChatGPT. Our approach provides a viable solution to achieve explainable automated assessment in education. Code available at https://github.com/lijiazheng99/aera.	翻訳日:2023-10-26 00:53:12 公開日:2023-10-24
# nlp研究におけるパラダイムシフトの2次解析--いつ、どのように、なぜ? A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? ( http://arxiv.org/abs/2305.12920v2 ) ライセンス: Link先を確認	Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych	(参考訳) 科学分野の基本概念と傾向を理解することは、その継続的な進歩を保ち続けるために不可欠である。本研究では,因果発見と推論手法を用いて,科学分野における研究トピックの進化を分析するための体系的枠組みを提案する。我々は,NLPにおける研究トピックの進化の多様な側面を包含する3つの変数を定義し,因果探索アルゴリズムを用いてこれらの変数間の因果関係を明らかにする。その後、これらの関係の強度を測定するためにこの構造を利用する。 ACLアンソロジーコーパスに関する広範な実験を行うことにより、我々のフレームワークは、幅広いNLP研究トピックの進化的傾向と根本原因を効果的に発見できることを実証する。具体的には、タスクとメソッドがNLPの研究の主要な要因であることを示し、データセットは従うが、メトリクスは最小限の影響を持つ。 Understanding the fundamental concepts and trends in a scientific field is crucial for keeping abreast of its continuous advancement. In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP and utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data. Subsequently, we leverage this structure to measure the intensity of these relationships. By conducting extensive experiments on the ACL Anthology corpus, we demonstrate that our framework effectively uncovers evolutionary trends and the underlying causes for a wide range of NLP research topics. Specifically, we show that tasks and methods are primary drivers of research in NLP, with datasets following, while metrics have minimal impact.	翻訳日:2023-10-26 00:52:56 公開日:2023-10-24
# opt-r: 大きな言語モデルの推論スキルの微調整と促進における説明の役割を探る OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models ( http://arxiv.org/abs/2305.12001v2 ) ライセンス: Link先を確認	Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab	(参考訳) 本稿では,大規模言語モデル (llm) の推論能力について,特にopen pretrained transformers (opt) モデルを代表として徹底的に検討する。そこで本研究では, OPT-Rモデル, OPT-REモデル, OPT-REモデルの3つのモデルについて検討した。次に,SUPER-NATURALINSTRUCTIONSベンチマークから抽出した57の領域外タスクに対して,26の異なる推論スキルを網羅し,3つのプロンプト技術を用いて全てのモデルを評価する。本研究では,27の構成と6,156の試験評価を網羅的に網羅し,様々な推論スキルにおける説明の役割を理解するために,微調整,プロンプト,スケールの寸法を調査した。この結果から, モデルが微調整された場合, モデルの性能に有意な影響を与えず, 非微調整されたモデルに肯定的な影響を及ぼすことが明らかとなった。さらに,シグネチャリングと微調整の際の説明を取り入れた分類精度が,わずかながら一貫した増加を観察した。最後に、数値(+20.4%)と類推(+13.9%)の推論や、無視可能なあるいは否定的な効果を示すスキルなど、微調整やプロンプトの際の説明を取り入れることで、どのスキルが最も有益かを洞察する。 In this paper, we conduct a thorough investigation into the reasoning capabilities of Large Language Models (LLMs), focusing specifically on the Open Pretrained Transformers (OPT) models as a representative of such models. Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations. We then evaluate all models on 57 out-of-domain tasks drawn from the SUPER-NATURALINSTRUCTIONS benchmark, covering 26 distinct reasoning skills, utilizing three prompting techniques. Through a comprehensive grid of 27 configurations and 6,156 test evaluations, we investigate the dimensions of finetuning, prompting, and scale to understand the role of explanations on different reasoning skills. Our findings reveal that having explanations in the fewshot exemplar has no significant impact on the model's performance when the model is finetuned, while positively affecting the non-finetuned counterpart. 
Moreover, we observe a slight yet consistent increase in classification accuracy as we incorporate explanations during prompting and finetuning, respectively. Finally, we offer insights on which skills benefit the most from incorporating explanations during finetuning and prompting, such as Numerical (+20.4%) and Analogical (+13.9%) reasoning, as well as skills that exhibit negligible or negative effects.	翻訳日:2023-10-26 00:52:41 公開日:2023-10-24
# 言語間視覚伝達のためのメタラーニング Meta-learning For Vision-and-language Cross-lingual Transfer ( http://arxiv.org/abs/2305.14843v2 ) ライセンス: Link先を確認	Hanxu Hu, Frank Keller	(参考訳) 現在のvision-Language Model (PVLM) は、様々なマルチモーダルデータセットにおいて優れた性能を発揮する。近年,多言語モデルの構築を目的とした研究が行われ,多言語多モーダルデータセットが提案されている。現在のpvlmは、マルチモーダルなゼロショットや少数ショットのクロスリンガル転送、特に低リソース言語で使用される場合、これらのデータセットでパフォーマンスが悪い。この問題を解決するために,新しいメタ学習型微調整フレームワークを提案する。本フレームワークは,mamlを言語間マルチモーダルで設計することにより,視覚言語シナリオにおける新しい言語に迅速に適応する。 (XVNLI, xGQA, MaRVL, xFlickr&CO) の視覚言語理解タスクおよびデータセットにおける, ゼロショットおよび少数ショットの言語間移動における現在のPVLMの性能を向上させる実験を行った。 Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work has aimed at building multilingual models, and a range of novel multilingual multi-modal datasets have been proposed. Current PVLMs typically perform poorly on these datasets when used for multi-modal zero-shot or few-shot cross-lingual transfer, especially for low-resource languages. To alleviate this problem, we propose a novel meta-learning fine-tuning framework. Our framework makes current PVLMs rapidly adaptive to new languages in vision-language scenarios by designing MAML in a cross-lingual multi-modal manner. Experiments show that our method boosts the performance of current state-of-the-art PVLMs in both zero-shot and few-shot cross-lingual transfer on a range of vision-language understanding tasks and datasets (XVNLI, xGQA, MaRVL, xFlickr&CO).	翻訳日:2023-10-26 00:45:31 公開日:2023-10-24
# 画像キャプションにおけるアクダクタンスとsituated Meaningの探索:マルチモーダル解析 Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis ( http://arxiv.org/abs/2305.14616v2 ) ライセンス: Link先を確認	Pin-Er Chen, Po-Ya Angela Wang, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh	(参考訳) 本稿では,マルチモーダルな意味表現に関する基礎的課題を,計算的認知言語学の観点から考察する。我々は、flickr30kデータセットから得られた画像に、アフォーマンス、知覚的敬礼、オブジェクト番号、視線キューイング、生態的ニッチアソシエーション(ena)という5つの知覚的特性を注釈し、画像キャプションにおけるテキスト的要素との関連について検討する。以上の結果から,ギブソニアン代価を持つ画像は,テルル代価を示す画像に比べて「保持版」と「コンテナ名詞」を含む字幕の頻度が高いことが判明した。知覚的サリエンス、対象数、ENAもまた言語表現の選択と関連している。本研究は,物体や事象の包括的理解には,認知的注意,言語の意味的ニュアンス,多様性の統合が必要であることを示す。自然言語理解における位置的意味と余裕の基盤の重要性を強調し,様々なシナリオにおける人間的な解釈の進歩の可能性について考察した。 This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.	翻訳日:2023-10-26 00:45:10 公開日:2023-10-24
# 翻訳と効果的な言語間伝達のための多言語画素表現 Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer ( http://arxiv.org/abs/2305.14280v2 ) ライセンス: Link先を確認	Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post	(参考訳) 画素表現を用いた多言語機械翻訳モデルを効果的に学習する方法を紹介し,実証する。さまざまな言語とスクリプトカバレッジを備えた2つの異なるデータ設定を実験し,サブワード埋め込みと比較して性能が向上した。文字間のパラメータ共有など,画素表現のさまざまな特性について検討し,前向きな転送につながる部分の理解を深める。これらの特性は, 未知のスクリプトへのシームレスな言語間移動を可能にするだけでなく, 語彙展開などの代替手段よりも, 画素表現をよりデータ効率良くする。この作業が、すべての言語とスクリプトに対して、より拡張可能な多言語モデルに貢献することを願っています。 We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations. We experiment with two different data settings with a variety of language and script coverage, demonstrating improved performance compared to subword embeddings. We explore various properties of pixel representations such as parameter sharing within and across scripts to better understand where they lead to positive transfer. We observe that these properties not only enable seamless cross-lingual transfer to unseen scripts, but make pixel representations more data-efficient than alternatives such as vocabulary expansion. We hope this work contributes to more extensible multilingual models for all languages and scripts.	翻訳日:2023-10-26 00:44:51 公開日:2023-10-24
# 2次元クラスター状態におけるバルク測定による境界相転移のトリガリング Triggering Boundary Phase Transitions through Bulk Measurements in 2D Cluster States ( http://arxiv.org/abs/2305.14231v2 ) ライセンス: Link先を確認	Yuchen Guo, Jian-Hao Zhang, Zhen Bi, Shuo Yang	(参考訳) テンソルネットワーク法を用いてバルク測定を行う無限2次元クラスター状態の境界における位相図について検討する。状態は、下限量子ビットおよび全てのバルク量子ビットにおいて、一様測定値$m = \cos{\theta}z+\sin{\theta}x$となる。以上の結果から, システムの境界は, 測定角度$\theta = \pi/2$ および任意の$\theta < \pi/2$ に対して領域法的絡み合いを示すことがわかった。領域ロー位相では、相転移は$\theta_c=1.371$で起こる。 $\theta \in(\theta_c,\pi/2)$ の位相は、単次元局所ガッピングハミルトニアンの一意な基底状態として実現できない非射影行列積状態によって特徴づけられる。その代わり、自発的な対称性の破れを伴う猫の状態に似ている。これらの結果から, 2次元系の境界の位相図は, 標準1次元系よりも複雑であることが示された。 We investigate the phase diagram at the boundary of an infinite two-dimensional cluster state subject to bulk measurements using tensor network methods. The state is subjected to uniform measurements $M = \cos{\theta}Z+\sin{\theta}X$ on the lower boundary qubits and in all bulk qubits. Our results show that the boundary of the system exhibits volume-law entanglement at the measurement angle $\theta = \pi/2$ and area-law entanglement for any $\theta < \pi/2$. Within the area-law phase, a phase transition occurs at $\theta_c=1.371$. The phase with $\theta \in(\theta_c,\pi/2)$ is characterized by a noninjective matrix product state, which cannot be realized as the unique ground state of a one-dimensional local, gapped Hamiltonian. Instead, it resembles a cat state with spontaneous symmetry breaking. These findings demonstrate that the phase diagram of the boundary of a two-dimensional system can be more intricate than that of a standard one-dimensional system.	翻訳日:2023-10-26 00:44:39 公開日:2023-10-24
# 知識の不変性を保つ:オープン情報抽出のロバスト性評価の再検討 Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction ( http://arxiv.org/abs/2305.13981v2 ) ライセンス: Link先を確認	Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu	(参考訳) 分布変化に対するロバスト性は、NLPモデルを現実の世界、特に情報抽出タスクにうまく適用できることを保証する。しかしながら、ほとんどの先行評価ベンチマークは、ロバスト性の重要な測定値を無視して、ペアワイズマッチングの正しさを検証することに費やされてきた。本稿では,実世界におけるオープン情報抽出モデルの評価をシミュレートした最初のベンチマークを提案する。それぞれの例が、同じ意味の構造化された知識を持つが、異なる構文と表現形式を持つ文からなる、知識不変のクリークである大規模なテストベッドを設計し、アノテートする。さらにロバスト性メトリクスを詳述することで、モデルが全体のクリークで一貫して正確である場合、ロバストであると判断される。我々は過去10年間に発行された典型的なモデルと一般的な大言語モデルの実験を行い、その結果、既存の成功したモデルは、最大で23.43 F1スコアのフラストレーションのある劣化を示した。私たちのリソースとコードはhttps://github.com/qijimrc/ROBUST から入手できます。 The robustness to distribution changes ensures that NLP models can be successfully applied in the realistic world, especially for information extraction tasks. However, most prior evaluation benchmarks have been devoted to validating pairwise matching correctness, ignoring the crucial measurement of robustness. In this paper, we present the first benchmark that simulates the evaluation of open information extraction models in the real world, where the syntactic and expressive distributions under the same knowledge meaning may drift variously. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique that consists of sentences with structured knowledge of the same meaning but with different syntactic and expressive forms. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques. We perform experiments on typical models published in the last decade as well as a popular large language model, the results show that the existing successful models exhibit a frustrating degradation, with a maximum drop of 23.43 F1 score. 
Our resources and code are available at https://github.com/qijimrc/ROBUST.	翻訳日:2023-10-26 00:44:04 公開日:2023-10-24
# 大規模言語モデルのための協調学習アシスタントによる誤りから学ぶ Learning from Mistakes via Cooperative Study Assistant for Large Language Models ( http://arxiv.org/abs/2305.13829v3 ) ライセンス: Link先を確認	Danqing Wang, Lei Li	(参考訳) 大規模言語モデル(llm)は、自身のフィードバックに基づいて世代を洗練する可能性を実証している。しかし、llm自体からのフィードバックはしばしば不正確であり、その利点を制限している。本稿では,対話的協調によるミス学習における主要なllmを支援する補助エージェントを用いた新しい枠組みである,大言語モデル学習支援システム(salam)を提案する。収集フェーズでは、学生アシスタントエージェントがメインLLMをプローブし、そのエラーを分析し、間違ったメモリでインタラクションを収集する。試験段階では、研究アシスタントは、関連するケースを検索して、メインのllmが予測し、同様のエラーを避けるためのガイドラインを提供する。まず,汎用学習支援システムの有効性を検証し,その効果をカスタマイズし,学習経験を模倣してllm固有の指導を行う。 SALAMはBBHでは6.6、BBQでは12.6の精度でLLMを大幅に向上できることを示す。 Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback. However, the feedback from LLM itself is often inaccurate, thereby limiting its benefits. In this paper, we propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the student assistant agent probes the main LLM, analyzes its errors, and collects the interaction in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ.	翻訳日:2023-10-26 00:43:45 公開日:2023-10-24
# 質問が英語でなければChatGPTを信用しない:多言語能力とLLMのタイプの検討 Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs ( http://arxiv.org/abs/2305.16339v2 ) ライセンス: Link先を確認	Xiang Zhang, Senyu Li, Bradley Hauer, Ning Shi, Grzegorz Kondrak	(参考訳) 大規模言語モデル(LLM)は,近年,自然言語理解能力に優れ,多種多様な自然言語処理(NLP)タスクに優れてきた。ほとんどのllmが主に英語で訓練されているにもかかわらず、複数の研究が他の多くの言語での比較性能を示している。しかし、LLMが多言語能力をどのように獲得するか、また異なる言語間でパフォーマンスがどのように異なるか、という根本的な疑問が続いている。ユーザや研究者は多種多様な言語背景から来ており、LLMの活用と解釈に影響を与える可能性があるため、これらの質問はLLMの研究に不可欠である。本研究では,多言語環境でのllmの性能差を体系的に評価する方法を提案する。 LLMにおける多言語一般化の現象について検討し,多言語学習データ不足が多言語能力の向上につながることを示す。これを実現するために、バック翻訳に基づく新しいプロンプト方式を用いる。その結果,GPTは多言語設定において高い翻訳的振る舞いを示すことがわかった。 Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities and have excelled in a variety of natural language processing (NLP) tasks in recent years. Despite the fact that most LLMs are trained predominantly in English, multiple studies have demonstrated their comparative performance in many other languages. However, fundamental questions persist regarding how LLMs acquire their multi-lingual abilities and how performance varies across different languages. These inquiries are crucial for the study of LLMs since users and researchers often come from diverse language backgrounds, potentially influencing their utilization and interpretation of LLMs' results. In this work, we propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. We investigate the phenomenon of across-language generalizations in LLMs, wherein insufficient multi-lingual training data leads to advanced multi-lingual capabilities. To accomplish this, we employ a novel back-translation-based prompting method. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.	翻訳日:2023-10-26 00:35:13 公開日:2023-10-24
# 経験的条件付き一貫した最適輸送 Consistent Optimal Transport with Empirical Conditional Measures ( http://arxiv.org/abs/2305.15901v3 ) ライセンス: Link先を確認	Piyushi Manupriya, Rachit Keerti Das, Sayantan Biswas, Saketha Nath Jagarlapudi	(参考訳) 2つの連接分布からのサンプルを仮定し,共通変数上での最適輸送(OT)の問題を考える。条件付き変数が連続であるような一般的な設定に注目し、2つのジョイント分布におけるこの変数の限界は同じではないかもしれない。このような設定では、標準ot変種は採用できず、新しい推定技術が必要である。主な課題は条件分布が明確には利用できないことであるが、我々のot定式化における重要なアイデアは、共同サンプル上で計算されたカーネル化されたleast-squares項を、輸送計画の限界と経験的な条件条件とを暗黙的に一致させることである。軽度条件下では,条件付き変数の関数として推定された輸送計画が漸近的に最適であることを示す。有限標本に対しては、正規化対象の偏差が$O(1/m^{1/4})$で有界であることを示し、$m$はサンプルの数である。また,明示的な確率モデルと暗黙的な生成モデルを用いて条件付き輸送計画をモデル化する方法についても論じる。最適計画が解析的に知られている合成データセット上の推定器の一貫性を実証的に検証する。治療に対する細胞応答予測の文脈において, プロンプト・ラーニングや条件生成などのアプリケーションで採用すると, 最先端の手法が改善される。 Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation techniques are necessary. Since the main challenge is that the conditional distributions are not explicitly available, the key idea in our OT formulation is to employ kernelized-least-squares terms computed over the joint samples, which implicitly match the transport plan's marginals with the empirical conditionals. Under mild conditions, we prove that our estimated transport plans, as a function of the conditioned variable, are asymptotically optimal. For finite samples, we show that the deviation in terms of our regularized objective is bounded by $O(1/m^{1/4})$, where $m$ is the number of samples. We also discuss how the conditional transport plan could be modelled using explicit probabilistic models as well as using implicit generative ones. 
We empirically verify the consistency of our estimator on synthetic datasets, where the optimal plan is analytically known. When employed in applications like prompt learning for few-shot classification and conditional-generation in the context of predicting cell responses to treatment, our methodology improves upon state-of-the-art methods.	翻訳日:2023-10-26 00:34:55 公開日:2023-10-24
# スクラッチによる文埋め込みの対比学習 Contrastive Learning of Sentence Embeddings from Scratch ( http://arxiv.org/abs/2305.15077v2 ) ライセンス: Link先を確認	Junlei Zhang, Zhenzhong Lan, Junxian He	(参考訳) コントラスト学習は、最先端の文埋め込みを訓練する主要なアプローチである。これまでの研究は、人間の注釈付き自然言語推論(nli)データや、教師なしの大規模非ラベル文を用いて、文埋め込みを学習してきた。しかし、ラベルのないデータであっても、様々な理由から特定のドメインで課題を提起している。これらの問題に対処するために、合成データによる文埋め込みを訓練するコントラスト学習フレームワークSynCSEを提案する。具体的には,(1)ラベルなし文(同期部分)に対する肯定的および否定的アノテーションの生成,(2)対応するアノテーションをスクラッチから生成すること(同期スクラッチ),など,比較学習に必要なデータサンプルを合成する大規模言語モデルの利用について検討する。 SynCSE-partial と SynCSE-scratch はどちらも教師なしベースラインを大幅に上回り、SynCSE-partial は教師付きモデルに匹敵する性能をほとんどの設定で達成している。 Contrastive learning has been the dominant approach to train state-of-the-art sentence embeddings. Previous studies have typically learned sentence embeddings either through the use of human-annotated natural language inference (NLI) data or via large-scale unlabeled sentences in an unsupervised manner. However, even in the case of unlabeled data, their acquisition presents challenges in certain domains due to various reasons. To address these issues, we present SynCSE, a contrastive learning framework that trains sentence embeddings with synthesized data. Specifically, we explore utilizing large language models to synthesize the required data samples for contrastive learning, including (1) producing positive and negative annotations given unlabeled sentences (SynCSE-partial), and (2) generating sentences along with their corresponding annotations from scratch (SynCSE-scratch). Experimental results on sentence similarity and reranking tasks indicate that both SynCSE-partial and SynCSE-scratch greatly outperform unsupervised baselines, and SynCSE-partial even achieves comparable performance to the supervised models in most settings.	翻訳日:2023-10-26 00:34:29 公開日:2023-10-24
# Cheap and Quick: 大規模言語モデルのための効率的な視覚言語指導チューニング Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models ( http://arxiv.org/abs/2305.15023v3 ) ライセンス: Link先を確認	Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji	(参考訳) 近年、人工知能の次のマイルストーンと見なされる視覚言語(vl)学習など、大規模言語モデル(llm)のマルチモーダル能力の拡張への関心が高まっている。しかし、既存のソリューションは非常に高価であり、過剰なパラメータを最適化するだけでなく、VL命令のチューニングの前にも大規模な事前学習が必要である。本稿では,Mixture-of-Modality Adaptation (MMA)と呼ばれる,LLMの有効なVL適応のための,新規で安価なソリューションを提案する。画像エンコーダとLLMを接続するために大きなニューラルネットワークを使用する代わりに、MMAはLLMとVLタスクのギャップを埋めるために、軽量モジュール(アダプタ)を採用する。一方、MMAは、LLMが自然言語理解能力を損なうことなく、シングルモーダル命令とマルチモーダル命令の自動シフトを実現するためのルーティングアルゴリズムも備えている。 mmaを検証するために、llamaと呼ばれる最近のllmに適用し、これをlavinという大きな視覚言語指示モデルと呼ぶ。 mmaとlavinを検証するために,マルチモーダル科学質問応答とマルチモーダル対話という2つの設定で広範な実験を行った。実験結果は,既存のマルチモーダルLLMよりもLaVINの競争性能と訓練効率が優れているだけでなく,汎用チャットボットとしての可能性も確認した。さらに重要なことに、LaVINの実際の支出は極めて安価であり、例えば3.8Mのトレーニング可能なパラメータを持つ訓練時間は1.4時間に過ぎず、MMAの有効性を大きく確認している。私たちのプロジェクトはhttps://luogen1996.github.io/lavinでリリースしています。 Recently, growing interest has been aroused in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive, which not only need to optimize excessive parameters, but also require another large-scale pre-training before VL instruction tuning. In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA). Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables the joint optimization of the image and language models. 
Meanwhile, MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions without compromising their ability of natural language understanding. To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN. To validate MMA and LaVIN, we conduct extensive experiments under two setups, namely multimodal science question answering and multimodal dialogue. The experimental results not only demonstrate the competitive performance and the superior training efficiency of LaVIN than existing multimodal LLMs, but also confirm its great potential as a general-purpose chatbot. More importantly, the actual expenditure of LaVIN is extremely cheap, e.g., only 1.4 training hours with 3.8M trainable parameters, greatly confirming the effectiveness of MMA. Our project is released at https://luogen1996.github.io/lavin.	翻訳日:2023-10-26 00:33:27 公開日:2023-10-24
# ACL OCL Corpus:計算言語学におけるオープンサイエンスの推進 The ACL OCL Corpus: Advancing Open Science in Computational Linguistics ( http://arxiv.org/abs/2305.14996v2 ) ライセンス: Link先を確認	Shaurya Rohatgi, Yanxia Qin, Benjamin Aw, Niranjana Unnithan, Min-Yen Kan	(参考訳) 本稿では、ACLアンソロジーから派生した学術コーパスであるACL OCLを紹介し、計算言語学領域におけるオープン科学研究を支援する。 ACLアンソロジーの以前のバージョンの統合と拡張により、ACL OCLはメタデータ、PDFファイル、引用グラフ、セクション、数字、大きな知識リソースへのリンクを含む構造化されたフルテキストをコントリビュートする(Semantic Scholar)。 ACL OCLは、73Kの論文と210Kの数字を含む70年に及ぶ。我々は、ACL OCLが計算言語学の傾向を観察するためにどのように適用されているかに注目する。教師付きニューラルモデルを用いて論文のトピックを検出することで、"Syntax: Tagging, Chunking and Parsing"への関心が薄れ、"Natural Language Generation"が復活しつつあることに注意する。私たちのデータセットはHuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL)から入手可能です。 We present ACL OCL, a scholarly corpus derived from the ACL Anthology to assist Open scientific research in the Computational Linguistics domain. Integrating and enhancing the previous versions of the ACL Anthology, the ACL OCL contributes metadata, PDF files, citation graphs and additional structured full texts with sections, figures, and links to a large knowledge resource (Semantic Scholar). The ACL OCL spans seven decades, containing 73K papers, alongside 210K figures. We spotlight how ACL OCL applies to observe trends in computational linguistics. By detecting paper topics with a supervised neural model, we note that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural Language Generation" is resurging. Our dataset is available from HuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL).	翻訳日:2023-10-26 00:32:57 公開日:2023-10-24
# Dolphin: アラビア語のNLGのベンチマーク Dolphin: A Challenging and Diverse Benchmark for Arabic NLG ( http://arxiv.org/abs/2305.14989v2 ) ライセンス: Link先を確認	El Moatez Billah Nagoudi, AbdelRahim Elmadany, Ahmed El-Shangiti, Muhammad Abdul-Mageed	(参考訳) 我々は、アラビア語の言語と品種の広範なコレクションに特化した自然言語生成(NLG)評価フレームワークの必要性に対処する新しいベンチマークであるDolphinを紹介する。提案したベンチマークは、対話生成、質問応答、機械翻訳、要約などを含む13種類のNLGタスクを含む。イルカは50のテストスプリットにまたがる40の多様で代表的な公開データセットで構成されており、実世界のシナリオとアラビア語の言語豊かさを反映して注意深くキュレートされている。アラビア語および多言語モデルの性能と一般化能力を評価するための新しい標準を設定し、研究者が現在の方法論の境界を押し上げることを約束する。我々はDolphinを広範囲に分析し、その多様性と現在のアラビアのNLG研究のギャップを明らかにする。また、インタラクティブでモジュール化された公開のリーダーボードを提供し、ベンチマークでいくつかのモデルを評価し、研究者が比較できる強力なベースラインを設定することができます。 We present Dolphin, a novel benchmark that addresses the need for a natural language generation (NLG) evaluation framework dedicated to the wide collection of Arabic languages and varieties. The proposed benchmark encompasses a broad range of 13 different NLG tasks, including dialogue generation, question answering, machine translation, summarization, among others. Dolphin comprises a substantial corpus of 40 diverse and representative public datasets across 50 test splits, carefully curated to reflect real-world scenarios and the linguistic richness of Arabic. It sets a new standard for evaluating the performance and generalization capabilities of Arabic and multilingual models, promising to enable researchers to push the boundaries of current methodologies. We provide an extensive analysis of Dolphin, highlighting its diversity and identifying gaps in current Arabic NLG research. We also offer a public leaderboard that is both interactive and modular and evaluate several models on our benchmark, allowing us to set strong baselines against which researchers can compare.	翻訳日:2023-10-26 00:32:39 公開日:2023-10-24
# キャリブレーションを問う:人間のフィードバックを微調整した言語モデルからキャリブレーションされた信頼スコアを除去するための戦略 Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback ( http://arxiv.org/abs/2305.14975v2 ) ライセンス: Link先を確認	Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning	(参考訳) 信頼に値する実世界の予測システムは、十分に調整された信頼度スコアを生成するべきである。つまり、その回答に対する信頼度は、答えが正しい可能性を示すものでなければならない。近年の研究では、教師なし事前学習が条件付き確率が著しく高い大言語モデル(lms)を生成することが示されている。しかしながら、最も広く使われているLMは、人間のフィードバック(RLHF-LMs)からの強化学習によって微調整されており、RLHF-LMsが極めて低濃度の条件付き確率を生成することを示唆する研究もある。この弱さを考慮し,rlhf-lmsから信頼度スコアを抽出する方法の広範な評価を行った。 ChatGPT, GPT-4, Claude などの RLHF-LM に対して,出力トークンとして出力される言語的信頼度は,TriviaQA, SciQ, TruthfulQA ベンチマークにおけるモデルの条件付き確率よりもよく校正され,期待される校正誤差を50%削減する。 A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are remarkably well-calibrated. However, the most widely-used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities on the TriviaQA, SciQ, and TruthfulQA benchmarks, often reducing the expected calibration error by a relative 50%.	翻訳日:2023-10-26 00:32:22 公開日:2023-10-24
# GRACE: 判別器に誘導された思考連鎖推論 GRACE: Discriminator-Guided Chain-of-Thought Reasoning ( http://arxiv.org/abs/2305.14934v2 ) ライセンス: Link先を確認	Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang	(参考訳) マルチステップ推論(例えばchain-of-thought)の文脈では、言語モデル(LM)は誤ったステップに容易に高い尤度を割り当ててしまう。結果として、解の尤度を最適化するデコーディング戦略は、しばしば不正確な解をもたらす。この問題に対処するため、我々は、正しい推論ステップを生成するようデコードプロセスを誘導する段階的デコーディング手法GRACE(Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator)を提案する。 GRACEは、正しいステップと間違ったステップに対して対照的な損失で訓練された判別器を使用し、復号時にその正確性に基づいて次のステップ候補をスコアする。重要な点として、GRACEはLMトレーニングや微調整を必要とせず、LMからのサンプリングのみを必要とする。 FLAN-T5ファミリーとLLaMAファミリーのモデルを用いて、4つの数学タスクと2つの記号推論タスクでGRACEを評価し、ほとんどの設定で貪欲デコーディング、検証器、自己整合性と比較して、実質的なパフォーマンス向上を示す。さらに自己整合性と組み合わせると、GRACEはすべてのベースラインを大きなマージンで上回る。 GSM8Kに対する人間とLLMの評価は、GRACEが最終回答精度を向上するだけでなく、中間推論の正確性も向上することを示している。我々の実装は https://github.com/mukhal/grace でアクセスできる。 In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optimize for solution likelihood often yield incorrect solutions. To address this issue, we propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. GRACE employs a discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness. Importantly, GRACE only requires sampling from the LM, without the need for LM training or fine-tuning. Using models from FLAN-T5 and LLaMA families, we evaluate GRACE over four math and two symbolic reasoning tasks, where it exhibits substantial performance gains compared to greedy decoding, verifiers, and self-consistency in most settings. When further combined with self-consistency, GRACE outperforms all the baselines by sizeable margins. 
Human and LLM evaluations over GSM8K show that GRACE not only improves the final answer accuracy but also the correctness of the intermediate reasoning. Our implementation can be accessed at https://github.com/mukhal/grace.	翻訳日:2023-10-26 00:31:57 公開日:2023-10-24
# Dynamical nuclear spin polarization in a quantum dot with an electron spin driven by electric dipole spin resonance ( http://arxiv.org/abs/2306.11253v2 ) License: see link	Peter Stano, Takashi Nakajima, Akito Noiri, Seigo Tarucha, Daniel Loss	We analyze the polarization of nuclear spins in a quantum dot induced by a single-electron spin that is electrically driven to perform coherent Rabi oscillations. We derive the associated nuclear-spin polarization rate and analyze its dependence on the accessible control parameters, especially the detuning of the driving frequency from the electron Larmor frequency. The arising nuclear-spin polarization is related to the Hartmann-Hahn effect known from the NMR literature, with two important differences. First, in quantum dots one typically uses a micromagnet, leading to a small deflection of the quantization axes of the electron and nuclear spins. Second, the electric driving wiggles the electron with respect to the atomic lattice. The two effects, absent in the traditional Hartmann-Hahn scenario, give rise to two mechanisms of nuclear-spin polarization in gated quantum dots. The arising nuclear-spin polarization is a resonance phenomenon, achieving maximal efficiency at the resonance of the electron Rabi and nuclear Larmor frequencies (typically a few or a few tens of MHz). As a function of the driving frequency, the polarization rate can develop sharp peaks and reach large values at these peaks. Since the nuclear polarization is experimentally detected as changes of the electron Larmor frequency, we often convert the former to the latter in our formulas and figures. In these units, the polarization rate can reach hundreds of MHz/s in GaAs quantum dots and at least tens of kHz/s in Si quantum dots. We analyze possibilities to exploit the resonant polarization effects for achieving large nuclear polarization and for stabilizing the Overhauser field through feedback.	Translated: 2023-10-26 00:25:50 Published: 2023-10-24
# Segment Any Point Cloud Sequences by Distilling Vision Foundation Models ( http://arxiv.org/abs/2306.09347v2 ) License: see link	Youquan Liu and Lingdong Kong and Jun Cen and Runnan Chen and Wenwei Zhang and Liang Pan and Kai Chen and Ziwei Liu	Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments conducted on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming prior art by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.	Translated: 2023-10-26 00:25:28 Published: 2023-10-24
# Differentially Private Conditional Independence Testing ( http://arxiv.org/abs/2306.06721v2 ) License: see link	Iden Kalemaj, Shiva Prasad Kasiviswanathan, Aaditya Ramdas	Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.	Translated: 2023-10-26 00:24:40 Published: 2023-10-24
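For orientation, here is a small Python sketch of a non-private generalized covariance measure (GCM) test, the statistic on which one of the two procedures in the abstract above is based. It is a baseline under stated assumptions only: it uses a scalar $Z$, ordinary least-squares regressions, and a normal approximation for the p-value, and it does not implement any differential-privacy mechanism. The function names are hypothetical.

```python
import math
import random

def linear_residuals(z, v):
    """Residuals of the least-squares fit v ~ a + b*z (scalar z for simplicity)."""
    n = len(z)
    mz, mv = sum(z) / n, sum(v) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    a = mv - b * mz
    return [vi - (a + b * zi) for zi, vi in zip(z, v)]

def gcm_test(x, y, z):
    """Return (statistic, two-sided p-value) for H0: X independent of Y given Z."""
    rx, ry = linear_residuals(z, x), linear_residuals(z, y)
    prod = [a * b for a, b in zip(rx, ry)]       # products of the two residuals
    n = len(prod)
    mean = sum(prod) / n
    var = sum(p * p for p in prod) / n - mean ** 2
    stat = math.sqrt(n) * mean / math.sqrt(var)  # approximately N(0, 1) under H0
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value

# Synthetic data: y_dep depends on x given z, y_ind does not.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(2000)]
x = [zi + random.gauss(0, 1) for zi in z]
y_dep = [zi + xi + random.gauss(0, 1) for zi, xi in zip(z, x)]
y_ind = [zi + random.gauss(0, 1) for zi in z]

print(gcm_test(x, y_dep, z))
print(gcm_test(x, y_ind, z))
```

The first call should give a very small p-value (reject conditional independence); the second is a draw under the null, so its statistic should look like a standard normal sample.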
# Conformal Prediction for Federated Uncertainty Quantification Under Label Shift ( http://arxiv.org/abs/2306.05131v2 ) License: see link	Vincent Plassier, Mehdi Makni, Aleksandr Rubashevskii, Eric Moulines and Maxim Panov	Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, the topic of uncertainty quantification (UQ) remains only partially addressed. Among UQ methods, conformal prediction (CP) approaches provide distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression that takes privacy constraints into account. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.	Translated: 2023-10-26 00:24:22 Published: 2023-10-24
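The importance-weighting idea mentioned in the abstract above can be illustrated with a centralized, non-private weighted split-conformal sketch in Python. This is not the paper's federated method: there is no quantile regression, no federation, and no differential privacy here. The label weights (assumed to be the ratio of test to calibration label frequencies), the nonconformity scores, and all function names are hypothetical.

```python
def weighted_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted calibration scores,
    with the test point treated as a point mass at +infinity."""
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1 - alpha:
            return score
    return float("inf")

def prediction_set(test_scores, cal_scores, cal_labels, label_weight, alpha=0.1):
    """test_scores maps each candidate label to the test input's nonconformity score."""
    cal_weights = [label_weight[y] for y in cal_labels]
    return {
        y for y, s in test_scores.items()
        if s <= weighted_quantile(cal_scores, cal_weights, label_weight[y], alpha)
    }

cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cal_labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
test_scores = {0: 0.5, 1: 0.95}

# No shift: every label has weight 1.
print(prediction_set(test_scores, cal_scores, cal_labels, {0: 1.0, 1: 1.0}, alpha=0.2))
# Label 1 assumed three times more frequent at test time than in calibration.
print(prediction_set(test_scores, cal_scores, cal_labels, {0: 1.0, 1: 3.0}, alpha=0.2))
```

Up-weighting a label raises the threshold used for that label, so the prediction set becomes more willing to include it; that is the mechanism by which weighting compensates for label shift in this toy setting.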
# Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC ( http://arxiv.org/abs/2306.04712v2 ) License: see link	Rohan Shenoy and Javier Duarte and Christian Herwig and James Hirschauer and Daniel Noonan and Maurizio Pierini and Nhan Tran and Cristina Mantilla Suarez	The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.	Translated: 2023-10-26 00:24:11 Published: 2023-10-24
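To make the quantity in the abstract above concrete, the following Python sketch computes the exact EMD in the simplest setting: two histograms on the same unit-spaced one-dimensional bins with equal total mass. This is only an illustration of what an EMD measures; the paper's CNN approximation and its detector geometry are not reproduced, and `emd_1d` is a hypothetical helper name.

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    distance, carried = 0.0, 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi        # surplus mass that must be moved past this bin
        distance += abs(carried)  # moving it one bin further costs |carried|
    return distance

print(emd_1d([1, 0, 0], [0, 0, 1]))          # all mass moves two bins
print(emd_1d([0.5, 0.5, 0], [0, 0.5, 0.5]))  # half the mass moves two bins
```

Unlike a mean squared error, this distance grows with how far mass has to move, which is why it is attractive as a loss when the spatial distribution of energy matters.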
# State Regularized Policy Optimization on Data with Dynamics Shift ( http://arxiv.org/abs/2306.03552v2 ) License: see link	Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An	In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address this issue by training context encoders to identify environment parameters. Data with dynamics shift are separated according to their environment parameters to train the corresponding policy. However, these methods can be sample inefficient as data are used ad hoc, and policies trained for one dynamics cannot benefit from data collected in all other environments with different dynamics. In this paper, we find that in many environments with similar structures and different dynamics, optimal policies have similar stationary state distributions. We exploit this property and learn the stationary state distribution from data with dynamics shift for efficient data reuse. This distribution is used to regularize the policy trained in a new environment, leading to the SRPO (State Regularized Policy Optimization) algorithm. To conduct theoretical analyses, the intuition of similar environment structures is characterized by the notion of homomorphous MDPs. We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings. Experimental results show that SRPO can make several context-based algorithms far more data efficient and significantly improve their overall performance.	Translated: 2023-10-26 00:23:56 Published: 2023-10-24
# Krylov complexity in a natural basis for the Schrödinger algebra ( http://arxiv.org/abs/2306.03133v3 ) License: see link	Dimitrios Patramanis and Watse Sybesma	We investigate operator growth in quantum systems with two-dimensional Schrödinger group symmetry by studying the Krylov complexity. While this computation is feasible for semisimple Lie algebras, cases such as the Schrödinger algebra, which is characterized by a semi-direct sum structure, are complicated. We propose to compute Krylov complexity for this algebra in a natural orthonormal basis, which produces a pentadiagonal structure of the time evolution operator, in contrast to the usual tridiagonal outcome of the Lanczos algorithm. The resulting complexity behaves as expected. We advocate that this approach can provide insights into other non-semisimple algebras.	Translated: 2023-10-26 00:23:25 Published: 2023-10-24
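As a reading aid for the abstract above, the block below writes out the commonly used definition of Krylov complexity and the difference between a tridiagonal and a pentadiagonal generator. The notation is an assumption made for illustration and is not taken from the paper: $|n\rangle$ is an ordered orthonormal basis, $G$ is the generator of time evolution, and $a_n$, $b_n$, $c_n$ are generic real coefficients.

```latex
% Expansion of the evolving state (or operator) in the ordered basis,
% and the complexity as the mean position along that basis.
\[
  |\psi(t)\rangle = \sum_{n \ge 0} \varphi_n(t)\, |n\rangle ,
  \qquad
  C_K(t) = \sum_{n \ge 0} n\, |\varphi_n(t)|^2 .
\]
% Tridiagonal case (the usual Lanczos outcome): couplings to n and n \pm 1 only.
\[
  G\,|n\rangle = a_n\,|n\rangle + b_{n+1}\,|n+1\rangle + b_n\,|n-1\rangle .
\]
% Pentadiagonal case: additional couplings to n \pm 2.
\[
  G\,|n\rangle = a_n\,|n\rangle + b_{n+1}\,|n+1\rangle + b_n\,|n-1\rangle
               + c_{n+2}\,|n+2\rangle + c_n\,|n-2\rangle .
\]
```

In the pentadiagonal case the amplitudes $\varphi_n(t)$ obey a recursion that links five neighbouring sites instead of three, while the definition of $C_K(t)$ itself is unchanged.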
# ReContrast: Domain-Specific Anomaly Detection via Contrastive Reconstruction ( http://arxiv.org/abs/2306.02602v3 ) License: see link	Jia Guo, Shuai Lu, Lize Jia, Weihang Zhang, Huiqi Li	Most advanced unsupervised anomaly detection (UAD) methods rely on modeling feature representations of frozen encoder networks pre-trained on large-scale datasets, e.g., ImageNet. However, the features extracted from encoders that are borrowed from natural image domains coincide little with the features required in the target UAD domain, such as industrial inspection and medical imaging. In this paper, we propose a novel epistemic UAD method, namely ReContrast, which optimizes the entire network to reduce biases towards the pre-trained image domain and orients the network towards the target domain. We start with a feature reconstruction approach that detects anomalies from errors. Essentially, the elements of contrastive learning are elegantly embedded in feature reconstruction to prevent the network from training instability, pattern collapse, and identical shortcut, while simultaneously optimizing both the encoder and decoder on the target domain. To demonstrate our transfer ability on various image domains, we conduct extensive experiments across two popular industrial defect detection benchmarks and three medical image UAD tasks, which show our superiority over current state-of-the-art methods.	Translated: 2023-10-26 00:23:14 Published: 2023-10-24
# Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation ( http://arxiv.org/abs/2305.17297v2 ) License: see link	Chinmaya Kausik and Kashvi Srivastava and Rishi Sonthalia	Despite the importance of denoising in modern machine learning and ample empirical work on supervised denoising, its theoretical understanding is still relatively scarce. One concern about studying supervised denoising is that one might not always have noiseless training data from the test distribution. It is more reasonable to have access to noiseless training data from a different dataset than the test dataset. Motivated by this, we study supervised denoising and noisy-input regression under distribution shift. We add three considerations to increase the applicability of our theoretical insights to real-life data and modern machine learning. First, while most past theoretical work assumes that the data covariance matrix is full-rank and well-conditioned, empirical studies have shown that real-life data is approximately low-rank. Thus, we assume that our data matrices are low-rank. Second, we drop independence assumptions on our data. Third, the rise in computational power and dimensionality of data have made it important to study non-classical regimes of learning. Thus, we work in the non-classical proportional regime, where data dimension $d$ and number of samples $N$ grow as $d/N = c + o(1)$. For this setting, we derive general test error expressions for both denoising and noisy-input regression, and study when overfitting the noise is benign, tempered or catastrophic. We show that the test error exhibits double descent under general distribution shift, providing insights for data augmentation and the role of noise as an implicit regularizer. We also perform experiments using real-life data, where we match the theoretical predictions with under 1% MSE for low-rank data.	Translated: 2023-10-26 00:22:53 Published: 2023-10-24
# Reducing hyperparameter dependence by external timescale tailoring ( http://arxiv.org/abs/2307.08603v2 ) License: see link	Lina C. Jaurigue and Kathy Lüdge	Task-specific hyperparameter tuning in reservoir computing is an open issue, and is of particular relevance for hardware-implemented reservoirs. We investigate the influence of directly including externally controllable task-specific timescales on the performance and hyperparameter sensitivity of reservoir computing approaches. We show that the need for hyperparameter optimisation can be reduced if timescales of the reservoir are tailored to the specific task. Our results are mainly relevant for temporal tasks requiring memory of past inputs, for example chaotic timeseries prediction. We consider various methods of including task-specific timescales in the reservoir computing approach and demonstrate the universality of our message by looking at both time-multiplexed and spatially multiplexed reservoir computing.	Translated: 2023-10-26 00:14:35 Published: 2023-10-24
# Fractonic Higher-Order Topological Phases in Open Quantum Systems ( http://arxiv.org/abs/2307.05474v2 ) License: see link	Jian-Hao Zhang, Ke Ding, Shuo Yang, Zhen Bi	In this work, we study the generalization of decohered average symmetry-protected topological phases to open quantum systems with a combination of subsystem symmetries and global symmetries. In particular, we provide examples of two types of intrinsic average higher-order topological phases with average subsystem symmetries. A classification scheme for these phases based on generalized anomaly cancellation criteria of average symmetry is also discussed.	Translated: 2023-10-26 00:14:24 Published: 2023-10-24
# RADAR: Robust AI-Text Detection via Adversarial Learning ( http://arxiv.org/abs/2307.03838v2 ) License: see link	Xiaomeng Hu and Pin-Yu Chen and Tsung-Yi Ho	Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary of high-quality text generation between humans and machines. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusations of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a robust AI-text detector via adversarial learning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic content to evade AI-text detection. RADAR uses the feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5-Turbo.	Translated: 2023-10-26 00:14:18 Published: 2023-10-24
# Quantification of Uncertainty with Adversarial Models ( http://arxiv.org/abs/2307.03217v2 ) License: see link	Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Günter Klambauer, Sepp Hochreiter	Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior and a high divergence between their predictions and those of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.	Translated: 2023-10-26 00:13:51 Published: 2023-10-24
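The verbal definition in the abstract above ("an integral of the product between a divergence function and the posterior") can be written out as follows. The symbols are assumptions introduced for illustration and may differ from the paper's notation: $x$ is the input, $w$ the reference model, $\tilde{w}$ a candidate model, $\mathcal{D}$ the training data, and $D(\cdot \,\|\, \cdot)$ an unspecified divergence.

```latex
% Epistemic uncertainty: divergence times posterior, integrated over candidate models.
\[
  u_{\mathrm{epi}}(x)
  = \int D\!\bigl( p(y \mid x, w) \,\big\|\, p(y \mid x, \tilde{w}) \bigr)\,
         p(\tilde{w} \mid \mathcal{D})\, \mathrm{d}\tilde{w} .
\]
% Posterior-sampling estimate: models are drawn where the posterior alone is large.
\[
  u_{\mathrm{epi}}(x) \approx \frac{1}{N} \sum_{n=1}^{N}
    D\!\bigl( p(y \mid x, w) \,\big\|\, p(y \mid x, \tilde{w}_n) \bigr),
  \qquad \tilde{w}_n \sim p(\tilde{w} \mid \mathcal{D}) .
\]
```

The second line is the generic estimator obtained by sampling from the posterior only; according to the abstract, QUAM instead searches for models $\tilde{w}$ at which the full integrand, divergence times posterior, is large.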
# Large-scale Bayesian Structure Learning for Gaussian Graphical Models using Marginal Pseudo-likelihood ( http://arxiv.org/abs/2307.00127v2 ) License: see link	Reza Mohammadi, Marit Schoonhoven, Lucas Vogels, S. Ilker Birbil	Bayesian methods for learning Gaussian graphical models offer a robust framework that addresses model uncertainty and incorporates prior knowledge. Despite their theoretical strengths, the applicability of Bayesian methods is often constrained by computational needs, especially in modern contexts involving thousands of variables. To overcome this issue, we introduce two novel Markov chain Monte Carlo (MCMC) search algorithms that have a significantly lower computational cost than leading Bayesian approaches. Our proposed MCMC-based search algorithms use the marginal pseudo-likelihood approach to bypass the complexities of computing intractable normalizing constants and iterative precision matrix sampling. These algorithms can deliver reliable results in mere minutes on standard computers, even for large-scale problems with one thousand variables. Furthermore, our proposed method is capable of addressing model uncertainty by efficiently exploring the full posterior graph space. Our simulation study indicates that the proposed algorithms, particularly for large-scale sparse graphs, outperform the leading Bayesian approaches in terms of computational efficiency and precision. The implementation supporting the new approach is available through the R package BDgraph.	Translated: 2023-10-26 00:13:33 Published: 2023-10-24
# Improving Fairness in Deepfake Detection ( http://arxiv.org/abs/2306.16635v2 ) License: see link	Yan Ju, Shu Hu, Shan Jia, George H. Chen, Siwei Lyu	Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data utilized to develop deepfake detection models can lead to unfair performance for demographic groups of different races and/or genders. This can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating the unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are agnostic or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.	Translated: 2023-10-26 00:13:13 Published: 2023-10-24
# ViNT: A Foundation Model for Visual Navigation ( http://arxiv.org/abs/2306.14846v2 ) License: see link	Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine	General-purpose pre-trained models ("foundation models") have enabled practitioners to produce generalizable solutions for individual machine learning problems with datasets that are significantly smaller than those required for learning from scratch. Such models are typically trained on large and diverse datasets with weak supervision, consuming much more training data than is available for any individual downstream application. In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation. ViNT is trained with a general goal-reaching objective that can be used with any navigation dataset, and employs a flexible Transformer-based architecture to learn navigational affordances and enable efficient adaptation to a variety of downstream navigational tasks. ViNT is trained on a number of existing navigation datasets, comprising hundreds of hours of robotic navigation from a variety of different robotic platforms, and exhibits positive transfer, outperforming specialist models trained on singular datasets. ViNT can be augmented with diffusion-based subgoal proposals to explore novel environments, and can solve kilometer-scale navigation problems when equipped with long-range heuristics. ViNT can also be adapted to novel task specifications with a technique inspired by prompt-tuning, where the goal encoder is replaced by an encoding of another task modality (e.g., GPS waypoints or routing commands) embedded into the same space of goal tokens. This flexibility and ability to accommodate a variety of downstream problem domains establishes ViNT as an effective foundation model for mobile robotics. For videos, code, and model checkpoints, see our project page at https://visualnav-transformer.github.io.	Translated: 2023-10-26 00:12:55 Published: 2023-10-24
# Superposing compass states for asymptotic isotropic sub-Planck phase-space sensitivity ( http://arxiv.org/abs/2306.13182v2 ) License: see link	Atharva Shukla, Barry C. Sanders	Compass states deliver sub-Planck phase-space structure in the sense that sensitivity to phase-space displacement is superior to the sensitivity of displacing the vacuum state in any direction, but this sensitivity is anisotropic: sensitivity is better for some directions of phase-space displacement than for others. Here we introduce generalised compass states as superpositions of $n$ compass states, with each oriented by $\frac\pi{2n}$ with respect to its predecessor. Specifically, we derive Wigner functions for these generalised compass states and approximate closed-form expressions for overlaps between generalised compass states and their displaced counterparts. Furthermore, we show that generalised compass states, in the limit $n\to\infty$, display isotropic sensitivity to phase-space displacement in any direction.	Translated: 2023-10-26 00:12:26 Published: 2023-10-24
# 監視系のレプリカ極限における捉えにくい相転移 Elusive phase transition in the replica limit of monitored systems ( http://arxiv.org/abs/2306.12166v2 ) ライセンス: Link先を確認	Guido Giachetti and Andrea De Luca	(参考訳) 各スピンがランダムな方向のスピン成分の弱測定によって常に摂動される、ペアワイズな全対全ノイズ相互作用を持つ $N$ 個のスピン$1/2$粒子系において、厳密に解ける監視ダイナミクスのモデルを研究する。我々はレプリカトリックを利用して、純粋化やその他の可観測量の研究における測定結果のボルン則による重み付けを考慮し、大$N$極限における厳密な記述を与える。相転移の性質は計算に使用されるレプリカの数 $n$ に大きく依存しており、関連する $n \rightarrow 1$ のレプリカ極限では、非もつれ/純粋化相を破壊する非摂動的な対数補正が現れる。具体的には、弱測定相における混合状態の純粋化時間は、任意に強い測定レートに対しても、システムサイズに関して常に指数関数的に長いことを観察する。 We study an exactly solvable model of monitored dynamics in a system of $N$ spin-$1/2$ particles with pairwise all-to-all noisy interactions, where each spin is constantly perturbed by weak measurements of the spin component in a random direction. We make use of the replica trick to account for the Born-rule weighting of the measurement outcomes in the study of purification and other observables, with an exact description in the large-$N$ limit. We find that the nature of the phase transition strongly depends on the number $n$ of replicas used in the calculation, with the appearance of non-perturbative logarithmic corrections that destroy the disentangled/purifying phase in the relevant $n \rightarrow 1$ replica limit. Specifically, we observe that the purification time of a mixed state in the weak measurement phase is always exponentially long in the system size for arbitrarily strong measurement rates.	翻訳日:2023-10-26 00:12:08 公開日:2023-10-24
# 構造に基づく薬物設計のための幾何学的深層学習の体系的調査 A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design ( http://arxiv.org/abs/2306.11768v5 ) ライセンス: Link先を確認	Zaixi Zhang, Jiaxian Yan, Qi Liu, Enhong Chen, and Marinka Zitnik	(参考訳) structure-based drug design (sbdd) はタンパク質の3次元形状を利用して薬物候補を同定する。物理化学的モデリングに基礎を置き、ドメインの専門知識によって情報を得る伝統的な手法は資源集約である。幾何学的深層学習の最近の進歩は、AlphaFoldのようなツールによる正確なタンパク質の3D構造予測の可用性と合わせて、3D幾何学的データの統合と処理に焦点を当て、構造に基づく薬物設計の分野を大きく進歩させた。本稿では,SBDDにおける幾何学的深層学習の現状を体系的にレビューする。まず,SBDDの基本課題を概説し,3Dタンパク質の表現を詳細に説明し,代表的予測モデルと生成モデルを強調した。次に、結合部位予測、結合ポーズ生成、 \emph{de novo} 分子生成、リンカ設計、結合親和性予測など、各キータスクの詳細なレビューを行う。形式的な問題定義を提供し,各タスクの代表的な方法,データセット,評価指標,パフォーマンスベンチマークを概説する。 Finally, we summarize the current challenges and future opportunities: current challenges in SBDD include oversimplified problem formulations, inadequate out-of-distribution generalization, a lack of reliable evaluation metrics and large-scale benchmarks, and the need for experimental verification and enhanced model understanding; opportunities include leveraging multimodal datasets, integrating domain knowledge, building comprehensive benchmarks, designing criteria based on clinical endpoints, and developing foundation models that broaden the range of design tasks. また、進行中のコントリビューションとSBDDの新しいデータセットを反映して、 \url{https://github.com/zaixizhang/Awesome-SBDD}をキュレートします。 Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates. Traditional methods, grounded in physicochemical modeling and informed by domain expertise, are resource-intensive. Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have greatly advanced the field of structure-based drug design. This paper systematically reviews the current state of geometric deep learning in SBDD. 
We first outline foundational tasks in SBDD, detail prevalent 3D protein representations, and highlight representative predictive and generative models. We then offer in-depth reviews of each key task, including binding site prediction, binding pose generation, \emph{de novo} molecule generation, linker design, and binding affinity prediction. We provide formal problem definitions and outline each task's representative methods, datasets, evaluation metrics, and performance benchmarks. Finally, we summarize the current challenges and future opportunities: current challenges in SBDD include oversimplified problem formulations, inadequate out-of-distribution generalization, a lack of reliable evaluation metrics and large-scale benchmarks, and the need for experimental verification and enhanced model understanding; opportunities include leveraging multimodal datasets, integrating domain knowledge, building comprehensive benchmarks, designing criteria based on clinical endpoints, and developing foundation models that broaden the range of design tasks. We also curate \url{https://github.com/zaixizhang/Awesome-SBDD}, reflecting ongoing contributions and new datasets in SBDD.	翻訳日:2023-10-26 00:11:49 公開日:2023-10-24
# 生成的行動クローニングのための証明可能保証--低レベル安定性と高レベル行動の橋渡し Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior ( http://arxiv.org/abs/2307.14619v5 ) ライセンス: Link先を確認	Adam Block, Ali Jadbabaie, Daniel Pfrommer, Max Simchowitz, Russ Tedrake	(参考訳) 生成モデルを用いた複雑な専門家による実験の行動クローニングに関する理論的枠組みを提案する。我々のフレームワークは、専門家によるデモンストレーションの模倣を安定化させるために、低レベルのコントローラ(位置命令制御の学習または暗黙)を呼び出す。私たちはそれを示します a) 適切な低レベルの安定保証及び b) 擬似学習者として十分強力な生成モデルである純粋教師付き行動クローニングは, 基本的に任意の専門的軌跡の時間毎のステップ分布を最適な輸送コストで生成することができる。我々の分析は、学習方針の確率的連続性(英語版)(total variation continuity、TVC)に依存している。次に、一般的なデータ拡張レジームと新しいアルゴリズムのトリックを組み合わせることで、TVCが最小限の精度の劣化で確保できることを示し、実行時に拡張ノイズを追加する。拡散モデルによりパラメータ化されたポリシーの保証をインスタンス化し、学習者が(雑音増大した)エキスパートポリシーのスコアを正確に推定した場合、擬似軌道の分布は自然の最適輸送距離における演者分布に近くなることを示す。提案手法は,無関心な手法である雑音提示トラジェクタ間の複雑なカップリングを構成する。本稿では,アルゴリズムの推薦を実証的に検証し,生成モデルによる行動クローニングの改善に向けた今後の研究の方向性について論じる。 We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a powerful enough generative model as our imitation learner, pure supervised behavior cloning can generate trajectories matching the per-time step distribution of essentially arbitrary expert trajectories in an optimal transport cost. Our analysis relies on a stochastic continuity property of the learned policy we call "total variation continuity" (TVC). We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. 
We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations, and discussing implications for future research directions for better behavior cloning with generative modeling.	翻訳日:2023-10-26 00:07:13 公開日:2023-10-24
# ランダムウォークによる異常検出に対する結合空間攻撃 Coupled-Space Attacks against Random-Walk-based Anomaly Detection ( http://arxiv.org/abs/2307.14387v2 ) ライセンス: Link先を確認	Yuni Lai, Marcin Waniek, Liying Li, Jingwen Wu, Yulin Zhu, Tomasz P. Michalak, Talal Rahwan, Kai Zhou	(参考訳) ランダムウォークスに基づく異常検出(RWAD)は、様々なアプリケーションにおいて異常パターンを特定するために一般的に用いられる。 RWADの興味深い特徴は、入力グラフが事前に存在するか、生の特徴から構築できることである。その結果、RWADに対する潜在的な攻撃面は2つあり、グラフ空間攻撃と特徴空間攻撃である。本稿では,実用的な結合空間攻撃を設計し,グラフ空間と特徴空間攻撃の相互作用について検討する。この目的のために、我々は徹底的な複雑性解析を行い、RWAD攻撃がNPハードであることを証明した。そこで我々は,グラフ空間攻撃を二段階最適化問題として定式化し,それを解決するための2つの戦略を提案する。最後に、より強力な特徴空間攻撃(グラフ誘導攻撃)を設計するためのガイダンスとしてグラフ空間攻撃の結果を利用する。包括的実験により,提案する攻撃は,限られた攻撃予算でターゲットノードがRWADによる検出を回避できるようにするのに有効であることを示す。さらに,ブラックボックス設定で転送攻撃実験を行い,対象ノードの異常スコアを有意に減少させることを示した。本研究では,グラフ空間が特徴空間に依存するグラフ異常検出に対する結合空間攻撃の研究の扉を開く。 Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical coupled-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade detection by RWAD with a limited attack budget. 
In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the coupled-space attack against graph anomaly detection in which the graph space relies on the feature space.	翻訳日:2023-10-26 00:06:37 公開日:2023-10-24
# 多類分類における平均ケースロバストネスの効率的な推定 Efficient Estimation of Average-Case Robustness for Multi-Class Classification ( http://arxiv.org/abs/2307.13885v3 ) ライセンス: Link先を確認	Tessa Han, Suraj Srinivas, Himabindu Lakkaraju	(参考訳) 機械学習におけるロバスト性は、逆条件でよく研究されるが、実世界のノイズ(測定ノイズなど)は逆条件ではなくランダムである。このような雑音下でのモデル行動は、平均ケースロバスト性、すなわち入力周辺の局所領域で一貫した予測を得る確率によって捉えられる。しかしながら、モンテカルロサンプリングに基づく平均ケースロバストネスを計算するナイーブなアプローチは、特に高次元データでは統計的に非効率であり、大規模アプリケーションでは計算コストがかかる。本研究では,マルチクラス判別モデルの平均ケースロバストネスを効率的に計算する最初の解析推定器を開発した。これらの推定器は入力周辺の局所領域のモデルを線形化し、結果の線形モデルのロバスト性を解析的に計算する。これらの推定器が標準ディープラーニングモデルのロバストネスを効率的に計算し、ロバスト性バイアスの測定やノイズの摂動に弱いデータセットの同定など、ロバストネスに関わる様々なタスクにおいてこれらの推定器の有用性を示す。そこで本研究では,ロバストネスのための新しいフレームワークを提案するだけでなく,下流アプリケーションにおける平均ケースロバストネスの利用を可能にし,その計算を実用的なものにする。 Robustness in machine learning is commonly studied in the adversarial setting, yet real-world noise (such as measurement noise) is random rather than adversarial. Model behavior under such noise is captured by average-case robustness, i.e., the probability of obtaining consistent predictions in a local region around an input. However, the naïve approach to computing average-case robustness based on Monte-Carlo sampling is statistically inefficient, especially for high-dimensional data, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute average-case robustness of multi-class discriminative models. These estimators linearize models in the local region around an input and analytically compute the robustness of the resulting linear models. We show empirically that these estimators efficiently compute the robustness of standard deep learning models and demonstrate these estimators' usefulness for various tasks involving robustness, such as measuring robustness bias and identifying dataset samples that are vulnerable to noise perturbation. 
In doing so, this work not only proposes a new framework for robustness, but also makes its computation practical, enabling the use of average-case robustness in downstream applications.	翻訳日:2023-10-26 00:06:11 公開日:2023-10-24
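The contrast drawn in this abstract between Monte-Carlo sampling and an analytical estimate can be illustrated on a toy case. The sketch below is not the authors' estimator: it assumes isotropic Gaussian input noise, a purely linear classifier, and only two classes, where the probability of a consistent prediction has the closed form $\Phi(|m| / (\sigma \lVert w_0 - w_1 \rVert))$ with margin $m$. All function names are hypothetical.

```python
import math
import random


def predict(weights, biases, x):
    """Return the arg-max class of a linear classifier (one weight row per class)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def monte_carlo_robustness(weights, biases, x, sigma, n_samples=20000, seed=0):
    """Naive estimate: fraction of Gaussian-perturbed inputs that keep the clean label."""
    rng = random.Random(seed)
    clean = predict(weights, biases, x)
    kept = 0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        kept += predict(weights, biases, noisy) == clean
    return kept / n_samples


def analytic_robustness_two_class(weights, biases, x, sigma):
    """Closed form for two linear classes: Phi(|margin| / (sigma * ||w_0 - w_1||))."""
    dw = [a - b for a, b in zip(weights[0], weights[1])]
    margin = sum(d * xi for d, xi in zip(dw, x)) + biases[0] - biases[1]
    z = abs(margin) / (sigma * math.sqrt(sum(d * d for d in dw)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For a nonlinear model, the abstract's approach would first linearize the model around the input; that step, and the multi-class case, are outside this sketch.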
# WebArena: 自律エージェント構築のための現実的なWeb環境 WebArena: A Realistic Web Environment for Building Autonomous Agents ( http://arxiv.org/abs/2307.13854v2 ) ライセンス: Link先を確認	Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig	(参考訳) 生成AIの進歩により、自律エージェントは自然言語コマンドを通じて日々のタスクを管理することが可能になった。しかし、現在のエージェントは主に単純な合成環境で作成され、テストされ、現実世界のシナリオと切り離される。本稿では,現実的で再現性の高い言語誘導エージェントのための環境を構築する。具体的には、web上でタスクを行うエージェントに注目し、eコマース、ソーシャルフォーラムの議論、共同ソフトウェア開発、コンテンツ管理という4つの共通ドメインから完全に機能するwebサイトを構築する。私たちの環境は、人間のようなタスク解決を促進するツール(地図など)と外部知識ベース(ユーザマニュアルなど)で豊かになっています。私たちの環境に基づいて、タスク完了の機能的正確性を評価することに焦点を当てた一連のベンチマークタスクをリリースします。私たちのベンチマークのタスクは多様で、長い水平で、人間が日常的にインターネット上で実行するタスクをエミュレートするように設計されています。我々はいくつかのベースラインエージェントを実験し、行動前に推論などの最近の手法を統合する。 GPT-4をベースとしたベストエージェントは、エンド・ツー・エンドのタスク成功率14.41%に過ぎず、人間のパフォーマンス78.24%よりも大幅に低い。これらの結果は、ロバストなエージェントのさらなる開発の必要性、現在の最先端の大規模言語モデルが実際のタスクにおける完全なパフォーマンスには程遠いこと、そして、webarenaがそのような進歩を測定するために使用できることを浮き彫りにしている。 With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. 
The tasks in our benchmark are diverse, long-horizon, and designed to emulate tasks that humans routinely perform on the internet. We experiment with several baseline agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 14.41%, significantly lower than the human performance of 78.24%. These results highlight the need for further development of robust agents, that current state-of-the-art large language models are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress.	翻訳日:2023-10-26 00:05:32 公開日:2023-10-24
# 交通信号制御のためのSim-to-Real転送に向けた不確実な接地行動変換 Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control ( http://arxiv.org/abs/2307.12388v2 ) ライセンス: Link先を確認	Longchao Da, Hao Mei, Romir Sharma and Hua Wei	(参考訳) 交通信号制御(tsc)は、数百万人の日常生活に影響を与える複雑で重要なタスクである。強化学習(rl)は交通信号制御の最適化に有望な結果を示しているが、現在のrlベースのtsc法は主にシミュレーションで訓練され、シミュレーションと実世界のパフォーマンスギャップに苦しむ。本稿では, シミュレーション中の動作を不確実性で動的に変換することで, シミュレーション環境から実世界環境へ学習した学習方針を伝達し, 遷移力学の領域ギャップを緩和する, UGAT と呼ばれるシミュレーションから実世界への移行手法を提案する。本手法をシミュレーションした交通環境において評価し,実環境におけるトランスファーrlポリシーの性能を著しく向上させることを示す。 Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.	翻訳日:2023-10-26 00:05:04 公開日:2023-10-24
# 表面電子に基づく非断熱的ホロノミック量子ゲート Nonadiabatic holonomic quantum gates based on the surface electron ( http://arxiv.org/abs/2307.09900v3 ) ライセンス: Link先を確認	Jun Wang, Hai-Bo Wang, Qing Ai	(参考訳) 幾何学位相に基づく非線形ホロノミック量子計算は、内蔵ノイズとデコヒーレンスに対して堅牢である。本研究では, 量子計算のための有望な2次元プラットフォームである表面電子系において, 非断熱ホロノミック量子ゲートを実現するためのスキームを理論的に提案する。ホロノミックゲートは、リドベルク状態とスピン状態が不均一磁場を介して結合する3層構造によって実現される。循環進化の後、計算基盤は異なる幾何学的位相を拾い上げ、幾何学的ゲートを実行する。スピンアップした電子のみが幾何ゲートを体験し、スピンダウンした電子は状態選択駆動場から分離される。ライドバーグ状態とスピン状態にエンコードされた任意の制御uゲートを実現できる。出力状態の忠実度は、実験的に達成可能なパラメータで 0.99 を超える。 The nonadiabatic holonomic quantum computation based on the geometric phase is robust against the built-in noise and decoherence. In this work, we theoretically propose a scheme to realize nonadiabatic holonomic quantum gates in a surface electron system, which is a promising two-dimensional platform for quantum computation. The holonomic gate is realized by a three-level structure that combines the Rydberg states and spin states via an inhomogeneous magnetic field. After a cyclic evolution, the computation bases pick up different geometric phases and thus perform a geometric gate. Only the electron with spin up experiences the geometric gate, while the electron with spin down is decoupled from the state-selective driving fields. The arbitrary controlled-U gate encoded on the Rydberg states and spin states can then be realized. The fidelity of the output state exceeds 0.99 with experimentally achievable parameters.	翻訳日:2023-10-26 00:04:25 公開日:2023-10-24
# 曖昧な基底真理の下での共形予測 Conformal prediction under ambiguous ground truth ( http://arxiv.org/abs/2307.09302v2 ) ライセンス: Link先を確認	David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet	(参考訳) Conformal Prediction (CP) は、$\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y\|X}$ からのキャリブレーションデータ $(X_1,Y_1),...,(X_n,Y_n)$ に基づいて、ユーザが選択した $\alpha \in [0,1]$ に対して $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ を満たす予測セット $C(X)$ を構築することで、厳密な不確実性定量化を可能にする。通常、$\mathbb{P}^{Y\|X}$ は「真の」事後ラベル分布であると暗黙的に仮定される。しかし、多くの実世界のシナリオにおいて、ラベルの$Y_1, ..., Y_n$は投票手順を用いて専門家の意見を集約することで得られ、結果として1ホット分布の$\mathbb{P}_{vote}^{Y\|X}$となる。そのような「投票された」ラベルに対して、CP の保証は、真の分布 $\mathbb{P}$ ではなく $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y\|X}$ に関するものとなる。基底真理ラベルが曖昧でない場合、$\mathbb{P}_{vote}$と$\mathbb{P}$の区別は重要でない。しかし、不明瞭なラベルのために専門家が同意しない場合、$\mathbb{P}^{Y\|X}$を1ホット分布 $\mathbb{P}_{vote}^{Y\|X}$ で近似すると、この不確実性は無視される。本稿では、非退化分布 $\mathbb{P}_{agg}^{Y\|X}$ を用いて、専門家の意見を利用して $\mathbb{P}^{Y\|X}$ を近似することを提案する。それぞれのキャリブレーション例$X_1, ..., X_n$に対して, $\mathbb{P}_{agg}^{Y\|X}$から複数の合成擬似ラベルをサンプリングすることにより, $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y\|X}$ に関する保証を与えるモンテカルロCP手法を開発する。専門家アノテータ間で大きな不一致を伴う皮膚条件分類のケーススタディでは、$\mathbb{P}_{vote}$ に関するCPが専門家アノテーションを過小にカバーすることを示す。$72\%$のカバレッジに較正した場合、平均で$10\%$不足する。我々のモンテカルロCPは、このギャップを経験的にも理論的にも埋める。 Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$ by relying on calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y\|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y\|X}$ is the "true" posterior label distribution. 
However, in many real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y\|X}$. For such ``voted'' labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y\|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y\|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y\|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y\|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y\|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y\|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y\|X}$ for each calibration example $X_1,...,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by on average $10\%$; our Monte Carlo CP closes this gap both empirically and theoretically.	翻訳日:2023-10-26 00:04:13 公開日:2023-10-24
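The calibration idea in this abstract (sampling several pseudo-labels per calibration example from an aggregated expert distribution) can be sketched with a plain split-conformal quantile. This is a simplified illustration only: the score $1 - p(y \mid x)$, the function names, and the pooled quantile rule are assumptions, and the sketch does not reproduce the paper's exact procedure or its coverage guarantee.

```python
import math
import random


def monte_carlo_scores(model_probs, agg_label_dists, m, seed=0):
    """Sample m pseudo-labels per calibration example from the aggregated
    expert distribution and return the pooled scores 1 - p_model(label)."""
    rng = random.Random(seed)
    scores = []
    for probs, dist in zip(model_probs, agg_label_dists):
        labels = rng.choices(range(len(dist)), weights=dist, k=m)
        scores.extend(1.0 - probs[y] for y in labels)
    return scores


def conformal_threshold(scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1)(1 - alpha))-th
    smallest score, or infinity when that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

With one-hot rows in `agg_label_dists` the sampling step collapses to the voted label, which corresponds to the $\mathbb{P}_{vote}$ setting the abstract argues against.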
# 量子コヒーレンスと微視的可逆性の原理 Quantum coherence and the principle of microscopic reversibility ( http://arxiv.org/abs/2307.08792v2 ) ライセンス: Link先を確認	K. Khan, W. F. Magalhaes, Jailson S. Araujo, B. de Lima Bernardo and Gabriel H. Aguilar	(参考訳) 微視的可逆性の原理は、ゆらぎ関係とオンサガーの相互関係の定式化の基本的な要素である。したがって、この原理が量子力学のシナリオにどのように適合するかを明確に記述することは、非平衡量子過程をよりよく理解するために重要である。本稿では、量子遷移を観測する確率と対応する時間反転過程との対称性関係においてコヒーレンスが果たす役割を強調する、この原理の量子一般化を提案する。本研究では,温熱貯留層と相互作用する量子ビット系の枠組みにおける知見の意義について検討し,そのダイナミクスをシミュレートする光学実験を実施する。理論および実験の結果, 低温ではコヒーレンスの影響がより決定的であり, 古典の場合からの最大離脱は最大コヒーレント状態に対しては起こらないことがわかった。古典的な予測は適切な範囲で回復される。 The principle of microscopic reversibility is a fundamental element in the formulation of fluctuation relations and the Onsager reciprocal relations. As such, a clear description of whether and how this principle is adapted to the quantum mechanical scenario might be essential to a better understanding of nonequilibrium quantum processes. Here, we propose a quantum generalization of this principle, which highlights the role played by coherence in the symmetry relations involving the probability of observing a quantum transition and that of the corresponding time reversed process. We study the implications of our findings in the framework of a qubit system interacting with a thermal reservoir, and implement an optical experiment that simulates the dynamics. Our theoretical and experimental results show that the influence of coherence is more decisive at low temperatures and that the maximum departure from the classical case does not take place for maximally coherent states. Classical predictions are recovered in the appropriate limits.	翻訳日:2023-10-26 00:02:41 公開日:2023-10-24
# LLM自己防衛:自己検査によって、LLMは自分が騙されていることを知る LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked ( http://arxiv.org/abs/2308.07308v3 ) ライセンス: Link先を確認	Mansi Phute, Alec Helbling, Matthew Hull, ShengYun Peng, Sebastian Szyller, Cory Cornelius and Duen Horng Chau	(参考訳) 大規模言語モデル(LLM)は高品質なテキスト生成に人気があるが、強化学習を通じて人的価値に合わせる場合でも有害なコンテンツを生成できる。敵のプロンプトは安全対策を回避できる。 LLM自己防衛(LLM Self Defense)は、LLMに誘導された応答をスクリーニングさせることでこれらの攻撃を防御する簡単な手法である。本手法では,微調整や入力前処理,反復的な出力生成は不要である。代わりに、生成されたコンテンツを事前定義されたプロンプトに組み込んで、LLMの別のインスタンスを使用してテキストを分析し、それが有害かどうかを予測します。我々は、現在最も著名なLLMであるGPT 3.5とLlama 2を対象に、プロンプトに対する肯定的な応答を強制的に誘導する攻撃やプロンプトエンジニアリング攻撃など、様々な種類の攻撃に対してLLM Self Defenseを試験する。特に、LLM Self Defense は GPT 3.5 と Llama 2 を用いて攻撃成功率を事実上 0 に下げることに成功した。 Large language models (LLMs) are popular for high-quality text generation but can produce harmful content, even when aligned with human values through reinforcement learning. Adversarial prompts can bypass their safety measures. We propose LLM Self Defense, a simple approach to defend against these attacks by having an LLM screen the induced responses. Our method does not require any fine-tuning, input preprocessing, or iterative output generation. Instead, we incorporate the generated content into a pre-defined prompt and employ another instance of an LLM to analyze the text and predict whether it is harmful. We test LLM Self Defense on GPT 3.5 and Llama 2, two of the current most prominent LLMs, against various types of attacks, such as forcefully inducing affirmative responses to prompts and prompt engineering attacks. Notably, LLM Self Defense succeeds in reducing the attack success rate to virtually 0 using both GPT 3.5 and Llama 2.	翻訳日:2023-10-25 23:54:51 公開日:2023-10-24
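The screening step described in this abstract (embed the generated text in a pre-defined prompt and ask a second LLM instance whether it is harmful) can be sketched as follows. The LLMs are passed in as plain callables from prompt string to reply string; the prompt wording, the "Yes"/"No" parsing, the refusal message, and all names are hypothetical placeholders rather than the authors' implementation.

```python
# Hypothetical screening prompt; the paper's actual wording is not given here.
FILTER_PROMPT = (
    "Does the following text contain harmful content? "
    "Answer 'Yes' or 'No'.\n\n{text}"
)


def is_harmful(response_text, filter_llm):
    """Ask a second LLM instance to classify the generated text."""
    verdict = filter_llm(FILTER_PROMPT.format(text=response_text))
    return verdict.strip().lower().startswith("yes")


def self_defense(user_prompt, generator_llm, filter_llm,
                 refusal="Sorry, I can't help with that."):
    """Generate a response, then return it only if the filter judges it harmless."""
    response = generator_llm(user_prompt)
    if is_harmful(response, filter_llm):
        return refusal
    return response
```

Because the filter only sees the generated text, this design needs no fine-tuning or input preprocessing, which matches the properties the abstract claims for the method.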
# セマンティックスを超えて:自己教師型学習による行動強化関連モデル学習 Beyond Semantics: Learning a Behavior Augmented Relevance Model with Self-supervised Learning ( http://arxiv.org/abs/2308.05379v4 ) ライセンス: Link先を確認	Zeyuan Chen, Wei Chen, Jia Xu, Zhongyi Liu, Wei Zhang	(参考訳) 関連モデリングは,検索エンジンがユーザエクスペリエンスを確保する上で重要な,対応するクエリに対して望ましい項目を見つけることを目的としている。ほとんどの従来の手法では、クエリとアイテム間のセマンティックな類似性を評価することでこの問題に対処するが、純粋なセマンティックマッチングは、すべてではない。実際、検索ログのユーザ履歴行動データから抽出された補助的なクエリ-イテム相互作用は、ユーザの検索意図をさらに明らかにするためのヒントを与えることができる。そこで我々は,Alipay Search (BARL-ASe) のための新しい行動拡張関連学習モデルを提案し,ターゲットクエリの隣のクエリと隣のクエリの隣のクエリを利用して,ターゲットクエリと項目のセマンティックマッチングを補完する。具体的には,隣接と対象の両方のビューから粗粒度および細粒度の意味表現を蒸留するマルチレベルコアテンションを構築した。このモデルはその後,BARL-ASeの精度とロジット学習の強化により頑健性を向上させるために,隣接目標の自己教師型学習を採用する。さらに、alipayのミニアプリの検索シナリオのロングテールクエリ項目マッチングを実際に扱う方法について論じる。実業界データとオンラインa/bテストによる実験により,提案手法が低レイテンシで有望な性能を実現することを実証した。 Relevance modeling aims to locate desirable items for corresponding queries, which is crucial for search engines to ensure user experience. Although most conventional approaches address this problem by assessing the semantic similarity between the query and item, pure semantic matching is not everything. In reality, auxiliary query-item interactions extracted from user historical behavior data of the search log could provide hints to reveal users' search intents further. Drawing inspiration from this, we devise a novel Behavior Augmented Relevance Learning model for Alipay Search (BARL-ASe) that leverages neighbor queries of target item and neighbor items of target query to complement target query-item semantic matching. Specifically, our model builds multi-level co-attention for distilling coarse-grained and fine-grained semantic representations from both neighbor and target views. The model subsequently employs neighbor-target self-supervised learning to improve the accuracy and robustness of BARL-ASe by strengthening representation and logit learning. 
Furthermore, we discuss how to deal with the long-tail query-item matching of the mini apps search scenario of Alipay practically. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance with low latency.	翻訳日:2023-10-25 23:54:31 公開日:2023-10-24
# モデルのモデル -- その1 Model of models -- Part 1 ( http://arxiv.org/abs/2308.04600v2 ) ライセンス: Link先を確認	Shimon Komarovsky	(参考訳) 本稿では,AGIエージェントの主成分として機能する新しい認知モデルを提案する。このモデルは、成熟したインテリジェンス状態に導入され、以前のモデルであるDENN、特にAKREMの拡張として、運用モデル(フレーム/クラス)と意志を含む。このモデルの中核的な仮定は、認知は蓄積された知識を操作することであり、適切な意志のガイダンスである。また、知識の一部である行動が、成熟した知性状態に先行する進化段階において、意志に沿うことを学習していると仮定する。さらに、このモデルは、トップダウンとボトムアップの両方のモデル学習、一般化と特殊化など、既知のすべての知的側面における双対性原理に基づいている。さらに、AGI設計には全体論的アプローチが提唱され、再利用性とシンプルさという形で制約や効率性の下での認知が提案される。最後に、この成熟状態に達するには、統合原理を利用して、幼児から成人への認知的進化を通して記述する。この認知モデルの最終的な製品は、モデルとインスタンスの動的操作メモリである。最後に、成熟状態に達する進化段階のいくつかの例と予備的なアイデアを示す。 This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization versus specialization, and more. Furthermore, a holistic approach is advocated for AGI designing, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.	翻訳日:2023-10-25 23:54:09 公開日:2023-10-24
# MM-Vet:統合能力のための大規模マルチモーダルモデルの評価 MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities ( http://arxiv.org/abs/2308.02490v3 ) ライセンス: Link先を確認	Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang	(参考訳) 複雑なマルチモーダルタスクにおける大規模マルチモーダルモデル(LMM)の評価ベンチマークであるMM-Vetを提案する。近年のLMMは、黒板に書かれた数学の問題を解くこと、ニュース画像の出来事や有名人を推論すること、視覚的ジョークを説明することなど、様々な興味深い能力を示している。迅速なモデル開発は、評価ベンチマークの開発に課題をもたらす。課題は,(1)複雑なマルチモーダルタスクを体系的に構造化し,評価する方法,(2)質問や回答のタイプでうまく機能する評価指標を設計する方法,(3)単純なパフォーマンスランキングを超えたモデルインサイトを提供する方法。この目的のために、複雑なタスクを解く興味深い能力は、様々なコアビジョン言語(VL)機能を統合できる一般モデルによってしばしば達成されるという知見に基づいて設計されたMM-Vetを提案する。 MM-Vetは6つのコアVL機能を定義し、機能の組み合わせから導かれる16の関心統合を検証している。評価指標として,オープンエンド出力のためのLLMに基づく評価器を提案する。評価器は、異なる質問タイプと回答スタイルで評価が可能であり、その結果、統一されたスコアリング基準となる。 MM-Vetにおける代表的LMMを評価し、異なるLMMシステムパラダイムとモデルの能力に関する洞察を提供する。コードとデータはhttps://github.com/yuweihao/MM-Vetで公開されている。 We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. 
The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.	翻訳日:2023-10-25 23:53:23 公開日:2023-10-24
# Baby Llama: パフォーマンスペナルティのない小さなデータセットで訓練された教師のアンサンブルからの知識蒸留 Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty ( http://arxiv.org/abs/2308.02019v2 ) ライセンス: Link先を確認	Inar Timiryasov and Jean-Loup Tastet	(参考訳) 言語モデルのサンプル効率を向上させることを目的として,babylmチャレンジへの提案を行った。我々は,GPT-2と10MワードのBabyLMデータセットを用いて,GPT-2と小LLaMAモデルからなるアンサンブルを訓練し,それを58MパラメータのLLaMAモデルに蒸留した。これは、蒸留が十分に小さなデータセットで訓練された場合、教師モデルの完全な性能を維持するだけでなく、それを上回ることができ、直接訓練よりもかなり優れた性能を得られることを示唆する。 We present our submission to the BabyLM challenge, whose goal was to improve the sample efficiency of language models. We trained an ensemble consisting of a GPT-2 and small LLaMA models on the developmentally-plausible, 10M-word BabyLM dataset, then distilled it into a small, 58M-parameter LLaMA model, which exceeds in performance both of its teachers as well as a similar model trained without distillation. This suggests that distillation can not only retain the full performance of the teacher model when the latter is trained on a sufficiently small dataset; it can exceed it, and lead to significantly better performance than direct training.	翻訳日:2023-10-25 23:52:58 公開日:2023-10-24
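Distilling an ensemble of teachers into one student, as this abstract describes, is commonly done by mixing a hard-label cross-entropy term with a divergence to the averaged, temperature-softened teacher distribution. The sketch below shows that generic loss for a single token position; the weighting `alpha`, the `temperature`, and the function names are assumptions, not the submission's actual training settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits_list, label,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(avg teachers || student)."""
    # Hard-label cross-entropy at temperature 1.
    cross_entropy = -math.log(softmax(student_logits)[label])
    # Average the softened teacher distributions to form the ensemble target.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    target = [sum(column) / len(teacher_probs) for column in zip(*teacher_probs)]
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(target, student_soft) if p > 0)
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

The `temperature ** 2` factor is the usual rescaling that keeps the soft-target term comparable in magnitude to the hard-label term as the temperature changes.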
# DiffKendall: Kendallのランク相関を微分可能なFew-Shot学習のための新しいアプローチ DiffKendall: A Novel Approach for Few-Shot Learning with Differentiable Kendall's Rank Correlation ( http://arxiv.org/abs/2307.15317v2 ) ライセンス: Link先を確認	Kaipeng Zheng, Huishuai Zhang, Weiran Huang	(参考訳) 少数ショット学習は、ベースデータセットでトレーニングされたモデルを、それまでモデルによってカテゴリが見られなかった新しいタスクに適応させることを目的としている。これはしばしば、新しいクラスにおけるチャネル間の機能値の比較的均一な分布をもたらし、新しいタスクにおけるチャネルの重要性を決定する上での課題となる。標準的少数ショット学習法では、コサイン類似度や負ユークリッド距離といった幾何学的類似度メトリクスを用いて、2つの特徴間の意味的関連度を測定する。しかし、幾何学的類似性が高い特徴は、特に数ショット学習の文脈において、異なる意味論を持つ可能性がある。本稿では,特徴チャネルのランク付けの重要性が,幾何学的類似度指標よりも数ショット学習の信頼性が高いことを示す。幾何類似度メトリックをケンドールのランク相関に置き換えることにより、様々な領域の様々な方法やデータセットにおいて、数発学習の性能を向上させることができる。さらに,kendallのランク相関の非微分可能性問題に対処するために,メタトレーニングにおいて注意深く設計された微分可能損失を提案する。幾何学的類似性を微分可能なkendallのランク相関式に置き換えることで,既存の多数の少数ショットアプローチと統合することができ,幾何学的類似度メトリクスに依存する将来の最先端手法と統合する準備が整っている。大規模な実験は、ランク相関に基づくアプローチの有効性を検証し、少数ショット学習において顕著な改善を示す。 Few-shot learning aims to adapt models trained on the base dataset to novel tasks where the categories were not seen by the model before. This often leads to a relatively uniform distribution of feature values across channels on novel classes, posing challenges in determining channel importance for novel tasks. Standard few-shot learning methods employ geometric similarity metrics such as cosine similarity and negative Euclidean distance to gauge the semantic relatedness between two features. However, features with high geometric similarities may carry distinct semantics, especially in the context of few-shot learning. In this paper, we demonstrate that the importance ranking of feature channels is a more reliable indicator for few-shot learning than geometric similarity metrics. We observe that replacing the geometric similarity metric with Kendall's rank correlation only during inference is able to improve the performance of few-shot learning across a wide range of methods and datasets with different domains. 
Furthermore, we propose a carefully designed differentiable loss for meta-training to address the non-differentiability issue of Kendall's rank correlation. By replacing geometric similarity with differentiable Kendall's rank correlation, our method can integrate with numerous existing few-shot approaches and is ready for integrating with future state-of-the-art methods that rely on geometric similarity metrics. Extensive experiments validate the efficacy of the rank-correlation-based approach, showcasing a significant improvement in few-shot learning.	翻訳日:2023-10-25 23:52:01 公開日:2023-10-24
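To make the contrast between the exact and the differentiable rank correlation concrete, here is a small sketch. The exact Kendall coefficient is standard; the smooth version replaces the sign of each pairwise product with `tanh`, which is one possible surrogate and only an assumption about how such a relaxation could look, not necessarily the loss used in the paper. The sharpness parameter `alpha` is hypothetical.

```python
import math
from itertools import combinations


def _sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def kendall_tau(x, y):
    """Exact Kendall rank correlation (tau-a) between two equal-length sequences."""
    pairs = list(combinations(range(len(x)), 2))
    concordance = sum(_sign(x[i] - x[j]) * _sign(y[i] - y[j]) for i, j in pairs)
    return concordance / len(pairs)


def soft_kendall_tau(x, y, alpha=10.0):
    """Smooth surrogate: tanh(alpha * product) stands in for the sign of each pair.

    Larger alpha approaches the exact coefficient but gives sharper gradients.
    """
    pairs = list(combinations(range(len(x)), 2))
    total = sum(math.tanh(alpha * (x[i] - x[j]) * (y[i] - y[j])) for i, j in pairs)
    return total / len(pairs)
```

Because `tanh` is differentiable everywhere, the surrogate can be used inside a gradient-based training loop, whereas the exact coefficient is piecewise constant in the feature values.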
# Lanczos法のキュムラント展開を用いた量子計算グリーン関数 Quantum Computed Green's Functions using a Cumulant Expansion of the Lanczos Method ( http://arxiv.org/abs/2309.09685v2 ) ライセンス: Link先を確認	Gabriel Greene-Diniz, David Zsolt Manrique, Kentaro Yamamoto, Evgeny Plekhanov, Nathan Fitzpatrick, Michal Krompiec, Rei Sakuma, David Muñoz Ramo	(参考訳) 本稿では,多体グリーン関数行列をスピン軌道基底で計算する量子計算法を提案する。我々は,有限サイズのフェルミオンハバードモデルとそれに関連する不純物モデルに動的平均場理論の枠組みで本手法を適用し,Quantinuumのトラップイオン量子コンピュータH1-1上でのグリーン関数の計算を実証する。本手法は,ハミルトニアンモーメントを計測可能な期待値として用いる,ランチョス法のキュムラント展開に基づく。これにより、変分量子固有ソルバ(VQE)の繰り返し適用による測定回数の大幅なオーバーヘッドを回避し、代わりに一組の計測回路でモーメントの期待値を測定する。測定されたモーメントから、三対角化ハミルトニアン行列が計算され、連分数を通してグリーン関数が得られる。本研究では, 変分アルゴリズムを用いて基底状態を作成するが, 実装のモジュラリティにより, 基底状態に対して他の(変分的でない)アプローチが使用できることに留意する。 In this paper, we present a quantum computational method to calculate the many-body Green's function matrix in a spin orbital basis. We apply our approach to finite-sized fermionic Hubbard models and related impurity models within Dynamical Mean Field Theory, and demonstrate the calculation of Green's functions on Quantinuum's H1-1 trapped-ion quantum computer. Our approach involves a cumulant expansion of the Lanczos method, using Hamiltonian moments as measurable expectation values. This bypasses the need for a large overhead in the number of measurements due to repeated applications of the variational quantum eigensolver (VQE), and instead measures the expectation value of the moments with one set of measurement circuits. From the measured moments, the tridiagonalised Hamiltonian matrix can be computed, which in turn yields the Green's function via continued fractions. While we use a variational algorithm to prepare the ground state in this work, we note that the modularity of our implementation allows for other (non-variational) approaches to be used for the ground state.	翻訳日:2023-10-25 23:45:33 公開日:2023-10-24
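The final step mentioned in this entry, going from a tridiagonal matrix to a Green's function by a continued fraction, can be illustrated with a short sketch. It evaluates only the generic resolvent element of a tridiagonal matrix; the argument names `a` (diagonal) and `b` (off-diagonal) are assumptions for illustration, and the paper's moment measurement, ground-state energy shift, and particle/hole bookkeeping are not modelled here.

```python
def green_continued_fraction(z, a, b):
    """Evaluate 1 / (z - a0 - b0^2 / (z - a1 - b1^2 / (...))).

    a: diagonal entries a[0..n-1] of the tridiagonal matrix.
    b: off-diagonal entries, where b[k] couples site k and site k + 1.
    z: (possibly complex) frequency, e.g. omega + 1j * eta.
    """
    n = len(a)
    tail = 0.0
    # Build the fraction from the innermost level outwards.
    for k in range(n - 1, -1, -1):
        coupling = b[k] ** 2 * tail if k < n - 1 else 0.0
        tail = 1.0 / (z - a[k] - coupling)
    return tail
```

For a two-level example with `a = [0.0, 0.0]` and `b = [1.0]`, the result agrees with the closed form `(z - a1) / ((z - a0) * (z - a1) - b0**2)`. In practice `z` would carry a small imaginary broadening so that the spectral function can be read off from the imaginary part.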
# 提案要求に対するオープンデータ駆動チーム推奨によるリサーチコラボレーションの促進 Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals ( http://arxiv.org/abs/2309.09404v3 ) ライセンス: Link先を確認	Siva Likitha Valluru, Biplav Srivastava, Sai Teja Paladi, Siwen Yan, Sriraam Natarajan	(参考訳) チームの構築とコラボレーションの促進は2つの非常に一般的なビジネス活動です。例えばteamingforfunding問題では、研究機関や研究者が、後者の提案に応じて資金提供機関に申し込む際の協力的な機会を特定することに関心を持っている。本稿では,(1)各チームが,その機会に要求される最高のスキルカバレッジを達成し,(2)その機会を分配する作業負荷が,候補メンバー間でバランスをとるような,さまざまなAI手法を用いてチームを推薦するシステムについて述べる。我々は,提案コール(需要)と研究者プロファイル(供給)のオープンデータに潜んでいるスキルを抽出し,分類法を用いてそれらを正規化し,供給需要にマッチする効率的なアルゴリズムを作成することで,これらの疑問に対処した。短期と長期の目標のバランスをとる新しいメトリクスに沿って、良さを最大化するチームを作ります。我々は,(1) アルゴリズムの成功を定量的に検証し,(1) 優れたスコアを用いて推奨チームを評価し,より情報のある手法がより少ない人数のチームの推薦につながること,(2) 大学レベルの大規模ユーザスタディを実施することによって質的に,そのツールが極めて有用かつ関連性の高いものであることを示す。最後に,我々のアプローチの汎用性を確立するために,米国とインド(研究者と提案コール)の2つの異なる環境でシステムを評価し,日常的な使用のために米国の主要大学に展開する。 Building teams and promoting collaboration are two very common business activities. An example of these are seen in the TeamingForFunding problem, where research institutions and researchers are interested to identify collaborative opportunities when applying to funding agencies in response to latter's calls for proposals. We describe a novel system to recommend teams using a variety of AI methods, such that (1) each team achieves the highest possible skill coverage that is demanded by the opportunity, and (2) the workload of distributing the opportunities is balanced amongst the candidate members. We address these questions by extracting skills latent in open data of proposal calls (demand) and researcher profiles (supply), normalizing them using taxonomies, and creating efficient algorithms that match demand to supply. We create teams to maximize goodness along a novel metric balancing short- and long-term objectives. 
We validate the success of our algorithms (1) quantitatively, by evaluating the recommended teams using a goodness score, finding that more informed methods lead to recommendations of a smaller number of teams but higher goodness, and (2) qualitatively, by conducting a large-scale user study at a college-wide level, demonstrating that users overall found the tool very useful and relevant. Lastly, we evaluate our system in two diverse settings, the US and India (of researchers and proposal calls), to establish the generality of our approach, and deploy it at a major US university for routine use.	翻訳日:2023-10-25 23:45:14 公開日:2023-10-24
# AV2Wav: 音声音声強調のための連続自己教師機能からの拡散に基づく再合成 AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement ( http://arxiv.org/abs/2309.08030v2 ) ライセンス: Link先を確認	Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu	(参考訳) 音声強調システムは通常、クリーンな音声と騒がしい音声のペアを使って訓練される。オーディオ・ヴィジュアル音声強調(AVSE)では、音声・ヴィジュアル・データセットは、背景雑音や残響を伴う現実世界の環境で収集され、AVSEの開発を妨げている。本研究では,実世界の学習データの課題にもかかわらずクリーンな音声を生成できる再生型音声視覚音声強調手法であるAV2Wavを紹介する。ニューラルクオリティ推定器を用いて音声・視覚コーパスからほぼクリーンな音声のサブセットを取得し、このサブセット上で拡散モデルを訓練し、ノイズロバストトレーニングによりAV-HuBERTから連続音声表現に条件付き波形を生成する。韻律や話者情報を保持するために、離散表現よりも連続表現を用いる。このvocodingタスクだけで、モデルはマスキングベースのベースラインよりも音声強調を行うことができる。さらに, クリーン・ノイズ対の拡散モデルを微調整し, 性能向上を図る。提案手法は,自動測定と人間の聴力テストの両方においてマスキングベースのベースラインを上回り,聴力テストにおけるターゲット音声にほぼ近い品質である。オーディオサンプルはhttps://home.ttic.edu/~jcchou/demo/avse/avse_demo.htmlにある。 Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. 
Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.	翻訳日:2023-10-25 23:44:47 公開日:2023-10-24
# 臨床テキスト要約:大規模言語モデルへの適応は人間の専門家を上回らせる Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts ( http://arxiv.org/abs/2309.07430v3 ) ライセンス: Link先を確認	Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari	(参考訳) 膨大なテキストデータを精査し、電子健康記録(ehr)から重要な情報を要約することは、臨床医の時間の割り当てに多大な負担を課す。大規模言語モデル(LLM)は自然言語処理(NLP)タスクにおいて大きな可能性を秘めているが、多種多様な臨床要約タスクに対する効果はまだ十分に実証されていない。本研究は,8つのllmにドメイン適応法を適用し,6つのデータセットと4つの異なる臨床要約タスク(放射線検査,患者の質問,進捗記録,医師と患者との対話)にまたがる。我々は,最近のllmの進歩が改善しない事例に加えて,モデルと適応手法のトレードオフを明らかにする。さらに,10名の医師による臨床読影者を対象に,最良適応LSMの要約は,完全性と正確性の観点からヒトの要約より好ましいことを示す。続く質的分析は、LLMと人間の専門家が直面する課題を強調します。最後に,これらの指標が医師の嗜好とどのように一致しているかの理解を深めるため,従来の量的NLP指標と読者調査スコアを相関付ける。我々の研究は、複数のタスクにわたる臨床テキスト要約において、llmが人間専門家を上回った最初の証拠である。このことは、LSMを臨床ワークフローに組み込むことで、医師がパーソナライズされた患者のケアや、本質的に人間の医学的側面にもっと集中できるように、ドキュメントの負担を軽減することができることを意味している。 Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. 
Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.	翻訳日:2023-10-25 23:44:24 公開日:2023-10-24
# MRI並列再構成のためのバッチインプットニューラル表現法 Batch Implicit Neural Representation for MRI Parallel Reconstruction ( http://arxiv.org/abs/2309.06067v3 ) ライセンス: Link先を確認	Hao Li, Yusheng Zhou, Jianan Liu, Xiling Liu, Tao Huang, and Zhihan Lv	(参考訳) 磁気共鳴画像(MRI)は常に長い取得時間の問題に悩まされている。 MRI再構成は、特定の位相符号化ラインをスキップし、アンダーサンプル測定から高品質なイメージを復元することでスキャン時間を短縮する1つの方法である。近年,物体を空間座標の連続関数として表現する新しい深層学習法として暗黙的ニューラル表現(INR)が登場し,この関数は通常多層パーセプトロン(MLP)によってパラメータ化される。本稿では,INRの一般化問題を克服するために,フルサンプリング画像をボクセル座標の関数として,アンダーサンプル画像の先行特徴ベクトルとして表現した新しいMRI並列再構成手法を提案する。具体的には,スケールの異なるmr画像からスケール非依存なvoxel特徴を生成し,座標ベクトルと結合してmlpを介して完全にサンプリングされたmr画像を復元し,任意のスケール再構成を実現するスケール埋め込みエンコーダを導入する。提案手法の性能は,mriデータセット上で実験し,他の再構成法と比較することで評価した。提案手法が代替手法よりも優れていることを示す定量的評価を行った。 Magnetic resonance imaging (MRI) always suffered from the problem of long acquisition time. MRI reconstruction is one solution to reduce scan time by skipping certain phase-encoding lines and then restoring high-quality images from undersampled measurements. Recently, implicit neural representation (INR) has emerged as a new deep learning method that represents an object as a continuous function of spatial coordinates, and this function is normally parameterized by a multilayer perceptron (MLP). In this paper, we propose a novel MRI parallel reconstruction method based on INR, which represents the fully-sampled images as the function of voxel coordinates and prior feature vectors of undersampled images for overcoming the generalization problem of INR. Specifically, we introduce a scale-embedded encoder to produce scale-independent voxel-specific features from MR images with different undersampled scales and then concatenate with coordinates vectors to recover fully-sampled MR images via an MLP, thus achieving arbitrary scale reconstruction. The performance of the proposed method was assessed by experimenting on publicly available MRI datasets and compared with other reconstruction methods. 
Our quantitative evaluation demonstrates the superiority of the proposed method over alternative reconstruction methods.	翻訳日:2023-10-25 23:43:25 公開日:2023-10-24
# nanoT5: 限られたリソースでT5スタイルモデルを事前学習・微調整するためのPyTorchフレームワーク nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources ( http://arxiv.org/abs/2309.02373v2 ) ライセンス: Link先を確認	Piotr Nawrot	(参考訳) T5のような最先端の言語モデルはNLPのランドスケープに革命をもたらしたが、その計算要求は研究コミュニティの大部分を妨げている。この課題に対処するため、T5モデルの事前学習と微調整を効率的に行うために特別に最適化されたPyTorchフレームワークであるnanoT5を提案する。 nanoT5はオプティマイザの違いから得られた洞察に基づき、効率を優先することで、T5-Baseモデルを1つのGPUでわずか16時間、性能を損なうことなく事前学習できる。このオープンソースフレームワークの導入により、言語モデリングの研究へのアクセシビリティを拡大し、よりユーザフレンドリーなT5(Encoder-Decoder)実装に対するコミュニティの要求に応えたいと思っています。コンフィギュレーションやコードベース、事前学習に関する知見、事前学習済みモデルなど、私たちのコントリビューションを一般公開しています。 State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially-optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.	翻訳日:2023-10-25 23:43:04 公開日:2023-10-24
# Shatter and Gather: テキストスーパービジョンによる画像セグメンテーションの学習 Shatter and Gather: Learning Referring Image Segmentation with Text Supervision ( http://arxiv.org/abs/2308.15512v2 ) ライセンス: Link先を確認	Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak	(参考訳) イメージセグメンテーションを参照すると、自由形式のテキストで記述された任意のエンティティをセグメンテーションするタスクは、様々なビジョンアプリケーションを開きます。しかし、このタスクのトレーニングデータの手作業によるラベル付けは極めてコストがかかるため、トレーニング用のラベル付きデータが不足する。トレーニング画像のテキスト記述を唯一の監督源として用いた弱教師付き学習手法によりこの問題に対処する。この目的のために,まず,入力画像中の意味的エンティティを探索し,テキストクエリに関連するエンティティを結合して参照者のマスクを予測する新しいモデルを提案する。また、新たな損失関数を導入し、さらなる監視なしにモデルをトレーニングできるようにします。提案手法は,画像分割参照のための4つの公開ベンチマークで評価され,同じタスクに対する既存の手法や,最近のオープンボカブラリーセグメンテーションモデルよりも明らかに優れていた。 Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source of supervision. To this end, we first present a new model that discovers semantic entities in input image and then combines such entities relevant to text query to predict the mask of the referent. We also present a new loss function that allows the model to be trained without any further supervision. Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation models on all the benchmarks.	翻訳日:2023-10-25 23:42:45 公開日:2023-10-24
# 多腕バンディットの実数値組合せ純粋探索のためのトンプソンサンプリング Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit ( http://arxiv.org/abs/2308.10238v2 ) ライセンス: Link先を確認	Shintaro Nakamura, Masashi Sugiyama	(参考訳) 本稿では,多腕バンディットの実数値組合せ純粋探索(R-CPE-MAB)問題について検討する。 R-CPE-MABでは、プレイヤーに$d$本の確率的なアームが与えられ、各アーム$s\in\{1, \ldots, d\}$の報酬は平均$\mu_s$の未知分布に従う。各タイムステップで、プレイヤーは1本のアームを引き、その報酬を観察する。プレイヤーのゴールは、最適な \emph{action} $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$を有限サイズの実数値の \emph{action set} $\mathcal{A}\subset \mathbb{R}^{d}$からできるだけ少ないアームプルで識別することである。 R-CPE-MAB の以前の方法では、アクションセット $\mathcal{A}$ のサイズが$d$ の多項式であると仮定している。一般化トンプソンサンプリング探索(GenTS-Explore)と呼ばれるアルゴリズムを導入する。これはアクションセットのサイズが$d$に対して指数関数的に大きい場合でも動作する最初のアルゴリズムである。また,R-CPE-MAB問題に対して,問題依存のサンプル複雑性の新たな下界を導入し,GenTS-Exploreアルゴリズムが問題依存定数係数まで最適なサンプル複雑性を実現することを示す。 We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s\in\{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal \emph{action} $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued \emph{action set} $\mathcal{A}\subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$. 
We also introduce a novel problem-dependent sample complexity lower bound of the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.	翻訳日:2023-10-25 23:42:20 公開日:2023-10-24
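The objective in this entry, picking the action that maximises the inner product with the mean-reward vector, can be written out in a few lines. The sketch below is not GenTS-Explore: `best_action` simply enumerates a small explicit action set, and `thompson_step` is a generic Thompson-style draw that assumes unit-variance rewards and a Gaussian posterior, both of which are illustrative assumptions.

```python
import random


def best_action(mu, actions):
    """Return the action pi in `actions` that maximises the inner product mu^T pi."""
    return max(actions, key=lambda pi: sum(m * p for m, p in zip(mu, pi)))


def thompson_step(means, counts, actions, rng):
    """Sample one plausible mean per arm and return the action that sample favours.

    means:  empirical mean reward of each arm.
    counts: number of pulls of each arm (posterior std is assumed to be 1/sqrt(count)).
    rng:    a random.Random instance, passed in so runs are reproducible.
    """
    sampled = [rng.gauss(m, 1.0 / max(n, 1) ** 0.5) for m, n in zip(means, counts)]
    return best_action(sampled, actions)
```

Explicit enumeration only works for small action sets. When the set is exponentially large in the number of arms, as the entry emphasises, the maximisation would instead have to rely on some problem-specific optimisation routine.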
# 知識グラフ推論による弱教師付きセマンティックセグメンテーション Weakly Supervised Semantic Segmentation by Knowledge Graph Inference ( http://arxiv.org/abs/2309.14057v2 ) ライセンス: Link先を確認	Jia Zhang, Bo Peng, Xi Wu	(参考訳) 現在、畳み込みニューラルネットワーク(CNN)に基づくWSSS(Weakly Supervised Semantic Segmentation)における既存の取り組みは、同様に重要な下流セグメンテーションネットワークに限定して、マルチラベル分類ネットワークステージの強化に重点を置いている。さらに、CNNベースのローカルコンボリューションには、広範なカテゴリ間の依存関係をモデル化する能力がない。そこで本稿では,wsss 強化のためのグラフ推論に基づくアプローチを提案する。マルチラベル分類とセグメンテーションネットワークの段階を同時に拡張することにより,WSSSの全体的改善を図る。マルチラベル分類ネットワークセグメントでは、外部知識とgcnを組み合わせることで、クラス間の依存関係をグローバルに推論する。これによりネットワークは、画像の不十分な領域の特徴を解明し、生成された擬似ラベルの完全性を改善することができる。セグメント化ネットワークセグメントにおいて,提案するグラフ推論マッピング(GRM)モジュールを用いてテキストデータベースから得られた知識を活用し,画像領域内のクラス表現の文脈的推論を容易にする。このgrmモジュールは、個々のサンプルに対するセマンティックコヒーレンスを動的に学習しながら、セグメンテーションネットワークの局所畳み込みの高レベル意味論における特徴表現を強化する。画像レベルの監視のみを用いて、PASCAL VOC 2012およびMS-COCOデータセット上でWSSSの最先端のパフォーマンスを達成した。マルチラベル分類とセグメンテーションネットワークの段階における広範な実験により,WSSSの進展に対するグラフ推論手法の有効性が示された。 Currently, existing efforts in Weakly Supervised Semantic Segmentation (WSSS) based on Convolutional Neural Networks (CNNs) have predominantly focused on enhancing the multi-label classification network stage, with limited attention given to the equally important downstream segmentation network. Furthermore, CNN-based local convolutions lack the ability to model the extensive inter-category dependencies. Therefore, this paper introduces a graph reasoning-based approach to enhance WSSS. The aim is to improve WSSS holistically by simultaneously enhancing both the multi-label classification and segmentation network stages. In the multi-label classification network segment, external knowledge is integrated, coupled with GCNs, to globally reason about inter-class dependencies. This encourages the network to uncover features in non-salient regions of images, thereby refining the completeness of generated pseudo-labels. 
In the segmentation network segment, the proposed Graph Reasoning Mapping (GRM) module is employed to leverage knowledge obtained from textual databases, facilitating contextual reasoning for class representation within image regions. This GRM module enhances feature representation in high-level semantics of the segmentation network's local convolutions, while dynamically learning semantic coherence for individual samples. Using solely image-level supervision, we have achieved state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO datasets. Extensive experimentation on both the multi-label classification and segmentation network stages underscores the effectiveness of the proposed graph reasoning approach for advancing WSSS.	翻訳日:2023-10-25 23:34:38 公開日:2023-10-24
# 在庫管理におけるバックオーダー予測 : 分類手法とコストの考察 Backorder Prediction in Inventory Management: Classification Techniques and Cost Considerations ( http://arxiv.org/abs/2309.13837v3 ) ライセンス: Link先を確認	Sarit Maitra, Sukanya Kundu	(参考訳) 本稿では,在庫管理におけるバックオーダー予測のための高度な分析手法を紹介する。バックオーダーとは、在庫切れのために直ちに履行できない注文のことである。 ROC-AUC や PR-AUC などの性能評価指標を用いて, 平衡バギング分類器, ファジィ論理, 変分オートエンコーダ-敵対的生成ネットワーク, 多層パーセプトロン分類器などの複数の分類手法の評価を行った。さらに、在庫管理やバックオーダー処理に関連する金銭的影響やコストを考慮して、利益関数と誤分類コストを組み込んでいる。この研究は、アンサンブル技法とVAEを含むモデリング手法の組み合わせによって、在庫管理における不均衡データセットを効果的に処理し、解釈可能性を強調し、偽陽性と偽陰性を低減できることを示唆している。本研究は, 予測分析の進歩に寄与し, バックオーダー予測における今後の調査や意思決定のための在庫管理最適化に有用な知見を提供する。 This article introduces an advanced analytical approach for predicting backorders in inventory management. Backorder refers to an order that cannot be immediately fulfilled due to stock depletion. Multiple classification techniques, including Balanced Bagging Classifiers, Fuzzy Logic, Variational Autoencoder - Generative Adversarial Networks, and Multi-layer Perceptron classifiers, are assessed in this work using performance evaluation metrics such as ROC-AUC and PR-AUC. Moreover, this work incorporates a profit function and misclassification costs, considering the financial implications and costs associated with inventory management and backorder handling. The study suggests that a combination of modeling approaches, including ensemble techniques and VAE, can effectively address imbalanced datasets in inventory management, emphasizing interpretability and reducing false positives and false negatives. This research contributes to the advancement of predictive analytics and offers valuable insights for future investigations in backorder forecasting and inventory control optimization for decision-making.	翻訳日:2023-10-25 23:34:10 公開日:2023-10-24
# Rewrite Caption Semantics: 言語教師付きセマンティックセグメンテーションのためのセマンティックギャップの橋渡し Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation ( http://arxiv.org/abs/2309.13505v3 ) ライセンス: Link先を確認	Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Shao Ling, Shijian Lu	(参考訳) ビジョンランゲージ事前学習は、その目覚ましいゼロショット認識能力と、言語監督から一般化可能な視覚表現を学習する可能性を示した。一歩前進して、言語教師付きセマンティックセグメンテーションは、画像とテキストのペアのみからピクセルグループを学習することで、テキスト入力の空間的局所化を可能にする。それでも、最先端技術は、視覚とテキストのモダリティの間の明確な意味的ギャップに悩まされている:画像に現れる多くの視覚概念が、ペア化されたキャプションに欠けている。このような意味的ミスアライメントは事前学習で循環し、テキスト表現で捉えた視覚概念が不十分なため、密集した予測ではゼロショット性能が劣る。このようなセマンティクスのギャップを埋めるため,CLIPを利用して欠落した意味を補うパイプラインであるConcept Curation(CoCu)を提案する。各画像とテキストのペアに対して,視覚駆動型拡張とテキスト対視覚誘導ランキングとで視覚的に整合するコンセプトアーカイブを構築した。したがって、関連する概念はクラスタガイドによるサンプリングによって識別され、事前学習に投入され、視覚とテキストのセマンティクスのギャップを埋めることができる。 8つのセグメンテーションベンチマークの幅広いスイートにわたる実験は、CoCuが優れたゼロショット転送性能を達成し、言語教師付きセグメンテーションベースラインを大幅に向上させ、事前トレーニングデータにおけるセマンティクスギャップの橋渡しの価値を示唆している。 Vision-Language Pre-training has demonstrated its remarkable zero-shot recognition ability and potential to learn generalizable visual representations from language supervision. Taking a step ahead, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state-of-the-art suffers from clear semantic gaps between visual and textual modality: plenty of visual concepts appearing in images are missing in their paired captions. Such semantic misalignment circulates in pre-training, leading to inferior zero-shot performance in dense predictions due to insufficient visual concepts captured in textual representations. To close such semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. 
For each image-text pair, we establish a concept archive that maintains potential visually-matched concepts with our proposed vision-driven expansion and text-to-vision-guided ranking. Relevant concepts can thus be identified via cluster-guided sampling and fed into pre-training, thereby bridging the gap between visual and textual semantics. Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin, suggesting the value of bridging the semantic gap in pre-training data.	翻訳日:2023-10-25 23:33:51 公開日:2023-10-24
# 単語レベルとスパンレベルのタスクを統一する:NJUNLPによるWMT2023品質評価共有タスクへの参加 Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task ( http://arxiv.org/abs/2309.13230v2 ) ライセンス: Link先を確認	Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang	(参考訳) 我々は,WMT 2023 Quality Estimation (QE)共有タスクに対するNJUNLPチームの提案を紹介する。私たちのチームは2つのサブタスクの両方で、英語-ドイツ語の言語対に対する予測を提出した: (i)文・語レベルの品質予測、及び (ii)細粒度エラースパン検出。今年は、NJUQEフレームワーク(https://github.com/NJUNLP/njuqe)に基づくQEの擬似データ手法をさらに検討する。 WMT翻訳タスクから並列データを用いて疑似MQMデータを生成する。擬似QEデータ上でXLMR大モデルを事前訓練し、実QEデータ上で微調整する。両段階で文レベルスコアと単語レベルタグを共同で学習する。実証的に、私たちはパフォーマンスを改善する重要なハイパーパラメータを見つける実験を行います。技術的には、単語レベルの出力をきめ細かなエラースパンの結果に変換する単純な手法を提案する。全体的に、我々のモデルは単語レベルときめ細かいエラースパン検出サブタスクの両方において、英語とドイツ語で最高の結果を得ました。 We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on both sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on the NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that converts the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.	翻訳日:2023-10-25 23:33:19 公開日:2023-10-24
# AnglE最適化テキスト埋め込み AnglE-optimized Text Embeddings ( http://arxiv.org/abs/2309.12871v5 ) ライセンス: Link先を確認	Xianming Li, Jing Li	(参考訳) 高品質なテキスト埋め込みは、Large Language Model (LLM) アプリケーションにおいて重要なコンポーネントであるセマンティックテキスト類似性(STS)タスクの改善に重要である。しかし、既存のテキスト埋め込みモデルが直面する共通の課題は、主に飽和ゾーンを持つ最適化目的におけるコサイン関数に依存することによる勾配の消失の問題である。本稿では,AnglEと呼ばれる新しい角度最適化テキスト埋め込みモデルを提案する。 AnglEの中核となる考え方は、複素空間に角度最適化を導入することである。この手法は、勾配を阻害し最適化を妨げうるコサイン関数における飽和域の悪影響を効果的に軽減する。包括的なSTS評価を設定するために、既存の短文STSデータセットとGitHub Issuesから新たに収集された長文STSデータセットを試した。さらに、ラベル付きデータに制限のあるドメイン固有のstsシナリオを検討し、アングルがllmアノテートデータとどのように連携するかを検討する。短文STS、長文STS、ドメイン固有のSTSタスクなど、さまざまなタスクで大規模な実験が行われた。その結果、AnglEはコサイン飽和ゾーンを無視したSOTA(State-of-the-art STS)モデルよりも優れていた。これらの結果は、AnglEが高品質なテキスト埋め込みを生成する能力と、STSにおける角度最適化の有用性を示している。 High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. 
The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.	翻訳日:2023-10-25 23:32:43 公開日:2023-10-24
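The saturation argument in this entry can be made concrete with a small sketch. The derivative of the cosine with respect to the angle is `-sin(theta)`, which shrinks towards zero as two embeddings become nearly parallel or anti-parallel, whereas an angle measured directly keeps a usable slope. The layout assumed below (first half of a vector as real parts, second half as imaginary parts) and the function names are illustrative assumptions, not the library's API.

```python
import cmath
import math


def cosine_gradient_magnitude(theta):
    """|d cos(theta) / d theta| = |sin(theta)|; close to 0 near theta = 0 or pi."""
    return abs(math.sin(theta))


def angle_difference(u, v):
    """Mean absolute angle (radians) between u and v viewed as complex vectors.

    Assumes len(u) == len(v), an even length, and no zero complex entry in v.
    """
    half = len(u) // 2
    zu = [complex(u[i], u[half + i]) for i in range(half)]
    zv = [complex(v[i], v[half + i]) for i in range(half)]
    # The phase of the quotient a / b is the angle from b to a.
    return sum(abs(cmath.phase(a / b)) for a, b in zip(zu, zv)) / half
```

A training objective built on `angle_difference` would still change at a steady rate for pairs whose cosine similarity is already close to 1, which is the regime where the cosine itself gives almost no gradient.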
# モデルを微調整する方法:統一モデルシフトとモデルバイアスポリシー最適化 How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization ( http://arxiv.org/abs/2309.12671v2 ) ライセンス: Link先を確認	Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye	(参考訳) 効果的なモデルベース強化学習(mbrl)アルゴリズムの設計と導出は、主にモデル学習とポリシー最適化の結合度が高いことが原因で困難である。モデル学習を導くためにリターンの相違に依存する多くの先行手法は、モデル変更の影響を無視しており、過剰なモデル更新によるパフォーマンス劣化につながる可能性がある。他のメソッドでは、モデルシフトを明示的に考慮するためにパフォーマンス差分を使用する。しかし、これらの手法はモデルシフトを制約するために一定のしきい値に依存するため、しきい値に大きく依存し、トレーニングプロセス中に適応性に欠ける。本稿では,モデルシフトとモデルバイアスを統一し,微調整プロセスを定式化する最適化目標を理論的に導出する。このプロセスはモデル更新を適応的に調整し、モデルオーバーフィットを避けながら、パフォーマンス向上の保証を得る。そこで我々は,USB-PO (Unified model Shift and model Bias Policy Optimization) という簡単なアルゴリズムを開発した。実験の結果,USB-POはいくつかの課題のあるベンチマークタスクにおいて,最先端のパフォーマンスを実現することがわかった。 Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.	
翻訳日:2023-10-25 23:32:21 公開日:2023-10-24
# 自律型水中車両のインテリジェントデブリ質量推定モデル Intelligent Debris Mass Estimation Model for Autonomous Underwater Vehicle ( http://arxiv.org/abs/2309.10617v2 ) ライセンス: Link先を確認	Mohana Sri S, Swethaa S, Aouthithiye Barathwaj SR Y, Sai Ganesh CS	(参考訳) 海洋ゴミは海洋生物の生存に重大な脅威をもたらし、しばしば絡み合いや飢餓につながり、最終的には死に至る。したがって、海洋からゴミを取り除くことは自然のバランスを回復し、海洋生物を繁栄させるのに不可欠である。インスタンスセグメンテーション(インスタンスセグメンテーション)は、物体を識別し、それらを正確に特定し、分離するオブジェクト検出の先進的な形態であり、自律型水中車両(AUV)が水中環境を効果的に操作するための必須のツールである。 AUVは画像セグメンテーションを使用して、カメラが捉えた画像を分析し、水中環境をナビゲートする。本稿では、画像内の個々のオブジェクトの面積を計算するためにインスタンスセグメンテーションを使用し、roboflowではyolov7を使用して、検出毎にクラスラベルと信頼度スコアを持つ画像内の各オブジェクトのバウンディングボックスのセットを生成する。次に、オブジェクトの境界ボックスにバイナリマスクを適用することで、各オブジェクトに対してセグメンテーションマスクを作成する。マスクは、背景からオブジェクトをセグメント化するように訓練された畳み込みニューラルネットワークの出力にバイナリしきい値を適用して生成される。最後に、形態素演算や輪郭検出などの後処理技術を適用し、マスクの精度と品質を向上させることにより、各対象に対するセグメンテーションマスクの精錬を行う。インスタンスセグメンテーションの領域を推定するプロセスは、各セグメンテーションされたインスタンスの領域を別々に計算し、全インスタンスの領域を合計して総面積を得る。この計算は、矩形や円のような物体の形状に基づく標準式を用いて行われる。対象が複素である場合、その領域を推定するためにモンテカルロ法が用いられる。この方法は従来の方法よりも精度が高く、特に多数のサンプルを使用する場合に高い精度を提供する。 Marine debris poses a significant threat to the survival of marine wildlife, often leading to entanglement and starvation, ultimately resulting in death. Therefore, removing debris from the ocean is crucial to restore the natural balance and allow marine life to thrive. Instance segmentation is an advanced form of object detection that identifies objects and precisely locates and separates them, making it an essential tool for autonomous underwater vehicles (AUVs) to navigate and interact with their underwater environment effectively. AUVs use image segmentation to analyze images captured by their cameras to navigate underwater environments. In this paper, we use instance segmentation to calculate the area of individual objects within an image, we use YOLOV7 in Roboflow to generate a set of bounding boxes for each object in the image with a class label and a confidence score for every detection. 
A segmentation mask is then created for each object by applying a binary mask to the object's bounding box. The masks are generated by applying a binary threshold to the output of a convolutional neural network trained to segment objects from the background. Finally, refining the segmentation mask for each object is done by applying post-processing techniques such as morphological operations and contour detection, to improve the accuracy and quality of the mask. The process of estimating the area of instance segmentation involves calculating the area of each segmented instance separately and then summing up the areas of all instances to obtain the total area. The calculation is carried out using standard formulas based on the shape of the object, such as rectangles and circles. In cases where the object is complex, the Monte Carlo method is used to estimate the area. This method provides a higher degree of accuracy than traditional methods, especially when using a large number of samples.	翻訳日:2023-10-25 23:32:04 公開日:2023-10-24
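The Monte Carlo area estimate mentioned at the end of this entry can be sketched as follows. The `inside` predicate stands in for a segmentation mask lookup and is a hypothetical interface; the sample count and fixed seed are likewise illustrative choices, not values taken from the paper.

```python
import random


def monte_carlo_area(inside, x_min, x_max, y_min, y_max, samples=20000, seed=0):
    """Estimate the area of a region by uniform sampling inside its bounding box.

    inside: function (x, y) -> bool, True when the point belongs to the object.
    The estimate is (fraction of hits) * (bounding-box area).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if inside(x, y):
            hits += 1
    box_area = (x_max - x_min) * (y_max - y_min)
    return box_area * hits / samples
```

For a unit disc inside the box from -1 to 1 on both axes, the estimate approaches pi as the number of samples grows; the error shrinks roughly with the inverse square root of the sample count, which is why a large number of samples matters.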
# 量子ハイブリッドおよび量子インスパイアされたハードウェア上での移動ロボットスケジューリング問題の最適化事例 An Optimization Case Study for solving a Transport Robot Scheduling Problem on Quantum-Hybrid and Quantum-Inspired Hardware ( http://arxiv.org/abs/2309.09736v4 ) ライセンス: Link先を確認	Dominik Leib, Tobias Seidel, Sven J\"ager, Raoul Heese, Caitlin Isobel Jones, Abhishek Awasthi, Astrid Niederle, Michael Bortz	(参考訳) 本稿では,d-wavesのquantum-classical hybrid framework,futsuのquantum-inspired digital annealer,gurobi's state-of-the-art classical solverの性能比較を行った。この問題は、産業的に関連のある現実世界のシナリオに由来する。我々は、異なる設計哲学に従う問題に対して、3つの異なるモデルを提供する。ベンチマークでは、異なるモデルとソルバの組み合わせのソリューション品質とエンドツーエンドランタイムに焦点を当てています。ディジタルアニールラーには有望な結果が得られ、グロビと直接比較すると、ハイブリッド量子アニールラーにはいくつかの機会がある。本研究は、異なる戦略でアプリケーション指向最適化問題を解決するためのワークフローに関する洞察を提供し、異なるアプローチの強みと弱みを評価するのに有用である。 We present a comprehensive case study comparing the performance of D-Waves' quantum-classical hybrid framework, Fujitsu's quantum-inspired digital annealer, and Gurobi's state-of-the-art classical solver in solving a transport robot scheduling problem. This problem originates from an industrially relevant real-world scenario. We provide three different models for our problem following different design philosophies. In our benchmark, we focus on the solution quality and end-to-end runtime of different model and solver combinations. We find promising results for the digital annealer and some opportunities for the hybrid quantum annealer in direct comparison with Gurobi. Our study provides insights into the workflow for solving an application-oriented optimization problem with different strategies, and can be useful for evaluating the strengths and weaknesses of different approaches.	翻訳日:2023-10-25 23:31:08 公開日:2023-10-24
# アバロンの思考ゲーム:再帰的熟考による偽装との戦い Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation ( http://arxiv.org/abs/2310.01320v3 ) ライセンス: Link先を確認	Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、LLM-as-Agentの分野で大きな成功を収めている。それにもかかわらず、llmsが処理する情報は一貫して正直であり、人間社会やaiが生成するコンテンツにおける広汎な誤解や誤解を招く情報を無視しているという仮定が一般的である。この監視により、LSMは悪意のある操作を受けやすくなり、有害な結果をもたらす可能性がある。本研究では,複雑なアバロンゲームを用いて,認知環境におけるLSMの可能性を探究する。アバロンは誤った情報に満ちており、洗練された論理を必要とするため、「思考のゲーム」として表される。アバロンゲームにおける人間の再帰的思考と視点取りの有効性に着想を得て,LLMの認知・認識能力を高めるための新しい枠組みであるRecursive Contemplation(ReCon)を導入する。 ReConは、定式化と洗練の熟考プロセスを組み合わせており、定式化は初期の思考とスピーチを生み出し、洗練の熟考はそれらをさらに洗練する。さらに、これらのプロセスにそれぞれ一階および二階の視点遷移を組み込む。具体的には、LLMエージェントが他人の精神状態を推測し、2階は他人がエージェントの精神状態をどう知覚するかを理解する。 reconを異なるllmと統合した後、avalon gameの広範な実験結果は、追加の微調整やデータなしで偽情報の識別と操作をllmに支援する効果を示している。最後に、ReConの有効性の可能な説明を提供し、安全性、推論、話し方、フォーマットの観点からLLMの現在の限界を探求し、その後の研究の可能性を秘めている。 Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. 
ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.	翻訳日:2023-10-25 23:25:26 公開日:2023-10-24
# 話者認識のための自己スーパービジョンによる音声とコンテンツの分離 Disentangling Voice and Content with Self-Supervision for Speaker Recognition ( http://arxiv.org/abs/2310.01128v2 ) ライセンス: Link先を確認	Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li	(参考訳) 話者認識では,話者特性と内容が混在しているため,音声から正確な話者表現を抽出することは困難である。本稿では,話者の特性と内容の変動を同時にモデル化するアンタングル化フレームワークを提案する。異なる音声成分を抽出する学習可能な遷移モデルからなる3つのガウス推論層を用いて実現した。特に、強化された遷移モデルは、複雑な音声力学をモデル化するために特別に設計されている。また,話者識別以外のラベルを使わずにコンテンツを動的に切り離すセルフスーパービジョン手法を提案する。提案フレームワークの有効性は,VoxCelebデータセットとSITWデータセットを用いて,それぞれEERおよびminDCFの平均減少率を9.56%,8.24%で検証した。追加のモデルトレーニングやデータは特に必要とされないため、実用上容易に適用できる。 For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.	翻訳日:2023-10-25 23:24:55 公開日:2023-10-24
# 自律運転における協調認識における適応的コミュニケーション Adaptive Communications in Collaborative Perception with Domain Alignment for Autonomous Driving ( http://arxiv.org/abs/2310.00013v2 ) ライセンス: Link先を確認	Senkang Hu, Zhengru Fang, Haonan An, Guowen Xu, Yuan Zhou, Xianhao Chen, Yuguang Fang	(参考訳) 複数の連結車両と自律車両の協調認識は、車両が通信を介して補助情報を交換できるようにすることで、知覚能力を大幅に向上させることができる。従来のアプローチの進歩にもかかわらず、チャネルのばらつきとコラボレーティブな車両間のデータの均一性による課題は依然として残っている。そこで本研究では,通信グラフを動的に調整し,平均伝送遅延を最小化し,データの不均一性による副作用を緩和するチャネルアウェア協調知覚フレームワークacc-daを提案する。私たちの小説は3つの側面にある。まず、通信グラフを構築し、異なるチャネル情報状態に応じて伝送遅延を最小化できる伝送遅延最小化方法を設計する。次に、速度歪みトレードオフを動的に調整し、知覚効率を向上させる適応データ再構成機構を提案する。さらに、データ送信時の時間的冗長性を最小化する。最後に、異なる車両からのデータ分布を調整するためのドメインアライメントスキームを考案し、異なる車両間のドメイン間ギャップを緩和し、対象タスクの性能を向上させる。総合的な実験により,本手法の有効性が実証された。 Collaborative perception among multiple connected and autonomous vehicles can greatly enhance perceptive capabilities by allowing vehicles to exchange supplementary information via communications. Despite advances in previous approaches, challenges still remain due to channel variations and data heterogeneity among collaborative vehicles. To address these issues, we propose ACC-DA, a channel-aware collaborative perception framework to dynamically adjust the communication graph and minimize the average transmission delay while mitigating the side effects from the data heterogeneity. Our novelties lie in three aspects. We first design a transmission delay minimization method, which can construct the communication graph and minimize the transmission delay according to different channel information state. We then propose an adaptive data reconstruction mechanism, which can dynamically adjust the rate-distortion trade-off to enhance perception efficiency. Moreover, it minimizes the temporal redundancy during data transmissions. Finally, we conceive a domain alignment scheme to align the data distribution from different vehicles, which can mitigate the domain gap between different vehicles and improve the performance of the target task. 
Comprehensive experiments demonstrate the effectiveness of our method in comparison to the existing state-of-the-art works.	翻訳日:2023-10-25 23:24:41 公開日:2023-10-24
# 説明可能な機械学習に基づく糖尿病性腎症予測モデル Explainable machine learning-based prediction model for diabetic nephropathy ( http://arxiv.org/abs/2309.16730v2 ) ライセンス: Link先を確認	Jing-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, and Lang Zou	(参考訳) 本研究の目的は, 糖尿病性腎症 (DN) に対する血清代謝物の影響を解析し, 機械学習を用いてDNの有病率を予測することである。データセットは、2018年4月から2019年4月まで、大連医科大学第二附属病院(SAHDMU)で548人の患者で構成されている。最小絶対収縮・選択演算子(LASSO)回帰モデルと10倍のクロスバリデーションにより最適38個の特徴を選定する。我々は,eXtreme Gradient Boosting (XGB),ランダムフォレスト,決定木,ロジスティック回帰の4つの機械学習アルゴリズムを,AUC-ROC曲線,決定曲線,キャリブレーション曲線で比較した。 shapley additive explanations (shap) 法による最適予測モデルにおける特徴量と相互作用効果を定量化する。 xgbモデルは、最大auc値0.966のdnで画面表示に最適な性能を持つ。 XGBモデルは、他のモデルよりも臨床効果が高く、適合度も良い。さらに、血清代謝物と糖尿病の持続時間の間には大きな相互作用がある。我々は,DN をスクリーニングする XGB アルゴリズムによる予測モデルを開発した。 C2、C5DC、Tyr、Ser、Met、C24、C4DC、Cysはこのモデルに多大な貢献をしている。 The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than others and the fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by XGB algorithm to screen for DN. 
C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute strongly to the model and may serve as biomarkers for DN.	翻訳日:2023-10-25 23:24:24 公開日:2023-10-24
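The abstract compares classifiers by AUC. As a hedged illustration of what that number measures, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are invented toy values, not data from the study, and `roc_auc` is a hypothetical helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = disease present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # hypothetical scores from one model
model_b = [0.6, 0.5, 0.4, 0.7, 0.3, 0.2]  # hypothetical scores from another

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few hundred patients. Larger datasets would normally use a rank-based computation instead.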
# 金融ポートフォリオ管理のためのディープラーニングとオンラインソース感の活用 Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management ( http://arxiv.org/abs/2309.16679v2 ) ライセンス: Link先を確認	Paraskevi Nousi, Loukia Avramelou, Georgios Rodinos, Maria Tzelepi, Theodoros Manousis, Konstantinos Tsampazis, Kyriakos Stefanidis, Dimitris Spanos, Manos Kirtas, Pavlos Tosidis, Avraam Tsantekidis, Nikolaos Passalis and Anastasios Tefas	(参考訳) ファイナンシャル・ポートフォリオ・マネジメント(英: financial portfolio management)とは、株式、インデックスファンド、外国為替、暗号通貨などの一連の金融資産において、当該事業の損失を最小化しつつ利益を最大化することを目的とした、資金の分配及び取引業務を行う業務をいう。ディープラーニング(DL)メソッドは、さまざまなタスクにおいて一貫して優れており、自動化された金融取引はその中のひとつです。本稿では,金融取引における様々なdl手法について,監督学習と強化学習の両面で見識を提供することを目的としている。同時に、取引資産に関する感情情報を考慮し、対応する研究研究を通してそれらの有用性を議論し、実証する。最後に、このような金融エージェントの訓練においてよく見られる問題について議論し、これらの問題を避けるために必要な知識を読者に与え、実際に議論する方法を適用する。 Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have been consistently excelling at various tasks and automated financial trading is one of the most complex one of those. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. At the same time, taking into consideration sentiment information regarding the traded assets, we discuss and demonstrate their usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.	翻訳日:2023-10-25 23:24:01 公開日:2023-10-24
# 一般化されたブラックホールエントロピーはフォン・ノイマンエントロピーである Generalized Black Hole Entropy is von Neumann Entropy ( http://arxiv.org/abs/2309.15897v2 ) ライセンス: Link先を確認	Jonah Kudler-Flam, Samuel Leutheusser, Gautam Satishchandran	(参考訳) 最近、シュワルツシルト-AdSブラックホールの質量にdressした可観測物のフォン・ノイマン代数やデ・シッターの観測者がタイプIIであることが示されている。半古典状態のフォン・ノイマンエントロピーは一般化エントロピーであることが判明した。しかし、これらの議論は平衡状態(kms)の存在に依存しており、例えば重力崩壊によって形成されたブラックホール、カーブラックホール、あるいは漸近的にド・ジッター空間内のブラックホールには適用されない。本稿では, キリング地平線を持つ任意の時空上の線形場に対して, 着衣可観測体の代数を求めるための一般的な枠組みを提案する。定常状態(ただし必ずしも KMS ではない)の存在と解の適切な崩壊を仮定すると、着飾った可観測体の代数が常に地平線上に「局所化」されたタイプII因子を含むという構造定理が証明される。これらの仮定は、ほとんどのケースで厳格に証明されている。漸近的に平坦なケーラーブラックホールの外方での代数に応用すると、場はブラックホールの質量と角運動量にdressした状態で、地平線上のタイプII$_{\infty}$代数と過去のヌル無限大におけるタイプI$_{\infty}$代数の積が見つかる。シュワルツシルト=ド・シッター (Schwarzschild-de Sitter) では、観測者を導入するにもかかわらず、場の可観測物はブラックホールと宇宙的地平線の摂動領域に似ており、各地平線上のタイプII$_{\infty}$代数の積である。いずれの場合も、半古典状態に対するフォン・ノイマンのエントロピーは一般化エントロピーによって与えられる。我々の結果は、他の「有界構造」が存在する場合(例えば、漸近境界あるいは他のキリング地平線)、可観測体の代数はタイプII$_{\infty}$であり、そのような構造が存在しない場合(例えば、デ・シッター)、代数はタイプII$_{1}$であることを示している。 It was recently shown that the von Neumann algebras of observables dressed to the mass of a Schwarzschild-AdS black hole or an observer in de Sitter are Type II, and thus admit well-defined traces. The von Neumann entropies of "semi-classical" states were found to be generalized entropies. However, these arguments relied on the existence of an equilibrium (KMS) state and thus do not apply to, e.g., black holes formed from gravitational collapse, Kerr black holes, or black holes in asymptotically de Sitter space. In this paper, we present a general framework for obtaining the algebra of dressed observables for linear fields on any spacetime with a Killing horizon. We prove, assuming the existence of a stationary (but not necessarily KMS) state and suitable decay of solutions, a structure theorem that the algebra of dressed observables always contains a Type II factor "localized" on the horizon. 
These assumptions have been rigorously proven in most cases of interest. For the algebra in the exterior of an asymptotically flat Kerr black hole, where the fields are dressed to the black hole mass and angular momentum, we find a product of a Type II$_{\infty}$ algebra on the horizon and a Type I$_{\infty}$ algebra at past null infinity. In Schwarzschild-de Sitter, although we introduce an observer, the quantum field observables are dressed to the perturbed areas of the black hole and cosmological horizons, and the resulting algebra is the product of Type II$_{\infty}$ algebras on each horizon. In all cases, the von Neumann entropy for semiclassical states is given by the generalized entropy. Our results suggest that in all cases where there exists another "boundary structure" (e.g., an asymptotic boundary or another Killing horizon) the algebra of observables is Type II$_{\infty}$, and in the absence of such structures (e.g., de Sitter) the algebra is Type II$_{1}$.	翻訳日:2023-10-25 23:23:45 公開日:2023-10-24
# 拡散モデルにおける信号リークバイアスの爆発 Exploiting the Signal-Leak Bias in Diffusion Models ( http://arxiv.org/abs/2309.15842v2 ) ライセンス: Link先を確認	Martin Nicolas Everaert, Athanasios Fitsios, Marco Bocchio, Sami Arpa, Sabine S\"usstrunk, Radhakrishna Achanta	(参考訳) ほとんどの拡散モデルの推論パイプラインにはバイアスがある。このバイアスは、分布がノイズ分布から逸脱し、トレーニングと推論プロセスの間に不一致が生じる信号リークから生じる。この信号リークバイアスは、モデルが特定のスタイルに調整されると特に重要であり、サブ最適スタイルマッチングを引き起こす。最近の研究は、訓練中の信号漏れを回避しようとしている。代わりに、既存の拡散モデルにおけるこの信号漏れバイアスを利用して、生成した画像のさらなる制御を可能にする方法を示します。これにより、より輝度の異なる画像や、所望のスタイルや色に合致した画像を生成することができます。空間周波数及び画素領域における信号リークの分布をモデル化し、初期潜時における信号リークを含むことにより、追加のトレーニングを伴わずに予測結果に適合する画像を生成する。 There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.	翻訳日:2023-10-25 23:23:06 公開日:2023-10-24
# 手術ビデオのための動的シーングラフ表現 Dynamic Scene Graph Representation for Surgical Video ( http://arxiv.org/abs/2309.14538v2 ) ライセンス: Link先を確認	Felix Holm, Ghazal Ghazaei, Tobias Czempiel, Ege \"Ozsoy, Stefan Saur, Nassir Navab	(参考訳) 顕微鏡または内視鏡画像装置から撮影された手術ビデオは、豊富なが複雑な情報源であり、様々なツールや解剖学的構造が長い時間で利用される。重要なワークフロー情報を含み、多くの手順で一般的に記録されているにもかかわらず、外科的ワークフロー理解のための外科的ビデオの使用は依然として限られている。本研究では,すべての解剖学的構造,ツール,およびそれらの相互作用をエンコードしながら,手術ビデオを表現するためのより包括的,意味的に有意義で可読な方法としてシーングラフを利用する。ソリューションの影響を適切に評価するために、cadisと白内障データセットのセマンティックセグメンテーションからシーングラフデータセットを作成します。本稿では,グラフ畳み込みネットワーク(gcns)を用いて,手術下下流の作業,例えば外科的ワークフロー認識や競合性能に対処し,シーングラフを活用できることを実証する。さらに, 臨床現場において重要なモデル決定の説明可能性とロバスト性に関して, 外科的シーングラフの有用性を示す。 Surgical videos captured from microscopic or endoscopic imaging devices are rich but complex sources of information, depicting different tools and anatomical structures utilized during an extended amount of time. Despite containing crucial workflow information and being commonly recorded in many procedures, usage of surgical videos for automated surgical workflow understanding is still limited. In this work, we exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos while encoding all anatomical structures, tools, and their interactions. To properly evaluate the impact of our solutions, we create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets. We demonstrate that scene graphs can be leveraged through the use of graph convolutional networks (GCNs) to tackle surgical downstream tasks such as surgical workflow recognition with competitive performance. Moreover, we demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions, which are crucial in the clinical setting.	翻訳日:2023-10-25 23:22:53 公開日:2023-10-24
# 人間支援言語プランナーを用いた生涯ロボット学習 Lifelong Robot Learning with Human Assisted Language Planners ( http://arxiv.org/abs/2309.14321v2 ) ライセンス: Link先を確認	Meenal Parakh, Alisha Fong, Anthony Simeonov, Tao Chen, Abhishek Gupta, Pulkit Agrawal	(参考訳) 大規模言語モデル(LLM)は、高レベルの命令を実行可能な命令列に分解できるプランナーのように振る舞うことが示されている。しかし、現在のLSMベースのプランナーは、一定のスキルセットでしか動作できない。この限界を克服し、llmベースのプランナーを用いて新たなスキルをクエリし、これらのスキルを剛体オブジェクト操作のためのデータと時間効率のよい方法でロボットに教える方法を提案する。本システムは,新たに獲得したスキルを今後の課題に再利用し,オープンワールドと生涯学習の可能性を示す。シミュレーションと実世界における複数のタスクに関するフレームワークの評価を行った。ビデオは以下の通り。 https://sites.google.com/mit.edu/halp-robot-learning。 Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/mit.edu/halp-robot-learning.	翻訳日:2023-10-25 23:22:06 公開日:2023-10-24
# grove: 証拠の森を用いた検索による複雑なストーリー生成フレームワーク GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence ( http://arxiv.org/abs/2310.05388v2 ) ライセンス: Link先を確認	Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li	(参考訳) 条件付きストーリー生成は、人間と機械の相互作用、特に複雑なプロットによるストーリーの生成において重要である。大きな言語モデル(LLM)は、ストーリー生成を含む複数のNLPタスクでうまく機能するが、複雑なプロットと創造的なプロットの両方でストーリーを生成することは困難である。既存の手法はしばしば、LLMを目標条件に合わせるための詳細なプロンプトに依存しており、それは必然的に生成されたストーリーの創造性を制限している。我々は、模範的な人間書きの物語からの情報を活用することで、より多様なプロットラインを生み出すことを主張する。ストーリーの詳細を深く掘り下げることは、複雑で信頼できるプロットを構築するのに役立つ。本稿では,e\textbf{V}id\textbf{E}nce(GROVE)のf\textbf{O}restを用いた検索-au\textbf{G}mented sto\textbf{R}y生成フレームワークを提案する。我々は,目標条件の検索レポジトリを構築し,llmをプロンプトするためのサンプルを少数生成する。さらに,証拠の森を抽出する 'asking-why'' プロンプトスキームをデザインし,生成したストーリーで発生する曖昧さを補償する。この反復的なプロセスはストーリーの背景を明らかにする。最後に,エビデンス・フォレストから最も適切なエビデンス・チェーンを選択し,生成したストーリーに統合することで,物語の複雑さと信頼性を高める。実験結果と多数の事例が本手法の有効性を検証した。 Conditional story generation is significant in human-machine interaction, particularly in producing stories with complex plots. While Large language models (LLMs) perform well on multiple NLP tasks, including story generation, it is challenging to generate stories with both complex and creative plots. Existing methods often rely on detailed prompts to guide LLMs to meet target conditions, which inadvertently restrict the creative potential of the generated stories. We argue that leveraging information from exemplary human-written stories facilitates generating more diverse plotlines. Delving deeper into story details helps build complex and credible plots. In this paper, we propose a retrieval-au\textbf{G}mented sto\textbf{R}y generation framework with a f\textbf{O}rest of e\textbf{V}id\textbf{E}nce (GROVE) to enhance stories' complexity. We build a retrieval repository for target conditions to produce few-shot examples to prompt LLMs. 
Additionally, we design an "asking-why" prompting scheme that extracts a forest of evidence, compensating for ambiguities that may occur in the generated story. This iterative process uncovers underlying story backgrounds. Finally, we select the most fitting chains of evidence from the evidence forest and integrate them into the generated story, thereby enhancing the narrative's complexity and credibility. Experimental results and numerous examples verify the effectiveness of our method.	翻訳日:2023-10-25 23:15:21 公開日:2023-10-24
# ラテント合成による効率的なテキストデータ利用によるエンドツーエンド音声処理の改善 Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis ( http://arxiv.org/abs/2310.05374v3 ) ライセンス: Link先を確認	Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen	(参考訳) 高性能なエンドツーエンド(E2E)音声処理モデルを訓練するには、特にデータ中心人工知能の時代において、大量のラベル付き音声データが必要となる。しかし、ラベル付き音声データは通常、テキストデータに比べて、収集が困難で費用がかかる。 E2E音声処理モデルのための効率的なテキストデータ利用フレームワークLaSynを提案する。我々は、テキストデータを事前訓練された音声モデルの中間潜在表現に変換するために、潜在合成器を訓練する。テキストデータの擬似音響表現は、モデルトレーニングのための音響データを増強する。我々は,低リソース自動音声認識(ASR)と音声言語理解(SLU)タスクにおけるLaSynの評価を行った。 ASRでは、LibriSpeech train-clean-100で訓練されたE2Eベースラインを改善し、異なるテストセットで単語誤り率を相対的に22.3%以上削減した。 SLUでは,SLURP上で意図分類精度を絶対値で4.1%,スロット充填SLU-F1を絶対値で3.8%,STOP上でEMとEM-Treeの精度をそれぞれ絶対値で4.49%と2.25%改善した。より少ないパラメータで、LaSynは既発表の最先端手法に匹敵する結果を示す。これらの結果は,拡張された訓練データの品質を示している。 Training a high-performance end-to-end (E2E) speech processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive to collect than textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. 
With fewer parameters, LaSyn is competitive with published state-of-the-art work. These results demonstrate the quality of the augmented training data.	翻訳日:2023-10-25 23:14:52 公開日:2023-10-24
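This abstract reports both relative reductions (for word error rate) and absolute gains (for accuracy and F1), and the two are easy to confuse. The short sketch below shows the arithmetic behind each. The helper names and the baseline and improved values are hypothetical round numbers chosen for illustration and are not results from the paper.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def absolute_gain(baseline, improved):
    """Absolute gain of a score such as accuracy, in the same units as the inputs."""
    return improved - baseline


# Hypothetical word error rates in percent: 10.0 -> 7.5 is a 25% relative
# reduction, although the absolute difference is only 2.5 points.
baseline_wer, new_wer = 10.0, 7.5
rel = relative_reduction(baseline_wer, new_wer)

# Hypothetical accuracies in percent: 80.0 -> 84.1 is an absolute gain of
# about 4.1 points.
baseline_acc, new_acc = 80.0, 84.1
gain = absolute_gain(baseline_acc, new_acc)

print(f"relative WER reduction: {rel:.1%}")
print(f"absolute accuracy gain: {gain:.1f} points")
```

Because a relative figure depends on the baseline, it cannot be compared directly with an absolute one. This is why the abstract labels each number explicitly.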
# Counter Turing Test CT^2: AI生成テキスト検出は、あなたが考えるほど簡単ではない -- AI検出可能性指数の導入 Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index ( http://arxiv.org/abs/2310.05030v2 ) ライセンス: Link先を確認	Megha Chakraborty, S.M Towhidul Islam Tonmoy, S M Mehedi Zaman, Krish Sharma, Niyar R Barman, Chandan Gupta, Shreya Gautam, Tanay Kumar, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das	(参考訳) 有能なChatGPTの台頭に伴い、AI生成テキストのリスクと結果が急増している。 AI生成物の所有権に関する必然的な問題に対処するため、米国著作権庁は「作品の伝統的な著作物が機械によって生産された場合、作品は人間の著作物に欠け、事務所はそれを登録しない」という声明を発表した。さらに、米国とEU政府は最近、AIの規制フレームワークに関する最初の提案を起草した。 AI生成型テキスト検出(AGTD)は、AI生成型テキスト検出(AGTD)におけるこのサイノーゾ的なスポットライトから、研究においてすぐに注目を集めているトピックとして現れ、いくつかの初期手法が提案され、間もなく検出をバイパスする技術が出現する。本稿では,既存のAGTD手法のロバスト性を総合的に評価することを目的とした手法のベンチマークであるCounter Turing Test (CT^2)を紹介する。調査対象のAGTD法が脆弱であることは明らかです。 AI開発を規制するための政策決定に関する広範な議論の中で、LLMが生成するコンテンツの検出可能性を評価することが最も重要である。そこで本研究では,LLMの評価とランク付けを容易にする定量スペクトルを確立するために,AI検出可能性指数(AI Detectability Index, ADI)を提案する。われわれは15個の現代LLMを徹底的に検討し、より大きなLLMはADIが高い傾向を示し、小さいLLMに比べて検出しにくいことを示した。 ADIはより広範なNLPコミュニティのツールとして大きな価値があり、AI関連の政策決定においてルーリックとして機能する可能性があると強く信じています。 With the rise of prolific ChatGPT, the risk and consequences of AI-generated text has increased alarmingly. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that 'If a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it'. Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by emergence of techniques to bypass detection. 
This paper introduces the Counter Turing Test (CT^2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a higher ADI, indicating they are less detectable compared to smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.	翻訳日:2023-10-25 23:14:27 公開日:2023-10-24
# ブラックホール蒸発のユニタリ(半)因果量子回路表現 Unitary (semi)causal quantum-circuit representation of black hole evaporation ( http://arxiv.org/abs/2310.04744v3 ) ライセンス: Link先を確認	Bogusław Broda	(参考訳) 事象の地平線によって課される因果性(半因果性)を尊重するブラックホールのユニタリ発展(蒸発)の一般的な構造が導出され、量子回路の言語で表される。対応するエンタングルメントエントロピーとエントロピー曲線の発展に対する帰結が決定されている。一般的なスキームの例として、テンソル積モデルと制御非積モデルという2種類の量子ビットトイモデルが議論されている。 A general structure of unitary evolution (evaporation) of the black hole, respecting causality imposed by the event horizon (semicausality), has been derived and presented in the language of quantum circuits. The resulting consequences for the evolution of the corresponding entanglement entropy and the entropy curve have been determined. As an illustration of the general scheme, two families of qubit toy models have been discussed: tensor product models and controlled non-product models.	翻訳日:2023-10-25 23:13:56 公開日:2023-10-24
# 中国語大言語モデルにおける幻覚評価 Evaluating Hallucinations in Chinese Large Language Models ( http://arxiv.org/abs/2310.03368v3 ) ライセンス: Link先を確認	Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu	(参考訳) 本稿では,中国語大言語モデルにおける幻覚現象を測定するために,HalluQA (Chinese Hallucination Question-Answering) というベンチマークを作成した。 HalluQAには450の厳密に設計された敵対的質問が含まれており、複数のドメインにまたがっており、中国の歴史的文化、慣習、社会現象を考慮に入れている。 HalluQAの構築中,模倣的虚偽と事実誤りの2種類の幻覚を考察し,GLM-130B と ChatGPT に基づく敵対的サンプルを構築した。評価のために,モデル出力が幻覚的かどうかを判定するために,GPT-4を用いた自動評価手法を設計する。 ERNIE-Bot、Baichuan2、ChatGLM、Qwen、SparkDeskなど、24の大規模言語モデルに関する広範な実験を行います。 24モデル中、18モデルは50%未満の非幻覚率を達成した。これはHalluQAが非常に難しいことを示している。様々なモデルにおける幻覚の主なタイプとその原因を分析した。さらに,様々なモデルに対してどの種類の幻覚を優先すべきかについて議論する。 In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, and SparkDesk. Out of the 24 models, 18 achieved non-hallucination rates lower than 50%. This indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models.	翻訳日:2023-10-25 23:13:18 公開日:2023-10-24
# 自己教師型エンコーダ・デコーダ音声モデルのプロンプティングとアダプタチューニング Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model ( http://arxiv.org/abs/2310.02971v2 ) ライセンス: Link先を確認	Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-Wen Li, Hung-yi Lee	(参考訳) プロンプティングとアダプタチューニングがファインチューニング(FT)手法の効率的な代替手段として登場した。しかし、既存の音声プロンプティングの研究は分類タスクに焦点を当てており、より複雑なシーケンス生成タスクでは成果を上げられていない。加えて、アダプタチューニングは主にエンコーダのみの自己教師型モデルに適用されてきた。実験の結果,自己教師付きエンコーダ・デコーダモデルWav2Seqに対するプロンプティングは,シーケンス生成タスクにおいて従来研究を上回ることがわかった。 ASRでは単語誤り率が相対的に53%改善し,スロットフィリングではF1スコアが27%改善した。さらに、プロンプティングは低リソースシナリオにおいてFT法と競合する。さらに,言語間ASRにおけるWav2Seqのプロンプティングとアダプタチューニングの転移可能性を示す。訓練可能なパラメータが限られている場合、プロンプティングとアダプタチューニングは7つの言語で従来のFTより一貫して優れている。特に低リソースのシナリオでは、プロンプティングがアダプタチューニングを一貫して上回る。 Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. In addition, adapter tuning has mainly been applied to encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous work in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and a 27% improvement in F1 score for slot filling. Additionally, prompting competes with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When limited trainable parameters are involved, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning.	翻訳日:2023-10-25 23:12:32 公開日:2023-10-24
# Localization, fractality, and ergodicity in a monitored qubit ( http://arxiv.org/abs/2310.01997v2 ) License: see link	Paul Pöpperl, Igor V. Gornyi, David B. Saakian, Oleg M. Yevtushenko	We study the statistical properties of a single two-level system (qubit) subject to repetitive ancilla-based measurements. This setup is a fundamental minimal model for exploring the intricate interplay between the unitary dynamics of the system and the nonunitary stochasticity introduced by quantum measurements, which is central to the phenomenon of measurement-induced phase transitions. We demonstrate that this "toy model" harbors remarkably rich dynamics, manifesting in the distribution function of the qubit's quantum states in the long-time limit. We uncover a compelling analogy with the phenomenon of Anderson localization, albeit governed by distinct underlying mechanisms. Specifically, the state distribution function of the monitored qubit, parameterized by a single angle on the Bloch sphere, exhibits diverse types of behavior familiar from the theory of Anderson transitions, spanning from complete localization to almost uniform delocalization, with fractality occurring between the two limits. By combining analytical solutions for various special cases with two complementary numerical approaches, we achieve a comprehensive understanding of the structure delineating the "phase diagram" of the model. We categorize and quantify the emergent regimes and identify two distinct phases of the monitored qubit: ergodic and nonergodic. The transition between these two phases is our main finding.	Translated: 2023-10-25 23:11:59 Published: 2023-10-24
# TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models ( http://arxiv.org/abs/2310.10180v2 ) License: see link	Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, Qun Liu	Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models. However, current ATP benchmarks mainly focus on symbolic inference and rarely involve the understanding of complex number combination reasoning. In this work, we propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas and its capability to manipulate, group, and factor number terms. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We then automatically generate additional examples from the annotated samples to expand the dataset. Furthermore, we develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability. Our extensive experiments show that TRIGO poses a new challenge for advanced generative LMs, including GPT-4, which is pre-trained on a considerable amount of open-source formal theorem-proving language data, and that it provides a new tool for studying the ability of generative LMs in both formal and mathematical reasoning.	Translated: 2023-10-25 23:06:28 Published: 2023-10-24
# AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking ( http://arxiv.org/abs/2310.09706v2 ) License: see link	Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu	User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume that different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., that they reflect similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.	Translated: 2023-10-25 23:06:03 Published: 2023-10-24
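The similarity-order idea in the AdaptSSR abstract above can be illustrated with a small pairwise ranking loss. This is only a sketch under stated assumptions, not the authors' loss: the ordering implicit >= explicit >= other users is inferred from the order in which the abstract lists the views, the function names are hypothetical, and the augmentation-adaptive fusion mechanism is left out.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def similarity_order_loss(sim_implicit, sim_explicit, sim_other):
    """Pairwise logistic ranking loss for the assumed order
    implicit >= explicit >= other (hypothetical helper, not from the paper).

    Each term is -log(sigmoid(a - b)) = softplus(b - a), which is small
    when similarity a exceeds similarity b.
    """
    return softplus(sim_explicit - sim_implicit) + softplus(sim_other - sim_explicit)


# Similarities in the assumed order give a lower loss than the reversed order.
print(similarity_order_loss(0.9, 0.5, 0.1) < similarity_order_loss(0.1, 0.5, 0.9))
```

Unlike a contrastive objective, a ranking loss of this shape only asks for an ordering between views and does not push the augmented views to be maximally similar.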
# Point-DynRF: Point-based Dynamic Radiance Fields from a Monocular Video ( http://arxiv.org/abs/2310.09647v2 ) License: see link	Byeongjun Park, Changick Kim	Dynamic radiance fields have emerged as a promising approach for generating novel views from a monocular video. However, previous methods enforce geometric consistency on dynamic radiance fields only between adjacent input frames, which makes it difficult to represent the global scene geometry and causes degeneration at viewpoints that are spatio-temporally distant from the input camera trajectory. To solve this problem, we introduce point-based dynamic radiance fields (Point-DynRF), a novel framework where the global geometric information and the volume rendering process are trained by neural point clouds and dynamic radiance fields, respectively. Specifically, we reconstruct neural point clouds directly from geometric proxies and optimize both radiance fields and the geometric proxies using our proposed losses, allowing them to complement each other. We validate the effectiveness of our method with experiments on the NVIDIA Dynamic Scenes Dataset and several casually captured monocular video clips.	Translated: 2023-10-25 23:05:32 Published: 2023-10-24
# Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration ( http://arxiv.org/abs/2310.09168v3 ) License: see link	Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi	Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.	Translated: 2023-10-25 23:05:15 Published: 2023-10-24
# PuoBERTa: Training and evaluation of a curated language model for Setswana ( http://arxiv.org/abs/2310.09141v2 ) License: see link	Vukosi Marivate, Moseli Mots'Oehli, Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai	Natural language processing (NLP) has made significant progress for well-resourced languages such as English but has lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpus for PuoBERTa's training. Building upon previous efforts in creating monolingual resources for Setswana, we evaluated PuoBERTa across several NLP tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation. Additionally, we introduced a new Setswana news categorisation dataset and provided the initial benchmarks using PuoBERTa. Our work demonstrates the efficacy of PuoBERTa in fostering NLP capabilities for understudied languages like Setswana and paves the way for future research directions.	Translated: 2023-10-25 23:04:53 Published: 2023-10-24
# A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems ( http://arxiv.org/abs/2310.08644v2 ) License: see link	Yuan-Heng Wang, Hoshin V. Gupta	Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network (GRNN) technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.	Translated: 2023-10-25 23:04:35 Published: 2023-10-24
# Universal and nonuniversal probability laws in Markovian open quantum dynamics subject to generalized reset processes ( http://arxiv.org/abs/2310.06981v2 ) License: see link	Federico Carollo, Igor Lesanovsky, Juan P. Garrahan	We consider quantum jump trajectories of Markovian open quantum systems subject to stochastic-in-time resets of their state to an initial configuration. The reset events provide a partitioning of quantum trajectories into consecutive time intervals, defining sequences of random variables from the values of a trajectory observable within each of the intervals. For observables related to functions of the quantum state, we show that the probability of certain orderings in the sequences obeys a universal law. This law does not depend on the chosen observable and, in the case of Poissonian reset processes, not even on the details of the dynamics. When considering (discrete) observables associated with the counting of quantum jumps, the probabilities in general lose their universal character. Universality is only recovered in cases when the probability of observing equal outcomes in the same sequence is vanishingly small, which we can achieve in a weak reset rate limit. Our results extend previous findings on classical stochastic processes [N. R. Smith et al., EPL 142, 51002 (2023)] to the quantum domain and to state-dependent reset processes, shedding light on relevant aspects for the emergence of universal probability laws.	Translated: 2023-10-25 23:04:11 Published: 2023-10-24
# A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection ( http://arxiv.org/abs/2310.06498v2 ) License: see link	Shiping Yang, Renliang Sun, Xiaojun Wan	Large Language Models (LLMs) have shown their ability to collaborate effectively with humans in real-world scenarios. However, LLMs are apt to generate hallucinations, i.e., to make up incorrect text and unverified information, which can cause significant damage when deployed for mission-critical tasks. In this paper, we propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion. To facilitate future studies and assess different methods, we construct a hallucination detection benchmark named PHD, which is generated by ChatGPT and annotated by human annotators. In contrast to previous studies of zero-resource hallucination detection, our method and benchmark concentrate on passage-level detection instead of sentence-level detection. We empirically evaluate our method and existing zero-resource detection methods on two datasets. The experimental results demonstrate that the proposed method considerably outperforms the baselines while costing fewer tokens and less time. Furthermore, we manually analyze some hallucination cases that the LLM failed to capture, revealing the shared limitation of zero-resource methods.	Translated: 2023-10-25 23:03:47 Published: 2023-10-24
# Memory efficient location recommendation through proximity-aware representation ( http://arxiv.org/abs/2310.06484v2 ) License: see link	Xuan Luo, Mingqing Huang, Rui Lv, Hui Zhao	Sequential location recommendation plays a major role in modern life: it can enhance user experience, bring more profit to businesses, and assist in government administration. Although methods for location recommendation have evolved significantly thanks to the development of recommendation systems, there is still limited utilization of geographic information, along with the ongoing challenge of addressing data sparsity. In response, we introduce a Proximity-aware based region representation for Sequential Recommendation (PASR for short), built upon the Self-Attention Network architecture. We tackle the sparsity issue through a novel loss function employing importance sampling, which emphasizes informative negative samples during optimization. Moreover, PASR enhances the integration of geographic information by applying a self-attention-based geography encoder to the hierarchical grid and proximity grid at each GPS point. To further leverage geographic information, we utilize proximity-aware negative samplers to enhance the quality of negative samples. We conducted evaluations using three real-world Location-Based Social Networking (LBSN) datasets, demonstrating that PASR surpasses state-of-the-art sequential location recommendation methods.	Translated: 2023-10-25 23:03:28 Published: 2023-10-24
# AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention ( http://arxiv.org/abs/2310.05462v2 ) License: see link	Xianming Gu, Lihui Wang, Zeyu Deng, Ying Cao, Xingyu Huang and Yue-min Zhu	Multi-modal medical image fusion is essential for precise clinical diagnosis and surgical navigation since it can merge the complementary information in multiple modalities into a single image. The quality of the fused image depends on the extracted single-modality features as well as the fusion rules for multi-modal information. Although existing deep learning-based fusion methods can fully exploit the semantic features of each modality, they cannot distinguish the effective low- and high-frequency information of each modality and fuse them adaptively. To address this issue, we propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism based on the Fourier transform. Specifically, we propose the cross-attention fusion (CAF) block, which adaptively fuses features of two modalities in the spatial and frequency domains by exchanging key and query values, and then calculates the cross-attention scores between the spatial and frequency features to further guide the spatial-frequential information fusion. The CAF block enhances the high-frequency features of the different modalities so that the details in the fused images can be retained. Moreover, we design a novel loss function composed of structure loss and content loss to preserve both low- and high-frequency information. Extensive comparison experiments on several datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. The ablation experiments also validate the effectiveness of the proposed loss and fusion strategy.	Translated: 2023-10-25 23:03:08 Published: 2023-10-24
# Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining ( http://arxiv.org/abs/2310.12172v2 ) License: see link	Zhexiong Liu, Mohamed Elaraby, Yang Zhong, Diane Litman	This paper presents an overview of the ImageArg shared task, the first multimodal Argument Mining shared task, co-located with the 10th Workshop on Argument Mining at EMNLP 2023. The shared task comprises two classification subtasks: (1) Subtask-A: Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former determines the stance of a tweet containing an image and a piece of text toward a controversial topic (e.g., gun control and abortion). The latter determines whether the image makes the tweet text more persuasive. The shared task received 31 submissions for Subtask-A and 21 submissions for Subtask-B from 9 different teams across 6 countries. The top submission in Subtask-A achieved an F1-score of 0.8647 while the best submission in Subtask-B achieved an F1-score of 0.5561.	Translated: 2023-10-25 22:55:03 Published: 2023-10-24
# Can bin-wise scaling improve consistency and adaptivity of prediction uncertainty for machine learning regression? ( http://arxiv.org/abs/2310.11978v2 ) License: see link	Pascal Pernot	Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for prediction uncertainties of machine learning regression problems that is capable of more efficient corrections than uniform variance (or temperature) scaling. The original version of BVS uses uncertainty-based binning, which is aimed at improving calibration conditionally on uncertainty, i.e. consistency. I explore here several adaptations of BVS, in particular with alternative loss functions and a binning scheme based on an input feature (X) in order to improve adaptivity, i.e. calibration conditional on X. The performances of BVS and its proposed variants are tested on a benchmark dataset for the prediction of atomization energies and compared to the results of isotonic regression.	Translated: 2023-10-25 22:54:32 Published: 2023-10-24
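As a rough illustration of the uncertainty-based binning described in the abstract above, the sketch below rescales predicted uncertainties with one factor per bin. It rests on assumptions that are not taken from the paper: equal-count bins, strictly positive uncertainties, and a closed-form factor that sets the mean squared z-score of each bin to 1 rather than the loss functions the paper studies. The function name is hypothetical.

```python
import math


def binwise_variance_scaling(errors, uncertainties, n_bins=4):
    """Return rescaled uncertainties, one scale factor per uncertainty-sorted bin.

    Assumption for this sketch: within each bin the factor is
    s = sqrt(mean((error / uncertainty) ** 2)), so the mean squared
    z-score of that bin becomes 1 after scaling.
    """
    if len(errors) != len(uncertainties):
        raise ValueError("errors and uncertainties must have the same length")
    n = len(errors)
    # Indices sorted by predicted uncertainty (uncertainty-based binning).
    order = sorted(range(n), key=lambda i: uncertainties[i])
    scaled = [0.0] * n
    for b in range(n_bins):
        # Equal-count bins over the sorted indices.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        mean_z2 = sum((errors[i] / uncertainties[i]) ** 2 for i in members) / len(members)
        scale = math.sqrt(mean_z2)
        for i in members:
            scaled[i] = scale * uncertainties[i]
    return scaled
```

Uniform variance scaling corresponds to `n_bins=1`. Binning on an input feature instead of the uncertainty would only change the sort key, which is the direction the abstract describes for improving adaptivity.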
# From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification ( http://arxiv.org/abs/2310.11878v4 ) License: see link	Shanshan Xu, T.Y.S.S Santosh, Oana Ichim, Isabella Risini, Barbara Plank, Matthias Grabmair	In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset, RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. To our knowledge, this is the first work in legal NLP that focuses on human label variation. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. We further assess the explainability of SOTA COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.	Translated: 2023-10-25 22:54:16 Published: 2023-10-24
# Learning to Generate Parameters of ConvNets for Unseen Image Data ( http://arxiv.org/abs/2310.11862v2 ) License: see link	Shiye Wang, Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang	Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and resort to an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm and formulate the parameter learning of ConvNets as a prediction task: given a ConvNet architecture, we observe that there exist correlations between image datasets and their corresponding optimal network parameters, and explore whether we can learn a hyper-mapping between them to capture the relations, such that we can directly predict the parameters of the network for an image dataset never seen during the training phase. To do this, we put forward a new hypernetwork-based model, called PudNet, which intends to learn a mapping between datasets and their corresponding network parameters, and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model benefits from a series of adaptive hyper recurrent units sharing weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our proposed method achieves good efficacy for unseen image datasets in two kinds of settings: intra-dataset prediction and inter-dataset prediction. Our PudNet also scales well to large-scale datasets, e.g., ImageNet-1K. It takes 8967 GPU seconds to train ResNet-18 on ImageNet-1K using GC from scratch and obtain a top-5 accuracy of 44.65%. However, our PudNet costs only 3.89 GPU seconds to predict the network parameters of ResNet-18 while achieving comparable performance (44.92%), more than 2,300 times faster than the traditional training paradigm.	Translated: 2023-10-25 22:53:57 Published: 2023-10-24
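The "more than 2,300 times faster" figure in the abstract above can be checked directly from the two quoted GPU times. The snippet is a plain arithmetic check using only numbers from the abstract. It assumes both accuracy percentages refer to the same top-5 metric, which the abstract implies but does not state outright.

```python
# Figures quoted in the abstract above.
train_gpu_seconds = 8967.0   # training ResNet-18 on ImageNet-1K from scratch
predict_gpu_seconds = 3.89   # PudNet predicting the ResNet-18 parameters

speedup = train_gpu_seconds / predict_gpu_seconds
# Accuracy difference in percentage points (assumed to be the same top-5 metric).
accuracy_gap = 44.92 - 44.65

print(f"speedup: {speedup:.0f}x, accuracy gap: {accuracy_gap:+.2f} points")
```

The ratio comes out at roughly 2,305, which is consistent with the claim, and the reported accuracies differ by about 0.27 percentage points.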
# Neural Attention: Enhancing QKV Calculation in Self-Attention Mechanism with Neural Networks ( http://arxiv.org/abs/2310.11398v2 ) License: see link	Muhan Zhang	In the realm of deep learning, the self-attention mechanism has substantiated its pivotal role across a myriad of tasks, encompassing natural language processing and computer vision. Despite achieving success across diverse applications, the traditional self-attention mechanism primarily leverages linear transformations for the computation of query, key, and value (QKV), which may not invariably be the optimal choice under specific circumstances. This paper explores a novel methodology for QKV computation: implementing a specially designed neural network structure for the calculation. Utilizing a modified Marian model, we conducted experiments on the IWSLT 2017 German-English translation task dataset and compared our method with the conventional approach. The experimental results reveal a significant enhancement in BLEU scores with our method. Furthermore, our approach also showed superiority when training the RoBERTa model with the WikiText-103 dataset, reflecting a notable reduction in model perplexity compared to its original counterpart. These experimental outcomes not only validate the efficacy of our method but also reveal the immense potential in optimizing the self-attention mechanism through neural network-based QKV computation, paving the way for future research and practical applications. The source code and implementation details for our proposed method can be accessed at https://github.com/ocislyjrti/NeuralAttention.	Translated: 2023-10-25 22:53:04 Published: 2023-10-24
# VECHR:欧州人権裁判所における脆弱性タイプの説明可能かつロバストな分類のためのデータセット VECHR: A Dataset for Explainable and Robust Classification of Vulnerability Type in the European Court of Human Rights ( http://arxiv.org/abs/2310.11368v4 ) ライセンス: Link先を確認	Shanshan Xu, Leon Staufer, T.Y.S.S Santosh, Oana Ichim, Corina Heri, Matthias Grabmair	(参考訳) 脆弱性を認識することは,対象とするサポートの理解と実装において極めて重要である。これは欧州人権裁判所(ECtHR)において特に重要であり、裁判所は条約の基準を実際の個人のニーズに適合させ、それによって効果的な人権保護を確保する。しかし、脆弱性の概念はECtHRではいまだ解明されておらず、これまでのNLP研究では対応していない。そこで本研究では,脆弱性型分類と説明的根拠からなる,新たな専門家によるマルチラベルデータセットであるVECHRを提案する。予測可能性と説明可能性の両方の観点から,VECHRの最先端モデルの性能をベンチマークする。結果は,予測性能が低く,モデルと専門家の合意が限られているタスクの難易度を示す。さらに,out-of-domain(OOD)データを扱う際のモデルのロバスト性を分析し,全体として限定的な性能を観測する。私たちのデータセットは、パフォーマンス、説明可能性、堅牢性に関する大きな改善の余地を提供するユニークな課題をもたらします。 Recognizing vulnerability is crucial for understanding and implementing targeted support to empower individuals in need. This is especially important at the European Court of Human Rights (ECtHR), where the court adapts Convention standards to meet actual individual needs and thus ensures effective human rights protection. However, the concept of vulnerability remains elusive at the ECtHR and no prior NLP research has dealt with it. To enable future research in this area, we present VECHR, a novel expert-annotated multi-label dataset comprising vulnerability type classification and explanation rationale. We benchmark the performance of state-of-the-art models on VECHR from both prediction and explainability perspectives. Our results demonstrate the challenging nature of the task with lower prediction performance and limited agreement between models and experts. Further, we analyze the robustness of these models in dealing with out-of-domain (OOD) data and observe overall limited performance. Our dataset poses unique challenges offering significant room for improvement regarding performance, explainability, and robustness.	翻訳日:2023-10-25 22:52:27 公開日:2023-10-24
# 弱視を利用してインドネシアの保全データセットを生成する Utilizing Weak Supervision To Generate Indonesian Conservation Dataset ( http://arxiv.org/abs/2310.11258v2 ) ライセンス: Link先を確認	Mega Fransiska, Diah Pitaloka, Saripudin, Satrio Putra, Lintang Sutawika	(参考訳) 弱監視は、NLP開発を加速する需要の増加に対応する、迅速かつ大規模データセット作成のための有望なアプローチとして現れている。ラベル機能を利用することで、弱い監督により、ソフトラベル付きデータセットを生成する学習ラベルモデルを作成することで、実践者が迅速にデータセットを生成することができる。本稿では,インドネシアのNLPデータセットを保護ニューステキストから構築する方法について述べる。マルチクラス分類と感情分類の2種類のデータセットを構築した。次に、様々な事前学習言語モデルを用いてベースライン実験を行う。これらの基準値は59.79%の精度と55.72%のF1スコア、66.87%のF1スコアマクロ、71.5%のF1スコアマイクロ、83.67%のROC-AUCの試験結果を示している。さらに,本研究で使用されるデータセットとラベル機能もリリースして,さらなる研究と探索を行う。 Weak supervision has emerged as a promising approach for rapid and large-scale dataset creation in response to the increasing demand for accelerated NLP development. By leveraging labeling functions, weak supervision allows practitioners to generate datasets quickly by creating learned label models that produce soft-labeled datasets. This paper aims to show how such an approach can be utilized to build an Indonesian NLP dataset from conservation news text. We construct two types of datasets: multi-class classification and sentiment classification. We then provide baseline experiments using various pretrained language models. These baseline results demonstrate test performances of 59.79% accuracy and 55.72% F1-score for sentiment classification, 66.87% F1-score-macro, 71.5% F1-score-micro, and 83.67% ROC-AUC for multi-class classification. Additionally, we release the datasets and labeling functions used in this work for further research and exploration.	翻訳日:2023-10-25 22:51:46 公開日:2023-10-24
# 導波路QEDにおける量子多光子ラビ振動 Quantum Multiphoton Rabi Oscillations in Waveguide QED ( http://arxiv.org/abs/2310.15412v1 ) ライセンス: Link先を確認	Debsuvra Mukhopadhyay and Jung-Tsung Shen	(参考訳) 量子情報処理の未来は、チップスケールのナノフォトニクス、特にキャビティQEDと導波路QEDである。量子フォトニクス技術を支える最前線のプロセスの1つは、強いレーザー源によって量子ビットが照射されたときに現れるラビ振動現象である。従来の半古典的枠組みとは別に、光励起が多光子フォック状態の形で、キュービットカップルが放射線モードの連続体となるより一般的な量子論的ケースについて述べる。実空間の定式化を利用して、2レベルエミッタと相互作用するフォトニックフォック状態の散乱ダイナミクスを解析的に探索する。原子励起の振幅は、逐次光子吸収と放出のポテンシャルによって引き起こされる様々な独立した散乱事象の線形重ね合わせを示す。数個の光子のうちの1つが確率的散乱によって始められた最低次励起は、弱場環境におけるダイナミクスを適切に特徴づける。これは、原子-光子相互作用の繰り返しによる高次散乱現象によって補われる。我々の構成におけるクォービット励起の時間的進化は、特にラビ振動が展開する強跳躍極限において、半古典的な予測を密接に反映している。特に、この半古典的パラダイムとの互換性は、弱い運転と大きな調整の限界の両方に適用される。したがって,本解析では,単一モードキャビティqedに関連する量子ラビ振動の既存の結果から,光子を情報キャリアとするマルチモード導波路qed構成まで拡張する。最後に、パルス波パケットの散乱ダイナミクスについて検討し、少数の光子を含むシナリオにおいても励起効率を大幅に向上させる可能性を明らかにする。 The future of quantum information processing hinges on chip-scale nanophotonics, specifically cavity QED and waveguide QED. One of the foremost processes underpinning quantum photonic technologies is the phenomenon of Rabi oscillations, which manifests when a qubit is irradiated by an intense laser source. Departing from the conventional semiclassical framework, we expound on the more general, quantum-theoretic case where the optical excitation takes the form of a multiphoton Fock state, and the qubit couples to a continuum of radiation modes. By employing the real-space formalism, we analytically explore the scattering dynamics of the photonic Fock state as it interfaces with a two-level emitter. The resulting amplitude for atomic excitation features a linear superposition of various independent scattering events that are triggered by the potential of sequential photon absorptions and emissions. The lowest-order excitation event, initiated by the stochastic scattering of one of the several photons, aptly characterizes the dynamics in a weak-field environment. 
This is complemented by a multitude of higher-order scattering events ensuing from repeated atom-photon interactions. The temporal evolution of the qubit excitation in our configuration closely mirrors the semiclassical predictions, particularly in the strong-pumping limit where Rabi oscillations unfold. Notably, this compatibility with the semiclassical paradigm applies both to the weak-driving and large-detuning limits. Our analysis, therefore, extends the existing results on quantum Rabi oscillations pertinent to single-mode cavity QED, to the multimode, waveguide-QED configurations wherein flying photons are the information carriers. Finally, we explore the scattering dynamics of pulsed wave packets, highlighting the potential to substantially enhance excitation efficiency, even in scenarios involving just a few photons.	翻訳日:2023-10-25 21:24:17 公開日:2023-10-24
# constitutionmaker: フィードバックを原則に変換することで、大規模言語モデルをインタラクティブに評価する ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles ( http://arxiv.org/abs/2310.15428v1 ) ライセンス: Link先を確認	Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry	(参考訳) 大きな言語モデル(LLM)のプロンプトは、ユーザが独自のチャットボットを作成してカスタマイズするための、有望な新しいアプローチである。しかしながら、プロンプトエンジニアリングや微調整といったチャットボットのアウトプットを操作する現在の方法は、モデルのアウトプットに対する自然なフィードバックをプロンプトやモデルの変更に変換するユーザをサポートしない。本研究では,フィードバックをモデル動作を規定する一連の原則(コンスティチューション)に変換するのを支援することにより,ユーザがフィードバックを通じてインタラクティブにモデルアウトプットを洗練する方法について検討する。フォーマティブな研究から,(1)ユーザはフィードバックをチャットボットの原則に変換することを支援する必要があり,(2)ユーザが望む原則の種類を分類する必要があることがわかった。このような知見に触発されて,ユーザフィードバックを原則に変換するインタラクティブなツールであるconstitutionmakerを,llmベースのチャットボットとして開発した。 ConstitutionMakerでは、自然言語で肯定的あるいは否定的なフィードバック、自動生成されたフィードバックの選択、チャットボットの応答の書き直し、各フィードバックモードが自動的にチャットボットのプロンプトに挿入される原則を生成する。 14人の参加者によるユーザ調査では、constitutionmakerとablatedバージョンを比較して、ユーザが独自の原則を記述した。 constitutionmakerでは、参加者は彼らの原則がチャットボットをよりガイドし、フィードバックをより簡単に原則に変換し、より効率的に、よりメンタルな要求なしに原則を書くことができると感じた。 ConstitutionMakerは、ユーザーがチャットボットを改善する方法を特定し、モデルに対する直感的な反応をフィードバックに定式化し、フィードバックを具体的で明確な原則に変換するのに役立つ。これらの知見は,LLM出力の対話的クオリティ向上を支援する将来的なツールである。 Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. 
Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.	翻訳日:2023-10-25 21:12:10 公開日:2023-10-24
# Mason-Alberta音声セグメント:ディープニューラルネットワークと補間に基づく強制アライメントシステム The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation ( http://arxiv.org/abs/2310.15425v1 ) ライセンス: Link先を確認	Matthew C. Kelley, Scott James Perry, Benjamin V. Tucker	(参考訳) 強制アライメントシステムは,音声データのセグメント間の境界を自動的に決定する。これらのツールは、手作業で書き起こしやセグメント化できない音声データの使用を容易にするために、音韻学では一般的である。本稿では,新しいニューラルネットワークに基づく強制アライメントシステム,Mason-Alberta Phonetic Segmenter(MAPS)について述べる。 MAPSアライメントは、強制アライメントシステムのために私たちが追求する2つの改善のためのテストベッドとして機能します。第一は、音声のセグメントが真に離散的ではなく、一般的に重複しているという共通の理解によって動機付けられた分類タスクではなく、強制ライナーで音響モデルをタグ付けタスクとして扱うことである。 2つ目は、現代の強制アライメントシステムにおいて一般的な10ミリ秒制限よりも正確な境界を許容する補間技術である。本システムの構成を最先端システムであるモントリオール強制調整機と比較した。タギングのアプローチはモントリオール強制アリグナーよりも改善された結果をもたらすことはなかった。しかし、補間技術を備えたシステムは、試験セット上の目標の10ms以内の境界の量において、モントリオール強制調整機と比較して27.92%増加した。また,音響モデリングの課題と訓練過程を強制的に調整し,これらのモデルの出力対象が電話との類似性の概念とどのように一致しないか,また,この緊張の解消にはタスクと出力対象の再検討や音声自体のセグメント化が必要となる可能性があることを強調する。 Forced alignment systems automatically determine boundaries between segments in speech data, given an orthographic transcription. These tools are commonplace in phonetics to facilitate the use of speech data that would be infeasible to manually transcribe and segment. In the present paper, we describe a new neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). The MAPS aligner serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model in a forced aligner as a tagging task, rather than a classification task, motivated by the common understanding that segments in speech are not truly discrete and commonly overlap. The second is an interpolation technique to allow boundaries more precise than the common 10 ms limit in modern forced alignment systems. We compare configurations of our system to a state-of-the-art system, the Montreal Forced Aligner. 
The tagging approach did not generally yield improved results over the Montreal Forced Aligner. However, a system with the interpolation technique had a 27.92% increase relative to the Montreal Forced Aligner in the number of boundaries within 10 ms of the target on the test set. We also reflect on the task and training process for acoustic modeling in forced alignment, highlighting how the output targets for these models do not match phoneticians' conception of similarity between phones and that reconciliation of this tension may require rethinking the task and output targets or how speech itself should be segmented.	翻訳日:2023-10-25 21:11:39 公開日:2023-10-24
# 分子ポラリトンの線形応答 Linear response of molecular polaritons ( http://arxiv.org/abs/2310.15424v1 ) ライセンス: Link先を確認	Joel Yuen-Zhou and Arghadip Koner	(参考訳) 本稿では,光学キャビティの光子モードにN$分子エミッタが結合する集合光物質強結合系を,光子が不純物である量子不純物モデルにマッピングし,不調和遷移の浴に結合することを示す。 N\gg1$の熱力学限界では、この浴を効果的な調和風呂に置き換えることにより、問題を劇的に単純化して調和振動子の1つにすることができる。分子入力に必要な唯一の分子入力が分子線感受性である線形光学スペクトル(透過,反射,吸収)の単純な解析式を導出する。この形式化は、温度、障害、ビブロンカップリング、および分子アンサンブルの光学的飽和の役割を示す一連の例に適用され、非線形光学実験の重要なクラスを記述する際にも有用である。完全性のために、回転波近似における任意の無調波系(大小ともにN$)に対する分光観測器の自己完結型導出を含む包括的近似を提供する。提案された結果のいくつかは既に文献で報告されているが、オープン量子系における強力な概念と線形応答理論と分子分極論を結びつける新しい解釈と同様に、結果を統一的に提示する。 In this article, we show that the collective light-matter strong coupling regime, where $N$ molecular emitters couple to the photon mode of an optical cavity, can be mapped to a quantum impurity model where the photon is the impurity that is coupled to a bath of anharmonic transitions. In the thermodynamic limit where $N\gg1$, we argue that the bath can be replaced with an effective harmonic bath, leading to a dramatic simplification of the problem into one of coupled harmonic oscillators. We derive simple analytical expressions for linear optical spectra (transmission, reflection, and absorption) where the only molecular input required is the molecular linear susceptibility. This formalism is applied to a series of illustrative examples showcasing the role of temperature, disorder, vibronic coupling, and optical saturation of the molecular ensemble, explaining that it is useful even when describing an important class of nonlinear optical experiments. For completeness, we provide a comprehensive Appendix that includes a self-contained derivation of the relevant spectroscopic observables for arbitrary anharmonic systems (for both large and small $N$) within the rotating-wave approximation. 
While some of the presented results herein have already been reported in the literature, we provide a unified presentation of the results as well as new interpretations that connect powerful concepts in open quantum systems and linear response theory with molecular polaritonics.	翻訳日:2023-10-25 21:11:10 公開日:2023-10-24
# G2-MonoDepth:単分子RGB+Xデータからの一般化深度推論の一般的なフレームワーク G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data ( http://arxiv.org/abs/2310.15422v1 ) ライセンス: Link先を確認	Haotian Wang, Meng Yang, and Nanning Zheng	(参考訳) 単眼深度推定はロボットのシーン認識の基本的な問題である。特定のロボットにはカメラと任意のタイプの奥行きセンサーが装備され、様々なスケールの様々なシーンに配置できるが、近年の進歩は複数のサブタスクを派生させた。これにより、特定のロボットの微調整モデルにさらなる負担がかかり、大規模な工業化において高コストでカスタマイズできる。本稿では,様々なロボットから入力されたあらゆるデータから高品質な深度マップを推定する単眼深度推定の統一課題について検討する。基本的なベンチマーク G2-MonoDepth はこのタスクのために開発されている。 (a)rgbプラス多様なシーンスケール/セマンティクス、深さスパーシティ([0%, 100%])、エラー(ホール/ノイズ/ブラル)の生深度に対応する統一データ表現rgb+x。 (b)入力生データの深度・深度・誤り及び出力シーンの多様さに対応するための新たな統一的損失 (c)多様なシーンスケールを入力から出力へよく伝達する改良されたネットワーク、及び (d) トレーニング用の生深度マップで実際のすべての種類のアーティファクトをシミュレートするデータ拡張パイプライン。 G2-MonoDepthは、深度推定、鮮度の違いによる深度補完、見えないシーンでの深度向上を含む3つのサブタスクに適用され、現実世界のデータと合成データの両方でSOTAベースラインを常に上回る。 Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances derived multiple individual sub-tasks. It leads to additional burdens to fine-tune models for specific robots and thereby high-cost customization in large-scale industrialization. This paper investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of input raw data from various robots in unseen scenes. 
A basic benchmark G2-MonoDepth is developed for this task, which comprises four components: (a) a unified data representation RGB+X to accommodate RGB plus raw depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth sparsity/errors of input raw data and diverse scales of output scenes, (c) an improved network that propagates diverse scene scales well from input to output, and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied in three sub-tasks including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it consistently outperforms SOTA baselines on both real-world data and synthetic data.	翻訳日:2023-10-25 21:10:48 公開日:2023-10-24
# FANToM: インタラクションにおける心のストレステストマシン理論のベンチマーク FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions ( http://arxiv.org/abs/2310.15421v1 ) ライセンス: Link先を確認	Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap	(参考訳) 心の理論(ToM)評価は、相互作用性に本質的に欠ける受動的物語を用いたテストモデルに焦点を当てている。本稿では,情報非対称な会話文脈におけるToMのストレステストを目的とした新しいベンチマークであるFANToMを紹介する。本ベンチマークは,大規模言語モデル(llm)の評価において,心理学から重要な理論的要件と必要な経験的考察を導出する。特に,LLMにおける視覚的・虚偽のToM能力を識別するために,同じ推論を要求される複数の質問を定式化する。 FANToMは、チェーン・オブ・シークレット・推論や微調整でさえも、人間よりもはるかにパフォーマンスが悪く、最先端のLLMでは困難であることを示す。 Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.	翻訳日:2023-10-25 21:10:19 公開日:2023-10-24
# 短文トピックモデリングのための事前学習型言語モデル"Imagine" Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling ( http://arxiv.org/abs/2310.15420v1 ) ライセンス: Link先を確認	Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang	(参考訳) トピックモデルは、ドキュメントコレクション内の潜在意味論を発見するための魅力的な方法の1つです。しかし、トピックモデルが有効に機能するには、ドキュメントが十分な共起情報を持っていることが前提となる。短いテキストでは、共起情報は最小限であり、結果として文書表現に特徴のスパース性が生じる。したがって、既存のトピックモデル(確率的または神経的)は、主にパターンをマイニングして一貫性のあるトピックを生成するのに失敗する。本稿では,既存の事前学習言語モデル(PLM)を用いて,短いテキストを長いシーケンスに拡張することで,データスパース性問題に対処する,短文トピックモデリングの新しいアプローチを提案する。さらに、PLMからノイズの多い話題テキスト生成の効果を低減するために、ニューラルトピックモデルを拡張した簡単なソリューションを提供する。我々は,本モデルが短文トピックモデリングの性能を大幅に向上させることができることを観察した。極端なデータスパーシティシナリオの下での複数の実世界のデータセットに関する広範囲な実験は、我々のモデルが最先端のモデルよりも高品質なトピックを生成できることを示しています。 Topic models are among the compelling methods for discovering latent semantics in a document collection. However, they assume that a document has sufficient co-occurrence information to be effective. In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation. Therefore, existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics. In this paper, we take a new approach to short-text topic modeling to address the data-sparsity issue by extending short text into longer sequences using existing pre-trained language models (PLMs). Besides, we provide a simple solution extending a neural topic model to reduce the effect of noisy out-of-topics text generation from PLMs. We observe that our model can substantially improve the performance of short-text topic modeling. Extensive experiments on multiple real-world datasets under extreme data sparsity scenarios show that our models can generate high-quality topics outperforming state-of-the-art models.	翻訳日:2023-10-25 21:10:09 公開日:2023-10-24
# 政策最適化におけるフラクタル景観 Fractal Landscapes in Policy Optimization ( http://arxiv.org/abs/2310.15418v1 ) ライセンス: Link先を確認	Tao Wang, Sylvia Herbert and Sicun Gao	(参考訳) 政策勾配は、継続的ドメインにおける深層強化学習(RL)の中核にある。多くの成功にもかかわらず、政策勾配によるRLトレーニングは、既知の解に対する標準的な制御問題でさえも、多くの理由で失敗する可能性があると、実際にはしばしば見られている。ポリシ空間における最適化の展望は,あるクラスのMDPに対して極めて非平滑あるいはフラクタルであり,そもそも勾配を推定する手段が存在しない,という,ポリシー勾配アプローチの固有の制限を理解するための枠組みを提案する。カオス理論と非スムース解析の手法を考察し,政策最適化目標の最大リアプノフ指数とHölder指数を分析した。さらに,学習過程がフラクタルランドスケープに遭遇したときのサンプルから目的関数の局所的滑らかさを推定する実用的な手法を開発した。このようなフラクタルな景観によって、政策最適化の失敗事例をいかに説明できるかを示す実験を示す。 Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that no gradient exists to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and Hölder exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of the objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.	翻訳日:2023-10-25 21:09:52 公開日:2023-10-24
# 点/系列再構成による名目性スコア条件付き時系列異常検出 Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction ( http://arxiv.org/abs/2310.15416v1 ) ライセンス: Link先を確認	Chih-Yu Lai, Fan-Keng Sun, Zhengqi Gao, Jeffrey H. Lang, and Duane S. Boning	(参考訳) 時系列異常検出は、複雑で様々なパターンが発生するため困難である。時間依存関係をモデル化して、点異常の検出精度を維持しながらコンテキスト異常を見つけることが大きな課題である。本稿では,ポイントベースおよびシーケンスベース再構成モデルを用いた教師なし時系列異常検出のためのフレームワークを提案する。点ベースモデルは点異常の定量化を試み、シーケンスベースモデルは点と文脈異常の定量化を試みる。観測された時刻が名目時点から2段階のずれ値であるという定式化において、復元誤差の組合せ値の比率から算出した名目スコアを導入する。本研究は,発音スコアと異常スコアとを更に統合して誘導異常スコアを導出し,特定の条件下で誘導異常スコアが元の異常スコアよりも優れていることを理論的に証明する。いくつかの公開データセットに関する広範な研究により、提案されたフレームワークは、時系列異常検出のための最先端のベースラインよりも優れていることが示されている。 Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.	翻訳日:2023-10-25 21:09:36 公開日:2023-10-24
# 会話間のギャップを意識して-長期対話生成の改善 Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation ( http://arxiv.org/abs/2310.15415v1 ) ライセンス: Link先を確認	Qiang Zhang, Jason Naradowsky, Yusuke Miyao	(参考訳) 会話の終わり方や再開方法を知ることは、コミュニケーションの自然な部分であり、数週間、数ヶ月、数年にわたる議論を可能にする。会話間のギャップの期間は、どのトピックが関連しているか、どの質問をするかを判断し、明確にモデル化されていない対話システムは、不自然な応答を生成する。本稿では,対話モデルに時間を認識し,セッション間の時間が異なるマルチセッション対話データセットであるgapchatを提案する。データセットはリアルタイムに構築されているが、話者の生活における出来事の進行をシミュレートして、長い時間間隔で発生する現実的な対話を生成する。時間情報をモデルに公開し、時間とイベントの進捗の異なる表現を比較します。人的評価において、時間認識モデルは、選択したトピックと会話から得られる情報との関係を判断する指標において、より良い性能を示すことを示す。 Knowing how to end and resume conversations over time is a natural part of communication, allowing for discussions to span weeks, months, or years. The duration of gaps between conversations dictates which topics are relevant and which questions to ask, and dialogue systems which do not explicitly model time may generate responses that are unnatural. In this work we explore the idea of making dialogue models aware of time, and present GapChat, a multi-session dialogue dataset in which the time between each session varies. While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan. We expose time information to the model and compare different representations of time and event progress. In human evaluation we show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.	翻訳日:2023-10-25 21:09:18 公開日:2023-10-24
# 人間とAIのコラボレーションに関する諸条約 Diverse Conventions for Human-AI Collaboration ( http://arxiv.org/abs/2310.15414v1 ) ライセンス: Link先を確認	Bidipta Sarkar and Andy Shih and Dorsa Sadigh	(参考訳) コンベンションは、プレイヤーが明示的なコミュニケーションなしに共有戦略で協調できるため、協調マルチエージェントゲームにおける強力なパフォーマンスに不可欠である。残念ながら、セルフプレイのような標準的なマルチエージェント強化学習技術は、任意で非多様性の慣習に収束し、新しいパートナーと対話する際には一般化が不十分になる。本研究は,(1)自己プレイ中の報酬を最大化し,(2)発見済みの規約(クロスプレイ)で遊ぶ際の報酬を最小化し,意味的に異なる規約を刺激することにより,多様な慣習を生成する手法を提案する。クロスプレイの逆最適化に拘わらず,学習した政策が忠実に振る舞うようにするために,自己プレイとクロスプレイの遷移をサンプリングして初期状態をランダムに生成し,この初期状態から自己プレイの報酬を最大化することを学習する「mixed-play」を導入する。我々は,Overcookedを含む様々なマルチエージェント協調ゲームにおける手法の利点を分析し,本手法が実際のユーザとペアリングした場合の人間レベルのパフォーマンスを越えながら,人間の慣行に適応できることを見出した。 Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.	翻訳日:2023-10-25 21:09:02 公開日:2023-10-24
# DeepIron:1枚の画像から未処理のガーメントテクスチャを予測する DeepIron: Predicting Unwarped Garment Texture from a Single Image ( http://arxiv.org/abs/2310.15447v1 ) ライセンス: Link先を確認	Hyun-Song Kwon, Sung-Hee Lee	(参考訳) 画像からの3D衣服のリアルな再構築は、アバター作成や仮想試着など幅広い応用がある。本稿では,1枚の写真から3次元衣料のテクスチャマップを再構築する新しい枠組みを提案する。 2次元縫製パターンを縫い合わせることで3D衣服をモデル化すると、その具体的目的は縫製パターンのテクスチャ画像を作成することである。本フレームワークの重要な構成要素であるテクスチュア・アンワーパーは、入力された衣服画像から本来のテクスチャイメージを推測し、ユーザの身体形状やポーズによるテクスチャのゆらぎと隠蔽を示す。 Texture Unwarperは、2つの画像の潜在空間をマッピングすることで、入力画像と出力画像の間で効果的に変換する。入力された衣服の本来のテクスチャを推定することで、新しいポーズのためにリアルに変形した高品質なテクスチャ画像を表示できる3d衣料モデルの再構築を支援する。他の方法との比較とアブレーション研究を通じて,本手法の有効性を検証する。さらに, 衣服を装着したアバターのテクスチャやイメージを付加した衣服縫製パターンの大規模データセットを公開し, 今後, テクスチャの再構築と合成研究に役立てる予定である。 Realistic reconstruction of 3D clothing from an image has wide applications, such as avatar creation and virtual try-on. This paper presents a novel framework that reconstructs the texture map for 3D garments from a single image with pose. Assuming that 3D garments are modeled by stitching 2D garment sewing patterns, our specific goal is to generate a texture image for the sewing patterns. A key component of our framework, the Texture Unwarper, infers the original texture image from the input clothing image, which exhibits warping and occlusion of texture due to the user's body shape and pose. The Texture Unwarper effectively transforms between the input and output images by mapping the latent spaces of the two images. By inferring the unwarped original texture of the input garment, our method helps reconstruct 3D garment models that can show high-quality texture images realistically deformed for new poses. We validate the effectiveness of our approach through a comparison with other methods and ablation studies. Additionally, we release a large dataset of garment sewing patterns with textures and images of avatars wearing the garments, which will be useful for future research on garment texture reconstruction and synthesis.	翻訳日:2023-10-25 21:03:38 公開日:2023-10-24
# 高速伝播: サンプリングサブネットワークによる単段攻撃訓練の高速化 Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks ( http://arxiv.org/abs/2310.15444v1 ) ライセンス: Link先を確認	Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai and Xiaochun Cao	(参考訳) 敵のトレーニングでは、敵の例に対して堅牢なモデルを構築することが期待されている。逆行訓練の大きな欠点は、逆行例の生成によって引き起こされる計算オーバーヘッドである。この制限を克服するため、単段階攻撃に基づく敵の訓練が検討されている。これまでの作業は、サンプル初期化、損失正規化、トレーニング戦略など、異なる視点からの一段階の敵訓練を改善する。ほとんど全員が、基盤となるモデルをブラックボックスとして扱う。本研究では,モデルの内部構造ブロックを利用して効率を向上させることを提案する。具体的には、トレーニング中の代理モデルとして軽量サブネットワークを動的にサンプリングすることを提案する。これにより、効果的に対向訓練を行うために、前方と後方の両方のパスを加速することができる。さらに,モデルロバスト性が,サンプルサブネットワークを用いた単段逆訓練によって向上することを示すための理論的解析を行う。さらに, サンプリングを層ごとに, 繰り返しから繰り返しへと変化させる新しいサンプリング手法を提案する。従来の手法と比較して,本手法はトレーニングコストを削減するだけでなく,モデル堅牢性を向上する。一連の人気データセットの評価は、提案したFP-Betterの有効性を示す。私たちのコードは https://github.com/jiaxiaojunQAQ/FP-Better で公開されています。 Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves the single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically, we propose to dynamically sample lightweight subnetworks as a surrogate model during training. By doing this, both the forward and backward passes can be accelerated for efficient adversarial training. Besides, we provide theoretical analysis to show the model robustness can be improved by the single-step adversarial training with sampled subnetworks. Furthermore, we propose a novel sampling strategy where the sampling varies from layer to layer and from iteration to iteration. 
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness. Evaluations on a series of popular datasets demonstrate the effectiveness of the proposed FP-Better. Our code has been released at https://github.com/jiaxiaojunQAQ/FP-Better.	翻訳日:2023-10-25 21:03:15 公開日:2023-10-24
# 量子アニール法による線形方程式解法アルゴリズムの収束率 Convergence rate of algorithms for solving linear equations by quantum annealing ( http://arxiv.org/abs/2310.15441v1 ) ライセンス: Link先を確認	V. Shalgin, S. Tikhomirov	(参考訳) 量子アニーリングの原理に基づく量子コンピュータを用いて線形方程式$ax=b$を解くための様々な反復アルゴリズムを考える。コンピュータの出力がボルツマン分布によって記述されていると仮定すると、方程式解法アルゴリズムが収束する条件下で、それらの収束率の推定値が提供される。無限個の量子ビットと少数の量子ビットの両方を用いたアルゴリズムへのこのアプローチの適用について論じる。 We consider various iterative algorithms for solving the linear equation $ax=b$ using a quantum computer operating on the principle of quantum annealing. Assuming that the computer's output is described by the Boltzmann distribution, it is shown under which conditions the equation-solving algorithms converge, and an estimate of their convergence rate is provided. The application of this approach to algorithms using both an infinite number of qubits and a small number of qubits is discussed.	翻訳日:2023-10-25 21:02:57 公開日:2023-10-24
# 線形VAEにおける学習ダイナミクス: 事後崩壊の閾値, 余剰な潜在空間の落とし穴, KLアニーリングによる高速化 Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing ( http://arxiv.org/abs/2310.15440v1 ) ライセンス: Link先を確認	Yuma Ichikawa and Koji Hukushima	(参考訳) 変分自己エンコーダ(VAEs)は、変分事後分布がしばしば事前分布と密接に一致するという悪名高い問題に直面しており、事後崩壊と呼ばれるこの現象は表現学習の質を妨げる。この問題を緩和するために、調整可能なハイパーパラメータ$\beta$と、KLアニーリングと呼ばれるこのパラメータをアニールする戦略が提案されている。本研究では,最小限のVAEにおける学習ダイナミクスの理論的解析を行う。ダイナミクスが大きな入力次元の極限で決定論的プロセスに収束することが厳密に証明され、一般化誤差の詳細な動的解析が可能になる。さらに, VAEはまず絡み合った表現を学習し, 徐々に解きほぐされた表現を獲得する。決定論的プロセスの固定点分析により、$\beta$ が一定の閾値を超えると、学習期間に関係なく事後崩壊は避けられないことが分かる。さらに、データ生成因子に対して余剰な潜在変数は背景雑音への過適合につながり、一般化と学習収束の両方に悪影響を及ぼす。この分析により、適切に調整されたKLアニーリングが収束を加速することが明らかとなった。 Variational autoencoders (VAEs) face a notorious problem wherein the variational posterior often aligns closely with the prior, a phenomenon known as posterior collapse, which hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, are proposed. This study presents a theoretical analysis of the learning dynamics in a minimal VAE. It is rigorously proved that the dynamics converge to a deterministic process within the limit of large input dimensions, thereby enabling a detailed dynamical analysis of the generalization error. Furthermore, the analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled representations. A fixed-point analysis of the deterministic process reveals that when $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Additionally, the superfluous latent variables for the data-generative factors lead to overfitting of the background noise; this adversely affects both generalization and learning convergence. The analysis further reveals that appropriately tuned KL annealing can accelerate convergence.	
翻訳日:2023-10-25 21:02:48 公開日:2023-10-24
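To make the KL annealing mentioned in the entry above concrete, the sketch below ramps the KL weight linearly from 0 to a target $\beta$ over a warm-up period and plugs it into a $\beta$-VAE style loss. The linear shape and the names `kl_weight`, `warmup_steps`, and `beta_max` are assumptions for illustration; the schedule analyzed in the paper may differ.

```python
def kl_weight(step: int, warmup_steps: int, beta_max: float = 1.0) -> float:
    """Linearly anneal the KL weight from 0 to beta_max over warmup_steps."""
    if warmup_steps <= 0:
        return beta_max  # no warm-up: use the full weight immediately
    frac = min(max(step / warmup_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return beta_max * frac


def beta_vae_loss(recon_loss: float, kl_div: float, step: int,
                  warmup_steps: int, beta_max: float = 1.0) -> float:
    """Reconstruction term plus the annealed, beta-weighted KL term."""
    return recon_loss + kl_weight(step, warmup_steps, beta_max) * kl_div
```

Early in training the KL term is nearly switched off, so the model can fit the data before the pull toward the prior reaches full strength. Note that `beta_max` still sets the final weight, so annealing alone does not change which side of the collapse threshold the final $\beta$ sits on.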
# k-haters:ターゲット別評価を用いた韓国におけるヘイトスピーチ検出コーパス K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings ( http://arxiv.org/abs/2310.15439v1 ) ライセンス: Link先を確認	Chaewon Park, Soohwan Kim, Kyubyong Park, Kunwoo Park	(参考訳) オンライン憎しみの拡散に対抗するために、多くのデータセットが提案されている。これらの努力にもかかわらず、これらの資源の大半は英語中心であり、主に過度な憎しみの形式に焦点を当てている。この研究ギャップは、より微妙な憎悪表現をカプセル化した多様な言語で高品質なコーパスを開発することを要求する。本研究では,韓国におけるヘイトスピーチ検出のための新しいコーパスであるK-HATERSを紹介する。このリソースは韓国で最大の攻撃的言語コーパスであり、ターゲット固有の評価を3ポイントのlikertスケールで提供し、さまざまな攻撃性を通じて韓国における憎悪表現の検出を可能にした。提案コーパスの有効性を示す実験を行い,既存のデータセットとの比較を行った。さらに,人間の注釈における潜在的なノイズやバイアスに対処するために,個人の認知能力を評価するための社会科学において広く用いられている認知的リフレクションテスト(cognitive reflection test)を,ラベル付け品質の指標として採用するという新しい考え方を探求する。その結果、テストスコアが最も低い個人からのアノテーションは、特定のターゲットグループに対して偏りのある予測を行い、精度が低い検出モデルをもたらす傾向がある。本研究は,ヘイトスピーチの検出と資源構築に関するNLP研究に寄与する。コードとデータセットはhttps://github.com/ssu-humane/K-HATERSでアクセスできる。 Numerous datasets have been proposed to combat the spread of online hate. Despite these efforts, a majority of these resources are English-centric, primarily focusing on overt forms of hate. This research gap calls for developing high-quality corpora in diverse languages that also encapsulate more subtle hate expressions. This study introduces K-HATERS, a new corpus for hate speech detection in Korean, comprising approximately 192K news comments with target-specific offensiveness ratings. This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness. We conduct experiments showing the effectiveness of the proposed corpus, including a comparison with existing datasets. Additionally, to address potential noise and bias in human annotations, we explore a novel idea of adopting the Cognitive Reflection Test, which is widely used in social science for assessing an individual's cognitive ability, as a proxy of labeling quality. 
Findings indicate that annotations from individuals with the lowest test scores tend to yield detection models that make biased predictions toward specific target groups and are less accurate. This study contributes to the NLP research on hate speech detection and resource construction. The code and dataset can be accessed at https://github.com/ssu-humane/K-HATERS.	翻訳日:2023-10-25 21:02:27 公開日:2023-10-24
# VGX:学習ベースのソフトウェア脆弱性分析を促進する大規模サンプル生成 VGX: Large-Scale Sample Generation for Boosting Learning-Based Software Vulnerability Analyses ( http://arxiv.org/abs/2310.15436v1 ) ライセンス: Link先を確認	Yu Nong, Richard Fang, Guangbei Yi, Kunsong Zhao, Xiapu Luo, Feng Chen, and Haipeng Cai	(参考訳) 学習ベースの防御ソフトウェア脆弱性分析の成功を伴って、ラベル付き脆弱性プログラムサンプルの大規模かつ高品質なセットが欠如しており、これらの防御のさらなる進歩を妨げる。既存の自動サンプル生成手法は、生成したサンプルの高ノイズのため、まだ現実的な期待に届かなかった。本稿では,高品質な脆弱性データセットを大規模に生成するための新しい手法であるVGXを提案する。通常のプログラムが与えられた場合、VGXは脆弱性を注入できるコードコンテキストを特定し、新しいバリューフローベースの位置エンコーディングを備えたカスタマイズされたトランスフォーマーを使用して、特にコード構造とコンテキストを学ぶための新しい目的に対して事前トレーニングを行う。次に、VGXは、歴史的修正と現実世界の脆弱性に関する人間の知識の両方から得られた編集パターンを用いて、特定コンテキストにおける脆弱性注入コード編集を実現する。 4つのSOTAベースライン(パターン-、トランスフォーマー-、GNN-、パターン+トランスフォーマー-ベース)と比較して、VGXは99.09-890.06%高いF1と22.45%-328.47%高いラベル精度を達成した。 VGXは脆弱性のあるサンプルを150,392個生成し、そのサンプルから10パーセントをランダムに選択し、脆弱性の検出、ローカライズ、修復にどの程度役立つかを評価しました。その結果、これらの3つのアプリケーションタスクのSOTA技術は、F1の19.15-330.80%、トップ10の精度が12.86-19.31%、トップ50の精度が85.02-99.30%向上した。これらのサンプルはまた、SOTA脆弱性検出器が、オリジナルのモデルで見逃されるような重要なシステム(例えばLinuxカーネル)において、さらに13件の実世界の脆弱性(CVE)を発見するのに役立った。 Accompanying the successes of learning-based defensive software vulnerability analyses is the lack of large and quality sets of labeled vulnerable program samples, which impedes further advancement of those defenses. Existing automated sample generation approaches have shown potential yet still fall short of practical expectations due to the high noise in the generated samples. This paper proposes VGX, a new technique aimed at large-scale generation of high-quality vulnerability datasets. Given a normal program, VGX identifies the code contexts in which vulnerabilities can be injected, using a customized Transformer featuring a new value-flow-based position encoding and pre-trained against new objectives particularly for learning code structure and context. 
Then, VGX materializes vulnerability-injection code editing in the identified contexts using patterns of such edits obtained from both historical fixes and human knowledge about real-world vulnerabilities. Compared to four state-of-the-art (SOTA) baselines (pattern-, Transformer-, GNN-, and pattern+Transformer-based), VGX achieved 99.09-890.06% higher F1 and 22.45%-328.47% higher label accuracy. For in-the-wild sample production, VGX generated 150,392 vulnerable samples, from which we randomly chose 10% to assess how much these samples help vulnerability detection, localization, and repair. Our results show SOTA techniques for these three application tasks achieved 19.15-330.80% higher F1, 12.86-19.31% higher top-10 accuracy, and 85.02-99.30% higher top-50 accuracy, respectively, by adding those samples to their original training data. These samples also helped a SOTA vulnerability detector discover 13 more real-world vulnerabilities (CVEs) in critical systems (e.g., Linux kernel) that would be missed by the original model.	翻訳日:2023-10-25 21:02:03 公開日:2023-10-24
# PromptInfuser: AIとUIデザインの密結合がデザイナのワークフローに与える影響 PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows ( http://arxiv.org/abs/2310.15435v1 ) ライセンス: Link先を確認	Savvas Petridis, Michael Terry, Carrie J. Cai	(参考訳) AIアプリケーションのプロトタイプ作成は、非常に難しい。大規模言語モデル(LLM)のプロンプティングがAIプロトタイピングの障壁を劇的に減らしたが、デザイナはまだAI機能とUIを別々にプロトタイピングしている。プロンプトとUIデザインの結合がデザイナのワークフローに与える影響について検討する。本研究では,UI要素をプロンプトの入力や出力に接続することで,半機能的なモックアップを作成できるFigmaプラグインであるPromptInfuserを開発した。 14人のデザイナーによる研究で、PromptInfuserとデザイナーの現在のAIプロトタイピングワークフローを比較した。 PromptInfuserはプロダクトのアイデアを伝えるのに非常に有用であり、想定されたアーティファクトを現実的に表現し、プロトタイピングをより効率的にし、UIの問題や技術的な制約を予測するのに役立ちます。 PromptInfuserは、プロンプトとUIを合わせてイテレーションを奨励した。これらの発見は、AIアプリケーションをプロトタイピングする将来のシステムに通知する。 Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.	翻訳日:2023-10-25 21:01:26 公開日:2023-10-24
# 政策畳み込みによる大規模行動空間のオフポリシー評価 Off-Policy Evaluation for Large Action Spaces via Policy Convolution ( http://arxiv.org/abs/2310.15433v1 ) ライセンス: Link先を確認	Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley	(参考訳) 正確なオフポリシー推定器の開発は、新しいポリシーの評価と最適化の両方に不可欠である。オフポリシー推定の主な課題は、データを生成するロギングポリシーと、我々が評価しようとしているターゲットポリシーの分散シフトである。通常、分布シフトを補正する技術は、ある種の重要サンプリングを含む。このアプローチは偏りのない値推定をもたらすが、ワンステップのコンテキストバンディットの単純な場合であっても、しばしば高い分散のトレードオフを伴う。さらに、重要サンプリングは、アクションスペースが大きいと非現実的になる共通のサポート仮定に依存する。これらの課題に対処するために、我々は、予測者の政策転換(PC)ファミリーを紹介する。これらのメソッドは、アクション内の潜在構造 -- アクション埋め込みを通じて利用可能 -- を利用して、ログとターゲットポリシーを戦略的に畳み込みます。この畳み込みは、畳み込み量を調整することで制御できるユニークなバイアス分散トレードオフをもたらす。筆者らは,PCを用いた場合,特に行動空間や政策ミスマッチが大きくなり,既存の推定値よりも最大5～6桁の精度で,平均二乗誤差(MSE)が顕著に向上することを示した。 Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiased value estimation but often comes with the trade-off of high variance, even in the simpler case of one-step contextual bandits. Furthermore, importance sampling relies on the common support assumption, which becomes impractical when the action space is large. To address these challenges, we introduce the Policy Convolution (PC) family of estimators. These methods leverage latent structure within actions -- made available through action embeddings -- to strategically convolve the logging and target policies. This convolution introduces a unique bias-variance trade-off, which can be controlled by adjusting the amount of convolution. 
Our experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC, especially when either the action space or policy mismatch becomes large, with gains of up to 5-6 orders of magnitude over existing estimators.	翻訳日:2023-10-25 21:01:05 公開日:2023-10-24
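For readers unfamiliar with the importance-sampling baseline that the entry above contrasts against, here is a minimal sketch of the plain inverse propensity scoring (IPS) estimator for one-step contextual bandits. It is not the Policy Convolution estimator itself; the names `ips_estimate`, `LoggedSample`, and `target_prob` are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

# One logged interaction:
# (context, action, reward, logging-policy probability of that action)
LoggedSample = Tuple[Hashable, Hashable, float, float]


def ips_estimate(logs: Sequence[LoggedSample],
                 target_prob: Callable[[Hashable, Hashable], float]) -> float:
    """Inverse propensity scoring estimate of the target policy's value."""
    if not logs:
        raise ValueError("need at least one logged sample")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            # Common-support violation: the logging policy cannot have
            # produced this action, so the weight is undefined.
            raise ValueError("logging probability must be positive")
        weight = target_prob(context, action) / logging_prob  # importance weight
        total += weight * reward
    return total / len(logs)
```

When the action space is large, `logging_prob` is often tiny for the actions the target policy prefers. The weights then explode, which is the high-variance problem the abstract describes.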
# 火をつけるのに何が必要か社会的・道徳的状況の明確化のための文脈と合理化の反復的自己蒸留 What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations ( http://arxiv.org/abs/2310.15431v1 ) ライセンス: Link先を確認	Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi	(参考訳) 道徳的または倫理的な判断は、それらが起こる特定の文脈に大きく依存する。様々なデファシブルな文脈化の陰(つまり、行動の道徳的受容性を強化するまたは弱める付加的な情報)を理解することは、現実のシナリオにおける人間の道徳的判断の微妙さと複雑さを正確に表すために重要である。我々は,行動が多かれ少なかれ道徳的に容認されるような基礎的な文脈を提供することと,その推論を正当化する常識的理性を導入する。高品質なタスクデータを抽出するために,GPT-3から少量の未構造化シード知識から始まる反復的自己蒸留アプローチを,(1)学生モデルからの自己蒸留,(2)人間による判断(妥当性向上)とNLI(多様性向上)によって訓練された批評家モデルによるターゲットフィルタリング,(3)自己シミュレーション学習(データ品質の増幅)とを交互に行う。このプロセスは、妥当性、多様性、デファシビリティを改善したデファシブルコンテキストを生成する学生モデルを生成する。このモデルから、人間のアノテータの85.9%から99.8%で評価された115Kデファシブルな道徳行動の文脈化と合理性の1.2M項目からなる高品質なデータセット \delta-Rules-of-Thumb を蒸留する。 \delta-RoT を用いて、すべての中間学生モデルに顕著なマージンで勝利する最終学生モデルを得る。 Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable, along with commonsense rationales that justify the reasoning. To elicit high-quality task data, we take an iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and then alternates between (1) self-distillation from student models; (2) targeted filtering with a critic model trained by human judgment (to boost validity) and NLI (to boost diversity); (3) self-imitation learning (to amplify the desired data quality). 
This process yields a student model that produces defeasible contexts with improved validity, diversity, and defeasibility. From this model we distill a high-quality dataset, \delta-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time. Using \delta-RoT we obtain a final student model that wins over all intermediate student models by a notable margin.	翻訳日:2023-10-25 21:00:45 公開日:2023-10-24
# Beyond Sentiment: 政治的スタンス分類のためのトピックメトリクスを活用する Beyond Sentiment: Leveraging Topic Metrics for Political Stance Classification ( http://arxiv.org/abs/2310.15429v1 ) ライセンス: Link先を確認	Weihong Qi	(参考訳) 感情分析は、コーパスの全体音だけを捉えるために広く批判されているが、テキスト内の潜伏構造や政治的スタンスを正確に反映するには不十分である。本研究では,抽出されたトピックから変換されたダミー変数であるトピックメトリクスを,スタンス分類における感情指標の代替および補完として導入する。本研究は,Bestvater and Monroe (2023) が同定した3つのデータセットを用いて,一貫性のあるトピック抽出におけるBERTopicの習熟度と,スタンス分類におけるトピックメトリクスの有効性を示す。実験の結果、BERTopicのコヒーレンススコアは、潜在ディリクレ配分法(LDA)や非負値行列因子分解(NMF)のような従来のアプローチと比較して17.07%から54.20%向上した。さらに,トピックメトリクスは,スタンス分類における感情指標を上回り,最大18.95%のパフォーマンス向上を示した。本研究は,文脈に富んだテキストやコーパスにおいて,スタンスと感情の相関が弱い話題メトリクスが特に有効であることを示唆する。センチメントとトピックメトリクスの組み合わせは、ほとんどのシナリオで最適なパフォーマンスを達成し、トピックメトリクスのコヒーレンススコアの低さだけでなく、感情のみに依存するという制限にも対処できます。 Sentiment analysis, widely critiqued for capturing merely the overall tone of a corpus, falls short in accurately reflecting the latent structures and political stances within texts. This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and complement to sentiment metrics in stance classification. By employing three datasets identified by Bestvater and Monroe (2023), this study demonstrates BERTopic's proficiency in extracting coherent topics and the effectiveness of topic metrics in stance classification. The experiment results show that BERTopic improves coherence scores by 17.07% to 54.20% when compared to traditional approaches such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), prevalent in earlier political science research. Additionally, our results indicate topic metrics outperform sentiment metrics in stance classification, increasing performance by as much as 18.95%. Our findings suggest topic metrics are especially effective for context-rich texts and corpora where stance and sentiment correlations are weak. 
The combination of sentiment and topic metrics achieves optimal performance in most scenarios and can further address the limitations of relying solely on sentiment as well as the low coherence score of topic metrics.	翻訳日:2023-10-25 21:00:13 公開日:2023-10-24
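The entry above describes topic metrics as dummy variables converted from extracted topics. A minimal sketch of that conversion follows, assuming each document has already been assigned one integer topic id by some topic model (for example BERTopic). The function name `topic_dummies` and the `topic_<id>` feature names are hypothetical, and the paper's exact encoding may differ.

```python
from typing import Dict, List, Sequence


def topic_dummies(topic_ids: Sequence[int]) -> List[Dict[str, int]]:
    """Turn one topic id per document into 0/1 indicator features."""
    topics = sorted(set(topic_ids))  # one indicator column per observed topic
    return [
        {f"topic_{t}": int(doc_topic == t) for t in topics}
        for doc_topic in topic_ids
    ]
```

Each resulting row can be fed to a stance classifier on its own, or concatenated with a sentiment score to form the combined feature set the abstract mentions.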
# 意味的混乱補正による連続イベント抽出 Continual Event Extraction with Semantic Confusion Rectification ( http://arxiv.org/abs/2310.15470v1 ) ライセンス: Link先を確認	Zitao Wang and Xinyi Wang and Wei Hu	(参考訳) 本研究では, 連続イベント抽出法について検討し, 忘れることを避けつつ, 間欠的に出現するイベント情報を抽出することを目的とした。イベントタイプに関するセマンティックな混乱は、時間とともに更新される同じテキストのアノテーションに由来することを観察する。イベントタイプ間の不均衡は、この問題を悪化させる。本稿では,意味的混乱を解消する新しい連続イベント抽出モデルを提案する。意味的混乱を軽減するために各文の擬似ラベルをマークする。イベントタイプの理解を深めるために、現在のモデルと以前のモデルの間に重要な知識を転送します。さらに、モデルには、他の関連する型を利用して、ロングテールイベントタイプのセマンティクスにフォーカスするよう促す。実験の結果,本モデルは最先端のベースラインより優れ,不均衡なデータセットに熟練していることがわかった。 We study continual event extraction, which aims to extract incessantly emerging event information while avoiding forgetting. We observe that the semantic confusion on event types stems from the annotations of the same text being updated over time. The imbalance between event types even aggravates this issue. This paper proposes a novel continual event extraction model with semantic confusion rectification. We mark pseudo labels for each sentence to alleviate semantic confusion. We transfer pivotal knowledge between current and previous models to enhance the understanding of event types. Moreover, we encourage the model to focus on the semantics of long-tailed event types by leveraging other associated types. Experimental results show that our model outperforms state-of-the-art baselines and is proficient in imbalanced datasets.	翻訳日:2023-10-25 20:51:41 公開日:2023-10-24
# Janusインターフェース: 大規模言語モデルの微調整がプライバシリスクをいかに増幅するか The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks ( http://arxiv.org/abs/2310.15469v1 ) ライセンス: Link先を確認	Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, XiaoFeng Wang, Haixu Tang	(参考訳) 2018年以降のこの時代は、OpenAIのChatGPTのような革新的な言語技術によって、大きな言語モデル(LLM)が出現した。業界がモデルパラメータの強化と膨大な人間の言語データの活用に躍起になり、セキュリティとプライバシの課題も浮上した。中でも最も重要なのが、Webベースのデータ取得におけるPII(Personal Identifiable Information)の潜在的な不注意な蓄積であり、意図しないPII開示のリスクが生じる。トレーニング中のRLHFや破滅的忘却(Catastrophic Forgetting)といった戦略は、プライバシー侵害のリスクを抑えるために取り組まれてきたが、OpenAIのGPT-3.5のための微調整インターフェースによって象徴された最近のLLMの進歩は、懸念を再燃させた。 LLMの微調整は、トレーニングデータセットに埋め込まれた個人情報の漏洩を引き起こすだろうか? 本稿では,この問いに答えを求める最初の試み,特にJanus 攻撃と呼ばれる新たな LLM 攻撃経路の発見について報告する。この攻撃では、LLMを極小のPIIデータセットを用いて微調整するPIIアソシエーションタスクを構築し、隠蔽されたPIIを復元して明らかにできる可能性がある。以上の結果から, わずかな微調整コストで, GPT-3.5 などの LLM が PII 抽出に対して不透過な状態から, 隠蔽された PII のかなりの割合を漏洩する状態へ移行しうることが明らかとなった。この研究は、Janus攻撃ベクトルを深く掘り下げることで、LLMユーティリティとプライバシ保護の間の複雑な相互作用をナビゲートする必要性を強調する。 The era post-2018 marked the advent of Large Language Models (LLMs), with innovations such as OpenAI's ChatGPT showcasing prodigious linguistic prowess. As the industry galloped toward augmenting model parameters and capitalizing on vast swaths of human language data, security and privacy challenges also emerged. Foremost among these is the potential inadvertent accrual of Personal Identifiable Information (PII) during web-based data acquisition, posing risks of unintended PII disclosure. While strategies like RLHF during training and Catastrophic Forgetting have been marshaled to control the risk of privacy infringements, recent advancements in LLMs, epitomized by OpenAI's fine-tuning interface for GPT-3.5, have reignited concerns. One may ask: can the fine-tuning of LLMs precipitate the leakage of personal information embedded within training datasets? 
This paper reports the first endeavor to seek the answer to the question, particularly our discovery of a new LLM exploitation avenue, called the Janus attack. In the attack, one can construct a PII association task, whereby an LLM is fine-tuned using a minuscule PII dataset, to potentially reinstate and reveal concealed PIIs. Our findings indicate that, with a trivial fine-tuning outlay, LLMs such as GPT-3.5 can transition from being impermeable to PII extraction to a state where they divulge a substantial proportion of concealed PII. This research, through its deep dive into the Janus attack vector, underscores the imperative of navigating the intricate interplay between LLM utility and privacy preservation.	翻訳日:2023-10-25 20:51:29 公開日:2023-10-24
# 再生可能エネルギーシステムにおける分散ソリューションのエンパワーメントとグリッド最適化 Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization ( http://arxiv.org/abs/2310.15468v1 ) ライセンス: Link先を確認	Mohammad Mohammadi and Ali Mohammadi	(参考訳) 本研究では,電力産業における集中型アプローチから分散型アプローチへの移行に着目し,特に機械学習(ml)の進歩が再生可能エネルギー資源のエンパワーメントとグリッド管理の改善において重要な役割を担っていることを示す。 MLモデルは、人工ニューラルネットワーク、サポートベクターマシン、決定木といった様々な技術を活用することで、再生可能エネルギーの生成と消費を予測する上でますます重要になっている。さらに、予測精度を高めるために、データ分割、正規化、分解、離散化などのデータ前処理手法を用いる。ビッグデータとMLをスマートグリッドに組み込むことは、エネルギー効率の向上、需要に対するより効率的な応答、再生可能エネルギー源のより良い統合など、いくつかの利点をもたらす。それでも、大規模なデータボリュームの処理、サイバーセキュリティの確保、専門知識の獲得といった課題には対処する必要がある。この研究は、太陽エネルギー、風力エネルギー、電気分布と貯蔵の領域における様々なML応用を研究し、エネルギーシステムを最適化する可能性を示している。この研究は、mlイノベーションと分散意思決定の適用を通じて集中型ソリューションから分散型ソリューションへと移行し、最終的にはより効率的で持続可能なエネルギーの未来を形作る、電力セクターの進化の状況を示すものだ。 This study delves into the shift from centralized to decentralized approaches in the electricity industry, with a particular focus on how machine learning (ML) advancements play a crucial role in empowering renewable energy sources and improving grid management. ML models have become increasingly important in predicting renewable energy generation and consumption, utilizing various techniques like artificial neural networks, support vector machines, and decision trees. Furthermore, data preprocessing methods, such as data splitting, normalization, decomposition, and discretization, are employed to enhance prediction accuracy. The incorporation of big data and ML into smart grids offers several advantages, including heightened energy efficiency, more effective responses to demand, and better integration of renewable energy sources. Nevertheless, challenges like handling large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed. The research investigates various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, illustrating their potential to optimize energy systems. 
To sum up, this research demonstrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately shaping a more efficient and sustainable energy future.	翻訳日:2023-10-25 20:50:57 公開日:2023-10-24
# EKGNet: 患者内不整脈分類のための10.96{\mu}W完全アナログニューラルネットワーク EKGNet: A 10.96{\mu}W Fully Analog Neural Network for Intra-Patient Arrhythmia Classification ( http://arxiv.org/abs/2310.15466v1 ) ライセンス: Link先を確認	Benyamin Haghi, Lin Ma, Sahin Lale, Anima Anandkumar, Azita Emami	(参考訳) 心電図不整脈分類におけるアナログ計算と深層学習を組み合わせた統合的アプローチを提案する。本研究では,低消費電力で高精度を達成する、ハードウェア効率の高い完全アナログ不整脈分類アーキテクチャであるEKGNetを提案する。提案アーキテクチャは、サブスレッショルド領域で動作するトランジスタのエネルギー効率を活用し、アナログ・デジタルコンバータ(ADC)と静的ランダムアクセスメモリ(SRAM)を必要としない。システム設計は、プロセス、供給電圧、温度変化を緩和する新しいアナログ・シーケンシャル・マルチプライ・アキュムレート(MAC)回路を含む。 PhysioNet の MIT-BIH と PTB 診断データセットの実験的評価は, 平均平衡精度 95% と 94.25% を患者内不整脈分類と心筋梗塞分類でそれぞれ達成し, 提案手法の有効性を示した。この革新的なアプローチは、バイオメディカル応用における精度と伝達性を高めた低出力不整脈分類システムを開発するための有望な道を示す。 We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that achieves high accuracy with low power consumption. The proposed architecture leverages the energy efficiency of transistors operating in the subthreshold region, eliminating the need for analog-to-digital converters (ADC) and static random access memory (SRAM). The system design includes a novel analog sequential Multiply-Accumulate (MAC) circuit that mitigates process, supply voltage, and temperature variations. Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed method, achieving average balanced accuracy of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively. This innovative approach presents a promising avenue for developing low-power arrhythmia classification systems with enhanced accuracy and transferability in biomedical applications.	翻訳日:2023-10-25 20:50:33 公開日:2023-10-24
# ユーザ生成コンテンツにおけるyes-no質問に対する回答の解釈 Interpreting Answers to Yes-No Questions in User-Generated Content ( http://arxiv.org/abs/2310.15464v1 ) ライセンス: Link先を確認	Shivam Mathur, Keun Hee Park, Dhivya Chinnappa, Saketh Kotamraju and Eduardo Blanco	(参考訳) ソーシャルメディアにおけるyes-no質問への回答の解釈は難しい。yesやnoといったキーワードはまれであり、それらを含む数少ない回答も、キーワードが示唆する通りに解釈されることは滅多にない。本稿では,Twitterから収集した4,442件のyes-no質問応答対からなる新しいコーパスを提示する。我々は, 解釈がイエスかノーである回答, および解釈が不明な回答の言語的特徴について論じる。大規模な言語モデルは、同じ問題を扱うソーシャルメディア外の他のコーパスを混合して微調整した後でも、この問題を解決するには程遠いことを示す。 Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them are rarely to be interpreted as what the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.	翻訳日:2023-10-25 20:50:12 公開日:2023-10-24
# 人間-言語モデル間インタラクションによる自己誘導型メンタルヘルス介入の促進--認知再構成の事例研究 Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring ( http://arxiv.org/abs/2310.15461v1 ) ライセンス: Link先を確認	Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff	(参考訳) 自己指導型のメンタルヘルス介入、例えば"do-it-yourself"ツールによる対処戦略の学習と実践は、メンタルヘルスへのアクセスを改善するという大きな約束を示す。しかし、これらの介入はしばしば認知的な負荷が高く、感情的な負担を引き起こしやすいため、広範囲の実装と採用を制限するアクセシビリティ障壁を生み出します。本稿では,人間と言語モデルの相互作用が自己誘導型メンタルヘルス介入をどのように支援できるかについて検討する。否定的思考を克服するエビデンスに基づく治療手法であるcognitive restructuringをケーススタディとして捉えた。 IRBが承認した15,531人の参加者からなる大規模メンタルヘルスウェブサイトにおけるランダム化フィールドスタディにおいて、認知的再構成の様々な段階を通じて言語モデルを用いて人々を支援するシステムの設計と評価を行った。その結果,本システムは67%の参加者の感情的強度に正の影響を与え,否定的思考を65%が克服するのに役立つことがわかった。若者は比較的悪い結果を報告しているが、言語モデル生成を単純化する調整された介入により、全体的な効果と公平性が向上する。 Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and emotionally triggering, creating accessibility barriers that limit their wide-scale implementation and adoption. In this paper, we study how human-language model interaction can support self-guided mental health interventions. We take cognitive restructuring, an evidence-based therapeutic technique to overcome negative thinking, as a case study. In an IRB-approved randomized field study on a large mental health website with 15,531 participants, we design and evaluate a system that uses language models to support people through various steps of cognitive restructuring. Our findings reveal that our system positively impacts emotional intensity for 67% of participants and helps 65% overcome negative thoughts. Although adolescents report relatively worse outcomes, we find that tailored interventions that simplify language model generations improve overall effectiveness and equity.	翻訳日:2023-10-25 20:50:00 公開日:2023-10-24
# UI文法によるLLMによるUIレイアウト生成 UI Layout Generation with LLMs Guided by UI Grammar ( http://arxiv.org/abs/2310.15455v1 ) ライセンス: Link先を確認	Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li	(参考訳) 近年のLLM(Large Language Models)の進歩は、特にモバイルユーザインタフェース(UI)に関するタスクへの応用において、研究者や業界の専門家の間で関心を喚起している。本稿では,UIレイアウト生成におけるLLMの利用について検討する。調査の中心はUI文法の導入です。UI画面に固有の階層構造を表現するために提案する新しいアプローチです。本研究の目的は, LLMの生成能力をより効果的に導き, プロセスの説明可能性と制御性を向上させることである。 GPT-4で行った初期実験では、LLMがコンテキスト内学習を通じて高品質なユーザインタフェースを生成できる有望な能力を示した。さらに,予備的な比較研究により,特定の側面における生成結果の品質向上に向けた文法的アプローチの可能性が示唆された。 The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we propose to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.	翻訳日:2023-10-25 20:49:38 公開日:2023-10-24
# パブリック機能によるプライベートラーニング Private Learning with Public Features ( http://arxiv.org/abs/2310.15454v1 ) ライセンス: Link先を確認	Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang	(参考訳) 本研究では,データがプライベート機能とパブリック機能の結合であるプライベート学習問題のクラスについて検討する。これは、リコメンデーションや広告予測のような個人的なパーソナライズタスクにおいて、個人に関連する特徴が敏感である一方で、アイテム(推奨する映画や曲、またはユーザーに見せる広告)に関連する特徴が公開されており、保護を必要としない場合が多い。自然の疑問は、プライベートアルゴリズムがパブリック機能の存在下で高いユーティリティを達成できるかどうかである。公開機能で動作するマルチエンコーダモデルに対して,肯定的な回答を与える。我々は,この分離を有効に活用するアルゴリズムを,(勾配にノイズを加える代わりに)十分な統計量だけを保護して開発する。本手法は, 線形回帰に対する実用性の向上を保証し, 2つの標準プライベートレコメンデーションベンチマークにおいて, プライベートな特徴分離に適応する手法の重要性を実証する。 We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.	翻訳日:2023-10-25 20:49:25 公開日:2023-10-24
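The entry above mentions protecting sufficient statistics instead of adding noise to gradients. The toy sketch below shows that general idea on one-dimensional least squares, where the fit depends on the data only through two sums. It is not the paper's multi-encoder algorithm, and `noise_std` is not calibrated to any differential-privacy guarantee. All names here are hypothetical.

```python
import random
from typing import Optional, Sequence


def noisy_least_squares_slope(xs: Sequence[float], ys: Sequence[float],
                              noise_std: float = 0.0,
                              rng: Optional[random.Random] = None) -> float:
    """Fit y ~ w * x from perturbed sufficient statistics sum(x*x), sum(x*y)."""
    rng = rng or random.Random()
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Noise is added once to the two statistics, not to per-step gradients.
    # noise_std is an illustrative knob, NOT a calibrated privacy parameter.
    sxx_noisy = sxx + rng.gauss(0.0, noise_std)
    sxy_noisy = sxy + rng.gauss(0.0, noise_std)
    if sxx_noisy <= 0.0:
        raise ValueError("perturbed sum of squares is not positive")
    return sxy_noisy / sxx_noisy
```

With `noise_std=0.0` the function reduces to the ordinary least-squares slope through the origin. A real private algorithm would also need to bound each record's contribution and derive the noise scale from the privacy budget.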
# 因果表現学習における一般識別性と達成可能性 General Identifiability and Achievability for Causal Representation Learning ( http://arxiv.org/abs/2310.15450v1 ) ライセンス: Link先を確認	Burak Var{\i}c{\i}, Emre Acart\"urk, Karthikeyan Shanmugam, Ali Tajer	(参考訳) 本稿では、一般的な非パラメトリック因果潜在モデルと、潜在データを観測データにマッピングする一般変換モデルに基づく因果表現学習(CRL)に焦点を当てる。潜在因果グラフ内のノードごとに2つのハードな非結合(uncoupled)介入を用いて、識別可能性(identifiability)と達成可能性(achievability)の結果を確立する。特に、どの一対の介入環境が同じノードを介入しているかは未知である(ゆえに非結合な環境と呼ぶ)。識別可能性については、非結合介入の下で潜在因果モデルと変数の完全回復が保証されることを示す。達成可能性については、観測データと介入データを使用し、証明可能な保証付きで潜在因果モデルと変数を復元するアルゴリズムを設計する。このアルゴリズムは、異なる環境におけるスコアの変動を利用して、変換の逆写像と、続いて潜在変数を推定する。さらに、分析では、2つのハードな結合(coupled)介入、つまり同じノードが介入した2つの環境に関するメタデータが知られている場合に、既存の識別可能性の結果が復元される。非パラメトリック識別性に関する既存の結果は、介入に関する仮定と追加の忠実性の仮定を必要とする。本稿では、観測データが利用可能である場合、追加の忠実性の仮定は不要であることを示す。 This paper focuses on causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model that maps the latent data to the observational data. It establishes identifiability and achievability results using two hard uncoupled interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments have the same node intervened (hence, uncoupled environments). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees for the algorithm. This algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables. 
The analysis, additionally, recovers the existing identifiability result for two hard coupled interventions, that is, when metadata about the pair of environments that have the same node intervened is known. It is noteworthy that the existing results on non-parametric identifiability require assumptions on interventions and additional faithfulness assumptions. This paper shows that when observational data is available, additional faithfulness assumptions are unnecessary.	翻訳日:2023-10-25 20:49:07 公開日:2023-10-24
# 確率的非凸-凹ミニマックス問題に対する加速一階正則化運動量降下上昇アルゴリズム An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems ( http://arxiv.org/abs/2310.15448v1 ) ライセンス: Link先を確認	Huiling Zhang and Zi Xu	(参考訳) 確率的非凸ミニマックス問題は近年、機械学習、信号処理など多くの分野で広く注目されている。本稿では,確率的非凸-凹ミニマックス問題を解くための加速一階正則化運動量降下上昇アルゴリズム(FORMDA)を提案する。$\varepsilon$-定常点を得るためのアルゴリズムの反復複雑性は$\tilde{\mathcal{O}}(\varepsilon ^{-6.5})$であることが証明され、これは目的関数の定常性の下で確率的非凸-凹ミニマックス問題を解くシングルループアルゴリズムとして既知の最良の複雑性上界を達成する。 Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be $\tilde{\mathcal{O}}(\varepsilon ^{-6.5})$ to obtain an $\varepsilon$-stationary point, which achieves the best-known complexity bound for single-loop algorithms to solve the stochastic nonconvex-concave minimax problems under the stationarity of the objective function.	翻訳日:2023-10-25 20:48:40 公開日:2023-10-24
# the quantum tortoise and the classical hare: 量子コンピューティングがどの問題を加速させるか(そしてそうしないか)を理解するためのシンプルなフレームワーク The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not) ( http://arxiv.org/abs/2310.15505v1 ) ライセンス: Link先を確認	Sukwoong Choi, William S. Moses, Neil Thompson	(参考訳) 量子コンピューティングは、いくつかの問題を解決するために変革的な利益を約束します。量子コンピュータを今、あるいは将来使いたい人には、どの問題が役に立つかを知ることが重要です。本稿では,この問いに対して直感的かつ定量的に答える枠組みを提案する。フレームワークの基盤となる構造は量子コンピュータと古典コンピュータの競争であり、それぞれの強みが勝利のタイミングを決定する。古典的コンピュータは高速に動作するが、量子コンピュータはより効率的なアルゴリズムを実行することがある。速度優位かアルゴリズム優位かは、ある問題が量子コンピューティングの恩恵を受けるかどうかを決定する。我々の分析によると、多くの問題、特に、一般的なビジネスにとって重要な小規模から中規模の問題では、量子コンピューティングの恩恵を受けない。逆に、より大きな問題や特に大きなアルゴリズム的ゲインを持つものは、短期量子コンピューティングの恩恵を受ける。非常に大きなアルゴリズムの利得は、実際にはまれであり、原理上も稀であると理論化されているため、量子コンピューティングの利点は、このようなまれなケースのユーザか、非常に大きなデータを処理する実践者のいずれかに流れることを示唆する。 Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. 
Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases or to practitioners processing very large data.	翻訳日:2023-10-25 20:44:11 公開日:2023-10-24
# 合成シーングラフからのクロスビュー自己ローカライゼーション Cross-view Self-localization from Synthesized Scene-graphs ( http://arxiv.org/abs/2310.15504v1 ) ライセンス: Link先を確認	Ryogo Yamamoto, Kanji Tanaka	(参考訳) クロスビューの自己ローカライゼーションは、スパース視点からデータベースイメージを提供する視覚的場所認識の難しいシナリオである。近年,NeRF(Neural Radiance Fields)技術を用いて未観測の視点からデータベース画像を合成する手法が登場し,優れた性能を示している。しかし,これらの手法により得られた合成画像は,原画像よりも品質が低いことが多く,データベースの保存コストも著しく増大する。本研究では、生画像から計算したビュー不変外観特徴と合成画像から計算したビュー依存空間意味特徴の利点を組み合わせた、新しいハイブリッドシーンモデルを提案する。これら2つの特徴はシーングラフに融合され、グラフニューラルネットワークによって圧縮学習され認識される。提案手法の有効性は,フォトリアリスティックなHabitatシミュレータを用いて生成した多数の未観測ビューを含む新しいクロスビュー自己ローカライゼーションデータセットを用いて検証した。 Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. Recently, an approach for synthesizing database images from unseen viewpoints using NeRF (Neural Radiance Fields) technology has emerged with impressive performance. However, synthesized images provided by these techniques are often of lower quality than the original images, and furthermore they significantly increase the storage cost of the database. In this study, we explore a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images. These two types of features are then fused into scene graphs, and compressively learned and recognized by a graph neural network. The effectiveness of the proposed method was verified using a novel cross-view self-localization dataset with many unseen views generated using a photorealistic Habitat simulator.	翻訳日:2023-10-25 20:43:52 公開日:2023-10-24
# TRAMS:長距離言語モデリングのためのトレーニング不要メモリ選択 TRAMS: Training-free Memory Selection for Long-range Language Modeling ( http://arxiv.org/abs/2310.15494v1 ) ライセンス: Link先を確認	Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi	(参考訳) トランスフォーマーアーキテクチャは多くのAIモデルにとって不可欠であるが、長距離言語モデリングでは依然として課題に直面している。いくつかの特定のトランスフォーマーアーキテクチャは、長距離依存の問題に対処するために設計されているが、Transformer-XLのような既存のメソッドは、有効でないメモリの割合が高いという問題を抱えている。本研究では、1つの単純なメトリクスに基づいて注意計算に参加するトークンを選択する「トレーニングフリーメモリ選択(TRAMS)」と呼ばれるプラグ・アンド・プレイ戦略を提案する。この戦略により、現在のクエリに対して高い注意スコアを持つ可能性のあるトークンを保持し、他のトークンを無視できる。我々は、単語レベルのベンチマーク(WikiText-103)と文字レベルのベンチマーク(enwik8)でこのアプローチをテストし、追加のトレーニングやパラメータの追加なしに改善が得られることを示した。 The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several specific transformer architectures have been designed to tackle issues of long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and ignore the other ones. We have tested our approach on the word-level benchmark (WikiText-103) and the character-level benchmark (enwik8), and the results indicate an improvement without having additional training or adding additional parameters.	翻訳日:2023-10-25 20:43:38 公開日:2023-10-24
# 統合オンライントップK勧告のためのロバスト表現学習 Robust Representation Learning for Unified Online Top-K Recommendation ( http://arxiv.org/abs/2310.15492v1 ) ライセンス: Link先を確認	Minfang Lu, Yuchen Jiang, Huihui Dong, Qi Li, Ziru Xu, Yuanlin Liu, Lixia Wu, Haoyuan Hu, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng	(参考訳) 大規模産業eコマースにおいて、オンラインレコメンデーションシステムの効率性は、さまざまなビジネスシナリオに対応する、非常に関連性の高いアイテム/コンテンツ広告を提供する上で重要である。しかし、既存の研究のほとんどはアイテム広告のみに焦点を当てており、コンテンツ広告の重要性を無視している。この監視はマルチエンタリティ構造内の不整合と不公平な検索をもたらす。さらに、異なるドメインにまたがる複数のエンティティ広告からトップk広告を取得するという課題は、複雑さを増す。近年の研究では、異なるドメイン内のユーザエンタリティの挙動が、分化と均質性の特徴を示すことが証明されている。したがって、マルチドメインマッチングモデルは通常、ドメイン不変およびドメイン固有表現を持つハイブリッド専門家フレームワークに依存します。残念なことに、ほとんどのアプローチは、主に異なる専門家のコンビネーションモードの最適化にフォーカスしており、専門家モジュール自体の最適化に固有の困難に対処できていない。異なるドメインにまたがる冗長な情報の存在は、専門家間の干渉と競争をもたらし、一方、各ドメインの異なる学習目標が専門家間の最適化の課題に繋がる。そこで本研究では,統一型オンライントップkレコメンデーションのためのロバスト表現学習を提案する。提案手法は,データフェアネスを保証するため,エンティティ空間における統一モデリングを構築する。ロバスト表現学習は、ドメイン敵学習とマルチビューワッサースタイン分布学習を用いてロバスト表現を学習する。さらに,本提案手法は,相補的不確実性重みと直交性制約によって相反する目的のバランスをとる。提案手法の有効性と合理性は様々な実験によって検証されている。 In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial in delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research proves that user-entity behaviors within different domains exhibit characteristics of differentiation and homogeneity. Therefore, the multi-domain matching models typically rely on the hybrid-experts framework with domain-invariant and domain-specific representations. 
Unfortunately, most approaches primarily focus on optimizing the combination mode of different experts, failing to address the inherent difficulty in optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view Wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through the homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios.	翻訳日:2023-10-25 20:43:20 公開日:2023-10-24
# NuTrea: コンテキスト誘導型マルチホップKGQAのためのニューラルツリー検索 NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA ( http://arxiv.org/abs/2310.15484v1 ) ライセンス: Link先を確認	Hyeong Kyu Choi and Seunghun Lee and Jaewon Chu and Hyunwoo J. Kim	(参考訳) マルチホップ知識グラフ質問回答(Multi-hop Knowledge Graph Question Answering, KGQA)は、知識グラフ(KG)からノードを取得して自然言語の質問に答えるタスクである。最近のGNNベースのアプローチでは、メッセージをシードノードから応答ノードへ順次伝播するKGパス探索問題としてこのタスクを定式化している。しかし、これらのメッセージは過去指向であり、KG全体のコンテキストを考慮しない。さらに悪いことに、KGノードは固有名詞エンティティを表すことが多く、時には暗号化されているため、経路間の選択に役立たない。これらの問題に対処するために,より広いKGコンテキストを取り込む木探索ベースのGNNモデルであるNeural Tree Search (NuTrea)を提案する。私たちのモデルは、未到達のサブツリー領域を調査し、過去指向の埋め込みを強化するメッセージパッシングスキームを採用している。さらに,グローバルなKGコンテキストを考慮したRF-IEF(Relation Frequency-Inverse Entity Frequency)ノード埋め込みを導入し,曖昧なKGノードをより良く特徴付ける。提案手法の汎用的な有効性は,3つの主要なマルチホップKGQAベンチマークデータセットの実験により実証され,広範な分析によりその表現性と頑健性をさらに検証した。全体として、NuTreaは複雑な自然言語の質問でKGに問い合わせる強力な手段を提供する。コードはhttps://github.com/mlvlab/NuTreaで入手できる。 Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. 
The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.	翻訳日:2023-10-25 20:42:37 公開日:2023-10-24
# RGB-Dビデオにおける局所物体検出 Salient Object Detection in RGB-D Videos ( http://arxiv.org/abs/2310.15482v1 ) ライセンス: Link先を確認	Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao	(参考訳) 奥行き検知装置の普及に伴い、RGB-Dビデオや関連データ/メディアは日常生活の様々な面で大きな注目を集めている。その結果、RGB-Dビデオにおけるサルエント物体検出(SOD)の実施は、非常に有望で進化する道を示す。この領域の可能性にもかかわらず、RGB-DビデオにおけるSODは、RGB-D SODとビデオSOD(VSOD)は、伝統的に独立して研究されている。この新興分野を探求するため,本稿では,データセットとモデルという2つの主要な貢献を行う。一方,RDVSデータセットは現実的な深度を持つ新しいRGB-D VSODデータセットであり,シーンの多様性とフレーム単位の厳密なアノテーションが特徴である。包括的属性とオブジェクト指向分析を用いてデータセットを検証し、トレーニングとテストの分割を提供する。さらに、RGB-D VSODに適した3ストリームネットワークであるDCTNet+を導入し、RGBのモダリティを重視し、奥行きと光の流れを補助モダリティとして扱う。正確な最終予測のために,有効機能強化,改良,融合を追求するために,マルチモーダルアテンションモジュール (MAM) と改良融合モジュール (RFM) の2つのモジュールを提案する。 RFM内での相互作用と融合を強化するため、我々はUIM(Universal Interaction Module)を設計し、RFMに到達する前にマルチモーダルな低レベル特徴を洗練するための全体的マルチモーダル減衰経路(HMAP)を統合する。 RDVSと共に擬似RGB-Dビデオデータセットを用いて総合実験を行い、DCTNet+が17のVSODモデルと14のRGB-D SODモデルよりも優れていることを示した。擬似的および現実的なRGB-Dビデオデータセット上でアブレーション実験を行い、個々のモジュールの利点と現実的な深さを導入する必要性を実証した。私たちのコードとRDVSデータセットはhttps://github.com/kerenfu/RDVS/で利用可能です。 Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: the dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth and characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. 
Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD, which emphasizes the RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before reaching RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our RDVS, highlight the superiority of DCTNet+ over 17 VSOD models and 14 RGB-D SOD models. Ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth. Our code together with RDVS dataset will be available at https://github.com/kerenfu/RDVS/.	翻訳日:2023-10-25 20:41:51 公開日:2023-10-24
# AutoDiff: 表データ合成のためのオートエンコーダと拡散モデルを組み合わせる AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing ( http://arxiv.org/abs/2310.15479v1 ) ライセンス: Link先を確認	Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng	(参考訳) 拡散モデルは、コンピュータビジョン、言語モデル、音声合成を含む現代の機械学習の多くのサブフィールドにおいて、合成データ生成の主要なパラダイムとなっている。本稿では,合成表データを生成するために拡散モデルのパワーを利用する。表データの異質な特徴は表データ合成における主な障害であり,オートエンコーダアーキテクチャを用いてこの問題に対処している。最先端の表型シンセサイザーと比較すると,本モデルから得られた合成表は,実データに対する優れた統計量を示し,機械学習ユーティリティの下流タスクにおいて良好に機能する。我々は15の公開データセットに対して実験を行った。特に,本モデルでは,表層データ合成における長年の課題である特徴間の相関関係を良好に捉えている。私たちのコードは要求に応じて入手でき、paperが受け入れられれば公開されます。 Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available upon request and will be publicly released if paper is accepted.	翻訳日:2023-10-25 20:40:47 公開日:2023-10-24
# CRaSh: クラスタリング、削除、共有による完全な大規模言語モデルなしでのファインチューニングの強化 CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model ( http://arxiv.org/abs/2310.15477v1 ) ライセンス: Link先を確認	Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou	(参考訳) 近年,大規模言語モデル(LLM)をアラインし,様々なタスクにまたがる一般化能力を高めるための効果的な手法として,命令チューニングが認識されている。しかし、公開アクセス可能な集中型LLMをプライベートな命令データでチューニングする場合、プライバシー上の懸念は避けられない。モデル間のパラメータ化モジュールの直接移動は、この問題に対処するための有効なアプローチであるが、その意味と有効性はさらなる探索が必要である。本稿では,集中型LLMと下流エミュレータ間でトランスフォーマブロックを転送する代表技術であるOffsite-Tuning(OFT)に焦点を当てる。OFTの基礎となるメカニズムの理解が限られていることから,表現と機能的類似性の観点からLLMに関する経験的分析を行う。興味深いことに、モデルのサイズが拡大するにつれて、LLMの層内にユニークなモジュラー構造が現れる。同時に、レイヤ間の表現と中間予測の微妙だが潜在的に重要な変化に注目する。これらの観測に着想を得て、LLMから改善されたエミュレータを導出するトレーニングフリー戦略であり、クラスタリング(Clustering)、削除(Removing)、共有(Sharing)からなるCRaShを提案する。 CRaShは数十億のパラメータを持つOFTのパフォーマンスを大幅に向上させる。さらに,フルモデルを用いる場合と用いない場合の微調整で得られる最適解を,損失ランドスケープの観点から検討した。以上の結果から,同じ盆地に位置するこれらの最適解の間に線形接続性があることが示され,CRaShとOFTの有効性が強調された。ソースコードはhttps://github.com/TsinghuaC3I/CRaShで公開されている。 Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. 
Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without full model through the lens of loss landscape. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.	翻訳日:2023-10-25 20:40:32 公開日:2023-10-24
# 幾何コヒーレンスのトレードオフ関係 Trade-off relations of geometric coherence ( http://arxiv.org/abs/2310.15476v1 ) ライセンス: Link先を確認	Bingyu Hu and Ming-Jing Zhao	(参考訳) 量子コヒーレンスは重要な量子資源であり、様々な研究分野と密接に関連している。幾何コヒーレンス(geometric coherence)は、操作的にも幾何学的にも意味を持つコヒーレンス尺度である。量子ビット系における幾何コヒーレンスのトレードオフ関係について検討する。まず、量子状態の純度によって幾何コヒーレンスの上界を導出する。これにより、量子コヒーレンスと混合性との相補性関係が確立される。次に, 2 つおよび 3 つの一般測定基底上の幾何コヒーレンスの量子不確定性関係を、それぞれ非両立性の観点から導出する。これらは純粋状態に対して状態非依存であることがわかる。これらのトレードオフ関係は、量子コヒーレンスの量に制限を与える。副産物として、純粋状態アンサンブルを識別する最小誤差確率と量子状態の混合性との相補性関係が確立される。 Quantum coherence is an important quantum resource and it is intimately related to various research fields. The geometric coherence is a coherence measure both operationally and geometrically. We study the trade-off relation of geometric coherence in qubit systems. We first derive an upper bound for the geometric coherence by the purity of quantum states. Based on this, a complementarity relation between the quantum coherence and the mixedness is established. We then derive the quantum uncertainty relations of the geometric coherence on two and three general measurement bases in terms of the incompatibility respectively, which turn out to be state-independent for pure states. These trade-off relations provide the limit to the amount of quantum coherence. As a byproduct, the complementarity relation between the minimum error probability for discriminating a pure-states ensemble and the mixedness of quantum states is established.	翻訳日:2023-10-25 20:40:07 公開日:2023-10-24
# 心不全リスク予測のための解釈型生存分析 Interpretable Survival Analysis for Heart Failure Risk Prediction ( http://arxiv.org/abs/2310.15472v1 ) ライセンス: Link先を確認	Mike Van Ness, Tomas Bosschieter, Natasha Din, Andrew Ambrosy, Alexander Sandhu, Madeleine Udell	(参考訳) 生存分析(Survival analysis)、すなわちイベント発生までの時間の分析は、医療研究において重要かつ広範な問題である。医学研究は伝統的に、その単純さと解釈可能性から、生存分析にCoxモデルを用いてきた。 Coxモデルは、対数線形ハザード関数と時間に対する比例ハザードを仮定しており、これらの仮定が成り立たない場合は性能が低下しうる。機械学習に基づく新しい生存モデルは、これらの仮定を回避し、精度の向上を提供するが、時には、臨床的使用に不可欠なモデル解釈可能性を犠牲にする。本研究では、解釈可能であり、かつ最先端の生存モデルと競合する新しい生存分析パイプラインを提案する。具体的には,サバイバル・スタッキングの改良版を用いて生存分析問題を分類問題に変換し,ControlBurnで特徴選択を行い,Explainable Boosting Machinesで解釈可能な予測を生成する。パイプラインを評価するため,大規模なEHRデータベースを用いて心不全のリスクを予測する。我々のパイプラインは最先端のパフォーマンスを達成し、心不全のリスク要因に関する興味深い新しい洞察を提供する。 Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.	翻訳日:2023-10-25 20:39:50 公開日:2023-10-24
# SteloCoder: Pythonコードへの多言語翻訳のためのデコーダ専用LLM SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation ( http://arxiv.org/abs/2310.15539v1 ) ライセンス: Link先を確認	Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant	(参考訳) 最近、Large Language Models (LLMs) に焦点が当てられ、StarCoder (Li et al., 2023) と Code Llama (Rozière et al., 2023) の両方がコード生成において顕著なパフォーマンスを示している。しかし、効率的なトレーニング技術によるコード翻訳機能の改善はいまだに必要である。これに対応するために,複数のプログラミング言語からPythonコードへの翻訳用に設計された,デコーダ専用のStarCoderベースのLLMであるSteloCoderを紹介する。特にSteloCoderは、入力プログラミング言語を指定せずに、C++、C#、JavaScript、Java、PHPからPythonへのコード変換を実現している。我々は,5つのエキスパートとマルチタスク処理のためのゲーティングネットワークを備えたMixture-of-Experts (MoE)技術を組み込むことで、StarCoderモデルアーキテクチャを改良した。エキスパートはStarCoderの微調整によって得られる。具体的には,各エキスパートのサイズをStarCoderのパラメータ数の0.06%に制限する低ランク適応手法(LoRA)を用いる。同時に、時間的学習効率を向上させるため、カリキュラム学習戦略を採用し、self-instructデータを用いて効率的な微調整を行う。その結果、各エキスパートは1つの80GB A100 HBMでトレーニングするのにわずか6時間しかかからない。 XLCoSTデータセットの実験により、SteloCoderは、複数のプログラミング言語からPythonへの翻訳において平均73.76のCodeBLEUスコアを達成し、リーダーボードの最高パフォーマンスを3.5以上上回った。この成果は、StarCoderをバックボーンとした、わずか4500万の追加パラメータと、1つの80GB A100 HBMでの32時間の有効なトレーニングによるものである。ソースコードはhttps://github.com/sade-adrien/SteloCoderで公開されている。 With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozière et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. 
Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert's size to only 0.06% of the number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt a curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on a single 80GB A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average CodeBLEU score of 73.76 in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is released here: https://github.com/sade-adrien/SteloCoder.	翻訳日:2023-10-25 20:32:48 公開日:2023-10-24
# 協調サンプル選択とコントラスト半監督学習を用いた雑音ラベルによる学習 Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning ( http://arxiv.org/abs/2310.15533v1 ) ライセンス: Link先を確認	Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, Wangmeng Zuo, Yiwen Guo, Zhaopeng Meng	(参考訳) ノイズラベルを用いた学習(LNL)は広く研究されており、既存のアプローチでは、クリーンサンプルの選択と半教師付き学習(SSL)を交互に行うフレームワークが一般的である。しかし、このアプローチには制限があり、Deep Neural Network (DNN)分類器によって選択されたクリーンセットは、必然的にノイズの多いサンプルを含んでいる。クリーンなサンプルとノイズの多いサンプルの混合は、SSL中のDNNトレーニングの誤認を招き、サンプル選択におけるエラー蓄積による確認バイアスによる一般化性能を損なう。この問題に対処するために,大規模事前学習モデルクリップを活用した協調サンプル選択法(collaborative sample selection, css)を提案する。 CSSは、特定されたクリーンセットから混合ノイズサンプルを削除することを目的としている。私たちは,CLIPの確率とDNN分類器の予測を組み合わせた2次元ガウス混合モデル (2D-GMM) を訓練することにより,これを実現できる。また,CLIPのLNLへの適応性を高めるために,半教師付き学習における対照的な損失を伴う協調学習機構を導入する。これにより、CLIPとDNN分類器のプロンプトを共同でトレーニングし、特徴表現の改善、DNNの分類性能の向上、協調サンプル選択に対する相互利益をもたらすことができる。 CLIPからの補助情報と即時微調整を活用することにより、クリーンセットからノイズサンプルを効果的に除去し、トレーニング中の確認バイアスを軽減する。複数のベンチマークデータセットに対する実験結果から,提案手法の有効性を最先端手法と比較した。 Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean and noisy samples leads to misguidance in DNN training during SSL, resulting in impaired generalization performance due to confirmation bias caused by error accumulation in sample selection. To address this issue, we propose a method called Collaborative Sample Selection (CSS), which leverages the large-scale pre-trained model CLIP. CSS aims to remove the mixed noisy samples from the identified clean set. We achieve this by training a 2-Dimensional Gaussian Mixture Model (2D-GMM) that combines the probabilities from CLIP with the predictions from the DNN classifier. 
To further enhance the adaptation of CLIP to LNL, we introduce a co-training mechanism with a contrastive loss in semi-supervised learning. This allows us to jointly train the prompt of CLIP and the DNN classifier, resulting in improved feature representation, boosted classification performance of DNNs, and reciprocal benefits to our Collaborative Sample Selection. By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed method in comparison with the state-of-the-art approaches.	翻訳日:2023-10-25 20:32:10 公開日:2023-10-24
# マトリックス機構のプライバシ増幅 Privacy Amplification for Matrix Mechanisms ( http://arxiv.org/abs/2310.15526v1 ) ライセンス: Link先を確認	Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta	(参考訳) プライバシーの増幅はデータ選択のランダム性を利用して、よりタイトな差分プライバシー(DP)保証を提供する。この分析は、DP-SGDが機械学習で成功した鍵であるが、新しい最先端のアルゴリズムには容易に適用できない。これは、DP-FTRLとして知られるこれらのアルゴリズムが、DP-SGDのような独立ノイズの代わりに相関ノイズを追加するために行列機構を使用するためである。本稿では,任意の汎用行列機構に対してサンプリングによるプライバシ増幅を解析する最初のアルゴリズムである「MMCC」を提案する。 MMCCは、$\epsilon\to0$ で下界に近づくという意味で、ほぼタイトである。 MMCCにおける相関出力を解析するために,先行出力に条件付けすることで,独立であるかのように解析できることを証明する。我々の「条件付き合成定理」は広く有用であり、これを用いて、二分木DP-FTRLに付加される雑音が、増幅を伴うDP-SGDに付加される雑音と漸近的に一致し得ることを示す。また,本増幅アルゴリズムは実用上も有用であり,標準ベンチマーク上でのDP-FTRLアルゴリズムのプライバシ・ユーティリティトレードオフを大幅に改善することを示した。 Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.	翻訳日:2023-10-25 20:31:43 公開日:2023-10-24
# 離散デノイジング拡散モデルの固有のプライバシー特性について On the Inherent Privacy Properties of Discrete Denoising Diffusion Models ( http://arxiv.org/abs/2310.15524v1 ) ライセンス: Link先を確認	Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li	(参考訳) プライバシーに関する懸念から、合成データセットの作成が急増し、有望な道として拡散モデルが登場している。先行研究はこれらのモデルに対して経験的評価を行ったが、プライバシ保護能力の数学的特徴付けは与えられていなかった。そこで本研究では,離散データセット生成のための離散拡散モデル(DDM)に固有のプライバシ保護を理論的に検討する。インスタンス毎の差分プライバシー(pDP)に着目して、トレーニングデータセットの各データポイントの潜在的なプライバシー漏洩を解明し、DDMによる合成データセット生成のプライバシーリスクを低減するためのデータ前処理に関する洞察を提供する。また、$s$サイズのデータポイントによるトレーニングは、純粋なノイズから合成クリーンデータフェーズへの移行時に、$(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDPから$(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDPへのプライバシリークの急増をもたらし、拡散係数のより早い減衰は、プライバシの保証を増幅することを示している。最後に,合成データと実世界のデータの両方について理論的知見を実証的に検証する。 Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. 
Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.	翻訳日:2023-10-25 20:31:23 公開日:2023-10-24
# グラフ自己教師付き学習のための生成的および対比的パラダイム Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning ( http://arxiv.org/abs/2310.15523v1 ) ライセンス: Link先を確認	Yuxiang Wang, Xiao Yan, Chuang Hu, Fangcheng Fu, Wentao Zhang, Hao Wang, Shuo Shang, Jiawei Jiang	(参考訳) グラフ自己教師学習(GSSL)では、マスク付きオートエンコーダ(MAE)が生成パラダイムに従い、マスク付きグラフエッジやノードの機能を再構築する。 Contrastive Learning (CL)は、同じグラフの拡張ビューの類似性を最大化し、GSSLで広く使われている。しかし、GSSLの既存の作業では、MAEとCLは別々に検討されている。我々は、mae と cl のパラダイムが相補的であることを観察し、それらを統合するために graph contrastive masked autoencoder (gcmae) フレームワークを提案する。具体的には、ローカルエッジやノード機能に注目して、MAEはグラフのグローバルな情報をキャプチャできず、特定のエッジや機能に敏感である。逆にclはグラフ間の関係を考えるため、グローバル情報を抽出するのに優れている。したがって、GCMAE に MAE ブランチと CL ブランチを装備し、2 つのブランチは共通エンコーダを共有することにより、MAE ブランチは CL ブランチによって抽出されたグローバル情報を利用することができる。 GCMAEにグローバルグラフ構造を捕捉させるため、既存の作業のようにマスクされたエッジのみでなく、隣接行列全体を再構築するように訓練する。さらに,MAEの特徴平滑化問題に対処するため,再構成誤差を低減するのではなく,ノード埋め込み間の格差を改善する特徴再構築のための識別損失を提案する。我々は,4つのグラフタスク(ノード分類,ノードクラスタリング,リンク予測,グラフ分類)におけるGCMAEを評価し,14の最先端ベースラインと比較した。その結果、GCMAEはこれらのタスクに対して常に良好な精度を提供しており、最高性能のベースラインと比較して最大3.2%の精度向上が達成されている。 For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. 
As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.	翻訳日:2023-10-25 20:30:51 公開日:2023-10-24
# MarkQA:数値推論を用いた大規模KBQAデータセット MarkQA: A large scale KBQA dataset with numerical reasoning ( http://arxiv.org/abs/2310.15517v1 ) ライセンス: Link先を確認	Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu	(参考訳) 知識ベースに対する質問応答 (KBQA) はファクトイド問題への対処の進展を示しているが、数値的推論を伴うKBQAはいまだに未解明である。本稿では,KBQAにおける複素数値推論に着目し,マルチホップ推論と数値推論の両方を実行する必要がある新しいタスクNR-KBQAを提案する。 PyQLと呼ばれるPython形式で論理形式を設計し、数値推論問題の推論プロセスを表現する。 NR-KBQAの開発を容易にするため,少量の種子から自動的に構築されるMarkQAと呼ばれる大規模なデータセットを提案する。 MarkQAの各質問には、対応するSPARQLクエリと、QDMRフォーマットとPyQLプログラムのステップバイステップ推論プロセスが備わっている。 MarkQAにおける最先端QA手法の実験結果は、KBQAにおける複雑な数値推論が大きな課題に直面していることを示している。 While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on the complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which necessitates the ability to perform both multi-hop reasoning and numerical reasoning. We design a logic form in Python format called PyQL to represent the reasoning process of numerical reasoning questions. To facilitate the development of NR-KBQA, we present a large dataset called MarkQA, which is automatically constructed from a small set of seeds. Each question in MarkQA is equipped with its corresponding SPARQL query, alongside the step-by-step reasoning process in the QDMR format and PyQL program. Experimental results of some state-of-the-art QA methods on the MarkQA show that complex numerical reasoning in KBQA faces great challenges.	翻訳日:2023-10-25 20:30:21 公開日:2023-10-24
# 負荷依存コストによる中国のポストマン問題を解決するためのグラフ注意に基づく深層強化学習 Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs ( http://arxiv.org/abs/2310.15516v1 ) ライセンス: Link先を確認	Cong Dao Tran, Truong Son Hy	(参考訳) 近年,深い強化学習(DRL)モデルがルーティング問題を解く上で有望な結果を示している。しかしながら、ほとんどのDRLソルバは、トラベリングセールスマン問題(TSP)のようなノードルーティング問題を解決するために一般的に提案されている。一方、中国ポストマン問題(CPP)のようなアークルーティング問題に対するニューラルネットワークの適用については、TSPと比較して不規則で複雑な解空間がしばしばあるため、限定的な研究がなされている。これらのギャップを埋めるために,負荷制約を伴う複雑なアークルーティング問題であるCPP-LC(Corberan et al., 2018)に対処する新しいDRLフレームワークを提案する。この手法の目新しさは2つある。まず、CPP-LCをマルコフ決定過程(MDP)シーケンシャルモデルとして定式化する。次に、CPP-LC課題に効果的に対応するために、エンコーダとデコーダからなるDRL、すなわちArc-DRLに基づく自己回帰モデルを導入する。このようなフレームワークにより、DRLモデルはルーティング問題に対して効率よく、かつ、辛抱強く動作する。さらに,CPP-LCのための進化的アルゴリズム(EA)に基づくバイオインスパイアされた新しいメタヒューリスティックソリューションを提案する。大規模な実験により、Arc-DRLは、(Corberanらによって提案された)CPP-LCの大規模なベンチマークデータセットにおいて、反復局所探索(ILS)や可変近傍探索(VNS)のような既存のメタヒューリスティックな手法よりも、ソリューションの品質と実行時間の両方に関して優れていることが示された。 EA、ILS、VNSといったメタヒューリスティクスのためのC++実装と、データ生成のためのコード、生成されたデータはhttps://github.com/HySonLab/ Chinese_Postman_Problemでリリースしています。 Recently, Deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. 
Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably to arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) proposed by (Corberan et al., 2018) on large benchmark datasets for CPP-LC regarding both solution quality and running time; while the EA gives the best solution quality with much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem	翻訳日:2023-10-25 20:30:05 公開日:2023-10-24
# 火災と闘う - 誤情報の作りと検出におけるLLMの2つの役割 Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation ( http://arxiv.org/abs/2310.15515v1 ) ライセンス: Link先を確認	Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee	(参考訳) 大規模言語モデル(LLM)の最近のユビキティと破壊的な影響は、誤用される可能性(大規模な有害かつ誤解を招くコンテンツを生成すること)を懸念している。 LLMの新たなリスクに対処するために,現代LLMの生成的・創発的推論能力を活用して人文・LLM生成の偽情報に対抗する新しいFighting Fire with Fire(F3)戦略を提案する。まず, GPT-3.5-turboを用いて, パラフレーズベースおよび摂動ベースのプレフィックススタイルのプロンプトにより, 真正および欺瞞的なLLM生成コンテンツをそれぞれ合成する。第2に,ゼロショットの文脈内意味推論手法をclozeスタイルのプロンプトに適用し,投稿やニュース記事の真偽を識別する。我々は,GPT-3.5-turboの分布内および分布外両方のデータセットにおけるゼロショット優位性を観測し,従来のカスタマイズおよび微調整された偽情報検出器で見られた低下とは異なり,GPT-3.5-turboは一貫して68-72%の精度を達成した。私たちのコードベースとデータセットはhttps://github.com/mickeymst/F3で利用可能です。 Recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused (i.e., generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo's zero-shot superiority for both in-distribution and out-of-distribution datasets, where GPT-3.5-turbo consistently achieved accuracy at 68-72%, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.	翻訳日:2023-10-25 20:29:29 公開日:2023-10-24
# 多言語表現の結合行列分解解析 A Joint Matrix Factorization Analysis of Multilingual Representations ( http://arxiv.org/abs/2310.15513v1 ) ライセンス: Link先を確認	Zheng Zhao, Yftah Ziser, Bonnie Webber, Shay B. Cohen	(参考訳) 多言語モデルと単言語モデルの潜在表現を比較するために,結合行列の分解に基づく解析ツールを提案する。探索の代替として、このツールは複数の表現の集合を共同で解析することを可能にする。このツールを用いて,多言語事前学習モデルで学習した表現に形態素的特徴がどのように反映されているかを検討した。 33以上の言語と17種類の形態素合成カテゴリの大規模実証研究を行った。以上の結果から,上層と下層における形態素情報エンコーディングの多様性が示され,言語特性によるカテゴリー別差異がみられた。因子化出力の階層的クラスタリングは、言語学者が手作業で作成した系統樹に関連する木構造をもたらす。さらに、因子化出力は、異なる言語間タスク間で観察される性能と強い相関を示す。将来の研究を促進するためにコードをリリースします。 We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models. An alternative to probing, this tool allows us to analyze multiple sets of representations in a joint manner. Using this tool, we study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models. We conduct a large-scale empirical study of over 33 languages and 17 morphosyntactic categories. Our findings demonstrate variations in the encoding of morphosyntactic information across upper and lower layers, with category-specific differences influenced by language properties. Hierarchical clustering of the factorization outputs yields a tree structure that is related to phylogenetic trees manually crafted by linguists. Moreover, we find the factorization outputs exhibit strong associations with performance observed across different cross-lingual tasks. We release our code to facilitate future research.	翻訳日:2023-10-25 20:29:04 公開日:2023-10-24
# KITAB:情報検索における制約満足度の評価 KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval ( http://arxiv.org/abs/2310.15511v1 ) ライセンス: Link先を確認	Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi	(参考訳) 本研究は,情報検索における制約満足度問合せ(例えば「サンディエゴのアイスクリームショップの一覧」)に対する最新技術モデルの回答能力について検討する。これまでこのようなクエリは,web検索や知識ベースを通じてのみ解決可能なタスクと考えられていた。最近では、大きな言語モデル (LLM) がこのタスクの初期発生能力を示している。しかし、現在の検索ベンチマークの多くは飽和しているか、制約満足度を測定していない。 llmの事実的不正確性と幻覚に関する懸念の高まりに動機づけられ,言語モデルの制約満足度を測定するための新しいデータセットであるkitabを提案する。 KITABは600人以上の著者と13,000のクエリにまたがる書籍関連データで構成され、関連する動的データ収集と制約検証アプローチを提供し、他の著者に対して同様のテストデータを取得する。 GPT4 と GPT3.5 に関する拡張実験では,情報人気,制約タイプ,コンテキストアベイラビリティなど,一般的な障害モードを特徴付ける。その結果,無関係な情報,事実的誤り,不完全性によって測定された厳密な制約が,情報人気が低下するにつれて悪化することが明らかとなった。コンテキスト可用性は無関係な情報を緩和するが、制約を満たすには役立たず、制約満足度に対する基本的な障壁を特定する。今後のモデルの制約満足度向上に関するさらなる研究を促進するため、当社のコントリビューションをオープンソースとして公開します。 We study the ability of state-of-the art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be solved via web-search or knowledge bases. More recently, large language models (LLMs) have demonstrated initial emergent abilities in this task. However, many current retrieval benchmarks are either saturated or do not measure constraint satisfaction. Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models. KITAB consists of book-related data across more than 600 authors and 13,000 queries, and also offers an associated dynamic data collection and constraint verification approach for acquiring similar test data for other authors. 
Our extended experiments on GPT4 and GPT3.5 characterize and decouple common failure modes across dimensions such as information popularity, constraint types, and context availability. Results show that in the absence of context, models exhibit severe limitations as measured by irrelevant information, factual errors, and incompleteness, many of which exacerbate as information popularity decreases. While context availability mitigates irrelevant information, it is not helpful for satisfying constraints, identifying fundamental barriers to constraint satisfaction. We open source our contributions to foster further research on improving constraint satisfaction abilities of future models.	翻訳日:2023-10-25 20:28:50 公開日:2023-10-24
# 炭化ケイ素におけるqudit強化分光法による核スピン量子ビットの測定 Measuring nuclear spin qubits by qudit-enhanced spectroscopy in Silicon Carbide ( http://arxiv.org/abs/2310.15557v1 ) ライセンス: Link先を確認	Erik Hesselmeier, Pierre Kuna, István Takács, Viktor Ivády, Wolfgang Knolle, Misagh Ghezellou, Jawad Ul-Hassan, Durga Dasari, Florian Kaiser, Vadim Vorobyov, Jörg Wrachtrup	(参考訳) 単一電子スピンへの超微細結合を持つ核スピンは、非常に貴重な量子ビットである。本研究では,4H-SiCの単一シリコン空孔色中心(V2)を取り巻く特にリッチな核スピン環境を探索し,特徴付ける。電子スピン-3/2 quditを4レベルセンサーとして使用することにより、超微細な相互作用を通じて、数種類の$^{29}$Siと$^{13}$C核スピンを同定する。我々は、光検出核共鳴による超微細結合の主要成分を抽出し、DFTシミュレーションにより結晶中の殻群に割り当てる。我々は、電子スピンの基底状態レベルの反交差を動的核偏極に利用し、核スピン偏極を最大$98\pm6\,\%$とする。この手法は、個々のスピンの核磁気共鳴信号を検出し、そのコヒーレント制御を実証するために使用できる。我々の研究は、多ビットメモリおよび量子コンピューティングプラットフォームとしてSiCが将来使われるためのパラメータの詳細なセットを提供する。 Nuclear spins with hyperfine coupling to single electron spins are highly valuable quantum bits. In this work we probe and characterise the particularly rich nuclear spin environment around single silicon vacancy color-centers (V2) in 4H-SiC. By using the electron spin-3/2 qudit as a 4-level sensor, we identify several groups of $^{29}$Si and $^{13}$C nuclear spins through their hyperfine interaction. We extract the major components of their hyperfine coupling via optically detected nuclear resonance, and assign them to shell groups in the crystal via DFT simulations. We utilise the ground state level anti-crossing of the electron spin for dynamic nuclear polarization and achieve a nuclear spin polarization of up to $98\pm6\,\%$. We show that this scheme can be used to detect the nuclear magnetic resonance signal of individual spins and demonstrate their coherent control. Our work provides a detailed set of parameters for future use of SiC as a multi-qubit memory and quantum computing platform.	翻訳日:2023-10-25 20:23:01 公開日:2023-10-24
# TCRA-LLM:推論コスト削減のための大規模言語モデル TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction ( http://arxiv.org/abs/2310.15556v1 ) ライセンス: Link先を確認	Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian	(参考訳) ChatGPTが公開用のAPIをリリースして以来、商用の大規模言語モデル(LLM)上に構築されたアプリケーションの数は指数関数的に増加した。このようなモデルの一般的な使用例としては、コンテキスト内学習能力の活用と、検索強化によって得られた知識を活用したユーザクエリによる応答の生成がある。商業的な検索拡張 LLM の展開の1つの問題は、LLM の入力トークンサイズを大幅に増大させる追加の検索コンテキストによるコストである。そこで本研究では,要約圧縮と意味圧縮の2つの手法を含むトークン圧縮方式を提案する。第1の方法は、長さの異なる自己インストラクションを含むサンプルを用いて生成されたデータセットによって微調整されたt5ベースのモデルを適用し、要約を行うことでトークンサイズを削減する。第2の方法は、セマンティクスへの影響が小さい単語を取り除いてトークンサイズを更に圧縮する。提案手法の有効性を適切に評価するために,妊娠期や乳幼児の食品レコメンデーションに着目したFRDB(Food-Recommendation DB)というデータセットを提案し,活用する。意味的圧縮は、トークンサイズとパフォーマンスをトレードオフするより柔軟な方法を提供するので、トークンサイズを1.6%の精度低下で20%削減できます。 Since ChatGPT released its API for public use, the number of applications built on top of commercial large language models (LLMs) increase exponentially. One popular usage of such models is leveraging its in-context learning ability and generating responses given user queries leveraging knowledge obtained by retrieval augmentation. One problem of deploying commercial retrieval-augmented LLMs is the cost due to the additionally retrieved context that largely increases the input token size of the LLMs. To mitigate this, we propose a token compression scheme that includes two methods: summarization compression and semantic compression. The first method applies a T5-based model that is fine-tuned by datasets generated using self-instruct containing samples with varying lengths and reduce token size by doing summarization. The second method further compresses the token size by removing words with lower impact on the semantic. In order to adequately evaluate the effectiveness of the proposed methods, we propose and utilize a dataset called Food-Recommendation DB (FRDB) focusing on food recommendation for women around pregnancy period or infants. 
Our summarization compression can reduce 65% of the retrieval token size with further 0.3% improvement on the accuracy; semantic compression provides a more flexible way to trade-off the token size with performance, for which we can reduce the token size by 20% with only 1.6% of accuracy drop.	翻訳日:2023-10-25 20:22:45 公開日:2023-10-24
# 日頭負荷予測のための転送学習--欧州電力需要時系列を事例として Transfer learning for day-ahead load forecasting: a case study on European national electricity demand time series ( http://arxiv.org/abs/2310.15555v1 ) ライセンス: Link先を確認	Alexandros-Menelaos Tzortzis, Sotiris Pelekis, Evangelos Spiliotis, Spiros Mouzakitis, John Psarras, Dimitris Askounis	(参考訳) 電力グリッドの日々の運用には,短期負荷予測(STLF)が不可欠である。しかし、電力需要時系列を特徴付ける非線形性、非定常性、ランダム性は、STLFを困難な課題にしている。ターゲット系列を必ずしも含まない複数の電力需要系列のデータを用いてトレーニングされたニューラルネットワーク(NN)モデルなど、STLFを改善するための様々な予測手法が提案されている。本研究では,この特殊なSTLF(Transfer Learning, TL)の性能について, 欧州各国の日頭電力需要を表す27の時系列を考慮し検討した。我々は、人気があり実装が容易なNNモデルを採用し、クラスタリング分析を行い、シリーズ間の類似パターンを特定し、TLを支援する。この文脈では、クラスタリングステップの有無による2つの異なるTLアプローチが構築され、典型的なNNトレーニング設定と同様に互いに比較される。その結果,特にクラスタリング技術を考慮した場合,TLは従来の手法よりも優れていることがわかった。 Short-term load forecasting (STLF) is crucial for the daily operation of power grids. However, the non-linearity, non-stationarity, and randomness characterizing electricity demand time series render STLF a challenging task. Various forecasting approaches have been proposed for improving STLF, including neural network (NN) models which are trained using data from multiple electricity demand series that may not necessarily include the target series. In the present study, we investigate the performance of this special case of STLF, called transfer learning (TL), by considering a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. In this context, two different TL approaches, with and without the clustering step, are compiled and compared against each other as well as a typical NN training setup. Our results demonstrate that TL can outperform the conventional approach, especially when clustering techniques are considered.	翻訳日:2023-10-25 20:22:22 公開日:2023-10-24
# 圧縮光キャビティモードにおける単一原子の量子速度限界 Quantum speed limit of a single atom in a squeezed optical cavity mode ( http://arxiv.org/abs/2310.15554v1 ) ライセンス: Link先を確認	Ya-Jie Ma, Xue-Chen Gao, Shao-Xiong Wu, and Chang-shui Yu	(参考訳) 本研究では,Fabry-Perotマイクロ共振器に閉じ込められた単一原子の量子速度限界について理論的に検討する。 2階非線形媒体に駆動レーザを印加した場合、キャビティモードは圧縮され、ボゴリューボフスクイーズ変換の下で有効ハミルトニアンが得られる。進化した原子状態の解析的表現は、初期励起状態に対する非エルミートなシュレーディンガー方程式を用いて得ることができ、量子速度制限時間は解析的表現とマスター方程式法の両方で非常によく一致する。量子速度制限の観点からは、大きな離調、強い駆動、強い結合強度が量子状態の進化の加速に有利である。初期重ね合わせ状態の場合、初期状態の形式は進化速度により大きな影響を与える。量子速度制限時間はシステムパラメータに依存するだけでなく、初期状態によっても決定される。 We theoretically study the quantum speed limit of a single atom trapped in a Fabry-Perot microresonator. The cavity mode will be squeezed when a driving laser is applied to the second-order nonlinear medium, and the effective Hamiltonian can be obtained under the Bogoliubov squeezing transformation. The analytical expression of evolved atom state can be obtained by using the non-Hermitian Schrödinger equation for the initial excited state, and the quantum speed limit time coincides very well for both the analytical expression and the master equation method. From the perspective of quantum speed limit, it is more conducive to accelerate the evolution of the quantum state for the large detuning, strong driving and coupling strength. For the initial superposition state case, the form of initial state has more influence on the evolution speed. The quantum speed limit time is not only dependent on the system parameters but also determined by the initial state.	翻訳日:2023-10-25 20:22:07 公開日:2023-10-24
# トランスフォーマーモデルにおける多言語性:フィードフォワードネットワークにおける言語特異性の検討 Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks ( http://arxiv.org/abs/2310.15552v1 ) ライセンス: Link先を確認	Sunit Bhattacharya and Ondrej Bojar	(参考訳) 最近の研究では、トランスフォーマー内のフィードフォワードモジュールは、トレーニングの例に基づいて入力から特定のパターンをキャプチャすることを学ぶキーバリューメモリの集合と見なすことができる。次に、キーの"メモリ"から出力された値を組み合わせて、次のトークンに関する予測を生成する。これは、出力層の近くの最終的なトークン選択に向けて徐々に収束する予測の漸進的なプロセスにつながる。この興味深い視点は、多言語モデルがこのメカニズムをどのように活用するかという疑問を提起する。具体的には、2つ以上の言語でトレーニングされた自己回帰モデルでは、すべてのニューロン(クロス層)はすべての言語に等しく反応するのか? いいえ! 我々の仮説は、事前学習中に特定のモデルパラメータが強い言語固有の特徴を学習する一方で、他のパラメータは言語に依存しない(言語間で共有される)特徴を学習するという考えを中心にしている。これを検証するために,本モデルが最初に事前学習された2言語の並列コーパスを用いて実験を行った。その結果,ネットワークの入力や出力に最も近い層は,中間層に比べて言語固有の振る舞いを示す傾向があることがわかった。 Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then combine the output from the 'memories' of the keys to generate predictions about the next token. This leads to an incremental process of prediction that gradually converges towards the final token choice near the output layers. This interesting perspective raises questions about how multilingual models might leverage this mechanism. Specifically, for autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages? No! Our hypothesis centers around the notion that during pretraining, certain model parameters learn strong language-specific features, while others learn more language-agnostic (shared across languages) features. To validate this, we conduct experiments utilizing parallel corpora of two languages that the model was initially pretrained on. Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.	
翻訳日:2023-10-25 20:21:51 公開日:2023-10-24
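For the feed-forward entry above, a minimal sketch of the key-value memory view that the abstract cites as its starting point: each key scores the input, and the output is the score-weighted sum of the matching values. This is an illustration only, not the authors' experimental code. It assumes a ReLU activation and omits biases, residual connections and layer normalisation; the name `ffn_as_memory` is hypothetical.

```python
from typing import List, Sequence, Tuple


def ffn_as_memory(
    x: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Compute FFN(x) = ReLU(K x) V and return (output, memory coefficients)."""
    # Each key is a pattern detector: its coefficient is ReLU(<key, x>).
    coeffs = [
        max(0.0, sum(k_i * x_i for k_i, x_i in zip(key, x))) for key in keys
    ]
    # The output mixes the value vectors, weighted by how strongly each key fired.
    dim_out = len(values[0])
    output = [
        sum(c * value[j] for c, value in zip(coeffs, values))
        for j in range(dim_out)
    ]
    return output, coeffs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 2.0]]
    print(ffn_as_memory([1.0, -1.0], keys, values))
```

Under this view, asking whether a neuron is language-specific amounts to comparing its coefficient on parallel inputs in the two languages.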
# 自己教師付き適応残差推定生成逆ネットワークによるpet合成 PET Synthesis via Self-supervised Adaptive Residual Estimation Generative Adversarial Network ( http://arxiv.org/abs/2310.15550v1 ) ライセンス: Link先を確認	Yuxin Xue, Lei Bi, Yige Peng, Michael Fulham, David Dagan Feng, Jinman Kim	(参考訳) PET(Positron emission tomography)は、臨床診断において広く用いられている、高感度な分子イメージングである。 PETからの放射線被曝を減らすことだけでなく、適切な画質を維持することに関心がある。畳み込みニューラルネットワーク(cnns)を用いた低用量pet画像から合成された高品質pet画像を生成する手法が,低用量画像の復元に最先端の手法であると報告されている。しかし,これらの手法は,合成画像と実画像のテクスチャと構造にばらつきが生じやすい。さらに,低用量PETと標準PETとの分布変化について検討した。これらの課題に対処するため,我々は,自己教師付き適応残差推定生成ネットワーク(SS-AEGAN)を開発した。本稿では,(1)低線量PETと合成出力との残差マップを入力とし,予備合成PET画像の動的修正を目的とした適応残差推定機構であるAE-Net,(2)粗いジェネレータの特徴表現を強化する自己教師付き事前学習戦略を紹介する。全身PET画像の公開ベンチマークデータを用いて実験したところ,SS-AEGANは様々な線量削減因子による最先端合成法よりも一貫して優れていた。 Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging in clinical diagnosis. There is interest in reducing the radiation exposure from PET but also maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low-to-high image recovery methods. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) An adaptive residual estimation mapping mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) A self-supervised pre-training strategy to enhance the feature representation of the coarse generator. 
Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors.	翻訳日:2023-10-25 20:21:30 公開日:2023-10-24
# テンソル最適化におけるアルゴリズムの正則化:マトリックスセンシングの解法に向けて Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing ( http://arxiv.org/abs/2310.15549v1 ) ライセンス: Link先を確認	Ziye Ma, Javad Lavaei, Somayeh Sojoudi	(参考訳) 勾配降下(GD)は、暗黙の正規化を誘導し、コンパクト表現を促進するため、機械学習モデルの一般化に不可欠である。本研究では, テンソル最適化のための暗黙的正則化誘導におけるgdの役割について検討する。このフレームワークは、対称なランク1テンソルを最適化する際に、急激な解を厳密なサドルに変換することによって、非凸行列センシング問題に対処するために最近提案されている。十分に小さな初期化スケールで、この昇降問題に適用されたGDは、近似階数1テンソルと逃避方向の臨界点を導出する。本研究は, 行列センシングのテンソルパラメトリゼーションが一階法と組み合わせ, この問題における大域的最適性を達成する上で重要であることを裏付ける。 Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.	翻訳日:2023-10-25 20:21:10 公開日:2023-10-24
# トラップイオン中のボソニック論理状態のロバストと決定論的生成 Robust and Deterministic Preparation of Bosonic Logical States in a Trapped Ion ( http://arxiv.org/abs/2310.15546v1 ) ライセンス: Link先を確認	V. G. Matsos, C. H. Valahu, T. Navickas, A. D. Rao, M. J. Millican, M. J. Biercuk and T. R. Tan	(参考訳) ボソニックモードにおける論理量子ビットの符号化は、フォールトトレラント量子情報処理のハードウェア効率の高い実装を提供する。閉じ込められたイオンと超伝導マイクロ波キャビティの最近の進歩は、高品質なボソニック状態の実験的実現と、ボソニックモードで符号化された誤り訂正論理量子ビットの実証につながっている。しかし、現在のボゾン符号語作成プロトコルは、一般的なノイズ源には堅牢性がなく、実装が実験的に困難であり、これまで実現されてきたコードの品質と幅を制限している。本稿では, ロバスト制御による誤り抑制の概念と量子誤差補正符号化を組み合わせることで, 捕捉イオンの力学的運動における非古典的ターゲットボソニック状態の高忠実性, 決定論的生成を実験的に証明する。本稿では,レーザ駆動によるスピンモーション相互作用の動的変調を数値的に最適化し,目標状態を生成する手法を提案する。最適化された制御パルスは実験的な制約に合わせて調整され、支配的なエラー源に対して堅牢に設計されている。これらのプロトコルを用いて、Gottesman-Kitaev-Preskill (GKP)状態の論理的忠実度を$\bar{\mathcal{F}}=0.940(8)$で証明し、平均忠実度$\mathcal{F}=0.807(7)$で距離3二項論理状態の最初の実現を実現し、12.91(5) dBの真空状態を示す。 Encoding logical qubits in bosonic modes provides a potentially hardware-efficient implementation of fault-tolerant quantum information processing. Recent advancements in trapped ions and superconducting microwave cavities have led to experimental realizations of high-quality bosonic states and demonstrations of error-corrected logical qubits encoded in bosonic modes. However, current protocols for preparing bosonic code words lack robustness to common noise sources and can be experimentally challenging to implement, limiting the quality and breadth of codes that have been realized to date. Here, we combine concepts of error suppression via robust control with quantum error correction encoding and experimentally demonstrate high-fidelity, deterministic preparation of highly non-classical target bosonic states in the mechanical motion of a trapped ion. Our approach implements numerically optimized dynamical modulation of laser-driven spin-motion interactions to generate the target state in a single step. 
The optimized control pulses are tailored towards experimental constraints and are designed to be robust against the dominant source of error. Using these protocols, we demonstrate logical fidelities for the Gottesman-Kitaev-Preskill (GKP) state as high as $\bar{\mathcal{F}}=0.940(8)$, achieve the first realization of a distance-3 binomial logical state with an average fidelity of $\mathcal{F}=0.807(7)$, and demonstrate a 12.91(5) dB squeezed vacuum state.	翻訳日:2023-10-25 20:20:54 公開日:2023-10-24
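For the squeezed-vacuum figure in the entry above, a small arithmetic sketch of how a squeezing level in dB maps to a quadrature variance relative to the vacuum state. It assumes the usual convention, squeezing in dB = -10 log10(Var_squeezed / Var_vacuum); the abstract does not state which convention the authors use, and both function names are hypothetical.

```python
import math


def variance_ratio_from_db(squeezing_db: float) -> float:
    """Squeezed-quadrature variance divided by the vacuum variance."""
    return 10.0 ** (-squeezing_db / 10.0)


def db_from_variance_ratio(ratio: float) -> float:
    """Inverse conversion: variance ratio back to squeezing in dB."""
    if ratio <= 0.0:
        raise ValueError("variance ratio must be positive")
    return -10.0 * math.log10(ratio)


if __name__ == "__main__":
    print(variance_ratio_from_db(12.91))
```

Under that convention, 12.91 dB corresponds to a squeezed-quadrature variance of roughly 5% of the vacuum level.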
# 複数の解像度でのルーティング問題を解決する対称性保存グラフアテンションネットワーク Symmetry-preserving graph attention network to solve routing problems at multiple resolutions ( http://arxiv.org/abs/2310.15543v1 ) ライセンス: Link先を確認	Cong Dao Tran, Thong Bach, Truong Son Hy	(参考訳) トラベリングセールスパーソン問題 (TSP) と車両ルーティング問題 (VRP) は,機械学習 (ML) 手法の適応により,精度と計算時間を合理的に向上した。しかし、以前の作品では、回転、翻訳、置換、スケーリングを含む、tspsとvrpから生じる対称性を完全に尊重していない。本研究では,組合わせ問題を解くために,最初の完全同値モデルとトレーニングを導入する。さらに、特に大きなグラフや長距離グラフの場合において、入力グラフのマルチスケール構造(ローカルからグローバル情報)を捉えることが不可欠であり、従来の手法は局所的あるいは準最適解に繋がるローカル情報のみを抽出することに限定されていた。上記の制限に対処するため,マルチレゾリューション方式と等価グラフアテンションネットワーク(mEGAT)アーキテクチャを併用して,低レベルおよび高レベルグラフレゾリューションに基づく最適経路を効率的に学習する手法を提案する。特に, 入力グラフから粗粒グラフの階層構造を構築し, まずは単純な低レベルグラフのルーティング問題を解き, その知識をより複雑な高レベルグラフに活用する。実験により,本モデルが既存のベースラインより優れており,対称性の保存とマルチレゾリューションがデータ駆動方式で組合せ問題を解くための重要なレシピであることを実証した。私たちのソースコードはhttps://github.com/HySonLab/Multires-NP-hardで公開されています。 Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. 
In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard	翻訳日:2023-10-25 20:20:21 公開日:2023-10-24
# 辞書から概念的役割を学習することによる理解と一貫性の言語モデルの改善 Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary ( http://arxiv.org/abs/2310.15541v1 ) ライセンス: Link先を確認	Myeongjun Erik Jang, Thomas Lukasiewicz	(参考訳) 現代事前訓練言語モデル(PLM)の非人間的な振る舞いは、その信頼性を損なう主要な原因である。このような不整合な振る舞いの驚くべき現象は、一貫性のない予測の生成であり、同じ意味を持つテキストに対して異なる予測を生成したり、論理特性に違反するなど、論理的に矛盾する結果を生み出す。以前の研究では、データの増大を悪用したり、問題を緩和するために特殊な損失関数を実装した。しかし、大規模なPLMのために高価なトレーニングリソースを消費し、一定の一貫性のタイプしか扱えないため、利用は限られている。そこで本研究では,plmの意味認識を根本的に改善することにより,一貫性のない行動問題を緩和する実践的アプローチを提案する。概念的役割理論に基づき,辞書内の単語定義ペアから概念間の正確な相互関係を学習することにより,plmが正確な意味を捉えることができる。次に,学習した相互関係とPLMの事前学習知識を組み合わせるために,いくつかの追加パラメータのみを更新する効率的なパラメータ統合手法を提案する。実験の結果,複数種類の一貫性を同時に改善し,効率的な知識統合を実現し,他の言語にも容易に適用できることが判明した。 The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness. A striking phenomenon of such faulty behaviours is the generation of inconsistent predictions, which produces logically contradictory results, such as generating different predictions for texts delivering the same meaning or violating logical properties. Previous studies exploited data augmentation or implemented specialised loss functions to alleviate the issue. However, their usage is limited, because they consume expensive training resources for large-sized PLMs and can only handle a certain consistency type. To this end, we propose a practical approach that alleviates the inconsistent behaviour issue by fundamentally improving PLMs' meaning awareness. Based on the conceptual role theory, our method allows PLMs to capture accurate meaning by learning precise interrelationships between concepts from word-definition pairs in a dictionary. Next, we propose an efficient parameter integration technique that updates only a few additional parameters to combine the learned interrelationship with PLMs' pre-trained knowledge. 
Our experimental results reveal that the approach can concurrently improve multiple types of consistency, enables efficient knowledge integration, and easily applies to other languages.	翻訳日:2023-10-25 20:19:56 公開日:2023-10-24
# 変化のレンズを通して識別可能な潜在多項式因果モデル Identifiable Latent Polynomial Causal Models Through the Lens of Change ( http://arxiv.org/abs/2310.15580v1 ) ライセンス: Link先を確認	Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi	(参考訳) 因果表現学習は、観測された低レベルデータから潜在的な高レベル因果表現を明らかにすることを目的としている。その主な任務の1つは、これらの潜在因果モデルの識別を信頼できる保証を提供することである。最近のブレークスルーでは、複数の環境にまたがる潜在因果変数間の因果影響の変化を利用して、識別可能性を探る。しかし、この進歩は潜在因果変数間の因果関係が線形ガウスモデルに厳密に従うという仮定に基づいている。本稿では,多項式モデルに代表される非線形因果関係と指数関数族に準拠した一般雑音分布を含む潜在因果モデルの範囲を拡張する。さらに,すべての因果パラメータに変化を付与する必要性や,その一部が変化していない場合の部分的識別可能性について検討する。さらに,我々の理論的発見に基礎を置き,一貫した因果表現の学習を可能にする新しい経験的推定法を提案する。合成データと実世界データの両方から得られた実験結果は,識別性と一貫性に関する理論的貢献を検証する。 Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \citep{liu2022identifying}. However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.	翻訳日:2023-10-25 20:12:05 公開日:2023-10-24
# VMAFによるPyTorchの再実装:実験結果 VMAF Re-implementation on PyTorch: Some Experimental Results ( http://arxiv.org/abs/2310.15578v1 ) ライセンス: Link先を確認	Kirill Aistov and Maxim Koroteev	(参考訳) 標準VMAF実装に基づいて,PyTorchフレームワークを用いたVMAFの実装を提案する。この実装で標準(libvmaf)と比較すると、vmafユニットで$\lesssim 10^{-2}$の差が示される。目的関数としてVMAFを使用する場合の勾配計算について検討し、この関数を用いたトレーニングが不利な勾配を生じさせないことを示す。 Based on the standard VMAF implementation we propose an implementation of VMAF using PyTorch framework. For this implementation comparisons with the standard (libvmaf) show the discrepancy $\lesssim 10^{-2}$ in VMAF units. We investigate gradients computation when using VMAF as an objective function and demonstrate that training using this function does not result in ill-behaving gradients.	翻訳日:2023-10-25 20:11:47 公開日:2023-10-24
# CONTRASTE:Aspect-based Promptsを用いた教師付きコントラスト事前訓練 CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction ( http://arxiv.org/abs/2310.15577v1 ) ライセンス: Link先を確認	Rajdeep Mukherjee, Nithish Kannen, Saurabh Kumar Pandey, Pawan Goyal	(参考訳) Aspect Sentiment Triplet extract (ASTE)に関する既存の研究は、タスクのためのより効率的な微調整技術の開発に重点を置いている。私たちのモチベーションは、複数のABSAタスクの下流のパフォーマンスを同時に改善できる汎用的なアプローチを考え出すことです。そこで本研究では,ConTRastive Learningを用いた新しい事前学習戦略であるConTRASTEを提案する。我々は主にASTEに焦点を当てているが、ACOS、TASD、AESCといった他のABSAタスクに対して提案手法の利点を示す。文とその関連する(アスペクト、意見、感情)三つ子を与えられたら、まず、対応する感情を隠蔽したアスペクトベースのプロンプトを設計する。次に,デコーダの生成したアスペクト認識感情表現に対して,コントラスト学習を適用して,エンコーダ-デコーダモデルを訓練する。そこで, モデル重みを微調整するために, ベースエンコーダ・デコーダモデルとタグ付きオピニオン項検出器, 回帰型トリプレット数推定器の2つの補完モジュールを組み合わせた, 新たなマルチタスク手法を提案する。 4つのベンチマークデータセットの徹底的な実験と詳細なアブレーション実験により,提案する各コンポーネントの重要性が証明された。 Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning techniques for the task. Instead, our motivation is to come up with a generic approach that can improve the downstream performances of multiple ABSA tasks simultaneously. Towards this, we present CONTRASTE, a novel pre-training strategy using CONTRastive learning to enhance the ASTE performance. While we primarily focus on ASTE, we also demonstrate the advantage of our proposed technique on other ABSA tasks such as ACOS, TASD, and AESC. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder model by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder model is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. 
Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art ASTE results.	翻訳日:2023-10-25 20:11:41 公開日:2023-10-24
# 量子アルゴリズムによるAgnostic Learningのためのニアクアドラティックサンプル複雑度低減 A Near-Quadratic Sample Complexity Reduction for Agnostic Learning via Quantum Algorithms ( http://arxiv.org/abs/2310.15576v1 ) ライセンス: Link先を確認	Daniel Z. Zanger	(参考訳) 量子アルゴリズムを用いて、精度 $\epsilon,0<\epsilon<1/4$ と信頼 $1-\delta,0<\delta <1,$ の新しいサンプル複雑性上界$O((\mbox{log}(\frac{1}{\delta}))/\epsilon)$ as $\epsilon,\delta\rightarrow 0$ ($\epsilon^{-1}$ のポリ対数係数まで)を一般の無知学習モデルに対して得られる。これは漸近順序 $\Theta((\mbox{log}(\frac{1}{\delta}))/\epsilon^{2})$ の対応するサンプル複雑性を、有限濃度の仮説集合とともに無依存学習問題に対する古典的(非量子)アルゴリズムによって達成可能であることが文献で知られている(例えば Arunachalam と de Wolf (2018) を参照)。したがって、一般的な無依存学習の場合、我々が達成する学習速度の量子スピードアップは、(多対数因子まで)$\epsilon^{-1}$で二次的である。 Using quantum algorithms, we obtain, for accuracy $\epsilon,0<\epsilon<1/4$ and confidence $1-\delta,0<\delta <1,$ a new sample complexity upper bound of $O((\mbox{log}(\frac{1}{\delta}))/\epsilon)$ as $\epsilon,\delta\rightarrow 0$ (up to a polylogarithmic factor in $\epsilon^{-1}$) for a general agnostic learning model, provided the hypothesis class is of finite cardinality. This greatly improves upon a corresponding sample complexity of asymptotic order $\Theta((\mbox{log}(\frac{1}{\delta}))/\epsilon^{2})$ known in the literature to be attainable by means of classical (non-quantum) algorithms for an agnostic learning problem also with hypothesis set of finite cardinality (see, for example, Arunachalam and de Wolf (2018) and the classical statistical learning theory references cited there). Thus, for general agnostic learning, the quantum speedup in the rate of learning that we achieve is quadratic in $\epsilon^{-1}$ (up to a polylogarithmic factor).	翻訳日:2023-10-25 20:11:16 公開日:2023-10-24
# POE:複数選択推論のための除去プロセス POE: Process of Elimination for Multiple Choice Reasoning ( http://arxiv.org/abs/2310.15575v1 ) ライセンス: Link先を確認	Chenkai Ma, Xinya Du	(参考訳) 言語モデル(LM)は、複数の選択推論タスクに対してコンテキスト内学習を行うことができるが、これらのタスクの選択肢は等しく扱われる。人間は最後に正しい答えを選ぶ前に間違った選択肢を最初に排除するので、同様の2段階の戦略は、これらのタスクにおいてLMをより良くする、と私たちは主張する。この目的のために, 2段階のスコアリング法であるプロセス・オブ・エミッション(POE)を提案する。最初のステップでは、POEはそれぞれのオプションをスコアし、一見間違ったオプションを排除します。 2番目のステップでは、POEはこれらの間違ったオプションを隠蔽し、残りのオプションから最終的な予測を行う。 8つの推論タスクのゼロショット実験では,POEの有効性が示され,以下の分析により,論理的推論タスクに特に有効であることが判明した。さらにマスクの効果を分析し,ChatGPTのような少数ショット設定や大規模言語モデル(LLM)に適用できることを示す。 Language models (LMs) are capable of conducting in-context learning for multiple choice reasoning tasks, but the options in these tasks are treated equally. As humans often first eliminate wrong options before picking the final correct answer, we argue a similar two-step strategy can make LMs better at these tasks. To this end, we present the Process of Elimination (POE), a two-step scoring method. In the first step, POE scores each option, and eliminates seemingly wrong options. In the second step, POE masks these wrong options, and makes the final prediction from the remaining options. Zero-shot experiments on 8 reasoning tasks illustrate the effectiveness of POE, and a following analysis finds our method to be especially performant on logical reasoning tasks. We further analyze the effect of masks, and show that POE applies to few-shot settings and large language models (LLMs) like ChatGPT.	翻訳日:2023-10-25 20:10:35 公開日:2023-10-24
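The two-step scoring described in the POE abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `poe_predict`, the rule of eliminating options that score below the mean, the `[MASK]` placeholder, and the `second_scorer` callback are all hypothetical choices made here. In practice both rounds of scores would come from a language model.

```python
from typing import Callable, List, Sequence, Tuple


def poe_predict(
    options: Sequence[str],
    first_scores: Sequence[float],
    second_scorer: Callable[[str, List[str]], float],
) -> Tuple[int, List[int]]:
    """Two-step process-of-elimination sketch.

    Step 1: drop options whose first-round score is below the mean
            (an assumed elimination rule).
    Step 2: mask the dropped options and pick the best remaining one
            using a second scoring function.
    Returns (index of the chosen option, indices kept after step 1).
    """
    if not options or len(options) != len(first_scores):
        raise ValueError("options and first_scores must be non-empty and aligned")

    # Step 1: eliminate seemingly wrong options.
    mean_score = sum(first_scores) / len(first_scores)
    kept = [i for i, s in enumerate(first_scores) if s >= mean_score]

    # Step 2: hide eliminated options, then rescore only the survivors.
    masked = [opt if i in kept else "[MASK]" for i, opt in enumerate(options)]
    best = max(kept, key=lambda i: second_scorer(options[i], masked))
    return best, kept


if __name__ == "__main__":
    opts = ["A", "B", "C", "D"]
    # Hypothetical first-round scores (for example, log-probabilities).
    scores = [-3.0, -1.0, -1.5, -4.0]
    # Hypothetical second-round scorer; a real one would query the model
    # with the masked option list as context.
    table = {"A": 9.0, "B": 1.0, "C": 2.0, "D": 0.0}
    choice, survivors = poe_predict(opts, scores, lambda o, m: table[o])
    print(opts[choice], survivors)
```

Note that option "A" has the highest second-round score in this toy example but cannot be chosen, because it was eliminated in the first step. That is the behaviour the two-step design is meant to enforce.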
# 薬物発見知識グラフのための自然言語処理:約束と落とし穴 Natural Language Processing for Drug Discovery Knowledge Graphs: promises and pitfalls ( http://arxiv.org/abs/2310.15572v1 ) ライセンス: Link先を確認	J. Charles G. Jeynes, Tim James, Matthew Corney	(参考訳) 薬物発見を助けるための知識グラフ(kgs)の構築と分析は、研究のトピックである。 KGsの健全な特徴は、コネクションの発見を容易にするフォーマットで、多くの異種データソースを組み合わせる能力である。 KGsの実用性は、薬物再資源化などの分野で実証されており、手動によるデータの探索とモデリングを通じて洞察されている。本稿では、自然言語処理(nlp)を用いて、通常、科学文献からkgsのデータソースとして非構造化テキストをマイニングする約束と落とし穴について論じる。これは、当初、KG内のデータの基盤としてChEMBLなどの構造化データソースを解析し、NLPを使用してそれらを強化または拡張した経験に基づいています。 KGsのNLPの基本的な約束は、人間のキュレーションだけでは事実上不可能なタスクとして、数百万のドキュメントからデータを自動的に抽出することである。しかしながら、NLP-KGパイプラインには誤った名前のエンティティ認識やオントロジーなどの潜在的な落とし穴があり、最終的には誤った推論や結論につながる可能性がある。 Building and analysing knowledge graphs (KGs) to aid drug discovery is a topical area of research. A salient feature of KGs is their ability to combine many heterogeneous data sources in a format that facilitates discovering connections. The utility of KGs has been exemplified in areas such as drug repurposing, with insights made through manual exploration and modelling of the data. In this article, we discuss promises and pitfalls of using natural language processing (NLP) to mine unstructured text, typically from scientific literature, as a data source for KGs. This draws on our experience of initially parsing structured data sources such as ChEMBL as the basis for data within a KG, and then enriching or expanding upon them using NLP. The fundamental promise of NLP for KGs is the automated extraction of data from millions of documents, a task practically impossible to do via human curation alone. However, there are many potential pitfalls in NLP-KG pipelines, such as incorrect named entity recognition and ontology linking, all of which could ultimately lead to erroneous inferences and conclusions.	翻訳日:2023-10-25 20:10:19 公開日:2023-10-24
# 選択特殊化による視覚的接地連続言語学習 Visually Grounded Continual Language Learning with Selective Specialization ( http://arxiv.org/abs/2310.15571v1 ) ライセンス: Link先を確認	Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter	(参考訳) 視覚に作用する人工エージェントの望ましい特性は、各タスクに十分な専門化と、伝達のための一般的な知識の構築のバランスを保ちながら、言語に変形したタスクのシーケンスを継続的に学習することである。選択的特殊化(Selective specialization)、すなわち各タスクを専門とするモデルコンポーネントの選択は、このトレードオフを管理するための戦略である。しかしながら、選択戦略の設計には、より専門的で一般化可能な表現の学習において、各モデルコンポーネントの役割についての洞察が必要である。そこで本研究の目的は,視覚下連続言語学習のための選択戦略を広範囲に分析することである。この目的に適したベンチマークがないため、徹底したモデル分析に十分な制御と柔軟性を提供する2つの新しい診断データセットを導入する。モジュールの特殊化戦略および2種類のモデルアーキテクチャの定量化のための様々なヒューリスティックスを評価する。最後に,共通の連続学習ベースラインを上回る分析に基づいて,概念的に単純なアプローチをデザインする。本研究は,連続学習アルゴリズムと個別モデル部品の学習行動の連携を改善するためのさらなる取り組みの必要性を示す。 A desirable trait of an artificial agent acting in the visual world is to continually learn a sequence of language-informed tasks while striking a balance between sufficiently specializing in each task and building a generalized knowledge for transfer. Selective specialization, i.e., a careful selection of model components to specialize in each task, is a strategy to provide control over this trade-off. However, the design of selection strategies requires insights on the role of each model component in learning rather specialized or generalizable representations, which poses a gap in current research. Thus, our aim with this work is to provide an extensive analysis of selection strategies for visually grounded continual language learning. Due to the lack of suitable benchmarks for this purpose, we introduce two novel diagnostic datasets that provide enough control and flexibility for a thorough model analysis. We assess various heuristics for module specialization strategies as well as quantifiable measures for two different types of model architectures. Finally, we design conceptually simple approaches based on our analysis that outperform common continual learning baselines. 
Our results demonstrate the need for further efforts towards better aligning continual learning algorithms with the learning behaviors of individual model parts.	翻訳日:2023-10-25 20:10:00 公開日:2023-10-24
# MuLMS: 材料科学領域における情報抽出のための多層注釈テキストコーパス MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain ( http://arxiv.org/abs/2310.15569v1 ) ライセンス: Link先を確認	Timo Pierre Schrader, Matteo Finco, Stefan Grünewald, Felix Hildebrand, Annemarie Friedrich	(参考訳) 研究分野に関する最近の出版物や実験結果をすべて追跡することは難しい課題である。先行研究は、様々な科学分野における情報抽出モデルの有効性を実証した。最近、未研究の材料科学領域向けにいくつかのデータセットがリリースされた。しかしながら、これらのデータセットは、パーシング合成手順や固体酸化物燃料電池などのサブドメインといったサブプロブレムに焦点を当てている。本稿では,材料科学のサブドメイン7つにまたがる50のオープンアクセス記事のデータセットであるmulmsについて述べる。コーパスはドメインの専門家によって注釈付けされており、名前付きエンティティからフレーム構造へのいくつかのレイヤがある。すべてのタスクに対して競合するニューラルモデルを提示し、既存の関連リソースによるマルチタスクトレーニングがメリットをもたらすことを示す。 Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on sub-domains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities over relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.	翻訳日:2023-10-25 20:09:40 公開日:2023-10-24
# I$^2$MD:Modal Mutual Distillationを用いた3D行動表現学習 I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation ( http://arxiv.org/abs/2310.15568v1 ) ライセンス: Link先を確認	Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li	(参考訳) 近年の自己教師型3次元行動表現学習の進歩は、主に対照的な学習によるものである。しかし、従来の対照的な枠組みでは、異なる骨格のモダリティ間の豊富な相補性は未解明のままである。さらに、自己提供したサンプルの識別に最適化されたモデルでは、限定されたアクションカテゴリの場合、同様のポジティブなインスタンスが多数発生する。本研究では, 一般的な相互蒸留(I$^2$MD)フレームワークを導入することで, 上記の問題に対処する。 i$^2$md では、まずクロスモーダル相互作用をクロスモーダル相互蒸留(cmd)過程として再計算する。教員の知識を学生に伝達する既存の蒸留ソリューションとは異なり、CMDでは、知識は継続的に更新され、事前訓練中にモダリティ間で双方向に蒸留される。類似したサンプルの干渉を緩和し,その基盤となるコンテキストを活用するため,IMD(Intra-modal Mutual Distillation)戦略,IMD(Dynamic Neighbors Aggregation)メカニズムを最初に導入し,各モードで追加のクラスタレベルの識別ブランチをインスタンス化する。高度に相関した隣り合う特徴を適応的に集約し、局所的なクラスタレベルのコントラストを形成する。相互蒸留は2つの分枝間で行われ、相互レベルの知識交換が行われる。 3つのデータセットに関する広範な実験は、我々のアプローチが一連の新しいレコードを設定することを示している。 Recent progresses on self-supervised 3D human action representation learning are largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, optimized with distinguishing self-augmented samples, models struggle with numerous similar positive instances in the case of limited action categories. In this work, we tackle the aforementioned problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. 
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly-correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records.	翻訳日:2023-10-25 20:09:28 公開日:2023-10-24
# Ojaのアルゴリズムから応用による乗法重み更新法へ From Oja's Algorithm to the Multiplicative Weights Update Method with Applications ( http://arxiv.org/abs/2310.15559v1 ) ライセンス: Link先を確認	Dan Garber	(参考訳) ojaのアルゴリズムは、主に確率主成分分析の文脈で研究されているよく知られたオンラインアルゴリズムである。我々は、共通の固有ベクトルを共有する任意の(必ずしも確率的ではない)対称行列列に適用すると、ojaのアルゴリズムの後悔は、専門家のアドバイスによる予測問題に対するよく知られた乗法重みの後悔という観点で、直接的に境界づけられるという、我々の知識の最も良いところは、単純な観察をする。単位球面上の二次形式を最適化するいくつかの応用を$\mathbb{R}^n$で論じる。 Oja's algorithm is a well-known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm could be directly bounded in terms of the regret of the well-known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.	翻訳日:2023-10-25 20:09:04 公開日:2023-10-24
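The link between Oja's algorithm and multiplicative weights mentioned in the abstract above can be illustrated numerically. The sketch below is not taken from the paper and does not reproduce its regret analysis. It assumes diagonal matrices (so the shared eigenvectors are the standard basis) and a multiplicative-weights variant with factor `(1 + eta * gain) ** 2`, chosen here so that the match is exact. The names `oja_step`, `mw_step`, and `run_demo` are hypothetical.

```python
import math
from typing import List, Tuple


def oja_step(w: List[float], A: List[List[float]], eta: float) -> List[float]:
    """One Oja update: w <- (w + eta * A w) / ||w + eta * A w||."""
    n = len(w)
    v = [w[i] + eta * sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mw_step(p: List[float], gains: List[float], eta: float) -> List[float]:
    """One multiplicative-weights update with the assumed factor (1 + eta * g)^2."""
    q = [p_i * (1.0 + eta * g) ** 2 for p_i, g in zip(p, gains)]
    total = sum(q)
    return [x / total for x in q]


def run_demo(steps: int = 50, eta: float = 0.1) -> Tuple[List[float], List[float]]:
    """Run both updates on the same sequence of diagonal matrices."""
    # Two alternating diagonals; the sequence is arbitrary, not stochastic.
    diagonals = [[1.0, 0.5, 0.0], [0.2, 0.9, 0.1]]
    n = 3
    w = [1.0 / math.sqrt(n)] * n   # unit vector for Oja's algorithm
    p = [1.0 / n] * n              # uniform distribution over "experts"
    for t in range(steps):
        d = diagonals[t % len(diagonals)]
        A = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
        w = oja_step(w, A, eta)
        p = mw_step(p, d, eta)
    return w, p


if __name__ == "__main__":
    w, p = run_demo()
    # Squared coordinates of the Oja iterate coincide with the MW distribution.
    print([round(x * x, 6) for x in w])
    print([round(x, 6) for x in p])
```

In the shared eigenbasis each coordinate of the Oja iterate is scaled independently by `1 + eta * lambda_i` before normalisation, so its squared coordinates form a probability vector that evolves multiplicatively, as expert weights do. The exact update and constants used in the paper may differ from this illustration.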
# tagE:人間の指示を理解するために身体的エージェントを起動 tagE: Enabling an Embodied Agent to Understand Human Instructions ( http://arxiv.org/abs/2310.15605v1 ) ライセンス: Link先を確認	Chayan Sarkar and Avik Mitra and Pradip Pramanick and Tapas Nayak	(参考訳) 自然言語は、物理的存在を持つ知的エージェントが人間と関わるとき、コミュニケーションの第一のモードとして機能する。多くの研究が、感情分析、意図予測、質問応答、要約といった取り組みを含む自然言語理解(NLU)に焦点を当てているが、NLUの範囲は、具体的エージェントによる具体的な行動を必要とする状況に限られている。自然言語固有の曖昧さと不完全性は、人間の意図を解読しようとする知的エージェントにとっての課題である。この課題に取り組むため,我々は,具体化エージェント (tage) のためのタスクおよび引数グラウンドと呼ばれる新しいシステムを提案する。本システムでは,自然言語で表現された複雑なタスク命令から一連のタスクを抽出するために,発明的なニューラルネットワークモデルを採用している。提案モデルでは,入れ子デコードに富んだエンコーダ・デコーダ・フレームワークを用いて,複雑な命令からタスクとその引数を効果的に抽出する。抽出されたタスクはロボットの確立したスキルコレクションにマッピング(あるいは接地)され、引数は環境に存在するオブジェクトの接地を見つける。システムのトレーニングと評価を容易にするため,複雑な命令を含むデータセットをキュレートした。実験の結果は、ロバストなベースラインモデルよりも優れており、我々のアプローチの長所を浮き彫りにしている。 Natural language serves as the primary mode of communication when an intelligent agent with a physical presence engages with human beings. While a plethora of research focuses on natural language understanding (NLU), encompassing endeavors such as sentiment analysis, intent prediction, question answering, and summarization, the scope of NLU directed at situations necessitating tangible actions by an embodied agent remains limited. The inherent ambiguity and incompleteness inherent in natural language present challenges for intelligent agents striving to decipher human intention. To tackle this predicament head-on, we introduce a novel system known as task and argument grounding for Embodied agents (tagE). At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language. Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions. 
These extracted tasks are then mapped (or grounded) to the robot's established collection of skills, while the arguments find grounding in objects present within the environment. To facilitate the training and evaluation of our system, we have curated a dataset featuring complex instructions. The results of our experiments underscore the prowess of our approach, as it outperforms robust baseline models.	翻訳日:2023-10-25 20:03:21 公開日:2023-10-24
# MUSER: マルチビュー類似のケース検索データセット MUSER: A Multi-View Similar Case Retrieval Dataset ( http://arxiv.org/abs/2310.15602v1 ) ライセンス: Link先を確認	Qingquan Li and Yiran Hu and Feng Yao and Chaojun Xiao and Zhiyuan Liu and Maosong Sun and Weixing Shen	(参考訳) 類似事例検索(SCR)は、司法公正の促進に重要な役割を果たす代表的法的AIアプリケーションである。しかし、既存のSCRデータセットは、事件間の類似性を判断する際にのみ事実記述セクションに焦点をあてており、背景にある洞察力のある推論プロセスを提供する他の価値あるセクション(例えば裁判所の意見)を無視している。さらに、ケースの類似性は、典型的には事実記述のテクスト的意味論のみによって測定され、法的知識の観点からは、訴訟の完全な複雑さを捉えることができない可能性がある。本稿では,多視点類似度測定に基づく類似事例検索データセットであるmuserと,文レベルの法的要素アノテーションを用いた包括的法的要素を提案する。具体的には,3つの視点(法的事実,紛争焦点,法規)を選択し,それぞれに法的要素の包括的かつ構造化されたラベルスキーマを構築し,ケース類似性の正確かつ理解可能な評価を可能にする。構築されたデータセットは、中国の民事事件から始まり、100のクエリケースと4,024の候補ケースを含んでいる。法的な要素予測のためのテキスト分類アルゴリズムと,MUSER上の類似事例を検索するための様々な検索手法を実装した。実験結果から, 法的要素を組み込むことでSCRモデルの性能向上が期待できるが, MUSERがもたらした課題に対処するためには, さらなる努力が必要であることが示唆された。ソースコードとデータセットはhttps://github.com/thulawtech/muserで公開されている。 Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insightful reasoning process behind. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. 
The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.	翻訳日:2023-10-25 20:02:55 公開日:2023-10-24
# 片手で複数の物体をつかむ Grasp Multiple Objects with One Hand ( http://arxiv.org/abs/2310.15599v1 ) ライセンス: Link先を確認	Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang	(参考訳) 人間の手の複雑な運動学は、複数のオブジェクトを同時に把握し、操作することができる。その重要性にもかかわらず、ロボットによるマルチオブジェクトの把持は未検討のままであり、運動学、ダイナミクス、オブジェクト構成の課題を提示している。本稿では,マルチフィンガーデキスタラスハンドを用いたテーブルトップ上のマルチオブジェクトグリップのための2段階手法であるMultiGraspを提案する。それは (i)先延ばし案の作成及び (二物をつかんで持ち上げること。) 実験結果は、主に二重物体の把握と44.13%の成功率の報告に焦点が当てられ、未確認の物体構成への適応性と不正確な把握を示す。フレームワークはまた、推論速度の低下にもかかわらず、2つ以上のオブジェクトを把握できることも示している。 The human hand's complex kinematics allow for simultaneous grasping and manipulation of multiple objects, essential for tasks like object transfer and in-hand manipulation. Despite its importance, robotic multi-object grasping remains underexplored and presents challenges in kinematics, dynamics, and object configurations. This paper introduces MultiGrasp, a two-stage method for multi-object grasping on a tabletop with a multi-finger dexterous hand. It involves (i) generating pre-grasp proposals and (ii) executing the grasp and lifting the objects. Experimental results primarily focus on dual-object grasping and report a 44.13% success rate, showcasing adaptability to unseen object configurations and imprecise grasps. The framework also demonstrates the capability to grasp more than two objects, albeit at a reduced inference speed.	翻訳日:2023-10-25 20:02:34 公開日:2023-10-24
# 対話型スケッチ質問応答における創発的コミュニケーション Emergent Communication in Interactive Sketch Question Answering ( http://arxiv.org/abs/2310.15597v1 ) ライセンス: Link先を確認	Zixing Lei, Yiming Zhang, Yuxin Xiong and Siheng Chen	(参考訳) 視覚に基づく創発的コミュニケーション(EC)は、スケッチを通してコミュニケーションを学び、人間のコミュニケーションの進化を解明することを目的としている。皮肉なことに、以前の作品は、人間のコミュニケーションに欠かせないマルチラウンドインタラクションを無視している。このギャップを埋めるために、我々はまず、2人の共同プレイヤーがスケッチを通して対話し、複数のラウンドで画像に関する質問に答える、インタラクティブスケッチ質問回答(ISQA)タスクを導入する。この課題を達成するために,質問応答精度,複雑化,人間の解釈可能性などの3つの評価因子のバランスを効果的に達成できる,新しいインタラクティブECシステムを設計する。人的評価を含む実験結果から,マルチラウンド対話機構は,適切な人間解釈能力を有する知的エージェント間のコミュニケーションを目標とし,効率的なものにすることが示された。 Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players are interacting through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results including human evaluation demonstrate that multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.	翻訳日:2023-10-25 20:02:20 公開日:2023-10-24
# 検索に基づく知識伝達:超大規模言語モデル圧縮に対する効果的なアプローチ Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression ( http://arxiv.org/abs/2310.15594v1 ) ライセンス: Link先を確認	Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan	(参考訳) 大規模事前学習言語モデル(LLM)は、様々な自然言語処理(NLP)タスクにおいて例外的な性能を示した。しかし、これらのモデルの巨大なサイズは、現実世界のアプリケーションに展開する上で大きな課題をもたらします。多くのモデル圧縮技術が提案されているが、モデルスケールに大きなギャップがある場合、そのほとんどが極端なモデル圧縮を達成するのに適していない。本稿では,LLMの知識を極小モデル(例えば1%)に効果的に伝達する,Retrieval-based Knowledge Transfer (RetriKT)と呼ばれる新しい圧縮パラダイムを提案する。特に,本手法では,LLMから知識を抽出して知識ストアを構築する。モデルの質を向上させるために、ソフトプロンプトチューニングと近位政策最適化(ppo)強化学習技術が採用されている。 SuperGLUE と GLUE ベンチマークによる低リソースタスクに対する大規模な実験が行われた。提案手法はLLMの知識を活用することにより,小規模モデルの性能を著しく向上することを示す。 Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.	
翻訳日:2023-10-25 20:02:05 公開日:2023-10-24
# 顔データ最小化: プライバシーフィルターとしての浅いモデル Facial Data Minimization: Shallow Model as Your Privacy Filter ( http://arxiv.org/abs/2310.15590v1 ) ライセンス: Link先を確認	Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao li, Diqun Yan, Xuhong Zhang, Shouling Ji	(参考訳) 顔認識サービスは、多くの分野で使われており、人々に多くの利便性をもたらしている。しかし、ユーザの顔データがサービスプロバイダに送信されると、ユーザはプライベートデータのコントロールを失うことになる。近年,顔データ漏洩によるセキュリティやプライバシの問題が数多く発生している。多くのプライバシー保護手法が提案されているが、通常は敵の戦略や補助データにアクセスできない場合に失敗する。そこで本稿では,顔認識サービスシステムにおいて非常に典型的な顔画像と顔特徴をアップロードする2つの事例を十分に検討し,データプライバシ最小化変換(pmt)法を提案する。この方法は、認証サービスの浅いモデルに基づいて元の顔データを処理し、難読化データを得る。難読化されたデータは、認可されたモデルの満足なパフォーマンスを維持し、他の許可されていないモデルのパフォーマンスを制限するだけでなく、AIメソッドや人間の視覚的盗難によって元のプライバシデータが漏洩することを防ぐ。また,サービスプロバイダが受信したデータに対して事前処理を行うことができるため,PMTの堅牢性を向上させるための摂動法も提案する。さらに、1つの顔画像を複数のサービスモデルに同時に認可するために、PMTのスケーラビリティを向上させるために複数の制限機構を提案する。最後に,提案するpmtによる顔再建,データ乱用,顔属性推定攻撃に対する防御効果について,広範な実験を行い,その効果を評価した。これらの実験結果から, PMTは顔認識精度を維持しつつ, 顔データの乱用やプライバシーの漏洩を防止できることがわかった。 Face recognition service has been used in many fields and brings much convenience to people. However, once the user's facial data is transmitted to a service provider, the user will lose control of his/her private data. In recent years, there exist various security and privacy issues due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail when they are not accessible to adversaries' strategies or auxiliary data. Hence, in this paper, by fully considering two cases of uploading facial images and facial features, which are very typical in face recognition service systems, we proposed a data privacy minimization transformation (PMT) method. This method can process the original facial data based on the shallow model of authorized services to obtain the obfuscated data. The obfuscated data can not only maintain satisfactory performance on authorized models and restrict the performance on other unauthorized models but also prevent original privacy data from leaking by AI methods and human visual theft. 
Additionally, since a service provider may execute preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Besides, to authorize one facial image to multiple service models simultaneously, a multiple restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments and evaluate the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. These experimental results demonstrate that PMT performs well in preventing facial data abuse and privacy leakage while maintaining face recognition accuracy.	翻訳日:2023-10-25 20:01:49 公開日:2023-10-24
# ScanDL:テキストによる合成スキャンパス生成のための拡散モデル ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts ( http://arxiv.org/abs/2310.15587v1 ) ライセンス: Link先を確認	Lena S. Bolliger, David R. Reich, Patrick Haller, Deborah N. Jakobi, Paul Prasse, Lena A. Jäger	(参考訳) 読書における眼球運動は、人間の言語処理の基礎となる認知メカニズムの研究において重要な役割を担っている。近年,目の動きと認知の密結合は,言語モデルの解釈可能性,拡張性,事前学習といった言語関連機械学習タスクや,読み手やテキスト特有の特性の推論にも活用されている。しかし、眼球運動データの不足とアプリケーション時の利用不可は、この研究のラインにとって大きな課題となっている。当初は、眼球運動データを合成するための認知モデルを用いてこの問題に対処した。しかし、人間のようなスキャンパスを生成する唯一の目的として、純粋にデータ駆動型機械学習ベースの手法の方が適していることが証明されている。近年の拡散過程を離散データに適用する進歩に続いて,テキスト上で合成スキャンパスを生成する新しい離散シーケンス-シーケンス間拡散モデルであるscandlを提案する。事前学習した単語表現を活用し、刺激テキストと固定シーケンスを併用することにより、2つの入力間のマルチモーダル相互作用を捉える。本研究では,データセット内のscandlを評価し,最先端のscanpath生成法を著しく上回っていることを示す。最後に、モデルが人間的な読書行動を示す能力の基盤となる広範な心理言語学的分析を提供する。実装はhttps://github.com/dili-lab/scandlで利用可能です。 Eye movements in reading play a crucial role in psycholinguistic research studying the cognitive mechanisms underlying human language processing. More recently, the tight coupling between eye movements and cognition has also been leveraged for language-related machine learning tasks such as the interpretability, enhancement, and pre-training of language models, as well as the inference of reader- and text-specific properties. However, scarcity of eye movement data and its unavailability at application time poses a major challenge for this line of research. Initially, this problem was tackled by resorting to cognitive models for synthesizing eye movement data. However, for the sole purpose of generating human-like scanpaths, purely data-driven machine-learning-based methods have proven to be more suitable. Following recent advances in adapting diffusion processes to discrete data, we propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts. 
By leveraging pre-trained word representations and jointly embedding both the stimulus text and the fixation sequence, our model captures multi-modal interactions between the two inputs. We evaluate ScanDL within- and across-dataset and demonstrate that it significantly outperforms state-of-the-art scanpath generation methods. Finally, we provide an extensive psycholinguistic analysis that underlines the model's ability to exhibit human-like reading behavior. Our implementation is made available at https://github.com/DiLi-Lab/ScanDL.	翻訳日:2023-10-25 20:01:24 公開日:2023-10-24
# 自己監督型深層学習を用いた開海サーベイランスにおける意図的AISシャットダウンの検出 Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning ( http://arxiv.org/abs/2310.15586v1 ) ライセンス: Link先を確認	Pierre Bernab\'e, Arnaud Gotlieb, Bruno Legeard, Dusica Marijan, Frank Olaf Sem-Jacobsen, Helge Spieker	(参考訳) 海上交通監視においては、違法漁業や違法商品の輸送などの違法行為の検知は沿岸管理にとって重要な課題である。開海では、自動識別システム(ais)のメッセージがオンボードのトランスポンダーによって送信され、監視衛星によって捕捉される。しかし、インシンセア船はしばしば違法行為を隠すためにAISトランスポンダを故意にシャットダウンする。開海では、プロトコルの制限、悪天候条件、衛星位置の制限により、意図的なAISシャットダウンと受信の欠如を区別することが非常に困難である。本稿では,自己教師付き深層学習手法とトランスフォーマーモデルに基づく異常ais欠落検出のための新しい手法を提案する。トレーニングされたモデルは、履歴データを使用して、次の分にメッセージを受け取るかどうかを予測する。その後、モデルが検出された異常を予測と実際に何が起こるかを比較して報告する。本手法は,6万以上の船舶の軌道に対応して,毎月5億以上のaisメッセージをリアルタイムに処理することができる。この手法は、ノルウェーの4つの観測衛星から得られた1年間の実世界のデータに基づいて評価される。関連研究結果を用いて,すでに検出されているAIS停止を再度発見し,本手法の有効性を検証した。 In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or transshipment of illicit products is a crucial task of the coastal administration. In the open sea, one has to rely on Automatic Identification System (AIS) message transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions or restricting satellite positions. This paper presents a novel approach for the detection of abnormal AIS missing reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts if a message should be received in the upcoming minute or not. Afterwards, the model reports on detected anomalies by comparing the prediction with what actually happens. 
Our method can process AIS messages in real time, handling more than 500 million AIS messages per month, which corresponds to the trajectories of more than 60,000 ships. The method is evaluated on one year of real-world data from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering intentional AIS shutdowns that had already been detected.	翻訳日:2023-10-25 20:01:00 公開日:2023-10-24
# 教師指導による構成的視覚推論のためのマルチモーダル表現 Multimodal Representations for Teacher-Guided Compositional Visual Reasoning ( http://arxiv.org/abs/2310.15585v1 ) ライセンス: Link先を確認	Wafa Aissa (CEDRIC - VERTIGO), Marin Ferecatu (CEDRIC - VERTIGO), Michel Crucianu (CEDRIC - VERTIGO)	(参考訳) ニューラルモジュールネットワーク(Neural Module Networks, NMN)は、画像上で順次実行される一連の推論サブタスクからなるプログラムへの質問の変換を可能にする視覚的質問応答のための魅力的な方法である。 NMNは統合モデルと比較して説明可能性を高め、基礎となる推論プロセスの理解を深める。 NMNの有効性を向上させるため,大規模クロスモーダルエンコーダで得られた特徴を活用することを提案する。また、現在のNMNのトレーニング手法は、モジュール出力をその後のモジュールに伝播させることに依存しており、予測誤差の蓄積と誤った回答の生成につながる。これを軽減するために,スケジュールされた教師指導を含むNMN学習戦略を導入する。当初、このモデルは正解(ground-truth)の中間出力によって完全に導かれるが、訓練が進むにつれて徐々に自律的な動作へと移行する。これにより誤りの蓄積が低減され、トレーニング効率と最終性能が向上する。クロスモーダル特徴を導入し、NMNにより効果的なトレーニング技術を採用することで、推論プロセスにおける性能と透明性の良好なバランスが得られることを実証する。 Neural Module Networks (NMNs) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce an answer. NMNs provide enhanced explainability compared to integrated models, allowing for a better understanding of the underlying reasoning process. To improve the effectiveness of NMNs, we propose to exploit features obtained by a large-scale cross-modal encoder. Also, the current training approach of NMNs relies on the propagation of module outputs to subsequent modules, leading to the accumulation of prediction errors and the generation of false answers. To mitigate this, we introduce an NMN learning strategy involving scheduled teacher guidance. Initially, the model is fully guided by the ground-truth intermediate outputs, but it gradually transitions to autonomous behavior as training progresses. This reduces error accumulation, thus improving training efficiency and final performance. We demonstrate that by incorporating cross-modal features and employing more effective training techniques for NMNs, we achieve a favorable balance between performance and transparency in the reasoning process.	
翻訳日:2023-10-25 20:00:40 公開日:2023-10-24
# 無線通信ネットワークによる分割フェデレーション学習の高速化 Accelerating Split Federated Learning over Wireless Communication Networks ( http://arxiv.org/abs/2310.15584v1 ) ライセンス: Link先を確認	Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, and Miaowen Wen	(参考訳) 人工知能(AI)の開発は、ディープニューラルネットワーク(DNN)ベースのアプリケーションを促進する機会を提供する。しかし、DNNの大量のパラメータと計算複雑性により、リソース制約のあるエッジデバイスにデプロイすることは困難である。この課題に対処する効果的な方法はモデル分割/分割であり、DNNはデバイスとサーバにそれぞれデプロイされる2つの部分に分けられる。本稿では,連合学習(fl)の並列モデル学習機構と分割学習(sl)のモデル分割構造を組み合わせたslit federated learning(sfl)フレームワークについて検討する。 DNNの個別分割点を持つ異種デバイスの実用シナリオを考察する。システム遅延を最小限に抑えるために,分割点選択と帯域割り当ての連立問題を定式化する。交互最適化を用いることで、問題を2つのサブプロブレムに分解し、最適に解く。実験の結果,レイテンシ低減と精度向上における作業の優位性を実証した。 The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into two parts which are deployed on device and server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.	翻訳日:2023-10-25 20:00:20 公開日:2023-10-24
# ガウス過程回帰による保証被覆予測間隔 Guaranteed Coverage Prediction Intervals with Gaussian Process Regression ( http://arxiv.org/abs/2310.15641v1 ) ライセンス: Link先を確認	Harris Papadopoulos	(参考訳) ガウス過程回帰 (Gaussian Process Regression, GPR) は一般的な回帰法であり、多くの機械学習技術とは異なり、予測の不確実性の推定を提供する。しかしながら、これらの不確実性の推定は、モデルが十分に特定されているという仮定に基づいている。この仮定は、必要な知識がほとんど得られないため、実用上のほとんどの応用で成り立たない。その結果、生成された不確実性推定は大きく誤解を招くものになり得る。例えば、95%の信頼水準で生成された予測区間(PI)が、真のラベルの95%をはるかに下回る割合しかカバーしないことがある。この問題に対処するため,本稿では,CP(Conformal Prediction)と呼ばれる機械学習フレームワークに基づくGPRの拡張を提案する。この拡張は、モデルが完全に誤って特定されていても、必要なカバレッジでPIの生成を保証する。提案手法は,GPRの利点とCPの有効なカバレッジ保証を組み合わせ,実験により既存の手法よりも優れていることを示す。 Gaussian Process Regression (GPR) is a popular regression method which, unlike most Machine Learning techniques, provides estimates of uncertainty for its predictions. These uncertainty estimates, however, are based on the assumption that the model is well-specified, an assumption that is violated in most practical applications, since the required knowledge is rarely available. As a result, the produced uncertainty estimates can become very misleading; for example, the prediction intervals (PIs) produced for the 95% confidence level may cover much less than 95% of the true labels. To address this issue, this paper introduces an extension of GPR based on a Machine Learning framework called Conformal Prediction (CP). This extension guarantees the production of PIs with the required coverage even when the model is completely misspecified. The proposed approach combines the advantages of GPR with the valid coverage guarantee of CP, and the experimental results demonstrate its superiority over existing methods.	翻訳日:2023-10-25 19:52:38 公開日:2023-10-24
# coannotating: データアノテーションのための人間と大規模言語モデル間の不確実性に基づく作業割り当て CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation ( http://arxiv.org/abs/2310.15638v1 ) ライセンス: Link先を確認	Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang	(参考訳) 注釈付きデータは、訓練モデルにおいて自然言語処理(NLP)において重要な役割を果たす。近年のLLM(Large Language Models)の発展を踏まえると、ChatGPTのようなモデルは、人間のアノテーションと同等かそれ以上の多くのテキストアノテーションタスクにおいてゼロショット機能を示す。このようなllmは、コストの低減とスケーラビリティの向上により、手動アノテーションの代替として機能する。しかし,LLMを補完的なアノテータとして活用した限定的な研究や,品質とコストの両方の目的を達成するために,人間とLLMの間でアノテーション作業がどのように最適に割り当てられているかを考察した。本稿では,非構造化テキストの大規模共同アノテーションのための新しいパラダイムであるCoAnnotatingを提案する。この枠組みでは、不確実性を利用してLCMのアノテーション能力を推定する。我々の実証研究は、CoAnnotatingが、異なるデータセットで結果から作業を割り当てる効果的な手段であることを示し、ランダムベースラインよりも最大21%パフォーマンスが改善されている。コード実装についてはhttps://github.com/SALT-NLP/CoAnnotatingを参照。 Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives for manual annotation, due to lower costs and higher scalability. However, limited work has leveraged LLMs as complementary annotators, nor explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs' annotation capability. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets, with up to 21% performance improvement over random baseline. For code implementation, see https://github.com/SALT-NLP/CoAnnotating.	翻訳日:2023-10-25 19:52:22 公開日:2023-10-24
# Resume Representation Learningとスキルベースマッチングを用いたキャリアパス予測 Career Path Prediction using Resume Representation Learning and Skill-based Matching ( http://arxiv.org/abs/2310.15636v1 ) ライセンス: Link先を確認	Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder and Thomas Demeester	(参考訳) 求職者の満足度とパフォーマンスにフィットするパーソン・ジョブの影響は広く認識されており、キャリアにおける正しいタイミングで労働者に次のステップを提供することの重要性を強調している。キャリアの次のステップを予測するこのタスクは、キャリアパス予測と呼ばれ、ターンオーバー防止や社内仕事の移動といった多様な応用がある。既存のキャリアパス予測手法は、職種と企業間の相互作用をモデル化するために、大量のプライベートキャリア履歴データに依存している。本稿では,履歴書の作業経験セクションの一部である未検討のテキスト記述を活用することを提案する。 ESCOの職業ラベルにアノテートした2,164人の匿名キャリア履歴の構造化データセットを導入する。このデータセットに基づいて,作業履歴データ専用に設計された新しい表現学習手法である careerbert を提案する。キャリアパス予測のためのスキルベースモデルとテキストベースモデルを開発し,データセット上でそれぞれ35.24%と39.61%のre recall@10を達成した。最後に、ハイブリッドアプローチが43.01%のリコール@10で最強の結果を得るため、両方のアプローチが相補的であることを示す。 The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach achieves the strongest result with 43.01% recall@10.	
翻訳日:2023-10-25 19:51:51 公開日:2023-10-24
# 言語設計、ライブラリ、ガベージコレクションで64ビットアーキテクチャを最大限活用するためのヒント Tips for making the most of 64-bit architectures in langage design, libraries or garbage collection ( http://arxiv.org/abs/2310.15632v1 ) ライセンス: Link先を確認	Benoît Sonntag (UNISTRA), Dominique Colnet (LORIA)	(参考訳) 今日標準になった64ビットアーキテクチャは、前例のない低レベルプログラミングの可能性を秘めている。 The 64-bit architectures that have become standard today offer unprecedented low-level programming possibilities. 
For the first time in the history of computing, the size of address registers far exceeded the physical capacity of their bus. After a brief reminder of the possibilities offered by the small size of addresses compared to the available 64 bits, we develop three concrete examples of how the vacant bits of these registers can be used. Two of these examples concern the implementation of a library for a new statically typed programming language. The first is the implementation of multi-precision integers, with the aim of improving performance in terms of both calculation speed and RAM savings. The second focuses on the library's handling of UTF-8 character strings; here, the idea is to make indexing easier by ignoring the physical size of each UTF-8 character. Finally, the third example is a possible enhancement of garbage collectors, in particular of mark & sweep collectors during the object marking phase.	翻訳日:2023-10-25 19:51:18 公開日:2023-10-24
# 圧縮量子波形推定 Compressive quantum waveform estimation ( http://arxiv.org/abs/2310.15630v1 ) ライセンス: Link先を確認	Alex Tritt, Joshua Morris, Christopher C. Bounds, Hamish A. M. Taylor, James Saunderson, L. D. Turner	(参考訳) 量子センサーを信号全体(量子波形推定)のサンプルに適用することは、医療研究のためにニューロンが生成する電気パルスのモニタリングなど、小さな信号のセンシングに革命をもたらす。しかし、量子リソース(例えば、長いセンシング時間や多くの破壊的測定値)の集中的な使用は、現在の実装を現実世界での使用には実用的でないものにしている。本レターでは、合成した神経様信号の量子波形推定を実験的に実証し、素朴に必要とされるよりもはるかに少ない冷却原子測定でこれを実現した。 Applying quantum sensors to sample entire signals (quantum waveform estimation) promises to revolutionize the sensing of small signals, such as the monitoring of electrical pulses generated by neurons for medical research. However, intensive use of quantum resources (e.g., long sensing times and/or many destructive measurements) makes current implementations impractical for real-world use. In this Letter, we experimentally demonstrate quantum waveform estimation of a synthesized neural-like signal, taking many fewer cold-atom measurements than would naively be necessary.	翻訳日:2023-10-25 19:50:55 公開日:2023-10-24
# シリコン系バレーフォトニック結晶における光周波数コムのオンチップ位相輸送 On-chip topological transport of optical frequency combs in silicon-based valley photonic crystals ( http://arxiv.org/abs/2310.15629v1 ) ライセンス: Link先を確認	Zhen Jiang, Hongwei Wang, Yuechen Yang, Yang Shen, Bo Ji, Yanghe Chen, Yong Zhang, Lu Sun, Zheng Wang, Chun Jiang, Yikai Su, and Guangqiang He	(参考訳) 集積フォトニックシステムにおける光周波数コムの生成と制御は、複雑で高可制御性で大規模デバイスを可能にする。平行して、多粒子系におけるトポロジカル物理学の活用は、製造の不完全性に対する堅牢性のような魅力的な特徴を持つ。ここでは,古典的領域と非古典的領域の両方において,通信波長における光周波数コムのオンチップトポロジ輸送を実験的に実証する。量子周波数コムと消散性Kerrソリトンコムの両方にマイクロ共振器でアクセスする。量子周波数コム、すなわち多重周波数モードのコヒーレント重ね合わせは、周波数絡み合いqudit状態であることが証明されている。また, 散逸性カーソリトンコームは, 集団的コヒーレンスやソリトンの自己組織化により, 高いコヒーレント性とモード同期性を示す。さらに、バレー・キンク状態は、量子周波数コムと散逸性カー・ソリトンコムの両方を、鋭い曲がりに対する頑丈さで許容する。位相的に保護された光周波数コムは、複合フォトニックシステムにおいて固有のロバスト性を可能にする。 The generation and control of optical frequency combs in integrated photonic systems enables complex, high-controllable, and large-scale devices. In parallel, harnessing topological physics in multipartite systems has allowed them with compelling features such as robustness against fabrication imperfections. Here we experimentally demonstrate on-chip topological transport for optical frequency combs at telecommunication wavelengths, both in classical and nonclassical domains. We access both the quantum frequency combs and dissipative Kerr soliton combs with a micro-resonator. The quantum frequency comb, that is, a coherent superposition of multiple frequency modes, is proven to be a frequency-entangled qudit state. We also show that dissipative Kerr soliton combs are highly coherent and mode-locked due to the collective coherence or self-organization of solitons. Moreover, the valley kink states allow both quantum frequency combs and dissipative Kerr soliton combs with robustness against sharp bends. Our topologically protected optical frequency combs could enable the inherent robustness in integrated complex photonic systems.	翻訳日:2023-10-25 19:50:28 公開日:2023-10-24
# 文脈指向非巡回グラフ Contextual directed acyclic graphs ( http://arxiv.org/abs/2310.15627v1 ) ライセンス: Link先を確認	Ryan Thompson, Edwin V. Bonilla, Robert Kohn	(参考訳) 観測データから有向非巡回グラフ(DAG)の構造を推定することは、機械学習において重要な課題である。この地域のほとんどの研究は、人口の1つのDAGを学ぶことに集中している。本稿では、利用可能な「文脈的」特徴に基づき、個人間でグラフ構造が変化する別の設定を検討する。我々は、コンテキスト特徴を重み付き隣接行列として表されるDAGにマッピングするニューラルネットワークを介して、このコンテキストDAG問題に取り組む。ニューラルネットワークは、出力行列がスパースであることを保証する新規な投影層を備え、最近開発された非循環性の特徴を満足する。我々は,コンテキストDAGを学習するためのスケーラブルな計算フレームワークを考案し,プロジェクション層をバックプロパゲーションするための収束保証と解析的勾配を提供する。実験の結果,既存手法が失敗するコンテキスト固有グラフを復元できる可能性が示唆された。 Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.	翻訳日:2023-10-25 19:50:02 公開日:2023-10-24
# gupnet++: 単眼3次元物体検出のための幾何不確かさ伝播ネットワーク GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection ( http://arxiv.org/abs/2310.15624v1 ) ライセンス: Link先を確認	Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang	(参考訳) 幾何学は単眼3次元物体検出において重要な役割を担っている。物体の物理的大きさと画像平面の2次元投影の間の視点投影を用いて物体の深さを推定することができ、深部モデルに数学的先行性を導入することができる。しかし、このプロジェクションプロセスは、推定高さの誤差を増幅し、投影された深さに反映する誤差増幅も導入する。信頼できない深さの推測を導き、トレーニングの安定性を損なう。そこで本研究では,幾何投影を確率論的にモデル化し,新たな幾何不確かさ伝播ネットワーク(gupnet++)を提案する。これにより、深さ予測が十分に拘束され、合理的な不確実性に結びつくことが保証される。このような幾何学的不確実性を導入する意義は、2つある:(1)。トレーニング中の幾何射影の不確かさ伝播関係をモデル化し、エンドツーエンドモデル学習の安定性と効率を向上させる。 (2). 3D検出結果の品質を示す信頼性の高い信頼性に導出することができ、より信頼性の高い検出推測を可能にする。実験により,提案手法は画像ベースモノクロ3次元検出におけるSOTA性能を得るだけでなく,簡易なフレームワークによる有効性も示す。 Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected into the projected depth. It leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) by modeling geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1). It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning. (2). It can be derived to a highly reliable confidence to indicate the quality of the 3D detection result, enabling more reliable detection inference. 
Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superior efficacy with a simplified framework.	翻訳日:2023-10-25 19:49:30 公開日:2023-10-24
# Nkoの機械翻訳:ツール、コーパス、ベースライン結果 Machine Translation for Nko: Tools, Corpora and Baseline Results ( http://arxiv.org/abs/2310.15612v1 ) ライセンス: Link先を確認	Moussa Koulako Bala Doumbouya, Baba Mamadi Diané, Solo Farabado Cissé, Djibrila Diané, Abdoulaye Sow, Séré Moussa Doumbouya, Daouda Bangoura, Fodé Moriba Bayo, Ibrahima Sory 2. Condé, Kalo Mory Diané, Chris Piech, Christopher Manning	(参考訳) 現在、複数の西アフリカ諸国で何千万人もの人々が話している言語であるNkoの実用的な機械翻訳システムは存在しない。この問題に対処するために,現在十分に大きな並列テキストコーパスを持っていないNkoや他の言語向けの機械翻訳システムの開発を目的とした,ツール,リソース,ベースラインの一連の結果を示す。 (1) Friallel: コピーエディット(校正)ベースのワークフローによる品質管理を取り入れた,新しい協調型並列テキストキュレーションソフトウェア。 (2) FLoRes-200コーパスとNLLB-Seedコーパスを、それぞれ204言語および40言語と並列な2,009件および6,193件の高品質なNko翻訳で拡張した。 (3) nicolingua-0005: 130,850の並列セグメントを持つ三言語・二言語コーパスと300万以上のNko単語を含む単言語コーパスのコレクション。 (4) ベースラインとなる二言語および多言語ニューラル機械翻訳の結果。最良のモデルはFLoRes-devtest上で英語-Nko chrF++スコア30.83を達成した。 Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed at the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available. (1) Friallel: A novel collaborative parallel text curation software that incorporates quality control through copyedit-based workflows. (2) Expansion of the FLoRes-200 and NLLB-Seed corpora with 2,009 and 6,193 high-quality Nko translations in parallel with 204 and 40 other languages, respectively. (3) nicolingua-0005: A collection of trilingual and bilingual corpora with 130,850 parallel segments and monolingual corpora containing over 3 million Nko words. (4) Baseline bilingual and multilingual neural machine translation results with the best model scoring 30.83 English-Nko chrF++ on FLoRes-devtest.	翻訳日:2023-10-25 19:49:08 公開日:2023-10-24
# Slisemapを使って物理データを解釈する Using Slisemap to interpret physical data ( http://arxiv.org/abs/2310.15610v1 ) ライセンス: Link先を確認	Lauri Seppäläinen, Anton Björklund, Vitus Besel and Kai Puolamäki	(参考訳) 多様体可視化技術は、物理科学における高次元データセットの可視化に一般的に用いられている。本稿では,最近導入されたSlisemapと呼ばれる多様体可視化法を,物理と化学のデータセットに適用する。 Slisemapは、多様体の可視化と説明可能な人工知能を組み合わせる。説明可能な人工知能は、ブラックボックス機械学習モデルと複雑なシミュレータの決定過程を調べるために使用される。 Slisemapでは、類似のローカル説明を持つデータ項目がグループ化されるような埋め込みが見つかる。従って、Slisemapは、ブラックボックスモデルのさまざまな振る舞いの概要を提供する。これにより、Slisemapは教師付き多様体可視化法となり、埋め込みのパターンは対象特性を反映する。本稿では,Slisemapを物理データ上でどのように利用し,評価できるかを示し,Slisemapがこれらのデータセットでトレーニングされた分類と回帰モデルに関する有意義な情報を見つけるのに有効であることを示す。 Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper, we apply a recently introduced manifold visualisation method, called Slisemap, to datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap, we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. In this paper, we show how Slisemap can be used and evaluated on physical data and that Slisemap is helpful in finding meaningful information about classification and regression models trained on these datasets.	翻訳日:2023-10-25 19:48:52 公開日:2023-10-24
# 限界テスト: 大規模言語モデルを用いたモバイルアプリクラッシュ検出のための不規則テキスト入力生成 Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model ( http://arxiv.org/abs/2310.15657v1 ) ライセンス: Link先を確認	Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang	(参考訳) モバイルアプリは私たちの日常生活のユビキタスな部分となり、ユーザはさまざまなサービスやユーティリティにアクセスできるようになる。テキスト入力は、ユーザとアプリケーションの間の重要な対話チャネルとして、検索クエリ、認証、メッセージングなどのコア機能において重要な役割を果たす。しかし、特定の特別なテキスト(例えばFont Sizeの-18)は、アプリをクラッシュさせ、アプリを完全テストするための多様な特異な入力を生成することが要求される。しかし、これは爆発ジレンマ、高文脈感度、複雑な制約関係の組み合わせによっても困難である。本稿では,LLMを利用してモバイルアプリのクラッシュ検出のための異常なテキスト入力を自動的に生成するInputBlasterを提案する。異常な入力生成問題を一連のテストジェネレータを生成するタスクとして定式化し、それぞれが同じ突然変異規則の下で異常なテキスト入力のバッチを生成する。詳しくは、インプットブラスターがllmを利用して、推論チェインとして機能する突然変異ルールと共にテストジェネレータを生成し、コンテキスト内学習スキーマを使用して、パフォーマンス向上の例を示す。 inputblasterは36のテキスト入力ウィジェットで評価され、31の人気のあるandroidアプリを含むキャッシュバグがあり、78%のバグ検出率を達成し、最高のベースラインよりも136%高い。また、自動GUIテストツールと統合し、Google Playの現実世界のアプリの37のクラッシュを検知します。 Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays an important role in core functionality such as search queries, authentication, messaging, etc. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, and generating diversified unusual inputs for fully testing the app is highly demanded. Nevertheless, this is also challenging due to the combination of explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster which leverages the LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. 
In detail, InputBlaster leverages the LLM to produce the test generators together with the mutation rules, which serve as the reasoning chain, and uses an in-context learning schema to provide the LLM with examples that boost performance. InputBlaster is evaluated on 36 text input widgets with crash bugs involving 31 popular Android apps, and the results show that it achieves a 78% bug detection rate, 136% higher than the best baseline. In addition, we integrate it with an automated GUI testing tool and detect 37 unseen crashes in real-world apps from Google Play.	翻訳日:2023-10-25 19:43:07 公開日:2023-10-24
# モメンタム勾配に基づくハイパーグラフニューラルネットワークの標的外攻撃 Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks ( http://arxiv.org/abs/2310.15656v1 ) ライセンス: Link先を確認	Yang Chen, Stjepan Picek, Zhonglin Ye, Zhaoyang Wang and Haixing Zhao	(参考訳) ハイパグラフニューラルネットワーク(HGNN)は,高次表現能力に優れたため,様々なハイパーグラフ関連タスクに適用されている。近年の研究では、ディープラーニングモデルは敵の攻撃に弱いことが示されている。グラフニューラルネットワーク(GNN)を対象とするグラフ敵攻撃の研究はほとんど行われておらず、HGNNに対する敵攻撃の研究はほとんど未解明である。本稿では,このギャップを低減しようと試みる。我々は、ノード機能の変更に焦点を当てた、未ターゲット攻撃のための新しいHGNN攻撃モデル、MGHGAを設計する。我々はHGNNのトレーニングの過程を考察し、ハイパーグラフモデリングの前に代理モデルを用いて攻撃を実装する。具体的には、MGHGAは2つの部分から構成される。我々は,特徴選択モジュールにおける攻撃ノード機能を選択するために運動量勾配機構を用いる。特徴修正モジュールでは、MGHGAを離散的かつ連続的なデータセットに適用するために、2つの特徴生成アプローチ(直接修正と符号勾配)を用いる。我々は,5つのベンチマークデータセットを用いて,ノードにおけるMGHGAの攻撃性能と視覚オブジェクト分類タスクを検証する。その結果,MGHGAはベースラインよりも平均2%向上した。 Hypergraph Neural Networks (HGNNs) have been successfully applied in various hypergraph-related tasks due to their excellent higher-order representation capabilities. Recent works have shown that deep learning models are vulnerable to adversarial attacks. Most studies on graph adversarial attacks have focused on Graph Neural Networks (GNNs), and the study of adversarial attacks on HGNNs remains largely unexplored. In this paper, we try to reduce this gap. We design a new HGNNs attack model for the untargeted attack, namely MGHGA, which focuses on modifying node features. We consider the process of HGNNs training and use a surrogate model to implement the attack before hypergraph modeling. Specifically, MGHGA consists of two parts: feature selection and feature modification. We use a momentum gradient mechanism to choose the attack node features in the feature selection module. In the feature modification module, we use two feature generation approaches (direct modification and sign gradient) to enable MGHGA to be employed on discrete and continuous datasets. 
We conduct extensive experiments on five benchmark datasets to validate the attack performance of MGHGA in node and visual object classification tasks. The results show that MGHGA improves performance by an average of 2% compared to the baselines.	翻訳日:2023-10-25 19:42:41 公開日:2023-10-24
# 軽量CNNネットワークによる光流れの輝度整合性の破れ Breaking of brightness consistency in optical flow with a lightweight CNN network ( http://arxiv.org/abs/2310.15655v1 ) ライセンス: Link先を確認	Yicheng Lin, Shuo Wang, Yunlong Jiang and Bin Han	(参考訳) スパース光フローは様々なコンピュータビジョンタスクで広く使われているが、輝度一貫性の仮定がハイダイナミックレンジ(HDR)環境での性能を制限している。本研究では,照明変化に頑健な畳み込み特徴と強い不変性を持つコーナーを抽出するために,軽量ネットワークを用いる。光フロー法における典型的な輝度の整合性を畳み込み特徴の整合性に置き換えることで、照明に頑健なハイブリッド光フロー法が得られる。提案するネットワークは,4つの畳み込み層のみを使用して特徴マップとスコアマップを同時に抽出するため,商用CPU上で190 FPSで動作する。浅層ネットワークを直接訓練することは難しいため、深層ネットワークは信頼性マップを計算してそれを支援するように設計されている。両ネットワークでエンドツーエンドの教師なしトレーニングモードが使用される。提案手法の有効性を検証するため, 動的照明下でのコーナーリピータビリティとマッチング性能を元の光フロー法と比較した。さらに、VINS-Monoの光フロー法を置き換えることにより、より正確な視覚慣性システムを構築する。パブリックなHDRデータセットでは、並進誤差を93%削減する。コードはhttps://github.com/linyicheng1/LET-NETで公開されている。 Sparse optical flow is widely used in various computer vision tasks; however, its brightness-consistency assumption limits its performance in High Dynamic Range (HDR) environments. In this work, a lightweight network is used to extract illumination-robust convolutional features and corners with strong invariance. Replacing the typical brightness consistency of the optical flow method with convolutional feature consistency yields a light-robust hybrid optical flow method. The proposed network runs at 190 FPS on a commercial CPU because it uses only four convolutional layers to extract feature maps and score maps simultaneously. Since the shallow network is difficult to train directly, a deep network is designed to compute a reliability map that helps it. An end-to-end unsupervised training mode is used for both networks. To validate the proposed method, we compare its corner repeatability and matching performance with those of the original optical flow method under dynamic illumination. In addition, a more accurate visual-inertial system is constructed by replacing the optical flow method in VINS-Mono. On a public HDR dataset, it reduces translation errors by 93%. The code is publicly available at https://github.com/linyicheng1/LET-NET.	
翻訳日:2023-10-25 19:42:21 公開日:2023-10-24
# llms生成コンテンツの検出に関する調査研究 A Survey on Detection of LLMs-Generated Content ( http://arxiv.org/abs/2310.15654v1 ) ライセンス: Link先を確認	Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng	(参考訳) ChatGPTのような先進的な大規模言語モデル(LLM)の急成長は、メディア、サイバーセキュリティ、公開談話、教育など、さまざまな分野に影響を及ぼす合成コンテンツ生成の増加につながっている。そのため,LSMの生成する内容を検出する能力は重要視されている。我々は,既存の検出戦略とベンチマークの詳細な概要を提供し,それらの相違点を精査し,この分野の重要な課題と展望を特定し,検出精度を高めるためにより適応的で堅牢なモデルを提案する。また,LSMの急速な機能向上に対応するため,様々な攻撃に対して多面的アプローチの必要性を示唆する。我々の知る限り、この研究はLLMの時代の検出に関する最初の総合的な調査である。我々は,LLMが生成するコンテンツ検出の現在の状況について広く理解し,合成コンテンツに支配される時代において,デジタル情報の完全性を維持しようと努力する研究者や実践者に対して,ガイダンスを提供することを期待している。関連論文の要約はhttps://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.gitで一貫して更新される。 The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. 
The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.	翻訳日:2023-10-25 19:42:02 公開日:2023-10-24
# メタ学習によるグラフ上の知覚的公正攻撃 Deceptive Fairness Attacks on Graphs via Meta Learning ( http://arxiv.org/abs/2310.15653v1 ) ライセンス: Link先を確認	Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong	(参考訳) グラフ学習モデルにおいて、どのようにして有害な攻撃を達成し、偏見を欺いて悪化させることができるのか? 本稿では,二段階最適化問題を通じてこの問題に答え,FATEというメタ学習ベースのフレームワークを提案する。 FATEは、様々な公正定義やグラフ学習モデル、操作操作の任意の選択に関して広く適用できる。さらに、グラフニューラルネットワーク上での統計的パリティと個別の公正性を攻撃するためにFATEをインスタンス化する。半教師付きノード分類のタスクにおいて,実世界のデータセットに対する広範な実験評価を行う。実験の結果,下流タスクの実用性を維持しつつ,公平性を考慮したグラフニューラルネットワークのバイアスを増大させる可能性が示唆された。本稿では、公正グラフ学習の対角的堅牢性に関する洞察を提供し、将来の研究における堅牢かつ公正なグラフ学習の設計に光を当てることを望む。 We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.	翻訳日:2023-10-25 19:41:42 公開日:2023-10-24
# 効率的な事前学習音声モデルとしての動的畳み込みニューラルネットワーク Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models ( http://arxiv.org/abs/2310.15648v1 ) ライセンス: Link先を確認	Florian Schmid, Khaled Koutini, Gerhard Widmer	(参考訳) audiosetのような大規模なオーディオデータセットの導入は、トランスフォーマーがオーディオドメインを克服し、cnnを最先端のニューラルネットワークアーキテクチャとして多くのタスクで置き換える手段となった。 Audio Spectrogram Transformerは大規模なデータセットを活用するのに優れており、下流タスクで微調整されたときにCNNを超える強力な事前学習モデルを生成する。しかし、現在の一般的なAudio Spectrogram Transformersは、CNNと比較して計算複雑性の点で要求されている。近年, Transformer-to-CNN Knowledge Distillation を用いることで, 効率的な CNN は, 大規模データセット上での Transformer に追いつき, 性能も向上することが示された。本研究では, 動的非線形性, 動的畳み込み, および注意機構からなる動的cnnブロックを導入することにより, この研究範囲を拡大し, 効率的なcnnのキャパシティを向上させる。これらの動的CNNは,大規模オーディオセットの音声タグ付け作業において,性能・複雑度トレードオフとパラメータ効率の観点から,従来のCNNよりも優れていることを示す。さらに,導入した動的cnnは,ダウンストリームタスクの性能向上とスケールアップ,トランスフォーマー性能の向上,オーディオセットやダウンストリームタスクよりも優れたパフォーマンスを実現していることを示す。 The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. 
Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.	翻訳日:2023-10-25 19:41:28 公開日:2023-10-24
# マスク付き特徴アライメントを持つ平均教師DETR:ロバストドメイン適応検出トランスフレームワーク Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework ( http://arxiv.org/abs/2310.15646v1 ) ライセンス: Link先を確認	Weixi Weng, Chun Yuan	(参考訳) 非教師付きドメイン適応オブジェクト検出(UDAOD)による検出変換器(DETR)の研究は主に特徴アライメントに焦点を当てており、既存の手法は2つの種類に分けられる。 1段階の機能アライメント手法は、パフォーマンスの変動やトレーニングの停滞を容易に引き起こすことができる。平均教師に基づく2段階特徴アライメント手法は、事前訓練段階に続き、自己訓練段階と、信頼性の高い事前訓練モデルの獲得と一貫した性能向上の達成に直面する課題を含む。上述の手法では、ターゲットライクなドメインのような第3の関連ドメインをどのように活用して適応を支援するかはまだ検討されていない。これらの問題に対処するため、我々はMTMと呼ばれる2段階のフレームワーク、すなわちMasked Feature Alignmentを用いた平均教師-DETRを提案する。事前訓練段階では,画像スタイルの転送によって生成されたラベル付きターゲットライクな画像を用いて,性能変動を回避する。自己学習段階において,平均教師に基づく擬似ラベルによるラベル付き目標画像の活用と,学生モデルの一貫したパフォーマンス向上を実現するために,オブジェクトクエリ知識転送(oqkt)と呼ばれるモジュールを提案する。最も重要なことは,Masked Domain Query-based Feature Alignment (MDQFA) や Masked Token-wise Feature Alignment (MTWFA) といったマスク付き機能アライメント手法によって,トレーニングの停滞を防止し,事前訓練段階における堅牢な事前訓練モデルを実現するとともに,自己学習段階におけるモデルの目標性能を向上させることにある。 3つの難解なシナリオの実験と理論的解析はmtmの有効性を検証する。 Unsupervised domain adaptation object detection(UDAOD) research on Detection Transformer(DETR) mainly focuses on feature alignment and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage, each facing problems in obtaining reliable pretrained model and achieving consistent performance gains. Methods mentioned above have not yet explore how to utilize the third related domain such as target-like domain to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e. Mean Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. 
In the self-training stage, we leverage unlabeled target images via pseudo labels based on mean teacher and propose a module called Object Queries Knowledge Transfer (OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods, including Masked Domain Query-based Feature Alignment (MDQFA) and Masked Token-wise Feature Alignment (MTWFA), to alleviate domain shift in a more robust way; these not only prevent training stagnation and lead to a robust pretrained model in the pretraining stage, but also enhance the model's target performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.	翻訳日:2023-10-25 19:41:03 公開日:2023-10-24
# Droidをライトアップ! Android マルウェア検出におけるアプリケーション難読化に対する静的解析機能の有効性について Light up that Droid! On the Effectiveness of Static Analysis Features against App Obfuscation for Android Malware Detection ( http://arxiv.org/abs/2310.15645v1 ) ライセンス: Link先を確認	Borja Molina-Coronado, Antonio Ruggia, Usue Mori, Alessio Merlo, Alexander Mendiburu, Jose Miguel-Alonso	(参考訳) マルウェアの作者は、難読化を静的解析機能に基づいてマルウェア検出をバイパスする手段と見なしている。 Androidでは、多くのアンチマルウェア製品が単純なプログラム変換で容易に回避できることが確認されている。これらの作業とは対照的に、静的解析機能を活用したAndroid用のML検出提案も難読化耐性として提案されている。したがって、特定の難読化戦略やツールの使用が、静的解析機能に基づくandroid用のmlマルウェア検出器の妥当性のリスクの程度を決定する必要がある。本稿では,静的解析を用いて抽出した共通特徴に対する特定の難読化技術の影響を評価し,これらの特徴に依存するMLマルウェア検出装置の有効性を損なうのに十分重要な変化かどうかを判定する。実験結果から, 難読化技術は静的解析の全ての特徴を異なるツールで異なる程度に変化させることが示唆された。しかし,特定の特徴は,難読化が存在する場合でもMLマルウェア検出の有効性を保っている。これらの知見に基づいて,難読化対策に頑健なAndroid用MLマルウェア検出器を提案し,現状の最先端検知器よりも優れた性能を示す。 Malware authors have seen obfuscation as the mean to bypass malware detectors based on static analysis features. For Android, several studies have confirmed that many anti-malware products are easily evaded with simple program transformations. As opposed to these works, ML detection proposals for Android leveraging static analysis features have also been proposed as obfuscation-resilient. Therefore, it needs to be determined to what extent the use of a specific obfuscation strategy or tool poses a risk for the validity of ML malware detectors for Android based on static analysis features. To shed some light in this regard, in this article we assess the impact of specific obfuscation techniques on common features extracted using static analysis and determine whether the changes are significant enough to undermine the effectiveness of ML malware detectors that rely on these features. The experimental results suggest that obfuscation techniques affect all static analysis features to varying degrees across different tools. However, certain features retain their validity for ML malware detection even in the presence of obfuscation. 
Based on these findings, we propose an ML malware detector for Android that is robust against obfuscation and outperforms current state-of-the-art detectors.	翻訳日:2023-10-25 19:40:31 公開日:2023-10-24
# フィンランドにおけるICTインハウス調達の旅 : 法的枠組みと実践的課題の評価 Navigating ICT In-House Procurement in Finland: Evaluating Legal Frameworks and Practical Challenges ( http://arxiv.org/abs/2310.15643v1 ) ライセンス: Link先を確認	Reetta Ghezzi, Minnamaria Korhonen, Hannu Vilpponen, and Tommi Mikkonen	(参考訳) インハウス調達は公共調達の分野で物議を醸している問題である。簡単に言えば、このような調達はベンダーの公平かつ平等な扱いの特定の側面を度外視することを許容する。本稿では,フィンランド市町村におけるICTのインハウス調達に関する質的研究について述べる。半構造化インタビューは自治体の利害関係者からの洞察を集めるために行われた。グラウンデッド・セオリー・アプローチを用いたデータ分析は、フィンランドの自治体とそれに関連するインハウス組織の間の複雑なダイナミクスを示している。それでもなお、インハウス調達を管理する法的枠組みが依然として複雑で議論の的となっていることは明らかである。 In-house procurement is a controversial issue in the field of public procurement. Simply put, such procurement allows overlooking certain aspects of fair and equal treatment of vendors. This paper presents qualitative research on in-house ICT procurement within Finnish municipalities. Semi-structured interviews were conducted to gather insights from municipal stakeholders. Using a grounded theory approach, data analysis shows intricate dynamics between Finnish municipalities and in-house entities associated with them. Still, it is clear that the legal framework governing in-house procurement remains intricate and debated.	翻訳日:2023-10-25 19:40:14 公開日:2023-10-24
# GitBug-Actions:GitHubアクションで再現可能なバグフィックスベンチマークを構築する GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions ( http://arxiv.org/abs/2310.15642v1 ) ライセンス: Link先を確認	Nuno Saavedra, André Silva, Martin Monperrus	(参考訳) バグフィックスベンチマークは、自動プログラム修復(APR)やフォールトローカライゼーション(FL)など、ソフトウェア工学の様々なサブフィールドを進化させる上で基本的なものである。優れたベンチマークには、今日の技術と開発プラクティスを正確に反映する最近の例を含める必要があります。長期的に実行可能であるためには、ベンチマークは、例えば依存関係が利用できなくなることによって時間の経過とともに劣化することのないテストスイートを備えていなければならない。既存のベンチマークは両方の基準を満たさない。例えば、主要なJavaベンチマークの一つであるDefects4Jが最後に更新されたのは2020年である。さらに、既存のベンチマークの大半では、完全な再現性は無視されている。本稿では,GitBug-Actionsについて述べる。最新かつ完全に再現可能なバグフィックスを用いて,バグフィックスベンチマークを構築するための新しいツールである。 GitBug-Actionsは、最も人気のあるCIプラットフォームであるGitHub Actionsに依存して、バグフィックスを検出し、制御された再現可能な環境でCIパイプラインをスマートにローカルに実行する。私たちの知る限りでは、GitHub Actionsを使ってバグフィックスを収集するのは初めてです。ツールチェーンを示すために、GitBug-Actionsをデプロイして、さまざまなリポジトリから実行可能な、完全に再現可能なバグ修正を含む、概念実証のGoバグフィックスベンチマークを構築します。 GitBug-Actionsをデモするビデオは、https://youtu.be/aBWwa1sJYBsで公開されている。 Bug-fix benchmarks are fundamental in advancing various sub-fields of software engineering such as automatic program repair (APR) and fault localization (FL). A good benchmark must include recent examples that accurately reflect technologies and development practices of today. To be executable in the long term, a benchmark must feature test suites that do not degrade over time due to, for example, dependencies that are no longer available. Existing benchmarks fail to meet both criteria. For instance, Defects4J, one of the foremost Java benchmarks, last received an update in 2020. Moreover, full-reproducibility has been neglected by the majority of existing benchmarks. In this paper, we present GitBug-Actions: a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes. GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes and smartly locally execute the CI pipeline in a controlled and reproducible environment. 
To the best of our knowledge, we are the first to rely on GitHub Actions to collect bug-fixes. To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark containing executable, fully-reproducible bug-fixes from different repositories. A video demonstrating GitBug-Actions is available at: https://youtu.be/aBWwa1sJYBs.	翻訳日:2023-10-25 19:40:04 公開日:2023-10-24
# 引用論文からの知識集約による生物医学抽象型要約の改善 Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers ( http://arxiv.org/abs/2310.15684v1 ) ライセンス: Link先を確認	Chen Tang, Shun Wang, Tomas Goldsack and Chenghua Lin	(参考訳) 生物医学文献の抄録は、専門的な書体や、関連する文献の深い理解を必要とするバイオメディカル用語など、ドメイン固有の特徴を持っている。結果として、既存の言語モデルは、ドメイン固有の背景知識が欠如していることから、バイオメディカルの専門家が生み出したものと同等の技術的要約を生成するのに苦労する。本稿では,文献から引用された外部論文から知識を集約することにより,生物医学的抽象要約における言語モデルの性能を向上させることを目的とする。本稿では,引用論文からドメイン固有の知識を統合し,引用論文から論文の内容と関連知識の両方を活用することで要約をニューラルネットワークで生成する,新しい注目に基づく引用集約モデルを提案する。さらに,本研究の基盤となる大規模生物医学的要約データセットを構築し,公開する。広範な実験により,本モデルが最先端のアプローチを上回り,抽象的生物医学的テキスト要約の大幅な改善を達成していることが示された。 Abstracts derived from biomedical literature possess distinct domain-specific characteristics, including specialised writing styles and biomedical terminologies, which necessitate a deep understanding of the related literature. As a result, existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts, given the absence of domain-specific background knowledge. This paper aims to enhance the performance of language models in biomedical abstractive summarisation by aggregating knowledge from external papers cited within the source article. We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers, allowing neural networks to generate summaries by leveraging both the paper content and relevant knowledge from citation papers. Furthermore, we construct and release a large-scale biomedical summarisation dataset that serves as a foundation for our research. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.	翻訳日:2023-10-25 19:30:49 公開日:2023-10-24
# 集団作業における大規模言語モデルの利用状況と防止 Prevalence and prevention of large language model use in crowd work ( http://arxiv.org/abs/2310.15683v1 ) ライセンス: Link先を確認	Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West	(参考訳) 大規模言語モデル (LLM) の使用は, 群集労働者の間で広く普及しており, 目標緩和戦略は, LLM の使用を著しく削減するが, 排除しない。 LLMの使用に関して労働者が指示を受けていないテキスト要約タスクでは、LLMの使用頻度は30%程度と見積もられたが、LLMの使用を禁止し、コピーペーストを無効にすることで使用コストを高くすることで約半分削減された。 llmの使用は、(モデルではなく)人間の行動に関わる研究を害し、クラウドソースデータで訓練された将来のモデルを劣化させる可能性がある、高品質だが均質な反応をもたらす。同時に、llmの使用を防止することは、高品質な応答を得るのと相反する可能性がある。例えば、労働者にllmを使わないよう要求する場合、要約には必須情報を含むキーワードが少なかった。 llmが人気や能力を高め、利用に関する基準が変わるにつれ、私たちの見積もはおそらく変わるでしょう。しかし,LLMベースのツールとユーザの共同進化を理解することは,クラウドソーシングによる研究の妥当性を維持する鍵であり,広く普及する前に重要なベースラインを提供する。 We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. 
Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.	翻訳日:2023-10-25 19:30:32 公開日:2023-10-24
# 多腕バンディットの固定予算実数値組合せ純粋探索 Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit ( http://arxiv.org/abs/2310.15681v1 ) ライセンス: Link先を確認	Shintaro Nakamura and Masashi Sugiyama	(参考訳) 固定予算設定における多腕バンディットの実数値組合せ純粋探索について検討した。まず,動作クラスのサイズがアーム数に対して指数関数的に大きい場合でも,最善の動作を識別できる最初のアルゴリズムであるCombinatorial Successive Asign (CSA)アルゴリズムを導入する。 CSAアルゴリズムの誤差確率の上限は指数の対数係数までの下界と一致することを示す。次に、アクションクラスのサイズが多項式である場合には、Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR)アルゴリズムという別のアルゴリズムを導入し、それが最適であることを示し、下界に一致することを示す。最後に,提案手法を従来の手法と実験的に比較し,アルゴリズムの性能が向上したことを示す。 We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Asign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, which matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.	翻訳日:2023-10-25 19:30:08 公開日:2023-10-24
# マルチモーダル3次元シーン理解の最近の進歩:包括的調査と評価 Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation ( http://arxiv.org/abs/2310.15676v1 ) ライセンス: Link先を確認	Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang and Yang Yang	(参考訳) マルチモーダルな3Dシーン理解は、自律運転や人間とコンピュータのインタラクションなど、多くの分野で広く応用されているため、注目されている。従来の単一モードの3D理解と比較して、付加的なモダリティの導入は、シーン解釈の豊かさと精度を高めるだけでなく、より堅牢でレジリエントな理解を保証する。これは、3Dデータのみに依存することが不十分な環境において、特に重要になる。マルチカメラ画像(3D+2D)とテキスト記述(3D+言語)を統合するようなマルチモーダルな3D手法の開発が過去3年間に進んでいるが、包括的かつ詳細なレビューは特に欠落している。本稿では,このギャップを埋めるための最近の進歩を体系的に調査する。まず、様々な3次元マルチモーダルタスクを形式的に定義し、それらの固有の課題を要約する背景を紹介する。その後,既存の手法をモダリティやタスクに応じて徹底的に分類し,それぞれの強みや限界を探索する新しい分類法を提案する。さらに、いくつかのベンチマークデータセットに対する最近のアプローチと洞察に富んだ分析の比較結果も提供される。最後に,未解決問題について考察し,今後の研究への道筋について述べる。 Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing a background that formally defines various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. 
Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.	翻訳日:2023-10-25 19:29:53 公開日:2023-10-24
# シュワルツシルトブラックホール近傍の量子性 Quantumness near a Schwarzschild black hole ( http://arxiv.org/abs/2310.15675v1 ) ライセンス: Link先を確認	S. Haddadi, M. A. Yurischev, M. Y. Abd-Rabbou, M. Azizi, M. R. Pourkarimi, M. Ghominejad	(参考訳) 量子情報科学と相対性理論の融合は、ブラックホールに関連する情報の伝達を取り巻く謎を理解する新しい機会を与える。この目的のために、シュワルツシルトブラックホール近傍の量子度をデコヒーレンスの下で実用モデルで研究する。本論文で検討するシナリオは、平らな領域の定常粒子が周囲の粒子と相互作用し、別の粒子がシュワルツシルトブラックホールの事象の地平線付近で自由落下する、というものである。ホーキング放射とデコヒーレンスが研究中の系に与える影響を調べ、これらの効果が量子特性の生存を阻害するが、完全に破壊できないことを発見した。したがって、この研究の結果は、曲がりくねった時空フレームワークの中で動作している実システムの量子特性の理解に貴重な洞察を与える可能性がある。 The merging of quantum information science with the relativity theory presents novel opportunities for understanding the enigmas surrounding the transmission of information in relation to black holes. For this purpose, we study the quantumness near a Schwarzschild black hole in a practical model under decoherence. The scenario we consider in this paper is that a stationary particle in the flat region interacts with its surroundings while another particle experiences free fall in the vicinity of a Schwarzschild black hole's event horizon. We explore the impacts of Hawking radiation and decoherence on the system under investigation and find that these effects can limit the survival of quantum characteristics, but cannot destroy them completely. Hence, the results of this study possess the potential to yield valuable insights into the comprehension of the quantum properties of a real system operating within a curved space-time framework.	翻訳日:2023-10-25 19:29:31 公開日:2023-10-24
# 私の注意に基づくASRシステムはどのくらい必要か? How Much Context Does My Attention-Based ASR System Need? ( http://arxiv.org/abs/2310.15672v1 ) ライセンス: Link先を確認	Robert Flynn and Anton Ragni	(参考訳) 音声認識のタスクでは、訓練中の30秒以上の音響コンテキストの使用は珍しく、文献ではあまり調査されていない。本研究では,音声・言語モデルの学習/評価に使用されるシーケンス長のスケールが音声認識性能に与える影響について検討する。これらの実験では、約10万の擬似ラベル付きSpotifyポッドキャストのデータセットを使用し、コンテキストの長さは5秒から1時間である。長尺データセットEarnings-22とTedliumでのゼロショット評価は、約80秒の音響コンテキストでのトレーニングの利点を示し、限られたコンテキストベースラインから最大14.9%の相対的な改善を示している。さらに、完全長文ASRシステムのビームサーチにより、長文変換言語モデルとシステム組み合わせを行い、現在の最先端技術と競合する結果を得る。 For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon, and under-investigated in the literature. In this work, we examine the effect of scaling the sequence length used to train/evaluate (dense-attention based) acoustic and language models on speech recognition performance. For these experiments a dataset of roughly 100,000 pseudo-labelled Spotify podcasts is used, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations on long-format datasets Earnings-22 and Tedlium demonstrate a benefit from training with around 80 seconds of acoustic context, showing up to a 14.9% relative improvement from a limited context baseline. Furthermore, we perform a system combination with long-context transformer language models via beam search for a fully long-context ASR system, with results that are competitive with the current state-of-the-art.	翻訳日:2023-10-25 19:29:17 公開日:2023-10-24
# 3次元物体検出のための視覚中心多モードエキスパートの活用 Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection ( http://arxiv.org/abs/2310.15670v1 ) ライセンス: Link先を確認	Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li	(参考訳) 現在の研究は主に、lidarまたはマルチモーダルベース(expert)から転送される知識を通じて、カメラのみの3dオブジェクト検出器(apprentice)の精度向上に重点を置いている。しかし、LiDARとカメラの特徴のドメインギャップの存在は、時間融合の固有の非互換性と相まって、蒸留に基づく徒弟強化の有効性を著しく損なう。ユニモーダル蒸留の成功に触発されて、見習いに親しみやすい専門家モデルはカメラ機能に大きく依存する一方で、マルチモーダルモデルに匹敵する性能を保った。そこで本研究では, 見習いに親しみやすいマルチモーダルエキスパートと時間融合に親しむ蒸留監督を含む,カメラオンリーの見習いモデルを改善するためのフレームワークであるVCDを紹介する。マルチモーダルの専門家VCD-Eは、特徴格差を軽減するためにカメラオンリーの見習いと同一の構造を採用し、LiDAR入力を3Dシーンの再構成に先立って深度として活用し、他の異種マルチモーダル専門家と同等の性能を達成する。また、シーン内の各対象に対する運動誤認を個別に補正する目的で、細粒度軌道ベースの蒸留モジュールを導入する。これらの改善により、我々のカメラオンリーの見習いVCD-Aは、63.1%のNDSスコアでnuScenesに新しい最先端技術を設定する。 Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the presence of the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, an apprentice-friendly expert model would predominantly rely on camera features, while still achieving comparable performance to multi-modal models. To this end, we introduce VCD, a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts an identical structure as that of the camera-only apprentice in order to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving the performance on par with other heterogeneous multi-modal experts. 
Additionally, a fine-grained trajectory-based distillation module is introduced with the purpose of individually rectifying the motion misalignment for each object in the scene. With those improvements, our camera-only apprentice VCD-A sets new state-of-the-art on nuScenes with a score of 63.1% NDS.	翻訳日:2023-10-25 19:29:02 公開日:2023-10-24
# 数学用語問題に対する表現構文情報ボトルネック Expression Syntax Information Bottleneck for Math Word Problems ( http://arxiv.org/abs/2310.15664v1 ) ライセンス: Link先を確認	Jing Xiong, Chengming Li, Min Yang, Xiping Hu, Bin Hu	(参考訳) Math Word Problems (MWP) は、テキストで与えられた数学的問題を自動的に解くことを目的としている。以前の研究では、モデルがより包括的な機能を得るために、元のテキストで追加情報を取得するために複雑なモデルを設計する傾向がある。本稿では,我々の注意を反対方向に向け,MWPの擬似相関を含む冗長な特徴を捨てる方法について検討する。そこで本研究では,表現構文木の本質的特徴を抽出し,構文関連性のない特徴を含む潜在固有冗長性をフィルタリングする,変分情報ボトルネックに基づくMWPのための表現構文情報ボトルネック手法(ESIB)を設計する。 ESIBの鍵となる考え方は、複数のモデルに対して、同じ問題の異なる問題表現に対する同じ式構文木を相互学習により予測し、表現構文木の一貫性のある情報をキャプチャし、潜時固有の冗長性を捨てることである。モデルの一般化能力を向上し、より多様な表現を生成するために、潜在空間における表現構文情報にもっと依存するようモデルに促すために、自己蒸留損失をデザインする。 2つの大規模ベンチマークにおける実験結果から,我々のモデルが最先端の結果を達成するだけでなく,より多様なソリューションを生み出すことが示された。コードは利用可能です。 Math Word Problems (MWP) aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original text so as to enable the model to gain more comprehensive features. In this paper, we turn our attention in the opposite direction, and work on how to discard redundant features containing spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on variational information bottleneck, which extracts essential features of expression syntax tree while filtering latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different problem representations of the same problem by mutual learning so as to capture consistent information of expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss to encourage the model to rely more on the expression syntax information in the latent space. 
Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.	翻訳日:2023-10-25 19:28:37 公開日:2023-10-24
# 電気負荷予測における対話型一般化付加モデルとその応用 Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting ( http://arxiv.org/abs/2310.15662v1 ) ライセンス: Link先を確認	Linxiao Yang and Rui Ren and Xinyue Gu and Liang Sun	(参考訳) 電力負荷予測は電力システムの計画と管理に欠かせない要素である。不正確な負荷予測は、停電やエネルギーの浪費につながる可能性がある。正確な電力負荷予測は、ホリデーシーズンの負荷予測や極端気象条件下での負荷予測など、限られたデータやデータがない場合に困難である。高リスク意思決定は通常負荷予測の後に行われるため、モデル解釈は予測モデルの導入に不可欠である。本稿では,電力産業において,解釈可能なだけでなく,特定の分野の知識を取り入れた対話型GAMを提案する。このブースティングに基づくGAMは、断片線形関数を活用し、効率的なアルゴリズムによって学習することができる。パブリックベンチマークと電気データの両方において、我々の対話型GAMは現在の最先端の手法よりも優れており、極端な気象事象の場合に優れた一般化能力を示す。私たちはインタラクティブなGAMをベースとしたユーザフレンドリなWebベースのツールをローンチし、電気予測のための統合AIプラットフォームであるeForecaster製品にすでに組み込んでいます。 Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting in holiday, or under extreme weather conditions. As high-stakes decision-making usually follows after load forecasting, model interpretability is crucial for the adoption of forecasting models. In this paper, we propose an interactive GAM which is not only interpretable but also can incorporate specific domain knowledge in electric power industry for improved performance. This boosting-based GAM leverages piecewise linear functions and can be learned through our efficient algorithm. In both public benchmark and electricity datasets, our interactive GAM outperforms current state-of-the-art methods and demonstrates good generalization ability in the cases of extreme weather events. We launched a user-friendly web-based tool based on interactive GAM and already incorporated it into our eForecaster product, a unified AI platform for electricity forecasting.	翻訳日:2023-10-25 19:28:15 公開日:2023-10-24
# 地域制御型スタイル転送 Region-controlled Style Transfer ( http://arxiv.org/abs/2310.15658v1 ) ライセンス: Link先を確認	Junjie Kang, Jinsong Wu, Shiqi Jiang	(参考訳) 画像スタイル転送は計算ビジョンにおいて難しい課題である。既存のアルゴリズムは、ニューラルネットワークの特徴層を制御することによって、スタイルイメージの色とテクスチャを転送する。しかし、コンテンツ画像の異なる領域におけるテクスチャの強さを制御できない。そこで本研究では,異なる領域のスタイル強度を制約するためにロス関数を用いたトレーニング手法を提案する。本手法は,スタイル画像とコンテンツ画像の勾配関係に基づいて,異なる領域におけるスタイル特徴の伝達強度を導出する。さらに,その意味的関係を維持しつつ,コンテンツの特徴をスタイル的特徴に線形変換する特徴融合手法を提案する。広範な実験により,提案手法の有効性が実証された。 Image style transfer is a challenging task in computational vision. Existing algorithms transfer the color and texture of style images by controlling the neural network's feature layers. However, they fail to control the strength of textures in different regions of the content image. To address this issue, we propose a training method that uses a loss function to constrain the style intensity in different regions. This method guides the transfer strength of style features in different regions based on the gradient relationship between style and content images. Additionally, we introduce a novel feature fusion method that linearly transforms content features to resemble style features while preserving their semantic relationships. Extensive experiments have demonstrated the effectiveness of our proposed approach.	翻訳日:2023-10-25 19:27:58 公開日:2023-10-24
# GNeSF: 一般化可能なニューラルセマンティックフィールド GNeSF: Generalizable Neural Semantic Fields ( http://arxiv.org/abs/2310.15712v1 ) ライセンス: Link先を確認	Hanlin Chen, Chen Li, Mengqi Guo, Zhiwen Yan, Gim Hee Lee	(参考訳) 神経的暗黙的表現に基づく3次元シーンセグメンテーションが最近登場し,2次元監督によるトレーニングのみを活用している。しかし、既存のアプローチでは推論中に新しいシーンへの一般化を禁止した高価なシーンごとの最適化が必要である。この問題を回避するために,暗黙表現に基づく一般化可能な3次元セグメンテーションフレームワークを提案する。具体的には,空間情報のみではなく多視点画像特徴と意味マップを入力とすることで,シーン固有の幾何学的・意味的情報への過度な適合を避ける。本稿では,各3次元点の異なる視点から2次元意味情報を集約するソフト投票機構を提案する。画像の特徴に加えて,我々のフレームワークでは,投票結果を予測するために,ビュー差情報も符号化されている。直感的には、近くのビューからのセマンティックな情報は、遠くのビューよりも貢献できる。さらに、可視性モジュールは、隠されたビューから有害情報を検出し、フィルタリングするように設計されている。提案手法の汎用性により,意味マップを合成したり,2次元意味的監督だけで新規シーンの3次元意味セグメンテーションを行うことができる。実験結果から,本手法はシーン特異的アプローチと同等の性能を示した。さらに重要なことは、我々のアプローチは2Dアノテーションだけで既存の強力な監督ベースのアプローチより優れていることです。ソースコードはhttps://github.com/HLinChen/GNeSFで入手できます。 3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still require expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Specifically, our framework takes in multi-view image features and semantic maps as the inputs instead of only spatial information to avoid overfitting to scene-specific geometric and semantic information. We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point. In addition to the image features, view difference information is also encoded in our framework to predict the voting scores. Intuitively, this allows the semantic information from nearby views to contribute more compared to distant ones. Furthermore, a visibility module is also designed to detect and filter out detrimental information from occluded views. 
Due to the generalizability of our proposed method, we can synthesize semantic maps or conduct 3D semantic segmentation for novel scenes with solely 2D semantic supervision. Experimental results show that our approach achieves comparable performance with scene-specific approaches. More importantly, our approach can even outperform existing strong supervision-based approaches with only 2D annotations. Our source code is available at: https://github.com/HLinChen/GNeSF.	翻訳日:2023-10-25 19:22:35 公開日:2023-10-24
# 観察変数のグルーピングによって識別可能な因果表現学習 Causal Representation Learning Made Identifiable by Grouping of Observational Variables ( http://arxiv.org/abs/2310.15709v1 ) ライセンス: Link先を確認	Hiroshi Morioka, Aapo Hyvärinen	(参考訳) 現在注目されているトピックはCausal Representation Learning (CRL)で、その目標はデータ駆動方式で隠れた特徴の因果モデルを学ぶことである。残念なことにCRLは、表現学習と因果発見という悪名高い2つの不良設定問題の組み合わせであるため、深刻な不良設定問題である。しかし,一意解が保証される実用的識別可能性条件の発見は,その実用性に不可欠である。これまでのアプローチのほとんどは、時間的因果性(temporal causality)や監督や介入の存在といった潜在因果メカニズムの仮定に基づいており、実際の応用では制約が強すぎる場合がある。ここでは,時間構造や介入,弱い監督を必要としない,新しい弱い制約に基づく識別可能性を示す。このアプローチは、観測混合が観測変数の適切なグループ化を示すと仮定している。また,モデルに整合した新たな自己教師付き推定フレームワークを提案し,その統計的整合性を証明し,最先端のベースラインに比べて優れたCRL性能を実験的に示す。さらに、潜在交絡因子と因果サイクルに対する頑健性を示す。 A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it is a combination of the two notoriously ill-posed problems of representation learning and causal discovery. Yet, finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most approaches so far have been based on assumptions on the latent causal mechanisms, such as temporal causality, or existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which require no temporal structure, intervention, or weak supervision. The approach is based on assuming that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performance compared to the state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.	翻訳日:2023-10-25 19:22:11 公開日:2023-10-24
# 深層強化学習を用いた多種多様なスケジューリングポリシーの作成による大規模フレキシブルジョブショップスケジューリングインスタンスの解法 Solving large flexible job shop scheduling instances by generating a diverse set of scheduling policies with deep reinforcement learning ( http://arxiv.org/abs/2310.15706v1 ) ライセンス: Link先を確認	Imanol Echeverria, Maialen Murua, Roberto Santana	(参考訳) フレキシブルなジョブショップスケジューリング問題(fjssp)は文献で広く研究されており、ヒューリスティック、精密、メタヒューリスティックな手法で複数のアプローチが提案されている。しかし、業界がリアルタイムでディスラプティブなイベントに応答できるという要求は、数秒以内に新しいスケジュールを生成する必要性を生んでいる。この制約の下では、品質が向上してもスケジュールを生成することができるのはディスパッチルール(DR)のみである。この結果を改善するため、fjsspをマルコフ決定プロセス(mdp)としてモデル化し、強化学習を用いて機械に操作を割り当てる最適解を生成するポリシーを作成するための最近の手法が提案されている。それでも、特に現実のシナリオで一般的な大きなJSSPインスタンスでは、改善の余地は残っている。そこで本研究では,FJSSPの大規模インスタンスを堅牢に解決する手法を提案する。そこで本稿では,グラフニューラルネットワークを用いてFJSSPをMDPとしてモデル化する手法を提案する。また、推論をより堅牢にする方法として、並列化可能なスケジューリングポリシーの多様なセットを生成し、DRを使って制限する2つの方法を提案する。提案手法は,より大規模なFJSSPインスタンス上での他の3つの深層強化学習手法よりも,分散ルールよりも優れ,より優れた結果が得られることがわかった。 The Flexible Job Shop Scheduling Problem (FJSSP) has been extensively studied in the literature, and multiple approaches have been proposed within the heuristic, exact, and metaheuristic methods. However, the industry's demand to be able to respond in real-time to disruptive events has generated the necessity to be able to generate new schedules within a few seconds. Among these methods, under this constraint, only dispatching rules (DRs) are capable of generating schedules, even though their quality can be improved. To improve the results, recent methods have been proposed for modeling the FJSSP as a Markov Decision Process (MDP) and employing reinforcement learning to create a policy that generates an optimal solution assigning operations to machines. Nonetheless, there is still room for improvement, particularly in the larger FJSSP instances which are common in real-world scenarios. Therefore, the objective of this paper is to propose a method capable of robustly solving large instances of the FJSSP. 
To achieve this, we propose a novel way of modeling the FJSSP as an MDP using graph neural networks. We also present two methods to make inference more robust: generating a diverse set of scheduling policies that can be parallelized and limiting them using DRs. We have tested our approach on synthetically generated instances and various public benchmarks and found that our approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.	翻訳日:2023-10-25 19:21:52 公開日:2023-10-24
# 無線ネットワークにおける情報正確性と鮮度のための学習型スケジューリング Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks ( http://arxiv.org/abs/2310.15705v1 ) ライセンス: Link先を確認	Hitesh Gudwani	(参考訳) 我々は、複数のソース、単一の通信チャネル、単一の監視ステーションからなるシステムを考える。各ソースは、精度の異なる時間変動量を測定し、そのうちの1つがチャネル経由で監視ステーションに更新を送信する。それぞれの通信が成功する確率は、更新を送信するためにスケジュールされたソースの関数である。正確な測定の確率と全てのソースの送信が成功する確率の両方がスケジューラに不明である。関心のある指標は、宛先が受信した最終更新の精度と、システムの情報鮮度(Age-of-Information, AoI)に依存する、システムが受け取る報酬である。我々は,各ソースをアームとみなすマルチアームバンディット問題の一変種としてスケジューリング問題をモデル化した。システムモデルに合わせて適切に調整した ETC, $\epsilon$-greedy, UCB, TS という4つの標準バンディットポリシーの性能をシミュレーションによって比較する。さらに、これらのポリシーのうち ETC と $\epsilon$-greedy の2つについて解析的な保証を与える。最後に、いかなるポリシーでも達成可能な累積リグレットに対する下限を特徴づける。 We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with varying levels of accuracy and one of them sends its update to the monitoring station via the channel. The probability of success of each attempted communication is a function of the source scheduled for transmitting its update. Both the probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler. The metric of interest is the reward received by the system which depends on the accuracy of the last update received by the destination and the Age-of-Information (AoI) of the system. We model our scheduling problem as a variant of the multi-arm bandit problem with sources as different arms. We compare, via simulations, the performance of four standard bandit policies, namely ETC, $\epsilon$-greedy, UCB, and TS, suitably adjusted to our system model. In addition, we provide analytical guarantees for two of these policies, ETC and $\epsilon$-greedy. Finally, we characterize the lower bound on the cumulative regret achievable by any policy.	翻訳日:2023-10-25 19:21:24 公開日:2023-10-24
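As a rough illustration of the bandit formulation in the entry above, the following Python sketch runs an $\epsilon$-greedy policy in which each source is an arm. The Bernoulli reward per source, the function name `epsilon_greedy`, and all parameter values are assumptions made for illustration; the paper's actual reward combines update accuracy and AoI and is not reproduced here.

```python
import random


def epsilon_greedy(success_probs, horizon, epsilon=0.1, seed=0):
    """Schedule one source per slot with an epsilon-greedy rule.

    success_probs: hypothetical per-source reward probabilities, unknown
    to the scheduler and used only to simulate feedback.
    Returns (pull counts per source, empirical mean reward per source).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for t in range(horizon):
        if t < n_arms:
            arm = t  # play every source once to initialise the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random source
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit

        # Simulated feedback: reward 1 with the source's hidden probability.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average

    return counts, means


counts, means = epsilon_greedy([0.2, 0.5, 0.8], horizon=5000)
print(counts, [round(m, 2) for m in means])
```

A fixed `epsilon` keeps exploring forever, so its regret grows linearly in the horizon; a decaying schedule is the usual remedy, but which variant the paper analyses is not stated in the abstract.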
# 外部知識グラフを用いた生物医学的要約の強化 Enhancing Biomedical Lay Summarisation with External Knowledge Graphs ( http://arxiv.org/abs/2310.15702v1 ) ライセンス: Link先を確認	Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin	(参考訳) 自動レイサマリゼーションのこれまでのアプローチは、技術的聴衆(例えば研究者)のために書かれたことを考えると、すべての技術的概念を明示的に定義したり、すべての背景情報を一般の聴衆に関連付けることは不可能である。本稿では,既存のバイオメディカル・レイ・サマリゼーション・データセットであるeLifeに,関連するバイオメディカル概念に関する詳細な情報を含む,記事固有の知識グラフを付加することにより,この問題に対処する。自動評価と人的評価の両方を用いて,各手法がエンコーダ・デコーダ・モデルアーキテクチャの異なる領域を対象とし,階層化モデルに知識グラフを組み込む3つのアプローチの有効性を体系的に検討した。この結果から,グラフベースのドメイン知識の統合は,生成したテキストの可読性を大幅に向上し,技術的概念の理解を深めることによって,レイ・サマリゼーションのメリットを著しく向上させることが確認できた。 Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience. We address this issue by augmenting eLife, an existing biomedical lay summarisation dataset, with article-specific knowledge graphs, each containing detailed information on relevant biomedical concepts. Using both automatic and human evaluations, we systematically investigate the effectiveness of three different approaches for incorporating knowledge graphs within lay summarisation models, with each method targeting a distinct area of the encoder-decoder model architecture. Our results confirm that integrating graph-based domain knowledge can significantly benefit lay summarisation by substantially increasing the readability of generated text and improving the explanation of technical concepts.	翻訳日:2023-10-25 19:21:07 公開日:2023-10-24
# COPF: 最適な政策適合による継続的な学習 COPF: Continual Learning Human Preference through Optimal Policy Fitting ( http://arxiv.org/abs/2310.15694v1 ) ライセンス: Link先を確認	Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu	(参考訳) 人間フィードバックからの強化学習(rlhf)は、事前学習された言語モデル(lm)を改善するために一般的に用いられる手法であり、人間の好みに適合する能力を高める。しかしながら、現在のRLHFベースのLMは、新しいクエリやフィードバックが導入されるたびに完全なリトレーニングを必要とする。 lmsの再トレーニングは、データプライバシに関する懸念に加えて、膨大な時間と計算リソースを必要とするため、多くの現実の状況において実践上の困難をもたらす。この制限に対処するために,モンテカルロ法を用いて一連の最適政策を推定し,関数正規化と連続的にポリシーシーケンスを適合させる,COPF(Continuous Optimal Policy Fitting)と呼ばれる新しい手法を提案する。 COPFは単一の学習フェーズを含み、複雑な強化学習を必要としない。重要なのは、ラベルのないデータから学習するRLHFと共有することで、継続的な嗜好学習に柔軟になることだ。実験の結果, copfは, 異なるタスクやドメインにおける人間の嗜好と一貫性を持たせる上で, 強い連続学習(cl)ベースラインよりも優れていることがわかった。 The technique of Reinforcement Learning from Human Feedback (RLHF) is a commonly employed method to improve pre-trained Language Models (LM), enhancing their ability to conform to human preferences. Nevertheless, the current RLHF-based LMs necessitate full retraining each time novel queries or feedback are introduced, which becomes a challenging task because human preferences can vary between different domains or tasks. Retraining LMs poses practical difficulties in many real-world situations due to the significant time and computational resources required, along with concerns related to data privacy. To address this limitation, we propose a new method called Continual Optimal Policy Fitting (COPF), in which we estimate a series of optimal policies using the Monte Carlo method, and then continually fit the policy sequence with the function regularization. COPF involves a single learning phase and doesn't necessitate complex reinforcement learning. Importantly, it shares the capability with RLHF to learn from unlabeled data, making it flexible for continual preference learning. 
Our experimental results show that COPF outperforms strong continual learning (CL) baselines at consistently aligning with human preferences across different tasks and domains.	翻訳日:2023-10-25 19:20:48 公開日:2023-10-24
# 半教師付き学習によるレシピジャンルの自動分類 Towards Automated Recipe Genre Classification using Semi-Supervised Learning ( http://arxiv.org/abs/2310.15693v1 ) ライセンス: Link先を確認	Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan and Hasan Mahmud	(参考訳) 料理のレシピを共有することは、料理のアイデアを交換し、料理の準備の指示を与えるのに最適な方法である。しかし、適切なラベル付きデータがないため、オンラインの生レシピを適切な食品ジャンルに分類することは困難である。本研究では,それぞれのカテゴリにラベル付けされた200万の料理レシピを含む「Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset」というデータセットを提案する。このデータには、タイトル、NER、方向、拡張NERなどの様々な特徴と、パン屋、飲み物、非野菜、野菜、ファーストフード、穀物、食事、側面、融合などのジャンルを表す9つの異なるラベルが含まれている。提案されたパイプラインである3A2M+は、名前付きエンティティ認識(NER)リストのサイズを拡張して、2つのNER抽出ツールを使用してレシピの方向から、熱、時間、プロセスなどの名前のないエンティティに対処する。 3A2M+データセットは、分類、名前付きエンティティ認識、レシピ生成など、さまざまな困難なレシピ関連タスクに対する包括的なソリューションを提供する。さらに、従来の機械学習、ディープラーニング、事前学習言語モデルを用いてレシピをそれぞれのジャンルに分類し、全体の精度98.6\%を達成した。我々の調査は、タイトル機能はジャンルの分類においてより重要な役割を担ったことを示している。 Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the ``Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. 
The 3A2M+ dataset provides a comprehensive resource for various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we apply traditional machine learning, deep learning, and pre-trained language models to classify the recipes into their corresponding genres and achieve an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre.	翻訳日:2023-10-25 19:20:27 公開日:2023-10-24
# 補間・逆問題に対する高次残差ネットワークを用いた物理インフォームド Physics-Informed with Power-Enhanced Residual Network for Interpolation and Inverse Problems ( http://arxiv.org/abs/2310.15690v1 ) ライセンス: Link先を確認	Amir Noorizadegan, D.L. Young, Y.C. Hon, C.S. Chen	(参考訳) 本稿では,2次元および3次元設定におけるスムース関数と非スムース関数の補間能力を改善するために設計された,パワーエンハンシング残差ネットワークと呼ばれる新しいニューラルネットワーク構造を提案する。残差要素にべき乗項を追加することで、アーキテクチャはネットワークの表現力を高める。本研究は,ネットワーク深さ,幅,最適化手法について検討し,アーキテクチャの適応性と性能上の優位性を示す。結果は一貫して,提案するパワーエンハンシング残差ネットワークの,特に非スムース関数に対する優れた精度を示している。実世界の例でも、正確性、収束性、効率性の点で、普通のニューラルネットワークよりも優れていることが確認されている。この研究は、より深いネットワークの影響も調べている。さらに、提案アーキテクチャは逆バーガース方程式の解法にも適用され、優れた性能を示す。結論として、パワーエンハンシング残差ネットワークは、ニューラルネットワークの機能を大幅に強化する汎用的なソリューションを提供する。実装されたコードは https://github.com/CMMAi/ResNet_for_PINN で利用可能である。 This paper introduces a novel neural network structure called the Power-Enhancing residual network, designed to improve interpolation capabilities for both smooth and non-smooth functions in 2D and 3D settings. By adding power terms to residual elements, the architecture boosts the network's expressive power. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over a plain neural network in terms of accuracy, convergence, and efficiency. The study also looks at the impact of deeper networks. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities. The implemented code is available at: https://github.com/CMMAi/ResNet_for_PINN.	翻訳日:2023-10-25 19:19:55 公開日:2023-10-24
# 特許簡素化のための銀標準の作成 Creating a silver standard for patent simplification ( http://arxiv.org/abs/2310.15689v1 ) ライセンス: Link先を確認	Silvia Casola, Alberto Lavelli, Horacio Saggion	(参考訳) 特許は、発明を一方的に保護し、他方で技術知識を流通させることを目的とした法的文書である。彼らの複雑なスタイル ― 法的、技術的、極めてあいまいな言語 ― は、コンテンツが人間や機械へのアクセスを困難にし、情報検索コミュニティに重大な課題をもたらす。本稿では,リプレースにより特許文書を自動的に簡易化する手法を提案する。ドメイン内並列化データがないため,特許文の大規模銀標準を自動的に生成する手法を提案する。候補を得るには一般ドメインパラフレーズシステムを用いるが,このプロセスはエラーを起こしやすく,制御が困難である。そこで,本研究では,適切なフィルタとペアリングし,簡易化システムの訓練に有効なクリーンコーパスを構築する。合成銀コーパスの人間による評価は, 文法的, 適切であり, 簡単な文を含むことを示している。 Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other. Their complex style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines and poses substantial challenges to the information retrieval community. This paper proposes an approach to automatically simplify patent text through rephrasing. Since no in-domain parallel simplification data exist, we propose a method to automatically generate a large-scale silver standard for patent sentences. To obtain candidates, we use a general-domain paraphrasing system; however, the process is error-prone and difficult to control. Thus, we pair it with proper filters and construct a cleaner corpus that can successfully be used to train a simplification system. Human evaluation of the synthetic silver corpus shows that it is considered grammatical, adequate, and contains simple sentences.	翻訳日:2023-10-25 19:19:40 公開日:2023-10-24
# フィードバックに基づく物体外観学習による夜間熱赤外画像のカラー化 Nighttime Thermal Infrared Image Colorization with Feedback-based Object Appearance Learning ( http://arxiv.org/abs/2310.15688v1 ) ライセンス: Link先を確認	Fu-Ya Luo, Shu-Lin Liu, Yi-Jun Cao, Kai-Fu Yang, Chang-Yong Xie, Yong Liu, Yong-Jie Li	(参考訳) 悪環境(例えば全暗黒)における安定した撮像は、熱赤外カメラ(TIR)を夜景知覚の一般的な選択肢にしている。しかしながら、TIR画像の低コントラストと色度欠如は、人間の解釈とその後のRGBベースの視覚アルゴリズムの展開に有害である。したがって、それを対応する昼間色画像(NTIR2DC)に翻訳することで、夜間TIR画像を色づけすることは理にかなっている。 NTIR2DCタスクの目覚ましい進歩にもかかわらず、小さなオブジェクトクラスの翻訳性能をいかに向上させるかは未調査である。この問題に対処するために,フィードバックに基づくオブジェクト外観学習(FoalGAN)を取り入れた生成的敵ネットワークを提案する。具体的には、オブジェクト翻訳の文脈依存性を低減するために、オクルージョン対応ミックスアップモジュールとそれに対応する外観整合性損失を提案する。夜間の街路場面における小型物体の代表的な例として, 交通灯の外観損失をデザインすることにより, 交通灯のリアリズムを高める方法を示す。小型オブジェクトの出現学習をさらに改善するため,2つのフィードバック学習戦略を考案し,異なるサンプルの学習頻度を選択的に調整する。さらに,brnoデータセットのサブセットに対してピクセルレベルのアノテーションを提供し,複数の気象条件下でのntir画像理解の研究を容易にする。広範な実験により,提案手法は小物体の出現学習に有効であるだけでなく,ntir2dcタスクにおける意味保存とエッジ一貫性の観点から,他の画像翻訳手法よりも優れていることが示された。 Stable imaging in adverse environments (e.g., total darkness) makes thermal infrared (TIR) cameras a prevalent option for night scene perception. However, the low contrast and lack of chromaticity of TIR images are detrimental to human interpretation and subsequent deployment of RGB-based vision algorithms. Therefore, it makes sense to colorize the nighttime TIR images by translating them into the corresponding daytime color images (NTIR2DC). Despite the impressive progress made in the NTIR2DC task, how to improve the translation performance of small object classes is under-explored. To address this problem, we propose a generative adversarial network incorporating feedback-based object appearance learning (FoalGAN). Specifically, an occlusion-aware mixup module and corresponding appearance consistency loss are proposed to reduce the context dependence of object translation. 
As a representative example of small objects in nighttime street scenes, we illustrate how to enhance the realism of traffic lights by designing a traffic light appearance loss. To further improve the appearance learning of small objects, we devise a dual feedback learning strategy to selectively adjust the learning frequency of different samples. In addition, we provide pixel-level annotation for a subset of the Brno dataset, which can facilitate research on NTIR image understanding under multiple weather conditions. Extensive experiments show that the proposed FoalGAN is not only effective for appearance learning of small objects, but also outperforms other image translation methods in terms of semantic preservation and edge consistency for the NTIR2DC task.	翻訳日:2023-10-25 19:19:25 公開日:2023-10-24
# 単純なアンサンブルプロジェクタによる半教師あり学習性能の劣化・校正・改善 Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector ( http://arxiv.org/abs/2310.15764v1 ) ライセンス: Link先を確認	Khanh-Binh Nguyen	(参考訳) 半教師付き学習(SSL)に関する最近の研究は大きな成功を収めている。有望な性能にもかかわらず、現在の最先端の手法は、より多くのネットワークコンポーネントと追加のトレーニング手順を導入するコストを犠牲にして、ますます複雑な設計へと向かっている。本稿では,既存のコントラスト付き半教師付き学習フレームワークの性能向上を目的として,EPASS(Ensemble Projectors Aided for Semi-supervised Learning)という簡単な手法を提案する。 1つのプロジェクタからの学習された埋め込みが対照的な学習で使用されるメモリバンクに格納される標準的な方法とは異なり、EPASSは複数のプロジェクタからのアンサンブル埋め込みをメモリバンクに格納する。その結果、EPASSは一般化を改善し、特徴表現を強化し、性能を向上する。例えばEPASSは、SimMatchのラベル付きデータの100k/1\%/10\%しか使用せず、半教師付き学習の強いベースラインを39.47\%/31.39\%/24.70\%のトップ-1エラーレートで改善し、ImageNetデータセット上でCoMatchの40.24\%/32.64\%/25.90\%のトップ1エラーレートを達成する。これらの改善は、提案手法の一般的な有効性を証明するため、メソッド、ネットワークアーキテクチャ、データセット間で一貫性がある。コードはhttps://github.com/beandkay/EPASSで入手できる。 Recent studies on semi-supervised learning (SSL) have achieved great success. Despite their promising performance, current state-of-the-art methods tend toward increasingly complex designs at the cost of introducing more network components and additional training procedures. In this paper, we propose a simple method named Ensemble Projectors Aided for Semi-supervised Learning (EPASS), which focuses mainly on improving the learned embeddings to boost the performance of the existing contrastive joint-training semi-supervised learning frameworks. Unlike standard methods, where the learned embeddings from one projector are stored in memory banks to be used with contrastive learning, EPASS stores the ensemble embeddings from multiple projectors in memory banks. As a result, EPASS improves generalization, strengthens feature representation, and boosts performance. 
For instance, using only 100k/1%/10% of labeled data on the ImageNet dataset, EPASS improves strong semi-supervised baselines, reaching top-1 error rates of 39.47%/31.39%/24.70% with SimMatch and 40.24%/32.64%/25.90% with CoMatch. These improvements are consistent across methods, network architectures, and datasets, supporting the general effectiveness of the proposed method. Code is available at https://github.com/beandkay/EPASS.	翻訳日:2023-10-25 19:11:31 公開日:2023-10-24
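The idea of storing an ensemble embedding, rather than the output of a single projector, can be sketched in a few lines of Python. The abstract does not say how the projector outputs are combined, so this sketch assumes a plain mean followed by L2 normalisation; the linear projectors, their weights, and the list used as a memory bank are all hypothetical.

```python
import math


def project(weights, features):
    """Apply one linear projector (a list of weight rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def ensemble_embedding(projectors, features):
    """Average the embeddings of several projectors, then L2-normalise.

    Averaging is an assumption of this sketch, not a detail taken from the paper.
    """
    outputs = [project(p, features) for p in projectors]
    dim = len(outputs[0])
    mean = [sum(out[i] for out in outputs) / len(outputs) for i in range(dim)]
    return l2_normalize(mean)


# Two hypothetical 2x3 projectors and a memory bank holding ensemble embeddings.
projectors = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
memory_bank = [ensemble_embedding(projectors, [3.0, 4.0, 5.0])]
print(memory_bank[0])
```

In a contrastive setup the memory bank entries would then be compared with new embeddings by dot product, which is why the sketch normalises them to unit length.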
# RAPL:Few-Shotドキュメントレベル関係抽出のための関係認識型プロトタイプ学習手法 RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction ( http://arxiv.org/abs/2310.15743v1 ) ライセンス: Link先を確認	Shiao Meng, Xuming Hu, Aiwei Liu, Shu'ang Li, Fukun Ma, Yawen Yang, Lijie Wen	(参考訳) ラベル付きドキュメントがわずかしかない場合に、ドキュメント内のエンティティ間の意味的関係をどのように識別するか? 実世界のシナリオにおける広範囲なデータ不足問題に対処するためには,FSDLRE (Few-shot document-level relation extraction) が重要である。メトリクスベースのメタラーニングは、分類のためのクラスプロトタイプを構築するFSDLREに広く採用されている効果的なフレームワークである。しかし、既存の研究はしばしば正確な関係セマンティクスを持つクラスプロトタイプを得るのに苦労している。 1) 対象関係型のプロトタイプを構築するには、その関係を保持するすべてのエンティティペアの表現を集約する一方、これらのエンティティペアは他の関係も保持し、プロトタイプを妨害する可能性がある。 2) ターゲット関係型が異なるタスクではNOTA意味が異なることを無視して,汎用のNOTA(None-of-the-above)プロトタイプを全タスクにわたって使用する。本稿では,プロトタイプ表現の関係セマンティクスを強化するために,FSDLREにおける関係認識型プロトタイプ学習手法を提案する。本手法は,関係記述や現実的なNOTAインスタンスをガイダンスとして活用することにより,関係のプロトタイプを効果的に改良し,タスク固有のNOTAプロトタイプを生成する。 2つのFSDLREベンチマークの様々な設定において,提案手法が$F_1$で平均2.61%,最先端の手法より優れていることを示す。 How to identify semantic relations among entities in a document when only a few labeled documents are available? Few-shot document-level relation extraction (FSDLRE) is crucial for addressing the pervasive data scarcity problem in real-world scenarios. Metric-based meta-learning is an effective framework widely adopted for FSDLRE, which constructs class prototypes for classification. However, existing works often struggle to obtain class prototypes with accurate relational semantics: 1) To build a prototype for a target relation type, they aggregate the representations of all entity pairs holding that relation, while these entity pairs may also hold other relations, thus disturbing the prototype. 2) They use a set of generic NOTA (none-of-the-above) prototypes across all tasks, neglecting that the NOTA semantics differs in tasks with different target relation types. In this paper, we propose a relation-aware prototype learning method for FSDLRE to strengthen the relational semantics of prototype representations. 
By judiciously leveraging the relation descriptions and realistic NOTA instances as guidance, our method effectively refines the relation prototypes and generates task-specific NOTA prototypes. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by an average of 2.61% $F_1$ across various settings of two FSDLRE benchmarks.	翻訳日:2023-10-25 19:11:02 公開日:2023-10-24
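For readers unfamiliar with the metric-based meta-learning baseline that the entry above builds on, here is a minimal Python sketch of prototype classification: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. It is a generic prototypical-network-style illustration, not RAPL's relation-aware refinement or its task-specific NOTA generation; the relation labels and two-dimensional embeddings are invented for the example.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_prototypes(support):
    """support: dict mapping a relation label to a list of embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical support set: entity-pair embeddings grouped by relation label.
support = {
    "founded_by": [[1.0, 0.0], [0.8, 0.2]],
    "located_in": [[0.0, 1.0], [0.2, 0.8]],
    "NOTA": [[0.5, 0.5]],
}
prototypes = build_prototypes(support)
print(classify([0.9, 0.1], prototypes))
```

The first weakness named in the abstract is visible here: if an entity pair in the `founded_by` support set also holds another relation, its embedding pulls the mean away from the pure relation semantics.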
# 拡張テンプレートを用いた心電図インプテーションの拡散モデルの改善 Improving Diffusion Models for ECG Imputation with an Augmented Template Prior ( http://arxiv.org/abs/2310.15742v1 ) ライセンス: Link先を確認	Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic	(参考訳) 心電図(ecg)などの脈動信号は日常診療の一部として広範囲に収集される。しかし、ノイズの多い低品質な録音は、モバイルの健康システムで収集された信号にとって大きな問題であり、信号品質が低下し、ダウンストリームのタスクが自動化される。近年の研究では、確率的時系列モデルによるECGの欠落値の計算が検討されている。それにもかかわらず、決定論的モデルと比較すると、被験者と心拍関係の差異がトレーニング目標において明示的に考慮されないため、その性能は依然として限られている。本研究は,心電図の計算精度の向上と確率モデルによる予測精度の向上を目的として,様々な健康状態に先立って情報処理を行うテンプレート誘導型拡散確率モデルPulseDiffを提案する。具体的には 1) まず,被写体レベルの脈動テンプレートを,個人的特徴を捉えた欠落値の先取りとして,観察から抽出する。 2) 位置と振幅のビートレベルのばらつきを考慮した事前拡張のためのテンプレートにビートレベルの確率シフト項を追加する。 3) 被験者の健康状態を検討するための信頼度スコアを最終的に設計し, プライオリティが安全な方法で提供されることを保証した。 PTBXLデータセットを用いて実験したところ、PulseDiffはCSDIとSSSD$^{S4}$という2つの強力なDDPMベースラインモデルの性能を改善し、不確実性を管理しながらDDPMの生成を検証した。 SSSD$^{S4}$と組み合わせると、PulseDiff法は短区間欠落データに対する主要な決定論的モデルよりも優れ、長期間隔データ損失に匹敵する。 Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present an template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned an informative prior for a range of health conditions. 
Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior for the missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms to the template for prior augmentation, which accounts for the beat-level variance of positioning and amplitude; 3) we finally design a confidence score that considers the health condition of the subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal that PulseDiff improves the performance of two strong DDPM baseline models, CSDI and SSSD$^{S4}$, verifying that our method guides the generation of DDPMs while managing the uncertainty. When combined with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.	翻訳日:2023-10-25 19:10:40 公開日:2023-10-24
# プロトタイプ学習と特権情報を用いた解釈可能な医用画像分類 Interpretable Medical Image Classification using Prototype Learning and Privileged Information ( http://arxiv.org/abs/2310.15741v1 ) ライセンス: Link先を確認	Luisa Gallee, Meinrad Beer, and Michael Goetz	(参考訳) 解釈可能性はしばしば医療画像に必須の要件である。説明可能性とハイパフォーマンスの必要性に対処するには、高度なディープラーニング手法が必要である。本研究では,トレーニングプロセス中に利用可能な追加情報を使用して理解可能かつ強力なモデルを作成することができるかを検討する。本稿では,カプセルネットワークの利点,プロトタイプ学習,特権情報の利用を活用したproto-capsという革新的なソリューションを提案する。 LIDC-IDRIデータセット上で提案された解を評価することで,解釈可能性の向上と以上の最先端予測性能の併用が期待できる。説明可能なベースラインモデルと比較して,悪性度 (93.0 %) と肺結節の平均的特徴を予測できる精度は6 %以上向上した。同時に、モデルは、放射線科医が定義した属性の視覚的な検証を可能にするプロトタイプ表現によるケースベースの推論を提供する。 Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainability and high performance. In this work, we investigate whether additional information available during the training process can be used to create an understandable and powerful model. We propose an innovative solution called Proto-Caps that leverages the benefits of capsule networks, prototype learning and the use of privileged information. Evaluating the proposed solution on the LIDC-IDRI dataset shows that it combines increased interpretability with above state-of-the-art prediction performance. Compared to the explainable baseline model, our method achieves more than 6 % higher accuracy in predicting both malignancy (93.0 %) and mean characteristic features of lung nodules. Simultaneously, the model provides case-based reasoning with prototype representations that allow visual validation of radiologist-defined attributes.	翻訳日:2023-10-25 19:10:09 公開日:2023-10-24
# 量子モナドロジーは The Quantum Monadology ( http://arxiv.org/abs/2310.15735v1 ) ライセンス: Link先を確認	Hisham Sati and Urs Schreiber	(参考訳) 関数型プログラミング言語の現代的な理論は、計算サイドエフェクトとサイドコンテクストの符号化にモナドを用いる。量子コンピューティングは本質的に(量子測定のように)サイドエフェクトフルであり、(混合補助状態のように)コンテキスト依存であるにもかかわらず、このモナディックパラダイムは以前は量子プログラミング言語に当てはまらない。ここでは、Grothendieckの「操作のモチーフヨガ」によって誘導されるパラメータ化加群スペクトルのカテゴリ上の(co)モナドを、HC-加群に特化する現在の目的と、さらに集合付き複素ベクトル空間に対して体系的に解析する。量子計測結果によってパラメータ化された量子状態空間の集まりとしてインデックス付きベクトル空間を解釈すると、これらの(co)モナドは、古典的な制御と量子測定結果を古典的文脈に「動的に持ち上げる」機能を持つ関数型量子プログラミングのための包括的自然言語を提供する。我々は、最近構築された線形ホモトピー型理論(LHoTT)に埋め込み、パラメータ化されたモジュールスペクトルに解釈可能な、これらのモナディックな量子効果を表現するドメイン固有量子プログラミング言語(QS)を提案する。 LHoTTに組み込むと、線形量子型、古典的制御、動的リフト、そして特に位相効果を持つ、正式に検証可能な普遍量子プログラミングが実現される。 The modern theory of functional programming languages uses monads for encoding computational side-effects and side-contexts, beyond bare-bone program logic. Even though quantum computing is intrinsically side-effectful (as in quantum measurement) and context-dependent (as on mixed ancillary states), little of this monadic paradigm has previously been brought to bear on quantum programming languages. Here we systematically analyze the (co)monads on categories of parameterized module spectra which are induced by Grothendieck's "motivic yoga of operations" -- for the present purpose specialized to HC-modules and further to set-indexed complex vector spaces. Interpreting an indexed vector space as a collection of alternative possible quantum state spaces parameterized by quantum measurement results, as familiar from Proto-Quipper-semantics, we find that these (co)monads provide a comprehensive natural language for functional quantum programming with classical control and with "dynamic lifting" of quantum measurement results back into classical contexts. 
We close by indicating a domain-specific quantum programming language (QS) expressing these monadic quantum effects in transparent do-notation, embeddable into the recently constructed Linear Homotopy Type Theory (LHoTT) which interprets into parameterized module spectra. Once embedded into LHoTT, this should make for formally verifiable universal quantum programming with linear quantum types, classical control, dynamic lifting, and notably also with topological effects.	翻訳日:2023-10-25 19:09:53 公開日:2023-10-24
# 群集歩行者検出のためのクエリ適応型DETR Query-adaptive DETR for Crowded Pedestrian Detection ( http://arxiv.org/abs/2310.15725v1 ) ライセンス: Link先を確認	Feng Gao, Jiaxu Leng, Ji Gan, and Xinbo Gao	(参考訳) DEtection TRansformer (DETR)とその変種(DETRs)は,混雑した歩行者の検出に適用され,高い性能を実現している。しかし、混雑度の異なるシーンでは、DETRsのクエリの数を手動で調整しなければならず、そうでなければ、パフォーマンスは様々な程度に低下する。本稿では,現在の2つのクエリ生成手法をまず分析し,適応クエリ生成手法を設計するための4つのガイドラインを要約する。そこで我々は,この問題を軽減するためにランクベースの適応クエリ生成(RAQG)を提案する。具体的には、エンコーダが生成する最も信頼度の低い正のトレーニングサンプルのランクを予測できるランク予測ヘッドを設計する。予測ランクに基づいて,エンコーダが生成した粗い検出結果を適応的に選択してクエリを生成する適応的選択法を設計する。さらに、ランク予測ヘッドをより良く訓練するために、ソフトグラディエントL1損失を提案する。ソフトグラディエントL1損失の勾配は連続であり、損失値とモデルパラメータの更新値の関係をきめ細かく記述することができる。提案手法は単純かつ効果的であり,理論上は任意のDETRsに組み込んでクエリ適応性を実現できる。 CrowdHumanデータセットとCityPersonsデータセットの実験結果は,提案手法がDETRsに対するクエリを適応的に生成し,競合的な結果が得られることを示した。特に,CrowdHumanデータセットで最先端の39.4% MRを実現する。 DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, which achieved promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually, otherwise, the performance would degrade to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. 
Our method is simple and effective and can, in theory, be plugged into any DETR variant to make it query-adaptive. Experimental results on the CrowdHuman and CityPersons datasets show that our method adaptively generates queries for DETRs and achieves competitive results. In particular, it achieves a state-of-the-art 39.4% MR on the CrowdHuman dataset.	翻訳日:2023-10-25 19:09:23 公開日:2023-10-24
# variator: プラグアンドプレイ圧縮モジュールによる事前学習モデルの高速化 Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules ( http://arxiv.org/abs/2310.15724v1 ) ライセンス: Link先を確認	Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou	(参考訳) プレトレーニング言語モデル (PLM) は, NLPタスクにおいて顕著な結果を得たが, 膨大なパラメータサイズと計算コストを犠牲にしている。本稿では,プラグアンドプレイ圧縮プラグインによる計算効率を向上させるパラメータ効率向上手法であるVariatorを提案する。圧縮プラグインは、複数の隠れベクターを1つに圧縮することでシーケンス長を減らし、元のPLMでトレーニングするように設計されている。 1) 実世界のアプリケーションでは, 圧縮プラグインのプラグ・アンド・プレイ特性は, 現在のワークロードに基づいて異なる加速度比で異なる圧縮プラグインを動的に選択することができる。 2) 圧縮プラグインは、最小パラメータを持ついくつかのコンパクトニューラルネットワーク層で構成され、特にタスク数が増加するシナリオにおいて、ストレージとメモリオーバーヘッドを大幅に節約する。 Variatorの7つのデータセットに対する有効性を検証する。実験の結果,バリエータは0.9%の追加パラメータで計算コストを53%削減でき,性能は2%未満であった。さらに、モデルが数十億のパラメータにスケールすると、変数は未圧縮plmの強力な性能にマッチする。 Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. 
Experimental results show that Variator can save 53% of computational costs using only 0.9% additional parameters, with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.	翻訳日:2023-10-25 19:09:00 公開日:2023-10-24
# re-temp:時間知識グラフ完成のための関係認識時間表現学習 Re-Temp: Relation-Aware Temporal Representation Learning for Temporal Knowledge Graph Completion ( http://arxiv.org/abs/2310.15722v1 ) ライセンス: Link先を確認	Kunze Wang, Soyeon Caren Han, Josiah Poon	(参考訳) 補外設定の下での時間的知識グラフ補完(TKGC)は、行方不明な実体を将来から予測することを目的としており、現実の予測問題とより密接に一致する課題を呈している。既存の研究は主に、最近のスナップショットに適用されたシーケンシャルグラフニューラルネットワークを使用してエンティティと関係を符号化している。しかしながら、これらのアプローチは、クエリにおけるエンティティ関連の関係に従って無関係なスナップショットをスキップする能力を見落とし、明示的な時間的情報の重要性を無視する傾向にある。そこで本研究では,各タイムスタンプのあとのスキップ情報の流れを取り入れ,明示的な時間的埋め込みを入力として活用するRe-Temp(Relation-Aware Temporal Representation Learning)を提案する。さらに,情報漏洩を防止するため,二相前方伝播法を提案する。 6つのtkgc(extrapolation)データセットの評価を通じて、このモデルが最新の8つの最先端モデルを上回ることを実証した。 Temporal Knowledge Graph Completion (TKGC) under the extrapolation setting aims to predict the missing entity from a fact in the future, posing a challenge that aligns more closely with real-world prediction problems. Existing research mostly encodes entities and relations using sequential graph neural networks applied to recent snapshots. However, these approaches tend to overlook the ability to skip irrelevant snapshots according to entity-related relations in the query and disregard the importance of explicit temporal information. To address this, we propose our model, Re-Temp (Relation-Aware Temporal Representation Learning), which leverages explicit temporal embedding as input and incorporates skip information flow after each timestamp to skip unnecessary information for prediction. Additionally, we introduce a two-phase forward propagation method to prevent information leakage. Through the evaluation on six TKGC (extrapolation) datasets, we demonstrate that our model outperforms all eight recent state-of-the-art models by a significant margin.	翻訳日:2023-10-25 19:08:41 公開日:2023-10-24
# 脳エンコーディングのためのタスク固有言語モデルのアンサンブル Ensemble of Task-Specific Language Models for Brain Encoding ( http://arxiv.org/abs/2310.15720v1 ) ライセンス: Link先を確認	Sanjai Kumaran, Arvindh Arun, Jerrin John	(参考訳) 言語モデルは、脳内の特定の関心領域のfMRIアクティベーションをエンコードするのに十分なほど豊富であることが示されている。従来の研究は、脳の反応を予測するために人気のある自然言語処理タスクで学んだ表現から伝達学習を探索してきた。本研究では,10言語モデル(構文2と意味8)からアンサンブルモデルを作成することにより,エンコーダの性能を向上させる。アンサンブルメソッドを通じて、すべてのROIで、現在のベースラインを平均10%上回りました。 Language models have been shown to be rich enough to encode fMRI activations of certain Regions of Interest in our Brains. Previous works have explored transfer learning from representations learned for popular natural language processing tasks for predicting brain responses. In our work, we improve the performance of such encoders by creating an ensemble model out of 10 popular Language Models (2 syntactic and 8 semantic). We beat the current baselines by 10% on average across all ROIs through our ensembling methods.	翻訳日:2023-10-25 19:08:22 公開日:2023-10-24
# リカレントリニアトランス Recurrent Linear Transformers ( http://arxiv.org/abs/2310.15719v1 ) ライセンス: Link先を確認	Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White	(参考訳) トランスアーキテクチャにおける自己保持機構は、長距離依存をキャプチャできるため、シーケンシャルデータ処理におけるその有効性の背後にある主な理由である。しかし、トランスフォーマーの成功にもかかわらず、幅広い適用可能性を制限する2つの大きな欠点がある。(1)過去の情報を思い出すために、自己照査メカニズムは、コンテキストとして提供すべき履歴全体にアクセスする必要がある。 (2)変圧器の推論コストは高価である。本稿では,文脈非依存な推論コストを提供し,長距離依存性を効果的に活用し,実際にうまく機能するトランスフォーマ自着機構の再帰的な代替手法を提案する。上述した計算制限が変圧器の応用をほぼ不可能にしている強化学習問題に対する我々のアプローチを評価する。診断環境におけるアーキテクチャの異なるコンポーネントの影響を定量化し、2dおよび3dピクセルベースの部分観測可能な環境でのパフォーマンス向上を評価する。最先端アーキテクチャであるgtrxlと比較すると、このアプローチでの推論は少なくとも40%安価で、メモリ使用量を50%以上削減できる。提案手法はGTrXLと同等かそれ以上に動作し,GTrXLの性能が37%以上向上する。 The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) In order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) The inference cost in transformers is expensive. In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use in more than 50%. 
Our approach performs similarly to or better than GTrXL, improving on GTrXL's performance by more than 37% on harder tasks.	翻訳日:2023-10-25 19:08:14 公開日:2023-10-24
# ソーシャルメディア上でヘイトスピーチを共有する理由に関する因果理解 Causal Understanding of Why Users Share Hate Speech on Social Media ( http://arxiv.org/abs/2310.15772v1 ) ライセンス: Link先を確認	Dominique Geissler and Abdurahman Maarouf and Stefan Feuerriegel	(参考訳) ソーシャルメディア上でのヘイトスピーチは、個人の精神的および身体的幸福を脅かし、現実世界の暴力にさらに責任を負う。ヘイトスピーチの普及の背後にある重要なドライバーであり、なぜヘイトフルな投稿がバイラルに広まるのかは、リシェアされている。本稿では,ヘイトスピーチをユーザに再共有させるユーザ属性の包括的かつ因果的分析を行う。しかし, ソーシャルメディアデータからの因果推論は, 選択バイアスに悩まされる可能性が高く, 発話を嫌うユーザの脆弱性の違いにより, さらなる矛盾が生じているため, 困難である。我々は,新しい3段階の因果関係の枠組みを開発し,(1)対向性スコアを応用し,観察的ソーシャルメディアデータの偏りを解消する。 2) 音声を潜伏埋め込みとして嫌うユーザの潜伏脆弱性をモデル化するために, 偏りのある確率スコアを用いた。 3) ユーザ属性がヘイトスピーチを共有する確率に与える影響をモデル化し, ユーザのヘイトスピーチに対する潜在的な脆弱性を制御した。既存のベースラインと比較して、我々のフレームワークの特に強みは、非線形でありながら説明可能な因果効果をモデル化することである。フォロワーが減り、友達が減り、投稿数が減り、ヘイトスピーチが増えたことがわかりました。その代わり、若いアカウントはヘイトスピーチを減らしている。全体として、ヘイトスピーチの共有を促す要因を理解することは、有害な行動に関与するリスクのある個人を検知し、効果的な緩和戦略を設計するために重要である。 Hate speech on social media threatens the mental and physical well-being of individuals and is further responsible for real-world violence. An important driver behind the spread of hate speech and thus why hateful posts can go viral are reshares, yet little is known about why users reshare hate speech. In this paper, we present a comprehensive, causal analysis of the user attributes that make users reshare hate speech. However, causal inference from observational social media data is challenging, because such data likely suffer from selection bias, and there is further confounding due to differences in the vulnerability of users to hate speech. We develop a novel, three-step causal framework: (1) We debias the observational social media data by applying inverse propensity scoring. (2) We use the debiased propensity scores to model the latent vulnerability of users to hate speech as a latent embedding. (3) We model the causal effects of user attributes on users' probability of sharing hate speech, while controlling for the latent vulnerability of users to hate speech. 
Compared to existing baselines, a particular strength of our framework is that it models causal effects that are non-linear, yet still explainable. We find that users with fewer followers, fewer friends, and fewer posts share more hate speech. Younger accounts, by contrast, share less hate speech. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.	翻訳日:2023-10-25 19:02:58 公開日:2023-10-24
# 自己教師付きコントラスト学習による非ペアMRI超解像 Unpaired MRI Super Resolution with Self-Supervised Contrastive Learning ( http://arxiv.org/abs/2310.15767v1 ) ライセンス: Link先を確認	Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv	(参考訳) 高分解能(HR)磁気共鳴画像(MRI)は臨床における診断精度を高めるために重要である。それでも、MRIの解像度に固有の制限が適用範囲を制限している。深層学習に基づく画像超解像(SR)法は、追加コストなしでMRIの解像度を改善することを約束する。しかし、これらの手法はトレーニングのために相当数のHR MRI画像を必要とすることが多く、取得は困難である。本稿では、自己教師付きコントラスト学習を用いて、限られたトレーニングデータでSR性能を向上させる非ペアMRI SRアプローチを提案する。提案手法は,正および負のサンプル対を構築するために,本物のHR画像と合成SR画像の両方を活用し,識別的特徴の学習を容易にする。本研究で得られた実験結果は,利用可能なHR画像がわずかであっても,ピーク信号対雑音比と構造類似度指数が著しく向上することを示す。これらの結果は, 限られたトレーニングデータの課題に対処する本アプローチの可能性を示し, 臨床応用における高分解能MRIの進歩に寄与するものである。 High-resolution (HR) magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings. Nonetheless, the inherent limitation of MRI resolution restricts its widespread applicability. Deep learning-based image super-resolution (SR) methods exhibit promise in improving MRI resolution without additional cost. However, these methods frequently require a substantial number of HR MRI images for training, which can be challenging to acquire. In this paper, we propose an unpaired MRI SR approach that employs self-supervised contrastive learning to enhance SR performance with limited training data. Our approach leverages both authentic HR images and synthetically generated SR images to construct positive and negative sample pairs, thus facilitating the learning of discriminative features. Empirical results presented in this study underscore significant enhancements in the peak signal-to-noise ratio and structural similarity index, even when a paucity of HR images is available. These findings accentuate the potential of our approach in addressing the challenge of limited training data, thereby contributing to the advancement of high-resolution MRI in clinical applications.	翻訳日:2023-10-25 19:02:31 公開日:2023-10-24
# 条件付き精度調整によるロバスト学習 Robust Learning via Conditional Prevalence Adjustment ( http://arxiv.org/abs/2310.15766v1 ) ライセンス: Link先を確認	Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu	(参考訳) 医療データは、境界変数間の相関が広く変化する複数の場所から来ることが多い。深層学習モデルがこれらの不安定な相関を利用していれば、目に見えない場所で破滅的に失敗する可能性がある。不安定な相関に対処する多くの方法が提案されているが、それぞれに制限がある。例えば、敵対的なトレーニングはモデルに不安定な相関を完全に無視させるが、それによって予測性能が低下する可能性がある。他の方法(例えば不変リスク最小化[4])は、因果データ生成過程を仮定して、安定した関連性のみに依存するドメイン不変表現を学習しようとする(入力 X はクラスラベル Y を引き起こす)。したがって、それらはコンピュータビジョンに共通する反因果タスク(Y cause X)に対して効果がない。本稿では,CoPA(Conditional Prevalence-Adjustment)という手法を提案する。 CoPAは、(1)生成機構が安定であり、すなわちラベルYと共起変数(s)ZがXを発生し、(2)各サイトEにおける不安定な条件付き確率がXとYの不安定な相関を完全に考慮していると仮定する。我々の重要な観察は、共起変数は医療現場で定期的に記録され、例えば (Y, Z) サンプルのセット(X のサンプルは不要)から容易に有病率を推定できるということです。 CoPAは、たとえ1つのトレーニングサイトがあっても機能する。合成データと実データを用いた実験では,CoPAが競争ベースラインを上回っていることがわかった。 Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g. Invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y ). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) generation mechanism is stable, i.e. label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y . 
Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario which is often overlooked by existing methods. Our experiments on synthetic and real data show CoPA beating competitive baselines.	翻訳日:2023-10-25 19:02:14 公開日:2023-10-24
# 自由テキストフィードバックから学ぶ - 新しいデータセットを収集するか、既存のものを拡張するか? Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones? ( http://arxiv.org/abs/2310.15758v1 ) ライセンス: Link先を確認	Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych	(参考訳) 自由テキストの人間のフィードバックから学ぶことはダイアログシステムには不可欠だが、注釈付きデータは少なく、通常は会話型AIで知られている少数のエラータイプのみをカバーする。新しいデータセットをスクラッチから収集しアノテートするのではなく、最新の合成ダイアログ生成は、既存のダイアログデータセットを必要なアノテーションで拡張するために使用できる。しかし,このような取り組みの実現可能性を評価するためには,これらのデータセットに含まれる自由文フィードバックのタイプと頻度を知ることが重要である。本研究では,MultiWoZ,SGD,BABI,ペルソナチャット,ウィザーズ・オブ・ウィキペディア,セルフフィード・チャットボットの人間ボット分割など,多種多様なダイアログデータセットについて検討する。本稿では,対話における自由文人文フィードバックのアノテーションのための新しい分類法を導出し,gpt-2,llama,flan-t5の3つのsota言語生成モデルに対する応答生成におけるそのデータを含む影響について検討した。本研究は,エラータイプ,ユーザ応答型,それらの関係など,検討したデータセットの構成に関する新たな知見を提供する。 Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI. Instead of collecting and annotating new datasets from scratch, recent advances in synthetic dialog generation could be used to augment existing dialog datasets with the necessary annotations. However, to assess the feasibility of such an effort, it is important to know the types and frequency of free-text human feedback included in these datasets. In this work, we investigate this question for a variety of commonly used dialog datasets, including MultiWoZ, SGD, BABI, PersonaChat, Wizards-of-Wikipedia, and the human-bot split of the Self-Feeding Chatbot. Using our observations, we derive new taxonomies for the annotation of free-text human feedback in dialogs and investigate the impact of including such data in response generation for three SOTA language generation models, including GPT-2, LLAMA, and Flan-T5. Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.	翻訳日:2023-10-25 19:01:44 公開日:2023-10-24
# オンライン討論における価値の相違は相違に影響を及ぼすか? Do Differences in Values Influence Disagreements in Online Discussions? ( http://arxiv.org/abs/2310.15757v1 ) ライセンス: Link先を確認	Michiel van der Meer, Piek Vossen, Catholijn M. Jonker, Pradeep K. Murukannaiah	(参考訳) 差別はオンライン議論で一般的である。相違はコラボレーションを促進し、いくつかの条件下での議論の品質を改善する可能性がある。意見の不一致を認識する方法は存在するが、意見不一致に影響を及ぼす要因の深い理解は文献に欠けている。本稿では,個人価値の違いがオンライン議論における意見の相違を示唆する仮説を考察する。オンライン議論における価値推定に最先端モデルをどのように利用できるか,そして,推定値をどのように価値プロファイルに集約できるかを示す。人手による合意ラベルに基づいて,評価値のプロファイルを評価する。価値プロファイルの相違は特定のケースにおける不一致と相関することがわかった。また,合意予測に価値情報を含めることで,性能が向上することがわかった。 Disagreements are common in online discussions. Disagreement may foster collaboration and improve the quality of a discussion under some conditions. Although there exist methods for recognizing disagreement, a deeper understanding of factors that influence disagreement is lacking in the literature. We investigate a hypothesis that differences in personal values are indicative of disagreement in online discussions. We show how state-of-the-art models can be used for estimating values in online discussions and how the estimated values can be aggregated into value profiles. We evaluate the estimated value profiles based on human-annotated agreement labels. We find that the dissimilarity of value profiles correlates with disagreement in specific cases. We also find that including value information in agreement prediction improves performance.	翻訳日:2023-10-25 19:01:22 公開日:2023-10-24
# 言語モデルと直接音声翻訳の統合:ジェンダーの抑揚を制御する推論時間解法 Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection ( http://arxiv.org/abs/2310.15752v1 ) ライセンス: Link先を確認	Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli	(参考訳) 話者を参照する単語を翻訳する場合、音声翻訳(st)システムはデフォルトの男性ジェネリクスに頼らず、潜在的に誤解を招く声質に頼るべきではない。むしろ、話者の好みに応じて性別を割り当てるべきである。そのための既存のソリューションは、効果的ではあるが、実際には実現不可能ではない。提案手法は,STデコーダが暗黙的に学習した(バイアス付き)内部言語モデル(LM)を,ジェンダー固有の外部LMに置き換えるものである。 en->es/fr/it実験では,女性型において,基礎モデルと最良のトレーニング時間緩和戦略をそれぞれ31.0点,1.6点に上回った。話者の発声特性が性別と矛盾する困難な状況下では、さらに利益が(最大32.0と3.4まで)大きくなる。 When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.	翻訳日:2023-10-25 19:01:12 公開日:2023-10-24
# 大規模言語モデルはビデオ質問応答の時間的・因果的推論である Large Language Models are Temporal and Causal Reasoners for Video Question Answering ( http://arxiv.org/abs/2310.15747v1 ) ライセンス: Link先を確認	Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim	(参考訳) 大規模言語モデル(LLM)は、幅広い自然言語理解および生成タスクにおいて顕著なパフォーマンスを示している。ビデオ質問回答 (Video Question Answering, VideoQA) における時間的・因果的推論のために, LLM が $\textit{linguistic shortcuts}$ を有効活用するための先行情報を提供する。しかしながら、そのような先行は、視覚的コンテンツを無視しながら、そのモデルを過度に疑問に答える$\textit{i.e.}$, $\textit{linguistic bias}$ へと導くことによって、ビデオQAの準最適結果を引き起こすことが多い。これは 'ungrounded guesses' や 'hallucinations' とも呼ばれる。この問題を解決するために,ビデオQA 上で LLM が先行する手法である Flipped-VQA を提案し,VQ とVA,QA のペアをそれぞれ付与する$\langle$V,Q,A$\rangle$ triplet のすべての組み合わせを,ソースペアとターゲットラベルをフリップすることで予測し,それらの複雑な関係を理解するために $\textit{i.e.}$,予測 A, Q, V のペアをそれぞれ与えられた VQ, VA, QA のペアを推定する。本稿では,LLaMAにFlipped-VQAを適用してLLaMA-VQAを開発した。さらに、Flipped-VQA は様々な LLM (OPT および GPT-J) に適用可能な汎用フレームワークであり、その性能を一貫して改善する。我々は, Flipped-VQAが言語的ショートカットの活用を促進するだけでなく, 言語バイアスを緩和し, 問題の過度な回答を引き起こすことを実証的に示す。コードはhttps://github.com/mlvlab/flipped-vqaで入手できる。 Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, $\textit{i.e.}$, $\textit{linguistic bias}$, while ignoring visual content. This is also known as `ungrounded guesses' or `hallucinations'. To address this problem while leveraging LLMs' prior on VideoQA, we propose a novel framework, Flipped-VQA, encouraging the model to predict all the combinations of $\langle$V, Q, A$\rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, $\textit{i.e.}$, predict A, Q, and V given a VQ, VA, and QA pairs, respectively. 
In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLM-based and non-LLM-based models on five challenging VideoQA benchmarks. Furthermore, Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performance. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias that causes incorrect answers by over-relying on the question. Code is available at https://github.com/mlvlab/Flipped-VQA.	翻訳日:2023-10-25 19:00:51 公開日:2023-10-24
# 失敗は道を開く - チューニングフリーなルール蓄積による大規模言語モデルの拡張 Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation ( http://arxiv.org/abs/2310.15746v1 ) ライセンス: Link先を確認	Zeyuan Yang, Peng Li, Yang Liu	(参考訳) 大きな言語モデル(LLM)は素晴らしいパフォーマンスを示しています。しかし、サンプル間の関係を捉えることができないため、これらの凍結LDMは必然的に同様のミスを繰り返し続ける。本稿では,過去の誤りから学習することで,llmの性能向上を指導するチューニングフリールール蓄積(tran)フレームワークを提案する。データが順次到着すると、LSMは不正なケースから徐々にルールを蓄積し、ルールコレクションを形成する。これらのルールはLLMによって、後続の入力を処理する際にも同様のミスを避けるために使用される。さらに、ルールはプライマリプロンプトとは独立であり、シームレスにプロンプトデザイン戦略を補完する。実験により,TRANは最近のベースラインよりも大きなマージンで改善されていることがわかった。 Large Language Models (LLMs) have showcased impressive performance. However, due to their inability to capture relationships among samples, these frozen LLMs inevitably keep repeating similar mistakes. In this work, we propose our Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving their performance by learning from previous mistakes. Considering data arrives sequentially, LLMs gradually accumulate rules from incorrect cases, forming a rule collection. These rules are then utilized by the LLMs to avoid making similar mistakes when processing subsequent inputs. Moreover, the rules remain independent of the primary prompts, seamlessly complementing prompt design strategies. Experimentally, we show that TRAN improves over recent baselines by a large margin.	翻訳日:2023-10-25 18:59:49 公開日:2023-10-24
# トポロジカル非負行列因子化による単一細胞RNA配列の解析 Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization ( http://arxiv.org/abs/2310.15744v1 ) ライセンス: Link先を確認	Yuta Hozumi and Guo-Wei Wei	(参考訳) 単細胞rnaシークエンシング(scrna-seq)は比較的新しい技術であり、scrna-seqデータに関連する高次元、複雑さ、大規模であることから統計学、データサイエンス、計算生物学に多大な関心を寄せている。非負行列分解(NMF)は、結果として生じる低次元成分のメタジーン解釈によるユニークなアプローチを提供する。しかし、NMFアプローチはマルチスケール分析の欠如に悩まされている。この研究は、2つの永続ラプラシア正規化NMF法、すなわちトポロジカルNMF(TNMF)とロバストトトポロジカルNMF(rTNMF)を導入している。合計12のデータセットを用いて、提案したTNMFとrTNMFが他のNMFベースの手法よりも大幅に優れていることを示す。また,TNMF と rTNMF を用いて,一般的な一様多様体近似・投影 (UMAP) と t-分散確率的隣接埋め込み (t-SNE) の可視化を行った。 Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).	翻訳日:2023-10-25 18:59:13 公開日:2023-10-24
# パラメータ効率の良い構成知識グラフ表現のためのランダムエンティティ量子化 Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation ( http://arxiv.org/abs/2310.15797v1 ) ライセンス: Link先を確認	Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao	(参考訳) 下流タスクには知識グラフ(KG)の表現学習が不可欠である。支配的なアプローチであるKG Embedding(KGE)は、独立したベクトルを持つエンティティを表し、スケーラビリティの課題に直面している。最近の研究では、事前定義された小さなコードブックからマッチしたエンティティ対応コードワードを構成することでエンティティを表現する、パラメータ効率の代替方法を提案している。本稿では、各エンティティの対応するコードワードをエンティティ量子化として取得するプロセスについて述べる。本稿では,単純なランダムな実体量子化が,現在の戦略と同じような結果が得られることを示す。この現象を分析し,エンティティ表現のための数値化結果であるエンティティ符号が,コードレベルではエントロピーが高く,ランダムなエンティティ量子化下ではコードワードレベルではjaccard距離が高いことを明らかにする。したがって、異なる実体はより容易に区別され、効果的なKG表現を促進する。以上の結果から,現在の定量化戦略はkg表現にとって重要ではないこと,また,実体識別性が現在の戦略を超えて向上する余地があることが示された。結果はhttps://github.com/jiaangl/randomquantizationで再現できます。 Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. 
The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.	翻訳日:2023-10-25 18:51:14 公開日:2023-10-24
# プレフィックス部分空間学習による大規模言語モデルの一般化 Improving generalization in large language models by learning prefix subspaces ( http://arxiv.org/abs/2310.15793v1 ) ライセンス: Link先を確認	Louis Falissard, Vincent Guigue, Laure Soulier	(参考訳) この記事では、不足データレジーム("few-shot"学習設定としても知られる)における、大言語モデル(llms)の微調整に焦点を当てます。ニューラルネットワーク部分空間に基づくLLMの一般化能力を向上させる手法を提案する。近年,コンピュータビジョンで導入されたこの最適化手法は,パラメータ空間におけるモデル全体の結合最適化を通じて,より広い局所最適化を同定することにより,モデル一般化を改善することを目的としている。しかし、大規模で事前訓練されたトランスフォーマーへの適応は、いくつかの課題を引き起こす。第一に、それらのパラメータの数によって複数のモデルの訓練が難しくなっており、第二に、決定論的パラメータの初期化スキームは、当初提案された部分空間法に不適当である。本稿では,Parameter Efficient Fine-Tuning(PEFT)法が従来の手法と完全に互換性があることを示し,連続接頭辞の単純さを学習することを提案する。本手法は,数ショットの学習環境に適応したGLUEベンチマークの変種を用いて試行し,両コントリビューションが相多手法と比較して平均性能の向上につながることを示す。実装は以下のリンクで確認できる。 https://github.com/Liloulou/prefix_subspace This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performances compared to sota methods. 
The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace	翻訳日:2023-10-25 18:50:54 公開日:2023-10-24
# qPOTS: Pareto 最適トンプソンサンプリングによる効率的なバッチ多目的ベイズ最適化 qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling ( http://arxiv.org/abs/2310.15788v1 ) ライセンス: Link先を確認	S. Ashwin Renganathan	(参考訳) 多目的最適化の古典的進化的アプローチは、非常に効果的であるが、目的関数に対して多くのクエリを発生させる。多目的最適化を解くためのサンプル効率のよいアプローチは、ガウス過程(GP)サロゲートとベイズ最適化(BO)である。多目的ベイズ最適化(MOBO)は、新しい観測候補を取得するために最適化される獲得関数の構築を伴う。この「内側」の最適化は様々な理由により困難である: 獲得関数は非凸、微分不可能、あるいは解析形式で利用できない場合がある。我々は、この困難な獲得関数最適化ステップを廃止し、より安価な多目的最適化問題を解くことで得られたランダムなGP事後サンプルパスのParetoフロンティアから新しい候補を選択する、単純かつ効果的なトンプソンサンプリングベースのアプローチ($q\texttt{POTS}$)を提案する。より高次元での計算的トラクタビリティを向上させるために、Nyström近似と組み合わせた自動アクティブ候補集合選択法を提案する。提案手法は,任意のGP事前仮定に適用でき,合成および実世界実験において,精度と計算効率の両面で,最先端を上回る強力な経験的性能を示す。 Classical evolutionary approaches for multiobjective optimization are quite effective but incur a lot of queries to the objectives; this can be prohibitive when objectives are expensive oracles. A sample-efficient approach to solving multiobjective optimization is via Gaussian process (GP) surrogates and Bayesian optimization (BO). Multiobjective Bayesian optimization (MOBO) involves the construction of an acquisition function which is optimized to acquire new observation candidates. This "inner" optimization can be hard due to various reasons: acquisition functions being nonconvex, nondifferentiable and/or unavailable in analytical form; the success of MOBO heavily relies on this inner optimization. We do away with this hard acquisition function optimization step and propose a simple, but effective, Thompson sampling based approach ($q\texttt{POTS}$) where new candidate(s) are chosen from the Pareto frontier of random GP posterior sample paths obtained by solving a much cheaper multiobjective optimization problem. To further improve computational tractability in higher dimensions we propose an automated active set of candidates selection combined with a Nyström approximation. 
Our approach applies to arbitrary GP prior assumptions and demonstrates strong empirical performance over the state of the art, both in terms of accuracy and computational efficiency, on synthetic as well as real-world experiments.	翻訳日:2023-10-25 18:50:31 公開日:2023-10-24
# SequenceMatch: 半教師あり学習のための弱強強化設計の再検討 SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning ( http://arxiv.org/abs/2310.15787v1 ) ライセンス: Link先を確認	Khanh-Binh Nguyen	(参考訳) 半教師付き学習(SSL)は,大量のラベルのないデータを用いたモデルのトレーニングを可能にするため,近年普及している。しかし、多くのSSL手法が直面する問題のひとつは、モデルが小さなラベル付きトレーニングデータセットに過度に適合し、過信で誤った予測を生成する場合に発生する、確認バイアスである。この問題に対処するために,複数のデータ拡張を利用する効率的なSSL手法であるSequenceMatchを提案する。 SequenceMatchのキー要素は、ラベルなしデータに対する中程度の拡張を含んでいることである。異なる拡張と、拡張された各例のペア間の一貫性の制約を利用することで、SequenceMatchは弱い拡張と強い拡張を施した例に対するモデルの予測分布の相違を減らすのに役立つ。さらに、SequenceMatchは、高信頼と低信頼の予測のための2つの異なる一貫性の制約を定義する。その結果、SequenceMatchはReMixMatchよりもデータ効率が高く、ReMixMatch($\times4$)とCoMatch($\times2$)の両方よりも時間効率が高く、精度も高い。その単純さにもかかわらず、SequenceMatchはCIFAR-10/100、SVHN、STL-10といった標準ベンチマークの先行手法より一貫して優れている。また、ImageNetのような大規模データセットで38.46%のエラー率を達成し、最先端の手法をはるかに上回っている。コードはhttps://github.com/beandkay/SequenceMatchで入手できる。 Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch ($\times4$) and CoMatch ($\times2$) while having higher accuracy. 
Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46\% error rate. Code is available at https://github.com/beandkay/SequenceMatch.	翻訳日:2023-10-25 18:50:04 公開日:2023-10-24
# 小規模確率メタラーニングのためのニューラルネットワークの償却推論 Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning ( http://arxiv.org/abs/2310.15786v1 ) ライセンス: Link先を確認	Matthew Ashman, Tommy Rochussen and Adrian Weller	(参考訳) BNNに対する大域的誘導点変分近似は、真の後続分布の条件を正確に近似する一連の条件分布を構築するために、一連のインジェクション入力を使用する。我々の重要な洞察は、これらのインプットを実際のデータに置き換えることができ、変動分布は各データポイントに対して近似的な確率の集合からなることである。この構造は、推定ネットワークとして知られるメタモデルを通して各データポイントを渡すことで、各近似近似のパラメータが得られ、アモートされた推論になる。この推論ネットワークを関連するデータセット間でトレーニングすることにより、タスク固有のBNNに対するメタ学習ベイズ推論が可能になる。 The global inducing point variational approximation for BNNs is based on using a set of inducing inputs to construct a series of conditional distributions that accurately approximate the conditionals of the true posterior distribution. Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihoods for each datapoint. This structure lends itself to amortised inference, in which the parameters of each approximate likelihood are obtained by passing each datapoint through a meta-model known as the inference network. By training this inference network across related datasets, we can meta-learn Bayesian inference over task-specific BNNs.	翻訳日:2023-10-25 18:49:34 公開日:2023-10-24
# LLMをテストエキスパートにする - 機能的認識によるモバイルGUIテストへのヒューマンライクなインタラクション Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions ( http://arxiv.org/abs/2310.15780v1 ) ライセンス: Link先を確認	Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang	(参考訳) 自動化されたグラフィカルユーザインターフェース(gui)テストは、アプリケーションの品質を保証する上で重要な役割を果たす。自動guiテストにおける学習ベースのテクニックの人気は、人間のようなインタラクションを生成する能力によって高まっているが、テストカバレッジの低さ、一般化能力の不十分、トレーニングデータへの依存度など、いくつかの制限に苦しめられている。自然言語理解や質問応答におけるChatGPTのような大規模言語モデル(LLM)の成功に触発されて,我々はQ&AタスクとしてモバイルGUIテスト問題を定式化した。 gptdroidを提案し,guiページ情報をllmに渡してテストスクリプトを省略し,アプリケーションのフィードバックをllmに渡すように実行し,プロセス全体を繰り返すことで,モバイルアプリとのチャットをllmに依頼する。このフレームワークでは、llmにプロセス全体のテスト知識を保持させ、長期にわたって機能ベースの推論を行うことで探索を導く、機能対応メモリプロンプト機構も導入しています。 google playの93のアプリで評価し、最高のベースラインを32%のアクティビティカバレッジで上回り、より速い速度で31%のバグを検出することを実証した。さらに、gptdroidはgoogle playで新たに53のバグを発見し、そのうち35が修正されている。 Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing due to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, asking LLM to chat with the mobile apps by passing the GUI page information to LLM to elicit testing scripts, and executing them to keep passing the app feedback to LLM, iterating the whole process. 
Within this framework, we have also introduced a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have been confirmed and fixed.	翻訳日:2023-10-25 18:49:23 公開日:2023-10-24
# MRIスキャンにおけるプライバシー向上のための3Dマスクオートエンコーダ 3D Masked Autoencoders for Enhanced Privacy in MRI Scans ( http://arxiv.org/abs/2310.15778v1 ) ライセンス: Link先を確認	Lennart Alexander Van der Goten and Kevin Smith	(参考訳) MRIスキャンは貴重な医療情報を提供するが、保護すべき機密かつ個人識別可能な情報(PII)も含む。 MRIメタデータは容易にサニタイズされるが、MRI画像データは患者の頭部の高現実的な3Dヴィジュアライゼーションをレンダリングする情報を含んでいるため、データベースを相互参照することで、悪意あるアクターが被検体を特定できるため、プライバシー上のリスクである。データ匿名化と非識別化は個人の個人情報のプライバシーと機密性の確保に関係している。従来のMRI鑑定法では、特定のスキャンからプライバシーに敏感な部分(目、鼻など)を取り除く。これは、ダウンストリーム分析をオフにできるドメインシフトの導入に費やされる。近年,GANをベースとしたアプローチが提案され,患者のスキャンを部品の除去ではなく (顔の変更など) 改造して識別する手法が提案されている。本研究では,マスク付きオートエンコーダを用いて顔を非識別するモデルcp-maeを提案する。この方法では,最大256^3$(以前は128立方体)の解像度のスキャンを合成することができ,ボクセルの数が8倍に増加した。構築した構成を使って、非常に堅牢なトレーニングステージを示すシステムを設計することができ、ネットワークを新しいデータに適合させるのが容易になりました。 MRI scans provide valuable medical information, however they also contain sensitive and personally identifiable information (PII) that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains information to render highly-realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification is concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. Recently, a GAN-based approach was proposed to de-identify a patient's scan by remodeling it (e.g. changing the face) rather than by removing parts. In this work, we propose CP-MAE, a model that de-identifies the face using masked autoencoders and that outperforms all previous approaches in terms of downstream task performance as well as de-identification. 
With our method we are able to synthesize scans of resolution up to $256^3$ (previously $128^3$), which constitutes an eight-fold increase in the number of voxels. Using our construction, we were able to design a system that exhibits a highly robust training stage, making it easy to fit the network on novel data.	翻訳日:2023-10-25 18:48:58 公開日:2023-10-24
# MindLLM: スクラッチ、評価、ドメイン・アプリケーションからトレーニング済みの軽量大言語モデル MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications ( http://arxiv.org/abs/2310.15777v1 ) ライセンス: Link先を確認	Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語タスクにおいて顕著な性能を示し、汎用人工知能への大きな一歩を踏み出した。汎用人工知能は、ますます大規模なモデルを開発することで活用されているが、LLMのトレーニングとデプロイのコストとリソース不足を考慮して、特定のドメインにより良いサービスを提供する軽量なカスタムモデルを開発するための別の部門が存在する可能性がある。本稿では,13億,30億のパラメータを持つモデルを提供することで,その負担を軽減するために,スクラッチから訓練したバイリンガル軽量大言語モデルであるMindLLMを提案する。データ構築、モデルアーキテクチャ、評価、アプリケーションなど、プロセスのすべてのステップをカバーしている。このような洞察は、同僚の学者や開発者にとって有益である。 MindLLMは、いくつかの公開ベンチマークにおいて、他のオープンソースの大規模モデルのパフォーマンスと一貫して一致または上回っている。また,小型モデルに適した革新的な命令チューニングフレームワークを導入し,その能力を向上させる。さらに、法律や金融といった特定の垂直領域におけるMindLLMの適用について検討し、軽量モデルの俊敏性と適応性を強調します。 Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. 
We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.	翻訳日:2023-10-25 18:48:30 公開日:2023-10-24
# CP$^{\infty}$ and beyond: 2-圏的拡張理論 CP$^{\infty}$ and beyond: 2-categorical dilation theory ( http://arxiv.org/abs/2310.15776v1 ) ライセンス: Link先を確認	Robert Allen and Dominic Verdon	(参考訳) 圏論的量子力学の洞察と技法を無限次元系に拡張する問題は (Coecke and Heunen, 2016) で検討された。その研究において、ヒルベルト空間と有界線型写像の圏からヒルベルト空間と量子演算の圏を復元する$\mathrm{CP}^{\infty}$-構成が定義された。ここでは、$\mathrm{CP}^{\infty}$-構成の「水平圏化(horizontal categorification)」によって、フォン・ノイマン代数、双加群、インタートウィナーの2-圏 $[W^*]$ から、すべてのフォン・ノイマン代数とチャネル(正規単位的完全正写像)の圏を復元できることを示す。応用として、有限次元行列代数間の極端チャネルに関するChoiの特徴付けを、任意のフォン・ノイマン代数間の極端チャネルの特徴付けに拡張する。 The problem of extending the insights and techniques of categorical quantum mechanics to infinite-dimensional systems was considered in (Coecke and Heunen, 2016). In that work the $\mathrm{CP}^{\infty}$-construction, which recovers the category of Hilbert spaces and quantum operations from the category of Hilbert spaces and bounded linear maps, was defined. Here we show that by a `horizontal categorification' of the $\mathrm{CP}^{\infty}$-construction, one can recover the category of all von Neumann algebras and channels (normal unital completely positive maps) from the 2-category $[W^*]$ of von Neumann algebras, bimodules and intertwiners. As an application, we extend Choi's characterisation of extremal channels between finite-dimensional matrix algebras to a characterisation of extremal channels between arbitrary von Neumann algebras.	翻訳日:2023-10-25 18:48:11 公開日:2023-10-24
# BLESS: 文の単純化に関する大規模言語モデルのベンチマーク BLESS: Benchmarking Large Language Models on Sentence Simplification ( http://arxiv.org/abs/2310.15773v1 ) ライセンス: Link先を確認	Tannon Kew, Alison Chi, Laura V\'asquez-Rodr\'iguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow	(参考訳) 本稿では,最新の大規模言語モデル(LLM)の総合的なパフォーマンスベンチマークであるBLESSについて,テキスト単純化(TS)の課題について紹介する。そこで,本研究では,各ドメインの3つのテストセット(Wikipedia,ニュース,医療)に対して,サイズ,アーキテクチャ,事前学習方法,アクセシビリティなど,44種類のモデルを比較して,この課題を克服する方法について検討する。本分析では,異なるモデルで実行される共通編集操作のタイプについて,一連の自動測定値と大規模に定量的に検討する。さらに,モデル出力のサブセットを手作業で定性解析することにより,生成した単純化の品質を評価する。評価の結果,最高のLSMはTSのトレーニングを受けていないにもかかわらず,最先端のTSベースラインと相容れない性能を示した。さらに,一部のLCMでは編集操作の幅と多様性がより大きいことが判明した。私たちのパフォーマンスベンチマークは、将来のTSメソッドと評価メトリクスの開発のためのリソースとして利用できます。 We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our analysis considers a suite of automatic metrics as well as a large-scale quantitative investigation into the types of common edit operations performed by the different models. Furthermore, we perform a manual qualitative analysis on a subset of model outputs to better gauge the quality of the generated simplifications. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines. Additionally, we find that certain LLMs demonstrate a greater range and diversity of edit operations. Our performance benchmark will be available as a resource for the development of future TS methods and evaluation metrics.	翻訳日:2023-10-25 18:47:50 公開日:2023-10-24
# 非自然言語処理: 言語モデルはマシン生成プロンプトをどのように扱うか? Unnatural language processing: How do language models handle machine-generated prompts? ( http://arxiv.org/abs/2310.15829v1 ) ライセンス: Link先を確認	Corentin Kervadec, Francesca Franzon and Marco Baroni	(参考訳) 言語モデルプロンプト最適化研究は、モデル埋め込み空間からのベクトル列を含む、明確な意味や構文構造を持たない自動生成されたトークンシーケンスによって、意味論的および文法上、手作業によるプロンプトがルーチン的に上回ることを示した。我々は機械生成プロンプトを用いて、自然言語表現を含まない入力に対してモデルがどのように反応するかを探索する。連続的および離散的な機械生成プロンプトに応答し,複数の意味タスクにおいて異なる大きさのモデルの挙動を考察し,人間の生成した自然言語プロンプトに応答する振る舞いと比較した。同様の出力を生成する場合でも、マシン生成とヒューマンプロンプトは、異なるパープレキシティ、異なる注意と出力エントロピー分布、異なるユニットアクティベーションプロファイルを含む、ネットワーク処理経路を通じて異なる応答パターンをトリガーする。我々は、異なるプロンプトタイプによって活性化される単位の性質について予備的な洞察を与え、自然言語のみが真に言語的な回路をリクルートすることを示唆する。 Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit.	翻訳日:2023-10-25 18:41:58 公開日:2023-10-24
# 高度高分解能3次元ResUNetによる自動大動脈切開 : SEG.Aチャレンジへの貢献 Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge ( http://arxiv.org/abs/2310.15827v1 ) ライセンス: Link先を確認	Marek Wodzinski and Henning M\"uller	(参考訳) 3次元医用量の自動大動脈分割は重要な課題である。いくつかの要因は、大動脈解離の可能性や、小枝の分節化や注釈の難しさなど、問題を難しくしている。この研究は、MICCAI 2023カンファレンスで組織されたSEGへのMedGIFTチームの貢献を示す。ディープエンコーダ・デコーダアーキテクチャに基づく完全自動アルゴリズムを提案する。私たちの研究の主な前提は、特に低いデータ構造において、データ前処理と拡張がディープアーキテクチャよりもずっと重要であるということです。したがって、この解は伝統的な畳み込みU-Netの変種に基づいている。提案手法は,すべてのテストケースに対して0.9以上のdiceスコアを達成し,参加者の安定性が最も高かった。本法は, 臨床評価, 定量的結果, 容積メッシュの質について, 1位, 4位, 3位と評価した。ソースコードと事前訓練されたモデルを自由にリリースし、Grand-Challengeプラットフォーム上でアルゴリズムへのアクセスを提供する。 Automatic aorta segmentation from 3-D medical volumes is an important yet difficult task. Several factors make the problem challenging, e.g. the possibility of aortic dissection or the difficulty with segmenting and annotating the small branches. This work presents a contribution by the MedGIFT team to the SEG.A challenge organized during the MICCAI 2023 conference. We propose a fully automated algorithm based on deep encoder-decoder architecture. The main assumption behind our work is that data preprocessing and augmentation are much more important than the deep architecture, especially in low data regimes. Therefore, the solution is based on a variant of traditional convolutional U-Net. The proposed solution achieved a Dice score above 0.9 for all testing cases with the highest stability among all participants. The method scored 1st, 4th, and 3rd in terms of the clinical evaluation, quantitative results, and volumetric meshing quality, respectively. We freely release the source code, pretrained model, and provide access to the algorithm on the Grand-Challenge platform.	翻訳日:2023-10-25 18:41:38 公開日:2023-10-24
# コンセプトドリフトについて知っておくべきこと - 進化する環境のモニタリングに関するサーベイ One or Two Things We know about Concept Drift -- A Survey on Monitoring Evolving Environments ( http://arxiv.org/abs/2310.15826v1 ) ライセンス: Link先を確認	Fabian Hinder and Valerie Vaquet and Barbara Hammer	(参考訳) 私たちを取り巻く世界は常に変化している。これらの変化は、しばしば概念の漂流と表現され、多くの産業や技術プロセスに影響を及ぼす。多くのシナリオでは安全性が重要であり、誤動作やその他の異常な行動につながる可能性があるため、概念ドリフトの検出と分析が不可欠である。本稿では,教師なしデータストリームにおける概念ドリフトに着目した文献レビューを行う。多くの調査は教師なしのデータストリームにフォーカスしているが、教師なしの設定をレビューする作業はない。しかし、この設定はモニタリングや異常検出に特に関連しており、エンジニアリングにおける多くのタスクや課題に直接適用できる。この調査は、ドリフト検出に関する既存の研究の分類を提供する。さらに、ドリフトの局在に関する研究の現状を体系的な方法でカバーしている。体系的な文献レビューの提供に加えて、本研究は、考慮された問題の正確な数学的定義を提供し、パラメトリックな人工データセットの標準化実験を含んでおり、検出とローカライゼーションの異なる戦略を直接比較することができる。これにより、異なるスキームの適合性を系統的に分析し、実世界のシナリオで使用するためのガイドラインを提供できる。最後に、概念ドリフトを説明するという新しいトピックのセクションがある。 The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. Besides, it covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different strategies for detection and localization. 
Thereby, the suitability of different schemes can be analyzed systematically and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.	翻訳日:2023-10-25 18:41:23 公開日:2023-10-24
# Rosetta Stone - KSAA-RD Shared Task: 言語モデリングから単語定義へ Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment ( http://arxiv.org/abs/2310.15823v1 ) ライセンス: Link先を確認	Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi	(参考訳) 逆辞書は、ユーザーが提供された定義、意味、記述に基づいて単語を発見できるツールである。このような手法は様々なシナリオで有用であり、同一性のない単語の記述を持つ言語学習者を支援し、正確な用語を求める作家に利益をもたらす。これらのシナリオは、しばしば"Tip-of-the-Tongue"(TOT)現象と呼ばれる現象をカプセル化する。本稿では,アラビア語逆辞書共有タスクの勝利解を提案する。この課題は、アラビア語のベクトル表現を付随する記述から導出することに焦点を当てている。共有タスクは2つの異なるサブタスクを含む: 1つはアラビア語の定義を入力として含み、もう1つは英語の定義を用いる。最初のサブタスクに対して、我々のアプローチは、与えられた定義に埋め込まれた単語を予測し、微調整されたアラビアBERTベースのモデルの集合に依存する。最終的な表現は、アンサンブル内の各モデルからの出力埋め込み平均化によって得られる。対照的に、第2サブタスクの最も効果的な解決策は、英語のテスト定義をアラビア語に翻訳し、最初は第1サブタスクのために訓練された微調整モデルに適用することである。この簡単な方法は両方のサブタスクで最高点を達成する。 A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT) phenomena. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task. This task focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first involves an Arabic definition as input, while the second employs an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition. The final representation is obtained through averaging the output embeddings from each model within the ensemble. 
In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic and applying them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score across both subtasks.	翻訳日:2023-10-25 18:41:04 公開日:2023-10-24

# 非Fungible Token Security

Non-Fungible Token Security ( http://arxiv.org/abs/2310.15518v1 )

ライセンス: Link先を確認

Ryleigh McKinney, Sundar Krishnan,

(参考訳) 非代替性トークン(NFT)はブロックチェーンに格納されたユニークなデジタル資産であり、デジタル資産の所有権と真正性の証明に使用される。 NFTは2014年に初めて作成され、その人気は2021年から2022年にかけてピークを迎えた。本稿では,NFT(Non-Fungible Tokens)の世界,その歴史,NFTの将来,およびセキュリティ上の懸念について述べる。

Non-fungible tokens (NFTs) are unique digital assets stored on the blockchain and are used to certify ownership and authenticity of the digital asset. NFTs were first created in 2014, and their popularity peaked between 2021 and 2022. In this paper, the authors dive into the world of Non-Fungible Tokens (NFTs), their history, the future of NFTs, and the associated security concerns.

翻訳日:2024-03-25 14:05:29 公開日:2023-10-24

# 国家電子アイデンティティ(eID)システムに対する影響とリスクアセスメントフレームワーク

An Impact and Risk Assessment Framework for National Electronic Identity (eID) Systems ( http://arxiv.org/abs/2310.15784v1 )

ライセンス: Link先を確認

Jide Edu, Mark Hooper, Carsten Maple, Jon Crowcroft,

(参考訳) 電子識別(eID)システムにより、市民は、政府のサービスへのアクセスや金融取引の実施など、様々な目的で、アイデンティティを主張し、認証することができる。これらのシステムは、権利、サービス、および正式な経済へのユーザーアクセスを改善する。 eIDシステムが国家発展の重要な側面となるにつれて、いかなる失敗、妥協、誤用も政府、ユーザー、社会に損害を与える可能性がある。したがって、システムに対する新たなリスクを特定し、その影響を評価するためには、効果的なリスク評価が不可欠である。しかしながら、これらのシステムに対する包括的リスクアセスメントの開発は、技術的なセキュリティとプライバシの影響に焦点を絞るだけでなく、利害関係者やこれらのシステムが提供するコミュニティの文脈的理解によって実施されなければならない。本研究では,現在のリスクアセスメントがすべての主要な利害関係者のリスク要因に対処するものではないと仮定し,その影響について検討する。リスクの広範な影響と、利害関係者にとって潜在的に重大な影響について検討し、これらの制度が導入された社会的、経済的、政治的文脈を含む幅広い要因を考察する枠組みを提案する。これは、eIDシステムに対するリスクをよりよく評価するための総合的なプラットフォームを提供する。

Electronic identification (eID) systems allow citizens to assert and authenticate their identities for various purposes, such as accessing government services or conducting financial transactions. These systems improve user access to rights, services, and the formal economy. As eID systems become an essential facet of national development, any failure, compromise, or misuse can be costly and damaging to the government, users, and society. Therefore, an effective risk assessment is vital for identifying emerging risks to the system and assessing their impact. However, developing a comprehensive risk assessment for these systems must extend far beyond focusing on technical security and privacy impacts and must be conducted with a contextual understanding of stakeholders and the communities these systems serve. In this study, we posit that current risk assessments do not address risk factors for all key stakeholders and explore how potential compromise could impact them each in turn. In the examination of the broader impact of risks and the potentially significant consequences for stakeholders, we propose a framework that considers a wide range of factors, including the social, economic, and political contexts in which these systems were implemented. This provides a holistic platform for a better assessment of risk to the eID system.

翻訳日:2024-03-25 14:05:29 公開日:2023-10-24

# 国家電子アイデンティティ(NeID)システムのリスクと課題

Exploring the Risks and Challenges of National Electronic Identity (NeID) System ( http://arxiv.org/abs/2310.15813v1 )

ライセンス: Link先を確認

Jide Edu, Mark Hooper, Carsten Maple, Jon Crowcroft,

(参考訳) 多くの国は、国民の身元を確実に確認することで、公正で透明で、自治的な社会を育む可能性を認識し、国家電子識別システム(NeID)を採用してきた。 NeIDの包括的性質は、義務を履行する責任を負いながら権利を行使する権限を人々に与えます。それでも、これらの複雑なアイデンティティ検証システムの開発と実装は、セキュリティ、プライバシ、排除に関する懸念を引き起こしている。本研究では,NeIDリスクの異なるカテゴリについて論じ,これらのシステムの展開を成功させるとともに,この技術によって引き起こされる特定のリスクやその他の課題にどのように対処するかを考察する。異なるNeIDシステムのレビューと、各デプロイメントで提示されるユニークなリスクと課題を軽減するための取り組みに基づいて、強いセキュリティ対策の実施、定期的なリスク評価の実行、システムの設計と実装におけるステークホルダーの関与など、リスクを軽減するためのベストプラクティスを強調した。

Many countries have embraced national electronic identification (NeID) systems, recognising their potential to foster a fair, transparent, and well-governed society by ensuring the secure verification of citizens' identities. The inclusive nature of NeID empowers people to exercise their rights while holding them accountable for fulfilling their obligations. Nevertheless, the development and implementation of these complex identity-verification systems have raised concerns regarding security, privacy, and exclusion. In this study, we discuss the different categories of NeID risk and explore the successful deployment of these systems, while examining how the specific risks and other challenges posed by this technology are addressed. Based on the review of the different NeID systems and the efforts made to mitigate the unique risks and challenges presented within each deployment, we highlighted the best practices for mitigating risk, including implementing strong security measures, conducting regular risk assessments, and involving stakeholders in the design and implementation of the system.

翻訳日:2024-03-25 14:05:29 公開日:2023-10-24

# 墨塗り署名方式(Redactable Signature Schemes)とゼロ知識証明:分散デジタルアイデンティティシステムへの適用に関する比較検討

Redactable Signature Schemes and Zero-knowledge Proofs: A comparative examination for applications in Decentralized Digital Identity Systems ( http://arxiv.org/abs/2310.15934v1 )

ライセンス: Link先を確認

Bryan Kumara, Mark Hooper, Carsten Maple, Timothy Hobson, Jon Crowcroft,

(参考訳) 墨塗り署名方式(Redactable Signature Schemes)とゼロ知識証明(Zero-Knowledge Proofs)は、プライバシを実現するための根本的に異なる2つのアプローチである。本稿では,分散IDシステムに適用した場合のメリットと欠点について分析する。 墨塗り署名は比較的高速でコンパクトだが、ゼロ知識証明ほど表現力がなく、同等のプライバシも提供しない。一方、ゼロ知識証明ははるかに高速になり得るが、一部のプロトコルは信頼できるセットアップを必要とする。我々は、これらの利点と欠点を踏まえ、分散IDシステムにおいては、墨塗り署名は初期の段階に、ゼロ知識証明は後期の段階により適していると結論付けた。

Redactable Signature Schemes and Zero-Knowledge Proofs are two radically different approaches to enabling privacy. This paper analyses their merits and drawbacks when applied to decentralized identity systems. Redactable signatures, though competitively quick and compact, are not as expressive as zero-knowledge proofs and do not provide the same level of privacy. On the other hand, zero-knowledge proofs can be much faster, but some protocols require a trusted set-up. We conclude that, given these benefits and drawbacks, redactable signatures are more appropriate at an earlier stage and zero-knowledge proofs are more appropriate at a later stage for decentralized identity systems.

翻訳日:2024-03-25 14:05:29 公開日:2023-10-24
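
To make the idea of redaction concrete, the following is a minimal sketch of one textbook-style construction: the issuer signs a list of salted hashes, one per claim, and the holder later discloses only the salts and values of the claims they choose to reveal. It is not taken from the paper. The function names (`sign_credential`, `redact`, `verify`) are hypothetical, and an HMAC under a shared key stands in for a real public-key signature, an assumption made only to keep the example within the standard library.

```python
import hashlib
import hmac
import os


def _commit(salt: bytes, name: str, value: str) -> bytes:
    # Salted hash of one claim; the salt hides the value until it is disclosed.
    return hashlib.sha256(salt + b"\x00" + name.encode() + b"\x00" + value.encode()).digest()


def sign_credential(key: bytes, claims: dict):
    # One fresh salt per claim, then "sign" the sorted list of commitments.
    # ASSUMPTION: HMAC replaces a public-key signature for illustration only.
    salts = {name: os.urandom(16) for name in claims}
    digests = sorted(_commit(salts[n], n, v) for n, v in claims.items())
    tag = hmac.new(key, b"".join(digests), hashlib.sha256).digest()
    return salts, digests, tag


def redact(claims: dict, salts: dict, keep: list) -> dict:
    # The holder reveals (salt, value) only for the claims listed in `keep`.
    return {n: (salts[n], claims[n]) for n in keep}


def verify(key: bytes, digests: list, tag: bytes, disclosed: dict) -> bool:
    # Check the tag over all commitments, then check each disclosed claim
    # against the committed digests. Undisclosed claims stay hidden.
    expected = hmac.new(key, b"".join(digests), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False
    return all(_commit(salt, n, v) in digests for n, (salt, v) in disclosed.items())
```

A verifier in this sketch learns the number of claims but not the values of the redacted ones. Proving a predicate over a hidden value (for example, "birth year is before 2006") is outside what this construction can express, which is the expressiveness gap the abstract attributes to redactable signatures relative to zero-knowledge proofs.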

# クラウドストレージにおけるアクセス構造の収縮を実現するための効率的な方法

An Efficient Method for Realizing Contractions of Access Structures in Cloud Storage ( http://arxiv.org/abs/2310.15972v1 )

ライセンス: Link先を確認

Shuai Feng, Liang Feng Zhang,

(参考訳) シングルクラウドストレージでは、暗号文の属性ベースの暗号化(CP-ABE)により、クラウドサーバへのアクセス構造の下でデータを暗号化し、復号に必要な属性を指定することができる。マルチクラウドストレージでは、シークレット共有スキーム(SSS)によって、任意のデータを複数の共有に分割し、1つのサーバに分割し、どのサブセットがデータを復元できるかを指定できる。いくつかの属性/サーバを削除するのは興味深い問題ですが、認証済みのすべてのセットの残りの属性/サーバでデータをリカバリすることが可能です。この問題はSSSのアクセス構造が収縮する問題に関連している。本稿では,アクセス構造に対して与えられたSSSを,アクセス構造を収縮するSSSに効率的に変換する手法を提案する。 CP-ABEをベースとした単一クラウドストレージにおける属性除去問題とマルチクラウドストレージにおけるデータ移動問題の解決におけるその応用について述べる。私たちの方法は、サーバストレージの削減や、追加のサーバストレージの不要といったソリューションを生み出します。

In single-cloud storage, ciphertext-policy attribute-based encryption (CP-ABE) allows one to encrypt any data under an access structure for storage on a cloud server, specifying which attributes are required to decrypt. In multi-cloud storage, a secret sharing scheme (SSS) allows one to split any data into multiple shares, one per server, and to specify which subsets of the servers are able to recover the data. An interesting problem is to remove some attributes/servers while still enabling the remaining attributes/servers in every authorized set to recover the data. This problem is related to the contraction problem of access structures for SSSs. In this paper, we propose a method that can efficiently transform a given SSS for an access structure into SSSs for contractions of the access structure. We show its applications in solving the attribute removal problem in CP-ABE-based single-cloud storage and the data relocation problem in multi-cloud storage. Our method results in solutions that require either less server storage or even no additional server storage.

翻訳日:2024-03-25 13:55:39 公開日:2023-10-24
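
As an illustration of what a contraction means, the sketch below uses Shamir's threshold scheme and the textbook baseline of publishing the removed server's share, so that the remaining servers of every previously authorized set can still recover the data. This is explicitly not the paper's method: the abstract states that its solutions need less or no additional storage, whereas this baseline stores the removed share publicly. The helper names (`split`, `reconstruct`, `contract`) and the choice of prime are assumptions made for the example.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split(secret: int, threshold: int, n: int, rng: random.Random) -> dict:
    # Random polynomial of degree threshold-1 with f(0) = secret;
    # server x (x = 1..n) receives the share f(x).
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return {x: f(x) for x in range(1, n + 1)}


def reconstruct(shares: dict) -> int:
    # Lagrange interpolation at x = 0 over the given points.
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def contract(shares: dict, removed: list):
    # Baseline contraction: the removed servers' shares become public,
    # and only the remaining servers keep private shares.
    public = {x: shares[x] for x in removed}
    remaining = {x: y for x, y in shares.items() if x not in removed}
    return public, remaining
```

With a 3-out-of-5 scheme, removing server 5 in this way leaves a scheme in which any two of the four remaining servers, together with the public share, recover the secret. Two private shares alone interpolate a different polynomial and do not.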

# バーチャルリアリティーは、ユーザーをキーストローク推論攻撃から守ることができるか?

Can Virtual Reality Protect Users from Keystroke Inference Attacks? ( http://arxiv.org/abs/2310.16191v1 )

ライセンス: Link先を確認

Zhuolin Yang, Zain Sarwar, Iris Hwang, Ronik Bhaskar, Ben Y. Zhao, Haitao Zheng,

(参考訳) バーチャルリアリティ(VR)は、地理的制限なしに没入的でインタラクティブな体験を提供することで人気を集めている。また、物理的分離による個人のプライバシーの感覚も提供する。本稿では,プライバシーの強化を前提として,個人情報を盗むサイドチャネル攻撃からVRを保護できないことを示す。皮肉なことに、この脆弱性はVRの最大の強み、没入的でインタラクティブな性質から生じます。そこで我々は,アバターを観察することで,他のVRユーザによって入力されたコンテンツをアタッカー(VRユーザ)が復元できるような,共有仮想環境における新しいキーストローク推論攻撃の設計と実装を行った。アバターはユーザの手の動きのノイズの多いテレメトリを表示するが、インテリジェントアタッカーは、キーボードレイアウトやラベル付きデータを収集することなく、そのデータを入力されたキーを認識し、型付きコンテンツを再構築することができる。 IRBが承認した複数のVRシナリオを対象としたユーザスタディを用いて,提案した攻撃の評価を行った。 15人中13人がタイプされたキーの86%～98%を正確に認識し、元のタイプされたコンテンツの意味の98%を回復したコンテンツが保持している。また、防衛の可能性についても論じる。

Virtual Reality (VR) has gained popularity by providing immersive and interactive experiences without geographical limitations. It also provides a sense of personal privacy through physical separation. In this paper, we show that despite assumptions of enhanced privacy, VR is unable to shield its users from side-channel attacks that steal private information. Ironically, this vulnerability arises from VR's greatest strength, its immersive and interactive nature. We demonstrate this by designing and implementing a new set of keystroke inference attacks in shared virtual environments, where an attacker (VR user) can recover the content typed by another VR user by observing their avatar. While the avatar displays noisy telemetry of the user's hand motion, an intelligent attacker can use that data to recognize typed keys and reconstruct typed content, without knowing the keyboard layout or gathering labeled data. We evaluate the proposed attacks using IRB-approved user studies across multiple VR scenarios. For 13 out of 15 tested users, our attacks accurately recognize 86%-98% of typed keys, and the recovered content retains up to 98% of the meaning of the original typed content. We also discuss potential defenses.

翻訳日:2024-03-25 13:55:39 公開日:2023-10-24

# ブロックチェーンを用いた協調プラトゥーニングによる運転者安全報酬

Driver Safety Reward with Cooperative Platooning using Blockchain ( http://arxiv.org/abs/2312.02164v1 )

ライセンス: Link先を確認

Sruthi Rachamalla, Henry Hexmoor,

(参考訳) 共同運転(またはプラトゥーニング)は、車両通信プロトコルによって2台以上の車両を道路で接続することで安全性と効率を向上させることに焦点を当てる。リーダーは小隊を管理し、車間の通信を確立し、小隊の演習を行うため、非常に重要である。本稿では,運転者の安全につながる道路における小隊化を促進するドライバーインセンティブモデルを提案する。小隊のリーダーはフォロワーよりも複数の責任を持ち、我々のモデルはフォロワーよりもリーダーにインセンティブを与える。これらのインセンティブは暗号通貨として報われる。この小隊のリーダーとフォロワーの両方のためのデジタルマネタイズ方法は、ブロックチェーンを使用したセキュアなトランザクションによって実現される。

Cooperative driving (or platooning) focuses on improving safety and efficiency by connecting two or more vehicles on a road through vehicular communication protocols. The leader is crucial, as it manages the platoon, establishes communication between cars, and performs platoon maneuvers. In this paper, we propose a driver incentive model that encourages platooning on roads, leading to driver safety. As the leader of a platoon has more responsibilities than the followers, our model gives larger incentives to the leader than to the followers. These incentives are paid out as crypto tokens. This digital monetization method for both leaders and followers of a platoon is accomplished through secure transactions using blockchain.

翻訳日:2024-03-25 13:06:53 公開日:2023-10-24

# 強化学習に基づく移動ロボットの局所経路計画

Reinforcement learning based local path planning for mobile robot ( http://arxiv.org/abs/2403.12463v1 )

ライセンス: Link先を確認

Mehmet Gok, Mehmet Tekerek, Hamza Aydemir,

(参考訳) 移動ロボットが特定の目標地点に行くには、異なる方法が用いられる。これらの手法は、オンラインとオフラインのシナリオで異なる動作をする。オフラインのシナリオでは、環境マップが一度作成され、このマップ上で目標に到達するための固定された経路計画が作成される。 A* や RRT (Rapidly-Exploring Random Tree) のような経路計画アルゴリズムはオフライン手法の例である。ここで最も明白な問題は、ロードされたマップの条件が変化した場合に経路を再計画する必要があることである。一方,オンラインのシナリオでは,ロボットは地図を使わずに,センサから得られる知覚データを用いて目標地点へ動的に移動する。 SFM(Social Force Model)のようなアプローチは、オンラインシステムで使われている。しかし、これらの手法は多くの動的センシングデータを必要とするという問題がある。このように、オフラインシステムにおける再計画とマッピングの必要性や、オンラインシステムにおける様々なシステム設計要件が、自律移動ロボット研究の焦点となっていると言えよう。近年,モバイルロボットナビゲーションにおける上記の問題に対する新たなソリューションとして,ディープニューラルネットワークを用いたQ-Learning手法が採用されている。本研究では,深層Q学習(DQN)およびDeep DQNアーキテクチャを用いた機械学習アルゴリズムを,上記の問題の解法として評価し,障害物回避のための自律移動ロボットの経路計画を実現する。

Different methods are used for a mobile robot to reach a specific target location. These methods work differently in online and offline scenarios. In the offline scenario, an environment map is created once, and a fixed path to the target is planned on this map. Path planning algorithms such as A* and RRT (Rapidly-Exploring Random Tree) are examples of offline methods. The most obvious drawback here is the need to re-plan the path when the conditions of the loaded map change. On the other hand, in the online scenario, the robot moves dynamically to a given target without a map, using the perceived data coming from its sensors. Approaches such as SFM (Social Force Model) are used in online systems. However, these methods suffer from requiring a large amount of dynamic sensing data. Thus, it can be said that the need for re-planning and mapping in offline systems and the various system design requirements in online systems are the subjects on which autonomous mobile robot research focuses. Recently, deep neural network powered Q-Learning methods have been used as an emerging solution to the aforementioned problems in mobile robot navigation. In this study, machine learning algorithms with deep Q-Learning (DQN) and Deep DQN architectures are evaluated as solutions to the problems presented above, in order to realize obstacle-avoiding path planning for an autonomous mobile robot.
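The Q-Learning methods mentioned in this abstract build on a temporal-difference update. The following is a minimal sketch of that update in its tabular form, not the authors' implementation: the function name `q_learning_update`, the dictionary-based table, and the values of `alpha` and `gamma` are all assumptions made for illustration. A DQN replaces the table with a neural network but regresses toward the same kind of target, `reward + gamma * max(Q(next_state))`.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the updated value.

    q maps each state to a list of action values (hypothetical layout).
    alpha is the learning rate, gamma the discount factor.
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Toy usage: two states, two actions (e.g. "turn left", "turn right").
q_table = {0: [0.0, 0.0], 1: [0.0, 1.0]}
updated = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, done=False, alpha=0.5, gamma=0.9)
```

In an obstacle-avoidance setting the reward would typically penalize collisions and reward progress toward the target, but the abstract does not specify the reward design, so none is assumed here.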

翻訳日:2024-03-25 07:36:54 公開日:2023-10-24

# 協調情報を用いたグラフベース軌道予測

Graph-based Trajectory Prediction with Cooperative Information ( http://arxiv.org/abs/2310.15692v1 )

ライセンス: Link先を確認

Jan Strohbeck, Sebastian Maschke, Max Mertens, Michael Buchholz

(参考訳) 自動走行の場合、複雑な交通状況で他の道路利用者の将来の軌道を予測することは困難である。現代のニューラルネットワークは、過去の交通参加者の軌跡と地図データを使って、運転者の意図とおそらくの操作に関するヒントを集めている。車と他の交通機関の接続性を高めることで、協調情報は軌道予測アルゴリズムの入力として使用できるデータの別の情報源となる。接続されたアクターは、意図した経路を送信したり、計画された軌道を他のアクターに送信したりする。本研究では、このデータソースを軌跡予測に使用する利点を概説し、この追加データを活用可能なグラフベースのニューラルネットワークアーキテクチャを提案する。協調データが存在するとネットワーク性能が大幅に向上することを示す。また,協調的な情報がない場合においても,ネットワークの性能を向上させる訓練手法を提案する。また,ネットワークが不正確な協調データを処理できることを示し,実際の運転環境での利用を可能にした。

For automated driving, predicting the future trajectories of other road users in complex traffic situations is a hard problem. Modern neural networks use the past trajectories of traffic participants as well as map data to gather hints about the possible driver intention and likely maneuvers. With increasing connectivity between cars and other traffic actors, cooperative information is another source of data that can be used as inputs for trajectory prediction algorithms. Connected actors might transmit their intended path or even complete planned trajectories to other actors, which simplifies the prediction problem due to the imposed constraints. In this work, we outline the benefits of using this source of data for trajectory prediction and propose a graph-based neural network architecture that can leverage this additional data. We show that the network performance increases substantially if cooperative data is present. Also, our proposed training scheme improves the network's performance even for cases where no cooperative information is available. We also show that the network can deal with inaccurate cooperative data, which allows it to be used in real automated driving environments.

翻訳日:2024-02-18 14:31:58 公開日:2023-10-24

# AIによるプログラミング演習の自動補正: GPT-3.5はどの程度有効か?

AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5? ( http://arxiv.org/abs/2311.10737v1 )

ライセンス: Link先を確認

Imen Azaiz, Oliver Deckarm, Sven Strickroth

(参考訳) タイムリーな形成的フィードバックは、効果的な学習にとって最も重要な要因の1つと考えられている。タイムリーで個別化されたフィードバックの提供は、高等教育の大規模クラスでは特に難しい。最近、gpt-3のような大きな言語モデルが一般公開され、コード生成やコード説明といった様々なタスクで有望な結果が得られた。本稿では、パーソナライズされたコード修正とフィードバック生成におけるAIの可能性を検討する。既存の学生による2つの実世界の課題の提出に基づいて,AI支援によるe-アセスメントの正しさと,障害の局所化,ヒントの正しさ,生成したフィードバックのコードスタイルの提案などの特徴について検討した。その結果,提出品の73 %が正しいか間違っているかのどちらかとして正しく同定された。これらの症例の59パーセントでは、GPT-3.5も有効で高品質なフィードバックを得られる。さらに、GPT-3.5は、実際のエラーではないエラーのローカライズや、幻覚的エラーなど、評価の弱点を示した。意味と潜在的な新しい利用シナリオについて論じる。

Timely formative feedback is considered one of the most important drivers of effective learning. Delivering timely and individualized feedback is particularly challenging in large classes in higher education. Recently, large language models such as GPT-3, which have shown promising results on various tasks such as code generation and code explanation, became available to the public. This paper investigates the potential of AI in providing personalized code correction and generating feedback. Based on existing student submissions for two different real-world assignments, the correctness of the AI-aided e-assessment is investigated, as well as characteristics of the generated feedback such as fault localization, correctness of hints, and code style suggestions. The results show that 73% of the submissions were correctly identified as either correct or incorrect. In 59% of these cases, GPT-3.5 also successfully generated effective and high-quality feedback. Additionally, GPT-3.5 exhibited weaknesses in its evaluation, including localization of errors that were not the actual errors, or even hallucinated errors. Implications and potential new usage scenarios are discussed.

翻訳日:2023-11-27 01:00:53 公開日:2023-10-24

# パターン識別を用いた文脈分布外検出

Contextualised Out-of-Distribution Detection using Pattern Identification ( http://arxiv.org/abs/2311.12855v1 )

ライセンス: Link先を確認

Romain Xu-Darme (LSL, LIG), Julien Girard-Satabin (LSL), Darryl Hond (TRT UK), Gabriele Incorvaia (TRT UK), Zakaria Chihani (LSL)

(参考訳) 本研究では,クラス固有の繰り返しパターンを識別し,視覚的分類のための堅牢なアウト・オブ・ディストリビューション(OoD)検出手法を構築するための,説明可能なAI分野からの既存作業の拡張であるCODEを提案する。 CODEは分類器の再トレーニングを一切必要とせず、OoD非依存、すなわちトレーニングデータセットに直接チューニングされる。重要なことに、パターン識別により、イン・ディストリビューション(ID)データセットのイメージを参照データとして提供し、信頼度スコアに追加のコンテキストを提供する。さらに,IDデータセットの摂動に基づく新しいベンチマークを導入し,OoD検出法の比較の基準値として機能するIDデータセットとOoDデータセットの差を,既知の定量的に測定した。

In this work, we propose CODE, an extension of existing work from the field of explainable AI that identifies class-specific recurring patterns to build a robust Out-of-Distribution (OoD) detection method for visual classifiers. CODE does not require any classifier retraining and is OoD-agnostic, i.e., tuned directly to the training dataset. Crucially, pattern identification allows us to provide images from the In-Distribution (ID) dataset as reference data to provide additional context to the confidence scores. In addition, we introduce a new benchmark based on perturbations of the ID dataset that provides a known and quantifiable measure of the discrepancy between the ID and OoD datasets serving as a reference value for the comparison between OoD detection methods.

翻訳日:2023-11-27 00:36:35 公開日:2023-10-24

# プロンプト誘導多モード変圧器による結晶材料の状態予測密度

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer ( http://arxiv.org/abs/2311.12856v1 )

ライセンス: Link先を確認

Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

(参考訳) 状態密度 (DOS) は結晶材料のスペクトル特性であり、物質の様々な特性に関する基本的な知見を提供する。従来の研究は主にDOS予測のための結晶材料の高品質な表現の獲得に焦点が当てられていたが、我々はDOSの性質を反映して得られた表現からDOSを予測することに重点を置いている。つまり、dosは結晶性物質だけでなく、以前の作品では無視されているエネルギーレベルによっても決定される。本稿では,多モード変圧器を用いて結晶材料とエネルギーから得られる不均一な情報を統合し,結晶材料中の原子と様々なエネルギー準位との複雑な関係をモデル化し,dos予測を行う。さらに, 結晶構造系とエネルギーの相互作用を学習するためのモデルとして, プロンプトを活用することを提案する。 Phonon DOSとElectron DOSの2種類のDOSに関する大規模な実験は、DOSTransformerの優位性を実証している。

The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer.

翻訳日:2023-11-27 00:18:34 公開日:2023-10-24

# MLを用いた地震用瓦礫アナライザプローブの設計

Design Of Rubble Analyzer Probe Using ML For Earthquake ( http://arxiv.org/abs/2311.02087v1 )

ライセンス: Link先を確認

Abhishek Sebastian, R Pragna, K Vishal Vythianathan, Dasaraju Sohan Sai, U Shiva Sri Hari Al, R Anirudh and Apurv Choudhary

(参考訳) the earthquake rubble analyzerは、機械学習を使って周囲の音で人間の存在を検知し、97.45%の精度を達成する。また、リアルタイムの環境データも提供し、地震後の救助活動に不可欠な、閉じ込められた個人に対する生存可能性の評価を支援する。

The earthquake rubble analyzer uses machine learning to detect human presence via ambient sounds, achieving 97.45% accuracy. It also provides real-time environmental data, aiding in assessing survival prospects for trapped individuals, which is crucial for post-earthquake rescue efforts.

翻訳日:2023-11-12 19:58:03 公開日:2023-10-24

# CMIP X-MOS:極端モデル出力統計による気候モデルの改善

CMIP X-MOS: Improving Climate Models with Extreme Model Output Statistics ( http://arxiv.org/abs/2311.03370v1 )

ライセンス: Link先を確認

Vsevolod Morozov, Artem Galliamov, Aleksandr Lukashevich, Antonina Kurdukova, and Yury Maximov

(参考訳) 温室効果ガスの排出が気候変動に与える影響や、自然災害の頻度と深刻度の増加を評価するには、気候モデルが不可欠である。統合モデル相互比較計画(cmip)によって生み出された気候モデルが広く受け入れられているにもかかわらず、気候の極端さを正確に予測する上での課題に直面している。この制限に対処し、自然災害リスクの予測を改善するため、エクストリームモデル出力統計(x-mos)を導入する。このアプローチでは、深部回帰手法を用いてCMIPモデル出力を気象観測所から得られた実測値に正確にマッピングし、XXI気候極度のより正確な解析を行う。過去の研究とは対照的に,本研究では,将来の気候パラメータ分布の尾部推定の強化に重点を置いている。後者は意思決定者をサポートし、世界中の気候関連リスクをよりよく評価することができる。

Climate models are essential for assessing the impact of greenhouse gas emissions on our changing climate and the resulting increase in the frequency and severity of natural disasters. Despite the widespread acceptance of climate models produced by the Coupled Model Intercomparison Project (CMIP), they still face challenges in accurately predicting climate extremes, which pose the most significant threats to both people and the environment. To address this limitation and improve predictions of natural disaster risks, we introduce Extreme Model Output Statistics (X-MOS). This approach uses deep regression techniques to map CMIP model outputs precisely to real measurements obtained from weather stations, which results in a more accurate analysis of 21st-century climate extremes. In contrast to previous research, our study places a strong emphasis on enhancing the estimation of the tails of future climate parameter distributions. This emphasis supports decision-makers, enabling them to better assess climate-related risks across the globe.

翻訳日:2023-11-12 19:48:41 公開日:2023-10-24

# 収益性のある取引のための注文板の深層学習と強化学習の併用

Combining Deep Learning on Order Books with Reinforcement Learning for Profitable Trading ( http://arxiv.org/abs/2311.02088v1 )

ライセンス: Link先を確認

Koti S. Jaddu and Paul A. Bilokon

(参考訳) 近未来の動きを予測する価格不均衡や価格行動のパターンを利用するには、自動化された判断を迅速に行う必要がある。多くのアルゴリズムが探索されテストされてきたが、分析手法は限られた領域に焦点をあてて市場環境の全体像を活用できない。機械学習の分野では、収益性のあるトレーディングの領域範囲を増やすために、多くの大規模エンドツーエンドの生データの研究が成功しているが、複製は非常に困難である。注文書の深層学習と強化学習を組み合わせることは、大規模エンドツーエンド学習を、小売取引に適した再現性のためのより管理可能な軽量なコンポーネントに分解する1つの方法である。次の研究は、注文フローの不均衡を利用して複数の地平線をまたがるリターンを予測することに焦点を当て、トレーディング信号を提供する5つの金融機器のための3つの時間差学習モデルを訓練する。使用される楽器は2つの外国為替ペア(GBPUSDとEURUSD)、2つの指標(DE40とFTSE100)、1つの商品(XAUUSD)である。これらの15エージェントのパフォーマンスは、バックテストシミュレーションによって評価され、成功したモデルが小売トレーディングプラットフォームでテストを進める。この結果は潜在的に証明されるが、小売業の取引コスト、滑り込み、変動の拡散を完全に処理するために、一貫して利益を上げている取引に対して、さらなる修正が必要となる。

High-frequency trading is prevalent, where automated decisions must be made quickly to take advantage of price imbalances and patterns in price action that forecast near-future movements. While many algorithms have been explored and tested, analytical methods fail to harness the whole nature of the market environment because they focus on a limited domain. With the ever-growing machine learning field, many large-scale end-to-end studies on raw data have been successfully employed to increase the domain scope for profitable trading, but they are very difficult to replicate. Combining deep learning on the order books with reinforcement learning is one way of breaking down large-scale end-to-end learning into more manageable and lightweight components for reproducibility, suitable for retail trading. The following work focuses on forecasting returns across multiple horizons using order flow imbalance and training three temporal-difference learning models for five financial instruments to provide trading signals. The instruments used are two foreign exchange pairs (GBPUSD and EURUSD), two indices (DE40 and FTSE100), and one commodity (XAUUSD). The performances of these 15 agents are evaluated through backtesting simulation, and successful models proceed to forward testing on a retail trading platform. The results show potential, but further minor modifications are needed to fully handle retail trading costs, slippage, and spread fluctuation and to achieve consistently profitable trading.
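The abstract relies on order flow imbalance (OFI) as its predictive feature. The sketch below shows one commonly used definition of OFI computed from consecutive best bid and ask quotes. It is an illustration under stated assumptions, not the authors' feature pipeline: the tuple layout `(bid_price, bid_size, ask_price, ask_size)` and both function names are hypothetical, and the paper may use a different variant (for example, one that spans several book levels).

```python
def order_flow_imbalance(prev, curr):
    """OFI contribution of one quote update at the best bid and ask.

    prev and curr are (bid_price, bid_size, ask_price, ask_size) tuples
    (an assumed layout chosen for this sketch).
    """
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    # Buy pressure: size appearing at an unchanged or higher bid,
    # minus size leaving an unchanged or lower bid.
    bid_flow = (qb1 if pb1 >= pb0 else 0) - (qb0 if pb1 <= pb0 else 0)
    # Sell pressure: size appearing at an unchanged or lower ask,
    # minus size leaving an unchanged or higher ask.
    ask_flow = (qa1 if pa1 <= pa0 else 0) - (qa0 if pa1 >= pa0 else 0)
    return bid_flow - ask_flow


def ofi_series(quotes):
    """Per-update OFI values for a chronological list of quotes."""
    return [order_flow_imbalance(a, b) for a, b in zip(quotes, quotes[1:])]


# Toy usage with made-up quotes: bid size grows while ask size shrinks.
quotes = [(100, 5, 101, 7), (100, 8, 101, 4), (100, 8, 101, 4)]
imbalances = ofi_series(quotes)
```

Positive values indicate net buying pressure and negative values net selling pressure. In practice the per-update values are usually summed over a time window before being used as a model input.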

翻訳日:2023-11-12 19:44:07 公開日:2023-10-24

# 第二生まれの電子、再び水夫として生まれる

Second Born electrons, born again seamen ( http://arxiv.org/abs/2310.17666v1 )

ライセンス: Link先を確認

A. R. P. Rau

(参考訳) タイトルの複数の句は好奇心に満ちており、海洋上の人物の救助と原子衝突における電荷移動における第2ボルン項の支配は物理学の共通要素を共有している。 2つの性質と共通性について説明する。

The multiple puns in the title play on a curiosity, that the rescue of a person overboard at sea and the dominance of the second Born term in charge transfer in atomic collisions share common elements of physics. Essentials and commonality in the two are explained.

翻訳日:2023-11-05 14:14:46 公開日:2023-10-24

# HMC-pCNサンプリング器を用いたSA-Roundtrip前のベイズ画像逆問題

Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler ( http://arxiv.org/abs/2310.17817v1 )

ライセンス: Link先を確認

Jiayu Qian, Yuanyuan Liu, Jingya Yang and Qingping Zhou

(参考訳) 深い生成前のベイズ推定は、多くの科学・工学分野における逆問題の画像解決にかなりの関心を集めている。事前分布の選択は、利用可能な事前測定の重要表現学習から学習される。サラウンドトリップ(sa-roundtrip)は、サンプリング生成の制御とデータの固有次元の識別を可能にするために、新しい深層生成前置法である。この前は双方向生成逆ネットワークに自己接続構造を組み込む。その後、ベイズ推定は、特定の条件下でエルゴードであることが証明された事前条件付きcrank-nicolson (hmc-pcn) アルゴリズムを用いたハミルトニアンモンテカルロを用いて、低次元潜在空間の後方分布に適用される。 MNIST と TomoPhantom のデータセットを用いたCT再構成実験により,提案手法は最新技術との比較よりも優れており,精度の高い精度の定量化とともに,頑健で優れた点推定器が得られることがわかった。

Bayesian inference with deep generative priors has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The prior distribution is learned from available prior measurements, which makes its selection an important representation learning problem. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
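For readers unfamiliar with the preconditioned Crank-Nicolson idea, the sketch below shows a single step of the plain pCN random-walk sampler for a standard Gaussian prior on a latent vector. It is deliberately simpler than the HMC-pCN algorithm named in the abstract and is not the authors' method: the function name `pcn_step`, the step size `beta`, and the toy likelihood are assumptions for illustration only.

```python
import math
import random


def pcn_step(u, neg_log_lik, beta, rng):
    """One plain pCN step for a N(0, I) prior on the latent vector u.

    neg_log_lik(u) returns the negative log-likelihood (data misfit).
    beta in (0, 1] controls the step size. Returns (state, accepted).
    """
    xi = [rng.gauss(0.0, 1.0) for _ in u]
    scale = math.sqrt(1.0 - beta * beta)
    # The proposal preserves the Gaussian prior, so the prior density
    # cancels out of the acceptance ratio.
    v = [scale * ui + beta * x for ui, x in zip(u, xi)]
    log_alpha = neg_log_lik(u) - neg_log_lik(v)
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return v, True
    return u, False


# Toy usage: one scalar observation y = 1.0 of the latent value with
# Gaussian noise of standard deviation 0.5 (made-up numbers).
def misfit(u):
    return 0.5 * ((u[0] - 1.0) / 0.5) ** 2


rng = random.Random(0)
state = [0.0]
for _ in range(1000):
    state, _ = pcn_step(state, misfit, beta=0.3, rng=rng)
```

Because only the likelihood enters the acceptance ratio, the acceptance rate does not collapse as the latent dimension grows, which is the usual motivation for pCN-type proposals.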

翻訳日:2023-11-05 14:04:38 公開日:2023-10-24

# ノイズラベル下でのロバストネスのための微調整前訓練モデル

Fine-tuning Pre-trained Models for Robustness Under Noisy Labels ( http://arxiv.org/abs/2310.17668v1 )

ライセンス: Link先を確認

Sumyeong Ahn, Sihyeon Kim, Jongwoo Ko, Se-Young Yun

(参考訳) トレーニングデータセットにノイズの多いラベルが存在することは、機械学習モデルのパフォーマンスに大きな影響を及ぼす可能性がある。この問題に対処するため、研究者はノイズラベルを用いた学習法を検討し、クリーンサンプルを特定し、ノイズラベルの影響を低減する。しかし、トレーニングデータセットの特定の部分の影響を限定すると、全体的な一般化性能が低下する可能性がある。これを緩和するため,近年の研究では,膨大な計算資源を活用し,ノイズラベルの慎重な活用を考察している。したがって、訓練コストの増大は効率の再評価を必要とする。その他の研究分野では、高度な一般化性能と効率性を達成することを目的とした、大規模な事前訓練モデルのための微調整技術の開発に焦点が当てられている。しかし,これらの手法は主にクリーンデータセットに集中しており,ノイズのあるラベルシナリオの探索は限られている。本研究の目的は,ノイズのあるラベル付きデータセットに対して,事前学習したモデルを微調整する適切な方法を見つけることである。この目的を達成するために,ノイズの多いデータセットに遭遇したモデルの特徴について検討する。実験分析を通じて,事前学習したモデルの事前知識を頑健かつ効率的に伝達するTURNという新しいアルゴリズムを導入する。本アルゴリズムは,(1)ノイズラベルによる特徴抽出器の歪みを防止するために線形分類器を独立にチューニングし,(2)雑音ラベル比を低減し,ノイズ低減データセットに基づいてモデル全体を微調整し,ターゲットデータセットに適用する,という2つのステップからなる。提案アルゴリズムは, 従来の手法と比較して, 様々なベンチマークにおいて, 効率が高く, 性能も向上している。

The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models. To tackle this issue, researchers have explored methods for Learning with Noisy Labels to identify clean samples and reduce the influence of noisy labels. However, constraining the influence of a certain portion of the training dataset can result in a reduction in overall generalization performance. To alleviate this, recent studies have considered the careful utilization of noisy labels by leveraging huge computational resources. Therefore, the increasing training cost necessitates a reevaluation of efficiency. In other areas of research, there has been a focus on developing fine-tuning techniques for large pre-trained models that aim to achieve both high generalization performance and efficiency. However, these methods have mainly concentrated on clean datasets, and there has been limited exploration of the noisy label scenario. In this research, our aim is to find an appropriate way to fine-tune pre-trained models for noisy labeled datasets. To achieve this goal, we investigate the characteristics of pre-trained models when they encounter noisy datasets. Through empirical analysis, we introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models. The algorithm consists of two main steps: (1) independently tuning the linear classifier to protect the feature extractor from being distorted by noisy labels, and (2) reducing the noisy label ratio and fine-tuning the entire model based on the noise-reduced dataset to adapt it to the target dataset. The proposed algorithm has been extensively tested and demonstrates efficient yet improved denoising performance on various benchmarks compared to previous methods.

翻訳日:2023-11-05 14:03:58 公開日:2023-10-24

# DeSIQ: ソーシャルインテリジェンス理解のための不偏のベンチマークを目指す

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding ( http://arxiv.org/abs/2310.18359v1 )

ライセンス: Link先を確認

Xiao-Yu Guo and Yuan-Fang Li and Gholamreza Haffari

(参考訳) 社会的知性は人間の表現、意図、相互作用を理解するのに不可欠である。ソーシャルインテリジェンス・クエリー(Social Intelligence Queries, Social-IQ)は、複雑なソーシャルインタラクションのビデオに関する複数の質問のデータセットである。このようなベンチマークデータセットの健全性は基礎となる研究課題の調査に不可欠であるため,ソーシャルiqの健全性を研究するための包括的方法論を定めている。分析の結果,Social-IQにはかなりのバイアスがあることが判明した。このバイアスは適度に強い言語モデルによって活用され,適切な相関関係を学習し,文脈や質問を伴わずに完全なパフォーマンスを達成することができる。ソーシャルIQに単純な摂動を適用して構築した新しい挑戦的データセットであるDeSIQを紹介する。我々の実証分析は、DeSIQがオリジナルのSocial-IQデータセットのバイアスを著しく減少させることを示している。さらに,モデルサイズ,モデルスタイル,学習設定,コモンセンス知識,マルチモダリティがベンチマーク性能に与える影響について検討し,考察した。我々の新しいデータセット、観察、発見は、社会的知性の研究に重要な研究課題を開く。

Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.

翻訳日:2023-11-05 13:56:00 公開日:2023-10-24

# 大規模言語モデルのプロンプト工学手法に関するコミュニケーション理論の展望

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models ( http://arxiv.org/abs/2310.18358v1 )

ライセンス: Link先を確認

Yuanfeng Song, Yuanqin He, Xuefang Zhao, Hanlin Gu, Di Jiang, Haijun Yang, Lixin Fan, Qiang Yang

(参考訳) 大規模言語モデル(llms)の台頭により、コミュニティはシングルタスク指向自然言語処理(nlp)研究から総合的なエンドツーエンドマルチタスク学習パラダイムへと移行した。この分野におけるこの研究の線に沿って、LLMベースのプロンプト法は、プロンプト工学(PE)による技術的アドバンテージと、様々なプロンプト法によって開示される基礎的NLP原則によって、多くの注目を集めている。従来の教師付き学習では、ラベル付きデータに基づいてモデルをトレーニングし、予測する必要があった。対照的にPE法は、特にショットやゼロショットのシナリオにおいて、適切なプロンプトを構成することによって既存のLCM(GPT-3とGPT-4)の強力な能力を直接利用する。本論文は,この分野の促進と進化する性質に関する研究の豊富さに直面することを目的としている。 i) 確立された通信理論の枠組みの中で,既存のPE手法をレビューするための新たな視点を示す。 (二)4つの典型的な課題に使用される既存のPE手法の展開動向の理解を深めること。 (iii)将来のpe法の有望な研究方向について光を当てた。

The rise of Large Language Models (LLMs) has shifted the community from single-task-oriented natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research, LLM-based prompting methods have attracted much attention, partly due to the technological advantages brought by prompt engineering (PE) and partly due to the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (e.g., GPT-3 and GPT-4) by composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to prompting and the ever-evolving nature of this field, this article aims to (i) illustrate a novel perspective for reviewing existing PE methods within the well-established communication theory framework; (ii) facilitate a deeper understanding of the developing trends of existing PE methods used in four typical tasks; and (iii) shed light on promising research directions for future PE methods.

翻訳日:2023-11-05 13:55:41 公開日:2023-10-24

# 大規模言語モデルを活用したeコマースにおける製品記述の強化

Leveraging Large Language Models for Enhanced Product Descriptions in eCommerce ( http://arxiv.org/abs/2310.18357v1 )

ライセンス: Link先を確認

Jianghong Zhou and Bo Liu and Jhalak Nilesh Acharya Yao Hong and Kuang-chih Lee and Musen Wen

(参考訳) eコマースのダイナミックな分野では、検索の可視性と顧客エンゲージメントを高めるために、製品記述の品質と包括性が重要である。効果的な製品説明は、'コールドスタート'問題に対処し、市場のトレンドに合わせて、最終的にクリックスルー率の増加につながる。これらの記述を作成するための従来の手法は、しばしば人為的な努力を伴い、一貫性とスケーラビリティの両方を欠いている。本稿では,LAMA 2.0 7B言語モデルを用いた製品記述の自動生成手法を提案する。私たちは、最大のeコマースプラットフォームの1つであるwalmartから、本物の製品説明のデータセットでモデルをトレーニングします。このモデルは、ドメイン固有の言語機能やeコマースニュアンスのために微調整され、営業やユーザエンゲージメントにおける実用性を高める。我々は、NDCG、顧客クリックスルー率、人間評価など、複数の評価指標を用いて、アプローチの有効性を検証する。この結果から,システムはスケーラブルであるだけでなく,製品記述の作成に関わる人的作業量を大幅に削減できることが判明した。本研究は,eコマースプラットフォームのさまざまな面の自動化と最適化において,llama 2.0 7b のような大規模言語モデルのかなりの可能性を強調し,検索機能の改善や販売の増加など,ビジネス的な影響を提供する。

In the dynamic field of eCommerce, the quality and comprehensiveness of product descriptions are pivotal for enhancing search visibility and customer engagement. Effective product descriptions can address the 'cold start' problem, align with market trends, and ultimately lead to increased click-through rates. Traditional methods for crafting these descriptions often involve significant human effort and may lack both consistency and scalability. This paper introduces a novel methodology for automating product description generation using the LLAMA 2.0 7B language model. We train the model on a dataset of authentic product descriptions from Walmart, one of the largest eCommerce platforms. The model is then fine-tuned for domain-specific language features and eCommerce nuances to enhance its utility in sales and user engagement. We employ multiple evaluation metrics, including NDCG, customer click-through rates, and human assessments, to validate the effectiveness of our approach. Our findings reveal that the system is not only scalable but also significantly reduces the human workload involved in creating product descriptions. This study underscores the considerable potential of large language models like LLAMA 2.0 7B in automating and optimizing various facets of eCommerce platforms, offering significant business impact, including improved search functionality and increased sales.
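NDCG is one of the evaluation metrics named in this abstract. The sketch below computes it with the standard linear-gain formula. It is an illustration only: the function names and the graded relevance labels are assumptions, and the abstract does not say which gain function or cutoff the authors used.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    rels = relevances[:k] if k is not None else relevances
    # Rank 0 is the top result and is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Toy usage: relevance grades of search results in ranked order
# (made-up labels, higher is more relevant).
score = ndcg([3, 2, 3, 0, 1], k=5)
```

A value of 1.0 means the results are already in the ideal order, and lower values mean relevant items are ranked too far down the list.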

翻訳日:2023-11-05 13:55:21 公開日:2023-10-24

# ヒューリスティックから分析へ:コヒーレント物理コモンセンス推論のための認知的動機付け戦略

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning ( http://arxiv.org/abs/2310.18364v1 )

ライセンス: Link先を確認

Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai

(参考訳) プレトレーニング言語モデル(PLM)は、様々な言語タスクにおいて印象的なパフォーマンスを示している。しかし、それらはしばしば相関関係を生じやすく、しばしば説明的な情報を生成する。現実世界のアプリケーションでは、PLMは形式化された一貫性のある推論チェーンで決定を正当化する必要があるが、この課題は未解決のままである。認知心理学は、人間が高速で直感的なヒューリスティックな思考を活用して過去の経験に基づいて意思決定を行い、より遅く、思慮深い分析的推論を通じて決定を合理化することができると理論化している。 PLMによる微調整および文脈内学習にこれらの相互結合二重プロセスを導入し、コヒーレントなコモンセンス推論を必要とする2つの言語理解タスクに適用する。提案するヒューリスティック・アナリシス・推論(har)戦略はモデル決定の合理化のコヒーレンスを劇的に改善し,直観的物理学の階層的推論(trip)に最先端の結果をもたらすことを示した。また、この改良されたコヒーレンスが、推論の各ステップにおいて、関連する言語コンテキストに対するより忠実な注意の直接の結果であることも分かりました。以上の結果から, PLM推論の一貫性と信頼性を効果的に向上できる可能性が示唆された。

Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing fast and intuitive heuristic thinking to make decisions based on past experience, then rationalizing the decisions through slower and deliberative analytic reasoning. We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning. We show that our proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP). We also find that this improved coherence is a direct result of more faithful attention to relevant language context in each step of reasoning. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.

翻訳日:2023-11-05 13:40:07 公開日:2023-10-24

# 強化学習におけるグラフ畳み込みネットワークを用いた会話エージェントの文脈化リアルタイムマルチモーダル感情認識

A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning ( http://arxiv.org/abs/2310.18363v1 )

ライセンス: Link先を確認

Fathima Abdul Rahman, Guang Lu

(参考訳) 最近の生成型人工知能(genai)と大規模言語モデル(llm)の発展により、会話エージェントはますます普及し、受け入れられている。身近な方法でインタラクションし、仮想的なコンパニオンとしてサポートすることで、ヒューマンタッチを提供します。したがって、ユーザの感情を理解して慎重に反応することが重要である。感情認識の標準的な問題と比較すると、会話エージェントはリアルタイムでなければならないという追加の制約に直面している。音声、視覚、テキストのモダリティを用いたモデルアーキテクチャの研究は、オンライン機能を提供しないフルビデオシーケンスを用いた感情分類に重点を置いている。本稿では,グラフ畳み込みネットワークと強化学習(coner-grl)を用いたコンテキスト化感情認識のための新しいパラダイムを提案する。会話は、文脈情報の効果的な抽出のために、発話の小さなグループに分割される。このシステムは、GRU(Gated Recurrent Units)を用いて、これらの発話群からマルチモーダル特徴を抽出する。さらに重要なことに、グラフ畳み込みネットワーク(gcn)と強化学習(rl)エージェントは、インタラクティブなシナリオにおける感情機能の複雑な依存関係を捉えるために訓練される。 ConER-GRLモデルとベンチマークデータセット上の他の最先端モデルを比較して、IEMOCAPはマルチモーダルな会話信号からリアルタイムで感情を認識する際に、conER-GRLアーキテクチャの利点を示す。

Owing to the recent developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLM), conversational agents are becoming increasingly popular and accepted. They provide a human touch by interacting in ways familiar to us and by providing support as virtual companions. Therefore, it is important to understand the user's emotions in order to respond considerately. Compared to the standard problem of emotion recognition, conversational agents face an additional constraint in that recognition must be real-time. Studies on model architectures using audio, visual, and textual modalities have mainly focused on emotion classification using full video sequences that do not provide online features. In this work, we present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Network with Reinforcement Learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. The system uses Gated Recurrent Units (GRU) to extract multimodal features from these groups of utterances. More importantly, Graph Convolutional Networks (GCN) and Reinforcement Learning (RL) agents are cascade trained to capture the complex dependencies of emotion features in interactive scenarios. Comparing the results of the conER-GRL model with other state-of-the-art models on the benchmark dataset IEMOCAP demonstrates the advantageous capabilities of the conER-GRL architecture in recognizing emotions in real-time from multimodal conversational signals.

翻訳日:2023-11-05 13:39:40 公開日:2023-10-24

# SoK: 汎用大規模言語モデルにおける記憶

SoK: Memorization in General-Purpose Large Language Models ( http://arxiv.org/abs/2310.18362v1 )

ライセンス: Link先を確認

Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West

(参考訳) 大規模言語モデル(LLM)は、無数のアプリケーションが開発中で、目覚ましいペースで進んでいる。従来の機械学習モデルとは異なり、それらはもはや特定のアプリケーションのために構築されるものではなく、幅広いタスクに優れたように設計されている。この成功の大きな要因は、膨大なトレーニングデータセットと、トレーニングデータに含まれる大量の情報を記憶できる前例のない数のモデルパラメータにある。この記憶は単なる言語にとどまらず、いくつかの文書にのみ存在する情報を包含している。これは、質問応答のようなタスクを実行するために必要であり、したがって学習の重要な部分であるため、しばしば望ましいが、プライバシーやセキュリティ、著作権など、さまざまな問題をもたらす。 LLMはトレーニングデータの短い秘密を記憶できるだけでなく、さまざまな方法でテキストで表現できる事実や書体スタイルといった概念を記憶することもできる。本稿では,文章,事実,アイデア,アルゴリズム,書式,分布特性,アライメント目標を網羅したLLMにおける記憶のための分類法を提案する。モデル性能,プライバシ,セキュリティ,機密性,著作権,監査,暗記の検出と防止方法など,各種類の暗記(肯定的かつ否定的)が持つ意味について述べる。さらに,モデル重みの代わりにモデルの振る舞いを暗記する手法が主流であることから生じる課題についても,推論能力や復号アルゴリズムの違いといったllm特有の現象により強調する。本稿では,LSMの記憶から生じる潜在的なリスクと可能性について述べる。

Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.

Translated: 2023-11-05 13:39:16 Published: 2023-10-24

# Clinical Decision Support System for Unani Medicine Practitioners

http://arxiv.org/abs/2310.18361v1

License: check the linked page

Haider Sultan, Hafiza Farwa Mahmood, Noor Fatima, Marriyam Nadeem and Talha Waheed

Like other fields of traditional medicine, Unani medicine has long been regarded as an effective medical practice. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani medicine practitioners lack modern IT applications in their everyday clinical practice. An Online Clinical Decision Support System may address this challenge by assisting apprentice Unani medicine practitioners in their diagnostic processes. The proposed system provides a web-based interface for entering the patient's symptoms, which are then analyzed automatically to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and to inform patients remotely about the associated treatment options. The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. The data and functionality of the application are exposed through APIs for integration with, and extension by, applications in similar domains. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani medicine principles. By leveraging technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve the speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani medicine practitioners, patients, government drug regulators, software developers, and medical researchers.

Translated: 2023-11-05 13:38:50 Published: 2023-10-24

# Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers

http://arxiv.org/abs/2310.18360v1

License: check the linked page

Mosh Levy, Shauli Ravfogel, Yoav Goldberg

Recent applications of LLMs in Machine Reading Comprehension (MRC) systems have shown impressive results, but the use of shortcuts, mechanisms triggered by features spuriously correlated with the true label, has emerged as a potential threat to their reliability. We analyze the problem from two angles: LLMs as editors, guided to edit text so that it misleads LLMs; and LLMs as readers, which answer questions based on the edited text. We introduce a framework that guides an editor to add potential shortcut triggers to samples. Using GPT4 as the editor, we find that it can successfully insert shortcut triggers into samples in ways that fool LLMs. Analysing LLMs as readers, we observe that even capable LLMs can be deceived using shortcut knowledge. Strikingly, we discover that GPT4 can be deceived by its own edits (15% drop in F1). Our findings highlight inherent vulnerabilities of LLMs to shortcut manipulations. We publish ShortcutQA, a curated dataset generated by our framework, for future research.
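The abstract reports its headline result as a drop in F1 but does not say which F1 variant is meant. As a rough illustration only, the sketch below computes the token-overlap F1 that is commonly used to score reading-comprehension answers; the function name `token_f1` and the whitespace tokenisation are assumptions, not the paper's evaluation code.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; exactly one empty is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token counts only as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "the cat sat" against the reference "the cat" has precision 2/3 and recall 1, so its F1 is 0.8.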

Translated: 2023-11-05 13:38:27 Published: 2023-10-24

# A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

http://arxiv.org/abs/2310.19821v1

License: check the linked page

Reda Alami, Mohammed Mahfoud, Mastane Achab

In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While a strategy that accomplishes this is optimal when no additional information is available, this is no longer the case when additional environment-specific knowledge is provided. In particular, in areas of high volatility such as healthcare or finance, a naive reward-maximization approach often fails to capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$, where $K_T$ is the total number of change-points. In practice, our framework compares favorably to the state of the art in both synthetic and real-world environments and performs efficiently with respect to both risk-sensitivity and non-stationarity.
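The abstract does not list which risk measures the framework uses. As one hedged illustration, the sketch below scores arms by an empirical conditional value-at-risk (CVaR), a widely used risk measure, instead of the mean reward. The names `empirical_cvar` and `pick_arm` are hypothetical, and the sketch leaves out exploration and change-point detection entirely.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (the lower tail)."""
    if not rewards or not 0 < alpha <= 1:
        raise ValueError("need at least one reward and 0 < alpha <= 1")
    k = math.ceil(alpha * len(rewards))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_arm(history, alpha):
    """Index of the arm whose past rewards have the highest empirical CVaR."""
    scores = [empirical_cvar(rewards, alpha) for rewards in history]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `alpha = 1` the score reduces to the sample mean, so the same code recovers greedy reward maximization. Smaller values of `alpha` favor arms whose worst outcomes are less severe, even when their mean reward is lower.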

Translated: 2023-11-05 13:28:44 Published: 2023-10-24

# NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation

http://arxiv.org/abs/2310.19820v1

License: check the linked page

Shunyao Zhang, Yonggan Fu, Shang Wu, Jyotikrishna Dass, Haoran You, Yingyan (Celine) Lin

Boosting the task accuracy of tiny neural networks (TNNs) has become a fundamental challenge for deploying TNNs on edge devices, which are subject to strict limits on memory, computation, bandwidth, and power supply. To this end, we propose a framework called NetDistiller that boosts the achievable accuracy of TNNs by treating them as sub-networks of a weight-sharing teacher constructed by expanding the number of channels of the TNN. Specifically, the target TNN model is jointly trained with the weight-sharing teacher model via (1) gradient surgery to tackle the gradient conflicts between them and (2) uncertainty-aware distillation to mitigate the overfitting of the teacher model. Extensive experiments across diverse tasks validate NetDistiller's effectiveness in boosting TNNs' achievable accuracy over state-of-the-art methods. Our code is available at https://github.com/GATECH-EIC/NetDistiller.
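The abstract names gradient surgery without giving its exact form. The sketch below shows a PCGrad-style projection, one well-known form of gradient surgery: when two gradients point in conflicting directions (negative inner product), the component of one along the other is removed. Whether NetDistiller uses exactly this rule is an assumption, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def project_if_conflicting(g, g_other):
    """Remove from g its component along g_other when the two conflict."""
    inner = dot(g, g_other)
    norm_sq = dot(g_other, g_other)
    if inner >= 0 or norm_sq == 0:
        return list(g)  # no conflict: leave the gradient unchanged
    scale = inner / norm_sq
    return [a - scale * b for a, b in zip(g, g_other)]
```

After the projection the returned gradient is orthogonal to `g_other`, so a step along it no longer works directly against the other objective.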

Translated: 2023-11-05 13:28:16 Published: 2023-10-24

# Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

http://arxiv.org/abs/2006.16205v4

License: check the linked page

Sang Michael Xie, Tengyu Ma, Percy Liang

We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g., pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e., outputs without corresponding inputs, are freely available (e.g., code on GitHub) and provide information about output validity. We can capture the output structure by pre-training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser; the denoiser is frozen to preserve output structure. For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). The improvement from composed fine-tuning is magnified on out-of-distribution (OOD) examples (4% and 25% relative).

Translated: 2023-10-28 07:34:29 Published: 2023-10-24

# Forward Composition Propagation for Explainable Neural Reasoning

http://arxiv.org/abs/2112.12717v4

License: check the linked page

Isel Grau and Gonzalo Nápoles and Marilyn Bello and Yamisleydi Salgueiro and Agnieszka Jastrzebska

This paper proposes an algorithm called Forward Composition Propagation (FCP) to explain the predictions of feed-forward neural networks operating on structured classification problems. In the proposed FCP algorithm, each neuron is described by a composition vector indicating the role of each problem feature in that neuron. Composition vectors are initialized using a given input instance and subsequently propagated through the whole network until they reach the output layer. The sign of each composition value indicates whether the corresponding feature excites or inhibits the neuron, while the absolute value quantifies its impact. The FCP algorithm is executed on a post-hoc basis, i.e., once the learning process is completed. To illustrate the FCP algorithm, this paper develops a case study concerning bias detection in a fairness problem in which the ground truth is known. The simulation results show that the composition values closely align with the expected behavior of protected features. The source code and supplementary material for this paper are available at https://github.com/igraugar/fcp.

Translated: 2023-10-28 07:05:57 Published: 2023-10-24

# Feature Extractor Stacking for Cross-domain Few-shot Learning

http://arxiv.org/abs/2205.05831v4

License: check the linked page

Hongyu Wang, Eibe Frank, Bernhard Pfahringer, Michael Mayo, Geoffrey Holmes

Cross-domain few-shot learning (CDFSL) addresses learning problems where knowledge needs to be transferred from one or more source domains into an instance-scarce target domain with an explicitly different distribution. Recently published CDFSL methods generally construct a universal model that combines knowledge of multiple source domains into one feature extractor. This enables efficient inference but necessitates re-computation of the extractor whenever a new source domain is added. Some of these methods are also incompatible with heterogeneous source domain extractor architectures. We propose feature extractor stacking (FES), a new CDFSL method for combining information from a collection of extractors. FES can utilise heterogeneous pretrained extractors out of the box and does not maintain a universal model that needs to be re-computed when its extractor collection is updated. We present the basic FES algorithm, which is inspired by the classic stacked generalisation approach, and also introduce two variants: convolutional FES (ConFES) and regularised FES (ReFES). Given a target-domain task, these algorithms fine-tune each extractor independently, use cross-validation to extract training data for stacked generalisation from the support set, and learn a simple linear stacking classifier from this data. We evaluate our FES methods on the well-known Meta-Dataset benchmark, targeting image classification with convolutional neural networks, and show that they can achieve state-of-the-art performance.
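The cross-validation step mentioned above can be pictured with a generic stacked-generalisation sketch: each support example receives an output from a model that was fitted without it, and these out-of-fold outputs become the training inputs of the stacking classifier. The helper names, the interleaved fold assignment, and the `fit`/`predict` callables are assumptions for illustration and are not taken from the FES implementation.

```python
def kfold_splits(n_items, n_folds):
    """Yield (train_indices, held_out_indices) for each fold."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    for fold in range(n_folds):
        held_out = [i for i in range(n_items) if i % n_folds == fold]
        train = [i for i in range(n_items) if i % n_folds != fold]
        yield train, held_out


def out_of_fold_outputs(support_set, n_folds, fit, predict):
    """Give every support example the output of a model fitted without it."""
    outputs = [None] * len(support_set)
    for train, held_out in kfold_splits(len(support_set), n_folds):
        model = fit([support_set[i] for i in train])
        for i in held_out:
            outputs[i] = predict(model, support_set[i])
    return outputs
```

In a stacking setup, `fit` would fine-tune one extractor on the training folds and `predict` would return its class scores. Training the linear stacking classifier on these held-out scores, rather than on scores for examples the extractor has already seen, avoids rewarding extractors that merely overfit the support set.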

Translated: 2023-10-28 06:54:33 Published: 2023-10-24

# Identifying Peer Influence in Therapeutic Communities

http://arxiv.org/abs/2203.14223v3

License: check the linked page

Shanjukta Nath, Keith Warren, Subhadeep Paul

We investigate whether there is a peer influence or role model effect on successful graduation from Therapeutic Communities (TCs). We analyze anonymized individual-level observational data from 3 TCs that kept records of written exchanges of affirmations and corrections among residents, together with their precise entry and exit dates. The affirmations allow us to form peer networks, and the entry and exit dates allow us to define a causal effect of interest. We conceptualize the causal role model effect as the difference in the expected outcome of a resident (ego) between two cases: one of their social contacts (e.g., a peer who gave affirmations) graduates successfully before the ego's exit, or that contact does not graduate successfully before the ego's exit. Since peer influence is usually confounded with unobserved homophily in observational data, we model the network with a latent variable model to estimate homophily and include it in the outcome equation. We provide a theoretical guarantee that the bias of our peer influence estimator decreases with sample size. Our results indicate that peers' graduation has an effect on the graduation of residents. The magnitude of peer influence differs based on gender, race, and the definition of the role model effect. A counterfactual exercise quantifies the potential benefits of an intervention that assigns a buddy to "at-risk" individuals, both directly on the treated resident and indirectly on their peers through network propagation.

Translated: 2023-10-28 06:53:18 Published: 2023-10-24

# Unsupervised energy disaggregation via convolutional sparse coding

http://arxiv.org/abs/2207.09785v2

License: check the linked page

Christian Aarset (1) and Andreas Habring (1) and Martin Holler (1) and Mario Mitter (2) ((1) University of Graz, (2) Solgenium OG)

In this work, a method for unsupervised energy disaggregation in private households equipped with smart meters is proposed. The method aims to classify power consumption as active or passive, making it possible to report on the residents' activity and presence without direct interaction. This lays the foundation for applications such as non-intrusive health monitoring in private homes. The proposed method is based on minimizing a suitable energy functional with the iPALM (inertial proximal alternating linearized minimization) algorithm, and it is shown that various conditions guaranteeing convergence are satisfied. To confirm the feasibility of the proposed method, experiments on semi-synthetic test data sets and a comparison with existing supervised methods are provided.
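Proximal algorithms such as iPALM alternate gradient steps on the smooth part of the objective with proximal maps of the non-smooth part. The abstract does not state the energy functional, so the sketch below assumes, purely for illustration, that sparsity is enforced by an l1 penalty, whose proximal map is elementwise soft-thresholding. The function name is hypothetical.

```python
def soft_threshold(values, threshold):
    """Proximal map of threshold * ||x||_1, applied to each entry."""
    out = []
    for v in values:
        # Shrink the magnitude by the threshold and clip at zero.
        magnitude = max(abs(v) - threshold, 0.0)
        out.append(magnitude if v >= 0 else -magnitude)
    return out
```

Entries whose magnitude is below the threshold are set to zero, which is how a proximal step of this kind would produce sparse activation signals in a sparse coding model.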

Translated: 2023-10-28 06:46:06 Published: 2023-10-24

# Open-radiomics: A Collection of Standardized Datasets and a Technical Protocol for Reproducible Radiomics Machine Learning Pipelines

http://arxiv.org/abs/2207.14776v2

License: check the linked page

Khashayar Namdar, Matthias W. Wagner, Birgit B. Ertl-Wagner, Farzad Khalvati

Purpose: As an important branch of machine learning pipelines in medical imaging, radiomics faces two major challenges: reproducibility and accessibility. In this work, we introduce open-radiomics, a set of radiomics datasets along with a comprehensive radiomics pipeline based on our proposed technical protocol, to investigate the effects of radiomics feature extraction on the reproducibility of the results. Materials and Methods: Experiments are conducted on the BraTS 2020 open-source Magnetic Resonance Imaging (MRI) dataset, which includes 369 adult patients with brain tumors (76 low-grade glioma (LGG) and 293 high-grade glioma (HGG)). Using the PyRadiomics library for LGG vs. HGG classification, 288 radiomics datasets are formed from the combinations of 4 MRI sequences, 3 binWidths, 6 image normalization methods, and 4 tumor subregions. Random Forest classifiers were used, and for each radiomics dataset the training-validation-test (60%/20%/20%) experiment was repeated 100 times with different data splits and model random states (28,800 test results), and the Area Under the Receiver Operating Characteristic Curve (AUC) was calculated. Results: Unlike binWidth and image normalization, tumor subregion and imaging sequence significantly affected the performance of the models. The T1 contrast-enhanced sequence and the union of the necrotic and non-enhancing tumor core subregions resulted in the highest AUCs (average test AUC 0.951, 95% confidence interval (0.949, 0.952)). Although 28 settings and data splits yielded a test AUC of 1, they were irreproducible. Conclusion: Our experiments demonstrate that the sources of variability in radiomics pipelines (e.g., tumor subregion) can have a significant impact on the results, which may lead to superficially perfect performances that are irreproducible.
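The experiment counts in the abstract follow from a full factorial design, which the short sketch below reproduces. The abstract gives only the number of options per factor, so the option labels here are placeholders rather than the actual sequences, bin widths, normalization methods, or subregions.

```python
from itertools import product

# Placeholder labels: only the number of options per factor is known.
sequences = [f"sequence_{i}" for i in range(4)]
bin_widths = [f"binwidth_{i}" for i in range(3)]
normalizations = [f"normalization_{i}" for i in range(6)]
subregions = [f"subregion_{i}" for i in range(4)]

# One radiomics dataset per combination of the four factors.
configurations = list(product(sequences, bin_widths, normalizations, subregions))

# Each dataset is evaluated on 100 different splits and random states.
repeats_per_configuration = 100
total_test_results = len(configurations) * repeats_per_configuration

print(len(configurations))   # 288
print(total_test_results)    # 28800
```

This matches the reported 288 datasets (4 x 3 x 6 x 4) and 28,800 test results (288 x 100).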

Translated: 2023-10-28 06:32:04 Published: 2023-10-24

# Improving Deep Learning Models for Pediatric Low-Grade Glioma Tumors Molecular Subtype Identification Using 3D Probability Distributions of Tumor Location

http://arxiv.org/abs/2210.07287v2

License: check the linked page

Khashayar Namdar, Matthias W. Wagner, Kareem Kudus, Cynthia Hawkins, Uri Tabori, Brigit Ertl-Wagner, Farzad Khalvati

Background and Purpose: Pediatric low-grade glioma (pLGG) is the most common type of brain tumor in children, and identification of molecular markers for pLGG is crucial for successful treatment planning. Convolutional Neural Network (CNN) models for pLGG subtype identification rely on tumor segmentation. We hypothesize that tumor segmentations are suboptimal, and we therefore propose to augment the CNN models using tumor location probability in MRI data. Materials and Methods: Our REB-approved retrospective study included MRI Fluid-Attenuated Inversion Recovery (FLAIR) sequences of 143 BRAF fused and 71 BRAF V600E mutated tumors. Tumor segmentations (regions of interest (ROIs)) were provided by a pediatric neuroradiology fellow and verified by a senior pediatric neuroradiologist. In each experiment, we randomly split the data into development and test sets with an 80/20 ratio. We combined the 3D binary ROI masks for each class in the development dataset to derive the probability density functions (PDF) of tumor location, and developed three pipelines: location-based, CNN-based, and hybrid. Results: We repeated the experiment 100 times with different model initializations and data splits and calculated the Area Under the Receiver Operating Characteristic Curve (AUC). The location-based classifier achieved an AUC of 77.90, 95% confidence interval (CI) (76.76, 79.03). CNN-based classifiers achieved an AUC of 86.11, CI (84.96, 87.25), while the tumor-location-guided CNNs outperformed both with an average AUC of 88.64, CI (87.57, 89.72), a statistically significant difference (Student's t-test p-value 0.0018). Conclusion: We achieved statistically significant improvements by incorporating tumor location into the CNN models. Our results suggest that manually segmented ROIs may not be optimal.
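The abstract says that per-class binary ROI masks are combined into a tumor-location distribution but does not say how. A minimal sketch, assuming the combination is a voxelwise average of the masks and that a new tumor is scored by the average map value inside its own mask, could look as follows. Masks are flattened to 1-D lists for brevity, and both function names are hypothetical.

```python
def location_probability(masks):
    """Voxelwise mean of equally sized binary masks (flattened to lists)."""
    n = len(masks)
    return [sum(voxels) / n for voxels in zip(*masks)]


def location_score(mask, probability_map):
    """Average map value over the voxels that the mask marks as tumor."""
    inside = [p for m, p in zip(mask, probability_map) if m]
    return sum(inside) / len(inside) if inside else 0.0
```

Building one map per class from the development data and comparing the two scores for a test tumor gives a purely location-based signal. The maps produced here are voxelwise frequencies and are not normalized to sum to one, so they are not probability densities in the strict sense.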

Translated: 2023-10-28 06:25:49 Published: 2023-10-24

# Label-free segmentation from cardiac ultrasound using self-supervised learning

http://arxiv.org/abs/2210.04979v2

License: check the linked page

Danielle L. Ferreira, Zaynaf Salaymang, Rima Arnaout

Segmentation and measurement of cardiac chambers is critical in cardiac ultrasound but is laborious and poorly reproducible. Neural networks can assist, but supervised approaches require the same laborious manual annotations. We built a pipeline for self-supervised (no manual labels) segmentation combining computer vision, clinical domain knowledge, and deep learning. We trained on 450 echocardiograms (93,000 images) and tested on 8,393 echocardiograms (4,476,266 images; mean age 61 years, 51% female), using the resulting segmentations to calculate biometrics. We also tested against external images from an additional 10,030 patients with available manual tracings of the left ventricle. The r2 values between clinically measured and pipeline-predicted measurements were similar to reported inter-clinician variation and comparable to supervised learning across several different measurements (r2 0.56-0.84). Average accuracy for detecting abnormal chamber size and function was 0.85 (range 0.71-0.97) compared to clinical measurements. A subset of test echocardiograms (n=553) had corresponding cardiac MRIs, where MRI is the gold standard. Correlation between pipeline and MRI measurements was similar to that between clinical echocardiogram and MRI. Finally, the pipeline accurately segments the left ventricle with an average Dice score of 0.89 (95% CI [0.89]) in the external, manually labeled dataset. Our results demonstrate a manual-label-free, clinically valid, and highly scalable method for segmentation from ultrasound, a noisy but globally important imaging modality.
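The Dice score quoted above measures the overlap between a predicted and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. A minimal sketch for flattened binary masks follows. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice_score(mask_a, mask_b):
    """Dice coefficient 2|A and B| / (|A| + |B|) for flattened binary masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly; this convention avoids dividing by zero.
    return 2 * intersection / size_sum if size_sum else 1.0
```

A score of 1 means the two masks coincide and a score of 0 means they do not overlap at all.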

Translated: 2023-10-28 06:25:13 Published: 2023-10-24

# Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories

http://arxiv.org/abs/2209.09977v3

License: check the linked page

Samuel E. Otto, Sebastian Peitz, Clarence W. Rowley

Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated, and there is currently no universal way to choose them for systems that are not measure preserving. Second, if we do not observe the full state, then it becomes necessary to account for the dependence of the output time series on the sequence of supplied inputs when constructing observables to approximate Koopman operators. To address these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples: recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
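A finite-dimensional bilinear approximation of the kind mentioned above has the form dz/dt = A z + sum_i u_i B_i z, where z collects the observables and u the inputs. The sketch below evaluates this right-hand side and takes one forward-Euler step. The matrices, the choice of Euler integration, and all function names are illustrative assumptions; the learning procedure (EM with a Kalman smoother) is not shown.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def bilinear_rhs(z, u, A, Bs):
    """Right-hand side dz/dt = A z + sum_i u[i] * (Bs[i] z)."""
    rate = mat_vec(A, z)
    for u_i, B_i in zip(u, Bs):
        rate = [r + u_i * b for r, b in zip(rate, mat_vec(B_i, z))]
    return rate


def euler_step(z, u, A, Bs, dt):
    """Advance the observables by one forward-Euler step of size dt."""
    return [z_j + dt * r_j for z_j, r_j in zip(z, bilinear_rhs(z, u, A, Bs))]
```

The model is linear in z for a fixed input and linear in u for a fixed z, but not jointly linear, which is what "bilinear" refers to. With u set to zero it reduces to the unforced linear model dz/dt = A z.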

Translated: 2023-10-28 06:22:47 Published: 2023-10-24

# A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation

http://arxiv.org/abs/2210.12089v2

License: check the linked page

Mario Alfonso Prado-Romero and Bardh Prenkaj and Giovanni Stilo and Fosca Giannotti

Graph Neural Networks (GNNs) perform well in community detection and molecule classification. Counterfactual Explanations (CE) provide counter-examples to overcome the transparency limitations of black-box models. Given the growing attention to graph learning, we focus on the concepts of CE for GNNs. We analysed the state of the art (SoA) to provide a taxonomy, a uniform notation, and the benchmarking datasets and evaluation metrics. We discuss fourteen methods, their evaluation protocols, twenty-two datasets, and nineteen metrics. We integrated the majority of the methods into the GRETEL library and conducted an empirical evaluation to understand their strengths and pitfalls. We highlight open challenges and future work.

Translated: 2023-10-28 06:12:11 Published: 2023-10-24

# Bagging in overparameterized learning: Risk characterization and risk monotonization

http://arxiv.org/abs/2210.11445v3

License: check the linked page

Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla

Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy for analyzing the prediction risk of bagged predictors under squared error loss using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility in eliminating the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
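As a toy illustration of a bagged ridge predictor, the sketch below fits a one-feature ridge regression without intercept on several random subsamples and averages the fitted slopes. Drawing subsamples without replacement is an assumption made here, in keeping with the abstract's reference to simple random sampling. The function names are hypothetical, and the cross-validation step for choosing the subsample size is not shown.

```python
import random


def ridge_slope(xs, ys, penalty):
    """Ridge estimate of w in y = w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def bagged_ridge_slope(xs, ys, penalty, subsample_size, n_bags, seed=0):
    """Average the ridge slope over subsamples drawn without replacement."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_bags):
        idx = rng.sample(range(len(xs)), subsample_size)
        slopes.append(ridge_slope([xs[i] for i in idx], [ys[i] for i in idx], penalty))
    return sum(slopes) / n_bags
```

Setting `penalty` to 0 gives the ridgeless (least-squares) case for this one-dimensional example. The subsample size is the tuning parameter that the abstract proposes to choose by cross-validation.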

翻訳日:2023-10-28 06:12:03 公開日:2023-10-24

# DeepGOPlus推論の数値安定性

Numerical Stability of DeepGOPlus Inference ( http://arxiv.org/abs/2212.06361v3 )

ライセンス: Link先を確認

Inés Gonzalez Pepe, Yohan Chatelain, Gregory Kiar, Tristan Glatard

(参考訳) 畳み込みニューラルネットワーク(CNN)は現在、利用可能な最も広く使用されているディープニューラルネットワーク(DNN)アーキテクチャの1つであり、多くの問題に対して最先端のパフォーマンスを実現している。元々はコンピュータビジョンのタスクに応用され、CNNは画像以外の空間的関係のあるデータでもうまく機能し、様々な分野に適用されてきた。しかし、近年の研究では、DNNにおける数値安定性の課題が強調されている。これらの課題は、パフォーマンスと信頼性を損なう可能性がある。本稿では,タンパク質機能を予測するCNNであるDeepGOPlusについて検討する。 deepgoplusは最先端のパフォーマンスを達成し,プロテオミクスに出現するタンパク質配列をうまく活用し,アノテートすることができる。浮動小数点データの摂動による不確かさを定量化し,モデル推論段階の数値的安定性を判定する。さらに,DeepGOPlus推論に精度の低い浮動小数点形式を用いることで,メモリ消費とレイテンシを低減する機会を探る。これは、浮動小数点演算エラーを実験的に定量化するMonte Carlo Arithmeticと、カスタマイズ可能な浮動小数点演算精度フォーマットで結果をエミュレートするVPRECを使用してDeepGOPlusの実行を計測することで実現されている。 deepgoplusモデルの主要な成果物であり、異なる環境にまたがって広く適用できるため、推論の段階に焦点が当てられる。以上の結果から,DeepGOPlus CNNは数値的に非常に安定しているが,より精度の低い浮動小数点型でしか実装できないことがわかった。事前学習したdeepgoplusモデルから得られた予測は非常に信頼性が高く,既存の浮動小数点形式を効率的に利用することができる。

Convolutional neural networks (CNNs) are currently among the most widely used deep neural network (DNN) architectures available and achieve state-of-the-art performance for many problems. Originally applied to computer vision tasks, CNNs work well with any data that has a spatial relationship, not only images, and have been applied to different fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection. These challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully take advantage of and annotate the abounding protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty due to perturbations of the underlying floating-point data. In addition, we explore the opportunity to use reduced-precision floating-point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution using Monte Carlo Arithmetic, a technique that experimentally quantifies floating-point operation errors, and VPREC, a tool that emulates results with customizable floating-point precision formats. Focus is placed on the inference stage as it is the primary deliverable of the DeepGOPlus model, widely applicable across different environments. All in all, our results show that although the DeepGOPlus CNN is very stable numerically, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are very reliable numerically, and use existing floating-point formats efficiently.
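The idea of quantifying uncertainty from perturbed floating-point data can be illustrated with a small sketch. It is a simplification under stated assumptions: it perturbs only the inputs (Monte Carlo Arithmetic tools perturb individual operations), `toy_inference` is a hypothetical stand-in for a model forward pass, and the significant-bits estimate `-log2(|sigma / mu|)` is a commonly used convention rather than something stated in the abstract.

```python
import numpy as np

def perturb(x, virtual_precision, rng):
    """Add uniform relative noise bounded by 2**(-virtual_precision) to each element."""
    noise = rng.uniform(-0.5, 0.5, size=np.shape(x))
    return x * (1.0 + noise * 2.0 ** (1 - virtual_precision))

def significant_bits(samples):
    """Estimate significant bits as -log2(|sigma / mu|) across repeated runs."""
    mu = np.mean(samples, axis=0)
    sigma = np.std(samples, axis=0, ddof=1)
    with np.errstate(divide="ignore"):
        return -np.log2(np.abs(sigma / mu))

def toy_inference(x, w):
    """Hypothetical stand-in for a forward pass: a sigmoid of a dot product."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

rng = np.random.default_rng(0)
x = rng.normal(size=64)
w = rng.normal(size=64)
# Repeat the same inference 30 times with noise at roughly single precision.
runs = np.array([toy_inference(perturb(x, 24, rng), perturb(w, 24, rng))
                 for _ in range(30)])
bits = significant_bits(runs)
```

Lowering `virtual_precision` mimics a reduced-precision format; a sharp drop in `bits` would suggest that the computation cannot tolerate that format.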

翻訳日:2023-10-28 06:05:43 公開日:2023-10-24

# ガウス状態の光子数モーメントと累積

Photon-number moments and cumulants of Gaussian states ( http://arxiv.org/abs/2212.06067v3 )

ライセンス: Link先を確認

Yanic Cardin, Nicolás Quesada

(参考訳) 光子数に基づく場合,ガウス状態のモーメントと累積に対する閉形式表現を開発する。ガウス状態の光子数モーメントをループハフニアンで表現し、グラフの隣接を表す$(0,1)$-行列に適用すると、その完全マッチングの数を数える。同様に、$(0,1)$-行列に適用されたとき、そのグラフのハミルトニアンサイクルの数をカウントする新しく導入された行列関数であるモントリオールアーの言葉で光子数累積を表現する。これらのグラフ理論接続に基づいて、光子数モーメントと累積の計算が$\#P$-hardであることを示す。さらに、ハフニアンのよく知られた結果と一致する、モントリオールアー(ひいては累積)を計算する指数時間アルゴリズムを提供する。次に、一様損失の干渉計が、ゼロ変位を持つ同一の単一モードガウス状態を持つ全ての入力で供給されると、1次を除く奇数次累積はすべてゼロであることが示される。最後に,$K$個の同一状態が$\ell$モード干渉計に供給されるガウスボソンサンプリング装置において,累積の分布を4次まで異なる入力状態に対して研究するために導出した式を用いる。本研究では, 入力状態のタイプ, 圧縮状態, 損失値, スクラッシュ状態, 熱状態, および非真空入力数の関数として, 累積物の依存性を解析した。熱状態は他の古典的状態(例えばスカッシュ状態)よりも、損失状態や無損失状態の光子数累積状態の模倣においてずっと悪い結果をもたらすことが判明した。

We develop closed-form expressions for the moments and cumulants of Gaussian states when measured in the photon-number basis. We express the photon-number moments of a Gaussian state in terms of the loop Hafnian, a function that, when applied to a $(0,1)$-matrix representing the adjacency of a graph, counts the number of its perfect matchings. Similarly, we express the photon-number cumulants in terms of the Montrealer, a newly introduced matrix function that, when applied to a $(0,1)$-matrix, counts the number of Hamiltonian cycles of that graph. Based on these graph-theoretic connections, we show that the calculation of photon-number moments and cumulants is $\#P$-hard. Moreover, we provide an exponential-time algorithm to calculate Montrealers (and thus cumulants), matching well-known results for Hafnians. We then demonstrate that when a uniformly lossy interferometer is fed in every input with identical single-mode Gaussian states with zero displacement, all the odd-order cumulants but the first one are zero. Finally, we employ the expressions we derive to study the distribution of cumulants up to the fourth order for different input states in a Gaussian boson sampling setup where $K$ identical states are fed into an $\ell$-mode interferometer. We analyze the dependence of the cumulants on the type of input state (squeezed, lossy squeezed, squashed, or thermal) and on the number of non-vacuum inputs. We find that thermal states perform much worse than other classical states, such as squashed states, at mimicking the photon-number cumulants of lossy or lossless squeezed states.
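The two graph-counting facts quoted above can be checked on small graphs. The sketch below is illustrative only: `hafnian` is the plain (loop-free) Hafnian computed by a naive recursion, and `hamiltonian_cycles` is a brute-force counter, not the paper's Montrealer formula or its exponential-time algorithm. Both function names are chosen for this example.

```python
from itertools import permutations

def hafnian(A):
    """Hafnian of a symmetric matrix (diagonal ignored), by expanding along row 0."""
    n = len(A)
    if n == 0:
        return 1
    if n % 2:
        return 0
    rest = list(range(1, n))
    total = 0
    for j in rest:
        if A[0][j] == 0:
            continue
        # Match vertex 0 with vertex j, then recurse on the remaining vertices.
        keep = [k for k in rest if k != j]
        sub = [[A[r][c] for c in keep] for r in keep]
        total += A[0][j] * hafnian(sub)
    return total

def hamiltonian_cycles(A):
    """Count undirected Hamiltonian cycles of a simple graph by brute force."""
    n = len(A)
    if n < 3:
        return 0
    count = 0
    for perm in permutations(range(1, n)):
        if perm[0] > perm[-1]:
            continue  # skip the reversed copy of each cycle
        path = (0,) + perm + (0,)
        if all(A[path[i]][path[i + 1]] for i in range(n)):
            count += 1
    return count

# Complete graph on 4 vertices and the 4-cycle, as (0,1) adjacency matrices.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```

For `K4` both counts equal 3 (three perfect matchings, three Hamiltonian cycles); for `C4` the Hafnian is 2 and there is a single Hamiltonian cycle.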

翻訳日:2023-10-28 06:05:09 公開日:2023-10-24

# JASMINE:Few-Shot LearningのためのアラビアGPTモデル

JASMINE: Arabic GPT Models for Few-Shot Learning ( http://arxiv.org/abs/2212.10755v2 )

ライセンス: Link先を確認

El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md Tawkat Islam Khondaker

(参考訳) 生成前訓練(GPT)に関する学術研究は依然として英語中心であり、自己回帰モデル全体に対する我々の理解に深刻なギャップを残している。例えば、これらのモデルの可能性や、多様な言語的・文化的環境における社会的影響についてはほとんど知識がない。我々は、JASMINEを導入することで、人口4億人を超える幅広い言語と方言のコレクションであるアラビア語のこの問題を緩和する。 JASMINEは、大きく多様なデータセット(約235GBのテキスト)で事前訓練された、3億から67億パラメータの規模の強力なアラビア語の自己回帰トランスフォーマー言語モデルのスイートである。また,アラビア語自己回帰モデルの自動評価および人間評価のための包括的なベンチマークを,社会的バイアス,有害性,毒性の可能性を網羅して,慎重に設計し,公開する。新たなベンチマークを用いてJASMINEを広範に評価し,本質的な性能と,多種多様なNLPタスクにおける数ショット学習の両方で強力な性能を示す。我々は、興味のある研究者にモデルと評価ベンチマークを、それらを実験するためのコードとともに、責任を持ってリリースすることを目標としています。

Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models. For example, we have little knowledge about the potential of these models and their societal impacts in diverse linguistic and cultural settings. We alleviate this issue for Arabic, a wide collection of languages and dialectal varieties with a population of more than 400 million, by introducing JASMINE. JASMINE is a suite of powerful Arabic autoregressive Transformer language models ranging in size from 300 million to 6.7 billion parameters, pretrained on a large and diverse dataset (~235 GB of text). We also carefully design and release a comprehensive benchmark for both automated and human evaluation of Arabic autoregressive models, with coverage of potential social biases, harms, and toxicity. Using our novel benchmark, we evaluate JASMINE extensively, showing powerful performance intrinsically as well as in few-shot learning on a wide range of NLP tasks. We aim to responsibly release our models and evaluation benchmark to interested researchers, along with code for experimenting with them.

翻訳日:2023-10-28 05:52:39 公開日:2023-10-24

# 抽出NLP課題生成モデルにおけるトークン化整合性

Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks ( http://arxiv.org/abs/2212.09912v2 )

ライセンス: Link先を確認

Kaiser Sun, Peng Qi, Yuhao Zhang, Lan Liu, William Yang Wang, Zhiheng Huang

(参考訳) 生成モデルは、入力の一部を抽出して所望の出力を形成する抽出タスクを解くために広く応用され、大きな成功を収めた。例えば、抽出質問応答(QA)では、生成モデルは常に最先端の結果をもたらす。本研究では,これらのモデルのトレーニングにおいて一般的に無視されるトークン化の不整合の問題を特定する。この問題は、インプットとアウトプットがトークン化されていないことでこれらのタスクの抽出性が損なわれ、結果としてパフォーマンスの低下と幻覚が引き起こされる。本稿では,この問題に対する簡易かつ効果的な解決法を提案し,抽出QAのケーススタディを行う。我々は、一貫したトークン化により、BARTモデルがSQuAD上でトレーニングされ、8つのQAデータセットで評価された場合、ドメイン内データセットとドメイン外データセットの両方で、注目すべき平均+1.7 F2ゲインを達成できることを示した。さらに、モデルはより速く収束し、文脈外回答を生じにくくなります。これらの結果から,抽出タスクの解決においてトークン化をどのように行うべきか,トレーニング中に一貫したトークン化を適用することを推奨したい。

Generative models have been widely applied to solve extractive tasks, where parts of the input are extracted to form the desired output, and have achieved significant success. For example, in extractive question answering (QA), generative models have consistently yielded state-of-the-art results. In this work, we identify the issue of tokenization inconsistency that is commonly neglected in training these models. This issue damages the extractive nature of these tasks when the input and output are tokenized inconsistently by the tokenizer, and thus leads to a performance drop as well as hallucination. We propose a simple yet effective fix to this issue and conduct a case study on extractive QA. We show that, with consistent tokenization, the model performs better on both in-domain and out-of-domain datasets, with a notable average of +1.7 F2 gain when a BART model is trained on SQuAD and evaluated on 8 QA datasets. Further, the model converges faster and becomes less likely to generate out-of-context answers. With these findings, we would like to call for more attention to how tokenization should be done when solving extractive tasks and recommend applying consistent tokenization during training.
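The inconsistency described above can be reproduced with a toy tokenizer. The sketch below assumes a whitespace tokenizer that marks every word preceded by a space with a leading `Ġ`, loosely mimicking byte-level BPE tokenizers; `toy_tokenize`, `find_sublist` and `consistent_target` are hypothetical helpers written for this example, and the fix shown (taking the answer tokens from the tokenized context) is one plausible reading of "consistent tokenization", not necessarily the authors' exact procedure.

```python
def toy_tokenize(text):
    """Toy tokenizer: a word preceded by a space gets a leading 'Ġ' marker."""
    tokens = []
    for i, word in enumerate(text.split(" ")):
        if word == "":
            continue
        tokens.append(word if i == 0 else "Ġ" + word)
    return tokens

def find_sublist(haystack, needle):
    """Return the first index where needle occurs in haystack, or -1."""
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return -1

def consistent_target(context, answer):
    """Tokenize the answer as it appears inside the context, not in isolation."""
    char_start = context.find(answer)
    if char_start < 0:
        raise ValueError("answer is not a span of the context")
    prefix_len = len(toy_tokenize(context[:char_start]))
    answer_len = len(toy_tokenize(answer))
    return toy_tokenize(context)[prefix_len:prefix_len + answer_len]

context = "The ship sank in 1912 near Newfoundland"
answer = "1912"
context_tokens = toy_tokenize(context)
naive = toy_tokenize(answer)                     # ['1912']
consistent = consistent_target(context, answer)  # ['Ġ1912']
```

With the naive target, the token sequence the model is trained to emit never occurs in its tokenized input, so the task is no longer purely extractive; the consistent target is a literal sub-sequence of `context_tokens`.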

翻訳日:2023-10-28 05:51:56 公開日:2023-10-24

# 契約書で何を読むべきか? 法的義務、権利及び禁止の当事者固有の要約

What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions ( http://arxiv.org/abs/2212.09825v2 )

ライセンス: Link先を確認

Abhilasha Sancheti, Aparna Garimella, Balaji Vasan Srinivasan, Rachel Rudinger

(参考訳) 法的契約における重要な義務、権利、および禁止の見直しと理解は、その長さとドメイン固有性のために退屈な作業となり得る。さらに、契約当事者ごとに重要な権利と義務が異なります。本研究では,権利と義務の理解の迅速化と改善を図るために,法定契約の当事者別抽出要約タスクを提案する。そこで,本研究では,法的専門家が注釈を付した,当事者固有の対関係の重要度比較からなるデータセットを収集し,リース契約から抽出された義務,権利,禁止を含む約293k文対をカバーする。このデータセットを用いて,ペアワイズ重要ランカを訓練し,パーティ固有の契約要約を生成するパイプラインベース抽出要約システムを提案する。自動評価法と人間評価法の両方を用いて,システムと各種ベースラインの比較を行い,要約中にドメイン固有の重要概念を取り入れる必要性を確立する。

Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved comprehension of rights and duties. To facilitate this, we curate a dataset comprising party-specific pairwise importance comparisons annotated by legal experts, covering ~293K sentence pairs that include obligations, entitlements, and prohibitions extracted from lease agreements. Using this dataset, we train a pairwise importance ranker and propose a pipeline-based extractive summarization system that generates a party-specific contract summary. We establish the need for incorporating a domain-specific notion of importance during summarization by comparing our system against various baselines using both automatic and human evaluation methods.

翻訳日:2023-10-28 05:51:38 公開日:2023-10-24

# CP-BCS:制御フローグラフと擬似コードによるバイナリコードの要約

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code ( http://arxiv.org/abs/2310.16853v1 )

ライセンス: Link先を確認

Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

(参考訳) 低レベルの言語(アセンブリコード)の実行動作とセマンティクスを人間可読な自然言語に変換することを含むため、バイナリの関数サマリーの自動生成は極めて価値のある作業である。しかしながら、アセンブリコードの理解に関する現在の作業のほとんどは、関数名の生成に向けられている。このギャップを埋めるため、バイナリ関数、特に削除されたバイナリ(シンボルテーブルやデバッグ情報がない)の完全な要約を生成することに重点を置いています。アセンブリコードのセマンティクスを十分に活用するために,cp-bcsと呼ばれる制御フローグラフと擬似コードガイドバイナリコード要約フレームワークを提案する。 CP-BCSは双方向の命令レベル制御フローグラフと擬似コードを利用して、専門家の知識を取り入れ、包括的なバイナリ関数の実行動作と論理意味学を学ぶ。 CP-BCSを3種類のコンピュータアーキテクチャ(X86, X64, ARM)に対して3種類のバイナリ最適化レベル(O1, O2, O3)で評価する。その結果,cp-bcsが優れ,リバースエンジニアリングの効率が著しく向上した。

Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that make them still confusing. To bridge this gap, we focus on generating complete summaries for binary functions, especially for stripped binaries (which, as in real-world settings, have no symbol table or debug information). To fully exploit the semantics of assembly code, we present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics. We evaluate CP-BCS on 3 different binary optimization levels (O1, O2, and O3) for 3 different computer architectures (X86, X64, and ARM). The evaluation results demonstrate that CP-BCS is superior and significantly improves the efficiency of reverse engineering.

翻訳日:2023-10-28 00:17:36 公開日:2023-10-24

# 医療画像を用いた深層学習モデルによる新型コロナウイルス患者の分類

Deep Learning Models for Classification of COVID-19 Cases by Medical Images ( http://arxiv.org/abs/2310.16851v1 )

ライセンス: Link先を確認

Amir Ali

(参考訳) 近年,胸部ct画像を用いた新型コロナウイルス感染の検出が注目されている。しかし、医療画像から患者を分類することは、特にその両側の変化を特定する上で非常に困難である。この課題に対処するために,本研究では,感染患者の正確な分類に深層学習モデルの力を利用する。本研究では,deepnet201,googlenet,alexnetを含む深層伝達学習に基づく分類モデルと,注意深く選択された教師付き学習モデルの比較分析を行った。また,X線や心電図などの医用画像の識別と識別を含むCovid-19の分類も検討した。この包括的なアプローチにより、我々のモデルは幅広い医療画像タイプを扱えるようになり、Covid-19の特徴的なパターンを効果的に特定できる。高度な深層学習技術を用いて、綿密な研究を行い、Covid-19診断の精度とスピードを高めるために大きな努力をしてきた。これらのモデルの有効性と、covid-19対策のグローバルな取り組みに多大な貢献ができる可能性を実証した。

In recent times, the use of chest Computed Tomography (CT) images for detecting coronavirus infections has gained significant attention, owing to their ability to reveal bilateral changes in affected individuals. However, classifying patients from medical images presents a formidable challenge, particularly in identifying such bilateral changes. To tackle this challenge, our study harnesses the power of deep learning models for the precise classification of infected patients. Our research involves a comparative analysis of deep transfer learning-based classification models, including DenseNet201, GoogleNet, and AlexNet, against carefully chosen supervised learning models. Additionally, our work encompasses Covid-19 classification, which involves the identification and differentiation of medical images, such as X-rays and electrocardiograms, that exhibit telltale signs of Covid-19 infection. This comprehensive approach ensures that our models can handle a wide range of medical image types and effectively identify characteristic patterns indicative of Covid-19. By conducting meticulous research and employing advanced deep learning techniques, we have made significant strides in enhancing the accuracy and speed of Covid-19 diagnosis. Our results demonstrate the effectiveness of these models and their potential to make substantial contributions to the global effort to combat COVID-19.

翻訳日:2023-10-28 00:17:13 公開日:2023-10-24

# 新視点音響合成

Novel-View Acoustic Synthesis ( http://arxiv.org/abs/2301.08730v3 )

ライセンス: Link先を確認

Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

(参考訳) 我々は,nvas(new-view acoustic synthesis)タスクについて紹介する。音源の視点で観測された視覚と音を考えると,対象とする視点からそのシーンの音を合成できるのか? 入力された音声・視覚的手がかりを分析し,空間内の任意の点の音を合成することを学ぶ視覚誘導音響合成(ViGAS)ネットワークを提案する。このタスクをベンチマークするために、我々は2つの大規模マルチビューオーディオ視覚データセットを収集した。提案手法は,空間的手がかりの推論に成功し,両データセットに忠実な音声を合成することを示す。我々の知る限り、この研究は、AR/VRからアート、デザインに至るまで、エキサイティングな可能性のある、新しい視点の音響合成タスクを解決するための、最初の定式化、データセット、アプローチを表している。この研究に縛られずに、我々は、新しいビュー合成の未来は、ビデオからのマルチモーダル学習にあると信じている。

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benchmark this task, we collect two first-of-their-kind large-scale multi-view audio-visual datasets, one synthetic and one real. We show that our model successfully reasons about the spatial cues and synthesizes faithful audio on both datasets. To our knowledge, this work represents the very first formulation, dataset, and approach to solve the novel-view acoustic synthesis task, which has exciting potential applications ranging from AR/VR to art and design. Unlocked by this work, we believe that the future of novel-view synthesis is in multi-modal learning from videos.

翻訳日:2023-10-27 18:18:12 公開日:2023-10-24

# 未知力学系のロバスト進化演算子学習のための臨界サンプリング

Critical Sampling for Robust Evolution Operator Learning of Unknown Dynamical Systems ( http://arxiv.org/abs/2304.07485v3 )

ライセンス: Link先を確認

Ce Zhang, Kailiang Wu, Zhihai He

(参考訳) 未知の力学系を考えると、その統治法則の効果的な学習と将来の進化の正確な予測に必要なサンプルの最小数と、これらの臨界試料をどうやって選択するか。そこで本研究では,設計アプローチに基づくこの問題について検討する。少数の初期サンプルから始めて、システム進化のより正確な学習を実現するために、臨界サンプルを適応的に発見する。ここでの課題の1つは、地平系状態が未知であるため、ネットワークモデリングエラーを知らないことですが、これはクリティカルサンプリングに必要です。この課題に対処するために,前向きと後向きの進化ネットワークをそれぞれ前向きと後向きの時間方向の時間的進化の挙動を学習する多段階の相互予測ネットワークを提案する。非常に興味深いことに、所望のネットワークモデリング誤差は、現在のシステム状態から直接計算できる多段階相互予測誤差と高い相関関係にあることがわかった。これにより、動的システムに対する高いネットワークモデリング誤差を持つ領域から臨界サンプルを動的に選択できる。さらに、空間力学モデリングを時間的進化予測に組み込んだ共同時空間進化ネットワークを導入し、システム進化演算子を少数のサンプルで頑健に学習する。提案手法は,未知力学系の効果的な学習に必要なサンプル数を劇的に削減し,未知力学系の進化挙動を正確に予測できることが実証された。

Given an unknown dynamical system, what is the minimum number of samples needed for effective learning of its governing laws and accurate prediction of its future evolution behavior, and how to select these critical samples? In this work, we propose to explore this problem based on a design approach. Starting from a small initial set of samples, we adaptively discover critical samples to achieve increasingly accurate learning of the system evolution. One central challenge here is that we do not know the network modeling error since the ground-truth system state is unknown, which is however needed for critical sampling. To address this challenge, we introduce a multi-step reciprocal prediction network where forward and backward evolution networks are designed to learn the temporal evolution behavior in the forward and backward time directions, respectively. Very interestingly, we find that the desired network modeling error is highly correlated with the multi-step reciprocal prediction error, which can be directly computed from the current system state. This allows us to perform a dynamic selection of critical samples from regions with high network modeling errors for dynamical systems. Additionally, a joint spatial-temporal evolution network is introduced which incorporates spatial dynamics modeling into the temporal evolution prediction for robust learning of the system evolution operator with few samples. Our extensive experimental results demonstrate that our proposed method is able to dramatically reduce the number of samples needed for effective learning and accurate prediction of evolution behaviors of unknown dynamical systems by up to hundreds of times.
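The key point above, that a reciprocal prediction error is computable from the current state alone and can drive sample selection, can be sketched as follows. Everything here is an assumption made for illustration: the "learned" forward and backward models are hand-written functions of a 2-D rotation (with the forward model deliberately wrong in one region), and `reciprocal_error` and `select_critical` are hypothetical names, not the authors' networks.

```python
import numpy as np

def reciprocal_error(forward, backward, x0, steps):
    """Roll x0 forward `steps` times, then backward `steps` times; return the mismatch with x0."""
    x = x0
    for _ in range(steps):
        x = forward(x)
    for _ in range(steps):
        x = backward(x)
    return np.linalg.norm(x - x0, axis=-1)

def select_critical(forward, backward, candidates, steps, n_select):
    """Return the candidates with the largest reciprocal prediction error."""
    err = reciprocal_error(forward, backward, candidates, steps)
    order = np.argsort(err)[::-1]
    return candidates[order[:n_select]], err[order[:n_select]]

# Hypothetical learned models of the rotation x -> R x. The forward model is
# deliberately wrong wherever the first coordinate exceeds 1.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
forward = lambda x: x @ R.T + 0.05 * np.maximum(x[..., :1] - 1.0, 0.0)
backward = lambda x: x @ R  # exact inverse rotation

rng = np.random.default_rng(0)
candidates = rng.uniform(-2.0, 2.0, size=(500, 2))
picked, picked_err = select_critical(forward, backward, candidates, steps=3, n_select=10)
```

No ground-truth trajectory is used anywhere: the error is large only where the forward and backward models disagree, which is where new samples would be requested.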

翻訳日:2023-10-26 21:25:47 公開日:2023-10-24

# アウトソース機械学習タスクの低コスト結果検証のための生成フレームワーク

A Generative Framework for Low-Cost Result Validation of Outsourced Machine Learning Tasks ( http://arxiv.org/abs/2304.00083v2 )

ライセンス: Link先を確認

Abhinav Kumar, Miguel A. Guirao Aguilera, Reza Tourani, Satyajayant Misra

(参考訳) 機械学習(ML)の人気が高まり、さまざまなセンシティブなドメインにデプロイされるようになり、MLのセキュリティとプライバシを重視した大きな研究がもたらされた。しかしながら、自動運転など一部のアプリケーションでは、アウトソースされたMLワークロードの整合性検証がより重要になっている。マルチパーティ計算や証明ベースシステムといった既存のソリューションは、計算オーバーヘッドがかなり大きいため、リアルタイムアプリケーションには適さない。我々は、アウトソースされたMLワークロードのリアルタイム検証のための新しいフレームワークであるFidesを提案する。 Fidesは、信頼された実行環境内で実行中に対応するサービスモデルを検証するための、空間を動的に蒸留し微調整する、新しい、効率的な蒸留技術である、Greedy Distillation Transfer Learningを特徴としている。 fideは、統計分析と分岐測定を使用して、サービスモデルが攻撃されている場合に高い確率で識別するクライアント側の攻撃検出モデルを備えている。 Fidesはまた、攻撃が特定されるたびに元のクラスを予測する再分類機能を提供する。攻撃検出と再分類モデルの訓練のための生成的逆ネットワークフレームワークを考案した。評価の結果,fideは攻撃検出で最大98%,再分類で94%の精度を達成した。

The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as autonomous driving, integrity verification of the outsourced ML workload is more critical--a facet that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time validation of outsourced ML workloads. Fides features a novel and efficient distillation technique--Greedy Distillation Transfer Learning--that dynamically distills and fine-tunes a space and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with a high likelihood, if the service model is under attack. Fides also offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification.
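The divergence measurement mentioned above can be illustrated by comparing the class probabilities returned by the service model with those of a local verification model. This is a sketch under assumptions: the abstract describes a trained attack detection model, whereas the example uses a fixed KL-divergence threshold, and the function names, the threshold value and the probability vectors are all invented for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for rows of class-probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def flag_suspicious(service_probs, verifier_probs, threshold):
    """Flag inputs whose service output diverges from the verifier's output."""
    return kl_divergence(service_probs, verifier_probs) > threshold

service = np.array([[0.90, 0.05, 0.05],    # agrees with the verifier
                    [0.05, 0.90, 0.05]])   # disagrees: possibly tampered
verifier = np.array([[0.85, 0.10, 0.05],
                     [0.80, 0.15, 0.05]])
flags = flag_suspicious(service, verifier, threshold=0.5)
```

Here only the second input is flagged. For a flagged input, a re-classification step could fall back on the verifier's own prediction, for example `verifier.argmax(axis=-1)`.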

翻訳日:2023-10-26 21:24:03 公開日:2023-10-24

# バンディットフィードバックによる実効予測:再パラメータ化による学習

Performative Prediction with Bandit Feedback: Learning through Reparameterization ( http://arxiv.org/abs/2305.01094v3 )

ライセンス: Link先を確認

Yatong Chen, Wei Tang, Chien-Ju Ho, Yang Liu

(参考訳) 実効的予測は、 Perdomo et al. によって導入された、モデルの展開に応じてデータ分布自体が変化する社会予測を研究するためのフレームワークである。この分野での既存の作業は通常、実行リスクがデプロイされたモデル上で凸である、モデルからデータ分散へのマッピングが事前にモデルデザイナに知られており、実行リスクの第一次情報が利用可能である、という3つの前提に基づいている。本稿では,これらの仮定を必要としない実効予測問題の研究を開始する。具体的には、実行予測目標を誘導データ分布の関数として再パラメータ化する「再パラメータ化」フレームワークを開発する。また,第1レベルが分布パラメータ空間上で反復最適化を行い,第2レベルが各イテレーションで特定の目標分布パラメータを誘導するモデルを学習する2レベルゼロ階最適化手法を開発した。軽度条件下では、この再パラメータ化により、非凸目標を凸目標に変換し、証明可能な後悔保証を達成することができる。特に, 実演サンプルの総数において部分線形であり, モデルパラメータの次元における多項式のみである後悔境界を与える。アプリケーション側では、youtubeやtiktokのような大規模なオンラインレコメンデーションシステムでは、レコメンデーション更新頻度が高く、将来の好みを変える可能性がある。

Performative prediction, as introduced by Perdomo et al., is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model. Existing work in this field usually hinges on three assumptions that are easily violated in practice: that the performative risk is convex over the deployed model, that the mapping from the model to the data distribution is known to the model designer in advance, and that first-order information of the performative risk is available. In this paper, we initiate the study of performative prediction problems that do not require these assumptions. Specifically, we develop a reparameterization framework that reparameterizes the performative prediction objective as a function of the induced data distribution. We also develop a two-level zeroth-order optimization procedure, where the first level performs iterative optimization on the distribution parameter space, and the second level learns the model that induces a particular target distribution parameter at each iteration. Under mild conditions, this reparameterization allows us to transform the non-convex objective into a convex one and achieve provable regret guarantees. In particular, we provide a regret bound that is sublinear in the total number of performative samples taken and is only polynomial in the dimension of the model parameter. On the application side, we believe our method is useful for large online recommendation systems like YouTube or TikTok, where the recommendation update frequency is high and might potentially reshape future preferences.

翻訳日:2023-10-26 21:13:45 公開日:2023-10-24

# ブロック不変対称性シフト:二量化ハミルトニアンのユニタリの線形結合への分解を改善する前処理法

Block-Invariant Symmetry Shift: Preprocessing technique for second-quantized Hamiltonians to improve their decompositions to Linear Combination of Unitaries ( http://arxiv.org/abs/2304.13772v3 )

ライセンス: Link先を確認

Ignacio Loaiza, Artur F. Izmaylov

(参考訳) 量子位相推定(QPE)による分子電子ハミルトニアンのエネルギー推定の計算コストは、ハミルトニアンの最大値と最小値の違いによって増大する。本研究では、ハミルトニアンのノルムを特定の対称性の標的状態の固有スペクトルを変更することなく減少させる前処理手順を提案する。新しい手順であるBlock-Invariant Symmetry Shift (BLISS) は作用素 T を構築し、H-T を実装するコストは H のそれと比較すると削減されるが、H-T は H と同じ方法で利子の部分空間に作用する。 BLISS性能は、LCU(Linear Combination of Unitary)に基づく小さな分子の集合上のQPEアプローチに対して実証される。目標とする状態の集合を示す対称性として電子の数を用いると、BLISSはいくつかのLCU分解に対して非シフトバージョンと比較して2つの1ノルムの減少係数を与えた。

Computational cost of energy estimation for molecular electronic Hamiltonians via Quantum Phase Estimation (QPE) grows with the difference between the largest and smallest eigenvalues of the Hamiltonian. In this work we propose a preprocessing procedure that reduces the norm of the Hamiltonian without changing its eigenspectrum for the target states of a particular symmetry. The new procedure, Block-Invariant Symmetry Shift (BLISS), builds an operator T such that the cost of implementing H-T is reduced compared to that of H, yet H-T acts on the subspaces of interest the same way as H does. BLISS performance is demonstrated for Linear Combination of Unitaries (LCU)-based QPE approaches on a set of small molecules. Using the number of electrons as the symmetry specifying the target set of states, BLISS provided a factor of 2 reduction of 1-norm for several LCU decompositions compared to their unshifted versions.
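The invariance claimed above is easy to verify numerically on a toy model. The sketch assumes the simplest possible shift, `T = xi * (N - n_elec * I)` with `N` the particle-number operator, and a random number-conserving Hamiltonian. The BLISS operator in the paper may be more general, the value of `xi` is picked by hand here, and the spectral range is used as a stand-in for the LCU 1-norm.

```python
import numpy as np

n_modes, n_elec = 4, 2
dim = 2 ** n_modes
# Particle number of each computational basis state (number of set bits).
occupation = np.array([bin(s).count("1") for s in range(dim)])
N = np.diag(occupation).astype(float)

# Hypothetical number-conserving Hamiltonian: one random symmetric block per
# particle-number sector, plus an energy that grows with the particle number.
rng = np.random.default_rng(0)
H = np.zeros((dim, dim))
for n in range(n_modes + 1):
    idx = np.flatnonzero(occupation == n)
    block = rng.normal(size=(len(idx), len(idx)))
    H[np.ix_(idx, idx)] = (block + block.T) / 2 + 5.0 * n * np.eye(len(idx))

def spectral_range(M):
    """Difference between the largest and smallest eigenvalues."""
    w = np.linalg.eigvalsh(M)
    return w[-1] - w[0]

def sector_spectrum(M, n):
    """Eigenvalues of M restricted to the n-particle sector."""
    idx = np.flatnonzero(occupation == n)
    return np.linalg.eigvalsh(M[np.ix_(idx, idx)])

xi = 5.0                                # shift strength, chosen by hand here
T = xi * (N - n_elec * np.eye(dim))     # vanishes on the n_elec-particle sector
H_shifted = H - T
```

Because `T` is zero on the `n_elec`-particle sector, `H_shifted` has the same spectrum as `H` there, while its overall spectral range is smaller.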

翻訳日:2023-10-26 21:13:19 公開日:2023-10-24

# 超強結合空洞QEDにおける緩和破壊と共鳴トンネル

Relaxation breakdown and resonant tunneling in ultrastrong-coupling cavity QED ( http://arxiv.org/abs/2304.11191v2 )

ライセンス: Link先を確認

Daniele De Bernardis

(参考訳) 単一電磁空洞モードと超強結合した非対称双極子の開緩和ダイナミクスについて検討した。相互作用系全体に対する熱化マスター方程式を用いることで、リウビリアンギャップの位相図を導出する。超強結合は双極子トンネル速度の指数関数的な抑制により平衡状態への緩和を抑制する。しかし、極性多光子共鳴はキャビティを介する双極子共鳴トンネル法により高速な緩和を回復する。数値的なエビデンスとは別に、一般化された回転波近似によりRabiモデルを対角化して完全に解析的な記述を開発する。このような超強結合系の緩和物理学は、標準のテキストブック装束状態図の多光子ポーラロン版に還元される。最後に、超強結合系におけるカスケード共振トンネル構成の基礎を設定できるマルチウェルダイポールの拡張について議論する。

We study the open relaxation dynamics of an asymmetric dipole that is ultrastrongly coupled to a single electromagnetic cavity mode. By using a thermalizing master equation for the whole interacting system, we derive a phase diagram of the Liouvillian gap. It emerges that the ultrastrong coupling inhibits the system relaxation toward the equilibrium state due to an exponential suppression of the dipole tunneling rate. However, we find that polaronic multi-photon resonances restore fast relaxation by a cavity-mediated dipole resonant tunneling process. Aside from the numerical evidence, we develop a fully analytical description by diagonalizing the Rabi model through a generalized rotating-wave approximation, valid in the so-called polaron frame. The relaxation physics of such ultrastrong-coupling systems is then reduced to a multi-photon polaron version of the standard textbook dressed-states picture. Finally, we discuss an extension to a multi-well dipole that can set the basis of a cascaded resonant tunneling setup in the ultrastrong-coupling regime.

翻訳日:2023-10-26 21:12:19 公開日:2023-10-24

# TELeR:複雑なタスクのベンチマークのためのLLMプロンプトの一般的な分類法

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks ( http://arxiv.org/abs/2305.11430v2 )

ライセンス: Link先を確認

Shubhra Kanti Karmaker Santu and Dongji Feng

(参考訳) LLMは従来の会話環境におけるテキストの理解と生成に大きな成功を収めてきたが、不明確な複雑なタスクを実行する可能性はほとんど研究されていない。実際、我々は複雑なタスクにのみ焦点を絞った複数のLSMを用いて包括的なベンチマーク研究を行っていません。しかし,このようなベンチマーク研究を行うことは,プロンプトタイプやスタイルが異なる場合や,プロンプトで詳細度が異なる場合,llmsの性能のばらつきが大きいため,困難である。この問題に対処するため,本論文では,様々な複雑なタスクを実行するために,特定の特性を持つプロンプトを設計できる汎用分類法を提案する。この分類は、将来のベンチマーク研究が研究の一部として使用される特定のカテゴリのプロンプトを報告し、異なる研究間で有意義な比較を可能にする。また、この分類学を通じて共通標準を確立することで、研究者は特定の複雑なタスクにおいてLLMのパフォーマンスについてより正確な結論を導き出すことができる。

While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks is largely under-studied. Indeed, we are yet to conduct comprehensive benchmarking studies with multiple LLMs that are exclusively focused on a complex task. However, conducting such benchmarking studies is challenging because of the large variations in LLMs' performance when different prompt types/styles are used and different degrees of detail are provided in the prompts. To address this issue, the paper proposes a general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies. Also, by establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLMs' performance on a specific complex task.

翻訳日:2023-10-26 21:04:05 公開日:2023-10-24

# DoReMi: データ混合の最適化が言語モデルの事前トレーニングを高速化

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining ( http://arxiv.org/abs/2305.10429v3 )

ライセンス: Link先を確認

Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

(参考訳) 事前学習データドメイン(wikipedia、書籍、webテキストなど)の混合比率は、言語モデル(lm)の性能に大きく影響する。本稿では,minimax optimization (doremi) によるドメインの重み付けを提案する。これはまず,グループ分散ロバスト最適化 (group distributionally robust optimization, group dro) を用いた小さなプロキシモデルを,ダウンストリームタスクを知らずにドメインの重み付け (mixture proportions) を生成する。次に、これらのドメインウェイトでデータセットを再サンプリングし、より大きなフルサイズのモデルをトレーニングします。実験では、280Mパラメータのプロキシモデル上でDoReMiを使用して、8Bパラメータモデル(30倍大きい)をより効率的にトレーニングするためのドメイン重みを求める。 The Pileでは、DoReMiはドメインをダウンウェイトしても、すべてのドメインのパープレキシティを改善します。 DoReMiは、The Pileのデフォルトドメインウェイトを使用してトレーニングされたベースラインモデルに対して平均的な数ショットダウンストリーム精度を6.5%改善し、2.6倍のトレーニングステップでベースライン精度に達する。 GLaMデータセットでは、下流タスクの知識がないDoReMiが、下流タスクにチューニングされたドメインウェイトの使用パフォーマンスにマッチする。

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to find domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5% points over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.
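A Group-DRO-style reweighting step of the kind described above can be sketched as a multiplicative-weights update over domains. The abstract does not give the update rule, so the details here are assumptions of this sketch: the per-domain excess loss measured against a reference model, the clipping at zero, the uniform smoothing term, and all numeric values.

```python
import numpy as np

def update_domain_weights(weights, proxy_loss, reference_loss,
                          step_size=1.0, smoothing=1e-3):
    """One multiplicative-weights step that upweights domains with high excess loss."""
    excess = np.maximum(proxy_loss - reference_loss, 0.0)
    logits = np.log(weights) + step_size * excess
    new = np.exp(logits - logits.max())   # subtract the max for numerical stability
    new /= new.sum()
    uniform = np.full_like(new, 1.0 / len(new))
    return (1.0 - smoothing) * new + smoothing * uniform

# Three hypothetical domains with per-domain losses of the proxy and reference models.
weights = np.full(3, 1.0 / 3.0)
proxy_loss = np.array([3.0, 2.5, 4.0])
reference_loss = np.array([2.8, 2.6, 3.0])
new_weights = update_domain_weights(weights, proxy_loss, reference_loss)
```

The third domain, where the proxy model lags the reference the most, receives the largest weight. Averaging such weights over training steps would give the mixture proportions used to resample data for the larger model.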

翻訳日:2023-10-26 21:03:47 公開日:2023-10-24

# 低ランク共変量近似による変量誤差Fréchet回帰

Errors-in-variables Fréchet Regression with Low-rank Covariate Approximation ( http://arxiv.org/abs/2305.09282v2 )

ライセンス: Link先を確認

Kyunghee Han and Dogyoon Song

(参考訳) Fréchet回帰は非ユークリッド応答変数を含む回帰分析に有望なアプローチとして現れた。しかし、その実用的適用性は、豊富でノイズのない共変量データを持つ理想的なシナリオに依存することによって妨げられている。本稿では,共変量行列に内在する低ランク構造を活用し,これらの制約に対処する新しい推定手法を提案する。提案手法は,大域的Fréchet回帰と主成分回帰の概念を組み合わせて,回帰推定器の効率と精度の向上を目的とする。低ランク構造を取り入れることで、特に高次元および変数誤差(errors-in-variables)回帰設定において、より効率的なモデリングと推定が可能となる。提案した推定器の大サンプル特性の理論的解析を行い, 偏差, 分散, および測定誤差による追加変動の包括的解析を行った。さらに, 数値実験により, 理論的な知見を裏付ける実証的なエビデンスを与え, 提案手法の優れた性能を示す。全体として、この研究は非ユークリッド変数の回帰分析のための有望なフレームワークを導入し、様々な分野の潜在的な応用とともに、限定的でノイズの多い共変量データに関連する課題に効果的に対処する。

Fréchet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables. However, its practical applicability has been hindered by its reliance on ideal scenarios with abundant and noiseless covariate data. In this paper, we present a novel estimation method that tackles these limitations by leveraging the low-rank structure inherent in the covariate matrix. Our proposed framework combines the concepts of global Fréchet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator. By incorporating the low-rank structure, our method enables more effective modeling and estimation, particularly in high-dimensional and errors-in-variables regression settings. We provide a theoretical analysis of the proposed estimator's large-sample properties, including a comprehensive rate analysis of bias, variance, and additional variations due to measurement errors. Furthermore, our numerical experiments provide empirical evidence that supports the theoretical findings, demonstrating the superior performance of our approach. Overall, this work introduces a promising framework for regression analysis of non-Euclidean variables, effectively addressing the challenges associated with limited and noisy covariate data, with potential applications in diverse fields.
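The principal component regression ingredient mentioned above can be sketched for the special case of a Euclidean (scalar) response. This is an illustration only: the paper's estimator targets non-Euclidean responses through global Fréchet regression, which is not reproduced here, and the function names, the noise levels and the assumption that the rank is known are choices made for this example.

```python
import numpy as np

def low_rank_approx(Z, rank):
    """Best rank-`rank` approximation of Z via truncated SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def pcr_fit(Z, y, rank):
    """Least squares on the rank-truncated (denoised) covariates."""
    Z_hat = low_rank_approx(Z, rank)
    beta, *_ = np.linalg.lstsq(Z_hat, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p, r = 300, 20, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))   # true low-rank covariates
Z = X + 0.3 * rng.normal(size=(n, p))                   # covariates observed with error
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

Z_hat = low_rank_approx(Z, r)
beta_hat = pcr_fit(Z, y, rank=r)
```

Truncating to the leading singular directions discards most of the measurement noise, so `Z_hat` is closer to the unobserved `X` than the raw `Z` is. That denoising effect is what makes the subsequent regression step more reliable in an errors-in-variables setting.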

翻訳日:2023-10-26 21:02:46 公開日:2023-10-24

# シーングラフを用いた事前学習型視覚・言語モデルへの構造化表現の導入

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs ( http://arxiv.org/abs/2305.06343v2 )

ライセンス: Link先を確認

Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

(参考訳) 視覚と言語モデル(VLM)は、様々なタスクにおいて顕著なゼロショット(ZS)性能を示した。しかし、近年の研究では、最高のVLMでさえ、オブジェクト属性、関係性、行動状態などの構成的シーン理解の側面を捉えるのに苦労していることが示されている。対照的に、これらのモデルを改善することができるシーングラフ(SG)のような構造化アノテーションを得るためには、時間とコストがかかり、大規模では利用できない。ここでは,SGデータセットが事前学習されたVLMの構造的理解を高めるのに十分な情報を提供できるかどうかを問う。構造化情報を視覚表現とテキスト表現の両方に組み込むコンポーネントを統合することで,sgsから学習する際にvlmを改善することが可能であることを示す。視覚面では、SG情報を予測するために訓練されたイメージトランスフォーマーに特別な「SGコンポーネント」を組み込む一方、テキスト側では、SGを使用して、シーンの異なる構成面をハイライトするきめ細かいキャプションを生成する。提案手法は,ZS能力を軽度に低下させるだけで,複数のVLデータセット上でのVLMの性能を向上する。

Vision and language models (VLMs) have demonstrated remarkable zero-shot (ZS) performance in a variety of tasks. However, recent works have shown that even the best VLMs struggle to capture aspects of compositional scene understanding, such as object attributes, relations, and action states. In contrast, obtaining structured annotations, such as scene graphs (SGs), that could improve these models is time-consuming and costly, and thus cannot be used on a large scale. Here we ask whether small SG datasets can provide sufficient information for enhancing structured understanding of pretrained VLMs. We show that it is indeed possible to improve VLMs when learning from SGs by integrating components that incorporate structured information into both visual and textual representations. For the visual side, we incorporate a special "SG Component" in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene. Our method improves the performance of several popular VLMs on multiple VL datasets with only a mild degradation in ZS capabilities.

翻訳日:2023-10-26 21:02:26 公開日:2023-10-24

# NerfAcc: 効率的なサンプリングがNeRFを加速

NerfAcc: Efficient Sampling Accelerates NeRFs ( http://arxiv.org/abs/2305.04966v2 )

ライセンス: Link先を確認

Ruilong Li, Hang Gao, Matthew Tancik, Angjoo Kanazawa

(参考訳) ボリュームレンダリングに必要な大量のサンプルのため、ニューラルレイディアンスフィールドの最適化とレンダリングは計算コストがかかる。最近の研究には、彼らのメソッドを加速するための代替サンプリングアプローチが含まれているが、それらはしばしば作業の焦点ではない。本稿では,複数のサンプリング手法を検討・比較し,改良されたサンプリングは送信推定器の統一的概念の下でNeRFの変種に適用可能であることを示す。今後の実験を容易にするため,NeRF関連手法に高度なサンプリング手法を組み込むための柔軟なAPIを提供するPythonツールボックスであるNerfAccを開発した。既存のコードベースに最小限の変更を加えることで、最近のNeRFメソッドのトレーニング時間を1.5倍から20倍に短縮できることを示し、その柔軟性を示す。さらに、Instant-NGPのような高度にカスタマイズされたNeRFは、NerfAccを使用してネイティブのPyTorchで実装できる。

Optimizing and rendering Neural Radiance Fields is computationally expensive due to the vast number of samples required by volume rendering. Recent works have included alternative sampling approaches to help accelerate their methods; however, these approaches are often not the focus of the work. In this paper, we investigate and compare multiple sampling approaches and demonstrate that improved sampling is generally applicable across NeRF variants under a unified concept of transmittance estimator. To facilitate future experiments, we develop NerfAcc, a Python toolbox that provides flexible APIs for incorporating advanced sampling methods into NeRF-related methods. We demonstrate its flexibility by showing that it can reduce the training time of several recent NeRF methods by 1.5x to 20x with minimal modifications to the existing codebase. Additionally, highly customized NeRFs, such as Instant-NGP, can be implemented in native PyTorch using NerfAcc.
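
The "transmittance estimator" idea in the abstract above can be illustrated with the standard volume-rendering quadrature. The following is a minimal sketch, not NerfAcc's API: the function names `transmittance_and_weights` and `prune_samples` and the pruning threshold are assumptions made for illustration. It computes, for one ray, how much light survives up to each sample and which samples contribute enough to be worth evaluating.

```python
import math


def transmittance_and_weights(sigmas, deltas):
    """Per-sample transmittance T_i and rendering weight w_i along one ray.

    sigmas: density at each sample; deltas: length of each sample interval.
    T_i = exp(-sum_{j<i} sigma_j * delta_j)
    w_i = T_i * (1 - exp(-sigma_i * delta_i))
    """
    if len(sigmas) != len(deltas):
        raise ValueError("sigmas and deltas must have the same length")
    transmittances, weights = [], []
    optical_depth = 0.0  # accumulated sigma * delta before the current sample
    for sigma, delta in zip(sigmas, deltas):
        t = math.exp(-optical_depth)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        transmittances.append(t)
        weights.append(t * alpha)
        optical_depth += sigma * delta
    return transmittances, weights


def prune_samples(weights, threshold=1e-3):
    """Indices of samples whose weight is large enough to matter (hypothetical rule)."""
    return [i for i, w in enumerate(weights) if w >= threshold]
```

Samples in empty space (zero density) receive zero weight, so a sampler that estimates transmittance cheaply can skip them. The weights along a ray sum to one minus the final transmittance.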

翻訳日:2023-10-26 21:01:40 公開日:2023-10-24

# マルチホップインストラクションによる画像操作 -- 新しいデータセットと弱スーパービジョンニューロシンボリックアプローチ

Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach ( http://arxiv.org/abs/2305.14410v2 )

ライセンス: Link先を確認

Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Kumar Mondal, Dinesh Khandelwal, Dinesh Garg, Parag Singla

(参考訳) 私たちは自然言語テキストによるイメージ操作に関心があります -- 複数のAIアプリケーションに有用なタスクですが、マルチモーダルスペースに対する複雑な推論が必要です。近年提案されているニューロシンボリック・コンセプト・ラーニング(nscl)を,画像操作のための視覚質問応答(vqa)のタスクに非常に効果的に拡張した。 NeuroSIM と呼ばれるシステムでは,マルチオブジェクトシーン上で複雑なマルチホップ推論を行うことができ,VQA の注釈付きデータ形式において弱い監視しか必要としない。 NeuroSIMは、オブジェクト属性と操作操作からなるドメイン固有言語(DSL)に基づいて、命令をシンボルプログラムに解析し、その実行を導く。我々はタスクのための新しいデータセットを作成し、幅広い実験により、neurosimが教師付きデータを使用して操作するsataベースラインと高い競合性を示している。

We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend the recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for the task of Visual Question Answering (VQA), to the task of image manipulation. Our system, referred to as NeuroSIM, can perform complex multi-hop reasoning over multi-object scenes and only requires weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising object attributes and manipulation operations, that guides its execution. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with or beats SOTA baselines that make use of supervised data for manipulation.

翻訳日:2023-10-26 20:53:12 公開日:2023-10-24

# 画像テキストグラフ空間における粗相関学習による視覚・言語構成性の向上

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality ( http://arxiv.org/abs/2305.13812v3 )

ライセンス: Link先を確認

Harman Singh, Pengchuan Zhang, Qifan Wang, Mengjiao Wang, Wenhan Xiong, Jingfei Du, Yu Chen

(参考訳) 対照的に訓練された視覚言語モデルは、視覚と言語表現の学習において著しく進歩し、様々な下流のマルチモーダルタスクのための最先端のモデルに繋がった。しかし、最近の研究では、オブジェクト、属性、関係性に対して構成的推論を行う能力において、これらのモデルの厳しい制限が強調されている。シーングラフは、イメージを合成的に理解する効果的な方法として登場した。これらは、オブジェクト、それらの属性、シーン内の他のオブジェクトとの関係を含む画像のグラフ構造化セマンティック表現である。本研究では,テキストから解析したシーングラフを画像シーングラフのプロキシとして考慮し,様々な複雑な文を同じ画像にアライメントする画像とテキスト間の粗い相互差分学習目標とともに,グラフ分解と拡張フレームワークを提案する。これと合わせて,属性結合と関係理解を改善するために,シーングラフ空間における新規な負のマイニング手法を提案する。本研究では,提案する複数のベンチマークにおいて,属性結合,関係理解,系統的一般化,生産性を大幅に向上させる手法の有効性を実証すると共に,様々なマルチモーダルタスクにおけるクリップと同等あるいは優れた性能を実現するとともに,提案手法の有効性を実証する。

Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning, leading to state-of-the-art models for various downstream multimodal tasks. However, recent research has highlighted severe limitations of these models in their ability to perform compositional reasoning over objects, attributes, and relations. Scene graphs have emerged as an effective way to understand images compositionally. These are graph-structured semantic representations of images that contain objects, their attributes, and relations with other objects in a scene. In this work, we consider the scene graph parsed from text as a proxy for the image scene graph and propose a graph decomposition and augmentation framework along with a coarse-to-fine contrastive learning objective between images and text that aligns sentences of various complexities to the same image. Along with this, we propose novel negative mining techniques in the scene graph space for improving attribute binding and relation understanding. Through extensive experiments, we demonstrate the effectiveness of our approach, which significantly improves attribute binding, relation understanding, systematic generalization, and productivity on multiple recently proposed benchmarks (for example, improvements of up to $18\%$ for systematic generalization and $16.5\%$ for relation understanding over a strong baseline), while achieving similar or better performance than CLIP on various general multimodal tasks.

翻訳日:2023-10-26 20:52:05 公開日:2023-10-24

# ハミルトン構造とカオアエネルギーとフーリエ景観構造をつなぐ

Connecting the Hamiltonian structure to the QAOA energy and Fourier landscape structure ( http://arxiv.org/abs/2305.13594v2 )

ライセンス: Link先を確認

Michał Stęchły, Lanruo Gao, Boniface Yogendran, Enrico Fontana, Manuel Rudolph

(参考訳) 本稿では,量子近似最適化アルゴリズム(QAOA)におけるハミルトニアンの構成と,対応するコスト景観特性との関係の理解を深めることを目的とする。 QAOAは、組合せ最適化に最もよく用いられる変分量子アルゴリズム(VQA)の顕著な例である。 qaoaの成功はパラメータ最適化に大きく依存しており、特にノイズの多い量子ハードウェアでは大きな課題となっている。したがって、コスト関数のランドスケープを理解することは、より良い最適化ヒューリスティックを設計するのに役立つ。最大5つの局所項と最大20量子ビットを持つハミルトニアンの1層QAOAの場合を考える。コストランドスケープの可視化に加えて、それらのフーリエ変換を計算し、補完的な視点からハミルトニアンの構造との関係を研究する。さらに,地形の粗さを定量化するための指標を導入し,高次元パラメトリドランドスケープの性質に関する貴重な知見を提供する。これらの手法により、ハミルトン構造、項の順序、係数が最適化ランドスケープの粗さに与える影響を明らかにすることができるが、第一原理からVQAの複雑なランドスケープを予測することは非常に困難であり、一般的には実現不可能である。

In this paper, we aim to expand the understanding of the relationship between the composition of the Hamiltonian in the Quantum Approximate Optimization Algorithm (QAOA) and the corresponding cost landscape characteristics. QAOA is a prominent example of a Variational Quantum Algorithm (VQA), which is most commonly used for combinatorial optimization. The success of QAOA heavily relies on parameter optimization, which is a great challenge, especially on scarce noisy quantum hardware. Thus understanding the cost function landscape can aid in designing better optimization heuristics and therefore potentially provide eventual value. We consider the case of 1-layer QAOA for Hamiltonians with up to 5-local terms and up to 20 qubits. In addition to visualizing the cost landscapes, we calculate their Fourier transform to study the relationship with the structure of the Hamiltonians from a complementary perspective. Furthermore, we introduce metrics to quantify the roughness of the landscape, which provide valuable insights into the nature of high-dimensional parametrized landscapes. While these techniques allow us to elucidate the role of Hamiltonian structure, order of the terms and their coefficients on the roughness of the optimization landscape, we also find that predicting the intricate landscapes of VQAs from first principles is very challenging and unlikely to be feasible in general.

翻訳日:2023-10-26 20:51:00 公開日:2023-10-24

# 多言語機械翻訳におけるデータ不均衡と表現変性の緩和

Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation ( http://arxiv.org/abs/2305.12786v2 )

ライセンス: Link先を確認

Wen Lai, Alexandra Chronopoulou, Alexander Fraser

(参考訳) 多言語ニューラルマシン翻訳(mnmt)の進歩にもかかわらず、この分野には依然として2つの大きな課題があると主張している。データ不均衡問題は、全ての言語対、特にロングテール言語(すなわち非常に低リソース言語)における並列コーパスの量の不均衡を指す。表現退化問題(representation degeneration problem)とは、mnmtモデルで利用可能な全空間の小さな部分空間にのみ現れるエンコードされたトークンの問題を指す。そこで,本稿では,mnmtモデルの性能向上のために,ターゲット側単言語データとバイリンガル辞書のみを使用するフレームワークであるbi-aclを提案する。我々は、オンライン制約ビーム探索とカリキュラム学習サンプリング戦略を組み合わせた双方向オートエンコーダと双方向コントラスト学習という2つのモジュールを定義した。広範な実験により,提案手法は,ロングテール言語と高リソース言語の両方においてより効果的であることが判明した。また,我々のアプローチは,ゼロショットシナリオでドメインと言語間の知識を伝達できることを実証する。

Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration. The data imbalance problem refers to the imbalance in the amount of parallel corpora for all language pairs, especially for long-tail languages (i.e., very low-resource languages). The representation degeneration problem refers to the problem of encoded tokens tending to appear only in a small subspace of the full space available to the MNMT model. To solve these two issues, we propose Bi-ACL, a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model. We define two modules, named bidirectional autoencoder and bidirectional contrastive learning, which we combine with an online constrained beam search and a curriculum learning sampling strategy. Extensive experiments show that our proposed method is more effective both in long-tail languages and in high-resource languages. We also demonstrate that our approach is capable of transferring knowledge between domains and languages in zero-shot scenarios.

翻訳日:2023-10-26 20:50:31 公開日:2023-10-24

# ほとんどのニューラルネットワークがほぼ学習可能

Most Neural Networks Are Almost Learnable ( http://arxiv.org/abs/2305.16508v3 )

ライセンス: Link先を確認

Amit Daniely, Nathan Srebro, Gal Vardi

(参考訳) ランダムな定数深度ネットワークを学習するためのPTASを提案する。任意の固定された$\epsilon>0$と深さ$i$に対して、$\sqrt{d} \cdot \mathbb{S}^{d-1}$上の任意の分布について、深さ$i$のランダムなXavierネットワークを$\epsilon$の加法誤差まで学習する多項式時間アルゴリズムが存在することを示す。このアルゴリズムの時間およびサンプル複雑性は$(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$であり、ここで$\bar d$はネットワークのサイズである。シグモイドやReLUに類する活性化関数の一部の場合には、この上界は$(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$に改善でき、定数深度ランダムネットワークを学習するための準多項式時間アルゴリズムが得られる。

We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$. The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar d$ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, resulting in a quasi-poly-time algorithm for learning constant depth random networks.

翻訳日:2023-10-26 20:43:01 公開日:2023-10-24

# ディックモデルにおける量子カオスとその変種

Quantum chaos in the Dicke model and its variants ( http://arxiv.org/abs/2305.15505v2 )

ライセンス: Link先を確認

Devvrat Tiwari and Subhashish Banerjee

(参考訳) 近年,時間外秩序相関器 (OTOC) が量子カオスの指標として注目されている。半古典的極限では、指数的成長速度は古典的リアプノフ指数に類似している。量子古典的対応は、多レベル原子と空洞場相互作用のモデルであるディックモデルのように、一体カオスシステムと相互作用を持つ現実的なシステムでサポートされている。この目的のために、オープン量子系設定におけるディックモデルの異なるバリエーションに対するOTOCを計算する。ディックモデルの超放射相転移とOTOCの関連性を検討した。さらに、otocと第2次コヒーレンス関数との関係も確立する。これは、量子光学モデルにおけるOTOCと量子カオスの実験的研究において重要である。

Recently, the out-of-time-ordered correlator (OTOC) has gained much attention as an indicator of quantum chaos. In the semi-classical limit, its exponential growth rate resembles the classical Lyapunov exponent. The quantum-classical correspondence has been supported for the one-body chaotic systems as well as realistic systems with interactions, as in the Dicke model, a model of multi-two-level atoms and cavity field interactions. To this end, we calculate the OTOC for different variations of the Dicke model in an open quantum system setting. The connection between the superradiant phase transition of the Dicke model and the OTOC is studied. Further, we establish a relation between the OTOC and the second-order coherence function. This becomes important for the experimental studies of the OTOC and quantum chaos in the models of quantum optics.

翻訳日:2023-10-26 20:42:23 公開日:2023-10-24

# 時系列分類におけるロバストな説明枠組み

Robust Framework for Explanation Evaluation in Time Series Classification ( http://arxiv.org/abs/2306.05501v3 )

ライセンス: Link先を確認

Thu Trang Nguyen, Thach Le Nguyen, and Georgiana Ifrim

(参考訳) 時系列分類(英: time series classification)は、人間の活動認識、スポーツ分析、一般医療などの領域でよく見られる、一般的なデータ型、時間系列を扱うタスクである。本稿では時系列分類のための説明手法を定量的に評価・ランク付けするための枠組みを提供する。時系列の説明手法に対する近年の関心は、様々な説明手法を提供してきた。しかし、その説明が特定の問題について意見が一致しない場合、どちらを使うべきかは不明のままである。正しい答えを見つけるために複数の説明を比較することは自明ではない。 2つの重要な課題は、与えられた説明方法(例えば、分類タスクの関連性)の定量的かつ堅牢な評価方法と、説明手法を並べて比較する方法である。本稿では、時系列分類のための複数の相性に基づく説明を評価・比較するための堅牢なモデル非依存的説明評価フレームワークAMEEを提案する。このアプローチでは、各説明によって導かれる入力時系列にデータ摂動を加える。次に、摂動が分類精度に及ぼす影響を計測し、説明評価に用いる。その結果,時系列の判別部を乱すと分類精度が大きく変化し,各説明の評価に使用できることがわかった。異なるタイプの摂動と異なる種類の分類器にロバストにするために、摂動と分類器にまたがる精度の損失を集約する。この新しいアプローチは、異なる説明方法の定量化とランク付けを可能にします。合成データセットの定量的および定性的な分析、さまざまな時系列データセット、および既知の専門家基盤真理を持つ実世界のデータセットを提供する。

Time series classification is a task which deals with a prevalent data type, temporal sequences, common in domains such as human activity recognition, sports analytics and general healthcare. This paper provides a framework to quantitatively evaluate and rank explanation methods for time series classification. The recent interest in explanation methods for time series has provided a great variety of explanation techniques. Nevertheless, when the explanations disagree on a specific problem, it remains unclear which of them to use. Comparing multiple explanations to find the right answer is non-trivial. Two key challenges remain: how to quantitatively and robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanation methods side-by-side. We propose AMEE, a robust Model-Agnostic Explanation Evaluation framework for evaluating and comparing multiple saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. The impact of perturbation on classification accuracy is then measured and used for explanation evaluation. The results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to quantify and rank different explanation methods. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time-series datasets, as well as a real-world dataset with known expert ground truth.
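
The perturbation-based evaluation described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the AMEE implementation: the helper names, the choice of mean-replacement as the perturbation, and the plain averaging over classifiers and perturbation fractions are all hypothetical. The idea is that an informative saliency map points at time steps whose removal hurts accuracy.

```python
def accuracy(classify, series_list, labels):
    """Fraction of series that `classify` labels correctly."""
    correct = sum(1 for x, y in zip(series_list, labels) if classify(x) == y)
    return correct / len(labels)


def perturb(series, saliency, fraction):
    """Replace the most salient `fraction` of time points with the series mean."""
    k = int(round(fraction * len(series)))
    top = sorted(range(len(series)), key=lambda i: saliency[i], reverse=True)[:k]
    mean = sum(series) / len(series)
    out = list(series)
    for i in top:
        out[i] = mean
    return out


def accuracy_loss(classify, series_list, labels, saliencies, fraction):
    """Drop in accuracy after perturbing the regions an explanation marks as salient."""
    base = accuracy(classify, series_list, labels)
    perturbed = [perturb(x, s, fraction) for x, s in zip(series_list, saliencies)]
    return base - accuracy(classify, perturbed, labels)


def aggregated_loss(classifiers, series_list, labels, saliencies, fractions=(0.1, 0.2)):
    """Average the accuracy loss over several classifiers and perturbation sizes."""
    losses = [accuracy_loss(c, series_list, labels, saliencies, f)
              for c in classifiers for f in fractions]
    return sum(losses) / len(losses)
```

Explanation methods can then be ranked by `aggregated_loss`: a larger loss suggests that the method highlights parts of the series the classifiers actually rely on.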

翻訳日:2023-10-26 20:32:29 公開日:2023-10-24

# 逆問題に対するデータ一貫性を用いた直接拡散ブリッジ

Direct Diffusion Bridge using Data Consistency for Inverse Problems ( http://arxiv.org/abs/2305.19809v2 )

ライセンス: Link先を確認

Hyungjin Chung, Jeongsol Kim, Jong Chul Ye

(参考訳) 拡散モデルに基づく逆問題解法は優れた性能を示したが、主にノイズから始まる逆拡散サンプリングを必要とするため、速度は制限されている。最近のいくつかの研究は、特定の逆問題に対してクリーンと腐敗を直接ブリッジすることで拡散過程を構築することでこの問題を緩和しようと試みている。本稿では,これらの既存の研究をDDB (Direct Diffusion Bridges) という名前で統一し,異なる理論に動機付けられながら,結果のアルゴリズムがパラメータの選択でのみ異なることを示す。そして、現在のddbフレームワークの重要な制限、すなわちデータの一貫性が保証されないことを強調します。この問題に対処するため,我々は,微調整を必要とせずにデータ一貫性を課す修正推論手順を提案する。得られた手法データをCDDB (Consistent DDB) と呼び、知覚と歪みの両指標において矛盾する結果が得られ、Pareto-frontier を最適な方向に効果的に推し進める。提案手法は両評価基準の最先端化を実現し,既存手法よりも優れていることを示す。コードはhttps://github.com/HJ-harry/CDDBで入手できる。

Diffusion model-based inverse problem solvers have shown impressive performance, but are limited in speed, mostly as they require reverse diffusion sampling starting from noise. Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters. Then, we highlight a critical limitation of the current DDB framework, namely that it does not ensure data consistency. To address this problem, we propose a modified inference procedure that imposes data consistency without the need for fine-tuning. We term the resulting method data Consistent DDB (CDDB), which outperforms its inconsistent counterpart in terms of both perception and distortion metrics, thereby effectively pushing the Pareto-frontier toward the optimum. Our proposed method achieves state-of-the-art results on both evaluation criteria, showcasing its superiority over existing methods. Code is available at https://github.com/HJ-harry/CDDB

翻訳日:2023-10-26 20:29:46 公開日:2023-10-24

# 学習不可能なデータセットから何が学べるか?

What Can We Learn from Unlearnable Datasets? ( http://arxiv.org/abs/2305.19254v2 )

ライセンス: Link先を確認

Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein

(参考訳) 広範なWebスクレイピングの時代、未学習のデータセットメソッドは、ディープニューラルネットワークの一般化を防ぎ、データのプライバシを保護する可能性がある。しかし、それらの利用を危うくする多くの実用的な制限に加えて、データを保護する能力に疑問を投げかける多くの発見を行ないました。まず、学習不可能なデータセットでトレーニングされたニューラルネットワークはショートカットのみを学ぶと広く信じられている。これとは対照的に,ネットワークは高いテスト性能を期待できる有用な特徴を実際に学習することができ,画像保護が保証されていないことを示唆している。学習不能なデータセットは、追加の摂動の線形分離性を通じて学習ショートカットを誘導すると考えられている。摂動の線形分離性は必要条件ではないことを示す反例を提供する。線形分離可能な摂動を頼りにすべきでない理由を強調するため,ICML 2021 と ICLR 2023 で発行された未学習データセットから学習が可能な直交射影攻撃を提案する。提案手法は, 提案手法に比べてかなり複雑ではない。

In an era of widespread web scraping, unlearnable dataset methods have the potential to protect data privacy by preventing deep neural networks from generalizing. But in addition to a number of practical limitations that make their use unlikely, we make a number of findings that call into question their ability to safeguard data. First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization. In contrast, we find that networks actually can learn useful features that can be reweighed for high test performance, suggesting that image protection is not assured. Unlearnable datasets are also believed to induce learning shortcuts through linear separability of added perturbations. We provide a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. To emphasize why linearly separable perturbations should not be relied upon, we propose an orthogonal projection attack which allows learning from unlearnable datasets published in ICML 2021 and ICLR 2023. Our proposed attack is significantly less complex than recently proposed techniques.

翻訳日:2023-10-26 20:29:11 公開日:2023-10-24

# 雑音帯域フィードバックを持つ逆数に対する行列ゲームに対する対数レグレット

Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback ( http://arxiv.org/abs/2306.13233v2 )

ライセンス: Link先を確認

Arnab Maiti, Kevin Jamieson, Lillian J. Ratliff

(参考訳) 本稿では，各時刻において行プレイヤーが行$i$を選択し，列プレイヤーが列$j$を選択し，行プレイヤーが平均$A_{i,j}$のノイズを含む報酬を受け取る，ゼロサム行列ゲームの一変種について考察する。行プレイヤーの目的は、敵対的な列プレイヤーに対してさえ、できるだけ多くの報酬を蓄積することである。行プレイヤーが、任意の報酬列に対して$\sqrt{T}$のリグレットを達成することで知られるアルゴリズムであるEXP3戦略を用いると、このゲーム設定におけるナッシュ均衡に対しても直ちに$\sqrt{T}$のリグレットが達成される。しかしながら、EXP3戦略がゲームの構造を考慮しないという事実を一つの動機として、O'Donoghue et al. (2021) はゲーム構造を活用するUCBスタイルのアルゴリズムを提案し、このアルゴリズムが経験的にEXP3を大きく上回ることを示した。彼らはこのUCBスタイルのアルゴリズムが$\sqrt{T}$のリグレットを達成することを示したが、本論文では、確率的バンディットの結果と同様に、任意の敵に対して$\text{polylog}(T)$のリグレットを証明可能な形で達成するアルゴリズムが存在するかどうかを問う。我々は、単純な$2 \times 2$設定についてこの問いに肯定的に答える新しいアルゴリズムを提案し、リグレット設定におけるゲームに対する最初のインスタンス依存の保証を与える。我々のアルゴリズムは2つの大きなハードルを克服する。 1) ナッシュ均衡は$1/\sqrt{T}$のレートでしか推定できないにもかかわらず、対数リグレットを得ること。 2) 敵がナッシュ均衡に関する情報を提供するか、さもなければ行プレイヤーが負のリグレットを被るかのいずれかを保証する行プレイヤー戦略を設計すること。さらに、全情報の場合には、最初のハードルが依然として関係する一般の$n \times m$の場合を扱う。最後に、EXP3とUCBベースのアルゴリズムは必然的に$\sqrt{T}$より良い性能を達成できないことを示す。

This paper considers a variant of zero-sum matrix games where at each timestep the row player chooses row $i$, the column player chooses column $j$, and the row player receives a noisy reward with mean $A_{i,j}$. The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player. If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret relative to the Nash equilibrium in this game setting. However, partly motivated by the fact that the EXP3 strategy is myopic to the structure of the game, O'Donoghue et al. (2021) proposed a UCB-style algorithm that leverages the game structure and demonstrated that this algorithm greatly outperforms EXP3 empirically. While they showed that this UCB-style algorithm achieved $\sqrt{T}$ regret, in this paper we ask if there exists an algorithm that provably achieves $\text{polylog}(T)$ regret against any adversary, analogous to results from stochastic bandits. We propose a novel algorithm that answers this question in the affirmative for the simple $2 \times 2$ setting, providing the first instance-dependent guarantees for games in the regret setting. Our algorithm overcomes two major hurdles: 1) obtaining logarithmic regret even though the Nash equilibrium is estimable only at a $1/\sqrt{T}$ rate, and 2) designing row-player strategies that guarantee that either the adversary provides information about the Nash equilibrium, or the row player incurs negative regret. Moreover, in the full information case we address the general $n \times m$ case where the first hurdle is still relevant. Finally, we show that EXP3 and the UCB-based algorithm necessarily cannot perform better than $\sqrt{T}$.
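
For readers unfamiliar with the EXP3 baseline mentioned above, here is a minimal sketch of the textbook EXP3 update for the row player with noisy bandit feedback. It is not the paper's new algorithm. The function names, the Bernoulli noise model, the `column_policy` callable, and the default exploration rate `gamma` are assumptions made for illustration.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def play_exp3(A, column_policy, rounds, gamma=0.1, seed=0):
    """Row player runs EXP3; rewards are Bernoulli with mean A[i][j] (assumed noise model)."""
    rng = random.Random(seed)
    k = len(A)
    weights = [1.0] * k
    total_reward = 0.0
    for t in range(rounds):
        probs = exp3_probabilities(weights, gamma)
        i = rng.choices(range(k), weights=probs)[0]  # sample a row
        j = column_policy(t)                         # opponent's column
        reward = 1.0 if rng.random() < A[i][j] else 0.0
        total_reward += reward
        estimate = reward / probs[i]                 # importance-weighted reward
        weights[i] *= math.exp(gamma * estimate / k)
        top = max(weights)                           # rescale to avoid overflow
        weights = [w / top for w in weights]
    return exp3_probabilities(weights, gamma), total_reward
```

Note that the update only uses the observed reward for the chosen row and never looks at the matrix structure, which is the sense in which the abstract calls EXP3 myopic to the structure of the game.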

翻訳日:2023-10-26 20:23:02 公開日:2023-10-24

# 深層アンサンブルを超えて:分布シフト下におけるベイズ深層学習の大規模評価

Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift ( http://arxiv.org/abs/2306.12306v3 )

ライセンス: Link先を確認

Florian Seligmann, Philipp Becker, Michael Volpp, Gerhard Neumann

(参考訳) Bayesian Deep Learning (BDL) は、分布シフトしたデータに対するよく校正された予測を実現するための有望なアプローチである。それにもかかわらず、最近のSOTA手法を多様で現実的で挑戦的なベンチマークタスクを体系的に評価する大規模な調査は存在しない。本稿では,BDL研究の現状を明らかにするために,WILDSコレクションから,分散シフトによる一般化能力とキャリブレーションに着目した,挑戦的な分類と回帰作業を含む実世界のデータセットに対する最新のBDLアルゴリズムの評価を行った。我々は、大規模な、畳み込み、トランスフォーマーベースのニューラルネットワークアーキテクチャでアルゴリズムを比較した。特に,予測校正誤差の符号付きバージョンについて検討し,メソッドが過度か過度かを明らかにし,メソッドの振舞いに関するさらなる知見を提供する。さらに,スクラッチからのトレーニングが極めて高価である大規模事前学習モデルに対して,bdlの体系的評価を行った。最後に,近年のDeep Ensemblesの成功を踏まえ,一般的な単一モード後部近似をアンサンブルを用いて複数のモードに拡張する。単一モード近似は一般にモデルの一般化能力とキャリブレーションをかなりの差で向上させるが、大きなトランスフォーマーベース言語モデルを微調整する際のアンサンブルの失敗モードも同定する。この設定では、最終層ベイズ・バイ・バックプロップのような変分推論に基づくアプローチは、SWAGのような現代の近似推論アルゴリズムが最適なキャリブレーションを達成するのに対し、大きなマージンによる精度で他の手法よりも優れている。

Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.
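
A signed calibration error of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming equal-width confidence bins and the sign convention "confidence minus accuracy" (positive means over-confident). The paper's exact definition may differ, and the function name is hypothetical.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, signed ECE) using equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, True where the prediction was right.
    Signed ECE > 0 indicates over-confidence, < 0 under-confidence (assumed convention).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, 1.0 if hit else 0.0))
    ece = 0.0
    signed = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gap = mean_conf - accuracy
        ece += len(members) / n * abs(gap)
        signed += len(members) / n * gap
    return ece, signed
```

The unsigned value measures how far confidence is from accuracy. The signed value keeps the direction, so over- and under-confident bins can cancel; for that reason it is best read alongside the unsigned value.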

翻訳日:2023-10-26 20:22:07 公開日:2023-10-24

# SynerGPT:パーソナライズドドラッグのシナジー予測と薬物設計のためのインコンテキストラーニング

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design ( http://arxiv.org/abs/2307.11694v2 )

ライセンス: Link先を確認

Carl Edwards and Aakanksha Naik and Tushar Khot and Martin Burke and Heng Ji and Tom Hope

(参考訳) 相乗的な薬物の組み合わせを予測することは、がん治療、特に生検細胞を介して患者の特定の腫瘍にパーソナライズされた治療の発見を加速するのに役立つ。本稿では,文脈内薬物シナジー学習のための新しい設定とモデルを提案する。特定のがん細胞標的の文脈における10～20の薬物相乗関係の「個人化データセット」を作成した。私たちの目標は、そのコンテキストで追加の薬物シナジー関係を予測することです。 gpt言語モデル(lm)を"in-context learn"共通関数クラスに事前トレーニングする最近の作業に触発されて、gptモデルが"drug synergy function"を学習できるようにする新しい事前学習スキームを考案する。我々のモデルは -- テキストコーパス、分子指紋、タンパク質相互作用、その他のドメイン固有の知識を使用しない -- は、競争的な結果を達成することができる。さらに, モデルプロンプトを最適化する遺伝的アルゴリズムと文脈内アプローチを統合し, 患者生検を行った後, テスト対象のシナジー候補を選定する。最後に、特定の患者の「パーソナライズされたデータセット」をターゲットとした、特に相乗効果のある薬物の設計を可能にする逆薬物設計の新たなタスクについて検討する。我々の発見は、精密がん医学に重要な影響を与える可能性があり、またlmsの非テキスト事前トレーニングに関する興味深い疑問も提起できる。

Predicting synergistic drug combinations can help accelerate discovery of cancer treatments, particularly therapies personalized to a patient's specific tumor via biopsied cells. In this paper, we propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Our goal is to predict additional drug synergy relationships in that context. Inspired by recent work that pre-trains a GPT language model (LM) to "in-context learn" common function classes, we devise novel pre-training schemes that enable a GPT model to in-context learn "drug synergy functions". Our model -- which does not use any textual corpora, molecular fingerprints, protein interaction or any other domain-specific knowledge -- is able to achieve competitive results. We further integrate our in-context approach with a genetic algorithm to optimize model prompts and select synergy candidates to test after conducting a patient biopsy. Finally, we explore a novel task of inverse drug design which can potentially enable the design of drugs that synergize specifically to target a given patient's "personalized dataset". Our findings can potentially have an important impact on precision cancer medicine, and also raise intriguing questions on non-textual pre-training for LMs.

翻訳日:2023-10-26 20:11:25 公開日:2023-10-24

# back to optimization:拡散に基づくゼロショット3次元ポーズ推定

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation ( http://arxiv.org/abs/2307.03833v3 )

ライセンス: Link先を確認

Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

(参考訳) 学習に基づく手法は、従来の最適化に基づく手法よりも多くのベンチマークにおいて非常に優れた性能を持つ3Dヒューマンポーズ推定(HPE)タスクを支配している。それにもかかわらず、訓練されたネットワークは暗黙的にカメラ固有のパラメータとドメインベースの人間のポーズの分布を学習し、統計平均によってポーズを推定するため、2D-3Dリフト、画像から3D、あるいは拡散ベースのいずれの方法であっても、学習ベースのモデルにとって実環境(in-the-wild)での3D HPEは依然として最大の課題である。一方、最適化に基づく手法は結果をケース・バイ・ケースで推定するため、より多様で洗練された人間のポーズを予測することができる。最適化と学習に基づく手法の利点を組み合わせることで、クロスドメインおよび実環境の3D HPEの問題を解決するために、3D HPEのための Zero-shot Diffusion-based Optimization (ZeDO) パイプラインを提案する。我々の複数仮説版のZeDOは、Human3.6Mにおいて最先端(SOTA)性能を達成し、minMPJPEは51.4 mmであり、2D-3Dまたは画像-3Dペアによる学習を一切必要としない。さらに、我々の単一仮説版のZeDOは、クロスデータセット評価において3DPWデータセット上でPA-MPJPE 40.3 mmのSOTA性能を達成し、3DPWで学習した学習ベースの手法さえも上回る。

Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm, without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset with a PA-MPJPE of 40.3 mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.

翻訳日:2023-10-26 20:09:57 公開日:2023-10-24

# 3:1 Nesting Rules in Redistricting

3:1 Nesting Rules in Redistricting ( http://arxiv.org/abs/2308.00605v2 )

ライセンス: Link先を確認

Christopher Donnay

(参考訳) 立法再編成では、ほとんどの州が下院と上院の地図を別々に描いている。オハイオ州とウィスコンシン州は上院の選挙区に3:1のネスト規則、すなわち隣接する下院の3つの選挙区から作るよう求めている。我々は、この要件が再編成に与える影響、特に特定の政党が獲得した議席数について調査する。我々はマルコフチェインモンテカルロ法を用いて生成された2つのアンサンブルを比較した。一方はReCom連鎖を用いてネスト要求のない上院地図を生成するもので、もう一方は3:1ネスト要求の上院地図を生成する。 3:1のネスト規則を必要とすることは、勝利した席の分布に最小限の影響を与える。さらに、選択された下院地図がネストされた上院地図の分布に与える影響について検討し、下院レベルでの極端な議席偏差が上院レベルでの当選議席の分布に大きく影響しないことを見出した。

In legislative redistricting, most states draw their House and Senate maps separately. Ohio and Wisconsin require that their Senate districts be made with a 3:1 nesting rule, i.e., out of triplets of adjacent House districts. We seek to study the impact of this requirement on redistricting, specifically on the number of seats won by a particular political party. We compare two ensembles generated using Markov Chain Monte Carlo methods; one which uses the ReCom chain to generate Senate maps without a nesting requirement, and the other which uses a chain that generates Senate maps with a 3:1 nesting requirement. We find that requiring a 3:1 nesting rule has minimal impact on the distribution of seats won. Moreover, we study the impact the chosen House map has on the distribution of nested Senate maps, and find that an extreme seat bias at the House level does not significantly impact the distribution of seats won at the Senate level.
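
The 3:1 nesting constraint described above can be checked mechanically. The sketch below is an illustration only: it assumes "triplets of adjacent House districts" means three House districts that form a connected group in the House adjacency graph. The function names and the dictionary-of-sets graph representation are hypothetical, and this is not the Markov chain used in the paper.

```python
def is_connected_triplet(triplet, adjacency):
    """Three districts are connected iff at least two of the three pairs touch."""
    a, b, c = triplet
    links = sum(1 for u, v in ((a, b), (a, c), (b, c)) if v in adjacency[u])
    return links >= 2


def is_valid_nesting(senate_plan, adjacency):
    """Check that a Senate plan nests House districts 3:1.

    senate_plan: list of 3-tuples of House district ids.
    adjacency: dict mapping each House district id to the set of its neighbours.
    """
    used = [h for triplet in senate_plan for h in triplet]
    covers_all_once = sorted(used) == sorted(adjacency)
    right_size = all(len(t) == 3 and len(set(t)) == 3 for t in senate_plan)
    return covers_all_once and right_size and all(
        is_connected_triplet(t, adjacency) for t in senate_plan)
```

A sampler that proposes nested Senate maps could use such a check to reject proposals that split a House district or group non-contiguous ones.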

翻訳日:2023-10-26 20:00:15 公開日:2023-10-24

# 不均衡データを用いたクロスエントロピー損失下における非拘束特徴モデルの神経崩壊

Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data ( http://arxiv.org/abs/2309.09725v2 )

ライセンス: Link先を確認

Wanli Hong and Shuyang Ling

(参考訳) 近年、コンピュータビジョンやテキスト処理の様々なタスクにおいて、ディープニューラルネットワーク(DNN)が大きな成功を収めているのを目撃している。興味深いことに、大量のパラメータを持つこれらのDNNは、特徴表現と終末期(TPT)における最終層分類器に類似した構造特性を共有している。具体的には、トレーニングデータ(各クラスが同じサンプル数を共有する)のバランスをとると、同じクラスのサンプルの特徴ベクトルが対応するクラス内平均特徴に収束し、ペアワイズ角が同じであることが観察される。この現象は、2019年にパパヤン、ハン、ドノホによって初めて言及されたNeural Collapse(NC)として知られている。近年の多くの研究は、いわゆるunconstrained feature model(ufm)を採用してこの現象を理論的に説明している。本稿では,非拘束特徴モデルの文脈におけるクロスエントロピー損失関数下の不均衡データへの n c 現象の拡張について検討する。私たちの貢献は最先端の成果と比較すると多様です。 (a)特徴ベクトルが崩壊現象、すなわち同じクラス内の特徴が同じ平均ベクトルに崩壊することを示す。 b) 平均特徴ベクトルは、もはや等角的タイトフレームを形成しない。その代わりに、その対角はサンプルサイズに依存する。 (c) 少数群の崩壊(少数群の特徴ベクトルが1つのベクトルに崩壊する)が起こるシャープしきい値も正確に特徴づける。 (d)最後に、サンプルサイズが大きくなるとデータサイズの不均衡の影響が減少する。以上より,不均衡データに対するクロスエントロピー損失下でのn c の全体像を示す。数値実験は我々の理論解析を裏付ける。

Recent years have witnessed the huge success of deep neural networks (DNNs) in various tasks of computer vision and text processing. Interestingly, these DNNs with a massive number of parameters share similar structural properties in their feature representations and last-layer classifiers at the terminal phase of training (TPT). Specifically, if the training data are balanced (each class shares the same number of samples), it is observed that the feature vectors of samples from the same class converge to their corresponding in-class mean features and their pairwise angles are the same. This fascinating phenomenon is known as Neural Collapse (NC), first termed by Papyan, Han, and Donoho in 2019. Many recent works manage to theoretically explain this phenomenon by adopting the so-called unconstrained feature model (UFM). In this paper, we study the extension of the NC phenomenon to imbalanced data under the cross-entropy loss function in the context of the unconstrained feature model. Our contribution is multi-fold compared with the state-of-the-art results: (a) we show that the feature vectors exhibit the collapse phenomenon, i.e., the features within the same class collapse to the same mean vector; (b) the mean feature vectors no longer form an equiangular tight frame. Instead, their pairwise angles depend on the sample size; (c) we also precisely characterize the sharp threshold at which minority collapse (the feature vectors of the minority groups collapse to one single vector) will take place; (d) finally, we argue that the effect of the imbalance in data size diminishes as the sample size grows. Our results provide a complete picture of NC under the cross-entropy loss for imbalanced data. Numerical experiments confirm our theoretical analysis.

翻訳日:2023-10-26 19:51:12 公開日:2023-10-24

# グローバルが局所化:グローバルマスター方程式の効率的な多体力学

Global becomes local: Efficient many-body dynamics for global master equations ( http://arxiv.org/abs/2309.07105v2 )

ライセンス: Link先を確認

Alexander Schnell

(参考訳) この研究は、グローバル対ローカルマスター方程式の問題に進展をもたらす。レッドフィールドマスター方程式のような大域的マスター方程式(標準ボルン近似やマルコフ近似に従う)は、ハミルトニアン系を完全に対角化する必要がある。これは量子多体系の相互作用には特に困難である。我々は、相反(エネルギー)空間における短波相関時間展開について議論し、ハミルトニアンの対角化を避けるジャンプ作用素の連続展開をもたらす。局所的に1つの場所に結合された浴場の場合、これは典型的には、局所的なオペレーターの観点から、グローバルなレッドフィールドジャンプ演算子の拡張につながる。さらに、局所レッドフィールドマスター方程式を近似したリンドブラッド形式にマッピングし、より広い体系のクラスに適用できる一方で、従来の局所リンドブラッドアプローチと同じ概念上の利点を持つ方程式を与える。我々のアイデアは局所マスター方程式の非ヒューリスティックな基礎を生み出し、確立された多体法と組み合わせることができる。

This work makes progress on the issue of global vs. local master equations. Global master equations like the Redfield master equation (following from the standard Born and Markov approximations) require a full diagonalization of the system Hamiltonian. This is especially challenging for interacting quantum many-body systems. We discuss a short-bath-correlation-time expansion in reciprocal (energy) space, leading to a series expansion of the jump operator, which avoids a diagonalization of the Hamiltonian. For a bath that is coupled locally to one site, this typically leads to an expansion of the global Redfield jump operator in terms of local operators. We additionally map the local Redfield master equation to an approximate Lindblad form, giving an equation which has the same conceptual advantages as traditional local Lindblad approaches, while being applicable to a much broader class of systems. Our ideas give rise to a non-heuristic foundation of local master equations, which can be combined with established many-body methods.

翻訳日:2023-10-26 19:50:21 公開日:2023-10-24

# egofalls - エゴセントリックカメラを用いた視覚聴覚データセットと転倒検出ベンチマーク

EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras ( http://arxiv.org/abs/2309.04579v2 )

ライセンス: Link先を確認

Xueyi Wang

(参考訳) 転倒は重大であり、高齢者のような脆弱な人口にとって致命的である。これまでの研究は、単一のセンサー、画像、加速度計によるデータキャプチャによるフォールの検出に対処してきた。本研究では,エゴセントリックカメラで撮影した映像から抽出したマルチモーダルディスクリプタを利用する。提案手法は,抽出した記述子上に構築した遅延決定融合層を含む。さらに,提案手法を評価するためのデータセットを新たに収集した。この種の公開データセットとしてはこれが初めてのものだと考えています。データセットは、14人の被験者による10,948のビデオサンプルからなる。個々の特徴抽出器の性能,視覚情報の融合,視覚情報と音声情報の融合を評価するため,アブレーション実験を行った。さらに,内部および外部のクロスバリデーション実験を行った。その結果,遅延決定融合による音声情報と視覚情報の融合により検出性能が向上し,転倒防止・緩和に有望なツールとなることが示された。

Falls are significant and often fatal for vulnerable populations such as the elderly. Previous works have addressed the detection of falls by relying on data captured by a single sensor, such as images or accelerometers. In this work, we rely on multimodal descriptors extracted from videos captured by egocentric cameras. Our proposed method includes a late decision fusion layer that builds on top of the extracted descriptors. Furthermore, we collect a new dataset on which we assess our proposed approach. We believe this is the first public dataset of its kind. The dataset comprises 10,948 video samples from 14 subjects. We conducted ablation experiments to assess the performance of individual feature extractors, fusion of visual information, and fusion of both visual and audio information. Moreover, we experimented with internal and external cross-validation. Our results demonstrate that the fusion of audio and visual information through late decision fusion improves detection performance, making it a promising tool for fall prevention and mitigation.

翻訳日:2023-10-26 19:49:49 公開日:2023-10-24

# マルチタスク多言語機械翻訳のためのタスクベースMOE

Task-Based MoE for Multitask Multilingual Machine Translation ( http://arxiv.org/abs/2308.15772v3 )

ライセンス: Link先を確認

Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

(参考訳) Mixture-of-experts (MoE) アーキテクチャは多くのアプリケーションで深層モデルのトレーニングにおいて、多様なタスクのための強力な手法であることが証明されている。しかし、現在のMoE実装はタスク非依存であり、異なるタスクから全てのトークンを同じように扱う。そこで本研究では,タスク情報を異なる粒度レベルでMoEモデルに組み込む新しい手法を,動的タスクベースアダプタの共用により設計する。実験と解析により,マルチタスク多言語機械翻訳における高密度および標準MoEモデルに対するアプローチの利点が示された。タスク固有のアダプタでは、モデルを新しいタスクに効率的に一般化することができます。

Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translations. With task-specific adapters, our models can additionally generalize to new tasks efficiently.

翻訳日:2023-10-26 19:48:10 公開日:2023-10-24

# オープンソースツールキットと公開データを用いたウィスパースタイルの再現訓練

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data ( http://arxiv.org/abs/2309.13876v3 )

ライセンス: Link先を確認

Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

(参考訳) 大量のデータで事前学習した音声モデルは、大きな成功を収めている。 OpenAI Whisperは680k時間の教師付き音声データに基づいてトレーニングされた多言語マルチタスクモデルである。ゼロショット設定であっても、音声認識や翻訳のベンチマークによく当てはまる。しかし、そのようなモデルを開発するための完全なパイプライン(データ収集からトレーニングまで)は公開されていないため、研究者がパフォーマンスを改善し、効率性、堅牢性、公正性、バイアスといったトレーニング関連の問題に対処することは困難である。本研究は,オープンソースツールキットと公開データを用いたWhisperスタイルのトレーニングを再現するOpen Whisperスタイル音声モデル(OWSM)を提案する。 owsmはさらに多くの翻訳方向をサポートし、より効率的にトレーニングできる。データ準備、トレーニング、推論、スコアリングに使用されるすべてのスクリプトと、オープンサイエンスを促進するための事前訓練されたモデルとトレーニングログを公開します。

Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness, fairness, and bias. This work presents an Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM even supports more translation directions and can be more efficient to train. We will publicly release all scripts used for data preparation, training, inference, and scoring as well as pre-trained models and training logs to promote open science.

翻訳日:2023-10-26 19:40:55 公開日:2023-10-24

# ジョイントインタラクティブナビゲーションの拡散モデル

A Diffusion-Model of Joint Interactive Navigation ( http://arxiv.org/abs/2309.12508v2 )

ライセンス: Link先を確認

Matthew Niedoba, Jonathan Wilder Lavington, Yunpeng Liu, Vasileios Lioutas, Justice Sefas, Xiaoxuan Liang, Dylan Green, Setareh Dabiri, Berend Zwartsenberg, Adam Scibior, Frank Wood

(参考訳) 自動運転車システムのシミュレーションには、シミュレーションされた交通参加者が多様で現実的な行動を示す必要がある。シミュレーションにおける事前記録された実世界の交通シナリオの使用は、現実主義を保証するが、安全クリティカルイベントの希少さにより、大規模な運転シナリオの収集が高価になる。本稿では,トラフィックシナリオ生成のための拡散ベース手法であるdjinnを提案する。提案手法は,過去,現在,未来からの柔軟な状態観測に基づいて,すべてのエージェントの軌道を協調的に拡散させる。人気トラジェクトリ予測データセットについて,共同トラジェクトリ指標を用いたアートパフォーマンスの現状を報告する。さらに, DJINNは, 目標ベースサンプリング, 行動クラスサンプリング, シナリオ編集など, 様々な価値条件分布からの直接的テストタイムサンプリングを柔軟に行えるかを示した。

Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism, but the rarity of safety-critical events makes large-scale collection of driving scenarios expensive. In this paper, we present DJINN, a diffusion-based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state-of-the-art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.

翻訳日:2023-10-26 19:40:18 公開日:2023-10-24

# mazeデータセットの生成と操作のための構成可能なライブラリ

A Configurable Library for Generating and Manipulating Maze Datasets ( http://arxiv.org/abs/2309.10498v2 )

ライセンス: Link先を確認

Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman R\"auker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn, Samy Wu Fung

(参考訳) 分散シフトに機械学習モデルがどのように反応するかを理解することは、重要な研究課題である。 Mazesは、微妙な分布シフトと顕著な分布シフトの両方をシミュレートするニュアンスなプラットフォームを提供する様々な生成アルゴリズムのために、優れたテストベッドとして機能する。そこで本研究では,maze処理タスクからなるデータセットの生成,処理,視覚化のための包括的なライブラリである$\texttt{maze-dataset}$を提案する。このライブラリを使用すると、研究者はデータセットを簡単に作成でき、使用する生成アルゴリズム、選択したアルゴリズムに供給されるパラメータ、迷路を生成するフィルタを満たさなければならない。さらに、ラスタライズドおよびテキストベースを含む複数の出力フォーマットをサポートし、畳み込みニューラルネットワークと自己回帰トランスフォーマーモデルに対応している。これらのフォーマットは、可視化と変換のためのツールとともに、研究アプリケーションにおける汎用性と適応性を保証する。

Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes serve as an excellent testbed due to varied generation algorithms offering a nuanced platform to simulate both subtle and pronounced distributional shifts. To enable systematic investigations of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks. With this library, researchers can easily create datasets, having extensive control over the generation algorithm used, the parameters fed to the algorithm of choice, and the filters that generated mazes must satisfy. Furthermore, it supports multiple output formats, including rasterized and text-based, catering to convolutional neural networks and autoregressive transformer models. These formats, along with tools for visualizing and converting between them, ensure versatility and adaptability in research applications.

翻訳日:2023-10-26 19:39:27 公開日:2023-10-24

# 編集による要約の改善

Improving Summarization with Human Edits ( http://arxiv.org/abs/2310.05857v2 )

ライセンス: Link先を確認

Zonghai Yao, Benjamin J Schloss, and Sai P. Selvaraj

(参考訳) 近年の研究では、人間のフィードバックパラダイムで学習し、人間の判断による高品質なテキストを生成することが期待されている。既存の作品は、人間のフィードバックを使って、一般的なドメイン抽象要約の大規模言語モデル(LLM)を訓練し、従来よりも質の高い要約を得た。本稿では,より探索の少ない人間のフィードバック,すなわち人間の編集に焦点をあてる。トレーニングループにおいて,人手編集データとモデル生成データの両方を併用する新しい手法であるシーケンスアライメント(un)Likelihood Training(SALT)を提案する。また,既存のトレーニングデータから得られる真実の要約と人手編集のシミュレーションを実演し,トレーニング後に得られたモデル生成要約と合わせて,高価な人手編集データの必要性を低減させる。実験では,一般領域要約から医療領域要約まで,人間のフィードバック探索を拡張した。本研究は,人間および模倣編集による要約品質向上におけるSALTの効果を示す。追加実験により、SALTは従来のRLHF法(人間の嗜好のために設計された)-DPOよりも優れた性能を示した。私たちの論文の証拠は、研究者にさまざまな人間のフィードバックアプローチを精査し、収集し、よりうまく活用するよう促すことを願っています。

Recent work has shown the promise of learning with human feedback paradigms to produce human-determined high-quality text. Existing works use human feedback to train large language models (LLMs) in general domain abstractive summarization and have obtained summary quality exceeding traditional likelihood training. In this paper, we focus on a less explored form of human feedback -- Human Edits. We propose Sequence Alignment (un)Likelihood Training (SALT), a novel technique to use both the human-edited and model-generated data together in the training loop. In addition, we demonstrate simulating Human Edits with ground truth summaries coming from existing training data -- Imitation edits, along with the model-generated summaries obtained after the training, to reduce the need for expensive human-edit data. In our experiments, we extend human feedback exploration from general domain summarization to medical domain summarization. Our results demonstrate the effectiveness of SALT in improving the summary quality with Human and Imitation Edits. Through additional experiments, we show that SALT outperforms the conventional RLHF method (designed for human preferences) -- DPO, when applied to human-edit data. We hope the evidence in our paper prompts researchers to explore, collect, and better use different human feedback approaches scalably.

翻訳日:2023-10-26 19:30:47 公開日:2023-10-24

# アンチレラキシエーション被覆および緩衝ガス充填アルカリ蒸気セルにおける光貯蔵の比較研究

Comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells ( http://arxiv.org/abs/2310.03726v2 )

ライセンス: Link先を確認

Marin {\DH}uji\'c, D. Buhin, N. \v{S}anti\'c, D. Aumiler, and T. Ban

(参考訳) 熱ルビジウム蒸気中における電磁誘導透過 (EIT) を用いた反緩和コーティングおよび緩衝ガス充填アルカリ気相セルの光貯蔵特性の比較検討を行った。バッファーガス充填セルの使用は、抗リラクゼーションコーティングセルと比較して保存時間と効率が約10倍向上した。我々は、ほぼ共鳴のEIT $\Lambda$-スキームを共鳴スキームの代わりに使用することにより、同様のメモリ寿命を維持しながら、バッファガス充填メモリ効率を最大6倍向上させる。本研究は,フィールド展開可能な量子メモリの開発に寄与する。

We perform a comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells using electromagnetically induced transparency (EIT) in warm rubidium vapor. The use of a buffer-gas-filled cell resulted in $\approx$10-fold improvement in storage time and efficiency compared to antirelaxation-coated cells. We achieve up to sixfold enhancement in buffer-gas-filled memory efficiency, while maintaining a similar memory lifetime, by employing a near-resonant EIT $\Lambda$-scheme instead of a resonant one. Our findings contribute to the development of field-deployable quantum memories.

翻訳日:2023-10-26 19:29:41 公開日:2023-10-24

# KGQuiz:大規模言語モデルにおける符号化知識の一般化の評価

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models ( http://arxiv.org/abs/2310.09725v2 )

ライセンス: Link先を確認

Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov

(参考訳) 大規模言語モデル(llm)は知識集約型タスクにおいて顕著な性能を示し、実世界の知識がモデルパラメータにエンコードされていることを示唆する。しかし、限られた知識領域におけるいくつかの探索課題の他に、LLMの知識を体系的に評価する方法や、その知識能力がいかに一般化するかは、知識領域や徐々に複雑化するタスク形式でよく理解されていない。そこで本研究では,LLMの知識一般化能力を総合的に研究するための知識集約型ベンチマークKGQuizを提案する。 KGQuizは3つの知識ドメインをカバーするスケーラブルなフレームワークで、複雑さを増す5つのタスクで構成されている。我々は,LLMの知識能力とその一般化をより深く理解するために,KGQuizベンチマークを用いて,5つの知識集約タスクと知識領域の10個のオープンソースおよびブラックボックスLSMを評価した。大規模な実験では、LLMは簡単な知識のQAタスクにおいて印象的なパフォーマンスを達成する一方で、より複雑な推論やドメイン固有の事実の活用を必要とする設定やコンテキストは依然として重大な課題を呈している。 kgquizをテストベッドとして、ドメインとタスクフォーマット間のパフォーマンスの微妙な変動を分析し、最終的には幅広い知識ドメインとタスクにわたってllmsの知識能力を理解し、評価し、改善することを想定した。

Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.

翻訳日:2023-10-26 19:21:25 公開日:2023-10-24

# ノード回帰/分類のための無限幅グラフニューラルネットワーク

Infinite Width Graph Neural Networks for Node Regression/ Classification ( http://arxiv.org/abs/2310.08176v2 )

ライセンス: Link先を確認

Yunus Cobanoglu

(参考訳) 本研究は,グラフ構造化データ上の完全連結深層ニューラルネットワークの一般化であるグラフニューラルネットワークの解析を行う。 Infinite Width Neural NetworksはDeep LearningをGaussian ProcessesとKernelsに接続している。 Gaussian ProcessesとKernelsは、ニューラルネットワークのハイパーパラメータをはるかに少なくし、不確実性推定に使用できるため、アプリケーションに対してよりユーザフレンドリである。この研究は、ガウス過程とカーネルをニューラルネットワークに接続する研究の量を増やしている。 Kernel と Gaussian Process のクローズドフォームは、標準の Graph Neural Network、Skip-Concatenate Connections を備えた Graph Neural Network、Graph Attention Neural Network など、さまざまなアーキテクチャから派生している。すべてのアーキテクチャは、トランスダクティブノードの回帰と分類のタスクにおいて、さまざまなデータセット上で評価される。さらに、効果的な抵抗として知られるスペクトルスパーシフィケーション手法は、ランタイムとメモリ要求を改善するために使用される。インダクティブグラフ学習タスク(グラフ回帰/分類)への設定の拡張は簡単であり、3.5で簡単に議論される。

This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on Graph structured data, when their width, that is, the number of nodes in each fully connected layer, grows to infinity. Infinite Width Neural Networks connect Deep Learning to Gaussian Processes and Kernels, both Machine Learning Frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have far fewer hyperparameters than Neural Networks and can be used for uncertainty estimation, making them more user-friendly for applications. This work extends the increasing amount of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (Graph Regression/ Classification) is straightforward and is briefly discussed in Section 3.5.

翻訳日:2023-10-26 19:20:36 公開日:2023-10-24

# 最適な探索はトンプソンサンプリングよりも難しくない

Optimal Exploration is no harder than Thompson Sampling ( http://arxiv.org/abs/2310.06069v2 )

ライセンス: Link先を確認

Zhaoqi Li, Kevin Jamieson, Lalit Jain

(参考訳) 腕の組 $\mathcal{Z}\subset \mathbb{R}^d$ と未知のパラメータベクトル $\theta_\ast\mathbb{R}^d$ が与えられたとき、純粋な探索線形バンドイ問題は $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$ を返すことを目的としており、$x^{\top}\theta_{\ast}$ と $x\in \mathcal{X}\subset \mathbb{R}^d$ のノイズ測定による確率が高い。既存の(漸近的に)最適な方法が必要か a) 各アームに対する潜在的にコストがかかるプロジェクション $z\in \mathcal{Z}$ b) それぞれの時点で$\mathcal{Z}$のサブセットを明示的に保持すること。この複雑さは、後悔の最小化のために人気があり単純なトンプソンサンプリングアルゴリズムと矛盾する。これは後続サンプリングとargmaxオラクルへのアクセスを必要とするだけであり、任意の時点で$\mathcal{Z}$を列挙する必要はない。残念ながら、トンプソンサンプリングは純粋な探査に最適ではないことが知られている。最適な探索が可能で、トンプソンサンプリングと同じ計算プリミティブしか必要としないアルゴリズムがあるのだろうか? 私たちはその質問を肯定的に答える。我々はサンプリングとargmaxのみを利用するアルゴリズムを提供し、指数関数収束率を達成し、指数は漸近的に可能な全ての割り当ての中で最適である。さらに,本アルゴリズムは,既存の漸近的最適手法と同様に,容易に実装および実行可能であることを示す。

Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$ with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling oracle and an argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson Sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being asymptotically optimal among all possible allocations. In addition, we show that our algorithm can be easily implemented and performs as well empirically as existing asymptotically optimal methods.
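The two computational primitives the abstract refers to, posterior sampling and an argmax oracle, can be illustrated with one step of plain Thompson Sampling. This is a minimal sketch and not the algorithm proposed in the paper. It assumes a Gaussian posterior with independent coordinates (diagonal covariance), and the function names are hypothetical. The oracle here enumerates a small list of arms only for illustration; in a structured problem it would be replaced by a solver that avoids enumeration.

```python
import random


def sample_posterior(mean, variances, rng):
    """Draw theta from N(mean, diag(variances)); the diagonal posterior is an assumption."""
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mean, variances)]


def argmax_oracle(arms, theta):
    """Return the arm z maximizing the inner product z . theta (stand-in oracle)."""
    return max(arms, key=lambda z: sum(zi * ti for zi, ti in zip(z, theta)))


def thompson_step(arms, mean, variances, rng):
    """One Thompson Sampling step: sample a parameter, then act greedily on the sample."""
    theta = sample_posterior(mean, variances, rng)
    return argmax_oracle(arms, theta)


rng = random.Random(0)
arms = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
chosen = thompson_step(arms, mean=[0.2, 0.9], variances=[0.5, 0.5], rng=rng)
print(chosen)
```

With zero posterior variance the step reduces to a greedy choice under the posterior mean, which is one way to sanity-check the two primitives in isolation.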

翻訳日:2023-10-26 19:19:33 公開日:2023-10-24

# Segue: 現実世界における顔のプライバシー保護のための、サイド情報誘導型の生成的学習不能例

Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World ( http://arxiv.org/abs/2310.16061v1 )

ライセンス: Link先を確認

Zhiling Zhang, Jie Zhang, Kui Zhang, Wenbo Zhou, Weiming Zhang, and Nenghai Yu

(参考訳) 顔認識技術の普及は、多くの個人が顔データの収集と利用を心配しているため、プライバシー上の懸念を引き起こしている。これらの懸念に対処するため、研究者はモデルトレーニング段階におけるデータに知覚不可能な摂動を加えることで、モデルが対象の顔の特徴を識別するのを防ぐことを目的として、「非学習可能な例」の概念を積極的に検討している。しかし、現在の手法は非効率であり、トランスファービリティとロバスト性を同時に保証できないため、現実世界では非現実性を引き起こす。そこで本研究では,sgue: side-information guided generative unlearnable という新しい手法を提案する。具体的には,一度学習したマルチユースモデルを用いて,時間消費勾配法ではなく,所望の摂動を生成する。転送性を改善するために,各シナリオに固有の真のラベルや擬似ラベルなどの側面情報を導入する。堅牢性向上のため、トレーニングパイプラインには歪み層が組み込まれている。広範な実験により、提案法が従来の方法よりはるかに高速であることが証明され(1000$\times$)、異なるデータセットとモデルアーキテクチャ間で転送可能な効率性を達成する。さらに、JPEG圧縮、敵トレーニング、およびいくつかの標準的なデータ拡張に抵抗することができる。

The widespread use of face recognition technology has given rise to privacy concerns, as many individuals are worried about the collection and utilization of their facial data. To address these concerns, researchers are actively exploring the concept of "unlearnable examples", in which an imperceptible perturbation is added to data in the model training stage to prevent the model from learning discriminative features of the target face. However, current methods are inefficient and cannot guarantee transferability and robustness at the same time, making them impractical in the real world. To remedy this, we propose a novel method called Segue: Side-information guided generative unlearnable examples. Specifically, we leverage a once-trained, multiple-use model to generate the desired perturbation rather than the time-consuming gradient-based method. To improve transferability, we introduce side information such as true labels and pseudo labels, which are inherently consistent across different scenarios. For robustness enhancement, a distortion layer is integrated into the training pipeline. Extensive experiments demonstrate that the proposed Segue is much faster than previous methods (1000$\times$) and achieves transferable effectiveness across different datasets and model architectures. Furthermore, it can resist JPEG compression, adversarial training, and some standard data augmentations.

翻訳日:2023-10-26 19:10:31 公開日:2023-10-24

# GOES-16衛星観測を用いた対流発生ナウキャスティングのための物理的に説明可能な深層学習

Physically Explainable Deep Learning for Convective Initiation Nowcasting Using GOES-16 Satellite Observations ( http://arxiv.org/abs/2310.16015v1 )

ライセンス: Link先を確認

Da Fan, Steven J. Greybush, David John Gagne II, and Eugene E. Clothiaux

(参考訳) Convection Initiation (CI) nowcasting は、数値天気予報モデルと既存の nowcasting アルゴリズムの両方において難しい問題である。本研究では,多チャンネル赤外線GOES-R衛星観測に基づくCI予測のためのオブジェクトベース確率的深層学習モデルを開発した。このデータは、2020年6月から2021年6月にかけて、グレートプレーンズ地域のマルチレーダーマルチセンサードップラー気象レーダ製品で発見されたciの可能性のある事象に関するパッチから得られたものだ。客観的なレーダーベースのアプローチは、これらのイベントを識別するために使用される。ディープラーニングモデルは、特に誤報率において、リードタイムで最大1時間までの古典的ロジスティックモデルを著しく上回る。ケーススタディを通じて、深層学習モデルは、雲と湿気の特性に複数のレベルで依存することを示す。モデル説明は、モデルの決定過程を異なるベースラインで明らかにする。説明結果は,ベースラインの選択によって異なるレベルの水分と雲の特徴の重要性を強調した。本研究は, モデル行動の理解を深め, 科学的洞察を得る上で, 異なるベースラインを用いることの利点を示す。

Convection initiation (CI) nowcasting remains a challenging problem for both numerical weather prediction models and existing nowcasting algorithms. In this study, object-based probabilistic deep learning models are developed to predict CI based on multichannel infrared GOES-R satellite observations. The data come from patches surrounding potential CI events identified in Multi-Radar Multi-Sensor Doppler weather radar products over the Great Plains region from June and July 2020 and June 2021. An objective radar-based approach is used to identify these events. The deep learning models significantly outperform the classical logistic model at lead times up to 1 hour, especially on the false alarm ratio. Through case studies, the deep learning model exhibits the dependence on the characteristics of clouds and moisture at multiple levels. Model explanation further reveals the model's decision-making process with different baselines. The explanation results highlight the importance of moisture and cloud features at different levels depending on the choice of baseline. Our study demonstrates the advantage of using different baselines in further understanding model behavior and gaining scientific insights.

翻訳日:2023-10-26 19:09:57 公開日:2023-10-24

# 量子シミュレータにおける位相励起と創発フェルミオンのゲージ冷却

Gauged cooling of topological excitations and emergent fermions on quantum simulators ( http://arxiv.org/abs/2310.16082v1 )

ライセンス: Link先を確認

Gilad Kishony, Mark S. Rudner, Achim Rosch, Erez Berg

(参考訳) シミュレーション冷却は、短期量子シミュレータ上で多体ハミルトニアンの低エネルギー状態を作成するための堅牢な方法である。このようなスキームでは、シミュレータのスピン(またはキュービット)のサブセットは、興味のあるシステムからエネルギーとエントロピーを抽出する「バス」として扱われる。しかし、このようなプロトコルは、トポロジカル位相のような微視的な自由度で励起が極めて非局所的なシステムに適用される場合、非効率であり、そのような励起は浴への局所結合によって抽出することが困難である。我々は,システムの自由度を非局所的に量子シミュレータに符号化することで,この障害を克服するための経路を探究する。提案手法を説明するために,IsingスピンをZ_2$ゲージ場に結合し,励起を除去するための貯留体として同時に機能する"ゲージ冷却"プロトコルを用いて,励起がドメイン壁である量子Isingモデルの強磁性相を効率的に冷却する方法を示す。本プロトコルは強磁性相と常磁性相の基底状態を等しく効率的に作成できることを示す。ゲージ冷却プロトコルは自然に(相互作用する)フェルミオン系に拡張され、単一フェルミオンホッピングによるフェルミオン浴とのカップリングによる冷却に相当する。

Simulated cooling is a robust method for preparing low-energy states of many-body Hamiltonians on near-term quantum simulators. In such schemes, a subset of the simulator's spins (or qubits) are treated as a "bath," which extracts energy and entropy from the system of interest. However, such protocols are inefficient when applied to systems whose excitations are highly non-local in terms of the microscopic degrees of freedom, such as topological phases of matter; such excitations are difficult to extract by a local coupling to a bath. We explore a route to overcome this obstacle by encoding the system's degrees of freedom into those of the quantum simulator in a non-local manner. To illustrate the approach, we show how to efficiently cool the ferromagnetic phase of the quantum Ising model, whose excitations are domain walls, via a "gauged cooling" protocol in which the Ising spins are coupled to a $Z_2$ gauge field that simultaneously acts as a reservoir for removing excitations. We show that our protocol can prepare the ground states of the ferromagnetic and paramagnetic phases equally efficiently. The gauged cooling protocol naturally extends to (interacting) fermionic systems, where it is equivalent to cooling by coupling to a fermionic bath via single-fermion hopping.

翻訳日:2023-10-26 19:01:54 公開日:2023-10-24

# リニア変圧器の実用計算力とその繰り返し・自己参照拡張

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions ( http://arxiv.org/abs/2310.16076v1 )

ライセンス: Link先を確認

Kazuki Irie, R\'obert Csord\'as, J\"urgen Schmidhuber

(参考訳) 最近のリカレントニューラルネットワーク(RNN)の計算能力の研究は、リアルタイムおよび有限精度の仮定を与えられたRNNアーキテクチャの階層構造を明らかにしている。本稿では,線形変換器 (LT) やFWP (Fast Weight Programmers) を線形化した自動回帰変換器について検討する。 LTは固定サイズのRNNライクなシーケンスプロセッサと等価であるという意味で特有であり、今や人気になっている自己アテンションネットワークとしても表現できる。本稿では,標準トランスフォーマーのLT/FWPへの直接転送について述べる。正規言語認識実験により,fwpや自己回帰重み行列といった最近提案されたfwp拡張が,例えばパリティ問題の一般化を可能にするltの制限を克服することに成功したことを示す。私たちのコードは公開されています。

Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public.
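The equivalence mentioned above, between a self-attention view and an RNN-like processor with a fixed-size state, can be checked numerically on a toy case. The sketch below is a simplification and not the authors' code: it assumes causal linear attention with an identity feature map and no normalization, and the function names are hypothetical. The recurrent form keeps a single "fast weight" matrix that is updated with an outer product of value and key at each step.

```python
def attention_form(keys, values, queries):
    """Causal linear attention: y_t = sum over i <= t of (q_t . k_i) * v_i."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            score = sum(qi * ki for qi, ki in zip(q, k))
            for j, vj in enumerate(v):
                out[j] += score * vj
        outputs.append(out)
    return outputs


def recurrent_form(keys, values, queries):
    """Same computation with a fixed-size state: W_t = W_{t-1} + v_t k_t^T, y_t = W_t q_t."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # fast weight matrix, shape d_v x d_k
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for a in range(d_v):
            for b in range(d_k):
                state[a][b] += v[a] * k[b]  # outer-product update
        outputs.append(
            [sum(state[a][b] * q[b] for b in range(d_k)) for a in range(d_v)]
        )
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
queries = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(attention_form(keys, values, queries))
print(recurrent_form(keys, values, queries))
```

The attention form revisits all past keys and values at every step, while the recurrent form stores only the d_v x d_k matrix, which is the fixed-size state the abstract refers to.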

翻訳日:2023-10-26 19:01:30 公開日:2023-10-24

# RePoseDM: Pose Guided Image Synthesis における繰り返しポッドアライメントとグラディエントガイダンス

RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis ( http://arxiv.org/abs/2310.16074v1 )

ライセンス: Link先を確認

Anant Khandelwal

(参考訳) ポーズ誘導された人物画像合成タスクは、フォトリアリスティックな外観と欠陥のないポーズ転送を備えた参照イメージを再レンダリングする必要がある。人物画像は高度に構造化されているため、既存のアプローチでは複雑な変形や閉塞のために密接な接続を必要としている。しかし畳み込みニューラルネットワークによって生成される特徴マップには等分散性がなく、したがってマルチレベルウォーピングでさえ完全なポーズアライメントを持っていない。拡散モデルが与えられた条件付きガイダンスからフォトリアリスティックな画像を生成する能力に着想を得て,ポーズアライメントを条件付きガイダンスとして提案する。さらに,対象ポーズからの距離を入力として適切なポーズ多様体から出力するポーズ相互作用場からの勾配誘導を提案する。これは、フォトリアリズムと非歪なテクスチャの詳細をもたらす、もっともらしいポーズ伝達軌道の学習に役立つ。 2つの大規模ベンチマークとユーザ調査の結果から,提案手法が課題シナリオにおいて,フォトリアリスティックなポーズ伝達を生成する可能性を実証した。また,HumanArtデータセット上でのポーズ誘導画像生成における勾配誘導の効率性を示す。

The pose-guided person image synthesis task requires re-rendering a reference image with a photorealistic appearance and flawless pose transfer. Since person images are highly structured, existing approaches require dense connections for complex deformations and occlusions because these are generally handled through multi-level warping and masking in latent space. But the feature maps generated by convolutional neural networks do not have equivariance, and hence even multi-level warping does not achieve perfect pose alignment. Inspired by the ability of the diffusion model to generate photorealistic images from the given conditional guidance, we propose recurrent pose alignment to provide pose-aligned texture features as conditional guidance. Moreover, we propose gradient guidance from pose interaction fields, which output the distance from the valid pose manifold given a target pose as input. This helps in learning plausible pose transfer trajectories that result in photorealism and undistorted texture details. Extensive results on two large-scale benchmarks and a user study demonstrate the ability of our proposed approach to generate photorealistic pose transfer under challenging scenarios. Additionally, we demonstrate the efficiency of gradient guidance in pose-guided image generation on the HumanArt dataset with fine-tuned stable diffusion.

翻訳日:2023-10-26 19:01:11 公開日:2023-10-24

# ビデオにおける非偏在シーングラフ生成の相関バイアス

Correlation Debiasing for Unbiased Scene Graph Generation in Videos ( http://arxiv.org/abs/2310.16073v1 )

ライセンス: Link先を確認

Anant Khandelwal

(参考訳) ビデオからの動的シーングラフ生成(SGG)は、時間的変動に起因するシーン全体のオブジェクトを包括的に理解するだけでなく、時間的動きと異なるオブジェクトとの相互作用のモデルを必要とする。さらに、視覚関係のロングテール分布は、多くの動的sgg法において重要なボトルネックであり、そのほとんどが複雑なアーキテクチャを用いて時空間的コンテキストを捉えることに焦点を当てており、バイアス付きシーングラフの生成に繋がる。これらの課題に対処するために,フローアウェアな時間的一貫性と不確実性との相関脱バイアスを,非バイアス動的シーングラフに対して提案する。 FloCoDeはフローを使ってフレーム全体の時間的に一貫したオブジェクトを検出する。さらに、ロングテールクラスの偏りのない関係表現を学ぶために相関デバイアスを用いる。さらに、予測の不確実性を弱めるために、sgmoidal cross-entropy loss と contrastive loss を混合してラベル相関を組み込んで、共通の共起関係を識別し、長い尾を持つ関係を弱めるのに役立つ。大規模な実験的評価は、より偏りのないシーングラフを生成する優位性を示す最大4.1%のパフォーマンス向上を示している。

Dynamic scene graph generation (SGG) from videos requires not only a comprehensive understanding of objects across scenes that are prone to temporal fluctuations but also a model of the temporal motions and interactions between different objects. Moreover, the long-tailed distribution of visual relationships is the crucial bottleneck of most dynamic SGG methods, since most of them focus on capturing spatio-temporal context using complex architectures, which leads to the generation of biased scene graphs. To address these challenges, we propose FloCoDe: Flow-aware temporal consistency and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs. FloCoDe employs feature warping using flow to detect temporally consistent objects across the frames. In addition, it uses correlation debiasing to learn the unbiased relation representation for long-tailed classes. Moreover, to attenuate the predictive uncertainties, it uses a mixture of sigmoidal cross-entropy loss and contrastive loss to incorporate label correlations to identify the commonly co-occurring relations and help debias the long-tailed ones. Extensive experimental evaluation shows a performance gain of up to 4.1%, demonstrating the superiority of the approach in generating more unbiased scene graphs.

翻訳日:2023-10-26 19:00:50 公開日:2023-10-24

# 畳み込みLSTMを用いた大学キャンパスにおける格子周波数予測

Grid Frequency Forecasting in University Campuses using Convolutional LSTM ( http://arxiv.org/abs/2310.16071v1 )

ライセンス: Link先を確認

Aneesh Sathe, Wen Ren Yang

(参考訳) 現代の電力網は複雑化しており、主に再生可能エネルギー源の統合と消費パターンの進化に起因している。本稿では,畳み込みニューラルネットワーク(CNN)とLong Short-Term Memory(LSTM)を用いて,グリッド周波数の時系列予測モデルを構築する手法を提案する。これらのモデルは、グリッド周波数データに固有の時空間的複雑さを効果的に捉え、予測精度を著しく向上し、電力グリッドの信頼性を高める。本研究は,大学キャンパス内の建物を対象とした個別コンボリューショナルLSTM(ConvLSTM)モデルの可能性と開発について検討し,各建物に対して個別に学習し,評価することを可能にする。個々のConvLSTMモデルは、各キャンパスビルの電力消費データに基づいて訓練され、歴史的傾向に基づいてグリッド周波数を予測する。その結果、平均二乗誤差(mse)、平均絶対誤差(mae)、平均絶対パーセンテージ誤差(mape)といった性能指標によって示される従来の予測手法よりも、提案モデルが優れていることが示された。さらに、アンサンブルモデルによって、建物固有のモデルから洞察を集約し、キャンパス全体に包括的な予測を提供する。このアプローチは、各建物固有の電力消費データのプライバシーとセキュリティを保証する。

The modern power grid is facing increasing complexities, primarily stemming from the integration of renewable energy sources and evolving consumption patterns. This paper introduces an innovative methodology that harnesses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to establish robust time series forecasting models for grid frequency. These models effectively capture the spatiotemporal intricacies inherent in grid frequency data, significantly enhancing prediction accuracy and bolstering power grid reliability. The research explores the potential and development of individualized Convolutional LSTM (ConvLSTM) models for buildings within a university campus, enabling them to be independently trained and evaluated for each building. Individual ConvLSTM models are trained on power consumption data for each campus building and forecast the grid frequency based on historical trends. The results convincingly demonstrate the superiority of the proposed models over traditional forecasting techniques, as evidenced by performance metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Additionally, an Ensemble Model is formulated to aggregate insights from the building-specific models, delivering comprehensive forecasts for the entire campus. This approach ensures the privacy and security of power consumption data specific to each building.
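The three error metrics named in this abstract can be written down compactly. The sketch below is only an illustration of their standard definitions; the frequency values are made up for the example and are not results from the paper.

```python
# Standard definitions of MSE, MAE and MAPE for a forecast series.
# The sample values below are hypothetical, not data from the paper.

def mse(actual, predicted):
    """Mean Square Error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [50.0, 50.0, 40.0]
predicted = [49.0, 51.0, 44.0]
scores = (mse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

MSE penalises large deviations more heavily than MAE, while MAPE is scale-free, which is why forecasting papers commonly report all three together.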

翻訳日:2023-10-26 19:00:26 公開日:2023-10-24

# 交通予測のための時空間ハイパーグラフニューラルネットワーク

Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting ( http://arxiv.org/abs/2310.16070v1 )

ライセンス: Link先を確認

Chengzhi Yao, Zhi Li, Junbo Wang

(参考訳) モバイルインターネット開発と位置技術から恩恵を受ける交通予測は、インテリジェントトランスポーテーションシステムにおいて重要な役割を果たす。豊かで多様な交通アプリケーションの実装や、収集された交通データに基づいた便利な交通サービスの実現に役立ちます。既存のほとんどの手法はグラフベースのディープラーニングネットワークを利用して、交通予測を浅くする複雑な道路ネットワークをモデル化する。その効果にもかかわらず、これらの手法は一般に道路網のトポロジと交通力学による高次時間的依存関係によって引き起こされる高次空間依存性を完全に捉えることに制限されている。道路網のトポロジと交通力学を組み合わせて交通データの高次時空間依存性を捕捉するSTHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Networkを提案する。技術的には、STHODEは空間モジュールと時間モジュールから構成される。一方,空間ハイパーグラフを構築し,適応型mixhopハイパーグラフodeネットワークを用いて高次空間依存性をキャプチャする。一方,時間的ハイパーグラフを用い,ハイパーエッジ進化型odeネットワークを用いて高次時間的依存関係をキャプチャする。最後に、積み重ねられたSTHODE層の出力を集約し、予測性能を相互に向上する。 4つの実世界のトラヒックデータセットで行った広範囲な実験により、提案モデルの性能が様々なベースラインよりも優れていることを示した。

Traffic forecasting, which benefits from mobile Internet development and positioning technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods leverage graph-based deep learning networks that model the complex road network only shallowly for traffic forecasting. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of the traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.

翻訳日:2023-10-26 19:00:01 公開日:2023-10-24

# cpseg:chain-of-thought languageプロンプトによる細かな画像意味セグメンテーション

CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting ( http://arxiv.org/abs/2310.16069v1 )

ライセンス: Link先を確認

Lei Li

(参考訳) 自然シーン分析とリモートセンシング画像は、大規模言語誘導コンテキスト認識データ利用の進歩に大きな可能性を秘めている。このポテンシャルは、設計言語プロンプトによるオブジェクト検出やセグメンテーションといった下流タスクのパフォーマンス向上に特に重要である。そこで本稿では,画像に関連づけられたテキスト情報を活用した新たな「思考の連鎖」プロセスを統合することにより,画像分割性能を向上させるための革新的なフレームワークである cpseg を紹介する。この画期的なアプローチは洪水災害のシナリオに適用されている。 CPSegは、様々な文から派生したプロンプトテキストを符号化し、コヒーレント連鎖を定式化する。我々は、画像、セマンティックマスク、および対応するテキスト情報を含む新しい視覚言語データセット、FloodPromptを提案する。これはシナリオの意味的理解を強化するだけでなく、ピクセルとテキストのマッチングマップの相互作用を通じて意味的セグメンテーションの重要なタスクを支援する。 CPSegの有効性を質的,定量的に検証した。

Natural scene analysis and remote sensing imagery offer immense potential for advancements in large-scale language-guided context-aware data utilization. This potential is particularly significant for enhancing performance in downstream tasks such as object detection and segmentation with designed language prompting. In light of this, we introduce CPSeg (Chain-of-Thought Language Prompting for Finer-grained Semantic Segmentation), an innovative framework designed to augment image segmentation performance by integrating a novel "Chain-of-Thought" process that harnesses textual information associated with images. This groundbreaking approach has been applied to a flood disaster scenario. CPSeg encodes prompt texts derived from various sentences to formulate a coherent chain-of-thought. We propose a new vision-language dataset, FloodPrompt, which includes images, semantic masks, and corresponding text information. This not only strengthens the semantic understanding of the scenario but also aids in the key task of semantic segmentation through an interplay of pixel and text matching maps. Our qualitative and quantitative analyses validate the effectiveness of CPSeg.

翻訳日:2023-10-26 18:59:38 公開日:2023-10-24

# 超次元変換:関数のホログラフィック表現

The Hyperdimensional Transform: a Holographic Representation of Functions ( http://arxiv.org/abs/2310.16065v1 )

ライセンス: Link先を確認

Pieter Dewulf, Michiel Stock, Bernard De Baets

(参考訳) 積分変換は、関数をキャラクタリゼーションが容易な空間にマッピングする貴重な数学的ツールである。我々は超次元変換を新しい積分変換として導入する。正方積分可能な関数を超次元ベクトルと呼ばれるノイズロバスト、ホログラフィック、高次元表現に変換する。中心となる考え方は、関数をランダム関数の線型結合で近似することである。確率的直交基底関数の集合を正式に導入し、超次元変換とその逆写像を定義する。本稿では、その特異性、逆変換の近似特性、積分と微分の表現など、一般的な変換関連特性について論じる。超次元変換は、フーリエ変換、ラプラス変換、ファジィ変換などの他の積分変換と密接に結合する強力で柔軟なフレームワークを提供する。さらに、より効率的で説明可能な機械学習アルゴリズムに急速に注目を集めている超次元コンピューティングの分野に対する理論的基礎と新しい洞察を提供し、統計モデリングや機械学習の潜在的な応用の可能性を提供する。さらに,チュートリアルとして機能し,変換の計算から微分方程式の解法まで,実例の再現を可能にする,簡単で分かりやすいコードも提供する。

Integral transforms are invaluable mathematical tools to map functions into spaces where they are easier to characterize. We introduce the hyperdimensional transform as a new kind of integral transform. It converts square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors. The central idea is to approximate a function by a linear combination of random functions. We formally introduce a set of stochastic, orthogonal basis functions and define the hyperdimensional transform and its inverse. We discuss general transform-related properties such as its uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives. The hyperdimensional transform offers a powerful, flexible framework that connects closely with other integral transforms, such as the Fourier, Laplace, and fuzzy transforms. Moreover, it provides theoretical foundations and new insights for the field of hyperdimensional computing, a computing paradigm that is rapidly gaining attention for efficient and explainable machine learning algorithms, with potential applications in statistical modelling and machine learning. In addition, we provide straightforward and easily understandable code, which can function as a tutorial and allows for the reproduction of the demonstrated examples, from computing the transform to solving differential equations.

翻訳日:2023-10-26 18:59:19 公開日:2023-10-24

# 学習可能なフィルタモジュールによる交通予測の強化

Enhancing Traffic Prediction with Learnable Filter Module ( http://arxiv.org/abs/2310.16063v1 )

ライセンス: Link先を確認

Yuanshao Zhu, Yongchao Ye, Xiangyu Zhao, and James J.Q. Yu

(参考訳) 将来の交通条件のモデル化は、時間的および空間的相関を捉えるために複雑な空間-時間的ニューラルネットワークに大きく依存することが多い。このノイズは、しばしば交通観測における予期せぬ短期的なピークや落下として現れ、交通事故や固有のセンサー振動によって引き起こされる。実際には、そのようなノイズはその確率的性質のためにモデル化することが困難であり、ニューラルネットワークがこの振る舞いを学習するように設計された場合、リスクを過度に当てはめる可能性がある。この問題に対処するために,トラフィックデータのノイズを適応的にフィルタする学習可能なフィルタモジュールを提案する。このモジュールはフーリエ変換を利用して、そのパターンに基づいてノイズがフィルタリングされる周波数領域にデータを変換する。離散データは逆フーリエ変換を用いて時間領域に復元される。提案手法は,交通予測モデルにおける入力データの品質向上に重点を置いている。提案するモジュールは軽量であり,既存モデルとの統合が容易であり,トラフィック予測性能を大幅に向上できることを示す。さらに,実世界のデータセットに対する広範囲な実験結果を用いて検証を行い,ノイズを効果的に軽減し,予測精度を向上させることを示す。

Modeling future traffic conditions often relies heavily on complex spatial-temporal neural networks to capture spatial and temporal correlations, which can overlook the inherent noise in the data. This noise, often manifesting as unexpected short-term peaks or drops in traffic observation, is typically caused by traffic accidents or inherent sensor vibration. In practice, such noise can be challenging to model due to its stochastic nature and can lead to overfitting risks if a neural network is designed to learn this behavior. To address this issue, we propose a learnable filter module to filter out noise in traffic data adaptively. This module leverages the Fourier transform to convert the data to the frequency domain, where noise is filtered based on its pattern. The denoised data is then recovered to the time domain using the inverse Fourier transform. Our approach focuses on enhancing the quality of the input data for traffic prediction models, which is a critical yet often overlooked aspect in the field. We demonstrate that the proposed module is lightweight, easy to integrate with existing models, and can significantly improve traffic prediction performance. Furthermore, we validate our approach with extensive experimental results on real-world datasets, showing that it effectively mitigates noise and enhances prediction accuracy.
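The pipeline this abstract describes (transform to the frequency domain, filter, transform back) can be sketched as follows. This is a minimal illustration, not the authors' module: the filter weights are fixed by hand here, whereas the paper learns them, and the naive DFT and all function names are assumptions made for a dependency-free example.

```python
import cmath
import math

# Sketch of frequency-domain filtering. Assumption: hand-set real weights
# stand in for the learnable filter parameters described in the abstract.

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse transform; keeps the real part, which is the signal for symmetric weights."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]


def frequency_filter(signal, weights):
    """Scale each frequency bin by its weight, then return to the time domain."""
    filtered = [w * c for w, c in zip(weights, dft(signal))]
    return idft(filtered)


n = 8
slow = [math.cos(2 * math.pi * t / n) for t in range(n)]   # underlying pattern
noisy = [s + 0.5 * (-1) ** t for t, s in enumerate(slow)]  # plus fast alternating noise
weights = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]          # suppress bin n/2 only
denoised = frequency_filter(noisy, weights)
```

In this toy case the alternating component occupies a single frequency bin, so zeroing that bin recovers the slow pattern; a learned filter would instead adjust all weights from data.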

翻訳日:2023-10-26 18:59:01 公開日:2023-10-24

# 調整済み大規模モデルの逆行領域適応における共同設立者のバランシング

Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning ( http://arxiv.org/abs/2310.16062v1 )

ライセンス: Link先を確認

Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu

(参考訳) プレトレーニング済みの大規模モデル(PLM)における優れた一般化、文脈学習、および出現能力は、直接トレーニングデータなしで特定のタスクを処理し、ソースドメインから学習した知識をターゲットドメインに転送するために、敵対的ドメイン適応(ADA)手法のより良い基礎モデルとなる。しかし、既存のadaメソッドは、ターゲットドメインと異なるソースデータ分散の根本原因である、confounderを適切に考慮していない。本研究では, PLMs fine-tuning (ADA-CBF) のための共創バランシングを用いた対向ドメイン適応を提案する。 ADA−CBFは、特徴抽出器、ドメイン分類器及び共同分類器の基盤モデルとしてPLMを含み、対向的損失で共同訓練される。この損失は、ドメイン分類器の識別を希釈することで、ドメイン不変表現学習を改善するために設計されている。同時に、敵対的損失は、トレーニング中のソースドメインと未測定ドメインの共作者分布のバランスをとる。既存のADA法と比較して、ADA-CBFはドメイン不変の特徴の共創者を正しく識別し、PLMから抽出した特徴の共創バイアスを取り除くことができる。 ADA-CBFの共創者分類器はプラグアンドプレイとして設計されており、共創者計測可能、測定不能、または部分的に測定可能な環境に適用することができる。自然言語処理とコンピュータビジョンダウンストリームタスクの実証結果は、ADA-CBFが最新のGPT-4, LLaMA2, ViT, ADAメソッドより優れていることを示している。

The excellent generalization, contextual learning, and emergent abilities of pre-trained large models (PLMs) let them handle specific tasks without direct training data, making them better foundation models for adversarial domain adaptation (ADA) methods that transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to properly account for the confounder, which is the root cause of the difference between the source data distribution and that of the target domains. This study proposes adversarial domain adaptation with confounder balancing for PLM fine-tuning (ADA-CBF). ADA-CBF includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, which are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in ADA-CBF is designed as a plug-and-play component and can be applied in environments where the confounder is measurable, unmeasurable, or partially measurable. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.

翻訳日:2023-10-26 18:58:41 公開日:2023-10-24

# グラフを用いた分散オンライン学習のための局所的個人的勾配追跡

Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs ( http://arxiv.org/abs/2310.16105v1 )

ライセンス: Link先を確認

Ziqin Chen and Yongqiang Wang

(参考訳) 分散オンライン学習は、ストリーミングデータを含む大規模な機械学習問題を解決するのに極めて効果的であることが証明されている。しかし、分散学習における学習者間の情報共有は、個々の学習者のセンシティブなデータの漏洩を懸念させる。このリスクを軽減するため、分散オンライン学習において、プライバシー保護の「金の標準」として広く見なされている差分プライバシーが、多くの既存の結果に広く採用されている。しかし、これらの結果はしばしば、学習精度とプライバシーの根本的なトレードオフに直面します。本稿では,このトレードオフを回避するために,局所的微分勾配追跡に基づく分散オンライン学習アルゴリズムを提案する。解析の結果,提案アルゴリズムは局所的差分プライバシーの厳密性を確保しつつ,平均二乗に収束し,反復回数が無限大となる場合においても,累積的プライバシー予算が有限であることが保証された。このアルゴリズムは、学習者間のコミュニケーショングラフが向けられた場合でも適用できる。私たちの知る限りでは、有向グラフ上の分散オンライン学習において、学習精度と厳密な局所微分プライバシーを同時に確保する最初の結果です。我々は,Mushroomsデータセットのロジスティック回帰と,MNISTデータセットとCIFAR-10データセットのCNN画像分類を含む,複数のベンチマーク機械学習アプリケーションを用いて,アルゴリズムの性能を評価する。実験の結果,提案アルゴリズムが既存のアルゴリズムよりも精度が向上していることが確認された。

Distributed online learning has been proven extremely effective in solving large-scale machine learning problems involving streaming data. However, information sharing between learners in distributed learning also raises concerns about the potential leakage of individual learners' sensitive data. To mitigate this risk, differential privacy, which is widely regarded as the "gold standard" for privacy protection, has been widely employed in many existing results on distributed online learning. However, these results often face a fundamental tradeoff between learning accuracy and privacy. In this paper, we propose a locally differentially private gradient tracking based distributed online learning algorithm that successfully circumvents this tradeoff. Our analysis shows that the proposed algorithm converges in mean square to the exact optimal solution while ensuring rigorous local differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. The algorithm is applicable even when the communication graph among learners is directed. To the best of our knowledge, this is the first result that simultaneously ensures learning accuracy and rigorous local differential privacy in distributed online learning over directed graphs. We evaluate our algorithm's performance by using multiple benchmark machine-learning applications, including logistic regression on the "Mushrooms" dataset and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets, respectively. The experimental results confirm that the proposed algorithm outperforms existing counterparts in both training and testing accuracies.

翻訳日:2023-10-26 18:49:48 公開日:2023-10-24

# LaksNet:Udacityシミュレーターにおける自動運転車のエンドツーエンドディープラーニングモデル

LaksNet: an end-to-end deep learning model for self-driving cars in Udacity simulator ( http://arxiv.org/abs/2310.16103v1 )

ライセンス: Link先を確認

Lakshmikar R. Polamreddy and Youshan Zhang

(参考訳) 道路事故の大半は、注意散らし、無謀さ、飲酒運転など、人間の間違いによるものである。この危険な状況を克服する効果的な方法の1つは、車両に自動運転技術を実装することである。本稿では、自動運転車のための効率的なディープラーニングモデルの構築に着目する。本研究では、4つの畳み込み層と2つの完全連結層からなる新しい効果的畳み込みニューラルネットワークモデル「ラークスネット」を提案する。 Udacityシミュレータから生成されたトレーニングデータを用いて,LaksNetモデルを用いた広範な実験を行った。我々のモデルは、シミュレーターのトラックを降りることなく走行する車の走行時間において、既存のImageNetやNVIDIAモデルよりも優れています。

The majority of road accidents occur because of human errors, including distraction, recklessness, and drunken driving. One of the effective ways to overcome this dangerous situation is by implementing self-driving technologies in vehicles. In this paper, we focus on building an efficient deep-learning model for self-driving cars. We propose a new and effective convolutional neural network model called 'LaksNet', consisting of four convolutional layers and two fully connected layers. We conduct extensive experiments using our LaksNet model with the training data generated from the Udacity simulator. Our model outperforms many existing pre-trained ImageNet and NVIDIA models in terms of how long the car drives on the simulator without going off the track.

翻訳日:2023-10-26 18:49:23 公開日:2023-10-24

# 光子効率多光子顕微鏡のための学習・不確実性駆動型適応獲得

Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy ( http://arxiv.org/abs/2310.16102v1 )

ライセンス: Link先を確認

Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You

(参考訳) 多光子顕微鏡(MPM)は強力なイメージングツールであり、生体組織イメージングにおいて重要な効果がある。しかし、ほとんどの多光子顕微鏡プラットフォームは点走査に依存しているため、取得時間、視野(fov)、光毒性、および画質の間に固有のトレードオフがあり、高速で大きなfov、および/または穏やかな撮像が必要な場合、ノイズの測定結果が発生することが多い。深層学習は多光子顕微鏡測定に応用できるが、これらのアルゴリズムは幻覚を引き起こす傾向があり、医学や科学の分野では破滅的なものである。本稿では,多光子画像計測における画素方向の不確かさを同時に推定し,アルゴリズムの信頼性を改善し,深層学習予測のための統計的保証を提供する手法を提案する。さらに,この学習された画素単位の不確実性を利用して,サンプルの最も不確実な領域のみをスキャンする適応的取得手法を提案する。本研究では,ヒト子宮内膜組織のMPM測定実験において,微細な特徴を維持でき,各画素における不確かさを予測しながら,他の denoising 法より優れていることを示す。最後に, 適応的獲得手法を用いて, 試料中の微細な特徴を回収しながら, 120倍の取得時間と全光量削減効果を示した。実実験データを用いた復調作業における分布自由不確実性定量化と再構成不確実性に基づく適応的獲得の提案を最初に行った。

Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty.
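The idea of rescanning only the most uncertain regions can be illustrated with a simple selection step. The abstract does not state the actual selection rule, so the top-fraction criterion below, the function name, and the toy uncertainty map are all assumptions made for illustration.

```python
# Sketch: choose which pixels to rescan from a pixel-wise uncertainty map.
# Assumption: a fixed rescan budget expressed as a fraction of all pixels.

def select_rescan_pixels(uncertainty, fraction):
    """Return (row, col) coordinates of the most uncertain pixels, most uncertain first."""
    coords = [(r, c) for r, row in enumerate(uncertainty) for c in range(len(row))]
    budget = max(1, int(len(coords) * fraction))  # always rescan at least one pixel
    coords.sort(key=lambda rc: uncertainty[rc[0]][rc[1]], reverse=True)
    return coords[:budget]


# Hypothetical 2x2 uncertainty map, e.g. predicted interval widths per pixel.
uncertainty_map = [[0.1, 0.9],
                   [0.4, 0.2]]
to_rescan = select_rescan_pixels(uncertainty_map, fraction=0.5)
```

Restricting the second pass to a small budget of pixels is what would reduce acquisition time and light dose relative to rescanning the whole field of view.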

翻訳日:2023-10-26 18:49:15 公開日:2023-10-24

# 教師なしドメイン適応のためのDeep Feature Registration

Deep Feature Registration for Unsupervised Domain Adaptation ( http://arxiv.org/abs/2310.16100v1 )

ライセンス: Link先を確認

Youshan Zhang and Brian D. Davison

(参考訳) ラベル付きソースドメインからラベル付きターゲットドメインへの知識を活用するために、教師なしのドメイン適応が検討されているが、既存の手法は2つのドメイン間の分散アライメントに焦点を当てている。しかし、ソースとターゲットの機能を調整する方法には、うまく対応していない。本稿では,ドメイン不変特徴を維持し,ヒストグラムマッチングによる登録特徴と対象特徴のドメイン異同性を同時に最小化する,登録特徴を生成できるディープ特徴登録(dfr)モデルを提案する。さらに,確率的ソフトセレクションとセンターベースハードセレクションの両方を考慮して,ターゲット領域における擬似ラベルの品質を向上させる擬似ラベルリファインメントプロセスも採用する。複数のUDAベンチマークでの大規模な実験は、我々のDFRモデルの有効性を示し、その結果、新しい最先端の性能をもたらす。

While unsupervised domain adaptation has been explored to leverage the knowledge from a labeled source domain to an unlabeled target domain, existing methods focus on the distribution alignment between two domains. However, how to better align source and target features is not well addressed. In this paper, we propose a deep feature registration (DFR) model to generate registered features that maintain domain invariant features and simultaneously minimize the domain-dissimilarity of registered features and target features via histogram matching. We further employ a pseudo label refinement process, which considers both probabilistic soft selection and center-based hard selection to improve the quality of pseudo labels in the target domain. Extensive experiments on multiple UDA benchmarks demonstrate the effectiveness of our DFR model, resulting in new state-of-the-art performance.
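Histogram matching, which this abstract uses to reduce the dissimilarity between registered and target features, can be sketched in one dimension as a rank-based mapping. This is a generic illustration under the assumption of equal-length samples, not the DFR model itself; the function name and values are hypothetical.

```python
# Sketch: 1-D histogram matching by rank. Each source value is replaced by the
# target value of the same rank, so the result has exactly the target histogram.
# Assumption: both samples have the same length.

def match_histogram(source, target):
    """Return source values remapped to follow the empirical distribution of target."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal-length samples")
    order = sorted(range(len(source)), key=lambda i: source[i])  # indices by ascending value
    target_sorted = sorted(target)
    matched = [0.0] * len(source)
    for rank, index in enumerate(order):
        matched[index] = target_sorted[rank]
    return matched


source_feature = [0.2, 0.9, 0.5]
target_feature = [10.0, 30.0, 20.0]
registered = match_histogram(source_feature, target_feature)
```

The mapping preserves the ordering of the source values while adopting the target's value distribution, which is the property the abstract relies on when aligning the two domains.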

翻訳日:2023-10-26 18:48:43 公開日:2023-10-24

# 半教師付き画像分割における解剖学的不確かさ

Anatomically-aware Uncertainty for Semi-supervised Image Segmentation ( http://arxiv.org/abs/2310.16099v1 )

ライセンス: Link先を確認

Sukesh Adiga V, Jose Dolz, Herve Lombaert

(参考訳) 半教師付き学習は、ラベルなしデータを活用することにより、画像セグメンテーションのための大きなピクセル単位のラベル付きデータセットの必要性を緩和する。ラベルのないデータを利用するための顕著な方法は、モデル予測を規則化することである。非ラベルデータの予測は信頼できないため、不確実性認識スキームは徐々に有意義で信頼性の高い予測から学ぶために用いられる。しかしながら、不確実性推定法は、各トレーニングステップで計算しなければならないモデル予測から、計算コストの高い複数の推論に依存する。さらに、これらの不確実性マップは画素ワイドの差を捉え、グローバルな情報を考慮しない。本研究では,セグメント化マスクのグローバル情報を活用することによってセグメント化の不確実性を推定する手法を提案する。より正確には、解剖学的に認識された表現は、最初に利用可能なセグメンテーションマスクをモデル化することを学ぶ。学習表現は、新しいセグメンテーションの予測を解剖学的に表現可能なセグメンテーションにマップする。推定可能なセグメンテーションからのずれは、セグメンテーションネットワークをさらに導くために基礎となる画素レベルの不確かさを推定するのに役立つ。提案手法は,この表現から単一推論を用いて不確実性を推定し,全体の計算量を削減する。心臓MRIでは左心房,腹部CTでは多発臓器の2つの公用セグメンテーションデータセットについて検討した。我々の解剖学的手法は2つの一般的な評価指標を用いて,最先端の半教師付き手法よりもセグメンテーション精度を向上する。

Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. A prominent way to exploit unlabeled data is to regularize model predictions. Since the predictions of unlabeled data can be unreliable, uncertainty-aware schemes are typically employed to gradually learn from meaningful and reliable predictions. Uncertainty estimation methods, however, rely on multiple inferences from the model predictions that must be computed for each training step, which is computationally expensive. Moreover, these uncertainty maps capture pixel-wise disparities and do not consider global information. This work proposes a novel method to estimate segmentation uncertainty by leveraging global information from the segmentation masks. More precisely, an anatomically-aware representation is first learnt to model the available segmentation masks. The learnt representation thereupon maps the prediction of a new segmentation into an anatomically-plausible segmentation. The deviation from the plausible segmentation aids in estimating the underlying pixel-level uncertainty in order to further guide the segmentation network. The proposed method consequently estimates the uncertainty using a single inference from our representation, thereby reducing the total computation. We evaluate our method on two publicly available segmentation datasets of left atria in cardiac MRIs and of multiple organs in abdominal CTs. Our anatomically-aware method improves the segmentation accuracy over the state-of-the-art semi-supervised methods in terms of two commonly used evaluation metrics.

翻訳日:2023-10-26 18:48:29 公開日:2023-10-24

# 在庫管理政策の評価と改善のための文脈帯域

Contextual Bandits for Evaluating and Improving Inventory Control Policies ( http://arxiv.org/abs/2310.16096v1 )

ライセンス: Link先を確認

Dean Foster, Randy Jia, Dhruv Madeka

(参考訳) 定期的なレビュー在庫管理問題に、非定常的なランダムな需要、失った販売、確率的ベンダーのリードタイムに対処する解決策は、一般に近似またはシミュレーションのダイナミクスを強く仮定し、最適化、動的プログラミング、強化学習などの手法を適用する。したがって、特に改善の余地があるかどうかを確認するためには、在庫管理政策の分析と評価が重要である。我々は,政策の望ましい性質である均衡政策の概念について紹介する。これは,ほんのわずかな行動だけを変更するだけでは,実質的な報酬は得られない,という直観的な意味を持つ。本手法は, 理論上, 経験的研究においても良好な保証が得られることを示すため, 軽量なコンテキストバンディットベースアルゴリズムを提案する。

Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.

翻訳日:2023-10-26 18:48:09 公開日:2023-10-24

# CR-COPEC:財務報告から学ぶ企業業績変化の因果関係

CR-COPEC: Causal Rationale of Corporate Performance Changes to Learn from Financial Reports ( http://arxiv.org/abs/2310.16095v1 )

ライセンス: Link先を確認

Ye Eun Chun, Sunjae Kwon, Kyunghwan Sohn, Nakwon Sung, Junyoup Lee, Byungki Seo, Kevin Compher, Seung-won Hwang, Jaesik Choi

(参考訳) 本稿では,企業業績の変化の因果関係(Causal Rationale of Corporate Performance Changes)を財務報告から紹介する。これは、企業業績の変化を検出するための包括的な大規模ドメイン適応因果文データセットである。 CR-COPECは2つの大きな業績に貢献している。まず、会計基準に従う専門家の因果分析を形式的に含む米国企業の10-kの年次報告書から因果的根拠を検出する。このデータセットは、個々の投資家とアナリストの両方が、すべてのドキュメントを読み取るのに多大な努力をすることなく、投資と意思決定のための材料情報リソースとして広く利用することができる。第2に、12の業界における企業の財務パフォーマンスに影響を与えるさまざまな特性を慎重に検討する。その結果、CR-COPECは各産業における独自の物語を考慮に入れ、各産業における因果文を区別することができる。また, CR-COPECデータセットの構築方法や, 目的文を産業特性に関する因果文として分類するのに適していることを示す。私たちのデータセットと実験コードは公開されています。

In this paper, we introduce CR-COPEC, the Causal Rationale of Corporate Performance Changes from financial reports. This is a comprehensive large-scale domain-adaptation causal sentence dataset for detecting changes in corporate financial performance. CR-COPEC makes two major contributions. First, it detects causal rationale from 10-K annual reports of U.S. companies, which contain experts' causal analysis following accounting standards in a formal manner. This dataset can be widely used by both individual investors and analysts as material information resources for investing and decision making without tremendous effort to read through all the documents. Second, it carefully considers different characteristics which affect the financial performance of companies in twelve industries. As a result, CR-COPEC can distinguish causal sentences in various industries by taking unique narratives in each industry into consideration. We also provide an extensive analysis of how well the CR-COPEC dataset is constructed and how well suited it is for classifying target sentences as causal ones with respect to industry characteristics. Our dataset and experimental code are publicly available.

翻訳日:2023-10-26 18:47:54 公開日:2023-10-24

# 長距離絡み合いと位相励起

Long-range entanglement and topological excitations ( http://arxiv.org/abs/2310.16091v1 )

ライセンス: Link先を確認

Gianpaolo Torre, Jovan Odavi\'c, Pierre Fromholz, Salvatore Marco Giampaolo, Fabio Franchini

(参考訳) トポロジカル秩序は様々な形態を持ち、その分類と検出は近代研究の重要な分野である。本研究では, 位相位相を同定するために導入された非連結エントロピーが, 単一の分数化励起によって搬送される長距離エンタングルメント (lre) も明らかにできることを示す。反強磁性スピン鎖のトポロジカルフラストレーションを誘導することにより、量子的に非局在化されたドメインウォール励起をシステムに導入できることを示す。さらに, 量子クエンチに対するlreの弾力性と障害の導入について検討し, 典型的な位相秩序や対称性が保護された位相を持つ位相の存在を確立した。

Topological order comes in different forms, and its classification and detection is an important field of modern research. In this work, we show that the Disconnected Entanglement Entropy, a measure originally introduced to identify topological phases, is also able to unveil the long-range entanglement (LRE) carried by a single, fractionalized excitation. We show this by considering a quantum, delocalized domain wall excitation that can be introduced into a system by inducing topological frustration in an antiferromagnetic spin chain. Furthermore, we study the resilience of LRE against a quantum quench and the introduction of disorder, thus establishing the existence of a phase with topological features despite not being a typical topological order or symmetry-protected one.

翻訳日:2023-10-26 18:47:38 公開日:2023-10-24

# フラクトニック量子物質の動的スペクトル応答

Dynamical Spectral Response of Fractonic Quantum Matter ( http://arxiv.org/abs/2310.16084v1 )

ライセンス: Link先を確認

Philip Zechmann, Julian Boesl, Johannes Feldmeier, Michael Knap

(参考訳) フラクトン励起を持つ量子多体系は、興味深い物質相を実現することができる。本研究では,粒子数に加えて質量中心、すなわち双極子モーメントを保存する,拘束された1次元ボース・ハバード模型の低エネルギー励起を研究する。このモデルは、双極子モット絶縁体、双極子ルッティンガー液体、準安定双極子超固体を含むフラクトン相を実現することが知られている。テンソルネットワーク法を用いてシステムの動的応答からスペクトル関数を計算し、対応する基底状態相の低エネルギー場理論による予測を検証する。双極子モット絶縁体における強結合の結果と整合するギャップのある励起,双極子のルッティンガー液体に特徴的な線形音響モード,および非整数充填で電荷密度波秩序と位相コヒーレンスを持つ超固体状態におけるゼロ運動量と有限運動量の両方でのソフトな二次モードの存在を実証する。

Quantum many-body systems with fractonic excitations can realize fascinating phases of matter. Here, we study the low-energy excitations of a constrained Bose-Hubbard model in one dimension, which conserves the center of mass or, equivalently, the dipole moment in addition to the particle number. This model is known to realize fractonic phases, including a dipole Mott insulator, a dipole Luttinger liquid, and a metastable dipole supersolid. We use tensor network methods to compute spectral functions from the dynamical response of the system and verify predictions from low-energy field theories of the corresponding ground state phases. We demonstrate the existence of gapped excitations compatible with strong coupling results in a dipole Mott insulator, linear sound modes characteristic of a Luttinger liquid of dipoles, and soft quadratic modes at both zero and finite momenta in a supersolid state with charge density wave order and phase coherence at non-integer filling.

翻訳日:2023-10-26 18:47:22 公開日:2023-10-24

# 局所量子場の経路積分による粒子検出器モデル

Particle detector models from path integrals of localized quantum fields ( http://arxiv.org/abs/2310.16083v1 )

ライセンス: Link先を確認

Bruno de S. L. Torres

(参考訳) シュウィンガー・ケルディッシュ経路積分を用いて、局所化された量子場理論と、相対論的量子情報 (RQI) でより一般的に用いられる局所プローブのモデルとの接続を描く。プローブとして使用される局所化された場の到達不能モードを積分してトレースアウトすることにより、摂動論の主要次数において、プローブ場の任意の有限個のモードのダイナミクスは、ちょうど有限個の調和振動子Unruh-DeWitt (UDW) 検出器のそれであることを示す。等価性は、プローブターゲット場系の入力状態の比較的一般的なクラスと、検出器として含む任意の数のモードに対して有効である。経路積分はまた、トレースアウトされた追加モードの存在による、摂動論の高次でのUDWモデルへの補正を体系的に得る方法を与える閉形式の式も提供する。このアプローチは、最近提案された、量子場理論のための検出器ベースと場の理論ベースの測定フレームワークの間の橋渡し (arXiv:2308.11698) を裏付け、拡張するものである。また、RQIにおける粒子検出器モデルと、経路積分法がより一般的な他の物理学領域、特に繰り込み群に対するウィルソン流のアプローチや有効場の理論との潜在的な接続を示している。

Using the Schwinger-Keldysh path integral, we draw a connection between localized quantum field theories and more commonly used models of local probes in Relativistic Quantum Information (RQI). By integrating over and then tracing out the inaccessible modes of the localized field being used as a probe, we show that, at leading order in perturbation theory, the dynamics of any finite number of modes of the probe field is exactly that of a finite number of harmonic-oscillator Unruh-DeWitt (UDW) detectors. The equivalence is valid for a rather general class of input states of the probe-target field system, as well as for any arbitrary number of modes included as detectors. The path integral also provides a closed-form expression which gives us a systematic way of obtaining the corrections to the UDW model at higher orders in perturbation theory due to the existence of the additional modes that have been traced out. This approach vindicates and extends a recently proposed bridge between detector-based and field-theory-based measurement frameworks for quantum field theory [arXiv:2308.11698], and also points to potential connections between particle detector models in RQI and other areas of physics where path integral methods are more commonplace -- in particular, the Wilsonian approach to the renormalization group and effective field theories.

翻訳日:2023-10-26 18:47:04 公開日:2023-10-24

# 葉を通しての立体視深度知覚

Stereoscopic Depth Perception Through Foliage ( http://arxiv.org/abs/2310.16120v1 )

ライセンス: Link先を確認

Robert Kerschner, Rakesh John Amala Arokia Nathan, Rafal Mantiuk, Oliver Bimber

(参考訳) 人間も計算手法も葉の下に隠された物体の深さを識別するのに苦労している。しかし,計算合成開口センシングと立体画像を融合する人間の能力を組み合わせた場合,このような識別が実現可能となる。捜索・救助、野生生物の観察、監視、早期の山火事検出に必要な物体識別タスクでは、人、動物、車両などの誤った発見と地上や樹冠の日光を浴びたパッチ、あるいは地上の火災と樹木のトランクとの区別を深度支援する。我々は、密集した森の上空でドローンが撮影したビデオを使って、ユーザーの奥行きを識別する能力をテストした。単視ビデオを見たり,運動視差に頼ると,これは不可能であることがわかった。葉の閉塞が原因で立体映像でも同様であった。しかし,オクルージョンを減少させるために合成開口センシングが用いられ,立体視ビデオに差が生じたが,計算(立体視マッチング)手法は失敗し,人間の観察者は深度を識別することに成功した。これは、計算方法と人間の視覚の相乗効果を利用して、単独では実行できないタスクを実行するシステムの可能性を示している。

Both humans and computational methods struggle to discriminate the depths of objects hidden beneath foliage. However, such discrimination becomes feasible when we combine computational optical synthetic aperture sensing with the human ability to fuse stereoscopic images. For object identification tasks, as required in search and rescue, wildlife observation, surveillance, and early wildfire detection, depth assists in differentiating true from false findings, such as people, animals, or vehicles vs. sun-heated patches at the ground level or in the tree crowns, or ground fires vs. tree trunks. We used video captured by a drone above dense woodland to test users' ability to discriminate depth. We found that this is impossible when viewing monoscopic video and relying on motion parallax. The same was true with stereoscopic video because of the occlusions caused by foliage. However, when synthetic aperture sensing was used to reduce occlusions and disparity-scaled stereoscopic video was presented, human observers successfully discriminated depth, whereas computational (stereoscopic matching) methods were unsuccessful. This shows the potential of systems which exploit the synergy between computational methods and human vision to perform tasks that neither can perform alone.

翻訳日:2023-10-26 18:41:08 公開日:2023-10-24

# Alquist 5.0: 対話ツリーは生成モデルと出会う。ソーシャルボットの会話を促進する新しいアプローチ

Alquist 5.0: Dialogue Trees Meet Generative Models. A Novel Approach for Enhancing SocialBot Conversations ( http://arxiv.org/abs/2310.16119v1 )

ライセンス: Link先を確認

Ondřej Kobza, Jan Čuhel, Tommaso Gargiani, David Herel, Petr Marek (Faculty of Electrical Engineering, CTU in Prague)

(参考訳) Alexa Prize SocialBot Grand Challenge 5のために開発されたSocialBot、Alquist 5.0を紹介する。従来のシステムに基づいて、NRG Baristaを導入し、SocialBotにBaristaを統合するための革新的なアプローチをいくつか概説し、全体的な会話体験を改善した。さらに、SocialBotを拡張してマルチモーダルデバイスをサポートする。本稿では,多種多様なトピックにまたがる共感的・知識的な会話能力を維持しつつ,ユーザ期待の進展に対応するAlquist 5.0の開発に関する知見を提供する。

We present our SocialBot -- Alquist 5.0 -- developed for the Alexa Prize SocialBot Grand Challenge 5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist 5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.

翻訳日:2023-10-26 18:40:45 公開日:2023-10-24

# NADI 2023:第4回アラビア方言識別タスク

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task ( http://arxiv.org/abs/2310.16117v1 )

ライセンス: Link先を確認

Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

(参考訳) 第4回Nuanced Arabic Dialect Identification Shared Task (NADI 2023)の結果を報告する。 NADIの目的は、研究チームが標準化された条件下で協力的に競争する機会を作ることで、最先端のアラビア語NLPを前進させることである。アラビア語の方言に注目し、新しいデータセットを提供し、異なるアプローチ間で意味のある比較を可能にするサブタスクを定義する。 NADI 2023は、方言識別(Subtask 1)と方言からMSAへの機械翻訳(Subtask 2とSubtask 3)の両方をターゲットにしている。共有タスクには58のユニークなチームが登録され、そのうち18チームが参加した(テストフェーズには76の有効な応募があった)。そのうち16チームがSubtask 1に、5チームがSubtask 2に、3チームがSubtask 3に参加した。優勝チームは、Subtask 1で87.27 F1、Subtask 2で14.76 BLEU、Subtask 3で21.10 BLEUをそれぞれ達成した。その結果,3つのサブタスクはいずれも依然として困難であり,この分野の今後の研究を動機づけるものである。参加チームが採用した手法について説明し,NADIの展望を簡潔に述べる。

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under standardized conditions. It does so with a focus on Arabic dialects, offering novel datasets and defining subtasks that allow for meaningful comparisons between different approaches. NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered for the shared task, of which 18 participated (with 76 valid submissions during the test phase). Among these, 16 teams participated in Subtask 1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU on Subtask 3, respectively. Results show that all three subtasks remain challenging, thereby motivating future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.

翻訳日:2023-10-26 18:40:36 公開日:2023-10-24

# 過去のデータのない概念の覚醒:オンラインプラセボからのクラスインクリメンタルラーニング

Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos ( http://arxiv.org/abs/2310.16115v1 )

ライセンス: Link先を確認

Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun

(参考訳) 古いクラス知識を忘れないことは、モデルが新しいクラスに継続的に適応する場合、クラスインクリメンタル学習(cil)にとって重要な課題である。これに対処する一般的なテクニックは知識蒸留(kd)であり、古いモデルと新しいモデルの予測の不一致を罰する。このような予測は、cilのメモリ制限が厳しいため、古いクラスデータは極めて少ないため、ほとんど新しいクラスデータで行われます。本稿では,KDの損失を深く掘り下げ,「KDの新しいクラスデータの利用」がモデル適応を阻害するだけでなく(新しいクラスを学習するために),古いクラスの知識を保存するための効率の低下をもたらすことを明らかにする。ここでは,Google Imagesなどの無料画像ストリームから,Placebosを自動的かつ経済的に選択するKDの古いクラスのPlaceboを使用することによって,この問題に対処する。この目的のために,オンラインプレースボ選択ポリシーをトレーニングし,ストリーミング画像(良か悪か)の品質を迅速に評価し,kdの1回フィードフォワード計算によいもののみを使用する。我々は,オンラインマルコフ決定プロセス(MDP)としてポリシートレーニングプロセスを定式化し,このMDP問題を解決するためのオンライン学習アルゴリズムを導入する。実験では、我々の方法が示されます。 1) placebosとオリジナルの古いクラスデータの間にクラス重複がない場合でも、驚くほど効果的である。 2)追加の監督や記憶予算を必要としない。 3)多くの上位パフォーマンスcilメソッド、特にクラスごとに5つのexemplarsのような古いクラスのexemplarに対して低いメモリ予算を使用する場合を著しく上回っている。

Not forgetting old class knowledge is a key challenge for class-incremental learning (CIL) when the model continuously adapts to new classes. A common technique to address this is knowledge distillation (KD), which penalizes prediction inconsistencies between old and new models. These predictions are made almost entirely on new-class data, as old-class data is extremely scarce due to the strict memory limitation in CIL. In this paper, we take a deep dive into KD losses and find that "using new class data for KD" not only hinders model adaptation (for learning new classes) but also preserves old class knowledge inefficiently. We address this by "using the placebos of old classes for KD", where the placebos are chosen automatically and economically from a free image stream, such as Google Images. To this end, we train an online placebo selection policy to quickly evaluate the quality of streaming images (good or bad placebos) and use only good ones for one-time feed-forward computation of KD. We formulate the policy training process as an online Markov Decision Process (MDP), and introduce an online learning algorithm to solve this MDP problem without incurring much computational cost. In experiments, we show that our method 1) is surprisingly effective even when there is no class overlap between placebos and original old class data, 2) does not require any additional supervision or memory budget, and 3) significantly outperforms a number of top-performing CIL methods, in particular when using lower memory budgets for old class exemplars, e.g., five exemplars per class.
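
The distillation term described above can be illustrated with a minimal sketch. The abstract does not give the paper's exact loss, so this assumes the generic temperature-softened KL divergence between the old and new models' predictions. The function names, the temperature, and the logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(old_logits, new_logits, temperature=2.0):
    """KL(old || new) between temperature-softened predictions.

    Zero when both models agree on the input; grows as they diverge.
    Assumption: generic distillation loss, not necessarily the paper's.
    """
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0)

# Hypothetical logits over three old classes for one placebo image.
old = [2.0, 0.5, -1.0]
agree = [2.0, 0.5, -1.0]   # new model kept the old behaviour
drift = [0.1, 1.5, 0.3]    # new model forgot the old behaviour
print(kd_loss(old, agree), kd_loss(old, drift))
```

In the setting of the abstract, the inputs scored this way would be the selected placebo images rather than new-class images; the selection policy itself is not sketched here.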

翻訳日:2023-10-26 18:40:17 公開日:2023-10-24

# 脳遺伝子転写の圧縮発現

Compressed representation of brain genetic transcription ( http://arxiv.org/abs/2310.16113v1 )

ライセンス: Link先を確認

James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev

(参考訳) 脳のアーキテクチャは複雑すぎるため、その変動をコンパクトでナビゲート可能な空間に投影する圧縮表現を使わなければ、直感的に見渡すことができない。この課題は、解剖学的および転写学的パターンの結合複雑性が最大限の圧縮を要求する遺伝子発現のような高次元データにおいて特に困難である。確立された実践は標準的な主成分分析(PCA)を用いることであるが、その計算上の利便性は、特に高い圧縮比において、表現力の限界によって相殺される。ここでは、全脳のボクセル単位のAllen Brain Atlas転写データを用いて、最も広く支持されている線形および非線形の手法、すなわちPCA、カーネルPCA、非負値行列因子分解(NMF)、t分布型確率的近傍埋め込み(t-SNE)、UMAP(uniform manifold approximation and projection)、およびディープオートエンコーダに基づく圧縮表現を体系的に比較し、シグナル伝達、微細構造、代謝の各ターゲットに関して、再構成忠実度、解剖学的一貫性、予測的有用性を定量化する。ディープオートエンコーダは、パフォーマンスとターゲットドメインのすべての指標において優れた表現を示し、人間の脳における転写パターンを表現する基準標準としての使用をサポートする。

The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.
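
To make "reconstruction fidelity" of a compressed representation concrete, here is a small sketch of the PCA baseline only. It assumes fidelity is measured as mean squared reconstruction error, which the abstract does not specify. The toy matrix and all function names are hypothetical, and the leading component is found by plain power iteration instead of a library routine.

```python
import math

def center(rows):
    """Subtract the per-feature mean; return centred rows and the mean."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows], mean

def first_component(centred, iters=100):
    """Leading principal direction via power iteration on the covariance."""
    n, d = len(centred), len(centred[0])
    cov = [[sum(r[a] * r[b] for r in centred) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary start, assumed not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reconstruction_mse(rows, n_components):
    """Mean squared error after compressing to 0 or 1 components."""
    centred, _ = center(rows)
    if n_components == 0:
        recon = [[0.0] * len(centred[0]) for _ in centred]
    else:
        v = first_component(centred)
        recon = []
        for r in centred:
            code = sum(a * b for a, b in zip(r, v))  # 1-D compressed code
            recon.append([code * b for b in v])
    total = sum((a - b) ** 2
                for r, q in zip(centred, recon) for a, b in zip(r, q))
    return total / (len(rows) * len(rows[0]))

# Toy "voxel x gene" matrix whose rows lie on a single line in 3-D.
data = [[t, 2.0 * t, -t] for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
print(reconstruction_mse(data, 0), reconstruction_mse(data, 1))
```

On data that is exactly one-dimensional, a single linear component already reconstructs everything; the comparison in the abstract concerns real transcription data, where non-linear methods can retain more at the same code size.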

翻訳日:2023-10-26 18:39:46 公開日:2023-10-24

# 胸部X線からの長期多ラベル疾患分類に向けて:CXR-LT課題の概観

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge ( http://arxiv.org/abs/2310.16112v1 )

ライセンス: Link先を確認

Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

(参考訳) 診断医療画像検査のような現実世界の画像認識問題の多くは「ロングテール」であり、少数の一般的な所見の後に、比較的まれな多数の病態が続く。胸部X線撮影では、患者が同時に複数の所見を呈することが多いため、診断はロングテールかつマルチラベルの問題である。医用画像認識におけるロングテール学習の問題の研究は始まっているが、ロングテールかつマルチラベルの疾患分類がもたらすラベル不均衡とラベル共起の相互作用を研究した研究者はほとんどいない。この新たなトピックについて研究コミュニティと協働するため、我々は胸部X線 (CXR) からのロングテールかつマルチラベルの胸部疾患分類に関するオープンチャレンジCXR-LTを実施した。我々は、35万以上のCXRからなる大規模ベンチマークデータセットを公開し、それぞれにロングテール分布に従う26の臨床所見のうち少なくとも1つがラベル付けされている。トップパフォーマンスソリューションに共通するテーマをまとめ,ロングテールかつマルチラベルの医用画像分類に対する実践的な推奨事項を示す。最後に,これらの知見を用いて,視覚言語基礎モデルによる少数ショットおよびゼロショットの疾患分類に向けた道筋を提案する。

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed": there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.

翻訳日:2023-10-26 18:39:28 公開日:2023-10-24

# ゼロショットプロンプトを用いた局所微分プライベート文書生成

Locally Differentially Private Document Generation Using Zero Shot Prompting ( http://arxiv.org/abs/2310.16111v1 )

ライセンス: Link先を確認

Saiteja Utpala, Sara Hooker, Pin Yu Chen

(参考訳) 多くの研究が、事前訓練された大きな言語モデルに関連するプライバシーリスクを強調している。対照的に,本研究は,事前学習された大規模言語モデルがプライバシー保護に効果的に寄与することを示すことにより,独自の視点を提供する。本稿では,DP-Promptという局所差分プライベートな機構を提案する。これは,事前訓練された大規模言語モデルのパワーとゼロショットプロンプトを利用して,ダウンストリームユーティリティへの影響を最小限に抑えながら,著者の非匿名化攻撃に対抗するものである。 DP-PromptをChatGPT(gpt-3.5)のような強力な言語モデルで使用すると、非匿名化攻撃の成功率の顕著な低下が観察され、より単純な設計にもかかわらず既存のアプローチをかなり上回っていることが示された。例えば、IMDBデータセットの場合、DP-Prompt(ChatGPT)は、静的攻撃者に対する著者識別F1スコアの46%の低下、適応攻撃者に対する26%の低下を達成しながら、クリーンな感情F1スコアを完全に回復する。プライバシとユーティリティのトレードオフのさまざまな影響を分析するために,最大70億パラメータまでの6つのオープンソース大規模言語モデルを対象に,広範な実験を行った。

Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a unique perspective by demonstrating that pretrained large language models can effectively contribute to privacy preservation. We propose a locally differentially private mechanism called DP-Prompt, which leverages the power of pretrained large language models and zero-shot prompting to counter author de-anonymization attacks while minimizing the impact on downstream utility. When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks, showing that it surpasses existing approaches by a considerable margin despite its simpler design. For instance, in the case of the IMDB dataset, DP-Prompt (with ChatGPT) perfectly recovers the clean sentiment F1 score while achieving a 46% reduction in author identification F1 score against static attackers and a 26% reduction against adaptive attackers. We conduct extensive experiments across six open-source large language models, ranging up to 7 billion parameters, to analyze various effects of the privacy-utility tradeoff.
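
The abstract does not spell out how the local differential privacy guarantee is obtained. One textbook route for a sampling step is the exponential mechanism, sketched below under the assumption that clipped next-token logits serve as the utility. The function names, the clipping range, and the logits are hypothetical and are not taken from the paper.

```python
import math
import random

def clip(value, low, high):
    """Clamp value into [low, high]."""
    return max(low, min(high, value))

def sample_token_ldp(logits, epsilon, low=-5.0, high=5.0, rng=random):
    """Sample one token index with the exponential mechanism.

    After clipping, any two inputs can change a logit by at most
    (high - low), so sampling with probability proportional to
    exp(epsilon * u / (2 * (high - low))) is epsilon-DP per token.
    """
    sensitivity = high - low
    clipped = [clip(z, low, high) for z in logits]
    weights = [math.exp(epsilon * u / (2.0 * sensitivity)) for u in clipped]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs

# Hypothetical logits for a three-token vocabulary.
index, probs = sample_token_ldp([10.0, 0.0, -10.0], epsilon=1.0,
                                rng=random.Random(0))
print(index, probs)
```

This is the same as softmax sampling at temperature `2 * (high - low) / epsilon`: a smaller `epsilon` flattens the distribution, which trades utility for privacy. Generating several tokens composes the per-token budgets.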

翻訳日:2023-10-26 18:39:01 公開日:2023-10-24

# 複合画像生成SwinTransformer Network for Audio Denoising

Complex Image Generation SwinTransformer Network for Audio Denoising ( http://arxiv.org/abs/2310.16109v1 )

ライセンス: Link先を確認

Youshan Zhang and Jialu Li

(参考訳) 高性能なオーディオデノイジングを実現することは、現実世界のアプリケーションでは依然として難しい課題である。既存の時間周波数法は、しばしば生成された周波数領域画像の品質を無視している。本稿では,音声の雑音除去問題を画像生成タスクに変換する。まず、複素フーリエ領域からより多くの情報を取得するための複雑な画像生成SwinTransformerネットワークを開発する。次に,高品質な画像を生成するために構造類似性と詳細な損失関数を課し,雑音除去後の音声とクリーンな音声の差を最小限に抑えるためにSDR損失を開発する。 2つのベンチマークデータセットに関する広範囲な実験により,提案手法が最先端の手法よりも優れていることを示した。

Achieving high-performance audio denoising is still a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of generated frequency domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex image generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detailed loss functions to generate high-quality images and develop an SDR loss to minimize the difference between denoised and clean audios. Extensive experiments on two benchmark datasets demonstrate that our proposed model is better than state-of-the-art methods.
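
As an illustration of the SDR term mentioned above, the following sketch uses the standard signal-to-distortion ratio in decibels and takes its negative as a loss. The paper's exact formulation is not given in the abstract, so this is an assumption. The function names and the toy waveform are hypothetical.

```python
import math

def sdr_db(clean, denoised, eps=1e-12):
    """Signal-to-distortion ratio in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    signal_power = sum(s * s for s in clean)
    error_power = sum((s - d) ** 2 for s, d in zip(clean, denoised))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))

def sdr_loss(clean, denoised):
    """Negative SDR, so that minimising the loss maximises the SDR."""
    return -sdr_db(clean, denoised)

# Toy waveform: one sine period sampled at 100 points.
clean = [math.sin(2.0 * math.pi * k / 100.0) for k in range(100)]
slightly_off = [0.9 * s for s in clean]   # 10% amplitude error
far_off = [0.5 * s for s in clean]        # 50% amplitude error
print(sdr_db(clean, slightly_off), sdr_db(clean, far_off))
```

A 10% amplitude error leaves 1% of the signal power as distortion, so its SDR is 20 dB; the 50% case is lower, and the loss is correspondingly larger.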

翻訳日:2023-10-26 18:38:38 公開日:2023-10-24

# 進化の物理的性質と統計的収縮性は写像の等価概念である

Physicality of evolution and statistical contractivity are equivalent notions of maps ( http://arxiv.org/abs/2310.16107v1 )

ライセンス: Link先を確認

Matteo Scandi, Paolo Abiuso, Dario De Santis, Jacopo Surace

(参考訳) 統計的定量化指標は、ノイズのある変換の下では情報が失われるはずだという直感に従って、物理的進化の下で収縮することが一般に要求される。この原理は統計学において非常に重要であり、それに基づいて一意性の結果を導出することさえ可能である: 任意の物理写像の下での収縮性を課すことによって、チェンツォフ=ペッツの定理はフィッシャー情報計量と呼ばれる確率分布(あるいは密度行列)の空間上の一意の計量族を選び出す。この結果から、統計的定量化指標は、その定義が物理写像に基づいているため、導出概念である可能性が示唆される。本研究の目的は、この考えを否定することである。実際、チェンツォフ=ペッツの定理に双対な結果を示し、すべての可能な線型写像の中で、フィッシャー情報を収縮させるのは、まさに物理的な写像だけであることを証明する。この結果は、一般的な見解に反して、物理写像と標準的な統計的定量化指標の間には基本的な階層構造が存在せず、どちらも他方を用いて定義できることを示している。

Statistical quantifiers are generically required to contract under physical evolutions, following the intuition that information should be lost under noisy transformations. This principle is very relevant in statistics, and it even allows one to derive uniqueness results: by imposing contractivity under any physical map, the Chentsov-Petz theorem singles out a unique family of metrics on the space of probability distributions (or density matrices) called the Fisher information metrics. This result might suggest that statistical quantifiers are a derived concept, as their very definition is based on physical maps. The aim of this work is to disprove this belief. Indeed, we present a result dual to the Chentsov-Petz theorem, proving that among all possible linear maps, the only ones that contract the Fisher information are exactly the physical ones. This result shows that, contrary to the common opinion, there is no fundamental hierarchy between physical maps and canonical statistical quantifiers, as either of them can be defined in terms of the other.
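
The contractivity statement can be checked numerically in the classical case. The sketch below is restricted to probability vectors and a column-stochastic matrix as the physical map. The helper names and the numbers are hypothetical. It shows only the known direction (physical maps contract the Fisher metric), not the dual result proved in the paper.

```python
def push_forward(channel, vector):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in channel]

def fisher_sq_norm(p, tangent):
    """Classical Fisher metric at p: sum_i tangent_i^2 / p_i."""
    return sum(t * t / q for t, q in zip(tangent, p))

p = [0.5, 0.3, 0.2]              # probability distribution
tangent = [0.10, -0.04, -0.06]   # perturbation; entries sum to zero

# Column-stochastic matrix (each column sums to 1): a noisy coarse-graining
# from three outcomes to two.
channel = [[0.9, 0.2, 0.1],
           [0.1, 0.8, 0.9]]

before = fisher_sq_norm(p, tangent)
after = fisher_sq_norm(push_forward(channel, p),
                       push_forward(channel, tangent))
print(before, after)
```

The squared length of the perturbation shrinks after the channel, which is the "information is lost under noisy transformations" intuition expressed in the metric.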

翻訳日:2023-10-26 18:38:27 公開日:2023-10-24

# ブロードキャストベースサブグラフサンプリングを用いた無線ネットワークによる分散学習

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling ( http://arxiv.org/abs/2310.16106v1 )

ライセンス: Link先を確認

Daniel Pérez Herrera, Zheng Chen and Erik G. Larsson

(参考訳) 本研究は、コンセンサスに基づく分散確率的勾配降下法(D-SGD)を用いて、無線ネットワーク上の分散学習のコミュニケーション面に焦点を当てる。ネットワーク内情報交換による実際の通信コストや遅延を考慮すると,送信スロット毎の改善によって測定されたアルゴリズムの高速収束を実現することが目的である。本稿では,ブロードキャスト伝送と確率的サブグラフサンプリングを用いた,無線ネットワーク上でのD-SGDの効率的な通信フレームワークであるBASSを提案する。各イテレーションにおいて、非干渉ノードの複数のサブセットを起動し、隣人にモデル更新をブロードキャストする。これらのサブセットは時間とともにランダムに活性化され、確率はネットワーク接続の重要性を反映し、通信コストの制約(例えば、イテレーション当たりの平均送信スロット数)を受ける。コンセンサス更新ステップでは、通信対称性を維持するために双方向リンクのみを効果的に保存する。既存のリンクベースのスケジューリング手法と比較して、無線チャネルの固有の放送特性は、同じ数の送信スロットでより多くの通信リンクを作成することにより、分散学習の収束を早めるという本質的な利点を提供する。

This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD). Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot. We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, we activate multiple subsets of non-interfering nodes to broadcast model updates to their neighbors. These subsets are randomly activated over time, with probabilities reflecting their importance in network connectivity and subject to a communication cost constraint (e.g., the average number of transmission slots per iteration). During the consensus update step, only bi-directional links are effectively preserved to maintain communication symmetry. In comparison to existing link-based scheduling methods, the inherent broadcasting nature of wireless channels offers intrinsic advantages in speeding up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.
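
The consensus-plus-gradient update with randomly activated broadcasters can be sketched as follows. This is a simplified illustration under stated assumptions, not the BASS algorithm itself: every node broadcasts independently with a fixed probability, interference constraints and the optimised activation probabilities are ignored, each node holds a scalar model with a quadratic local objective, and exact gradients replace stochastic ones. All names are hypothetical.

```python
import random

def mixing_matrix(n, edges):
    """Metropolis weights: symmetric and doubly stochastic for any
    undirected edge set."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w[i][j] = w[j][i] = 1.0 / (1 + max(degree[i], degree[j]))
    for i in range(n):
        w[i][i] = 1.0 - sum(w[i])  # diagonal is still 0 here
    return w

def dsgd(targets, edges, steps=300, lr=0.1, p_broadcast=0.7, seed=0):
    """D-SGD on f_i(x) = 0.5 * (x - targets[i])**2 over a sampled subgraph."""
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for _ in range(steps):
        on_air = [rng.random() < p_broadcast for _ in range(n)]
        # Keep a link only if both endpoints broadcast (bi-directional).
        active = [(i, j) for i, j in edges if on_air[i] and on_air[j]]
        w = mixing_matrix(n, active)
        mixed = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [mixed[i] - lr * (x[i] - targets[i]) for i in range(n)]
    return x

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
models = dsgd([1.0, 2.0, 3.0, 6.0], ring)
print(models, sum(models) / len(models))
```

Because only bi-directional links are kept, every sampled mixing matrix stays symmetric and doubly stochastic, so the network-wide average follows plain gradient descent on the global objective and converges to the mean of the targets (3.0 here), whichever subgraphs are drawn.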

翻訳日:2023-10-26 18:38:03 公開日:2023-10-24

# 知識グラフに対する文脈対応説明可能なレコメンデーション

Context-aware explainable recommendations over knowledge graphs ( http://arxiv.org/abs/2310.16141v1 )

ライセンス: Link先を確認

Jinfeng Zhong, Elsa Negre

(参考訳) 知識グラフは、アイテムに関連する豊富な意味関係を含み、そのような意味関係をレコメンデーションシステムに組み込むことで、アイテムの潜伏した関係を探索し、予測の精度を改善し、レコメンデーションの説明可能性を高める。しかし、このような説明はユーザのコンテキストに適応せず、ユーザーの好みに大きく影響する可能性がある。そこで本研究では,コンテキストに適応したユーザの嗜好をモデル化し,項目に関する知識グラフにリッチな意味関係を組み込むためのエンドツーエンドフレームワークであるca-kgcn(context-aware knowledge graph convolutional network)を提案する。このフレームワークは、アイテムのコンテキストや特徴など、さまざまな要素に対するユーザの注意を捉える。具体的には、コンテキストに適合したユーザの好みをモデル化し、与えられたコンテキストに適応した説明を提供する。実世界の3つのデータセットの実験は、ユーザの好みを文脈に合わせてモデル化し、生成したリコメンデーションを説明するという、我々のフレームワークの有効性を示している。

Knowledge graphs contain rich semantic relationships related to items and incorporating such semantic relationships into recommender systems helps to explore the latent connections of items, thus improving the accuracy of prediction and enhancing the explainability of recommendations. However, such explainability is not adapted to users' contexts, which can significantly influence their preferences. In this work, we propose CA-KGCN (Context-Aware Knowledge Graph Convolutional Network), an end-to-end framework that can model users' preferences adapted to their contexts and can incorporate rich semantic relationships in the knowledge graph related to items. This framework captures users' attention to different factors: contexts and features of items. More specifically, the framework can model users' preferences adapted to their contexts and provide explanations adapted to the given context. Experiments on three real-world datasets show the effectiveness of our framework: modeling users' preferences adapted to their contexts and explaining the recommendations generated.

翻訳日:2023-10-26 18:29:51 公開日:2023-10-24

# Pix2HDR -- 高速HDRビデオのための画素単位の取得と深層学習に基づく合成アプローチ

Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos ( http://arxiv.org/abs/2310.16139v1 )

ライセンス: Link先を確認

Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings

(参考訳) 広い動きと光強度でダイナミックなシーンを正確に捉えることは、多くの視覚アプリケーションにとって不可欠である。しかし、カメラのフレームレートがダイナミックレンジを制限するため、高速ハイダイナミックレンジ(HDR)ビデオの取得は困難である。既存の方法はマルチ露光フレームを取得するために速度を犠牲にする。しかし、これらのフレーム内の不整合運動は、なおもHDR融合アルゴリズムの複雑さを生じさせ、成果物をもたらす。フレームベースの露光の代わりに、個々のピクセルを様々な露光や位相オフセットでサンプリングする。ピクセル単位でプログラマブルなイメージセンサに実装したサンプリングパターンは,高速動作を同時に高ダイナミックレンジでキャプチャする。次に,ディープニューラルネットワークによるエンドツーエンド学習重みを用いて,画素毎の出力をhdrビデオに変換し,動きのぼやけを最小限に抑えながら,高い時空間分解能を達成する。我々は、1000FPSでエイリアスフリーのHDRビデオの取得を実証し、低照度条件下での高速な動きと明るい背景を解消する。複雑なシーンをデコードする際の深層ニューラルネットワークの強度と画素ワイドサンプリングパターンの汎用性を組み合わせることにより,動的条件下での視覚システムの適応性と性能を大幅に向上させる。

Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds - both challenging conditions for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system's adaptability and performance in dynamic conditions.

翻訳日:2023-10-26 18:29:33 公開日:2023-10-24

# 下位信号:脳神経発達過程としての乳児非栄養摂取の検出

Subtle Signals: Video-based Detection of Infant Non-nutritive Sucking as a Neurodevelopmental Cue ( http://arxiv.org/abs/2310.16138v1 )

ライセンス: Link先を確認

Shaotong Zhu, Michael Wan, Sai Kumar Reddy Manne, Emily Zimmerman, Sarah Ostadabbas

(参考訳) 栄養素を摂取せずにおしゃぶり、指または類似の物体を吸う行為である非栄養吸引(non-nutritive sucking, NNS)は、健康な初期発達を評価する上で重要な役割を果たす。早産児の場合、NNS行動は摂食準備度を決定する重要な要素である。年長の幼児では、NNS行動の特徴は神経および運動発達に関する貴重な洞察を与える。さらに、乳幼児突然死症候群(SIDS)の予防策となる可能性があるものとしてNNS活動が提案されている。しかし、NNS評価の臨床応用は、現在、労働集約的で主観的な、指を口に入れて行う評価によって妨げられている。そのため、研究者はしばしば、客観的なNNS信号測定のために高価な圧力変換器を利用する。臨床医と研究者双方のNNS信号監視のアクセシビリティと信頼性を高めるため,自然環境下でのベビーモニター映像を用いたNNS活動の非接触検出のためのビジョンベースアルゴリズムを提案する。本手法では,乳幼児の微妙な吸引信号の検出と増幅を可能にするため,オプティカルフローと時間的畳み込みネットワークを包括的に探索する。均一長の短いビデオクリップをNNSおよび非NNS期間に分類することに成功した。さらに,局所的な分類結果をつなぎ合わせ,長い混合活動の動画を様々な長さのNNSおよび非NNSセグメントに分割するための,手動および学習に基づく手法について検討した。本研究は,乳児の注釈付きビデオの2つの新しいデータセットを紹介する。そのうちの1つは,19名の乳児と183時間の夜間ベビーモニター映像を含む,我々の臨床研究から得られたものである。

Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key component in determining their readiness for feeding. In older infants, the characteristics of NNS behavior offer valuable insights into neural and motor development. Additionally, NNS activity has been proposed as a potential safeguard against sudden infant death syndrome (SIDS). However, the clinical application of NNS assessment is currently hindered by labor-intensive and subjective finger-in-mouth evaluations. Consequently, researchers often resort to expensive pressure transducers for objective NNS signal measurement. To enhance the accessibility and reliability of NNS signal monitoring for both clinicians and researchers, we introduce a vision-based algorithm designed for non-contact detection of NNS activity using baby monitor footage in natural settings. Our approach involves a comprehensive exploration of optical flow and temporal convolutional networks, enabling the detection and amplification of subtle infant-sucking signals. We successfully classify short video clips of uniform length into NNS and non-NNS periods. Furthermore, we investigate manual and learning-based techniques to piece together local classification results, facilitating the segmentation of longer mixed-activity videos into NNS and non-NNS segments of varying duration. Our research introduces two novel datasets of annotated infant videos, including one sourced from our clinical study featuring 19 infant subjects and 183 hours of overnight baby monitor footage.

翻訳日:2023-10-26 18:29:09 公開日:2023-10-24

# あなたは私をフォローできますか。 ChatGPTにおける状況理解のテスト

Can You Follow Me? Testing Situational Understanding in ChatGPT ( http://arxiv.org/abs/2310.16135v1 )

ライセンス: Link先を確認

Chenghao Yang, Allyson Ettinger

(参考訳) 文の意味の理解と情報の更新は、私たちが“situational understanding(su)”と呼ぶ、人間のようなaiエージェントにとって重要な能力です。特にチャットモデル、例えばChatGPTでは、人間とAIの一貫性、一貫性、効果的な対話を可能にするためにSUが不可欠である。従来,非チャットボット大規模言語モデル(LLM)のSU制限は特定されてきたが,これらの制限の程度や原因はよく理解されておらず,現在のチャットベースモデルの性能については検討されていない。本研究では,モデルが環境状態を追跡・列挙する能力を評価することによって,チャット指向モデルにおけるsuの制御および体系的なテストを可能にする,新しいsuテストのための合成環境を提案する。私たちの環境はまた、パフォーマンスパターンの根本原因をより深く理解するために、モデルパフォーマンスのダイナミクスを綿密に分析することができます。テストは最先端のチャットボットであるChatGPTに適用し、タスクの基本的な単純さにもかかわらず、モデルの性能は時間にわたって正しい環境状態を維持することができないことを反映している。当社のフォローアップ分析によると、パフォーマンスの低下は、主にchatgptが(完全な対話履歴にアクセスできるが)永続的なインコンテキストメモリを持っているためであり、アキュラシーを人工的に膨らませるアップデートを含む幻覚的なアップデートの影響を受けやすいためである。以上の結果から,ChatGPTは現状のロバストな追跡機能を備えていないことが示唆され,ChatGPTの優れた対話性能への信頼にはリスクが伴うことが示唆された。テスト環境を再現するためのコードベースと、ChatGPTからのすべてのプロンプトとAPIレスポンスを、https://github.com/yangalan123/SituationalTestingでリリースしています。

Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is essential in particular for chat models, such as ChatGPT, to enable consistent, coherent, and effective dialogue between humans and AI. Previous works have identified certain SU limitations in non-chatbot Large Language models (LLMs), but the extent and causes of these limitations are not well understood, and capabilities of current chat-based models in this domain have not been explored. In this work we tackle these questions, proposing a novel synthetic environment for SU testing which allows us to do controlled and systematic testing of SU in chat-oriented models, through assessment of models' ability to track and enumerate environment states. Our environment also allows for close analysis of dynamics of model performance, to better understand underlying causes for performance patterns. We apply our test to ChatGPT, the state-of-the-art chatbot, and find that despite the fundamental simplicity of the task, the model's performance reflects an inability to retain correct environment states across time. Our follow-up analyses suggest that performance degradation is largely because ChatGPT has non-persistent in-context memory (although it can access the full dialogue history) and it is susceptible to hallucinated updates -- including updates that artificially inflate accuracies. Our findings suggest overall that ChatGPT is not currently equipped for robust tracking of situation states, and that trust in the impressive dialogue performance of ChatGPT comes with risks. We release the codebase for reproducing our test environment, as well as all prompts and API responses from ChatGPT, at https://github.com/yangalan123/SituationalTesting.
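
A toy version of the kind of state-tracking test the abstract describes might look like the sketch below. The objects, their values, the update format, and the scoring function are all hypothetical; the authors' actual environment is in the repository linked above and may differ substantially.

```python
def apply_updates(initial_state, updates):
    """Replay (object, new_value) updates on top of an initial state."""
    state = dict(initial_state)
    for name, value in updates:
        state[name] = value
    return state

def state_accuracy(gold_state, predicted_state):
    """Fraction of objects whose predicted value matches the gold value."""
    hits = sum(1 for name, value in gold_state.items()
               if predicted_state.get(name) == value)
    return hits / len(gold_state)

initial = {"box_1": "closed", "box_2": "closed", "box_3": "open"}
updates = [("box_1", "open"), ("box_3", "closed"), ("box_1", "closed")]
gold = apply_updates(initial, updates)

# A made-up model answer that missed the last update to box_1.
model_answer = {"box_1": "open", "box_2": "closed", "box_3": "closed"}
print(gold, state_accuracy(gold, model_answer))
```

In a setup like this, the gold state is computed mechanically, so a model's enumerated state can be scored after every dialogue turn and the point at which tracking breaks down can be located.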

翻訳日:2023-10-26 18:28:43 公開日:2023-10-24

# ソフトウェア工学会議とジャーナルの多様性

Diversity in Software Engineering Conferences and Journals ( http://arxiv.org/abs/2310.16132v1 )

ライセンス: Link先を確認

Aditya Shankar Narayanan, Dheeraj Vagavolu, Nancy A Day, Meiyappan Nagappan

(参考訳) 民族や性別に関する多様性は、ソフトウェア開発のオープンソースや産業環境で研究されてきた。学術会議や雑誌などの出版の道は、成長する技術産業に寄与している。しかし、学界における多様性に関する研究はほとんど行われていない。本稿では,ソフトウェア工学の会議や雑誌に掲載した著者の民族,性別,地理的多様性について検討する。ソフトウェア工学における3つのトップカンファレンスと2つのトップジャーナルの出版物の多様性を体系的に定量的に分析し、ソフトウェア工学の会議や出版物において、特定の民族、性別、地理的な場所に属する著者や委員に対するバイアスと参入障壁の存在を示唆する。本研究は,2010年から2022年までのICSE, FSE, ASEおよびIEEE TSE, ACM TOSEMの会議から,出版物(受理者)および委員会データ(プログラム・組織委員会・ジャーナル編集委員会)を分析した。このデータの分析によると、参加者や委員会メンバーの間では、アフリカ、南アメリカ、オセアニアの国からの出版物など、表現力が著しく低いコミュニティが存在する。しかし、委員会の多様性と参加者との相関研究では決定的な証拠は得られなかった。さらに、白人作家や男性作家との論文が引用される可能性が高いという決定的な証拠はない。最後に、2010-2022年の間に著者の民族多様性が向上したが、性別や地理的多様性は改善しなかった。

Diversity with respect to ethnicity and gender has been studied in open-source and industrial settings for software development. Publication avenues such as academic conferences and journals contribute to the growing technology industry. However, there have been very few diversity-related studies conducted in the context of academia. In this paper, we study the ethnic, gender, and geographical diversity of the authors published in Software Engineering conferences and journals. We provide a systematic quantitative analysis of the diversity of publications and organizing and program committees of three top conferences and two top journals in Software Engineering, which indicates the existence of bias and entry barriers towards authors and committee members belonging to certain ethnicities, gender, and/or geographical locations in Software Engineering conferences and journal publications. For our study, we analyse publication (accepted authors) and committee data (Program and Organizing committee/ Journal Editorial Board) from the conferences ICSE, FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The analysis of the data shows that across participants and committee members, there are some communities that are consistently significantly lower in representation, for example, publications from countries in Africa, South America, and Oceania. However, a correlation study between the diversity of the committees and the participants did not yield any conclusive evidence. Furthermore, there is no conclusive evidence that papers with White authors or male authors were more likely to be cited. Finally, we see an improvement in the ethnic diversity of the authors over the years 2010-2022 but not in gender or geographical diversity.

翻訳日:2023-10-26 18:28:10 公開日:2023-10-24

# GenKIE:ロバストな生成型マルチモーダルドキュメントキー情報抽出

GenKIE: Robust Generative Multimodal Document Key Information Extraction ( http://arxiv.org/abs/2310.16131v1 )

ライセンス: Link先を確認

Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng

(参考訳) スキャンされた文書からのキー情報抽出(KIE)は、様々な分野への応用があることから注目を集めている。最近のKIE手法は有望な結果を達成しているが、それらは通常、識別モデルに基づいて構築されており、光学文字認識(OCR)エラーを処理する能力がなく、手間のかかるトークンレベルのラベル付けを必要とする。本稿では、KIEタスクに対処する新しい生成的エンドツーエンドモデルであるGenKIEを提案する。GenKIEは、マルチモーダルエンコーダを用いて視覚、レイアウト、テキストの特徴を埋め込み、デコーダを用いて所望の出力を生成する、シーケンスツーシーケンスのマルチモーダル生成モデルである。適切に設計されたプロンプトを利用して、ラベルのセマンティクスを弱教師付き信号として組み込み、キー情報の生成を促す。生成モデルの顕著な利点は、OCRエラーの自動修正が可能になることである。さらに、トークンレベルの細粒度アノテーションも不要である。複数の公開実世界データセットでの広範な実験により、GenKIEは様々な種類の文書に効果的に一般化し、最先端の結果を達成することが示された。実験ではOCRエラーに対するモデルの堅牢性も検証されており、GenKIEは実世界のシナリオに高い適用性を持つ。

Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this paper, we propose a novel generative end-to-end model, named GenKIE, to address the KIE task. GenKIE is a sequence-to-sequence multimodal generative model that utilizes multimodal encoders to embed visual, layout and textual features and a decoder to generate the desired output. Well-designed prompts are leveraged to incorporate the label semantics as the weakly supervised signals and entice the generation of the key information. One notable advantage of the generative model is that it enables automatic correction of OCR errors. Besides, token-level granular annotation is not required. Extensive experiments on multiple public real-world datasets show that GenKIE effectively generalizes over different types of documents and achieves state-of-the-art results. Our experiments also validate the model's robustness against OCR errors, making GenKIE highly applicable in real-world scenarios.

翻訳日:2023-10-26 18:27:46 公開日:2023-10-24

# Octopus: アラビア語自然言語生成のためのマルチタスクモデルとツールキット

Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation ( http://arxiv.org/abs/2310.16127v1 )

ライセンス: Link先を確認

AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

(参考訳) アラビア語のテキストを理解し、人間のような応答を生成することは難しい取り組みである。多くの研究者が個々の問題に対するモデルと解決策を提案しているが、幅広いタスクを処理できる包括的なアラビア語自然言語生成ツールキットは深刻に不足している。本稿では、新しいアラビア語text-to-text TransformerモデルであるAraT5v2を提示する。新しいモデルは、2,048トークンに拡張したシーケンス長を用いて、広範かつ多様なデータで体系的に訓練されている。我々は、シングルタスクとマルチタスクの両方の設定の下で、教師なし、教師あり、および共同事前学習を含む様々な事前学習戦略を検討する。我々のモデルは、競争力のあるベースラインを大きな差で上回る。さらに一歩進めて、単一のモデルを利用する8つのアラビア語生成タスク向けのPythonベースのパッケージ兼コマンドラインツールキットであるOctopusを開発し、公開する。モデルとツールキットは公開リポジトリでリリースしている。

Understanding Arabic text and generating human-like responses is a challenging endeavor. While many researchers have proposed models and solutions for individual problems, there is an acute shortage of a comprehensive Arabic natural language generation toolkit that is capable of handling a wide range of tasks. In this work, we present a novel Arabic text-to-text Transformer model, namely AraT5v2. Our new model is methodically trained on extensive and diverse data, utilizing an extended sequence length of 2,048 tokens. We explore various pretraining strategies including unsupervised, supervised, and joint pretraining, under both single and multitask settings. Our models outperform competitive baselines with large margins. We take our work one step further by developing and publicly releasing Octopus, a Python-based package and command-line toolkit tailored for eight Arabic generation tasks all exploiting a single model. We release the models and the toolkit on our public repository.

翻訳日:2023-10-26 18:27:20 公開日:2023-10-24

# 薄肉金属添加物製造におけるオンライン熱場予測

Online Thermal Field Prediction for Metal Additive Manufacturing of Thin Walls ( http://arxiv.org/abs/2310.16125v1 )

ライセンス: Link先を確認

Yifan Tang, M. Rahmani Dehaghani, Pouyan Sajadi, Shahriar Bakrani Balani, Akshay Dhalpe, Suraj Panicker, Di Wu, Eric Coatanea, G. Gary Wang

(参考訳) 本論文は、金属AMにおける実用的な問題、すなわち、少数のセンサしか利用できない場合に、これから印刷する部分の熱場をオンラインでどのように予測するかを検討することを目的とする。本研究は、マッピングと再構成を用いたオンライン熱場予測手法を提案し、この手法はオンライン性能制御のために金属AMプロセスに統合できる。温度曲線(ある1点の温度プロファイルの曲線セグメント)の類似性に基づき、熱場マッピングでは人工ニューラルネットワークを適用し、すでに印刷された層上の特定の点で測定した温度から、これから印刷する層上の点の温度曲線を推定する。同じ層上の複数の点について測定・予測された温度プロファイルを用いて、熱場再構成では縮約モデル(ROM)を提案し、同じ層上のすべての点の温度プロファイルを構築する。これは層全体の温度場の構築に利用できる。ROMの訓練は、計算効率のためにエクストリームラーニングマシン(ELM)を用いて行う。各層の長さが固定で一方向に印刷される薄肉壁を対象に、15件のワイヤアークAM実験と9件のシミュレーションを設計した。試験結果から、提案する予測手法は、低コストのデスクトップ上で、これから印刷する層の熱場を0.1秒以内に構築できることが示された。また、本手法は、同一シミュレーション内の低層から高層へ、およびあるシミュレーションから異なるAMプロセスパラメータの新しいシミュレーションへと、ほとんどの場合で許容できる汎化能力を持つ。さらに重要な点として、限られた実験データで提案手法を微調整した後、新しい実験におけるすべての予測温度プロファイルの相対誤差は十分に小さく、金属AMのオンライン応用における提案手法の適用性と汎化性が示された。

This paper aims to study a practical issue in metal AM, i.e., how to predict the thermal field of yet-to-print parts online when only a few sensors are available. This work proposes an online thermal field prediction method using mapping and reconstruction, which could be integrated into a metal AM process for online performance control. Based on the similarity of temperature curves (curve segments of a temperature profile of one point), the thermal field mapping applies an artificial neural network to estimate the temperature curves of points on the yet-to-print layer from measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, the thermal field reconstruction proposes a reduced order model (ROM) to construct the temperature profiles of all points on the same layer, which could be used to build the temperature field of the entire layer. The training of ROM is performed with an extreme learning machine (ELM) for computational efficiency. Fifteen wire arc AM experiments and nine simulations are designed for thin walls with a fixed length and unidirectional printing of each layer. The test results indicate that the proposed prediction method could construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop. Meanwhile, the method has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters. More importantly, after fine-tuning the proposed method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment are sufficiently small, demonstrating the applicability and generalization of the proposed thermal field prediction method in online applications for metal AM.

翻訳日:2023-10-26 18:27:06 公開日:2023-10-24

# アンカー空間最適輸送:複数のOT問題のバッチ処理の高速化

Anchor Space Optimal Transport: Accelerating Batch Processing of Multiple OT Problems ( http://arxiv.org/abs/2310.16123v1 )

ライセンス: Link先を確認

Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai

(参考訳) 最適輸送(OT)理論は、定義された距離空間上の確率分布を比較する効果的な方法を提供するが、3乗オーダーの計算量に悩まされる。SinkhornのアルゴリズムはOT解の計算量を大幅に削減するが、複数のOT問題を解くことは、実際には依然として時間とメモリを消費する。しかし、OTの計算高速化に関する多くの研究は通常、単一のOT問題を前提としており、ミニバッチ内の分布に潜在する共通特性を無視している。そこで我々は、複数のOT問題の解のバッチ処理に特化して設計された、アンカー空間最適輸送(ASOT)問題と呼ぶ変換されたOT問題を提案する。提案するASOT問題では、分布は共有のアンカー点空間に写像され、この空間が潜在的な共通特性を学習することで、OTのバッチ処理の高速化に寄与する。提案するASOTに基づき、元のOT問題に対するWasserstein距離の誤差が地上コストの誤差によって抑えられることを証明する。これを踏まえ、距離誤差を最小化するアンカー空間を学習する3つの手法を提案し、それぞれに適用場面がある。実世界のデータセットでの数値実験により、提案手法は妥当な近似性能を維持しつつ計算時間を大幅に短縮できることを示す。

The optimal transport (OT) theory provides an effective way to compare probability distributions on a defined metric space, but it suffers from cubic computational complexity. Although Sinkhorn's algorithm greatly reduces the computational complexity of OT solutions, solving multiple OT problems is still time-consuming and memory-consuming in practice. However, many works on the computational acceleration of OT are usually based on the premise of a single OT problem, ignoring the potential common characteristics of the distributions in a mini-batch. Therefore, we propose a translated OT problem designated as the anchor space optimal transport (ASOT) problem, which is specially designed for batch processing of multiple OT problem solutions. For the proposed ASOT problem, the distributions will be mapped into a shared anchor point space, which learns the potential common characteristics and thus helps accelerate OT batch processing. Based on the proposed ASOT, the Wasserstein distance error to the original OT problem is proven to be bounded by ground cost errors. Building upon this, we propose three methods to learn an anchor space minimizing the distance error, each of which has its application background. Numerical experiments on real-world datasets show that our proposed methods can greatly reduce computational time while maintaining reasonable approximation performance.
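
For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal pure-Python sketch of Sinkhorn's algorithm for a single entropy-regularized OT problem. It is not the ASOT method and is not taken from the paper; the function name `sinkhorn`, the regularization strength `reg`, and the fixed iteration count are assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, reg=1.0, n_iters=500):
    """Entropy-regularized OT between two histograms (illustrative sketch).

    a, b  : lists of non-negative weights, each summing to 1
    cost  : len(a) x len(b) nested list of ground costs
    reg   : entropic regularization strength (assumed value)
    Returns the transport plan and its transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    value = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, value


if __name__ == "__main__":
    plan, value = sinkhorn([0.7, 0.3], [0.4, 0.6], [[0.0, 1.0], [1.0, 0.0]])
    print(plan, value)
```

Each iteration only rescales rows and columns of a fixed kernel, which is why a single problem is cheap; the abstract's point is that repeating this independently for every pair in a mini-batch still adds up in time and memory.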

翻訳日:2023-10-26 18:26:38 公開日:2023-10-24

# 19個のパラメータがあれば十分:素粒子物理学のための小さなニューラルネットワーク

19 Parameters Is All You Need: Tiny Neural Networks for Particle Physics ( http://arxiv.org/abs/2310.16121v1 )

ライセンス: Link先を確認

Alexander Bogatskiy, Timothy Hoffman, Jan T. Offermann

(参考訳) 粒子加速器の衝突率が高まり、深層学習ソリューションがその実用性を示すにつれて、トリガーのような低レイテンシタスク向けの軽量で高速なニューラルネットワークアーキテクチャの必要性が高まっている。本稿では、最近のローレンツ対称かつ置換対称なアーキテクチャであるPELICANの可能性を検証し、トップクォークジェットタギングの二値分類タスクで比較した場合に、数万のパラメータを持つ汎用アーキテクチャを上回る、わずか19個の学習可能パラメータしか持たないインスタンスを提示する。

As particle accelerators increase their collision rates, and deep learning solutions prove their viability, there is a growing need for lightweight and fast neural network architectures for low-latency tasks such as triggering. We examine the potential of one recent Lorentz- and permutation-symmetric architecture, PELICAN, and present its instances with as few as 19 trainable parameters that outperform generic architectures with tens of thousands of parameters when compared on the binary classification task of top quark jet tagging.

翻訳日:2023-10-26 18:26:17 公開日:2023-10-24

# 静的長距離双極子相互作用による量子位置相関を持つ冷エミッタアンサンブル中の光伝播

Propagation of light in cold emitter ensembles with quantum position correlations due to static long-range dipolar interactions ( http://arxiv.org/abs/2310.16158v1 )

ライセンス: Link先を確認

G. J. Bean, N. D. Drummond, J. Ruostekoski

(参考訳) 我々は、乱れた位置が静的な長距離双極子-双極子相互作用によって引き起こされる相関を示す双極子エミッタからの光の散乱を分析する。量子力学的な位置相関は、変分量子モンテカルロ法および拡散量子モンテカルロ法を用いて、ゼロ温度のボース原子または分子に対して計算される。低光強度の極限における高密度アンサンブル中の静止原子に対して、シミュレーションは、電子基底状態と励起状態を含む位置相関関数のすべての次数までの光学応答の解を与える。我々は、コヒーレント散乱と非コヒーレント散乱、集団的な線幅、線シフト、固有モード、および乱れに誘起された励起の局在が、静的相互作用と密度によってどのように影響されるかを計算する。強く閉じ込められた扁平(oblate)トラップおよび長球(prolate)トラップでは、主に反発的な静的相互作用が双極子間に短距離秩序をもたらし、光を介した共鳴双極子-双極子相互作用における大きな揺らぎを抑える。その結果、典型的にはコヒーレント反射と光学的深さが増大し、非コヒーレント散乱が減少する。静的双極子相互作用の存在は、高密度の雲におけるサブラジアント固有モードの高選択的な励起を可能にする。この効果は、共鳴が自然線幅より狭くなる長球トラップにおいてさらに顕著になる。静的双極子相互作用が光遷移周波数に影響を及ぼす場合、アンサンブルは、不均一に作用する静的双極子相互作用によって不均一広がりを示し、これが協調効果を抑制する。

We analyze the scattering of light from dipolar emitters whose disordered positions exhibit correlations induced by static, long-range dipole-dipole interactions. The quantum-mechanical position correlations are calculated for zero temperature bosonic atoms or molecules using variational and diffusion quantum Monte Carlo methods. For stationary atoms in dense ensembles in the limit of low light intensity, the simulations yield solutions for the optical responses to all orders of position correlation functions that involve electronic ground and excited states. We calculate how coherent and incoherent scattering, collective linewidths, line shifts, and eigenmodes, and disorder-induced excitation localization are influenced by the static interactions and the density. We find that dominantly repulsive static interactions in strongly confined oblate and prolate traps introduce short-range ordering among the dipoles which curtails large fluctuations in the light-mediated resonant dipole-dipole interactions. This typically results in an increase in coherent reflection and optical depth, accompanied by reduced incoherent scattering. The presence of static dipolar interactions permits the highly selective excitation of subradiant eigenmodes in dense clouds. This effect becomes even more pronounced in a prolate trap, where the resonances narrow below the natural linewidth. When the static dipolar interactions affect the optical transition frequencies, the ensemble exhibits inhomogeneous broadening due to the nonuniformly experienced static dipolar interactions that suppress cooperative effects.

翻訳日:2023-10-26 18:21:51 公開日:2023-10-24

# 議論による文脈認識特徴帰属

Context-aware feature attribution through argumentation ( http://arxiv.org/abs/2310.16157v1 )

ライセンス: Link先を確認

Jinfeng Zhong, Elsa Negre

(参考訳) 特徴帰属(feature attribution)は、機械学習とデータ分析の両方において、モデル出力に対する個々の特徴や変数の寄与を決定する基本的なタスクである。このプロセスは、結果を予測する上で最も重要な特徴を特定するのに役立つ。特徴帰属法の歴史は、従属変数と独立変数の間の非線形関係を組み込んで線形回帰モデルを拡張する一般化加法モデル(GAM)に遡ることができる。近年、勾配に基づく手法やサロゲートモデルが複雑な人工知能(AI)システムに応用されているが、これらの手法には限界がある。GAMは精度が低い傾向にあり、勾配に基づく手法は解釈が難しく、サロゲートモデルはしばしば安定性と忠実性の問題に苦しむ。さらに、既存の手法の多くはユーザのコンテキストを考慮していないが、コンテキストはユーザの好みに大きな影響を及ぼしうる。このような制約に対処し、現在の最先端を推し進めるために、我々は、CA-FATA(Context-Aware Feature Attribution Through Argumentation)と呼ばれる新しい特徴帰属フレームワークを定義する。我々のフレームワークは、各特徴を、予測を支持、攻撃、または中和しうる論証として扱うことによって、議論(argumentation)の力を利用する。さらに、CA-FATAは特徴帰属を議論手続きとして定式化し、各計算が明示的な意味論を持つため、本質的に解釈可能である。CA-FATAはまた、ユーザのコンテキストなどのサイド情報を容易に統合でき、より正確な予測をもたらす。

Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to General Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state-of-the-art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure, and each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.

翻訳日:2023-10-26 18:21:06 公開日:2023-10-24

# 光による超伝導量子ビットのコヒーレント制御

Coherent control of a superconducting qubit using light ( http://arxiv.org/abs/2310.16155v1 )

ライセンス: Link先を確認

Hana K. Warner (1), Jeffrey Holzgrafe (1 and 2), Beatriz Yankelevich (3), David Barton (1), Stefano Poletto (3), C. J. Xin (1), Neil Sinclair (1 and 4), Di Zhu (1), Eyob Sete (3), Brandon Langley (3), Emma Batson (5), Marco Colangelo (5), Amirhassan Shams-Ansari (1), Graham Joe (1), Karl K. Berggren (5), Liang Jiang (6), Matthew Reagor (3), and Marko Loncar (1) ((1) Harvard John A. Paulson School for Engineering and Applied Sciences, Cambridge, MA, USA, (2) Hyperlight Corporation, Cambridge, MA, USA, (3) Rigetti Computing, Berkeley, CA, USA, (4) Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA, USA, (5) Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA, (6) Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA)

(参考訳) 量子科学と技術は、低損失および低ノイズ通信チャネルに接続された量子プロセッサのネットワークに依存する強力な計算資源の実現を約束している [1,2]。極低温環境で動作する超伝導マイクロ波量子ビット (3-8 GHz) は、その強いジョセフソン非線形性と低損失 [3] のために量子プロセッサノードの有望な候補として現れているが、空間的に分離されたプロセッサノード間の情報は、低損失光ファイバを伝搬する通信光子 (200 THz) を介して室温で伝達される可能性が高い。したがって、これらの異なる周波数間の量子情報の変換 [4-10] は、各プラットフォームの利点を量子資源と対向させることで活用することが重要である。ここでは超伝導量子ビットのコヒーレント光制御を示す。我々は、最大1.18%の変換効率(1.16%の協調性)で動作し、量子コヒーレンス時間 (800 ns) に影響を与えずに超伝導量子ビット内のラビ振動 (2.27 MHz) を示すマイクロ波光量子トランスデューサを開発した。最後に,ネットワーク量子プロセッサノードへのトランスデューサの利用に関する展望について述べる。

Quantum science and technology promise the realization of a powerful computational resource that relies on a network of quantum processors connected with low loss and low noise communication channels capable of distributing entangled states [1,2]. While superconducting microwave qubits (3-8 GHz) operating in cryogenic environments have emerged as promising candidates for quantum processor nodes due to their strong Josephson nonlinearity and low loss [3], the information between spatially separated processor nodes will likely be carried at room temperature via telecommunication photons (200 THz) propagating in low loss optical fibers. Transduction of quantum information [4-10] between these disparate frequencies is therefore critical to leverage the advantages of each platform by interfacing quantum resources. Here, we demonstrate coherent optical control of a superconducting qubit. We achieve this by developing a microwave-optical quantum transducer that operates with up to 1.18% conversion efficiency (1.16% cooperativity) and demonstrate optically-driven Rabi oscillations (2.27 MHz) in a superconducting qubit without impacting qubit coherence times (800 ns). Finally, we discuss outlooks towards using the transducer to network quantum processor nodes.

翻訳日:2023-10-26 18:20:23 公開日:2023-10-24

# 深層ニューラルネットワークにおける不変表現の学習による次元の呪いの破れ

Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations ( http://arxiv.org/abs/2310.16154v1 )

ライセンス: Link先を確認

Leonardo Petrini

(参考訳) 人工知能、特に機械学習のサブフィールドは、データから学び、データに適応するデータ駆動モデルへとパラダイムシフトしている。このことは、自然言語処理やコンピュータビジョンといった様々な領域において前例のない進歩をもたらした。ディープラーニングは、一連の計算層を通じて生データから関連する特徴を学習することで、従来のアプローチをはるかに超えている。この論文は、これらのモデルのアーキテクチャとそれらが処理するデータ内の固有の構造との関係を研究することによって、ディープラーニングの理論的基礎を探求する。特に、深層学習アルゴリズムの有効性を問うことで、いわゆる次元の呪い(すなわち、次元が増大するデータポイントの必要性が指数関数的に増加することによる、高次元での一般学習の難しさ)を克服できるだろうか? データの構造を利用して、関連する表現を学ぶ能力はあるか? 異なるアーキテクチャはどのように異なるデータ構造を利用するのか? これらの問題に対処するために、データの構造は、その不変性、すなわち、手元にあるタスクに無関係な側面によって効果的に特徴づけられるという考えを推し進める。本手法は,実験研究と物理モデルを組み合わせた深層学習への経験的アプローチを取り入れている。これらの単純化されたモデルは、私たちが深層学習システムで観察する複雑な振る舞いを調査し、解釈し、理論と実践のギャップを埋めることが目的である。

Artificial intelligence, particularly the subfield of machine learning, has seen a paradigm shift towards data-driven models that learn from and adapt to data. This has resulted in unprecedented advancements in various domains such as natural language processing and computer vision, largely attributed to deep learning, a special class of machine learning models. Deep learning arguably surpasses traditional approaches by learning the relevant features from raw data through a series of computational layers. This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. In particular, we ask: What drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality, i.e., the difficulty of generally learning functions in high dimensions due to the exponentially increasing need for data points with increased dimensionality? Is it their ability to learn relevant representations of the data by exploiting their structure? How do different architectures exploit different data structures? In order to address these questions, we push forward the idea that the structure of the data can be effectively characterized by its invariances, i.e., aspects that are irrelevant for the task at hand. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models. These simplified models allow us to investigate and interpret the complex behaviors we observe in deep learning systems, offering insights into their inner workings, with the far-reaching goal of bridging the gap between theory and practice.
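
The exponential growth in required data points can be made concrete with a simple counting argument, offered here as an illustration rather than as the thesis's own analysis: covering the unit cube in d dimensions with a regular grid of k cells per axis takes k^d cells, so sampling each cell once needs that many points. The function name `grid_cells` is hypothetical.

```python
def grid_cells(cells_per_axis: int, dimension: int) -> int:
    """Number of cells in a regular grid over the unit cube [0, 1]^dimension."""
    if cells_per_axis < 1 or dimension < 1:
        raise ValueError("cells_per_axis and dimension must be positive")
    return cells_per_axis ** dimension


if __name__ == "__main__":
    # Same resolution per axis (10 cells), increasing dimension
    for d in (1, 2, 3, 10, 100):
        print(d, grid_cells(10, d))
```

At a fixed resolution of 10 cells per axis, the count goes from 10 in one dimension to 10^100 in a hundred dimensions, which is why exploiting structure such as invariances matters more than adding data.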

翻訳日:2023-10-26 18:19:41 公開日:2023-10-24

# WojoodNER 2023: アラビア語初の固有表現認識共有タスク

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task ( http://arxiv.org/abs/2310.16153v1 )

ライセンス: Link先を確認

Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa' Omar

(参考訳) WojoodNER-2023は、初のアラビア語固有表現認識(NER)共有タスクである。WojoodNER-2023の主な焦点はアラビア語のNERであり、新しいNERデータセット(すなわちWojood)と、異なるNERアプローチ間の有意義な比較を促進するために設計されたサブタスクの定義を提供する。WojoodNER-2023はFlatNERとNestedNERの2つのサブタスクを含む。合計45のチームがこの共有タスクに登録し、そのうち11チームがテストフェーズに積極的に参加した。具体的には、11チームがFlatNERに参加し、8チームがNestedNERに取り組んだ。優勝チームのF1スコアは、FlatNERで91.96、NestedNERで93.73であった。

We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful comparisons between different NER approaches. WojoodNER-2023 encompassed two subtasks: FlatNER and NestedNER. A total of 45 unique teams registered for this shared task, with 11 of them actively participating in the test phase. Specifically, 11 teams participated in FlatNER, while 8 teams tackled NestedNER. The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.
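
As a reminder of what an F1 score on NER measures, here is a small sketch of entity-level precision, recall, and F1 under exact-match scoring. The abstract does not describe the shared task's scoring script, so the exact-match rule, the `(start, end, label)` span representation, and the function name `entity_f1` are assumptions made for illustration.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level precision, recall, and F1 (illustrative sketch).

    gold, predicted : iterables of (start, end, label) tuples
    A predicted entity counts as correct only if span and label both match.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
    predicted = [(0, 2, "PER"), (5, 6, "ORG")]  # second entity has the wrong label
    print(entity_f1(gold, predicted))
```

Using spans rather than tokens means a prediction with the right boundaries but the wrong label earns no credit, which is the usual convention when comparing flat and nested NER systems.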

翻訳日:2023-10-26 18:19:08 公開日:2023-10-24

# FLTrojan: 選択的な重み付けによるフェデレーション言語モデルに対するプライバシ漏洩攻撃

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering ( http://arxiv.org/abs/2310.16152v1 )

ライセンス: Link先を確認

Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz

(参考訳) フェデレーション学習(federated learning, fl)は、言語モデリングを含む多くのテクノロジベースのアプリケーションにおいて、重要なコンポーネントになりつつある。しかし、連合言語モデルにおけるプライバシー漏洩の程度を認識するのは簡単ではなく、既存の攻撃は、それがどれほど敏感であるか、あるいは無意味であるかに関わらず、データを抽出することだけを目的としている。本稿では,このギャップを埋めるため,フェデレーション言語モデルからプライバシに敏感なユーザデータを漏洩する2つの新たな知見を提案する。まず、FLの中間ラウンドからのモデルスナップショットが、最終的なトレーニングモデルよりも大きなプライバシリークを引き起こす可能性があることを重要視する。第2に、センシティブなトレーニングデータを記憶する責任を特に負うモデルの選択的な重みを改ざんすることで、プライバシの漏洩が増大する可能性があることを特定する。悪意のあるクライアントが、サーバからの協力なしに、FL内の他のユーザのプライバシーに敏感なデータを漏洩させる方法を示す。提案手法は, メンバシップ推定のリコールを29%向上させ, 最大70%のプライベートデータ再構成を達成し, 敵の能力の強い仮定で既存の攻撃よりも優れていた。

Federated learning (FL) is becoming a key component in many technology-based applications including language modeling -- where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

翻訳日:2023-10-26 18:18:45 公開日:2023-10-24

# Yin Yang Convolutional Nets:Opposites解析による画像マニフォールド抽出

Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites ( http://arxiv.org/abs/2310.16148v1 )

ライセンス: Link先を確認

Augusto Seben da Rosa, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior

(参考訳) コンピュータビジョン全般では、学習の最適化や新しいアーキテクチャ(純粋なアテンション、効率的なブロック、視覚言語モデル、生成モデルなど)といったいくつかの進歩が見られた。これにより、分類などのいくつかのタスクの性能が向上した。しかし、これらのモデルの大部分は、脳に関する現実的な神経科学的アプローチから遠ざかる修正に焦点を当てている。本研究では、より生物に着想を得たアプローチを採用し、視覚多様体を抽出するアーキテクチャであるYin Yang Convolutional Networkを提示する。そのブロックは、初期層で色と形の分析を分離することを意図しており、後頭葉の働きを模擬している。我々のアーキテクチャは、CIFAR-10データセットにおいて、低パラメータのアーキテクチャの中で最先端の効率を提供することを示す。最初のモデルは93.32%のテスト精度に達し、このカテゴリの従来のSOTAより0.8%高く、パラメータ数は15万少ない(合計726k)。第2のモデルは52kのパラメータを使用し、テスト精度の低下はわずか3.86%である。ImageNetでも分析を行い、1.6Mのパラメータで66.49%の検証精度に達した。コードは https://github.com/NoSavedDATA/YinYang_CNN で公開している。

Computer vision in general has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision-language models, and generative models, among others). These have improved performance in several tasks, such as classification. However, the majority of these models focus on modifications that move away from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts a visual manifold; its blocks are intended to separate the analysis of colors and forms in its initial layers, simulating the occipital lobe's operations. Our results show that our architecture provides state-of-the-art efficiency among low-parameter architectures on the CIFAR-10 dataset. Our first model reached 93.32% test accuracy, 0.8% more than the previous SOTA in this category, while having 150k fewer parameters (726k in total). Our second model uses 52k parameters, losing only 3.86% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.

翻訳日:2023-10-26 18:18:21 公開日:2023-10-24

# PreWoMe: ロングフォーム質問回答のためのワーキングメモリとしての前提事項のエクスプロイト

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering ( http://arxiv.org/abs/2310.16147v1 )

ライセンス: Link先を確認

Wookje Han, Jinsol Park, Kyungjae Lee

(参考訳) 長文質問応答(LFQA)における情報探索質問は、その質問の曖昧さや偽の前提によって誤解を招くことが多い。既存の多くのアプローチは誤解を招く問題に対処するが、予測不可能な入力特性を持つ現実世界では不十分な限られた問題に適応している。本研究では,任意の種類の情報探索問題に対処できる統一的なアプローチであるPreWoMeを提案する。 PreWoMeのキーとなるアイデアは、質問の前提を抽出し、それらをワーキングメモリとして利用して、質問に対するフィードバックとアクションを生成することである。実験の結果,PreWoMeは誤解を招く質問に対処するだけでなく,通常の質問に対処する上でも有効であることがわかった。

Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.

翻訳日:2023-10-26 18:17:59 公開日:2023-10-24

# Clinfo.ai: 学術文献を用いた医学質問応答のためのオープンソースの検索型大規模言語モデルシステム

Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature ( http://arxiv.org/abs/2310.16146v1 )

ライセンス: Link先を確認

Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, and Nigam Shah

(参考訳) 出版される医学文献は急速に増加しており、臨床医や研究者が最新の関連する知見をタイムリーに把握し、要約することは困難になっている。大規模言語モデル(LLM)に基づくクローズドソースの要約ツールはいくつか存在するが、その出力に対する厳密で体系的な評価は欠如している。さらに、これらのツールを評価するための高品質なデータセットと適切なベンチマークタスクも不足している。我々は4つの貢献によってこれらの問題に対処する。すなわち、動的に検索された科学文献に基づいて臨床上の質問に回答するオープンソースのWebアプリであるClinfo.aiを公開する。このような検索拡張型LLMシステムの性能を評価するための情報検索および抽象型要約タスクを定める。公開済みのシステマティックレビューから得られた200の質問とそれに対応する回答からなるデータセットを公開し、これをPubMed Retrieval and Synthesis(PubMedRS-200)と名付ける。そして、PubMedRS-200におけるClinfo.aiおよびその他の公開OpenQAシステムのベンチマーク結果を報告する。

The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.

翻訳日:2023-10-26 18:17:45 公開日:2023-10-24

# 限られた記憶容量を持つ言語モデルは人間の文処理における干渉を捉える

A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing ( http://arxiv.org/abs/2310.16142v1 )

ライセンス: Link先を確認

William Timkey, Tal Linzen

(参考訳) 人間の文処理の難しさを左右すると考えられている2つの中心的な要因は、期待と作業記憶からの検索である。この2つの要因を統合した認知モデルを構築する最近の試みは、トランスフォーマー言語モデルの自己注意機構と、人間の文処理における作業記憶の手がかりに基づく検索理論との類似性に依拠していた(Ryu and Lewis 2021)。 Ryu and Lewisは、GPT-2の特殊化された注意ヘッドの注意パターンが、手がかりに基づく検索モデルの重要な予測である類似性に基づく干渉と整合することを示したが、彼らの手法は構文的に特殊化された注意ヘッドを特定する必要があり、数百のメモリ検索操作が並行して行われるという認知的に妥当でない仮定を置いている。本研究では、認知理論が仮定する記憶系により近い、単一の自己注意ヘッドを持つリカレントニューラル言語モデルを開発する。本モデルの単一の注意ヘッドが、人間の実験で観察された意味的および構文的干渉効果を捉えることを示す。

Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
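To make the retrieval analogy concrete, here is a minimal sketch of how a single attention head can show similarity-based interference. It is not the authors' recurrent model: the feature vectors and the names `attention_weights`, `cue`, `target`, `dissimilar_distractor`, and `similar_distractor` are hypothetical, and only the standard scaled dot-product attention arithmetic is shown.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical binary cue features, e.g. [is_subject, is_animate].
cue = [1.0, 1.0]
target = [1.0, 1.0]                 # matches both retrieval cues
dissimilar_distractor = [0.0, 0.0]  # matches neither cue
similar_distractor = [0.0, 1.0]     # shares one cue with the target

low_interference = attention_weights(cue, [target, dissimilar_distractor])
high_interference = attention_weights(cue, [target, similar_distractor])
```

In this toy setting the weight on the target (index 0) is smaller in `high_interference` than in `low_interference`, because the distractor that shares a feature with the cue draws attention away from the target.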

翻訳日:2023-10-26 18:17:30 公開日:2023-10-24

# 隠れた引用は科学における真の影響を覆い隠す

Hidden Citations Obscure True Impact in Science ( http://arxiv.org/abs/2310.16181v1 )

ライセンス: Link先を確認

Xiangyi Meng, Onur Varol, Albert-László Barabási

(参考訳) 参考文献は、科学者が先行する知識を示すために依拠する仕組みであるが、近年では科学的影響の尺度として広く使われ、また誤用されるようになっている。しかし、発見が常識になると、引用は「取り込みによる消失(obliteration by incorporation)」を被る。これは隠れた引用という概念につながる。隠れた引用とは、発見を具現化した出版物を参照することなく、その発見に対して明確なテキスト上のクレジットを与えるものである。ここでは、各論文の全文に適用した教師なしの解釈可能な機械学習を用いて、隠れた引用を体系的に特定する。影響力のある発見では、隠れた引用の数が引用数を上回り、これは出版媒体や分野に関係なく生じることがわかった。隠れた引用の多さは引用数によってではなく、論文本文中でのその話題に関する議論の程度によって決まることを示す。これは、発見が議論されるほど、標準的な計量書誌学的分析からは見えにくくなることを示している。隠れた引用は、計量書誌学的尺度が発見の真の影響を定量化するうえで限られた視点しか与えないことを示しており、科学コーパスの全文から知識を抽出する必要性を高めている。

References, the mechanism scientists rely on to signal previous knowledge, have lately turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.

翻訳日:2023-10-26 18:09:11 公開日:2023-10-24

# バックトラッキングによる補正は要約における幻覚を減少させる

Correction with Backtracking Reduces Hallucination in Summarization ( http://arxiv.org/abs/2310.16176v1 )

ライセンス: Link先を確認

Zhenzhen Liu, Chao Wan, Varsha Kishore, Jin Peng Zhou, Minmin Chen, Kilian Q. Weinberger

(参考訳) 抽象的要約は、重要な要素を保持しながら、ソースドキュメントの簡潔な自然言語要約を生成することを目的としている。近年の進歩にもかかわらず、ニューラルテキスト要約モデルは幻覚(より正確には作話)、すなわちソースドキュメントに根拠のない詳細を含む要約を生成しやすいことが知られている。本稿では,抽象的要約における幻覚を低減する,シンプルだが効率的な手法であるCoBaを紹介する。このアプローチは幻覚の検出と緩和という2つのステップに基づいている。前者は、条件付き単語確率と文脈語への距離に関する単純な統計量を測定することで達成できることを示す。さらに,単純なバックトラッキングが緩和に驚くほど効果的であることを示す。テキスト要約のための3つのベンチマークデータセットにおいて,提案手法を先行技術とともに徹底的に評価した。その結果,CoBaは幻覚の低減に有効かつ効率的であり,適応性と柔軟性に優れていることが示された。

Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is, to producing summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straightforward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.

翻訳日:2023-10-26 18:08:52 公開日:2023-10-24

# G-CASCADE:2次元医用画像分割のための効率的なカスケードグラフ畳み込みデコーディング

G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation ( http://arxiv.org/abs/2310.16175v1 )

ライセンス: Link先を確認

Md Mostafijur Rahman and Radu Marculescu

(参考訳) 近年,医用画像分割はコンピュータ支援診断の分野において重要な応用となっている。本稿では,2次元医用画像分割のための新しいグラフ畳み込みベースのデコーダであるCascaded Graph Convolutional Attention Decoder(G-CASCADE)を初めて提案する。 G-CASCADEは、階層型トランスフォーマーエンコーダが生成する多段の特徴マップを、効率的なグラフ畳み込みブロックによって段階的に洗練する。エンコーダは自己注意機構を利用して長距離依存関係を捉え、デコーダはグラフ畳み込みブロックのグローバルな受容野により長距離情報を保持したまま特徴マップを洗練する。複数のトランスフォーマーエンコーダを用いたデコーダの厳密な評価により、5つの医用画像分割タスク(腹部臓器、心臓臓器、ポリープ病変、皮膚病変、網膜血管)において、我々のモデルが他の最先端(SOTA)手法を上回ることを示す。また,我々のデコーダは,SOTAのCASCADEデコーダと比べてパラメータ数が80.8%、FLOPsが82.3%少ないにもかかわらず,より優れたDICEスコアを達成することを示す。我々のデコーダは他の階層型エンコーダと容易に組み合わせることができ、汎用的なセマンティックセグメンテーションおよび医用画像分割タスクに利用できる。

In recent years, medical image segmentation has become an important application in the field of computer-aided diagnosis. In this paper, we are the first to propose a new graph convolution-based decoder, namely the Cascaded Graph Convolutional Attention Decoder (G-CASCADE), for 2D medical image segmentation. G-CASCADE progressively refines multi-stage feature maps generated by hierarchical transformer encoders with an efficient graph convolution block. The encoder utilizes the self-attention mechanism to capture long-range dependencies, while the decoder refines the feature maps preserving long-range information due to the global receptive fields of the graph convolution block. Rigorous evaluations of our decoder with multiple transformer encoders on five medical image segmentation tasks (i.e., Abdomen organs, Cardiac organs, Polyp lesions, Skin lesions, and Retinal vessels) show that our model outperforms other state-of-the-art (SOTA) methods. We also demonstrate that our decoder achieves better DICE scores than the SOTA CASCADE decoder with 80.8% fewer parameters and 82.3% fewer FLOPs. Our decoder can easily be used with other hierarchical encoders for general-purpose semantic and medical image segmentation tasks.

翻訳日:2023-10-26 18:08:39 公開日:2023-10-24

# フォトニック状態の量子幾何学による電磁非相反性の探索

Probing Electromagnetic Nonreciprocity with Quantum Geometry of Photonic States ( http://arxiv.org/abs/2310.16174v1 )

ライセンス: Link先を確認

Ioannis Petrides, Jonathan B. Curtis, Marie Wesson, Amir Yacoby, Prineha Narang

(参考訳) 誘電体および磁性材料における相反および非相反効果は、電子の微視的性質に関する重要な情報を提供する。しかし、この2つを実験的に区別することは、特に関連する効果が極めて小さい場合には困難であることがわかっている。そこで本研究では,関心のある材料を中心に配置した交差共振器(クロスキャビティ)デバイスを用いた非接触検出を提案する。カー回転やファラデー回転、複屈折といった材料の光学特性が、共振器の電磁モード間の結合と共鳴周波数のシフトに現れることを示す。幾何学的フォトニック状態のダイナミクスを計算することにより、量子計量と量子プロセストモグラフィーに基づく測定プロトコルを定式化する。このプロトコルは、材料の複素屈折率の個々の成分を分離し、関連するパラメータ推定の分散に対する量子力学的クラメール・ラオ限界を最小化する。本手法は,光共振器におけるフォック状態や,マイクロ波およびTHz共振器におけるコヒーレント状態など,幅広い実験プラットフォームに適用可能であると期待される。

Reciprocal and nonreciprocal effects in dielectric and magnetic materials provide crucial information about the microscopic properties of electrons. However, experimentally distinguishing the two has proven to be challenging, especially when the associated effects are extremely small. To this end, we propose a contact-less detection using a cross-cavity device where a material of interest is placed at its centre. We show that the optical properties of the material, such as Kerr and Faraday rotation or birefringence, manifest in the coupling between the cavities' electromagnetic modes and in the shift of their resonant frequencies. By calculating the dynamics of a geometrical photonic state, we formulate a measurement protocol based on the quantum metric and quantum process tomography that isolates the individual components of the material's complex refractive index and minimizes the quantum mechanical Cramér-Rao bound on the variance of the associated parameter estimation. Our approach is expected to be applicable across a broad spectrum of experimental platforms including Fock states in optical cavities or coherent states in microwave and THz resonators.

翻訳日:2023-10-26 18:08:18 公開日:2023-10-24

# $\epsilon$-Greedy探索を用いたDeep Q-Networkの収束とサンプル複雑度の解析について

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration ( http://arxiv.org/abs/2310.16173v1 )

ライセンス: Link先を確認

Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

(参考訳) 本稿では,深層強化学習における$\varepsilon$-greedy探索を用いたDeep Q-Network(DQN)の理論的理解を提供する。 DQNの多大な経験的成果にもかかわらず、その理論的特徴づけは十分に解明されていない。第1に、既存の解析では探索戦略が非現実的であるか、無視されている。第2に、従来のQ学習アルゴリズムとは対照的に、DQNはターゲットネットワークと経験リプレイを用いて、Qネットワークの訓練に使う平均二乗ベルマン誤差(MSBE)の不偏推定を得る。しかし、DQNの既存の理論解析は収束解析を欠いているか、あるいは大幅に過剰パラメータ化されたニューラルネットワークを用いることで技術的な課題を回避しており、これは計算効率が悪い。本稿では,$\epsilon$-greedy方策を用いたDQNの実用的な設定に対して,初めて理論的な収束解析とサンプル複雑度解析を与える。減衰する$\epsilon$を用いた反復手順が最適Q値関数に幾何的に収束することを証明する。さらに、$\epsilon$の値が大きいほど収束領域は広がるが収束は遅くなり、$\epsilon$の値が小さい場合はその逆が成り立つ。実験は、DQNについて確立した理論的洞察を裏付ける。

This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.
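As a concrete reference for the exploration scheme discussed above, here is a minimal sketch of $\epsilon$-greedy action selection with a decaying $\epsilon$. The exponential schedule and the names `epsilon_greedy`, `decayed_epsilon`, `eps_start`, `eps_end`, and `decay` are assumptions made for illustration; the abstract does not specify the decay schedule analysed in the paper.

```python
import random

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay=0.99):
    """Exponentially decaying exploration rate, floored at eps_end (assumed schedule)."""
    return max(eps_end, eps_start * decay ** step)

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical Q-value estimates for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
late_epsilon = decayed_epsilon(step=10_000)
```

A larger `epsilon` makes the agent sample more actions at random, which matches the trade-off stated in the abstract: more exploration widens the region of convergence but slows convergence.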

翻訳日:2023-10-26 18:07:58 公開日:2023-10-24

# iNVS:新規ビュー合成のための拡散インペインターの再利用

iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis ( http://arxiv.org/abs/2310.16167v1 )

ライセンス: Link先を確認

Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

(参考訳) 単一のソース画像から一貫した新規ビューを生成する方法を提案する。本手法は,ソース画像の可視画素の再利用を最大化することに重点を置く。これを実現するために,単眼深度推定器を用いて可視画素をソースビューからターゲットビューへ転送する。事前学習済みの2次元インペインティング拡散モデルから出発し,大規模なObjaverseデータセットで学習して3次元オブジェクトの事前知識を獲得する。学習時には、エピポーラ線に基づく新しいマスキング機構を用いて、手法の品質をさらに向上させる。これにより、さまざまなオブジェクトに対してゼロショットの新規ビュー合成を行うことができる。 Google Scanned Objects、Ray Traced Multiview、Common Objects in 3Dという3つの挑戦的なデータセットで、フレームワークのゼロショット能力を評価する。詳細は私たちのWebページを参照: https://yashkant.github.io/invs/

We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaverse dataset to learn 3D object priors. While training we use a novel masking mechanism based on epipolar lines to further improve the quality of our approach. This allows our framework to perform zero-shot novel view synthesis on a variety of objects. We evaluate the zero-shot abilities of our framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D. See our webpage for more details: https://yashkant.github.io/invs/

翻訳日:2023-10-26 18:07:39 公開日:2023-10-24

# Brainchop:次世代Webベースのニューロイメージングアプリケーション

Brainchop: Next Generation Web-Based Neuroimaging Application ( http://arxiv.org/abs/2310.16162v1 )

ライセンス: Link先を確認

Mohamed Masoud, Pratyush Reddy, Farfalla Hu, and Sergey Plis

(参考訳) ブラウザ内で直接ボリューム画像処理を行うことは、特に医療データの場合、従来のバックエンドツールと比較して前例のない課題を伴う。これらの課題は、計算リソースの制約やフロントエンド機械学習ライブラリの利用可能性など、ブラウザ環境に固有の制限から生じる。その結果、エンドユーザーのデータプライバシとデータレジデンシーを保ちながら、全脳の前処理とセグメンテーションに包括的なエンドツーエンドソリューションを提供できる神経画像フロントエンドツールは不足している。この状況を踏まえ、我々はBrainchop(http://www.brainchop.org)を紹介する。これは、事前学習済みの全脳深層学習モデルを用いて構造MRIのボリューム解析を可能にする画期的なブラウザ内神経画像ツールであり、技術的な専門知識や複雑なセットアップ手順を必要としない。データプライバシへの配慮に加えて、このフロントエンドツールはスケーラビリティ、低レイテンシ、使いやすい操作、クロスプラットフォーム互換性、アクセシビリティの向上など、複数の特長を備える。本稿では,Brainchopの処理パイプラインを概説し,さまざまなソフトウェアおよびハードウェア構成におけるモデルの性能を評価する。その結果,Webブラウザというリソースの制約された環境においても,堅牢なMeshNetアーキテクチャにより,ボリュームデータのクライアント側処理が実用的であることが示された。

Performing volumetric image processing directly within the browser, particularly with medical data, presents unprecedented challenges compared to conventional backend tools. These challenges arise from limitations inherent in browser environments, such as constrained computational resources and the availability of frontend machine learning libraries. Consequently, there is a shortage of neuroimaging frontend tools capable of providing comprehensive end-to-end solutions for whole brain preprocessing and segmentation while preserving end-user data privacy and residency. In light of this context, we introduce Brainchop (http://www.brainchop.org) as a groundbreaking in-browser neuroimaging tool that enables volumetric analysis of structural MRI using pre-trained full-brain deep learning models, all without requiring technical expertise or intricate setup procedures. Beyond its commitment to data privacy, this frontend tool offers multiple features, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility. This paper outlines the processing pipeline of Brainchop and evaluates the performance of models across various software and hardware configurations. The results demonstrate the practicality of client-side processing for volumetric data, owing to the robust MeshNet architecture, even within the resource-constrained environment of web browsers.

翻訳日:2023-10-26 18:07:24 公開日:2023-10-24

# MyriadAL: 病理組織学のための能動的Few Shot学習

MyriadAL: Active Few Shot Learning for Histopathology ( http://arxiv.org/abs/2310.16161v1 )

ライセンス: Link先を確認

Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, and Xingyu Li

(参考訳) アクティブラーニング(AL)とFew Shot Learning(FSL)は,近年優れた成果を上げているラベル効率のよい2つの手法である。しかし、どちらの学習パラダイムでも、先行研究の多くは膨大なラベルなしデータの豊かさを活用できていない。本研究では,アノテーションの予算は非常に限られているが,対象タスクのラベルなしデータは大量に利用できるという状況で,この問題に取り組む。本研究は、ラベリングが法外に高価である病理組織学を対象とする。そこで,対照学習エンコーダ,擬似ラベル生成,および新しいクエリサンプル選択をループに含む,能動的Few Shot学習フレームワークであるMyriad Active Learning(MAL)を提案する。具体的には、ラベルなしデータを自己教師あり方式で処理し、得られたデータ表現とクラスタリングの知識をALループを起動する基礎とする。各ALサイクルでオラクルから得られるフィードバックを用いて、エンコーダ上の浅いタスク固有ネットワークを最適化することにより、ラベルなしデータの擬似ラベルを洗練する。更新された擬似ラベルは、能動学習のクエリ選択プロセスに情報を与え、それを改善する。さらに,既存の不確実性指標を組み合わせ、不確実性リスト全体を活用して、ALにおけるサンプルの冗長性を低減する新しい方法を提案する。 2つの公開病理組織学データセットでの広範な実験により、MALは先行研究よりも優れたテスト精度、マクロF1スコア、ラベル効率を示し、データセットのわずか5%をラベル付けするだけで、完全教師ありアルゴリズムと同等のテスト精度を達成できることが示された。

Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods which have achieved excellent results recently. However, most prior art in both learning paradigms fails to explore the wealth of the vast unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), including a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to massage unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis to activate the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels serve to inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe to combine existing uncertainty measures and utilize the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.
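Combining uncertainty measures for query selection can be illustrated with two standard measures, predictive entropy and margin. The equal-weight average below is an assumption chosen for brevity and is not the recipe proposed in the paper; `entropy`, `margin_uncertainty`, `select_queries`, and the probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin_uncertainty(probs):
    """One minus the gap between the two most likely classes."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])

def select_queries(predictions, k):
    """Return indices of the k samples with the highest combined uncertainty."""
    max_entropy = math.log(len(predictions[0]))  # normalise entropy to [0, 1]
    scores = [0.5 * entropy(p) / max_entropy + 0.5 * margin_uncertainty(p)
              for p in predictions]
    ranked = sorted(range(len(predictions)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical softmax outputs for four unlabelled samples, three classes.
predictions = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]
queries = select_queries(predictions, k=2)
```

Here the two samples with nearly flat predictions (indices 3 and 1) are sent to the oracle first, while the confidently classified sample at index 0 is left unlabelled.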

翻訳日:2023-10-26 18:07:00 公開日:2023-10-24

# 高重みスタビライザーを用いたトーリック符号の単発誤り訂正

Single-shot error correction on toric codes with high-weight stabilizers ( http://arxiv.org/abs/2310.16160v1 )

ライセンス: Link先を確認

Yingjia Lin, Shilin Huang, Kenneth R. Brown

(参考訳) 量子誤り訂正符号では、測定に誤りがある場合、必要な測定ラウンド数は通常、符号距離とともに増加する。単発(シングルショット)誤り訂正では、符号サイズに関係なく、ノイズのあるシンドローム測定1ラウンドだけで誤りしきい値が得られる。ここでは、トーリック符号に対するシングルショット検査演算子を実装する。シングルショット検査は、Campbell [Campbell, 2019] に従い、ガウス消去法によって構成される。シングルショット検査演算子は、ノイズのある測定を伴う誤りモデルに対して5.62%の持続可能しきい値を与え、ノイズのある測定を複数ラウンド行う従来のトーリック符号の検査演算子を上回る。この変換の代償は、非局所的で高重みのスタビライザー生成子である。次に,スタビライザーの重みとともに測定誤りが増加するゲートベースの誤りモデルを検討する。この場合、シングルショットのしきい値挙動は見られず、代わりに、固定された誤り率に対して符号族が最適な符号サイズを持つことがわかる。この誤りモデルでは、複数回の測定を行う従来の検査演算子のほうが論理誤り率が低くなる。

For quantum error correction codes, the required number of measurement rounds typically increases with the code distance when measurements are faulty. Single-shot error correction allows for an error threshold with only one round of noisy syndrome measurements regardless of the code size. Here we implement single-shot check operators for toric codes. The single-shot checks are constructed by Gaussian elimination following Campbell [Campbell, 2019]. The single-shot check operators result in a sustainable threshold at 5.62% for an error model with noisy measurements, outperforming the conventional toric code check operators with multiple rounds of noisy measurement. The cost of the transformation is non-local high-weight stabilizer generators. We then consider a gate-based error model that leads to increased measurement error with stabilizer weight. Here we find no single-shot threshold behavior and instead find the code family will have an optimal code size for a fixed error rate. For this error model, the conventional check operators with multiple measurements yield a lower logical error rate.

翻訳日:2023-10-26 18:06:32 公開日:2023-10-24

# 固定イジング結合を有する超伝導または半導体スピン量子ビット配列に対するロバスト形状パルス

Robust shaped pulses for arrays of superconducting or semiconductor spin qubits with fixed Ising coupling ( http://arxiv.org/abs/2310.16159v1 )

ライセンス: Link先を確認

David W. Kanaar and J. P. Kestner

(参考訳) 固体量子コンピューティングにおける現在の大きな課題は、量子ビット配列をより多くの量子ビットへスケールさせることである。これは、配列内の独立に調整可能な多数の量子ビット間結合に対する制御配線の複雑さによって妨げられている。問題を単純化する1つのアプローチは、固定イジング($ZZ$)相互作用を持つ量子ビット配列を使用することである。そのような系で量子ビットの特定の部分集合を同時に駆動すると、ダイナミクスは互いに可換な$\mathfrak{su}$(2)部分代数の集合に制限される。これらの$\mathfrak{su}$(2)の中で、$X$ゲートと$\frac{\pi}{2}$の$ZZ$回転を、トランスモン量子ビットにおける主な誤り源であるリーク、あるいは磁束量子ビットや半導体スピン量子ビットにおける忠実度低下の主な原因である結合ゆらぎのいずれかに対してロバストに実行する方法を説明する。これらのゲートは、仮想$z$ゲートとあわせて量子コンピューティングのためのユニバーサルゲートセットを形成する。既存のすべての超伝導量子ビット配列および半導体スピン量子ビット配列を構成する2辺、3辺、4辺の頂点に対して、このロバストなゲートセットを構築する。

A major current challenge in solid-state quantum computing is to scale qubit arrays to a larger number of qubits. This is hampered by the complexity of the control wiring for the large number of independently tunable interqubit couplings within these arrays. One approach to simplifying the problem is to use a qubit array with fixed Ising ($ZZ$) interactions. When simultaneously driving a specific subset of qubits in such a system, the dynamics are confined to a set of commuting $\mathfrak{su}$(2) subalgebras. Within these $\mathfrak{su}$(2)s we describe how to perform $X$-gates and $\frac{\pi}{2}$ $ZZ$ rotations robustly against either leakage, which is the main source of error in transmon qubits, or coupling fluctuations, which is the main source of infidelity in flux or semiconductor spin qubits. These gates together with virtual-$z$ gates form a universal set of gates for quantum computing. We construct this set of robust gates for two-edge, three-edge, and four-edge vertices, which compose all existing superconducting qubit and semiconductor spin qubit arrays.

翻訳日:2023-10-26 18:06:17 公開日:2023-10-24

# GPU組み込みシステムのパフォーマンスチューニング:マシンラーニングと解析モデル駆動チューニング手法

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies ( http://arxiv.org/abs/2310.16214v1 )

ライセンス: Link先を確認

Adrian Perez Dieguez, Margarita Amor Lopez

(参考訳) GPU組み込みシステムは、電力効率の高さから様々な分野で普及している。しかし、これらのシステム上で動作するリアルタイムアプリケーションや時間のかかるアプリケーションの要求を満たすには、高い性能を発揮するようにチューニングすることが不可欠である。本稿では,GPU組み込みシステム上で2つのチューニング手法を開発・比較することでこの課題に取り組むとともに,これらのアーキテクチャ上で動作するアプリケーションの最適化を目指す開発者や研究者に性能面の知見を提供する。我々は、多くのアプリケーションで性能上重要な構成要素である、FFT、スキャンプリミティブ、三重対角系ソルバなどの並列プレフィックス演算に焦点を当てる。本研究では,解析モデル駆動型のチューニング手法と,機械学習(ML)に基づくチューニング手法を導入する。 NVIDIA Jetsonシステム上で、BPLGライブラリのさまざまな並列プレフィックス実装に対する2つのチューニング手法の性能を評価し、網羅的探索で得られた性能と比較する。得られた知見は、サーバと組み込みデバイスの間での主要な計算パターンの性能可搬性という未解決の課題に対処するための最良の戦略を明らかにし、オフラインおよびオンラインチューニングの実践的な指針を与える。また,BPLGの性能をCUSPARSE,CUB,CUFFTなどの最先端ライブラリと比較することで,GPU組み込みシステムにおける並列計算パターンの性能研究に存在するギャップにも対処する。

GPU-embedded systems have gained popularity across various domains due to their efficient power consumption. However, in order to meet the demands of real-time or time-consuming applications running on these systems, it is crucial for them to be tuned to exhibit high performance. This paper addresses the issue by developing and comparing two tuning methodologies on GPU-embedded systems, and also provides performance insights for developers and researchers seeking to optimize applications running on these architectures. We focus on parallel prefix operations, such as FFT, scan primitives, and tridiagonal system solvers, which are performance-critical components in many applications. The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system, and compare their performance to the ones achieved through an exhaustive search. The findings shed light on the best strategies for handling the open challenge of performance portability for major computational patterns among server and embedded devices, providing practical guidance for offline and online tuning. We also address the existing gap in performance studies for parallel computational patterns in GPU-embedded systems by comparing the BPLG performance against other state-of-the-art libraries, including CUSPARSE, CUB, and CUFFT.
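For readers unfamiliar with scan primitives, the following sketch simulates a step-doubling (Hillis-Steele style) inclusive scan sequentially. It is a textbook illustration, not the BPLG implementation or either tuning methodology; `inclusive_scan` is a hypothetical name, and the inner loop stands in for threads that would run in parallel on a GPU.

```python
def inclusive_scan(values, op=lambda a, b: a + b):
    """Inclusive prefix scan using log2(n) doubling steps."""
    out = list(values)
    offset = 1
    while offset < len(out):
        previous = list(out)  # every "thread" reads the previous step's values
        for i in range(offset, len(out)):
            out[i] = op(previous[i - offset], previous[i])
        offset *= 2
    return out

prefix_sums = inclusive_scan([1, 2, 3, 4, 5])
running_max = inclusive_scan([3, 1, 4, 1, 5], max)
```

The number of doubling steps grows logarithmically with the input length, which is the property tuned implementations exploit when mapping the pattern onto GPU threads; any associative operator can be passed as `op`.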

翻訳日:2023-10-26 18:00:26 公開日:2023-10-24

# シャドウセンス:RGB熱ドローン画像からのシャドウ非依存樹冠検出のための教師なしドメイン適応と特徴融合

ShadowSense: Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic Tree Crown Detection from RGB-Thermal Drone Imagery ( http://arxiv.org/abs/2310.16212v1 )

ライセンス: Link先を確認

Rudraksh Kapil, Seyed Mojtaba Marvasti-Zadeh, Nadir Erbilgin, Nilanjan Ray

(参考訳) リモートセンシングデータから個々の樹冠を正確に検出することは、森林キャノピーの密集した性質と、キャノピーの重なり、オクルージョン、照明条件の変化といった多様な環境変動の存在により、大きな課題となっている。さらに、ロバストなモデルを訓練するためのデータの不足も、複雑な森林条件を効果的に研究するうえでの制約となる。本稿では,影に覆われた樹冠を検出する新しい手法を提案し,照明不変な検出に関する今後の研究を促進するため,約5万対のRGB熱画像からなる難易度の高いデータセットを提供する。提案手法(ShadowSense)は完全に自己教師型であり,特徴抽出にはソースドメインのアノテーションを用いないドメイン敵対的学習を,特徴ピラミッドネットワークには前景特徴アライメントを活用し,可視の前景領域に注目することでドメイン不変表現を適応させる。そして、両モダリティの相補的な情報を融合し、RGBで訓練された検出器の予測を効果的に改善して、全体的な精度を高める。広範な実験により、提案手法が、ベースラインのRGB訓練検出器と、教師なしドメイン適応や早期画像融合に依存する最先端技術の両方よりも優れていることが示された。コードとデータは https://github.com/rudrakshkapil/ShadowSense で利用可能である。

Accurate detection of individual tree crowns from remote sensing data poses a significant challenge due to the dense nature of forest canopy and the presence of diverse environmental variations, e.g., overlapping canopies, occlusions, and varying lighting conditions. Additionally, the lack of data for training robust models adds another limitation in effectively studying complex forest conditions. This paper presents a novel method for detecting shadowed tree crowns and provides a challenging dataset comprising roughly 50k paired RGB-thermal images to facilitate future research for illumination-invariant detection. The proposed method (ShadowSense) is entirely self-supervised, leveraging domain adversarial training without source domain annotations for feature extraction and foreground feature alignment for feature pyramid networks to adapt domain-invariant representations by focusing on visible foreground regions, respectively. It then fuses complementary information of both modalities to effectively improve upon the predictions of an RGB-trained detector and boost the overall accuracy. Extensive experiments demonstrate the superiority of the proposed method over both the baseline RGB-trained detector and state-of-the-art techniques that rely on unsupervised domain adaptation or early image fusion. Our code and data are available: https://github.com/rudrakshkapil/ShadowSense

翻訳日:2023-10-26 18:00:03 公開日:2023-10-24

# 深層学習による衛星ハイパースペクトル画像の海・陸・雲セグメンテーション

Sea-Land-Cloud Segmentation in Satellite Hyperspectral Imagery by Deep Learning ( http://arxiv.org/abs/2310.16210v1 )

ライセンス: Link先を確認

Jon Alvarez Justo, Joseph Landon Garrett, Mariana-Iuliana Georgescu, Jesus Gonzalez-Llorente, Radu Tudor Ionescu, Tor Arne Johansen

(参考訳) 衛星では、エッジ推論を通じてプラットフォームの自律性を高めるために、オンボード人工知能(AI)技術の採用が進んでいる。この文脈において,ハイパースペクトル(HS)衛星画像のセグメンテーションに深層学習(DL)技術を利用することは,リモートセンシング応用に利点をもたらす。そこで本研究では,海洋(海)、陸域(陸)、雲の分類に焦点を当てた、HS画像のオンボード多クラスセグメンテーションに関連すると考えられる16種類のモデルを訓練し、そのコードを公開する。海・陸・雲セグメンテーションの実例としてHYPSO-1ミッションを採用し,セグメントの有用性を示すために,新しい海・陸・雲ランキングの応用シナリオを導入する。本システムは、セグメント化された画像から得られる海、陸、雲の被覆率に基づいて、HS画像のダウンリンクの優先順位を決める。性能、パラメータ数、推論時間を考慮して、軌道上への配備を想定したモデルの比較評価を行う。モデルには浅いモデルと深いモデルの両方が含まれる。4つの新しいDLモデルを提案したうえで、単一のスペクトルシグネチャ(1D)をセグメント化する方式が、スペクトル(1D)と空間(2D)の両方の文脈を含む3Dデータ処理を上回ることを示す。 1D-Justo-LiuNetと呼ぶ我々の軽量DLモデルは,U-Netやその派生モデルなどの海・陸・雲セグメンテーションの最先端モデルを,性能(精度0.93)とパラメータ数(4,563)の点で一貫して上回ると結論づける。しかし、テストした処理アーキテクチャでは、1Dモデルは推論時間が長く(15秒)、これは明らかに最適ではない。最後に、軌道上の画像セグメンテーションは生データに対してではなくL1b放射輝度校正の後に行うべきであることを示したうえで、スペクトルチャネルを3つまで減らすと、セグメンテーション性能の低下と引き換えに、モデルのパラメータ数と推論時間が減少することも示す。

Satellites are increasingly adopting on-board Artificial Intelligence (AI) techniques to enhance platforms' autonomy through edge inference. In this context, the utilization of deep learning (DL) techniques for segmentation in HS satellite imagery offers advantages for remote sensing applications, and therefore, we train 16 different models, whose codes are made available through our study, which we consider to be relevant for on-board multi-class segmentation of HS imagery, focusing on classifying oceanic (sea), terrestrial (land), and cloud formations. We employ the HYPSO-1 mission as an illustrative case for sea-land-cloud segmentation, and to demonstrate the utility of the segments, we introduce a novel sea-land-cloud ranking application scenario. Our system prioritizes HS image downlink based on sea, land, and cloud coverage levels from the segmented images. We comparatively evaluate the models for in-orbit deployment, considering performance, parameter count, and inference time. The models include both shallow and deep models, and after we propose four new DL models, we demonstrate that segmenting single spectral signatures (1D) outperforms 3D data processing comprising both spectral (1D) and spatial (2D) contexts. We conclude that our lightweight DL model, called 1D-Justo-LiuNet, consistently surpasses state-of-the-art models for sea-land-cloud segmentation, such as U-Net and its variations, in terms of performance (0.93 accuracy) and parameter count (4,563). However, the 1D models present longer inference time (15s) in the tested processing architecture, which is clearly suboptimal. Finally, after demonstrating that in-orbit image segmentation should occur post L1b radiance calibration rather than on raw data, we additionally show that reducing spectral channels down to 3 lowers models' parameters and inference time, at the cost of weaker segmentation performance.
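The downlink ranking idea can be sketched from segmented label maps alone. The ranking rule used here (least cloud coverage first) is an assumption, since the abstract only says that priority depends on sea, land, and cloud coverage levels; `coverage`, `rank_for_downlink`, and the tiny label maps are hypothetical.

```python
from collections import Counter

CLASSES = ("sea", "land", "cloud")

def coverage(label_map):
    """Fraction of pixels assigned to each class in a 2D segmentation map."""
    pixels = [label for row in label_map for label in row]
    counts = Counter(pixels)
    return {cls: counts.get(cls, 0) / len(pixels) for cls in CLASSES}

def rank_for_downlink(segmented_images):
    """Order image names so that the least cloudy capture is downlinked first."""
    return sorted(segmented_images,
                  key=lambda name: coverage(segmented_images[name])["cloud"])

# Hypothetical 2x2 segmentation outputs for three captures.
segmented_images = {
    "capture_a": [["cloud", "cloud"], ["sea", "land"]],
    "capture_b": [["sea", "sea"], ["land", "sea"]],
    "capture_c": [["cloud", "cloud"], ["cloud", "sea"]],
}
priority = rank_for_downlink(segmented_images)
```

A different mission objective, such as studying cloud formations, would simply change the sort key.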

翻訳日:2023-10-26 17:59:35 公開日:2023-10-24

# ELMリッジ回帰ブースティング

ELM Ridge Regression Boosting ( http://arxiv.org/abs/2310.16209v1 )

ライセンス: Link先を確認

M. Andrecut

(参考訳) ELM(Extreme Learning Machine)への応用を念頭に、リッジ回帰(RR)法のためのブースティング手法について論じ、提案手法がELMの分類性能とロバスト性を大幅に向上させることを示す。

We discuss a boosting approach for the Ridge Regression (RR) method, with applications to the Extreme Learning Machine (ELM), and we show that the proposed method significantly improves the classification performance and robustness of ELMs.
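For context, the standard (non-boosted) ELM with ridge regression can be written as follows. This is the textbook formulation, given here as background; the symbols $X$, $W$, $b$, $g$, $H$, $T$, $\lambda$, and $\hat{\beta}$ are notation assumed for this note, and the boosting procedure itself is not reproduced because the abstract does not describe it.

$$
H = g(XW + b), \qquad \hat{\beta} = \left(H^{\top} H + \lambda I\right)^{-1} H^{\top} T
$$

Here $X$ holds the inputs, $W$ and $b$ are random hidden-layer weights and biases that stay fixed, $g$ is the activation function, $T$ holds the targets, and $\lambda > 0$ is the ridge penalty. Only the output weights $\hat{\beta}$ are fitted, which is why training reduces to one regularized least-squares solve.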

翻訳日:2023-10-26 17:58:56 公開日:2023-10-24

# イベントタイムラインの背景要約

Background Summarization of Event Timelines ( http://arxiv.org/abs/2310.16197v1 )

ライセンス: Link先を確認

Adithya Pratapa, Kevin Small, Markus Dreyer

(参考訳) ニュースイベントの簡潔な要約を生成することは、難しい自然言語処理タスクである。ジャーナリストは重要なサブイベントを示すためにタイムラインをまとめることが多いが、あるニュースイベントを新たに追い始めた読者は、その歴史的経緯を把握することに苦労する。本稿では、各タイムライン更新を、関連する先行イベントの背景要約で補完する、背景ニュース要約というタスクを導入することで、このニーズに対処する。既存のタイムラインデータセットを統合し、人間のアノテータに各ニュースイベントの各タイムステップについて背景要約を書いてもらうことで、データセットを構築する。最先端の要約システムを用いて強力なベースライン性能を確立し,背景要約を生成するクエリ指向の変種を提案する。背景要約の品質を評価するため,質問応答に基づく評価指標であるBackground Utility Score(BUS)を提案する。BUSは、現在のイベントのタイムステップに関する質問のうち、背景要約が答えられるものの割合を測定する。実験により、GPT-3.5によるゼロショットの高い性能に加えて、Flan-T5などの命令ファインチューニングされたシステムの有効性が示された。

Generating concise summaries of news events is a challenging natural language processing task. While journalists often curate timelines to highlight key sub-events, newcomers to a news event face challenges in catching up on its historical context. In this paper, we address this need by introducing the task of background news summarization, which complements each timeline update with a background summary of relevant preceding events. We construct a dataset by merging existing timeline datasets and asking human annotators to write a background summary for each timestep of each news event. We establish strong baseline performance using state-of-the-art summarization systems and propose a query-focused variant to generate background summaries. To evaluate background summary quality, we present a question-answering-based evaluation metric, Background Utility Score (BUS), which measures the percentage of questions about a current event timestep that a background summary answers. Our experiments show the effectiveness of instruction fine-tuned systems such as Flan-T5, in addition to strong zero-shot performance using GPT-3.5.
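The arithmetic behind a percentage-of-questions-answered metric like BUS can be sketched as below. The answerability check is the hard part and is only stubbed here with a keyword match; `background_utility_score`, `keyword_can_answer`, and the example strings are hypothetical, and the paper's actual question generation and answering pipeline is not reproduced.

```python
def background_utility_score(questions, background, can_answer):
    """Percentage of questions that the background summary is judged to answer."""
    if not questions:
        return 0.0
    answered = sum(1 for question in questions if can_answer(question, background))
    return 100.0 * answered / len(questions)

def keyword_can_answer(question_keyword, background):
    """Toy stand-in for a QA system: the keyword must appear in the background."""
    return question_keyword.lower() in background.lower()

background = "A ceasefire was signed shortly after the election."
question_keywords = ["ceasefire", "election", "earthquake", "embargo"]
score = background_utility_score(question_keywords, background, keyword_can_answer)
```

In this toy case two of the four question keywords are covered by the background text, so the score is 50.0.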

翻訳日:2023-10-26 17:58:49 公開日:2023-10-24

# 単純な決定論的オートエンコーダによる低ランク潜在空間の学習:理論的および経験的考察

Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights ( http://arxiv.org/abs/2310.16194v1 )

ライセンス: Link先を確認

Alokendu Mazumder, Tirthajit Baruah, Bhartendu Kumar, Rishab Sharma, Vishwajeet Pattanaik, Punit Rathore

(参考訳) autoencoderは教師なしの学習パラダイムであり、再構成損失を最小限にすることでデータのコンパクトな潜在表現を作ることを目的としている。しかし、ほとんどのデータ(画像)が低次元空間に埋め込まれているという事実は見過ごされがちであり、効果的なデータ表現には不可欠である。この制限に対処するため,Low-Rank Autoencoder (LoRAE) と呼ばれる新しい手法を提案する。 LoRAEでは,低次元潜在空間を適応的に再構成し,オートエンコーダの基本目的を保ちながら低ランク正規化器を組み込んだ。これは重要な情報を保存しながら、データを低次元空間に埋め込むのに役立つ。低ランク潜在空間を学習する単純なオートエンコーダ拡張である。理論的には、モデルに対してより厳密なエラー境界を確立する。経験的に、我々のモデルの優越性は画像生成や下流分類といった様々なタスクを通して輝いています。理論的および実践的な結果は、低次元埋め込みを取得することの重要性を強調している。

The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (images) are embedded in a lower-dimensional space, which is crucial for effective data representation. To address this limitation, we propose a novel approach called Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporate a low-rank regularizer to adaptively reconstruct a low-dimensional latent space while preserving the basic objective of an autoencoder. This helps embed the data in a lower-dimensional space while preserving important information. It is a simple autoencoder extension that learns a low-rank latent space. Theoretically, we establish a tighter error bound for our model. Empirically, our model's superiority shines through various tasks such as image generation and downstream classification. Both theoretical and practical outcomes highlight the importance of acquiring low-dimensional embeddings.

翻訳日:2023-10-26 17:58:32 公開日:2023-10-24

# 長さは文書レベルの意味論にとって呪いであり祝福でもある

Length is a Curse and a Blessing for Document-level Semantics ( http://arxiv.org/abs/2310.16193v1 )

ライセンス: Link先を確認

Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed

(参考訳) 近年、コントラスト学習(cl)は、事前学習された言語モデルから文と文書レベルのエンコーディング能力を回復するために広く利用されている。本研究では,CLモデルの長さ一般化可能性,すなわち,長さ誘起セマンティックシフトに対する脆弱性について考察する。我々は、その長さの脆弱性が重要で見過ごされている研究のギャップであるだけでなく、文書の長さによって提供される意味的信号のみに応じて教師なしのclメソッドを考案することができることを検証した。まず,文書の伸長がCLによってもたらされた文書内類似度を高めることを示し,文書の長さ攻撃の基礎となる理論的基礎を導出する。さらに,clが約束する等方性は,学習中に露呈するテキストの長さ範囲に大きく依存することがわかった。これらの知見に触発されて、単純で普遍的な文書表現学習フレームワークla(ser)$^{3}$: 意味論的にロバストな文表現学習のための長さ非依存の自己参照を導入し、標準情報検索ベンチマークで最先端の教師なしパフォーマンスを実現する。

In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but also that we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.

翻訳日:2023-10-26 17:58:17 公開日:2023-10-24

# スパース観測と時変センサを用いた高効率深部データ同化

Efficient deep data assimilation with sparse observations and time-varying sensors ( http://arxiv.org/abs/2310.16187v1 )

ライセンス: Link先を確認

Sibo Cheng, Che Liu, Yike Guo, Rossella Arcucci

(参考訳) 変分データ同化(DA)は、複数のノイズデータソースの重み付けをすることで、現場復元と予測の工学的問題に広く用いられている。近年,DAにおけるディープラーニング(DL)技術の統合は,高次元力学系における効率と精度の向上を約束している。それにもかかわらず、既存の深部DAアプローチは、特に時間とともにセンサーの配置と数が動的である場合、非構造化観測データを扱うのに困難に直面している。本稿では,dl逆演算子を同化目的関数に組み込んだ変分データ同化のためのvoronoi-tessellation inverse operator(vivid)という新しい変分daスキームを導入する。 voronoi-tessellationとconvolutional neural networksの能力を活用することで、vividは、スパース、非構造化、時間変化のセンサーデータの処理に長けている。さらに、DL逆演算子の組み入れにより、観測と状態空間の直接リンクが確立され、DAに必要な最小化ステップの数が減少する。さらに、 vivid は適切な直交分解 (pod) とシームレスに統合でき、エンドツーエンドの還元順序 da スキームを開発することができる。流体力学系における数値実験により、VIVIDは既存のDAおよびDLアルゴリズムを大幅に上回ることを示す。 VIVIDのロバスト性は、様々なレベルの事前エラー、様々なセンサーの利用、DAにおける誤り共分散の誤特定などを通じてもアクセス可能である。

Variational Data Assimilation (DA) has been broadly used in engineering problems for field reconstruction and prediction by performing a weighted combination of multiple sources of noisy data. In recent years, the integration of deep learning (DL) techniques in DA has shown promise in improving the efficiency and accuracy in high-dimensional dynamical systems. Nevertheless, existing deep DA approaches face difficulties in dealing with unstructured observation data, especially when the placement and number of sensors are dynamic over time. We introduce a novel variational DA scheme, named Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), that incorporates a DL inverse operator into the assimilation objective function. By leveraging the capabilities of the Voronoi-tessellation and convolutional neural networks, VIVID is adept at handling sparse, unstructured, and time-varying sensor data. Furthermore, the incorporation of the DL inverse operator establishes a direct link between observation and state space, leading to a reduction in the number of minimization steps required for DA. Additionally, VIVID can be seamlessly integrated with Proper Orthogonal Decomposition (POD) to develop an end-to-end reduced-order DA scheme, which can further expedite field reconstruction. Numerical experiments in a fluid dynamics system demonstrate that VIVID can significantly outperform existing DA and DL algorithms. The robustness of VIVID is also assessed through the application of various levels of prior error, the utilization of varying numbers of sensors, and the misspecification of error covariance in DA.
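To illustrate how a Voronoi tessellation can turn sparse, irregularly placed sensor readings into a regular grid that a convolutional network could consume, here is a minimal sketch. It assumes a nearest-sensor fill on a 2D grid; the function name and the `(row, col, value)` sensor layout are hypothetical, and this is not the VIVID implementation.

```python
from typing import List, Tuple

Sensor = Tuple[int, int, float]  # (row, col, measured value); layout is an assumption


def voronoi_fill(height: int, width: int, sensors: List[Sensor]) -> List[List[float]]:
    """Assign every grid cell the value of its nearest sensor (its Voronoi cell)."""
    if not sensors:
        raise ValueError("at least one sensor is required")
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            # Squared Euclidean distance is enough to rank sensors by proximity.
            nearest = min(sensors, key=lambda s: (s[0] - r) ** 2 + (s[1] - c) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid


field = voronoi_fill(2, 4, [(0, 0, 1.0), (0, 3, 5.0)])
```

Because the output shape depends only on the grid, the number and placement of sensors can change between time steps without changing the input size seen by a downstream network.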

翻訳日:2023-10-26 17:57:55 公開日:2023-10-24

# 粉末X線回折像に対するU-Netアーキテクチャを用いた画像分割

Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images ( http://arxiv.org/abs/2310.16186v1 )

ライセンス: Link先を確認

Howard Yanxon, Eric Roberts, Hannah Parraga, James Weng, Wenqian Xu, Uta Ruett, Alexander Hexemer, Petrus Zwart, Nickolas Schwarz

(参考訳) 科学研究者は、高エネルギー粉末X線回折(XRD)技術を用いて、充電可能な電池材料などの機能デバイスにおける材料の結晶構造を調べる。実験XRD画像中のアーティファクトを識別する手法を提案する。提案手法では,チューニング可能なu-netなど,ディープラーニング畳み込みニューラルネットワークアーキテクチャを用いてアーチファクトを識別する。特に、予測されたアーティファクトは、全体正の正の率またはリコールを用いて、対応する基底真理(手動で実装)に対して評価される。その結果、u-netはトレーニングに含まれないテストデータセット上で92.4%という高いリコール性能を実現でき、従来の方法と比較して平均的な偽陽性率を34%削減できた。 U-Netsはまた、アーティファクトの識別と分離に要する時間を50%以上削減している。さらに, アーティファクトの排除は, 統合された1次元XRDパターンに大きな変化を示し, 後処理のXRDデータのさらなる解析を促進する。

Scientific researchers frequently use the in situ synchrotron high-energy powder X-ray diffraction (XRD) technique to examine the crystallographic structures of materials in functional devices such as rechargeable battery materials. We propose a method for identifying artifacts in experimental XRD images. The proposed method uses deep learning convolutional neural network architectures, such as tunable U-Nets to identify the artifacts. In particular, the predicted artifacts are evaluated against the corresponding ground truth (manually implemented) using the overall true positive rate or recall. The result demonstrates that the U-Nets can consistently produce great recall performance at 92.4% on the test dataset, which is not included in the training, with a 34% reduction in average false positives in comparison to the conventional method. The U-Nets also reduce the time required to identify and separate artifacts by more than 50%. Furthermore, the exclusion of the artifacts shows major changes in the integrated 1D XRD pattern, enhancing further analysis of the post-processing XRD data.
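The evaluation above compares predicted artifact masks with manually labelled ground truth using recall (true positive rate) and false positives. A minimal sketch of those two quantities for binary masks, assuming masks are nested lists of 0/1 values (the function name is hypothetical):

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 marks an artifact pixel, 0 marks a clean pixel


def recall_and_false_positives(pred: Mask, truth: Mask) -> Tuple[float, int]:
    """Return (recall, number of false-positive pixels) for two binary masks."""
    tp = fn = fp = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if t and p:
                tp += 1
            elif t and not p:
                fn += 1
            elif p and not t:
                fp += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp


truth_mask = [[1, 1, 0], [0, 1, 1]]
pred_mask = [[1, 0, 0], [1, 1, 1]]
recall, false_positives = recall_and_false_positives(pred_mask, truth_mask)
```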

翻訳日:2023-10-26 17:57:25 公開日:2023-10-24

# BLP 2023 タスク2:感情分析

BLP 2023 Task 2: Sentiment Analysis ( http://arxiv.org/abs/2310.16183v1 )

ライセンス: Link先を確認

Md. Arid Hasan, Firoj Alam, Anika Anjum, Shudipta Das, Afiyat Anjum

(参考訳) EMNLP 2023と共同で,第1回BLP 2023ワークショップの一環として編成されたBLP知覚共有タスクの概要を紹介する。このタスクは、ソーシャルメディアのテキスト中の感情の検出として定義されます。このタスクには71人の参加者が参加し、29チームと30チームがそれぞれ開発フェーズと評価フェーズにシステムを提出した。参加者は合計597人となった。しかし、合計15チームがシステム記述書を提出した。提出されたシステムにおけるアプローチの範囲は、古典的な機械学習モデル、微調整された事前訓練モデル、ゼロショットと少数ショットの設定でLarge Language Model(LLM)を活用することまで様々である。本稿では,データセット開発と評価設定を含むタスク設定の詳細な説明を行う。また,参加者が提出したシステムの概要についても概説する。共有タスクからのすべてのデータセットと評価スクリプトが研究コミュニティ向けに公開され、この領域におけるさらなる研究が進められている。

We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop, co-located with EMNLP 2023. The task is defined as the detection of sentiment in a given piece of social media text. This task attracted interest from 71 participants, among whom 29 and 30 teams submitted systems during the development and evaluation phases, respectively. In total, participants submitted 597 runs. However, a total of 15 teams submitted system description papers. The range of approaches in the submitted systems spans from classical machine learning models, fine-tuning pre-trained models, to leveraging Large Language Models (LLMs) in zero- and few-shot settings. In this paper, we provide a detailed account of the task setup, including dataset development and evaluation setup. Additionally, we provide a brief overview of the systems submitted by the participants. All datasets and evaluation scripts from the shared task have been made publicly available for the research community, to foster further research in this domain.

翻訳日:2023-10-26 17:57:08 公開日:2023-10-24

# 事前学習型言語モデルの改良と解釈のための混合言語訓練適応器

Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models ( http://arxiv.org/abs/2310.16240v1 )

ライセンス: Link先を確認

Raymond Li, Gabriel Murray and Giuseppe Carenini

(参考訳) 本研究では,パラメータ効率のよい微調整(PEFT)設定において,言語構造を事前学習言語モデルに注入することで,2つの人気のある研究領域を組み合わせる手法を提案する。このアプローチでは、異なる言語構造をエンコードする並列アダプタモジュールを、gumbel-softmaxゲートを使用してモデルの各層におけるこれらのモジュールの重要性を判断する、新しい混合言語専門家アーキテクチャを用いて結合する。パラメータの数を減らすために、まず、その重要度に基づいて専門家を刈り取る前に、一定数のステップでモデルをトレーニングします。実験の結果,3種類の事前学習モデルによる実験結果から,本手法はパラメータ数に比較して,最先端のPEFT法より優れていることが示された。さらに,各モデルで選択した専門家を各層で分析し,今後の研究に対する洞察を提供する。

In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experiment results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we provide additional analysis to examine the experts selected by each model at each layer to provide insights for future studies.
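The Gumbel-Softmax gating mentioned above can be sketched in a few lines. This is a generic illustration of the Gumbel-Softmax relaxation used to weight parallel modules, not the authors' code; the function names, the temperature value, and the toy adapter outputs are assumptions.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(logits: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Draw relaxed one-hot gate weights: softmax((logits + Gumbel noise) / tau)."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)       # keep u strictly positive for the logs
        gumbel = -math.log(-math.log(u))   # standard Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(gates: List[float], outputs: List[List[float]]) -> List[float]:
    """Gate-weighted sum of the parallel adapter outputs (one vector per adapter)."""
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]


gates = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
mixed = mix_adapters(gates, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Lower temperatures push the gate weights toward a one-hot choice, which is one reason such gates are convenient when the least important experts are later pruned.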

翻訳日:2023-10-26 17:49:59 公開日:2023-10-24

# 教師なし画像セグメンテーションのための画素レベルクラスタリングネットワーク

Pixel-Level Clustering Network for Unsupervised Image Segmentation ( http://arxiv.org/abs/2310.16234v1 )

ライセンス: Link先を確認

Cuong Manh Hoang and Byeongkeun Kang

(参考訳) 画像分割は、自動運転、把持、ロボットナビゲーションなどの様々なコンピュータビジョンアプリケーションにおいて不可欠であるが、トレーニングのためにピクセルレベルですべてのオブジェクトに注釈を付けることはほぼ不可能である。したがって、教師なし画像分割法の研究は不可欠である。本稿では,画像の領域分割のためのピクセルレベルのクラスタリングフレームワークを提案する。提案フレームワークは、注意機構を備えた機能埋め込みモジュール、特徴統計計算モジュール、画像再構成、および高精度な教師なしセグメンテーションを実現するスーパーピクセルセグメンテーションを含む。さらに,各スーパーピクセル間の一貫性,隣接スーパーピクセル間の相似/相似性,画像間の構造相似性を利用したトレーニング戦略を提案する。また,スーパーピクセルによる損失による過大セグメント化を回避するため,ポストプロセッシング手法を提案する。さらに,教師なしセマンティックセグメンテーションのための提案手法の拡張を提案する。提案フレームワークの有効性を実証するために,3つの公開データセット(berkeley segmentation dataset,pascal voc 2012 dataset,coco-stuff dataset)について実験を行った。実験の結果,提案手法は従来の最先端手法よりも優れていた。

While image segmentation is crucial in various computer vision applications, such as autonomous driving, grasping, and robot navigation, annotating all objects at the pixel-level for training is nearly impossible. Therefore, the study of unsupervised image segmentation methods is essential. In this paper, we present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations. The proposed framework includes feature embedding modules with an attention mechanism, a feature statistics computing module, image reconstruction, and superpixel segmentation to achieve accurate unsupervised segmentation. Additionally, we propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images. To avoid potential over-segmentation caused by superpixel-based losses, we also propose a post-processing method. Furthermore, we present an extension of the proposed method for unsupervised semantic segmentation. We conducted experiments on three publicly available datasets (Berkeley segmentation dataset, PASCAL VOC 2012 dataset, and COCO-Stuff dataset) to demonstrate the effectiveness of the proposed framework. The experimental results show that the proposed framework outperforms previous state-of-the-art methods.

翻訳日:2023-10-26 17:49:43 公開日:2023-10-24

# 時系列予測のための注意に基づくアンサンブルプール

Attention-Based Ensemble Pooling for Time Series Forecasting ( http://arxiv.org/abs/2310.16231v1 )

ライセンス: Link先を確認

Dhruvit Patel and Alexander Wikner

(参考訳) 時系列予測におけるモデルバイアスを低減する一般的な手法は、予測モデルのアンサンブルを使用して、その出力をアンサンブル予測にまとめることである。しかし、各予測モデルが異なるバイアスを持つ場合、このプーリング中に各モデル予測がどのように評価されるべきかは必ずしも明確ではない。提案手法は,注意に基づくアンサンブルプーリングモデルによって重み付け値が学習される候補モデル予測よりも重み付け平均を行うプーリング手法を提案する。本手法は,非定常Lorenz `63方程式のダイナミクスのマルチステップ予測と,COVID-19による週のインシデント死亡の1ステップ予測という2つの時系列予測問題に対して試行する。当モデルでは,非定常ロレンツ式63を予測した場合に優れた有効時間が得られるが,covid-19週次インシデント死亡を予測した場合,既存のアンサンブルプールよりも良好に動作しないことがわかった。

A common technique to reduce model bias in time-series forecasting is to use an ensemble of predictive models and pool their output into an ensemble forecast. In cases where each predictive model has different biases, however, it is not always clear exactly how each model forecast should be weighed during this pooling. We propose a method for pooling that performs a weighted average over candidate model forecasts, where the weights are learned by an attention-based ensemble pooling model. We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz `63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19. We find that while our model achieves excellent valid times when forecasting the non-stationary Lorenz `63 equation, it does not consistently perform better than the existing ensemble pooling when forecasting COVID-19 weekly incident deaths.
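The pooling step described above is a weighted average over candidate forecasts. A minimal sketch follows, assuming the attention model has already produced one raw score per candidate; how those scores are learned is not shown, and the names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw attention scores into non-negative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_forecasts(forecasts: List[List[float]], scores: List[float]) -> List[float]:
    """Weighted average of candidate forecasts, one weight per candidate model."""
    weights = softmax(scores)
    horizon = len(forecasts[0])
    return [
        sum(w * forecast[t] for w, forecast in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Equal scores reduce to the plain ensemble mean.
pooled = pool_forecasts([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

A large score gap makes the pooled forecast follow a single candidate, so the same function covers both averaging and near-selection.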

翻訳日:2023-10-26 17:49:23 公開日:2023-10-24

# ショートカット学習の基礎について

On the Foundations of Shortcut Learning ( http://arxiv.org/abs/2310.16228v1 )

ライセンス: Link先を確認

Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

(参考訳) ディープラーニングモデルは、データから豊富な特徴を抽出できる。モデルが使用する機能は、予測性だけでなく、確実にトレインセットラベルを示す機能にも依存します。ショートカット学習に関する文献では、例えば、形状上のテクスチャや、前景の物体上の画像背景など、モデルが別の特徴を特権化する例が指摘されている。本稿では,モデルに対してどの入力特性が利用可能かという仮説を検証し,モデルの特徴利用に対する予測性と可用性の相互作用を体系的に検討する。提案手法は,予測性や予測可能性に関連する要因によって異なる2つの潜在的特徴を持つ分類データセットを合成する最小限かつ明示的な生成フレームワークを構築し,コア機能(利用できない,予測しにくい)を犠牲にして,ショートカット機能に対するモデルのショートカットバイアスの過度な信頼性を定量化する。線形モデルは比較的偏りがないが、ReLUまたはTanh単位を持つ単一の隠れ層を導入するとバイアスが生じる。我々の経験的発見は、Neural Tangent Kernelsに基づく理論的考察と一致している。最後に,自然データ集合における予測性と可用性のトレードオフについて検討し,モデルの近距離バイアスを増大させるアベイラビリティ操作を発見する。これらの結果は、モデルがタスクをどう解決するかを形作る役割を考慮し、体系的な研究を保証している深い非線形アーキテクチャの基本的な特徴であることを示す。

Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates train-set labels) but also on availability (how easily the feature can be extracted, or leveraged, from inputs). The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and quantify a model's shortcut bias, that is, its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.

翻訳日:2023-10-26 17:49:04 公開日:2023-10-24

# TiC-CLIP:CLIPモデルの継続的なトレーニング

TiC-CLIP: Continual Training of CLIP Models ( http://arxiv.org/abs/2310.16226v1 )

ライセンス: Link先を確認

Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

(参考訳) 最新のデータで大規模な基盤モデルを最新に保つのは本質的にコストがかかる。絶え間ない再訓練の禁止的なコストを避けるためには、これらのモデルを継続的に訓練することが不可欠である。この問題は、大規模な連続学習ベンチマークやベースラインの欠如によって悪化している。我々は、ビジョン言語モデルをトレーニングするための、WebスケールのTime-Continual(TiC)ベンチマークの最初のセットであるTiC-DataComp、TiC-YFCC、TiC-RedCapsを紹介する。これらは9年間(2014〜2022年)にわたる127億以上のタイムスタンプ付き画像テキストペアを含む。まず、ベンチマークを用いて様々な動的評価を構築し、既存のモデルの時間的堅牢性を測定する。OpenAIのCLIP(2020年までのデータでトレーニングされた)は、OpenCLIPリポジトリのより最近トレーニングされたモデルと比較して、2021年から2022年までのキュレートされた検索タスクにおいて、ゼロショットの精度を$\approx 8\%$失うことを示す。次に、時間連続データに基づいてモデルを効率的にトレーニングする方法を研究する。最後のチェックポイントからトレーニングを継続し、古いデータを再生するシンプルなリハーサルベースのアプローチは、スクラッチから再トレーニングする標準的なプラクティスと比較して、計算コストを$2.5\times$削減することを示す。

Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large-scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-RedCaps with over 12.7B timestamped image-text pairs spanning 9 years (2014--2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses $\approx 8\%$ zero-shot accuracy on our curated retrieval task from 2021--2022 compared with more recently trained models in the OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by $2.5\times$ when compared to the standard practice of retraining from scratch.

翻訳日:2023-10-26 17:48:32 公開日:2023-10-24

# CleanCoNLL: ほとんどノイズのない名前付きエンティティ認識データセット

CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset ( http://arxiv.org/abs/2310.16225v1 )

ライセンス: Link先を確認

Susanna Rücker, Alan Akbik

(参考訳) conll-03コーパスは、名前付きエンティティ認識(ner)のための最もよく知られているベンチマークデータセットである。しかし、以前の研究では、データにかなりの数のアノテーションエラー、不完全性、不整合が見つかった。これは、現在の最先端モデルは、CoNLL-03の推定ノイズレベルに匹敵する、あるいは超えるF1スコアを達成するため、NERアプローチを客観的に比較し、それらのエラーを分析するための課題となる。この問題に対処するために,全ラベルの7.0%を英語のconll-03で訂正する自動一貫性チェックによる包括的relabelingの取り組みを提案する。我々の取り組みは、NERラベルのより良い説明可能性とアノテーション品質のさらなる保護のためにエンティティリンクアノテーションのレイヤを追加します。実験結果から, 最先端の手法がF1スコア(97.1%)をはるかに上回っているだけでなく, アノテーションノイズによる誤りとして誤算された正確な予測のシェアが47%から6%に低下していることがわかった。このことは、我々の資源は最先端モデルによる残差を分析するのに適しており、理論上界は高資源でも粗粒NERに到達していないことを示している。このような分析を容易にするため,研究コミュニティにCleanCoNLLを公開する。

The CoNLL-03 corpus is arguably the most well-known and utilized benchmark dataset for named entity recognition (NER). However, prior works found significant numbers of annotation errors, incompleteness, and inconsistencies in the data. This poses challenges to objectively comparing NER approaches and analyzing their errors, as current state-of-the-art models achieve F1-scores that are comparable to or even exceed the estimated noise level in CoNLL-03. To address this issue, we present a comprehensive relabeling effort assisted by automatic consistency checking that corrects 7.0% of all labels in the English CoNLL-03. Our effort adds a layer of entity linking annotation both for better explainability of NER labels and as an additional safeguard of annotation quality. Our experimental evaluation finds not only that state-of-the-art approaches reach significantly higher F1-scores (97.1%) on our data, but crucially that the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%. This indicates that our resource is well suited to analyze the remaining errors made by state-of-the-art models, and that the theoretical upper bound even on high-resource, coarse-grained NER is not yet reached. To facilitate such analysis, we make CleanCoNLL publicly available to the research community.

翻訳日:2023-10-26 17:48:06 公開日:2023-10-24

# 毒物は痕跡がない:毒物攻撃の完全無依存検出

Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks ( http://arxiv.org/abs/2310.16224v1 )

ライセンス: Link先を確認

Xinglong Chang, Katharina Dost, Gillian Dobbie, Jörg Wicker

(参考訳) 機械学習モデルのパフォーマンスは、基礎となるデータの品質に依存する。悪意のあるアクターは、トレーニングデータを汚染することでモデルを攻撃することができる。現在の検出器は、特定のデータタイプ、モデル、または攻撃と結びついているため、実際のシナリオでの適用性は限られている。本稿では,毒性のあるデータセットの分析にのみ依存する攻撃を検知する新たなフレームワークであるDIVA(Detecting In Visible Attacks)を提案する。 divaは、毒物や清潔なデータに対する分類器の精度を比較して毒物攻撃を検知できるという考えに基づいており、仮説上のクリーンデータセット上で未知の精度を推定するために、複雑度測定を用いてメタリーナーを事前訓練している。このフレームワークは一般的な中毒攻撃に適用できる。評価のために,本稿ではラベルフリップ攻撃に対するDIVAを検証した。

The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack the model by poisoning the training data. Current detectors are tied to either specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, DIVA (Detecting InVisible Attacks), that detects attacks solely relying on analyzing the potentially poisoned data set. DIVA is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data and pre-trains a meta-learner using Complexity Measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks. For evaluation purposes, in this paper, we test DIVA on label-flipping attacks.

翻訳日:2023-10-26 17:47:42 公開日:2023-10-24

# 階層的ランダム化平滑化

Hierarchical Randomized Smoothing ( http://arxiv.org/abs/2310.16221v1 )

ライセンス: Link先を確認

Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, Stephan Günnemann

(参考訳) 実世界のデータは複雑で、しばしば複数のエンティティ(例えば画像はピクセル、グラフは相互接続ノード)に分解できるオブジェクトで構成されている。ランダム化平滑化(randomized smoothing)は、モデルが入力の小さな変更に対して確実に堅牢になるための強力なフレームワークである。しかし、オブジェクト全体(例えば画像)を任意に摂動せず、エンティティのサブセット(例えばピクセル)しか持たない場合、ランダムな平滑化による複雑なデータに対するロバスト性の証明は困難である。ランダムに選択されたエンティティのサブセットにのみランダムノイズを追加することにより、部分的にオブジェクトを平滑化します。従来の手法よりも標的に雑音を加えることで、高い精度を維持しながら強靭性を保証する。異なるノミージング分布を用いて階層的平滑化を初期化し,離散的および連続的領域に対する新しいロバスト性証明を導出する。画像とノードの分類における階層的平滑化の重要性を実験的に実証し,ロバスト性・正確性に優れたトレードオフをもたらすことを示した。全体として、階層的平滑化は、摂動に対して確実に堅牢で正確であるモデルにとって重要な貢献である。

Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs, by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both certifiably robust to perturbations and accurate.
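The sampling side of hierarchical randomized smoothing can be sketched as follows: noise is added only to a randomly selected subset of entities, and the smoothed prediction is the majority vote over many noisy copies. This is a minimal illustration under simplifying assumptions (scalar entities, Gaussian noise, a toy classifier); it does not compute the robustness certificates, and all names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List


def hierarchical_noise(entities: List[float], p: float, sigma: float,
                       rng: random.Random) -> List[float]:
    """Select each entity with probability p and add Gaussian noise only to those."""
    return [x + rng.gauss(0.0, sigma) if rng.random() < p else x for x in entities]


def smoothed_predict(classifier: Callable[[List[float]], int], entities: List[float],
                     p: float, sigma: float, n_samples: int,
                     rng: random.Random) -> int:
    """Majority vote of the base classifier over partially noised copies of the input."""
    votes = Counter(
        classifier(hierarchical_noise(entities, p, sigma, rng))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


def toy_classifier(entities: List[float]) -> int:
    """Stand-in base model (assumption): class 1 if the entity values sum to a positive number."""
    return 1 if sum(entities) > 0 else 0


sample = [2.0] * 10
label = smoothed_predict(toy_classifier, sample, p=0.3, sigma=0.5,
                         n_samples=200, rng=random.Random(0))
```

Setting `p` to 1 recovers ordinary randomized smoothing over every entity, while smaller values leave most of the object untouched in each sample.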

翻訳日:2023-10-26 17:47:28 公開日:2023-10-24

# 大規模言語モデルのための知識編集:調査

Knowledge Editing for Large Language Models: A Survey ( http://arxiv.org/abs/2310.16218v1 )

ライセンス: Link先を確認

Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong L

(参考訳) 大規模言語モデル(LLM)は、その膨大な知識と推論能力に基づいてテキストを理解し、分析し、生成する顕著な能力のために、最近、学術的および産業的景観を変革した。それにもかかわらず、llmsの大きな欠点は、前例のない量のパラメータによる事前トレーニングの計算コストである。事前訓練されたモデルに新しい知識を頻繁に導入する必要がある場合、デメリットは悪化する。したがって、事前訓練されたLLMを更新するための効率的かつ効率的な技術を開発することが不可欠である。従来の手法は、事前訓練されたllmにおける新しい知識を直接微調整によってエンコードする。しかし, 自己学習型LLMは計算集約的であり, モデル更新によらず, 価値ある事前学習知識を劣化させるリスクがある。近年,知識に基づくモデル編集(KME)が注目され,他の無関係な知識に悪影響を及ぼすことなく,特定の知識を組み込むためにLLMを正確に修正することを目指している。本調査では,KME分野の最近の進歩を包括的かつ詳細に概観することを目的としている。まず、異なるKME戦略を包含するKMEの一般的な定式化を導入する。その後,本手法の革新的分類法として,既存のKME戦略を考察し,各カテゴリの手法の重要点,利点,限界を分析した上で,新たな知識の事前学習 LLM への導入方法に基づくKME手法の革新的分類法を提案する。さらに、KMEの代表的な指標、データセット、応用を紹介する。最後に,KMEの実践性と課題の残りについて詳細な分析を行い,今後の発展に向けた今後の研究の方向性を提案する。

Large language models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is their substantial computational cost for pre-training due to their unprecedented amounts of parameters. The disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degenerating valuable pre-trained knowledge irrelevant to the update in the model. Recently, Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge, without negatively influencing other irrelevant knowledge. In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME to encompass different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.

翻訳日:2023-10-26 17:47:04 公開日:2023-10-24

# NaRb分子の複数回転状態に対するマジックトラップ

Magic Traps for Multiple Rotational States of NaRb Molecule ( http://arxiv.org/abs/2310.16215v1 )

ライセンス: Link先を確認

Svetlana Kotochigova, Qingze Guan, Vito Scarola, Brian DeMarco, Bryce Gadway

(参考訳) 分子は振動、回転、スピン軌道、超微細な自由度を持ち、それぞれが外部電磁放射に特異的に反応する。これらの量子状態の重ね合わせに対するコヒーレント制御は分子の操作の鍵となる。例えば、より長い量子シミュレーションが続くほど、コヒーレンス時間が長くなる。レーザー光で分子を制御する上で重要な量は、その複素値の分子動的偏光性である。実際の部分は分子が感じたツイーザー電位を決定するが、想像的な部分はコヒーレンス時間に寄与する。本研究は、電気双極子-forbidden分子遷移に対して、(数十ghzのオーダーで)小さなデチューニングを持つ選択レーザ周波数によって、光学ポテンシャルにおける分子の効率的なトラップを実現することを示唆する。この遷移に近接して、これらの状態間のコヒーレンスを犠牲にすることなく、多重回転状態のトラップ電位を著しく修正することができる。超低温23na87rb極性分子の複数の回転状態に対するマジックトラップ条件が生成できることを実証する。また,スピン分離したマジックトラップは磁場方向に向いた静電場を印加することで実現可能であることを示した。

Molecules have vibrational, rotational, spin-orbit and hyperfine degrees of freedom, each of which responds in a unique fashion to external electromagnetic radiation. The coherent control over superpositions of these quantum states is key to manipulation of molecules. For example, the better the coherence time the longer quantum simulations can last. The important quantity for controlling a molecule with laser light is its complex-valued molecular dynamic polarizability. Its real part determines the tweezer potential as felt by the molecule, while its imaginary part contributes to the coherence time. Our studies show that efficient trapping of a molecule in an optical potential can be achieved by selecting a laser frequency that has a small detuning (on the order of tens of GHz) relative to an electric-dipole-forbidden molecular transition. Close proximity to this transition allows us to significantly modify the trapping potentials for multiple rotational states without sacrificing coherences among these states. We demonstrate that magic trapping conditions for multiple rotational states in ultracold $^{23}$Na$^{87}$Rb polar molecules can be created. In addition, we show that spin-decoupled magic trapping can be achieved with an applied static electric field oriented along the magnetic field direction.

翻訳日:2023-10-26 17:46:37 公開日:2023-10-24

# Speakerly:テキスト合成のための音声ベースのライティングアシスタント

Speakerly: A Voice-based Writing Assistant for Text Composition ( http://arxiv.org/abs/2310.16251v1 )

ライセンス: Link先を確認

Dhruv Kumar, Vipul Raheja, Alice Kaiser-Schatzlein, Robyn Perry, Apurva Joshi, Justin Hugues-Nuger, Samuel Lou, Navid Chowdhury

(参考訳) メールやインスタントメッセージ,ノートなど,さまざまなユースケースにわたるテキスト合成を支援する,リアルタイム音声による文字作成支援システムである speakerly を提案する。ユーザーは指示や指示を通じてシステムと対話でき、システムはよく書式化され、一貫性のある文書を生成する。システムアーキテクチャと,このようなシステムを大規模に構築およびデプロイする上でのさまざまな課題に対する対処方法について詳述する。具体的には,タスク固有モデルと事前学習した言語モデルを組み合わせて,テキスト合成を高速かつ効果的に行うとともに,多様な入力モードをサポートしてユーザビリティを向上させる。

We present Speakerly, a new real-time voice-based writing assistance system that helps users with text composition across various use cases such as emails, instant messages, and notes. The user can interact with the system through instructions or dictation, and the system generates a well-formatted and coherent document. We describe the system architecture and detail how we address the various challenges while building and deploying such a system at scale. More specifically, our system uses a combination of small, task-specific models as well as pre-trained language models for fast and effective text composition while supporting a variety of input modes for better usability.

翻訳日:2023-10-26 17:39:03 公開日:2023-10-24

# グラフ隣接の固有ベクトルに基づく有限要素モデルの問合せのためのクラスタリングツール

A clustering tool for interrogating finite element models based on eigenvectors of graph adjacency ( http://arxiv.org/abs/2310.16249v1 )

ライセンス: Link先を確認

Ramaseshan Kannan

(参考訳) 本稿では,有限要素(fe)シミュレーションモデルにおける誤りをデバッグするための教師なし学習アルゴリズムを紹介し,その生成方法について詳述する。このアルゴリズムは、剛性行列の隣接性の数値的性質を用いてfeモデルにおける自由度を集合する。このアルゴリズムは、商用構造FEスイートOasys GSA(www.oasys-software.com/gsa)の「モデル安定性解析」ツールとしてデプロイされている。実世界のfeモデルのデバッグにエンドユーザがうまく利用し、実際に動作するツールの例を示す。

This note introduces an unsupervised learning algorithm to debug errors in finite element (FE) simulation models and details how it was productionised. The algorithm clusters degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix. The algorithm has been deployed as a tool called `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA (www.oasys-software.com/gsa). It has been used successfully by end-users for debugging real world FE models and we present examples of the tool in action.

翻訳日:2023-10-26 17:38:51 公開日:2023-10-24

# GlotLID:低リソース言語のための言語識別

GlotLID: Language Identification for Low-Resource Languages ( http://arxiv.org/abs/2310.16248v1 )

ライセンス: Link先を確認

Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze

(参考訳) 最近のいくつかの論文は、約300の高リソース言語と中リソース言語のための優れた言語識別ソリューション(lid)を公開している。ただし、LIDは利用できない。 i) 幅広い低リソース言語をカバーしている。 (ii)厳格に評価され、信頼性がある (iii)効率的で使いやすい。 glotlid-mは広範にわたる範囲,信頼性,効率性のデシデラタを満たすlidモデルである。 1665の言語を識別し、以前の作業に比べてカバー範囲が大幅に増加した。実験では,F1と偽陽性率(FPR)のバランスをとる場合,GlotLID-Mは4つのベースライン(CLD3,FT176,OpenLID,NLLB)を上回った。コーパスメタデータの誤り、高リソース言語からの漏洩、密接な関連言語間の分離の困難、マクロ言語対バラエティの処理、一般的なノイズデータなどである。 GlotLID-Mをデータセット生成パイプラインに統合することで,低リソース言語や文化に対するNLP技術の品質向上とアクセシビリティ向上が期待できる。 GlotLID-Mモデル、コード、およびデータソースのリストが利用可能である。

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable, and (iii) is efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguages vs. varieties and, in general, noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.

翻訳日:2023-10-26 17:38:42 公開日:2023-10-24

# 汎用最小補助イジングマシンの設計

Design of General Purpose Minimal-Auxiliary Ising Machines ( http://arxiv.org/abs/2310.16246v1 )

ライセンス: Link先を確認

Isaac K. Martin, Andrew G. Moore, John T. Daly, Jess J. Meyer, Teresa M. Ranadive

(参考訳) isingマシンは、従来のコンピューティングパラダイムの制限を克服し、エネルギー使用量のごく一部で運用する、量子インメモリ処理コンピュータの一形態である。イジングマシンを設計する過程は逆イジング問題として知られている。不運なことに、この問題は一般に計算的に難解である:これは非凸混合整数線形計画問題であり、スピン数が多いランタイムの指数的スケーリングのため、最も単純な場合を除いて、素直にブルート強化できない。我々は、探索空間を2次スケーリングで1つに減らすことができる新しい理論的結果を証明する。この理論を利用して、逆イジング問題に対する汎用アルゴリズム解を開発する。特に、3ビットと4ビットの整数乗算のイジングの定式化を実証する。この結果,スピンがプレミアムである現代のIsingハードウェア上でそのような回路を実装する実践性が向上した。

Ising machines are a form of quantum-inspired processing-in-memory computer which has shown great promise for overcoming the limitations of traditional computing paradigms while operating at a fraction of the energy use. The process of designing Ising machines is known as the reverse Ising problem. Unfortunately, this problem is in general computationally intractable: it is a nonconvex mixed-integer linear programming problem which cannot be naively brute-forced except in the simplest cases due to exponential scaling of runtime with number of spins. We prove new theoretical results which allow us to reduce the search space to one with quadratic scaling. We utilize this theory to develop general purpose algorithmic solutions to the reverse Ising problem. In particular, we demonstrate Ising formulations of 3-bit and 4-bit integer multiplication which use fewer total spins than previously known methods by a factor of more than three. Our results increase the practicality of implementing such circuits on modern Ising hardware, where spins are at a premium.
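To make the design problem concrete, here is a small sketch that checks by brute force whether a quadratic energy function has exactly the rows of a logic gate's truth table as its ground states. It uses a standard textbook penalty for the AND gate, not the construction from the paper, and the function names are hypothetical. The enumeration visits all 2^n spin configurations, which is the exponential cost the abstract refers to.

```python
from itertools import product


def and_gate_energy(sa, sb, sc):
    """Quadratic penalty that is 0 exactly when c == (a AND b), and positive otherwise.

    Spins in {-1, +1} are mapped to bits in {0, 1}; the penalty
    a*b - 2*(a + b)*c + 3*c is a standard textbook choice (an assumption here).
    """
    a, b, c = ((s + 1) // 2 for s in (sa, sb, sc))
    return a * b - 2 * (a + b) * c + 3 * c


def ground_states(energy, n_spins):
    """Enumerate all 2**n_spins configurations and return those of minimal energy."""
    states = list(product((-1, 1), repeat=n_spins))
    e_min = min(energy(*s) for s in states)
    return sorted(s for s in states if energy(*s) == e_min)


and_states = ground_states(and_gate_energy, 3)
```

The four ground states are the four valid rows of the AND truth table, written as spins. Circuits such as integer multipliers generally need auxiliary spins in addition to the input and output spins, which is what makes the search space grow so quickly.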

翻訳日:2023-10-26 17:38:19 公開日:2023-10-24

# ZzzGPT:睡眠の質を高めるインタラクティブGPTアプローチ

ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality ( http://arxiv.org/abs/2310.16242v1 )

ライセンス: Link先を確認

Yonchanok Khaokaew, Thuc Hanh Nguyen, Kaixin Ji, Hiruni Kegalle, Marwah Alaofi

(参考訳) 今日の世界では、睡眠の質は全体の幸福に欠かせない。ウェアラブルセンサーはリアルタイムのモニタリングを提供するが、アクション可能な洞察を欠くことが多く、ユーザの放棄につながる。本稿では,睡眠パターンの理解における技術の役割について述べる。本研究では,大規模言語モデル(llm)を活用した2段階フレームワークを導入し,動作可能なフィードバックによる正確な睡眠予測を実現する。 GLOBEMデータセットとLLMからの合成データを活用して、XGBoostのようなモデルによる強化結果を強調する。本手法は,高度な機械学習とユーザ中心設計を融合し,科学的正確性と実用性を融合する。

In today's world, sleep quality is pivotal for overall well-being. While wearable sensors offer real-time monitoring, they often lack actionable insights, leading to user abandonment. This paper delves into the role of technology in understanding sleep patterns. We introduce a two-stage framework, utilizing Large Language Models (LLMs), aiming to provide accurate sleep predictions with actionable feedback. Leveraging the GLOBEM dataset and synthetic data from LLMs, we highlight enhanced results with models like XGBoost. Our approach merges advanced machine learning with user-centric design, blending scientific accuracy with practicality.

翻訳日:2023-10-26 17:38:01 公開日:2023-10-24

# タスク親和性予測によるマルチタスク機械学習のためのタスクグループ化

Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction ( http://arxiv.org/abs/2310.16241v1 )

ライセンス: Link先を確認

Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka

(参考訳) 類似したタスクを同時に学習する必要がある場合、マルチタスク学習(MTL)モデルはシングルタスク学習(STL)モデルよりもはるかに高い精度が得られる。しかし、MTLの利点は、タスクの類似性、データセットのサイズなど、様々な要因に依存している。では、どのタスクを一緒に学ぶべきか? ドメインの専門家は直観、経験、ベストプラクティスに従ってタスクをグループ化できますが、手動のグルーピングは労働集約的で最適なものではありません。本稿では,タスクグループ化のための新しい自動化手法を提案する。まず、mtl文献で広く使われている4つのベンチマークデータセットを用いて、mtlのタスクの親和性を調べ、ニューラルネットワークに基づくmtlモデルに焦点をあてる。我々は、MTLを用いてタスク群を同時に学習すべきか、STLを用いて独立して学習すべきかを予測するのに役立つ固有のタスク特徴とSTLの特徴を識別する。この予測器をベースとしたランダム化探索アルゴリズムを導入し,タスク群探索時に行うMTLトレーニングの数を最小化する。提案する4つのベンチマークデータセットでは,既存のベースラインアプローチよりも,予測型検索アプローチの方が優れたタスクグループ化を実現できることを示す。

When a number of similar tasks have to be learned simultaneously, multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models. However, the advantage of MTL depends on various factors, such as the similarity of the tasks, the sizes of the datasets, and so on; in fact, some tasks might not benefit from MTL and may even incur a loss of accuracy compared to STL. Hence, the question arises: which tasks should be learned together? Domain experts can attempt to group tasks together following intuition, experience, and best practices, but manual grouping can be labor-intensive and far from optimal. In this paper, we propose a novel automated approach for task grouping. First, we study the affinity of tasks for MTL using four benchmark datasets that have been used extensively in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. Building on this predictor, we introduce a randomized search algorithm, which employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.

翻訳日:2023-10-26 17:37:53 公開日:2023-10-24

# スケール空間理論を用いた深層畳み込みネットワークの解像学習

Resolution learning in deep convolutional networks using scale-space theory ( http://arxiv.org/abs/2106.03412v3 )

ライセンス: Link先を確認

Silvia L. Pintea and Nergis Tomen and Stanley F. Goes and Marco Loog and Jan C. van Gemert

(参考訳) 深層畳み込みニューラルネットワーク(cnns)の分解能は、通常、フィルタサイズを通じて受容場サイズに制限され、特徴地図上のレイヤーまたはストレート畳み込みをサブサンプリングする。最適な解像度はデータセットによって大きく異なる可能性がある。現代のCNNは、そのようなハイパーパラメータのチューニングを煩雑にするネットワークアーキテクチャにおいて、その解像度のハイパーパラメータをハードコードしている。我々は、ハードコードされた解像度ハイパーパラメータを廃止し、データから適切な解像度を学ぶことを提案する。スケール空間理論を用いてフィルタの自己相似パラメトリゼーションを求め、ガウス微分フィルタの学習的組み合わせによりフィルタを近似するために、N-Jet: truncated Taylor級数を用いる。ガウス基底のパラメータシグマは、フィルタが符号化する詳細度とフィルタの空間的範囲の両方を制御する。 sigmaは連続パラメータであるため、損失に関して最適化することができる。提案したN-Jetレイヤは,各レイヤの解像度を自動的に学習しながら,最先端のアーキテクチャで使用する場合と同等のパフォーマンスを実現する。我々は,N-Jet層を分類とセグメンテーションの両方で評価し,学習シグマが複数サイズの入力に特に有用であることを示す。

Resolution in deep convolutional neural networks (CNNs) is typically bounded by the receptive field size through filter sizes, and subsampling layers or strided convolutions on feature maps. The optimal resolution may vary significantly depending on the dataset. Modern CNNs hard-code their resolution hyper-parameters in the network architecture, which makes tuning such hyper-parameters cumbersome. We propose to do away with hard-coded resolution hyper-parameters and aim to learn the appropriate resolution from data. We use scale-space theory to obtain a self-similar parametrization of filters and make use of the N-Jet: a truncated Taylor series to approximate a filter by a learned combination of Gaussian derivative filters. The parameter sigma of the Gaussian basis controls both the amount of detail the filter encodes and the spatial extent of the filter. Since sigma is a continuous parameter, we can optimize it with respect to the loss. The proposed N-Jet layer achieves comparable performance when used in state-of-the-art architectures, while learning the correct resolution in each layer automatically. We evaluate our N-Jet layer on both classification and segmentation, and we show that learning sigma is especially beneficial for inputs at multiple sizes.
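The role of sigma can be seen in a minimal one-dimensional sketch of a filter built from Gaussian derivatives up to order two. This is an illustration under stated assumptions, not the authors' layer: the paper works with 2D filters and learns sigma by gradient descent, whereas here sigma is an ordinary argument, the kernel is cut off at three standard deviations, and the function names are hypothetical.

```python
import math


def gaussian_derivative_basis(sigma, truncate=3.0):
    """Sampled Gaussian and its first two derivatives on an integer grid.

    The kernel radius grows with sigma, so sigma controls both the smoothness
    and the spatial extent of every basis filter.
    """
    radius = max(1, math.ceil(truncate * sigma))
    xs = list(range(-radius, radius + 1))
    norm = sigma * math.sqrt(2.0 * math.pi)
    g0 = [math.exp(-x * x / (2.0 * sigma ** 2)) / norm for x in xs]
    g1 = [-x / sigma ** 2 * g for x, g in zip(xs, g0)]                    # first derivative
    g2 = [(x * x - sigma ** 2) / sigma ** 4 * g for x, g in zip(xs, g0)]  # second derivative
    return g0, g1, g2


def njet_filter(weights, sigma):
    """Linear combination of the basis filters; `weights` has three entries."""
    basis = gaussian_derivative_basis(sigma)
    size = len(basis[0])
    return [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(size)]


small = njet_filter((1.0, 0.0, 0.0), sigma=1.0)   # 7 taps
large = njet_filter((1.0, 0.0, 0.0), sigma=2.0)   # 13 taps
edge = njet_filter((0.0, 1.0, 0.0), sigma=1.5)    # antisymmetric, sums to about 0
```

Doubling sigma widens the filter without adding weights, since only the three combination coefficients and sigma describe it.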

翻訳日:2023-10-26 04:10:02 公開日:2023-10-24

# 未知不変多様体近傍の低速確率系の非線形モデル還元

Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds ( http://arxiv.org/abs/2104.02120v2 )

ライセンス: Link先を確認

Felix X.-F. Ye, Sichen Yang, Mauro Maggioni

(参考訳) 本稿では,低次元不変有効多様体と低速ダイナミクス,高次元大速モードを有する高次元確率力学系に対して,非線形確率モデル還元法を提案する。シミュレーションの短いバーストが得られたブラックボックスシミュレータへのアクセスのみを前提として、不変多様体の推定値を出力するアルゴリズムと、高速モードを平均化した実効確率力学のプロセスと、そのシミュレータを設計する。このシミュレータは、不変多様体の低次元を活用し、有効プロセスの正則性に依存する大きさの時間ステップを要し、したがって通常、高速モードを解決しなければならない元のシミュレータよりもはるかに大きいという点で効率的である。アルゴリズムと推定はオンザフライで実行でき、基礎となるダイナミクスとの一貫性を失うことなく、効率的な状態空間の探索に繋がる。この構造は, 定常分布, 準安定状態の同定, 滞留時間, 遷移速度など, それらの力学の重要な特徴と観測可能性の推定とともに, 有効力学の経路の高速かつ効率的なシミュレーションを可能にする。

We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics, and high-dimensional, large fast modes. Given only access to a black box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold, a process of the effective stochastic dynamics on it, which has averaged out the fast modes, and a simulator thereof. This simulator is efficient in that it exploits the low dimension of the invariant manifold, and takes time steps of size dependent on the regularity of the effective process, and therefore typically much larger than that of the original simulator, which had to resolve the fast modes. The algorithm and the estimation can be performed on-the-fly, leading to efficient exploration of the effective state space, without losing consistency with the underlying dynamics. This construction enables fast and efficient simulation of paths of the effective dynamics, together with estimation of crucial features and observables of such dynamics, including the stationary distribution, identification of metastable states, and residence times and transition rates between them.

翻訳日:2023-10-26 04:09:40 公開日:2023-10-24

# 記述論理 ELHr のための来歴（プロヴェナンス）

Provenance for the Description Logic ELHr ( http://arxiv.org/abs/2001.07541v3 )

ライセンス: Link先を確認

Camille Bourgaux, Ana Ozaki, Rafael Peñaloza and Livia Predoiu

(参考訳) ELHrオントロジーにおける前兆情報処理の問題に対処する。本稿では,オントロジーに基づくデータアクセスの設定について考察し,オントロジーの公理に証明トークンを付加したセミリングと古典的データ証明の拡張について考察する。その結果、導出に関わる公理の証明を継承し、注釈として証明多項式を生成する。 ELHrの場合のセマンティクスを分析し,結合の存在が証明の扱いに様々な困難をもたらすことを示し,その一部はセミリングの乗法的イデオロポシーを仮定することによって緩和されている。本仮定では, オントロジーの完備化, 結果に対する関連する公理の集合の計算, 問合せ応答の3つの問題について検討する。

We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering.
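A small sketch may help to picture how a consequence inherits provenance from the axioms used to derive it. It is a simplification, not the paper's semantics: a polynomial is represented as a set of monomials and a monomial as a set of tokens, so repeated tokens collapse (idempotency at the monomial level) and coefficients are dropped, which also makes addition idempotent. The helper names and the example axioms are hypothetical.

```python
def token(name):
    """Provenance polynomial consisting of a single token."""
    return frozenset({frozenset({name})})


def plus(p, q):
    """Alternative derivations: union of the monomials."""
    return p | q


def times(p, q):
    """Joint use of axioms: pairwise union of the tokens of each monomial."""
    return frozenset(m1 | m2 for m1 in p for m2 in q)


# Hypothetical annotated axioms:
#   A ⊑ B : x,   B ⊑ C : y,   A ⊑ D : z,   D ⊑ C : w
x, y, z, w = (token(t) for t in "xyzw")

# A ⊑ C follows either from the first two axioms or from the last two.
prov_a_sub_c = plus(times(x, y), times(z, w))
```

Here `prov_a_sub_c` stands for the polynomial x·y + z·w, and `times(x, x)` equals `x`, which is the collapse of repeated tokens that this representation builds in.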

翻訳日:2023-10-26 04:08:51 公開日:2023-10-24

# ディープニューラルネットワークのための微分スカラー化

Differentiable Sparsification for Deep Neural Networks ( http://arxiv.org/abs/1910.03201v6 )

ライセンス: Link先を確認

Yognjin Lee

(参考訳) ディープニューラルネットワークは、機能エンジニアリングの負担を大幅に軽減していますが、これらのネットワークの効果的なアーキテクチャを決定するには、それと同等の努力が必要です。さらに、ネットワークサイズが過大になるにつれて、そのサイズを減らすためにかなりの量のリソースが投資される。これらの課題はオーバーコンプリートモデルのスパース化によって効果的に対処できる。本研究では,確率的勾配降下を伴う正規化対象関数を直接最適化することにより,重要でないパラメータをゼロにすることができるディープニューラルネットワークの完全微分可能なスパーシフィケーション法を提案する。その結果,提案手法はネットワークのスパース化構造と重み付けの両方をエンドツーエンドで学習することができる。様々な現代のディープニューラルネットワークに直接適用することができ、トレーニングプロセスに最小限の変更を必要とする。私たちの知る限りでは、これは最初の完全に微分可能なスパーシフィケーション方法です。

Deep neural networks have significantly alleviated the burden of feature engineering, but comparable efforts are now required to determine effective architectures for these networks. Furthermore, as network sizes have become excessively large, a substantial amount of resources is invested in reducing their sizes. These challenges can be effectively addressed through the sparsification of over-complete models. In this study, we propose a fully differentiable sparsification method for deep neural networks, which can zero out unimportant parameters by directly optimizing a regularized objective function with stochastic gradient descent. Consequently, the proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner. It can be directly applied to various modern deep neural networks and requires minimal modification to the training process. To the best of our knowledge, this is the first fully differentiable sparsification method.

翻訳日:2023-10-26 04:08:22 公開日:2023-10-24

# 微分プライバシーにおける完全適応構成

Fully Adaptive Composition in Differential Privacy ( http://arxiv.org/abs/2203.05481v3 )

ライセンス: Link先を確認

Justin Whitehouse and Aaditya Ramdas and Ryan Rogers and Zhiwei Steven Wu

(参考訳) 構成は差分プライバシーの重要な特徴である。よく知られている高度な合成定理は、プライバシの基本的な構成が許すよりも2倍の頻度でプライベートデータベースをクエリできる。しかし、これらの結果は、すべてのアルゴリズムのプライバシパラメータをデータとやりとりする前に修正する必要がある。これを解決するためにRogersらは、アルゴリズムとプライバシパラメータの両方を適応的に選択できる完全適応型合成を導入した。彼らは、適応的な構成でプライバシを測定するための2つの確率的オブジェクトを定義した。すなわち、構成されたインタラクションに対して差分プライバシー保証を提供するプライバシフィルタと、プライバシ損失に関する時間的一様境界であるプライバシオドメータである。高度な合成と既存のフィルターとオドメーターの間には大きなギャップがある。まず、既存のフィルタは、構成されるアルゴリズムに強い仮定を与える。第二に、これらのオドメータとフィルターは大きな定数に苦しめられ、実用的でない。我々は,プライバシパラメータを適応的に選択したにもかかわらず,定数を含む高度な構成率に適合するフィルタを構築した。途中で、おおよそのzcdpに対するプライバシーフィルターも導出します。また、オドメーターの一般的なファミリーもいくつか構築する。これらのオドメーターは、任意の、事前選択された時点、または同時に同時に2つの対数係数まで高度な合成の厳密さと一致する。マルティンゲール濃度の進歩を利用して結果を得る。要約すると、完全に適応的なプライバシは、ほとんど損失なく取得可能である。

Composition is a key feature of differential privacy. Well-known advanced composition theorems allow one to query a private database quadratically more times than basic privacy composition would permit. However, these results require that the privacy parameters of all algorithms be fixed before interacting with the data. To address this, Rogers et al. introduced fully adaptive composition, wherein both algorithms and their privacy parameters can be selected adaptively. They defined two probabilistic objects to measure privacy in adaptive composition: privacy filters, which provide differential privacy guarantees for composed interactions, and privacy odometers, time-uniform bounds on privacy loss. There are substantial gaps between advanced composition and existing filters and odometers. First, existing filters place stronger assumptions on the algorithms being composed. Second, these odometers and filters suffer from large constants, making them impractical. We construct filters that match the rates of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters. En route we also derive a privacy filter for approximate zCDP. We also construct several general families of odometers. These odometers match the tightness of advanced composition at an arbitrary, preselected point in time, or at all points in time simultaneously, up to a doubly-logarithmic factor. We obtain our results by leveraging advances in martingale concentration. In sum, we show that fully adaptive privacy is obtainable at almost no loss.
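The gap between basic and advanced composition, and the idea of a privacy filter, can be sketched in a few lines. The advanced bound below is the standard theorem of Dwork, Rothblum and Vadhan for k mechanisms with fixed parameters. The filter is only the simplest additive one that stops when the sum of epsilons would exceed the budget, not the tighter filters constructed in the paper, and the function names are hypothetical.

```python
import math


def basic_composition(eps, delta, k):
    """Basic composition: privacy parameters add up linearly over k mechanisms."""
    return k * eps, k * delta


def advanced_composition(eps, delta, k, delta_prime):
    """Advanced composition for k (eps, delta)-DP mechanisms with fixed parameters."""
    eps_total = (math.sqrt(2.0 * k * math.log(1.0 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1.0))
    return eps_total, k * delta + delta_prime


def basic_privacy_filter(eps_stream, eps_budget):
    """Count how many adaptively chosen mechanisms may run before the budget is hit.

    Each epsilon may depend on earlier outputs; the filter only tracks the running sum.
    """
    spent, count = 0.0, 0
    for eps in eps_stream:
        if spent + eps > eps_budget:
            break
        spent += eps
        count += 1
    return count


eps_basic, _ = basic_composition(0.01, 0.0, 10_000)
eps_advanced, _ = advanced_composition(0.01, 0.0, 10_000, delta_prime=1e-6)
allowed = basic_privacy_filter([0.5, 0.25, 0.25, 0.1], eps_budget=1.0)
```

With epsilon = 0.01 and 10,000 queries, basic composition gives a total epsilon of 100, while the advanced bound is a little over 6. The square-root dependence on the number of queries is the source of the "quadratically more" statement.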

翻訳日:2023-10-26 04:02:28 公開日:2023-10-24

# LAP:畳み込みニューラルネットワークにおける概念に基づく自己解釈と知識注入のための注意型モジュール

LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks ( http://arxiv.org/abs/2201.11808v5 )

ライセンス: Link先を確認

Rassa Ghavami Modegh, Ahmad Salimi, Alireza Dizaji, Hamid R. Rabiee

(参考訳) 深層畳み込みニューラルネットワークの最先端性能にもかかわらず、見当たらない状況ではバイアスや誤動作の影響を受けやすい。さらに、推論の背後にある複雑な計算は、信頼を育むには人間には理解できない。外部説明手法は、人間の理解可能な方法でネットワーク決定を解釈しようと試みてきたが、仮定や単純化により誤認を訴えられている。一方、モデル固有の自己解釈性は、前述の誤りに対してより堅牢であるが、既に訓練されたモデルには適用できない。そこで本研究では, 自己解釈性を実現し, 性能損失を伴わない知識注入の可能性を実現する, LAP (Local Attention Pooling) と呼ばれる新しい注意層を提案する。このモジュールは、どんな畳み込みニューラルネットワークにも簡単に接続できる。我々は、専門家の注釈に頼らずに、意思決定における特徴の区別を学ぶための弱教師付きトレーニングスキームを定義した。我々は、ImageNetを含む2つのデータセット上で複数のLAP拡張モデルを評価することによって、我々の主張を検証する。提案するフレームワークは、一般的なホワイトボックスの説明手法よりも、人間の理解しやすく忠実なモデル解釈を提供する。

Despite the state-of-the-art performance of deep convolutional neural networks, they are susceptible to bias and malfunction in unseen situations. Moreover, the complex computation behind their reasoning is not human-understandable to develop trust. External explainer methods have tried to interpret network decisions in a human-understandable way, but they are accused of fallacies due to their assumptions and simplifications. On the other side, the inherent self-interpretability of models, while being more robust to the mentioned fallacies, cannot be applied to the already trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability and the possibility for knowledge injection without performance loss. The module is easily pluggable into any convolutional neural network, even the already trained ones. We have defined a weakly supervised training scheme to learn the distinguishing features in decision-making without depending on experts' annotations. We verified our claims by evaluating several LAP-extended models on two datasets, including ImageNet. The proposed framework offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.

翻訳日:2023-10-26 04:01:23 公開日:2023-10-24

# heam:ディープニューラルネットワークの高効率近似マルチプライア最適化

HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks ( http://arxiv.org/abs/2201.08022v4 )

ライセンス: Link先を確認

Su Zheng, Zhen Li, Yao Lu, Jingbo Gao, Jide Zhang, Lingli Wang

(参考訳) オペランド分布にしたがって平均誤差を最小化する近似乗算器の自動設計のための最適化手法を提案する。我々の乗算器は、DNNにおいて最もよく再現された近似乗算器よりも50.24%高い精度で15.76%小さく、消費電力が25.05%減少し、3.50%遅れている。正確な乗算器と比較して、乗算器は面積、消費電力、遅延を44.94%、47.63%、および16.78%削減し、精度の損失は無視できる。我々の乗算器を持つ試験されたDNN加速器モジュールは、18.70%の面積と9.99%の消費電力を得る。

We propose an optimization method for the automatic design of approximate multipliers, which minimizes the average error according to the operand distributions. Our multiplier achieves up to 50.24% higher accuracy than the best reproduced approximate multiplier in DNNs, with 15.76% smaller area, 25.05% less power consumption, and 3.50% shorter delay. Compared with an exact multiplier, our multiplier reduces the area, power consumption, and delay by 44.94%, 47.63%, and 16.78%, respectively, with negligible accuracy losses. The tested DNN accelerator modules with our multiplier obtain up to 18.70% smaller area and 9.99% less power consumption than the original modules.
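The objective described above, average error weighted by the operand distributions, can be written down directly. The sketch below is only an illustration of that objective: the truncation-based multiplier is a toy stand-in chosen for brevity, not the multiplier produced by the paper's optimization, and both function names are hypothetical.

```python
def truncated_multiplier(a, b, drop_bits=2):
    """Toy approximate multiplier: zero the lowest `drop_bits` bits of each operand."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def mean_abs_error(approx, prob_a, prob_b):
    """Expected |a*b - approx(a, b)| for independent operands with the given distributions."""
    return sum(
        pa * pb * abs(a * b - approx(a, b))
        for a, pa in prob_a.items()
        for b, pb in prob_b.items()
    )


# 4-bit operands: a uniform distribution versus one concentrated on multiples of 8.
uniform = {v: 1.0 / 16 for v in range(16)}
aligned = {0: 0.5, 8: 0.5}

err_uniform = mean_abs_error(truncated_multiplier, uniform, uniform)
err_aligned = mean_abs_error(truncated_multiplier, aligned, aligned)
```

The same hardware has zero average error under the second distribution and a nonzero error under the first, which is why a design tuned to the operand statistics of a DNN can be cheaper than a general-purpose one at similar accuracy.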

翻訳日:2023-10-26 04:01:05 公開日:2023-10-24

# DNNテストのための実世界メディアデータの証明可能な妥当性と多様性を持つ変異

Provably Valid and Diverse Mutations of Real-World Media Data for DNN Testing ( http://arxiv.org/abs/2112.01956v2 )

ライセンス: Link先を確認

Yuanyuan Yuan, Qi Pang, Shuai Wang

(参考訳) ディープニューラルネットワーク(dnn)は、しばしば高次元メディアデータ(例えば写真、テキスト、音声)を受け取り、その知覚内容(例えば猫)を理解する。 DNNをテストするには、誤予測を引き起こすために多様な入力が必要である。いくつかの予備的な研究では、バイトレベルの突然変異やドメイン固有のフィルター(霧など)を使用し、有効変異は制限され、エラーを起こしやすい。 sota worksは(無限の)入力を生成するために深い生成モデルを採用している。また、変異した入力を知覚的に有効に保つために(例えば、猫は突然変異後に「猫」のままである)、既存の努力は不正確で一般化不可能なヒューリスティックに頼っている。本研究は,低次元空間における高次元メディアデータの知覚を捉える理論である,多様体に基づく厳密な手法により,メディア入力変異(DIV)と妥当性(VAL)の2つの重要な目的を再考する。 DIV と VAL が互いに密接な関係にある重要な結果を示し、SOTA 生成モデルに基づく手法が実世界のメディアデータ(DIV と VAL の犠牲)を根本的に変更できないことを証明した。対照的に,実世界のメディアデータを,多様体に基づく高いDIVとVALで変更できる可能性について論じる。我々は,様々なフォーマット(画像,音声,テキスト)のメディアデータを,多様体に基づく統一的な方法で変更する技術ソリューションを考案する。特に、メディアデータが低次元多様体に投影されると、そのデータは特定の方向とステップサイズで多様体の上を歩くことで変更することができる。入力データと対比すると、変異されたデータは、適度に高いval(犬はまだ残っている)を保持しながら、知覚特性(例えば、横たわる犬対立犬)にdivを奨励する。 DNNをテストするためにDEEPWALKで実装する。 DEEPWALKは包括性テストにおいて従来の手法よりも優れており、より高い品質のエラートリガー入力を見つけることができる。

Deep neural networks (DNNs) often accept high-dimensional media data (e.g., photos, text, and audio) and understand their perceptual content (e.g., a cat). To test DNNs, diverse inputs are needed to trigger mis-predictions. Some preliminary works use byte-level mutations or domain-specific filters (e.g., foggy), whose enabled mutations may be limited and likely error-prone. SOTA works employ deep generative models to generate (infinite) inputs. Also, to keep the mutated inputs perceptually valid (e.g., a cat remains a "cat" after mutation), existing efforts rely on imprecise and less generalizable heuristics. This study revisits two key objectives in media input mutation, perception diversity (DIV) and validity (VAL), in a rigorous manner based on manifolds, a well-developed theory capturing perceptions of high-dimensional media data in a low-dimensional space. We show important results that DIV and VAL are inextricably bound to each other, and prove that SOTA generative model-based methods fundamentally fail to mutate real-world media data (sacrificing either DIV or VAL). In contrast, we discuss the feasibility of mutating real-world media data with provably high DIV and VAL based on manifolds. We concretize the technical solution of mutating media data of various formats (images, audio, text) in a unified manner based on manifolds. Specifically, when media data are projected into a low-dimensional manifold, the data can be mutated by walking on the manifold with certain directions and step sizes. When contrasted with the input data, the mutated data exhibit encouraging DIV in the perceptual traits (e.g., lying vs. standing dog) while retaining reasonably high VAL (i.e., a dog remains a dog). We implement our techniques in DEEPWALK for testing DNNs. DEEPWALK outperforms prior methods in testing comprehensiveness and can find more error-triggering inputs with higher quality.

翻訳日:2023-10-26 04:00:49 公開日:2023-10-24

# 最も単純なストリーミング木

Simplest Streaming Trees ( http://arxiv.org/abs/2110.08483v6 )

ライセンス: Link先を確認

Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

(参考訳) ランダムな森林や勾配木などの決定的森林は、特に表データにおいて、現実世界のデータ問題の主要な機械学習手法である。しかし、現在の実装のほとんどはバッチモードでのみ動作するため、より多くのデータが到着してもインクリメンタルに更新することはできない。以前のいくつかの作品は、この制限を克服するためにストリーミングツリーとアンサンブルを開発した。それにもかかわらず、これらの最先端アルゴリズムは、いくつかの問題に対する精度の低下や、他の問題でのメモリ使用率など、多くの欠点を抱えていることがわかった。そこで、我々は、決定木を可能な限りシンプルに拡張し、新しいデータを与え、成長を続けることで既存の木を更新し、古い木を新しい木に置き換え、全体の木数を制御する。 72の分類問題(OpenML-CC18データスイート)を含むベンチマークスイートでは、上記のいずれかの制限を問わないストリーム決定フォレスト(SDF)のアプローチが示されている。これらのデータセット上では、従来のバッチ決定森林アルゴリズムよりも、我々のアプローチがよく、時にはより良く機能することを示した。したがって、sdfは多くの現実世界の問題に容易に適用できる流木や森林の単純な標準を確立している。

Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including low accuracy on some problems and high memory usage on others. We therefore developed the simplest possible extension of decision trees: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach, Stream Decision Forest (SDF), does not suffer from either of the aforementioned limitations. On those datasets, we also demonstrate that our approach often performs as well as, and sometimes even better than, conventional batch decision forest algorithms. Thus, SDFs establish a simple standard for streaming trees and forests that could readily be applied to many real-world problems.

翻訳日:2023-10-26 04:00:14 公開日:2023-10-24

# 行動認識のための教師なしビデオドメイン適応:ディスエンタングルメントの視点

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective ( http://arxiv.org/abs/2208.07365v3 )

ライセンス: Link先を確認

Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin

(参考訳) 教師なしビデオドメイン適応は実用的だが難しい課題である。この作業では、初めて、歪んだ視点からそれに取り組む。我々のキーとなる考え方は、空間的領域と時間的領域の分断を分離して扱うことである。具体的には,静的情報のエンコードと動的情報をエンコードする2組の潜在要因によるクロスドメインビデオの生成を検討する。その後、トランスファーシーケンスVAE(TranSVAE)フレームワークが開発され、そのような世代をモデル化する。適応性を高めるために,潜在因子を制約する目的をいくつか提案する。これらの制約により、静的なドメイン固有情報を切り離すことで空間的ばらつきを容易に取り除き、対角学習により時間的ばらつきをフレームレベルとビデオレベルの両方からさらに低減することができる。 UCF-HMDB、Jester、Epic-Kitchensデータセットの大規模な実験は、最先端のいくつかのアプローチと比較してTranSVAEの有効性と優位性を検証する。コードは公開されている。

Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better serve for adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling the static domain-specific information out, and the temporal divergence is further reduced from both frame- and video-levels through adversarial learning. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches. Code is publicly available.

翻訳日:2023-10-26 03:50:54 公開日:2023-10-24

# 作用素値シャッテン空間と量子エントロピー

Operator-valued Schatten spaces and quantum entropies ( http://arxiv.org/abs/2207.06693v3 )

ライセンス: Link先を確認

Salman Beigi, Milad M. Goodarzi

(参考訳) 作用素値のシャッテン空間は G. Pisier によってベクトル値 $\ell_p$-spaces の非可換対応として導入された。この作用素空間の族は補間スケールを形成し、様々なアプリケーションにおいて強力で便利なツールとなる。特に、この族から来るノルムは自然に量子情報理論(QIT)におけるあるエントロピー量の定義に現れるので、ピシエの理論を用いてそれらの量のいくつかの特徴を確立することができる。それにもかかわらず、既存の文献からこの理論の主結果の証明に従うことは極めて困難である。本稿では,特にQITコミュニティ全体において,Pisierの理論の基礎となる概念と概念を自己完結した形で提示することによって,このギャップを埋めようとしている。さらに、この理論のいくつかの応用をQITで述べる。特に、量子条件 Rényi エントロピーに束縛された新しい一様連続性を証明する。

Operator-valued Schatten spaces were introduced by G. Pisier as a noncommutative counterpart of vector-valued $\ell_p$-spaces. This family of operator spaces forms an interpolation scale which makes it a powerful and convenient tool in a variety of applications. In particular, as the norms coming from this family naturally appear in the definition of certain entropic quantities in Quantum Information Theory (QIT), one may apply Pisier's theory to establish some features of those quantities. Nevertheless, it could be quite challenging to follow the proofs of the main results of this theory from the existing literature. In this article, we attempt to fill this gap by presenting the underlying concepts and ideas of Pisier's theory in a self-contained way which we hope to be more accessible, especially for the QIT community at large. Furthermore, we describe some applications of this theory in QIT. In particular, we prove a new uniform continuity bound for the quantum conditional Rényi entropy.

翻訳日:2023-10-26 03:50:28 公開日:2023-10-24

# 動的変動軌跡モデルを用いた心エコー図の異常検出

Anomaly Detection in Echocardiograms with Dynamic Variational Trajectory Models ( http://arxiv.org/abs/2206.15316v3 )

ライセンス: Link先を確認

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

(参考訳) 心エコービデオの新しい異常検出法を提案する。本手法は循環周期の周期的特性を利用して変動潜在軌道モデル(TVAE)の3つの変種を学習する。第1の2つの変種(TVAE-CとTVAE-R)は心臓の厳格な周期運動をモデル化するが、第3の変種(TVAE-S)はより一般的であり、ビデオ全体を通して空間表現の変化を可能にする。全てのモデルは、健康な人口の規範を学ぶために、複数のチャンバービューからなる幼児の心エコービデオの、新しい社内データセットの健全なサンプルに基づいて訓練される。推定の際には,データセット内の分布外サンプルを検出するために,MAPに基づく最大異常検出を行う。提案手法は, Ebstein's Anomaly や Shone-complex などの重症先天性心疾患を確実に同定する。さらに、肺高血圧や右室拡張を検出する際に、標準変分オートエンコーダを用いたMAPベースの異常検出よりも優れた性能を発揮する。最後に, 異常心構造に対応する領域を強調するヒートマップを用いて, 出力の解釈可能な説明を可能にすることを実証する。

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein's Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.

翻訳日:2023-10-26 03:50:12 公開日:2023-10-24

# ボソニックガウス系の量子Rényiエントロピー汎関数

Quantum Rényi Entropy Functionals for Bosonic Gaussian Systems ( http://arxiv.org/abs/2204.10737v3 )

ライセンス: Link先を確認

Junseo Lee and Kabgyun Jeong

(参考訳) 本研究では、次数 $p>1$ とパワー $\kappa$ の量子 Rényi エントロピーパワー不等式を古典的 Rényi-$p$ エントロピーパワー不等式の量子アナログとして導入する。この不等式を導出するために、一般化ビームスプリッター演算である量子畳み込みの混合演算により、ボソニックガウス系のWehrl-$p$エントロピーパワー不等式を利用する。この観測は、量子Rényi-$p$エントロピーパワーの不等式を、$D$モードボソニックガウスの準確率分布に対して直接提供する。提案された不等式は、量子チャネル容量の非自明な計算、特にボソニックガウス量子チャネル上の普遍上界、およびガウス増幅器のスクイージング操作によるガウスの絡み合い観測に有用である。

In this study, the quantum Rényi entropy power inequality of order $p>1$ and power $\kappa$ is introduced as a quantum analog of the classical Rényi-$p$ entropy power inequality. To derive this inequality, we first exploit the Wehrl-$p$ entropy power inequality on bosonic Gaussian systems via the mixing operation of quantum convolution, which is a generalized beam-splitter operation. This observation directly provides a quantum Rényi-$p$ entropy power inequality over a quasi-probability distribution for $D$-mode bosonic Gaussian regimes. The proposed inequality is expected to be useful for the nontrivial computing of quantum channel capacities, particularly universal upper bounds on bosonic Gaussian quantum channels, and a Gaussian entanglement witness in the case of Gaussian amplifier via squeezing operations.

翻訳日:2023-10-26 03:48:47 公開日:2023-10-24

# ULF: Cross-Validation を用いた非教師付きラベリング関数補正

ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision ( http://arxiv.org/abs/2204.06863v3 )

ライセンス: Link先を確認

Anastasiia Sedova, Benjamin Roth

(参考訳) 手動ラベリングの費用対効果は弱い監督(WS)であり、データサンプルは事前に定義されたラベリング関数のセット(LF)を使って自動的にアノテートされ、関連するクラスの人工ラベリングを生成するルールベースのメカニズムである。そこで本研究では,k-foldクロスバリデーションの原理に基づくWSのノイズ低減手法について検討する。非教師付きラベル関数補正のための新しいアルゴリズムULFを導入し、いくつかのLF以外のモデルで訓練されたモデルを利用してWSデータを識別し、保持されたLFに固有のバイアスを補正する。特にULFは、高信頼性のクロスバリデーションサンプルにこの割り当てを再見積することで、クラスへのLFの割り当てを洗練します。複数のデータセットの評価は、手動ラベリングを必要とせずにWS学習を向上するULFの有効性を確認する。

A cost-effective alternative to manual data labeling is weak supervision (WS), where data samples are automatically annotated using a predefined set of labeling functions (LFs), rule-based mechanisms that generate artificial labels for the associated classes. In this work, we investigate noise reduction techniques for WS based on the principle of k-fold cross-validation. We introduce a new algorithm ULF for Unsupervised Labeling Function correction, which denoises WS data by leveraging models trained on all but some LFs to identify and correct biases specific to the held-out LFs. Specifically, ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples. Evaluation on multiple datasets confirms ULF's effectiveness in enhancing WS learning without the need for manual labeling.

翻訳日:2023-10-26 03:48:28 公開日:2023-10-24

# 弱結合を超える有限時間ランダウアー原理

Finite-time Landauer principle beyond weak coupling ( http://arxiv.org/abs/2211.02065v3 )

ライセンス: Link先を確認

Alberto Rolandi and Mart\'i Perarnau-Llobet

(参考訳) ランダウアーの原理は、情報を消去する熱力学的コストに根本的な制限を与える。その飽和は可逆等温過程を必要とし、したがって無限の時間を必要とする。我々は,単一のフェルミオンモードの占有中にエンコードされたビットに対して,ランドウアーの原理の有限時間バージョンを開発した。正確な非平衡力学を解くことによって、熱力学への幾何学的アプローチにより、遅い駆動状態における消去過程(フェルミオンのエネルギーと系-バス結合を制御パラメータとする)を最適化する。数値的に解くことができる熱力学的計量と測地線方程式の解析式を求める。これらの解は、非マルコフ的かつ強いカップリング効果を完全に考慮して、ランダウアーの束縛に対する有限時間補正を特徴付けるための最適な過程を与える。

Landauer's principle gives a fundamental limit to the thermodynamic cost of erasing information. Its saturation requires a reversible isothermal process, and hence infinite time. We develop a finite-time version of Landauer's principle for a bit encoded in the occupation of a single fermionic mode, which can be strongly coupled to a reservoir. By solving the exact non-equilibrium dynamics, we optimize erasure processes (taking both the fermion's energy and system-bath coupling as control parameters) in the slow driving regime through a geometric approach to thermodynamics. We find analytic expressions for the thermodynamic metric and geodesic equations, which can be solved numerically. Their solution yields optimal processes that allow us to characterize a finite-time correction to Landauer's bound, fully taking into account non-Markovian and strong coupling effects.
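
As a small worked illustration of the limit the abstract refers to, the sketch below evaluates the standard infinite-time Landauer bound, $k_B T \ln 2$ per erased bit. It does not reproduce the finite-time correction derived in the paper; the function name `landauer_bound_joules` and the 300 K example temperature are assumptions chosen here for illustration.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_bound_joules(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Minimum heat dissipated when erasing `bits` bits at the given temperature.

    This is the quasi-static (infinite-time) bound k_B * T * ln(2) per bit.
    Hypothetical helper for illustration; not code from the paper.
    """
    if temperature_kelvin < 0:
        raise ValueError("temperature must be non-negative")
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Room temperature (assumed 300 K): roughly 2.87e-21 J per bit.
    print(landauer_bound_joules(300.0))
```

Any finite-time protocol dissipates more than this value; the abstract's contribution is characterizing that excess under strong coupling.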

翻訳日:2023-10-26 03:42:03 公開日:2023-10-24

# FrischとSegr\`eによる多段階Stern$\unicode{x2013}$Gerlach実験の量子力学的モデリング

Quantum mechanical modeling of the multi-stage Stern$\unicode{x2013}$Gerlach experiment conducted by Frisch and Segr\`e ( http://arxiv.org/abs/2210.11553v3 )

ライセンス: Link先を確認

S. S\"uleyman Kahraman, Kelvin Titimbo, Zhe He, Jung-Tsung Shen, Lihong V. Wang

(参考訳) マルチステージ Stern$\unicode{x2013}$Gerlach 実験はカスケード量子測定を提供する。 Frisch と Segr\`e が行ったマルチステージ Stern$\unicode{x2013}$Gerlach 実験は、Majorana の量子力学を用いて解析的にモデル化され、Rabi によって修正された。しかし、理論的な予測は実験的な観測とよく一致しない。ここでは、スピンの時間発展の超微細構造相互作用を含むフォン・ノイマン方程式を用いて、標準量子力学モデルを数値的に解く。これまでのところ、自由パラメータを使わずに標準量子力学モデルから決定される係数はまだゼロ以下であり、理論と実験のミスマッチを示している。一致を改善する非標準変種を議論するために検討する。

Multi-stage Stern$\unicode{x2013}$Gerlach experiments provide cascaded quantum measurements. The multi-stage Stern$\unicode{x2013}$Gerlach experiment conducted by Frisch and Segr\`e has been modeled analytically using quantum mechanics by Majorana and revised by Rabi by including the hyperfine interaction. However, the theoretical predictions do not match the experimental observation well. Here, we numerically solve the standard quantum mechanical model, via the von Neumann equation, that includes the hyperfine interaction for the time evolution of the spin. Thus far, the coefficients of determination from the standard quantum mechanical model without using free parameters are still below zero, indicating a mismatch between theory and experiment. Non-standard variants that improve the match are explored for discussion.
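
The abstract's remark that the coefficient of determination is "below zero" can be made concrete with a short sketch. The code uses the common definition $R^2 = 1 - SS_\mathrm{res}/SS_\mathrm{tot}$, which is assumed here since the abstract does not spell it out. The function name and the toy numbers are made up for illustration and are not data from the experiment.

```python
def coefficient_of_determination(observed, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired observations and predictions.

    R^2 is 1 for a perfect fit, 0 when the model does no better than
    predicting the mean of the observations, and negative when it does worse.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]  # toy values, not experimental data
    print(coefficient_of_determination(observed, [1.0, 2.0, 3.0]))  # perfect fit
    print(coefficient_of_determination(observed, [2.0, 2.0, 2.0]))  # mean predictor
    print(coefficient_of_determination(observed, [3.0, 2.0, 1.0]))  # worse than mean
```

A negative value therefore means the parameter-free model predicts the data worse than a constant equal to the mean of the measurements would.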

翻訳日:2023-10-26 03:41:49 公開日:2023-10-24

# 深層生成モデルのコンテンツベース検索

Content-Based Search for Deep Generative Models ( http://arxiv.org/abs/2210.03116v4 )

ライセンス: Link先を確認

Daohan Lu, Sheng-Yu Wang, Nupur Kumari, Rohan Agarwal, Mia Tang, David Bau, Jun-Yan Zhu

(参考訳) カスタマイズおよび事前学習された生成モデルの増大により、ユーザが既存のすべてのモデルを完全に認識することは不可能になった。このニーズに対処するために、我々はコンテンツベースのモデル検索のタスクを導入する: クエリと大量の生成モデルが与えられたとき、クエリに最も適したモデルを見つける。各生成モデルは画像の分布を生成するため、探索タスクを最適化問題として定式化し、クエリと類似したコンテンツを生成する確率が最も高いモデルを選択する。画像,スケッチ,テキストなど,異なるモーダル性からのクエリを考慮し,この確率を近似する定式化を導入する。さらに,様々な問合せモダリティに適合する特徴を学習するモデル検索のためのコントラスト学習フレームワークを提案する。本手法は,モデル検索タスク用に作成した新しいベンチマークである生成モデルzooのベースライン数を上回ることを示す。

The growing proliferation of customized and pretrained generative models has made it infeasible for a user to be fully cognizant of every model in existence. To address this need, we introduce the task of content-based model search: given a query and a large set of generative models, finding the models that best match the query. As each generative model produces a distribution of images, we formulate the search task as an optimization problem to select the model with the highest probability of generating similar content as the query. We introduce a formulation to approximate this probability given the query from different modalities, e.g., image, sketch, and text. Furthermore, we propose a contrastive learning framework for model retrieval, which learns to adapt features for various query modalities. We demonstrate that our method outperforms several baselines on Generative Model Zoo, a new benchmark we create for the model retrieval task.

翻訳日:2023-10-26 03:41:32 公開日:2023-10-24

# Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without Reference

Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without Reference ( http://arxiv.org/abs/2210.01125v2 )

ライセンス: Link先を確認

Xiaodong Guo, Longhui Li, Peng He, Peng Feng, Dingyue Chang, Hengyong Yu, Weiwen Wu

(参考訳) 光子計数検出器(英語版)(PCD)に基づくスペクトル計算トモグラフィーは、バイオメディカル素材のより正確な同定と定量分析を提供する能力を持つため、ますます注目を集めている。狭いエネルギービン内での光子数の制限は、低信号ノイズ比の撮像結果をもたらす。既存のCT再建のための教師付き深層再構築ネットワークは,ノイズのない臨床像を基準として取得することは不可能であるため,これらの課題に対処するのは難しい。本稿では,教師なし手法とデータ先行処理を,Spectral2Spectralという名前の統一フレームワークに相乗化するための反復的深層再構築ネットワークを提案する。我々のSpectral2Spectralは、教師なしの深層学習戦略を用いて、ノイズの多いデータからエンドツーエンドで高品質な画像を得る。画像スペクトル領域内の構造的類似性は、ネットワークトレーニングをさらに制約するために正規化項として洗練される。ニューラルネットワークの重みは自動的に更新され、反復プロセス内の画像の特徴と構造をキャプチャする。 3つの大規模な前臨床データセット実験は、スペクトル2スペクトルが他の最先端の手法よりも優れた画質を再構成することを示した。

Spectral computed tomography based on a photon-counting detector (PCD) is attracting increasing attention because it can provide more accurate identification and quantitative analysis of biomedical materials. The limited number of photons within narrow energy bins leads to imaging results with a low signal-to-noise ratio. Existing supervised deep reconstruction networks for CT reconstruction struggle to address these challenges because it is usually impossible to acquire noise-free clinical images with clear structures as references. In this paper, we propose an iterative deep reconstruction network, named Spectral2Spectral, that combines an unsupervised method and data priors in a unified framework. Spectral2Spectral employs an unsupervised deep training strategy to obtain high-quality images from noisy data in an end-to-end fashion. The structural similarity prior within the image-spectral domain is refined as a regularization term to further constrain the network training. The weights of the neural network are automatically updated to capture image features and structures within the iterative process. Experiments on three large-scale preclinical datasets demonstrate that Spectral2Spectral achieves better image quality than other state-of-the-art methods.

翻訳日:2023-10-26 03:41:17 公開日:2023-10-24

# 畳み込みニューラルネットワークにおける最大プール特徴写像のシフト不変性について

On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks ( http://arxiv.org/abs/2209.11740v2 )

ライセンス: Link先を確認

Hubert Leterme (UGA, LJK), K\'evin Polisano (UGA, LJK), Val\'erie Perrier (Grenoble INP, LJK), Karteek Alahari (LJK)

(参考訳) 本稿では,画像分類における畳み込みニューラルネットワーク(cnns)の数学的解釈性の向上に着目する。具体的には、imagenetのようなデータセットでトレーニングすると、指向したバンドパスフィルタによく似たパラメータを学習する傾向がある、第1層で発生する不安定な問題に取り組む。このようなガボル型フィルタによるサブサンプル畳み込みはエイリアスしやすく、小さな入力シフトに敏感である。この文脈では、最大プーリング作用素が複素モジュラーを近似する条件を確立するが、これはほとんどシフト不変である。次に、サブサンプル畳み込みに対するシフト不変性の尺度を導出し、最大プーリングを行う。特に,安定を達成する上で,フィルタの周波数と方向が果たす重要な役割を強調する。本稿では,二本木複素ウェーブレットパケット変換に基づく決定論的特徴抽出器,特に離散ガボール分解の場合について実験的に検証する。

This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.

翻訳日:2023-10-26 03:40:55 公開日:2023-10-24

# Amortized Variational Inference: A Systematic Review

Amortized Variational Inference: A Systematic Review ( http://arxiv.org/abs/2209.10888v2 )

ライセンス: Link先を確認

Ankush Ganguly, Sanjana Jain, and Ukrit Watchareeruetai

(参考訳) 変分推論(VI)の中核となる原理は、複雑な後続確率密度の統計的推論問題をトラクタブルな最適化問題に変換することである。この特性により、VIは複数のサンプリングベース技術よりも高速になる。しかし、従来のVIアルゴリズムは大規模データセットには拡張性がなく、最適化プロセスを再実行することなく容易に境界外データポイントを推測できない。ストーシャスティック、ブラックボックス、アモールタイズVIといったこの分野の最近の発展は、これらの問題に対処するのに役立っている。生成的モデリングタスクは、パラメータ化関数を用いて近似後続密度パラメータを学習するため、その効率と拡張性にアモータイズVIを広く利用している。本稿では、様々なVI技法の数学的基礎を概観し、VIの解釈の基礎を形成する。さらに, 償却ギャップ, 一般化問題, 不整合表現学習, 後方崩壊など, 償却viの諸問題に対処した最近の傾向について概説する。最後に、VI 最適化を改善するための交互分散手法を解析する。

The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, like stochastic-, black box-, and amortized-VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.
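
For readers who want the objective behind this abstract in symbols, the following is a standard textbook statement of the evidence lower bound (ELBO) and of what amortization changes. It is not notation taken from the review itself: the symbols $\theta$, $\phi$, $\lambda_i$, and $f_\phi$ are assumptions introduced here for illustration.

```latex
% Evidence lower bound for one data point x with latent variable z
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, q; x)
  = \mathbb{E}_{q(z)}\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\big(q(z) \,\|\, p(z)\big)

% Traditional VI: separate variational parameters optimized per data point
\lambda_i^{*} = \arg\max_{\lambda_i} \mathcal{L}\big(\theta, q_{\lambda_i}; x_i\big)

% Amortized VI: one shared parameterized function predicts them
\lambda_i = f_\phi(x_i)

% Amortization gap: non-negative because \lambda_i^{*} is optimal within the same family
\mathcal{L}\big(\theta, q_{\lambda_i^{*}}; x_i\big)
  - \mathcal{L}\big(\theta, q_{f_\phi(x_i)}; x_i\big) \;\ge\; 0
```

The shared function $f_\phi$ is what lets amortized VI handle a new data point without re-running the per-point optimization, at the cost of the gap in the last line.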

翻訳日:2023-10-26 03:40:38 公開日:2023-10-24

# MaXM:多言語視覚質問応答を目指して

MaXM: Towards Multilingual Visual Question Answering ( http://arxiv.org/abs/2209.05401v3 )

ライセンス: Link先を確認

Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut

(参考訳) VQA(Visual Question Answering)は、主に英語のレンズを通して研究されている。しかし、同じ方法で他の言語でVQAに取り組むには、かなりの量のリソースが必要になる。本稿では,データとモデリングの両面で,多言語視覚質問応答(mVQA)のスケーラブルな解を提案する。まず,従来の質問や回答を直接収集する手法よりも,人間のアノテーションの取り組みをはるかに少なくする,mVQAデータ生成のための翻訳ベースのフレームワークを提案する。次に,Crossmodal-3600データセットの多言語キャプションに適用し,テスト専用VQAベンチマークであるMaXMを作成するための効率的なアノテーションプロトコルを開発する。最後に, 単純で軽量で効果的なアプローチと, 最先端の英語および多言語VQAモデルのベンチマークを行う。われわれのベンチマークがmVQAのさらなる研究を促進することを願っている。

Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework for mVQA data generation that requires much less human annotation effort than the conventional approach of directly collecting questions and answers. Then, we apply our framework to the multilingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MaXM, a test-only VQA benchmark in 7 diverse languages. Finally, we develop a simple, lightweight, and effective approach and benchmark state-of-the-art English and multilingual VQA models. We hope that our benchmark encourages further research on mVQA.

翻訳日:2023-10-26 03:40:19 公開日:2023-10-24

# SCL-RAI:NERにおける未ラベルエンティティ問題に対する検索拡張推論を用いたスパン型コントラスト学習

SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER ( http://arxiv.org/abs/2209.01646v3 )

ライセンス: Link先を確認

Shuzheng Si, Shuang Zeng, Jiaxing Lin, Baobao Chang

(参考訳) 名前付きエンティティ認識は、テキスト内のエンティティを見つけて分類するタスクである。しかし、NERデータセットのUnlabeled Entity Problemは、NERのパフォーマンスを著しく損なう。本稿では,この問題に対処するためのSCL-RAIを提案する。まず,異なるラベルで表現するスパンの距離を減らし,異なるラベルで表現するコントラスト学習を行うことにより,エンティティ間のあいまいさを軽減し,ラベルのないエンティティに対するモデルの堅牢性を向上させる。そこで我々は,決定境界シフト問題を緩和する検索拡張推論を提案する。本手法は,2つの実世界のデータセットにおいて,従来のSOTA法よりも4.21%,F1スコアが8.64%向上した。

Named Entity Recognition (NER) is the task of locating and classifying the entities in a text. However, the Unlabeled Entity Problem in NER datasets seriously hinders the improvement of NER performance. This paper proposes SCL-RAI to cope with this problem. First, we decrease the distance between span representations with the same label while increasing it for those with different labels via span-based contrastive learning, which relieves the ambiguity among entities and improves the robustness of the model to unlabeled entities. Then we propose retrieval augmented inference to mitigate the decision boundary shifting problem. Our method significantly outperforms the previous SOTA method by 4.21% and 8.64% F1-score on two real-world datasets.

翻訳日:2023-10-26 03:40:05 公開日:2023-10-24

# カプセル学習のためのハイブリッドGromov-Wasserstein埋め込み

Hybrid Gromov-Wasserstein Embedding for Capsule Learning ( http://arxiv.org/abs/2209.00232v2 )

ライセンス: Link先を確認

Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das, Eric Granger, Salvador Garcia

(参考訳) Capsule Networks(CapsNets)は、イメージをオブジェクト、部品、およびそれらの関係の階層にパースすることを目的として、部分全体変換と階層的コンポーネントルーティングを含む2段階のプロセスを使用する。しかし、この階層的関係モデリングは計算コストが高く、潜在的な利点にもかかわらずcapsnetの利用が制限されている。 capsnetモデルの現在の状況は、主に彼らのパフォーマンスとカプセルのベースラインを比較することに集中しており、複雑なタスクでディープcnnの変種と同じレベルの熟練度を達成できていない。この制限に対処するために、標準ベースラインモデルを超え、高性能な畳み込みモデルよりも優れた性能を示すカプセルの学習手法を提案する。まず、入力ベクトルが投影される部分カプセルのグループを紹介します。次に、まず、サブカプセルによってモデル化されたコンポーネントと入力の相違性を定量化し、次に最適な輸送によってアライメント度を決定するハイブリッドGromov-Wassersteinフレームワークを提案する。この革新的なメカニズムは、それぞれのコンポーネント分布の類似性に基づいて、入力とサブカプセルのアライメントを定義する新しい洞察を生かしている。このアプローチはCapsNetsの複雑な高次元データから学ぶ能力を高め、解釈可能性と階層構造を維持する。提案モデルには2つの利点がある。 (i)その軽量な性質は、物体検出を含むより複雑な視覚タスクへのカプセルの応用を促進する。 (ii)これらの要求タスクにおけるベースラインアプローチよりも優れています。

Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relations using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNet despite its potential advantages. The current state of CapsNet models primarily focuses on comparing their performance with capsule baselines, falling short of achieving the same level of proficiency as deep CNN variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared to high-performing convolution models. Our contribution can be outlined in two aspects: firstly, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the Hybrid Gromov-Wasserstein framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport. This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: (i) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; (ii) it outperforms baseline approaches in these demanding tasks.

翻訳日:2023-10-26 03:39:53 公開日:2023-10-24

# DenseShift: 高精度で効率的な2ビットの量子化を目指す

DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization ( http://arxiv.org/abs/2208.09708v3 )

ライセンス: Link先を確認

Xinlin Li, Bang Liu, Rui Heng Yang, Vanessa Courville, Chao Xing, Vahid Partovi Nia

(参考訳) 低リソースのエッジデバイスにディープニューラルネットワークを効率的にデプロイするのは、リソース要件の増大が原因で難しい。この問題に対処するため、研究者は2つの量子化のパワーや、メモリ使用量の削減と計算の簡素化を目的としたシフトネットワークなど、乗算フリーなニューラルネットワークを提案している。しかし、既存の低ビットシフトネットワークはフル精度のネットワークほど正確ではなく、通常は制限されたウェイトレンジ符号化スキームと量子化損失に悩まされている。本稿では,シフトネットワークの精度を大幅に向上し,視覚・音声アプリケーションのための全精度ネットワークと競合する性能を実現する高密度シフトネットワークを提案する。さらに,非量子化浮動小数点アクティベーションを用いた効率的なDenseShiftネットワークのデプロイ手法を導入し,既存手法の1.6倍の高速化を実現した。これを実現するために,低ビットシフトネットワークにおけるゼロウェイト値がモデルのキャパシティに寄与せず,推論計算に悪影響を及ぼすことを実証する。そこで本研究では,モデルキャパシティの向上と推論を簡略化するゼロフリーシフト機構を提案する。さらに,学習効率を向上させるための符号スケール分解設計と,モデルの伝達学習性能を向上させるための低分散ランダム初期化戦略を提案する。様々なコンピュータビジョンおよび音声タスクに関する広範な実験により,高密度シフトは既存の低ビット乗算フリーネットワークよりも優れており,全精度ネットワークに比べて競争性能が向上することが示された。さらに,提案手法は,精度を低下させることなく強い転送学習性能を示す。私たちのコードはGitHubでリリースされました。

Efficiently deploying deep neural networks on low-resource edge devices is challenging due to their ever-increasing resource requirements. To address this issue, researchers have proposed multiplication-free neural networks, such as Power-of-Two quantization, also known as Shift networks, which aim to reduce memory usage and simplify computation. However, existing low-bit Shift networks are not as accurate as their full-precision counterparts, typically suffering from limited weight range encoding schemes and quantization loss. In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks, achieving competitive performance to full-precision networks for vision and speech applications. In addition, we introduce a method to deploy an efficient DenseShift network using non-quantized floating-point activations, while obtaining 1.6X speed-up over existing methods. To achieve this, we demonstrate that zero-weight values in low-bit Shift networks do not contribute to model capacity and negatively impact inference computation. To address this issue, we propose a zero-free shifting mechanism that simplifies inference and increases model capacity. We further propose a sign-scale decomposition design to enhance training efficiency and a low-variance random initialization strategy to improve the model's transfer learning performance. Our extensive experiments on various computer vision and speech tasks demonstrate that DenseShift outperforms existing low-bit multiplication-free networks and achieves competitive performance compared to full-precision networks. Furthermore, our proposed approach exhibits strong transfer learning performance without a drop in accuracy. Our code was released on GitHub.
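
To make the idea of power-of-two (shift) weights concrete, here is a minimal sketch of generic power-of-two quantization with a zero-free code book. It is not the DenseShift training procedure or its sign-scale decomposition; the function names, the exponent range, and the choice to map an exact zero to the smallest magnitude are all assumptions made here for illustration.

```python
import math


def quantize_power_of_two(weight: float, min_exp: int = -3, max_exp: int = 0):
    """Approximate `weight` as sign * 2**exponent and return (sign, exponent).

    The code book is zero-free: every weight becomes +/- a power of two.
    An exact zero is mapped to the smallest magnitude (an assumption of
    this sketch). Rounding is done in the log2 domain for simplicity.
    """
    sign = -1 if weight < 0 else 1
    magnitude = abs(weight)
    if magnitude == 0.0:
        exponent = min_exp
    else:
        exponent = round(math.log2(magnitude))
        exponent = max(min_exp, min(max_exp, exponent))
    return sign, exponent


def shift_multiply(activation: int, sign: int, exponent: int) -> int:
    """Multiply an integer activation by sign * 2**exponent using shifts only.

    A negative exponent becomes a right shift, which floors the result.
    """
    if exponent >= 0:
        shifted = activation << exponent
    else:
        shifted = activation >> -exponent
    return sign * shifted


if __name__ == "__main__":
    sign, exponent = quantize_power_of_two(0.3)  # 0.3 is approximated by 2**-2
    print(sign, exponent, shift_multiply(40, sign, exponent))
```

Because each weight is stored as a sign and a small exponent, the multiply in a dot product reduces to a bit shift and an optional negation, which is the property the abstract relies on for efficient deployment.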

翻訳日:2023-10-26 03:39:27 公開日:2023-10-24

# 非エルミタン系における量子力学の欠陥解凍

Quantum Metric Unveils Defect Freezing in Non-Hermitian Systems ( http://arxiv.org/abs/2301.02247v3 )

ライセンス: Link先を確認

Karin Sim, Nicol\`o Defenu, Paolo Molignini, R. Chitra

(参考訳) 量子ハミルトニアンにおける非ハーモニティ性は、非単位時間進化とおそらく複雑なエネルギー固有値をもたらし、エルミート的でない豊富な現象論をもたらす。本研究では, 完全可解な非エルミート系のダイナミクスを研究し, 線形クエンチを受ける$\mathcal{pt}$-symmetric モードと$\mathcal{pt}$-brokenモードの両方をホストする。ヒルベルト空間に非自明な動的計量が与えられる完全に一貫したフレームワークを用いることで、生成された欠陥のダイナミクスを分析する。エルミート系とは対照的に, PT崩壊時間進化は欠陥凍結を引き起こすため, 断熱性に反することが明らかとなった。この物理学は、状態の時間依存ノルムによる量正規化の法則によって見逃されるため、いわゆるメートル法フレームワークを必要とする。我々の結果は幅広い実験システムに関係している。

Non-Hermiticity in quantum Hamiltonians leads to nonunitary time evolution and possibly complex energy eigenvalues, which can lead to a rich phenomenology with no Hermitian counterpart. In this work, we study the dynamics of an exactly solvable non-Hermitian system, hosting both $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken modes subject to a linear quench. Employing a fully consistent framework, in which the Hilbert space is endowed with a nontrivial dynamical metric, we analyze the dynamics of the generated defects. In contrast to Hermitian systems, our study reveals that $\mathcal{PT}$-broken time evolution leads to defect freezing and hence the violation of adiabaticity. This physics necessitates the so-called metric framework, as it is missed by the oft-used approach of normalizing quantities by the time-dependent norm of the state. Our results are relevant for a wide class of experimental systems.

翻訳日:2023-10-26 03:30:46 公開日:2023-10-24

# 境界時間結晶を用いた量子力学

Quantum metrology with boundary time crystals ( http://arxiv.org/abs/2301.02103v2 )

ライセンス: Link先を確認

V. Montenegro, M. G. Genoni, A. Bayat, M. G. A. Paris

(参考訳) 量子センシングは、古典的な技術よりも量子技術の優位性を実証するアリーナの1つである。しかし、そのような優位性は、回避不可能なノイズとプローブのデコヒーレンスにより減少することができる。したがって、デコヒーレンスと戦うか利益を得るための気象学的戦略は非常に望ましい。これは、散逸相転移をサポートするある種の脱コヒーレンス駆動多体系であり、センシングに役立つかもしれない。境界時結晶(バウンダリ時結晶)は、時間-翻訳対称性が破られ、熱力学の極限で開量子系に長寿命の振動が出現する物質のエキゾチックな散逸相である。対称から境界時間結晶相への遷移は2次遷移によって説明され、量子フィッシャー情報によって定量化された量子エンハンス感度を示す。また,システムの臨界指数を決定し,それらの関係性を確立する。我々の手法は、量子エンハンス感度を達成するためにデコヒーレンスを活用することの実証である。実用の観点からは、初期化とは無関係であることの利点があり、単純な測定で捉えることができる。

Quantum sensing is one of the arenas that exemplifies the superiority of quantum technologies over their classical counterparts. Such superiority, however, can be diminished due to unavoidable noise and decoherence of the probe. Thus, metrological strategies to fight against or profit from decoherence are highly desirable. This is the case of certain types of decoherence-driven many-body systems supporting dissipative phase transitions, which might be helpful for sensing. Boundary time crystals are exotic dissipative phases of matter in which the time-translational symmetry is broken, and long-lasting oscillations emerge in open quantum systems at the thermodynamic limit. We show that the transition from a symmetry unbroken into a boundary time crystal phase, described by a second-order transition, reveals quantum-enhanced sensitivity quantified through quantum Fisher information. We also determine the critical exponents of the system and establish their relationship. Our scheme is indeed a demonstration of harnessing decoherence for achieving quantum-enhanced sensitivity. From a practical perspective, it has the advantage of being independent of initialization and can be captured by a simple measurement.

翻訳日:2023-10-26 03:30:28 公開日:2023-10-24

# T-Projection:シーケンスラベリングタスクのための高品質アノテーションプロジェクション

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks ( http://arxiv.org/abs/2212.10548v2 )

ライセンス: Link先を確認

Iker Garc\'ia-Ferrero, Rodrigo Agerri, German Rigau

(参考訳) 与えられたシーケンスラベリングタスクと言語に対するラベル付きデータがないため、アノテーションプロジェクションは注釈付きデータを自動的に生成する戦略のひとつとして提案されている。アノテーションプロジェクションはしばしば、並列コーパス上で、ソース言語の与えられたスパンに関連するラベルをターゲット言語の対応するスパンに転送するタスクとして定式化されている。本稿では,大規模な事前学習されたテキスト・テキスト言語モデルと最先端機械翻訳技術を活用したアノテーション投影手法T-Projectionを提案する。 T-プロジェクションはラベルプロジェクションタスクを2つのサブタスクに分解する。 (i)多言語t5モデルを用いた投影候補の集合を生成した候補生成ステップ (ii)翻訳確率に基づいて生成候補をランク付けする候補選択ステップ。 5つのインド・ヨーロッパ語と8つの低資源アフリカの言語において内在的および外在的タスクについて実験を行った。我々は、T射影が従来のアノテーション投影法よりも広いマージンで優れていると評価した。我々は、T-Projectionがシーケンスラベリングタスクにおける高品質なトレーニングデータの欠如を自動的に緩和するのに役立つと考えている。コードとデータは公開されている。

In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data. Annotation projection has often been formulated as the task of transporting, on parallel corpora, the labels pertaining to a given span in the source language into its corresponding span in the target language. In this paper we present T-Projection, a novel approach for annotation projection that leverages large pretrained text-to-text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) a candidate generation step, in which a set of projection candidates is generated using a multilingual T5 model, and (ii) a candidate selection step, in which the generated candidates are ranked based on translation probabilities. We conducted experiments on intrinsic and extrinsic tasks in 5 Indo-European and 8 low-resource African languages. We demonstrate that T-Projection outperforms previous annotation projection methods by a wide margin. We believe that T-Projection can help to automatically alleviate the lack of high-quality training data for sequence labeling tasks. Code and data are publicly available.

翻訳日:2023-10-26 03:30:11 公開日:2023-10-24

# 近位因果学習のための最適治療基準

Optimal Treatment Regimes for Proximal Causal Learning ( http://arxiv.org/abs/2212.09494v3 )

ライセンス: Link先を確認

Tao Shen, Yifan Cui

(参考訳) 政策立案者が因果推論を引き合いに出し、観測データに基づいて決定を下す場合の一般的な懸念は、測定された共変量体が、すべての共変量体、すなわち標準的無根性の仮定が成り立たないことである。最近提案された近親因果推論フレームワークは、実生活シナリオに付随するプロキシ変数を利用して因果効果を特定し、意思決定を容易にする。そこで本研究では, 橋梁の既往と治療を基盤とした, 最適な個別化治療手法を提案する。以上の結果から,この新しい最適治療体制の価値関数は文献上既存のものよりも優れていることが示された。識別、優越性、過剰な価値境界、推定された体制の整合性を含む理論的保証が確立される。さらに,提案手法を数値実験により実証し,実データに適用する。

A common concern when a policymaker draws causal inferences from and makes decisions based on observational data is that the measured covariates are insufficiently rich to account for all sources of confounding, i.e., the standard no confoundedness assumption fails to hold. The recently proposed proximal causal inference framework shows that proxy variables that abound in real-life scenarios can be leveraged to identify causal effects and therefore facilitate decision-making. Building upon this line of work, we propose a novel optimal individualized treatment regime based on so-called outcome and treatment confounding bridges. We then show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature. Theoretical guarantees, including identification, superiority, excess value bound, and consistency of the estimated regime, are established. Furthermore, we demonstrate the proposed optimal regime via numerical experiments and a real data application.

翻訳日:2023-10-26 03:29:29 公開日:2023-10-24

# PhoMoH:人間の頭部のフォトリアリスティックな3Dモデル

PhoMoH: Implicit Photorealistic 3D Models of Human Heads ( http://arxiv.org/abs/2212.07275v3 )

ライセンス: Link先を確認

Mihai Zanfir, Thiemo Alldieck and Cristian Sminchisescu

(参考訳) 本稿では,フォトリアリスティックな3次元形状の生成モデルを構築し,頭髪,あごひげ,口腔,衣服などの人間の頭部の外観をモデル化するニューラルネット手法であるフォモ(英語版)を提案する。以前の研究とは対照的に、PhoMoHは神経場を用いて人間の頭部をモデル化し、複雑なトポロジーをサポートする。ヘッドモデルをゼロから学習する代わりに,既存の表現型ヘッドモデルに新機能を加えることを提案する。具体的には,中解像度の頭部モデル上に高精細なジオメトリネットワークを,細部,局所的なジオメトリ認識,不連続色場とともに学習する。提案するアーキテクチャにより,比較的少ないデータからフォトリアリスティックな頭部モデルを学ぶことができる。学習された生成幾何学と出現ネットワークは個別にサンプリングすることができ、多様で現実的な人間の頭を作ることができる。大規模な実験は、我々のメソッドを定性的かつ異なるメトリクスで検証する。

We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photo-realistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and enable the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.

翻訳日:2023-10-26 03:29:12 公開日:2023-10-24

# チャネル識別のための利益のある絡み合い

Profitable entanglement for channel discrimination ( http://arxiv.org/abs/2211.15108v2 )

ライセンス: Link先を確認

Samad Khabbazi Oskouei, Stefano Mancini, Milajiguli Rexiti

(参考訳) 本稿では,2つの一般量子ビットチャネル(単位前処理と後処理)を識別する際の側絡み合いの有用性について検討し,それが成功確率を増大させる(及び、そうでない)正確な条件を決定する。これは、まず、完全正およびトレース保存されたキュービット線型写像の集合において極端であるチャネルの問題を解析し、次にそのような集合の内部にあるチャネルについて構成的に行われる。

We investigate the usefulness of side entanglement in discriminating between two generic qubit channels, up to unitary pre- and post-processing, and determine exact conditions under which it does enhance (as well as conditions under which it does not) the success probability. This is done in a constructive way by first analyzing the problem for channels that are extremal in the set of completely positive and trace-preserving qubit linear maps and then for channels that are inside such a set.

翻訳日:2023-10-26 03:28:42 公開日:2023-10-24

# 物理ベースオブジェクト6d-pose推定法

Physics-Based Object 6D-Pose Estimation during Non-Prehensile Manipulation ( http://arxiv.org/abs/2211.13572v3 )

ライセンス: Link先を確認

Zisong Xu, Rafael Papallas, Mehmet Dogar

(参考訳) 本研究では,物体の6次元姿勢を時間とともに追跡する手法を提案する。オブジェクトの操作中にいつでも、ロボットのジョイントコントロールとカメラからのイメージへのアクセスを前提とします。ロボットのジョイントコントロールを使って、物体の動きを物理ベースの予測します。そして、この予測とカメラからの観測を組み合わせることで、物体のポーズを可能な限り正確に推定する。本研究では,制御情報と視覚情報を組み合わせた粒子フィルタリング手法を提案する。提案手法を2つのベースラインと比較する。 (i)各時間ステップでのイメージベースポーズ推定システムのみの使用、及び (II)計算に高価な物理予測を行わない粒子フィルタであって,物体が一定の速度で動くことを仮定する。その結果、物理ベースの予測を行うことで計算コストが上がり、より正確な追跡が可能となり、カメラに見えない物体でも物体のポーズを推定できることがわかった。

We propose a method to track the 6D pose of an object over time, while the object is under non-prehensile manipulation by a robot. At any given time during the manipulation of the object, we assume access to the robot joint controls and an image from a camera. We use the robot joint controls to perform a physics-based prediction of how the object might be moving. We then combine this prediction with the observation coming from the camera, to estimate the object pose as accurately as possible. We use a particle filtering approach to combine the control information with the visual information. We compare the proposed method with two baselines: (i) using only an image-based pose estimation system at each time-step, and (ii) a particle filter which does not perform the computationally expensive physics predictions, but assumes the object moves with constant velocity. Our results show that making physics-based predictions is worth the computational cost, resulting in more accurate tracking, and estimating object pose even when the object is not clearly visible to the camera.
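
The predict, weight, and resample loop that the abstract describes can be sketched in a few lines. The example below is a one-dimensional bootstrap particle filter in which a fixed displacement per control step stands in for the paper's physics-based prediction, and a noisy scalar reading stands in for the camera-based pose estimate. All names, noise levels, and the toy motion model are assumptions made here for illustration, not the authors' implementation.

```python
import math
import random


def particle_filter_step(particles, control, observation, rng,
                         process_std=0.2, observation_std=0.5):
    """Run one predict/update/resample cycle and return (particles, estimate)."""
    # Predict: push every particle through the motion model plus noise.
    # (The paper uses a physics simulation here; this is a toy stand-in.)
    predicted = [p + control + rng.gauss(0.0, process_std) for p in particles]

    # Update: weight each particle by a Gaussian observation likelihood.
    weights = [math.exp(-0.5 * ((observation - p) / observation_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(num_steps=10, num_particles=500, seed=0):
    """Simulate an object pushed 1 unit per step and track it."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(num_particles)]
    true_position = 0.0
    estimate = 0.0
    for _ in range(num_steps):
        true_position += 1.0
        # Simulated sensor noise is smaller than the filter's assumed
        # observation_std, i.e. the filter is deliberately conservative.
        observation = true_position + rng.gauss(0.0, 0.1)
        particles, estimate = particle_filter_step(
            particles, 1.0, observation, rng)
    return true_position, estimate


if __name__ == "__main__":
    print(track())
```

Replacing the motion model with a constant-velocity assumption would correspond to the second baseline in the abstract, while dropping the predict step entirely and trusting each observation corresponds to the first.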

翻訳日:2023-10-26 03:28:31 公開日:2023-10-24

# 低複素性を考慮した適応フェデレーションミニマックス最適化

Adaptive Federated Minimax Optimization with Lower complexities ( http://arxiv.org/abs/2211.07303v3 )

ライセンス: Link先を確認

Feihu Huang

(参考訳) フェデレーション学習(Federated Learning)は、分散およびプライバシ保護の機械学習パラダイムとして人気がある。一方、機械学習において、効率的な階層最適化としてミニマックス最適化が広く適用されている。近年,分散ミニマックス問題の解法としてフェデレーション最適化法が提案されている。しかし、これらのフェデレーションされたミニマックス法は依然として高い勾配と通信の複雑さに苦しむ。一方,適応学習速度を用いてアルゴリズムを高速化するアルゴリズムは少ない。このギャップを埋めるため,本論文では,非凸ミニマックス最適化のクラスについて検討し,分散ミニマックス問題を解くための効率的な適応フェデレーションミニマックス最適化アルゴリズム(adafgda)を提案する。特に,adafgdaは運動量に基づく分散低減法と局所sgd法を基盤とし,統一適応行列を用いて様々な適応学習率を柔軟に組み込むことができる。理論的には、AdaFGDAアルゴリズムに対して、非i.d.条件下でのソリッド収束解析フレームワークを提供する。さらに、我々のアルゴリズムは、非凸ミニマックス問題の$\epsilon$-stationary pointを求める際に、$\tilde{o}(\epsilon^{-3})$と$\tilde{o}(\epsilon^{-2})$の通信複雑性がより低い勾配(すなわち確率的一階オラクル、sfo)の複雑さを得ることを証明します。実験では,アルゴリズムの効率性を検証するために,深層auc最大化とロバストニューラルネットワークトレーニングタスクについて実験を行う。

Federated learning is a popular distributed and privacy-preserving machine learning paradigm. Meanwhile, minimax optimization, as an effective hierarchical optimization, is widely applied in machine learning. Recently, some federated optimization methods have been proposed to solve distributed minimax problems. However, these federated minimax methods still suffer from high gradient and communication complexities, and few algorithms use adaptive learning rates for acceleration. To fill this gap, in this paper we study a class of nonconvex minimax optimization problems and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems. Specifically, our AdaFGDA builds on momentum-based variance-reduced and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using a unified adaptive matrix. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under the non-i.i.d. setting. Moreover, we prove that our algorithms obtain a lower gradient (i.e., stochastic first-order oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point of the nonconvex minimax problems. Experimentally, we conduct experiments on deep AUC maximization and robust neural network training tasks to verify the efficiency of our algorithms.

翻訳日:2023-10-26 03:28:15 公開日:2023-10-24

# 超不均衡太陽電池モジュール画像における欠陥分割のための高調波出力不均衡

Harmonizing output imbalance for defect segmentation on extremely-imbalanced photovoltaic module cells images ( http://arxiv.org/abs/2211.05295v4 )

ライセンス: Link先を確認

Jianye Yi, Xiaopin Zhong, Weixiang Liu, Zongze Wu, Yuanlong Deng and Zhengguang Wu

(参考訳) 太陽光発電(PV)産業の継続的な発展は、PVモジュール細胞の単結晶の品質に対する高い要求を高めている。 PVモジュールセルイメージの欠陥領域の分割を学ぶとき、Tiny Hidden Cracks (THC) は極めて不均衡なサンプルを生み出す。欠陥画素と通常の画素の比率は1:2000程度である。この極端不均衡により、PVモジュール細胞のTHCのセグメンテーションが難しくなり、セグメンテーションの課題でもある。 To address the problem of segmenting defects on extremely-imbalanced THC data, the paper makes contributions from three aspects: (1) it proposes an explicit measure for output imbalance; (2) it generalizes a distribution-based loss that can handle different types of output imbalances; and (3) it introduces a compound loss with our adaptive hyperparameter selection algorithm that can keep the consistency of training and inference for harmonizing the output imbalance on extremelyimbalanced input data. 提案手法は,広く使用されている4つのディープラーニングアーキテクチャと,入力の不均衡度が異なる4つのデータセットを用いて評価する。実験の結果,提案手法は既存手法よりも優れていた。

The continuous development of the photovoltaic (PV) industry has raised the quality requirements for monocrystalline PV module cells. When learning to segment defect regions in PV module cell images, Tiny Hidden Cracks (THC) lead to extremely-imbalanced samples. The ratio of defect pixels to normal pixels can be as low as 1:2000. This extreme imbalance makes it difficult to segment the THC of PV module cells, which is also a challenge for semantic segmentation. To address the problem of segmenting defects on extremely-imbalanced THC data, the paper makes contributions from three aspects: (1) it proposes an explicit measure for output imbalance; (2) it generalizes a distribution-based loss that can handle different types of output imbalances; and (3) it introduces a compound loss with an adaptive hyperparameter selection algorithm that keeps training and inference consistent when harmonizing the output imbalance on extremely-imbalanced input data. The proposed method is evaluated on four widely-used deep learning architectures and four datasets with varying degrees of input imbalance. The experimental results show that the proposed method outperforms existing methods.
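A short calculation shows why a 1:2000 defect-to-normal pixel ratio is difficult. The sketch below is only an illustration of the imbalance, not the output-imbalance measure or loss proposed in the paper. It compares pixel accuracy with the Dice score for a degenerate model that predicts "normal" for every pixel. The function names are assumptions.

```python
def pixel_accuracy(tp, fp, fn, tn):
    """Fraction of pixels classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def dice_score(tp, fp, fn):
    """Dice coefficient for the defect class; defined as 1.0 when nothing is present or predicted."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# One defect pixel for every 2000 normal pixels, and a model that never predicts "defect":
# every defect pixel becomes a false negative, every normal pixel a true negative.
tp, fp, fn, tn = 0, 0, 1, 2000
all_normal_accuracy = pixel_accuracy(tp, fp, fn, tn)
all_normal_dice = dice_score(tp, fp, fn)
```

Here `all_normal_accuracy` is 2000/2001, above 99.9%, while `all_normal_dice` is 0. A pixel-wise objective can therefore look nearly perfect while missing every crack.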

翻訳日:2023-10-26 03:27:20 公開日:2023-10-24

# 重要再サンプリングによる言語モデルのデータ選択

Data Selection for Language Models via Importance Resampling ( http://arxiv.org/abs/2302.03169v2 )

ライセンス: Link先を確認

Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

(参考訳) 適切な事前学習データセットの選択は、一般ドメイン(gpt-3など)とドメイン固有言語モデル(例えば、コードx)の両方において不可欠である。この問題を、ラベルなしのターゲットサンプルが与えられた場合に、所望のターゲット分布にマッチするように、大きな生のラベルなしデータセットのサブセットを選択することで定式化する。テキストデータの大規模化と次元化のため、既存の手法では単純なヒューリスティックスや専門家を使ってデータを手作業でキュレートする。代わりに、lmデータ選択に低次元で使用される古典的な重要度再サンプリングアプローチを拡張します。本研究では,トラクタビリティの低減した特徴空間における重み付けを推定し,重み付けによる重み付けを伴うデータを選択する,効率的でスケーラブルなフレームワークであるData Selection with Importance Resampling(DSIR)を提案する。適切な特徴空間を決定するために、選択した事前学習データと特徴空間のターゲットとの近接度を測定するデータ計量であるKL削減が、単純なn-gram特徴量で計算した場合の平均下流精度(r=0.89)と高い相関を持つことを示す。これは、n-gram特徴を用いたDSIRのインスタンス化を動機付けます。特定のドメインに対して事前トレーニングを継続する場合、DSIRは8つのターゲットディストリビューションにわたる専門家のキュレーションと互換性がある。汎用ドメインモデル(ターゲットはウィキペディア+書籍)を事前トレーニングする場合、DSIRはGLUEベンチマークでランダム選択とヒューリスティックフィルタリングベースラインを2-2.5%改善する。

Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting a subset of a large raw unlabeled dataset to match a desired target distribution given some unlabeled target samples. Due to the large scale and dimensionality of the raw text data, existing methods use simple heuristics or use experts to manually curate data. Instead, we extend the classic importance resampling approach used in low-dimensions for LM data selection. We propose Data Selection with Importance Resampling (DSIR), an efficient and scalable framework that estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights. To determine an appropriate feature space, we show that KL reduction, a data metric that measures the proximity between selected pretraining data and the target in a feature space, has high correlation with average downstream accuracy (r=0.89) when computed with simple n-gram features. This motivates our instantiation of DSIR using n-gram features. When performing continued pretraining towards a specific domain, DSIR performs comparably to expert curation across 8 target distributions. When pretraining general-domain models (target is Wikipedia + books), DSIR improves over random selection and heuristic filtering baselines by 2-2.5% on the GLUE benchmark.
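The selection procedure in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the DSIR reference implementation. Features are unigrams and bigrams hashed into `NUM_BUCKETS` buckets with `zlib.crc32`. The target and raw distributions are smoothed bucket frequencies. Resampling without replacement uses the Gumbel top-k trick. The bucket count, smoothing constant, and all function names are hypothetical choices.

```python
import math
import random
import zlib
from collections import Counter

NUM_BUCKETS = 4096  # hypothetical size of the hashed n-gram feature space


def hashed_ngrams(text, n_max=2):
    """Count hashed unigram and bigram features of a text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[zlib.crc32(gram.encode("utf-8")) % NUM_BUCKETS] += 1
    return counts


def fit_bucket_distribution(texts, smoothing=1.0):
    """Fit a smoothed categorical distribution over feature buckets."""
    totals = Counter()
    for text in texts:
        totals.update(hashed_ngrams(text))
    denom = sum(totals.values()) + smoothing * NUM_BUCKETS
    return [(totals[b] + smoothing) / denom for b in range(NUM_BUCKETS)]


def log_importance_weight(text, p_target, p_raw):
    """log p_target(features) - log p_raw(features) under the bag-of-n-grams models."""
    return sum(count * (math.log(p_target[b]) - math.log(p_raw[b]))
               for b, count in hashed_ngrams(text).items())


def select(raw_texts, target_texts, k, seed=0):
    """Sample k raw texts without replacement, in proportion to importance weight."""
    rng = random.Random(seed)
    p_target = fit_bucket_distribution(target_texts)
    p_raw = fit_bucket_distribution(raw_texts)
    keyed = []
    for index, text in enumerate(raw_texts):
        u = max(rng.random(), 1e-12)          # avoid log(0)
        gumbel = -math.log(-math.log(u))      # Gumbel(0, 1) noise
        score = log_importance_weight(text, p_target, p_raw) + gumbel
        keyed.append((score, index))
    keyed.sort(reverse=True)
    return [raw_texts[index] for _, index in keyed[:k]]
```

Working in a small hashed feature space keeps both distributions cheap to estimate, which is the tractability argument the abstract makes for computing weights in a reduced feature space.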

翻訳日:2023-10-26 03:22:12 公開日:2023-10-24

# NA-SODINN:残音条件に基づく外惑星画像検出のためのディープラーニングアルゴリズム

NA-SODINN: a deep learning algorithm for exoplanet image detection based on residual noise regimes ( http://arxiv.org/abs/2302.02854v2 )

ライセンス: Link先を確認

Carles Cantero, Olivier Absil, Carl-Henrik Dahlqvist and Marc Van Droogenbroeck

(参考訳) SODINNアルゴリズムは、角微分画像(ADI)データセットにおける外惑星検出のために設計された畳み込みニューラルネットワークである。 EIDC (Exoplanet Imaging Data Challenge) におけるHCIアルゴリズムのベンチマークの結果が得られた。 i) SODINNは、最終検出マップにおいて、多数の偽陽性を生成でき、 (ii)より局所的に画像を処理するアルゴリズムは、より優れた性能を発揮する。本研究は,新しい局所処理手法を導入し,それに従って学習プロセスを適用することで, sodinn検出性能を向上させることを目的とする。本稿では,畳み込みニューラルネットワーク(CNN)に基づく新しいディープラーニングバイナリ分類器NA-SODINNを提案する。我々の新しいアプローチは、VLT/SPHEREとKeck/NIRC-2のADI配列の局所受信動作特性(ROC)解析を通じて、2つのSODINNベースハイブリッドモデルとより標準の環状PCAアプローチに対して試験された。その結果、NA-SODINNは感度と特異性の両方でSODINNを強化し、特にスペックルが支配するノイズレシエーションにおいて顕著であることがわかった。また, NA-SODINNは, EIDCにおける提案された検出アルゴリズムの完全セットに対してベンチマークを行い, 最終的な検出スコアが最強検出アルゴリズムと一致しているか, あるいは上回っていることを示すとともに, 教師付き機械学習のケースにおいて, 処理された画像の局所的内容に検出タスクを適用することの重要性を図示し, 強化する。

Supervised deep learning was recently introduced in high-contrast imaging (HCI) through the SODINN algorithm, a convolutional neural network designed for exoplanet detection in angular differential imaging (ADI) datasets. The benchmarking of HCI algorithms within the Exoplanet Imaging Data Challenge (EIDC) showed that (i) SODINN can produce a high number of false positives in the final detection maps, and (ii) algorithms processing images in a more local manner perform better. This work aims to improve the SODINN detection performance by introducing new local processing approaches and adapting its learning process accordingly. We propose NA-SODINN, a new deep learning binary classifier based on a convolutional neural network (CNN) that better captures image noise correlations in ADI-processed frames by identifying noise regimes. Our new approach was tested against its predecessor, as well as two SODINN-based hybrid models and a more standard annular-PCA approach, through local receiver operating characteristic (ROC) analysis of ADI sequences from the VLT/SPHERE and Keck/NIRC-2 instruments. Results show that NA-SODINN enhances SODINN in both sensitivity and specificity, especially in the speckle-dominated noise regime. NA-SODINN is also benchmarked against the complete set of submitted detection algorithms in EIDC, in which we show that its final detection score matches or outperforms the most powerful detection algorithms. Through the supervised machine learning case, this study illustrates and reinforces the importance of adapting the task of detection to the local content of processed images.

翻訳日:2023-10-26 03:21:45 公開日:2023-10-24

# 半スーパービジョンの医用画像分割再考 : ばらつき低減の視点から

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective ( http://arxiv.org/abs/2302.01735v5 )

ライセンス: Link先を確認

Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S Kevin Zhou, Lawrence Hamilton Staib, James S Duncan

(参考訳) 医用画像のセグメンテーションにおいて, 比較学習は, 意味論的に類似した, 異種のサンプルを対比することにより, 視覚表現の質を向上させるための主流の実践である。これは、真に異なる解剖学的特徴を持つ負の例が、もしサンプルを採取すれば、性能が著しく向上する、という観察によって可能となった。しかし実際には、これらのサンプルは類似した解剖学的領域から来ており、モデルは少数派のテールクラスのサンプルを区別するのに苦労し、テールクラスは誤分類されやすくなり、両者ともモデル崩壊に繋がる。本稿では,医療画像分割のための階層化群理論を用いた半教師付きコントラスト学習(cl)フレームワークarcoを提案する。特に, 分散還元推定の概念を通したarcoの構築を最初に提案し, 限定ラベルを持つ画素/ボクセルレベル分割タスクにおいて, ある種の分散還元手法が特に有益であることを示す。さらに,これらのサンプリング手法が分散還元において普遍的であることを理論的に証明する。最後に,5つの2D/3D医療データセットと3つのセマンティックセマンティックセグメンテーションデータセットとラベル設定の異なる8つのベンチマークに対して,我々の手法を実験的に検証した。さらに、clフレームワークをこれらのサンプリング技術で強化し、以前の方法を大きく上回る結果を示す。我々は,これらの課題を克服するために,現在の自己超越目標の限界を定量化し,半監督的医用画像セグメンテーションに向けた重要なステップであると考えている。

For medical image segmentation, contrastive learning is the dominant practice to improve the quality of visual representations by contrasting semantically similar and dissimilar pairs of samples. This is enabled by the observation that without accessing ground truth labels, negative examples with truly dissimilar anatomical features, if sampled, can significantly improve the performance. In reality, however, these samples may come from similar anatomical regions and the models may struggle to distinguish the minority tail-class samples, making the tail classes more prone to misclassification, both of which typically lead to model collapse. In this paper, we propose ARCO, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation. In particular, we first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks with extremely limited labels. Furthermore, we theoretically prove these sampling techniques are universal in variance reduction. Finally, we experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings, and our methods consistently outperform state-of-the-art semi-supervised methods. Additionally, we augment the CL frameworks with these sampling techniques and demonstrate significant gains over previous methods. We believe our work is an important step towards semi-supervised medical image segmentation by quantifying the limitation of current self-supervision objectives for accomplishing such challenging safety-critical tasks.

翻訳日:2023-10-26 03:21:17 公開日:2023-10-24

# コンテキストプルーニングメタラーニングによる大規模ニューラルネットワークの学習

Learning Large-scale Neural Fields via Context Pruned Meta-Learning ( http://arxiv.org/abs/2302.00617v3 )

ライセンス: Link先を確認

Jihoon Tack, Subin Kim, Sihyun Yu, Jaeho Lee, Jinwoo Shin, Jonathan Richard Schwarz

(参考訳) 本稿では,オンラインコンテキストポイントの自動選択による大幅なメモリ節約を実現することで,大規模ニューラルネットワークトレーニングのための効率的な最適化に基づくメタ学習手法を提案する。これは、各学習ステップをデータサブセットに集中させ、モデル品質の即時改善を期待し、その結果、大域構造のほぼ瞬時にモデリングし、高周波の詳細を洗練させることによって達成される。さらに,最適化に基づくメタ学習のマイオピアを緩和しつつ,文脈セットの縮小によって生じる誤りを最小化するブートストラップ補正を導入することで,メタ学習初期化の質をさらに向上させる。最後に,メタテスト時間における勾配再スケーリングが,最適化手順を大幅に短縮する上で,極めて高品質なニューラルネットワークの学習を可能にすることを示す。私たちのフレームワークはモデルに依存しず、直感的で、実装が簡単で、幅広い信号に対する大幅な再構成改善を示しています。本稿では,複数のモダリティにまたがる9つのデータセットの広範な実験評価を行い,その手法を構成するアルゴリズム成分を注意深く分析することで,最先端の結果を示す。コードはhttps://github.com/jihoontack/GradNCPで入手できる。

We introduce an efficient optimization-based meta-learning technique for large-scale neural field training by realizing significant memory savings through automated online context point selection. This is achieved by focusing each learning step on the subset of data with the highest expected immediate improvement in model quality, resulting in the almost instantaneous modeling of global structure and subsequent refinement of high-frequency details. We further improve the quality of our meta-learned initialization by introducing a bootstrap correction resulting in the minimization of any error introduced by reduced context sets while simultaneously mitigating the well-known myopia of optimization-based meta-learning. Finally, we show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields in significantly shortened optimization procedures. Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals. We provide an extensive empirical evaluation on nine datasets across multiple modalities, demonstrating state-of-the-art results while providing additional insight through careful analysis of the algorithmic components constituting our method. Code is available at https://github.com/jihoontack/GradNCP

翻訳日:2023-10-26 03:20:48 公開日:2023-10-24

# グラフニューラルネットワークのゼロワン法則

Zero-One Laws of Graph Neural Networks ( http://arxiv.org/abs/2301.13060v5 )

ライセンス: Link先を確認

Sam Adam-Day, Theodor Mihai Iliant, İsmail İlkan Ceylan

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ上の機械学習のためのデファクト標準ディープラーニングアーキテクチャである。これにより、これらのモデルの能力と限界、特にそれらの表現と外挿能力に関する多くの作業が分析された。グラフノードの数が非常に大きくなるにつれて、GNNはどのように振る舞うのか? 穏やかな仮定の下では、Erdős-Rényi モデルから増大するグラフを描くと、そのようなグラフがGNN分類器のクラスによって特定の出力にマップされる確率は 0 または 1 の傾向を示す。このクラスは一般的なグラフ畳み込みネットワークアーキテクチャを含んでいる。その結果、これらのGNNに対して「ゼロワン法則」を確立し、他の収束法則と類似して、その能力に関する理論的制限を課す。理論的な漸近限界は、比較的小さなグラフ上で既に明らかなものであることを観察し、実験的に検証した。

Graph neural networks (GNNs) are the de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs, by answering the question: how do GNNs behave as the number of graph nodes becomes very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erdős-Rényi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs, and analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are evident already on relatively small graphs.

翻訳日:2023-10-26 03:20:28 公開日:2023-10-24

# グラフニューラルネットワークは、グラフ構造のみから隠れた特徴を回復できる

Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure ( http://arxiv.org/abs/2301.10956v3 )

ライセンス: Link先を確認

Ryoma Sato

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ学習問題の一般的なモデルである。 gnnは多くの実用的なタスクで強い経験的パフォーマンスを示します。しかし、理論的な性質は完全に解明されていない。本稿では,GNNの表現力の観点から,GNNがグラフ構造を活用できるかどうかを検討する。本分析では,グラフ構造に関するすべての情報を含む隠れノード特徴(あるいは潜在ノード特徴)によって制御されるグラフ生成プロセスについて考察する。このフレームワークの典型的な例は、隠れた特徴から構築されたkNNグラフである。本研究の主目的は,隠れた特徴自身や間接的なヒントを含むすべてのノード特徴が利用できない場合でも,GNNが入力グラフのみから隠れたノード特徴を復元できることである。 gnnは、ダウンストリームタスクで回復したノード機能をさらに使用できる。これらの結果から、GNNはグラフ構造を自分自身で完全に活用でき、事実上、GNNは下流タスクに隠されたノード機能と明示的なノード機能の両方を利用することができる。実験では,理論解析に基づいて構築されたGNNアーキテクチャを用いて,GNNが隠れた特徴を正確に復元できることを示し,その妥当性を確認した。

Graph Neural Networks (GNNs) are popular models for graph learning problems. GNNs show strong empirical performance in many practical tasks. However, the theoretical properties have not been completely elucidated. In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs. In our analysis, we consider graph generation processes that are controlled by hidden (or latent) node features, which contain all information about the graph structure. A typical example of this framework is kNN graphs constructed from the hidden features. In our main results, we show that GNNs can recover the hidden node features from the input graph alone, even when all node features, including the hidden features themselves and any indirect hints, are unavailable. GNNs can further use the recovered node features for downstream tasks. These results show that GNNs can fully exploit the graph structure by themselves, and in effect, GNNs can use both the hidden and explicit node features for downstream tasks. In the experiments, we confirm the validity of our results by showing that GNNs can accurately recover the hidden features using a GNN architecture built based on our theoretical analysis.

翻訳日:2023-10-26 03:19:26 公開日:2023-10-24

# Batch Prompting: 大規模言語モデルAPIによる効率的な推論

Batch Prompting: Efficient Inference with Large Language Model APIs ( http://arxiv.org/abs/2301.08721v2 )

ライセンス: Link先を確認

Zhoujun Cheng, Jungo Kasai, Tao Yu

(参考訳) 大規模言語モデル(LLM)を用いた大量のサンプルに対する推論は、産業や実世界の利用において計算的かつ経済的にコストがかかる可能性がある。我々は,LLMが1回に1つのサンプルではなく,バッチで推論を実行できるようにする,シンプルで効果的なプロンプト手法であるバッチプロンプトを提案する。ダウンストリーム性能を維持しながらトークンと時間の両方のコストを削減する。理論的には、数ショットのコンテキスト内学習環境では、各バッチのサンプル数とともに、推論コストはほぼ線形に減少する。バッチプロンプトが著しく~(最大で6つのサンプルで5倍)、LLM(Codex)推論トークンと時間コストが削減され、性能が向上または同等になる。 GPT-3.5 や GPT-4 のような最先端の Chat ベースの LLM では、バッチプロンプトの利点も保たれている。さらに分析した結果、各バッチ内のサンプル数とタスクの複雑さがパフォーマンスに影響することがわかった。さらに、バッチプロンプトはLLMを用いて異なる推論方法に適用できる。私たちのコードはhttps://github.com/xlang-ai/batch-promptingのサイトにある。

Performing inference on large volumes of samples with large language models (LLMs) can be computationally and financially costly in industry and real-world use. We propose batch prompting, a simple yet effective prompting approach that enables the LLM to run inference in batches, instead of one sample at a time. Our method reduces both token and time costs while retaining downstream performance. We theoretically demonstrate that under a few-shot in-context learning setting, the inference costs decrease almost inversely linearly with the number of samples in each batch. We extensively validate the effectiveness of batch prompting on ten datasets across commonsense QA, arithmetic reasoning, and NLI/NLU: batch prompting significantly reduces the LLM (Codex) inference token and time costs (by up to 5x with six samples per batch) while achieving better or comparable performance. For state-of-the-art Chat-based LLMs, e.g., GPT-3.5 and GPT-4, we show the benefits of batch prompting also hold. Further analysis shows that the number of samples in each batch and the complexity of tasks affect its performance. Moreover, batch prompting can be applied across different reasoning methods using LLMs. Our code is available at https://github.com/xlang-ai/batch-prompting.
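The idea of answering several samples per call can be sketched as plain prompt construction and parsing. This is a minimal sketch and not the code from the linked repository. The `Q[i]` / `A[i]` index markers, the helper names, and the token-cost formula are assumptions chosen for illustration, and no LLM API is called.

```python
import re


def format_batch(questions, answers=None):
    """Render one batch: indexed questions, followed by indexed answers if given."""
    lines = [f"Q[{i}]: {q}" for i, q in enumerate(questions, start=1)]
    if answers is not None:
        lines += [f"A[{i}]: {a}" for i, a in enumerate(answers, start=1)]
    return "\n".join(lines)


def build_prompt(demo_batches, test_questions):
    """Few-shot demonstrations (each already a batch) followed by the unanswered test batch."""
    parts = [format_batch(qs, ans) for qs, ans in demo_batches]
    parts.append(format_batch(test_questions))
    return "\n\n".join(parts)


def parse_answers(completion, batch_size):
    """Recover per-sample answers from a completion; missing indices map to None."""
    found = {int(i): text.strip()
             for i, text in re.findall(r"A\[(\d+)\]:[ \t]*(.*)", completion)}
    return [found.get(i) for i in range(1, batch_size + 1)]


def prompt_tokens_per_sample(demo_tokens, sample_tokens, batch_size):
    """Rough cost model: the demonstrations are paid once per call and shared by the batch."""
    return demo_tokens / batch_size + sample_tokens
```

Under this rough cost model, a 600-token demonstration block with 50-token samples costs 650 prompt tokens per sample at batch size 1 and 150 at batch size 6. When the demonstrations dominate the prompt, the per-sample cost falls roughly in inverse proportion to the batch size, which matches the scaling the abstract describes.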

翻訳日:2023-10-26 03:19:08 公開日:2023-10-24

# FENDI:量子インターネットにおける高密度エンタングルメント分布を目指して

FENDI: Toward High-Fidelity Entanglement Distribution in the Quantum Internet ( http://arxiv.org/abs/2301.08269v3 )

ライセンス: Link先を確認

Huayue Gu, Zhouyu Li, Ruozhou Yu, Xiaojian Wang, Fangtong Zhou, Jianqing Liu, Guoliang Xue

(参考訳) 量子ネットワークは、遠隔ノード間で量子の絡み合いを分散させ、セキュアな通信、量子センシング、分散量子コンピューティングにおける多くの応用の鍵となる。本稿では,マルチホップ量子リピータネットワークにおけるスループットと絡み合い分布の質のトレードオフについて検討する。エンタングルメント分布率(EDR)および/またはエンタングルメント忠実度をヒューリスティックに最大化することを目的とした既存の研究と比較して、我々のゴールは、任意の量子ノード間の最大到達可能なEDRの上限を満たしつつ、最大到達可能な最悪のケース忠実度を特徴づけることである。この特徴付けは、量子ネットワークの達成可能な性能領域の基本的な境界を提供し、量子ネットワークトポロジー、プロトコル、アプリケーションの設計を支援する。しかし、そのタスクは非常に非自明であり、証明する限りNPハードである。我々の主な貢献は、達成可能な最悪のケースの忠実度を厳密なEDR境界に近似する完全多項式時間近似スキームであり、最適忠実度非依存なEDR最適化と最悪のケース等方性雑音モデルを組み合わせたものである。 EDRとフィデリティ保証は、量子メモリを備えたポストセレクション・アンド・ストレージプロトコルによって実装できる。離散時間量子ネットワークシミュレータを開発することで,ネットワークの特徴的な性能領域(近似パレートフロンティア)を示すシミュレーションを行い,既存のプロトコルが実質的なギャップを示す一方で,設計プロトコルが性能領域を達成できることを実証する。

A quantum network distributes quantum entanglements between remote nodes, and is key to many applications in secure communication, quantum sensing and distributed quantum computing. This paper explores the fundamental trade-off between the throughput and the quality of entanglement distribution in a multi-hop quantum repeater network. Compared to existing work which aims to heuristically maximize the entanglement distribution rate (EDR) and/or entanglement fidelity, our goal is to characterize the maximum achievable worst-case fidelity, while satisfying a bound on the maximum achievable expected EDR between an arbitrary pair of quantum nodes. This characterization will provide fundamental bounds on the achievable performance region of a quantum network, which can assist with the design of quantum network topology, protocols and applications. However, the task is highly non-trivial and is NP-hard as we shall prove. Our main contribution is a fully polynomial-time approximation scheme to approximate the achievable worst-case fidelity subject to a strict expected EDR bound, combining an optimal fidelity-agnostic EDR-maximizing formulation and a worst-case isotropic noise model. The EDR and fidelity guarantees can be implemented by a post-selection-and-storage protocol with quantum memories. By developing a discrete-time quantum network simulator, we conduct simulations to show the characterized performance region (the approximate Pareto frontier) of a network, and demonstrate that the designed protocol can achieve the performance region while existing protocols exhibit a substantial gap.

翻訳日:2023-10-26 03:18:42 公開日:2023-10-24

# 超伝導回路上の時間最適ユニバーサル量子ゲート

Time-optimal universal quantum gates on superconducting circuits ( http://arxiv.org/abs/2301.03334v2 )

ライセンス: Link先を確認

Ze Li, Ming-Jie Liang, Zheng-Yuan Xue

(参考訳) 量子系を操作する場合、デコヒーレンスは避けられない。量子操作の質を低下させるため、高忠実度量子ゲートを必要とする大規模量子計算の主要な障害の1つである。一般的に、ゲート操作が長ければ長いほど、デコヒーレンスによって引き起こされるゲートの不完全性が増す。したがって、ゲート時間を短くする方法は、解決すべき緊急の問題となる。この目的のために、量子ブラキストクロン方程式の解法に基づく時間最適制御は簡単な解である。本稿では,2次元正方格子配置の超伝導量子ビット上での普遍量子ゲートを実現する手法を提案し,2量子ビットゲートの忠実度は99.9%に近づく。一方、外部駆動の離調を調整することにより、Z軸ゲートをかなり加速させることができる。最後に,位相緩和エラーの影響を低減するために,デコヒーレンスフリーな部分空間符号化も実装に取り入れた。そこで我々は,大規模量子計算に期待できる高速量子スキームを提案する。

Decoherence is inevitable when manipulating quantum systems. It decreases the quality of quantum manipulations and thus is one of the main obstacles for large-scale quantum computation, where high-fidelity quantum gates are needed. Generally, the longer a gate operation is, the greater the decoherence-induced gate infidelity will be. Therefore, how to shorten the gate time becomes an urgent problem to be solved. To this end, time-optimal control based on solving the quantum brachistochrone equation is a straightforward solution. Here, based on time-optimal control, we propose a scheme to realize universal quantum gates on superconducting qubits in a two-dimensional square lattice configuration, and the two-qubit gate fidelity approaches 99.9%. Meanwhile, we can further accelerate the Z-axis gate considerably by adjusting the detuning of the external driving. Finally, in order to reduce the influence of the dephasing error, decoherence-free subspace encoding is also incorporated in our physical implementation. Therefore, we present a fast quantum scheme which is promising for large-scale quantum computation.

翻訳日:2023-10-26 03:18:11 公開日:2023-10-24

# cosyn:コンテキスト同期双曲ネットワークを用いたオンライン会話における暗黙的ヘイトスピーチの検出

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network ( http://arxiv.org/abs/2303.03387v3 )

ライセンス: Link先を確認

Sreyan Ghosh and Manan Suri and Purva Chiniya and Utkarsh Tyagi and Sonal Kumar and Dinesh Manocha

(参考訳) オンラインの会話で交流するソーシャルメディア利用者の急増はヘイトスピーチを著しく増加させ、様々な人口層からの影響を受けている。先行研究のほとんどが、暗黙のヘイトスピーチの検出や間接言語やコード化された言語によるヘイトスピーチの検出に重点を置いて、ヘイトフルなフレーズを活用している、明示的なヘイトスピーチの検出に重点を置いている。本稿では,オンライン会話における暗黙のヘイトスピーチを検出するために,ユーザと会話のコンテキストを明示的に組み込んだ,コンテキストシナージュ型ニューラルネットワークCoSynを提案する。 cosyn氏は、これらの外部コンテキストをエンコードする新しい方法を紹介し、それらの間の相互作用を明確に捉える新しいコンテキストインタラクションメカニズムを採用し、これらのノイズの多いコンテキストから取得すべき情報量について独立的に評価する。さらに、ソーシャルメディアのスケールフリーなダイナミクスを考慮するために、双曲空間でこれらすべての操作を実行する。我々は6つのヘイトスピーチデータセットに対するCoSynの有効性を実証し、CoSynが1.24%から57.8%の範囲で絶対的な改善を施した暗黙のヘイトスピーチの検出において、すべてのベースラインを上回っていることを示す。

The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded language. In this paper, we present CoSyn, a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. CoSyn introduces novel ways to encode these external contexts and employs a novel context interaction mechanism that clearly captures the interplay between them, making independent assessments of the amounts of information to be retrieved from these noisy contexts. Additionally, it carries out all these operations in the hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn on 6 hate speech datasets and show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.

翻訳日:2023-10-26 01:35:41 公開日:2023-10-24

# 欧州連合における政治広告の透明性向上法についての一考察

A Note on the Proposed Law for Improving the Transparency of Political Advertising in the European Union ( http://arxiv.org/abs/2303.02863v4 )

ライセンス: Link先を確認

Jukka Ruohonen

(参考訳) 世界中で政治広告の供給と需要が高まっている。同時に、外国政府や他の悪役による選挙妨害のような社会的な脅威は、多くの民主政治において迫る懸念となっている。さらに、外国軍や国内軍による選挙結果の操作は、基本的権利を心配している多くの市民の関心事であり続けている。この目的のために、欧州連合(EU)はこの問題に取り組むためのいくつかの取り組みを開始した。 2020年には、政治広告の透明性を高めるための新しい規制が提案された。この短い解説は提案された規制を見直し、その制限と潜在的な影響についていくつかの点を提起する。

There is an increasing supply and demand for political advertising throughout the world. At the same time, societal threats, such as election interference by foreign governments and other bad actors, continue to be a pressing concern in many democracies. Furthermore, manipulation of electoral outcomes, whether by foreign or domestic forces, continues to be a concern of many citizens who are also worried about their fundamental rights. To these ends, the European Union (EU) has launched several initiatives for tackling the issues. In 2020, a new regulation was also proposed for improving the transparency of political advertising in the union. This short commentary reviews the proposed regulation and raises a few points about its limitations and potential impacts.

翻訳日:2023-10-26 01:35:17 公開日:2023-10-24

# 大規模言語モデルによるゼロショットクロスリンガル要約

Zero-Shot Cross-Lingual Summarization via Large Language Models ( http://arxiv.org/abs/2302.14229v4 )

ライセンス: Link先を確認

Jiaan Wang, Yunlong Liang, Fandong Meng, Beiqi Zou, Zhixu Li, Jianfeng Qu, Jie Zhou

(参考訳) ソース言語の文書が与えられた場合、言語間要約(CLS)は異なるターゲット言語で要約を生成することを目的としている。近年, GPT-3.5, ChatGPT, GPT-4 などの大規模言語モデル (LLM) の出現は, 計算言語学コミュニティから広く注目を集めている。しかし、LS上でのLSMの性能は未だ分かっていない。本稿では,異なるパラダイム(エンド・ツー・エンド・エンド・パイプライン)からゼロショットCLSを誘導するための様々なプロンプトを実証的に使用し,生成したサマリーの予備評価を行う。 ChatGPT と GPT-4 はもともと,詳細な情報を持つ長文要約が好まれていた。これらの2つのLSMは、対話的なプロンプトの助けを借りて、情報量と簡潔さを更にバランスさせ、CLSの性能を大幅に向上させることができる。 3つの広く使用されているCLSデータセットによる実験結果から、GPT-4は最先端のゼロショットCLS性能を達成し、微細調整されたmBART-50と競合して性能を発揮することが示された。さらに,多言語およびバイリンガルLLM(BLOOMZ,ChatGLM-6B,Vicuna-13B,ChatYuan)はゼロショットCLS能力に制限がある。要約と翻訳を同時に行うモデルを必要とするCLSの合成特性のため、ゼロショット方式でこのタスクを実現することは、LSMにとっての課題である。したがって、今後のLSM研究がLSをテストベッドとして利用できることを心から願っています。

Given a document in a source language, cross-lingual summarization (CLS) aims to generate a summary in a different target language. Recently, the emergence of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has attracted wide attention from the computational linguistics community. However, the performance of LLMs on CLS is not yet known. In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms (i.e., end-to-end and pipeline), and provide a preliminary evaluation of the generated summaries. We find that ChatGPT and GPT-4 originally prefer to produce lengthy summaries with detailed information. These two LLMs can further balance informativeness and conciseness with the help of an interactive prompt, significantly improving their CLS performance. Experimental results on three widely-used CLS datasets show that GPT-4 achieves state-of-the-art zero-shot CLS performance, and performs competitively compared with the fine-tuned mBART-50. Moreover, we also find that some multi-lingual and bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited zero-shot CLS ability. Due to the composite nature of CLS, which requires models to perform summarization and translation simultaneously, accomplishing this task in a zero-shot manner remains a challenge even for LLMs. Therefore, we sincerely hope and recommend that future LLM research use CLS as a testbed.

翻訳日:2023-10-26 01:35:07 公開日:2023-10-24

# 深層ニューラルネットワークにおける早期トレーニングダイナミクスの位相図:学習速度,深さ,幅の影響

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width ( http://arxiv.org/abs/2302.12250v2 )

ライセンス: Link先を確認

Dayal Singh Kalra and Maissam Barkeshli

(参考訳) 確率勾配降下法(SGD)を訓練したディープニューラルネットワーク(DNN)の最適化ダイナミクスを系統的に解析し,学習率$\eta$,deep $d$,Whid $w$のニューラルネットワークの効果について検討した。損失のヘシアンの最大固有値 $\lambda^H_t$ を解析することにより、損失ランドスケープの鋭さを測定することで、ダイナミクスは4つの異なる状態を示すことができる。 (i)早期の一時的な体制。 (二)中間飽和体制 (iii)進歩的な研削体制、 (iv)後期の「安定の最先端」体制。初期と中間の体制は (i)および (ii) $\eta \equiv c / \lambda_0^H $, $d$, $w$ に依存する豊富な位相図を示す。トレーニング損失とシャープネスの初期ダイナミクスにおいて定性的に異なる現象を分離するいくつかの臨界値である$c$を同定した。特に、$d$ と $1/w$ が増加するにつれて、鋭さが早い段階で減少する `sharpness reduction" フェーズの開始を見出した。

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H$, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a "sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.
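The parameterization $\eta \equiv c / \lambda_0^H$ can be made concrete on a toy problem. The sketch below is not the paper's experiment. It uses a fixed 2x2 quadratic loss as a stand-in for a network, estimates the sharpness (largest Hessian eigenvalue) by power iteration, and runs plain gradient descent at a chosen $c$. For a quadratic, the classical stability threshold is $c = 2$. The critical values of $c$ reported for deep networks are a separate empirical matter that this toy does not reproduce. All function names are assumptions.

```python
import math


def matvec(matrix, vec):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def top_eigenvalue(matrix, iters=200):
    """Largest eigenvalue of a symmetric positive-definite matrix via power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    return sum(x * y for x, y in zip(v, matvec(matrix, v)))


def quadratic_loss(hessian, theta):
    """L(theta) = 0.5 * theta^T H theta, whose Hessian is H everywhere."""
    return 0.5 * sum(t * h for t, h in zip(theta, matvec(hessian, theta)))


def loss_after_gd(hessian, theta0, c, steps=50):
    """Run gradient descent with learning rate eta = c / sharpness and return the final loss."""
    eta = c / top_eigenvalue(hessian)
    theta = list(theta0)
    for _ in range(steps):
        grad = matvec(hessian, theta)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return quadratic_loss(hessian, theta)
```

With `hessian = [[4.0, 1.0], [1.0, 3.0]]` and `theta0 = [1.0, 1.0]`, the loss shrinks for `c = 1.5` and grows for `c = 2.5`. Expressing the learning rate in units of the initial sharpness is what makes such thresholds comparable across models.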

翻訳日:2023-10-26 01:34:38 公開日:2023-10-24

# 共変量子組合せ論とゼロエラー通信への応用

Covariant quantum combinatorics with applications to zero-error communication ( http://arxiv.org/abs/2302.07776v2 )

ライセンス: Link先を確認

Dominic Verdon

(参考訳) 有限次元の共変集合において、すべての系(有限次元$C^*$-代数)がコンパクトな量子群$G$の作用を持ち、すべてのチャネル(正の正の$G$-不変状態を保存する写像)が$G$-作用に関して共変であるような量子(非可換性)関係と量子(非可換性)グラフの理論を開発する。我々は、対称性制約を持つゼロエラー量子通信理論への応用による定義の動機付けを行う。主な結果は以下の通りである。 1)共変量子関係を共変チャネルの基底関係とするために必要な十分条件を与える。 2) 共変チャネルの共変チャネルの共変グラフとして、g$-作用を持つすべての量子可換グラフ(これを量子 $g$-graph と呼ぶ)が出現することを示す。 3) 共変チャネルは共変チャネルの可積分性が$G$-graph であるときに正確に可逆であることを示す。 4) $g$ が準三角である場合(これはすべてのコンパクト群を含む)、共変ゼロエラーのソースチャネル符号化スキームは、共変準同型である。

We develop the theory of quantum (a.k.a. noncommutative) relations and quantum (a.k.a. noncommutative) graphs in the finite-dimensional covariant setting, where all systems (finite-dimensional $C^*$-algebras) carry an action of a compact quantum group $G$, and all channels (completely positive maps preserving the canonical $G$-invariant state) are covariant with respect to the $G$-actions. We motivate our definitions by applications to zero-error quantum communication theory with a symmetry constraint. Some key results are the following: 1) We give a necessary and sufficient condition for a covariant quantum relation to be the underlying relation of a covariant channel. 2) We show that every quantum confusability graph with a $G$-action (which we call a quantum $G$-graph) arises as the confusability graph of a covariant channel. 3) We show that a covariant channel is reversible precisely when its confusability $G$-graph is discrete. 4) When $G$ is quasitriangular (this includes all compact groups), we show that covariant zero-error source-channel coding schemes are classified by covariant homomorphisms between confusability $G$-graphs.

翻訳日:2023-10-26 01:34:15 公開日:2023-10-24

# 近位ニュートンによる効率的なグラフラプラシアン推定

Efficient Graph Laplacian Estimation by Proximal Newton ( http://arxiv.org/abs/2302.06434v2 )

ライセンス: Link先を確認

Yakov Medvedovsky, Eran Treister, Tirza Routtenberg

(参考訳) Laplacian-Constrained Gaussian Markov Random Field (LGMRF) は、与えられたデータから重み付きスパース依存グラフを学ぶための一般的な多変量統計モデルである。このグラフ学習問題は、ラプラシア構造制約を受ける精度行列の最大極大推定(MLE)として、スパース性誘導ペナルティ項で定式化することができる。本稿では,この学習問題を正確かつ効率的に解くことを目的とする。まず、一般的な$\ell_1$-normのペナルティは、この設定では不適切であり、完全なグラフにつながる可能性があるため、推定バイアスの低いスパース解を促進する非凸ミニマックスペナルティ(MCP)を用いる。第二に, 既存の一階法とは対照的に, 共役勾配, プリコンディショニング, およびアクティブ/フリー集合への分割といったアルゴリズム的特徴を活かし, 効率的な解法を得るための二階間近ニュートン法を開発した。数値実験により,既存の手法と比較して計算複雑性とグラフ学習精度の両方において,提案手法の利点が示された。

The Laplacian-constrained Gaussian Markov Random Field (LGMRF) is a common multivariate statistical model for learning a weighted sparse dependency graph from given data. This graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix, subject to Laplacian structural constraints, with a sparsity-inducing penalty term. This paper aims to solve this learning problem accurately and efficiently. First, since the commonly used $\ell_1$-norm penalty is inappropriate in this setting and may lead to a complete graph, we employ the nonconvex minimax concave penalty (MCP), which promotes sparse solutions with lower estimation bias. Second, as opposed to existing first-order methods for this problem, we develop a second-order proximal Newton approach to obtain an efficient solver, utilizing several algorithmic features, such as using Conjugate Gradients, preconditioning, and splitting to active/free sets. Numerical experiments demonstrate the advantages of the proposed method in terms of both computational complexity and graph learning accuracy compared to existing methods.

翻訳日:2023-10-26 01:33:06 公開日:2023-10-24
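
To illustrate why the abstract above prefers the minimax concave penalty (MCP) over the $\ell_1$ norm, the sketch below evaluates the standard scalar MCP formula next to the $\ell_1$ penalty. The parameter names `lam` and `gamma` and the sample values are assumptions for this example; the paper's proximal Newton solver is not reproduced here.

```python
def l1_penalty(x, lam):
    """Scalar l1 penalty: grows linearly with |x| without bound."""
    return lam * abs(x)


def mcp_penalty(x, lam, gamma):
    """Scalar minimax concave penalty (MCP).

    Behaves like the l1 penalty near zero, then flattens to the
    constant gamma * lam**2 / 2 once |x| exceeds gamma * lam.
    """
    a = abs(x)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


# Illustrative values only.
LAM, GAMMA = 1.0, 2.0
for weight in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(weight, l1_penalty(weight, LAM), mcp_penalty(weight, LAM, GAMMA))
```

Because the MCP stops growing for large magnitudes, large edge weights are not penalized further, which is the usual explanation for its lower estimation bias compared with $\ell_1$.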

# 深層知覚損失ネットワークの系統的性能解析: 転移学習の慣例を破る

A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions ( http://arxiv.org/abs/2302.04032v2 )

ライセンス: Link先を確認

Gustav Grund Pihlgren, Konstantina Nikolaidou, Prakash Chandra Chhipa, Nosheen Abid, Rajkumar Saini, Fredrik Sandin, Marcus Liwicki

(参考訳) 深層知覚損失(deep perceptual loss)は、ニューラルネットワークから抽出した深層特徴を用いて人間の知覚を模倣することを目指す、コンピュータビジョンにおける損失関数の一種である。近年この手法は、画像合成、セグメンテーション、深度予測など、特に画像または画像に類する出力を持つ多くのコンピュータビジョンタスクに適用され、大きな効果を上げている。この手法の応用の多くは、損失計算に事前学習済みネットワーク(多くは畳み込みネットワーク)を用いる。関心が高まり広く使われるようになったにもかかわらず、深層知覚損失の計算にどのネットワークを使い、どの層から特徴を抽出すべきかの探究には、さらなる取り組みが必要である。本研究はこれを是正するため、深層知覚損失の既存の4つのユースケースについて、広く使われ容易に入手できる多数の事前学習済みネットワークを、複数の特徴抽出点に対して系統的に評価する。知覚的類似性、超解像、画像セグメンテーション、次元削減の各ユースケースをベンチマークで評価する。ベンチマークは先行研究の実装であり、そこで選択したネットワークと抽出点を評価する。ベンチマークでの性能と、ネットワークおよび抽出点の属性を基に、詳細な分析を行う。この分析から、どのアーキテクチャが深層知覚損失で優れた性能を示すか、特定のタスクとデータセットに適した抽出点をどう選ぶかについての知見が得られる。さらに、この結果が深層知覚損失と、より広い転移学習の分野に対して持つ意味を議論する。結果は、深層知覚損失が転移学習で広く信じられている2つの慣例から逸脱することを示しており、これらの慣例にはより深い分析が必要であることを示唆している。

Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using the deep features extracted from neural networks. In recent years, the method has been applied to great effect on a host of interesting computer vision tasks, especially for tasks with image or image-like outputs, such as image synthesis, segmentation, depth prediction, and more. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more effort is needed toward exploring which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work aims to rectify this by systematically evaluating a host of commonly used and readily available, pretrained networks for a number of different feature extraction points on four existing use cases of deep perceptual loss. The use cases of perceptual similarity, super-resolution, image segmentation, and dimensionality reduction, are evaluated through benchmarks. The benchmarks are implementations of previous works where the selected networks and extraction points are evaluated. The performance on the benchmarks, and attributes of the networks and extraction points are then used as a basis for an in-depth analysis. This analysis uncovers insight regarding which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results show that deep perceptual loss deviates from two commonly held conventions in transfer learning, which suggests that those conventions are in need of deeper analysis.

翻訳日:2023-10-26 01:32:29 公開日:2023-10-24

# 順序付けによるルールの強制

Rule Enforcing Through Ordering ( http://arxiv.org/abs/2303.17971v2 )

ライセンス: Link先を確認

David Sychrovský, Sameer Desai, Martin Loebl

(参考訳) 大都市における軽微な交通違反のように、現実の多くの状況では、中央当局が多数の個人に対して定期的に罰を科す任務を負っている。一般的な慣行は、各個人に比較的少額の罰金を支払う機会を与え、その場合にはおそらくかなり重い罰を伴う法的手続きを確実に回避できるようにすることである。しかし、違反者の数が多く中央当局の処理能力が限られているため、個人のリスクは通常小さく、合理的な個人は罰金を支払うことを選ばない。本稿では、中央当局が公に知られた順序で違反者を処理すれば、違反者に罰金を支払うよう適切にインセンティブを与えられることを示す。我々のメカニズムが非協力を促進し、個人に支払いのインセンティブを与えることを、解析的に、また現実的な実験によって示す。さらに、任意の連合についても同じことが成り立つ。中央当局が受け取る総支払額の期待値を定量化し、それが大幅に増加することを示す。

In many real world situations, like minor traffic offenses in big cities, a central authority is tasked with periodically administering punishments to a large number of individuals. Common practice is to give each individual a chance to pay a smaller fine and thereby be guaranteed to avoid a legal process that would probably carry a considerably larger punishment. However, thanks to the large number of offenders and the limited capacity of the central authority, the individual risk is typically small and a rational individual will not choose to pay the fine. Here we show that if the central authority processes the offenders in a publicly known order, it properly incentivizes the offenders to pay the fine. We show analytically and in realistic experiments that our mechanism promotes non-cooperation and incentivizes individuals to pay. Moreover, the same holds for an arbitrary coalition. We quantify the expected total payment the central authority receives, and show that it increases considerably.

翻訳日:2023-10-26 01:25:56 公開日:2023-10-24

# 正面視のためのNeRFおよびニューラルビュー合成法の知覚的品質評価

Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views ( http://arxiv.org/abs/2303.15206v3 )

ライセンス: Link先を確認

Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli

(参考訳) ニューラルビュー合成(NVS)は、自由視点映像を合成する最も成功した手法の一つであり、わずかな撮影画像の集合だけから高い忠実度を達成できる。この成功により多くの派生手法が生まれ、それぞれがテストビューの集合に対して、通常はPSNR、SSIM、LPIPSなどの画質指標を用いて評価されている。一方、NVS手法が知覚される映像品質に関してどの程度の性能を示すかについての研究は不足していた。本研究は、NVSおよびNeRF派生手法の知覚的評価に関する最初の研究である。この研究のために、制御された実験室環境と実環境(in-the-wild)の両方で撮影したシーンからなる2つのデータセットを収集した。既存のデータセットとは対照的に、これらのシーンには参照映像シーケンスが付属しており、静止画像だけを見ていると見落としやすい時間的アーティファクトや微妙な歪みを検証できる。複数のNVS手法で合成した映像の品質を、十分に制御された知覚品質評価実験と、多数の既存の最先端の画像・映像品質指標の両方で測定した。結果の詳細な分析と、NVS評価のためのデータセットおよび指標の選択に関する推奨事項を示す。

Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.

翻訳日:2023-10-26 01:25:38 公開日:2023-10-24

# 医用画像解析におけるラベル有効深層学習の課題と今後の方向性

Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions ( http://arxiv.org/abs/2303.12484v2 )

ライセンス: Link先を確認

Cheng Jin, Zhengrui Guo, Yi Lin, Luyang Luo, Hao Chen

(参考訳) ディープラーニングは近年急速に成長し、幅広いアプリケーションで最先端のパフォーマンスを達成している。しかし、トレーニングモデルは通常、大量のラベル付きデータの高価で時間を要する。これは医療画像解析(MIA)の分野において特に当てはまり、データに制限があり、ラベルを取得するのに費用がかかる。これにより、ラベル付きデータとラベルなしデータと弱いラベル付きデータとを包括的に利用するためのラベル効率の高いディープラーニング手法が開発される。本調査では,最近300以上の論文を網羅的に調査し,MIAにおけるラベル効率学習戦略の最近の進歩を概観した。まず,ラベル効率の高い学習の背景を示し,そのアプローチを異なるスキームに分類する。次に、各スキームを通して現在の最先端手法を詳細に検討する。具体的には,カノニカルな半教師付き,自己教師付き,マルチインスタンスの学習スキームだけでなく,最近ではアクティブでアノテーション効率のよい学習戦略も紹介する。さらに, この分野への総合的な貢献として, 調査手法の共通点や特徴を解明するだけでなく, 現状の課題を詳細に分析し, 今後の研究への道のりを示唆する。

Deep learning has seen rapid growth in recent years and achieved state-of-the-art performance in a wide range of applications. However, training models typically requires expensive and time-consuming collection of large quantities of labeled data. This is particularly true within the scope of medical imaging analysis (MIA), where data are limited and labels are expensive to acquire. Thus, label-efficient deep learning methods are developed to make comprehensive use of the labeled data as well as the abundance of unlabeled and weakly labeled data. In this survey, we extensively investigated over 300 recent papers to provide a comprehensive overview of recent progress on label-efficient learning strategies in MIA. We first present the background of label-efficient learning and categorize the approaches into different schemes. Next, we examine the current state-of-the-art methods in detail through each scheme. Specifically, we provide an in-depth investigation, covering not only canonical semi-supervised, self-supervised, and multi-instance learning schemes, but also recently emerged active and annotation-efficient learning strategies. Moreover, as a comprehensive contribution to the field, this survey not only elucidates the commonalities and unique features of the surveyed methods but also presents a detailed analysis of the current challenges in the field and suggests potential avenues for future research.

翻訳日:2023-10-26 01:25:04 公開日:2023-10-24

# 自動運転における3次元占有推定のための簡易フレームワーク

A Simple Framework for 3D Occupancy Estimation in Autonomous Driving ( http://arxiv.org/abs/2303.10076v4 )

ライセンス: Link先を確認

Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya

(参考訳) 周囲の画像から3D占有率を推定するタスクは、Bird's Eye View (BEV) の認識の成功に続いて、自動運転分野におけるエキサイティングな発展である。このタスクは、運転環境の重要な3D特性を提供し、周囲空間の全体的な理解と認識を高める。本研究では,ネットワーク設計や最適化,評価などの3D占有率推定の重要要素を明らかにするために,CNNベースのフレームワークである3D占有率推定のためのシンプルなフレームワークを提案する。さらに, 自律運転における3次元知覚研究を推進しうる, 単眼深度推定や3次元再構成など, 3次元占有推定と他の関連課題との関係について検討した。評価のために,現在の公開データセットに柔軟である占有評価基準を定義するための簡単なサンプリング戦略を提案する。さらに,提案手法とddadおよびnuscenesデータセットの単眼深度推定法を比較し,競合性能を達成するために,深度推定メトリックの観点からベンチマークを確立した。関連するコードはhttps://github.com/GANWANSHUI/SimpleOccupancyで更新される。

The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. In this work, we present a simple framework for 3D occupancy estimation, which is a CNN-based framework designed to reveal several key factors for 3D occupancy estimation, such as network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction, which could advance the study of 3D perception in autonomous driving. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish the benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and Nuscenes datasets and achieve competitive performance. The relevant code will be updated in https://github.com/GANWANSHUI/SimpleOccupancy.

翻訳日:2023-10-26 01:24:43 公開日:2023-10-24

# CoLT5: 条件計算付きより高速なロングレンジトランス

CoLT5: Faster Long-Range Transformers with Conditional Computation ( http://arxiv.org/abs/2303.09752v3 )

ライセンス: Link先を確認

Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

(参考訳) 多くの自然言語処理タスクは長い入力の恩恵を受けるが、Transformerで長い文書を処理するのは高コストである。これは、アテンションの計算量が入力長の2乗で増えるためだけでなく、フィードフォワード層と射影層をすべてのトークンに適用するためでもある。しかし、特に長い文書では、すべてのトークンが等しく重要であるわけではない。本研究では、この直観に基づき、条件付き計算を用いてフィードフォワード層とアテンション層の両方で重要なトークンにより多くの計算資源を割り当てる長入力TransformerモデルCoLT5を提案する。CoLT5は、はるかに高速な訓練と推論でLongT5より高い性能を達成し、長入力ベンチマークSCROLLSでSOTAを達成することを示す。さらに、CoLT5は極めて長い入力を効果的かつ扱いやすい計算量で利用でき、入力長64kまで大きな性能向上を示す。

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.

翻訳日:2023-10-26 01:24:23 公開日:2023-10-24
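
The idea of conditional computation in the CoLT5 abstract above, spending more compute on important tokens, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the CoLT5 architecture: the importance scores are given rather than learned, tokens are plain floats, and `light_fn`, `heavy_fn`, and `conditional_layer` are hypothetical names introduced for this example.

```python
def route_tokens(scores, k):
    """Return the indices of the k highest-scoring tokens, in input order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def conditional_layer(tokens, scores, k, light_fn, heavy_fn):
    """Apply a cheap function to every token and an expensive one
    only to the k tokens with the highest importance scores."""
    selected = set(route_tokens(scores, k))
    outputs = []
    for i, token in enumerate(tokens):
        out = light_fn(token)          # every token gets the light branch
        if i in selected:
            out += heavy_fn(token)     # only routed tokens get the heavy branch
        outputs.append(out)
    return outputs


# Toy inputs: four scalar "tokens" with hand-picked importance scores.
tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]
result = conditional_layer(
    tokens, scores, k=2,
    light_fn=lambda x: x,
    heavy_fn=lambda x: 10.0 * x,
)
print(result)
```

With `k` fixed, the heavy branch runs on a constant number of tokens regardless of input length, which is where the savings on long inputs would come from in this simplified picture.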

# I Tag, You Tag, Everybody Tags!

I Tag, You Tag, Everybody Tags! ( http://arxiv.org/abs/2303.06073v2 )

ライセンス: Link先を確認

Hazem Ibrahim, Rohail Asim, Matteo Varvello, Yasir Zaki

(参考訳) 位置情報タグは個人の持ち物を追跡するために設計されている。しかし、位置情報タグが人をストーキングするために悪用されているという事例報告もある。追跡は、ローカルには例えばペアリングした電話とのBluetooth経由で、リモートにはタグに接近した位置情報報告デバイスに便乗することで実現される。本稿では、最も普及している2つの位置情報タグ(AppleのAirTagとSamsungのSmartTag)の性能を、位置情報報告デバイスの分布が既知で大規模な制御実験と、遭遇する報告デバイスの数や種類を制御せず実生活のユースケースを模擬する実環境実験によって調べる。どちらのタグも同程度の性能を示し、例えば約10分以内に半径100m以内で位置が特定される割合は55%である。したがって、位置情報タグによる正確な位置へのリアルタイムのストーキングは現実的ではなく、これは両方のタグを同時に配備して同等の精度を半分の時間で達成する場合でも同様である。それでも、被害者の正確な移動の半分は、わずか1時間の遅延で正確に(誤差10m)後追いでき、これはストーカーの手に渡れば依然として危険な情報である。

Location tags are designed to track personal belongings. Nevertheless, there has been anecdotal evidence that location tags are also misused to stalk people. Tracking is achieved locally, e.g., via Bluetooth with a paired phone, and remotely, by piggybacking on location-reporting devices which come into proximity of a tag. This paper studies the performance of the two most popular location tags (Apple's AirTag and Samsung's SmartTag) through controlled experiments - with a known large distribution of location-reporting devices - as well as in-the-wild experiments - with no control on the number and kind of reporting devices encountered, thus emulating real-life use-cases. We find that both tags achieve similar performance, e.g., they are located 55% of the times in about 10 minutes within a 100 m radius. It follows that real time stalking to a precise location via location tags is impractical, even when both tags are concurrently deployed which achieves comparable accuracy in half the time. Nevertheless, half of a victim's exact movements can be backtracked accurately (10m error) with just a one-hour delay, which is still perilous information in the possession of a stalker.

翻訳日:2023-10-26 01:23:50 公開日:2023-10-24

# ChatGPTは優れたNLG評価器か? 予備的研究

Is ChatGPT a Good NLG Evaluator? A Preliminary Study ( http://arxiv.org/abs/2303.04048v3 )

ライセンス: Link先を確認

Jiaan Wang, Yunlong Liang, Fandong Meng, Zengkui Sun, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, Jie Zhou

(参考訳) 近年、ChatGPTの出現は、計算言語学コミュニティから広く注目を集めている。多くの先行研究により、ChatGPTは自動評価指標を用いて様々なNLPタスクにおいて顕著な性能を発揮することが示されている。しかし、ChatGPTが評価指標として機能する能力はまだ未定である。自然言語生成モデル(NLG)の質を評価することは困難な作業であり、NLGの指標は人間の判断と相関が低いことで悪名高いことから、ChatGPTは優れたNLG評価指標であるのだろうか。本稿では,その信頼性を NLG 測定値として示すため,ChatGPT の予備メタ評価を行う。より詳しくは、ChatGPTを人間評価器とみなし、タスク固有(例えば、要約)とアスペクト固有(例えば、関連)の指示を与えて、ChatGPTにNLGモデルの生成された結果を評価する。我々は5つのNLGメタ評価データセット(要約、ストーリー生成、データ・トゥ・テキストタスクを含む)について実験を行った。実験の結果,ChatGPTは従来の自動測定値と比較すると,ほとんどの場合,人間の判断と最先端あるいは競合的な相関が得られた。さらに,ChatGPT評価器の有効性は,メタ評価データセットの作成方法の影響を受けている可能性が示唆された。参照に大きく依存して生成されるメタ評価データセットに対して、ChatGPT評価器は効果を失う可能性がある。我々の予備研究は、汎用的な信頼性NLGメトリックの出現を促すことを願っている。

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For the meta-evaluation datasets which are created greatly depending on the reference and thus are biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.

翻訳日:2023-10-26 01:23:26 公開日:2023-10-24

# 次近接結合を持つホフスタッター格子における光-物質相互作用

Light-Matter interactions in Hofstadter lattice with the next-nearest neighbor couplings ( http://arxiv.org/abs/2304.14580v2 )

ライセンス: Link先を確認

Jia-Qi Li, Zhao-Min Gao, Wen-Xiao Liu and Xin Wang

(参考訳) ホフスタッター格子のバルク領域に結合するエミッタの光-物質相互作用は、最近De Bernardisらによって研究された[D. De Bernardis, Z.-P. Cian, I. Carusotto, M. Hafezi, and P. Rabl, Phys. Rev. Lett. 126, 103603 (2021), https://link.aps.org/doi/10.1103/PhysRevLett.126.103603]。本研究では、次近接(NNN: next-nearest neighbor)結合を持つ拡張ホフスタッター格子における光-物質相互作用を提案する。標準的なホフスタッター格子と比較して、NNN結合は鏡映対称性を破り、エネルギーバンドは平坦ではなくなる。すなわち、非ゼロの群速度を持つ分散的なバンドとなる。De Bernardisらの研究とは対照的に、2準位エミッタが拡張ホフスタッター格子のバルク領域と相互作用する場合、エミッタはもはや平坦バンドとのコヒーレント振動によって捕捉されず、光子を一方向に放射できる。このキラルな機構は、破れたパリティ対称性に由来する。放射率とキラリティはともに、エミッタの結合位置によって周期的に変化する。これらの特徴はすべてフォトニック格子プラットフォーム上で実現でき、キラル量子情報処理に応用できる可能性がある。

The light-matter interactions for an emitter coupled to the bulk region of a Hofstadter lattice have recently been investigated by De Bernardis et al. [D. De Bernardis, Z.-P. Cian, I. Carusotto, M. Hafezi, and P. Rabl, Phys. Rev. Lett. 126, 103603 (2021), https://link.aps.org/doi/10.1103/PhysRevLett.126.103603]. We propose the light-matter interactions in an extended Hofstadter lattice with the next-nearest neighbor (NNN) couplings. Compared with the standard Hofstadter lattice, the NNN couplings break the mirror symmetry and the energy bands are not flat, i.e., they are dispersive with nonzero group velocity. In contrast to the study by De Bernardis et al., when a two-level emitter interacts with the bulk region of the extended Hofstadter lattice, the emitter is no longer trapped by the coherent oscillations with the flat band, and can radiate photons unidirectionally. The chiral mechanism stems from the broken parity symmetry. Both the radiation rate and the chirality change periodically with the emitter's coupling position. All of these features can be realized on the photonic lattice platform and may find potential application in chiral quantum information processing.

翻訳日:2023-10-26 01:16:02 公開日:2023-10-24

# 生成モデルのための平均場ゲーム実験室

A mean-field games laboratory for generative modeling ( http://arxiv.org/abs/2304.13534v5 )

ライセンス: Link先を確認

Benjamin J. Zhang and Markos A. Katsoulakis

(参考訳) 生成モデルの説明,拡張,設計のための数学的枠組みとして,平均場ゲーム(MFG)の汎用性を実証する。生成フローでは、各粒子(生成サンプル)がその模擬経路上の損失関数を最小化するラグランジアン定式化が用いられる。しかし、この損失は他の粒子の経路に依存しており、粒子の集団間での競合につながっている。この競技の漸近的な行動は平均場ゲームをもたらす。我々は,MFGsと生成フローと,連続時間正規化フロー,スコアベース生成モデル(SGM),ワッサーシュタイン勾配フローなどの拡散とを関連づける。さらに,各生成モデルの数学的性質を,結合した前方-後方非線形偏微分方程式の組であるmfgの最適性条件を用いて検討する。 MFG最適条件によって記述される数学的構造は、生成フローの誘導バイアスを特定する。 SGMの数学的構造を解明し, ワッサーシュタイン勾配流のMFG定式化を導出し, 正規化流れの健全性と構造について検討する。アルゴリズムの観点から、最適条件は生成モデルの訓練を強化するためにハミルトン・ヤコビ・ベルマン正則化器(HJB)を生成する。特に,標準SGMよりも性能が向上したHJB正規化SGMを提案する。本稿では,本フレームワークをMFG実験室として紹介し,新たな実験方法と生成モデルの創出の場として機能する。

We demonstrate the versatility of mean-field games (MFGs) as a mathematical framework for explaining, enhancing, and designing generative models. In generative flows, a Lagrangian formulation is used where each particle (generated sample) aims to minimize a loss function over its simulated path. The loss, however, is dependent on the paths of other particles, which leads to a competition among the population of particles. The asymptotic behavior of this competition yields a mean-field game. We establish connections between MFGs and major classes of generative flows and diffusions including continuous-time normalizing flows, score-based generative models (SGM), and Wasserstein gradient flows. Furthermore, we study the mathematical properties of each generative model by studying their associated MFG's optimality condition, which is a set of coupled forward-backward nonlinear partial differential equations. The mathematical structure described by the MFG optimality conditions identifies the inductive biases of generative flows. We investigate the well-posedness and structure of normalizing flows, unravel the mathematical structure of SGMs, and derive an MFG formulation of Wasserstein gradient flows. From an algorithmic perspective, the optimality conditions yield Hamilton-Jacobi-Bellman (HJB) regularizers for enhanced training of generative models. In particular, we propose and demonstrate an HJB-regularized SGM with improved performance over standard SGMs. We present this framework as an MFG laboratory which serves as a platform for revealing new avenues of experimentation and invention of generative models.

翻訳日:2023-10-26 01:15:30 公開日:2023-10-24

# DiffTraj:拡散確率モデルによるGPS軌道生成

DiffTraj: Generating GPS Trajectory with Diffusion Probabilistic Model ( http://arxiv.org/abs/2304.11582v2 )

ライセンス: Link先を確認

Yuanshao Zhu, Yongchao Ye, Shiyao Zhang, Xiangyu Zhao, and James J.Q. Yu

(参考訳) GPS対応機器とデータ取得技術の広範囲な統合により、GPSトラジェクトリーデータの増加が加速し、時空間データマイニング研究の進歩が促進された。それにもかかわらず、GPSトラジェクトリには個人位置情報が含まれており、生データを扱う際に深刻なプライバシー上の懸念が生じる。この問題に対処するための有望なアプローチは、オリジナルのデータを生成されたプライバシフリーな代替手段に置き換える、トラジェクトリ生成である。軌道生成の可能性にもかかわらず、人間の行動の複雑な性質とその固有の確率特性は、高品質な軌道生成に挑戦する。本研究では,軌道生成のための時空間拡散確率モデル(DiffTraj)を提案する。このモデルは拡散モデルの生成能力と実際の軌道から導かれる時空間的特徴を効果的に組み合わせる。中心となる考え方は、逆軌道分解過程を通じて白いノイズから地理的軌跡を再構成し、合成することである。さらに、条件情報を埋め込んだトラジェクトリUNet(Traj-UNet)ディープニューラルネットワークを提案し、逆処理中のノイズレベルを正確に推定する。 2つの実世界のデータセットの実験により、DiffTrajは元の分布を保持しながら高忠実な軌道を生成するために直感的に適用可能であることが示された。さらに, 生成した結果は下流経路解析タスクをサポートし, 地理的分布評価の点で他の手法を著しく上回っている。

Pervasive integration of GPS-enabled devices and data acquisition technologies has led to an exponential increase in GPS trajectory data, fostering advancements in spatial-temporal data mining research. Nonetheless, GPS trajectories contain personal geolocation information, rendering serious privacy concerns when working with raw data. A promising approach to address this issue is trajectory generation, which involves replacing original data with generated, privacy-free alternatives. Despite the potential of trajectory generation, the complex nature of human behavior and its inherent stochastic characteristics pose challenges in generating high-quality trajectories. In this work, we propose a spatial-temporal diffusion probabilistic model for trajectory generation (DiffTraj). This model effectively combines the generative abilities of diffusion models with the spatial-temporal features derived from real trajectories. The core idea is to reconstruct and synthesize geographic trajectories from white noise through a reverse trajectory denoising process. Furthermore, we propose a Trajectory UNet (Traj-UNet) deep neural network to embed conditional information and accurately estimate noise levels during the reverse process. Experiments on two real-world datasets show that DiffTraj can be intuitively applied to generate high-fidelity trajectories while retaining the original distributions. Moreover, the generated results can support downstream trajectory analysis tasks and significantly outperform other methods in terms of geo-distribution evaluations.

翻訳日:2023-10-26 01:15:05 公開日:2023-10-24

# 量子特異値変換を用いたハミルトンシミュレーション:複雑性解析と線形vlasov-poisson方程式への応用

Hamiltonian simulation using quantum singular value transformation: complexity analysis and application to the linearized Vlasov-Poisson equation ( http://arxiv.org/abs/2304.08937v2 )

ライセンス: Link先を確認

Kiichiro Toyoizumi, Naoki Yamamoto, Kazuo Hoshino

(参考訳) 量子コンピューティングは、物理系のシミュレーション時間(より正確にはアルゴリズムのクエリ数)を高速化するために利用できる。その有望なアプローチの一つがハミルトニアンシミュレーション(HS)アルゴリズムである。近年、量子特異値変換(QSVT)がHSの最小シミュレーション時間を達成することが証明された。QSVTに基づくHSアルゴリズムの重要なサブルーチンは振幅増幅操作であり、これはQSVTの枠組みにおいてオブリビアス振幅増幅(oblivious amplitude amplification)または固定点振幅増幅によって実現できる。本研究では、QSVTに基づくHSの誤差とクエリ数を詳細に解析し、シミュレーション時間の点でオブリビアス法が固定点法より優れていることを示す。この結果に基づき、QSVTに基づくHSを1次元線形化Vlasov-Poisson方程式に適用し、線形ランダウ減衰をうまくシミュレーションできることを示す。

Quantum computing can be used to speed up the simulation time (more precisely, the number of queries of the algorithm) for physical systems; one such promising approach is the Hamiltonian simulation (HS) algorithm. Recently, it was proven that the quantum singular value transformation (QSVT) achieves the minimum simulation time for HS. An important subroutine of the QSVT-based HS algorithm is the amplitude amplification operation, which can be realized via the oblivious amplitude amplification or the fixed-point amplitude amplification in the QSVT framework. In this work, we execute a detailed analysis of the error and number of queries of the QSVT-based HS and show that the oblivious method is better than the fixed-point one in the sense of simulation time. Based on this finding, we apply the QSVT-based HS to the one-dimensional linearized Vlasov-Poisson equation and demonstrate that the linear Landau damping can be successfully simulated.

翻訳日:2023-10-26 01:14:21 公開日:2023-10-24

# Few-Shot Class-Incremental Learningに関する調査

A Survey on Few-Shot Class-Incremental Learning ( http://arxiv.org/abs/2304.08130v2 )

ライセンス: Link先を確認

Songsong Tian, Lusi Li, Weijun Li, Hang Ran, Xin Ning, Prayag Tiwari

(参考訳) 大規模なディープラーニングモデルは印象的だが、リアルタイムデータが利用できないと苦労する。 FSCIL(Few-shot class-incremental Learning)は、ディープニューラルネットワークにおいて、これまで学んだことを忘れずに、ラベル付きサンプルから新しいタスクを学習する上で重要な課題となる。このセットアップは、破滅的な忘れと過度な問題を引き起こし、モデルパフォーマンスに深刻な影響を与えます。 FSCILの研究は、データボリュームと取得時間に関するディープラーニングモデルの制限を克服し、機械学習モデルの実用性と適応性を向上させる。本稿では FSCIL に関する総合的な調査を行う。これまでの調査と異なり,2つの視点からfscilを導入することに着目し,30以上の理論研究と20以上の応用研究をレビューした。理論的には,従来の機械学習手法,メタ学習に基づく手法,特徴量と特徴量に基づく手法,リプレイに基づく手法,動的ネットワーク構造に基づく手法の5つのサブカテゴリに分けた新しい分類手法を提案する。また、FSCILのベンチマークデータセットに関する最近の理論的研究の評価を行った。アプリケーションの観点からは、FSCILは、自然言語処理やグラフと同様に、画像分類、オブジェクト検出、画像分割など、コンピュータビジョンの様々な分野において、目覚ましい成果を達成している。我々は重要な応用をまとめる。最後に,応用,問題設定,理論開発など今後の研究の方向性を指摘する。本稿では,FSCILの方法論,性能,アプリケーションの観点からの最近の進歩を包括的に分析する。

Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks to learn new tasks from just a few labeled samples without forgetting the previously learned ones. This setup easily leads to catastrophic forgetting and overfitting problems, severely affecting model performance. Studying FSCIL helps overcome deep learning model limitations on data volume and acquisition time, while improving practicality and adaptability of machine learning models. This paper provides a comprehensive survey on FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, focusing on introducing FSCIL from two perspectives, while reviewing over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization approach that divides the field into five subcategories, including traditional machine learning methods, meta-learning based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on benchmark datasets of FSCIL. From the application perspective, FSCIL has achieved impressive achievements in various fields of computer vision such as image classification, object detection, and image segmentation, as well as in natural language processing and graph. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from a methodological, performance, and application perspective.

翻訳日:2023-10-26 01:14:02 公開日:2023-10-24

# 大規模言語モデルを用いた文書レベル機械翻訳

Document-Level Machine Translation with Large Language Models ( http://arxiv.org/abs/2304.02210v2 )

ライセンス: Link先を確認

Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu

(参考訳) ChatGPTのような大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクに対して、一貫性があり、結束性が高く、関連性のある流暢な回答を生成できる。本稿では、文書レベルの機械翻訳(MT)をテストベッドとして、談話モデリングにおけるLLMの能力を詳細に評価する。本研究は次の3つの側面に焦点を当てる。 1) 文脈を考慮したプロンプトの効果: 異なるプロンプトが文書レベルの翻訳品質と談話現象に与える影響を調べる。 2) 翻訳モデルの比較: ChatGPTの翻訳性能を商用MTシステムおよび先進的な文書レベルMT手法と比較する。 3) 談話モデリング能力の分析: LLMに符号化された談話知識をさらに探り、訓練技術が談話モデリングに与える影響を明らかにする。多数のベンチマークで評価した結果、驚くべきことにLLMは優れた性能を示し、文書レベル翻訳の新たなパラダイムとなる可能性があることがわかった。 1) GPT-3.5とGPT-4は、強力な長文モデリング能力を活かし、人手評価において商用MTシステムを上回る。 2) GPT-4は、言語知識のプロービングにおいてGPT-3.5より高い能力を示す。本研究はMTにおけるLLMの課題と機会を明らかにするものであり、今後のLLMの設計と評価の刺激となることを期待する。データとアノテーションはhttps://github.com/longyuewangdcu/Document-MT-LLMで公開している。

Large language models (LLMs) such as ChatGPT can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. Taking document-level machine translation (MT) as a testbed, this paper provides an in-depth evaluation of LLMs' ability on discourse modeling. The study focuses on three aspects: 1) Effects of Context-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modelling Abilities, where we further probe discourse knowledge encoded in LLMs and shed light on impacts of training techniques on discourse modeling. By evaluating on a number of benchmarks, we surprisingly find that LLMs have demonstrated superior performance and show potential to become a new paradigm for document-level translation: 1) leveraging their powerful long-text modeling capabilities, GPT-3.5 and GPT-4 outperform commercial MT systems in terms of human evaluation; 2) GPT-4 demonstrates a stronger ability for probing linguistic knowledge than GPT-3.5. This work highlights the challenges and opportunities of LLMs for MT, which we hope can inspire the future design and evaluation of LLMs. We release our data and annotations at https://github.com/longyuewangdcu/Document-MT-LLM.

翻訳日:2023-10-26 01:12:52 公開日:2023-10-24

# 完全配向量子センサを用いた超伝導渦の広視野定量磁気イメージング

Wide-field quantitative magnetic imaging of superconducting vortices using perfectly aligned quantum sensors ( http://arxiv.org/abs/2304.01024v2 )

ライセンス: Link先を確認

Shunsuke Nishimura, Taku Kobayashi, Daichi Sasaki, Takeyuki Tsuji, Takayuki Iwasaki, Mutsuko Hatano, Kento Sasaki, and Kensuke Kobayashi

(参考訳) 超伝導渦の可視化に様々な技術が応用され、その電磁応答の手がかりとなっている。ここでは, 完全に整列したダイヤモンド量子センサを用いて, 超伝導薄膜中の渦の漏れ磁場を広視野かつ定量的に可視化する。センサの不均一性の影響を軽減する解析により,YBa$_2$Cu$_3$O$_{7-\delta}$における単一渦の磁束を,精度$\pm10~\%$で可視化する。得られた渦形状は理論モデルと一致し, 磁場侵入長とその温度依存性は従来の研究と一致し, 本手法の精度と広い適用性が証明された。この広視野イメージングは、原理的には極端条件下でも機能し、様々な超伝導体のキャラクタリゼーションを可能にする。

Various techniques have been applied to visualize superconducting vortices, providing clues to their electromagnetic response. Here, we present a wide-field, quantitative imaging of the stray field of the vortices in a superconducting thin film using perfectly aligned diamond quantum sensors. Our analysis, which mitigates the influence of the sensor inhomogeneities, visualizes the magnetic flux of single vortices in YBa$_2$Cu$_3$O$_{7-\delta}$ with an accuracy of $\pm10~\%$. The obtained vortex shape is consistent with the theoretical model, and penetration depth and its temperature dependence agree with previous studies, proving our technique's accuracy and broad applicability. This wide-field imaging, which in principle works even under extreme conditions, allows the characterization of various superconductors.

翻訳日:2023-10-26 01:12:30 公開日:2023-10-24

# 議論中の暗黙の質問としての詳述的単純化

Elaborative Simplification as Implicit Questions Under Discussion ( http://arxiv.org/abs/2305.10387v3 )

ライセンス: Link先を確認

Yating Wu, William Sheffield, Kyle Mahowald and Junyi Jessy Li

(参考訳) 自動テキスト簡易化(automated text simplification)は、子供や創発的なバイリンガルなどの人々にとって、テキストをより利用しやすくするための技術であり、エンコーダ・デコーダモデルを用いた複雑な文から簡易文への単言語翻訳タスクとしてよく考えられている。この見方は、簡易化されたテキストに新しい情報が加えられる詳述的単純化(elaborative simplification)を説明できない。本稿では,議論中の問い(Question Under Discussion, QUD)フレームワークのレンズを通して詳述的単純化を捉えることを提案し,詳述を暗黙的な問いに対する明示的な答えとみなすことで,著者が何を詳述するのか,どのように詳述するのか,詳述が談話の文脈にどのように適合するのかを調べる頑健な方法を提供する。我々は,これらの現象を研究するために,暗黙のQUDを伴う1.3Kの詳述からなるElabQUDを紹介する。(質問生成による)QUDの明示的なモデル化は、詳述的単純化と、詳述が談話の他の部分とどのように結びつくかについての本質的な理解をもたらすだけでなく、詳述生成の質を大幅に向上させることを示す。

Automated text simplification, a technique useful for making text more accessible to people such as children and emergent bilinguals, is often thought of as a monolingual translation task from complex sentences to simplified sentences using encoder-decoder models. This view fails to account for elaborative simplification, where new information is added into the simplified text. This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework, providing a robust way to investigate what writers elaborate upon, how they elaborate, and how elaborations fit into the discourse context by viewing elaborations as explicit answers to implicit questions. We introduce ElabQUD, consisting of 1.3K elaborations accompanied with implicit QUDs, to study these phenomena. We show that explicitly modeling QUD (via question generation) not only provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse, but also substantially improves the quality of elaboration generation.

翻訳日:2023-10-26 01:07:06 公開日:2023-10-24

# オフライン強化学習へのミニマリストアプローチの再検討

Revisiting the Minimalist Approach to Offline Reinforcement Learning ( http://arxiv.org/abs/2305.09836v2 )

ライセンス: Link先を確認

Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov

(参考訳) 近年、オフライン強化学習(RL)が大幅に進歩し、複雑さの度合いの異なる多数のアルゴリズムが開発された。これらのアルゴリズムは注目すべき改善をもたらしたが、多くは中核的なアルゴリズムの進歩を超えてその有効性に影響を与える一見小さな設計選択を取り入れている。しかし、これらの設計選択が確立されたベースラインに与える影響は十分に研究されていない。本稿では,オフラインRLにおける最近の研究の回顧的分析を行い,TD3+BC法の上にそうした設計要素を統合した最小限のアルゴリズムであるReBRACを提案することで,このギャップを埋めることを目的とする。 D4RLとV-D4RLのベンチマークを用いて、固有受容的および視覚的な状態空間を持つ51のデータセット上でReBRACを評価し、オフラインおよびオフラインからオンラインへの両方の設定において、アンサンブルフリーな手法の中で最先端の性能を実証した。これらの設計選択の有効性をさらに説明するために、数千の実験規模で大規模なアブレーション研究とハイパーパラメータ感度分析を行う。

Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.

翻訳日:2023-10-26 01:06:29 公開日:2023-10-24

# 顔認証の視覚的顕著性による説明に向けて

Towards Visual Saliency Explanations of Face Verification ( http://arxiv.org/abs/2305.08546v4 )

ライセンス: Link先を確認

Yuhang Lu, Zewei Xu, Touradj Ebrahimi

(参考訳) 過去数年間、深層畳み込みニューラルネットワークは、認証と識別の両方のシナリオにおいて、顔認識(FR)技術のフロンティアを推し進めてきた。精度が高いにもかかわらず、説明性に欠けるとしてしばしば批判される。深層顔認識システムにおける意思決定プロセスの理解に対する需要が高まっている。近年の研究では、説明としての視覚的顕著性マップの利用が研究されているが、顔認識の文脈での議論や分析が欠如していることが多い。本稿では,説明可能な顔認証タスクに集中し,新しい説明枠組みを提案する。まず, 深層FRモデルによる決定に焦点をあてた, 顕著性に基づく説明方法の定義を与える。第二に,CorrRISEというモデルに依存しない新しい説明法を提案し,任意の顔画像ペアの類似領域と非類似領域の両方を明らかにする顕著性マップを生成する。次に、顔認証における一般的な視覚的顕著性説明手法の性能を測定するための評価手法を設計する。最後に, 視覚的および定量的な結果から, 提案するCorrRISE法は他の最先端の説明可能な顔認証手法と比較して有望な結果を示した。

In the past years, deep convolutional neural networks have been pushing the frontier of face recognition (FR) techniques in both verification and identification scenarios. Despite the high accuracy, they are often criticized for lacking explainability. There has been an increasing demand for understanding the decision-making process of deep face recognition systems. Recent studies have investigated the usage of visual saliency maps as an explanation, but they often lack a discussion and analysis in the context of face recognition. This paper concentrates on explainable face verification tasks and conceives a new explanation framework. Firstly, a definition of the saliency-based explanation method is provided, which focuses on the decisions made by the deep FR model. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions of any given pair of face images. Then, an evaluation methodology is designed to measure the performance of general visual saliency explanation methods in face verification. Finally, substantial visual and quantitative results have shown that the proposed CorrRISE method demonstrates promising results in comparison with other state-of-the-art explainable face verification approaches.

翻訳日:2023-10-26 01:05:48 公開日:2023-10-24

# 広視野眼底画像からの網膜疾患認識のための教師ありドメイン適応

Supervised Domain Adaptation for Recognizing Retinal Diseases from Wide-Field Fundus Images ( http://arxiv.org/abs/2305.08078v2 )

ライセンス: Link先を確認

Qijie Wei, Jingyuan Yang, Bo Wang, Jinrui Wang, Jianchun Zhao, Xinyu Zhao, Sheng Yang, Niranchana Manivannan, Youxin Chen, Dayong Ding, Jing Zhou and Xirong Li

(参考訳) 本稿では,広視野 (WF) と超広視野 (UWF) の眼底画像から複数の網膜疾患を認識するという新たな課題を扱う。既存の大量のラベル付きカラー眼底写真(CFP)データと、比較的少量のWFおよびUWFデータを有効利用するために、クロスドメイン協調学習(CdCL)という教師ありドメイン適応手法を提案する。教師なしドメイン適応における固定比に基づくミックスアップの成功に触発されて、我々はこの戦略を現在のタスクに転用する。 CFP画像とWF/UWF画像の視野の本質的な違いにより,ミックスアップサンプルには,CFP画像の解剖学的構造がWF/UWF画像のものよりかなり大きくなるというスケールバイアスが自然に存在する。 CdCL法は,Transformerを用いてスケール不変な特徴を生成するスケールバイアス補正によって,この問題を解決する。 WF画像とUWF画像の両方をカバーする複数のデータセットに関する広範囲な実験によって示されているように、提案手法は多くの競合ベースラインと比較して優れている。

This paper addresses the emerging task of recognizing multiple retinal diseases from wide-field (WF) and ultra-wide-field (UWF) fundus images. For an effective use of existing large amount of labeled color fundus photo (CFP) data and the relatively small amount of WF and UWF data, we propose a supervised domain adaptation method named Cross-domain Collaborative Learning (CdCL). Inspired by the success of fixed-ratio based mixup in unsupervised domain adaptation, we re-purpose this strategy for the current task. Due to the intrinsic disparity between the field-of-view of CFP and WF/UWF images, a scale bias naturally exists in a mixup sample that the anatomic structure from a CFP image will be considerably larger than its WF/UWF counterpart. The CdCL method resolves the issue by Scale-bias Correction, which employs Transformers for producing scale-invariant features. As demonstrated by extensive experiments on multiple datasets covering both WF and UWF images, the proposed method compares favorably against a number of competitive baselines.

翻訳日:2023-10-26 01:05:31 公開日:2023-10-24

# 医用画像の拡散モデルに留意すること --脳MRIおよび胸部X線画像の記憶におけるGANとの比較

Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images ( http://arxiv.org/abs/2305.07644v2 )

ライセンス: Link先を確認

Muhammad Usman Akbar, Wuhao Wang, Anders Eklund

(参考訳) 拡散モデルは当初テキスト・画像生成のために開発され、現在では高品質な合成画像の生成に利用されている。 GANに続いて登場した拡散モデルは,様々な評価指標を用いて顕著な結果を示している。しかし、FIDやISなどの一般的なメトリクスは、拡散モデルが単にトレーニング画像を再現しているかどうかを判断するのに適していない。ここでは、BRATS20、BRATS21および胸部X線肺炎データセットを用いてStyleGANおよび拡散モデルを訓練し、脳MRIおよび胸部X線画像を合成し、合成画像とすべてのトレーニング画像との相関を測定する。以上の結果から,拡散モデルは,特に小さなデータセットの場合や3次元ボリュームの2次元スライスを用いた場合,StyleGANと比較してトレーニング画像を記憶しやすいことが示された。合成画像の共有が最終的な目的であれば、研究者は医用イメージングに拡散モデルを使用する際に注意する必要がある。

Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high-quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and diffusion models, using BRATS20, BRATS21 and a chest x-ray pneumonia dataset, to synthesize brain MRI and chest x-ray images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are more likely to memorize the training images, compared to StyleGAN, especially for small datasets and when using 2D slices from 3D volumes. Researchers should be careful when using diffusion models for medical imaging, if the final goal is to share the synthetic images.

翻訳日:2023-10-26 01:05:11 公開日:2023-10-24
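
The memorization check mentioned in the abstract above (correlating each synthetic image with every training image) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: images are assumed to be flattened lists of pixel intensities of equal length, Pearson correlation is assumed as the similarity measure, and the names `pearson` and `max_training_correlation` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length flattened images."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant image: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_a * var_b)

def max_training_correlation(synthetic, training_images):
    """Highest correlation between one synthetic image and any training image."""
    return max(pearson(synthetic, t) for t in training_images)

# Hypothetical toy data: the synthetic image is a rescaled copy of the first training image.
training = [[0, 1, 2, 3], [3, 1, 0, 2]]
copied = [0, 2, 4, 6]
print(max_training_correlation(copied, training))
```

A synthetic image whose maximum correlation is close to 1 would be a candidate for a memorized training sample; where to set a threshold is a separate judgment that this sketch does not address.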

# VPGTrans: LLM間でのビジュアルプロンプトジェネレータの転送

VPGTrans: Transfer Visual Prompt Generator across LLMs ( http://arxiv.org/abs/2305.01278v2 )

ライセンス: Link先を確認

Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, and Tat-Seng Chua

(参考訳) 膨大な画像テキストペアでスクラッチから事前学習して新たなマルチモーダルLLM (MLLM) を開発することは非常に資源を消費しうるが, 既存のLLMを比較的軽量なビジュアルプロンプトジェネレータ (VPG) と接続することは実現可能なパラダイムとなる。しかし、MLLMのVPG部分のさらなるチューニングには依然として不可避な計算コストがかかり、数千GPU時間と数百万件のトレーニングデータを必要とする。 1つの代替策は、既存のMLLMからターゲットMLLMに既存のVPGを転送することである。本研究では,LLM間のVPG転送可能性について初めて検討し,VPG転送コストを低減するための解決策を探究する。我々はまず, 異なるLLMサイズ(例えば, 小から大)および異なるLLMタイプにわたるVPG転送について検討し, 転送効率を最大化するための重要な因子を診断する。この観察に基づき,VPGTransという単純かつ高効率な2段階の転送フレームワークを設計する。広範な実験を通じて,VPGTransは,パフォーマンスを損なうことなく,転送学習プロセスを大幅に高速化できることを実証する。注目すべきことに、BLIP-2 OPT$_\text{2.7B}$からBLIP-2 OPT$_\text{6.7B}$へのVPG転送を、VPGをOPT$_\text{6.7B}$にスクラッチから接続する場合と比べて10倍以上のスピードアップと10.7%のトレーニングデータで達成する。さらに、一連の興味深い発見とその背後にある潜在的な根拠を提供し、議論する。最後に、最近リリースされたLLaMAとVicuna LLMを用いて、VL-LLaMAとVL-Vicunaを含む2つの新しいMLLMをカスタマイズすることで、VPGTransアプローチの実用価値を示す。

While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the MLLM still suffers from indispensable computational costs, i.e., requiring thousands of GPU hours and millions of training data. One alternative solution is to transfer an existing VPG from any existing MLLMs for the target MLLM. In this work, we for the first time investigate the VPG transferability across LLMs, and explore a solution to reduce the cost of VPG transfer. We first study the VPG transfer across different LLM sizes (e.g., small-to-large), and across different LLM types, through which we diagnose the key factors to maximize the transfer efficiency. Based on our observation, we design a two-stage transfer framework named VPGTrans, which is simple yet highly effective. Through extensive experiments, we demonstrate that VPGTrans helps significantly speed up the transfer learning process without compromising performance. Remarkably, it helps achieve the VPG transfer from BLIP-2 OPT$_\text{2.7B}$ to BLIP-2 OPT$_\text{6.7B}$ with over 10 times speed-up and 10.7% training data compared with connecting a VPG to OPT$_\text{6.7B}$ from scratch. Further, a series of intriguing findings and potential rationales behind them are provided and discussed. Finally, we showcase the practical value of our VPGTrans approach, by customizing two novel MLLMs, including VL-LLaMA and VL-Vicuna, with recently released LLaMA and Vicuna LLMs.

翻訳日:2023-10-26 01:04:24 公開日:2023-10-24

# GPT-2はどのように「より大きい」を計算するのか? 事前学習言語モデルにおける数学的能力の解釈

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model ( http://arxiv.org/abs/2305.00586v4 )

ライセンス: Link先を確認

Michael Hanna, Ollie Liu and Alexandre Variengien

(参考訳) 事前訓練された言語モデルは、明示的に訓練されていないタスクに驚くほど適しているが、これらの機能の実装方法はあまり理解されていない。本稿では,事前学習された言語モデルによってしばしば得られる基本的な数学的能力について検討する。具体的には,GPT-2 smallの(限定的な)数学的能力を説明するために,機械論的解釈可能性の技術を用いる。ケーススタディとして,「The war lasted from the year 1732 to the year 17(戦争は1732年から17…年まで続いた)」などの文を入力として,有効な2桁の終了年(32より大きい年)を予測する能力について検討する。まず、このタスクの出力を計算するGPT-2 smallの計算グラフの小さなサブセットである回路を同定する。そして、各回路部品の役割を説明し、GPT-2 smallの最終段の多層パーセプトロン群が、開始年より大きい終了年の確率を高めることを示す。最後に、この回路を活性化する関連タスクを見つける。以上の結果から,GPT-2 smallは多種多様なコンテキストにまたがって活性化する複雑だが汎用的な機構を用いて「より大きい」の計算を行うことが示唆される。

Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex but general mechanism that activates across diverse contexts.

翻訳日:2023-10-26 01:03:43 公開日:2023-10-24

# InstructAlign: 連続的な言語間インストラクションチューニングによる高低リソース言語アライメント

InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning ( http://arxiv.org/abs/2305.13627v2 )

ライセンス: Link先を確認

Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

(参考訳) 命令でチューニングされた大規模言語モデル(LLM)は、様々なタスクや言語で顕著な能力を示している。しかし、利用可能なデータが不足しているため、表現不足の言語に一般化する能力は限られている。さらに、命令調整されたLLMに新しい言語を直接適応させると、破滅的な忘却が生じ、マルチタスク能力が失われる可能性がある。この問題に対処するために,LLMが新たな未知言語を学習済みの高リソース言語と整合させられるように,連続的な言語間命令チューニングを使用するInstructAlignを提案する。実験結果は,破滅的な忘却を防ぎつつ,限られた並列データで低リソース言語をモデルに理解させるうえでInstructAlignが有効であることを示している。我々の研究は、言語適応手法の進歩、特に表現不足の言語への命令調整LLMの適応に寄与する。私たちのコードは https://github.com/HLTCHKUST/InstructAlign で公開されている。

Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. Additionally, directly adapting new languages to instruction-tuned LLMs can result in catastrophic forgetting, which leads to the loss of multitasking ability. To address this issue, we propose InstructAlign which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages. Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting. Our work contributes to the advancement of language adaptation methods, particularly for adapting instruction-tuned LLMs to underrepresented languages. Our code is released on https://github.com/HLTCHKUST/InstructAlign

翻訳日:2023-10-26 00:55:37 公開日:2023-10-24

# KineticNet: 軌道自由密度汎関数理論のための伝達可能な運動エネルギー関数の深層学習

KineticNet: Deep learning a transferable kinetic energy functional for orbital-free density functional theory ( http://arxiv.org/abs/2305.13316v2 )

ライセンス: Link先を確認

Roman Remme, Tobias Kaczun, Maximilian Scheurer, Andreas Dreuw, Fred A. Hamprecht

(参考訳) 軌道自由密度汎関数理論(OF-DFT)は、最小コストで基底状態の分子特性を計算できる可能性を持つ。しかし、運動エネルギーを電子密度のみの汎関数として計算できないことが、その妨げとなってきた。ここでは、より高価なコーン・シャム密度汎関数理論によって提供される正解データから運動エネルギー汎関数を学習する。このような学習は2つの主要な課題に直面する。すなわち、メモリフットプリントをGPU上で計算可能な範囲に抑えつつ、モデルに十分な表現力と空間的コンテキストを与えること、および、初期推定が貧弱な場合でも反復的な密度最適化を可能にする、十分に広い分布のトレーニングデータを作成することである。そこで我々は,分子求積グリッド上の量の予測に適応した点畳み込みに基づく同変ディープニューラルネットワークアーキテクチャであるKineticNetを提案する。重要な貢献には、核カスプ近傍で十分な空間分解能を有する畳み込みフィルタ、複数の結合長にわたって情報を伝達する原子中心のスパースだが表現力のあるアーキテクチャ、およびランダムな外部ポテンシャルによる摂動下で基底状態密度を求めることで多様なトレーニングデータを生成する新しい戦略が含まれる。 KineticNetは、小さな分子の入力密度とジオメトリにわたって、学習された汎関数の化学的精度を初めて達成した。 2電子系については、化学的精度でのOF-DFT密度最適化もあわせて示す。

Orbital-free density functional theory (OF-DFT) holds the promise to compute ground state molecular properties at minimal cost. However, it has been held back by our inability to compute the kinetic energy as a functional of the electron density only. We here set out to learn the kinetic energy functional from ground truth provided by the more expensive Kohn-Sham density functional theory. Such learning is confronted with two key challenges: Giving the model sufficient expressivity and spatial context while limiting the memory footprint to afford computations on a GPU; and creating a sufficiently broad distribution of training data to enable iterative density optimization even when starting from a poor initial guess. In response, we introduce KineticNet, an equivariant deep neural network architecture based on point convolutions adapted to the prediction of quantities on molecular quadrature grids. Important contributions include convolution filters with sufficient spatial resolution in the vicinity of the nuclear cusp, an atom-centric sparse but expressive architecture that relays information across multiple bond lengths; and a new strategy to generate varied training data by finding ground state densities in the face of perturbations by a random external potential. KineticNet achieves, for the first time, chemical accuracy of the learned functionals across input densities and geometries of tiny molecules. For two electron systems, we additionally demonstrate OF-DFT density optimization with chemical accuracy.

翻訳日:2023-10-26 00:55:21 公開日:2023-10-24

# GQA:マルチヘッドチェックポイントを用いた汎用マルチクエリトランスフォーマモデルの訓練

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints ( http://arxiv.org/abs/2305.13245v2 )

ライセンス: Link先を確認

Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai

(参考訳) 単一のキー値ヘッドのみを使用するマルチクエリアテンション(MQA)は、デコーダ推論を大幅に高速化する。しかし、MQAは品質の低下につながる可能性があり、さらに、より高速な推論のためだけに別のモデルをトレーニングすることは望ましくないかもしれない。我々は、(1) 既存のマルチヘッド言語モデルのチェックポイントを、元の事前学習計算量の5%を用いてMQAを持つモデルにアップトレーニングするためのレシピを提案し、(2) 中間的な数(1より多く、クエリヘッド数より少ない数)のキー値ヘッドを使用する、マルチクエリアテンションの一般化であるグループ化クエリアテンション(GQA)を導入する。アップトレーニングされたGQAは、MQAに匹敵する速度で、マルチヘッドアテンションに近い品質を実現することを示す。

Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce grouped-query attention (GQA), a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads. We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA.

翻訳日:2023-10-26 00:54:57 公開日:2023-10-24
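
The head-sharing idea in the GQA abstract above can be sketched as follows: several query heads read from one shared key-value head. This is a minimal pure-Python sketch for a single decoding step, not the authors' implementation; the names `kv_head_for` and `grouped_query_attention`, the contiguous grouping of query heads, and the list-based tensor layout are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kv_head_for(query_head, num_query_heads, num_kv_heads):
    """Map a query head to the key-value head shared by its group."""
    assert num_query_heads % num_kv_heads == 0
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size

def grouped_query_attention(queries, keys, values):
    """queries: [num_query_heads][dim]
    keys, values: [num_kv_heads][seq_len][dim]
    Returns one output vector per query head."""
    num_q, num_kv = len(queries), len(keys)
    outputs = []
    for h, q in enumerate(queries):
        g = kv_head_for(h, num_q, num_kv)  # shared key-value head for this query head
        scale = math.sqrt(len(q))
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys[g]]
        weights = softmax(scores)
        dim = len(values[g][0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values[g]))
                        for d in range(dim)])
    return outputs
```

With `num_kv_heads == 1` this reduces to multi-query attention, and with `num_kv_heads == num_query_heads` it reduces to ordinary multi-head attention; the intermediate settings are the grouped case the abstract describes.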

# SpokenWOZ:タスク指向対話エージェントのための大規模音声テキストベンチマーク

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents ( http://arxiv.org/abs/2305.13040v4 )

ライセンス: Link先を確認

Shuzheng Si, Wentao Ma, Haoyu Gao, Yuchuan Wu, Ting-En Lin, Yinpei Dai, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li

(参考訳) タスク指向対話(TOD)モデルは近年大きな進歩を遂げている。しかし,従来の研究は主にアノテータが書いたデータセットに焦点を当てており,学術研究と実世界の音声会話シナリオとの間にギャップが生じている。いくつかの小規模な音声TODデータセットが、ASRエラーなどの頑健性の問題に対処するために提案されているが、音声会話に特有の課題は無視されている。この制限に対処するために,8つのドメイン,203kのターン,5.7kの対話,人間同士の音声会話からの249時間の音声を含む,音声TODのための大規模音声テキストデータセットであるSpokenWOZを導入する。 SpokenWOZはさらに、逐語的な処理や話し言葉における推論といった、音声に共通する特徴を取り入れている。これらの特徴に基づき,新たな課題としてクロスターンスロットと推論スロットの検出を提示する。テキストモーダルモデル,新たに提案されたデュアルモーダルモデル,ChatGPTなどのLLMを含む,さまざまなベースラインで実験を行う。その結果、現在のモデルには音声会話において依然として大きな改善の余地があり、最も先進的な対話状態追跡器でも結合目標精度(joint goal accuracy)は25.65%にとどまり、SOTAのエンドツーエンドモデルでも52.1%の対話でしかユーザ要求を正しく完了できないことが示された。データセット、コード、およびリーダーボードは、https://spokenwoz.github.io/SpokenWOZ-github.io/ で入手できる。

Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audios from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. The results show that the current models still have substantial room for improvement in spoken conversation, where the most advanced dialogue state tracker only achieves 25.65% in joint goal accuracy and the SOTA end-to-end model only correctly completes the user request in 52.1% of dialogues. The dataset, code, and leaderboard are available: https://spokenwoz.github.io/SpokenWOZ-github.io/.

翻訳日:2023-10-26 00:54:35 公開日:2023-10-24
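
The joint goal accuracy figure quoted in the SpokenWOZ abstract above is a strict metric: a turn counts as correct only if the entire predicted dialogue state matches the reference. The sketch below shows the usual definition, assuming each state is a dictionary from slot name to value; the function name and the example slots are hypothetical and not taken from the dataset.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose predicted slot-value state equals the gold state exactly."""
    assert len(predicted_states) == len(gold_states)
    if not gold_states:
        return 0.0
    correct = sum(1 for p, g in zip(predicted_states, gold_states) if p == g)
    return correct / len(gold_states)

# Hypothetical two-turn dialogue: the second turn misses one slot, so it counts as wrong.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north"},
]
print(joint_goal_accuracy(pred, gold))
```

Because a single wrong or missing slot fails the whole turn, this metric tends to be much lower than per-slot accuracy.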

# ViTの形状を整える:計算最適モデル設計のためのスケーリング法則

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design ( http://arxiv.org/abs/2305.13035v4 )

ライセンス: Link先を確認

Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer

(参考訳) スケーリング法則は、最近、与えられた計算時間に対して計算最適なモデルサイズ(パラメータの数)を導出するために用いられている。我々はこのような手法を発展・改良して、幅や深さなどの計算最適なモデル形状を推定し、視覚トランスフォーマーでこれをうまく実装した。我々の形状最適化型視覚トランスフォーマーSoViTは、同等の計算量で事前訓練されているにもかかわらず、サイズが2倍を超えるモデルと競合する結果を得る。例えば、SoViT-400m/14はILSVRC2012で90.3%の微調整精度を達成し、はるかに大きなViT-g/14を超え、同じ設定でViT-G/14に近づき、しかも推論コストは半分未満である。画像分類,キャプション,VQA,ゼロショット転送など,複数のタスクにわたって徹底的な評価を行い,幅広い領域にわたるモデルの有効性を実証するとともに,その限界を特定した。全体として、私たちの発見は視覚モデルを盲目的にスケールアップするという一般的なアプローチに疑問を投げかけ、より情報に基づいたスケーリングへの道を開くものである。

Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, while also incurring less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for a more informed scaling.

翻訳日:2023-10-26 00:54:07 公開日:2023-10-24

# 最近傍の機械翻訳は出力投影層上でのメタオプティマイザである

Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer ( http://arxiv.org/abs/2305.13034v2 )

ライセンス: Link先を確認

Ruize Gao, Zhirui Zhang, Yichao Du, Lemao Liu, Rui Wang

(参考訳) Nearest Neighbor Machine Translation ($k$NN-MT)は、訓練済みニューラル機械翻訳(NMT)モデルとドメイン固有のトークンレベルの検索を統合することで、ドメイン適応タスクにおいて大きな成功を収めた。しかし、その成功の背景にある理由は十分に調査されていない。本稿では,理論的および実証的研究を通じて,$k$NN-MTを包括的に分析する。まず,$k$NN-MTの動作機構について、NMTの出力射影層上で勾配降下を暗黙的に実行する効率的な手法であるという新たな知見を提供し,それがモデル微調整の特定の事例であることを示す。その後、$k$NN-MTとモデル全体の微調整との性能の違いを調べるために、複数ドメインの実験と単語レベルの分析を行う。その結果、(1) $k$NN-MTにアダプタを組み合わせることで、ドメイン内テストセットでは微調整と同等の翻訳性能が得られ、ドメイン外テストセットではより良い性能が得られること、(2) ドメイン内の低頻度単語のリコールでは微調整が$k$NN-MTを大きく上回るが、このギャップは追加のアダプタ層でコンテキスト表現を最適化することで埋められることが示唆された。

Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze $k$NN-MT through theoretical and empirical studies. Initially, we provide new insights into the working mechanism of $k$NN-MT as an efficient technique to implicitly execute gradient descent on the output projection layer of NMT, indicating that it is a specific case of model fine-tuning. Subsequently, we conduct multi-domain experiments and word-level analysis to examine the differences in performance between $k$NN-MT and entire-model fine-tuning. Our findings suggest that: (1) Incorporating $k$NN-MT with adapters yields comparable translation performance to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (2) Fine-tuning significantly outperforms $k$NN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers.

翻訳日:2023-10-26 00:53:37 公開日:2023-10-24
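
As background for the abstract above, the token-level retrieval step of $k$NN-MT is commonly formulated as an interpolation between the NMT model's next-token distribution and a distribution built from retrieved neighbors. The sketch below shows that standard formulation only; it does not reproduce the paper's gradient-descent analysis. The function names, the temperature and interpolation weight defaults, and the toy tokens are assumptions for illustration.

```python
import math

def knn_distribution(neighbors, temperature=10.0):
    """neighbors: list of (target_token, distance) pairs retrieved from a datastore.
    Closer neighbors receive exponentially more weight."""
    weights = {}
    for token, dist in neighbors:
        weights[token] = weights.get(token, 0.0) + math.exp(-dist / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

def interpolate(nmt_probs, knn_probs, lam=0.5):
    """Final distribution: lam * p_kNN + (1 - lam) * p_NMT over the union vocabulary."""
    vocab = set(nmt_probs) | set(knn_probs)
    return {t: lam * knn_probs.get(t, 0.0) + (1.0 - lam) * nmt_probs.get(t, 0.0)
            for t in vocab}

# Hypothetical example: retrieval favors the in-domain token "stent".
nmt = {"stent": 0.2, "tube": 0.8}
knn = knn_distribution([("stent", 1.0), ("stent", 2.0), ("tube", 30.0)])
print(interpolate(nmt, knn, lam=0.5))
```

Since both inputs are probability distributions, the interpolated result also sums to one; `lam` controls how far retrieval can shift the base model's prediction.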

# ChatGPTを蒸留した説明可能な学生解答自動評価

Distilling ChatGPT for Explainable Automated Student Answer Assessment ( http://arxiv.org/abs/2305.12962v2 )

ライセンス: Link先を確認

Jiazheng Li, Lin Gui, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

(参考訳) 説明可能で忠実なフィードバックを提供することは,学生の解答の自動評価に不可欠である。本稿では,最先端の大規模言語モデルであるChatGPTを用いて,学生の解答の採点と根拠(rationale)の生成を同時に行う新しいフレームワークを提案する。我々は,ChatGPTに異なるテンプレートでプロンプトを与えて根拠を収集することで適切な指示を特定し,一貫性のない根拠は採点基準に適合するように改良する。改良されたChatGPTの出力により、学生の解答の評価と根拠の提示を同時に行う、より小さな言語モデルを微調整できる。ベンチマークデータセットでの広範な実験により,提案手法はChatGPTと比較して全体のQWKスコアを11%向上させることが示された。さらに,詳細な分析と人手評価により,提案手法によって生成された根拠がChatGPTのものに匹敵することを示した。このアプローチは,教育における説明可能な自動評価を実現するための有効なソリューションを提供する。コードは https://github.com/lijiazheng99/aera で入手できる。

Providing explainable and faithful feedback is crucial for automated student answer assessment. In this paper, we introduce a novel framework that explores using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation. We identify the appropriate instructions by prompting ChatGPT with different templates to collect the rationales, where inconsistent rationales are refined to align with marking standards. The refined ChatGPT outputs enable us to fine-tune a smaller language model that simultaneously assesses student answers and provides rationales. Extensive experiments on the benchmark dataset show that the proposed method improves the overall QWK score by 11% compared to ChatGPT. Furthermore, our thorough analysis and human evaluation demonstrate that the rationales generated by our proposed method are comparable to those of ChatGPT. Our approach provides a viable solution to achieve explainable automated assessment in education. Code available at https://github.com/lijiazheng99/aera.

翻訳日:2023-10-26 00:53:12 公開日:2023-10-24
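
The QWK score reported in the abstract above is quadratic weighted kappa, an agreement measure between two sets of ordinal scores that penalizes large disagreements more heavily than small ones. The following is a minimal sketch of the standard definition, assuming integer labels in `0 .. num_labels - 1` and at least two label values; the function name is hypothetical and this is not the authors' evaluation script.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    assert len(rater_a) == len(rater_b) and num_labels >= 2
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_labels)) for j in range(num_labels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # chance agreement from the marginals
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        return 1.0  # both raters used a single identical label throughout
    return 1.0 - weighted_observed / weighted_expected

print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 1, 2], num_labels=3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance given the score distributions, and negative values mean systematic disagreement.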

# NLP研究におけるパラダイムシフトの通時的分析--いつ、どのように、なぜ?

A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? ( http://arxiv.org/abs/2305.12920v2 )

ライセンス: Link先を確認

Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

(参考訳) 科学分野の基本概念と傾向を理解することは、その継続的な進歩を保ち続けるために不可欠である。本研究では,因果発見と推論手法を用いて,科学分野における研究トピックの進化を分析するための体系的枠組みを提案する。我々は,NLPにおける研究トピックの進化の多様な側面を包含する3つの変数を定義し,因果探索アルゴリズムを用いてこれらの変数間の因果関係を明らかにする。その後、これらの関係の強度を測定するためにこの構造を利用する。 ACLアンソロジーコーパスに関する広範な実験を行うことにより、我々のフレームワークは、幅広いNLP研究トピックの進化的傾向と根本原因を効果的に発見できることを実証する。具体的には、タスクとメソッドがNLPの研究の主要な要因であることを示し、データセットは従うが、メトリクスは最小限の影響を持つ。

Understanding the fundamental concepts and trends in a scientific field is crucial for keeping abreast of its continuous advancement. In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP and utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data. Subsequently, we leverage this structure to measure the intensity of these relationships. By conducting extensive experiments on the ACL Anthology corpus, we demonstrate that our framework effectively uncovers evolutionary trends and the underlying causes for a wide range of NLP research topics. Specifically, we show that tasks and methods are primary drivers of research in NLP, with datasets following, while metrics have minimal impact.

翻訳日:2023-10-26 00:52:56 公開日:2023-10-24

# OPT-R: 大規模言語モデルの推論スキルのための微調整とプロンプティングにおける説明の役割を探る

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models ( http://arxiv.org/abs/2305.12001v2 )

ライセンス: Link先を確認

Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

(参考訳) 本稿では,大規模言語モデル (LLM) の推論能力について,特にOpen Pretrained Transformers (OPT) モデルをその代表として徹底的に検討する。本研究では,注意深くキュレーションされた推論コーパス上で3つの異なるサイズのOPTを微調整し,説明なしで微調整したOPT-Rと,説明付きで微調整したOPT-REという2組の微調整モデルを得る。次に,SUPER-NATURALINSTRUCTIONSベンチマークから抽出した,26の異なる推論スキルを網羅する57の領域外タスクに対して,3つのプロンプト技術を用いて全てのモデルを評価する。27の構成と6,156のテスト評価からなる包括的なグリッドを通じて,微調整,プロンプティング,スケールの各次元を調査し,様々な推論スキルにおける説明の役割を理解する。この結果から, few-shotの例示に説明を含めることは,モデルが微調整されている場合には性能に有意な影響を与えない一方で,微調整されていないモデルには肯定的な影響を及ぼすことが明らかとなった。さらに,プロンプティングと微調整のそれぞれに説明を取り入れるにつれて,分類精度がわずかながら一貫して向上することを観察した。最後に、数値的推論(+20.4%)や類推的推論(+13.9%)など、微調整やプロンプティングの際に説明を取り入れることで最も恩恵を受けるスキルと、効果が無視できるか負になるスキルについての洞察を提供する。

In this paper, we conduct a thorough investigation into the reasoning capabilities of Large Language Models (LLMs), focusing specifically on the Open Pretrained Transformers (OPT) models as a representative of such models. Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations. We then evaluate all models on 57 out-of-domain tasks drawn from the SUPER-NATURALINSTRUCTIONS benchmark, covering 26 distinct reasoning skills, utilizing three prompting techniques. Through a comprehensive grid of 27 configurations and 6,156 test evaluations, we investigate the dimensions of finetuning, prompting, and scale to understand the role of explanations on different reasoning skills. Our findings reveal that having explanations in the fewshot exemplar has no significant impact on the model's performance when the model is finetuned, while positively affecting the non-finetuned counterpart. Moreover, we observe a slight yet consistent increase in classification accuracy as we incorporate explanations during prompting and finetuning, respectively. Finally, we offer insights on which skills benefit the most from incorporating explanations during finetuning and prompting, such as Numerical (+20.4%) and Analogical (+13.9%) reasoning, as well as skills that exhibit negligible or negative effects.

翻訳日:2023-10-26 00:52:41 公開日:2023-10-24

# 視覚と言語の言語間転移のためのメタラーニング

Meta-learning For Vision-and-language Cross-lingual Transfer ( http://arxiv.org/abs/2305.14843v2 )

ライセンス: Link先を確認

Hanxu Hu, Frank Keller

(参考訳) 現在の事前学習済み視覚言語モデル(PVLM)は、様々なマルチモーダルデータセットにおいて優れた性能を発揮する。近年,多言語モデルの構築を目的とした研究が行われ,新しい多言語マルチモーダルデータセットが数多く提案されている。しかし現在のPVLMは、マルチモーダルなゼロショットや少数ショットの言語間転移、特に低リソース言語に用いた場合、これらのデータセットでの性能が低いことが多い。この問題を緩和するために,新しいメタ学習型微調整フレームワークを提案する。本フレームワークは,MAMLを言語間マルチモーダルな形で設計することにより,視覚言語シナリオにおいて現在のPVLMを新しい言語に迅速に適応させる。実験の結果,提案手法は,様々な視覚言語理解タスクおよびデータセット(XVNLI, xGQA, MaRVL, xFlicker&Co)において,ゼロショットおよび少数ショットの言語間転移における最先端PVLMの性能を向上させることが示された。

Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work has aimed at building multilingual models, and a range of novel multilingual multi-modal datasets have been proposed. Current PVLMs typically perform poorly on these datasets when used for multi-modal zero-shot or few-shot cross-lingual transfer, especially for low-resource languages. To alleviate this problem, we propose a novel meta-learning fine-tuning framework. Our framework makes current PVLMs rapidly adaptive to new languages in vision-language scenarios by designing MAML in a cross-lingual multi-modal manner. Experiments show that our method boosts the performance of current state-of-the-art PVLMs in both zero-shot and few-shot cross-lingual transfer on a range of vision-language understanding tasks and datasets (XVNLI, xGQA, MaRVL, xFlicker&Co).

翻訳日:2023-10-26 00:45:31 公開日:2023-10-24
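
The abstract above builds on MAML but does not spell out the update rule. As a rough illustration only, here is a toy MAML loop on scalar quadratic tasks, where every gradient can be written by hand. The task losses, learning rates, and function names are all assumptions made for this sketch; it is not the paper's cross-lingual multi-modal variant.

```python
# Toy MAML on scalar quadratic tasks; every name here is illustrative.
# Task i has loss f_i(theta) = (theta - c_i) ** 2, so all gradients are exact.

def inner_adapt(theta, c, inner_lr):
    """One gradient step on the task loss (theta - c) ** 2."""
    return theta - inner_lr * 2.0 * (theta - c)


def meta_gradient(theta, c, inner_lr):
    """Derivative w.r.t. theta of the post-adaptation loss (theta' - c) ** 2.

    theta' = theta - inner_lr * 2 * (theta - c), hence
    d(theta') / d(theta) = 1 - 2 * inner_lr (the second-order MAML term).
    """
    adapted = inner_adapt(theta, c, inner_lr)
    return 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)


def maml_train(task_centers, theta=0.0, inner_lr=0.1, outer_lr=0.05, steps=500):
    """Outer loop: average the meta-gradients over tasks and descend."""
    for _ in range(steps):
        grads = [meta_gradient(theta, c, inner_lr) for c in task_centers]
        theta -= outer_lr * sum(grads) / len(grads)
    return theta


meta_theta = maml_train([-1.0, 0.5, 2.0])
```

For these symmetric quadratic tasks the meta-learned initialization settles at the mean of the task optima, the point from which a single inner step helps every task equally. In a cross-lingual setting each "task" would presumably correspond to a language, which is again an assumption of this sketch.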

# 画像キャプションにおけるアフォーダンスと状況的意味の探索:マルチモーダル解析

Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis ( http://arxiv.org/abs/2305.14616v2 )

ライセンス: Link先を確認

Pin-Er Chen, Po-Ya Angela Wang, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

(参考訳) 本稿では,マルチモーダルな意味表現におけるグラウンディングの問題を,計算的認知言語学の観点から考察する。我々は、Flickr30kデータセットの画像に、アフォーダンス、知覚的顕著性、オブジェクト数、視線手がかり、生態的ニッチ関連(ENA)という5つの知覚的特性を注釈し、画像キャプション中のテキスト要素との関連を検討する。その結果,ギブソン的アフォーダンスを持つ画像は,目的的(telic)アフォーダンスを示す画像に比べて,「保持動詞」と「容器名詞」を含むキャプションの頻度が高いことが判明した。知覚的顕著性、オブジェクト数、ENAもまた言語表現の選択と関連している。本研究は,物体や事象の包括的理解には,認知的注意,言語の意味的ニュアンス,複数のモダリティにわたる統合が必要であることを示す。自然言語理解における状況的意味とアフォーダンスのグラウンディングの重要性を強調し,様々なシナリオにおける人間らしい解釈の進展に資する可能性を示す。

This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.

翻訳日:2023-10-26 00:45:10 公開日:2023-10-24

# 翻訳と効果的な言語間伝達のための多言語画素表現

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer ( http://arxiv.org/abs/2305.14280v2 )

ライセンス: Link先を確認

Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post

(参考訳) 画素表現を用いた多言語機械翻訳モデルを効果的に学習する方法を紹介し,実証する。さまざまな言語とスクリプトカバレッジを備えた2つの異なるデータ設定を実験し,サブワード埋め込みと比較して性能が向上した。文字間のパラメータ共有など,画素表現のさまざまな特性について検討し,前向きな転送につながる部分の理解を深める。これらの特性は, 未知のスクリプトへのシームレスな言語間移動を可能にするだけでなく, 語彙展開などの代替手段よりも, 画素表現をよりデータ効率良くする。この作業が、すべての言語とスクリプトに対して、より拡張可能な多言語モデルに貢献することを願っています。

We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations. We experiment with two different data settings with a variety of language and script coverage, demonstrating improved performance compared to subword embeddings. We explore various properties of pixel representations such as parameter sharing within and across scripts to better understand where they lead to positive transfer. We observe that these properties not only enable seamless cross-lingual transfer to unseen scripts, but make pixel representations more data-efficient than alternatives such as vocabulary expansion. We hope this work contributes to more extensible multilingual models for all languages and scripts.

翻訳日:2023-10-26 00:44:51 公開日:2023-10-24

# 2次元クラスター状態におけるバルク測定による境界相転移のトリガリング

Triggering Boundary Phase Transitions through Bulk Measurements in 2D Cluster States ( http://arxiv.org/abs/2305.14231v2 )

ライセンス: Link先を確認

Yuchen Guo, Jian-Hao Zhang, Zhen Bi, Shuo Yang

(参考訳) テンソルネットワーク法を用いて、バルク測定を受ける無限2次元クラスター状態の境界における相図を調べる。状態には、下側境界の量子ビットとすべてのバルク量子ビットにおいて、一様な測定 $M = \cos{\theta}Z+\sin{\theta}X$ が施される。その結果、系の境界は、測定角 $\theta = \pi/2$ では体積則のエンタングルメントを示し、任意の $\theta < \pi/2$ では面積則のエンタングルメントを示すことがわかった。面積則相の内部では、$\theta_c=1.371$ で相転移が起こる。 $\theta \in(\theta_c,\pi/2)$ の相は非単射(noninjective)な行列積状態によって特徴づけられ、これは1次元の局所的でギャップのあるハミルトニアンの一意な基底状態としては実現できない。その代わり、自発的対称性の破れを伴う猫状態に似ている。これらの結果は、2次元系の境界の相図が、標準的な1次元系の相図よりも複雑になりうることを示している。

We investigate the phase diagram at the boundary of an infinite two-dimensional cluster state subject to bulk measurements using tensor network methods. The state is subjected to uniform measurements $M = \cos{\theta}Z+\sin{\theta}X$ on the lower boundary qubits and in all bulk qubits. Our results show that the boundary of the system exhibits volume-law entanglement at the measurement angle $\theta = \pi/2$ and area-law entanglement for any $\theta < \pi/2$. Within the area-law phase, a phase transition occurs at $\theta_c=1.371$. The phase with $\theta \in(\theta_c,\pi/2)$ is characterized by a noninjective matrix product state, which cannot be realized as the unique ground state of a one-dimensional local, gapped Hamiltonian. Instead, it resembles a cat state with spontaneous symmetry breaking. These findings demonstrate that the phase diagram of the boundary of a two-dimensional system can be more intricate than that of a standard one-dimensional system.

翻訳日:2023-10-26 00:44:39 公開日:2023-10-24
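
The measurement $M = \cos{\theta}Z+\sin{\theta}X$ in the abstract above interpolates between a Pauli-$Z$ measurement at $\theta = 0$ and a Pauli-$X$ measurement at $\theta = \pi/2$. The short sketch below builds $M$ as a plain 2x2 matrix and computes the quantities needed to confirm that it squares to the identity (so its outcomes are $\pm 1$) and that $(\cos(\theta/2), \sin(\theta/2))$ is its $+1$ eigenvector. The helper names are illustrative, and nothing here reproduces the paper's tensor-network calculation.

```python
import math


def measurement_operator(theta):
    """Return M = cos(theta) * Z + sin(theta) * X as a 2x2 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    # Z = [[1, 0], [0, -1]] and X = [[0, 1], [1, 0]]
    return [[c, s], [s, -c]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def plus_eigenvector(theta):
    """Eigenvector of M with eigenvalue +1."""
    return [math.cos(theta / 2.0), math.sin(theta / 2.0)]


theta_c = 1.371  # critical angle quoted in the abstract
M = measurement_operator(theta_c)
M_squared = matmul2(M, M)  # should be the identity
v = plus_eigenvector(theta_c)
Mv = [M[0][0] * v[0] + M[0][1] * v[1],
      M[1][0] * v[0] + M[1][1] * v[1]]  # should equal v
```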

# 知識の不変性を保つ:オープン情報抽出のロバスト性評価の再検討

Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction ( http://arxiv.org/abs/2305.13981v2 )

ライセンス: Link先を確認

Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu

(参考訳) 分布変化に対するロバスト性は、NLPモデル、特に情報抽出タスクのモデルを現実世界でうまく適用できることを保証する。しかしながら、従来の評価ベンチマークの多くはペアワイズマッチングの正しさの検証に注力しており、ロバスト性という重要な尺度を無視してきた。本稿では,同じ知識の意味のもとで構文や表現の分布が様々に変動しうる実世界でのオープン情報抽出モデルの評価をシミュレートする,最初のベンチマークを提案する。各事例が、同じ意味の構造化知識を持ちながら構文や表現形式の異なる文からなる知識不変クリークである、大規模なテストベッドを設計しアノテートする。さらにロバスト性の指標を精緻化し、クリーク全体で一貫して正確な性能を示すモデルをロバストと判定する。過去10年間に発表された代表的なモデルと広く使われている大規模言語モデルで実験を行った結果、既存の成功したモデルでも、F1スコアで最大23.43の著しい性能低下を示した。リソースとコードは https://github.com/qijimrc/ROBUST で入手できる。

The robustness to distribution changes ensures that NLP models can be successfully applied in the realistic world, especially for information extraction tasks. However, most prior evaluation benchmarks have been devoted to validating pairwise matching correctness, ignoring the crucial measurement of robustness. In this paper, we present the first benchmark that simulates the evaluation of open information extraction models in the real world, where the syntactic and expressive distributions under the same knowledge meaning may drift variously. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique that consists of sentences with structured knowledge of the same meaning but with different syntactic and expressive forms. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques. We perform experiments on typical models published in the last decade as well as a popular large language model; the results show that the existing successful models exhibit a frustrating degradation, with a maximum drop of 23.43 F1 score. Our resources and code are available at https://github.com/qijimrc/ROBUST.

翻訳日:2023-10-26 00:44:04 公開日:2023-10-24

# 大規模言語モデルのための協調学習アシスタントによる誤りから学ぶ

Learning from Mistakes via Cooperative Study Assistant for Large Language Models ( http://arxiv.org/abs/2305.13829v3 )

ライセンス: Link先を確認

Danqing Wang, Lei Li

(参考訳) 大規模言語モデル(LLM)は、自身のフィードバックに基づいて生成を洗練できる可能性を示している。しかし、LLM自身からのフィードバックはしばしば不正確であり、その利点を制限している。本稿では,対話的な協調を通じてメインのLLMが誤りから学ぶのを支援する補助エージェントを備えた新しい枠組み,Study Assistant for Large LAnguage Model(SALAM)を提案する。収集フェーズでは、学習アシスタントエージェントがメインLLMを調べ、その誤りを分析し、やり取りを誤りメモリに蓄積する。試験フェーズでは、学習アシスタントは関連する事例を検索してガイドラインを提供し、メインLLMが同様の誤りを予測して回避できるようにする。まず汎用の学習アシスタントの有効性を検証し,次に成功した指導経験からの模倣学習によって,LLM固有の指導を行うようカスタマイズする。3つのLLMと2つの難しいフレームワークを用いた実験により,SALAMはBBHで最大6.6,BBQで最大12.6の精度向上をもたらし,LLMを大幅に改善できることを示す。

Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback. However, the feedback from the LLM itself is often inaccurate, thereby limiting its benefits. In this paper, we propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the study assistant agent probes the main LLM, analyzes its errors, and collects the interaction in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ.

翻訳日:2023-10-26 00:43:45 公開日:2023-10-24

# 質問が英語でなければChatGPTを信用しない:多言語能力とLLMのタイプの検討

Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs ( http://arxiv.org/abs/2305.16339v2 )

ライセンス: Link先を確認

Xiang Zhang, Senyu Li, Bradley Hauer, Ning Shi, Grzegorz Kondrak

(参考訳) 大規模言語モデル(LLM)は,近年,自然言語理解能力に優れ,多種多様な自然言語処理(NLP)タスクに優れてきた。ほとんどのllmが主に英語で訓練されているにもかかわらず、複数の研究が他の多くの言語での比較性能を示している。しかし、LLMが多言語能力をどのように獲得するか、また異なる言語間でパフォーマンスがどのように異なるか、という根本的な疑問が続いている。ユーザや研究者は多種多様な言語背景から来ており、LLMの活用と解釈に影響を与える可能性があるため、これらの質問はLLMの研究に不可欠である。本研究では,多言語環境でのllmの性能差を体系的に評価する方法を提案する。 LLMにおける多言語一般化の現象について検討し,多言語学習データ不足が多言語能力の向上につながることを示す。これを実現するために、バック翻訳に基づく新しいプロンプト方式を用いる。その結果,GPTは多言語設定において高い翻訳的振る舞いを示すことがわかった。

Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities and have excelled in a variety of natural language processing (NLP) tasks in recent years. Despite the fact that most LLMs are trained predominantly in English, multiple studies have demonstrated their comparative performance in many other languages. However, fundamental questions persist regarding how LLMs acquire their multi-lingual abilities and how performance varies across different languages. These inquiries are crucial for the study of LLMs since users and researchers often come from diverse language backgrounds, potentially influencing their utilization and interpretation of LLMs' results. In this work, we propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. We investigate the phenomenon of across-language generalizations in LLMs, wherein insufficient multi-lingual training data leads to advanced multi-lingual capabilities. To accomplish this, we employ a novel back-translation-based prompting method. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.

翻訳日:2023-10-26 00:35:13 公開日:2023-10-24

# 経験的条件付き一貫した最適輸送

Consistent Optimal Transport with Empirical Conditional Measures ( http://arxiv.org/abs/2305.15901v3 )

ライセンス: Link先を確認

Piyushi Manupriya, Rachit Keerti Das, Sayantan Biswas, Saketha Nath Jagarlapudi

(参考訳) 2つの同時分布からのサンプルが与えられたとき,共通の変数で条件付けた場合の両者間の最適輸送(OT)の問題を考える。条件付け変数が連続でありうる一般的な設定に注目し、2つの同時分布におけるこの変数の周辺分布は同じとは限らないものとする。このような設定では、標準的なOTの変種は使えず、新しい推定技術が必要である。主な課題は条件付き分布が明示的には利用できないことであるため、我々のOT定式化における重要なアイデアは、同時サンプル上で計算されるカーネル化最小二乗項を用いて、輸送計画の周辺分布を経験的な条件付き分布に暗黙的に一致させることである。緩やかな条件下で、条件付け変数の関数として推定された輸送計画が漸近的に最適であることを証明する。有限標本に対しては、正則化された目的関数に関する偏差が $O(1/m^{1/4})$ で抑えられることを示す。ここで $m$ はサンプル数である。また,明示的な確率モデルや暗黙的な生成モデルを用いて条件付き輸送計画をモデル化する方法についても論じる。最適計画が解析的に知られている合成データセット上で、推定器の一貫性を実証的に検証する。少数ショット分類のためのプロンプト学習や、治療に対する細胞応答予測における条件付き生成といった応用に用いると、提案手法は最先端の手法を上回る。

Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation techniques are necessary. Since the main challenge is that the conditional distributions are not explicitly available, the key idea in our OT formulation is to employ kernelized-least-squares terms computed over the joint samples, which implicitly match the transport plan's marginals with the empirical conditionals. Under mild conditions, we prove that our estimated transport plans, as a function of the conditioned variable, are asymptotically optimal. For finite samples, we show that the deviation in terms of our regularized objective is bounded by $O(1/m^{1/4})$, where $m$ is the number of samples. We also discuss how the conditional transport plan could be modelled using explicit probabilistic models as well as using implicit generative ones. We empirically verify the consistency of our estimator on synthetic datasets, where the optimal plan is analytically known. When employed in applications like prompt learning for few-shot classification and conditional-generation in the context of predicting cell responses to treatment, our methodology improves upon state-of-the-art methods.

翻訳日:2023-10-26 00:34:55 公開日:2023-10-24

# スクラッチからの文埋め込みの対照学習

Contrastive Learning of Sentence Embeddings from Scratch ( http://arxiv.org/abs/2305.15077v2 )

ライセンス: Link先を確認

Junlei Zhang, Zhenzhong Lan, Junxian He

(参考訳) 対照学習は、最先端の文埋め込みを訓練する主要なアプローチである。これまでの研究は、人手で注釈された自然言語推論(NLI)データを用いるか、教師なしで大規模なラベルなし文を用いるかのいずれかで、文埋め込みを学習してきた。しかし、ラベルなしデータであっても、特定のドメインでは様々な理由から入手が難しい。これらの問題に対処するために、合成データで文埋め込みを訓練する対照学習フレームワークSynCSEを提案する。具体的には,大規模言語モデルを利用して対照学習に必要なデータサンプルを合成することを検討する。これには,(1)ラベルなし文に対する正例および負例のアノテーションの生成(SynCSE-partial),(2)文とそれに対応するアノテーションのスクラッチからの生成(SynCSE-scratch)が含まれる。文類似度タスクと再ランキングタスクでの実験の結果,SynCSE-partial と SynCSE-scratch はどちらも教師なしベースラインを大幅に上回り、SynCSE-partial はほとんどの設定で教師ありモデルに匹敵する性能を達成した。

Contrastive learning has been the dominant approach to train state-of-the-art sentence embeddings. Previous studies have typically learned sentence embeddings either through the use of human-annotated natural language inference (NLI) data or via large-scale unlabeled sentences in an unsupervised manner. However, even in the case of unlabeled data, their acquisition presents challenges in certain domains due to various reasons. To address these issues, we present SynCSE, a contrastive learning framework that trains sentence embeddings with synthesized data. Specifically, we explore utilizing large language models to synthesize the required data samples for contrastive learning, including (1) producing positive and negative annotations given unlabeled sentences (SynCSE-partial), and (2) generating sentences along with their corresponding annotations from scratch (SynCSE-scratch). Experimental results on sentence similarity and reranking tasks indicate that both SynCSE-partial and SynCSE-scratch greatly outperform unsupervised baselines, and SynCSE-partial even achieves comparable performance to the supervised models in most settings.

翻訳日:2023-10-26 00:34:29 公開日:2023-10-24
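
To make the role of the synthesized positive and negative annotations concrete, the sketch below computes a generic InfoNCE-style contrastive loss for one anchor sentence embedding, one positive, and a list of negatives. Whether SynCSE uses exactly this loss, the cosine similarity, or the temperature of 0.05 is an assumption of this sketch, and the two-dimensional vectors are made-up stand-ins for real sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.05):
    """InfoNCE: negative log-softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


anchor = [1.0, 0.0]
# Positive close to the anchor, negative far away: small loss.
good = contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0]])
# Roles swapped: the "positive" is far away, so the loss is large.
bad = contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1]])
```

Minimizing this loss pulls the anchor toward its positive and pushes it away from the negatives, which is why the quality of the generated annotations matters.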

# Cheap and Quick: 大規模言語モデルのための効率的な視覚言語指導チューニング

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models ( http://arxiv.org/abs/2305.15023v3 )

ライセンス: Link先を確認

Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji

(参考訳) 近年、汎用人工知能の次のマイルストーンと見なされる視覚言語(VL)学習など、大規模言語モデル(LLM)のマルチモーダル能力の拡張への関心が高まっている。しかし、既存のソリューションは非常に高価であり、過剰な数のパラメータを最適化する必要があるだけでなく、VL命令チューニングの前にさらに大規模な事前学習を必要とする。本稿では,LLMの効果的なVL適応のための,新規で安価なソリューションであるMixture-of-Modality Adaptation(MMA)を提案する。画像エンコーダとLLMを接続するために大きなニューラルネットワークを使う代わりに、MMAは軽量モジュール(アダプタ)を採用してLLMとVLタスクのギャップを埋め、これにより画像モデルと言語モデルの同時最適化も可能になる。またMMAは、LLMが自然言語理解能力を損なうことなく、単一モーダル命令とマルチモーダル命令の間を自動的に切り替えられるようにするルーティングアルゴリズムも備えている。 MMAを検証するために、LLaMAと呼ばれる最近のLLMに適用し、こうして得られた大規模な視覚言語命令モデルをLaVINと呼ぶ。 MMAとLaVINを検証するために,マルチモーダル科学質問応答とマルチモーダル対話という2つの設定で広範な実験を行った。実験結果は,既存のマルチモーダルLLMに対するLaVINの競争力のある性能と優れた訓練効率を示すだけでなく,汎用チャットボットとしての大きな可能性も裏付けている。さらに重要なことに、LaVINの実際のコストは極めて低く、例えば学習可能パラメータ3.8Mで訓練時間はわずか1.4時間であり、MMAの有効性を強く裏付けている。プロジェクトは https://luogen1996.github.io/lavin で公開している。

Recently, growing interest has been aroused in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive, which not only need to optimize excessive parameters, but also require another large-scale pre-training before VL instruction tuning. In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA). Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables the joint optimization of the image and language models. Meanwhile, MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions without compromising their ability of natural language understanding. To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN. To validate MMA and LaVIN, we conduct extensive experiments under two setups, namely multimodal science question answering and multimodal dialogue. The experimental results not only demonstrate the competitive performance and the superior training efficiency of LaVIN over existing multimodal LLMs, but also confirm its great potential as a general-purpose chatbot. More importantly, the actual expenditure of LaVIN is extremely cheap, e.g., only 1.4 training hours with 3.8M trainable parameters, greatly confirming the effectiveness of MMA. Our project is released at https://luogen1996.github.io/lavin.

翻訳日:2023-10-26 00:33:27 公開日:2023-10-24

# ACL OCL Corpus:計算言語学におけるオープンサイエンスの推進

The ACL OCL Corpus: Advancing Open Science in Computational Linguistics ( http://arxiv.org/abs/2305.14996v2 )

ライセンス: Link先を確認

Shaurya Rohatgi, Yanxia Qin, Benjamin Aw, Niranjana Unnithan, Min-Yen Kan

(参考訳) 本稿では、ACLアンソロジーから派生した学術コーパスであるACL OCLを紹介し、計算言語学領域におけるオープン科学研究を支援する。 ACLアンソロジーの以前のバージョンの統合と拡張により、ACL OCLはメタデータ、PDFファイル、引用グラフ、セクション、数字、大きな知識リソースへのリンクを含む構造化されたフルテキストをコントリビュートする(Semantic Scholar)。 ACL OCLは、73Kの論文と210Kの数字を含む70年に及ぶ。我々は、ACL OCLが計算言語学の傾向を観察するためにどのように適用されているかに注目する。教師付きニューラルモデルを用いて論文のトピックを検出することで、"Syntax: Tagging, Chunking and Parsing"への関心が薄れ、"Natural Language Generation"が復活しつつあることに注意する。私たちのデータセットはHuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL)から入手可能です。

We present ACL OCL, a scholarly corpus derived from the ACL Anthology to assist Open scientific research in the Computational Linguistics domain. Integrating and enhancing the previous versions of the ACL Anthology, the ACL OCL contributes metadata, PDF files, citation graphs and additional structured full texts with sections, figures, and links to a large knowledge resource (Semantic Scholar). The ACL OCL spans seven decades, containing 73K papers, alongside 210K figures. We spotlight how ACL OCL applies to observe trends in computational linguistics. By detecting paper topics with a supervised neural model, we note that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural Language Generation" is resurging. Our dataset is available from HuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL).

翻訳日:2023-10-26 00:32:57 公開日:2023-10-24

# Dolphin: アラビア語のNLGのベンチマーク

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG ( http://arxiv.org/abs/2305.14989v2 )

ライセンス: Link先を確認

El Moatez Billah Nagoudi, AbdelRahim Elmadany, Ahmed El-Shangiti, Muhammad Abdul-Mageed

(参考訳) 我々は、アラビア語の多様な言語・変種に特化した自然言語生成(NLG)評価フレームワークの必要性に応える新しいベンチマーク、Dolphinを紹介する。提案するベンチマークは、対話生成、質問応答、機械翻訳、要約などを含む13種類の幅広いNLGタスクを包含する。Dolphinは、50のテストスプリットにまたがる40の多様で代表的な公開データセットからなる大規模なコーパスで構成されており、実世界のシナリオとアラビア語の言語的豊かさを反映するよう注意深く選定されている。アラビア語モデルおよび多言語モデルの性能と汎化能力を評価するための新しい標準を設定し、研究者が現在の方法論の限界を押し広げることを可能にすると期待される。我々はDolphinを広範囲に分析し、その多様性を明らかにするとともに、現在のアラビア語NLG研究におけるギャップを特定する。また、インタラクティブかつモジュール化された公開リーダーボードを提供し、ベンチマーク上でいくつかのモデルを評価することで、研究者が比較対象にできる強力なベースラインを設定する。

We present Dolphin, a novel benchmark that addresses the need for a natural language generation (NLG) evaluation framework dedicated to the wide collection of Arabic languages and varieties. The proposed benchmark encompasses a broad range of 13 different NLG tasks, including dialogue generation, question answering, machine translation, summarization, among others. Dolphin comprises a substantial corpus of 40 diverse and representative public datasets across 50 test splits, carefully curated to reflect real-world scenarios and the linguistic richness of Arabic. It sets a new standard for evaluating the performance and generalization capabilities of Arabic and multilingual models, promising to enable researchers to push the boundaries of current methodologies. We provide an extensive analysis of Dolphin, highlighting its diversity and identifying gaps in current Arabic NLG research. We also offer a public leaderboard that is both interactive and modular and evaluate several models on our benchmark, allowing us to set strong baselines against which researchers can compare.

翻訳日:2023-10-26 00:32:39 公開日:2023-10-24

# キャリブレーションを尋ねるだけ:人間のフィードバックで微調整された言語モデルから較正された信頼度スコアを引き出すための戦略

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback ( http://arxiv.org/abs/2305.14975v2 )

ライセンス: Link先を確認

Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

(参考訳) 信頼に値する実世界の予測システムは、よく較正された信頼度スコアを生成するべきである。つまり、回答に対する信頼度はその回答が正しい可能性を示すものであるべきで、これにより信頼度の低い予測の場合に専門家へ判断を委ねることが可能になる。近年の研究では、教師なし事前学習によって、条件付き確率が驚くほどよく較正された大規模言語モデル(LM)が得られることが示されている。しかしながら、最も広く使われているLMは人間のフィードバックからの強化学習で微調整されており(RLHF-LM)、RLHF-LMが生成する条件付き確率は較正が非常に悪いことを示唆する研究もある。この弱点とされる点を踏まえ,RLHF-LMから信頼度スコアを抽出する方法を幅広く評価する。 ChatGPT, GPT-4, Claude などの RLHF-LM では,出力トークンとして言語化された信頼度は,TriviaQA, SciQ, TruthfulQA ベンチマークにおいて,概してモデルの条件付き確率よりもよく較正されており,期待較正誤差を相対的に50%削減することも多い。

A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are remarkably well-calibrated. However, the most widely-used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities on the TriviaQA, SciQ, and TruthfulQA benchmarks, often reducing the expected calibration error by a relative 50%.

翻訳日:2023-10-26 00:32:22 公開日:2023-10-24
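
The abstract above reports its gains in expected calibration error (ECE) without defining it. A minimal sketch of the common equal-width-bin estimator follows: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged with weights proportional to bin size. The choice of ten bins and the sample confidences are assumptions for illustration; the paper's exact binning may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical verbalized confidences and whether each answer was right.
confs = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
right = [True, True, True, False, True, False]
ece = expected_calibration_error(confs, right)
```

A model that says "95%" but is right only three times out of four in that bin contributes a gap of 0.2 there, so lower ECE means stated confidence tracks actual accuracy more closely.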

# GRACE: 識別器誘導による思考連鎖推論

GRACE: Discriminator-Guided Chain-of-Thought Reasoning ( http://arxiv.org/abs/2305.14934v2 )

ライセンス: Link先を確認

Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang

(参考訳) 思考連鎖(chain-of-thought)などを用いる多段階推論では、言語モデル(LM)は誤ったステップに容易に高い尤度を割り当ててしまう。その結果、解の尤度を最適化するデコード戦略は、しばしば誤った解をもたらす。この問題に対処するため、我々は、正しい推論ステップを生成する方向へデコード過程を導く段階的なデコード手法、GRACE(Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator)を提案する。 GRACEは、正しいステップと誤ったステップに対する対照損失で訓練された識別器を用い、デコード時に次のステップの候補をその正しさに基づいてスコア付けする。重要な点として、GRACEはLMからのサンプリングのみを必要とし、LMの訓練や微調整を必要としない。 FLAN-T5ファミリーとLLaMAファミリーのモデルを用いて、4つの数学推論タスクと2つの記号推論タスクでGRACEを評価し、ほとんどの設定で貪欲デコード、検証器、自己整合性と比較して大幅な性能向上を示す。さらに自己整合性と組み合わせると、GRACEはすべてのベースラインを大きく上回る。 GSM8Kに対する人間とLLMによる評価は、GRACEが最終回答の精度だけでなく、中間推論の正しさも向上させることを示している。実装は https://github.com/mukhal/grace で入手できる。

In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optimize for solution likelihood often yield incorrect solutions. To address this issue, we propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. GRACE employs a discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness. Importantly, GRACE only requires sampling from the LM, without the need for LM training or fine-tuning. Using models from FLAN-T5 and LLaMA families, we evaluate GRACE over four math and two symbolic reasoning tasks, where it exhibits substantial performance gains compared to greedy decoding, verifiers, and self-consistency in most settings. When further combined with self-consistency, GRACE outperforms all the baselines by sizeable margins. Human and LLM evaluations over GSM8K show that GRACE not only improves the final answer accuracy but also the correctness of the intermediate reasoning. Our implementation can be accessed at https://github.com/mukhal/grace.

翻訳日:2023-10-26 00:31:57 公開日:2023-10-24
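
The stepwise decoding idea in the abstract above (sample next-step candidates from the LM, score them with a correctness discriminator, keep the best) can be sketched in a few lines. Everything specific here is assumed for illustration: the linear mix of LM log-probability and discriminator score with weight `beta`, the rule-based `toy_discriminator` standing in for a learned model, and the hand-written candidate list standing in for LM samples.

```python
def guided_step(candidates, discriminator, beta=0.8):
    """Pick the step maximizing (1 - beta) * lm_logprob + beta * discriminator(step)."""
    def combined(item):
        step, lm_logprob = item
        return (1.0 - beta) * lm_logprob + beta * discriminator(step)
    return max(candidates, key=combined)[0]


def toy_discriminator(step):
    """Hypothetical stand-in: checks 'a + b = c' arithmetic; 1.0 if right, else -1.0."""
    left, right = step.split("=")
    a, b = left.split("+")
    return 1.0 if int(a) + int(b) == int(right) else -1.0


# (step text, LM log-probability) pairs; the LM prefers the wrong step here.
sampled = [("2 + 3 = 6", -0.2), ("2 + 3 = 5", -0.9)]
greedy_choice = max(sampled, key=lambda item: item[1])[0]
guided_choice = guided_step(sampled, toy_discriminator)
```

In this toy case likelihood-only selection keeps the incorrect step, while the discriminator-weighted score selects the correct one. That is the failure mode the abstract describes, shown at the scale of a single step.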

# 電子スピンが電気双極子スピン共鳴によって駆動される量子ドット内の動的核スピン偏極

Dynamical nuclear spin polarization in a quantum dot with an electron spin driven by electric dipole spin resonance ( http://arxiv.org/abs/2306.11253v2 )

ライセンス: Link先を確認

Peter Stano, Takashi Nakajima, Akito Noiri, Seigo Tarucha, Daniel Loss

(参考訳) コヒーレントなラビ振動を行うよう電気的に駆動された単一電子スピンによって誘起される、量子ドット内の核スピンの偏極を解析する。対応する核スピン偏極速度を導出し、利用可能な制御パラメータ、特に駆動周波数の電子ラーモア周波数からの離調への依存性を解析する。生じる核スピン偏極は、NMRの文献で知られるハートマン・ハーン効果と関係しているが、2つの重要な違いがある。第一に、量子ドットでは一般にマイクロ磁石が用いられ、電子スピンと核スピンの量子化軸の間に小さなずれが生じる。第二に、電気的駆動は原子格子に対して電子を揺り動かす。従来のハートマン・ハーンのシナリオには存在しないこの2つの効果が、ゲート型量子ドットにおける核スピン偏極の2つの機構を生み出す。生じる核スピン偏極は共鳴現象であり、電子ラビ周波数と核ラーモア周波数(典型的には数MHzから数十MHz)が一致する共鳴において最大の効率を達成する。駆動周波数の関数として、偏極速度は鋭いピークを示し、そこで大きな値に達しうる。核偏極は実験的には電子ラーモア周波数の変化として検出されるため、式や図ではしばしば前者を後者に換算する。この単位では、偏極はGaAs量子ドットで数百MHz/s、Si量子ドットで少なくとも数十kHz/sに達しうる。大きな核偏極の達成と、フィードバックによるオーバーハウザー場の安定化に、共鳴偏極効果を利用する可能性を解析する。

We analyze the polarization of nuclear spins in a quantum dot induced by a single-electron spin that is electrically driven to perform coherent Rabi oscillations. We derive the associated nuclear-spin polarization rate and analyze its dependence on the accessible control parameters, especially the detuning of the driving frequency from the electron Larmor frequency. The arising nuclear-spin polarization is related to the Hartmann-Hahn effect known from the NMR literature with two important differences. First, in quantum dots one typically uses a micro magnet, leading to a small deflection of the quantization axes of the electron and nuclear spins. Second, the electric driving wiggles the electron with respect to the atomic lattice. The two effects, absent in the traditional Hartmann-Hahn scenario, give rise to two mechanisms of nuclear-spin polarization in gated quantum dots. The arising nuclear-spin polarization is a resonance phenomenon, achieving maximal efficiency at the resonance of the electron Rabi and nuclear Larmor frequency (typically a few or a few tens of MHz). As a function of the driving frequency, the polarization rate can develop sharp peaks and reach large values at them. Since the nuclear polarization is experimentally detected as changes of the electron Larmor frequency, we often convert the former to the latter in our formulas and figures. In these units, the polarization can reach hundreds of MHz/s in GaAs quantum dots and at least tens of kHz/s in Si quantum dots. We analyze possibilities to exploit the resonant polarization effects for achieving large nuclear polarization and for stabilizing the Overhauser field through feedback.

翻訳日:2023-10-26 00:25:50 公開日:2023-10-24

# ビジョンファウンデーションモデルによる任意のポイントクラウドシーケンスの分割

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models ( http://arxiv.org/abs/2306.09347v2 )

ライセンス: Link先を確認

Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

(参考訳) 視覚基盤モデル(VFM)の最近の進歩は、汎用的かつ効率的な視覚知覚の新しい可能性を開いた。本稿では,VFMを活用して多様な自動車向け点群シーケンスをセグメンテーションする新しいフレームワーク,Sealを紹介する。 Sealには3つの魅力的な特性がある。 i) スケーラビリティ: VFMは点群へ直接蒸留されるため、事前学習時に2Dおよび3Dのいずれのアノテーションも不要になる。 ii) 一貫性: 空間的・時間的関係が、カメラからLiDARへの正則化と点からセグメントへの正則化の両段階で課され、クロスモーダルな表現学習が促進される。 iii) 汎化性: Sealは、実データ/合成データ、低解像度/高解像度、大規模/小規模、クリーン/破損したデータセットを含む多様な点群を扱う下流タスクへ、そのまま利用できる形で知識転移を可能にする。 11の異なる点群データセットで広範な実験を行い、Sealの有効性と優位性を示した。特筆すべきは、Sealは線形プロービング後のnuScenesで45.0% mIoUを達成し、ランダム初期化を36.9% mIoU、従来手法を6.1% mIoU上回ったことである。さらにSealは、検証した11の点群データセットすべてにおいて、20の異なる少数ショット微調整タスクで既存手法を大きく上回る性能向上を示している。

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments conducted on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming prior arts by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.

翻訳日:2023-10-26 00:25:28 公開日:2023-10-24

# 微分的にプライベートな条件付き独立性テスト

Differentially Private Conditional Independence Testing ( http://arxiv.org/abs/2306.06721v2 )

ライセンス: Link先を確認

Iden Kalemaj, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

(参考訳) 条件付き独立性(CI)テストは統計データ分析で広く使われており、例えば、因果グラフ発見のための多くのアルゴリズムの構成要素となっている。 CIテストの目的は、$X \perp \!\!\! \perp Y \mid Z$ という帰無仮説を採択または棄却することである。ここで $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$ である。本研究では,差分プライバシーの制約下での条件付き独立性テストについて検討する。我々は2つのプライベートなCIテスト手順を設計する。1つはShahとPeters(2020)の一般化共分散尺度に基づくもの、もう1つは(モデルX仮定の下での)Candès et al.(2016)の条件付きランダム化テストに基づくものである。テストの性能について理論的保証を与え、実証的に検証する。これらは、$Z$ が連続である一般的な場合に機能する、厳密な理論的保証を持つ最初のプライベートなCIテストである。

Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.

翻訳日:2023-10-26 00:24:40 公開日:2023-10-24

# ラベルシフトによるフェデレーション不確かさの等角的予測

Conformal Prediction for Federated Uncertainty Quantification Under Label Shift ( http://arxiv.org/abs/2306.05131v2 )

ライセンス: Link先を確認

Vincent Plassier, Mehdi Makni, Aleksandr Rubashevskii, Eric Moulines and Maxim Panov

(参考訳) Federated Learning(FL)は機械学習フレームワークで、多くのクライアントがトレーニングデータを分散化したままモデルを協調的にトレーニングする。近年のFLの発展にもかかわらず、不確実性定量化(UQ)の課題は部分的にしか解決されていない。 UQ法の中で、共形予測(CP)アプローチは最小限の仮定の下で分布によらない保証を提供する。我々は、分位点回帰に基づく新しい連合共形予測法を開発し,プライバシー制約を考慮に入れる。この方法はエージェント間のラベルシフトを効果的に扱うために重要度重み付けを活用し、予測セットの有効なカバレッジと差分プライバシの両方を理論的に保証する。広範な実験により、この方法が現在の競合手法よりも優れていることが示されている。

Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, uncertainty quantification (UQ) remains only partially addressed. Among UQ methods, conformal prediction (CP) approaches provide distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression that takes privacy constraints into account. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.
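
The role of importance weighting under label shift can be illustrated with a centralized, non-private weighted conformal quantile. This is only a sketch under stated assumptions: the label ratios `label_weights[y]` (standing for target probability over source probability) are taken as known, nonconformity scores are precomputed, and nothing here is federated or differentially private. The function names are hypothetical and not taken from the paper.

```python
import math


def weighted_quantile(scores, weights, test_weight, alpha):
    """Smallest score whose cumulative weight reaches (1 - alpha) of the total.

    The test point contributes `test_weight` at +infinity, so the function
    returns math.inf when the calibration weights alone cannot reach the level.
    """
    threshold = (1.0 - alpha) * (sum(weights) + test_weight)
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    return math.inf


def prediction_set(test_scores, cal_scores, cal_labels, label_weights, alpha):
    """Labels whose nonconformity score is at most the label-specific quantile.

    Under label shift the weight of a calibration point depends only on its
    label, and the unknown test label is handled by trying each candidate.
    """
    cal_weights = [label_weights[label] for label in cal_labels]
    kept = []
    for label, score in sorted(test_scores.items()):
        q = weighted_quantile(cal_scores, cal_weights, label_weights[label], alpha)
        if score <= q:
            kept.append(label)
    return kept
```

With all weights equal, this reduces to ordinary split conformal prediction. A label that is much more frequent in the target than in the source receives a large `test_weight`, which pushes its quantile up and makes the set more conservative for that label.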

翻訳日:2023-10-26 00:24:22 公開日:2023-10-24

# 高輝度LHCにおけるデータ圧縮のための地球モーバー距離の微分

Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC ( http://arxiv.org/abs/2306.04712v2 )

ライセンス: Link先を確認

Rohan Shenoy and Javier Duarte and Christian Herwig and James Hirschauer and Daniel Noonan and Maurizio Pierini and Nhan Tran and Cristina Mantilla Suarez

(参考訳) 地球移動器距離(Earth mover's distance、EMD)は画像認識と分類に有用な指標であるが、通常の実装は微分可能ではなく、勾配降下による他のアルゴリズムを訓練するための損失関数として使うには遅すぎる。本稿では,畳み込みニューラルネットワーク(CNN)を用いて,EMDの微分可能かつ高速な近似を学習し,計算集約型EMD実装の代替として使用できることを示す。この微分可能な近似を、cernの高輝度lhcにおけるデータ圧縮のためのautoencoder-inspired neural network(encoder nn)のトレーニングに適用する。このエンコーダNNの目標は、粒子検出器内のエネルギー蓄積の分布に関する情報を保存しながらデータを圧縮することである。 EMD CNNを用いて訓練したエンコーダNNの性能が平均二乗誤差に基づく損失関数付きトレーニングよりも優れていることを示す。

The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.
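
For intuition about the quantity being approximated, the EMD has a simple exact form in one dimension: the area between the two cumulative distributions. The sketch below covers only that 1D case, with histograms on shared, equally spaced bins (an assumption made for illustration). The detector images in the paper are two-dimensional, where computing the EMD requires solving an optimal transport problem, and the function `emd_1d` is a hypothetical helper rather than the authors' CNN.

```python
def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Exact earth mover's distance between two 1D histograms on the same bins.

    Both histograms are normalized to unit mass; in one dimension the EMD is
    the area between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive mass")
    cdf_a = cdf_b = 0.0
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        cdf_a += a / total_a
        cdf_b += b / total_b
        distance += abs(cdf_a - cdf_b) * bin_width
    return distance
```

For example, moving all mass from the first to the third of three unit-width bins costs two bin widths, so `emd_1d([2, 0, 0], [0, 0, 4])` evaluates to 2.0 after normalization.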

翻訳日:2023-10-26 00:24:11 公開日:2023-10-24

# ダイナミクスシフトを伴うデータに対する状態正規化ポリシー最適化

State Regularized Policy Optimization on Data with Dynamics Shift ( http://arxiv.org/abs/2306.03552v2 )

ライセンス: Link先を確認

Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

(参考訳) 多くの現実世界のシナリオでは、強化学習(rl)アルゴリズムは、動的シフトのあるデータ、すなわち異なる環境ダイナミクスに基づいて訓練される。現在の手法の大部分は、環境パラメータを識別するためにコンテキストエンコーダをトレーニングすることでこの問題に対処している。動的シフトを伴うデータは、環境パラメータに従って分離され、対応するポリシーをトレーニングする。しかし、これらの手法は、データがtextit{ad hoc} として使用されるため、サンプル非効率であり、1つのダイナミクスのために訓練されたポリシーは、異なるダイナミクスを持つ他のすべての環境で収集されたデータから恩恵を受けることができない。本稿では,類似した構造と異なるダイナミクスを持つ多くの環境において,最適ポリシーが類似した定常状態分布を持つことを示す。このような特性を活用し,動的シフトを持つデータから定常状態分布を学習し,効率的なデータ再利用を行う。そのような分布は、新しい環境で訓練されたポリシーを規則化するために使用され、SRPO(\textbf{S}tate \textbf{R}egularized \textbf{P}olicy \textbf{O}ptimization)アルゴリズムにつながる。理論的解析を行うため、類似した環境構造の直観はホモモルファスMDPの概念によって特徴づけられる。次に、定常状態分布によって規則化されたポリシーに対して、低いバウンド性能保証を示す。実際には、SRPOはオンラインとオフラインのRL設定の両方でコンテキストベースのアルゴリズムのアドオンモジュールとなることができる。実験の結果、srpoは複数のコンテキストベースのアルゴリズムをより効率的にし、全体的な性能を大幅に向上できることがわかった。

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address this issue by training context encoders to identify environment parameters. Data with dynamics shift are separated according to their environment parameters to train the corresponding policy. However, these methods can be sample inefficient as data are used \textit{ad hoc}, and policies trained for one dynamics cannot benefit from data collected in all other environments with different dynamics. In this paper, we find that in many environments with similar structures and different dynamics, optimal policies have similar stationary state distributions. We exploit this property and learn the stationary state distribution from data with dynamics shift for efficient data reuse. This distribution is used to regularize the policy trained in a new environment, leading to the SRPO (\textbf{S}tate \textbf{R}egularized \textbf{P}olicy \textbf{O}ptimization) algorithm. To conduct theoretical analyses, the intuition of similar environment structures is characterized by the notion of homomorphous MDPs. We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings. Experimental results show that SRPO can make several context-based algorithms far more data efficient and significantly improve their overall performance.

翻訳日:2023-10-26 00:23:56 公開日:2023-10-24

# シュレーディンガー代数の自然な基底におけるクリロフ複雑性

Krylov complexity in a natural basis for the Schrödinger algebra ( http://arxiv.org/abs/2306.03133v3 )

ライセンス: Link先を確認

Dimitrios Patramanis and Watse Sybesma

(参考訳) クリロフ複雑性の研究により、2次元シュレーディンガー群対称性を持つ量子系の作用素成長を研究する。半単純リー代数では実現可能であるが、半直和構造によって特徴づけられるシュレーディンガー代数のような場合はより複雑である。我々は、この代数のクリロフ複雑性を自然な正規直交基底で計算することを提案する。この基底では、通常のランチョスアルゴリズムが与える三重対角構造とは対照的に、時間発展作用素が五重対角構造をとる。結果として生じる複雑性は期待通りに振る舞う。このアプローチは他の非半単純代数に対する洞察を与えうると我々は主張する。

We investigate operator growth in quantum systems with two-dimensional Schrödinger group symmetry by studying the Krylov complexity. While this is feasible for semisimple Lie algebras, cases such as the Schrödinger algebra, which is characterized by a semi-direct sum structure, are more complicated. We propose to compute Krylov complexity for this algebra in a natural orthonormal basis, which produces a pentadiagonal structure of the time evolution operator, in contrast to the usual tridiagonal outcome of the Lanczos algorithm. The resulting complexity behaves as expected. We advocate that this approach can provide insights into other non-semisimple algebras.
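
For readers unfamiliar with the tridiagonal outcome that the abstract contrasts against, a textbook Lanczos recursion on a small real symmetric matrix can be sketched as follows. This is a generic illustration, not the paper's construction: the matrix stands in for the generator of time evolution, the function name `lanczos` is hypothetical, and no reorthogonalization is performed.

```python
import math


def lanczos(matrix, start, tol=1e-12):
    """Lanczos coefficients (alphas, betas) of a real symmetric matrix.

    In the Krylov basis built from `start`, the matrix is tridiagonal with
    `alphas` on the diagonal and `betas` on the first off-diagonals.
    """
    n = len(start)
    norm = math.sqrt(sum(c * c for c in start))
    current = [c / norm for c in start]
    previous = [0.0] * n
    alphas, betas = [], []
    beta = 0.0
    for _ in range(n):
        # Apply the matrix to the current Krylov vector.
        w = [sum(matrix[i][j] * current[j] for j in range(n)) for i in range(n)]
        alpha = sum(wi * ci for wi, ci in zip(w, current))
        alphas.append(alpha)
        # Orthogonalize against the two most recent basis vectors only.
        w = [wi - alpha * ci - beta * pv for wi, ci, pv in zip(w, current, previous)]
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < tol:
            break  # Krylov space exhausted
        betas.append(beta)
        previous, current = current, [wi / beta for wi in w]
    return alphas, betas
```

Because each new vector only needs to be orthogonalized against its two predecessors, the operator couples each basis element to its nearest neighbours, which is the tridiagonal structure. A pentadiagonal operator, as reported in the abstract for the proposed basis, would also couple next-nearest neighbours.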

翻訳日:2023-10-26 00:23:25 公開日:2023-10-24

# ReContrast: コントラスト再構成によるドメイン特異的異常検出

ReContrast: Domain-Specific Anomaly Detection via Contrastive Reconstruction ( http://arxiv.org/abs/2306.02602v3 )

ライセンス: Link先を確認

Jia Guo, Shuai Lu, Lize Jia, Weihang Zhang, Huiqi Li

(参考訳) 殆どの高度な教師なし異常検出(UAD)手法は、例えばImageNetのような大規模データセットで事前訓練された冷凍エンコーダネットワークの特徴表現をモデル化することに依存している。しかし, 自然画像領域から借用したエンコーダから抽出した特徴は, 産業検査や医用画像などのUAD領域で要求される特徴とほとんど一致しない。本稿では,ネットワーク全体を最適化し,事前学習した画像領域に対するバイアスを低減し,対象領域におけるネットワークの向き付けを行う,新たな認識論的uad法であるrecontrastを提案する。まず、エラーから異常を検出する機能再構築アプローチから始める。本質的に、コントラスト学習の要素を特徴再構成にエレガントに組み込んで、ネットワークが不安定、パターン崩壊、および同一のショートカットをトレーニングし、同時にターゲットドメイン上のエンコーダとデコーダの両方を最適化する。様々な画像領域における転写能力を実証するために,2つの一般的な産業欠陥検出ベンチマークと3つの医療画像UADタスクにまたがる広範な実験を行った。

Most advanced unsupervised anomaly detection (UAD) methods rely on modeling feature representations of frozen encoder networks pre-trained on large-scale datasets, e.g., ImageNet. However, the features extracted from encoders borrowed from natural image domains coincide little with the features required in the target UAD domain, such as industrial inspection and medical imaging. In this paper, we propose a novel epistemic UAD method, namely ReContrast, which optimizes the entire network to reduce biases towards the pre-trained image domain and orients the network towards the target domain. We start with a feature reconstruction approach that detects anomalies from reconstruction errors. Essentially, elements of contrastive learning are embedded in feature reconstruction to protect the network against training instability, pattern collapse, and identical shortcuts, while simultaneously optimizing both the encoder and decoder on the target domain. To demonstrate its transfer ability across image domains, we conduct extensive experiments on two popular industrial defect detection benchmarks and three medical image UAD tasks, which show its superiority over current state-of-the-art methods.

翻訳日:2023-10-26 00:23:14 公開日:2023-10-24

# 分布シフト下での低ランクデータの雑音化:二重降下とデータ拡張

Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation ( http://arxiv.org/abs/2305.17297v2 )

ライセンス: Link先を確認

Chinmaya Kausik and Kashvi Srivastava and Rishi Sonthalia

(参考訳) 現代の機械学習におけるデノイジングの重要性と、教師付きデノイジングに関する豊富な経験的研究にもかかわらず、その理論的理解は比較的少ない。教師付きdenoisingを研究することの1つの懸念は、テスト分布からのノイズレストレーニングデータが常に存在するとは限らないことである。テストデータセットとは異なるデータセットからノイズレストレーニングデータにアクセスするのは、より合理的である。そこで本研究では,分布シフト下での分節化と雑音入力回帰について検討した。実生活データや現代の機械学習への理論的洞察の適用性を高めるために、3つの考察を加えます。第一に、過去の理論的な研究は、データ共分散行列はフルランクでよく条件付けされていると仮定しているが、実生活データはおよそローランクである。したがって、データ行列は低ランクであると仮定する。第2に、データの独立性の前提を下げます。第3に、計算能力の増大とデータの次元性は、非古典的学習体制の研究を重要視している。したがって、データ次元$d$とサンプル数$N$が$d/N = c + o(1)$として成長する非古典的比例法で作業する。この設定では,雑音と雑音の回帰に対する一般的なテストエラー表現を導出し,雑音の過大さが良性,緊張的,あるいは破滅的である場合の研究を行う。テスト誤差は一般分布シフト下で二重降下を示し,データ拡張と暗黙的正規化としてのノイズの役割についての洞察を与える。また、実生活データを用いて実験を行い、その理論予測を低ランクデータに対する1% MSE誤差と一致させる。

Despite the importance of denoising in modern machine learning and ample empirical work on supervised denoising, its theoretical understanding is still relatively scarce. One concern about studying supervised denoising is that one might not always have noiseless training data from the test distribution. It is more reasonable to have access to noiseless training data from a different dataset than the test dataset. Motivated by this, we study supervised denoising and noisy-input regression under distribution shift. We add three considerations to increase the applicability of our theoretical insights to real-life data and modern machine learning. First, while most past theoretical work assumes that the data covariance matrix is full-rank and well-conditioned, empirical studies have shown that real-life data is approximately low-rank. Thus, we assume that our data matrices are low-rank. Second, we drop independence assumptions on our data. Third, the rise in computational power and dimensionality of data have made it important to study non-classical regimes of learning. Thus, we work in the non-classical proportional regime, where data dimension $d$ and number of samples $N$ grow as $d/N = c + o(1)$. For this setting, we derive general test error expressions for both denoising and noisy-input regression, and study when overfitting the noise is benign, tempered or catastrophic. We show that the test error exhibits double descent under general distribution shift, providing insights for data augmentation and the role of noise as an implicit regularizer. We also perform experiments using real-life data, where we match the theoretical predictions with under 1% MSE error for low-rank data.

翻訳日:2023-10-26 00:22:53 公開日:2023-10-24

# 外部時間スケール調整によるハイパーパラメータ依存性の低減

Reducing hyperparameter dependence by external timescale tailoring ( http://arxiv.org/abs/2307.08603v2 )

ライセンス: Link先を確認

Lina C. Jaurigue and Kathy Lüdge

(参考訳) 貯水池コンピューティングにおけるタスク特化ハイパーパラメータチューニングはオープンな問題であり、特にハードウェア実装型貯水池との関連性が高い。本研究では,外部制御可能なタスク特定時間スケールが貯留層計算手法の性能とハイパーパラメータ感度に与える影響について検討する。その結果,リザーバの時間スケールが特定のタスクに合わせて調整された場合,ハイパーパラメータの最適化の必要性を低減できることがわかった。この結果は主に過去の入力の記憶を必要とする時間的タスクに関係している。貯水池計算手法にタスク固有の時間スケールを含める様々な方法を検討し、時間多重・空間多重の貯水池計算の両面から、メッセージの普遍性を実証する。

Task-specific hyperparameter tuning in reservoir computing is an open issue and is of particular relevance for hardware-implemented reservoirs. We investigate the influence of directly including externally controllable task-specific timescales on the performance and hyperparameter sensitivity of reservoir computing approaches. We show that the need for hyperparameter optimisation can be reduced if timescales of the reservoir are tailored to the specific task. Our results are mainly relevant for temporal tasks requiring memory of past inputs, for example chaotic time series prediction. We consider various methods of including task-specific timescales in the reservoir computing approach and demonstrate the universality of our message by looking at both time-multiplexed and spatially multiplexed reservoir computing.

翻訳日:2023-10-26 00:14:35 公開日:2023-10-24

# 開量子系におけるフラクトニック高次位相

Fractonic Higher-Order Topological Phases in Open Quantum Systems ( http://arxiv.org/abs/2307.05474v2 )

ライセンス: Link先を確認

Jian-Hao Zhang, Ke Ding, Shuo Yang, Zhen Bi

(参考訳) 本研究では,非共役平均対称性保護位相の開放量子系への一般化を,サブシステム対称性と大域対称性の組み合わせで検討する。特に、平均サブシステム対称性を持つ2種類の固有平均高次位相位相相の例を示す。平均対称性の一般化された異常キャンセル基準に基づくこれらの位相の分類手法についても論じる。

In this work, we study the generalization of decohered average symmetry-protected topological phases to open quantum systems with a combination of subsystem symmetries and global symmetries. In particular, we provide examples of two types of intrinsic average higher-order topological phases with average subsystem symmetries. A classification scheme for these phases based on generalized anomaly cancellation criteria of average symmetry is also discussed.

翻訳日:2023-10-26 00:14:24 公開日:2023-10-24

# RADAR: 逆学習によるロバストなAIテキスト検出

RADAR: Robust AI-Text Detection via Adversarial Learning ( http://arxiv.org/abs/2307.03838v2 )

ライセンス: Link先を確認

Xiaomeng Hu and Pin-Yu Chen and Tsung-Yi Ho

(参考訳) 大規模言語モデル(LLM)の最近の進歩とChatGPTライクなアプリケーションの普及により、人間と機械間の高品質テキスト生成の境界が曖昧になった。しかし、我々の技術や社会の革命的な変化に加えて、LLM生成テキスト(AIテキスト)と人間生成テキストを区別することの難しさは、偽コンテンツ生成、盗作、無実の作家の虚偽の告発など、誤用と公平性の新たな課題を引き起こす。既存の研究は、現在のAIテキスト検出器はLLMベースのパラフレーズには堅牢ではないことを示しているが、本稿は、敵学習による堅牢なAIテキスト検出を共同で訓練するRADARと呼ばれる新しいフレームワークを提案することによって、このギャップを埋めることを目指している。 RADARはパラフラザーと検出器の対向訓練に基づいている。パラフレーズの目標は、AIテキスト検出を避けるために現実的なコンテンツを生成することである。 RADARは検出器からのフィードバックを使ってパラフラザーを更新する。 4つのデータセットで8つの異なるLLM(Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, Vicuna)を評価した結果、RADARが既存のAIテキスト検出方法、特にパラフレーズが設定されている場合において、大幅に上回っていることが示された。 GPT-3.5-Turbo を用いた RADAR の高機能化と RADAR の高機能化について検討した。

Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary of high-quality text generation between humans and machines. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusations of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a robust AI-text detector via adversarial learning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic content to evade AI-text detection. RADAR uses the feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5-Turbo.

翻訳日:2023-10-26 00:14:18 公開日:2023-10-24

# 逆モデルによる不確かさの定量化

Quantification of Uncertainty with Adversarial Models ( http://arxiv.org/abs/2307.03217v2 )

ライセンス: Link先を確認

Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Günter Klambauer, Sepp Hochreiter

(参考訳) 不確かさの定量化は、実世界のアプリケーションにおいて実行可能な予測を行ううえで重要である。予測的不確実性定量化の重要な部分は、発散関数と事後分布の積の積分として定義される認識論的不確実性の推定である。ディープアンサンブルやMCドロップアウトのような現在の手法は、モデルをサンプリングする際に主に事後分布のみを考慮するため、認識論的不確実性の推定には不十分である。認識論的不確実性をよりよく推定するために、敵対的モデルによる不確かさの定量化(QUAM)を提案する。 QUAMは、事後分布だけでなく、積分の中の積全体が大きい領域を特定する。その結果、QUAMは従来の方法に比べて認識論的不確実性の近似誤差が小さい。積が大きいモデルは、(敵対的事例ではなく)敵対的モデルに対応する。敵対的モデルは、高い事後確率と、その予測と参照モデルの予測との間の大きな発散の両方を持つ。実験の結果、QUAMは深層学習モデルの認識論的不確実性の把握に優れ、視覚領域の困難な課題において従来の手法を上回ることがわかった。

Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.

翻訳日:2023-10-26 00:13:51 公開日:2023-10-24

# Marginal Pseudo-likelihood を用いたガウス図形モデルの大規模ベイズ構造学習

Large-scale Bayesian Structure Learning for Gaussian Graphical Models using Marginal Pseudo-likelihood ( http://arxiv.org/abs/2307.00127v2 )

ライセンス: Link先を確認

Reza Mohammadi, Marit Schoonhoven, Lucas Vogels, S. Ilker Birbil

(参考訳) ガウスモデルの学習のためのベイズ的手法は、モデルの不確実性に対処し、事前の知識を取り入れる堅牢なフレームワークを提供する。その理論的な強みにもかかわらず、ベイズ法の適用性はしばしば計算的要求、特に数千の変数を含む現代の文脈によって制約される。この問題を克服するため,我々は,ベイズ的手法を先行する手法に比べて計算コストが著しく低いマルコフ連鎖モンテカルロ(mcmc)探索アルゴリズムを2つ導入する。提案するmcmcに基づく探索アルゴリズムは,計算の難解な正規化定数と反復的精度行列サンプリングの複雑さを回避できる。これらのアルゴリズムは、1000変数の大規模な問題であっても、標準コンピュータ上でほんの数分で信頼できる結果を提供できる。さらに,提案手法は,全グラフ空間を効率的に探索することにより,モデルの不確実性に対処することができる。シミュレーション研究は,提案アルゴリズム,特に大規模スパースグラフにおいて,計算効率と精度の点でベイズ的手法より優れていることを示す。新しいアプローチをサポートする実装は、r package bdgraphを通じて利用できる。

Bayesian methods for learning Gaussian graphical models offer a robust framework that addresses model uncertainty and incorporates prior knowledge. Despite their theoretical strengths, the applicability of Bayesian methods is often constrained by computational needs, especially in modern contexts involving thousands of variables. To overcome this issue, we introduce two novel Markov chain Monte Carlo (MCMC) search algorithms that have a significantly lower computational cost than leading Bayesian approaches. Our proposed MCMC-based search algorithms use the marginal pseudo-likelihood approach to bypass the complexities of computing intractable normalizing constants and iterative precision matrix sampling. These algorithms can deliver reliable results in mere minutes on standard computers, even for large-scale problems with one thousand variables. Furthermore, our proposed method is capable of addressing model uncertainty by efficiently exploring the full posterior graph space. Our simulation study indicates that the proposed algorithms, particularly for large-scale sparse graphs, outperform the leading Bayesian approaches in terms of computational efficiency and precision. The implementation supporting the new approach is available through the R package BDgraph.

翻訳日:2023-10-26 00:13:33 公開日:2023-10-24

# ディープフェイク検出の公平性向上

Improving Fairness in Deepfake Detection ( http://arxiv.org/abs/2306.16635v2 )

ライセンス: Link先を確認

Yan Ju, Shu Hu, Shan Jia, George H. Chen, Siwei Lyu

(参考訳) 近年の効果的なディープフェイク検出モデルの開発にもかかわらず、近年の研究では、ディープフェイク検出モデルの開発に使用されるトレーニングデータのバイアスが、異なる人種や性別の人口集団に対して不公平なパフォーマンスをもたらすことが示されている。このような結果、これらのグループは不公平に標的にされ、または検出から除外され、分類されていないディープフェイクが世論を操り、モデルの信頼を損なうことができる。これらの研究はディープフェイク検出における不公平さの同定と評価に重点を置いているが,アルゴリズムレベルでのディープフェイク検出の公平性問題に対処する手法は開発されていない。そこで本研究では,新しい損失関数を提案すれば,人口統計学的要因を認識できない方法で,ディープフェイク検出モデルをトレーニングできるという,ディープフェイク検出フェアネスを改善する最初の試みを行う。 4つのdeepfakeデータセットと5つのdeepfake検出器に関する広範な実験は、deepfake検出フェアネスを改善するためのアプローチの有効性と柔軟性を示しています。

Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data used to develop deepfake detection models can lead to unfair performance for demographic groups of different races and/or genders. This can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating the unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are either agnostic to or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.

翻訳日:2023-10-26 00:13:13 公開日:2023-10-24

# vint:ビジュアルナビゲーションのための基礎モデル

ViNT: A Foundation Model for Visual Navigation ( http://arxiv.org/abs/2306.14846v2 )

ライセンス: Link先を確認

Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine

(参考訳) 汎用的事前学習モデル("foundation model")は、個々の機械学習問題に対して、スクラッチから学習するために必要なものよりもはるかに小さいデータセットを使って、一般化可能なソリューションを作成することができる。このようなモデルは通常、弱い監督を持つ大規模で多様なデータセットでトレーニングされ、個々のダウンストリームアプリケーションで利用可能なものよりも多くのトレーニングデータを消費する。本稿では,視覚に基づくロボットナビゲーションにおける汎用事前学習モデルの成功を目的とした基礎モデルである視覚ナビゲーショントランスフォーマ(vint)について述べる。 ViNTは、任意のナビゲーションデータセットで使用可能な汎用目標到達目標をトレーニングし、フレキシブルなTransformerベースのアーキテクチャを使用して、ナビゲーションの余裕を学習し、さまざまな下流ナビゲーションタスクへの効率的な適応を可能にする。 vintは、さまざまなロボットプラットフォームから数百時間のロボットナビゲーションを含む、既存の多くのナビゲーションデータセットでトレーニングされており、特異なデータセットでトレーニングされた専門家モデルよりも優れた、ポジティブな転送を示す。 ViNTは、新しい環境を探索するための拡散に基づくサブゴールの提案で拡張することができ、長距離ヒューリスティックスを備えた場合のキロメートル規模のナビゲーション問題を解決することができる。 ViNTはプロンプトチューニングにインスパイアされた技法で新しいタスク仕様に適応することができ、ゴールエンコーダはゴールトークンの同じ空間に埋め込まれた別のタスクモダリティ(GPSウェイポイントやルーティングコマンドなど)のエンコーディングに置き換えられる。様々な下流問題領域に対応する柔軟性と能力は、モバイルロボティクスの効果的な基盤モデルとしてViNTを確立している。ビデオ、コード、モデルチェックポイントについては、プロジェクトページ https://visualnav-transformer.github.io を参照してください。

General-purpose pre-trained models ("foundation models") have enabled practitioners to produce generalizable solutions for individual machine learning problems with datasets that are significantly smaller than those required for learning from scratch. Such models are typically trained on large and diverse datasets with weak supervision, consuming much more training data than is available for any individual downstream application. In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation. ViNT is trained with a general goal-reaching objective that can be used with any navigation dataset, and employs a flexible Transformer-based architecture to learn navigational affordances and enable efficient adaptation to a variety of downstream navigational tasks. ViNT is trained on a number of existing navigation datasets, comprising hundreds of hours of robotic navigation from a variety of different robotic platforms, and exhibits positive transfer, outperforming specialist models trained on singular datasets. ViNT can be augmented with diffusion-based subgoal proposals to explore novel environments, and can solve kilometer-scale navigation problems when equipped with long-range heuristics. ViNT can also be adapted to novel task specifications with a technique inspired by prompt-tuning, where the goal encoder is replaced by an encoding of another task modality (e.g., GPS waypoints or routing commands) embedded into the same space of goal tokens. This flexibility and ability to accommodate a variety of downstream problem domains establishes ViNT as an effective foundation model for mobile robotics. For videos, code, and model checkpoints, see our project page at https://visualnav-transformer.github.io.

翻訳日:2023-10-26 00:12:55 公開日:2023-10-24

# 漸近等方性サブプランク位相空間感度に対するスーパーポーシングコンパス状態

Superposing compass states for asymptotic isotropic sub-Planck phase-space sensitivity ( http://arxiv.org/abs/2306.13182v2 )

ライセンス: Link先を確認

Atharva Shukla, Barry C. Sanders

(参考訳) コンパス状態は、位相空間の変位に対する感度が真空状態の任意の方向に分散する感度よりも優れているという意味でサブプランク位相空間構造をもたらすが、この感度は異方性である。ここでは、一般化されたコンパス状態を、前者に対して$n$ のコンパス状態の重ね合わせとして導入し、それぞれ$\frac\pi{2n}$ で向き付けます。具体的には、これら一般化されたコンパス状態のウィグナー関数と、一般化されたコンパス状態とそれらの置換されたコンパス状態との重なりに対する近似閉形式表現を導出する。さらに、一般化されたコンパス状態は、任意の方向における位相空間の変位に対する等方性感度を示す。

Compass states deliver sub-Planck phase-space structure in the sense that sensitivity to phase-space displacement is superior to the sensitivity of displacing the vacuum state in any direction, but this sensitivity is anisotropic: sensitivity is better for some directions of phase-space displacement than for others. Here we introduce generalised compass states as superpositions of $n$ compass states, with each oriented by $\frac\pi{2n}$ with respect to its predecessor. Specifically, we derive Wigner functions for these generalised compass states and approximate closed-form expressions for overlaps between generalised compass states and their displaced counterparts. Furthermore, we show that generalised compass states, in the limit $n\to\infty$, display isotropic sensitivity to phase-space displacement in any direction.

翻訳日:2023-10-26 00:12:26 公開日:2023-10-24

# 監視システムのレプリカ限界における位相遷移の解明

Elusive phase transition in the replica limit of monitored systems ( http://arxiv.org/abs/2306.12166v2 )

ライセンス: Link先を確認

Guido Giachetti and Andrea De Luca

(参考訳) 各スピンが無作為な方向にスピン成分の弱い測定によって常に摂動しているペアワイズオールツーオールノイズ相互作用を持つ、n$ spin-$1/2$粒子の系において、正確な可解な力学モデルの研究を行った。我々は、このレプリカのトリックを利用して、精製やその他の可観測物の研究における測定結果の重み付けをボルンの規則に当てはめ、大額のN$制限に正確に記述する。相転移の性質は計算に使用されるレプリカの数 n$ に大きく依存しており、関連する $n \rightarrow 1$ のレプリカリミットにおける不連続/清浄相を破壊する非摂動的対数補正が現れる。具体的には、弱い測定相における混合状態の浄化時間は、任意の強い測定速度でシステムサイズにおいて常に指数関数的に長いことを観察する。

We study an exactly solvable model of monitored dynamics in a system of $N$ spin-$1/2$ particles with pairwise all-to-all noisy interactions, where each spin is constantly perturbed by weak measurements of the spin component in a random direction. We make use of the replica trick to account for the Born-rule weighting of the measurement outcomes in the study of purification and other observables, with an exact description in the large-$N$ limit. We find that the nature of the phase transition strongly depends on the number $n$ of replicas used in the calculation, with the appearance of non-perturbative logarithmic corrections that destroy the disentangled/purifying phase in the relevant $n \rightarrow 1$ replica limit. Specifically, we observe that the purification time of a mixed state in the weak measurement phase is always exponentially long in the system size for arbitrarily strong measurement rates.

翻訳日:2023-10-26 00:12:08 公開日:2023-10-24

# 構造に基づく薬物設計のための幾何学的深層学習の体系的調査

A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design ( http://arxiv.org/abs/2306.11768v5 )

ライセンス: Link先を確認

Zaixi Zhang, Jiaxian Yan, Qi Liu, Enhong Chen, and Marinka Zitnik

(参考訳) 構造に基づく薬物設計(structure-based drug design, SBDD)は、タンパク質の3次元形状を利用して薬物候補を同定する。物理化学的モデリングに基礎を置き、ドメインの専門知識に支えられた伝統的な手法は、多くの資源を必要とする。3D幾何学的データの統合と処理に焦点を当てた幾何学的深層学習の最近の進歩は、AlphaFoldのようなツールによる正確なタンパク質3D構造予測が利用可能になったことと相まって、構造に基づく薬物設計の分野を大きく前進させた。本稿では,SBDDにおける幾何学的深層学習の現状を体系的にレビューする。まず,SBDDの基本課題を概説し,広く使われる3Dタンパク質表現を詳述し,代表的な予測モデルと生成モデルを取り上げる。次に、結合部位予測、結合ポーズ生成、 \emph{de novo} 分子生成、リンカ設計、結合親和性予測など、各キータスクの詳細なレビューを行う。形式的な問題定義を提供し,各タスクの代表的な方法,データセット,評価指標,パフォーマンスベンチマークを概説する。最後に、現在の課題と今後の機会をまとめる。SBDDにおける現在の課題には、過度に単純化された問題定式化、分布外汎化の不足、信頼できる評価指標と大規模ベンチマークの欠如、実験的検証とモデル理解の向上の必要性が含まれる。機会としては、マルチモーダルデータセットの活用、ドメイン知識の統合、包括的なベンチマークの構築、臨床エンドポイントに基づく基準の設計、設計タスクの範囲を広げる基盤モデルの開発が挙げられる。また、進行中の貢献とSBDDの新しいデータセットを反映する \url{https://github.com/zaixizhang/Awesome-SBDD} をキュレートしている。

Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates. Traditional methods, grounded in physicochemical modeling and informed by domain expertise, are resource-intensive. Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have greatly advanced the field of structure-based drug design. This paper systematically reviews the current state of geometric deep learning in SBDD. We first outline foundational tasks in SBDD, detail prevalent 3D protein representations, and highlight representative predictive and generative models. We then offer in-depth reviews of each key task, including binding site prediction, binding pose generation, \emph{de novo} molecule generation, linker design, and binding affinity prediction. We provide formal problem definitions and outline each task's representative methods, datasets, evaluation metrics, and performance benchmarks. Finally, we summarize the current challenges and future opportunities: current challenges in SBDD include oversimplified problem formulations, inadequate out-of-distribution generalization, a lack of reliable evaluation metrics and large-scale benchmarks, and the need for experimental verification and enhanced model understanding; opportunities include leveraging multimodal datasets, integrating domain knowledge, building comprehensive benchmarks, designing criteria based on clinical endpoints, and developing foundation models that broaden the range of design tasks. We also curate \url{https://github.com/zaixizhang/Awesome-SBDD}, reflecting ongoing contributions and new datasets in SBDD.

翻訳日:2023-10-26 00:11:49 公開日:2023-10-24

# 生成的行動クローニングのための証明可能保証--低レベル安定性と高レベル行動の橋渡し

Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior ( http://arxiv.org/abs/2307.14619v5 )

ライセンス: Link先を確認

Adam Block, Ali Jadbabaie, Daniel Pfrommer, Max Simchowitz, Russ Tedrake

(参考訳) 生成モデルを用いた複雑な専門家による実験の行動クローニングに関する理論的枠組みを提案する。我々のフレームワークは、専門家によるデモンストレーションの模倣を安定化させるために、低レベルのコントローラ(位置命令制御の学習または暗黙)を呼び出す。私たちはそれを示します a) 適切な低レベルの安定保証及び b) 擬似学習者として十分強力な生成モデルである純粋教師付き行動クローニングは, 基本的に任意の専門的軌跡の時間毎のステップ分布を最適な輸送コストで生成することができる。我々の分析は、学習方針の確率的連続性(英語版)(total variation continuity、TVC)に依存している。次に、一般的なデータ拡張レジームと新しいアルゴリズムのトリックを組み合わせることで、TVCが最小限の精度の劣化で確保できることを示し、実行時に拡張ノイズを追加する。拡散モデルによりパラメータ化されたポリシーの保証をインスタンス化し、学習者が(雑音増大した)エキスパートポリシーのスコアを正確に推定した場合、擬似軌道の分布は自然の最適輸送距離における演者分布に近くなることを示す。提案手法は,無関心な手法である雑音提示トラジェクタ間の複雑なカップリングを構成する。本稿では,アルゴリズムの推薦を実証的に検証し,生成モデルによる行動クローニングの改善に向けた今後の研究の方向性について論じる。

We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a powerful enough generative model as our imitation learner, pure supervised behavior cloning can generate trajectories matching the per-time step distribution of essentially arbitrary expert trajectories in an optimal transport cost. Our analysis relies on a stochastic continuity property of the learned policy we call "total variation continuity" (TVC). We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations, and discussing implications for future research directions for better behavior cloning with generative modeling.

翻訳日:2023-10-26 00:07:13 公開日:2023-10-24

# ランダムウォークによる異常検出に対する結合空間攻撃

Coupled-Space Attacks against Random-Walk-based Anomaly Detection ( http://arxiv.org/abs/2307.14387v2 )

ライセンス: Link先を確認

Yuni Lai, Marcin Waniek, Liying Li, Jingwen Wu, Yulin Zhu, Tomasz P. Michalak, Talal Rahwan, Kai Zhou

(参考訳) ランダムウォークに基づく異常検出(RWAD)は、様々なアプリケーションにおいて異常パターンを特定するために一般的に用いられる。 RWADの興味深い特徴は、入力グラフが事前に存在するか、生の特徴から構築できることである。その結果、RWADに対する潜在的な攻撃面は2つあり、グラフ空間攻撃と特徴空間攻撃である。本稿では,実用的な結合空間攻撃を設計し,グラフ空間攻撃と特徴空間攻撃の相互作用を調べることで,この脆弱性を検討する。この目的のために、我々は徹底的な複雑性解析を行い、RWADへの攻撃がNP困難であることを証明した。そこで我々は,グラフ空間攻撃を二段階最適化問題として定式化し,それを解決するための2つの戦略,すなわち交互反復(alterI-attack)とランダムウォークモデルの閉形式解の利用(cf-attack)を提案する。最後に、より強力な特徴空間攻撃(グラフ誘導攻撃)を設計するためのガイダンスとしてグラフ空間攻撃の結果を利用する。包括的実験により,提案する攻撃は,限定的な攻撃予算でターゲットノードがRWADの検出を回避できるようにするうえで有効であることを示す。さらに,ブラックボックス設定で転送攻撃実験を行い,特徴空間攻撃が対象ノードの異常スコアを有意に減少させることを示した。本研究は,グラフ空間が特徴空間に依存するグラフ異常検出に対する結合空間攻撃の研究の扉を開く。

Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical coupled-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade detection by RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the coupled-space attack against graph anomaly detection in which the graph space relies on the feature space.

翻訳日:2023-10-26 00:06:37 公開日:2023-10-24

# 多類分類における平均ケースロバストネスの効率的な推定

Efficient Estimation of Average-Case Robustness for Multi-Class Classification ( http://arxiv.org/abs/2307.13885v3 )

ライセンス: Link先を確認

Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

(参考訳) 機械学習におけるロバスト性は、敵対的設定でよく研究されるが、実世界のノイズ(測定ノイズなど)は敵対的ではなくランダムである。このような雑音下でのモデル行動は、平均ケースロバスト性、すなわち入力周辺の局所領域で一貫した予測を得る確率によって捉えられる。しかしながら、モンテカルロサンプリングに基づいて平均ケースロバストネスを計算するナイーブなアプローチは、特に高次元データでは統計的に非効率であり、大規模アプリケーションでは計算コストが法外に高くなる。本研究では,マルチクラス判別モデルの平均ケースロバストネスを効率的に計算する最初の解析推定器を開発した。これらの推定器は入力周辺の局所領域のモデルを線形化し、結果の線形モデルのロバスト性を解析的に計算する。これらの推定器が標準ディープラーニングモデルのロバストネスを効率的に計算することを実証的に示し、ロバスト性バイアスの測定やノイズの摂動に弱いデータセットサンプルの同定など、ロバストネスに関わる様々なタスクにおいてこれらの推定器の有用性を示す。そこで本研究では,ロバストネスのための新しいフレームワークを提案するだけでなく,その計算を実用的なものにし,下流アプリケーションにおける平均ケースロバストネスの利用を可能にする。

Robustness in machine learning is commonly studied in the adversarial setting, yet real-world noise (such as measurement noise) is random rather than adversarial. Model behavior under such noise is captured by average-case robustness, i.e., the probability of obtaining consistent predictions in a local region around an input. However, the naïve approach to computing average-case robustness based on Monte-Carlo sampling is statistically inefficient, especially for high-dimensional data, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute average-case robustness of multi-class discriminative models. These estimators linearize models in the local region around an input and analytically compute the robustness of the resulting linear models. We show empirically that these estimators efficiently compute the robustness of standard deep learning models and demonstrate these estimators' usefulness for various tasks involving robustness, such as measuring robustness bias and identifying dataset samples that are vulnerable to noise perturbation. In doing so, this work not only proposes a new framework for robustness, but also makes its computation practical, enabling the use of average-case robustness in downstream applications.
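
As a rough illustration of the contrast drawn above between Monte-Carlo sampling and an analytical estimate, the sketch below treats only the simplest case: a binary linear classifier under isotropic Gaussian noise, where the probability of a consistent prediction has the closed form Φ(|w·x + b| / (σ‖w‖)). This is not the paper's multi-class estimator; the function names, the Gaussian noise model, and the toy inputs are assumptions made for illustration.

```python
import math
import random


def normal_cdf(t):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def analytic_robustness(w, b, x, sigma):
    """P(sign(w.(x + eps) + b) == sign(w.x + b)) for eps ~ N(0, sigma^2 I).

    Since w.eps ~ N(0, sigma^2 ||w||^2), the probability is
    Phi(|w.x + b| / (sigma * ||w||)).
    """
    margin = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return normal_cdf(margin / (sigma * norm))


def monte_carlo_robustness(w, b, x, sigma, n_samples, rng):
    """Sampling-based estimate of the same probability (assumed baseline)."""
    base = sum(wi * xi for wi, xi in zip(w, x)) + b
    same = 0
    for _ in range(n_samples):
        noisy = sum(wi * (xi + rng.gauss(0.0, sigma)) for wi, xi in zip(w, x)) + b
        same += (noisy > 0) == (base > 0)
    return same / n_samples


if __name__ == "__main__":
    w, b, x, sigma = [3.0, 4.0], 0.0, [1.0, 1.0], 1.0
    print(analytic_robustness(w, b, x, sigma))
    print(monte_carlo_robustness(w, b, x, sigma, 20000, random.Random(0)))
```

The closed form needs one dot product and one norm, while the sampling estimate needs a forward pass per sample, which is the efficiency gap the abstract refers to.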

翻訳日:2023-10-26 00:06:11 公開日:2023-10-24

# WebArena: 自律エージェント構築のための現実的なWeb環境

WebArena: A Realistic Web Environment for Building Autonomous Agents ( http://arxiv.org/abs/2307.13854v2 )

ライセンス: Link先を確認

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig

(参考訳) 生成AIの進歩により、自律エージェントが自然言語コマンドを通じて日々のタスクを管理できる可能性が生まれている。しかし、現在のエージェントは主に単純化された合成環境で作成・テストされており、現実世界のシナリオとの乖離が生じている。本稿では,現実性と再現性の高い,言語誘導エージェントのための環境を構築する。具体的には、Web上でタスクを行うエージェントに注目し、eコマース、ソーシャルフォーラムの議論、共同ソフトウェア開発、コンテンツ管理という4つの一般的なドメインの完全に機能するWebサイトを備えた環境を構築する。私たちの環境は、人間のようなタスク解決を促すために、ツール(地図など)と外部知識ベース(ユーザマニュアルなど)で拡充されている。この環境に基づいて、タスク完了の機能的正確性の評価に焦点を当てた一連のベンチマークタスクを公開する。ベンチマークのタスクは多様かつ長期的(long-horizon)で、人間がインターネット上で日常的に行うタスクを模倣するように設計されている。我々はいくつかのベースラインエージェントで実験を行い、行動前の推論などの最近の手法を統合する。結果は、複雑なタスクの解決が困難であることを示している。GPT-4をベースとした最良のエージェントでも、エンド・ツー・エンドのタスク成功率は14.41%に過ぎず、人間のパフォーマンスである78.24%を大幅に下回る。これらの結果は、ロバストなエージェントのさらなる開発の必要性、現在の最先端の大規模言語モデルがこれらの実生活のタスクにおいて完全なパフォーマンスには程遠いこと、そして、WebArenaがそのような進歩の測定に使用できることを浮き彫りにしている。

With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and designed to emulate tasks that humans routinely perform on the internet. We experiment with several baseline agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 14.41%, significantly lower than the human performance of 78.24%. These results highlight the need for further development of robust agents, that current state-of-the-art large language models are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress.

翻訳日:2023-10-26 00:05:32 公開日:2023-10-24

# 交通信号制御のためのSim-to-Real転送に向けた不確実な接地行動変換

Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control ( http://arxiv.org/abs/2307.12388v2 )

ライセンス: Link先を確認

Longchao Da, Hao Mei, Romir Sharma and Hua Wei

(参考訳) 交通信号制御(TSC)は、数百万人の日常生活に影響を与える複雑で重要なタスクである。強化学習(RL)は交通信号制御の最適化に有望な結果を示しているが、現在のRLベースのTSC手法は主にシミュレーションで訓練されており、シミュレーションと実世界の間の性能ギャップに悩まされている。本稿では,UGATと呼ばれるシミュレーションから実世界への(sim-to-real)転移手法を提案する。UGATは,シミュレーション中の行動を不確実性を考慮して動的に変換することで遷移ダイナミクスのドメインギャップを緩和し,シミュレーション環境で学習した方策を実世界環境へ転移する。本手法をシミュレーションした交通環境において評価し,転移されたRL方策の実世界における性能を著しく向上させることを示す。

Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.

翻訳日:2023-10-26 00:05:04 公開日:2023-10-24

# 表面電子に基づく非断熱的ホロノミック量子ゲート

Nonadiabatic holonomic quantum gates based on the surface electron ( http://arxiv.org/abs/2307.09900v3 )

ライセンス: Link先を確認

Jun Wang, Hai-Bo Wang, Qing Ai

(参考訳) 幾何学的位相に基づく非断熱ホロノミック量子計算は、系に内在するノイズとデコヒーレンスに対して堅牢である。本研究では, 量子計算のための有望な2次元プラットフォームである表面電子系において, 非断熱ホロノミック量子ゲートを実現するためのスキームを理論的に提案する。ホロノミックゲートは、不均一磁場を介してリュードベリ状態とスピン状態を結合した3準位構造によって実現される。循環的な発展の後、計算基底は異なる幾何学的位相を獲得し、幾何学的ゲートが実行される。スピンアップの電子のみが幾何学的ゲートの作用を受け、スピンダウンの電子は状態選択的な駆動場から分離される。これにより、リュードベリ状態とスピン状態にエンコードされた任意の制御Uゲートを実現できる。出力状態の忠実度は、実験的に達成可能なパラメータで 0.99 を超える。

The nonadiabatic holonomic quantum computation based on the geometric phase is robust against the built-in noise and decoherence. In this work, we theoretically propose a scheme to realize nonadiabatic holonomic quantum gates in a surface electron system, which is a promising two-dimensional platform for quantum computation. The holonomic gate is realized by a three-level structure that combines the Rydberg states and spin states via an inhomogeneous magnetic field. After a cyclic evolution, the computation bases pick up different geometric phases and thus perform a geometric gate. Only the electron with spin up experiences the geometric gate, while the electron with spin down is decoupled from the state-selective driving fields. The arbitrary controlled-U gate encoded on the Rydberg states and spin states can then be realized. The fidelity of the output state exceeds 0.99 with experimentally achievable parameters.

翻訳日:2023-10-26 00:04:25 公開日:2023-10-24

# 曖昧な基底真理の下での共形予測

Conformal prediction under ambiguous ground truth ( http://arxiv.org/abs/2307.09302v2 )

ライセンス: Link先を確認

David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet

(参考訳) 共形予測(Conformal Prediction, CP)は、$\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$ からのキャリブレーションデータ $(X_1,Y_1),...,(X_n,Y_n)$ に基づき、ユーザが選択した $\alpha \in [0,1]$ に対して $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ を満たす予測集合 $C(X)$ を構成することで、厳密な不確実性定量化を可能にする。通常、$\mathbb{P}^{Y|X}$ は「真の」事後ラベル分布であると暗黙的に仮定される。しかし、多くの実世界のシナリオにおいて、ラベル $Y_1,...,Y_n$ は投票手順を用いて専門家の意見を集約することで得られ、結果として one-hot 分布 $\mathbb{P}_{vote}^{Y|X}$ となる。そのような「投票された」ラベルに対して、CP の保証は真の分布 $\mathbb{P}$ ではなく $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ に関するものとなる。基底真理ラベルが曖昧でない場合、$\mathbb{P}_{vote}$ と $\mathbb{P}$ の区別は無関係である。しかし、ラベルが曖昧なために専門家の意見が一致しない場合、$\mathbb{P}^{Y|X}$ を one-hot 分布 $\mathbb{P}_{vote}^{Y|X}$ で近似すると、この不確実性は無視される。本稿では、専門家の意見を利用し、非退化分布 $\mathbb{P}_{agg}^{Y|X}$ を用いて $\mathbb{P}^{Y|X}$ を近似することを提案する。各キャリブレーション例 $X_1,...,X_n$ に対して $\mathbb{P}_{agg}^{Y|X}$ から複数の合成擬似ラベルをサンプリングすることにより、$\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ に関する保証を与えるモンテカルロ CP 手順を開発する。専門家アノテータ間で大きな不一致がある皮膚疾患分類のケーススタディでは、$\mathbb{P}_{vote}$ に関する CP を適用すると専門家のアノテーションに対する被覆率が不足することを示す。すなわち、$72\%$ の被覆率でキャリブレーションしても平均で $10\%$ 不足する。我々のモンテカルロ CP は、このギャップを経験的にも理論的にも埋める。

Conformal Prediction (CP) makes it possible to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$ by relying on calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically.
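
The calibration step described above can be sketched in a few lines. The code below shows standard split conformal calibration with the score 1 - p(y|x), plus a Monte Carlo variant that draws several pseudo-labels per calibration example from an aggregated expert distribution and pools their scores. The score function, the pooling, and all names are assumptions for illustration; the paper's exact procedure and the quantile correction behind its guarantee are not reproduced here.

```python
import math
import random


def conformal_threshold(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points: keep every label
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p(label | x) is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


def monte_carlo_scores(model_probs, agg_probs, m, rng):
    """Sample m pseudo-labels per calibration example from the aggregated
    expert distribution and collect the score of each sampled label."""
    scores = []
    for probs, agg in zip(model_probs, agg_probs):
        labels = rng.choices(range(len(agg)), weights=agg, k=m)
        scores.extend(1.0 - probs[y] for y in labels)
    return scores


if __name__ == "__main__":
    rng = random.Random(0)
    model_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
    agg_probs = [[0.9, 0.1, 0.0], [0.0, 0.5, 0.5], [0.4, 0.2, 0.4]]
    pooled = monte_carlo_scores(model_probs, agg_probs, m=10, rng=rng)
    tau = conformal_threshold(pooled, alpha=0.2)
    print(prediction_set([0.6, 0.3, 0.1], tau))
```

With one-hot `agg_probs` the sampling step always returns the voted label, so the sketch reduces to ordinary calibration on voted labels; non-degenerate rows are what let expert disagreement widen the threshold.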

翻訳日:2023-10-26 00:04:13 公開日:2023-10-24

# 量子コヒーレンスと微視的可逆性の原理

Quantum coherence and the principle of microscopic reversibility ( http://arxiv.org/abs/2307.08792v2 )

ライセンス: Link先を確認

K. Khan, W. F. Magalhaes, Jailson S. Araujo, B. de Lima Bernardo and Gabriel H. Aguilar

(参考訳) 微視的可逆性の原理は、ゆらぎ関係とオンサガーの相互関係の定式化の基本的な要素である。したがって、この原理が量子力学のシナリオにどのように適合するかを明確に記述することは、非平衡量子過程をよりよく理解するために重要である。本稿では、量子遷移を観測する確率と対応する時間反転過程との対称性関係においてコヒーレンスが果たす役割を強調する、この原理の量子一般化を提案する。本研究では,温熱貯留層と相互作用する量子ビット系の枠組みにおける知見の意義について検討し,そのダイナミクスをシミュレートする光学実験を実施する。理論および実験の結果, 低温ではコヒーレンスの影響がより決定的であり, 古典の場合からの最大離脱は最大コヒーレント状態に対しては起こらないことがわかった。古典的な予測は適切な範囲で回復される。

The principle of microscopic reversibility is a fundamental element in the formulation of fluctuation relations and the Onsager reciprocal relations. As such, a clear description of whether and how this principle is adapted to the quantum mechanical scenario might be essential to a better understanding of nonequilibrium quantum processes. Here, we propose a quantum generalization of this principle, which highlights the role played by coherence in the symmetry relations involving the probability of observing a quantum transition and that of the corresponding time reversed process. We study the implications of our findings in the framework of a qubit system interacting with a thermal reservoir, and implement an optical experiment that simulates the dynamics. Our theoretical and experimental results show that the influence of coherence is more decisive at low temperatures and that the maximum departure from the classical case does not take place for maximally coherent states. Classical predictions are recovered in the appropriate limits.

翻訳日:2023-10-26 00:02:41 公開日:2023-10-24

# llm自己防衛:自己検査によって、llmは彼らが騙されていることを知っている

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked ( http://arxiv.org/abs/2308.07308v3 )

ライセンス: Link先を確認

Mansi Phute, Alec Helbling, Matthew Hull, ShengYun Peng, Sebastian Szyller, Cory Cornelius and Duen Horng Chau

(参考訳) 大規模言語モデル(LLM)は高品質なテキスト生成に人気があるが、強化学習を通じて人間の価値観に整合させた場合でも有害なコンテンツを生成しうる。敵対的プロンプトは安全対策を回避できる。本稿では、誘導された応答をLLMにスクリーニングさせることでこれらの攻撃を防御する簡単な手法、LLM Self Defenseを提案する。本手法では,微調整や入力前処理,反復的な出力生成は不要である。代わりに、生成されたコンテンツを事前定義されたプロンプトに組み込んで、LLMの別のインスタンスを使用してテキストを分析し、それが有害かどうかを予測する。我々は、現在最も著名なLLMであるGPT 3.5とLlama 2の2つについて、プロンプトに対する肯定応答の強制的な誘導やプロンプトエンジニアリング攻撃など、様々な種類の攻撃に対してLLM Self Defenseを試験する。特に、LLM Self Defense は GPT 3.5 と Llama 2 の両方で攻撃成功率を事実上 0 に下げることに成功した。

Large language models (LLMs) are popular for high-quality text generation but can produce harmful content, even when aligned with human values through reinforcement learning. Adversarial prompts can bypass their safety measures. We propose LLM Self Defense, a simple approach to defend against these attacks by having an LLM screen the induced responses. Our method does not require any fine-tuning, input preprocessing, or iterative output generation. Instead, we incorporate the generated content into a pre-defined prompt and employ another instance of an LLM to analyze the text and predict whether it is harmful. We test LLM Self Defense on GPT 3.5 and Llama 2, two of the most prominent current LLMs, against various types of attacks, such as forcefully inducing affirmative responses to prompts and prompt engineering attacks. Notably, LLM Self Defense succeeds in reducing the attack success rate to virtually 0 using both GPT 3.5 and Llama 2.
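
The screening step described above (embed the generated text in a fixed prompt, then ask a second LLM instance whether it is harmful) can be sketched as follows. This is a minimal sketch under stated assumptions: the prompt wording, the refusal message, and the `generator` and `screener` callables are hypothetical placeholders standing in for real model calls, and are not taken from the paper.

```python
from typing import Callable

# Hypothetical screening prompt; the paper's actual wording is not given here.
FILTER_PROMPT = (
    "Does the following text contain harmful content? "
    "Answer 'Yes, this is harmful' or 'No, this is harmless'.\n\n"
    "Text: {response}"
)

REFUSAL = "Sorry, I cannot help with that request."


def is_harmful(response: str, screener: Callable[[str], str]) -> bool:
    """Ask the screening model for a verdict on already-generated text."""
    verdict = screener(FILTER_PROMPT.format(response=response))
    return verdict.strip().lower().startswith("yes")


def self_defense(user_prompt: str,
                 generator: Callable[[str], str],
                 screener: Callable[[str], str]) -> str:
    """Generate a response, then return it only if the screener accepts it."""
    response = generator(user_prompt)
    return REFUSAL if is_harmful(response, screener) else response


if __name__ == "__main__":
    # Stub callables stand in for real LLM calls in this illustration.
    def toy_generator(prompt: str) -> str:
        return "Paris is the capital of France."

    def toy_screener(prompt: str) -> str:
        return "No, this is harmless"

    print(self_defense("What is the capital of France?", toy_generator, toy_screener))
```

Because only the output is screened, the sketch needs no fine-tuning and no preprocessing of the user prompt, which matches the properties the abstract lists.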

翻訳日:2023-10-25 23:54:51 公開日:2023-10-24

# セマンティックスを超えて:自己教師型学習による行動強化関連モデル学習

Beyond Semantics: Learning a Behavior Augmented Relevance Model with Self-supervised Learning ( http://arxiv.org/abs/2308.05379v4 )

ライセンス: Link先を確認

Zeyuan Chen, Wei Chen, Jia Xu, Zhongyi Liu, Wei Zhang

(参考訳) 関連性モデリングは,対応するクエリに対して望ましいアイテムを見つけることを目的としており,検索エンジンがユーザエクスペリエンスを確保する上で重要である。ほとんどの従来手法は、クエリとアイテム間の意味的類似性を評価することでこの問題に対処するが、純粋な意味的マッチングがすべてではない。実際には、検索ログのユーザ行動履歴データから抽出された補助的なクエリ・アイテム間の相互作用が、ユーザの検索意図をさらに明らかにするためのヒントを与えうる。そこで我々は,ターゲットアイテムの近傍クエリとターゲットクエリの近傍アイテムを利用してターゲットのクエリ・アイテム間の意味的マッチングを補完する,Alipay検索のための新しい行動拡張関連性学習モデル(BARL-ASe)を考案する。具体的には,近傍とターゲットの両方のビューから粗粒度および細粒度の意味表現を抽出するために,マルチレベルのコアテンションを構築する。続いて,表現学習とロジット学習を強化することでBARL-ASeの精度と頑健性を向上させるために,近傍・ターゲット間の自己教師あり学習を採用する。さらに、Alipayのミニアプリ検索シナリオにおけるロングテールのクエリ・アイテムマッチングを実用的に扱う方法について論じる。実世界の産業データとオンラインA/Bテストによる実験により,提案手法が低レイテンシで有望な性能を達成することを実証した。

Relevance modeling aims to locate desirable items for corresponding queries, which is crucial for search engines to ensure user experience. Although most conventional approaches address this problem by assessing the semantic similarity between the query and item, pure semantic matching is not everything. In reality, auxiliary query-item interactions extracted from user historical behavior data of the search log could provide hints to reveal users' search intents further. Drawing inspiration from this, we devise a novel Behavior Augmented Relevance Learning model for Alipay Search (BARL-ASe) that leverages neighbor queries of target item and neighbor items of target query to complement target query-item semantic matching. Specifically, our model builds multi-level co-attention for distilling coarse-grained and fine-grained semantic representations from both neighbor and target views. The model subsequently employs neighbor-target self-supervised learning to improve the accuracy and robustness of BARL-ASe by strengthening representation and logit learning. Furthermore, we discuss how to deal with the long-tail query-item matching of the mini apps search scenario of Alipay practically. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance with low latency.

翻訳日:2023-10-25 23:54:31 公開日:2023-10-24

# モデルモデル -- その1

Model of models -- Part 1 ( http://arxiv.org/abs/2308.04600v2 )

ライセンス: Link先を確認

Shimon Komarovsky

(参考訳) 本稿では,AGIエージェントの主成分として機能する新しい認知モデルを提案する。このモデルは、成熟した知能状態において導入され、以前のモデルであるDENN、特にAKREMの拡張として、運用モデル(フレーム/クラス)と意志を含む。このモデルの中核的な仮定は、認知とは適切な意志の導きのもとで蓄積された知識を操作することである、というものである。また、知識の一部である行動は、成熟した知能状態に先行する進化段階において、意志に沿うように学習されると仮定する。さらに、このモデルは、トップダウンとボトムアップの両方のモデル学習や、一般化対特殊化など、既知のすべての知的側面における双対性原理に主に基づいている。さらに、AGI設計には全体論的アプローチが提唱され、再利用性と単純性という形で、制約下あるいは効率性の下での認知が提案される。最後に、この成熟状態への到達を、統合原理を利用した幼児期から成人期への認知的進化を通して記述する。この認知モデルの最終的な産物は、モデルとインスタンスの動的な運用メモリである。最後に、成熟状態に達するための進化段階に関するいくつかの例と予備的なアイデアを示す。

This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization versus specialization, and more. Furthermore, a holistic approach is advocated for AGI design, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.

翻訳日:2023-10-25 23:54:09 公開日:2023-10-24

# MM-Vet:統合能力のための大規模マルチモーダルモデルの評価

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities ( http://arxiv.org/abs/2308.02490v3 )

ライセンス: Link先を確認

Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

(参考訳) 複雑なマルチモーダルタスクにおける大規模マルチモーダルモデル(LMM)を検証する評価ベンチマークであるMM-Vetを提案する。近年のLMMは、黒板に書かれた数学の問題を解くこと、ニュース画像中の出来事や有名人について推論すること、視覚的ジョークを説明することなど、様々な興味深い能力を示している。モデルの急速な進歩は、評価ベンチマークの開発に課題をもたらす。課題には,(1)複雑なマルチモーダルタスクを体系的に構造化し評価する方法,(2)質問や回答のタイプを問わずうまく機能する評価指標を設計する方法,(3)単純なパフォーマンスランキングを超えたモデルに関する洞察を提供する方法,が含まれる。この目的のために、複雑なタスクを解く興味深い能力は、様々なコアとなる視覚言語(VL)能力を統合できる汎用モデルによってしばしば達成されるという知見に基づいて設計されたMM-Vetを提示する。 MM-Vetは6つのコアVL能力を定義し、能力の組み合わせから導かれる関心のある16の統合を検証する。評価指標として,オープンエンド出力のためのLLMに基づく評価器を提案する。この評価器により、異なる質問タイプと回答スタイルにまたがる評価が可能となり、統一されたスコアリング指標が得られる。 MM-Vet上で代表的なLMMを評価し、異なるLMMシステムパラダイムとモデルの能力に関する洞察を提供する。コードとデータはhttps://github.com/yuweihao/MM-Vetで公開されている。

We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.

翻訳日:2023-10-25 23:53:23 公開日:2023-10-24

# Baby Llama: パフォーマンスペナルティのない小さなデータセットで訓練された教師のアンサンブルからの知識蒸留

Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty ( http://arxiv.org/abs/2308.02019v2 )

ライセンス: Link先を確認

Inar Timiryasov and Jean-Loup Tastet

(参考訳) 言語モデルのサンプル効率の向上を目標とするBabyLMチャレンジへの提出内容を紹介する。我々は,発達的に妥当な10M語のBabyLMデータセット上で,GPT-2と小さなLLaMAモデルからなるアンサンブルを訓練し,それを58Mパラメータの小さなLLaMAモデルに蒸留した。この蒸留モデルは,教師の双方および蒸留なしで訓練した同様のモデルを性能で上回る。これは、教師モデルが十分に小さなデータセットで訓練されている場合、蒸留は教師モデルの完全な性能を維持するだけでなく、それを上回ることができ、直接訓練よりもかなり優れた性能を得られることを示唆する。

We present our submission to the BabyLM challenge, whose goal was to improve the sample efficiency of language models. We trained an ensemble consisting of a GPT-2 and small LLaMA models on the developmentally-plausible, 10M-word BabyLM dataset, then distilled it into a small, 58M-parameter LLaMA model, which exceeds in performance both of its teachers as well as a similar model trained without distillation. This suggests that distillation can not only retain the full performance of the teacher model when the latter is trained on a sufficiently small dataset; it can exceed it, and lead to significantly better performance than direct training.
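
To make the idea of distilling an ensemble of teachers into one student concrete, the sketch below computes a per-token distillation loss: a weighted sum of the usual cross-entropy on the true token and the KL divergence from the averaged, temperature-softened teacher distribution to the student's. The loss form, the `alpha` weighting, the temperature, and the function names are generic knowledge-distillation assumptions, not details taken from the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_distillation_loss(student_logits, teacher_logits_list, target,
                               alpha=0.5, temperature=2.0):
    """alpha * CE(student, target) + (1 - alpha) * KL(avg teachers || student)."""
    # Hard-label cross-entropy on the student's ordinary (T = 1) distribution.
    ce = -math.log(softmax(student_logits)[target])
    # Average the teachers' temperature-softened distributions.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    avg = [sum(col) / len(teacher_probs) for col in zip(*teacher_probs)]
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(avg, student_soft) if p > 0)
    return alpha * ce + (1 - alpha) * kl


if __name__ == "__main__":
    student = [0.5, 1.5, 2.0]
    teachers = [[0.0, 1.0, 3.0], [0.2, 2.0, 2.5]]  # e.g. two teacher models
    print(ensemble_distillation_loss(student, teachers, target=2))
```

Averaging the teachers' probabilities before taking the divergence is one simple way to combine an ensemble; other weightings are equally plausible.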

翻訳日:2023-10-25 23:52:58 公開日:2023-10-24

# DiffKendall: Kendallのランク相関を微分可能なFew-Shot学習のための新しいアプローチ

DiffKendall: A Novel Approach for Few-Shot Learning with Differentiable Kendall's Rank Correlation ( http://arxiv.org/abs/2307.15317v2 )

ライセンス: Link先を確認

Kaipeng Zheng, Huishuai Zhang, Weiran Huang

(参考訳) 少数ショット学習は、ベースデータセットでトレーニングされたモデルを、それまでモデルによってカテゴリが見られなかった新しいタスクに適応させることを目的としている。これはしばしば、新しいクラスにおけるチャネル間の機能値の比較的均一な分布をもたらし、新しいタスクにおけるチャネルの重要性を決定する上での課題となる。標準的少数ショット学習法では、コサイン類似度や負ユークリッド距離といった幾何学的類似度メトリクスを用いて、2つの特徴間の意味的関連度を測定する。しかし、幾何学的類似性が高い特徴は、特に数ショット学習の文脈において、異なる意味論を持つ可能性がある。本稿では,特徴チャネルのランク付けの重要性が,幾何学的類似度指標よりも数ショット学習の信頼性が高いことを示す。幾何類似度メトリックをケンドールのランク相関に置き換えることにより、様々な領域の様々な方法やデータセットにおいて、数発学習の性能を向上させることができる。さらに,kendallのランク相関の非微分可能性問題に対処するために,メタトレーニングにおいて注意深く設計された微分可能損失を提案する。幾何学的類似性を微分可能なkendallのランク相関式に置き換えることで,既存の多数の少数ショットアプローチと統合することができ,幾何学的類似度メトリクスに依存する将来の最先端手法と統合する準備が整っている。大規模な実験は、ランク相関に基づくアプローチの有効性を検証し、少数ショット学習において顕著な改善を示す。

Few-shot learning aims to adapt models trained on the base dataset to novel tasks where the categories were not seen by the model before. This often leads to a relatively uniform distribution of feature values across channels on novel classes, posing challenges in determining channel importance for novel tasks. Standard few-shot learning methods employ geometric similarity metrics such as cosine similarity and negative Euclidean distance to gauge the semantic relatedness between two features. However, features with high geometric similarities may carry distinct semantics, especially in the context of few-shot learning. In this paper, we demonstrate that the importance ranking of feature channels is a more reliable indicator for few-shot learning than geometric similarity metrics. We observe that replacing the geometric similarity metric with Kendall's rank correlation only during inference is able to improve the performance of few-shot learning across a wide range of methods and datasets with different domains. Furthermore, we propose a carefully designed differentiable loss for meta-training to address the non-differentiability issue of Kendall's rank correlation. By replacing geometric similarity with differentiable Kendall's rank correlation, our method can integrate with numerous existing few-shot approaches and is ready for integrating with future state-of-the-art methods that rely on geometric similarity metrics. Extensive experiments validate the efficacy of the rank-correlation-based approach, showcasing a significant improvement in few-shot learning.
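
The non-differentiability mentioned above comes from the sign function in Kendall's rank correlation. The sketch below computes the exact coefficient (the tau-a form, ignoring ties) and a smooth surrogate that replaces sign with tanh(alpha * d). The tanh smoothing and the `alpha` parameter are assumptions chosen for illustration; the abstract does not specify the form of the paper's differentiable loss.

```python
import math
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: mean agreement in ordering over all index pairs."""
    pairs = list(combinations(range(len(x)), 2))
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return total / len(pairs)


def soft_kendall_tau(x, y, alpha=10.0):
    """Smooth surrogate: sign(d) is replaced by tanh(alpha * d).

    Larger alpha tracks the exact coefficient more closely but gives
    smaller gradients away from ties.
    """
    pairs = list(combinations(range(len(x)), 2))
    total = sum(
        math.tanh(alpha * (x[i] - x[j])) * math.tanh(alpha * (y[i] - y[j]))
        for i, j in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    a = [0.9, 0.1, 0.5, 0.3]  # toy feature channels of one embedding
    b = [0.8, 0.2, 0.4, 0.6]  # toy feature channels of another
    print(kendall_tau(a, b), soft_kendall_tau(a, b, alpha=20.0))
```

Both functions depend only on the ordering of channels (exactly for the first, approximately for the second), which is the property that distinguishes them from cosine similarity or Euclidean distance.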

翻訳日:2023-10-25 23:52:01 公開日:2023-10-24

# Lanczos法の累積展開を用いた量子計算グリーン関数

Quantum Computed Green's Functions using a Cumulant Expansion of the Lanczos Method ( http://arxiv.org/abs/2309.09685v2 )

ライセンス: Link先を確認

Gabriel Greene-Diniz, David Zsolt Manrique, Kentaro Yamamoto, Evgeny Plekhanov, Nathan Fitzpatrick, Michal Krompiec, Rei Sakuma, David Muñoz Ramo

(参考訳) 本稿では,多体グリーン関数行列をスピン軌道基底で計算する量子計算法を提案する。我々は,有限サイズのフェルミオンハバードモデルと,動的平均場理論における関連する不純物モデルに本手法を適用し,Quantinuumのイオントラップ型量子コンピュータH1-1上でのグリーン関数の計算を実証する。本手法は, ハミルトニアンのモーメントを測定可能な期待値として用いる,ランチョス法のキュムラント展開に基づく。これにより、変分量子固有値ソルバ(VQE)の繰り返し適用に起因する測定回数の大きなオーバーヘッドを回避し、代わりに一組の測定回路でモーメントの期待値を測定する。測定されたモーメントから三重対角化されたハミルトニアン行列が計算され、そこから連分数を通してグリーン関数が得られる。本研究では変分アルゴリズムを用いて基底状態を準備するが, 実装のモジュール性により, 基底状態の準備には他の(変分的でない)アプローチも使用できることに留意する。

In this paper, we present a quantum computational method to calculate the many-body Green's function matrix in a spin orbital basis. We apply our approach to finite-sized fermionic Hubbard models and related impurity models within Dynamical Mean Field Theory, and demonstrate the calculation of Green's functions on Quantinuum's H1-1 trapped-ion quantum computer. Our approach involves a cumulant expansion of the Lanczos method, using Hamiltonian moments as measurable expectation values. This bypasses the need for a large overhead in the number of measurements due to repeated applications of the variational quantum eigensolver (VQE), and instead measures the expectation value of the moments with one set of measurement circuits. From the measured moments, the tridiagonalised Hamiltonian matrix can be computed, which in turn yields the Green's function via continued fractions. While we use a variational algorithm to prepare the ground state in this work, we note that the modularity of our implementation allows for other (non-variational) approaches to be used for the ground state.
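
The last step mentioned above, obtaining a Green's function from a tridiagonal Hamiltonian via continued fractions, can be illustrated classically. Given diagonal Lanczos coefficients a_0, ..., a_{n-1} and off-diagonal coefficients b_1, ..., b_{n-1}, the diagonal element is G(z) = 1 / (z - a_0 - b_1^2 / (z - a_1 - b_2^2 / ...)). The sketch below only evaluates this fraction; the coefficient names and the sign convention G(z) = <0|(z - H)^{-1}|0> are assumptions, and the quantum measurement of moments is not modelled.

```python
def green_continued_fraction(z, alphas, betas):
    """Evaluate G(z) = <0|(z - T)^{-1}|0> for a tridiagonal matrix T.

    alphas: diagonal entries a_0 .. a_{n-1}
    betas:  off-diagonal entries b_1 .. b_{n-1} (betas[k] couples levels k and k+1)
    z:      real or complex frequency, e.g. omega + 1j * eta
    """
    g = 0.0
    # Build the fraction from the innermost level outward.
    for k in range(len(alphas) - 1, -1, -1):
        tail = betas[k] ** 2 * g if k < len(betas) else 0.0
        g = 1.0 / (z - alphas[k] - tail)
    return g


if __name__ == "__main__":
    # Toy 3-level chain with a small imaginary broadening.
    z = 0.3 + 0.05j
    print(green_continued_fraction(z, [0.0, 1.0, -1.0], [0.5, 0.7]))
```

For a 2x2 matrix the result can be checked by hand against (z - a_1) / ((z - a_0)(z - a_1) - b_1^2), which is a convenient sanity test for the indexing.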

翻訳日:2023-10-25 23:45:33 公開日:2023-10-24

# 提案要求に対するオープンデータ駆動チーム推奨によるリサーチコラボレーションの促進

Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals ( http://arxiv.org/abs/2309.09404v3 )

ライセンス: Link先を確認

Siva Likitha Valluru, Biplav Srivastava, Sai Teja Paladi, Siwen Yan, Sriraam Natarajan

(参考訳) チームの構築とコラボレーションの促進は、2つの非常に一般的なビジネス活動である。その一例がTeamingForFunding問題であり、研究機関や研究者は、資金提供機関の提案募集に応じて申請する際に、協力の機会を特定することに関心を持っている。本稿では,(1)各チームが,その機会に要求されるスキルのカバレッジを可能な限り高く達成し,(2)機会を分配する作業負荷が候補メンバー間でバランスするように,さまざまなAI手法を用いてチームを推薦する新しいシステムについて述べる。我々は,提案募集(需要)と研究者プロファイル(供給)のオープンデータに潜在するスキルを抽出し,分類法を用いてそれらを正規化し,需要を供給にマッチさせる効率的なアルゴリズムを作成することで,これらの課題に対処する。短期と長期の目標のバランスをとる新しい指標に沿って、良さを最大化するチームを作成する。我々は,アルゴリズムの成功を,(1)良さスコアを用いて推奨チームを評価することで定量的に検証し,より多くの情報を利用する手法ほど推薦されるチーム数は少ないが良さは高くなることを見出し,(2)大学規模の大規模なユーザスタディを実施することで定性的に検証し,ユーザが全体としてこのツールを非常に有用かつ関連性が高いと評価したことを示す。最後に,我々のアプローチの汎用性を確立するために,米国とインドの2つの異なる環境(研究者と提案募集)でシステムを評価し,日常的な使用のために米国の主要大学に展開する。

Building teams and promoting collaboration are two very common business activities. An example is the TeamingForFunding problem, where research institutions and researchers are interested in identifying collaborative opportunities when applying to funding agencies in response to the latter's calls for proposals. We describe a novel system to recommend teams using a variety of AI methods, such that (1) each team achieves the highest possible skill coverage that is demanded by the opportunity, and (2) the workload of distributing the opportunities is balanced amongst the candidate members. We address these questions by extracting skills latent in open data of proposal calls (demand) and researcher profiles (supply), normalizing them using taxonomies, and creating efficient algorithms that match demand to supply. We create teams to maximize goodness along a novel metric balancing short- and long-term objectives. We validate the success of our algorithms (1) quantitatively, by evaluating the recommended teams using a goodness score, finding that more informed methods lead to recommendations of a smaller number of teams but higher goodness, and (2) qualitatively, by conducting a large-scale user study at a college-wide level, demonstrating that users overall found the tool very useful and relevant. Lastly, we evaluate our system in two diverse settings in US and India (of researchers and proposal calls) to establish generality of our approach, and deploy it at a major US university for routine use.

翻訳日:2023-10-25 23:45:14 公開日:2023-10-24

# AV2Wav: 音声音声強調のための連続自己教師機能からの拡散に基づく再合成

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement ( http://arxiv.org/abs/2309.08030v2 )

ライセンス: Link先を確認

Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

(参考訳) 音声強調システムは通常、クリーンな音声と騒がしい音声のペアを使って訓練される。オーディオ・ヴィジュアル音声強調(AVSE)では、音声・ヴィジュアル・データセットは、背景雑音や残響を伴う現実世界の環境で収集され、AVSEの開発を妨げている。本研究では,実世界の学習データの課題にもかかわらずクリーンな音声を生成できる再生型音声視覚音声強調手法であるAV2Wavを紹介する。ニューラルクオリティ推定器を用いて音声・視覚コーパスからほぼクリーンな音声のサブセットを取得し、このサブセット上で拡散モデルを訓練し、ノイズロバストトレーニングによりAV-HuBERTから連続音声表現に条件付き波形を生成する。韻律や話者情報を保持するために、離散表現よりも連続表現を用いる。このvocodingタスクだけで、モデルはマスキングベースのベースラインよりも音声強調を行うことができる。さらに, クリーン・ノイズ対の拡散モデルを微調整し, 性能向上を図る。提案手法は,自動測定と人間の聴力テストの両方においてマスキングベースのベースラインを上回り,聴力テストにおけるターゲット音声にほぼ近い品質である。オーディオサンプルはhttps://home.ttic.edu/~jcchou/demo/avse/avse_demo.htmlにある。

Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.

Translated: 2023-10-25 23:44:47 Published: 2023-10-24

# 臨床テキスト要約:大規模言語モデルへの適応は人間の専門家を上回らせる

Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts ( http://arxiv.org/abs/2309.07430v3 )

License: see the linked page

Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari

(参考訳) 膨大なテキストデータを精査し、電子健康記録(ehr)から重要な情報を要約することは、臨床医の時間の割り当てに多大な負担を課す。大規模言語モデル(LLM)は自然言語処理(NLP)タスクにおいて大きな可能性を秘めているが、多種多様な臨床要約タスクに対する効果はまだ十分に実証されていない。本研究は,8つのllmにドメイン適応法を適用し,6つのデータセットと4つの異なる臨床要約タスク(放射線検査,患者の質問,進捗記録,医師と患者との対話)にまたがる。我々は,最近のllmの進歩が改善しない事例に加えて,モデルと適応手法のトレードオフを明らかにする。さらに,10名の医師による臨床読影者を対象に,最良適応LSMの要約は,完全性と正確性の観点からヒトの要約より好ましいことを示す。続く質的分析は、LLMと人間の専門家が直面する課題を強調します。最後に,これらの指標が医師の嗜好とどのように一致しているかの理解を深めるため,従来の量的NLP指標と読者調査スコアを相関付ける。我々の研究は、複数のタスクにわたる臨床テキスト要約において、llmが人間専門家を上回った最初の証拠である。このことは、LSMを臨床ワークフローに組み込むことで、医師がパーソナライズされた患者のケアや、本質的に人間の医学的側面にもっと集中できるように、ドキュメントの負担を軽減することができることを意味している。

Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.

Translated: 2023-10-25 23:44:24 Published: 2023-10-24

# MRI並列再構成のためのバッチインプットニューラル表現法

Batch Implicit Neural Representation for MRI Parallel Reconstruction ( http://arxiv.org/abs/2309.06067v3 )

License: see the linked page

Hao Li, Yusheng Zhou, Jianan Liu, Xiling Liu, Tao Huang, and Zhihan Lv

(参考訳) 磁気共鳴画像(MRI)は常に長い取得時間の問題に悩まされている。 MRI再構成は、特定の位相符号化ラインをスキップし、アンダーサンプル測定から高品質なイメージを復元することでスキャン時間を短縮する1つの方法である。近年,物体を空間座標の連続関数として表現する新しい深層学習法として暗黙的ニューラル表現(INR)が登場し,この関数は通常多層パーセプトロン(MLP)によってパラメータ化される。本稿では,INRの一般化問題を克服するために,フルサンプリング画像をボクセル座標の関数として,アンダーサンプル画像の先行特徴ベクトルとして表現した新しいMRI並列再構成手法を提案する。具体的には,スケールの異なるmr画像からスケール非依存なvoxel特徴を生成し,座標ベクトルと結合してmlpを介して完全にサンプリングされたmr画像を復元し,任意のスケール再構成を実現するスケール埋め込みエンコーダを導入する。提案手法の性能は,mriデータセット上で実験し,他の再構成法と比較することで評価した。提案手法が代替手法よりも優れていることを示す定量的評価を行った。

Magnetic resonance imaging (MRI) has always suffered from the problem of long acquisition time. MRI reconstruction is one solution to reduce scan time by skipping certain phase-encoding lines and then restoring high-quality images from undersampled measurements. Recently, implicit neural representation (INR) has emerged as a new deep learning method that represents an object as a continuous function of spatial coordinates, and this function is normally parameterized by a multilayer perceptron (MLP). In this paper, we propose a novel MRI parallel reconstruction method based on INR, which represents the fully-sampled images as a function of voxel coordinates and prior feature vectors of undersampled images to overcome the generalization problem of INR. Specifically, we introduce a scale-embedded encoder to produce scale-independent voxel-specific features from MR images with different undersampled scales and then concatenate them with coordinate vectors to recover fully-sampled MR images via an MLP, thus achieving arbitrary scale reconstruction. The performance of the proposed method was assessed by experimenting on publicly available MRI datasets and compared with other reconstruction methods. Our quantitative evaluation demonstrates the superiority of the proposed method over alternative reconstruction methods.

Translated: 2023-10-25 23:43:25 Published: 2023-10-24

# NanoT5: リソース制限付き事前トレーニングおよび微調整T5スタイルモデルのためのPyTorchフレームワーク

nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources ( http://arxiv.org/abs/2309.02373v2 )

License: see the linked page

Piotr Nawrot

(参考訳) T5のような最先端の言語モデルはNLPのランドスケープに革命をもたらしたが、その計算要求は研究コミュニティの大部分を妨げている。この課題に対処するため、T5モデルの事前学習と微調整を効率的に行うために特別に最適化されたPyTorchフレームワークであるnanoT5を提案する。 nanot5はオプティマイザの違いと優先順位付け効率から得られた洞察に基づいて、t5ベースのモデルをたった16時間で1つのgpuで事前トレーニングすることができる。このオープンソースフレームワークの導入により、言語モデリングの研究へのアクセシビリティを拡大し、よりユーザフレンドリーなT5(Encoder-Decoder)実装に対するコミュニティの要求に応えたいと思っています。コンフィギュレーションやコードベース、事前トレーニングされた洞察、事前トレーニングされたモデルなど、私たちのコントリビューションを一般公開しています。

State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially-optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.

Translated: 2023-10-25 23:43:04 Published: 2023-10-24

# Shatter and Gather: テキストスーパービジョンによる画像セグメンテーションの学習

Shatter and Gather: Learning Referring Image Segmentation with Text Supervision ( http://arxiv.org/abs/2308.15512v2 )

License: see the linked page

Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak

(参考訳) イメージセグメンテーションを参照すると、自由形式のテキストで記述された任意のエンティティをセグメンテーションするタスクは、様々なビジョンアプリケーションを開きます。しかし、このタスクのトレーニングデータの手作業によるラベル付けは極めてコストがかかるため、トレーニング用のラベル付きデータが不足する。トレーニング画像のテキスト記述を唯一の監督源として用いた弱教師付き学習手法によりこの問題に対処する。この目的のために,まず,入力画像中の意味的エンティティを探索し,テキストクエリに関連するエンティティを結合して参照者のマスクを予測する新しいモデルを提案する。また、新たな損失関数を導入し、さらなる監視なしにモデルをトレーニングできるようにします。提案手法は,画像分割参照のための4つの公開ベンチマークで評価され,同じタスクに対する既存の手法や,最近のオープンボカブラリーセグメンテーションモデルよりも明らかに優れていた。

Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to a lack of labeled data for training. We address this issue with a weakly supervised learning approach using text descriptions of training images as the only source of supervision. To this end, we first present a new model that discovers semantic entities in the input image and then combines the entities relevant to the text query to predict the mask of the referent. We also present a new loss function that allows the model to be trained without any further supervision. Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation models on all the benchmarks.

Translated: 2023-10-25 23:42:45 Published: 2023-10-24

# マルチアーメッドバンドの実値組合せ純粋探索のためのトンプソンサンプリング

Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit ( http://arxiv.org/abs/2308.10238v2 )

License: see the linked page

Shintaro Nakamura, Masashi Sugiyama

(参考訳) 本稿では,マルチアームバンディット(R-CPE-MAB)問題の実測値について検討する。 R-CPE-MABでは、プレイヤーは確率的な腕を$d$与えられ、各アームの報酬は$s\in\{1, \ldots, d\}$が平均$\mu_s$の未知分布に従う。各タイムステップで、プレイヤーは片方の腕を引っ張り、その報酬を観察する。プレイヤーのゴールは、最適な \emph{action} $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$を有限サイズの実数値の \emph{action set} $\mathcal{A}\subset \mathbb{R}^{d}$から極小のアームプルで識別することである。 R-CPE-MAB の以前の方法では、アクションセット $\mathcal{A}$ のサイズは$d$ の多項式である。一般トンプソンサンプリング探索法(GenTS-Explore)と呼ばれるアルゴリズムを導入する。これはアクションセットのサイズが指数関数的に$d$で大きい場合でも動作する最初のアルゴリズムである。また,R-CPE-MAB問題に対して,新たな問題依存型サンプル複雑性を低い境界で導入し,GenTS-Exploreアルゴリズムが問題依存定数係数まで最適なサンプル複雑性を実現することを示す。

We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s\in\{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal \emph{action} $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued \emph{action set} $\mathcal{A}\subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$. We also introduce a novel problem-dependent sample complexity lower bound of the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.
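To make the objective in the abstract above concrete, the sketch below computes $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ by brute force over a tiny, explicitly listed action set, and draws one Thompson-style sample of the mean vector. It is not GenTS-Explore: the Gaussian posterior with variance `1 / pulls` and the helper names are assumptions made only for illustration.

```python
import random


def best_action(mu, actions):
    """Return the action pi in `actions` that maximizes the inner product mu^T pi."""
    return max(actions, key=lambda pi: sum(m * p for m, p in zip(mu, pi)))


def sample_means(emp_means, pulls, rng):
    """Draw one plausible mean vector around the empirical means.

    Assumption: independent Gaussian posteriors with variance 1 / pulls per arm.
    """
    return [rng.gauss(m, (1.0 / n) ** 0.5) for m, n in zip(emp_means, pulls)]


# d = 3 arms and a small real-valued action set A (hypothetical values)
mu = [0.2, 0.5, 0.9]
actions = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.3, 0.7), (0.4, 0.4, 0.2)]

pi_star = best_action(mu, actions)  # optimal action under the true means

rng = random.Random(0)
theta = sample_means(emp_means=mu, pulls=[10, 10, 10], rng=rng)
pi_sampled = best_action(theta, actions)  # optimal action under one sample
print(pi_star, pi_sampled)
```

Enumerating `actions` like this is exactly what stops scaling when the action set is exponentially large in $d$, which is the regime the abstract says the proposed algorithm is designed for.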

Translated: 2023-10-25 23:42:20 Published: 2023-10-24

# 知識グラフ推論による弱教師付きセマンティックセグメンテーション

Weakly Supervised Semantic Segmentation by Knowledge Graph Inference ( http://arxiv.org/abs/2309.14057v2 )

License: see the linked page

Jia Zhang, Bo Peng, Xi Wu

(参考訳) 現在、畳み込みニューラルネットワーク(CNN)に基づくWSSS(Weakly Supervised Semantic Segmentation)における既存の取り組みは、同様に重要な下流セグメンテーションネットワークに限定して、マルチラベル分類ネットワークステージの強化に重点を置いている。さらに、CNNベースのローカルコンボリューションには、広範なカテゴリ間の依存関係をモデル化する能力がない。そこで本稿では,wsss 強化のためのグラフ推論に基づくアプローチを提案する。マルチラベル分類とセグメンテーションネットワークの段階を同時に拡張することにより,WSSSの全体的改善を図る。マルチラベル分類ネットワークセグメントでは、外部知識とgcnを組み合わせることで、クラス間の依存関係をグローバルに推論する。これによりネットワークは、画像の不十分な領域の特徴を解明し、生成された擬似ラベルの完全性を改善することができる。セグメント化ネットワークセグメントにおいて,提案するグラフ推論マッピング(GRM)モジュールを用いてテキストデータベースから得られた知識を活用し,画像領域内のクラス表現の文脈的推論を容易にする。このgrmモジュールは、個々のサンプルに対するセマンティックコヒーレンスを動的に学習しながら、セグメンテーションネットワークの局所畳み込みの高レベル意味論における特徴表現を強化する。画像レベルの監視のみを用いて、PASCAL VOC 2012およびMS-COCOデータセット上でWSSSの最先端のパフォーマンスを達成した。マルチラベル分類とセグメンテーションネットワークの段階における広範な実験により,WSSSの進展に対するグラフ推論手法の有効性が示された。

Currently, existing efforts in Weakly Supervised Semantic Segmentation (WSSS) based on Convolutional Neural Networks (CNNs) have predominantly focused on enhancing the multi-label classification network stage, with limited attention given to the equally important downstream segmentation network. Furthermore, CNN-based local convolutions lack the ability to model the extensive inter-category dependencies. Therefore, this paper introduces a graph reasoning-based approach to enhance WSSS. The aim is to improve WSSS holistically by simultaneously enhancing both the multi-label classification and segmentation network stages. In the multi-label classification network segment, external knowledge is integrated, coupled with GCNs, to globally reason about inter-class dependencies. This encourages the network to uncover features in non-salient regions of images, thereby refining the completeness of generated pseudo-labels. In the segmentation network segment, the proposed Graph Reasoning Mapping (GRM) module is employed to leverage knowledge obtained from textual databases, facilitating contextual reasoning for class representation within image regions. This GRM module enhances feature representation in high-level semantics of the segmentation network's local convolutions, while dynamically learning semantic coherence for individual samples. Using solely image-level supervision, we have achieved state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO datasets. Extensive experimentation on both the multi-label classification and segmentation network stages underscores the effectiveness of the proposed graph reasoning approach for advancing WSSS.

Translated: 2023-10-25 23:34:38 Published: 2023-10-24

# 在庫管理における後方予測 : 分類手法とコストの考察

Backorder Prediction in Inventory Management: Classification Techniques and Cost Considerations ( http://arxiv.org/abs/2309.13837v3 )

License: see the linked page

Sarit Maitra, Sukanya Kundu

(参考訳) 本稿では,在庫管理における後方予測のための高度な分析手法を紹介する。秩序とは、株式の枯渇により直ちに達成できない命令のこと。 ROC-AUC や PR-AUC などの性能評価指標を用いて, 平衡バッグ分類器, ファジィ論理, 変分オートエンコーダ, 多層パーセプトロン分類器などの複数の分類手法の評価を行った。さらに、在庫管理や受注処理に関連する金銭的意味やコストを考慮すると、利益関数と誤分類コストが組み込まれている。この研究は、アンサンブル技法とvaeを含むモデリング手法の組み合わせによって、在庫管理における不均衡データセットを効果的に処理し、解釈可能性を強調し、偽陽性と偽陰性を低減できることを示唆している。本研究は, 予測分析の進歩に寄与し, 後方予測における今後の調査や意思決定のための在庫管理最適化に有用な知見を提供する。

This article introduces an advanced analytical approach for predicting backorders in inventory management. Backorder refers to an order that cannot be immediately fulfilled due to stock depletion. Multiple classification techniques, including Balanced Bagging Classifiers, Fuzzy Logic, Variational Autoencoder - Generative Adversarial Networks, and Multi-layer Perceptron classifiers, are assessed in this work using performance evaluation metrics such as ROC-AUC and PR-AUC. Moreover, this work incorporates a profit function and misclassification costs, considering the financial implications and costs associated with inventory management and backorder handling. The study suggests that a combination of modeling approaches, including ensemble techniques and VAE, can effectively address imbalanced datasets in inventory management, emphasizing interpretability and reducing false positives and false negatives. This research contributes to the advancement of predictive analytics and offers valuable insights for future investigations in backorder forecasting and inventory control optimization for decision-making.
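The abstract above mentions a profit function and misclassification costs without giving their form. As a rough sketch of the general idea, the snippet below scores a backorder classifier by net profit computed from confusion-matrix counts and picks the decision threshold that maximizes it. The cost values, the sample scores, and all function names are hypothetical.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, FN, TN when predicting 'backorder' for score >= threshold."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def profit(tp, fp, fn, gain_tp, cost_fp, cost_fn):
    """Net profit: gain per caught backorder minus the two misclassification costs."""
    return tp * gain_tp - fp * cost_fp - fn * cost_fn


def best_threshold(y_true, y_score, thresholds, gain_tp, cost_fp, cost_fn):
    """Return (threshold, profit) for the most profitable candidate threshold."""
    scored = []
    for t in thresholds:
        tp, fp, fn, _ = confusion_counts(y_true, y_score, t)
        scored.append((t, profit(tp, fp, fn, gain_tp, cost_fp, cost_fn)))
    return max(scored, key=lambda pair: pair[1])


# Hypothetical labels (1 = backorder) and model scores
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.3, 0.35, 0.2, 0.1]

# Assumed economics: a missed backorder (FN) costs far more than a false alarm (FP)
best = best_threshold(y_true, y_score, [0.25, 0.5, 0.75],
                      gain_tp=10, cost_fp=2, cost_fn=15)
print(best)
```

With these assumed costs the lowest threshold wins, because false negatives are penalized much more heavily than false positives; different cost ratios would move the optimum.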

Translated: 2023-10-25 23:34:10 Published: 2023-10-24

# Rewrite Caption Semantics: 言語スーパービジョンセマンティックセマンティックセマンティックセマンティックスのためのブリッジングセマンティックギャップ

Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation ( http://arxiv.org/abs/2309.13505v3 )

License: see the linked page

Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Shao Ling, Shijian Lu

(参考訳) ビジョンランゲージ事前学習は、その目覚ましいゼロショット認識能力と、言語監督から一般化可能な視覚表現を学習する可能性を示した。一歩前進して、言語によるセマンティックセグメンテーションは、画像とテキストのペアのみからピクセルグループを学習することで、テキスト入力の空間的局所化を可能にする。それでも、最先端技術は、視覚とテキストのモダリティの間に明確な意味的ギャップに悩まされている:画像に現れる多くの視覚概念が、ペア化されたキャプションに欠けている。このような意味的ミスアライメントは事前学習で循環し、テキスト表現で捉えた視覚概念が不十分なため、密集した予測ではゼロショット性能が劣る。このようなセマンティクスのギャップを埋めるため,CLIPを利用するパイプラインであるConcept Curation(CoCu)を提案する。各画像とテキストのペアに対して,視覚駆動型拡張とテキスト対視覚誘導ランキングとで視覚的に整合するコンセプトアーカイブを構築した。したがって、関連する概念はクラスタガイドによるサンプリングによって識別され、事前トレーニングされ、視覚とテキストのセマンティクスのギャップを埋めることができる。 8つのセグメンテーションベンチマークの幅広いスイートにわたる実験は、cocuがスーパーブゼロショット転送性能を達成し、言語教師付きセグメンテーションベースラインを大きなマージンで大きく向上させ、事前トレーニングデータにおけるセマンティクスギャップの橋渡しの価値を示唆している。

Vision-Language Pre-training has demonstrated its remarkable zero-shot recognition ability and potential to learn generalizable visual representations from language supervision. Taking a step ahead, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state-of-the-art suffers from clear semantic gaps between the visual and textual modalities: plenty of visual concepts appearing in images are missing from their paired captions. Such semantic misalignment circulates in pre-training, leading to inferior zero-shot performance in dense predictions due to insufficient visual concepts captured in textual representations. To close this semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. For each image-text pair, we establish a concept archive that maintains potential visually-matched concepts with our proposed vision-driven expansion and text-to-vision-guided ranking. Relevant concepts can thus be identified via cluster-guided sampling and fed into pre-training, thereby bridging the gap between visual and textual semantics. Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin, suggesting the value of bridging the semantic gap in pre-training data.

Translated: 2023-10-25 23:33:51 Published: 2023-10-24

# 単語レベルとスパンレベルのタスクを統一する:NJUNLPによるWMT2023品質評価共有タスクへの参加

Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task ( http://arxiv.org/abs/2309.13230v2 )

License: see the linked page

Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang

(参考訳) 我々は,WMT 2023 Quality Estimation (QE)共有タスクに対するNJUNLPチームの提案を紹介する。私たちのチームは2つのサブタスクすべてで、英語とドイツ語のペアの予測を提出しました。 (i)文・語レベルの品質予測、及び (ii)細粒度エラースパン検出。 NJUQEフレームワーク(https://github.com/NJUNLP/njuqe)に基づくQEの擬似データ手法をさらに検討する。 WMT翻訳タスクから並列データを用いて疑似MQMデータを生成する。擬似QEデータ上でXLMR大モデルを事前訓練し、実QEデータ上で微調整する。両段階で文レベルスコアと単語レベルタグを共同で学習する。実証的に、私たちはパフォーマンスを改善する重要なハイパーパラメータを見つける実験を行います。技術的には、単語レベルの出力をきめ細かな誤差にカバーする単純な手法を提案する。全体的に、我々のモデルは単語レベルときめ細かいエラースパン検出サブタスクの両方において、英語とドイツ語で最高の結果を得ました。

We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on both sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on the NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that converts the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.
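The abstract above says word-level outputs are converted to error spans but does not describe how. One plausible minimal approach, shown here purely as an assumption, is to merge runs of consecutive `BAD` tags into character spans. The function name `tags_to_spans`, the `OK`/`BAD` label names, and the single-space joining of tokens are all illustrative choices, not the NJUQE implementation.

```python
def tags_to_spans(tokens, tags, bad_label="BAD"):
    """Merge consecutive BAD-tagged tokens into (start, end) character spans.

    Offsets refer to the sentence obtained by joining `tokens` with single
    spaces; `end` is exclusive.
    """
    spans = []
    pos = 0          # character offset of the current token
    start = None     # start offset of the span being built, if any
    end = None
    for tok, tag in zip(tokens, tags):
        tok_start, tok_end = pos, pos + len(tok)
        if tag == bad_label:
            if start is None:
                start = tok_start
            end = tok_end
        elif start is not None:
            spans.append((start, end))
            start = None
        pos = tok_end + 1  # skip the joining space
    if start is not None:
        spans.append((start, end))
    return spans


tokens = ["Das", "ist", "ein", "kleines", "Haus", "."]
tags = ["OK", "BAD", "BAD", "OK", "BAD", "OK"]
sentence = " ".join(tokens)
spans = tags_to_spans(tokens, tags)
print([(s, e, sentence[s:e]) for s, e in spans])
```

A fine-grained error span task would typically also need a severity label per span; that part is left out because the abstract gives no detail on it.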

Translated: 2023-10-25 23:33:19 Published: 2023-10-24

# AnglE最適化テキスト埋め込み

AnglE-optimized Text Embeddings ( http://arxiv.org/abs/2309.12871v5 )

License: see the linked page

Xianming Li, Jing Li

(参考訳) 高品質なテキスト埋め込みは、Large Language Model (LLM) アプリケーションにおいて重要なコンポーネントであるセマンティックテキスト類似性(STS)タスクの改善に重要である。しかし、既存のテキスト埋め込みモデルが直面する共通の課題は、主に飽和ゾーンを持つ最適化目的におけるコサイン関数に依存することによる勾配の消失の問題である。本稿では,AnglEと呼ばれる新しい角度最適化テキスト埋め込みモデルを提案する。 AnglEの中核となる考え方は、複素空間に角度最適化を導入することである。この手法は、勾配を阻害し最適化を妨げうるコサイン関数における飽和域の悪影響を効果的に軽減する。包括的なSTS評価を設定するために、既存の短文STSデータセットとGitHub Issuesから新たに収集された長文STSデータセットを試した。さらに、ラベル付きデータに制限のあるドメイン固有のstsシナリオを検討し、アングルがllmアノテートデータとどのように連携するかを検討する。短文STS、長文STS、ドメイン固有のSTSタスクなど、さまざまなタスクで大規模な実験が行われた。その結果、AnglEはコサイン飽和ゾーンを無視したSOTA(State-of-the-art STS)モデルよりも優れていた。これらの結果は、AnglEが高品質なテキスト埋め込みを生成する能力と、STSにおける角度最適化の有用性を示している。

High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
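The saturation issue described in the abstract above can be seen numerically: the derivative of $\cos\theta$ is $-\sin\theta$, which shrinks toward zero as $\theta$ approaches $0$ or $\pi$, so a cosine-based objective passes back very little gradient for near-identical or near-opposite pairs. The sketch below shows this, along with one generic way to read off an angle difference in the complex plane. It is not AnglE's training objective, which the abstract does not spell out; both helper functions are illustrative.

```python
import cmath
import math


def cosine_grad(theta):
    """Derivative of cos(theta) with respect to the angle theta."""
    return -math.sin(theta)


def angle_diff(z, w):
    """Absolute angle between two nonzero complex numbers, via the argument of z / w."""
    return abs(cmath.phase(z / w))


# Gradient magnitude of the cosine at several angles (radians)
for theta in (0.01, 0.5, math.pi / 2, math.pi - 0.01):
    print(f"theta={theta:.2f}  |d cos / d theta|={abs(cosine_grad(theta)):.4f}")

# The angle itself stays informative where the cosine is flat
print(angle_diff(1 + 1j, 1 + 0j))
```

The printed gradient magnitudes are smallest at the two ends of the range, which is the saturation zone the abstract refers to.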

Translated: 2023-10-25 23:32:43 Published: 2023-10-24

# モデルを微調整する方法:統一モデルシフトとモデルバイアスポリシー最適化

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization ( http://arxiv.org/abs/2309.12671v2 )

License: see the linked page

Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye

(参考訳) 効果的なモデルベース強化学習(mbrl)アルゴリズムの設計と導出は、主にモデル学習とポリシー最適化の結合度が高いことが原因で困難である。モデル学習を導くためにリターンの相違に依存する多くの先行手法は、モデル変更の影響を無視しており、過剰なモデル更新によるパフォーマンス劣化につながる可能性がある。他のメソッドでは、モデルシフトを明示的に考慮するためにパフォーマンス差分を使用する。しかし、これらの手法はモデルシフトを制約するために一定のしきい値に依存するため、しきい値に大きく依存し、トレーニングプロセス中に適応性に欠ける。本稿では,モデルシフトとモデルバイアスを統一し,微調整プロセスを定式化する最適化目標を理論的に導出する。このプロセスはモデル更新を適応的に調整し、モデルオーバーフィットを避けながら、パフォーマンス向上の保証を得る。そこで我々は,USB-PO (Unified model Shift and model Bias Policy Optimization) という簡単なアルゴリズムを開発した。実験の結果,USB-POはいくつかの課題のあるベンチマークタスクにおいて,最先端のパフォーマンスを実現することがわかった。

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly because of the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use a performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm, USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.

Translated: 2023-10-25 23:32:21 Published: 2023-10-24

# 自律型水中車両のインテリジェントデブリ質量推定モデル

Intelligent Debris Mass Estimation Model for Autonomous Underwater Vehicle ( http://arxiv.org/abs/2309.10617v2 )

License: see the linked page

Mohana Sri S, Swethaa S, Aouthithiye Barathwaj SR Y, Sai Ganesh CS

(参考訳) 海洋ゴミは海洋生物の生存に重大な脅威をもたらし、しばしば絡み合いや飢餓につながり、最終的には死に至る。したがって、海洋からゴミを取り除くことは自然のバランスを回復し、海洋生物を繁栄させるのに不可欠である。インスタンスセグメンテーション(インスタンスセグメンテーション)は、物体を識別し、それらを正確に特定し、分離するオブジェクト検出の先進的な形態であり、自律型水中車両(AUV)が水中環境を効果的に操作するための必須のツールである。 AUVは画像セグメンテーションを使用して、カメラが捉えた画像を分析し、水中環境をナビゲートする。本稿では、画像内の個々のオブジェクトの面積を計算するためにインスタンスセグメンテーションを使用し、roboflowではyolov7を使用して、検出毎にクラスラベルと信頼度スコアを持つ画像内の各オブジェクトのバウンディングボックスのセットを生成する。次に、オブジェクトの境界ボックスにバイナリマスクを適用することで、各オブジェクトに対してセグメンテーションマスクを作成する。マスクは、背景からオブジェクトをセグメント化するように訓練された畳み込みニューラルネットワークの出力にバイナリしきい値を適用して生成される。最後に、形態素演算や輪郭検出などの後処理技術を適用し、マスクの精度と品質を向上させることにより、各対象に対するセグメンテーションマスクの精錬を行う。インスタンスセグメンテーションの領域を推定するプロセスは、各セグメンテーションされたインスタンスの領域を別々に計算し、全インスタンスの領域を合計して総面積を得る。この計算は、矩形や円のような物体の形状に基づく標準式を用いて行われる。対象が複素である場合、その領域を推定するためにモンテカルロ法が用いられる。この方法は従来の方法よりも精度が高く、特に多数のサンプルを使用する場合に高い精度を提供する。

Marine debris poses a significant threat to the survival of marine wildlife, often leading to entanglement and starvation, ultimately resulting in death. Therefore, removing debris from the ocean is crucial to restore the natural balance and allow marine life to thrive. Instance segmentation is an advanced form of object detection that identifies objects and precisely locates and separates them, making it an essential tool for autonomous underwater vehicles (AUVs) to navigate and interact with their underwater environment effectively. AUVs use image segmentation to analyze images captured by their cameras to navigate underwater environments. In this paper, we use instance segmentation to calculate the area of individual objects within an image. We use YOLOv7 in Roboflow to generate a set of bounding boxes for each object in the image, with a class label and a confidence score for every detection. A segmentation mask is then created for each object by applying a binary mask to the object's bounding box. The masks are generated by applying a binary threshold to the output of a convolutional neural network trained to segment objects from the background. Finally, the segmentation mask for each object is refined by applying post-processing techniques, such as morphological operations and contour detection, to improve the accuracy and quality of the mask. The process of estimating the area of instance segmentation involves calculating the area of each segmented instance separately and then summing up the areas of all instances to obtain the total area. The calculation is carried out using standard formulas based on the shape of the object, such as rectangles and circles. In cases where the object is complex, the Monte Carlo method is used to estimate the area. This method provides a higher degree of accuracy than traditional methods, especially when using a large number of samples.
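The area-estimation steps described in the abstract above can be sketched in a few lines: count the pixels of each binary instance mask, sum over instances, and fall back to Monte Carlo sampling for a shape that is only available as an inside/outside test. This is an illustrative sketch, not the authors' pipeline; the function names, the toy masks, and the unit-circle test shape are assumptions.

```python
import math
import random


def mask_area(mask):
    """Area of one binary mask in pixels (number of truthy cells)."""
    return sum(1 for row in mask for cell in row if cell)


def total_area(masks):
    """Sum of the per-instance areas."""
    return sum(mask_area(m) for m in masks)


def monte_carlo_area(inside, x_min, x_max, y_min, y_max, n_samples, seed=0):
    """Estimate the area of a shape from uniform samples in its bounding box.

    `inside(x, y)` returns True when the point lies within the shape.
    The estimate is box_area * (fraction of samples that fall inside).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if inside(x, y):
            hits += 1
    box_area = (x_max - x_min) * (y_max - y_min)
    return box_area * hits / n_samples


# Two toy instance masks (1 = object pixel)
masks = [
    [[0, 1, 1],
     [0, 1, 0]],
    [[1, 1],
     [1, 1]],
]
print(total_area(masks))

# Monte Carlo estimate for a unit circle; the exact area is pi
estimate = monte_carlo_area(lambda x, y: x * x + y * y <= 1.0,
                            -1.0, 1.0, -1.0, 1.0, n_samples=100_000)
print(estimate, math.pi)
```

The Monte Carlo error shrinks roughly with the square root of the sample count, which matches the abstract's remark that accuracy improves with a large number of samples.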

Translated: 2023-10-25 23:32:04 Published: 2023-10-24

# 量子ハイブリッドおよび量子インスパイアされたハードウェア上での移動ロボットスケジューリング問題の最適化事例

An Optimization Case Study for solving a Transport Robot Scheduling Problem on Quantum-Hybrid and Quantum-Inspired Hardware ( http://arxiv.org/abs/2309.09736v4 )

License: see the linked page

Dominik Leib, Tobias Seidel, Sven Jäger, Raoul Heese, Caitlin Isobel Jones, Abhishek Awasthi, Astrid Niederle, Michael Bortz

(参考訳) 本稿では,d-wavesのquantum-classical hybrid framework,futsuのquantum-inspired digital annealer,gurobi's state-of-the-art classical solverの性能比較を行った。この問題は、産業的に関連のある現実世界のシナリオに由来する。我々は、異なる設計哲学に従う問題に対して、3つの異なるモデルを提供する。ベンチマークでは、異なるモデルとソルバの組み合わせのソリューション品質とエンドツーエンドランタイムに焦点を当てています。ディジタルアニールラーには有望な結果が得られ、グロビと直接比較すると、ハイブリッド量子アニールラーにはいくつかの機会がある。本研究は、異なる戦略でアプリケーション指向最適化問題を解決するためのワークフローに関する洞察を提供し、異なるアプローチの強みと弱みを評価するのに有用である。

We present a comprehensive case study comparing the performance of D-Wave's quantum-classical hybrid framework, Fujitsu's quantum-inspired digital annealer, and Gurobi's state-of-the-art classical solver in solving a transport robot scheduling problem. This problem originates from an industrially relevant real-world scenario. We provide three different models for our problem following different design philosophies. In our benchmark, we focus on the solution quality and end-to-end runtime of different model and solver combinations. We find promising results for the digital annealer and some opportunities for the hybrid quantum annealer in direct comparison with Gurobi. Our study provides insights into the workflow for solving an application-oriented optimization problem with different strategies, and can be useful for evaluating the strengths and weaknesses of different approaches.

Translated: 2023-10-25 23:31:08 Published: 2023-10-24

# アバロンの思考ゲーム:再帰的熟考による偽装との戦い

Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation ( http://arxiv.org/abs/2310.01320v3 )

License: see the linked page

Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、LLM-as-Agentの分野で大きな成功を収めている。それにもかかわらず、llmsが処理する情報は一貫して正直であり、人間社会やaiが生成するコンテンツにおける広汎な誤解や誤解を招く情報を無視しているという仮定が一般的である。この監視により、LSMは悪意のある操作を受けやすくなり、有害な結果をもたらす可能性がある。本研究では,複雑なアバロンゲームを用いて,認知環境におけるLSMの可能性を探究する。アバロンは誤った情報に満ちており、洗練された論理を必要とするため、「思考のゲーム」として表される。アバロンゲームにおける人間の再帰的思考と視点取りの有効性に着想を得て,LLMの認知・認識能力を高めるための新しい枠組みであるRecursive Contemplation(ReCon)を導入する。 ReConは、定式化と洗練の熟考プロセスを組み合わせており、定式化は初期の思考とスピーチを生み出し、洗練の熟考はそれらをさらに洗練する。さらに、これらのプロセスにそれぞれ一階および二階の視点遷移を組み込む。具体的には、LLMエージェントが他人の精神状態を推測し、2階は他人がエージェントの精神状態をどう知覚するかを理解する。 reconを異なるllmと統合した後、avalon gameの広範な実験結果は、追加の微調整やデータなしで偽情報の識別と操作をllmに支援する効果を示している。最後に、ReConの有効性の可能な説明を提供し、安全性、推論、話し方、フォーマットの観点からLLMの現在の限界を探求し、その後の研究の可能性を秘めている。

Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.

Translated: 2023-10-25 23:25:26 Published: 2023-10-24

# 話者認識のための自己スーパービジョンによる音声とコンテンツの分離

Disentangling Voice and Content with Self-Supervision for Speaker Recognition ( http://arxiv.org/abs/2310.01128v2 )

ライセンス: Link先を確認

Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

(参考訳) 話者認識では,話者特性と内容が混在しているため,音声から正確な話者表現を抽出することは困難である。本稿では,話者の特性と内容の変動を同時にモデル化するアンタングル化フレームワークを提案する。異なる音声成分を抽出する学習可能な遷移モデルからなる3つのガウス推論層を用いて実現した。特に、強化された遷移モデルは、複雑な音声力学をモデル化するために特別に設計されている。また,話者識別以外のラベルを使わずにコンテンツを動的に切り離すセルフスーパービジョン手法を提案する。提案フレームワークの有効性は,VoxCelebデータセットとSITWデータセットを用いて,それぞれEERおよびminDCFの平均減少率を9.56%,8.24%で検証した。追加のモデルトレーニングやデータは特に必要とされないため、実用上容易に適用できる。

For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.
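The equal error rate (EER) quoted above is the operating point where the false-acceptance and false-rejection rates coincide. As a minimal sketch of how that metric can be approximated from verification scores (the function name and the simple threshold sweep are assumptions for illustration, not the authors' evaluation code):

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, chosen only to illustrate the calculation.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

Production toolkits usually interpolate along the ROC curve rather than picking the nearest threshold, so values from this sketch may differ slightly from reported ones.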

翻訳日:2023-10-25 23:24:55 公開日:2023-10-24

# 自動運転のためのドメインアライメントを伴う協調認識における適応通信

Adaptive Communications in Collaborative Perception with Domain Alignment for Autonomous Driving ( http://arxiv.org/abs/2310.00013v2 )

ライセンス: Link先を確認

Senkang Hu, Zhengru Fang, Haonan An, Guowen Xu, Yuan Zhou, Xianhao Chen, Yuguang Fang

(参考訳) 複数のコネクテッド車両と自動運転車両による協調認識は、車両が通信を介して補助情報を交換できるようにすることで、知覚能力を大幅に向上させることができる。従来のアプローチの進歩にもかかわらず、チャネルの変動と協調車両間のデータの不均一性による課題は依然として残っている。そこで本研究では,通信グラフを動的に調整し,平均伝送遅延を最小化するとともに,データの不均一性による副作用を緩和するチャネルアウェア協調知覚フレームワークACC-DAを提案する。本研究の新規性は3つの側面にある。まず、異なるチャネル情報状態に応じて通信グラフを構築し、伝送遅延を最小化できる伝送遅延最小化手法を設計する。次に、レート歪みトレードオフを動的に調整し、知覚効率を向上させる適応データ再構成機構を提案する。さらに、この機構はデータ送信時の時間的冗長性を最小化する。最後に、異なる車両からのデータ分布を整合させるドメインアライメントスキームを考案し、車両間のドメインギャップを緩和して対象タスクの性能を向上させる。総合的な実験により,既存の最先端手法と比較して本手法の有効性が実証された。

Collaborative perception among multiple connected and autonomous vehicles can greatly enhance perceptive capabilities by allowing vehicles to exchange supplementary information via communications. Despite advances in previous approaches, challenges remain due to channel variations and data heterogeneity among collaborative vehicles. To address these issues, we propose ACC-DA, a channel-aware collaborative perception framework that dynamically adjusts the communication graph and minimizes the average transmission delay while mitigating the side effects of data heterogeneity. Our novelties lie in three aspects. We first design a transmission delay minimization method, which constructs the communication graph and minimizes the transmission delay according to different channel information states. We then propose an adaptive data reconstruction mechanism, which dynamically adjusts the rate-distortion trade-off to enhance perception efficiency. Moreover, it minimizes the temporal redundancy during data transmissions. Finally, we conceive a domain alignment scheme to align the data distributions from different vehicles, which mitigates the domain gap between vehicles and improves the performance of the target task. Comprehensive experiments demonstrate the effectiveness of our method in comparison to existing state-of-the-art works.

翻訳日:2023-10-25 23:24:41 公開日:2023-10-24

# 説明可能な機械学習に基づく糖尿病性腎症予測モデル

Explainable machine learning-based prediction model for diabetic nephropathy ( http://arxiv.org/abs/2309.16730v2 )

ライセンス: Link先を確認

Jing-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, and Lang Zou

(参考訳) 本研究の目的は, 糖尿病性腎症 (DN) に対する血清代謝物の影響を解析し, 機械学習を用いてDNの有病率を予測することである。データセットは、2018年4月から2019年4月まで、大連医科大学第二附属病院(SAHDMU)の患者548人で構成されている。最小絶対収縮・選択演算子(LASSO)回帰モデルと10分割交差検証により最適な38個の特徴を選定する。我々は,eXtreme Gradient Boosting (XGB),ランダムフォレスト,決定木,ロジスティック回帰の4つの機械学習アルゴリズムを,AUC-ROC曲線,決定曲線,キャリブレーション曲線で比較した。 Shapley Additive exPlanations (SHAP) 法により、最適予測モデルにおける特徴量の重要度と相互作用効果を定量化する。 XGBモデルは、DNのスクリーニングにおいてAUC値0.966と最高の性能を示す。 XGBモデルは、他のモデルよりも臨床的純便益が大きく、適合度も良い。さらに、血清代謝物と糖尿病の罹病期間の間には有意な相互作用がある。我々は,DN をスクリーニングする XGB アルゴリズムによる予測モデルを開発した。 C2、C5DC、Tyr、Ser、Met、C24、C4DC、Cysはこのモデルに大きく寄与しており、DNのバイオマーカーとなる可能性がある。

The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients seen from April 2018 to April 2019 at the Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a least absolute shrinkage and selection operator (LASSO) regression model with 10-fold cross-validation. We compare four machine learning algorithms, namely eXtreme Gradient Boosting (XGB), random forest, decision tree, and logistic regression, using AUC-ROC curves, decision curves, and calibration curves. We quantify feature importance and interaction effects in the optimal predictive model with the Shapley Additive exPlanations (SHAP) method. The XGB model performs best at screening for DN, with the highest AUC value of 0.966. The XGB model also yields a greater clinical net benefit than the others and a better degree of fit. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model based on the XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute substantially to the model and may be biomarkers for DN.
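The models above are ranked by AUC. As a hedged illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function name and toy data are assumptions and have no connection to the study's patient data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of positive > score of negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The quadratic loop is fine for a few hundred patients. Larger datasets would normally use a rank-based formula or a library routine instead.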

翻訳日:2023-10-25 23:24:24 公開日:2023-10-24

# 金融ポートフォリオ管理のためのディープラーニングとオンライン情報源のセンチメントの活用

Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management ( http://arxiv.org/abs/2309.16679v2 )

ライセンス: Link先を確認

Paraskevi Nousi, Loukia Avramelou, Georgios Rodinos, Maria Tzelepi, Theodoros Manousis, Konstantinos Tsampazis, Kyriakos Stefanidis, Dimitris Spanos, Manos Kirtas, Pavlos Tosidis, Avraam Tsantekidis, Nikolaos Passalis and Anastasios Tefas

(参考訳) ファイナンシャル・ポートフォリオ・マネジメント(英: financial portfolio management)とは、株式、インデックスファンド、外国為替、暗号通貨などの一連の金融資産において、当該事業の損失を最小化しつつ利益を最大化することを目的とした、資金の分配及び取引業務を行う業務をいう。ディープラーニング(DL)メソッドは、さまざまなタスクにおいて一貫して優れており、自動化された金融取引はその中のひとつです。本稿では,金融取引における様々なdl手法について,監督学習と強化学習の両面で見識を提供することを目的としている。同時に、取引資産に関する感情情報を考慮し、対応する研究研究を通してそれらの有用性を議論し、実証する。最後に、このような金融エージェントの訓練においてよく見られる問題について議論し、これらの問題を避けるために必要な知識を読者に与え、実際に議論する方法を適用する。

Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have consistently excelled at various tasks, and automated financial trading is one of the most complex of those. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. At the same time, taking into consideration sentiment information regarding the traded assets, we discuss and demonstrate its usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.

翻訳日:2023-10-25 23:24:01 公開日:2023-10-24

# 一般化されたブラックホールエントロピーはフォン・ノイマンエントロピーである

Generalized Black Hole Entropy is von Neumann Entropy ( http://arxiv.org/abs/2309.15897v2 )

ライセンス: Link先を確認

Jonah Kudler-Flam, Samuel Leutheusser, Gautam Satishchandran

(参考訳) 最近、シュワルツシルト-AdSブラックホールの質量やド・ジッター空間の観測者にドレスされた可観測量のフォン・ノイマン代数がタイプIIであり、したがって適切に定義されたトレースを持つことが示された。「半古典的」状態のフォン・ノイマンエントロピーは一般化エントロピーであることが判明した。しかし、これらの議論は平衡状態(KMS状態)の存在に依存しており、例えば重力崩壊によって形成されたブラックホール、カーブラックホール、あるいは漸近的ド・ジッター空間内のブラックホールには適用されない。本稿では, キリング地平線を持つ任意の時空上の線形場に対して, ドレスされた可観測量の代数を求めるための一般的な枠組みを提案する。定常状態(ただし必ずしも KMS ではない)の存在と解の適切な減衰を仮定すると、ドレスされた可観測量の代数が常に地平線上に「局所化」されたタイプII因子を含むという構造定理が証明される。これらの仮定は、関心のあるほとんどのケースで厳密に証明されている。漸近的に平坦なカーブラックホールの外部における代数に応用すると、場はブラックホールの質量と角運動量にドレスされており、地平線上のタイプII$_{\infty}$代数と過去のヌル無限大におけるタイプI$_{\infty}$代数の積が得られる。シュワルツシルト-ド・ジッターでは、観測者を導入するにもかかわらず、場の可観測量はブラックホール地平線と宇宙論的地平線の摂動された面積にドレスされ、その代数は各地平線上のタイプII$_{\infty}$代数の積である。いずれの場合も、半古典状態に対するフォン・ノイマンエントロピーは一般化エントロピーによって与えられる。我々の結果は、他の「境界構造」が存在する場合(例えば、漸近境界あるいは他のキリング地平線)、可観測量の代数はタイプII$_{\infty}$であり、そのような構造が存在しない場合(例えば、ド・ジッター)、代数はタイプII$_{1}$であることを示唆している。

It was recently shown that the von Neumann algebras of observables dressed to the mass of a Schwarzschild-AdS black hole or an observer in de Sitter are Type II, and thus admit well-defined traces. The von Neumann entropies of "semi-classical" states were found to be generalized entropies. However, these arguments relied on the existence of an equilibrium (KMS) state and thus do not apply to, e.g., black holes formed from gravitational collapse, Kerr black holes, or black holes in asymptotically de Sitter space. In this paper, we present a general framework for obtaining the algebra of dressed observables for linear fields on any spacetime with a Killing horizon. We prove, assuming the existence of a stationary (but not necessarily KMS) state and suitable decay of solutions, a structure theorem that the algebra of dressed observables always contains a Type II factor "localized" on the horizon. These assumptions have been rigorously proven in most cases of interest. Applied to the algebra in the exterior of an asymptotically flat Kerr black hole, where the fields are dressed to the black hole mass and angular momentum, we find a product of a Type II$_{\infty}$ algebra on the horizon and a Type I$_{\infty}$ algebra at past null infinity. In Schwarzschild-de Sitter, despite the fact that we introduce an observer, the quantum field observables are dressed to the perturbed areas of the black hole and cosmological horizons, and the algebra is the product of Type II$_{\infty}$ algebras on each horizon. In all cases, the von Neumann entropy for semiclassical states is given by the generalized entropy. Our results suggest that in all cases where there exists another "boundary structure" (e.g., an asymptotic boundary or another Killing horizon) the algebra of observables is Type II$_{\infty}$ and in the absence of such structures (e.g., de Sitter) the algebra is Type II$_{1}$.
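For readers unfamiliar with the two quantities being equated, the standard textbook definitions are sketched below. The symbols $A$ (horizon area), $G_N$ (Newton's constant), and $\rho_{\mathrm{out}}$ (state of the fields outside the horizon) are notation assumed here, in units with $\hbar = c = 1$; they are not taken from the abstract.

```latex
\begin{aligned}
S_{\mathrm{vN}}(\rho) &= -\operatorname{Tr}\left(\rho \log \rho\right), \\
S_{\mathrm{gen}} &= \frac{A}{4 G_N} + S_{\mathrm{vN}}(\rho_{\mathrm{out}}).
\end{aligned}
```

In a Type II algebra the trace is fixed only up to rescaling, so an identification of this kind is usually understood up to a state-independent additive constant.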

翻訳日:2023-10-25 23:23:45 公開日:2023-10-24

# 拡散モデルにおける信号リークバイアスの活用

Exploiting the Signal-Leak Bias in Diffusion Models ( http://arxiv.org/abs/2309.15842v2 )

ライセンス: Link先を確認

Martin Nicolas Everaert, Athanasios Fitsios, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, Radhakrishna Achanta

(参考訳) ほとんどの拡散モデルの推論パイプラインにはバイアスがある。このバイアスは、分布がノイズ分布から逸脱し、トレーニングと推論プロセスの間に不一致が生じる信号リークから生じる。この信号リークバイアスは、モデルが特定のスタイルに調整されると特に重要であり、サブ最適スタイルマッチングを引き起こす。最近の研究は、訓練中の信号漏れを回避しようとしている。代わりに、既存の拡散モデルにおけるこの信号漏れバイアスを利用して、生成した画像のさらなる制御を可能にする方法を示します。これにより、より輝度の異なる画像や、所望のスタイルや色に合致した画像を生成することができます。空間周波数及び画素領域における信号リークの分布をモデル化し、初期潜時における信号リークを含むことにより、追加のトレーニングを伴わずに予測結果に適合する画像を生成する。

There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.
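One way to read the training/inference discrepancy described above, written in standard DDPM notation: $\bar{\alpha}_T$ is the cumulative noise-schedule product at the last timestep, and $\tilde{x}$ is a sample from a modeled signal-leak distribution. Both symbols are assumptions for illustration and do not appear in the abstract.

```latex
\begin{aligned}
\text{training:} \quad & x_T = \sqrt{\bar{\alpha}_T}\, x_0 + \sqrt{1 - \bar{\alpha}_T}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\text{usual inference:} \quad & \hat{x}_T = \epsilon, \\
\text{inference with a signal leak:} \quad & \hat{x}_T = \sqrt{\bar{\alpha}_T}\, \tilde{x} + \sqrt{1 - \bar{\alpha}_T}\, \epsilon.
\end{aligned}
```

Whenever $\bar{\alpha}_T > 0$, the final training latent still carries a trace of $x_0$. Starting inference from pure noise omits that trace, which is the mismatch the abstract proposes to exploit.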

翻訳日:2023-10-25 23:23:06 公開日:2023-10-24

# 手術ビデオのための動的シーングラフ表現

Dynamic Scene Graph Representation for Surgical Video ( http://arxiv.org/abs/2309.14538v2 )

ライセンス: Link先を確認

Felix Holm, Ghazal Ghazaei, Tobias Czempiel, Ege Özsoy, Stefan Saur, Nassir Navab

(参考訳) 顕微鏡または内視鏡画像装置から撮影された手術ビデオは、豊富だが複雑な情報源であり、長時間にわたって使用される様々なツールや解剖学的構造を映し出している。重要なワークフロー情報を含み、多くの手術で一般的に記録されているにもかかわらず、自動的な外科的ワークフロー理解のための手術ビデオの利用は依然として限られている。本研究では,すべての解剖学的構造,ツール,およびそれらの相互作用をエンコードしながら,手術ビデオを表現するためのより包括的で,意味的に有意義かつ人間が読める方法としてシーングラフを利用する。ソリューションの影響を適切に評価するために、CaDISとCATARACTSデータセットのセマンティックセグメンテーションからシーングラフデータセットを作成する。本稿では,グラフ畳み込みネットワーク(GCNs)を用いてシーングラフを活用することで,外科的ワークフロー認識などの手術関連の下流タスクに競争力のある性能で対処できることを実証する。さらに, 臨床現場において重要なモデル決定の説明可能性とロバスト性に関して, 外科的シーングラフの有用性を示す。

Surgical videos captured from microscopic or endoscopic imaging devices are rich but complex sources of information, depicting different tools and anatomical structures used over an extended period of time. Despite containing crucial workflow information and being commonly recorded in many procedures, usage of surgical videos for automated surgical workflow understanding is still limited. In this work, we exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos while encoding all anatomical structures, tools, and their interactions. To properly evaluate the impact of our solutions, we create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets. We demonstrate that scene graphs can be leveraged through the use of graph convolutional networks (GCNs) to tackle surgical downstream tasks such as surgical workflow recognition with competitive performance. Moreover, we demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions, which are crucial in the clinical setting.
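To make the scene-graph idea concrete, here is a minimal sketch of how one video frame might be stored as nodes (tools and anatomy) and labeled edges (interactions). The class, node labels, and relation name are hypothetical and are not taken from the CaDIS or CATARACTS annotations.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """One frame: nodes are tools or anatomical structures, edges are interactions."""
    nodes: dict = field(default_factory=dict)  # node id -> class label
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, src, relation, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id):
        """Return (relation, other node id) pairs for every edge touching node_id."""
        out = []
        for src, rel, dst in self.edges:
            if src == node_id:
                out.append((rel, dst))
            elif dst == node_id:
                out.append((rel, src))
        return out


# Hypothetical frame: a forceps touching the cornea.
frame = SceneGraph()
frame.add_node("t1", "forceps")
frame.add_node("a1", "cornea")
frame.add_edge("t1", "touching", "a1")
print(frame.neighbors("a1"))
```

A graph convolutional layer would aggregate features over exactly these neighbor lists. A sequence of such per-frame graphs gives the dynamic representation named in the title.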

翻訳日:2023-10-25 23:22:53 公開日:2023-10-24

# 人間支援言語プランナーを用いた生涯ロボット学習

Lifelong Robot Learning with Human Assisted Language Planners ( http://arxiv.org/abs/2309.14321v2 )

ライセンス: Link先を確認

Meenal Parakh, Alisha Fong, Anthony Simeonov, Tao Chen, Abhishek Gupta, Pulkit Agrawal

(参考訳) 大規模言語モデル(LLM)は、高レベルの命令を実行可能な命令列に分解できるプランナーのように振る舞うことが示されている。しかし、現在のLLMベースのプランナーは、固定されたスキルセットでしか動作できない。我々はこの重大な限界を克服し、LLMベースのプランナーを用いて新たなスキルをクエリし、これらのスキルを剛体オブジェクト操作のためにデータ効率と時間効率のよい方法でロボットに教える方法を提案する。本システムは,新たに獲得したスキルを今後のタスクに再利用でき,オープンワールド学習と生涯学習の可能性を示す。シミュレーションと実世界における複数のタスクで提案フレームワークの評価を行った。ビデオは以下で公開している: https://sites.google.com/mit.edu/halp-robot-learning

Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/mit.edu/halp-robot-learning.

翻訳日:2023-10-25 23:22:06 公開日:2023-10-24

# GROVE: 証拠の森を用いた検索拡張型の複雑なストーリー生成フレームワーク

GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence ( http://arxiv.org/abs/2310.05388v2 )

ライセンス: Link先を確認

Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li

(参考訳) 条件付きストーリー生成は、人間と機械の相互作用、特に複雑なプロットによるストーリーの生成において重要である。大規模言語モデル(LLM)は、ストーリー生成を含む複数のNLPタスクでうまく機能するが、複雑かつ創造的なプロットを持つストーリーを生成することは困難である。既存の手法はしばしば、LLMを目標条件に合わせるための詳細なプロンプトに依存しており、それは意図せず生成されたストーリーの創造性を制限している。我々は、模範的な人間書きの物語からの情報を活用することで、より多様なプロットラインの生成が容易になると主張する。ストーリーの詳細を深く掘り下げることは、複雑で信頼できるプロットを構築するのに役立つ。本稿では,ストーリーの複雑さを高めるために,証拠の森を用いた検索拡張ストーリー生成フレームワーク GROVE (a retrieval-auGmented stoRy generation framework with a fOrest of eVidEnce) を提案する。我々は,目標条件の検索レポジトリを構築し,LLMをプロンプトするための少数ショット例を生成する。さらに,証拠の森を抽出する「asking-why」プロンプトスキームを設計し,生成したストーリーで発生しうる曖昧さを補う。この反復的なプロセスはストーリーの背景を明らかにする。最後に,証拠の森から最も適切な証拠の連鎖を選択し,生成したストーリーに統合することで,物語の複雑さと信頼性を高める。実験結果と多数の事例が本手法の有効性を検証した。

Conditional story generation is significant in human-machine interaction, particularly in producing stories with complex plots. While large language models (LLMs) perform well on multiple NLP tasks, including story generation, it is challenging to generate stories with both complex and creative plots. Existing methods often rely on detailed prompts to guide LLMs to meet target conditions, which inadvertently restrict the creative potential of the generated stories. We argue that leveraging information from exemplary human-written stories facilitates generating more diverse plotlines. Delving deeper into story details helps build complex and credible plots. In this paper, we propose a retrieval-auGmented stoRy generation framework with a fOrest of eVidEnce (GROVE) to enhance stories' complexity. We build a retrieval repository for target conditions to produce few-shot examples to prompt LLMs. Additionally, we design an "asking-why" prompting scheme that extracts a forest of evidence, providing compensation for the ambiguities that may occur in the generated story. This iterative process uncovers underlying story backgrounds. Finally, we select the most fitting chains of evidence from the evidence forest and integrate them into the generated story, thereby enhancing the narrative's complexity and credibility. Experimental results and numerous examples verify the effectiveness of our method.

翻訳日:2023-10-25 23:15:21 公開日:2023-10-24

# ラテント合成による効率的なテキストデータ利用によるエンドツーエンド音声処理の改善

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis ( http://arxiv.org/abs/2310.05374v3 )

ライセンス: Link先を確認

Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

(参考訳) 高性能なエンドツーエンド(E2E)音声処理モデルを訓練するには、特にデータ中心人工知能の時代において、大量のラベル付き音声データが必要となる。しかし、ラベル付き音声データは通常、テキストデータに比べて希少で、収集に費用がかかる。 E2E音声処理モデルのための効率的なテキストデータ利用フレームワークLatent Synthesis(LaSyn)を提案する。我々は、テキストデータを事前訓練された音声モデルの中間潜在表現に変換するために、潜在合成器を訓練する。テキストデータのこれらの擬似音響表現は、モデルトレーニングのための音響データを増強する。我々は,低リソース自動音声認識(ASR)と音声言語理解(SLU)タスクにおけるLaSynの評価を行った。 ASRでは、LibriSpeech train-clean-100で訓練されたE2Eベースラインを改善し、異なるテストセットで単語誤り率を相対的に22.3%以上削減した。 SLUでは,SLURP上での意図分類精度を絶対4.1%,スロット充填SLU-F1を絶対3.8%,STOP上でのEMとEM-Treeの精度をそれぞれ絶対4.49%,2.25%,E2Eベースラインから改善した。より少ないパラメータで、LaSynの結果は公表されている最先端の研究と競合する。これらの結果は,拡張されたトレーニングデータの品質を示している。

Training a high-performance end-to-end (E2E) speech processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive to collect than textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. With fewer parameters, the results of LaSyn are competitive with published state-of-the-art works. The results demonstrate the quality of the augmented training data.
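The abstract mixes relative reductions (word error rate) with absolute gains (accuracy, F1), which are easy to confuse. A small sketch of the two calculations follows; the function names and the baseline numbers are illustrative assumptions, not values reported in the paper.

```python
def relative_reduction(baseline, improved):
    """Fraction by which an error metric dropped relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def absolute_gain(baseline, improved):
    """Difference in percentage points for a score such as accuracy or F1."""
    return improved - baseline


# Hypothetical numbers: a WER falling from 10.0% to 7.77% is a 22.3% relative
# reduction, although the absolute change is only 2.23 percentage points.
print(round(relative_reduction(10.0, 7.77) * 100, 1))
print(round(absolute_gain(80.0, 84.1), 1))
```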

翻訳日:2023-10-25 23:14:52 公開日:2023-10-24

# Counter Turing Test CT^2: AI生成テキスト検出は、あなたが考えるほど簡単ではない -- AI検出可能性指数の導入

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index ( http://arxiv.org/abs/2310.05030v2 )

ライセンス: Link先を確認

Megha Chakraborty, S.M Towhidul Islam Tonmoy, S M Mehedi Zaman, Krish Sharma, Niyar R Barman, Chandan Gupta, Shreya Gautam, Tanay Kumar, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das

(参考訳) 多作なChatGPTの台頭に伴い、AI生成テキストのリスクと影響が急増している。 AI生成物の著作者帰属に関する避けられない問題に対処するため、米国著作権庁は「作品の伝統的な著作要素が機械によって生産された場合、その作品は人間の著作性を欠き、著作権庁はそれを登録しない」という声明を発表した。さらに、米国とEU政府は最近、AIの規制フレームワークに関する最初の提案を起草した。生成AIへのこうした注目の集中を受けて、AI生成テキスト検出(AGTD)は研究において直ちに注目を集めるトピックとなり、いくつかの初期手法が提案されたが、間もなく検出を回避する技術も出現した。本稿では,既存のAGTD手法のロバスト性を総合的に評価することを目的とした手法群からなるベンチマークであるCounter Turing Test (CT^2)を紹介する。我々の実験結果は、調査対象のAGTD手法が脆弱であることを明確に示している。 AI開発を規制するための政策決定に関する広範な議論の中で、LLMが生成するコンテンツの検出可能性を評価することが最も重要である。そこで本研究では,検出可能性のレベルに応じたLLMの評価とランク付けを容易にする定量的スペクトルを確立するために,AI検出可能性指数(AI Detectability Index, ADI)を提案する。われわれは15個の現代LLMを徹底的に検討し、より大きなLLMはADIが高い傾向にあり、小さいLLMに比べて検出しにくいことを実証的に示した。 ADIはより広範なNLPコミュニティのツールとして大きな価値があり、AI関連の政策決定においてルーブリックとして機能する可能性があると強く信じている。

With the rise of prolific ChatGPT, the risk and consequences of AI-generated text have increased alarmingly. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that "If a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it". Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. This paper introduces the Counter Turing Test (CT^2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a higher ADI, indicating they are less detectable compared to smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.

翻訳日:2023-10-25 23:14:27 公開日:2023-10-24

# ブラックホール蒸発のユニタリな(半)因果的量子回路表現

Unitary (semi)causal quantum-circuit representation of black hole evaporation ( http://arxiv.org/abs/2310.04744v3 )

ライセンス: Link先を確認

Bogusław Broda

(参考訳) 事象の地平線によって課される因果性(半因果性)を尊重するブラックホールのユニタリ進化(蒸発)の一般的な構造が導出され、量子回路の言語で表される。対応するエンタングルメントエントロピーの進化とエントロピー曲線に対する帰結が決定されている。一般的なスキームの例として、テンソル積モデルと制御非積モデルという2種類の量子ビットのトイモデルが議論されている。

A general structure of unitary evolution (evaporation) of the black hole, respecting causality imposed by the event horizon (semicausality), has been derived and presented in the language of quantum circuits. The resulting consequences for the evolution of the corresponding entanglement entropy and the entropy curve have been determined. As an illustration of the general scheme, two families of qubit toy models have been discussed: tensor product models and controlled non-product models.

翻訳日:2023-10-25 23:13:56 公開日:2023-10-24

# 中国語大言語モデルにおける幻覚評価

Evaluating Hallucinations in Chinese Large Language Models ( http://arxiv.org/abs/2310.03368v3 )

ライセンス: Link先を確認

Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu

(参考訳) 本稿では,中国語大規模言語モデルにおける幻覚現象を測定するために,HalluQA(Chinese Hallucination Question-Answering)というベンチマークを作成した。 HalluQAには450の綿密に設計された敵対的質問が含まれており、複数のドメインにまたがっており、中国の歴史的文化、慣習、社会現象を考慮に入れている。 HalluQAの構築中,模倣的虚偽と事実誤りの2種類の幻覚を考察し,GLM-130B と ChatGPT に基づく敵対的サンプルを構築した。評価のために,モデル出力が幻覚的かどうかを判定するために,GPT-4を用いた自動評価手法を設計する。 ERNIE-Bot、Baichuan2、ChatGLM、Qwen、SparkDeskなど、24の大規模言語モデルに関する広範な実験を行った。 24モデル中、18モデルは非幻覚率が50%未満であった。これはHalluQAが非常に難しいことを示している。様々な種類のモデルにおける幻覚の主なタイプとその原因を分析した。さらに,様々な種類のモデルに対してどの種類の幻覚を優先すべきかについて議論する。

In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, and SparkDesk, among others. Out of the 24 models, 18 achieved non-hallucination rates lower than 50%. This indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models.

翻訳日:2023-10-25 23:13:18 公開日:2023-10-24

# 自己教師型エンコーダ・デコーダ音声モデルのプロンプティングと適応調整

Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model ( http://arxiv.org/abs/2310.02971v2 )

ライセンス: Link先を確認

Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-Wen Li, Hung-yi Lee

(参考訳) プロンプティングとアダプタチューニングは、ファインチューニング(FT)手法の効率的な代替手段として登場した。しかし、既存の音声プロンプトの研究は分類タスクに焦点を当てており、より複雑なシーケンス生成タスクでは失敗していた。加えて、アダプタチューニングは主にエンコーダのみの自己教師型モデルに焦点をあてて適用されている。実験の結果,自己教師付きエンコーダデコーダモデルWav2Seqへのプロンプトは,シーケンス生成タスクにおいて従来の研究を上回ることがわかった。 ASRでは単語誤り率が相対的に53%改善し,スロットフィリングではF1スコアが27%改善した。さらに、プロンプトは低リソースシナリオにおいてFT法と競合する。さらに,言語間ASRにおけるWav2Seqのプロンプトとアダプタチューニングの伝達可能性を示す。訓練可能なパラメータが限られている場合、プロンプトとアダプタチューニングは7つの言語で従来のFTより一貫して優れている。特に低リソースのシナリオでは、プロンプトがアダプタチューニングを一貫して上回る。

Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and 27% in F1 score for slot filling. Additionally, prompting competes with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When limited trainable parameters are involved, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning.

翻訳日:2023-10-25 23:12:32 公開日:2023-10-24

# 監視量子ビットにおける局在、フラクタル性、エルゴード性

Localization, fractality, and ergodicity in a monitored qubit ( http://arxiv.org/abs/2310.01997v2 )

ライセンス: Link先を確認

Paul Pöpperl, Igor V. Gornyi, David B. Saakian, Oleg M. Yevtushenko

(参考訳) 本研究では,アンシラを用いた反復測定を受ける単一の二準位系 (量子ビット) の統計的特性を調べる。このセットアップは、システムのユニタリダイナミクスと量子測定によって導入される非ユニタリな確率性の間の複雑な相互作用を探索するための基本的な最小モデルであり、この相互作用は測定誘起相転移の現象の中心である。この「トイモデル」は驚くほどリッチなダイナミクスを持ち、それが長時間極限における量子ビットの量子状態の分布関数に現れることを示した。我々はアンダーソン局在の現象との魅力的な類似性を発見したが、それは異なる基礎的なメカニズムによって支配されている。具体的には、監視された量子ビットの状態分布関数は、ブロッホ球面上の1つの角度でパラメータ化され、アンダーソン転移の理論でよく知られた様々な種類の振る舞いを示し、完全な局在からほぼ一様な非局在まで、この2つの極限の間にフラクタル性が生じる。各種特殊ケースの解析解と2つの相補的な数値的アプローチを組み合わせることにより、モデルの「相図」を描く構造を包括的に理解する。我々は、創発するレジームを分類・定量化し、監視された量子ビットの2つの異なる相、エルゴード相と非エルゴード相を同定する。これら2つの相の間の転移が主な発見である。

We study the statistical properties of a single two-level system (qubit) subject to repetitive ancilla-based measurements. This setup is a fundamental minimal model for exploring the intricate interplay between the unitary dynamics of the system and the nonunitary stochasticity introduced by quantum measurements, which is central to the phenomenon of measurement-induced phase transitions. We demonstrate that this "toy model" harbors remarkably rich dynamics, manifesting in the distribution function of the qubit's quantum states in the long-time limit. We uncover a compelling analogy with the phenomenon of Anderson localization, albeit governed by distinct underlying mechanisms. Specifically, the state distribution function of the monitored qubit, parameterized by a single angle on the Bloch sphere, exhibits diverse types of behavior familiar from the theory of Anderson transitions, spanning from complete localization to almost uniform delocalization, with fractality occurring between the two limits. By combining analytical solutions for various special cases with two complementary numerical approaches, we achieve a comprehensive understanding of the structure delineating the "phase diagram" of the model. We categorize and quantify the emergent regimes and identify two distinct phases of the monitored qubit: ergodic and nonergodic. The transition between these two phases is our main finding.

翻訳日:2023-10-25 23:11:59 公開日:2023-10-24

# TRIGO:生成言語モデルのための形式的数学的証明のベンチマーク

TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models ( http://arxiv.org/abs/2310.10180v2 )

ライセンス: Link先を確認

Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, Qun Liu

(参考訳) 自動定理証明(ATP)は、最近成功した生成言語モデルの推論能力を探究する上で魅力的な領域となっている。しかし、現在のATPベンチマークは主にシンボリック推論に焦点を当てているが、複素数組合せの推論を理解することは滅多にない。本研究では, ATP ベンチマーク TRIGO を提案する。このベンチマークは, ステップバイステップの証明で三角法式を縮小するモデルを必要とするだけでなく, 論理式に対する生成的 LM の推論能力とその操作, グループ化, 因子数項の操作能力を評価する。我々は、Webから三角法式とその縮小形式を収集し、手作業で単純化プロセスを注釈化し、それをリーン形式言語システムに翻訳する。その後、アノテーション付きサンプルからサンプルを自動生成してデータセットを拡張する。さらに,Lean-Gymに基づく自動生成装置を開発し,モデルの一般化能力を徹底的に分析するために,様々な困難と分布のデータセット分割を作成する。提案するTRIGOは,多量のオープンソース形式定理証明言語データに基づいて事前学習された GPT-4 を含む先進的生成型LMの新たな課題を示すとともに,形式的および数学的推論において,生成型LMの能力を研究するための新しいツールを提供する。

Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models. However, current ATP benchmarks mainly focus on symbolic inference and rarely involve the understanding of complex number combination reasoning. In this work, we propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas and its capability to manipulate, group, and factor number terms. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We then automatically generate additional examples from the annotated samples to expand the dataset. Furthermore, we develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability. Our extensive experiments show that the proposed TRIGO poses a new challenge for advanced generative LMs, including GPT-4, which is pre-trained on a considerable amount of open-source formal theorem-proving language data, and provides a new tool to study generative LMs' ability in both formal and mathematical reasoning.

翻訳日:2023-10-25 23:06:28 公開日:2023-10-24

# AdaptSSR: Augmentation-Adaptive Self-Supervised Rankingによる事前学習ユーザモデル

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking ( http://arxiv.org/abs/2310.09706v2 )

ライセンス: Link先を確認

Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu

(参考訳) ユーザの特性や関心を捉えることを目的としたユーザモデリングは、タスク固有のラベル付きデータに大きく依存しており、データのスパーシティの問題に苦しんでいる。最近のいくつかの研究は、対照的な学習タスクで大量のユーザー行動シーケンスでユーザーモデルを事前学習することでこの問題に取り組みました。一般に、これらの手法は、データ拡張によって構築された同一の行動列の異なるビューを意味的に一貫した、すなわち、ユーザの類似した特性や興味を反映し、特徴空間におけるそれらの合意を最大化する。しかし,ユーザ行動の多様さや騒音のため,既存の拡張手法はユーザの特徴を損なったり,ノイズを生じさせる傾向がある。したがって、ユーザモデルに拡張ビュー間の類似性を直接最大化させると、負の転送が発生する可能性がある。そこで本研究では,ユーザモデルを事前学習しながら,拡張ビュー間の意味的一貫性の要件を緩和する,拡張適応型自己教師付きランキング (adaptssr) という新しいpretextタスクでコントラスト学習タスクを置き換えることを提案する。具体的には,ユーザモデルをトレーニングして,暗黙的に拡張されたビューと明示的な拡張されたビュー,他のユーザからのビューの類似性をキャプチャする,複数対のランキング損失を採用する。さらに,モデルトレーニングを容易にするために,バッチ内ハードネガティブサンプリング戦略も採用した。さらに,異なる行動系列に対するデータ拡張の影響を別々に考慮し,拡張ビュー間の推定類似度に基づいて,各サンプルに適用される類似度順序制約を自動的に調整する拡張適応融合機構を設計する。 6つの下流タスクを持つパブリックデータセットと産業データセットの大規模な実験は、AdaptSSRの有効性を検証する。

User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume that different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflect similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.
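
The multiple pairwise ranking loss described in the abstract can be illustrated with a minimal sketch. It assumes the intended order is sim(implicit view) >= sim(explicit view) >= sim(other user's view) and uses a logistic (BPR-style) pairwise term; the function names and the weight `lam`, which stands in for the augmentation-adaptive fusion coefficient, are hypothetical and not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(higher: float, lower: float) -> float:
    """-log(sigmoid(higher - lower)); small when `higher` exceeds `lower`."""
    return softplus(lower - higher)


def multiple_pairwise_ranking_loss(sim_implicit: float,
                                   sim_explicit: float,
                                   sim_other: float,
                                   lam: float = 1.0) -> float:
    """Sum of two pairwise terms enforcing implicit >= explicit >= other.

    `lam` is a hypothetical per-sample weight on the first constraint;
    in the paper this role is played by a learned fusion mechanism.
    """
    return (lam * pairwise_ranking_loss(sim_implicit, sim_explicit)
            + pairwise_ranking_loss(sim_explicit, sim_other))


# Similarities that respect the assumed order give a lower loss
# than similarities that violate it.
loss_ordered = multiple_pairwise_ranking_loss(0.9, 0.5, 0.1)
loss_violated = multiple_pairwise_ranking_loss(0.1, 0.5, 0.9)
```

Setting `lam` close to zero relaxes the constraint between the two augmented views, which is one simple way to picture how a per-sample adjustment could soften the consistency requirement.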

翻訳日:2023-10-25 23:06:03 公開日:2023-10-24

# Point-DynRF:単眼ビデオからの点ベース動的放射場

Point-DynRF: Point-based Dynamic Radiance Fields from a Monocular Video ( http://arxiv.org/abs/2310.09647v2 )

ライセンス: Link先を確認

Byeongjun Park, Changick Kim

(参考訳) 動的放射場は単眼ビデオから新しいビューを生成するための有望なアプローチとして現れてきた。しかし, 従来の手法では, 隣接する入力フレーム間のみの動的放射場に対する幾何的整合性を強制するため, 大域的なシーン形状を表現することが難しく, 入力カメラ軌道から時空間的に離れた視点で劣化する。この問題を解決するために、我々は、大域的幾何学情報とボリュームレンダリングプロセスがそれぞれニューラルポイントクラウドと動的放射場によってトレーニングされる新しいフレームワークである点ベース動的放射場(Point-DynRF)を導入する。具体的には,幾何学的プロキシから直接ニューラルポイントクラウドを再構成し,提案する損失を用いて放射場と幾何学的プロキシの両方を最適化し,相互補完を可能にした。提案手法の有効性をNVIDIA Dynamic Scenes Datasetとカジュアルに撮影された単眼ビデオクリップを用いて検証した。

Dynamic radiance fields have emerged as a promising approach for generating novel views from a monocular video. However, previous methods enforce geometric consistency on dynamic radiance fields only between adjacent input frames, which makes it difficult to represent the global scene geometry and causes degeneration at viewpoints that are spatio-temporally distant from the input camera trajectory. To solve this problem, we introduce point-based dynamic radiance fields (Point-DynRF), a novel framework where the global geometric information and the volume rendering process are trained by neural point clouds and dynamic radiance fields, respectively. Specifically, we reconstruct neural point clouds directly from geometric proxies and optimize both radiance fields and the geometric proxies using our proposed losses, allowing them to complement each other. We validate the effectiveness of our method with experiments on the NVIDIA Dynamic Scenes Dataset and several casually captured monocular video clips.

翻訳日:2023-10-25 23:05:32 公開日:2023-10-24

# explore-instruct: 能動的探索によるドメイン固有の命令カバレッジの向上

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration ( http://arxiv.org/abs/2310.09168v3 )

ライセンス: Link先を確認

Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

(参考訳) インストラクションチューニングは、拡張された多様性によって大幅に最適化され、より広い範囲のタスクを扱うことができるモデルとなる。しかし、そのようなチューニングに使用される既存のデータは、個々のドメインの不十分なカバレッジを示すことが多く、これらの領域内のニュアンスな理解と相互作用の範囲を制限する。そこで本研究では,Large Language Models (LLMs) による積極的な探索を通じて,ドメイン固有の命令チューニングに使用されるデータカバレッジを向上させる手法であるExplore-Instructを提案する。 Explore-Instructは、汎用的なドメインユースケースに基づいて、多種多様なドメイン中心の命令チューニングデータを得るための探索アルゴリズムを実装することで、さまざまなバリエーションや可能性を探究する。データ中心分析は、ドメイン固有の命令カバレッジを改善するために提案手法の有効性を検証する。さらに,本モデルの性能は,ドメイン固有のデータ拡張など,複数のベースラインにまたがる大幅な向上を示す。本研究は,特にドメイン固有の文脈において,命令カバレッジを改善するための有望な機会を提供し,適応可能な言語モデルの開発を促進する。私たちのコード、モデルウェイト、データは、 https://github.com/fanqiwan/Explore-Instruct で公開されています。

Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.

翻訳日:2023-10-25 23:05:15 公開日:2023-10-24

# PuoBERTa:セツワナのキュレート言語モデルの訓練と評価

PuoBERTa: Training and evaluation of a curated language model for Setswana ( http://arxiv.org/abs/2310.09141v2 )

ライセンス: Link先を確認

Vukosi Marivate, Moseli Mots'Oehli, Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai

(参考訳) 自然言語処理(NLP)は、英語のような豊富なリソース言語では大きな進歩を遂げているが、Setswanaのような低リソース言語では遅れを取っている。本稿では,Setswana用に特別に訓練されたカスタマイズされたマスキング言語モデルPuoBERTaを提示することで,このギャップに対処する。我々は,PuoBERTaのトレーニングのための高品質なコーパスを生成するために,多種多様なモノリンガルテキストの収集,キュレート,準備を行った方法を述べる。 Setswanaのモノリンガルリソース作成に関するこれまでの取り組みを踏まえ,品詞(POS)タグ付け,固有表現認識(NER),ニュース分類など,いくつかのNLPタスクでPuoBERTaを評価した。さらに、新しいセツワナニュース分類データセットを導入し、PuoBERTaを使った初期ベンチマークを提供した。我々の研究は、セツワナのような未調査言語に対するNLP能力の育成におけるPuoBERTaの有効性を実証し、今後の研究方向性の道を開く。

Natural language processing (NLP) has made significant progress for well-resourced languages such as English but lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpus for PuoBERTa's training. Building upon previous efforts in creating monolingual resources for Setswana, we evaluated PuoBERTa across several NLP tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation. Additionally, we introduced a new Setswana news categorisation dataset and provided the initial benchmarks using PuoBERTa. Our work demonstrates the efficacy of PuoBERTa in fostering NLP capabilities for understudied languages like Setswana and paves the way for future research directions.

翻訳日:2023-10-25 23:04:53 公開日:2023-10-24

# 機械学習に基づく地球科学システムのモデリングのための質量保存型パーセプトロン

A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems ( http://arxiv.org/abs/2310.08644v2 )

ライセンス: Link先を確認

Yuan-Heng Wang, Hoshin V. Gupta

(参考訳) 地学システムの時系列進化を予測する物理概念(PC)モデルの構築に何十年も取り組んできたが、最近の研究は機械学習(ML)ベースのGated Recurrent Neural Network技術が、はるかに正確なモデルの開発に利用できることを示している。しかし,MLモデルから身体的理解を抽出することの難しさは,システム構造や機能に関する科学的知識の強化に有用である。本稿では,PCベースとMLベースのモデリングアプローチのギャップを埋める手段として,物理的に解釈可能なMass Conserving Perceptron(MCP)を提案する。 MCPは、PCモデルとGRNNの両方の基盤となる有向グラフ構造間の固有同型を利用して、物理的プロセスの質量保存性を明確に表現し、それらのプロセスの機能的性質を、既製のML技術を用いて利用可能なデータから直接(解釈可能な方法で)学習できるようにする。概念実証として,mcpの機能的表現力(能力)を調査し,リーフ川流域の降雨流出(rr)ダイナミクスを同時表現する能力について検討し,科学的仮説検証に有用性を示す。結論として,この概念を拡張して,地学システムを通しての質量エネルギー情報流の結合特性のMLに基づく物理概念表現を可能にする。

Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
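
The idea of a recurrent unit that conserves mass by construction can be sketched as follows. This is a simplified illustration and not the authors' formulation: the gate parameters `w_out` and `w_loss` are hypothetical constants (in a learned model they would depend on inputs and state), and the three gates are normalized with a softmax so that the fractions released, lost, and retained always sum to one.

```python
import math


def mcp_step(state: float, inflow: float, w_out: float, w_loss: float):
    """One time step of a single mass-conserving storage node.

    The stored mass is split into outflow, loss, and retained parts whose
    fractions sum to 1, so no mass is created or destroyed.
    """
    e_out, e_loss, e_keep = math.exp(w_out), math.exp(w_loss), 1.0
    total = e_out + e_loss + e_keep
    g_out, g_loss = e_out / total, e_loss / total   # output and loss gates
    outflow = g_out * state                         # e.g. streamflow
    loss = g_loss * state                           # e.g. evaporative loss
    new_state = state - outflow - loss + inflow     # mass balance update
    return new_state, outflow, loss


def simulate(inflows, w_out=0.0, w_loss=-1.0, initial_state=0.0):
    """Run the node over a sequence of inflows and record the fluxes."""
    state = initial_state
    outflows, losses = [], []
    for u in inflows:
        state, o, l = mcp_step(state, u, w_out, w_loss)
        outflows.append(o)
        losses.append(l)
    return state, outflows, losses


inflows = [5.0, 0.0, 2.0, 0.0, 0.0]
final_state, outflows, losses = simulate(inflows)
# Everything that entered is either still stored, released, or lost.
balance_error = sum(inflows) - (final_state + sum(outflows) + sum(losses))
```

Because the update is an explicit mass balance, the gate values can be read directly as physical fluxes, which is the kind of interpretability the abstract emphasizes.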

翻訳日:2023-10-25 23:04:35 公開日:2023-10-24

# 一般化リセット過程を考慮したマルコフ開量子力学における普遍的および非普遍的確率則

Universal and nonuniversal probability laws in Markovian open quantum dynamics subject to generalized reset processes ( http://arxiv.org/abs/2310.06981v2 )

ライセンス: Link先を確認

Federico Carollo, Igor Lesanovsky, Juan P. Garrahan

(参考訳) 我々は、マルコフ開量子系の量子ジャンプ軌道を、初期配置への状態の確率的リセットの対象となるものとする。リセットイベントは、量子軌道を連続した時間間隔に分割し、各間隔内で観測可能な軌道の値から確率変数のシーケンスを定義する。量子状態の関数に関連する観測可能量に対して、列内の特定の順序の確率が普遍法則に従うことを示す。この法則は、選択された可観測性に依存しず、ポアソニアンリセット過程の場合、ダイナミクスの詳細にも依存しない。量子ジャンプの数え上げに関連する可観測性を考慮すると、一般の確率は普遍的な性質を失う。普遍性は、同じシーケンスで等しい結果が観測される確率が、弱いリセット率の限界で達成できるような、消滅的に小さい場合にのみ回復される。その結果,古典的確率過程に関する従来の知見 [N. R. Smith et al., EPL 142, 51002 (2023)] を量子領域と状態依存リセット過程に拡張し、普遍確率法則の出現に関連する側面に光を当てている。

We consider quantum jump trajectories of Markovian open quantum systems subject to stochastic-in-time resets of their state to an initial configuration. The reset events provide a partitioning of quantum trajectories into consecutive time intervals, defining sequences of random variables from the values of a trajectory observable within each of the intervals. For observables related to functions of the quantum state, we show that the probability of certain orderings in the sequences obeys a universal law. This law does not depend on the chosen observable and, in the case of Poissonian reset processes, not even on the details of the dynamics. When considering (discrete) observables associated with the counting of quantum jumps, the probabilities in general lose their universal character. Universality is only recovered in cases when the probability of observing equal outcomes in the same sequence is vanishingly small, which we can achieve in a weak reset rate limit. Our results extend previous findings on classical stochastic processes [N. R. Smith et al., EPL 142, 51002 (2023)] to the quantum domain and to state-dependent reset processes, shedding light on relevant aspects for the emergence of universal probability laws.

翻訳日:2023-10-25 23:04:11 公開日:2023-10-24

# パッセージレベルの幻覚検出のための新しいベンチマークと逆検証法

A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection ( http://arxiv.org/abs/2310.06498v2 )

ライセンス: Link先を確認

Shiping Yang, Renliang Sun, Xiaojun Wan

(参考訳) 大きな言語モデル(LLM)は、現実世界のシナリオで人間と効果的に協力する能力を示している。しかし、LCMは幻覚、すなわち不正確なテキストと未検証情報を生成する傾向があり、ミッションクリティカルなタスクに配備すると大きなダメージを与える可能性がある。本稿では,ゼロリソース方式で事実誤りを自動的に検出する逆検証に基づく自己チェック手法を提案する。そこで本研究では,ChatGPTが生成し,アノテーションを付加した幻覚検出ベンチマークPHDを構築した。ゼロリソース幻覚検出の以前の研究とは対照的に,本手法とベンチマークは文レベルではなくパスレベル検出に集中している。提案手法と既存のゼロリソース検出手法を2つのデータセット上で実証的に評価した。実験の結果,提案手法はトークンのコストが少なく,時間も少ないが,ベースラインをかなり上回ることがわかった。さらに,LLMが捕捉できなかった幻覚症例を手動で解析し,ゼロリソース手法の共有限界を明らかにした。

Large Language Models (LLMs) have shown their ability to collaborate effectively with humans in real-world scenarios. However, LLMs are apt to generate hallucinations, i.e., make up incorrect text and unverified information, which can cause significant damage when deployed for mission-critical tasks. In this paper, we propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion. To facilitate future studies and assess different methods, we construct a hallucination detection benchmark named PHD, which is generated by ChatGPT and annotated by human annotators. In contrast to previous studies of zero-resource hallucination detection, our method and benchmark concentrate on passage-level detection instead of sentence-level. We empirically evaluate our method and existing zero-resource detection methods on two datasets. The experimental results demonstrate that the proposed method considerably outperforms the baselines while costing fewer tokens and less time. Furthermore, we manually analyze some hallucination cases that the LLM failed to capture, revealing the shared limitation of zero-resource methods.

翻訳日:2023-10-25 23:03:47 公開日:2023-10-24

# 近接認識表現によるメモリ効率の高い位置推薦

Memory efficient location recommendation through proximity-aware representation ( http://arxiv.org/abs/2310.06484v2 )

ライセンス: Link先を確認

Xuan Luo, Mingqing Huang, Rui Lv, Hui Zhao

(参考訳) シーケンシャルな位置推薦は、ユーザー体験を高め、ビジネスに利益をもたらし、行政を補助する現代の生活において大きな役割を果たす。位置推薦手法は,レコメンデーションシステムの開発によって大きく発展してきたが,地理的情報の利用は限定的であり,データの疎性に対処する課題も続いている。そこで本研究では,自己認識ネットワークアーキテクチャ上に構築された逐次レコメンデーション(PASR:Sequential Recommendation)の領域表現について述べる。本稿では,重要サンプリングを用いた新たな損失関数を用いて,最適化時の情報的負のサンプルを強調する。さらに、PASRは、自己アテンションに基づく地理エンコーダを、各GPSポイントにおける階層グリッドと近接グリッドに利用することにより、地理情報の統合を強化する。さらに地理情報を活用するため,近接認識型負のサンプリング器を用いて負のサンプルの品質を向上させる。 3つの実世界位置ベースソーシャルネットワーキング(LBSN)データセットを用いて評価を行い、PASRが最先端のシーケンシャルな位置推薦方法を上回ることを示した。

Sequential location recommendation plays a major role in modern life: it can enhance user experience, bring more profit to businesses, and assist in government administration. Although methods for location recommendation have evolved significantly thanks to the development of recommendation systems, there is still limited utilization of geographic information, along with the ongoing challenge of addressing data sparsity. In response, we introduce a Proximity-aware based region representation for Sequential Recommendation (PASR for short), built upon the Self-Attention Network architecture. We tackle the sparsity issue through a novel loss function employing importance sampling, which emphasizes informative negative samples during optimization. Moreover, PASR enhances the integration of geographic information by applying a self-attention-based geography encoder to the hierarchical grid and proximity grid at each GPS point. To further leverage geographic information, we utilize proximity-aware negative samplers to enhance the quality of negative samples. We conducted evaluations using three real-world Location-Based Social Networking (LBSN) datasets, demonstrating that PASR surpasses state-of-the-art sequential location recommendation methods.

翻訳日:2023-10-25 23:03:28 公開日:2023-10-24

# AdaFuse:空間/周波数交差注意に基づく適応医療画像融合

AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention ( http://arxiv.org/abs/2310.05462v2 )

ライセンス: Link先を確認

Xianming Gu, Lihui Wang, Zeyu Deng, Ying Cao, Xingyu Huang and Yue-min Zhu

(参考訳) マルチモーダル画像の融合は, 多モーダル画像の相補的情報を単一の画像にマージできるため, 正確な臨床診断と手術ナビゲーションに不可欠である。融合画像の品質は、抽出された単一モダリティの特徴と、マルチモーダル情報に対する融合規則に依存する。既存の深層学習に基づく融合法では各モードの意味的特徴を完全に活用することができ、各モードの有効低周波情報と高周波情報を識別することができず、適応的に融合することができない。本稿では,フーリエ変換に基づく周波数誘導注意機構を用いてマルチモーダル画像情報を適応的に融合するadafuseを提案する。具体的には,鍵と問合せ値の交換により空間領域と周波数領域の2つのモダリティの特徴を適応的に融合し,空間と周波数の特徴間のクロスアテンションスコアを算出し,空間と周波数の融合をさらに導くクロスアテンション融合(caf)ブロックを提案する。 cafブロックは、異なるモダリティの高周波特性を高め、融合画像の詳細を保持することができる。さらに,低周波情報と高周波情報の両方を保持するために,構造損失とコンテンツ損失からなる新しい損失関数を設計する。いくつかのデータセットにおける広範囲な比較実験により、提案手法が視覚品質と定量的指標の両方において最先端の手法よりも優れていることが示されている。アブレーション実験は, 提案した損失・融合戦略の有効性も検証した。

Multi-modal medical image fusion is essential for precise clinical diagnosis and surgical navigation since it can merge the complementary information in multiple modalities into a single image. The quality of the fused image depends on the extracted single-modality features as well as the fusion rules for multi-modal information. Although existing deep learning-based fusion methods can fully exploit the semantic features of each modality, they cannot distinguish the effective low- and high-frequency information of each modality and fuse them adaptively. To address this issue, we propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism based on the Fourier transform. Specifically, we propose the cross-attention fusion (CAF) block, which adaptively fuses features of two modalities in the spatial and frequency domains by exchanging key and query values, and then calculates the cross-attention scores between the spatial and frequency features to further guide the spatial-frequential information fusion. The CAF block enhances the high-frequency features of the different modalities so that the details in the fused images can be retained. Moreover, we design a novel loss function composed of structure loss and content loss to preserve both low- and high-frequency information. Extensive comparison experiments on several datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. The ablation experiments also validate the effectiveness of the proposed loss and fusion strategy.

翻訳日:2023-10-25 23:03:08 公開日:2023-10-24

# ImageArg-2023:マルチモーダル・引数マイニングにおける最初の共有タスクの概要

Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining ( http://arxiv.org/abs/2310.12172v2 )

ライセンス: Link先を確認

Zhexiong Liu, Mohamed Elaraby, Yang Zhong, Diane Litman

(参考訳) 本稿では,第10回Argument Mining on EMNLP 2023ワークショップと共同で,最初のマルチモーダルなArgument Mining共有タスクであるImageArg共有タスクの概要を紹介する。共有タスクは,(1)Subtask-A:Argument Stance Classification,(2)Subtask-B: Image Persuasiveness Classificationの2つのサブタスクからなる。前者は、物議を醸す話題(銃規制や中絶など)に向けて、画像とテキストを含むツイートのスタンスを決定する。後者は、画像がツイートテキストをより説得力のあるものにするかどうかを決定する。共有タスクは6カ国9チームからSubtask-A申請31件、Subtask-B申請21件を受け取った。 subtask-a の上位は 0.8647 の f1-score を達成し、subtask-b の上位は 0.5561 の f1-score を達成した。

This paper presents an overview of the ImageArg shared task, the first multimodal Argument Mining shared task co-located with the 10th Workshop on Argument Mining at EMNLP 2023. The shared task comprises two classification subtasks - (1) Subtask-A: Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former determines the stance of a tweet containing an image and a piece of text toward a controversial topic (e.g., gun control and abortion). The latter determines whether the image makes the tweet text more persuasive. The shared task received 31 submissions for Subtask-A and 21 submissions for Subtask-B from 9 different teams across 6 countries. The top submission in Subtask-A achieved an F1-score of 0.8647 while the best submission in Subtask-B achieved an F1-score of 0.5561.

翻訳日:2023-10-25 22:55:03 公開日:2023-10-24

# bin-wise scalingは、機械学習回帰における予測の不確かさの一貫性と適応性を改善することができるか?

Can bin-wise scaling improve consistency and adaptivity of prediction uncertainty for machine learning regression ? ( http://arxiv.org/abs/2310.11978v2 )

ライセンス: Link先を確認

Pascal Pernot

(参考訳) binwise variance scaling (bvs) は、一様分散(または温度)スケーリングよりも効率的な補正が可能な機械学習回帰問題の予測の不確実性のためのポストホックなリカバリ法として最近提案されている。 BVSのオリジナルバージョンは不確実性ベースのビンニングを使用しており、不確実性、すなわち一貫性に基づいて校正条件を改善することを目的としている。ここでは,BVSの適応,特に代替損失関数と,適応性を改善するための入力機能(X)に基づくビンニング方式について検討する。すなわち,BVSと提案した変種の性能は,原子化エネルギーの予測のためのベンチマークデータセット上で検証し,等調回帰の結果と比較する。

Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for prediction uncertainties of machine learning regression problems that is capable of more efficient corrections than uniform variance (or temperature) scaling. The original version of BVS uses uncertainty-based binning, which aims to improve calibration conditional on uncertainty, i.e., consistency. I explore here several adaptations of BVS, in particular with alternative loss functions and a binning scheme based on an input feature (X), in order to improve adaptivity, i.e., calibration conditional on X. The performances of BVS and its proposed variants are tested on a benchmark dataset for the prediction of atomization energies and compared to the results of isotonic regression.
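
A minimal sketch of bin-wise scaling with uncertainty-based binning may help make the idea concrete. It is a closed-form moment-matching variant chosen for brevity, which is an assumption of this sketch and not necessarily the loss function studied in the paper: within each bin, uncertainties are multiplied by the factor that brings the mean squared z-score (error divided by uncertainty) to 1. The function name is hypothetical.

```python
import math


def bvs_recalibrate(errors, uncertainties, n_bins):
    """Return uncertainties rescaled bin by bin.

    Samples are sorted by uncertainty and split into `n_bins` groups of
    (nearly) equal size; each group gets its own scale factor
    s = sqrt(mean((error / uncertainty) ** 2)).
    """
    n = len(errors)
    order = sorted(range(n), key=lambda i: uncertainties[i])
    scaled = list(uncertainties)
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # more bins than samples
        mean_z2 = sum((errors[i] / uncertainties[i]) ** 2 for i in idx) / len(idx)
        s = math.sqrt(mean_z2)
        for i in idx:
            scaled[i] = s * uncertainties[i]
    return scaled


# Small uncertainties are too small by a factor 2, large ones too large by 2.
uncertainties = [0.1, 0.2, 1.0, 2.0]
errors = [0.2, -0.4, 0.5, -1.0]
recalibrated = bvs_recalibrate(errors, uncertainties, n_bins=2)
```

A single global scale factor could not correct both groups in this toy example, which is the motivation for scaling per bin. Binning on an input feature instead of the uncertainty would only change the sort key.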

翻訳日:2023-10-25 22:54:32 公開日:2023-10-24

# 不協和から洞察へ:事例結果分類のための根拠構築における不一致の分析

From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification ( http://arxiv.org/abs/2310.11878v4 )

ライセンス: Link先を確認

Shanshan Xu, T.Y.S.S Santosh, Oana Ichim, Isabella Risini, Barbara Plank, Matthias Grabmair

(参考訳) 法的NLPでは、ケースアウトカム分類(COC)は正確であるだけでなく、信頼性と説明性も必要である。説明可能なCOCの既存の作業は、単一の専門家によるアノテーションに限定されている。しかし、弁護士が事件事実の評価に異議を唱えることも知られている。そこで我々は,国際人権法領域の専門家2人から得られた新たなデータセットRAVE(Rationale Variation in ECHR)を収集し,両者の間に弱い合意を観察した。それらの不一致を調査し,COC固有のサブカテゴリを補う2段階のタスク非依存分類法を構築した。我々の知る限り、これは人間のラベルの変化に焦点を当てた法的NLPにおける最初の研究である。異なる分類群を定量的に評価し,不一致は主に法的文脈の過小な特定に起因することを見出した。これは,COCメタデータの粒度が一般に限られノイズを含むことを考えると課題となる。さらに、RAVE上でのSOTA COCモデルの説明可能性を評価し、モデルと専門家間の限定的な合意を観察する。総じて,本ケーススタディは,法的NLPにおけるベンチマークデータセット作成において,これまで過小評価されてきた複雑さを明らかにしている。

In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. To our knowledge, this is the first work in legal NLP that focuses on human label variation. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. We further assess the explainability of SOTA COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.

翻訳日:2023-10-25 22:54:16 公開日:2023-10-24

# 未知の画像データに対するConvNetのパラメータ生成の学習

Learning to Generate Parameters of ConvNets for Unseen Image Data ( http://arxiv.org/abs/2310.11862v2 )

ライセンス: Link先を確認

Shiye Wang, Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang

(参考訳) 典型的な畳み込みニューラルネットワーク(convnets)は、大量の画像データに大きく依存し、ネットワークパラメータを学習するために反復最適化アルゴリズム(sgdやadamなど)を利用する。本稿では,convnetアーキテクチャが与えられたとき,画像データセットとそれに対応する最適なネットワークパラメータの間に相関関係が存在することを観測し,それらの関係を捉えるハイパーマップを学習できるかどうかを検証し,トレーニングフェーズで見たことのない画像データセットのネットワークパラメータを直接予測できるように,新たなトレーニングパラダイムを提案し,convnetのパラメータ学習を予測タスクに定式化する。そこで我々は,データセットとそれに対応するネットワークパラメータのマッピングを学習する目的で,PudNetと呼ばれる新しいハイパーネットワークモデルを提案し,そのパラメータを1つの前方伝播だけで予測する。さらに,重みを共有する一連の適応型ハイパーリカレントユニットにより,異なるネットワーク層間のパラメータの依存性を捉えることができる。大規模な実験により,提案手法は,データセット内予測とデータセット間予測の2種類のデータセットに対して有効であることが示された。当社のPudNetは,ImageNet-1Kなど,大規模なデータセットにもスケールアップ可能です。 GCをスクラッチから使用してImageNet-1K上でResNet-18をトレーニングするには8967GPU秒を要する。しかし、我々のpudnetはresnet-18のネットワークパラメータを予測するのにわずか3.89gpu秒しかかからない(44.92%)。

Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and resort to an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm and formulate the parameter learning of ConvNets into a prediction task: given a ConvNet architecture, we observe that there exist correlations between image datasets and their corresponding optimal network parameters, and explore whether we can learn a hyper-mapping between them to capture the relations, such that we can directly predict the parameters of the network for an image dataset never seen during the training phase. To do this, we put forward a new hypernetwork-based model, called PudNet, which intends to learn a mapping between datasets and their corresponding network parameters, and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model benefits from a series of adaptive hyper recurrent units sharing weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our proposed method achieves good efficacy for unseen image datasets in two kinds of settings: intra-dataset prediction and inter-dataset prediction. Our PudNet also scales well to large-scale datasets, e.g., ImageNet-1K. It takes 8967 GPU seconds to train ResNet-18 on ImageNet-1K using GC from scratch and obtain a top-5 accuracy of 44.65%. However, our PudNet costs only 3.89 GPU seconds to predict the network parameters of ResNet-18 while achieving comparable performance (44.92%), more than 2,300 times faster than the traditional training paradigm.
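
The quoted speed-up can be checked directly from the two timings given in the abstract. The variable names below are illustrative only.

```python
# Timings quoted in the abstract, in GPU seconds.
train_seconds = 8967.0    # training ResNet-18 on ImageNet-1K from scratch
predict_seconds = 3.89    # predicting the ResNet-18 parameters with PudNet

speedup = train_seconds / predict_seconds
print(f"speed-up: about {speedup:.0f}x")
```

The ratio is roughly 2,305, consistent with the "more than 2,300 times faster" claim.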

翻訳日:2023-10-25 22:53:57 公開日:2023-10-24

# ニューラルネットワークを用いた自己注意機構におけるQKV計算の強化

Neural Attention: Enhancing QKV Calculation in Self-Attention Mechanism with Neural Networks ( http://arxiv.org/abs/2310.11398v2 )

ライセンス: Link先を確認

Muhan Zhang

(参考訳) ディープラーニングの領域では、自己注意メカニズムは、自然言語処理やコンピュータビジョンを含む、無数のタスクにまたがる重要な役割を実証している。多様なアプリケーションで成功しているにもかかわらず、従来の自己注意メカニズムは主にクエリ、キー、値(QKV)の計算に線形変換を利用しており、特定の状況下では必ずしも最適な選択とは限らない。本稿では,QKV計算のための新しい手法を探究し,特別に設計されたニューラルネットワーク構造を用いて計算を行う。改良されたMarianモデルを用いて、IWSLT 2017独英翻訳タスクデータセットの実験を行い、従来の手法と比較した。実験結果から,BLEUスコアの大幅な向上が得られた。さらに,Wikitext-103データセットを用いてRoBERTaモデルをトレーニングする際にも,モデルのパープレキシティが元のモデルに比べて著しく低下しており,提案手法の優位性が示された。これらの実験結果は,本手法の有効性を検証するだけでなく,ニューラルネットワークを用いたQKV計算による自己注意機構の最適化の可能性も明らかにしている。提案手法のソースコードと実装の詳細は https://github.com/ocislyjrti/NeuralAttention でアクセスできます。

In the realm of deep learning, the self-attention mechanism has substantiated its pivotal role across a myriad of tasks, encompassing natural language processing and computer vision. Despite achieving success across diverse applications, the traditional self-attention mechanism primarily leverages linear transformations for the computation of query, key, and value (QKV), which may not invariably be the optimal choice under specific circumstances. This paper probes into a novel methodology for QKV computation: implementing a specially designed neural network structure for the calculation. Utilizing a modified Marian model, we conducted experiments on the IWSLT 2017 German-English translation task dataset and juxtaposed our method with the conventional approach. The experimental results unveil a significant enhancement in BLEU scores with our method. Furthermore, our approach also manifested superiority when training the RoBERTa model with the Wikitext-103 dataset, reflecting a notable reduction in model perplexity compared to its original counterpart. These experimental outcomes not only validate the efficacy of our method but also reveal the immense potential in optimizing the self-attention mechanism through neural network-based QKV computation, paving the way for future research and practical applications. The source code and implementation details for our proposed method can be accessed at https://github.com/ocislyjrti/NeuralAttention.
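
The contrast between linear and network-based QKV computation can be sketched in plain Python. The abstract does not specify the network structure, so the two-layer ReLU projection below is an assumption of this sketch, and all function names and sizes are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def linear_proj(x, W):
    """Conventional projection: a single linear map."""
    return matvec(W, x)


def mlp_proj(x, W1, W2):
    """Assumed alternative: linear -> ReLU -> linear."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, project_q, project_k, project_v):
    """Scaled dot-product self-attention with pluggable Q/K/V projections."""
    Q = [project_q(t) for t in tokens]
    K = [project_k(t) for t in tokens]
    V = [project_v(t) for t in tokens]
    d = len(Q[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V)))
                        for c in range(len(V[0]))])
    return outputs, weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden = 4, 8
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(3)]


def make_mlp():
    W1 = random_matrix(d_hidden, d_model, rng)
    W2 = random_matrix(d_model, d_hidden, rng)
    return lambda x: mlp_proj(x, W1, W2)


outputs, weights = self_attention(tokens, make_mlp(), make_mlp(), make_mlp())
```

Passing `lambda x: linear_proj(x, W)` for each projection recovers the conventional mechanism, so the attention computation itself is unchanged and only the way Q, K, and V are produced differs.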

翻訳日:2023-10-25 22:53:04 公開日:2023-10-24

# vechr:欧州人権裁判所における脆弱性タイプの説明可能かつロバストな分類のためのデータセット

VECHR: A Dataset for Explainable and Robust Classification of Vulnerability Type in the European Court of Human Rights ( http://arxiv.org/abs/2310.11368v4 )

ライセンス: Link先を確認

Shanshan Xu, Leon Staufer, T.Y.S.S Santosh, Oana Ichim, Corina Heri, Matthias Grabmair

(参考訳) 脆弱性を認識することは,対象とするサポートの理解と実装において極めて重要である。これは欧州人権裁判所(ECtHR)において特に重要であり、裁判所は条約の基準を実際の個人のニーズに適合させ、それによって効果的な人権保護を確保する。しかし、脆弱性の概念はECtHRではいまだ解明されておらず、これまでのNLP研究では対応していない。そこで本研究では,脆弱性型分類と説明的根拠からなる,新たな専門家によるマルチラベルデータセットであるVECHRを提案する。予測可能性と説明可能性の両方の観点から,VECHRの最先端モデルの性能をベンチマークする。結果は,予測性能が低く,モデルと専門家の合意が限られているタスクの難易度を示す。さらに,out-of-domain(ood)データを扱う際のモデルのロバスト性を分析し,全体の性能を観測する。私たちのデータセットは、パフォーマンス、説明可能性、堅牢性に関する大きな改善の余地を提供するユニークな課題をもたらします。

Recognizing vulnerability is crucial for understanding and implementing targeted support to empower individuals in need. This is especially important at the European Court of Human Rights (ECtHR), where the court adapts Convention standards to meet actual individual needs and thus ensures effective human rights protection. However, the concept of vulnerability remains elusive at the ECtHR and no prior NLP research has dealt with it. To enable future research in this area, we present VECHR, a novel expert-annotated multi-label dataset comprising vulnerability type classification and explanation rationale. We benchmark the performance of state-of-the-art models on VECHR from both prediction and explainability perspectives. Our results demonstrate the challenging nature of the task, with lower prediction performance and limited agreement between models and experts. Further, we analyze the robustness of these models in dealing with out-of-domain (OOD) data and observe overall limited performance. Our dataset poses unique challenges, offering significant room for improvement regarding performance, explainability, and robustness.

翻訳日:2023-10-25 22:52:27 公開日:2023-10-24

# 弱教師あり学習を利用してインドネシアの保全データセットを生成する

Utilizing Weak Supervision To Generate Indonesian Conservation Dataset ( http://arxiv.org/abs/2310.11258v2 )

ライセンス: Link先を確認

Mega Fransiska, Diah Pitaloka, Saripudin, Satrio Putra, Lintang Sutawika

(参考訳) 弱監視は、NLP開発を加速する需要の増加に対応する、迅速かつ大規模データセット作成のための有望なアプローチとして現れている。ラベル機能を利用することで、弱い監督により、ソフトラベル付きデータセットを生成する学習ラベルモデルを作成することで、実践者が迅速にデータセットを生成することができる。本稿では,インドネシアのNLPデータセットを保護ニューステキストから構築する方法について述べる。マルチクラス分類と感情分類の2種類のデータセットを構築した。次に、様々な事前学習言語モデルを用いてベースライン実験を行う。これらの基準値は59.79%の精度と55.72%のF1スコア、66.87%のF1スコアマクロ、71.5%のF1スコアマイクロ、83.67%のROC-AUCの試験結果を示している。さらに,本研究で使用されるデータセットとラベル機能もリリースして,さらなる研究と探索を行う。

Weak supervision has emerged as a promising approach for rapid and large-scale dataset creation in response to the increasing demand for accelerated NLP development. By leveraging labeling functions, weak supervision allows practitioners to generate datasets quickly by creating learned label models that produce soft-labeled datasets. This paper aims to show how such an approach can be utilized to build an Indonesian NLP dataset from conservation news text. We construct two types of datasets: multi-class classification and sentiment classification. We then provide baseline experiments using various pretrained language models. These baseline results demonstrate test performances of 59.79% accuracy and 55.72% F1-score for sentiment classification, 66.87% F1-score-macro, 71.5% F1-score-micro, and 83.67% ROC-AUC for multi-class classification. Additionally, we release the datasets and labeling functions used in this work for further research and exploration.
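
The role of labeling functions can be illustrated with a small sketch. The keywords, category names, and functions below are hypothetical and are not the labeling functions released with the paper. The paper also uses a learned label model, whereas this sketch substitutes plain vote fractions to produce soft labels.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_forest(text):
    return "forest" if "hutan" in text.lower() else ABSTAIN


def lf_fire(text):
    return "forest" if "kebakaran" in text.lower() else ABSTAIN


def lf_marine(text):
    lowered = text.lower()
    return "marine" if ("laut" in lowered or "terumbu" in lowered) else ABSTAIN


LABELING_FUNCTIONS = [lf_forest, lf_fire, lf_marine]


def soft_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Return {label: fraction of non-abstaining votes}; empty if all abstain."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return {}
    counts = Counter(votes)
    return {label: count / len(votes) for label, count in counts.items()}


mixed = soft_label("Kebakaran hutan meluas di dekat laut")
unlabeled = soft_label("Harga beras naik")
```

Each function is cheap to write and may be noisy. Combining many of them yields a probabilistic label per document without manual annotation, and documents on which every function abstains stay unlabeled.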

翻訳日:2023-10-25 22:51:46 公開日:2023-10-24

# 導波路QEDにおける量子多光子ラビ振動

Quantum Multiphoton Rabi Oscillations in Waveguide QED ( http://arxiv.org/abs/2310.15412v1 )

ライセンス: Link先を確認

Debsuvra Mukhopadhyay and Jung-Tsung Shen

(参考訳) 量子情報処理の未来は、チップスケールのナノフォトニクス、特にキャビティQEDと導波路QEDである。量子フォトニクス技術を支える最前線のプロセスの1つは、強いレーザー源によって量子ビットが照射されたときに現れるラビ振動現象である。従来の半古典的枠組みとは別に、光励起が多光子フォック状態の形で、キュービットカップルが放射線モードの連続体となるより一般的な量子論的ケースについて述べる。実空間の定式化を利用して、2レベルエミッタと相互作用するフォトニックフォック状態の散乱ダイナミクスを解析的に探索する。原子励起の振幅は、逐次光子吸収と放出のポテンシャルによって引き起こされる様々な独立した散乱事象の線形重ね合わせを示す。数個の光子のうちの1つが確率的散乱によって始められた最低次励起は、弱場環境におけるダイナミクスを適切に特徴づける。これは、原子-光子相互作用の繰り返しによる高次散乱現象によって補われる。我々の構成におけるクォービット励起の時間的進化は、特にラビ振動が展開する強跳躍極限において、半古典的な予測を密接に反映している。特に、この半古典的パラダイムとの互換性は、弱い運転と大きな調整の限界の両方に適用される。したがって,本解析では,単一モードキャビティqedに関連する量子ラビ振動の既存の結果から,光子を情報キャリアとするマルチモード導波路qed構成まで拡張する。最後に、パルス波パケットの散乱ダイナミクスについて検討し、少数の光子を含むシナリオにおいても励起効率を大幅に向上させる可能性を明らかにする。

The future of quantum information processing hinges on chip-scale nanophotonics, specifically cavity QED and waveguide QED. One of the foremost processes underpinning quantum photonic technologies is the phenomenon of Rabi oscillations, which manifests when a qubit is irradiated by an intense laser source. Departing from the conventional semiclassical framework, we expound on the more general, quantum-theoretic case where the optical excitation takes the form of a multiphoton Fock state, and the qubit couples to a continuum of radiation modes. By employing the real-space formalism, we analytically explore the scattering dynamics of the photonic Fock state as it interfaces with a two-level emitter. The resulting amplitude for atomic excitation features a linear superposition of various independent scattering events that are triggered by the potential of sequential photon absorptions and emissions. The lowest-order excitation event, initiated by the stochastic scattering of one of the several photons, aptly characterizes the dynamics in a weak-field environment. This is complemented by a multitude of higher-order scattering events ensuing from repeated atom-photon interactions. The temporal evolution of the qubit excitation in our configuration closely mirrors the semiclassical predictions, particularly in the strong-pumping limit where Rabi oscillations unfold. Notably, this compatibility with the semiclassical paradigm applies both to the weak-driving and large-detuning limits. Our analysis, therefore, extends the existing results on quantum Rabi oscillations pertinent to single-mode cavity QED, to the multimode, waveguide-QED configurations wherein flying photons are the information carriers. Finally, we explore the scattering dynamics of pulsed wave packets, highlighting the potential to substantially enhance excitation efficiency, even in scenarios involving just a few photons.
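
For reference, the semiclassical benchmark mentioned in the abstract is commonly written as the textbook Rabi formula for a two-level system driven by a classical field, in the rotating-wave approximation and without decay. The symbols $\Omega$ (Rabi frequency), $\Delta$ (detuning), and $P_e$ (excited-state probability, starting from the ground state) are notation chosen here and do not appear in the abstract.

```latex
P_e(t) = \frac{\Omega^2}{\Omega^2 + \Delta^2}\,
         \sin^2\!\left(\frac{\sqrt{\Omega^2 + \Delta^2}}{2}\, t\right)
```

On resonance ($\Delta = 0$) the population oscillates fully at frequency $\Omega$, while for $|\Delta| \gg \Omega$ the oscillation amplitude is suppressed to roughly $\Omega^2/\Delta^2$. This closed-system expression omits the decay into the waveguide continuum that the paper treats.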

翻訳日:2023-10-25 21:24:17 公開日:2023-10-24

# constitutionmaker: フィードバックを原則に変換することで、大規模言語モデルをインタラクティブに評価する

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles ( http://arxiv.org/abs/2310.15428v1 )

ライセンス: Link先を確認

Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

(参考訳) 大きな言語モデル(LLM)のプロンプトは、ユーザが独自のチャットボットを作成してカスタマイズするための、有望な新しいアプローチである。しかしながら、プロンプトエンジニアリングや微調整といったチャットボットのアウトプットを操作する現在の方法は、モデルのアウトプットに対する自然なフィードバックをプロンプトやモデルの変更に変換するユーザをサポートしない。本研究では,フィードバックをモデル動作を規定する一連の原則(コンスティチューション)に変換するのを支援することにより,ユーザがフィードバックを通じてインタラクティブにモデルアウトプットを洗練する方法について検討する。フォーマティブな研究から,(1)ユーザはフィードバックをチャットボットの原則に変換することを支援する必要があり,(2)ユーザが望む原則の種類を分類する必要があることがわかった。このような知見に触発されて,ユーザフィードバックを原則に変換するインタラクティブなツールであるconstitutionmakerを,llmベースのチャットボットとして開発した。 ConstitutionMakerでは、自然言語で肯定的あるいは否定的なフィードバック、自動生成されたフィードバックの選択、チャットボットの応答の書き直し、各フィードバックモードが自動的にチャットボットのプロンプトに挿入される原則を生成する。 14人の参加者によるユーザ調査では、constitutionmakerとablatedバージョンを比較して、ユーザが独自の原則を記述した。 constitutionmakerでは、参加者は彼らの原則がチャットボットをよりガイドし、フィードバックをより簡単に原則に変換し、より効率的に、よりメンタルな要求なしに原則を書くことができると感じた。 ConstitutionMakerは、ユーザーがチャットボットを改善する方法を特定し、モデルに対する直感的な反応をフィードバックに定式化し、フィードバックを具体的で明確な原則に変換するのに役立つ。これらの知見は,LLM出力の対話的クオリティ向上を支援する将来的なツールである。

Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.

翻訳日:2023-10-25 21:12:10 公開日:2023-10-24

# Mason-Alberta音声セグメント:ディープニューラルネットワークと補間に基づく強制アライメントシステム

The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation ( http://arxiv.org/abs/2310.15425v1 )

ライセンス: Link先を確認

Matthew C. Kelley, Scott James Perry, Benjamin V. Tucker

(参考訳) 強制アライメントシステムは,音声データのセグメント間の境界を自動的に決定する。これらのツールは、手作業で書き起こしやセグメント化できない音声データの使用を容易にするために、音韻学では一般的である。本稿では,新しいニューラルネットワークに基づく強制アライメントシステム,Mason-Alberta Phonetic Segmenter(MAPS)について述べる。 MAPSアライメントは、強制アライメントシステムのために私たちが追求する2つの改善のためのテストベッドとして機能します。第一は、音声のセグメントが真に離散的ではなく、一般的に重複しているという共通の理解によって動機付けられた分類タスクではなく、強制ライナーで音響モデルをタグ付けタスクとして扱うことである。 2つ目は、現代の強制アライメントシステムにおいて一般的な10ミリ秒制限よりも正確な境界を許容する補間技術である。本システムの構成を最先端システムであるモントリオール強制調整機と比較した。タギングのアプローチはモントリオール強制アリグナーよりも改善された結果をもたらすことはなかった。しかし、補間技術を備えたシステムは、試験セット上の目標の10ms以内の境界の量において、モントリオール強制調整機と比較して27.92%増加した。また,音響モデリングの課題と訓練過程を強制的に調整し,これらのモデルの出力対象が電話との類似性の概念とどのように一致しないか,また,この緊張の解消にはタスクと出力対象の再検討や音声自体のセグメント化が必要となる可能性があることを強調する。

Forced alignment systems automatically determine boundaries between segments in speech data, given an orthographic transcription. These tools are commonplace in phonetics to facilitate the use of speech data that would be infeasible to manually transcribe and segment. In the present paper, we describe a new neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). The MAPS aligner serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model in a forced aligner as a tagging task, rather than a classification task, motivated by the common understanding that segments in speech are not truly discrete and commonly overlap. The second is an interpolation technique to allow boundaries more precise than the common 10 ms limit in modern forced alignment systems. We compare configurations of our system to a state-of-the-art system, the Montreal Forced Aligner. The tagging approach did not generally yield improved results over the Montreal Forced Aligner. However, a system with the interpolation technique had a 27.92% increase relative to the Montreal Forced Aligner in the number of boundaries within 10 ms of the target on the test set. We also reflect on the task and training process for acoustic modeling in forced alignment, highlighting how the output targets for these models do not match phoneticians' conception of similarity between phones and that reconciliation of this tension may require rethinking the task and output targets or how speech itself should be segmented.
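
One way to place a boundary between 10 ms frames is to interpolate a per-frame probability track and find where it crosses a threshold. The sketch below is an illustration under stated assumptions, not the MAPS procedure: the abstract does not specify the interpolation scheme, and the function name, the 0.5 threshold, and the convention that frame `i` sits at time `i * frame_step_ms` are all hypothetical.

```python
def interpolated_boundary(probs, frame_step_ms=10.0, threshold=0.5):
    """Return the time in ms at which `probs` first rises through `threshold`.

    `probs[i]` is the probability that frame i belongs to the upcoming
    segment. The crossing point is located by linear interpolation between
    the two frames that bracket the threshold. Returns None if the track
    never crosses upward.
    """
    for i in range(1, len(probs)):
        prev, cur = probs[i - 1], probs[i]
        if prev < threshold <= cur:
            # Fraction of the way from frame i-1 to frame i (cur > prev here).
            frac = (threshold - prev) / (cur - prev)
            return (i - 1 + frac) * frame_step_ms
    return None
```

With frame-level decisions alone, the boundary in a track such as `[0.1, 0.2, 0.8, 0.9]` could only be reported at a multiple of 10 ms. Interpolation places it between the second and third frames.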

翻訳日:2023-10-25 21:11:39 公開日:2023-10-24

# 分子ポラリトンの線形応答

Linear response of molecular polaritons ( http://arxiv.org/abs/2310.15424v1 )

ライセンス: Link先を確認

Joel Yuen-Zhou and Arghadip Koner

(参考訳) 本稿では,光学キャビティの光子モードにN$分子エミッタが結合する集合光物質強結合系を,光子が不純物である量子不純物モデルにマッピングし,不調和遷移の浴に結合することを示す。 N\gg1$の熱力学限界では、この浴を効果的な調和風呂に置き換えることにより、問題を劇的に単純化して調和振動子の1つにすることができる。分子入力に必要な唯一の分子入力が分子線感受性である線形光学スペクトル(透過,反射,吸収)の単純な解析式を導出する。この形式化は、温度、障害、ビブロンカップリング、および分子アンサンブルの光学的飽和の役割を示す一連の例に適用され、非線形光学実験の重要なクラスを記述する際にも有用である。完全性のために、回転波近似における任意の無調波系(大小ともにN$)に対する分光観測器の自己完結型導出を含む包括的近似を提供する。提案された結果のいくつかは既に文献で報告されているが、オープン量子系における強力な概念と線形応答理論と分子分極論を結びつける新しい解釈と同様に、結果を統一的に提示する。

In this article, we show that the collective light-matter strong coupling regime, where $N$ molecular emitters couple to the photon mode of an optical cavity, can be mapped to a quantum impurity model where the photon is the impurity that is coupled to a bath of anharmonic transitions. In the thermodynamic limit where $N\gg1$, we argue that the bath can be replaced with an effective harmonic bath, leading to a dramatic simplification of the problem into one of coupled harmonic oscillators. We derive simple analytical expressions for linear optical spectra (transmission, reflection, and absorption) where the only molecular input required is the molecular linear susceptibility. This formalism is applied to a series of illustrative examples showcasing the role of temperature, disorder, vibronic coupling, and optical saturation of the molecular ensemble, explaining that it is useful even when describing an important class of nonlinear optical experiments. For completeness, we provide a comprehensive Appendix that includes a self-contained derivation of the relevant spectroscopic observables for arbitrary anharmonic systems (for both large and small $N$) within the rotating-wave approximation. While some of the presented results herein have already been reported in the literature, we provide a unified presentation of the results as well as new interpretations that connect powerful concepts in open quantum systems and linear response theory with molecular polaritonics.

翻訳日:2023-10-25 21:11:10 公開日:2023-10-24

# G2-MonoDepth:単分子RGB+Xデータからの一般化深度推論の一般的なフレームワーク

G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data ( http://arxiv.org/abs/2310.15422v1 )

ライセンス: Link先を確認

Haotian Wang, Meng Yang, and Nanning Zheng

(参考訳) 単眼深度推定はロボットのシーン認識の基本的な問題である。特定のロボットにはカメラと任意のタイプの奥行きセンサーが装備され、様々なスケールの様々なシーンに配置できるが、近年の進歩は複数のサブタスクを派生させた。これにより、特定のロボットの微調整モデルにさらなる負担がかかり、大規模な工業化において高コストでカスタマイズできる。本稿では,様々なロボットから入力されたあらゆるデータから高品質な深度マップを推定する単眼深度推定の統一課題について検討する。基本的なベンチマーク G2-MonoDepth はこのタスクのために開発されている。 (a)rgbプラス多様なシーンスケール/セマンティクス、深さスパーシティ([0%, 100%])、エラー(ホール/ノイズ/ブラル)の生深度に対応する統一データ表現rgb+x。 (b)入力生データの深度・深度・誤り及び出力シーンの多様さに対応するための新たな統一的損失 (c)多様なシーンスケールを入力から出力へよく伝達する改良されたネットワーク、及び (d) トレーニング用の生深度マップで実際のすべての種類のアーティファクトをシミュレートするデータ拡張パイプライン。 G2-MonoDepthは、深度推定、鮮度の違いによる深度補完、見えないシーンでの深度向上を含む3つのサブタスクに適用され、現実世界のデータと合成データの両方でSOTAベースラインを常に上回る。

Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances have split the problem into multiple individual sub-tasks. This fragmentation adds the burden of fine-tuning models for specific robots and thereby makes customization costly in large-scale industrialization. This paper investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of input raw data from various robots in unseen scenes. A basic benchmark G2-MonoDepth is developed for this task, which comprises four components: (a) a unified data representation RGB+X to accommodate RGB plus raw depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth sparsity/errors of input raw data and diverse scales of output scenes, (c) an improved network that propagates diverse scene scales well from input to output, and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied to three sub-tasks, namely depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it always outperforms SOTA baselines on both real-world data and synthetic data.

翻訳日:2023-10-25 21:10:48 公開日:2023-10-24

# FANToM: インタラクションにおける心のストレステストマシン理論のベンチマーク

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions ( http://arxiv.org/abs/2310.15421v1 )

ライセンス: Link先を確認

Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap

(参考訳) 心の理論(ToM)評価は、相互作用性に本質的に欠ける受動的物語を用いたテストモデルに焦点を当てている。本稿では,情報非対称な会話文脈におけるToMのストレステストを目的とした新しいベンチマークであるFANToMを紹介する。本ベンチマークは,大規模言語モデル(llm)の評価において,心理学から重要な理論的要件と必要な経験的考察を導出する。特に,LLMにおける視覚的・虚偽のToM能力を識別するために,同じ推論を要求される複数の質問を定式化する。 FANToMは、チェーン・オブ・シークレット・推論や微調整でさえも、人間よりもはるかにパフォーマンスが悪く、最先端のLLMでは困難であることを示す。

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.

翻訳日:2023-10-25 21:10:19 公開日:2023-10-24

# 短文トピックモデリングのための事前学習型言語モデル"Imagine"

Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling ( http://arxiv.org/abs/2310.15420v1 )

ライセンス: Link先を確認

Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang

(参考訳) トピックモデルは、ドキュメントコレクション内の潜在意味論を発見するための魅力的な方法の1つです。しかし、ドキュメントが有効な十分な共起情報を持っていると仮定する。しかし、短いテキストでは、共起情報は最小限であり、結果として文書表現に特徴的スパーシティが生じる。したがって、既存のトピックモデル(確率的または神経的)は、主にパターンをマイニングして一貫性のあるトピックを生成するのに失敗する。本稿では,既存の事前学習言語モデル(PLM)を用いて,短いテキストを長いシーケンスに拡張することで,データスパーシビリティ問題に対処する,短文トピックモデリングの新しいアプローチを提案する。さらに、PLMからノイズの多い話題テキスト生成の効果を低減するために、ニューラルトピックモデルを拡張した簡単なソリューションを提供する。我々は,本モデルが短文トピックモデリングの性能を大幅に向上させることができることを観察した。極端なデータスパーシティシナリオの下での複数の実世界のデータセットに関する広範囲な実験は、我々のモデルが最先端のモデルよりも高品質なトピックを生成できることを示しています。

Topic models are one of the compelling methods for discovering latent semantics in a document collection. However, they assume that a document has sufficient co-occurrence information to be effective. In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation. Therefore, existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics. In this paper, we take a new approach to short-text topic modeling to address the data-sparsity issue by extending short text into longer sequences using existing pre-trained language models (PLMs). Besides, we provide a simple solution extending a neural topic model to reduce the effect of noisy out-of-topic text generated by PLMs. We observe that our model can substantially improve the performance of short-text topic modeling. Extensive experiments on multiple real-world datasets under extreme data sparsity scenarios show that our models can generate high-quality topics that outperform those of state-of-the-art models.

翻訳日:2023-10-25 21:10:09 公開日:2023-10-24

# 政策最適化におけるフラクタル景観

Fractal Landscapes in Policy Optimization ( http://arxiv.org/abs/2310.15418v1 )

ライセンス: Link先を確認

Tao Wang, Sylvia Herbert and Sicun Gao

(参考訳) 政策勾配は、継続的ドメインにおける深層強化学習(RL)の中核にある。多くの成功にもかかわらず、政策勾配によるRLトレーニングは、既知の解に対する標準的な制御問題でさえも、多くの理由で失敗する可能性があると、実際にはしばしば見られている。ポリシ空間における最適化の展望は,あるクラスのMDPに対して極めて非平滑あるいはフラクタルであり,そもそも勾配を推定する手段が存在しない,という,ポリシー勾配アプローチの固有の制限を理解するための枠組みを提案する。カオス理論と非スムース解析の手法を考察し,政策最適化目標の最大リアプノフ指数とHölder指数を分析した。さらに,学習過程がフラクタルランドスケープに遭遇したときのサンプルから目的関数の局所的滑らかさを推定する実用的な手法を開発した。このようなフラクタルな景観によって、政策最適化の失敗事例をいかに説明できるかを示す実験を示す。

Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that no gradient exists to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and Hölder exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of the objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
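
To make the notion of a maximal Lyapunov exponent concrete, the following sketch estimates it for the one-dimensional logistic map, a standard toy system from chaos theory. The logistic map is a stand-in chosen here and is not one of the paper's MDPs; the function name and default parameters are likewise illustrative.

```python
import math


def logistic_lyapunov(r, x0=0.2, n_burn=1000, n_iter=20000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    The exponent is the long-run average of log|f'(x)| along a trajectory,
    where f'(x) = r * (1 - 2x). A positive value signals chaos; a negative
    value signals convergence to a stable orbit.
    """
    x = x0
    for _ in range(n_burn):  # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1.0 - x)
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
    return total / n_iter
```

For `r = 2.5` the orbit settles on the fixed point `x = 0.6`, where `|f'(x)| = 0.5`, so the estimate approaches `log(0.5)`. For `r = 3.9` the estimate is positive, the regime in which nearby trajectories separate exponentially.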

翻訳日:2023-10-25 21:09:52 公開日:2023-10-24

# 点/系列再構成による名目性スコア条件付き時系列異常検出

Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction ( http://arxiv.org/abs/2310.15416v1 )

ライセンス: Link先を確認

Chih-Yu Lai, Fan-Keng Sun, Zhengqi Gao, Jeffrey H. Lang, and Duane S. Boning

(参考訳) 時系列異常検出は、複雑で様々なパターンが発生するため困難である。時間依存関係をモデル化して、点異常の検出精度を維持しながらコンテキスト異常を見つけることが大きな課題である。本稿では,ポイントベースおよびシーケンスベース再構成モデルを用いた教師なし時系列異常検出のためのフレームワークを提案する。点ベースモデルは点異常の定量化を試み、シーケンスベースモデルは点と文脈異常の定量化を試みる。観測された時刻が名目時点から2段階のずれ値であるという定式化において、復元誤差の組合せ値の比率から算出した名目スコアを導入する。本研究は,発音スコアと異常スコアとを更に統合して誘導異常スコアを導出し,特定の条件下で誘導異常スコアが元の異常スコアよりも優れていることを理論的に証明する。いくつかの公開データセットに関する広範な研究により、提案されたフレームワークは、時系列異常検出のための最先端のベースラインよりも優れていることが示されている。

Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.

翻訳日:2023-10-25 21:09:36 公開日:2023-10-24

# 会話間のギャップを意識して-長期対話生成の改善

Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation ( http://arxiv.org/abs/2310.15415v1 )

ライセンス: Link先を確認

Qiang Zhang, Jason Naradowsky, Yusuke Miyao

(参考訳) 会話の終わり方や再開方法を知ることは、コミュニケーションの自然な部分であり、数週間、数ヶ月、数年にわたる議論を可能にする。会話間のギャップの期間は、どのトピックが関連しているか、どの質問をするかを判断し、明確にモデル化されていない対話システムは、不自然な応答を生成する。本稿では,対話モデルに時間を認識し,セッション間の時間が異なるマルチセッション対話データセットであるgapchatを提案する。データセットはリアルタイムに構築されているが、話者の生活における出来事の進行をシミュレートして、長い時間間隔で発生する現実的な対話を生成する。時間情報をモデルに公開し、時間とイベントの進捗の異なる表現を比較します。人的評価において、時間認識モデルは、選択したトピックと会話から得られる情報との関係を判断する指標において、より良い性能を示すことを示す。

Knowing how to end and resume conversations over time is a natural part of communication, allowing for discussions to span weeks, months, or years. The duration of gaps between conversations dictates which topics are relevant and which questions to ask, and dialogue systems which do not explicitly model time may generate responses that are unnatural. In this work we explore the idea of making dialogue models aware of time, and present GapChat, a multi-session dialogue dataset in which the time between each session varies. While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan. We expose time information to the model and compare different representations of time and event progress. In human evaluation we show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.

翻訳日:2023-10-25 21:09:18 公開日:2023-10-24

# 人間とAIのコラボレーションに関する諸条約

Diverse Conventions for Human-AI Collaboration ( http://arxiv.org/abs/2310.15414v1 )

ライセンス: Link先を確認

Bidipta Sarkar and Andy Shih and Dorsa Sadigh

(参考訳) コンベンションは、プレイヤーが明示的なコミュニケーションなしに共有戦略で協調できるため、協調マルチエージェントゲームにおける強力なパフォーマンスに不可欠である。残念ながら、セルフプレイのような標準的なマルチエージェント強化学習技術は、任意で非多様性の慣習に収束し、新しいパートナーと対話する際には一般化が不十分になる。本研究は,(1)自己プレイ中の報酬を最大化し,(2)発見済みの規約(クロスプレイ)で遊ぶ際の報酬を最小化し,意味的に異なる規約を刺激することにより,多様な慣習を生成する手法を提案する。クロスプレイの逆最適化に拘わらず,学習した政策が忠実に振る舞うようにするために,自己プレイとクロスプレイの遷移をサンプリングして初期状態をランダムに生成し,この初期状態から自己プレイの報酬を最大化することを学習する「mixed-play」を導入する。我々は,Overcookedを含む様々なマルチエージェント協調ゲームにおける手法の利点を分析し,本手法が実際のユーザとペアリングした場合の人間レベルのパフォーマンスを越えながら,人間の慣行に適応できることを見出した。

Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.

翻訳日:2023-10-25 21:09:02 公開日:2023-10-24

# DeepIron:1枚の画像から未処理のガーメントテクスチャを予測する

DeepIron: Predicting Unwarped Garment Texture from a Single Image ( http://arxiv.org/abs/2310.15447v1 )

ライセンス: Link先を確認

Hyun-Song Kwon, Sung-Hee Lee

(参考訳) 画像からの3D衣服のリアルな再構築は、アバター作成や仮想試着など幅広い応用がある。本稿では,1枚の写真から3次元衣料のテクスチャマップを再構築する新しい枠組みを提案する。 2次元縫製パターンを縫い合わせることで3D衣服をモデル化すると、その具体的目的は縫製パターンのテクスチャ画像を作成することである。本フレームワークの重要な構成要素であるテクスチュア・アンワーパーは、入力された衣服画像から本来のテクスチャイメージを推測し、ユーザの身体形状やポーズによるテクスチャのゆらぎと隠蔽を示す。 Texture Unwarperは、2つの画像の潜在空間をマッピングすることで、入力画像と出力画像の間で効果的に変換する。入力された衣服の本来のテクスチャを推定することで、新しいポーズのためにリアルに変形した高品質なテクスチャ画像を表示できる3d衣料モデルの再構築を支援する。他の方法との比較とアブレーション研究を通じて,本手法の有効性を検証する。さらに, 衣服を装着したアバターのテクスチャやイメージを付加した衣服縫製パターンの大規模データセットを公開し, 今後, テクスチャの再構築と合成研究に役立てる予定である。

Realistic reconstruction of 3D clothing from an image has wide applications, such as avatar creation and virtual try-on. This paper presents a novel framework that reconstructs the texture map for 3D garments from a single image with pose. Assuming that 3D garments are modeled by stitching 2D garment sewing patterns, our specific goal is to generate a texture image for the sewing patterns. A key component of our framework, the Texture Unwarper, infers the original texture image from the input clothing image, which exhibits warping and occlusion of texture due to the user's body shape and pose. The Texture Unwarper effectively transforms between the input and output images by mapping the latent spaces of the two images. By inferring the unwarped original texture of the input garment, our method helps reconstruct 3D garment models that can show high-quality texture images realistically deformed for new poses. We validate the effectiveness of our approach through a comparison with other methods and ablation studies. Additionally, we release a large dataset of garment sewing patterns with textures and images of avatars wearing the garments, which will be useful for future research on garment texture reconstruction and synthesis.

翻訳日:2023-10-25 21:03:38 公開日:2023-10-24

# 高速伝播: サンプリングサブネットワークによる単段攻撃訓練の高速化

Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks ( http://arxiv.org/abs/2310.15444v1 )

ライセンス: Link先を確認

Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai and Xiaochun Cao

(参考訳) 敵のトレーニングでは、敵の例に対して堅牢なモデルを構築することが期待されている。逆行訓練の大きな欠点は、逆行例の生成によって引き起こされる計算オーバーヘッドである。この制限を克服するため、単段階攻撃に基づく敵の訓練が検討されている。これまでの作業は、サンプル初期化、損失正規化、トレーニング戦略など、異なる視点からの一段階の敵訓練を改善する。ほとんど全員が、基盤となるモデルをブラックボックスとして扱う。本研究では,モデルの内部構造ブロックを利用して効率を向上させることを提案する。具体的には、トレーニング中の代理モデルとして軽量サブネットワークを動的にサンプリングすることを提案する。これにより、効果的に対向訓練を行うために、前方と後方の両方のパスを加速することができる。さらに,モデルロバスト性が,サンプルサブネットワークを用いた単段逆訓練によって向上することを示すための理論的解析を行う。さらに, サンプリングを層ごとに, 繰り返しから繰り返しへと変化させる新しいサンプリング手法を提案する。従来の手法と比較して,本手法はトレーニングコストを削減するだけでなく,モデル堅牢性を向上する。一連の人気データセットの評価は、提案したFP-Betterの有効性を示す。私たちのコードは https://github.com/jiaxiaojunQAQ/FP-Better で公開されています。

Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves the single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically, we propose to dynamically sample lightweight subnetworks as a surrogate model during training. By doing this, both the forward and backward passes can be accelerated for efficient adversarial training. Besides, we provide theoretical analysis to show the model robustness can be improved by the single-step adversarial training with sampled subnetworks. Furthermore, we propose a novel sampling strategy where the sampling varies from layer to layer and from iteration to iteration. Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness. Evaluations on a series of popular datasets demonstrate the effectiveness of the proposed FP-Better. Our code has been released at https://github.com/jiaxiaojunQAQ/FP-Better.
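
The single-step attack that this line of work builds on can be sketched for a tiny logistic model. This is a generic FGSM-style perturbation written for illustration; it does not implement the paper's subnetwork sampling, and the function names and the choice of a logistic model are assumptions made here.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_perturb(x, y, w, b, eps):
    """Return x moved by eps along the sign of the input gradient of the loss.

    Model: p = sigmoid(w . x + b) with binary label y in {0, 1}.
    Loss:  cross-entropy, whose gradient w.r.t. x_i is (p - y) * w_i.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    # sign(g) as -1, 0, or +1 without external dependencies.
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, signs)]
```

Only one gradient evaluation is needed per example, which is what makes single-step adversarial training cheaper than multi-step variants. In adversarial training, the perturbed inputs would then be used in place of (or alongside) the clean ones when updating the model.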

翻訳日:2023-10-25 21:03:15 公開日:2023-10-24

# 量子アニール法による線形方程式解法アルゴリズムの収束率

Convergence rate of algorithms for solving linear equations by quantum annealing ( http://arxiv.org/abs/2310.15441v1 )

ライセンス: Link先を確認

V. Shalgin, S. Tikhomirov

(参考訳) 量子アニーリングの原理に基づく量子コンピュータを用いて線形方程式$ax=b$を解くための様々な反復アルゴリズムを考える。コンピュータの出力がボルツマン分布によって記述されていると仮定すると、方程式解法アルゴリズムが収束する条件下で、それらの収束率の推定値が提供される。無限個の量子ビットと少数の量子ビットの両方を用いたアルゴリズムへのこのアプローチの適用について論じる。

We consider various iterative algorithms for solving the linear equation $ax=b$ using a quantum computer operating on the principle of quantum annealing. Assuming that the computer's output is described by the Boltzmann distribution, it is shown under which conditions the equation-solving algorithms converge, and an estimate of their convergence rate is provided. The application of this approach to algorithms using both an infinite number of qubits and a small number of qubits is discussed.
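
The role of the Boltzmann-distribution assumption can be illustrated on a toy problem. The sketch below restricts the unknown to binary vectors and enumerates every state by brute force, so it is a classical illustration rather than the paper's iterative algorithms; the energy $\|Ax - b\|^2$, the function name, and the inverse temperature `beta` are choices made here.

```python
import itertools
import math


def boltzmann_over_binary(A, b, beta):
    """Boltzmann probabilities over binary x for the energy ||A x - b||^2.

    Returns a dict mapping each binary tuple x to its probability
    exp(-beta * E(x)) / Z. Enumeration is exponential in len(x), so this
    is only meant for very small examples.
    """
    n = len(A[0])
    states = list(itertools.product((0, 1), repeat=n))

    def energy(x):
        return sum(
            (sum(a * xj for a, xj in zip(row, x)) - bi) ** 2
            for row, bi in zip(A, b)
        )

    energies = [energy(x) for x in states]
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}
```

At `beta = 0` every state is equally likely, and as `beta` grows the probability mass concentrates on the state that solves the equation. An actual annealer-based method would encode real-valued unknowns with several qubits per component and refine the solution over iterations.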

翻訳日:2023-10-25 21:02:57 公開日:2023-10-24

# 線形VAEにおける学習ダイナミクス: 後方崩壊閾値, 超流動潜時空間ピットフォール, KLアニーリングによる高速化

Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing ( http://arxiv.org/abs/2310.15440v1 )

ライセンス: Link先を確認

Yuma Ichikawa and Koji Hukushima

(参考訳) 変分自己エンコーダ(VAEs)は、変分後部はしばしば前者と密接に一致する悪名高い問題に直面し、後部崩壊と呼ばれる現象は表現学習の質を妨げる。この問題を緩和するために、調整可能なハイパーパラメータ$\beta$と、KLアニールと呼ばれるこのパラメータをアニールする戦略を提案する。本研究では,最小vaeにおける学習ダイナミクスの理論的解析を行う。ダイナミックスが大きな入力次元の限界内で決定論的プロセスに収束することが厳密に証明され、一般化誤差の詳細な動的解析が可能になる。さらに, VAEはまず絡み合った表現を学習し, 徐々に絡み合った表現を取得する。決定論的プロセスの固定点分析により、$\beta$ が一定の閾値を超えると、学習期間に関係なく後方崩壊は避けられないことが分かる。さらに、データ生成因子の過剰な潜在変数は背景雑音の過剰化につながり、一般化と学習収束の両方に悪影響を及ぼす。この分析により、適切に調整されたKLアニールが収束を加速することが明らかとなった。

Variational autoencoders (VAEs) face a notorious problem wherein the variational posterior often aligns closely with the prior, a phenomenon known as posterior collapse, which hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, have been proposed. This study presents a theoretical analysis of the learning dynamics in a minimal VAE. It is rigorously proved that the dynamics converge to a deterministic process in the limit of large input dimensions, thereby enabling a detailed dynamical analysis of the generalization error. Furthermore, the analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled representations. A fixed-point analysis of the deterministic process reveals that when $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Additionally, the superfluous latent variables for the data-generative factors lead to overfitting of the background noise; this adversely affects both generalization and learning convergence. The analysis further reveals that appropriately tuned KL annealing can accelerate convergence.
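
KL annealing is usually implemented as a schedule on the weight $\beta$ of the KL term. The sketch below uses a linear warm-up, which is one common choice and not necessarily the schedule analyzed in the paper; the function names and parameters are illustrative.

```python
def kl_weight(step, warmup_steps, beta_max=1.0):
    """Linearly increase the KL weight from 0 to beta_max over warmup_steps."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)


def annealed_vae_loss(recon_loss, kl_div, step, warmup_steps, beta_max=1.0):
    """Reconstruction loss plus the KL divergence scaled by the annealed weight."""
    return recon_loss + kl_weight(step, warmup_steps, beta_max) * kl_div
```

Early in training the small weight lets the encoder use the latent variables freely, and the penalty toward the prior is phased in afterward. The abstract's threshold result concerns the final value of $\beta$, which corresponds to `beta_max` in this sketch.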

翻訳日:2023-10-25 21:02:48 公開日:2023-10-24

# k-haters:ターゲット別評価を用いた韓国におけるヘイトスピーチ検出コーパス

K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings ( http://arxiv.org/abs/2310.15439v1 )

ライセンス: Link先を確認

Chaewon Park, Soohwan Kim, Kyubyong Park, Kunwoo Park

(参考訳) オンライン憎しみの拡散に対抗するために、多くのデータセットが提案されている。これらの努力にもかかわらず、これらの資源の大半は英語中心であり、主に過度な憎しみの形式に焦点を当てている。この研究ギャップは、より微妙な憎悪表現をカプセル化した多様な言語で高品質なコーパスを開発することを要求する。本研究では,韓国におけるヘイトスピーチ検出のための新しいコーパスであるK-HATERSを紹介する。このリソースは韓国で最大の攻撃的言語コーパスであり、ターゲット固有の評価を3ポイントのlikertスケールで提供し、さまざまな攻撃性を通じて韓国における憎悪表現の検出を可能にした。提案コーパスの有効性を示す実験を行い,既存のデータセットとの比較を行った。さらに,人間の注釈における潜在的なノイズやバイアスに対処するために,個人の認知能力を評価するための社会科学において広く用いられている認知的リフレクションテスト(cognitive reflection test)を,ラベル付け品質の指標として採用するという新しい考え方を探求する。その結果、テストスコアが最も低い個人からのアノテーションは、特定のターゲットグループに対して偏りのある予測を行い、精度が低い検出モデルをもたらす傾向がある。本研究は,ヘイトスピーチの検出と資源構築に関するNLP研究に寄与する。コードとデータセットはhttps://github.com/ssu-humane/K-HATERSでアクセスできる。

Numerous datasets have been proposed to combat the spread of online hate. Despite these efforts, a majority of these resources are English-centric, primarily focusing on overt forms of hate. This research gap calls for developing high-quality corpora in diverse languages that also encapsulate more subtle hate expressions. This study introduces K-HATERS, a new corpus for hate speech detection in Korean, comprising approximately 192K news comments with target-specific offensiveness ratings. This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness. We conduct experiments showing the effectiveness of the proposed corpus, including a comparison with existing datasets. Additionally, to address potential noise and bias in human annotations, we explore a novel idea of adopting the Cognitive Reflection Test, which is widely used in social science for assessing an individual's cognitive ability, as a proxy of labeling quality. Findings indicate that annotations from individuals with the lowest test scores tend to yield detection models that make biased predictions toward specific target groups and are less accurate. This study contributes to the NLP research on hate speech detection and resource construction. The code and dataset can be accessed at https://github.com/ssu-humane/K-HATERS.

翻訳日:2023-10-25 21:02:27 公開日:2023-10-24

# VGX:学習ベースのソフトウェア脆弱性分析を促進する大規模サンプル生成

VGX: Large-Scale Sample Generation for Boosting Learning-Based Software Vulnerability Analyses ( http://arxiv.org/abs/2310.15436v1 )

ライセンス: Link先を確認

Yu Nong, Richard Fang, Guangbei Yi, Kunsong Zhao, Xiapu Luo, Feng Chen, and Haipeng Cai

(参考訳) 学習ベースの防御的ソフトウェア脆弱性分析は成功を収めているが、ラベル付き脆弱性プログラムサンプルの大規模かつ高品質なセットが欠如しており、これがさらなる進歩を妨げている。既存の自動サンプル生成手法は可能性を示しているものの、生成したサンプルのノイズが高いため、まだ実用上の期待には届いていない。本稿では,高品質な脆弱性データセットを大規模に生成するための新しい手法であるVGXを提案する。通常のプログラムが与えられると、VGXは、新しいバリューフローベースの位置エンコーディングを備え、特にコード構造とコンテキストを学習するための新しい目的で事前トレーニングされたカスタマイズされたTransformerを使用して、脆弱性を注入できるコードコンテキストを特定する。次に、VGXは、過去の修正と実世界の脆弱性に関する人間の知識の両方から得られた編集パターンを用いて、特定したコンテキストにおける脆弱性注入のコード編集を実現する。 4つの最先端(SOTA)ベースライン(パターンベース、Transformerベース、GNNベース、パターン+Transformerベース)と比較して、VGXは99.09-890.06%高いF1と22.45-328.47%高いラベル精度を達成した。実環境でのサンプル生成では,VGXは脆弱性のあるサンプルを150,392個生成し、その中から10%をランダムに選択して、これらのサンプルが脆弱性の検出、ローカライズ、修復にどの程度役立つかを評価した。その結果、これらのサンプルを元のトレーニングデータに追加することで、これら3つのアプリケーションタスクのSOTA技術は、それぞれ19.15-330.80%高いF1、12.86-19.31%高いトップ10精度、85.02-99.30%高いトップ50精度を達成した。これらのサンプルはまた、SOTA脆弱性検出器が、元のモデルでは見逃されるような重要なシステム(例えばLinuxカーネル)における実世界の脆弱性(CVE)をさらに13件発見するのに役立った。

Accompanying the successes of learning-based defensive software vulnerability analyses is the lack of large and quality sets of labeled vulnerable program samples, which impedes further advancement of those defenses. Existing automated sample generation approaches have shown potential yet still fall short of practical expectations due to the high noise in the generated samples. This paper proposes VGX, a new technique aimed at large-scale generation of high-quality vulnerability datasets. Given a normal program, VGX identifies the code contexts in which vulnerabilities can be injected, using a customized Transformer featuring a new value-flow-based position encoding and pre-trained against new objectives particularly for learning code structure and context. Then, VGX materializes vulnerability-injection code editing in the identified contexts using patterns of such edits obtained from both historical fixes and human knowledge about real-world vulnerabilities. Compared to four state-of-the-art (SOTA) baselines (pattern-, Transformer-, GNN-, and pattern+Transformer-based), VGX achieved 99.09-890.06% higher F1 and 22.45-328.47% higher label accuracy. For in-the-wild sample production, VGX generated 150,392 vulnerable samples, from which we randomly chose 10% to assess how much these samples help vulnerability detection, localization, and repair. Our results show SOTA techniques for these three application tasks achieved 19.15-330.80% higher F1, 12.86-19.31% higher top-10 accuracy, and 85.02-99.30% higher top-50 accuracy, respectively, by adding those samples to their original training data. These samples also helped a SOTA vulnerability detector discover 13 more real-world vulnerabilities (CVEs) in critical systems (e.g., Linux kernel) that would be missed by the original model.

翻訳日:2023-10-25 21:02:03 公開日:2023-10-24

# PromptInfuser: AIとUIデザインの密結合がデザイナのワークフローに与える影響

PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows ( http://arxiv.org/abs/2310.15435v1 )

ライセンス: Link先を確認

Savvas Petridis, Michael Terry, Carrie J. Cai

(参考訳) AIアプリケーションのプロトタイプ作成は、非常に難しいことで知られている。大規模言語モデル(LLM)のプロンプティングがAIプロトタイピングの障壁を劇的に下げたが、デザイナは依然としてAI機能とUIを別々にプロトタイピングしている。本研究では,プロンプトとUIデザインの結合がデザイナのワークフローに与える影響について検討する。この研究の基盤として,UI要素をプロンプトの入力と出力に接続することで,半機能的なモックアップを作成できるFigmaプラグインであるPromptInfuserを開発した。 14人のデザイナーによる研究で、PromptInfuserとデザイナーの現在のAIプロトタイピングワークフローを比較した。 PromptInfuserは、プロダクトのアイデアを伝えるのに有意に有用であり、想定されたアーティファクトを現実的に表現するプロトタイプを作成でき、プロトタイピングがより効率的で、UIの問題や技術的な制約を予測するのにより役立つと評価された。 PromptInfuserはプロンプトとUIを合わせた反復を促し、デザイナーがUIとプロンプトの非互換性を特定し、ソリューション全体を振り返るのに役立った。これらの発見は、AIアプリケーションをプロトタイピングする将来のシステムに示唆を与える。

Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.

翻訳日:2023-10-25 21:01:26 公開日:2023-10-24

# 政策畳み込みによる大規模行動空間のオフポリシー評価

Off-Policy Evaluation for Large Action Spaces via Policy Convolution ( http://arxiv.org/abs/2310.15433v1 )

ライセンス: Link先を確認

Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley

(参考訳) 正確なオフポリシー推定器の開発は、新しいポリシーの評価と最適化の両方に不可欠である。オフポリシー推定の主な課題は、データを生成するロギングポリシーと、評価対象であるターゲットポリシーとの間の分布シフトである。通常、分布シフトを補正する技術は、何らかの形の重要度サンプリングを含む。このアプローチは不偏な価値推定をもたらすが、ワンステップのコンテキストバンディットという単純な場合であっても、しばしば高い分散というトレードオフを伴う。さらに、重要度サンプリングは共通サポートの仮定に依存しており、この仮定は行動空間が大きいと非現実的になる。これらの課題に対処するために、我々はPolicy Convolution (PC) と呼ぶ推定器のファミリーを導入する。これらの手法は、行動埋め込みを通じて利用可能な行動内の潜在構造を利用して、ロギングポリシーとターゲットポリシーを戦略的に畳み込む。この畳み込みは、畳み込み量を調整することで制御できる独自のバイアスと分散のトレードオフをもたらす。合成データセットとベンチマークデータセットでの実験により,PCを用いた場合,特に行動空間またはポリシーのミスマッチが大きくなる場合に,平均二乗誤差(MSE)が顕著に改善し,既存の推定器に対して最大5〜6桁の改善が得られることを示した。

Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiased value estimation but often comes with the trade-off of high variance, even in the simpler case of one-step contextual bandits. Furthermore, importance sampling relies on the common support assumption, which becomes impractical when the action space is large. To address these challenges, we introduce the Policy Convolution (PC) family of estimators. These methods leverage latent structure within actions -- made available through action embeddings -- to strategically convolve the logging and target policies. This convolution introduces a unique bias-variance trade-off, which can be controlled by adjusting the amount of convolution. Our experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC, especially when either the action space or policy mismatch becomes large, with gains of up to 5 - 6 orders of magnitude over existing estimators.

翻訳日:2023-10-25 21:01:05 公開日:2023-10-24
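
The bias-variance trade-off described in this abstract can be illustrated with a small sketch. The code below is not the authors' Policy Convolution estimator. It assumes a context-free bandit, one-dimensional action embeddings, and a Gaussian kernel, and the names `ips_estimate` and `convolve_policy` are hypothetical. It only shows how smoothing both policies over similar actions before importance weighting moves the estimate between plain importance sampling (no smoothing) and the logged average reward (heavy smoothing).

```python
import math


def ips_estimate(logged, target_policy, logging_policy):
    """Importance-sampling estimate: mean of reward * pi_target(a) / pi_logging(a)."""
    total = 0.0
    for action, reward in logged:
        total += reward * target_policy[action] / logging_policy[action]
    return total / len(logged)


def convolve_policy(policy, embeddings, bandwidth):
    """Spread each action's probability over nearby actions (Gaussian kernel).

    Assumption for illustration: embeddings are scalars, one per action.
    The result still sums to 1 because each kernel row is normalized.
    """
    n = len(policy)
    smoothed = [0.0] * n
    for a in range(n):
        weights = [
            math.exp(-((embeddings[a] - embeddings[b]) ** 2) / (2 * bandwidth ** 2))
            for b in range(n)
        ]
        norm = sum(weights)
        for b in range(n):
            smoothed[b] += policy[a] * weights[b] / norm
    return smoothed


# Hypothetical logged data: (action, reward) pairs.
logged = [(0, 1.0), (1, 0.0), (2, 0.0), (0, 1.0)]
logging_policy = [0.5, 0.25, 0.25]
target_policy = [0.1, 0.1, 0.8]
embeddings = [0.0, 1.0, 2.0]

for bandwidth in (1e-3, 1.0, 1e6):
    estimate = ips_estimate(
        logged,
        convolve_policy(target_policy, embeddings, bandwidth),
        convolve_policy(logging_policy, embeddings, bandwidth),
    )
    print(bandwidth, estimate)
```

With a very small bandwidth the kernel reduces to the identity and the estimate is ordinary importance sampling. With a very large bandwidth both policies become nearly uniform, the weights approach 1, and the estimate approaches the average logged reward, which has low variance but is biased toward the logging policy.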

# 火をつけてもよいのはどんなときか? defeasibleな社会的・道徳的状況の曖昧性解消のための文脈と根拠の反復的自己蒸留

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations ( http://arxiv.org/abs/2310.15431v1 )

ライセンス: Link先を確認

Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

(参考訳) 道徳的または倫理的な判断は、それが行われる特定の文脈に大きく依存する。覆しうる(defeasible)文脈化、すなわち行動の道徳的受容性を強めたり弱めたりする付加的な情報のさまざまな濃淡を理解することは、現実のシナリオにおける人間の道徳的判断の微妙さと複雑さを正確に表すために重要である。我々は,defeasible moral reasoningというタスクを導入する。これは,ある行動を道徳的により受容可能に,またはより受容しにくくする具体的な文脈と,その推論を正当化する常識的な根拠を提示するタスクである。高品質なタスクデータを引き出すために,GPT-3から得た少量の非構造化シード知識から出発し,(1)学生モデルからの自己蒸留,(2)人間の判断(妥当性向上のため)とNLI(多様性向上のため)によって訓練された批評家モデルによるターゲットフィルタリング,(3)自己模倣学習(望ましいデータ品質を増幅するため)を交互に行う反復的自己蒸留アプローチを採用する。このプロセスにより、妥当性、多様性、defeasibilityが改善されたdefeasibleな文脈を生成する学生モデルが得られる。このモデルから、115Kのdefeasibleな道徳的行動に対する文脈化と根拠の1.2M項目からなる高品質なデータセット δ-Rules-of-Thumb を蒸留する。これらの項目は、人間のアノテータによって85.9%から99.8%の割合で高く評価された。 δ-RoT を用いて、すべての中間学生モデルを顕著な差で上回る最終学生モデルを得る。

Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable, along with commonsense rationales that justify the reasoning. To elicit high-quality task data, we take an iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and then alternates between (1) self-distillation from student models; (2) targeted filtering with a critic model trained by human judgment (to boost validity) and NLI (to boost diversity); (3) self-imitation learning (to amplify the desired data quality). This process yields a student model that produces defeasible contexts with improved validity, diversity, and defeasibility. From this model we distill a high-quality dataset, δ-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time. Using δ-RoT we obtain a final student model that wins over all intermediate student models by a notable margin.

翻訳日:2023-10-25 21:00:45 公開日:2023-10-24

# Beyond Sentiment: 政治的スタンス分類のためのトピックメトリクスを活用する

Beyond Sentiment: Leveraging Topic Metrics for Political Stance Classification ( http://arxiv.org/abs/2310.15429v1 )

ライセンス: Link先を確認

Weihong Qi

(参考訳) 感情分析は、コーパスの全体的なトーンだけを捉えるものとして広く批判されており、テキスト内の潜在構造や政治的スタンスを正確に反映するには不十分である。本研究では,抽出されたトピックから変換されたダミー変数であるトピックメトリクスを,スタンス分類における感情メトリクスの代替および補完として導入する。本研究は,Bestvater and Monroe (2023) が特定した3つのデータセットを用いて,一貫性のあるトピックの抽出におけるBERTopicの能力と,スタンス分類におけるトピックメトリクスの有効性を示す。実験の結果、BERTopicは、以前の政治学研究で一般的であった潜在ディリクレ配分法(LDA)や非負値行列因子分解(NMF)のような従来のアプローチと比較して、コヒーレンススコアを17.07%から54.20%向上させた。さらに,トピックメトリクスはスタンス分類において感情メトリクスを上回り,最大18.95%の性能向上を示した。本研究は,文脈に富んだテキストや,スタンスと感情の相関が弱いコーパスにおいて,トピックメトリクスが特に有効であることを示唆する。感情メトリクスとトピックメトリクスの組み合わせは、ほとんどのシナリオで最適な性能を達成し、感情のみに依存することの限界と、トピックメトリクスのコヒーレンススコアの低さの両方に対処できる。

Sentiment analysis, widely critiqued for capturing merely the overall tone of a corpus, falls short in accurately reflecting the latent structures and political stances within texts. This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and complement to sentiment metrics in stance classification. By employing three datasets identified by Bestvater and Monroe (2023), this study demonstrates BERTopic's proficiency in extracting coherent topics and the effectiveness of topic metrics in stance classification. The experiment results show that BERTopic improves coherence scores by 17.07% to 54.20% when compared to traditional approaches such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), prevalent in earlier political science research. Additionally, our results indicate topic metrics outperform sentiment metrics in stance classification, increasing performance by as much as 18.95%. Our findings suggest topic metrics are especially effective for context-rich texts and corpora where stance and sentiment correlations are weak. The combination of sentiment and topic metrics achieves optimal performance in most of the scenarios and can further address the limitations of relying solely on sentiment as well as the low coherence score of topic metrics.

翻訳日:2023-10-25 21:00:13 公開日:2023-10-24
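
The phrase "dummy variables converted from extracted topics" can be made concrete with a small sketch. It assumes each document has already been assigned a single integer topic id by some topic model and that a scalar sentiment score is available. The function names and sample values are hypothetical, and the paper's actual feature construction may differ.

```python
def topic_dummies(topic_ids, num_topics):
    """Turn per-document topic ids into one-hot (dummy) rows.

    Ids outside 0..num_topics-1 (for example an outlier label) give an all-zero row.
    """
    rows = []
    for topic in topic_ids:
        row = [0] * num_topics
        if 0 <= topic < num_topics:
            row[topic] = 1
        rows.append(row)
    return rows


def combine_features(sentiment_scores, topic_ids, num_topics):
    """Concatenate the sentiment score with the topic dummies for each document."""
    dummies = topic_dummies(topic_ids, num_topics)
    return [[score] + row for score, row in zip(sentiment_scores, dummies)]


# Hypothetical inputs: three documents, three topics.
features = combine_features([0.8, -0.3, 0.1], [2, 0, -1], num_topics=3)
for row in features:
    print(row)
```

Rows built this way can be passed to any standard classifier, which is one way to read the abstract's "combination of sentiment and topic metrics".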

# 意味的混乱補正による連続イベント抽出

Continual Event Extraction with Semantic Confusion Rectification ( http://arxiv.org/abs/2310.15470v1 )

ライセンス: Link先を確認

Zitao Wang and Xinyi Wang and Wei Hu

(参考訳) 本研究では, 忘却を避けつつ, 絶え間なく出現するイベント情報を抽出することを目的とした連続イベント抽出について検討する。イベントタイプに関する意味的混乱は、同じテキストのアノテーションが時間とともに更新されることに由来することを観察する。イベントタイプ間の不均衡は、この問題をさらに悪化させる。本稿では,意味的混乱補正を備えた新しい連続イベント抽出モデルを提案する。意味的混乱を軽減するために各文に擬似ラベルを付与する。イベントタイプの理解を深めるために、現在のモデルと以前のモデルの間で重要な知識を転送する。さらに、他の関連するタイプを活用することで、ロングテールのイベントタイプの意味に焦点を当てるようモデルに促す。実験の結果,本モデルは最先端のベースラインを上回り,不均衡なデータセットでも高い性能を示すことがわかった。

We study continual event extraction, which aims to extract incessantly emerging event information while avoiding forgetting. We observe that the semantic confusion on event types stems from the annotations of the same text being updated over time. The imbalance between event types even aggravates this issue. This paper proposes a novel continual event extraction model with semantic confusion rectification. We mark pseudo labels for each sentence to alleviate semantic confusion. We transfer pivotal knowledge between current and previous models to enhance the understanding of event types. Moreover, we encourage the model to focus on the semantics of long-tailed event types by leveraging other associated types. Experimental results show that our model outperforms state-of-the-art baselines and is proficient in imbalanced datasets.

翻訳日:2023-10-25 20:51:41 公開日:2023-10-24

# Janusインターフェース: 大規模言語モデルの微調整がプライバシリスクをいかに増幅するか

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks ( http://arxiv.org/abs/2310.15469v1 )

ライセンス: Link先を確認

Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, XiaoFeng Wang, Haixu Tang

(参考訳) 2018年以降は、大規模言語モデル(LLM)の出現の時代であり、OpenAIのChatGPTのようなイノベーションが驚異的な言語能力を示した。業界がモデルパラメータの増強と膨大な人間の言語データの活用へと突き進むにつれ、セキュリティとプライバシの課題も浮上した。中でも最も重要なのが、Webベースのデータ取得における個人識別情報(PII: Personally Identifiable Information)の意図しない蓄積の可能性であり、意図しないPII開示のリスクをもたらす。トレーニング中のRLHFや破滅的忘却(Catastrophic Forgetting)といった戦略がプライバシー侵害のリスクを抑えるために用いられてきたが、OpenAIのGPT-3.5向け微調整インターフェースに代表される最近のLLMの進歩は、懸念を再燃させた。 LLMの微調整は、トレーニングデータセットに埋め込まれた個人情報の漏洩を引き起こしうるだろうか? 本稿は,この問いへの答えを求める最初の試み,特にJanus攻撃と呼ぶ新たなLLM悪用経路の発見について報告する。この攻撃では、PIIアソシエーションタスクを構築し、極めて小さなPIIデータセットを用いてLLMを微調整することで、隠蔽されたPIIを復元し明らかにできる可能性がある。以上の結果から, わずかな微調整コストで,GPT-3.5 などの LLM が,PII抽出に対して堅牢な状態から,隠蔽されたPIIのかなりの割合を開示する状態に移行しうることが明らかとなった。この研究は、Janus攻撃ベクトルを深く掘り下げることで、LLMの有用性とプライバシ保護の間の複雑な相互作用に対処する必要性を強調する。

The era post-2018 marked the advent of Large Language Models (LLMs), with innovations such as OpenAI's ChatGPT showcasing prodigious linguistic prowess. As the industry galloped toward augmenting model parameters and capitalizing on vast swaths of human language data, security and privacy challenges also emerged. Foremost among these is the potential inadvertent accrual of Personally Identifiable Information (PII) during web-based data acquisition, posing risks of unintended PII disclosure. While strategies like RLHF during training and Catastrophic Forgetting have been marshaled to control the risk of privacy infringements, recent advancements in LLMs, epitomized by OpenAI's fine-tuning interface for GPT-3.5, have reignited concerns. One may ask: can the fine-tuning of LLMs precipitate the leakage of personal information embedded within training datasets? This paper reports the first endeavor to seek the answer to the question, particularly our discovery of a new LLM exploitation avenue, called the Janus attack. In the attack, one can construct a PII association task, whereby an LLM is fine-tuned using a minuscule PII dataset, to potentially reinstate and reveal concealed PII. Our findings indicate that, with a trivial fine-tuning outlay, LLMs such as GPT-3.5 can transition from being impermeable to PII extraction to a state where they divulge a substantial proportion of concealed PII. This research, through its deep dive into the Janus attack vector, underscores the imperative of navigating the intricate interplay between LLM utility and privacy preservation.

翻訳日:2023-10-25 20:51:29 公開日:2023-10-24

# 再生可能エネルギーシステムにおける分散ソリューションのエンパワーメントとグリッド最適化

Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization ( http://arxiv.org/abs/2310.15468v1 )

ライセンス: Link先を確認

Mohammad Mohammadi and Ali Mohammadi

(参考訳) 本研究では,電力産業における集中型アプローチから分散型アプローチへの移行を掘り下げ,特に機械学習(ML)の進歩が再生可能エネルギー源のエンパワーメントとグリッド管理の改善において重要な役割を担っていることに焦点を当てる。 MLモデルは、人工ニューラルネットワーク、サポートベクターマシン、決定木といった様々な技術を活用し、再生可能エネルギーの生成と消費を予測する上でますます重要になっている。さらに、予測精度を高めるために、データ分割、正規化、分解、離散化などのデータ前処理手法が用いられる。ビッグデータとMLをスマートグリッドに組み込むことは、エネルギー効率の向上、需要に対するより効果的な応答、再生可能エネルギー源のより良い統合など、いくつかの利点をもたらす。それでも、大規模なデータボリュームの処理、サイバーセキュリティの確保、専門知識の獲得といった課題には対処する必要がある。この研究は、太陽エネルギー、風力エネルギー、配電と蓄電の領域における様々なML応用を調査し、エネルギーシステムを最適化する可能性を示している。総じて,この研究は、MLイノベーションと分散意思決定の適用を通じて集中型ソリューションから分散型ソリューションへと移行し、最終的にはより効率的で持続可能なエネルギーの未来を形作る、電力セクターの進化の状況を示す。

This study delves into the shift from centralized to decentralized approaches in the electricity industry, with a particular focus on how machine learning (ML) advancements play a crucial role in empowering renewable energy sources and improving grid management. ML models have become increasingly important in predicting renewable energy generation and consumption, utilizing various techniques like artificial neural networks, support vector machines, and decision trees. Furthermore, data preprocessing methods, such as data splitting, normalization, decomposition, and discretization, are employed to enhance prediction accuracy. The incorporation of big data and ML into smart grids offers several advantages, including heightened energy efficiency, more effective responses to demand, and better integration of renewable energy sources. Nevertheless, challenges like handling large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed. The research investigates various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, illustrating their potential to optimize energy systems. To sum up, this research demonstrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately shaping a more efficient and sustainable energy future.

翻訳日:2023-10-25 20:50:57 公開日:2023-10-24

# EKGNet: 患者内不整脈分類のための10.96 μW完全アナログニューラルネットワーク

EKGNet: A 10.96 μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification ( http://arxiv.org/abs/2310.15466v1 )

ライセンス: Link先を確認

Benyamin Haghi, Lin Ma, Sahin Lale, Anima Anandkumar, Azita Emami

(参考訳) 心電図(ECG)不整脈分類のために,アナログ計算と深層学習を組み合わせた統合的アプローチを提案する。本研究では,低消費電力で高精度を達成する,ハードウェア効率の高い完全アナログ不整脈分類アーキテクチャであるEKGNetを提案する。提案アーキテクチャは、サブスレッショルド領域で動作するトランジスタのエネルギー効率を活用し、アナログ・デジタルコンバータ(ADC)と静的ランダムアクセスメモリ(SRAM)を不要にする。システム設計は、プロセス、供給電圧、温度の変動を緩和する新しいアナログ逐次積和演算(MAC)回路を含む。 PhysioNet の MIT-BIH と PTB Diagnostics データセットでの実験的評価は, 患者内不整脈分類と心筋梗塞(MI)分類でそれぞれ平均平衡精度 95% と 94.25% を達成し, 提案手法の有効性を示した。この革新的なアプローチは、バイオメディカル応用において精度と転用可能性を高めた低消費電力不整脈分類システムを開発するための有望な道を示す。

We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that achieves high accuracy with low power consumption. The proposed architecture leverages the energy efficiency of transistors operating in the subthreshold region, eliminating the need for analog-to-digital converters (ADC) and static random access memory (SRAM). The system design includes a novel analog sequential Multiply-Accumulate (MAC) circuit that mitigates process, supply voltage, and temperature variations. Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed method, achieving average balanced accuracy of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively. This innovative approach presents a promising avenue for developing low-power arrhythmia classification systems with enhanced accuracy and transferability in biomedical applications.

翻訳日:2023-10-25 20:50:33 公開日:2023-10-24
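
The abstract reports "balanced accuracy", a metric that matters when heartbeat classes are heavily imbalanced. The sketch below uses the common definition (the unweighted mean of per-class recall). That the paper uses exactly this definition is an assumption, and the labels `"N"` and `"V"` are made-up placeholders rather than the paper's class set.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        indices = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 8 majority-class beats and 2 minority-class beats.
y_true = ["N"] * 8 + ["V"] * 2
y_pred = ["N"] * 8 + ["N", "V"]  # one minority beat is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 9 of 10 predictions are correct
print(balanced_accuracy(y_true, y_pred))  # recall is 1.0 for "N" and 0.5 for "V"
```

Plain accuracy hides the missed minority-class beat, while balanced accuracy weights both classes equally, which is why it is a common choice for arrhythmia data.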

# ユーザ生成コンテンツにおけるyes-no質問に対する回答の解釈

Interpreting Answers to Yes-No Questions in User-Generated Content ( http://arxiv.org/abs/2310.15464v1 )

ライセンス: Link先を確認

Shivam Mathur, Keun Hee Park, Dhivya Chinnappa, Saketh Kotamraju and Eduardo Blanco

(参考訳) ソーシャルメディアにおけるyes-no質問への回答の解釈は難しい。yes や no というキーワードは稀であり、それらを含む少数の回答も、キーワードが示唆する通りに解釈されるべきことは滅多にない。本稿では,Twitterから収集した4,442件のyes-no質問応答対からなる新しいコーパスを提示する。我々は, 解釈がyesまたはnoである回答と, 解釈が不明である回答の言語的特徴について論じる。大規模言語モデルは、微調整を行い、同じ問題を扱うソーシャルメディア以外の他のコーパスを混合した後でも、この問題を解決するには程遠いことを示す。

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them are rarely to be interpreted as the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.

翻訳日:2023-10-25 20:50:12 公開日:2023-10-24

# 人間と言語モデルのインタラクションによる自己誘導型メンタルヘルス介入の促進: 認知再構成の事例研究

Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring ( http://arxiv.org/abs/2310.15461v1 )

ライセンス: Link先を確認

Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff

(参考訳) 対処戦略を学び実践するための"do-it-yourself"ツールのような自己誘導型のメンタルヘルス介入は、メンタルヘルスケアへのアクセスを改善するうえで大いに有望である。しかし、これらの介入はしばしば認知的な負荷が高く、感情的な引き金となりやすいため、広範な実装と採用を制限するアクセシビリティの障壁を生み出す。本稿では,人間と言語モデルの相互作用が自己誘導型メンタルヘルス介入をどのように支援できるかについて検討する。否定的思考を克服するためのエビデンスに基づく治療手法である認知再構成(cognitive restructuring)をケーススタディとして取り上げる。大規模メンタルヘルスウェブサイト上で15,531人の参加者を対象に実施した,IRB承認済みのランダム化フィールドスタディにおいて、認知再構成の様々なステップを通じて言語モデルを用いて人々を支援するシステムを設計し評価した。その結果,本システムは参加者の67%の感情的強度に良い影響を与え,65%が否定的思考を克服するのに役立つことがわかった。青年は相対的に悪い結果を報告しているが、言語モデルの生成文を単純化する調整された介入により、全体的な有効性と公平性が向上することがわかった。

Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and emotionally triggering, creating accessibility barriers that limit their wide-scale implementation and adoption. In this paper, we study how human-language model interaction can support self-guided mental health interventions. We take cognitive restructuring, an evidence-based therapeutic technique to overcome negative thinking, as a case study. In an IRB-approved randomized field study on a large mental health website with 15,531 participants, we design and evaluate a system that uses language models to support people through various steps of cognitive restructuring. Our findings reveal that our system positively impacts emotional intensity for 67% of participants and helps 65% overcome negative thoughts. Although adolescents report relatively worse outcomes, we find that tailored interventions that simplify language model generations improve overall effectiveness and equity.

翻訳日:2023-10-25 20:50:00 公開日:2023-10-24

# UI文法によるLLMによるUIレイアウト生成

UI Layout Generation with LLMs Guided by UI Grammar ( http://arxiv.org/abs/2310.15455v1 )

ライセンス: Link先を確認

Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li

(参考訳) 近年の大規模言語モデル(LLM)の進歩は、特にモバイルユーザインタフェース(UI)に関するタスクへの応用において、研究者や業界の専門家の間で関心を喚起している。本ポジションペーパーでは,UIレイアウト生成におけるLLMの利用について検討する。調査の中心は,UI画面に固有の階層構造を表現するために我々が提案する新しいアプローチであるUI文法の導入である。このアプローチの目的は, LLMの生成能力をより効果的に導き, プロセスの説明可能性と制御性を向上させることである。 GPT-4で行った初期実験では、LLMがインコンテキスト学習を通じて高品質なユーザインタフェースを生成できる有望な能力を示した。さらに,予備的な比較研究により,特定の側面において生成結果の品質を向上させる文法ベースのアプローチの可能性が示唆された。

The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we propose to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.

翻訳日:2023-10-25 20:49:38 公開日:2023-10-24
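
One way to picture a grammar that captures "the hierarchical structure inherent in UI screens" is to read parent-to-children production rules off a UI element tree. The sketch below is only an illustration under that assumption. The dictionary-based tree format, the element type names, and the functions `extract_rules` and `format_rules` are hypothetical and are not taken from the paper.

```python
def extract_rules(node, rules=None):
    """Collect production rules (parent type -> tuple of child types) from a UI tree."""
    if rules is None:
        rules = {}
    children = node.get("children", [])
    if children:
        right_side = tuple(child["type"] for child in children)
        rules.setdefault(node["type"], set()).add(right_side)
        for child in children:
            extract_rules(child, rules)
    return rules


def format_rules(rules):
    """Render rules as text lines, e.g. for inclusion in an LLM prompt."""
    lines = []
    for left_side in sorted(rules):
        for right_side in sorted(rules[left_side]):
            lines.append(f"{left_side} -> {' '.join(right_side)}")
    return "\n".join(lines)


# Hypothetical screen: a toolbar above a two-item list.
list_item = {"type": "ListItem", "children": [{"type": "Image"}, {"type": "Text"}]}
screen = {
    "type": "Root",
    "children": [
        {"type": "Toolbar", "children": [{"type": "Icon"}, {"type": "Text"}]},
        {"type": "List", "children": [list_item, list_item]},
    ],
}

print(format_rules(extract_rules(screen)))
```

Rules extracted this way from existing screens could be supplied as in-context examples so that generated layouts follow known parent-child patterns. Whether the paper builds or uses its grammar in this manner is not stated in the abstract.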

# 公開特徴を用いたプライベート学習

Private Learning with Public Features ( http://arxiv.org/abs/2310.15454v1 )

ライセンス: Link先を確認

Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

(参考訳) 本研究では,データがプライベート特徴と公開特徴の結合であるプライベート学習問題のクラスについて検討する。これは、レコメンデーションや広告予測のようなプライベートなパーソナライズタスクでよく見られる状況であり、個人に関連する特徴は機微である一方で、アイテム(推薦する映画や曲、またはユーザーに見せる広告)に関連する特徴は公開されており、保護を必要としない。自然な疑問は、公開特徴が存在する場合にプライベートアルゴリズムがより高い有用性を達成できるかどうかである。エンコーダの1つが公開特徴上で動作するマルチエンコーダモデルに対して,肯定的な回答を与える。我々は,(勾配にノイズを加える代わりに)特定の十分統計量だけを保護することで,この分離を活用する新しいアルゴリズムを開発する。本手法は, 線形回帰に対して有用性の向上が保証されており, 重要なことに,2つの標準的なプライベートレコメンデーションベンチマークにおいて最先端の性能を達成し, プライベート特徴と公開特徴の分離に適応する手法の重要性を実証する。

We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.

翻訳日:2023-10-25 20:49:25 公開日:2023-10-24

# 因果表現学習における一般識別性と達成可能性

General Identifiability and Achievability for Causal Representation Learning ( http://arxiv.org/abs/2310.15450v1 )

ライセンス: Link先を確認

Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Ali Tajer

(参考訳) 本稿では、一般的な非パラメトリック潜在因果モデルと、潜在データを観測データにマッピングする一般的な変換モデルの下での因果表現学習(CRL)に焦点を当てる。潜在因果グラフのノードごとに2つのハードな非結合(uncoupled)介入を用いて、識別可能性(identifiability)と達成可能性(achievability)の結果を確立する。特に、どの介入環境の対が同じノードに介入しているかは未知である(したがって非結合な環境である)。識別可能性については、非結合な介入の下で潜在因果モデルと潜在変数の完全な復元が保証されることを示す。達成可能性については、観測データと介入データを使用し、証明可能な保証付きで潜在因果モデルと潜在変数を復元するアルゴリズムを設計する。このアルゴリズムは、異なる環境間でのスコアの変動を利用して、変換の逆写像を推定し、続いて潜在変数を推定する。さらに、この解析は、2つのハードな結合(coupled)介入、すなわち同じノードに介入した環境の対に関するメタデータが既知である場合についての既存の識別可能性の結果も再現する。非パラメトリック識別可能性に関する既存の結果は、介入に関する仮定と追加の忠実性の仮定を必要とする点に注意されたい。本稿では、観測データが利用可能である場合、追加の忠実性の仮定は不要であることを示す。

This paper focuses on causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model that maps the latent data to the observational data. It establishes identifiability and achievability results using two hard uncoupled interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments have the same node intervened (hence, uncoupled environments). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees for the algorithm. This algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables. The analysis, additionally, recovers the existing identifiability result for two hard coupled interventions, that is when metadata about the pair of environments that have the same node intervened is known. It is noteworthy that the existing results on nonparametric identifiability require assumptions on interventions and additional faithfulness assumptions. This paper shows that when observational data is available, additional faithfulness assumptions are unnecessary.

翻訳日:2023-10-25 20:49:07 公開日:2023-10-24

# 確率的非凸-凹ミニマックス問題に対する加速一階正則化運動量降下上昇アルゴリズム

An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems ( http://arxiv.org/abs/2310.15448v1 )

ライセンス: Link先を確認

Huiling Zhang and Zi Xu

(参考訳) 確率的非凸ミニマックス問題は、近年、機械学習、信号処理など多くの分野で広く注目されている。本稿では,確率的非凸-凹ミニマックス問題を解くための加速一階正則化運動量降下上昇アルゴリズム(FORMDA)を提案する。このアルゴリズムが$\varepsilon$-定常点を得るための反復複雑性は$\tilde{\mathcal{O}}(\varepsilon ^{-6.5})$であることが証明され、これは目的関数の定常性の下で確率的非凸-凹ミニマックス問題を解くシングルループアルゴリズムとして最もよく知られた複雑性の上界を達成する。

Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be $\tilde{\mathcal{O}}(\varepsilon ^{-6.5})$ to obtain an $\varepsilon$-stationary point, which achieves the best-known complexity bound for single-loop algorithms to solve the stochastic nonconvex-concave minimax problems under the stationarity of the objective function.

翻訳日:2023-10-25 20:48:40 公開日:2023-10-24

# The Quantum Tortoise and the Classical Hare: 量子コンピューティングがどの問題を加速するか(そしてどの問題を加速しないか)を理解するためのシンプルなフレームワーク

The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not) ( http://arxiv.org/abs/2310.15505v1 )

ライセンス: Link先を確認

Sukwoong Choi, William S. Moses, Neil Thompson

(参考訳) 量子コンピューティングは、一部の問題の解決には変革的な利益を約束するが、他の問題にはほとんど、あるいは全く利益をもたらさない。量子コンピュータを現在または将来使いたい人にとって、どの問題が恩恵を受けるかを知ることは重要である。本稿では,この問いに直感的かつ定量的に答えるための枠組みを提案する。フレームワークの基盤となる構造は量子コンピュータと古典コンピュータの競争であり、両者の相対的な強みがそれぞれの勝つ場面を決定する。古典コンピュータはより高速に動作するが、量子コンピュータはより効率的なアルゴリズムを実行できる場合がある。速度の優位性とアルゴリズムの優位性のどちらが支配的かによって、ある問題が量子コンピューティングの恩恵を受けるかどうかが決まる。我々の分析によると、多くの問題、特に典型的なビジネスにとって重要となりうる小規模から中規模の問題は、量子コンピューティングの恩恵を受けない。逆に、より大きな問題や特に大きなアルゴリズム的利得を持つ問題は、近い将来の量子コンピューティングの恩恵を受ける。非常に大きなアルゴリズム的利得は実際には稀であり、原理的にも稀であると理論化されているため、量子コンピューティングの利点は、こうした稀なケースのユーザか、非常に大きなデータを処理する実践者のいずれかにもたらされることを示唆する。

Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases, or practitioners processing very large data.

翻訳日:2023-10-25 20:44:11 公開日:2023-10-24
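
The race between a fast classical machine and an algorithmically more efficient quantum machine can be sketched with a back-of-the-envelope calculation. The model below is an assumption made for illustration, not the paper's framework: runtime is taken to be `n ** exponent / ops_per_second`, constant factors and error-correction overheads are ignored, and the speeds are invented numbers.

```python
def runtime(n, exponent, ops_per_second):
    """Seconds to solve a size-n problem under the toy model n**exponent / speed."""
    return n ** exponent / ops_per_second


def crossover_size(classical_exp, quantum_exp, classical_speed, quantum_speed):
    """Problem size at which both machines take the same time.

    Solving n**a / c = n**b / q for n gives n = (c / q) ** (1 / (a - b)).
    Returns None when the quantum algorithm has no better exponent.
    """
    if quantum_exp >= classical_exp:
        return None
    return (classical_speed / quantum_speed) ** (1.0 / (classical_exp - quantum_exp))


# Hypothetical numbers: a linear-time classical scan at 1e9 ops/s versus a
# square-root-time quantum search at 1e6 ops/s.
n_star = crossover_size(1.0, 0.5, 1e9, 1e6)
print(n_star)
print(runtime(n_star, 1.0, 1e9), runtime(n_star, 0.5, 1e6))
```

Below the crossover size the faster classical hardware wins, and above it the better exponent wins. A larger speed gap or a smaller algorithmic gain pushes the crossover toward larger problems, which is consistent with the abstract's point that small and moderate problems are unlikely to benefit.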

# 合成シーングラフからのクロスビュー自己ローカライゼーション

Cross-view Self-localization from Synthesized Scene-graphs ( http://arxiv.org/abs/2310.15504v1 )

ライセンス: Link先を確認

Ryogo Yamamoto, Kanji Tanaka

(参考訳) クロスビューの自己ローカライゼーションは、データベース画像がスパースな視点から提供される、視覚的場所認識の難しいシナリオである。近年,NeRF(Neural Radiance Fields)技術を用いて未知の視点からデータベース画像を合成する手法が登場し,優れた性能を示している。しかし,これらの手法により得られる合成画像は原画像よりも品質が低いことが多く,さらにデータベースの保存コストを著しく増加させる。本研究では、生画像から計算したビュー不変な外観特徴と、合成画像から計算したビュー依存の空間意味特徴の利点を組み合わせた、新しいハイブリッドシーンモデルを探求する。これら2種類の特徴はシーングラフに融合され、グラフニューラルネットワークによって圧縮的に学習され認識される。提案手法の有効性は,フォトリアリスティックなHabitatシミュレータを用いて生成した多数の未知ビューを含む新しいクロスビュー自己ローカライゼーションデータセットを用いて検証した。

Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. Recently, an approach for synthesizing database images from unseen viewpoints using NeRF (Neural Radiance Fields) technology has emerged with impressive performance. However, synthesized images provided by these techniques are often of lower quality than the original images, and furthermore they significantly increase the storage cost of the database. In this study, we explore a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images. These two types of features are then fused into scene graphs, and compressively learned and recognized by a graph neural network. The effectiveness of the proposed method was verified using a novel cross-view self-localization dataset with many unseen views generated using a photorealistic Habitat simulator.

翻訳日:2023-10-25 20:43:52 公開日:2023-10-24

# TRAMS:長距離言語モデリングのためのトレーニング不要メモリ選択

TRAMS: Training-free Memory Selection for Long-range Language Modeling ( http://arxiv.org/abs/2310.15494v1 )

ライセンス: Link先を確認

Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi

(参考訳) トランスフォーマーアーキテクチャは多くのAIモデルにとって不可欠であるが、長距離言語モデリングでは依然として課題に直面している。長距離依存の問題に対処するために、いくつかの特定のトランスフォーマーアーキテクチャが設計されているが、Transformer-XLのような既存の手法は、高い割合の非効果的なメモリに悩まされている。本研究では、1つの単純な指標に基づいて注意計算に参加するトークンを選択する「TRAining-free Memory Selection (TRAMS)」と呼ばれるプラグ・アンド・プレイ戦略を提案する。この戦略により、現在のクエリに対して高い注意スコアを持つ可能性の高いトークンを保持し、他のトークンを無視することができる。我々は、単語レベルのベンチマーク(WikiText-103)と文字レベルのベンチマーク(enwik8)でこのアプローチをテストし、その結果は、追加のトレーニングやパラメータの追加なしに改善が得られることを示している。

The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several specific transformer architectures have been designed to tackle issues of long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and ignore the other ones. We have tested our approach on the word-level benchmark (WikiText-103) and the character-level benchmark (enwik8), and the results indicate an improvement without having additional training or adding additional parameters.
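
The selection step can be pictured as a training-free top-k filter over cached memory tokens. The sketch below is only an illustration: the abstract does not state which metric TRAMS uses, so the key-vector L2 norm used here is a placeholder assumption, and `select_memory` is a hypothetical helper name.

```python
import math

def l2_norm(vec):
    """Placeholder scoring metric (an assumption, not the paper's metric)."""
    return math.sqrt(sum(x * x for x in vec))

def select_memory(keys, keep, metric=l2_norm):
    """Return indices of the `keep` highest-scoring memory keys, in original order.

    keys: list of key vectors (lists of floats) cached from earlier segments.
    No parameters are trained; tokens are simply ranked by `metric`.
    """
    ranked = sorted(range(len(keys)), key=lambda i: metric(keys[i]), reverse=True)
    return sorted(ranked[:keep])

memory_keys = [[0.1, 0.0], [3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
kept = select_memory(memory_keys, keep=2)
print(kept)  # indices of the tokens that would take part in attention
```

Only the kept indices would be passed on to the attention computation; the rest of the memory is ignored, which is the behaviour the abstract describes.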

翻訳日:2023-10-25 20:43:38 公開日:2023-10-24

# 統合オンライントップK推薦のためのロバスト表現学習

Robust Representation Learning for Unified Online Top-K Recommendation ( http://arxiv.org/abs/2310.15492v1 )

ライセンス: Link先を確認

Minfang Lu, Yuchen Jiang, Huihui Dong, Qi Li, Ziru Xu, Yuanlin Liu, Lixia Wu, Haoyuan Hu, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

(参考訳) 大規模産業eコマースにおいて、オンラインレコメンデーションシステムの効率性は、さまざまなビジネスシナリオに対応する、非常に関連性の高いアイテム/コンテンツ広告を提供する上で重要である。しかし、既存の研究のほとんどはアイテム広告のみに焦点を当てており、コンテンツ広告の重要性を無視している。この見落としは、マルチエンティティ構造内の不整合と不公平な検索をもたらす。さらに、異なるドメインにまたがるマルチエンティティ広告からトップk広告を取得するという課題は、複雑さを増す。近年の研究では、異なるドメイン内のユーザエンティティの挙動が、分化と均質性の特徴を示すことが示されている。したがって、マルチドメインマッチングモデルは通常、ドメイン不変およびドメイン固有表現を持つハイブリッド専門家フレームワークに依存している。残念なことに、ほとんどのアプローチは、主に異なる専門家の組み合わせ方の最適化に注力しており、専門家モジュール自体の最適化に固有の困難に対処できていない。異なるドメインにまたがる冗長な情報の存在は、専門家間の干渉と競争をもたらし、一方、各ドメインの異なる学習目標が専門家間で異なる最適化の課題をもたらす。そこで本研究では,統一型オンライントップkレコメンデーションのためのロバスト表現学習を提案する。提案手法は,データの公平性を保証するため,エンティティ空間における統一モデリングを構築する。ロバスト表現学習は、ドメイン敵対学習とマルチビューWasserstein分布学習を用いてロバスト表現を学習する。さらに,提案手法は,等分散不確実性重み (homoscedastic uncertainty weights) と直交性制約によって相反する目的のバランスをとる。提案手法の有効性と合理性は様々な実験によって検証されており、実際のビジネスシナリオに対応するためオンラインへのデプロイにも成功している。

In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial in delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research proves that user-entity behaviors within different domains exhibit characteristics of differentiation and homogeneity. Therefore, the multi-domain matching models typically rely on the hybrid-experts framework with domain-invariant and domain-specific representations. Unfortunately, most approaches primarily focus on optimizing the combination mode of different experts, failing to address the inherent difficulty in optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through the homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios.

翻訳日:2023-10-25 20:43:20 公開日:2023-10-24

# NuTrea: コンテキスト誘導型マルチホップKGQAのためのニューラルツリー検索

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA ( http://arxiv.org/abs/2310.15484v1 )

ライセンス: Link先を確認

Hyeong Kyu Choi and Seunghun Lee and Jaewon Chu and Hyunwoo J. Kim

(参考訳) マルチホップ知識グラフ質問応答(Multi-hop Knowledge Graph Question Answering, KGQA)は、知識グラフ(KG)からノードを取得して自然言語の質問に答えるタスクである。最近のGNNベースのアプローチでは、メッセージをシードノードから回答ノードへ順次伝播するKGパス探索問題としてこのタスクを定式化している。しかし、これらのメッセージは過去指向であり、KG全体のコンテキストを考慮しない。さらに悪いことに、KGノードは固有名詞エンティティを表すことが多く、時には暗号化されているため、経路間の選択に役立つ情報を持たない。これらの問題に対処するために,より広いKGコンテキストを取り入れた木探索ベースのGNNモデルであるNeural Tree Search (NuTrea)を提案する。我々のモデルは、未到達のサブツリー領域を探索して過去指向の埋め込みを強化するメッセージパッシングスキームを採用している。さらに,グローバルなKGコンテキストを考慮したRF-IEF(Relation Frequency-Inverse Entity Frequency)ノード埋め込みを導入し,曖昧なKGノードをよりよく特徴付ける。提案手法の全般的な有効性は,3つの主要なマルチホップKGQAベンチマークデータセットでの実験により実証され,広範な分析によりその表現力と頑健性がさらに検証された。全体として、NuTreaは複雑な自然言語の質問でKGに問い合わせる強力な手段を提供する。コードはhttps://github.com/mlvlab/NuTreaで入手できる。

Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.

翻訳日:2023-10-25 20:42:37 公開日:2023-10-24

# RGB-Dビデオにおける顕著物体検出

Salient Object Detection in RGB-D Videos ( http://arxiv.org/abs/2310.15482v1 )

ライセンス: Link先を確認

Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao

(参考訳) 奥行き検知装置の普及に伴い、RGB-Dビデオや関連データ/メディアは日常生活の様々な面で大きな注目を集めている。その結果、RGB-Dビデオにおける顕著物体検出(SOD)は、非常に有望で発展途上の研究の方向性となっている。この領域の可能性にもかかわらず、RGB-D SODとビデオSOD(VSOD)は伝統的に独立して研究されており、RGB-DビデオにおけるSODはいまだ十分に探索されていない。この新興分野を探求するため,本稿では,データセットとモデルという2つの主要な貢献を行う。まず,現実的な深度を持つ新しいRGB-D VSODデータセットであるRDVSデータセットを構築する。このデータセットは,シーンの多様性とフレーム単位の厳密なアノテーションを特徴とする。包括的な属性分析とオブジェクト指向分析を用いてデータセットを検証し、トレーニングとテストの分割を提供する。さらに、RGB-D VSODに適した3ストリームネットワークであるDCTNet+を導入する。DCTNet+はRGBモダリティを重視し、深度とオプティカルフローを補助モダリティとして扱う。正確な最終予測に向けた効果的な特徴の強化、洗練、融合のために,マルチモーダル注意モジュール (MAM) と洗練融合モジュール (RFM) の2つのモジュールを提案する。 RFM内での相互作用と融合を強化するため、我々はUIM(Universal Interaction Module)を設計し、RFMに到達する前にマルチモーダルな低レベル特徴を洗練するための全体的マルチモーダル注意経路(HMAP)を統合する。 RDVSと共に擬似RGB-Dビデオデータセットを用いて総合実験を行い、DCTNet+が17のVSODモデルと14のRGB-D SODモデルよりも優れていることを示した。擬似的および現実的なRGB-Dビデオデータセット上でアブレーション実験を行い、個々のモジュールの利点と現実的な深度を導入する必要性を実証した。私たちのコードとRDVSデータセットはhttps://github.com/kerenfu/RDVS/で公開予定である。

Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: the dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth and characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD, with an emphasis on RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before reaching RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our RDVS, highlight the superiority of DCTNet+ over 17 VSOD models and 14 RGB-D SOD models. Ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth. Our code together with RDVS dataset will be available at https://github.com/kerenfu/RDVS/.

翻訳日:2023-10-25 20:41:51 公開日:2023-10-24

# AutoDiff: 表データ合成のためのオートエンコーダと拡散モデルを組み合わせる

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing ( http://arxiv.org/abs/2310.15479v1 )

ライセンス: Link先を確認

Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

(参考訳) 拡散モデルは、コンピュータビジョン、言語モデル、音声合成を含む現代の機械学習の多くのサブフィールドにおいて、合成データ生成の主要なパラダイムとなっている。本稿では,合成表データを生成するために拡散モデルのパワーを利用する。表データの異質な特徴は表データ合成における主な障害であり,オートエンコーダアーキテクチャを用いてこの問題に対処している。最先端の表データシンセサイザーと比較すると,本モデルから得られた合成表は,実データに対して良好な統計的忠実度を示し,機械学習ユーティリティの下流タスクにおいて良好に機能する。我々は15の公開データセットに対して実験を行った。特に,本モデルは,表データ合成における長年の課題である特徴間の相関関係を良好に捉えている。私たちのコードは要求に応じて入手でき、論文が採択されれば公開される。

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available upon request and will be publicly released if paper is accepted.

翻訳日:2023-10-25 20:40:47 公開日:2023-10-24

# CRaSh: クラスタリング、削除、共有による、完全な大規模言語モデルを用いないファインチューニングの強化

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model ( http://arxiv.org/abs/2310.15477v1 )

ライセンス: Link先を確認

Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou

(参考訳) 近年,大規模言語モデル(LLM)をアライメントし,様々なタスクにまたがる一般化能力を高めるための効果的な手法として,命令チューニングが認識されている。しかし、公開アクセス可能な集中型LLMをプライベートな命令データでチューニングする場合、プライバシー上の懸念は避けられない。モデル間でパラメータ化モジュールを直接転送することは、この問題に対処するための有望なアプローチであるが、その意味と有効性はさらなる探索が必要である。本稿では,集中型LLMと下流エミュレータ間でトランスフォーマーブロックを転送する代表的な技術であるOffsite-Tuning(OFT)に焦点を当てる。OFTの基礎となるメカニズムの理解が限られていることを踏まえ,表現と機能的類似性の観点からLLMに関する経験的分析を行う。興味深いことに、モデルのサイズが拡大するにつれて、LLMの層内にユニークなモジュラー構造が現れるように見える。同時に、レイヤ間の表現と中間予測における微妙だが潜在的に重要な変化にも注目する。これらの観測に着想を得て、LLMから改善されたエミュレータを導出するトレーニング不要の戦略であり、Clustering、Removing、SharingからなるCRaShを提案する。 CRaShは数十億のパラメータを持つOFTのパフォーマンスを大幅に向上させる。さらに,フルモデルの有無それぞれでの微調整によって得られる最適解を,損失ランドスケープの観点から検討した。以上の結果から,同じ盆地に位置するこれらの最適解の間に線形接続性があることが示され,CRaShとOFTの有効性が強調された。ソースコードはhttps://github.com/TsinghuaC3I/CRaShで公開されている。

Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without full model through the lens of loss landscape. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.

翻訳日:2023-10-25 20:40:32 公開日:2023-10-24

# 幾何コヒーレンスのトレードオフ関係

Trade-off relations of geometric coherence ( http://arxiv.org/abs/2310.15476v1 )

ライセンス: Link先を確認

Bingyu Hu and Ming-Jing Zhao

(参考訳) 量子コヒーレンスは重要な量子資源であり、様々な研究分野と密接に関連している。幾何コヒーレンス(geometric coherence)は、操作的にも幾何学的にも意味を持つコヒーレンス尺度である。量子ビット系における幾何コヒーレンスのトレードオフ関係について検討する。まず、量子状態の純度によって幾何コヒーレンスの上界を導出する。これに基づき、量子コヒーレンスと混合性との相補性関係が確立される。次に、2つおよび3つの一般測定基底上の幾何コヒーレンスの量子不確定性関係を、それぞれ非両立性の観点から導出し、これらが純粋状態に対しては状態に依存しないことを示す。これらのトレードオフ関係は、量子コヒーレンスの量に制限を与える。副産物として、純粋状態アンサンブルを識別する最小誤り確率と量子状態の混合性との相補性関係が確立される。

Quantum coherence is an important quantum resource and it is intimately related to various research fields. The geometric coherence is a coherence measure both operationally and geometrically. We study the trade-off relation of geometric coherence in qubit systems. We first derive an upper bound for the geometric coherence by the purity of quantum states. Based on this, a complementarity relation between the quantum coherence and the mixedness is established. We then derive the quantum uncertainty relations of the geometric coherence on two and three general measurement bases in terms of the incompatibility respectively, which turn out to be state-independent for pure states. These trade-off relations provide the limit to the amount of quantum coherence. As a byproduct, the complementarity relation between the minimum error probability for discriminating a pure-states ensemble and the mixedness of quantum states is established.

翻訳日:2023-10-25 20:40:07 公開日:2023-10-24

# 心不全リスク予測のための解釈型生存分析

Interpretable Survival Analysis for Heart Failure Risk Prediction ( http://arxiv.org/abs/2310.15472v1 )

ライセンス: Link先を確認

Mike Van Ness, Tomas Bosschieter, Natasha Din, Andrew Ambrosy, Alexander Sandhu, Madeleine Udell

(参考訳) 生存分析(Survival analysis)、すなわちイベントまでの時間の分析は、医療研究において重要かつ広範な問題である。医学研究は伝統的に、その単純さと解釈可能性から、生存分析にCoxモデルを利用してきた。 Coxモデルは、対数線形ハザード関数と時間を通じた比例ハザードを仮定しており、これらの仮定が成り立たない場合は性能が低下しうる。機械学習に基づく新しい生存モデルは、これらの仮定を回避し、精度の向上を提供するが、時には、臨床利用に不可欠なモデル解釈可能性が犠牲になる。我々は、解釈可能であり、かつ最先端の生存モデルと競合する新しい生存分析パイプラインを提案する。具体的には,サバイバル・スタッキングの改良版を用いて生存分析問題を分類問題に変換し,ControlBurnを用いて特徴選択を行い,Explainable Boosting Machinesを用いて解釈可能な予測を生成する。パイプラインを評価するため,大規模なEHRデータベースを用いて心不全のリスクを予測する。我々のパイプラインは最先端のパフォーマンスを達成し、心不全のリスク要因に関する興味深い新しい洞察を提供する。

Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
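
The first pipeline step, turning survival data into a classification problem, can be sketched as follows. This shows plain survival stacking under simplifying assumptions (discrete event times, one row per subject per event time while the subject is still at risk); the paper uses an improved variant whose details are not given in the abstract, and `survival_stack` is a hypothetical helper name.

```python
def survival_stack(records):
    """Expand survival records into binary-classification rows.

    records: list of (features, time, event), where event is 1 if the
             outcome was observed at `time` and 0 if the subject was censored.
    Returns a list of (features, t, label): one row for each observed event
    time t at which the subject is still at risk (t <= time). The label is 1
    only if this subject's own event happened at t.
    """
    event_times = sorted({t for _, t, e in records if e == 1})
    rows = []
    for features, time, event in records:
        for t in event_times:
            if t > time:
                break  # subject has left the risk set
            label = 1 if (event == 1 and t == time) else 0
            rows.append((features, t, label))
    return rows

patients = [((1.0,), 2, 1), ((2.0,), 3, 0), ((3.0,), 5, 1)]
for row in survival_stack(patients):
    print(row)
```

Any binary classifier trained on the stacked rows (with `t` as an extra input) then estimates a discrete-time hazard, which is what lets an interpretable classifier stand in for a survival model.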

翻訳日:2023-10-25 20:39:50 公開日:2023-10-24

# SteloCoder: Pythonコードへの多言語翻訳のためのデコーダ専用LLM

SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation ( http://arxiv.org/abs/2310.15539v1 )

ライセンス: Link先を確認

Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant

(参考訳) 近年のLarge Language Models (LLMs) への注目の中で、StarCoder (Li et al., 2023) と Code Llama (Rozière et al., 2023) はいずれもコード生成において顕著なパフォーマンスを示している。しかし、効率的なトレーニング技術によるコード翻訳機能の改善はいまだに必要である。これに対応するために,複数のプログラミング言語からPythonコードへの翻訳用に設計された,デコーダ専用のStarCoderベースのLLMであるSteloCoderを紹介する。特にSteloCoderは、入力プログラミング言語を指定せずに、C++、C#、JavaScript、Java、PHPからPythonへのコード変換を実現している。我々は,5つのエキスパートとマルチタスク処理のためのゲーティングネットワークを備えたMixture-of-Experts (MoE)技術を組み込むことで、StarCoderモデルアーキテクチャを改良した。エキスパートはStarCoderの微調整によって得られる。具体的には,低ランク適応手法(LoRA)を用いて、各エキスパートのサイズをStarCoderのパラメータ数のわずか0.06%に制限する。同時に、時間面での学習効率を向上させるため、カリキュラム学習戦略を採用し、self-instructデータを用いて効率的な微調整を行う。その結果、各エキスパートは1つの80GB A100 HBMでトレーニングするのにわずか6時間しかかからない。 XLCoSTデータセットの実験により、SteloCoderは、複数のプログラミング言語からPythonへの翻訳において平均73.76のCodeBLEUスコアを達成し、リーダーボードの最高パフォーマンスを少なくとも3.5上回った。この成果は、StarCoderをバックボーンとして、わずか4500万の追加パラメータと、1つの80GB A100 HBMでの32時間の有効なトレーニングによって得られたものである。ソースコードはhttps://github.com/sade-adrien/SteloCoderで公開されている。

With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozière et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert size as only 0.06% of number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on one single 80GB A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average of 73.76 CodeBLEU score in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is released here: https://github.com/sade-adrien/SteloCoder.
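
The parameter figures in the abstract can be sanity-checked with simple arithmetic. The sketch below assumes a base model of about 15.5B parameters (a figure not given in the abstract) and uses the standard LoRA count of `rank * (d_in + d_out)` extra weights per adapted matrix; the layer dimensions are hypothetical.

```python
def lora_params(d_in, d_out, rank):
    """Extra weights LoRA adds to one d_in x d_out matrix: A is d_in x r, B is r x d_out."""
    return rank * (d_in + d_out)

def expert_budget(base_params, fraction):
    """Parameter budget for one expert as a fraction of the base model."""
    return base_params * fraction

BASE_PARAMS = 15.5e9       # assumed base-model size, not stated in the abstract
PER_EXPERT_FRACTION = 0.0006  # 0.06%, as stated in the abstract
NUM_EXPERTS = 5            # as stated in the abstract

per_expert = expert_budget(BASE_PARAMS, PER_EXPERT_FRACTION)
print(f"per expert: {per_expert / 1e6:.1f}M, all experts: {NUM_EXPERTS * per_expert / 1e6:.1f}M")
print("one hypothetical 1024x1024 layer at rank 8:", lora_params(1024, 1024, 8))
```

Under the assumed base size, five experts land in the same range as the roughly 45M extra parameters the abstract reports; the exact total depends on the true base size and on how 0.06% was rounded.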

翻訳日:2023-10-25 20:32:48 公開日:2023-10-24

# 協調サンプル選択とコントラスト半教師あり学習を用いた雑音ラベルによる学習

Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning ( http://arxiv.org/abs/2310.15533v1 )

ライセンス: Link先を確認

Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, Wangmeng Zuo, Yiwen Guo, Zhaopeng Meng

(参考訳) ノイズラベルを用いた学習(LNL)は広く研究されており、既存のアプローチでは、クリーンサンプルの選択と半教師付き学習(SSL)を交互に行うフレームワークが一般的である。しかし、このアプローチには制限がある。自己学習によって訓練されたDeep Neural Network (DNN)分類器が選択するクリーンセットは、必然的にノイズの多いサンプルを含んでしまう。クリーンなサンプルとノイズの多いサンプルの混合は、SSL中のDNNトレーニングを誤った方向に導き、サンプル選択における誤りの蓄積に起因する確認バイアスによって一般化性能を損なう。この問題に対処するために,大規模事前学習モデルCLIPを活用した協調サンプル選択法(Collaborative Sample Selection, CSS)を提案する。 CSSは、特定されたクリーンセットから、混入したノイズサンプルを除去することを目的としている。我々は,CLIPの確率とDNN分類器の予測を組み合わせた2次元ガウス混合モデル (2D-GMM) を訓練することにより,これを実現する。また,CLIPのLNLへの適応性をさらに高めるために,半教師付き学習における対照的な損失を伴う協調学習機構を導入する。これにより、CLIPのプロンプトとDNN分類器を共同でトレーニングし、特徴表現の改善、DNNの分類性能の向上、協調サンプル選択に対する相互利益をもたらすことができる。 CLIPからの補助情報を取り入れ、プロンプトの微調整を活用することにより、クリーンセットからノイズサンプルを効果的に除去し、トレーニング中の確認バイアスを軽減する。複数のベンチマークデータセットに対する実験結果は,最先端手法と比較した提案手法の有効性を示している。

Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean and noisy samples leads to misguidance in DNN training during SSL, resulting in impaired generalization performance due to confirmation bias caused by error accumulation in sample selection. To address this issue, we propose a method called Collaborative Sample Selection (CSS), which leverages the large-scale pre-trained model CLIP. CSS aims to remove the mixed noisy samples from the identified clean set. We achieve this by training a 2-Dimensional Gaussian Mixture Model (2D-GMM) that combines the probabilities from CLIP with the predictions from the DNN classifier. To further enhance the adaptation of CLIP to LNL, we introduce a co-training mechanism with a contrastive loss in semi-supervised learning. This allows us to jointly train the prompt of CLIP and the DNN classifier, resulting in improved feature representation, boosted classification performance of DNNs, and reciprocal benefits to our Collaborative Sample Selection. By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed method in comparison with the state-of-the-art approaches.

翻訳日:2023-10-25 20:32:10 公開日:2023-10-24

# マトリックス機構のプライバシ増幅

Privacy Amplification for Matrix Mechanisms ( http://arxiv.org/abs/2310.15526v1 )

ライセンス: Link先を確認

Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

(参考訳) プライバシーの増幅はデータ選択のランダム性を利用して、より厳密な差分プライバシー(DP)保証を提供する。この分析は、DP-SGDが機械学習で成功した鍵であるが、より新しい最先端のアルゴリズムには容易に適用できない。これは、DP-FTRLとして知られるこれらのアルゴリズムが、DP-SGDのような独立ノイズの代わりに、行列機構を用いて相関ノイズを加えるためである。本稿では,任意の汎用行列機構に対して、サンプリングによるプライバシ増幅を解析する最初のアルゴリズムである「MMCC」を提案する。 MMCCは、$\epsilon\to0$ において下界に近づくという意味で、ほぼタイトである。 MMCCにおける相関出力を解析するために,先行出力で条件付けることで,それらが独立であるかのように解析できることを証明する。我々の「条件付き合成定理」は幅広い有用性を持ち、これを用いて、二分木DP-FTRLに付加される雑音が、増幅を伴うDP-SGDに付加される雑音と漸近的に一致しうることを示す。また,我々の増幅アルゴリズムは実用上も有用であり,標準ベンチマーク上でDP-FTRLアルゴリズムのプライバシとユーティリティのトレードオフを大幅に改善することを示した。

Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
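
For intuition about what amplification buys, the classical result for independent-noise mechanisms is easy to compute: running an ε-DP mechanism on a Poisson subsample with rate q yields log(1 + q(e^ε − 1))-DP. The sketch below evaluates only this textbook bound; it is not the MMCC algorithm, which handles the correlated-noise case that this formula does not cover.

```python
import math

def amplified_epsilon(eps, q):
    """Pure-DP amplification by Poisson subsampling (independent-noise setting).

    eps: privacy parameter of the base mechanism.
    q:   probability that each record is included in the subsample.
    """
    return math.log1p(q * math.expm1(eps))

for q in (1.0, 0.1, 0.01):
    print(f"q={q}: eps'={amplified_epsilon(1.0, q):.4f}")
```

With q = 1 nothing is gained, and for small q the effective ε shrinks roughly in proportion to q, which is the behaviour DP-SGD analyses rely on.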

翻訳日:2023-10-25 20:31:43 公開日:2023-10-24

# 離散ノイズ除去拡散モデルの固有のプライバシー特性について

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models ( http://arxiv.org/abs/2310.15524v1 )

ライセンス: Link先を確認

Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li

(参考訳) プライバシーに関する懸念から、合成データセットの作成が急増しており、拡散モデルが有望な手段として台頭している。先行研究はこれらのモデルに対して経験的評価を行ってきたが、そのプライバシ保護能力を数学的に特徴付けるという点ではギャップがある。そこで本研究では,離散データセット生成のための離散拡散モデル(DDM)に固有のプライバシ保護について、先駆的な理論的検討を行う。インスタンス毎の差分プライバシー(pDP)に着目した我々のフレームワークは、与えられたトレーニングデータセットの各データポイントの潜在的なプライバシー漏洩を解明し、DDMによる合成データセット生成のプライバシーリスクを低減するためのデータ前処理に関する洞察を提供する。また、我々の境界は、$s$サイズのデータポイントによるトレーニングが、純粋なノイズから合成クリーンデータフェーズへの移行時に、$(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDPから$(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDPへのプライバシ漏洩の急増をもたらすこと、および拡散係数のより速い減衰がプライバシの保証を増幅することを示している。最後に,合成データと実世界のデータの両方について理論的知見を実証的に検証する。

Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.

翻訳日:2023-10-25 20:31:23 公開日:2023-10-24

# グラフ自己教師付き学習のための生成的および対比的パラダイム

Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning ( http://arxiv.org/abs/2310.15523v1 )

ライセンス: Link先を確認

Yuxiang Wang, Xiao Yan, Chuang Hu, Fangcheng Fu, Wentao Zhang, Hao Wang, Shuo Shang, Jiawei Jiang

(参考訳) グラフ自己教師学習(GSSL)では、マスク付きオートエンコーダ(MAE)が生成パラダイムに従い、マスクされたグラフエッジやノード特徴を再構築することを学習する。対照学習(Contrastive Learning, CL)は、同じグラフの拡張ビュー間の類似性を最大化し、GSSLで広く使われている。しかし、GSSLの既存の研究では、MAEとCLは別々に検討されている。我々は、MAEとCLのパラダイムが相補的であることを観察し、それらを統合するためにgraph contrastive masked autoencoder (GCMAE) フレームワークを提案する。具体的には、局所的なエッジやノード特徴に注目するため、MAEはグラフのグローバルな情報を捉えることができず、特定のエッジや特徴に敏感である。逆にCLはグラフ間の関係を考慮するため、グローバル情報の抽出に優れている。そこで、GCMAEにMAEブランチとCLブランチを装備し、2つのブランチが共通のエンコーダを共有することにより、MAEブランチはCLブランチによって抽出されたグローバル情報を利用することができる。 GCMAEにグローバルなグラフ構造を捉えさせるため、既存の研究のようにマスクされたエッジのみではなく、隣接行列全体を再構築するように訓練する。さらに,MAEの特徴平滑化問題に対処するため,再構成誤差を低減するのではなく,ノード埋め込み間の差異を高める、特徴再構築のための識別損失を提案する。我々は,4つの一般的なグラフタスク(ノード分類,ノードクラスタリング,リンク予測,グラフ分類)においてGCMAEを評価し,14の最先端ベースラインと比較した。その結果、GCMAEはこれらのタスクに対して一貫して良好な精度を示し、最高性能のベースラインと比較して最大3.2%の精度向上を達成している。

For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.

翻訳日:2023-10-25 20:30:51 公開日:2023-10-24

# MarkQA:数値推論を用いた大規模KBQAデータセット

MarkQA: A large scale KBQA dataset with numerical reasoning ( http://arxiv.org/abs/2310.15517v1 )

ライセンス: Link先を確認

Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu

(参考訳) 知識ベースに対する質問応答 (KBQA) はファクトイド問題への対処の進展を示しているが、数値的推論を伴うKBQAはいまだに未解明である。本稿では,KBQAにおける複素数値推論に着目し,マルチホップ推論と数値推論の両方を実行する必要がある新しいタスクNR-KBQAを提案する。 PyQLと呼ばれるPython形式で論理形式を設計し、数値推論問題の推論プロセスを表現する。 NR-KBQAの開発を容易にするため,少量の種子から自動的に構築されるMarkQAと呼ばれる大規模なデータセットを提案する。 MarkQAの各質問には、対応するSPARQLクエリと、QDMRフォーマットとPyQLプログラムのステップバイステップ推論プロセスが備わっている。 MarkQAにおける最先端QA手法の実験結果は、KBQAにおける複雑な数値推論が大きな課題に直面していることを示している。

While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on the complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which necessitates the ability to perform both multi-hop reasoning and numerical reasoning. We design a logic form in Python format called PyQL to represent the reasoning process of numerical reasoning questions. To facilitate the development of NR-KBQA, we present a large dataset called MarkQA, which is automatically constructed from a small set of seeds. Each question in MarkQA is equipped with its corresponding SPARQL query, alongside the step-by-step reasoning process in the QDMR format and PyQL program. Experimental results of some state-of-the-art QA methods on the MarkQA show that complex numerical reasoning in KBQA faces great challenges.

翻訳日:2023-10-25 20:30:21 公開日:2023-10-24

# 負荷依存コストによる中国のポストマン問題を解決するためのグラフ注意に基づく深層強化学習

Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs ( http://arxiv.org/abs/2310.15516v1 )

ライセンス: Link先を確認

Cong Dao Tran, Truong Son Hy

(参考訳) 近年,深層強化学習(DRL)モデルがルーティング問題を解く上で有望な結果を示している。しかしながら、ほとんどのDRLソルバは、トラベリングセールスマン問題(TSP)のようなノードルーティング問題を解決するために提案されている。一方、中国ポストマン問題(CPP)のようなアークルーティング問題に対するニューラル手法の適用については、TSPと比較して不規則で複雑な解空間を持つことが多いため、限定的な研究しかなされていない。これらのギャップを埋めるために,負荷制約を伴う複雑なアークルーティング問題である負荷依存コスト付きCPP(CPP-LC)(Corberan et al., 2018)に対処する新しいDRLフレームワークを提案する。この手法の目新しさは2つある。まず、CPP-LCをマルコフ決定過程(MDP)シーケンシャルモデルとして定式化する。次に、CPP-LC課題に効果的に対応するために、エンコーダとデコーダからなるDRLに基づく自己回帰モデル、すなわちArc-DRLを導入する。このようなフレームワークにより、DRLモデルはアークルーティング問題に対して効率よく、かつスケーラブルに動作する。さらに,CPP-LCのための進化的アルゴリズム(EA)に基づくバイオインスパイアされた新しいメタヒューリスティックソリューションを提案する。大規模な実験により、Arc-DRLは、CPP-LCの大規模なベンチマークデータセットにおいて、(Corberan et al., 2018)で提案された反復局所探索(ILS)や可変近傍探索(VNS)のような既存のメタヒューリスティックな手法よりも、ソリューションの品質と実行時間の両方に関して優れていることが示された。一方、EAは実行時間が大幅に長くなるものの、最良のソリューション品質を与える。 EA、ILS、VNSといったメタヒューリスティクスのためのC++実装と、データ生成のためのコード、生成されたデータはhttps://github.com/HySonLab/Chinese_Postman_Problemでリリースしています。

Recently, deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are proposed for node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably on arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on an Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS), proposed by Corberan et al. (2018), on large benchmark datasets for CPP-LC in both solution quality and running time, while the EA gives the best solution quality at the cost of much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem
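
To make the phrase "load-dependent costs" concrete, here is a minimal Python sketch of how the cost of a route could depend on the order in which arcs are served. It is an illustration under stated assumptions, not the formulation of Corberan et al. or the Arc-DRL code: the `(length, demand)` arc record, the function name `load_dependent_cost`, and the rule that each arc's demand is dropped at a constant rate while the arc is traversed are all hypothetical simplifications.

```python
from typing import List, Tuple

# Hypothetical arc record: (length, demand to deliver along the arc).
Arc = Tuple[float, float]


def load_dependent_cost(route: List[Arc], vehicle_weight: float) -> float:
    """Cost of serving the arcs in the given order under a simplified model.

    Assumption: traversing an arc costs length * (vehicle weight + average
    load carried on that arc), and the arc's demand is unloaded uniformly
    along it, so the average load on the arc is (load - demand / 2).
    """
    load = sum(demand for _, demand in route)  # vehicle starts fully loaded
    total = 0.0
    for length, demand in route:
        total += length * (vehicle_weight + load - demand / 2.0)
        load -= demand  # the demand of this arc has been delivered
    return total


# Two arcs of equal length but different demand: the order changes the cost.
light_first = load_dependent_cost([(1.0, 2.0), (1.0, 4.0)], vehicle_weight=0.0)
heavy_first = load_dependent_cost([(1.0, 4.0), (1.0, 2.0)], vehicle_weight=0.0)
```

In this toy model, serving the heavier arc first is cheaper because less load is carried over the remaining distance, which is the kind of order sensitivity that makes the solution space harder to search than in a plain CPP.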

翻訳日:2023-10-25 20:30:05 公開日:2023-10-24

# 火には火を: 捉えにくい偽情報の作成と検出におけるLLMの二重の役割

Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation ( http://arxiv.org/abs/2310.15515v1 )

ライセンス: Link先を確認

Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee

(参考訳) 大規模言語モデル(LLM)の近年の普及と破壊的な影響により、誤用される可能性(すなわち、有害で誤解を招くコンテンツを大規模に生成すること)が懸念されている。 LLMのこの新たなリスクに対処するために,現代LLMの生成的・創発的推論能力を活用して、人間が書いた偽情報とLLMが生成した偽情報に対抗する新しいFighting Fire with Fire(F3)戦略を提案する。まず, GPT-3.5-turboを用いて, パラフレーズベースおよび摂動ベースのプレフィックススタイルのプロンプトにより, 真正なLLM生成コンテンツと欺瞞的なLLM生成コンテンツをそれぞれ合成する。第2に,ゼロショットの文脈内意味推論手法をclozeスタイルのプロンプトとともに適用し,投稿やニュース記事の真偽を識別する。広範な実験において、分布内および分布外の両方のデータセットでGPT-3.5-turboのゼロショット優位性を観測した。従来のカスタマイズおよび微調整された偽情報検出器では精度の低下が見られたのに対し、GPT-3.5-turboは一貫して68-72%の精度を達成した。私たちのコードベースとデータセットはhttps://github.com/mickeymst/F3で利用可能です。

The recent ubiquity and disruptive impact of large language models (LLMs) have raised concerns about their potential to be misused (i.e., generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo's zero-shot superiority for both in-distribution and out-of-distribution datasets, where GPT-3.5-turbo consistently achieved accuracy at 68-72%, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.

翻訳日:2023-10-25 20:29:29 公開日:2023-10-24

# 多言語表現の結合行列分解解析

A Joint Matrix Factorization Analysis of Multilingual Representations ( http://arxiv.org/abs/2310.15513v1 )

ライセンス: Link先を確認

Zheng Zhao, Yftah Ziser, Bonnie Webber, Shay B. Cohen

(参考訳) 多言語モデルと単言語モデルの潜在表現を比較するために,結合行列分解に基づく解析ツールを提案する。プロービングの代替として、このツールは複数の表現の集合を共同で解析することを可能にする。このツールを用いて,多言語事前学習モデルで学習した表現に形態統語的特徴がどの程度、どのように反映されているかを検討した。 33以上の言語と17の形態統語カテゴリについて大規模な実証研究を行った。以上の結果から,上層と下層における形態統語情報のエンコーディングの違いが示され,言語特性に影響されるカテゴリ固有の差異がみられた。分解出力の階層的クラスタリングは、言語学者が手作業で作成した系統樹に関連する木構造をもたらす。さらに、分解出力は、異なる言語間タスクで観察される性能と強い関連を示す。将来の研究を促進するためにコードをリリースします。

We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models. An alternative to probing, this tool allows us to analyze multiple sets of representations in a joint manner. Using this tool, we study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models. We conduct a large-scale empirical study of over 33 languages and 17 morphosyntactic categories. Our findings demonstrate variations in the encoding of morphosyntactic information across upper and lower layers, with category-specific differences influenced by language properties. Hierarchical clustering of the factorization outputs yields a tree structure that is related to phylogenetic trees manually crafted by linguists. Moreover, we find the factorization outputs exhibit strong associations with performance observed across different cross-lingual tasks. We release our code to facilitate future research.

翻訳日:2023-10-25 20:29:04 公開日:2023-10-24

# KITAB:情報検索における制約満足度の評価

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval ( http://arxiv.org/abs/2310.15511v1 )

ライセンス: Link先を確認

Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi

(参考訳) 本研究は,情報検索における制約満足度問合せ(例えば「サンディエゴのアイスクリームショップの一覧」)に対する最新技術モデルの回答能力について検討する。これまでこのようなクエリは,web検索や知識ベースを通じてのみ解決可能なタスクと考えられていた。最近では、大規模言語モデル (LLM) がこのタスクにおいて初期の創発的能力を示している。しかし、現在の検索ベンチマークの多くは飽和しているか、制約満足度を測定していない。 LLMの事実的不正確性と幻覚に関する懸念の高まりに動機づけられ,言語モデルの制約満足能力を測定するための新しいデータセットであるKITABを提案する。 KITABは600人以上の著者と13,000のクエリにまたがる書籍関連データで構成され、他の著者に対して同様のテストデータを取得するための動的データ収集と制約検証のアプローチも提供する。 GPT4 と GPT3.5 に関する拡張実験では,情報人気,制約タイプ,コンテキストの可用性といった次元にわたって,一般的な障害モードを特徴付け,切り分ける。その結果,コンテキストがない場合,モデルは無関係な情報,事実的誤り,不完全性によって測定される深刻な限界を示し,その多くは情報人気が低下するにつれて悪化することが明らかとなった。コンテキストの可用性は無関係な情報を緩和するが、制約を満たすには役立たず、制約満足度に対する根本的な障壁があることを示している。今後のモデルの制約満足能力の向上に関するさらなる研究を促進するため、当社のコントリビューションをオープンソースとして公開します。

We study the ability of state-of-the-art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be solved via web search or knowledge bases. More recently, large language models (LLMs) have demonstrated initial emergent abilities in this task. However, many current retrieval benchmarks are either saturated or do not measure constraint satisfaction. Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models. KITAB consists of book-related data across more than 600 authors and 13,000 queries, and also offers an associated dynamic data collection and constraint verification approach for acquiring similar test data for other authors. Our extended experiments on GPT4 and GPT3.5 characterize and decouple common failure modes across dimensions such as information popularity, constraint types, and context availability. Results show that in the absence of context, models exhibit severe limitations as measured by irrelevant information, factual errors, and incompleteness, many of which worsen as information popularity decreases. While context availability mitigates irrelevant information, it is not helpful for satisfying constraints, identifying fundamental barriers to constraint satisfaction. We open source our contributions to foster further research on improving constraint satisfaction abilities of future models.
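
As a rough illustration of what "measuring constraint satisfaction" for a list-style answer can look like, the Python sketch below scores a model's list of book titles against a known bibliography and a constraint predicate. This is not KITAB's released evaluation code: the function name `score_answer`, the metric names, and the exact-string matching of titles are assumptions made only for this example.

```python
from typing import Callable, Dict, Iterable, Set


def score_answer(
    answer: Iterable[str],
    author_books: Set[str],
    constraint: Callable[[str], bool],
) -> Dict[str, float]:
    """Score a list of titles against an author's books and a constraint.

    Assumption: titles are compared by exact string match; a real verifier
    would need fuzzy matching and normalization.
    """
    titles = list(dict.fromkeys(answer))  # drop duplicates, keep order
    expected = {b for b in author_books if constraint(b)}
    irrelevant = [t for t in titles if t not in author_books]
    unsatisfied = [t for t in titles if t in author_books and not constraint(t)]
    satisfied = [t for t in titles if t in expected]
    n = len(titles) or 1  # avoid division by zero on an empty answer
    return {
        "irrelevant_rate": len(irrelevant) / n,    # not by this author at all
        "unsatisfied_rate": len(unsatisfied) / n,  # right author, wrong constraint
        "satisfied_rate": len(satisfied) / n,      # right author, constraint met
        "completeness": len(satisfied) / len(expected) if expected else 1.0,
    }


# Toy example: "books by this author whose title starts with P or S".
books = {"Emma", "Persuasion", "Sanditon"}
scores = score_answer(
    ["Persuasion", "Emma", "Dracula", "Persuasion"],
    books,
    lambda title: title.startswith(("P", "S")),
)
```

Separating titles that are not by the author from titles that are by the author but violate the constraint mirrors the abstract's distinction between irrelevant information and failures of constraint satisfaction, and `completeness` captures the incompleteness dimension.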

翻訳日:2023-10-25 20:28:50 公開日:2023-10-24

# 炭化ケイ素におけるqudit増強分光による核スピン量子ビットの測定

Measuring nuclear spin qubits by qudit-enhanced spectroscopy in Silicon Carbide ( http://arxiv.org/abs/2310.15557v1 )

ライセンス: Link先を確認

Erik Hesselmeier, Pierre Kuna, István Takács, Viktor Ivády, Wolfgang Knolle, Misagh Ghezellou, Jawad Ul-Hassan, Durga Dasari, Florian Kaiser, Vadim Vorobyov, Jörg Wrachtrup

(参考訳) 単一電子スピンへの超微細結合を持つ核スピンは、非常に貴重な量子ビットである。本研究では,4H-SiCの単一シリコン空孔色中心(V2)を取り巻く特にリッチな核スピン環境を探索し,特徴付ける。電子スピン3/2のquditを4準位センサーとして使用することにより、超微細相互作用を通じて、$^{29}$Siおよび$^{13}$C核スピンのいくつかのグループを同定する。我々は、光検出核共鳴により超微細結合の主要成分を抽出し、DFTシミュレーションにより結晶中の殻群に割り当てる。我々は、電子スピンの基底状態準位反交差を動的核偏極に利用し、最大$98\pm6\,\%$の核スピン偏極を達成する。この手法は、個々のスピンの核磁気共鳴信号を検出するために使用でき、そのコヒーレント制御を実証する。我々の研究は、多量子ビットメモリおよび量子コンピューティングプラットフォームとしてSiCが将来使われるためのパラメータの詳細なセットを提供する。

Nuclear spins with hyperfine coupling to single electron spins are highly valuable quantum bits. In this work we probe and characterise the particularly rich nuclear spin environment around single silicon vacancy color-centers (V2) in 4H-SiC. By using the electron spin-3/2 qudit as a four-level sensor, we identify several groups of $^{29}$Si and $^{13}$C nuclear spins through their hyperfine interaction. We extract the major components of their hyperfine coupling via optically detected nuclear resonance, and assign them to shell groups in the crystal via DFT simulations. We utilise the ground state level anti-crossing of the electron spin for dynamic nuclear polarization and achieve a nuclear spin polarization of up to $98\pm6\,\%$. We show that this scheme can be used to detect the nuclear magnetic resonance signal of individual spins and demonstrate their coherent control. Our work provides a detailed set of parameters for future use of SiC as a multi-qubit memory and quantum computing platform.

翻訳日:2023-10-25 20:23:01 公開日:2023-10-24

# TCRA-LLM:推論コスト削減のためのトークン圧縮検索拡張大規模言語モデル

TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction ( http://arxiv.org/abs/2310.15556v1 )

ライセンス: Link先を確認

Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian

(参考訳) ChatGPTが公開用のAPIをリリースして以来、商用の大規模言語モデル(LLM)上に構築されたアプリケーションの数は指数関数的に増加した。このようなモデルの一般的な使用例としては、コンテキスト内学習能力を活用し、検索拡張によって得られた知識を用いてユーザクエリに対する応答を生成することがある。商用の検索拡張LLMを展開する際の1つの問題は、LLMの入力トークンサイズを大幅に増大させる追加の検索コンテキストによるコストである。そこで本研究では,要約圧縮と意味圧縮の2つの手法を含むトークン圧縮方式を提案する。第1の方法は、self-instructを用いて生成された長さの異なるサンプルを含むデータセットで微調整されたT5ベースのモデルを適用し、要約を行うことでトークンサイズを削減する。第2の方法は、意味への影響が小さい単語を取り除いてトークンサイズを更に圧縮する。提案手法の有効性を適切に評価するために,妊娠前後の女性や乳幼児向けの食品レコメンデーションに着目したFRDB(Food-Recommendation DB)というデータセットを提案し,活用する。要約圧縮は検索トークンサイズを65%削減し、精度をさらに0.3%向上させる。意味圧縮は、トークンサイズと性能をトレードオフするより柔軟な方法を提供し、わずか1.6%の精度低下でトークンサイズを20%削減できる。

Since ChatGPT released its API for public use, the number of applications built on top of commercial large language models (LLMs) has increased exponentially. One popular usage of such models is leveraging their in-context learning ability to generate responses to user queries using knowledge obtained by retrieval augmentation. One problem of deploying commercial retrieval-augmented LLMs is the cost due to the additionally retrieved context, which largely increases the input token size of the LLMs. To mitigate this, we propose a token compression scheme that includes two methods: summarization compression and semantic compression. The first method applies a T5-based model that is fine-tuned on datasets generated using self-instruct, containing samples with varying lengths, and reduces token size by summarization. The second method further compresses the token size by removing words with lower impact on the semantics. In order to adequately evaluate the effectiveness of the proposed methods, we propose and utilize a dataset called Food-Recommendation DB (FRDB), focusing on food recommendation for women around the pregnancy period or for infants. Our summarization compression can reduce the retrieval token size by 65% with a further 0.3% improvement in accuracy; semantic compression provides a more flexible way to trade off token size against performance, with which we can reduce the token size by 20% with only a 1.6% accuracy drop.
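
The second method, dropping words with low semantic impact until a target budget is met, can be sketched in a few lines of Python. The abstract does not say how semantic impact is scored, so the stand-in below is an assumption: words that occur most often in the passage are treated as least informative and removed first. The function name `compress` and the `drop_ratio` parameter are likewise hypothetical.

```python
from collections import Counter


def compress(text: str, drop_ratio: float) -> str:
    """Remove a fraction of the words, least 'important' first.

    Assumption: importance is approximated by in-passage rarity, so the most
    frequent words (typically function words) are dropped first. This is a
    stand-in for whatever semantic-impact score the paper actually uses.
    """
    words = text.split()
    counts = Counter(w.lower() for w in words)
    n_drop = int(len(words) * drop_ratio)
    # Word positions ordered from least to most important; ties broken by
    # position so the result is deterministic.
    order = sorted(range(len(words)), key=lambda i: (-counts[words[i].lower()], i))
    dropped = set(order[:n_drop])
    return " ".join(w for i, w in enumerate(words) if i not in dropped)


shortened = compress("the cat and the dog and the bird", drop_ratio=0.5)
# shortened == "cat dog and bird"
```

Exposing the budget as a single ratio reflects the flexibility the abstract attributes to semantic compression: the same routine can be run at a 10% or a 20% reduction and the accuracy impact measured at each setting.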

翻訳日:2023-10-25 20:22:45 公開日:2023-10-24

# 前日負荷予測のための転移学習:欧州各国の電力需要時系列を事例として

Transfer learning for day-ahead load forecasting: a case study on European national electricity demand time series ( http://arxiv.org/abs/2310.15555v1 )

ライセンス: Link先を確認

Alexandros-Menelaos Tzortzis, Sotiris Pelekis, Evangelos Spiliotis, Spiros Mouzakitis, John Psarras, Dimitris Askounis

(参考訳) 電力グリッドの日々の運用には,短期負荷予測(STLF)が不可欠である。しかし、電力需要時系列を特徴付ける非線形性、非定常性、ランダム性は、STLFを困難な課題にしている。 STLFを改善するために様々な予測手法が提案されており、その中には、対象系列を必ずしも含まない複数の電力需要系列のデータを用いて訓練されるニューラルネットワーク(NN)モデルも含まれる。本研究では,転移学習(Transfer Learning, TL)と呼ばれるこのSTLFの特殊なケースの性能について, 欧州の代表的な国々の前日電力需要を表す27の時系列を考慮して検討した。我々は、人気があり実装が容易なNNモデルを採用し、クラスタリング分析を行って系列間の類似パターンを特定し、TLを支援する。この文脈で、クラスタリングステップの有無が異なる2つのTLアプローチを構成し、互いに、また典型的なNNトレーニング設定と比較する。その結果,特にクラスタリング技術を考慮した場合,TLは従来の手法よりも優れうることがわかった。

Short-term load forecasting (STLF) is crucial for the daily operation of power grids. However, the non-linearity, non-stationarity, and randomness characterizing electricity demand time series render STLF a challenging task. Various forecasting approaches have been proposed for improving STLF, including neural network (NN) models which are trained using data from multiple electricity demand series that may not necessarily include the target series. In the present study, we investigate the performance of this special case of STLF, called transfer learning (TL), by considering a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. In this context, two different TL approaches, with and without the clustering step, are compiled and compared against each other as well as a typical NN training setup. Our results demonstrate that TL can outperform the conventional approach, especially when clustering techniques are considered.

翻訳日:2023-10-25 20:22:22 公開日:2023-10-24

# 圧縮光キャビティモードにおける単一原子の量子速度限界

Quantum speed limit of a single atom in a squeezed optical cavity mode ( http://arxiv.org/abs/2310.15554v1 )

ライセンス: Link先を確認

Ya-Jie Ma, Xue-Chen Gao, Shao-Xiong Wu, and Chang-shui Yu

(参考訳) 本研究では,Fabry-Perotマイクロ共振器に閉じ込められた単一原子の量子速度限界について理論的に検討する。 2次非線形媒質に駆動レーザーを印加するとキャビティモードがスクイーズされ、ボゴリューボフのスクイーズ変換の下で有効ハミルトニアンが得られる。初期励起状態に対しては、非エルミートなシュレーディンガー方程式を用いて時間発展した原子状態の解析的表現が得られ、量子速度限界時間は解析的表現とマスター方程式法の両方で非常によく一致する。量子速度限界の観点からは、大きな離調、および強い駆動強度と結合強度が、量子状態の時間発展の加速に有利である。初期状態が重ね合わせ状態の場合、初期状態の形式が発展速度により大きな影響を与える。量子速度限界時間はシステムパラメータに依存するだけでなく、初期状態によっても決定される。

We theoretically study the quantum speed limit of a single atom trapped in a Fabry-Perot microresonator. The cavity mode will be squeezed when a driving laser is applied to the second-order nonlinear medium, and the effective Hamiltonian can be obtained under the Bogoliubov squeezing transformation. The analytical expression of the evolved atomic state can be obtained by using the non-Hermitian Schrödinger equation for the initial excited state, and the quantum speed limit times obtained from the analytical expression and from the master equation method coincide very well. From the perspective of the quantum speed limit, large detuning and strong driving and coupling strengths are more conducive to accelerating the evolution of the quantum state. For the initial superposition state case, the form of the initial state has more influence on the evolution speed. The quantum speed limit time is not only dependent on the system parameters but also determined by the initial state.

翻訳日:2023-10-25 20:22:07 公開日:2023-10-24

# トランスフォーマーモデルにおける多言語性:フィードフォワードネットワークにおける言語特異性の検討

Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks ( http://arxiv.org/abs/2310.15552v1 )

ライセンス: Link先を確認

Sunit Bhattacharya and Ondrej Bojar

(参考訳) 最近の研究では、トランスフォーマー内のフィードフォワードモジュールは、トレーニングの例に基づいて入力から特定のパターンをキャプチャすることを学ぶキーバリューメモリの集合と見なすことができる。次に、キーの"メモリ"から出力された値を組み合わせて、次のトークンに関する予測を生成する。これは、出力層の近くの最終的なトークン選択に向けて徐々に収束する予測の漸進的なプロセスにつながる。この興味深い視点は、多言語モデルがこのメカニズムをどのように活用するかという疑問を提起する。具体的には、2つ以上の言語でトレーニングされた自己回帰モデルでは、すべてのニューロン(クロス層)はすべての言語に等しく反応するのか? いいえ! 我々の仮説は、事前学習中に特定のモデルパラメータが強い言語固有の特徴を学習する一方で、他のパラメータは言語に依存しない(言語間で共有される)特徴を学習するという考えを中心にしている。これを検証するために,本モデルが最初に事前学習された2言語の並列コーパスを用いて実験を行った。その結果,ネットワークの入力や出力に最も近い層は,中間層に比べて言語固有の振る舞いを示す傾向があることがわかった。

Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then combine the output from the 'memories' of the keys to generate predictions about the next token. This leads to an incremental process of prediction that gradually converges towards the final token choice near the output layers. This interesting perspective raises questions about how multilingual models might leverage this mechanism. Specifically, for autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages? No! Our hypothesis centers around the notion that during pretraining, certain model parameters learn strong language-specific features, while others learn more language-agnostic (shared across languages) features. To validate this, we conduct experiments utilizing parallel corpora of two languages that the model was initially pretrained on. Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
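
The key-value reading of a feed-forward block described above can be written out directly. The Python sketch below is a minimal illustration, not code from the paper: it omits biases, uses ReLU where a given model may use another activation, and the names `ffn_as_memory`, `keys`, and `values` are chosen only for this example.

```python
from typing import List, Tuple


def ffn_as_memory(
    x: List[float],
    keys: List[List[float]],
    values: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Feed-forward block viewed as a set of key-value memories.

    Each key is matched against the input to give a non-negative memory
    coefficient; the output is the coefficient-weighted sum of the values.
    Assumptions: no bias terms, ReLU activation.
    """
    coeffs = [
        max(0.0, sum(k_j * x_j for k_j, x_j in zip(key, x)))  # ReLU(x . k_i)
        for key in keys
    ]
    d_out = len(values[0])
    out = [sum(c * v[j] for c, v in zip(coeffs, values)) for j in range(d_out)]
    return out, coeffs


# Three memories over a 2-d input; only the first key matches x.
out, coeffs = ffn_as_memory(
    x=[1.0, 0.0],
    keys=[[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]],
    values=[[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]],
)
```

Under this view, asking whether neurons are language-specific amounts to comparing the `coeffs` produced for parallel sentences in the two languages: a memory that fires for one language but stays near zero for the other is a candidate language-specific neuron.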

翻訳日:2023-10-25 20:21:51 公開日:2023-10-24

# 自己教師付き適応残差推定敵対的生成ネットワークによるPET合成

PET Synthesis via Self-supervised Adaptive Residual Estimation Generative Adversarial Network ( http://arxiv.org/abs/2310.15550v1 )

ライセンス: Link先を確認

Yuxin Xue, Lei Bi, Yige Peng, Michael Fulham, David Dagan Feng, Jinman Kim

(参考訳) PET(Positron emission tomography)は、臨床診断において広く用いられている高感度な分子イメージング法である。 PETによる放射線被曝を低減しつつ、適切な画質を維持することに関心が集まっている。畳み込みニューラルネットワーク(CNN)を用いて低線量PET画像から高品質な合成PET画像を生成する最近の手法は,低線量画像から高線量画像への復元において最先端であると報告されている。しかし,これらの手法は,合成画像と実画像の間でテクスチャと構造に不一致が生じやすい。さらに,低線量PETと標準PETとの間の分布シフトは十分に検討されていない。これらの課題に対処するため,我々は,自己教師付き適応残差推定敵対的生成ネットワーク(SS-AEGAN)を開発した。本稿では,(1)低線量PETと合成出力との残差マップを入力とし,予備合成PET画像を動的に修正するよう設計された適応残差推定マッピング機構であるAE-Net,(2)粗いジェネレータの特徴表現を強化する自己教師付き事前学習戦略を導入する。全身PET画像の公開ベンチマークデータセットを用いた実験により,SS-AEGANは様々な線量削減係数において最先端の合成法を一貫して上回ることが示された。

Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging modality in clinical diagnosis. There is interest in reducing the radiation exposure from PET while maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low-to-high image recovery. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) an adaptive residual estimation mapping mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) a self-supervised pre-training strategy to enhance the feature representation of the coarse generator. Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors.

翻訳日:2023-10-25 20:21:30 公開日:2023-10-24

# テンソル最適化におけるアルゴリズム的正則化:行列センシングにおけるリフトアプローチに向けて

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing ( http://arxiv.org/abs/2310.15549v1 )

ライセンス: Link先を確認

Ziye Ma, Javad Lavaei, Somayeh Sojoudi

(参考訳) 勾配降下(GD)は、暗黙の正則化を誘導し、コンパクトな表現を促進するため、機械学習モデルの一般化に不可欠である。本研究では, テンソル最適化、特にリフトされた行列センシングフレームワークの文脈において、暗黙的正則化の誘導におけるGDの役割について検討する。このフレームワークは、対称なランク1テンソル上で最適化する際に、スプリアス解を厳密なサドルに変換することによって、非凸行列センシング問題に対処するために最近提案された。十分に小さな初期化スケールでは、このリフトされた問題に適用されたGDが、近似的なランク1テンソルと、脱出方向を持つ臨界点をもたらすことを示す。本研究は, 行列センシングのテンソルパラメトリゼーションが、一階法と組み合わせることで, この種の問題における大域的最適性の達成に重要であることを裏付ける。

Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.

翻訳日:2023-10-25 20:21:10 公開日:2023-10-24

# トラップイオン中のボソニック論理状態のロバストと決定論的生成

Robust and Deterministic Preparation of Bosonic Logical States in a Trapped Ion ( http://arxiv.org/abs/2310.15546v1 )

ライセンス: Link先を確認

V. G. Matsos, C. H. Valahu, T. Navickas, A. D. Rao, M. J. Millican, M. J. Biercuk and T. R. Tan

(参考訳) ボソニックモードにおける論理量子ビットの符号化は、フォールトトレラント量子情報処理のハードウェア効率の高い実装となる可能性がある。捕捉イオンと超伝導マイクロ波キャビティの最近の進歩は、高品質なボソニック状態の実験的実現と、ボソニックモードで符号化された誤り訂正論理量子ビットの実証につながっている。しかし、ボソニック符号語を準備する現在のプロトコルは、一般的なノイズ源に対する堅牢性を欠き、実験的な実装が困難な場合があり、これまで実現されてきた符号の品質と幅を制限している。本稿では, ロバスト制御による誤り抑制の概念と量子誤り訂正符号化を組み合わせることで, 捕捉イオンの力学的運動における高度に非古典的なターゲットボソニック状態の高忠実度かつ決定論的な生成を実験的に実証する。本手法は,レーザ駆動によるスピン-運動相互作用の数値的に最適化された動的変調を実装し,単一のステップで目標状態を生成する。最適化された制御パルスは実験的な制約に合わせて調整され、支配的なエラー源に対して堅牢に設計されている。これらのプロトコルを用いて、Gottesman-Kitaev-Preskill (GKP)状態について最大$\bar{\mathcal{F}}=0.940(8)$の論理的忠実度を実証し、平均忠実度$\mathcal{F}=0.807(7)$で距離3の二項論理状態を初めて実現し、12.91(5) dBのスクイーズド真空状態を実証する。

Encoding logical qubits in bosonic modes provides a potentially hardware-efficient implementation of fault-tolerant quantum information processing. Recent advancements in trapped ions and superconducting microwave cavities have led to experimental realizations of high-quality bosonic states and demonstrations of error-corrected logical qubits encoded in bosonic modes. However, current protocols for preparing bosonic code words lack robustness to common noise sources and can be experimentally challenging to implement, limiting the quality and breadth of codes that have been realized to date. Here, we combine concepts of error suppression via robust control with quantum error correction encoding and experimentally demonstrate high-fidelity, deterministic preparation of highly non-classical target bosonic states in the mechanical motion of a trapped ion. Our approach implements numerically optimized dynamical modulation of laser-driven spin-motion interactions to generate the target state in a single step. The optimized control pulses are tailored towards experimental constraints and are designed to be robust against the dominant source of error. Using these protocols, we demonstrate logical fidelities for the Gottesman-Kitaev-Preskill (GKP) state as high as $\bar{\mathcal{F}}=0.940(8)$, achieve the first realization of a distance-3 binomial logical state with an average fidelity of $\mathcal{F}=0.807(7)$, and demonstrate a 12.91(5) dB squeezed vacuum state.

翻訳日:2023-10-25 20:20:54 公開日:2023-10-24

# 複数の解像度でのルーティング問題を解決する対称性保存グラフアテンションネットワーク

Symmetry-preserving graph attention network to solve routing problems at multiple resolutions ( http://arxiv.org/abs/2310.15543v1 )

ライセンス: Link先を確認

Cong Dao Tran, Thong Bach, Truong Son Hy

(参考訳) トラベリングセールスパーソン問題 (TSP) と車両ルーティング問題 (VRP) は,機械学習 (ML) 手法の適応により,精度と計算時間を合理的に向上した。しかし、以前の作品では、回転、翻訳、置換、スケーリングを含む、tspsとvrpから生じる対称性を完全に尊重していない。本研究では,組合わせ問題を解くために,最初の完全同値モデルとトレーニングを導入する。さらに、特に大きなグラフや長距離グラフの場合において、入力グラフのマルチスケール構造(ローカルからグローバル情報)を捉えることが不可欠であり、従来の手法は局所的あるいは準最適解に繋がるローカル情報のみを抽出することに限定されていた。上記の制限に対処するため,マルチレゾリューション方式と等価グラフアテンションネットワーク(mEGAT)アーキテクチャを併用して,低レベルおよび高レベルグラフレゾリューションに基づく最適経路を効率的に学習する手法を提案する。特に, 入力グラフから粗粒グラフの階層構造を構築し, まずは単純な低レベルグラフのルーティング問題を解き, その知識をより複雑な高レベルグラフに活用する。実験により,本モデルが既存のベースラインより優れており,対称性の保存とマルチレゾリューションがデータ駆動方式で組合せ問題を解くための重要なレシピであることを実証した。私たちのソースコードはhttps://github.com/HySonLab/Multires-NP-hardで公開されています。

Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard

翻訳日:2023-10-25 20:20:21 公開日:2023-10-24

# 辞書から概念的役割を学習することによる理解と一貫性の言語モデルの改善

Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary ( http://arxiv.org/abs/2310.15541v1 )

ライセンス: Link先を確認

Myeongjun Erik Jang, Thomas Lukasiewicz

(参考訳) 現代の事前訓練言語モデル(PLM)の非人間的な振る舞いは、その信頼性を損なう主要な原因である。このような欠陥のある振る舞いの顕著な現象は、一貫性のない予測の生成であり、同じ意味を持つテキストに対して異なる予測を生成したり、論理特性に違反したりするなど、論理的に矛盾する結果を生み出す。以前の研究では、データ拡張を利用したり、特殊な損失関数を実装したりして、この問題の緩和を図った。しかし、大規模なPLMでは高価なトレーニングリソースを消費し、特定の種類の一貫性しか扱えないため、利用は限られている。そこで本研究では,PLMの意味認識を根本的に改善することにより,一貫性のない振る舞いの問題を緩和する実践的アプローチを提案する。概念的役割理論に基づき,本手法は辞書内の単語と定義のペアから概念間の正確な相互関係を学習することにより,PLMが正確な意味を捉えることを可能にする。次に,学習した相互関係とPLMの事前学習知識を組み合わせるために,少数の追加パラメータのみを更新する効率的なパラメータ統合手法を提案する。実験の結果,本手法は複数種類の一貫性を同時に改善し,効率的な知識統合を可能にし,他の言語にも容易に適用できることが判明した。

The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness. A striking phenomenon of such faulty behaviours is the generation of inconsistent predictions, which produces logically contradictory results, such as generating different predictions for texts delivering the same meaning or violating logical properties. Previous studies exploited data augmentation or implemented specialised loss functions to alleviate the issue. However, their usage is limited, because they consume expensive training resources for large-sized PLMs and can only handle a certain consistency type. To this end, we propose a practical approach that alleviates the inconsistent behaviour issue by fundamentally improving PLMs' meaning awareness. Based on the conceptual role theory, our method allows PLMs to capture accurate meaning by learning precise interrelationships between concepts from word-definition pairs in a dictionary. Next, we propose an efficient parameter integration technique that updates only a few additional parameters to combine the learned interrelationship with PLMs' pre-trained knowledge. Our experimental results reveal that the approach can concurrently improve multiple types of consistency, enables efficient knowledge integration, and easily applies to other languages.

翻訳日:2023-10-25 20:19:56 公開日:2023-10-24

# 変化のレンズを通して識別可能な潜在多項式因果モデル

Identifiable Latent Polynomial Causal Models Through the Lens of Change ( http://arxiv.org/abs/2310.15580v1 )

ライセンス: Link先を確認

Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

(参考訳) 因果表現学習は、観測された低レベルデータから潜在的な高レベル因果表現を明らかにすることを目的としている。その主な任務の1つは、これらの潜在因果モデルの識別を信頼できる保証を提供することである。最近のブレークスルーでは、複数の環境にまたがる潜在因果変数間の因果影響の変化を利用して、識別可能性を探る。しかし、この進歩は潜在因果変数間の因果関係が線形ガウスモデルに厳密に従うという仮定に基づいている。本稿では,多項式モデルに代表される非線形因果関係と指数関数族に準拠した一般雑音分布を含む潜在因果モデルの範囲を拡張する。さらに,すべての因果パラメータに変化を付与する必要性や,その一部が変化していない場合の部分的識別可能性について検討する。さらに,我々の理論的発見に基礎を置き,一貫した因果表現の学習を可能にする新しい経験的推定法を提案する。合成データと実世界データの両方から得られた実験結果は,識別性と一貫性に関する理論的貢献を検証する。

Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments (Liu et al., 2022). However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.

翻訳日:2023-10-25 20:12:05 公開日:2023-10-24

# PyTorchによるVMAFの再実装:いくつかの実験結果

VMAF Re-implementation on PyTorch: Some Experimental Results ( http://arxiv.org/abs/2310.15578v1 )

ライセンス: Link先を確認

Kirill Aistov and Maxim Koroteev

(参考訳) 標準VMAF実装に基づいて,PyTorchフレームワークを用いたVMAFの実装を提案する。この実装を標準実装(libvmaf)と比較すると、VMAF単位で$\lesssim 10^{-2}$の差異が示される。目的関数としてVMAFを使用する場合の勾配計算について検討し、この関数を用いたトレーニングが挙動の悪い勾配を生じさせないことを示す。

Based on the standard VMAF implementation, we propose an implementation of VMAF using the PyTorch framework. Comparisons of this implementation with the standard one (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ in VMAF units. We investigate gradient computation when using VMAF as an objective function and demonstrate that training with this function does not result in ill-behaved gradients.

翻訳日:2023-10-25 20:11:47 公開日:2023-10-24

# CONTRASTE:Aspect-based Promptsを用いた教師付きコントラスト事前訓練

CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction ( http://arxiv.org/abs/2310.15577v1 )

ライセンス: Link先を確認

Rajdeep Mukherjee, Nithish Kannen, Saurabh Kumar Pandey, Pawan Goyal

(参考訳) Aspect Sentiment Triplet extract (ASTE)に関する既存の研究は、タスクのためのより効率的な微調整技術の開発に重点を置いている。私たちのモチベーションは、複数のABSAタスクの下流のパフォーマンスを同時に改善できる汎用的なアプローチを考え出すことです。そこで本研究では,ConTRastive Learningを用いた新しい事前学習戦略であるConTRASTEを提案する。我々は主にASTEに焦点を当てているが、ACOS、TASD、AESCといった他のABSAタスクに対して提案手法の利点を示す。文とその関連する(アスペクト、意見、感情)三つ子を与えられたら、まず、対応する感情を隠蔽したアスペクトベースのプロンプトを設計する。次に,デコーダの生成したアスペクト認識感情表現に対して,コントラスト学習を適用して,エンコーダ-デコーダモデルを訓練する。そこで, モデル重みを微調整するために, ベースエンコーダ・デコーダモデルとタグ付きオピニオン項検出器, 回帰型トリプレット数推定器の2つの補完モジュールを組み合わせた, 新たなマルチタスク手法を提案する。 4つのベンチマークデータセットの徹底的な実験と詳細なアブレーション実験により,提案する各コンポーネントの重要性が証明された。

Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning techniques for the task. Instead, our motivation is to come up with a generic approach that can improve the downstream performance of multiple ABSA tasks simultaneously. Towards this, we present CONTRASTE, a novel pre-training strategy using CONTRastive learning to enhance the ASTE performance. While we primarily focus on ASTE, we also demonstrate the advantage of our proposed technique on other ABSA tasks such as ACOS, TASD, and AESC. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder model by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder model is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art ASTE results.
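
The contrastive pre-training step described above can be illustrated with a generic supervised contrastive loss. This is a minimal sketch under stated assumptions, not the authors' implementation: each embedding is assumed to stand in for a decoder-generated aspect-aware sentiment representation, each integer label for a sentiment polarity, and the function name and temperature value are hypothetical.

```python
import math
from typing import Sequence


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.5,  # hypothetical value, chosen for illustration
) -> float:
    """Generic supervised contrastive loss over non-zero embedding vectors.

    Samples sharing a label are treated as positives for each other;
    all other samples in the batch act as negatives.
    """
    def normalize(v: Sequence[float]) -> list:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))  # labels match the clusters
    print(supervised_contrastive_loss(emb, [0, 1, 0, 1]))  # labels cut across the clusters
```

The loss is lower when same-label representations already lie close together, which is the geometry this kind of pre-training is meant to encourage.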

翻訳日:2023-10-25 20:11:41 公開日:2023-10-24

# 量子アルゴリズムによるAgnostic Learningのためのニアクアドラティックサンプル複雑度低減

A Near-Quadratic Sample Complexity Reduction for Agnostic Learning via Quantum Algorithms ( http://arxiv.org/abs/2310.15576v1 )

ライセンス: Link先を確認

Daniel Z. Zanger

(参考訳) 量子アルゴリズムを用いて、精度 $\epsilon,0<\epsilon<1/4$ と信頼 $1-\delta,0<\delta <1,$ の新しいサンプル複雑性上界$O((\mbox{log}(\frac{1}{\delta}))/\epsilon)$ as $\epsilon,\delta\rightarrow 0$ ($\epsilon^{-1}$ のポリ対数係数まで)を一般の無知学習モデルに対して得られる。これは漸近順序 $\theta((\mbox{log}(\frac{1}{\delta}))/\epsilon^{2}) の対応するサンプル複雑性を、有限濃度の仮説集合とともに無依存学習問題に対する古典的(非量子)アルゴリズムによって達成可能であることが文献で知られている(例えば arunachalam と de wolf (2018) を参照)。したがって、一般的な無依存学習の場合、我々が達成する学習速度の量子スピードアップは、(多対数因子まで)$\epsilon^{-1}$で二次的である。

Using quantum algorithms, we obtain, for accuracy $\epsilon$, $0<\epsilon<1/4$, and confidence $1-\delta$, $0<\delta<1$, a new sample complexity upper bound of $O(\log(\frac{1}{\delta})/\epsilon)$ as $\epsilon,\delta\rightarrow 0$ (up to a polylogarithmic factor in $\epsilon^{-1}$) for a general agnostic learning model, provided the hypothesis class is of finite cardinality. This greatly improves upon a corresponding sample complexity of asymptotic order $\Theta(\log(\frac{1}{\delta})/\epsilon^{2})$ known in the literature to be attainable by means of classical (non-quantum) algorithms for an agnostic learning problem also with a hypothesis set of finite cardinality (see, for example, Arunachalam and de Wolf (2018) and the classical statistical learning theory references cited there). Thus, for general agnostic learning, the quantum speedup in the rate of learning that we achieve is quadratic in $\epsilon^{-1}$ (up to a polylogarithmic factor).
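
To give a feel for the size of the gap, the sketch below evaluates the two leading-order rates quoted in the abstract. Constants and the polylogarithmic factor are dropped, and the function names and the sample values of $\epsilon$ and $\delta$ are illustrative assumptions rather than anything taken from the paper.

```python
import math


def classical_rate(eps: float, delta: float) -> float:
    """Leading-order classical rate log(1/delta) / eps^2 (constants dropped)."""
    return math.log(1.0 / delta) / eps ** 2


def quantum_rate(eps: float, delta: float) -> float:
    """Leading-order quantum rate log(1/delta) / eps (constants and polylog factor dropped)."""
    return math.log(1.0 / delta) / eps


if __name__ == "__main__":
    delta = 0.05  # illustrative confidence parameter
    for eps in (0.1, 0.01, 0.001):
        c = classical_rate(eps, delta)
        q = quantum_rate(eps, delta)
        print(f"eps={eps}: classical ~ {c:.0f}, quantum ~ {q:.0f}, ratio ~ {c / q:.0f}")
```

With these simplifications the ratio of the two rates is exactly $1/\epsilon$, so the saving grows as the target accuracy tightens.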

翻訳日:2023-10-25 20:11:16 公開日:2023-10-24

# POE:複数選択推論のための除去プロセス

POE: Process of Elimination for Multiple Choice Reasoning ( http://arxiv.org/abs/2310.15575v1 )

ライセンス: Link先を確認

Chenkai Ma, Xinya Du

(参考訳) 言語モデル(LM)は、複数の選択推論タスクに対してコンテキスト内学習を行うことができるが、これらのタスクの選択肢は等しく扱われる。人間は最後に正しい答えを選ぶ前に間違った選択肢を最初に排除するので、同様の2段階の戦略は、これらのタスクにおいてLMをより良くする、と私たちは主張する。この目的のために, 2段階のスコアリング法であるプロセス・オブ・エミッション(POE)を提案する。最初のステップでは、POEはそれぞれのオプションをスコアし、一見間違ったオプションを排除します。 2番目のステップでは、POEはこれらの間違ったオプションを隠蔽し、残りのオプションから最終的な予測を行う。 8つの推論タスクのゼロショット実験では,POEの有効性が示され,以下の分析により,論理的推論タスクに特に有効であることが判明した。さらにマスクの効果を分析し,ChatGPTのような少数ショット設定や大規模言語モデル(LLM)に適用できることを示す。

Language models (LMs) are capable of conducting in-context learning for multiple choice reasoning tasks, but the options in these tasks are treated equally. As humans often first eliminate wrong options before picking the final correct answer, we argue a similar two-step strategy can make LMs better at these tasks. To this end, we present the Process of Elimination (POE), a two-step scoring method. In the first step, POE scores each option, and eliminates seemingly wrong options. In the second step, POE masks these wrong options, and makes the final prediction from the remaining options. Zero-shot experiments on 8 reasoning tasks illustrate the effectiveness of POE, and a following analysis finds our method to be especially performant on logical reasoning tasks. We further analyze the effect of masks, and show that POE applies to few-shot settings and large language models (LLMs) like ChatGPT.
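
The two-step procedure can be sketched as follows. This is a toy illustration rather than the authors' code: the below-average elimination rule, the `[MASK]` placeholder, and the word-overlap scorer `toy_score` are all assumptions made here so the example runs without a language model.

```python
from typing import Callable, List

Scorer = Callable[[str, List[str]], List[float]]


def process_of_elimination(question: str, options: List[str], score: Scorer) -> int:
    """Return the index of the chosen option after a two-step elimination."""
    # Step 1: score every option and eliminate those scoring below the mean
    # (assumed rule; at least one option always survives).
    first = score(question, options)
    mean = sum(first) / len(first)
    keep = [s >= mean for s in first]

    # Step 2: mask the eliminated options and rescore with the masked list.
    masked = [opt if k else "[MASK]" for opt, k in zip(options, keep)]
    second = score(question, masked)
    candidates = [i for i, k in enumerate(keep) if k]
    return max(candidates, key=lambda i: second[i])


def toy_score(question: str, options: List[str]) -> List[float]:
    """Hypothetical stand-in for an LM scorer: counts words shared with the question."""
    q_words = set(question.lower().split())
    return [float(len(q_words & set(o.lower().split()))) for o in options]


if __name__ == "__main__":
    opts = ["a dog", "a cat says meow", "a car", "an animal"]
    print(process_of_elimination("which animal says meow", opts, toy_score))
```

In practice `score` would wrap a language model's likelihood of each option given the question and the (possibly masked) option list.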

翻訳日:2023-10-25 20:10:35 公開日:2023-10-24

# 薬物発見知識グラフのための自然言語処理:約束と落とし穴

Natural Language Processing for Drug Discovery Knowledge Graphs: promises and pitfalls ( http://arxiv.org/abs/2310.15572v1 )

ライセンス: Link先を確認

J. Charles G. Jeynes, Tim James, Matthew Corney

(参考訳) 薬物発見を助けるための知識グラフ(kgs)の構築と分析は、研究のトピックである。 KGsの健全な特徴は、コネクションの発見を容易にするフォーマットで、多くの異種データソースを組み合わせる能力である。 KGsの実用性は、薬物再資源化などの分野で実証されており、手動によるデータの探索とモデリングを通じて洞察されている。本稿では、自然言語処理(nlp)を用いて、通常、科学文献からkgsのデータソースとして非構造化テキストをマイニングする約束と落とし穴について論じる。これは、当初、KG内のデータの基盤としてChEMBLなどの構造化データソースを解析し、NLPを使用してそれらを強化または拡張した経験に基づいています。 KGsのNLPの基本的な約束は、人間のキュレーションだけでは事実上不可能なタスクとして、数百万のドキュメントからデータを自動的に抽出することである。しかしながら、NLP-KGパイプラインには誤った名前のエンティティ認識やオントロジーなどの潜在的な落とし穴があり、最終的には誤った推論や結論につながる可能性がある。

Building and analysing knowledge graphs (KGs) to aid drug discovery is a topical area of research. A salient feature of KGs is their ability to combine many heterogeneous data sources in a format that facilitates discovering connections. The utility of KGs has been exemplified in areas such as drug repurposing, with insights made through manual exploration and modelling of the data. In this article, we discuss promises and pitfalls of using natural language processing (NLP) to mine unstructured text, typically from scientific literature, as a data source for KGs. This draws on our experience of initially parsing structured data sources such as ChEMBL as the basis for data within a KG, and then enriching or expanding upon them using NLP. The fundamental promise of NLP for KGs is the automated extraction of data from millions of documents, a task practically impossible to do via human curation alone. However, there are many potential pitfalls in NLP-KG pipelines, such as incorrect named entity recognition and ontology linking, all of which could ultimately lead to erroneous inferences and conclusions.

翻訳日:2023-10-25 20:10:19 公開日:2023-10-24

# 選択特殊化による視覚的接地連続言語学習

Visually Grounded Continual Language Learning with Selective Specialization ( http://arxiv.org/abs/2310.15571v1 )

ライセンス: Link先を確認

Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter

(参考訳) 視覚に作用する人工エージェントの望ましい特性は、各タスクに十分な専門化と、伝達のための一般的な知識の構築のバランスを保ちながら、言語に変形したタスクのシーケンスを継続的に学習することである。選択的特殊化(Selective specialization)、すなわち各タスクを専門とするモデルコンポーネントの選択は、このトレードオフを管理するための戦略である。しかしながら、選択戦略の設計には、より専門的で一般化可能な表現の学習において、各モデルコンポーネントの役割についての洞察が必要である。そこで本研究の目的は,視覚下連続言語学習のための選択戦略を広範囲に分析することである。この目的に適したベンチマークがないため、徹底したモデル分析に十分な制御と柔軟性を提供する2つの新しい診断データセットを導入する。モジュールの特殊化戦略および2種類のモデルアーキテクチャの定量化のための様々なヒューリスティックスを評価する。最後に,共通の連続学習ベースラインを上回る分析に基づいて,概念的に単純なアプローチをデザインする。本研究は,連続学習アルゴリズムと個別モデル部品の学習行動の連携を改善するためのさらなる取り組みの必要性を示す。

A desirable trait of an artificial agent acting in the visual world is to continually learn a sequence of language-informed tasks while striking a balance between sufficiently specializing in each task and building a generalized knowledge for transfer. Selective specialization, i.e., a careful selection of model components to specialize in each task, is a strategy to provide control over this trade-off. However, the design of selection strategies requires insights on the role of each model component in learning more specialized or more generalizable representations, which remains a gap in current research. Thus, our aim with this work is to provide an extensive analysis of selection strategies for visually grounded continual language learning. Due to the lack of suitable benchmarks for this purpose, we introduce two novel diagnostic datasets that provide enough control and flexibility for a thorough model analysis. We assess various heuristics for module specialization strategies as well as quantifiable measures for two different types of model architectures. Finally, we design conceptually simple approaches based on our analysis that outperform common continual learning baselines. Our results demonstrate the need for further efforts towards better aligning continual learning algorithms with the learning behaviors of individual model parts.

翻訳日:2023-10-25 20:10:00 公開日:2023-10-24

# MuLMS: 材料科学領域における情報抽出のための多層注釈テキストコーパス

MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain ( http://arxiv.org/abs/2310.15569v1 )

ライセンス: Link先を確認

Timo Pierre Schrader, Matteo Finco, Stefan Grünewald, Felix Hildebrand, Annemarie Friedrich

(参考訳) 研究分野に関する最近の出版物や実験結果をすべて追跡することは難しい課題である。先行研究は、様々な科学分野における情報抽出モデルの有効性を実証した。最近、未研究の材料科学領域向けにいくつかのデータセットがリリースされた。しかしながら、これらのデータセットは、パーシング合成手順や固体酸化物燃料電池などのサブドメインといったサブプロブレムに焦点を当てている。本稿では,材料科学のサブドメイン7つにまたがる50のオープンアクセス記事のデータセットであるmulmsについて述べる。コーパスはドメインの専門家によって注釈付けされており、名前付きエンティティからフレーム構造へのいくつかのレイヤがある。すべてのタスクに対して競合するニューラルモデルを提示し、既存の関連リソースによるマルチタスクトレーニングがメリットをもたらすことを示す。

Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the as yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on sub-domains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities through relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.

翻訳日:2023-10-25 20:09:40 公開日:2023-10-24

# I$^2$MD:Modal Mutual Distillationを用いた3D行動表現学習

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation ( http://arxiv.org/abs/2310.15568v1 )

ライセンス: Link先を確認

Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

(参考訳) 近年の自己教師型3次元行動表現学習の進歩は、主に対照的な学習によるものである。しかし、従来の対照的な枠組みでは、異なる骨格のモダリティ間の豊富な相補性は未解明のままである。さらに、自己提供したサンプルの識別に最適化されたモデルでは、限定されたアクションカテゴリの場合、同様のポジティブなインスタンスが多数発生する。本研究では, 一般的な相互蒸留(I$^2$MD)フレームワークを導入することで, 上記の問題に対処する。 i$^2$md では、まずクロスモーダル相互作用をクロスモーダル相互蒸留(cmd)過程として再計算する。教員の知識を学生に伝達する既存の蒸留ソリューションとは異なり、CMDでは、知識は継続的に更新され、事前訓練中にモダリティ間で双方向に蒸留される。類似したサンプルの干渉を緩和し,その基盤となるコンテキストを活用するため,IMD(Intra-modal Mutual Distillation)戦略,IMD(Dynamic Neighbors Aggregation)メカニズムを最初に導入し,各モードで追加のクラスタレベルの識別ブランチをインスタンス化する。高度に相関した隣り合う特徴を適応的に集約し、局所的なクラスタレベルのコントラストを形成する。相互蒸留は2つの分枝間で行われ、相互レベルの知識交換が行われる。 3つのデータセットに関する広範な実験は、我々のアプローチが一連の新しいレコードを設定することを示している。

Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, when optimized to distinguish self-augmented samples, models struggle with numerous similar positive instances in the case of limited action categories. In this work, we tackle the aforementioned problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly-correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records.

翻訳日:2023-10-25 20:09:28 公開日:2023-10-24

# Ojaのアルゴリズムから応用による乗法重み更新法へ

From Oja's Algorithm to the Multiplicative Weights Update Method with Applications ( http://arxiv.org/abs/2310.15559v1 )

ライセンス: Link先を確認

Dan Garber

(参考訳) ojaのアルゴリズムは、主に確率主成分分析の文脈で研究されているよく知られたオンラインアルゴリズムである。我々は、共通の固有ベクトルを共有する任意の(必ずしも確率的ではない)対称行列列に適用すると、ojaのアルゴリズムの後悔は、専門家のアドバイスによる予測問題に対するよく知られた乗法重みの後悔という観点で、直接的に境界づけられるという、我々の知識の最も良いところは、単純な観察をする。単位球面上の二次形式を最適化するいくつかの応用を$\reals^n$で論じる。

Oja's algorithm is a well-known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm can be directly bounded in terms of the regret of the well-known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.
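
One way to see why such a connection is plausible is the special case where the common eigenvectors are the standard basis, so every matrix is diagonal. The sketch below is our own illustration under that assumption and is not necessarily the argument used in the paper: it runs a normalized Oja-style update $w \leftarrow (I + \eta A_t) w / \|\cdot\|$ alongside a multiplicative reweighting of a probability vector, and checks numerically that the squared coordinates of $w$ follow the reweighted distribution. The step size, dimension, and random diagonal entries are arbitrary choices.

```python
import math
import random
from typing import List, Tuple


def oja_step(w: List[float], diag: List[float], eta: float) -> List[float]:
    """One normalized Oja-style update for the diagonal matrix A = diag(diag)."""
    v = [(1.0 + eta * a) * wi for a, wi in zip(diag, w)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mw_step(p: List[float], diag: List[float], eta: float) -> List[float]:
    """Multiplicative reweighting of a distribution with factors (1 + eta * a)^2."""
    q = [(1.0 + eta * a) ** 2 * pi for a, pi in zip(diag, p)]
    total = sum(q)
    return [x / total for x in q]


def run(n: int = 4, steps: int = 50, eta: float = 0.1, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run both updates on the same random sequence of diagonal matrices."""
    rng = random.Random(seed)
    w = [1.0 / math.sqrt(n)] * n  # uniform unit vector
    p = [1.0 / n] * n             # uniform distribution over "experts"
    for _ in range(steps):
        diag = [rng.uniform(0.0, 1.0) for _ in range(n)]
        w = oja_step(w, diag, eta)
        p = mw_step(p, diag, eta)
    return w, p


if __name__ == "__main__":
    w, p = run()
    print(max(abs(wi * wi - pi) for wi, pi in zip(w, p)))  # numerically zero
```

In this diagonal setting each eigen-direction plays the role of an expert, and the mass $w_i^2$ that Oja's iterate places on direction $i$ evolves by a multiplicative rule.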

翻訳日:2023-10-25 20:09:04 公開日:2023-10-24

# tagE:人間の指示を理解するために身体的エージェントを起動

tagE: Enabling an Embodied Agent to Understand Human Instructions ( http://arxiv.org/abs/2310.15605v1 )

ライセンス: Link先を確認

Chayan Sarkar and Avik Mitra and Pradip Pramanick and Tapas Nayak

(参考訳) 自然言語は、物理的存在を持つ知的エージェントが人間と関わるとき、コミュニケーションの第一のモードとして機能する。多くの研究が、感情分析、意図予測、質問応答、要約といった取り組みを含む自然言語理解(NLU)に焦点を当てているが、NLUの範囲は、具体的エージェントによる具体的な行動を必要とする状況に限られている。自然言語固有の曖昧さと不完全性は、人間の意図を解読しようとする知的エージェントにとっての課題である。この課題に取り組むため,我々は,具体化エージェント (tage) のためのタスクおよび引数グラウンドと呼ばれる新しいシステムを提案する。本システムでは,自然言語で表現された複雑なタスク命令から一連のタスクを抽出するために,発明的なニューラルネットワークモデルを採用している。提案モデルでは,入れ子デコードに富んだエンコーダ・デコーダ・フレームワークを用いて,複雑な命令からタスクとその引数を効果的に抽出する。抽出されたタスクはロボットの確立したスキルコレクションにマッピング(あるいは接地)され、引数は環境に存在するオブジェクトの接地を見つける。システムのトレーニングと評価を容易にするため,複雑な命令を含むデータセットをキュレートした。実験の結果は、ロバストなベースラインモデルよりも優れており、我々のアプローチの長所を浮き彫りにしている。

Natural language serves as the primary mode of communication when an intelligent agent with a physical presence engages with human beings. While a plethora of research focuses on natural language understanding (NLU), encompassing endeavors such as sentiment analysis, intent prediction, question answering, and summarization, the scope of NLU directed at situations necessitating tangible actions by an embodied agent remains limited. The ambiguity and incompleteness inherent in natural language present challenges for intelligent agents striving to decipher human intention. To tackle this predicament head-on, we introduce a novel system known as task and argument grounding for Embodied agents (tagE). At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language. Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions. These extracted tasks are then mapped (or grounded) to the robot's established collection of skills, while the arguments find grounding in objects present within the environment. To facilitate the training and evaluation of our system, we have curated a dataset featuring complex instructions. The results of our experiments underscore the prowess of our approach, as it outperforms robust baseline models.

翻訳日:2023-10-25 20:03:21 公開日:2023-10-24

# MUSER: マルチビュー類似のケース検索データセット

MUSER: A Multi-View Similar Case Retrieval Dataset ( http://arxiv.org/abs/2310.15602v1 )

ライセンス: Link先を確認

Qingquan Li and Yiran Hu and Feng Yao and Chaojun Xiao and Zhiyuan Liu and Maosong Sun and Weixing Shen

(参考訳) 類似事例検索(SCR)は、司法公正の促進に重要な役割を果たす代表的法的AIアプリケーションである。しかし、既存のSCRデータセットは、事件間の類似性を判断する際にのみ事実記述セクションに焦点をあてており、背景にある洞察力のある推論プロセスを提供する他の価値あるセクション(例えば裁判所の意見)を無視している。さらに、ケースの類似性は、典型的には事実記述のテクスト的意味論のみによって測定され、法的知識の観点からは、訴訟の完全な複雑さを捉えることができない可能性がある。本稿では,多視点類似度測定に基づく類似事例検索データセットであるmuserと,文レベルの法的要素アノテーションを用いた包括的法的要素を提案する。具体的には,3つの視点(法的事実,紛争焦点,法規)を選択し,それぞれに法的要素の包括的かつ構造化されたラベルスキーマを構築し,ケース類似性の正確かつ理解可能な評価を可能にする。構築されたデータセットは、中国の民事事件から始まり、100のクエリケースと4,024の候補ケースを含んでいる。法的な要素予測のためのテキスト分類アルゴリズムと,MUSER上の類似事例を検索するための様々な検索手法を実装した。実験結果から, 法的要素を組み込むことでSCRモデルの性能向上が期待できるが, MUSERがもたらした課題に対処するためには, さらなる努力が必要であることが示唆された。ソースコードとデータセットはhttps://github.com/thulawtech/muserで公開されている。

Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insight into the underlying reasoning process. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.

翻訳日:2023-10-25 20:02:55 公開日:2023-10-24

# 片手で複数の物体をつかむ

Grasp Multiple Objects with One Hand ( http://arxiv.org/abs/2310.15599v1 )

ライセンス: Link先を確認

Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang

(参考訳) 人間の手の複雑な運動学は、複数のオブジェクトを同時に把握し、操作することができる。その重要性にもかかわらず、ロボットによるマルチオブジェクトの把持は未検討のままであり、運動学、ダイナミクス、オブジェクト構成の課題を提示している。本稿では,マルチフィンガーデキスタラスハンドを用いたテーブルトップ上のマルチオブジェクトグリップのための2段階手法であるMultiGraspを提案する。それは (i)先延ばし案の作成及び (二物をつかんで持ち上げること。) 実験結果は、主に二重物体の把握と44.13%の成功率の報告に焦点が当てられ、未確認の物体構成への適応性と不正確な把握を示す。フレームワークはまた、推論速度の低下にもかかわらず、2つ以上のオブジェクトを把握できることも示している。

The human hand's complex kinematics allow for simultaneous grasping and manipulation of multiple objects, essential for tasks like object transfer and in-hand manipulation. Despite its importance, robotic multi-object grasping remains underexplored and presents challenges in kinematics, dynamics, and object configurations. This paper introduces MultiGrasp, a two-stage method for multi-object grasping on a tabletop with a multi-finger dexterous hand. It involves (i) generating pre-grasp proposals and (ii) executing the grasp and lifting the objects. Experimental results primarily focus on dual-object grasping and report a 44.13% success rate, showcasing adaptability to unseen object configurations and imprecise grasps. The framework also demonstrates the capability to grasp more than two objects, albeit at a reduced inference speed.

翻訳日:2023-10-25 20:02:34 公開日:2023-10-24

# 対話型スケッチ質問応答における創発的コミュニケーション

Emergent Communication in Interactive Sketch Question Answering ( http://arxiv.org/abs/2310.15597v1 )

ライセンス: Link先を確認

Zixing Lei, Yiming Zhang, Yuxin Xiong and Siheng Chen

(参考訳) 視覚に基づく創発的コミュニケーション(EC)は、スケッチを通してコミュニケーションを学び、人間のコミュニケーションの進化を解明することを目的としている。皮肉なことに、以前の作品は、人間のコミュニケーションに欠かせないマルチラウンドインタラクションを無視している。このギャップを埋めるために、我々はまず、2人の共同プレイヤーがスケッチを通して対話し、複数のラウンドで画像に関する質問に答える、インタラクティブスケッチ質問回答(ISQA)タスクを導入する。この課題を達成するために,質問応答精度,複雑化,人間の解釈可能性などの3つの評価因子のバランスを効果的に達成できる,新しいインタラクティブECシステムを設計する。人的評価を含む実験結果から,マルチラウンド対話機構は,適切な人間解釈能力を有する知的エージェント間のコミュニケーションを目標とし,効率的なものにすることが示された。

Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players interact through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results, including human evaluation, demonstrate that the multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.

翻訳日:2023-10-25 20:02:20 公開日:2023-10-24

# 検索に基づく知識伝達:超大規模言語モデル圧縮に対する効果的なアプローチ

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression ( http://arxiv.org/abs/2310.15594v1 )

ライセンス: Link先を確認

Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan

(参考訳) 大規模事前学習言語モデル(LLM)は、様々な自然言語処理(NLP)タスクにおいて例外的な性能を示した。しかし、これらのモデルの巨大なサイズは、現実世界のアプリケーションに展開する上で大きな課題をもたらします。多くのモデル圧縮技術が提案されているが、モデルスケールに大きなギャップがある場合、そのほとんどが極端なモデル圧縮を達成するのに適していない。本稿では,LLMの知識を極小モデル(例えば1%)に効果的に伝達する,Retrieval-based Knowledge Transfer (RetriKT)と呼ばれる新しい圧縮パラダイムを提案する。特に,本手法では,LLMから知識を抽出して知識ストアを構築する。モデルの質を向上させるために、ソフトプロンプトチューニングと近位政策最適化(ppo)強化学習技術が採用されている。 SuperGLUE と GLUE ベンチマークによる低リソースタスクに対する大規模な実験が行われた。提案手法はLLMの知識を活用することにより,小規模モデルの性能を著しく向上することを示す。

Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.

翻訳日:2023-10-25 20:02:05 公開日:2023-10-24

# 顔データ最小化: プライバシーフィルターとしての浅いモデル

Facial Data Minimization: Shallow Model as Your Privacy Filter ( http://arxiv.org/abs/2310.15590v1 )

ライセンス: Link先を確認

Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao li, Diqun Yan, Xuhong Zhang, Shouling Ji

(参考訳) 顔認識サービスは、多くの分野で使われており、人々に多くの利便性をもたらしている。しかし、ユーザの顔データがサービスプロバイダに送信されると、ユーザはプライベートデータのコントロールを失うことになる。近年,顔データ漏洩によるセキュリティやプライバシの問題が数多く発生している。多くのプライバシー保護手法が提案されているが、通常は敵の戦略や補助データにアクセスできない場合に失敗する。そこで本稿では,顔認識サービスシステムにおいて非常に典型的な顔画像と顔特徴をアップロードする2つの事例を十分に検討し,データプライバシ最小化変換(pmt)法を提案する。この方法は、認証サービスの浅いモデルに基づいて元の顔データを処理し、難読化データを得る。難読化されたデータは、認可されたモデルの満足なパフォーマンスを維持し、他の許可されていないモデルのパフォーマンスを制限するだけでなく、AIメソッドや人間の視覚的盗難によって元のプライバシデータが漏洩することを防ぐ。また,サービスプロバイダが受信したデータに対して事前処理を行うことができるため,PMTの堅牢性を向上させるための摂動法も提案する。さらに、1つの顔画像を複数のサービスモデルに同時に認可するために、PMTのスケーラビリティを向上させるために複数の制限機構を提案する。最後に,提案するpmtによる顔再建,データ乱用,顔属性推定攻撃に対する防御効果について,広範な実験を行い,その効果を評価した。これらの実験結果から, PMTは顔認識精度を維持しつつ, 顔データの乱用やプライバシーの漏洩を防止できることがわかった。

Face recognition services have been used in many fields and bring much convenience to people. However, once a user's facial data is transmitted to a service provider, the user loses control of his/her private data. In recent years, various security and privacy issues have arisen from the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail when they have no access to adversaries' strategies or auxiliary data. Hence, in this paper, by fully considering two cases of uploading facial images and facial features, which are very typical in face recognition service systems, we propose a data privacy minimization transformation (PMT) method. This method can process the original facial data based on the shallow model of authorized services to obtain the obfuscated data. The obfuscated data can not only maintain satisfactory performance on authorized models and restrict the performance on other unauthorized models but also prevent original privacy data from being leaked through AI methods or human visual theft. Additionally, since a service provider may execute preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Besides, to authorize one facial image to multiple service models simultaneously, a multiple restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments and evaluate the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. These experimental results demonstrate that PMT performs well in preventing facial data abuse and privacy leakage while maintaining face recognition accuracy.

翻訳日:2023-10-25 20:01:49 公開日:2023-10-24

# ScanDL:テキストによる合成スキャンパス生成のための拡散モデル

ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts ( http://arxiv.org/abs/2310.15587v1 )

ライセンス: Link先を確認

Lena S. Bolliger, David R. Reich, Patrick Haller, Deborah N. Jakobi, Paul Prasse, Lena A. Jäger

(参考訳) 読書における眼球運動は、人間の言語処理の基礎となる認知メカニズムの研究において重要な役割を担っている。近年,目の動きと認知の密結合は,言語モデルの解釈可能性,拡張性,事前学習といった言語関連機械学習タスクや,読み手やテキスト特有の特性の推論にも活用されている。しかし、眼球運動データの不足とアプリケーション時の利用不可は、この研究のラインにとって大きな課題となっている。当初は、眼球運動データを合成するための認知モデルを用いてこの問題に対処した。しかし、人間のようなスキャンパスを生成する唯一の目的として、純粋にデータ駆動型機械学習ベースの手法の方が適していることが証明されている。近年の拡散過程を離散データに適用する進歩に続いて,テキスト上で合成スキャンパスを生成する新しい離散シーケンス-シーケンス間拡散モデルであるscandlを提案する。事前学習した単語表現を活用し、刺激テキストと固定シーケンスを併用することにより、2つの入力間のマルチモーダル相互作用を捉える。本研究では,データセット内のscandlを評価し,最先端のscanpath生成法を著しく上回っていることを示す。最後に、モデルが人間的な読書行動を示す能力の基盤となる広範な心理言語学的分析を提供する。実装はhttps://github.com/dili-lab/scandlで利用可能です。

Eye movements in reading play a crucial role in psycholinguistic research studying the cognitive mechanisms underlying human language processing. More recently, the tight coupling between eye movements and cognition has also been leveraged for language-related machine learning tasks such as the interpretability, enhancement, and pre-training of language models, as well as the inference of reader- and text-specific properties. However, scarcity of eye movement data and its unavailability at application time poses a major challenge for this line of research. Initially, this problem was tackled by resorting to cognitive models for synthesizing eye movement data. However, for the sole purpose of generating human-like scanpaths, purely data-driven machine-learning-based methods have proven to be more suitable. Following recent advances in adapting diffusion processes to discrete data, we propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts. By leveraging pre-trained word representations and jointly embedding both the stimulus text and the fixation sequence, our model captures multi-modal interactions between the two inputs. We evaluate ScanDL within- and across-dataset and demonstrate that it significantly outperforms state-of-the-art scanpath generation methods. Finally, we provide an extensive psycholinguistic analysis that underlines the model's ability to exhibit human-like reading behavior. Our implementation is made available at https://github.com/DiLi-Lab/ScanDL.

翻訳日:2023-10-25 20:01:24 公開日:2023-10-24

# 自己監督型深層学習を用いた開海サーベイランスにおける意図的AISシャットダウンの検出

Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning ( http://arxiv.org/abs/2310.15586v1 )

ライセンス: Link先を確認

Pierre Bernabé, Arnaud Gotlieb, Bruno Legeard, Dusica Marijan, Frank Olaf Sem-Jacobsen, Helge Spieker

(参考訳) 海上交通監視においては、違法漁業や違法商品の輸送などの違法行為の検知は沿岸管理にとって重要な課題である。開海では、自動識別システム(ais)のメッセージがオンボードのトランスポンダーによって送信され、監視衛星によって捕捉される。しかし、インシンセア船はしばしば違法行為を隠すためにAISトランスポンダを故意にシャットダウンする。開海では、プロトコルの制限、悪天候条件、衛星位置の制限により、意図的なAISシャットダウンと受信の欠如を区別することが非常に困難である。本稿では,自己教師付き深層学習手法とトランスフォーマーモデルに基づく異常ais欠落検出のための新しい手法を提案する。トレーニングされたモデルは、履歴データを使用して、次の分にメッセージを受け取るかどうかを予測する。その後、モデルが検出された異常を予測と実際に何が起こるかを比較して報告する。本手法は,6万以上の船舶の軌道に対応して,毎月5億以上のaisメッセージをリアルタイムに処理することができる。この手法は、ノルウェーの4つの観測衛星から得られた1年間の実世界のデータに基づいて評価される。関連研究結果を用いて,すでに検出されているAIS停止を再度発見し,本手法の有効性を検証した。

In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or transshipment of illicit products, is a crucial task of the coastal administration. In the open sea, one has to rely on Automatic Identification System (AIS) messages transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions or restricting satellite positions. This paper presents a novel approach for the detection of abnormal AIS missing reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts whether a message should be received in the upcoming minute. Afterwards, the model reports detected anomalies by comparing the prediction with what actually happens. Our method can process AIS messages in real time, in particular, more than 500 million AIS messages per month, corresponding to the trajectories of more than 60,000 ships. The method is evaluated on one year of real-world data coming from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering already detected intentional AIS shutdowns.
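
The final comparison step, prediction versus what was actually received, can be sketched in a few lines. Everything specific here is an assumption for illustration: the model is taken to emit a per-minute probability that a message should arrive, and the function name and the 0.9 threshold are hypothetical rather than taken from the paper.

```python
from typing import List


def flag_missing_receptions(
    predicted_prob: List[float],
    received: List[bool],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> List[int]:
    """Return the minute indices where a message was expected but none arrived."""
    return [
        minute
        for minute, (prob, got_message) in enumerate(zip(predicted_prob, received))
        if prob >= threshold and not got_message
    ]


if __name__ == "__main__":
    probs = [0.95, 0.20, 0.99, 0.97]
    seen = [True, False, False, True]
    print(flag_missing_receptions(probs, seen))
```

A minute with a low predicted probability and no message is not flagged, which is how expected gaps (for example, poor satellite coverage) are kept apart from suspicious silences.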

翻訳日:2023-10-25 20:01:00 公開日:2023-10-24

# 教師指導による構成的視覚推論のためのマルチモーダル表現

Multimodal Representations for Teacher-Guided Compositional Visual Reasoning ( http://arxiv.org/abs/2310.15585v1 )

ライセンス: Link先を確認

Wafa Aissa (CEDRIC - VERTIGO), Marin Ferecatu (CEDRIC - VERTIGO), Michel Crucianu (CEDRIC - VERTIGO)

(参考訳) ニューラルモジュールネットワーク(Neural Module Networks, NMN)は、画像上で順次実行される一連の推論サブタスクからなるプログラムへの質問の変換を可能にする視覚的質問応答のための魅力的な方法である。 nmnは統合モデルと比較して説明可能性を高め、基礎となる推論プロセスの理解を深める。 nmnの有効性を向上させるため,大規模クロスモーダルエンコーダで得られた特徴を活用できる。また、現在のNMNsのトレーニング手法は、モジュール出力をその後のモジュールに伝播させることに依存しており、予測誤差の蓄積と偽解の生成につながる。これを軽減するために,教師指導を含むNMN学習戦略を導入する。当初、このモデルは地道な中間出力によって完全に導かれるが、訓練が進むにつれて徐々に自律的な行動へと移行する。これにより、誤り蓄積を低減し、トレーニング効率と最終性能を向上し、クロスモーダル機能を導入し、NMNにより効果的なトレーニング技術を採用することにより、推論プロセスにおける性能と透明性のバランスが良好であることを実証する。

Neural Module Networks (NMNs) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce an answer. NMNs provide enhanced explainability compared to integrated models, allowing for a better understanding of the underlying reasoning process. To improve the effectiveness of NMNs, we propose to exploit features obtained by a large-scale cross-modal encoder. Also, the current training approach of NMNs relies on the propagation of module outputs to subsequent modules, leading to the accumulation of prediction errors and the generation of false answers. To mitigate this, we introduce an NMN learning strategy involving scheduled teacher guidance. Initially, the model is fully guided by the ground-truth intermediate outputs, but gradually transitions to autonomous behavior as training progresses. This reduces error accumulation, thus improving training efficiency and final performance. We demonstrate that by incorporating cross-modal features and employing more effective training techniques for NMNs, we achieve a favorable balance between performance and transparency in the reasoning process.
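
The scheduled teacher guidance described in the abstract above can be illustrated with a minimal sketch. The linear decay, the function names, and the per-step random choice are all assumptions made for illustration; the abstract only states that guidance starts fully on and is gradually withdrawn.

```python
import random


def guidance_probability(step: int, total_steps: int) -> float:
    """Chance of feeding the ground-truth intermediate output to the next module.

    Assumed schedule: linear decay from 1.0 (fully guided) to 0.0 (autonomous).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def choose_module_input(ground_truth, predicted, step, total_steps, rng=random):
    """Pick what the next module receives at this training step."""
    p = guidance_probability(step, total_steps)
    # rng.random() lies in [0, 1), so p == 1.0 always selects the ground truth
    # and p == 0.0 always selects the model's own prediction.
    return ground_truth if rng.random() < p else predicted
```

Early in training the downstream modules see clean inputs, which limits error accumulation; late in training they see their predecessors' actual outputs, which matches inference conditions.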

翻訳日:2023-10-25 20:00:40 公開日:2023-10-24

# 無線通信ネットワークによる分割フェデレーション学習の高速化

Accelerating Split Federated Learning over Wireless Communication Networks ( http://arxiv.org/abs/2310.15584v1 )

ライセンス: Link先を確認

Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, and Miaowen Wen

(参考訳) 人工知能(AI)の開発は、ディープニューラルネットワーク(DNN)ベースのアプリケーションを促進する機会を提供する。しかし、DNNの大量のパラメータと計算複雑性により、リソース制約のあるエッジデバイスにデプロイすることは困難である。この課題に対処する効果的な方法はモデル分割であり、DNNはデバイスとサーバにそれぞれデプロイされる2つの部分に分けられ、協調学習または協調推論を行う。本稿では,連合学習(FL)の並列モデル学習機構と分割学習(SL)のモデル分割構造を組み合わせたsplit federated learning(SFL)フレームワークについて検討する。 DNNの個別分割点を持つ異種デバイスの実用シナリオを考察する。システム遅延を最小限に抑えるために,分割点選択と帯域割り当ての連立問題を定式化する。交互最適化を用いることで、問題を2つのサブプロブレムに分解し、最適に解く。実験の結果,レイテンシ低減と精度向上における本手法の優位性を実証した。

The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large number of parameters and the computational complexity of DNNs make them difficult to deploy on resource-constrained edge devices. An efficient method to address this challenge is model partition/splitting, in which the DNN is divided into two parts that are deployed on the device and the server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of the DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experimental results demonstrate the superiority of our work in latency reduction and accuracy improvement.

翻訳日:2023-10-25 20:00:20 公開日:2023-10-24

# ガウス過程回帰による保証被覆予測間隔

Guaranteed Coverage Prediction Intervals with Gaussian Process Regression ( http://arxiv.org/abs/2310.15641v1 )

ライセンス: Link先を確認

Harris Papadopoulos

(参考訳) ガウス過程回帰 (gaussian process regression, gpr) は一般的な回帰法であり、多くの機械学習技術とは異なり、予測の不確実性の推定を提供する。しかしながら、これらの不確実性の推定は、モデルが十分に特定されているという仮定に基づいている。その結果、生成した不確実性推定は、例えば、95%の信頼度で生成される予測間隔(PI)が、真のラベルの95%未満をカバーすることができる。この問題に対処するため,本稿では,CP(Conformal Prediction)と呼ばれる機械学習フレームワークに基づくGPRの拡張を提案する。この拡張は、モデルを完全に不特定であっても、必要なカバレッジでPIの生成を保証する。提案手法は,GPRの利点とCPの有効なカバレッジ保証を組み合わせ,実験により既存の手法よりも優れていることを示す。

Gaussian Process Regression (GPR) is a popular regression method which, unlike most Machine Learning techniques, provides estimates of uncertainty for its predictions. These uncertainty estimates, however, are based on the assumption that the model is well-specified, an assumption that is violated in most practical applications, since the required knowledge is rarely available. As a result, the produced uncertainty estimates can become very misleading; for example, the prediction intervals (PIs) produced for the 95% confidence level may cover much less than 95% of the true labels. To address this issue, this paper introduces an extension of GPR based on a Machine Learning framework called Conformal Prediction (CP). This extension guarantees the production of PIs with the required coverage even when the model is completely misspecified. The proposed approach combines the advantages of GPR with the valid coverage guarantee of CP, while the experimental results demonstrate its superiority over existing methods.
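
To make the coverage idea above concrete, here is a minimal sketch of split (inductive) conformal prediction with residuals normalized by the model's predictive standard deviation. This is a generic textbook variant and an assumption on our part; the abstract does not say which CP variant or nonconformity measure the paper uses, and the function names are hypothetical. The predictive means and standard deviations are taken as given, for example from a fitted GPR.

```python
import math


def conformal_quantile(cal_y, cal_mean, cal_std, alpha=0.05):
    """Calibrate a scaling factor q on held-out data.

    Nonconformity score (assumed): |y - mean| / std, i.e. the residual
    measured in units of the model's own predictive standard deviation.
    """
    scores = sorted(abs(y - m) / s for y, m, s in zip(cal_y, cal_mean, cal_std))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("too few calibration points for this alpha")
    return scores[k - 1]


def prediction_interval(mean, std, q):
    """Interval for a new point from its predictive mean and std."""
    return mean - q * std, mean + q * std
```

Under exchangeability of calibration and test points, intervals built this way cover the true label with probability at least 1 - alpha regardless of whether the underlying model is well specified, which is the property the abstract highlights.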

翻訳日:2023-10-25 19:52:38 公開日:2023-10-24

# coannotating: データアノテーションのための人間と大規模言語モデル間の不確実性に基づく作業割り当て

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation ( http://arxiv.org/abs/2310.15638v1 )

ライセンス: Link先を確認

Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang

(参考訳) 注釈付きデータは、訓練モデルにおいて自然言語処理(NLP)において重要な役割を果たす。近年のLLM(Large Language Models)の発展を踏まえると、ChatGPTのようなモデルは、人間のアノテーションと同等かそれ以上の多くのテキストアノテーションタスクにおいてゼロショット機能を示す。このようなllmは、コストの低減とスケーラビリティの向上により、手動アノテーションの代替として機能する。しかし,LLMを補完的なアノテータとして活用した限定的な研究や,品質とコストの両方の目的を達成するために,人間とLLMの間でアノテーション作業がどのように最適に割り当てられているかを考察した。本稿では,非構造化テキストの大規模共同アノテーションのための新しいパラダイムであるCoAnnotatingを提案する。この枠組みでは、不確実性を利用してLCMのアノテーション能力を推定する。我々の実証研究は、CoAnnotatingが、異なるデータセットで結果から作業を割り当てる効果的な手段であることを示し、ランダムベースラインよりも最大21%パフォーマンスが改善されている。コード実装についてはhttps://github.com/SALT-NLP/CoAnnotatingを参照。

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives to manual annotation, due to lower costs and higher scalability. However, limited work has leveraged LLMs as complementary annotators or explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs' annotation capability. Our empirical study on different datasets shows CoAnnotating to be an effective means of allocating work, with up to 21% performance improvement over a random baseline. For code implementation, see https://github.com/SALT-NLP/CoAnnotating.
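
One way to picture uncertainty-guided allocation is sketched below. It assumes that uncertainty is measured as the entropy of labels returned by repeated LLM queries on the same item and that items above a threshold go to humans; both choices, and all names, are illustrative assumptions rather than the paper's confirmed procedure.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution for one item."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def allocate(llm_votes, threshold):
    """Split items between human and LLM annotators.

    llm_votes maps an item id to the labels obtained from repeated LLM queries
    (hypothetical input format). High-entropy items are routed to humans.
    """
    human, llm = [], []
    for item_id, votes in llm_votes.items():
        (human if label_entropy(votes) > threshold else llm).append(item_id)
    return human, llm
```

Raising the threshold sends more items to the LLM and lowers cost; lowering it sends more to humans and favours quality.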

翻訳日:2023-10-25 19:52:22 公開日:2023-10-24

# Resume Representation Learningとスキルベースマッチングを用いたキャリアパス予測

Career Path Prediction using Resume Representation Learning and Skill-based Matching ( http://arxiv.org/abs/2310.15636v1 )

ライセンス: Link先を確認

Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder and Thomas Demeester

(参考訳) 求職者の満足度とパフォーマンスにフィットするパーソン・ジョブの影響は広く認識されており、キャリアにおける正しいタイミングで労働者に次のステップを提供することの重要性を強調している。キャリアの次のステップを予測するこのタスクは、キャリアパス予測と呼ばれ、ターンオーバー防止や社内仕事の移動といった多様な応用がある。既存のキャリアパス予測手法は、職種と企業間の相互作用をモデル化するために、大量のプライベートキャリア履歴データに依存している。本稿では,履歴書の作業経験セクションの一部である未検討のテキスト記述を活用することを提案する。 ESCOの職業ラベルにアノテートした2,164人の匿名キャリア履歴の構造化データセットを導入する。このデータセットに基づいて,作業履歴データ専用に設計された新しい表現学習手法である CareerBERT を提案する。キャリアパス予測のためのスキルベースモデルとテキストベースモデルを開発し,データセット上でそれぞれ35.24%と39.61%のrecall@10を達成した。最後に、ハイブリッドアプローチが43.01%のrecall@10で最強の結果を得るため、両方のアプローチが相補的であることを示す。

The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods for career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary, as a hybrid approach achieves the strongest result with 43.01% recall@10.
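
The recall@10 figures quoted above follow the usual ranking-metric definition, sketched here under the assumption that each test example has exactly one correct next occupation. The function name and input format are hypothetical.

```python
def recall_at_k(ranked_predictions, true_labels, k=10):
    """Fraction of examples whose true label appears in the top-k ranked list.

    ranked_predictions: one list per example, best candidate first.
    true_labels: the single correct label for each example (assumed).
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)
```

With this definition, a score of 43.01% means the correct next occupation was among the ten highest-ranked candidates for roughly 43 out of every 100 career histories.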

翻訳日:2023-10-25 19:51:51 公開日:2023-10-24

# 言語設計、ライブラリ、ガベージコレクションで64ビットアーキテクチャを最大限活用するためのヒント

Tips for making the most of 64-bit architectures in langage design, libraries or garbage collection ( http://arxiv.org/abs/2310.15632v1 )

ライセンス: Link先を確認

Benoît Sonntag (UNISTRA), Dominique Colnet (LORIA)

(参考訳) 今日標準になった64ビットアーキテクチャは、前例のない低レベルプログラミングの可能性を秘めている。コンピューティングの歴史上初めて、アドレスレジスタのサイズがバスの物理的容量をはるかに上回った。利用可能な64ビットに比べてアドレスが小さいことで得られる可能性を簡単に振り返った後、これらのレジスタの空きビットの活用方法について3つの具体例を示す。このうち2つは、新しい静的型付けプログラミング言語のライブラリ実装に関するものである。1つ目は多倍長整数の実装であり、計算速度とRAM節約の両面で性能を向上させることを目的とする。2つ目はライブラリにおけるUTF-8文字列の扱いに焦点を当てており、各UTF-8文字の物理サイズを意識せずに済むようにしてインデックス付けを容易にする。最後に3つ目は、ガベージコレクタ、特にマーク&スイープのオブジェクトマーキングフェーズに対する改善案である。

The 64-bit architectures that have become standard today offer unprecedented low-level programming possibilities. For the first time in the history of computing, the size of address registers far exceeded the physical capacity of their bus. After a brief reminder of the possibilities offered by the small size of addresses compared to the available 64 bits, we develop three concrete examples of how the vacant bits of these registers can be used. Among these examples, two concern the implementation of a library for a new statically typed programming language. The first is the implementation of multi-precision integers, with the aim of improving performance in terms of both calculation speed and RAM savings. The second example focuses on the library's handling of UTF-8 character strings. Here, the idea is to make indexing easier by ignoring the physical size of each UTF-8 character. Finally, the third example is a possible enhancement of garbage collectors, in particular the mark & sweep for the object marking phase.
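
The idea of using vacant bits in a 64-bit word can be sketched as follows. The layout is an assumption for illustration only: it supposes 48 significant address bits (common on current x86-64 systems) and keeps a small tag in the upper 16 bits, with the lowest tag bit acting as a hypothetical garbage-collector mark flag. The abstract does not specify the paper's actual encoding, and Python integers are used here purely to show the bit arithmetic.

```python
ADDRESS_BITS = 48                          # assumed width of a usable address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_MASK = (1 << (64 - ADDRESS_BITS)) - 1  # the 16 vacant upper bits
MARK_BIT = 0x1                             # hypothetical: tag bit 0 = GC mark


def pack(address: int, tag: int) -> int:
    """Store a tag in the unused upper bits of a 64-bit word."""
    if address & ~ADDRESS_MASK:
        raise ValueError("address does not fit in ADDRESS_BITS")
    if tag & ~TAG_MASK:
        raise ValueError("tag does not fit in the vacant bits")
    return (tag << ADDRESS_BITS) | address


def unpack(word: int):
    """Recover (address, tag) from a packed word."""
    return word & ADDRESS_MASK, word >> ADDRESS_BITS


def set_mark(word: int) -> int:
    """Set the mark flag without touching the address bits."""
    return word | (MARK_BIT << ADDRESS_BITS)


def is_marked(word: int) -> bool:
    return bool((word >> ADDRESS_BITS) & MARK_BIT)
```

In a real runtime the tag must be masked off before the word is dereferenced, since hardware generally expects the upper bits of a pointer to follow a fixed pattern.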

翻訳日:2023-10-25 19:51:18 公開日:2023-10-24

# 圧縮量子波形推定

Compressive quantum waveform estimation ( http://arxiv.org/abs/2310.15630v1 )

ライセンス: Link先を確認

Alex Tritt, Joshua Morris, Christopher C. Bounds, Hamish A. M. Taylor, James Saunderson, L. D. Turner

(参考訳) 量子センサーを信号全体のサンプリングに適用すること(量子波形推定)は、医療研究のためにニューロンが生成する電気パルスのモニタリングなど、小さな信号のセンシングに革命をもたらすことが期待される。しかし、量子リソース(例えば、長いセンシング時間や多くの破壊的測定)の集中的な使用により、現在の実装は現実世界での使用には実用的ではない。本論文では、合成した神経様信号の量子波形推定を実験的に実証し、素朴に必要となるよりもはるかに少ない冷却原子測定で実現した。

Applying quantum sensors to sample entire signals (quantum waveform estimation) promises to revolutionize the sensing of small signals, such as the monitoring of electrical pulses generated by neurons for medical research. However, intensive use of quantum resources (e.g., long sensing times and/or many destructive measurements) makes current implementations impractical for real-world use. In this Letter, we experimentally demonstrate quantum waveform estimation of a synthesized neural-like signal, taking many fewer cold-atom measurements than would naively be necessary.

翻訳日:2023-10-25 19:50:55 公開日:2023-10-24

# シリコン系バレーフォトニック結晶における光周波数コムのオンチップ位相輸送

On-chip topological transport of optical frequency combs in silicon-based valley photonic crystals ( http://arxiv.org/abs/2310.15629v1 )

ライセンス: Link先を確認

Zhen Jiang, Hongwei Wang, Yuechen Yang, Yang Shen, Bo Ji, Yanghe Chen, Yong Zhang, Lu Sun, Zheng Wang, Chun Jiang, Yikai Su, and Guangqiang He

(参考訳) 集積フォトニックシステムにおける光周波数コムの生成と制御は、複雑で高可制御性で大規模デバイスを可能にする。平行して、多粒子系におけるトポロジカル物理学の活用は、製造の不完全性に対する堅牢性のような魅力的な特徴を持つ。ここでは,古典的領域と非古典的領域の両方において,通信波長における光周波数コムのオンチップトポロジ輸送を実験的に実証する。量子周波数コムと消散性Kerrソリトンコムの両方にマイクロ共振器でアクセスする。量子周波数コム、すなわち多重周波数モードのコヒーレント重ね合わせは、周波数絡み合いqudit状態であることが証明されている。また, 散逸性カーソリトンコームは, 集団的コヒーレンスやソリトンの自己組織化により, 高いコヒーレント性とモード同期性を示す。さらに、バレー・キンク状態は、量子周波数コムと散逸性カー・ソリトンコムの両方を、鋭い曲がりに対する頑丈さで許容する。位相的に保護された光周波数コムは、複合フォトニックシステムにおいて固有のロバスト性を可能にする。

The generation and control of optical frequency combs in integrated photonic systems enables complex, highly controllable, and large-scale devices. In parallel, harnessing topological physics in multipartite systems has endowed them with compelling features such as robustness against fabrication imperfections. Here we experimentally demonstrate on-chip topological transport for optical frequency combs at telecommunication wavelengths, both in classical and nonclassical domains. We access both the quantum frequency combs and dissipative Kerr soliton combs with a micro-resonator. The quantum frequency comb, that is, a coherent superposition of multiple frequency modes, is proven to be a frequency-entangled qudit state. We also show that dissipative Kerr soliton combs are highly coherent and mode-locked due to the collective coherence or self-organization of solitons. Moreover, the valley kink states support both quantum frequency combs and dissipative Kerr soliton combs with robustness against sharp bends. Our topologically protected optical frequency combs could enable inherent robustness in integrated complex photonic systems.

翻訳日:2023-10-25 19:50:28 公開日:2023-10-24

# 文脈指向非巡回グラフ

Contextual directed acyclic graphs ( http://arxiv.org/abs/2310.15627v1 )

ライセンス: Link先を確認

Ryan Thompson, Edwin V. Bonilla, Robert Kohn

(参考訳) 観測データから有向非巡回グラフ(DAG)の構造を推定することは、機械学習において重要な課題である。この地域のほとんどの研究は、人口の1つのDAGを学ぶことに集中している。本稿では、利用可能な「文脈的」特徴に基づき、個人間でグラフ構造が変化する別の設定を検討する。我々は、コンテキスト特徴を重み付き隣接行列として表されるDAGにマッピングするニューラルネットワークを介して、このコンテキストDAG問題に取り組む。ニューラルネットワークは、出力行列がスパースであることを保証する新規な投影層を備え、最近開発された非循環性の特徴を満足する。我々は,コンテキストDAGを学習するためのスケーラブルな計算フレームワークを考案し,プロジェクション層をバックプロパゲーションするための収束保証と解析的勾配を提供する。実験の結果,既存手法が失敗するコンテキスト固有グラフを復元できる可能性が示唆された。

Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.

翻訳日:2023-10-25 19:50:02 公開日:2023-10-24

# gupnet++: 単眼3次元物体検出のための幾何不確かさ伝播ネットワーク

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection ( http://arxiv.org/abs/2310.15624v1 )

ライセンス: Link先を確認

Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

(参考訳) 幾何学は単眼3次元物体検出において重要な役割を担っている。物体の物理的大きさと画像平面の2次元投影の間の視点投影を用いて物体の深さを推定することができ、深部モデルに数学的先行性を導入することができる。しかし、このプロジェクションプロセスは、推定高さの誤差を増幅し、投影された深さに反映する誤差増幅も導入する。信頼できない深さの推測を導き、トレーニングの安定性を損なう。そこで本研究では,幾何投影を確率論的にモデル化し,新たな幾何不確かさ伝播ネットワーク(gupnet++)を提案する。これにより、深さ予測が十分に拘束され、合理的な不確実性に結びつくことが保証される。このような幾何学的不確実性を導入する意義は、2つある:(1)。トレーニング中の幾何射影の不確かさ伝播関係をモデル化し、エンドツーエンドモデル学習の安定性と効率を向上させる。 (2). 3D検出結果の品質を示す信頼性の高い信頼性に導出することができ、より信頼性の高い検出推測を可能にする。実験により,提案手法は画像ベースモノクロ3次元検出におけるSOTA性能を得るだけでなく,簡易なフレームワークによる有効性も示す。

Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between an object's physical size and its 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected in the projected depth. This leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) that models geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1) It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning. (2) It can be used to derive a highly reliable confidence score that indicates the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superior efficacy with a simplified framework.
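
The error amplification mentioned above can be seen in a small worked example. It assumes the standard pinhole relation depth = focal length x physical height / image height; the function name and the numbers are illustrative and not taken from the paper.

```python
def projected_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Depth from the pinhole relation: depth = f * H_3d / h_2d."""
    return focal_px * height_3d_m / height_2d_px


# Hypothetical values: a 1.5 m tall object that appears 35 px tall
# through a camera with a 700 px focal length.
true_depth = projected_depth(700.0, 1.5, 35.0)
# Overestimating the physical height by 0.25 m shifts the depth by
# (f / h_2d) * 0.25, i.e. the height error is scaled by a factor of 20.
biased_depth = projected_depth(700.0, 1.75, 35.0)
depth_error = biased_depth - true_depth
```

Because the scaling factor f / h_2d grows as the object gets smaller in the image, distant objects suffer the largest depth errors, which is one motivation for attaching an uncertainty to the projected depth.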

翻訳日:2023-10-25 19:49:30 公開日:2023-10-24

# Nkoの機械翻訳:ツール、コーパス、ベースライン結果

Machine Translation for Nko: Tools, Corpora and Baseline Results ( http://arxiv.org/abs/2310.15612v1 )

ライセンス: Link先を確認

Moussa Koulako Bala Doumbouya, Baba Mamadi Diané, Solo Farabado Cissé, Djibrila Diané, Abdoulaye Sow, Séré Moussa Doumbouya, Daouda Bangoura, Fodé Moriba Bayo, Ibrahima Sory 2. Condé, Kalo Mory Diané, Chris Piech, Christopher Manning

(参考訳) 現在、複数の西アフリカ諸国で何千万人もの人々が話している言語であるNkoの機械翻訳システムは存在しない。この問題に対処するために,現在十分に大きな並列テキストコーパスを持っていないNkoや他の言語向けの機械翻訳システムの開発を目的とした,ツール,リソース,ベースラインの一連の結果を示す。 (1) Friallel: コピーエディットベースのワークフローによる品質管理を取り入れた,新しい協調型並列テキストキュレーションソフトウェア。 (2) FLoRes-200とNLLB-Seedのコーパスを、それぞれ204言語および40言語と並行する2,009件および6,193件の高品質なNko翻訳で拡張した。 (3) nicolingua-0005: 130,850の並列セグメントを持つ三言語・二言語コーパスと300万以上のNko単語を含む単言語コーパスのコレクション。 (4) ベースラインバイリンガルおよび多言語ニューラルマシン翻訳の結果、最良のモデルはFLoRes-devtest上で英語-Nko chrF++スコア30.83を達成した。

Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed towards the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available. (1) Friallel: A novel collaborative parallel text curation software that incorporates quality control through copyedit-based workflows. (2) Expansion of the FLoRes-200 and NLLB-Seed corpora with 2,009 and 6,193 high-quality Nko translations in parallel with 204 and 40 other languages. (3) nicolingua-0005: A collection of trilingual and bilingual corpora with 130,850 parallel segments and monolingual corpora containing over 3 million Nko words. (4) Baseline bilingual and multilingual neural machine translation results with the best model scoring 30.83 English-Nko chrF++ on FLoRes-devtest.

翻訳日:2023-10-25 19:49:08 公開日:2023-10-24

# Slisemapを使って物理データを解釈する

Using Slisemap to interpret physical data ( http://arxiv.org/abs/2310.15610v1 )

ライセンス: Link先を確認

Lauri Seppäläinen, Anton Björklund, Vitus Besel and Kai Puolamäki

(参考訳) マニフォールド可視化技術は、物理科学における高次元データセットの可視化に一般的に用いられている。本稿では,最近導入されたsliseと呼ばれる多様体可視化法を,物理と化学のデータセットに適用する。 Slisemapは、多様体の可視化と説明可能な人工知能を組み合わせる。説明可能な人工知能は、ブラックボックス機械学習モデルと複雑なシミュレータの決定過程を調べるために使用される。 Slisemapでは、類似のローカル説明を持つデータ項目がグループ化されるような埋め込みが見つかる。従って、slisemapは、ブラックボックスモデルのさまざまな振る舞いの概要を提供する。これにより、Slisemapは教師付き多様体可視化法となり、埋め込みのパターンは対象特性を反映する。本稿では,Slisemapを物理データ上でどのように利用し,評価し,Slisemapがこれらのデータセットでトレーニングされた分類と回帰モデルに関する有意義な情報を見つけるのに有効であることを示す。

Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slisemap, to datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. In this paper we show how Slisemap can be used and evaluated on physical data and that Slisemap is helpful in finding meaningful information on classification and regression models trained on these datasets.

翻訳日:2023-10-25 19:48:52 公開日:2023-10-24

# 限界テスト: 大規模言語モデルを用いたモバイルアプリクラッシュ検出のための不規則テキスト入力生成

Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model ( http://arxiv.org/abs/2310.15657v1 )

ライセンス: Link先を確認

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang

(参考訳) モバイルアプリは私たちの日常生活のユビキタスな部分となり、ユーザはさまざまなサービスやユーティリティにアクセスできるようになる。テキスト入力は、ユーザとアプリケーションの間の重要な対話チャネルとして、検索クエリ、認証、メッセージングなどのコア機能において重要な役割を果たす。しかし、特定の特別なテキスト(例えばFont Sizeの-18)はアプリをクラッシュさせる可能性があり、アプリを十分にテストするための多様な異常入力の生成が強く求められている。しかし、これは組合せ爆発のジレンマ、高い文脈感度、複雑な制約関係のために困難でもある。本稿では,LLMを利用してモバイルアプリのクラッシュ検出のための異常なテキスト入力を自動的に生成するInputBlasterを提案する。異常な入力生成問題を一連のテストジェネレータを生成するタスクとして定式化し、それぞれが同じ突然変異規則の下で異常なテキスト入力のバッチを生成する。詳しくは、InputBlasterがLLMを利用して、推論チェインとして機能する突然変異ルールと共にテストジェネレータを生成し、コンテキスト内学習スキーマを使用して、性能向上のための例をLLMに示す。 InputBlasterは、31の人気のあるAndroidアプリに含まれる、クラッシュバグを持つ36のテキスト入力ウィジェットで評価され、78%のバグ検出率を達成し、最高のベースラインよりも136%高い。また、自動GUIテストツールと統合し、Google Playの現実世界のアプリで37の未知のクラッシュを検出した。

Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays an important role in core functionality such as search queries, authentication, messaging, etc. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, and generating diversified unusual inputs for fully testing the app is highly demanded. Nevertheless, this is also challenging due to the combinatorial explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster, which leverages the LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. In detail, InputBlaster leverages the LLM to produce the test generators together with the mutation rules serving as the reasoning chain, and utilizes the in-context learning schema to provide the LLM with examples for boosting the performance. InputBlaster is evaluated on 36 text input widgets with crash bugs involving 31 popular Android apps, and results show that it achieves a 78% bug detection rate, 136% higher than the best baseline. Besides, we integrate it with the automated GUI testing tool and detect 37 unseen crashes in real-world apps from Google Play.

翻訳日:2023-10-25 19:43:07 公開日:2023-10-24

# モメンタム勾配に基づくハイパーグラフニューラルネットワークの標的外攻撃

Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks ( http://arxiv.org/abs/2310.15656v1 )

ライセンス: Link先を確認

Yang Chen, Stjepan Picek, Zhonglin Ye, Zhaoyang Wang and Haixing Zhao

(参考訳) ハイパグラフニューラルネットワーク(HGNN)は,高次表現能力に優れたため,様々なハイパーグラフ関連タスクに適用されている。近年の研究では、ディープラーニングモデルは敵の攻撃に弱いことが示されている。グラフニューラルネットワーク(GNN)を対象とするグラフ敵攻撃の研究はほとんど行われておらず、HGNNに対する敵攻撃の研究はほとんど未解明である。本稿では,このギャップを低減しようと試みる。我々は、ノード機能の変更に焦点を当てた、未ターゲット攻撃のための新しいHGNN攻撃モデル、MGHGAを設計する。我々はHGNNのトレーニングの過程を考察し、ハイパーグラフモデリングの前に代理モデルを用いて攻撃を実装する。具体的には、MGHGAは2つの部分から構成される。我々は,特徴選択モジュールにおける攻撃ノード機能を選択するために運動量勾配機構を用いる。特徴修正モジュールでは、MGHGAを離散的かつ連続的なデータセットに適用するために、2つの特徴生成アプローチ(直接修正と符号勾配)を用いる。我々は,5つのベンチマークデータセットを用いて,ノードにおけるMGHGAの攻撃性能と視覚オブジェクト分類タスクを検証する。その結果,MGHGAはベースラインよりも平均2%向上した。

Hypergraph Neural Networks (HGNNs) have been successfully applied in various hypergraph-related tasks due to their excellent higher-order representation capabilities. Recent works have shown that deep learning models are vulnerable to adversarial attacks. Most studies on graph adversarial attacks have focused on Graph Neural Networks (GNNs), and the study of adversarial attacks on HGNNs remains largely unexplored. In this paper, we try to reduce this gap. We design a new HGNN attack model for the untargeted attack, namely MGHGA, which focuses on modifying node features. We consider the process of HGNN training and use a surrogate model to implement the attack before hypergraph modeling. Specifically, MGHGA consists of two parts: feature selection and feature modification. We use a momentum gradient mechanism to choose the attack node features in the feature selection module. In the feature modification module, we use two feature generation approaches (direct modification and sign gradient) to enable MGHGA to be employed on discrete and continuous datasets. We conduct extensive experiments on five benchmark datasets to validate the attack performance of MGHGA in the node and the visual object classification tasks. The results show that MGHGA improves performance by an average of 2% compared to the baselines.

翻訳日:2023-10-25 19:42:41 公開日:2023-10-24

# 軽量cnnネットワークによる光流れの輝度整合性の破れ

Breaking of brightness consistency in optical flow with a lightweight CNN network ( http://arxiv.org/abs/2310.15655v1 )

ライセンス: Link先を確認

Yicheng Lin, Shuo Wang, Yunlong Jiang and Bin Han

(参考訳) スパース光フローは様々なコンピュータビジョンタスクで広く使われているが、輝度の一貫性がハイダイナミックレンジ(HDR)環境での性能を制限すると仮定する。本研究では,光の強い畳み込み特性と強い不変性を持つコーナーを抽出するために,軽量ネットワークを用いる。畳み込み特性の整合性に対する光学流法の典型的な輝度の整合性を変化させると、光ローバストハイブリッド光流法が得られる。提案するネットワークは,4つの畳み込み層のみを使用して特徴マップとスコアマップを同時に抽出するため,商用CPU上で190 FPSで動作する。浅層ネットワークを直接訓練することは難しいため、深層ネットワークは信頼性マップを計算してそれを支援するように設計されている。両ネットワークでエンドツーエンドの教師なしトレーニングモードが使用される。提案手法の有効性を検証するため, 動的照明下でのコーナーリピータビリティと原点光流のマッチング性能を比較した。さらに、VINS-Monoの光学フロー法を置き換えることにより、より正確な視覚慣性システムを構築する。パブリックなHDRデータセットでは、翻訳エラーを93%削減する。コードはhttps://github.com/linyicheng1/LET-NETで公開されている。

Sparse optical flow is widely used in various computer vision tasks; however, its assumption of brightness consistency limits its performance in High Dynamic Range (HDR) environments. In this work, a lightweight network is used to extract illumination-robust convolutional features and corners with strong invariance. Replacing the typical brightness consistency of the optical flow method with convolutional feature consistency yields the light-robust hybrid optical flow method. The proposed network runs at 190 FPS on a commercial CPU because it uses only four convolutional layers to extract feature maps and score maps simultaneously. Since the shallow network is difficult to train directly, a deep network is designed to compute the reliability map that helps it. An end-to-end unsupervised training mode is used for both networks. To validate the proposed method, we compare corner repeatability and matching performance with the original optical flow under dynamic illumination. In addition, a more accurate visual inertial system is constructed by replacing the optical flow method in VINS-Mono. In a public HDR dataset, it reduces translation errors by 93%. The code is publicly available at https://github.com/linyicheng1/LET-NET.

翻訳日:2023-10-25 19:42:21 公開日:2023-10-24

# llms生成コンテンツの検出に関する調査研究

A Survey on Detection of LLMs-Generated Content ( http://arxiv.org/abs/2310.15654v1 )

ライセンス: Link先を確認

Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng

(参考訳) ChatGPTのような先進的な大規模言語モデル(LLM)の急成長は、メディア、サイバーセキュリティ、公開談話、教育など、さまざまな分野に影響を及ぼす合成コンテンツ生成の増加につながっている。そのため,LSMの生成する内容を検出する能力は重要視されている。我々は,既存の検出戦略とベンチマークの詳細な概要を提供し,それらの相違点を精査し,この分野の重要な課題と展望を特定し,検出精度を高めるためにより適応的で堅牢なモデルを提案する。また,LSMの急速な機能向上に対応するため,様々な攻撃に対して多面的アプローチの必要性を示唆する。我々の知る限り、この研究はLLMの時代の検出に関する最初の総合的な調査である。我々は,LLMが生成するコンテンツ検出の現在の状況について広く理解し,合成コンテンツに支配される時代において,デジタル情報の完全性を維持しようと努力する研究者や実践者に対して,ガイダンスを提供することを期待している。関連論文の要約はhttps://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.gitで一貫して更新される。

The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.

翻訳日:2023-10-25 19:42:02 公開日:2023-10-24

# メタ学習によるグラフ上の知覚的公正攻撃

Deceptive Fairness Attacks on Graphs via Meta Learning ( http://arxiv.org/abs/2310.15653v1 )

ライセンス: Link先を確認

Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong

(参考訳) グラフ学習モデルにおいて、どのようにして有害な攻撃を達成し、偏見を欺いて悪化させることができるのか? 本稿では,二段階最適化問題を通じてこの問題に答え,FATEというメタ学習ベースのフレームワークを提案する。 FATEは、様々な公正定義やグラフ学習モデル、操作操作の任意の選択に関して広く適用できる。さらに、グラフニューラルネットワーク上での統計的パリティと個別の公正性を攻撃するためにFATEをインスタンス化する。半教師付きノード分類のタスクにおいて,実世界のデータセットに対する広範な実験評価を行う。実験の結果,下流タスクの実用性を維持しつつ,公平性を考慮したグラフニューラルネットワークのバイアスを増大させる可能性が示唆された。本稿では、公正グラフ学習の対角的堅牢性に関する洞察を提供し、将来の研究における堅牢かつ公正なグラフ学習の設計に光を当てることを望む。

We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.

翻訳日:2023-10-25 19:41:42 公開日:2023-10-24

# 効率的な事前学習音声モデルとしての動的畳み込みニューラルネットワーク

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models ( http://arxiv.org/abs/2310.15648v1 )

ライセンス: Link先を確認

Florian Schmid, Khaled Koutini, Gerhard Widmer

(参考訳) audiosetのような大規模なオーディオデータセットの導入は、トランスフォーマーがオーディオドメインを克服し、cnnを最先端のニューラルネットワークアーキテクチャとして多くのタスクで置き換える手段となった。 Audio Spectrogram Transformerは大規模なデータセットを活用するのに優れており、下流タスクで微調整されたときにCNNを超える強力な事前学習モデルを生成する。しかし、現在の一般的なAudio Spectrogram Transformersは、CNNと比較して計算複雑性の点で要求されている。近年, Transformer-to-CNN Knowledge Distillation を用いることで, 効率的な CNN は, 大規模データセット上での Transformer に追いつき, 性能も向上することが示された。本研究では, 動的非線形性, 動的畳み込み, および注意機構からなる動的cnnブロックを導入することにより, この研究範囲を拡大し, 効率的なcnnのキャパシティを向上させる。これらの動的CNNは,大規模オーディオセットの音声タグ付け作業において,性能・複雑度トレードオフとパラメータ効率の観点から,従来のCNNよりも優れていることを示す。さらに,導入した動的cnnは,ダウンストリームタスクの性能向上とスケールアップ,トランスフォーマー性能の向上,オーディオセットやダウンストリームタスクよりも優れたパフォーマンスを実現していることを示す。

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.

翻訳日:2023-10-25 19:41:28 公開日:2023-10-24

# マスク付き特徴アライメントを持つ平均教師DETR:ロバストドメイン適応検出トランスフレームワーク

Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework ( http://arxiv.org/abs/2310.15646v1 )

ライセンス: Link先を確認

Weixi Weng, Chun Yuan

(参考訳) 非教師付きドメイン適応オブジェクト検出(UDAOD)による検出変換器(DETR)の研究は主に特徴アライメントに焦点を当てており、既存の手法は2つの種類に分けられる。 1段階の機能アライメント手法は、パフォーマンスの変動やトレーニングの停滞を容易に引き起こすことができる。平均教師に基づく2段階特徴アライメント手法は、事前訓練段階に続き、自己訓練段階と、信頼性の高い事前訓練モデルの獲得と一貫した性能向上の達成に直面する課題を含む。上述の手法では、ターゲットライクなドメインのような第3の関連ドメインをどのように活用して適応を支援するかはまだ検討されていない。これらの問題に対処するため、我々はMTMと呼ばれる2段階のフレームワーク、すなわちMasked Feature Alignmentを用いた平均教師-DETRを提案する。事前訓練段階では,画像スタイルの転送によって生成されたラベル付きターゲットライクな画像を用いて,性能変動を回避する。自己学習段階において,平均教師に基づく擬似ラベルによるラベル付き目標画像の活用と,学生モデルの一貫したパフォーマンス向上を実現するために,オブジェクトクエリ知識転送(oqkt)と呼ばれるモジュールを提案する。最も重要なことは,Masked Domain Query-based Feature Alignment (MDQFA) や Masked Token-wise Feature Alignment (MTWFA) といったマスク付き機能アライメント手法によって,トレーニングの停滞を防止し,事前訓練段階における堅牢な事前訓練モデルを実現するとともに,自己学習段階におけるモデルの目標性能を向上させることにある。 3つの難解なシナリオの実験と理論的解析はmtmの有効性を検証する。

Unsupervised domain adaptation object detection (UDAOD) research on Detection Transformer (DETR) mainly focuses on feature alignment, and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment methods based on mean teacher comprise a pretraining stage followed by a self-training stage, each facing problems in obtaining a reliable pretrained model and achieving consistent performance gains. The methods mentioned above have not yet explored how to utilize a third related domain, such as a target-like domain, to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e., Mean Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher and propose a module called Object Queries Knowledge Transfer (OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods including Masked Domain Query-based Feature Alignment (MDQFA) and Masked Token-wise Feature Alignment (MTWFA) to alleviate domain shift in a more robust way, which not only prevent training stagnation and lead to a robust pretrained model in the pretraining stage, but also enhance the model's target performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.
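The self-training stage described above builds on the mean teacher idea, in which the teacher's weights track an exponential moving average (EMA) of the student's weights. The following is a minimal sketch of that generic update only, assuming weights stored as a plain dictionary of floats. The name `ema_update` and the decay value are assumptions for illustration, not details from the MTM paper.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an EMA of the student weights.

    teacher, student: dicts mapping parameter names to floats.
    decay: how much of the old teacher is kept at each step (assumed value).
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy weights (hypothetical): the teacher moves a small step toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

Because the teacher changes slowly, the pseudo labels it produces for unlabeled target images tend to be more stable than the student's own predictions, which is the usual motivation for this design.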

翻訳日:2023-10-25 19:41:03 公開日:2023-10-24

# Droidをライトアップ! Android マルウェア検出におけるアプリケーション難読化に対する静的解析機能の有効性について

Light up that Droid! On the Effectiveness of Static Analysis Features against App Obfuscation for Android Malware Detection ( http://arxiv.org/abs/2310.15645v1 )

ライセンス: Link先を確認

Borja Molina-Coronado, Antonio Ruggia, Usue Mori, Alessio Merlo, Alexander Mendiburu, Jose Miguel-Alonso

(参考訳) マルウェアの作者は、難読化を静的解析機能に基づいてマルウェア検出をバイパスする手段と見なしている。 Androidでは、多くのアンチマルウェア製品が単純なプログラム変換で容易に回避できることが確認されている。これらの作業とは対照的に、静的解析機能を活用したAndroid用のML検出提案も難読化耐性として提案されている。したがって、特定の難読化戦略やツールの使用が、静的解析機能に基づくandroid用のmlマルウェア検出器の妥当性のリスクの程度を決定する必要がある。本稿では,静的解析を用いて抽出した共通特徴に対する特定の難読化技術の影響を評価し,これらの特徴に依存するMLマルウェア検出装置の有効性を損なうのに十分重要な変化かどうかを判定する。実験結果から, 難読化技術は静的解析の全ての特徴を異なるツールで異なる程度に変化させることが示唆された。しかし,特定の特徴は,難読化が存在する場合でもMLマルウェア検出の有効性を保っている。これらの知見に基づいて,難読化対策に頑健なAndroid用MLマルウェア検出器を提案し,現状の最先端検知器よりも優れた性能を示す。

Malware authors have seen obfuscation as a means to bypass malware detectors based on static analysis features. For Android, several studies have confirmed that many anti-malware products are easily evaded with simple program transformations. In contrast to these works, ML detection proposals for Android that leverage static analysis features have also been presented as obfuscation-resilient. Therefore, it needs to be determined to what extent the use of a specific obfuscation strategy or tool poses a risk to the validity of ML malware detectors for Android based on static analysis features. To shed some light in this regard, in this article we assess the impact of specific obfuscation techniques on common features extracted using static analysis and determine whether the changes are significant enough to undermine the effectiveness of ML malware detectors that rely on these features. The experimental results suggest that obfuscation techniques affect all static analysis features to varying degrees across different tools. However, certain features retain their validity for ML malware detection even in the presence of obfuscation. Based on these findings, we propose an ML malware detector for Android that is robust against obfuscation and outperforms current state-of-the-art detectors.

翻訳日:2023-10-25 19:40:31 公開日:2023-10-24

# フィンランドにおけるICTインハウス調達の旅 : 法的枠組みと実践的課題の評価

Navigating ICT In-House Procurement in Finland: Evaluating Legal Frameworks and Practical Challenges ( http://arxiv.org/abs/2310.15643v1 )

ライセンス: Link先を確認

Reetta Ghezzi, Minnamaria Korhonen, Hannu Vilpponen, and Tommi Mikkonen

(参考訳) 内調達は公共調達の分野で物議を醸している問題である。簡単に言えば、このような調達はベンダーの公平かつ平等な扱いの特定の側面を見渡すことができる。本稿では,フィンランド市町村におけるICTの社内調達に関する質的研究について述べる。半構造化インタビューは自治体の利害関係者からの洞察を集めるために行われた。接地理論のアプローチを用いて、データ分析はフィンランドの自治体とそれに関連する内的実体の間の複雑なダイナミクスを示している。それでもなお、社内調達を管理する法的枠組みが複雑で議論されていることは明らかである。

In-house procurement is a controversial issue in the field of public procurement. Simply put, such procurement allows overlooking certain aspects of fair and equal treatment of vendors. This paper presents qualitative research on in-house ICT procurement within Finnish municipalities. Semi-structured interviews were conducted to gather insights from municipal stakeholders. Using a grounded theory approach, data analysis shows intricate dynamics between Finnish municipalities and the in-house entities associated with them. Still, it is clear that the legal framework governing in-house procurement remains intricate and debated.

翻訳日:2023-10-25 19:40:14 公開日:2023-10-24

# GitBug-Actions:GitHubアクションで再現可能なバグフィックスベンチマークを構築する

GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions ( http://arxiv.org/abs/2310.15642v1 )

ライセンス: Link先を確認

Nuno Saavedra, André Silva, Martin Monperrus

(参考訳) バグフィックスベンチマークは、自動プログラム修復(APR)やフォールトローカライゼーション(FL)など、ソフトウェア工学の様々なサブフィールドを進化させる上で基本的なものである。優れたベンチマークには、今日の技術と開発プラクティスを正確に反映する最近の例を含める必要があります。長期的に実行可能なベンチマークは、例えば、もはや利用できない依存関係のために、残業時間を劣化しないテストスイートを特徴としなければならない。既存のベンチマークは両方の基準を満たさない。例えば、最上位のjavaベンチマークである defects4j が、2020年にアップデートされた。さらに、既存のベンチマークの大半では、完全な再現性は無視されている。本稿では,gitbug-actionsについて述べる。最新かつ完全に再現可能なバグフィックスを用いて,バグフィックスベンチマークを構築するための新しいツールである。 GitBug-Actionsは、最も人気のあるCIプラットフォームであるGitHub Actionsに依存して、バグフィックスを検出し、制御された再現可能な環境でCIパイプラインをスマートにローカルに実行する。私たちの知る限りでは、GitHub Actionsを使ってバグフィックスを収集するのは初めてです。ツールチェーンを示すために、gitbug-actionsをデプロイして、さまざまなリポジトリから実行可能な、完全に再現可能なバグ修正を含む、概念実証のgoバグフィックスベンチマークを構築します。 GitBug-Actionsをデモするビデオは、https://youtu.be/aBWwa1sJYBsで公開されている。

Bug-fix benchmarks are fundamental in advancing various sub-fields of software engineering such as automatic program repair (APR) and fault localization (FL). A good benchmark must include recent examples that accurately reflect technologies and development practices of today. To be executable in the long term, a benchmark must feature test suites that do not degrade over time due to, for example, dependencies that are no longer available. Existing benchmarks fail to meet both criteria. For instance, Defects4J, one of the foremost Java benchmarks, last received an update in 2020. Moreover, full reproducibility has been neglected by the majority of existing benchmarks. In this paper, we present GitBug-Actions: a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes. GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes and smartly locally execute the CI pipeline in a controlled and reproducible environment. To the best of our knowledge, we are the first to rely on GitHub Actions to collect bug-fixes. To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark containing executable, fully-reproducible bug-fixes from different repositories. A video demonstrating GitBug-Actions is available at: https://youtu.be/aBWwa1sJYBs.

翻訳日:2023-10-25 19:40:04 公開日:2023-10-24

# 引用論文からの知識集約による生物医学分野の抽象型要約の改善

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers ( http://arxiv.org/abs/2310.15684v1 )

ライセンス: Link先を確認

Chen Tang, Shun Wang, Tomas Goldsack and Chenghua Lin

(参考訳) バイオメディカル文学から派生した抽象化は、専門的な書体や、関連する文献の深い理解を必要とするバイオメディカル用語など、ドメイン固有の特徴を持っている。結果として、既存の言語モデルは、ドメイン固有の背景知識が欠如していることから、バイオメディカルの専門家が生み出したものと同等の技術的要約を生成するのに苦労する。本稿では,文献から引用された外部論文から知識を集約することにより,生物医学的抽象要約における言語モデルの性能を向上させることを目的とする。本稿では,引用論文からドメイン固有の知識を統合し,引用論文から論文の内容と関連知識の両方を活用することで要約をニューラルネットワークで生成する,新しい注目に基づく引用集約モデルを提案する。さらに,本研究の基盤となる大規模生物医学的要約データセットを構築し,公開する。広範な実験により,本モデルが最先端のアプローチを上回り,抽象的生物医学的テキスト要約の大幅な改善を達成していることが示された。

Abstracts derived from biomedical literature possess distinct domain-specific characteristics, including specialised writing styles and biomedical terminologies, which necessitate a deep understanding of the related literature. As a result, existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts, given the absence of domain-specific background knowledge. This paper aims to enhance the performance of language models in biomedical abstractive summarisation by aggregating knowledge from external papers cited within the source article. We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers, allowing neural networks to generate summaries by leveraging both the paper content and relevant knowledge from citation papers. Furthermore, we construct and release a large-scale biomedical summarisation dataset that serves as a foundation for our research. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.

翻訳日:2023-10-25 19:30:49 公開日:2023-10-24

# 集団作業における大規模言語モデルの利用状況と防止

Prevalence and prevention of large language model use in crowd work ( http://arxiv.org/abs/2310.15683v1 )

ライセンス: Link先を確認

Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West

(参考訳) 大規模言語モデル (LLM) の使用は, 群集労働者の間で広く普及しており, 目標緩和戦略は, LLM の使用を著しく削減するが, 排除しない。 LLMの使用に関して労働者が指示を受けていないテキスト要約タスクでは、LLMの使用頻度は30%程度と見積もられたが、LLMの使用を禁止し、コピーペーストを無効にすることで使用コストを高くすることで約半分削減された。 llmの使用は、(モデルではなく)人間の行動に関わる研究を害し、クラウドソースデータで訓練された将来のモデルを劣化させる可能性がある、高品質だが均質な反応をもたらす。同時に、llmの使用を防止することは、高品質な応答を得るのと相反する可能性がある。例えば、労働者にllmを使わないよう要求する場合、要約には必須情報を含むキーワードが少なかった。 llmが人気や能力を高め、利用に関する基準が変わるにつれ、私たちの見積もはおそらく変わるでしょう。しかし,LLMベースのツールとユーザの共同進化を理解することは,クラウドソーシングによる研究の妥当性を維持する鍵であり,広く普及する前に重要なベースラインを提供する。

We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.

翻訳日:2023-10-25 19:30:32 公開日:2023-10-24

# 多腕バンディットの固定予算実数値組合せ純粋探索

Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit ( http://arxiv.org/abs/2310.15681v1 )

ライセンス: Link先を確認

Shintaro Nakamura and Masashi Sugiyama

(参考訳) 固定予算設定におけるマルチアームバンディットの実測値について検討した。まず,動作クラスのサイズがアーム数に対して指数関数的に大きい場合でも,最善の動作を識別できる最初のアルゴリズムであるコンビネートアル・逐次アサイン(csa)アルゴリズムを導入する。 CSAアルゴリズムの誤差確率の上限は指数の対数係数までの下界と一致することを示す。次に、アクションクラスのサイズが多項式である場合には、minimax combinatorial sequential accepts and rejects(minimax-combsar)アルゴリズムという別のアルゴリズムを導入し、それが最適であることを示し、下界に一致することを示す。最後に,提案手法を従来の手法と実験的に比較し,アルゴリズムの性能が向上したことを示す。

We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, which matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.
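To make the fixed-budget setting concrete, here is a minimal sketch of the classical Successive Rejects procedure for ordinary (non-combinatorial) best-arm identification. It is a swapped-in, simpler technique, not the CSA or Minimax-CombSAR algorithm from the abstract. The `pull` callback, the arm means, and the budget are hypothetical.

```python
import math
import random


def successive_rejects(pull, num_arms, budget):
    """Fixed-budget best-arm identification by Successive Rejects.

    pull(arm) returns one noisy reward for the given arm index.
    Each phase samples all surviving arms equally, then drops the
    arm with the lowest empirical mean. Requires budget > num_arms.
    """
    if budget <= num_arms:
        raise ValueError("budget must exceed the number of arms")
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    prev_n = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls each surviving arm should have by now.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                totals[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        worst = min(active, key=lambda a: totals[a] / counts[a])
        active.remove(worst)
    return active[0]


# Toy problem (hypothetical): Gaussian rewards with fixed means.
rng = random.Random(0)
means = [0.2, 0.4, 0.9, 0.5]
best = successive_rejects(lambda a: rng.gauss(means[a], 0.1), len(means), budget=400)
print(best)
```

The total number of pulls never exceeds the budget, which is the defining constraint of the fixed-budget setting. In the combinatorial version studied in the paper, the object to identify is an action built from several arms rather than a single arm.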

翻訳日:2023-10-25 19:30:08 公開日:2023-10-24

# マルチモーダル3次元シーン理解の最近の進歩:包括的調査と評価

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation ( http://arxiv.org/abs/2310.15676v1 )

ライセンス: Link先を確認

Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang and Yang Yang

(参考訳) マルチモーダルな3Dシーン理解は、自律運転や人間とコンピュータのインタラクションなど、多くの分野で広く応用されているため、注目されている。従来の単一モードの3D理解と比較して、付加的なモダリティの導入は、シーン解釈の豊かさと精度を高めるだけでなく、より堅牢でレジリエントな理解を保証する。これは、3Dデータのみに依存することが不十分な環境において、特に重要になる。マルチカメラ画像(3D+2D)とテキスト記述(3D+言語)を統合するようなマルチモーダルな3D手法の開発が過去3年間に進んでいるが、包括的かつ詳細なレビューは特に欠落している。本稿では,このギャップを埋めるための最近の進歩を体系的に調査する。まず、様々な3次元マルチモーダルタスクを形式的に定義し、それらの固有の課題を要約する背景を紹介する。その後,既存の手法をモダリティやタスクに応じて徹底的に分類し,それぞれの強みや限界を探索する新しい分類法を提案する。さらに、いくつかのベンチマークデータセットに対する最近のアプローチと洞察に富んだ分析の比較結果も提供される。最後に,未解決問題について考察し,今後の研究への道筋について述べる。

Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over the past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing a background that formally defines various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.

翻訳日:2023-10-25 19:29:53 公開日:2023-10-24

# シュワルツシルトブラックホール近傍の量子性

Quantumness near a Schwarzschild black hole ( http://arxiv.org/abs/2310.15675v1 )

ライセンス: Link先を確認

S. Haddadi, M. A. Yurischev, M. Y. Abd-Rabbou, M. Azizi, M. R. Pourkarimi, M. Ghominejad

(参考訳) 量子情報科学と相対性理論の融合は、ブラックホールに関連する情報の伝達を取り巻く謎を理解する新しい機会を与える。この目的のために、シュワルツシルトブラックホール近傍の量子度をデコヒーレンスの下で実用モデルで研究する。本論文で検討するシナリオは、平らな領域の定常粒子が周囲の粒子と相互作用し、別の粒子がシュワルツシルトブラックホールの事象の地平線付近で自由落下する、というものである。ホーキング放射とデコヒーレンスが研究中の系に与える影響を調べ、これらの効果が量子特性の生存を阻害するが、完全に破壊できないことを発見した。したがって、この研究の結果は、曲がりくねった時空フレームワークの中で動作している実システムの量子特性の理解に貴重な洞察を与える可能性がある。

The merging of quantum information science with the relativity theory presents novel opportunities for understanding the enigmas surrounding the transmission of information in relation to black holes. For this purpose, we study the quantumness near a Schwarzschild black hole in a practical model under decoherence. The scenario we consider in this paper is that a stationary particle in the flat region interacts with its surroundings while another particle experiences free fall in the vicinity of a Schwarzschild black hole's event horizon. We explore the impacts of Hawking radiation and decoherence on the system under investigation and find that these effects can limit the survival of quantum characteristics, but cannot destroy them completely. Hence, the results of this study possess the potential to yield valuable insights into the comprehension of the quantum properties of a real system operating within a curved space-time framework.

翻訳日:2023-10-25 19:29:31 公開日:2023-10-24

# 私の注意ベースのASRシステムにはどれだけのコンテキストが必要か?

How Much Context Does My Attention-Based ASR System Need? ( http://arxiv.org/abs/2310.15672v1 )

ライセンス: Link先を確認

Robert Flynn and Anton Ragni

(参考訳) 音声認識のタスクでは、訓練中の30秒以上の音響コンテキストの使用は珍しく、文献ではあまり調査されていない。本研究では,音声・言語モデルの学習/評価に使用されるシーケンス長のスケールが音声認識性能に与える影響について検討する。これらの実験では、約10万の擬似ラベル付きSpotifyポッドキャストのデータセットを使用し、コンテキストの長さは5秒から1時間である。長尺データセットEarnings-22とTedliumでのゼロショット評価は、約80秒の音響コンテキストでのトレーニングの利点を示し、限られたコンテキストベースラインから最大14.9%の相対的な改善を示している。さらに、完全な長文脈ASRシステムのために、ビームサーチにより長文脈Transformer言語モデルとのシステム組み合わせを行い、現在の最先端技術と競合する結果を得る。

For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon and under-investigated in the literature. In this work, we examine the effect of scaling the sequence length used to train/evaluate (dense-attention based) acoustic and language models on speech recognition performance. For these experiments, a dataset of roughly 100,000 pseudo-labelled Spotify podcasts is used, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations on long-format datasets Earnings-22 and Tedlium demonstrate a benefit from training with around 80 seconds of acoustic context, showing up to a 14.9% relative improvement from a limited context baseline. Furthermore, we perform a system combination with long-context transformer language models via beam search for a fully long-context ASR system, with results that are competitive with the current state-of-the-art.
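The "14.9% relative improvement" above is a relative, not absolute, change in an error metric. A short sketch of the arithmetic follows, assuming the metric is word error rate (the abstract does not name it) and using made-up baseline and new values chosen only so that the ratio works out to roughly 14.9%.

```python
def relative_improvement(baseline_error, new_error):
    """Fraction by which the error dropped, relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, not taken from the paper.
baseline_wer = 20.0
long_context_wer = 17.02
gain = relative_improvement(baseline_wer, long_context_wer)
print(f"{gain:.1%}")
```

In this made-up case the absolute drop is 2.98 points, while the relative drop is about 14.9%, which is why the two should not be confused when reading such results.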

翻訳日:2023-10-25 19:29:17 公開日:2023-10-24

# 3次元物体検出のための視覚中心多モードエキスパートの活用

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection ( http://arxiv.org/abs/2310.15670v1 )

ライセンス: Link先を確認

Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

(参考訳) 現在の研究は主に、lidarまたはマルチモーダルベース(expert)から転送される知識を通じて、カメラのみの3dオブジェクト検出器(apprentice)の精度向上に重点を置いている。しかし、LiDARとカメラの特徴のドメインギャップの存在は、時間融合の固有の非互換性と相まって、蒸留に基づく徒弟強化の有効性を著しく損なう。ユニモーダル蒸留の成功に触発されて、見習いに親しみやすい専門家モデルはカメラ機能に大きく依存する一方で、マルチモーダルモデルに匹敵する性能を保った。そこで本研究では, 見習いに親しみやすいマルチモーダルエキスパートと時間融合に親しむ蒸留監督を含む,カメラオンリーの見習いモデルを改善するためのフレームワークであるVCDを紹介する。マルチモーダルの専門家VCD-Eは、特徴格差を軽減するためにカメラオンリーの見習いと同一の構造を採用し、LiDAR入力を3Dシーンの再構成に先立って深度として活用し、他の異種マルチモーダル専門家と同等の性能を達成する。また、シーン内の各対象に対する運動誤認を個別に補正する目的で、細粒度軌道ベースの蒸留モジュールを導入する。これらの改善により、我々のカメラオンリーの見習いVCD-Aは、63.1%のNDSスコアでnuScenesに新しい最先端技術を設定する。

Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the presence of the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, an apprentice-friendly expert model would predominantly rely on camera features, while still achieving comparable performance to multi-modal models. To this end, we introduce VCD, a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts a structure identical to that of the camera-only apprentice in order to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving performance on par with other heterogeneous multi-modal experts. Additionally, a fine-grained trajectory-based distillation module is introduced with the purpose of individually rectifying the motion misalignment for each object in the scene. With these improvements, our camera-only apprentice VCD-A sets a new state of the art on nuScenes with a score of 63.1% NDS.

翻訳日:2023-10-25 19:29:02 公開日:2023-10-24

# 数学用語問題に対する表現構文情報ボトルネック

Expression Syntax Information Bottleneck for Math Word Problems ( http://arxiv.org/abs/2310.15664v1 )

ライセンス: Link先を確認

Jing Xiong, Chengming Li, Min Yang, Xiping Hu, Bin Hu

(参考訳) Math Word Problems (MWP) は、テキストで与えられた数学的問題を自動的に解くことを目的としている。以前の研究では、モデルがより包括的な機能を得るために、元のテキストで追加情報を取得するために複雑なモデルを設計する傾向がある。本稿では,我々の注意を反対方向に向け,MWPの急激な相関を含む冗長な特徴を捨てる方法について検討する。そこで本研究では,表現構文木の本質的特徴を抽出し,構文関連性のない特徴を含む潜在固有冗長性をフィルタリングするMWP(ESIB)のための表現構文情報ブートネック手法を設計する。 ESIBの鍵となる考え方は、複数のモデルに対して、同じ問題の異なる問題表現に対する同じ式構文木を相互学習により予測し、表現構文木の一貫性のある情報をキャプチャし、潜時固有の冗長性を捨てることである。モデルの一般化能力を向上し、より多様な表現を生成するために、潜在空間における表現構文情報にもっと依存するようモデルに促すために、自己蒸留損失をデザインする。 2つの大規模ベンチマークにおける実験結果から,我々のモデルが最先端の結果を達成するだけでなく,より多様なソリューションを生み出すことが示された。コードは利用可能です。

Math Word Problem (MWP) solving aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original text so as to enable the model to gain more comprehensive features. In this paper, we turn our attention in the opposite direction, and work on how to discard redundant features containing spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on the variational information bottleneck, which extracts essential features of the expression syntax tree while filtering latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different problem representations of the same problem by mutual learning so as to capture consistent information of the expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss to encourage the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.

翻訳日:2023-10-25 19:28:37 公開日:2023-10-24

# 電気負荷予測における対話型一般化付加モデルとその応用

Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting ( http://arxiv.org/abs/2310.15662v1 )

ライセンス: Link先を確認

Linxiao Yang and Rui Ren and Xinyue Gu and Liang Sun

(参考訳) 電力負荷予測は電力システムの計画と管理に欠かせない要素である。不正確な負荷予測は、停電やエネルギーの浪費につながる可能性がある。正確な電力負荷予測は、ホリデーシーズンの負荷予測や極端気象条件下での負荷予測など、限られたデータやデータがない場合に困難である。高リスク意思決定は通常負荷予測の後に行われるため、モデル解釈は予測モデルの導入に不可欠である。本稿では,電力産業において,解釈可能なだけでなく,特定の分野の知識を取り入れた対話型GAMを提案する。このブースティングに基づくGAMは、断片線形関数を活用し、効率的なアルゴリズムによって学習することができる。パブリックベンチマークと電気データの両方において、我々の対話型GAMは現在の最先端の手法よりも優れており、極端な気象事象の場合に優れた一般化能力を示す。私たちはインタラクティブなGAMをベースとしたユーザフレンドリなWebベースのツールをローンチし、電気予測のための統合AIプラットフォームであるeForecaster製品にすでに組み込んでいます。

Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting during holidays or under extreme weather conditions. As high-stakes decision-making usually follows load forecasting, model interpretability is crucial for the adoption of forecasting models. In this paper, we propose an interactive GAM which is not only interpretable but can also incorporate specific domain knowledge in the electric power industry for improved performance. This boosting-based GAM leverages piecewise linear functions and can be learned through our efficient algorithm. In both public benchmark and electricity datasets, our interactive GAM outperforms current state-of-the-art methods and demonstrates good generalization ability in the cases of extreme weather events. We launched a user-friendly web-based tool based on interactive GAM and already incorporated it into our eForecaster product, a unified AI platform for electricity forecasting.
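A generalized additive model (GAM) predicts with a sum of one-dimensional shape functions, and the abstract says these are piecewise linear. The sketch below shows only the prediction side under that assumption. The feature names, knots, and values are invented for illustration, and the boosting-based learning algorithm from the paper is not reproduced here.

```python
from bisect import bisect_right


def piecewise_linear(x, knots, values):
    """Linearly interpolate between (knot, value) pairs; clamp outside the range."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, x) - 1
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    return values[i] + t * (values[i + 1] - values[i])


def gam_predict(features, shape_functions, intercept=0.0):
    """Additive prediction: intercept plus one shape function per feature."""
    return intercept + sum(
        piecewise_linear(features[name], knots, values)
        for name, (knots, values) in shape_functions.items()
    )


# Hypothetical shape functions: load rises in cold and hot weather and at midday.
shapes = {
    "temperature": ([0.0, 20.0, 40.0], [50.0, 30.0, 60.0]),
    "hour": ([0.0, 12.0, 24.0], [0.0, 10.0, 0.0]),
}
load = gam_predict({"temperature": 30.0, "hour": 6.0}, shapes, intercept=100.0)
print(load)
```

Because each feature's contribution is a separate curve, a domain expert could inspect or adjust a single curve (for example, raising the values at extreme temperatures) without retraining the rest, which is one plausible reading of what makes such a model interactive.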

翻訳日:2023-10-25 19:28:15 公開日:2023-10-24

# 地域制御型スタイル転送

Region-controlled Style Transfer ( http://arxiv.org/abs/2310.15658v1 )

ライセンス: Link先を確認

Junjie Kang, Jinsong Wu, Shiqi Jiang

(参考訳) 画像スタイル転送は計算ビジョンにおいて難しい課題である。既存のアルゴリズムは、ニューラルネットワークの特徴層を制御することによって、スタイルイメージの色とテクスチャを転送する。しかし、コンテンツ画像の異なる領域におけるテクスチャの強さを制御できない。そこで本研究では,異なる領域のスタイル強度を制約するためにロス関数を用いたトレーニング手法を提案する。本手法は,スタイル画像とコンテンツ画像の勾配関係に基づいて,異なる領域におけるスタイル特徴の伝達強度を導出する。さらに,その意味的関係を維持しつつ,コンテンツの特徴をスタイル的特徴に線形変換する特徴融合手法を提案する。広範な実験により,提案手法の有効性が実証された。

Image style transfer is a challenging task in computational vision. Existing algorithms transfer the color and texture of style images by controlling the neural network's feature layers. However, they fail to control the strength of textures in different regions of the content image. To address this issue, we propose a training method that uses a loss function to constrain the style intensity in different regions. This method guides the transfer strength of style features in different regions based on the gradient relationship between style and content images. Additionally, we introduce a novel feature fusion method that linearly transforms content features to resemble style features while preserving their semantic relationships. Extensive experiments have demonstrated the effectiveness of our proposed approach.

翻訳日:2023-10-25 19:27:58 公開日:2023-10-24

# GNeSF: 一般化可能なニューラルセマンティックフィールド

GNeSF: Generalizable Neural Semantic Fields ( http://arxiv.org/abs/2310.15712v1 )

ライセンス: Link先を確認

Hanlin Chen, Chen Li, Mengqi Guo, Zhiwen Yan, Gim Hee Lee

(参考訳) 神経的暗黙的表現に基づく3次元シーンセグメンテーションが最近登場し,2次元監督によるトレーニングのみを活用している。しかし、既存のアプローチでは推論中に新しいシーンへの一般化を禁止した高価なシーンごとの最適化が必要である。この問題を回避するために,暗黙表現に基づく一般化可能な3次元セグメンテーションフレームワークを提案する。具体的には,多視点画像特徴と意味マップを入力として,空間情報のみを入力とし,シーン固有の幾何学的・意味的情報への過度な適合を避ける。本稿では,各3次元点の異なる視点から2次元意味情報を集約するソフト投票機構を提案する。画像の特徴に加えて,我々のフレームワークでは,投票結果を予測するために,ビュー差情報も符号化されている。直感的には、近くのビューからのセマンティックな情報は、遠くのビューよりも貢献できる。さらに、可視性モジュールは、隠されたビューから有害情報を検出し、フィルタリングするように設計されている。提案手法の汎用性により,意味マップを合成したり,2次元意味的監督だけで新規シーンの3次元意味セグメンテーションを行うことができる。実験結果から,本手法はシーン特異的アプローチと同等の性能を示した。さらに重要なことは、我々のアプローチは2Dアノテーションだけで既存の強力な監督ベースのアプローチより優れていることです。ソースコードは https://github.com/HLinChen/GNeSF で入手できます。

3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still require expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Specifically, our framework takes in multi-view image features and semantic maps as the inputs instead of only spatial information to avoid overfitting to scene-specific geometric and semantic information. We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point. In addition to the image features, view difference information is also encoded in our framework to predict the voting scores. Intuitively, this allows the semantic information from nearby views to contribute more compared to distant ones. Furthermore, a visibility module is also designed to detect and filter out detrimental information from occluded views. Due to the generalizability of our proposed method, we can synthesize semantic maps or conduct 3D semantic segmentation for novel scenes with solely 2D semantic supervision. Experimental results show that our approach achieves comparable performance with scene-specific approaches. More importantly, our approach can even outperform existing strong supervision-based approaches with only 2D annotations. Our source code is available at: https://github.com/HLinChen/GNeSF.
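The soft voting and visibility filtering described above can be illustrated for a single 3D point. This is a minimal sketch assuming the voting scores and visibility flags are already given as inputs; in the paper they are predicted by the network. The function name, the softmax weighting, and the toy numbers are all assumptions for illustration.

```python
import math


def soft_vote(view_probs, view_scores, visible):
    """Aggregate per-view class probabilities for one 3D point.

    view_probs: one class-probability list per view.
    view_scores: one voting score per view (higher means more trusted).
    visible: one bool per view; occluded views are ignored entirely.
    """
    kept = [i for i, is_visible in enumerate(visible) if is_visible]
    if not kept:
        raise ValueError("the point is not visible in any view")
    top = max(view_scores[i] for i in kept)
    # Softmax over the visible views only (shifted by the max for stability).
    exps = {i: math.exp(view_scores[i] - top) for i in kept}
    norm = sum(exps.values())
    num_classes = len(view_probs[0])
    return [
        sum(exps[i] / norm * view_probs[i][c] for i in kept)
        for c in range(num_classes)
    ]


# Toy inputs (hypothetical): the third view scores highest but is occluded.
probs = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
scores = [2.0, 1.0, 3.0]
masked = soft_vote(probs, scores, [True, True, False])
unmasked = soft_vote(probs, scores, [True, True, True])
print(masked, unmasked)
```

In this toy case the predicted class flips depending on whether the occluded view is filtered out, which shows why a visibility check matters before the votes are combined.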

翻訳日:2023-10-25 19:22:35 公開日:2023-10-24

# 観察変数のグルーピングによって識別可能な因果表現学習

Causal Representation Learning Made Identifiable by Grouping of Observational Variables ( http://arxiv.org/abs/2310.15709v1 )

ライセンス: Link先を確認

Hiroshi Morioka, Aapo Hyvärinen

(参考訳) 現在注目されているトピックはcausal representation learning(crl)で、その目標はデータ駆動方式で隠れた機能のための因果モデルを学ぶことである。残念なことにCRLは、表現学習と因果発見の2つの悪名高い悪名高い問題の組み合わせである。しかし,一意解が保証される実用的識別可能性条件の発見は,その実用性に不可欠である。これまでのアプローチのほとんどは、時間的因果性(temporal causality)や監督や介入の存在といった潜在因果メカニズムの仮定に基づいている。ここでは,時間構造や介入,弱い監督を必要としない,新しい弱い制約に基づく識別可能性を示す。このアプローチは、観測混合が観測変数の適切なグループ化を示すと仮定している。また,モデルに整合した新たな自己教師付き推定フレームワークを提案し,その統計的整合性を証明し,最先端のベースラインに比べて優れたCRL性能を実験的に示す。我々はまた、潜在する共同設立者と因果サイクルに対する堅牢性を示す。

A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it is a combination of the two notoriously ill-posed problems of representation learning and causal discovery. Yet, finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most approaches so far have been based on assumptions on the latent causal mechanisms, such as temporal causality, or existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which require no temporal structure, intervention, or weak supervision. The approach is based on assuming that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performance compared to the state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.

翻訳日:2023-10-25 19:22:11 公開日:2023-10-24

# 深層強化学習を用いた多種多様なスケジューリングポリシーの作成による大規模フレキシブルジョブショップスケジューリングインスタンスの解法

Solving large flexible job shop scheduling instances by generating a diverse set of scheduling policies with deep reinforcement learning ( http://arxiv.org/abs/2310.15706v1 )

ライセンス: Link先を確認

Imanol Echeverria, Maialen Murua, Roberto Santana

(参考訳) フレキシブルなジョブショップスケジューリング問題(fjssp)は文献で広く研究されており、ヒューリスティック、精密、メタヒューリスティックな手法で複数のアプローチが提案されている。しかし、業界がリアルタイムでディスラプティブなイベントに応答できるという要求は、数秒以内に新しいスケジュールを生成する必要性を生んでいる。この制約の下では、品質が向上してもスケジュールを生成することができるのはディスパッチルール(DR)のみである。この結果を改善するため、fjsspをマルコフ決定プロセス(mdp)としてモデル化し、強化学習を用いて機械に操作を割り当てる最適解を生成するポリシーを作成するための最近の手法が提案されている。それでも、特に現実のシナリオで一般的な大きなJSSPインスタンスでは、改善の余地は残っている。そこで本研究では,FJSSPの大規模インスタンスを堅牢に解決する手法を提案する。そこで本稿では,グラフニューラルネットワークを用いてFJSSPをMDPとしてモデル化する手法を提案する。また、推論をより堅牢にする方法として、並列化可能なスケジューリングポリシーの多様なセットを生成し、DRを使って制限する2つの方法を提案する。提案手法は,より大規模なFJSSPインスタンス上での他の3つの深層強化学習手法よりも,分散ルールよりも優れ,より優れた結果が得られることがわかった。

The Flexible Job Shop Scheduling Problem (FJSSP) has been extensively studied in the literature, and multiple approaches have been proposed within the heuristic, exact, and metaheuristic methods. However, the industry's demand to respond in real time to disruptive events has created the need to generate new schedules within a few seconds. Under this constraint, only dispatching rules (DRs) are capable of generating schedules, even though their quality can be improved. To improve the results, recent methods have been proposed for modeling the FJSSP as a Markov Decision Process (MDP) and employing reinforcement learning to create a policy that generates an optimal solution assigning operations to machines. Nonetheless, there is still room for improvement, particularly in the larger FJSSP instances which are common in real-world scenarios. Therefore, the objective of this paper is to propose a method capable of robustly solving large instances of the FJSSP. To achieve this, we propose a novel way of modeling the FJSSP as an MDP using graph neural networks. We also present two methods to make inference more robust: generating a diverse set of scheduling policies that can be parallelized and limiting them using DRs. We have tested our approach on synthetically generated instances and various public benchmarks and found that our approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.

翻訳日:2023-10-25 19:21:52 公開日:2023-10-24

# 無線ネットワークにおける情報正確性と鮮度のための学習型スケジューリング

Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks ( http://arxiv.org/abs/2310.15705v1 )

ライセンス: Link先を確認

Hitesh Gudwani

(参考訳) 我々は、複数のソース、単一の通信チャネル、単一の監視ステーションからなるシステムを考える。各ソースは、精度の異なる時間変動量を測定し、そのうちの1つがチャネル経由で監視ステーションに更新を送信する。それぞれの通信が成功する確率は、更新を送信するためにスケジュールされたソースの関数である。正確な測定の確率と全てのソースの送信が成功する確率の両方がスケジューラには未知である。関心のある指標は、宛先が受信した最終更新の精度と、システムのAge-of-Information (AoI)に依存するシステムの報酬である。我々は,ソースを異なるアームとするマルチアームバンディット問題の一変種としてスケジューリング問題をモデル化した。 ETC,$\epsilon$-greedy, UCB, TSという4つの標準バンディットポリシを我々のシステムモデルに合わせて適切に調整し、その性能をシミュレーションによって比較する。さらに、これらのポリシーのうちETCと$\epsilon$-greedyの2つについて解析的な保証を与える。最後に、いかなる政策でも達成可能な累積的後悔に対する下限を特徴づける。

We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with varying levels of accuracy and one of them sends its update to the monitoring station via the channel. The probability of success of each attempted communication is a function of the source scheduled for transmitting its update. Both the probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler. The metric of interest is the reward received by the system, which depends on the accuracy of the last update received by the destination and the Age-of-Information (AoI) of the system. We model our scheduling problem as a variant of the multi-arm bandit problem with sources as different arms. We compare, via simulations, the performance of four standard bandit policies, namely ETC, $\epsilon$-greedy, UCB, and TS, suitably adjusted to our system model. In addition, we provide analytical guarantees for two of these policies, ETC and $\epsilon$-greedy. Finally, we characterize the lower bound on the cumulative regret achievable by any policy.
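
To make the bandit formulation concrete, the following is a minimal $\epsilon$-greedy sketch with sources as arms. It rests on simplifying assumptions that are not taken from the paper: the reward is 1 only when a measurement is both correct and successfully transmitted, the AoI term of the reward is omitted, and the names `make_source` and `epsilon_greedy` are hypothetical.

```python
import random


def make_source(p_correct, p_success):
    """Hypothetical source: reward 1.0 if the update is correct AND delivered."""
    def pull(rng):
        correct = rng.random() < p_correct
        delivered = rng.random() < p_success
        return 1.0 if (correct and delivered) else 0.0
    return pull


def epsilon_greedy(sources, horizon, epsilon=0.1, seed=0):
    """Schedule one source per slot; explore with probability epsilon."""
    rng = random.Random(seed)
    n = len(sources)
    counts = [0] * n      # number of times each source was scheduled
    means = [0.0] * n     # running mean reward per source
    total = 0.0
    for t in range(horizon):
        if t < n:
            arm = t                                   # schedule every source once
        elif rng.random() < epsilon:
            arm = rng.randrange(n)                    # explore
        else:
            arm = max(range(n), key=means.__getitem__)  # exploit best estimate
        reward = sources[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, means, total


if __name__ == "__main__":
    arms = [make_source(0.9, 0.9), make_source(0.5, 0.5), make_source(0.2, 0.9)]
    counts, means, total = epsilon_greedy(arms, horizon=5000)
    print(counts, [round(m, 3) for m in means], total)
```

The scheduler never sees `p_correct` or `p_success` directly; it only observes rewards, which mirrors the setting in which both probabilities are unknown.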

翻訳日:2023-10-25 19:21:24 公開日:2023-10-24

# 外部知識グラフを用いた生物医学的要約の強化

Enhancing Biomedical Lay Summarisation with External Knowledge Graphs ( http://arxiv.org/abs/2310.15702v1 )

ライセンス: Link先を確認

Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin

(参考訳) 自動レイサマリゼーションのこれまでのアプローチは、技術的聴衆(例えば研究者)のために書かれたことを考えると、すべての技術的概念を明示的に定義したり、すべての背景情報を一般の聴衆に関連付けることは不可能である。本稿では,既存のバイオメディカル・レイ・サマリゼーション・データセットであるeLifeに,関連するバイオメディカル概念に関する詳細な情報を含む,記事固有の知識グラフを付加することにより,この問題に対処する。自動評価と人的評価の両方を用いて,各手法がエンコーダ・デコーダ・モデルアーキテクチャの異なる領域を対象とし,階層化モデルに知識グラフを組み込む3つのアプローチの有効性を体系的に検討した。この結果から,グラフベースのドメイン知識の統合は,生成したテキストの可読性を大幅に向上し,技術的概念の理解を深めることによって,レイ・サマリゼーションのメリットを著しく向上させることが確認できた。

Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience. We address this issue by augmenting eLife, an existing biomedical lay summarisation dataset, with article-specific knowledge graphs, each containing detailed information on relevant biomedical concepts. Using both automatic and human evaluations, we systematically investigate the effectiveness of three different approaches for incorporating knowledge graphs within lay summarisation models, with each method targeting a distinct area of the encoder-decoder model architecture. Our results confirm that integrating graph-based domain knowledge can significantly benefit lay summarisation by substantially increasing the readability of generated text and improving the explanation of technical concepts.

翻訳日:2023-10-25 19:21:07 公開日:2023-10-24

# COPF: 最適な政策適合による継続的な学習

COPF: Continual Learning Human Preference through Optimal Policy Fitting ( http://arxiv.org/abs/2310.15694v1 )

ライセンス: Link先を確認

Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu

(参考訳) 人間フィードバックからの強化学習(rlhf)は、事前学習された言語モデル(lm)を改善するために一般的に用いられる手法であり、人間の好みに適合する能力を高める。しかしながら、現在のRLHFベースのLMは、新しいクエリやフィードバックが導入されるたびに完全なリトレーニングを必要とする。 lmsの再トレーニングは、データプライバシに関する懸念に加えて、膨大な時間と計算リソースを必要とするため、多くの現実の状況において実践上の困難をもたらす。この制限に対処するために,モンテカルロ法を用いて一連の最適政策を推定し,関数正規化と連続的にポリシーシーケンスを適合させる,COPF(Continuous Optimal Policy Fitting)と呼ばれる新しい手法を提案する。 COPFは単一の学習フェーズを含み、複雑な強化学習を必要としない。重要なのは、ラベルのないデータから学習するRLHFと共有することで、継続的な嗜好学習に柔軟になることだ。実験の結果, copfは, 異なるタスクやドメインにおける人間の嗜好と一貫性を持たせる上で, 強い連続学習(cl)ベースラインよりも優れていることがわかった。

The technique of Reinforcement Learning from Human Feedback (RLHF) is a commonly employed method to improve pre-trained Language Models (LM), enhancing their ability to conform to human preferences. Nevertheless, the current RLHF-based LMs necessitate full retraining each time novel queries or feedback are introduced, which becomes a challenging task because human preferences can vary between different domains or tasks. Retraining LMs poses practical difficulties in many real-world situations due to the significant time and computational resources required, along with concerns related to data privacy. To address this limitation, we propose a new method called Continual Optimal Policy Fitting (COPF), in which we estimate a series of optimal policies using the Monte Carlo method, and then continually fit the policy sequence with function regularization. COPF involves a single learning phase and doesn't necessitate complex reinforcement learning. Importantly, it shares with RLHF the capability to learn from unlabeled data, making it flexible for continual preference learning. Our experimental results show that COPF outperforms strong continual learning (CL) baselines when it comes to consistently aligning with human preferences on different tasks and domains.

翻訳日:2023-10-25 19:20:48 公開日:2023-10-24

# 半教師付き学習によるレシピジャンルの自動分類

Towards Automated Recipe Genre Classification using Semi-Supervised Learning ( http://arxiv.org/abs/2310.15693v1 )

ライセンス: Link先を確認

Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan and Hasan Mahmud

(参考訳) 料理のレシピを共有することは、料理のアイデアを交換し、料理の準備の指示を与えるのに最適な方法である。しかし、適切なラベル付きデータがないため、オンラインの生レシピを適切な食品ジャンルに分類することは困難である。本研究では,それぞれのカテゴリにラベル付けされた200万の料理レシピを含む「Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset」というデータセットを提案する。このデータには、タイトル、NER、方向、拡張NERなどの様々な特徴と、パン屋、飲み物、非野菜、野菜、ファーストフード、穀物、食事、側面、融合などのジャンルを表す9つの異なるラベルが含まれている。提案されたパイプラインである3A2M+は、名前付きエンティティ認識(NER)リストのサイズを拡張して、2つのNER抽出ツールを使用してレシピの方向から、熱、時間、プロセスなどの名前のないエンティティに対処する。 3A2M+データセットは、分類、名前付きエンティティ認識、レシピ生成など、さまざまな困難なレシピ関連タスクに対する包括的なソリューションを提供する。さらに、従来の機械学習、ディープラーニング、事前学習言語モデルを用いてレシピをそれぞれのジャンルに分類し、全体の精度98.6\%を達成した。我々の調査は、タイトル機能はジャンルの分類においてより重要な役割を担ったことを示している。

Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. The 3A2M+ dataset provides a comprehensive solution to various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have applied traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre.

翻訳日:2023-10-25 19:20:27 公開日:2023-10-24

# 補間・逆問題に対する高次残差ネットワークを用いた物理インフォームド

Physics-Informed with Power-Enhanced Residual Network for Interpolation and Inverse Problems ( http://arxiv.org/abs/2310.15690v1 )

ライセンス: Link先を確認

Amir Noorizadegan, D.L. Young, Y.C. Hon, C.S. Chen

(参考訳) 本稿では,2次元および3次元設定におけるスムース関数と非スムース関数の補間能力を改善するために設計された,パワーエンハンシング残差ネットワークと呼ばれる新しいニューラルネットワーク構造を提案する。残余要素に電力項を追加することで、アーキテクチャはネットワークの表現力を高める。本研究は,ネットワーク深さ,幅,最適化手法について検討し,アーキテクチャの適応性と性能上の優位性を示す。一貫して,提案するパワーエンハンシング残差ネットワーク,特に非スムース関数の異常精度を強調する。実世界の例では、正確性、収束性、効率性の点で、普通のニューラルネットワークよりも優れていることも確認されている。この研究は、より深いネットワークの影響も調べている。さらに、提案アーキテクチャは逆バーガー方程式の解法にも適用され、優れた性能を示す。結論として、パワーエンハンシング残余ネットワークは、ニューラルネットワークの機能を大幅に強化する汎用的なソリューションを提供する。実装されたコードは、 \url{https://github.com/cmmai/resnet_for_pinn} で利用可能である。

This paper introduces a novel neural network structure called the Power-Enhancing residual network, designed to improve interpolation capabilities for both smooth and non-smooth functions in 2D and 3D settings. By adding power terms to residual elements, the architecture boosts the network's expressive power. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over plain neural networks in terms of accuracy, convergence, and efficiency. The study also looks at the impact of deeper networks. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities. The codes implemented are available at: https://github.com/CMMAi/ResNet_for_PINN.

翻訳日:2023-10-25 19:19:55 公開日:2023-10-24

# 特許簡素化のための銀標準の作成

Creating a silver standard for patent simplification ( http://arxiv.org/abs/2310.15689v1 )

ライセンス: Link先を確認

Silvia Casola, Alberto Lavelli, Horacio Saggion

(参考訳) 特許は、発明を一方的に保護し、他方で技術知識を流通させることを目的とした法的文書である。彼らの複雑なスタイル ― 法的、技術的、極めてあいまいな言語 ― は、コンテンツが人間や機械へのアクセスを困難にし、情報検索コミュニティに重大な課題をもたらす。本稿では,リプレースにより特許文書を自動的に簡易化する手法を提案する。ドメイン内並列化データがないため,特許文の大規模銀標準を自動的に生成する手法を提案する。候補を得るには一般ドメインパラフレーズシステムを用いるが,このプロセスはエラーを起こしやすく,制御が困難である。そこで,本研究では,適切なフィルタとペアリングし,簡易化システムの訓練に有効なクリーンコーパスを構築する。合成銀コーパスの人間による評価は, 文法的, 適切であり, 簡単な文を含むことを示している。

Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other. Their complex style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines and poses substantial challenges to the information retrieval community. This paper proposes an approach to automatically simplify patent text through rephrasing. Since no in-domain parallel simplification data exist, we propose a method to automatically generate a large-scale silver standard for patent sentences. To obtain candidates, we use a general-domain paraphrasing system; however, the process is error-prone and difficult to control. Thus, we pair it with proper filters and construct a cleaner corpus that can successfully be used to train a simplification system. Human evaluation of the synthetic silver corpus shows that it is considered grammatical, adequate, and contains simple sentences.

翻訳日:2023-10-25 19:19:40 公開日:2023-10-24

# フィードバックに基づく物体外観学習による夜間熱赤外画像のカラー化

Nighttime Thermal Infrared Image Colorization with Feedback-based Object Appearance Learning ( http://arxiv.org/abs/2310.15688v1 )

ライセンス: Link先を確認

Fu-Ya Luo, Shu-Lin Liu, Yi-Jun Cao, Kai-Fu Yang, Chang-Yong Xie, Yong Liu, Yong-Jie Li

(参考訳) 悪環境(例えば全暗黒)における安定した撮像は、熱赤外カメラ(TIR)を夜景知覚の一般的な選択肢にしている。しかしながら、TIR画像の低コントラストと色度欠如は、人間の解釈とその後のRGBベースの視覚アルゴリズムの展開に有害である。したがって、それを対応する昼間色画像(NTIR2DC)に翻訳することで、夜間TIR画像を色づけすることは理にかなっている。 NTIR2DCタスクの目覚ましい進歩にもかかわらず、小さなオブジェクトクラスの翻訳性能をいかに向上させるかは未調査である。この問題に対処するために,フィードバックに基づくオブジェクト外観学習(FoalGAN)を取り入れた生成的敵ネットワークを提案する。具体的には、オブジェクト翻訳の文脈依存性を低減するために、オクルージョン対応ミックスアップモジュールとそれに対応する外観整合性損失を提案する。夜間の街路場面における小型物体の代表的な例として, 交通灯の外観損失をデザインすることにより, 交通灯のリアリズムを高める方法を示す。小型オブジェクトの出現学習をさらに改善するため,2つのフィードバック学習戦略を考案し,異なるサンプルの学習頻度を選択的に調整する。さらに,brnoデータセットのサブセットに対してピクセルレベルのアノテーションを提供し,複数の気象条件下でのntir画像理解の研究を容易にする。広範な実験により,提案手法は小物体の出現学習に有効であるだけでなく,ntir2dcタスクにおける意味保存とエッジ一貫性の観点から,他の画像翻訳手法よりも優れていることが示された。

Stable imaging in adverse environments (e.g., total darkness) makes thermal infrared (TIR) cameras a prevalent option for night scene perception. However, the low contrast and lack of chromaticity of TIR images are detrimental to human interpretation and subsequent deployment of RGB-based vision algorithms. Therefore, it makes sense to colorize the nighttime TIR images by translating them into the corresponding daytime color images (NTIR2DC). Despite the impressive progress made in the NTIR2DC task, how to improve the translation performance of small object classes is under-explored. To address this problem, we propose a generative adversarial network incorporating feedback-based object appearance learning (FoalGAN). Specifically, an occlusion-aware mixup module and a corresponding appearance consistency loss are proposed to reduce the context dependence of object translation. Taking traffic lights as a representative example of small objects in nighttime street scenes, we illustrate how to enhance their realism by designing a traffic light appearance loss. To further improve the appearance learning of small objects, we devise a dual feedback learning strategy to selectively adjust the learning frequency of different samples. In addition, we provide pixel-level annotation for a subset of the Brno dataset, which can facilitate the research of NTIR image understanding under multiple weather conditions. Extensive experiments illustrate that the proposed FoalGAN is not only effective for appearance learning of small objects, but also outperforms other image translation methods in terms of semantic preservation and edge consistency for the NTIR2DC task.

翻訳日:2023-10-25 19:19:25 公開日:2023-10-24

# 単純なアンサンブルプロジェクタによる半教師あり学習性能の劣化・校正・改善

Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector ( http://arxiv.org/abs/2310.15764v1 )

ライセンス: Link先を確認

Khanh-Binh Nguyen

(参考訳) 半教師付き学習(SSL)に関する最近の研究は大きな成功を収めている。有望な性能にもかかわらず、現在の最先端の手法は、より多くのネットワークコンポーネントと追加のトレーニング手順を導入するコストを犠牲にして、ますます複雑な設計へと向かっている。本稿では,既存のコントラスト付き半教師付き学習フレームワークの性能向上を目的として,EPASS(Ensemble Projectors Aided for Semi-supervised Learning)という簡単な手法を提案する。 1つのプロジェクタからの学習された埋め込みが対照的な学習で使用されるメモリバンクに格納される標準的な方法とは異なり、EPASSは複数のプロジェクタからのアンサンブル埋め込みをメモリバンクに格納する。その結果、EPASSは一般化を改善し、特徴表現を強化し、性能を向上する。例えばEPASSは、SimMatchのラベル付きデータの100k/1\%/10\%しか使用せず、半教師付き学習の強いベースラインを39.47\%/31.39\%/24.70\%のトップ-1エラーレートで改善し、ImageNetデータセット上でCoMatchの40.24\%/32.64\%/25.90\%のトップ1エラーレートを達成する。これらの改善は、提案手法の一般的な有効性を証明するため、メソッド、ネットワークアーキテクチャ、データセット間で一貫性がある。コードはhttps://github.com/beandkay/EPASSで入手できる。

Recent studies on semi-supervised learning (SSL) have achieved great success. Despite their promising performance, current state-of-the-art methods tend toward increasingly complex designs at the cost of introducing more network components and additional training procedures. In this paper, we propose a simple method named Ensemble Projectors Aided for Semi-supervised Learning (EPASS), which focuses mainly on improving the learned embeddings to boost the performance of the existing contrastive joint-training semi-supervised learning frameworks. Unlike standard methods, where the learned embeddings from one projector are stored in memory banks to be used with contrastive learning, EPASS stores the ensemble embeddings from multiple projectors in memory banks. As a result, EPASS improves generalization, strengthens feature representation, and boosts performance. For instance, EPASS improves strong baselines for semi-supervised learning by 39.47%/31.39%/24.70% top-1 error rate, while using only 100k/1%/10% of labeled data for SimMatch, and achieves 40.24%/32.64%/25.90% top-1 error rate for CoMatch on the ImageNet dataset. These improvements are consistent across methods, network architectures, and datasets, proving the general effectiveness of the proposed methods. Code is available at https://github.com/beandkay/EPASS.

翻訳日:2023-10-25 19:11:31 公開日:2023-10-24

# RAPL:Few-Shotドキュメンテーション-レベル関係抽出のための関係認識型学習手法

RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction ( http://arxiv.org/abs/2310.15743v1 )

ライセンス: Link先を確認

Shiao Meng, Xuming Hu, Aiwei Liu, Shu'ang Li, Fukun Ma, Yawen Yang, Lijie Wen

(参考訳) ラベル付きドキュメントがわずかにあれば、ドキュメント内のエンティティ間のセマンティックな関係を識別する方法? 実世界のシナリオにおける広範囲なデータ不足問題に対処するためには,FSDLRE (Few-shot document-level relation extract) が重要である。メトリクスベースのメタラーニングは、分類のためのクラスプロトタイプを構築するFSDLREに広く採用されている効果的なフレームワークである。しかし、既存の作品はしばしば正確な関係セマンティクスを持つクラスプロトタイプを得るのに苦労している。 1) 対象関係型のプロトタイプを構築するには、その関係を保持するすべてのエンティティペアの表現を集約する一方、これらのエンティティペアは他の関係も保持し、プロトタイプを妨害する可能性がある。 2) ターゲット関係型が異なるタスクではNOTA意味が異なることを無視して,NOTA(None-of-the-above)プロトタイプを全タスクにわたって使用する。本稿では,FSDLREにおける関係認識型プロトタイプ学習手法を提案する。本手法は,関係記述や現実的なNOTAインスタンスをガイダンスとして活用することにより,関係のプロトタイプを効果的に改良し,タスク固有のNOTAプロトタイプを生成する。 2つのFSDLREベンチマークの様々な設定において,提案手法が平均2.61%のF_1$で最先端の手法より優れていることを示す。

How to identify semantic relations among entities in a document when only a few labeled documents are available? Few-shot document-level relation extraction (FSDLRE) is crucial for addressing the pervasive data scarcity problem in real-world scenarios. Metric-based meta-learning is an effective framework widely adopted for FSDLRE, which constructs class prototypes for classification. However, existing works often struggle to obtain class prototypes with accurate relational semantics: 1) To build a prototype for a target relation type, they aggregate the representations of all entity pairs holding that relation, while these entity pairs may also hold other relations, thus disturbing the prototype. 2) They use a set of generic NOTA (none-of-the-above) prototypes across all tasks, neglecting that the NOTA semantics differs in tasks with different target relation types. In this paper, we propose a relation-aware prototype learning method for FSDLRE to strengthen the relational semantics of prototype representations. By judiciously leveraging the relation descriptions and realistic NOTA instances as guidance, our method effectively refines the relation prototypes and generates task-specific NOTA prototypes. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by an average of 2.61% $F_1$ across various settings of two FSDLRE benchmarks.
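
The metric-based framework mentioned above, in which class prototypes are built from support examples and queries are assigned to the nearest prototype, can be sketched as follows. This is a generic prototype classifier under stated assumptions, not the RAPL method: prototypes are plain means of hypothetical entity-pair embeddings, distance is squared Euclidean, and the NOTA prototype is simply built from the NOTA instances of the current task. The relation-aware refinement with relation descriptions is not reproduced.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def build_prototypes(support):
    """support maps a label (e.g. a relation type or "NOTA") to embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


if __name__ == "__main__":
    # Toy 2-D embeddings; real entity-pair representations come from an encoder.
    support = {
        "works_for": [[1.0, 0.0], [0.8, 0.2]],
        "NOTA": [[0.0, 1.0], [0.2, 0.8]],  # task-specific NOTA instances
    }
    prototypes = build_prototypes(support)
    print(classify([0.7, 0.3], prototypes))
```

Because the NOTA prototype is rebuilt from each task's own support set, it changes with the target relation types, which is the property the abstract contrasts with a single generic NOTA prototype.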

翻訳日:2023-10-25 19:11:02 公開日:2023-10-24

# 拡張テンプレートを用いた心電図インプテーションの拡散モデルの改善

Improving Diffusion Models for ECG Imputation with an Augmented Template Prior ( http://arxiv.org/abs/2310.15742v1 )

ライセンス: Link先を確認

Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic

(参考訳) 心電図(ecg)などの脈動信号は日常診療の一部として広範囲に収集される。しかし、ノイズの多い低品質な録音は、モバイルの健康システムで収集された信号にとって大きな問題であり、信号品質が低下し、ダウンストリームのタスクが自動化される。近年の研究では、確率的時系列モデルによるECGの欠落値の計算が検討されている。それにもかかわらず、決定論的モデルと比較すると、被験者と心拍関係の差異がトレーニング目標において明示的に考慮されないため、その性能は依然として限られている。本研究は,心電図の計算精度の向上と確率モデルによる予測精度の向上を目的として,様々な健康状態に先立って情報処理を行うテンプレート誘導型拡散確率モデルPulseDiffを提案する。具体的には 1) まず,被写体レベルの脈動テンプレートを,個人的特徴を捉えた欠落値の先取りとして,観察から抽出する。 2) 位置と振幅のビートレベルのばらつきを考慮した事前拡張のためのテンプレートにビートレベルの確率シフト項を追加する。 3) 被験者の健康状態を検討するための信頼度スコアを最終的に設計し, プライオリティが安全な方法で提供されることを保証した。 PTBXLデータセットを用いて実験したところ、PulseDiffはCSDIとSSSD$^{S4}$という2つの強力なDDPMベースラインモデルの性能を改善し、不確実性を管理しながらDDPMの生成を検証した。 SSSD$^{S4}$と組み合わせると、PulseDiff法は短区間欠落データに対する主要な決定論的モデルよりも優れ、長期間隔データ損失に匹敵する。

Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present a template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned on an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior of missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms on the template for prior augmentation, which considers the beat-level variance of positioning and amplitude; 3) we finally design a confidence score to consider the health condition of the subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal that PulseDiff improves the performance of two strong DDPM baseline models, CSDI and SSSD$^{S4}$, verifying that our method guides the generation of DDPMs while managing the uncertainty. When combined with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.

翻訳日:2023-10-25 19:10:40 公開日:2023-10-24

# プロトタイプ学習と特権情報を用いた解釈可能な医用画像分類

Interpretable Medical Image Classification using Prototype Learning and Privileged Information ( http://arxiv.org/abs/2310.15741v1 )

ライセンス: Link先を確認

Luisa Gallee, Meinrad Beer, and Michael Goetz

(参考訳) 解釈可能性はしばしば医療画像に必須の要件である。説明可能性とハイパフォーマンスの必要性に対処するには、高度なディープラーニング手法が必要である。本研究では,トレーニングプロセス中に利用可能な追加情報を使用して理解可能かつ強力なモデルを作成することができるかを検討する。本稿では,カプセルネットワークの利点,プロトタイプ学習,特権情報の利用を活用したproto-capsという革新的なソリューションを提案する。 LIDC-IDRIデータセット上で提案された解を評価することで,解釈可能性の向上と以上の最先端予測性能の併用が期待できる。説明可能なベースラインモデルと比較して,悪性度 (93.0 %) と肺結節の平均的特徴を予測できる精度は6 %以上向上した。同時に、モデルは、放射線科医が定義した属性の視覚的な検証を可能にするプロトタイプ表現によるケースベースの推論を提供する。

Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainability and high performance. In this work, we investigate whether additional information available during the training process can be used to create an understandable and powerful model. We propose an innovative solution called Proto-Caps that leverages the benefits of capsule networks, prototype learning and the use of privileged information. Evaluating the proposed solution on the LIDC-IDRI dataset shows that it combines increased interpretability with above state-of-the-art prediction performance. Compared to the explainable baseline model, our method achieves more than 6 % higher accuracy in predicting both malignancy (93.0 %) and mean characteristic features of lung nodules. Simultaneously, the model provides case-based reasoning with prototype representations that allow visual validation of radiologist-defined attributes.

翻訳日:2023-10-25 19:10:09 公開日:2023-10-24

# 量子モナドロジー

The Quantum Monadology ( http://arxiv.org/abs/2310.15735v1 )

ライセンス: Link先を確認

Hisham Sati and Urs Schreiber

(参考訳) 関数型プログラミング言語の現代的な理論は、計算サイドエフェクトとサイドコンテクストの符号化にモナドを用いる。量子コンピューティングは本質的に(量子測定のように)サイドエフェクトフルであり、(混合補助状態のように)コンテキスト依存であるにもかかわらず、このモナディックパラダイムは以前は量子プログラミング言語に当てはまらない。ここでは、Grothendieckの「操作のモチーフヨガ」によって誘導されるパラメータ化加群スペクトルのカテゴリ上の(co)モナドを、HC-加群に特化する現在の目的と、さらに集合付き複素ベクトル空間に対して体系的に解析する。量子計測結果によってパラメータ化された量子状態空間の集まりとしてインデックス付きベクトル空間を解釈すると、これらの(co)モナドは、古典的な制御と量子測定結果を古典的文脈に「動的に持ち上げる」機能を持つ関数型量子プログラミングのための包括的自然言語を提供する。我々は、最近構築された線形ホモトピー型理論(LHoTT)に埋め込み、パラメータ化されたモジュールスペクトルに解釈可能な、これらのモナディックな量子効果を表現するドメイン固有量子プログラミング言語(QS)を提案する。 LHoTTに組み込むと、線形量子型、古典的制御、動的リフト、そして特に位相効果を持つ、正式に検証可能な普遍量子プログラミングが実現される。

The modern theory of functional programming languages uses monads for encoding computational side-effects and side-contexts, beyond bare-bone program logic. Even though quantum computing is intrinsically side-effectful (as in quantum measurement) and context-dependent (as on mixed ancillary states), little of this monadic paradigm has previously been brought to bear on quantum programming languages. Here we systematically analyze the (co)monads on categories of parameterized module spectra which are induced by Grothendieck's "motivic yoga of operations" -- for the present purpose specialized to HC-modules and further to set-indexed complex vector spaces. Interpreting an indexed vector space as a collection of alternative possible quantum state spaces parameterized by quantum measurement results, as familiar from Proto-Quipper-semantics, we find that these (co)monads provide a comprehensive natural language for functional quantum programming with classical control and with "dynamic lifting" of quantum measurement results back into classical contexts. We close by indicating a domain-specific quantum programming language (QS) expressing these monadic quantum effects in transparent do-notation, embeddable into the recently constructed Linear Homotopy Type Theory (LHoTT) which interprets into parameterized module spectra. Once embedded into LHoTT, this should make for formally verifiable universal quantum programming with linear quantum types, classical control, dynamic lifting, and notably also with topological effects.

翻訳日:2023-10-25 19:09:53 公開日:2023-10-24

# 群集歩行者検出のためのクエリ適応型DETR

Query-adaptive DETR for Crowded Pedestrian Detection ( http://arxiv.org/abs/2310.15725v1 )

ライセンス: Link先を確認

Feng Gao, Jiaxu Leng, Ji Gan, and Xinbo Gao

(参考訳) DEtection TRansformer (DETR)とその変種(DETRs)は,混雑した歩行者の検出に適用され,高い性能を実現している。しかし、混雑度の異なるシーンでは、DETRsのクエリの数が手動で調整されなければならず、そうでなければ、パフォーマンスは様々な程度に低下する。本稿では,2つのクエリ生成手法をまず分析し,適応クエリ生成手法を設計するための4つのガイドラインを要約する。そこで我々は,この問題を軽減するためにランクベースの適応クエリ生成(RAQG)を提案する。具体的には、エンコーダが生成する最も信頼度の低い正のトレーニングサンプルのランクを予測できるランク予測ヘッドを設計する。予測ランクに基づいて,エンコーダが生成した粗い検出結果を適応的に選択してクエリを生成する適応的選択法を設計する。さらに、ランク予測ヘッドをより良く訓練するために、ソフトグラディエントL1損失を提案する。ソフトグラディエントL1損失の勾配は連続であり、損失値とモデルパラメータの更新値の関係を粒度的に記述することができる。提案手法は単純かつ効果的であり,任意のDETRsに接続してクエリ適応性を実現する。 Crowdhumanデータセットと Citypersonsデータセットの実験結果は,DETRsに対するクエリを適応的に生成し,競合的な結果が得られることを示した。特に,Crowdhumanデータセットで最先端の39.4%のMRを実現する。

DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, achieving promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually; otherwise, the performance degrades to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. Our method is simple and effective, and in theory can be plugged into any DETRs to make them query-adaptive. The experimental results on the Crowdhuman dataset and the Citypersons dataset show that our method can adaptively generate queries for DETRs and achieve competitive results. Especially, our method achieves state-of-the-art 39.4% MR on the Crowdhuman dataset.
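
The adaptive selection step can be illustrated with a small sketch: given coarse detections from the encoder and a predicted rank, keep the detections ranked at or above that position as queries. The data layout (dicts with a `score` key), the rounding with `ceil`, and the `min_queries`/`max_queries` bounds are assumptions for illustration; the rank prediction head and the Soft Gradient L1 Loss are not modelled here.

```python
import math


def select_queries(detections, predicted_rank, min_queries=1, max_queries=None):
    """Keep the top-ranked coarse detections, up to the predicted rank.

    detections     -- list of dicts, each with a confidence under "score"
    predicted_rank -- hypothetical output of a rank prediction head
    """
    # Rank coarse detections by confidence, highest first.
    ranked = sorted(detections, key=lambda det: det["score"], reverse=True)
    # The number of queries follows the predicted rank instead of a fixed constant.
    k = max(min_queries, math.ceil(predicted_rank))
    if max_queries is not None:
        k = min(k, max_queries)
    return ranked[:k]


if __name__ == "__main__":
    coarse = [{"id": 0, "score": 0.9}, {"id": 1, "score": 0.2},
              {"id": 2, "score": 0.7}, {"id": 3, "score": 0.5}]
    print([det["id"] for det in select_queries(coarse, predicted_rank=2.3)])
```

A sparse scene would yield a small predicted rank and few queries, while a crowded scene would yield a larger one, which removes the need to hand-tune the query count per scene.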

翻訳日:2023-10-25 19:09:23 公開日:2023-10-24

# variator: プラグアンドプレイ圧縮モジュールによる事前学習モデルの高速化

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules ( http://arxiv.org/abs/2310.15724v1 )

ライセンス: Link先を確認

Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

(参考訳) プレトレーニング言語モデル (PLM) は, NLPタスクにおいて顕著な結果を得たが, 膨大なパラメータサイズと計算コストを犠牲にしている。本稿では,プラグアンドプレイ圧縮プラグインによる計算効率を向上させるパラメータ効率向上手法であるVariatorを提案する。圧縮プラグインは、複数の隠れベクターを1つに圧縮することでシーケンス長を減らし、元のPLMでトレーニングするように設計されている。 1) 実世界のアプリケーションでは, 圧縮プラグインのプラグ・アンド・プレイ特性は, 現在のワークロードに基づいて異なる加速度比で異なる圧縮プラグインを動的に選択することができる。 2) 圧縮プラグインは、最小パラメータを持ついくつかのコンパクトニューラルネットワーク層で構成され、特にタスク数が増加するシナリオにおいて、ストレージとメモリオーバーヘッドを大幅に節約する。 Variatorの7つのデータセットに対する有効性を検証する。実験の結果,バリエータは0.9%の追加パラメータで計算コストを53%削減でき,性能は2%未満であった。さらに、モデルが数十億のパラメータにスケールすると、変数は未圧縮plmの強力な性能にマッチする。

Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length by compressing multiple hidden vectors into one and are trained with the original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% of computational costs using only 0.9% additional parameters, with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.
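
To make the idea of compressing multiple hidden vectors into one concrete, here is a minimal sketch. It uses plain mean pooling over groups of `k` consecutive vectors as a stand-in for the learned compression layers; the function name `compress_hidden` and the pooling choice are assumptions for illustration, not the plugin architecture from the paper.

```python
def compress_hidden(hidden, k):
    """Merge every k consecutive hidden vectors into one by averaging.

    hidden: list of equal-length vectors, one per token.
    Mean pooling is only a placeholder for a learned compression layer.
    """
    compressed = []
    for start in range(0, len(hidden), k):
        group = hidden[start:start + k]  # the last group may be shorter
        dim = len(group[0])
        compressed.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return compressed


hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(compress_hidden(hidden, k=2))  # 5 vectors become 3
```

A larger `k` shortens the sequence further, which is the knob a deployment could turn when choosing between plugins with different acceleration ratios.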

翻訳日:2023-10-25 19:09:00 公開日:2023-10-24

# re-temp:時間知識グラフ完成のための関係認識時間表現学習

Re-Temp: Relation-Aware Temporal Representation Learning for Temporal Knowledge Graph Completion ( http://arxiv.org/abs/2310.15722v1 )

ライセンス: Link先を確認

Kunze Wang, Soyeon Caren Han, Josiah Poon

(参考訳) 補外設定の下での時間的知識グラフ補完(TKGC)は、行方不明な実体を将来から予測することを目的としており、現実の予測問題とより密接に一致する課題を呈している。既存の研究は主に、最近のスナップショットに適用されたシーケンシャルグラフニューラルネットワークを使用してエンティティと関係を符号化している。しかしながら、これらのアプローチは、クエリにおけるエンティティ関連の関係に従って無関係なスナップショットをスキップする能力を見落とし、明示的な時間的情報の重要性を無視する傾向にある。そこで本研究では,各タイムスタンプのあとのスキップ情報の流れを取り入れ,明示的な時間的埋め込みを入力として活用するRe-Temp(Relation-Aware Temporal Representation Learning)を提案する。さらに,情報漏洩を防止するため,二相前方伝播法を提案する。 6つのtkgc(extrapolation)データセットの評価を通じて、このモデルが最新の8つの最先端モデルを上回ることを実証した。

Temporal Knowledge Graph Completion (TKGC) under the extrapolation setting aims to predict the missing entity from a fact in the future, posing a challenge that aligns more closely with real-world prediction problems. Existing research mostly encodes entities and relations using sequential graph neural networks applied to recent snapshots. However, these approaches tend to overlook the ability to skip irrelevant snapshots according to entity-related relations in the query and disregard the importance of explicit temporal information. To address this, we propose our model, Re-Temp (Relation-Aware Temporal Representation Learning), which leverages explicit temporal embedding as input and incorporates skip information flow after each timestamp to skip unnecessary information for prediction. Additionally, we introduce a two-phase forward propagation method to prevent information leakage. Through evaluation on six TKGC (extrapolation) datasets, we demonstrate that our model outperforms all eight recent state-of-the-art models by a significant margin.

翻訳日:2023-10-25 19:08:41 公開日:2023-10-24

# 脳エンコーディングのためのタスク固有言語モデルのアンサンブル

Ensemble of Task-Specific Language Models for Brain Encoding ( http://arxiv.org/abs/2310.15720v1 )

ライセンス: Link先を確認

Sanjai Kumaran, Arvindh Arun, Jerrin John

(参考訳) 言語モデルは、脳内の特定の関心領域のfMRIアクティベーションをエンコードするのに十分なほど豊富であることが示されている。従来の研究は、脳の反応を予測するために人気のある自然言語処理タスクで学んだ表現から伝達学習を探索してきた。本研究では,10言語モデル(構文2と意味8)からアンサンブルモデルを作成することにより,エンコーダの性能を向上させる。アンサンブルメソッドを通じて、すべてのROIで、現在のベースラインを平均10%上回りました。

Language models have been shown to be rich enough to encode fMRI activations of certain regions of interest (ROIs) in the brain. Previous works have explored transfer learning from representations learned for popular natural language processing tasks to predict brain responses. In our work, we improve the performance of such encoders by creating an ensemble model out of 10 popular language models (2 syntactic and 8 semantic). Our ensembling methods beat the current baselines by 10% on average across all ROIs.

翻訳日:2023-10-25 19:08:22 公開日:2023-10-24

# リカレントリニアトランス

Recurrent Linear Transformers ( http://arxiv.org/abs/2310.15719v1 )

ライセンス: Link先を確認

Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White

(参考訳) トランスアーキテクチャにおける自己保持機構は、長距離依存をキャプチャできるため、シーケンシャルデータ処理におけるその有効性の背後にある主な理由である。しかし、トランスフォーマーの成功にもかかわらず、幅広い適用可能性を制限する2つの大きな欠点がある。(1)過去の情報を思い出すために、自己照査メカニズムは、コンテキストとして提供すべき履歴全体にアクセスする必要がある。 (2)変圧器の推論コストは高価である。本稿では,文脈非依存な推論コストを提供し,長距離依存性を効果的に活用し,実際にうまく機能するトランスフォーマ自着機構の再帰的な代替手法を提案する。上述した計算制限が変圧器の応用をほぼ不可能にしている強化学習問題に対する我々のアプローチを評価する。診断環境におけるアーキテクチャの異なるコンポーネントの影響を定量化し、2dおよび3dピクセルベースの部分観測可能な環境でのパフォーマンス向上を評価する。最先端アーキテクチャであるgtrxlと比較すると、このアプローチでの推論は少なくとも40%安価で、メモリ使用量を50%以上削減できる。提案手法はGTrXLと同等かそれ以上に動作し,GTrXLの性能が37%以上向上する。

The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies, and it is the main reason behind the architecture's effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) In order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) Inference in transformers is expensive. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach performs similarly to or better than GTrXL, improving upon GTrXL performance by more than 37% on harder tasks.
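
The phrase "context-independent inference cost" can be illustrated with the generic linear-attention recurrence, in which a fixed-size state replaces the growing history. This is a sketch of that general technique, not the authors' architecture; the class name, the `elu(x) + 1` feature map, and the dimensions are assumptions chosen for illustration.

```python
import math


def feature_map(x):
    """elu(x) + 1, a positive feature map commonly used in linear attention."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class RecurrentLinearAttention:
    """Causal linear attention computed one step at a time."""

    def __init__(self, key_dim, value_dim):
        # S accumulates outer products phi(k) v^T; z accumulates phi(k).
        # Both have a fixed size, regardless of how many steps were seen.
        self.S = [[0.0] * value_dim for _ in range(key_dim)]
        self.z = [0.0] * key_dim

    def step(self, q, k, v):
        phi_k, phi_q = feature_map(k), feature_map(q)
        for i, ki in enumerate(phi_k):
            self.z[i] += ki
            for j, vj in enumerate(v):
                self.S[i][j] += ki * vj
        # Normalizer is positive because the feature map is positive.
        denom = sum(qi * zi for qi, zi in zip(phi_q, self.z))
        return [
            sum(phi_q[i] * self.S[i][j] for i in range(len(phi_q))) / denom
            for j in range(len(v))
        ]


attn = RecurrentLinearAttention(key_dim=2, value_dim=2)
for q, k, v in [([0.1, 0.2], [0.3, -0.4], [1.0, 2.0]),
                ([0.5, -0.1], [0.2, 0.1], [3.0, 4.0])]:
    print(attn.step(q, k, v))
```

Each call to `step` touches only `S` and `z`, so the per-step cost and memory do not grow with the length of the episode, in contrast to softmax attention over a stored context.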

翻訳日:2023-10-25 19:08:14 公開日:2023-10-24

# ソーシャルメディア上でヘイトスピーチを共有する理由に関する因果理解

Causal Understanding of Why Users Share Hate Speech on Social Media ( http://arxiv.org/abs/2310.15772v1 )

ライセンス: Link先を確認

Dominique Geissler, Abdurahman Maarouf, Stefan Feuerriegel

(参考訳) ソーシャルメディア上でのヘイトスピーチは、個人の精神的および身体的幸福を脅かし、現実世界の暴力にさらに責任を負う。ヘイトスピーチの普及の背後にある重要なドライバーであり、なぜヘイトフルな投稿がバイラルに広まるのかは、リシェアされている。本稿では,ヘイトスピーチをユーザに再共有させるユーザ属性の包括的かつ因果的分析を行う。しかし, ソーシャルメディアデータからの因果推論は, 選択バイアスに悩まされる可能性が高く, 発話を嫌うユーザの脆弱性の違いにより, さらなる矛盾が生じているため, 困難である。我々は,新しい3段階の因果関係の枠組みを開発し,(1)対向性スコアを応用し,観察的ソーシャルメディアデータの偏りを解消する。 2) 音声を潜伏埋め込みとして嫌うユーザの潜伏脆弱性をモデル化するために, 偏りのある確率スコアを用いた。 3) ユーザ属性がヘイトスピーチを共有する確率に与える影響をモデル化し, ユーザのヘイトスピーチに対する潜在的な脆弱性を制御した。既存のベースラインと比較して、我々のフレームワークの特に強みは、非線形でありながら説明可能な因果効果をモデル化することである。フォロワーが減り、友達が減り、投稿数が減り、ヘイトスピーチが増えたことがわかりました。その代わり、若いアカウントはヘイトスピーチを減らしている。全体として、ヘイトスピーチの共有を促す要因を理解することは、有害な行動に関与するリスクのある個人を検知し、効果的な緩和戦略を設計するために重要である。

Hate speech on social media threatens the mental and physical well-being of individuals and is further responsible for real-world violence. Reshares are an important driver behind the spread of hate speech, and thus behind why hateful posts can go viral, yet little is known about why users reshare hate speech. In this paper, we present a comprehensive, causal analysis of the user attributes that make users reshare hate speech. However, causal inference from observational social media data is challenging, because such data likely suffer from selection bias, and there is further confounding due to differences in the vulnerability of users to hate speech. We develop a novel, three-step causal framework: (1) We debias the observational social media data by applying inverse propensity scoring. (2) We use the debiased propensity scores to model the latent vulnerability of users to hate speech as a latent embedding. (3) We model the causal effects of user attributes on users' probability of sharing hate speech, while controlling for the latent vulnerability of users to hate speech. Compared to existing baselines, a particular strength of our framework is that it models causal effects that are non-linear, yet still explainable. We find that users with fewer followers, fewer friends, and fewer posts share more hate speech. Younger accounts, in contrast, share less hate speech. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.
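
Step (1), inverse propensity scoring, can be sketched in a few lines. The example assumes a binary exposure indicator and an already estimated propensity per user. The function names, the `clip` parameter, and the toy numbers are hypothetical, and the sketch shows only the standard weighting scheme rather than the full three-step framework.

```python
def ipw_weights(exposed, propensity, clip=0.01):
    """Inverse propensity weights: 1/p for exposed units, 1/(1-p) otherwise."""
    weights = []
    for e, p in zip(exposed, propensity):
        p = min(max(p, clip), 1.0 - clip)  # clipping avoids exploding weights
        weights.append(1.0 / p if e else 1.0 / (1.0 - p))
    return weights


def weighted_mean(values, weights):
    """Mean of `values` under the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


exposed = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
reshared = [1, 0, 0, 1]
w = ipw_weights(exposed, propensity)
print(w)
print(weighted_mean(reshared, w))
```

Users who were unlikely to be observed in their group receive larger weights, which counteracts the selection bias in the raw observational sample.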

翻訳日:2023-10-25 19:02:58 公開日:2023-10-24

# 自己監督型コントラスト学習によるMRI超解像

Unpaired MRI Super Resolution with Self-Supervised Contrastive Learning ( http://arxiv.org/abs/2310.15767v1 )

ライセンス: Link先を確認

Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv

(参考訳) 高分解能mri(high- resolution (hr) magnetic resonance imaging, mri)は臨床における診断精度を高めるために重要である。それでも、MRIの解像度に固有の制限が適用範囲を制限している。深層学習に基づく画像超解像(SR)法は、追加コストなしでMRIの解像度を改善することを約束する。しかし、これらの手法はトレーニングのために相当数のHR MRI画像を必要とすることが多く、取得は困難である。本稿では、自己教師付きコントラスト学習を用いて、限られたトレーニングデータを用いてSR性能を向上させる未ペアMRI SRアプローチを提案する。提案手法は,正および負のサンプル対を構築するために,正のHR画像と合成SR画像の両方を活用し,識別的特徴の学習を容易にする。本研究で得られた実験結果は,hr画像のpaucityが利用可能であっても,ピーク信号対雑音比と構造類似度指数が著しく向上することを示す。本研究は, 臨床応用における高分解能MRIの進歩に寄与し, 限られたトレーニングデータの課題に対処するためのアプローチの可能性を示すものである。

High-resolution (HR) magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings. Nonetheless, the inherent limitation of MRI resolution restricts its widespread applicability. Deep learning-based image super-resolution (SR) methods exhibit promise in improving MRI resolution without additional cost. However, these methods frequently require a substantial number of HR MRI images for training, which can be challenging to acquire. In this paper, we propose an unpaired MRI SR approach that employs self-supervised contrastive learning to enhance SR performance with limited training data. Our approach leverages both authentic HR images and synthetically generated SR images to construct positive and negative sample pairs, thus facilitating the learning of discriminative features. Empirical results presented in this study underscore significant enhancements in the peak signal-to-noise ratio and structural similarity index, even when a paucity of HR images is available. These findings accentuate the potential of our approach in addressing the challenge of limited training data, thereby contributing to the advancement of high-resolution MRI in clinical applications.

翻訳日:2023-10-25 19:02:31 公開日:2023-10-24

# 条件付き精度調整によるロバスト学習

Robust Learning via Conditional Prevalence Adjustment ( http://arxiv.org/abs/2310.15766v1 )

ライセンス: Link先を確認

Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu

(参考訳) 医療データは、境界変数間の相関が広く変化する複数の場所から来ることが多い。深層学習モデルがこれらの不安定な相関を利用していれば、目に見えない場所で破滅的に失敗する可能性がある。不安定な相関に対処する多くの方法が提案されているが、それぞれに制限がある。例えば、敵対的なトレーニングはモデルに不安定な相関を完全に無視させるが、それによって予測性能が低下する可能性がある。他の方法(例えば不変リスク最小化[4])は、因果データ生成過程を仮定して、安定した関連性のみに依存するドメイン不変表現を学習しようとする(入力 X はクラスラベル Y を引き起こす)。したがって、それらはコンピュータビジョンに共通する反因果タスク(Y cause X)に対して効果がない。本稿では,CoPA(Conditional Prevalence-Adjustment)という手法を提案する。 CoPAは、(1)生成機構が安定であり、すなわちラベルYと共起変数(s)ZがXを発生し、(2)各サイトEにおける不安定な条件付き確率がXとYの不安定な相関を完全に考慮していると仮定する。我々の重要な観察は、共起変数は医療現場で定期的に記録され、例えば (Y, Z) サンプルのセット(X のサンプルは不要)から容易に有病率を推定できるということです。 CoPAは、たとえ1つのトレーニングサイトがあっても機能する。合成データと実データを用いた実験では,CoPAが競争ベースラインを上回っていることがわかった。

Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g., invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) the generation mechanism is stable, i.e., label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y. Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario which is often overlooked by existing methods. Our experiments on synthetic and real data show CoPA beating competitive baselines.
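
The observation that the prevalence can be estimated from (Y, Z) samples alone can be sketched as a simple counting estimator. The function name `conditional_prevalence` and the toy site data are hypothetical; how CoPA then uses the estimate inside the model is not shown here.

```python
from collections import Counter, defaultdict


def conditional_prevalence(samples):
    """Estimate P(Y = y | Z = z) at one site from (y, z) pairs by counting.

    No input X is needed, only labels and recorded confounders.
    """
    joint = Counter(samples)
    z_totals = Counter(z for _, z in samples)
    prevalence = defaultdict(dict)
    for (y, z), n in joint.items():
        prevalence[z][y] = n / z_totals[z]
    return dict(prevalence)


# Toy site: y is a diagnosis label, z is a recorded confounder such as age group.
site_samples = [(1, "old"), (0, "old"), (1, "old"), (0, "young")]
print(conditional_prevalence(site_samples))
```

Because the estimate depends only on label and confounder records, it can be recomputed for a new site without collecting any new images or signals.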

翻訳日:2023-10-25 19:02:14 公開日:2023-10-24

# 自由テキストフィードバックから学ぶ - 新しいデータセットを収集するか、既存のものを拡張するか?

Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones? ( http://arxiv.org/abs/2310.15758v1 )

ライセンス: Link先を確認

Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych

(参考訳) 自由テキストの人間のフィードバックから学ぶことはダイアログシステムには不可欠だが、注釈付きデータは少なく、通常は会話型AIで知られている少数のエラータイプのみをカバーする。新しいデータセットをスクラッチから収集しアノテートするのではなく、最新の合成ダイアログ生成は、既存のダイアログデータセットを必要なアノテーションで拡張するために使用できる。しかし,このような取り組みの実現可能性を評価するためには,これらのデータセットに含まれる自由文フィードバックのタイプと頻度を知ることが重要である。本研究では,MultiWoZ,SGD,BABI,ペルソナチャット,ウィザーズ・オブ・ウィキペディア,セルフフィード・チャットボットの人間ボット分割など,多種多様なダイアログデータセットについて検討する。本稿では,対話における自由文人文フィードバックのアノテーションのための新しい分類法を導出し,gpt-2,llama,flan-t5の3つのsota言語生成モデルに対する応答生成におけるそのデータを含む影響について検討した。本研究は,エラータイプ,ユーザ応答型,それらの関係など,検討したデータセットの構成に関する新たな知見を提供する。

Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI. Instead of collecting and annotating new datasets from scratch, recent advances in synthetic dialog generation could be used to augment existing dialog datasets with the necessary annotations. However, to assess the feasibility of such an effort, it is important to know the types and frequency of free-text human feedback included in these datasets. In this work, we investigate this question for a variety of commonly used dialog datasets, including MultiWoZ, SGD, BABI, PersonaChat, Wizards-of-Wikipedia, and the human-bot split of the Self-Feeding Chatbot. Using our observations, we derive new taxonomies for the annotation of free-text human feedback in dialogs and investigate the impact of including such data in response generation for three SOTA language generation models, including GPT-2, LLAMA, and Flan-T5. Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.

翻訳日:2023-10-25 19:01:44 公開日:2023-10-24

# オンライン討論における価値の相違は相違に影響を及ぼすか?

Do Differences in Values Influence Disagreements in Online Discussions? ( http://arxiv.org/abs/2310.15757v1 )

ライセンス: Link先を確認

Michiel van der Meer, Piek Vossen, Catholijn M. Jonker, Pradeep K. Murukannaiah

(参考訳) 差別はオンライン議論で一般的である。相違はコラボレーションを促進し、いくつかの条件下での議論の品質を改善する可能性がある。意見の不一致を認識する方法は存在するが、意見不一致に影響を及ぼす要因の深い理解は文献に欠けている。本稿では,個人価値の違いがオンライン議論における意見の相違を示唆する仮説を考察する。オンライン議論における価値推定に最先端モデルをどのように利用できるか,そして,推定値をどのように価値プロファイルに集約できるかを示す。人手による合意ラベルに基づいて,評価値のプロファイルを評価する。価値プロファイルの相違は特定のケースにおける不一致と相関することがわかった。また,合意予測に価値情報を含めることで,性能が向上することがわかった。

Disagreements are common in online discussions. Disagreement may foster collaboration and improve the quality of a discussion under some conditions. Although there exist methods for recognizing disagreement, a deeper understanding of factors that influence disagreement is lacking in the literature. We investigate a hypothesis that differences in personal values are indicative of disagreement in online discussions. We show how state-of-the-art models can be used for estimating values in online discussions and how the estimated values can be aggregated into value profiles. We evaluate the estimated value profiles based on human-annotated agreement labels. We find that the dissimilarity of value profiles correlates with disagreement in specific cases. We also find that including value information in agreement prediction improves performance.
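
A small sketch may help picture how per-comment value estimates could be aggregated into a profile and how two profiles could be compared. Averaging as the aggregation and cosine dissimilarity as the comparison are assumptions made for illustration; the abstract does not specify either choice, and the function names are hypothetical.

```python
import math


def value_profile(estimates):
    """Average per-comment value estimates into one profile vector."""
    n = len(estimates)
    return [sum(e[i] for e in estimates) / n for i in range(len(estimates[0]))]


def cosine_dissimilarity(p, q):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


# Each inner list is a hypothetical score per value category for one comment.
user_a = value_profile([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
user_b = value_profile([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(user_a, user_b, cosine_dissimilarity(user_a, user_b))
```

A dissimilarity score of this kind could then be compared against human-annotated agreement labels, or passed as an extra feature to an agreement predictor.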

翻訳日:2023-10-25 19:01:22 公開日:2023-10-24

# 言語モデルと直接音声翻訳の統合:ジェンダーの抑揚を制御する推論時間解法

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection ( http://arxiv.org/abs/2310.15752v1 )

ライセンス: Link先を確認

Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

(参考訳) 話者を参照する単語を翻訳する場合、音声翻訳(st)システムはデフォルトの男性ジェネリクスに頼らず、潜在的に誤解を招く声質に頼るべきではない。むしろ、話者の好みに応じて性別を割り当てるべきである。そのための既存のソリューションは、効果的ではあるが、実際には実現不可能ではない。提案手法は,STデコーダが暗黙的に学習した(バイアス付き)内部言語モデル(LM)を,ジェンダー固有の外部LMに置き換えるものである。 en->es/fr/it実験では,女性型において,基礎モデルと最良のトレーニング時間緩和戦略をそれぞれ31.0点,1.6点に上回った。話者の発声特性が性別と矛盾する困難な状況下では、さらに利益が(最大32.0と3.4まで)大きくなる。

When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.
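
One common way to realize a partial replacement of an internal LM at inference time is to subtract a weighted internal-LM score and add a weighted external-LM score during decoding. The sketch below assumes that form. The function name, the two weights, and the toy probabilities are all hypothetical, and the exact scoring rule used in the paper may differ.

```python
import math


def fused_score(log_p_st, log_p_ilm, log_p_ext, ilm_weight=0.5, ext_weight=0.5):
    """ST score, minus a share of the internal LM, plus a share of the external LM."""
    return log_p_st - ilm_weight * log_p_ilm + ext_weight * log_p_ext


# Toy candidates for one speaker-referring word:
# (ST model, internal LM, feminine-specific external LM) probabilities.
candidates = {
    "masculine": (math.log(0.6), math.log(0.7), math.log(0.1)),
    "feminine": (math.log(0.4), math.log(0.3), math.log(0.9)),
}
best = max(candidates, key=lambda form: fused_score(*candidates[form]))
print(best)
```

In this toy case the ST model alone prefers the masculine form, while the fused score selects the feminine one, without any retraining of the ST model.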

翻訳日:2023-10-25 19:01:12 公開日:2023-10-24

# 大規模言語モデルはビデオ質問応答の時間的・因果的推論である

Large Language Models are Temporal and Causal Reasoners for Video Question Answering ( http://arxiv.org/abs/2310.15747v1 )

ライセンス: Link先を確認

Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

(参考訳) 大規模言語モデル(LLM)は、幅広い自然言語理解および生成タスクにおいて顕著なパフォーマンスを示している。ビデオ質問回答 (Video Question Answering, VideoQA) における時間的・因果的推論のために, LLM が $\textit{linguistic shortcuts}$ を有効活用するための先行情報を提供する。しかしながら、そのような先行は、視覚的コンテンツを無視しながら、そのモデルを過度に疑問に答える$\textit{i.e.}$, $\textit{linguistic bias}$ へと導くことによって、ビデオQAの準最適結果を引き起こすことが多い。これは 'ungrounded guesses' や 'hallucinations' とも呼ばれる。この問題を解決するために,ビデオQA 上で LLM が先行する手法である Flipped-VQA を提案し,VQ とVA,QA のペアをそれぞれ付与する$\langle$V,Q,A$\rangle$ triplet のすべての組み合わせを,ソースペアとターゲットラベルをフリップすることで予測し,それらの複雑な関係を理解するために $\textit{i.e.}$,予測 A, Q, V のペアをそれぞれ与えられた VQ, VA, QA のペアを推定する。本稿では,LLaMAにFlipped-VQAを適用してLLaMA-VQAを開発した。さらに、Flipped-VQA は様々な LLM (OPT および GPT-J) に適用可能な汎用フレームワークであり、その性能を一貫して改善する。我々は, Flipped-VQAが言語的ショートカットの活用を促進するだけでなく, 言語バイアスを緩和し, 問題の過度な回答を引き起こすことを実証的に示す。コードはhttps://github.com/mlvlab/flipped-vqaで入手できる。

Large Language Models (LLMs) have shown remarkable performance on a wide range of natural language understanding and generation tasks. We observe that LLMs provide effective priors in exploiting linguistic shortcuts for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, i.e., linguistic bias, while ignoring visual content. This is also known as 'ungrounded guesses' or 'hallucinations'. To address this problem while leveraging LLMs' priors on VideoQA, we propose a novel framework, Flipped-VQA, which encourages the model to predict all the combinations of the $\langle V, Q, A \rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, i.e., to predict A, Q, and V given VQ, VA, and QA pairs, respectively. In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLM-based and non-LLM-based models on five challenging VideoQA benchmarks. Furthermore, our Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performance. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias, which causes incorrect answers that over-rely on the question. Code is available at https://github.com/mlvlab/Flipped-VQA.

翻訳日:2023-10-25 19:00:51 公開日:2023-10-24

# 失敗は道を開く - チューニングフリーなルール蓄積による大規模言語モデルの拡張

Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation ( http://arxiv.org/abs/2310.15746v1 )

ライセンス: Link先を確認

Zeyuan Yang, Peng Li, Yang Liu

(参考訳) 大きな言語モデル(LLM)は素晴らしいパフォーマンスを示しています。しかし、サンプル間の関係を捉えることができないため、これらの凍結LDMは必然的に同様のミスを繰り返し続ける。本稿では,過去の誤りから学習することで,llmの性能向上を指導するチューニングフリールール蓄積(tran)フレームワークを提案する。データが順次到着すると、LSMは不正なケースから徐々にルールを蓄積し、ルールコレクションを形成する。これらのルールはLLMによって、後続の入力を処理する際にも同様のミスを避けるために使用される。さらに、ルールはプライマリプロンプトとは独立であり、シームレスにプロンプトデザイン戦略を補完する。実験により,TRANは最近のベースラインよりも大きなマージンで改善されていることがわかった。

Large Language Models (LLMs) have showcased impressive performance. However, due to their inability to capture relationships among samples, these frozen LLMs inevitably keep repeating similar mistakes. In this work, we propose our Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving their performance by learning from previous mistakes. Considering data arrives sequentially, LLMs gradually accumulate rules from incorrect cases, forming a rule collection. These rules are then utilized by the LLMs to avoid making similar mistakes when processing subsequent inputs. Moreover, the rules remain independent of the primary prompts, seamlessly complementing prompt design strategies. Experimentally, we show that TRAN improves over recent baselines by a large margin.
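
The accumulation loop described above can be sketched with the model calls abstracted away. The sketch assumes that the correct answer becomes available after each prediction. `answer_fn` and `make_rule_fn` are hypothetical placeholders for the two LLM calls (answering with the current rules, and summarizing a mistake into a rule), and the toy functions below only imitate them.

```python
def run_with_rule_accumulation(stream, answer_fn, make_rule_fn):
    """Process inputs in order, adding a rule whenever a prediction is wrong."""
    rules, predictions = [], []
    for question, gold in stream:
        pred = answer_fn(question, rules)  # rules are supplied alongside the prompt
        predictions.append(pred)
        if pred != gold:
            rules.append(make_rule_fn(question, pred, gold))
    return predictions, rules


# Toy stand-ins for the two LLM calls.
def toy_answer(question, rules):
    for rule_question, rule_answer in rules:
        if rule_question == question:
            return rule_answer
    return "unknown"


def toy_make_rule(question, wrong, gold):
    return (question, gold)


stream = [("capital of France?", "Paris"), ("capital of France?", "Paris")]
predictions, rules = run_with_rule_accumulation(stream, toy_answer, toy_make_rule)
print(predictions, rules)
```

The model itself is never updated; only the rule collection grows, which is why the approach can sit on top of any prompt design.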

翻訳日:2023-10-25 18:59:49 公開日:2023-10-24

# トポロジカル非負行列因子化による単一細胞RNA配列の解析

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization ( http://arxiv.org/abs/2310.15744v1 )

ライセンス: Link先を確認

Yuta Hozumi and Guo-Wei Wei

(参考訳) 単細胞rnaシークエンシング(scrna-seq)は比較的新しい技術であり、scrna-seqデータに関連する高次元、複雑さ、大規模であることから統計学、データサイエンス、計算生物学に多大な関心を寄せている。非負行列分解(NMF)は、結果として生じる低次元成分のメタジーン解釈によるユニークなアプローチを提供する。しかし、NMFアプローチはマルチスケール分析の欠如に悩まされている。この研究は、2つの永続ラプラシア正規化NMF法、すなわちトポロジカルNMF(TNMF)とロバストトトポロジカルNMF(rTNMF)を導入している。合計12のデータセットを用いて、提案したTNMFとrTNMFが他のNMFベースの手法よりも大幅に優れていることを示す。また,TNMF と rTNMF を用いて,一般的な一様多様体近似・投影 (UMAP) と t-分散確率的隣接埋め込み (t-SNE) の可視化を行った。

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).

翻訳日:2023-10-25 18:59:13 公開日:2023-10-24

# パラメータ効率の良い構成知識グラフ表現のためのランダムエンティティ量子化

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation ( http://arxiv.org/abs/2310.15797v1 )

ライセンス: Link先を確認

Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao

(参考訳) 下流タスクには知識グラフ(KG)の表現学習が不可欠である。支配的なアプローチであるKG Embedding(KGE)は、独立したベクトルを持つエンティティを表し、スケーラビリティの課題に直面している。最近の研究では、事前定義された小さなコードブックからマッチしたエンティティ対応コードワードを構成することでエンティティを表現する、パラメータ効率の代替方法を提案している。本稿では、各エンティティの対応するコードワードをエンティティ量子化として取得するプロセスについて述べる。本稿では,単純なランダムな実体量子化が,現在の戦略と同じような結果が得られることを示す。この現象を分析し,エンティティ表現のための数値化結果であるエンティティ符号が,コードレベルではエントロピーが高く,ランダムなエンティティ量子化下ではコードワードレベルではjaccard距離が高いことを明らかにする。したがって、異なる実体はより容易に区別され、効果的なKG表現を促進する。以上の結果から,現在の定量化戦略はkg表現にとって重要ではないこと,また,実体識別性が現在の戦略を超えて向上する余地があることが示された。結果はhttps://github.com/jiaangl/randomquantizationで再現できます。

Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces a scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and higher Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.
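
Random entity quantization and the codeword-level Jaccard distance can be sketched as follows. The codebook size, the number of codewords per entity, and the function names are hypothetical, and the sketch covers only the random assignment and the distance measure, not the downstream encoder that composes codewords into entity representations.

```python
import random


def random_quantize(num_entities, codebook_size, codewords_per_entity, seed=0):
    """Assign each entity a random set of distinct codeword ids."""
    rng = random.Random(seed)
    return [
        frozenset(rng.sample(range(codebook_size), codewords_per_entity))
        for _ in range(num_entities)
    ]


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


codes = random_quantize(num_entities=4, codebook_size=50, codewords_per_entity=5)
for i in range(len(codes)):
    for j in range(i + 1, len(codes)):
        print(i, j, round(jaccard_distance(codes[i], codes[j]), 3))
```

Pairs with a distance close to 1 share almost no codewords, which is the sense in which random codes keep entities easy to tell apart.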

翻訳日:2023-10-25 18:51:14 公開日:2023-10-24

# プレフィックス部分空間学習による大規模言語モデルの一般化

Improving generalization in large language models by learning prefix subspaces ( http://arxiv.org/abs/2310.15793v1 )

ライセンス: Link先を確認

Louis Falissard, Vincent Guigue, Laure Soulier

(参考訳) この記事では、不足データレジーム("few-shot"学習設定としても知られる)における、大言語モデル(llms)の微調整に焦点を当てます。ニューラルネットワーク部分空間に基づくLLMの一般化能力を向上させる手法を提案する。近年,コンピュータビジョンで導入されたこの最適化手法は,パラメータ空間におけるモデル全体の結合最適化を通じて,より広い局所最適化を同定することにより,モデル一般化を改善することを目的としている。しかし、大規模で事前訓練されたトランスフォーマーへの適応は、いくつかの課題を引き起こす。第一に、それらのパラメータの数によって複数のモデルの訓練が難しくなっており、第二に、決定論的パラメータの初期化スキームは、当初提案された部分空間法に不適当である。本稿では,Parameter Efficient Fine-Tuning(PEFT)法が従来の手法と完全に互換性があることを示し,連続接頭辞の単純さを学習することを提案する。本手法は,数ショットの学習環境に適応したGLUEベンチマークの変種を用いて試行し,両コントリビューションが相多手法と比較して平均性能の向上につながることを示す。実装は以下のリンクで確認できる。 https://github.com/Liloulou/prefix_subspace

This article focuses on fine-tuning large language models (LLMs) in the scarce-data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn an entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performance compared to state-of-the-art methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
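
A simplex of prefixes can be pictured as a few learnable prefix vectors (the vertices) plus a way to draw a point inside their convex hull. The sketch below assumes each prefix is flattened into a plain list of floats and samples uniformly from the simplex by normalizing Gamma(1, 1) draws. The function names and the uniform sampling choice are assumptions; the training loop and the loss are omitted.

```python
import random


def sample_simplex_weights(n_vertices, rng):
    """Uniform sample from the probability simplex (normalized Gamma(1, 1) draws)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_vertices)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_prefixes(vertices, weights):
    """Convex combination of prefix vectors: one point inside the simplex."""
    dim = len(vertices[0])
    return [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]


rng = random.Random(0)
vertices = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # three toy 2-dimensional prefixes
weights = sample_simplex_weights(len(vertices), rng)
print(weights, mix_prefixes(vertices, weights))
```

Under this reading, a training step would draw fresh weights, run the frozen model with the mixed prefix, and backpropagate into all vertices. Using the centroid at inference time is one plausible choice, stated here as an assumption.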

翻訳日:2023-10-25 18:50:54 公開日:2023-10-24

# qPOTS: Pareto 最適トンプソンサンプリングによる効率的なバッチ多目的ベイズ最適化

qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling ( http://arxiv.org/abs/2310.15788v1 )

ライセンス: Link先を確認

S. Ashwin Renganathan

(参考訳) 多目的最適化の古典的進化的アプローチは、非常に効果的であるが、目的に対して多くのクエリを発生させる。多目的最適化を解くためのサンプル効率のアプローチは、ガウス過程(GP)サロゲートとベイズ最適化(BO)である。多目的ベイズ最適化(MOBO)は、新しい観測候補を取得するために最適化された取得関数の構築を伴う。この ‘inner' の最適化は様々な理由により困難である: 取得関数は非凸であり、非微分可能であり、/または解析形式で利用できない。我々は、このハード獲得関数最適化ステップを廃止し、より安価な多目的最適化問題を解くことで得られたランダムgp後方サンプルパスのparetoフロンティアから新しい候補を選択する(q\texttt{pots}$)トンプソンサンプリングベースアプローチ(q\texttt{pots}$)を提案する。より高次元での計算的トラクタビリティを向上させるために、Nystr\"{o}m近似と組み合わせた自動アクティブな候補選択法を提案する。提案手法は,任意のgp事前仮定に適用し,合成および実世界実験において,精度と計算効率の両面で,最先端における強力な経験的性能を示す。

Classical evolutionary approaches for multiobjective optimization are quite effective but incur a lot of queries to the objectives; this can be prohibitive when objectives are expensive oracles. A sample-efficient approach to solving multiobjective optimization is via Gaussian process (GP) surrogates and Bayesian optimization (BO). Multiobjective Bayesian optimization (MOBO) involves the construction of an acquisition function which is optimized to acquire new observation candidates. This "inner" optimization can be hard due to various reasons: acquisition functions being nonconvex, nondifferentiable and/or unavailable in analytical form; the success of MOBO heavily relies on this inner optimization. We do away with this hard acquisition function optimization step and propose a simple, but effective, Thompson sampling based approach (qPOTS) where new candidate(s) are chosen from the Pareto frontier of random GP posterior sample paths obtained by solving a much cheaper multiobjective optimization problem. To further improve computational tractability in higher dimensions we propose an automated active set of candidates selection combined with a Nyström approximation. Our approach applies to arbitrary GP prior assumptions and demonstrates strong empirical performance over the state of the art, both in terms of accuracy and computational efficiency, on synthetic as well as real-world experiments.
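
Choosing candidates from a Pareto frontier relies on a non-dominated filter, sketched here for a minimization problem. In the method above, the points would be objective values of GP posterior sample paths evaluated at candidate inputs; in this sketch they are plain tuples, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(points))
```

This quadratic-time filter is enough for small candidate sets; the cost of evaluating it on sampled paths is negligible compared with querying an expensive objective.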

翻訳日:2023-10-25 18:50:31 公開日:2023-10-24

# SequenceMatch: 半教師あり学習のための弱強強化設計の再検討

SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning ( http://arxiv.org/abs/2310.15787v1 )

ライセンス: Link先を確認

Khanh-Binh Nguyen

(参考訳) 半教師付き学習(SSL)は,大量のラベルのないデータを用いたモデルのトレーニングを可能にするため,近年普及している。しかし、SSLメソッドが直面する問題のひとつは、モデルが小さなラベル付きトレーニングデータセットに過度に適合し、過信で誤った予測を生成する場合に発生する、確認バイアスである。この問題に対処するために,複数のデータ拡張を利用する効率的なSSL手法であるSequenceMatchを提案する。 sequencematchのキー要素は、ラベルなしデータのメディア拡張を含んでいることです。拡張された各例の異なる拡張と一貫性の制約を利用することで、sequencematchは弱く強く拡張された例に対するモデルの予測分布の相違を減らすのに役立ちます。さらに、SequenceMatchは、高信頼と低信頼の予測のための2つの異なる一貫性の制約を定義する。その結果、SequenceMatchはReMixMatchよりもデータ効率が高く、ReMixMatch($\times4$)とCoMatch($\times2$)の両方よりも時間効率が高い。その単純さにもかかわらず、SequenceMatchはCIFAR-10/100、SVHN、STL-10といった標準ベンチマークの先行手法より一貫して優れている。また、ImageNetのような大規模データセットで38.46\%のエラー率で、最先端の手法をはるかに上回っている。コードはhttps://github.com/beandkay/sequencematchで入手できる。

Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is the confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch (4×) and CoMatch (2×) while having higher accuracy. Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46% error rate. Code is available at https://github.com/beandkay/SequenceMatch.

翻訳日:2023-10-25 18:50:04 公開日:2023-10-24

# 小規模確率メタラーニングのためのニューラルネットワークの償却推論

Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning ( http://arxiv.org/abs/2310.15786v1 )

ライセンス: Link先を確認

Matthew Ashman, Tommy Rochussen and Adrian Weller

(参考訳) BNNに対する大域的誘導点変分近似は、真の後続分布の条件を正確に近似する一連の条件分布を構築するために、一連のインジェクション入力を使用する。我々の重要な洞察は、これらのインプットを実際のデータに置き換えることができ、変動分布は各データポイントに対して近似的な確率の集合からなることである。この構造は、推定ネットワークとして知られるメタモデルを通して各データポイントを渡すことで、各近似近似のパラメータが得られ、アモートされた推論になる。この推論ネットワークを関連するデータセット間でトレーニングすることにより、タスク固有のBNNに対するメタ学習ベイズ推論が可能になる。

The global inducing point variational approximation for BNNs is based on using a set of inducing inputs to construct a series of conditional distributions that accurately approximate the conditionals of the true posterior distribution. Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihoods for each datapoint. This structure lends itself to amortised inference, in which the parameters of each approximate likelihood are obtained by passing each datapoint through a meta-model known as the inference network. By training this inference network across related datasets, we can meta-learn Bayesian inference over task-specific BNNs.

翻訳日:2023-10-25 18:49:34 公開日:2023-10-24

# LLMをテストエキスパートにする - 機能的認識によるモバイルGUIテストへのヒューマンライクなインタラクション

Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions ( http://arxiv.org/abs/2310.15780v1 )

ライセンス: Link先を確認

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang

(参考訳) 自動化されたグラフィカルユーザインターフェース(gui)テストは、アプリケーションの品質を保証する上で重要な役割を果たす。自動guiテストにおける学習ベースのテクニックの人気は、人間のようなインタラクションを生成する能力によって高まっているが、テストカバレッジの低さ、一般化能力の不十分、トレーニングデータへの依存度など、いくつかの制限に苦しめられている。自然言語理解や質問応答におけるChatGPTのような大規模言語モデル(LLM)の成功に触発されて,我々はQ&AタスクとしてモバイルGUIテスト問題を定式化した。 gptdroidを提案し,guiページ情報をllmに渡してテストスクリプトを省略し,アプリケーションのフィードバックをllmに渡すように実行し,プロセス全体を繰り返すことで,モバイルアプリとのチャットをllmに依頼する。このフレームワークでは、llmにプロセス全体のテスト知識を保持させ、長期にわたって機能ベースの推論を行うことで探索を導く、機能対応メモリプロンプト機構も導入しています。 google playの93のアプリで評価し、最高のベースラインを32%のアクティビティカバレッジで上回り、より速い速度で31%のバグを検出することを実証した。さらに、gptdroidはgoogle playで新たに53のバグを発見し、そのうち35が修正されている。

Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing due to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM to chat with mobile apps: it passes the GUI page information to the LLM to elicit testing scripts, executes them, and passes the app feedback back to the LLM, iterating the whole process. Within this framework, we have also introduced a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have been confirmed and fixed.

翻訳日:2023-10-25 18:49:23 公開日:2023-10-24

# MRIスキャンにおけるプライバシー向上のための3Dマスクオートエンコーダ

3D Masked Autoencoders for Enhanced Privacy in MRI Scans ( http://arxiv.org/abs/2310.15778v1 )

ライセンス: Link先を確認

Lennart Alexander Van der Goten and Kevin Smith

(参考訳) MRIスキャンは貴重な医療情報を提供するが、保護すべき機密かつ個人識別可能な情報(PII)も含む。 MRIメタデータは容易にサニタイズされるが、MRI画像データは患者の頭部の高現実的な3Dヴィジュアライゼーションをレンダリングする情報を含んでいるため、データベースを相互参照することで、悪意あるアクターが被検体を特定できるため、プライバシー上のリスクである。データ匿名化と非識別化は個人の個人情報のプライバシーと機密性の確保に関係している。従来のMRI鑑定法では、特定のスキャンからプライバシーに敏感な部分(目、鼻など)を取り除く。これは、ダウンストリーム分析をオフにできるドメインシフトの導入に費やされる。近年,GANをベースとしたアプローチが提案され,患者のスキャンを部品の除去ではなく (顔の変更など) 改造して識別する手法が提案されている。本研究では,マスク付きオートエンコーダを用いて顔を非識別するモデルcp-maeを提案する。この方法では,最大256^3$(以前は128立方体)の解像度のスキャンを合成することができ,ボクセルの数が8倍に増加した。構築した構成を使って、非常に堅牢なトレーニングステージを示すシステムを設計することができ、ネットワークを新しいデータに適合させるのが容易になりました。

MRI scans provide valuable medical information, but they also contain sensitive and personally identifiable information (PII) that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains enough information to render highly realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification are concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. Recently, a GAN-based approach was proposed to de-identify a patient's scan by remodeling it (e.g. changing the face) rather than by removing parts. In this work, we propose CP-MAE, a model that de-identifies the face using masked autoencoders and that outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize scans of resolution up to $256^3$ (previously $128^3$), which constitutes an eight-fold increase in the number of voxels. Using our construction we were able to design a system that exhibits a highly robust training stage, making it easy to fit the network on novel data.
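The eight-fold figure follows directly from the two cubic resolutions quoted in the abstract. As a quick check:

$$
\frac{256^3}{128^3} = \left(\frac{256}{128}\right)^3 = 2^3 = 8,
\qquad 256^3 = 16{,}777{,}216, \quad 128^3 = 2{,}097{,}152 .
$$

Doubling the side length of a cubic volume therefore multiplies the voxel count by eight.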

翻訳日:2023-10-25 18:48:58 公開日:2023-10-24

# MindLLM: スクラッチ、評価、ドメイン・アプリケーションからトレーニング済みの軽量大言語モデル

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications ( http://arxiv.org/abs/2310.15777v1 )

ライセンス: Link先を確認

Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語タスクにおいて顕著な性能を示し、汎用人工知能への大きな一歩を踏み出した。汎用人工知能は、ますます大規模なモデルを開発することで活用されているが、LLMのトレーニングとデプロイのコストとリソース不足を考慮して、特定のドメインにより良いサービスを提供する軽量なカスタムモデルを開発するための別の部門が存在する可能性がある。本稿では,13億,30億のパラメータを持つモデルを提供することで,その負担を軽減するために,スクラッチから訓練したバイリンガル軽量大言語モデルであるMindLLMを提案する。データ構築、モデルアーキテクチャ、評価、アプリケーションなど、プロセスのすべてのステップをカバーしている。このような洞察は、同僚の学者や開発者にとって有益である。 MindLLMは、いくつかの公開ベンチマークにおいて、他のオープンソースの大規模モデルのパフォーマンスと一貫して一致または上回っている。また,小型モデルに適した革新的な命令チューニングフレームワークを導入し,その能力を向上させる。さらに、法律や金融といった特定の垂直領域におけるMindLLMの適用について検討し、軽量モデルの俊敏性と適応性を強調します。

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.

翻訳日:2023-10-25 18:48:30 公開日:2023-10-24

# CP$^{\infty}$ and beyond: 2-カテゴリー拡張理論

CP$^{\infty}$ and beyond: 2-categorical dilation theory ( http://arxiv.org/abs/2310.15776v1 )

ライセンス: Link先を確認

Robert Allen and Dominic Verdon

(参考訳) カテゴリー量子力学の洞察と技法を無限次元系に拡張する問題は (coecke and heunen, 2016) で検討された。その仕事において、ヒルベルト空間と有界線型写像の圏からヒルベルト空間と量子演算の圏を復元する$\mathrm{CP}^{\infty}$-コンストラクションが定義された。ここで、$\mathrm{cp}^{\infty}$-コンストラクションの‘ホリゾンタル分類’によって、フォン・ノイマン代数、双加群、インタートウィナーの2-圏 $[w^*]$ からすべてのフォン・ノイマン代数とチャネル(正規ユニタリ正の写像)の圏を回復できることを示す。応用として、チェーの有限次元行列代数間の極端チャネルのキャラクタリゼーションを任意のフォン・ノイマン代数間の極端チャネルのキャラクタリゼーションに拡張する。

The problem of extending the insights and techniques of categorical quantum mechanics to infinite-dimensional systems was considered in (Coecke and Heunen, 2016). In that work the $\mathrm{CP}^{\infty}$-construction, which recovers the category of Hilbert spaces and quantum operations from the category of Hilbert spaces and bounded linear maps, was defined. Here we show that by a `horizontal categorification' of the $\mathrm{CP}^{\infty}$-construction, one can recover the category of all von Neumann algebras and channels (normal unital completely positive maps) from the 2-category $[W^*]$ of von Neumann algebras, bimodules and intertwiners. As an application, we extend Choi's characterisation of extremal channels between finite-dimensional matrix algebras to a characterisation of extremal channels between arbitrary von Neumann algebras.

翻訳日:2023-10-25 18:48:11 公開日:2023-10-24

# BLESS: 文の単純化に関する大規模言語モデルのベンチマーク

BLESS: Benchmarking Large Language Models on Sentence Simplification ( http://arxiv.org/abs/2310.15773v1 )

ライセンス: Link先を確認

Tannon Kew, Alison Chi, Laura Vásquez-Rodríguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow

(参考訳) 本稿では,最新の大規模言語モデル(LLM)の総合的なパフォーマンスベンチマークであるBLESSについて,テキスト単純化(TS)の課題について紹介する。そこで,本研究では,各ドメインの3つのテストセット(Wikipedia,ニュース,医療)に対して,サイズ,アーキテクチャ,事前学習方法,アクセシビリティなど,44種類のモデルを比較して,この課題を克服する方法について検討する。本分析では,異なるモデルで実行される共通編集操作のタイプについて,一連の自動測定値と大規模に定量的に検討する。さらに,モデル出力のサブセットを手作業で定性解析することにより,生成した単純化の品質を評価する。評価の結果,最高のLSMはTSのトレーニングを受けていないにもかかわらず,最先端のTSベースラインと相容れない性能を示した。さらに,一部のLCMでは編集操作の幅と多様性がより大きいことが判明した。私たちのパフォーマンスベンチマークは、将来のTSメソッドと評価メトリクスの開発のためのリソースとして利用できます。

We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our analysis considers a suite of automatic metrics as well as a large-scale quantitative investigation into the types of common edit operations performed by the different models. Furthermore, we perform a manual qualitative analysis on a subset of model outputs to better gauge the quality of the generated simplifications. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines. Additionally, we find that certain LLMs demonstrate a greater range and diversity of edit operations. Our performance benchmark will be available as a resource for the development of future TS methods and evaluation metrics.

翻訳日:2023-10-25 18:47:50 公開日:2023-10-24

# 非自然言語処理: 言語モデルはマシン生成プロンプトをどのように扱うか?

Unnatural language processing: How do language models handle machine-generated prompts? ( http://arxiv.org/abs/2310.15829v1 )

ライセンス: Link先を確認

Corentin Kervadec, Francesca Franzon and Marco Baroni

(参考訳) 言語モデルプロンプト最適化研究は、モデル埋め込み空間からのベクトル列を含む、明確な意味や構文構造を持たない自動生成されたトークンシーケンスによって、意味論的および文法上、手作業によるプロンプトがルーチン的に上回ることを示した。我々は機械生成プロンプトを用いて、自然言語表現を含まない入力に対してモデルがどのように反応するかを探索する。連続的および離散的な機械生成プロンプトに応答し,複数の意味タスクにおいて異なる大きさのモデルの挙動を考察し,人間の生成した自然言語プロンプトに応答する振る舞いと比較した。同様の出力を生成する場合でも、マシン生成とヒューマンプロンプトは、異なるパープレキシティ、異なる注意と出力エントロピー分布、異なるユニットアクティベーションプロファイルを含む、ネットワーク処理経路を通じて異なる応答パターンをトリガーする。我々は、異なるプロンプトタイプによって活性化される単位の性質について予備的な洞察を与え、自然言語のみが真に言語的な回路をリクルートすることを示唆する。

Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit.

翻訳日:2023-10-25 18:41:58 公開日:2023-10-24

# 高度高分解能3次元ResUNetによる自動大動脈切開 : SEG.Aチャレンジへの貢献

Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge ( http://arxiv.org/abs/2310.15827v1 )

ライセンス: Link先を確認

Marek Wodzinski and Henning Müller

(参考訳) 3次元医用量の自動大動脈分割は重要な課題である。いくつかの要因は、大動脈解離の可能性や、小枝の分節化や注釈の難しさなど、問題を難しくしている。この研究は、MICCAI 2023カンファレンスで組織されたSEGへのMedGIFTチームの貢献を示す。ディープエンコーダ・デコーダアーキテクチャに基づく完全自動アルゴリズムを提案する。私たちの研究の主な前提は、特に低いデータ構造において、データ前処理と拡張がディープアーキテクチャよりもずっと重要であるということです。したがって、この解は伝統的な畳み込みU-Netの変種に基づいている。提案手法は,すべてのテストケースに対して0.9以上のdiceスコアを達成し,参加者の安定性が最も高かった。本法は, 臨床評価, 定量的結果, 容積メッシュの質について, 1位, 4位, 3位と評価した。ソースコードと事前訓練されたモデルを自由にリリースし、Grand-Challengeプラットフォーム上でアルゴリズムへのアクセスを提供する。

Automatic aorta segmentation from 3-D medical volumes is an important yet difficult task. Several factors make the problem challenging, e.g. the possibility of aortic dissection or the difficulty with segmenting and annotating the small branches. This work presents a contribution by the MedGIFT team to the SEG.A challenge organized during the MICCAI 2023 conference. We propose a fully automated algorithm based on a deep encoder-decoder architecture. The main assumption behind our work is that data preprocessing and augmentation are much more important than the deep architecture, especially in low data regimes. Therefore, the solution is based on a variant of the traditional convolutional U-Net. The proposed solution achieved a Dice score above 0.9 for all testing cases with the highest stability among all participants. The method scored 1st, 4th, and 3rd in terms of the clinical evaluation, quantitative results, and volumetric meshing quality, respectively. We freely release the source code and pretrained model, and provide access to the algorithm on the Grand-Challenge platform.
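The abstract reports results as Dice scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name, the flat-list input format, and the convention for two empty masks are assumptions for illustration, and this is not the challenge's evaluation code.

```python
def dice_score(prediction, reference):
    """Dice coefficient for two binary masks given as flat sequences of 0/1."""
    if len(prediction) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, r in zip(prediction, reference) if p and r)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for p in prediction if p) + sum(1 for r in reference if r)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction of `[1, 1, 1, 0, 0, 0]` against a reference of `[1, 1, 0, 0, 0, 0]` shares two foreground voxels out of a combined five, giving 2·2/5 = 0.8.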

翻訳日:2023-10-25 18:41:38 公開日:2023-10-24

# コンセプトドリフトについて知っておくべきこと - 進化する環境のモニタリングに関するサーベイ

One or Two Things We know about Concept Drift -- A Survey on Monitoring Evolving Environments ( http://arxiv.org/abs/2310.15826v1 )

ライセンス: Link先を確認

Fabian Hinder and Valerie Vaquet and Barbara Hammer

(参考訳) 私たちを取り巻く世界は常に変化している。これらの変化は、しばしば概念の漂流と表現され、多くの産業や技術プロセスに影響を及ぼす。多くのシナリオでは安全性が重要であり、誤動作やその他の異常な行動につながる可能性があるため、概念ドリフトの検出と分析が不可欠である。本稿では,教師なしデータストリームにおける概念ドリフトに着目した文献レビューを行う。多くの調査は教師なしのデータストリームにフォーカスしているが、教師なしの設定をレビューする作業はない。しかし、この設定はモニタリングや異常検出に特に関連しており、エンジニアリングにおける多くのタスクや課題に直接適用できる。この調査は、ドリフト検出に関する既存の研究の分類を提供する。さらに、ドリフトの局在に関する研究の現状を体系的な方法でカバーしている。体系的な文献レビューの提供に加えて、本研究は、考慮された問題の正確な数学的定義を提供し、パラメトリックな人工データセットの標準化実験を含んでおり、検出とローカライゼーションの異なる戦略を直接比較することができる。これにより、異なるスキームの適合性を系統的に分析し、実世界のシナリオで使用するためのガイドラインを提供できる。最後に、概念ドリフトを説明するという新しいトピックのセクションがある。

The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. Besides, it covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.

翻訳日:2023-10-25 18:41:23 公開日:2023-10-24

# Rosetta Stone - KSAA-RD Shared Task: 言語モデリングから単語定義へ

Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment ( http://arxiv.org/abs/2310.15823v1 )

ライセンス: Link先を確認

Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi

(参考訳) 逆辞書は、ユーザーが提供された定義、意味、記述に基づいて単語を発見できるツールである。このような手法は様々なシナリオで有用であり、同一性のない単語の記述を持つ言語学習者を支援し、正確な用語を求める作家に利益をもたらす。これらのシナリオは、しばしば"Tip-of-the-Tongue"(TOT)現象と呼ばれる現象をカプセル化する。本稿では,アラビア語逆辞書共有タスクの勝利解を提案する。この課題は、アラビア語のベクトル表現を付随する記述から導出することに焦点を当てている。共有タスクは2つの異なるサブタスクを含む: 1つはアラビア語の定義を入力として含み、もう1つは英語の定義を用いる。最初のサブタスクに対して、我々のアプローチは、与えられた定義に埋め込まれた単語を予測し、微調整されたアラビアBERTベースのモデルの集合に依存する。最終的な表現は、アンサンブル内の各モデルからの出力埋め込み平均化によって得られる。対照的に、第2サブタスクの最も効果的な解決策は、英語のテスト定義をアラビア語に翻訳し、最初は第1サブタスクのために訓練された微調整モデルに適用することである。この簡単な方法は両方のサブタスクで最高点を達成する。

A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT) phenomenon. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task. This task focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first involves an Arabic definition as input, while the second employs an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition. The final representation is obtained through averaging the output embeddings from each model within the ensemble. In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic and applying them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score across both subtasks.
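The ensembling step described above, averaging the output embeddings of several models, can be sketched in a few lines. This is only an illustration of element-wise averaging. The function name and the plain-list representation are assumptions, and the models that would produce the vectors are not shown.

```python
def average_embeddings(model_outputs):
    """Element-wise mean of embedding vectors predicted by several models.

    `model_outputs` is a list with one vector (list of floats) per ensemble
    member, all predicted for the same definition.
    """
    if not model_outputs:
        raise ValueError("at least one model output is required")
    dim = len(model_outputs[0])
    if any(len(vec) != dim for vec in model_outputs):
        raise ValueError("all embeddings must have the same dimension")
    n_models = len(model_outputs)
    return [sum(vec[i] for vec in model_outputs) / n_models for i in range(dim)]
```

With two hypothetical model outputs `[1.0, 2.0]` and `[3.0, 4.0]`, the ensemble representation is `[2.0, 3.0]`.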

翻訳日:2023-10-25 18:41:04 公開日:2023-10-24

# 社会的アイデンティティバイアスを示す生成言語モデル

Generative Language Models Exhibit Social Identity Biases ( http://arxiv.org/abs/2310.15819v1 )

ライセンス: Link先を確認

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

(参考訳) 大規模言語モデルの人気の高まりは、これらのモデルが人間から学べるバイアスに対する懸念を引き起こした。本研究では,51大言語モデルに内集団連帯性と外集団敵意,社会科学の基本的な社会的バイアスが存在するかを検討する。ほとんどすべての基礎言語モデルといくつかの命令微調整モデルは、文の完全化を促されたとき(例えば、「我々は...」など)、明らかな非群正および外集団負のバイアスを示す。 LLM生成文とインターネット上の人書き文を比較すると、これらのモデルが人間のテキストと同等のバイアスレベルを示していることが分かる。これらのバイアスがどこから発生したのかを調べるために,米国民主党・共和党の分断の文脈で,モデルが微調整中に露呈した非グループ陽性または非グループ陰性の文の量を実験的に変化させた。その結果,モデルでは,グループ内連帯の著しい増加と,グループ外敵性の増加がみられた。さらに、微調整データから非群陽性または非群陰性の文(または両方)を削除すると、非群連帯と非群敵性の両方が著しく減少し、偏りのあるトレーニングデータを削除することでバイアスを低減できることが示唆される。以上より,現代言語モデルは基本的な社会的アイデンティティバイアスを示し,そのバイアスをトレーニングデータのキュレーションによって軽減できることが示唆された。以上の結果から, バイアスの少ない大規模言語モデルの作成や, ヒトのバイアス強化を防止すべく, llmとのユーザインタラクションに関するさらなる研究の必要性を浮き彫りにした。

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. In this study, we investigate whether ingroup solidarity and outgroup hostility, fundamental social biases known from social science, are present in 51 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative biases when prompted to complete sentences (e.g., "We are..."). A comparison of LLM-generated sentences with human-written sentences on the internet reveals that these models exhibit similar, if not greater, levels of bias than human text. To investigate where these biases stem from, we experimentally varied the amount of ingroup-positive or outgroup-negative sentences the model was exposed to during fine-tuning in the context of the United States Democrat-Republican divide. Doing so resulted in the models exhibiting a marked increase in ingroup solidarity and an even greater increase in outgroup hostility. Furthermore, removing either ingroup-positive or outgroup-negative sentences (or both) from the fine-tuning data leads to a significant reduction in both ingroup solidarity and outgroup hostility, suggesting that biases can be reduced by removing biased training data. Our findings suggest that modern language models exhibit fundamental social identity biases and that such biases can be mitigated by curating training data. Our results have practical implications for creating less biased large language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

翻訳日:2023-10-25 18:40:43 公開日:2023-10-24

# 自己回帰拡散モデルのための判別器ガイダンス

Discriminator Guidance for Autoregressive Diffusion Models ( http://arxiv.org/abs/2310.15817v1 )

ライセンス: Link先を確認

Filip Ekström Kelvinius, Fredrik Lindsten

(参考訳) 自己回帰拡散モデルの設定において判別器ガイダンスを導入する。拡散過程を導くための判別器の使用は、これまで連続拡散モデルに用いられてきたが、本研究では、離散的な場合において、事前訓練された生成モデルとともに判別器を使用する方法が導出される。まず,最適判別器を用いて事前学習したモデルを修正し,基礎となるデータ分布から正確なサンプリングを可能にすることを示す。第2に、サブ最適判別器を使用する現実的なシナリオを考慮し、生成過程において、識別器からの予測を反復的に考慮した逐次モンテカルロアルゴリズムを導出する。これらのアプローチを分子グラフ生成のタスクでテストし,事前学習したモデルのみを用いて識別器が生成性能をいかに改善するかを示す。

We introduce discriminator guidance in the setting of Autoregressive Diffusion Models. The use of a discriminator to guide a diffusion process has previously been used for continuous diffusion models, and in this work we derive ways of using a discriminator together with a pretrained generative model in the discrete case. First, we show that using an optimal discriminator will correct the pretrained model and enable exact sampling from the underlying data distribution. Second, to account for the realistic scenario of using a sub-optimal discriminator, we derive a sequential Monte Carlo algorithm which iteratively takes the predictions from the discriminator into account during the generation process. We test these approaches on the task of generating molecular graphs and show how the discriminator improves the generative performance over using only the pretrained model.
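The claim that an optimal discriminator corrects the pretrained model can be illustrated on a toy categorical distribution. The sketch below relies on the standard density-ratio identity: an optimal discriminator between data and model samples satisfies d(x) = p_data(x) / (p_data(x) + p_model(x)), so d(x) / (1 − d(x)) = p_data(x) / p_model(x). It is not the paper's autoregressive or sequential Monte Carlo procedure, and the function names and example probabilities are made up for illustration.

```python
def optimal_discriminator(p_data, p_model):
    """Optimal discriminator output for each outcome of a categorical variable."""
    return [pd / (pd + pm) for pd, pm in zip(p_data, p_model)]

def corrected_distribution(p_model, discriminator):
    """Reweight the model by d / (1 - d) and renormalize.

    Assumes every discriminator output is strictly below 1, i.e. the model
    assigns nonzero probability to every outcome.
    """
    weights = [pm * d / (1.0 - d) for pm, d in zip(p_model, discriminator)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with made-up probabilities.
p_data = [0.5, 0.3, 0.2]
p_model = [0.2, 0.3, 0.5]
d_star = optimal_discriminator(p_data, p_model)
p_corrected = corrected_distribution(p_model, d_star)
```

In this toy case `p_corrected` matches `p_data` up to floating-point error. With an imperfect discriminator the reweighting is only approximate, which is the situation the sequential Monte Carlo algorithm in the abstract is meant to handle.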

翻訳日:2023-10-25 18:40:10 公開日:2023-10-24

# 非線形次元の低減と現在 : ml時代の散逸pdesを目指して

Nonlinear dimensionality reduction then and now: AIMs for dissipative PDEs in the ML era ( http://arxiv.org/abs/2310.15816v1 )

ライセンス: Link先を確認

Eleni D. Koronaki, Nikolaos Evangelou, Cristina P. Martin-Linares, Edriss S. Titi and Ioannis G. Kevrekidis

(参考訳) 本研究では,分散動的システムのための還元次モデル(rom)を構築するための純粋データ駆動ワークフローの集合について述べる。私たちが注目しているROMは、近似慣性マニフォールド(AIM)の理論にインスパイアされ、テンプレート化されたデータアシストモデルです。その適用性は拡張可能であり、正確な切り裂かれたガレルキン射影と閉じた補正の導出の必要性は機械学習ツールを使って回避できる。右潜在変数が既知でない場合、自己エンコーダと拡散写像(多様体学習スキーム)が、潜在変数のよい集合を発見し、それらの説明可能性をテストするためにどのように用いられるかを説明する。提案手法はROMを表現できる。 (a)理論(フーリエ係数) (b)線形データ駆動(PODモード)及び/または (c)非線形データ駆動(拡散マップ)座標 Black-Box と Gray-Box のモデル (理論上はインフォームドとデータ修正) はどちらも記述されているが、後者の必要性は truncated Galerkin projections が不正確すぎて後処理ができない場合に生じる。チャフィー・インファント反応拡散方程式とクラモト・シヴァシンスキー散逸偏微分方程式を用いて, 全体の枠組みを説明, 検証した。

This study presents a collection of purely data-driven workflows for constructing reduced-order models (ROMs) for distributed dynamical systems. The ROMs we focus on are data-assisted models inspired by, and templated upon, the theory of Approximate Inertial Manifolds (AIMs); the particular motivation is the so-called post-processing Galerkin method of Garcia-Archilla, Novo and Titi. Its applicability can be extended: the need for accurate truncated Galerkin projections and for deriving closed-form corrections can be circumvented using machine learning tools. When the right latent variables are not a priori known, we illustrate how autoencoders as well as Diffusion Maps (a manifold learning scheme) can be used to discover good sets of latent variables and test their explainability. The proposed methodology can express the ROMs in terms of (a) theoretical (Fourier coefficients), (b) linear data-driven (POD modes) and/or (c) nonlinear data-driven (Diffusion Maps) coordinates. Both Black-Box and (theoretically-informed and data-corrected) Gray-Box models are described; the necessity for the latter arises when truncated Galerkin projections are so inaccurate as to not be amenable to post-processing. We use the Chafee-Infante reaction-diffusion and the Kuramoto-Sivashinsky dissipative partial differential equations to illustrate and successfully test the overall framework.

翻訳日:2023-10-25 18:39:55 公開日:2023-10-24

# 良いこと: 騒々しいデモのための自己モチベーション模倣学習

Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations ( http://arxiv.org/abs/2310.15815v1 )

ライセンス: Link先を確認

Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, MingZhong Wang

(参考訳) イミテーションラーニング(IL)は,エージェントの行動と専門家によるデモンストレーションとの相違を最小化することで,政策の発見を目指す。しかし、ilは非熟練の行動から騒がしいデモンストレーションによって課される制限を受けやすく、その専門性を評価するための補足的な情報がないことが大きな課題となっている。本稿では,現在の方針に劣る方針によって収集されたデモを段階的にフィルタリングし,追加情報を必要としない自己モチベーション模倣学習(smile)を提案する。拡散モデルの前方および逆の過程を利用して, 実演知識の低レベルから高レベルへのシフトをエミュレートし, 実演知識を拡散する雑音情報を抽出する。そして,そのノイズ情報を利用して,現状の政策と実証者間の拡散過程を予測し,それらの専門的ギャップに対する等価性を理論的に実証する。さらに, 予測拡散ステップを適用して, 自己動機づけによる騒音を除去し, その理論的根拠を提供する方法について, 詳細に説明する。提案手法は,MuJoCoタスクに対する経験的評価を通じて,ノイズの多い実演中のエキスパートポリシーの学習に長けており,現在の政策に劣る専門知識を持つデモンストレーションを効果的にフィルタリングする。

Imitation Learning (IL) aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations. However, IL is susceptible to limitations imposed by noisy demonstrations from non-expert behaviors, presenting a significant challenge due to the lack of supplementary information to assess their expertise. In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy, eliminating the need for additional information. We utilize the forward and reverse processes of Diffusion Models to emulate the shift in demonstration expertise from low to high and vice versa, thereby extracting the noise information that diffuses expertise. Then, the noise information is leveraged to predict the diffusion steps between the current policy and demonstrators, which we theoretically demonstrate to be equivalent to their expertise gap. We further explain in detail how the predicted diffusion steps are applied to filter out noisy demonstrations in a self-motivated manner and provide its theoretical grounds. Through empirical evaluations on MuJoCo tasks, we demonstrate that our method is proficient in learning the expert policy amidst noisy demonstrations, and effectively filters out demonstrations with expertise inferior to the current policy.

翻訳日:2023-10-25 18:39:28 公開日:2023-10-24

# 損失超伝導量子回路の非線形応答理論

Nonlinear response theory for lossy superconducting quantum circuits ( http://arxiv.org/abs/2310.15802v1 )

ライセンス: Link先を確認

V. Vadimov, M. Xu, J. T. Stockburger, J. Ankerhold, and M. Möttönen

(参考訳) 最小拡張状態空間における量子散逸の枠組みに基づいて、損失超伝導量子回路向けに開発された数値的完全かつ計算可能な非線形応答理論を提案する。系の自由度を回路の非線形要素とする開量子系に対するファインマン-ヴァーノン経路積分形式から始め、回路の自由度に結合した複素数値周波数を持つ補助調和モードを導入することにより、すべての線形要素の時間的非局所的影響関数を除去する。本研究では,実験から着想を得た平均観測値の概念を提案し,その準確率分布を生成するための公式を提供する。さらに,ドライブの存在下での弱い結合近似を系統的に導出し,超伝導量子ビットの分散可読化に関する研究を通じて,形式性の適用性を示す。開発されたフレームワークは、弱い散逸、高温、弱い駆動への典型的なアプローチの制限なしに、その環境と結合した非線形量子回路の包括的完全量子力学的処理を可能にする。さらに,本研究の量子計測理論への影響についても論じる。

We introduce a numerically exact and yet computationally feasible nonlinear response theory developed for lossy superconducting quantum circuits based on a framework of quantum dissipation in a minimally extended state space. Starting from the Feynman--Vernon path integral formalism for open quantum systems with the system degrees of freedom being the nonlinear elements of the circuit, we eliminate the temporally non-local influence functional of all linear elements by introducing auxiliary harmonic modes with complex-valued frequencies coupled to the nonlinear degrees of freedom of the circuit. In our work, we propose a concept of time-averaged observables, inspired by experiment, and provide an explicit formula for producing their quasiprobability distribution. Furthermore, we systematically derive a weak-coupling approximation in the presence of a drive, and demonstrate the applicability of our formalism through a study on the dispersive readout of a superconducting qubit. The developed framework enables a comprehensive fully quantum-mechanical treatment of nonlinear quantum circuits coupled to their environment, without the limitations of typical approaches to weak dissipation, high temperature, and weak drive. Furthermore, we discuss the implications of our findings for quantum measurement theory.

翻訳日:2023-10-25 18:39:03 公開日:2023-10-24

# DALE: 低リソースの法定NLPのための生成データ拡張

DALE: Generative Data Augmentation for Low-Resource Legal NLP ( http://arxiv.org/abs/2310.15799v1 )

ライセンス: Link先を確認

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S Ramaneswaran, S Sakshi, Utkarsh Tyagi, Dinesh Manocha

(参考訳) 低リソースレガルNLPのための新規かつ効果的な生成データ拡張フレームワークであるDALEを提案する。 DALEは、法律文書の効果的なデータ拡張において、既存のフレームワークがもたらす課題に対処する - 専門的な語彙と複雑な意味論、形態学、構文を持つ法律言語は、ソース文を単に言い換えるデータ拡張の恩恵を受けない。この問題に対処するために,エンコーダ・デコーダ言語モデル上に構築されたDALEは,選択的マスキングに基づく新たな教師なしテキスト記述目標に基づいて事前訓練されている。これらはDALEが法的概念、原則、言語使用に関する知識を得るのに役立つ。その結果、新しい文脈でコヒーレントで多様な拡張を生成する能力が発達する。最後に、DALEは条件付き生成を行い、低リソースのLegal NLPタスクのための合成拡張を生成する。 6つのタスクと4つの低リソース設定にまたがる13のデータセットに対するDALEの有効性を示す。 DALEは、LLMを含むすべてのベースラインを質的かつ定量的に上回り、1%から50%改善しました。

We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentence. To address this, DALE, built on an Encoder-Decoder Language Model, is pre-trained on a novel unsupervised text denoising objective based on selective masking - our masking strategy exploits the domain-specific language characteristics of templatized legal documents to mask collocated spans of text. Denoising these spans helps DALE acquire knowledge about legal concepts, principles, and language usage. Consequently, it develops the ability to generate coherent and diverse augmentations with novel contexts. Finally, DALE performs conditional generation to generate synthetic augmentations for low-resource Legal NLP tasks. We demonstrate the effectiveness of DALE on 13 datasets spanning 6 tasks and 4 low-resource settings. DALE outperforms all our baselines, including LLMs, qualitatively and quantitatively, with improvements of 1%-50%.

翻訳日:2023-10-25 18:38:42 公開日:2023-10-24

# スピンウィグナー関数核のノイズ調整構成

Noise-tailored Constructions for Spin Wigner Function Kernels ( http://arxiv.org/abs/2310.15855v1 )

ライセンス: Link先を確認

Michael Hanks and Soovin Lee and M. S. Kim

(参考訳) ノイズの多い中間スケール量子デバイスの有効利用には、サンプリングされた測定分布の精度を向上させるために誤差緩和が必要である。これらの分布に対するノイズの影響がより正確にモデル化できるほど、誤差緩和は理論的な限界に近づくことができる。雑音のある量子チャネルの特性評価と、一般の観測量に対するその影響の推定は難しい問題であるが、多くの場合、表現を変更することで解析を大幅に単純化できる。本稿では、多qudit系(multi-qudit systems)のスピンウィグナー関数について検討する。従来のカーネル構成を一般化し、いくつかの確率的ユニタリノイズモデルの効果を少数のパラメータで捉える。

The effective use of noisy intermediate-scale quantum devices requires error mitigation to improve the accuracy of sampled measurement distributions. The more accurately the effects of noise on these distributions can be modeled, the more closely error mitigation will be able to approach theoretical bounds. The characterisation of noisy quantum channels and the inference of their effects on general observables are challenging problems, but in many cases a change in representation can greatly simplify the analysis. Here, we investigate spin Wigner functions for multi-qudit systems. We generalise previous kernel constructions, capturing the effects of several probabilistic unitary noise models in few parameters.

翻訳日:2023-10-25 18:30:51 公開日:2023-10-24

# イベント時間空間分割学習によるイベント時間予測の改善

Improving Event Time Prediction by Learning to Partition the Event Time Space ( http://arxiv.org/abs/2310.15853v1 )

ライセンス: Link先を確認

Jimmy Hickey, Ricardo Henao, Daniel Wojdyla, Michael Pencina, Matthew M. Engelhard

(参考訳) 近年の生存率解析手法は, 既定(離散)時間間隔ごとに事象発生確率を予測することにより, 既存の手法を改良した。イベント密度に強いパラメトリック仮定を置くことを避けることで、特にデータが豊富である場合、この手法は予測性能を改善する傾向にある。しかし、利用可能なデータが少ない臨床環境では、目の前の予測タスクに適した限られた間隔に、イベント時間空間を適切に分割することが望ましいことが多い。本研究では,そのような分割を定義する切断点の集合をデータから学習する手法を開発する。 2つのシミュレーションデータセットにおいて、基礎となる生成モデルにマッチする間隔を回復できることを示す。次に,新たに調和した脳卒中リスク予測データセットを含む実世界の3つの観測データに対して,予測性能の向上を示す。最後に,本手法は,より正確なリスク予測を促進するという意味で,各タスクに最も適した時間間隔を提案することにより,臨床意思決定を促進する。

Recently developed survival analysis methods improve upon existing approaches by predicting the probability of event occurrence in each of a number of pre-specified (discrete) time intervals. By avoiding placing strong parametric assumptions on the event density, this approach tends to improve prediction performance, particularly when data are plentiful. However, in clinical settings with limited available data, it is often preferable to judiciously partition the event time space into a limited number of intervals well suited to the prediction task at hand. In this work, we develop a method to learn from data a set of cut points defining such a partition. We show that in two simulated datasets, we are able to recover intervals that match the underlying generative model. We then demonstrate improved prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset. Finally, we argue that our approach facilitates clinical decision-making by suggesting time intervals that are most appropriate for each task, in the sense that they facilitate more accurate risk prediction.
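
As a rough illustration of what a partition of the event time space means in practice, the sketch below maps continuous event times to interval indices given a set of cut points and evaluates a discrete-time negative log-likelihood. It is not the authors' method: the cut points here are fixed by hand rather than learned, the function names are hypothetical, and a censored subject is assumed to contribute the probability mass of its censoring interval and all later intervals.

```python
import math
from bisect import bisect_right


def interval_index(t, cut_points):
    """Index of the interval containing time t.

    cut_points must be sorted ascending. With m cut points there are
    m + 1 intervals: [0, c0), [c0, c1), ..., [c_{m-1}, inf).
    """
    return bisect_right(cut_points, t)


def discretize(times, cut_points):
    """Map each continuous time to its interval index."""
    return [interval_index(t, cut_points) for t in times]


def neg_log_likelihood(probs, times, observed, cut_points):
    """Discrete-time survival negative log-likelihood (simplified).

    probs[i][k] is the predicted probability that subject i has the event
    in interval k (each row sums to 1). observed[i] is True for an observed
    event and False for a censored subject.
    """
    total = 0.0
    for p, t, obs in zip(probs, times, observed):
        k = interval_index(t, cut_points)
        if obs:
            total -= math.log(p[k])        # event falls in interval k
        else:
            total -= math.log(sum(p[k:]))  # event in interval k or later
    return total
```

With hand-picked cut points `[1.0, 5.0]`, a time of `2.0` lands in the middle interval; a learned partition would replace that list with values chosen to fit the data.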

翻訳日:2023-10-25 18:30:14 公開日:2023-10-24

# 人工フランス語データを用いたトランスフォーマー言語モデルにおけるジェンダーバイアス発生の理解

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models ( http://arxiv.org/abs/2310.15852v1 )

ライセンス: Link先を確認

Lina Conti and Guillaume Wisniewski

(参考訳) 多くの研究が、ニューラル言語モデルが直接の監督なしに様々な言語特性を学ぶ能力を示している。この研究は、ニューラルモデルが性(ジェンダー)などの単語の言語的性質や、その使用を規定する規則をどのように発見するかという、あまり研究されていないトピックを探求するための最初のステップである。本稿では、フランス語をベースとしたPCFGで生成した人工コーパスを用いて、トレーニングデータ中の性の分布を正確に制御し、モデルが性の情報を正しく捉える条件、あるいは逆に性に関するバイアスを示す条件を明らかにすることを提案する。

Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
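
To make the idea of a PCFG with a controllable gender distribution concrete, here is a minimal sketch. The grammar, the vocabulary, and the `p_masc` parameter are invented for illustration and are not taken from the paper; the only point is that a single rule probability fixes the share of masculine noun phrases while the grammar enforces determiner-noun-adjective agreement.

```python
import random


def make_grammar(p_masc):
    """Toy French-like PCFG; p_masc is the probability of a masculine NP."""
    return {
        "NP": [(p_masc, ["DET_m", "N_m", "ADJ_m"]),
               (1.0 - p_masc, ["DET_f", "N_f", "ADJ_f"])],
        "DET_m": [(1.0, ["le"])],
        "DET_f": [(1.0, ["la"])],
        "N_m": [(0.5, ["chat"]), (0.5, ["livre"])],
        "N_f": [(0.5, ["table"]), (0.5, ["maison"])],
        "ADJ_m": [(0.5, ["noir"]), (0.5, ["vert"])],
        "ADJ_f": [(0.5, ["noire"]), (0.5, ["verte"])],
    }


def sample(symbol, grammar, rng):
    """Expand a symbol top-down; symbols without rules are terminals."""
    if symbol not in grammar:
        return [symbol]
    rules = grammar[symbol]
    weights = [p for p, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, grammar, rng))
    return words


rng = random.Random(0)
corpus = [" ".join(sample("NP", make_grammar(0.8), rng)) for _ in range(5)]
```

Sweeping `p_masc` from balanced to heavily skewed values would produce training corpora whose gender distribution is known exactly, which is the kind of control the abstract describes.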

翻訳日:2023-10-25 18:29:48 公開日:2023-10-24

# セルフガード: LLMに自身を守る力を与える

Self-Guard: Empower the LLM to Safeguard Itself ( http://arxiv.org/abs/2310.15851v1 )

ライセンス: Link先を確認

Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong

(参考訳) ジェイルブレイク攻撃は、大規模言語モデル(LLM)の安全対策を回避し、有害なコンテンツを生成させることができる。このLLMの誤用は社会に悪影響を及ぼしている。現在、ジェイルブレイク攻撃に対処するには、安全性トレーニングとセーフガードという2つの主要なアプローチがある。安全性トレーニングは、安全性を高めるためにLLMをさらに訓練することに焦点を当てている。一方、セーフガードは、有害な出力を防ぐために外部モデルやフィルタを実装する。しかし、安全性トレーニングは新しい攻撃タイプに適応する能力に制約があり、しばしばモデル性能の低下につながる。セーフガードの効果は限定的であることが分かっている。これらの問題に対処するため、両者の強みを組み合わせたSelf-Guardと呼ばれる新しいアプローチを提案する。Self-Guardには2つの段階がある。第1段階では有害コンテンツを評価するモデルの能力を高め、第2段階では自身の応答に対して有害コンテンツの検出を一貫して行うようモデルに指示する。実験により、Self-Guardがジェイルブレイク攻撃に対して頑健であることが示された。失敗事例の分析では、LLMが有害なクエリに対して無害な応答を返すことが時折あることが分かった。さらに、安全性トレーニング前後におけるLLMの汎用能力を評価し、Self-GuardがLLMの性能劣化を招かないことを示す証拠を得た。感度テストでは、Self-GuardはLLMの過敏性を誘発しないだけでなく、この問題を軽減することもできる。

The jailbreak attack can bypass the safety measures of a Large Language Model (LLM), generating harmful content. This misuse of LLM has led to negative societal consequences. Currently, there are two main approaches to address jailbreak attacks: safety training and safeguards. Safety training focuses on further training LLM to enhance its safety. On the other hand, safeguards involve implementing external models or filters to prevent harmful outputs. However, safety training has constraints in its ability to adapt to new attack types and often leads to a drop in model performance. Safeguards have proven to be of limited help. To tackle these issues, we propose a novel approach called Self-Guard, which combines the strengths of both safety methods. Self-Guard includes two stages. In the first stage, we enhance the model's ability to assess harmful content, and in the second stage, we instruct the model to consistently perform harmful content detection on its own responses. The experiment has demonstrated that Self-Guard is robust against jailbreak attacks. In the bad case analysis, we find that LLM occasionally provides harmless responses to harmful queries. Additionally, we evaluated the general capabilities of the LLM before and after safety training, providing evidence that Self-Guard does not result in the LLM's performance degradation. In sensitivity tests, Self-Guard not only avoids inducing over-sensitivity in LLM but also can even mitigate this issue.

翻訳日:2023-10-25 18:29:27 公開日:2023-10-24

# 条件付き変分推論を用いた動的PETイメージングの事後推定

Posterior Estimation for Dynamic PET imaging using Conditional Variational Inference ( http://arxiv.org/abs/2310.15850v1 )

ライセンス: Link先を確認

Xiaofeng Liu, Thibault Marin, Tiss Amal, Jonghye Woo, Georges El Fakhri, Jinsong Ouyang

(参考訳) 本研究の目的は、時間放射能曲線(time-activity curve)の測定値が与えられたときに、動的陽電子放射断層撮影(PET)イメージングにおける動態パラメータの事後分布を効率的に推定することである。順方向の動態モデルによるパラメトリック画像から測定空間への写像には本質的な情報損失があるため、逆写像は一意に定まらない。従来の(しかし計算コストの高い)解法はマルコフ連鎖モンテカルロ(MCMC)サンプリングであり、漸近的に不偏な推定を与えることが知られている。本研究では、効率的な事後推定のための深層学習に基づくフレームワークを提案する。具体的には、潜在変数を導入することで順方向過程の情報損失を補う。次に、条件付き変分オートエンコーダ(CVAE)を用いて、そのエビデンス下界(ELBO)を最適化する。十分に訓練されたデコーダは、与えられた測定値と、単純な多変量ガウス分布からサンプリングした潜在変数から事後分布を推論できる。簡略化参照組織モデルを用い、低次元データ(単一の脳領域)に対して、不偏なMCMCを参照としてCVAEに基づく手法を検証した。

This work aims at efficiently estimating the posterior distribution of kinetic parameters for dynamic positron emission tomography (PET) imaging given a measurement of the time-activity curve. Considering the inherent information loss from parametric imaging to measurement space with the forward kinetic model, the inverse mapping is ambiguous. The conventional (but expensive) solution can be the Markov Chain Monte Carlo (MCMC) sampling, which is known to produce unbiased asymptotic estimation. We propose a deep-learning-based framework for efficient posterior estimation. Specifically, we counteract the information loss in the forward process by introducing latent variables. Then, we use a conditional variational autoencoder (CVAE) and optimize its evidence lower bound. The well-trained decoder is able to infer the posterior with a given measurement and the sampled latent variables following a simple multivariate Gaussian distribution. We validate our CVAE-based method using unbiased MCMC as the reference for low-dimensional data (a single brain region) with the simplified reference tissue model.
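
For orientation, the evidence lower bound of a conditional variational autoencoder can be written as follows. The notation is an assumption made here, not the paper's: $\theta$ denotes the kinetic parameters, $y$ the measured time-activity curve, $z$ the latent variable, $q_\phi$ the encoder, $p_\psi$ the decoder, and the prior over $z$ is taken to be a standard Gaussian independent of $y$.

```latex
\log p_\psi(\theta \mid y)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid \theta, y)}\!\left[\log p_\psi(\theta \mid y, z)\right]
  \;-\;
  \mathrm{KL}\!\left(q_\phi(z \mid \theta, y)\,\middle\|\,p(z)\right),
  \qquad p(z) = \mathcal{N}(0, I).
```

Under these assumptions, approximate posterior samples for a new measurement $y$ are obtained by drawing $z \sim \mathcal{N}(0, I)$ and passing $(y, z)$ through the trained decoder, which is what makes the approach cheaper at inference time than running MCMC per measurement.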

翻訳日:2023-10-25 18:29:06 公開日:2023-10-24

# 公平性、プライバシー、規制規範を備えた責任ある機械学習データセットについて

On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms ( http://arxiv.org/abs/2310.15848v1 )

ライセンス: Link先を確認

Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner

(参考訳) 人工知能(AI)は様々な科学分野に進出し、幅広いタスクで既存のアルゴリズムを大きく上回る改善をもたらしている。近年、AI技術の信頼性に対する深刻な懸念が生じている。科学コミュニティは信頼できるAIアルゴリズムの開発に注力してきた。しかし、今日のAIコミュニティで広く使われている機械学習とディープラーニングのアルゴリズムは、開発時に使用されるデータに大きく依存している。これらの学習アルゴリズムはデータ中のパターンを特定し、行動目標を学習する。データに含まれる欠陥は、そのままアルゴリズムに反映される可能性がある。本研究では、責任ある機械学習データセット(Responsible Machine Learning Datasets)の重要性を論じ、責任あるルーブリックを用いてデータセットを評価するフレームワークを提案する。既存の研究はアルゴリズムの信頼性の事後評価に重点を置いているが、我々はデータの要素を別個に考慮し、アルゴリズムにおけるその役割を理解するためのフレームワークを提供する。公平性、プライバシー、規制遵守の観点から責任あるデータセットを議論し、将来のデータセットを構築するための推奨事項を示す。 100以上のデータセットを調査したうえで60のデータセットを分析に使用し、これらのデータセットのいずれも、公平性、プライバシー保護、規制遵守の問題を免れていないことを示す。また、「datasheets for datasets」を修正し、データセット文書化の改善のための重要な項目を追加した。世界中の政府がデータ保護法を整備しつつあるため、科学コミュニティにおけるデータセットの作成方法は見直しが必要である。この研究は、今日のAI時代において時宜を得た重要なものであると考える。

Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness of AI technologies. The scientific community has focused on the development of trustworthy AI algorithms. However, machine and deep learning algorithms, popular in the AI community today, depend heavily on the data used during their development. These learning algorithms identify patterns in the data, learning the behavioral objective. Any flaws in the data have the potential to translate directly into algorithms. In this study, we discuss the importance of Responsible Machine Learning Datasets and propose a framework to evaluate the datasets through a responsible rubric. While existing work focuses on the post-hoc evaluation of algorithms for their trustworthiness, we provide a framework that considers the data component separately to understand its role in the algorithm. We discuss responsible datasets through the lens of fairness, privacy, and regulatory compliance and provide recommendations for constructing future datasets. After surveying over 100 datasets, we use 60 datasets for analysis and demonstrate that none of these datasets is immune to issues of fairness, privacy preservation, and regulatory compliance. We provide modifications to the "datasheets for datasets" with important additions for improved dataset documentation. With governments around the world regularizing data protection laws, the method for the creation of datasets in the scientific community requires revision. We believe this study is timely and relevant in today's era of AI.

翻訳日:2023-10-25 18:28:50 公開日:2023-10-24

# 人種バイアス分析のための新しい方法:個人レベルの参照の収集

A Novel Method for Analysing Racial Bias: Collection of Person Level References ( http://arxiv.org/abs/2310.15847v1 )

ライセンス: Link先を確認

Muhammed Yusuf Kocyigit, Anietie Andy, Derry Wijaya

(参考訳) 文学やメディアにおけるバイアス付きコンテンツへの長期的露出は、人々の現実に対する認識に大きな影響を与え、検出と対処が難しい暗黙のバイアスの発生につながる(Gerbner 1998)。本研究では,2つのグループ間の表現の違いを分析し,google booksデータセット(goldberg and orwant 2013)を用いて1850年から2000年にかけての書物におけるアフリカ系アメリカ人と白人の表現について検討する手法を提案する。表現の違いを理解するためのより良いツールを開発することで、バイアスを認識し緩和するための継続的な努力に貢献することを目指している。より一般的な語句(男性、女性、白人、黒人など)に基づいて文脈を区別する手法(Tripodi et al. 2019, Lucy, Tadimeti, Bamman 2022)を改善するため、歴史的に重要な人物の包括的リストを収集し、それらの名前を用いて関連文脈を選択することを提案する。この手法は、選択バイアスのリスクを低減し、暗黙のバイアスを検出するためのより正確でニュアンスな方法を提供する。我々は10年ごとにグループ表現を作成し、それらを整列した意味空間で解析する(Hamilton, Leskovec, Jurafsky 2016)。我々は、各グループの文脈において、時間調整された毒性(Bassignana, Basile, Patti 2018)を評価し、数十年にわたるグループ間の最も顕著な違いを示す意味軸(Lucy, Tadimeti, Bamman 2022)を特定し、その結果をさらに支援する。我々は,提案手法が社会的変化を正確に把握できることを示し,本書に記載されているアフリカ系アメリカ人の相対的な数が増えつつも,それらを取り巻く文脈は,白人よりも有毒であることを示す。

Long-term exposure to biased content in literature or media can significantly influence people's perceptions of reality, leading to the development of implicit biases that are difficult to detect and address (Gerbner 1998). In this study, we propose a novel method to analyze the differences in representation between two groups and use it to examine the representation of African Americans and White Americans in books from 1850 to 2000 with the Google Books dataset (Goldberg and Orwant 2013). By developing better tools to understand differences in representation, we aim to contribute to the ongoing efforts to recognize and mitigate biases. To improve upon the more common phrase-based (men, women, white, black, etc.) methods to differentiate context (Tripodi et al. 2019; Lucy, Tadimeti, and Bamman 2022), we propose collecting a comprehensive list of historically significant figures and using their names to select relevant context. This novel approach offers a more accurate and nuanced method for detecting implicit biases by reducing the risk of selection bias. We create group representations for each decade and analyze them in an aligned semantic space (Hamilton, Leskovec, and Jurafsky 2016). We further support our results by assessing the time-adjusted toxicity (Bassignana, Basile, and Patti 2018) in the context for each group and identifying the semantic axes (Lucy, Tadimeti, and Bamman 2022) that exhibit the most significant differences between the groups across decades. We support our method by showing that it can capture known socio-political changes accurately, and our findings indicate that while the relative number of African American names mentioned in books has increased over time, the context surrounding them remains more toxic than that surrounding White Americans.

翻訳日:2023-10-25 18:28:29 公開日:2023-10-24

# 量子および半古典的解析によるスピノルBose-Einstein凝縮体の動的相転移の特徴付け

Characterizing dynamical phase transitions in a spinor Bose-Einstein condensate via quantum and semiclassical analyses ( http://arxiv.org/abs/2310.15841v1 )

ライセンス: Link先を確認

Zhen-Xia Niu and Qian Wang

(参考訳) 量子多体系の非平衡ダイナミクスにおける相転移、いわゆる動的相転移(DPT)は、物理学の様々な分野で観測される多様な動的現象を理解するうえで重要な役割を果たす。一般にDPTには2つの種類があり、1つ目は物理的観測量の異なる時間発展の振る舞いによって特徴づけられる相転移であり、2つ目は初期状態の生存確率のレート関数における非解析性によって特徴づけられる。本稿では、非平衡ダイナミクスを研究するための理想的なプラットフォームであるスピノルBose-Einstein凝縮体(BEC)において、量子的および半古典的な両面からこれらのDPTに着目する。急激なクエンチ過程を用いて、制御パラメータが臨界値(臨界クエンチと呼ぶ)を通過してクエンチされるときに、系が両方の種類のDPTを示すことを実証する。半古典的手法によって臨界クエンチを決定する方法を解析的に示し、2種類のDPTの半古典的および量子的な特徴を詳細に検討する。さらに、DPTの発生が、基礎となる古典系のセパラトリクスと密接に関連していることを明らかにする。本研究の結果は、DPTの性質についてより深い知見を与え、明確に定義された半古典的極限を持つ量子系におけるDPTを理解するうえで半古典的解析が有用であることを裏付ける。

Phase transitions in nonequilibrium dynamics of many-body quantum systems, the so-called dynamical phase transitions (DPTs), play an important role in understanding various dynamical phenomena observed in different branches of physics. In general, there are two types of DPTs: the first refers to the phase transition that is characterized by distinct evolution behaviors of a physical observable, while the second is marked by nonanalyticities in the rate function of the initial state survival probability. Here, we focus on such DPTs from both quantum and semiclassical perspectives in a spinor Bose-Einstein condensate (BEC), an ideal platform to investigate nonequilibrium dynamics. By using the sudden quench process, we demonstrate that the system exhibits both types of DPTs as the control parameter quenches through the critical one, referred to as the critical quench. We show analytically how to determine the critical quenches by means of the semiclassical approach and carry out a detailed examination of both semiclassical and quantum signatures of the two types of DPTs. Moreover, we further reveal that the occurrence of DPTs is closely connected to the separatrix in the underlying classical system. Our findings provide more insights into the properties of DPTs and verify the usefulness of semiclassical analysis for understanding DPTs in quantum systems with a well-defined semiclassical limit.

翻訳日:2023-10-25 18:27:54 公開日:2023-10-24

# 新しい意図発見のための拡散重み付きグラフフレームワーク

A Diffusion Weighted Graph Framework for New Intent Discovery ( http://arxiv.org/abs/2310.15836v1 )

ライセンス: Link先を確認

Wenkai Shi, Wenbin An, Feng Tian, Qinghua Zheng, QianYing Wang, Ping Chen

(参考訳) New Intent Discovery (NID)は、既知のインテントのみを含む限られたラベル付きデータを利用して、ラベルなしデータから新規および既知の両方のインテントを認識することを目的とする。従来の手法は、サンプル間の構造的関係を考慮しないため、量と質のバランスが取れないノイズの多い監視信号を生成し、新しいインテントクラスタの形成や事前学習の知識の効果的な転移を妨げている。この制限を緩和するために、データに内在する意味的類似性と構造的関係の両方を捉え、より十分で信頼性の高い監視信号を可能にする新しい拡散重み付きグラフフレームワーク(DWGF)を提案する。具体的には、各サンプルについて、最近傍に導かれる意味経路に沿って近傍関係を複数ホップにわたって拡散し、その局所構造を識別的に特徴づける。次に、その正のキーをサンプリングし、意味的類似性と局所構造に基づいて重み付けして対照学習を行う。推論時には、さらにグラフ平滑化フィルタ(GSF)を提案し、構造的関係を明示的に利用して、クラスタ境界上の意味的に曖昧なサンプルに含まれる高周波ノイズを除去する。広範な実験により、本手法は複数のベンチマークデータセットにおいて、すべての評価指標で最先端モデルを上回ることが示された。コードとデータはhttps://github.com/yibai-shi/DWGFで入手できる。

New Intent Discovery (NID) aims to recognize both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents. Without considering structure relationships between samples, previous methods generate noisy supervisory signals which cannot strike a balance between quantity and quality, hindering the formation of new intent clusters and effective transfer of the pre-training knowledge. To mitigate this limitation, we propose a novel Diffusion Weighted Graph Framework (DWGF) to capture both semantic similarities and structure relationships inherent in data, enabling more sufficient and reliable supervisory signals. Specifically, for each sample, we diffuse neighborhood relationships along semantic paths guided by the nearest neighbors for multiple hops to characterize its local structure discriminately. Then, we sample its positive keys and weigh them based on semantic similarities and local structures for contrastive learning. During inference, we further propose Graph Smoothing Filter (GSF) to explicitly utilize the structure relationships to filter high-frequency noise embodied in semantically ambiguous samples on the cluster boundary. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/yibai-shi/DWGF.
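
The multi-hop diffusion of neighborhood relationships can be pictured with a small sketch. This is an illustrative reading, not the DWGF implementation: the k-nearest-neighbor graph uses cosine similarity, a neighbor first reached at hop `h` receives weight `decay ** (h - 1)`, and the names `knn_graph`, `diffuse`, and `decay` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(embeddings, k):
    """For each sample, the indices of its k most similar other samples."""
    n = len(embeddings)
    neighbors = []
    for i in range(n):
        sims = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbors.append([j for _, j in sims[:k]])
    return neighbors


def diffuse(neighbors, hops, decay=0.5):
    """Spread neighbor relations along nearest-neighbor edges.

    Returns one dict per sample mapping a reached sample to a weight that
    shrinks with the hop at which it was first reached.
    """
    weights = []
    for i in range(len(neighbors)):
        reached = {}
        seen = {i}
        frontier = [i]
        for hop in range(1, hops + 1):
            next_frontier = []
            for u in frontier:
                for v in neighbors[u]:
                    if v not in seen:
                        seen.add(v)
                        reached[v] = decay ** (hop - 1)
                        next_frontier.append(v)
            frontier = next_frontier
        weights.append(reached)
    return weights
```

In a contrastive setup, the reached samples could serve as candidate positive keys and the weights as their importance, which is the role the abstract assigns to semantic similarity and local structure.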

翻訳日:2023-10-25 18:27:30 公開日:2023-10-24

# コンセプトドリフト説明法を用いた配水網における小漏れの局所化

Localization of Small Leakages in Water Distribution Networks using Concept Drift Explanation Methods ( http://arxiv.org/abs/2310.15830v1 )

ライセンス: Link先を確認

Valerie Vaquet and Fabian Hinder and Kathrin Lammers and Jonas Vaquet and Barbara Hammer

(参考訳) 気候変動に直面すると、すでに限定された飲料水の利用量は今後減少し、飲料水はますます希少な資源となる。相当量の水は、水運と流通網の漏れによって失われる。漏水検出と局所化は複雑な相互作用と水流ネットワークの要求の変化のために困難な問題である。特に小さな漏れは指摘が難しいが、長期にわたる水の損失を避けるためには、その位置が不可欠である。リーク検出とローカライゼーションのタスクの解決には様々なアプローチがあるが、リアルタイム需要測定や正確なネットワークトポロジといったシステムに関する様々な情報に依存しており、これは現実のシナリオでは非現実的な仮定である。対照的に,本研究は圧力測定のみを用いて漏洩局所化を試みる。この目的のために, まず, 配水網内の漏洩をベイズネットワークを用いてモデル化し, システムダイナミクスを解析する。次に,問題がどのように接続されているかを示し,概念ドリフトのレンズを通して考察する。特に、モデルに基づくコンセプトドリフトの説明は、ネットワークに関する情報が限られている場合のリークをローカライズするための有望なツールであると主張する。この手法は現実的なベンチマークシナリオを用いて実験的に評価される。

Facing climate change, the already limited availability of drinking water will decrease in the future, rendering drinking water an increasingly scarce resource. Considerable amounts of it are lost through leakages in water transportation and distribution networks. Leakage detection and localization are challenging problems due to the complex interactions and changing demands in water distribution networks. Small leakages in particular are hard to pinpoint, yet their localization is vital to avoid water loss over long periods of time. While there exist different approaches to solving the tasks of leakage detection and localization, they rely on various information about the system, e.g. real-time demand measurements and the precise network topology, which is an unrealistic assumption in many real-world scenarios. In contrast, this work attempts leakage localization using pressure measurements only. For this purpose, first, leakages in the water distribution network are modeled employing Bayesian networks, and the system dynamics are analyzed. We then show how the problem is connected to and can be considered through the lens of concept drift. In particular, we argue that model-based explanations of concept drift are a promising tool for localizing leakages given limited information about the network. The methodology is experimentally evaluated using realistic benchmark scenarios.

翻訳日:2023-10-25 18:27:08 公開日:2023-10-24

# 確率的オウムにも感情はあるか? 感情認識による合成テキストのニューラル検出の改善

Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition ( http://arxiv.org/abs/2310.15904v1 )

ライセンス: Link先を確認

Alan Cowap, Yvette Graham, Jennifer Foster

(参考訳) 生成AIの最近の進歩により、高性能な合成テキスト生成技術に注目が集まっている。こうしたモデルが広く利用可能で使いやすくなったことは、合成テキストを識別できる同等に強力な技術を提供する緊急の必要性を浮き彫りにしている。このことを念頭に、人は感情によって駆動され、書くテキストに感情を符号化しうることを示唆する心理学研究から着想を得た。事前学習された言語モデル(PLM)は、テキストを生成する際にそのような感情的な駆動力を欠くため感情面での欠陥(affective deficit)があり、その結果、人間が書いたテキストに見られるような感情的一貫性を欠く合成テキストを生成する可能性があると仮定する。そこで、PLMを感情について微調整することで、感情を認識する検出器を開発した。実験結果から、感情認識型検出器は、様々な合成テキスト生成器、様々なサイズのモデル、データセット、ドメインにわたって改善を達成することが示された。最後に、感情認識型合成テキスト検出器を、ChatGPT自身の出力を識別するタスクにおいてChatGPTと比較して大幅な改善を示し、合成テキストを識別するための信号としての感情の可能性を裏付ける。コード、モデル、データセットはhttps://github.com/alanagiasi/emoPLMsynthで入手できる。

Recent developments in generative AI have shone a spotlight on high-performance synthetic text generation technologies. The now wide availability and ease of use of such models highlights the urgent need to provide equally powerful technologies capable of identifying synthetic text. With this in mind, we draw inspiration from psychological studies which suggest that people can be driven by emotion and encode emotion in the text they compose. We hypothesize that pretrained language models (PLMs) have an affective deficit because they lack such an emotional driver when generating text and consequently may generate synthetic text which has affective incoherence i.e. lacking the kind of emotional coherence present in human-authored text. We subsequently develop an emotionally aware detector by fine-tuning a PLM on emotion. Experiment results indicate that our emotionally-aware detector achieves improvements across a range of synthetic text generators, various sized models, datasets, and domains. Finally, we compare our emotionally-aware synthetic text detector to ChatGPT in the task of identification of its own output and show substantial gains, reinforcing the potential of emotion as a signal to identify synthetic text. Code, models, and datasets are available at https://github.com/alanagiasi/emoPLMsynth

翻訳日:2023-10-25 18:21:31 公開日:2023-10-24

# Pick-all-label損失を用いたマルチラベル学習におけるニューラル崩壊

Neural Collapse in Multi-label Learning with Pick-all-label Loss ( http://arxiv.org/abs/2310.15903v1 )

ライセンス: Link先を確認

Pengyu Li, Yutong Wang, Xiao Li, Qing Qu

(参考訳) マルチラベル分類(MLab)タスクのためのディープニューラルネットワークを、ニューラル崩壊(NC)の観点から研究する。先行研究はマルチクラス分類の設定に限定されており、最終層の特徴について以下の性質からなるNC現象が広く見られることを発見している。 (i)各クラス内の特徴の変動性はゼロに崩壊する。 (ii)特徴の平均の集合は等角タイトフレーム(ETF)を形成する。 (iii)最終層の分類器は、あるスケーリングのもとで特徴平均に崩壊する。本研究ではこれをマルチラベル学習に一般化し、一般化されたNC現象が「pick-all-label」の定式化で成立することを初めて証明する。非拘束特徴モデル(unconstrained feature model: UFM)の自然な類似物のもとで、pick-all-labelクロスエントロピー損失の唯一の大域的分類器は同じETF幾何を示し、さらに多重度1の特徴クラス平均に崩壊することを示す。加えて、複数のラベルを持つサンプルの特徴クラス平均が、単一ラベルのタグの特徴クラス平均のスケールされた平均となるという、マルチラベル学習に特有の一般化NCの組合せ的性質を発見し、これを「tag-wise average」性質と呼ぶ。理論面では、UFMに対するpick-all-labelクロスエントロピーリスクの大域的最適性に関する結果を確立する。さらに、マルチラベルデータセット上でのディープニューラルネットワークの訓練に関する本研究の検討を裏付ける実証的証拠も示し、訓練効率の向上につながることを示す。

We study deep neural networks for the multi-label classification (MLab) task through the lens of neural collapse (NC). Previous works have been restricted to the multi-class classification setting and discovered a prevalent NC phenomenon comprising the following properties for the last-layer features: (i) the variability of features within every class collapses to zero, (ii) the set of feature means forms an equi-angular tight frame (ETF), and (iii) the last-layer classifiers collapse to the feature means upon some scaling. We generalize the study to multi-label learning, and prove for the first time that a generalized NC phenomenon holds with the "pick-all-label" formulation. Under the natural analog of the unconstrained feature model (UFM), we establish that the only global classifier of the pick-all-label cross-entropy loss displays the same ETF geometry, which further collapses to multiplicity-1 feature class means. Besides, we discover a combinatorial property in generalized NC that is unique to multi-label learning, which we call the "tag-wise average" property, where the feature class means of samples with multiple labels are scaled averages of the feature class means of single-label tags. Theoretically, we establish a global optimality result for the pick-all-label cross-entropy risk for the UFM. Additionally, we provide empirical evidence to support our investigation into training deep neural networks on multi-label datasets, resulting in improved training efficiency.
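
The abstract does not spell out the pick-all-label loss. One common reading, assumed here, is that a multi-label sample is reduced to one multi-class cross-entropy term per positive label, with all terms sharing a single softmax over the logits. The sketch below follows that assumption; the function names are hypothetical and the paper's exact normalization may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def pick_all_label_loss(logits, labels):
    """Sum of multi-class cross-entropy terms, one per positive label.

    logits: list of class scores for one sample.
    labels: iterable of class indices that are positive for this sample.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs[k] for k in labels)
```

For a sample with exactly one positive label this reduces to the ordinary cross-entropy loss, which is why multi-class neural collapse results are a natural starting point for the multi-label analysis.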

翻訳日:2023-10-25 18:21:09 公開日:2023-10-24

# YOLO-Angio: 冠動脈解剖セグメンテーションのためのアルゴリズム

YOLO-Angio: An Algorithm for Coronary Anatomy Segmentation ( http://arxiv.org/abs/2310.15898v1 )

ライセンス: Link先を確認

Tom Liu, Hui Lin, Aggelos K. Katsaggelos, Adrienne Kline

(参考訳) 冠動脈造影は、世界で最も多い死因である冠動脈疾患の診断におけるゴールドスタンダードであり続けている。この手技は年間200万回以上実施されているが、疾患の自動計測と冠動脈解剖の位置特定を迅速かつ正確に行う方法はまだ少ない。本稿では、MICCAI 2023で開催されたARCADE(Automatic Region-based Coronary Artery Disease diagnostics using X-ray angiography images)チャレンジに対する我々のソリューションを示す。動脈セグメンテーションタスクに対する我々の3段階のアプローチでは、古典的なコンピュータビジョンによる前処理と特徴選択で血管のコントラストを高め、続いてYOLOv8に基づくアンサンブルモデルで血管マップを生成して血管候補を提案する。最終的なセグメンテーションは、グラフベースのソート手法で冠動脈樹を再構成する論理ベースのアプローチに基づく。ARCADEチャレンジへの我々のエントリーは総合3位であった。公式の評価指標を用いて、検証セットとホールドアウトセットでそれぞれ0.422と0.4289のF1スコアを達成した。

Coronary angiography remains the gold standard for diagnosis of coronary artery disease, the most common cause of death worldwide. While this procedure is performed more than 2 million times annually, there remain few methods for fast and accurate automated measurement of disease and localization of coronary anatomy. Here, we present our solution to the Automatic Region-based Coronary Artery Disease diagnostics using X-ray angiography images (ARCADE) challenge held at MICCAI 2023. For the artery segmentation task, our three-stage approach combines preprocessing and feature selection by classical computer vision to enhance vessel contrast, followed by an ensemble model based on YOLOv8 to propose possible vessel candidates by generating a vessel map. A final segmentation is based on a logic-based approach to reconstruct the coronary tree in a graph-based sorting method. Our entry to the ARCADE challenge placed 3rd overall. Using the official metric for evaluation, we achieved an F1 score of 0.422 and 0.4289 on the validation and hold-out sets respectively.

翻訳日:2023-10-25 18:20:45 公開日:2023-10-24

# BianQue:ChatGPTによるマルチターンヘルス会話による健康LLMの質問と提案能力のバランス

BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT ( http://arxiv.org/abs/2310.15896v1 )

ライセンス: Link先を確認

Yirong Chen, Zhenyu Wang, Xiaofen Xing, huimin zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu

(参考訳) 大規模言語モデル(LLM)は、ChatGPT、ChatGLM、ChatDoctor、DoctorGLMなどのシステムに代表されるように、シングルターンの会話で一般的かつ広範な健康上の提案を行う点で優れた性能を示している。しかし、シングルターンでユーザが提供する情報は限られているため、生成される提案のパーソナライズと的確さが不十分になり、ユーザは有用な部分を自分で選び出す必要がある。これは主に、マルチターンの質問を行う能力の欠如に起因する。現実の医療相談では、医師は通常、一連の反復的な問診によって患者の状態を十分に把握し、その後に効果的でパーソナライズされた提案を行う。これはLLMにおける質問の連鎖(chain of questioning: CoQ)と定義できる。 LLMのCoQを改善するために、ChatGPTによって磨き上げられた複数ターンの質問と健康上の提案からなる自己構築の健康会話データセットBianQueCorpusで微調整した、ChatGLMベースのLLMであるBianQueを提案する。実験の結果、提案するBianQueは質問と健康上の提案の両方の能力を同時にバランスよく備えており、プロアクティブヘルス分野におけるLLMの研究と応用の促進に役立つことが示された。

Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, and DoctorGLM. However, the limited information provided by users during a single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independently select the useful part. This is mainly caused by the missing ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, enabling them to provide effective and personalized suggestions subsequently, which can be defined as chain of questioning (CoQ) for LLMs. To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned with the self-constructed health conversation dataset BianQueCorpus, which consists of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.

翻訳日:2023-10-25 18:20:27 公開日:2023-10-24

# 異種データ上の非集中型深層学習のためのクロスフィーチャー対照損失

Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data ( http://arxiv.org/abs/2310.15890v1 )

ライセンス: Link先を確認

Sai Aparna Aketi and Kaushik Roy

(参考訳) 現在の最先端の非集中型学習アルゴリズムの多くは、データ分布が独立同分布(IID)であると仮定している。しかし、実際のシナリオでは、分散したデータセットの分布はエージェント間で著しく異質になりうる。本研究では、異種データ上の非集中型学習のための新しいアプローチを提案する。このアプローチでは、クロスフィーチャーに対する対照損失を通じたデータフリーの知識蒸留を利用して性能を向上させる。隣接するエージェントの対に対するクロスフィーチャーとは、一方のエージェントのデータから、他方のエージェントのモデルパラメータを用いて得られる特徴(すなわち最後の隠れ層の活性化)である。様々なコンピュータビジョンデータセット(CIFAR-10、CIFAR-100、Fashion MNIST、ImageNet)、モデルアーキテクチャ、ネットワークトポロジ上での徹底的な実験により、提案手法の有効性を示す。実験の結果、提案手法は異種データ上の非集中型学習のための既存手法と比べて優れた性能(テスト精度で0.2～4%の向上)を達成することが示された。

The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (i.e., last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.

翻訳日:2023-10-25 18:20:02 公開日:2023-10-24

# 表現学習のためのフーリエ変換による状態系列予測

State Sequences Prediction via Fourier Transform for Representation Learning ( http://arxiv.org/abs/2310.15888v1 )

ライセンス: Link先を確認

Mingxuan Ye, Yufei Kuang, Jie Wang, Rui Yang, Wengang Zhou, Houqiang Li, Feng Wu

(参考訳) 深層強化学習(RL)は複雑な制御タスクの解決に有効であることが実証されているが、高い性能を得るには大量のデータが必要であるため、サンプル効率は依然として重要な課題である。既存の研究は、長期的な将来の状態を予測して予測的表現を学習するなど、データ効率の良いRLのための表現学習の応用を探求している。しかし、多くの既存手法は、逐次的な状態信号に内在する構造情報を十分に活用していない。この構造情報は長期的な意思決定の質を向上させうるが、時間領域では識別が難しい。この問題に対処するために、状態系列の周波数領域を利用して時系列データの基礎となるパターンを抽出し、表現力の高い表現を効率的に学習する新しい手法、フーリエ変換による状態系列予測(SPF)を提案する。具体的には、政策性能および信号の規則性と密接に関係する構造情報が状態系列に存在することを理論的に解析し、その情報を抽出するために無限ステップの将来の状態系列のフーリエ変換を予測することを提案する。 SPFの魅力の1つは、予測対象として無限ステップの将来の状態を保存する必要がなく、実装が簡単であることである。実験により、提案手法がサンプル効率と性能の両面で複数の最先端アルゴリズムを上回ることを示す。

While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement while not requiring storage of infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
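
As a rough illustration of why the frequency domain can expose structure that is hard to see in the time domain, the sketch below computes a plain discrete Fourier transform of a short scalar state sequence. This is not the SPF objective itself, which the abstract describes as predicting the Fourier transform of infinite-step future state sequences; the function name `dft` and the toy sequence are assumptions made here for illustration.

```python
import cmath


def dft(sequence):
    """Discrete Fourier transform of a finite real or complex sequence."""
    n = len(sequence)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(sequence))
        for k in range(n)
    ]


# A toy (hypothetical) scalar state signal that repeats every two steps.
spectrum = dft([1.0, 0.0, 1.0, 0.0])
# Rounded magnitudes of the frequency components.
magnitudes = [round(abs(c), 6) for c in spectrum]
```

For this toy signal the energy sits in only two frequency bins (the constant component and the period-two component), which is the kind of regularity a frequency-domain target can make explicit.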

翻訳日:2023-10-25 18:19:45 公開日:2023-10-24

# AdaptiX - 補助ロボットにおける共有制御アプリケーションの開発と評価のための遷移型XRフレームワーク

AdaptiX -- A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive Robotics ( http://arxiv.org/abs/2310.15887v1 )

ライセンス: Link先を確認

Max Pascher and Felix Ferdinand Goldau and Kirill Kronhardt and Udo Frese and Jens Gerken

(参考訳) 移動性障害のある人々に力を与える努力と、一般市民による技術受容の増加により、協調ロボットアームなどの補助技術が人気を集めている。しかし、その広範な成功はユーザビリティの問題、特にユーザ入力とソフトウェアコントロールの相違によって制限されている。これを解決するために、共有制御の概念は、目標とするユーザ自律性と特定のレベルのコンピュータ支援を組み合わせる機会を提供する。本稿では,高分解能シミュレーション環境における共有制御アプリケーションの開発と評価のための,オープンソースのadaptix xrフレームワークを提案する。初期のフレームワークは、仮想現実感(VR)の例を含むシミュレーションされたロボットアーム、複数の標準制御インタフェース、特殊な記録/再生システムで構成されている。 AdaptiXは特定の研究ニーズに対して容易に拡張することができ、人間のロボットインタラクション(HRI)研究者は、アイデア、プロトタイピング、評価の初期段階で実際の物理的なロボットアームを必要とすることなく、新しいインタラクション方法、介入戦略、マルチモーダルフィードバックテクニックを迅速に設計、テストすることができる。また、ロボット・オペレーティング・システム(ROS)の統合により、シミュレーションと現実のギャップをなくすことなく、実際のロボットアームをPhysicalTwinアプローチで制御することができる。本稿では,adaptixの機能と限界を詳細に検討し,そのフレームワークに基づく3つの研究成果を示す。 AdaptiXはhttps://adaptix.robot-research.deでアクセスできる。

With the ongoing efforts to empower people with mobility impairments and the increase in technological acceptance by the general public, assistive technologies, such as collaborative robotic arms, are gaining popularity. Yet, their widespread success is limited by usability issues, specifically the disparity between user input and software control along the autonomy continuum. To address this, shared control concepts provide opportunities to combine the targeted increase of user autonomy with a certain level of computer assistance. This paper presents the free and open-source AdaptiX XR framework for developing and evaluating shared control applications in a high-resolution simulation environment. The initial framework consists of a simulated robotic arm with an example scenario in Virtual Reality (VR), multiple standard control interfaces, and a specialized recording/replay system. AdaptiX can easily be extended for specific research needs, allowing Human-Robot Interaction (HRI) researchers to rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques, without requiring an actual physical robotic arm during the early phases of ideation, prototyping, and evaluation. Also, a Robot Operating System (ROS) integration enables the controlling of a real robotic arm in a PhysicalTwin approach without any simulation-reality gap. Here, we review the capabilities and limitations of AdaptiX in detail and present three bodies of research based on the framework. AdaptiX can be accessed at https://adaptix.robot-research.de.

翻訳日:2023-10-25 18:19:25 公開日:2023-10-24

# KirchhoffNet: メッセージパッシングと継続的深度モデルによる回路ブリッジ

KirchhoffNet: A Circuit Bridging Message Passing and Continuous-Depth Models ( http://arxiv.org/abs/2310.15872v1 )

ライセンス: Link先を確認

Zhengqi Gao, Fan-Keng Sun, Duane S. Boning

(参考訳) 本稿では,アナログ電子回路の基本原理であるキルヒホッフの電流則を利用して,KirchhoffNetと呼ぶ独自のニューラルネットワークモデルのクラスを導入する。 KirchhoffNetは、メッセージパッシングニューラルネットワークおよび連続深度ネットワークと密接な関係を持つ。従来の層(畳み込み、プーリング、線形層など)が一切存在しない場合でも、KirchhoffNetはMNISTデータセットで98.86%のテスト精度を達成し、最先端(SOTA)の結果に匹敵することを示す。 KirchhoffNetをさらに興味深いものにしているのは、ハードウェア分野におけるその可能性である。現代のディープニューラルネットワークは通常GPU上にデプロイされる。対照的に、KirchhoffNetはアナログ電子回路によって物理的に実現することができる。さらに、KirchhoffNet内のパラメータの数に関係なく、その順方向計算は常に1/f秒以内に完了できることを示す。ここでfはハードウェアのクロック周波数を表す。この特徴は、超大規模ニューラルネットワークを実装するための有望な技術をもたらす。

In this paper, we exploit a fundamental principle of analog electronic circuitry, Kirchhoff's current law, to introduce a unique class of neural network models that we refer to as KirchhoffNet. KirchhoffNet establishes close connections with message passing neural networks and continuous-depth networks. We demonstrate that even in the absence of any traditional layers (such as convolution, pooling, or linear layers), KirchhoffNet attains 98.86% test accuracy on the MNIST dataset, comparable with state of the art (SOTA) results. What makes KirchhoffNet more intriguing is its potential in the realm of hardware. Contemporary deep neural networks are conventionally deployed on GPUs. In contrast, KirchhoffNet can be physically realized by an analog electronic circuit. Moreover, we justify that irrespective of the number of parameters within a KirchhoffNet, its forward calculation can always be completed within 1/f seconds, with f representing the hardware's clock frequency. This characteristic introduces a promising technology for implementing ultra-large-scale neural networks.
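
To make the link between Kirchhoff's current law and a continuous-depth (ODE-style) model more tangible, here is a minimal sketch that integrates node voltages in a small circuit. It assumes each node has a capacitor to ground and that nodes are coupled by fixed linear conductances, so that C dv_i/dt equals the net current flowing into node i. The abstract does not specify the devices used in KirchhoffNet, so the linear conductances, the forward-Euler integrator, and the name `simulate_kcl` are all assumptions for illustration only.

```python
def simulate_kcl(voltages, conductance, capacitance=1.0, dt=0.01, steps=100):
    """Forward-Euler integration of C * dv_i/dt = sum_j g_ij * (v_j - v_i).

    voltages: initial node voltages.
    conductance: square matrix, conductance[i][j] couples node i to node j.
    """
    v = list(voltages)
    n = len(v)
    for _ in range(steps):
        # Net current flowing into each node (Kirchhoff's current law).
        current = [
            sum(conductance[i][j] * (v[j] - v[i]) for j in range(n))
            for i in range(n)
        ]
        v = [v[i] + dt * current[i] / capacitance for i in range(n)]
    return v


# Two nodes joined by a unit conductance: charge flows until voltages equalize.
final = simulate_kcl([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], steps=1000)
```

With a symmetric conductance matrix the total charge is conserved, so the two voltages settle near their common average.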

翻訳日:2023-10-25 18:19:00 公開日:2023-10-24

# 因果認識型グラフニューラルネットワークを用いた動的グラフの時間中心性予測

Using Causality-Aware Graph Neural Networks to Predict Temporal Centralities in Dynamic Graphs ( http://arxiv.org/abs/2310.15865v1 )

ライセンス: Link先を確認

Franziska Heeg, Ingo Scholtes

(参考訳) ノード中心性は、ネットワーク科学、ソーシャルネットワーク分析、レコメンダシステムにおいて重要な役割を果たす。時間的データでは、近接中心性や媒介中心性のような静的な経路に基づく中心性は、時間的グラフにおけるノードの真の重要性について誤解を招く結果をもたらすことがある。この問題に対処するために、ノード対間の最短の時間尊重パス(time-respecting path)に基づく、媒介中心性と近接中心性の時間的一般化が定義されている。しかし、これらの一般化の大きな問題は、そのような経路の計算コストが高いことである。本稿では,因果関係を考慮したグラフニューラルネットワークアーキテクチャであるDe Bruijn Graph Neural Networks(DBGNN)を,時系列データにおける時間経路に基づく中心性の予測に適用することを検討する。本手法を生物系・社会系の13の時間グラフで実験的に評価し,静的なグラフ畳み込みニューラルネットワークと比較して,媒介中心性と近接中心性の両方の予測がかなり改善されることを示す。

Node centralities play a pivotal role in network science, social network analysis, and recommender systems. In temporal data, static path-based centralities like closeness or betweenness can give misleading results about the true importance of nodes in a temporal graph. To address this issue, temporal generalizations of betweenness and closeness have been defined that are based on the shortest time-respecting paths between pairs of nodes. However, a major issue of those generalizations is that the calculation of such paths is computationally expensive. Addressing this issue, we study the application of De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph neural network architecture, to predict temporal path-based centralities in time series data. We experimentally evaluate our approach in 13 temporal graphs from biological and social systems and show that it considerably improves the prediction of both betweenness and closeness centrality compared to a static Graph Convolutional Neural Network.
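
To make the notion of a time-respecting path concrete, the following minimal sketch computes earliest-arrival times from a source node in a temporal graph given as `(u, v, t)` contacts. It only illustrates the path definition and is not the DBGNN method from the paper; the function name `earliest_arrival`, the edge format, and the requirement of strictly increasing timestamps are assumptions made here.

```python
def earliest_arrival(edges, source, start_time=float("-inf")):
    """Earliest arrival time at each node reachable via time-respecting paths.

    edges: iterable of (u, v, t), a directed contact u -> v at time t.
    A path is treated as time-respecting if timestamps strictly increase.
    """
    arrival = {source: start_time}
    # Contacts are scanned in chronological order, so one pass is enough.
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and arrival[u] < t:
            if v not in arrival or t < arrival[v]:
                arrival[v] = t
    return arrival


# Statically there is a path a -> b -> c -> d, but the contact c -> d
# happens before c can be reached, so d is not reachable in time order.
contacts = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1)]
reach = earliest_arrival(contacts, "a")
```

In this toy example a static shortest-path computation would count `d` as reachable from `a`, which is the kind of discrepancy the abstract refers to when it says static centralities can mislead on temporal data.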

翻訳日:2023-10-25 18:18:43 公開日:2023-10-24

# トポロジーを意識した自己教師付きグラフ学習

Topology-aware Debiased Self-supervised Graph Learning for Recommendation ( http://arxiv.org/abs/2310.15858v1 )

ライセンス: Link先を確認

Lei Han and Hui Yan and Zhicheng Qiao

(参考訳) グラフベースの協調フィルタリング(CF)手法は、グラフコントラスト学習(GCL)を導入することにより、データの分散を緩和する。しかし、これらのGCLベースのCFモデルにおけるランダムなネガティブサンプリング戦略は、偽陰性(アンカーユーザ(item)に類似した負)を導入するだけでなく、潜在的正のサンプルも無視するユーザ(items)のセマンティック構造を無視している。上記の課題に対処するために,ユーザ間のセマンティックな類似性に応じてコントラッシブなペアを構成するTDSGL(Topology-aware Debiased Self-supervised Graph Learning)を提案する。具体的には,ユーザとユーザ間のインタラクションデータには,ユーザの購買意図やアイテムの特徴が反映されているため,インタラクションデータ上でユーザ間のセマンティックな類似性を計算する。そして、ユーザ(item)が与えられた場合、ユーザ(item)を選択して、そのユーザ(item)とその負のセマンティックな違いを確実にするために、異なるセマンティックな構造を組み込む。さらに、ユーザ(item)に対して、他の意味的に類似したユーザ(item)を補助的なポジティブなサンプルに変換する機能抽出モジュールを設計し、より有益な表現を得る。実験結果から,提案モデルは3つの公開データセット上で,最先端モデルよりも優れていた。私たちのモデル実装コードはhttps://github.com/malajikuai/tdsglで利用可能です。

In recommendation, graph-based Collaborative Filtering (CF) methods mitigate the data sparsity by introducing Graph Contrastive Learning (GCL). However, the random negative sampling strategy in these GCL-based CF models neglects the semantic structure of users (items), which not only introduces false negatives (negatives that are similar to anchor user (item)) but also ignores the potential positive samples. To tackle the above issues, we propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation, which constructs contrastive pairs according to the semantic similarity between users (items). Specifically, since the original user-item interaction data commendably reflects the purchasing intent of users and certain characteristics of items, we calculate the semantic similarity between users (items) on interaction data. Then, given a user (item), we construct its negative pairs by selecting users (items) which embed different semantic structures to ensure the semantic difference between the given user (item) and its negatives. Moreover, for a user (item), we design a feature extraction module that converts other semantically similar users (items) into an auxiliary positive sample to acquire a more informative representation. Experimental results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets. Our model implementation codes are available at https://github.com/malajikuai/TDSGL.

翻訳日:2023-10-25 18:18:26 公開日:2023-10-24

# オンラインロバスト平均推定

Online Robust Mean Estimation ( http://arxiv.org/abs/2310.15932v1 )

ライセンス: Link先を確認

Daniel M. Kane and Ilias Diakonikolas and Hanshen Xiao and Sihan Liu

(参考訳) オンライン環境における高次元ロバスト平均推定問題について検討する。具体的には、$n$個のセンサーが共通の進行中の現象を計測するシナリオを考える。各時間ステップ $t=1,2,\ldots,T$ において、$i$番目のセンサーはそのステップの読み取り値 $x^{(i)}_t$ を報告する。するとアルゴリズムは、時刻 $t$ におけるプロセスの真の平均値に対する推定値 $\mu_t$ を確定しなければならない。センサーの大部分はある共通の分布 $X$ からの独立なサンプルを観測していると仮定するが、そのうち $\epsilon$ の割合は悪意ある振る舞いをする可能性がある。アルゴリズムは、真の平均 $\mu^\ast := \mathbf{E}[X]$ に対するよい近似 $\mu$ を計算したい。アルゴリズムが時刻 $T$ まで待って推定値を報告できるならば、これはよく研究されたロバスト平均推定問題に帰着することに留意する。しかし,データの到着に伴って部分的な推定値を出力しなければならないという要件により,状況はかなり複雑になる。このモデルにおけるオンラインロバスト平均推定について2つの主な結果を示す。まず、汚染されていないサンプルが $(\epsilon,\delta)$-stability の標準条件を満たす場合、推定値 $\mu_t$, $t \in [T]$ を出力する効率的なオンラインアルゴリズムを与え、高い確率で $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$ が成り立つ。ここで $\mu = (\mu_t)_{t \in [T]}$ である。この誤差限界は、$\ell_2$誤差 $O(\delta)$ を達成する最良のオフラインアルゴリズムとほぼ競合する。 2つ目の主な結果は、入力に対する追加の仮定(特に $X$ が積分布であること)の下で、誤差が $T$ に全く依存しない非効率なアルゴリズムが存在することを示す。

We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T],$ such that with high probability it holds that $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$, where $\mu = (\mu_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
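
The online setting described above can be sketched with a deliberately simple one-dimensional baseline: at every time step the estimator must commit to a value using only the readings seen so far. The code below takes the median over per-sensor running averages. It is not the stability-based high-dimensional algorithm from the paper; the function name `online_median_estimates` and the scalar-reading format are assumptions for illustration.

```python
from statistics import median


def online_median_estimates(readings):
    """Yield one committed estimate per time step.

    readings[t][i] is the scalar reading of sensor i at step t.
    Each estimate uses only the data observed up to and including step t.
    """
    n = len(readings[0])
    running_sum = [0.0] * n
    for t, step in enumerate(readings, start=1):
        for i, x in enumerate(step):
            running_sum[i] += x
        # Median over per-sensor running averages: as long as fewer than
        # half of the sensors are corrupted, it stays within the range of
        # the honest sensors' averages.
        yield median(s / t for s in running_sum)


# Two honest sensors report 1.0; the third reports arbitrary values.
estimates = list(online_median_estimates([[1.0, 1.0, 100.0], [1.0, 1.0, -50.0]]))
```

A baseline like this does not scale gracefully to high dimensions, which is part of what makes the high-dimensional online problem studied in the paper non-trivial.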

翻訳日:2023-10-25 18:10:31 公開日:2023-10-24

# E-Sparse:エントロピーベースのN:Mスパリティによる大規模言語モデル推論の強化

E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity ( http://arxiv.org/abs/2310.15929v1 )

ライセンス: Link先を確認

Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang

(参考訳) 従来のプルーニング手法は、その耐え難いトレーニングプロセスと大きな計算要求のために、生成型aiのために大きな言語モデル(llm)で作業することが難しいことが知られている。 LLMにおけるN:M間隔の精度を向上させるため,隠れ状態特徴の情報エントロピーをプルーニング計量設計(E-Sparse)に導入した。 e-sparseは、チャネルの重要性を活用するために情報豊かさを利用し、さらに、(1)パラメータ重みと入力特徴ノルムの重要度を高めるために情報エントロピーを導入し、残りの重みを変更せずにn:mスパーシティを実行するという、いくつかの新しい手法を取り入れている。 2) グローバルなナイーブシャッフルとローカルブロックシャッフルを設計し,情報配信を迅速に最適化し,N:M空間がLLMの精度に与える影響を適切に対処する。 E-SparseはFasterTransformer上のSparse-GEMMとして実装され、NVIDIA Ampere GPU上で動作する。 LLaMAファミリーとOPTモデルの大規模な実験により、E-Sparseは高密度モデル(最大1.53X)よりもモデル推論を著しく高速化し、大きなメモリ節約(最大43.52%)を得ることができ、精度の低下を許容できることが示された。

Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
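
For readers unfamiliar with N:M sparsity, the following sketch shows the structural constraint: in every group of M consecutive weights, at most N are kept and the rest are zeroed, while the kept weights are left unchanged. The scoring function here defaults to plain magnitude; E-Sparse's entropy-based pruning metric is not reproduced, and the name `prune_n_m` and its `score` parameter are hypothetical placeholders.

```python
def prune_n_m(weights, n=2, m=4, score=abs):
    """Zero all but the n highest-scoring entries in each group of m weights.

    The surviving weights are left unmodified.
    """
    if len(weights) % m != 0:
        raise ValueError("length of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n entries with the largest score in this group.
        ranked = sorted(range(m), key=lambda i: score(group[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# 2:4 sparsity on two groups of four weights.
sparse = prune_n_m([0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4])
```

A different importance metric can be tried by passing another callable as `score`; the grouping logic stays the same.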

翻訳日:2023-10-25 18:09:55 公開日:2023-10-24

# コントラスト学習に基づく文エンコーダの暗黙的重み付け

Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words ( http://arxiv.org/abs/2310.15921v1 )

ライセンス: Link先を確認

Hiroto Kurita, Goro Kobayashi, Sho Yokoi, Kentaro Inui

(参考訳) コントラスト損失を用いた微調整の簡単な実践により、文エンコーダの性能は大幅に向上する。モデルがコントラスト学習中に獲得する特徴は何か? 本稿では,コントラストに基づく文エンコーダが,情報理論量に基づいて暗黙的に単語重み付けを行うことを理論的に実験的に示す。この理論は、対照的な学習目標の最適値の下限において、単語埋め込みのノルムは、周囲の単語の分布に関連する情報ゲインを反映していると述べている。また、様々なモデル、複数のデータセット、モデルの暗黙の重み付け(Integrated Gradients と SHAP)を測定する2つの方法、情報理論量(情報ゲインと自己情報)を用いて包括的な実験を行う。その結果、対照的な微調整が情報的単語を強調するという実証的な証拠が得られた。

The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
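
One of the two information-theoretic quantities named above, self-information, is easy to compute from corpus counts: a word with probability p(w) carries -log2 p(w) bits. The sketch below estimates it from relative frequencies in a tokenized corpus. It illustrates the quantity only, not the paper's analysis of embedding norms; the function name `self_information` and the toy corpus are assumptions.

```python
import math
from collections import Counter


def self_information(corpus):
    """Return -log2 p(w) per word, with p(w) estimated by relative frequency.

    corpus: iterable of tokenized sentences (lists of words).
    """
    counts = Counter(word for sentence in corpus for word in sentence)
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


# Frequent words carry fewer bits than rare ones.
info = self_information([["the", "cat"], ["the", "dog"]])
```

In this toy corpus the frequent word "the" gets a lower value than "cat" or "dog", matching the intuition that rarer words are more informative.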

翻訳日:2023-10-25 18:09:28 公開日:2023-10-24

# 非ガウス連続変数系を用いた変分量子シミュレーション

Variational quantum simulation using non-Gaussian continuous-variable systems ( http://arxiv.org/abs/2310.15919v1 )

ライセンス: Link先を確認

Paolo Stornati, Antonio Acin, Ulysse Chabaud, Alexandre Dauphin, Valentina Parigi, Federico Centrone

(参考訳) 本研究は、フォトニックハードウェアに触発されたフレームワーク内で連続変数システムを活用することで、量子シミュレーションに新たなアプローチを導入する。第一の焦点は、量子論において生じるような無限次元系に関連するハミルトンの基底状態の静的な性質のシミュレーションである。現状のフォトニクス技術と互換性のある連続可変変分量子固有解器を提案する。 1+1次元のBose-Hubbardモデルの静的特性の研究に適用し、その有効性と実用性を示し、量子物理学における複素問題に対処する連続変数量子シミュレーションの可能性を強調した。

This work introduces a novel approach to quantum simulation by leveraging continuous-variable systems within a photonic hardware-inspired framework. The primary focus is on simulating static properties of the ground state of Hamiltonians associated with infinite-dimensional systems, such as those arising in quantum field theory. We present a continuous-variable variational quantum eigensolver compatible with state-of-the-art photonic technology. We apply it to the study of static properties of the Bose--Hubbard model in 1+1 dimension and demonstrate its effectiveness and practicality, highlighting the potential of continuous-variable quantum simulations in addressing complex problems in quantum physics.

翻訳日:2023-10-25 18:09:11 公開日:2023-10-24

# タスクベクトルを生成するIn-Context Learning

In-Context Learning Creates Task Vectors ( http://arxiv.org/abs/2310.15916v1 )

ライセンス: Link先を確認

Roee Hendel, Mor Geva, Amir Globerson

(参考訳) 大規模言語モデル(LLM)におけるIn-context Learning (ICL) は強力な新しい学習パラダイムとして登場した。しかし、その基礎となるメカニズムはまだよく分かっていない。特に、訓練集合 $S$ を用いてある仮説クラスの中で最もよく適合する関数 $f(x)$ を見つけるという「標準的な」機械学習の枠組みにICLを対応付けることは難しい。ここでは、ICLが学習する関数がしばしば非常に単純な構造を持つことを示すことで、この問題を前進させる。すなわち、それらはクエリ $x$ と訓練集合から計算された単一の「タスクベクトル」のみを入力とするトランスフォーマーLLMに対応する。したがってICLは、$S$ を単一のタスクベクトル $\boldsymbol{\theta}(S)$ に圧縮し、このタスクベクトルを使ってトランスフォーマーを変調して出力を生成するものと見なせる。我々は、様々なモデルとタスクにわたる包括的な実験を通じて、上記の主張を裏付ける。

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

翻訳日:2023-10-25 18:09:00 公開日:2023-10-24

# 個人ReIDの一般化のためのプライマリ補助目的協会による緩和ドメインシフト

Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID ( http://arxiv.org/abs/2310.15913v1 )

ライセンス: Link先を確認

Qilei Li, Shaogang Gong

(参考訳) 深層学習は、独立同分布(IID)の仮定の下でReIDモデルの精度を著しく向上させたが、予測不能/未知のドメインシフトにより、未知の新規ドメインに適用した場合に顕著に性能が低下することも明らかになっている。現代のドメイン一般化(DG) ReIDモデルは、インスタンス分類目的の訓練のみを通じてドメイン不変表現を学習することに苦労している。我々は、深層学習モデルが背景クラッタ、スケール、視点の変化などのドメイン固有の特徴に強く影響され偏っており、それが学習モデルの一般化可能性を制限していると考え、歩行者は同じ構造特性を共有するためドメイン不変であるという仮説を立てる。そこで本稿では,ReIDモデルのドメイン固有性を低減するため,弱ラベル付き歩行者顕著性(saliency)検出という同時補助学習目的によって,主たるReIDインスタンス分類目的のモデル学習を誘導する手法を提案する。 2つの学習目的間のモデルパラメータ空間における相反する最適化基準の問題を解決するために、補助タスクの損失勾配を主学習タスクの勾配に向けて校正するPrimary-Auxiliary Objectives Association(PAOA)機構を導入する。この調和したマルチタスク学習設計により,本モデルは最近のテスト時(test-time)の枠組みで拡張してPAOA+とすることができ,テスト対象ドメインにおけるモデルの生成能力を最大化するために,補助目的に対してオンザフライ最適化を行う。実験により提案するPAOAモデルの優位性を示す。

While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle to learn domain-invariant representations solely through training on an instance classification objective. We consider that a deep learning model is heavily influenced and therefore biased towards domain-specific characteristics, e.g., background clutter, scale and viewpoint variations, limiting the generalizability of the learned model, and hypothesize that the pedestrians are domain invariant owing to the fact that they share the same structural characteristics. To enable the ReID model to be less domain-specific from these pure pedestrians, we introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, we introduce a Primary-Auxiliary Objectives Association (PAOA) mechanism to calibrate the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective in order to maximize the model's generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.

翻訳日:2023-10-25 18:08:42 公開日:2023-10-24

# 気候変動が農地の適性に及ぼす影響--機械学習を用いたユーラシアのケーススタディ

Climate Change Impact on Agricultural Land Suitability: An Interpretable Machine Learning-Based Eurasia Case Study ( http://arxiv.org/abs/2310.15912v1 )

ライセンス: Link先を確認

Valeriy Shevchenko, Daria Taniushkina, Aleksander Lukashevich, Aleksandr Bulkin, Roland Grinis, Kirill Kovalev, Veronika Narozhnaia, Nazar Sotiriadi, Alexander Krenke, Yury Maximov

(参考訳) 国連は食料安全保障の改善と飢餓の削減を持続可能な開発目標の重要な要素としている。 2021年現在、世界中で約8億2800万人が飢餓と栄養失調を経験しており、多くの死者が報告されている。気候変動は農地の適性に大きな影響を及ぼし、深刻な食糧不足とその後の社会と政治の対立に繋がる可能性がある。この差し迫った問題に対処するため,我々は,相当な土地適合性低下のリスクと灌漑パターンの変化を予測する機械学習ベースのアプローチを開発した。本研究は、経済・社会問題に苦しむ中央ユーラシアに焦点をあてた。本研究は、気候変動が様々な炭素排出シナリオにおける農地の適性に与える影響を評価するために、機械学習を用いた先駆的な取り組みである。包括的特徴重要度分析を通じて、土地適合性に影響を及ぼす特定の気候・地形特性を明らかにする。提案手法は,人道的危機を回避することを目的とした情報的意思決定を促進するための,政策立案者にとって貴重な知見を提供する。この研究は、飢餓と栄養失調の緩和に特に重点を置いて、グローバルな課題に取り組む機械学習の膨大な可能性を強調している。

The United Nations has identified improving food security and reducing hunger as essential components of its sustainable development goals. As of 2021, approximately 828 million people worldwide are experiencing hunger and malnutrition, with numerous fatalities reported. Climate change significantly impacts agricultural land suitability, potentially leading to severe food shortages and subsequent social and political conflicts. To address this pressing issue, we have developed a machine learning-based approach to predict the risk of substantial land suitability degradation and changes in irrigation patterns. Our study focuses on Central Eurasia, a region burdened with economic and social challenges. This study represents a pioneering effort in utilizing machine learning methods to assess the impact of climate change on agricultural land suitability under various carbon emissions scenarios. Through comprehensive feature importance analysis, we unveil specific climate and terrain characteristics that exert influence on land suitability. Our approach achieves remarkable accuracy, offering policymakers invaluable insights to facilitate informed decisions aimed at averting a humanitarian crisis, including strategies such as the provision of additional water and fertilizers. This research underscores the tremendous potential of machine learning in addressing global challenges, with a particular emphasis on mitigating hunger and malnutrition.

翻訳日:2023-10-25 18:08:15 公開日:2023-10-24

# 言語モデルにおけるファクチュアルリコールのメカニズム

Characterizing Mechanisms for Factual Recall in Language Models ( http://arxiv.org/abs/2310.15910v1 )

ライセンス: Link先を確認

Qinan Yu, Jack Merullo, Ellie Pavlick

(参考訳) 言語モデル(LM)は、しばしば、特定の文脈に現れる新しい情報と事前トレーニングで記憶した事実を統合する必要がある。これら2つの情報源は一致しないことがあり、モデル内での競合を引き起こすが、LMがその競合をどのように解決するかは不明である。本研究では,世界の首都に関する知識を問うデータセットについて,そのような状況下でのLMの挙動の分布的および機械的決定要因について検討する。具体的には、LMが反事実的な接頭辞(例えば「ポーランドの首都はロンドン」)を使用して、事前訓練で学んだこと("Warsaw")を上書きする時間の割合を測定する。 Pythia と GPT2 では、クエリ国 (Poland) とコンテキスト内の都市 (London) の両方のトレーニング頻度が、モデルが反事実を使用する可能性に大きく影響している。次に、ヘッド帰属(head attribution)を用いて、ロジットにおいて記憶された回答または文脈内回答のいずれかを促進する個別の注意ヘッドを特定する。これらのヘッドの値ベクトルをスケールアップまたはダウンすることで、新しいデータにコンテキスト内応答を使用する可能性を制御できる。この方法は、実行時に1つのヘッドをスケールするだけで、コンテキスト内応答を生成する割合を88%に増やすことができる。私たちの研究は、モデル動作を特定のコンポーネントにローカライズできることを示す一連の証拠に貢献し、将来の手法が実行時にモデル動作を動的に制御する方法の概念実証を提供する。

Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") highly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that either promote the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.

翻訳日:2023-10-25 18:07:54 公開日:2023-10-24

# 仮想バンド遷移による超高速光変調

Ultrafast Optical Modulation by Virtual Interband Transitions ( http://arxiv.org/abs/2310.15908v1 )

ライセンス: Link先を確認

Evgenii E. Narimanov

(参考訳) 光学研究の新しいフロンティアは、時間と空間の両方における非摂動光学変調の最近の発展によって開かれ、媒体中の光の「時間反射」と「時間反射」を生成する時間境界を生み出している。変調された光学材料におけるフォトニック時間結晶の形成は、非共鳴光増幅や調整可能なラシングから新しい量子光-マター相互作用の体制に至るまで、実用的な応用の可能性を持つ幅広い新しい現象をもたらす。しかし、光の時間境界の形成は、単一の光サイクルの時間スケールにおいてさえ強く高速な屈折率の光変調に依存している。これら2つの問題は、独立して取り組んだ場合でも非常に困難であり、既存の全ての光変調の方法の相反する要件に繋がる。しかし、本研究で示すように、仮想バンド間遷移励起に基づく代替アプローチは、この不可解な問題を解決している。基本的には散逸のない仮想励起による光変調は、材料の熱蓄積と散逸の問題に直面することはないが、単一の光サイクルの時間スケールでしか材料の応答を変調しない励起仮想集団の過渡的な性質は、屈折率の変化が本質的に超高速であることを保証する。本稿では,提案手法の理論的記述を開発し,既存の光学材料や技術を用いて容易に実装できることを実証する。

A new frontier in optics research has been opened by the recent developments in non-perturbative optical modulation in both time and space that creates temporal boundaries generating "time-reflection" and "time-refraction" of light in the medium. The resulting formation of a Photonic Time Crystal within the modulated optical material leads to a broad range of new phenomena with a potential for practical applications, from non-resonant light amplification and tunable lasing, to the new regime of quantum light-matter interactions. However, the formation of the temporal boundary for light relies on optical modulation of the refractive index that is both strong and fast even on the time scale of a single optical cycle. Both of these problems are extremely challenging even when addressed independently, leading to conflicting requirements for all existing methods of optical modulation. However, as we show in the present work, an alternative approach based on virtual interband transition excitation, solves this seemingly insurmountable problem. Being fundamentally dissipation-free, optical modulation by virtual excitation does not face the problem of heat accumulation and dissipation in the material, while the transient nature of the excited virtual population that modifies the material response only on the time scale of a single optical cycle, ensures that the resulting change in the refractive index is inherently ultrafast. Here we develop the theoretical description of the proposed modulation approach, and demonstrate that it can be readily implemented using already existing optical materials and technology.

翻訳日:2023-10-25 18:07:31 公開日:2023-10-24

# 必要なものは全部探すか? 埋め込み空間の探索に代わるインジケータタスク

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces ( http://arxiv.org/abs/2310.15905v1 )

ライセンス: Link先を確認

Tal Levy, Omer Goldman and Reut Tsarfaty

(参考訳) 単語のベクトル表現に符号化された様々な言語情報を識別し、制御する能力は、特に説明可能性やバイアス除去のために多くのユースケースを持つ。これは通常、埋め込み空間に符号化された情報を評価するために、プローブと呼ばれる一連の単純な分類タスクによって行われる。しかし、訓練可能な分類器の関与は、プローブの結果と分類器の性質の間の絡み合いにつながる。その結果、現代の調査作業には補助モデルの訓練を含まないタスクが含まれている。本研究では,ある特性の存在を問合せするための埋め込み空間を問合せするために用いられる非訓練タスクの指示タスクという用語を導入し,このようなタスクがプローブとは反対の方向を指している可能性を示し,この矛盾は埋め込み空間に特性が存在するか否かの決定を複雑化する。実験では, 男女の偏差を扱い, 埋込み空間から形態的情報を消去する2つのテストケースを用いてクレームを実証した。適切なインジケータの適用により,プローブに対して捕獲・削除された情報のより正確な画像が得られることを示す。そこで我々は,組込み表現から情報を引き出す際に,インジケータタスクを実装し,考慮すべきであると結論付けた。

The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal. This is usually done via a set of simple classification tasks, termed probes, to evaluate the information encoded in the embedding space. However, the involvement of a trainable classifier leads to entanglement between the probe's results and the classifier's nature. As a result, contemporary works on probing include tasks that do not involve training of auxiliary models. In this work we introduce the term indicator tasks for non-trainable tasks which are used to query embedding spaces for the existence of certain properties, and claim that this kind of task may point to a direction opposite to probes, and that this contradiction complicates the decision on whether a property exists in an embedding space. We demonstrate our claims with two test cases, one dealing with gender debiasing and another with the erasure of morphological information from embedding spaces. We show that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes. We thus conclude that indicator tasks should be implemented and taken into consideration when eliciting information from embedded representations.

翻訳日:2023-10-25 18:07:05 公開日:2023-10-24

# decoupled DETR: 終端物体検出のための空間的距離化と分類

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection ( http://arxiv.org/abs/2310.15955v1 )

ライセンス: Link先を確認

Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

(参考訳) DETRの導入は、オブジェクト検出の新しいパラダイムである。しかし、そのデコーダは、共有クエリとクロスアテンションレイヤを使って分類とボックスのローカライゼーションを行い、亜最適結果をもたらす。視覚特徴マップに対する関心の異なる領域が、同じオブジェクトであっても、クエリ分類やボックスローカライゼーションタスクの実行に適していることを確認する。サルエント領域は分類に不可欠な情報を提供し、周囲の境界はボックス回帰に有利である。残念ながら、これらの2つのタスク間の空間的不整合は、DETRの訓練を著しく妨げている。そこで本研究では,DETRにおける局所化タスクと分類タスクの分離に着目した。そこで本研究では,タスク認識型クエリ生成モジュールと不整合特徴学習プロセスを含む空間分離型DETR (SD-DETR) と呼ばれる新しい設計手法を提案する。タスク対応クエリの初期化プロセスを精巧に設計し、デコーダ内のクロスアテンションブロックを分割し、タスク対応クエリを異なる視覚領域にマッチさせる。また,高い分類信頼度と正確な位置推定のための予測ミスアライメント問題が存在することを観察し,空間的に分離されたdetrトレーニングをさらに導くためのアライメント損失を提案する。広範にわたる実験により,本手法は過去の研究と比較して,MSCOCOデータセットの大幅な改善を実現していることを示す。例えば、条件付きDETRの性能を4.5 APで改善する。この2つのタスクを空間的に切り離すことで、不整合問題を克服し、オブジェクト検出のためのDETRの性能を大幅に改善する。

The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, such spatial misalignment between these two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling localization and classification tasks in DETR. To achieve this, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We carefully design the task-aware query initialization process and divide the cross-attention block in the decoder to allow the task-aware queries to match different visual regions. Meanwhile, we also observe a misalignment in the predictions between high classification confidence and precise localization, so we propose an alignment loss to further guide the spatially decoupled DETR training. Through extensive experiments, we demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work. For instance, we improve the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.

翻訳日:2023-10-25 18:01:13 公開日:2023-10-24

# 潜在誘導拡散とネストセンブルを用いた医用画像分類におけるロバスト性と信頼性の向上

Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles ( http://arxiv.org/abs/2310.15952v1 )

ライセンス: Link先を確認

Xing Shen, Hengguan Huang, Brennan Nichyporuk, Tal Arbel

(参考訳) 深層学習モデルは、様々な医療画像解析タスクにおいて顕著な成功を収めてきたが、実際の臨床状況におけるこれらのモデルの展開には、取得した画像のばらつきに対して堅牢である必要がある。多くの方法は、トレーニングデータを拡張してテスト時の堅牢性を高めるために事前定義された変換を適用するが、これらの変換は、患者画像に見られる多様な変数に対するモデルの堅牢性を保証するものではない。本稿では,条件付き拡散モデルと組み合わされたトランスフォーマーに基づく新しい3段階アプローチを提案する。この目的のために、複数の画像エンコーダはまず階層的な特徴表現を学習し、識別可能な潜在空間を構築する。次に、潜在コードに導かれる逆拡散過程が、情報的事前に作用し、予測候補を生成的手法で提案する。最後に、予測候補を2レベル集約プロトコルに集約し、最終的な出力を生成する。医用イメージングベンチマークデータセットの広範な実験を通じて,本手法はロバスト性と信頼性のキャリブレーションの観点から最先端の手法により改善されることを示す。さらに, 症例レベルでの予測の不確実性を定量化し, 臨床実習における臨床医への信頼性を高める戦略を導入する。

While deep learning models have achieved remarkable success across a range of medical image analysis tasks, deployment of these models in real clinical contexts requires that they be robust to variability in the acquired images. While many methods apply predefined transformations to augment the training data to enhance test-time robustness, these transformations may not ensure the model's robustness to the diverse variability seen in patient images. In this paper, we introduce a novel three-stage approach based on transformers coupled with conditional diffusion models, with the goal of improving model robustness to the kinds of imaging variability commonly encountered in practice without the need for pre-determined data augmentation strategies. To this end, multiple image encoders first learn hierarchical feature representations to build discriminative latent spaces. Next, a reverse diffusion process, guided by the latent code, acts on an informative prior and proposes prediction candidates in a generative manner. Finally, several prediction candidates are aggregated in a bi-level aggregation protocol to produce the final output. Through extensive experiments on medical imaging benchmark datasets, we show that our method improves upon state-of-the-art methods in terms of robustness and confidence calibration. Additionally, we introduce a strategy to quantify the prediction uncertainty at the instance level, increasing their trustworthiness to clinicians using them in clinical practice.

翻訳日:2023-10-25 18:00:46 公開日:2023-10-24

# 重み付き距離近辺凝縮

Weighted Distance Nearest Neighbor Condensing ( http://arxiv.org/abs/2310.15951v1 )

ライセンス: Link先を確認

Lee-Ad Gottlieb, Timor Sharabi, Roi Weiss

(参考訳) 近隣の凝縮の問題は、その理論的および実践的な側面において、長い研究の歴史を享受してきた。本稿では, 凝縮集合の各点に重みを割り当てる重み付き距離近傍の凝縮問題を紹介し, そして, 凝縮集合内のその重み付き距離近傍の近傍の重み付き距離に基づいて新しい点をラベル付けする。この新しいモデルの理論的性質を考察し,本モデルが最も近い規則よりも劇的に凝縮性が向上することを示すが,その一般化は後者とほぼ同一である。次に、新しい問題に対する凝縮ヒューリスティックを提案する。このヒューリスティックなベイズ一貫性を示し、有望な実証結果も示します。

The problem of nearest neighbor condensing has enjoyed a long history of study, both in its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns weights to each point of the condensed set, and then new points are labeled based on their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
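
The labeling rule described in this abstract can be illustrated with a minimal Python sketch. The abstract does not define how the weights enter the distance, so the convention used here (Euclidean distance divided by the point's weight) is an assumption, and the function name `weighted_nn_label` and the toy data are hypothetical rather than the authors' formulation.

```python
import math

def weighted_nn_label(query, condensed):
    """Label `query` by its weighted-distance nearest neighbor.

    `condensed` is a list of (point, weight, label) triples.
    Assumption: the weighted distance is dist(query, point) / weight,
    so a larger weight enlarges the region a point claims.
    """
    best_label, best_score = None, math.inf
    for point, weight, label in condensed:
        score = math.dist(query, point) / weight
        if score < best_score:
            best_label, best_score = label, score
    return best_label

# Two condensed points; "b" is farther from the query but carries more weight.
condensed = [((0.0, 0.0), 1.0, "a"), ((4.0, 0.0), 3.0, "b")]
print(weighted_nn_label((1.5, 0.0), condensed))
```

With all weights equal, this sketch reduces to the standard nearest neighbor rule, which is why a weighted condensed set may need fewer points to cover the same decision regions.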

翻訳日:2023-10-25 18:00:24 公開日:2023-10-24

# 推薦のための大規模言語モデルによる表現学習

Representation Learning with Large Language Models for Recommendation ( http://arxiv.org/abs/2310.15950v1 )

ライセンス: Link先を確認

Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang

(参考訳) レコメンダシステムは、ディープラーニングとグラフニューラルネットワークの影響、特に複雑なユーザとテーマの関係を捉えることで大きな進歩を遂げている。しかし、これらのグラフベースのレコメンデータは、IDベースのデータに大きく依存しており、ユーザやアイテムに関連する貴重なテキスト情報を無視する可能性がある。さらに、暗黙的なフィードバックデータの利用は潜在的なノイズとバイアスを導入し、ユーザの嗜好学習の有効性に挑戦する。大規模言語モデル(LLM)を従来のIDベースのレコメンダに統合することは注目されているが、スケーラビリティの問題、テキストのみ依存の制限、実用的なレコメンダシステムにおける効果的な実装のためには入力制約に対処する必要がある。これらの課題に対処するため,LLMを用いた表現学習により既存のレコメンデータを強化することを目的としたモデルに依存しないフレームワーク RLMRec を提案する。ユーザ行動や嗜好の複雑な意味的側面を捉えるために,表現学習とLLMを統合したレコメンデーションパラダイムを提案する。 RLMRecには補助的なテキスト信号が組み込まれており、LLMによって強化されたユーザ/イテムプロファイリングのパラダイムが開発されており、LLMのセマンティック空間と協調的な関係信号の表現空間を、クロスビューアライメントフレームワークを通じて整列する。この研究はさらに、相互情報最大化によるテキスト信号の統合が表現の質を高めることを実証する理論的基礎を確立する。本評価では,rlmrecを最先端のレコメンダモデルに統合するとともに,ノイズデータに対する効率性とロバスト性を分析する。実装コードはhttps://github.com/hkuds/rlmrecで利用可能です。

Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover, the utilization of implicit feedback data introduces potential noise and bias, posing challenges for the effectiveness of user preference learning. While the integration of large language models (LLMs) into traditional ID-based recommenders has gained attention, challenges such as scalability issues, limitations in text-only reliance, and prompt input constraints need to be addressed for effective implementation in practical recommender systems. To address these challenges, we propose a model-agnostic framework RLMRec that aims to enhance existing recommenders with LLM-empowered representation learning. It proposes a recommendation paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework. This work further establishes a theoretical foundation demonstrating that incorporating textual signals through mutual information maximization enhances the quality of representations. In our evaluation, we integrate RLMRec with state-of-the-art recommender models, while also analyzing its efficiency and robustness to noisy data. Our implementation code is available at https://github.com/HKUDS/RLMRec.
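
One common way to align two embedding spaces while maximizing a lower bound on mutual information is an InfoNCE-style contrastive loss. The sketch below is a generic illustration of that idea and is not taken from the RLMRec code; the function name `info_nce`, the temperature value, and the toy vectors are assumptions.

```python
import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(cf_embs, llm_embs, temperature=0.2):
    """Mean InfoNCE loss between two views of the same users/items.

    Row i of `cf_embs` (collaborative view) and row i of `llm_embs`
    (text/LLM view) describe the same entity and form the positive pair;
    all other rows of `llm_embs` act as negatives.
    """
    losses = []
    for i, u in enumerate(cf_embs):
        logits = [_cosine(u, v) / temperature for v in llm_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

cf = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(cf, [[1.0, 0.0], [0.0, 1.0]]))  # matching views: small loss
print(info_nce(cf, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched views: large loss
```

Minimizing such a loss pulls each entity's two views together and pushes other entities apart, which is one plausible reading of "cross-view alignment" in the abstract.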

翻訳日:2023-10-25 18:00:12 公開日:2023-10-24

# 多条件拡散モデルを用いた言語駆動シーン合成

Language-driven Scene Synthesis using Multi-conditional Diffusion Model ( http://arxiv.org/abs/2310.15948v1 )

ライセンス: Link先を確認

An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen

(参考訳) シーン合成はいくつかの産業応用において難しい問題である。近年,人間の動きや部屋のレイアウト,空間グラフを入力としてシーンを合成する取り組みが盛んに行われている。しかし、この問題を複数のモダリティ、特にテキストプロンプトを組み合わせることで解決した研究はほとんどない。本稿では,文章のプロンプト,人間の動き,既存のシーン合成用オブジェクトを統合する新しいタスクである,言語駆動型シーン合成タスクを提案する。他の単一条件合成タスクとは異なり、この問題は複数の条件を伴い、それらを統一された空間に処理およびエンコードするための戦略を必要とする。この課題に対処するために、原データ分布の導出点を明示的に予測することにより、他の拡散文学の暗黙の統合アプローチとは異なる多条件拡散モデルを提案する。我々のアプローチは理論的に支持的であることを実証する。集中実験の結果,本手法は最先端ベンチマークよりも優れており,自然なシーン編集アプリケーションを実現する。ソースコードとデータセットはhttps://lang-scene-synth.github.io/でアクセスできる。

Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem using multiple modalities, especially in combination with text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supported. Extensive experimental results show that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.

翻訳日:2023-10-25 17:59:39 公開日:2023-10-24

# ShARc:人物識別のための形状と外観認識

ShARc: Shape and Appearance Recognition for Person Identification In-the-wild ( http://arxiv.org/abs/2310.15946v1 )

ライセンス: Link先を確認

Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

(参考訳) 非拘束的なビデオ設定で個人を特定することは、外見、環境、劣化、および咬合の多様性のため、生体計測分析において有益だが困難なタスクである。本稿では,3次元の身体形状,ポーズ,外観を重視した映像に基づく人物識別のためのマルチモーダル手法であるShARcを提案する。本稿では,PSE(Pose and Shape Encoder)とAAE(Aggregated Appearance Encoder)の2つのエンコーダを紹介する。 pseは2次元シルエット、骨格運動、および3次元体形状を介して体形を符号化し、aaeは注意に基づく特徴集約と平均的なアグリゲーションの2段階の時間的外観特徴集約を提供する。注意に基づく特徴集約では、空間的・時間的注意を個人区別のための重要な領域に向ける。また,アグリゲーションを平均化するために,アグリゲーション後の新しい平ら化層を導入し,より識別可能な情報を抽出し,注目の過度な適合を低減する。ギャラリー登録にはcentroid feature averagingを利用する。我々は、ccvid、mevid、briarなど、パブリックデータセットにおける既存の最先端のメソッドに対する大幅な改善を示す。

Identifying individuals in unconstrained video settings is a valuable yet challenging task in biometric analysis due to variations in appearances, environments, degradations, and occlusions. In this paper, we present ShARc, a multimodal approach for video-based person identification in uncontrolled environments that emphasizes 3-D body shape, pose, and appearance. We introduce two encoders: a Pose and Shape Encoder (PSE) and an Aggregated Appearance Encoder (AAE). PSE encodes the body shape via binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation. For attention-based feature aggregation, we employ spatial and temporal attention to focus on key areas for person distinction. For averaging aggregation, we introduce a novel flattening layer after averaging to extract more distinguishable information and reduce overfitting of attention. We utilize centroid feature averaging for gallery registration. We demonstrate significant improvements over existing state-of-the-art methods on public datasets, including CCVID, MEVID, and BRIAR.
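
The "centroid feature averaging for gallery registration" step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: features are plain lists of floats, matching uses cosine similarity (the abstract does not name the metric), and the names `build_gallery` and `identify` are hypothetical.

```python
import math

def centroid(features):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_gallery(features_by_identity):
    """Register each identity as the centroid of its per-video features."""
    return {pid: centroid(feats) for pid, feats in features_by_identity.items()}

def identify(probe, gallery):
    """Return the gallery identity whose centroid is most similar to the probe."""
    return max(gallery, key=lambda pid: cosine(probe, gallery[pid]))

gallery = build_gallery({
    "p1": [[1.0, 0.0], [0.8, 0.2]],
    "p2": [[0.0, 1.0], [0.2, 0.8]],
})
print(identify([0.7, 0.3], gallery))
```

Averaging several videos of the same person into one centroid keeps the gallery at one vector per identity and smooths out per-video noise.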

翻訳日:2023-10-25 17:59:22 公開日:2023-10-24

# これはデータセットではない: 大きな言語モデルに挑戦する大規模な否定ベンチマーク

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models ( http://arxiv.org/abs/2310.15941v1 )

ライセンス: Link先を確認

Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

(参考訳) 大規模言語モデル(llm)はある種の文法知識と一般化能力を獲得したが、自然言語処理において重要なステップである否定の解釈に失敗している。我々は,LLMが否定を理解する上での最適でない性能の理由を明らかにする。本稿では,コーパスの約2/3に否定が存在する真偽の常識知識に関する記述文約40万文の大規模な半自動生成データセットを,異なる形式で紹介する。我々は,その一般化と推論能力を把握するため,ゼロショットアプローチで利用可能な最大オープンLCMを用いてデータセットを構築し,また,否定の理解をトレーニングできるかどうかを評価するために,いくつかのモデルを微調整した。以上の結果から, LLMは肯定文の分類に長けているが, 否定文に苦慮し, 否定の深い理解が欠如していることが示唆された。否定文のモデルを微調整することで、その性能は向上するが、否定処理における一般化の欠如は持続的であり、否定理解と一般化に関するLLMの継続的な課題を強調している。データセットとコードは公開されている。

Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation is present in about 2/3 of the corpus in different forms. We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability, and we have also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation is persistent, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.

翻訳日:2023-10-25 17:59:00 公開日:2023-10-24

# 動作と後継機能キーボードとの結合

Combining Behaviors with the Successor Features Keyboard ( http://arxiv.org/abs/2310.15940v1 )

ライセンス: Link先を確認

Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran

(参考訳) Option Keyboard (OK) はタスク間での行動知識の伝達方法として提案されている。 OKは、継承的特徴(SF)と一般化政策改善(GPI)を用いて、既知の行動の部分集合を適応的に組み合わせて知識を伝達する。しかし、ハンドデザインされた状態特徴とタスクエンコーディングに依存しており、新しい環境ごとに設計するのは面倒です。本稿では,検出された状態特徴とタスクエンコーディングによる転送を可能にする"successor features keyboard"(sfk)を提案する。そこで我々は,SFを推定する新しい学習アルゴリズムであるCSFA(Categorical Successor Feature Approximator)を提案する。 SFK と CSFA では,必要な表現がすべて発見される困難な3次元環境において,SF との移動を初めて実演する。まず, CSFA と他の SF 近似法を比較し, このスケールで SF&GPI と互換性のある表現を CSFA のみが発見できることを示す。そして、sfkとトランスファー学習のベースラインを比較し、最も高速に長いホリゾンタスクに転送できることを示します。

The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
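
The combination of Successor Features (SFs) and Generalized Policy Improvement (GPI) named in this abstract follows a standard rule: each known policy's action value is the dot product of its successor features with the task vector, and GPI acts greedily over the maximum across policies. The sketch below shows only that selection rule for a single state; it is not CSFA or SFK, and the function name `gpi_action` and the toy numbers are hypothetical.

```python
def gpi_action(successor_features, task_weights):
    """Pick an action by Generalized Policy Improvement.

    successor_features[i][a] is the SF vector psi_i(s, a) of known policy i
    for action a in the current state. With task vector w, the value is
    Q_i(s, a) = psi_i(s, a) . w, and GPI returns argmax_a max_i Q_i(s, a).
    """
    best_action, best_value = None, float("-inf")
    for policy_sfs in successor_features:
        for action, psi in enumerate(policy_sfs):
            value = sum(p * w for p, w in zip(psi, task_weights))
            if value > best_value:
                best_action, best_value = action, value
    return best_action, best_value

# Two known policies, two actions, two-dimensional features.
sfs = [
    [[1.0, 0.0], [0.5, 0.5]],   # policy 0
    [[0.0, 1.0], [0.25, 2.0]],  # policy 1
]
print(gpi_action(sfs, [1.0, 0.5]))
```

Changing only `task_weights` re-evaluates all known behaviors for a new task without relearning the successor features, which is the transfer mechanism the abstract builds on.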

翻訳日:2023-10-25 17:58:39 公開日:2023-10-24

# ABKD:意識に基づく知識蒸留によるグラフニューラルネットワーク圧縮

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation ( http://arxiv.org/abs/2310.15938v1 )

ライセンス: Link先を確認

Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov

(参考訳) グラフニューラルネットワーク(GNN)は、レコメンデーションシステム、偽ニュース検出、薬物発見、コンピュータビジョンなど、さまざまな用途に非常に汎用性があることが証明されている。グラフ構造データのサイズが大きくなるため、GNNモデルも複雑さが増し、重大なレイテンシの問題が発生している。これは主に、グラフデータの不規則な構造とそのメモリへのアクセスパターンに起因する。レイテンシを低減するための自然な解決策は、大きなGNNを小さなGNNに圧縮することだ。この方法の1つは知識蒸留(KD)である。しかしながら、GNNのほとんどのKDアプローチは、最後の層の出力のみを考慮し、GNNの中間層の出力を考慮しない。この問題に対処するため,我々は,Attention-Based Knowledge Distillation (ABKD) と呼ぶ,GNN圧縮に対する新しいKDアプローチを提案する。 ABKDはKDアプローチであり、重要な中学生層を識別し、出力の整合に集中する。 ABKDは既存のKD手法に比べて精度の低いGNNの圧縮を可能にする。グラフデータセットであるOGBN-Magの32.3倍圧縮比を,最先端のアプローチと比較して平均1.79%の精度向上を実現した。

Graph Neural Networks (GNNs) have proven to be quite versatile for a variety of applications, including recommendation systems, fake news detection, drug discovery, and even computer vision. Due to the expanding size of graph-structured data, GNN models have also increased in complexity, leading to substantial latency issues. This is primarily attributed to the irregular structure of graph data and its access pattern into memory. The natural solution to reduce latency is to compress large GNNs into small GNNs. One way to do this is via knowledge distillation (KD). However, most KD approaches for GNNs only consider the outputs of the last layers and do not consider the outputs of the intermediate layers of the GNNs; these layers may contain important inductive biases indicated by the graph structure. To address this shortcoming, we propose a novel KD approach to GNN compression that we call Attention-Based Knowledge Distillation (ABKD). ABKD is a KD approach that uses attention to identify important intermediate teacher-student layer pairs and focuses on aligning their outputs. ABKD enables higher compression of GNNs with a smaller accuracy dropoff compared to existing KD approaches. On average, we achieve a 1.79% increase in accuracy with a 32.3x compression ratio on OGBN-Mag, a large graph dataset, compared to state-of-the-art approaches.

翻訳日:2023-10-25 17:58:18 公開日:2023-10-24

# 量子エネルギーテレポーテーションによる近藤効果の探求

Exploring Kondo effect by quantum energy teleportation ( http://arxiv.org/abs/2310.15936v1 )

ライセンス: Link先を確認

Kazuki Ikeda, Rajeev Singh, Robert-Jan Slager

(参考訳) コンド効果結合を特徴とする1次元のXXZ$スピン鎖の位相図を再現する量子エネルギーテレポーテーション(QET)法を検討する。この設定では、エネルギー供給者と受信者は、点不純物から空間的に分離され、直接相互作用しない。それでも、正確な対角化によって生成された図を忠実に反映した位相図を生成することに成功した。これは各サブシステムのローカル操作のみを使用して実現でき、古典的な通信によって補完される。この偉業は、QETアプローチによって得られたエネルギーとシステムの量子エンタングルメントエントロピーとの間の臨界接続によって実現される。これらの知見を裏付けるために、まず量子エンタングルメントエントロピーがシステムの関連する順序パラメータとなることを実証する。興味深いことに、QETによって決定されたエンタングルメントスペクトルのギャップ間隔の変化は、エンタングルメントエントロピーとエネルギーの両方のピークの位置と一致している。この理論の枠組みは、例えば、リドバーグ原子の1次元鎖を用いて実験的に検証できると仮定する。

We consider a quantum energy teleportation (QET) method to replicate the phase diagram of a one-dimensional $XXZ$ spin chain featuring a Kondo effect coupling. In this setup, the energy supplier and receiver are spatially separated from the point impurity and do not interact directly with it. Nonetheless, they may successfully generate phase diagrams that closely mirror those produced via exact diagonalization. This can be achieved using only local operations on their respective subsystems, supplemented by classical communication. This feat is made possible due to a critical connection between the energy obtained through the QET approach and the system's quantum entanglement entropy. To substantiate these findings, we initially demonstrate that the quantum entanglement entropy serves as the relevant order parameter for the system. Intriguingly, changes in the gap spacing of the entanglement spectra align with the locations of peaks in both entanglement entropy and energy, as determined by QET. We hypothesize that this theoretical framework could, for example, be validated experimentally using a one-dimensional chain of Rydberg atoms.

翻訳日:2023-10-25 17:57:56 公開日:2023-10-24

# 時系列予測のためのグラフ深層学習

Graph Deep Learning for Time Series Forecasting ( http://arxiv.org/abs/2310.15978v1 )

ライセンス: Link先を確認

Andrea Cini, Ivan Marisca, Daniele Zambon, Cesare Alippi

(参考訳) グラフに基づくディープラーニング手法は,関連時系列の収集処理に人気がある。従来の多変量予測法とは異なり、ニューラルグラフベースの予測器は、時系列コレクションにまたがる(おそらく動的)グラフの予測を条件付けることにより、ペアワイズな関係を利用する。条件付けは、ニューラルネットワークの予測アーキテクチャに対するアーキテクチャ上の帰納的バイアスの形を取り得るため、時空間グラフニューラルネットワークと呼ばれる深層学習モデルのファミリとなる。このようなリレーショナルインダクティブバイアスにより、大規模な時系列コレクション上でのグローバル予測モデルのトレーニングが可能になると同時に、各要素(グラフノード)内の各要素(グラフエッジ)の局所的相関(グラフエッジ)を計算して、予測をローカライズする。実際、グラフニューラルネットワークと時系列予測のためのディープラーニングの理論的および実践的な進歩は、そのような処理フレームワークの採用を魅力的かつタイムリーに進めている。しかし、文献研究の大半は現代の深層学習の手法を活かして既存のニューラルアーキテクチャのバリエーションを提案することに焦点を当てているが、基礎的・方法論的側面は体系的調査の対象となっていない。このギャップを埋めるため,本論文では,予測問題を形式化し,グラフに基づく予測モデルや手法の設計原則を提供する,包括的な方法論フレームワークを提案する。同時に、この分野の概要とともに、デザインガイドライン、レコメンデーション、ベストプラクティスを提供し、オープンチャレンジと今後の研究方向性に関する詳細な議論も行います。

Graph-based deep learning methods have become popular tools to process collections of correlated time series. Differently from traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.

翻訳日:2023-10-25 17:49:31 公開日:2023-10-24

# the conspiracy money machine: telegramの陰謀チャネルとその利益モデルを明らかにする

The Conspiracy Money Machine: Uncovering Telegram's Conspiracy Channels and their Profit Model ( http://arxiv.org/abs/2310.15977v1 )

ライセンス: Link先を確認

Vincenzo Imperati, Massimo La Morgia, Alessandro Mei, Alberto Maria Mongardini, Francesco Sassi

(参考訳) 近年、主要なソーシャルメディアプラットフォームはより厳格なモデレーション政策を実施しており、陰謀論関連のコンテンツの禁止と制限となっている。これらの制限を回避するために、陰謀論者は、より少ない制限で彼らの見解を表現し、広めることができるTelegramのような代替手段に目を向けている。 telegramはチャネル -- 管理者だけがメッセージをブロードキャストできる仮想ルーム -- と、より寛容なコンテンツポリシを提供する。これらの特徴は、陰謀チャネルの複雑なエコシステムのための完璧な繁殖地を生み出した。本稿では,この生態系を照明する。まず,陰謀チャネルを検出する手法を提案する。次に,共謀チャネルを17,000以上のチャネルからなる4つの異なるコミュニティにまとめることができることを発見した。最後に,「共謀マネーマシン」を明らかにし,ほとんどの共謀チャンネルが加入者から利益を得ようとしていることを明らかにした。陰謀論者はeコマースプラットフォームを利用して、疑わしい商品を販売したり、アフィリエイトリンクを通じて利益を上げたりする。さらに,共謀チャネルは寄付やクラウドファンディングのプラットフォームを利用してキャンペーンの資金を集めることを観察する。この事業には何百人もの寄付者が関与し、9000万ドル以上のリターンを生み出すと私たちは判断する。

In recent years, major social media platforms have implemented increasingly strict moderation policies, resulting in bans and restrictions on conspiracy theory-related content. To circumvent these restrictions, conspiracy theorists are turning to alternatives, such as Telegram, where they can express and spread their views with fewer limitations. Telegram offers channels -- virtual rooms where only administrators can broadcast messages -- and a more permissive content policy. These features have created the perfect breeding ground for a complex ecosystem of conspiracy channels. In this paper, we illuminate this ecosystem. First, we propose an approach to detect conspiracy channels. Then, we discover that conspiracy channels can be clustered into four distinct communities comprising over 17,000 channels. Finally, we uncover the "Conspiracy Money Machine," revealing how most conspiracy channels actively seek to profit from their subscribers. We find conspiracy theorists leverage e-commerce platforms to sell questionable products or lucratively promote them through affiliate links. Moreover, we observe that conspiracy channels use donation and crowdfunding platforms to raise funds for their campaigns. We determine that this business involves hundreds of donors and generates a turnover of over $90 million.

翻訳日:2023-10-25 17:49:05 公開日:2023-10-24

# 非凸最適化のための符号ベースランダムリシャッフルアルゴリズムの収束

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization ( http://arxiv.org/abs/2310.15976v1 )

ライセンス: Link先を確認

Zhen Qin, Zhishuai Liu, Pan Xu

(参考訳) signSGDは通信効率の高さから非凸最適化で広く用いられている。しかし、既存のsignSGDの解析は、各反復でデータが復元抽出されると仮定しており、データをランダムにリシャッフルして順番にアルゴリズムへ供給する実際の実装と矛盾している。本稿では、非凸最適化に対するランダムリシャッフル付きsignSGD(SignRR)の最初の収束結果を証明することで、このギャップを埋める。データセットのサイズを$n$、データを通過するエポック数を$T$、確率勾配の分散の上界を$\sigma^2$とすると、SignRRはsignSGD (Bernstein et al., 2018) と同じ収束率 $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$ を持つことを示す。次に、分散低減勾配と運動量更新をそれぞれ利用するSignRVRとSignRVMを提示し、いずれも $O(\log(nT)/\sqrt{nT})$ で収束することを示す。signSGDの解析とは対照的に、我々の結果は、各反復のバッチサイズが総反復数と同じオーダーになるほど極端に大きいこと (Bernstein et al., 2018) も、確率勾配と真の勾配の符号が要素ごとに少なくとも1/2の確率で一致すること (Safaryan and Richtárik, 2021) も必要としない。また、データが複数のマシンに分散している場合にもアルゴリズムを拡張し、dist-SignRVRとdist-SignRVMを得る。どちらも $O(\log(n_0T)/\sqrt{n_0T})$ で収束し、$n_0$は1台のマシンのデータセットサイズである。シミュレーションおよび実世界の問題に関する実験によって理論的知見を裏付け、ランダムリシャッフルを用いた符号ベースの手法が既存のベースラインと同等以上であることを検証する。

signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$ as signSGD (Bernstein et al., 2018). We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results require neither an extremely large batch size in each iteration, of the same order as the total number of iterations (Bernstein et al., 2018), nor that the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 (Safaryan and Richtárik, 2021). We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
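
The difference between sampling with replacement and random reshuffling is easiest to see in code. The following is a minimal sketch of sign-based SGD with random reshuffling, assuming a constant step size and one sample per step; the abstract does not give the step-size schedule or batch size, so those choices, the function name `sign_rr`, and the toy objective are assumptions. SignRVR and SignRVM are not shown.

```python
import random

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_rr(grad_fn, data, x0, lr, epochs, seed=0):
    """Sign-based SGD with random reshuffling.

    Each epoch draws one fresh permutation of the data and visits every
    sample exactly once (sampling without replacement). Each step moves
    every coordinate by `lr` against the sign of its stochastic gradient.
    """
    rng = random.Random(seed)
    x = list(x0)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = grad_fn(x, data[i])
            x = [xj - lr * sign(gj) for xj, gj in zip(x, g)]
    return x

# Toy problem: minimize the sum over d of 0.5 * (x - d)^2, whose per-sample
# gradient is x - d. Sign updates settle near the median of the data.
data = [1.0, 2.0, 3.0]
x = sign_rr(lambda x, d: [x[0] - d], data, [0.0], lr=0.01, epochs=200)
print(x)
```

Only the sign of each gradient coordinate is used, so in a distributed setting each worker would need to send one bit per coordinate, which is the communication saving the abstract refers to.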

翻訳日:2023-10-25 17:48:47 公開日:2023-10-24

# データ駆動交通シミュレーション:総括的レビュー

Data-driven Traffic Simulation: A Comprehensive Review ( http://arxiv.org/abs/2310.15975v1 )

ライセンス: Link先を確認

Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang

(参考訳) 自動運転車(AV)は安全で効率的な交通手段を提供することで社会を大きく変革する可能性を秘めている。近年、自律運転の認識と予測において顕著な進歩が見られるが、AVの性能を検証するという課題はほとんど解決されていない。データ駆動型微視的交通シミュレーションは、1) 高忠実度交通データが利用可能であること、2) 大規模試験とシナリオ再現性を可能にする利点、3) 反応的かつ現実的な交通シミュレーションの可能性から、自動運転テストの重要なツールになっている。しかし、現在このトピックに関する包括的なレビューは欠落している。本論文は、関連する研究を要約することで、このギャップを埋めることを目的としている。本研究の主な目的は、現在の研究の取り組みを概観し、この分野の今後の発展に資する将来的な展望を提供することである。データ駆動交通シミュレーションの一般的な問題を紹介し、重要な概念と用語を概説する。交通シミュレーションを概観した後、一般的に用いられる様々なデータセットと評価指標をレビューする。次に,模倣学習,強化学習,生成学習,深層学習の各手法を総合的に評価し,それぞれを要約し,その利点と欠点を詳細に分析する。さらに、最先端、既存の課題、そして将来の研究方向性を評価する。

Autonomous vehicles (AVs) have the potential to significantly revolutionize society by providing a secure and efficient mode of transportation. Recent years have witnessed notable advancements in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has become an important tool for autonomous driving testing due to 1) availability of high-fidelity traffic data; 2) its advantages of enabling large-scale testing and scenario reproducibility; and 3) its potential in reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This paper aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research efforts and provide a futuristic perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After overviewing traffic simulation, various datasets and evaluation metrics commonly used are reviewed. The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, generative and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state-of-the-art, existing challenges, and future research directions.

翻訳日:2023-10-25 17:48:03 公開日:2023-10-24

# 性能保証を伴う進化タスクのミニマックスフォワードと後方学習

Minimax Forward and Backward Learning of Evolving Tasks with Performance Guarantees ( http://arxiv.org/abs/2310.15974v1 )

ライセンス: Link先を確認

Verónica Álvarez, Santiago Mazuelas, Jose A. Lozano

(参考訳) 時間とともに現れる一連の分類タスクについては、連続タスクがしばしば高い類似性を持つという意味でタスクが進化していることが一般的である。増大するタスク列の漸進的な学習は、シーケンス内のすべてのタスク(前方および後方学習)の情報を活用することで、タスク毎のサンプルが少ない場合でも正確な分類を可能にすることを約束する。しかし、継続学習やコンセプトドリフト適応のために開発された既存の技術は、時間に依存しない類似性のあるタスクのために設計されるか、シーケンスの最後のタスクを学習するためにのみ使用される。本稿では,前向きと後向きの学習を効果的に活用し,タスクの進化に寄与するインクリメンタルなミニマックスリスク分類器(IMRC)を提案する。さらに,タスクの期待2次変化とタスク数の観点から,前向きと後向きの学習によって得られる性能改善を解析的に特徴付ける。実験評価の結果,imrcは,特に試料サイズが小さくなるほど,大幅な性能向上が期待できることがわかった。

For a sequence of classification tasks that arrive over time, it is common that tasks are evolving in the sense that consecutive tasks often have a higher similarity. The incremental learning of a growing sequence of tasks holds promise to enable accurate classification even with few samples per task by leveraging information from all the tasks in the sequence (forward and backward learning). However, existing techniques developed for continual learning and concept drift adaptation are either designed for tasks with time-independent similarities or only aim to learn the last task in the sequence. This paper presents incremental minimax risk classifiers (IMRCs) that effectively exploit forward and backward learning and account for evolving tasks. In addition, we analytically characterize the performance improvement provided by forward and backward learning in terms of the tasks' expected quadratic change and the number of tasks. The experimental evaluation shows that IMRCs can result in a significant performance improvement, especially for reduced sample sizes.

翻訳日:2023-10-25 17:47:47 公開日:2023-10-24

# ランタイムシステムにおける課題管理の特徴

Characterizing Issue Management in Runtime Systems ( http://arxiv.org/abs/2310.15971v1 )

ライセンス: Link先を確認

Salma Begum Tamanna, Gias Uddin, Lan Xia and Longyu Zhang

(参考訳) Javaのような現代のプログラミング言語は、様々なコンピューティングプラットフォームやオペレーティングシステムにおけるソフトウェアアプリケーションの実装とデプロイをサポートするためにランタイムシステムを必要とする。これらのランタイムシステムは通常,大規模なソフトウェア企業(ibmやmicrosoftなど)とoss開発者との緊密なコラボレーションに基づいて,githubがホストするリポジトリで開発される。しかし、その人気と幅広い利用にもかかわらず、私たちの知る限りでは、これらのリポジトリは研究されていない。 GitHubの34のランタイムシステムリポジトリから約118Kの問題に関する実証的研究を報告する。拡張性、テストの失敗、バグに関する問題は、主にランタイムシステムレポジトリに投稿され、ソリューションに関する議論は、主に問題議論に現れます。ランタイムシステムリポジトリの82.69%の問題は解決され、0.69%の問題は無視されている。 82.65%の問題はラベルでタグ付けされ、28.30%の発行者は指定され、90.65%の発行者は少なくとも1つのコメントを含む。調査結果に基づいて 6つの推奨事項を

Modern programming languages like Java require runtime systems to support the implementation and deployment of software applications in diverse computing platforms and operating systems. These runtime systems are normally developed in GitHub-hosted repositories based on close collaboration between large software companies (e.g., IBM, Microsoft) and OSS developers. However, despite their popularity and broad usage, to the best of our knowledge, these repositories have never been studied. We report an empirical study of around 118K issues from 34 runtime system repos in GitHub. We found that issues regarding enhancements, test failures, and bugs are the ones most often posted on runtime system repositories, and solution-related discussion is mostly present in issue discussions. 82.69% of issues in the runtime system repositories have been resolved and 0.69% are ignored; the medians of issue close rate, ignore rate, and addressing time in these repositories are 76.1%, 2.2%, and 58 days, respectively. 82.65% of issues are tagged with labels, while only 28.30% have designated assignees and 90.65% contain at least one comment; the presence of these features in an issue report can also affect issue closure. Based on the findings, we offer six recommendat

翻訳日:2023-10-25 17:47:29 公開日:2023-10-24

# アクセント固有のコードブックを用いたアクセント音声認識

Accented Speech Recognition With Accent-specific Codebooks ( http://arxiv.org/abs/2310.15970v1 )

ライセンス: Link先を確認

Darshan Prabhu (1), Preethi Jyothi (1), Sriram Ganapathy (2), Vinit Unni (1) ((1) Indian Institute of Technology Bombay, Mumbai, India, (2) Indian Institute of Science, Bangalore, India)

(参考訳) 音声アクセントは最先端の自動音声認識(ASR)システムに重大な課題をもたらす。学習データに十分に含まれないアクセントでの性能低下は、ASRの包括的な普及を大きく妨げる。本研究では,学習可能なコードブックの集合に対するクロスアテンションを用いた,エンドツーエンドASRシステムのための新しいアクセント適応手法を提案する。これらの学習可能なコードブックはアクセント固有の情報をキャプチャし、ASRエンコーダ層に統合される。モデルはアクセント付き英語音声で訓練されるが、テストデータには訓練中に見られなかったアクセントも含まれている。 Mozilla Common Voiceのマルチアクセントデータセットでは、提案手法が既知の英語アクセント(単語誤り率で最大$37\%$の相対的改善)だけでなく、未知のアクセント(WERで最大$5\%$の相対的改善)にも大きな性能向上をもたらすことを示した。さらに、L2Articデータセット上でゼロショット転送設定の利点を説明する。また,アクセント敵対的学習に基づく他の手法との比較を行った。

Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to $37\%$ relative improvement in word error rate) but also on the unseen accents (up to $5\%$ relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training.
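
The abstract above reports relative improvements in word error rate (WER). As a minimal sketch of how such a figure is usually computed, the Python snippet below assumes the common definition (baseline WER minus new WER, divided by baseline WER). The function name and the sample numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the formula:
# one absolute WER point gained on a baseline of 20 points.
print(relative_wer_improvement(20.0, 19.0))
```

Under this definition, a relative improvement depends on the baseline, so the same absolute gain counts for more on an accent where the baseline WER is already low.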

翻訳日:2023-10-25 17:47:12 公開日:2023-10-24

# カラビ・ヤウ五次元多様体の構築と機械学習

Constructing and Machine Learning Calabi-Yau Five-folds ( http://arxiv.org/abs/2310.15966v1 )

ライセンス: Link先を確認

R. Alawadhi, D. Angella, A. Leonardo and T. Schettini Gherardini

(参考訳) 我々は、4つ以下の複素射影空間の積において、最大4つの制約を持つ、すべての可能な完全交叉カラビ・ヤウ五次元多様体を構成する。構成行列の行と列の置換で互いに移り合わない$27068$個の空間を取得し、それらすべてに対してオイラー数を決定する。これらのうち$3909$個の積多様体を除き、$12433$個のケース、すなわち非積空間の$53.7\%$についてコホモロジーデータを計算し、$2375$個の異なるホッジダイヤモンドを得る。上記のすべての情報を含むデータセットは、https://www.dropbox.com/scl/fo/z7ii5idt6qxu36e0b8azq/h?rlkey=0qfhx3tykytduobpld510gsfy&dl=0 で入手できる。不変量の分布を提示し, 低次元の類似物との比較を行った。教師付き機械学習は、分類器と回帰器(全結合と畳み込みの両方)のニューラルネットワークを介してコホモロジーデータ上で実行される。$h^{1,1}$は非常に効率的に学習でき、非常に高い$R^2$スコアと$96\%$の精度、すなわち予測の$96\%$が正しい値に正確に一致する。 $h^{1,4},h^{2,3}, \eta$については、非常に高い$R^2$スコアが得られるが、可能な値の範囲が広いため、精度は低くなる。

We construct all possible complete intersection Calabi-Yau five-folds in a product of four or less complex projective spaces, with up to four constraints. We obtain $27068$ spaces, which are not related by permutations of rows and columns of the configuration matrix, and determine the Euler number for all of them. Excluding the $3909$ product manifolds among those, we calculate the cohomological data for $12433$ cases, i.e. $53.7 \%$ of the non-product spaces, obtaining $2375$ different Hodge diamonds. The dataset containing all the above information is available at https://www.dropbox.com/scl/fo/z7ii5idt6qxu36e0b8azq/h?rlkey=0qfhx3tykytduobpld510gsfy&dl=0 . The distributions of the invariants are presented, and a comparison with the lower-dimensional analogues is discussed. Supervised machine learning is performed on the cohomological data, via classifier and regressor (both fully connected and convolutional) neural networks. We find that $h^{1,1}$ can be learnt very efficiently, with very high $R^2$ score and an accuracy of $96\%$, i.e. $96 \%$ of the predictions exactly match the correct values. For $h^{1,4},h^{2,3}, \eta$, we also find very high $R^2$ scores, but the accuracy is lower, due to the large ranges of possible values.
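
The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This short Python sketch uses only the three numbers stated in the abstract; the variable names are illustrative.

```python
total_spaces = 27068       # inequivalent configuration matrices reported
product_manifolds = 3909   # product manifolds excluded from the cohomology step
computed_cases = 12433     # cases with cohomological data computed

non_product = total_spaces - product_manifolds
coverage = 100.0 * computed_cases / non_product

print(non_product)         # 23159
print(round(coverage, 1))  # 53.7
```

The result agrees with the stated $53.7\%$ coverage of the non-product spaces.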

翻訳日:2023-10-25 17:46:59 公開日:2023-10-24

# トークンの混合:クロスサンプル集約による効率的なLLM

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation ( http://arxiv.org/abs/2310.15961v1 )

ライセンス: Link先を確認

Szymon Antoniak, Sebastian Jaszczur, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan

(参考訳) Mixture of Experts(MoE)モデルは、トレーニングや推論のコストを維持しながらトランスフォーマーモデルのパラメータ数を増やせると期待されているが、その適用には顕著な欠点がある。これらのモデルの鍵となる戦略は、処理されるトークンごとに、大規模なフィードフォワード層のサブセットである専門家を高々数個だけアクティベートすることである。しかし、このアプローチに課題がないわけではない。専門家とトークンをマッチングする操作は離散的であるため、MoEモデルはトレーニングの不安定性や専門家の利用の偏りといった問題を起こしやすい。補助損失やバランスを考慮したマッチングなど、これらの懸念に対処するために設計された既存の手法は、モデル性能が低下するか、トレーニングがより困難になる。これらの問題に対応して,上記の困難を回避しつつ,MoEアーキテクチャの利点を保った完全微分可能なモデルであるMixture of Tokensを提案する。トークンを専門家にルーティングする代わりに、このアプローチでは、異なる例のトークンを混合してから専門家に渡すことで、モデルがすべてのトークンと専門家の組み合わせから学習できるようにする。重要なことに、この混合は推論中に異なるシーケンスが混ざるのを避けるために無効にすることができる。さらに、この手法はマスク付きおよび因果的な大規模言語モデルのトレーニングと推論の両方と完全に互換性がある。

Despite the promise of Mixture of Experts (MoE) models in increasing parameter counts of Transformer models while maintaining training and inference costs, their application carries notable drawbacks. The key strategy of these models is to, for each processed token, activate at most a few experts - subsets of an extensive feed-forward layer. But this approach is not without its challenges. The operation of matching experts and tokens is discrete, which makes MoE models prone to issues like training instability and uneven expert utilization. Existing techniques designed to address these concerns, such as auxiliary losses or balance-aware matching, result either in lower model performance or are more difficult to train. In response to these issues, we propose Mixture of Tokens, a fully-differentiable model that retains the benefits of MoE architectures while avoiding the aforementioned difficulties. Rather than routing tokens to experts, this approach mixes tokens from different examples prior to feeding them to experts, enabling the model to learn from all token-expert combinations. Importantly, this mixing can be disabled to avoid mixing of different sequences during inference. Crucially, this method is fully compatible with both masked and causal Large Language Model training and inference.
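
The abstract above describes mixing tokens from different examples before an expert sees them, rather than routing each token to an expert. The Python sketch below is one plausible reading of that idea, not the authors' implementation: the softmax weighting, the redistribution step, and every name (`mix_and_redistribute`, `double`, the toy logits) are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mix_and_redistribute(group, logits, expert):
    """Mix one group of tokens for a single expert and redistribute the output.

    group  : list of token vectors, one per example (hypothetical grouping)
    logits : one importance score per token (would come from a learned controller)
    expert : callable mapping a vector to a vector (stand-in for a feed-forward expert)
    """
    weights = softmax(logits)
    dim = len(group[0])
    # Continuous mixing: a weighted average of tokens, so no discrete routing decision.
    mixed = [sum(w * tok[d] for w, tok in zip(weights, group)) for d in range(dim)]
    processed = expert(mixed)
    # Each token receives the expert output scaled by its own mixing weight.
    return [[w * p for p in processed] for w in weights]


def double(vector):
    """Toy expert used only for illustration."""
    return [2.0 * x for x in vector]


tokens = [[1.0, 0.0], [0.0, 1.0]]   # two tokens from two different examples
updates = mix_and_redistribute(tokens, [0.0, 0.0], double)
print(updates)
```

Because every step is a weighted sum, gradients can flow to all tokens and to the scores, which is the property the abstract contrasts with discrete expert-token matching.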

翻訳日:2023-10-25 17:46:33 公開日:2023-10-24

# NoteChat: 臨床ノートに基づく合成医師・患者会話のデータセット

NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes ( http://arxiv.org/abs/2310.15959v1 )

ライセンス: Link先を確認

Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, Hong Yu

(参考訳) 患者の診察のたびに医師が作成する詳細な臨床記録は、医療従事者や研究者にとって不可欠である。これらのノート作成を言語モデルで自動化することで、医師の作業負荷を削減できる。しかし、患者と医師の会話の公開が限られているため、そのようなモデルの訓練は困難である。本稿では,大規模言語モデル(LLM)を利用して、臨床ノートを条件とした合成の医師・患者会話を生成する協調型マルチエージェントフレームワークであるNoteChatを紹介する。 NoteChatはPlanning、Roleplay、Polishの各モジュールで構成されている。我々はNoteChatをOpenAIのChatGPTやGPT-4といった最先端モデルと比較し、包括的な自動評価と人手評価を行う。その結果、NoteChatは高品質な合成の医師・患者会話の生成を促進することが示され、医療におけるLLMの未開拓の可能性が浮き彫りになった。この研究は、臨床ノートを条件とした医師・患者会話を複数のLLMが協力して完成させる最初の事例であり、AIと医療の交差点への有望な道を提供する。

The detailed clinical records drafted by doctors after each patient's visit are crucial for medical practitioners and researchers. Automating the creation of these notes with language models can reduce the workload of doctors. However, training such models can be difficult due to the limited public availability of conversations between patients and doctors. In this paper, we introduce NoteChat, a cooperative multi-agent framework leveraging Large Language Models (LLMs) for generating synthetic doctor-patient conversations conditioned on clinical notes. NoteChat consists of Planning, Roleplay, and Polish modules. We provide a comprehensive automatic and human evaluation of NoteChat, comparing it with state-of-the-art models, including OpenAI's ChatGPT and GPT-4. Results demonstrate that NoteChat facilitates high-quality synthetic doctor-patient conversations, underscoring the untapped potential of LLMs in healthcare. This work represents the first instance of multiple LLMs cooperating to complete a doctor-patient conversation conditioned on clinical notes, offering promising avenues for the intersection of AI and healthcare

翻訳日:2023-10-25 17:46:11 公開日:2023-10-24

# 模倣学習のためのヒューマン・イン・ザ・ループタスクと動作計画

Human-in-the-Loop Task and Motion Planning for Imitation Learning ( http://arxiv.org/abs/2310.16014v1 )

ライセンス: Link先を確認

Ajay Mandlekar, Caelan Garrett, Danfei Xu, Dieter Fox

(参考訳) 人間のデモからの模倣学習は、複雑な操作スキルをロボットに教えることができるが、時間と労力がかかる。対照的に、タスク・アンド・モーション・プランニング(TAMP)システムは自動化されており、長期的なタスクの解決に優れているが、接触の多いタスクには適用が難しい。本稿では,両アプローチの利点を活用する新しいシステムであるHuman-in-the-Loop Task and Motion Planning (HITL-TAMP)を提案する。このシステムは、人間の遠隔操作者との間で制御を選択的に受け渡すTAMPゲート制御機構を採用している。これにより、人間の遠隔操作者がロボット群を管理し、データ収集の効率を最大化できる。収集された人間のデータは、TAMPゲート付きポリシーをトレーニングするための模倣学習フレームワークと組み合わせることで、完全なタスクデモでのトレーニングよりも優れたパフォーマンスが得られる。HITL-TAMPを従来の遠隔操作システムと比較したところ、ユーザは同じ時間で3倍以上のデモを収集した。さらに、熟練エージェント(成功率$75\%$以上)を、わずか10分間の非熟練者による遠隔操作データから訓練することができた。最後に, HITL-TAMPによる2.1Kのデモを12の接触の多い長期タスクで収集し、このシステムがしばしばほぼ完璧なエージェントを生み出すことを示す。ビデオと追加結果は https://hitltamp.github.io で公開されている。

Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches. The system employs a TAMP-gated control mechanism, which selectively gives and takes control to and from a human teleoperator. This enables the human teleoperator to manage a fleet of robots, maximizing data collection efficiency. The collected human data is then combined with an imitation learning framework to train a TAMP-gated policy, leading to superior performance compared to training on full task demonstrations. We compared HITL-TAMP to a conventional teleoperation system -- users gathered more than 3x the number of demos given the same time budget. Furthermore, proficient agents (75\%+ success) could be trained from just 10 minutes of non-expert teleoperation data. Finally, we collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks and show that the system often produces near-perfect agents. Videos and additional results at https://hitltamp.github.io .

翻訳日:2023-10-25 17:41:40 公開日:2023-10-24

# MLFMF:数学的形式化のための機械学習のためのデータセット

MLFMF: Data Sets for Machine Learning for Mathematical Formalization ( http://arxiv.org/abs/2310.16005v1 )

ライセンス: Link先を確認

Andrej Bauer, Matej Petković, Ljupčo Todorovski

(参考訳) MLFMFは,証明アシスタントを用いた数学の形式化を支援する推薦システムのベンチマークのためのデータセットの集合である。これらのシステムは、新しい定理の証明や新しい構成の実行に関連する既存のエントリ(定理、構成、データ型、公準)を特定するのに役立つ。各データセットは、証明アシスタントAgdaまたはLeanで書かれた形式化された数学のライブラリから導かれる。このコレクションには、最大のLean 4ライブラリであるMathlibと、最大規模のAgdaライブラリのいくつか(標準ライブラリ、一価数学ライブラリAgda-unimath、TypeTopologyライブラリ)が含まれている。各データセットは対応するライブラリを2つの方法で表現する: ヘテロジニアスネットワークとして、そしてライブラリ内のすべてのエントリの構文木を表すs式のリストとして。ネットワークにはライブラリの(モジュール)構造とエントリ間の参照が含まれており、s式はエントリごとに完全かつ容易に解析できる情報を提供する。標準的なグラフ埋め込みと単語埋め込み,ツリーアンサンブル,インスタンスベースの学習アルゴリズムを用いたベースライン結果について報告する。 MLFMFデータセットは、形式化された数学に対する多くの機械学習アプローチのさらなる調査のために、確固たるベンチマークサポートを提供する。ネットワークとs式を抽出する手法は他のライブラリにも容易に適用でき、他の証明アシスタントにも適用できる。合計$250\,000$以上のエントリがあり、これは現在、機械学習可能な形式における形式化された数学的知識のコレクションとして最大である。

We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes the largest Lean 4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the s-expressions give complete and easily parsed information about every entry. We report baseline results using standard graph and word embeddings, tree ensembles, and instance-based learning algorithms. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. The methodology used to extract the networks and the s-expressions readily applies to other libraries, and is applicable to other proof assistants. With more than $250\,000$ entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format.
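
The abstract above says each entry is also available as an s-expression that is easy to parse. As a rough illustration of what parsing such data involves, here is a generic s-expression reader in Python. The sample entry and its node labels (`entry`, `name`, `refs`) are invented for this sketch; the actual MLFMF file format may differ.

```python
def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse a token list into nested Python lists (consumes tokens from the front)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens and tokens[0] != ")":
            node.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # drop the ")"
        return node
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    return token


def atoms(tree):
    """Yield every atom in the tree, depth first."""
    if isinstance(tree, list):
        for child in tree:
            yield from atoms(child)
    else:
        yield tree


# Hypothetical entry; the real MLFMF node labels may look different.
tree = parse(tokenize("(entry (name add-comm) (refs add zero))"))
print(tree)
print(list(atoms(tree)))
```

Nested lists of this kind are a convenient starting point for tree-based features, while the flat atom sequence could feed a word-embedding baseline of the sort the abstract mentions.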

翻訳日:2023-10-25 17:41:17 公開日:2023-10-24

# CVPR 2023 テキストガイドビデオ編集コンペティション

CVPR 2023 Text Guided Video Editing Competition ( http://arxiv.org/abs/2310.16003v1 )

ライセンス: Link先を確認

Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola

(参考訳) 人間は一日に10億時間以上のビデオを視聴する。このビデオのほとんどは手作業で編集されたもので、面倒な作業である。しかし、AIを活用したビデオ生成とビデオ編集が増えている。Stable DiffusionやImagenのようなテキストから画像へのモデルに基づいて、生成AIはビデオタスクで劇的に改善されている。しかし、標準ベンチマークがないため、これらのビデオタスクの進捗を評価するのは難しい。そこで本研究では,テキスト誘導ビデオ編集(TGVE)のための新しいデータセットを提案し、CVPRでコンペティションを開催して、TGVEデータセット上でモデルを評価する。本稿では,コンペティションを振り返り,優勝した手法について述べる。コンペティションのデータセットはhttps://sites.google.com/view/loveucvpr23/track4で入手できる。

Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no standard benchmark. So, we propose a new dataset for text-guided video editing (TGVE), and we run a competition at CVPR to evaluate models on our TGVE dataset. In this paper we present a retrospective on the competition and describe the winning method. The competition dataset is available at https://sites.google.com/view/loveucvpr23/track4.

翻訳日:2023-10-25 17:40:54 公開日:2023-10-24

# 画像合成のためのビュー条件の統合

Integrating View Conditions for Image Synthesis ( http://arxiv.org/abs/2310.16002v1 )

ライセンス: Link先を確認

Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou

(参考訳) 画像処理の分野では、既存の画像に複雑な意味的修正を適用することは永続的な課題である。本稿では,視点情報を統合して画像編集タスクの制御性を高める,先駆的枠組みを提案する。既存のオブジェクト編集手法を調査し,画像編集法に適合する3つの基本的な基準,一貫性,制御可能性,調和を抽出した。従来の手法とは対照的に,本手法は画像合成の課題に対処するための3つの要件をすべて満たしている。定量的評価と質的比較の両方を包含する包括的実験を通じて,多次元における我々の枠組みの優れた性能を示す説得力のある証拠を提示する。この研究は、画像合成技術の進歩と、合成全体の視覚的コヒーレンスを保ちながら、精密なオブジェクト修正を促進するための有望な道を確立する。

In the field of image processing, applying intricate semantic modifications within existing images remains an enduring challenge. This paper introduces a pioneering framework that integrates viewpoint information to enhance the control of image editing tasks. By surveying existing object editing methodologies, we distill three essential criteria, consistency, controllability, and harmony, that should be met for an image editing method. In contrast to previous approaches, our method takes the lead in satisfying all three requirements for addressing the challenge of image synthesis. Through comprehensive experiments, encompassing both quantitative assessments and qualitative comparisons with contemporary state-of-the-art methods, we present compelling evidence of our framework's superior performance across multiple dimensions. This work establishes a promising avenue for advancing image synthesis techniques and empowering precise object modifications while preserving the visual coherence of the entire composition.

翻訳日:2023-10-25 17:40:38 公開日:2023-10-24

# 推移性回復分解:解釈可能かつロバストな細粒度関係

Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships ( http://arxiv.org/abs/2310.15999v1 )

ライセンス: Link先を確認

Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

(参考訳) 細粒度表現学習の最近の進歩は、最先端の成果を達成するために、局所からグローバルへの(創発的な)関係性を活用する。しかし、そのような手法が依存する関係表現は抽象的である。我々は、それらを画像ビュー上の解釈可能なグラフとして表現することで、この抽象化を解きほぐすことを目指している。まず、抽象的関係表現は、局所ビュー間の推移的関係を回復する手段に他ならないことを理論的に示す。これに基づき、インスタンスレベルとクラスレベルの両方で、事後計算なしに、抽象的な創発的関係の解釈可能な等価物を特定するグラフ空間探索アルゴリズムであるTransitivity Recovering Decompositions (TRD) を設計した。また,TRDがノイズの多いビューに対して証明可能な形でロバストであることを示し、実証的な証拠もこの発見を裏付けている。この性質により、TRDは完全に解釈可能でありながら、最先端技術と同等またはそれ以上の性能を実現できる。実装はhttps://github.com/abhrac/trdで利用可能である。

Recent advances in fine-grained representation learning leverage local-to-global (emergent) relationships for achieving state-of-the-art results. The relational representations relied upon by such methods, however, are abstract. We aim to deconstruct this abstraction by expressing them as interpretable graphs over image views. We begin by theoretically showing that abstract relational representations are nothing but a way of recovering transitive relationships among local views. Based on this, we design Transitivity Recovering Decompositions (TRD), a graph-space search algorithm that identifies interpretable equivalents of abstract emergent relationships at both instance and class levels, and with no post-hoc computations. We additionally show that TRD is provably robust to noisy views, with empirical evidence also supporting this finding. The latter allows TRD to perform at par or even better than the state-of-the-art, while being fully interpretable. Implementation is available at https://github.com/abhrac/trd.

翻訳日:2023-10-25 17:40:11 公開日:2023-10-24

# 2つのAndreevレベル量子ビットの光子による長距離結合

Photon-mediated long range coupling of two Andreev level qubits ( http://arxiv.org/abs/2310.15995v1 )

ライセンス: Link先を確認

L. Y. Cheung, R. Haller, A. Kononov, C. Ciaccia, J. H. Ungerer, T. Kanne, J. Nygård, P. Winkel, T. Reisinger, I. M. Pop, A. Baumgartner, C. Schönenberger

(参考訳) 超伝導弱結合では、超電流は電子とその時間反転パートナーの位相コヒーレントな反射によって形成されるアンドレーエフ束縛状態(ABS)によって運ばれる。単一の高透過性ABSは、次のABSとのエネルギー差が大きくなりうるため、理想的でコンパクトな2準位系として機能する。このようなAndreevレベル量子ビット(ALQ)のコヒーレントな操作は実証されているが、高度な量子ビットアーキテクチャに必要な2つのALQ間の長距離結合はまだ実現されていない。ここでは、新しい超伝導マイクロ波キャビティカップラにおいて、マイクロ波光子を介した2つのALQ間のコヒーレントな遠隔結合を実証する。このカップラは外部ポートとの結合率が異なる2つのモードをホストする。これにより、強く結合したモードを使用して各量子ビットを高速に読み出し、弱く結合したモードを使用して量子ビット間の結合を仲介することができる。両方の量子ビットが後者のモードと共鳴するように調整されると、Tavis-Cummingsモデルと非常によく一致する、擬交差を持つ励起スペクトルが見つかる。このモデルに基づいて, 6ミリメートルの距離にわたって絡み合いが媒介される、高度に絡み合った2量子ビット状態を同定する。この研究はALQをコンパクトでスケーラブルな固体量子ビットとして確立する。

In a superconducting weak link, the supercurrent is carried by Andreev bound states (ABSs) formed by the phase-coherent reflection of electrons and their time-reversed partners. A single, highly transmissive ABS can serve as an ideal, compact two-level system, due to a potentially large energy difference to the next ABS. While the coherent manipulation of such Andreev level qubits (ALQs) has been demonstrated, a long-range coupling between two ALQs, necessary for advanced qubit architectures, has not yet been achieved. Here, we demonstrate a coherent remote coupling between two ALQs, mediated by a microwave photon in a novel superconducting microwave cavity coupler. The latter hosts two modes with different coupling rates to an external port. This allows us to perform fast readout of each qubit using the strongly coupled mode, while the weakly coupled mode is utilized to mediate the coupling between the qubits. When both qubits are tuned into resonance with the latter mode, we find excitation spectra with avoided crossings, in very good agreement with the Tavis-Cummings model. Based on this model, we identify highly entangled two-qubit states for which the entanglement is mediated over a distance of six millimeters. This work establishes ALQs as compact and scalable solid-state qubits.

翻訳日:2023-10-25 17:39:46 公開日:2023-10-24

# 大型言語モデルによるホワイトボックスコンパイラのファジング

White-box Compiler Fuzzing Empowered by Large Language Models ( http://arxiv.org/abs/2310.15991v1 )

ライセンス: Link先を確認

Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang

(参考訳) プログラムの振る舞いを変えてしまう誤コンパイルは重大な結果をもたらす可能性があるため、コンパイラの正確性は不可欠である。文献では、ファジングはコンパイラの欠陥を明らかにするために広く研究されている。しかし、コンパイラのファジングは依然として困難である。既存の手法は、内部のコンパイラ動作を十分に理解せずにテストを生成するブラックボックスおよびグレーボックスファジングに焦点を当てている。そのため、複雑な最適化の条件を満たすプログラムの構築にしばしば失敗する。一方、従来のホワイトボックス技術は、コンパイラの巨大なコードベースには計算量的に適用できない。最近の進歩は、大規模言語モデル(LLM)がコード生成/理解タスクに優れ、ブラックボックスファジングで最先端のパフォーマンスを達成していることを示している。それでも、コンパイラのソースコード情報を用いたLLMへのプロンプトは、コンパイラテストの研究で欠けている部分である。そこで本研究では,ソースコード情報とLLMを用いてコンパイラ最適化をテストする最初のホワイトボックスコンパイラファザであるWhiteFoxを提案する。 WhiteFoxはデュアルモデルフレームワークを採用している。 (i)解析LLMは、低レベルの最適化ソースコードを調べ、その最適化をトリガーできる高レベルテストプログラムの要件を生成する。 (ii)生成LLMは、要約された要件に基づいてテストプログラムを生成する。さらに、最適化をトリガーしたテストはフィードバックとして使われ、テスト生成をその場でさらに強化する。一般的な4つのコンパイラに対する評価は、WhiteFoxが複雑な条件を必要とする深い最適化を実行する高品質なテストを生成でき、最先端のファザよりも最大80多くの最適化を実行することを示している。現在までに、WhiteFoxは合計96のバグを発見し、そのうち80が未知のバグと確認され、51がすでに修正されている。コンパイラテスト以外にも、WhiteFoxは、他の複雑な現実世界のソフトウェアシステムのホワイトボックスファジングにも適用することができる。

Compiler correctness is crucial, as miscompilation that falsifies program behavior can lead to serious consequences. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: existing approaches focus on black- and grey-box fuzzing, which generates tests without sufficient understanding of internal compiler behaviors. As such, they often fail to construct programs to exercise conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebase of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests to exercise deep optimizations requiring intricate conditions, exercising up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found 96 bugs in total, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems in general.

翻訳日:2023-10-25 17:38:55 公開日:2023-10-24

# GPTにおける翻訳の文脈内学習

Dissecting In-Context Learning of Translations in GPTs ( http://arxiv.org/abs/2310.15987v1 )

ライセンス: Link先を確認

Vikas Raunak and Hany Hassan Awadalla and Arul Menezes

(参考訳) GPT-3のような大規模言語モデル(LLM)を機械翻訳(MT)に活用する最近の研究のほとんどは、プロンプトのための数ショットのサンプルの選択に重点を置いている。本研究では,高品質なドメイン内デモンストレーションに摂動を加えることで、翻訳の文脈内学習におけるデモンストレーションの属性の役割をより深く理解することを試みる。ソースとターゲットの対応に対する非対称な摂動は、大きく異なる結果をもたらす。ソース側の摂動は驚くほど影響が小さい一方、ターゲット側の摂動は翻訳品質を劇的に低下させる可能性があり、翻訳の文脈内学習において最も重要な学習信号を提供するのは出力テキストの分布であることが示唆された。我々は、ゼロショットプロンプトでこの信号を自動的に付加するZero-Shot-Contextという手法を提案する。この手法がGPT-3のゼロショット翻訳性能を向上させ、数ショットのプロンプトによる翻訳にも匹敵することを実証した。

Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yields vastly different results. We show that the perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the zero-shot translation performance of GPT-3, even making it competitive with few-shot prompted translations.

翻訳日:2023-10-25 17:38:02 公開日:2023-10-24

# 単一正例マルチラベル学習のための視覚言語擬似ラベル

Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning ( http://arxiv.org/abs/2310.15985v1 )

ライセンス: Link先を確認

Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs

(参考訳) 本稿では,単一正例マルチラベル学習(Single-Positive Multi-label Learning)に対する新しいアプローチを提案する。一般的なマルチラベル学習では、モデルは単一の入力画像に対して複数のラベルやカテゴリを予測することを学習する。これは、画像に対して多数の候補ラベルから単一のラベルを予測する標準的なマルチクラス画像分類とは対照的である。 SPML(Single-Positive Multi-label Learning)は、トレーニングデータに1つの画像につき1つのアノテーションしか存在しない場合に、複数のラベルを予測する学習を特に検討する。現実世界のデータには、複数のカテゴリに同時に属するインスタンスが含まれることが多いため、マルチラベル学習は、多くの点で、シングルラベル学習よりも現実的なタスクである。しかし、各インスタンスに対して高品質なアノテーションを複数収集するには本質的に複雑さとコストが伴うため、一般的なコンピュータビジョンデータセットの多くは単一ラベルが中心である。我々は、視覚言語モデルを用いて強い正と負の擬似ラベルを提案する新しいアプローチであるVision-Language Pseudo-Labeling (VLPL)を提案する。VLPLは現在のSOTA法をPascal VOCで5.5%、MS-COCOで18.4%、NUS-WIDEで15.2%、CUB-Birdsで8.4%上回る。コードとデータはhttps://github.com/mvrl/VLPLから入手できる。

This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.
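
The abstract above mentions using a vision-language model to suggest strong positive and negative pseudo-labels. The Python sketch below shows one simple way such a step could look, assuming per-class image-text similarity scores are already available. The thresholding rule, the threshold values, and the function name are assumptions for illustration and are not taken from the paper or its repository.

```python
def pseudo_labels(scores, known_positive, pos_threshold=0.7, neg_threshold=0.2):
    """Assign +1 / -1 / 0 (unknown) to each class from similarity scores.

    scores         : dict mapping class name to an image-text similarity in [0, 1]
    known_positive : the single annotated class, always kept as positive
    The thresholds are illustrative assumptions, not values from the paper.
    """
    labels = {}
    for name, score in scores.items():
        if name == known_positive or score >= pos_threshold:
            labels[name] = 1
        elif score <= neg_threshold:
            labels[name] = -1
        else:
            labels[name] = 0
    return labels


# Hypothetical scores such as a vision-language model might return.
scores = {"dog": 0.55, "person": 0.81, "car": 0.05, "sofa": 0.40}
print(pseudo_labels(scores, known_positive="dog"))
```

Leaving mid-range scores as unknown keeps only the confident pseudo-labels, which is one way to read "strong" in the abstract.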

翻訳日:2023-10-25 17:37:48 公開日:2023-10-24

# 動的デジタルヒューマンのための幾何認識映像品質評価

Geometry-Aware Video Quality Assessment for Dynamic Digital Human ( http://arxiv.org/abs/2310.15984v1 )

ライセンス: Link先を確認

Zicheng Zhang, Yingjie Zhou, Wei Sun, Xiongkuo Min, and Guangtao Zhai

(参考訳) Dynamic Digital Human (DDH) は、予め定義された動きを使ってアニメーション化される3Dデジタルモデルであり、生成過程におけるノイズ/シフトや伝送過程における圧縮歪みの影響を避けられないため、知覚的な品質評価が必要である。通常、DDHは2Dレンダリングされたアニメーションビデオとして表示されるため、ビデオ品質評価(VQA)手法をDDH品質評価(DDH-QA)タスクに適応させることは自然である。しかしながら、VQA手法は視点に強く依存し、幾何学に基づく歪みには感度が低い。そこで本稿では,DDH-QAの課題に対する、新しい非参照(NR)の幾何認識ビデオ品質評価手法を提案する。幾何特性は、DDHの幾何属性分布から推定される統計的パラメータによって記述される。空間的特徴と時間的特徴はレンダリングされたビデオから取得する。最後に、すべての種類の特徴が統合され、品質値に回帰される。実験の結果,提案手法はDDH-QAデータベース上で最先端の性能を実現することがわかった。

Dynamic Digital Humans (DDHs) are 3D digital models that are animated using predefined motions and are inevitably bothered by noise/shift during the generation process and compression distortion during the transmission process, which needs to be perceptually evaluated. Usually, DDHs are displayed as 2D rendered animation videos and it is natural to adapt video quality assessment (VQA) methods to DDH quality assessment (DDH-QA) tasks. However, the VQA methods are highly dependent on viewpoints and less sensitive to geometry-based distortions. Therefore, in this paper, we propose a novel no-reference (NR) geometry-aware video quality assessment method for DDH-QA challenge. Geometry characteristics are described by the statistical parameters estimated from the DDHs' geometry attribute distributions. Spatial and temporal features are acquired from the rendered videos. Finally, all kinds of features are integrated and regressed into quality values. Experimental results show that the proposed method achieves state-of-the-art performance on the DDH-QA database.

翻訳日:2023-10-25 17:37:24 公開日:2023-10-24

# Maxwell-Density Matrix Langevin 法による動的光電子デバイスシミュレーションにおけるゆらぎのモデル化

Modeling of Fluctuations in Dynamical Optoelectronic Device Simulations within a Maxwell-Density Matrix Langevin Approach ( http://arxiv.org/abs/2310.16039v1 )

ライセンス: Link先を確認

Johannes Popp (1), Johannes Stowasser (1), Michael A. Schreiber (1), Lukas Seitner (1), Felix Hitzelhammer (2), Michael Haider (1), Gabriela Slavcheva (2 and 3), Christian Jirauschek (1 and 4) ((1) TUM School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany (2) Institute of Physics, NAWI Graz, University of Graz, Universitätsplatz 5, 8010 Graz, Austria (3) Quantopticon, 5235 South Harper Court, Chicago, IL 60615 USA (4) TUM Center for Quantum Engineering (ZQE), 85748 Garching, Germany)

(参考訳) 本稿では,量子カスケードレーザー(QCL)や量子ドット(QD)構造などのアクティブフォトニックデバイスにおける時空間ダイナミクスのモデル化のために,c数確率ノイズ項を含む全波マクスウェル密度行列シミュレーションツールを提案する。このようなデバイスにおけるコヒーレントな光と物質の相互作用は、周波数コムやその他の非線形および非古典光学現象の生成において重要な役割を果たす。非線形および非古典的特徴の出現はノイズ特性に直接関連しているため、低ノイズ量子光電子光源の開発にはノイズ特性の詳細なシミュレーションが必要である。我々の半古典的シミュレーションフレームワークは、電子動力学のリンドブラッド方程式とレーザー導波路内の光伝搬のマクスウェル方程式を組み合わせたものである。光学場および量子系とそれぞれの熱浴との相互作用から生じるゆらぎは、量子ランジュバン理論の中で扱われる。ここで、ゆらぎは、マクスウェル密度行列方程式に確率的なc数項を加えることによって取り込まれる。動的シミュレーションフレームワークmbsolveにおける実装が公開されている。

We present a full-wave Maxwell-density matrix simulation tool including c-number stochastic noise terms for the modeling of the spatiotemporal dynamics in active photonic devices, such as quantum cascade lasers (QCLs) and quantum dot (QD) structures. The coherent light-matter interaction in such devices plays an important role in the generation of frequency combs and other nonlinear and nonclassical optical phenomena. Since the emergence of nonlinear and nonclassical features is directly linked to the noise properties, detailed simulations of the noise characteristics are required for the development of low-noise quantum optoelectronic sources. Our semiclassical simulation framework is based on the Lindblad equation for the electron dynamics, coupled with Maxwell's equations for the optical propagation in the laser waveguide. Fluctuations arising from interactions of the optical field and quantum system with their reservoirs are treated within the quantum Langevin theory. Here, the fluctuations are included by adding stochastic c-number terms to the Maxwell-density matrix equations. The implementation in the mbsolve dynamic simulation framework is publicly available.

Translated: 2023-10-25 17:29:20 Published: 2023-10-24

# What's Left? Concept Grounding with Logic-Enhanced Foundation Models

What's Left? Concept Grounding with Logic-Enhanced Foundation Models ( http://arxiv.org/abs/2310.16035v1 )

License: see the linked page

Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

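(Reference translation) Recent works such as VisProg and ViperGPT cleverly compose foundation models for visual reasoning, using large language models (LLMs) to generate programs that pre-trained vision-language models can execute. However, they operate in limited domains such as 2D images and do not fully exploit the generalization of language: an abstract concept like "left" can also be grounded in 3D, temporal, and action data, as in moving to your left. This limited generalization stems from the inability of these inference-only methods to learn or adapt pre-trained models to a new domain. We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains using a differentiable, domain-independent, first-order logic-based program executor. LEFT has an LLM interpreter that outputs a program expressed in a general, logic-based reasoning language shared across all domains and tasks. LEFT's executor then runs the program with trainable, domain-specific grounding modules. We show that LEFT flexibly learns concepts in four domains: 2D images, 3D scenes, human motions, and robotic manipulation. It exhibits strong reasoning ability on a wide variety of tasks, including complex tasks not seen during training, and can easily be applied to new domains.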

Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning, using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, not fully exploiting the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal, and action data, as in moving to your left. This limited generalization stems from these inference-only methods' inability to learn or adapt pre-trained models to a new domain. We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor. LEFT has an LLM interpreter that outputs a program represented in a general, logic-based reasoning language, which is shared across all domains and tasks. LEFT's executor then executes the program with trainable domain-specific grounding modules. We show that LEFT flexibly learns concepts in four domains: 2D images, 3D scenes, human motions, and robotic manipulation. It exhibits strong reasoning ability in a wide variety of tasks, including those that are complex and not seen during training, and can be easily applied to new domains.

Translated: 2023-10-25 17:29:02 Published: 2023-10-24

# Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models

Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models ( http://arxiv.org/abs/2310.16033v1 )

License: see the linked page

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

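(Reference translation) Multimodal large language models (LLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA), a fundamental task that affects various downstream applications and domains. Given the great potential for broad use of these models, it is important to investigate their limitations in handling different image and question properties. In this work, we investigate whether multimodal LLMs can perceive small details in images as well as large ones. In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining by up to 46% with size. Furthermore, we show that this effect is causal by observing that human visual cropping can significantly mitigate the models' sensitivity to size. Inspired by the usefulness of human cropping, we then propose three automatic visual cropping methods as inference-time mechanisms to improve the zero-shot performance of multimodal LLMs. We study their effectiveness on four popular VQA datasets and on a subset of the VQAv2 dataset tailored towards fine visual details. Our findings suggest that multimodal LLMs should be used with caution in detail-sensitive VQA applications, and that visual cropping is a promising direction for improving their zero-shot performance. Our code and data are publicly available.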

Multimodal Large Language Models (LLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA) -- a fundamental task affecting various downstream applications and domains. Given the great potential for the broad use of these models, it is important to investigate their limitations in dealing with different image and question properties. In this work, we investigate whether multimodal LLMs can perceive small details as well as large details in images. In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining up to $46\%$ with size. Furthermore, we show that this effect is causal by observing that human visual cropping can significantly mitigate their sensitivity to size. Inspired by the usefulness of human cropping, we then propose three automatic visual cropping methods as inference time mechanisms to improve the zero-shot performance of multimodal LLMs. We study their effectiveness on four popular VQA datasets, and a subset of the VQAv2 dataset tailored towards fine visual details. Our findings suggest that multimodal LLMs should be used with caution in detail-sensitive VQA applications, and that visual cropping is a promising direction to improve their zero-shot performance. Our code and data are publicly available.
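The cropping idea in this abstract can be illustrated with a minimal sketch. It assumes a bounding box for the visual subject is already known; the abstract does not say how the three automatic methods find it, so `crop_around_box` and `pad_ratio` are hypothetical names used only for illustration. The image is a plain list of pixel rows.

```python
def crop_around_box(image, box, pad_ratio=0.5):
    """Crop `image` (a list of pixel rows) around `box` = (x0, y0, x1, y1).

    The box is enlarged on every side by `pad_ratio` of its width/height
    and clipped to the image bounds, so some context around the subject
    is kept. The padding rule is an assumption, not taken from the paper.
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    pad_x = int((x1 - x0) * pad_ratio)
    pad_y = int((y1 - y0) * pad_ratio)
    left, top = max(0, x0 - pad_x), max(0, y0 - pad_y)
    right, bottom = min(width, x1 + pad_x), min(height, y1 + pad_y)
    return [row[left:right] for row in image[top:bottom]]


# A 640x480 dummy image and a small 40x30 subject box.
image = [[0] * 640 for _ in range(480)]
crop = crop_around_box(image, box=(300, 200, 340, 230))
```

The cropped region would then be passed to the model in place of, or alongside, the full image, so that a small subject occupies more of the input.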

Translated: 2023-10-25 17:28:40 Published: 2023-10-24

# The Physics of (good) LDPC Codes I. Gauging and dualities

The Physics of (good) LDPC Codes I. Gauging and dualities ( http://arxiv.org/abs/2310.16032v1 )

License: see the linked page

Tibor Rakovszky and Vedika Khemani

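(Reference translation) Low-depth parity check (LDPC) codes are a paradigm of error correction that allows spatially non-local interactions between (qu)bits while still requiring each (qu)bit to interact with only finitely many others. On expander graphs, they can give rise to "good codes" that combine a finite encoding rate with optimal scaling of the code distance, which governs the code's robustness against noise. Such codes have attracted much recent attention because of two breakthrough developments: the construction of good quantum LDPC codes and of good locally testable classical LDPC codes, using similar methods. Here we explore these developments through a physics lens, establishing connections between LDPC codes and ordered phases of matter defined for systems with non-local interactions and on non-Euclidean geometries. We generalize the physical notions of Kramers-Wannier (KW) dualities and gauge theories to this context, using chain complexes as an organizing principle. We discuss gauge theories based on generic classical LDPC codes and distinguish two classes according to whether their excitations are point-like or extended. For the former, we describe KW dualities analogous to that of the 1D Ising model and describe the role played by "boundary conditions". For the latter, we generalize Wegner's duality to obtain generic quantum LDPC codes within the deconfined phase of a Z_2 gauge theory. We show that all known examples of good quantum LDPC codes are obtained by gauging locally testable classical codes. We also construct cluster Hamiltonians from arbitrary classical codes, related to the Higgs phase of the gauge theory, and formulate generalizations of the Kennedy-Tasaki duality transformation. We use the chain-complex language to discuss edge modes and non-local order parameters for these models, initiating the study of SPT phases in non-Euclidean geometries.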

Low-depth parity check (LDPC) codes are a paradigm of error correction that allow for spatially non-local interactions between (qu)bits, while still enforcing that each (qu)bit interacts only with finitely many others. On expander graphs, they can give rise to ``good codes'' that combine a finite encoding rate with an optimal scaling of the code distance, which governs the code's robustness against noise. Such codes have garnered much recent attention due to two breakthrough developments: the construction of good quantum LDPC codes and good locally testable classical LDPC codes, using similar methods. Here we explore these developments from a physics lens, establishing connections between LDPC codes and ordered phases of matter defined for systems with non-local interactions and on non-Euclidean geometries. We generalize the physical notions of Kramers-Wannier (KW) dualities and gauge theories to this context, using the notion of chain complexes as an organizing principle. We discuss gauge theories based on generic classical LDPC codes and make a distinction between two classes, based on whether their excitations are point-like or extended. For the former, we describe KW dualities, analogous to the 1D Ising model and describe the role played by ``boundary conditions''. For the latter we generalize Wegner's duality to obtain generic quantum LDPC codes within the deconfined phase of a Z_2 gauge theory. We show that all known examples of good quantum LDPC codes are obtained by gauging locally testable classical codes. We also construct cluster Hamiltonians from arbitrary classical codes, related to the Higgs phase of the gauge theory, and formulate generalizations of the Kennedy-Tasaki duality transformation. We use the chain complex language to discuss edge modes and non-local order parameters for these models, initiating the study of SPT phases in non-Euclidean geometries.

Translated: 2023-10-25 17:28:17 Published: 2023-10-24

# Entanglement from superradiance and rotating quantum fluids of light

Entanglement from superradiance and rotating quantum fluids of light ( http://arxiv.org/abs/2310.16031v1 )

License: see the linked page

Adrià Delhom, Killian Guerrero, Paula Calizaya, Kévin Falque, Anthony J. Brady, Ivan Agullo and Maxime J. Jacquet

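(Reference translation) The amplification of radiation by superradiance is a universal phenomenon observed in numerous physical systems. We demonstrate that superradiant scattering generates entanglement for a variety of input states, including coherent states, thereby revealing the inherently quantum nature of this phenomenon. To put these concepts to the test, we propose a novel approach for creating horizonless ergoregions that are nonetheless dynamically stable thanks to the dissipative dynamics of a polaritonic fluid of light. We numerically simulate the system to demonstrate the creation of a stable ergoregion and experimentally realize a comparable configuration. We then investigate rotational superradiance in this system, focusing on entanglement generation and on the possibilities for enhancing it with current techniques. Our methods permit the study of quantum emission by rotational superradiance with the input state controlled at will.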

The amplification of radiation by superradiance is a universal phenomenon observed in numerous physical systems. We demonstrate that superradiant scattering generates entanglement for different input states, including coherent states, thereby revealing the inherently quantum nature of this phenomenon. To put these concepts to the test, we propose a novel approach to create horizonless ergoregions, which are nonetheless dynamically stable thanks to the dissipative dynamics of a polaritonic fluid of light. We numerically simulate the system to demonstrate the creation of a stable ergoregion, and experimentally realize a comparable configuration. Subsequently, we investigate rotational superradiance within this system, with a primary focus on entanglement generation and the possibilities for its enhancement using current techniques. Our methods permit the investigation of quantum emission by rotational superradiance by controlling the input state at will.

Translated: 2023-10-25 17:27:43 Published: 2023-10-24

# Finetuning Offline World Models in the Real World

Finetuning Offline World Models in the Real World ( http://arxiv.org/abs/2310.16029v1 )

License: see the linked page

Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan, Xiaolong Wang

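(Reference translation) Reinforcement learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. Model-based RL algorithms (world models) improve data efficiency to some extent, but they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference and limits its applicability to new tasks. In this work, we seek the best of both worlds: we consider the problem of pretraining a world model on offline data collected on a real robot and then finetuning it on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose regularizing the planner at test time by balancing estimated returns against (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that it enables few-shot finetuning to both seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .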

Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .
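One common way to balance estimated returns against epistemic uncertainty is to penalize the disagreement of an ensemble. The sketch below assumes that form; the abstract does not give the exact regularizer, so `regularized_score`, `penalty`, and the ensemble representation are illustrative assumptions rather than the authors' formulation.

```python
from statistics import mean, pstdev


def regularized_score(ensemble_returns, penalty=1.0):
    """Score a candidate plan from the returns predicted by several models.

    The mean is the estimated return; the population standard deviation
    across the ensemble stands in for epistemic uncertainty (assumption).
    """
    return mean(ensemble_returns) - penalty * pstdev(ensemble_returns)


# Two hypothetical candidate plans, each scored by three ensemble members.
candidates = {
    "cautious": [1.0, 1.0, 1.0],  # lower return, models agree
    "risky": [2.0, 0.0, 4.0],     # higher mean return, models disagree
}
best = max(candidates, key=lambda name: regularized_score(candidates[name]))
```

With `penalty=1.0` the planner prefers the plan the models agree on; setting `penalty=0.0` recovers plain return maximization.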

Translated: 2023-10-25 17:27:29 Published: 2023-10-24

# What Algorithms can Transformers Learn? A Study in Length Generalization

What Algorithms can Transformers Learn? A Study in Length Generalization ( http://arxiv.org/abs/2310.16028v1 )

License: see the linked page

Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

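(Reference translation) Large language models exhibit surprising emergent generalization properties, yet they also struggle with many simple reasoning tasks such as arithmetic and parity. This raises the question of whether and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. We propose a unifying framework for understanding when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021), a programming language designed for the computational model of a Transformer, and introduce the RASP-Generalization Conjecture: Transformers tend to length-generalize on a task if the task can be solved by a short RASP program that works for all input lengths. This simple conjecture captures most known instances of length generalization on algorithmic tasks remarkably well. Moreover, we use our insights to drastically improve generalization performance on traditionally hard tasks such as parity and addition. On the theoretical side, we give a simple example in which the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, whereas our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.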

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.

Translated: 2023-10-25 17:27:09 Published: 2023-10-24

# TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories

TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories ( http://arxiv.org/abs/2310.16027v1 )

License: see the linked page

Travers Rhodes and Daniel D. Lee

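(Reference translation) Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes it challenging to learn efficient representations of those trajectories. For many problems, such as handwriting or quasistatic dexterous manipulation, the exact timing of a trajectory should be factored out from its spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variation in small handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches, and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.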

Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes learning efficient representations of those trajectories challenging. For many problems, such as for handwriting or for quasistatic dexterous manipulation, the exact timings of the trajectories should be factored from their spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variations in small handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
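For readers unfamiliar with Dynamic Time Warping, the following sketch shows the classic dynamic-programming form on 1-D sequences. It is the textbook, non-differentiable algorithm, not the differentiable variant that TimewarpVAE incorporates, and `dtw_distance` is an illustrative name.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences using |x - y| as step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same path traversed at half speed aligns at zero cost.
same_path = dtw_distance([0, 1, 2], [0, 0, 1, 1, 2, 2])
```

The example shows why DTW separates timing from spatial shape: two trajectories that differ only in speed have zero warped distance.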

Translated: 2023-10-25 17:26:42 Published: 2023-10-24

# ConvBKI: Real-Time Probabilistic Semantic Mapping Network with Quantifiable Uncertainty

ConvBKI: Real-Time Probabilistic Semantic Mapping Network with Quantifiable Uncertainty ( http://arxiv.org/abs/2310.16020v1 )

License: see the linked page

Joey Wilson, Yuewei Fu, Joshua Friesen, Parker Ewen, Andrew Capodieci, Paramsothy Jayakumar, Kira Barton, and Maani Ghaffari

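(Reference translation) In this paper, we develop a modular neural network for real-time semantic mapping in uncertain environments that explicitly updates per-voxel probability distributions within a neural network layer. Our approach combines the reliability of classical probabilistic algorithms with the performance and efficiency of modern neural networks. Although robotic perception is often divided between modern differentiable methods and classical explicit methods, a union of both is necessary for real-time and trustworthy performance. We introduce a novel Convolutional Bayesian Kernel Inference (ConvBKI) layer that incorporates semantic segmentation predictions online into a 3D map through a depthwise convolution layer by leveraging conjugate priors. We compare ConvBKI against state-of-the-art deep learning approaches and probabilistic mapping algorithms to evaluate reliability and performance. We also create a Robot Operating System (ROS) package for ConvBKI and test it on real-world, perceptually challenging off-road driving data.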

In this paper, we develop a modular neural network for real-time semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer. Our approach combines the reliability of classical probabilistic algorithms with the performance and efficiency of modern neural networks. Although robotic perception is often divided between modern differentiable methods and classical explicit methods, a union of both is necessary for real-time and trustworthy performance. We introduce a novel Convolutional Bayesian Kernel Inference (ConvBKI) layer which incorporates semantic segmentation predictions online into a 3D map through a depthwise convolution layer by leveraging conjugate priors. We compare ConvBKI against state-of-the-art deep learning approaches and probabilistic algorithms for mapping to evaluate reliability and performance. We also create a Robot Operating System (ROS) package of ConvBKI and test it on real-world perceptually challenging off-road driving data.
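The phrase "leveraging conjugate priors" can be made concrete with the standard Dirichlet-categorical update for a single voxel. This is a generic sketch under that assumption: in ConvBKI the class evidence is weighted by a learned depthwise kernel over neighboring points, which is omitted here, and all function names are illustrative.

```python
def update_concentration(alpha, class_counts):
    """Conjugate update: add observed (possibly weighted) class counts to alpha."""
    return [a + c for a, c in zip(alpha, class_counts)]


def expected_probabilities(alpha):
    """Posterior mean of the class probabilities for one voxel."""
    total = sum(alpha)
    return [a / total for a in alpha]


def marginal_variances(alpha):
    """Per-class variance of a Dirichlet; a simple uncertainty measure."""
    total = sum(alpha)
    return [(a / total) * (1 - a / total) / (total + 1) for a in alpha]


# One voxel, three semantic classes, uniform prior, two hits for class 0.
prior = [1.0, 1.0, 1.0]
posterior = update_concentration(prior, [2.0, 0.0, 0.0])
probs = expected_probabilities(posterior)
```

Because the update is a simple addition, it can run online as new segmentation predictions arrive, and the variance shrinks as evidence accumulates, which is what makes the uncertainty quantifiable.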

Translated: 2023-10-25 17:26:26 Published: 2023-10-24

# Optical payload design for downlink quantum key distribution and keyless communication using CubeSats

Optical payload design for downlink quantum key distribution and keyless communication using CubeSats ( http://arxiv.org/abs/2310.16017v1 )

License: see the linked page

Pedro Mendes, Gonçalo Teixeira, David Pinho, Rui Rocha, Paulo André, Manfred Niehus, Ricardo Faleiro, Davide Rusca, Emmanuel Zambrini Cruzeiro

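(Reference translation) Quantum key distribution is costly and currently offers low performance in space applications. Other, more recent protocols could offer a practical solution to this problem. In this work, we propose a preliminary optical payload design that uses commercial off-the-shelf components for a quantum communication downlink in a 3U CubeSat. We show that this quantum state emitter can establish two types of quantum communication between the satellite and the ground station: quantum key distribution and quantum keyless private communication. Numerical simulations show the feasibility of the scheme for both protocols, as well as their performance. For the simplified BB84 protocol, a maximum secret key rate of about 80 kHz and a minimum QBER slightly above 0.07% are found at the zenith, while quantum keyless private communication achieves a private rate of 700 MHz. This design serves as a platform for implementing novel quantum communication protocols that can improve the performance of quantum communications in space.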

Quantum key distribution is costly and, at the moment, offers low performance in space applications. Other more recent protocols could offer a potential practical solution to this problem. In this work, a preliminary optical payload design using commercial off-the-shelf elements for a quantum communication downlink in a 3U CubeSat is proposed. It is shown that this quantum state emitter allows the establishment of two types of quantum communication between the satellite and the ground station: quantum key distribution and quantum keyless private communication. Numerical simulations are provided that show the feasibility of the scheme for both protocols as well as their performance. For the simplified BB84, a maximum secret key rate of about 80 kHz and minimum QBER of slightly more than $0.07\%$ is found, at the zenith, while for quantum private keyless communication, a 700 MHz private rate is achieved. This design serves as a platform for the implementation of novel quantum communication protocols that can improve the performance of quantum communications in space.
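To show how a QBER figure relates to extractable key, the sketch below computes the QBER from compared bits and the textbook asymptotic key fraction 1 - 2h(Q) for standard BB84. This is an assumption for illustration only: the paper analyzes a simplified BB84 variant with its own finite-size and channel model, which this formula does not reproduce.

```python
import math


def qber(sent_bits, received_bits):
    """Quantum bit error rate: fraction of sifted bits that disagree."""
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


def binary_entropy(p):
    """Shannon binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def asymptotic_key_fraction(q):
    """Textbook asymptotic BB84 secret fraction, floored at zero."""
    return max(0.0, 1.0 - 2.0 * binary_entropy(q))


sample_qber = qber([0, 1, 1, 0], [0, 1, 0, 0])
fraction_at_low_error = asymptotic_key_fraction(0.0007)  # QBER of 0.07%
```

Under this idealized formula, a QBER of 0.07% costs only a small part of the sifted key, while error rates above roughly 11% leave no secret key at all.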

Translated: 2023-10-25 17:26:08 Published: 2023-10-24

# Synthetic Data as Validation

Synthetic Data as Validation ( http://arxiv.org/abs/2310.16052v1 )

License: see the linked page

Qixin Hu, Alan Yuille, Zongwei Zhou

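(Reference translation) This study uses synthetic data as a validation set to reduce overfitting and ease the selection of the best model in AI development. While synthetic data have been used to augment the training set, we find that they can also significantly diversify the validation set, offering marked advantages in domains like healthcare, where data are typically limited, sensitive, and drawn from out-domain sources (such as hospitals). We illustrate the effectiveness of synthetic data for early cancer detection in computed tomography (CT) volumes, where synthetic tumors are generated and superimposed onto healthy organs to create an extensive dataset for rigorous validation. Using synthetic data for validation improves AI robustness on both in-domain and out-domain test sets. Furthermore, we establish a new continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors. An AI model trained and validated on dynamically expanding synthetic data consistently outperforms models trained and validated exclusively on real-world data. Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset. Importantly, the gain is particularly significant for very tiny liver tumors (radius < 5mm) in CT volumes, with sensitivity improving from 33.1% to 55.4% on an in-domain dataset and from 33.9% to 52.3% on an out-domain dataset, supporting the efficacy for early cancer detection. The application of synthetic data, from both training and validation perspectives, is a promising avenue for enhancing AI robustness when dealing with data from varying domains.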

This study leverages synthetic data as a validation set to reduce overfitting and ease the selection of the best model in AI development. While synthetic data have been used for augmenting the training set, we find that synthetic data can also significantly diversify the validation set, offering marked advantages in domains like healthcare, where data are typically limited, sensitive, and from out-domain sources (i.e., hospitals). In this study, we illustrate the effectiveness of synthetic data for early cancer detection in computed tomography (CT) volumes, where synthetic tumors are generated and superimposed onto healthy organs, thereby creating an extensive dataset for rigorous validation. Using synthetic data as validation can improve AI robustness in both in-domain and out-domain test sets. Furthermore, we establish a new continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors. The AI model trained and validated in dynamically expanding synthetic data can consistently outperform models trained and validated exclusively on real-world data. Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) when evaluated on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset. Importantly, the performance gain is particularly significant in identifying very tiny liver tumors (radius < 5mm) in CT volumes, with Sensitivity improving from 33.1% to 55.4% on an in-domain dataset and 33.9% to 52.3% on an out-domain dataset, justifying the efficacy in early detection of cancer. The application of synthetic data, from both training and validation perspectives, underlines a promising avenue to enhance AI robustness when dealing with data from varying domains.
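The DSC figures quoted above refer to the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), between a predicted and a reference mask. A minimal sketch on flattened binary masks follows; the convention of returning 1.0 when both masks are empty is an assumption, since evaluation protocols differ on that case.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two flattened binary masks."""
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Both masks empty: treated as perfect agreement (assumed convention).
    return 1.0 if size == 0 else 2.0 * overlap / size


# Prediction marks two voxels, reference marks one; one voxel overlaps.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

For tiny tumors a single missed or extra voxel changes the score substantially, which is one reason the abstract also reports sensitivity separately.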

Translated: 2023-10-25 17:20:56 Published: 2023-10-24

# MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning ( http://arxiv.org/abs/2310.16049v1 )

License: see the linked page

Zayne Sprague, Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett

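(Reference translation) Although large language models (LLMs) equipped with techniques like chain-of-thought prompting have demonstrated impressive capabilities, they still fall short in their ability to reason robustly in complex settings. Evaluating LLM reasoning is challenging, however, because system capabilities continue to grow while benchmark datasets for tasks like logical deduction have remained static. We introduce MuSR, a dataset for evaluating language models on multistep soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm, which enables the construction of complex reasoning instances that challenge GPT-4 (e.g., murder mysteries roughly 1000 words in length) and can be scaled further as more capable LLMs are released. Second, the dataset instances are free-text narratives corresponding to real-world domains of reasoning; this makes the dataset much more challenging than other synthetically crafted benchmarks while remaining realistic and tractable for human annotators to solve with high accuracy. We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.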

While large language models (LLMs) equipped with techniques like chain-of-thought prompting have demonstrated impressive capabilities, they still fall short in their ability to reason robustly in complex settings. However, evaluating LLM reasoning is challenging because system capabilities continue to grow while benchmark datasets for tasks like logical deduction have remained static. We introduce MuSR, a dataset for evaluating language models on multistep soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm, enabling the construction of complex reasoning instances that challenge GPT-4 (e.g., murder mysteries roughly 1000 words in length) and which can be scaled further as more capable LLMs are released. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning; this makes it simultaneously much more challenging than other synthetically-crafted benchmarks while remaining realistic and tractable for human annotators to solve with high accuracy. We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.

Translated: 2023-10-25 17:20:17 Published: 2023-10-24

# AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications ( http://arxiv.org/abs/2310.16048v1 )

License: see the linked page

Abhilash Mishra

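(Reference translation) Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs, and all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and to consider the policy challenges that arise from them. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol for universally aligning AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user; that is, universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need to mandate transparent voting rules to hold model builders accountable; second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.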

Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs; all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and consider policy challenges arising from these limitations. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user i.e., universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need for mandating transparent voting rules to hold model builders accountable. Second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.
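The kind of difficulty that social-choice impossibility results formalize can be seen in the classic Condorcet paradox, sketched below with three hypothetical annotators ranking three candidate outputs. This is a standard textbook illustration, not the paper's proof, and `pairwise_majority` is an illustrative name.

```python
from itertools import combinations


def pairwise_majority(rankings):
    """Return the set of (winner, loser) pairs decided by strict majority."""
    options = rankings[0]
    wins = set()
    for x, y in combinations(options, 2):
        prefer_x = sum(r.index(x) < r.index(y) for r in rankings)
        if prefer_x * 2 > len(rankings):
            wins.add((x, y))
        elif prefer_x * 2 < len(rankings):
            wins.add((y, x))
    return wins


# Each list is one annotator's ranking of outputs A, B, C (best first).
rankings = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
wins = pairwise_majority(rankings)
```

Here majorities prefer A over B, B over C, and C over A, so pairwise majority voting yields a cycle and no consistent collective ranking, even though every individual ranking is perfectly coherent.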

Translated: 2023-10-25 17:19:51 Published: 2023-10-24

# From Posterior Sampling to Meaningful Diversity in Image Restoration

From Posterior Sampling to Meaningful Diversity in Image Restoration ( http://arxiv.org/abs/2310.16047v1 )

License: see the linked page

Noa Cohen, Hila Manor, Yuval Bahat, Tomer Michaeli

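(Reference translation) Image restoration problems are typically ill-posed in the sense that each degraded image can be restored in infinitely many valid ways. To accommodate this, many works generate a diverse set of outputs by attempting to sample randomly from the posterior distribution of natural images given the degraded input. Here we argue that this strategy is commonly of limited practical value because of the heavy tail of the posterior distribution. Consider, for example, inpainting a missing region of the sky in an image. Since the missing region most likely contains nothing but clouds, any set of samples from the posterior would be dominated entirely by (practically identical) completions of sky. Arguably, however, presenting users with only one clear-sky completion, along with several alternative solutions such as airships, birds, and balloons, would better outline the set of possibilities. In this paper, we initiate the study of meaningfully diverse image restoration. We explore several post-processing approaches that can be combined with any diverse image restoration method to yield semantically meaningful diversity. Moreover, we propose a practical approach that allows diffusion-based image restoration methods to generate meaningfully diverse outputs while incurring only negligible computational overhead. We conduct extensive user studies to analyze the proposed techniques and find the strategy of reducing similarity between outputs to be significantly favorable over posterior sampling. Code and examples are available at https://noa-cohen.github.io/MeaningfulDiversityInIR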

Image restoration problems are typically ill-posed in the sense that each degraded image can be restored in infinitely many valid ways. To accommodate this, many works generate a diverse set of outputs by attempting to randomly sample from the posterior distribution of natural images given the degraded input. Here we argue that this strategy is commonly of limited practical value because of the heavy tail of the posterior distribution. Consider for example inpainting a missing region of the sky in an image. Since there is a high probability that the missing region contains no object but clouds, any set of samples from the posterior would be entirely dominated by (practically identical) completions of sky. However, arguably, presenting users with only one clear sky completion, along with several alternative solutions such as airships, birds, and balloons, would better outline the set of possibilities. In this paper, we initiate the study of meaningfully diverse image restoration. We explore several post-processing approaches that can be combined with any diverse image restoration method to yield semantically meaningful diversity. Moreover, we propose a practical approach for allowing diffusion based image restoration methods to generate meaningfully diverse outputs, while incurring only negligible computational overhead. We conduct extensive user studies to analyze the proposed techniques, and find the strategy of reducing similarity between outputs to be significantly favorable over posterior sampling. Code and examples are available at https://noa-cohen.github.io/MeaningfulDiversityInIR
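One simple post-processing strategy for reducing similarity between outputs is greedy farthest-point selection over feature vectors of the restored samples. The sketch below assumes that strategy and a Euclidean feature distance; the abstract does not specify which post-processing methods the paper evaluates, so this is an illustration rather than the authors' procedure.

```python
import math


def farthest_point_subset(features, k):
    """Greedily pick up to k indices so each new pick is far from all earlier ones."""
    chosen = [0]  # start from the first sample (arbitrary choice)
    dist = [math.dist(f, features[0]) for f in features]
    while len(chosen) < min(k, len(features)):
        nxt = max(range(len(features)), key=dist.__getitem__)
        chosen.append(nxt)
        # Keep, for every sample, its distance to the nearest chosen sample.
        dist = [min(d, math.dist(f, features[nxt])) for d, f in zip(dist, features)]
    return chosen


# Three near-identical "sky" samples and two outliers in a toy 2-D feature space.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (-4.0, 3.0)]
picked = farthest_point_subset(samples, k=3)
```

The selection keeps one representative of the dominant cluster plus the two outliers, mirroring the abstract's example of one clear-sky completion shown alongside rarer alternatives.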

Translated: 2023-10-25 17:19:29 Published: 2023-10-24

# A Unified, Scalable Framework for Neural Population Decoding

A Unified, Scalable Framework for Neural Population Decoding ( http://arxiv.org/abs/2310.16046v1 )

License: see the linked page

Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael J. Mendelson, Blake Richards, Matthew G. Perich, Guillaume Lajoie, Eva L. Dyer

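(Reference translation) Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, integrating many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Using this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different recording sessions, over 27,373 neural units, and over 100 hours of recordings. On a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.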

Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
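Tokenizing individual spikes can be pictured as turning each spike into a (unit, time) event instead of binning activity into fixed windows. The sketch below assumes that representation; `SpikeToken`, its fields, and the input dictionary layout are hypothetical, and the learned unit embeddings and PerceiverIO stages described in the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeToken:
    unit_id: int   # which recorded unit fired (hypothetical field name)
    time_s: float  # spike time in seconds, kept at full resolution


def tokenize_spikes(spike_times_by_unit):
    """Flatten per-unit spike trains into one time-ordered token sequence."""
    tokens = [
        SpikeToken(unit, t)
        for unit, times in spike_times_by_unit.items()
        for t in times
    ]
    return sorted(tokens, key=lambda tok: (tok.time_s, tok.unit_id))


# Two units from one hypothetical session; unit ids need not be contiguous.
recording = {0: [0.010, 0.250], 7: [0.120]}
tokens = tokenize_spikes(recording)
```

Because each token carries its own unit identity and timestamp, sessions with different numbers of units produce sequences of the same kind, which is what allows recordings to be pooled into one model.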

Translated: 2023-10-25 17:19:05 Published: 2023-10-24

# Woodpecker: Hallucination Correction for Multimodal Large Language Models

Woodpecker: Hallucination Correction for Multimodal Large Language Models ( http://arxiv.org/abs/2310.16045v1 )

License: see the linked page

Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun and Enhong Chen

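(Reference translation) Hallucination is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs); it refers to the phenomenon in which the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we take a different path and introduce a training-free method named Woodpecker. Like a woodpecker healing trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs while remaining interpretable, since the intermediate outputs of the five stages can be inspected. We evaluate Woodpecker both quantitatively and qualitatively and show the great potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.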

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.

翻訳日:2023-10-25 17:18:41 公開日:2023-10-24

# Stanford-ORB: 現実世界の3Dオブジェクトの逆レンダリングベンチマーク

Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark ( http://arxiv.org/abs/2310.16044v1 )

ライセンス: Link先を確認

Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu

(参考訳) 実世界の3Dオブジェクト逆レンダリングベンチマークであるStanford-ORBを紹介する。逆レンダリングの最近の進歩により、3Dコンテンツ生成における現実世界の幅広いアプリケーションが実現され、研究や商用のユースケースからコンシューマーデバイスへと急速に広がっている。結果は改善を続けているが、様々な逆レンダリング手法の性能を定量的に評価・比較できる実世界のベンチマークは存在しない。既存の実世界データセットは、通常、オブジェクトの形状とマルチビュー画像のみで構成されており、マテリアル復元やオブジェクトのリライティングの品質を評価するには不十分である。マテリアルや照明を復元できる手法は、定量的評価に合成データを用いることが多いが、それでは複雑な実環境への一般化は保証されない。我々は、真値(ground-truth)の3Dスキャン、マルチビュー画像、環境照明とともに、様々な自然シーンで撮影した実世界オブジェクトの新しいデータセットを導入する。このデータセットを用いて、実環境シーンからのオブジェクト逆レンダリングタスクに対する初の包括的な実世界評価ベンチマークを構築し、様々な既存手法の性能を比較する。すべてのデータ、コード、モデルは https://stanfordorb.github.io/ でアクセスできる。

We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the performance of various inverse rendering methods. Existing real-world datasets typically only consist of the shape and multi-view images of objects, which are not sufficient for evaluating the quality of material recovery and object relighting. Methods capable of recovering material and lighting often resort to synthetic data for quantitative evaluation, which on the other hand does not guarantee generalization to complex real-world environments. We introduce a new dataset of real-world objects captured under a variety of natural scenes with ground-truth 3D scans, multi-view images, and environment lighting. Using this dataset, we establish the first comprehensive real-world evaluation benchmark for object inverse rendering tasks from in-the-wild scenes, and compare the performance of various existing methods. All data, code, and models can be accessed at https://stanfordorb.github.io/.

翻訳日:2023-10-25 17:18:23 公開日:2023-10-24

# 弱相互作用フェルミ気体における弾道的輸送から拡散的輸送へのクロスオーバー

The ballistic to diffusive crossover in a weakly-interacting Fermi gas ( http://arxiv.org/abs/2310.16043v1 )

ライセンス: Link先を確認

Jerome Lloyd, Tibor Rakovszky, Frank Pollmann, Curt von Keyserlingk

(参考訳) 有限温度の相互作用するフェルミオン系では、乱れがなくても電荷とエネルギーが拡散すると期待されており、相互作用は、初期時刻における準粒子のコヒーレントで弾道的な伝搬から、後期時刻における非コヒーレントな拡散的振る舞いへのクロスオーバーを引き起こす。関連するクロスオーバーの時間スケールと輸送係数は、いずれも相互作用の強さによって制御される。本研究では、DAOE(Dissipation-assisted Operator Evolution)をフェルミオンに適応させることにより、幅広い相互作用強度に適用可能な、このような系を高温でシミュレーションする数値計算法を開発する。我々のフェルミオンDAOEは、高次の$n$点関数の情報を体系的に捨てることで厳密なダイナミクスを近似し、相互作用のないダイナミクスを厳密に捉えるように調整されているため、弱相互作用の問題に対する良い出発点となる。この手法を弱く相互作用するフェルミオンの微視的モデルに適用することにより、弾道的輸送から拡散的輸送へのクロスオーバーが時刻$t_D\sim1/\Delta^{2}$で起こり、拡散定数も同様に$D \sim 1/\Delta^2$でスケールすることを数値的に示す。ここで$\Delta$は相互作用の強さである。このスケーリングを演算子拡がり(operator spreading)の描像におけるフェルミの黄金則の計算で裏付け、$t_D$をフェルミオン間の散乱時間および単粒子グリーン関数の寿命と解釈する。

Charge and energy are expected to diffuse in interacting systems of fermions at finite temperatures, even in the absence of disorder, with the interactions inducing a crossover from the coherent and ballistic streaming of quasi-particles at early times, to incoherent diffusive behavior at late times. The relevant crossover timescales and the transport coefficients are both controlled by the strength of interactions. In this work we develop a numerical method to simulate such systems at high temperatures, applicable in a wide range of interaction strengths, by adapting Dissipation-assisted Operator Evolution (DAOE) to fermions. Our fermion DAOE, which approximates the exact dynamics by systematically discarding information from high $n$-point functions, is tailored to capture non-interacting dynamics exactly, thus providing a good starting point for the weakly interacting problem. Applying our method to a microscopic model of weakly interacting fermions, we numerically demonstrate that the crossover from ballistic to diffusive transport happens at a time $t_D\sim1/\Delta^{2}$ and that the diffusion constant similarly scales as $D \sim 1/\Delta^2$, where $\Delta$ is the interaction strength. We substantiate this scaling with a Fermi's golden rule calculation in the operator spreading picture, interpreting $t_D$ as the fermion-fermion scattering time and lifetime of the single-particle Green's function.
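
The quoted scalings can be motivated by a textbook kinetic estimate. The lines below are a heuristic sketch under stated assumptions, not the authors' derivation: the quasi-particle velocity $v$, the interaction operator $\hat V$, and the density of final states $\rho$ are symbols introduced here for illustration, and $\hbar = 1$.

```latex
\begin{align}
  \frac{1}{t_D} \;\sim\; \Gamma
    &\;\sim\; 2\pi \,\bigl|\langle f \,|\, \Delta \hat V \,|\, i \rangle\bigr|^{2}\, \rho
    \;\propto\; \Delta^{2}, \\
  D &\;\sim\; v^{2}\, t_D \;\propto\; \frac{1}{\Delta^{2}} .
\end{align}
```

The first line is Fermi's golden rule with a matrix element linear in $\Delta$; the second is the usual random-walk estimate in which a particle moves ballistically for a time $t_D$ between scattering events.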

翻訳日:2023-10-25 17:18:04 公開日:2023-10-24

# WebWISE: 大規模言語モデルによるWebインタフェース制御とシークエンシャル探索

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models ( http://arxiv.org/abs/2310.16042v1 )

ライセンス: Link先を確認

Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem, Heng Ji

(参考訳) 本稿では,Large Language Model (LLM) を用いて,クリック,スクロール,テキスト入力操作によるWebソフトウェアタスクを自動実行する方法について検討する。強化学習(RL)や模倣学習といった従来のアプローチは,訓練の効率が悪く,タスク固有である。提案手法では,フィルタリングしたDocument Object Model (DOM) 要素を観測として使用し,現在の観測に基づいて小さなプログラムを逐次生成しながら,タスクをステップバイステップで実行する。インコンテキスト学習を用い,手動で用意した1つの例,またはゼロショット試行の成功に基づいて自動生成した例を利用する。提案手法をMiniWob++ベンチマークで評価する。インコンテキストの例が1つだけでも,WebWISEは多くのデモや試行を必要とする他の手法と同等以上の性能を達成する。

The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials.

翻訳日:2023-10-25 17:17:33 公開日:2023-10-24

# インストラクションと抽出:オンデマンド情報抽出のための命令チューニング

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction ( http://arxiv.org/abs/2310.16040v1 )

ライセンス: Link先を確認

Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han

(参考訳) 命令フォロー機能を備えた大規模言語モデルは、より広いグループユーザへの扉を開く。しかし、情報抽出 - 自然言語処理の古典的なタスク - に関して言えば、ほとんどのタスク固有のシステムは、非専門家ユーザのためのロングテールアドホック抽出ユースケースとうまく連携できない。そこで本研究では,実世界の利用者の要求に応えるために,オンデマンド情報抽出と呼ばれる新しいパラダイムを提案する。本課題は,テキストから所望の内容を抽出し,構造化表形式で提示するための指示に従うことである。テーブルヘッダは、ユーザが指定するか、モデルによってコンテキスト的に推論できる。この領域での研究を容易にするために,自動生成したトレーニングデータと人間によるテストセットの両方を包含するinstructieというベンチマークを示す。 InstructIE 上に構築した On-Demand Information Extractor, ODIE をさらに発展させる。ベンチマークの総合的な評価から,ODIEが既存のオープンソースモデルと同等のサイズで大幅に上回っていることが明らかとなった。私たちのコードとデータセットはhttps://github.com/yzjiao/On-Demand-IEで公開されています。

Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.

翻訳日:2023-10-25 17:17:19 公開日:2023-10-24

# 原型軌道の解釈可能なテキスト分類

Interpretable Text Classification Via Prototype Trajectories ( http://arxiv.org/abs/2007.01777v4 )

ライセンス: Link先を確認

Dat Hong, Stephen S. Baek, Tong Wang

(参考訳) 本稿では,プロトタイプ軌跡という新しい概念に基づく,ProtoryNetと呼ばれるテキスト分類のための新しい解釈可能なディープニューラルネットワークを提案する。現代言語学におけるプロトタイプ理論に動機づけられたProtoryNetは,テキスト列の各文に最も類似したプロトタイプを見つけ,各文と対応するアクティブなプロトタイプとの近さをRNNバックボーンに入力することで予測を行う。RNNバックボーンはプロトタイプの時間的パターンを捉える。これをプロトタイプ軌跡と呼ぶ。プロトタイプ軌跡は,人間がテキストを分析する方法に似た形で,RNNモデルの推論過程を直感的かつきめ細かく解釈することを可能にする。また,解釈性を高めるために,モデルが使用するプロトタイプの総数を削減するプロトタイプ刈り込み手順も設計した。複数の公開データセットでの実験によると,ProtoryNetはベースラインのプロトタイプベースのディープニューラルネットよりも正確であり,最先端のブラックボックスモデルとの性能差を縮める。さらに,プロトタイプ刈り込み後のProtoryNetモデルは,すべてのデータセットで20個前後かそれ以下のプロトタイプしか必要とせず,解釈性に大きく寄与する。加えて,ProtoryNetが他のプロトタイプベースの手法よりも直感的で理解しengineしやすいと人間のユーザが感じることを示す調査結果を報告する。

We propose a novel interpretable deep neural network for text classification, called ProtoryNet, based on a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each sentence to the corresponding active prototype. The RNN backbone then captures the temporal pattern of the prototypes, which we refer to as prototype trajectories. Prototype trajectories enable intuitive and fine-grained interpretation of the reasoning process of the RNN model, in resemblance to how humans analyze texts. We also design a prototype pruning procedure to reduce the total number of prototypes used by the model for better interpretability. Experiments on multiple public data sets show that ProtoryNet is more accurate than the baseline prototype-based deep neural net and reduces the performance gap compared to state-of-the-art black-box models. In addition, after prototype pruning, the resulting ProtoryNet models only need less than or around 20 prototypes for all datasets, which significantly benefits interpretability. Furthermore, we report a survey result indicating that human users find ProtoryNet more intuitive and easier to understand than other prototype-based methods.
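
The nearest-prototype step described above can be illustrated with a small sketch. It is not the ProtoryNet implementation: the function name `prototype_trajectory`, the Euclidean distance, and the similarity `exp(-d^2)` are assumptions chosen for illustration, and the RNN that would consume the resulting sequence is omitted.

```python
import math

def prototype_trajectory(sentence_vecs, prototypes):
    """For each sentence vector, return (index of nearest prototype, similarity).

    The similarity exp(-distance^2) is an illustrative choice, not taken
    from the paper; it equals 1.0 when a sentence coincides with a prototype.
    """
    trajectory = []
    for s in sentence_vecs:
        dists = [math.dist(s, p) for p in prototypes]
        k = min(range(len(prototypes)), key=dists.__getitem__)
        trajectory.append((k, math.exp(-dists[k] ** 2)))
    return trajectory

# Hypothetical 2-D sentence embeddings and two prototypes.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
sentences = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]
traj = prototype_trajectory(sentences, prototypes)
```

The sequence of (prototype, similarity) pairs is the kind of per-sentence signal a recurrent backbone could read as a trajectory over prototypes.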

翻訳日:2023-10-25 15:25:36 公開日:2023-10-24

# 深層生成モデルのコンテンツベース検索

Content-Based Search for Deep Generative Models ( http://arxiv.org/abs/2210.03116v3 )

ライセンス: Link先を確認

Daohan Lu, Sheng-Yu Wang, Nupur Kumari, Rohan Agarwal, Mia Tang, David Bau, Jun-Yan Zhu

翻訳日:2023-10-25 14:47:07 公開日:2023-10-24

# 事前学習型大規模言語モデルにおけるスパースフィードフォワードネットワークの統一化に向けて

Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model ( http://arxiv.org/abs/2305.13999v2 )

ライセンス: Link先を確認

Zeyu Leo Liu, Tim Dettmers, Xi Victoria Lin, Veselin Stoyanov, Xian Li

(参考訳) Mixture-of-Experts (MoE) のような大規模でスパースなフィードフォワード層 (S-FFN) は、大規模言語モデルの事前学習において Transformer のモデルサイズをスケールアップするのに有効であることが示されている。S-FFNは、入力に応じてFFNパラメータの一部のみを活性化することで、訓練と推論のコスト(FLOPs)を固定したまま汎化性能を向上させる。本研究では、S-FFNの2つの主要な設計選択、すなわちメモリブロック(エキスパート)のサイズとメモリブロックの選択方法を、スパースニューラルメモリという一般的な概念枠組みのもとで分析した。この統一的な枠組みを用いて、言語モデリングのためのいくつかのS-FFNアーキテクチャを比較し、それらの相対的な有効性と効率に関する知見を提供する。その結果、ブロックを平均集約した隠れ状態によって選択する、より単純な選択方法Avg-Kを見いだした。この方法は、Switch Transformer (Fedus et al., 2021) やHashLayer (Roller et al., 2021) などの既存のMoEアーキテクチャと比較して、言語モデルの事前学習においてより低いパープレキシティを達成する。

Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformer model size for pretraining large language models. By activating only part of the FFN parameters conditioned on the input, S-FFN improves generalization performance while keeping training and inference costs (in FLOPs) fixed. In this work, we analyzed two major design choices of S-FFN, the memory block (a.k.a. expert) size and the memory block selection method, under a general conceptual framework of sparse neural memory. Using this unified framework, we compare several S-FFN architectures for language modeling and provide insights into their relative efficacy and efficiency. We found a simpler selection method, Avg-K, which selects blocks through their mean aggregated hidden states and achieves lower perplexity in language model pretraining than existing MoE architectures including Switch Transformer (Fedus et al., 2021) and HashLayer (Roller et al., 2021).
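
One plausible reading of block selection by mean aggregated hidden states is sketched below. It is an illustration under assumptions, not the paper's code: each memory block is summarized by the mean of its key vectors, blocks are scored by a dot product with the input hidden state, and the names `avg_k_select` and `top_b` are hypothetical.

```python
def avg_k_select(hidden, blocks, top_b):
    """Return (indices of the top_b best-scoring blocks, all block scores).

    hidden: list of floats (input hidden state).
    blocks: list of blocks, each a list of key vectors.
    A block's score is the dot product of `hidden` with the mean key vector.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def mean(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)]

    scores = [dot(hidden, mean(keys)) for keys in blocks]
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_b], scores

# Three hypothetical blocks of two 2-D key vectors each.
hidden = [1.0, 0.0]
blocks = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [-0.5, 0.5]],
]
selected, scores = avg_k_select(hidden, blocks, top_b=2)
```

Only the selected blocks would then be evaluated, so the cost per token depends on `top_b` rather than on the total number of blocks.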

翻訳日:2023-10-25 12:07:05 公開日:2023-10-24

# 知識に基づく視覚質問応答のための簡単なベースライン

A Simple Baseline for Knowledge-Based Visual Question Answering ( http://arxiv.org/abs/2310.13570v2 )

ライセンス: Link先を確認

Alexandros Xenos, Themos Stafylakis, Ioannis Patras and Georgios Tzimiropoulos

(参考訳) 本稿では,知識に基づく視覚質問応答(KB-VQA)の問題を扱う。最近の研究は,外部知識を必要とする質問に効果的に答えるために,(外部データベースを通した)明示的な知識と(LLMを通した)暗黙的な知識の両方を取り入れることの重要性を強調している。このようなアプローチに共通する制限は,比較的複雑なパイプラインで構成されており,GPT-3 APIへのアクセスに大きく依存することが多い点である。本稿の主な貢献は,はるかにシンプルで容易に再現可能なパイプラインを提案することである。これは,端的に言えば,質問に関連する情報を含むキャプションを文脈情報としてLLaMA (1および2) にプロンプトを与える,効率的なインコンテキスト学習に基づいている。最近のアプローチとは対照的に,本手法はトレーニング不要で,外部データベースやAPIへのアクセスを必要とせず,それでいてOK-VQAおよびA-OK-VQAデータセット上で最先端の精度を達成する。最後に,本手法の重要な側面を理解するため,いくつかのアブレーション研究を行った。コードは https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA で公開されている。

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heavily rely on accessing GPT-3 API. Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) using question-informative captions as contextual information. Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and yet achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets. Finally, we perform several ablation studies to understand important aspects of our method. Our code is publicly available at https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA

翻訳日:2023-10-25 11:34:54 公開日:2023-10-24

# 言語モデルにおける迎合性(sycophancy)の理解に向けて

Towards Understanding Sycophancy in Language Models ( http://arxiv.org/abs/2310.13548v2 )

ライセンス: Link先を確認

Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

(参考訳) 人間のフィードバックからの強化学習(RLHF)は、高品質なAIアシスタントを訓練するための一般的な手法である。しかし、RLHFは、真実に即した応答よりもユーザの信念に合致する応答をモデルに促す可能性もある。この振る舞いは迎合(sycophancy)として知られる。本研究では、RLHFで訓練されたモデルにおける迎合の広がりと、人間の選好判断がその原因であるかどうかを調査する。まず、5つの最先端AIアシスタントが、4種類の自由形式テキスト生成タスクにわたって一貫して迎合的な振る舞いを示すことを実証する。人間の選好がRLHFモデルに広く観察されるこの振る舞いを駆動しているかどうかを理解するために、既存の人間の選好データを分析する。その結果、応答がユーザの見解と一致する場合、その応答はより好まれやすいことが分かった。さらに、人間も選好モデル(PM)も、無視できない割合で、正しい応答よりも説得力のある文章で書かれた迎合的な応答を好む。PMに対してモデル出力を最適化すると、迎合と引き換えに真実性が犠牲になることもある。全体として、我々の結果は、迎合がRLHFモデルの一般的な振る舞いであり、迎合的な応答を好む人間の選好判断が原因の一部である可能性が高いことを示している。

Reinforcement learning from human feedback (RLHF) is a popular technique for training high-quality AI assistants. However, RLHF may also encourage model responses that match user beliefs over truthful responses, a behavior known as sycophancy. We investigate the prevalence of sycophancy in RLHF-trained models and whether human preference judgements are responsible. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophantic behavior across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior of RLHF models, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of RLHF models, likely driven in part by human preference judgements favoring sycophantic responses.

翻訳日:2023-10-25 11:34:36 公開日:2023-10-24

# 言語モデルにノイズの多い翻訳データのクリーニングを任せる

Ask Language Model to Clean Your Noisy Translation Data ( http://arxiv.org/abs/2310.13469v3 )

ライセンス: Link先を確認

Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz

(参考訳) Transformerモデルはニューラル機械翻訳(NMT)において顕著な性能を示してきた。しかし、ノイズの多い入力に対する脆弱性は、ノイズの多い入力からクリーンな出力を生成することが重要となる実用上の実装において、大きな課題となっている。MTNTデータセットは、ノイズの多い入力に対するNMTモデルの頑健性を評価するベンチマークとして広く利用されている。しかし、ソース文とターゲット文の両方にノイズが存在するため、その有用性は限られている。この制限に対処するため、我々はMTNTのターゲット文からノイズを除去することに焦点を当て、ノイズ評価のベンチマークとしてより適したものにする。大規模言語モデル(LLM)の能力を活用したところ、ノイズ除去において優れた能力が観察された。例えば、LLMは絵文字の意味を考慮しながら絵文字を削除できる。さらに、LLMがスラング、専門用語、卑語を効果的に言い換えられることを示す。こうして得られたC-MTNTと呼ばれるデータセットは、元の文の意味的な整合性を保ちながら、ターゲット文のノイズが大幅に少ない。人間による評価とGPT-4による評価のいずれも、LLMがこのタスクをうまくこなすという一貫した結論を示している。最後に、C-MTNTを用いた実験により、NMTモデルの頑健性を評価する上での有効性が示され、データクリーニングにおける高度な言語モデルの可能性と、貴重な資源としてのC-MTNTが浮き彫りになった。

Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is limited due to the presence of noise in both the source and target sentences. To address this limitation, we focus on cleaning the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. Leveraging the capabilities of large language models (LLMs), we observe their impressive abilities in noise removal. For example, they can remove emojis while considering their semantic meaning. Additionally, we show that LLM can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations also lead to a consistent conclusion that LLM performs well on this task. Lastly, experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and emphasizing C-MTNT as a valuable resource.

翻訳日:2023-10-25 11:34:18 公開日:2023-10-24

# ChatGPTをテーマ分析に役立てる - 準備はいいか?

Harnessing ChatGPT for thematic analysis: Are we ready? ( http://arxiv.org/abs/2310.14545v2 )

ライセンス: Link先を確認

V Vien Lee, Stephanie C. C. van der Lubbe, Lay Hoon Goh and Jose M. Valderas

(参考訳) ChatGPTは高度な自然言語処理ツールであり、医学研究の様々な分野で応用が広がっている。データのパターンを特定し解釈するための質的研究手法であるテーマ分析(thematic analysis)は、この技術の恩恵を受けうる応用のひとつである。本稿では、医学的文脈におけるテーマ分析の3つの中核的な段階、すなわち 1) トランスクリプトの直接コーディング、2) 事前に定義されたコードリストからのテーマ生成、3) 原稿に掲載する引用の前処理、におけるChatGPTの利用を検討する。さらに、訓練目的で利用できるインタビュートランスクリプトをChatGPTで生成する可能性についても検討する。これらの役割でChatGPTを使用する際の強みと限界を評価し、人間の介入が依然として必要な領域を明らかにする。全体として、ChatGPTは分析における有用なツールとして機能し、テーマ分析の効率を高め、質的データへのさらなる洞察を提供しうると論じる。

ChatGPT is an advanced natural language processing tool with growing applications across various disciplines in medical research. Thematic analysis, a qualitative research method to identify and interpret patterns in data, is one application that stands to benefit from this technology. This viewpoint explores the utilization of ChatGPT in three core phases of thematic analysis within a medical context: 1) direct coding of transcripts, 2) generating themes from a predefined list of codes, and 3) preprocessing quotes for manuscript inclusion. Additionally, we explore the potential of ChatGPT to generate interview transcripts, which may be used for training purposes. We assess the strengths and limitations of using ChatGPT in these roles, highlighting areas where human intervention remains necessary. Overall, we argue that ChatGPT can function as a valuable tool during analysis, enhancing the efficiency of the thematic analysis and offering additional insights into the qualitative data.

翻訳日:2023-10-25 11:26:02 公開日:2023-10-24

# コンピュータ支援翻訳における単語レベル自動補完の再考

Rethinking Word-Level Auto-Completion in Computer-Aided Translation ( http://arxiv.org/abs/2310.14523v2 )

ライセンス: Link先を確認

Xingyu Chen and Lemao Liu and Guoping Huang and Zhirui Zhang and Mingming Yang and Shuming Shi and Rui Wang

(参考訳) Word-Level Auto-Completion (WLAC) はコンピュータ支援翻訳において重要な役割を果たす。これは,人間の翻訳者に単語レベルの自動補完候補を提供することを目的としている。従来の研究は主に複雑なモデルアーキテクチャの設計に重点を置いてきたが,本論文は「どのような単語が良い自動補完なのか」という基本的な問いを再考することで,異なる視点をとる。この問いに答えるために測定可能な基準を導入し,既存のWLACモデルがしばしばこの基準を満たさないことを明らかにする。この観察に基づき,基準の遵守を促すことでWLACの性能を向上させる効果的な手法を提案する。特に,提案手法は汎用的であり,様々なエンコーダベースのアーキテクチャに適用可能である。広範な実験により,提案手法が,WMT2022のWLAC共有タスクに提出された最高性能のシステムを,大幅に小さいモデルサイズで上回ることを示す。

Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation. It aims at providing word-level auto-completion suggestions for human translators. While previous studies have primarily focused on designing complex model architectures, this paper takes a different perspective by rethinking the fundamental question: what kind of words are good auto-completions? We introduce a measurable criterion to answer this question and discover that existing WLAC models often fail to meet this criterion. Building upon this observation, we propose an effective approach to enhance WLAC performance by promoting adherence to the criterion. Notably, the proposed approach is general and can be applied to various encoder-based architectures. Through extensive experiments, we demonstrate that our approach outperforms the top-performing system submitted to the WLAC shared tasks in WMT2022, while utilizing significantly smaller model sizes.

翻訳日:2023-10-25 11:25:46 公開日:2023-10-24

# CorefPrompt: イベントタイプと引数の互換性の測定によるプロンプトベースのイベント共参照解決

CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities ( http://arxiv.org/abs/2310.14512v2 )

ライセンス: Link先を確認

Sheng Xu, Peifeng Li, Qiaoming Zhu

(参考訳) イベント共参照解決(Event Coreference Resolution, ECR)は、同じ実世界のイベントを参照するイベント言及をクラスタにまとめることを目的とする。先行研究の多くは「まずエンコードし、次にスコアリングする」枠組みを採用しており、共参照の判断がイベントのエンコーディングに依存してしまう。さらに、現在の手法は、共参照関係にあるイベントは同じイベントタイプを持つべきである、といった人間がまとめたECRルールをモデルの指針として活用することが難しい。これら2つの問題に対処するため、我々はECRを穴埋め(cloze)形式のMLM(masked language model)タスクに変換するプロンプトベースのアプローチ、CorefPromptを提案する。これにより、完全に共有されたコンテキストを持つ単一のテンプレート内で、イベントのモデリングと共参照の識別を同時に行える。さらに、イベントタイプ互換性と引数互換性という2つの補助的なプロンプトタスクを導入し、ECRの推論過程を明示的に示すことで、モデルの最終的な予測を助ける。実験の結果、CorefPromptは最先端(SOTA)のベンチマークで良好な性能を示すことがわかった。

Event coreference resolution (ECR) aims to group event mentions referring to the same real-world event into clusters. Most previous studies adopt the "encoding first, then scoring" framework, making the coreference judgment rely on event encoding. Furthermore, current methods struggle to leverage human-summarized ECR rules, e.g., coreferential events should have the same event type, to guide the model. To address these two issues, we propose a prompt-based approach, CorefPrompt, to transform ECR into a cloze-style MLM (masked language model) task. This allows for simultaneous event modeling and coreference discrimination within a single template, with a fully shared context. In addition, we introduce two auxiliary prompt tasks, event-type compatibility and argument compatibility, to explicitly demonstrate the reasoning process of ECR, which helps the model make final predictions. Experimental results show that our method CorefPrompt performs well in a state-of-the-art (SOTA) benchmark.

翻訳日:2023-10-25 11:25:31 公開日:2023-10-24

# 最適制御における学習問題のための陰的微分の再検討

Revisiting Implicit Differentiation for Learning Problems in Optimal Control ( http://arxiv.org/abs/2310.14468v2 )

ライセンス: Link先を確認

Ming Xu, Timothy Molloy, Stephen Gould

(参考訳) 本稿では,非凸な制約付き離散時間最適制御(COC)問題から生じる最適軌道を,陰関数定理(IFT)を用いて微分する新しい手法を提案する。従来の研究は,軌道の微分を得るために微分Karush-Kuhn-Tucker(KKT)系を解いており,補助的な線形二次レギュレータ(LQR)問題を解くことでこれを効率的に実現している。対照的に,本手法は(微分)KKT系のラグランジュ乗数の項に変数消去を適用して得られる行列方程式を直接評価する。得られた方程式内の項の構造を適切に考慮することにより,軌道微分の計算が時間ステップ数に対して線形にスケールすることを示す。さらに,本手法は,並列化の容易さ,モデルサイズに対するスケーラビリティの大幅な向上,ベクトル・ヤコビアン積の直接計算,および先行研究と比べた数値安定性の向上を可能にする。追加の貢献として,先行研究を統一的に整理し,IFTを用いた軌道微分の計算は時間ステップ数の2乗でスケールするという主張に対処する。本手法を,合成ベンチマークと,6自由度の機動クアッドロータや6自由度のロケット動力着陸を含む,実演からの学習に関する4つの難しいベンチマークで評価する。

This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations which arise from applying variable elimination on the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the trajectory derivatives scale linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on both a synthetic benchmark and four challenging learning-from-demonstration benchmarks, including a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing.
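
For orientation, the generic implicit function theorem step that this family of methods builds on can be written as follows. This is the standard textbook statement, not the paper's structured matrix equations; $F$ (the stacked optimality conditions), $z$ (primal and dual variables), and $\theta$ (problem parameters) are notation introduced here.

```latex
\begin{align}
  F\bigl(z^\star(\theta), \theta\bigr) = 0
  \;\;\Longrightarrow\;\;
  \frac{\partial F}{\partial z}\,\frac{\mathrm{d} z^\star}{\mathrm{d}\theta}
    + \frac{\partial F}{\partial \theta} = 0
  \;\;\Longrightarrow\;\;
  \frac{\mathrm{d} z^\star}{\mathrm{d}\theta}
    = -\left(\frac{\partial F}{\partial z}\right)^{-1}
       \frac{\partial F}{\partial \theta},
\end{align}
```

valid wherever $\partial F / \partial z$ is invertible. How cheaply the linear solve on the right can be carried out depends on how much of the block structure of $\partial F / \partial z$ is exploited.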

翻訳日:2023-10-25 11:25:12 公開日:2023-10-24

# 深層アクティブラーニングとその医用画像解析への応用に関する総合的調査

A comprehensive survey on deep active learning and its applications in medical image analysis ( http://arxiv.org/abs/2310.14230v2 )

ライセンス: Link先を確認

Haoran Wang, Qiuye Jin, Shiman Li, Siyu Liu, Manning Wang, Zhijian Song

(参考訳) 深層学習は医用画像解析で広く成功し、大規模の専門家による医用画像データセットの需要が高まっている。しかし、医用画像に注釈をつける高コストは、この分野での深層学習の発展を著しく妨げている。アノテーションのコストを削減するため、アクティブラーニングはアノテーションの最も有用なサンプルを選択し、できるだけ少ないラベル付きサンプルで高性能モデルを訓練することを目的としている。本稿では,情報化とサンプリング戦略の評価を含む,アクティブラーニングの中核的手法について概説する。今回我々は,アクティブラーニングとラベル効率の高い他の手法,例えば半教師付き学習,自己教師付き学習などとの統合に関する詳細な概要を初めて提示する。また、医用画像分析に特化しているアクティブな学習作業についても強調する。最後に、我々は、アクティブラーニングとその医療画像解析への応用の今後の動向と課題について展望を提供する。

Deep learning has achieved widespread success in medical image analysis, leading to an increasing demand for large-scale expert-annotated medical image datasets. Yet, the high cost of annotating medical images severely hampers the development of deep learning in this field. To reduce annotation costs, active learning aims to select the most informative samples for annotation and train high-performance models with as few labeled samples as possible. In this survey, we review the core methods of active learning, including the evaluation of informativeness and sampling strategy. For the first time, we provide a detailed summary of the integration of active learning with other label-efficient techniques, such as semi-supervised, self-supervised learning, and so on. Additionally, we also highlight active learning works that are specifically tailored to medical image analysis. In the end, we offer our perspectives on the future trends and challenges of active learning and its applications in medical image analysis.

翻訳日:2023-10-25 11:24:47 公開日:2023-10-24

# 最適化アルゴリズムの自動微分のランダム化フォワードモード

Randomized Forward Mode of Automatic Differentiation for Optimization Algorithms ( http://arxiv.org/abs/2310.14168v2 )

ライセンス: Link先を確認

Khemraj Shukla and Yeonjong Shin

(参考訳) ニューラルネットワークにおける誤差逆伝播法は、自動微分の基本要素であるリバースモード微分を利用する。これはベクトル・ヤコビアン積(VJP)とも呼ばれ、微分幾何学の文脈では引き戻し(pull-back)として知られる。ニューラルネットワークのパラメータ更新は勾配降下法で行われるため、勾配の計算は重要である。本研究では、フォワードモード自動微分、すなわちヤコビアン・ベクトル積(JVP)を用いて効率的に計算される損失関数の方向微分によって、ニューラルネットワークのパラメータを更新する汎用的なランダム化手法を提案する。これらのJVPは、ベルヌーイ分布、正規分布、ウィグナー分布、ラプラス分布、一様分布といった様々な確率分布からサンプリングされたランダムな方向に沿って計算される。勾配の計算はニューラルネットワークの順伝播中に行われる。また、提案手法の厳密な解析を行って収束率を示すとともに、科学的機械学習、特に物理インフォームドニューラルネットワークとDeep Operator Networksに適用した計算実験を示す。

Backpropagation within neural networks leverages a fundamental element of automatic differentiation, which is referred to as reverse-mode differentiation, or the vector-Jacobian product (VJP), or, in the context of differential geometry, the pull-back process. The computation of the gradient is important because neural network parameters are updated using the gradient descent method. In this study, we present a generic randomized method, which updates the parameters of neural networks by using directional derivatives of loss functions computed efficiently by forward-mode AD, or the Jacobian-vector product (JVP). These JVPs are computed along random directions sampled from different probability distributions, e.g., Bernoulli, Normal, Wigner, Laplace, and Uniform distributions. The computation of the gradient is performed during the forward pass of the neural network. We also present a rigorous analysis of the presented methods, providing the rate of convergence, along with computational experiments in scientific machine learning, in particular physics-informed neural networks and Deep Operator Networks.
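
The update rule described above can be sketched with a few lines of plain Python. This is a toy illustration under assumptions, not the authors' code: the `Dual` class is a minimal hand-written forward-mode AD, the quadratic `loss` stands in for a network loss, directions are drawn from a standard normal distribution, and the step size and iteration count are arbitrary.

```python
import random

class Dual:
    """Dual number val + tan*eps supporting +, -, * for forward-mode AD."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)
    __rmul__ = __mul__

def loss(x, target):
    """Toy loss sum_i (x_i - target_i)^2; accepts floats or Duals in x."""
    total = 0.0
    for xi, ti in zip(x, target):
        d = xi - ti
        total = total + d * d
    return total

def jvp(f, x, v):
    """Directional derivative of f at x along v, from one forward pass."""
    return f([Dual(xi, vi) for xi, vi in zip(x, v)]).tan

def forward_gradient_descent(target, x0, lr=0.05, steps=500, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        v = [rng.gauss(0.0, 1.0) for _ in x]          # random direction
        d = jvp(lambda z: loss(z, target), x, v)      # scalar (grad . v)
        x = [xi - lr * d * vi for xi, vi in zip(x, v)]  # step along v
    return x

target = [1.0, -2.0, 0.5]
x_start = [0.0, 0.0, 0.0]
x_final = forward_gradient_descent(target, x_start)
```

For standard normal directions the vector `(grad . v) * v` equals the true gradient in expectation, which is why no backward pass is needed; the price is extra variance in each step.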

翻訳日:2023-10-25 11:24:31 公開日:2023-10-24

# 勾配フィードバックを伴う強単調・exp-concaveゲームにおける適応的で二重に最適なno-regret学習

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback ( http://arxiv.org/abs/2310.14085v2 )

ライセンス: Link先を確認

Michael I. Jordan, Tianyi Lin and Zhengyuan Zhou

(参考訳) オンライン勾配降下(OGD)は、強凸性または強単調性の仮定の下で二重に最適であることがよく知られている。すなわち、(1)単一エージェントの設定では、強凸コスト関数に対して$\Theta(\log T)$の最適なリグレットを達成し、(2)強単調ゲームのマルチエージェント設定では、各エージェントがOGDを用いると、結合行動が一意なナッシュ均衡に$\Theta(\frac{1}{T})$の最適な速度でラストイテレート収束する。これらの有限時間保証はOGDの利点を示しているが、OGDには強凸性/強単調性のパラメータを知る必要があるという欠点がある。本稿では、これらのパラメータの事前知識を必要としない完全適応型のOGDアルゴリズムAdaOGDを設計する。単一エージェントの設定では、このアルゴリズムは強凸性の下で$O(\log^2(T))$のリグレットを達成し、これは対数因子を除いて最適である。さらに、強単調ゲームで各エージェントがAdaOGDを用いると、結合行動は$O(\frac{\log^3 T}{T})$の速度で一意なナッシュ均衡にラストイテレートの意味で収束し、これも対数因子を除いて最適である。我々のアルゴリズムを、古典的な新聞売り子(newsvendor)問題の学習版で例示する。この問題では、販売機会の損失のために(ノイズを含む)勾配フィードバックしか観測できない。我々の結果から、単一小売業者と複数小売業者の両方の設定に対して、実行可能でほぼ最適な初のアルゴリズムが直ちに得られる。さらに、オンラインニュートンステップ(ONS)アルゴリズムを用いて、exp-concaveなコスト関数とゲームというより一般的な設定に結果を拡張する。

Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2(T))$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
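
To make the dependence on the strong convexity parameter concrete, here is a minimal sketch of classical OGD with step size $1/(\mu t)$. It is the standard non-adaptive baseline, not AdaOGD; the function name `ogd_strongly_convex`, the one-dimensional losses $f_t(x) = \tfrac{1}{2}(x - z_t)^2$ (so $\mu = 1$), and the observations are assumptions made for illustration.

```python
def ogd_strongly_convex(grad, x0, mu, rounds):
    """Classical OGD: x_{t+1} = x_t - grad_t(x_t) / (mu * t).

    The step size uses mu, the strong convexity parameter, which is
    exactly the quantity a non-adaptive method must know in advance.
    """
    x = x0
    iterates = [x]
    for t in range(1, rounds + 1):
        x = x - grad(t, x) / (mu * t)
        iterates.append(x)
    return iterates

# Hypothetical observations; round t reveals the loss 0.5 * (x - z_t)^2.
observations = [2.0, 4.0, 9.0, 1.0]

def grad(t, x):
    """Gradient of f_t(x) = 0.5 * (x - z_t)^2 at x."""
    return x - observations[t - 1]

xs = ogd_strongly_convex(grad, x0=0.0, mu=1.0, rounds=len(observations))
```

With these losses and $\mu = 1$ each iterate is the running mean of the observations seen so far, so the final iterate equals the best fixed decision in hindsight. Passing a wrong value of `mu` changes the step sizes and breaks this property, which is the gap an adaptive method aims to close.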

翻訳日:2023-10-25 11:24:13 公開日:2023-10-24

# Contrast Everything: 医療時系列のための階層的コントラスト学習フレームワーク

Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series ( http://arxiv.org/abs/2310.14017v2 )

ライセンス: Link先を確認

Yihe Wang, Yu Han, Haishuai Wang, Xiang Zhang

(参考訳) コントラスト表現学習は、労働集約的でドメイン固有かつ希少な専門家アノテーションへの依存を軽減するため、医療時系列分析において重要である。しかし、既存のコントラスト学習手法は主に単一のデータレベルに焦点を当てており、医療時系列の複雑な性質を十分に活用できていない。この問題に対処するために,医療時系列に内在するすべてのレベルでのデータ一貫性を活用する,革新的な階層型フレームワークCOMETを提案する。綿密に設計された我々のモデルは、観測、サンプル、トライアル、患者という4つのレベルからデータ一貫性を体系的に捉える。複数のレベルでコントラスト損失を構成することで、包括的なデータ一貫性を保持する効果的な表現を学習し、自己教師ありの形で情報利用を最大化できる。実験は、難易度の高い患者非依存の設定で行う。心筋梗塞の心電図(ECG)信号と、アルツハイマー病およびパーキンソン病の脳波(EEG)信号を含む3種類の多様なデータセットを用いて、COMETを6つのベースラインと比較した。その結果、COMETはすべてのベースラインを一貫して上回り、特にラベル付きデータの割合が10%および1%の設定では、すべてのデータセットでその傾向が顕著であった。これらの結果は,医療時系列におけるコントラスト表現学習技術の進歩に対する我々のフレームワークの大きな影響を裏付けるものである。ソースコードは https://github.com/DL4mHealth/COMET で入手できる。

Contrastive representation learning is crucial in medical time series analysis as it alleviates dependency on labor-intensive, domain-specific, and scarce expert annotations. However, existing contrastive learning methods primarily focus on a single data level, which fails to fully exploit the intricate nature of medical time series. To address this issue, we present COMET, an innovative hierarchical framework that leverages data consistencies at all inherent levels in medical time series. Our meticulously designed model systematically captures data consistency from four potential levels: observation, sample, trial, and patient levels. By developing contrastive loss at multiple levels, we can learn effective representations that preserve comprehensive data consistency, maximizing information utilization in a self-supervised manner. We conduct experiments in the challenging patient-independent setting. We compare COMET against six baselines using three diverse datasets, which include ECG signals for myocardial infarction and EEG signals for Alzheimer's and Parkinson's diseases. The results demonstrate that COMET consistently outperforms all baselines, particularly in setups with 10% and 1% labeled data fractions across all datasets. These results underscore the significant impact of our framework in advancing contrastive representation learning techniques for medical time series. The source code is available at https://github.com/DL4mHealth/COMET.

翻訳日:2023-10-25 11:23:27 公開日:2023-10-24

# コントラスト選好学習:RLなしで人間のフィードバックから学ぶ

Contrastive Preference Learning: Learning from Human Feedback without RL ( http://arxiv.org/abs/2310.13639v2 )

ライセンス: Link先を確認

Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

(参考訳) Reinforcement Learning from Human Feedback (RLHF) は、モデルを人間の意図に合わせるための一般的なパラダイムとして登場した。通常、RLHFアルゴリズムは2つの段階で動作する。第一に、人間の選好を使って報酬関数を学習し、第二に、学習した報酬を強化学習(RL)によって最適化することでモデルを調整する。このパラダイムは、人間の選好が報酬に従って分布すると仮定するが、最近の研究は、選好はむしろユーザーの最適ポリシーの下でのリグレットに従うことを示唆している。したがって、フィードバックから報酬関数を学習することは、人間の選好に関する誤った仮定に基づくだけでなく、RL段階でのポリシー勾配やブートストラップに起因する扱いにくい最適化上の課題にもつながる。これらの最適化上の課題のため、現代のRLHF法は、文脈付きバンディット設定(例えば、大規模言語モデル)に限定するか、観測の次元を制限している(例えば、状態ベースのロボティクス)。我々は,人間の選好のリグレットに基づくモデルを用いて,人間のフィードバックから行動を最適化する新しいアルゴリズム群を導入することで,これらの制限を克服する。最大エントロピーの原理を用いて、報酬関数を学習せずに選好から最適なポリシーを学習するアルゴリズムであるContrastive Preference Learning (CPL) を導出し、RLの必要性を回避する。CPLは完全にオフポリシーであり、単純なコントラスト目的関数のみを使用し、任意のMDPに適用できる。これにより、CPLは従来の方法よりも単純でありながら、高次元かつ逐次的なRLHF問題にエレガントにスケールできる。

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.

翻訳日:2023-10-25 11:23:07 公開日:2023-10-24

# 適応的スタイル手法による指紋生体検知(liveness detection)の汎化性能向上

Boosting Generalization with Adaptive Style Techniques for Fingerprint Liveness Detection ( http://arxiv.org/abs/2310.13573v3 )

ライセンス: Link先を確認

Kexin Zhu, Bo Lin, Yang Qiu, Adam Yule, Yao Tang, Jiajun Liang

(参考訳) 本稿では,LivDet 2023 Fingerprint Representation Challengeで1位を獲得した,高性能な指紋生体検知(liveness)特徴抽出技術を紹介する。さらに,94.68%の精度を持つ実用的な指紋認識システムを開発し,LivDet 2023 Liveness Detection in Actionで2位を獲得した。各種手法,特にスタイル変換(style transfer)を調査することにより,訓練データが限られている場合の精度と汎化性能の向上を実証する。その結果,提案手法はLivDet 2023 Challengesで最先端の性能を達成した。

We introduce a high-performance fingerprint liveness feature extraction technique that secured first place in LivDet 2023 Fingerprint Representation Challenge. Additionally, we developed a practical fingerprint recognition system with 94.68% accuracy, earning second place in LivDet 2023 Liveness Detection in Action. By investigating various methods, particularly style transfer, we demonstrate improvements in accuracy and generalization when faced with limited training data. As a result, our approach achieved state-of-the-art performance in LivDet 2023 Challenges.

翻訳日:2023-10-25 11:22:38 公開日:2023-10-24

# Ghost on the Shell:一般的な3D形状の表現力豊かな表現

Ghost on the Shell: An Expressive Representation of General 3D Shapes ( http://arxiv.org/abs/2310.15168v2 )

ライセンス: Link先を確認

Zhen Liu, Yao Feng, Yuliang Xiu, Weiyang Liu, Liam Paull, Michael J. Black, Bernhard Sch\"olkopf

(参考訳) フォトリアリスティックな仮想世界の構築には、幅広い物体に対する3D表面形状の正確なモデリングが必要である。この目的において、メッシュは 1)現実的な材質と照明による高速な物理ベースレンダリングを可能にし、 2)物理シミュレーションをサポートし、 3)現代のグラフィックスパイプラインにおいてメモリ効率がよい、という点で魅力的である。しかし、3D形状の再構成と統計的モデリングに関する最近の研究は、メッシュをトポロジー的に柔軟性に欠けると批判している。幅広い物体形状を捉えるためには、いかなる3D表現も、中身の詰まった水密な形状と、薄く開いた表面の両方をモデル化できなければならない。最近の研究は前者に焦点を当てており、開いた表面を再構成する既存手法は、材質と照明を伴う高速な再構成や無条件生成モデリングをサポートしていない。開いた表面は水密な表面に浮かぶ島とみなせるという観察に着想を得て、水密テンプレート上に多様体符号付き距離場を定義することで開いた表面をパラメータ化する。このパラメータ化により、任意のトポロジーの水密メッシュと非水密メッシュの両方をパラメータ化する、グリッドベースかつ微分可能な表現をさらに開発する。 Ghost-on-the-Shell (G-Shell) と呼ぶこの新しい表現は,多視点画像からの微分可能なラスタライズベース再構成と,非水密メッシュの生成モデリングという,2つの重要な応用を可能にする。我々は,G-Shellが非水密メッシュの再構成および生成タスクにおいて最先端の性能を達成すると同時に,水密メッシュに対しても効果的に動作することを実証的に示す。

The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.

翻訳日:2023-10-25 11:14:23 公開日:2023-10-24

# ニューラルネットワークにおけるメタ(文脈外)学習

Meta- (out-of-context) learning in neural networks ( http://arxiv.org/abs/2310.15047v2 )

ライセンス: Link先を確認

Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger

(参考訳) Brown et al. (2020) は、大規模言語モデル(LLM)における文脈内学習(in-context learning)の現象を導入したことで知られる。我々は,LLMを用いた慎重に設計された合成実験により,メタ文脈外学習(meta-OCL)と呼ぶ現象の存在を確立する。我々の結果は,meta-OCLによって,LLMが広く有用である(あるいはそう見える)テキスト(真の文や権威ある情報源からのテキストなど)の意味内容をより容易に「内部化」し,適切な状況で利用するようになることを示唆している。さらに,合成的なコンピュータビジョンの設定でもmeta-OCLを実証し,meta-OCLの出現について2つの仮説を提案する。1つはモデルがパラメータに知識を格納する方法に依拠するものであり,もう1つは勾配降下ベースの最適化手法の暗黙的な勾配アライメントバイアスが原因である可能性を示唆するものである。最後に、我々の結果が将来のAIシステムの能力について何を示唆しうるかを考察し、潜在的なリスクについて議論する。コードは https://github.com/krasheninnikov/internalization にある。

Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs). We establish the existence of a phenomenon we call meta-out-of-context learning (meta-OCL) via carefully designed synthetic experiments with LLMs. Our results suggest that meta-OCL leads LLMs to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and use it in appropriate circumstances. We further demonstrate meta-OCL in a synthetic computer vision setting, and propose two hypotheses for the emergence of meta-OCL: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based optimizers may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks. Our code can be found at https://github.com/krasheninnikov/internalization.

翻訳日:2023-10-25 11:13:56 公開日:2023-10-24

# 言語モデルを用いたメタ学習:不均衡テキストの分類における挑戦と機会

Meta learning with language models: Challenges and opportunities in the classification of imbalanced text ( http://arxiv.org/abs/2310.15019v2 )

ライセンス: Link先を確認

Apostol Vassilev and Honglan Jin and Munawar Hasan

(参考訳) ポリシー違反発話(out of policy speech; OOPS)コンテンツの検出は重要だが難しい。機械学習はこの困難なタスクに取り組む強力なツールだが、訓練データの量と質の制限や、OOPSの定義およびデータラベリングの不整合といった要因により、性能の上限を突破することは難しい。利用可能な限られたリソースの可能性を最大限に引き出すため,異なるテキスト表現で構築された個々のモデルを組み合わせるメタ学習手法(MLT)を提案する。この手法が数値的に安定であり,合理的な結合重みを与えることを解析的に示す。さらに,MLTとしきい値移動(TM)手法を組み合わせることで,高度に不均衡な分布内および分布外データセットにおける結合予測器の性能をさらに向上させる。また,提案するMLTアプローチの統計的に有意な利点を示す計算結果も提供する。すべての著者がこの研究に等しく貢献した。

Detecting out of policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool to tackle this challenging task, it is hard to break the performance ceiling due to factors like quantity and quality limitations on training data and inconsistencies in OOPS definition and data labeling. To realize the full potential of available limited resources, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We analytically show that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly-imbalanced in-distribution and out-of-distribution datasets. We also provide computational results to show the statistically significant advantages of the proposed MLT approach. All authors contributed equally to this work.
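
As a rough illustration of what threshold moving means for an imbalanced binary classifier, the sketch below picks the decision threshold that maximizes F1 on held-out scores instead of using the default 0.5. This is the generic technique only; the abstract does not say which criterion or search procedure the authors use, and the function names and toy data are hypothetical.

```python
def f1_at(scores, labels, threshold):
    """F1 score when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores, labels):
    """Search the observed scores for the F1-maximizing threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Toy validation set: positives are rare and tend to score below 0.5.
val_scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.9]
val_labels = [0, 0, 0, 0, 1, 0, 1, 1]

default_f1 = f1_at(val_scores, val_labels, 0.5)
moved = best_threshold(val_scores, val_labels)
moved_f1 = f1_at(val_scores, val_labels, moved)
```

On this toy data the default threshold misses two of the three positives, while the moved threshold recovers all of them at the cost of one false positive. In practice the threshold would be chosen on a validation split and then applied unchanged to test data.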

翻訳日:2023-10-25 11:13:35 公開日:2023-10-24

# Wonder3D:クロスドメイン拡散を用いた単一画像から3Dへ

Wonder3D: Single Image to 3D using Cross-Domain Diffusion ( http://arxiv.org/abs/2310.15008v2 )

ライセンス: Link先を確認

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt and Wenping Wang

(参考訳) 本研究では, 単一視点画像から高忠実なテクスチャ付きメッシュを効率的に生成する新しい手法であるWonder3Dを紹介する。近年のScore Distillation Sampling (SDS) に基づく手法は, 2次元拡散の事前分布から3次元形状を復元できる可能性を示しているが, 一般に, 形状ごとの最適化に時間がかかり, 形状の一貫性にも欠ける。対照的に、一部の研究は高速なネットワーク推論によって3D情報を直接生成するが、その結果はしばしば品質が低く、幾何学的な詳細を欠く。画像から3Dへのタスクの品質,一貫性,効率を総合的に向上させるため,多視点の法線マップと対応するカラー画像を生成するクロスドメイン拡散モデルを提案する。一貫性を確保するために、ビュー間およびモダリティ間の情報交換を促進するマルチビュー・クロスドメイン注意機構を用いる。最後に,多視点の2次元表現から高品質な表面を抽出する幾何認識型の法線融合アルゴリズムを導入する。広範な評価により,提案手法が従来研究と比べて高品質な再構成結果, 頑健な汎化, および十分に良好な効率を達成することを示す。

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.

翻訳日:2023-10-25 11:13:18 公開日:2023-10-24

# 物理インフォームドグラフ畳み込みネットワーク:複素幾何学の一般化フレームワークを目指して

Physics-Informed Graph Convolutional Networks: Towards a generalized framework for complex geometries ( http://arxiv.org/abs/2310.14948v2 )

ライセンス: Link先を確認

Marien Chenaud, Jos\'e Alves, Fr\'ed\'eric Magoul\`es

(参考訳) [9]による物理情報ニューラルネットワーク(PINN)の先駆的研究以来、ディープラーニングモデルを用いて偏微分方程式(PDE)を解くための多くの取り組みがなされてきた。しかし、複雑な3次元形状へのモデルの拡張や、そのようなアプローチを古典的な数値解法とどのように組み合わせられるかの研究など、いくつかの課題が残っている。本研究では,これらのアーキテクチャと,偏微分方程式を解く従来の数値計算手法で用いられるメッシュとの類似性に基づいて,これらの問題に対するグラフニューラルネットワークの利用を正当化する。複雑な形状に対する物理情報フレームワークには,PDE残差の計算において問題があることを証明した上で,古典的な数値解法と物理情報フレームワークを組み合わせた代替手順を提案する。最後に,このアプローチの実装を提案し,不規則な形状上の3次元問題で検証する。

Since the seminal work of [9] and their Physics-Informed neural networks (PINNs), many efforts have been made towards solving partial differential equations (PDEs) with Deep Learning models. However, some challenges remain, for instance the extension of such models to complex three-dimensional geometries, and the study of how such approaches could be combined with classical numerical solvers. In this work, we justify the use of graph neural networks for these problems, based on the similarity between these architectures and the meshes used in traditional numerical techniques for solving partial differential equations. After proving that the Physics-Informed framework has an issue on complex geometries, which arises during the computation of PDE residuals, we propose an alternative procedure that combines classical numerical solvers and the Physics-Informed framework. Finally, we propose an implementation of this approach, which we test on a three-dimensional problem on an irregular geometry.

翻訳日:2023-10-25 11:12:55 公開日:2023-10-24

# ロボット応用のための効率的な因果発見

Efficient Causal Discovery for Robotics Applications ( http://arxiv.org/abs/2310.14925v2 )

ライセンス: Link先を確認

Luca Castri, Sariah Mghames, Nicola Bellotto

(参考訳) 倉庫やショッピングセンター、病院など、人間と共有される環境でタスクを自動化するロボットは、近くのエージェントや物体の間の基本的な物理的相互作用を理解する必要がある。特に、これらの要素間の因果関係を表現するモデルを作成することは、予期せぬ人間の行動を予測し、特定のロボット行動の結果を見積もるのに役立つ。ロボットに適したものとするには、因果解析は高速かつ正確で、リアルタイム性の要求と、ほとんどのロボティクス応用に典型的な限られた計算資源という制約を満たさなければならない。本稿では,Filtered PCMCI (F-PCMCI) と呼ばれる高速かつ正確な因果解析のためのアプローチの実践的なデモンストレーションを,実世界のロボティクス応用とともに示す。この応用例は,F-PCMCIが人間とロボットのインタラクションシナリオの因果モデルを正確かつ迅速に再構築でき,それをインタラクションの質の向上に活用できることを示す。

Using robots for automating tasks in environments shared with humans, such as warehouses, shopping centres, or hospitals, requires these robots to comprehend the fundamental physical interactions among nearby agents and objects. Specifically, creating models to represent cause-and-effect relationships among these elements can aid in predicting unforeseen human behaviours and anticipate the outcome of particular robot actions. To be suitable for robots, causal analysis must be both fast and accurate, meeting real-time demands and the limited computational resources typical in most robotics applications. In this paper, we present a practical demonstration of our approach for fast and accurate causal analysis, known as Filtered PCMCI (F-PCMCI), along with a real-world robotics application. The provided application illustrates how our F-PCMCI can accurately and promptly reconstruct the causal model of a human-robot interaction scenario, which can then be leveraged to enhance the quality of the interaction.

翻訳日:2023-10-25 11:12:38 公開日:2023-10-24

# エアデコード:デコード時間制御可能なテキスト生成のための属性分布再構成

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation ( http://arxiv.org/abs/2310.14892v2 )

ライセンス: Link先を確認

Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

(参考訳) 制御可能なテキスト生成(CTG)は、所望の属性を持つテキストを生成することを目的としており、復号時(decoding-time)ベースの手法はこのタスクで有望な性能を示している。しかし,本稿では属性崩壊(Attribute Collapse)という現象を初めて明らかにする。これは,制御強度が臨界値を超えると生成テキストの流暢性が急速に低下し,テキストが完全に使用不能になる現象である。この制限は、高いレベルの制御性を達成するうえで復号手法の有効性を妨げる。この問題に対処するため,Air-Decodingという新しい軽量な復号フレームワークを提案する。その主なアイデアは、属性分布を再構成して属性語と非属性語の重みのバランスをとり、より流暢なテキストを生成することである。具体的には、プレフィックスチューニングによってプレフィックスを訓練し、属性分布を得る。次に,得られた分布のバランスをとる新しい属性分布再構成法を設計し,再構成した分布を用いて言語モデルの生成を誘導することで,属性崩壊の問題を効果的に回避する。複数のCTGタスクにおける実験により,提案手法が新たな最先端の制御性能を達成することを示す。

Controllable text generation (CTG) aims to generate text with desired attributes, and decoding-time-based methods have shown promising performance on this task. However, in this paper, we identify the phenomenon of Attribute Collapse for the first time. It causes the fluency of generated text to rapidly decrease when the control strength exceeds a critical value, rendering the text completely unusable. This limitation hinders the effectiveness of decoding methods in achieving high levels of controllability. To address this problem, we propose a novel lightweight decoding framework named Air-Decoding. Its main idea is reconstructing the attribute distributions to balance the weights between attribute words and non-attribute words to generate more fluent text. Specifically, we train prefixes by prefix-tuning to obtain attribute distributions. Then we design a novel attribute distribution reconstruction method to balance the obtained distributions and use the reconstructed distributions to guide language models for generation, effectively avoiding the issue of Attribute Collapse. Experiments on multiple CTG tasks prove that our method achieves a new state-of-the-art control performance.

翻訳日:2023-10-25 11:12:21 公開日:2023-10-24

# MCC-KD:マルチCoT一貫性知識蒸留

MCC-KD: Multi-CoT Consistent Knowledge Distillation ( http://arxiv.org/abs/2310.14747v2 )

ライセンス: Link先を確認

Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

(参考訳) 大規模言語モデル(LLM)は、思考の連鎖(CoT)プロンプティングによる複雑な推論において顕著な能力を示してきた。近年,LLMから小型モデルへこうした推論能力を転移させることへの関心が高まっている。しかし、推論の根拠(rationale)の多様性と一貫性を両立させることは難しい。本稿では,これら2つの側面の強化に焦点をあて,推論能力を効率的に蒸留するMulti-CoT Consistent Knowledge Distillation (MCC-KD) を提案する。 MCC-KDでは,各質問に対して複数の根拠を生成し,回答分布間の双方向KLダイバージェンスを最小化することで,対応する予測間の一貫性を強制する。数学的推論と常識推論の両方のベンチマークにおいて,異なるモデルアーキテクチャ (LLaMA/FlanT5) と様々なモデルスケール (3B/7B/11B/13B) でMCC-KDの有効性を検証する。実験結果は、分布内データセットにおけるMCC-KDの優れた性能を確認するだけでなく、分布外データセットに対する頑健な汎化能力も示している。

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
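
The consistency term described above can be sketched in a few lines. The code computes a bidirectional KL divergence between two discrete answer distributions, such as the student's predictions under two different rationales for the same question. Summing the two directions (rather than averaging), the smoothing constant, and the example distributions are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def bidirectional_kl(p, q):
    """Symmetric consistency penalty: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical answer distributions from two rationales for one question.
answers_a = [0.7, 0.2, 0.1]
answers_b = [0.6, 0.3, 0.1]
answers_c = [0.1, 0.2, 0.7]

close_penalty = bidirectional_kl(answers_a, answers_b)
far_penalty = bidirectional_kl(answers_a, answers_c)
```

The penalty is zero when both rationales lead to the same answer distribution and grows as they disagree, so minimizing it pushes predictions from different rationales toward each other. In a training loop this term would be added to the usual distillation loss with a weighting coefficient.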

翻訳日:2023-10-25 11:12:03 公開日:2023-10-24

# LLM生成テキスト検出に関する調査:必要,方法,今後の方向性

A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions ( http://arxiv.org/abs/2310.14724v2 )

ライセンス: Link先を確認

Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, Lidia S. Chao

(参考訳) 大規模言語モデル(LLM)がもたらした、複雑な言語を理解し、指示に従い、生成する強力な能力により、LLMが生成したテキストは驚くべき速さで私たちの日常生活の多くの領域に溢れ、人間に広く受け入れられている。 LLMが拡大を続けるにつれ、LLMが生成したテキストを検出できる検出器を開発することが不可欠となっている。これは、LLMの潜在的な誤用を軽減し、芸術表現やソーシャルネットワークのような領域をLLM生成コンテンツの有害な影響から守るために重要である。 LLM生成テキスト検出は、あるテキストがLLMによって生成されたかどうかを識別することを目的としており、本質的には二値分類タスクである。検出技術は最近、透かし技術、ゼロショット法、LMのファインチューニング法、敵対的学習法、LLMを検出器として用いる方法、人間支援型の手法における革新に後押しされ、顕著な進歩を遂げている。本調査では,この領域における最近の研究のブレークスルーを整理し,検出器研究を強化する差し迫った必要性を強調する。また、広く使われているデータセットを掘り下げ、その限界と開発上の要件を明らかにする。さらに, LLM生成テキスト検出の様々なパラダイムを分析し, 分布外問題, 潜在的な攻撃, データの曖昧さといった課題に光を当てる。最後に,責任ある人工知能(AI)の実装を前進させるため,LLM生成テキスト検出における今後の研究の興味深い方向性を示す。本調査の目的は,新規参入者には明確かつ包括的な入門を,経験豊富な研究者にはLLM生成テキスト検出分野の有益な最新情報を提供することである。有用なリソースは https://github.com/NLP2CT/LLM-generated-Text-Detection で公開されている。

The powerful ability to understand, follow, and generate complex language emerging from large language models (LLMs) has made LLM-generated text flood many areas of our daily lives at an incredible speed and become widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can detect LLM-generated text. This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from the harmful influence of LLM-generated content. LLM-generated text detection aims to discern if a piece of text was produced by an LLM, which is essentially a binary classification task. Detector techniques have witnessed notable advancements recently, propelled by innovations in watermarking techniques, zero-shot methods, fine-tuning LMs methods, adversarial learning methods, LLMs as detectors, and human-assisted methods. In this survey, we collate recent research breakthroughs in this area and underscore the pressing need to bolster detector research. We also delve into prevalent datasets, elucidating their limitations and developmental requirements. Furthermore, we analyze various LLM-generated text detection paradigms, shedding light on challenges like out-of-distribution problems, potential attacks, and data ambiguity. In conclusion, we highlight interesting directions for future research in LLM-generated text detection to advance the implementation of responsible artificial intelligence (AI). Our aim with this survey is to provide a clear and comprehensive introduction for newcomers while also offering seasoned researchers a valuable update in the field of LLM-generated text detection. The useful resources are publicly available at: https://github.com/NLP2CT/LLM-generated-Text-Detection.

翻訳日:2023-10-25 11:11:41 公開日:2023-10-24

# SPRING-INX: IIT Madras SPRING Labによる多言語インド言語音声コーパス

SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras ( http://arxiv.org/abs/2310.14654v2 )

ライセンス: Link先を確認

Nithya R, Malavika S, Jordan F, Arjun Gangwar, Metilda N J, S Umesh, Rithik Sarab, Akhilesh Kumar Dubey, Govind Divakaran, Samudra Vijaya K, Suryakanth V Gangashetty

(参考訳) インドには多数の言語があり、そのうち22の言語がインド憲法によって公用語として認められている。インド国民のための音声ベースのアプリケーションを構築することは、データが限られていることや、対応すべき言語とアクセントの数が多いことから難しい問題である。言語技術コミュニティがインドの言語で音声ベースのアプリケーションを構築することを奨励するため、我々はSPRING-INXデータをオープンソース化する。これは、アッサム語、ベンガル語、グジャラート語、ヒンディー語、カンナダ語、マラヤーラム語、マラーティー語、オディア語、パンジャーブ語、タミル語のASRシステム構築のための、合法的に収集され手作業で書き起こされた約2000時間の音声データである。この取り組みはインド工科大学マドラス校のSPRING Labによるもので、インド政府電子情報技術省(MeitY)が資金提供するNational Language Translation Mission (NLTM) の一部である。本稿では,データ収集およびデータクリーニングのプロセスとデータ統計について述べる。

India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open sourcing SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology Madras, and is part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.

翻訳日:2023-10-25 11:11:12 公開日:2023-10-24

PDF登録状況（公開日: 20231024）